AI Tutor, Rufus, Amazon

at Amazon.com

📍 US, WA, Seattle
Posted April 14, 2026
Type Full-Time
Experience Mid-level
Exp. Years Not specified
Education Not specified
Category AI & Machine Learning

As an AI Tutor for Rufus, you will evaluate and label data to improve the fluency and overall shopping experience of Amazon’s shopping AI model. You’ll apply high judgment to ambiguous tasks, follow labeling guidelines and KPIs, and help refine prompting strategies in partnership with product and engineering teams.

  • Deliver high-quality labeled data that meets KPIs
  • Conduct high-judgment evaluations for LLM training
  • Generate human insight data across text, image, video, and audio
  • Develop prompting strategies to train and improve the Shopping LLM
  • Analyze root causes and propose solutions to improve labeling quality and SOP/tooling

This role supports training of Large Language Models using high-judgment evaluations and labeled datasets across multiple modalities (text, image, video, audio). You will develop prompting strategies and use root-cause analysis to identify error patterns and improve labeling quality and operational processes.

The ideal candidate is an AI-focused evaluator and labeler with strong language skills and proven high-judgment performance in ambiguous, detail-heavy tasks. They have experience supporting the training of Large Language Models through data labeling, quality evaluation, and iterative prompting strategies in collaboration with Product, Science, and Engineering teams.

  • Strong language skills
  • Exceptional attention to detail
  • High judgment in ambiguous environments
  • Delivery of high-quality labeled data, using guidelines to meet KPIs
  • Ability to train Large Language Models via evaluations and labeling
  • Sound judgments and logical decisions when information is ambiguous or incomplete
  • Ability to identify day-to-day process and operational issues in Standard Operating Procedures and related tooling

in-house tools and software

language skills, high-judgment evaluations, data labeling, training Large Language Models, prompting strategies, generating high-quality human insight data, text/image/video/audio evaluation, root cause analysis, error pattern identification, SOP and tooling issue identification, KPI-based quality control

language skills, high-judgment evaluations, data labeling, training Large Language Models, prompting strategies, generating high-quality labeled data, root cause analysis, error pattern identification, quality improvement for labeling tasks, data evaluation and labeling in-house tools and software, KPI management for labeled data, human insight data generation, text evaluation, image evaluation, video evaluation, audio evaluation, Standard Operating Procedures (SOP) review, tooling issue identification

exceptional detail-oriented work, high judgment in ambiguous environments, exceptional written communication, exceptional oral communication, strong interpersonal skills, logical decision-making, context-switching, multi-tasking, customer-focused mindset, strong judgment and logical reasoning
Industry SaaS
Job Function Improve a shopping Large Language Model through high-quality data labeling and evaluation
Role Subtype Prompt Engineer
Tech Domains Python
AI Tutor, Rufus, Amazon Rufus, Large Language Models, LLMs, prompting strategies, training, high-judgment evaluations, data labeling, labeled data, KPIs, in-house tools, text, image, video, audio, human insight data, ambiguous or incomplete information, root causes, error patterns, Standard Operating Procedures (SOP), Editorial team, Product, Science, Engineering teams, root cause analysis

  • Must demonstrate strong language skills and exceptional attention to detail
  • Must be able to perform high-judgment evaluations and deliver high-quality labeled data that meets KPIs
  • Must be able to work with ambiguous or incomplete information and maintain a high quality bar
