
Language Engineer, Artificial General Intelligence - Data Services

at Amazon.com

📍 US, MA, Boston
Posted March 14, 2026
Type Not Specified
Experience Mid-level
Years of Experience 2+ years
Education Master's or higher degree in a relevant field
Category AI & Machine Learning

This role involves developing and managing complex multimodal datasets for training and evaluating AI models, with a focus on synthetic data generation and data annotation.

  • Design data collection efforts
  • Analyze large datasets
  • Build data analysis tools
  • Collaborate with scientists
  • Ensure data quality

The environment includes data collection, synthetic data generation, data analysis, and annotation systems, primarily using Python and related tools for multimodal and speech/text data.

The ideal candidate is a mid-level AI or NLP professional with 2+ years of experience in language data processing, data collection, and annotation, who is proficient in Python and has a strong background in computational linguistics or AI data creation. They should be collaborative, detail-oriented, and capable of handling complex multimodal datasets.

Required qualifications:
  • Experience owning and executing language data collection projects
  • Master's or higher degree in Computational Linguistics or equivalent
  • 2+ years of experience in computational linguistics, language data processing, or AI data creation
  • Experience with language data annotation systems
  • Proficiency with scripting languages such as Python
  • Experience working with speech, text, and multimodal data in multiple languages

Preferred qualifications:
  • PhD in Computational Linguistics
  • Expertise in bootstrapping AI data collections

Python, Data annotation systems
Experience owning language data projects, Master's in Computational Linguistics, Python, data annotation, synthetic data, multimodal data, speech data, text data
Python, Scripting languages, Synthetic data generation, Model-supported data generation, Data analysis, Data collection, Data annotation systems, Multimodal data, Speech data, Text data
Communication, Organizational skills, Collaboration, Attention to detail, Problem-solving
Industry Technology
Job Function AI data development and dataset management
Role Subtype AI Engineer
Tech Domains Python, Data annotation systems
Language Engineer, Artificial General Intelligence, Data Services, synthetic data generation, model-supported data generation, data collection, data analysis, multimodal data, speech data, text data, Python, Data annotation systems, AI data creation, computational linguistics, Natural Language Processing, language data collection, synthetic data, multimodal datasets, Python scripting, data annotation, AI model evaluation, collaborative research

Lack of experience with language data annotation systems; no proficiency in scripting languages like Python; no relevant advanced degree; fewer than 2 years of experience in a relevant field.
