Position Details

Type Not Specified

Experience mid

Exp. Years 2+ years

Education Master's or higher degree in a relevant field

Category AI & Machine Learning

About this role

This role involves developing and managing complex multimodal datasets for training and evaluating AI models, with a focus on synthetic data generation and data annotation.

Key Responsibilities

Design data collection efforts
Analyze large datasets
Build data analysis tools
Collaborate with scientists
Ensure data quality

Technical Overview

The environment includes data collection, synthetic data generation, data analysis, and annotation systems, primarily using Python and related tools for multimodal and speech/text data.

Ideal Candidate

The ideal candidate is a mid-level AI or NLP professional with 2+ years of experience in language data processing, data collection, and annotation. They are proficient in Python and have a strong background in computational linguistics or AI data creation. They should be collaborative, detail-oriented, and capable of handling complex multimodal datasets.

Must-Have Skills

Experience owning and executing language data collection projects
Master's or higher degree in Computational Linguistics or equivalent
2+ years of experience in computational linguistics, language data processing, or AI data creation
Experience with language data annotation systems
Proficiency with scripting languages such as Python
Experience working with speech, text, and multimodal data in multiple languages

Nice-to-Have Skills

PhD in Computational Linguistics
Expertise in bootstrapping AI data collections

Tools & Platforms

Python
Data annotation systems

Required Skills

Experience owning language data projects
Master's in Computational Linguistics
Python
Data annotation
Synthetic data
Multimodal data
Speech data
Text data

Hard Skills

Python
Scripting languages
Synthetic data generation
Model-supported data generation
Data analysis
Data collection
Data annotation systems
Multimodal data
Speech data
Text data

Soft Skills

Communication
Organizational skills
Collaboration
Attention to detail
Problem-solving

Industry & Role

Industry Technology

Job Function AI data development and dataset management

Role Subtype AI Engineer

Tech Domains Python, Data annotation systems

Keywords for Your Resume

Language Engineer, Artificial General Intelligence, Data Services, synthetic data generation, model-supported data generation, data collection, data analysis, multimodal data, speech data, text data, Python, data annotation systems, AI data creation, computational linguistics, Natural Language Processing, language data collection, synthetic data, multimodal datasets, Python scripting, data annotation, AI model evaluation, collaborative research

Deal Breakers

Lack of experience with language data annotation systems, no proficiency in scripting languages such as Python, no relevant advanced degree, less than 2 years of experience in a relevant field

Language Engineer, Artificial General Intelligence - Data Services