About this role
Senior ML Engineer role focused on building and scaling the machine learning systems behind intelligent robots. You will design training, evaluation, experiment management, and deployment infrastructure with low-latency serving on robotic and edge hardware.
Key Responsibilities
- Design and build scalable ML training infrastructure (distributed training pipelines, GPU cluster management)
- Develop systems for experiment tracking, model versioning, and reproducibility
- Build deployment infrastructure for serving ML models on robotic hardware with strict latency requirements
- Optimize model inference for edge devices and embedded systems
- Collaborate with research teams to accelerate the path from experimentation to production
Technical Overview
Work spans distributed GPU training infrastructure, experiment tracking and reproducible model versioning, and robust deployment/serving pipelines for robotic hardware. The role emphasizes optimizing inference for edge and embedded systems and using Kubernetes/Docker container ecosystems.
Ideal Candidate
The ideal candidate is a Senior ML Engineer with 5+ years of professional software development and extensive experience designing and scaling ML training and deployment infrastructure. They have strong Large Language Model fundamentals, hands-on experience with distributed training on GPU clusters, and can optimize inference for edge and embedded systems in robotics contexts.
Must-Have Skills
- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience as a mentor, tech lead or leading an engineering team
- Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience in development in the last 3 years
- Experience with machine learning (ML) tools and methods
- Experience in Kubernetes, Docker or containers ecosystem, or experience that includes strong analytical skills, attention to detail, and effective communication abilities and experience with programming/scripting (Batch, VB, PowerShell, Java, C#, Chef, Perl, Ruby and/or PHP)
Nice-to-Have Skills
- Experience building and operating a cloud-based architecture
- Experience with robotics data (sensor streams, video, point clouds) and real-time inference systems
- Familiarity with model optimization techniques (quantization, pruning, distillation)
- Experience with reinforcement learning or simulation-based training pipelines
Tools & Platforms
Kubernetes, Docker, containers ecosystem, GPU cluster management, cloud-based architecture, robotic hardware
Required Skills
machine learning, Large Language Model fundamentals, ML training, distributed training pipelines, GPU cluster management, experiment tracking, model versioning, reproducibility, deployment infrastructure, serving ML models, robotics, edge devices, embedded systems, Kubernetes, Docker, containers ecosystem, quantization, pruning, distillation, reinforcement learning, Batch, VB, PowerShell, Java, C#, Chef, Perl, Ruby, PHP
Hard Skills
machine learning, Large Language Model fundamentals, ML training, ML inference, ML model architecture, training/inference lifecycles, optimization of model execution, experiment tracking, model versioning, reproducibility, distributed training pipelines, GPU cluster management, deployment infrastructure, ML model serving, robotic hardware, edge devices, embedded systems, quantization, pruning, distillation, Kubernetes, Docker, containers ecosystem, data pipelines, data labeling infrastructure, robot locomotion, robot perception, robot manipulation, robot navigation, human-robot interaction, programming/scripting, Batch, VB, PowerShell, Java, C#, Chef, Perl, Ruby, PHP
Soft Skills
mentoring, tech lead experience, collaboration with research teams, cross-functional collaboration, effective communication abilities, attention to detail, leading design or architecture, reliability and scaling mindset
Keywords for Your Resume
Senior ML Engineer, machine learning systems, ML training infrastructure, distributed training pipelines, GPU cluster management, experiment tracking, model versioning, reproducibility, deployment infrastructure, serving ML models, robot locomotion, robot perception, robot manipulation, robot navigation, human-robot interaction, edge devices, embedded systems, Kubernetes, Docker, containers ecosystem, Large Language Model fundamentals, training/inference lifecycles, optimization of model execution, quantization, pruning, distillation, reinforcement learning
Deal Breakers
- Must have 5+ years of programming with at least one software programming language
- Must have 5+ years leading design or architecture (reliability and scaling) of new and existing systems
- Must have Large Language Model fundamentals experience (architecture, training/inference lifecycles, and optimization of model execution)