Position Details
About this role
Scale AI seeks an ML Systems Engineer to build scalable backend platforms for serving foundation models in robotics and Physical AI, bridging research and production engineering.
Key Responsibilities
- Build and scale fault-tolerant, high-performance systems for serving robotics models
- Develop platforms that enable model capability discovery for faster research iteration
- Collaborate with robotics researchers and computer vision engineers to optimize models for production and research
- Conduct architecture and design reviews for scalability, reliability, and security
- Develop observability and monitoring for real-time performance of model inference
Technical Overview
You will design fault-tolerant, high-performance ML serving platforms, develop internal platforms for model capability discovery, and collaborate with robotics researchers and CV engineers. The tech stack includes Python, Go, Rust, and C++, along with CUDA, Docker, Kubernetes, Terraform, and cloud providers (AWS/GCP), and it supports GPU-accelerated inference and data pipelines.
Deal Breakers
- Fewer than 4 years of backend systems experience
- No CUDA or GPU optimization experience
- No Kubernetes or Docker experience
- No Python/Go/C++ proficiency