Position Details
About this role
The Machine Learning Operations Engineer II on Kensho's MLOps team builds and maintains a production ML platform that enables scalable, auditable ML workflows.
Key Responsibilities
- Iterate on ML processes to develop tools, services, and frameworks
- Work closely with ML engineers to solve pain points and implement solutions
- Empower engineers with stable tooling to productionize research
- Provide resources and training for ML teams on best practices
- Evaluate and champion open-source and third-party solutions
Technical Overview
The stack includes Kubernetes-based ML infrastructure on AWS (EKS, Bedrock, and SageMaker); tooling with Ray, Airflow, Terraform, and Jsonnet; observability with Prometheus and W&B; and production ML work covering model fine-tuning, RL, and agents.
Ideal Candidate
The ideal candidate is a mid-level MLOps engineer with 2+ years of experience building production ML platforms who is comfortable with Kubernetes, AWS, and modern MLOps tooling. They should excel at solving cross-team problems, implementing scalable tooling for model deployment and observability, and staying current with emerging AI frameworks.
Deal Breakers
- Less than 2 years of ML infrastructure experience
- No Kubernetes/EKS experience
- Not located in the United States
- No Python or ML tooling experience