Position Details
About this role
Scale AI’s Public Sector ML team deploys advanced AI systems into mission-critical government environments and builds automated evaluation frameworks for safety and governance.
Key Responsibilities
- Develop and maintain automated evaluation pipelines
- Design test datasets and benchmarks
- Build evaluation frameworks for LLM agents
- Conduct stress tests and red-teaming
- Collaborate to produce evaluation datasets
Technical Overview
The role focuses on automated evaluation pipelines for ML models, LLM agent evaluation, stress testing, red-teaming, and regulatory compliance, using Python-based ML tooling and cloud deployments.
Ideal Candidate
The ideal candidate is a senior ML engineer with production ML experience in government contexts, strong Python skills, and familiarity with ML evaluation, computer vision (CV) robustness, and AI safety frameworks.
Deal Breakers
- Active security clearance or the ability to obtain one
- Onsite work in the listed cities
- Production experience in government or regulated domains