
Model Quality Software Engineer, Claude Code

at Anthropic

📍 San Francisco, CA | New York City, NY | Hybrid | Posted March 07, 2026
Type Not Specified
Experience Mid-level
Exp. Years 5+ years
Education Not specified
Category AI & Machine Learning

Anthropic is seeking a Model Quality Software Engineer to develop evaluation systems and infrastructure that enhance the reliability and performance of AI models like Claude, working closely with research teams.

  • Design evaluation systems
  • Build data pipelines
  • Collaborate with research
  • Ensure system reliability
  • Develop infrastructure

The role involves building robust data pipelines, evaluation frameworks, and infrastructure using Python and other tools, ensuring scalable and reliable AI model assessment.

The ideal candidate is a software engineer with over 5 years of experience working at the intersection of research and engineering, specializing in evaluation systems, data pipelines, and infrastructure for AI models. They are collaborative, reliable, and thrive in fast-paced environments.

5+ years experience in software engineering, Experience with research collaboration, Building evaluation systems, Data pipeline development, System reliability
Reinforcement learning systems, Research computing, Scientific infrastructure, Math or physics background, Python and TypeScript
Python, TypeScript, Cloud platforms, Data pipelines, Evaluation frameworks
Software Engineering, Research collaboration, Model evaluation, Data pipelines, Infrastructure, Coding tools, System reliability, Python, Infrastructure development
Collaboration, Problem-solving, Independence, Reliability focus, Fast-paced environment adaptation
Industry AI & Tech
Job Function Develop evaluation systems and infrastructure for AI model quality and reliability.
Model Quality Software Engineer, Claude Code, Research collaboration, Model evaluation, Data pipelines, Infrastructure, Python, System reliability, Reinforcement learning, Evaluation frameworks

Less than 5 years of experience, Lack of experience with research collaboration, No background in system reliability, Inability to work independently

