✦ Luna Orbit — AI & Machine Learning

Engineering Manager, Inference Routing and Performance

at Anthropic

📍 San Francisco, CA | New York City, NY
Posted March 18, 2026
Type Full-Time
Experience Lead
Exp. Years Not specified
Education Not specified
Category AI & Machine Learning

This role involves leading the inference routing and performance optimization team at Anthropic, focusing on system-level improvements to increase throughput and reduce latency across large AI inference fleets.

  • Decide on routing algorithm changes
  • Sequence system improvements
  • Debug latency issues
  • Build performance models
  • Coordinate fleet-wide efficiency

The role focuses on building and optimizing distributed systems, load balancing algorithms, cluster coordination, and performance tuning for AI inference infrastructure, working closely with kernel and network internals.

The ideal candidate is a senior engineering leader with deep expertise in distributed systems, load balancing, and performance optimization for AI inference fleets. They should have experience managing complex system architectures and improving throughput and latency in large-scale AI infrastructure.

Distributed Systems, Load Balancing, Cluster Coordination, Performance Optimization, Latency Analysis, Kernel Debugging, Networking, ML Frameworks, System Architecture, High Performance Computing
Kernels, ML Frameworks, Networking Tools, Cluster Management Software
Leadership, Problem-solving, Technical Decision Making, Collaboration, Analytical Thinking, Communication, Team Management
Industry Technology/AI
Job Function Engineering leadership for AI inference system performance
Role Subtype Engineering Manager
Tech Domains Distributed Systems, Networking, Kernel Internals, High Performance Computing, ML Frameworks
distributed systems, load balancing, cluster coordination, performance optimization, latency analysis, kernel debugging, networking, ML frameworks, system architecture, high performance computing, inference routing, AI systems, scalability, fleet efficiency, latency spikes

  • Lack of experience with distributed systems
  • No background in AI infrastructure
  • Unfamiliar with load balancing or cluster coordination
  • No experience with performance tuning at scale
