
Senior Software Engineer, Model Serving

at Databricks

📍 San Francisco, California
Posted March 10, 2026
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education Not specified
Category AI & Machine Learning

This role focuses on designing and building scalable, high-performance inference systems for Databricks' Model Serving platform, supporting real-time AI/ML model deployment.

  • Design core serving systems
  • Optimize inference performance
  • Collaborate on infrastructure architecture
  • Implement autoscaling and observability
  • Ensure reliability and scalability

The technical scope includes distributed inference systems, APIs, runtime components, autoscaling, and low-latency optimization for CPU and GPU workloads.

The ideal candidate is a senior engineer with over 5 years of experience in building and maintaining large-scale distributed inference systems, with expertise in low-latency serving for CPU and GPU workloads.

Requirements
  • 5+ years building and operating large-scale distributed systems
  • Experience in model serving or inference systems
  • Strong foundation in algorithms and data structures
  • Experience with CPU and GPU inference systems
  • Proven ability to deliver high-impact initiatives

Preferred
  • Experience with routing, autoscaling, and observability
  • Performance optimization for low-latency serving
  • Experience with real-time inference
  • Knowledge of container deployment workflows

Technologies APIs, Runtime systems, Model containers, Routing, Caching, Observability, Autoscaling
Skills Model Serving, Inference systems, Routing, Scheduling, Autoscaling, Low-latency inference, GPU workloads, CPU workloads, APIs, System design, Algorithms, Data structures
Soft Skills Collaboration, communication, problem-solving, teamwork, technical leadership
Industry SaaS
Job Function Develop scalable inference and model serving infrastructure

Disqualifiers
  • Less than 5 years of experience in distributed systems
  • No experience with model inference or serving systems
  • Lack of familiarity with GPU or CPU inference workloads
  • Not experienced in system design or algorithms
