
Senior Software Engineer, Model Serving

at Databricks

📍 San Francisco, California
Posted March 10, 2026
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education Not specified
Category AI & Machine Learning

This role focuses on designing and building scalable, high-performance inference systems for Databricks' Model Serving platform, supporting real-time AI/ML model deployment.

  • Design core serving systems
  • Optimize inference performance
  • Collaborate on infrastructure architecture
  • Implement autoscaling and observability
  • Ensure reliability and scalability

The technical scope includes distributed inference systems, APIs, runtime components, autoscaling, and low-latency optimization for CPU and GPU workloads.

The ideal candidate is a senior engineer with over 5 years of experience in building and maintaining large-scale distributed inference systems, with expertise in low-latency serving for CPU and GPU workloads.

Requirements
  • 5+ years building and operating large-scale distributed systems
  • Experience in model serving or inference systems
  • Strong foundation in algorithms and data structures
  • Experience with CPU and GPU inference systems
  • Proven ability to deliver high-impact initiatives

Preferred
  • Experience with routing, autoscaling, and observability
  • Performance optimization for low-latency serving
  • Experience with real-time inference
  • Knowledge of container deployment workflows

Technologies APIs, Runtime systems, Model containers, Routing, Caching, Observability, Autoscaling
Skills Model Serving, Inference systems, Routing, Scheduling, Autoscaling, Low-latency inference, GPU workloads, CPU workloads, APIs, System design, Algorithms, Data structures
Soft Skills Collaboration, communication, problem-solving, teamwork, technical leadership
Industry SaaS
Job Function Develop scalable inference and model serving infrastructure

Disqualifiers
  • Less than 5 years of experience in distributed systems
  • No experience with model inference or serving systems
  • Lack of familiarity with GPU or CPU inference workloads
  • Not experienced in system design or algorithms
