Position Details
About this role
This role focuses on designing and building scalable, high-performance inference systems for Databricks' Model Serving platform, supporting real-time AI/ML model deployment.
Key Responsibilities
- Design core serving systems
- Optimize inference performance
- Collaborate on infrastructure architecture
- Implement autoscaling and observability
- Ensure reliability and scalability
Technical Overview
The technical scope includes distributed inference systems, APIs, runtime components, autoscaling, and low-latency optimization for CPU and GPU workloads.
Ideal Candidate
The ideal candidate is a senior engineer with over 5 years of experience in building and maintaining large-scale distributed inference systems, with expertise in low-latency serving for CPU and GPU workloads.
Deal Breakers
- Less than 5 years of experience in distributed systems
- No experience with model inference or serving systems
- No familiarity with GPU or CPU inference workloads
- No experience with system design or algorithms