Position Details
About this role
This role involves designing, developing, and optimizing the inference engine behind Databricks' Foundation Model API, with a focus on large language models and inference performance.
Key Responsibilities
- Design and implement the inference engine
- Collaborate with researchers on model architectures
- Optimize for latency and hardware utilization
- Build instrumentation and profiling tools
- Support reliability and fault tolerance
Technical Overview
The technical environment spans CUDA and GPU programming, distributed systems, and performance profiling tools, all in service of building scalable ML inference systems.
Ideal Candidate
The ideal candidate is a software engineer with 3+ years of experience in performance-critical systems, specializing in GPU programming and ML inference internals. They should have hands-on experience with CUDA, distributed systems, and optimizing inference for large-scale language models.
Deal Breakers
- Lack of experience with CUDA or GPU programming
- No background in ML inference internals
- Less than 3 years of relevant experience