✦ Luna Orbit — AI & Machine Learning

Software Engineer - GenAI inference

at Databricks

📍 San Francisco, California · Posted March 10, 2026
Type Not specified
Experience Mid-level
Exp. Years 3+ years
Education Not specified
Category AI & Machine Learning

This role involves designing, developing, and optimizing the inference engine for Databricks' Foundation Model API, focusing on large language models and performance efficiency.

  • Design and implement inference engine
  • Collaborate with researchers on model architectures
  • Optimize for latency and hardware utilization
  • Build instrumentation and profiling tools
  • Support reliability and fault tolerance

The technical environment includes CUDA, GPU programming, distributed systems, and performance profiling tools, aimed at scalable ML inference systems.

The ideal candidate is a software engineer with 3+ years of experience in performance-critical systems, specializing in GPU programming and ML inference internals. They should have hands-on experience with CUDA, distributed systems, and optimization for large-scale language models.

Core skills: performance-critical systems, CUDA, GPU programming, ML inference internals, large language models, distributed systems, performance optimization, instrumentation, profiling, tracing, collaboration with ML researchers
Nice to have: published research, open-source contributions, model serving, sparsity, activation compression, mixture-of-experts
Libraries & frameworks: CUDA, cuBLAS, cuDNN, NCCL, RPC frameworks
Systems topics: sharding, memory partitioning
Soft skills: collaboration, problem-solving, deep technical understanding, ownership mindset, communication
Industry SaaS, Technology, Artificial Intelligence
Job Function Develop and optimize large-scale ML inference systems

This role is likely not a fit for candidates with:

  • No experience with CUDA or GPU programming
  • No background in ML inference internals
  • Less than 3 years of relevant experience

