✦ Luna Orbit — AI & Machine Learning

Software Engineer - GenAI inference

at Databricks

📍 San Francisco, California · Posted March 10, 2026
Type Not specified
Experience Mid-level
Exp. Years 3+ years
Education Not specified
Category AI & Machine Learning

This role involves designing, developing, and optimizing the inference engine for Databricks' Foundation Model API, focusing on large language models and performance efficiency.

  • Design and implement inference engine
  • Collaborate with researchers on model architectures
  • Optimize for latency and hardware utilization
  • Build instrumentation and profiling tools
  • Support reliability and fault tolerance

The technical environment includes CUDA, GPU programming, distributed systems, and performance profiling tools, aimed at scalable ML inference systems.

The ideal candidate is a software engineer with 3+ years of experience in performance-critical systems, specializing in GPU programming and ML inference internals. They should have hands-on experience with CUDA, distributed systems, and optimization for large-scale language models.

Core skills: performance-critical systems, CUDA, GPU programming, ML inference internals, large language models, distributed systems, performance optimization, instrumentation, profiling, tracing, collaboration with ML researchers
Nice to have: published research, open-source contributions, model serving, sparsity, activation compression, mixture-of-experts
Libraries & frameworks: CUDA, cuBLAS, cuDNN, NCCL, RPC frameworks
Systems topics: sharding, memory partitioning
Soft skills: collaboration, problem-solving, deep technical understanding, ownership mindset, communication
Industry SaaS, Technology, Artificial Intelligence
Job Function Develop and optimize large-scale ML inference systems

This role is likely not a fit for candidates with:

  • No experience with CUDA or GPU programming
  • No background in ML inference internals
  • Less than 3 years of relevant experience

