About this role
This role involves designing and deploying scalable AI inference solutions using NVIDIA GPU technologies and Kubernetes, focusing on disaggregated inference pipelines and model optimization.
Key Responsibilities
- Build scalable AI inference pipelines
- Collaborate with DevOps teams
- Accelerate inference pipelines through model optimization
- Mentor engineering teams
- Resolve complex GPU allocation issues
Technical Overview
Environment includes Kubernetes, NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, GPU orchestration, and open-source AI tools, with a focus on low-latency, high-efficiency AI inference workloads.
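As a rough illustration (not part of the posting), workloads in this kind of environment are typically scheduled by requesting GPU resources that the NVIDIA GPU Operator advertises to Kubernetes. A minimal sketch — the image tag and model-repository path below are placeholder assumptions:

```yaml
# Sketch of a Triton Inference Server pod requesting one GPU.
# Image tag and --model-repository path are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: triton-inference
spec:
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.08-py3
      args: ["tritonserver", "--model-repository=/models"]
      resources:
        limits:
          nvidia.com/gpu: 1  # GPU resource exposed by the NVIDIA GPU Operator
```

With Multi-Instance GPU (MIG) enabled, the resource name would instead reference a MIG profile rather than a whole GPU.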
Ideal Candidate
The ideal candidate is a mid-level solutions architect with 5+ years of experience deploying distributed AI inference workloads on Kubernetes, plus expertise in GPU optimization and model-acceleration technologies.
Must-Have Skills
5+ years in solutions architecture, distributed systems, AI inference workloads, Kubernetes, model optimization, GPU orchestration, disaggregated inference, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server
Nice-to-Have Skills
NVIDIA Certified AI Engineer, open-source contributions, transformer neural networks, inference acceleration technologies
Tools & Platforms
NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, Kubernetes, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU, SGLang
Required Skills
Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, GPU orchestration, GPU Operator, disaggregated inference, transformer neural networks, quantization, speculative decoding, WideEP
Hard Skills
Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, vLLM, SGLang, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU (MIG), GPU orchestration, disaggregated inference systems, transformer neural networks, quantization, speculative decoding, WideEP
Soft Skills
Collaboration, mentorship, technical leadership, problem-solving, communication
Certifications
Required
NVIDIA Certified AI Engineer
Keywords for Your Resume
Solutions Architect, AI inference, Kubernetes, TensorRT-LLM, TensorRT, NVIDIA Dynamo, Triton Inference Server, disaggregated inference systems, GPU orchestration, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU (MIG), transformer neural networks, quantization, speculative decoding, WideEP, open-source, inference acceleration, NVIDIA Certified AI Engineer
Deal Breakers
- Less than 5 years of experience in solutions architecture
- Lack of Kubernetes or GPU experience
- No experience with NVIDIA inference technologies