About this role
This role involves designing and deploying scalable AI inference solutions using NVIDIA GPU technologies and Kubernetes, focusing on disaggregated inference pipelines and model optimization.
Key Responsibilities
- Build scalable AI inference pipelines
- Collaborate with DevOps teams
- Accelerate inference pipelines through model optimization
- Mentor engineering teams
- Resolve complex GPU allocation issues
Technical Overview
Environment includes Kubernetes, NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, GPU orchestration, and open-source AI tools, with a focus on low-latency, high-efficiency AI inference workloads.
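As a rough illustration (not part of the posting), workloads in this kind of environment are typically scheduled by requesting GPU resources that the NVIDIA GPU Operator advertises to Kubernetes. A minimal sketch — the image tag and model-repository path below are placeholder assumptions:

```yaml
# Sketch of a Triton Inference Server pod requesting one GPU.
# Image tag and --model-repository path are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: triton-inference
spec:
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.08-py3
      args: ["tritonserver", "--model-repository=/models"]
      resources:
        limits:
          nvidia.com/gpu: 1  # GPU resource exposed by the NVIDIA GPU Operator
```

With Multi-Instance GPU (MIG) enabled, the resource name would instead reference a MIG profile rather than a whole GPU.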
Ideal Candidate
The ideal candidate is a mid-level solutions architect with 5+ years of experience deploying distributed AI inference workloads on Kubernetes, plus expertise in GPU optimization and model-acceleration technologies.
Must-Have Skills
5+ years in solutions architecture, distributed systems, AI inference workloads, Kubernetes, model optimization, GPU orchestration, disaggregated inference, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server
Nice-to-Have Skills
NVIDIA Certified AI Engineer, open-source contributions, transformer neural networks, inference acceleration technologies
Tools & Platforms
NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, Kubernetes, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU, SGLang
Required Skills
Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, GPU orchestration, GPU Operator, disaggregated inference, transformer neural networks, quantization, speculative decoding, WideEP
Hard Skills
Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, vLLM, SGLang, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU (MIG), GPU orchestration, disaggregated inference systems, transformer neural networks, quantization, speculative decoding, WideEP
Soft Skills
Collaboration, mentorship, technical leadership, problem-solving, communication
Certifications
Required
NVIDIA Certified AI Engineer
Keywords for Your Resume
Solutions Architect, AI inference, Kubernetes, TensorRT-LLM, TensorRT, NVIDIA Dynamo, Triton Inference Server, disaggregated inference systems, GPU orchestration, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU (MIG), transformer neural networks, quantization, speculative decoding, WideEP, open-source, inference acceleration, NVIDIA Certified AI Engineer
Deal Breakers
- Less than 5 years of experience in solutions architecture
- Lack of Kubernetes or GPU experience
- No experience with NVIDIA inference technologies