✦ Luna Orbit — AI & Machine Learning

Solutions Architect, Inference Deployments

at NVIDIA

📍 Santa Clara, CA, US · Posted March 18, 2026
Type: Not specified
Experience: Mid-level
Experience Years: 5+ years
Education: Bachelor's in CS/Engineering or equivalent experience
Category: AI & Machine Learning

This role involves designing and deploying scalable AI inference solutions using NVIDIA GPU technologies and Kubernetes, focusing on disaggregated inference pipelines and model optimization.

  • Design and build scalable AI inference pipelines
  • Collaborate with DevOps teams on deployment and orchestration
  • Accelerate inference workloads through model optimization
  • Mentor engineering teams
  • Resolve complex GPU allocation issues

Environment includes Kubernetes, NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, GPU orchestration, and open-source AI tools, with a focus on low-latency, high-efficiency AI inference workloads.
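As a concrete illustration of this environment, the sketch below shows how a GPU-backed inference container is typically scheduled on Kubernetes once the NVIDIA GPU Operator has exposed GPUs as an allocatable resource. This is a minimal, hypothetical example (pod name, image tag, and cluster access are assumptions, not part of the posting), not a prescribed deployment:

```shell
# Hypothetical sketch: check that the NVIDIA GPU Operator has made GPUs
# allocatable on cluster nodes (assumes kubectl access to such a cluster).
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'

# A minimal pod requesting one GPU for a Triton Inference Server container.
# The image tag is illustrative; real deployments pin a specific NGC release.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: triton-demo
spec:
  containers:
  - name: triton
    image: nvcr.io/nvidia/tritonserver:24.05-py3
    resources:
      limits:
        nvidia.com/gpu: 1   # resource name registered by the GPU Operator's device plugin
EOF
```

In practice, roles like this one layer disaggregated inference frameworks (NVIDIA Dynamo, TensorRT-LLM) on top of this basic GPU scheduling primitive.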

The ideal candidate is a mid-level solutions architect with 5+ years of experience in deploying distributed AI inference workloads on Kubernetes, with expertise in GPU optimization and model acceleration technologies.

5+ Years in Solutions Architecture, distributed systems, AI inference workloads, Kubernetes, model optimization, GPU orchestration, disaggregated inference, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server
NVIDIA Certified AI Engineer, open-source contributions, transformer neural networks, inference acceleration technologies
NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, Kubernetes, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU, SGLang
Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, GPU orchestration, GPU Operator, disaggregated inference, transformer neural networks, quantization, speculative decoding, WideEP
Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, vLLM, SGLang, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU (MIG), disaggregated inference systems, GPU orchestration, transformer neural networks, quantization, speculative decoding, WideEP
collaboration, mentorship, technical leadership, problem-solving, communication

Required

NVIDIA Certified AI Engineer
Industry: Technology
Job Function: Design and implement enterprise AI inference solutions using GPU and Kubernetes technologies
Role Subtype: Solutions Architect
Tech Domains: Kubernetes, TensorRT-LLM, NVIDIA Dynamo, disaggregated inference, GPU
Solutions Architect, AI inference, Kubernetes, TensorRT-LLM, NVIDIA Dynamo, Triton Inference Server, disaggregated inference, GPU orchestration, NVIDIA GPU Operator, NIM Operator, Multi-Instance GPU (MIG), transformer neural networks, quantization, speculative decoding, WideEP, open-source, AI Engineer, GPU, disaggregated inference systems, inference acceleration, NVIDIA Certified AI Engineer, TensorRT

Less than 5 years of experience in solutions architecture; lack of Kubernetes or GPU experience; no experience with NVIDIA inference technologies
