
DGX Cloud Performance Engineer

at NVIDIA

📍 3 Locations · Unknown · Posted March 13, 2026
Type Not Specified
Experience Senior
Exp. Years 10+ years
Education Bachelor's/Master's in Engineering or equivalent experience
Category AI & Machine Learning

This role involves driving performance analysis, optimization, and modeling for NVIDIA's DGX Cloud AI infrastructure, focusing on large-scale parallel and distributed systems.

  • Develop benchmarks and applications
  • Analyze performance bottlenecks
  • Optimize workloads
  • Collaborate with cross-functional teams
  • Develop modeling frameworks

The technical environment includes high-performance AI workloads, large-scale parallel and distributed systems, AI frameworks like PyTorch and TensorFlow, and cloud platforms such as GCP, AWS, Azure, and OCI.

The ideal candidate is a senior AI & Machine Learning engineer with extensive experience in large-scale parallel and distributed systems, performance optimization, and AI workloads. They possess a strong background in computer architecture, networking, and AI frameworks, with over 10 years of relevant experience.

  • Expertise in large-scale parallel and distributed accelerator-based systems
  • Optimizing performance and AI workloads
  • Performance modeling and benchmarking
  • Strong background in computer architecture, networking, and storage systems
  • Experience with AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM)
  • Experience with AI/ML models and workloads, including LLMs and DNNs
  • 10+ years of experience in relevant areas
  • Proficiency in Python and C/C++
  • Experience with public cloud infrastructure (GCP, AWS, Azure, OCI)

  • Experience with performance optimization
  • Experience with cloud infrastructure design
  • Modeling frameworks
  • Total Cost of Ownership analysis
Parallel and Distributed Systems, Performance analysis, Performance modeling, Benchmarking, Computer Architecture, Networking, Storage systems, Accelerators, PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM, AI/ML models, Large Language Models, Deep Neural Networks, Python, C/C++, Public Cloud Infrastructure, Google Cloud Platform (GCP), Amazon Web Services (AWS), Azure, Oracle Cloud Infrastructure (OCI)
Collaboration, communication, problem-solving, analytical thinking, teamwork
Industry Technology
Job Function Performance analysis and optimization of AI workloads on large-scale distributed systems

  • Less than 10 years of experience
  • Lack of experience with large-scale parallel systems
  • No proficiency in Python or C/C++
  • No experience with AI frameworks or cloud infrastructure
