✦ Luna Orbit — AI & Machine Learning

Principal Architect – HPC & AI (NVIDIA Ecosystem)

at World Wide Technology

Location: Remote (US)
Posted: April 10, 2026
Salary: $215K – $245K USD / year
Type: Not specified
Experience: Executive
Years of experience: 10+
Education: Bachelor's degree in a technical field, or equivalent hands-on experience architecting large-scale HPC or AI systems
Category: AI & Machine Learning

This role is for a Principal Architect who designs and oversees HPC and AI platforms in the NVIDIA ecosystem. You will be responsible for end-to-end architecture across compute, networking, storage, orchestration, scheduling, and documentation.

  • Architect NVIDIA-based HPC and AI data center platforms (HGX/DGX)
  • Design high-performance networking and storage integrations for AI/HPC
  • Use BCM, Slurm, Run:AI, and Kubernetes to orchestrate workloads
  • Optimize performance, utilization, and cost efficiency for HPC/AI platforms
  • Create reusable architectural documentation and operational runbooks

The technical scope includes deep architectural knowledge of NVIDIA HGX and DGX platforms, Spectrum-X networking, and scale-out storage integration (VAST Data, NetApp, WEKA, DDN, Lustre) for AI and HPC workloads. You will use and administer NVIDIA Base Command Manager (BCM), Slurm, Run:AI, Kubernetes, and Linux to deliver performant, reproducible, and optimized AI factory and HPC platforms.

The ideal candidate is a principal-level architect with 10+ years of experience designing and optimizing HPC and AI data center platforms, specifically within the NVIDIA ecosystem (HGX, DGX, Spectrum-X). They have hands-on experience with NVIDIA Base Command Manager (BCM), Slurm, Run:AI, and Kubernetes administration, and with integrating scale-out storage systems (e.g., VAST Data, NetApp, WEKA, DDN, Lustre) into GPU-accelerated environments.


Liquid cooling, power/cooling design, data center integration, multi-site/air-gapped/regulated environments, experience optimizing existing HPC or AI platforms (performance, utilization, cost efficiency)
NVIDIA Base Command Manager (BCM), Slurm, Run:AI, Kubernetes, VAST Data, NetApp, WEKA, DDN, Lustre, HGX, DGX, Spectrum-X, Linux
NVIDIA data center platforms (HGX and DGX), GPU-accelerated compute architecture, AI and HPC workloads, high-performance networking architectures (Spectrum-X), large-scale AI factory and HPC platform design, high-performance parallel or scale-out storage systems, storage performance characteristics (bandwidth, IOPS, latency, metadata scaling), VAST Data, NetApp, WEKA, DDN, Lustre, GPU orchestration, multi-tenant AI workload optimization, NVIDIA Base Command Manager (BCM), Slurm, Run:AI, Kubernetes administration, Linux systems administration, containerized AI workflows, HPC platform optimization (performance, utilization, cost efficiency), multi-site/air-gapped/regulated environments, liquid cooling, power/cooling design, data center integration, architectural documentation, design blueprints, configuration guides, deployment validation reports, operational runbooks, reusable templates, reference architectures, standardized design patterns, One Voice standards
Senior individual contributor role, technical authority, mentoring engineers and architects, design reviews, architectural guidance, technical leadership, operating autonomously, documentation discipline (clarity, completeness, technical accuracy), culture of documentation
Industry: Consulting
Job function: Lead technical architecture for NVIDIA ecosystem HPC and AI platforms
Role subtype: Cloud Architect
Tech domains: Amazon Web Services, Google Cloud Platform, Azure, Kubernetes, Linux, VMware, AI & Machine Learning, Cloud & Infrastructure

10+ years of HPC and data center experience, with expert-level, deep architectural knowledge of NVIDIA data center platforms (HGX and DGX)
