About this role
Principal Architect for HPC and AI within the NVIDIA ecosystem, responsible for end-to-end design and delivery of GPU-accelerated AI/HPC platforms across large-scale data centers and AI factories.
Key Responsibilities
- Lead end-to-end architecture of GPU-accelerated HPC and AI platforms
- Architect integrated compute, networking, and storage using NVIDIA HGX and DGX
- Design storage for AI training/inference and HPC
- Provide hands-on leadership during implementation
- Maintain high-quality architectural documentation
Technical Overview
This role architects GPU-accelerated compute on HGX/DGX, plans storage integrations (VAST, NetApp, WEKA, Lustre), orchestrates workloads with BCM, Slurm, and Run:AI, works with Kubernetes and Linux, and addresses data center cooling and power considerations.
Ideal Candidate
The ideal candidate is a senior lead or principal architect with 10+ years of HPC/AI architecture experience, expertise in the NVIDIA data center ecosystem (HGX/DGX), and a track record of designing large-scale GPU-accelerated AI/HPC platforms. They should be able to mentor engineers, author architectural documentation, and drive multi-site deployments.
Must-Have Skills
- Expert-level, deep architectural knowledge of NVIDIA data center platforms, including HGX and DGX
- GPU-accelerated compute architecture for AI and HPC workloads
- High-performance networking architectures, especially with Spectrum-X
- Large-scale AI factory and HPC platform design
- Hands-on architectural experience with high-performance parallel or scale-out storage systems
- Hands-on experience with storage platforms such as VAST Data, NetApp, WEKA, DDN, and Lustre
- NVIDIA Base Command Manager (BCM) for cluster lifecycle management
- Slurm for HPC workload scheduling
- Run:AI for GPU orchestration and multi-tenant AI workload optimization
- Kubernetes administration
- Linux systems administration
- Containerized AI workflows
- Experience optimizing HPC/AI platforms for performance, utilization, and cost
- Senior individual contributor with technical authority
- Mentoring engineers and architects
Nice-to-Have Skills
- Multi-site, air-gapped, or regulated environments
- Liquid cooling, power/cooling design, and data center integration
Tools & Platforms
NVIDIA Base Command Manager, Slurm, Run:AI, Kubernetes, Linux, VAST Data, NetApp, WEKA, DDN, Lustre
Required Skills
NVIDIA data center platforms, HGX, DGX, GPU-accelerated compute, Spectrum-X, high-performance storage, VAST Data, NetApp, WEKA, DDN, Lustre, NVIDIA Base Command Manager (BCM), Slurm, Run:AI, Kubernetes, Linux, containerized AI workflows, Grace CPU architectures, multi-site environments
Hard Skills
NVIDIA data center platforms, HGX, DGX, GPU-accelerated compute, Spectrum-X, high-performance parallel storage, VAST Data, NetApp, WEKA, DDN, Lustre, NVIDIA Base Command Manager (BCM), Slurm, Run:AI, Kubernetes, Linux, containerized AI workflows, Grace CPU architectures, power/cooling design
Soft Skills
Technical authority, mentoring, autonomy, leadership without people management, cross-functional collaboration
Keywords for Your Resume
Principal architect, HPC, AI, NVIDIA, NVIDIA data center, NVIDIA data center platforms, HGX, DGX, GPU-accelerated compute, Spectrum-X, BCM, Slurm, Run:AI, Kubernetes, Linux, Lustre, VAST Data, NetApp, WEKA, DDN, Grace CPU, air-gapped, data center architecture, GPU orchestration
Deal Breakers
- 10+ years in HPC, data center architecture, and/or systems engineering
- Experience acting as a senior technical authority
- Strong ability to mentor engineers and architects