Position Details
About this role
This role is for a Principal Architect who designs and oversees HPC and AI platforms in the NVIDIA ecosystem. You will be responsible for end-to-end architecture across compute, networking, storage, orchestration, scheduling, and documentation.
Key Responsibilities
- Architect NVIDIA-based HPC and AI data center platforms (HGX/DGX)
- Design high-performance networking and storage integrations for AI/HPC
- Use BCM, Slurm, Run:AI, and Kubernetes to orchestrate workloads
- Optimize performance, utilization, and cost efficiency for HPC/AI platforms
- Create reusable architectural documentation and operational runbooks
Technical Overview
The technical scope includes deep architectural knowledge of NVIDIA HGX and DGX platforms, Spectrum-X networking, and scale-out storage integration (VAST Data, NetApp, WEKA, DDN, Lustre) for AI and HPC workloads. You will use and administer NVIDIA Base Command Manager (BCM), Slurm, Run:AI, Kubernetes, and Linux to deliver performant, reproducible, and optimized AI factory/HPC platforms.
Ideal Candidate
The ideal candidate is a principal-level architect with 10+ years of experience designing and optimizing HPC and AI data center platforms, specifically within the NVIDIA ecosystem (HGX, DGX, Spectrum-X). They have hands-on experience with NVIDIA Base Command Manager (BCM), Slurm, Run:AI, and Kubernetes administration, and with integrating scale-out storage systems (e.g., VAST Data, NetApp, WEKA, DDN, Lustre) into GPU-accelerated environments.
Deal Breakers
10+ years of HPC and data center experience; expert-level, deep architectural knowledge of NVIDIA data center platforms (HGX and DGX)