Position Details
About this role
Senior Data Center Engineer responsible for leading large-scale GPU cluster deployments at CSP partners and customers, with a focus on systems management automation, AI/ML workloads, and GPU software stacks.
Key Responsibilities
- Develop an understanding of each client's business to assist with the bring-up of CSP and customer clusters
- Provide advisory-level technical guidance to customers on server clusters and large-scale GPU deployments
- Build out data center GPU cluster environments for testing and deployment
- Resolve hardware and software issues throughout the cluster lifecycle
- Mentor junior team members and collaborate with PMs to maintain project schedules
Technical Overview
Technical scope includes AMD Instinct GPUs, data center deployments, high-speed interconnects, Linux environments, and automation with Ansible and Python; familiarity with ROCm/CUDA and NCCL/RCCL is preferred.
Ideal Candidate
The ideal candidate is a senior data center systems engineer with deep GPU cluster bring-up experience, strong automation scripting skills, and the ability to advise CSP partners, and is proficient in GPU software stacks and high-speed networks.
Deal Breakers
- Must be able to work in the United States
- Bachelor's degree in Engineering
- Ability to travel up to 20%