Position Details
About this role
This senior/principal role focuses on deploying and managing AI/ML fabrics for AMD data center GPU systems. You will act as a technical interface with customers and partners, drive at-scale debug and infrastructure optimization, and benchmark machine learning applications across compute, network, and storage environments.
Key Responsibilities
- Collaborate with strategic customers on scalable compute, networking, and storage designs
- Perform system-level triage and at-scale debug across hardware/firmware/software
- Drive the ramp of large-scale, Instinct-based AI data center infrastructure
- Interface with the ROCm and DC GPU HW/FW/ASIC teams, field engineering, OEM/ODM partners, and CSPs
- Benchmark and optimize machine learning application performance across compute, network, and storage infrastructure
Technical Overview
The role emphasizes large-scale network architecture, storage environments, AI/ML network deployments, and performance tuning for Instinct-based data center infrastructure. You will lead post-rollout management, system triage, and at-scale debug across hardware, firmware, and software, coordinating with the ROCm and DC GPU engineering teams.
Ideal Candidate
The ideal candidate is a senior/principal engineer focused on AI/ML deployment engineering for data center GPU infrastructure. They have strong experience with large-scale network architecture, storage environments, and performance tuning, and can perform disciplined system triage and at-scale debug across hardware, firmware, and software while working with customers and internal engineering teams.
Deal Breakers
- Must have experience performing system triage and at-scale debug across hardware, firmware, and software
- Must have experience optimizing compute, network, and storage, and benchmarking machine learning applications