Position Details
About this role
This role focuses on optimizing the performance and efficiency of large-scale AI training workloads on AMD GPU platforms. The engineer will drive innovations across the software-hardware stack, improving system throughput and scalability.
Key Responsibilities
- Lead performance optimization of AI workloads
- Identify and eliminate system bottlenecks
- Optimize distributed training strategies
- Drive cross-stack optimizations
- Collaborate with hardware and software teams to influence design
Technical Overview
The work spans GPU architecture, distributed training strategies, and performance profiling, benchmarking, and modeling, with a focus on generative AI workloads on AMD hardware.
Ideal Candidate
The ideal candidate is a senior GPU performance engineer with extensive experience in optimizing large-scale AI training workloads on GPU platforms. They possess deep expertise in GPU architecture, performance analysis, and distributed systems, and are capable of influencing hardware and software design.
Deal Breakers
- Lack of experience in GPU performance analysis
- No background in distributed systems
- No experience with AI training workloads
- Lack of leadership or influence skills