Position Details
About this role
This role involves developing high-performance GPU kernels for machine learning libraries, focusing on GEMM operations and fusions, within AMD's ROCm ecosystem.
Key Responsibilities
- Design and implement a GPU kernel generator
- Develop build and testing systems
- Collaborate with teams on ML primitives
- Optimize GPU code
- Document best practices
Technical Overview
The technical environment includes C++, GPU programming, the ROCm platform, high-performance computing, and open-source collaboration.
Ideal Candidate
The ideal candidate is a mid-level software engineer with strong expertise in C++, GPU programming, and high-performance computing. They should have experience developing and optimizing GPU kernels, particularly for machine learning libraries, and be comfortable collaborating across teams.
Deal Breakers
- Lack of C++ experience
- No GPU programming background
- No experience with high-performance kernel development
- Bachelor's degree not in a relevant field