Position Details
About this role
This role focuses on optimizing generative AI inference workloads on AMD GPU platforms, reducing latency and improving throughput and scalability for large-scale models.
Key Responsibilities
- Optimize inference performance on AMD GPUs
- Improve latency and throughput
- Implement techniques like batching and quantization
- Analyze bottlenecks across system layers
- Collaborate with hardware and software teams
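One of the techniques above, quantization, can be illustrated with a minimal sketch. The example below shows per-tensor symmetric int8 quantization in plain Python; it is illustrative only and not specific to AMD GPUs or any particular inference framework, and the function names are hypothetical.

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Illustrative only; names and approach are not from any specific framework.

def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0  # largest magnitude maps to 127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Each reconstructed value differs from the original by at most half a quantization step, which is why int8 inference can preserve accuracy while quartering weight memory relative to float32.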
Technical Overview
The technical environment includes AMD GPU hardware and inference frameworks such as Triton and vLLM, with an emphasis on performance tuning, cross-layer optimization, and distributed inference systems.
Ideal Candidate
The ideal candidate is a mid-level AI engineer with expertise in GenAI inference optimization, GPU performance, and large-scale model deployment. They should be skilled in cross-layer optimization and familiar with inference frameworks.
Deal Breakers
- Lack of experience with GPU inference optimization
- No familiarity with distributed inference systems
- No knowledge of GPU architecture