✦ Luna Orbit — AI & Machine Learning

Principal GenAI Inference Optimization Engineer

at Advanced Micro Devices

📍 San Jose, California, United States (Hybrid)
Posted March 28, 2026
Type Full-Time
Experience Mid-level
Exp. Years Not specified
Education Not specified
Category AI & Machine Learning

This role focuses on optimizing generative AI inference workloads on AMD GPU platforms, improving performance, latency, and scalability for large-scale models.

  • Optimize inference performance on AMD GPUs
  • Improve latency and throughput
  • Implement techniques like batching and quantization
  • Analyze bottlenecks across system layers
  • Collaborate with hardware and software teams
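The quantization technique named in the responsibilities above can be illustrated with a minimal sketch. This is a symmetric per-tensor int8 quantizer in plain NumPy, purely for illustration; it is not AMD's implementation nor any framework's API:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Storing weights as int8 instead of fp32 cuts memory footprint and bandwidth by roughly 4x, which is why quantization is a standard lever for inference latency and throughput; production systems typically use finer-grained (per-channel or per-group) scales to limit accuracy loss.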

The technical environment includes AMD GPU hardware and inference frameworks such as Triton and vLLM, with an emphasis on performance tuning, cross-layer optimization, and distributed inference systems.
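Frameworks like vLLM raise throughput in part through continuous batching: finished sequences leave the GPU batch immediately so queued requests can take their slots, rather than waiting for the whole batch to drain. A toy scheduler sketching the idea (all names are illustrative; this is not vLLM's or Triton's actual API):

```python
from collections import deque

class ToyScheduler:
    """Continuous-batching sketch: admit requests whenever a slot frees up."""

    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.queue = deque()   # waiting (request_id, tokens_to_generate)
        self.running = {}      # request_id -> tokens still to generate

    def submit(self, request_id, tokens_to_generate):
        self.queue.append((request_id, tokens_to_generate))

    def step(self):
        # Admit queued requests while batch slots are free.
        while self.queue and len(self.running) < self.max_batch:
            rid, n = self.queue.popleft()
            self.running[rid] = n
        # One "decode step": every running request emits one token.
        finished = []
        for rid in list(self.running):
            self.running[rid] -= 1
            if self.running[rid] == 0:
                finished.append(rid)
                del self.running[rid]
        return finished

sched = ToyScheduler(max_batch=2)
sched.submit("a", 1)
sched.submit("b", 3)
sched.submit("c", 2)

order, steps = [], 0
while sched.queue or sched.running:
    order.extend(sched.step())
    steps += 1
```

Because "a" finishes after one step, "c" is admitted on the very next step instead of waiting for "b" to complete, so all three requests drain in three decode steps rather than four with static batching.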

The ideal candidate is a mid-level AI engineer with expertise in GenAI inference optimization, GPU performance, and large-scale model deployment. They should be skilled in cross-layer optimization and familiar with inference frameworks.

Expertise in GenAI inference optimization; GPU architecture and performance; memory systems; work across multiple layers (kernels, runtimes, frameworks); optimization techniques (batching, quantization, caching)
Experience with inference frameworks (e.g., Triton, vLLM); knowledge of distributed systems; profiling and benchmarking
Triton, vLLM, SGLang, AMD GPU platforms
GenAI, inference optimization, GPU performance, large-scale models, latency reduction, throughput, quantization, caching, profiling tools, distributed systems
GenAI, inference optimization, GPU performance, large-scale serving systems, latency reduction, throughput optimization, quantization, caching, profiling tools, distributed environments
Collaboration, analytical skills, problem-solving, cross-functional teamwork
Industry Technology / Semiconductors / Computing
Job Function Enhancing AI inference performance and scalability on GPU platforms
Role Subtype AI Engineer
Tech Domains AMD GPU, Inference optimization, Distributed systems
GenAI, inference optimization, GPU performance, large-scale models, latency reduction, throughput, quantization, caching, profiling tools, distributed systems, AMD GPU, inference workloads, performance analysis, scalability, optimization techniques, cross-stack optimization, distributed environments

  • Lack of experience with GPU inference optimization
  • No familiarity with distributed inference systems
  • No knowledge of GPU architecture
