✦ Luna Orbit — AI & Machine Learning

Principal GenAI Inference Optimization Engineer

at Advanced Micro Devices

📍 San Jose, California, United States (Hybrid)
Posted March 28, 2026
Type Full-Time
Experience Mid-level
Exp. Years Not specified
Education Not specified
Category AI & Machine Learning

This role focuses on optimizing generative AI inference workloads on AMD GPU platforms, improving performance, latency, and scalability for large-scale models.

  • Optimize inference performance on AMD GPUs
  • Improve latency and throughput
  • Implement techniques like batching and quantization
  • Analyze bottlenecks across system layers
  • Collaborate with hardware and software teams
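The quantization technique named in the responsibilities above can be illustrated with a minimal sketch. This is a symmetric per-tensor int8 quantizer in plain NumPy, purely for illustration; it is not AMD's implementation nor any framework's API:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Storing weights as int8 instead of fp32 cuts memory footprint and bandwidth by roughly 4x, which is why quantization is a standard lever for inference latency and throughput; production systems typically use finer-grained (per-channel or per-group) scales to limit accuracy loss.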

The technical environment includes AMD GPU hardware and inference frameworks such as Triton and vLLM, with an emphasis on performance tuning, cross-layer optimization, and distributed inference systems.
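Frameworks like vLLM raise throughput in part through continuous batching: finished sequences leave the GPU batch immediately so queued requests can take their slots, rather than waiting for the whole batch to drain. A toy scheduler sketching the idea (all names are illustrative; this is not vLLM's or Triton's actual API):

```python
from collections import deque

class ToyScheduler:
    """Continuous-batching sketch: admit requests whenever a slot frees up."""

    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.queue = deque()   # waiting (request_id, tokens_to_generate)
        self.running = {}      # request_id -> tokens still to generate

    def submit(self, request_id, tokens_to_generate):
        self.queue.append((request_id, tokens_to_generate))

    def step(self):
        # Admit queued requests while batch slots are free.
        while self.queue and len(self.running) < self.max_batch:
            rid, n = self.queue.popleft()
            self.running[rid] = n
        # One "decode step": every running request emits one token.
        finished = []
        for rid in list(self.running):
            self.running[rid] -= 1
            if self.running[rid] == 0:
                finished.append(rid)
                del self.running[rid]
        return finished

sched = ToyScheduler(max_batch=2)
sched.submit("a", 1)
sched.submit("b", 3)
sched.submit("c", 2)

order, steps = [], 0
while sched.queue or sched.running:
    order.extend(sched.step())
    steps += 1
```

Because "a" finishes after one step, "c" is admitted on the very next step instead of waiting for "b" to complete, so all three requests drain in three decode steps rather than four with static batching.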

The ideal candidate is a mid-level AI engineer with expertise in GenAI inference optimization, GPU performance, and large-scale model deployment. They should be skilled in cross-layer optimization and familiar with inference frameworks.

Expertise in GenAI inference optimization; GPU architecture and performance; memory systems; work across multiple layers (kernels, runtimes, frameworks); optimization techniques (batching, quantization, caching)
Experience with inference frameworks (e.g., Triton, vLLM); knowledge of distributed systems; profiling and benchmarking
Triton, vLLM, SGLang, AMD GPU platforms
GenAI, inference optimization, GPU performance, large-scale models, latency reduction, throughput, quantization, caching, profiling tools, distributed systems
GenAI, inference optimization, GPU performance, large-scale serving systems, latency reduction, throughput optimization, quantization, caching, profiling tools, distributed environments
Collaboration, analytical skills, problem-solving, cross-functional teamwork
Industry Technology / Semiconductors / Computing
Job Function Enhancing AI inference performance and scalability on GPU platforms
Role Subtype AI Engineer
Tech Domains AMD GPU, Inference optimization, Distributed systems
GenAI, inference optimization, GPU performance, large-scale models, latency reduction, throughput, quantization, caching, profiling tools, distributed systems, AMD GPU, inference workloads, performance analysis, scalability, optimization techniques, cross-stack optimization, distributed environments

  • Lack of experience with GPU inference optimization
  • No familiarity with distributed inference systems
  • No knowledge of GPU architecture
