
Fellow GPU Performance Optimization Engineer

at Advanced Micro Devices

📍 San Jose, California, United States
Hybrid
Posted March 28, 2026
Type Full-Time
Experience Mid
Exp. Years Not specified
Education Not specified
Category AI & Machine Learning

This role focuses on optimizing the performance and efficiency of large-scale AI training workloads on AMD GPU platforms. The engineer will drive innovations across the software-hardware stack, improving system throughput and scalability.

  • Lead performance optimization of AI workloads
  • Identify and eliminate system bottlenecks
  • Optimize distributed training strategies
  • Drive cross-stack optimizations
  • Collaborate across teams to influence hardware and software design

The technical environment involves GPU architecture, distributed training strategies, performance profiling, benchmarking, and modeling, with a focus on generative AI workloads on AMD hardware.

The ideal candidate is a senior GPU performance engineer with extensive experience in optimizing large-scale AI training workloads on GPU platforms. They possess deep expertise in GPU architecture, performance analysis, and distributed systems, and are capable of influencing hardware and software design.

GPU performance analysis, distributed systems, large-scale AI training, GPU architecture, performance profiling, performance modeling
open-source contributions, software optimization, system bottleneck analysis, generative AI workloads, communication patterns
AMD GPU platforms, profiling tools, benchmarking tools, performance modeling tools
GPU performance analysis, distributed systems, ML workloads, GPU architecture, performance profiling, benchmarking, performance modeling
GPU performance analysis, distributed systems, ML workloads, GPU architecture, interconnects, memory hierarchies, communication patterns, performance profiling, benchmarking, performance modeling, software-hardware stack, large-scale AI training, distributed training strategies, kernels, runtimes, frameworks, communication libraries, ML frameworks
leadership, collaboration, problem-solving, influence, communication, innovative thinking
Industry Semiconductors & Hardware
Job Function GPU performance optimization for AI training workloads
Role Subtype GPU Performance Optimization Engineer
Tech Domains Active Directory, Microsoft 365, Azure, Amazon Web Services, Linux
GPU performance analysis, distributed systems, AI training workloads, GPU architecture, performance profiling, performance modeling, large-scale AI, AMD GPU, software-hardware stack, generative AI, performance optimization, ML workloads, communication patterns, kernels, runtimes, frameworks, AI training

  • Lack of experience in GPU performance analysis
  • No background in distributed systems
  • No experience with AI training workloads
  • Lack of leadership or influence skills
