
Senior Software Engineer - AI Triton Communication

at Advanced Micro Devices

📍 San Jose, California, United States · Hybrid · Posted March 14, 2026
Type: Full-Time
Experience: Senior
Years of Experience: Not specified
Education: Not specified
Category: AI & Machine Learning

This role involves advancing the Triton compiler and runtime for AMD GPUs, focusing on distributed execution, communication, and performance optimization for AI workloads.

  • Develop the AMD GPU backend for Triton
  • Build distributed communication capabilities
  • Optimize kernel performance
  • Collaborate with hardware teams
  • Enhance compiler infrastructure

The technical environment includes GPU architecture, compiler infrastructure such as MLIR and LLVM, AMD Instinct accelerators, and performance tuning tools, all within the Triton ecosystem for AMD GPUs.

The ideal candidate is a senior AI/ML software engineer with deep expertise in GPU architecture, compiler technologies, and distributed systems. They should have experience optimizing GPU kernels, working with AMD Instinct accelerators, and developing scalable AI training and inference solutions.

GPU architecture, Compiler technologies, Distributed GPU systems, C, C++, Python, Experience with GPU runtime or communication stack
MLIR, LLVM, Kernel optimization, Hardware utilization, AMD Instinct GPUs
ROCm, LLVM, MLIR, Git, Linux
GPU Architecture, Compiler Technologies, Compiler Backend, Distributed GPU Systems, GPU Runtime, Communication Stack, MLIR, LLVM, Kernel Optimization, Hardware Utilization, AMD Instinct Accelerators, Performance Engineering, C, C++, Python, GPU Communication, Distributed Execution
Problem-Solving, Collaboration, Technical Communication, Analytical Thinking, Innovation
Industry: Semiconductors & Electronics
Job Function: Enhancing the Triton compiler and runtime for AMD GPUs in AI applications
Role Subtype: AI & Machine Learning
Tech Domains: Linux, LLVM, MLIR, GPU, Compiler Technologies, Performance Engineering
GPU architecture, Compiler technologies, Distributed GPU systems, GPU runtime, Communication stack, MLIR, LLVM, Kernel optimization, Hardware utilization, AMD Instinct, Performance engineering, C, C++, Python, GPU communication, Distributed execution, AI frameworks, PyTorch, Large-scale AI, Distributed training

Potential disqualifiers:

  • Lack of experience with GPU compilers or runtimes
  • No background in GPU architecture or performance engineering
  • Inability to work in a hybrid environment
  • No experience with AMD GPUs
