
Sr. Systems Design Engineer

at Advanced Micro Devices

📍 San Jose, California, United States · Hybrid · Posted April 02, 2026
Type: Full-Time
Experience: Senior
Years of Experience: 3+
Education: Master's or PhD degree in electrical or computer engineering
Category: AI & Machine Learning

AMD is seeking a Senior ML Systems Engineer to develop and optimize ML operator kernels and dataflow pipelines for the NPU platform, with full-stack responsibilities from kernel implementation to hardware integration.

  • Drive technical innovation in NPU kernel and dataflow development
  • Debug silicon bring-up and production issues
  • Coordinate with compiler, runtime, and hardware teams
  • Build performance models
  • Mentor engineers

The role spans kernel development for ML workloads across the ROCm stack, ONNX Runtime integration, and performance optimization on AMD GPUs and NPUs, including quantization and low-level hardware interaction.

The ideal candidate is a senior ML/AI systems engineer with a strong background in GPU/accelerator kernel development, expertise in the ROCm stack, and hands-on experience with ONNX Runtime. They should excel at cross-functional debugging across frameworks, runtimes, and hardware, and be capable of guiding large-scale AI workloads.

C/C++ · Python · Deep hands-on experience with the ROCm software stack · Linux-based GPU systems · NPU/GPU/accelerator kernel development or SDK integration · Quantization techniques (INT8/FP8)
Exposure to MLIR/LLVM compiler infrastructure · Publications or patents in ML systems, compilers, or computer architecture · Experience with NVIDIA/AMD GPU platforms · Experience deploying large-scale AI training/inference clusters · Familiarity with AI workload characteristics (LLMs, distributed training)
Git · Debuggers · Profilers · Linux development environments
C/C++ · Python · ROCm stack · ONNX Runtime · ML operator kernels · Dataflow libraries · HIP runtime · MLIR/LLVM · Linux · GPU kernel development · Quantization · Systolic arrays · DMA
C/C++ · Python · Multithreading · ONNX Runtime · NPU kernel development · ML operator kernels · GEMM · Conv · Softmax · Attention · Git · Debuggers · Profilers · Linux development environments · MLIR/LLVM · Quantization (INT8/FP8) · Systolic arrays · Dataflow accelerators · DMA
Strong written communication · Cross-functional collaboration · Analytical thinking · Problem-solving · Leadership
Industry: Semiconductors
Job Function: Lead end-to-end NPU kernel and dataflow development for AI workloads on AMD hardware
Role Subtype: ML Systems Engineer
Tech Domains: Linux, Python, Java, JavaScript, SQL / PostgreSQL
NPU · ML operator kernels · Dataflow libraries · ONNX Runtime · ROCm · HIP runtime · ROCm stack · MLIR · LLVM · C++ · Python · Linux · GPU · Kernel development · Dataflow · DMA · Systolic arrays · Quantization (INT8/FP8) · FP16 · BF16 · Multi-node · Multi-rack

Must have a Master's or PhD in Electrical/Computer Engineering or a related field.
