AI Distributed Training & Inference Validation Engineer

at Advanced Micro Devices

📍 San Jose, California, United States
Posted March 19, 2026
Type: Not specified
Experience: Mid-level
Years of experience: Not specified
Education: Not specified
Category: AI & Machine Learning

This role involves validating AI solutions, building cluster automation for training and inference workloads, and working with the latest hardware and software technologies in AI.

  • Validate AI solutions
  • Build automation for distributed training
  • Benchmark AI workloads
  • Develop technical relationships
  • Participate in hardware bring-ups

The environment includes AI hardware, software frameworks like PyTorch and TensorFlow, container orchestration with Kubernetes, and automation scripting using Python and Golang.

The ideal candidate is a mid-level AI engineer with experience in validating complex AI infrastructure, proficient in Python and Golang, with knowledge of distributed systems, AI frameworks, and automation scripting.

  • Experience with complex compute systems used in AI
  • Experience validating AI infrastructure
  • Experience running LLM training
  • Experience with distributed systems and schedulers
  • Ability to write automation frameworks using Python or Golang

  • Experience with AMD ROCm
  • Experience with Linux and Windows operating systems
  • Experience with AI frameworks such as PyTorch, TensorFlow, and JAX

Python, Golang, Kubernetes, ROCm, RoCEv2, PyTorch, TensorFlow, JAX, NCCL, IBPerf, distributed systems, LLMs, AI infrastructure, inference workloads, training benchmarks, automation frameworks, Linux, Windows
Communication, problem-solving, collaboration, leadership
Industry: Technology
Job Function: AI infrastructure validation and automation engineering
Role Subtype: AI & Machine Learning Engineer
Tech Domains: Python, Golang, Kubernetes, ROCm, PyTorch, TensorFlow, JAX

Lack of experience with AI infrastructure, no proficiency in Python or Golang, no experience with distributed systems
