✦ Luna Orbit — AI & Machine Learning

Principal ML Engineer - Large Scale Training Performance Optimization

at Advanced Micro Devices

📍 San Jose, California, United States · Posted March 25, 2026
Type Full-Time
Experience Mid-level
Exp. Years Not specified
Education Master's degree or PhD in Computer Science, Artificial Intelligence, Machine Learning, or a related field
Category AI & Machine Learning

This role focuses on training large AI models efficiently across multiple GPUs, improving pipeline performance, and contributing to open source AI frameworks.

  • Train large AI models efficiently across multiple GPUs
  • Optimize distributed training pipelines for performance
  • Contribute to open-source AI frameworks
  • Collaborate across teams
  • Stay current with advances in training algorithms

The environment involves distributed training pipelines, ML frameworks like PyTorch, TensorFlow, JAX, GPU kernel optimization, and large-scale AI model training.
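
To illustrate the core idea behind the data-parallel training this environment involves, here is a minimal, framework-free Python sketch (all function names and data are illustrative, not from the posting): each worker computes gradients on its own shard of the batch, the gradients are averaged (conceptually an all-reduce), and every replica applies the same update.

```python
# Illustrative sketch of synchronous data-parallel SGD on a toy model
# y = w * x with squared-error loss. Real pipelines use a collective
# communication library (e.g. NCCL via PyTorch) instead of a Python sum.

def local_gradient(w, shard):
    """Mean gradient of (w*x - y)^2 over one worker's data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Stand-in for the collective that averages gradients across workers."""
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr):
    """One synchronous data-parallel SGD step over equal-sized shards."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * all_reduce_mean(grads)

if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
    shards = [data[:2], data[2:]]  # two simulated workers
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, shards, lr=0.02)
    print(round(w, 4))  # converges toward the true weight, 2.0
```

With equal-sized shards this step is numerically identical to a single-worker step on the full batch, which is the property that makes synchronous data parallelism exact rather than approximate.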

The ideal candidate is a highly skilled ML engineer with deep expertise in distributed training of large models, proficiency in frameworks such as PyTorch, TensorFlow, or JAX, and experience with GPU optimization and large-scale AI research.

Distributed training pipelines, ML frameworks (PyTorch, JAX, TensorFlow), distributed training algorithms, Python or C++ programming, experience with large models, GPU optimization
LLMs, computer vision, GPU kernel optimization, open-source contributions
PyTorch, JAX, TensorFlow, Megatron-LM, MaxText, TorchTitan
Distributed training pipelines, data parallel, tensor parallel, pipeline parallel, expert parallel, ZeRO, PyTorch, JAX, TensorFlow, large models, GPU, GPU kernel optimization, Python, C++, ML frameworks, training algorithms, large language models, computer vision
Communication, problem-solving, collaboration, analytical thinking, teamwork
Industry Technology
Job Function Developing and optimizing large-scale AI training pipelines
Role Subtype Machine Learning Engineer
Tech Domains Python, C++, ML frameworks, TensorFlow, PyTorch, JAX, GPU, Distributed systems

  • Lack of experience with distributed training pipelines
  • No knowledge of ML frameworks (PyTorch, TensorFlow, JAX)
  • No experience with large models or GPU optimization
  • Unable to work in the specified location

Apply for this Position →
