Position Details
About this role
This role involves partnering with AI software teams and customers to enable large-scale training and inference on AMD GPUs, designing scalable Kubernetes architectures, and optimizing AI workloads.
Key Responsibilities
- Partner with AI software teams and customers to enable large-scale LLM training
- Design scalable Kubernetes architectures for AI deployment
- Validate inference frameworks such as vLLM and SGLang
- Optimize AI workloads on AMD GPUs
- Collaborate with customers on deployment and performance
Technical Overview
The technical environment includes AI infrastructure, Kubernetes, distributed training frameworks, GPU computing, and inference frameworks like vLLM and SGLang.
Ideal Candidate
The ideal candidate is a lead AI platform engineer with extensive experience in large-scale AI infrastructure, Kubernetes, and GPU-based distributed training. They should be solution-oriented, collaborative, and capable of designing scalable AI deployment architectures.
Deal Breakers
- Lack of experience with Kubernetes or AI infrastructure
- No experience with large language models
- Unwillingness to work in a hybrid environment