Position Details
About this role
This role involves validating AI solutions, building automation for distributed training and inference workloads, and ensuring system performance in AI clusters. The engineer will work with the latest hardware and software technologies.
Key Responsibilities
- Validate AI solutions
- Build cluster automation
- Reproduce and prevent defects
- Develop testing tools
- Collaborate on hardware/software design
Technical Overview
The technical environment covers AI infrastructure validation, cluster automation, performance profiling, and benchmarking for large-scale AI training and inference, using tools such as ROCm, Docker, Kubernetes, SLURM, and LLVM.
Ideal Candidate
The ideal candidate is an experienced AI validation engineer with strong skills in software automation, system validation, and infrastructure for AI workloads. They are proficient in scripting, containerization, and performance profiling, with a focus on large-scale distributed AI systems.
Deal Breakers
- Lack of experience with AI infrastructure validation
- No scripting or automation skills
- No experience with distributed training or HPC systems