Position Details
About this role
This role involves building and maintaining large-scale compute and storage infrastructure to support Cursor’s AI and coding models, working closely with ML researchers and engineers to optimize training systems and hardware utilization.
Key Responsibilities
- Improve training throughput
- Build GPU infrastructure
- Collaborate with ML teams
- Automate GPU cluster management
- Enhance system reliability
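The throughput and utilization work above is typically measured with a metric such as model FLOPs utilization (MFU): achieved training FLOPs per second divided by the hardware's theoretical peak. A minimal sketch, assuming transformer-style training (the ~6 × parameters FLOPs-per-token estimate) and illustrative model and hardware numbers that are not from this posting:

```python
def mfu(params: float, tokens_per_sec: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved training FLOPs/s over hardware peak.

    Uses the common ~6 * params FLOPs-per-token approximation for
    transformer training (forward + backward pass combined).
    """
    achieved = 6 * params * tokens_per_sec
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak

# Illustrative (assumed) numbers: a 7B-parameter model on 64 GPUs with
# ~989 TFLOP/s peak BF16 each, training at 250k tokens/s aggregate.
print(f"MFU: {mfu(7e9, 250_000, 64, 989e12):.1%}")
```

Tracking a number like this before and after an infrastructure change is one concrete way "improve training throughput" gets evaluated.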
Technical Overview
The role focuses on developing high-performance infrastructure, including GPU clusters, distributed storage, and networking, using tools such as Kubernetes and Slurm and infrastructure-as-code practices across Linux environments.
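Automating GPU cluster management often starts with small health-check tooling around `nvidia-smi`. A hedged sketch of the idea, parsing the CSV output of `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits` (the threshold and the sample data below are illustrative, not from the posting):

```python
import csv
import io

def idle_gpus(nvidia_smi_csv: str, util_threshold: int = 5) -> list[int]:
    """Return indices of GPUs at or below the utilization threshold.

    Expects the output of:
      nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
    i.e. one "index, utilization" pair per line, no units.
    """
    idle = []
    for row in csv.reader(io.StringIO(nvidia_smi_csv)):
        index, util = int(row[0]), int(row[1])
        if util <= util_threshold:
            idle.append(index)
    return idle

# Hardcoded sample from a hypothetical 4-GPU node, for illustration.
sample = "0, 98\n1, 0\n2, 87\n3, 3\n"
print(idle_gpus(sample))  # GPUs 1 and 3 look idle
```

In practice a check like this would feed a scheduler or alerting system rather than print to stdout, but the shape of the automation is the same.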
Ideal Candidate
The ideal candidate is a systems engineer with experience building large-scale compute and storage infrastructure, proficient in Python, TypeScript, Rust, and Go. They have hands-on experience with distributed storage, networking, and GPU infrastructure, and can operate in Linux and cloud environments, preferably with Kubernetes and Slurm expertise.
Deal Breakers
- Lack of experience with large-scale systems
- No experience with NVIDIA GPUs or infrastructure-as-code
- Unfamiliarity with Linux or Kubernetes