Position Details
About this role
This role involves managing the compute platform for AI infrastructure, focusing on scheduling, orchestration, and resource management to optimize cluster utilization and support diverse workloads.
Key Responsibilities
- Define and own compute scheduling strategy
- Partner with infrastructure teams
- Optimize resource allocation
- Develop observability tools
- Drive platform evolution
Technical Overview
The technical environment includes compute clusters and scheduling and orchestration tools, with a focus on GPU and accelerator resource management for large-scale AI training and inference.
Ideal Candidate
The ideal candidate is an experienced product manager with a background in compute infrastructure and orchestration systems, capable of leading the development of scheduling and capacity management tools for large-scale AI compute platforms.
Deal Breakers
- Lack of experience with compute infrastructure
- No background in scheduling/orchestration
- No experience with GPU or accelerator clusters