Position Details
About this role
Support and maintain AI infrastructure, including GPU and distributed systems, to ensure high performance and reliability in a cloud environment.
Key Responsibilities
- Maintain AI infrastructure
- Troubleshoot issues
- Support GPU and distributed systems
- Collaborate with teams
- Ensure system reliability
Technical Overview
Supports AI frameworks such as PyTorch, TensorFlow, and JAX on cloud platforms such as AWS, GCP, and Azure, and requires expertise in networking and system troubleshooting.
Ideal Candidate
The ideal candidate is a mid-level AI infrastructure support engineer with 2+ years of experience managing GPU and distributed systems, proficiency in cloud platforms such as AWS, GCP, and Azure, and strong troubleshooting skills.
Must-Have Skills
Nice-to-Have Skills
Tools & Platforms
Required Skills
Hard Skills
Soft Skills
Industry & Role
Clearance & Visa
Deal Breakers
- Lack of experience with GPU or AI infrastructure
- No cloud platform knowledge
- Inability to work on-site in New York, NY