Position Details
About this role
This role is for a senior AI-focused Site Reliability Engineer within Ai.X. The position centers on applying SRE practices to run and maintain reliable AI-related services.
Key Responsibilities
- Own service reliability for AI workloads
- Drive incident management and recovery improvements
- Implement monitoring and observability practices
- Automate operational workflows and reliability checks
- Partner with engineering teams to improve scalability and uptime
Technical Overview
The technical scope is SRE for AI systems, focusing on reliability, monitoring/observability, and automation. The posting does not specify cloud platforms, programming languages, or tooling.
Ideal Candidate
The ideal candidate is a senior Site Reliability Engineer with experience supporting reliable production services in an AI-focused environment. They should be strong in incident management, monitoring/observability, and automation practices aligned with SRE and DevOps principles.
Deal Breakers
- Must have Site Reliability Engineering (SRE) experience
- Must have AI experience