Position Details
About this role
This role focuses on owning and improving critical production systems through code, monitoring, and incident management to keep the platform reliable at scale.
Key Responsibilities
- Own production services
- Build observability frameworks
- Lead incident response
- Define SLOs
- Improve deployment and automation
Technical Overview
This is a hands-on SRE role involving code development, observability, incident response, capacity planning, and security hardening in a cloud environment.
Ideal Candidate
The ideal candidate is a mid-level SRE with experience owning critical production systems, writing high-quality code, and implementing observability and incident response frameworks. They are a proactive problem-solver with strong collaboration skills.
Deal Breakers
- Lack of experience with production systems
- No experience in incident response or observability
- Unable to work on-site in NYC or SF