Position Details
About this role
This Senior Site Reliability Engineer role focuses on metrics, SLOs/SLIs, and observability to improve the reliability of CVS Health systems. The engineer defines error budgets, builds monitoring dashboards, and automates quality gates in releases.
Key Responsibilities
- Define and maintain metrics, SLOs, SLIs
- Design and implement monitoring and observability
- Develop automated quality gates in CI/CD
- Perform incident response and post-mortem analysis
- Drive continuous improvement in monitoring and reliability
Technical Overview
The stack includes Prometheus, Grafana, ELK, cloud platforms (AWS/Azure/GCP), Docker/Kubernetes, CI/CD pipelines, Datadog, AppDynamics, OpenTelemetry (OTel), UiPath, and Power BI. ITIL exposure is also listed.
Ideal Candidate
The ideal candidate is a senior SRE with 5+ years of experience designing and operating metrics-driven observability, SLOs/SLIs, and incident response. They are proficient with Prometheus, Grafana, Kubernetes, and cloud platforms (AWS/Azure/GCP), and familiar with automation and CI/CD practices.
Deal Breakers
- Fewer than 5 years of SRE/DevOps experience
- No experience with metrics, SLOs, or SLIs
- No cloud platform experience (AWS/Azure/GCP)
- No container orchestration experience (Kubernetes)