Position Details

Salary $83K – $203K USD / year

Type Full-Time

Experience Senior

Exp. Years 5+ years

Education Bachelor's degree or equivalent experience

Category DevOps & SRE

About this role

This Senior Site Reliability Engineer role focuses on metrics, SLOs/SLIs, and observability to improve the reliability of CVS Health systems. The engineer defines error budgets, builds monitoring dashboards, and automates quality gates in release pipelines.

Key Responsibilities

Define and maintain metrics, SLOs, and SLIs
Design and implement monitoring and observability
Develop automated quality gates in CI/CD pipelines
Perform incident response and post-mortem analysis
Drive continuous improvement in monitoring and reliability

Technical Overview

The stack includes Prometheus, Grafana, the ELK stack, cloud platforms (AWS/Azure/GCP), Docker/Kubernetes, CI/CD pipelines, Datadog, AppDynamics, OpenTelemetry (OTel), UiPath, and Power BI, along with ITIL exposure.

Ideal Candidate

The ideal candidate is a senior SRE with 5+ years of experience designing and operating metrics-driven observability, SLOs/SLIs, and incident response. They are proficient with Prometheus, Grafana, Kubernetes, and cloud platforms (AWS/Azure/GCP) and familiar with automation and CI/CD practices.

Must-Have Skills

SRE/DevOps experience
Defining and implementing metrics, SLOs, and SLIs
Monitoring/observability tools (Prometheus, Grafana, ELK stack)
Cloud platforms (AWS, Azure, GCP)
Container orchestration (Docker, Kubernetes)

Nice-to-Have Skills

ITIL familiarity
Incident management frameworks
Automation scripting (UiPath, Power BI)

Tools & Platforms

Prometheus, Grafana, ELK stack, AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform, Docker, Kubernetes, BigQuery, SQL Server, Datadog, AppDynamics, OpenTelemetry (OTel), UiPath, Power BI

Required Skills

Site Reliability Engineering (SRE), DevOps, metrics, SLOs, SLIs, Prometheus, Grafana, ELK stack, AWS (Amazon Web Services), Google Cloud Platform, Kubernetes, Docker, CI/CD, BigQuery, SQL Server, Datadog, AppDynamics, OpenTelemetry (OTel), UiPath, Power BI, ITIL, incident management

Hard Skills

Prometheus, Grafana, ELK stack, AWS (Amazon Web Services), Azure, Google Cloud Platform, Kubernetes, Docker, CI/CD, BigQuery, SQL Server, Datadog, AppDynamics, OpenTelemetry (OTel), UiPath, Power BI, ITIL, incident management

Soft Skills

Analytical thinking, communication, stakeholder communication, problem solving, collaboration, continuous improvement

Industry & Role

Industry Healthcare & Medical

Job Function Ensure reliability and performance of CVS Health digital systems through metrics-driven observability and automation

Role Subtype Site Reliability Engineer

Tech Domains Amazon Web Services, Google Cloud Platform, Microsoft Azure, Kubernetes, Docker, Prometheus, Grafana, ELK stack, BigQuery, SQL Server

Keywords for Your Resume

Senior Site Reliability Engineer, SRE, metrics, SLOs, SLIs, error budgets, monitoring, observability, Prometheus, Grafana, ELK stack, AWS (Amazon Web Services), Google Cloud Platform, Kubernetes, Docker, CI/CD, BigQuery, SQL Server, Datadog, AppDynamics, OpenTelemetry (OTel), UiPath, Power BI, ITIL, incident management

Deal Breakers

Less than 5 years of SRE/DevOps experience; no experience with metrics, SLOs, or SLIs; no cloud platform experience (AWS/Azure/GCP); no container orchestration experience (Kubernetes)

Senior Site Reliability Engineer - Metrics and Observability