Senior Site Reliability Engineer - Metrics and Observability

at CVS Health

Posted April 02, 2026
Salary $83K – $203K USD / year
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education Bachelor's degree or equivalent experience
Category DevOps & SRE

Senior Site Reliability Engineer focusing on metrics, SLOs/SLIs, and observability to improve reliability of CVS Health systems; defines error budgets, builds monitoring dashboards, and automates quality gates in releases.

  • Define and maintain metrics, SLOs, SLIs
  • Design and implement monitoring and observability
  • Develop automated quality gates in CI/CD
  • Incident response and post-mortem analysis
  • Drive continuous improvement in monitoring and reliability

Stack includes Prometheus, Grafana, ELK, cloud platforms (AWS/Azure/GCP), Docker/Kubernetes, CI/CD pipelines, DataDog, AppDynamics, OTEL, UiPath, Power BI; ITIL exposure.

The ideal candidate is a senior SRE with 5+ years designing and operating metrics-driven observability, SLOs/SLIs, and incident response; proficient with Prometheus, Grafana, Kubernetes, and cloud platforms (AWS/Azure/GCP); familiar with automation and CI/CD practices.

SRE/DevOps experience, defining and implementing metrics, SLOs, and SLIs, monitoring/observability tools (Prometheus, Grafana, ELK stack), cloud platforms (AWS, Azure, GCP), container orchestration (Docker, Kubernetes)
ITIL familiarity, incident management frameworks, automation scripting (UiPath, Power BI)
Prometheus, Grafana, ELK stack, Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, Docker, Kubernetes, BigQuery, SQL Server, DataDog, AppDynamics, OTEL, UiPath, Power BI
Site Reliability Engineering, SRE, DevOps, metrics, SLOs, SLIs, Prometheus, Grafana, ELK stack, Amazon Web Services (AWS), Google Cloud Platform, Kubernetes, Docker, CI/CD, BigQuery, SQL Server, DataDog, AppDynamics, OTEL, UiPath, Power BI, ITIL, incident management
analytical thinking, communication, stakeholder communication, problem solving, collaboration, continuous improvement
Industry Healthcare & Medical
Job Function Ensure reliability and performance of CVS Health digital systems through metrics-driven observability and automation
Role Subtype Site Reliability Engineer
Tech Domains Amazon Web Services, Google Cloud Platform, Microsoft Azure, Kubernetes, Docker, Prometheus, Grafana, ELK stack, BigQuery, SQL Server

Lack of 5+ years of SRE/DevOps experience, no experience with metrics/SLOs/SLIs, no cloud platform experience (AWS/Azure/GCP), no container orchestration experience (Kubernetes)
