✦ Luna Orbit — DevOps & SRE

Lead Director, Software Development Engineering - SRE, Retail Pharmacy

at CVS Health

Location Remote
Salary $144K – $288K USD / year
Posted April 14, 2026
Type Not specified
Experience Lead
Exp. Years Not specified
Education Not specified
Category DevOps & SRE

Lead the SRE organization responsible for reliability, availability, observability, and scalability of CVS Health retail pharmacy distributed store technology. The role drives incident response, compliance, continuous improvement, and proactive monitoring/automation to improve store health and operational excellence.

  • Lead a global team of technical professionals
  • Align SRE strategies with enterprise goals for resilient technology
  • Execute a multi-year roadmap for observability, automation, and reliability improvements
  • Define and maintain SLIs, SLOs, and business KPIs and deliver KPI reporting
  • Lead major incident management with rapid detection, root-cause analysis, and resolution

Own SRE strategy and execution for distributed store environments spanning pharmacy platforms, Point of Sale (POS) systems, handheld devices, store servers, dispensing locations, and connectivity infrastructure. Define SLIs and SLOs, build dashboards and alerting for real-time visibility, and lead multi-year modernization of monitoring and automation, including cloud, edge, and AI-driven monitoring approaches.

The ideal candidate is a lead-level Site Reliability Engineer (SRE) who has delivered reliability and observability improvements for distributed environments, including incident response and root-cause analysis. They are experienced in defining SLIs and SLOs, building dashboards and alerting for real-time visibility, and driving automation to reduce toil and enable self-healing capabilities.

Site Reliability Engineering (SRE), incident response, root-cause analysis, Service Level Indicators (SLIs), Service Level Objectives (SLOs), monitoring, automation, observability, distributed store technology reliability, leading major incident management
Point of Sale (POS)
Site Reliability Engineering (SRE), reliability, availability, observability, scalability, distributed store technology, pharmacy platforms, Point of Sale (POS) systems, incident response, rapid incident response, root-cause analysis, compliance, continuous improvement, Service Level Indicators (SLIs), Service Level Objectives (SLOs), dashboards, visualizations, alerting systems, real-time insights, edge nodes, monitoring, automation, process improvements, business KPIs, operational reporting, strategic communication, platform engineering, self-healing capabilities, cloud monitoring, edge monitoring, AI-driven monitoring solutions, stakeholder coordination, operational excellence, modernize monitoring and automation capabilities, integration with architects, integration with product engineering, integration with infrastructure teams, software lifecycle reliability practices
leadership, mentorship, guidance, professional growth support, strategic planning, cross-functional collaboration, stakeholder management, communication, organizational alignment, thought leadership
Industry Healthcare IT
Job Function Lead SRE engineering to ensure reliable, observable, and scalable retail pharmacy store technology and drive automation-driven operational excellence.
Role Subtype Site Reliability Engineer
Tech Domains DevOps & SRE
Lead Director Engineering, SRE, Site Reliability Engineering (SRE), reliability, availability, observability, scalability, incident response, root-cause analysis, Service Level Indicators (SLIs), Service Level Objectives (SLOs), business KPIs, dashboards, visualizations, alerting systems, automation, observability roadmap, monitoring and automation, edge nodes, real-time visibility, self-healing capabilities, cloud monitoring, edge monitoring, AI-driven monitoring, distributed store environments, major incident management

  • Must have experience leading major incident management and incident response
  • Must have experience defining and managing Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
