About this role
Lead a remote SRE team responsible for the reliability of customer-facing platforms. Own SLOs/SLIs and error budgets, drive incident and change management best practices, improve observability, and champion automation to reduce toil and increase resilience.
Key Responsibilities
- Lead and grow the team (hire, coach, blameless culture)
- Own reliability strategy (SLOs/SLIs, error budgets, guardrails)
- Operate platform reliability (availability, latency, capacity, change management)
- Run on-call and escalation (SEV1/2) with blameless postmortems
- Drive observability and automation (reduce alert noise; IaC, CI/CD, chaos/load testing)
Technical Overview
Operate and improve availability, latency, capacity planning, and change management across AWS/Azure/GCP and Kubernetes. Standardize observability across logs, metrics, and traces, and reduce alert noise, using tools such as Datadog, Dynatrace, Prometheus, Grafana, and New Relic. Implement Infrastructure as Code and CI/CD.
Ideal Candidate
The ideal candidate is an SRE/DevOps leader with 8+ years of platform or reliability engineering experience, including 2–4 years leading SRE or DevOps/Platform teams. They have hands-on expertise operating large-scale services on AWS/Azure/GCP with Kubernetes, strong Linux and distributed systems fundamentals, and a track record of owning on-call programs, SLOs/SLIs, error budgets, and observability improvements.
Must-Have Skills
- 8+ years in software/platform/reliability engineering
- 2–4 years leading SRE/DevOps/Platform teams
- Operating large-scale services on AWS/Azure/GCP with Kubernetes and containers
- Linux, networking, and distributed systems
- IaC (Terraform, CloudFormation, Bicep)
- CI/CD (GitHub Actions, CircleCI, Azure DevOps)
- One scripting language (Python/Go/Bash)
- Observability (metrics, logs, traces) and alerting best practices
- On-call programs and blameless postmortems
Tools & Platforms
AWS, Amazon Web Services, Azure, Google Cloud Platform, GCP, Kubernetes, Terraform, CloudFormation, Bicep, GitHub Actions, CircleCI, Azure DevOps, Datadog, Dynatrace, Prometheus, Grafana, New Relic, APM, monitoring tools
Required Skills
SLOs, SLIs, error budgets, incident management, SEV1/2, blameless postmortems, observability, metrics, logs, traces, alerting best practices, Infrastructure as Code, IaC, Terraform, CloudFormation, Bicep, CI/CD, GitHub Actions, CircleCI, Azure DevOps, scripting (Python/Go/Bash), on-call programs, Kubernetes, Linux, networking, distributed systems, automation, chaos/game days, load testing, toil reduction, least-privilege, secrets management, audit readiness
Hard Skills
SLOs, SLIs, error budgets, incident management, SEV1/2, blameless postmortems, availability, latency, capacity planning, change management, observability, metrics, logs, traces, alerting, infra-as-code, Infrastructure as Code, Terraform, CloudFormation, Bicep, CI/CD, GitHub Actions, CircleCI, Azure DevOps, Python, Go, Bash, automation, resilience, chaos/game days, load testing, toil reduction, least-privilege, secrets management, audit readiness, Linux, networking, distributed systems, Kubernetes, containers, AWS, Amazon Web Services, Azure, Google Cloud Platform, GCP
Soft Skills
- Leadership
- Hire, coach, and develop SREs
- Set goals
- Blameless, data-driven culture
- Stakeholder management
- Communication
- Comfortable presenting trade-offs and data to executives
- Coaching engineers
Keywords for Your Resume
Site Reliability Manager, SRE Manager, Sr SRE Manager, Site Reliability Engineering, DevOps, Platform teams, SLOs/SLIs, error budgets, incident management, SEV1/2, blameless postmortems, on-call, availability, latency, capacity planning, change management, observability, metrics, logs, traces, alert noise, Datadog, Dynatrace, Prometheus, Grafana, New Relic, Infrastructure as Code, IaC, Terraform, CloudFormation, Bicep, CI/CD, GitHub Actions, CircleCI, Azure DevOps, Kubernetes, Linux, distributed systems, automation, load testing, chaos/game days
Deal Breakers
- 8+ years in software/platform/reliability engineering
- 2–4 years leading SRE/DevOps/Platform teams
- Proven experience operating large-scale services on AWS/Azure/GCP with Kubernetes and containers
- Hands-on IaC (Terraform/CloudFormation/Bicep) and CI/CD (GitHub Actions/CircleCI/Azure DevOps)
- Must be based in the US; the role is remote