About this role
The Senior SRE Manager leads a team of reliability engineers responsible for the uptime and efficiency of customer-facing platforms. The role defines SLOs, owns incident management, builds robust observability, and drives automation and alignment with security practices.
Key Responsibilities
- Lead & grow the team
- Own reliability strategy
- Operate the platform
- Incident management
- Observability and automation
Technical Overview
Cloud-native, multi-cloud environment built on Kubernetes; IaC (Terraform/CloudFormation/Bicep); CI/CD pipelines; observability stack (Datadog, Dynatrace, Prometheus, Grafana, New Relic); incident response and postmortems.
Ideal Candidate
The ideal candidate is a seasoned SRE/DevOps leader with 8+ years of experience who can define and enforce SLOs, manage incidents, and drive a culture of blameless engineering across cloud platforms (AWS/Azure/GCP) and Kubernetes.
Must-Have Skills
- 8+ years in software/platform/reliability engineering
- 2–4 years leading SRE/DevOps/Platform teams
- Experience operating large-scale services on AWS/Azure/GCP with Kubernetes and containers
- Linux fundamentals
- IaC (Terraform/CloudFormation/Bicep)
- CI/CD (GitHub Actions/CircleCI/Azure DevOps)
- Observability (metrics, logs, traces) and alerting
- On-call programs
- Communication and stakeholder management
Nice-to-Have Skills
- Chaos engineering
- Load testing
- Security/compliance partnerships
- Cost optimization
Tools & Platforms
Datadog, Dynatrace, Prometheus, Grafana, New Relic, Terraform, CloudFormation, Bicep, GitHub Actions, CircleCI, Azure DevOps, Amazon Web Services, Microsoft Azure, Google Cloud Platform, Kubernetes
Required Skills
8+ years in software/platform/reliability engineering; 2–4 years leading SRE/DevOps/Platform teams; AWS/Azure/GCP with Kubernetes; Linux; IaC; CI/CD; observability; on-call; security practices
Hard Skills
SLOs, error budgets, incident management, observability, APM, Datadog, Dynatrace, Prometheus, Grafana, New Relic, IaC, Terraform, CloudFormation, Bicep, CI/CD, GitHub Actions, CircleCI, Azure DevOps, Kubernetes, Linux, AWS, Azure, GCP, least-privilege, secrets management
Soft Skills
leadership, coaching, stakeholder management, communication, data-driven decision making
Keywords for Your Resume
site reliability manager, sre manager, remote, contract, 6 months, slo, slis, error budgets, incident management, observability, apm, datadog, dynatrace, prometheus, grafana, new relic, IaC, terraform, cloudformation, bicep, ci/cd, github actions, circleci, azure devops, kubernetes, linux, aws, azure, gcp, least-privilege, secrets management, sre, sla