Position Details

Type Not Specified

Experience Senior

Exp. Years 15+ years

Education Bachelor's degree in Computer Science, Engineering, or a related discipline, or equivalent practical experience

Category DevOps & SRE

About this role

This role is for a Senior Principal Site Reliability Engineer who will design, implement, and drive observability, automation, and reliability initiatives across large distributed systems in a financial services environment.

Key Responsibilities

Lead reliability-focused initiatives
Define SLI/SLOs and error budgets
Build automated remediation from observability signals
Improve MTTR/MTTD and reduce toil
Mentor engineering teams

Technical Overview

Focus on SRE principles, IaC, CI/CD, containerization with Kubernetes/Docker, monitoring/observability, and performance under load; emphasis on cross-functional collaboration.

Ideal Candidate

The ideal candidate is a senior-level SRE/DevOps leader with 15+ years of systems engineering experience, expert in Python/Go/Java/Ruby, and hands-on in Kubernetes/Docker, CI/CD, and observability practices to improve reliability at scale.

Must-Have Skills

Bachelor's degree in Computer Science, Engineering, or related field
15+ years of progressive experience in systems engineering with emphasis on site reliability
7+ years in a technical leadership role
Proficiency in Python, Go, Java, Ruby
Experience with containerization and orchestration (Kubernetes, Docker)
Design and implement observability solutions (metrics, logs, traces, dashboards)
IaC and automated CI/CD pipelines
Agile and DevOps experience

Nice-to-Have Skills

Experience with capacity planning, chaos engineering, and fault injection
Mentoring and developing high-performing teams
Experience in hybrid cloud environments

Tools & Platforms

Python, Go, Java, Ruby, Linux, Windows Server, Kubernetes, Docker, Terraform, CI/CD

Required Skills

Python, Go, Java, Ruby, Linux, Windows Server, Kubernetes, Docker, CI/CD, Infrastructure as Code, Terraform, Observability, Distributed systems

Hard Skills

Python, Go, Java, Ruby, Linux, Windows Server, Kubernetes, Docker, CI/CD, Infrastructure as Code, Terraform, Observability, Distributed systems, Networking fundamentals

Soft Skills

leadership, stakeholder management, problem-solving, communication, mentoring, analytical thinking

Industry & Role

Industry Financial Services

Job Function Lead the design and execution of enterprise-scale SRE programs to improve system reliability and performance

Role Subtype Site Reliability Engineer

Tech Domains Python, Go, Java, Ruby, Linux, Windows Server, Kubernetes, Docker, Terraform, CI/CD

Keywords for Your Resume

Senior Principal Site Reliability Engineer, SRE, DevOps, CI/CD, Infrastructure as Code, IaC, Kubernetes, Docker, Terraform, Observability, SLI/SLO, Prometheus, Grafana, Python, Go, Java, Ruby, Linux, Windows Server, Incident Management, RCAs, Leadership

Deal Breakers

15+ years of experience in relevant domains
7+ years in technical leadership
Proven ability to design and operate in hybrid on-prem/in-cloud environments


Senior Principal Infrastructure Services (SRE Practice)
