Position Details
About this role
This role involves leading site reliability engineering efforts for high-scale financial services platforms, focusing on automation, monitoring, and incident management to ensure system reliability.
Key Responsibilities
- Define SLIs and SLOs
- Track error budgets
- Automate workflows
- Manage incident response
- Collaborate on reliability goals
Technical Overview
The technical environment includes public cloud platforms, container orchestration with Docker and Kubernetes, performance monitoring with Dynatrace and Splunk, and scripting for automation.
Ideal Candidate
The ideal candidate is a highly experienced SRE with more than 15 years in software engineering and cloud environments, specializing in reliability, automation, and performance monitoring. They have deep expertise in container orchestration, scripting, and incident management, and can lead complex reliability initiatives.
Deal Breakers
- Less than 15 years of relevant experience
- Lack of experience with cloud environments
- No experience with monitoring tools like Dynatrace or Splunk
- Inability to work in a hybrid onsite/remote environment