Position Details
About this role
This role leads site reliability engineering efforts: defining service level indicators (SLIs) and service level objectives (SLOs), automating workflows, and ensuring the high availability of critical systems in a cloud environment.
Key Responsibilities
- Define SLIs and SLOs
- Automate workflows and pipelines
- Manage incident response and postmortems
- Conduct capacity planning and performance tuning
- Mentor junior team members
Technical Overview
The technical environment combines SRE practices, performance monitoring and observability tools such as Dynatrace and Splunk, containerization and orchestration with Docker and Kubernetes, and cloud infrastructure, with a focus on automation, incident management, and capacity planning.
Ideal Candidate
The ideal candidate is a highly experienced SRE professional with over 15 years in software engineering and architecture, specializing in reliability, automation, and cloud environments. They possess deep expertise in performance monitoring, incident response, and capacity planning, with leadership skills to mentor teams.
Deal Breakers
- Fewer than 15 years of experience in SRE or software engineering
- Lack of experience with cloud and container orchestration
- No experience with performance monitoring tools
- Poor understanding of incident management