Position Details
About this role
This position involves managing and improving Braze's high-scale data pipelines, ensuring system reliability and performance.
Key Responsibilities
- Deploy and monitor live applications
- Improve system reliability
- Lead incident response
- Set monitoring standards
- Collaborate with platform teams
Technical Overview
The role covers Kafka-based event pipelines, Kubernetes or Docker deployment, and monitoring with tools like Sentry, Datadog, and PagerDuty.
Ideal Candidate
The ideal candidate is a senior SRE with more than 5 years of experience deploying and managing distributed, high-scale systems. They are skilled in Kubernetes, Docker, Kafka, and incident management, and they prioritize reliability and observability.
Deal Breakers
- Less than 5 years of relevant experience
- No experience with Kubernetes or Docker Swarm
- Lack of experience with monitoring tools