Position Details
About this role
This role involves building, maintaining, and improving large-scale data export systems at Braze, focusing on reliability, scalability, and observability.
Key Responsibilities
- Build and maintain high-scale data systems
- Improve system reliability and performance
- Lead incident response and postmortems
- Define monitoring standards
- Support infrastructure and platform engineering
Technical Overview
The environment includes Kafka-based event pipelines, Kubernetes or Docker for deployment, and monitoring tools like Sentry, Datadog, and PagerDuty.
Ideal Candidate
The ideal candidate is a senior-level SRE with at least 5 years of experience deploying and maintaining large-scale distributed systems. They have strong skills in Kubernetes, Docker, Kafka, and monitoring tools, with a focus on reliability and incident management.
Deal Breakers
- Lack of experience with Kubernetes or Docker Swarm
- Less than 5 years of relevant experience
- No experience with monitoring tools like Datadog or PagerDuty