Position Details
About this role
This role involves maintaining and scaling cloud infrastructure, ensuring system reliability, and automating operations across distributed systems.
Key Responsibilities
- Design scalable infrastructure
- Lead incident response
- Implement monitoring and observability
- Automate deployment pipelines
- Collaborate on system improvements
Technical Overview
The technical environment includes AWS, Kubernetes, Docker, and Terraform, along with observability tools such as Datadog, Prometheus, and Grafana. The work centers on automation and incident management.
Ideal Candidate
The ideal candidate is a senior SRE with 6+ years of experience in infrastructure, proficient in Python, AWS, Kubernetes, and Terraform. They should have strong incident response skills and a focus on automation and system reliability.
Deal Breakers
- Less than 6 years of relevant experience
- Lack of AWS or Kubernetes expertise
- No experience with incident response or root cause analysis