Position Details
About this role
This role involves managing and improving the observability and reliability of large-scale cloud-native systems, working closely with engineering teams to embed monitoring solutions and optimize system performance.
Key Responsibilities
- Lead observability strategy
- Implement telemetry pipelines
- Optimize system reliability
- Coordinate incident response
- Collaborate with engineering teams
Technical Overview
The role focuses on SRE practices, observability tools such as Datadog, Dynatrace, and OpenTelemetry, cloud-native architectures, distributed systems, and incident management.
Ideal Candidate
The ideal candidate is a senior SRE or DevOps engineer with extensive hands-on experience with observability platforms like Datadog, Dynatrace, or Splunk. They possess strong knowledge of distributed systems, cloud-native architectures, and telemetry pipelines, and can lead initiatives to improve system reliability and performance.
Deal Breakers
- Less than 7 years of experience in SRE or DevOps
- No hands-on experience with observability platforms
- Lack of understanding of distributed systems
- No experience with OpenTelemetry
- Inability to communicate technical concepts