Position Details

Type Not Specified

Experience Senior

Exp. Years 7+ years

Education Not specified

Category DevOps & SRE

About this role

This role manages and improves the observability and reliability of large-scale cloud-native systems, working closely with engineering teams to embed monitoring solutions and optimize system performance.

Key Responsibilities

Lead observability strategy
Implement telemetry pipelines
Optimize system reliability
Coordinate incident response
Collaborate with engineering teams

Technical Overview

The role focuses on SRE practices; observability tools such as Datadog, Dynatrace, and OpenTelemetry; cloud-native architectures; distributed systems; and incident management.

Ideal Candidate

The ideal candidate is a senior SRE or DevOps engineer with extensive hands-on experience with observability platforms like Datadog, Dynatrace, or Splunk. They possess strong knowledge of distributed systems, cloud-native architectures, and telemetry pipelines, and can lead initiatives to improve system reliability and performance.

Must-Have Skills

7+ years of experience in SRE, DevOps, platform engineering, or observability engineering
Hands-on experience with observability platforms
Experience with OpenTelemetry
Understanding of distributed systems
Strong communication skills

Nice-to-Have Skills

Telemetry strategy development
Cost optimization
Telemetry governance
Incident response improvement

Tools & Platforms

Datadog, Dynatrace, Splunk, New Relic, Grafana, Elastic (ELK), OpenTelemetry

Required Skills

SRE, Site Reliability Engineering, DevOps, observability platforms, Datadog, Dynatrace, Splunk, New Relic, Grafana, Elastic, OpenTelemetry, telemetry pipelines, distributed systems, cloud-native architectures

Hard Skills

SRE, Site Reliability Engineering, DevOps, observability platforms, Datadog, Dynatrace, Splunk, New Relic, Grafana, Elastic (ELK), OpenTelemetry, telemetry pipelines, cloud-native architectures, distributed systems

Soft Skills

Communication, collaboration, problem-solving, analytical thinking, customer focus

Industry & Role

Industry SaaS

Job Function Enhance system reliability and observability for enterprise cloud systems

Role Subtype Site Reliability Engineer

Tech Domains Active Directory, Microsoft 365, Azure, Amazon Web Services, Google Cloud Platform, Kubernetes, Docker, Python, Java, JavaScript

Keywords for Your Resume

Site Reliability Engineer, SRE, DevOps, observability platforms, Datadog, Dynatrace, Splunk, New Relic, Grafana, Elastic, OpenTelemetry, telemetry pipelines, distributed systems, cloud-native architectures, incident response, observability, cloud-native

Deal Breakers

Less than 7 years of experience in SRE or DevOps
No hands-on experience with observability platforms
Lack of understanding of distributed systems
No experience with OpenTelemetry
Inability to communicate technical concepts
