
Principal Reliability Engineer

at Enlyte

Location Remote, US
Posted April 14, 2026
Salary $133K – $190K USD / year
Type Full-Time
Experience executive
Exp. Years Not specified
Education Not specified
Category DevOps & SRE

Enlyte is seeking a Principal Reliability Engineer to lead the reliability, observability, and operational control of enterprise platforms. This role defines standards, owns integrations with monitoring and incident tooling, and drives measurable reliability improvements aligned to service-level objectives.

  • Own the reliability control plane (monitoring, logging, tracing, alerting, incident management)
  • Define observability standards aligned to service-level objectives
  • Establish reliability patterns and reference architectures across teams
  • Lead reliability initiatives impacting multiple systems or platforms
  • Lead root cause analysis and drive durable incident detection/diagnosis/recovery improvements

In this role you set the architecture and standards for monitoring, logging, tracing, alerting, and incident management. The position emphasizes telemetry normalization, actionable observability, and integrations with automation tooling, along with leading root cause analysis and improving incident detection, diagnosis, and recovery.

The ideal candidate is a Principal Reliability Engineer with deep experience owning reliability and observability practices across enterprise platforms. They lead complex reliability control plane initiatives covering monitoring, logging, tracing, alerting, incident management, and operational readiness aligned to service-level objectives.

  • Own the reliability control plane
  • Own integrations between platforms and reliability tooling (monitoring, alerting, incident response, on-call, and automation systems)
  • Establish and evolve reliability patterns and reference architectures
  • Ensure observability tooling provides actionable visibility aligned to service-level objectives
  • Lead or support root cause analysis for significant incidents
reliability control plane, observability, monitoring, logging, tracing, alerting, incident management, automation tooling, telemetry collection, telemetry normalization, telemetry consumption, service-level objectives, operational readiness, graceful degradation, failure handling, root cause analysis, incident detection, incident diagnosis, incident recovery, reliability patterns, reference architectures, on-call, integrations between platforms and reliability tooling
senior technical leadership; complex initiative leadership; cross-functional partnership; define standards and architecture; lead design decisions; collaboration with Cloud, Platform, Security, and Application teams; mentorship (implied by Principal-level leadership); communication and escalation management
Industry SaaS
Job Function Lead enterprise reliability and observability strategy, standards, and operational readiness execution.
Role Subtype Site Reliability Engineer
Tech Domains Linux, Amazon Web Services, Kubernetes, DevOps, Cybersecurity

  • Must demonstrate experience owning a reliability control plane (monitoring, logging, tracing, alerting, incident management)
  • Must have led root cause analysis for significant incidents
