Position Details
About this role
Enlyte is seeking a Principal Reliability Engineer to lead the reliability, observability, and operational control of enterprise platforms. This role defines standards, owns integrations with monitoring and incident tooling, and drives measurable reliability improvements aligned to service-level objectives.
Key Responsibilities
- Own the reliability control plane (monitoring, logging, tracing, alerting, incident management)
- Define observability standards aligned to service-level objectives
- Establish reliability patterns and reference architectures across teams
- Lead reliability initiatives impacting multiple systems or platforms
- Lead root cause analysis and drive durable improvements to incident detection, diagnosis, and recovery
Technical Overview
You will own the reliability control plane, setting architecture and standards for monitoring, logging, tracing, alerting, and incident management. The position emphasizes telemetry normalization, actionable observability, and integrations with automation tooling. It also involves leading root cause analysis and improving incident detection, diagnosis, and recovery.
Ideal Candidate
The ideal candidate is a Principal Reliability Engineer with deep experience owning reliability and observability practices across enterprise platforms. They lead complex reliability control plane initiatives covering monitoring, logging, tracing, alerting, incident management, and operational readiness aligned to service-level objectives.
Deal Breakers
- Must demonstrate experience owning a reliability control plane (monitoring, logging, tracing, alerting, incident management)
- Must have led root cause analysis for significant incidents