Position Details

Salary: $121K – $224K USD / year

Type: Full-Time

Experience: Mid

Exp. Years: 6+ years

Education: Bachelor's in a quantitative or business field

Category: DevOps & SRE

About this role

This role involves leading the development and implementation of reliable, scalable infrastructure systems using SRE best practices, automation, and observability tools to ensure high system availability and performance.

Key Responsibilities

Designing system architectures
Leading automation initiatives
Monitoring and troubleshooting system issues
Mentoring engineering teams
Implementing security and reliability improvements

Technical Overview

The technical environment includes cloud platforms (AWS, Azure, and GCP), containerization with Kubernetes and Docker, infrastructure as code with Terraform, and monitoring with Prometheus and Grafana.

Ideal Candidate

The ideal candidate is a mid-level Site Reliability Engineer with 6+ years of experience in cloud infrastructure, observability, and automation. They possess strong leadership skills and expertise in designing scalable, reliable systems using SRE practices.

Must-Have Skills

SRE practices, monitoring and observability, CI/CD tools, cloud infrastructure, root cause analysis, security evaluation, architecture design

Nice-to-Have Skills

Kubernetes, Docker, Prometheus, Grafana, Terraform, AWS, Azure, Google Cloud Platform

Tools & Platforms

Jira, Git, Kubernetes, Docker, Prometheus, Grafana, Terraform, AWS, Azure, Google Cloud Platform

Required Skills

SRE, observability tools, CI/CD, automation, monitoring, security, architecture design, root cause analysis, cloud infrastructure, Kubernetes, Docker, Terraform, Prometheus, Grafana, AWS, Azure, Google Cloud Platform

Hard Skills

Site Reliability Engineering, SRE, observability tools, CI/CD, automation, monitoring, security, performance optimization, root cause analysis, architecture design, cloud infrastructure

Soft Skills

leadership, communication, problem-solving, team mentoring, project management, collaboration, documentation, continuous improvement

Certifications

Required

None specified

Preferred

AWS Certified Solutions Architect, Google Cloud Certified - Professional Cloud Architect, Azure Solutions Architect Expert, Certified Kubernetes Administrator

Industry & Role

Industry: Healthcare & Medical

Job Function: Lead the development and maintenance of reliable, scalable infrastructure systems using SRE practices.

Role Subtype: Site Reliability Engineer

Tech Domains: Active Directory, Microsoft 365, Azure, Amazon Web Services, Google Cloud Platform, Kubernetes, Docker, Terraform, Prometheus, Grafana

Keywords for Your Resume

site reliability engineer, SRE, observability tools, CI/CD, automation, monitoring, security, cloud infrastructure, architecture design, root cause analysis, performance optimization, incident response, DevOps, Kubernetes, Docker, Terraform, Prometheus, Grafana, AWS, Azure, Google Cloud Platform, team mentoring, leadership

Deal Breakers

No experience with SRE practices
Lack of cloud infrastructure knowledge
Less than 6 years of relevant experience
No experience with monitoring tools
Unwillingness to work in a hybrid environment

Principal Site Reliability Engineer