About this role
Site Reliability Engineer responsible for reliability, observability, and automation across hybrid cloud/on-prem environments; build and operate scalable production systems in a manufacturing ERP context.
Key Responsibilities
- Operate 24/7 production environments
- Lead incident response and RCA
- Design observability and alerting
- Champion GitOps and IaC (Terraform)
- Collaborate with cross-functional teams in an Agile environment
Technical Overview
Hands-on Linux/Unix admin, cloud platforms (AWS/Azure/GCP), GitOps with Terraform, observability tooling (Coralogix, FireHydrant), incident response, on-call rotations; containerization and Kubernetes preferred.
Ideal Candidate
The ideal candidate is a mid-level SRE/DevOps professional with strong Linux/Unix administration, hands-on cloud experience (AWS/Azure/GCP), Terraform and GitOps expertise, and solid incident response capabilities in hybrid/on-prem environments.
Must-Have Skills
- 3–5+ years hands-on experience in Site Reliability Engineering, DevOps, or Infrastructure roles
- Deep expertise in at least one major cloud platform (AWS, Azure, or GCP)
- Fluency with Linux/Unix systems administration, including kernel internals, networking, file systems, and advanced shell scripting (Bash, Python)
- Proven experience managing production systems in hybrid cloud and on-premises environments
- Familiarity with GitOps workflows, Terraform, and observability tools
- Active participation in incident response and on-call rotations
Nice-to-Have Skills
- Bachelor’s degree in computer science, engineering, or a related field, or equivalent experience
- Kubernetes or container services
- Experience supporting high-availability SaaS platforms
- Cloud certifications (AWS, Azure, or Google Cloud)
- Agile/Scrum experience and Jira
- Knowledge of FinOps and cost optimization best practices
Tools & Platforms
Coralogix, FireHydrant, Terraform, Git, Kubernetes, Linux, Python, Bash
Required Skills
Linux/Unix, kernel internals, networking, file systems, Bash, Python, AWS/Azure/GCP, GitOps, Terraform, Coralogix, FireHydrant, Kubernetes, incident response, on-call rotations
Hard Skills
Linux/Unix administration, kernel internals, networking, file systems, Bash, Python, cloud platforms (AWS, Azure, or Google Cloud), GitOps, Terraform, observability tools (Coralogix, FireHydrant), incident response, on-call rotations, Kubernetes (preferred)
Soft Skills
Troubleshooting, problem-solving, communication, team collaboration, on-call readiness, prioritization
Certifications
Preferred
Cloud certifications (AWS, Azure, or Google Cloud)
Keywords for Your Resume
site reliability engineer, sre, devops, linux, unix, bash, python, aws, azure, google cloud platform, terraform, gitops, observability, coralogix, firehydrant, kubernetes, on-call, incident response, hybrid cloud, on-premises, remote, full-time
Deal Breakers
No Linux/Unix experience, no cloud experience, no Terraform/GitOps experience, no on-call experience