
Senior Site Reliability Engineer

at NinjaOne

Location Remote, US
Posted April 14, 2026
Salary $160K – $240K USD / year
Type Full-Time
Experience Senior
Exp. Years 10+ years
Education Not specified
Category DevOps & SRE

NinjaOne is looking for a Senior Site Reliability Engineer to help scale the reliability and availability of its platform. The role emphasizes incident diagnosis, Root Cause Analysis (RCA), observability, automation, and AWS-driven infrastructure improvements.

  • Diagnose and resolve complex application and infrastructure issues
  • Participate in a 24x7 on-call rotation, SCRUM, and deployment planning
  • Perform Root Cause Analysis (RCA) and provide recommendations
  • Improve availability and reduce customer impact using observability tools
  • Develop software, scripts, or tooling to improve efficiency and reduce the delivery time of applications and infrastructure

Own reliability and operational improvements across AWS-based services using observability platforms like New Relic, Splunk, and DataDog. Build automation and Infrastructure-as-Code (IaC) with CloudFormation (plus Terraform, Helm, Ansible) and work with containers, Fargate, Kubernetes, and distributed microservice architectures.

The ideal candidate is a senior Site Reliability Engineer/DevOps engineer with 10+ years of experience, strong Linux administration, and deep AWS (Amazon Web Services) operational expertise. They have proven observability skills using New Relic, Splunk, and DataDog and can lead Root Cause Analysis (RCA), improve availability, and automate production reliability through Infrastructure-as-Code (IaC) (primarily CloudFormation, plus Terraform/Helm/Ansible).

  • 10+ years’ experience in DevOps and/or Site Reliability Engineering roles
  • 3+ years’ experience with an object-oriented language (preferably Java, .NET, or C++)
  • Intermediate+ level Linux administration, scripting, and troubleshooting
  • Demonstrable knowledge of observability tools (New Relic, Splunk, DataDog)
  • Comprehensive experience with Amazon Web Services (AWS) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB, distributions, etc.)
  • Experience with cloud automation and Infrastructure-as-Code (IaC) toolsets, primarily CloudFormation but also including Terraform, Helm, and Ansible
  • Hands-on experience with CI/CD and Software Development Life Cycle (SDLC) processes
  • Effective communication skills, both verbal and written
  • Participate in a 24x7 on-call rotation
Cloud Development Kit (CDK), Fargate, Kubernetes, containers, microservice architectures
New Relic, Splunk, DataDog, Amazon Web Services (AWS), VPC, EC2, ECS, Route53, Fargate, ALB/NLB, CloudFormation, Terraform, Helm, Ansible, Cloud Development Kit (CDK), Kubernetes, Linux, CI/CD, SCRUM, 24x7 on-call rotation
DevOps, Site Reliability Engineering, Linux administration, scripting, troubleshooting, Observability, New Relic, Splunk, DataDog, AWS, VPC, EC2, ECS, Route53, Fargate, ALB/NLB, Infrastructure-as-Code (IaC), CloudFormation, Terraform, Helm, Ansible, Cloud Development Kit (CDK), containers, Kubernetes, microservice architectures, CI/CD, Software Development Life Cycle (SDLC), Root Cause Analysis (RCA), technical documentation, SOPs
Root Cause Analysis (RCA), Observability, Observability tools, New Relic, Splunk, DataDog, Amazon Web Services (AWS), VPC, EC2, ECS, Route53, Fargate, ALB/NLB, distributions, Linux administration, scripting, troubleshooting, Infrastructure-as-Code (IaC), CloudFormation, Terraform, Helm, Ansible, Cloud Development Kit (CDK), containers, Kubernetes, microservice architectures, CI/CD, Software Development Life Cycle (SDLC), technical documentation, SOPs, deployment planning, 24x7 on-call rotation, SCRUM, application security-minded architecture, security
passion for automation, passion for observability, effective communication skills (both verbal and written), problem-solving, cross-team influence, documentation, security-minded thinking, participation in SCRUM
Industry SaaS
Job Function Ensure and improve production reliability and scalability for a cloud SaaS platform through AWS operations, observability, automation, and SRE practices.
Role Subtype Site Reliability Engineer
Tech Domains Amazon Web Services, Linux, Kubernetes, DevOps & SRE
Senior Site Reliability Engineer, Site Reliability Engineering, DevOps, Linux administration, Root Cause Analysis (RCA), Observability, New Relic, Splunk, DataDog, Amazon Web Services (AWS), VPC, EC2, ECS, Route53, Fargate, ALB/NLB, Infrastructure-as-Code (IaC), CloudFormation, Terraform, Helm, Ansible, Kubernetes, CI/CD, Software Development Life Cycle (SDLC), Cloud Development Kit (CDK)

10+ years’ experience in DevOps and/or Site Reliability Engineering roles; 3+ years with an object-oriented language (preferably Java, .NET, or C++); Intermediate+ level Linux administration, scripting, and troubleshooting; comprehensive Amazon Web Services (AWS) experience (VPC, EC2, ECS, Route53, Fargate, ALB/NLB). Candidates must be able to participate in a 24x7 on-call rotation and be located in one of the listed eligible US states.
