Senior Site Reliability Engineer - Infrastructure

at S&P Global

📍 2 Locations · Unknown · Posted April 18, 2026
Type: Not specified
Experience: Senior
Experience (years): Not specified
Education: Not specified
Category: DevOps & SRE

Senior Site Reliability Engineer at Kensho responsible for ensuring reliability, scalability, and security of internal and customer-facing services. The role includes operating production systems, designing resilient infrastructure, automating operations, and maintaining robust monitoring and incident readiness in a 24/7 on-call environment.

  • Own and operate production services for availability, performance, and reliability
  • Design, build, and manage AWS infrastructure including EKS-based clusters
  • Provision infrastructure using Terraform (Infrastructure as Code)
  • Deploy, scale, and troubleshoot applications running on Kubernetes
  • Monitor system health with metrics, logs, and alerts; tune dashboards and runbooks

Day to day, the role involves building and managing AWS infrastructure, including EKS-based clusters, with Terraform (Infrastructure as Code); operating and troubleshooting Kubernetes workloads; implementing Python-based automation to reduce toil; and maintaining monitoring through metrics, logs, alerts, dashboards, and runbooks. Certificate lifecycle management is also part of the role.

The ideal candidate is a senior, hands-on Site Reliability Engineer with strong infrastructure and software engineering skills and a Python-first approach. They have owned production reliability for customer-facing services, built and operated AWS infrastructure on EKS, implemented Infrastructure as Code with Terraform, and run Kubernetes operations and monitoring with metrics, logs, alerts, dashboards, and runbooks.

Senior Site Reliability Engineer (SRE); Python-first; hands-on technologist; ensure the reliability, scalability, and security of both business-critical internal systems and external, customer-facing services; operate in a 24/7 on-call environment
AWS (Amazon Web Services), EKS (Elastic Kubernetes Service), Terraform, Kubernetes, Python, Zscaler, metrics, logs, alerts, dashboards, runbooks
Site Reliability Engineering (SRE), production services, availability, performance, reliability, infrastructure reliability, scalability, security, AWS infrastructure, EKS-based clusters, Terraform (Infrastructure as Code), Kubernetes, cluster creation, upgrades, lifecycle management, Python, automation frameworks, metrics, logs, alerts, dashboards, runbooks, troubleshooting, networking, certificates, deployments, application behavior, certificate lifecycle management, expiration management, security posture, InfoSec, Vulnerability Management, Zscaler, 24/7 on-call environment
hands-on technologist; strong troubleshooting skills; deep ownership of production systems; collaboration with Infrastructure, Application, and Security teams; collaboration with InfoSec, Vulnerability Management, and Network Security teams; collaboration with L1/L2 teams; communication
Industry: SaaS / Artificial Intelligence
Job Function: Ensure production reliability through SRE practices, AWS/EKS operations, Terraform IaC, and Kubernetes automation
Role Subtype: Site Reliability Engineer
Tech Domains: Amazon Web Services, Kubernetes, Terraform, Python, DevOps

Must have:

  • Python-first experience
  • Ability to operate in a 24/7 on-call environment
  • Hands-on AWS and Kubernetes/EKS experience
  • Infrastructure as Code experience with Terraform
