Position Details

Type Full-Time

Experience Senior

Exp. Years 5+ years

Education Not specified

Category DevOps & SRE

About this role

Lead the design, build, and maintenance of cloud infrastructure on AWS, ensuring platform reliability, security, and scalability. Drive automation, incident response, and observability for a high-growth AI startup.

Key Responsibilities

Design and maintain cloud infrastructure
Implement CI/CD pipelines
Lead incident response and postmortems
Ensure security and compliance
Enhance platform observability

Technical Overview

Expertise in AWS cloud services, Kubernetes, Terraform, CI/CD pipelines, and monitoring tools like Datadog and Prometheus. Focus on security, operational discipline, and autonomous remediation.

Ideal Candidate

The ideal candidate is a senior Site Reliability Engineer with 5+ years of experience designing and maintaining cloud infrastructure on AWS, proficient in Kubernetes, Terraform, and CI/CD pipelines, with a strong focus on security and observability.

Must-Have Skills

AWS, Cloud Infrastructure, Terraform, Kubernetes, CI/CD pipelines, Monitoring tools, Security best practices

Nice-to-Have Skills

SOC2 compliance, Datadog, CloudWatch, Prometheus, Incident response, Blameless postmortems, Operational discipline

Tools & Platforms

AWS (Amazon Web Services), Terraform, GitHub Actions, Kubernetes, Datadog, CloudWatch, Prometheus

Required Skills

AWS (Amazon Web Services), EC2, EKS, IAM, ALB, RDS, S3, Terraform, Infrastructure as Code, CI/CD, GitHub Actions, Kubernetes (K8s), Monitoring, Datadog, CloudWatch, Prometheus, Security, Zero Trust, SOC2, Incident Response, Blameless Postmortems, Observability, Log Management

Hard Skills

AWS (Amazon Web Services), EC2, EKS, IAM, ALB, RDS, S3, Terraform, Cloud Development, CI/CD, GitHub Actions, Kubernetes (K8s), Infrastructure as Code, CDK, Monitoring, Datadog, CloudWatch, Prometheus, Security, Zero Trust, SOC2, Security Gaps, Log Management, Observability

Soft Skills

Leadership, Self-directed, Technical expertise, Problem-solving, Collaboration, Communication, Operational discipline

Industry & Role

Industry SaaS / Cloud Computing / Data Platforms

Job Function Build and operate reliable, secure cloud infrastructure for autonomous data operations

Deal Breakers

Lack of experience with AWS or cloud infrastructure; no experience with Kubernetes or Terraform; inability to work remotely; fewer than 5 years of relevant experience

Sr. Site Reliability Engineer