
System Development Manager, AWS Resilience, AWS Incident Response

at Amazon.com

📍 US, WA, Seattle
Posted March 29, 2026
Type Not Specified
Experience mid
Exp. Years 5+ years
Education Not Specified
Category Cloud & Infrastructure

The System Development Manager for AWS Resilience leads incident response engineering and resilience tooling to keep AWS services healthy and reliable. You’ll drive observability improvements, automation, and cross-team collaboration.

  • Incident Response Leadership
  • Detection & Observability Improvement
  • Cross-Site, Cross-Team Coordination
  • Post-Incident Analysis
  • Performance Management & Team Development

The technical scope covers operational excellence, incident response tooling, detection and observability improvements, and coordination across global teams, with an emphasis on distributed systems, networking fundamentals, and post-incident analysis.

The ideal candidate is a senior systems engineer with 5+ years in systems development and infrastructure operations, strong incident response leadership, and deep understanding of distributed systems and networking.

  • 1+ years of engineering team management experience
  • 5+ years of experience in systems engineering, systems development, or infrastructure operations
  • Strong understanding of distributed systems, networking fundamentals, and infrastructure failure modes
  • Excellent communication skills
  • Experience hiring, developing, and promoting engineering talent
  • Experience using data to drive root cause elimination and process improvement
  • Experience managing communication with geographically distributed teams
  • Experience with operational best practices: monitoring, alerting, and post-incident analysis
Systems engineering, systems development, infrastructure operations, distributed systems, networking fundamentals, incident response, observability, monitoring, post-incident analysis, root-cause analysis, operational excellence, team development, leadership
Communication, Leadership, Problem-solving, Mentoring, Collaboration
Industry Technology
Job Function Lead engineers to improve AWS resilience and incident response capabilities through tooling and operational practices
Role Subtype Systems Engineering Manager
System Development Manager, AWS Resilience, AWS Incident Response, AIR, Seattle, hybrid, incident response leadership, detection & observability improvement, cross-site coordination, post-incident analysis, performance management, team development, distributed systems, networking fundamentals, monitoring, alerting, root-cause analysis, systems engineering, systems development, infrastructure operations, incident response, observability, leadership

  • Less than 5 years of relevant experience
  • No incident response or distributed systems experience
  • Lack of leadership or team development experience
