Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality

at Amazon.com

📍 US, VA, Herndon
Posted March 31, 2026
Type Full-Time
Experience Senior
Exp. Years 10+ years
Education Bachelor's degree in Engineering or a related field
Category Cloud & Infrastructure

Senior reliability engineering role focused on identifying reliability risks for AWS datacenter infrastructure, performing root cause analyses, and driving improvements to datacenter availability.

  • Proactively identify reliability risks for datacenter equipment and implement mitigation plans
  • Conduct root cause analysis of critical failures and drive corrective and preventive actions (CAPA)
  • Develop datacenter system-level reliability models and risk analyses
  • Perform vendor auditing and quarterly reviews to improve availability
  • Collaborate with suppliers and internal teams to define specifications and risk plans

Scope includes physics-of-failure based risk identification, ALT/FEA, RBD and reliability modeling, data analytics, and vendor audits across datacenter mechanical and electrical equipment.

The ideal candidate is a senior infrastructure reliability engineer with 10+ years in data center reliability, possessing deep expertise in physics-of-failure, ALT, FEA, and reliability modeling, plus strong vendor management and cross-functional leadership.

  • 6+ years of data center design, construction, operations, or facility maintenance experience
  • 6+ years of industrial or commercial engineering experience in mission critical facilities, including but not limited to data centers, power generation, or oil and gas facilities
  • Bachelor's degree in Engineering or a related field
  • Experience in data center design, construction, operations, or facility maintenance
  • 5+ years of root cause analysis and troubleshooting or problem solving experience
  • 10+ years of Reliability Engineering work experience in a high-reliability industry
  • 3+ years of experience with accelerated life testing, stress analysis, and finite element analysis

  • Professional Engineering or Architectural License
  • Knowledge of building codes and regulations for your region
  • Experience carrying design concepts through exploration, development, and into deployment or mass production
  • Experience reading, interpreting, and creating construction drawings, specifications, and submittal documents
  • Bachelor's degree in Electrical or Mechanical Engineering, Engineering Technology, Reliability Engineering, or 10+ years of experience managing, analyzing, and communicating results to senior leadership
  • Master's or Ph.D. in Reliability Engineering, Physics, Electrical, Mechanical, or Materials Engineering, or a related field
  • Experience with proactive, effective, and cost-effective reliability approaches throughout product design, manufacture, and deployment stages
  • Proven experience working with external design and manufacturing supply chain partners
  • Familiarity with the reliability performance of major data center infrastructure equipment
  • Ability to manage multiple qualification activities and development schedules
Reliability Block Diagram, Statistical Modeling, Data Analytics
data center design, data center operations, facility maintenance, industrial engineering, mission critical facilities, root cause analysis, accelerated life testing, stress analysis, finite element analysis, reliability engineering, reliability block diagram, statistical modeling, data analytics, process capability, vendor auditing, vendor management, reliability modeling, environmental stress analysis, electrical equipment, mechanical equipment, cross-functional collaboration
Physics-of-Failure, Reliability Engineering, Reliability Block Diagram, Finite Element Analysis, Accelerated Life Testing, Stress Analysis, Statistical Modeling, Data Analytics, Root Cause Analysis, Vendor Auditing, Vendor Management, Process Capability, Data Center Mechanical Equipment, Data Center Electrical Equipment
ownership, independence, communication, vendor management, problem solving, leadership, team collaboration

Preferred

  • Professional Engineer (PE) License
  • Architectural License
Industry SaaS
Job Function Lead reliability engineering initiatives to optimize datacenter availability and risk management for AWS infrastructure
Role Subtype Site Reliability Engineer
Tech Domains Amazon Web Services, Linux, SQL / PostgreSQL
Infrastructure Reliability Engineer, data center, reliability engineering, Physics-of-Failure, Finite Element Analysis, Accelerated Life Testing, Stress Analysis, Reliability Block Diagram, RBD, statistical modeling, data analytics, root cause analysis, vendor auditing, vendor management, process capability, data center mechanical equipment, data center electrical equipment, system reliability modeling, environmental stress, operational stress

  • Lack of 10+ years of Reliability Engineering experience in high-reliability environments
  • No data center mechanical/electrical equipment knowledge
  • No Bachelor's degree in engineering or a related field
  • Unwillingness to travel domestically or internationally

