About this role
Senior reliability engineering role focused on identifying reliability risks for AWS datacenter infrastructure, performing root cause analyses, and driving improvements to datacenter availability.
Key Responsibilities
- Proactively identify reliability risks for datacenter equipment and implement mitigation plans
- Conduct root cause analysis of critical failures and drive corrective and preventive actions (CAPA)
- Develop datacenter system-level reliability models and risk analyses
- Perform vendor auditing and quarterly reviews to improve availability
- Collaborate with suppliers and internal teams to define specifications and risk plans
Technical Overview
Scope includes physics-of-failure based risk identification, accelerated life testing (ALT), finite element analysis (FEA), reliability block diagram (RBD) and reliability modeling, data analytics, and vendor audits across datacenter mechanical and electrical equipment.
Ideal Candidate
The ideal candidate is a senior infrastructure reliability engineer with 10+ years in data center reliability, possessing deep expertise in physics-of-failure, ALT, FEA, and reliability modeling, plus strong vendor management and cross-functional leadership.
Must-Have Skills
- 6+ years of data center design, construction, operations, or facility maintenance experience
- 6+ years of industrial or commercial engineering experience in mission critical facilities, including but not limited to data centers, power generation, or oil and gas facilities
- Bachelor's degree in Engineering or a related field
- Experience in data center design, construction, operations, or facility maintenance
- 5+ years of root cause analysis and troubleshooting or problem solving experience
- 10+ years of Reliability Engineering work experience in a high reliability industry
- 3+ years of experience with accelerated life testing, stress analysis, and finite element analysis
Nice-to-Have Skills
- Professional Engineering or Architectural License
- Knowledge of building codes and regulations for your region
- Experience carrying design concepts through exploration, development, and into deployment or mass production
- Experience reading, interpreting, and creating construction drawings, specifications, and submittal documents
- Bachelor's degree in Electrical or Mechanical Engineering, Engineering Technology, Reliability Engineering, or 10+ years of experience managing, analyzing, and communicating results to senior leadership
- Master's or Ph.D. in Reliability Engineering, Physics, Electrical, Mechanical, or Materials Engineering, or a related field
- Experience with proactive and effective reliability approaches in a cost-effective manner throughout product design, manufacture, and deployment stages
- Proven experience in working with external design and manufacturing supply chain partners
- Familiarity with major data center infrastructure equipment reliability performance
- Ability to manage multiple qualification activities and development schedules
Tools & Platforms
Reliability Block Diagram, Statistical Modeling, Data Analytics
Required Skills
data center design, data center operations, facility maintenance, industrial engineering, mission critical facilities, root cause analysis, accelerated life testing, stress analysis, finite element analysis, reliability engineering, reliability block diagram, statistical modeling, data analytics, process capability, vendor auditing, vendor management, reliability modeling, environmental stress analysis, electrical equipment, mechanical equipment, cross-functional collaboration
Hard Skills
Physics-of-Failure, Reliability Engineering, Reliability Block Diagram, Finite Element Analysis, Accelerated Life Testing, Stress Analysis, Statistical Modeling, Data Analytics, Root Cause Analysis, Vendor Auditing, Vendor Management, Process Capability, Data Center Mechanical Equipment, Data Center Electrical Equipment
Soft Skills
ownership, independence, communication, vendor management, problem solving, leadership, team collaboration
Certifications
Preferred
Professional Engineer (PE) License, Architectural License
Keywords for Your Resume
Infrastructure Reliability Engineer, data center, reliability engineering, Physics-of-Failure, Finite Element Analysis, Accelerated Life Testing, Stress Analysis, Reliability Block Diagram, RBD, statistical modeling, data analytics, root cause analysis, vendor auditing, vendor management, process capability, data center mechanical equipment, data center electrical equipment, system reliability modeling, environmental stress, operational stress
Deal Breakers
- Lack of 10+ years of Reliability Engineering experience in high-reliability environments
- No data center mechanical/electrical equipment knowledge
- No Bachelor's degree in engineering or a related field
- Unwillingness to travel domestically or internationally