FLEX Service Availability Analyst

at Marriott International

📍 Bethesda, MD, United States | Hybrid | Posted April 11, 2026
Type: Temp-to-Hire
Experience: Mid
Exp. Years: 5+ years
Education: Undergraduate degree or equivalent experience/certification
Category: DevOps & SRE

This temporary FLEX role focuses on ensuring enterprise IT service availability and peak performance through proactive Site Reliability Engineering and incident command leadership. The position emphasizes automation, cloud technologies, and continuous process improvement to minimize disruptions and strengthen the technology landscape.

  • Serve as Incident Commander during major incidents, leading response efforts to restore services and minimize impact on business and consumer operations
  • Design and implement automation tools to reduce manual intervention and improve system performance
  • Perform proactive service reliability engineering and continuous process improvement
  • Manage and document incident, problem, change, and release management activities
  • Use cloud, infrastructure as code, and containerization technologies to enhance availability and reliability

You will lead major incident response as Incident Commander, improve reliability with monitoring, performance, and capacity tooling, and build automation using Python and Shell along with Ansible and Jenkins. The role works across cloud platforms (AWS, Azure, GCP) using infrastructure as code and containerization technologies, and relies on strong IT Operations and incident, problem, change, and release management practices.

The ideal candidate is an IT professional with 5+ years of experience, including 2+ years in incident, problem, change, and release management that involved leading calls and documenting outcomes. They are hands-on with Python and Shell scripting, automation using Ansible and Jenkins, and reliability work across cloud platforms (AWS, Azure, GCP) with infrastructure as code and containerization. They can serve as Incident Commander in a 24x7x365 environment and bring calm, decisive leadership during major incidents.

  • 5+ years of experience in an information technology environment.
  • 3 years of experience in information technology focused on IT Operations, including troubleshooting complex network, server, storage, and/or application issues.
  • Minimum of 2 years of operations experience involving incident, problem, change, and release management, including leading calls and documenting outcomes.
  • Ability to cover shifts in a 24x7x365 environment and take on-call responsibilities.
  • Proficiency in scripting languages (Python, Shell) and familiarity with automation tools (such as Ansible, Jenkins).
  • Experience with cloud platforms (AWS, Azure, GCP), infrastructure as code, and containerization technologies.
  • Experience in incident command or incident management in a technology environment.
  • Undergraduate degree or equivalent experience/certification.

  • ITIL Foundations v3+ Certification.
  • Demonstrated experience with ITSM suites, e.g., ServiceNow.
  • Demonstrated experience with various monitoring, performance, or capacity tools.
  • Experience with continuous integration/continuous deployment (CI/CD) pipelines and DevOps practices.
  • Familiarity with Site Reliability Engineering principles and concepts.
  • Strong leadership qualities, including decisiveness and the ability to motivate teams, along with the ability to manage stressful situations calmly and effectively.
  • Ability to create constructive relationships, influence, and communicate with varying levels of associates and management.
  • Ability to solve complex, cross-functional issues.
  • Strong knowledge of Server, Storage, Network, Middleware, Application, and Cloud technologies.
  • A high degree of curiosity and a drive to seek more efficient ways of delivering service.
Ansible, Jenkins, ServiceNow, AWS, Amazon Web Services, Azure, GCP, Google Cloud Platform
Python, Shell, Ansible, Jenkins, AWS, Azure, GCP, infrastructure as code, containerization technologies, incident command, incident management, IT Operations, ITSM suites, ServiceNow, ITIL Foundations v3+
Python, Shell, Ansible, Jenkins, AWS, Amazon Web Services, Azure, GCP, Google Cloud Platform, infrastructure as code, containerization technologies, incident command, incident management, IT Operations, troubleshooting complex network, troubleshooting complex server, troubleshooting complex storage, troubleshooting complex application issues, incident, problem, change, release management, automation tools, monitoring, performance, capacity tools, continuous integration/continuous deployment (CI/CD) pipelines, DevOps practices, Site Reliability Engineering principles, Server, Storage, Network, Middleware, Application, Cloud technologies
leadership skills, proactive problem-solver, strong problem-solving, organizational skills, analytical skills, decisiveness, ability to motivate teams, ability to manage stressful situations calmly and effectively, ability to create constructive relationships, ability to influence, ability to communicate with varying levels of associates and management, ability to solve complex cross-functional issues, high degree of curiosity, ability to seek more efficient ways of delivering service

Preferred

ITIL Foundations v3+
Industry: Government/Public Sector
Job Function: Ensure enterprise IT service reliability and availability through SRE practices and incident command leadership
Role Subtype: Site Reliability Engineer
Tech Domains: Python, DevOps & SRE, Amazon Web Services, Azure, Google Cloud Platform, Kubernetes, Docker, ITSM / ServiceNow, Networking / TCP-IP, Linux
FLEX Service Availability Analyst, Service Availability Manager, SRE Service Availability Manager, Site Reliability Engineering, Site Reliability Engineering principles, DevOps, DevOps practices, incident command, incident management, incident, problem, change, release management, 24x7x365, on-call, Python, Shell, Ansible, Jenkins, AWS, Amazon Web Services, Azure, GCP, Google Cloud Platform, infrastructure as code, containerization technologies, ITIL Foundations v3+, ServiceNow, ITSM suites

  • Must have 3 years of IT Operations experience troubleshooting complex network, server, storage, and/or application issues
  • Must have a minimum of 2 years of operations experience with incident, problem, change, and release management, including leading calls and documenting outcomes
  • Must be able to cover shifts and on-call responsibilities in a 24x7x365 environment
  • Must have proficiency in scripting languages (Python, Shell)
