Director, ML/Dev Ops (Tip.AI)

at Marriott International

📍 Bethesda, MD, United States · Hybrid · Posted March 26, 2026
Type: Not specified
Experience: Senior
Exp. Years: Not specified
Education: Not specified
Category: DevOps & SRE

This role involves leading the development and maintenance of scalable, reliable AI platform services using cloud-native tools, with a focus on automation, observability, and incident management.

  • Build CI/CD pipelines
  • Deploy models with Kubernetes
  • Implement observability
  • Ensure platform reliability
  • Automate incident response

The technical environment includes Kubernetes, SageMaker, Ray Serve, Terraform, Vault, and the AWS and GCP cloud platforms, with an emphasis on ML Ops and infrastructure automation.

The ideal candidate is a senior DevOps or SRE professional with extensive experience building scalable, reliable cloud-native systems, particularly with Kubernetes, SageMaker, infrastructure-as-code tools like Terraform, and secrets management with Vault.

Scalable systems, DevOps practices, Cloud infrastructure, ML Ops, IaC (Terraform, CDK), Secrets management, SLOs, Incident management
Harness.io, GCP certifications, AWS certifications, GPU autoscaling, Cost-aware autoscaling, Legacy CI migration
Kubernetes, SageMaker, Ray Serve, Terraform, Vault, AWS Secrets Manager, Harness.io
CI/CD pipelines, Kubernetes, SageMaker, Ray Serve, OpenTelemetry, Terraform, Vault, AWS Secrets Manager, Cloud infrastructure, ML Ops, Containerization, SLOs, Reliability engineering, Automation, Incident response
Leadership, Communication, Problem-solving, Teamwork, Innovation

Required

AWS Certified Solutions Architect, GCP certifications
Industry: Hospitality / Technology
Job Function: AI platform engineering and DevOps
Role Subtype: DevOps Engineer
Tech Domains: Kubernetes, Amazon Web Services, Google Cloud Platform, Terraform, Cybersecurity

  • Lack of experience with Kubernetes or SageMaker
  • No background in cloud infrastructure
  • Inability to work in a hybrid environment
  • No experience with IaC tools like Terraform
