✦ Luna Orbit — DevOps & SRE

Lead, AI Production Services

at Company

Work Arrangement: Hybrid
Posted: March 29, 2026
Type: Not Specified
Experience: Lead
Exp. Years: 6+ years
Education: Bachelor's Degree plus extensive years of experience in enterprise IT operations, service management, reliability engineering, or production support
Category: DevOps & SRE

Lead, AI Production Services is a hands-on leadership role that owns the enterprise AI Operations practice and establishes governance, observability, and service management for production AI and agentic systems at scale.

  • Own the Enterprise AI Operations Practice End-to-End
  • Drive Production Reliability, Support, and Governance
  • Lead Observability, Optimization, and Continuous Improvement
  • Establish Enterprise Reporting and Operational Reviews
  • Partner with Product, Delivery, and Technical Teams

The role leads AI/ML production operations, ITIL-based governance, observability tooling, risk management, and supplier governance across hybrid and multi-cloud environments.

The ideal candidate is an experienced lead-level IT operations professional with at least 6 years in enterprise IT, strong AI/agentic production experience, and a track record of implementing ITIL-based governance and observability at scale.

  • Bachelor's Degree plus extensive years of experience in enterprise IT operations, service management, reliability engineering, or production support
  • 6+ years of overall leadership experience
  • Governing operations frameworks, standards, and operating models
  • Deep knowledge of AI/agentic production challenges, ITIL practices, observability (Prometheus, Grafana, OpenTelemetry), incident/change management
  • Risk management and executive-level reporting
  • Hybrid/multi-cloud environments and supplier governance

  • Hands-on experience with Azure AI Foundry, AWS Bedrock, LangChain, UiPath
  • Responsible AI operations and agentic risk governance
  • Site reliability engineering (SRE), DevOps or enterprise architecture
  • Experience with hybrid/multi-cloud environments
  • Supplier management

Bachelor's Degree plus extensive IT operations and leadership experience, ITIL, observability tools (Prometheus, Grafana, OpenTelemetry), incident/change management, SRE/DevOps, AI operations, Azure AI Foundry, AWS Bedrock, LangChain, UiPath

ITIL, Observability, Prometheus, Grafana, OpenTelemetry, Incident Management, Change Management, Service Management, AI Ops, Governance, Cost Optimization, Site Reliability Engineering, DevOps, Hybrid Cloud, Azure AI Foundry, AWS Bedrock, LangChain, UiPath, Production AI systems

Leadership, Communication, Collaboration, Strategic thinking, Problem solving

Preferred

ITIL Certification, DevOps Certification, Cloud Certifications (AWS/GCP/Azure)
Industry: Technology
Job Function: Own and scale the Enterprise AI Operations practice across the organization.
Role Subtype: Platform Engineer
Tech Domains: ITIL, Observability, Prometheus, Grafana, OpenTelemetry, Azure, Amazon Web Services, LangChain, UiPath
Visa Sponsorship: No

Sponsorship needed for US work authorization, Lack of leadership experience in IT operations
