About this role
Lead, AI Production Services is a hands-on leadership role that owns the enterprise AI Operations practice, establishing governance, observability, and service management for production AI and agentic systems at scale.
Key Responsibilities
- Own the Enterprise AI Operations Practice End-to-End
- Drive Production Reliability, Support, and Governance
- Lead Observability, Optimization, and Continuous Improvement
- Establish Enterprise Reporting and Operational Reviews
- Partner with Product, Delivery, and Technical Teams
Technical Overview
This role leads AI/ML production operations, ITIL-based governance, observability tooling, risk management, and supplier governance across hybrid/multi-cloud environments.
Ideal Candidate
The ideal candidate is an experienced lead-level IT operations professional with 6+ years in enterprise IT, strong AI/agentic production experience, and a track record of implementing ITIL-based governance and observability at scale.
Must-Have Skills
- Bachelor's degree plus extensive experience in enterprise IT operations, service management, reliability engineering, or production support
- 6+ years of overall leadership experience
- Governing operations frameworks, standards, and operating models
- Deep knowledge of AI/agentic production challenges, ITIL practices, observability (Prometheus, Grafana, OpenTelemetry), and incident/change management
- Risk management and executive-level reporting
- Hybrid/multi-cloud environments and supplier governance
Nice-to-Have Skills
- Hands-on experience with Azure AI Foundry, AWS Bedrock, LangChain, and UiPath
- Responsible AI operations and agentic risk governance
- Site reliability engineering (SRE), DevOps, or enterprise architecture
- Experience with hybrid/multi-cloud environments
- Supplier management
Required Skills
- Bachelor's degree plus extensive IT operations and leadership experience
- ITIL
- Observability tools (Prometheus, Grafana, OpenTelemetry)
- Incident/change management
- SRE/DevOps
- AI operations
- Azure AI Foundry, AWS Bedrock, LangChain, UiPath
Hard Skills
ITIL, Observability, Prometheus, Grafana, OpenTelemetry, Incident Management, Change Management, Service Management, AI Ops, Governance, Cost Optimization, Site Reliability Engineering, DevOps, Hybrid Cloud, Azure AI Foundry, AWS Bedrock, LangChain, UiPath, Production AI systems
Soft Skills
Leadership, Communication, Collaboration, Strategic thinking, Problem solving
Certifications
Preferred
ITIL Certification, DevOps Certification, Cloud Certifications (AWS/GCP/Azure)
Keywords for Your Resume
lead ai production services, ai operations, ai ops, itil, prometheus, grafana, open telemetry, observability, incident management, change management, sre, devops, production ai systems, ai platforms, azure ai foundry, amazon web services bedrock, langchain, uipath, hybrid, cloud cost optimization, governance, ai production
Deal Breakers
Sponsorship needed for US work authorization, Lack of leadership experience in IT operations