About this role
Own and build the enterprise AI Operations practice, ensuring production AI, agentic, and automation solutions are reliable, observable, well-governed, and continually improving.
Key Responsibilities
- Own the AI Operations strategy and governance
- Drive production reliability, support, and governance
- Lead observability and optimization
- Establish reporting and operational reviews
- Partner with product, delivery, and technical teams
Technical Overview
Mature AI Operations through ITIL-aligned incident and change management, observability (Prometheus, Grafana, OpenTelemetry), and FinOps practices. Lead across AI platforms, governance, and cost optimization.
Ideal Candidate
The ideal candidate is a senior AI operations leader with 6+ years overseeing AI/ML production systems in large-scale environments, strong observability and cost optimization expertise, and experience maturing AI operations as a capability.
Must-Have Skills
- Bachelor's degree plus extensive years of experience in enterprise IT operations, service management, reliability engineering, or production support
- 6+ years of leadership experience in AI/ML/agentic/production systems
- Proven experience defining and governing operations frameworks
- Deep knowledge of AI/agentic production challenges, ITIL practices, observability (Prometheus, Grafana, OpenTelemetry), and incident/change management
Nice-to-Have Skills
- Hands-on experience with AI platforms like Azure AI Foundry, AWS Bedrock, LangChain, or UiPath in production
- Knowledge of Responsible AI operations
Tools & Platforms
Prometheus, Grafana, OpenTelemetry, Azure AI Foundry, Amazon Web Services Bedrock, LangChain, UiPath
Required Skills
ITIL, observability, incident management, change management, production readiness, AI operations, cost optimization, governance, dashboards, vendor governance
Hard Skills
AI Operations, Observability, ITIL, Incident management, Problem management, Change management, Service management, Observability tools (Prometheus, Grafana, OpenTelemetry), FinOps / cost optimization, SRE / DevOps, QA and production readiness, Metrics and dashboards, Vendor governance, Executive reporting
Soft Skills
Leadership, Strategic thinking, Communication, Stakeholder management, Mentoring
Keywords for Your Resume
lead ai production services, ai operations, ai ops, observability, prometheus, grafana, open telemetry, finops, cloud cost optimization, sre, devops, production reliability, incident management, change management, sla, vendor governance, stakeholder management, executive reporting, hybrid, remote, AI operations, ai production systems
Deal Breakers
- Lack of AI/ML production ops experience
- Inability to work in a hybrid work arrangement
- No experience with production observability tooling