Lead AI/ML Engineer

at S&P Global

Location: 2 locations (hybrid)
Posted: April 18, 2026
Type: Full-Time
Experience: Lead
Experience (years): Not specified
Education: Not specified
Category: AI & Machine Learning

This role leads architecture and engineering of production-grade agentic AI workflows. You will design multi-agent systems with reliable state/memory, integrate models and tools, build agent-ready data pipelines with real-time retrieval, and implement LLMOps observability and safety controls for hybrid autonomy.

  • Design and build multi-agent workflows for stateful agentic applications
  • Implement state, memory, and long-running execution control flows including persistent memory and interruptible execution
  • Establish universal standardized tool interfaces and integrate enterprise data sources and operational tools
  • Build agent-ready data pipelines and real-time retrieval using vector databases
  • Implement LLMOps observability (tracing, telemetry, cost monitoring) and safety controls including human-in-the-loop and break-glass controls

You will engineer stateful, goal-driven agentic applications using agent orchestration frameworks, standardized tool interfaces, and model routing/fallback strategies. The role includes building scalable ingestion and vector database retrieval architectures, deploying workloads via containerization and cluster orchestration, and establishing LLMOps tracing, performance/cost monitoring, and reliability testing.

The ideal candidate is a lead AI/ML engineer who has built production-grade agentic system architectures, including multi-agent workflows with state, memory, and long-running execution controls. They have strong LLMOps and observability experience (tracing, reasoning traces, cost monitoring) and can deploy agent workloads using containerization and cluster orchestration.

  • Architect and deliver production-grade autonomous AI workflows
  • Design and build multi-agent workflows
  • Engineer state, memory, and long-running execution control flows
  • Establish universal standardized tool interfaces
  • Build routing and fallback strategies across multiple model endpoints
  • Package and deploy workloads via containerization and cluster orchestration
  • Implement end-to-end observability for agent execution, including reasoning traces, performance telemetry, cost monitoring, and production debugging workflows
  • Design hybrid autonomy modes (human-in-the-loop through fully autonomous), including approval gates, policy enforcement, and break-glass controls
vector databases; containerization; cluster orchestration

agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution; message passing; persistent memory; recoverability; interruptible execution; standardized tool interfaces; model integration; routing and fallback strategies; context management; latency; inference cost; containerization; cluster orchestration; cloud-native services; high-throughput ingestion and transformation pipelines; vector databases; real-time context injection; LLMOps; tracing; debugging; reasoning traces; performance telemetry; cost monitoring; human-in-the-loop; policy enforcement; break-glass controls; testing strategies

agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution control flows; message passing; persistent memory; recoverability; interruptible execution; standardized tool interfaces; model integration; runtime optimization; routing and fallback strategies; context management; latency; inference cost optimization; production deployment; containerization; cluster orchestration; cloud-native services; high-throughput ingestion and transformation pipelines; vector databases; retrieval patterns; retrieval architectures; LLMOps; tracing; debugging; reasoning traces; performance telemetry; cost monitoring; production debugging workflows; safety and control frameworks; human-in-the-loop; policy enforcement; break-glass controls; evaluation and reliability standards; testing strategies; autonomous AI workflows; goal-driven AI systems; reason, plan, coordinate, and execute complex tasks; schemas; data contracts; SLAs; pipeline specifications; operational tools; enterprise data sources

cross-functional execution; technical bridge between teams; hands-on collaboration; communication across AI and data teams
Industry: Fintech
Job Function: Architect and deliver production-grade agentic AI systems with reliability, observability, and safety controls.
Role Subtype: AI Engineer
Tech Domains: Cybersecurity
Lead AI/ML Engineer; Agentic Systems; autonomous AI workflows; agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution; persistent memory; interruptible execution; standardized tool interfaces; model integration; runtime optimization; routing and fallback strategies; context management; latency; inference cost; production deployment; containerization; cluster orchestration; cloud-native services; LLMOps; tracing; cost monitoring; human-in-the-loop; policy enforcement; break-glass controls; vector databases

  • Must be able to design and build multi-agent workflows using agent orchestration frameworks
  • Must have production deployment experience with containerization and cluster orchestration
  • Must have LLMOps/observability experience, including tracing and cost monitoring
