Position Details

Type: Full-Time

Experience: Lead

Years of Experience: Not specified

Education: Not specified

Category: AI & Machine Learning

About this role

This role leads architecture and engineering of production-grade agentic AI workflows. You will design multi-agent systems with reliable state/memory, integrate models and tools, build agent-ready data pipelines with real-time retrieval, and implement LLMOps observability and safety controls for hybrid autonomy.

Key Responsibilities

Design and build multi-agent workflows for stateful agentic applications
Implement state, memory, and long-running execution control flows including persistent memory and interruptible execution
Establish universal standardized tool interfaces and integrate enterprise data sources and operational tools
Build agent-ready data pipelines and real-time retrieval using vector databases
Implement LLMOps observability (tracing, telemetry, cost monitoring) and safety controls including human-in-the-loop and break-glass controls

Technical Overview

You will engineer stateful, goal-driven agentic applications using agent orchestration frameworks, standardized tool interfaces, and model routing/fallback strategies. The role includes building scalable ingestion and vector database retrieval architectures, deploying workloads via containerization and cluster orchestration, and establishing LLMOps tracing, performance/cost monitoring, and reliability testing.

Ideal Candidate

The ideal candidate is a lead AI/ML engineer who has built production-grade agentic system architectures, including multi-agent workflows with state, memory, and long-running execution controls. They have strong LLMOps and observability experience (tracing, reasoning traces, cost monitoring) and can deploy agent workloads using containerization and cluster orchestration.

Must-Have Skills

Architect and deliver production-grade autonomous AI workflows
Design and build multi-agent workflows
Engineer state, memory, and long-running execution control flows
Establish universal standardized tool interfaces
Build routing and fallback strategies across multiple model endpoints
Package and deploy workloads via containerization and cluster orchestration
Implement end-to-end observability for agent execution, including reasoning traces, performance telemetry, cost monitoring, and production debugging workflows
Design hybrid autonomy modes (human-in-the-loop through fully autonomous), including approval gates, policy enforcement, and break-glass controls

Tools & Platforms

vector databases; containerization; cluster orchestration

Required Skills

agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution; message passing; persistent memory; recoverability; interruptible execution; standardized tool interfaces; model integration; routing and fallback strategies; context management; latency; inference cost; containerization; cluster orchestration; cloud-native services; high-throughput ingestion and transformation pipelines; vector databases; real-time context injection; LLMOps; tracing; debugging; reasoning traces; performance telemetry; cost monitoring; human-in-the-loop; policy enforcement; break-glass controls; testing strategies

Hard Skills

agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution control flows; message passing; persistent memory; recoverability; interruptible execution; standardized tool interfaces; model integration; runtime optimization; routing and fallback strategies; context management; latency; inference cost optimization; production deployment; containerization; cluster orchestration; cloud-native services; high-throughput ingestion and transformation pipelines; vector databases; retrieval patterns; retrieval architectures; LLMOps; tracing; debugging; reasoning traces; performance telemetry; cost monitoring; production debugging workflows; safety and control frameworks; human-in-the-loop; policy enforcement; break-glass controls; evaluation and reliability standards; testing strategies; autonomous AI workflows; goal-driven AI systems; reason, plan, coordinate, and execute complex tasks; schemas; data contracts; SLAs; pipeline specifications; operational tools; enterprise data sources

Soft Skills

cross-functional execution; technical bridge between teams; hands-on collaboration; communication across AI and data teams

Industry & Role

Industry: Fintech

Job Function: Architect and deliver production-grade agentic AI systems with reliability, observability, and safety controls.

Role Subtype: AI Engineer

Tech Domains: Cybersecurity

Keywords for Your Resume

Lead AI/ML Engineer; Agentic Systems; autonomous AI workflows; agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution; persistent memory; interruptible execution; standardized tool interfaces; model integration; runtime optimization; routing and fallback strategies; context management; latency; inference cost; production deployment; containerization; cluster orchestration; cloud-native services; LLMOps; tracing; cost monitoring; human-in-the-loop; policy enforcement; break-glass controls; vector databases

Deal Breakers

Must be able to design and build multi-agent workflows using agent orchestration frameworks
Must have production deployment experience with containerization and cluster orchestration
Must have LLMOps/observability experience, including tracing and cost monitoring
