About this role
This role leads architecture and engineering of production-grade agentic AI workflows. You will design multi-agent systems with reliable state/memory, integrate models and tools, build agent-ready data pipelines with real-time retrieval, and implement LLMOps observability and safety controls for hybrid autonomy.
Key Responsibilities
- Design and build multi-agent workflows for stateful agentic applications
- Implement state, memory, and long-running execution control flows including persistent memory and interruptible execution
- Establish universal standardized tool interfaces and integrate enterprise data sources and operational tools
- Build agent-ready data pipelines and real-time retrieval using vector databases
- Implement LLMOps observability (tracing, telemetry, cost monitoring) and safety controls including human-in-the-loop and break-glass controls
Technical Overview
You will engineer stateful, goal-driven agentic applications using agent orchestration frameworks, standardized tool interfaces, and model routing/fallback strategies. The role includes building scalable ingestion and vector database retrieval architectures, deploying workloads via containerization and cluster orchestration, and establishing LLMOps tracing, performance/cost monitoring, and reliability testing.
Ideal Candidate
The ideal candidate is a lead AI/ML engineer who has built production-grade agentic systems architectures, including multi-agent workflows with state, memory, and long-running execution controls. They have strong LLMOps/observability experience (tracing, reasoning traces, cost monitoring) and can deploy agent workloads using containerization and cluster orchestration.
Must-Have Skills
- Architect and deliver production-grade autonomous AI workflows
- Design and build multi-agent workflows
- Engineer state, memory, and long-running execution control flows
- Establish universal standardized tool interfaces
- Build routing and fallback strategies across multiple model endpoints
- Package and deploy workloads via containerization and cluster orchestration
- Implement end-to-end observability for agent execution, including reasoning traces, performance telemetry, cost monitoring, and production debugging workflows
- Design hybrid autonomy modes (human-in-the-loop through fully autonomous), including approval gates, policy enforcement, and break-glass controls
Tools & Platforms
vector databases; containerization; cluster orchestration
Required Skills
agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution; message passing; persistent memory; recoverability; interruptible execution; standardized tool interfaces; model integration; routing and fallback strategies; context management; latency; inference cost; containerization; cluster orchestration; cloud-native services; high-throughput ingestion and transformation pipelines; vector databases; real-time context injection; LLMOps; tracing; debugging; reasoning traces; performance telemetry; cost monitoring; human-in-the-loop; policy enforcement; break-glass controls; testing strategies
Hard Skills
agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution control flows; message passing; persistent memory; recoverability; interruptible execution; standardized tool interfaces; model integration; runtime optimization; routing and fallback strategies; context management; latency; inference cost optimization; production deployment; containerization; cluster orchestration; cloud-native services; high-throughput ingestion and transformation pipelines; vector databases; retrieval patterns; retrieval architectures; LLMOps; tracing; debugging; reasoning traces; performance telemetry; cost monitoring; production debugging workflows; safety and control frameworks; human-in-the-loop; policy enforcement; break-glass controls; evaluation and reliability standards; testing strategies; autonomous AI workflows; goal-driven AI systems; reason, plan, coordinate, and execute complex tasks; schemas; data contracts; SLAs; pipeline specifications; operational tools; enterprise data sources
Soft Skills
cross-functional execution; technical bridge between teams; hands-on collaboration; communication across AI and data teams
Keywords for Your Resume
Lead AI/ML Engineer; Agentic Systems; autonomous AI workflows; agent orchestration frameworks; multi-agent workflows; stateful agentic applications; state, memory, and long-running execution; persistent memory; interruptible execution; standardized tool interfaces; model integration; runtime optimization; routing and fallback strategies; context management; latency; inference cost; production deployment; containerization; cluster orchestration; cloud-native services; LLMOps; tracing; cost monitoring; human-in-the-loop; policy enforcement; break-glass controls; vector databases
Deal Breakers
- Must be able to design and build multi-agent workflows using agent orchestration frameworks
- Must have production deployment experience with containerization and cluster orchestration
- Must have LLMOps/observability experience, including tracing and cost monitoring