About this role
Walmart is hiring a Director of Software Engineering to lead the strategy, architecture, and execution of the Agent AI Platform powering Sparky and next-gen customer experiences. The role focuses on scaling agent orchestration, RAG, evaluation, safety/guardrails, observability, and real-time inference across products.
Key Responsibilities
- Lead architecture and development of Sparky Agent AI Platform
- Design and scale distributed systems for LLMs and retrieval-augmented generation (RAG)
- Define platform capabilities (context management, evaluation pipelines, safety/guardrails, observability)
- Deliver production-ready AI platform features with operational maturity (SLOs, incident management)
- Lead and mentor backend and platform engineering teams and drive platform roadmaps
Technical Overview
You will design and scale distributed systems for LLMs, retrieval-augmented generation (RAG), knowledge bases, and agent orchestration, with strong emphasis on real-time AI inference. You will also establish evaluation pipelines, safety/guardrails, observability, SLOs, incident management, and CI/CD for cloud-native deployments.
Ideal Candidate
The ideal candidate is an executive-level engineering leader with deep expertise in agent-based AI platforms, including LLMs, retrieval-augmented generation (RAG), and real-time AI inference. They have led platform architecture and delivery at scale, built evaluation and safety/guardrails frameworks, and managed backend/platform teams with strong operational practices (SLOs, incident management, performance monitoring).
Must-Have Skills
- Lead architecture and development of Sparky Agent AI Platform
- Design and scale distributed systems supporting LLMs, retrieval-augmented generation (RAG), agent orchestration, and real-time AI inference
- Define platform capabilities including context management, evaluation pipelines, safety/guardrails, observability, and cost-efficient inference
- Establish engineering best practices, high code quality, and strong operational maturity (SLOs, incident management, performance monitoring)
- Improve developer workflows, CI/CD pipelines, and cloud-native deployment processes
- Lead, grow, and mentor a team of backend and platform engineers
Tools & Platforms
CI/CD pipelines, cloud-native deployment processes
Required Skills
Sparky Agent AI Platform, agent orchestration, distributed systems, LLMs, retrieval-augmented generation (RAG), real-time AI inference, evaluation pipelines, safety/guardrails, observability, SLOs, incident management, performance monitoring, CI/CD pipelines, cloud-native deployment, platform roadmaps
Hard Skills
AI technology leadership, architecture, development, distributed systems, LLMs, retrieval-augmented generation (RAG), retrieval systems, knowledge bases, agent orchestration, real-time AI inference, context management, evaluation pipelines, safety/guardrails, observability, cost-efficient inference, performance, reliability, accessibility, security, SLOs, incident management, performance monitoring, CI/CD pipelines, cloud-native deployment processes, model integration, experimentation, continuous improvement, backend engineering, platform engineering, engineering best practices, high code quality, operational maturity, technical roadmaps, people leadership, mentoring, team planning, goal setting, career development, performance management, collaboration, evaluation frameworks, enterprise-wide AI strategy
Soft Skills
leadership, mentoring, growing and mentoring a team, cross-functional collaboration, technical excellence culture, high-performance culture, innovation culture, accountability, learning, communication, partnership with Product and UX, partnership with Data Science and Research teams
Keywords for Your Resume
Director, Software Engineering, Director of Engineering, engineering leadership, Sparky Agent AI Platform, agent orchestration, LLMs, retrieval-augmented generation (RAG), knowledge bases, real-time AI inference, distributed systems, context management, evaluation pipelines, safety/guardrails, observability, cost-efficient inference, performance monitoring, incident management, SLOs, CI/CD pipelines, cloud-native deployment, platform roadmaps, backend engineers, platform engineers, mentoring, technical strategy
Deal Breakers
- Must have demonstrated experience with LLMs and retrieval-augmented generation (RAG)
- Must have experience leading the architecture and development of agent orchestration platforms
- Must have led teams and driven execution and delivery at a director/executive level
- Must have experience with operational maturity practices such as SLOs and incident management