About this role
Lead AI Engineer to build and scale Capital One's AI foundation and platform services, delivering AI-powered products with responsible and observable AI systems.
Key Responsibilities
- Partner with cross-functional teams to deliver AI-powered products
- Design, develop, test, deploy, and support AI software components including foundation model training, LLM inference, similarity search, guardrails, governance and observability
- Leverage Open Source and SaaS AI technologies such as AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch
- Invent state-of-the-art LLM optimization techniques to improve performance, scalability, latency, throughput, and cost
- Contribute to technical vision and long-term roadmap of foundational AI systems
Technical Overview
Hands-on stack includes Python, Go, Scala, Java; PyTorch; Huggingface; AWS Ultraclusters; VectorDBs; Nemo Guardrails; foundation model training; LLM inference; governance and observability.
Ideal Candidate
The ideal candidate is a mid- to senior-level AI Engineer with 4+ years of hands-on experience building production AI systems, a strong foundation in ML, and deep proficiency with Python and multi-language stacks. They should have experience with LLMs, AI infrastructure, and cloud platforms, and be able to design scalable, observable AI solutions while adhering to governance and guardrails.
Must-Have Skills
- Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields
- At least 4 years of experience developing AI and ML algorithms or technologies
- At least 4 years of experience programming with Python, Go, Scala, or Java
Nice-to-Have Skills
- 6 years of experience deploying scalable and responsible AI solutions on cloud platforms (e.g. AWS, Google Cloud, Azure, or equivalent private cloud)
- Experience designing, developing, delivering, and supporting AI services
- Experience developing AI and ML algorithms or technologies (e.g. LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory) using Python, C++, C#, Java, or Golang
- Experience developing and applying state-of-the-art techniques for optimizing training and inference software to improve hardware utilization, latency, throughput, and cost
- Passion for staying abreast of the latest AI research and AI systems, and for judiciously applying novel techniques in production
Tools & Platforms
AWS Ultraclusters, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, Python, Go, Scala, Java, AWS, Amazon Web Services, Google Cloud Platform, Azure
Required Skills
Python, Go, Scala, Java, PyTorch, Huggingface, AWS Ultraclusters, VectorDBs, Nemo Guardrails, foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, observability, AWS, Amazon Web Services, Google Cloud Platform, Azure
Hard Skills
Python, Go, Scala, Java, PyTorch, Huggingface, AWS Ultraclusters, Nemo Guardrails, VectorDBs, foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, observability, AWS, Amazon Web Services, Google Cloud Platform, Azure
Soft Skills
communication, teamwork, problem solving, adaptability, leadership
Keywords for Your Resume
Lead AI Engineer, MLX, Gen AI Platform Services, Agentic AI, foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, observability, Huggingface, VectorDBs, Nemo Guardrails, PyTorch, Python, Go, Scala, Java, AWS Ultraclusters, Amazon Web Services, Google Cloud Platform, Azure, LLM inference, foundation model
Deal Breakers
- Must be willing to work on-site at one of Capital One's listed locations
- Must be eligible to work in the United States (visa sponsorship may be provided)