About this role
Help build and improve the AI agent pipeline behind PlotStudio AI. You will work on production AI systems including prompt engineering, agent orchestration, model evaluation, and the tooling that makes multi-agent workflows reliable.
Key Responsibilities
- Design, test, and iterate on prompts for a multi-agent pipeline
- Build evaluation scripts and benchmark agent output quality
- Run experiments comparing model performance on analytical tasks
- Improve agent self-correction logic for error detection and retries
- Extend the LangGraph orchestration layer and build internal tooling
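The self-correction responsibility above can be sketched as a validate-and-retry loop. This is a minimal plain-Python illustration, not the team's actual implementation; `agent` and `validate` are hypothetical stand-ins for an LLM call and an output checker:

```python
from typing import Callable

def run_with_retries(
    agent: Callable[[str], str],
    task: str,
    validate: Callable[[str], bool],
    max_retries: int = 3,
) -> str:
    """Run an agent, detect bad output, and retry with error feedback."""
    prompt = task
    for _ in range(max_retries):
        output = agent(prompt)
        if validate(output):
            return output
        # Feed the failure back so the next attempt can adjust its approach.
        prompt = f"{task}\nPrevious attempt failed validation: {output!r}. Try again."
    raise RuntimeError(f"Agent failed after {max_retries} attempts")
```

In a real pipeline the feedback prompt would carry structured error details (e.g. a traceback from a coding agent) rather than the raw failed output.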
Technical Overview
Build a multi-agent pipeline (routing, planning, coding, interpretation) and implement evaluation scripts to benchmark agent quality. Extend the orchestration layer using LangGraph and experiment with multiple models (GPT-4.1, GPT-5.1, Claude, Gemini) while adding internal tooling for accuracy, token usage, and failure-mode tracking.
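The evaluation side of this work (tracking accuracy, token usage, and failure modes across models) can be sketched roughly as follows. This is an illustrative shape only; `model` is a hypothetical callable returning an answer and a token count, not a real SDK call:

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    """Aggregated benchmark metrics for one model."""
    correct: int = 0
    total: int = 0
    tokens: int = 0
    failures: list = field(default_factory=list)  # (prompt, got, expected) triples

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0

def evaluate(model, cases):
    """model(prompt) -> (answer, token_count); cases is [(prompt, expected)]."""
    result = EvalResult()
    for prompt, expected in cases:
        answer, tokens = model(prompt)
        result.total += 1
        result.tokens += tokens
        if answer == expected:
            result.correct += 1
        else:
            # Keep the full failure for later failure-mode analysis.
            result.failures.append((prompt, answer, expected))
    return result
```

Running the same `cases` list through wrappers for each model (GPT-4.1, Claude, Gemini, etc.) would yield directly comparable `EvalResult` records.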
Ideal Candidate
The ideal candidate is a current student pursuing a degree in Computer Science, Data Science, or AI/ML who is comfortable writing and debugging Python. They understand LLM fundamentals (prompts, tokens, context windows, temperature) and are excited to build production AI agent pipelines using prompt engineering, agent orchestration, and evaluation/benchmarking workflows.
Must-Have Skills
- Currently pursuing a degree in Computer Science, Data Science, AI/ML, or a related field
- Comfortable writing Python
- Basic understanding of how LLMs work: prompts, tokens, context windows, temperature
- Genuine interest in AI agents, not just chatbots
- Willingness to experiment, break things, and iterate quickly
- Self-directed and comfortable asking questions when you're stuck
- Able to commit 15-20 hours per week
Nice-to-Have Skills
- Experience using the OpenAI API, Anthropic API, or any LLM SDK
- Familiarity with LangChain, LangGraph, CrewA
Tools & Platforms
Python, OpenAI API, Anthropic API, LLM SDKs, LangGraph, LangChain, CrewA
Required Skills
Python, prompt engineering, agent orchestration, multi-agent pipelines, LLM evaluation, model evaluation, LangGraph, OpenAI API, Anthropic API, LangChain, CrewA, tokens, context windows, temperature, evaluation scripts, benchmarking agent output quality, agent self-correction logic, token usage tracking
Hard Skills
prompt engineering, agent orchestration, LLM evaluation, model evaluation, infrastructure for multi-agent workflows, Python, LLMs, prompts, tokens, context windows, temperature, multi-agent pipelines (routing, planning, coding, interpretation agents), evaluation scripts, benchmarking, dataset benchmarking, agent self-correction logic, error detection, retries, adjusting approach, domain skill modules (finance, healthcare, econometrics), tracking agent accuracy, token usage, failure modes, LangGraph, prompt versioning, experiment results documentation, experiment iteration, OpenAI API, Anthropic API, LLM SDKs
Soft Skills
self-directed, willingness to experiment, iterative mindset, asking questions when stuck, ability to work independently, comfort with breaking things
Keywords for Your Resume
AI Engineering Intern, AI Engineer Intern, AI Engineering Intern (AR5 Labs), prompt engineering, agent orchestration, multi-agent pipeline, routing, planning agents, coding agents, interpretation agents, model evaluation, evaluation scripts, benchmark agent output quality, Python, LLMs, prompts, tokens, context windows, temperature, agent self-correction logic, retries, domain skill modules, finance, healthcare, econometrics, token usage, failure modes, LangGraph, OpenAI API, Anthropic API, LLM SDK, LangChain, CrewA, GPT-4.1, GPT-5.1, Claude, Gemini, LLM evaluation
Deal Breakers
- Must be currently pursuing a degree in Computer Science, Data Science, AI/ML, or a related field
- Must be comfortable writing Python and able to debug code independently
- Must be able to commit 15-20 hours per week