About this role
Build and enhance AMD’s AI/ML software quality by debugging issues, creating test automation, and validating features prior to customer release. You will develop containers and tooling to run benchmarks, extract metrics from logs, and populate SQL tables powering Grafana dashboards.
Key Responsibilities
- Debug issues, test fixes, and validate AI/ML software features before release to customers
- Design and develop Docker containers for AI software on the ROCm SDK
- Develop software (Bash, C++, Python, Pytest) to execute tests and AI/ML benchmarks
- Extract metrics from run logs and update the SQL tables used by the Grafana frontend
- Debug and root-cause failures and performance degradation on the ROCm platform, and implement fixes or workarounds
Technical Overview
Hands-on development using Docker containers and a ROCm SDK environment, with Python/Bash/C++ tooling for executing AI/ML benchmarks and extracting metrics. Responsibilities include SQL table design for Grafana reporting and deep troubleshooting/root-cause analysis of failures and performance degradation on the ROCm platform.
Ideal Candidate
The ideal candidate is a software engineer with hands-on Python, Bash, and Linux experience who can develop and debug AI/ML software on ROCm. They have strong skills in Docker, Pytest-based testing, metrics/log processing, and organizing results in SQL tables for Grafana reporting, plus the ability to root-cause performance issues on the ROCm platform.
Must-Have Skills
- Debug, test fixes, and validate AI/ML SW issues/features
- Design and develop Docker containers
- Design and develop software (Bash, C++, Python, Pytest)
- Develop software to execute tests and AI/ML benchmarks
- Develop software (Python, Linux shell scripts) to extract metrics from run logs
- Develop software to process metrics and update SQL tables used by the Grafana frontend
- Design and develop SQL tables for organizing test and performance metrics
- Debug and root-cause failures on the ROCm platform and implement fixes/workarounds
- Bachelor's or Master's degree in Computer/Software Engineering, Computer Science, or a related technical discipline
Nice-to-Have Skills
- Hands-on experience with Python
- Hands-on experience with Bash
- Hands-on experience with Linux
- Hands-on experience using AI Agentic development
- Knowledge of AI software frameworks
- Knowledge of PyTorch
- Knowledge of JAX
- Knowledge of vLLM
- Knowledge of SGLang
- Familiarity with Linux and modern software tools and techniques for development
- Good analytical and problem-solving skills
Tools & Platforms
Docker, ROCm SDK, Python, Pytest, Bash, Linux, SQL, Grafana, Jira, C++
Required Skills
Debugging, Testing, Docker, Docker containers, ROCm SDK, Bash, C++, Python, Pytest, AI/ML benchmarks, Linux shell scripts, Linux, Run logs, Metrics extraction, SQL, SQL tables, Grafana frontend, Data processing, Root cause analysis, Performance troubleshooting, ROCm platform, Software engineering principles, Data structure, Algorithms, Operating Systems concepts, Multithread programming, AI software frameworks, PyTorch, JAX, vLLM, SGLang, AI Agentic development, Collaboration, Project Managers
Hard Skills
Debugging, Testing, Docker, Docker containers, ROCm SDK, Bash, C++, Python, Pytest, AI/ML benchmarks, Python code, Linux shell scripts, Linux, Run logs, Metrics extraction, SQL, SQL tables, Grafana frontend, Data processing, Root cause analysis, Performance troubleshooting, ROCm platform, Software engineering principles, Data structure, Algorithms, Operating Systems concepts, Multithread programming, AI software frameworks, PyTorch, JAX, vLLM, SGLang, AI Agentic development
Soft Skills
Innovative mindset, Problem-solving mindset, Analytical skills, Attention to detail, Collaboration, Teamwork, Execution excellence, Direct communication, Humility, Inclusive mindset, Cross-team coordination, Work with Project Managers, Ability to deliver results
Keywords for Your Resume
AI/ML Framework Software Development Engineer, Software Development Engineer, AI/ML SW issues/features, Docker containers, ROCm SDK, Bash, C++, Python, Pytest, AI/ML benchmarks, Linux shell scripts, Grafana frontend, SQL tables, root cause, performance degradation, ROCm platform, multithread programming, Data structure, algorithms, Operating Systems concepts, PyTorch, JAX, vLLM, SGLang, AI Agentic development
Deal Breakers
Must have a Bachelor's or Master's degree in Computer/Software Engineering, Computer Science, or a related technical discipline