ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

at Amazon.com

📍 US, CA, Cupertino
Posted March 28, 2026
Type Not specified
Experience mid
Exp. Years Not specified
Education Not specified
Category AI & Machine Learning

This role involves developing high-performance kernels for machine learning workloads on AWS's custom accelerators, working at the hardware-software boundary to optimize AI inference and training performance.

  • Optimize ML kernels
  • Collaborate across hardware and software teams
  • Contribute to architecture design
  • Implement performance improvements
  • Mentor engineers

The technical environment includes C++, ML frameworks such as PyTorch, distributed systems, and hardware accelerators such as Inferentia and Trainium, with a focus on performance tuning and kernel development.

The ideal candidate is a mid-level AI/ML engineer with experience in hardware-software optimization, proficient in C++, and familiar with deep learning frameworks like PyTorch. They should have a strong background in high-performance computing and distributed architectures, with a passion for AI acceleration technology.

low-level optimization, system architecture, ML model acceleration, C++, software engineering
ML frameworks, PyTorch, distributed systems, hardware knowledge, research publication
AWS Neuron SDK, PyTorch, GitHub, AWS
Machine Learning, ML compiler, runtime, PyTorch, high-performance computing, distributed architectures, kernel optimization, hardware-software boundary, ML acceleration, C++, software development, performance optimization
collaboration, problem-solving, innovative thinking, mentoring, communication
Industry Technology / Cloud Computing / Artificial Intelligence
Job Function Developing and optimizing machine learning kernels for AI acceleration
Role Subtype Machine Learning Engineer
Tech Domains Amazon Web Services, PyTorch, Machine Learning, High-performance computing
Machine Learning, ML compiler, runtime, PyTorch, high-performance computing, distributed architectures, kernel optimization, hardware-software boundary, ML acceleration, C++, software development, performance optimization, AWS Neuron SDK, AI acceleration, deep learning workloads, GPU, TensorFlow, AI hardware, performance tuning, software engineering, distributed systems, deep learning

Lack of experience in low-level optimization, no background in system architecture or ML acceleration, no proficiency in C++, no experience at the hardware-software boundary
