
AI Performance Library Architect

at Intel

📍 2 Locations Hybrid 💰 $170K – $315K USD / year Posted April 01, 2026
Salary: $170K – $315K USD / year
Type: Not specified
Experience: Senior
Exp. Years: 5+ years
Education: Master's degree in Mathematics, Physics, Computer Science, or a relevant STEM field, or Ph.D.
Category: AI & Machine Learning

The AI Performance Library Architect builds and optimizes the oneDNN project to accelerate AI workloads on Intel CPUs and GPUs. Responsibilities include the design, development, and maintenance of performance-critical functionality across cross-platform environments.

  • Design, develop, and maintain new functionality in oneDNN
  • Support developers optimizing AI frameworks
  • Collaborate with cross-platform AI developers
  • Perform performance engineering and low-level optimizations
  • Contribute to open-source software and code quality

The role centers on low-level performance optimization, cross-platform AI software, and open-source contributions, with a focus on oneDNN, CUDA/OpenCL, and ML frameworks. It requires deep knowledge of linear algebra implementations and numerical stability.

The ideal candidate is a senior software engineer with 5+ years of C/C++ and performance optimization experience, strong ML framework background, and hands-on work with oneDNN and cross-platform performance tuning.

  • Master's degree in Mathematics, Physics, Computer Science, or a relevant STEM field, or Ph.D.
  • 5+ years of experience in C and C++
  • Maintaining or contributing to open-source software projects
  • Software libraries design and architecture
  • Implementation of linear algebra algorithms (BLAS, LAPACK, or PyTorch)
  • Performance engineering and software performance optimizations
  • Floating point arithmetic and numerical stability
  • Software development on Linux
  • Low-level performance optimizations using CUDA, x86 assembly or intrinsics, or OpenCL

  • 3+ years: Machine learning and deep learning algorithms or HPC
  • 3+ years: Floating point implementations of transcendental functions (sin, cos, tanh, elu, etc.)
  • 1+ year: Algorithms for non-IEEE low precision data types (bfloat16, fp8, fp4)
  • 1+ year: AI assisted software development
oneDNN, PyTorch, CUDA, OpenCL
Master's degree in Mathematics, Physics, Computer Science, or a relevant STEM field, or Ph.D.; 5+ years of experience in C and C++; maintaining or contributing to open-source software projects; software libraries design and architecture; BLAS; LAPACK; PyTorch; CUDA; OpenCL; Linux; low-level performance optimizations; floating point arithmetic; numerical stability; oneDNN; machine learning; deep learning
C, C++, CUDA, OpenCL, Linux, low-level performance optimizations, x86 assembly, intrinsics, BLAS, LAPACK, oneDNN, PyTorch
communication, team collaboration, problem-solving, attention to detail, cross-functional work
Industry: SaaS
Job Function: Develop and optimize the oneDNN performance library as part of Intel's AI software stack.
Role Subtype: Senior Software Engineer
Tech Domains: Linux, Python
oneDNN, C, C++, CUDA, OpenCL, Linux, x86 assembly, intrinsics, BLAS, LAPACK, PyTorch, floating point arithmetic, numerical stability, performance engineering, software libraries design, open-source, oneDNN project, high-performance computing, machine learning, deep learning

Lack of a Master's or Ph.D. in a relevant field, less than 5 years in C/C++ performance engineering, no experience with Linux, no exposure to ML frameworks
