About this role
AI Performance Library Architect building and optimizing the oneDNN project to accelerate AI workloads on Intel CPUs/GPUs. The role includes design, development, and maintenance of performance-critical functionality in cross-platform environments.
Key Responsibilities
- Design, develop, and maintain new functionality in oneDNN
- Support developers optimizing AI frameworks
- Collaborate with cross-platform AI developers
- Perform performance engineering and low-level optimizations
- Contribute to open-source software and code quality
Technical Overview
Role centers on low-level performance optimization, cross-platform AI software, and open-source contributions with a focus on oneDNN, CUDA/OpenCL, and ML frameworks. Requires deep knowledge of linear algebra implementations and numerical stability.
Ideal Candidate
The ideal candidate is a senior software engineer with 5+ years of C/C++ and performance optimization experience, strong ML framework background, and hands-on work with oneDNN and cross-platform performance tuning.
Must-Have Skills
- Master's degree in Mathematics, Physics, Computer Science, or a relevant STEM field, OR Ph.D.
- 5+ years of experience in C and C++
- Maintaining or contributing to open-source software projects
- Software libraries design and architecture
- Implementation of linear algebra algorithms (BLAS, LAPACK, or PyTorch)
- Performance engineering and software performance optimizations
- Floating point arithmetic and numerical stability
- Software development on Linux
- Low-level performance optimizations using CUDA, x86 assembly or intrinsics, or OpenCL
Nice-to-Have Skills
- 3+ years: Machine learning and deep learning algorithms or HPC
- 3+ years: Floating point implementations of transcendental functions (sin, cos, tanh, elu, etc.)
- 1+ year: Algorithms for non-IEEE low precision data types (bfloat16, fp8, fp4)
- 1+ year: AI assisted software development
Tools & Platforms
oneDNN, PyTorch, CUDA, OpenCL
Required Skills
Master's degree in Mathematics, Physics, Computer Science, or a relevant STEM field, OR Ph.D.; 5+ years of experience in C and C++; maintaining or contributing to open-source software projects; software libraries design and architecture; BLAS; LAPACK; PyTorch; CUDA; OpenCL; Linux; low-level performance optimizations; floating point arithmetic; numerical stability; oneDNN; machine learning; deep learning
Hard Skills
C, C++, CUDA, OpenCL, Linux, low-level performance optimizations, x86 assembly, intrinsics, BLAS, LAPACK, oneDNN, PyTorch
Soft Skills
communication, team collaboration, problem-solving, attention to detail, cross-functional work
Keywords for Your Resume
oneDNN, C, C++, CUDA, OpenCL, Linux, x86 assembly, intrinsics, BLAS, LAPACK, PyTorch, floating point arithmetic, numerical stability, performance engineering, software libraries design, open-source, oneDNN project, high-performance computing, machine learning, deep learning
Deal Breakers
Lack of a Master’s or Ph.D. in a relevant field; less than 5 years in C/C++ performance engineering; no experience with Linux; no exposure to ML frameworks