
GPU software engineer

at Intel

📍 2 Locations Hybrid 💰 $128K – $245K USD / year Posted April 15, 2026
Salary $128K – $245K USD / year
Type Not Specified
Experience Senior
Exp. Years 3+ years
Education Bachelor's degree in Computer Science, Electrical Engineering/Electronics Engineering, Computer Engineering, Math, or a STEM-related field of study
Category Software Engineering

Intel is seeking a Senior Software Development Engineer to build and optimize communication runtime features for HPC/AI workloads. The role focuses on developing and improving Intel communication libraries and driving performance engineering for latency and throughput.

  • Develop software features and optimizations for Intel communication libraries (libfabric, oneCCL, ISHMEM, Intel MPI)
  • Perform performance engineering to improve communications latency and throughput
  • Debug issues across hardware and software stack layers
  • Contribute to open-source/upstream projects and deliver complex technical projects independently
  • Collaborate with a diverse, distributed team and drive continuous improvement

You will work on communication libraries including libfabric, oneCCL (Collective Communication Library), ISHMEM (Shared Memory Access), and Intel MPI (Message Passing Interface). The scope includes Linux C/C++ development, performance optimization, and debugging across hardware/software layers, drawing on RDMA (InfiniBand and/or RoCE), TCP/IP, and potentially GPU and parallel programming.

The ideal candidate is a senior software engineer with 3+ years of strong C and C++ development and debugging experience in Linux environments. They have performance engineering experience for HPC communication libraries such as libfabric, oneCCL (Collective Communication Library), ISHMEM (Shared Memory Access), and/or Intel MPI (Message Passing Interface), with exposure to RDMA (InfiniBand and/or RoCE), TCP/IP, and GPU/parallel programming.

  • Strong C and C++ programming/development and debugging skills (3+ years of experience)
  • Development in Linux environments (3+ years of experience)
  • Ph.D. degree in Computer Science, Computer Engineering, or related field
  • Performance optimizations that improve communications latency or throughput
  • Debugging problems at different layers of the hardware and software stack
  • Demonstrated upstream contributions and experience developing in an open-source environment
  • Track record of delivering complex technical projects independently
  • Experience collaborating with a diverse, distributed team
  • Distributed computing
  • HPC communications libraries
  • Collective communication libraries
  • Developing software for GPUs
  • Developing software for one or more layers of the network communications stack: RDMA, RoCE, TCP/IP
  • Experience with GPU programming and parallel computing
  • Experience with multithreaded programming
  • Experience with networking software stack
  • Hands-on experience with RDMA networking (InfiniBand and/or RoCE) and userspace RDMA APIs
  • Performance engineering with running bench
C, C++, Linux environments, debugging, performance engineering, communication libraries, libfabric, oneCCL (Collective Communication Library), ISHMEM (Shared Memory Access), Intel MPI (Message Passing Interface), distributed computing, HPC communications libraries, collective communication libraries, GPU programming, parallel computing, multithreaded programming, networking software stack, RDMA, RDMA networking, InfiniBand, RoCE (RDMA over Converged Ethernet), userspace RDMA APIs, TCP/IP (Transmission Control Protocol/Internet Protocol), developing software for one or more layers of the network communications stack, Bench (running bench), root cause analysis, open-source environment
Mentorship, collaboration, independent problem solving, problem-solving at different layers of the hardware and software stack, contribution to upstream projects, collaboration with a diverse, distributed team
Industry Telecom
Job Function Develop and performance-optimize HPC communication library software in C/C++ on Linux.
Role Subtype Software Architect
Tech Domains Linux, Python, Azure
GPU software engineer, Senior Software Development Engineer, C, C++, Linux environments, debugging, Communication Runtimes team, libfabric, oneCCL (Collective Communication Library), ISHMEM (Shared Memory Access), Intel MPI (Message Passing Interface), performance engineering, communications latency, communications throughput, RDMA, RoCE, TCP/IP, InfiniBand, userspace RDMA APIs, open-source environment, distributed computing, HPC communications libraries, collective communication libraries, GPU programming, parallel computing, multithreaded programming, performance optimizations

Must have a Bachelor's degree in a STEM-related field and 3+ years of experience; strong C and C++ programming/development and debugging skills; and development experience in Linux environments.
