
Senior Software Engineer, NCCL

at NVIDIA

📍 US, CA, Santa Clara
Posted March 13, 2026
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education MS/Ph.D. degree in CS/CE or equivalent experience
Category AI & Machine Learning

This role involves developing and maintaining communication libraries and system software for GPU clusters, focusing on high-performance, scalable communication in HPC and deep learning applications.

  • Design communication runtimes
  • Optimize GPU communication
  • Support HPC and deep learning workloads
  • Develop system software for GPU interactions
  • Collaborate on GPU communication protocols

The technical environment includes C/C++, Linux, CUDA, NVIDIA GPUs, MPI, OpenSHMEM, and deep learning frameworks like PyTorch and TensorFlow, supporting high-performance GPU communication.

The ideal candidate is a senior software engineer with 5+ years of experience in GPU communication libraries, parallel programming, and high-performance computing environments, with strong C/C++ skills and experience with NVIDIA GPUs and deep learning frameworks.

  • Excellent C/C++ programming and debugging skills
  • Experience with Linux
  • Strong understanding of computer system architecture and operating systems
  • Experience with parallel programming interfaces and communication runtimes
  • 5+ years relevant experience

  • CUDA programming
  • NVIDIA GPUs
  • High-performance networks like InfiniBand and iWARP
  • Experience with HPC applications
  • Experience with deep learning frameworks such as PyTorch and TensorFlow

C, C++, Linux, Parallel Programming, MPI, OpenSHMEM, CUDA, NVIDIA GPUs, Deep Learning Frameworks, PyTorch, TensorFlow, HPC, GPU Clusters, Communication Runtimes, High-Performance Networks, InfiniBand, iWARP
Collaboration, Communication, Problem-solving, Teamwork, Adaptability
Industry Technology
Job Function Develop high-performance communication libraries for GPU clusters in HPC and AI applications

Likely not a fit: less than 5 years of relevant experience, no C/C++ programming experience, no Linux experience, or no familiarity with GPU communication protocols.

