Position Details
About this role
Support and evolve the high-performance Linux cluster that powers KLA R&D workloads including physics modeling, simulation, algorithm development, and machine learning. The role focuses on reliability, performance, and scalability of a shared, mission-critical HPC environment.
Key Responsibilities
- Support and evolve a high-performance Linux cluster
- Drive reliability, performance, and scalability of a mission-critical HPC environment
- Partner with infrastructure, DevOps, and application teams
- Enable physics modeling, simulation, and algorithm development workloads
- Ensure the platform remains fast, resilient, and ready for demanding computational challenges
Technical Overview
This position is centered on HPC infrastructure operations for a shared high-performance Linux cluster used for simulation and machine learning workloads. It requires skills in maintaining reliability and performance at scale while partnering with DevOps and application teams.
Ideal Candidate
The ideal candidate is an HPC-focused systems engineer experienced with maintaining and evolving high-performance Linux clusters for compute-heavy workloads. They bring strength in reliability, performance, and scalability for mission-critical shared HPC environments and collaborate effectively with infrastructure, DevOps, and application teams.
Deal Breakers
- Must have experience supporting and evolving a high-performance Linux cluster for HPC workloads
- Must demonstrate ability to drive reliability, performance, and scalability in an HPC environment