AI Infrastructure Site Reliability Engineer (remote USA)

at Cisco Systems

Hybrid · $165K – $241K USD / year · Posted March 25, 2026
Salary: $165K – $241K USD / year
Type: Not specified
Experience: Mid-level
Years of experience: 5+
Education: Bachelor's degree in Computer Science, Information Technology, or related field
Category: DevOps & SRE

This role involves supporting and automating high-performance AI infrastructure using modern DevOps practices. The engineer will manage NVIDIA DGX and Cisco-UCS systems, ensuring scalability, reliability, and performance.

  • Automate AI platform pipelines
  • Support NVIDIA DGX and Cisco-UCS infrastructure
  • Ensure system scalability and reliability
  • Drive capacity planning
  • Implement monitoring and fault-tolerance

The environment includes Linux-based HPC clusters, Kubernetes, Docker, Terraform, Ansible, Jenkins, and cloud infrastructure. The focus is on automation, system reliability, and infrastructure scaling.

The ideal candidate is a mid-level site reliability engineer with 5+ years of experience in Linux, Kubernetes, and automation tools like Terraform and Ansible. They are proficient in scripting languages such as Python and Go and have experience supporting high-performance compute infrastructure.

Linux, Python, Go, Kubernetes, Terraform, Ansible, Jenkins, CI/CD
Hybrid cloud, Virtualization, Cloud infrastructure, JiraRa (assuming Jira or similar)
NVIDIA DGX, Cisco-UCS, Kubernetes, Docker, Terraform, Ansible, Jenkins, Git
Python, Go, C/C++, Linux, Kubernetes, Docker, Terraform, Ansible, Jenkins, Git, CI/CD, Cloud computing, Virtualization, Monitoring, Capacity planning
Automation, Problem-solving, Collaboration, Performance analysis, Troubleshooting

Preferred

Linux certifications, Cloud certifications
Industry: Information Technology / Cloud & Infrastructure
Job Function: Support and automate AI infrastructure for high-performance compute systems
Role Subtype: Site Reliability Engineer
Tech Domains: Linux, Kubernetes, Docker, Terraform, Ansible, Jenkins, Cloud computing
python, go, c/c++, linux, kubernetes, docker, terraform, ansible, jenkins, git, ci/cd, cloud computing, virtualization, monitoring, capacity planning, site reliability engineer, sre, hybrid cloud, automation

Lack of experience with Linux or Kubernetes, No scripting experience in Python or Go, Unfamiliarity with CI/CD pipelines, No experience with high-performance compute infrastructure
