
Senior DevOps Engineer, AIOps

at Nvidia

📍 US, CA, Santa Clara | Posted March 13, 2026
Type Not Specified
Experience senior
Exp. Years 5+ years
Education BS/MS in CS/CE or equivalent experience
Category DevOps & SRE

This role involves building and maintaining a high-availability AI Data Center platform, focusing on telemetry ingestion, automation, and reliability engineering.

  • Monitor platform health
  • Own Kubernetes deployments
  • Lead incident triage
  • Build runbooks and SOPs
  • Manage deployment infrastructure

Environment includes Kubernetes, Terraform, Helm, scripting in Python and Bash, with a focus on observability, incident management, and platform automation.

The ideal candidate is a senior DevOps engineer with over 5 years of experience managing production distributed systems, with deep expertise in Kubernetes, infrastructure automation, and observability tools. They should be proactive in incident management and continuous improvement of platform reliability.

5+ years operating production distributed systems, Kubernetes + containers experience, SLOs/SLIs ownership, Scripting (Python/Bash), Terraform + Helm, Incident triage, Monitoring and observability, Reliability engineering
Experience with GPU telemetry, Experience with AI Data Center platforms, Canary deployments, Post-deployment validation
Kubernetes, Terraform, Helm, Logs/metrics dashboards
Kubernetes, Terraform, Helm, Python, Bash, CI/CD, Monitoring, Telemetry, Incident Response, Reliability
Kubernetes, K8s, Docker, Python, Bash, Terraform, Helm, CI/CD, Infrastructure as Code, IaC, Monitoring, Logging, Telemetry, Incident Response, Postmortems
Communication, Problem-solving, Teamwork, Automation mindset, Documentation skills
Industry Technology
Job Function Operate and maintain a scalable, reliable AI Data Center platform
DevOps Engineer, Kubernetes, K8s, Terraform, Helm, CI/CD, Python, Bash, Infrastructure as Code, Monitoring, Telemetry, Incident Response, Postmortems, Reliability, SLOs, SLIs

Less than 5 years of relevant experience, Lack of Kubernetes or container experience, No scripting or automation skills, No experience with infrastructure as code tools
