Distinguished Engineer, Cloud Site Reliability Engineering

at NVIDIA

📍 US, CA, Santa Clara
Hybrid
Posted April 02, 2026
Type Full-Time
Experience Lead
Exp. Years 18+ years
Education BS EE/CS or equivalent experience
Category DevOps & SRE

Lead NVIDIA's Cloud SRE architecture for the private cloud used by thousands of developers. Architect, implement and operate end-to-end CI/CD, onboard internal teams, and drive performance and cost optimizations across multi-cloud infrastructure.

  • Lead SRE architecture on GPU Private Cloud
  • Architect, implement & support end-to-end CI/CD system
  • Onboard NVIDIA internal development teams to Private Cloud
  • Identify performance bottlenecks and optimize speed and cost
  • Lead software development projects and guide engineers

Hands-on expertise across distributed systems, Java/Python, REST APIs, Docker/Kubernetes, OpenStack, and large-scale data stores (MySQL, Cassandra, MongoDB, Elasticsearch). Strong background in CI/CD pipelines, multi-cloud/hybrid cloud, and infrastructure optimization with cross-functional leadership.

The ideal candidate is a senior SRE/architect with 18+ years in systems software and deep expertise in distributed infra and CI/CD at scale. They have hands-on experience with OpenStack, Kubernetes, Docker, and a strong background in performance optimization and cloud cost efficiency, plus cross-functional leadership.

  • BS EE/CS or equivalent experience with 18+ years of systems software development, including at least 1 year dedicated to developing/exploring AI.
  • Experience maintaining cloud infrastructure and highly available production environments.
  • Strong programming and software development skills in Java, Python, and shell scripting, along with a good understanding of distributed systems and REST APIs.
  • Experience working with SQL/NoSQL database systems such as MySQL, Cassandra, MongoDB, or Elasticsearch.
  • Excellent knowledge of and working experience with Docker containers and virtual machines.
  • Good background in cloud technologies such as OpenStack, Docker, Kubernetes, Chef/Puppet, Hadoop/Ceph/SwiftStack, LXC, Git, Perforce, JFrog, and Kafka.
  • Ability to work effectively across organizational boundaries to improve alignment and productivity between teams in a multi-national, multi-time-zone corporate environment.

  • Depth in AI, Machine Learning, and Deep Learning algorithms and techniques.
  • Strong collaborative and interpersonal skills, with a consistent record of guiding and influencing others in dynamic environments.
  • Experience developing large-scale software systems using modular architecture under real-time performance requirements.
  • Background in designing high-performance, scalable software systems with a strong focus on hardware cost optimization.

Docker; Kubernetes; OpenStack; Chef; Puppet; Git; Perforce; JFrog; Kafka; MySQL; Cassandra; MongoDB; Elasticsearch; Java; Python; Shell scripting; CI/CD tooling
BS EE/CS; 18+ years experience; Java; Python; Shell-script; REST APIs; distributed systems; MySQL; Cassandra; MongoDB; Elasticsearch; Docker; Virtual Machines; OpenStack; Kubernetes; Chef; Puppet; Hadoop; Ceph; SwiftStack; LXC; Git; Perforce; JFrog; Kafka; Windows; Linux; AI; Artificial Intelligence; CI/CD; SRE; private cloud; cloud infrastructure; multi-cloud
Java; Python; Shell-script; REST APIs; Distributed Systems; SQL; MySQL; Cassandra; MongoDB; Elasticsearch; Docker; Virtual Machines; OpenStack; Kubernetes; Chef; Puppet; Hadoop; Ceph; SwiftStack; LXC; Git; Perforce; JFrog; Kafka; Linux
Leadership; Communication; Collaboration; Problem-solving; Analytical thinking; Cross-functional teamwork; Mentorship; Strategic thinking
Industry Technology
Job Function Lead and deliver end-to-end cloud SRE architecture and CI/CD for NVIDIA Private Cloud.
Role Subtype Site Reliability Engineer
Distinguished Engineer; Cloud Site Reliability Engineering; SRE Architect; CI/CD; Docker; Kubernetes; OpenStack; Chef; Puppet; Hadoop; Ceph; SwiftStack; LXC; Git; Perforce; JFrog; Kafka; MySQL; Cassandra; MongoDB; Elasticsearch; Java; Python; Shell-script; REST APIs; Linux; Windows; Android; Distributed Systems; AI; Artificial Intelligence; Multi-cloud; Hybrid Cloud; Private Cloud

Fewer than 18 years of relevant experience, lack of hands-on Kubernetes/Docker and CI/CD experience, no distributed-systems background
