Observability DevOps Site Reliability Engineer (SRE)

at Cisco Systems

Posted April 03, 2026
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education Bachelor's degree in computer science, computer engineering, or related field
Category DevOps & SRE

The Observability DevOps Site Reliability Engineer (SRE) develops and supports observability capabilities across Cisco IT Datacenter and Cloud environments, uses AI/ML to improve reliability, and owns monitoring automation and toolchains.

  • Own reliability and scalability of observability platforms
  • Implement AI/LLM-based monitoring use-cases
  • Lead SRE technologies and toolchain maintenance
  • Collaborate with distributed teams
  • Drive automation in monitoring and incident response

DevOps/SRE background with a strong observability stack (Splunk, Prometheus/Thanos, Grafana); containerization with Docker/Kubernetes/OpenShift; cloud experience (AWS/GCP/Azure); coding and scripting in Python/Go; CI/CD with GitHub/Jenkins; on-prem and cloud integration.

The ideal candidate is a senior DevOps/SRE engineer with 5+ years of experience, strong observability and AI/ML capabilities, and hands-on experience with containerization (Docker/Kubernetes/OpenShift). They are comfortable with multi-cloud and on-prem monitoring tools (Splunk, Prometheus, Grafana, Elastic) and able to lead across geographically distributed teams.

  • Bachelor's degree in computer science or related field
  • 5+ years of relevant experience
  • Experience with Docker and Linux-based infrastructures

  • Splunk Cloud / Splunk Observability Cloud
  • Elastic / Prometheus / Thanos & Grafana
  • ThousandEyes / Zabbix / AppDynamics
  • JavaScript (Node.js or React)
  • AI/ML & LLM-based Observability

GitHub, Jenkins, Splunk Cloud, Splunk Observability Cloud, Elastic, Prometheus, Thanos, Grafana, ThousandEyes, Zabbix, AppDynamics, Docker, Kubernetes, OpenShift, VMware, OpenStack, Ansible

Bachelor's degree in CS or related field; 5+ years of relevant experience; Docker and Linux-based infra; GitHub; Jenkins; CI/CD; Kubernetes/OpenShift; Python/Go; Prometheus/Thanos; Grafana; Splunk/Elastic; JavaScript (Node.js/React); AI/ML observability

Observability technologies, AI/ML, GitHub, Jenkins, Python, Shell, Go, Docker, Kubernetes, OpenShift, VMware, OpenStack, Prometheus, Thanos, Grafana, Splunk, Elastic, Zabbix, AppDynamics, ThousandEyes, Node.js, React, AI/LLM-based Agentic Observability

Leadership, Cross-team collaboration, Self-motivated, Relationship building, Learning agility

Preferred

AWS Solutions Architect - Associate; AWS Solutions Architect - Professional; AWS Certified Security Specialty; AWS Developer - Associate; AWS DevOps Engineer Professional; ISC2 Certified Cloud Security Professional (CCSP); CRISC; CISSP; CISM; CISA
Industry Networking & Telecom
Job Function Develop and maintain observability capabilities and reliability solutions for workloads across Cisco IT Datacenter and Cloud environments.
Role Subtype Site Reliability Engineer
Tech Domains Linux, Docker, Kubernetes, OpenShift, Prometheus, Grafana, Splunk, Elastic, Python, Go
observability, site reliability engineer, sre, devops, ai, ml, genai, kubernetes, docker, openshift, prometheus, grafana, splunk, elastic, zabbix, appdynamics, github, jenkins, git, node.js, react, cloud, aws, gcp, azure

Fewer than 5 years of relevant experience; no Docker or Linux infrastructure experience; no hands-on container or monitoring tool experience
