Staff Site Reliability Engineer - Observability

at Okta

📍 San Francisco, California
Posted March 12, 2026
Type Full-Time
Experience mid
Exp. Years 3+ years
Education Not specified
Category DevOps & SRE

Okta is seeking a Site Reliability Engineer specializing in observability to develop and expand our monitoring ecosystem within GCP, with a focus on automation, high reliability, and scalable infrastructure.

  • Build scalable observability infrastructure
  • Automate deployment of agents
  • Optimize data collection in GCP
  • Participate in incident response
  • Develop dashboards and monitoring tools

The role involves managing GCP-based observability tools, automating deployment with Terraform, coding in Python and Go, and working with Kubernetes, Grafana, and Splunk for monitoring and incident response.

The ideal candidate is a highly technical Site Reliability Engineer with at least 3 years of experience in GCP and Kubernetes, proficient in Python and Go, with strong skills in observability tools like Grafana and Splunk, and experience automating infrastructure using Terraform.

  • GKE: Minimum 5+ Experience
  • Expertise in creating dashboards in Splunk or Grafana
  • Minimum 3+ years in SRE, DevOps, or Systems Engineering
  • Strong coding skills in Python or Go
  • Deep understanding of Linux internals, networking, and Kubernetes

  • OpenTelemetry
  • Vector
  • Grafana Loki
  • AWS
  • Experience managing observability tools within AWS
GKE (Google Kubernetes Engine), GCP (Google Cloud Platform), Terraform, Python, Go, Linux, Networking (TCP/IP, DNS, Load Balancing), Kubernetes, Grafana, Splunk, OpenTelemetry (OTel), Vector, Grafana Loki, AWS (Amazon Web Services)
Problem Solving, Automation, Collaboration, Communication, Analytical Thinking
Industry Technology
Job Function Design and operate scalable, automated observability systems in GCP

Less than 3 years in SRE/DevOps roles; lack of experience with GCP or Kubernetes; no scripting skills in Python or Go; no experience with observability tools.
