Position Details
About this role
This role involves designing and maintaining a scalable observability ecosystem using Splunk, Grafana, and infrastructure as code tools. The engineer will optimize logging, monitoring, and incident response workflows in a large-scale distributed environment.
Key Responsibilities
- Design and tune Splunk environments
- Architect Grafana dashboards
- Automate infrastructure deployment with Terraform
- Optimize telemetry data pipelines
- Lead incident response and post-incident reviews
Technical Overview
The technical environment includes Splunk, Grafana, Terraform, and scripting in Go, Python, and Ruby. The focus is on automation, performance tuning, and scalable telemetry data processing.
Ideal Candidate
The ideal candidate is a senior Site Reliability Engineer with deep expertise in Splunk, Grafana, and infrastructure as code tools like Terraform. They should have strong scripting skills in Go, Python, or Ruby and experience optimizing observability ecosystems for large-scale distributed systems.
Deal Breakers
- Lack of experience with Splunk or Grafana
- No scripting experience in Go, Python, or Ruby
- No experience with infrastructure as code tools like Terraform