
Staff+ Software Engineer, Observability

at Anthropic

📍 San Francisco, CA | New York City, NY | Seattle, WA (Hybrid)
Posted: March 07, 2026
Type: Not specified
Experience level: Senior
Years of experience: 10+
Education: Not specified
Category: Software Engineering

This role involves designing and building scalable observability systems to monitor and diagnose complex AI infrastructure, ensuring reliability and operational excellence.

  • Design scalable telemetry pipelines
  • Own observability platforms
  • Build instrumentation libraries
  • Drive alerting and SLO infrastructure
  • Partner with other teams on observability solutions

The technical environment involves building telemetry pipelines and working with metrics, logs, traces, error analytics, and distributed systems across cloud platforms such as AWS, GCP, and Azure.

The ideal candidate is a highly experienced software engineer who has spent 10+ years building and maintaining large-scale observability and monitoring systems. They have deep expertise in metrics, logging, tracing, and error analytics, and a strong background in distributed systems and cloud platforms.

  • 10+ years of industry experience
  • Building and operating large-scale observability or monitoring infrastructure
  • Deep experience with metrics, logging, tracing, or error analytics
  • Familiarity with high-throughput ingest pipelines
  • Experience with distributed systems

  • Cost-efficient storage solutions
  • Unified query interfaces
  • AI-assisted diagnostic tools
  • Multi-cluster infrastructure experience

Prometheus, Grafana, Elasticsearch, Kubernetes, Cloud platforms (AWS, GCP, Azure)

Telemetry pipelines, Metrics, Logging, Tracing, Error analytics, Distributed systems, Monitoring infrastructure, SLO, Alerting, AI-assisted diagnostic tooling

Communication, Problem-solving, Collaboration, Architectural thinking, Operational excellence

Industry: AI & Machine Learning
Job function: Develop and maintain large-scale observability infrastructure for AI systems

Software Engineer, Observability, Telemetry pipelines, Metrics, Logging, Tracing, Error analytics, Distributed systems, Monitoring infrastructure, SLO, Alerting, AI-assisted diagnostic tooling, Kubernetes, AWS, GCP, Azure

  • Less than 10 years of relevant experience
  • Lack of experience with large-scale observability systems
  • No familiarity with distributed systems or cloud platforms
