✦ Luna Orbit — DevOps & SRE

SRE Engineer with Spanish/Portuguese

at Andersen

📍 Remote, US · Remote · Posted March 12, 2026
Type Not Specified
Experience mid
Exp. Years 5+ years
Education Not specified
Category DevOps & SRE

This role involves maintaining and improving the reliability of large-scale cloud platforms using SRE principles, focusing on automation, monitoring, and incident management.

  • Maintaining high system availability and reliability
  • Designing monitoring and alerting solutions
  • Leading incident response and root cause analysis
  • Automating operational tasks
  • Building self-healing solutions

The technical environment includes cloud platforms like Azure, Kubernetes clusters (AKS/EKS/GKE), containerization with Docker, infrastructure automation with Terraform, and monitoring tools such as Dynatrace, Datadog, and Prometheus.

The ideal candidate is a mid-level SRE with 5+ years of experience in incident management, cloud platforms, and container orchestration. They possess strong skills in monitoring, automation, and high availability architectures, with proficiency in English and Spanish/Portuguese.

  • Experience as a Site Reliability Engineer / Incident Management Engineer for 5+ years
  • Strong experience in Incident Escalation
  • Experience with Azure cloud platforms
  • Experience with Kubernetes administration (AKS / EKS / GKE)
  • Experience with containerization technologies (Docker)
  • Experience with Infrastructure as Code for 3+ years (Terraform preferred)
  • Understanding of high availability architectures, auto-scaling, and disaster recovery strategies
  • Experience with monitoring and APM tools (Dynatrace, Datadog, Prometheus, Azure Monitor)
  • Experience with log aggregation systems (ELK, Loki, Splunk)
  • Experience with distributed tracing solutions (OpenTelemetry preferred)
  • Experience with alert configuration, tuning, and reduction of alert fatigue
  • Experience defining and tracking SLIs and SLOs
  • Level of English: Intermediate+ and above
  • Level of Spanish/Portuguese: Upper-Intermediate and above

  • Experience in FinTech, Healthcare, Retail, Telecom
  • Experience with auto-healing solutions
  • Experience with capacity planning and disaster recovery planning
Azure, Kubernetes, Terraform, Dynatrace, Datadog, Prometheus, ELK, Loki, Splunk, OpenTelemetry
Kubernetes, AKS, EKS, GKE, Docker, Terraform, Azure, Azure Monitor, Dynatrace, Datadog, Prometheus, ELK, Loki, Splunk, OpenTelemetry, Incident Management, SLIs, SLOs, Auto-remediation, High availability architectures, Auto-scaling, Disaster recovery
Communication, Leadership, Teamwork, Problem-solving, Collaboration
Industry IT & Cloud Services
Job Function Ensure cloud platform reliability and automation using SRE best practices
Site Reliability Engineer, Incident Management, Kubernetes, AKS, EKS, GKE, Docker, Terraform, Azure, Azure Monitor, Dynatrace, Datadog, Prometheus, ELK, Loki, Splunk, OpenTelemetry, SLIs, SLOs, Auto-remediation, High availability architectures, Auto-scaling, Disaster recovery, Incident Escalation, Monitoring, Logging, Alerting

Less than 5 years of relevant experience, lack of experience with Kubernetes or cloud platforms, no proficiency in English or Spanish/Portuguese
