✦ Luna Orbit — DevOps & SRE

Site Reliability Engineer

at TWG Global AI

📍 Jacksonville, FL, US | Remote | Posted March 08, 2026
Type Not Specified
Experience mid
Exp. Years 3-6 years
Education Not specified
Category DevOps & SRE

This role involves building and maintaining scalable ML infrastructure, automating workflows, and ensuring system reliability in a cloud-native environment.

  • Build and maintain infrastructure
  • Implement observability tools
  • Design CI/CD pipelines
  • Ensure high availability and disaster recovery
  • Troubleshoot incidents

Stack includes Docker, Kubernetes, Terraform, CI/CD tools, observability platforms, and scripting in Python/Bash on Linux systems.

The ideal candidate is a mid-level SRE with 3+ years of experience in DevOps practices, proficient in containerization, orchestration, and monitoring tools, with a focus on deploying and maintaining ML infrastructure.

Docker, Kubernetes, Terraform, Python or Bash, Linux, ML models, observability stacks
Airflow, Prometheus, Grafana, ELK, Datadog
Docker, Kubernetes, Terraform, GitLab, GitHub, Airflow, Prometheus, Grafana, ELK, Datadog
Docker, Kubernetes, Terraform, GitLab, GitHub Actions, Airflow, Python, Bash, Linux, Prometheus, Grafana, ELK, Datadog
communication, collaboration, problem-solving, adaptability, teamwork
Industry Technology, AI & Machine Learning, Financial Services, Insurance
Job Function Maintain and optimize scalable ML infrastructure and system reliability
Site Reliability Engineer, SRE, DevOps, Docker, Kubernetes, Terraform, GitLab, GitHub Actions, Airflow, Python, Bash, Linux, Prometheus, Grafana, ELK, Datadog, observability, ML models, monitoring, CI/CD

Lack of experience with Kubernetes or Docker, No experience with ML models in production, Unable to work remotely
