Position Details
About this role
This role involves building and maintaining scalable ML infrastructure, automating workflows, and ensuring system reliability in a cloud-native environment.
Key Responsibilities
- Build and maintain infrastructure
- Implement observability tools
- Design CI/CD pipelines
- Ensure high availability and disaster recovery
- Troubleshoot incidents
Technical Overview
The stack includes Docker, Kubernetes, Terraform, CI/CD tools, observability platforms, and Python/Bash scripting on Linux systems.
Ideal Candidate
The ideal candidate is a mid-level SRE with 3+ years of experience in DevOps practices who is proficient in containerization, orchestration, and monitoring tools and has focused on deploying and maintaining ML infrastructure.
Must-Have Skills
Nice-to-Have Skills
Tools & Platforms
Required Skills
Hard Skills
Soft Skills
Industry & Role
Keywords for Your Resume
Deal Breakers
- Lack of experience with Kubernetes or Docker
- No experience with ML models in production
- Unable to work remotely