Senior AI Infrastructure Engineer, Model Serving Platform

at Scale AI

📍 San Francisco, CA; New York, NY
Onsite
Posted April 02, 2026
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education Not specified
Category AI & Machine Learning

Scale AI seeks a Senior AI Infrastructure Engineer to design and operate platforms that serve LLMs scalably and reliably and that support internal LLM capability discovery.

  • Build and maintain fault-tolerant, high-performance systems for serving LLMs
  • Build an internal platform to empower LLM capability discovery
  • Collaborate with researchers and engineers to optimize models for production and research use cases
  • Conduct architecture and design reviews to uphold best practices
  • Develop monitoring and observability solutions

The stack is backend-focused ML infrastructure: Python, Go, Rust, and C++; containerization with Docker and Kubernetes; cloud infrastructure on AWS and GCP; and Terraform. The role also draws on LLM serving concepts such as rate limiting, token streaming, and load balancing.

The ideal candidate is a senior AI infrastructure engineer with 5+ years of backend systems experience, strong knowledge of LLM serving, and a proven ability to design scalable, cloud-based serving platforms.

5+ years backend systems experience, Python/Go/Rust/C++ proficiency, LLM serving & routing experience, Docker, Kubernetes, AWS or Google Cloud Platform, Terraform
vLLM, SGLang, TensorRT-LLM, text-generation-inference
Docker, Kubernetes, Terraform, Amazon Web Services, Google Cloud Platform
Python, Go, Rust, C++, LLM serving, rate limiting, token streaming, load balancing, Docker, Kubernetes, Amazon Web Services (AWS), Google Cloud Platform (GCP), Terraform, Infrastructure as Code, vLLM, SGLang, TensorRT-LLM, text-generation-inference
communication, problem-solving, independence, teamwork, self-motivation
Industry SaaS
Job Function Develop and operate large-scale LLM serving platforms
Role Subtype MLOps Engineer
Tech Domains Python, Go, Rust, C++, Kubernetes, Docker, Amazon Web Services, Google Cloud Platform

5+ years backend systems experience, LLM serving & routing experience, Onsite in San Francisco or New York
