
Infrastructure Engineer, Pre-training

at Anthropic

Location San Francisco, CA (Onsite)
Posted April 06, 2026
Type Full-Time
Experience Senior
Exp. Years 5+ years
Education Advanced degree in Computer Science or related field
Category Cloud & Infrastructure

Anthropic is seeking a Staff-level Infrastructure Engineer to design and implement high-performance data processing infrastructure for large language model training. The role focuses on scalable, fault-tolerant systems, data quality, reproducibility, and observability, and involves collaborating with research teams.

  • Design and implement high-performance data processing infrastructure for LLM training
  • Develop and maintain core processing primitives (tokenization, deduplication, chunking)
  • Build robust data quality assurance and validation at scale
  • Implement comprehensive monitoring systems for data processing infrastructure
  • Create and optimize distributed computing systems for processing web-scale datasets

The stack includes Python, Rust, Apache Spark, Infrastructure as Code (Terraform), and cloud platforms, used to build distributed data processing pipelines with an emphasis on monitoring, traceability, and scalable data preparation for LLMs.

The ideal candidate is a senior infrastructure engineer with 5+ years of distributed systems experience, strong Python and Rust skills, and proven ability to design scalable data processing infrastructure for large language model training.

5+ years of experience in distributed systems, Python, Rust, Apache Spark, High-throughput and fault-tolerant system design, Infrastructure as Code, Terraform, Cloud computing platforms, Tokenization algorithms, Monitoring and observability, Reproducibility and traceability in data preparation
Significant experience building and maintaining large-scale distributed systems, Reliability and performance focus, Ability to solve complex technical challenges at scale, Ownership and independence, Interest in ML infrastructure, Experience with ML data pipelines
Apache Spark, Terraform, Infrastructure as Code, Cloud computing platforms
Python, Rust, Apache Spark, Distributed systems, Infrastructure as Code, Terraform, Cloud computing platforms, Data processing infrastructure, Tokenization, Tokenization algorithms, Monitoring, Observability, High-throughput system design, Fault-tolerant system design, Data quality assurance, Reproducibility, Traceability, Large language model training, Performance optimization
Strong problem-solving skills, Attention to detail, Excellent communication, Collaborative, Team player, Ability to work with ambiguity, Delivery-focused, Proactive
Industry SaaS
Job Function Build and operate scalable data processing infrastructure to train large language models.
Role Subtype Infrastructure Engineer
Tech Domains Python, Cloud computing platforms, Apache Spark, Terraform, Infrastructure as Code, Distributed systems
Visa Sponsorship Yes
Infrastructure Engineer, distributed systems, Python, Rust, Apache Spark, Infrastructure as Code, Terraform, cloud computing platforms, tokenization, tokenization algorithms, Monitoring, Observability, data quality assurance, reproducibility, traceability, large language model training, high-throughput, fault-tolerant, performance optimization, data processing infrastructure, ML infrastructure

Less than 5 years of experience, No Python or Rust experience, No distributed systems experience, Unwilling to work on-site in San Francisco
