About this role
A principal engineer to act as the technical architect for Expedia's Cloud Platform, focusing on scalable cloud infrastructure, Kubernetes, and developer experience tooling. The role emphasizes reliability, observability, FinOps, and AI-enabled platform capabilities.
Key Responsibilities
- Lead Architectural Evolution
- Modernize Kubernetes & Infrastructure
- Harden Reliability & Observability
- Optimize Cloud Economics
- Support the Developer Workflow
Technical Overview
Architectural leadership over cloud-native platform services on Kubernetes, multi-cluster management, service mesh, and automated scaling; strong emphasis on OpenTelemetry/Prometheus observability and IaC (Terraform, Pulumi) with Go/Rust as primary languages.
Ideal Candidate
The ideal candidate is a senior principal software engineer with extensive experience in cloud platforms and Kubernetes-based architecture, plus expertise in IaC and observability. They should be able to drive architectural evolution, define multi-cluster Kubernetes strategies, and lead FinOps initiatives while enabling agent-friendly developer tooling.
Must-Have Skills
- Extensive professional software development experience designing, building, and operating large-scale, cloud-native distributed systems and platform services on Kubernetes.
- Proven ownership of critical services or multi-service platforms, including responsibility for system design (LLD), API design, data modeling, deployment, and ongoing operational health.
- Deep expertise with at least one major public cloud provider and core platform technologies (compute, networking, storage, service discovery, security, observability, and CI/CD).
- Demonstrated ability to make high-impact architectural decisions, navigate complex trade-offs, and guide multiple teams toward a coherent, long-term technical direction.
- Familiarity with AI-driven systems, tools, or workflows, and with applying AI/ML concepts to real-world products within cloud or platform environments.
- Deep knowledge of observability patterns (OpenTelemetry, Prometheus, distributed tracing).
- Expert-level understanding of Infrastructure as Code (Terraform, Pulumi) and CI/CD at scale.
- Proficiency in Go, Rust, or similar languages used in modern platform engineering.
Nice-to-Have Skills
- Track record of defining and evolving multi-year technical strategies for cloud and developer platform ecosystems, and of successfully driving adoption of shared platforms across many teams.
- Experience designing and operating highly available, globally distributed systems at internet scale, including capacity planning, performance optimization, and robust failure handling.
- Experience safely integrating and operating AI/ML-enabled solutions that improve outcomes, such as intelligent routing, predictive scaling, or automated remediation embedded in platform services, with appropriate safeguards.
- Advanced experience applying AI/ML techniques to cloud and platform problems (for example, cost optimization, anomaly detection, or performance tuning) and partnering with data/ML teams to productionize these capabilities.
- A Systems Architect: You understand the deep plumbing of the cloud (AWS/GCP, K8s, networking). You think in terms of failure domains, latencies, and unit economics.
- Reliability-First: You've carried a pager for global-scale systems. You have a healthy "paranoia" about state, consistency, and cascading failures.
- Hands-on: You still love to build. You can prototype a complex infrastructure change in a weekend to prove it works, but you have the discipline to ensure it's production-grade before it ships.
Tools & Platforms
Kubernetes, Terraform, Pulumi, OpenTelemetry, Prometheus, CI/CD, Go, Rust, Amazon Web Services, Linux, Docker
Required Skills
cloud-native distributed systems, Kubernetes, K8s, API design, data modeling, service mesh, observability, OpenTelemetry, Prometheus, distributed tracing, Infrastructure as Code, Terraform, Pulumi, CI/CD, Go, Rust, AWS, Amazon Web Services, Linux, Dev Containers, ephemeral environments
Hard Skills
Kubernetes, K8s, OpenTelemetry, Prometheus, distributed tracing, Terraform, Pulumi, Infrastructure as Code, CI/CD, Go, Rust, Amazon Web Services, AWS, Linux, service mesh
Soft Skills
leadership, communication, collaboration, strategic thinking, problem solving, stakeholder management, team leadership, mentoring, decision making, influencing
Keywords for Your Resume
Principal Software Development Engineer, Cloud Platform, Kubernetes, K8s, Golden Path, service mesh, observability, OpenTelemetry, Prometheus, distributed tracing, Infrastructure as Code, Terraform, Pulumi, CI/CD, Go, Rust, AWS, Amazon Web Services, ephemeral environments, AI/ML
Deal Breakers
Lack of Kubernetes or cloud platform experience; inability to make high-impact architectural decisions; no experience with Infrastructure as Code (Terraform/Pulumi); unwillingness to operate in a demanding on-call environment