About this role
Nscale is hiring a Principal Observability Platform Engineer for its Global CISO Organization to design, implement, and scale observability and security for its GPU cloud infrastructure. The role focuses on platform security and operational excellence across compute, networking, storage, and control plane systems.
Key Responsibilities
- Lead security and observability engineering across distributed multi-tenant infrastructure
- Harden Kubernetes, virtualization layers, GPU workloads, and platform services
- Strengthen identity, authentication, authorization, and secrets management
- Embed automated security validation and guardrails into CI/CD pipelines
- Conduct deep technical design reviews and threat modeling exercises
Technical Overview
You will lead security and observability initiatives across distributed, multi-tenant infrastructure, hardening Kubernetes, virtualization layers, and GPU workloads. Responsibilities include improvements to identity, authentication, authorization, and secrets management; secure network segmentation; and automated security validation embedded into CI/CD, all supported by threat modeling and technical design reviews.
Ideal Candidate
The ideal candidate is a Principal-level security and observability engineer with 10+ years of hands-on experience securing and instrumenting cloud, hyperscale, and large distributed systems. They have deep expertise in Linux systems internals, Kubernetes/container security, Infrastructure-as-Code (Terraform), and identity and access management, with a proven track record securing multi-tenant environments at scale.
Must-Have Skills
- 10+ years of hands-on security or observability engineering experience in cloud, hyperscale, or large distributed systems
- Strong software engineering skills (Go, Python, Rust, or similar)
- Linux systems internals
- Kubernetes and container security
- Infrastructure-as-Code (Terraform or equivalent)
- Identity and access management
- Proven experience securing multi-tenant environments at scale
- Harden Kubernetes
- Strengthen identity, authentication, authorization, and secrets management systems
- Embed automated security validation and guardrails into CI/CD pipelines
- Conduct deep technical design reviews and threat modeling exercises
Nice-to-Have Skills
- Building observability platforms and telemetry pipelines
- Familiarity with GPU cloud infrastructure or AI workloads
- Exposure to distributed tracing, metrics, and log aggregation tools
Tools & Platforms
Go, Python, Rust, Terraform, Kubernetes, CI/CD pipelines, distributed tracing, metrics, log aggregation tools
Required Skills
security and observability engineering, distributed systems, multi-tenant environments, Go, Python, Rust, Linux systems internals, Kubernetes, container security, Infrastructure-as-Code, Terraform, cloud-native architectures, network security and segmentation, identity and access management, authentication, authorization, secrets management systems, secure segmentation, traffic isolation strategies, CI/CD pipelines, automated security validation, threat modeling, technical design reviews, observability platforms, telemetry pipelines, distributed tracing, metrics, log aggregation tools
Hard Skills
security and observability engineering, distributed systems, multi-tenant environments, Go, Python, Rust, Linux systems internals, Kubernetes, container security, Infrastructure-as-Code (Terraform), cloud-native architectures, network security and segmentation, identity and access management, authentication, authorization, secrets management systems, secure segmentation, traffic isolation strategies, CI/CD pipelines, automated security validation, threat modeling, technical design reviews, virtualization layers, GPU workloads, platform security, operational excellence, observability platforms, telemetry pipelines, distributed tracing, metrics, log aggregation tools
Soft Skills
ownership, accountability, mentoring and developing junior engineers, raising the technical bar, architectural decision making, partnering cross-functionally, communication as a subject matter expert, openness and transparency, collaboration with Networking teams, deep technical collaboration with CISO
Keywords for Your Resume
Principal Observability Platform Engineer, observability, security, platform security, operational excellence, distributed, multi-tenant infrastructure, multi-tenant environments at scale, Linux systems internals, Kubernetes and container security, container security, Infrastructure-as-Code, Terraform, cloud-native architectures, network security and segmentation, identity and access management, authentication, authorization, secrets management, CI/CD pipelines, threat modeling, technical design reviews, distributed tracing, metrics, log aggregation, GPU workloads
Deal Breakers
- 10+ years of hands-on security or observability engineering experience
- Strong software engineering skills (Go, Python, Rust, or similar)
- Deep expertise in Kubernetes and container security
- Proven experience securing multi-tenant environments at scale