Position Details
About this role
NVIDIA is seeking a Senior Cloud Operations Engineer for the NGC Cloud team to improve efficiency, reliability, and scalability of systems powering global operations. You will automate CI/CD workflows with GitLab, monitor system health with leading observability tools, and support secure and compliant operational processes.
Key Responsibilities
- Automate build, test, and deployment using GitLab CI/CD pipelines
- Monitor system health with dashboards, alerts, and operational reporting
- Perform secure user offboarding, access reviews, and compliance tasks
- Maintain API performance and integration stability to meet SLAs and SLOs
- Coordinate releases and vulnerability remediation while maintaining documentation and SOPs
Technical Overview
The role centers on operating cloud services and GitLab CI/CD automation pipelines, with monitoring through Prometheus, Grafana, Datadog, CloudWatch, and Splunk. You will also work across security, reliability, and ITSM processes while maintaining and debugging integrations backed by Java and data stores including Cassandra, DynamoDB, and Redis.
Ideal Candidate
The ideal candidate is a senior cloud operations engineer with 8+ years of hands-on experience supporting complex services and strong automation skills in Python. They have practical monitoring experience using Prometheus, Grafana, Datadog, CloudWatch, and Splunk, and understand ITSM processes for incident, problem, and change management. They can operate securely and compliantly, maintain GitLab CI/CD pipelines, and work confidently with Java, relational databases, and NoSQL data stores such as Cassandra, DynamoDB, and Redis.
Deal Breakers
- 8+ years of hands-on experience building and supporting complex services
- Python automation knowledge
- Experience with monitoring tools: Prometheus, Grafana, Datadog, CloudWatch, Splunk
- Familiarity with ITSM practices, including incident, problem, and change management processes
- Knowledge of core Java (Collections API, Streams API, Concurrency, I/O)
- Knowledge of RDBMS and NoSQL (Cassandra, DynamoDB, Redis)