About this role
This role builds and maintains tooling and processes to monitor, triage, and troubleshoot issues across the full network-to-application stack in cloud environments. You will own end-to-end availability and performance, automate responses to non-exceptional service conditions, and perform end-to-end testing for a mission-critical push-to-talk system.
Key Responsibilities
- Understand complex network and application infrastructure and build monitoring/triage tools
- Troubleshoot OS/Networking/Database issues in cloud-based SaaS and handle live production incidents
- Follow and implement SRE best practices and own end-to-end availability and performance
- Automate response to non-exceptional service conditions and build automation to prevent recurrence
- Perform end-to-end testing and system analysis/configuration management to improve performance, availability, and reliability
Technical Overview
You will operate in a cloud-based SaaS environment, applying SRE best practices to handle live production incidents and to improve application performance and stability. The position emphasizes end-to-end availability, build automation, configuration management, and comprehensive testing for network and push-to-talk communications reliability.
Ideal Candidate
The ideal candidate is a mid-level, SRE/DevOps-focused Software Engineer with experience monitoring, triaging, and troubleshooting complex network and application infrastructure in public and private cloud environments. They have hands-on incident response experience and strong knowledge of SRE best practices, and they can drive end-to-end availability, automation, and end-to-end testing for mission-critical systems.
Must-Have Skills
- Understand complex network and application infrastructure
- Monitor
- Triage and troubleshoot issues
- Follow and implement SRE best practices
- Handle live production incidents
- Own end-to-end availability and performance of key network and application services
- Perform end to end testing to ensure the overall quality of Mission Critical Push to talk System
- Configuration management
Tools & Platforms
- public cloud environments
- private cloud environments
- cloud-based SaaS environment
Required Skills
network and application infrastructure, monitor, triage, troubleshoot, public and private cloud environments, OS, Networking, Database, cloud-based SaaS environment, SRE best practices, live production incidents, debug/troubleshoot, application performance, stability, end-to-end availability, build automation, automate response to all non-exceptional service conditions, end to end testing, configuration management, push to talk communications solutions, Mission Critical Push to talk System
Hard Skills
network and application infrastructure, monitoring, triage, troubleshoot issues, public and private cloud environments, troubleshooting OS, troubleshooting Networking, troubleshooting Database, cloud-based SaaS environment, SRE best practices, live production incidents, debug/troubleshoot application and infrastructure issues, application performance monitoring, improve overall application performance, stability, end-to-end availability, performance of key network and application services, automation, build automation to prevent problem recurrence, automate response to all non-exceptional service conditions, end to end testing, system analysis, configuration management, system software performance, availability and reliability, push to talk communications solutions, reliability and maintainability, hardware, software processes, network facilities, controls, security systems, Mission Critical Push to talk System
Soft Skills
- communication
- cross-platform issue troubleshooting
- problem-solving
- ownership mindset
- collaboration
- attention to detail
- operational rigor
Keywords for Your Resume
Software Engineer, network and application infrastructure, monitor, triage, troubleshoot, public and private cloud environments, OS, Networking, Database, cloud-based SaaS environment, SRE best practices, live production incidents, debug/troubleshoot, application performance, stability, end-to-end availability, build automation, automate response, non-exceptional service conditions, end to end testing, configuration management, push to talk communications solutions, reliability and maintainability, Mission Critical Push to talk System, automate response to all non-exceptional service conditions
Deal Breakers
- Must have the stated education requirement: Master's degree OR Bachelor's degree + 2 years of experience
- Must be able to follow and implement SRE best practices
- Must have experience troubleshooting network, OS, and database issues in a cloud-based SaaS environment