TL;DR
Site Reliability Engineer (Azure): Drive stability, reliability, and performance across multi-cloud platforms, with an emphasis on application-level troubleshooting and proactive incident management. The role focuses on building robust observability systems, automating toil, and improving resilience through AI-Ops and IaC practices.
Location: Must be based in the United States and authorized to work without sponsorship
Salary: $129,000–$143,000 + bonus + benefits
Company
Avaya is a leader in enterprise software and cloud communications, focused on unifying customer experiences and creating enduring digital connections.
What you will do
- Respond to and manage incidents as part of a 24/7 on-call rotation.
- Perform deep-dive troubleshooting of application performance, logs, and traces.
- Build and maintain observability dashboards using Datadog, Grafana, and cloud-native tools.
- Define SLOs and SLIs to proactively identify reliability risks.
- Integrate AI-Ops tools for predictive alerting and automated incident correlation.
- Lead post-incident reviews and implement systemic improvements to prevent recurring issues.
Requirements
- 5+ years of experience in Site Reliability, DevOps, or Cloud Operations roles.
- Must be authorized to work in the United States without visa sponsorship.
- Strong expertise in Azure and GCP cloud environments.
- Proficiency in IaC tools like Terraform and Ansible.
- Experience with CI/CD pipelines (Jenkins, GitHub Actions).
- Proven ability to perform deep-dive application troubleshooting and log analysis.
Culture & Benefits
- Comprehensive benefits package.
- Performance-related annual bonus.
- Collaborative work environment focused on engineering growth.
- Equal Opportunity employer committed to diversity and equality.
