TL;DR

Site Reliability Engineer (Azure): Drive stability, reliability, and performance across multi-cloud platforms, with an emphasis on application-level troubleshooting and proactive incident management. The role focuses on building robust observability systems, automating toil, and improving resilience through AI-Ops and IaC practices.

Location: Must be based in the United States and authorized to work there without visa sponsorship

Salary: $129,000–$143,000 + bonus + benefits

Company

Avaya is a leader in enterprise software and cloud communications, focused on unifying customer experiences and creating enduring digital connections.

What you will do

  • Respond to and manage incidents as part of a 24/7 on-call rotation.
  • Perform deep-dive troubleshooting of application performance, logs, and traces.
  • Build and maintain observability dashboards using Datadog, Grafana, and cloud-native tools.
  • Define SLOs and SLIs to proactively identify reliability risks.
  • Integrate AI-Ops tools for predictive alerting and automated incident correlation.
  • Lead post-incident reviews and implement systemic improvements to prevent recurring issues.

Requirements

  • 5+ years of experience in Site Reliability, DevOps, or Cloud Operations roles.
  • Must be authorized to work in the United States without visa sponsorship.
  • Strong expertise in Azure and GCP cloud environments.
  • Proficiency in IaC tools like Terraform and Ansible.
  • Experience with CI/CD pipelines (Jenkins, GitHub Actions).
  • Proven ability in deep-dive application troubleshooting and log analysis.

Culture & Benefits

  • Comprehensive benefits package.
  • Performance-related annual bonus.
  • Collaborative work environment focused on engineering growth.
  • Equal Opportunity employer committed to diversity and equality.