TL;DR

SRE Engineer (System Engineering): Improve the reliability and performance of a large-scale cloud platform, with an emphasis on high availability, incident management, and automation. The role focuses on designing monitoring solutions, managing SLIs/SLOs, and building self-healing systems to ensure stable operations and fast incident resolution.

Location: Remote, with presence in multiple regions, including the USA, Europe, and Latin America

Salary: €2350–€3700 per month (gross)

Company

Andersen cooperates with global leaders in FinTech, Healthcare, Retail, Telecom, and other industries, providing software development and IT services.

What you will do

  • Maintain high system availability and reliability across cloud environments.
  • Design and implement monitoring, logging, and alerting solutions.
  • Define and manage SLIs, SLOs, and error budgets.
  • Lead incident response, root cause analysis, and post-mortems.
  • Automate operational tasks and improve CI/CD pipelines.
  • Partner with engineering teams to enhance application resilience, and participate in capacity planning and disaster recovery.

Requirements

  • 5+ years of experience as a Site Reliability Engineer or Incident Management Engineer.
  • Strong incident escalation skills and experience with Azure cloud and Kubernetes administration.
  • Experience with containerization (Docker), Infrastructure as Code (Terraform), monitoring/APM tools, log aggregation, and distributed tracing.
  • English level: Upper-Intermediate (B2) or higher.
  • Spanish or Portuguese level: Intermediate+ or higher.

Culture & Benefits

  • Work with leaders in multiple industries, with opportunities for professional growth.
  • Mentoring and onboarding programs for new employees.
  • Additional earning opportunities of up to $1000/month.
  • Access to the corporate training portal and reimbursement of certification costs.
  • Private health insurance, compensation for sports activities, English courses, and a vibrant corporate life.