TL;DR

SRE Engineer (System Engineering): Improve the reliability and performance of a large-scale cloud platform, with an emphasis on high availability, incident management, and automation. The role focuses on designing monitoring solutions, managing SLIs/SLOs, and building self-healing systems to ensure stable operations and fast incident resolution.

Location: Remote, with a presence in multiple regions, including the USA, Europe, and Latin America

Salary: €2350–€3700 per month (gross)

Company

Andersen cooperates with global leaders in FinTech, Healthcare, Retail, Telecom, and other industries, providing software development and IT services.

What you will do

Maintain high system availability and reliability across cloud environments.
Design and implement monitoring, logging, and alerting solutions.
Define and manage SLIs, SLOs, and error budgets.
Lead incident response, root cause analysis, and post-mortems.
Automate operational tasks and improve CI/CD pipelines.
Partner with engineering teams to enhance application resilience and participate in capacity planning and disaster recovery.

Requirements

5+ years of experience as a Site Reliability Engineer or Incident Management Engineer.
Strong incident escalation skills and experience with Azure cloud and Kubernetes administration.
Experience with containerization (Docker), Infrastructure as Code (Terraform), monitoring/APM tools, log aggregation, and distributed tracing.
English level: Upper-Intermediate (B2) or higher.
Spanish or Portuguese level: Intermediate+ or higher.

Culture & Benefits

Work with leaders in multiple industries, with opportunities for professional growth.
Mentoring and onboarding programs for new employees.
Additional earning opportunities of up to $1,000/month.
Access to a corporate training portal and compensation for certifications.
Private health insurance, sports activity compensation, English courses, and vibrant corporate life.

SRE Engineer with Spanish/Portuguese
