TL;DR

Site Reliability Engineer: Ensure the reliability, scalability, and performance of the systems that power Studocu's internal operations and workflows, with an emphasis on infrastructure reliability and scaling workflow systems. You will focus on turning Kestra into a platform used across the company and on improving monitoring and observability.

Location: Must be in the Amsterdam office 2 days per week.

Company

Studocu makes it easy for 60 million students, across 120,000 institutions, to share 50 million study resources.

What you will do

Design, build, and maintain reliable production infrastructure supporting Kestra workflows.
Operate and improve Kubernetes-based infrastructure.
Manage and optimize PostgreSQL databases in production environments.
Scale a workflow orchestration system that handles thousands of tasks per minute.
Improve monitoring, observability, and system performance.

Requirements

5+ years of experience in Site Reliability Engineering or Infrastructure.
Strong experience running production systems at scale.
Deep knowledge of Kubernetes.
Strong experience with PostgreSQL.
Experience with observability tools such as Grafana, Prometheus/Mimir, or Loki.
Experience with AWS is a must; experience with other cloud providers is a bonus.

Nice to have

Experience working with workflow orchestration systems.
Prior experience with Kestra.
Strong communication skills and ability to translate technical topics into clear explanations.

Culture & Benefits

Awesome office with a garden in the centre of Amsterdam.
Hybrid working policy - in office 2 days per week.
24 holiday days, with options to buy extra leave and to work abroad for 4 weeks.
Learning & Development budget of 2,500 Euros per year.
Company boat trips and an open in-office bar on Fridays!
Pension scheme with 2% contribution match.
