TL;DR

Site Reliability Engineer: Ensure the reliability, scalability, and performance of the systems powering Studocu's internal operations and workflows, with an emphasis on infrastructure reliability and scaling workflow systems. The role focuses on turning Kestra into a platform used across the company and on improving monitoring and observability.

Location: Must be in the Amsterdam office 2 days per week.

Company

Studocu makes it easy for 60 million students, across 120,000 institutions, to share 50 million study resources.

What you will do

  • Design, build, and maintain reliable production infrastructure supporting Kestra workflows.
  • Operate and improve Kubernetes-based infrastructure.
  • Manage and optimize PostgreSQL databases in production environments.
  • Scale a workflow orchestration system handling thousands of tasks per minute.
  • Improve monitoring, observability, and system performance.

Requirements

  • 5+ years of experience in Site Reliability Engineering or Infrastructure.
  • Strong experience running production systems at scale.
  • Deep knowledge of Kubernetes.
  • Strong experience with PostgreSQL.
  • Experience with observability tools such as Grafana, Prometheus/Mimir, or Loki.
  • Experience with AWS is a must; experience with other cloud providers is a bonus.

Nice to have

  • Experience working with workflow orchestration systems.
  • Prior experience with Kestra.
  • Strong communication skills and ability to translate technical topics into clear explanations.

Culture & Benefits

  • Awesome office in the centre of Amsterdam with a garden.
  • Hybrid working policy - in office 2 days per week.
  • 24 holiday days, plus the option to buy additional leave and to work abroad for 4 weeks.
  • Learning & Development budget of 2,500 Euros per year.
  • Company boat trips and an open in-office bar on Fridays!
  • Pension scheme with 2% contribution match.