TL;DR

Senior Site Reliability Engineer (ClickHouse): Build and lead processes that ensure the reliability, availability, and scalability of cloud infrastructure, with an emphasis on distributed systems design, incident management, and performance optimization. The role focuses on developing software platforms and tools that improve operational efficiency for our serverless ClickHouse Cloud.

Salary: $141,000–$230,000 USD

Company

A fast-growing, innovative data company providing real-time analytics and data warehousing solutions.

What you will do

  • Design and implement scalable, secure, and high-availability systems for cloud database infrastructure.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
  • Enhance incident response processes and conduct blameless post-mortems for outages.
  • Develop software platforms and tools to optimize operational and engineering efficiency.
  • Drive chaos engineering initiatives to improve system resilience.
  • Manage on-call processes and coordinate cross-team escalations.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • 8+ years of experience in Site Reliability Engineering or related roles.
  • Hands-on experience with Go and/or Python.
  • Strong knowledge of cloud platforms such as AWS, Azure, or GCP.
  • Experience with container orchestration tools like Kubernetes or Docker Swarm.
  • Proficiency in automation and configuration management tools (Ansible, Terraform, Puppet).

Nice to have

  • Previous experience using ClickHouse in production environments.
  • Deep understanding of distributed databases and SQL.

Culture & Benefits

  • Flexible work environment with a globally distributed team.
  • Employer contributions towards healthcare.
  • Equity in the company provided to all new team members.
  • Flexible time off policy.
  • Home office setup stipend of $500.
  • Opportunities for global in-person gatherings and offsites.