Responsibilities:

  • Implement new technologies on top of Kubernetes to ensure the tech stack can scale with the business.
  • Develop and integrate automated solutions to improve and maintain reliability across the production estate and simplify recovery.
  • Design and track metrics for site uptime and performance, ensuring high levels of visibility.
  • Own deployment pipelines and continuously improve monitoring and alerting capabilities.
  • Collaborate closely with other engineering functions to provide timely feedback from environments.
  • Support Engineering in delivering better software, faster and more safely.

Requirements:

  • Strong systems administration skills and solid experience with Linux.
  • Understanding of the difference between containers and virtual machines.
  • Platform engineering/SRE experience at leading startups or fast-growing tech companies.
  • Experience working in a regulated industry.
  • Confidence in guiding developers on monitoring and logging of complex systems at scale.
  • Experience working on complex projects.
  • Habit of using AI agents to assist in researching and solving problems.
  • Ability to work collaboratively with Security, Data, and Engineering teams.
  • Desire to forge and own MoonPay’s reliability and recovery processes.
  • Basic understanding of complex reliability structures, theories, principles, and best practices.
  • Experience working with JavaScript and TypeScript codebases and their ecosystem, e.g. Node.js and React.