Responsibilities:
- Implement new technologies on top of Kubernetes to ensure the tech stack can scale with the business.
- Develop and integrate automated solutions to improve and maintain reliability across the production estate and simplify recovery.
- Design and track metrics for site uptime and performance, ensuring high levels of visibility.
- Own deployment pipelines and continuously improve monitoring and alerting capabilities.
- Collaborate closely with other engineering functions to provide timely feedback from environments.
- Support Engineering in delivering better software, faster and more safely.
Requirements:
- Strong systems administration skills and solid experience with Linux.
- Understanding of the difference between containers and virtual machines.
- Platform engineering/SRE experience at leading startups or fast-growing tech companies.
- Experience working in a regulated industry.
- Confidence in guiding developers on monitoring and logging of complex systems at scale.
- Experience working on complex projects.
- Habit of using AI agents to assist in researching and solving problems.
- Ability to work collaboratively with Security, Data, and Engineering teams.
- Desire to forge and own MoonPay’s reliability and recovery processes.
- Basic understanding of complex reliability structures, theories, principles, and best practices.
- Experience working with JavaScript codebases and the surrounding ecosystem, e.g. TypeScript, Node.js, and React.
