TL;DR

Lead Site Reliability Engineer (System Engineering): Manage and support cloud and hosted web-based applications with an emphasis on operational stability, incident resolution, and continuous process improvement. Focus on supporting multi-cloud environments, container orchestration, and disaster recovery tasks.

Location: Remote from Mexico

Company

FICO is a leading global analytics software company serving businesses in over 100 countries, specializing in big data analytics, AI, and machine learning.

What you will do

  • Support the full stack of public cloud, private cloud, and hosted web applications.
  • Drive incident resolution and manage operational stability following best practices.
  • Enhance support processes to prevent recurring problems.
  • Implement and perform disaster recovery tasks on supported applications.
  • Contribute to continuous improvement of operational procedures.

Requirements

  • Location: Must be based in Mexico and able to work remotely.
  • Strong problem-solving and analytical skills.
  • Excellent English communication skills (minimum B2 level).
  • Experience with Red Hat Enterprise Linux, AWS cloud services, Docker, Kubernetes, and observability tools such as Kibana, Splunk, and Prometheus.
  • Knowledge of networking, storage management, LDAP, and ITIL standards.
  • Willingness to work shifts from Thursday to Monday.

Culture & Benefits

  • Inclusive culture reflecting core values: ownership, customer delight, and respect.
  • Opportunities for professional development and learning.
  • Competitive compensation and benefits programs.
  • People-first work environment promoting work-life balance and social interaction.