TL;DR

Senior ML Infrastructure Engineer (AI): Design and build robust infrastructure and backend services for production-grade ML and LLM systems, with an emphasis on scalability, reliability, and reproducibility. The role focuses on building and optimizing deployment pipelines, developing tooling for model versioning, and making research prototypes production-ready, all to improve the quality of behavioral health care.

Location: Tel Aviv District, Israel

Company

Eleos Health develops a behavioral health CareOps automation platform that uses advanced AI to convert therapy conversations into actionable clinical insights.

What you will do

  • Design and maintain production infrastructure supporting scalable ML and LLM systems.
  • Develop and manage automated training and deployment pipelines for machine learning models.
  • Build internal tooling for model experimentation, versioning, and reproducibility.
  • Implement CI/CD workflows tailored for ML and model deployment.
  • Ensure the reliability of AI systems in production through observability and monitoring.
  • Collaborate with data scientists to bridge the gap between research prototypes and scalable production systems.

Requirements

  • 5+ years of industry experience in ML Infrastructure or Backend Engineering.
  • Strong proficiency in Python development for production systems.
  • Hands-on experience with cloud platforms (AWS, GCP, or Azure).
  • Experience with containerization and orchestration tools such as Docker and Kubernetes.
  • Proven track record of building CI/CD pipelines for backend or ML services.
  • Experience deploying and managing LLMs or other ML models in production environments.

Nice to have

  • Experience with prompt engineering and LLM lifecycle management.
  • Familiarity with data versioning tools like DVC or Pachyderm.
  • Hands-on knowledge of MLOps platforms such as MLflow or Kubeflow.