TL;DR

Senior Hardware Support Engineer (AI): Own production hardware reliability across large-scale data center environments, with an emphasis on hardware engineering, operations, and vendor coordination. Focus on rapid root cause identification and continuous improvement of server and platform reliability.

Location: Remote within the United States. Occasional travel may be required.

Salary: $125,000 – $180,000 per year plus annual performance-based bonus.

Company

Nebius is leading a new era in cloud computing to serve the global AI economy.

What you will do

  • Lead root cause analysis for complex hardware and firmware failures across production fleets.
  • Aggregate recurring problems and error patterns to identify systemic reliability issues.
  • Coordinate with vendors to drive timely diagnostics, RMAs, firmware fixes, and corrective actions.
  • Partner with internal engineering teams to validate fixes and prevent recurrence.
  • Improve hardware observability, failure tracking, and reporting processes.
  • Contribute to long-term hardware reliability strategy and fleet-wide stability improvements.

Requirements

  • Strong hands-on expertise with server hardware in data center or large-scale production environments.
  • Proven experience performing root cause analysis of hardware and firmware failures.
  • Deep understanding of server components (CPU, memory, storage, networking, power, BMC) and failure modes.
  • Experience working directly with hardware vendors and engineering teams to resolve production issues.
  • Structured problem-solving skills, applying formal IT or incident management methodologies.
  • Clear written and verbal communication skills in cross-functional environments.

Nice to have

  • Experience in GPU-dense, AI, or high-performance computing environments.
  • Exposure to firmware lifecycle management and large-scale rollout validation.
  • Familiarity with Linux-based production systems and infrastructure tooling.
  • Experience improving fleet-wide hardware reliability metrics at scale.

Culture & Benefits

  • Comprehensive medical, dental, and vision coverage.
  • 401(k) plan with company contribution.
  • Flexible paid time off.
  • Paid parental leave.
  • Professional development support.
  • Flexible working arrangements.