TL;DR

Director of Engineering, Cluster Networking (AI): Leading the architecture, design, and engineering delivery of global cluster networking infrastructure for a GPU cloud platform, with an emphasis on high-performance networking fabrics and operational excellence. The role focuses on defining technical strategy, scaling networking environments for AI workloads, and building a world-class engineering organization.

Location: Remote (Global)

Salary: $150,000 – $300,000 USD

Company

Nscale is a GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers.

What you will do

Define and evolve the multi-year technical roadmap for cluster networking, aligning with AI platform requirements and growth.
Lead end-to-end engineering delivery of cluster networking solutions from design to production deployment and optimization.
Own the operational performance, availability, and reliability of cluster networking infrastructure globally.
Collaborate closely with Compute, Platform, SRE, Data Centre Operations, and Procurement teams to keep execution aligned.
Build, mentor, and scale a high-performing cluster networking engineering team.
Drive ongoing improvements in network efficiency, performance, and cost optimization.

Requirements

12+ years of experience in networking or infrastructure engineering, with at least 5 years in a senior technical leadership role (Head of Engineering, Director, or equivalent).
Deep hands-on experience designing and operating large-scale data centre or HPC networking environments.
Proven expertise in high-speed Ethernet and/or InfiniBand fabrics supporting GPU or AI workloads.
Strong background in data centre networking, routing protocols, congestion management, and high-availability design.
Experience leading globally distributed engineering teams in high-growth or hyperscale environments.
Ability to design scalable, resilient, high-performance cluster networking systems with an automation-first mindset.

Nice to have

Experience designing networking for large-scale GPU clusters or AI training environments.
Familiarity with HPC networking topologies (Fat Tree, Rail, Dragonfly).
Experience with SONiC, Cumulus, or other open networking platforms.
Knowledge of optics, transceivers, and high-speed interconnect standards (400G/800G).
Background in SRE, distributed systems, or large-scale cloud infrastructure.

Culture & Benefits

Collaborative, supportive, and innovative environment where your contributions have real impact.
Highly competitive package (base + equity) with reviews every 12 months.
Join a fast-growing tech startup pushing boundaries in AI.
Dynamic progression plan tailored to your ambitions with full support.
Human-first flexibility: we trust Nscalers to deliver, with the autonomy to shape their own day.
Remote-first team with seamless virtual collaboration and no geographical barriers.
