TL;DR

Member of Technical Staff, Software Co-Design AI HPC Systems (AI, HPC): Architecting and co-designing next-generation AI systems at datacenter scale, with an emphasis on optimizing end-to-end performance, efficiency, and reliability across hardware and software. Focus on translating insights from real-world AI workloads into concrete improvements, guiding hardware roadmaps, and driving architectural decisions for large-scale AI platforms.

Location: Hybrid in the United Kingdom. Office attendance is required at least four days a week if living within 25 miles of a designated Microsoft office.

Company

Microsoft AI's Superintelligence Team is a startup-like team within Microsoft AI, focused on pushing the boundaries of AI towards Humanist Superintelligence by creating controllable, safety-aligned, and human-values-anchored ultra-capable systems.

What you will do

Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory, storage, runtimes, and distributed frameworks.
Drive architectural decisions by analyzing real workloads, identifying bottlenecks, and translating findings into actionable system and hardware requirements.
Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, and cost efficiency of large-scale AI systems.
Develop and evaluate what-if performance models to project system behavior and provide early guidance to hardware and platform roadmaps.
Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators.
Influence and guide AI hardware design at system and silicon levels, including microarchitecture, interconnect topology, and memory hierarchy.
Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas across infrastructure, hardware, and product teams.
Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor and performance engineering.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field, or equivalent practical experience.
10+ years of experience working across systems software, hardware architecture, or AI infrastructure, with demonstrated impact at scale.
Strong background in one or more of the following: AI accelerator/GPU architectures, distributed AI training/inference, high-performance computing (HPC), ML systems, runtimes/compilers, performance modeling, or hardware–software co-design for AI workloads.
Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Must be able to work in a hybrid format from a designated Microsoft office in the United Kingdom.

Nice to have

Experience designing or operating large-scale AI clusters for training or inference.
Deep familiarity with LLMs, multimodal models, or recommendation systems, and their systems-level implications.
Experience with accelerator interconnects and communication stacks (e.g., NCCL, MPI, RDMA, high-speed Ethernet or InfiniBand).
Background in performance modeling and capacity planning for future hardware generations.
Prior experience contributing to or leading hardware roadmaps, silicon bring-up, or platform architecture reviews.
Publications, patents, or open-source contributions in systems, architecture, or ML systems.

Culture & Benefits

Mission to empower every person and organization to achieve more, grounded in a growth mindset, innovation, and collaboration.
Culture of inclusion, respect, integrity, and accountability.
Team operates with end-to-end ownership, deep technical rigor, and a strong bias toward real-world impact.
Active contribution to the broader research and engineering community through publishing, prototyping, and open-sourcing impactful technologies.
Opportunity to partner with product teams to reach billions of users and create immense positive impact.

Member of Technical Staff, Software Co-Design AI HPC Systems – MAI Superintelligence Team

Job description