Wednesday, July 19, 2023

Ultra Ethernet Consortium targets networking for AI and HPC

The newly established Ultra Ethernet Consortium (UEC) aims to bring together leading companies for industry-wide cooperation to build a complete Ethernet-based communication stack architecture for high-performance networking.

The aim is to capitalize on Ethernet's ubiquity and flexibility for handling a wide variety of workloads in Artificial Intelligence (AI) and High-Performance Computing (HPC).

Founding members of the Ultra Ethernet Consortium, which is a Joint Development Foundation project hosted by The Linux Foundation, include AMD, Arista, Broadcom, Cisco, Eviden (an Atos Business), HPE, Intel, Meta and Microsoft.

The consortium will follow a systematic approach built on modular, compatible, interoperable layers, tightly integrated to provide a holistic improvement for demanding workloads. The founding companies are seeding the consortium with contributions in four working groups: Physical Layer, Link Layer, Transport Layer and Software Layer.

The technical goals for the consortium are to develop specifications, APIs, and source code to define:

  1. Protocols, electrical and optical signaling characteristics, application program interfaces and/or data structures for Ethernet communications.
  2. Link-level and end-to-end network transport protocols to extend or replace existing link and transport protocols.
  3. Link-level and end-to-end congestion, telemetry and signaling mechanisms; each of the foregoing suitable for artificial intelligence, machine learning and high-performance computing environments.
  4. Software, storage, management and security constructs to facilitate a variety of workloads and operating environments.

"This isn't about overhauling Ethernet," said Dr. J Metz, Chair of the Ultra Ethernet Consortium. "It's about tuning Ethernet to improve efficiency for workloads with specific performance requirements. We're looking at every layer - from the physical all the way through the software layers - to find the best way to improve efficiency and performance at scale."  

“The next era of computing will be characterized by breakthrough advancements in AI and AI-optimized infrastructure, and Microsoft is committed to empowering organizations to push the bounds of what is possible with the power of Azure. Joining forces to develop a common set of standards to enhance Ethernet for hyperscale AI and high-performance computing workloads will help enable continued innovation now and in the future,” said Steve Scott, Corporate Vice President of Azure Hardware Architecture at Microsoft.

https://ultraethernet.org

Wondering whether to use InfiniBand or Ethernet for building large-scale clustered GPU environments?


https://youtu.be/Job_3rlSby0