Thursday, April 20, 2023

Broadcom delivers Jericho3-AI switching fabric

Broadcom has begun shipping its Jericho3-AI switching chip, which is designed to deliver high-performance Ethernet fabrics for GPU clusters running AI workloads.

The Jericho3-AI fabric offers 26 petabits per second of Ethernet bandwidth, nearly four times that of the previous generation, while delivering 40 percent lower power per gigabit. The chip also supports load balancing, congestion-free operation, ultra-high radix, and zero-impact failover.
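A quick back-of-the-envelope check suggests the headline bandwidth figure lines up with the cluster scale listed in the highlights below (32,000 GPUs at 800 Gbps each). This assumes the 26 Pbps number refers to that maximum configuration; Broadcom's announcement does not spell out the derivation.

```latex
32{,}000 \times 800\ \text{Gb/s} = 25{,}600{,}000\ \text{Gb/s} = 25.6\ \text{Pb/s} \approx 26\ \text{Pb/s}
```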

Broadcom said it optimized the chip for the distinctive traffic patterns of AI workloads, such as a small number of large, long-lived flows that all start simultaneously when an AI computation cycle completes.
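Why does that traffic pattern matter? Conventional Ethernet fabrics typically pin each flow to one path by hashing its headers (ECMP). With only a few very large flows, hash collisions are almost guaranteed, so some links saturate while others sit idle. The sketch below is illustrative only: the SHA-256 "hash", the 16-link and 400 Gb/s figures, and the helper names are hypothetical stand-ins, not Broadcom's design.

```python
import hashlib
from collections import Counter
from math import prod


def ecmp_link(flow_id: str, num_links: int) -> int:
    """Pick an uplink by hashing a flow identifier (stand-in for a 5-tuple hash)."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links


def max_link_load(num_flows: int, num_links: int, flow_gbps: float) -> float:
    """Offered load on the busiest link when every flow is pinned to one hashed link."""
    counts = Counter(ecmp_link(f"flow-{i}", num_links) for i in range(num_flows))
    return max(counts.values()) * flow_gbps


def prob_no_collision(num_flows: int, num_links: int) -> float:
    """Birthday-problem odds that uniform hashing gives every flow its own link."""
    if num_flows > num_links:
        return 0.0
    return prod((num_links - i) / num_links for i in range(num_flows))


if __name__ == "__main__":
    # Hypothetical: 16 uplinks carrying 16 concurrent 400 Gb/s "elephant" flows.
    links, flows, rate = 16, 16, 400.0
    print(f"ideal per-link load: {flows * rate / links:.0f} Gb/s")
    print(f"busiest link under hashing: {max_link_load(flows, links, rate):.0f} Gb/s")
    print(f"chance of zero collisions: {prob_no_collision(flows, links):.2e}")
```

For 16 flows on 16 links, the odds of a collision-free placement work out to 16!/16^16, roughly one in a million, so in practice at least one link carries two or more flows and becomes the bottleneck for the whole training step.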

“The benchmark for AI networking is reducing the time and effort it takes to complete the training and inference of large-scale AI models,” said Ram Velaga, senior vice president and general manager, Core Switching Group, Broadcom. “Jericho3-AI delivers significant reduction in job completion time compared to any other alternative in the market.”

Highlights

  • "Perfect load balancing" evenly sprays traffic across all links of the fabric, ensuring maximum network utilization under the highest network loads.
  • Congestion-free operation with end-to-end traffic scheduling ensures no flow collisions and no jitter.
  • Ultra-high radix uniquely allows the Jericho3-AI fabric to scale connectivity to 32,000 GPUs, each connected at 800 Gbps, in a single cluster.
  • Zero-Impact Failover functionality ensures sub-10ns automatic path convergence, resulting in no impact to job completion time.
  • Long-reach SerDes, distributed buffering, and advanced telemetry, all provided using industry-standard Ethernet. 
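The "spraying" approach in the first highlight avoids hash collisions by distributing traffic across every fabric link instead of pinning a flow to one path. Broadcom's announcement does not detail the mechanism, so the sketch below simply assumes per-packet round-robin over all links; the function names and the flow, packet, and link counts are hypothetical.

```python
from collections import Counter
from itertools import cycle


def spray(packets_per_flow: int, num_flows: int, num_links: int) -> Counter:
    """Count packets per link when all flows' packets are round-robined over every link."""
    link_iter = cycle(range(num_links))  # one shared round-robin pointer
    per_link = Counter()
    for _ in range(packets_per_flow):
        for _flow in range(num_flows):  # interleave the flows packet by packet
            per_link[next(link_iter)] += 1
    return per_link


def imbalance(per_link: Counter, num_links: int) -> int:
    """Packet-count gap between the busiest and the idlest link."""
    loads = [per_link.get(link, 0) for link in range(num_links)]
    return max(loads) - min(loads)


if __name__ == "__main__":
    # Hypothetical: 16 large flows of 1,000 packets each over 16 fabric links.
    loads = spray(1000, 16, 16)
    print("imbalance:", imbalance(loads, 16))  # prints "imbalance: 0"
```

Round-robin keeps every link within one packet of every other, no matter how few flows there are. The trade-off is that packets of one flow take different paths and can arrive out of order, so a real fabric needs reordering or reassembly at the egress, which this sketch does not model.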

“Cloud operators will upgrade their AI infrastructure to address the massive growth in bandwidth, driven by a new generation of high-capacity GPUs and the emergence of large language models,” said Bob Wheeler, principal analyst at Wheeler’s Network. “Jericho3-AI offers a high-bandwidth, low-latency and low-power choice for networks connecting tens of thousands of GPUs, revolutionizing the economics of building and maintaining AI clusters for this exciting new era.”

https://www.broadcom.com/company/news/product-releases/61156