Monday, March 18, 2024

NVIDIA rolls out 800G Networking Switches

NVIDIA introduced its X800 series of 800G InfiniBand and Ethernet networking switches, designed for massive-scale AI workloads, including those that use NVIDIA's newly unveiled Blackwell architecture-based products.

Quantum-X800 Platform: Features the NVIDIA Quantum Q3400 switch and the NVIDIA ConnectX-8 SuperNIC, delivering 5 times higher bandwidth and a 9 times increase in In-Network Computing capabilities compared to previous generations. Advanced features include:

  • The Q3400-RA 4U switch, the first to utilize 200 Gbps-per-lane serializer/deserializer (SerDes) technology, offers 144 ports of 800 Gbps distributed over 72 OSFP cages and a dedicated management port for NVIDIA UFM (Unified Fabric Manager) connectivity.
  • With this very high radix, a two-level fat tree topology can connect up to 10,368 network interface cards (NICs).
  • NVIDIA SHARP v4, Message Passing Interface (MPI) tag matching, MPI_Alltoall, and programmable cores boost NVIDIA In-Network Computing.
  • Adaptive routing: The switch and ConnectX-8 SuperNIC, working together, maximize bandwidth and ensure network resilience for AI fabrics.
  • Telemetry-based congestion control: These techniques provide noise isolation for multi-tenant AI workloads.
  • The Q3400 is air-cooled and compatible with standard 19-inch rack cabinets. A parallel liquid-cooled system, Q3400-LD, fitting an Open Compute Project (OCP) 21-inch rack, is offered as well.
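
The port and fat-tree figures in the list above follow from simple arithmetic. Below is a minimal sketch in Python, assuming a non-blocking two-level fat tree in which every leaf switch dedicates half of its ports to NICs and half to spine uplinks. The function names are illustrative and not part of any NVIDIA tool, and the aggregate bandwidth figure is derived here rather than quoted from NVIDIA.

```python
def two_level_fat_tree_endpoints(radix: int) -> int:
    """Maximum endpoints in a non-blocking two-level fat tree.

    Assumption: each leaf uses radix/2 ports for NICs and radix/2 for uplinks,
    and each spine connects to up to `radix` leaves.
    """
    leaves = radix
    nics_per_leaf = radix // 2
    return leaves * nics_per_leaf


def aggregate_bandwidth_tbps(ports: int, port_speed_gbps: int) -> float:
    """Sum of port speeds in one direction, in Tbps (derived, not quoted)."""
    return ports * port_speed_gbps / 1000


# Figures taken from the article.
Q3400_PORTS = 144
OSFP_CAGES = 72
PORT_SPEED_GBPS = 800
LANE_SPEED_GBPS = 200

lanes_per_port = PORT_SPEED_GBPS // LANE_SPEED_GBPS   # SerDes lanes per 800G port
ports_per_cage = Q3400_PORTS // OSFP_CAGES            # 800G ports per OSFP cage
max_nics = two_level_fat_tree_endpoints(Q3400_PORTS)

print(lanes_per_port)   # 4
print(ports_per_cage)   # 2
print(max_nics)         # 10368
print(aggregate_bandwidth_tbps(Q3400_PORTS, PORT_SPEED_GBPS))  # 115.2
```

Under these assumptions, 144 leaves with 72 NICs each give 144 x 72 = 10,368 endpoints, which matches the figure cited for the Q3400.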

The NVIDIA ConnectX-8 SuperNIC delivers 800 Gbps networking with performance isolation for multi-tenant generative AI clouds. It provides 800 Gbps of data throughput over PCI Express (PCIe) Gen6 and offers up to 48 lanes for use cases such as PCIe switching inside NVIDIA GPU systems. It also supports advanced NVIDIA In-Network Computing, MPI_Alltoall, and MPI tag-matching hardware engines, as well as fabric enhancement features such as quality of service and congestion control. The ConnectX-8 SuperNIC, available with single-port OSFP224 and dual-port quad small form-factor pluggable (QSFP112) connectors, is compatible with various form factors, including OCP 3.0 and Card Electromechanical (CEM) PCIe x16. It also supports NVIDIA Socket Direct 16-lane auxiliary card expansion.


Spectrum-X800 Platform: Tailored for AI cloud and enterprise infrastructure, offering optimized performance for faster processing and analysis of AI workloads. It includes the Spectrum SN5600 800 Gbps switch and the NVIDIA BlueField-3 SuperNIC. Highlights:

  • The Spectrum-X800 SN5600 switch provides 64 ports of 800G OSFP and 51.2 terabits per second (Tbps) of switching capacity.
  • Remote direct memory access (RDMA) over Converged Ethernet (RoCE) adaptive routing: Spectrum-X800 features adaptive routing for lossless networks, closely integrating the switch and SuperNIC to boost bandwidth and resilience in AI fabrics.
  • Programmable congestion control: Spectrum-X800 uses advanced congestion control techniques to enhance noise isolation in multi-tenant AI environments.
  • Software Support: NVIDIA provides a suite of network acceleration libraries and software to enhance the performance of trillion-parameter AI models, including the NVIDIA Collective Communications Library (NCCL) for extending GPU computing tasks to the Quantum-X800 network.
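
The 51.2 Tbps switching capacity quoted for the SN5600 is consistent with its port count. A short Python check, assuming the capacity figure counts each port once at its line rate (the helper name is hypothetical):

```python
def switching_capacity_tbps(ports: int, port_speed_gbps: int) -> float:
    """Port count times per-port line rate, converted from Gbps to Tbps."""
    return ports * port_speed_gbps / 1000


# Figures taken from the article.
SN5600_PORTS = 64
SN5600_PORT_SPEED_GBPS = 800

sn5600_capacity = switching_capacity_tbps(SN5600_PORTS, SN5600_PORT_SPEED_GBPS)
print(sn5600_capacity)  # 51.2
```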

“NVIDIA Networking is central to the scalability of our AI supercomputing infrastructure,” said Gilad Shainer, senior vice president of Networking at NVIDIA. “NVIDIA X800 switches are end-to-end networking platforms that enable us to achieve trillion-parameter-scale generative AI essential for new AI infrastructures.”

Initial adopters of Quantum InfiniBand and Spectrum-X Ethernet include Microsoft Azure and Oracle Cloud Infrastructure.

“AI is a powerful tool to turn data into knowledge. Behind this transformation is the evolution of data centers into high-performance AI engines with increased demands for networking infrastructure,” said Nidhi Chappell, Vice President of AI Infrastructure at Microsoft Azure. “With new integrations of NVIDIA networking solutions, Microsoft Azure will continue to build the infrastructure that pushes the boundaries of cloud AI.”

https://www.nvidia.com