Monday, March 18, 2024

NVIDIA Blackwell and 5th Gen NVLink Advance AI

NVIDIA unveiled its Blackwell platform, the successor to the NVIDIA Hopper architecture launched two years ago. The platform is named in honor of David Harold Blackwell, a mathematician who specialized in game theory and statistics.

Blackwell leverages six new technologies to enable AI training and real-time LLM inference for models scaling up to 10 trillion parameters.

Blackwell Highlights

  • 208 billion transistors 
  • Manufactured using a custom-built TSMC 4NP process
  • Two reticle-limit GPU dies are connected by a 10 TB/s chip-to-chip link into a single, unified GPU
  • Blackwell introduces a 2nd gen Transformer Engine with new micro-tensor scaling support and NVIDIA’s advanced dynamic range management algorithms
  • Blackwell will support double the compute and model sizes with new 4-bit floating point AI inference capabilities
  • 5th gen NVLink delivers 1.8 TB/s bidirectional throughput per GPU, ensuring seamless high-speed communication among up to 576 GPUs for the most complex LLMs
  • Blackwell-powered GPUs include a dedicated engine for reliability, availability and serviceability
  • Support for new native interface encryption protocols
  • A dedicated decompression engine supports the latest formats
  • The NVIDIA GB200 Grace Blackwell Superchip connects two NVIDIA B200 Tensor Core GPUs to the NVIDIA Grace CPU over a 900 GB/s ultra-low-power NVLink chip-to-chip interconnect
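
As a rough illustration of why 4-bit floating point matters for model size, the sketch below estimates the memory needed to hold only the weights of a 10-trillion-parameter model at different precisions. The helper name `weight_bytes` is hypothetical, and the estimate is a simplifying assumption: it ignores scaling-factor metadata, activations, the KV cache and any other runtime overhead.

```python
# Back-of-envelope weight storage at different precisions.
# Assumption: weights only; scaling factors, activations and KV cache are ignored.

def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Return the bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param // 8

PARAMS = 10 * 10**12  # 10 trillion parameters, the upper figure quoted above

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    terabytes = weight_bytes(PARAMS, bits) / 1e12
    print(f"{name}: {terabytes:.1f} TB")
# FP16: 20.0 TB
# FP8: 10.0 TB
# FP4: 5.0 TB
```

Halving the bits per weight halves the storage, so under these assumptions the same memory holds a model with twice as many parameters.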

Building Bigger Systems

For the highest AI performance, GB200-powered systems can be connected with the NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms, also announced today, which deliver advanced networking at speeds up to 800 Gb/s.

The GB200 is a key component of the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale system for the most compute-intensive workloads. It combines 36 Grace Blackwell Superchips, comprising 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink. Additionally, GB200 NVL72 includes NVIDIA BlueField-3 data processing units to enable cloud network acceleration, composable storage, zero-trust security and GPU compute elasticity in hyperscale AI clouds. The GB200 NVL72 provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads, and reduces cost and energy consumption by up to 25x. The platform acts as a single GPU with 1.4 exaflops of AI performance and 30 TB of fast memory, and is a building block for the newest DGX SuperPOD.
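
The rack-level figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this post. The aggregate NVLink figure is a naive sum of per-GPU bandwidth and the per-GPU compute figure is an even split of the rack total, so both are illustrative assumptions rather than published specifications.

```python
# Cross-check of the GB200 NVL72 figures quoted in this post.
SUPERCHIPS = 36
GPUS_PER_SUPERCHIP = 2      # each GB200 pairs two B200 GPUs...
CPUS_PER_SUPERCHIP = 1      # ...with one Grace CPU
NVLINK_TBPS_PER_GPU = 1.8   # bidirectional TB/s per GPU (5th gen NVLink)
RACK_AI_EXAFLOPS = 1.4      # quoted AI performance of the whole rack

total_gpus = SUPERCHIPS * GPUS_PER_SUPERCHIP
total_cpus = SUPERCHIPS * CPUS_PER_SUPERCHIP

# Assumption: a simple sum of per-GPU bandwidth across the rack.
aggregate_nvlink_tbps = total_gpus * NVLINK_TBPS_PER_GPU

# Assumption: the rack total divides evenly across GPUs (1 exaflop = 1000 petaflops).
petaflops_per_gpu = RACK_AI_EXAFLOPS * 1000 / total_gpus

print(f"GPUs: {total_gpus}, CPUs: {total_cpus}")
print(f"Aggregate NVLink bandwidth: {aggregate_nvlink_tbps:.1f} TB/s")
print(f"AI performance per GPU: {petaflops_per_gpu:.1f} petaflops")
```

The GPU and CPU counts match the 72 and 36 stated above, which is where the "NVL72" name comes from.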

NVIDIA will also offer the HGX B200, a server board that links eight B200 GPUs through NVLink to support x86-based generative AI platforms. HGX B200 supports networking speeds up to 400 Gb/s through the NVIDIA Quantum-2 InfiniBand and Spectrum-X Ethernet networking platforms.

Global Network of Blackwell Partners

AWS, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will be among the first cloud service providers to offer Blackwell-powered instances, as will NVIDIA Cloud Partner program companies Applied Digital, CoreWeave, Crusoe, IBM Cloud and Lambda. 

Sovereign AI clouds will also provide Blackwell-based cloud services and infrastructure, including Indosat Ooredoo Hutchison, Nebius, Nexgen Cloud, Oracle EU Sovereign Cloud, the Oracle US, UK, and Australian Government Clouds, Scaleway, Singtel, Northern Data Group's Taiga Cloud, Yotta Data Services’ Shakti Cloud and YTL Power International.

Cisco, Dell, Hewlett Packard Enterprise, Lenovo and Supermicro are expected to deliver a wide range of servers based on Blackwell products, as are Aivres, ASRock Rack, ASUS, Eviden, Foxconn, GIGABYTE, Inventec, Pegatron, QCT, Wistron, Wiwynn and ZT Systems.

Additionally, a growing network of software makers, including Ansys, Cadence and Synopsys, global leaders in engineering simulation, will use Blackwell-based processors to accelerate their design and simulation software.

https://nvidianews.nvidia.com/news/nvidia-blackwell-platform-arrives-to-power-a-new-era-of-computing