Tuesday, October 18, 2022

Meta unveils Grand Teton, its next-gen AI system

At the Open Compute Project Summit in San Jose, Meta unveiled Grand Teton, its next-generation, GPU-based hardware platform. Compared to Zion, its predecessor, Grand Teton boasts 4x the host-to-GPU bandwidth, 2x the compute and data network bandwidth, and 2x the power envelope. Grand Teton also has an integrated chassis, in contrast to Zion-EX, which comprises multiple independent subsystems.

Grand Teton has been designed with greater compute capacity to better support memory-bandwidth-bound workloads at Meta, such as its open source deep learning recommendation models (DLRMs). Grand Teton’s expanded operational compute power envelope also optimizes it for compute-bound workloads, such as content understanding.

The previous-generation Zion platform consists of three boxes (a CPU head node, a switch sync system, and a GPU system) and requires external cabling to connect them. Grand Teton integrates these into a single chassis with fully integrated power, control, compute, and fabric interfaces for better overall performance, signal integrity, and thermal performance.

This high level of integration dramatically simplifies the deployment of Grand Teton, allowing it to be introduced into data center fleets faster and with fewer potential points of failure, while providing rapid scale with increased reliability.

Meta also introduced Open Rack v3 (ORV3), a data center rack with a frame and power infrastructure capable of supporting a wide range of use cases — including support for Grand Teton.

ORV3’s power shelf isn’t bolted to the busbar. Instead, the power shelf installs anywhere in the rack, which enables flexible rack configurations. Multiple shelves can be installed on a single busbar to support 30kW racks, while 48VDC output will support the higher power transmission needs of future AI accelerators. ORV3 also features an improved battery backup unit that extends backup time to four minutes, up from the previous model’s 90 seconds, with a power capacity of 15kW per shelf. Like the power shelf, this backup unit installs anywhere in the rack for customization and provides 30kW when installed as a pair.

https://engineering.fb.com/2022/10/18/open-source/ocp-summit-2022-grand-teton/