Monday, July 24, 2023

Cerebras builds Condor Galaxy AI supercomputer cluster

Cerebras Systems and G42, the UAE-based technology holding group, are building a network of nine interconnected supercomputers, offering a new approach to AI compute that promises to significantly reduce AI model training time. 

The first AI supercomputer on this network, Condor Galaxy 1 (CG-1), is located in Santa Clara, California. CG-1 links 64 Cerebras CS-2 systems into a single system. It supports models of up to 600 billion parameters, with extendable configurations that support up to 100 trillion parameters. CG-1 has 54 million AI-optimized compute cores and 388 terabits per second of fabric bandwidth, and is fed by 72,704 AMD EPYC processor cores. Cerebras says that, unlike any known GPU cluster, CG-1 delivers near-linear performance scaling from 1 to 64 CS-2 systems using simple data parallelism.

Cerebras and G42 offer CG-1 as a cloud service, allowing customers to enjoy the performance of an AI supercomputer without having to manage or distribute models over physical systems.

Cerebras and G42 are planning to deploy two more such supercomputers, CG-2 and CG-3, in the U.S. in early 2024. The full nine-supercomputer network has a planned total capacity of 36 exaFLOPs, which the companies say will accelerate the advancement of AI globally.

“Collaborating with Cerebras to rapidly deliver the world’s fastest AI training supercomputer and laying the foundation for interconnecting a constellation of these supercomputers across the world has been enormously exciting. This partnership brings together Cerebras’ extraordinary compute capabilities, together with G42’s multi-industry AI expertise. G42 and Cerebras’ shared vision is that Condor Galaxy will be used to address society’s most pressing challenges across healthcare, energy, climate action and more,” said Talal Alkaissi, CEO of G42 Cloud, a subsidiary of G42.

“Delivering 4 exaFLOPs of AI compute at FP 16, CG-1 dramatically reduces AI training timelines while eliminating the pain of distributed compute,” said Andrew Feldman, CEO of Cerebras Systems. “Many cloud companies have announced massive GPU clusters that cost billions of dollars to build, but that are extremely difficult to use. Distributing a single model over thousands of tiny GPUs takes months of time from dozens of people with rare expertise. CG-1 eliminates this challenge. Setting up a generative AI model takes minutes, not months and can be done by a single person. CG-1 is the first of three 4 exaFLOP AI supercomputers to be deployed across the U.S. Over the next year, together with G42, we plan to expand this deployment and stand up a staggering 36 exaFLOPs of efficient, purpose-built AI compute.”
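As a quick sanity check on the figures quoted above, the short Python sketch below derives rough per-system numbers for CG-1 and the planned network total. The variable names are illustrative only, and the network total assumes that all nine supercomputers match CG-1's 4 exaFLOPs, which the announcement states explicitly only for the first three.

```python
# Figures quoted in the article for CG-1 and the planned network.
CS2_SYSTEMS = 64                 # CS-2 systems linked in CG-1
TOTAL_AI_CORES = 54_000_000      # AI-optimized compute cores (rounded figure)
EPYC_CORES = 72_704              # AMD EPYC processor cores feeding CG-1
CG1_EXAFLOPS = 4                 # AI compute at FP16
PLANNED_SUPERCOMPUTERS = 9       # size of the planned network

# Per-system values implied by dividing the CG-1 totals evenly across 64 systems.
ai_cores_per_system = TOTAL_AI_CORES / CS2_SYSTEMS
epyc_cores_per_system = EPYC_CORES / CS2_SYSTEMS
petaflops_per_system = CG1_EXAFLOPS * 1000 / CS2_SYSTEMS

# Assumption: every supercomputer in the network delivers the same 4 exaFLOPs as CG-1.
network_exaflops = PLANNED_SUPERCOMPUTERS * CG1_EXAFLOPS

print(f"AI cores per CS-2 (approx.): {ai_cores_per_system:,.0f}")
print(f"EPYC cores per CS-2: {epyc_cores_per_system:,.0f}")
print(f"PetaFLOPs per CS-2: {petaflops_per_system}")
print(f"Planned network total (exaFLOPs): {network_exaflops}")
```

Under that assumption, nine supercomputers at 4 exaFLOPs each give the 36 exaFLOPs cited for the full network.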

https://www.cerebras.net/press-release/cerebras-and-g42-unveil-worlds-largest-supercomputer-for-ai-training-with-4-exaflops-to-fuel-a-new-era-of-innovation