Amazon Web Services announced the general availability of Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML, a new consumption model that provides access to highly sought-after GPU compute capacity for short-duration machine learning (ML) workloads.
The service enables customers to reserve hundreds of NVIDIA GPUs colocated in Amazon EC2 UltraClusters designed for high-performance ML workloads. Customers can use EC2 Capacity Blocks with P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, by specifying their cluster size, future start date, and duration.
Amazon EC2 Capacity Blocks help ensure customers have reliable, predictable, and uninterrupted access to the GPU compute capacity required for their critical ML projects.
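The reservation workflow described above can be sketched with the AWS CLI. This is an illustrative example only: the instance type, dates, instance count, and offering ID below are placeholder values, and the exact parameter names should be verified against the current AWS CLI documentation.

```shell
# Search for available Capacity Block offerings for a 4-instance P5
# cluster running for 48 hours within an example date window.
aws ec2 describe-capacity-block-offerings \
    --instance-type p5.48xlarge \
    --instance-count 4 \
    --capacity-duration-hours 48 \
    --start-date-range 2023-11-10T00:00:00Z \
    --end-date-range 2023-11-20T00:00:00Z

# Purchase one of the offerings returned above (the offering ID here is
# a placeholder). This creates a Capacity Reservation that becomes
# active at the offering's scheduled start date.
aws ec2 purchase-capacity-block \
    --capacity-block-offering-id cbr-0123456789abcdef0 \
    --instance-platform Linux/UNIX
```

Once the reserved window begins, instances can be launched into the resulting Capacity Reservation like any other targeted reservation.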
“AWS and NVIDIA have collaborated for more than 12 years to deliver scalable, high-performance GPU solutions, and we are seeing our customers build incredible generative AI applications that are transforming industries,” said David Brown, vice president of Compute and Networking at AWS. “AWS has unmatched experience delivering NVIDIA GPU-based compute in the cloud, in addition to offering our own Trainium and Inferentia chips. With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire NVIDIA GPU capacity to build, train, and deploy their generative AI applications—without making long-term capital commitments. It’s one of the latest ways AWS is innovating to broaden access to generative AI capabilities.”