Tuesday, November 28, 2023

Infrastructure notes from AWS re:Invent 2023

At AWS re:Invent 2023 in Las Vegas, Adam Selipsky, CEO of Amazon Web Services, delivered a 2.5-hour keynote covering the latest announcements and cloud strategies, with a heavy emphasis on AI.

Here are infrastructure highlights:

Introducing Amazon S3 Express One Zone - Seventeen years after launching S3 cloud storage, AWS is introducing Amazon S3 Express One Zone, its highest-performance, lowest-latency storage class. Amazon S3 Express One Zone is the lowest-latency cloud object storage available, with data access speeds up to 10 times faster and request costs up to 50% lower than Amazon S3 Standard, accessible from any AWS Availability Zone within an AWS Region.
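
To make the new storage class concrete, here is a minimal boto3 sketch of creating an S3 Express One Zone directory bucket and reading an object back. The bucket name, Availability Zone ID, and object key are placeholders, and the directory-bucket parameters assume a boto3 release recent enough to support the new storage class.

    import boto3

    s3 = boto3.client("s3", region_name="us-west-2")

    # Directory buckets use a zone-suffixed naming convention
    # (placeholder bucket name and AZ ID).
    bucket = "demo-bucket--usw2-az1--x-s3"

    # Create the bucket in a single, chosen Availability Zone.
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={
            "Location": {"Type": "AvailabilityZone", "Name": "usw2-az1"},
            "Bucket": {"Type": "Directory", "DataRedundancy": "SingleAvailabilityZone"},
        },
    )

    # Reads and writes use the familiar object API.
    s3.put_object(Bucket=bucket, Key="hot/data.bin", Body=b"low-latency payload")
    print(s3.get_object(Bucket=bucket, Key="hot/data.bin")["Body"].read())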

Introducing AWS Graviton4 processor - In 2018, AWS introduced its Graviton processor, followed by Graviton2 in 2020 and then Graviton3. There are already 150 EC2 instance types that use Graviton processors, offering price/performance benefits. For example, SAP is using Graviton for its HANA service.

The new Graviton4 CPU delivers up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than the current-generation Graviton3 processor. Graviton4 also raises the bar on security by fully encrypting all high-speed physical hardware interfaces.

AWS is now previewing R8g instances based on Graviton4, enabling customers to improve the execution of their high-performance databases, in-memory caches, and big data analytics workloads. R8g instances offer larger instance sizes, with up to 3x more vCPUs and 3x more memory than current-generation R7g instances.
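
As a sketch of what the preview looks like in practice, the boto3 call below launches a Graviton4-based R8g instance. The AMI ID is a placeholder (any arm64-compatible AMI would be needed), the instance size is illustrative, and availability is limited while R8g remains in preview.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch a single Graviton4 (arm64) instance.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder arm64 AMI
        InstanceType="r8g.4xlarge",        # illustrative size
        MinCount=1,
        MaxCount=1,
    )
    print(resp["Instances"][0]["InstanceId"])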

Introducing Trainium2 - the new processor is designed to deliver up to 4x faster training than first-generation Trainium chips and can be deployed in EC2 UltraClusters of up to 100,000 chips.

An expanded partnership with NVIDIA: AWS will offer the first cloud AI supercomputer featuring the NVIDIA Grace Hopper Superchip and AWS UltraCluster scalability, based on multi-node NVLink technology.

NVLink can connect 32 Grace Hopper Superchips via a new NVLink switch. Each GH200 Superchip combines an Arm-based Grace CPU with a Hopper-architecture GPU on the same module.

A single Amazon EC2 instance with GH200 NVL32 can provide up to 20 TB of shared memory to power terabyte-scale workloads. These instances will take advantage of AWS’s third-generation EFA interconnect, providing up to 400 Gbps per Superchip of low-latency, high-bandwidth networking throughput, enabling customers to scale to thousands of GH200 Superchips in EC2 UltraClusters.

Liquid Cooling: AWS instances with GH200 NVL32 will be the first AI infrastructure on AWS to feature liquid cooling.

NVIDIA GH200-powered EC2 instances will feature 4.5 TB of HBM3e memory—a 7.2x increase compared to current generation H100-powered EC2 P5d instances—allowing customers to run larger models while improving training performance. Additionally, the CPU-to-GPU memory interconnect provides up to 7x higher bandwidth than PCIe, enabling chip-to-chip communications that extend the total memory available for applications.

NVIDIA DGX Cloud comes to AWS, powered by GH200 NVL32 NVLink infrastructure. DGX Cloud is NVIDIA's AI factory, supporting use cases such as weather simulation and digital biology.

NVIDIA Project Ceiba - named after the most magnificent tree in the Amazon, the project will connect 16,384 GPUs into one giant supercomputer. NVIDIA estimates this will cut the training time of the largest LLMs in half. The system will deliver 65 exaflops, like 65 supercomputers combined into one system for training models.

AWS will introduce three additional Amazon EC2 instance types: P5e instances, powered by NVIDIA H200 Tensor Core GPUs, for large-scale and cutting-edge generative AI and HPC workloads; and G6 and G6e instances, powered by NVIDIA L4 GPUs and NVIDIA L40S GPUs, respectively, for a wide set of applications such as AI fine-tuning, inference, graphics, and video workloads.

Flexible UltraCluster usage - AWS is targeting fluctuating demand for cluster capacity. Amazon EC2 Capacity Blocks for ML lets customers reserve hundreds of GPUs in a single cluster for a defined period. This should push the envelope on price performance for ML workloads.
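
A hedged sketch of the flow, assuming the EC2 Capacity Blocks API as exposed through boto3: search for an available offering inside a date window, then purchase it. The instance type, count, dates, and duration are all illustrative.

    from datetime import datetime, timedelta
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Find capacity block offerings for a 2-day, 4-instance GPU reservation
    # inside an illustrative two-week window.
    start = datetime(2024, 1, 15)
    offerings = ec2.describe_capacity_block_offerings(
        InstanceType="p5.48xlarge",
        InstanceCount=4,
        StartDateRange=start,
        EndDateRange=start + timedelta(days=14),
        CapacityDurationHours=48,
    )

    # Reserve the first matching offering.
    offering_id = offerings["CapacityBlockOfferings"][0]["CapacityBlockOfferingId"]
    ec2.purchase_capacity_block(
        CapacityBlockOfferingId=offering_id,
        InstancePlatform="Linux/UNIX",
    )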

Amazon SageMaker is being used by tens of thousands of customers, and includes support for Hugging Face models.
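
For context on the Hugging Face integration, here is a minimal sketch using the SageMaker Python SDK to deploy a Hugging Face Hub model behind a real-time endpoint. The role ARN, model ID, and container versions are placeholders; supported version combinations are listed in the SageMaker documentation.

    from sagemaker.huggingface import HuggingFaceModel

    # Placeholder role ARN and Hugging Face Hub model ID.
    model = HuggingFaceModel(
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        env={
            "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
            "HF_TASK": "text-classification",
        },
        transformers_version="4.26",   # illustrative; check supported combos
        pytorch_version="1.13",
        py_version="py39",
    )

    # Deploy to an endpoint and run one prediction.
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    print(predictor.predict({"inputs": "re:Invent keynotes are long but useful."}))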

Amazon Bedrock introduced a number of features, including the ability to apply guardrails to all large language models (LLMs), including fine-tuned models, and Agents for Amazon Bedrock. Guardrails can be used to define denied topics and content filters to remove undesirable and harmful content from interactions between users and your applications.
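
As a sketch of how a guardrail might be defined through boto3 (the feature was announced in preview, so details may shift), the call below creates a guardrail with one denied topic and two content filters. All names, definitions, and messages are placeholders.

    import boto3

    bedrock = boto3.client("bedrock", region_name="us-east-1")

    # Define a guardrail with a denied topic and content filters.
    guardrail = bedrock.create_guardrail(
        name="support-bot-guardrail",
        topicPolicyConfig={
            "topicsConfig": [{
                "name": "InvestmentAdvice",
                "definition": "Recommendations about specific financial investments.",
                "type": "DENY",
            }]
        },
        contentPolicyConfig={
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            ]
        },
        blockedInputMessaging="Sorry, I can't discuss that topic.",
        blockedOutputsMessaging="Sorry, I can't provide that response.",
    )
    print(guardrail["guardrailId"])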

Update on Project Kuiper satellite broadband - Amazon is making a big bet by building its own LEO satellite constellation. The first two prototype satellites were launched in October 2023. AWS plans to offer an enterprise service, along with a global consumer broadband service, and expects that developers will be able to begin testing in the second half of 2024.