Tuesday, April 9, 2024

Intel debuts Gaudi 3 AI processor

Intel introduced its Gaudi 3 AI accelerator for training and inference on popular large language models (LLMs) and multimodal models.

The Gaudi 3 processor delivers 4x the AI compute for BF16, 1.5x the memory bandwidth, and 2x the networking bandwidth of its predecessor, enabling massive system scale-out.

Features and Improvements:

  • The Gaudi 3 AI accelerator is built on a 5nm process, offering advanced efficiency for large-scale AI compute.
  • It includes 64 AI-custom Tensor Processor Cores (TPCs) and eight Matrix Multiplication Engines (MMEs). According to Intel, each MME can perform 64,000 parallel operations, improving computational efficiency for deep learning workloads.
  • The accelerator offers 128GB of HBM2e memory, 3.7TB/s of memory bandwidth, and 96MB of on-board SRAM, facilitating large GenAI dataset processing with enhanced performance and cost efficiency.
  • It features twenty-four 200Gb Ethernet ports for flexible, open-standard networking and efficient system scaling from single nodes to large clusters.
  • It integrates with the PyTorch framework and provides optimized community-based models for ease of use and productivity in GenAI development.
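
To put the networking and memory figures in the list above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 1000 GB, 1 Tb = 1000 Gb), treats the quoted numbers as peak values, and ignores protocol and access overhead. The function names are illustrative only and do not come from Intel.

```python
# Spec figures quoted in the feature list above.
ETHERNET_PORTS = 24
PORT_SPEED_GBIT_S = 200       # gigabits per second, per port
HBM_CAPACITY_GB = 128         # gigabytes
HBM_BANDWIDTH_TB_S = 3.7      # terabytes per second


def aggregate_ethernet_tbit_s(ports, gbit_per_port):
    """Raw combined line rate of all ports, in terabits per second."""
    return ports * gbit_per_port / 1000


def full_memory_sweep_ms(capacity_gb, bandwidth_tb_s):
    """Idealized time to read the entire HBM once at peak bandwidth, in ms."""
    seconds = capacity_gb / (bandwidth_tb_s * 1000)
    return seconds * 1000


if __name__ == "__main__":
    net = aggregate_ethernet_tbit_s(ETHERNET_PORTS, PORT_SPEED_GBIT_S)
    sweep = full_memory_sweep_ms(HBM_CAPACITY_GB, HBM_BANDWIDTH_TB_S)
    print(f"Aggregate Ethernet line rate: {net:.1f} Tb/s")
    print(f"Idealized full-HBM read time: {sweep:.1f} ms")
```

Under these assumptions, the 24 ports add up to 4.8 Tb/s of raw line rate per direction, and one full pass over the 128GB of HBM takes roughly 35 ms at peak bandwidth. Real workloads would see lower effective numbers.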

Performance Projections, according to Intel: 

  • Compared to Nvidia's H100, Gaudi 3 is projected to deliver 50% faster time-to-train and 50% higher inference throughput on various LLMs, including Llama and GPT-3 models, along with significantly better inference power efficiency.
  • Compared to Nvidia's H200, it is projected to deliver 30% faster inferencing on similar models.

Market Adoption and Availability:

  • OEM availability is scheduled for Q2 2024, with general availability in Q3 2024 and the PCIe add-in card in Q4 2024.
  • Notable OEMs like Dell Technologies, HPE, Lenovo, and Supermicro will market the Gaudi 3.
  • The Gaudi 3 accelerator aims to power cost-effective cloud LLM infrastructures, offering performance and cost advantages.

Strategic Importance:

  • The accelerator supports critical sectors transitioning GenAI projects to full-scale implementations, requiring open, cost-effective, and energy-efficient solutions.
  • It is designed for scalability, performance, and energy efficiency to meet enterprise needs for return on investment and operational efficiency.
  • The momentum of Gaudi 3 accelerators is foundational for the development of Falcon Shores, Intel’s next-generation GPU, combining Intel Gaudi and Intel Xe IP under a unified programming interface based on the Intel oneAPI specification.

“In the ever-evolving landscape of the AI market, a significant gap persists in the current offerings. Feedback from our customers and the broader market underscores a desire for increased choice. Enterprises weigh considerations such as availability, scalability, performance, cost, and energy efficiency. Intel Gaudi 3 stands out as the GenAI alternative presenting a compelling combination of price performance, system scalability, and time-to-value advantage,” stated Justin Hotard, Intel executive vice president and general manager of the Data Center and AI Group.

https://www.intel.com/content/www/us/en/newsroom/news/vision-2024-gaudi-3-ai-accelerator.html#gs.7ploog