Intel introduced its Gaudi 3 AI accelerator for training and inference on popular large language models (LLMs) and multimodal models.
Compared with its predecessor, Gaudi 2, the Gaudi 3 processor delivers 4x the BF16 AI compute, 1.5x the memory bandwidth, and 2x the networking bandwidth for massive system scale-out.
Features and Improvements:
- The Gaudi 3 AI accelerator is built on a 5nm process, offering advanced efficiency for large-scale AI compute.
- It includes 64 AI-custom Tensor Processor Cores (TPCs) and eight Matrix Multiplication Engines (MMEs), each MME capable of performing 64,000 parallel operations, improving the efficiency of deep learning computations.
- The accelerator carries 128GB of HBM2e memory, 3.7TB/s of memory bandwidth, and 96MB of on-board SRAM, supporting large GenAI dataset processing with enhanced performance and cost efficiency.
- It features twenty-four 200-gigabit Ethernet ports for flexible, open-standard networking and efficient system scaling from single nodes to large clusters (see the distributed-training sketch after this list).
- It integrates with the PyTorch framework and provides optimized community-based models for ease of use and developer productivity in GenAI work (a minimal usage sketch follows this list).
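
To illustrate the PyTorch integration mentioned above, here is a minimal sketch of a single training step on a Gaudi device. It assumes the Intel Gaudi (Habana) PyTorch bridge (`habana_frameworks.torch`) and the lazy-execution flow documented for earlier Gaudi generations; exact module names, execution modes, and autocast support may differ in Gaudi 3 software releases.

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device type (assumed bridge module)

device = torch.device("hpu")

model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 1024, device=device)
target = torch.randn(32, 1024, device=device)

# BF16 autocast mirrors the datatype Intel highlights for Gaudi 3 compute;
# treat the hpu autocast path as an assumption based on earlier-generation docs.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), target)

loss.backward()
htcore.mark_step()   # in lazy mode, flush the accumulated graph to the accelerator
optimizer.step()
htcore.mark_step()
```

In this flow the model code stays standard PyTorch; the device string and the `mark_step()` calls are the main Gaudi-specific additions.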
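For the Ethernet-based scale-out path referenced in the networking bullet, multi-card and multi-node jobs typically run through `torch.distributed` with Habana's HCCL collective backend. The sketch below follows the flow documented for earlier Gaudi generations (the `habana_frameworks.torch.distributed.hccl` module and the `"hccl"` backend name are assumptions, not Gaudi 3-specific guidance).

```python
# Hedged sketch: distributed data-parallel setup on Gaudi cards using the HCCL
# backend shipped with the Intel Gaudi (Habana) software stack. Module and
# backend names follow earlier-generation documentation and may change.
import os
import torch
import torch.distributed as dist
import habana_frameworks.torch.core as htcore    # registers "hpu" devices (assumed)
import habana_frameworks.torch.distributed.hccl  # registers the "hccl" backend (assumed)

def main():
    # Rank and world size are normally injected by the launcher (e.g. torchrun or mpirun).
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)

    device = torch.device("hpu")
    model = torch.nn.Linear(1024, 1024).to(device)
    ddp_model = torch.nn.parallel.DistributedDataParallel(model)

    x = torch.randn(32, 1024, device=device)
    loss = ddp_model(x).pow(2).mean()
    loss.backward()      # gradients are all-reduced across cards over the Ethernet fabric
    htcore.mark_step()   # flush the lazy-mode graph to the accelerator

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```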
Performance Projections, according to Intel:
- Compared with Nvidia's H100, the Intel Gaudi 3 is projected to deliver roughly 50% faster time-to-train and about 50% higher inference throughput on average across LLMs such as Llama and GPT-3 models, along with better inference power efficiency.
- Inference is projected to be about 30% faster than Nvidia's H200 on comparable models.
Market Adoption and Availability:
- Scheduled for OEM availability in Q2 2024, with general availability in Q3 2024 and a PCIe add-in card in Q4 2024.
- Notable OEMs like Dell Technologies, HPE, Lenovo, and Supermicro will market the Gaudi 3.
- The Gaudi 3 accelerator aims to power cloud LLM infrastructure, offering both performance and cost advantages.
Strategic Importance:
- The accelerator supports critical sectors transitioning GenAI projects to full-scale implementations, which require open, cost-effective, and energy-efficient solutions.
- Designed for scalability, performance, and energy efficiency, meeting enterprise needs for return on investment and operational efficiency.
- The momentum of Gaudi 3 accelerators is foundational for the development of Falcon Shores, Intel’s next-generation GPU, combining Intel Gaudi and Intel Xe IP under a unified programming interface based on the Intel oneAPI specification.
"In the ever-evolving landscape of the AI market, a significant gap persists in the current offerings. Feedback from our customers and the broader market underscores a desire for increased choice. Enterprises weigh considerations such as availability, scalability, performance, cost, and energy efficiency. Intel Gaudi 3 stands out as the GenAI alternative presenting a compelling combination of price performance, system scalability, and time-to-value advantage," stated Justin Hotard, Intel executive vice president and general manager of the Data Center and AI Group.