Tuesday, August 29, 2023

NVIDIA previews AI accelerator

In a talk at this week’s Hot Chips event at Stanford University, Bill Dally, NVIDIA’s chief scientist and senior vice president of research, previewed a deep neural network (DNN) accelerator chip designed for efficient execution of natural language processing tasks.

The 5nm prototype achieves 95.6 TOPS/W in benchmarking and 1711 inferences/s/W with only 0.7% accuracy loss on BERT, demonstrating a practical accelerator design for energy-efficient inference with transformers.
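The inferences/s/W figure can be read as inferences per joule, since a watt is a joule per second. A quick back-of-the-envelope calculation on the number quoted above gives the implied energy cost of a single BERT inference:

```python
# inferences/s/W == inferences/J, because 1 W = 1 J/s
inferences_per_joule = 1711
energy_per_inference_mj = 1000 / inferences_per_joule  # convert J to mJ
print(f"~{energy_per_inference_mj:.2f} mJ per BERT inference")
```

That is, each BERT inference costs well under a millijoule at the quoted efficiency.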

He explored a half dozen other techniques for tailoring hardware to specific AI tasks, often by defining new data types or operations.

Dally described ways to simplify neural networks, pruning synapses and neurons in an approach called structural sparsity, first adopted in NVIDIA A100 Tensor Core GPUs.

“We’re not done with sparsity,” he said. “We need to do something with activations and can have greater sparsity in weights as well.”
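The structural sparsity Dally referenced is the 2:4 pattern supported by A100 Tensor Cores: in every group of four weights, the two smallest-magnitude values are zeroed, halving the effective weight count. A minimal NumPy sketch of this pruning rule (a simplified illustration, not NVIDIA's actual implementation):

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude values in each group of 4 weights
    (the 2:4 structured-sparsity pattern). Assumes size divisible by 4."""
    flat = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| in each group of four
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05],
              [-0.7, 0.6, 0.02, -0.3]])
pruned = prune_2_4(w)
# Each group of 4 now keeps only its 2 largest-magnitude weights
```

The hardware exploits the fixed 2-of-4 pattern to skip the zeroed multiplies, which is why the sparsity must be structured rather than arbitrary.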

In a separate talk, Kevin Deierling, NVIDIA’s vice president of networking, described the unique flexibility of NVIDIA BlueField DPUs and NVIDIA Spectrum networking switches for allocating resources based on changing network traffic or user rules.

“Today with generative AI workloads and cybersecurity, everything is dynamic, things are changing constantly,” Deierling said. “So we’re moving to runtime programmability and resources we can change on the fly.”


Google unveils its 5th gen TPU

Google unveiled its 5th generation Tensor Processing Unit (TPU v5e), promising up to 2x higher training performance per dollar and up to 2.5x inference performance per dollar for LLMs and gen AI models compared to Cloud TPU v4.

Each TPU v5e pod interconnects up to 256 chips, providing aggregate bandwidth of more than 400 Tb/s and 100 petaOps of INT8 performance.
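Dividing those pod-level numbers by the chip count gives a rough sense of the per-chip averages they imply (a back-of-the-envelope estimate, not an official per-chip spec):

```python
# Back-of-the-envelope per-chip averages implied by the pod-level figures
chips = 256
pod_int8_petaops = 100       # aggregate INT8 performance
pod_bandwidth_tbps = 400     # "more than 400 Tb/s", so treat as a floor

int8_tops_per_chip = pod_int8_petaops * 1000 / chips  # 1 petaOp = 1000 TOPS
bw_tbps_per_chip = pod_bandwidth_tbps / chips

print(f"~{int8_tops_per_chip:.0f} INT8 TOPS and "
      f"~{bw_tbps_per_chip:.2f} Tb/s per chip")
```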

TPU v5e also supports eight different VM configurations, ranging from one chip to more than 250 chips within a single slice. This allows customers to choose the right configurations to serve a wide range of LLM and gen AI model sizes.

Google Cloud also introduced Multislice technology, which allows users to scale AI models beyond the boundaries of physical TPU pods. This means that users can now scale their training jobs up to tens of thousands of Cloud TPU v5e or TPU v4 chips. Previously, training jobs using TPUs were limited to a single slice of TPU chips, capping the size of the largest jobs at a maximum slice size of 3,072 chips for TPU v4.


Intel previews 5th Gen Xeons

At this week’s Hot Chips event at Stanford University, Intel presented two sessions revealing technical specifications and features of the Xeon platform architecture and offerings coming in 2024, along with additional information on its forthcoming 5th Gen Intel Xeon® processors launching later this year. A separate session outlined new capabilities related to its Intel Agilex 9 Direct RF-Series FPGAs.

The new Intel Xeon platform utilizes modular system-on-chips (SoCs) for increased scalability and flexibility to deliver a range of products that meet the growing scale, processing and power efficiency needs for AI, cloud and enterprise installations.

  • P-core and E-core are delivered with shared intellectual property (IP), firmware and OS software stack.
  • Fastest DDR and new high-bandwidth multiplexed combined rank (MCR) DIMMs.
  • New Intel Flat Memory enables hardware-managed data movement between DDR5 and CXL memory, making total capacity visible to software.
  • CXL 2.0 support for all device types with backward compatibility to CXL 1.1.
  • Advanced I/O with up to 136 lanes PCIe 5.0/CXL 2.0 and up to six UPI links.

Intel Xeon processors with E-cores (Sierra Forest) are enhanced to deliver density-optimized compute in the most power-efficient manner. Xeon processors with E-cores provide best-in-class power-performance density, offering distinct advantages for cloud-native and hyperscale workloads.  

  • 2.5x better rack density and 2.4x higher performance per watt.
  • Support for 1S and 2S servers, with up to 144 cores per CPU and TDP as low as 200W.
  • Modern instruction set with robust security, virtualization and AVX with AI extensions.
  • Foundational memory RAS features such as machine check, data cache ECC standard in all Xeon CPUs.

Intel Xeon processors with P-cores (Granite Rapids) are optimized to deliver the lowest total cost of ownership (TCO) for high-core performance-sensitive workloads and general-purpose compute workloads. Today, Xeon enables better AI performance than any other CPU, and Granite Rapids will further enhance AI performance. Built-in accelerators give an additional boost to targeted workloads for even greater performance and efficiency.

  • 2-3x better performance for mixed AI workloads.
  • Enhanced Intel AMX with support for new FP16 instructions.
  • Higher memory bandwidth, core count, cache for compute intensive workloads.
  • Socket scalability from one socket to eight sockets.

Intel Agilex 9 Direct RF-Series FPGAs with Integrated 64 Gsps (giga-samples per second) Data Converters and a new wideband agility reference design include both wideband and narrowband receivers within the same multichip package. The wideband receiver provides an unprecedented 32 GHz of RF bandwidth to the FPGA.

Google Cloud and NVIDIA Expand Alliance for AI

Google Cloud and NVIDIA are expanding their collaboration, with a focus on AI infrastructure and software.


  • Google Cloud announced general availability next month of A3 VMs powered by NVIDIA’s H100 GPU for the most demanding gen AI and large language model (LLM) workloads. Google Cloud said the A3 VMs powered by the new H100s will achieve three times better training performance over the prior-generation A2.
  • NVIDIA H100 GPUs to power Google Cloud’s Vertex AI platform — H100 GPUs are expected to be generally available on Vertex AI in the coming weeks, enabling customers to quickly develop generative AI LLMs.
  • Google Cloud to gain access to NVIDIA DGX™ GH200 — Google Cloud will be one of the first companies in the world to have access to the NVIDIA DGX GH200 AI supercomputer — powered by the NVIDIA Grace Hopper™ Superchip — to explore its capabilities for generative AI workloads.
  • NVIDIA DGX Cloud Coming to Google Cloud — NVIDIA DGX Cloud AI supercomputing and software will be available to customers directly from their web browser to provide speed and scale for advanced training workloads.
  • NVIDIA AI Enterprise on Google Cloud Marketplace — Users can access NVIDIA AI Enterprise, a secure, cloud native software platform that simplifies developing and deploying enterprise-ready applications including generative AI, speech AI, computer vision, and more.
  • Google Cloud first to offer NVIDIA L4 GPUs — Earlier this year, Google Cloud became the first cloud provider to offer NVIDIA L4 Tensor Core GPUs with the launch of the G2 VM. NVIDIA customers switching to L4 GPUs from CPUs for AI video workloads can realize up to 120x higher performance with 99% better efficiency. L4 GPUs are used widely for image and text generation, as well as VDI and AI-accelerated audio/video transcoding.


GlobalFoundries unveils its most advanced RF solution

GlobalFoundries (GF) announced its new 9SW RFSOI technology, its most advanced RF solution for use in front-end modules (FEMs) for today's 5G operating frequencies, as well as future 5G mobile and wireless communication applications. The advanced 9SW technology will be manufactured on GF’s 300mm production line at its fabrication site in Singapore.

The 9SW technology improves on the company’s previous 8SW technology with superior switching, low-noise amplifiers (LNA), and logic processing capabilities. The 9SW technology features a significant reduction in standby currents for longer battery life, and enables products over 10% smaller than previous generations with more than 20% better efficiency. It is more energy-efficient, lowering power consumption in both inactive and active states.

“Developing high-speed, more reliable connectivity is critical for the key 5G mobile and next-generation applications that we all depend on to live, work and play,” said David Archbold, vice president of product marketing, Wireless Semiconductor Division, Broadcom. “With this new generation of technology, Broadcom will enable our customers to deliver the best possible 5G mobile experience to their end users.”

“Wireless connectivity comes in many shapes and sizes. GF has long-standing leadership in highly efficient wireless, whether it’s short, medium, or long range. The ever-increasing reliance on 5G and smart mobile devices combined with the explosion of data necessitates more connectivity and improved efficiency. GF’s newest generation of industry-leading RF SOI technologies will allow us to meet the global demand for solutions that deliver more reliable seamless connectivity, while using significantly less power,” said GF President and CEO Dr. Thomas Caulfield.


IONOS interconnects data centers with Infinera’s GX G30

IONOS, the largest web hosting company in Europe, deployed Infinera’s GX Series Compact Modular Platform to expand its network and interconnect data centers across Europe.

IONOS offers small- and medium-sized businesses web hosting and cloud services, managing more than 6 million customers and hosting over 22 million domains in its own regional data centers in Europe and the U.S.

The Infinera GX G30 compact modular solution enables IONOS to scale capacity and deliver 100 GbE and 400 GbE services to meet increased demand for data center connectivity to keep up with its growing cloud business.

“We continue to advance the capabilities of our network infrastructure and invest in best-in-class technologies to meet ever-increasing data center traffic volumes driven by today’s bandwidth-intensive end-user applications,” said Sebastian Hohwieler, IONOS Head of Network Infrastructure. 

“We are constantly improving the energy efficiency of our networks,” said Daniel Heinze, SVP Network at IONOS. “Our target is to lower energy consumption despite traffic growth. Although green electricity is the main source of supply for our data centers in all countries, the best is if energy is not consumed at all.”


Altafiber raises $600M for fiber rollouts

Cincinnati Bell, which is now doing business as “altafiber” in Ohio, Kentucky, and Indiana, announced $600 million in new funding to support the continued construction of fiber networks throughout incumbent and expansion markets. The equity raise was supported solely by existing investors: funds managed by Macquarie Asset Management and Ares Management, along with supporting co-investors.

In addition to continuing to upgrade its current footprint, these new funds will enable altafiber to expand its opportunistic edge-out and community partnership strategy, delivering on its mission to provide individuals and businesses with the fastest, most reliable, future-proof fiber network while maintaining its commitment to the communities it serves.

Since the close of the “take-private” transaction in September 2021, the company has passed an additional ~300,000 addresses with fiber, bringing its total to over 1.1 million homes passed. The company expects to complete the construction of fiber to every single-family unit within Greater Cincinnati by the end of 2023. The completion of the construction of Hawaii’s statewide fiber network is expected by the end of 2027.

In combination with previously announced expansion markets, altafiber now has plans to expand its multi-gigabit fiber network to ~400,000 homes outside of its incumbent territories. These new markets include communities in Ohio, Kentucky, and Indiana, including but not limited to Butler, Warren, and Greene Counties in Ohio; Boone, Kenton, and Campbell Counties in Kentucky; and regional cities such as Greater Dayton, Dublin, Middletown, Waynesville, Xenia, Lawrenceburg, and Greendale.

“Gigabit connectivity is essential to access educational, employment, and healthcare opportunities. Robust fiber networks are also powerful economic development tools for business attraction and retention in growing municipalities,” said Leigh Fox, President and CEO of altafiber. “The combination of our operational expertise and this new funding will allow us to continue to invest in fiber, expand our geographic reach, and help to create digital equity in many rural and suburban communities.”

Dell'Oro: Open RAN revenues declined in 2Q2023

Both Open RAN and Virtualized RAN (vRAN) revenues declined in 2Q23, according to a new report from Dell’Oro Group. This marks the first quarter of year-over-year (YoY) contraction since Dell’Oro began tracking these next-generation architectures in 2019.

“After a couple of years where Open RAN revenues exceeded expectations and advanced at an accelerated pace, the current slowdown doesn't come as a surprise,” said Stefan Pongratz, Vice President with the Dell’Oro Group. “Projections for 2023 were more tempered, considering that it would take time for the early majority operators to balance out the more challenging comparisons with the early adopters who fueled the initial Open RAN wave. This is the trend we are witnessing now – growth decelerated in the first quarter and declined in the second quarter,” continued Pongratz.

Additional Open RAN and vRAN highlights from the 2Q 2023 RAN report:

  • In Europe, Open RAN revenues were on the uptick, but this was insufficient to offset the declines in Asia Pacific and North America.

  • The vendor landscape remains mixed, as many Open RAN-focused suppliers are not thriving as they had hoped. NEC experienced a material improvement in its Open RAN market share between 2022 and 1H23, whereas Mavenir's Open RAN revenue share declined over the same period.

  • The top 4 Open RAN suppliers by revenue for the 1H23 period were Samsung, NEC, Fujitsu, and Rakuten Symphony. 

  • Open RAN revenues are still expected to account for 5 to 10 percent of the 2023 RAN market.