Tuesday, October 2, 2018

Xilinx looks beyond FPGAs with Adaptive Compute Acceleration Platform

At its second annual Xilinx Developer Forum (XDF) in San Jose, Xilinx unveiled strategic moves beyond its mainstay field-programmable gate arrays (FPGAs) with the introduction of its own accelerator line cards and, more significantly, a new Adaptive Compute Acceleration Platform (ACAP).


Xilinx, which got its start in 1984 and now sells a broad range of FPGAs and complex programmable logic devices (CPLDs), is transforming itself into a higher-value platform provider not only for existing workloads but for new domains, especially AI, hyperscale cloud data centers, autonomous vehicles, and 5G infrastructure.

In a keynote at the event, Xilinx's new CEO, Victor Peng, who took over the leadership position in January from Moshe Gavrielov, said that the transformation is being driven by the rapid rise in overall compute workloads, which is hitting just as Moore's Law is slowing down. Xilinx's chief advantages have been flexibility and performance compared to custom ASICs. As we move into the era of machine learning and artificial intelligence, Xilinx is positioning itself as a better alternative to CPUs (especially Intel), GPUs (especially NVIDIA), and the custom silicon developed by hyperscale cloud giants (especially Google and soon likely others).

As part of its updated data center strategy, Xilinx is announcing its own portfolio of accelerator cards for industry-standard servers in cloud and on-premises data centers. The new Alveo PCIe cards are powered by the Xilinx UltraScale+ FPGA and are available now for production orders. Customers can reconfigure the hardware, enabling them to optimize for shifting workloads, new standards, and updated algorithms.

Xilinx is claiming significant performance gains for machine learning inference. According to the company, an Alveo U250 card increases real-time inference throughput by 20X versus high-end CPUs, and by more than 4X for sub-two-millisecond low-latency applications versus fixed-function accelerators like high-end GPUs. Alveo is supported by an ecosystem of partners and OEMs including Algo-Logic Systems Inc., Bigstream, BlackLynx Inc., CTAccel, Falcon Computing, Maxeler Technologies, Mipsology, NGCodec, Skreens, SumUp Analytics, Titan IC, Vitesse Data, VYUsync and Xelera Technologies.

"The launch of Alveo accelerator cards further advances Xilinx's transformation into a platform company, enabling a growing ecosystem of application partners that can now innovate faster than ever before," said Manish Muthal, vice president, data center, Xilinx. "We are seeing strong customer interest in Alveo accelerators and are delighted to partner with our application ecosystem to deliver production-deployable solutions based on Alveo to our customers."

The second big announcement from XDF was the unveiling of the Versal adaptive compute acceleration platform (ACAP), a fully software-programmable, heterogeneous compute platform that combines Scalar Engines, Adaptable Engines, and Intelligent Engines. Xilinx is claiming dramatic performance improvements of up to 20X over today's FPGAs, and over 100X over today's fastest CPUs. Target applications include data center, wired network, 5G wireless, and automotive driver-assist applications.

The Versal ACAP is built on TSMC's 7-nanometer FinFET process technology. It combines software programmability with domain-specific hardware acceleration and the adaptability/

Xilinx already has plans for six series of devices in the Versal family. These include the Versal Prime series, Premium series and HBM series, which are designed to deliver performance, connectivity, bandwidth, and integration for the most demanding applications. They also include the AI Core series, AI Edge series, and AI RF series, which feature the breakthrough AI Engine. The AI Engine is a new hardware block designed to address the emerging need for low-latency AI inference across a wide variety of applications; it also supports advanced DSP implementations for applications like wireless and radar. It is tightly coupled with the Versal Adaptable Hardware Engines to enable whole application acceleration, meaning that both the hardware and software can be tuned to ensure maximum performance and efficiency.

"With the explosion of AI and big data and the decline of Moore's Law, the industry has reached a critical inflection point. Silicon design cycles can no longer keep up with the pace of innovation," says Peng. "Four years in development, Versal is the industry's first ACAP. We uniquely designed it to enable all types of developers to accelerate their whole application with optimized hardware and software and to instantly adapt both to keep pace with rapidly evolving technology. It is exactly what the industry needs at the exact moment it needs it."

The Versal AI Core series, which is optimized for cloud, networking, and autonomous technology, has five devices, offering 128 to 400 AI Engines. The series includes dual-core Arm Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 1,900 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 1.9 million system logic cells combined with more than 130Mb of UltraRAM, up to 34Mb of block RAM, 28Mb of distributed RAM, and 32Mb of new Accelerator RAM blocks, which can be directly accessed from any engine and are unique to the Versal AI series, all to support custom memory hierarchies. The series also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32G SerDes, up to 4 integrated DDR4 memory controllers, up to 4 multi-rate Ethernet MACs, 650 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory interfacing and LVDS, plus 78 multiplexed I/Os to connect external components and more than 40 HD I/Os for 3.3V interfacing. All of this is interconnected by a state-of-the-art network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with power efficiency and native software programmability. The full product table is now available.

The Versal Prime series is designed for broad applicability across multiple markets and is optimized for connectivity and in-line acceleration of a diverse set of workloads. This mid-range series is made up of nine devices, each including dual-core Arm Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 4,000 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 2 million system logic cells combined with more than 200Mb of UltraRAM, more than 90Mb of block RAM, and 30Mb of distributed RAM to support custom memory hierarchies. The series also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32 gigabits-per-second SerDes and mainstream 58 gigabits-per-second PAM4 SerDes, up to 6 integrated DDR4 memory controllers, up to 4 multi-rate Ethernet MACs, 700 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory interfacing and LVDS, plus 78 multiplexed I/Os to connect external components, and more than 40 HD I/Os for 3.3V interfacing. All of this is interconnected by a state-of-the-art network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with power efficiency and native software programmability. The full product table is available now.