Monday, September 17, 2018

Interview: Habana Labs targets AI processors

Habana Labs, a start-up based in Israel with offices in Silicon Valley, emerged from stealth to unveil its first AI processor.

Habana claims that its deep learning inference processor, named Goya, delivers more than two orders of magnitude better throughput and power efficiency than commonly deployed CPUs. The company will offer a PCIe 4.0 card incorporating a single Goya HL-1000 processor, designed to accelerate AI inference workloads such as image recognition, neural machine translation, sentiment analysis and recommender systems. The card delivers 15,000 images/second of throughput on the ResNet-50 inference benchmark, with 1.3 milliseconds of latency, while consuming only 100 watts of power.
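Taken together, the quoted throughput and latency figures imply only a modest amount of work in flight at any moment. As a back-of-the-envelope check (a sketch using Little's Law; the in-flight count is derived arithmetic, not a Habana-published figure):

```python
# Little's Law: work in flight = throughput x latency.
# Figures quoted for the Goya HL-1000 PCIe card on ResNet-50.
throughput = 15_000   # images per second
latency = 1.3e-3      # seconds
power = 100           # watts

in_flight = throughput * latency        # ~19.5 images in flight
images_per_joule = throughput / power   # 150 images per joule

print(f"images in flight: {in_flight:.1f}")
print(f"images per joule: {images_per_joule:.0f}")
```

The energy-per-image figure is the more useful one for comparing accelerators, since throughput alone says nothing about the power budget.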

Habana is also developing an inference software toolkit to simplify the development and deployment of deep learning models (topologies) for mass-market use. The idea is to provide an inference network model compilation and runtime that eliminates low-level programming of the processor.

I recently sat down with Eitan Medina, Habana Labs' Chief Business Officer, to discuss the development of this new class of AI processors and what it means for the cloud business.

Jim Carroll:  Who is Habana Labs and how did you guys get started?

Eitan Medina: Habana was founded in 2016 with the goal of building AI processors for inference and training. Currently, we have about 120 people on board, mostly in R&D and based in Israel. We have a business headquarters here in Silicon Valley. In terms of the background of the management team, most of us have deep expertise in processors, DSPs, and communications semiconductors. I previously was the CTO for Galileo Technology (acquired by Marvell), and now I am on the business side. I would say we have a very strong and multidisciplinary team for machine learning. We certainly have the expertise in the processing, software and networking to architect a complete hardware and software solution for deep learning.

In building this company, we identified the AI space as one that deserves its own class of processors. We believe that the existing CPUs and GPUs are not good enough.

The first wave of these AI processors is arriving or being announced now. Habana decided that, unlike other semiconductor companies, we would emerge from stealth only once we had an actual product. We have production samples now, and that is why we are officially launching the company.

Jim Carroll: Who are the founders and what motivated them to enter this market segment?

Eitan Medina: The two co-founders are David Dahan (CEO) and Ran Halutz (VP R&D), who worked together at Prime Sense, a company that was acquired by Apple. We also have onboard Shlomo Raikin (CTO), who was the Chief SoC Architect at Mellanox and who has 45 patents. We've also been able to recruit top talent from across the R&D ecosystem in Israel. The lead investors are Avigdor Willenz (Chairman), Bessemer, and WALDEN (Lip-Bu Tan).

Jim Carroll: What does the name "Habana" refer to?

Eitan Medina: In Hebrew, Habana means "understanding" -- a good name for an AI company.

Jim Carroll: The market for AI processors, obviously, is in its infancy. How do you see it developing?

Eitan Medina: Well, some analysts are already projecting a market for a new class of chipsets for deep learning. Tractica, for instance, divides the emerging market into CPUs, GPUs, FPGAs, ASICs, SoC accelerators, and other devices. We see the need for a different type of processor because of the huge gap between the computational requirements for AI and the incremental improvements that vendors have delivered over the past few years, which so far have been just small improvements to CPUs and GPUs. Look at the best-in-class deep learning models and calculate how much computing power is needed to train them. Look at how these requirements have grown over the past few years. Try graphing this progression and you will see a log-scale graph with a doubling time of three and a half months. That's roughly 10x every year. Initially, people were running machine learning on CPUs, and then they adopted Nvidia's GPUs. What we see in the market today is that training is dominated by GPUs, while inference is dominated by CPUs.
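Medina's growth figures are easy to check: a doubling time of 3.5 months compounds to roughly an order of magnitude per year. A quick sketch of the arithmetic (the numbers below simply restate his claim):

```python
# Compute demand that doubles every 3.5 months, compounded over a year.
doubling_months = 3.5
growth_per_year = 2 ** (12 / doubling_months)  # ~10.8x, i.e. "10x every year"
print(f"annual growth factor: {growth_per_year:.1f}x")
```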

Jim Carroll: So what is Habana's approach?

Eitan Medina: When we looked at the overall deep learning space, we began with the workflows. It is important to understand that there's a training workflow, and there's an inference workflow. What we are introducing today is our "Goya" inference processor. Our "Gaudi" training processor will be introduced in the second quarter of 2019. It will feature a 2Tbps interface per device and its training performance scales linearly to thousands of processors. We intend to sell line cards equipped with these processors, which you can then plug into your existing servers.

The inference processor offloads this workload completely from the CPU, so you will not need to replace your existing servers with more advanced CPUs. What can this do for you? This is where our story gets really interesting. We're talking about more than an order of magnitude of improvement.

Look at this graph showing our ResNet-50 inference throughput and latency performance. On the left side is the best performance Intel has shown to date on a dual-socket Xeon Platinum. Latency is not reported, which could be a critical issue. In the middle is Nvidia's V100 Tensor GPU, which shows 6ms of latency -- not bad, but we can do better. Our performance, shown on the right, exceeds 15,000 images per second with just 1.3ms of latency. Our card is just 100 watts, whereas we estimate at least 400 watts for the other guys.

Jim Carroll: Where are you getting these gains? Are you processing the images in a different way?

Eitan Medina: Well, I can say that we are not changing the topology. If you are an AI researcher with a ResNet-50 topology, we will take your topology and ingest it into our compiler. We're not forcing you to change anything in your model.

Jim Carroll: So, if we try to understand the magic inside a GPU, Nvidia will talk about their ability to process polygons in parallel with large numbers of cores. Where is the magic for Habana?

Eitan Medina: Yeah, Nvidia will say they are really good at figuring out polygons, and may tell you about the massive memory bandwidth they can provide to the many cores. But, at the end of the day, if you are interested in doing image recognition, you only really care about application performance, not the stories of how wonderful the technology is.

Let's assume for a second that there's a guy with a very inefficient image processing architecture, ok? What would this guy do to give you better performance from generation to generation? He would just pack in more of the same stuff each time -- more memory, more bandwidth, and more power. And then he would tell you to "buy more to save more". Sound familiar? This guy can show you improvements, but if he's carrying that inefficiency throughout the stack, it is just going to be more of the same. If a new guy comes to market, what you want to see is application performance. What's your latency? What's your throughput? What's your accuracy? What's your power? What's your cost? If we can show all of that, then we don't have to have a debate about architecture.

Jim Carroll: So, are you guys using the same "magic" to deliver inference performance?

Eitan Medina: No, but for now, I want to show you what we can do. The lion's share of inference workloads at cloud operators today runs on CPUs -- an estimated 91%. Nvidia so far has not come up with a solution to move this market to GPUs; the market is using its GPUs mainly for training.

Our line card, installed in this server, can ingest and process 15,000 frames per second through the PCIe bus. Because our chip is so efficient, we don't need exotic memory technologies or specialized manufacturing techniques. In fact, this chip is built with 16-nanometer technology, which is quite mature and well-understood. As soon as we got the first device back from TSMC, we had ResNet-50 up and running.

In a cloud data center, three of our line cards could deliver the inference processing equivalent of 169 Intel powered servers or eight of Nvidia's latest Tesla V100 GPUs.
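Working backwards from those equivalence claims gives the implied per-device throughput of the alternatives. This is my own arithmetic, assuming three Goya cards at the quoted 15,000 images/second each:

```python
# Implied per-device ResNet-50 inference throughput, derived from the
# claim that 3 Goya cards equal 169 Intel servers or 8 Tesla V100 GPUs.
total = 3 * 15_000              # 45,000 images/second across three cards

per_intel_server = total / 169  # ~266 images/second
per_v100 = total / 8            # 5,625 images/second

print(f"implied Intel server throughput: {per_intel_server:.0f} img/s")
print(f"implied V100 GPU throughput:     {per_v100:.0f} img/s")
```

The implied V100 figure is in the same ballpark as Nvidia's own published ResNet-50 inference numbers from that period, which suggests the comparison is at least internally consistent.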

Habana Labs is showcasing a Goya inference processor card in a live server, running multiple neural-network topologies, at the AI Hardware Summit on September 18 – 19, 2018, in Mountain View, CA.

Juniper debuts Contrail Edge Cloud

Juniper Networks introduced its Contrail Edge Cloud to help service providers deploy new revenue-generating services at the network edge. The aim is to bring a full-fledged secure cloud experience to the space- and power-constrained edge network, which includes base stations, hub sites and switching sites.

Juniper says its Contrail Edge Cloud can extend a full suite of orchestration, automation, security and analytics to the edge for supporting dynamic consumer and enterprise services in a cost and resource efficient manner.

The new Contrail Edge Cloud leverages the software-defined networking capabilities of Contrail Networking and Contrail Security with Kubernetes and OpenStack. Furthermore, it is based on Linux Foundation’s open Tungsten Fabric project.

Some highlights:

  • Footprint-optimized distributed computing: Integration with Red Hat OpenStack Platform, a highly scalable Infrastructure-as-a-Service (IaaS) platform with distributed compute, leverages many of the latest OpenStack capabilities to remotely manage the lifecycle of compute nodes and virtual machines (VMs) from a centralized data center, without requiring co-located OpenStack control plane functions at each remote edge site. 
  • Containers for faster time to revenue: Contrail Edge Cloud will be capable of supporting Red Hat OpenShift Container Platform and other Kubernetes distributions.
  • Unified workflow, policy and service chaining: Contrail Networking with remote compute capabilities enables service providers to deliver a seamless network fabric between VMs, containers and bare metal servers from a single-user interface across physical and virtual environments. Contrail Networking translates abstract workflows into specific networking policies, simplifying the orchestration of virtual overlay connectivity across all environments. Fabric management capabilities automate policies and life-cycle management of each data center and edge site fabric. Contrail Networking provides connectivity and service chaining for OpenStack VM, Kubernetes containers and bare metal workloads without requiring the footprint of co-located control plane functions.
  • Native security and microsegmentation: Contrail Security provides visibility, telemetry and network policy enforcement in a single unified mechanism for service providers that are increasingly deploying a mix of VMs and containers. It includes adaptive firewall policies and tag-based ability to enforce security policies across Kubernetes and OpenStack.
  • Open source software-defined storage: Red Hat Ceph Storage provides massively scalable storage that runs economically on industry standard hardware and manages petabytes of data across cloud and emerging workloads. 
  • Machine learning-based analytics: AppFormix provides machine learning-based performance and health monitoring to give health and service insights into workloads and deployments.
  • Full-featured virtualized security: cSRX and vSRX deliver an industry leading container and virtualized security instance as part of the service chain.
Juniper said it continues to contribute to the Linux Foundation’s Akraino Edge Stack project to further the open cloud initiatives.

“Service providers’ edge networks are beachfront property. As 5G and new applications such as IoT, AR/VR and connected cars all require extreme proximity to the end user, the edge will become ground zero to deploy virtualized network infrastructure, as well as to monetize new applications. Contrail Edge Cloud will greatly simplify the IT side of spinning up and managing these new services in a secure way. And what’s great is it isn’t just for network engineers. It also gives sales and marketing executives a way to get creative with agile new services, so businesses and consumers start seeing carriers as more than just connectivity providers. We are executing all of this in an incredibly small edge-friendly footprint,” stated Sally Bament, Vice President, Service Provider Marketing, Juniper Networks.

TE Connectivity to sell its Subcom business to Cerberus

TE Connectivity will sell its SubCom subsea communications business to Cerberus Capital Management for $325 million in cash.

SubCom, which is based in Eatontown, New Jersey, designs, manufactures, deploys and maintains subsea fiber optic cable systems. To date, SubCom has completed more than 100 cable systems and deployed over 610,000 kilometers of cable using its eight high-performance cable ships. The division has 1,400 employees.

TE Connectivity said its SubCom business was expected to contribute approximately $700 million in sales to fiscal year 2018 results, with a minimal contribution to profitability. TE will provide supplemental information with respect to the SubCom business when it announces its financial results for the fourth quarter of fiscal year 2018.

TE Connectivity plans to use proceeds from the sale for share repurchases.

"The SubCom business is a leader in the undersea telecommunications market, and distinctly different from the rest of TE's connectivity and sensor portfolio. We are pleased that with this transaction we increase our focus as a leading industrial technology company. It strengthens our business model; resulting in a stronger growth profile, reduced cyclicality, higher margins and a greater return on investment," said TE Connectivity CEO Terrence Curtin. "We appreciate the contributions that the SubCom team has made to our company and toward building a more connected world, and we expect that they will continue that important work in the future with Cerberus."

“SubCom is a recognized pioneer in the subsea fiber optic cable industry with a long track record of technology innovation and excellent project management and customer service,” said Michael Sanford, Co-Head of North American Private Equity and Senior Managing Director of Cerberus.

“The industry-leading solutions and services that SubCom delivers will become even more critical for global companies as demand for data and connectivity continues to grow rapidly. Through this investment, SubCom will become an independent, standalone business that is well-positioned to capitalize on the significant growth opportunities ahead.”

Small Form Factor Pluggable Double Density (SFP-DD) enters V 2.0

The Small Form Factor Pluggable Double Density (SFP-DD) Multi Source Agreement (MSA) Group, whose founding members include Alibaba, Broadcom, Cisco, Dell EMC, Finisar, HPE, Huawei, Intel, Juniper, Lumentum, Mellanox, Molex, and TE Connectivity, released the v2.0 specification for the SFP-DD pluggable interface.

SFP-DD is the next-generation SFP form factor for DAC and AOC cabling and optical transceivers. The electrical interface supports two lanes, each operating at up to 25 Gbps with NRZ modulation or 56 Gbps with PAM4 modulation, providing aggregate bandwidth of 50 Gbps (NRZ) or 112 Gbps (PAM4) with excellent signal integrity.
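The aggregate figures follow directly from the lane count and per-lane rate; a quick sketch of the arithmetic:

```python
# SFP-DD aggregate bandwidth: two electrical lanes per module.
lanes = 2
nrz_per_lane = 25    # Gbps per lane with NRZ modulation
pam4_per_lane = 56   # Gbps per lane with PAM4 modulation

print(f"NRZ aggregate:  {lanes * nrz_per_lane} Gbps")   # 50 Gbps
print(f"PAM4 aggregate: {lanes * pam4_per_lane} Gbps")  # 112 Gbps
```

The "double density" in the name refers to doubling the lane count of the single-lane SFP cage while keeping the same port footprint.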

The newly updated specification version 2.0 reflects enhancements to the mechanicals, extended modules and enhanced polarizing key of the high-speed, high-density SFP-DD electrical interface, targeting support of up to 3.5 W optical modules in an enterprise environment.

Version 1.0 of the spec was released in September 2017.

"Through strategic collaborations, we work to increase speed, density and scalability in next-generation servers," said Scott Sommers, Chair of the SFP-DD MSA. "We effectively enhance the roadmap in network applications to meet the challenging demands data centers and enterprise networking platforms are up against."

SFP-DD MSA contributing members are Accelink, Amphenol, AOI, Foxconn Interconnect Technology, Fourte, Genesis, Hisense, Infinera, Innolight, Maxim, Multilane, Nokia, Oclaro, Senko, Source Photonics, US Conec, and ZTE.

ZenFi + Cross River Fiber merger creates NJ-NYC powerhouse

ZenFi Networks, which operates a high fiber count network across all five boroughs of the City of New York, completed its previously announced merger with Cross River Fiber, which operates high-capacity and latency-sensitive fiber optic backbone spans throughout New Jersey and New York. The deal creates a leading communications infrastructure provider in the New York and New Jersey metro areas, with more than 700 route miles of fiber optic network, 130 on-net buildings, 49 colocation facilities and 1,700 outdoor wireless locations, with more than 3,000 under contract. Financial terms were not disclosed.

"We celebrate a new milestone as we formally merge two of the region’s most highly regarded communication infrastructure providers into one agile, innovative organization dedicated to delivering solutions to clients more efficiently,” comments Ray LaChance, CEO of ZenFi Networks. “This transaction enhances our network reach, deepens our product portfolio, and delivers a premier regional communications network infrastructure that is the foundation for 5G network deployments and tomorrow’s evolving network technology needs.”

“I am excited about the future as we blend the considerable talents and experience of both companies,” shares Vincenzo Clemente, newly appointed President and COO of ZenFi Networks. "Combined, we will draw on our decades of experience to push the boundaries of infrastructure innovation, all while remaining focused on efficiently delivering and expanding purpose-built fiber optic networks and wireless solutions for our clients."

NETSCOUT sells off its handheld network test division

Netscout Systems has divested its handheld network test (HNT) tools business to StoneCalibre, a private equity firm based in Los Angeles. Financial terms were not disclosed.

The HNT portfolio acquired by StoneCalibre includes the LinkSprinter Network Tester, LinkRunner Network Auto-Tester, OneTouch AT Network Assistant, AirCheck G2 Wireless Tester, and AirMagnet Mobile solutions.

NETSCOUT will work toward a smooth transition for customers, partners, contractors and suppliers by collaborating with StoneCalibre to provide a variety of services across a range of functional areas over the next several months as StoneCalibre completes its carve-out of HNT as a standalone company in its portfolio.

Ixia and InnoLight demo 400 GE interoperability with OSFP transceivers

Keysight Technologies' Ixia division and InnoLight demonstrated 400 Gigabit Ethernet (GE) interoperability between InnoLight’s OSFP optical transceivers, an OSFP 400GE switch from a major network OEM, and Ixia’s AresONE-400GE OSFP 8x400GE test system. The demo carried 3.2 Tbps of Ethernet test traffic.

Ixia’s AresONE-400GE test systems enable network equipment providers to test high-port-density devices such as routers, switches and servers at all Ethernet speeds (400GE/200GE/100GE/50GE) based on the IEEE 802.3bs and IEEE 802.3cd standards. The 8-port 400GE test system is based on a 56 Gb/s electrical interface with PAM4-encoded signaling, supporting both OSFP and QSFP-DD pluggable interfaces. It runs IxNetwork, Ixia’s field-proven L2/3 emulation performance and scale test software. AresONE is available in full and reduced 4-port and 8-port models.

“This interoperability test is a great milestone for the 400GE ecosystem with the first known demonstration of multimode and single mode OSFP optical transceivers delivering real network traffic on a major manufacturer’s switch,” said Sunil Kalidindi, vice president of product management at Keysight’s Ixia Solutions Group. “These tests show the versatility of the AresONE for testing different types of optical transceivers for 400GE and 200GE speeds as well as the forthcoming 4x100GE and 8x50GE multi-rate Ethernet speeds that these optics will eventually support.”

Mobile World Congress Americas attracted 22,000 attendees

Last week's 2018 Mobile World Congress Americas, which was held at the Los Angeles Convention Center, attracted 22,000 attendees, up by 1,000 compared to last year when it was held in San Francisco.

In February, attendance at the 2018 Mobile World Congress in Barcelona was 107,000 visitors, down slightly from 108,000 attendees in 2017 and compared with 101,000 attendees in 2016.

“This year’s show builds on the success of last year’s inaugural event, with a higher number of senior-level attendees from the mobile ecosystem, as well as adjacent industries such as media and entertainment,” said John Hoffman, CEO, GSMA Ltd. “Feedback from participating companies indicates that Mobile World Congress Americas delivered on their goals of developing new business, promoting new technologies and products and building visibility amongst industry influencers. We are really pleased with our first event here in LA and look forward to extending this in 2019.”

Singapore's National Supercomputing Centre picks Mellanox

Singapore's National Supercomputing Centre (NSCC) has selected Mellanox 100 Gigabit Ethernet Spectrum-based switches, ConnectX adapters, cables and modules for its network.

"We are excited to collaborate with NSCC to interconnect the Singapore's research and educational facilities in the most efficient and scalable way," said Gilad Shainer, Vice President of Marketing at Mellanox Technologies. "The combination of our Ethernet RoCE technology, Spectrum switches, MetroX WDM long-haul switch, cables and software provide the highest data throughput, enabling users to be at the forefront of research and scientific discovery."

Mellanox ConnectX-5 with Virtual Protocol Interconnect supports two ports of InfiniBand and Ethernet connectivity, sub-600 nanosecond latency, and a very high message rate, plus an embedded PCIe switch and NVMe over Fabrics offloads. It enables higher HPC performance with new Message Passing Interface (MPI) offloads, advanced dynamic routing, and new capabilities to perform various data algorithms.

Mellanox Spectrum, the eighth generation of switching IC family from Mellanox, delivers leading Ethernet performance, efficiency and throughput, low-latency and scalability for data center Ethernet networks by integrating advanced networking functionality for Ethernet fabrics. Hyperscale, cloud, data-intensive, virtualized datacenters or storage environments drive the need for increased interconnect performance and throughput beyond 10 and 40GbE. Spectrum's flexibility enables solution companies to build any Ethernet switch system at the speeds of 10, 25, 40, 50 and 100G, with leading port density, low latency, zero packet loss, and non-blocking traffic.

Mellanox's MetroX RDMA long-haul systems enable connections between data centers deployed across multiple geographically distributed sites, extending Mellanox's world-leading interconnect benefits beyond local data centers and storage clusters.