Thursday, June 9, 2016

Will the coming 'third wave' of ARM-based processors for the data centre break Intel's monopolistic grip on this sector?

Preamble - basic facts about ARM

ARM (Advanced RISC Machine) cores use a reduced instruction set computing (RISC) architecture, which offers relatively low power consumption, typically smaller die size, lower cost, faster clock speed and usually easier development. Intel's market-dominant Xeon devices, by contrast, are complex instruction set computing (CISC) x86 processors targeted at the non-consumer workstation, server and embedded system markets. Rather than compete with Intel directly, ARM licenses its technology to other silicon companies. The single biggest disadvantage of ARM's RISC architecture, and it is a very substantial one, is that it cannot directly run x86 applications.

ARM is estimated to have an 85% share of the apps processor market, a 65% share of the computer peripherals market, a 90% share of the hard-disk and SSD market, and a 95% share of the automotive apps processor market. ARM is seeking to extend its 85% share of mobile processors and enter the data centre and networking space. Intel dominates the data centre server space and has just withdrawn from the mobile space. Broadcom dominates the data centre switching space and has also withdrawn from the mobile space, but may be eyeing an entrance into data centre servers with help from ARM.

ARM is a relatively small company compared to Intel, whose revenue in 2015 was $55.4 billion, but ARM is growing quite fast, whereas Intel's revenue for the year was down around 1% on 2014. Since 2007, ARM has grown its overall market share in all of the markets in which it competes from 17% to 37%, and its revenue has more than tripled. Intel's share of the semiconductor (SC) market in recent years has been roughly static in the range 15-16%.

On December 28, 2015 Intel closed its $16.7 billion acquisition of FPGA specialist Altera, whose sales in 2015 were around $1.6 billion, potentially adding around 3% to Intel's annual sales level. Because of its size, Intel's business outlook is much more dependent on total SC market growth than ARM's. The global semiconductor market in 2015 was $335.2 billion, down 0.4% on the all-time peak in 2014. On June 7, 2016 WSTS released its new semiconductor market forecast, generated in May 2016, which forecasts the world semiconductor market to be down 2.4% in 2016 at $327 billion, with growth returning in 2017 and 2018. Intel's Q1 2016 sales, including one full quarter of Altera, were $13.7 billion, up 7% year on year compared with $12.8 billion in Q1 2015, hinting at a slight improvement in its competitive position.

Specifically, on February 10th ARM reported full-year revenue of $1,488.6 million (slightly under 3% of Intel's), up 15% from $1,292.6 million in 2014. ARM's Q4 2015 revenue was $407.9 million, up 14% YoY compared to $357.6 million, confirming a steady growth rate. This 14% trend was seen to be continuing when, on April 20th, ARM reported Q1 2016 revenue of $398.0 million compared with $348.2 million in Q1 2015. It reported a 2015 operating margin of 51.6%, up 1.3 points compared to 50.3% in 2014, and PBT of £511.5 million, up 24% compared to £411.3 million in 2014 (ARM states its profit figures in sterling).
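The growth rates quoted above can be checked with a few lines of arithmetic. What follows is a minimal sketch in Python that uses only the dollar revenue figures given in this article; the function and variable names are illustrative and are not taken from any ARM or Intel report.

```python
def yoy_growth_pct(current, previous):
    """Year-on-year growth expressed as a percentage."""
    return (current / previous - 1.0) * 100.0


# ARM revenue in $ million as quoted in the text: (current period, year-ago period).
arm_revenue = {
    "FY2015": (1488.6, 1292.6),
    "Q4 2015": (407.9, 357.6),
    "Q1 2016": (398.0, 348.2),
}

# Intel's 2015 revenue of $55.4 billion, quoted earlier, expressed in $ million.
intel_fy2015_revenue = 55_400.0

growth = {
    period: yoy_growth_pct(current, previous)
    for period, (current, previous) in arm_revenue.items()
}
arm_vs_intel_pct = arm_revenue["FY2015"][0] / intel_fy2015_revenue * 100.0

for period, pct in growth.items():
    print(f"{period}: {pct:.1f}% year on year")
print(f"ARM FY2015 revenue as a share of Intel's: {arm_vs_intel_pct:.1f}%")
```

Run as written, this gives growth of roughly 15%, 14% and 14% for the three periods, and an ARM-to-Intel revenue ratio of a little under 3%, in line with the rounded figures in the text.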

Operationally, in the first quarter of 2016 ARM reported 39 processor licences signed by a broad range of companies, including leading SC vendors and OEMs. Target applications included mobile computing, automotive, networking infrastructure and servers. Demand was strong for its most advanced technology, with 8 licences signed for the ARM Cortex-A technology used in high-performance, high-efficiency application processors. It also noted 2 Mali multimedia processor licences signed, including licences for advanced graphics and display processors, and 4.1 billion ARM-based chips shipped, up 10% year-on-year.

New ARM-based processors aimed at the huge data centre opportunity

In recent weeks, there has been a parade of new ARM-based processors aimed at cloud data centres and network infrastructure. These 'forward-looking' product unveilings from the likes of AppliedMicro, Broadcom, Cavium and others promise better performance and significant improvements in power efficiency over comparable Intel Xeon processors when this next generation of ARM-based processors becomes available in late 2016 or early 2017. The new chips license the core processor design from ARM Holdings, the British multinational semiconductor and software design company headquartered in Cambridge. The product announcements really come as no surprise, since ARM has been pursuing the market for 64-bit processors aimed at the data centre since at least 2012.

Two previous generations of ARM processors have since entered the market, and while they have seen some success in specialised applications with tier 2 OEMs such as Inventec, Gigabyte and Wistron, they have not gained enough market traction to become a mainstay of the modern data centre. For instance, AppliedMicro began sampling its first generation X-Gene 'Server on a Chip' processor in early 2013, including to HP for its Moonshot server. Despite early publicity, and a follow-up HeliX 2 embedded processor based on the ARMv8-A 64-bit architecture, Intel Xeons have kept ahead of the competition.

For 2015, ARM estimated that it captured 15% of the market for processors designed for networking infrastructure, a broad category in which it includes base stations, switches, routers and servers for data centres and cloud platforms. Looking more closely at ARMv8-A based servers, ARM Holdings' 2016 Roadshow presentation acknowledges a market share of less than 1% as of the end of 2015, compared with its 85% of the mobile processing business (smartphones, tablets and laptops) and a 25% share of the market for processors aimed at embedded intelligence, such as automotive, industrial and other smart devices. So if the first two waves of ARM processors aimed at network infrastructure yielded a measly market share, will the third wave do any better?

The current 15% share is the market reality, not the company's growth target. In its 2016 Roadshow presentation, ARM states a goal of capturing 45% of an addressable $16 billion networking infrastructure market in 2020. The company already has design wins in wireless access systems, wired access systems, aggregation/core products, security and some high-end data centres. To push this further, ARM will need to introduce new technology for physical IP, processor architecture and implementations, as well as tools and analysis to optimise SoCs for servers. Some licensees active in this market include Annapurna, AMD, AppliedMicro, Broadcom, Cavium, Marvell, NXP, and Texas Instruments. Other licensees include Altera, Avago, Freescale, HiSilicon, IBM, Qualcomm, and Xilinx.

Third wave ARM-based Solutions

Cavium 

On May 30th Cavium of San Jose, California announced its second generation ThunderX2 processor, which scales up to 54 cores with up to 3.0 GHz core frequency, making it a high-end, high-performance processor for data centre workloads such as storage, networking and secure compute in the cloud. The ThunderX2 will offer fully out-of-order (OOO) cores, single and dual socket configurations, very high memory bandwidth, large memory capacity, integrated hardware accelerators, fully virtualised core and I/O, scalable Ethernet fabric and feature-rich I/Os supporting 25 Gbit/s. One variant of the chip will include Cavium's 5th generation NITROX technology with acceleration for IPSec, SSL, anti-virus, anti-malware, firewall and DPI. This family is optimised for secure web front-end, security appliance and cloud RAN type workloads. Hardware-based accelerators of this kind designed into the chip could give this specialised silicon a performance advantage over Intel's more general purpose Xeons.

However, although the ThunderX2 announcement was accompanied by very positive generalised endorsements of Cavium's approach from a broad array of involved companies, namely AMI, Canonical, E4, FreeBSD, Gigabyte, Inventec, Linaro, NGINX, Red Hat and SUSE, so far as is known no Tier 1 vendor has yet committed to either the first or second generation of the product. Moreover, Cavium has indicated that ThunderX2 will be built using a 14 nm FinFET SC process (compared to 28 nm for the previous generation), which might seem to put it on a par with Intel's latest Xeon E5-2600 v4 (Broadwell) processor, launched in March and also built on a 14 nm process. But the Cavium chip will not be built in volume until late 2017, while the Intel device is immediately available. In addition, on the day of the announcement top server companies such as Dell, Hewlett Packard Enterprise and Lenovo were eager to make it clear they were on the ball and using the Intel chip.

The reason for such a very early announcement by Cavium further points up the difference between the two companies. Intel customers generally expect and receive regular, predictable output from a pipeline of Intel products. They expect Intel to anticipate their needs and deliver appropriate technology upgrades regularly and on time, and it usually does. With a relatively small company like Cavium (sales in Q1 2016 of $101.9 million), customers need a degree of extra reassurance.

Applied Micro Circuits 

In November 2015 Applied Micro of Santa Clara, California (whose sales in 2015 were $159.3 million, and $41.1 million in its first quarter of 2016, its fifth successive quarter of growth) announced its X-Gene 3 Server-on-a-Chip, which will combine 32 high performance ARMv8-compatible CPU cores in a single socket. The device aims to deliver performance competitive with mainstream high-end Xeon E5/E7 processors in a similar TDP envelope. It will offer eight DDR4 memory channels for addressing one Tbyte of RAM, outpacing the Xeon E5 in memory bandwidth, making it well suited to hyperscale workloads such as in-memory databases, big data, machine learning, web search and HPC. X-Gene 3 is expected to sample in the second half of 2016, with production shipments in the second half of 2017.

Part 2

Phytium

At the Hot Chips 27 conference in August 2015, this relatively unknown Chinese start-up, located in Guangzhou, surprised attendees by introducing two families of ARM-based processors for server applications: Mars, aimed at applications characterised by high performance, high volume of memory, high bandwidth memory access, high bandwidth I/O access and a requirement for large scale cache coherence; and Earth, aimed at high density computing applications requiring low cost and high power efficiency but only moderate performance.

Mars is a 640 sq mm chip built on a TSMC 28 nm process which operates at up to 2 GHz and draws up to 120 W with an architecture based on 8 panels, each containing 8 x Xiaomi ARMv8-based cores.

The chip is rated at 512 GFlops. According to research director Charles Zhang, the company aspires to be a leading-edge processor and ASIC maker in the Chinese IT sector and will be working on two classes of ARM-based processors - one aimed at scale-up machines, another aimed at scale-out machines used in hyperscale and cloud computing.

In the presentation at Hot Chips he cited improvements in all areas for the next generation CPU but mentioned no numbers.
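As a rough consistency check on the Mars figures quoted above (8 panels of 8 cores, up to 2 GHz, 512 GFlops, up to 120 W), the sketch below applies the usual rule of thumb that theoretical peak throughput is cores x clock x floating-point operations per core per cycle. The operations-per-cycle value is inferred here from the quoted rating and is not a figure stated by Phytium; the function and variable names are illustrative.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Figures quoted in the text for the Mars chip.
panels = 8
cores_per_panel = 8
clock_ghz = 2.0
rated_gflops = 512.0
max_power_watts = 120.0

total_cores = panels * cores_per_panel

# Inferred, NOT stated in the source: FLOPs per core per cycle implied by the rating.
implied_flops_per_cycle = rated_gflops / (total_cores * clock_ghz)

# Peak efficiency if the chip draws its full quoted power at the rated throughput.
gflops_per_watt = rated_gflops / max_power_watts

print(f"cores: {total_cores}")
print(f"implied FLOPs per core per cycle: {implied_flops_per_cycle}")
print(f"peak GFlops per watt: {gflops_per_watt:.2f}")
```

On these assumptions the quoted numbers are self-consistent: 64 cores at 2 GHz reach 512 GFlops if each core completes 4 floating-point operations per cycle, which works out to a little over 4 GFlops per watt at the maximum quoted power draw.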

According to Wikipedia, Phytium has already produced the FT-1500A chip, an ARM64 SoC which includes 16 ARMv8 processor cores, a 32-lane PCIe host, 2 GMAC on-chip Ethernet controllers and a GICv3 interrupt controller with ITS support. The 'FT' in the device name seems to stand for FeiTeng, directly linking Phytium (presumably an anglicisation) to the FeiTeng range of processors developed under Professor Xing Zuocheng of the National University of Defense Science and Technology of Changsha, Hunan Province, a top military academy and national research university. 2,400 of the previous generation chip, the FeiTeng1000, are used in China's Tianhe-2 supercomputer in the Chinese National Super Computer Center in Guangzhou, which with 3.12 million cores and a capability of 33.86 petaflop/s was, as of July 2015, rated by the Top 500 HPC listing organisation as the world's fastest supercomputer.

While the Mars chip in its present form does not have a sufficient edge to make much of an impact on Intel's current position, it is necessary to recognise that, as part of its 13th Five-Year Plan approved in March 2016, the Chinese government committed $100-150 billion in public and private funds towards the objective of catching up technologically with the world's leading firms by 2030 in the design, fabrication and packaging of chips, so that it can cease being dependent on foreign supplies. In 2015, the government also announced that by 2025 it wanted local firms to be producing 70% of the chips consumed by Chinese industry. According to McKinsey, in 2015 only around 12% of the chips used in Chinese production were designed by local fabless Chinese companies, so there is a long way to go.

Despite the probably almost unlimited funding available, Chinese attempts to buy their way into the global industry have met serious political resistance in both the U.S. and Taiwan. However, that seems to mean that existing domestic semiconductor companies such as Phytium and market leader HiSilicon will, for the next few years, be able to scale up research, development and production facilities to whatever level they wish. Moreover, there are domestic security limitations on Intel exporting its highest-level technology to China, which possibly opens up some opportunities for Phytium. In addition, following some bad publicity about international security practices, China is likely to establish increasingly tight controls on the designs and designers of intelligent components used in the country's mission-critical systems, which would strongly favour local design and production.

(NB: An important point in this context is that on January 16th 2016 it was announced that Qualcomm and the provincial government of the southern Chinese province of Guizhou had established a joint venture capitalised at $280 million, 55% owned by the province and 45% by Qualcomm, to focus on the design, development and sale of advanced chipset technology in China. The population of Guizhou is around 35 million and its GDP in 2014 was $151 billion, higher than that of Hungary, Ukraine or Morocco.)

Industry partnerships and projects to open the market 

To accelerate its entry into network infrastructure, ARM is working with a number of open source projects, including the OpenDataPlane project. This initiative aims to produce an open-source, cross-platform set of application programming interfaces (APIs) for the networking data plane. Other supporters include Broadcom, Cavium, Cisco, Ericsson, Freescale, MontaVista, Nokia, Wind, Texas Instruments and ZTE.

In May, ARM announced its participation in an industry effort to develop a specification for a new Cache Coherent Interconnect for Accelerators (CCIX). This interconnect technology specification will ensure that processors using different instruction set architectures (ISA) can coherently share data with accelerators and enable efficient heterogeneous computing, significantly improving compute efficiency for servers running data centre workloads. Other companies supporting this include AMD, Huawei, IBM, Mellanox, Qualcomm and Xilinx.

Earlier, in March 2016, ARM and TSMC announced a multi-year agreement to collaborate on a 7 nm FinFET process technology, which includes a design solution for future low-power, high-performance compute SoCs. This deal expands the long-term foundry partnership beyond mobile devices to include next-generation networks and data centres. It also sets the stage for a possible fourth wave of ARM-based processors aimed at the data centre.

A further opportunity for ARM may be China, which has been known to encourage IT solutions developed by Chinese firms as a matter of national security policy, especially in preference to Qualcomm, Intel, Microsoft and other dominant U.S. players. In April 2016, a China-based Green Computing Consortium (GCC) was formed with the goal of establishing a deep ecosystem in China for big data, enterprise and cloud computing platforms based on the ARM architecture. Under this consortium, ARM is working with Alibaba, Baidu, China National Software and Service, Dell, Guizhou Huaxintong (the joint venture company of Guizhou and Qualcomm), Hewlett Packard Enterprise/H3C (HPE), Lenovo and Phytium.

RISC vs CISC battle continues 

Finally, one should note that Intel once flirted with RISC architecture. Ten years ago Intel decided to sell off its nascent XScale communications and applications processing division, which was developing solutions for the then new category of smartphones and personal digital assistants. The RISC-based Intel XScale technology had found early success with then superstar Research in Motion (RIM), which was planning to use an Intel PXA9xx communication processor codenamed Hermon in a new Blackberry 8700 device. Another Intel applications processor, the PXA27x codenamed Bulverde, had prime design wins in the Palm Treo smartphone, the Motorola Q and other devices of the day. The sale of the XScale mobile processor group occurred six months ahead of Steve Jobs' unveiling of the iPhone in January 2007.

Marvell was the purchaser of this seemingly successful Intel division, paying $600 million in cash plus the assumption of some liabilities. Some 1,400 employees transferred from Intel to Marvell and there was the possibility of Intel becoming a supplier of ARM-based mobile processors. The strategic reasoning for the sale given by Intel at the time was so that it could focus its investments on its core businesses, including high-performance, low-power Intel Architecture-based processors and emerging technologies for mobile computing, including WiFi and WiMAX. WiMAX struggled for years in the vain hope of challenging LTE as the future of mobile broadband. The bet on x86 architecture for mobile devices did not turn out so well either, with Intel only recently deciding to give up altogether on x86 mobile processors.

As the third wave of ARM-based processors for servers and network infrastructure comes to market, ARM seems more patient than its larger rival. Its plan is that an army of licensees will develop specialised solutions for specific data centre workflows. There is also hope that its 85% share of mobile processors can be leveraged to accelerate its entrance into data centre infrastructure.

Summary 

There is already a huge body of discussion in the business literature about how to dislodge an incumbent, but such advice cannot be all that good, since senior U.S. ICT companies like Microsoft, Intel, Cisco, HP and IBM all look capable of hanging on for at least a decade and probably more, even though they show many signs of weakness and, in all cases, difficulty in renewing themselves internally. However, from the challenger's point of view there are no easy options either.

The problem that a company like ARM and its large ecosystem has in dislodging an incumbent vendor, particularly one as big, rich and savvy as Intel, is that the challenger's products do not merely have to be better than those of the market leader; they have to be so much better that users of the existing product see it as a 'no-brainer' to make the necessary changes in design. What that no-brainer advantage premium might actually be in percentage or operational terms varies depending on circumstances, such as the absolute size of the business involved, the average design cycle, the immediate and usually substantial one-time costs of making the change, and the longer term drawbacks of moving from a well-proven business relationship to a probably less certain one.

However, on the evidence, that percentage is clearly quite large and difficult to achieve against an adversary as competent as Intel. Moreover, unless the advantage involved is so unique that it is unmatchable, the incumbent usually still has 'last-mover' advantage, i.e., the time and space to make a carefully calculated counteroffer, which may not necessarily match the challenger's offer but reduces the challenger's advantage below the no-brainer level that would force the customer to make a change. So it is much too early to bet against Intel surviving most of the assaults on its data centre business fortress.