Thursday, December 11, 2014

Blueprint: InfiniBand Moves from SuperComputing to Cloud

By: Dr. David Southwell, CVO, Obsidian Strategics

Some technology found inside modern supercomputers may prove surprisingly applicable to new data center architectures. To understand how, we must first look at the anatomy of contemporary supercomputers. Classics like the Cray vector machines have long since given way to designs built from commodity silicon: the vast majority of supercomputers today are huge clusters of servers lashed together with high-performance networks. Because these machines are built for massively parallel, large-scale simulations, the application workload is distributed across the server nodes, which coordinate via messages passed over their shared communications fabric. The nodes usually feature floating-point-heavy CPUs, GPU-based math accelerators and large main memories, but they are essentially just Linux servers.

InfiniBand is the Fast Interconnect Fabric for SuperComputing

Most supercomputers attach their storage to the same communications fabric used for inter-processor communication. Storage must also be fast and parallel, both to load large data sets and to take periodic checkpoints that save simulation state in case of a failure. The interconnect is thus a unified fabric carrying management, compute and storage traffic over a single fiber connection to each node.

Reducing cost per node is a key consideration for most buyers, since budget ultimately determines a supercomputer’s performance. For this reason, commodity, standards-based hardware components are preferred. An open standard called InfiniBand (IB) has become the dominant cluster interconnect; its specifications were first published in 1999 by an industry consortium that included Intel, IBM, HP and Microsoft.

IB is attractive for its extreme scalability, low latency (sub-microsecond end to end), high bandwidth (100 Gbit/s per port) and hardware offload, which includes a very powerful feature called Remote Direct Memory Access (RDMA). RDMA allows data to flow “zero-copy” from one application’s memory space to that of an application on another server at wire speed, without intervention by the OS or even the CPU, so data movement scales with memory speeds rather than CPU core speeds (which have stalled).
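
To make the zero-copy idea concrete, here is a minimal sketch using the standard Linux user-space verbs library (libibverbs). It simply registers a buffer for RDMA and prints the address/key a remote peer would need in order to write into it directly; queue-pair setup and the out-of-band exchange of these values are omitted, so treat it as an illustration rather than a complete application.

```c
/* Minimal libibverbs sketch: register a buffer for RDMA access.
 * Illustrative only; connection setup is omitted for brevity.
 * Build with: gcc rdma_reg.c -libverbs
 */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* first HCA */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */

    size_t len = 4096;
    void *buf = malloc(len);

    /* Register (pin) the buffer so the HCA can DMA into it directly,
     * bypassing the kernel and the CPU on the data path. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr) { fprintf(stderr, "ibv_reg_mr failed\n"); return 1; }

    /* A peer that learns this (addr, rkey) pair can issue an RDMA write
     * targeting the buffer with no CPU involvement on this host. */
    printf("buffer addr=%p rkey=0x%x length=%zu\n", mr->addr, mr->rkey, len);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```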

InfiniBand Takes on Data Center Scalability and East-West Traffic Challenges

What does InfiniBand have to do with data center design?  Good server farm design balances compute, storage and network performance.  Many factors today reveal the shortcomings of the legacy, 37-year-old TCP/IP Ethernet:

  • Virtualization consolidates multiple virtual machines onto a single physical machine, further multiplying the network performance requirements per socket and pushing toward supercomputer-class loading levels.  For instance, a TCP/IP stack running over 1 Gb Ethernet can consume up to 1 GHz worth of CPU; overlay 20 such machines on a single node and even many-core CPUs are saturated by the OS before the application sees a single cycle.
  • Many-core processors use billions of transistors to tile tens to hundreds of CPU cores per chip, and server chips are trending strongly in this direction.  It is easy to see that the networking capability must be proportionately and radically scaled up to maintain architectural balance, or the cores will be forever waiting on network I/O.
  • Current data center workflows, which strongly emphasize East-West traffic, require new fabric topologies. Ethernet’s spanning-tree limitations preclude efficient topologies such as the “fat tree”, which relies on aggregated trunks between switches (a rough scaling sketch follows this list).
  • Rotating storage is being displaced by solid state disks (SSDs), and not just in their early critical applications such as database indexing and metadata storage.  Legacy NAS interconnects that could hide behind tens of milliseconds of rotating disk latency suddenly find themselves hampering SSDs and their microsecond-range response times.  SSDs also deliver order-of-magnitude throughput increases, again stressing older interconnects.
  • Unified fabrics are highly desirable because they minimize network adapters, cables and switches, improving a host of system-level metrics such as capital cost, airflow, heat generation, management complexity and the number of channel interfaces per host.  Micro- and blade-form-factor servers can ill afford three separate interfaces per node, and with its lossy flow control and high latency, TCP/IP Ethernet is a poor match for high-performance storage networks.
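
To give a feel for why fat-tree topologies matter, the sketch below computes the host count of a non-blocking three-tier fat tree (folded Clos) built from k-port switches, which works out to k³/4 hosts at full bisection bandwidth. The switch radices shown are illustrative assumptions, not figures from this article.

```c
/* Back-of-envelope sketch: host capacity of a non-blocking three-tier
 * fat tree built from k-port switches.  The radices below are only
 * examples of common switch port counts.
 */
#include <stdio.h>

static long fat_tree_hosts(long k)
{
    /* k/2 hosts per edge switch, k/2 edge switches per pod,
     * k pods in total -> k^3 / 4 hosts with full bisection bandwidth. */
    return (k / 2) * (k / 2) * k;
}

int main(void)
{
    long radix[] = { 24, 36, 48 };
    for (int i = 0; i < 3; i++)
        printf("%2ld-port switches -> up to %6ld hosts, non-blocking\n",
               radix[i], fat_tree_hosts(radix[i]));
    return 0;
}
```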

InfiniBand is in a unique position: it can take on all of these challenges while also offering smooth migration paths. For example, via IP over InfiniBand (IPoIB), it can carry legacy IP traffic at great speed; while this does not immediately expose all of the protocol’s benefits, it provides a bridge to more efficient implementations that can be rolled out over time.  Furthermore, and contrary to popular misconception, InfiniBand is actually the most cost-effective standards-based interconnect in terms of $/Gbit/s, and dramatically so when deployed as a unified fabric.
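
As a concrete illustration of that migration path, the sketch below is an ordinary BSD-sockets TCP client. Assuming the kernel’s IPoIB driver has been configured so the HCA appears as a normal IP interface (commonly named ib0), this unmodified legacy code runs across the InfiniBand fabric; the address and port are placeholders for illustration.

```c
/* Minimal sketch: legacy sockets code needs no changes to run over IPoIB.
 * The peer address 192.168.10.2 and the echo port 7 are assumed values;
 * the only requirement is that the route to the peer goes via an IPoIB
 * interface such as ib0.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* plain TCP socket */

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(7);                            /* echo service */
    inet_pton(AF_INET, "192.168.10.2", &peer.sin_addr);  /* host reached via ib0 */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    const char msg[] = "hello over IPoIB\n";
    write(fd, msg, sizeof(msg) - 1);

    char reply[64];
    ssize_t n = read(fd, reply, sizeof(reply));
    if (n > 0)
        printf("echoed %zd bytes back over the InfiniBand fabric\n", n);

    close(fd);
    return 0;
}
```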

Extending InfiniBand from Local Subnets to Global Distances

It’s true that InfiniBand has plenty of power and scale. It’s also true that an open standard supercomputer interconnect may hold the key to efficient future data center implementations. However, does InfiniBand have what it takes for production deployments?

In the past, InfiniBand implementations were limited to single-subnet topologies and lacked security mechanisms such as link encryption, and the standard’s precise lossless flow-control scheme restricted them to very short links between racks. Today’s InfiniBand solutions, however, can span global distances over standard optical infrastructure, with strong link encryption and multi-subnet segmentation. Those who make use of the new IB stand to catch the bleeding edge of innovation that the supercomputer world continues to offer.

About the author

Dr. David Southwell co-founded Obsidian Research Corporation. He was also a founding member of YottaYotta, Inc. in 2000 and served as its director of Hardware Development until 2004. Dr. Southwell worked at British Telecom's Research Laboratory at Martlesham Heath in the UK, participated in several other high-technology start-ups, operated a design consultancy business, and taught Computer Science and Engineering at the University of Alberta. He graduated with honors from the University of York, United Kingdom, with an M.Eng. in Electronic Systems Engineering in 1990 and a Ph.D. in Electronics in 1993, and holds a Professional Engineer (P.Eng.) designation.

About Obsidian Strategics
Obsidian Strategics Inc. is a private Canadian corporation offering enterprise-class, commercial off-the-shelf (COTS) devices supporting the InfiniBand protocol used in supercomputer and HPC environments. The Obsidian Longbow™ technology was first developed for use in mission-critical military and intelligence environments that imposed operational requirements new to InfiniBand. http://www.obsidianresearch.com/
