Some
technology found inside modern supercomputers may prove to be surprisingly
applicable to new data center architectures. To understand how, we must first
look at the anatomy of contemporary supercomputers. Classics like the Cray vector supercomputers have long since given way to commodity silicon-based designs: the vast majority of supercomputers today are huge clusters of servers lashed together with high-performance networks. Built for massively parallel, large-scale simulations, these machines distribute the application workload across server nodes that coordinate via messages passed over a shared communications fabric. The server nodes usually feature floating-point-heavy CPUs, GPU-based math accelerators and large main memories, but they are essentially just Linux servers.
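As a concrete illustration of that message-passing coordination, here is a minimal sketch using MPI, the de facto standard programming interface on such clusters; the two-rank pairing and the exchanged value are purely illustrative.

```c
/* Minimal MPI sketch: two ranks exchange a boundary value over the cluster
 * interconnect, the basic coordination pattern used by parallel simulations.
 * Typical build/run: mpicc exchange.c -o exchange && mpirun -np 2 ./exchange */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, neighbour = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = (double)rank;               /* stand-in for locally computed state */

    if (size >= 2 && rank < 2) {
        int peer = 1 - rank;            /* rank 0 pairs with rank 1 */
        /* The exchanged message travels across the shared communications
         * fabric (e.g. InfiniBand) when the ranks sit on different nodes. */
        MPI_Sendrecv(&local, 1, MPI_DOUBLE, peer, 0,
                     &neighbour, 1, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d received %.1f from rank %d\n", rank, neighbour, peer);
    }

    MPI_Finalize();
    return 0;
}
```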
InfiniBand Is the Fast Interconnect Fabric for Supercomputing
Most supercomputers attach their storage to the same communications fabric used for inter-processor communication. Storage must also be fast and parallel, both to load large data sets and to support periodic checkpointing, which saves simulation state in case of a failure. The interconnect is thus a unified fabric carrying management, compute and storage traffic over a single fiber connection to each node.
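To show what that checkpoint traffic looks like from the application's point of view, here is a hedged sketch using MPI-IO, in which every rank writes its slice of simulation state into one shared file on the parallel filesystem; the file name, element count and data layout are assumptions made for the example.

```c
/* Illustrative sketch only: a periodic checkpoint where every rank writes its
 * slice of simulation state into one shared file on the parallel filesystem.
 * The file name, element count and layout are assumptions for the example. */
#include <mpi.h>

#define LOCAL_ELEMS 1024            /* doubles of state held by each rank (assumed) */

static void write_checkpoint(const double *state, int rank)
{
    MPI_File fh;
    MPI_Offset offset = (MPI_Offset)rank * LOCAL_ELEMS * sizeof(double);

    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Collective write: all ranks land their slices in parallel, which is
     * why checkpoint storage must keep pace with the fabric's bandwidth. */
    MPI_File_write_at_all(fh, offset, state, LOCAL_ELEMS, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}

int main(int argc, char **argv)
{
    int rank;
    double state[LOCAL_ELEMS] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    write_checkpoint(state, rank);   /* in practice called every N timesteps */
    MPI_Finalize();
    return 0;
}
```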
Reducing cost per node is a key consideration for most buyers: budget ultimately determines a supercomputer's performance, since cheaper nodes mean more nodes for the same money. For this reason, commodity, standards-based hardware components are preferred. An open standard called InfiniBand (IB) has been the dominant cluster interconnect since its introduction; its specifications were first published by an industry consortium, formed in 1999, that included Intel, IBM, HP and Microsoft.
IB is attractive due to features such as extreme scalability, low latency (sub-microsecond end to end), high bandwidth (100 Gbit/s per port) and hardware offload, including a very powerful feature called RDMA (Remote Direct Memory Access). RDMA allows data to flow "zero copy" at wire speed from one application's memory space into an application's memory space on another server, without intervention from the OS or even the CPU, allowing data movement to scale with memory speeds rather than CPU core speeds (which have stalled).
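To make the zero-copy idea concrete, the sketch below posts a one-sided RDMA write through the standard verbs API (libibverbs). It is only a fragment: device setup, queue-pair connection and the out-of-band exchange of the peer's buffer address and rkey are assumed to have already taken place.

```c
/* Hedged sketch of the RDMA "zero copy" path using the verbs API (libibverbs).
 * Assumes the queue pair is already connected and the peer's buffer address
 * and rkey were exchanged out of band. Link with -libverbs. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Push `len` bytes from a locally registered buffer straight into the remote
 * application's registered memory. The CPU only posts the descriptor; the
 * HCA moves the data with no OS involvement and no intermediate copies. */
int rdma_write(struct ibv_qp *qp, struct ibv_cq *cq,
               struct ibv_mr *mr, void *local_buf, uint32_t len,
               uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = mr->lkey,                      /* local key from ibv_reg_mr() */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    struct ibv_wc wc;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided write            */
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion       */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.wr.rdma.remote_addr = remote_addr;        /* peer buffer address and    */
    wr.wr.rdma.rkey        = remote_rkey;        /* rkey, exchanged beforehand */

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll the completion queue until the HCA reports the write done. */
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
```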
InfiniBand Takes on Data Center Scalability and East-West Traffic Challenges
What does InfiniBand have to do with data center design? Good server farm design balances compute, storage and network performance. Many factors today reveal the shortcomings of legacy, 37-year-old TCP/IP Ethernet:
- Virtualization consolidates many virtual machines onto each physical machine, which further multiplies the network performance requirements per socket and pushes loading toward supercomputer-class levels. For instance, a TCP/IP stack running over 1 Gb Ethernet can require up to 1 GHz worth of CPU; overlay 20 such machines on a single node and even many-core CPUs are saturated by the OS before the application sees a single cycle.
- Many-core processors use billions of transistors to tile tens to hundreds of CPU cores per chip, and server chips are trending strongly in this direction. It is easy to see that the networking capability must be proportionately and radically scaled up to maintain architectural balance, or the cores will be forever waiting on network I/O.
- Current data center workflows, which strongly emphasize East-West traffic, require new fabric topologies. Ethernet's spanning-tree limitations preclude efficient topologies such as "fat tree," which rely on aggregated trunks between switches.
- Rotating storage is being displaced by Solid State Disks (SSDs), and not just in their early critical applications such as database indexing and metadata storage. Legacy NAS interconnects that could hide behind tens of milliseconds of rotating-disk latency suddenly hamper SSDs and their microsecond-range response times. SSDs also deliver order-of-magnitude throughput increases, again stressing older interconnects.
- Because they minimize network adapters, cables and switches, unified fabrics are highly desirable. They improve a host of system-level metrics such as capital cost, airflow, heat generation, management complexity and the number of channel interfaces per host. Micro- and blade-form-factor servers can ill afford three separate interfaces per node, and with its lossy flow control and high latency, TCP/IP Ethernet is not a good match for high-performance storage networks.
InfiniBand is in a unique position: it can take on all of these challenges while also offering smooth migration paths. For example, via IPoIB, InfiniBand can carry legacy IP traffic at great speed; while this does not immediately expose all of the protocol's benefits, it provides a bridge to more efficient implementations that can be rolled out over time. Furthermore, and contrary to popular misconception, InfiniBand is actually the most cost-effective protocol in terms of $/Gbit/s of any comparable standards-based interconnect technology, and dramatically so when deployed as a unified fabric.
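As a small illustration of how gentle that IPoIB migration path is, the plain BSD-sockets client below needs no InfiniBand-specific code at all; if the (assumed) address 10.0.0.2 belongs to the server's IPoIB interface (e.g. ib0), this legacy IP traffic simply rides the IB fabric. The address and port are assumptions made for the example.

```c
/* Sketch of the IPoIB migration path: an unmodified BSD-sockets client.
 * If 10.0.0.2 is assigned to the server's IPoIB interface, this legacy
 * code runs over InfiniBand with no changes at all. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);       /* ordinary TCP socket */
    struct sockaddr_in peer;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5001);                  /* assumed port            */
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr); /* assumed IPoIB address   */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
        const char msg[] = "hello over IPoIB\n";
        write(fd, msg, sizeof(msg) - 1);   /* legacy IP traffic on the IB fabric */
    }
    close(fd);
    return 0;
}
```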
Extending InfiniBand from Local Subnets to Global Distances
It’s true that InfiniBand has
plenty of power and scale. It’s also true that an open standard supercomputer interconnect
may hold the key to efficient future data center implementations. However, does
InfiniBand have what it takes for production deployments?
In the past, InfiniBand implementations were limited to single-subnet topologies, lacked security mechanisms such as link encryption, and were restricted by the standard's precise, lossless flow-control scheme to very short links between racks. Today's InfiniBand solutions, however, can span global distances over standard optical infrastructure, with strong link encryption and multi-subnet segmentation. Those who adopt the new IB stand to catch the bleeding edge of innovation that the supercomputer world continues to offer.
About the author
Dr. David Southwell co-founded
Obsidian Research Corporation. Dr. Southwell was also a founding member of
YottaYotta, Inc. in 2000 and served as its director of Hardware Development
until 2004. Dr. Southwell worked at British Telecom's Research Laboratory at
Martlesham Heath in the UK, participated in several other high technology
start-ups, operated a design consultancy business, and taught Computer Science
and Engineering at the University of Alberta. Dr. Southwell graduated with honors from the University of York, United Kingdom, with an M.Eng. in Electronic Systems Engineering in 1990 and a Ph.D. in Electronics in 1993, and holds a Professional Engineer (P.Eng.) designation.
About Obsidian Strategics
Obsidian Strategics Inc. is a private Canadian corporation offering enterprise-class, commercial off-the-shelf (COTS) devices supporting the InfiniBand protocol used in supercomputer and HPC environments. The Obsidian Longbow™ technology was first developed for use in mission-critical military and intelligence environments that imposed operational requirements new to InfiniBand. http://www.obsidianresearch.com/
Got an idea for a Blueprint column? We welcome your ideas on next gen network architecture.
See our guidelines.