Wednesday, August 27, 2014

Software Defined NFV in Google's Cloud Platform

by James E. Carroll

The really exciting phase of cloud computing is just beginning, said Google's Distinguished Engineer Amin Vahdat, speaking at the 22nd Annual Symposium on High Speed Interconnects at Google's HQ in Mountain View, California. It's not about delivering old capabilities cheaper but about new programming models unavailable elsewhere, leveraging low latency and massive IOPS.

To get there, Vahdat said, the old way of networking is simply insufficient to deliver the performance and flexibility needed by next-gen compute and storage systems. Google is leveraging SDN and NFV to run its cloud.

Some key points in his talk:

  • Everything at Google runs on shared infrastructure. The SDN creates the illusion that individual applications/services are running in their own networks with their own IP address space.
  • Google's private backbone, which connects its data centers, is larger and growing faster than its public-facing network. Bandwidth between Google data centers is comparable to what others see inside their data centers.
  • Google has $2.9 billion of additional planned data center builds worldwide.
  • Operating at Google scale reveals that the dominant costs of running a big data center are power and cooling, not the cost of the initial equipment.
  • Google operates a global CDN of edge connectivity with express lanes back to its data centers.
  • Google's Andromeda software stack provides the logically centralized SDN control that orchestrates VMs, vSwitches, NICs, fabric switches, packet processors and routers.
  • A logically centralized/hierarchical control plane with a peer-to-peer data plane beats full decentralization.
  • Baseline NFV is in the fabric. While Google's Andromeda controller delivers NFV as a cloud service, new APIs will allow 3rd parties to offer additional network services over Google infrastructure.
  • DDoS protection is an essential NFV service. Large companies are under constant attack. The only question is how big the ongoing attack is.
  • Another NFV application is Google Cloud Load Balancing. This can be provisioned in 5 minutes and then takes 4 seconds to ramp. Steady state can be achieved in under 120 seconds. The total cost is $10.
  • Looking at future needs, the challenge is how to spin up a 1,000-port virtual network in a matter of seconds while ensuring isolation, load balancing, external access, bandwidth provisioning and SAN resources. Using Amdahl's lesser-known law (1 Mbps of I/O for every 1 MHz of computation) and assuming 32-core CPUs supporting lots of VMs, we can see the virtual network will require hundreds of Tbps of switching capacity.
  • Future NPUs must be >2X better than switches and CPUs for the same functionality.
  • Every piece of the system has to be in sync, and achieving this requires constant auditing. You are only as good as your weakest link.
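
To make the Amdahl's-law point above concrete, here is a rough back-of-the-envelope sketch, not Vahdat's own calculation. The rule of 1 Mbps of I/O per 1 MHz of computation, the 32 cores and the 1,000 ports come from the talk summary. The 2.5 GHz clock rate, the one-server-per-port mapping and the function names are assumptions made purely for illustration.

```python
# Back-of-the-envelope estimate based on Amdahl's balanced-system rule of thumb:
# roughly 1 Mbps of I/O for every 1 MHz of computation.
# ASSUMPTIONS (not from the talk): 2.5 GHz clock, one server per virtual port.

MBPS_PER_MHZ = 1.0  # the rule of thumb quoted in the talk


def required_io_mbps(cores: int, clock_mhz: float) -> float:
    """I/O bandwidth (Mbps) needed to keep one server's cores balanced."""
    return cores * clock_mhz * MBPS_PER_MHZ


def fabric_capacity_tbps(ports: int, cores_per_port: int, clock_mhz: float) -> float:
    """Aggregate switching capacity (Tbps) if every port carries one such server."""
    total_mbps = ports * required_io_mbps(cores_per_port, clock_mhz)
    return total_mbps / 1_000_000  # 1 Tbps = 1,000,000 Mbps


if __name__ == "__main__":
    one_socket = fabric_capacity_tbps(ports=1000, cores_per_port=32, clock_mhz=2500.0)
    two_socket = fabric_capacity_tbps(ports=1000, cores_per_port=64, clock_mhz=2500.0)
    print(f"1,000 ports, 32 cores each: {one_socket:.0f} Tbps")
    print(f"1,000 ports, 64 cores each: {two_socket:.0f} Tbps")
```

Under these assumptions a single 32-core server needs about 80 Gbps of I/O, so 1,000 of them need about 80 Tbps of switching capacity, and dual-socket servers push that to 160 Tbps. That is the order of magnitude behind the "hundreds of Tbps" estimate, though the exact figure depends on the clock rate and server count chosen.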

http://www.hoti.org