by Martin Taylor, Chief Technical Officer, Metaswitch
In October 2012, when a group of 13 network operators launched their white paper describing Network Functions Virtualization, the world of cloud computing technology looked very different than it does today. As cloud computing has evolved, and as telcos have developed a deeper understanding of it, so the vision for NFV has evolved and changed out of all recognition.
The early vision of NFV focused on moving away from proprietary hardware to software running on commercial off-the-shelf servers. This was described in terms of “software appliances”. And in describing the compute environment in which those software appliances would run, the NFV pioneers took their inspiration from enterprise IT practices of that era, which focused on consolidating servers with the aid of hypervisors that essentially virtualized the physical host environment.
Meanwhile, hyperscale Web players such as Netflix and Facebook were developing cloud-based system architectures that deliver massive scalability with a high degree of resilience, that can be evolved very rapidly through incremental software enhancements, and that can be operated very cost-effectively thanks to extensive operations automation. The set of practices developed by these players has come to be known as “cloud-native”, which can be summarized as dynamically orchestratable micro-services architectures, often based on stateless processing elements working with separate state storage micro-services, all deployed in Linux containers.
It’s been clear to most network operators for at least a couple of years that cloud-native is the right way to do NFV, for the following reasons:
- Microservices-based architectures promote rapid evolution of software capabilities to enable enhancement of services and operations, unlike legacy monolithic software architectures with their 9-18 month upgrade cycles and their costly and complicated roll-out procedures.
- Microservices-based architectures enable independent and dynamic scaling of different functional elements of the system with active-active N+k redundancy, which minimizes the hardware resources required to deliver any given service.
- Software packaged in containers is inherently more portable than software packaged in VMs, and does much to eliminate the complex dependencies between VMs and the underlying infrastructure that have been a major issue for NFV deployments to date.
- The cloud-native ecosystem includes some outstandingly useful open source projects, foremost among which is Kubernetes – of which more later. Other key open source projects in the cloud-native ecosystem include Helm, a Kubernetes application deployment manager, service meshes such as Istio and Linkerd, and telemetry/logging solutions including Prometheus, Fluentd and Grafana. All of these combine to simplify, accelerate and lower the cost of developing, deploying and operating cloud-native network functions.
5G is the first new generation of mobile technology since the advent of the NFV era, and as such it represents a great opportunity to do NFV right – that is, the cloud-native way. The 3GPP standards for 5G are designed to promote a cloud-native approach to the 5G core – but they don’t actually guarantee that 5G core products will be recognisably cloud-native. It’s perfectly possible to build a standards-compliant 5G core that is resolutely legacy in its software architecture, and we believe that some vendors will go down that path. But some, at least, are stepping up to the plate and building genuinely cloud native solutions for the 5G core.
Cloud-native today is almost synonymous with containers orchestrated by Kubernetes. It wasn’t always thus: when we started developing our cloud-native IMS solution in 2012, these technologies were not around. It’s perfectly possible to build something that is cloud-native in all respects other than running in containers – i.e. dynamically orchestratable stateless microservices running in VMs – and production deployments of our cloud native IMS have demonstrated many of the benefits that cloud-native brings, particularly with regard to simple, rapid scaling of the system and the automation of lifecycle management operations such as software upgrade. But there’s no question that building cloud-native systems with containers is far better, not least because you can then take advantage of Kubernetes, and the rich orchestration and management ecosystem around it.
The rise to prominence of Kubernetes is almost unprecedented among open source projects. Originally released by Google as recently as July 2015, Kubernetes became the seed project of the Cloud Native Computing Foundation (CNCF), and rapidly eclipsed all the other container orchestration solutions that were out there at the time. It is now available in multiple mature distros including Red Hat OpenShift and Pivotal Container Service, and is also offered as a service by all the major public cloud operators. It’s the only game in town when it comes to deploying and managing cloud native applications. And, for the first time, we have a genuinely common platform for running cloud applications across both private and public clouds. This is hugely helpful to telcos who are starting to explore the possibility of hybrid clouds for NFV.
So what exactly is Kubernetes? It’s a container orchestration system for automating application deployment, scaling and management. For those who are familiar with the ETSI NFV architecture, it essentially covers the Virtual Infrastructure Manager (VIM) and VNF Manager (VNFM) roles.
In its VIM role, Kubernetes schedules container-based workloads and manages their network connectivity. In OpenStack terms, those functions are covered by Nova and Neutron respectively. Kubernetes also provides built-in load balancing through its Service abstraction, making it easy to deploy scale-out microservices.
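As a sketch of what this looks like in practice, the following manifest deploys a hypothetical scale-out microservice as a replicated Deployment fronted by a load-balancing Service. All names and the image reference are illustrative, not from any real product:

```yaml
# Hypothetical microservice: Kubernetes schedules three container instances.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sip-proxy                    # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sip-proxy
  template:
    metadata:
      labels:
        app: sip-proxy
    spec:
      containers:
      - name: sip-proxy
        image: example.com/sip-proxy:1.0   # assumed image location
        ports:
        - containerPort: 5060
---
# The Service gives the replicas a stable virtual IP and
# load-balances traffic across whichever instances are running.
apiVersion: v1
kind: Service
metadata:
  name: sip-proxy
spec:
  selector:
    app: sip-proxy
  ports:
  - port: 5060
    targetPort: 5060
```

The Deployment plays roughly the role Nova plays for VMs (placement and scheduling), while the Service covers the connectivity and load-balancing concerns that Neutron and LBaaS address in OpenStack.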
In its VNFM role, Kubernetes can monitor the health of each container instance and restart any failed instance. It can also monitor the relative load on a set of container instances that are providing some specific micro-service, and can scale out (or scale in) by spinning up new containers or spinning down existing ones. In this sense, Kubernetes acts as a Generic VNFM. For some types of workloads, especially stateful ones such as databases or state stores, Kubernetes’ native lifecycle management functionality is not sufficient. For those cases, Kubernetes has an extension mechanism, the Operator Framework, which provides a means to encapsulate application-specific lifecycle management logic. In NFV terms, an Operator is a standardized way of building a Specific VNFM.
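To make the Generic VNFM role concrete, here is a hedged sketch (all names illustrative) of the two mechanisms just described: a liveness probe, which lets Kubernetes detect and restart a failed instance, and a HorizontalPodAutoscaler, which scales the set of instances in or out based on observed load:

```yaml
# Fragment of a Deployment's pod template: the liveness probe tells
# Kubernetes how to detect a failed instance so it can be restarted.
    spec:
      containers:
      - name: sip-proxy                    # illustrative microservice
        image: example.com/sip-proxy:1.0
        livenessProbe:
          httpGet:
            path: /healthz                 # assumed health endpoint
            port: 8080
          periodSeconds: 10
---
# Scale between 3 and 20 instances based on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sip-proxy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sip-proxy
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```

Note that this delivers the active-active N+k scaling model mentioned earlier without any bespoke VNFM logic: the desired behaviour is declared once, and Kubernetes enforces it continuously.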
But Kubernetes goes way beyond the simple application lifecycle management envisaged by the ETSI NFV effort. Kubernetes itself, together with a growing ecosystem of open source projects that surround it, is at the heart of a movement towards a declarative, version-controlled approach to defining both software infrastructure and applications. The vision here is for all aspects of a complex cloud native system, including cluster infrastructure and application configuration, to be described in a set of documents that are under version control, typically in a Git repository, which maintains a complete history of every change. These documents describe the desired state of the system, and a set of software agents acts to ensure that the actual state of the system is automatically aligned with the desired state. With the aid of a service mesh such as Istio, changes to system configuration or software version can be automatically “canary” tested on a small proportion of traffic prior to being rolled out fully across the deployment. If any issues are detected, the change can simply be rolled back. The high degree of automation and control offered by this kind of approach has enabled Web-scale companies such as Netflix to reduce software release cycles from months to minutes.
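As an illustration of the canary-testing step, an Istio VirtualService can split traffic by weight between two versions of a service. This is a sketch with illustrative names; the v1 and v2 subsets would be defined in a companion DestinationRule:

```yaml
# Route 95% of traffic to the current version, 5% to the canary.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: sip-proxy
spec:
  hosts:
  - sip-proxy
  http:
  - route:
    - destination:
        host: sip-proxy
        subset: v1          # current production version
      weight: 95
    - destination:
        host: sip-proxy
        subset: v2          # canary version under test
      weight: 5
```

Because this document lives in the Git repository alongside everything else, promoting the canary is just a matter of committing a change to the weights, and rolling back is reverting that commit.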
Many of the network operators we talk to have a pretty good understanding of the benefits of cloud native NFV, and of the technicalities of containers and Kubernetes. But we’ve also detected a substantial level of concern about how we get there from here. “Here” means today’s NFV infrastructure built on a hypervisor-based virtualization environment supporting VNFs deployed as virtual machines, where the VIM is either OpenStack or VMware. The conventional wisdom seems to be that you run Kubernetes on top of your existing VIM, and this is certainly possible: you just provision a number of VMs and treat them as hosts for the purposes of installing a Kubernetes cluster. But then you end up with a two-tier environment in which you have to deploy and orchestrate services across some mix of cloud native network functions in containers and VM-based VNFs. Orchestration ends up driving some mix of Kubernetes, OpenStack and VMware APIs, and Kubernetes has to coexist with proprietary VNFMs for lifecycle management. It doesn’t sound very pretty, and indeed it isn’t.
In our work with cloud-native VNFs, containers and Kubernetes, we’ve seen just how much easier it is to deploy and manage large scale applications using this approach compared with traditional hypervisor-based approaches. The difference is huge. We firmly believe that adopting this approach is the key to unlocking the massive potential of NFV to simplify operations and accelerate the pace of innovation in services. But at the same time, we understand why some network operators would baulk at introducing further complexity into what is already a very complex NFV infrastructure.
That’s why we think the right approach is to level everything up to Kubernetes. And there’s an emerging open source project that makes that possible: KubeVirt.
KubeVirt provides a way to take an existing Virtual Machine and run it inside a container. From the point of view of the VM, it thinks it’s running on a hypervisor. From the point of view of Kubernetes, the VM looks like just another container workload. So with KubeVirt, you can deploy and manage applications that comprise any arbitrary mix of native container workloads and VM workloads using Kubernetes.
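To give a flavour of this, a VM under KubeVirt is declared as a Kubernetes resource like any other. The following is a sketch only: the names and disk image are illustrative, and KubeVirt’s API schema has evolved across versions:

```yaml
# A legacy VM-based VNF, described declaratively for Kubernetes.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: legacy-vnf                   # illustrative name
spec:
  running: true                      # KubeVirt keeps the VM powered on
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
      - name: rootdisk
        containerDisk:
          image: example.com/legacy-vnf-disk:1.0   # VM disk image shipped as a container image
```

Once applied, the VM is scheduled, monitored and connected by Kubernetes alongside ordinary container workloads, which is exactly the “single environment” property the next paragraph argues for.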
In our view, KubeVirt could open the way to adopting Kubernetes as a “level playing field” and de facto standard environment across all types of cloud infrastructure, supporting highly automated deployment and management of true cloud native VNFs and legacy VM-based VNFs alike. The underlying infrastructure can be OpenStack, VMware, bare metal – or any of the main public clouds including Azure, AWS or Google. This grand unified vision of NFV seems to us to be truly compelling. We think network operators should ratchet up the pressure on their vendors to deliver genuinely cloud native, container-based VNFs, and get serious about Kubernetes as an integral part of their NFV infrastructure. Without any question, that is where the future lies.