Sunday, June 25, 2023

Blueprint: You Can’t Find Flaws in 5G Apps if You Don’t Know What to Look For

By Glenn Chagnot, Sr. Director of Product Management – Cloud, Spirent Communications

Imagine you oversee a Communication Service Provider (CSP) organization transitioning to a cloud-native 5G architecture. You’ve put in extensive work building up skills to deploy and manage Cloud-Native Network Functions (CNFs), and you’re excited for the new agility and scalability of your telco cloud. But there’s a problem: the performance of one of your 5G production applications keeps degrading, and no one can understand why. You’re at risk of violating enterprise service-level agreements (SLAs), and your new service has barely gotten off the ground. 

You call a frantic all-hands meeting, and after hours of investigation, you finally pinpoint the issue: a new CNF isn’t getting the network performance it needs from the cloud. How did this happen? The Workload team points to the Cloud engineers. The Cloud team points back. Who is at fault—and more importantly, who’s responsible for making sure it doesn’t happen again? Unfortunately, you’ve just encountered one of the biggest blind spots causing headaches for CSPs around the globe. And the only one who can really fix this problem is you.

In the dynamic cloud-native world we now live in, the performance of 5G applications depends directly on the performance of the underlying cloud. But if no one knows what each workload actually needs from that cloud, no one will be making sure it gets it. That level of visibility wasn’t required before, so in many CSP organizations these essential insights fall through the cracks. It doesn’t have to be that way: you can take steps to shore up this oversight. When you do, you’ll find yourself with more stable, better-performing 5G services—and fewer sleepless nights.

Navigating Complexity

Assuring stable, performant 5G applications (firewalls, 5G network functions, and so on) starts with recognizing that each CNF has specific requirements of the cloud infrastructure it runs on. If a workload doesn’t get what it needs in any of several dimensions (storage, memory, latency between CNFs or pods), its performance will degrade. Eventually, it will fail altogether. But exactly what a given workload needs—and who’s responsible for making sure it gets it—remains a gray area in most CSP organizations (Figure 1).

Figure 1. Who Fills the Gaps Between Cloud and CNF?

Cloud-native environments introduce enormous variability. A 5G workload might be deployed on multiple types of distributed Kubernetes pods, running on dozens of different physical or virtual hosts, across a variety of public and private clouds. A given CNF’s performance varies greatly depending on choices made at each of those layers, so there is no single answer for how the cloud should be configured. Even if there were, unexpected impairments will crop up in constantly changing cloud environments. If you don’t know a workload’s minimum requirements across all the different performance dimensions ahead of time, determining which one is impaired in production is like untangling a giant knot.
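One way to picture those minimum requirements is as a per-workload profile that can be checked against what the cloud is actually delivering. The following Python sketch is illustrative only: the dimension names, units, and thresholds are assumptions for the example, not measurements from any real CNF.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Minimum cloud requirements for one CNF, per performance dimension.

    The dimensions and units here are illustrative; a real profile would
    come from lab characterization of the specific CNF.
    """
    name: str
    min_storage_iops: int      # sustained storage I/O the CNF needs
    min_memory_gib: float      # memory headroom before degradation
    max_pod_latency_ms: float  # worst tolerable latency between pods


def find_impaired_dimensions(profile: WorkloadProfile, observed: dict) -> list:
    """Compare observed cloud metrics against the profile's minimums.

    `observed` maps metric names to measured values (e.g. from production
    monitoring). Returns the list of dimensions that are out of spec.
    """
    impaired = []
    if observed["storage_iops"] < profile.min_storage_iops:
        impaired.append("storage")
    if observed["memory_gib"] < profile.min_memory_gib:
        impaired.append("memory")
    if observed["pod_latency_ms"] > profile.max_pod_latency_ms:
        impaired.append("latency")
    return impaired


# Example: a hypothetical CNF profile checked against live metrics.
sample_cnf = WorkloadProfile("sample-cnf", min_storage_iops=20_000,
                             min_memory_gib=8.0, max_pod_latency_ms=2.0)
print(find_impaired_dimensions(
    sample_cnf,
    {"storage_iops": 25_000, "memory_gib": 6.5, "pod_latency_ms": 1.2}))
# → ['memory']
```

With a profile like this in hand, "which dimension is impaired?" becomes a direct comparison rather than the knot-untangling exercise described above.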

Compounding the problem, the nature of the data required—about the intersection of workload and cloud—falls precisely between the Application and Cloud team responsibilities that CSP organizations typically define. So, even if everyone agrees that this information is important, it’s not clear who’s responsible for providing it. 

There’s a clear solution to this challenge: thoroughly testing each 5G CNF, in the lab and in preproduction, to characterize the performance it needs across every dimension of the cloud. Once you have that data, you can map it to specific cloud configurations, providing the critical context that’s been missing all this time. Now, Application teams can tell their Cloud colleagues exactly what they need for each workload. And Cloud teams know what to test for and monitor on an ongoing basis.
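To make the mapping step concrete: once lab testing has characterized a CNF across candidate cloud configurations, the results can drive a simple lookup that tells the Cloud team which configurations are known-good for that workload. A minimal Python sketch, in which every configuration name and number is invented for illustration:

```python
# Characterization results from lab testing: for each candidate cloud
# configuration, the performance a hypothetical CNF achieved on it.
# All names and figures are illustrative.
lab_results = {
    "private-sriov":   {"throughput_gbps": 9.2, "pod_latency_ms": 0.8},
    "private-default": {"throughput_gbps": 4.1, "pod_latency_ms": 2.5},
    "public-cloud-a":  {"throughput_gbps": 6.7, "pod_latency_ms": 1.9},
}

# The minimum the CNF needs, taken from its service requirements.
requirements = {"throughput_gbps": 6.0, "pod_latency_ms": 2.0}


def acceptable_configs(results: dict, reqs: dict) -> list:
    """Return the configurations whose measured performance meets every
    requirement: throughput at or above the floor, latency at or below
    the ceiling."""
    ok = []
    for config, measured in results.items():
        if (measured["throughput_gbps"] >= reqs["throughput_gbps"]
                and measured["pod_latency_ms"] <= reqs["pod_latency_ms"]):
            ok.append(config)
    return ok


print(acceptable_configs(lab_results, requirements))
# → ['private-sriov', 'public-cloud-a']
```

The output is exactly the artifact the article calls for: a statement from the Application team to the Cloud team of which configurations this workload is known to run well on.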

Finding Answers

It sounds simple enough: You just need to test each CNF exhaustively in the lab (and then test again whenever something changes). Unfortunately, there’s no way to do that using current testing approaches. Even in the most sophisticated CSP labs, most testing infrastructure is still designed for traditional environments—not the dynamic, unpredictable conditions of real-world clouds. Indeed, this is one of the more common issues we see CSPs face. Legacy testing might show you how a 5G application performs in a cloud—an idealized, infinitely performant one. But it can’t predict how it will behave in your cloud, with its constant fluctuations and outright component failures. 

There are two approaches to address this problem, one short-term and one long-term:

Short-term: Work with an expert testing partner. The industry is developing new testing approaches for cloud-native 5G networks, but this effort is still very much in progress. It will be some time before productized solutions exist to simplify CNF-to-cloud testing. For now, the quickest, easiest way to resolve this disconnect is to work with one of the organizations currently inventing those tools and processes. 

Long-term: Start evolving teams and tooling for cloud-native testing. This problem isn’t getting easier; if anything, cloud environments will only get more complex and dynamic over time. There are no shortcuts, no vendor spec sheets coming that will make this go away. Eventually, you will need to build a CNF-to-cloud testing capability within your organization. Most likely, that will include expanding the charter of Application teams to collect this mapping data, and tasking Cloud teams with monitoring the production environment and regularly testing against those metrics. It will also require everyone to plan for closer collaboration in preproduction. You should be able to go CNF by CNF, identify those with issues, and make sure they’re addressed before promoting anything to production. 
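The CNF-by-CNF promotion check described above could be sketched as a simple preproduction gate: each workload is promoted only if its preproduction measurements meet the baselines established in the lab. Everything in this Python sketch (metric names, figures, the pass/fail rule) is an illustrative assumption:

```python
def promotion_gate(cnf_baselines: dict, preprod_measurements: dict) -> tuple:
    """Decide, CNF by CNF, which workloads are safe to promote.

    cnf_baselines: per-CNF minimum metrics established in the lab.
    preprod_measurements: the same metrics measured in preproduction.
    Returns (promote, hold) lists. Illustrative sketch only.
    """
    promote, hold = [], []
    for cnf, baseline in cnf_baselines.items():
        measured = preprod_measurements.get(cnf)
        if measured is None:
            hold.append(cnf)  # never measured in preproduction: don't promote
            continue
        # Every baseline metric must be met or exceeded in preproduction.
        if all(measured.get(metric, 0) >= floor
               for metric, floor in baseline.items()):
            promote.append(cnf)
        else:
            hold.append(cnf)
    return promote, hold


# Hypothetical baselines and preproduction results for three CNFs.
baselines = {"amf": {"tps": 1000}, "smf": {"tps": 800}, "upf": {"gbps": 9}}
measured = {"amf": {"tps": 1200}, "smf": {"tps": 750}}
print(promotion_gate(baselines, measured))
# → (['amf'], ['smf', 'upf'])
```

A gate like this is where the expanded team charters meet: Application teams supply the baselines, Cloud teams supply the preproduction measurements, and nothing reaches production until both sides agree the numbers line up.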

The good news is that, once you know what to watch for, you can eliminate one of the biggest problems plaguing telco cloud organizations as they evolve to cloud-native architectures. Whether working with a partner or (eventually) testing yourself, you’ll be able to identify most issues that arise from CNF-to-cloud mismatches during testing and preproduction, when they’re far less expensive to fix. And when problems do arise in production, you’ll have meaningful baselines against which to measure what you see in the environment, so you can quickly diagnose and correct them. Ultimately, you’ll find you can leave the fire drills and finger-pointing behind, and push ahead with new 5G services with confidence.