When the Changing Definition of 5G Real-Time Breaks Existing Tech Stack

In this article, we look at how the expectations of 5G are colliding with how we host and develop software. Several factors are coming together to break how we do things:

  • 5G’s millisecond latency expectations mean that the laws of physics will force us to operate under constraints that can’t be bypassed with more or faster hardware.
  • Some processing will, therefore, have to be done in local data centres at the edge, which upends a decade of consolidating systems into massive regional data centres.
  • 5G’s focus on the IoT will mean that many systems will be hard to update and potentially impossible to turn off, which will force a major rethink in how we provide long term support for deployed systems.

Check out this article for in-depth analysis and expert recommendations, including a checklist of the 5 points you’ll need to evaluate if you want to successfully upgrade your tech stack and excel in the era of 5G.

Table of contents

Introduction
Latency: Speed of light is now a factor
What is a “Digital Twin”
With enough devices, failure is a certainty
Planned downtime will be a serious problem
So what’s the “Inflection Point” then?
Life after we pass the inflection point
Next Steps: A Checklist
Conclusion

Introduction

Everybody’s talking about 5G and edge computing, but together they imply a significant change in the way we do things. Put simply:

Expectations for latency are hitting the laws of physics, and light’s speed of 186 miles per millisecond will drive application architecture.

‘End Users’ used to mean millions of humans. In the future, they will be dwarfed by billions of IoT devices, all of which will cause real-world problems if they lack adequate connectivity.

We may not be able to turn off the systems we deploy without creating an outage, which will also create real-world problems.

The “Inflection Point” in question is when latency expectations shrink to the point where the speed of light removes all the easy options and forces us to change how we do things. 5G and Edge both require latency that cannot be met with current techniques.


Latency: Speed of light is now a factor

Both 5G and edge computing are being positioned on the premise that billions of small IoT devices will be directed by some form of remote intelligence. A key idea of 5G is that latency will be no more than 1–3ms.

But light moves at a leisurely 186 miles per millisecond, and nothing moves faster than light. In fact, inside a fibre optic cable, light moves at around 120 miles per millisecond. If your device requires 3ms latency to work, and the remote processing takes no more than 1ms, then it can’t be located more than 120 miles away. But that’s the theory. Real-world numbers for WAN latency are a lot worse. For example:

  • Dublin and Frankfurt are 677 miles apart.
  • Observed latency for round-tripping is around 25ms.
  • Allowing for the round trip (1,354 miles travelled in 25ms), this works out at an effective 54 miles per millisecond.
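The bullets above reduce to a one-line calculation, using the article’s Dublin–Frankfurt figures:

```python
# Effective propagation speed implied by a measured WAN round trip:
# total distance travelled (both legs) divided by the round-trip time.
def effective_speed_miles_per_ms(distance_miles: float, rtt_ms: float) -> float:
    """Miles covered per millisecond, counting both legs of the round trip."""
    return (2 * distance_miles) / rtt_ms

# Dublin <-> Frankfurt: 677 miles apart, ~25ms observed round trip.
speed = effective_speed_miles_per_ms(677, 25)
print(f"{speed:.0f} miles per millisecond")  # ~54, vs ~120 for raw fibre
```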

So to rework our example above: If I am in Dublin, Ireland and I need a decision made in Frankfurt, Germany I can’t expect a response in under 25ms, even if the decision is instant. But that assumes that the problem can be solved in one round trip. Current programming orthodoxy is to use lots of individual trips, in the form of key/value lookups or individual SQL statements, and do the actual processing away from where the data is stored.

It’s immediately obvious that if we need 10 round trips, the latency constraints will prevent the processing from happening on an end-user device located even 5 miles away, which means it must happen either in the back end database system or on an application server co-located with it. We also need to consider that transport latency isn’t the only network overhead. There is an appreciable temporal overhead to even a trivial message on a LAN. Even in an ideal scenario where these two servers are on the same physical rack in a 5G scenario, they won’t have the latency budget to have verbose and chatty conversations with each other and still have time left to do the processing that is needed to solve their business problem.
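To see how quickly round trips consume a 5G-class budget, here is a small sketch with hypothetical numbers: a 3ms overall budget and 0.5ms per network round trip.

```python
# How much time is left for actual processing after network overhead?
# All figures are illustrative, not measurements.
def processing_time_left(budget_ms: float, trips: int, trip_cost_ms: float) -> float:
    """Milliseconds left for real work after `trips` round trips."""
    return budget_ms - trips * trip_cost_ms

for trips in (1, 3, 6):
    left = processing_time_left(3.0, trips, 0.5)
    print(f"{trips} round trips -> {left:.1f} ms left for processing")
```

At six round trips the entire 3ms budget is gone before any processing happens, which is why chatty key/value or per-statement SQL access patterns stop being viable.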

What is a “Digital Twin”

A Digital Twin is a virtual, online replica of something. It generally consists of three elements:

  • A heavily instrumented physical device that can be controlled remotely.
  • An online digital representation of the device with sufficient granularity that it is functionally identical to the physical device.
  • A reliable data transfer mechanism between the device and where its digital representation lives.

Once these three elements are in place it becomes feasible not just to apply cloud computing power to control a cheap and simple piece of equipment, but to use the cloud to make a large number of digital twin devices work together to achieve common goals.
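As a minimal illustration of the three elements above (not any particular product’s API; every name here is hypothetical), a digital twin might be sketched as:

```python
# Hypothetical sketch: device state, a server-side replica, and the
# transfer/control paths that keep the two in sync.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceState:
    sensor_readings: dict = field(default_factory=dict)
    last_command: Optional[str] = None


@dataclass
class DigitalTwin:
    device_id: str
    replica: DeviceState = field(default_factory=DeviceState)

    def ingest_telemetry(self, readings: dict) -> None:
        """Transfer mechanism: update the replica from device telemetry."""
        self.replica.sensor_readings.update(readings)

    def issue_command(self, command: str) -> str:
        """Control path: decide remotely, push the decision to the device."""
        self.replica.last_command = command
        return command


twin = DigitalTwin("valve-17")
twin.ingest_telemetry({"pressure_kpa": 412.0})
twin.issue_command("close")
```

The cloud-side value comes from running fleet-wide logic over many such replicas at once, rather than on each physical device.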

With enough devices, failure is a certainty

Quality has never been a strong point in the software industry. As Gerald Weinberg said, “If builders built houses the way programmers built programs, the first woodpecker to come along would destroy civilisation”. Up to now, a lot of devices’ interactions with the internet have been optional and often haven’t controlled their core functionality.

5G and edge take us to the point in time where we will now have billions of ‘digital twins’ doing things we depend on every day, all of which are themselves dependent on reliable, ultra-low latency decision making somewhere out in the network.

Because we’re dealing with real, physical hardware being continuously used the old approach of ‘turn it off and then on again’ isn’t going to be a viable option.

Latency also means that the devices depend on rapid connectivity to work, so failing to meet SLAs will at best be indistinguishable from an outage, and possibly much worse if the devices are directed to actively do things based on out of date information.

Planned downtime will be a serious problem

The real-time end of the software industry has become much better at avoiding unplanned downtime over the last decade. But scheduling planned downtime will become a problem with 5G / edge. When you create devices that need continuous connectivity to work, you also create a scenario where server-side downtime can’t easily be hidden from the outside world. This has several implications:

  1. Complicated stacks with layers of open source components may not work well. If each layer in the stack has its own, independent patching schedule then zealously applying patches as they arrive could create a significant and disruptive number of planned outages. But failing to apply patches as you become aware of them creates potential legal issues. If downtime starts costing thousands of dollars per minute, reducing it by reducing the number of components needed to run an application will become a goal.
  2. Because patching and maintenance are much easier to do at the software level instead of the firmware level, there will be a strong incentive to remove as much processing capability from end devices as possible.
  3. Agile development with many small releases may not be fully appropriate for working with billions of ‘Digital Twins’, each of which has a real-world consequence for failure.
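The patching arithmetic in point 1 can be sketched as a back-of-envelope calculation; the layer counts and patch cadences below are hypothetical:

```python
# Worst case: every patch to every independently-maintained layer in the
# stack forces its own planned outage window.
def planned_outages_per_year(layers: int, patches_per_layer_per_year: int) -> int:
    """Planned outage windows per year if patches cannot be batched."""
    return layers * patches_per_layer_per_year

print(planned_outages_per_year(6, 12))  # 6-layer stack, monthly patches -> 72
print(planned_outages_per_year(2, 12))  # simpler 2-layer stack -> 24
```

Collapsing the stack is the only lever that reduces both numbers at once, which is the incentive behind point 1.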

So what’s the “Inflection Point” then?

“Inflection Point” is one of those terms, like “Quantum Leap”, that gets overused. We would argue that in this case it’s fully justified.

The fundamental problem we face is that unavoidable network latency starts to devour our available time, leaving less and less time available for doing actual work.

With 5G/edge we get to a point where adding ‘more of stuff’ or using ‘faster stuff’ ceases to be relevant to solving our problems, and moving to a ‘faster network’ ceases to be possible.

The diagram below shows this. It shows the speed of light, with the measured speed we get from a WAN below it.

The Unknowable Zone

The X-axis is time, in ms. The Y-axis is the distance, in miles. So in 10ms, light moves 1,860 miles, or to turn it on its head, anything you hear about that took 10ms or under to reach you can’t be more than 1,860 miles away.

In practice, we can’t get anywhere near this, as we’re sending data through fibre optic cables, inside which light moves at about 2/3 of the maximum speed.

Using a real-world WAN, the distance is more like 500 miles in 10ms. Now the “Your Network” line can be moved upwards by new technology, investment in dedicated fibre cables, etc., but it can never get above the “fibre optic cable line”, never mind approaching the “Speed of Light” line.
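The arithmetic behind those three lines can be sketched as follows, using the article’s own figures (186 mi/ms in a vacuum, ~120 in fibre, 54 on a measured WAN). Note that if a reply is needed within the budget, only half the distance travelled is usable reach:

```python
# Maximum one-way distance reachable within a round-trip latency budget,
# for each line on the chart. Speeds are in miles per millisecond.
SPEEDS = {
    "speed of light (vacuum)": 186.0,
    "light in fibre (~2/3 c)": 120.0,
    "measured WAN":            54.0,  # the Dublin-Frankfurt figure
}

def max_distance_miles(rtt_budget_ms: float, speed_mi_per_ms: float) -> float:
    """Half the budget goes each way, so reach = speed * budget / 2."""
    return speed_mi_per_ms * rtt_budget_ms / 2

for name, speed in SPEEDS.items():
    print(f"{name}: {max_distance_miles(1.0, speed):.0f} miles at a 1 ms RTT")
```

At a 1ms round-trip budget the real-world WAN line allows well under 30 miles of reach, before a single microsecond of processing is spent, which is the 5G zone in the bottom left-hand corner.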

The area under the “Your Network” line is where all client-server computing lives, with the distance between the client and the server causing latencies to go up remorselessly.

At the bottom left-hand corner, we have the “5G zone”. 5G is expecting round-trip latencies of 1ms. The only way to meet this need is for both parties to be within a few miles of each other, holding terse conversations. There is no technological way past this without a radical breakthrough in physics.

The key point here is that these changes won’t be happening in a vacuum, and they won’t be happening gradually over a decade. The implementation of 5G and edge computing means that the number of devices and user sessions will skyrocket, with the 5G spec assuming ‘1,000,000 per square km’. A significant number of these will be digital twins operating on hyperaggressive SLAs and need constant contact with remote computing capability because they can’t make decisions without it. So at the same time as the number of connected devices goes up by a factor of ten, we start to expect low latency and hyper-reliable connectivity.


Historically ‘we want it faster’ was aspirational. Now ‘we need it in 4ms’ may well be part of your service level agreement.

If we accept the premise of 5G/edge computing at face value, then we also have to accept that it will lead to challenges we will struggle to meet with current techniques and architecture.

Life after we pass the inflection point

While we can’t predict the future with certainty, we can foresee some of the changes that will be needed to accomplish this shift:

  • “One big data centre” will become “local data hubs + one big data centre”
  • We will need to move from many-layered application stacks to much simpler stacks
  • Verbose messaging will need to be replaced with terse messaging and edge computing
  • “Best effort” reliability will become SLA-enforced reliability
  • Devices become connected and instrumented, but dumb

Any one of these on its own would be a significant challenge; collectively they mean that the status quo is unsustainable.

“One big data centre” will become “local data hubs + one big data centre”

New applications will have aggressive latency requirements that bump up against the laws of physics. This will mandate edge processing. Edge processing implies much more complicated deployment architecture, both from a technical and business perspective.

While we’ll still have a significant investment in centralized computing, large chunks of latency-sensitive functionality will be done at the edge.

We will need to move from many-layered application stacks to much simpler stacks

Before 5G / edge, we’d repeatedly seen a phenomenon where a business problem was solved by creating a layered architecture of multiple components, only to run into latency issues. What may not be obvious is that every time you add a layer, you also add latency, and in the 5G / edge world you have to keep your latency budget for communicating with edge devices, not squander it letting data mosey between different layers of a stack. What’s most frustrating about this phenomenon is that the latency lost to inter-layer hops could instead be invested in processing the data to derive value and decisions.

The new reality is that latency budgets will drive architectures going forward and that this will push people to use fewer layers so that data streaming, stream processing, and transactional state management merge into a unified platform to address the real-time needs and requirements in a much more holistic fashion.

Verbose messaging will need to be replaced with terse messaging and edge computing

Modern applications tend to use lots of very small physical messages to accomplish a business goal. These include not just messages that originate from the client, but additional secondary messages within an application stack. For most current telco applications, you need something like 3-5 physical messages to accomplish a single logical task. Even if each of those messages takes 0.5ms, that’s still up to 2.5ms spent communicating. This will create problems both for traditional RDBMS products that communicate using many individual SQL statements and for key/value store type applications. We simply don’t have the latency budget to do 20-30 round trips to solve a business problem. The alternative is either to move the business logic to where the data is or to locate the entire application right out on the edge.
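The message-count arithmetic above can be written down directly; all figures here are illustrative:

```python
# Network time consumed by a chatty protocol: per-message cost times
# message count, compared against a hypothetical 3 ms latency budget.
def network_time_ms(messages: int, per_message_ms: float = 0.5) -> float:
    """Total milliseconds spent on the wire for `messages` round trips."""
    return messages * per_message_ms

BUDGET_MS = 3.0
for messages in (1, 5, 20):
    spent = network_time_ms(messages)
    verdict = "fits" if spent <= BUDGET_MS else "blows the budget"
    print(f"{messages} messages: {spent:.1f} ms -> {verdict}")
```

Five messages already consume most of the budget; twenty cannot fit at all, regardless of how fast the servers themselves are.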

“Best effort” reliability will become SLA-enforced reliability

When it comes to automated systems, there are real-world consequences of failing to meet demanding latency expectations. As a consequence, we can expect to see commercially binding SLAs emerge going forward. While a lot of the core 5G technological effort is devoted to creating reliable communication across the network, there will still be latency bottlenecks in application code – within data centres and between layers of an application. People normally think of these bottlenecks as visible and permanent consequences of prior architectural decisions, but we would argue that the real issue lies with the behind-the-scenes housekeeping activities related to memory and disk management, as well as the overheads associated with elastic expansion and contraction of cluster sizes. Our experience is that many benchmarks and proof of concepts complete without these issues ever getting the visibility they deserve.

Devices become connected and instrumented, but dumb

Early efforts at the IoT involved adding sensors to devices so they could send data for remote analysis but didn’t change how the devices used limited local computing power to perform their core functions. Once 5G/edge connectivity becomes a given, there is no absolute reason to retain significant processing power on every device. Doing so increases cost, guarantees immediate obsolescence, and creates a never-ending patching and support nightmare. Once you commit to edge computing, there may well be scenarios where removing as much intelligence from the devices as possible is the logical thing to do.

Next Steps: A Checklist

Once you look into 5G, you realize that it isn’t just a marketing term or an excuse to replace good hardware with marginally better hardware. Real changes in how we do things will be needed. Use these 5 points to evaluate how to upgrade your tech stack:

  • Understand the challenge. Is the product OLTP and able to make accurate decisions at scale within single-digit millisecond latency?
  • Has this product been used in the critical path? If downtime is more than an inconvenience, is the product backed by a world-class professional support organization that will stand behind it if needed? What are their SLAs?
  • Can the product architecturally minimize network trips? Can you solve your business problems with a minimal number of latency- and bandwidth-devouring network trips? Is there a storage engine that avoids the latency spikes seen in Java-based products (e.g. garbage collection pauses)?
  • Is the stack simple and vertically integrated? Can the product spend the minimum practical amount of time processing in a highly available manner before sending you a response?
  • Is the product used to being part of an IoT/streaming ecosystem? Virtually all interactions ultimately come from devices, not people. Does the product have inbound and outbound connectivity to Kafka, Kinesis, and messaging technologies to integrate multiple streams of data?

Conclusion

As you move towards 5G/edge, you’re going to need to accept that the demanding latency requirements, and the need to support real, live physical devices, mean that you can’t assume you can use your existing software stack or development techniques.

If you’re starting this journey and looking for a platform that combines millisecond latency, high availability, and scalability with accurate decision making, contact us at voltdb.com/contact to meet for a one-hour specification workshop and understand how VoltDB can help.

Source: VoltDB

Published by Thomas Apel, a dynamic and self-motivated information technology architect with a thorough knowledge of all facets of system and network infrastructure design, implementation and administration. He enjoys the technical writing process and answering readers’ comments.