Automated Peering Engineering – Peering optimization in a modern internet world

The routing mechanism used for internet peering has remained largely unchanged over the past few decades, but it is fraught with problems. It still relies heavily on slow manual configuration, which requires skilled engineers and is laborious and error-prone. In addition, the lack of visibility and the limitations of the Border Gateway Protocol (BGP) used to route traffic at these peering points can affect the end user’s quality of experience (QoE), as well as profitability.

These risks, combined with the increasingly unpredictable nature of today’s internet traffic, are pushing communications service providers (CSPs), enterprises and webscale companies to reconsider their peering solutions.

This white paper discusses an automated approach to internet peering that can handle rapidly changing traffic patterns more effectively. It integrates real-time traffic visibility with instant software control of network resources and application traffic flows — which enables automatic traffic engineering at the peering routers and network interconnections.

When these automated processes are applied to inbound and outbound traffic, they can improve interface utilization, simplify operations and reduce costly manual misconfigurations. They also help avoid traffic congestion and decrease end-to-end latency to improve performance and QoE — particularly for time-sensitive applications such as content streaming and online gaming.

Introduction

Today’s web applications are very different from those of just 15 years ago, when web browsing, e-commerce and social media were dominant. Now consumer applications, such as gaming and video streaming, are increasingly latency- and bandwidth-sensitive. Moreover, the new wave of mission-critical, industrial-grade internet applications presents additional constraints in terms of packet loss and round-trip delays.

This current mix of applications makes internet traffic highly unpredictable. A live event, a popular TV show, a mobile phone OS update, a viral OTT video or a web attack by hackers can generate sudden traffic spikes in the network which can impact the end user’s QoE.

Versatile and unpredictable nature of growing internet traffic

Rapidly changing network demand patterns are affecting both end-user applications and existing SLAs. As the volume of inbound and outbound traffic continues to grow, network congestion and degraded performance are occurring more frequently. In addition, existing technologies that rely heavily on complex and error-prone manual configurations are showing their limits.

“When [gaming company] Fortnite pushed out an update recently, the network went insane for two hours” — Neil McRae, managing director and chief architect at BT

Currently, many service providers deal with time-sensitive applications and volatile traffic by overprovisioning their networks. This expensive, “heavy iron” approach may have worked in the past. But its costliness impacts profitability in the long run, especially as the gap between peak and mean traffic levels rises with dynamic network consumption patterns.

Businesses and end users alike now have a very low tolerance for poor quality. Providers are constantly evaluated on their performance, and users have new tools for making those evaluations and sharing their opinions. For example, the Netflix speed index measures and ranks prime-time Netflix performance for specific ISPs. Gamers can easily find out from internet reviews which ISPs deliver the speeds they need. If unsatisfied with a service, they can complain on social media, potentially damaging the provider's reputation, or simply turn to a competitor.

Consequently, ensuring high volume traffic flow with consistent high performance for a top-notch QoE has become a critical business imperative.

Solution

Making the IP network much more responsive starts by simplifying processes and enabling automation where applicable. An integrated solution is essential for handling fast-changing traffic patterns, addressing network congestion and maintaining performance that can deliver optimal end-user QoE at all times.

Insight-driven automation

Insight-driven automation is a new approach to dynamic IP networking. It combines real-time traffic visibility with instant control over network resources and application traffic flows. Applied to peering, it enables incoming and/or outgoing traffic to be automatically engineered at peering routers and network interconnections.

This approach closes the loop between intent and outcome in three steps:

  1. The operator defines the intent. It can be expressed at a business or application level, then translated into network constraints to steer the closed loop. Operators do not have to describe how to reach the goals. The solution finds the best way independently, based on the objectives that have been set.
  2. The network produces a wide range of continuous feedback information which may include network telemetry, machine data and context information from cloud servers and devices. This information is converted into meaningful insights using multi-dimensional analytics.
  3. These insights are translated into network actions, to be executed by the programmable network, that express the stated intent within the constraints provided.

The state of the network is continuously monitored to verify whether the outcome is meeting the intended goals. Any time the outcome deviates from the initial intent, the insight-driven automation loop takes corrective actions designed to prevent service-impacting issues — or remedy them. For example, a policy can be created to automatically re-map services onto alternative network resources and paths when congestion or high latency is detected.

In a fully automated mode, adjustments can happen without human intervention. In semi-automated mode, the operator validates the actions before proceeding with the changes.
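One pass of the loop described above can be sketched as follows. This is a minimal illustration, not the white paper's implementation: the `Intent` and `LinkState` fields, the "move to the least-utilized compliant link" rule, and the `apply` and `approve` callbacks are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Action = Tuple[str, str]  # (move traffic from this link, to this link)

@dataclass
class Intent:
    """Operator intent already translated into network constraints (hypothetical fields)."""
    max_utilization: float  # fraction of link capacity, e.g. 0.8
    max_latency_ms: float

@dataclass
class LinkState:
    """One telemetry snapshot for a peering link (hypothetical fields)."""
    name: str
    utilization: float
    latency_ms: float

def find_violations(intent: Intent, links: List[LinkState]) -> List[LinkState]:
    """Insight step: links whose observed state deviates from the intent."""
    return [
        link for link in links
        if link.utilization > intent.max_utilization
        or link.latency_ms > intent.max_latency_ms
    ]

def plan_actions(intent: Intent, links: List[LinkState]) -> List[Action]:
    """Action step: propose moving traffic off each violating link
    onto the least-utilized link that still satisfies the intent."""
    violating = find_violations(intent, links)
    compliant = [link for link in links if link not in violating]
    if not compliant:
        return []  # nothing safe to move traffic to
    target = min(compliant, key=lambda link: link.utilization)
    return [(bad.name, target.name) for bad in violating]

def run_once(
    intent: Intent,
    links: List[LinkState],
    apply: Callable[[Action], None],
    approve: Optional[Callable[[Action], bool]] = None,
) -> List[Action]:
    """Run one pass of the loop.

    approve=None models fully automated mode; passing a callback models
    semi-automated mode, where the operator validates each action first.
    """
    applied = []
    for action in plan_actions(intent, links):
        if approve is None or approve(action):
            apply(action)
            applied.append(action)
    return applied
```

In a real deployment `run_once` would be called repeatedly against fresh telemetry, and `apply` would program the network through whichever control mechanism is in use; here it can be any callable, such as `print`.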

The insight-driven automation loop

The following capabilities are required to use the insight-driven automation approach:

Real-time visibility, which can:

  • Perform fine-grained analytics on network links with peering partners
  • Identify specific applications flowing from, to and through the network
  • Establish traffic baselines, observe trends and compile metrics.

Traffic optimization, which can:

  • Identify where traffic trends and metrics deviate from an operator’s intent and therefore require re-optimization
  • Map selected traffic flows to more appropriate paths.
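As an illustration of how a baseline and a deviation check might fit together, the sketch below uses an exponentially weighted moving average (EWMA). The source does not prescribe a method; EWMA, the `alpha` smoothing factor and the `tolerance` fraction are assumptions chosen for the example.

```python
from typing import List, Sequence

def deviations(samples: Sequence[float], alpha: float = 0.3,
               tolerance: float = 0.5) -> List[int]:
    """Return the indices of samples that differ from the running baseline
    by more than `tolerance` (as a fraction of the baseline).

    The baseline is an EWMA seeded with the first sample. Each sample is
    compared with the baseline computed from earlier samples only, and the
    baseline is then updated with that sample.
    """
    if not samples:
        return []
    flagged = []
    baseline = samples[0]
    for i, value in enumerate(samples[1:], start=1):
        if baseline > 0 and abs(value - baseline) / baseline > tolerance:
            flagged.append(i)
        # Update the baseline so it keeps tracking the longer-term trend.
        baseline = alpha * value + (1 - alpha) * baseline
    return flagged
```

For example, with per-interval traffic samples of `[10, 10, 10, 30, 10]` (arbitrary units), only the spike at index 3 is flagged; the return to normal afterwards stays within the default tolerance.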

Network control, which can:

  • Direct traffic flows using applicable standard protocols and mechanisms, such as NETCONF/YANG, OpenFlow, multivendor filters, BGP Flowspec, BGP route injection, Segment Routing, BGP policies, and RIB/FIB APIs
  • Continuously monitor network state and resource health in real time.

Options for steering traffic

A large set of attributes can be used to determine when and where traffic flows should be steered. These attributes help support sophisticated use cases with fine-grained optimization. They can be used by operators to define intent, or they can be used in the closed loop to dynamically program and optimize the network infrastructure and achieve the best outcomes. The criteria described below can serve as a checklist when evaluating the required attributes of an automated peering engineering solution.

Flow classification attributes (5-tuple, prefix, ASN, BGP community and DSCP) are the most common triggers for traffic steering. However, some analytics tools can provide more meaningful attributes which are closer to the business intent and easier to manipulate. These attributes include:

  • Cost: transit fees. This option can, for example, ensure that a transit provider that charges fees is only used when the utilization on direct peering links reaches a certain threshold.
  • Performance: link utilization, latency, jitter, packet loss. For example, traffic can be moved from an interface before it reaches 80 percent utilization to avoid congestion and service degradation, or time-sensitive applications such as gaming or VoIP can be moved to paths with the lowest end-to-end latency.
  • Geographical coverage: countries, regions, sites. For example, traffic can avoid paths that traverse certain geographical areas because of reliability, security or geo-political reasons.
  • Source and destination: enterprises, peering partners, transit providers, CDN operators and internet exchange providers. For example, lowest latency paths can be favored for financial firms with high-frequency trading.
  • Application type: video, gaming, storage, VoIP, P2P, etc.
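The cost and performance examples in the list above can be combined into a simple selection rule. The following is a sketch under stated assumptions: the `Egress` record, the `"peering"`/`"transit"` labels and the `select_egress` function are hypothetical, and the 0.8 default simply mirrors the 80 percent utilization example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Egress:
    """A candidate exit link (hypothetical fields)."""
    name: str
    kind: str           # "peering" (no transit fee) or "transit" (paid)
    utilization: float  # fraction of capacity in use
    latency_ms: float   # measured end-to-end latency via this link

def select_egress(candidates: List[Egress], threshold: float = 0.8,
                  latency_sensitive: bool = False) -> Optional[Egress]:
    """Pick an egress link for a traffic flow.

    Cost rule: paid transit is considered only when every direct peering
    link has reached the utilization threshold.
    Performance rule: links at or above the threshold are never chosen;
    latency-sensitive flows take the lowest-latency eligible link, other
    flows take the least-utilized one.
    """
    def eligible(kind: str) -> List[Egress]:
        return [e for e in candidates
                if e.kind == kind and e.utilization < threshold]

    pool = eligible("peering") or eligible("transit")
    if not pool:
        return None  # every link is at or above the threshold
    if latency_sensitive:
        return min(pool, key=lambda e: e.latency_ms)
    return min(pool, key=lambda e: e.utilization)
```

Calling `select_egress(links, latency_sensitive=True)` for gaming or VoIP flows and leaving the flag off for bulk traffic is one way to express the application-type attribute with the same rule.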

Network scope is another key aspect that needs to be considered. In many cases, optimizing traffic only at the peering edge is not enough. Taking the provider network and/or adjacent networks into account, in addition to the peering points, is the only way to guarantee the best end-to-end performance. As shown in Figure 4, with visibility at the peering edge only, it may look like the blue path (going through peering router 1 to provider B) provides the best performance. However, complete visibility will reveal that the red path (going through peering router 2 to provider B) is actually the best option.

Impact of network scope (peering points and internal network) on traffic optimization
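The effect of network scope can be shown with a small comparison. The latency figures below are invented for illustration and are not taken from the figure; the `Path` record and both helper functions are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Path:
    """Latency of one path, split by network segment (hypothetical fields)."""
    name: str
    internal_ms: float  # across the provider's own network to the peering router
    peering_ms: float   # on the peering link itself
    external_ms: float  # inside the adjacent network

    def end_to_end_ms(self) -> float:
        return self.internal_ms + self.peering_ms + self.external_ms

def best_by_edge(paths: List[Path]) -> Path:
    """Choice made with visibility at the peering edge only."""
    return min(paths, key=lambda p: p.peering_ms)

def best_end_to_end(paths: List[Path]) -> Path:
    """Choice made with visibility across all three segments."""
    return min(paths, key=lambda p: p.end_to_end_ms())

# Illustrative numbers only: blue has the faster peering link, but a much
# longer internal segment than red.
paths = [
    Path("blue (peering router 1)", internal_ms=30.0, peering_ms=2.0, external_ms=10.0),
    Path("red (peering router 2)", internal_ms=5.0, peering_ms=4.0, external_ms=10.0),
]
```

With these numbers, `best_by_edge(paths)` selects the blue path because its peering link is faster, while `best_end_to_end(paths)` selects the red path because its total latency is lower.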

Conclusion

Operators can now take advantage of automation to optimize interconnections with their peers for inbound and outbound traffic. But a comprehensive approach is essential for gaining the greatest benefit. That means using an insight-driven automation loop to combine real-time traffic visibility and analytics with instant control over network resources and application traffic flows — enabling dynamic traffic engineering at the peering routers and network interconnections.

Automating peering engineering with greater visibility and control provides crucial advantages. With new levels of insight, providers can anticipate and react quickly to sudden changes in traffic patterns, and this faster response time reduces congestion and improves performance for the latest dynamic and bandwidth-sensitive applications. Automation also improves the utilization of peering interfaces and simplifies the process of moving traffic when interfaces get congested — or when better performance can be achieved. In addition, closed-loop automation helps avoid slow, costly manual configuration and the errors it can cause.

Source: Nokia White paper