Internet Network Peering Limitations, Challenges, Opportunities and Solutions

Based on interviews with internet service providers, transit providers and content providers, the paper offers first-hand perspectives on internet peering. Find out about network peering limitations and challenges, as well as opportunities and solutions that better align peering with the needs of real-time applications.

Internet Network Peering Limitations, Challenges, Opportunities and Solutions. Source: ACG Research

Content Summary

Executive Summary
Key Findings
The Internet Needs Peering
Challenges And Limitations In Peering
Edge Computing And Hope Will Not Make Peering Engineering Easier
Networking Investments That Can Help Transform Peering Engineering
Superior Peering Engineering And Security
Moving At The Speed Of Trust
Conclusion

Executive Summary

Global network connectivity and a wide variety of application services are made possible by Internet peering. The ability to interconnect diverse networks from communication service provider (CSP) and webscale companies alike enables users to access information, education, applications and entertainment from around the world. However, evolutions in network peering have generally not kept pace with the types of traffic and applications driving the Internet today or with the widespread increase in security threats that are present in the Internet.

The heart of Internet peering operations is the Border Gateway Protocol (BGP) control plane, which distributes reachability information so that packets of all types can reach their destinations. Despite being central to peering operations, BGP has well-known limitations in supporting many of today's applications, including a lack of capacity awareness, a lack of understanding of real-time traffic utilization at its peering points, and a lack of end-to-end application or flow visibility to ensure users’ expectations are met.

To evaluate the challenges, opportunities and evolutions present in network peering and gain first-hand insights about them, ACG Research conducted interviews with CSP and webscale companies, including an online gaming company. Both types of companies face similar challenges in their peering processes: these processes are highly manual, generally reactive and involve constant monitoring and reconfiguration by specially skilled peering engineers. Although they face these common challenges, their perspectives on what to do about them differ.

Best-of-breed peering is evolving toward automated, intelligent solutions leveraging investments in three key areas: analytics, expanded ingress and egress controls, and enhanced data planes with fabrics instrumented for increased visibility and granularity in traffic management. This combination of enhancements acknowledges the importance of BGP in peering deployments, while addressing its shortcomings. For peering to become more automated, engineers will need to develop increased trust in the analysis and the decisions made by automated systems that they leverage. By providing visualizations and dashboards that integrate peering engineers into the process, algorithms can be hardened, and trust can be obtained along the journey.

Key Findings

  • Peering implementations have not kept pace with modern application needs and security threats
  • The peering control plane utilizes BGP, which lacks:
    • Capacity awareness
    • Knowledge of real-time traffic utilization
    • End-to-end visibility
  • CSP and webscale companies have similar challenges and disparate perspectives on peering needs
  • Three types of investment are needed to improve peering implementations:
    • Analytics engines
    • Unified controls
    • Instrumented fabrics
  • Automation of peering operations requires trust, which involves active engineering engagement

The Internet Needs Peering

One of the greatest attributes of the Internet is its ability to interconnect a collection of independent networks and autonomous systems (AS) to extend reachability across the globe and deliver communication and application services. By interconnecting local and regional networks with national and global ones, a student in rural Southern Illinois or France can utilize a computer and a DSL connection to reach across a world of information, education, applications and entertainment. Internet peering makes this possible.

In the early days of the Internet, peering between networks occurred primarily through public network access points (NAP) in a few major cities. A local Internet Service Provider (ISP) would make a single or redundant connection from one or more ports on a core router/switch to a port on a router/switch at the geographically nearest NAP, which then provided access to the rest of the Internet and other ISPs. With a single or redundant connection, public peering offered the simplest method to reach the rest of the Internet but did not offer much in the way of traffic visibility and control.

The Internet connects more than 3.5 billion users and 20 billion devices and is made up of an incredibly diverse collection of companies and networks, including communication service providers with local, national and global networks and webscale companies with massive content distribution and application services, like Netflix, Amazon and Facebook. Each company designs, executes and evolves a combination of public and private peering connections to meet the technical and commercial needs of its services and its customers. Today, there are more public peering alternatives than ever before with companies like Equinix offering Internet eXchange services at over 36 Internet exchange points worldwide. In addition to public peering, private peering connections (also referred to as private network interconnects) are frequently used with targeted peers to increase control, improve performance and manage costs.

Figure 1. Public, Private and Transit Peering Relationships. Source: ACG Research

Although public and private peering arrangements are widely accessible and cost-effective, the approaches generally taken to network peering have not kept pace with the diversity of traffic and applications driving the Internet today, nor with the growth of security threats and denial-of-service attacks. To explore these issues further and obtain first-hand perspectives on operators’ experiences in wrestling with them, ACG Research conducted interviews with a variety of CSP and webscale companies, including a global online gaming company. We share the challenges and limitations these and other service providers are encountering in their network peering implementations, and we identify opportunities and prescriptions that operators could pursue to better align their peering implementations with the needs of modern application service delivery and its evolution.

We use the terms CSP company and webscale company to refer to two broad classes of companies that peer with each other. Generally, CSPs own IP/optical networks that directly provide mobile, residential and enterprise connectivity services to end users, and webscale companies deliver application and cloud computing services, such as streaming video, IoT, public cloud computing and online gaming. We acknowledge that in many cases a company (or one of its business units) may fit into both the CSP and webscale categories. However, these two categories remain useful when discussing the perspectives and needs of both groups.

Challenges And Limitations In Peering

Network peering can be divided into three functional parts: data plane (or network fabric), control plane and management plane. The data plane carries the physical traffic and is responsible for forwarding packets with the appropriate priority and urgency. The control plane is responsible for directing or programming the fabric on how to treat various types of traffic and to which next-hop to forward packets in order to reach their destination. The management plane configures the network and individual physical and virtual elements and provides performance monitoring throughout the service and deployment life cycle.

Although it has undergone numerous revisions and extensions over the past 20 years, the Border Gateway Protocol (BGP) remains at the heart of the control plane for network peering. BGP began, and to a large extent remains, a protocol designed to advertise routes, or network reachability information, across networks. Peering routers use BGP to advertise IP routing information across a collection of connections and networks (and the Internet) so that traffic can cross network boundaries and reach its destination.

For all of its advantages, BGP suffers from several shortcomings with respect to current service delivery needs. The biggest and best-known of these shortcomings are a lack of:

  • Capacity awareness in the data plane
  • Visibility into real-time traffic utilization and performance among multiple paths
  • End-to-end application and flow visibility to measure users’ experiences

Because of these shortcomings, operators rely on a number of BGP attributes and extensions to enhance operations; some go further and apply additional monitoring and control techniques in their own environments to optimize traffic flows (as we will discuss). Examples of BGP techniques that support enhanced operations in peering route selection include:

  • AS Prepending
  • BGP Multi-Exit Discriminator (MED) attributes
  • BGP communities

These techniques allow operators to influence the path selected in their peering implementations. Using these techniques in egress peering tends to be more straightforward as it involves steering the operator’s own traffic at the peering point along a specific path. Conversely, using them in ingress peering is more nuanced, as this involves attempts to influence the forwarding approach taken by traffic incoming from an adjacent network operator.
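To make the effect of these techniques concrete, the Python sketch below models a heavily simplified version of BGP best-path selection: highest LOCAL_PREF, then shortest AS path, then lowest MED (compared only between routes from the same neighboring AS), then lowest router ID. Real routers apply more steps (origin, eBGP versus iBGP, IGP cost and others), and every prefix, AS number (from documentation and private-use ranges) and router ID here is a hypothetical example. It shows how prepending extra copies of our own AS on one announcement makes a remote network prefer the other path.

```python
import functools
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass(frozen=True)
class Route:
    prefix: str
    neighbor_as: int             # AS of the neighbor that sent us this route
    as_path: Tuple[int, ...]     # AS_PATH, nearest AS first
    local_pref: int = 100        # locally assigned; higher is preferred
    med: int = 0                 # Multi-Exit Discriminator; lower is preferred
    router_id: str = "0.0.0.0"   # BGP identifier of the advertising router

def prepend(as_path: Tuple[int, ...], own_as: int, extra: int) -> Tuple[int, ...]:
    """AS Prepending: add `extra` additional copies of our AS to the front of the path."""
    return (own_as,) * extra + as_path

def better(a: Route, b: Route) -> Route:
    """Return the preferred of two routes under a simplified decision process."""
    if a.local_pref != b.local_pref:
        return a if a.local_pref > b.local_pref else b
    if len(a.as_path) != len(b.as_path):
        return a if len(a.as_path) < len(b.as_path) else b
    # MED is only compared between routes learned from the same neighboring AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    # Final tie-break in this sketch: lowest router ID.
    if ipaddress.IPv4Address(a.router_id) <= ipaddress.IPv4Address(b.router_id):
        return a
    return b

def best_path(routes: Iterable[Route]) -> Route:
    # Pairwise comparison in list order (ignores MED's known ordering quirks).
    return functools.reduce(better, routes)

# A remote network sees our prefix (AS 64500) via Provider A (AS 64510)
# and Provider B (AS 64520).
via_a = Route("203.0.113.0/24", 64510, (64510, 64500), router_id="192.0.2.1")
via_b = Route("203.0.113.0/24", 64520, (64520, 64500), router_id="192.0.2.2")
print(best_path([via_a, via_b]).neighbor_as)  # 64510: equal length, lower router ID wins

# We prepend twice on the announcement to Provider A, so that path gets longer.
via_a_prepended = Route("203.0.113.0/24", 64510, (64510,) + prepend((64500,), 64500, 2),
                        router_id="192.0.2.1")
print(best_path([via_a_prepended, via_b]).neighbor_as)  # 64520
```

Because LOCAL_PREF is evaluated before AS path length, a remote network that sets its own LOCAL_PREF can override our prepending, which is one reason ingress steering is less predictable than egress steering.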

Although these techniques provide methods for steering traffic, they often involve manual updates (for example, through a command line interface) to routing policies in peering deployments. These manual updates are typically performed by a few select peering experts in network engineering or operations teams. Given the increasing number of peering interconnects that operators need to support, this approach is frequently cumbersome, complex and error prone, characteristics many operators would like to overcome.

This point matters to operators in different segments. Although networking is not the basis of their commercial offerings (as it is for CSPs), webscale companies have invested in peering engineering to enable reliable delivery of their applications and services. They have developed application performance dashboards and built peering engineering teams that use a combination of public and private peering partners to reach their end customers. As one example, the gaming company we interviewed manages almost 3,000 BGP peering sessions with close to a thousand unique AS peers and is adding more every year.

Top Peering Challenges

Although CSPs and webscale companies share many common challenges in peering, they also bring unique perspectives and priorities to their peering operations. We highlight the top common challenges they face (summarized in Table 1). Both have complex peering configurations that specialized engineering personnel modify manually in response to new service implementations, service degradation events or network alarms. Although this provides some level of control, manually reconfiguring the network after an issue is identified does not meet end-users’ quality of experience expectations and is fraught with the potential for errors or misconfigurations that can create additional service degradations. BGP does provide real-time reachability information across networks and will redirect traffic if an outage occurs (and a route is withdrawn from routing tables); however, BGP itself has no view into the characteristics of live traffic, so it does not adapt transmission paths based on packet loss, latency, congestion or other real-time conditions that degrade performance without causing a complete loss of connectivity. If a better path becomes available based upon current conditions regarding those metrics, it is highly unlikely BGP will find it.
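The gap between reachability and performance can be illustrated with a small Python sketch. Each path carries a `reachable` flag (the only thing BGP acts on) plus latency and loss values that we assume come from active probes or flow telemetry; the thresholds, path names and measurements are hypothetical. BGP keeps using the degraded but reachable path, while a performance-aware check moves to a healthy alternative.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PathStats:
    reachable: bool     # the only signal BGP acts on: is a route present?
    latency_ms: float   # assumed to come from active probes or telemetry
    loss_pct: float     # measured packet loss, in percent

def bgp_keeps_path(stats: PathStats) -> bool:
    """BGP keeps the path as long as the route has not been withdrawn."""
    return stats.reachable

def performance_choice(paths: Dict[str, PathStats], active: str,
                       max_latency_ms: float, max_loss_pct: float) -> Optional[str]:
    """Keep the active path while it is healthy; otherwise pick the healthy
    path with the lowest latency. Return None if no path is healthy."""
    def healthy(s: PathStats) -> bool:
        return s.reachable and s.latency_ms <= max_latency_ms and s.loss_pct <= max_loss_pct

    if healthy(paths[active]):
        return active  # avoid needless churn
    candidates = [name for name, s in paths.items() if healthy(s)]
    if not candidates:
        return None
    return min(candidates, key=lambda name: (paths[name].latency_ms, paths[name].loss_pct))

paths = {
    "provider_a": PathStats(reachable=True, latency_ms=140.0, loss_pct=2.5),  # congested
    "provider_b": PathStats(reachable=True, latency_ms=45.0, loss_pct=0.1),
}
print(bgp_keeps_path(paths["provider_a"]))                 # True
print(performance_choice(paths, "provider_a", 80.0, 1.0))  # provider_b
```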

Adding to the subtleties that operators must address are the differences in operation between private and public peering implementations. Some companies rely heavily on private (versus public) peering because public peering gives them relatively little control over the intervening resources (in terms of bandwidth and latency, for example) and relatively little deterministic behavior in the event of a failover or fault condition. CSP and webscale companies acknowledged that public peering is by its nature a shared connectivity fabric that can become congested during a failover condition or when operating on a secondary path through an IXP’s network. Multiple companies told ACG that although they maintain both public and private peering connections, they use private peering as much as possible to maximize their control over their traffic.

In addition, both CSP and webscale companies are seeing an increase in the number of peering connections they require. Where prior private peering arrangements may have called for two peering interconnections, 10 or more are becoming increasingly common today. The expansion is driven by a desire to control costs and enhance application performance. With multiple peering interconnections, CSP companies avoid the cost of backhauling traffic to larger centralized peering locations (for example, NYC and LA). Similarly, webscale companies that can reach their customers locally benefit from reduced transmission distances and application latency. As one example, a CSP company told ACG Research that in the past several years it has expanded its peering locations to more than 30 cities throughout the USA, whereas previously it peered only in the three largest (NYC, LA and Chicago).

Table 1. CSP and Webscale Network Peering Challenges and Priorities. Source: ACG Research

Disparate Priorities and Focus

Although CSP and webscale companies face many similar challenges, they also bring different perspectives and priorities to network peering. CSP companies must manage all kinds of peering traffic from multiple peering partners with diverse latency, packet-loss, bandwidth, cost and resiliency requirements. Webscale companies, on the other hand, are laser focused on their application(s) for their customers. CSP companies may offer a service level agreement (SLA), but these SLAs tend to cover aggregated traffic on a gross scale, for example, a commitment that traffic between Amsterdam and Paris will average 1 Gb/s with no more than 500 milliseconds of latency in any 24-hour period. Although such an SLA might be acceptable for connecting two enterprise branch offices, it is inadequate for a webscale company attempting to measure the latency and experience of an individual online gamer or other consumer of a performance-sensitive application.
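A short Python sketch shows how an aggregate SLA can hide exactly the per-flow problems a webscale company cares about. The latency samples are invented for illustration, the 500 ms figure mirrors the example SLA above, the 80 ms per-flow target is the gaming company's target cited later in this paper, and judging each flow by its worst sample is a simplifying assumption.

```python
from statistics import mean
from typing import Dict, List

def aggregate_sla_met(flows: Dict[str, List[float]], sla_ms: float) -> bool:
    """Aggregate view: average latency over every sample from every flow."""
    all_samples = [s for samples in flows.values() for s in samples]
    return mean(all_samples) <= sla_ms

def flows_over_target(flows: Dict[str, List[float]], target_ms: float) -> List[str]:
    """Per-flow view: flows whose worst observed latency exceeds the target."""
    return sorted(fid for fid, samples in flows.items() if max(samples) > target_ms)

# Hypothetical latency samples (ms) for individual gamers between two cities.
flows = {
    "gamer-1": [35, 40, 38],
    "gamer-2": [60, 150, 95],
    "gamer-3": [30, 32, 31],
}
print(aggregate_sla_met(flows, sla_ms=500))    # True: the aggregate SLA is met
print(flows_over_target(flows, target_ms=80))  # ['gamer-2']
```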

Webscale and CSP companies also have different priorities regarding traffic directionality. Because webscale companies originate content, they are usually more focused on managing egress peering (to the CSP network), whereas CSP companies, which are generally recipients of traffic from webscale companies, tend to be more focused on managing ingress peering.

To put more color on this, one of the first questions we asked operators was: what does a webscale company want from Internet peering, and is it able to get it? The simple answer is that a webscale company wants the ability to deliver an end-user experience consistent with the expectations for the service or application the customer purchased. For example, in massively multiplayer online gaming, if the movement of players in the virtual landscape is jerky and intermittent because of latency or packet loss, users do not view the offering as a good gaming experience. Lag or latency (not bandwidth) is the performance metric that most consumes the attention of webscale companies.

Because most webscale application delivery involves a server and a client (for example, a client on a PC or gaming console), webscale companies can monitor the real-time end-to-end performance of their customers' sessions (because they are present at both ends). Using data from its application performance dashboard, the engineering team modifies peering configurations and optimizes traffic paths to meet per-flow latency targets, such as 80 ms or less in the case of our gaming company's application. If collaboration with peering partners and peering reconfigurations do not achieve the targeted latency, the company may further alter its network design, in some cases by distributing additional gaming servers to metropolitan locations. By pushing gaming servers closer to its end-users, the company minimizes transmission distances and the volume of transit peering. The company is then able to focus on achieving its latency objectives by working directly with private peers for last-mile access connectivity to its gamers.
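The feedback loop described above can be sketched in Python as follows. We assume the application performance dashboard exposes round-trip-time samples per (region, peer) pair; the region and peer names, the sample values and the nearest-rank 95th percentile are illustrative choices, not the gaming company's actual method. Regions that no peer can serve within 80 ms are flagged as candidates for additional metropolitan gaming servers.

```python
import math
from typing import Dict, List, Tuple

TARGET_MS = 80.0  # per-flow latency target cited for the gaming company

def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

def plan_paths(measurements: Dict[Tuple[str, str], List[float]],
               target_ms: float = TARGET_MS) -> Tuple[Dict[str, str], List[str]]:
    """Pick the peer with the lowest p95 RTT per region. Regions whose best
    peer still misses the target are returned as edge-server candidates."""
    best: Dict[str, Tuple[float, str]] = {}
    for (region, peer), samples in measurements.items():
        value = p95(samples)
        if region not in best or value < best[region][0]:
            best[region] = (value, peer)
    chosen = {region: peer for region, (value, peer) in best.items() if value <= target_ms}
    needs_edge_servers = sorted(region for region, (value, _) in best.items()
                                if value > target_ms)
    return chosen, needs_edge_servers

# Hypothetical RTT samples (ms) from the application performance dashboard.
measurements = {
    ("chicago", "peer-x"): [40, 42, 45, 50, 48],
    ("chicago", "peer-y"): [70, 90, 85, 88, 95],
    ("denver", "peer-x"): [95, 100, 110, 97, 105],
    ("denver", "peer-z"): [85, 90, 92, 88, 91],
}
print(plan_paths(measurements))  # ({'chicago': 'peer-x'}, ['denver'])
```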

Now, back to our original question. Can a webscale company get what it wants from Internet peering? It may be able to, but doing so is not usually simple or immediate. Webscale companies generally need to create their own peering engineering team, collaborate with multiple public and private peers and may need to modify their own network architecture, including deployment of application servers closer to end-users, to meet their networking requirements and objectives.

Edge Computing And Hope Will Not Make Peering Engineering Easier

Beyond the distributed deployment of application servers, multiple technology evolutions are converging to enable edge computing, which places computing resources within tightly constrained latency boundaries closer to end-users and Internet of Things (IoT) endpoints at the edge of service providers’ and enterprises’ networks. As an extension of the cloud deployment model, edge computing is expected to be software controlled, dynamically managed and automatically responsive to users’ consumption requirements. Edge computing is appealing because it enables new capabilities not possible with prior infrastructures and deployments. Examples include augmented reality to enhance travel and retail experiences; improved public safety and emergency response for smart cities using video analytics; improved industrial efficiency for quality monitoring and production process management; and, of course, real-time multiplayer gaming.

Most edge applications will use a tiered distribution model: placing locally relevant functionality at the resource pool closest to the end-user; regionally meaningful logic in metro data centers and colocation sites; and whole service functions in large, centralized data centers. Local cloud pools will need to be continuously refreshed within latency and throughput budgets with information from network and application modules running in nearby sites. Modules in each of the edge sites will be advised of the best hop in their path to the nearest regional application processing site.
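The tiered model can be expressed as a simple placement rule. The Python sketch below assumes hypothetical round-trip times from the end-user to each tier and places each function in the most centralized tier that still fits its latency budget; treating centralized capacity as the default home (and the edge as the exception) is our assumption, as are the function names and budgets.

```python
from typing import Dict, List, Tuple

# Hypothetical end-user round-trip times per tier, most centralized first.
TIERS: List[Tuple[str, float]] = [
    ("central-dc", 60.0),
    ("metro-dc", 20.0),
    ("edge-pool", 5.0),
]

def place(latency_budget_ms: float) -> str:
    """Return the most centralized tier whose RTT fits within the budget."""
    for tier, rtt_ms in TIERS:
        if rtt_ms <= latency_budget_ms:
            return tier
    raise ValueError(f"no tier meets a {latency_budget_ms} ms budget")

# Hypothetical functions of an augmented reality application and their budgets (ms).
functions: Dict[str, float] = {
    "catalog-sync": 200.0,
    "regional-matching": 30.0,
    "frame-overlay": 8.0,
}
for name, budget in functions.items():
    print(name, "->", place(budget))
# catalog-sync -> central-dc
# regional-matching -> metro-dc
# frame-overlay -> edge-pool
```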

These dynamics will put pressure on network peering in at least two dimensions: scalability and automation. The number of peering points between a CSP and a webscale company will increase significantly (by as much as 10x) with the need to support single-digit and double-digit millisecond response times for applications distributed across the footprints of both companies. Increased peering interconnections will enable performance-optimized peering to connect the application with the right data at the right time in the right location.

Peering implementations will also need to adjust automatically to meet end-users’ expectations, coordinating with application and service delivery that is now distributed across the networks of CSPs and webscales. We anticipate the primary focus of automation to be between access or last-mile CSP companies and the adjacent webscale company that owns the application. One can imagine embedding management plane data, such as application performance monitoring and network performance monitoring data, within or in conjunction with BGP control plane updates to enable CSP and webscale companies to make easier and better-informed decisions on a more granular, per-flow basis.

Networking Investments That Can Help Transform Peering Engineering

Given the many challenges in existing and emerging peering environments, what can be done to address the priorities of both CSP and webscale companies? In a prior paper, Powering Intelligent Network Services with Real-Time Visibility, Analytics and Automation, ACG Research identified four key types of investments service providers should be making to transform their networks to support more intelligent, application-aware and automated service delivery. We focus on three of these as the building blocks for improving network peering through better decision-making and traffic routing for optimal application delivery and end-users’ quality of experience:

  • Real-time analytics leveraging network and application telemetry
  • Simplified and fully programmable control plane
  • More capable data plane and highly instrumented networking fabric

At the heart of these investments is the transition to insight-driven automation of network operations. Insight-driven automation is critical for helping CSP and webscale companies accomplish dynamic network peering, eliminate manual configurations and the inefficiencies and errors they introduce, and meet their end users’ increasingly demanding requirements for responsiveness and quality of experience. But intelligent automation first requires insight. To derive insight, one must have access to reliable and timely information, which requires an instrumented, granular networking fabric. An instrumented fabric also provides a foundation for better peering security.

Figure 2. Insight Driven Automated Network Peering. Source: ACG Research

A More Capable Data Plane and Highly Instrumented Networking Fabric

Although most of the challenges in Internet peering are tied to BGP and the control plane, a more capable data plane will enhance network peering in four ways: increased capacity, increased logical scalability, real-time performance monitoring and real-time flow replication.

With respect to capacity, IP traffic continues to increase at a 25% compound annual growth rate (CAGR), with peak data rates growing even faster at 40% CAGR. Mobile operators report seeing 50% annual data growth in their networks. In an interview, a transit network service provider told us that one of its biggest challenges is the lack of 100 Gb/s interfaces with potential peering partners. To provide the necessary capacity, the network fabric and its physical interconnections need to keep up with raw transmission capacity requirements, including 100G+ port speeds.
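The growth rates above translate into doubling times with a little arithmetic: at a compound annual growth rate r, traffic doubles every ln 2 / ln(1 + r) years. The Python snippet below applies that formula to the 25%, 40% and 50% figures cited in this section; the five-year horizon is our illustrative choice.

```python
import math

def growth_multiple(cagr: float, years: float) -> float:
    """Traffic multiple after `years` of compound growth at rate `cagr`."""
    return (1 + cagr) ** years

def doubling_time_years(cagr: float) -> float:
    """Years for traffic to double at compound rate `cagr`."""
    return math.log(2) / math.log(1 + cagr)

for label, rate in [("average IP traffic", 0.25), ("peak data rates", 0.40),
                    ("mobile data", 0.50)]:
    print(f"{label}: doubles every {doubling_time_years(rate):.1f} years, "
          f"x{growth_multiple(rate, 5):.1f} in 5 years")
# average IP traffic: doubles every 3.1 years, x3.1 in 5 years
# peak data rates: doubles every 2.1 years, x5.4 in 5 years
# mobile data: doubles every 1.7 years, x7.6 in 5 years
```

Under these rates, a peering port that is half full at peak today would be saturated at peak in roughly two years.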

Increased logical scalability in the number of traffic classifiers, access control lists (ACL), and their sophistication is needed to enable the fabric to be programmed to perform the right traffic forwarding at scale while blocking malicious or aberrant traffic from entering the network at the ingress point.

To enhance path selection in peering, we need better performance monitoring. Real-time per flow statistics and metadata collection using IPFIX, NetFlow, sFlow and streaming telemetry functions will enable performance monitoring and service assurance software to measure, analyze and diagnose per application and per flow performance. Better data and analysis can help match flow performance with application delivery requirements and expectations (for example, by operating within their latency budgets).
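As a minimal illustration of turning flow metadata into per-application measurements, the Python sketch below aggregates simplified flow records into average throughput per application. The record fields loosely follow common NetFlow/IPFIX fields (five-tuple, octet count, start and end timestamps), but the dataclass is our own simplification rather than either export format, and classifying applications by protocol and destination port (including the port chosen for the hypothetical game service) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass(frozen=True)
class FlowRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int   # IANA protocol number: 6 = TCP, 17 = UDP
    octets: int
    start_ms: int   # flow start, in milliseconds
    end_ms: int     # flow end, in milliseconds

# Hypothetical classification by (protocol, destination port).
APP_BY_PORT = {(6, 443): "https", (17, 3074): "game"}

def per_app_throughput_bps(records: Iterable[FlowRecord]) -> Dict[str, float]:
    """Average bits per second per application over the span its flows cover."""
    octets: Dict[str, int] = defaultdict(int)
    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for r in records:
        app = APP_BY_PORT.get((r.protocol, r.dst_port), "other")
        octets[app] += r.octets
        first[app] = min(first.get(app, r.start_ms), r.start_ms)
        last[app] = max(last.get(app, r.end_ms), r.end_ms)
    return {app: octets[app] * 8 * 1000 / max(last[app] - first[app], 1)
            for app in octets}

records = [
    FlowRecord("198.51.100.5", "203.0.113.10", 51000, 443, 6, 1_000_000, 0, 1000),
    FlowRecord("198.51.100.6", "203.0.113.10", 51001, 443, 6, 500_000, 500, 2000),
    FlowRecord("198.51.100.7", "203.0.113.20", 3074, 3074, 17, 12_500, 0, 1000),
]
print(per_app_throughput_bps(records))  # {'https': 6000000.0, 'game': 100000.0}
```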

Real-time per-flow replication enables the fabric to continue forwarding traffic while in parallel sending an identical copy to an analytics engine for additional analysis. If the analysis positively identifies a virus or DDoS attack, additional action can be taken to install an ACL rule that blocks or rate-limits the original flow at the ingress point to the network. Five-tuple filtering rules can be used for the ACL entry, but industry-leading packet processors also support matching on payload patterns. Additional DDoS security mitigation and the use of BGP Flow Specification are covered later in this document, in Use-Case #3.
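The replicate-then-block loop can be sketched in Python. This is a toy software model, not how a packet processor is actually programmed: we assume an ordered, first-match ACL whose entries match on the five-tuple plus an optional payload substring, a default permit entry, and a mirror list standing in for the copy sent to the analytics engine. Addresses, ports and the payload signature are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    payload: bytes = b""

@dataclass(frozen=True)
class AclEntry:
    action: str                        # "permit", "deny" or "rate-limit"
    src: str = "0.0.0.0/0"
    dst: str = "0.0.0.0/0"
    proto: Optional[int] = None        # None matches any value
    sport: Optional[int] = None
    dport: Optional[int] = None
    payload_contains: Optional[bytes] = None

    def matches(self, p: Packet) -> bool:
        return (ipaddress.ip_address(p.src) in ipaddress.ip_network(self.src)
                and ipaddress.ip_address(p.dst) in ipaddress.ip_network(self.dst)
                and self.proto in (None, p.proto)
                and self.sport in (None, p.sport)
                and self.dport in (None, p.dport)
                and (self.payload_contains is None or self.payload_contains in p.payload))

class IngressPort:
    def __init__(self) -> None:
        self.acl: List[AclEntry] = [AclEntry("permit")]  # default: permit everything
        self.mirror: List[Packet] = []                   # copies for the analytics engine

    def install(self, entry: AclEntry) -> None:
        self.acl.insert(0, entry)  # newly installed rules take precedence

    def receive(self, p: Packet) -> str:
        for entry in self.acl:  # first match wins
            if entry.matches(p):
                if entry.action == "permit":
                    self.mirror.append(p)  # forward, and replicate a copy in parallel
                return entry.action
        return "deny"  # implicit deny if no entry matches

port = IngressPort()
pkt = Packet("198.51.100.7", "203.0.113.10", 17, 53, 40000, b"...ANY-query...")
print(port.receive(pkt))  # permit (and a copy is mirrored)
# Suppose the analytics engine flags UDP source-port 53 floods from 198.51.100.0/24.
port.install(AclEntry("deny", src="198.51.100.0/24", proto=17, sport=53))
print(port.receive(pkt))  # deny
```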

Nokia’s 2.4 Tb/s FP4 silicon chipset with support for millions of classifiers and ACLs with flexible 5-tuple and payload matching is one example of a solution that provides the scalability, fine-grain visibility and control and improved security mechanisms required for supporting such next-generation data planes.

A Simplified and Fully Programmable Control Plane

In the prior paper mentioned above, our control plane discussion focused on the use of an external path computation element and segment routing to eliminate the need for protocols such as RSVP-TE and LDP while also enhancing path computation and network forwarding decisions.

In the evolution of network peering, a simplified and fully programmable control plane is relevant. However, because network peering is by definition between autonomous systems (inter-AS), we must extend the programmable control plane concept across network boundaries while also acknowledging the need to cooperate with the existing BGP framework. In this manner, we envision an analytics engine performing real-time analysis on all types of performance data, including network telemetry, to derive insights that are supplied to a controller similar to Nokia’s NSP. The controller transforms analytic insights into actions and issues instructions to the network and peering routers regarding steering and/or modification of existing paths. Assuming the controller has direct access only to the peering routers in its own network (a likely scenario), BGP and the techniques we listed previously (for example, AS Prepending to prioritize or deprioritize a given route) will continue to be used to influence network peers.

Real-Time Analytics Leveraging Network and Application Telemetry

A sea of networking data is useless without the ability to analyze it in the needed time frame and draw appropriate insights and actions from it. Analytics plays a critical role in building an insight-driven automated peering solution. Telemetry from the enhanced network fabric provides real-time visibility into networking conditions, including utilization, congestion, packet loss and latency. Analytic tools such as Nokia’s Deepfield correlate massive amounts of information and IP metadata from NetFlow, IPFIX and sFlow records with BGP Monitoring Protocol (BMP) data and BGP route updates to identify issues impacting application delivery, generate potential fixes and provide insights for resolution. Some advanced analytics solutions also use software agents that continuously scan and probe Internet servers and hosts to build a continuously updated contextual and logical map of the Internet, including cloud applications and services and peering and transit networks. By combining data from so many diverse sources, industry-leading analytics engines offer insights that were simply unattainable a few years ago.
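One basic step in this kind of correlation is mapping each flow record's destination address to the BGP route that carries it, so that traffic can be totaled per peer. The Python sketch below assumes route updates (for example, from a BMP feed) have already been parsed into (prefix, neighbor AS) pairs and flow records into (destination, octets) pairs; it uses a linear longest-prefix-match scan for clarity, whereas production analytics would use a trie or similar structure. Prefixes and AS numbers come from documentation and private-use ranges.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Rib = List[Tuple[ipaddress.IPv4Network, int]]

def build_rib(routes: Iterable[Tuple[str, int]]) -> Rib:
    """Store (prefix, neighbor AS) pairs, most specific prefixes first."""
    rib = [(ipaddress.IPv4Network(prefix), asn) for prefix, asn in routes]
    rib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return rib

def lookup(rib: Rib, addr: str) -> Optional[int]:
    """Longest-prefix match: the first (most specific) covering prefix wins."""
    ip = ipaddress.IPv4Address(addr)
    for net, asn in rib:
        if ip in net:
            return asn
    return None

def octets_per_peer(rib: Rib, flows: Iterable[Tuple[str, int]]) -> Counter:
    """Total flow octets per neighbor AS carrying the destination route."""
    totals: Counter = Counter()
    for dst, octets in flows:
        totals[lookup(rib, dst)] += octets
    return totals

rib = build_rib([("0.0.0.0/0", 64520),          # default route via a transit provider
                 ("203.0.113.0/24", 64510),     # learned from a private peer
                 ("203.0.113.128/25", 64530)])  # more specific, from another peer
flows = [("203.0.113.200", 1000), ("203.0.113.5", 500), ("198.51.100.1", 250)]
print(octets_per_peer(rib, flows))  # Counter({64530: 1000, 64510: 500, 64520: 250})
```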

Figure 3. Big Data Analytics Source Data and Interactions. Source: ACG Research

Superior Peering Engineering And Security

Now that we have discussed the three key building blocks for creating insight-driven automated peering, we pull the pieces together and see how they apply to a set of example use-cases. The first use-case considers an egress peering situation with temporary congestion or degraded latency. The second use-case focuses on ingress peering. Although both of these use-cases are based on congestion and latency, industry-leading peering automation solutions will be able to gather insights and produce traffic steering directives that support any of the intents and service delivery objectives we have outlined. For example, if minimizing cost or transit fees is the most important criterion, our analytics engine and network control implementations should support that. If part of our intent is to avoid certain geographies or categories of peers, our tools should support that as well. In our third use-case, we look at how investments in peering engineering, including analytics, can be leveraged to enhance peering security and DDoS protection.

Use-Case #1: Egress Peering Engineering with Congestion or Latency Degradation

In this example we use network visibility, insight and control in an egress peering situation to overcome temporary network congestion or latency degradation on our links to an adjacent peering partner network. We assume that application content is being delivered to its end-users via a primary peering partner. Although there is no outage or loss of connectivity, the active delivery path through Peering Provider A encounters temporary network congestion. In this case, the network detects the congestion and can redirect the egress traffic through Peering Provider B to reach the destination. Egress redirection can be accomplished in several ways, but one of the most straightforward mechanisms is installing a traffic flow classifier. The classifier can be highly granular, redirecting only a single flow, or as broad as redirecting all traffic and all flows to Peering Provider B. Once the analytics and control engines detect that congestion has ended, the classifier can be removed, and traffic can return to its original path.
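The install-then-remove behavior of the egress classifier can be sketched in Python. In this toy model we assume flow telemetry reports the offered rate of each flow whose preferred egress is Peering Provider A; when that demand crosses a high-water mark, the controller redirects the largest flows to Provider B until the projected load on A falls below a low-water mark, and it removes the redirection only once demand itself drops below the low-water mark. The 85% and 60% thresholds, the link capacity and the flow names are hypothetical; the gap between the two thresholds (hysteresis) keeps traffic from flapping between providers.

```python
from typing import Dict, List, Set

class EgressSteering:
    """Toy controller that installs and removes redirect classifiers (flow IDs)."""

    def __init__(self, capacity_bps: int, high_water: float = 0.85,
                 low_water: float = 0.60) -> None:
        self.capacity_bps = capacity_bps
        self.high_water = high_water
        self.low_water = low_water
        self.redirected: Set[str] = set()  # flows currently steered to Provider B

    def update(self, demand_bps: Dict[str, int]) -> Set[str]:
        """`demand_bps` maps each flow whose preferred egress is Provider A to its rate."""
        utilization = sum(demand_bps.values()) / self.capacity_bps
        if utilization >= self.high_water:
            self.redirected = set(self._largest_flows(demand_bps))
        elif utilization <= self.low_water:
            self.redirected.clear()  # congestion over: remove the classifiers
        # Between the two marks, the current classifiers stay in place.
        return set(self.redirected)

    def _largest_flows(self, demand_bps: Dict[str, int]) -> List[str]:
        load = sum(demand_bps.values())
        chosen: List[str] = []
        for flow, bps in sorted(demand_bps.items(), key=lambda kv: kv[1], reverse=True):
            if load <= self.low_water * self.capacity_bps:
                break
            chosen.append(flow)
            load -= bps
        return chosen

steer = EgressSteering(capacity_bps=10_000_000_000)  # a 10 Gb/s link to Provider A
busy = {"f1": 4_000_000_000, "f2": 3_000_000_000, "f3": 2_000_000_000, "f4": 500_000_000}
print(steer.update(busy))                                         # {'f1'}
print(steer.update({"f1": 4_000_000_000, "f2": 3_500_000_000}))  # {'f1'} (75%: hold)
print(steer.update({"f1": 1_000_000_000, "f2": 2_000_000_000}))  # set() (30%: remove)
```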

Industry-leading peering solutions also enable latency measurements to be taken across the operator’s own network and from egress to the IP destination or application end-point. With these capabilities, the measuring service provider can monitor and honor its latency commitments by redirecting traffic through low-latency paths and peering partners as needed without requiring access to the actual application latency data.

Figure 4. Egress Peering Optimizations Use Case. Source: ACG Research

Use-Case #2: Ingress Peering Engineering with Congestion or Latency Degradation

In this use-case, as shown in Figure 5, we consider congestion or latency at the ingress point to an IP network. Once the analytics and control engines are aware of the congestion or latency degradation, corrective action can be taken to influence the path of the incoming traffic. In this example, let us assume there is ingress congestion on traffic coming into the peering router from Peering Provider A. We can use ingress BGP peering techniques, like AS Prepending and BGP Communities, to lower the preference for traffic to take the route through Peering Provider A and/or increase the preference for traffic to take the route through Peering Provider B.
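A controller might translate ingress congestion into AS Prepending decisions along the lines of the Python sketch below. The utilization thresholds, the number of prepends per step, the provider names and AS 64500 are hypothetical policy choices; stepping up gradually reflects that prepending is a coarse tool that can shift more traffic than intended. In practice an operator might instead, or additionally, attach a provider-defined BGP community that asks the provider to prepend or lower preference on its side.

```python
from typing import Dict, Tuple

# Hypothetical policy: (utilization above which the step applies, extra prepends).
PREPEND_STEPS = [(0.80, 1), (0.90, 2), (0.95, 3)]

def prepends_for(utilization: float) -> int:
    """Number of extra AS copies to prepend for a given ingress link utilization."""
    count = 0
    for threshold, extra in PREPEND_STEPS:
        if utilization > threshold:
            count = extra
    return count

def prepend_plan(ingress_util: Dict[str, float]) -> Dict[str, int]:
    """Extra prepends for the announcement sent to each provider."""
    return {provider: prepends_for(util) for provider, util in ingress_util.items()}

def announced_as_path(own_as: int, extra: int) -> Tuple[int, ...]:
    """AS_PATH we send: our AS once, plus `extra` prepended copies."""
    return (own_as,) * (1 + extra)

plan = prepend_plan({"provider_a": 0.93, "provider_b": 0.40})
print(plan)                                          # {'provider_a': 2, 'provider_b': 0}
print(announced_as_path(64500, plan["provider_a"]))  # (64500, 64500, 64500)
```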

Figure 5. Ingress Peering Engineering. Source: ACG Research

Use-Case #3: Better Peering Engineering Delivers Better Peering Security

The optimal location to mitigate DDoS attacks targeted at disrupting service provider network operations is at the entry to the network before the attack is propagated to other elements and additional resources, including physical networking links. In network peering, that means interrupting DDoS traffic at the peering routers that form the basis for public and private peering interconnections.

The good news for CSP and webscale companies is that the same three investments and the architectural framework for enhanced peering engineering can also be leveraged for improvements in peering security. CSP and webscale companies can automate DDoS protection by combining analytics for attack detection with software-driven control of the programmable data fabric for mitigation. Advances and innovations in peering router design have also now enabled volumetric DDoS threat mitigation to be implemented in the peering router.

Flow metadata and streaming telemetry continuously feed the analytics engine as it searches for anomalous flows and behaviors. Once the analytics engine detects an aberrant condition, it (or a controller) can update the peering router's access control list to redirect, rate-limit or block suspect traffic. This is a prime example of streaming telemetry driving real-time intelligence for insight-driven network automation that exploits advanced programmability in the data plane.
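The detect-then-update-ACL loop can be illustrated with a simple volumetric detector. This is a sketch under assumptions, not how any particular analytics engine works: it keeps a hypothetical per-destination exponentially weighted moving average (EWMA) of packets per second, and the ratios, floor and `AclEntry` format are invented for illustration. Production detectors typically combine many more signals than a single rate.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AclEntry:
    dst: str
    action: str          # "rate-limit" or "block"
    rate_pps: int = 0    # only meaningful for "rate-limit"


class VolumetricDetector:
    """Hypothetical per-destination detector: tracks an EWMA of packets per second
    and flags samples far above that baseline."""

    def __init__(self, alpha: float = 0.2, limit_ratio: float = 5.0,
                 block_ratio: float = 20.0, min_pps: float = 10_000):
        self.alpha = alpha
        self.limit_ratio = limit_ratio
        self.block_ratio = block_ratio
        self.min_pps = min_pps
        self.baseline: Dict[str, float] = {}

    def observe(self, dst: str, pps: float) -> Optional[AclEntry]:
        base = self.baseline.get(dst)
        if base is None:
            self.baseline[dst] = pps  # first sample seeds the baseline
            return None
        if pps >= self.min_pps and pps > self.limit_ratio * base:
            # Anomalous sample: keep it out of the baseline so an attack cannot "teach" it.
            if pps > self.block_ratio * base:
                return AclEntry(dst, "block")
            return AclEntry(dst, "rate-limit", rate_pps=int(self.limit_ratio * base))
        self.baseline[dst] = self.alpha * pps + (1 - self.alpha) * base
        return None


detector = VolumetricDetector()
for sample in (2_000, 2_100, 15_000, 100_000):
    print(detector.observe("198.51.100.10", sample))
```

In this sketch, a moderate spike produces a rate-limit entry and an extreme one produces a block entry; pushing those entries to the peering router is left to the controller.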

Figure 6. Enhanced Peering Security with BGP Flow Specification. Source: ACG Research

In addition to detecting and blocking traffic at the ingress to our own network, we can improve scalability and reduce reaction times by also informing networking peers about potential threats. As shown in Figure 6, DDoS information can be shared between providers by integrating their analytics engines and then used as a trigger to better coordinate mitigation of incoming attacks. One industry-leading approach uses BGP Flow Specification as a mechanism to share DDoS ACL entries (or filters) among peering routers across autonomous systems.

Just as BGP distributes routing information for network reachability, BGP Flow Specification distributes DDoS classifiers (5-tuple and, in industry-leading solutions, payload match) along with defined actions for each, telling the peering router what traffic to match on and what action(s) to take if a match occurs.
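The "match components plus action" structure can be modeled as follows. This is an illustrative data structure loosely following the match components defined in RFC 8955 (destination and source prefix, IP protocol, ports); it is not an encoding of the wire format. The field names, the first-match evaluation order and the string actions are assumptions for the sketch, and payload matching (a vendor extension) is not modeled.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class FlowSpecRule:
    """Illustrative FlowSpec-style rule. Every component that is set must match
    (logical AND); unset components act as wildcards."""
    dst_prefix: Optional[str] = None
    src_prefix: Optional[str] = None
    protocol: Optional[int] = None       # IP protocol number, e.g. 17 = UDP
    src_ports: Tuple[int, ...] = ()      # empty tuple = any port
    dst_ports: Tuple[int, ...] = ()
    action: str = "discard"              # e.g. "discard", "rate-limit", "mark", "redirect"

    def matches(self, pkt: dict) -> bool:
        if self.dst_prefix is not None and ip_address(pkt["dst"]) not in ip_network(self.dst_prefix):
            return False
        if self.src_prefix is not None and ip_address(pkt["src"]) not in ip_network(self.src_prefix):
            return False
        if self.protocol is not None and pkt["protocol"] != self.protocol:
            return False
        if self.src_ports and pkt["src_port"] not in self.src_ports:
            return False
        if self.dst_ports and pkt["dst_port"] not in self.dst_ports:
            return False
        return True


def classify(pkt: dict, rules: Iterable[FlowSpecRule]) -> Optional[str]:
    """Return the action of the first matching rule, or None to forward normally."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return None


# Example: discard DNS reflection traffic (UDP source port 53) aimed at one victim host.
dns_reflection = FlowSpecRule(dst_prefix="203.0.113.10/32", protocol=17, src_ports=(53,))
attack = {"src": "192.0.2.55", "dst": "203.0.113.10", "protocol": 17, "src_port": 53, "dst_port": 40123}
normal = {"src": "192.0.2.55", "dst": "203.0.113.10", "protocol": 6, "src_port": 51000, "dst_port": 443}
print(classify(attack, [dns_reflection]))  # discard
print(classify(normal, [dns_reflection]))  # None
```

RFC 8955 defines its own ordering between overlapping rules; the list order used here is a simplification.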

However, as one Tier 1 CSP we interviewed pointed out, BGP Flow Specification currently lacks the externally facing eBGP controls that would make it usable in a wider range of peering implementations. For example, it does not currently support encryption, and rules received from a peer require further handling when they are improper: the requested flowspec values must be validated against the route within the AS and the IGP of the peer. If exploited maliciously, these BGP FlowSpec/eBGP weaknesses could be used to black-hole traffic belonging to third-party networks.
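The kind of validation that guards against black-holing someone else's traffic can be sketched as a feasibility check on each received rule. The sketch is modeled loosely on the validation rules in RFC 8955, Section 6, with simplifications that should be read as assumptions: IPv4 only, one unicast route per prefix, and neighbor AS numbers compared in place of the route originators the RFC specifies. `UnicastRoute` and `flowspec_is_feasible` are hypothetical names.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Optional


@dataclass
class UnicastRoute:
    prefix: str
    neighbor_as: int  # neighboring AS the route was learned from


def flowspec_is_feasible(dst_prefix: Optional[str], received_from_as: int,
                         unicast_rib: List[UnicastRoute]) -> bool:
    """Simplified feasibility check modeled on RFC 8955, Section 6."""
    # (a) The rule must carry a destination prefix.
    if dst_prefix is None:
        return False
    flow_net = ip_network(dst_prefix)
    # (b) The best-match (longest-prefix) unicast route covering the destination must have
    #     been learned from the same neighbor that sent the rule.
    covering = [r for r in unicast_rib if flow_net.subnet_of(ip_network(r.prefix))]
    if not covering:
        return False
    best = max(covering, key=lambda r: ip_network(r.prefix).prefixlen)
    if best.neighbor_as != received_from_as:
        return False
    # (c) No more-specific route inside the destination may come from a different neighbor AS.
    more_specific = [r for r in unicast_rib
                     if ip_network(r.prefix) != flow_net and ip_network(r.prefix).subnet_of(flow_net)]
    return all(r.neighbor_as == best.neighbor_as for r in more_specific)


rib = [UnicastRoute("203.0.113.0/24", 64501), UnicastRoute("203.0.113.128/25", 64502)]
print(flowspec_is_feasible("203.0.113.192/26", 64502, rib))  # True: 64502 owns the best match
print(flowspec_is_feasible("203.0.113.192/26", 64501, rib))  # False: 64501 does not route it
print(flowspec_is_feasible("203.0.113.0/24", 64501, rib))    # False: a more-specific belongs to 64502
```

The second call is the black-hole scenario: a peer asking us to filter traffic toward address space it does not actually route.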

This is why an east-west integration between providers' analytics and control systems over an encrypted session would offer a better mechanism for ensuring that rules are proper and are sent by the expected, trusted peer network. Leading vendor solutions extend this further by using cloud-hosted services to provide a common view of the data, including cloud application context, which gives analytics solutions additional insight for inter-AS DDoS information exchange.
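Confirming that a rule was "sent by the expected trusted peer" is essentially message authentication. The sketch below assumes, hypothetically, that two providers exchange rules as JSON messages and share a per-peer secret key provisioned out of band; it uses an HMAC-SHA256 tag plus a freshness window. It covers authenticity and integrity only; the encryption of the session itself (for example, by running the exchange over TLS) is outside the sketch, and the message layout and key table are invented for illustration.

```python
import hashlib
import hmac
import json
import time
from typing import Dict, Optional

# Hypothetical per-peer keys, provisioned out of band when the peering relationship is set up.
PEER_KEYS: Dict[int, bytes] = {64502: b"example-shared-secret-for-as64502"}


def _canonical(body: dict) -> bytes:
    # Deterministic serialization so sender and receiver authenticate the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_rule(rule: dict, sender_as: int, key: bytes, now: Optional[float] = None) -> dict:
    body = {"sender_as": sender_as, "issued_at": int(time.time() if now is None else now), "rule": rule}
    return {"body": body, "mac": hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()}


def verify_rule(message: dict, peer_keys: Dict[int, bytes], max_age_s: int = 300,
                now: Optional[float] = None) -> Optional[dict]:
    """Return the rule only if it comes from a known peer, is untampered and is fresh."""
    body = message["body"]
    key = peer_keys.get(body["sender_as"])
    if key is None:
        return None  # not a trusted peer
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        return None  # altered in transit or signed with the wrong key
    current = time.time() if now is None else now
    if current - body["issued_at"] > max_age_s:
        return None  # too old; limits replay of stale rules
    return body["rule"]


msg = sign_rule({"dst_prefix": "203.0.113.10/32", "action": "discard"}, 64502, PEER_KEYS[64502], now=1_000)
print(verify_rule(msg, PEER_KEYS, now=1_010))  # {'dst_prefix': '203.0.113.10/32', 'action': 'discard'}
```

An authenticated rule is not automatically a proper one: it would still need the same feasibility checks as any other received filter before being installed.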

With all approaches, the triggered actions range from applying a DSCP traffic marker to the traffic, to redirecting it to a VRF or rate limiting it, to dropping the matched traffic altogether. Analytics, control and an enhanced data plane play cooperative roles in evolving peering security. Although BGP Flow Specification is infrequently deployed today, it was mentioned in future deployment plans by multiple companies that we interviewed. In addition, BGP Flow Specification for Inter-AS DDoS mitigation was highlighted as an approach to enhanced security in peering in a presentation delivered at the NANOG73 event in 2018.
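The range of actions listed above can be sketched as a small action engine. It loosely mirrors the action types defined for BGP Flow Specification in RFC 8955 (a traffic rate in bytes per second, where a rate of 0 means discard; DSCP traffic marking; and redirect to a VRF), but the dictionary keys, the token-bucket limiter and the burst size are assumptions chosen for illustration, not a router's data-plane behavior.

```python
from typing import Optional


class TokenBucket:
    """Rate limiter in bytes per second; a rate of 0 discards everything."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, size: int, now: float) -> bool:
        if self.rate == 0:
            return False
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


class ActionEngine:
    """Applies one rule's hypothetical action set to each matched packet."""

    def __init__(self, actions: dict, burst_bytes: int = 15_000):
        self.actions = actions
        rate = actions.get("traffic-rate")
        self.limiter = TokenBucket(rate, burst_bytes) if rate is not None else None

    def process(self, pkt: dict, now: float) -> Optional[dict]:
        """Return the (possibly modified) packet, or None if it is dropped."""
        if self.limiter is not None and not self.limiter.allow(pkt["size"], now):
            return None  # over the rate limit, or rate 0 (discard)
        out = dict(pkt)
        if "traffic-marking" in self.actions:
            out["dscp"] = self.actions["traffic-marking"]
        if "redirect-vrf" in self.actions:
            out["vrf"] = self.actions["redirect-vrf"]
        return out


pkt = {"size": 1_000, "dscp": 46}
print(ActionEngine({"traffic-rate": 0}).process(pkt, now=0.0))              # None (dropped)
print(ActionEngine({"traffic-marking": 8}).process(pkt, now=0.0))           # re-marked to DSCP 8
print(ActionEngine({"redirect-vrf": "scrubbing"}).process(pkt, now=0.0))    # sent to a scrubbing VRF
```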

Moving At The Speed Of Trust

One additional topic is crucial to the evolution of peering engineering and security. We generally acknowledge that network peering needs more intelligence, control and automation to overcome its current challenges and to better align with the requirements of evolving applications and services. Even so, network engineers and network operations personnel will not instantly take their hands off the peering wheel and hand autonomous control to the algorithms, analytics and control systems designed to make operations more efficient and responsive at scale.

It is therefore critical that the analytics and control engines being used provide visual insights to the engineers managing their networks. In many cases, engineers will justifiably want to review the information, investigate alternatives and approve decisions before traffic is reconfigured or steered. With support for protocols like the BGP Monitoring Protocol (BMP), industry-leading analytics engines and visual dashboards can consider paths beyond the currently active one reported by BGP; a key benefit of BMP is full visibility into all of the BGP paths a router has received. By engaging peering engineering personnel in the process, analytics and control algorithms can be further validated and matured in live networks while building trust along the way. As trust increases, so will the level of automation that peering engineers are comfortable invoking in their networks.
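A human-in-the-loop workflow of this kind can be sketched as an approval gate. The sketch assumes, hypothetically, that the analytics engine has every candidate path for a prefix (the kind of all-paths view BMP makes possible) annotated with the utilization of the peering link that carries it. Proposals are queued for an engineer by default, and operators opt individual prefixes into automatic application as trust grows. All class names, fields and the 80% threshold are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class CandidatePath:
    peer: str
    as_path_len: int
    utilization: float   # utilization of the peering link carrying this path
    active: bool = False


@dataclass
class Proposal:
    prefix: str
    from_peer: str
    to_peer: str
    alternatives: List[str]
    status: str = "pending"   # "pending" until approved, then "applied"


class ChangeGate:
    """Hypothetical gate: proposals wait for an engineer unless operators have
    explicitly enabled auto-apply for that prefix."""

    def __init__(self) -> None:
        self.auto_apply_prefixes: Set[str] = set()
        self.queue: List[Proposal] = []

    def propose(self, prefix: str, paths: List[CandidatePath], max_util: float = 0.8) -> Optional[Proposal]:
        active = next((p for p in paths if p.active), None)
        if active is None or active.utilization <= max_util:
            return None  # nothing to fix
        # Rank every non-active path with headroom: least utilized first, then shortest AS path.
        alternatives = sorted((p for p in paths if not p.active and p.utilization <= max_util),
                              key=lambda p: (p.utilization, p.as_path_len))
        if not alternatives:
            return None
        proposal = Proposal(prefix, active.peer, alternatives[0].peer, [p.peer for p in alternatives])
        if prefix in self.auto_apply_prefixes:
            proposal.status = "applied"
        else:
            self.queue.append(proposal)  # shown on the dashboard for review
        return proposal

    def approve(self, proposal: Proposal) -> None:
        proposal.status = "applied"
        self.queue.remove(proposal)


paths = [CandidatePath("Peering Provider A", 2, 0.93, active=True),
         CandidatePath("Peering Provider B", 3, 0.55),
         CandidatePath("Peering Provider C", 3, 0.40)]
gate = ChangeGate()
proposal = gate.propose("203.0.113.0/24", paths)
print(proposal.to_peer, proposal.status)  # Peering Provider C pending
```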

Figure 7. Ingress Peering Route Optimization Visual Dashboard. Source: ACG Research

Conclusion

Peering has not kept pace with the types of traffic and applications driving the Internet or with the ever-increasing set of security threats present today. Although the BGP control plane remains at the heart of Internet peering, BGP's well-known limitations persist, including a lack of capacity awareness, a lack of real-time traffic utilization data and a lack of end-to-end flow visibility. CSP and webscale companies alike face similar challenges in Internet peering, including complex, manual and error-prone operations.

Best-of-breed peering solutions are evolving toward intelligent, automated peering based upon three foundational investments: analytics, uniform ingress and egress peering control and an enhanced data plane with instrumented fabric for increased visibility and granular traffic management. This approach acknowledges the dominance of BGP in existing networks while improving upon its limitations. In addition, the same investments and architectural framework can be combined with BGP Flow Specification to provide enhanced peering security with the ability to share DDoS mitigation information across carriers. To fully automate Internet peering, we must build trust with the engineering and operations personnel running networks today. Best-of-breed solutions are incorporating dashboards and graphical user displays that engage and encourage human collaboration while building trust on the path to fully automated network and security peering.

Source: ACG Research