Keyur Patel and Sanjay Kumar, Arrcus, Inc.
Published: 31 Aug 2022
CTN Issue: August 2022
A note from the editor:
Over the last two months we have enjoyed articles about routing in IP fabrics, including a new routing protocol, Routing in Fat Trees (RIFT). In this final chapter of the series, the authors from Arrcus provide us with an overview of a new protocol designed to simplify the building and operation of large-scale data centers.
Enter the Link State Vector Routing (LSVR) protocol, an emerging protocol developed by the IETF for hyperscale data center and cloud providers, which augments BGP by replacing its path vector algorithm with a Dijkstra shortest-path algorithm. The main advantages of LSVR include, but are not limited to, a view of the complete fabric topology, faster convergence compared with classic BGP, and operational simplicity.
The CTN editors hope that you enjoy reading this series. Please let us know if you have any questions or comments.
Yingzhen Qu, CTN Associate Editor
DataCenter Networking with LSVR & L3DL
As application-centric digital infrastructures have become mainstream, data center network architectures (across both enterprises and providers) are firmly rooted in routing-centric designs, leaving behind the siloed switching-centric designs of the past. Starting with the hyperscale cloud providers and extending to most major enterprises, multi-tier Clos topologies (“Leaf-Spine”) are the norm today and for the foreseeable future. These designs enable wide fan-out bandwidth for east-west host-to-host communication, Layer 3 routing to the top-of-rack (or even into the server host), and consistent latency metrics. An example of an IP Clos topology is shown in Figure 1.
From a control plane perspective, both the hyperscalers and the big enterprises use eBGP as the routing protocol for the data center transport layer (aka “underlay”) because of its proven ability to scale in network backbones. Furthermore, both Border Gateway Protocol (BGP) and TCP have decades of operator experience and operational toolkits behind them that allow operators to manage and analyze the control plane effectively. This class of customers typically builds its own overlays on top for application traffic pertaining to intra-PoD and inter-PoD communication, using BGP EVPN or BGP VPN to create and maintain them. The switch configuration complexity of BGP EVPN over a BGP underlay is well known and well documented by operators.
The rest of the enterprise landscape, lacking such extensive resources, has tended to limit BGP usage to the overlay control plane, with BGP EVPN used particularly where segmentation is required, and to use traditional Interior Gateway Protocols (IS-IS, OSPF) as the underlay. This has the undesirable side effects of limited transport PoD scale and a multi-layer protocol stack (e.g. IS-IS/OSPF for the underlay, BGP EVPN for the overlay) with its associated complexity. In addition, it gives incumbent vendors a reason to provide vertically integrated, locked-in fabric solutions.
Impact of Merchant Silicon
There has been phenomenal progress in merchant silicon, particularly in chipsets targeting data center switching. These chipsets typically focus on high-speed switching, low latency, fewer features, and smaller buffers, which is what hyperscalers and financial networks want. Merchant silicon has been growing both in the port speeds it supports, from 1G to 400G, and in port density, from 32 x 100G ports to 128 x 100G ports.
As port densities grow, the protocol peering that happens in Clos networks to build the topology information grows as well. Port density also has a direct impact on the number of routes and their ECMP (Equal Cost Multi-Path) sets within the network. Decoupling protocol peering from the number of ports supported on these chips allows routing protocols to scale and converge much faster. This can be achieved with out-of-band peering models that use route controllers.
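To make the scaling argument concrete, here is a minimal sketch that counts BGP sessions in a two-tier leaf-spine fabric under two peering models. The function names, the fabric sizes, and the choice of two out-of-band controllers are illustrative assumptions, not figures from any deployment.

```python
def hop_by_hop_sessions(leaves: int, spines: int) -> dict:
    """Classic underlay peering: every leaf peers with every spine
    over the fabric links, so session count tracks port density."""
    return {
        "total": leaves * spines,
        "per_leaf": spines,
        "per_spine": leaves,
    }


def controller_sessions(leaves: int, spines: int, controllers: int = 2) -> dict:
    """Out-of-band peering: every switch peers only with the route
    controllers (assumed count: 2), independent of its port count."""
    switches = leaves + spines
    return {
        "total": switches * controllers,
        "per_leaf": controllers,
        "per_spine": controllers,
        "per_controller": switches,
    }


if __name__ == "__main__":
    # Hypothetical fabric sizes chosen only to show the trend.
    for leaves, spines in [(32, 4), (128, 16)]:
        print(leaves, spines,
              hop_by_hop_sessions(leaves, spines),
              controller_sessions(leaves, spines))
```

Under these assumptions the per-switch session count stays constant as the fabric grows, while the session load moves to the controllers, which then have to be sized for it.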
Need for Link Discovery and Link Level Liveness
In data center networks, routing protocols like BGP are used to build the topology and reachability databases for underlays as well as overlays. These protocols benefit greatly from the discovery of links, their attributes, and neighbor peering information. They can use this information to create protocol peerings automatically, exchange topology information over them, and detect link failures quickly. This can significantly simplify data center switch configuration and management.
Link liveness detection helps the network reconverge after a link or node failure, and the faster the Clos network reconverges, the less data is lost. Both link and node liveness are typically handled by the BFD (Bidirectional Forwarding Detection) protocol. On most switching and routing silicon, data-path BFD is implemented in hardware, while protocol configuration, detection, and event handling are typically handled in software. Autoconfiguration of BFD peers also helps simplify the overall switch configuration.
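How quickly BFD declares a failure follows from its negotiated timers. The sketch below computes the detection time for BFD asynchronous mode as described in RFC 5880: the detect multiplier received from the remote system times the agreed interval, which is the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. The function name and the timer values are illustrative assumptions.

```python
def bfd_detection_time_us(local_required_min_rx_us: int,
                          remote_desired_min_tx_us: int,
                          remote_detect_mult: int) -> int:
    """Detection time in microseconds for BFD asynchronous mode.

    The remote system transmits no faster than the local system is
    willing to receive, so the agreed interval is the larger value.
    """
    agreed_interval_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us


if __name__ == "__main__":
    # Example timers (assumed): 300 ms intervals with a multiplier of 3.
    print(bfd_detection_time_us(300_000, 300_000, 3))  # 900000 us = 900 ms
```

Shorter intervals reduce the time to detect a failure, at the cost of more control packets per session, which is one reason the data-path part of BFD is commonly offloaded to hardware.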
Link State Vector Routing (LSVR)
LSVR is a cross vendor-customer collaboration in the IETF standards body to simplify large-scale data center fabric designs and to build operationally simple, low-cost fabrics (both IP Clos and multi-tenant BGP EVPN).
LSVR augments BGP by replacing its path vector algorithm with Dijkstra's shortest-path algorithm. As a result, it replaces all the phases of the existing BGP best path decision process. It also introduces a new BGP-LS-SPF SAFI (Subsequent Address Family Identifier) within BGP, with its own BGP NLRI (Network Layer Reachability Information) constructs for carrying IPv4 and IPv6 link information. Any routes computed as part of the BGP-LS-SPF SAFI are installed in the appropriate IPv4/IPv6 tables in the RIB (Routing Information Base) and FIB (Forwarding Information Base). The preference and priority given to these routes are significantly higher than those of traditional BGP and IGP IPv4/IPv6 routes.
These modifications give BGP the option, among other peering models, of connecting to route reflectors or route controllers that are not in the forwarding path, as shown in Figure 2, which decouples protocol peering from ever-growing switch port densities. The changes also allow BGP to be deployed as the only routing protocol in any kind of Clos deployment. This simplifies the software deployed on switches and retains the existing operator infrastructure for configuring and operating networks. The modifications to BGP also improve overall BGP convergence in Clos networks.
Centralized route controllers can also insert alternate paths for fast convergence, and they support traffic engineering, efficient monitoring, and other applications that require centralized command and control within a network.
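The shortest-path computation that LSVR substitutes for the BGP decision process can be illustrated with a minimal Dijkstra sketch over a link-state database. This is not the procedure from the LSVR specification: the dictionary layout, the node names, and the unit link costs are assumptions made for illustration, and the back-link check mirrors the two-way connectivity test that link-state protocols commonly apply. Link costs are assumed to be positive.

```python
import heapq


def spf(lsdb: dict, root: str):
    """Dijkstra SPF with ECMP next-hop tracking.

    lsdb maps node -> {neighbor: cost}, as advertised by that node.
    Returns (dist, next_hops) where next_hops[n] is the set of first-hop
    neighbors of root on all equal-cost shortest paths to n.
    """
    dist = {root: 0}
    next_hops = {root: set()}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, cost in lsdb.get(u, {}).items():
            # Use a link only if both ends advertise it.
            if u not in lsdb.get(v, {}):
                continue
            nd = d + cost
            hops = {v} if u == root else next_hops[u]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                next_hops[v] = set(hops)
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                next_hops[v] |= hops  # equal-cost path: merge next hops
    return dist, next_hops


if __name__ == "__main__":
    # Hypothetical two-leaf, two-spine fabric with unit costs.
    lsdb = {
        "leaf1": {"spine1": 1, "spine2": 1},
        "leaf2": {"spine1": 1, "spine2": 1},
        "spine1": {"leaf1": 1, "leaf2": 1},
        "spine2": {"leaf1": 1, "leaf2": 1},
    }
    dist, next_hops = spf(lsdb, "leaf1")
    print(dist["leaf2"], sorted(next_hops["leaf2"]))  # 2 ['spine1', 'spine2']
```

Because every node runs the same computation over the same database, each one sees the complete fabric topology and can derive its ECMP sets locally, without relying on hop-by-hop path selection.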
Layer-3 Discovery and Liveness (L3DL)
LSVR uses L3DL to discover link attributes, addresses, and protocol peering information. L3DL is also a cross vendor-customer collaboration in the IETF standards body. It uses multicast Hello PDUs to discover the switches at either end of a link, and it has its own state machine to maintain stateful L3DL peering sessions between two switches. It discovers physical addresses, link attributes, logical addresses, and protocol peering information to facilitate auto-peering for LSVR. Figure 3 illustrates the L3DL functionality. Automatic creation of protocol peerings simplifies data center switch configuration, and the simpler the configuration, the less burdensome the switch operation.
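The auto-peering idea can be sketched as a function that turns discovered neighbor records into BGP peer entries, so that no per-neighbor configuration is written by hand. Everything in this sketch is hypothetical: the `DiscoveredNeighbor` fields, the `auto_peers` function, and the output dictionary are illustrative stand-ins and do not represent the L3DL PDU formats or any vendor's configuration model. The addresses and AS numbers come from documentation and private-use ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveredNeighbor:
    """Hypothetical record of what discovery learned on one link."""
    local_interface: str
    peer_address: str
    peer_asn: int
    session_up: bool  # True once the discovery session is established


def auto_peers(neighbors):
    """Build BGP peer entries from established discovery sessions only."""
    peers = {}
    for n in neighbors:
        if not n.session_up:
            continue  # do not peer over links that are still being discovered
        peers[n.peer_address] = {
            "remote_as": n.peer_asn,
            "interface": n.local_interface,
            "bfd": True,  # assumed policy: enable liveness detection per peer
        }
    return peers


if __name__ == "__main__":
    discovered = [
        DiscoveredNeighbor("eth1", "192.0.2.1", 65001, True),
        DiscoveredNeighbor("eth2", "192.0.2.5", 65002, True),
        DiscoveredNeighbor("eth3", "192.0.2.9", 65003, False),
    ]
    for address, peer in auto_peers(discovered).items():
        print(address, peer)
```

The point of the sketch is the direction of the data flow: peering state is derived from what the links report, so adding a cable adds a peer without a configuration change.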
The explosive demand for data-intensive applications requires today’s network infrastructure to be faster, smarter, and better. Merchant silicon vendors are innovating at a rapid rate to increase speeds and feeds and to deliver higher port density on a single chip. As network transformation gains steam, operators are demanding simplicity and high programmability, with traffic engineering, to deliver applications and services quickly. Yet existing fabric solutions are complex, with separate protocols for overlay and underlay.
LSVR, with its flexible peering models, allows already-deployed BGP to build and maintain complex underlay and overlay topologies in Clos networks. Furthermore, by combining a BGP EVPN/VXLAN solution with LSVR and L3DL, enterprises and service providers can now build scalable, automated, multi-tenant data center fabrics while minimizing their operational expenses. The ability to discover link capabilities and protocol peering information greatly simplifies configuration.
References
- RFC 7938 - Use of BGP for Routing in Large-Scale Data Centers (ietf.org)
- RFC 8365 - A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN) (ietf.org)
- IS-IS - ISO/IEC, International Organization for Standardization, "Intermediate system to Intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473)", 2002
- OSPF - Moy, John, "OSPF Version 2", RFC 2328, IETF, 1998
- RFC 5880 - Bidirectional Forwarding Detection (BFD) (ietf.org)
- draft-ietf-lsvr-bgp-spf-16 - BGP Link-State Shortest Path First (SPF) Routing
- draft-ietf-lsvr-l3dl-09 - Layer-3 Discovery and Liveness
Statements and opinions given in a work published by the IEEE or the IEEE Communications Society are the expressions of the author(s). Responsibility for the content of published articles rests upon the author(s), not IEEE nor the IEEE Communications Society.