Copyright 1998 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

This article was published in the May 1998 issue of
IEEE Communications Magazine.


Abstract
A "reserved bandwidth" service is described as a quality of service enhancement in the vBNS. Design of this service is based on the characteristics of vBNS user traffic and the state of the art of commercially available technology. An evolutionary strategy is adopted for vBNS QoS development with this service as the initial offering. Its implementation scheme and issues are discussed in detail. Future extension to include differentiated services or CoS is also described.

 



Quality of Service Development in the vBNS


Chuck Song, Laura Cunningham, and Rick Wilder
MCI Communications Corporation

 

The vBNS, for very-high-performance backbone network service, is a leading-edge high-speed backbone network for research and education (R&E). Sponsored by the National Science Foundation and implemented by MCI, the network was first brought up as an OC3 backbone in early 1995 to interconnect the NSF-supported supercomputing centers (SCCs) and network access points (NAPs). In 1997, it was upgraded to an OC12 backbone. With over 100 universities approved by NSF to be connected to the vBNS, it is now a high speed national backbone connecting research and education institutions and government sponsored networks. Figure 1 is a map of the vBNS backbone, showing mainly the infrastructure but not all customer connections.
Considered a continuation of NSFNet for the task of advanced Internet technology development, the vBNS is expected to provide not only production-quality backbone transport service to the leading research and education institutions, but also a production environment in which technology experiments can be conducted. At present, quality of service (QoS) is an important research and development focus for the vBNS. The vBNS environment has some unique user requirements and traffic characteristics. Our QoS design is based on the following considerations: the service must be a substantial enhancement to our current service offering and must be useful to current applications, and the implementation mechanisms must perform at high speed and be extendable to incorporate different strategies to address the scalability issue facing today's Internet.
In the following sections, we will first describe the current vBNS services and architecture. Then we will elaborate on the type of QoS we plan to offer initially, followed by a detailed discussion of our implementation. At the end, we will describe how we will extend our service.

vBNS Service and Architecture

Architecture Overview

As shown by Fig. 1, the vBNS backbone comprises nine terminal nodes geographically distributed across the United States, and four SCC nodes. Each node typically consists of a Fore ATM switch, one or more IP routers, a high-performance UNIX host for performance measurement, and an IP flows monitor to record usage on OC-12 fiber pairs.
The set of customers served by the vBNS was expanded in 1997 by NSF's New Connections Program. In addition to the SCCs, approximately 100 universities are in the process of being connected to the vBNS. These sites peer with the vBNS with a BGP peering session and are configured as primary peers. This means they receive all vBNS routes as well as the routes of other primary and secondary peers. In addition, secondary peers who are connected to other U.S. and international R&E networks receive routes to vBNS primary peers through network interconnects.
A full mesh of unspecified bit rate (UBR) permanent virtual paths (PVPs) that are statically configured through an underlying ATM network interconnects the terminal nodes. The underlying network is MCI's commercial ATM service, called Hyperstream, which supports other customers in addition to the vBNS. Figure 1 shows the logical vBNS topology. The use of HyperStream VPs rather than dedicated trunks has interesting implications for the QoS design which will be discussed later in this article. There are four SCC sites: San Diego Supercomputer Center (SDSC), National Center for Atmospheric Research (NCAR), National Center for Supercomputing Applications (NCSA), and Pittsburgh Supercomputer Center (PSC); and nine MCI terminal sites with a vBNS node collocated with a HyperStream ATM switch. The SCC nodes are connected to their closest terminals with a single UBR PVP. Similarly, other customers connect to the vBNS via a single UBR PVP to either a terminal node or, in some cases, an SCC. The physical connection is at DS-3 or OC-3 rate. All connections support IP, and some support ATM as well.

Users and the Connection Structure

The vBNS is an IP-over-ATM backbone network with IP backbone transport as its basic service. The service is provided in two distinct logical views. The first is via a mesh-like network, constructed using point-to-point permanent virtual connections (PVCs). All customers use this point-to-point mesh for IP transport over the vBNS backbone. Within the mesh Open Shortest Path First (OSPF) is used for the internal routing. Between a customer network and the vBNS, BGP is used for the peering. The second view of IP connectivity over the vBNS is via an ATM logical IP subnetwork (LIS). This LIS connectivity is provided to all vBNS infrastructure routers and hosts, and also to those customers who have direct ATM connections. Since no routing protocol is run on the LIS, transit traffic does not traverse this LIS normally. Both the mesh and the LIS provide best-effort service. As a design objective, this structure for best-effort service access is to be preserved when other services are added.
Among our connected customers, SCCs typically have direct connections to vBNS customer site equipment. Some of their hosts, particularly supercomputers, have unshared direct connections to vBNS routers/switches. From an end-to-end point of view, such connected SCC hosts (not all of them) are the same as vBNS directly connected hosts in the ability to achieve an end-to-end QoS. The other group of users, the universities approved by NSF for connections, have a variety of ways to connect to the vBNS (Fig. 2). Many go through traffic aggregation points, such as gigapops. Some traverse long-haul ATM links with switches between to connect to vBNS. The third group of users, mostly other research and government-sponsored networks, send traffic to the vBNS via NAPs. In the last two cases, achieving end-to-end QoS is more difficult than for directly connected hosts. In our QoS design, we take into consideration the implication of the different vBNS connections on the end-to-end QoS. We support end-to-end signaling within the vBNS and provide traffic differentiation based on such signaling. We believe our design facilitates the extension of the QoS through cooperating service provider and customer networks to allow end-to-end performance assurances even with multiprovider paths.

Traffic Characteristics

Traffic characteristics of user applications strongly influenced the design choices for QoS enhancement. Some SCCs' applications involve supercomputer-to-supercomputer traffic, which traverses the vBNS as the only long-haul transport. There is often a large data volume, and applications can be sensitive to packet loss. It is important to such applications that a large amount of bandwidth can be "committed" when needed. The number of these high-bandwidth application flows simultaneously on a given network path is generally small, so scalability is not a major concern. For the NSF-approved universities, traffic aggregation often occurs at shared connection points, or "gigapops."
Today, our traffic load is not high enough to cause frequent congestion, but it is increasing rapidly as many universities are added. Because factors beyond our control (e.g., the new-connection approval process) keep us from predicting traffic growth precisely, we must have an aggressive QoS implementation plan to deal with unexpected traffic demands.

A Description of "Reserved Bandwidth" Service: An Evolutionary Strategy

Several factors require that vBNS QoS be provided in an evolutionary process. These include the expansion of our customer base, the increasing and diversifying needs for QoS from new applications, and our desire to quickly deploy vendor equipment/products as they become available.
We consider the reserved bandwidth service a match between today's technology and our immediate QoS needs. Implementation mechanisms for this service, which will be discussed in detail later, are under development by equipment vendors and will be deployed in this service prior to general commercial availability. The current design allows for the QoS implementation mechanisms to be useful to future QoS enhancements to address scalability by treating numerous IP flows, as in the backbone, as a small number of traffic aggregates.

Design Objectives

Based on our network architecture, service requirements, and technology feasibility, we attempt to achieve the following design objectives:
  • Support for two distinct classes of service: a service with bandwidth commitment and a traditional best-effort IP service with no such commitment. Access to this reserved-bandwidth service should be no different from the best-effort service, except that signaling is needed from a user.
  • Support for per-application-flow bandwidth allocation to support applications' sessions. The per-flow-based implementation mechanisms should be capable of supporting per-class bandwidth allocation with no major changes required.
  • Efficient use of bandwidth when supporting flows with potentially large data volumes and highly bursty behavior while providing a relatively firm commitment of the bandwidth to the QoS-requesting applications.
  • Accommodation of non-vBNS traffic that shares the same ATM layer resources. Such traffic is mainly native ATM-level traffic and may also require QoS commitment at the ATM level. Note that such traffic may not be subject to vBNS admission and policy control. However, such traffic should not compromise vBNS QoS or vice versa.
  • The QoS is measurable. Customers should be able to determine whether the request for bandwidth is actually honored. Network operators should be able to monitor and measure the service. Such monitoring and measuring capability is important for parameter tuning to achieve efficient utilization of network resources.

Characteristics of Reserved Bandwidth Service

The following are the defining aspects for the reserved-bandwidth service:
Service Measurability -- This service provides allocated bandwidth. When allocated to a traffic flow, a data throughput equivalent to the bandwidth should be observable over a reasonable time window. This time window, typically in seconds, should be large enough to accommodate service "bursts" and delay due to various network internal resource scheduling. Note that an equivalent data throughput may not be achievable if traffic is injected into the vBNS with bursts that can either starve internally scheduled service or overflow the buffering capacity. A certain level of burstiness is allowed by this service. Users are encouraged but not required to shape their traffic for a better result.
Service Invocation -- The service is triggered by RSVP messages, and bandwidth is dynamically allocated [1]. This is especially important because a static allocation for a small number of flows with large data volume can lead to very inefficient use of bandwidth resources.
Service Granularity -- A variety of traffic service granularity is allowed, ranging from individual application sessions or classes of traffic aggregated by certain rules, to all traffic arriving at a connection port. Initially we use the flow-spec in RSVP messages to specify traffic to be serviced. This may be augmented in the future with other tools.
Service Scope -- The service is within the vBNS backbone. Bandwidth allocation is performed on vBNS backbone trunks, and may also be performed on trunks exiting from vBNS to a customer in some cases. Traffic classification and policing are performed on the access links at the vBNS edge. Since RSVP is the signaling method, end-to-end bandwidth allocation is possible if bandwidth can be allocated on the path between the vBNS edge and the source/destination ends.
Unicast Service -- Initially, multicast bandwidth allocation is not provided although best effort multicast is available on the vBNS. This also limits the bandwidth reservation to be of only one style similar to the "Shared Explicit" specified in RSVP. We plan to address this limitation as soon as possible after initial deployment and evaluation of the unicast service.
Although packet delay is well bounded within the vBNS backbone, our reserved-bandwidth service does not offer any delay-variance-related QoS [2]. The service is considered to be somewhat stronger than the controlled load service [3]. Instead of giving an illusion of a lightly loaded network, a definition under which a range of implementation schemes is possible, we give a relatively firm commitment of bandwidth. The word "relatively" suggests that we have not excluded the use of bandwidth overbooking for the benefit of statistical sharing. However, we must make sure that an equivalent throughput is not compromised. Due to the traffic characteristics on the vBNS (i.e., potentially a small number of large data volume bursty flows), we will take a very conservative approach at the beginning of service implementation toward the tuning of this parameter. Tuning will be guided by our traffic measurement and experience with the service.
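The service measurability aspect described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class name ThroughputMeter, the function reservation_honored, and the 5 percent tolerance are hypothetical and are not part of the vBNS measurement tools. The sketch accumulates delivered bytes over a sliding window of a few seconds and compares the observed throughput with the reserved rate.

```python
from collections import deque


class ThroughputMeter:
    """Hypothetical helper: bytes delivered within a sliding time window."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.samples = deque()  # (timestamp in seconds, byte count)
        self.total_bytes = 0

    def record(self, t: float, n_bytes: int) -> None:
        """Add a delivery at time t and expire samples older than the window."""
        self.samples.append((t, n_bytes))
        self.total_bytes += n_bytes
        while self.samples and self.samples[0][0] <= t - self.window_s:
            _, old_bytes = self.samples.popleft()
            self.total_bytes -= old_bytes

    def rate_bps(self) -> float:
        """Average throughput over the window, in bits per second."""
        return self.total_bytes * 8 / self.window_s


def reservation_honored(meter: ThroughputMeter, reserved_bps: float,
                        tolerance: float = 0.05) -> bool:
    """True if observed throughput is within `tolerance` of the reserved rate.

    The tolerance value is an assumption made for this sketch only.
    """
    return meter.rate_bps() >= reserved_bps * (1 - tolerance)
```

A window of several seconds smooths over short bursts and scheduling delay, which is why the comparison is made on the windowed average rather than on instantaneous rates.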

A Description of the Implementation

Backbone ATM Configuration

Enhanced Design for Multiservice Support -- The current vBNS ATM configuration includes a full mesh of UBR PVPs between ATM switches at major vBNS nodes and a full mesh of UBR PVCs between vBNS routers. The UBR PVPs and PVCs are retained to carry best-effort traffic, while a second set of virtual connections is configured to carry other classes of traffic. For reserved-bandwidth traffic, a static set of VPs is used. The static VP configuration for reserved-bandwidth traffic provides a single non-real-time variable bit rate (nrt-VBR) VP per physical trunk in the OC-12 Hyperstream backbone. vBNS reserved-bandwidth flows whose path uses multiple Hyperstream trunks exit the Hyperstream network to be switched through a vBNS Fore switch to take the next Hyperstream hop. Although this configuration increases the number of switch hops taken by reserved-bandwidth traffic in many cases, it is not expected to have a significant performance impact. The benefit of this configuration is that with a single VBR VP per physical trunk, the full sustainable cell rate (SCR) provided by the underlying Hyperstream ATM network can be configured for that VP. If we were to use a mesh of VBR VPs, as in the case of the best-effort UBR mesh, it would require dividing the SCR of the physical trunk among a variable number of VBR VPs that traversed the trunk, potentially leading to poor bandwidth utilization. Figure 3 shows a simplified example of how the UBR mesh of PVPs and the VBR terminal-to-terminal PVPs are logically configured.
Best-Effort Traffic -- Best-effort traffic uses UBR PVCs configured across the UBR VP mesh. Since the OSPF peering connections use these PVCs, the standard routing of packets will take these PVCs. Figure 4 shows how the UBR VPs are routed between vBNS switches and the path of PVCs through them.
Reserved-Bandwidth Service Traffic -- An SVC is set up through the VBR VPs for each reserved-bandwidth flow on a demand basis. Requests for reserved bandwidth service that cannot be supported within vBNS backbone resources will be denied. In such a case, applications can use the best-effort service if appropriate. There are two advantages to this VP/VC strategy for reserved-bandwidth traffic. One is that the Hyperstream configuration is simple and static. Signaling is not required of the Hyperstream switches, and provisioning changes should be rare. Also, having a single reserved-bandwidth VP per physical path allows optimal use of available bandwidth resources. This allows full bandwidth to be allocated between any endpoints that need it. A static VP configuration with multiple VPs per physical trunk would require predetermined bandwidth allocation to the VPs, dividing the bandwidth in a way that would seldom correspond to application needs. Figure 5 illustrates the PVP/SVC routing for reserved-bandwidth service.
Signaling -- Initially, the vBNS Fore switch signaling will tunnel through the Hyperstream switches, so user-network interface (UNI) signaling between the switches is not necessary. If, in the future, we want to stop tunneling through the Hyperstream switches and we find that the Fore and Hyperstream network-network interfaces (NNIs) do not interoperate, UNI may be used between them to enable switched virtual connection (SVC) setup through both networks. This solution would require static route configuration in the vBNS switches.
Coordinated ATM Admission Control -- The vBNS and Hyperstream switches must perform admission control that is compatible with each other as well as with the IP-layer admission control. IP admission control is implemented with a statically configured parameter, which defines the threshold for reserved bandwidth. Admission control at the ATM layer is determined by the result of a call admission control (CAC) computation that may differ on different vendors' switches. The CAC algorithms on all switches along the statically configured paths must yield the same call admission decision. Also, the ATM-layer decision must agree with the IP-layer decision to have a consistent admission policy.

Router QoS Mechanisms

Flow Setup -- Use of the reserved-bandwidth service by user applications requires a per-flow path setup once an RSVP RESV message is received. vBNS routers are required to initiate signaling for an appropriate SVC across the vBNS to carry the flow's traffic. To do this, the Rspec parameters from the RSVP RESV message must be mapped to corresponding UNI signaling parameters. When the signaling is complete, the resulting virtual path identifier/virtual connection identifier (VPI/VCI) must be added to the state information associated with the flow so that the data packets belonging to the flow can be routed onto the correct ATM virtual connection to cross the vBNS. The mapping between RSVP Rspec parameters and ATM signaling parameters follows current work in the Internet Engineering Task Force (IETF). RSVP PATH messages first use the existing best-effort PVC until a VBR SVC is set up for the traffic flow requesting the service. From then on, this SVC will be used to carry the RSVP PATH message for the same flow. RESV messages for a flow also use an existing best-effort PVC. If a reverse VBR SVC is set up for the flow, the RESV messages will also use this SVC. If no reverse-path SVC is set up, the RESV messages will keep using the best-effort PVC.
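The per-flow state described in the flow setup paragraph can be pictured as a small table keyed by flow. The following is a minimal sketch, not the router vendors' implementation: the names FlowTable, FlowState, on_resv, on_svc_ready, and select_vc are hypothetical, the five-tuple flow key is an assumption, and the sketch also assumes that data packets, like PATH messages, fall back to the best-effort PVC until the SVC is ready.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed flow key: (source, destination, protocol, source port, destination port)
FlowKey = Tuple[str, str, int, int, int]
VcId = Tuple[int, int]  # (VPI, VCI)


@dataclass
class FlowState:
    rate_bps: float
    vc: Optional[VcId] = None  # filled in once SVC signaling completes


class FlowTable:
    """Hypothetical per-flow state table for a router interface."""

    def __init__(self, best_effort_pvc: VcId):
        self.best_effort_pvc = best_effort_pvc
        self.flows: Dict[FlowKey, FlowState] = {}

    def on_resv(self, key: FlowKey, rate_bps: float) -> None:
        """A RESV arrived: create state; SVC signaling would start here."""
        self.flows[key] = FlowState(rate_bps)

    def on_svc_ready(self, key: FlowKey, vc: VcId) -> None:
        """Signaling finished: record the VPI/VCI in the flow state."""
        self.flows[key].vc = vc

    def select_vc(self, key: FlowKey) -> VcId:
        """Pick the outgoing VC: the flow's SVC if ready, else the PVC."""
        state = self.flows.get(key)
        if state is None or state.vc is None:
            return self.best_effort_pvc
        return state.vc
```

Keeping the best-effort PVC as the default lookup result means a flow without a completed reservation is still forwarded, only without the bandwidth commitment.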
Packet Classification and Policing -- A token-bucket-based scheme is used to implement both packet classification and packet policing. Each reserved-bandwidth service flow will be associated with a token bucket. The state for a token bucket keeps the packet classification criteria and rate information passed by RSVP messages. It also keeps track of actual packet data rate for packet policing. Rate information includes average rate, peak rate, and allowed burst size. Since traffic is not shaped from the input side of a router to the output side, the buffer space required at the token bucket is not large. Classification criteria include source and destination addresses, and other header information similar to that used for access control lists. This allows extension to support class-based packet classification. Nonconforming packets can be either discarded or mapped to a lower precedence class at the network operator's discretion. There are active and passive modes in which the token bucket mechanism can actively set the precedence bits or just passively use the precedence bits set by customers for classification.
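The policing behavior described above can be sketched with a single-rate token bucket. This is a simplified illustration, not the vendor implementation: it tracks only the average rate and burst size (peak-rate policing is omitted), and the names TokenBucket, conforms, and police, as well as the class labels returned, are hypothetical.

```python
class TokenBucket:
    """Single-rate token bucket; tokens are counted in bytes."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_t = 0.0

    def conforms(self, t: float, pkt_bytes: int) -> bool:
        """Refill for the elapsed time, then try to spend tokens for the packet."""
        elapsed = t - self.last_t
        self.last_t = t
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True
        return False


def police(bucket: TokenBucket, t: float, pkt_bytes: int,
           demote: bool = True) -> str:
    """Classify a packet as reserved, demoted to best effort, or dropped.

    `demote` models the operator's choice between remapping nonconforming
    packets to a lower precedence class and discarding them.
    """
    if bucket.conforms(t, pkt_bytes):
        return "reserved"
    return "best-effort" if demote else "drop"
```

Because the policer only counts tokens and never delays packets, it needs no packet buffer of its own, which matches the observation above that buffer requirements at the token bucket are small.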
Packet Queuing and Discard -- The current design is to use Weighted Fair Queuing (WFQ) on the output, with each reserved-bandwidth flow in a queue receiving a weight equal to its reserved rate [4]. To meet the loss goals of the reserved traffic, a drop policy must be used. The preferred algorithm is Weighted RED for selecting packets to discard, with each class having a separate MIN, MAX, and probability weight applied to it as a drop mechanism [5]. Using this WRED mechanism, we can treat nonconforming packets that belong to reserved-bandwidth flows with a discard probability either higher than the best-effort traffic or between reserved bandwidth and best effort. Once the ATM policing function has been configured to match the router QoS thresholds, the router must ensure that packets transmitted on an SVC do not then get dropped by the ATM policing function. This may happen as a result of processing jitter in the router causing transmitted cells not to conform to their contract. This will require shaping, preferably in hardware, of the outgoing traffic. The shaping capability is in our implementation plan but may not be offered at the beginning, depending on vendor delivery.
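The per-class drop behavior described above can be sketched as the basic RED linear ramp applied with a separate profile per class. This is a minimal sketch under stated assumptions: the threshold and probability values are invented for illustration, the names WredProfile, PROFILES, and drop_probability are hypothetical, and the averaging of the queue length and the inter-drop count adjustment of the full RED algorithm [5] are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WredProfile:
    min_th: float  # average queue length below which nothing is dropped
    max_th: float  # average queue length at or above which all packets drop
    max_p: float   # drop probability reached just below max_th


def drop_probability(avg_queue: float, profile: WredProfile) -> float:
    """Linear RED ramp between min_th and max_th for one traffic class."""
    if avg_queue < profile.min_th:
        return 0.0
    if avg_queue >= profile.max_th:
        return 1.0
    span = profile.max_th - profile.min_th
    return profile.max_p * (avg_queue - profile.min_th) / span


# Illustrative values only. Here nonconforming reserved packets are dropped
# more aggressively than best effort, one of the two options in the text.
PROFILES = {
    "reserved": WredProfile(min_th=40, max_th=80, max_p=0.02),
    "best-effort": WredProfile(min_th=20, max_th=60, max_p=0.10),
    "nonconforming": WredProfile(min_th=10, max_th=40, max_p=0.50),
}
```

Placing the nonconforming profile between the reserved and best-effort profiles instead would implement the other option mentioned in the text.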
Coordinated Admission Control for IP over ATM -- Since reserved bandwidth is an IP service implemented over ATM, there are two levels of admission control to coordinate. At the IP layer, RSVP can be used to configure thresholds for bandwidth reservation on both interface and individual flow bases. The admission control algorithm checks that an incoming flow request does not result in either threshold being violated. For a particular data flow request, it is the outbound interface that should be checked for sufficient capacity. Subinterfaces are subject to the aggregate threshold for the physical interface configuration as well as their individual thresholds. Admission control at the ATM layer is based on a CAC computation when the request for an SVC is initiated. The SVC is subject to ATM admission control. In order to avoid flows passing RSVP admission control only to be rejected at the ATM layer, coordinated configuration of the RSVP bandwidth threshold with the ATM CAC algorithm is needed. The Hyperstream ATM parameters offered for vBNS VBR PVPs are a peak cell rate (PCR) of OC3 and an SCR of DS3. Because the vBNS ATM service does not signal with the commercial MCI ATM service, dynamic knowledge of available ATM bandwidth is incomplete and we must make assumptions when configuring the bandwidth thresholds at the IP layer. The initial RSVP threshold settings on the vBNS routers will limit per-flow reservations to DS3 rate and the aggregate of all reserved flows on an interface to n * DS3, where n is the number of outgoing trunks on the connected Hyperstream ATM switch and n ≤ 3. For example, in Fig. 6 the router would be configured to have a per-flow threshold of DS3, and a threshold for all reservations on the interface of 2 * DS3 because that is the maximum outgoing SCR bandwidth available to the vBNS in the Hyperstream backbone. The aggregate threshold is capped at OC3 because the PCR offered to the vBNS on any VBR PVP is OC3.
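The IP-layer threshold check described above can be sketched as follows. It is an illustration only: the class InterfaceAdmission and the function aggregate_threshold_bps are hypothetical names, and the sketch uses nominal DS3 and OC3 line rates (44.736 and 155.52 Mb/s), whereas a real configuration would have to account for the payload rate actually available after ATM and framing overhead.

```python
DS3_BPS = 44.736e6   # nominal DS3 line rate (assumed stand-in for the SCR)
OC3_BPS = 155.52e6   # nominal OC3 line rate (assumed stand-in for the PCR)


def aggregate_threshold_bps(n_trunks: int) -> float:
    """n * DS3 for n outgoing trunks, never exceeding the OC3 cap."""
    return min(n_trunks * DS3_BPS, OC3_BPS)


class InterfaceAdmission:
    """Hypothetical IP-layer admission check for one outbound interface."""

    def __init__(self, n_trunks: int, per_flow_limit_bps: float = DS3_BPS):
        self.per_flow_limit_bps = per_flow_limit_bps
        self.aggregate_limit_bps = aggregate_threshold_bps(n_trunks)
        self.reserved_bps = 0.0

    def admit(self, request_bps: float) -> bool:
        """Accept a request only if neither threshold would be violated."""
        if request_bps > self.per_flow_limit_bps:
            return False
        if self.reserved_bps + request_bps > self.aggregate_limit_bps:
            return False
        self.reserved_bps += request_bps
        return True

    def release(self, bps: float) -> None:
        """Return bandwidth when a reservation is torn down."""
        self.reserved_bps = max(0.0, self.reserved_bps - bps)
```

With two outgoing trunks, as in the Fig. 6 example, the aggregate limit is 2 * DS3, so two 40 Mb/s requests are admitted and a third is refused. A request admitted here could still be rejected by the ATM CAC, which is why the two layers must be configured consistently.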

Class of Service

End-to-end signaling and per-flow resource allocation and scheduling, the core mechanisms to support vBNS reserved-bandwidth service, demand high processing power and large memory space in proportion to the number of active traffic flows that require such service. It is expected that simultaneous vBNS traffic flows requiring "reserved bandwidth" will initially be a small number compared to the total active end-to-end flows at any time. This number is "reasonably small" so that the mechanisms discussed in this article can support this service with adequate performance. Although vendors have not committed to any specific size parameters, their in-depth disclosures on design and implementation gave us a degree of confidence to provide the "reserved bandwidth" on a limited basis given our current user groups.
Extending our QoS offering to include class-based or differentiated services is a logical step to address the scalability issue and to satisfy the diverse QoS needs of our customers [6, 7]. For example, the Internet2 community is looking at differentiated services as a model for their first QoS requirement. Because the vBNS serves as Internet2's initial backbone infrastructure, we are actively looking at different implementation choices to support the premium and/or assured class along with reserved-bandwidth service.
For these class-based services, we desire to use the same implementation mechanisms used for reserved bandwidth. The key difference is no per-flow states or queues; instead, queues and states will be on a per-class basis. Signaling to trigger the service on a dynamic basis is likely to be unnecessary. Packets of application flows with similar network resource demands will be associated with the same class identifier and given the same treatment at points of packet queuing or dropping. We are evaluating the potential problems and required extensions to these mechanisms to support both per-flow and per-class services in the same environment. We are also interacting with the Internet2 community on issues such as bandwidth broker, policy control and others.

Conclusion

Our contribution is an analysis of our QoS needs and the traffic characteristics with a practical view of the available technology, and a proposed definition and implementation of a QoS called "reserved bandwidth" based on our analysis. We have chosen a QoS design that is believed to be a best match between our QoS needs and the reality of today's leading-edge technology. It is expected that details of the implementation will go through changes as we refine our design and expose limitations with field testing. We will also address major unresolved issues, some of which are pointed out below.

Multicast

Multicast complicates the design presented here in several ways. First, to support multicast over an ATM subnet we are evaluating the trade-off between a MARS-based solution and a PIM-based solution, which may be limited to a single LIS. Second, when multiple receivers are allowed per session, the model of point-to-multipoint SVCs and their mapping to RSVP styles must be defined (e.g., one point-to-multipoint per session, receiver, or QoS class to support wildcard, shared explicit, or fixed filter reservation styles).

Overall Performance

This design pulls together pieces from many different hardware and software implementations, and applies them to high-speed interfaces for the first time. Complete testing will require time and is expected to uncover limitations. Future work may be required for refinements to address the limitations and for general tuning to achieve optimal performance from the design.

Usage Policy

As an integral part of our QoS architecture we need policies to control the amount of bandwidth that can be reserved by a single flow and reserved at an aggregate level per port. The control over usage needs to be based on source/destination IP address, bandwidth requests, and time of day and duration of requests.

References
[1] R. Braden et al., Eds., "Resource Reservation Protocol (RSVP) -- Version 1 Functional Specification," RFC 2205, Sept. 1997.
[2] S. Shenker et al., "Specification of Guaranteed Quality of Service," RFC 2212, Sept. 1997.
[3] J. Wroclawski, "Specification of the Controlled-Load Network Element Service," RFC 2211, Sept. 1997.
[4] J. Bennett and H. Zhang, "Hierarchical Packet Fair Queuing Algorithms," Proc. ACM SIGCOMM '96, Aug. 1996.
[5] S. Floyd and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Trans. Networking, Aug. 1993.
[6] D. Clark and J. Wroclawski, "An Approach to Service Allocation in the Internet," Internet draft draft-clark-diff-svc-alloc-00.txt, July 1997.
[7] V. Jacobson, "Differentiated Services Architecture," talk at the Int-Serv WG at the Munich, Germany meeting of IETF, Aug. 1997.

Additional Reading
[1] D. Clark, "Adding Service Discrimination to the Internet," 1995.

Biographies
Chuck Song is a lead engineer for the vBNS project. His current focus is QoS design and implementation in the vBNS. Before joining MCI in 1995, he worked on the NSFNet project at IBM for over five years. He has a Ph.D. in computer science from the University of Wisconsin-Madison.
Laura Cunningham is a senior engineer in the vBNS group at MCI. She has been responsible for design and configuration of the ATM network in the vBNS project. Her current responsibilities include development and testing of vBNS QoS design. She worked at Bellcore and MITRE in the areas of multimedia and IP networks. She has an M.E. degree in systems engineering from the University of Virginia.
Rick Wilder is senior manager of MCI's Internet Technology group. His organization is responsible for research and development of new Internet technologies and facilitating the insertion of these technologies into MCI IP networking projects. He holds an M.S. in computer science.