Copyright 2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

This article was published in the September 2001 issue of
IEEE Communications.

CIRULE4.GIF (372 bytes)

ABSTRACT
      This article presents a survey of architectures, techniques, and algorithms for multicasting data in communication switching networks. We start with a broadcast architecture using a separate copy network and a routing network. A few versions of this idea using Delta and Benes networks exist. Another multicast architecture is a recycling network where internal nodes act as relay points, accept packets from the switching fabric, and recycle them back into the fabric after relabeling the packets. Next, we give an overview of a system that uses the Boolean splitting multicast algorithm. In this system a nonblocking self-routing broadcast Banyan copy network has been proposed. The network consists of several components including a running adder network to generate running sums of copy numbers specified in the headers of input packets. We then describe a multicasting technique presented for a different class of switching networks called deflection-routing networks. Finally, the idea of extending a nonblocking network to a three-dimensional structure consisting of multiple parallel planes is also presented. At the end of this article, we compare the efficiencies of the presented multicast architectures.

CIRULE4.GIF (100 bytes)

 

A Survey of Data Multicast Techniques, Architectures, and Algorithms

CIRULE4.GIF (212 bytes)

Nader F. Mir, San Jose State University

 

Introduction

      Modern switching systems must have the ability to support widespread multimedia applications such as multicasting voice, video, and data. Multicasting is defined as sending data from one source to a group of destinations. The motivation for developing multicast is that there are applications for which a packet needs to be sent to more than one destination host. Instead of forcing the source host to send a separate packet to each destination host, we want the source to be able to send a single packet to multicast addresses, and for the network to deliver a copy of that packet to each group of hosts. Hosts can then choose to join or leave this group without synchronizing with other members. A host may also belong to more than one group at a time.
      Multicasting is required to implement conferencing due to the real-time component of multimedia applications like audio and video. Tight real-time components require a constant flow of data and have very low tolerance of jitter. To this end, corrupted data frames are often dropped rather than retransmitted. New technologies like asyncrhonous transfer mode (ATM) are likely to have low jitter due to the use of optical fibers as the transport media. Here one would like to see a dynamic multicast system which adapts to deletions and additions of ongoing data over time [1]. Similarly, we have unicast or point-to-point service and broadcast service, defined as sending data from one source to all destinations. Multicast service is distinct from repetitive unicast. In multicast service we want to make copies of data at or near a source with only one access to the transport media. This raises issues of congestion more drastically due to copies of packets floating around. Multicasting is implemented as a tree structure. The source generates a packet and sends it out to the first switch. The packet may have a field that specifies how many copies of this packet are to be made. All copies may be destined for other switches in which more copies of the packet are made, and some packets may move to their destinations. To implement a multicast connection, a binary tree is normally constructed with the source switch port at the root and the destination switch ports as the leaves. Internal nodes act as relay points which receive packets and make copies.
      There are a number of multicast methods presented in the literature for switching networks. In this article we explain the current and popular multicast techniques as follows: multicasting by copy network, through a recycling technique, by a Boolean splitting algorithm, in deflection-routing networks, and in a three-dimensional nonblocking network. Finally, we compare the above architectures.

Multicasting through a Copy Network and a Routing Network

      One of the first broadcast architectures for modern data switching systems appeared in [2]. The system consists of three Delta networks: a copy network, a distribution network, and a routing network. This work was then extended in [3] where the Benes network topology was used to construct a routing network and a copy network for an ATM switching system. Two algorithms have been proposed, one for the routing network and one for the copy network. In the copy network shown in Fig. 1, packets are replicated as designated by the initial global fanout, F, given in the routing control field of the packet header. The global fanout refers to the total number of packet copies requested. Let Fj be the remaining number of copies of a packet when the packet arrives at stage j, and fj the number of copies made locally at stage j. In the beginning, the algorithm initializes Fj to F and fj to one. The copying method is such that the replication of the packets takes place stage by stage within the network in order to distribute the traffic as evenly as possible. There are two types of routing: point-to-point and multipoint connections. In the case of a point-to-point connection, a packet is routed randomly in the first k – 1 stages and routed according to successive d-array digits of the destination address in the last k stages.
      When multipoint connections are requested, a packet is routed randomly in the first k – 1 stages, and in the last k stages the packet is copied. The operation basically computes a new fanout number for a packet at stage j with local fanout where = 2k – 1 – j. The algorithm is set up to generate the final number of copies that appear on the outputs to be the smallest multiple of 2 greater than or equal to F. This technique reduces the hardware complexity in the controller. If the final number of copies that appear on the outputs is more than requested, the unnecessary copies of packets can easily be thrown away. This task is completed by the broadcast translated circuit (BTC), shown in Fig. 1. The BTCs receive all the copies of a packet. A central monitoring mechanism for all BTCs compares the number of requested copies read from the packet's header and the number of copies actually made in the copy network. It then decides to remove unused copies based on this comparison. For the routing network, routing is similar to that in a point-to-point copy network.
      As a practical example, consider Fig. 1, which shows a network with n = 16 ports using switch size d = 4. Assume the global fanout of an incoming packet is F = 5 and copies of the packet are to be routed to output port numbers 1, 6, 9, 12, and 14. In the copy network, there are (2k – 1) = 3 stages (); at stage j = 1, F1 = 5 and the local fanout f1 = 1, so the packet gets distributed. At stage j = 2, and ; thus, two copies are made at this stage and guided to two switches at the last stage (j = 3). At the third stage, and the local fanout ; therefore, for each of these switches three copies of the packet are made. Notice that the sum of the above copies (6) has exceeded the requested global fanout (5), so, as mentioned, the one additional copy is thrown away by the corresponding BTC.

Multicasting by the Recycling Technique

      A new method presented in [4] proposes multicasting by recycling packets. This method uses an architecture that reduces the switching complexity and the amount of memory required for routing packets in multicast virtual circuits. Consequently, this architecture makes it both technically and economically more feasible to construct large switching systems ultimately used for broadband switching applications. To implement a multicast connection, a binary tree is constructed with the source switch port at its root and destination switch ports as its leaves.
      Internal nodes acting as relay points accept packets from the switch but then recycle them back into the switch after relabeling the packets. New labels carry the information about the destination pair, identifying the next two switch ports to which they are to be sent. This topology provides a moderately low-cost solution. In this method, as seen in Fig. 2, a translation table (VXT) provides an <output, VCI> pair that is added to the packet header plus two additional bits which indicate whether the pair is to be recirculated another time. There are a number of buffers used (Fig. 2). A receive buffer (RCB) holds packets received from the input link and forward them to the switching network. A transmit buffer (XMB) holds packets waiting to be transmitted on outgoing links. A resequencing buffer (RSQ) is used to restore proper ordering of packets at the output. A recycling buffer (RCY) provides buffering to the packets that need to be recycled.
      Figure 2 illustrates the recycling technique. Basically, a is a source that generates packets to be delivered to b, c, d, and e. Also, x and y are relay points. Consider a packet entering at a has copies to be made and is headed for virtual circuit i. For such a packet there may be an entry of the type <x, j> and <e, k> in the VXT. This can be interpreted as making two copies of the packet, one to be recycled to port x on virtual circuit j and the other bound for output port e on virtual circuit k. Furthermore, the VXT entry at x may have entries <b, n> and <y, m>, which could be interpreted as making copies bound for output port b on virtual channel n and recycled to y on virtual channel m. The packet would again be recycled at y. The packet header has two bits, the status of which is determined by the controller to decide if a packet needs to be recycled. To add an endpoint, we need to find a parent for it (maybe one that does not have heavy recycling traffic) and add the new endpoint to one of its unused VXT entries. Since the intermediate point (i.e., the parent of the new endpoint) is also new, it has to be put into the VXT entry of another node, which becomes the grandparent of the new endpoint and whose child will now become a sibling of the new endpoint. To remove a connection we can consider two cases. If the output to be removed has no grandparent but its sibling has children, replace the parent's VXT entry with the sibling's children. If the output to be removed has no grandparent and its sibling has no children, we simply drop the output to be removed from its parent's VXT entry, and the connection reverts to a simple point-to-point connection. Since packets are resequenced when they exit from the system, the RSQ must be dimensioned to delay packets long enough so that slow packets have a chance to catch up with fast packets.

Multicasting by a Boolean Splitting Algorithm

      In [5], a nonblocking self-routing copy network with constant latency has been proposed. The system mainly consists of a broadcast Banyan network in which switching nodes replicate packets based on two-bit header information. The basic structure of this network consists of following cascaded components: a running adder network (RAN) that generates running sums of copy numbers specified in the headers of input packets, a dummy address encoder (DAE) that takes adjacent running sums to form a new packet header, a broadcast Banyan network (BBN) that is a Banyan network with broadcast switch nodes capable of packet replications based on two-bit header information, and a trunk number translator (TNT) that determines the outgoing trunk numbers for each copy packet. When broadcast packets are received at the RAN, the number of copies (CN) specified in the packet headers is added up recursively. DAEs form new headers consisting of two fields: a dummy address interval and an index reference (IR). The dummy address interval formed by adjacent running sums is represented by two n-bit binary numbers, the minimum (MIN) and maximum (MAX). The index reference is set equal to the minimum of the address interval, used later by the trunk number translators to determine the copy index (CI). The broadcast Banyan network replicates packets according to a Boolean interval splitting algorithm based on the address interval in the new header. When copies finally appear at the outputs, the TNTs compute the copy index for each copy from the output address and index reference. The broadcast channel number (BCN) together with the CI form a unique identifier for each copy. The TNTs then translate this identifier into a trunk number (TN), which is added to the packet header and used by the switch to route the packet to its final destination.
      To understand how routing takes place using the Boolean splitting algorithm shown in Fig. 3, suppose a packet arrives at an arbitrary node k – 1 from a previous node k. If we represent the MIN number as m1 ... mk and the MAX number as M1 ... Mk, we can make the following observations: If mk = Mk = 0 or mk = Mk = 1, then send the packet out on link 0 or 1, respectively. If mk = 0 and Mk = 1, then replicate the packet, modify the headers, and send both packets out on both links. If replication takes place, the packet header needs to be modified by splitting the original address interval into two subintervals, which can be expressed by the two following main recursions: (1) min(k) = min(k – 1) = m1 ... mN and max(k) = M1 ... Mk–101 ... 1, for the packet sent out on link 0, and (2) min(k) = m1 ... mk–110 ... 0 max(k) = max(k – 1) = M1 ... MN, for the packet sent out on link 1. This modification is a routine operation, meaning it is independent of the header contents and can be considered built-in hardware.
      The RAN is a log2n-stage network constructed from n/2log2n nodes. It can be arranged as a top-down or bottom-up recursive structure. Each node is an adder with two inputs and two outputs, and a vertical line which is a pass. The main function of the RAN is to assign output addresses for each broadcast packet according to the requested number of copies. It first generates the running sums of copy numbers at each port. This is done by adding the CN from the packet header with the number going down (or coming up in a bottom-up structure) through the pass and placing that sum on the output node. If there is another node on the same link, the process of adding is repeated. The dummy address encoders take the new headers from adjacent running sums. The new header consists of two fields: one designates the dummy address interval, which is represented by two n-bit binary numbers MIN and MAX; the other contains an index reference equal to the minimum of the address interval. If allocation of output addresses begins with address 0, the following sequence of dummy address intervals are generated by a top-down running adder:
      (0,S0 – 1),(S0,S1 – 1), ... ,(Sn–2,Sn–1 – 1),
where Si is the ith running sum of the copy numbers. This sequence results in forward address assignment. The length of each interval is equal to the corresponding copy number in both addressing schemes. It should also be noticed that both sequences are monotone to satisfy the nonblocking condition previously described. The decoding process is carried out by the broadcast Banyan network and the trunk number translators. The broadcast Banyan network routes the packets according to the Boolean splitting algorithm discussed earlier. When packets emerge from the Banyan network, the address interval in their headers contain only one address at the outputs of stage N, the last stage. According to the Boolean splitting algorithm min(N) = max(N) = output address. Packets belonging to the same broadcast channel are distinguished by CI. The CI of each packet is determined by CI = output address – IR.

Multicasting in Deflection-Routing Networks

      Another category of switching networks is based on the deflection of packets for routing. As a result, the multicasting technique could be fundamentally different to those in other switching system categories. In a deflection-routing network, when two packets request the same outgoing link, only one of them is forwarded on the preferred link; the other one is deflected on the second link. There are a few deflection-routing networks such as the Shuffle network [6] and the Manhattan street network (MSN) [7]. In this article we review the multicast technique for the MSN.
      The MSN is a regular mesh network that can be realized as a collection of horizontal and vertical rings. The MSN can be considered for implementation in geographically distributed computer networks. The links resemble the streets and avenues in Manhattan, which alternate in direction as shown in Fig. 4. With the torus-shaped topology embedded in the network, at each node there are two links arriving and two links leaving regardless of the network size. Although it is not indicated in the figure, at each node an incoming link and an outgoing link connect the network to a local processor which generates and receives data sent through the network. Routing in the MSN relies on deflection [7]. By maintaining the deflection rule explained above, once a packet is admitted to a network, it is not discarded if congestion occurs. Instead, the packet is misrouted temporarily but will reach its destination. However, deflection of packets onto longer paths causes additional delay.
      Nodes can be used with different purposes for multicasting. Figure 4 gives an example of a multicast tree for a packet with fanout-4. By the source node and destination node, we mean the terminals through which a packet is delivered and exits, respectively. A relay node refers to a location that determines the addresses of the next two copies. Finally, a copy node is a node through which a packet is replicated and forwarded to both outgoing links. In an N N network where N = √n, the number of hierarchical levels of a multicast tree in which copy nodes are required relies on the global fanout of a packet F {0,1, ..., n – 1}. In a multicast tree, the number of levels containing at least one copy node is . The number of relay nodes R is equal to F – 2.
      The determination of physical locations of copy nodes and relay nodes with respect to packet destinations can be important. These locations will affect the capability of a network to handle high bandwidths. Assume in Fig. 4 a packet at copy node c1 (as a local source node) is to be replicated and sent to two final destinations, d1 and d2, respectively. Let (i1, j1) and (i2, j2) be the addresses of two destination nodes d1 and d2, and (i3, j3) be the address of c1. Nodes d1 and d2 can always be realized as two vertices of a rectangle in the MSN. The ideal location of a relay node r1 is as an immediate neighbor of c1, (i3 + 1, j3) or (i3, j3 + 1), depending on which is closer to c3. By maintaining this rule in the network, the length of a routing path between a copy node and a destination node is evaluated in the relay node at its earliest time and thus at its lowest waste of bandwidth. For the same reason as for relay nodes, the location of a copy node must be closest to the pair of destination nodes. In Fig. 4 the copying steps of a fanout-4 packet are shown, and the ideal locations of copy nodes and relay nodes indicated.

Multicasting in Three-Dimensional Nonblocking Networks

      In [8], a 3D multicasting switching architecture was presented for high-speed applications. A Clos network is used as a building block module in this system. The Clos network is a nonblocking network with the condition that k ≥ 2n – 1, where n and k are the sizes of the first-stage crossbar switch elements [9]. In this figure, the network is nonblocking since n = 2 and k = 4. The multicast algorithm for each plane is described in the following steps.
      At the first stage no routing decision need be made. The first stage switch elements only maintain a table indicating utilization of the center stage arrays. This table is created based on back-pressure information obtained from the center stage arrays. A center stage element least used at any particular time gets the highest priority. In fact, this table can be shared by all the first stage arrays. Once a multicast packet arrives at the center stage array, the hardware looks at the first k bits of every address, where k = log2N/n. The packet is then split according to the number of distinct k bits found. The maximum number of new packets that can be created is N/n. All addresses with the same first k bits are sent to the corresponding switch element in the last stage. Also, since there are four distinct addresses identified by the first two bits in our example, the packet is split four ways. The final stage (stage 3) looks at the remaining l bits, where l = log2n, in each address of the multicast packet received by the last stage array. The packet is again split according to the number of different sets of l bits found. The maximum number of packets at this stage is l. The packets are then sent to the appropriate output port corresponding to the last l bits. If a packet is split, the header has to be modified. The new header contains the addresses with the same initial k bits.
      The 3D structure of the Clos network consists of a number of parallel planes of the same type and size of network seen in Fig. 5. The inputs are connected together such that an incoming input packet can be demultiplexed between all the same input ports at the same level. The demultiplexer does the work of distributing the input packets. Similarly, there is a device that connects all equally numbered outputs. This device accepts packets coming from different planes and time multiplexes them onto the same output port. A multicast packet on a particular input port requesting some number of copies, say F, can distribute the request equally between the parallel planes. This serves to reduce the load on any particular plane, which reduces overflow at the planes, leading to reduced use of buffers. The advantage of this method is that there is no need to queue input packets. This would reduce the traffic congestion in the network. An analysis of a 3D Clos network is based on the blocking probability pb(i) for input number i out of N:

(1)

where is the offered load per input and Fmax is the maximum number of copies requested by any packet. The distribution of multicast packets among planes is done using the following routing algorithm. Let the number of copies being requested by a multicast packet be represented by F.

      The distribution circuit at the input has the processing ability to take the packet header and modify it according to the algorithm for distribution of packets to the planes. Some fields are added on top of the original packet. In fact, the original packet is encapsulated into this new packet with the new packet header. The distribution circuit maintains a table constructed on the basis of back-pressure information obtained from the planes.

Comparison of Techniques and Summary

      In this article we present a survey of high-speed switching networks along with their techniques and algorithms for multicast traffic applications. The copy network architecture uses a Benes network. The Benes network can realize more than one path from an input to an output, but it is strictly nonblocking under certain speedup factors. The speedup factor in a switching network refers to the internal link speed to external link speed ratio. A network is called strictly nonblocking if for every set of routes R realizing a set of connections C, and every connection c compatible with C, there exists a route r that realizes c and is compatible with R. In the recycling technique, a method of multicasting by recycling packets is presented. This is an architecture that reduces switching network complexity and the amount of memory required for routing packets in multicast virtual circuits compared to the copy network architecture. However, extra hardware is needed to implement the recycling part. The recycling technique makes it both technically and economically feasible to construct the large switching systems that will ultimately be used for broadband switching applications. Multicasting in deflection routing networks requires a special-purpose technique since from any given input to any output there may exist a number of different paths with different lengths. The Manhattan street network as a sample of deflection-routing network is considered. This network has a toroidal-mesh topology which can deflect a request on the alternate output link if the desired link is not available. The three-dimensional structure of the presented architecture can be advantageous in applications where there is heavy multicast traffic, as in multimedia applications. The Clos nature of the network decreases internal link blocking. However, the complexity of the entire system increases, especially for larger-size architectures. A Clos network as a building block network uses a larger number of crosspoints than a similar sized Banyan network used in a Boolean split architecture. Nevertheless, the trade-off between hardware complexity and blocking makes more sense with today's technology as hardware becomes inexpensive.
      Table 1 presents a quick efficiency comparison of the five multicasting techniques; CN, RT, BS, DR, and 3D refer to the copy network, recycling technique, deflection routing networks, and three-dimensional structure, respectively. Factors considered are hardware complexity, the speed of algorithm, and the level of traffic congestion and blocking as a result of multicasting.

References
[1] M. Byne et al., "An Introduction to IP Multicast," http://ganges.cs.tcd.ie/undergrad/4ba2/multicast/
[2] J. Turner, "Design of Broadcast Packet Switching Network," IEEE Trans. Commun., 1988.
[3] N. F. Mir, "A Multicast Approach in an ATM Switching Network for Multimedia Applications," J. Net. and Comp. Apps., 1998.
[4] J. Turner, "An Optimal Nonblocking Multicast Virtual Circuit Switch," Proc. IEEE INFOCOM '94, Toronto, Ontario, Canada, June, 1994, pp. 298–305.
[6] M. Decina, P. Giacomazzi, and A. Pattavina, "Shuffle Interconnection Networks with Deflection Routing for ATM Switching: The Closed-Loop Shuffleout," Proc. IEEE INFOCOM '91, Apr. 1991, pp. 1254–63.
[5] T. T. Lee, "Nonblocking Copy Networks for Multicast Packet Switching," IEEE JSAC, 1988.
[9] J. Bellamy, Digital Telephony, 2nd ed., Wiley, 1990.
[8] Y. M. Matcheswala, "An Architecture and Algorithms for Multicasting Data in Computer Communication Networks," Comp. Commun. J., March 2001.
[7] N. F. Maxemchuk, "Routing in the Manhattan-Street Network," IEEE Trans. Commun., vol. 35, no. 5, 1987, pp. 503–12.

Additional Reading
[1] N. F. Mir, "Evaluation of an ATM LAN Constructed with a Cyclic Deflection-Routing Network," J. Comp. Commun., vol. 21, no. 7, pp. 661–69, June 1998.
[2] C.-C. Chen and J. Chen "Nearly Optimal One-to-Many Parallel Routing in Star Networks," IEEE Trans. Parallel and Dist. Sys., vol. 8, no. 12, Dec. 1997, pp. 1196–1202.

Biography
Nader F. Mir [SM] received a B.Sc. degree (with honors) in electrical engineering in 1985. He received M.Sc. and Ph.D. degrees, both in electrical engineering, from Washington University, St. Louis, Missouri, in 1990 and 1994, respectively. He is currently an associate professor in the Electrical Engineering Department at San Jose State University, California. Prior to this position he was an assistant professor in the Department of Electrical Engineering at the University of Kentucky at Lexington. From December 1994 to July 1996 he was a research scientist at the Advanced Telecommunications Institute at Stevens Institute of Technology, New Jersey, working on design of advanced telecommunication networks. From 1990 to 1994 he was with the Computer and Communications Research Center at Washington University, St. Louis, and worked as a research assistant on design and analysis of a gigabit switching systems project. His research interests are analysis of computer communications networks, design and analysis of switching systems, network design for wireless and telecommunications sytems, and applications of digital integrated circuits in computer communications.