Deploying Native Multicast Across the Internet

This document is a case study of the native multicast deployment in Sprint's Internet backbone. It is a technical explanation designed for network engineers familiar with multicast issues.

Why Multicast?

Multicast is an efficient way of delivering one-to-many communications across an IP infrastructure. Its benefits over unicast delivery for this type of communication are well documented. The explosive growth of multimedia content on the Internet has highlighted the need for a ubiquitous, wide-scale deployment of native multicast across the Internet. Take Internet radio as an example: with unicast, each listener must make a separate connection to the server that is the source of the data. As the number of listeners increases, this places tremendous load on the server and congests expensive WAN links. With multicast, the server sends one stream to the network and a distribution tree forms. Interested listeners simply add a branch to the tree, and routers replicate packets only where the tree branches. In this way, no link ever carries more than one copy of a packet, and the server never has to send more than one stream of data.

It is no longer inconceivable that television and radio will someday be delivered primarily over the Internet. Accepting that, unicast delivery simply cannot scale to support this vision, whereas multicast is designed explicitly to provide this functionality. Multicast's benefits do not end with audio and video applications, either. File transfer, network management, stock tickers and any other applications that require one-to-many delivery are ideal candidates for multicast.

For much of the 1990s, the multicast world was confined to a tunneled overlay network of routers and UNIX servers known as the MBone. It was used primarily by research institutions and hobbyists. However, with the recent creation and standardization of some protocols, along with the willingness of some service providers to provide a scalable architecture, a ubiquitous wide-scale deployment of non-tunneled (native) multicast across the Internet is a reality today. The intent of this document is to explain Sprint's native deployment across its Internet backbone. The protocols used to do this are not merely ideas on paper or promises for the future, nor are they designed solely for the enterprise networks of small- and medium-sized businesses. These protocols are available and in use today and will scale to support native deployments across the entire Internet.

Multicast Routing Protocols

Multicast routing offers a significant paradigm change from unicast routing. In unicast, routing decisions are made based on the destination. Conversely, multicast routing decisions are based on the source. A router will look at the source of the traffic and determine which interface is closest to the source. This is called a Reverse-Path Forwarding (RPF) check. Each router, from receiver to source, will perform RPF checks to determine the best path to the source.
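The RPF check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the routing table, prefixes and interface names below are hypothetical, and a real router consults its actual unicast (or multicast) reachability tables. The sketch finds the interface on the longest-prefix match toward the source and accepts a multicast packet only if it arrived on that interface.

```python
import ipaddress

# Hypothetical unicast routing table: prefix -> outgoing interface.
ROUTES = {
    "0.0.0.0/0": "POS0/0",
    "192.0.2.0/24": "POS1/0",
    "192.0.2.128/25": "Serial2/0",
}

def rpf_interface(source, routes=ROUTES):
    """Return the interface on the longest-prefix match toward the source."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, iface in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

def rpf_check(source, arrival_iface, routes=ROUTES):
    """Accept a multicast packet only if it arrived on the RPF interface."""
    return rpf_interface(source, routes) == arrival_iface
```

For example, rpf_check("192.0.2.10", "POS1/0") passes, while the same packet arriving on POS0/0 fails the check and would be discarded.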

Multicast routing protocols can generally be classified into two categories, sparse mode and dense mode. Dense mode assumes that most subnets in the network will be interested in multicast traffic. To inform other routers of multicast sources, a dense mode protocol floods the multicast traffic to all routers in the network. A router with no receivers interested in this traffic will then tell its upstream router to stop forwarding this traffic, that is, to prune this branch from the tree. This flood-and-prune mechanism allows these protocols to easily build a multicast distribution tree rooted at the source. A source-based tree guarantees the shortest and most efficient path from source to receiver. While this may be an ideal enterprise solution in many circumstances, the reliance on broadcast and flooding across the Internet simply will not scale. Examples of dense mode protocols are Distance Vector Multicast Routing Protocol (DVMRP), Multicast Open Shortest Path First (MOSPF) and Protocol Independent Multicast Dense Mode (PIM-DM).

Sparse mode protocols implement a shared distribution tree. Here, the multicast distribution tree is rooted at a core router in the network called a rendezvous point (RP). When a source begins actively sending multicast traffic, its directly connected router, or designated router, registers with the RP. The RP keeps track of all active sources in a domain. When a router is connected to a host that wants to receive a multicast group, it uses RPF checks to determine the shortest path to the RP. The RP builds a tree toward each source, while all receivers join the shared tree rooted at the RP. As long as all routers know which router is the RP, broadcast is not needed to distribute multicast routing information. Additionally, this limits the amount of routing state that all non-RP routers need to know. Protocol Independent Multicast Sparse Mode (PIM-SM) is an example of a sparse mode routing protocol.

Protocol independence is a significant requirement that ISPs have of their multicast routing protocol. It should be able to use any underlying unicast routing protocol to build the multicast distribution tree. This leaves only the aptly named PIM protocols. Since we have already stated that the flooding requirements of dense mode protocols would not scale across the Internet, PIM-SM is the obvious choice for a large-scale deployment. However, there are still many issues to consider. In sparse mode protocols, the distribution tree is rooted at the RP. Sub-optimal routing can occur if the RP is not in the shortest path from receiver to source. Fortunately, PIM-SM allows a router to switch from the RP-based shared tree to a source-based tree if traffic levels reach a configured threshold. The default threshold is set to zero in Cisco and Juniper routers. This means that a router with a directly connected receiver, or designated router, will initially build a tree to the RP. After receiving the first multicast packet, it will switch to a tree rooted at the source. This provides the primary benefits of both sparse and dense mode protocols, without introducing their disadvantages.

We must also consider that there can be only one active RP for a given group. While the above mentioned switch to the source-based tree will reduce the load on the RP, a well-designed implementation should address RP load sharing and redundancy. Also, in the interdomain world of the Internet, different ISPs cannot rely on a shared RP.

Interdomain Routing: MSDP and MBGP

The significant differences between multicast solutions in the enterprise and on the Internet become apparent when considering the practical realities of interdomain routing. Interdomain routing refers to the routing between different autonomous administrative entities. Simply stated, we can think of this as routing between different ISPs.

The requirement of PIM-SM that there can only be one active RP for a given group presents a significant challenge in the interdomain world of the Internet. For practical reasons, ISPs do not want to rely on shared or third-party RPs. ISPs require administrative control of their own RPs. A mechanism was needed for these different RPs to communicate with one another and exchange information about the active sources in their respective domains. An early method of doing this was to have multicast peering exchanges on multi-access interfaces (FDDI or Fast Ethernet) to which each ISP would connect its RP. By running PIM-DM on this multi-access interface, each RP would flood its source information to the other RPs, and only on this interface. One of several problems with this hybrid PIM-SM/PIM-DM architecture was that each ISP had to place its RP at the edge of its network. As we will examine later, a well-designed PIM-SM implementation requires that the RP be well-connected in the core of the network.

Multicast Source Discovery Protocol (MSDP) was developed to solve this problem. MSDP creates peerings between RPs via TCP connections. Through these peerings, RPs exchange Source Active (SA) messages, which describe all of the registered sources within their respective domains. In this way, RPs learn all of the active sources from other domains, in addition to the sources from their own domain. Additionally, Sprint maintains an MSDP peering with the historical MBone so that any multicast enabled customer will be able to send and receive traffic from the MBone.

A further requirement of multicast interdomain routing was the ability to support incongruent unicast and multicast routing topologies. ISPs wanted to be able to create different routing policies for unicast and multicast. For example, two ISPs may want multicast-only and unicast-only connections between them. The protocol used to achieve this should also be similar to the routing protocols that ISPs are already comfortable using.

Multiprotocol Border Gateway Protocol (MBGP) creates extensions to the widely-used Border Gateway Protocol (BGP) to support this requirement. MBGP adds a multicast-only reachability table to the existing unicast reachability table of BGP. With MBGP, a router can effectively have two BGP tables, one for multicast and one for unicast. Routers can be configured to look first at the multicast reachability table when making RPF checks. If a network is not in the multicast reachability table, the router then falls back to the unicast routing table to perform the RPF check. Experienced BGP users should have no difficulty in learning to configure and maintain the extensions of MBGP. Furthermore, MBGP is backward compatible with BGP.
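The lookup order described above (multicast reachability table first, unicast table as the fallback) can be sketched as follows. This is a minimal illustration, not router code: the table contents, prefixes and next-hop names are hypothetical, and the function names are our own.

```python
import ipaddress

def longest_match(source, table):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(source)
    matches = []
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, next_hop))
    if not matches:
        return None
    return max(matches)[1]

def rpf_lookup(source, multicast_table, unicast_table):
    """Prefer the multicast table for RPF; fall back to the unicast table."""
    hit = longest_match(source, multicast_table)
    if hit is not None:
        return ("multicast", hit)
    hit = longest_match(source, unicast_table)
    if hit is not None:
        return ("unicast", hit)
    return None

# Hypothetical tables: one peer is reachable over a multicast-only link.
MCAST = {"198.51.100.0/24": "peer-mcast-only"}
UCAST = {"198.51.100.0/24": "peer-unicast", "203.0.113.0/24": "peer-unicast"}
```

With these tables, a source in 198.51.100.0/24 resolves through the multicast-only peering even though a unicast route exists, which is how incongruent topologies are expressed; a source in 203.0.113.0/24 falls back to the unicast route.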

RP Issues: Placement, Anycast RP, RP Mapping

Since the RP is the root of the shared tree, RP placement is a critical decision in a well-designed PIM-SM implementation. The RP needs to be located in a well-connected part of the network core to minimize sub-optimal routing on the shared tree. In addition, the requirement of a single active RP per group is as limiting in intradomain routing as it is in the interdomain realm: a single RP provides no load-sharing or redundancy capabilities.

Once again, MSDP provides the solution, this time in the intradomain space. An ISP can now configure multiple routers as RPs with the same loopback IP address. Using the same IP address for multiple RPs is known as "anycast RP." In SprintLink, multiple geographically diverse core routers were selected to be anycast RPs. Because they share the same IP address, users of Sprint's RP are routed to the topologically nearest RP. Should that router fail, they are routed to the next closest RP. This provides a scalable way for Sprint to offer all customers the use of Sprint's RP. Thus, customers not wishing to administer their own RP can simply use Sprint's. Care should be taken in selecting the anycast loopback address so that it is not selected as the router-ID for routing protocols like IS-IS, OSPF or BGP.
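The effect of anycast RP can be sketched with a toy model. Every RP advertises the same address, so unicast routing delivers traffic for that address to whichever RP has the lowest routing cost, and to the next one if it fails. The router names and metrics below are hypothetical; on a real network the selection is made by the IGP, not by application code.

```python
def nearest_rp(igp_metric, failed=frozenset()):
    """Pick the reachable anycast RP with the lowest IGP metric.

    igp_metric maps an RP name to the routing cost from this router.
    failed is the set of RPs that are currently unreachable.
    """
    alive = {rp: cost for rp, cost in igp_metric.items() if rp not in failed}
    if not alive:
        return None
    return min(alive, key=alive.get)

# Hypothetical costs as seen from one customer-facing router.
METRICS = {"rp-east": 10, "rp-central": 25, "rp-west": 40}
```

Here nearest_rp(METRICS) selects rp-east; if rp-east fails, nearest_rp(METRICS, failed={"rp-east"}) selects rp-central without any change to the configured RP address.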

With MSDP, the anycast RPs are able to exchange information regarding active sources registered to each other. The anycast RPs must all be in the same MSDP mesh group to accommodate the special RPF needs of RPs within the same domain. Customers and peers with their own RP will then run MSDP with the nearest Sprint RP. MSDP mesh groups must only be used for MSDP peerings between RPs in the same domain. Multiple mesh groups can be created in the same domain. For example, Sprint has three mesh groups created: one for routers in North America, one for Europe, and a third for Asia. MSDP peerings between RPs in different domains must not use mesh groups.
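The forwarding rule that makes mesh groups useful can be sketched as follows: a Source Active message received from a mesh-group member is not forwarded to the other members of that group, because every member is already peered with every other member. The sketch is deliberately simplified. It ignores the peer-RPF check applied to peers outside a mesh group and any SA filters, and the function and variable names are our own.

```python
def sa_forward_targets(received_from, peers, mesh_groups):
    """Return the MSDP peers that an SA message should be forwarded to.

    peers is the set of all configured MSDP peers.
    mesh_groups maps a mesh-group name to the set of its member peers.
    """
    excluded = {received_from}
    for members in mesh_groups.values():
        # An SA learned inside a mesh group is never re-flooded into it.
        if received_from in members:
            excluded |= members
    return sorted(peers - excluded)

# A reduced version of the peerings on a SprintLink RP (two of the mesh peers).
PEERS = {"144.228.1.2", "144.228.240.249", "144.228.240.250"}
MESH = {"imsdp": {"144.228.240.249", "144.228.240.250"}}
```

An SA learned from the customer peer 144.228.1.2 is sent to both mesh-group members, while an SA learned from 144.228.240.249 is passed only to the customer peer.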

All routers in a domain must know which router is the RP. There are two ways a router can learn its RP: statically and dynamically. Much has been done to develop ways to provide dynamic RP mappings. Cisco's Auto-RP, in conjunction with PIM-sparse-dense, and Bootstrap Router (BSR) from PIM-SM v2 are two mechanisms used for RP announcement and discovery. In SprintLink, it was our intention to make our multicast deployment as simple as possible and eliminate any extraneous mechanisms that did not provide benefits in our network. Since the IP address of our RP was not likely to change, and adding a few extra lines of configuration to all routers was not a difficult proposition, it was decided to use static RP mappings. Thus, each router in SprintLink was explicitly configured to know the IP address of the RP.

Router Configuration

While the protocols that make multicast work can be extremely complicated at times, the configurations needed to make these protocols work in a Cisco router are surprisingly simple. The following are the actual configuration commands taken from SprintLink routers to illustrate this point. The first configuration contains the commands required to run an anycast RP. The second shows the commands from a generic non-RP SprintLink router. Finally, we have listed the minimum configuration needed for a customer router to interoperate with our multicast service. This assumes a customer who uses Sprint's RP and exemplifies the design goal of simplicity in the customer network. Additional commands may be required depending on what additional functionality is desired. We recommend customers contact their router vendor for further information.

Configuring a SprintLink RP:

ip multicast-routing distributed
!
! Enable PIM on unique loopback interface
!
interface Loopback0
 ip address 144.228.241.9 255.255.255.255
 ip pim sparse-mode
!
! Enable PIM on Anycast RP interface
! Repeat for all other interfaces
!
interface Loopback1
 description rp.sprintlink.net
 ip address 144.232.187.198 255.255.255.255
 ip pim sparse-mode
!
! Static RP mapping
!
ip pim rp-address 144.232.187.198 override
ip pim accept-rp 144.232.187.198
!
! MSDP customer peering
!
ip msdp peer 144.228.1.2 connect-source Loopback0
ip msdp sa-filter in 144.228.1.2 list 130
ip msdp sa-filter out 144.228.1.2 list 130
!
! MSDP peerings with other Anycast RPs
!
ip msdp peer 144.228.240.249 connect-source Loopback0
ip msdp peer 144.228.240.250 connect-source Loopback0
ip msdp peer 144.228.240.251 connect-source Loopback0
ip msdp peer 144.228.240.252 connect-source Loopback0
ip msdp peer 144.228.240.253 connect-source Loopback0
ip msdp peer 144.228.61.1 connect-source Loopback0
ip msdp peer 144.228.105.1 connect-source Loopback0
ip msdp peer 213.206.128.10 connect-source Loopback0
ip msdp mesh-group imsdp 144.228.240.249
ip msdp mesh-group imsdp 144.228.240.250
ip msdp mesh-group imsdp 144.228.240.251
ip msdp mesh-group imsdp 144.228.240.252
ip msdp mesh-group imsdp 144.228.240.253
ip msdp mesh-group imsdp 144.228.61.1
ip msdp mesh-group imsdp 144.228.105.1
ip msdp mesh-group imsdp 213.206.128.10
ip msdp cache-sa-state
!
! Prevents customers from announcing SA information
! for AutoRP and administratively scoped addresses
!
access-list 130 deny ip any host 224.0.2.2
access-list 130 deny ip any host 224.0.1.3
access-list 130 deny ip any host 224.0.1.24
access-list 130 deny ip any host 224.0.1.22
access-list 130 deny ip any host 224.0.1.2
access-list 130 deny ip any host 224.0.1.35
access-list 130 deny ip any host 224.0.1.60
access-list 130 deny ip any host 224.0.1.39
access-list 130 deny ip any host 224.0.1.40
access-list 130 deny ip any 225.0.0.0 0.255.255.255
access-list 130 deny ip any 226.0.0.0 1.255.255.255
access-list 130 deny ip any 228.0.0.0 3.255.255.255
access-list 130 deny ip any 232.0.0.0 0.255.255.255
access-list 130 deny ip any 234.0.0.0 1.255.255.255
access-list 130 deny ip any 236.0.0.0 3.255.255.255
access-list 130 deny ip 10.0.0.0 0.255.255.255 any
access-list 130 deny ip 127.0.0.0 0.255.255.255 any
access-list 130 deny ip 172.16.0.0 0.15.255.255 any
access-list 130 deny ip 192.168.0.0 0.0.255.255 any
access-list 130 permit ip any any
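The group ranges in access list 130 are written with Cisco wildcard masks, which are inverted netmasks. As a reading aid, the sketch below converts a base address and wildcard mask into CIDR notation and checks which groups the range entries cover. It handles only contiguous wildcard masks such as the ones used here, and the helper name is our own.

```python
import ipaddress

def wildcard_to_network(base, wildcard):
    """Convert a base address and contiguous wildcard mask to a network."""
    inverted = int(ipaddress.ip_address(wildcard)) ^ 0xFFFFFFFF
    netmask = ipaddress.ip_address(inverted)
    return ipaddress.ip_network(f"{base}/{netmask}")

# The six group-range entries from access list 130.
DENIED_GROUP_RANGES = [
    wildcard_to_network("225.0.0.0", "0.255.255.255"),
    wildcard_to_network("226.0.0.0", "1.255.255.255"),
    wildcard_to_network("228.0.0.0", "3.255.255.255"),
    wildcard_to_network("232.0.0.0", "0.255.255.255"),
    wildcard_to_network("234.0.0.0", "1.255.255.255"),
    wildcard_to_network("236.0.0.0", "3.255.255.255"),
]

def group_denied(group):
    """True if the group falls inside one of the denied ranges."""
    addr = ipaddress.ip_address(group)
    return any(addr in net for net in DENIED_GROUP_RANGES)
```

Read this way, the range entries cover 225/8 through 232/8 and 234/8 through 239/8. Among the ranges, only 233/8 (GLOP) and those 224/8 groups not denied individually pass the final permit statement.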

Configuring a SprintLink non-RP router:

ip multicast-routing distributed
!
! Enable PIM on all interfaces
! Configure boundaries to prevent customers from sending
! to restricted groups (AutoRP, etc)
!
interface POS0/0
 description to SprintLink Core Router
 ip pim sparse-mode
!
interface POS1/0
 description to Customer
 ip pim sparse-mode
 ip multicast boundary 4
!
ip pim rp-address 144.232.187.198 override
ip pim accept-rp 144.232.187.198
!
access-list 4 deny 224.0.1.35
access-list 4 deny 224.0.1.39
access-list 4 deny 224.0.1.40
access-list 4 deny 224.0.1.60
access-list 4 deny 224.0.2.2
access-list 4 deny 224.0.1.3
access-list 4 deny 224.0.1.2
access-list 4 deny 224.0.1.22
access-list 4 deny 224.0.1.24
access-list 4 deny 10.0.0.0 0.255.255.255
access-list 4 deny 127.0.0.0 0.255.255.255
access-list 4 deny 172.16.0.0 0.15.255.255
access-list 4 deny 192.168.0.0 0.0.255.255
access-list 4 deny 225.0.0.0 0.255.255.255
access-list 4 deny 226.0.0.0 1.255.255.255
access-list 4 deny 228.0.0.0 3.255.255.255
access-list 4 deny 234.0.0.0 1.255.255.255
access-list 4 deny 236.0.0.0 3.255.255.255
access-list 4 permit any

Configuring a Customer Router:

ip multicast-routing
!
interface Serial1/0
 ip pim sparse-mode
!
interface Ethernet0/1
 ip pim sparse-mode
!
ip pim rp-address 144.232.187.198
ip pim accept-rp 144.232.187.198

Multicast Addressing

A significant challenge in today's deployment of multicast is Class D addressing. Multicast groups use IP addresses in the Class D range (224.0.0.0 - 239.255.255.255), which is controlled by IANA. Some applications, like SDR, dynamically select Class D addresses from a predefined pool in a DHCP-like manner. However, most commercially viable applications require static Class D addresses. Currently, the 233.0.0.0/8 range is reserved for static assignment to anyone with their own autonomous system number. The 16-bit AS number is written in binary and placed in the two middle octets of 233.x.y.0, so anyone who owns an AS number also owns a block of 256 Class D addresses (233.x.y.0/24). For more information, see RFC 3180: GLOP Addressing in 233/8.
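The GLOP mapping is simple enough to compute by hand, but a short sketch makes the arithmetic explicit. The function name is our own, and the AS number used in the example is the one given in RFC 3180, not a value taken from this document.

```python
import ipaddress

def glop_block(asn):
    """Return the 233/8 GLOP block derived from a 16-bit AS number."""
    if not 0 < asn <= 0xFFFF:
        raise ValueError("GLOP is defined only for 16-bit AS numbers")
    # The high byte of the AS number becomes the second octet,
    # the low byte becomes the third octet.
    high, low = divmod(asn, 256)
    return ipaddress.ip_network(f"233.{high}.{low}.0/24")

block = glop_block(5662)   # AS 5662 = 22 * 256 + 30
```

For AS 5662 this yields 233.22.30.0/24, a block of 256 group addresses.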

The 239.0.0.0/8 range has also been reserved for use in administratively scoped domains. These addresses can be thought of like unicast private address space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). However, the 239/8 space is not routable in SprintLink.

Customers who require static Class D addresses can either obtain an AS number and use the derived 233/8 space, or contact IANA for static Class D address allocations. In certain circumstances, Sprint may provide addresses from our own GLOP-derived address block for customers who are unable to obtain addresses through other means.

The Future of Multicast: SSM

Most multicast experts agree that the creation of MSDP was a stop-gap measure, designed to make multicast work until a better solution could be found. That better solution was to change the model used for multicast. The current model is now called Any-Source Multicast (ASM), and describes a multicast group where there could be multiple sources. For example, a video teleconference could be created with three parties, each of which is generating audio and video data.

Certain types of traffic, however, only require a single source. For example, television broadcasts reach a large audience, but have a single source. For situations like this, Source-Specific Multicast (SSM) was created. SSM builds on PIM-SM, but also requires an update to IGMP. IGMP version 3 includes a larger header, in which the source address can be specified in addition to the group address. This means that a router no longer needs to communicate with an RP in order to locate the source. It also means that MSDP is no longer needed, since its only purpose is to pass source information among RPs. SSM also provides greater control, since there is a single source. With ASM, any receiver for a given group may decide to also become a source, which could interfere with a data stream. Finally, SSM solves a problem with multicast addressing. The 232.0.0.0/8 range has been reserved for use with SSM. Since the source address is included when a user joins a group, and source addresses are unique, a given source can use any group address it chooses from that range. Thus, customers do not need to worry about obtaining static multicast addresses, or an autonomous system number to use for GLOP addressing.
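The addressing benefit follows from the fact that an SSM subscription is identified by the pair of source and group rather than by the group alone. The toy model below illustrates this. The class and its methods are hypothetical and do not represent any real IGMPv3 API, and the addresses are examples.

```python
import ipaddress

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")

class SsmSubscriptions:
    """Toy receiver state keyed by (source, group) pairs."""

    def __init__(self):
        self._channels = set()

    def join(self, source, group):
        """Subscribe to traffic from one specific source to an SSM group."""
        if ipaddress.ip_address(group) not in SSM_RANGE:
            raise ValueError("group is outside the 232/8 SSM range")
        self._channels.add((source, group))

    def accepts(self, source, group):
        """True only if this exact source and group pair was joined."""
        return (source, group) in self._channels

subs = SsmSubscriptions()
subs.join("192.0.2.10", "232.1.1.1")
```

After this join, traffic from 192.0.2.10 to 232.1.1.1 is wanted, while traffic from any other source to the same group address is not. Two content providers can therefore pick the same 232/8 group without coordinating with each other.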

Sprint's network is prepared for the use of SSM. However, to use SSM applications, your operating system must support IGMPv3. Currently, Microsoft Windows XP is the only commercial operating system that includes support for IGMPv3 by default. Patches are available as well for Linux and BSD operating systems.

Customer Issues

Sprint's multicast architecture was designed to support the enabling of native multicast for all SprintLink customers in a simple and scalable way. Customers who want to use their own RP will run MSDP with the nearest Sprint RP. Sprint's anycast RP infrastructure is designed to support all other customers who do not want to manage their own RP. They will simply turn on PIM-SM and configure their routers to use Sprint's RPs. BGP customers can at any time switch to an MBGP peering.