Being multihomed means you have two (or more) routes to any destination connected to the Internet, so you need a way to decide which route is better. When left to its own devices, a BGP router will send traffic over the route with the shortest AS path. Depending on the connectivity of your upstream ISPs and your traffic patterns, the resulting distribution of traffic will match the available bandwidth of the respective connections to varying degrees. Even though bandwidth is getting cheaper all the time, it’s usually advantageous to balance the traffic so that it takes advantage of all the available bandwidth in a multihomed setup. Thus, if BGP decides that most of the outgoing traffic should go through the smallest pipe, you will have to tell it that this isn’t what you want by tweaking one or more BGP attributes. Ideally, more traffic will then flow over the underused connection. At the same time, you’ll want the traffic to take the best route to a destination, if possible, whatever “best” may be. This type of activity is called traffic engineering.
Engineering outgoing traffic is the easy part, because you have control over what your own routers do. It’s harder to get incoming traffic balanced properly over the available connections. At the end of the chapter, there is a discussion of queuing, traffic shaping, and traffic policing techniques that can be used to maximize network performance under low-bandwidth conditions. The examples in this chapter all assume a network with Autonomous System number 60055 multihomed to two ISPs: ISP A (AS 40077) and ISP B (AS 50066). The way ISP A and ISP B interconnect with other ASes differs from example to example, however. See Figure 6-1.
Figure 6-1. Network used for examples in this chapter
Unlike in Chapter 5, the connections to both ISP A and ISP B terminate at the same router, so the settings for both ISPs can be shown side by side.
TIP: The examples in this chapter show only the commands necessary to perform the function being discussed. You also need to configure filters and other features discussed in Chapter 5 to arrive at a working configuration.
Knowing Which Route Is Best
Traditionally, there have been three figures that describe the quality of a connection: bandwidth, delay, and packet loss. A connection with high bandwidth, low delay, and low packet loss is obviously better than one with low bandwidth, high delay, and high packet loss. But which is better: a 45-Mbps satellite connection with a 300-millisecond delay,[1] or a 1544-Kbps terrestrial connection with a 3-ms delay? There is, of course, no easy answer: it depends.
A protocol such as Telnet uses very little bandwidth, but when the user presses a key, he has to wait for the packet containing the input character to travel over the network and for the packet containing the response to travel back again before the character shows up on his screen. So any delay of more than a few dozen milliseconds is immediately noticeable. With FTP, the time individual packets take to traverse the line is of little importance. What counts is the total time the file transfer takes, which depends mostly on the bandwidth of the connection.
Both low-bandwidth, delay-sensitive applications such as Telnet and bandwidth-hungry, delay-tolerant ones such as FTP suffer from packet loss, because both typically use TCP. TCP uses complex algorithms to optimize performance (especially data throughput) across a wide range of combinations of delay, bandwidth, and packet loss. Because TCP assumes that packet loss indicates congestion in the network, it slows down when packets are lost. (The TCP congestion management algorithms are discussed at the end of this chapter.)
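A back-of-the-envelope model makes the satellite-versus-T1 question concrete. This is a minimal sketch under loudly stated assumptions: a transfer costs one round trip plus the time to clock the data onto the link, and TCP slow start, window limits, and packet loss are ignored (real TCP transfers over a 300-ms path can be much slower than this). The 10-MB file size and the function names are hypothetical.

```python
# Simplified model (assumption): one round trip to get started plus the time
# to serialize the data onto the link. Ignores TCP slow start, window limits,
# and packet loss, so real transfers take longer, especially on long paths.

def transfer_time_s(size_bytes: int, bandwidth_bps: float, rtt_ms: float) -> float:
    """Rough time in seconds to move size_bytes over a link."""
    return rtt_ms / 1000 + size_bytes * 8 / bandwidth_bps


def echo_time_ms(rtt_ms: float) -> float:
    """A Telnet keystroke echo needs a full round trip; the handful of bytes
    involved make serialization time negligible, so the RTT dominates."""
    return rtt_ms


SATELLITE = {"bandwidth_bps": 45_000_000, "rtt_ms": 300}
TERRESTRIAL_T1 = {"bandwidth_bps": 1_544_000, "rtt_ms": 3}

if __name__ == "__main__":
    file_size = 10_000_000  # hypothetical 10-MB FTP transfer
    for name, link in (("satellite", SATELLITE), ("terrestrial T1", TERRESTRIAL_T1)):
        ftp = transfer_time_s(file_size, link["bandwidth_bps"], link["rtt_ms"])
        telnet = echo_time_ms(link["rtt_ms"])
        print(f"{name}: FTP {ftp:.1f} s, Telnet echo {telnet:.0f} ms")
```

Under these assumptions the satellite link moves the file in about 2 seconds versus nearly a minute over the T1, while every Telnet keystroke takes 300 ms to echo instead of 3 ms: each link wins for a different application.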
Finding High- and Low-Quality Routes
There are no hard and fast rules about which qualities make one route better than another: application requirements differ, and the interaction between bandwidth, delay, and packet loss complicates matters further. But that doesn’t mean there is nothing you can do. It may be hard to determine which of two good routes is best, but it isn’t hard to determine that a route is bad. Some experimentation with the traceroute program can surface useful information on which to base your route selection policy.
traceroute
The traceroute program is available on almost every system that runs TCP/IP. Under Windows it’s called tracert; on most other systems, it’s simply traceroute. traceroute manipulates the TTL (Time To Live) field in the header of the IP packets it transmits to the “traced” host. Every router is required to decrement the TTL on packets it forwards and to discard packets when the TTL reaches 0. This way, if there is a routing loop, packets won’t circle the network forever. In the first three probe packets, the TTL is set to 1. When the first router receives such a packet, it decrements the TTL to 0, so the router throws away the packet and sends an ICMP “TTL exceeded in transit” error message back to the originating system. traceroute then prints the name and address of the router that sent the ICMP packet, along with the round-trip time for each probe. When no ICMP message is received for a probe, the program prints an asterisk (*) instead.
After three probes with a TTL of 1, traceroute sends another three probes with a TTL of 2. These packets are forwarded by the first router, but the TTL is decremented to 0 in the second router, so the next line on the screen shows information about the second hop. The program continues to send probes with increasing TTLs until it receives replies from the destination host itself (or reaches its maximum hop count), at which point it stops.
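The TTL mechanism described above can be sketched as a toy simulation. No packets are sent: the path is a hypothetical list of router names ending with the destination host, each router decrements the TTL once, and only the identity of the device that answers each probe is modeled, not timing or loss.

```python
from typing import List, Tuple


def send_probe(path: List[str], ttl: int) -> Tuple[str, bool]:
    """Return (responder, reached_destination) for a single probe.

    path lists the routers in order, ending with the destination host.
    Each router decrements the TTL; the one that brings it to 0 discards
    the probe and answers with ICMP "TTL exceeded in transit".
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return router, False
    # The probe survived every router, so the destination itself answers.
    return path[-1], True


def traceroute(path: List[str], max_hops: int = 30) -> List[Tuple[int, str]]:
    """Return (ttl, responder) pairs, one per line of traceroute output.

    Real traceroute sends three timed probes per TTL; one suffices here.
    """
    lines = []
    for ttl in range(1, max_hops + 1):
        responder, done = send_probe(path, ttl)
        lines.append((ttl, responder))
        if done:
            break
    return lines


if __name__ == "__main__":
    hypothetical_path = ["gw.example", "core1.example", "edge5.example", "server.example"]
    for ttl, responder in traceroute(hypothetical_path):
        print(ttl, responder)
```

Because a probe with TTL n expires at the nth router, increasing the TTL one step at a time reveals the path one hop per output line.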
The asterisks that indicate lost packets, the router names (which often contain a city or airport code), and the timing information can all be informative. But routers typically generate the ICMP messages sent back to traceroute in a way that has little to do with how they actually forward packets, so traceroute results offer only an indication of network performance, not definitive information.
Doing some traceroutes to destinations that are reachable over a certain path will often reveal some or all of the following information, which you may want to use to select the preferred route to certain destinations:
- Congestion: Congestion lowers the available bandwidth and increases delay and packet loss, so it’s important to avoid routes over congested paths. Congestion shows up in a traceroute in two ways: there is usually packet loss, and the round-trip times are often inconsistent (high most of the time, but sometimes low).
- Distance: Two routes to the same destination can follow very different geographic paths. Some networks interconnect at relatively few locations, so they may have to carry traffic over long distances to get it to its destination. Others are better interconnected, so the traffic doesn’t have to take a detour. There may be reasons not to prefer the more direct route, such as lower bandwidth or congestion, but generally a shorter geographic path is better. Every 100 kilometers (60 miles) of fiber adds about a millisecond to the round-trip time, because light travels at approximately 207,000 km (129,000 miles) per second in fiber. A trans-Atlantic or transcontinental detour can easily add more than 100 ms of extra delay.
The distance of a path isn’t directly visible in traceroute (apart from the higher delay), but many networks are helpful enough to give their routers descriptive names, so it’s possible to deduce the geographic path to some extent. This information may also be available in BGP itself, in the form of communities that indicate where a route was learned.
- Hops: In general, the number of hops that shows up in a traceroute isn’t too important. But each hop potentially adds delay, because packets have to wait in a queue before they are transmitted, and the extra equipment in a path makes a failure somewhere along the way more likely. So, all else being equal, paths with fewer hops are slightly better. On the other hand, a path that shows few hops probably uses some kind of layer 2 switching, such as frame relay or ATM, which hides the intermediate switches from traceroute and adds another layer of complexity to the network.
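The fiber rule of thumb from the Distance item can be turned into a quick calculation. This sketch uses the 207,000 km/s figure given above; the 6,000-km trans-Atlantic leg in the usage lines is a rough, assumed distance, not a measured cable length.

```python
FIBER_KM_PER_S = 207_000  # approximate speed of light in fiber, from the text


def added_rtt_ms(extra_km_one_way: float) -> float:
    """Extra round-trip time (ms) for a path that is extra_km_one_way longer.

    The extra distance is traversed twice per round trip: out and back.
    """
    return 2 * extra_km_one_way / FIBER_KM_PER_S * 1000


if __name__ == "__main__":
    print(f"100 km of extra fiber: {added_rtt_ms(100):.2f} ms")
    # Hypothetical detour: traffic between two U.S. East Coast sites goes
    # through Europe, crossing the Atlantic twice (~6,000 km each crossing).
    print(f"trans-Atlantic detour: {added_rtt_ms(2 * 6_000):.0f} ms")
```

Under that assumption the detour adds roughly 116 ms, consistent with the “more than 100 ms” estimate in the Distance item.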
It may be necessary to temporarily reroute outgoing traffic to observe the properties of alternative paths. See the section “Setting the Local Preference” for more details on how to do this.
Example 6-1 is the output of a traceroute to a somewhat congested destination. (Parts of the domain names and IP addresses have been removed for brevity.)
Example 6-1: traceroute showing some congestion
traceroute to g.root-servers.net (192.112.36.4), 30 hops max, 40 byte packets
1 208.100 (208.100) 0.602 ms 0.511 ms 0.498 ms
2 63.1 (63.1) 0.306 ms 0.272 ms 0.415 ms
3 pos3-2.gw2.dca8 (157.58) 0.982 ms 0.957 ms 0.967 ms
4 0.so-3-0.XL2.DCA8 (46.94) 1.116 ms 1.104 ms 1.095 ms
5 0.so-7-0.XL2.DCA6 (46.25) 2.456 ms 2.672 ms 2.444 ms
6 POS7-0.BR4.DCA6 (52.233) 2.408 ms 2.356 ms 2.369 ms
7 204.98 (204.98) 3.296 ms 3.395 ms 3.314 ms
8 wdc-core-01.inet (205.37) 3.344 ms 3.320 ms 3.293 ms
9 wdc-edge-05.inet (205.82) 3.465 ms 3.354 ms 3.295 ms
10 63.222 (63.222) 5.929 ms 226.962 ms 11.260 ms
11 198.50 (198.50) 52.694 ms 79.870 ms 33.990 ms
12 G.ROOT-SERVERS.NET (192.4) 23.892 ms * 19.495 ms
Beginning at hop 10, there is a sudden increase in delay, and the delay becomes inconsistent from one probe packet to the next. If this were the result of distance, the increase would be consistent across all probes; light always travels at the same speed. Also, both the source and destination of the trace are in Virginia in this example. A lot of interconnection takes place there, so a detour is unlikely. The delay stabilizes at the last hop, so the erratic figures for the two hops just before it are probably due to high CPU load on those routers, which made their ICMP processing take relatively long. But the delay of roughly 20 ms on the last line, with a difference of about 4 ms (20%) between its two answers, indicates either a rather slow connection or high queuing delays. There is no packet loss, however, other than the asterisk in the middle of the last line. A missing answer to the second probe on a line usually indicates that the responding host or router limits the number of replies to probe packets per unit of time. So the level of congestion seen here isn’t high.
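Part of this reasoning can be automated. The sketch below parses the Example 6-1 output and reports, per hop, how many probes went unanswered and how far apart the fastest and slowest replies were. The 2x spread threshold is an arbitrary assumption, and, as the analysis above shows, a flagged hop may just be a busy router that is slow to generate ICMP replies or one that rate-limits them. Treat the output as a starting point for interpretation, not a verdict.

```python
import re
from typing import List, Optional, Tuple

EXAMPLE_6_1 = """\
traceroute to g.root-servers.net (192.112.36.4), 30 hops max, 40 byte packets
1 208.100 (208.100) 0.602 ms 0.511 ms 0.498 ms
2 63.1 (63.1) 0.306 ms 0.272 ms 0.415 ms
3 pos3-2.gw2.dca8 (157.58) 0.982 ms 0.957 ms 0.967 ms
4 0.so-3-0.XL2.DCA8 (46.94) 1.116 ms 1.104 ms 1.095 ms
5 0.so-7-0.XL2.DCA6 (46.25) 2.456 ms 2.672 ms 2.444 ms
6 POS7-0.BR4.DCA6 (52.233) 2.408 ms 2.356 ms 2.369 ms
7 204.98 (204.98) 3.296 ms 3.395 ms 3.314 ms
8 wdc-core-01.inet (205.37) 3.344 ms 3.320 ms 3.293 ms
9 wdc-edge-05.inet (205.82) 3.465 ms 3.354 ms 3.295 ms
10 63.222 (63.222) 5.929 ms 226.962 ms 11.260 ms
11 198.50 (198.50) 52.694 ms 79.870 ms 33.990 ms
12 G.ROOT-SERVERS.NET (192.4) 23.892 ms * 19.495 ms
"""

RTT_FIELD = re.compile(r"(\d+(?:\.\d+)?) ms\b")  # a number followed by " ms"

HopStats = Tuple[int, int, Optional[float], Optional[float], bool]


def summarize(trace: str, spread_ratio: float = 2.0) -> List[HopStats]:
    """Return (hop, lost, fastest_ms, slowest_ms, inconsistent) per hop."""
    rows: List[HopStats] = []
    for line in trace.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the "traceroute to ..." header line
        rtts = [float(ms) for ms in RTT_FIELD.findall(line)]
        lost = fields.count("*")  # each asterisk is an unanswered probe
        fastest = min(rtts) if rtts else None
        slowest = max(rtts) if rtts else None
        inconsistent = bool(rtts) and slowest > spread_ratio * fastest
        rows.append((int(fields[0]), lost, fastest, slowest, inconsistent))
    return rows


if __name__ == "__main__":
    for hop, lost, fastest, slowest, inconsistent in summarize(EXAMPLE_6_1):
        if lost or inconsistent:
            print(f"hop {hop}: {lost} lost, {fastest}-{slowest} ms")
```

On this trace the sketch flags hops 10 and 11 for erratic timing and hop 12 for its single missing reply, matching the observations in the analysis above.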
Is the Highest-Bandwidth Route Best?
As you may have noticed, bandwidth isn’t on my list of route-selection criteria. Obviously, bandwidth can be a decisive factor. For instance, if you connect to the vBNS or another high-bandwidth network, you’ll want to take advantage of this connection, so you should probably prefer all routes over the high-bandwidth link. But in most cases, you will connect to ISPs that have many times the bandwidth you require, even if one ISP has more than the other. And if one of your connections is faster than another, this doesn’t mean you’ll want to use the faster connection for all traffic. For instance, balancing traffic so that three quarters of it flows over a 4.5-Mbps fractional T3 connection and a quarter flows over a T1 results in an effective bandwidth of 6 Mbps. Using the fractional T3 connection for all traffic means no individual stream or session is limited to the maximum bandwidth of the slower T1 line, but it limits the total available bandwidth to that of the 4.5-Mbps line.
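The 6-Mbps figure follows from splitting traffic in proportion to link capacity. The sketch below rounds the T1 to 1.5 Mbps as the text does; the link names and the 50/50 split are hypothetical. It computes how much total traffic a fixed split can carry before the first link fills up.

```python
from typing import Dict

# Link capacities in Mbps; the T1 (1.544 Mbps) is rounded to 1.5 as in the text.
LINKS: Dict[str, float] = {"fractional T3": 4.5, "T1": 1.5}


def capacity_shares(links: Dict[str, float]) -> Dict[str, float]:
    """Split traffic in proportion to each link's capacity."""
    total = sum(links.values())
    return {name: mbps / total for name, mbps in links.items()}


def usable_bandwidth(links: Dict[str, float], shares: Dict[str, float]) -> float:
    """Total traffic (Mbps) a fixed split can carry before any link is full.

    A link carrying `share` of the total saturates once the total reaches
    capacity / share, so the first link to fill up sets the limit.
    """
    return min(links[name] / share for name, share in shares.items() if share > 0)


if __name__ == "__main__":
    splits = {
        "proportional": capacity_shares(LINKS),
        "all on T3": {"fractional T3": 1.0, "T1": 0.0},
        "50/50": {"fractional T3": 0.5, "T1": 0.5},
    }
    for label, shares in splits.items():
        print(f"{label}: {usable_bandwidth(LINKS, shares):.1f} Mbps")
```

The proportional 3:1 split reaches 6 Mbps, sending everything over the fractional T3 caps out at 4.5 Mbps, and a naive 50/50 split saturates the T1 at only 3 Mbps total.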
If you know in advance the IP addresses that high-bandwidth applications will connect to, you can prefer routes to those address ranges over the higher-bandwidth connection. If these addresses are not known, or the list is too long, you’ll have to look at other factors when configuring route selection parameters on the router.
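The address-range approach amounts to raising the Local Preference only for routes that fall inside known ranges and are learned over the faster link. The Python below models just that decision; on a Cisco router it would typically be expressed with a prefix list referenced from a route map. The ranges (RFC 5737 documentation prefixes), the choice of ISP A as the faster connection, and the preferred value of 200 are all hypothetical; 100 is the Cisco IOS default Local Preference.

```python
import ipaddress

# Hypothetical address ranges used by bandwidth-hungry applications
# (RFC 5737 documentation prefixes, not real servers).
HIGH_BANDWIDTH_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
FAST_NEIGHBOR = "ISP A"     # assumption: ISP A is the higher-bandwidth connection
DEFAULT_LOCAL_PREF = 100    # Cisco IOS default Local Preference
PREFERRED_LOCAL_PREF = 200  # hypothetical value; the higher Local Preference wins


def local_pref(prefix: str, learned_from: str) -> int:
    """Local Preference to assign to a BGP route under this policy."""
    net = ipaddress.ip_network(prefix)
    wanted = any(
        net.version == r.version and net.subnet_of(r) for r in HIGH_BANDWIDTH_RANGES
    )
    if wanted and learned_from == FAST_NEIGHBOR:
        return PREFERRED_LOCAL_PREF
    return DEFAULT_LOCAL_PREF


if __name__ == "__main__":
    for prefix, neighbor in [("192.0.2.0/24", "ISP A"), ("192.0.2.0/24", "ISP B"),
                             ("203.0.113.0/24", "ISP A")]:
        print(prefix, neighbor, local_pref(prefix, neighbor))
```

Only the route to a listed range learned from the faster ISP gets the higher value, so BGP sends that traffic over the faster link while all other destinations keep their normal selection.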
Route Maps
Cisco IOS provides the network administrator with route maps to modify the Local Preference, AS path, and MED of a route before it is included in the BGP table and the subsequent best-route selection process, or before it is propagated to a neighbor. A route map is much like an if-then construction in a programming language. First, a match line is applied. If the route matches, the set lines that follow are applied. The route is then passed to the BGP table or to the neighbor, depending on whether the route map is applied to incoming or outgoing route updates. Route map entries are grouped together under a tag or name, and each entry has a sequence number. The entry with the lowest sequence number is evaluated first. When all entries that share the same tag have been evaluated without a match, or when the route matches an entry with an explicit deny, the route is rejected and is neither entered into the BGP table nor announced to the neighbor. Here are the match criteria most relevant to BGP processing:
- as-path: Using an AS path access list
- community: Using a community list
- ip address: Using an access list or prefix list
- ip next-hop: Using an access list or prefix list
- metric: Using the Multi Exit Discriminator
Actions that can be taken using the set part of a route map include:
- as-path: Prepending extra AS numbers to the AS path
- comm-list: Deleting communities
- community: Adding or replacing communities
- dampening: Setting the flap-dampening parameters
- ip next-hop: Setting the next hop address
- local-preference: Setting the Local Preference
- metric: Setting or changing the MED metric
- weight: Setting the weight value
More information on route maps is available in the Cisco documentation on the Web at http://www.cisco.com.
Iljitsch van Beijnum has been working with BGP in ISP and end-user networks since 1996. This article originally appeared at O’Reilly’s http://www.onlamp.com.
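The route map evaluation order described in the Route Maps section can be modeled in a few lines. This is a hypothetical Python model, not IOS syntax or a Cisco API: each clause has a sequence number, a permit or deny action, one match test, and an optional set step. Clauses are tried from the lowest sequence number up, the first match decides the outcome, and a route that matches nothing is rejected. Real route maps allow several match and set lines per entry, which this sketch leaves out.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    local_pref: int = 100  # Cisco IOS default Local Preference


@dataclass
class Clause:
    """One route map entry: sequence number, permit/deny, match, and set."""
    seq: int
    permit: bool
    match: Callable[[Route], bool]
    set_: Optional[Callable[[Route], Route]] = None


def apply_route_map(clauses: List[Clause], route: Route) -> Optional[Route]:
    """Return the route as modified by the map, or None if it is rejected."""
    for clause in sorted(clauses, key=lambda c: c.seq):  # lowest sequence first
        if not clause.match(route):
            continue  # no match: fall through to the next entry
        if not clause.permit:
            return None  # explicit deny
        return clause.set_(route) if clause.set_ else route
    return None  # no entry matched: implicit deny


if __name__ == "__main__":
    # Hypothetical inbound policy, listed out of order on purpose:
    # prefer routes whose AS path contains ISP B (AS 50066) and
    # accept everything else unchanged.
    prefer_isp_b = [
        Clause(20, True, lambda r: True),
        Clause(10, True, lambda r: 50066 in r.as_path,
               lambda r: replace(r, local_pref=150)),
    ]
    print(apply_route_map(prefer_isp_b, Route("192.0.2.0/24", (50066, 64500))))
    print(apply_route_map(prefer_isp_b, Route("198.51.100.0/24", (40077, 64500))))
```

Without the catch-all entry 20, every route whose AS path does not contain 50066 would be dropped by the implicit deny.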
1. “Delay” usually means the time from the moment a bit enters a circuit at one end until the moment it appears at the other end, but here it is used interchangeably with round-trip time (RTT): the time it takes to receive a reply to a packet sent.
BGP: This book is a guide to all aspects of BGP, covering the protocol, its configuration and operation in an Internet environment, and how to troubleshoot it. The book also describes how to secure BGP, and how BGP can be used as a tool in combating Distributed Denial of Service (DDoS) attacks. Although the examples throughout this book are for Cisco routers, the techniques discussed can be applied to any BGP-capable router.