Date: Thu, 15 Mar 2001 17:33:18 EST
To: Sally Floyd
cc: mallman@grc.nasa.gov, kc@caida.org, end2end-interest@postel.org
From: Constantinos Dovrolis
Subject: Re: [e2e] two questions about the Internet

Sally,

I may have a (very partial) answer to the first question, i.e., what is the distribution of round-trip times (RTTs) for the packets on a certain link.

A couple of initial "disclaimers":

a) an RTT can only be associated with a packet of a closed-loop protocol, so our measurements only looked at TCP packets;
b) our measurements do not refer to per-packet RTTs, but to per-connection RTTs. There are likely important differences between these two distributions (short-RTT flows may tend to carry more data, making the per-packet RTT distribution heavier at lower RTT values).

The measurements I refer to were done in the summer of '99 at CAIDA. We were processing traffic traces captured by passive monitors on certain links (note that we get two different traces, one for each direction of the link). In a given trace, we estimated the RTT of each TCP connection using the following two rules:

a) if we observe the flow from the caller to the callee, the RTT is estimated as the time interval from the SYN to the caller's ACK of the SYN-ACK (the SYN-ACK itself travels in the other direction, so it does not appear in this trace);
b) if we observe the flow from the callee to the caller (which is usually the traffic from the server to the client), the RTT is estimated from the time spacing of the first 2 or 3 slow-start bursts.

The code to do this is tricky (I can send it to you, or to anyone else who wants to play with it). Using these tricks, we measured the distribution of RTTs of the TCP connections present in each (unidirectional) trace.

As an example of the distributions we were getting, take a look at:

http://www.cis.udel.edu/~dovrolis/rtt-sdsc.eps

The graph shows two RTT distributions, one for each direction of the OC-3 link that used to connect UCSD with CERFnet.
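The two estimation rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CAIDA code mentioned in the message: the `Packet` record, the function names, and the burst-boundary heuristic (a gap larger than `gap_factor` times the smallest inter-packet gap starts a new burst) are all hypothetical. Rule (a) measures from the caller's SYN to the caller's ACK of the SYN-ACK; rule (b) groups the callee's data packets into slow-start bursts and takes the median spacing between the starts of the first few bursts.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Packet:
    """Hypothetical record for one packet of a single TCP connection,
    as captured in a unidirectional trace."""
    ts: float           # capture timestamp, seconds
    syn: bool = False   # SYN flag
    ack: bool = False   # ACK flag
    payload: int = 0    # TCP payload length, bytes


def rtt_from_handshake(pkts: List[Packet]) -> Optional[float]:
    """Rule (a), caller-to-callee trace: time from the SYN to the
    caller's ACK of the SYN-ACK (the third handshake packet)."""
    syn_ts: Optional[float] = None
    for p in sorted(pkts, key=lambda q: q.ts):
        if p.syn and not p.ack:
            syn_ts = p.ts  # a retransmitted SYN replaces the earlier time
        elif syn_ts is not None and p.ack and not p.syn:
            return p.ts - syn_ts
    return None


def rtt_from_slow_start(pkts: List[Packet], gap_factor: float = 4.0,
                        max_bursts: int = 3) -> Optional[float]:
    """Rule (b), callee-to-caller trace: median spacing between the starts
    of the first few slow-start bursts of data packets.

    A new burst starts when the gap to the previous data packet exceeds
    gap_factor times the smallest positive gap (hypothetical heuristic)."""
    times = sorted(p.ts for p in pkts if p.payload > 0)
    gaps = [b - a for a, b in zip(times, times[1:])]
    positive = [g for g in gaps if g > 0]
    if not positive:
        return None  # fewer than two distinct data-packet timestamps
    threshold = gap_factor * min(positive)

    # Record the timestamp of the first packet of each burst.
    starts = [times[0]]
    for t, g in zip(times[1:], gaps):
        if g > threshold:
            starts.append(t)
    starts = starts[:max_bursts]
    if len(starts) < 2:
        return None
    return median([b - a for a, b in zip(starts, starts[1:])])
```

Delayed ACKs, a sender limited by the receiver window, or an application that writes slowly can blur the burst structure that rule (b) relies on, which is one reason this direction is the harder of the two to get right.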
A few major points from the graph:

- About 35% of the connections have RTT < 50 ms.
- About 60% of the connections have RTT < 100 ms.
- A significant fraction of connections (20-30%) have RTT > 200 ms (which is probably close to the upper bound for any type of interactive application).
- About 10% of the RTTs are quite large (some of them on the order of several seconds), which may indicate errors in our measurement methodology. This is why I did not include that fraction of RTTs in the graph.

Some very interesting measurements on this subject also appear in Mark Allman's "A Web server's view of the transport layer", published in CCR, October 2000. Mark's measurements come from traces of the server's own traffic (instead of a passive monitor in the network). He could also measure RTTs more accurately, based on the time between a non-retransmitted packet and the corresponding ACK. Obviously we cannot do the same, because we don't have the flow of ACKs in the trace.

It is interesting that Mark's measurements (see Figure 9) are not *very* different from the graph that I mentioned before. Specifically, his graph shows:

- About 35% of the RTTs < 100msec
- About 60-70% of the RTTs < 200msec
- About 85% of the RTTs < 500msec.

Of course Mark's measurements/analysis were much more methodically done (my measurements were only done to get some reasonable values for simulations about other stuff).

I hope that this helps. I am also very interested in answers to the rest of your questions.
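Mark's approach of timing only packets that were not retransmitted follows the same rule as Karn's algorithm: an ACK for a retransmitted segment could belong to either transmission, so it yields no sample. Below is a minimal Python sketch, assuming a hypothetical server-side trace already reduced to (timestamp, end sequence number) tuples for data segments and (timestamp, cumulative ACK number) tuples for ACKs; the function names are illustrative, and sequence-number wraparound and SACK are ignored. A small helper then turns the samples into "fraction of RTTs below X" figures like the ones quoted above.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def karn_rtt_samples(sent: Sequence[Tuple[float, int]],
                     acks: Sequence[Tuple[float, int]]) -> List[float]:
    """Return one RTT sample per data segment that was transmitted once.

    sent: (timestamp, end_seq) per transmission, where end_seq is the
          sequence number just past the segment (hypothetical format).
    acks: (timestamp, ack_no) per cumulative ACK, in time order.
    """
    tx_count: Dict[int, int] = {}
    first_tx: Dict[int, float] = {}
    for ts, end_seq in sent:
        tx_count[end_seq] = tx_count.get(end_seq, 0) + 1
        first_tx.setdefault(end_seq, ts)

    samples: List[float] = []
    for end_seq, ts in first_tx.items():
        if tx_count[end_seq] > 1:
            # Retransmitted: the ACK is ambiguous, so skip it (Karn's rule).
            continue
        # First ACK seen at or after the send time that covers the segment.
        for ack_ts, ack_no in acks:
            if ack_ts >= ts and ack_no >= end_seq:
                samples.append(ack_ts - ts)
                break
    return samples


def fraction_below(samples: Sequence[float], threshold: float) -> float:
    """Empirical CDF: fraction of samples strictly below threshold."""
    if not samples:
        return 0.0
    ordered = sorted(samples)
    return bisect_left(ordered, threshold) / len(ordered)
```

For example, `fraction_below(karn_rtt_samples(sent, acks), 0.1)` gives the share of samples under 100 ms, the same kind of number as "about 35% of the RTTs < 100msec".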
Constantinos

Computer and Information Sciences - University of Delaware
http://www.cis.udel.edu/~dovrolis/

Date: Thu, 15 Mar 2001 18:51:11 EST
To: dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis)
cc: floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org, end2end-interest@postel.org
From: Erich Nahum
Subject: Re: [e2e] two questions about the Internet

Constantinos Dovrolis writes:
>
> It is interesting that Mark's measurements (see Figure 9) are not *very*
> different from the graph that I mentioned before. Specifically,
> his graph shows:
> - About 35% of the RTTs < 100msec
> - About 60-70% of the RTTs < 200msec
> - About 85% of the RTTs < 500msec.
> Of course Mark's measurements/analysis were much more methodically
> done (my measurements were only done to get some reasonable values
> for simulations about other stuff).

Srini Seshan (when he was here at Watson) had some packet trace data from the 1996 Olympic Web server, but it's a bit old now. The technique was similar to what Mark Allman did. For the record, though, it had:

- 25% of the RTTs < 115 ms
- 50% of the RTTs < 338 ms
- 75% of the RTTs < 778 ms

The RTTs are obviously going to vary depending on what kind of connection you have (T3, OC-768) as well as where your clients are (NY, CA, Greece).

-Erich
--
Erich M. Nahum                    IBM T.J. Watson Research Center
Networking Research               P.O. Box 704
nahum@watson.ibm.com              Yorktown Heights NY 10598

Date: Fri, 16 Mar 2001 10:23:51 +1000
To: nahum@watson.ibm.com (Erich M. Nahum)
cc: dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis), floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org, end2end-interest@postel.org
From: George Michaelson
Subject: Re: [e2e] two questions about the Internet

> Srini Seshan (when he was here at Watson) had some packet trace data
> from the 1996 Olympic Web server, but it's a bit old now. The technique
> was similar to what Mark Allman did. For the record, though, it had:
> - 25% of the RTTs < 115 ms
> - 50% of the RTTs < 338 ms
> - 75% of the RTTs < 778 ms
> The RTTs are obviously going to vary depending on what kind of
> connection you have (T3, OC-768) as well as where your clients are
> (NY, CA, Greece).
> -Erich

The 96 Olympics were hosted behind multiple backends, geographically distributed? I thought Nagano was; I went to a seminar by IBM on it.

Because if so, there were presumably frontend boxes making decisions on backend server, which would either intuit best-fit path or else map it into some simple model like BGP AS or link-based region and so skew RTT in favour of shorter-hop and/or lighter-load hosts.

-George
--
George Michaelson       | DSTC Pty Ltd
Email: ggm@dstc.edu.au  | University of Qld 4072
Phone: +61 7 3365 4310  | Australia
Fax: +61 7 3365 4311    | http://www.dstc.edu.au

Date: Thu, 15 Mar 2001 21:04:46 EST
To: ggm@dstc.edu.au (George Michaelson)
cc: nahum@watson.ibm.com (Erich M. Nahum), dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis), floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org, end2end-interest@postel.org
From: Erich Nahum
Subject: Re: [e2e] two questions about the Internet

George Michaelson writes:
>
> The 96 Olympics were hosted behind multiple backends, geographically
> distributed? I thought Nagano was; I went to a seminar by IBM on it.
>
> Because if so, there were presumably frontend boxes making decisions
> on backend server, which would either intuit best-fit path or else
> map it into some simple model like BGP AS or link-based region and
> so skew RTT in favour of shorter-hop and/or lighter-load hosts.

96 (Atlanta) was the first Olympics that IBM hosted, and I believe it was just one complex in Southbury, CT. 98 (Nagano) and 2000 (Australia) were hosted by 4 main sites: Bethesda (for Europe), Schaumburg IL and someplace in Ohio (for the Americas) and Tokyo (for Asia). The request routing was done on a very coarse-grained level, basically through the routing tables.
E.g., if you were in Europe, olympics.com pointed to Bethesda. I think it was done at the routing layer and not through DNS. The front ends of each cluster were IBM network dispatcher TCP sprayers, which routed to back-end nodes on the same LAN. So I believe the RTT distribution seen by a complex would be the same across nodes within that cluster.

-Erich
--
Erich M. Nahum                    IBM T.J. Watson Research Center
Networking Research               P.O. Box 704
nahum@watson.ibm.com              Yorktown Heights NY 10598

Date: Thu, 15 Mar 2001 22:02:17 EST
To: nahum@watson.ibm.com (Erich M. Nahum)
cc: end2end-interest@postel.org
From: Hari Balakrishnan
Subject: Re: [e2e] two questions about the Internet

Erich,

> George Michaelson writes:
> >
> > The 96 Olympics were hosted behind multiple backends, geographically
> > distributed? I thought Nagano was; I went to a seminar by IBM on it.
> >
> > Because if so, there were presumably frontend boxes making decisions
> > on backend server, which would either intuit best-fit path or else
> > map it into some simple model like BGP AS or link-based region and
> > so skew RTT in favour of shorter-hop and/or lighter-load hosts.
>
> 96 (Atlanta) was the first Olympics that IBM hosted, and I believe it was
> just one complex in Southbury, CT.

For the 1996 Atlanta games, IBM actually ran multiple servers for the Olympics, but they weren't transparent (i.e., they had distinct DNS names). The data being referred to here was collected at Southbury, CT. The other sites were, if I recall correctly, at Keio (Japan), Cornell (NY), Karlsruhe (Germany), and Hursley (UK). The Southbury site was connected via T3 links to 4 US NAPs: Chicago (Bellcore & Ameritech), SF Bay Area (Bellcore and PacBell), NY (Sprint), and DC (MFS Datanet).

> 98 (Nagano) and 2000 (Australia)
> were hosted by 4 main sites: Bethesda (for Europe), Schaumburg IL and
> someplace in Ohio (for the Americas) and Tokyo (for Asia).
> The request routing was done on a very coarse-grained level, basically
> through the routing tables. E.g., if you were in Europe,
> olympics.com pointed to Bethesda. I think it was done at
> the routing layer and not through DNS.
> The front ends of each cluster were IBM network dispatcher TCP
> sprayers, which routed to back-end nodes on the same LAN. So
> I believe the RTT distribution seen by a complex would be the same
> across nodes within that cluster.

Sounds about right, if you believe the load-balancing was working correctly. :} (I'm not saying it wasn't!)

Hari

>
> -Erich
>
> --
> Erich M. Nahum                    IBM T.J. Watson Research Center
> Networking Research               P.O. Box 704
> nahum@watson.ibm.com              Yorktown Heights NY 10598

Date: Fri, 16 Mar 2001 09:40:48 PST
To: Sally Floyd
cc: end2end-interest@postel.org, Ping Pan
From: Ping Pan
Subject: Re: [e2e] two questions about the Internet

Sally Floyd wrote:
>
> The new questions:
>
> ROUND-TRIP TIMES (HOPS, NUMBER OF ASes) OF PACKETS?
> For packets on a particular link, each packet could be assigned an
> estimated round-trip time, a number of ASes for the end-to-end
> path, etc, based on the IP source and destination addresses for
> that packet. For packets on a particular link, what can we say
> about the distribution of round-trip times, or of the number of hops
> traversed, or the number of ASes traversed, or number of continents
> traversed, or (this is harder) the number of congested links traversed?
>

Hi,

Hop counts: http://www.nlanr.net/NA/Learn/wingspan.html
AS path length: http://moat.nlanr.net/ASPL/ (from University of Oregon)

BTW, there are several good pages on Internet questions:

1. Henning Schulzrinne: http://www.cs.columbia.edu/~hgs/internet/traffic.html
2. Merit: http://www.merit.edu/ipma/reports/
3. NLANR: http://www.nlanr.net/NA/Learn/

- Ping Pan

Date: Fri, 16 Mar 2001 10:07:48 PST
To: Sally Floyd
cc: end2end-interest@postel.org
From: Ping Pan
Subject: Re: [e2e] two questions about the Internet

Sally Floyd wrote:
>
> PERIODS OF EXTREME CONGESTION AT A ROUTER?
> For those routers in the network that do occasionally experience
> congestion, how can we characterize their rare periods of *extreme*
> congestion (defining extreme congestion, say, as packet drop rates
> above 5%)? How frequently do these periods of extreme congestion
> occur, and how long do they last? What fraction can be attributed
> to flash crowds? to Denial of Service attacks? to fiber cuts or
> other routing changes?
>

Almost forgot: please take a look at http://www.nordu.net/stats/. This is one of the better places where you can monitor link traffic for both average and peak rates, and draw your own conclusions on link congestion and its duration.

In the past several years, most US providers have stopped publishing statistics for their networks, and only provide the average bandwidth utilization, which is low anyway. It is believed that the peak/average ratio is around 3-4 or higher, but I have not seen solid evidence of this since NSFNET.

I don't think there are too many fiber cuts in the network (well... on the other hand, the China-US undersea cable was cut many days ago, and the link is still down). But in some networks, providers do shift traffic between links quite often.

- Ping

Date: Fri, 16 Mar 2001 14:00:51 EST
To: oleg@inforocket.com (Oleg Vishnepolsky)
cc: nahum@watson.ibm.com (Erich M. Nahum), ggm@dstc.edu.au (George Michaelson), dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis), floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org, end2end-interest@postel.org
From: Erich Nahum
Subject: Re: [e2e] two questions about the Internet

Oleg Vishnepolsky writes:
>
> >96 (Atlanta) was the first Olympics that IBM hosted, and I believe it was
> >just one complex in Southbury, CT.
> >98 (Nagano) and 2000 (Australia)
> >were hosted by 4 main sites: Bethesda (for Europe), Schaumburg IL and
> >someplace in Ohio (for the Americas) and Tokyo (for Asia). The
> >request routing was done on a very coarse-grained level, basically
> >through the routing tables. E.g., if you were in Europe,
> >olympics.com pointed to Bethesda. I think it was done at
> >the routing layer and not through DNS.
>
> How is it even possible not to involve DNS? If DNS was giving out the
> same IP address for olympics.com irrespective of where requests
> came from, then routing would have been real tricky, to say the least.

I wasn't the one who did the work, so take my recollections with a grain of salt. Hari was one of the authors of the SIGMETRICS 97 and INFOCOM 98 papers that describe this work, so I would trust him on this one about the 96 Olympics. As for the later ones, this is what I've been told. It doesn't seem tricky to me, but I'm not a routing person. I'll try to dig up the info and post it here next week.

-Erich
--
Erich M. Nahum                    IBM T.J. Watson Research Center
Networking Research               P.O. Box 704
nahum@watson.ibm.com              Yorktown Heights NY 10598