Date:    Thu, 15 Mar 2001 17:33:18 EST
To:      Sally Floyd <floyd@aciri.org>
cc:      mallman@grc.nasa.gov, kc@caida.org, end2end-interest@postel.org
From:    Constantinos Dovrolis <dovrolis@mail.eecis.udel.edu>
Subject: Re: [e2e] two questions about the Internet


Sally,

I may have a (very partial) answer to the first question, i.e.,
what is the distribution of round-trip times (RTTs) for the
packets on a certain link.

A couple of initial "disclaimers":
a) an RTT can only be associated with a packet of a closed-loop
protocol, so our measurements only looked at TCP packets
b) our measurements do not refer to per-packet RTTs, but to
per-connection RTTs. It is likely that there are important
differences in these two distributions (short-RTT flows may
tend to carry more data, causing the per-packet RTT distribution to
be heavier on lower RTT values).
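To make disclaimer (b) concrete, here is a small Python sketch with made-up numbers: the same set of connections gives a different fraction below a threshold depending on whether each connection counts once or is weighted by the packets it carried. The `Connection` type and the `fraction_below` helper are hypothetical, used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    rtt_ms: float  # estimated RTT of the connection, in ms
    packets: int   # packets the connection carried on the link


def fraction_below(conns: List[Connection], threshold_ms: float,
                   per_packet: bool = False) -> float:
    """Fraction of connections (or of packets, if per_packet) with RTT below threshold_ms."""
    def weight(c: Connection) -> int:
        return c.packets if per_packet else 1

    total = sum(weight(c) for c in conns)
    below = sum(weight(c) for c in conns if c.rtt_ms < threshold_ms)
    return below / total if total else 0.0


# Made-up numbers: the one short-RTT connection carries most of the packets.
conns = [Connection(20, 80), Connection(150, 10), Connection(300, 10)]
print(fraction_below(conns, 100))                   # 1 of 3 connections
print(fraction_below(conns, 100, per_packet=True))  # 80 of 100 packets
```

The per-packet figure is higher because the short-RTT connection dominates the packet count, which is the skew the disclaimer warns about.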

So, the measurements that I refer to were done in the summer
of 99 at CAIDA. We were processing traffic traces captured from
passive monitors on certain links (note that we get two different
traces, one for each direction of the link). In a given trace,
we were estimating the RTT of each TCP connection using the
following two rules:

a) if we observe the flow from the caller to the callee, the
RTT is estimated as the time interval from the SYN to the SYN-ACK.
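A minimal Python sketch of rule (a), under one assumption worth flagging: a unidirectional caller-to-callee trace does not contain the SYN-ACK itself (it travels the other way), so this sketch ends the interval at the caller's first non-SYN packet after the SYN, normally the ACK that answers the SYN-ACK. Seen from the monitor, that gap spans one full round trip. The packet tuple format and the name `syn_rtts` are hypothetical, and connections whose SYN was retransmitted are skipped because their sample is ambiguous.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical record: (timestamp in seconds, connection key, TCP flags seen)
Packet = Tuple[float, str, FrozenSet[str]]


def syn_rtts(packets: Iterable[Packet]) -> Dict[str, float]:
    """One RTT estimate per connection from a caller-to-callee trace.

    Assumption: the interval runs from the SYN to the caller's first
    non-SYN packet (normally the ACK that answers the SYN-ACK).
    """
    syn_time: Dict[str, float] = {}
    ambiguous: Set[str] = set()
    rtts: Dict[str, float] = {}
    for ts, conn, flags in sorted(packets, key=lambda p: p[0]):
        if conn in rtts or conn in ambiguous:
            continue
        if "SYN" in flags:
            if conn in syn_time:
                ambiguous.add(conn)  # retransmitted SYN: sample would be ambiguous
            else:
                syn_time[conn] = ts
        elif conn in syn_time:
            rtts[conn] = ts - syn_time[conn]
    return rtts


trace = [
    (0.000, "a", frozenset({"SYN"})),
    (0.010, "b", frozenset({"SYN"})),
    (0.080, "a", frozenset({"ACK"})),
    (3.010, "b", frozenset({"SYN"})),  # "b" retransmits its SYN
    (3.100, "b", frozenset({"ACK"})),
]
print(syn_rtts(trace))
```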

b) if we observe the flow from the callee to the caller (which
is usually the traffic from the server to the client), the RTT
is estimated from the time spacing of the first 2 or 3 slow-start
bursts. The code to do this is tricky (I can send it to you, or to
anyone else who wants to play with it).
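A much-simplified Python sketch of rule (b), not the code mentioned above: it splits one connection's server-to-client packet timestamps into bursts wherever the gap exceeds a hypothetical threshold `min_gap_s`, then averages the spacing between the starts of the first two or three bursts. It ignores the complications that make the real code tricky, such as delayed ACKs, losses, and bursts that merge once the window fills the path.

```python
from statistics import mean
from typing import List, Optional


def burst_starts(timestamps: List[float], min_gap_s: float) -> List[float]:
    """Start times of packet bursts: a new burst begins after a gap > min_gap_s."""
    starts: List[float] = []
    prev: Optional[float] = None
    for ts in sorted(timestamps):
        if prev is None or ts - prev > min_gap_s:
            starts.append(ts)
        prev = ts
    return starts


def slow_start_rtt(timestamps: List[float], min_gap_s: float = 0.01,
                   bursts: int = 3) -> Optional[float]:
    """Mean spacing between the first `bursts` burst starts, or None if too few bursts."""
    starts = burst_starts(timestamps, min_gap_s)[:bursts]
    if len(starts) < 2:
        return None
    return mean([b - a for a, b in zip(starts, starts[1:])])


# Made-up slow-start pattern: bursts of 1, 2 and 4 packets, about 100 ms apart.
data = [0.000, 0.100, 0.102, 0.200, 0.201, 0.202, 0.203]
print(slow_start_rtt(data))  # about 0.1 s
```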

So, using these tricks we were measuring the distribution
of RTTs in the TCP connections that were present in each
(unidirectional) trace. Just as an example of the distributions
that we were getting, take a look at:

http://www.cis.udel.edu/~dovrolis/rtt-sdsc.eps

The graph shows two RTT distributions, one for each direction
of the OC-3 link that used to connect UCSD with CERFnet.

A few major points from the graph:
- About 35% of the connections have RTT < 50ms
- About 60% of the connections have RTT < 100ms
- There is a significant fraction of connections (20-30%)
with RTT > 200ms (which is probably close to the upper bound
for any type of interactive application).
- About 10% of the RTTs are quite large (some of them in the
order of multiple seconds), which may indicate errors in our
measurement methodology. This is why I did not include that
fraction of RTTs in the graph.

Some very interesting measurements on this subject also appear
in Mark Allman's "A Web server's view of the transport layer",
published in CCR, October 2000. Mark's measurements come from
traces of the server's own traffic (instead of a passive monitor in
the network). Also, he could measure the RTTs more accurately
from the time between a non-retransmitted packet
and the corresponding ACK. Obviously we cannot do the same,
because we don't have the flow of ACKs in the trace.
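A minimal Python sketch of the sender-side technique described above. It takes a server-side trace of outgoing data segments and incoming cumulative ACKs, and emits an RTT sample only when an ACK exactly covers a segment that was sent once. Retransmitted segments are skipped because the ACK cannot be matched to a specific copy (the rule usually credited to Karn). The event format and function name are hypothetical, and this is not Allman's actual tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record: (timestamp in s, "data" or "ack", sequence number).
# For "data" the number is the segment's end sequence number;
# for "ack" it is the cumulative ACK number.
Event = Tuple[float, str, int]


def karn_rtt_samples(events: List[Event]) -> List[float]:
    """RTT samples from a sender-side trace, skipping retransmitted segments."""
    sent: Dict[int, float] = {}  # end seq -> time of first transmission
    retransmitted: Set[int] = set()
    samples: List[float] = []
    for ts, kind, num in sorted(events, key=lambda e: e[0]):
        if kind == "data":
            if num in sent:
                retransmitted.add(num)  # ambiguous: which copy will the ACK answer?
            else:
                sent[num] = ts
        elif kind == "ack":
            if num in sent and num not in retransmitted:
                samples.append(ts - sent[num])
            for end in [e for e in sent if e <= num]:
                del sent[end]  # everything up to num is now acknowledged
    return samples


events = [
    (0.00, "data", 1000),
    (0.01, "data", 2000),
    (0.05, "ack", 1000),
    (0.30, "data", 2000),  # retransmission: no sample for this segment
    (0.40, "ack", 2000),
    (0.41, "data", 3000),
    (0.47, "ack", 3000),
]
print(karn_rtt_samples(events))
```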

It is interesting that Mark's measurements (see Figure 9) are not *very*
different from the graph that I mentioned before. Specifically,
his graph shows:
- About 35% of the RTTs < 100msec
- About 60-70% of the RTTs < 200msec
- About 85% of the RTTs < 500msec.
Of course Mark's measurements/analysis were much more methodically
done (my measurements were only done to get some reasonable values
for simulations about other stuff).

I hope that this helps. I am also very interested in answers
to the rest of your questions.


Constantinos

Computer and Information Sciences - University of Delaware

http://www.cis.udel.edu/~dovrolis/


Date:    Thu, 15 Mar 2001 18:51:11 EST
To:      dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis)
cc:      floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org,
	 end2end-interest@postel.org
From:    Erich Nahum <nahum@watson.ibm.com>
Subject: Re: [e2e] two questions about the Internet

Constantinos Dovrolis writes:
> 
> It is interesting that Mark's measurements (see Figure 9) are not *very*
> different from the graph that I mentioned before. Specifically,
> his graph shows:
> - About 35% of the RTTs < 100msec
> - About 60-70% of the RTTs < 200msec
> - About 85% of the RTTs < 500msec.
> Of course Mark's measurements/analysis were much more methodically
> done (my measurements were only done to get some reasonable values
> for simulations about other stuff).

Srini Seshan (when he was here at Watson) had some packet trace data from 
the 1996 Olympic Web server, but it's a bit old now.  The technique was 
similar to what Mark Allman did.  For the record, though, it had:

- 25% of the RTTs < 115 ms
- 50% of the RTTs < 338 ms
- 75% of the RTTs < 778 ms

The RTTs are obviously going to vary depending on what kind of
connection you have (T3, OC-768) as well as where your clients
are (NY, CA, Greece).

-Erich

-- 
Erich M. Nahum                  IBM T.J. Watson Research Center
Networking Research             P.O. Box 704
nahum@watson.ibm.com            Yorktown Heights NY 10598


Date:    Fri, 16 Mar 2001 10:23:51 +1000
To:      nahum@watson.ibm.com (Erich M. Nahum)
cc:      dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis),
	 floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org,
	 end2end-interest@postel.org
From:    George Michaelson <ggm@dstc.edu.au>
Subject: Re: [e2e] two questions about the Internet 
List-Archive: <http://www.postel.org/pipermail/end2end-interest/>


> Srini Seshan (when he was here at Watson) had some packet trace data from
> the 1996 Olympic Web server, but it's a bit old now.  The technique was
> similar to what Mark Allman did.  For the record, though, it had:
> 
> - 25% of the RTTs < 115 ms
> - 50% of the RTTs < 338 ms
> - 75% of the RTTs < 778 ms
> 
> The RTTs are obviously going to vary depending on what kind of
> connection you have (T3, OC-768) as well as where your clients
> are (NY, CA, Greece).
> 
> -Erich

Were the 96 Olympics hosted behind multiple, geographically distributed
backends? I thought Nagano was; I went to a seminar by IBM on it.

If so, there were presumably front-end boxes choosing the backend
server, which would either intuit the best-fit path or else map the
client into some simple model, like BGP AS or link-based region, and
so skew RTTs in favour of shorter-hop and/or lighter-load hosts.

-George
--
George Michaelson         |  DSTC Pty Ltd
Email: ggm@dstc.edu.au    |  University of Qld 4072
Phone: +61 7 3365 4310    |  Australia
Fax: +61 7 3365 4311    |  http://www.dstc.edu.au


Date:    Thu, 15 Mar 2001 21:04:46 EST
To:      ggm@dstc.edu.au (George Michaelson)
cc:      nahum@watson.ibm.com (Erich M. Nahum),
	 dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis),
	 floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org,
	 end2end-interest@postel.org
From:    Erich Nahum <nahum@watson.ibm.com>
Subject: Re: [e2e] two questions about the Internet

George Michaelson writes:
> 
> Were the 96 Olympics hosted behind multiple, geographically distributed
> backends? I thought Nagano was; I went to a seminar by IBM on it.
> 
> If so, there were presumably front-end boxes choosing the backend
> server, which would either intuit the best-fit path or else map the
> client into some simple model, like BGP AS or link-based region, and
> so skew RTTs in favour of shorter-hop and/or lighter-load hosts.

96 (Atlanta) was the first Olympics that IBM hosted, and I believe it was
just one complex in Southbury, CT.  98 (Nagano) and 2000 (Australia)
were hosted by 4 main sites: Bethesda (for Europe), Schaumburg, IL, and
someplace in Ohio (for the Americas) and Tokyo (for Asia).  The
request routing was done at a very coarse-grained level, basically
through the routing tables.  E.g., if you were in Europe,
olympics.com pointed to Bethesda. I think it was done at
the routing layer and not through DNS.

The front ends of each cluster were IBM network dispatcher TCP
sprayers, which routed to back-end nodes on the same LAN.  So
I believe the RTT distribution seen by a complex would be the same
across nodes within that cluster.
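The idea of a TCP sprayer can be sketched as a function from a connection's 4-tuple to one back-end node, so that every packet of a connection reaches the same node. This Python sketch hashes the 4-tuple; that is an illustrative stand-in, not how IBM Network Dispatcher chose nodes (real dispatchers may also weigh server load and keep a connection table), and the names and addresses are hypothetical. Because all back ends sit on the same LAN behind the dispatcher, the choice of node barely changes the client's RTT.

```python
import hashlib
from typing import List, Tuple

# Hypothetical connection key: (client IP, client port, virtual IP, service port)
FourTuple = Tuple[str, int, str, int]


def pick_backend(conn: FourTuple, backends: List[str]) -> str:
    """Map a TCP connection to one back-end node; the same 4-tuple always maps the same way."""
    key = "|".join(map(str, conn)).encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["node1", "node2", "node3"]
conn = ("192.0.2.7", 40000, "198.51.100.1", 80)
print(pick_backend(conn, backends))
```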

-Erich

-- 
Erich M. Nahum                  IBM T.J. Watson Research Center
Networking Research             P.O. Box 704
nahum@watson.ibm.com            Yorktown Heights NY 10598


Date:    Thu, 15 Mar 2001 22:02:17 EST
To:      nahum@watson.ibm.com (Erich M. Nahum)
cc:      end2end-interest@postel.org
From:    Hari Balakrishnan <hari@chive.lcs.mit.edu>
Subject: Re: [e2e] two questions about the Internet 
List-Archive: <http://www.postel.org/pipermail/end2end-interest/>


Erich,

> George Michaelson writes:
> > 
> > Were the 96 Olympics hosted behind multiple, geographically distributed
> > backends? I thought Nagano was; I went to a seminar by IBM on it.
> > 
> > If so, there were presumably front-end boxes choosing the backend
> > server, which would either intuit the best-fit path or else map the
> > client into some simple model, like BGP AS or link-based region, and
> > so skew RTTs in favour of shorter-hop and/or lighter-load hosts.
> 
> 96 (Atlanta) was the first Olympics that IBM hosted, and I believe it was
> just one complex in Southbury, CT.

For the 1996 Atlanta games, IBM actually ran multiple servers for the
Olympics, but they weren't transparent (i.e., they had distinct DNS names).
The data being referred to here was collected at Southbury, CT. The other
sites were, if I recall, at Keio (Japan), Cornell (NY), Karlsruhe
(Germany), and Hursley (UK).

The Southbury site was connected via T3 links to 4 US NAPs: Chicago
(Bellcore & Ameritech), SF Bay Area (Bellcore and PacBell), NY (Sprint),
and DC (MFS Datanet).

> 98 (Nagano) and 2000 (Australia)
> were hosted by 4 main sites: Bethesda (for Europe), Schaumburg, IL, and
> someplace in Ohio (for the Americas) and Tokyo (for Asia).  The
> request routing was done at a very coarse-grained level, basically
> through the routing tables.  E.g., if you were in Europe,
> olympics.com pointed to Bethesda. I think it was done at
> the routing layer and not through DNS.

> The front ends of each cluster were IBM network dispatcher TCP
> sprayers, which routed to back-end nodes on the same LAN.  So
> I believe the RTT distribution seen by a complex would be the same
> across nodes within that cluster.

Sounds about right, if you believe the load-balancing was working correctly.  
:}  (I'm not saying it wasn't!)

Hari

> 
> -Erich
> 
> -- 
> Erich M. Nahum                  IBM T.J. Watson Research Center
> Networking Research             P.O. Box 704
> nahum@watson.ibm.com            Yorktown Heights NY 10598


Date:    Fri, 16 Mar 2001 09:40:48 PST
To:      Sally Floyd <floyd@aciri.org>
cc:      end2end-interest@postel.org, Ping Pan <pingpan@juniper.net>
From:    Ping Pan <pingpan@juniper.net>
Subject: Re: [e2e] two questions about the Internet

Sally Floyd wrote:
> 
> The new questions:
> 
> ROUND-TRIP TIMES (HOPS, NUMBER OF ASes) OF PACKETS?
> For packets on a particular link, each packet could be assigned an
> estimated round-trip time, a number of ASes for the end-to-end
> path, etc, based on the IP source and destination addresses for
> that packet.  For packets on a particular link, what can we say
> about the distribution of round-trip times, or of the number of hops
> traversed, or the number of ASes traversed, or number of continents
> traversed, or (this is harder) the number of congested links traversed?
> 

Hi,

Hop-counters: http://www.nlanr.net/NA/Learn/wingspan.html
AS length: http://moat.nlanr.net/ASPL/ (from University of Oregon)
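For per-packet estimates along the lines of the question above, two common heuristics can be sketched in Python (the helper names are hypothetical, and this is not necessarily what the pages above do). The hop count from a packet's source to the monitor can be guessed from its TTL by assuming the sender started at the nearest common default at or above the observed value, and the AS count of a path is the length of its BGP AS_PATH after collapsing prepended repeats. Both are guesses: a host with a non-default initial TTL fools the first, and an AS_PATH describes the route as announced, not necessarily the path packets take.

```python
from typing import List

# Assumption: typical operating-system default initial TTLs.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def hops_from_ttl(observed_ttl: int) -> int:
    """Guess hops from source to monitor: nearest common initial TTL >= observed, minus observed.

    observed_ttl must be in 0..255.
    """
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def as_path_length(as_path: List[int]) -> int:
    """Number of ASes on a BGP AS path, collapsing consecutive (prepended) repeats."""
    collapsed = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    return len(collapsed)


print(hops_from_ttl(52))                               # 64 - 52
print(as_path_length([64496, 64496, 64497, 64498]))    # documentation ASNs, one prepend
```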

BTW, there are several good pages on Internet questions:
1. Henning Schulzrinne:
http://www.cs.columbia.edu/~hgs/internet/traffic.html
2. Merit: http://www.merit.edu/ipma/reports/
3. NLANR: http://www.nlanr.net/NA/Learn/

- Ping Pan

Date:    Fri, 16 Mar 2001 10:07:48 PST
To:      Sally Floyd <floyd@aciri.org>
cc:      end2end-interest@postel.org
From:    Ping Pan <pingpan@juniper.net>
Subject: Re: [e2e] two questions about the Internet
List-Archive: <http://www.postel.org/pipermail/end2end-interest/>

Sally Floyd wrote:
> 
> PERIODS OF EXTREME CONGESTION AT A ROUTER?
> For those routers in the network that do occasionally experience
> congestion, how can we characterize their rare periods of *extreme*
> congestion (defining extreme congestion, say, as packet drop rates
> above 5%)?  How frequently do these periods of extreme congestion
> occur, and how long do they last?  What fraction can be attributed
> to flash crowds? to Denial of Service attacks? to fiber cuts or
> other routing changes?
> 

Almost forgot: please take a look at http://www.nordu.net/stats/. This
is one of the better places where you can monitor link traffic for
both average and peak rates and draw your own conclusions about link
congestion and its duration.

In the past several years, most US providers have stopped publishing
statistics for their networks and provide only the average bandwidth
utilization, which is low anyway. It is believed that the peak/average
ratio is around 3-4 or higher, but I have not seen solid evidence of
this since NSFNET.
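Given a series of link samples such as the average and peak rates published on pages like the NORDUnet statistics, the two quantities discussed here can be sketched in Python: the peak/average ratio of the series, and the start and duration of each run of consecutive samples above a threshold. The same run detection works on a per-interval drop-rate series with the 5% threshold from the question above. The function names, sample interval, and thresholds are assumptions, and the ratio depends heavily on the averaging interval of the samples.

```python
from typing import List, Optional, Tuple


def peak_to_average(samples: List[float]) -> float:
    """Ratio of the peak sample to the mean sample (e.g. per-interval link rates)."""
    avg = sum(samples) / len(samples)
    return max(samples) / avg


def episodes(samples: List[float], threshold: float,
             interval_s: float) -> List[Tuple[float, float]]:
    """(start time, duration) in seconds of each run of consecutive samples above threshold."""
    runs: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, value in enumerate(samples + [float("-inf")]):  # sentinel closes a trailing run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start * interval_s, (i - start) * interval_s))
            start = None
    return runs


rates = [10.0, 20.0, 30.0, 100.0]             # made-up link rates
drops = [0.0, 0.01, 0.08, 0.12, 0.02, 0.06]   # made-up drop rates, one per 60 s
print(peak_to_average(rates))
print(episodes(drops, 0.05, 60.0))
```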

I don't think there are too many fiber cuts in the network (well... on
the other hand, the China-US undersea cable was cut many days ago, and
the link is still down). But in some networks, providers do shift
traffic between links quite often.

- Ping


Date:    Fri, 16 Mar 2001 14:00:51 EST
To:      oleg@inforocket.com (Oleg Vishnepolsky)
cc:      nahum@watson.ibm.com (Erich M. Nahum),
	 ggm@dstc.edu.au (George Michaelson),
	 dovrolis@mail.eecis.udel.edu (Constantinos Dovrolis),
	 floyd@aciri.org (Sally Floyd), mallman@grc.nasa.gov, kc@caida.org,
	 end2end-interest@postel.org
From:    Erich Nahum <nahum@watson.ibm.com>
Subject: Re: [e2e] two questions about the Internet

Oleg Vishnepolsky writes:
> 
> >96 (Atlanta) was the first Olympics that IBM hosted, and I believe it was
> >just one complex in Southbury, CT.  98 (Nagano) and 2000 (Australia)
> >were hosted by 4 main sites: Bethesda (for Europe), Schaumburg, IL, and
> >someplace in Ohio (for the Americas) and Tokyo (for Asia).  The
> >request routing was done at a very coarse-grained level, basically
> >through the routing tables.  E.g., if you were in Europe,
> >olympics.com pointed to Bethesda. I think it was done at
> >the routing layer and not through DNS.
> 
> How is it even possible not to involve DNS? If DNS was giving out the
> same IP address for olympics.com irrespective of where requests
> came from, then routing would have been really tricky, to say the least.

I wasn't the one who did the work, so take my recollections with
a grain of salt.  Hari was one of the authors of the SIGMETRICS 97
and INFOCOM 98 papers that describe this work, so I would trust him
on this one about the 96 Olympics.

As for the later ones, this is what I've been told.  It doesn't
seem tricky to me, but I'm not a routing person.  I'll try to dig 
up the info and post it here next week.

-Erich

-- 
Erich M. Nahum                  IBM T.J. Watson Research Center
Networking Research             P.O. Box 704
nahum@watson.ibm.com            Yorktown Heights NY 10598