netdev.vger.kernel.org archive mirror
From: Rick Jones <rick.jones2@hp.com>
To: John Heffner <johnwheffner@gmail.com>
Cc: David Miller <davem@davemloft.net>, md@bts.sk, netdev@vger.kernel.org
Subject: Re: TCP rx window autotuning harmful at LAN context
Date: Tue, 10 Mar 2009 10:20:12 -0700	[thread overview]
Message-ID: <49B6A14C.9070704@hp.com> (raw)
In-Reply-To: <1e41a3230903092055q2317e0cas3721d18fb4cef062@mail.gmail.com>

> (Pretty sure we went over this already, but once more..) 

Sometimes I am but dense north by northwest, but I am also occasionally simply 
dense regardless of the direction :)

> The receiver does not size to twice cwnd.  It sizes to twice the amount of
> data that the application read in one RTT.  In the common case of a path 
> bottleneck and a receiving application that always keeps up, this equals
> 2*cwnd, but the distinction is very important to understanding its behavior in
> other cases.
> 
> In your test where you limit sndbuf to 256k, you will find that you
> did not fill up the bottleneck queues, and you did not get a
> significantly increased RTT, which are the negative effects we want to
> avoid.  The large receive window caused no trouble at all.
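For concreteness, here is how I read the rule John describes, as a toy sketch.  The variable names are mine, and this is only an approximation of the idea; I believe the real logic lives in tcp_rcv_space_adjust() in net/ipv4/tcp_input.c:

```python
# Toy sketch of receiver-side autotuning (Dynamic Right-Sizing): the
# window target tracks what the application actually consumed in the
# last RTT, not cwnd directly.  Names and structure are illustrative.

def rcv_window_target(bytes_copied_last_rtt, current_window, max_window):
    """Return the new advertised-window target.

    bytes_copied_last_rtt: data the receiving app read in one RTT
    current_window:        the present window target
    max_window:            cap from tcp_rmem[2]
    """
    target = 2 * bytes_copied_last_rtt   # headroom for slow-start doubling
    if target > current_window:
        current_window = min(target, max_window)
    return current_window                # this path only ever grows it

# When the app keeps up with a bottlenecked sender, the bytes copied in
# one RTT are a cwnd's worth, so the target degenerates to 2*cwnd:
print(rcv_window_target(128 * 1024, 87380, 4 * 1024 * 1024))
```

which also makes it clear why the window never shrinks back down once the RTT estimate has been inflated.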

What is the definition of "significantly" here?

With my SO_SNDBUF capped at 256K, ping reports like this:

[root@dl5855 ~]# ping sut42
PING sut42.west (10.208.0.45) 56(84) bytes of data.
64 bytes from sut42.west (10.208.0.45): icmp_seq=1 ttl=64 time=1.58 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=2 ttl=64 time=0.126 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=3 ttl=64 time=0.103 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=5 ttl=64 time=0.104 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=6 ttl=64 time=0.100 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=7 ttl=64 time=0.140 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=8 ttl=64 time=0.103 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=9 ttl=64 time=11.3 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=10 ttl=64 time=10.3 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=11 ttl=64 time=7.42 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=12 ttl=64 time=4.51 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=13 ttl=64 time=1.56 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=14 ttl=64 time=4.47 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=15 ttl=64 time=4.63 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=16 ttl=64 time=1.66 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=17 ttl=64 time=7.65 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=18 ttl=64 time=4.73 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=19 ttl=64 time=0.135 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=20 ttl=64 time=0.116 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=21 ttl=64 time=0.102 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=22 ttl=64 time=0.102 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=23 ttl=64 time=0.098 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=24 ttl=64 time=0.104 ms

FWIW, when I uncap the SO_SNDBUF, the RTTs start to look like this instead:

[root@dl5855 ~]# ping sut42
PING sut42.west (10.208.0.45) 56(84) bytes of data.
64 bytes from sut42.west (10.208.0.45): icmp_seq=1 ttl=64 time=0.183 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=2 ttl=64 time=0.107 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=3 ttl=64 time=0.100 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=4 ttl=64 time=0.117 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=5 ttl=64 time=0.103 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=6 ttl=64 time=0.099 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=7 ttl=64 time=0.123 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=8 ttl=64 time=26.2 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=9 ttl=64 time=24.3 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=10 ttl=64 time=26.3 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=11 ttl=64 time=26.4 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=12 ttl=64 time=26.3 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=13 ttl=64 time=26.2 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=14 ttl=64 time=26.6 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=15 ttl=64 time=26.2 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=16 ttl=64 time=26.5 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=17 ttl=64 time=26.3 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=18 ttl=64 time=0.126 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=19 ttl=64 time=0.119 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=20 ttl=64 time=0.120 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=21 ttl=64 time=0.097 ms

And then, when I cap both sides at 64K requested (which the kernel doubles to 128K actual) and still 
get link-rate, the pings look like:

[root@dl5855 ~]# ping sut42
PING sut42.west (10.208.0.45) 56(84) bytes of data.
64 bytes from sut42.west (10.208.0.45): icmp_seq=1 ttl=64 time=0.161 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=2 ttl=64 time=0.104 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=3 ttl=64 time=0.103 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=4 ttl=64 time=0.101 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=5 ttl=64 time=0.106 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=6 ttl=64 time=0.102 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=7 ttl=64 time=0.753 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=8 ttl=64 time=0.594 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=9 ttl=64 time=0.789 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=10 ttl=64 time=0.566 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=11 ttl=64 time=0.587 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=12 ttl=64 time=0.635 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=13 ttl=64 time=0.729 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=14 ttl=64 time=0.613 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=15 ttl=64 time=0.609 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=16 ttl=64 time=0.655 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=17 ttl=64 time=0.152 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=18 ttl=64 time=0.106 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=19 ttl=64 time=0.100 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=20 ttl=64 time=0.106 ms
64 bytes from sut42.west (10.208.0.45): icmp_seq=21 ttl=64 time=0.122 ms
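For what it's worth, the time to drain a full send buffer onto a GigE wire roughly brackets those three sets of numbers.  This is my back-of-the-envelope arithmetic, not anything ping or netperf reports, and it ignores framing overhead, driver rings, and switch buffering:

```python
# Time to serialize a full send buffer at ~1 Gbit/s: a crude bound on
# the extra queuing delay a competing ping can see while the bulk
# transfer's buffer is standing in line ahead of it.

LINK_BPS = 1e9  # GigE, ignoring Ethernet/IP framing overhead

for label, sndbuf_bytes in [("64K req / 128K actual", 131072),
                            ("256K cap", 262144),
                            ("uncapped, 4M autotuned", 4194304)]:
    drain_ms = sndbuf_bytes * 8 / LINK_BPS * 1e3
    print(f"{label:24s} ~{drain_ms:.1f} ms to drain")
```

That lines up in order of magnitude with the sub-millisecond, few-millisecond, and ~26 ms plateaus above; the uncapped case coming in under the ~33 ms full-buffer figure would just mean the 4M buffer never completely fills.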

None of the above "absolves" the sender of course, but I still get wrapped around the axle about 
handing so much rope to senders when we know, 99 times out of ten, they are going to hang 
themselves with it.

rick jones

Netperf cannot tell me bytes received per RTT, but it can tell me the average 
bytes per recv() call.  I'm not sure if that is a sufficient approximation but 
here are those three netperf runs re-run with remote_bytes_per_recv added to the 
output:

[root@dl5855 ~]# netperf -t omni -H sut42 -- -k foo -s 64K -S 64K
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sut42.west (10.208.0.45) port 0 AF_INET
THROUGHPUT=941.07
LSS_SIZE_REQ=65536
LSS_SIZE=131072
LSS_SIZE_END=131072
RSR_SIZE_REQ=65536
RSR_SIZE=131072
RSR_SIZE_END=131072
REMOTE_BYTES_PER_RECV=8178.43
[root@dl5855 ~]# netperf -t omni -H sut42 -- -k foo -s 128K
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sut42.west (10.208.0.45) port 0 AF_INET
THROUGHPUT=941.31
LSS_SIZE_REQ=131072
LSS_SIZE=262142
LSS_SIZE_END=262142
RSR_SIZE_REQ=-1
RSR_SIZE=87380
RSR_SIZE_END=4194304
REMOTE_BYTES_PER_RECV=8005.97
[root@dl5855 ~]# netperf -t omni -H sut42 -- -k foo
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to sut42.west (10.208.0.45) port 0 AF_INET
THROUGHPUT=941.33
LSS_SIZE_REQ=-1
LSS_SIZE=16384
LSS_SIZE_END=4194304
RSR_SIZE_REQ=-1
RSR_SIZE=87380
RSR_SIZE_END=4194304
REMOTE_BYTES_PER_RECV=8055.89
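If I plug the idle-LAN RTT into John's "twice what the app read in one RTT" rule, the numbers are interesting.  Again, my arithmetic rather than anything netperf emits:

```python
# What "2x bytes read per RTT" would work out to at the idle-LAN RTT,
# using the THROUGHPUT figure from the runs above.

throughput_bps = 941.33e6   # netperf THROUGHPUT is in 10^6 bits/s
idle_rtt_s     = 0.0001     # the ~0.1 ms ping time on the quiet LAN

bytes_per_rtt = throughput_bps / 8 * idle_rtt_s
target_window = 2 * bytes_per_rtt
print(f"~{bytes_per_rtt/1024:.1f} KB per idle RTT "
      f"-> ~{target_window/1024:.1f} KB target window")
```

That comes to a target on the order of 20-odd KB, yet RSR_SIZE_END above is 4194304 - which, if I have the mechanism right, means the RTT the stack measures during the transfer is vastly larger than the idle RTT, thanks to the queueing the transfer itself induces.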
