From: Bill Fink <billfink@mindspring.com>
To: Stephen Hemminger <stephen.hemminger@vyatta.com>
Cc: Roland Dreier <rdreier@cisco.com>,
Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
David Miller <davem@davemloft.net>,
aglo@citi.umich.edu, shemminger@vyatta.com,
netdev@vger.kernel.org, rees@umich.edu, bfields@fieldses.org
Subject: Re: setsockopt()
Date: Tue, 8 Jul 2008 18:05:00 -0400
Message-ID: <20080708180500.e8a61231.billfink@mindspring.com>
In-Reply-To: <20080708134845.2372a483@speedy>

On Tue, 8 Jul 2008, Stephen Hemminger wrote:
> On Mon, 07 Jul 2008 23:29:31 -0700
> Roland Dreier <rdreier@cisco.com> wrote:
>
> > Interesting... I'd not tried nuttcp before, and on my testbed, which is
> > a very high-bandwidth, low-RTT network (IP-over-InfiniBand with DDR IB,
> > so the network is capable of 16 Gbps, and the RTT is ~25 microseconds),
> > the difference between autotuning and not for nuttcp is huge (testing
> > with 2.6.26-rc8 plus some pending 2.6.27 patches that add checksum
> > offload, LSO and LRO to the IP-over-IB driver):
> >
> > nuttcp -T30 -i1 ends up with:
> >
> > 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX
> >
> > while setting the window even to 128 KB with
> > nuttcp -w128k -T30 -i1 ends up with:
> >
> > 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX
> >
> > so it's a factor of 2.5 with nuttcp. I've never seen other apps behave
> > like that -- for example NPtcp (netpipe) only gets slower when
> > explicitly setting the window size.
> >
> > Strange...
>
> I suspect that the link is so fast that the window growth isn't happening
> fast enough. With only a 30 second test, you probably barely made it
> out of TCP slow start.

Nah. 30 seconds is plenty of time. I got up to nearly 8 Gbps
in 4 seconds (see my test report in an earlier message in this thread),
and that was on an ~72 ms RTT network path. Roland's IB network
only has a ~25 usec RTT.
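
A back-of-the-envelope check of the point above can be sketched as
follows. It assumes an idealized slow start (the congestion window
doubles every RTT, no loss), a 1460-byte MSS and an initial window of
3 segments. None of those values come from this thread, and the
function name is made up for illustration.

```python
def slow_start_time(rate_bps, rtt_s, mss=1460, init_segments=3):
    """Return (rounds, seconds) for an idealized slow start to reach
    the bandwidth-delay product. mss and init_segments are assumptions."""
    bdp_bytes = rate_bps * rtt_s / 8      # bytes in flight needed to fill the path
    cwnd = mss * init_segments            # assumed initial congestion window
    rounds = 0
    while cwnd < bdp_bytes:
        cwnd *= 2                         # idealized: window doubles once per RTT
        rounds += 1
    return rounds, rounds * rtt_s


if __name__ == "__main__":
    paths = [
        ("IB, 16 Gbps, 25 usec RTT", 16e9, 25e-6),
        ("WAN, 8 Gbps, 72 ms RTT", 8e9, 72e-3),
    ]
    for name, rate, rtt in paths:
        rounds, seconds = slow_start_time(rate, rtt)
        print(f"{name}: {rounds} RTTs, {seconds:.6f} s")
```

Under these assumptions both paths need only a small number of round
trips to fill the pipe, far less than a 30 second test, so slow start
alone is unlikely to explain the gap.
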

BTW I believe there is one other important difference between the way
the tcp_rmem/tcp_wmem autotuning parameters are handled versus the way
the rmem_max/wmem_max parameters are used when explicitly setting the
socket buffer sizes. I believe the tcp_rmem/tcp_wmem autotuning maximum
parameters are hard limits, with the default maximum tcp_rmem setting
being ~170 KB and the default maximum tcp_wmem setting being 128 KB.

On the other hand, I believe rmem_max/wmem_max determine the maximum
value allowed to be set via the SO_RCVBUF/SO_SNDBUF setsockopt() call.
But then Linux doubles the requested value, so when Roland specified
a "-w128k" nuttcp parameter, he actually got a socket buffer size
of 256 KB, which would thus be double that available in the autotuning
case, assuming the tcp_rmem/tcp_wmem settings are using their default
values. This could then account for a factor of 2 between the two
test cases. The "-v" verbose option to nuttcp might shed some light
on this hypothesis.

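
The doubling hypothesis can also be checked from user space. The
following is a minimal, Linux-oriented sketch, not what nuttcp itself
does: it reads the limits from /proc, requests a 128 KB receive buffer
and prints what getsockopt() reports. The helper names are
hypothetical, and expected_bufsize() encodes only the "cap at
rmem_max, then double" behavior described above (it ignores the
kernel's minimum buffer size).

```python
import socket


def read_sysctl(path):
    """Return the integers found in a /proc/sys file, or None if unreadable."""
    try:
        with open(path) as f:
            return [int(x) for x in f.read().split()]
    except (OSError, ValueError):
        return None


def expected_bufsize(requested, core_max):
    """Size expected under the hypothesis: cap at core_max, then double."""
    return 2 * min(requested, core_max)


def probe_rcvbuf(requested):
    """Set SO_RCVBUF on a fresh TCP socket and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 128 * 1024
    core = read_sysctl("/proc/sys/net/core/rmem_max")
    tcp = read_sysctl("/proc/sys/net/ipv4/tcp_rmem")
    print("rmem_max:", core[0] if core else "unavailable")
    print("tcp_rmem max:", tcp[2] if tcp and len(tcp) == 3 else "unavailable")
    if core:
        print("expected:", expected_bufsize(requested, core[0]))
    print("requested:", requested, "reported:", probe_rcvbuf(requested))
```

If the reported size is twice the request while the tcp_rmem maximum
is smaller than that, the explicit setting is simply giving the
connection more buffer than autotuning is allowed to.
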
-Bill