From: Bill Fink <billfink@mindspring.com>
To: Stephen Hemminger <stephen.hemminger@vyatta.com>
Cc: Roland Dreier <rdreier@cisco.com>,
Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
David Miller <davem@davemloft.net>,
aglo@citi.umich.edu, shemminger@vyatta.com,
netdev@vger.kernel.org, rees@umich.edu, bfields@fieldses.org
Subject: Re: setsockopt()
Date: Tue, 8 Jul 2008 18:05:00 -0400
Message-ID: <20080708180500.e8a61231.billfink@mindspring.com>
In-Reply-To: <20080708134845.2372a483@speedy>
On Tue, 8 Jul 2008, Stephen Hemminger wrote:
> On Mon, 07 Jul 2008 23:29:31 -0700
> Roland Dreier <rdreier@cisco.com> wrote:
>
> > Interesting... I'd not tried nuttcp before, and on my testbed, which is
> > a very high-bandwidth, low-RTT network (IP-over-InfiniBand with DDR IB,
> > so the network is capable of 16 Gbps, and the RTT is ~25 microseconds),
> > the difference between autotuning and not for nuttcp is huge (testing
> > with 2.6.26-rc8 plus some pending 2.6.27 patches that add checksum
> > offload, LSO and LRO to the IP-over-IB driver):
> >
> > nuttcp -T30 -i1 ends up with:
> >
> > 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX
> >
> > while setting the window even to 128 KB with
> > nuttcp -w128k -T30 -i1 ends up with:
> >
> > 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX
> >
> > so it's a factor of 2.5 with nuttcp. I've never seen other apps behave
> > like that -- for example NPtcp (netpipe) only gets slower when
> > explicitly setting the window size.
> >
> > Strange...
>
> I suspect that the link is so fast that the window growth isn't happening
> fast enough. With only a 30 second test, you probably barely made it
> out of TCP slow start.

Nah. 30 seconds is plenty of time. I got up to nearly 8 Gbps
in 4 seconds (see my test report in an earlier message in this thread),
and that was on a network path with an ~72 ms RTT. Roland's IB network
only has a ~25 usec RTT.
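
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The helper names are made up for illustration, and the 16 Gbps figure is the link capacity Roland quoted, not a measured rate.

```python
# Rough check: how many round trips fit into a test run, and how much
# data has to be in flight to keep the pipe full (bandwidth-delay product).

def round_trips(duration_s: float, rtt_s: float) -> float:
    """Number of RTTs that elapse during duration_s seconds."""
    return duration_s / rtt_s

def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes for a given bit rate and RTT."""
    return rate_bps * rtt_s / 8

WAN_RTT = 72e-3   # ~72 ms path from the earlier test report
IB_RTT = 25e-6    # ~25 usec IP-over-IB testbed

# Roughly 56 RTTs were enough to approach 8 Gbps on the 72 ms path.
print(f"RTTs in 4 s at 72 ms:   {round_trips(4, WAN_RTT):.0f}")
# A 30 s run on the IB testbed spans on the order of a million RTTs.
print(f"RTTs in 30 s at 25 us:  {round_trips(30, IB_RTT):.0f}")
# Data in flight needed to fill a 16 Gbps link at 25 us RTT (about 50 KB).
print(f"BDP at 16 Gbps, 25 us:  {bdp_bytes(16e9, IB_RTT):.0f} bytes")
```

So on the IB testbed the window needed to fill the link is small, and the test spans vastly more round trips than the 72 ms case needed to ramp up.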

BTW, I believe there is one other important difference between the way
the tcp_rmem/tcp_wmem autotuning parameters are handled and the way
the rmem_max/wmem_max parameters are used when explicitly setting the
socket buffer sizes. I believe the tcp_rmem/tcp_wmem autotuning maximum
parameters are hard limits, with the default maximum tcp_rmem setting
being ~170 KB and the default maximum tcp_wmem setting being 128 KB.

On the other hand, I believe rmem_max/wmem_max determine the maximum
value that may be set via the SO_RCVBUF/SO_SNDBUF setsockopt() call.
But Linux then doubles the requested value, so when Roland specified
the "-w128k" nuttcp parameter, he actually got a socket buffer size
of 256 KB, which would be double what is available in the autotuning
case, assuming the tcp_rmem/tcp_wmem settings are at their default
values. This could account for a factor of 2x between the two
test cases. The "-v" verbose option to nuttcp might shed some light
on this hypothesis.
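
A quick way to check the doubling independently of nuttcp is to set SO_RCVBUF and read it straight back. This is only a minimal sketch: the function name is hypothetical, and the value reported depends on the local kernel and its rmem_max setting.

```python
import socket

def effective_rcvbuf(requested: int) -> int:
    """Request a receive buffer of `requested` bytes on a fresh TCP
    socket and return the size the kernel reports via getsockopt()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

requested = 128 * 1024  # same size as nuttcp's -w128k
granted = effective_rcvbuf(requested)
print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

On Linux I would expect the reported value to be twice the request, unless the request was first clamped to net.core.rmem_max. Other systems may report something different.
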
-Bill