From: Rick Jones <rick.jones2@hp.com>
To: "Yan, Zheng " <yanzheng@21cn.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, netdev <netdev@vger.kernel.org>
Subject: Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
Date: Thu, 15 Nov 2012 10:33:18 -0800
Message-ID: <50A5356E.8030901@hp.com>
In-Reply-To: <CAAM7YAm+XJKTxJStLMoo4RRV85oogN5wCHfXorkEiUHKqNKHDQ@mail.gmail.com>
On 11/14/2012 11:52 PM, Yan, Zheng wrote:
> This commit makes one of our test cases on a Core 2 machine drop in
> performance by about 60%. The test case runs 2048 instances of the
> netperf 64k stream test at the same time.
I'm impressed that 2048 concurrent netperf TCP_STREAM tests ran to
completion in the first place :)
> Analysis showed that using order-3 pages causes more LLC misses; most new
> LLC misses happen when the senders copy data into the socket buffer.
> If we revert to using single pages, the sender side only triggers a few
> LLC misses and most LLC misses happen on the receiver side. It means most
> pages allocated by the senders are cache hot. But when using order-3
> pages, 2048 * 32k = 64M, which is much larger than the LLC size. Should
> we be worried about this regression, or is our test case too impractical?
Even before the page change I would have expected the buffers that
netperf itself uses to have exceeded the LLC. If you were not using
test-specific -s and -S options to set an explicit socket buffer size, I
believe that under Linux (most of the time) the default SO_SNDBUF size
will be 86KB. Coupled with your statement that the send size was 64K, that
means the send ring being used by netperf will be two 64KB buffers, which
would then be 256MB across 2048 concurrent netperfs. Even if we go with
"only the one send buffer in play at a time matters", that is still 128
MB of space in netperf itself even before one gets to the stack.
Still, sharing the analysis tool output might be helpful.
By the way, the "default" size of the buffer netperf posts in recv()
calls will depend on the initial value of SO_RCVBUF after the data
socket is created and has had any -s or -S option values applied to it.
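If you want to sanity-check what the defaults actually were on your
boxes, something like the following should show them (take the exact
knobs as a suggestion rather than gospel - they vary a bit by kernel;
the middle value of tcp_wmem/tcp_rmem is what a freshly created TCP
socket starts with before autotuning takes over):

sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem
sysctl net.core.wmem_default net.core.rmem_default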
I cannot say that the scripts distributed with netperf are consistently
good about doing it themselves, but for the "canonical" bulk stream test
I would suggest something like:
netperf -t TCP_STREAM -H <dest> -l 60 -- -s 1M -S 1M -m 64K -M 64K
as that will reduce the number of variables. Those -s and -S values,
though, will probably call for tweaking sysctl settings or they will be
clipped by net.core.rmem_max and net.core.wmem_max (a quick example of
bumping those follows below). At a minimum I would suggest including the
-m and -M options. I might also tack on a "-o all" at the end, but that
is a matter of preference - it will produce a great deal of output...
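On the sysctl front, raising the limits is just something like this (the
2MB value here is purely illustrative - pick whatever comfortably covers
the -s/-S sizes you ask for):

sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152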
Eric Dumazet later says:
> The number of in-flight bytes does not depend on the order of the pages,
> but on the sizes of the TCP buffers (receiver, sender)
And unless you happened to use explicit -s and -S options, there is even
more variability in how much may be inflight. If you do not add those
you can at least get netperf to report what the socket buffer sizes
became by the end of the test:
netperf -t TCP_STREAM ... -- ... -o lss_size_end,rsr_size_end
for "local socket send size" and "remote socket receive size" respectively.
> If the sender is faster (because of this commit), but the receiver is slow
> to drain the receive queues, then you can have a situation where the
> consumed memory on the receiver is higher and the receiver might actually
> be slower.
Netperf can be told to report the number of receive calls and the bytes
per receive - either by tacking on a global "-v 2" or by requesting them
explicitly via omni output selection. Presumably, if the receiving
netserver processes are not keeping up as well, that should manifest as
the bytes per receive being larger in the "after" case than in the
"before" case.
netperf ... -- ... -o
remote_recv_size,remote_recv_calls,remote_bytes_per_recv
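Spelled out with the same 64K send size as your test, that could be
something like:

netperf -t TCP_STREAM -H <dest> -l 60 -- -m 64K -M 64K -o remote_recv_size,remote_recv_calls,remote_bytes_per_recv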
happy benchmarking,
rick jones
Thread overview: 37+ messages
2012-09-17 7:49 [RFC] tcp: use order-3 pages in tcp_sendmsg() Eric Dumazet
2012-09-17 16:12 ` David Miller
2012-09-17 17:02 ` Eric Dumazet
2012-09-17 17:04 ` Eric Dumazet
2012-09-17 17:07 ` David Miller
2012-09-19 15:14 ` Eric Dumazet
2012-09-19 17:28 ` Rick Jones
2012-09-19 17:55 ` Eric Dumazet
2012-09-19 17:56 ` David Miller
2012-09-19 19:04 ` Alexander Duyck
2012-09-19 20:18 ` Ben Hutchings
2012-09-19 22:20 ` Vijay Subramanian
2012-09-20 5:37 ` Eric Dumazet
2012-09-20 17:10 ` Rick Jones
2012-09-20 17:43 ` Eric Dumazet
2012-09-20 18:37 ` Yuchung Cheng
2012-09-20 19:40 ` David Miller
2012-09-20 20:06 ` Rick Jones
2012-09-20 20:25 ` Eric Dumazet
2012-09-21 15:48 ` Eric Dumazet
2012-09-21 16:27 ` David Miller
2012-09-21 16:51 ` Eric Dumazet
2012-09-21 17:04 ` David Miller
2012-09-21 17:11 ` Eric Dumazet
2012-09-23 12:47 ` Jan Engelhardt
2012-09-23 16:16 ` David Miller
2012-09-23 17:40 ` Jan Engelhardt
2012-09-23 18:13 ` Eric Dumazet
2012-09-23 18:27 ` David Miller
2012-09-20 21:39 ` Vijay Subramanian
2012-09-20 22:01 ` Rick Jones
2012-11-15 7:52 ` Yan, Zheng
2012-11-15 13:06 ` Eric Dumazet
2012-11-16 2:36 ` Yan, Zheng
2012-11-15 13:47 ` Eric Dumazet
2012-11-21 8:05 ` Yan, Zheng
2012-11-15 18:33 ` Rick Jones [this message]