From: Rick Jones <rick.jones2@hp.com>
To: "Yan, Zheng " <yanzheng@21cn.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, netdev <netdev@vger.kernel.org>
Subject: Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
Date: Thu, 15 Nov 2012 10:33:18 -0800
Message-ID: <50A5356E.8030901@hp.com>
In-Reply-To: <CAAM7YAm+XJKTxJStLMoo4RRV85oogN5wCHfXorkEiUHKqNKHDQ@mail.gmail.com>
On 11/14/2012 11:52 PM, Yan, Zheng wrote:
> This commit makes one of our test cases on a Core 2 machine drop in
> performance by about 60%. The test case runs 2048 instances of the
> netperf 64k stream test at the same time.
I'm impressed that 2048 concurrent netperf TCP_STREAM tests ran to
completion in the first place :)
> Analysis showed that using order-3 pages causes more LLC misses; most new
> LLC misses happen when the senders copy data into the socket buffer.
> If we revert to using single pages, the sender side only triggers a few
> LLC misses and most LLC misses happen on the receiver side. It means most
> pages allocated by the senders are cache hot. But when using order-3
> pages, 2048 * 32k = 64M, which is much larger than the LLC size. Should
> we be worried about this regression, or is our test case too impractical?
Even before the page change I would have expected the buffers that
netperf itself uses to have exceeded the LLC. If you were not using
test-specific -s and -S options to set an explicit socket buffer size, I
believe that under Linux (most of the time) the default SO_SNDBUF size
will be 86KB. Coupled with your statement that the send size was 64K, that
means the send ring being used by netperf will be two 64KB buffers, which
would then be 256MB across 2048 concurrent netperfs. Even if we go with
"only the one send buffer in play at a time matters", that is still 128
MB of space in netperf itself even before one gets to the stack.
Still, sharing the analysis tool output might be helpful.
By the way, the "default" size of the buffer netperf posts in recv()
calls will depend on the initial value of SO_RCVBUF after the data
socket is created and has had any -s or -S option values applied to it.
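If you want to sanity-check what the defaults actually were on your
boxes, something like the following should show them (take the exact
knobs as a suggestion rather than gospel - they vary a bit by kernel;
the middle value of tcp_wmem/tcp_rmem is what a freshly created TCP
socket starts with before autotuning takes over):

sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem
sysctl net.core.wmem_default net.core.rmem_default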
I cannot say that the scripts distributed with netperf are consistently
good about doing it themselves, but for the "canonical" bulk stream test
I would suggest something like:
netperf -t TCP_STREAM -H <dest> -l 60 -- -s 1M -S 1M -m 64K -M 64K
as that will reduce the number of variables. Those -s and -S values,
though, will probably call for tweaking sysctl settings or they will be
clipped by net.core.rmem_max and net.core.wmem_max (a quick example of
bumping those follows below). At a minimum I would suggest including the
-m and -M options. I might also tack on a "-o all" at the end, but that
is a matter of preference - it will produce a great deal of output...
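On the sysctl front, raising the limits is just something like this (the
2MB value here is purely illustrative - pick whatever comfortably covers
the -s/-S sizes you ask for):

sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152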
Eric Dumazet later says:
> The number of in-flight bytes does not depend on the order of the pages,
> but on the sizes of the TCP buffers (receiver, sender)
And unless you happened to use explicit -s and -S options, there is even
more variability in how much may be inflight. If you do not add those
you can at least get netperf to report what the socket buffer sizes
became by the end of the test:
netperf -t TCP_STREAM ... -- ... -o lss_size_end,rsr_size_end
for "local socket send size" and "remote socket receive size" respectively.
> If the sender is faster (because of this commit), but the receiver is slow
> to drain the receive queues, then you can have a situation where the
> consumed memory on the receiver is higher and the receiver might actually
> be slower.
Netperf can be told to report the number of receive calls and the bytes
per receive - either by tacking on a global "-v 2" or by requesting them
explicitly via omni output selection. Presumably, if the receiving
netserver processes are not keeping up as well, that should manifest as
the bytes per receive being larger in the "after" case than in the
"before" case.
netperf ... -- ... -o
remote_recv_size,remote_recv_calls,remote_bytes_per_recv
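Spelled out with the same 64K send size as your test, that could be
something like:

netperf -t TCP_STREAM -H <dest> -l 60 -- -m 64K -M 64K -o remote_recv_size,remote_recv_calls,remote_bytes_per_recv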
happy benchmarking,
rick jones
Thread overview: 37+ messages
2012-09-17 7:49 [RFC] tcp: use order-3 pages in tcp_sendmsg() Eric Dumazet
2012-09-17 16:12 ` David Miller
2012-09-17 17:02 ` Eric Dumazet
2012-09-17 17:04 ` Eric Dumazet
2012-09-17 17:07 ` David Miller
2012-09-19 15:14 ` Eric Dumazet
2012-09-19 17:28 ` Rick Jones
2012-09-19 17:55 ` Eric Dumazet
2012-09-19 17:56 ` David Miller
2012-09-19 19:04 ` Alexander Duyck
2012-09-19 20:18 ` Ben Hutchings
2012-09-19 22:20 ` Vijay Subramanian
2012-09-20 5:37 ` Eric Dumazet
2012-09-20 17:10 ` Rick Jones
2012-09-20 17:43 ` Eric Dumazet
2012-09-20 18:37 ` Yuchung Cheng
2012-09-20 19:40 ` David Miller
2012-09-20 20:06 ` Rick Jones
2012-09-20 20:25 ` Eric Dumazet
2012-09-21 15:48 ` Eric Dumazet
2012-09-21 16:27 ` David Miller
2012-09-21 16:51 ` Eric Dumazet
2012-09-21 17:04 ` David Miller
2012-09-21 17:11 ` Eric Dumazet
2012-09-23 12:47 ` Jan Engelhardt
2012-09-23 16:16 ` David Miller
2012-09-23 17:40 ` Jan Engelhardt
2012-09-23 18:13 ` Eric Dumazet
2012-09-23 18:27 ` David Miller
2012-09-20 21:39 ` Vijay Subramanian
2012-09-20 22:01 ` Rick Jones
2012-11-15 7:52 ` Yan, Zheng
2012-11-15 13:06 ` Eric Dumazet
2012-11-16 2:36 ` Yan, Zheng
2012-11-15 13:47 ` Eric Dumazet
2012-11-21 8:05 ` Yan, Zheng
2012-11-15 18:33 ` Rick Jones [this message]