From: Eric Dumazet <eric.dumazet@gmail.com>
To: "Yan, Zheng" <yanzheng@21cn.com>
Cc: netdev <netdev@vger.kernel.org>
Subject: Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
Date: Thu, 15 Nov 2012 05:06:01 -0800
Message-ID: <1352984761.4497.22.camel@edumazet-glaptop>
In-Reply-To: <CAAM7YAm+XJKTxJStLMoo4RRV85oogN5wCHfXorkEiUHKqNKHDQ@mail.gmail.com>

On Thu, 2012-11-15 at 15:52 +0800, Yan, Zheng wrote:
> On Mon, Sep 17, 2012 at 3:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > We currently use per socket page reserve for tcp_sendmsg() operations.
> >
> > It's done to raise the probability of coalescing small write() calls into
> > single segments in the skbs.
> >
> > But it wastes a lot of memory for applications handling a lot of mostly
> > idle sockets, since each socket holds one page in sk->sk_sndmsg_page
> >
> > I did a small experiment to use order-3 pages and it gave me a 10% boost
> > of performance, because each TSO skb can use only two frags of 32KB,
> > instead of 16 frags of 4KB, so we spend less time in ndo_start_xmit() to
> > setup the tx descriptor and TX completion path to unmap the frags and
> > free them.
> >
> > We also spend less time in tcp_sendmsg(), because we call page allocator
> > 8x less often.
> >
> > Now back to the per socket page, what about trying to factorize it ?
> >
> > Since we can sleep (and/or do a cpu migration) in tcp_sendmsg(), we can't
> > really use a percpu page reserve as we do in __netdev_alloc_frag()
> >
> > We could instead use a per thread reserve, at the cost of adding a test
> > in task exit handler.
> >
> > Recap :
> >
> > 1) Use a per thread page reserve instead of a per socket one
> > 2) Use order-3 pages (or order-0 pages if page size is >= 32768)
> >
> >
> 
> Hi,
> 
> This commit makes one of our test cases on a Core 2 machine drop in performance
> by about 60%. The test case runs 2048 instances of the netperf 64k stream test at
> the same time. Analysis showed that using order-3 pages causes more LLC misses;
> most of the new LLC misses happen when the senders copy data to the socket buffer.
> If we revert to using a single page, the sender side only triggers a few LLC
> misses, and most LLC misses happen on the receiver side. It means most pages
> allocated by the senders are cache hot. But when using order-3 pages,
> 2048 * 32k = 64M, and 64M is much larger than the LLC size. Should we be worried
> about this regression, or is our test case too impractical?

Hi Yan

You forgot to give some basic information with this mail, like the
hardware configuration, NIC driver, ...

Increasing performance can sometimes change the balance you had on a
prior workload.

The number of in-flight bytes does not depend on the order of the pages,
but on the sizes of the TCP buffers (receiver, sender).

TCP Small Queues was an attempt to reduce the number of in-flight bytes.
You should try to change either the SO_SNDBUF or SO_RCVBUF settings
(instead of letting the system autotune them) if you really need 2048
concurrent flows.
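
As a rough illustration of pinning the buffer sizes from user space, here
is a minimal Python sketch. The 64 KiB values and the helper names are
assumptions chosen for the example, not a recommendation. On Linux, setting
these options explicitly turns off autotuning for that direction, and the
kernel doubles the requested value and caps it with net.core.wmem_max /
net.core.rmem_max, so the value read back usually differs from the request.

```python
import socket

# Hypothetical sizes for illustration only; tune for your own workload.
SNDBUF_BYTES = 64 * 1024
RCVBUF_BYTES = 64 * 1024


def make_fixed_buffer_socket(sndbuf=SNDBUF_BYTES, rcvbuf=RCVBUF_BYTES):
    """Create a TCP socket with explicit send/receive buffer sizes.

    The options are set before connect()/listen() so that they apply
    from the start of the connection.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return s


def effective_buffers(s):
    """Return the (sndbuf, rcvbuf) sizes the kernel actually applied."""
    snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


if __name__ == "__main__":
    sock = make_fixed_buffer_socket()
    print("effective SO_SNDBUF, SO_RCVBUF:", effective_buffers(sock))
    sock.close()
```

With netperf the same effect can normally be obtained from its own socket
buffer options, without touching code.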

Otherwise, each flow can consume up to 6 MB of memory, so obviously your
cpu caches won't hold 2048*6MB of memory...
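
To put numbers on this, the following small Python sketch compares the two
footprints discussed in this thread: one order-3 page per sender (the
2048 * 32k figure) against the worst case of 6 MB per flow. The LLC size
is an assumption made up for the example, since the hardware was not
described; replace it with the real value.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

FLOWS = 2048                # concurrent netperf flows, from the report
ORDER3_PAGE = 32 * KIB      # one order-3 page with 4 KiB base pages
PER_FLOW_MAX = 6 * MIB      # worst-case memory per flow quoted above
ASSUMED_LLC = 6 * MIB       # hypothetical LLC size, not from the report


def footprint(flows, bytes_per_flow):
    """Total bytes touched if every flow uses bytes_per_flow."""
    return flows * bytes_per_flow


page_reserve = footprint(FLOWS, ORDER3_PAGE)
buffer_worst_case = footprint(FLOWS, PER_FLOW_MAX)

if __name__ == "__main__":
    print("order-3 page reserves:", page_reserve // MIB, "MiB")
    print("worst-case TCP buffers:", buffer_worst_case // GIB, "GiB")
    print("page reserves / assumed LLC:", page_reserve / ASSUMED_LLC)
    print("TCP buffers / assumed LLC:", buffer_worst_case / ASSUMED_LLC)
```

Both totals exceed any plausible LLC, but the buffer term is 192 times
larger than the page reserve term, which is why the buffer sizes are the
first thing to look at.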

If the sender is faster (because of this commit), but receiver is slow
to drain the receive queues, then you can have a situation where the
consumed memory on receiver is higher and the receiver might be actually
slower.

Thanks
