From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
netdev <netdev@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
alex.shi@intel.com,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Ma, Ling" <ling.ma@intel.com>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: hackbench regression due to commit 9dfc6e68bfe6e
Date: Wed, 07 Apr 2010 17:07:47 +0800 [thread overview]
Message-ID: <1270631267.2078.380.camel@ymzhang.sh.intel.com> (raw)
In-Reply-To: <1270622352.2091.702.camel@edumazet-laptop>
On Wed, 2010-04-07 at 08:39 +0200, Eric Dumazet wrote:
> Le mercredi 07 avril 2010 à 10:34 +0800, Zhang, Yanmin a écrit :
>
> > I collected retired instruction, dtlb miss and LLC miss.
> > Below is data of LLC miss.
> >
> > Kernel 2.6.33:
> > # Samples: 11639436896 LLC-load-misses
> > #
> > # Overhead Command Shared Object Symbol
> > # ........ ............... ...................................................... ......
> > #
> > 20.94% hackbench [kernel.kallsyms] [k] copy_user_generic_string
> > 14.56% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
> > 12.88% hackbench [kernel.kallsyms] [k] kfree
> > 7.37% hackbench [kernel.kallsyms] [k] kmem_cache_free
> > 7.18% hackbench [kernel.kallsyms] [k] kmem_cache_alloc_node
> > 6.78% hackbench [kernel.kallsyms] [k] kfree_skb
> > 6.27% hackbench [kernel.kallsyms] [k] __kmalloc_node_track_caller
> > 2.73% hackbench [kernel.kallsyms] [k] __slab_free
> > 2.21% hackbench [kernel.kallsyms] [k] get_partial_node
> > 2.01% hackbench [kernel.kallsyms] [k] _raw_spin_lock
> > 1.59% hackbench [kernel.kallsyms] [k] schedule
> > 1.27% hackbench hackbench [.] receiver
> > 0.99% hackbench libpthread-2.9.so [.] __read
> > 0.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
> >
> >
> >
> >
> > Kernel 2.6.34-rc3:
> > # Samples: 13079611308 LLC-load-misses
> > #
> > # Overhead Command Shared Object Symbol
> > # ........ ............... .................................................................... ......
> > #
> > 18.55% hackbench [kernel.kallsyms] [k] copy_user_generic_str
> > ing
> > 13.19% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
> > 11.62% hackbench [kernel.kallsyms] [k] kfree
> > 8.54% hackbench [kernel.kallsyms] [k] kmem_cache_free
> > 7.88% hackbench [kernel.kallsyms] [k] __kmalloc_node_track_
> > caller
> > 6.54% hackbench [kernel.kallsyms] [k] kmem_cache_alloc_node
> > 5.94% hackbench [kernel.kallsyms] [k] kfree_skb
> > 3.48% hackbench [kernel.kallsyms] [k] __slab_free
> > 2.15% hackbench [kernel.kallsyms] [k] _raw_spin_lock
> > 1.83% hackbench [kernel.kallsyms] [k] schedule
> > 1.82% hackbench [kernel.kallsyms] [k] get_partial_node
> > 1.59% hackbench hackbench [.] receiver
> > 1.37% hackbench libpthread-2.9.so [.] __read
> >
> >
>
> Please check values of /proc/sys/net/core/rmem_default
> and /proc/sys/net/core/wmem_default on your machines.
>
> Their values can also change hackbench results, because increasing
> wmem_default allows af_unix senders to consume much more skbs and stress
> slab allocators (__slab_free), way beyond slub_min_order can tune them.
>
> When 2000 senders are running (and 2000 receivers), we might consume
> something like 2000 * 100.000 bytes of kernel memory for skbs. TLB
> trashing is expected, because all these skbs can span many 2MB pages.
> Maybe some node imbalance happens too.
It's a good pointer. rmem_default and wmem_default are about 116k on my machine.
I changed them to 52K and it seems there is no improvement.
>
>
>
> You could try to boot your machine with less ram per node and check :
>
> # cat /proc/buddyinfo
> Node 0, zone DMA 2 1 2 2 1 1 1 0 1 1 3
> Node 0, zone DMA32 219 298 143 584 145 57 44 41 31 26 517
> Node 1, zone DMA32 4 1 17 1 0 3 2 2 2 2 123
> Node 1, zone Normal 126 169 83 8 7 5 59 59 49 28 459
>
>
> One experiment on your Nehalem machine would be to change hackbench so
> that each group (20 senders/ 20 receivers) run on a particular NUMA
> node.
I expect process scheduler to work well in scheduling different groups
to different nodes.
I suspected dynamic percpu data didn't take care of NUMA, but kernel dump shows
it does take care of NUMA.
>
> x86info -c ->
>
> CPU #1
> EFamily: 0 EModel: 1 Family: 6 Model: 26 Stepping: 5
> CPU Model: Core i7 (Nehalem)
> Processor name string: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
> Type: 0 (Original OEM) Brand: 0 (Unsupported)
> Number of cores per physical package=8
> Number of logical processors per socket=16
> Number of logical processors per core=2
> APIC ID: 0x10 Package: 0 Core: 1 SMT ID 0
> Cache info
> L1 Instruction cache: 32KB, 4-way associative. 64 byte line size.
> L1 Data cache: 32KB, 8-way associative. 64 byte line size.
> L2 (MLC): 256KB, 8-way associative. 64 byte line size.
> TLB info
> Data TLB: 4KB pages, 4-way associative, 64 entries
> 64 byte prefetching.
> Found unknown cache descriptors: 55 5a b2 ca e4
>
>
next prev parent reply other threads:[~2010-04-07 9:05 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1269506457.4513.141.camel@alexs-hp.sh.intel.com>
[not found] ` <alpine.DEB.2.00.1003250942080.2670@router.home>
[not found] ` <1269570902.9614.92.camel@alexs-hp.sh.intel.com>
[not found] ` <1270114166.2078.107.camel@ymzhang.sh.intel.com>
[not found] ` <alpine.DEB.2.00.1004011050340.16531@router.home>
[not found] ` <1270195589.2078.116.camel@ymzhang.sh.intel.com>
[not found] ` <alpine.DEB.2.00.1004050853300.23149@router.home>
[not found] ` <i2z84144f021004051030k7ff5190cyc083aa12c552dfac@mail.gmail.com>
[not found] ` <4BBA8DF9.8010409@kernel.org>
[not found] ` <1270542497.2078.123.camel@ymzhang.sh.intel.com>
[not found] ` <alpine.DEB.2.00.1004061033330.18750@router.home>
[not found] ` <alpine.DEB.2.00.1004061552500.19151@router.home>
2010-04-06 22:10 ` hackbench regression due to commit 9dfc6e68bfe6e Eric Dumazet
2010-04-07 2:34 ` Zhang, Yanmin
2010-04-07 6:39 ` Eric Dumazet
2010-04-07 9:07 ` Zhang, Yanmin [this message]
2010-04-07 9:20 ` Eric Dumazet
2010-04-07 10:47 ` Pekka Enberg
2010-04-07 16:30 ` Christoph Lameter
2010-04-07 16:43 ` Christoph Lameter
2010-04-07 16:49 ` Pekka Enberg
2010-04-07 16:52 ` Pekka Enberg
2010-04-07 18:20 ` Christoph Lameter
2010-04-07 18:25 ` Pekka Enberg
2010-04-07 19:30 ` Christoph Lameter
2010-04-07 18:38 ` Eric Dumazet
2010-04-08 1:05 ` Zhang, Yanmin
2010-04-08 4:59 ` Eric Dumazet
2010-04-08 5:39 ` Eric Dumazet
2010-04-08 7:00 ` Eric Dumazet
2010-04-08 7:05 ` David Miller
2010-04-08 7:20 ` David Miller
2010-04-08 7:25 ` Eric Dumazet
2010-04-08 7:54 ` Zhang, Yanmin
2010-04-08 7:54 ` Eric Dumazet
2010-04-08 8:09 ` Eric Dumazet
2010-04-08 15:34 ` Christoph Lameter
2010-04-08 15:52 ` Eric Dumazet
2010-04-07 18:18 ` Christoph Lameter
2010-04-08 7:18 ` Zhang, Yanmin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1270631267.2078.380.camel@ymzhang.sh.intel.com \
--to=yanmin_zhang@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@intel.com \
--cc=cl@linux-foundation.org \
--cc=eric.dumazet@gmail.com \
--cc=ling.ma@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=penberg@cs.helsinki.fi \
--cc=tim.c.chen@intel.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).