All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>, Kevin Hilman <khilman@linaro.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Paul Bolle <paul.bollee@gmail.com>,
	Zlatko Calusic <zcalusic@bitsync.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: NUMA? bisected performance regression 3.11->3.12
Date: Fri, 22 Nov 2013 00:22:19 -0500	[thread overview]
Message-ID: <20131122052219.GL3556@cmpxchg.org> (raw)
In-Reply-To: <528E8FCE.1000707@intel.com>

Hi Dave,

On Thu, Nov 21, 2013 at 02:57:18PM -0800, Dave Hansen wrote:
> Hey Johannes,
> 
> I'm running an open/close microbenchmark from the will-it-scale set:
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c
> 
> I was seeing some weird symptoms on 3.12 vs 3.11.  The throughput in
> that test was going from down from 50 million to 35 million.
> 
> The profiles show an increase in cpu time in _raw_spin_lock_irq.  The
> profiles pointed to slub code that hasn't been touched in quite a while.
>  I bisected it down to:
> 
> 81c0a2bb515fd4daae8cab64352877480792b515 is the first bad commit
> commit 81c0a2bb515fd4daae8cab64352877480792b515
> Author: Johannes Weiner <hannes@cmpxchg.org>
> Date:   Wed Sep 11 14:20:47 2013 -0700
> 
> Which also seems a bit weird, but I've tested with this and its
> preceding commit enough times to be fairly sure that I did it right.
> 
> __slab_free() and free_one_page() both seem to be spending more time
> spinning on their respective spinlocks, even though the throughput went
> down and we should have been doing fewer actual allocations/frees.  The
> best explanation for this would be if CPUs are tending to go after and
> contending for remote cachelines more often once this patch is applied.
> 
> Any ideas?
> 
> It's a 8-socket/160-thread (one NUMA node per socket) system that is not
> under memory pressure during the test.  The latencies are also such that
> vm.zone_reclaim_mode=0.

The change will definitely spread allocations out to all nodes then
and it's plausible that the remote references will hurt kernel object
allocations in a tight loop.  Just to confirm, could you rerun the
test with zone_reclaim_mode enabled to make the allocator stay in the
local zones?

The fairness code was written for reclaimable memory, which is
longer-lived, and the only memory where it matters.  I might have to
be bypass it for unreclaimable allocations...

> Raw perf profiles and .config are in here:
> http://www.sr71.net/~dave/intel/201311-wisregress0/
> 
> Here's a chunk of the 'perf diff':
> >     17.65%   +3.47%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave           
> >     13.80%   -0.31%  [kernel.kallsyms]  [k] _raw_spin_lock                   
> >      7.21%   -0.51%  [unknown]          [.] 0x00007f7849058640               
> >      3.43%   +0.15%  [kernel.kallsyms]  [k] setup_object                     
> >      2.99%   -0.31%  [kernel.kallsyms]  [k] file_free_rcu                    
> >      2.71%   -0.13%  [kernel.kallsyms]  [k] rcu_process_callbacks            
> >      2.26%   -0.09%  [kernel.kallsyms]  [k] get_empty_filp                   
> >      2.06%   -0.09%  [kernel.kallsyms]  [k] kmem_cache_alloc                 
> >      1.65%   -0.08%  [kernel.kallsyms]  [k] link_path_walk                   
> >      1.53%   -0.08%  [kernel.kallsyms]  [k] memset                           
> >      1.46%   -0.09%  [kernel.kallsyms]  [k] do_dentry_open                   
> >      1.44%   -0.04%  [kernel.kallsyms]  [k] __d_lookup_rcu                   
> >      1.27%   -0.04%  [kernel.kallsyms]  [k] do_last                          
> >      1.18%   -0.04%  [kernel.kallsyms]  [k] ext4_release_file                
> >      1.16%   -0.04%  [kernel.kallsyms]  [k] __call_rcu.constprop.11          

Thanks for the detailed report.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-11-22  5:22 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-21 22:57 NUMA? bisected performance regression 3.11->3.12 Dave Hansen
2013-11-22  5:22 ` Johannes Weiner [this message]
2013-11-22  6:18   ` Dave Hansen
2013-11-22  6:38     ` Johannes Weiner
2013-11-22 16:57       ` Dave Hansen
2013-11-26 10:32 ` Mel Gorman
2013-12-06 17:43   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131122052219.GL3556@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=khilman@linaro.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=paul.bollee@gmail.com \
    --cc=riel@redhat.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=zcalusic@bitsync.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.