linux-mm.kvack.org archive mirror
* NUMA? bisected performance regression 3.11->3.12
@ 2013-11-21 22:57 Dave Hansen
  2013-11-22  5:22 ` Johannes Weiner
  2013-11-26 10:32 ` Mel Gorman
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2013-11-21 22:57 UTC
  To: Johannes Weiner, Linus Torvalds
  Cc: Linux-MM, Mel Gorman, Rik van Riel, Kevin Hilman,
	Andrea Arcangeli, Paul Bolle, Zlatko Calusic, Andrew Morton,
	Tim Chen, Andi Kleen

Hey Johannes,

I'm running an open/close microbenchmark from the will-it-scale set:
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c
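
For readers who do not want to open the link, the shape of such a test is roughly the loop below. This is only a sketch in Python, not the will-it-scale source (which is C); the function names, the per-worker temporary file, and the time-based stop condition are assumptions made for illustration, and the interpreter overhead means the absolute numbers are not comparable to the ones in this thread.

```python
import os
import tempfile
import time


def open_close_loop(path, duration_s=1.0):
    """Open and close `path` repeatedly for about `duration_s` seconds.

    Returns the number of completed open/close pairs.
    """
    iterations = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        fd = os.open(path, os.O_RDWR)
        os.close(fd)
        iterations += 1
    return iterations


def run_once(duration_s=1.0):
    """Create a private temporary file, run the loop on it, then clean up."""
    # Assumption: each worker uses its own file, so workers share no file state.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return open_close_loop(path, duration_s)
    finally:
        os.unlink(path)
```

Calling `run_once()` in one process per hardware thread and summing the returned counts would give an aggregate figure of the same kind as the throughput numbers quoted here.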

I was seeing some weird symptoms on 3.12 vs 3.11.  The throughput in
that test went down from 50 million to 35 million.

The profiles show an increase in CPU time in _raw_spin_lock_irq.  The
profiles pointed to slub code that hasn't been touched in quite a while.
I bisected it down to:

81c0a2bb515fd4daae8cab64352877480792b515 is the first bad commit
commit 81c0a2bb515fd4daae8cab64352877480792b515
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Wed Sep 11 14:20:47 2013 -0700

Which also seems a bit weird, but I've tested with this and its
preceding commit enough times to be fairly sure that I did it right.
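
The search that a bisection performs, and the repeated testing needed when the measurement is noisy, can be sketched as follows. This is a simplified illustration and not what git does internally: it assumes a linear list of commits with a single good-to-bad transition, whereas git bisect works on the commit graph. The names `first_bad` and `looks_bad` and the threshold value are hypothetical.

```python
import statistics


def looks_bad(samples, threshold):
    """Classify one commit from several throughput samples.

    Using the median of repeated runs reduces the chance that a single
    noisy run sends the bisection down the wrong half.
    """
    return statistics.median(samples) < threshold


def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With a threshold somewhere between the two throughput levels reported above (42.5 million is one arbitrary choice), `looks_bad` would serve as the `is_bad` predicate once it is wired to a routine that builds, boots, and benchmarks each commit.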

__slab_free() and free_one_page() both seem to be spending more time
spinning on their respective spinlocks, even though the throughput went
down and we should have been doing fewer actual allocations/frees.  The
best explanation for this would be if CPUs are tending to go after and
contending for remote cachelines more often once this patch is applied.

Any ideas?

It's an 8-socket/160-thread (one NUMA node per socket) system that is not
under memory pressure during the test.  The latencies are also such that
vm.zone_reclaim_mode=0.
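
Anyone trying to reproduce this on another machine may want to confirm the same setting first. A minimal sketch for reading it, assuming a Linux kernel built with NUMA support so that the sysctl is exposed under /proc/sys/vm (the helper name is made up for this example):

```python
from pathlib import Path


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the integer value of the sysctl, or None if it is unavailable.

    None covers both a missing file (for example a kernel without NUMA
    support) and unexpected, non-numeric content.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

A return value of 0 matches the configuration described above; any other value means zone reclaim is enabled in some form and the results may not be comparable.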

Raw perf profiles and .config are in here:
http://www.sr71.net/~dave/intel/201311-wisregress0/

Here's a chunk of the 'perf diff':
>     17.65%   +3.47%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave           
>     13.80%   -0.31%  [kernel.kallsyms]  [k] _raw_spin_lock                   
>      7.21%   -0.51%  [unknown]          [.] 0x00007f7849058640               
>      3.43%   +0.15%  [kernel.kallsyms]  [k] setup_object                     
>      2.99%   -0.31%  [kernel.kallsyms]  [k] file_free_rcu                    
>      2.71%   -0.13%  [kernel.kallsyms]  [k] rcu_process_callbacks            
>      2.26%   -0.09%  [kernel.kallsyms]  [k] get_empty_filp                   
>      2.06%   -0.09%  [kernel.kallsyms]  [k] kmem_cache_alloc                 
>      1.65%   -0.08%  [kernel.kallsyms]  [k] link_path_walk                   
>      1.53%   -0.08%  [kernel.kallsyms]  [k] memset                           
>      1.46%   -0.09%  [kernel.kallsyms]  [k] do_dentry_open                   
>      1.44%   -0.04%  [kernel.kallsyms]  [k] __d_lookup_rcu                   
>      1.27%   -0.04%  [kernel.kallsyms]  [k] do_last                          
>      1.18%   -0.04%  [kernel.kallsyms]  [k] ext4_release_file                
>      1.16%   -0.04%  [kernel.kallsyms]  [k] __call_rcu.constprop.11          
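
To pick out the symbols whose share grew the most, the listing above can be parsed with a few lines of script. This is a sketch that assumes the column layout shown here (baseline percentage, delta, shared object, symbol type, symbol) and tolerates the leading "> " quote marker; other perf versions or options may print different columns, and the function names are invented for this example.

```python
import re

# One row of the listing: "17.65%  +3.47%  [kernel.kallsyms]  [k] symbol"
LINE_RE = re.compile(
    r"^\s*(?:>\s*)?"                      # optional email quote marker
    r"(?P<baseline>\d+(?:\.\d+)?)%\s+"    # share in the baseline profile
    r"(?P<delta>[+-]\d+(?:\.\d+)?)%\s+"   # change relative to the baseline
    r"(?P<dso>\S+)\s+"                    # shared object, e.g. [kernel.kallsyms]
    r"\[(?P<kind>.)\]\s+"                 # symbol type, e.g. k for kernel
    r"(?P<symbol>.+?)\s*$"                # symbol name, trailing spaces dropped
)


def parse_perf_diff(text):
    """Return a list of dicts, one per line that matches the layout above."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip headers, blank lines, and anything unrecognized
        rows.append({
            "baseline": float(m.group("baseline")),
            "delta": float(m.group("delta")),
            "dso": m.group("dso"),
            "kind": m.group("kind"),
            "symbol": m.group("symbol"),
        })
    return rows


def largest_increases(rows, n=3):
    """Return the n rows with the largest positive-going delta."""
    return sorted(rows, key=lambda r: r["delta"], reverse=True)[:n]
```

Feeding the quoted chunk to `parse_perf_diff` and then `largest_increases` puts `_raw_spin_lock_irqsave` first, which matches what is visible by eye in the listing.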



Thread overview: 7+ messages
2013-11-21 22:57 NUMA? bisected performance regression 3.11->3.12 Dave Hansen
2013-11-22  5:22 ` Johannes Weiner
2013-11-22  6:18   ` Dave Hansen
2013-11-22  6:38     ` Johannes Weiner
2013-11-22 16:57       ` Dave Hansen
2013-11-26 10:32 ` Mel Gorman
2013-12-06 17:43   ` Dave Hansen
