Re: Performance regression from switching lock to rw-sem for anon-vma tree

From: Ingo Molnar <mingo@kernel.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "Shi, Alex" <alex.shi@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Wed, 19 Jun 2013 15:16:12 +0200
Message-ID: <20130619131611.GC24957@gmail.com>
In-Reply-To: <1371165992.27102.573.camel@schen9-DESK>


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> Ingo,
> 
> At the time of switching the anon-vma tree's lock from mutex to 
> rw-sem (commit 5a505085), we encountered regressions for fork-heavy workloads. 
> A lot of optimizations to rw-sem (e.g. lock stealing) helped to 
> mitigate the problem.  I tried an experiment on the 3.10-rc4 kernel 
> to compare the performance of rw-sem to one that uses mutex. I saw 
> an 8% regression in throughput for rw-sem vs a mutex implementation in
> 3.10-rc4.
> 
> For the experiments, I used the exim mail server workload in 
> the MOSBENCH test suite on a 4-socket Westmere machine and a 4-socket 
> Ivy Bridge machine, with the number of clients sending mail equal 
> to the number of cores.  The mail server will
> fork off a process to handle each incoming mail and put it into the mail
> spool. The lock protecting the anon-vma tree is stressed due to
> heavy forking. On both machines, I saw that the mutex implementation 
> has 8% more throughput.  I've pinned the cpu frequency to maximum
> in the experiments.
> 
> I've tried two separate tweaks to the rw-sem on 3.10-rc4.  I've tested 
> each tweak individually.
> 
> 1) Add an owner field when a writer holds the lock and introduce 
> optimistic spinning when an active writer is holding the semaphore.  
> It reduced the context switching by 30% to a level very close to the
> mutex implementation.  However, I did not see any throughput improvement
> of exim.
> 
> 2) When the sem->count's active field is non-zero (i.e. someone
> is holding the lock), we can skip directly to the down_write_failed
> path, without adding the RWSEM_DOWN_WRITE_BIAS and taking
> it off again from sem->count, saving us two atomic operations.
> Since we will try the lock stealing again later, this should be okay.
> Unfortunately it did not improve the exim workload either.  
> 
> Any suggestions on the difference between rwsem and mutex performance
> and possible improvements to recover this regression?
> 
> Thanks.
> 
> Tim
> 
> vmstat for mutex implementation: 
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 38  0      0 130957920  47860 199956    0    0     0    56 236342 476975 14 72 14  0  0
> 41  0      0 130938560  47860 219900    0    0     0     0 236816 479676 14 72 14  0  0
> 
> vmstat for rw-sem implementation (3.10-rc4)
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 40  0      0 130933984  43232 202584    0    0     0     0 321817 690741 13 71 16  0  0
> 39  0      0 130913904  43232 224812    0    0     0     0 322193 692949 13 71 16  0  0

It appears the main difference is that the rwsem variant context-switches 
about 45% more than the mutex version (and takes about 36% more 
interrupts), right?
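
As a rough cross-check of the vmstat samples quoted above, the sketch below averages the two samples per kernel and compares the `in` (interrupts) and `cs` (context switches) columns. The helper names are made up for illustration. The numbers are copied from the quoted output. With only two samples per kernel, treat the result as a ballpark figure rather than a measurement.

```python
# vmstat samples copied from the quoted report, as (in, cs) pairs:
# interrupts per second and context switches per second.
MUTEX = [(236342, 476975), (236816, 479676)]
RWSEM = [(321817, 690741), (322193, 692949)]


def mean_column(samples, index):
    """Average one vmstat column over the available samples."""
    return sum(row[index] for row in samples) / len(samples)


def percent_increase(baseline, value):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Compare the rwsem kernel against the mutex kernel, column by column.
in_increase = percent_increase(mean_column(MUTEX, 0), mean_column(RWSEM, 0))
cs_increase = percent_increase(mean_column(MUTEX, 1), mean_column(RWSEM, 1))

print(f"interrupts (in):       +{in_increase:.1f}%")
print(f"context switches (cs): +{cs_increase:.1f}%")
```

On these samples the `cs` column comes out roughly 45% higher for rwsem, and the `in` column roughly 36% higher.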

I'm wondering how that's possible - the lock is mostly write-locked, 
correct? So the lock-stealing from Davidlohr Bueso and Michel Lespinasse 
ought to have brought roughly the same lock-stealing behavior as mutexes 
do, right?

So the next analytical step would be to figure out why rwsem lock-stealing 
is not behaving in an equivalent fashion on this workload. Do readers come 
in frequently enough to disrupt write-lock-stealing perhaps?

Context-switch call-graph profiling might shed some light on where the 
extra context switches come from...

Something like:

  perf record -g -e sched:sched_switch --filter 'prev_state != 0' -a sleep 1

or a variant thereof might do the trick.

Thanks,

	Ingo
