Re: Performance regression from switching lock to rw-sem for anon-vma tree


From: Ingo Molnar <mingo@kernel.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "Shi, Alex" <alex.shi@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Thu, 27 Jun 2013 10:36:51 +0200
Message-ID: <20130627083651.GA3730@gmail.com>
In-Reply-To: <1372292701.22432.152.camel@schen9-DESK>


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> On Wed, 2013-06-26 at 14:36 -0700, Tim Chen wrote:
> > On Wed, 2013-06-26 at 11:51 +0200, Ingo Molnar wrote: 
> > > * Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > 
> > > > On Wed, 2013-06-19 at 09:53 -0700, Tim Chen wrote: 
> > > > > On Wed, 2013-06-19 at 15:16 +0200, Ingo Molnar wrote:
> > > > > 
> > > > > > > vmstat for mutex implementation: 
> > > > > > > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
> > > > > > >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> > > > > > > 38  0      0 130957920  47860 199956    0    0     0    56 236342 476975 14 72 14  0  0
> > > > > > > 41  0      0 130938560  47860 219900    0    0     0     0 236816 479676 14 72 14  0  0
> > > > > > > 
> > > > > > > vmstat for rw-sem implementation (3.10-rc4)
> > > > > > > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
> > > > > > >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> > > > > > > 40  0      0 130933984  43232 202584    0    0     0     0 321817 690741 13 71 16  0  0
> > > > > > > 39  0      0 130913904  43232 224812    0    0     0     0 322193 692949 13 71 16  0  0
> > > > > > 
> > > > > > It appears the main difference is that the rwsem variant context-switches 
> > > > > > about 36% more than the mutex version, right?
> > > > > > 
> > > > > > I'm wondering how that's possible - the lock is mostly write-locked, 
> > > > > > correct? So the lock-stealing from Davidlohr Bueso and Michel Lespinasse 
> > > > > > ought to have brought roughly the same lock-stealing behavior as mutexes 
> > > > > > do, right?
> > > > > > 
> > > > > > So the next analytical step would be to figure out why rwsem lock-stealing 
> > > > > > is not behaving in an equivalent fashion on this workload. Do readers come 
> > > > > > in frequently enough to disrupt write-lock-stealing perhaps?
> > > > 
> > > > Ingo, 
> > > > 
> > > > I did some instrumentation on the write lock failure path.  I found that
> > > > for the exim workload, there are no readers blocking for the rwsem when
> > > > write locking failed.  The lock stealing is successful for 9.1% of the
> > > > time and the rest of the write lock failure caused the writer to go to
> > > > sleep.  About 1.4% of the writers sleep more than once. Majority of the
> > > > writers sleep once.
> > > > 
> > > > It is weird that lock stealing is not successful more often.
> > > 
> > > For this to be comparable to the mutex scalability numbers you'd have to 
> > > compare wlock-stealing _and_ adaptive spinning for failed-wlock rwsems.
> > > 
> > > Are both techniques applied in the kernel you are running your tests on?
> > > 
> > 
> > Ingo,
> > 
> > The previous experiment was done on a kernel without spinning. 
> > I've redone the testing on two kernels for a 15 sec stretch of the
> > workload run.  One with the adaptive (or optimistic) 
> > spinning and the other without.  Both have the patches from Alex to avoid 
> > cmpxchg induced cache bouncing.
> > 
> > With the spinning, I sleep much less for lock acquisition (18.6% vs 91.58%).
> > However, I've got doubling of write lock acquisition getting
> > blocked.  So that offset the gain from spinning which may be why
> > I didn't see gain for this particular workload.
> > 
> >                                               No Opt Spin   Opt Spin
> > Writer acquisition blocked count              3448946       7359040
> > Blocked by reader                             0.00%         0.55%
> > Lock acquired first attempt (lock stealing)   8.42%         16.92%
> > Lock acquired second attempt (1 sleep)        90.26%        17.60%
> > Lock acquired after more than 1 sleep         1.32%         1.00%
> > Lock acquired with optimistic spin            N/A           64.48%
> > 
> 
> Adding also the mutex statistics for the 3.10-rc4 kernel with the mutex
> implementation of the lock for the anon_vma tree.  Wonder if Ingo has any
> insight on why mutex performs better from these stats.
> 
> Mutex acquisition blocked count               14380340
> Lock acquired in slowpath (no sleep)          0.06%
> Lock acquired in slowpath (1 sleep)           0.24%
> Lock acquired in slowpath more than 1 sleep   0.98%
> Lock acquired with optimistic spin            99.6%

This is how I interpret the stats:

It does appear that in the mutex case we manage to acquire via spinning 
in a very high percentage of cases - i.e. it essentially behaves as a 
spinlock.

That is actually good news in a way, because it makes it rather simple to 
say how rwsems should behave in this case: since they have no substantial 
read-locking aspect in this workload, the down_write()/up_write()s should 
essentially behave like spinlocks as well, right?

Yet in the rwsem-spinning case the stats show that we only acquire the 
lock via spinning in about 64.5% of the cases. On top of that we 
lock-steal in 16.9% of the cases, and lock stealing is essentially 
single-iteration spinning as well:

> > Lock acquired first attempt (lock stealing)	......		16.92%

So rwsems in this case behave like spinlocks about 64.5% + 16.9% == 81.4% 
of the time.

What remains is the sleeping component:

> > Lock acquired second attempt (1 sleep)	......		17.60%

Yet the 17.6% sleep percentage is still much higher than the roughly 1.2% 
(0.24% + 0.98%) in the mutex case. Why doesn't spinning work - do we time 
out of spinning differently?
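
To make that comparison concrete, here is a small Python sketch (not part 
of the original measurements) that totals the non-sleeping and sleeping 
shares from the two tables quoted above. Counting the mutex "slowpath (no 
sleep)" bucket as spin-like is an assumption of the sketch, and all names 
in it are illustrative:

```python
# Shares (in %) copied from the quoted tables: rwsem "Opt Spin" column and mutex stats.
# Each entry is (label, percent, sleeps); the sleeps flag is this sketch's own
# classification, not something reported by the instrumentation.
RWSEM_OPT_SPIN = [
    ("lock stealing", 16.92, False),
    ("1 sleep", 17.60, True),
    ("more than 1 sleep", 1.00, True),
    ("optimistic spin", 64.48, False),
]
MUTEX = [
    ("slowpath, no sleep", 0.06, False),  # assumption: counted as spin-like
    ("slowpath, 1 sleep", 0.24, True),
    ("slowpath, more than 1 sleep", 0.98, True),
    ("optimistic spin", 99.6, False),
]


def split_shares(buckets):
    """Return (spin_like, sleeping) totals in percent, rounded to 2 decimals."""
    spin_like = sum(pct for _, pct, sleeps in buckets if not sleeps)
    sleeping = sum(pct for _, pct, sleeps in buckets if sleeps)
    return round(spin_like, 2), round(sleeping, 2)


rwsem_spin, rwsem_sleep = split_shares(RWSEM_OPT_SPIN)
mutex_spin, mutex_sleep = split_shares(MUTEX)
print(f"rwsem: {rwsem_spin}% spin-like, {rwsem_sleep}% sleeping")
print(f"mutex: {mutex_spin}% spin-like, {mutex_sleep}% sleeping")
print(f"rwsem sleeps about {rwsem_sleep / mutex_sleep:.1f}x as often as mutex")
```

With the unrounded 64.48% figure the spin-like share of the rwsem comes 
out at 81.4%, and its sleeping share (one sleep plus more than one sleep) 
at 18.6%, matching the 18.6% Tim quoted.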

Is there some other aspect that defeats optimistic spinning and forces the 
slowpath and creates sleeping, scheduling and thus extra overhead?

For example, after a failed lock-steal, do we still try optimistic 
spinning to write-acquire the rwsem, or do we go straight into the 
slowpath and thus trigger excessive context-switches?
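
The difference between those two orderings can be illustrated with a toy 
model in Python. This is only a sketch of the question, not the kernel's 
rwsem code: the fixed spin budget, the made-up hold times and every name 
below are hypothetical, and real optimistic spinning decides when to stop 
based on whether the lock owner is still running rather than on a fixed 
budget.

```python
def classify_attempt(hold_remaining, spin_budget, spin_after_failed_steal):
    """Classify one write-lock attempt in a toy model.

    hold_remaining: time units until the current owner releases (0 = free now).
    spin_budget: how long a waiter is willing to spin before giving up.
    spin_after_failed_steal: whether a failed steal is followed by spinning
        or goes straight to the sleeping slowpath.
    """
    if hold_remaining == 0:
        return "steal"   # lock is free: taken on the first attempt
    if spin_after_failed_steal and hold_remaining <= spin_budget:
        return "spin"    # owner releases within the spin window
    return "sleep"       # slowpath: queue up and context-switch


def tally(holds, spin_budget, spin_after_failed_steal):
    """Count how many attempts end in each outcome."""
    counts = {"steal": 0, "spin": 0, "sleep": 0}
    for hold in holds:
        outcome = classify_attempt(hold, spin_budget, spin_after_failed_steal)
        counts[outcome] += 1
    return counts


holds = [0, 1, 2, 3, 50]  # made-up remaining hold times, not measured data
print(tally(holds, spin_budget=5, spin_after_failed_steal=True))
print(tally(holds, spin_budget=5, spin_after_failed_steal=False))
```

In this model, skipping the spin after a failed steal turns every short 
wait into a sleep, which is the kind of behaviour that would show up as 
extra context switches.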

Thanks,

	Ingo
