Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable


From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, David Howells <dhowells@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
Date: Thu, 31 Jan 2013 18:09:27 +0800
Message-ID: <20130131100927.GY12678@yliu-dev.sh.intel.com>
In-Reply-To: <20130131093931.GA398@gmail.com>

On Thu, Jan 31, 2013 at 10:39:31AM +0100, Ingo Molnar wrote:
> 
> * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> > We (the Linux Kernel Performance project) found a regression introduced
> > by commit 5a50508, which simply converts all mutex locks to rwsem write
> > locks. The semantics are the same, but the performance difference is
> > huge in some cases. After investigating, we found the root cause: mutexes
> > support lock stealing. Here is the link to the detailed regression report:
> >     https://lkml.org/lkml/2013/1/29/84
> > 
> > Ingo suggested adding write lock stealing to rwsem as well:
> >     "I think we should allow lock-steal between rwsem writers - that
> >      will not hurt fairness as most rwsem fairness concerns relate to
> >      reader vs. writer fairness"
> > 
> > I tried it with rwsem-spinlock first, as I found it much easier to
> > implement there than in lib/rwsem.c, and I am sending this patch out
> > first for comments. I'll try lib/rwsem.c later once you are OK with
> > the rwsem-spinlock change.
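A toy, single-threaded C model may help show why write lock stealing cuts context switches. It is a hypothetical sketch, not this patch and not lib/rwsem-spinlock.c: `struct toy_rwsem` and its functions are made-up names, readers and real sleeping/waking are left out, and it assumes the non-stealing path hands the lock directly to the first sleeping writer on release, while the stealing path only wakes that writer and lets it retry.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, NOT kernel code.
 * activity: 0 = free, >0 = number of readers, -1 = held by a writer.
 */
struct toy_rwsem {
	int activity;
	int sleeping_writers;	/* writers queued on the wait list */
	bool allow_steal;	/* false: FIFO hand-off, true: writers may steal */
};

/* A writer that is currently running tries to take the lock without sleeping. */
bool toy_write_trylock(struct toy_rwsem *s)
{
	if (s->activity != 0)
		return false;
	/* With strict hand-off, a newcomer must queue behind sleeping writers. */
	if (!s->allow_steal && s->sleeping_writers > 0)
		return false;
	s->activity = -1;
	return true;
}

/* The current writer releases the lock. */
void toy_up_write(struct toy_rwsem *s)
{
	assert(s->activity == -1);
	s->activity = 0;
	if (!s->allow_steal && s->sleeping_writers > 0) {
		/*
		 * Hand-off: the lock is granted to the first sleeper, which
		 * owns it before it has even been scheduled back in.
		 */
		s->sleeping_writers--;
		s->activity = -1;
	}
	/*
	 * With stealing, the sleeper is merely woken (still counted here until
	 * it retries), and the lock stays free for whichever writer gets there
	 * first.
	 */
}
```

In the hand-off case the lock belongs to a writer that still has to be scheduled back in, so a writer that is already running must go to sleep as well; with stealing, the running writer takes the free lock immediately and the woken one simply retries.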
> > 
> > With this patch, performance doubled on one test box with the
> > following aim7 workfile:
> >     FILESIZE: 1M
> >     POOLSIZE: 10M
> >     10 fork_test
> > 
> > some /usr/bin/time output w/o patch      some /usr/bin/time output with patch
> > ----------------------------------------------------------------------------
> > Percent of CPU this job got: 369%        Percent of CPU this job got: 537%
> > Voluntary context switches: 640595016    Voluntary context switches: 157915561
> > ----------------------------------------------------------------------------
> > That is a 45% increase in CPU usage and about 3/4 fewer voluntary
> > context switches.
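The two headline figures follow directly from the /usr/bin/time numbers: 537 / 369 ≈ 1.455 (a 45.5% increase in CPU usage) and 157915561 / 640595016 ≈ 0.247 (about 75% fewer voluntary context switches). A trivial C helper for this relative-change calculation, purely illustrative (`percent_change` is a made-up name):

```c
#include <assert.h>

/* Relative change of `after` versus `before`, in percent (hypothetical helper). */
double percent_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For example, `percent_change(369.0, 537.0)` gives roughly +45.5, and `percent_change(640595016.0, 157915561.0)` gives roughly -75.3.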
> > 
> > 
> > Here is the .nr_running field for all CPUs from /proc/sched_debug:
> > 
> > output w/o this patch:
> > ----------------------
> > cpu 00:   0   0   ...   0   0   0   0   0   0   0   1   0   1 .... 0   0
> > cpu 01:   0   0   ...   1   0   0   0   0   0   1   1   0   1 .... 0   0
> > cpu 02:   0   0   ...   1   1   0   0   0   1   0   0   1   0 .... 1   1
> > cpu 03:   0   0   ...   0   1   0   0   0   1   1   0   1   1 .... 0   0
> > cpu 04:   0   1   ...   0   0   2   1   1   2   1   0   1   0 .... 1   0
> > cpu 05:   0   1   ...   0   0   2   1   1   2   1   1   1   1 .... 0   0
> > cpu 06:   0   0   ...   2   0   0   1   0   0   1   0   0   0 .... 0   0
> > cpu 07:   0   0   ...   2   0   0   0   1   0   1   1   0   0 .... 1   0
> > cpu 08:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
> > cpu 09:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
> > cpu 10:   0   0   ...   0   0   0   2   0   0   1   0   1   1 .... 1   2
> > cpu 11:   0   0   ...   0   0   0   2   2   0   1   0   1   0 .... 1   2
> > cpu 12:   0   0   ...   2   0   0   0   1   1   3   1   1   1 .... 1   0
> > cpu 13:   0   0   ...   2   0   0   0   1   1   3   1   1   0 .... 1   1
> > cpu 14:   0   0   ...   0   0   0   2   0   0   1   1   0   0 .... 1   0
> > cpu 15:   0   0   ...   1   0   0   2   0   0   1   1   0   0 .... 0   0
> > 
> > output with this patch:
> > -----------------------
> > cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
> > cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
> > cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
> > cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
> > cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
> > cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
> > cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
> > cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
> > cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
> > cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
> > cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
> > cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
> > ------------------------------------------------------------------------
> > As you can see, the CPUs are much busier with this patch.
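For anyone wanting to collect per-CPU .nr_running samples like the ones shown, here is a rough C sketch. It is a hypothetical helper, not the tool used for this report, and it assumes the sched_debug layout of kernels of this era: each CPU section starts with a line beginning `cpu#`, and the first `.nr_running` line after it is the runqueue's own count (per-class subsections printed later in the same CPU section may repeat the field, so they are skipped).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given a sched_debug dump in `text`, store the
 * runqueue .nr_running value of each "cpu#N" section in nr[] (-1 if the
 * field is missing). Returns the number of CPU sections seen, at most
 * max_cpus.
 */
int parse_nr_running(const char *text, int *nr, int max_cpus)
{
	int ncpu = 0;
	int have_value = 1;	/* nothing to fill before the first cpu# line */
	const char *line = text;

	assert(text && nr && max_cpus >= 0);

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (strncmp(line, "cpu#", 4) == 0) {
			if (ncpu == max_cpus)
				break;
			nr[ncpu++] = -1;	/* not found yet */
			have_value = 0;
		} else if (!have_value) {
			/* Only accept a match that lies inside the current line. */
			const char *key = strstr(line, ".nr_running");
			if (key && key < line + len) {
				const char *colon = memchr(key, ':', (size_t)(line + len - key));
				if (colon) {
					nr[ncpu - 1] = (int)strtol(colon + 1, NULL, 10);
					have_value = 1;
				}
			}
		}
		line = eol ? eol + 1 : NULL;
	}
	return ncpu;
}
```

Reading the file into a buffer is left to the caller; sampling it periodically and appending each result as a column gives a per-CPU table of the same shape.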
> 
> That looks really good - quite similar to how it behaved with 
> mutexes, right?

Yes :)

And the result is almost the same as with the mutex when
MUTEX_SPIN_ON_OWNER is disabled; that's why you saw so many processes
(about 100) queued on each CPU in my last report:
    https://lkml.org/lkml/2013/1/29/84

> 
> Does this recover most of the performance regression?

Yes, only a 10% gap remains. I guess that's because I used the generic
rwsem implementation (lib/rwsem-spinlock.c) rather than the XADD one
(lib/rwsem.c). The gap may shrink a little if we do the same thing in
lib/rwsem.c.


Thanks.

	--yliu
