public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] rwsem-spinlock: let rwsem write lock stealable
@ 2013-01-30  9:14 Yuanhan Liu
  2013-01-31  9:39 ` Ingo Molnar
  2013-01-31 11:57 ` Michel Lespinasse
  0 siblings, 2 replies; 12+ messages in thread
From: Yuanhan Liu @ 2013-01-30  9:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Yuanhan Liu, David Howells

We(Linux Kernel Performance project) found a regression introduced by
commit 5a50508, which just convert all mutex lock to rwsem write lock.
The semantics is same, but the results is quite huge in some cases.
After investigation, we found the root cause: mutex support lock
stealing. Here is the link for the detailed regression report:
    https://lkml.org/lkml/2013/1/29/84

Ingo suggests to add write lock stealing to rwsem as well:
    "I think we should allow lock-steal between rwsem writers - that
     will not hurt fairness as most rwsem fairness concerns relate to
     reader vs. writer fairness"

I then tried it with rwsem-spinlock first as I found it much easier to
implement it than lib/rwsem.c. And here I sent out this patch first for
comments. I'd try lib/rwsem.c later once the change to rwsem-spinlock
is OK to you guys.

With this patch, we got a double performance increase in one test box
with following aim7 workfile:
    FILESIZE: 1M
    POOLSIZE: 10M
    10 fork_test

some /usr/bin/time output w/o patch      some /usr/bin/time_output with patch
----------------------------------------------------------------------------
Percent of CPU this job got: 369%        Percent of CPU this job got: 537%
Voluntary context switches: 640595016    Voluntary context switches: 157915561
----------------------------------------------------------------------------
You will see we got a 45% increase of CPU usage and saves about 3/4
voluntary context switches.


Here is the .nr_running filed for all CPUs from /proc/sched_debug.

output w/o this patch:
----------------------
cpu 00:   0   0   ...   0   0   0   0   0   0   0   1   0   1 .... 0   0
cpu 01:   0   0   ...   1   0   0   0   0   0   1   1   0   1 .... 0   0
cpu 02:   0   0   ...   1   1   0   0   0   1   0   0   1   0 .... 1   1
cpu 03:   0   0   ...   0   1   0   0   0   1   1   0   1   1 .... 0   0
cpu 04:   0   1   ...   0   0   2   1   1   2   1   0   1   0 .... 1   0
cpu 05:   0   1   ...   0   0   2   1   1   2   1   1   1   1 .... 0   0
cpu 06:   0   0   ...   2   0   0   1   0   0   1   0   0   0 .... 0   0
cpu 07:   0   0   ...   2   0   0   0   1   0   1   1   0   0 .... 1   0
cpu 08:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
cpu 09:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
cpu 10:   0   0   ...   0   0   0   2   0   0   1   0   1   1 .... 1   2
cpu 11:   0   0   ...   0   0   0   2   2   0   1   0   1   0 .... 1   2
cpu 12:   0   0   ...   2   0   0   0   1   1   3   1   1   1 .... 1   0
cpu 13:   0   0   ...   2   0   0   0   1   1   3   1   1   0 .... 1   1
cpu 14:   0   0   ...   0   0   0   2   0   0   1   1   0   0 .... 1   0
cpu 15:   0   0   ...   1   0   0   2   0   0   1   1   0   0 .... 0   0

output with this patch:
-----------------------
cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
------------------------------------------------------------------------
Where you can see that CPU is much busier with this patch.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/rwsem-spinlock.c |   65 +++++++++++++++++---------------------------------
 1 files changed, 22 insertions(+), 43 deletions(-)

diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index 7e0d6a5..a32b85e 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -73,20 +73,13 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
 		goto dont_wake_writers;
 	}
 
-	/* if we are allowed to wake writers try to grant a single write lock
-	 * if there's a writer at the front of the queue
-	 * - we leave the 'waiting count' incremented to signify potential
-	 *   contention
+	/*
+	 * as we support write lock stealing, we can't set sem->activity
+	 * to -1 here to indicate we get the lock. Instead, we wake it up
+	 * to let it go get it again.
 	 */
 	if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {
-		sem->activity = -1;
-		list_del(&waiter->list);
-		tsk = waiter->task;
-		/* Don't touch waiter after ->task has been NULLed */
-		smp_mb();
-		waiter->task = NULL;
-		wake_up_process(tsk);
-		put_task_struct(tsk);
+		wake_up_process(waiter->task);
 		goto out;
 	}
 
@@ -121,18 +114,10 @@ static inline struct rw_semaphore *
 __rwsem_wake_one_writer(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter *waiter;
-	struct task_struct *tsk;
-
-	sem->activity = -1;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
-	list_del(&waiter->list);
+	wake_up_process(waiter->task);
 
-	tsk = waiter->task;
-	smp_mb();
-	waiter->task = NULL;
-	wake_up_process(tsk);
-	put_task_struct(tsk);
 	return sem;
 }
 
@@ -204,7 +189,6 @@ int __down_read_trylock(struct rw_semaphore *sem)
 
 /*
  * get a write lock on the semaphore
- * - we increment the waiting count anyway to indicate an exclusive lock
  */
 void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 {
@@ -214,37 +198,32 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
-		/* granted */
-		sem->activity = -1;
-		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-		goto out;
-	}
-
-	tsk = current;
-	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-
 	/* set up my own style of waitqueue */
+	tsk = current;
 	waiter.task = tsk;
 	waiter.flags = RWSEM_WAITING_FOR_WRITE;
-	get_task_struct(tsk);
-
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we don't need to touch the semaphore struct anymore */
-	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-
-	/* wait to be given the lock */
+	/* wait for someone to release the lock */
 	for (;;) {
-		if (!waiter.task)
+		/*
+		 * That is the key to support write lock stealing: allows the
+		 * task already on CPU to get the lock soon rather than put
+		 * itself into sleep and waiting for system woke it or someone
+		 * else in the head of the wait list up.
+		 */
+		if (sem->activity == 0)
 			break;
-		schedule();
 		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+		schedule();
+		raw_spin_lock_irqsave(&sem->wait_lock, flags);
 	}
+	/* got the lock */
+	sem->activity = -1;
+	list_del(&waiter.list);
 
-	tsk->state = TASK_RUNNING;
- out:
-	;
+	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 }
 
 void __sched __down_write(struct rw_semaphore *sem)
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-30  9:14 [PATCH] rwsem-spinlock: let rwsem write lock stealable Yuanhan Liu
@ 2013-01-31  9:39 ` Ingo Molnar
  2013-01-31 10:09   ` Yuanhan Liu
  2013-01-31 11:57 ` Michel Lespinasse
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2013-01-31  9:39 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: linux-kernel, David Howells, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt


* Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

> We(Linux Kernel Performance project) found a regression introduced by
> commit 5a50508, which just convert all mutex lock to rwsem write lock.
> The semantics is same, but the results is quite huge in some cases.
> After investigation, we found the root cause: mutex support lock
> stealing. Here is the link for the detailed regression report:
>     https://lkml.org/lkml/2013/1/29/84
> 
> Ingo suggests to add write lock stealing to rwsem as well:
>     "I think we should allow lock-steal between rwsem writers - that
>      will not hurt fairness as most rwsem fairness concerns relate to
>      reader vs. writer fairness"
> 
> I then tried it with rwsem-spinlock first as I found it much easier to
> implement it than lib/rwsem.c. And here I sent out this patch first for
> comments. I'd try lib/rwsem.c later once the change to rwsem-spinlock
> is OK to you guys.
> 
> With this patch, we got a double performance increase in one test box
> with following aim7 workfile:
>     FILESIZE: 1M
>     POOLSIZE: 10M
>     10 fork_test
> 
> some /usr/bin/time output w/o patch      some /usr/bin/time_output with patch
> ----------------------------------------------------------------------------
> Percent of CPU this job got: 369%        Percent of CPU this job got: 537%
> Voluntary context switches: 640595016    Voluntary context switches: 157915561
> ----------------------------------------------------------------------------
> You will see we got a 45% increase of CPU usage and saves about 3/4
> voluntary context switches.
> 
> 
> Here is the .nr_running filed for all CPUs from /proc/sched_debug.
> 
> output w/o this patch:
> ----------------------
> cpu 00:   0   0   ...   0   0   0   0   0   0   0   1   0   1 .... 0   0
> cpu 01:   0   0   ...   1   0   0   0   0   0   1   1   0   1 .... 0   0
> cpu 02:   0   0   ...   1   1   0   0   0   1   0   0   1   0 .... 1   1
> cpu 03:   0   0   ...   0   1   0   0   0   1   1   0   1   1 .... 0   0
> cpu 04:   0   1   ...   0   0   2   1   1   2   1   0   1   0 .... 1   0
> cpu 05:   0   1   ...   0   0   2   1   1   2   1   1   1   1 .... 0   0
> cpu 06:   0   0   ...   2   0   0   1   0   0   1   0   0   0 .... 0   0
> cpu 07:   0   0   ...   2   0   0   0   1   0   1   1   0   0 .... 1   0
> cpu 08:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
> cpu 09:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
> cpu 10:   0   0   ...   0   0   0   2   0   0   1   0   1   1 .... 1   2
> cpu 11:   0   0   ...   0   0   0   2   2   0   1   0   1   0 .... 1   2
> cpu 12:   0   0   ...   2   0   0   0   1   1   3   1   1   1 .... 1   0
> cpu 13:   0   0   ...   2   0   0   0   1   1   3   1   1   0 .... 1   1
> cpu 14:   0   0   ...   0   0   0   2   0   0   1   1   0   0 .... 1   0
> cpu 15:   0   0   ...   1   0   0   2   0   0   1   1   0   0 .... 0   0
> 
> output with this patch:
> -----------------------
> cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
> cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
> cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
> cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
> cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
> cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
> cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
> cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
> cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
> cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
> cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
> cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
> ------------------------------------------------------------------------
> Where you can see that CPU is much busier with this patch.

That looks really good - quite similar to how it behaved with 
mutexes, right?

Does this recover most of the performance regression?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31  9:39 ` Ingo Molnar
@ 2013-01-31 10:09   ` Yuanhan Liu
  2013-01-31 10:45     ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Yuanhan Liu @ 2013-01-31 10:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Howells, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On Thu, Jan 31, 2013 at 10:39:31AM +0100, Ingo Molnar wrote:
> 
> * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> > We(Linux Kernel Performance project) found a regression introduced by
> > commit 5a50508, which just convert all mutex lock to rwsem write lock.
> > The semantics is same, but the results is quite huge in some cases.
> > After investigation, we found the root cause: mutex support lock
> > stealing. Here is the link for the detailed regression report:
> >     https://lkml.org/lkml/2013/1/29/84
> > 
> > Ingo suggests to add write lock stealing to rwsem as well:
> >     "I think we should allow lock-steal between rwsem writers - that
> >      will not hurt fairness as most rwsem fairness concerns relate to
> >      reader vs. writer fairness"
> > 
> > I then tried it with rwsem-spinlock first as I found it much easier to
> > implement it than lib/rwsem.c. And here I sent out this patch first for
> > comments. I'd try lib/rwsem.c later once the change to rwsem-spinlock
> > is OK to you guys.
> > 
> > With this patch, we got a double performance increase in one test box
> > with following aim7 workfile:
> >     FILESIZE: 1M
> >     POOLSIZE: 10M
> >     10 fork_test
> > 
> > some /usr/bin/time output w/o patch      some /usr/bin/time_output with patch
> > ----------------------------------------------------------------------------
> > Percent of CPU this job got: 369%        Percent of CPU this job got: 537%
> > Voluntary context switches: 640595016    Voluntary context switches: 157915561
> > ----------------------------------------------------------------------------
> > You will see we got a 45% increase of CPU usage and saves about 3/4
> > voluntary context switches.
> > 
> > 
> > Here is the .nr_running filed for all CPUs from /proc/sched_debug.
> > 
> > output w/o this patch:
> > ----------------------
> > cpu 00:   0   0   ...   0   0   0   0   0   0   0   1   0   1 .... 0   0
> > cpu 01:   0   0   ...   1   0   0   0   0   0   1   1   0   1 .... 0   0
> > cpu 02:   0   0   ...   1   1   0   0   0   1   0   0   1   0 .... 1   1
> > cpu 03:   0   0   ...   0   1   0   0   0   1   1   0   1   1 .... 0   0
> > cpu 04:   0   1   ...   0   0   2   1   1   2   1   0   1   0 .... 1   0
> > cpu 05:   0   1   ...   0   0   2   1   1   2   1   1   1   1 .... 0   0
> > cpu 06:   0   0   ...   2   0   0   1   0   0   1   0   0   0 .... 0   0
> > cpu 07:   0   0   ...   2   0   0   0   1   0   1   1   0   0 .... 1   0
> > cpu 08:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
> > cpu 09:   0   0   ...   1   0   0   0   1   0   0   1   0   0 .... 0   1
> > cpu 10:   0   0   ...   0   0   0   2   0   0   1   0   1   1 .... 1   2
> > cpu 11:   0   0   ...   0   0   0   2   2   0   1   0   1   0 .... 1   2
> > cpu 12:   0   0   ...   2   0   0   0   1   1   3   1   1   1 .... 1   0
> > cpu 13:   0   0   ...   2   0   0   0   1   1   3   1   1   0 .... 1   1
> > cpu 14:   0   0   ...   0   0   0   2   0   0   1   1   0   0 .... 1   0
> > cpu 15:   0   0   ...   1   0   0   2   0   0   1   1   0   0 .... 0   0
> > 
> > output with this patch:
> > -----------------------
> > cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
> > cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
> > cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
> > cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
> > cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
> > cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
> > cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
> > cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
> > cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
> > cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
> > cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
> > cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
> > ------------------------------------------------------------------------
> > Where you can see that CPU is much busier with this patch.
> 
> That looks really good - quite similar to how it behaved with 
> mutexes, right?

Yes :)

And the result is almost same with mutex lock when MUTEX_SPIN_ON_OWNER
is disabled, and that's the reason you will see massive processes(about
100) queued on each CPU in my last report:
    https://lkml.org/lkml/2013/1/29/84

> 
> Does this recover most of the performance regression?

Yes, there is only a 10% gap here then. I guess that's because I used
the general rwsem lock implementation(lib/rwsem-spinlock.c), but not the
XADD one(lib/rwsem.c). I guess the gap may be a little smaller if we do
the same thing to lib/rwsem.c.


Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 10:09   ` Yuanhan Liu
@ 2013-01-31 10:45     ` Ingo Molnar
  2013-01-31 12:23       ` Yuanhan Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2013-01-31 10:45 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: linux-kernel, David Howells, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt


* Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

> > > output with this patch:
> > > -----------------------
> > > cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
> > > cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
> > > cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
> > > cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
> > > cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
> > > cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
> > > cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > > cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > > cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > > cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > > cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
> > > cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
> > > cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
> > > cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
> > > cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
> > > cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
> > > ------------------------------------------------------------------------
> > > Where you can see that CPU is much busier with this patch.
> > 
> > That looks really good - quite similar to how it behaved 
> > with mutexes, right?
> 
> Yes :)
> 
> And the result is almost same with mutex lock when MUTEX_SPIN_ON_OWNER
> is disabled, and that's the reason you will see massive processes(about
> 100) queued on each CPU in my last report:
>     https://lkml.org/lkml/2013/1/29/84

Just curious: how does MUTEX_SPIN_ON_OWNER versus 
!MUTEX_SPIN_ON_OWNER compare, for this particular, 
massively-contended anon-vma locks benchmark?

> > Does this recover most of the performance regression?
> 
> Yes, there is only a 10% gap here then. I guess that's because 
> I used the general rwsem lock 
> implementation(lib/rwsem-spinlock.c), but not the XADD 
> one(lib/rwsem.c). I guess the gap may be a little smaller if 
> we do the same thing to lib/rwsem.c.

Is part of the gap due to MUTEX_SPIN_ON_OWNER perhaps?

I'm surprised that rwsem-spinlock versus rwsem.c would show a 
10% performance difference - assuming you have lock 
debugging/tracing disabled in the .config.

( Once the performance regression is fixed, another thing to 
  check would be to reduce anon-vma lock contention. )

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-30  9:14 [PATCH] rwsem-spinlock: let rwsem write lock stealable Yuanhan Liu
  2013-01-31  9:39 ` Ingo Molnar
@ 2013-01-31 11:57 ` Michel Lespinasse
  2013-01-31 12:40   ` Yuanhan Liu
  2013-01-31 13:10   ` Ingo Molnar
  1 sibling, 2 replies; 12+ messages in thread
From: Michel Lespinasse @ 2013-01-31 11:57 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: linux-kernel, Ingo Molnar, David Howells

On Wed, Jan 30, 2013 at 1:14 AM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> We(Linux Kernel Performance project) found a regression introduced by
> commit 5a50508, which just convert all mutex lock to rwsem write lock.
> The semantics is same, but the results is quite huge in some cases.
> After investigation, we found the root cause: mutex support lock
> stealing. Here is the link for the detailed regression report:
>     https://lkml.org/lkml/2013/1/29/84
>
> Ingo suggests to add write lock stealing to rwsem as well:
>     "I think we should allow lock-steal between rwsem writers - that
>      will not hurt fairness as most rwsem fairness concerns relate to
>      reader vs. writer fairness"
>
> I then tried it with rwsem-spinlock first as I found it much easier to
> implement it than lib/rwsem.c. And here I sent out this patch first for
> comments. I'd try lib/rwsem.c later once the change to rwsem-spinlock
> is OK to you guys.

I noticed that you haven't modified __down_write_trylock() - for
consistency with __down_write() you should replace
        if (sem->activity == 0 && list_empty(&sem->wait_list)) {
with
        if (sem->activity == 0) {

Other than that, I like the idea. I was originally uncomfortable with
doing lock stealing for the rwsem, but I think doing it for writers
only as you propose should be fine. Readers wait for any queued
writers, and in exchange they are guaranteed to get the lock once
they've blocked. You *still* want to check for regressions that this
change might cause - not with anon_vma as this was a mutex not long
ago, but possibly with mmap_sem - but I'm crossing my fingers and
thinking that it'll most likely turn out fine.

I may be able to help with the non-spinlock version of this as I still
remember how this works.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 10:45     ` Ingo Molnar
@ 2013-01-31 12:23       ` Yuanhan Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Yuanhan Liu @ 2013-01-31 12:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Howells, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On Thu, Jan 31, 2013 at 11:45:41AM +0100, Ingo Molnar wrote:
> 
> * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> > > > output with this patch:
> > > > -----------------------
> > > > cpu 00:   0   0   ...   1   1   2   1   1   1   2   1   1   1 .... 1   3
> > > > cpu 01:   0   0   ...   1   1   1   1   1   1   2   1   1   1 .... 1   3
> > > > cpu 02:   0   0   ...   2   2   3   2   0   2   1   2   1   1 .... 1   1
> > > > cpu 03:   0   0   ...   2   2   3   2   1   2   1   2   1   1 .... 1   1
> > > > cpu 04:   0   1   ...   2   0   0   1   0   1   3   1   1   1 .... 1   1
> > > > cpu 05:   0   1   ...   2   0   1   1   0   1   2   1   1   1 .... 1   1
> > > > cpu 06:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > > > cpu 07:   0   0   ...   2   1   1   2   0   1   2   1   1   1 .... 2   1
> > > > cpu 08:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > > > cpu 09:   0   0   ...   1   1   1   1   1   1   1   1   1   1 .... 0   0
> > > > cpu 10:   0   0   ...   1   1   1   0   0   1   1   1   1   1 .... 0   0
> > > > cpu 11:   0   0   ...   1   1   1   0   0   1   1   1   1   2 .... 1   0
> > > > cpu 12:   0   0   ...   1   1   1   0   1   1   0   0   0   1 .... 2   1
> > > > cpu 13:   0   0   ...   1   1   1   0   1   1   1   0   1   2 .... 2   0
> > > > cpu 14:   0   0   ...   2   0   0   0   0   1   1   1   1   1 .... 2   2
> > > > cpu 15:   0   0   ...   2   0   0   1   0   1   1   1   1   1 .... 2   2
> > > > ------------------------------------------------------------------------
> > > > Where you can see that CPU is much busier with this patch.
> > > 
> > > That looks really good - quite similar to how it behaved 
> > > with mutexes, right?
> > 
> > Yes :)
> > 
> > And the result is almost same with mutex lock when MUTEX_SPIN_ON_OWNER
> > is disabled, and that's the reason you will see massive processes(about
> > 100) queued on each CPU in my last report:
> >     https://lkml.org/lkml/2013/1/29/84
> 
> Just curious: how does MUTEX_SPIN_ON_OWNER versus 
> !MUTEX_SPIN_ON_OWNER compare, for this particular, 
> massively-contended anon-vma locks benchmark?

In above testcase, MUTEX_SPIN_ON_OWNER is slightly doing better job(like
3% ~ 4%) than !MUTEX_SPIN_ON_OWNER.

> 
> > > Does this recover most of the performance regression?
> > 
> > Yes, there is only a 10% gap here then. I guess that's because 

Sorry, to be accurate, it's about 14% gap; when MUTEX_SPIN_ON_OWNER is
enabled.

> > I used the general rwsem lock 
> > implementation(lib/rwsem-spinlock.c), but not the XADD 
> > one(lib/rwsem.c). I guess the gap may be a little smaller if 
> > we do the same thing to lib/rwsem.c.
> 
> Is part of the gap due to MUTEX_SPIN_ON_OWNER perhaps?

Nope, !MUTEX_SPIN_ON_OWNER does introduce a little performance drop just
as above stated.

So, to make it clear, here is the list:

lock case                            performance drop compared to mutex lock
----------------------------------------------------------------------------
mutex lock w/o MUTEX_SPIN_ON_OWNER   3.x%
rwsem-spinlock with write stealing   14.x%
rwsem-spinlock                       >100%


> 
> I'm surprised that rwsem-spinlock versus rwsem.c would show a 
> 10% performance difference -

Yes, it may not. And there is only about 0.9% performance difference in
above test between rwsem-spinlock and XADD rwsem. The difference maybe
enlarged when both has write lock stealing enabled, which will be known
only after we do same thing to lib/rwsem.c.

Thanks.

	--yliu

> assuming you have lock 
> debugging/tracing disabled in the .config.
> 
> ( Once the performance regression is fixed, another thing to 
>   check would be to reduce anon-vma lock contention. )
> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 11:57 ` Michel Lespinasse
@ 2013-01-31 12:40   ` Yuanhan Liu
  2013-01-31 13:12     ` Ingo Molnar
  2013-01-31 13:10   ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Yuanhan Liu @ 2013-01-31 12:40 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: linux-kernel, Ingo Molnar, David Howells

On Thu, Jan 31, 2013 at 03:57:51AM -0800, Michel Lespinasse wrote:
> On Wed, Jan 30, 2013 at 1:14 AM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> > We(Linux Kernel Performance project) found a regression introduced by
> > commit 5a50508, which just convert all mutex lock to rwsem write lock.
> > The semantics is same, but the results is quite huge in some cases.
> > After investigation, we found the root cause: mutex support lock
> > stealing. Here is the link for the detailed regression report:
> >     https://lkml.org/lkml/2013/1/29/84
> >
> > Ingo suggests to add write lock stealing to rwsem as well:
> >     "I think we should allow lock-steal between rwsem writers - that
> >      will not hurt fairness as most rwsem fairness concerns relate to
> >      reader vs. writer fairness"
> >
> > I then tried it with rwsem-spinlock first as I found it much easier to
> > implement it than lib/rwsem.c. And here I sent out this patch first for
> > comments. I'd try lib/rwsem.c later once the change to rwsem-spinlock
> > is OK to you guys.
> 
> I noticed that you haven't modified __down_write_trylock() - for
> consistency with __down_write() you should replace
>         if (sem->activity == 0 && list_empty(&sem->wait_list)) {
> with
>         if (sem->activity == 0) {

Yes, my bad for missing that. Thanks a lot for pointing it out. Will fix it.
> 
> Other than that, I like the idea. I was originally uncomfortable with
> doing lock stealing for the rwsem, but I think doing it for writers
> only as you propose should be fine. Readers wait for any queued
> writers, and in exchange they are guaranteed to get the lock once
> they've blocked.

> You *still* want to check for regressions that this
> change might cause - not with anon_vma as this was a mutex not long
> ago, but possibly with mmap_sem

Yes. Well, at least it passed Fengguang's 0-DAY test, which did lots
of tests on almost all ARCHs. Well, you reminds me that I just enabled
RWSEM_GENERIC_SPINLOCK for x86 ARCH, thus I need to enable
RWSEM_GENERIC_SPINLOCK to all ARCHs and do test again.

BTW, mind to tell a nice test case for mmap_sem?

> - but I'm crossing my fingers and
> thinking that it'll most likely turn out fine.

Thanks!

> 
> I may be able to help with the non-spinlock version of this as I still
> remember how this works.

That would be great! Especially I will have vacation soon for Chinese
New Year.


	--yliu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 11:57 ` Michel Lespinasse
  2013-01-31 12:40   ` Yuanhan Liu
@ 2013-01-31 13:10   ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2013-01-31 13:10 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: Yuanhan Liu, linux-kernel, David Howells


* Michel Lespinasse <walken@google.com> wrote:

> On Wed, Jan 30, 2013 at 1:14 AM, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> > We(Linux Kernel Performance project) found a regression introduced by
> > commit 5a50508, which just convert all mutex lock to rwsem write lock.
> > The semantics is same, but the results is quite huge in some cases.
> > After investigation, we found the root cause: mutex support lock
> > stealing. Here is the link for the detailed regression report:
> >     https://lkml.org/lkml/2013/1/29/84
> >
> > Ingo suggests to add write lock stealing to rwsem as well:
> >     "I think we should allow lock-steal between rwsem writers - that
> >      will not hurt fairness as most rwsem fairness concerns relate to
> >      reader vs. writer fairness"
> >
> > I then tried it with rwsem-spinlock first as I found it much easier to
> > implement it than lib/rwsem.c. And here I sent out this patch first for
> > comments. I'd try lib/rwsem.c later once the change to rwsem-spinlock
> > is OK to you guys.
> 
> I noticed that you haven't modified __down_write_trylock() - for
> consistency with __down_write() you should replace
>         if (sem->activity == 0 && list_empty(&sem->wait_list)) {
> with
>         if (sem->activity == 0) {
> 
> Other than that, I like the idea. I was originally 
> uncomfortable with doing lock stealing for the rwsem, but I 
> think doing it for writers only as you propose should be fine. 
> Readers wait for any queued writers, and in exchange they are 
> guaranteed to get the lock once they've blocked. You *still* 
> want to check for regressions that this change might cause - 
> not with anon_vma as this was a mutex not long ago, but 
> possibly with mmap_sem - but I'm crossing my fingers and 
> thinking that it'll most likely turn out fine.

My gut feeling, from having implemented lock stealing in a 
number of locking primitives in the past, is that writer 
lock-stealing will be a measurable win for mmap_sem as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 12:40   ` Yuanhan Liu
@ 2013-01-31 13:12     ` Ingo Molnar
  2013-01-31 14:36       ` Yuanhan Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2013-01-31 13:12 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michel Lespinasse, linux-kernel, David Howells


* Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

> BTW, mind to tell a nice test case for mmap_sem?

this one was write-hitting on mmap_sem pretty hard, last I 
checked:

  http://people.redhat.com/mingo/threaded-mmap-stresstest/

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 13:12     ` Ingo Molnar
@ 2013-01-31 14:36       ` Yuanhan Liu
  2013-01-31 21:18         ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Yuanhan Liu @ 2013-01-31 14:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Michel Lespinasse, linux-kernel, David Howells

On Thu, Jan 31, 2013 at 02:12:28PM +0100, Ingo Molnar wrote:
> 
> * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> > BTW, mind to tell a nice test case for mmap_sem?
> 
> this one was write-hitting on mmap_sem pretty hard, last I 
> checked:
> 
>   http://people.redhat.com/mingo/threaded-mmap-stresstest/

Thanks!

Is there any pass condition? I tested a while, at least I found no oops
or any noisy from dmesg output. Is that OK?

Well, sometimes, it will quit peacefully. Sometimes it will not.
ps -eo 'pid, state,wchan,comm' shows that it is sleeping at
futex_wait_queue_me().

NOTE: this happens both with or w/o this patch. Thus it may not an issue
introduced by this patch?

Thanks.

	--yliu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 14:36       ` Yuanhan Liu
@ 2013-01-31 21:18         ` Ingo Molnar
  2013-02-01  2:16           ` Yuanhan Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2013-01-31 21:18 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michel Lespinasse, linux-kernel, David Howells

[-- Attachment #1: Type: text/plain, Size: 1172 bytes --]


* Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

> On Thu, Jan 31, 2013 at 02:12:28PM +0100, Ingo Molnar wrote:
> > 
> > * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> > 
> > > BTW, mind to tell a nice test case for mmap_sem?
> > 
> > this one was write-hitting on mmap_sem pretty hard, last I 
> > checked:
> > 
> >   http://people.redhat.com/mingo/threaded-mmap-stresstest/
> 
> Thanks!
> 
> Is there any pass condition? I tested a while, at least I 
> found no oops or any noisy from dmesg output. Is that OK?

Yeah, not crashing and not hanging is the expected behavior.

> Well, sometimes, it will quit peacefully. Sometimes it will 
> not. ps -eo 'pid, state,wchan,comm' shows that it is sleeping 
> at futex_wait_queue_me().
> 
> NOTE: this happens both with or w/o this patch. Thus it may 
> not an issue introduced by this patch?

hm, that's unexpected - it's expected to loop infinitely. I have 
a newer version (attached) - is that exiting too?

Maybe this triggers spuriously:

        if (!info->si_addr)
                raise(SIGABRT); /* Allow GDB backtrace */

although then you should see the SIGABRT as an irregular exit 
IIRC.

Thanks,

	Ingo

[-- Attachment #2: threaded-mmap-stresstest.tar.gz --]
[-- Type: application/x-gzip, Size: 3036 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable
  2013-01-31 21:18         ` Ingo Molnar
@ 2013-02-01  2:16           ` Yuanhan Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Yuanhan Liu @ 2013-02-01  2:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Michel Lespinasse, linux-kernel, David Howells

On Thu, Jan 31, 2013 at 10:18:18PM +0100, Ingo Molnar wrote:
> 
> * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> > On Thu, Jan 31, 2013 at 02:12:28PM +0100, Ingo Molnar wrote:
> > > 
> > > * Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> > > 
> > > > BTW, mind to tell a nice test case for mmap_sem?
> > > 
> > > this one was write-hitting on mmap_sem pretty hard, last I 
> > > checked:
> > > 
> > >   http://people.redhat.com/mingo/threaded-mmap-stresstest/
> > 
> > Thanks!
> > 
> > Is there any pass condition? I tested a while, at least I 
> > found no oops or any noisy from dmesg output. Is that OK?
> 
> Yeah, not crashing and not hanging is the expected behavior.

Good to know.

> 
> > Well, sometimes, it will quit peacefully. Sometimes it will 
> > not. ps -eo 'pid, state,wchan,comm' shows that it is sleeping 
> > at futex_wait_queue_me().
> > 
> > NOTE: this happens both with or w/o this patch. Thus it may 
> > not an issue introduced by this patch?
> 
> hm, that's unexpected - it's expected to loop infinitely.

Reall sorry about that. My bad. I modify the code a bit: removed the two
//, so that thread will exit after count > 1000000.

Sorry again :(

	--yliu

> I have 
> a newer version (attached) - is that exiting too?
> 
> Maybe this triggers spuriously:
> 
>         if (!info->si_addr)
>                 raise(SIGABRT); /* Allow GDB backtrace */
> 
> although then you should see the SIGABRT as an irregular exit 
> IIRC.
> 
> Thanks,
> 
> 	Ingo



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-02-01  2:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-30  9:14 [PATCH] rwsem-spinlock: let rwsem write lock stealable Yuanhan Liu
2013-01-31  9:39 ` Ingo Molnar
2013-01-31 10:09   ` Yuanhan Liu
2013-01-31 10:45     ` Ingo Molnar
2013-01-31 12:23       ` Yuanhan Liu
2013-01-31 11:57 ` Michel Lespinasse
2013-01-31 12:40   ` Yuanhan Liu
2013-01-31 13:12     ` Ingo Molnar
2013-01-31 14:36       ` Yuanhan Liu
2013-01-31 21:18         ` Ingo Molnar
2013-02-01  2:16           ` Yuanhan Liu
2013-01-31 13:10   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox