Re: Performance regression from switching lock to rw-sem for anon-vma tree

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Alex Shi <alex.shi@intel.com>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>,
	Ingo Molnar <mingo@elte.hu>, Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Alex Shi <alex.shi@intel.com>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Sun, 16 Jun 2013 17:50:47 +0800	[thread overview]
Message-ID: <51BD8A77.2080201@intel.com> (raw)
In-Reply-To: <1371167015.1754.14.camel@buesod1.americas.hpqcorp.net>

On 06/14/2013 07:43 AM, Davidlohr Bueso wrote:
> I was hoping that the lack of spin on owner was the main difference with
> rwsems and am/was in the middle of implementing it. Could you send your
> patch so I can give it a try on my workloads?
> 
> Note that there have been a few recent (3.10) changes to mutexes that
> give a nice performance boost, specially on large systems, most
> noticeably:
> 
> commit 2bd2c92c (mutex: Make more scalable by doing less atomic
> operations)
> 
> commit 0dc8c730 (mutex: Queue mutex spinners with MCS lock to reduce
> cacheline contention)
> 
> It might be worth looking into doing something similar to commit
> 0dc8c730, in addition to the optimistic spinning.

It is a good tunning for large machine. I just following what the commit 
0dc8c730 done, give a RFC patch here. I tried it on my NHM EP machine. seems no
clear help on aim7. but maybe it is helpful on large machine.  :)


diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index bb1e2cd..240729a 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -70,11 +70,11 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
+	if (unlikely(&sem->count != RWSEM_UNLOCKED_VALUE))
+		return 0;
 
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
+	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
+		      RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
 }
 
 /*
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..9e54e20 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -64,7 +64,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
 	struct list_head *next;
-	long oldcount, woken, loop, adjustment;
+	long woken, loop, adjustment;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			 * will block as they will notice the queued writer.
 			 */
 			wake_up_process(waiter->task);
-		goto out;
+		return sem;
 	}
 
 	/* Writers might steal the lock before we grant it to the next reader.
@@ -85,15 +85,28 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	adjustment = 0;
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
 		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/* A writer stole the lock. Undo our reader grant. */
+		while (1) {
+			long oldcount;
+
+			/* A writer stole the lock. */
+			if (unlikely(sem->count & RWSEM_ACTIVE_MASK))
+				return sem;
+
+			if (unlikely(sem->count < RWSEM_WAITING_BIAS)) {
+				cpu_relax();
+				continue;
+			}
+
+			oldcount = rwsem_atomic_update(adjustment, sem)
+								- adjustment;
+			if (likely(oldcount >= RWSEM_WAITING_BIAS))
+				break;
+
+			 /* A writer stole the lock.  Undo our reader grant. */
 			if (rwsem_atomic_update(-adjustment, sem) &
 						RWSEM_ACTIVE_MASK)
-				goto out;
+				return sem;
 			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
 		}
 	}
 
@@ -136,7 +149,6 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	sem->wait_list.next = next;
 	next->prev = &sem->wait_list;
 
- out:
 	return sem;
 }
 
-- 
Thanks
    Alex


-- 
Thanks
    Alex

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Alex Shi <alex.shi@intel.com>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>,
	Ingo Molnar <mingo@elte.hu>, Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>, Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Alex Shi <alex.shi@intel.com>
Subject: Re: Performance regression from switching lock to rw-sem for anon-vma tree
Date: Sun, 16 Jun 2013 17:50:47 +0800	[thread overview]
Message-ID: <51BD8A77.2080201@intel.com> (raw)
In-Reply-To: <1371167015.1754.14.camel@buesod1.americas.hpqcorp.net>

On 06/14/2013 07:43 AM, Davidlohr Bueso wrote:
> I was hoping that the lack of spin on owner was the main difference with
> rwsems and am/was in the middle of implementing it. Could you send your
> patch so I can give it a try on my workloads?
> 
> Note that there have been a few recent (3.10) changes to mutexes that
> give a nice performance boost, specially on large systems, most
> noticeably:
> 
> commit 2bd2c92c (mutex: Make more scalable by doing less atomic
> operations)
> 
> commit 0dc8c730 (mutex: Queue mutex spinners with MCS lock to reduce
> cacheline contention)
> 
> It might be worth looking into doing something similar to commit
> 0dc8c730, in addition to the optimistic spinning.

It is a good tunning for large machine. I just following what the commit 
0dc8c730 done, give a RFC patch here. I tried it on my NHM EP machine. seems no
clear help on aim7. but maybe it is helpful on large machine.  :)


diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index bb1e2cd..240729a 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -70,11 +70,11 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
+	if (unlikely(&sem->count != RWSEM_UNLOCKED_VALUE))
+		return 0;
 
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
+	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
+		      RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
 }
 
 /*
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..9e54e20 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -64,7 +64,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
 	struct list_head *next;
-	long oldcount, woken, loop, adjustment;
+	long woken, loop, adjustment;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			 * will block as they will notice the queued writer.
 			 */
 			wake_up_process(waiter->task);
-		goto out;
+		return sem;
 	}
 
 	/* Writers might steal the lock before we grant it to the next reader.
@@ -85,15 +85,28 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	adjustment = 0;
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
 		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/* A writer stole the lock. Undo our reader grant. */
+		while (1) {
+			long oldcount;
+
+			/* A writer stole the lock. */
+			if (unlikely(sem->count & RWSEM_ACTIVE_MASK))
+				return sem;
+
+			if (unlikely(sem->count < RWSEM_WAITING_BIAS)) {
+				cpu_relax();
+				continue;
+			}
+
+			oldcount = rwsem_atomic_update(adjustment, sem)
+								- adjustment;
+			if (likely(oldcount >= RWSEM_WAITING_BIAS))
+				break;
+
+			 /* A writer stole the lock.  Undo our reader grant. */
 			if (rwsem_atomic_update(-adjustment, sem) &
 						RWSEM_ACTIVE_MASK)
-				goto out;
+				return sem;
 			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
 		}
 	}
 
@@ -136,7 +149,6 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	sem->wait_list.next = next;
 	next->prev = &sem->wait_list;
 
- out:
 	return sem;
 }
 
-- 
Thanks
    Alex


-- 
Thanks
    Alex

next prev parent reply	other threads:[~2013-06-16  9:50 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1371165333.27102.568.camel@schen9-DESK>
     [not found] ` <1371167015.1754.14.camel@buesod1.americas.hpqcorp.net>
2013-06-14 16:09   ` Performance regression from switching lock to rw-sem for anon-vma tree Tim Chen
2013-06-14 16:09     ` Tim Chen
2013-06-14 22:31     ` Davidlohr Bueso
2013-06-14 22:31       ` Davidlohr Bueso
2013-06-14 22:44       ` Tim Chen
2013-06-14 22:44         ` Tim Chen
2013-06-14 22:47       ` Michel Lespinasse
2013-06-14 22:47         ` Michel Lespinasse
2013-06-17 22:27         ` Tim Chen
2013-06-17 22:27           ` Tim Chen
2013-06-16  9:50   ` Alex Shi [this message]
2013-06-16  9:50     ` Alex Shi
2013-06-17 16:22     ` Davidlohr Bueso
2013-06-17 16:22       ` Davidlohr Bueso
2013-06-17 18:45       ` Tim Chen
2013-06-17 18:45         ` Tim Chen
2013-06-17 19:05         ` Davidlohr Bueso
2013-06-17 19:05           ` Davidlohr Bueso
2013-06-17 22:28           ` Tim Chen
2013-06-17 22:28             ` Tim Chen
2013-06-17 23:18         ` Alex Shi
2013-06-17 23:18           ` Alex Shi
2013-06-17 23:20       ` Alex Shi
2013-06-17 23:20         ` Alex Shi
2013-06-17 23:35         ` Davidlohr Bueso
2013-06-17 23:35           ` Davidlohr Bueso
2013-06-18  0:08           ` Tim Chen
2013-06-18  0:08             ` Tim Chen
2013-06-19 23:11             ` Davidlohr Bueso
2013-06-19 23:11               ` Davidlohr Bueso
2013-06-19 23:24               ` Tim Chen
2013-06-19 23:24                 ` Tim Chen
2013-06-13 23:26 Tim Chen
2013-06-13 23:26 ` Tim Chen
2013-06-19 13:16 ` Ingo Molnar
2013-06-19 13:16   ` Ingo Molnar
2013-06-19 16:53   ` Tim Chen
2013-06-19 16:53     ` Tim Chen
2013-06-26  0:19     ` Tim Chen
2013-06-26  0:19       ` Tim Chen
2013-06-26  9:51       ` Ingo Molnar
2013-06-26  9:51         ` Ingo Molnar
2013-06-26 21:36         ` Tim Chen
2013-06-26 21:36           ` Tim Chen
2013-06-27  0:25           ` Tim Chen
2013-06-27  0:25             ` Tim Chen
2013-06-27  8:36             ` Ingo Molnar
2013-06-27  8:36               ` Ingo Molnar
2013-06-27 20:53               ` Tim Chen
2013-06-27 20:53                 ` Tim Chen
2013-06-27 23:31                 ` Tim Chen
2013-06-27 23:31                   ` Tim Chen
2013-06-28  9:38                   ` Ingo Molnar
2013-06-28  9:38                     ` Ingo Molnar
2013-06-28 21:04                     ` Tim Chen
2013-06-28 21:04                       ` Tim Chen
2013-06-29  7:12                       ` Ingo Molnar
2013-06-29  7:12                         ` Ingo Molnar
2013-07-01 20:28                         ` Tim Chen
2013-07-01 20:28                           ` Tim Chen
2013-07-02  6:45                           ` Ingo Molnar
2013-07-02  6:45                             ` Ingo Molnar
2013-07-16 17:53                             ` Tim Chen
2013-07-16 17:53                               ` Tim Chen
2013-07-23  9:45                               ` Ingo Molnar
2013-07-23  9:45                                 ` Ingo Molnar
2013-07-23  9:51                                 ` Peter Zijlstra
2013-07-23  9:51                                   ` Peter Zijlstra
2013-07-23  9:53                                   ` Ingo Molnar
2013-07-23  9:53                                     ` Ingo Molnar
2013-07-30  0:13                                     ` Tim Chen
2013-07-30  0:13                                       ` Tim Chen
2013-07-30 19:24                                       ` Ingo Molnar
2013-07-30 19:24                                         ` Ingo Molnar
2013-08-05 22:08                                         ` Tim Chen
2013-08-05 22:08                                           ` Tim Chen
2013-07-30 19:59                                       ` Davidlohr Bueso
2013-07-30 19:59                                         ` Davidlohr Bueso
2013-07-30 20:34                                         ` Tim Chen
2013-07-30 20:34                                           ` Tim Chen
2013-07-30 21:45                                           ` Davidlohr Bueso
2013-07-30 21:45                                             ` Davidlohr Bueso
2013-08-06 23:55                                       ` Davidlohr Bueso
2013-08-06 23:55                                         ` Davidlohr Bueso
2013-08-07  0:56                                         ` Tim Chen
2013-08-07  0:56                                           ` Tim Chen
2013-08-12 18:52                                           ` Ingo Molnar
2013-08-12 18:52                                             ` Ingo Molnar
2013-08-12 20:10                                             ` Tim Chen
2013-08-12 20:10                                               ` Tim Chen
2013-06-28  9:20                 ` Ingo Molnar
2013-06-28  9:20                   ` Ingo Molnar

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:bb1e2cd dfblob:240729a dfblob:19c5fa9 dfblob:9e54e20
dfblob:bb1e2cd dfblob:240729a dfblob:19c5fa9 dfblob:9e54e20 )
 OR (
bs:"Re: Performance regression from switching lock to rw-sem for anon-vma tree" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51BD8A77.2080201@intel.com \
    --to=alex.shi@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=dave.hansen@intel.com \
    --cc=davidlohr.bueso@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mgorman@suse.de \
    --cc=mingo@elte.hu \
    --cc=riel@redhat.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.