[PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Pan Xinhui <xinhui@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
Date: Fri, 17 Feb 2017 15:43:40 -0500	[thread overview]
Message-ID: <1487364220-11641-1-git-send-email-longman@redhat.com> (raw)

All the locking related cmpxchg's in the following functions are
replaced with the _acquire variants:
 - pv_queued_spin_steal_lock()
 - trylock_clear_pending()

This change should help performance on architectures that use LL/SC.

On a 2-core 16-thread Power8 system with pvqspinlock explicitly
enabled, the performance of a locking microbenchmark with and without
this patch on a 4.10-rc8 kernel with Xinhui's PPC qspinlock patch
were as follows:

  # of thread     w/o patch    with patch      % Change
  -----------     ---------    ----------      --------
       4         4053.3 Mop/s  4223.7 Mop/s     +4.2%
       8         3310.4 Mop/s  3406.0 Mop/s     +2.9%
      12         2576.4 Mop/s  2674.6 Mop/s     +3.8%

Signed-off-by: Waiman Long <longman@redhat.com>
---

 v2->v3:
  - Reduce scope by relaxing cmpxchg's in fast path only.

 v1->v2:
  - Add comments in changelog and code for the rationale of the change.

 kernel/locking/qspinlock_paravirt.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index e6b2f7a..a59dc05 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -72,7 +72,7 @@ static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
 	struct __qspinlock *l = (void *)lock;
 
 	if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
-	    (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
+	    (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
 		qstat_inc(qstat_pv_lock_stealing, true);
 		return true;
 	}
@@ -101,16 +101,16 @@ static __always_inline void clear_pending(struct qspinlock *lock)
 
 /*
  * The pending bit check in pv_queued_spin_steal_lock() isn't a memory
- * barrier. Therefore, an atomic cmpxchg() is used to acquire the lock
- * just to be sure that it will get it.
+ * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
+ * lock just to be sure that it will get it.
  */
 static __always_inline int trylock_clear_pending(struct qspinlock *lock)
 {
 	struct __qspinlock *l = (void *)lock;
 
 	return !READ_ONCE(l->locked) &&
-	       (cmpxchg(&l->locked_pending, _Q_PENDING_VAL, _Q_LOCKED_VAL)
-			== _Q_PENDING_VAL);
+	       (cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
+				_Q_LOCKED_VAL) == _Q_PENDING_VAL);
 }
 #else /* _Q_PENDING_BITS == 8 */
 static __always_inline void set_pending(struct qspinlock *lock)
@@ -138,7 +138,7 @@ static __always_inline int trylock_clear_pending(struct qspinlock *lock)
 		 */
 		old = val;
 		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
-		val = atomic_cmpxchg(&lock->val, old, new);
+		val = atomic_cmpxchg_acquire(&lock->val, old, new);
 
 		if (val == old)
 			return 1;
@@ -361,6 +361,9 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
 	 * observe its next->locked value and advance itself.
 	 *
 	 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
+	 *
+	 * We can't used relaxed form of cmpxchg here as the loading of
+	 * pn->state can happen before setting next->locked in some archs.
 	 */
 	if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
 		return;
-- 
1.8.3.1

next             reply	other threads:[~2017-02-17 20:44 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-17 20:43 Waiman Long [this message]
2017-02-20  4:20 ` [PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs Andrea Parri
2017-02-20  4:53   ` Boqun Feng
2017-02-20  4:58     ` Boqun Feng
2017-02-21 13:04       ` Will Deacon
2017-02-20 15:58     ` Waiman Long
2017-02-20 11:00   ` Peter Zijlstra

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e6b2f7a dfblob:a59dc05 )
 OR (
bs:"[PATCH v3] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1487364220-11641-1-git-send-email-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=boqun.feng@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=xinhui@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.