From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath
Date: Mon, 9 Apr 2018 15:54:09 +0100 [thread overview]
Message-ID: <20180409145409.GA9661@arm.com> (raw)
In-Reply-To: <20180409105835.GC23134@arm.com>
On Mon, Apr 09, 2018 at 11:58:35AM +0100, Will Deacon wrote:
> On Fri, Apr 06, 2018 at 04:50:19PM -0400, Waiman Long wrote:
> > The pending bit was added to the qspinlock design to counter performance
> > degradation compared with the ticket lock for workloads with light
> > spinlock contention. I ran my spinlock stress test on an Intel Skylake
> > server running the vanilla 4.16 kernel vs a kernel patched with this
> > patchset. The locking rates with different numbers of locking threads
> > were as follows:
> >
> >  # of threads    4.16 kernel    patched 4.16 kernel
> >  ------------    -----------    -------------------
> >        1          7,417 kop/s        7,408 kop/s
> >        2          5,755 kop/s        4,486 kop/s
> >        3          4,214 kop/s        4,169 kop/s
> >        4          4,396 kop/s        4,383 kop/s
> >
> > The 2 contending threads case is the one that exercises the pending bit
> > code path the most, so it is obviously the one most impacted by this
> > patchset. The differences in the other cases are mostly noise, with
> > perhaps a small effect in the 3 contending threads case.
>
> That is bizarre. A few questions:
>
> 1. Is this with my patches as posted, or also with your WRITE_ONCE change?
> 2. Could you try to bisect my series to see which patch is responsible
> for this degradation, please?
> 3. Could you point me at your stress test, so I can try to reproduce these
> numbers on arm64 systems, please?
>
> > I am not against this patch, but we certainly need to find a way to
> > bring the performance numbers back up closer to where they were before
> > applying the patch.
>
> We certainly need to *understand* where the drop is coming from, because
> the two-threaded case is still just a CAS on x86 with and without this
> patch series. Generally, there's a throughput cost when ensuring fairness
> and forward-progress; otherwise we'd all be using test-and-set.
Whilst I think we still need to address my questions above, I've had a
crack at the diff below. Please can you give it a spin? It sticks a trylock
on the slowpath before setting pending and replaces the CAS-based set
with an xchg (which I *think* is safe, but will need to ponder it some
more).
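
To make the motivation concrete: the posted series sets pending with
atomic_fetch_or_acquire() on the whole lock word and, as I understand it,
x86 implements that as a cmpxchg loop when the old value is consumed. A
userspace C11 sketch of the two shapes (illustration only, not the kernel
code):

#include <stdatomic.h>
#include <stdint.h>

#define PENDING		(1u << 8)	/* pending bit, as in _Q_PENDING_VAL */

/*
 * fetch_or on the whole word: a single call, though whether it is a
 * single instruction depends on the architecture (x86 tends to fall
 * back to a cmpxchg loop when the old value is needed).
 */
static uint32_t set_pending_fetch_or(_Atomic uint32_t *val)
{
	return atomic_fetch_or_explicit(val, PENDING, memory_order_acquire);
}

/*
 * Open-coded equivalent: under contention on the lock word this can
 * retry an unbounded number of times, which is the loop we're avoiding.
 */
static uint32_t set_pending_cmpxchg(_Atomic uint32_t *val)
{
	uint32_t old = atomic_load_explicit(val, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(val, &old, old | PENDING,
						      memory_order_acquire,
						      memory_order_relaxed))
		;

	return old;
}

The diff goes one step further for the 8-bit pending layout and replaces
the fetch_or with a single xchg of the pending byte plus an acquire read
of the full word; there's a worked example of how the old value gets
reassembled after the diff.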
Thanks,
Will
--->8
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 19261af9f61e..71eb5e3a3d91 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -139,6 +139,20 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
}
+/**
+ * set_pending_fetch_acquire - set the pending bit and return the old lock
+ * value with acquire semantics.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,* -> *,1,*
+ */
+static __always_inline u32 set_pending_fetch_acquire(struct qspinlock *lock)
+{
+ u32 val = xchg_relaxed(&lock->pending, 1) << _Q_PENDING_OFFSET;
+ val |= (atomic_read_acquire(&lock->val) & ~_Q_PENDING_MASK);
+ return val;
+}
+
/*
* xchg_tail - Put in the new queue tail code word & retrieve previous one
* @lock : Pointer to queued spinlock structure
@@ -184,6 +198,18 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
}
/**
+ * set_pending_fetch_acquire - set the pending bit and return the old lock
+ * value with acquire semantics.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,* -> *,1,*
+ */
+static __always_inline u32 set_pending_fetch_acquire(struct qspinlock *lock)
+{
+ return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+
+/**
* xchg_tail - Put in the new queue tail code word & retrieve previous one
* @lock : Pointer to queued spinlock structure
* @tail : The new queue tail code word
@@ -289,18 +315,26 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
return;
/*
- * If we observe any contention; queue.
+ * If we observe queueing, then queue ourselves.
*/
- if (val & ~_Q_LOCKED_MASK)
+ if (val & _Q_TAIL_MASK)
goto queue;
/*
+ * We didn't see any queueing, so have one more try at snatching
+ * the lock in case it became available whilst we were taking the
+ * slow path.
+ */
+ if (queued_spin_trylock(lock))
+ return;
+
+ /*
* trylock || pending
*
* 0,0,0 -> 0,0,1 ; trylock
* 0,0,1 -> 0,1,1 ; pending
*/
- val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+ val = set_pending_fetch_acquire(lock);
if (!(val & ~_Q_LOCKED_MASK)) {
/*
* we're pending, wait for the owner to go away.
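
For reference, here's how the 8-bit pending variant above pieces the old
lock value back together: the old pending field comes from the byte xchg
and everything else from a subsequent acquire read of the whole word with
the pending field masked out. A plain-arithmetic example, assuming the
usual layout (locked byte in bits 0-7, pending in bits 8-15, tail above
that):

#include <stdint.h>
#include <stdio.h>

#define _Q_LOCKED_MASK		0x000000ffu
#define _Q_PENDING_OFFSET	8
#define _Q_PENDING_MASK		0x0000ff00u

int main(void)
{
	uint32_t word		= 0x00000001u;	/* lock held, no pending, no tail */
	uint8_t  old_pending	= 0;		/* what xchg_relaxed(&lock->pending, 1) returned */

	/* Reassemble the old value as the first set_pending_fetch_acquire() does. */
	uint32_t val = (uint32_t)old_pending << _Q_PENDING_OFFSET;
	val |= word & ~_Q_PENDING_MASK;		/* locked byte + tail from the acquire read */

	/* val == 0x00000001, so !(val & ~_Q_LOCKED_MASK) holds and we wait as pending. */
	printf("val = 0x%08x\n", val);
	return 0;
}

Note that the full-word read happens after the byte exchange, so the
locked and tail fields can be newer than they were when the xchg hit;
that window is the part I still want to ponder.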