[PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath

linux-arm-kernel.lists.infradead.org archive mirror

From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath
Date: Fri, 6 Apr 2018 14:09:53 -0700
Message-ID: <20180406210953.GA24165@linux.vnet.ibm.com>
In-Reply-To: <dc5f5e43-a60a-05f2-16fb-46960c40459e@redhat.com>

On Fri, Apr 06, 2018 at 04:50:19PM -0400, Waiman Long wrote:
> On 04/05/2018 12:58 PM, Will Deacon wrote:
> > The qspinlock locking slowpath utilises a "pending" bit as a simple form
> > of an embedded test-and-set lock that can avoid the overhead of explicit
> > queuing in cases where the lock is held but uncontended. This bit is
> > managed using a cmpxchg loop which tries to transition the uncontended
> > lock word from (0,0,0) -> (0,0,1) or (0,0,1) -> (0,1,1).
> >
> > Unfortunately, the cmpxchg loop is unbounded and lockers can be starved
> > indefinitely if the lock word is seen to oscillate between unlocked
> > (0,0,0) and locked (0,0,1). This could happen if concurrent lockers are
> > able to take the lock in the cmpxchg loop without queuing and pass it
> > around amongst themselves.
> >
> > This patch fixes the problem by unconditionally setting _Q_PENDING_VAL
> > using atomic_fetch_or, and then inspecting the old value to see whether
> > we need to spin on the current lock owner, or whether we now effectively
> > hold the lock. The tricky scenario is when concurrent lockers end up
> > queuing on the lock and the lock becomes available, causing us to see
> > a lockword of (n,0,0). With pending now set, simply queuing could lead
> > to deadlock as the head of the queue may not have observed the pending
> > flag being cleared. Conversely, if the head of the queue did observe
> > pending being cleared, then it could transition the lock from (n,0,0) ->
> > (0,0,1) meaning that any attempt to "undo" our setting of the pending
> > bit could race with a concurrent locker trying to set it.
> >
> > We handle this race by preserving the pending bit when taking the lock
> > after reaching the head of the queue and leaving the tail entry intact
> > if we saw pending set, because we know that the tail is going to be
> > updated shortly.
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
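
For readers following along, the two approaches described in the commit message above can be sketched roughly as follows. This is a minimal userspace sketch using C11 atomics, not the kernel code: the bit layout, the Q_* constants, the enum, and the function names are all assumptions made for illustration, and the handling of a pending bit that was set while other CPUs were already queued is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock-word layout, written as (tail, pending, locked). */
#define Q_LOCKED   0x00000001u   /* (0,0,1) */
#define Q_PENDING  0x00000100u   /* (0,1,0) */
#define Q_TAIL     0xffff0000u   /* (n,0,0) */

enum pending_result { GOT_LOCK, GOT_PENDING, MUST_QUEUE };

/* Old scheme: retry the cmpxchg until it succeeds or contention is seen. */
static enum pending_result pending_cmpxchg_loop(atomic_uint *lock)
{
    unsigned int val = atomic_load_explicit(lock, memory_order_relaxed);

    for (;;) {
        unsigned int new;

        if (val & ~Q_LOCKED)        /* pending or tail already set */
            return MUST_QUEUE;

        new = Q_LOCKED;             /* (0,0,0) -> (0,0,1) */
        if (val == Q_LOCKED)
            new |= Q_PENDING;       /* (0,0,1) -> (0,1,1) */

        /*
         * On failure val is refreshed and we go around again. If the
         * word keeps flipping between (0,0,0) and (0,0,1), nothing
         * bounds the number of iterations.
         */
        if (atomic_compare_exchange_weak_explicit(lock, &val, new,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return new == Q_LOCKED ? GOT_LOCK : GOT_PENDING;
    }
}

/* New scheme: one unconditional fetch_or, then inspect the old value. */
static enum pending_result pending_fetch_or(atomic_uint *lock)
{
    unsigned int old = atomic_fetch_or_explicit(lock, Q_PENDING,
                                                memory_order_acquire);

    /*
     * Old value was (0,0,0) or (0,0,1): we now own the pending bit and
     * only have to wait for the current owner, if any, to release.
     */
    if (!(old & ~Q_LOCKED))
        return GOT_PENDING;

    /*
     * Someone else was pending or queued. If old had the form (n,0,0)
     * or (n,0,1), the pending bit we just set is still in the word;
     * dealing with that is the tricky case and is not modelled here.
     */
    return MUST_QUEUE;
}

/* Single-threaded walk through the uncontended transitions. */
void qspin_sketch_demo(void)
{
    atomic_uint lock = 0;

    assert(pending_cmpxchg_loop(&lock) == GOT_LOCK);      /* (0,0,1) */
    assert(pending_cmpxchg_loop(&lock) == GOT_PENDING);   /* (0,1,1) */
    assert(pending_cmpxchg_loop(&lock) == MUST_QUEUE);

    atomic_store(&lock, Q_LOCKED);
    assert(pending_fetch_or(&lock) == GOT_PENDING);       /* (0,1,1) */
    assert(pending_fetch_or(&lock) == MUST_QUEUE);
}
```

The point of the second function is that it performs exactly one atomic read-modify-write before deciding what to do, so its forward progress does not depend on winning a race against other lockers.
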
> 
> The pending bit was added to the qspinlock design to counter performance
> degradation compared with the ticket lock for workloads with light
> spinlock contention. I ran my spinlock stress test on an Intel Skylake
> server running the vanilla 4.16 kernel vs. a patched kernel with this
> patchset. The locking rates with different numbers of locking threads
> were as follows:
> 
>   # of threads  4.16 kernel     patched 4.16 kernel
>   ------------  -----------     -------------------
>         1       7,417 kop/s         7,408 kop/s
>         2       5,755 kop/s         4,486 kop/s
>         3       4,214 kop/s         4,169 kop/s
>         4       4,396 kop/s         4,383 kop/s
> 
> The 2 contending threads case is the one that exercises the pending bit
> code path the most, so it is the one most impacted by this patchset.
> The differences in the other cases are mostly noise, with perhaps a
> slight effect in the 3 contending threads case.
> 
> I am not against this patch, but we certainly need to find a way to
> bring the performance numbers back closer to what they were before
> applying the patch.

It would indeed be good to not be in the position of having to trade off
forward-progress guarantees against performance, but that does appear to
be where we are at the moment.

							Thanx, Paul

Thread overview: 47+ messages
2018-04-05 16:58 [PATCH 00/10] kernel/locking: qspinlock improvements Will Deacon
2018-04-05 16:58 ` [PATCH 01/10] locking/qspinlock: Don't spin on pending->locked transition in slowpath Will Deacon
2018-04-05 16:58 ` [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath Will Deacon
2018-04-05 17:07   ` Peter Zijlstra
2018-04-06 15:08     ` Will Deacon
2018-04-05 17:13   ` Peter Zijlstra
2018-04-05 21:16   ` Waiman Long
2018-04-06 15:08     ` Will Deacon
2018-04-06 20:50   ` Waiman Long
2018-04-06 21:09     ` Paul E. McKenney [this message]
2018-04-07  8:47       ` Peter Zijlstra
2018-04-07 23:37         ` Paul E. McKenney
2018-04-09 10:58         ` Will Deacon
2018-04-07  9:07     ` Peter Zijlstra
2018-04-09 10:58     ` Will Deacon
2018-04-09 14:54       ` Will Deacon
2018-04-09 15:54         ` Peter Zijlstra
2018-04-09 17:19           ` Will Deacon
2018-04-10  9:35             ` Peter Zijlstra
2018-09-20 16:08             ` Peter Zijlstra
2018-09-20 16:22               ` Peter Zijlstra
2018-04-09 19:33         ` Waiman Long
2018-04-09 17:55       ` Waiman Long
2018-04-10 13:49   ` Sasha Levin
2018-04-05 16:59 ` [PATCH 03/10] locking/qspinlock: Kill cmpxchg loop when claiming lock from head of queue Will Deacon
2018-04-05 17:19   ` Peter Zijlstra
2018-04-06 10:54     ` Will Deacon
2018-04-05 16:59 ` [PATCH 04/10] locking/qspinlock: Use atomic_cond_read_acquire Will Deacon
2018-04-05 16:59 ` [PATCH 05/10] locking/mcs: Use smp_cond_load_acquire() in mcs spin loop Will Deacon
2018-04-05 16:59 ` [PATCH 06/10] barriers: Introduce smp_cond_load_relaxed and atomic_cond_read_relaxed Will Deacon
2018-04-05 17:22   ` Peter Zijlstra
2018-04-06 10:55     ` Will Deacon
2018-04-05 16:59 ` [PATCH 07/10] locking/qspinlock: Use smp_cond_load_relaxed to wait for next node Will Deacon
2018-04-05 16:59 ` [PATCH 08/10] locking/qspinlock: Merge struct __qspinlock into struct qspinlock Will Deacon
2018-04-07  5:23   ` Boqun Feng
2018-04-05 16:59 ` [PATCH 09/10] locking/qspinlock: Make queued_spin_unlock use smp_store_release Will Deacon
2018-04-05 16:59 ` [PATCH 10/10] locking/qspinlock: Elide back-to-back RELEASE operations with smp_wmb() Will Deacon
2018-04-05 17:28   ` Peter Zijlstra
2018-04-06 11:34     ` Will Deacon
2018-04-06 13:05       ` Andrea Parri
2018-04-06 15:27         ` Will Deacon
2018-04-06 15:49           ` Andrea Parri
2018-04-07  5:47   ` Boqun Feng
2018-04-09 10:47     ` Will Deacon
2018-04-06 13:22 ` [PATCH 00/10] kernel/locking: qspinlock improvements Andrea Parri
2018-04-11 10:20   ` Catalin Marinas
2018-04-11 15:39     ` Andrea Parri