From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Will Deacon <will.deacon@arm.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Waiman Long <longman@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
boqun.feng@gmail.com, linux-arm-kernel@lists.infradead.org,
paulmck@linux.vnet.ibm.com, Ingo Molnar <mingo@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.14 24/72] locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath
Date: Thu, 20 Dec 2018 10:18:23 +0100 [thread overview]
Message-ID: <20181220085923.285659094@linuxfoundation.org> (raw)
In-Reply-To: <20181220085922.332225035@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 59fb586b4a07b4e1a0ee577140ab4842ba451acd upstream.
The qspinlock locking slowpath utilises a "pending" bit as a simple form
of an embedded test-and-set lock that can avoid the overhead of explicit
queuing in cases where the lock is held but uncontended. This bit is
managed using a cmpxchg() loop which tries to transition the uncontended
lock word from (0,0,0) -> (0,0,1) or (0,0,1) -> (0,1,1).
Unfortunately, the cmpxchg() loop is unbounded and lockers can be starved
indefinitely if the lock word is seen to oscillate between unlocked
(0,0,0) and locked (0,0,1). This could happen if concurrent lockers are
able to take the lock in the cmpxchg() loop without queuing and pass it
around amongst themselves.
This patch fixes the problem by unconditionally setting _Q_PENDING_VAL
using atomic_fetch_or, and then inspecting the old value to see whether
we need to spin on the current lock owner, or whether we now effectively
hold the lock. The tricky scenario is when concurrent lockers end up
queuing on the lock and the lock becomes available, causing us to see
a lockword of (n,0,0). With pending now set, simply queuing could lead
to deadlock as the head of the queue may not have observed the pending
flag being cleared. Conversely, if the head of the queue did observe
pending being cleared, then it could transition the lock from (n,0,0) ->
(0,0,1) meaning that any attempt to "undo" our setting of the pending
bit could race with a concurrent locker trying to set it.
We handle this race by preserving the pending bit when taking the lock
after reaching the head of the queue and leaving the tail entry intact
if we saw pending set, because we know that the tail is going to be
updated shortly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1524738868-31318-6-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/locking/qspinlock.c | 102 ++++++++++++++++------------
kernel/locking/qspinlock_paravirt.h | 5 --
2 files changed, 58 insertions(+), 49 deletions(-)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index e60e618287b4..7bd053e528c2 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -127,6 +127,17 @@ static inline __pure struct mcs_spinlock *decode_tail(u32 tail)
#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
#if _Q_PENDING_BITS == 8
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+ WRITE_ONCE(lock->pending, 0);
+}
+
/**
* clear_pending_set_locked - take ownership and clear the pending bit.
* @lock: Pointer to queued spinlock structure
@@ -162,6 +173,17 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
#else /* _Q_PENDING_BITS == 8 */
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+ atomic_andnot(_Q_PENDING_VAL, &lock->val);
+}
+
/**
* clear_pending_set_locked - take ownership and clear the pending bit.
* @lock: Pointer to queued spinlock structure
@@ -266,7 +288,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
struct mcs_spinlock *prev, *next, *node;
- u32 new, old, tail;
+ u32 old, tail;
int idx;
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
@@ -289,59 +311,51 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
(VAL != _Q_PENDING_VAL) || !cnt--);
}
+ /*
+ * If we observe any contention; queue.
+ */
+ if (val & ~_Q_LOCKED_MASK)
+ goto queue;
+
/*
* trylock || pending
*
* 0,0,0 -> 0,0,1 ; trylock
* 0,0,1 -> 0,1,1 ; pending
*/
- for (;;) {
+ val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+ if (!(val & ~_Q_LOCKED_MASK)) {
/*
- * If we observe any contention; queue.
+ * We're pending, wait for the owner to go away.
+ *
+ * *,1,1 -> *,1,0
+ *
+ * this wait loop must be a load-acquire such that we match the
+ * store-release that clears the locked bit and create lock
+ * sequentiality; this is because not all
+ * clear_pending_set_locked() implementations imply full
+ * barriers.
*/
- if (val & ~_Q_LOCKED_MASK)
- goto queue;
-
- new = _Q_LOCKED_VAL;
- if (val == new)
- new |= _Q_PENDING_VAL;
+ if (val & _Q_LOCKED_MASK) {
+ smp_cond_load_acquire(&lock->val.counter,
+ !(VAL & _Q_LOCKED_MASK));
+ }
/*
- * Acquire semantic is required here as the function may
- * return immediately if the lock was free.
+ * take ownership and clear the pending bit.
+ *
+ * *,1,0 -> *,0,1
*/
- old = atomic_cmpxchg_acquire(&lock->val, val, new);
- if (old == val)
- break;
-
- val = old;
- }
-
- /*
- * we won the trylock
- */
- if (new == _Q_LOCKED_VAL)
+ clear_pending_set_locked(lock);
return;
+ }
/*
- * we're pending, wait for the owner to go away.
- *
- * *,1,1 -> *,1,0
- *
- * this wait loop must be a load-acquire such that we match the
- * store-release that clears the locked bit and create lock
- * sequentiality; this is because not all clear_pending_set_locked()
- * implementations imply full barriers.
- */
- smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_MASK));
-
- /*
- * take ownership and clear the pending bit.
- *
- * *,1,0 -> *,0,1
+ * If pending was clear but there are waiters in the queue, then
+ * we need to undo our setting of pending before we queue ourselves.
*/
- clear_pending_set_locked(lock);
- return;
+ if (!(val & _Q_PENDING_MASK))
+ clear_pending(lock);
/*
* End of pending bit optimistic spinning and beginning of MCS
@@ -445,15 +459,15 @@ locked:
* claim the lock:
*
* n,0,0 -> 0,0,1 : lock, uncontended
- * *,0,0 -> *,0,1 : lock, contended
+ * *,*,0 -> *,*,1 : lock, contended
*
- * If the queue head is the only one in the queue (lock value == tail),
- * clear the tail code and grab the lock. Otherwise, we only need
- * to grab the lock.
+ * If the queue head is the only one in the queue (lock value == tail)
+ * and nobody is pending, clear the tail code and grab the lock.
+ * Otherwise, we only need to grab the lock.
*/
for (;;) {
/* In the PV case we might already have _Q_LOCKED_VAL set */
- if ((val & _Q_TAIL_MASK) != tail) {
+ if ((val & _Q_TAIL_MASK) != tail || (val & _Q_PENDING_MASK)) {
set_locked(lock);
break;
}
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 1435ba7954c3..854443f7b60b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -89,11 +89,6 @@ static __always_inline void set_pending(struct qspinlock *lock)
WRITE_ONCE(lock->pending, 1);
}
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
- WRITE_ONCE(lock->pending, 0);
-}
-
/*
* The pending bit check in pv_queued_spin_steal_lock() isn't a memory
* barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
--
2.19.1
next prev parent reply other threads:[~2018-12-20 9:18 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-20 9:17 [PATCH 4.14 00/72] 4.14.90-stable review Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 01/72] timer/debug: Change /proc/timer_list from 0444 to 0400 Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 02/72] pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11 Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 03/72] aio: fix spectre gadget in lookup_ioctx Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 04/72] userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 05/72] arm64: dma-mapping: Fix FORCE_CONTIGUOUS buffer clearing Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 06/72] MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310 Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 07/72] mmc: sdhci: fix the timeout check window for clock and reset Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 08/72] fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 09/72] ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 10/72] dm thin: send event about thin-pool state change _after_ making it Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 11/72] dm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 12/72] tracing: Fix memory leak in set_trigger_filter() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 13/72] tracing: Fix memory leak of instance function hash filters Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 14/72] powerpc/msi: Fix NULL pointer access in teardown code Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 15/72] drm/nouveau/kms: Fix memory leak in nv50_mstm_del() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 16/72] Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec" Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 17/72] drm/i915/execlists: Apply a full mb before execution for Braswell Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 18/72] drm/amdgpu: update SMC firmware image for polaris10 variants Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 19/72] x86/build: Fix compiler support check for CONFIG_RETPOLINE Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 20/72] locking: Remove smp_read_barrier_depends() from queued_spin_lock_slowpath() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 21/72] locking/qspinlock: Ensure node is initialised before updating prev->next Greg Kroah-Hartman
2018-12-20 12:19 ` Sudip Mukherjee
2018-12-20 15:41 ` Greg Kroah-Hartman
2018-12-20 19:20 ` Sudip Mukherjee
2018-12-20 9:18 ` [PATCH 4.14 22/72] locking/qspinlock: Bound spinning on pending->locked transition in slowpath Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 23/72] locking/qspinlock: Merge struct __qspinlock into struct qspinlock Greg Kroah-Hartman
2018-12-20 9:18 ` Greg Kroah-Hartman [this message]
2018-12-20 9:18 ` [PATCH 4.14 25/72] locking/qspinlock: Remove duplicate clear_pending() function from PV code Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 26/72] locking/qspinlock: Kill cmpxchg() loop when claiming lock from head of queue Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 27/72] locking/qspinlock: Re-order code Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 28/72] locking/qspinlock/x86: Increase _Q_PENDING_LOOPS upper bound Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 29/72] locking/qspinlock, x86: Provide liveness guarantee Greg Kroah-Hartman
2018-12-20 12:14 ` Sudip Mukherjee
2018-12-20 15:40 ` Greg Kroah-Hartman
2018-12-20 18:49 ` Sudip Mukherjee
2018-12-20 23:05 ` Sebastian Andrzej Siewior
2018-12-21 12:59 ` Sudip Mukherjee
2018-12-20 9:18 ` [PATCH 4.14 30/72] elevator: lookup mq vs non-mq elevators Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 31/72] netfilter: ipset: Fix wraparound in hash:*net* types Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 32/72] mac80211: dont WARN on bad WMM parameters from buggy APs Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 33/72] mac80211: Fix condition validating WMM IE Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 34/72] IB/hfi1: Remove race conditions in user_sdma send path Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 35/72] locking/qspinlock: Fix build for anonymous union in older GCC compilers Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 36/72] mac80211_hwsim: fix module init error paths for netlink Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 37/72] Input: hyper-v - fix wakeup from suspend-to-idle Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 38/72] scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 39/72] scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 40/72] x86/earlyprintk/efi: Fix infinite loop on some screen widths Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 41/72] drm/msm: Grab a vblank reference when waiting for commit_done Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 42/72] ARC: io.h: Implement reads{x}()/writes{x}() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 43/72] bonding: fix 802.3ad state sent to partner when unbinding slave Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 44/72] bpf: Fix verifier log string check for bad alignment Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 45/72] nfs: dont dirty kernel pages read by direct-io Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 46/72] SUNRPC: Fix a potential race in xprt_connect() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 47/72] sbus: char: add of_node_put() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 48/72] drivers/sbus/char: " Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 49/72] drivers/tty: add missing of_node_put() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 50/72] ide: pmac: add of_node_put() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 51/72] drm/msm: Fix error return checking Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 52/72] clk: mvebu: Off by one bugs in cp110_of_clk_get() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 53/72] clk: mmp: Off by one in mmp_clk_add() Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 54/72] Input: synaptics - enable SMBus for HP 15-ay000 Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 55/72] Input: omap-keypad - fix keyboard debounce configuration Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 56/72] libata: whitelist all SAMSUNG MZ7KM* solid-state disks Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 57/72] mv88e6060: disable hardware level MAC learning Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 58/72] net/mlx4_en: Fix build break when CONFIG_INET is off Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 59/72] ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling Greg Kroah-Hartman
2018-12-20 9:18 ` [PATCH 4.14 60/72] ARM: 8815/1: V7M: align v7m_dma_inv_range() with v7 counterpart Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 61/72] ethernet: fman: fix wrong of_node_put() in probe function Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 62/72] drm/ast: Fix connector leak during driver unload Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 63/72] cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs) Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 64/72] vhost/vsock: fix reset orphans race with close timeout Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 65/72] mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 66/72] i2c: axxia: properly handle master timeout Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 67/72] i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 68/72] i2c: uniphier: fix violation of tLOW requirement for Fast-mode Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 69/72] i2c: uniphier-f: " Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 70/72] nvmet-rdma: fix response use after free Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 71/72] rtc: snvs: Add timeouts to avoid kernel lockups Greg Kroah-Hartman
2018-12-20 9:19 ` [PATCH 4.14 72/72] bpf, arm: fix emit_ldx_r and emit_mov_i using TMP_REG_1 Greg Kroah-Hartman
2018-12-20 18:29 ` [PATCH 4.14 00/72] 4.14.90-stable review Guenter Roeck
2018-12-20 22:50 ` shuah
2018-12-21 1:13 ` Naresh Kamboju
[not found] ` <CA+res+QQt7GEL0td1FraSJNnR2yOcDgXGahW+YyiuMSeWTshfw@mail.gmail.com>
2018-12-21 7:54 ` Jinpu Wang
2018-12-21 10:00 ` Greg Kroah-Hartman
2018-12-21 9:27 ` Jon Hunter
2018-12-21 10:00 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181220085923.285659094@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bigeasy@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mingo@kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).