From: Sasha Levin <Alexander.Levin@microsoft.com>
To: "stable@vger.kernel.org" <stable@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Sasha Levin <Alexander.Levin@microsoft.com>
Subject: [PATCH AUTOSEL 4.14 42/57] rcu: Fix grace-period hangs due to race with CPU offline
Date: Sat, 15 Sep 2018 01:32:55 +0000
Message-ID: <20180915013223.179909-42-alexander.levin@microsoft.com>
In-Reply-To: <20180915013223.179909-1-alexander.levin@microsoft.com>
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
[ Upstream commit 1e64b15a4b102e1cd059d4d798b7a78f93341333 ]
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:

1.  CPU 1 goes offline.

2.  Because CPU 1 is the only CPU in the system blocking the current
    grace period, the grace period ends as soon as
    rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
    returns.

3.  At this point, the leaf rcu_node structure's ->lock is no longer
    held: rcu_report_qs_rnp() has released it, as it must in order
    to awaken the RCU grace-period kthread.

4.  At this point, that same leaf rcu_node structure's ->qsmaskinitnext
    field still records CPU 1 as being online. This is absolutely
    necessary because the scheduler uses RCU (in this case on the
    wake-up path while awakening RCU's grace-period kthread), and
    ->qsmaskinitnext contains RCU's idea as to which CPUs are online.
    Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
    bit from ->qsmaskinitnext would result in a lockdep-RCU splat
    due to RCU being used from an offline CPU.

5.  RCU's grace-period kthread awakens, sees that the old grace period
    has completed and that a new one is needed. It therefore starts
    a new grace period, but because CPU 1's leaf rcu_node structure's
    ->qsmaskinitnext field still shows CPU 1 as being online, this new
    grace period is initialized to wait for a quiescent state from the
    now-offline CPU 1.

6.  Without the fail-safe force-quiescent-state checks, there would
    be no quiescent state from the now-offline CPU 1, which would
    eventually result in RCU CPU stall warnings and memory exhaustion.
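
The scenario above can be replayed with a small single-threaded userspace
model. This is an illustrative sketch only, not kernel code: struct
leaf_node, gp_init(), report_qs(), and replay_offline() are hypothetical
names, and only the field names qsmask, qsmaskinit, and qsmaskinitnext
mirror the rcu_node structure. It assumes a two-CPU system and shows that
a grace period initialized after CPU 1's final quiescent-state report but
before its ->qsmaskinitnext bit is cleared ends up waiting on the offline
CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a leaf rcu_node structure. */
struct leaf_node {
	unsigned long qsmaskinit;     /* CPUs the current grace period started with. */
	unsigned long qsmaskinitnext; /* RCU's idea of which CPUs are online. */
	unsigned long qsmask;         /* CPUs still blocking the current grace period. */
};

/* Model of grace-period initialization: wait on every CPU believed online. */
static void gp_init(struct leaf_node *rnp)
{
	rnp->qsmaskinit = rnp->qsmaskinitnext;
	rnp->qsmask = rnp->qsmaskinit;
}

/* Model of a quiescent-state report; returns true once no CPU blocks the GP. */
static bool report_qs(struct leaf_node *rnp, unsigned long mask)
{
	rnp->qsmask &= ~mask;
	return rnp->qsmask == 0;
}

/*
 * Replay the offline sequence for CPU 1 on an assumed two-CPU system.
 * If gp_starts_in_window is true, the next grace period is initialized
 * before CPU 1's bit is cleared from qsmaskinitnext (steps 3-5).
 * Returns true if the new grace period waits on the now-offline CPU 1.
 */
bool replay_offline(bool gp_starts_in_window)
{
	const unsigned long cpu0 = 1UL << 0;
	const unsigned long cpu1 = 1UL << 1;
	struct leaf_node rnp = { .qsmaskinitnext = cpu0 | cpu1 };
	bool ended;

	gp_init(&rnp);
	report_qs(&rnp, cpu0);         /* CPU 0 has already reported. */
	ended = report_qs(&rnp, cpu1); /* Step 2: CPU 1's report ends the GP. */
	assert(ended);
	(void)ended;

	if (gp_starts_in_window)
		gp_init(&rnp);             /* Step 5: stale online bit is sampled. */

	rnp.qsmaskinitnext &= ~cpu1;   /* The offline operation is finally recorded. */

	if (!gp_starts_in_window)
		gp_init(&rnp);             /* Safe ordering: bit already cleared. */

	return (rnp.qsmask & cpu1) != 0;
}
```

In this model, replay_offline(true) reports that the new grace period waits
on CPU 1, while replay_offline(false) does not.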
It would be good to get rid of the special fail-safe quiescent-state
propagation checks, and thus it would be good to fix things so that
the above scenario cannot happen. This commit therefore adds a new
->ofl_lock to the rcu_state structure. This lock is held by rcu_gp_init()
while it applies buffered online and offline operations to the
rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
when buffering a new offline operation. This prevents rcu_gp_init()
from acquiring the leaf rcu_node structure's lock during the interval
between rcu_cleanup_dying_idle_cpu()'s invocation of rcu_report_qs_rnp(),
which releases ->lock, and the re-acquisition of that same lock.
This in turn prevents the failure scenario outlined above, and will
hopefully eventually allow removal of the offline-CPU checks from the
force-quiescent-state code path.
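
The locking rule described above can be sketched in userspace as follows.
This is an analogy under stated assumptions, not the kernel implementation:
pthread mutexes stand in for the kernel spinlocks, and offline_cpu1(),
gp_init_leaf(), run_once(), cpu1_reported, and saw_stale_online are
hypothetical names. The offline path follows the description in the commit
message (an outer lock held across the release and re-acquisition of the
node lock); the grace-period path takes the same outer lock before the
node lock, so it cannot run inside that window.

```c
#include <pthread.h>
#include <stdbool.h>

#define CPU1_MASK (1UL << 1)

/* Hypothetical userspace stand-ins for rsp->ofl_lock and rnp->lock. */
static pthread_mutex_t ofl_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long qsmaskinitnext; /* Which CPUs are believed online. */
static unsigned long qsmask;         /* CPUs the new grace period waits on. */
static bool cpu1_reported;           /* CPU 1's final quiescent state was reported. */
static bool saw_stale_online;        /* GP init ran inside the race window. */

/* Offline path: the outer lock spans the gap in which node_lock is dropped. */
static void *offline_cpu1(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&ofl_lock);
	pthread_mutex_lock(&node_lock);
	cpu1_reported = true;
	pthread_mutex_unlock(&node_lock); /* Models the report releasing ->lock. */
	/* Race window: node_lock is free, but ofl_lock is still held. */
	pthread_mutex_lock(&node_lock);
	qsmaskinitnext &= ~CPU1_MASK;     /* Record CPU 1 as offline. */
	pthread_mutex_unlock(&node_lock);
	pthread_mutex_unlock(&ofl_lock);
	return NULL;
}

/* Grace-period path: same lock order, outer ofl_lock first, then node_lock. */
static void *gp_init_leaf(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&ofl_lock);
	pthread_mutex_lock(&node_lock);
	qsmask = qsmaskinitnext;
	if (cpu1_reported && (qsmask & CPU1_MASK))
		saw_stale_online = true;      /* Would mean waiting on an offline CPU. */
	pthread_mutex_unlock(&node_lock);
	pthread_mutex_unlock(&ofl_lock);
	return NULL;
}

/* Run both paths concurrently; returns true if GP init saw the stale bit. */
bool run_once(void)
{
	pthread_t offline_thread, gp_thread;

	qsmaskinitnext = 0x3UL; /* Assumed two-CPU system: CPUs 0 and 1 online. */
	qsmask = 0;
	cpu1_reported = false;
	saw_stale_online = false;

	pthread_create(&offline_thread, NULL, offline_cpu1, NULL);
	pthread_create(&gp_thread, NULL, gp_init_leaf, NULL);
	pthread_join(offline_thread, NULL);
	pthread_join(gp_thread, NULL);
	return saw_stale_online;
}
```

Whichever thread wins the outer lock, the grace-period path sees CPU 1
either as online and not yet reported, or as reported and already offline,
so run_once() is expected to return false. Both paths take the locks in the
same order, which avoids an ABBA deadlock between them.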
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
kernel/rcu/tree.c | 6 ++++++
kernel/rcu/tree.h | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3e3650e94ae6..930edad27cbf 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -102,6 +102,7 @@ struct rcu_state sname##_state = { \
.abbr = sabbr, \
.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
+ .ofl_lock = __SPIN_LOCK_UNLOCKED(sname##_state.ofl_lock), \
}
RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
@@ -1996,11 +1997,13 @@ static bool rcu_gp_init(struct rcu_state *rsp)
*/
rcu_for_each_leaf_node(rsp, rnp) {
rcu_gp_slow(rsp, gp_preinit_delay);
+ spin_lock(&rsp->ofl_lock);
raw_spin_lock_irq_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
!rnp->wait_blkd_tasks) {
/* Nothing to do on this leaf rcu_node structure. */
raw_spin_unlock_irq_rcu_node(rnp);
+ spin_unlock(&rsp->ofl_lock);
continue;
}
@@ -2035,6 +2038,7 @@ static bool rcu_gp_init(struct rcu_state *rsp)
}
raw_spin_unlock_irq_rcu_node(rnp);
+ spin_unlock(&rsp->ofl_lock);
}
/*
@@ -3837,9 +3841,11 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
mask = rdp->grpmask;
+ spin_lock(&rsp->ofl_lock);
raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
rnp->qsmaskinitnext &= ~mask;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ spin_unlock(&rsp->ofl_lock);
}
/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 8e1f285f0a70..46fa50dbeb82 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -389,6 +389,10 @@ struct rcu_state {
const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */
struct list_head flavors; /* List of RCU flavors. */
+
+ spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+ /* Synchronize offline with */
+ /* GP pre-initialization. */
};
/* Values for rcu_state structure's gp_flags field. */
--
2.17.1