From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>,
"Paul E . McKenney" <paulmck@kernel.org>,
Sasha Levin <sashal@kernel.org>,
frederic@kernel.org, quic_neeraju@quicinc.com,
josh@joshtriplett.org, rcu@vger.kernel.org
Subject: [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
Date: Mon, 28 Mar 2022 07:17:50 -0400
Message-ID: <20220328111828.1554086-6-sashal@kernel.org>
In-Reply-To: <20220328111828.1554086-1-sashal@kernel.org>
From: David Woodhouse <dwmw@amazon.co.uk>

[ Upstream commit 82980b1622d97017053c6792382469d7dc26a486 ]

If we allow architectures to bring APs online in parallel, then we end
up requiring rcu_cpu_starting() to be reentrant. But currently, the
manipulation of rnp->ofl_seq is not thread-safe.

However, rnp->ofl_seq is also fairly much pointless anyway since both
rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
fairly much the whole time that rnp->ofl_seq is set to an odd number
to indicate that an operation is in progress.

So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

This has a couple of minor complexities: lockdep will complain when we
take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
avoid that false positive complaint. Since we're killing rnp->ofl_seq
of course that 'excuse' has to be changed too, so make it check for
arch_spin_is_locked(rcu_state.ofl_lock).

There's no arch_spin_lock_irqsave() so we have to manually save and
restore local interrupts around the locking.

At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
wait but *exclude* any CPU online/offline activity, which was fairly
much true already by virtue of it holding rcu_state.ofl_lock.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
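Editorial note on the "not thread-safe" claim in the changelog: the old
code bumped the counter with a plain load followed by a store, so two
CPUs on the same leaf node can lose an update. The sketch below is a
userspace replay of one possible interleaving, not kernel code. The
names seq_model, seq_load(), seq_store(), replay_lost_update() and
seq_says_busy() are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the removed rnp->ofl_seq field. */
struct seq_model {
	unsigned long ofl_seq;
};

/* A non-atomic "seq = seq + 1" split into its separate load and store. */
static unsigned long seq_load(const struct seq_model *m)
{
	return m->ofl_seq;
}

static void seq_store(struct seq_model *m, unsigned long v)
{
	m->ofl_seq = v;
}

/*
 * Replay one unlucky interleaving of two CPUs, A and B, that each do
 * "begin: seq + 1 ... end: seq + 1" on the same leaf node.
 * Returns the final counter value.
 */
static unsigned long replay_lost_update(void)
{
	struct seq_model m = { .ofl_seq = 0 };
	unsigned long a, b;

	/* Both CPUs load 0 before either one stores. */
	a = seq_load(&m);
	b = seq_load(&m);
	seq_store(&m, a + 1);	/* A: 0 -> 1 */
	seq_store(&m, b + 1);	/* B: 0 -> 1, A's increment is lost */

	/* The two "end" increments happen to run back to back. */
	a = seq_load(&m);
	seq_store(&m, a + 1);	/* A: 1 -> 2, reads as idle while B is still busy */
	b = seq_load(&m);
	seq_store(&m, b + 1);	/* B: 2 -> 3 */

	return m.ofl_seq;
}

/* Under the old convention, an odd value meant "operation in progress". */
static bool seq_says_busy(unsigned long seq)
{
	return seq & 0x1;
}
```

In this replay the counter ends at 3, so seq_says_busy() still reports
an operation in progress after both CPUs have finished. Holding a single
lock across the whole operation, as the patch does, leaves no such
window.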
kernel/rcu/tree.c | 71 ++++++++++++++++++++++++-----------------------
kernel/rcu/tree.h | 4 +--
2 files changed, 37 insertions(+), 38 deletions(-)
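Editorial note on the locking pattern used throughout the diff below:
since there is no arch_spin_lock_irqsave(), each site saves the
interrupt state, takes the lock, and later undoes both in reverse order.
The following is a minimal single-threaded userspace model of that
ordering, assuming C11 atomics. Every model_* name is hypothetical and
only mimics the shape of local_irq_save(), arch_spin_lock(),
arch_spin_is_locked(), arch_spin_unlock() and local_irq_restore(); none
of it is the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock standing in for arch_spinlock_t. */
struct model_lock {
	atomic_bool locked;
};

/* Models the current CPU's interrupt-enable state (single-threaded model). */
static bool model_irqs_enabled = true;

static unsigned long model_irq_save(void)
{
	unsigned long flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = flags != 0;
}

static void model_lock_acquire(struct model_lock *l)
{
	/* Spin until the previous value was "unlocked". */
	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		;
}

static void model_lock_release(struct model_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

static bool model_lock_is_locked(struct model_lock *l)
{
	return atomic_load_explicit(&l->locked, memory_order_relaxed);
}

/*
 * Save irq state, lock, do the work, unlock, restore irq state.
 * Returns true if the lock was held with "interrupts" off inside the
 * section, and the lock is free and the irq state restored afterwards.
 */
static bool model_hotplug_section(struct model_lock *l)
{
	unsigned long flags;
	bool ok;

	flags = model_irq_save();
	model_lock_acquire(l);
	ok = model_lock_is_locked(l) && !model_irqs_enabled;
	model_lock_release(l);
	model_irq_restore(flags);

	return ok && !model_lock_is_locked(l) && model_irqs_enabled;
}
```

Like the arch_spin_is_locked() check added to
rcu_lockdep_current_cpu_online(), model_lock_is_locked() only says that
someone holds the lock, not who holds it.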
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4c25a6283b0..73a4c9d07b86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -91,7 +91,7 @@ static struct rcu_state rcu_state = {
.abbr = RCU_ABBR,
.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
- .ofl_lock = __RAW_SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock),
+ .ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
};
/* Dump rcu_node combining tree at boot to verify correct setup. */
@@ -1175,7 +1175,15 @@ bool rcu_lockdep_current_cpu_online(void)
preempt_disable_notrace();
rdp = this_cpu_ptr(&rcu_data);
rnp = rdp->mynode;
- if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || READ_ONCE(rnp->ofl_seq) & 0x1)
+ /*
+ * Strictly, we care here about the case where the current CPU is
+ * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
+ * not being up to date. So arch_spin_is_locked() might have a
+ * false positive if it's held by some *other* CPU, but that's
+ * OK because that just means a false *negative* on the warning.
+ */
+ if (rdp->grpmask & rcu_rnp_online_cpus(rnp) ||
+ arch_spin_is_locked(&rcu_state.ofl_lock))
ret = true;
preempt_enable_notrace();
return ret;
@@ -1739,7 +1747,6 @@ static void rcu_strict_gp_boundary(void *unused)
*/
static noinline_for_stack bool rcu_gp_init(void)
{
- unsigned long firstseq;
unsigned long flags;
unsigned long oldmask;
unsigned long mask;
@@ -1782,22 +1789,17 @@ static noinline_for_stack bool rcu_gp_init(void)
* of RCU's Requirements documentation.
*/
WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
+ /* Exclude CPU hotplug operations. */
rcu_for_each_leaf_node(rnp) {
- // Wait for CPU-hotplug operations that might have
- // started before this grace period did.
- smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
- firstseq = READ_ONCE(rnp->ofl_seq);
- if (firstseq & 0x1)
- while (firstseq == READ_ONCE(rnp->ofl_seq))
- schedule_timeout_idle(1); // Can't wake unless RCU is watching.
- smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
- raw_spin_lock(&rcu_state.ofl_lock);
- raw_spin_lock_irq_rcu_node(rnp);
+ local_irq_save(flags);
+ arch_spin_lock(&rcu_state.ofl_lock);
+ raw_spin_lock_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
!rnp->wait_blkd_tasks) {
/* Nothing to do on this leaf rcu_node structure. */
- raw_spin_unlock_irq_rcu_node(rnp);
- raw_spin_unlock(&rcu_state.ofl_lock);
+ raw_spin_unlock_rcu_node(rnp);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(flags);
continue;
}
@@ -1832,8 +1834,9 @@ static noinline_for_stack bool rcu_gp_init(void)
rcu_cleanup_dead_rnp(rnp);
}
- raw_spin_unlock_irq_rcu_node(rnp);
- raw_spin_unlock(&rcu_state.ofl_lock);
+ raw_spin_unlock_rcu_node(rnp);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(flags);
}
rcu_gp_slow(gp_preinit_delay); /* Races with CPU hotplug. */
@@ -4287,11 +4290,10 @@ void rcu_cpu_starting(unsigned int cpu)
rnp = rdp->mynode;
mask = rdp->grpmask;
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+ local_irq_save(flags);
+ arch_spin_lock(&rcu_state.ofl_lock);
rcu_dynticks_eqs_online();
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ raw_spin_lock_rcu_node(rnp);
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
newcpu = !(rnp->expmaskinitnext & mask);
rnp->expmaskinitnext |= mask;
@@ -4304,15 +4306,18 @@ void rcu_cpu_starting(unsigned int cpu)
/* An incoming CPU should never be blocking a grace period. */
if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
+ /* rcu_report_qs_rnp() *really* wants some flags to restore */
+ unsigned long flags2;
+
+ local_irq_save(flags2);
rcu_disable_urgency_upon_qs(rdp);
/* Report QS -after- changing ->qsmaskinitnext! */
- rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
+ rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags2);
} else {
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ raw_spin_unlock_rcu_node(rnp);
}
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(rnp->ofl_seq & 0x1);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(flags);
smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
}
@@ -4326,7 +4331,7 @@ void rcu_cpu_starting(unsigned int cpu)
*/
void rcu_report_dead(unsigned int cpu)
{
- unsigned long flags;
+ unsigned long flags, seq_flags;
unsigned long mask;
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
@@ -4340,10 +4345,8 @@ void rcu_report_dead(unsigned int cpu)
/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
mask = rdp->grpmask;
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- raw_spin_lock(&rcu_state.ofl_lock);
+ local_irq_save(seq_flags);
+ arch_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
@@ -4354,10 +4357,8 @@ void rcu_report_dead(unsigned int cpu)
}
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- raw_spin_unlock(&rcu_state.ofl_lock);
- smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
- WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
- WARN_ON_ONCE(rnp->ofl_seq & 0x1);
+ arch_spin_unlock(&rcu_state.ofl_lock);
+ local_irq_restore(seq_flags);
rdp->cpu_started = false;
}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 486fc901bd08..4b4bcef8a974 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -56,8 +56,6 @@ struct rcu_node {
/* Initialized from ->qsmaskinitnext at the */
/* beginning of each grace period. */
unsigned long qsmaskinitnext;
- unsigned long ofl_seq; /* CPU-hotplug operation sequence count. */
- /* Online CPUs for next grace period. */
unsigned long expmask; /* CPUs or groups that need to check in */
/* to allow the current expedited GP */
/* to complete. */
@@ -355,7 +353,7 @@ struct rcu_state {
const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */
- raw_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+ arch_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
/* Synchronize offline with */
/* GP pre-initialization. */
};
--
2.34.1