From: paulmck@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com,
mingo@kernel.org, jiangshanlai@gmail.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
fweisbec@gmail.com, oleg@redhat.com, joel@joelfernandes.org,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: [PATCH tip/core/rcu 12/16] rcu: Prevent lockdep-RCU splats on lock acquisition/release
Date: Thu, 5 Nov 2020 15:09:17 -0800 [thread overview]
Message-ID: <20201105230921.19017-12-paulmck@kernel.org> (raw)
In-Reply-To: <20201105230856.GA18904@paulmck-ThinkPad-P72>
From: "Paul E. McKenney" <paulmck@kernel.org>
The rcu_cpu_starting() and rcu_report_dead() functions transition the
current CPU between online and offline state from an RCU perspective.
Unfortunately, this means that the rcu_cpu_starting() function's lock
acquisition and the rcu_report_dead() function's lock releases happen
while the CPU is offline from an RCU perspective, which can result
in lockdep-RCU splats about using RCU from an offline CPU. And this
situation can also result in too-short grace periods, especially in
guest OSes that are subject to vCPU preemption.
This commit therefore uses sequence-count-like synchronization to forgive
use of RCU while RCU thinks a CPU is offline across the full extent of
the rcu_cpu_starting() and rcu_report_dead() function's lock acquisitions
and releases.
One approach would have been to use the actual sequence-count primitives
provided by the Linux kernel. Unfortunately, the resulting code looks
completely broken and wrong, and is likely to result in patches that
break RCU in an attempt to address this appearance of broken wrongness.
Plus there is no net savings in lines of code, given the additional
explicit memory barriers required.
Therefore, this sequence count is instead implemented by a new ->ofl_seq
field in the rcu_node structure. If this counter's value is an odd
number, RCU forgives RCU read-side critical sections on other CPUs covered
by the same rcu_node structure, even if those CPUs are offline from
an RCU perspective. In addition, if a given leaf rcu_node structure's
->ofl_seq counter value is an odd number, rcu_gp_init() delays starting
the grace period until that counter value changes.
[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree.c | 21 ++++++++++++++++++++-
kernel/rcu/tree.h | 1 +
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fe569db..e790b85 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1152,7 +1152,7 @@ bool rcu_lockdep_current_cpu_online(void)
preempt_disable_notrace();
rdp = this_cpu_ptr(&rcu_data);
rnp = rdp->mynode;
- if (rdp->grpmask & rcu_rnp_online_cpus(rnp))
+ if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || READ_ONCE(rnp->ofl_seq) & 0x1)
ret = true;
preempt_enable_notrace();
return ret;
@@ -1717,6 +1717,7 @@ static void rcu_strict_gp_boundary(void *unused)
*/
static bool rcu_gp_init(void)
{
+ unsigned long firstseq;
unsigned long flags;
unsigned long oldmask;
unsigned long mask;
@@ -1760,6 +1761,12 @@ static bool rcu_gp_init(void)
*/
rcu_state.gp_state = RCU_GP_ONOFF;
rcu_for_each_leaf_node(rnp) {
+ smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
+ firstseq = READ_ONCE(rnp->ofl_seq);
+ if (firstseq & 0x1)
+ while (firstseq == READ_ONCE(rnp->ofl_seq))
+ schedule_timeout_idle(1); // Can't wake unless RCU is watching.
+ smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
raw_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_irq_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -4069,6 +4076,9 @@ void rcu_cpu_starting(unsigned int cpu)
rnp = rdp->mynode;
mask = rdp->grpmask;
+ WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+ WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+ smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
raw_spin_lock_irqsave_rcu_node(rnp, flags);
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
newcpu = !(rnp->expmaskinitnext & mask);
@@ -4088,6 +4098,9 @@ void rcu_cpu_starting(unsigned int cpu)
} else {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
+ smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
+ WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+ WARN_ON_ONCE(rnp->ofl_seq & 0x1);
smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
}
@@ -4115,6 +4128,9 @@ void rcu_report_dead(unsigned int cpu)
/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
mask = rdp->grpmask;
+ WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+ WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+ smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
raw_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4127,6 +4143,9 @@ void rcu_report_dead(unsigned int cpu)
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
raw_spin_unlock(&rcu_state.ofl_lock);
+ smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
+ WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+ WARN_ON_ONCE(rnp->ofl_seq & 0x1);
rdp->cpu_started = false;
}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 805c9eb..7708ed1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -56,6 +56,7 @@ struct rcu_node {
/* Initialized from ->qsmaskinitnext at the */
/* beginning of each grace period. */
unsigned long qsmaskinitnext;
+ unsigned long ofl_seq; /* CPU-hotplug operation sequence count. */
/* Online CPUs for next grace period. */
unsigned long expmask; /* CPUs or groups that need to check in */
/* to allow the current expedited GP */
--
2.9.5
next prev parent reply other threads:[~2020-11-05 23:09 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-05 23:08 [PATCH tip/core/rcu 0/16] Miscellaneous fixes for v5.11 Paul E. McKenney
2020-11-05 23:09 ` [PATCH tip/core/rcu 01/16] rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 02/16] x86/smpboot: Move rcu_cpu_starting() earlier paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 03/16] rcu: Panic after fixed number of stalls paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 04/16] list.h: Update comment to explicitly note circular lists paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 05/16] rcu: Implement rcu_segcblist_is_offloaded() config dependent paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 06/16] rcu: Fix single-CPU check in rcu_blocking_is_gp() paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 07/16] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 08/16] rcu/tree: Add a warning if CPU being onlined did not report QS already paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 09/16] rcu/tree: Make struct kernel_param_ops definitions const paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 10/16] rcu,ftrace: Fix ftrace recursion paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 11/16] rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs paulmck
2020-11-05 23:09 ` paulmck [this message]
2020-11-05 23:09 ` [PATCH tip/core/rcu 13/16] rcu: Fix a typo in rcu_blocking_is_gp() header comment paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 14/16] rcu: Do not report strict GPs for outgoing CPUs paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 15/16] rcu/tree: Defer kvfree_rcu() allocation to a clean context paulmck
2020-11-05 23:09 ` [PATCH tip/core/rcu 16/16] srcu: Take early exit on memory-allocation failure paulmck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201105230921.19017-12-paulmck@kernel.org \
--to=paulmck@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox