From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
rostedt@goodmis.org, Frederic Weisbecker <frederic@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>
Subject: [PATCH rcu 9/9] rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
Date: Tue, 4 Jun 2024 15:23:55 -0700
Message-ID: <20240604222355.2370768-9-paulmck@kernel.org>
In-Reply-To: <657595c8-e86c-4594-a5b1-3c64a8275607@paulmck-laptop>
From: Frederic Weisbecker <frederic@kernel.org>
When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off
rnp->qsmaskinitnext, it means that all accesses from the offline CPU
preceding CPUHP_TEARDOWN_CPU are visible to rcu_barrier(), including
callback expiration and counter updates.
However, interrupts can still fire after stop_machine() re-enables
interrupts and before rcutree_report_cpu_dead(). The related accesses
happening between CPUHP_TEARDOWN_CPU and the clearing of
rnp->qsmaskinitnext are _NOT_ guaranteed to be seen by rcu_barrier()
without proper ordering, especially when callbacks are invoked there
to completion, which lets rcutree_migrate_callbacks() bypass
barrier_lock.
The following theoretical race example can make rcu_barrier() hang:
        CPU 0                                   CPU 1
        -----                                   -----
        //cpu_down()
        smpboot_park_threads()
        //ksoftirqd is parked now
        <IRQ>
        rcu_sched_clock_irq()
            invoke_rcu_core()
        do_softirq()
            rcu_core()
                rcu_do_batch()
                    // callback storm
                    // rcu_do_batch() returns
                    // before completing all
                    // of them
        // do_softirq also returns early because of
        // timeout. It defers to ksoftirqd but
        // it's parked
        </IRQ>
        stop_machine()
            take_cpu_down()
                                                rcu_barrier()
                                                    spin_lock(barrier_lock)
                                                    // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
        <IRQ>
        do_softirq()
            rcu_core()
                rcu_do_batch()
                    //completes all pending callbacks
                    //smp_mb() implied _after_ callback number dec
        </IRQ>

        rcutree_report_cpu_dead()
            rnp->qsmaskinitnext &= ~rdp->grpmask;

        rcutree_migrate_callbacks()
            // no callback, early return
            // without locking barrier_lock
                                                //observes !rcu_rdp_cpu_online(rdp)
                                                rcu_barrier_entrain()
                                                    rcu_segcblist_entrain()
                                                        // Observe rcu_segcblist_n_cbs(rsclp) == 0
                                                        // because no barrier between reading
                                                        // rnp->qsmaskinitnext and rsclp->len
                                                        rcu_segcblist_add_len()
                                                            smp_mb__before_atomic()
                                                            // will now observe the 0 count and
                                                            // empty list, but too late, we
                                                            // enqueue regardless
                                                            WRITE_ONCE(rsclp->len, rsclp->len + v);
        // ignored barrier callback
        // rcu barrier stall...
This could be solved with a read memory barrier enforcing the message
passing between rnp->qsmaskinitnext and rsclp->len, pairing with the
full memory barrier after the rsclp->len update that
rcu_segcblist_add_len() performs at the end of rcu_do_batch().
However, rcu_barrier() is complicated enough and probably doesn't need
more subtleties. CPU down is a slowpath and the barrier_lock is seldom
contended. Solve the issue by unconditionally locking barrier_lock in
rcutree_migrate_callbacks(). This makes sure that either rcu_barrier()
sees the empty queue or its entrained callback will be migrated.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 408b020c9501f..c58fc10fb5969 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5147,11 +5147,15 @@ void rcutree_migrate_callbacks(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	bool needwake;
 
-	if (rcu_rdp_is_offloaded(rdp) ||
-	    rcu_segcblist_empty(&rdp->cblist))
-		return;  /* No callbacks to migrate. */
+	if (rcu_rdp_is_offloaded(rdp))
+		return;
 
 	raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
+	if (rcu_segcblist_empty(&rdp->cblist)) {
+		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
+		return; /* No callbacks to migrate. */
+	}
+
 	WARN_ON_ONCE(rcu_rdp_cpu_online(rdp));
 	rcu_barrier_entrain(rdp);
 	my_rdp = this_cpu_ptr(&rcu_data);
--
2.40.1