From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
rostedt@goodmis.org, Frederic Weisbecker <frederic@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>
Subject: [PATCH rcu 9/9] rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
Date: Tue, 4 Jun 2024 15:23:55 -0700 [thread overview]
Message-ID: <20240604222355.2370768-9-paulmck@kernel.org> (raw)
In-Reply-To: <657595c8-e86c-4594-a5b1-3c64a8275607@paulmck-laptop>
From: Frederic Weisbecker <frederic@kernel.org>
When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off
rnp->qsmaskinitnext, it means that all accesses from the offline CPU
preceding the CPUHP_TEARDOWN_CPU are visible to RCU barrier, including
callbacks expiration and counter updates.
However interrupts can still fire after stop_machine() re-enables
interrupts and before rcutree_report_cpu_dead(). The related accesses
happening between CPUHP_TEARDOWN_CPU and rnp->qsmaskinitnext clearing
are _NOT_ guaranteed to be seen by rcu_barrier() without proper
ordering, especially when callbacks are invoked there to the end, making
rcutree_migrate_callback() bypass barrier_lock.
The following theoretical race example can make rcu_barrier() hang:
CPU 0 CPU 1
----- -----
//cpu_down()
smpboot_park_threads()
//ksoftirqd is parked now
<IRQ>
rcu_sched_clock_irq()
invoke_rcu_core()
do_softirq()
rcu_core()
rcu_do_batch()
// callback storm
// rcu_do_batch() returns
// before completing all
// of them
// do_softirq also returns early because of
// timeout. It defers to ksoftirqd but
// it's parked
</IRQ>
stop_machine()
take_cpu_down()
rcu_barrier()
spin_lock(barrier_lock)
// observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
<IRQ>
do_softirq()
rcu_core()
rcu_do_batch()
//completes all pending callbacks
//smp_mb() implied _after_ callback number dec
</IRQ>
rcutree_report_cpu_dead()
rnp->qsmaskinitnext &= ~rdp->grpmask;
rcutree_migrate_callback()
// no callback, early return without locking
// barrier_lock
//observes !rcu_rdp_cpu_online(rdp)
rcu_barrier_entrain()
rcu_segcblist_entrain()
// Observe rcu_segcblist_n_cbs(rsclp) == 0
// because no barrier between reading
// rnp->qsmaskinitnext and rsclp->len
rcu_segcblist_add_len()
smp_mb__before_atomic()
// will now observe the 0 count and empty
// list, but too late, we enqueue regardless
WRITE_ONCE(rsclp->len, rsclp->len + v);
// ignored barrier callback
// rcu barrier stall...
This could be solved with a read memory barrier, enforcing the message
passing between rnp->qsmaskinitnext and rsclp->len, matching the full
memory barrier after rsclp->len addition in rcu_segcblist_add_len()
performed at the end of rcu_do_batch().
However the rcu_barrier() is complicated enough and probably doesn't
need too many more subtleties. CPU down is a slowpath and the
barrier_lock seldom contended. Solve the issue with unconditionally
locking the barrier_lock on rcutree_migrate_callbacks(). This makes sure
that either rcu_barrier() sees the empty queue or its entrained
callback will be migrated.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 408b020c9501f..c58fc10fb5969 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5147,11 +5147,15 @@ void rcutree_migrate_callbacks(int cpu)
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
bool needwake;
- if (rcu_rdp_is_offloaded(rdp) ||
- rcu_segcblist_empty(&rdp->cblist))
- return; /* No callbacks to migrate. */
+ if (rcu_rdp_is_offloaded(rdp))
+ return;
raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
+ if (rcu_segcblist_empty(&rdp->cblist)) {
+ raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
+ return; /* No callbacks to migrate. */
+ }
+
WARN_ON_ONCE(rcu_rdp_cpu_online(rdp));
rcu_barrier_entrain(rdp);
my_rdp = this_cpu_ptr(&rcu_data);
--
2.40.1
prev parent reply other threads:[~2024-06-04 22:23 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-04 22:23 [PATCH rcu 0/9] Miscellaneous fixes for v6.11 Paul E. McKenney
2024-06-04 22:23 ` [PATCH rcu 1/9] rcu: Add lockdep_assert_in_rcu_read_lock() and friends Paul E. McKenney
2025-02-20 19:38 ` Jeff Johnson
2025-02-20 22:04 ` Paul E. McKenney
2025-02-20 23:51 ` Jeff Johnson
2024-06-04 22:23 ` [PATCH rcu 2/9] rcu: Reduce synchronize_rcu() delays when all wait heads are in use Paul E. McKenney
2024-06-05 12:09 ` Frederic Weisbecker
2024-06-05 18:38 ` Paul E. McKenney
2024-06-06 3:46 ` Neeraj Upadhyay
2024-06-06 16:49 ` Paul E. McKenney
2024-06-11 10:12 ` Uladzislau Rezki
2024-06-04 22:23 ` [PATCH rcu 3/9] rcu/tree: Reduce wake up for synchronize_rcu() common case Paul E. McKenney
2024-06-05 16:35 ` Frederic Weisbecker
2024-06-05 18:42 ` Paul E. McKenney
2024-06-06 5:58 ` Neeraj upadhyay
2024-06-06 18:12 ` Paul E. McKenney
2024-06-07 1:51 ` Neeraj upadhyay
2024-06-10 15:12 ` Paul E. McKenney
2024-06-11 13:46 ` Neeraj upadhyay
2024-06-11 16:17 ` Paul E. McKenney
2024-06-04 22:23 ` [PATCH rcu 4/9] rcu: Disable interrupts directly in rcu_gp_init() Paul E. McKenney
2024-06-04 22:23 ` [PATCH rcu 5/9] srcu: Disable interrupts directly in srcu_gp_end() Paul E. McKenney
2024-06-04 22:23 ` [PATCH rcu 6/9] rcu: Add rcutree.nocb_patience_delay to reduce nohz_full OS jitter Paul E. McKenney
2024-06-10 5:05 ` Leonardo Bras
2024-06-10 15:10 ` Paul E. McKenney
2024-07-03 16:21 ` Frederic Weisbecker
2024-07-03 17:25 ` Paul E. McKenney
2024-07-04 22:18 ` Frederic Weisbecker
2024-07-05 0:26 ` Paul E. McKenney
2024-06-04 22:23 ` [PATCH rcu 7/9] MAINTAINERS: Add Uladzislau Rezki as RCU maintainer Paul E. McKenney
2024-06-04 22:23 ` [PATCH rcu 8/9] rcu: Eliminate lockless accesses to rcu_sync->gp_count Paul E. McKenney
2024-06-04 22:23 ` Paul E. McKenney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240604222355.2370768-9-paulmck@kernel.org \
--to=paulmck@kernel.org \
--cc=frederic@kernel.org \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.