[PATCH AUTOSEL 6.10 15/16] rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Frederic Weisbecker <frederic@kernel.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	neeraj.upadhyay@kernel.org, joel@joelfernandes.org,
	josh@joshtriplett.org, boqun.feng@gmail.com, urezki@gmail.com,
	rcu@vger.kernel.org
Subject: [PATCH AUTOSEL 6.10 15/16] rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
Date: Sat, 27 Jul 2024 20:47:32 -0400	[thread overview]
Message-ID: <20240728004739.1698541-15-sashal@kernel.org> (raw)
In-Reply-To: <20240728004739.1698541-1-sashal@kernel.org>

From: Frederic Weisbecker <frederic@kernel.org>

[ Upstream commit 55d4669ef1b76823083caecfab12a8bd2ccdcf64 ]

When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off
rnp->qsmaskinitnext, it means that all accesses from the offline CPU
preceding the CPUHP_TEARDOWN_CPU are visible to RCU barrier, including
callbacks expiration and counter updates.

However interrupts can still fire after stop_machine() re-enables
interrupts and before rcutree_report_cpu_dead(). The related accesses
happening between CPUHP_TEARDOWN_CPU and rnp->qsmaskinitnext clearing
are _NOT_ guaranteed to be seen by rcu_barrier() without proper
ordering, especially when callbacks are invoked there to the end, making
rcutree_migrate_callback() bypass barrier_lock.

The following theoretical race example can make rcu_barrier() hang:

CPU 0                                               CPU 1
-----                                               -----
//cpu_down()
smpboot_park_threads()
//ksoftirqd is parked now
<IRQ>
rcu_sched_clock_irq()
   invoke_rcu_core()
do_softirq()
   rcu_core()
      rcu_do_batch()
         // callback storm
         // rcu_do_batch() returns
         // before completing all
         // of them
   // do_softirq also returns early because of
   // timeout. It defers to ksoftirqd but
   // it's parked
</IRQ>
stop_machine()
   take_cpu_down()
                                                    rcu_barrier()
                                                        spin_lock(barrier_lock)
                                                        // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
<IRQ>
do_softirq()
   rcu_core()
      rcu_do_batch()
         //completes all pending callbacks
         //smp_mb() implied _after_ callback number dec
</IRQ>

rcutree_report_cpu_dead()
   rnp->qsmaskinitnext &= ~rdp->grpmask;

rcutree_migrate_callback()
   // no callback, early return without locking
   // barrier_lock
                                                        //observes !rcu_rdp_cpu_online(rdp)
                                                        rcu_barrier_entrain()
                                                           rcu_segcblist_entrain()
                                                              // Observe rcu_segcblist_n_cbs(rsclp) == 0
                                                              // because no barrier between reading
                                                              // rnp->qsmaskinitnext and rsclp->len
                                                              rcu_segcblist_add_len()
                                                                 smp_mb__before_atomic()
                                                                 // will now observe the 0 count and empty
                                                                 // list, but too late, we enqueue regardless
                                                                 WRITE_ONCE(rsclp->len, rsclp->len + v);
                                                        // ignored barrier callback
                                                        // rcu barrier stall...

This could be solved with a read memory barrier, enforcing the message
passing between rnp->qsmaskinitnext and rsclp->len, matching the full
memory barrier after rsclp->len addition in rcu_segcblist_add_len()
performed at the end of rcu_do_batch().

However the rcu_barrier() is complicated enough and probably doesn't
need too many more subtleties. CPU down is a slowpath and the
barrier_lock seldom contended. Solve the issue with unconditionally
locking the barrier_lock on rcutree_migrate_callbacks(). This makes sure
that either rcu_barrier() sees the empty queue or its entrained
callback will be migrated.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tree.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 28c7031711a3f..63fb007beeaf5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5110,11 +5110,15 @@ void rcutree_migrate_callbacks(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	bool needwake;
 
-	if (rcu_rdp_is_offloaded(rdp) ||
-	    rcu_segcblist_empty(&rdp->cblist))
-		return;  /* No callbacks to migrate. */
+	if (rcu_rdp_is_offloaded(rdp))
+		return;
 
 	raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
+	if (rcu_segcblist_empty(&rdp->cblist)) {
+		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
+		return;  /* No callbacks to migrate. */
+	}
+
 	WARN_ON_ONCE(rcu_rdp_cpu_online(rdp));
 	rcu_barrier_entrain(rdp);
 	my_rdp = this_cpu_ptr(&rcu_data);
-- 
2.43.0

next prev parent reply	other threads:[~2024-07-28  0:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-28  0:47 [PATCH AUTOSEL 6.10 01/16] regmap: kunit: Fix memory leaks in gen_regmap() and gen_raw_regmap() Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 02/16] fs: remove accidental overflow during wraparound check Sasha Levin
2024-07-29 11:54   ` Jan Kara
2024-08-10  9:10     ` Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 03/16] gpio: prevent potential speculation leaks in gpio_device_get_desc() Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 04/16] hwmon: corsair-psu: add USB id of HX1200i Series 2023 psu Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 05/16] Revert "rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()" Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 06/16] platform/chrome: cros_ec_lpc: Add a new quirk for ACPI id Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 07/16] rcutorture: Fix rcu_torture_fwd_cb_cr() data race Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 08/16] md: do not delete safemode_timer in mddev_suspend Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 09/16] md: change the return value type of md_write_start to void Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 10/16] md/raid5: avoid BUG_ON() while continue reshape after reassembling Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 11/16] debugobjects: Annotate racy debug variables Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 12/16] nvme: apple: fix device reference counting Sasha Levin
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 13/16] block: change rq_integrity_vec to respect the iterator Sasha Levin
2024-08-12 13:51   ` Matthieu Baerts
2024-08-12 14:03     ` Greg KH
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 14/16] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() Sasha Levin
2024-07-28  9:22   ` Marc Zyngier
2024-08-10  9:11     ` Sasha Levin
2024-07-28  0:47 ` Sasha Levin [this message]
2024-07-28  0:47 ` [PATCH AUTOSEL 6.10 16/16] clocksource/drivers/sh_cmt: Address race condition for clock events Sasha Levin

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:28c7031711a3 dfblob:63fb007beeaf )
 OR (
bs:"[PATCH AUTOSEL 6.10 15/16] rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240728004739.1698541-15-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox