* [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs
@ 2025-07-07 3:32 Joel Fernandes
2025-07-07 14:22 ` Frederic Weisbecker
0 siblings, 1 reply; 3+ messages in thread
From: Joel Fernandes @ 2025-07-07 3:32 UTC (permalink / raw)
To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Jonathan Corbet
Cc: rcu, linux-doc
The synchronization of CPU offlining with GP initialization is confusing
to put it mildly (rightfully so as the issue it deals with is complex).
Recent discussions brought up a question -- what prevents the
rcu_implicit_dyntick_qs() from warning about QS reports for offline
CPUs (missing QS reports for offline CPUs causing indefinite hangs).
QS reporting for now-offline CPUs should only happen from:
- gp_init()
- rcutree_cpu_report_dead()
Add some documentation on this and refer to it from comments in the code
explaining how QS reporting is not missed when these functions are
concurrently running.
I referred heavily to this post [1] about the need for the ofl_lock.
[1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/
[ Applied paulmck feedback on moving documentation to Requirements.rst ]
Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
Co-developed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
.../RCU/Design/Requirements/Requirements.rst | 87 +++++++++++++++++++
kernel/rcu/tree.c | 19 +++-
2 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 771a1ce4c84d..841326d9358d 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2011,6 +2011,93 @@ after the CPU hotplug scanning.
By incrementing gp_seq first, CPU1's RCU read-side critical section
is guaranteed to not be missed by CPU2.
+**Concurrent Quiescent State Reporting for Offline CPUs**
+
+RCU must ensure that CPUs going offline report quiescent states to avoid
+blocking grace periods. This requires careful synchronization to handle
+race conditions
+
+**Race condition causing Offline CPU to hang GP**
+
+A race between CPU offlining and new GP initialization (gp_init) may occur
+because `rcu_report_qs_rnp()` in `rcutree_report_cpu_dead()` must temporarily
+release the `rcu_node` lock to wake the RCU grace-period kthread:
+
+.. code-block:: none
+
+ CPU1 (going offline) CPU0 (GP kthread)
+ -------------------- -----------------
+ rcutree_report_cpu_dead()
+ rcu_report_qs_rnp()
+ // Must release rnp->lock to wake GP kthread
+ raw_spin_unlock_irqrestore_rcu_node()
+ // Wakes up and starts new GP
+ rcu_gp_init()
+ // First loop:
+ copies qsmaskinitnext->qsmaskinit
+ // CPU1 still in qsmaskinitnext!
+
+ // Second loop:
+ rnp->qsmask = rnp->qsmaskinit
+ mask = rnp->qsmask & ~rnp->qsmaskinitnext
+ // mask is 0! CPU1 still in both masks
+ // Reacquire lock (but too late)
+ rnp->qsmaskinitnext &= ~mask // Finally clears bit
+
+Without `ofl_lock`, the new grace period includes the offline CPU and waits
+forever for its quiescent state causing a GP hang.
+
+**A solution with ofl_lock**
+
+The `ofl_lock` (offline lock) prevents `rcu_gp_init()` from running during
+the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:
+
+.. code-block:: none
+
+ CPU0 (rcu_gp_init) CPU1 (rcutree_report_cpu_dead)
+ ------------------ ------------------------------
+ rcu_for_each_leaf_node(rnp) {
+ arch_spin_lock(&ofl_lock) -----> arch_spin_lock(&ofl_lock) [BLOCKED]
+
+ // Safe: CPU1 can't interfere
+ rnp->qsmaskinit = rnp->qsmaskinitnext
+
+ arch_spin_unlock(&ofl_lock) ---> // Now CPU1 can proceed
+ } // But snapshot already taken
+
+**Another race causing GP hangs in rcu_gpu_init(): Reporting QS for Now-offline CPUs**
+
+After the first loop takes an atomic snapshot of online CPUs, as shown above,
+the second loop in `rcu_gp_init()` detects CPUs that went offline between
+releasing `ofl_lock` and acquiring the per-node `rnp->lock`. This detection is
+crucial because:
+
+1. The CPU might have gone offline after the snapshot but before the second loop
+2. The offline CPU cannot report its own QS if it's already dead
+3. Without this detection, the grace period would wait forever for CPUs that
+ are now offline.
+
+The second loop performs this detection safely:
+
+.. code-block:: none
+
+ rcu_for_each_node_breadth_first(rnp) {
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ rnp->qsmask = rnp->qsmaskinit; // Apply the snapshot
+
+ // Detect CPUs offline after snapshot
+ mask = rnp->qsmask & ~rnp->qsmaskinitnext;
+
+ if (mask && rcu_is_leaf_node(rnp))
+ rcu_report_qs_rnp(mask, ...) // Report QS for offline CPUs
+ }
+
+This approach ensures atomicity: quiescent state reporting for offline CPUs
+happens either in `rcu_gp_init()` (second loop) or in `rcutree_report_cpu_dead()`,
+never both and never neither. The `rnp->lock` held throughout the sequence
+prevents races - `rcutree_report_cpu_dead()` also acquires this lock when
+clearing `qsmaskinitnext`, ensuring mutual exclusion.
+
Scheduler and RCU
~~~~~~~~~~~~~~~~~
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c31b85e62310..f669f9b45e80 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1883,6 +1883,10 @@ static noinline_for_stack bool rcu_gp_init(void)
/* Exclude CPU hotplug operations. */
rcu_for_each_leaf_node(rnp) {
local_irq_disable();
+ /*
+ * Serialize with CPU offline. See Requirements.rst > Hotplug CPU >
+ * Concurrent Quiescent State Reporting for Offline CPUs.
+ */
arch_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -1957,7 +1961,12 @@ static noinline_for_stack bool rcu_gp_init(void)
trace_rcu_grace_period_init(rcu_state.name, rnp->gp_seq,
rnp->level, rnp->grplo,
rnp->grphi, rnp->qsmask);
- /* Quiescent states for tasks on any now-offline CPUs. */
+ /*
+ * Quiescent states for tasks on any now-offline CPUs. Since we
+ * released the ofl and rnp lock before this loop, CPUs might
+ * have gone offline and we have to report QS on their behalf.
+ * See Requirements.rst > Hotplug CPU > Concurrent QS Reporting.
+ */
mask = rnp->qsmask & ~rnp->qsmaskinitnext;
rnp->rcu_gp_init_mask = mask;
if ((mask || rnp->wait_blkd_tasks) && rcu_is_leaf_node(rnp))
@@ -4388,6 +4397,13 @@ void rcutree_report_cpu_dead(void)
/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
mask = rdp->grpmask;
+
+ /*
+ * Hold the ofl_lock and rnp lock to avoid races between CPU going
+ * offline and doing a QS report (as below), versus rcu_gp_init().
+ * See Requirements.rst > Hotplug CPU > Concurrent QS Reporting section
+ * for more details.
+ */
arch_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4398,6 +4414,7 @@ void rcutree_report_cpu_dead(void)
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
raw_spin_lock_irqsave_rcu_node(rnp, flags);
}
+ /* Clear from ->qsmaskinitnext to mark offline. */
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
arch_spin_unlock(&rcu_state.ofl_lock);
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs
2025-07-07 3:32 [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs Joel Fernandes
@ 2025-07-07 14:22 ` Frederic Weisbecker
2025-07-07 14:27 ` Joel Fernandes
0 siblings, 1 reply; 3+ messages in thread
From: Frederic Weisbecker @ 2025-07-07 14:22 UTC (permalink / raw)
To: Joel Fernandes
Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Josh Triplett,
Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Jonathan Corbet, rcu, linux-doc
Le Sun, Jul 06, 2025 at 11:32:08PM -0400, Joel Fernandes a écrit :
> The synchronization of CPU offlining with GP initialization is confusing
> to put it mildly (rightfully so as the issue it deals with is complex).
> Recent discussions brought up a question -- what prevents the
> rcu_implicit_dyntick_qs() from warning about QS reports for offline
> CPUs (missing QS reports for offline CPUs causing indefinite hangs).
>
> QS reporting for now-offline CPUs should only happen from:
> - gp_init()
> - rcutree_cpu_report_dead()
>
> Add some documentation on this and refer to it from comments in the code
> explaining how QS reporting is not missed when these functions are
> concurrently running.
>
> I referred heavily to this post [1] about the need for the ofl_lock.
> [1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/
>
> [ Applied paulmck feedback on moving documentation to Requirements.rst ]
>
> Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
> Co-developed-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Very nice and welcome!!!
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
--
Frederic Weisbecker
SUSE Labs
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs
2025-07-07 14:22 ` Frederic Weisbecker
@ 2025-07-07 14:27 ` Joel Fernandes
0 siblings, 0 replies; 3+ messages in thread
From: Joel Fernandes @ 2025-07-07 14:27 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Josh Triplett,
Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Jonathan Corbet, rcu, linux-doc
On 7/7/2025 10:22 AM, Frederic Weisbecker wrote:
> Le Sun, Jul 06, 2025 at 11:32:08PM -0400, Joel Fernandes a écrit :
>> The synchronization of CPU offlining with GP initialization is confusing
>> to put it mildly (rightfully so as the issue it deals with is complex).
>> Recent discussions brought up a question -- what prevents the
>> rcu_implicit_dyntick_qs() from warning about QS reports for offline
>> CPUs (missing QS reports for offline CPUs causing indefinite hangs).
>>
>> QS reporting for now-offline CPUs should only happen from:
>> - gp_init()
>> - rcutree_cpu_report_dead()
>>
>> Add some documentation on this and refer to it from comments in the code
>> explaining how QS reporting is not missed when these functions are
>> concurrently running.
>>
>> I referred heavily to this post [1] about the need for the ofl_lock.
>> [1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/
>>
>> [ Applied paulmck feedback on moving documentation to Requirements.rst ]
>>
>> Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
>> Co-developed-by: Paul E. McKenney <paulmck@kernel.org>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> Very nice and welcome!!!
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
Thanks, glad you like it. I figured this stuff is better off in the kernel
documentation than rotting in my notes ;-)
And lets continue to keep the discussions going ;-) thanks!
- Joel
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-07-07 14:28 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-07 3:32 [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs Joel Fernandes
2025-07-07 14:22 ` Frederic Weisbecker
2025-07-07 14:27 ` Joel Fernandes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).