linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs
@ 2025-07-07  3:32 Joel Fernandes
  2025-07-07 14:22 ` Frederic Weisbecker
  0 siblings, 1 reply; 3+ messages in thread
From: Joel Fernandes @ 2025-07-07  3:32 UTC (permalink / raw)
  To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Jonathan Corbet
  Cc: rcu, linux-doc

The synchronization of CPU offlining with GP initialization is confusing
to put it mildly (rightfully so as the issue it deals with is complex).
Recent discussions brought up a question -- what prevents the
rcu_implicit_dyntick_qs() from warning about QS reports for offline
CPUs (missing QS reports for offline CPUs causing indefinite hangs).

QS reporting for now-offline CPUs should only happen from:
- gp_init()
- rcutree_cpu_report_dead()

Add some documentation on this and refer to it from comments in the code
explaining how QS reporting is not missed when these functions are
concurrently running.

I referred heavily to this post [1] about the need for the ofl_lock.
[1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/

[ Applied paulmck feedback on moving documentation to Requirements.rst ]

Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
Co-developed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 .../RCU/Design/Requirements/Requirements.rst  | 87 +++++++++++++++++++
 kernel/rcu/tree.c                             | 19 +++-
 2 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 771a1ce4c84d..841326d9358d 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2011,6 +2011,93 @@ after the CPU hotplug scanning.
 By incrementing gp_seq first, CPU1's RCU read-side critical section
 is guaranteed to not be missed by CPU2.
 
+**Concurrent Quiescent State Reporting for Offline CPUs**
+
+RCU must ensure that CPUs going offline report quiescent states to avoid
+blocking grace periods. This requires careful synchronization to handle
+race conditions
+
+**Race condition causing Offline CPU to hang GP**
+
+A race between CPU offlining and new GP initialization (gp_init) may occur
+because `rcu_report_qs_rnp()` in `rcutree_report_cpu_dead()` must temporarily
+release the `rcu_node` lock to wake the RCU grace-period kthread:
+
+.. code-block:: none
+
+   CPU1 (going offline)                 CPU0 (GP kthread)
+   --------------------                 -----------------
+   rcutree_report_cpu_dead()
+     rcu_report_qs_rnp()
+       // Must release rnp->lock to wake GP kthread
+       raw_spin_unlock_irqrestore_rcu_node()
+                                        // Wakes up and starts new GP
+                                        rcu_gp_init()
+                                          // First loop:
+                                          copies qsmaskinitnext->qsmaskinit
+                                          // CPU1 still in qsmaskinitnext!
+                                          
+                                          // Second loop:
+                                          rnp->qsmask = rnp->qsmaskinit
+                                          mask = rnp->qsmask & ~rnp->qsmaskinitnext
+                                          // mask is 0! CPU1 still in both masks
+       // Reacquire lock (but too late)
+     rnp->qsmaskinitnext &= ~mask       // Finally clears bit
+
+Without `ofl_lock`, the new grace period includes the offline CPU and waits
+forever for its quiescent state causing a GP hang.
+
+**A solution with ofl_lock**
+
+The `ofl_lock` (offline lock) prevents `rcu_gp_init()` from running during
+the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:
+
+.. code-block:: none
+
+   CPU0 (rcu_gp_init)                   CPU1 (rcutree_report_cpu_dead)
+   ------------------                   ------------------------------
+   rcu_for_each_leaf_node(rnp) {
+       arch_spin_lock(&ofl_lock) -----> arch_spin_lock(&ofl_lock) [BLOCKED]
+       
+       // Safe: CPU1 can't interfere
+       rnp->qsmaskinit = rnp->qsmaskinitnext
+       
+       arch_spin_unlock(&ofl_lock) ---> // Now CPU1 can proceed
+   }                                    // But snapshot already taken
+
+**Another race causing GP hangs in rcu_gpu_init(): Reporting QS for Now-offline CPUs**
+
+After the first loop takes an atomic snapshot of online CPUs, as shown above,
+the second loop in `rcu_gp_init()` detects CPUs that went offline between
+releasing `ofl_lock` and acquiring the per-node `rnp->lock`. This detection is
+crucial because:
+
+1. The CPU might have gone offline after the snapshot but before the second loop
+2. The offline CPU cannot report its own QS if it's already dead
+3. Without this detection, the grace period would wait forever for CPUs that
+   are now offline.
+
+The second loop performs this detection safely:
+
+.. code-block:: none
+
+   rcu_for_each_node_breadth_first(rnp) {
+       raw_spin_lock_irqsave_rcu_node(rnp, flags);
+       rnp->qsmask = rnp->qsmaskinit;  // Apply the snapshot
+       
+       // Detect CPUs offline after snapshot
+       mask = rnp->qsmask & ~rnp->qsmaskinitnext;
+       
+       if (mask && rcu_is_leaf_node(rnp))
+           rcu_report_qs_rnp(mask, ...)  // Report QS for offline CPUs
+   }
+
+This approach ensures atomicity: quiescent state reporting for offline CPUs
+happens either in `rcu_gp_init()` (second loop) or in `rcutree_report_cpu_dead()`,
+never both and never neither. The `rnp->lock` held throughout the sequence
+prevents races - `rcutree_report_cpu_dead()` also acquires this lock when
+clearing `qsmaskinitnext`, ensuring mutual exclusion.
+
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c31b85e62310..f669f9b45e80 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1883,6 +1883,10 @@ static noinline_for_stack bool rcu_gp_init(void)
 	/* Exclude CPU hotplug operations. */
 	rcu_for_each_leaf_node(rnp) {
 		local_irq_disable();
+		/*
+		 * Serialize with CPU offline. See Requirements.rst > Hotplug CPU >
+		 * Concurrent Quiescent State Reporting for Offline CPUs.
+		 */
 		arch_spin_lock(&rcu_state.ofl_lock);
 		raw_spin_lock_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -1957,7 +1961,12 @@ static noinline_for_stack bool rcu_gp_init(void)
 		trace_rcu_grace_period_init(rcu_state.name, rnp->gp_seq,
 					    rnp->level, rnp->grplo,
 					    rnp->grphi, rnp->qsmask);
-		/* Quiescent states for tasks on any now-offline CPUs. */
+		/*
+		 * Quiescent states for tasks on any now-offline CPUs. Since we
+		 * released the ofl and rnp lock before this loop, CPUs might
+		 * have gone offline and we have to report QS on their behalf.
+		 * See Requirements.rst > Hotplug CPU > Concurrent QS Reporting.
+		 */
 		mask = rnp->qsmask & ~rnp->qsmaskinitnext;
 		rnp->rcu_gp_init_mask = mask;
 		if ((mask || rnp->wait_blkd_tasks) && rcu_is_leaf_node(rnp))
@@ -4388,6 +4397,13 @@ void rcutree_report_cpu_dead(void)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
+
+	/*
+	 * Hold the ofl_lock and rnp lock to avoid races between CPU going
+	 * offline and doing a QS report (as below), versus rcu_gp_init().
+	 * See Requirements.rst > Hotplug CPU > Concurrent QS Reporting section
+	 * for more details.
+	 */
 	arch_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4398,6 +4414,7 @@ void rcutree_report_cpu_dead(void)
 		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
+	/* Clear from ->qsmaskinitnext to mark offline. */
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	arch_spin_unlock(&rcu_state.ofl_lock);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs
  2025-07-07  3:32 [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs Joel Fernandes
@ 2025-07-07 14:22 ` Frederic Weisbecker
  2025-07-07 14:27   ` Joel Fernandes
  0 siblings, 1 reply; 3+ messages in thread
From: Frederic Weisbecker @ 2025-07-07 14:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Josh Triplett,
	Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Jonathan Corbet, rcu, linux-doc

Le Sun, Jul 06, 2025 at 11:32:08PM -0400, Joel Fernandes a écrit :
> The synchronization of CPU offlining with GP initialization is confusing
> to put it mildly (rightfully so as the issue it deals with is complex).
> Recent discussions brought up a question -- what prevents the
> rcu_implicit_dyntick_qs() from warning about QS reports for offline
> CPUs (missing QS reports for offline CPUs causing indefinite hangs).
> 
> QS reporting for now-offline CPUs should only happen from:
> - gp_init()
> - rcutree_cpu_report_dead()
> 
> Add some documentation on this and refer to it from comments in the code
> explaining how QS reporting is not missed when these functions are
> concurrently running.
> 
> I referred heavily to this post [1] about the need for the ofl_lock.
> [1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/
> 
> [ Applied paulmck feedback on moving documentation to Requirements.rst ]
> 
> Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
> Co-developed-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Very nice and welcome!!!

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs
  2025-07-07 14:22 ` Frederic Weisbecker
@ 2025-07-07 14:27   ` Joel Fernandes
  0 siblings, 0 replies; 3+ messages in thread
From: Joel Fernandes @ 2025-07-07 14:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Josh Triplett,
	Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Jonathan Corbet, rcu, linux-doc



On 7/7/2025 10:22 AM, Frederic Weisbecker wrote:
> Le Sun, Jul 06, 2025 at 11:32:08PM -0400, Joel Fernandes a écrit :
>> The synchronization of CPU offlining with GP initialization is confusing
>> to put it mildly (rightfully so as the issue it deals with is complex).
>> Recent discussions brought up a question -- what prevents the
>> rcu_implicit_dyntick_qs() from warning about QS reports for offline
>> CPUs (missing QS reports for offline CPUs causing indefinite hangs).
>>
>> QS reporting for now-offline CPUs should only happen from:
>> - gp_init()
>> - rcutree_cpu_report_dead()
>>
>> Add some documentation on this and refer to it from comments in the code
>> explaining how QS reporting is not missed when these functions are
>> concurrently running.
>>
>> I referred heavily to this post [1] about the need for the ofl_lock.
>> [1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/
>>
>> [ Applied paulmck feedback on moving documentation to Requirements.rst ]
>>
>> Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/
>> Co-developed-by: Paul E. McKenney <paulmck@kernel.org>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> 
> Very nice and welcome!!!
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 

Thanks, glad you like it. I figured this stuff is better off in the kernel
documentation than rotting in my notes ;-)

And lets continue to keep the discussions going ;-) thanks!

 - Joel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-07-07 14:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-07  3:32 [PATCH] rcu: Document concurrent quiescent state reporting for offline CPUs Joel Fernandes
2025-07-07 14:22 ` Frederic Weisbecker
2025-07-07 14:27   ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).