[PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window


* [PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window
@ 2026-01-14 17:31 Joel Fernandes
  2026-01-14 17:31 ` [PATCH -next v2 1/4] rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path Joel Fernandes
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Joel Fernandes @ 2026-01-14 17:31 UTC
  To: linux-kernel
  Cc: Paul E . McKenney, Boqun Feng, rcu, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Joel Fernandes

These are a few nocb-related cleanup patches for the next merge window that I am
resending. The fourth patch is a trivial extraction of duplicate code for canceling
deferred wakeups.

I did not hear back with opinions on "Add warning to detect if overload
advancement is ever useful" [1], so I will hold off on deleting that code path
until the next merge window (for now, the patch only adds a warning).

Changes from v1:
- Added patch 4 to extract the nocb_defer_wakeup_cancel() helper
- Added Frederic's review tag to "Remove unnecessary WakeOvfIsDeferred wake path"

nocb rcutorture scenarios passed overnight testing on my system.

The git tree with all patches can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (tag: rcu-nocb-v2-20260114)

Link to v1: https://lore.kernel.org/all/20260101163417.1065705-1-joelagnelf@nvidia.com/

[1] https://lore.kernel.org/all/654e3bde-764b-49ae-8faa-bab8199b1f15@nvidia.com/#t

Joel Fernandes (4):
  rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path
  rcu/nocb: Add warning if no rcuog wake up attempt happened during
    overload
  rcu/nocb: Add warning to detect if overload advancement is ever useful
  rcu/nocb: Extract nocb_defer_wakeup_cancel() helper

 kernel/rcu/tree.c      |  6 ++-
 kernel/rcu/tree.h      |  4 +-
 kernel/rcu/tree_nocb.h | 91 +++++++++++++++++++++++-------------------
 3 files changed, 56 insertions(+), 45 deletions(-)

-- 
2.34.1



* [PATCH -next v2 1/4] rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path
  2026-01-14 17:31 [PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window Joel Fernandes
@ 2026-01-14 17:31 ` Joel Fernandes
  2026-01-14 17:31 ` [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload Joel Fernandes
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2026-01-14 17:31 UTC
  To: linux-kernel
  Cc: Paul E . McKenney, Boqun Feng, rcu, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Joel Fernandes

The WakeOvfIsDeferred code path in __call_rcu_nocb_wake() attempts to
wake rcuog when the callback count exceeds qhimark and callbacks aren't
done with their GP (newly queued or awaiting GP). However, extensive
testing shows that this wake is always redundant or useless.

In the flooding case, rcuog is always waiting for a GP to finish, so
waking up the rcuog thread is pointless. The timer wakeup only adds
overhead: rcuog simply wakes up and goes back to sleep, achieving nothing.

This path also adds a full memory barrier and additional timer expiry
modifications unnecessarily.

The root cause is that WakeOvfIsDeferred fires when
!rcu_segcblist_ready_cbs() (GP not complete), but waking rcuog cannot
accelerate GP completion.

This commit therefore removes this path.

Tested with rcutorture scenarios TREE01, TREE05 and TREE08 (all NOCB
configurations); all pass. Also stress tested using a kernel module
that floods call_rcu() to trigger the overload conditions; the
observations confirmed the findings above.
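
The wake decision that remains once the overflow path is gone can be sketched
as a small userspace model. This is only an illustration under stated
assumptions, not the kernel code: struct queue_model, enum wake_decision and
model_call_rcu_nocb_wake() are hypothetical names, locking and tracing are
left out, and qhimark is passed in as a plain parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels, loosely named after the trace strings in the patch. */
enum wake_decision {
	WAKE_LAZY,              /* deferred wake, only lazy CBs in bypass */
	WAKE_EMPTY,             /* immediate wake, queue was empty */
	WAKE_EMPTY_IS_DEFERRED, /* deferred wake, IRQs were disabled */
	WAKE_NOT,               /* no wake attempted */
};

/* Hypothetical stand-in for the few rcu_data fields the decision reads. */
struct queue_model {
	long len;                 /* callbacks queued on this CPU */
	long bypass_len;          /* callbacks sitting in the bypass list */
	long lazy_len;            /* lazy callbacks in the bypass list */
	long qlen_last_fqs_check; /* queue length at the last check */
};

/*
 * Model of the post-patch decision: a wake is only attempted when the
 * queue was previously empty. The overload branch keeps its bookkeeping
 * but no longer requests any wakeup.
 */
static enum wake_decision
model_call_rcu_nocb_wake(struct queue_model *q, bool was_alldone,
			 bool irqs_disabled, long qhimark)
{
	assert(q);

	if (was_alldone) {
		q->qlen_last_fqs_check = q->len;
		if (q->lazy_len && q->bypass_len == q->lazy_len)
			return WAKE_LAZY;
		if (!irqs_disabled)
			return WAKE_EMPTY;
		return WAKE_EMPTY_IS_DEFERRED;
	}

	if (q->len > q->qlen_last_fqs_check + qhimark)
		q->qlen_last_fqs_check = q->len; /* overload: no wake */

	return WAKE_NOT;
}
```

In this model, every enqueue onto a non-empty queue ends in WAKE_NOT, which
matches the single unlock-and-trace tail the patch leaves at the end of the
function.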

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree.c      |  2 +-
 kernel/rcu/tree.h      |  3 +--
 kernel/rcu/tree_nocb.h | 49 ++++++++++++++----------------------------
 3 files changed, 18 insertions(+), 36 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 293bbd9ac3f4..2921ffb19939 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3769,7 +3769,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	}
 	rcu_nocb_unlock(rdp);
 	if (wake_nocb)
-		wake_nocb_gp(rdp, false);
+		wake_nocb_gp(rdp);
 	smp_store_release(&rdp->barrier_seq_snap, gseq);
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2265b9c2906e..7dfc57e9adb1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -301,7 +301,6 @@ struct rcu_data {
 #define RCU_NOCB_WAKE_BYPASS	1
 #define RCU_NOCB_WAKE_LAZY	2
 #define RCU_NOCB_WAKE		3
-#define RCU_NOCB_WAKE_FORCE	4
 
 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
 					/* For jiffies_till_first_fqs and */
@@ -500,7 +499,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
-static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
+static bool wake_nocb_gp(struct rcu_data *rdp);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j, bool lazy);
 static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index e6cd56603cad..f525e4f7985b 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -192,7 +192,7 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 
 static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 			   struct rcu_data *rdp,
-			   bool force, unsigned long flags)
+			   unsigned long flags)
 	__releases(rdp_gp->nocb_gp_lock)
 {
 	bool needwake = false;
@@ -209,7 +209,7 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 		timer_delete(&rdp_gp->nocb_timer);
 	}
 
-	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
+	if (READ_ONCE(rdp_gp->nocb_gp_sleep)) {
 		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
 		needwake = true;
 	}
@@ -225,13 +225,13 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 /*
  * Kick the GP kthread for this NOCB group.
  */
-static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+static bool wake_nocb_gp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
-	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
+	return __wake_nocb_gp(rdp_gp, rdp, flags);
 }
 
 #ifdef CONFIG_RCU_LAZY
@@ -518,10 +518,8 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 }
 
 /*
- * Awaken the no-CBs grace-period kthread if needed, either due to it
- * legitimately being asleep or due to overload conditions.
- *
- * If warranted, also wake up the kthread servicing this CPUs queues.
+ * Awaken the no-CBs grace-period kthread if needed due to it legitimately
+ * being asleep.
  */
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 				 unsigned long flags)
@@ -533,7 +531,6 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	long lazy_len;
 	long len;
 	struct task_struct *t;
-	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
 	// If we are being polled or there is no kthread, just leave.
 	t = READ_ONCE(rdp->nocb_gp_kthread);
@@ -549,22 +546,22 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	lazy_len = READ_ONCE(rdp->lazy_len);
 	if (was_alldone) {
 		rdp->qlen_last_fqs_check = len;
+		rcu_nocb_unlock(rdp);
 		// Only lazy CBs in bypass list
 		if (lazy_len && bypass_len == lazy_len) {
-			rcu_nocb_unlock(rdp);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
 					   TPS("WakeLazy"));
 		} else if (!irqs_disabled_flags(flags)) {
 			/* ... if queue was empty ... */
-			rcu_nocb_unlock(rdp);
-			wake_nocb_gp(rdp, false);
+			wake_nocb_gp(rdp);
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("WakeEmpty"));
 		} else {
-			rcu_nocb_unlock(rdp);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
 					   TPS("WakeEmptyIsDeferred"));
 		}
+
+		return;
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
 		/* ... or if many callbacks queued. */
 		rdp->qlen_last_fqs_check = len;
@@ -575,21 +572,10 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
 		}
-		smp_mb(); /* Enqueue before timer_pending(). */
-		if ((rdp->nocb_cb_sleep ||
-		     !rcu_segcblist_ready_cbs(&rdp->cblist)) &&
-		    !timer_pending(&rdp_gp->nocb_timer)) {
-			rcu_nocb_unlock(rdp);
-			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
-					   TPS("WakeOvfIsDeferred"));
-		} else {
-			rcu_nocb_unlock(rdp);
-			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
-		}
-	} else {
-		rcu_nocb_unlock(rdp);
-		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
 	}
+
+	rcu_nocb_unlock(rdp);
+	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
 }
 
 static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
@@ -966,7 +952,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp_gp,
 					   unsigned long flags)
 	__releases(rdp_gp->nocb_gp_lock)
 {
-	int ndw;
 	int ret;
 
 	if (!rcu_nocb_need_deferred_wakeup(rdp_gp, level)) {
@@ -974,8 +959,7 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp_gp,
 		return false;
 	}
 
-	ndw = rdp_gp->nocb_defer_wakeup;
-	ret = __wake_nocb_gp(rdp_gp, rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
+	ret = __wake_nocb_gp(rdp_gp, rdp, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
 
 	return ret;
@@ -991,7 +975,6 @@ static void do_nocb_deferred_wakeup_timer(struct timer_list *t)
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
 
 	raw_spin_lock_irqsave(&rdp->nocb_gp_lock, flags);
-	smp_mb__after_spinlock(); /* Timer expire before wakeup. */
 	do_nocb_deferred_wakeup_common(rdp, rdp, RCU_NOCB_WAKE_BYPASS, flags);
 }
 
@@ -1272,7 +1255,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		}
 		rcu_nocb_try_flush_bypass(rdp, jiffies);
 		rcu_nocb_unlock_irqrestore(rdp, flags);
-		wake_nocb_gp(rdp, false);
+		wake_nocb_gp(rdp);
 		sc->nr_to_scan -= _count;
 		count += _count;
 		if (sc->nr_to_scan <= 0)
@@ -1657,7 +1640,7 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
 
-static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+static bool wake_nocb_gp(struct rcu_data *rdp)
 {
 	return false;
 }
-- 
2.34.1



* [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload
  2026-01-14 17:31 [PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window Joel Fernandes
  2026-01-14 17:31 ` [PATCH -next v2 1/4] rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path Joel Fernandes
@ 2026-01-14 17:31 ` Joel Fernandes
  2026-01-16 14:42   ` joelagnelf
  2026-01-16 21:56   ` Frederic Weisbecker
  2026-01-14 17:31 ` [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful Joel Fernandes
  2026-01-14 17:31 ` [PATCH -next v2 4/4] rcu/nocb: Extract nocb_defer_wakeup_cancel() helper Joel Fernandes
  3 siblings, 2 replies; 11+ messages in thread
From: Joel Fernandes @ 2026-01-14 17:31 UTC
  To: linux-kernel
  Cc: Paul E . McKenney, Boqun Feng, rcu, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Joel Fernandes

To be sure we have no rcuog wake ups that were lost, add a warning
to cover the case where the rdp is overloaded with callbacks but
no wake up was attempted.

[applied Frederic's adjustment to clearing of nocb_gp_handling flag]
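
The intended lifecycle of the new flag can be sketched in userspace as follows.
This is a simplified model under stated assumptions, not the kernel code:
struct rdp_model and the model_*() helpers are hypothetical, the warning is
modelled as a counter instead of WARN_ON_ONCE(), and locking is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rcu_data state the flag depends on. */
struct rdp_model {
	long pending;     /* callbacks in the main list awaiting a GP */
	long bypass;      /* callbacks in the bypass list */
	bool gp_handling; /* models rdp->nocb_gp_handling */
	int warnings;     /* number of would-be WARN_ON_ONCE() hits */
};

/*
 * Enqueue one callback. An enqueue onto an empty queue is where a wake is
 * attempted, so the flag is set there. An enqueue under overload only
 * checks that some earlier wake attempt has set the flag.
 */
static void model_enqueue(struct rdp_model *rdp, bool was_alldone,
			  bool overloaded)
{
	assert(rdp);

	rdp->pending++;
	if (was_alldone)
		rdp->gp_handling = true;
	else if (overloaded && !rdp->gp_handling)
		rdp->warnings++;
}

/* A grace period ended and all pending callbacks were handed off. */
static void model_gp_complete(struct rdp_model *rdp)
{
	rdp->pending = 0;
}

/* One rcuog pass over this rdp: clear the flag once nothing is left. */
static void model_gp_wait_scan(struct rdp_model *rdp)
{
	if (rdp->pending == 0 && rdp->bypass == 0)
		rdp->gp_handling = false;
}
```

In this model the counter only increments when callbacks pile up on an rdp
for which no wake was ever attempted, which is the lost-wakeup case the
warning is meant to catch.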

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree.c      |  4 ++++
 kernel/rcu/tree.h      |  1 +
 kernel/rcu/tree_nocb.h | 11 ++++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2921ffb19939..958b61be87ea 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3767,6 +3767,10 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 		debug_rcu_head_unqueue(&rdp->barrier_head);
 		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
 	}
+#ifdef CONFIG_RCU_NOCB_CPU
+	/* wake_nocb implies all CBs queued before were bypass/lazy. */
+	WARN_ON_ONCE(wake_nocb && !rdp->nocb_gp_handling);
+#endif
 	rcu_nocb_unlock(rdp);
 	if (wake_nocb)
 		wake_nocb_gp(rdp);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7dfc57e9adb1..af1d065e3215 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -257,6 +257,7 @@ struct rcu_data {
 	unsigned long nocb_gp_loops;	/* # passes through wait code. */
 	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
 	bool nocb_cb_sleep;		/* Is the nocb CB thread asleep? */
+	bool nocb_gp_handling;		/* Is rcuog handling this rdp? */
 	struct task_struct *nocb_cb_kthread;
 	struct list_head nocb_head_rdp; /*
 					 * Head of rcu_data list in wakeup chain,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f525e4f7985b..acca24670a8c 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -546,6 +546,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	lazy_len = READ_ONCE(rdp->lazy_len);
 	if (was_alldone) {
 		rdp->qlen_last_fqs_check = len;
+		rdp->nocb_gp_handling = true;
 		rcu_nocb_unlock(rdp);
 		// Only lazy CBs in bypass list
 		if (lazy_len && bypass_len == lazy_len) {
@@ -563,7 +564,8 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 
 		return;
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
-		/* ... or if many callbacks queued. */
+		/* Callback overload condition. */
+		WARN_ON_ONCE(!rdp->nocb_gp_handling);
 		rdp->qlen_last_fqs_check = len;
 		j = jiffies;
 		if (j != rdp->nocb_gp_adv_time &&
@@ -732,6 +734,12 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			needwait_gp = true;
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("NeedWaitGP"));
+		} else if (!rcu_cblist_n_cbs(&rdp->nocb_bypass)) {
+			/*
+			 * No pending callbacks and no bypass callbacks.
+			 * The rcuog kthread is done handling this rdp.
+			 */
+			rdp->nocb_gp_handling = false;
 		}
 		if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
 			needwake = rdp->nocb_cb_sleep;
@@ -1254,6 +1262,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			continue;
 		}
 		rcu_nocb_try_flush_bypass(rdp, jiffies);
+		rdp->nocb_gp_handling = true;
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		wake_nocb_gp(rdp);
 		sc->nr_to_scan -= _count;
-- 
2.34.1



* [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful
  2026-01-14 17:31 [PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window Joel Fernandes
  2026-01-14 17:31 ` [PATCH -next v2 1/4] rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path Joel Fernandes
  2026-01-14 17:31 ` [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload Joel Fernandes
@ 2026-01-14 17:31 ` Joel Fernandes
  2026-01-16 22:42   ` Frederic Weisbecker
  2026-01-14 17:31 ` [PATCH -next v2 4/4] rcu/nocb: Extract nocb_defer_wakeup_cancel() helper Joel Fernandes
  3 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2026-01-14 17:31 UTC
  To: linux-kernel
  Cc: Paul E . McKenney, Boqun Feng, rcu, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Joel Fernandes

During callback overload, the NOCB code attempts an opportunistic
advancement via rcu_advance_cbs_nowake().

Analysis via tracing with 300,000 callbacks flooded shows this
optimization is likely dead code:
- 30 overload conditions triggered
- 0 advancements actually occurred
- 100% of the time, no advancement occurred because the current GP was not done.

I also ran TREE05 and TREE08 for 2 hours and could not trigger it.

When callbacks overflow (exceed qhimark), they are waiting for a grace
period that hasn't completed yet. The optimization requires the GP to be
complete to advance callbacks, but the overload condition itself is
caused by callbacks piling up faster than GPs can complete. This creates
a logical contradiction where the advancement cannot happen.

In *theory* this might be possible: the GP could complete just in the nick
of time as we hit the overload. But this is so rare that it can be
considered impossible, given that we cannot hit it even with synthetic
callback flooding. Trying to advance is a waste of cycles and a
maintenance burden we don't need.

I suggest deletion. However, out of extreme caution, add a WARN_ON_ONCE
for a merge window or two and delete the code path after that.
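
The check added here boils down to "did the DONE segment grow across the
advance call?". A minimal userspace sketch of that idea follows. It assumes a
two-segment list and a plain >= comparison of grace-period numbers, whereas
the kernel uses a wrap-safe sequence comparison; struct seglist_model and the
model_*() functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-segment stand-in for the segmented callback list. */
struct seglist_model {
	long done;                 /* callbacks ready to be invoked */
	long wait;                 /* callbacks waiting for wait_gp_seq */
	unsigned long wait_gp_seq; /* GP number the WAIT callbacks need */
};

/* Move WAIT callbacks to DONE if their grace period has completed. */
static void model_advance(struct seglist_model *l,
			  unsigned long completed_gp_seq)
{
	assert(l);

	if (l->wait && completed_gp_seq >= l->wait_gp_seq) {
		l->done += l->wait;
		l->wait = 0;
	}
}

/*
 * Same shape as the patch: snapshot the DONE count, attempt the advance,
 * and report whether the warning condition (DONE grew) would be true.
 */
static bool model_overload_advance_would_warn(struct seglist_model *l,
					      unsigned long completed_gp_seq)
{
	long done_before = l->done;

	model_advance(l, completed_gp_seq);
	return l->done > done_before;
}
```

While the grace period is still in flight the function returns false, which
is the outcome reported for every overload event in the tracing above. It can
only return true in the narrow window where the GP has completed but the
callbacks have not been advanced yet.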

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree_nocb.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index acca24670a8c..702ede003dce 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -571,8 +571,20 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		if (j != rdp->nocb_gp_adv_time &&
 		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
 		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
+			long done_before = rcu_segcblist_get_seglen(&rdp->cblist, RCU_DONE_TAIL);
+
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
+
+			/*
+			 * The advance_cbs call above is not useful. Under an
+			 * overload condition, nocb_gp_wait() is always waiting
+			 * for GP completion, due to this nothing can be moved
+			 * from WAIT to DONE, in the list. WARN if an
+			 * advancement happened (next step is deletion of advance).
+			 */
+			WARN_ON_ONCE(rcu_segcblist_get_seglen(&rdp->cblist,
+				     RCU_DONE_TAIL) > done_before);
 		}
 	}
 
-- 
2.34.1



* [PATCH -next v2 4/4] rcu/nocb: Extract nocb_defer_wakeup_cancel() helper
  2026-01-14 17:31 [PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window Joel Fernandes
                   ` (2 preceding siblings ...)
  2026-01-14 17:31 ` [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful Joel Fernandes
@ 2026-01-14 17:31 ` Joel Fernandes
  2026-01-16 22:49   ` Frederic Weisbecker
  3 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2026-01-14 17:31 UTC
  To: linux-kernel
  Cc: Paul E . McKenney, Boqun Feng, rcu, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Joel Fernandes

The pattern of checking nocb_defer_wakeup and deleting the timer is
duplicated in __wake_nocb_gp() and nocb_gp_wait(). Extract this into a
common helper function nocb_defer_wakeup_cancel().

This removes code duplication and makes it easier to maintain.
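
For readers without the tree at hand, the shape of the refactoring can be
modelled in userspace like this. It is a sketch under stated assumptions:
struct gp_model and the model_*() functions are hypothetical, the timer is
reduced to a boolean, and the locking that the real helper relies on is
omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Deferred-wakeup levels, numbered like the RCU_NOCB_WAKE_* constants. */
enum { WAKE_NOT = 0, WAKE_BYPASS = 1, WAKE_LAZY = 2, WAKE = 3 };

/* Hypothetical stand-in for the GP-leader rcu_data fields involved. */
struct gp_model {
	int defer_wakeup; /* pending deferred wakeup level */
	bool timer_armed; /* models the deferred wakeup timer */
	bool gp_sleep;    /* is the GP kthread (about to be) asleep? */
};

/* Shared helper: drop any pending deferred wakeup and its timer. */
static void model_defer_wakeup_cancel(struct gp_model *gp)
{
	assert(gp);

	if (gp->defer_wakeup > WAKE_NOT) {
		gp->defer_wakeup = WAKE_NOT;
		gp->timer_armed = false;
	}
}

/* First caller: the wake path. Returns true if a wakeup is needed. */
static bool model_wake_gp(struct gp_model *gp)
{
	model_defer_wakeup_cancel(gp);
	if (gp->gp_sleep) {
		gp->gp_sleep = false;
		return true;
	}
	return false;
}

/* Second caller: the GP kthread preparing to go to sleep. */
static void model_gp_prepare_sleep(struct gp_model *gp)
{
	model_defer_wakeup_cancel(gp);
	gp->gp_sleep = true;
}
```

Both callers end up with no pending deferred wakeup and no armed timer, so
the cancel logic lives in one place instead of two.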

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree_nocb.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 702ede003dce..df49c2fa79c5 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -190,6 +190,15 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 	init_swait_queue_head(&rnp->nocb_gp_wq[1]);
 }
 
+/* Clear any pending deferred wakeup timer (nocb_gp_lock must be held). */
+static void nocb_defer_wakeup_cancel(struct rcu_data *rdp_gp)
+{
+	if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		timer_delete(&rdp_gp->nocb_timer);
+	}
+}
+
 static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 			   struct rcu_data *rdp,
 			   unsigned long flags)
@@ -204,10 +213,7 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
 		return false;
 	}
 
-	if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
-		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-		timer_delete(&rdp_gp->nocb_timer);
-	}
+	nocb_defer_wakeup_cancel(rdp_gp);
 
 	if (READ_ONCE(rdp_gp->nocb_gp_sleep)) {
 		WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
@@ -820,10 +826,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		if (rdp_toggling)
 			my_rdp->nocb_toggling_rdp = NULL;
 
-		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
-			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-			timer_delete(&my_rdp->nocb_timer);
-		}
+		nocb_defer_wakeup_cancel(my_rdp);
 		WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
 		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
 	} else {
-- 
2.34.1



* Re: [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload
  2026-01-14 17:31 ` [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload Joel Fernandes
@ 2026-01-16 14:42   ` joelagnelf
  2026-01-16 21:56   ` Frederic Weisbecker
  1 sibling, 0 replies; 11+ messages in thread
From: joelagnelf @ 2026-01-16 14:42 UTC
  To: linux-kernel; +Cc: Paul E McKenney, Boqun Feng, rcu, Frederic Weisbecker

> On Jan 14, 2026, at 12:32 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
> To be sure we have no rcuog wake ups that were lost, add a warning
> to cover the case where the rdp is overloaded with callbacks but
> no wake up was attempted.
> 
> [applied Frederic's adjustment to clearing of nocb_gp_handling flag]

Frederic,

Is it possible for you to do a quick review of these last few? They're mostly
simple. It would be great if we could pick them up for this upcoming merge window.

Thanks!

 - Joel


* Re: [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload
  2026-01-14 17:31 ` [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload Joel Fernandes
  2026-01-16 14:42   ` joelagnelf
@ 2026-01-16 21:56   ` Frederic Weisbecker
  2026-01-19 22:04     ` Joel Fernandes
  1 sibling, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2026-01-16 21:56 UTC
  To: Joel Fernandes
  Cc: linux-kernel, Paul E . McKenney, Boqun Feng, rcu, Neeraj Upadhyay,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Wed, Jan 14, 2026 at 12:31:52PM -0500, Joel Fernandes wrote:
> @@ -1254,6 +1262,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			continue;
>  		}
>  		rcu_nocb_try_flush_bypass(rdp, jiffies);
> +		rdp->nocb_gp_handling = true;

It should be true already, right?

>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  		wake_nocb_gp(rdp);
>  		sc->nr_to_scan -= _count;

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful
  2026-01-14 17:31 ` [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful Joel Fernandes
@ 2026-01-16 22:42   ` Frederic Weisbecker
  2026-01-19 21:34     ` Joel Fernandes
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2026-01-16 22:42 UTC
  To: Joel Fernandes
  Cc: linux-kernel, Paul E . McKenney, Boqun Feng, rcu, Neeraj Upadhyay,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Wed, Jan 14, 2026 at 12:31:53PM -0500, Joel Fernandes wrote:
> During callback overload, the NOCB code attempts an opportunistic
> advancement via rcu_advance_cbs_nowake().
> 
> Analysis via tracing with 300,000 callbacks flooded shows this
> optimization is likely dead code:
> - 30 overload conditions triggered
> - 0 advancements actually occurred
> - 100% of time no advancement due to current GP not done.
> 
> I also ran TREE05 and TREE08 for 2 hours and cannot trigger it.
> 
> When callbacks overflow (exceed qhimark), they are waiting for a grace
> period that hasn't completed yet. The optimization requires the GP to be
> complete to advance callbacks, but the overload condition itself is
> caused by callbacks piling up faster than GPs can complete. This creates
> a logical contradiction where the advancement cannot happen.
> 
> In *theory* this might be possible, the GP completed just in the nick of
> time as we hit the overload, but this is just so rare that it can be
> considered impossible when we cannot even hit it with synthetic callback
> flooding even, it is a waste of cycles to even try to advance, let alone
> be useful and is a maintenance burden complexity we don't need.

Rare is far from impossible with billions of Android devices out there.

I can imagine the warning firing if the flooding callback enqueuer happens
to hit the qhimark right after the GP has completed but before nocb_gp_wait()
has managed to advance the callbacks.

But what would that prove then?

> 
> I suggest deletion. However, add a WARN_ON_ONCE for a merge window or 2
> and delete it after out of extreme caution.

Two merge windows is hardly enough time for that warning to ever land on those
billions of machines. My phone still runs a v5.4 kernel :-)

And the patch doesn't quite qualify for a stable backport.

Anyway, consider an unpleasant case where nocb_gp_wait() is starving for
example. How would just advancing the callbacks help? We still need
nocb_gp_wait() to run its round to eventually wake up nocb_cb_wait()
so that the done callbacks are executed. And before doing that, it needs
to advance the callbacks anyway...

I'm personally in favour of removing this right away instead, unless Paul
has a good reason that I missed?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH -next v2 4/4] rcu/nocb: Extract nocb_defer_wakeup_cancel() helper
  2026-01-14 17:31 ` [PATCH -next v2 4/4] rcu/nocb: Extract nocb_defer_wakeup_cancel() helper Joel Fernandes
@ 2026-01-16 22:49   ` Frederic Weisbecker
  0 siblings, 0 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2026-01-16 22:49 UTC
  To: Joel Fernandes
  Cc: linux-kernel, Paul E . McKenney, Boqun Feng, rcu, Neeraj Upadhyay,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Wed, Jan 14, 2026 at 12:31:54PM -0500, Joel Fernandes wrote:
> The pattern of checking nocb_defer_wakeup and deleting the timer is
> duplicated in __wake_nocb_gp() and nocb_gp_wait(). Extract this into a
> common helper function nocb_defer_wakeup_cancel().
> 
> This removes code duplication and makes it easier to maintain.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful
  2026-01-16 22:42   ` Frederic Weisbecker
@ 2026-01-19 21:34     ` Joel Fernandes
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2026-01-19 21:34 UTC
  To: Frederic Weisbecker
  Cc: linux-kernel, Paul E. McKenney, Boqun Feng, rcu, Neeraj Upadhyay,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Fri, Jan 16, 2026 at 11:42:28PM +0100, Frederic Weisbecker wrote:
> On Wed, Jan 14, 2026 at 12:31:53PM -0500, Joel Fernandes wrote:
> > During callback overload, the NOCB code attempts an opportunistic
> > advancement via rcu_advance_cbs_nowake().
> > 
> > Analysis via tracing with 300,000 callbacks flooded shows this
> > optimization is likely dead code:
> > - 30 overload conditions triggered
> > - 0 advancements actually occurred
> > - 100% of time no advancement due to current GP not done.
> > 
> > I also ran TREE05 and TREE08 for 2 hours and cannot trigger it.
> > 
> > When callbacks overflow (exceed qhimark), they are waiting for a grace
> > period that hasn't completed yet. The optimization requires the GP to be
> > complete to advance callbacks, but the overload condition itself is
> > caused by callbacks piling up faster than GPs can complete. This creates
> > a logical contradiction where the advancement cannot happen.
> > 
> > In *theory* this might be possible: the GP could complete just in the
> > nick of time as we hit the overload. But this is so rare that it can be
> > considered impossible, given that we cannot hit it even with synthetic
> > callback flooding. Trying to advance is a waste of cycles, and the code
> > is a maintenance burden we don't need.
> 
> Rare is far from impossible with billions of Android devices living out there.
> 
> I can imagine the warning hitting if the flooding callback enqueuer happens
> to hit the qhimark right after the GP has completed but before nocb_gp_wait()
> has managed to advance the callbacks.
> 
> But what would that prove then?

I agree with you. I think the original goal of this code path was to help
nocb_gp_wait() by doing some of the advancement work early when we already
know we're in an overload situation. But in all my testing, the cblist is
already advanced by the time we get here - making this path pointless as
you noted.
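
To make the argument concrete, here is a toy user-space model (not kernel
code; every name in it is hypothetical, and it ignores the segmented
callback list and GP acceleration entirely). It assumes each flooded
callback simply waits for the grace period that is in progress when it is
queued. Under that assumption, an advance attempt made before the GP
completes moves nothing, and once the GP has completed a single advance
moves everything, so a second attempt has nothing left to do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy callback: records only the grace period (GP) number it waits for. */
struct toy_cb {
	unsigned long wait_gp;	/* callback may run once this GP completes */
	int done;		/* 1 once moved to the "ready to invoke" state */
};

/*
 * Queue n callbacks while GP number (completed_gp + 1) is still in
 * progress, so each of them must wait for that GP.
 */
static void toy_flood(struct toy_cb *cbs, size_t n, unsigned long completed_gp)
{
	for (size_t i = 0; i < n; i++) {
		cbs[i].wait_gp = completed_gp + 1;
		cbs[i].done = 0;
	}
}

/*
 * Mark as done every pending callback whose GP has completed and
 * return how many callbacks were moved by this call.
 */
static size_t toy_advance_cbs(struct toy_cb *cbs, size_t n,
			      unsigned long completed_gp)
{
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cbs[i].done && cbs[i].wait_gp <= completed_gp) {
			cbs[i].done = 1;
			moved++;
		}
	}
	return moved;
}
```

For example, after toy_flood(cbs, 8, 0), calling toy_advance_cbs(cbs, 8, 0)
moves no callbacks because GP 1 has not completed, while
toy_advance_cbs(cbs, 8, 1) moves all eight. The overload path only helps in
the narrow window between the GP completing and the regular advance running.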

> > I suggest deletion. However, out of extreme caution, add a WARN_ON_ONCE
> > for a merge window or two and delete the code after that.
> 
> Two merge windows is hardly enough time for that warning to ever land on
> those billions of machines. My phone still runs a v5.4 kernel :-)
> 
> And the patch doesn't quite qualify for a stable backport.
> 
> Anyway, consider an unpleasant case where nocb_gp_wait() is being starved,
> for example. How would just advancing the callbacks help? We still need
> nocb_gp_wait() to run its round to eventually wake up nocb_cb_wait()
> so that the done callbacks are executed. And before doing that, it needs
> to advance the callbacks anyway...
> 
> I'm personally in favour of removing this right away instead, unless Paul
> has a good reason that I missed?

Agreed. Unless I hear otherwise from others, I will delete it in my respin.
I also agree that it is better to delete it sooner rather than later:
waiting for a merge window or two buys us very little, given how long it
takes for a kernel release to end up in products.

Thanks,

 - Joel

* Re: [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload
  2026-01-16 21:56   ` Frederic Weisbecker
@ 2026-01-19 22:04     ` Joel Fernandes
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2026-01-19 22:04 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Paul E . McKenney, Boqun Feng, rcu, Neeraj Upadhyay,
	Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang



On 1/16/2026 4:56 PM, Frederic Weisbecker wrote:
> On Wed, Jan 14, 2026 at 12:31:52PM -0500, Joel Fernandes wrote:
>> @@ -1254,6 +1262,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>   			continue;
>>   		}
>>   		rcu_nocb_try_flush_bypass(rdp, jiffies);
>> +		rdp->nocb_gp_handling = true;
> 
> It should be true already, right?
> 
Yes! I will drop this hunk on respin, thanks.

  - Joel


end of thread, other threads:[~2026-01-19 22:04 UTC | newest]

Thread overview: 11+ messages
2026-01-14 17:31 [PATCH -next v2 0/4] rcu/nocb: Cleanup patches for next merge window Joel Fernandes
2026-01-14 17:31 ` [PATCH -next v2 1/4] rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path Joel Fernandes
2026-01-14 17:31 ` [PATCH -next v2 2/4] rcu/nocb: Add warning if no rcuog wake up attempt happened during overload Joel Fernandes
2026-01-16 14:42   ` joelagnelf
2026-01-16 21:56   ` Frederic Weisbecker
2026-01-19 22:04     ` Joel Fernandes
2026-01-14 17:31 ` [PATCH -next v2 3/4] rcu/nocb: Add warning to detect if overload advancement is ever useful Joel Fernandes
2026-01-16 22:42   ` Frederic Weisbecker
2026-01-19 21:34     ` Joel Fernandes
2026-01-14 17:31 ` [PATCH -next v2 4/4] rcu/nocb: Extract nocb_defer_wakeup_cancel() helper Joel Fernandes
2026-01-16 22:49   ` Frederic Weisbecker
