* [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods
@ 2026-04-17 23:11 Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 01/10] rcu/segcblist: Add SRCU and Tasks RCU wrapper functions Puranjay Mohan
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
RCU callbacks track only normal grace-period sequence numbers. This
means callbacks must wait for a normal GP to complete even when an
expedited GP has already elapsed.
This series tracks both normal and expedited GP sequences in the
callback infrastructure using struct rcu_gp_oldstate, so callbacks
advance when either GP type completes.
Commits 1-4 prepare the callback infrastructure: SRCU and Tasks RCU are
isolated behind wrapper functions, the segment compaction logic is
factored out for reuse, gp_seq is widened to struct rcu_gp_oldstate, and
RCU_GET_STATE_NOT_TRACKED is added for subsystems that do not track
expedited GPs.
Commit 5 is the core change: rcu_segcblist_advance() checks both normal
and expedited GP completion via poll_state_synchronize_rcu_full(), and
rcu_accelerate_cbs() captures both GP sequences via
get_state_synchronize_rcu_full(). The SRCU wrappers become standalone
implementations because they cannot use
poll_state_synchronize_rcu_full(), which reads RCU-specific globals.
Commit 6 updates comments for the new gp_seq_full field and the
expedited GP tracking. Commits 7-9 fix three notification paths so that
expedited GP completion triggers callback processing: NOCB rcuog
kthreads are woken, rcu_pending() detects the completion, and rcu_core()
advances the callbacks.
Commit 10 adds nexp and exp_interval parameters to rcuscale for testing
callback drain under concurrent expedited GP load.
Puranjay Mohan (10):
rcu/segcblist: Add SRCU and Tasks RCU wrapper functions
rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper
rcu/segcblist: Change gp_seq to struct rcu_gp_oldstate gp_seq_full
rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited
GPs
rcu: Enable RCU callbacks to benefit from expedited grace periods
rcu: Update comments for gp_seq_full and expedited GP tracking
rcu: Wake NOCB rcuog kthreads on expedited grace period completion
rcu: Detect expedited grace period completion in rcu_pending()
rcu: Advance callbacks for expedited GP completion in rcu_core()
rcuscale: Add concurrent expedited GP threads for callback scaling
tests
include/linux/rcu_segcblist.h | 16 ++--
include/trace/events/rcu.h | 5 +-
kernel/rcu/rcu.h | 13 ++-
kernel/rcu/rcu_segcblist.c | 145 ++++++++++++++++++++++++----------
kernel/rcu/rcu_segcblist.h | 8 +-
kernel/rcu/rcuscale.c | 84 +++++++++++++++++++-
kernel/rcu/srcutree.c | 14 ++--
kernel/rcu/tasks.h | 8 +-
kernel/rcu/tree.c | 54 +++++++++----
kernel/rcu/tree.h | 1 +
kernel/rcu/tree_exp.h | 1 +
kernel/rcu/tree_nocb.h | 77 ++++++++++++++----
12 files changed, 324 insertions(+), 102 deletions(-)
base-commit: 51289896149317e0b6a4abf5d300a569c3353dca
--
2.52.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 01/10] rcu/segcblist: Add SRCU and Tasks RCU wrapper functions
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 02/10] rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper Puranjay Mohan
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
Add srcu_segcblist_advance() and srcu_segcblist_accelerate() wrappers
that forward to the core rcu_segcblist_advance() and
rcu_segcblist_accelerate() functions, and switch all SRCU (srcutree.c)
and Tasks RCU (tasks.h) callers to use these wrappers.
This isolates SRCU and Tasks RCU from upcoming changes to the core
advance/accelerate functions, which will switch to struct
rcu_gp_oldstate for dual normal/expedited GP tracking. Because SRCU and
Tasks RCU use only normal GP sequences, their wrappers will maintain the
existing unsigned long interface.
No functional change.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/rcu_segcblist.c | 10 ++++++++++
kernel/rcu/rcu_segcblist.h | 2 ++
kernel/rcu/srcutree.c | 14 +++++++-------
kernel/rcu/tasks.h | 8 ++++----
4 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 298a2c573f02..da39d818b01b 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -620,3 +620,13 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
rcu_segcblist_init(src_rsclp);
}
+
+void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
+{
+ rcu_segcblist_advance(rsclp, seq);
+}
+
+bool srcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
+{
+ return rcu_segcblist_accelerate(rsclp, seq);
+}
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index fadc08ad4b7b..956f2967d9d2 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -143,3 +143,5 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
struct rcu_segcblist *src_rsclp);
+void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
+bool srcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 7c2f7cc131f7..519a35719c89 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1351,7 +1351,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
* 2) The grace period for RCU_WAIT_TAIL is seen as started but not
* completed so rcu_seq_current() returns X + SRCU_STATE_SCAN1.
*
- * 3) This value is passed to rcu_segcblist_advance() which can't move
+ * 3) This value is passed to srcu_segcblist_advance() which can't move
* any segment forward and fails.
*
* 4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
@@ -1360,15 +1360,15 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
* RCU_NEXT_READY_TAIL segment as started (ie: X + 4 + SRCU_STATE_SCAN1)
* so it returns a snapshot of the next grace period, which is X + 12.
*
- * 5) The value of X + 12 is passed to rcu_segcblist_accelerate() but the
+ * 5) The value of X + 12 is passed to srcu_segcblist_accelerate() but the
* freshly enqueued callback in RCU_NEXT_TAIL can't move to
* RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
* period (gp_num = X + 8). So acceleration fails.
*/
s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
if (rhp) {
- rcu_segcblist_advance(&sdp->srcu_cblist,
- rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+ srcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
/*
* Acceleration can never fail because the base current gp_seq
* used for acceleration is <= the value of gp_seq used for
@@ -1376,7 +1376,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
* always be able to be emptied by the acceleration into the
* RCU_NEXT_READY_TAIL or RCU_WAIT_TAIL segments.
*/
- WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s));
+ WARN_ON_ONCE(!srcu_segcblist_accelerate(&sdp->srcu_cblist, s));
}
if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
sdp->srcu_gp_seq_needed = s;
@@ -1891,8 +1891,8 @@ static void srcu_invoke_callbacks(struct work_struct *work)
rcu_cblist_init(&ready_cbs);
raw_spin_lock_irq_rcu_node(sdp);
WARN_ON_ONCE(!rcu_segcblist_segempty(&sdp->srcu_cblist, RCU_NEXT_TAIL));
- rcu_segcblist_advance(&sdp->srcu_cblist,
- rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+ srcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
/*
* Although this function is theoretically re-entrant, concurrent
* callbacks invocation is disallowed to avoid executing an SRCU barrier
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 48f0d803c8e2..137eb6c48b2c 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -480,8 +480,8 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
if (cpu > 0)
ncbsnz += n;
}
- rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
- (void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
+ srcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+ (void)srcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
if (rtpcp->urgent_gp > 0 && rcu_segcblist_pend_cbs(&rtpcp->cblist)) {
if (rtp->lazy_jiffies)
rtpcp->urgent_gp--;
@@ -564,7 +564,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
if (rcu_segcblist_empty(&rtpcp->cblist))
return;
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
- rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
+ srcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq));
rcu_segcblist_extract_done_cbs(&rtpcp->cblist, &rcl);
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
len = rcl.len;
@@ -577,7 +577,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
}
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
rcu_segcblist_add_len(&rtpcp->cblist, -len);
- (void)rcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
+ (void)srcu_segcblist_accelerate(&rtpcp->cblist, rcu_seq_snap(&rtp->tasks_gp_seq));
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
}
--
2.52.0
* [RFC PATCH 02/10] rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 01/10] rcu/segcblist: Add SRCU and Tasks RCU wrapper functions Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 03/10] rcu/segcblist: Change gp_seq to struct rcu_gp_oldstate gp_seq_full Puranjay Mohan
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
This commit extracts the tail-pointer cleanup and segment compaction
logic from rcu_segcblist_advance() into a new static helper function,
rcu_segcblist_advance_compact(). This shared logic will be reused by the
upcoming srcu_segcblist_advance() standalone implementation, which
cannot call the core rcu_segcblist_advance() because that function will
use RCU-specific globals.
No functional change.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/rcu_segcblist.c | 50 ++++++++++++++++++++++++--------------
1 file changed, 32 insertions(+), 18 deletions(-)
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index da39d818b01b..421f1dadb5e5 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -462,13 +462,43 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
}
+/*
+ * Clean up and compact the segmented callback list after callbacks have been
+ * advanced to the RCU_DONE_TAIL segment. The @i parameter is the index of the
+ * first segment that was NOT advanced (i.e., the segment after the last one
+ * moved to RCU_DONE_TAIL). This function fixes up tail pointers and compacts
+ * any gaps left by the moved segments.
+ */
+static void rcu_segcblist_advance_compact(struct rcu_segcblist *rsclp, int i)
+{
+ int j;
+
+ /* Clean up tail pointers that might have been misordered above. */
+ for (j = RCU_WAIT_TAIL; j < i; j++)
+ WRITE_ONCE(rsclp->tails[j], rsclp->tails[RCU_DONE_TAIL]);
+
+ /*
+ * Callbacks moved, so there might be an empty RCU_WAIT_TAIL
+ * and a non-empty RCU_NEXT_READY_TAIL. If so, copy the
+ * RCU_NEXT_READY_TAIL segment to fill the RCU_WAIT_TAIL gap
+ * created by the now-ready-to-invoke segments.
+ */
+ for (j = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++, j++) {
+ if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
+ break; /* No more callbacks. */
+ WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
+ rcu_segcblist_move_seglen(rsclp, i, j);
+ rsclp->gp_seq[j] = rsclp->gp_seq[i];
+ }
+}
+
/*
* Advance the callbacks in the specified rcu_segcblist structure based
* on the current value passed in for the grace-period counter.
*/
void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
{
- int i, j;
+ int i;
WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
@@ -489,23 +519,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
if (i == RCU_WAIT_TAIL)
return;
- /* Clean up tail pointers that might have been misordered above. */
- for (j = RCU_WAIT_TAIL; j < i; j++)
- WRITE_ONCE(rsclp->tails[j], rsclp->tails[RCU_DONE_TAIL]);
-
- /*
- * Callbacks moved, so there might be an empty RCU_WAIT_TAIL
- * and a non-empty RCU_NEXT_READY_TAIL. If so, copy the
- * RCU_NEXT_READY_TAIL segment to fill the RCU_WAIT_TAIL gap
- * created by the now-ready-to-invoke segments.
- */
- for (j = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++, j++) {
- if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
- break; /* No more callbacks. */
- WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
- rcu_segcblist_move_seglen(rsclp, i, j);
- rsclp->gp_seq[j] = rsclp->gp_seq[i];
- }
+ rcu_segcblist_advance_compact(rsclp, i);
}
/*
--
2.52.0
* [RFC PATCH 03/10] rcu/segcblist: Change gp_seq to struct rcu_gp_oldstate gp_seq_full
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 01/10] rcu/segcblist: Add SRCU and Tasks RCU wrapper functions Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 02/10] rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 04/10] rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited GPs Puranjay Mohan
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
This commit renames the ->gp_seq[] field in struct rcu_segcblist to
->gp_seq_full[] and changes its type from unsigned long to struct
rcu_gp_oldstate. This prepares the callback tracking infrastructure to
support both normal and expedited grace periods.
All function signatures are updated to pass struct rcu_gp_oldstate
pointers: rcu_segcblist_nextgp(), rcu_segcblist_advance(), and
rcu_segcblist_accelerate() now take struct rcu_gp_oldstate * instead of
unsigned long. All callers are updated to use the .rgos_norm field for
comparisons and assignments.
The SRCU and Tasks RCU wrappers now construct an rcu_gp_oldstate with
just .rgos_norm set and forward to the core functions.
No functional change: only the .rgos_norm field is used in place of
gp_seq.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
include/linux/rcu_segcblist.h | 2 +-
include/trace/events/rcu.h | 5 +++--
kernel/rcu/rcu_segcblist.c | 30 +++++++++++++++++-------------
kernel/rcu/rcu_segcblist.h | 6 +++---
kernel/rcu/tree.c | 25 ++++++++++++++-----------
kernel/rcu/tree_nocb.h | 29 +++++++++++++++--------------
6 files changed, 53 insertions(+), 44 deletions(-)
diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 2fdc2208f1ca..59c68f2ba113 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -190,7 +190,7 @@ struct rcu_cblist {
struct rcu_segcblist {
struct rcu_head *head;
struct rcu_head **tails[RCU_CBLIST_NSEGS];
- unsigned long gp_seq[RCU_CBLIST_NSEGS];
+ struct rcu_gp_oldstate gp_seq_full[RCU_CBLIST_NSEGS];
#ifdef CONFIG_RCU_NOCB_CPU
atomic_long_t len;
#else
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5fbdabe3faea..2b859b274592 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -547,10 +547,11 @@ TRACE_EVENT_RCU(rcu_segcb_stats,
),
TP_fast_assign(
+ int i;
__entry->ctx = ctx;
memcpy(__entry->seglen, rs->seglen, RCU_CBLIST_NSEGS * sizeof(long));
- memcpy(__entry->gp_seq, rs->gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));
-
+ for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+ __entry->gp_seq[i] = rs->gp_seq_full[i].rgos_norm;
),
TP_printk("%s seglen: (DONE=%ld, WAIT=%ld, NEXT_READY=%ld, NEXT=%ld) "
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 421f1dadb5e5..00e164db8b74 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -238,8 +238,8 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
{
int i;
- BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq));
- BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq));
+ BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq_full));
+ BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq_full));
rsclp->head = NULL;
for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
rsclp->tails[i] = &rsclp->head;
@@ -307,13 +307,13 @@ struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp)
/*
* Return false if there are no CBs awaiting grace periods, otherwise,
- * return true and store the nearest waited-upon grace period into *lp.
+ * return true and store the nearest waited-upon grace period state into *rgosp.
*/
-bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp)
+bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp)
{
if (!rcu_segcblist_pend_cbs(rsclp))
return false;
- *lp = rsclp->gp_seq[RCU_WAIT_TAIL];
+ *rgosp = rsclp->gp_seq_full[RCU_WAIT_TAIL];
return true;
}
@@ -488,7 +488,7 @@ static void rcu_segcblist_advance_compact(struct rcu_segcblist *rsclp, int i)
break; /* No more callbacks. */
WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
rcu_segcblist_move_seglen(rsclp, i, j);
- rsclp->gp_seq[j] = rsclp->gp_seq[i];
+ rsclp->gp_seq_full[j] = rsclp->gp_seq_full[i];
}
}
@@ -496,7 +496,7 @@ static void rcu_segcblist_advance_compact(struct rcu_segcblist *rsclp, int i)
* Advance the callbacks in the specified rcu_segcblist structure based
* on the current value passed in for the grace-period counter.
*/
-void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
+void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp)
{
int i;
@@ -509,7 +509,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
* are ready to invoke, and put them into the RCU_DONE_TAIL segment.
*/
for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
- if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
+ if (ULONG_CMP_LT(rgosp->rgos_norm, rsclp->gp_seq_full[i].rgos_norm))
break;
WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
@@ -537,7 +537,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
* ready to invoke. Returns true if there are callbacks that won't be
* ready to invoke until seq, false otherwise.
*/
-bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
+bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp)
{
int i, j;
@@ -555,7 +555,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
*/
for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--)
if (!rcu_segcblist_segempty(rsclp, i) &&
- ULONG_CMP_LT(rsclp->gp_seq[i], seq))
+ ULONG_CMP_LT(rsclp->gp_seq_full[i].rgos_norm, rgosp->rgos_norm))
break;
/*
@@ -595,7 +595,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
*/
for (; i < RCU_NEXT_TAIL; i++) {
WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_NEXT_TAIL]);
- rsclp->gp_seq[i] = seq;
+ rsclp->gp_seq_full[i].rgos_norm = rgosp->rgos_norm;
}
return true;
}
@@ -637,10 +637,14 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
{
- rcu_segcblist_advance(rsclp, seq);
+ struct rcu_gp_oldstate rgos = { .rgos_norm = seq };
+
+ rcu_segcblist_advance(rsclp, &rgos);
}
bool srcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
{
- return rcu_segcblist_accelerate(rsclp, seq);
+ struct rcu_gp_oldstate rgos = { .rgos_norm = seq };
+
+ return rcu_segcblist_accelerate(rsclp, &rgos);
}
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 956f2967d9d2..2c06ab830a3d 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -124,7 +124,7 @@ bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp);
-bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp);
+bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp);
void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
struct rcu_head *rhp);
bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
@@ -139,8 +139,8 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp);
void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp);
-void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
-bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
+void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp);
+bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp);
void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
struct rcu_segcblist *src_rsclp);
void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 55df6d37145e..cbc170dc3f72 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1142,7 +1142,7 @@ static void rcu_gp_kthread_wake(void)
*/
static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
{
- unsigned long gp_seq_req;
+ struct rcu_gp_oldstate rgos;
bool ret = false;
rcu_lockdep_assert_cblist_protected(rdp);
@@ -1164,15 +1164,15 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
* accelerating callback invocation to an earlier grace-period
* number.
*/
- gp_seq_req = rcu_seq_snap(&rcu_state.gp_seq);
- if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req))
- ret = rcu_start_this_gp(rnp, rdp, gp_seq_req);
+ rgos.rgos_norm = rcu_seq_snap(&rcu_state.gp_seq);
+ if (rcu_segcblist_accelerate(&rdp->cblist, &rgos))
+ ret = rcu_start_this_gp(rnp, rdp, rgos.rgos_norm);
/* Trace depending on how much we were able to accelerate. */
if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
- trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccWaitCB"));
+ trace_rcu_grace_period(rcu_state.name, rgos.rgos_norm, TPS("AccWaitCB"));
else
- trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
+ trace_rcu_grace_period(rcu_state.name, rgos.rgos_norm, TPS("AccReadyCB"));
trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPostAcc"));
@@ -1189,14 +1189,14 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
struct rcu_data *rdp)
{
- unsigned long c;
+ struct rcu_gp_oldstate rgos;
bool needwake;
rcu_lockdep_assert_cblist_protected(rdp);
- c = rcu_seq_snap(&rcu_state.gp_seq);
- if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
+ rgos.rgos_norm = rcu_seq_snap(&rcu_state.gp_seq);
+ if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, rgos.rgos_norm)) {
/* Old request still live, so mark recent callbacks. */
- (void)rcu_segcblist_accelerate(&rdp->cblist, c);
+ (void)rcu_segcblist_accelerate(&rdp->cblist, &rgos);
return;
}
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
@@ -1218,6 +1218,8 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
*/
static bool rcu_advance_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
{
+ struct rcu_gp_oldstate rgos;
+
rcu_lockdep_assert_cblist_protected(rdp);
raw_lockdep_assert_held_rcu_node(rnp);
@@ -1229,7 +1231,8 @@ static bool rcu_advance_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
* Find all callbacks whose ->gp_seq numbers indicate that they
* are ready to invoke, and put them into the RCU_DONE_TAIL sublist.
*/
- rcu_segcblist_advance(&rdp->cblist, rnp->gp_seq);
+ rgos.rgos_norm = rnp->gp_seq;
+ rcu_segcblist_advance(&rdp->cblist, &rgos);
/* Classify any remaining callbacks. */
return rcu_accelerate_cbs(rnp, rdp);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 1047b30cd46b..1837eedfb8c2 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -433,7 +433,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
bool lazy)
{
unsigned long c;
- unsigned long cur_gp_seq;
+ struct rcu_gp_oldstate cur_gp_seq_full;
unsigned long j = jiffies;
long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
long lazy_len = READ_ONCE(rdp->lazy_len);
@@ -501,8 +501,8 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
return false; // Caller must enqueue the callback.
}
if (j != rdp->nocb_gp_adv_time &&
- rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
- rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
+ rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
+ rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq_full.rgos_norm)) {
rcu_advance_cbs_nowake(rdp->mynode, rdp);
rdp->nocb_gp_adv_time = j;
}
@@ -659,7 +659,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
{
bool bypass = false;
int __maybe_unused cpu = my_rdp->cpu;
- unsigned long cur_gp_seq;
+ struct rcu_gp_oldstate cur_gp_seq_full;
unsigned long flags;
bool gotcbs = false;
unsigned long j = jiffies;
@@ -730,8 +730,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
needwake_gp = false;
if (!rcu_segcblist_restempty(&rdp->cblist,
RCU_NEXT_READY_TAIL) ||
- (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
- rcu_seq_done(&rnp->gp_seq, cur_gp_seq))) {
+ (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
+ rcu_seq_done(&rnp->gp_seq, cur_gp_seq_full.rgos_norm))) {
raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
needwake_gp = rcu_advance_cbs(rnp, rdp);
wasempty = rcu_segcblist_restempty(&rdp->cblist,
@@ -742,10 +742,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
WARN_ON_ONCE(wasempty &&
!rcu_segcblist_restempty(&rdp->cblist,
RCU_NEXT_READY_TAIL));
- if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq)) {
+ if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full)) {
if (!needwait_gp ||
- ULONG_CMP_LT(cur_gp_seq, wait_gp_seq))
- wait_gp_seq = cur_gp_seq;
+ ULONG_CMP_LT(cur_gp_seq_full.rgos_norm, wait_gp_seq))
+ wait_gp_seq = cur_gp_seq_full.rgos_norm;
needwait_gp = true;
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("NeedWaitGP"));
@@ -877,7 +877,7 @@ static inline bool nocb_cb_wait_cond(struct rcu_data *rdp)
static void nocb_cb_wait(struct rcu_data *rdp)
{
struct rcu_segcblist *cblist = &rdp->cblist;
- unsigned long cur_gp_seq;
+ struct rcu_gp_oldstate cur_gp_seq_full;
unsigned long flags;
bool needwake_gp = false;
struct rcu_node *rnp = rdp->mynode;
@@ -918,8 +918,8 @@ static void nocb_cb_wait(struct rcu_data *rdp)
local_bh_enable();
lockdep_assert_irqs_enabled();
rcu_nocb_lock_irqsave(rdp, flags);
- if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
- rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
+ if (rcu_segcblist_nextgp(cblist, &cur_gp_seq_full) &&
+ rcu_seq_done(&rnp->gp_seq, cur_gp_seq_full.rgos_norm) &&
raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
@@ -1569,9 +1569,10 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
nocb_entry_rdp);
sprintf(bufd, "%ld", rsclp->seglen[RCU_DONE_TAIL]);
- sprintf(bufw, "%ld(%ld)", rsclp->seglen[RCU_WAIT_TAIL], rsclp->gp_seq[RCU_WAIT_TAIL]);
+ sprintf(bufw, "%ld(%ld)", rsclp->seglen[RCU_WAIT_TAIL],
+ rsclp->gp_seq_full[RCU_WAIT_TAIL].rgos_norm);
sprintf(bufr, "%ld(%ld)", rsclp->seglen[RCU_NEXT_READY_TAIL],
- rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
+ rsclp->gp_seq_full[RCU_NEXT_READY_TAIL].rgos_norm);
sprintf(bufn, "%ld", rsclp->seglen[RCU_NEXT_TAIL]);
sprintf(bufb, "%ld", rcu_cblist_n_cbs(&rdp->nocb_bypass));
pr_info(" CB %d^%d->%d %c%c%c%c%c F%ld L%ld C%d %c%s%c%s%c%s%c%s%c%s q%ld %c CPU %d%s\n",
--
2.52.0
* [RFC PATCH 04/10] rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited GPs
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (2 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 03/10] rcu/segcblist: Change gp_seq to struct rcu_gp_oldstate gp_seq_full Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 05/10] rcu: Enable RCU callbacks to benefit from expedited grace periods Puranjay Mohan
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
SRCU and Tasks RCU do not track expedited grace periods. When their
callback state is checked via poll_state_synchronize_rcu_full(), the
uninitialized or zeroed rgos_exp field could cause false-positive
completion detection.
This commit adds an RCU_GET_STATE_NOT_TRACKED sentinel value (0x2) that
these subsystems can place into rgos_exp to indicate that expedited GP
tracking is not applicable. The expedited sequence check in
poll_state_synchronize_rcu_full() is guarded to skip entries marked with
this sentinel.
This is needed to allow rcu_segcblist_advance() and rcu_accelerate_cbs()
to work with both normal and expedited grace periods via
get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full().
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/rcu.h | 13 +++++++++++--
kernel/rcu/tree.c | 3 ++-
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index fa6d30ce73d1..b450febac823 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -46,16 +46,25 @@
* the number of pending readers that will use
* this inactive index is bounded).
*
- * RCU polled GP special control value:
+ * RCU polled GP special control values:
*
* RCU_GET_STATE_COMPLETED : State value indicating an already-completed
* polled GP has completed. This value covers
* both the state and the counter of the
* grace-period sequence number.
+ *
+ * RCU_GET_STATE_NOT_TRACKED : State value indicating that a GP component
+ * is not tracked by this subsystem and should
+ * not be checked. Used by SRCU and RCU Tasks
+ * which do not track expedited GPs, to prevent
+ * false-positive completion when their
+ * gp_seq_full entries are checked via
+ * poll_state_synchronize_rcu_full().
*/
-/* Low-order bit definition for polled grace-period APIs. */
+/* Low-order bit definitions for polled grace-period APIs. */
#define RCU_GET_STATE_COMPLETED 0x1
+#define RCU_GET_STATE_NOT_TRACKED 0x2
/* A complete grace period count */
#define RCU_SEQ_GP (RCU_SEQ_STATE_MASK + 1)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cbc170dc3f72..607fc5715cd1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3603,7 +3603,8 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED ||
rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) ||
rgosp->rgos_exp == RCU_GET_STATE_COMPLETED ||
- rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) {
+ (rgosp->rgos_exp != RCU_GET_STATE_NOT_TRACKED &&
+ rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp))) {
smp_mb(); /* Ensure GP ends before subsequent accesses. */
return true;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 05/10] rcu: Enable RCU callbacks to benefit from expedited grace periods
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (3 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 04/10] rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited GPs Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 06/10] rcu: Update comments for gp_seq_full and expedited GP tracking Puranjay Mohan
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
Currently, RCU callbacks only track normal grace period sequence
numbers. This means callbacks must wait for normal grace periods to
complete even when expedited grace periods have already elapsed.
This commit uses the full rcu_gp_oldstate structure (which tracks both
normal and expedited GP sequences) throughout the callback
infrastructure.
The rcu_segcblist_advance() function now checks both normal and
expedited GP completion via poll_state_synchronize_rcu_full(), becoming
parameterless since it reads the GP state internally.
rcu_segcblist_accelerate() stores the full GP state (both normal and
expedited sequences) instead of just the normal sequence.
The rcu_accelerate_cbs() and rcu_accelerate_cbs_unlocked() functions use
get_state_synchronize_rcu_full() to capture both GP sequences. The NOCB
code uses poll_state_synchronize_rcu_full() for advance checks instead
of comparing only the normal GP sequence.
srcu_segcblist_advance() becomes a standalone implementation because it
compares SRCU sequences directly (it cannot use
poll_state_synchronize_rcu_full(), which reads RCU-specific globals).
srcu_segcblist_accelerate() sets rgos_exp to RCU_GET_STATE_NOT_TRACKED
so that poll_state_synchronize_rcu_full() compares only rgos_norm and
ignores rgos_exp.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/rcu_segcblist.c | 30 ++++++++++++++++++++++++------
kernel/rcu/rcu_segcblist.h | 2 +-
kernel/rcu/tree.c | 9 +++------
kernel/rcu/tree_nocb.h | 33 +++++++++++++++++++++++----------
4 files changed, 51 insertions(+), 23 deletions(-)
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 00e164db8b74..11174e2be3c2 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/types.h>
+#include "rcu.h"
#include "rcu_segcblist.h"
/* Initialize simple callback list. */
@@ -494,9 +495,9 @@ static void rcu_segcblist_advance_compact(struct rcu_segcblist *rsclp, int i)
/*
* Advance the callbacks in the specified rcu_segcblist structure based
- * on the current value passed in for the grace-period counter.
+ * on the current value of the grace-period counter.
*/
-void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp)
+void rcu_segcblist_advance(struct rcu_segcblist *rsclp)
{
int i;
@@ -509,7 +510,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *
* are ready to invoke, and put them into the RCU_DONE_TAIL segment.
*/
for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
- if (ULONG_CMP_LT(rgosp->rgos_norm, rsclp->gp_seq_full[i].rgos_norm))
+ if (!poll_state_synchronize_rcu_full(&rsclp->gp_seq_full[i]))
break;
WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
@@ -595,7 +596,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstat
*/
for (; i < RCU_NEXT_TAIL; i++) {
WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_NEXT_TAIL]);
- rsclp->gp_seq_full[i].rgos_norm = rgosp->rgos_norm;
+ rsclp->gp_seq_full[i] = *rgosp;
}
return true;
}
@@ -637,14 +638,31 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
{
- struct rcu_gp_oldstate rgos = { .rgos_norm = seq };
+ int i;
- rcu_segcblist_advance(rsclp, &rgos);
+ WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
+ if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
+ return;
+
+ for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
+ if (ULONG_CMP_LT(seq, rsclp->gp_seq_full[i].rgos_norm))
+ break;
+ WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
+ rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
+ }
+
+ /* If no callbacks moved, nothing more need be done. */
+ if (i == RCU_WAIT_TAIL)
+ return;
+
+ rcu_segcblist_advance_compact(rsclp, i);
}
bool srcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
{
struct rcu_gp_oldstate rgos = { .rgos_norm = seq };
+ if (IS_ENABLED(CONFIG_SMP))
+ rgos.rgos_exp = RCU_GET_STATE_NOT_TRACKED;
return rcu_segcblist_accelerate(rsclp, &rgos);
}
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 2c06ab830a3d..6e05fdf93e7b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -139,7 +139,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp);
void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
struct rcu_cblist *rclp);
-void rcu_segcblist_advance(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp);
+void rcu_segcblist_advance(struct rcu_segcblist *rsclp);
bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp);
void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
struct rcu_segcblist *src_rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 607fc5715cd1..35076092f754 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1164,7 +1164,7 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
* accelerating callback invocation to an earlier grace-period
* number.
*/
- rgos.rgos_norm = rcu_seq_snap(&rcu_state.gp_seq);
+ get_state_synchronize_rcu_full(&rgos);
if (rcu_segcblist_accelerate(&rdp->cblist, &rgos))
ret = rcu_start_this_gp(rnp, rdp, rgos.rgos_norm);
@@ -1193,7 +1193,7 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
bool needwake;
rcu_lockdep_assert_cblist_protected(rdp);
- rgos.rgos_norm = rcu_seq_snap(&rcu_state.gp_seq);
+ get_state_synchronize_rcu_full(&rgos);
if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, rgos.rgos_norm)) {
/* Old request still live, so mark recent callbacks. */
(void)rcu_segcblist_accelerate(&rdp->cblist, &rgos);
@@ -1218,8 +1218,6 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
*/
static bool rcu_advance_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
{
- struct rcu_gp_oldstate rgos;
-
rcu_lockdep_assert_cblist_protected(rdp);
raw_lockdep_assert_held_rcu_node(rnp);
@@ -1231,8 +1229,7 @@ static bool rcu_advance_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
* Find all callbacks whose ->gp_seq numbers indicate that they
* are ready to invoke, and put them into the RCU_DONE_TAIL sublist.
*/
- rgos.rgos_norm = rnp->gp_seq;
- rcu_segcblist_advance(&rdp->cblist, &rgos);
+ rcu_segcblist_advance(&rdp->cblist);
/* Classify any remaining callbacks. */
return rcu_accelerate_cbs(rnp, rdp);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 1837eedfb8c2..7462cd5e2507 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -502,7 +502,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
}
if (j != rdp->nocb_gp_adv_time &&
rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
- rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq_full.rgos_norm)) {
+ poll_state_synchronize_rcu_full(&cur_gp_seq_full)) {
rcu_advance_cbs_nowake(rdp->mynode, rdp);
rdp->nocb_gp_adv_time = j;
}
@@ -731,7 +731,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
if (!rcu_segcblist_restempty(&rdp->cblist,
RCU_NEXT_READY_TAIL) ||
(rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
- rcu_seq_done(&rnp->gp_seq, cur_gp_seq_full.rgos_norm))) {
+ poll_state_synchronize_rcu_full(&cur_gp_seq_full))) {
raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
needwake_gp = rcu_advance_cbs(rnp, rdp);
wasempty = rcu_segcblist_restempty(&rdp->cblist,
@@ -742,7 +742,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
WARN_ON_ONCE(wasempty &&
!rcu_segcblist_restempty(&rdp->cblist,
RCU_NEXT_READY_TAIL));
- if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full)) {
+ /*
+ * Only request a GP wait if the next pending callback's
+ * GP has not already completed (normal or expedited).
+ * If poll_state_synchronize_rcu_full() says it completed,
+ * then rcu_advance_cbs() above already moved those
+ * callbacks to RCU_DONE_TAIL, so there is no GP to wait
+ * for. Any remaining callbacks got new (future) GP
+ * numbers from rcu_accelerate_cbs() inside
+ * rcu_advance_cbs() and will be handled on the next pass.
+ */
+ if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
+ !poll_state_synchronize_rcu_full(&cur_gp_seq_full)) {
if (!needwait_gp ||
ULONG_CMP_LT(cur_gp_seq_full.rgos_norm, wait_gp_seq))
wait_gp_seq = cur_gp_seq_full.rgos_norm;
@@ -919,7 +930,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
lockdep_assert_irqs_enabled();
rcu_nocb_lock_irqsave(rdp, flags);
if (rcu_segcblist_nextgp(cblist, &cur_gp_seq_full) &&
- rcu_seq_done(&rnp->gp_seq, cur_gp_seq_full.rgos_norm) &&
+ poll_state_synchronize_rcu_full(&cur_gp_seq_full) &&
raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
@@ -1548,8 +1559,8 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
static void show_rcu_nocb_state(struct rcu_data *rdp)
{
char bufd[22];
- char bufw[45];
- char bufr[45];
+ char bufw[64];
+ char bufr[64];
char bufn[22];
char bufb[22];
struct rcu_data *nocb_next_rdp;
@@ -1569,10 +1580,12 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
nocb_entry_rdp);
sprintf(bufd, "%ld", rsclp->seglen[RCU_DONE_TAIL]);
- sprintf(bufw, "%ld(%ld)", rsclp->seglen[RCU_WAIT_TAIL],
- rsclp->gp_seq_full[RCU_WAIT_TAIL].rgos_norm);
- sprintf(bufr, "%ld(%ld)", rsclp->seglen[RCU_NEXT_READY_TAIL],
- rsclp->gp_seq_full[RCU_NEXT_READY_TAIL].rgos_norm);
+ sprintf(bufw, "%ld(%ld/%ld)", rsclp->seglen[RCU_WAIT_TAIL],
+ rsclp->gp_seq_full[RCU_WAIT_TAIL].rgos_norm,
+ rsclp->gp_seq_full[RCU_WAIT_TAIL].rgos_exp);
+ sprintf(bufr, "%ld(%ld/%ld)", rsclp->seglen[RCU_NEXT_READY_TAIL],
+ rsclp->gp_seq_full[RCU_NEXT_READY_TAIL].rgos_norm,
+ rsclp->gp_seq_full[RCU_NEXT_READY_TAIL].rgos_exp);
sprintf(bufn, "%ld", rsclp->seglen[RCU_NEXT_TAIL]);
sprintf(bufb, "%ld", rcu_cblist_n_cbs(&rdp->nocb_bypass));
pr_info(" CB %d^%d->%d %c%c%c%c%c F%ld L%ld C%d %c%s%c%s%c%s%c%s%c%s q%ld %c CPU %d%s\n",
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 06/10] rcu: Update comments for gp_seq_full and expedited GP tracking
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (4 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 05/10] rcu: Enable RCU callbacks to benefit from expedited grace periods Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 07/10] rcu: Wake NOCB rcuog kthreads on expedited grace period completion Puranjay Mohan
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
This commit updates documentation comments throughout the RCU callback
infrastructure to reflect the transition from single grace-period
sequence numbers to the full rcu_gp_oldstate structure that tracks both
normal and expedited grace periods.
The ->gp_seq_full[] array documentation in rcu_segcblist.h is updated
to describe dual GP tracking. The rcu_segcblist_advance(),
rcu_segcblist_accelerate(), and rcu_advance_cbs() comments are updated
to reference ->gp_seq_full[] and rgosp instead of the old ->gp_seq[]
and seq names.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
include/linux/rcu_segcblist.h | 14 +++++++-----
kernel/rcu/rcu_segcblist.c | 43 +++++++++++++++++++++++------------
kernel/rcu/tree.c | 6 ++---
3 files changed, 39 insertions(+), 24 deletions(-)
diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 59c68f2ba113..6052c5750ad3 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -50,12 +50,14 @@ struct rcu_cblist {
* Note that RCU_WAIT_TAIL cannot be empty unless RCU_NEXT_READY_TAIL is also
* empty.
*
- * The ->gp_seq[] array contains the grace-period number at which the
- * corresponding segment of callbacks will be ready to invoke. A given
- * element of this array is meaningful only when the corresponding segment
- * is non-empty, and it is never valid for RCU_DONE_TAIL (whose callbacks
- * are already ready to invoke) or for RCU_NEXT_TAIL (whose callbacks have
- * not yet been assigned a grace-period number).
+ * The ->gp_seq_full[] array contains the grace-period state at which the
+ * corresponding segment of callbacks will be ready to invoke. This tracks
+ * both normal and expedited grace periods, allowing callbacks to complete
+ * when either type of GP finishes. A given element of this array is
+ * meaningful only when the corresponding segment is non-empty, and it is
+ * never valid for RCU_DONE_TAIL (whose callbacks are already ready to
+ * invoke) or for RCU_NEXT_TAIL (whose callbacks have not yet been assigned
+ * a grace-period state).
*/
#define RCU_DONE_TAIL 0 /* Also RCU_WAIT head. */
#define RCU_WAIT_TAIL 1 /* Also RCU_NEXT_READY head. */
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 11174e2be3c2..0f13016c0a2c 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -495,7 +495,8 @@ static void rcu_segcblist_advance_compact(struct rcu_segcblist *rsclp, int i)
/*
* Advance the callbacks in the specified rcu_segcblist structure based
- * on the current value of the grace-period counter.
+ * on the current grace-period state. Checks both normal and expedited
+ * grace periods, advancing callbacks when either GP type completes.
*/
void rcu_segcblist_advance(struct rcu_segcblist *rsclp)
{
@@ -506,8 +507,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp)
return;
/*
- * Find all callbacks whose ->gp_seq numbers indicate that they
- * are ready to invoke, and put them into the RCU_DONE_TAIL segment.
+ * Find all callbacks whose grace periods have completed (either
+ * normal or expedited) and put them into the RCU_DONE_TAIL segment.
+ * We check against the current global GP state, which includes
+ * proper memory barriers and handles special completion values.
*/
for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
if (!poll_state_synchronize_rcu_full(&rsclp->gp_seq_full[i]))
@@ -534,9 +537,9 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp)
* them to complete at the end of the earlier grace period.
*
* This function operates on an rcu_segcblist structure, and also the
- * grace-period sequence number seq at which new callbacks would become
+ * grace-period state rgosp at which new callbacks would become
* ready to invoke. Returns true if there are callbacks that won't be
- * ready to invoke until seq, false otherwise.
+ * ready to invoke until the grace period represented by rgosp, false otherwise.
*/
bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstate *rgosp)
{
@@ -548,11 +551,11 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstat
/*
* Find the segment preceding the oldest segment of callbacks
- * whose ->gp_seq[] completion is at or after that passed in via
- * "seq", skipping any empty segments. This oldest segment, along
+ * whose grace period completion is at or after that passed in via
+ * "rgosp", skipping any empty segments. This oldest segment, along
* with any later segments, can be merged in with any newly arrived
- * callbacks in the RCU_NEXT_TAIL segment, and assigned "seq"
- * as their ->gp_seq[] grace-period completion sequence number.
+ * callbacks in the RCU_NEXT_TAIL segment, and assigned "rgosp"
+ * as their grace-period completion state.
*/
for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--)
if (!rcu_segcblist_segempty(rsclp, i) &&
@@ -561,7 +564,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstat
/*
* If all the segments contain callbacks that correspond to
- * earlier grace-period sequence numbers than "seq", leave.
+ * earlier grace-period sequence numbers than "rgosp", leave.
* Assuming that the rcu_segcblist structure has enough
* segments in its arrays, this can only happen if some of
* the non-done segments contain callbacks that really are
@@ -569,15 +572,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstat
* out by the next call to rcu_segcblist_advance().
*
* Also advance to the oldest segment of callbacks whose
- * ->gp_seq[] completion is at or after that passed in via "seq",
+ * ->gp_seq_full[] completion is at or after that passed in via "rgosp",
* skipping any empty segments.
*
* Note that segment "i" (and any lower-numbered segments
* containing older callbacks) will be unaffected, and their
- * grace-period numbers remain unchanged. For example, if i ==
+ * grace-period states remain unchanged. For example, if i ==
* WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
* Instead, the CBs in NEXT_TAIL will be merged with those in
- * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL
+ * NEXT_READY_TAIL and the grace-period state of NEXT_READY_TAIL
* would be updated. NEXT_TAIL would then be empty.
*/
if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
@@ -589,8 +592,8 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, struct rcu_gp_oldstat
/*
* Merge all later callbacks, including newly arrived callbacks,
- * into the segment located by the for-loop above. Assign "seq"
- * as the ->gp_seq[] value in order to correctly handle the case
+ * into the segment located by the for-loop above. Assign "rgosp"
+ * as the grace-period state in order to correctly handle the case
* where there were no pending callbacks in the rcu_segcblist
* structure other than in the RCU_NEXT_TAIL segment.
*/
@@ -644,6 +647,10 @@ void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
return;
+ /*
+ * Find all callbacks whose normal GP sequence numbers indicate
+ * that they are ready to invoke. For SRCU, we only check rgos_norm.
+ */
for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
if (ULONG_CMP_LT(seq, rsclp->gp_seq_full[i].rgos_norm))
break;
@@ -658,6 +665,12 @@ void srcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
rcu_segcblist_advance_compact(rsclp, i);
}
+/*
+ * SRCU wrapper for rcu_segcblist_accelerate() - converts SRCU's unsigned
+ * long GP sequence to rcu_gp_oldstate format with rgos_exp set to
+ * RCU_GET_STATE_NOT_TRACKED (since SRCU does not use expedited GPs)
+ * and calls the core rcu_segcblist_accelerate().
+ */
bool srcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
{
struct rcu_gp_oldstate rgos = { .rgos_norm = seq };
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 35076092f754..0e43866dc4cd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1209,7 +1209,7 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
/*
* Move any callbacks whose grace period has completed to the
* RCU_DONE_TAIL sublist, then compact the remaining sublists and
- * assign ->gp_seq numbers to any callbacks in the RCU_NEXT_TAIL
+ * assign ->gp_seq_full[] state to any callbacks in the RCU_NEXT_TAIL
* sublist. This function is idempotent, so it does not hurt to
* invoke it repeatedly. As long as it is not invoked -too- often...
* Returns true if the RCU grace-period kthread needs to be awakened.
@@ -1226,8 +1226,8 @@ static bool rcu_advance_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
return false;
/*
- * Find all callbacks whose ->gp_seq numbers indicate that they
- * are ready to invoke, and put them into the RCU_DONE_TAIL sublist.
+ * Find all callbacks whose grace periods have completed (either
+ * normal or expedited) and put them into the RCU_DONE_TAIL sublist.
*/
rcu_segcblist_advance(&rdp->cblist);
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 07/10] rcu: Wake NOCB rcuog kthreads on expedited grace period completion
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (5 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 06/10] rcu: Update comments for gp_seq_full and expedited GP tracking Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 08/10] rcu: Detect expedited grace period completion in rcu_pending() Puranjay Mohan
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
When an expedited grace period completes, rcu_exp_wait_wake() wakes
waiters on rnp->exp_wq[] but does not notify NOCB rcuog kthreads. These
kthreads may be sleeping waiting for a grace period to complete.
Without this wakeup, callbacks on offloaded CPUs that could benefit from
the expedited GP must wait until the rcuog kthread wakes for some other
reason (e.g., next normal GP completion or a timer).
Add rcu_exp_wake_nocb() which wakes rcuog kthreads for leaf-node CPUs,
deduplicating via rdp->nocb_gp_rdp since multiple CPUs share one rcuog
kthread. Uses for_each_leaf_node_possible_cpu() because offline CPUs
can have pending callbacks. The function is defined in tree_nocb.h with
an empty stub for CONFIG_RCU_NOCB_CPU=n builds.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/tree.h | 1 +
kernel/rcu/tree_exp.h | 1 +
kernel/rcu/tree_nocb.h | 29 +++++++++++++++++++++++++++++
3 files changed, 31 insertions(+)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7dfc57e9adb1..40f778453591 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -500,6 +500,7 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
static void rcu_init_one_nocb(struct rcu_node *rnp);
static bool wake_nocb_gp(struct rcu_data *rdp);
+static void rcu_exp_wake_nocb(struct rcu_node *rnp);
static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
unsigned long j, bool lazy);
static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 82cada459e5d..0df1009c6e97 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -708,6 +708,7 @@ static void rcu_exp_wait_wake(unsigned long s)
}
smp_mb(); /* All above changes before wakeup. */
wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
+ rcu_exp_wake_nocb(rnp);
}
trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
mutex_unlock(&rcu_state.exp_wake_mutex);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 7462cd5e2507..f37ee56d62a9 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -190,6 +190,31 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
init_swait_queue_head(&rnp->nocb_gp_wq[1]);
}
+/*
+ * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
+ * callbacks that were waiting for the just-completed expedited GP.
+ * Deduplicate via nocb_gp_rdp since multiple CPUs share one rcuog
+ * kthread. Use for_each_leaf_node_possible_cpu() because offline CPUs
+ * may have pending callbacks.
+ */
+static void rcu_exp_wake_nocb(struct rcu_node *rnp)
+{
+ struct rcu_data *last_rdp_gp = NULL;
+ int cpu;
+
+ if (!rcu_is_leaf_node(rnp))
+ return;
+
+ for_each_leaf_node_possible_cpu(rnp, cpu) {
+ struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+ if (rdp->nocb_gp_rdp == last_rdp_gp)
+ continue;
+ last_rdp_gp = rdp->nocb_gp_rdp;
+ wake_nocb_gp(rdp);
+ }
+}
+
/* Clear any pending deferred wakeup timer (nocb_gp_lock must be held). */
static void nocb_defer_wakeup_cancel(struct rcu_data *rdp_gp)
{
@@ -1668,6 +1693,10 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
{
}
+static void rcu_exp_wake_nocb(struct rcu_node *rnp)
+{
+}
+
static bool wake_nocb_gp(struct rcu_data *rdp)
{
return false;
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 08/10] rcu: Detect expedited grace period completion in rcu_pending()
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (6 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 07/10] rcu: Wake NOCB rcuog kthreads on expedited grace period completion Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 09/10] rcu: Advance callbacks for expedited GP completion in rcu_core() Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 10/10] rcuscale: Add concurrent expedited GP threads for callback scaling tests Puranjay Mohan
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
rcu_pending() is the gatekeeper that decides whether rcu_core() should
run on the current CPU's timer tick. Currently it checks if the CPU has
callbacks ready to invoke or a grace period has completed or started.
It does not check that an expedited GP has completed. After an expedited
GP, callbacks remain in RCU_WAIT_TAIL (not yet advanced to
RCU_DONE_TAIL) and So rcu_core() never runs to advance them.
Add a check using rcu_segcblist_nextgp() combined with
poll_state_synchronize_rcu_full() to detect when any pending callbacks'
grace period has completed.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/tree.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0e43866dc4cd..309273a37b0a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3671,6 +3671,7 @@ EXPORT_SYMBOL_GPL(cond_synchronize_rcu_full);
static int rcu_pending(int user)
{
bool gp_in_progress;
+ struct rcu_gp_oldstate gp_state;
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
@@ -3701,6 +3702,12 @@ static int rcu_pending(int user)
rcu_segcblist_ready_cbs(&rdp->cblist))
return 1;
+ /* Has a GP (normal or expedited) completed for pending callbacks? */
+ if (!rcu_rdp_is_offloaded(rdp) &&
+ rcu_segcblist_nextgp(&rdp->cblist, &gp_state) &&
+ poll_state_synchronize_rcu_full(&gp_state))
+ return 1;
+
/* Has RCU gone idle with this CPU needing another grace period? */
if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
!rcu_rdp_is_offloaded(rdp) &&
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH 09/10] rcu: Advance callbacks for expedited GP completion in rcu_core()
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (7 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 08/10] rcu: Detect expedited grace period completion in rcu_pending() Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 10/10] rcuscale: Add concurrent expedited GP threads for callback scaling tests Puranjay Mohan
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
Even when rcu_pending() triggers rcu_core(), the normal callback
advancement path through note_gp_changes() -> __note_gp_changes() bails
out when rdp->gp_seq == rnp->gp_seq (no normal GP change). Since
expedited GPs do not update rnp->gp_seq, rcu_advance_cbs() is never
called and callbacks remain stuck in RCU_WAIT_TAIL.
Add a direct callback advancement block in rcu_core() that checks for GP
completion via rcu_segcblist_nextgp() combined with
poll_state_synchronize_rcu_full(). When detected, trylock rnp and call
rcu_advance_cbs() to move completed callbacks to RCU_DONE_TAIL. Wake the
GP kthread if rcu_advance_cbs() requests a new grace period.
Uses trylock to avoid adding contention on rnp->lock. If the lock is
contended, callbacks will be advanced on the next tick.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/tree.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 309273a37b0a..1a92b6105de5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2853,6 +2853,22 @@ static __latent_entropy void rcu_core(void)
/* Update RCU state based on any recent quiescent states. */
rcu_check_quiescent_state(rdp);
+ /* Advance callbacks if an expedited GP has completed. */
+ if (!rcu_rdp_is_offloaded(rdp) &&
+ rcu_segcblist_is_enabled(&rdp->cblist)) {
+ struct rcu_gp_oldstate gp_state;
+
+ if (rcu_segcblist_nextgp(&rdp->cblist, &gp_state) &&
+ poll_state_synchronize_rcu_full(&gp_state)) {
+ guard(irqsave)();
+ if (raw_spin_trylock_rcu_node(rnp)) {
+ if (rcu_advance_cbs(rnp, rdp))
+ rcu_gp_kthread_wake();
+ raw_spin_unlock_rcu_node(rnp);
+ }
+ }
+ }
+
/* No grace period and unregistered callbacks? */
if (!rcu_gp_in_progress() &&
rcu_segcblist_is_enabled(&rdp->cblist) && !rcu_rdp_is_offloaded(rdp)) {
--
2.52.0
* [RFC PATCH 10/10] rcuscale: Add concurrent expedited GP threads for callback scaling tests
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
` (8 preceding siblings ...)
2026-04-17 23:11 ` [RFC PATCH 09/10] rcu: Advance callbacks for expedited GP completion in rcu_core() Puranjay Mohan
@ 2026-04-17 23:11 ` Puranjay Mohan
9 siblings, 0 replies; 11+ messages in thread
From: Puranjay Mohan @ 2026-04-17 23:11 UTC (permalink / raw)
To: rcu, linux-kernel, linux-trace-kernel
Cc: Puranjay Mohan, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
Add nexp and exp_interval parameters to rcuscale that spawn kthreads
running synchronize_rcu_expedited() in a loop. This generates concurrent
expedited GP load while the normal writers measure GP or callback
latency.
When combined with gp_async=1 (which uses call_rcu() for writers), this
tests how effectively callbacks benefit from expedited grace periods.
With RCU callback expedited GP tracking, the async callbacks should
complete faster because they piggyback on the expedited GPs rather than
waiting for normal GPs.
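The piggybacking being measured here can be sketched as a minimal userspace
model (snapshot(), poll_done() and struct gp_oldstate are illustrative
simplifications of get_state_synchronize_rcu_full() and
poll_state_synchronize_rcu_full(), without the kernel's rcu_seq state bits):

```c
/* Userspace model: a callback is ready when EITHER the normal or the
 * expedited grace-period sequence has advanced past the snapshot taken
 * at enqueue time. */
#include <assert.h>
#include <stdbool.h>

struct gp_oldstate {
	unsigned long oldstate;		/* normal GP cookie */
	unsigned long expedited_oldstate; /* expedited GP cookie */
};

static unsigned long normal_seq, expedited_seq;

/* Models get_state_synchronize_rcu_full(): capture both sequences. */
static struct gp_oldstate snapshot(void)
{
	return (struct gp_oldstate){ normal_seq + 1, expedited_seq + 1 };
}

/* Models poll_state_synchronize_rcu_full(): done if either GP completed. */
static bool poll_done(struct gp_oldstate s)
{
	return normal_seq >= s.oldstate ||
	       expedited_seq >= s.expedited_oldstate;
}
```

With only the normal cookie tracked, an expedited GP completion would not
make poll_done() return true; tracking both is what lets the async
callbacks in this test drain early.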
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
kernel/rcu/rcuscale.c | 84 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 82 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index ac0b1c6b7dae..1097ec15879c 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -91,6 +91,8 @@ torture_param(int, shutdown_secs, !IS_MODULE(CONFIG_RCU_SCALE_TEST) * 300,
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
torture_param(int, writer_holdoff_jiffies, 0, "Holdoff (jiffies) between GPs, zero to disable");
+torture_param(int, nexp, 0, "Number of expedited GP threads to run concurrently");
+torture_param(int, exp_interval, 0, "Interval (us) between expedited GPs, zero to disable");
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
torture_param(int, kfree_by_call_rcu, 0, "Use call_rcu() to emulate kfree_rcu()?");
@@ -115,8 +117,10 @@ struct writer_freelist {
static int nrealreaders;
static int nrealwriters;
+static int nrealexp;
static struct task_struct **writer_tasks;
static struct task_struct **reader_tasks;
+static struct task_struct **exp_tasks;
static u64 **writer_durations;
static bool *writer_done;
@@ -462,6 +466,34 @@ rcu_scale_reader(void *arg)
return 0;
}
+/*
+ * RCU expedited GP kthread. Repeatedly invokes expedited grace periods
+ * to generate concurrent expedited GP load while the normal-GP writers
+ * are being measured. This allows measuring the benefit of callbacks
+ * that can piggyback on expedited grace periods.
+ */
+static int
+rcu_scale_exp(void *arg)
+{
+ long me = (long)arg;
+
+ VERBOSE_SCALEOUT_STRING("rcu_scale_exp task started");
+ set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+ set_user_nice(current, MIN_NICE);
+
+ if (holdoff)
+ schedule_timeout_idle(holdoff * HZ);
+
+ do {
+ if (exp_interval)
+ udelay(exp_interval);
+ cur_ops->exp_sync();
+ rcu_scale_wait_shutdown();
+ } while (!torture_must_stop());
+ torture_kthread_stopping("rcu_scale_exp");
+ return 0;
+}
+
/*
* Allocate a writer_mblock structure for the specified rcu_scale_writer
* task.
@@ -664,8 +696,10 @@ static void
rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
{
pr_alert("%s" SCALE_FLAG
- "--- %s: gp_async=%d gp_async_max=%d gp_exp=%d holdoff=%d minruntime=%d nreaders=%d nwriters=%d writer_holdoff=%d writer_holdoff_jiffies=%d verbose=%d shutdown_secs=%d\n",
- scale_type, tag, gp_async, gp_async_max, gp_exp, holdoff, minruntime, nrealreaders, nrealwriters, writer_holdoff, writer_holdoff_jiffies, verbose, shutdown_secs);
+ "--- %s: gp_async=%d gp_async_max=%d gp_exp=%d holdoff=%d minruntime=%d nreaders=%d nwriters=%d nexp=%d exp_interval=%d writer_holdoff=%d writer_holdoff_jiffies=%d verbose=%d shutdown_secs=%d\n",
+ scale_type, tag, gp_async, gp_async_max, gp_exp, holdoff,
+ minruntime, nrealreaders, nrealwriters, nrealexp, exp_interval,
+ writer_holdoff, writer_holdoff_jiffies, verbose, shutdown_secs);
}
/*
@@ -809,6 +843,13 @@ kfree_scale_cleanup(void)
if (torture_cleanup_begin())
return;
+ if (exp_tasks) {
+ for (i = 0; i < nrealexp; i++)
+ torture_stop_kthread(rcu_scale_exp, exp_tasks[i]);
+ kfree(exp_tasks);
+ exp_tasks = NULL;
+ }
+
if (kfree_reader_tasks) {
for (i = 0; i < kfree_nrealthreads; i++)
torture_stop_kthread(kfree_scale_thread,
@@ -903,6 +944,22 @@ kfree_scale_init(void)
goto unwind;
}
+ if (nrealexp > 0 && cur_ops->exp_sync) {
+ exp_tasks = kzalloc_objs(exp_tasks[0], nrealexp);
+ if (!exp_tasks) {
+ SCALEOUT_ERRSTRING("out of memory");
+ firsterr = -ENOMEM;
+ goto unwind;
+ }
+ for (i = 0; i < nrealexp; i++) {
+ firsterr = torture_create_kthread(rcu_scale_exp,
+ (void *)i,
+ exp_tasks[i]);
+ if (torture_init_error(firsterr))
+ goto unwind;
+ }
+ }
+
while (atomic_read(&n_kfree_scale_thread_started) < kfree_nrealthreads)
schedule_timeout_uninterruptible(1);
@@ -959,6 +1016,13 @@ rcu_scale_cleanup(void)
return;
}
+ if (exp_tasks) {
+ for (i = 0; i < nrealexp; i++)
+ torture_stop_kthread(rcu_scale_exp, exp_tasks[i]);
+ kfree(exp_tasks);
+ exp_tasks = NULL;
+ }
+
if (reader_tasks) {
for (i = 0; i < nrealreaders; i++)
torture_stop_kthread(rcu_scale_reader,
@@ -1076,6 +1140,7 @@ rcu_scale_init(void)
if (kthread_tp)
kthread_stime = kthread_tp->stime;
}
+ nrealexp = nexp;
if (kfree_rcu_test)
return kfree_scale_init();
@@ -1107,6 +1172,21 @@ rcu_scale_init(void)
}
while (atomic_read(&n_rcu_scale_reader_started) < nrealreaders)
schedule_timeout_uninterruptible(1);
+ if (nrealexp > 0 && cur_ops->exp_sync) {
+ exp_tasks = kzalloc_objs(exp_tasks[0], nrealexp);
+ if (!exp_tasks) {
+ SCALEOUT_ERRSTRING("out of memory");
+ firsterr = -ENOMEM;
+ goto unwind;
+ }
+ for (i = 0; i < nrealexp; i++) {
+ firsterr = torture_create_kthread(rcu_scale_exp,
+ (void *)i,
+ exp_tasks[i]);
+ if (torture_init_error(firsterr))
+ goto unwind;
+ }
+ }
writer_tasks = kzalloc_objs(writer_tasks[0], nrealwriters);
writer_durations = kcalloc(nrealwriters, sizeof(*writer_durations), GFP_KERNEL);
writer_n_durations = kzalloc_objs(*writer_n_durations, nrealwriters);
--
2.52.0
end of thread, other threads: [~2026-04-17 23:13 UTC | newest]
Thread overview: 11+ messages
2026-04-17 23:11 [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 01/10] rcu/segcblist: Add SRCU and Tasks RCU wrapper functions Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 02/10] rcu/segcblist: Factor out rcu_segcblist_advance_compact() helper Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 03/10] rcu/segcblist: Change gp_seq to struct rcu_gp_oldstate gp_seq_full Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 04/10] rcu: Add RCU_GET_STATE_NOT_TRACKED for subsystems without expedited GPs Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 05/10] rcu: Enable RCU callbacks to benefit from expedited grace periods Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 06/10] rcu: Update comments for gp_seq_full and expedited GP tracking Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 07/10] rcu: Wake NOCB rcuog kthreads on expedited grace period completion Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 08/10] rcu: Detect expedited grace period completion in rcu_pending() Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 09/10] rcu: Advance callbacks for expedited GP completion in rcu_core() Puranjay Mohan
2026-04-17 23:11 ` [RFC PATCH 10/10] rcuscale: Add concurrent expedited GP threads for callback scaling tests Puranjay Mohan