* [PATCH RFC 1/2] rcu: Expose per-CPU segmented callback counts via debugfs
2026-05-07 17:37 [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Gustavo Luiz Duarte
@ 2026-05-07 17:37 ` Gustavo Luiz Duarte
2026-05-07 17:37 ` [PATCH RFC 2/2] rcu: Include kfree_rcu/kvfree_rcu batched counts in pending_cbs Gustavo Luiz Duarte
2026-05-07 18:59 ` [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Joel Fernandes
2 siblings, 0 replies; 4+ messages in thread
From: Gustavo Luiz Duarte @ 2026-05-07 17:37 UTC (permalink / raw)
To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
Cc: rcu, linux-kernel, linux-mm, Gustavo Luiz Duarte
The existing rcu_segcb_stats tracepoint requires active tracing, so
there is no always-on, low-overhead way to inspect how many callbacks
are pending on each CPU and at which stage of the grace-period
pipeline.
Add a debugfs file at /sys/kernel/debug/rcu/pending_cbs
that prints per-CPU callback counts broken down by segcblist segment,
plus a "total" row aggregating across CPUs:
- done: Callbacks ready to invoke (GP completed).
- wait: Callbacks waiting for the current GP.
- next_ready: Callbacks to be handled by the next GP.
- next: Newly queued callbacks not yet assigned a GP.
- lazy: Callbacks deferred via the RCU lazy mechanism (read from
rdp->lazy_len rather than from a segcblist segment).
The interface has zero steady-state overhead: it reads the existing
per-CPU rcu_segcblist.seglen[] counters on demand. These counters are
already maintained by the RCU callback infrastructure for its own
bookkeeping, so no new runtime accounting is introduced.
Example output:
cpu            done       wait next_ready       next       lazy
0                 7         11          0          0          0
1                 0          3          2          0          0
2                 0          1          8          0          0
3                 0          1          1          0          0
total             7         16         11          0          0
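
For anyone scripting against this format, here is a minimal user-space sketch of a consumer. It is not part of the patch. The function name sum_pending_cbs(), the NCOLS constant, the PENDING_CBS_DEMO guard and the embedded sample text are all made up for illustration; the only thing taken from the patch is the row layout (a CPU number followed by the done, wait, next_ready, next and lazy counts). It sums the per-CPU rows so the result can be compared with the "total" row.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 5 /* done, wait, next_ready, next, lazy */

/* Sample text copied from the example output in the commit message. */
static const char sample[] =
	"cpu            done       wait next_ready       next       lazy\n"
	"0                 7         11          0          0          0\n"
	"1                 0          3          2          0          0\n"
	"2                 0          1          8          0          0\n"
	"3                 0          1          1          0          0\n"
	"total             7         16         11          0          0\n";

/*
 * Sum the per-CPU rows of a pending_cbs-style table held in @text.
 * Lines whose first field is not a number (the header and the "total"
 * row) are skipped. Returns the number of CPU rows that were parsed.
 */
static int sum_pending_cbs(const char *text, long sums[NCOLS])
{
	int rows = 0;

	memset(sums, 0, NCOLS * sizeof(sums[0]));
	while (*text) {
		size_t len = strcspn(text, "\n");
		char line[256];
		long v[NCOLS];
		int cpu;

		/* Parse one line at a time so fields never span lines. */
		if (len < sizeof(line)) {
			memcpy(line, text, len);
			line[len] = '\0';
			if (sscanf(line, "%d %ld %ld %ld %ld %ld", &cpu,
				   &v[0], &v[1], &v[2], &v[3], &v[4]) == 6) {
				for (int i = 0; i < NCOLS; i++)
					sums[i] += v[i];
				rows++;
			}
		}
		text += len;
		if (*text == '\n')
			text++;
	}
	return rows;
}

#ifdef PENDING_CBS_DEMO /* build with -DPENDING_CBS_DEMO for a standalone demo */
int main(void)
{
	long s[NCOLS];
	int rows = sum_pending_cbs(sample, s);

	printf("%d cpus: done=%ld wait=%ld next_ready=%ld next=%ld lazy=%ld\n",
	       rows, s[0], s[1], s[2], s[3], s[4]);
	return 0;
}
#endif
```

Because sscanf() stops after the sixth field, the same parser would also accept rows that carry additional trailing columns; it simply ignores them.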
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
kernel/rcu/tree_stall.h | 67 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b67532cb8770..d9fc9bfdaf96 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -71,6 +71,73 @@ late_initcall(kernel_rcu_stall_sysfs_init);
#endif // CONFIG_SYSFS
+#ifdef CONFIG_DEBUG_FS
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+/*
+ * Debugfs interface for displaying per-CPU RCU callback counts broken down
+ * by callback-list segment. This allows monitoring how many callbacks are
+ * waiting for grace periods without any steady-state overhead.
+ */
+static int rcu_pending_cbs_show(struct seq_file *m, void *v)
+{
+ int cpu;
+ long done, wait, nxtrdy, nxt, lazy;
+ long total_done = 0, total_wait = 0, total_nxtrdy = 0;
+ long total_nxt = 0, total_lazy = 0;
+ struct rcu_data *rdp;
+ struct rcu_segcblist *rsclp;
+
+ seq_printf(m, "%-8s %10s %10s %10s %10s %10s\n",
+ "cpu", "done", "wait", "next_ready", "next", "lazy");
+
+ for_each_possible_cpu(cpu) {
+ rdp = per_cpu_ptr(&rcu_data, cpu);
+ rsclp = &rdp->cblist;
+
+ if (!rcu_segcblist_is_enabled(rsclp))
+ continue;
+
+ done = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL);
+ wait = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL);
+ nxtrdy = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL);
+ nxt = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
+ lazy = READ_ONCE(rdp->lazy_len);
+
+ seq_printf(m, "%-8d %10ld %10ld %10ld %10ld %10ld\n",
+ cpu, done, wait, nxtrdy, nxt, lazy);
+
+ total_done += done;
+ total_wait += wait;
+ total_nxtrdy += nxtrdy;
+ total_nxt += nxt;
+ total_lazy += lazy;
+ }
+
+ seq_printf(m, "%-8s %10ld %10ld %10ld %10ld %10ld\n",
+ "total", total_done, total_wait, total_nxtrdy,
+ total_nxt, total_lazy);
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(rcu_pending_cbs);
+
+static struct dentry *rcu_debugfs_dir;
+
+static int __init rcu_debugfs_init(void)
+{
+ rcu_debugfs_dir = debugfs_create_dir("rcu", NULL);
+ debugfs_create_file("pending_cbs", 0444, rcu_debugfs_dir,
+ NULL, &rcu_pending_cbs_fops);
+
+ return 0;
+}
+late_initcall(rcu_debugfs_init);
+
+#endif // #ifdef CONFIG_DEBUG_FS
+
#ifdef CONFIG_PROVE_RCU
#define RCU_STALL_DELAY_DELTA (5 * HZ)
#else
--
2.53.0-Meta
* [PATCH RFC 2/2] rcu: Include kfree_rcu/kvfree_rcu batched counts in pending_cbs
From: Gustavo Luiz Duarte @ 2026-05-07 17:37 UTC (permalink / raw)
To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
Cc: rcu, linux-kernel, linux-mm, Gustavo Luiz Duarte
The batched kfree_rcu()/kvfree_rcu() path (CONFIG_KVFREE_RCU_BATCHED)
manages its own per-CPU queues in struct kfree_rcu_cpu, bypassing the
main RCU segmented callback list. Objects queued through this path
are therefore not visible in the debugfs pending_cbs file.
Add a kfree_rcu_pending() helper that returns the number of objects
waiting in the kfree_rcu batching layer for a given CPU, and include
this count as a "kfree_rcu" column in the debugfs output.
Example output:
cpu            done       wait next_ready       next       lazy  kfree_rcu
0                 0          0          0          5          5         12
1                 0          0          0          3          3          8
total             0          0          0          8          8         20
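
To illustrate where a per-CPU "kfree_rcu" number like the ones above could come from, here is a rough user-space model. It is not part of the patch and it is not kernel code. It assumes that the batching layer keeps one counter for objects on its rcu_head-linked list plus one counter per bulk channel, and that the pending count is their sum. Every name in it (struct model_krc, MODEL_NR_CPUS, MODEL_N_CHANNELS, model_kfree_rcu_pending(), model_kfree_rcu_total()) is a stand-in invented for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS    2 /* hypothetical: two CPUs, as in the example */
#define MODEL_N_CHANNELS 2 /* hypothetical: number of bulk channels */

/* Stand-in for the per-CPU batching state; field names are assumptions. */
struct model_krc {
	atomic_int head_count;                   /* objects on the linked list */
	atomic_int bulk_count[MODEL_N_CHANNELS]; /* objects in bulk pages */
};

static struct model_krc model_krc[MODEL_NR_CPUS];

/* Objects pending on one CPU: list count plus every bulk channel. */
static int model_kfree_rcu_pending(int cpu)
{
	struct model_krc *krcp = &model_krc[cpu];
	int sum = atomic_load(&krcp->head_count);

	for (int i = 0; i < MODEL_N_CHANNELS; i++)
		sum += atomic_load(&krcp->bulk_count[i]);
	return sum;
}

/* The "total" row: the per-CPU values summed across all modelled CPUs. */
static int model_kfree_rcu_total(void)
{
	int total = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		total += model_kfree_rcu_pending(cpu);
	return total;
}
```

The counters are read one at a time without a lock, so the sum is only a snapshot and can be slightly stale while objects are being queued. That is usually acceptable for a monitoring file.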
Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
kernel/rcu/rcu.h | 1 +
kernel/rcu/tree_stall.h | 17 +++++++++++------
mm/slab_common.c | 18 ++++++++++++++++++
3 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index fa6d30ce73d1..a28c3c7dc4da 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -652,6 +652,7 @@ void rcu_fwd_progress_check(unsigned long j);
void rcu_force_quiescent_state(void);
extern struct workqueue_struct *rcu_gp_wq;
extern struct kthread_worker *rcu_exp_gp_kworker;
+int kfree_rcu_pending(int cpu);
void rcu_gp_slow_register(atomic_t *rgssp);
void rcu_gp_slow_unregister(atomic_t *rgssp);
#endif /* #else #ifdef CONFIG_TINY_RCU */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index d9fc9bfdaf96..5fd63730d5f5 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -84,14 +84,17 @@ late_initcall(kernel_rcu_stall_sysfs_init);
static int rcu_pending_cbs_show(struct seq_file *m, void *v)
{
int cpu;
+ int kfree;
long done, wait, nxtrdy, nxt, lazy;
long total_done = 0, total_wait = 0, total_nxtrdy = 0;
long total_nxt = 0, total_lazy = 0;
+ int total_kfree = 0;
struct rcu_data *rdp;
struct rcu_segcblist *rsclp;
- seq_printf(m, "%-8s %10s %10s %10s %10s %10s\n",
- "cpu", "done", "wait", "next_ready", "next", "lazy");
+ seq_printf(m, "%-8s %10s %10s %10s %10s %10s %10s\n",
+ "cpu", "done", "wait", "next_ready", "next", "lazy",
+ "kfree_rcu");
for_each_possible_cpu(cpu) {
rdp = per_cpu_ptr(&rcu_data, cpu);
@@ -105,20 +108,22 @@ static int rcu_pending_cbs_show(struct seq_file *m, void *v)
nxtrdy = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL);
nxt = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
lazy = READ_ONCE(rdp->lazy_len);
+ kfree = kfree_rcu_pending(cpu);
- seq_printf(m, "%-8d %10ld %10ld %10ld %10ld %10ld\n",
- cpu, done, wait, nxtrdy, nxt, lazy);
+ seq_printf(m, "%-8d %10ld %10ld %10ld %10ld %10ld %10d\n",
+ cpu, done, wait, nxtrdy, nxt, lazy, kfree);
total_done += done;
total_wait += wait;
total_nxtrdy += nxtrdy;
total_nxt += nxt;
total_lazy += lazy;
+ total_kfree += kfree;
}
- seq_printf(m, "%-8s %10ld %10ld %10ld %10ld %10ld\n",
+ seq_printf(m, "%-8s %10ld %10ld %10ld %10ld %10ld %10d\n",
"total", total_done, total_wait, total_nxtrdy,
- total_nxt, total_lazy);
+ total_nxt, total_lazy, total_kfree);
return 0;
}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d5a70a831a2a..93b5d64399f2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1280,6 +1280,11 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
}
EXPORT_SYMBOL_GPL(kvfree_call_rcu);
+int kfree_rcu_pending(int cpu)
+{
+ return 0;
+}
+
void __init kvfree_rcu_init(void)
{
}
@@ -2216,4 +2221,17 @@ void __init kvfree_rcu_init(void)
shrinker_register(kfree_rcu_shrinker);
}
+/**
+ * kfree_rcu_pending() - Return number of objects pending in kfree_rcu batches.
+ * @cpu: CPU number to query.
+ *
+ * Returns the number of objects queued in kfree_rcu()/kvfree_rcu() batches
+ * on @cpu that are waiting for a grace period. These objects are tracked
+ * separately from the main RCU callback list.
+ */
+int kfree_rcu_pending(int cpu)
+{
+ return krc_count(per_cpu_ptr(&krc, cpu));
+}
+
#endif /* CONFIG_KVFREE_RCU_BATCHED */
--
2.53.0-Meta
* Re: [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring
From: Joel Fernandes @ 2026-05-07 18:59 UTC (permalink / raw)
To: Gustavo Luiz Duarte, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
Cc: rcu, linux-kernel, linux-mm
On 5/7/2026 1:37 PM, Gustavo Luiz Duarte wrote:
> There is currently no easy way to monitor how many RCU callbacks are
> pending system-wide. The existing trace points provide per-event data
> but require active tracing, which makes them awkward for fleet-wide
> monitoring. Knowing the depth and stage of pending callbacks helps
> admins reason about RCU health, gives an indirect signal of memory
> held back by RCU, and is useful when tuning RCU parameters.
>
> This series adds a debugfs file at:
>
> /sys/kernel/debug/rcu/pending_cbs
>
> that reports per-CPU pending callback counts with a "total" row.
>
> Patch 1 introduces the file with per-CPU columns for each segcblist
> segment (done, wait, next_ready, next) plus a "lazy" column.
>
> Patch 2 extends the file with a "kfree_rcu" column reporting objects
> queued in the batched kfree_rcu()/kvfree_rcu() path
> (CONFIG_KVFREE_RCU_BATCHED), which has its own per-CPU queues outside
> the main segmented callback list.
>
> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
You actually don't need debugfs for this. You can just use bpftrace and
instrument trace_rcu_ (with other RCU tracing Kconfig options enabled?). I had
something like that working some time ago.
Generally RCU doesn't add userspace interfaces randomly like that. I remember
Paul ripped similar things out some time ago.