The Linux Kernel Mailing List
* [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring
@ 2026-05-07 17:37 Gustavo Luiz Duarte
  2026-05-07 17:37 ` [PATCH RFC 1/2] rcu: Expose per-CPU segmented callback counts via debugfs Gustavo Luiz Duarte
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Gustavo Luiz Duarte @ 2026-05-07 17:37 UTC (permalink / raw)
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
  Cc: rcu, linux-kernel, linux-mm, Gustavo Luiz Duarte

There is currently no easy way to monitor how many RCU callbacks are
pending system-wide. The existing trace points provide per-event data
but require active tracing, which makes them awkward for fleet-wide
monitoring. Knowing the depth and stage of pending callbacks helps
admins reason about RCU health, gives an indirect signal of memory
held back by RCU, and is useful when tuning RCU parameters.

This series adds a debugfs file at:

  /sys/kernel/debug/rcu/pending_cbs

that reports per-CPU pending callback counts with a "total" row.

Patch 1 introduces the file with per-CPU columns for each segcblist
segment (done, wait, next_ready, next) plus a "lazy" column.

Patch 2 extends the file with a "kfree_rcu" column reporting objects
queued in the batched kfree_rcu()/kvfree_rcu() path
(CONFIG_KVFREE_RCU_BATCHED), which has its own per-CPU queues outside
the main segmented callback list.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
Gustavo Luiz Duarte (2):
      rcu: Expose per-CPU segmented callback counts via debugfs
      rcu: Include kfree_rcu/kvfree_rcu batched counts in pending_cbs

 kernel/rcu/rcu.h        |  1 +
 kernel/rcu/tree_stall.h | 72 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slab_common.c        | 18 +++++++++++++
 3 files changed, 91 insertions(+)
---
base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
change-id: 20260318-rcu-pending-cbs-stats-f72f5ca03415

Best regards,
-- 
Gustavo Luiz Duarte <gustavold@gmail.com>



* [PATCH RFC 1/2] rcu: Expose per-CPU segmented callback counts via debugfs
  2026-05-07 17:37 [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Gustavo Luiz Duarte
@ 2026-05-07 17:37 ` Gustavo Luiz Duarte
  2026-05-07 17:37 ` [PATCH RFC 2/2] rcu: Include kfree_rcu/kvfree_rcu batched counts in pending_cbs Gustavo Luiz Duarte
  2026-05-07 18:59 ` [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Joel Fernandes
  2 siblings, 0 replies; 5+ messages in thread
From: Gustavo Luiz Duarte @ 2026-05-07 17:37 UTC (permalink / raw)
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
  Cc: rcu, linux-kernel, linux-mm, Gustavo Luiz Duarte

The existing rcu_segcb_stats tracepoint requires active tracing, so
there is no always-on, low-overhead way to inspect how many callbacks
are pending on each CPU and at which stage of the grace-period
pipeline.

Add a debugfs file at /sys/kernel/debug/rcu/pending_cbs
that prints per-CPU callback counts broken down by segcblist segment,
plus a "total" row aggregating across CPUs:

  - done:       Callbacks ready to invoke (GP completed).
  - wait:       Callbacks waiting for the current GP.
  - next_ready: Callbacks to be handled by the next GP.
  - next:       Newly queued callbacks not yet assigned a GP.
  - lazy:       Callbacks deferred via the RCU lazy mechanism.

The interface has zero steady-state overhead: it reads the existing
per-CPU rcu_segcblist.seglen[] counters on demand. These counters are
already maintained by the RCU callback infrastructure for its own
bookkeeping, so no new runtime accounting is introduced.

Example output:
  cpu            done       wait next_ready       next       lazy
  0                 7         11          0          0          0
  1                 0          3          2          0          0
  2                 0          1          8          0          0
  3                 0          1          1          0          0
  total             7         16         11          0          0
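
For readers who want to consume this table from userspace, here is a minimal
sketch of a row parser in C. It is not part of the patch. The names
struct pending_row and parse_pending_row() are hypothetical, and the layout is
assumed to be exactly the whitespace-separated columns shown above, with an
optional trailing "kfree_rcu" column.

```c
#include <stdio.h>

/* One row of the pending_cbs table; field names mirror the column headers. */
struct pending_row {
	char label[16];		/* CPU number as text, or "total" */
	long done, wait, next_ready, next, lazy;
	long kfree_rcu;		/* -1 when the column is absent */
};

/*
 * Parse one data row.  Returns 1 on success and 0 for the header row,
 * an empty line, or any line with fewer than five numeric columns.
 */
static int parse_pending_row(const char *line, struct pending_row *row)
{
	int n;

	row->kfree_rcu = -1;
	n = sscanf(line, "%15s %ld %ld %ld %ld %ld %ld", row->label,
		   &row->done, &row->wait, &row->next_ready,
		   &row->next, &row->lazy, &row->kfree_rcu);

	/* Label plus five counters are mandatory; the seventh field is optional. */
	return n >= 6;
}
```

A caller would typically read /sys/kernel/debug/rcu/pending_cbs line by line
with fgets(), skip rows for which the parser returns 0, and pick out the row
whose label is "total".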

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
 kernel/rcu/tree_stall.h | 67 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b67532cb8770..d9fc9bfdaf96 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -71,6 +71,73 @@ late_initcall(kernel_rcu_stall_sysfs_init);
 
 #endif // CONFIG_SYSFS
 
+#ifdef CONFIG_DEBUG_FS
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+/*
+ * Debugfs interface for displaying per-CPU RCU callback counts broken down
+ * by callback-list segment.  This allows monitoring how many callbacks are
+ * waiting for grace periods without any steady-state overhead.
+ */
+static int rcu_pending_cbs_show(struct seq_file *m, void *v)
+{
+	int cpu;
+	long done, wait, nxtrdy, nxt, lazy;
+	long total_done = 0, total_wait = 0, total_nxtrdy = 0;
+	long total_nxt = 0, total_lazy = 0;
+	struct rcu_data *rdp;
+	struct rcu_segcblist *rsclp;
+
+	seq_printf(m, "%-8s %10s %10s %10s %10s %10s\n",
+		   "cpu", "done", "wait", "next_ready", "next", "lazy");
+
+	for_each_possible_cpu(cpu) {
+		rdp = per_cpu_ptr(&rcu_data, cpu);
+		rsclp = &rdp->cblist;
+
+		if (!rcu_segcblist_is_enabled(rsclp))
+			continue;
+
+		done   = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL);
+		wait   = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL);
+		nxtrdy = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL);
+		nxt    = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
+		lazy   = READ_ONCE(rdp->lazy_len);
+
+		seq_printf(m, "%-8d %10ld %10ld %10ld %10ld %10ld\n",
+			   cpu, done, wait, nxtrdy, nxt, lazy);
+
+		total_done   += done;
+		total_wait   += wait;
+		total_nxtrdy += nxtrdy;
+		total_nxt    += nxt;
+		total_lazy   += lazy;
+	}
+
+	seq_printf(m, "%-8s %10ld %10ld %10ld %10ld %10ld\n",
+		   "total", total_done, total_wait, total_nxtrdy,
+		   total_nxt, total_lazy);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(rcu_pending_cbs);
+
+static struct dentry *rcu_debugfs_dir;
+
+static int __init rcu_debugfs_init(void)
+{
+	rcu_debugfs_dir = debugfs_create_dir("rcu", NULL);
+	debugfs_create_file("pending_cbs", 0444, rcu_debugfs_dir,
+			    NULL, &rcu_pending_cbs_fops);
+
+	return 0;
+}
+late_initcall(rcu_debugfs_init);
+
+#endif // #ifdef CONFIG_DEBUG_FS
+
 #ifdef CONFIG_PROVE_RCU
 #define RCU_STALL_DELAY_DELTA		(5 * HZ)
 #else

-- 
2.53.0-Meta



* [PATCH RFC 2/2] rcu: Include kfree_rcu/kvfree_rcu batched counts in pending_cbs
  2026-05-07 17:37 [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Gustavo Luiz Duarte
  2026-05-07 17:37 ` [PATCH RFC 1/2] rcu: Expose per-CPU segmented callback counts via debugfs Gustavo Luiz Duarte
@ 2026-05-07 17:37 ` Gustavo Luiz Duarte
  2026-05-07 18:59 ` [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Joel Fernandes
  2 siblings, 0 replies; 5+ messages in thread
From: Gustavo Luiz Duarte @ 2026-05-07 17:37 UTC (permalink / raw)
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
  Cc: rcu, linux-kernel, linux-mm, Gustavo Luiz Duarte

The batched kfree_rcu()/kvfree_rcu() path (CONFIG_KVFREE_RCU_BATCHED)
manages its own per-CPU queues in struct kfree_rcu_cpu, bypassing the
main RCU segmented callback list.  Objects queued through this path
were not visible in the debugfs pending_cbs file.

Add a kfree_rcu_pending() helper that returns the number of objects
waiting in the kfree_rcu batching layer for a given CPU, and include
this count as a "kfree_rcu" column in the debugfs output.

Example output:

  cpu            done       wait next_ready       next       lazy  kfree_rcu
  0                 0          0          0          5          5         12
  1                 0          0          0          3          3          8
  total             0          0          0          8          8         20

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
 kernel/rcu/rcu.h        |  1 +
 kernel/rcu/tree_stall.h | 17 +++++++++++------
 mm/slab_common.c        | 18 ++++++++++++++++++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index fa6d30ce73d1..a28c3c7dc4da 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -652,6 +652,7 @@ void rcu_fwd_progress_check(unsigned long j);
 void rcu_force_quiescent_state(void);
 extern struct workqueue_struct *rcu_gp_wq;
 extern struct kthread_worker *rcu_exp_gp_kworker;
+int kfree_rcu_pending(int cpu);
 void rcu_gp_slow_register(atomic_t *rgssp);
 void rcu_gp_slow_unregister(atomic_t *rgssp);
 #endif /* #else #ifdef CONFIG_TINY_RCU */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index d9fc9bfdaf96..5fd63730d5f5 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -84,14 +84,17 @@ late_initcall(kernel_rcu_stall_sysfs_init);
 static int rcu_pending_cbs_show(struct seq_file *m, void *v)
 {
 	int cpu;
+	int kfree;
 	long done, wait, nxtrdy, nxt, lazy;
 	long total_done = 0, total_wait = 0, total_nxtrdy = 0;
 	long total_nxt = 0, total_lazy = 0;
+	int total_kfree = 0;
 	struct rcu_data *rdp;
 	struct rcu_segcblist *rsclp;
 
-	seq_printf(m, "%-8s %10s %10s %10s %10s %10s\n",
-		   "cpu", "done", "wait", "next_ready", "next", "lazy");
+	seq_printf(m, "%-8s %10s %10s %10s %10s %10s %10s\n",
+		   "cpu", "done", "wait", "next_ready", "next", "lazy",
+		   "kfree_rcu");
 
 	for_each_possible_cpu(cpu) {
 		rdp = per_cpu_ptr(&rcu_data, cpu);
@@ -105,20 +108,22 @@ static int rcu_pending_cbs_show(struct seq_file *m, void *v)
 		nxtrdy = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL);
 		nxt    = rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
 		lazy   = READ_ONCE(rdp->lazy_len);
+		kfree  = kfree_rcu_pending(cpu);
 
-		seq_printf(m, "%-8d %10ld %10ld %10ld %10ld %10ld\n",
-			   cpu, done, wait, nxtrdy, nxt, lazy);
+		seq_printf(m, "%-8d %10ld %10ld %10ld %10ld %10ld %10d\n",
+			   cpu, done, wait, nxtrdy, nxt, lazy, kfree);
 
 		total_done   += done;
 		total_wait   += wait;
 		total_nxtrdy += nxtrdy;
 		total_nxt    += nxt;
 		total_lazy   += lazy;
+		total_kfree  += kfree;
 	}
 
-	seq_printf(m, "%-8s %10ld %10ld %10ld %10ld %10ld\n",
+	seq_printf(m, "%-8s %10ld %10ld %10ld %10ld %10ld %10d\n",
 		   "total", total_done, total_wait, total_nxtrdy,
-		   total_nxt, total_lazy);
+		   total_nxt, total_lazy, total_kfree);
 
 	return 0;
 }
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d5a70a831a2a..93b5d64399f2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1280,6 +1280,11 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
+int kfree_rcu_pending(int cpu)
+{
+	return 0;
+}
+
 void __init kvfree_rcu_init(void)
 {
 }
@@ -2216,4 +2221,17 @@ void __init kvfree_rcu_init(void)
 	shrinker_register(kfree_rcu_shrinker);
 }
 
+/**
+ * kfree_rcu_pending() - Return number of objects pending in kfree_rcu batches.
+ * @cpu: CPU number to query.
+ *
+ * Returns the number of objects queued in kfree_rcu()/kvfree_rcu() batches
+ * on @cpu that are waiting for a grace period.  These objects are tracked
+ * separately from the main RCU callback list.
+ */
+int kfree_rcu_pending(int cpu)
+{
+	return krc_count(per_cpu_ptr(&krc, cpu));
+}
+
 #endif /* CONFIG_KVFREE_RCU_BATCHED */

-- 
2.53.0-Meta



* Re: [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring
  2026-05-07 17:37 [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Gustavo Luiz Duarte
  2026-05-07 17:37 ` [PATCH RFC 1/2] rcu: Expose per-CPU segmented callback counts via debugfs Gustavo Luiz Duarte
  2026-05-07 17:37 ` [PATCH RFC 2/2] rcu: Include kfree_rcu/kvfree_rcu batched counts in pending_cbs Gustavo Luiz Duarte
@ 2026-05-07 18:59 ` Joel Fernandes
  2026-05-11 17:08   ` Gustavo Luiz Duarte
  2 siblings, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2026-05-07 18:59 UTC (permalink / raw)
  To: Gustavo Luiz Duarte, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Vlastimil Babka, Harry Yoo, Andrew Morton, Hao Li,
	Christoph Lameter, David Rientjes, Roman Gushchin, Breno Leitao
  Cc: rcu, linux-kernel, linux-mm



On 5/7/2026 1:37 PM, Gustavo Luiz Duarte wrote:
> There is currently no easy way to monitor how many RCU callbacks are
> pending system-wide. The existing trace points provide per-event data
> but require active tracing, which makes them awkward for fleet-wide
> monitoring. Knowing the depth and stage of pending callbacks helps
> admins reason about RCU health, gives an indirect signal of memory
> held back by RCU, and is useful when tuning RCU parameters.
> 
> This series adds a debugfs file at:
> 
>   /sys/kernel/debug/rcu/pending_cbs
> 
> that reports per-CPU pending callback counts with a "total" row.
> 
> Patch 1 introduces the file with per-CPU columns for each segcblist
> segment (done, wait, next_ready, next) plus a "lazy" column.
> 
> Patch 2 extends the file with a "kfree_rcu" column reporting objects
> queued in the batched kfree_rcu()/kvfree_rcu() path
> (CONFIG_KVFREE_RCU_BATCHED), which has its own per-CPU queues outside
> the main segmented callback list.
> 
> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>

You actually don't need debugfs for this. You can just use bpftrace and
instrument trace_rcu_ (with other RCU tracing Kconfig options enabled?). I had
something like that working some time ago.

Generally RCU doesn't add userspace interfaces randomly like that. I remember
Paul ripped similar things out some time ago.



* Re: [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring
  2026-05-07 18:59 ` [PATCH RFC 0/2] rcu: Add debugfs interface for pending callback monitoring Joel Fernandes
@ 2026-05-11 17:08   ` Gustavo Luiz Duarte
  0 siblings, 0 replies; 5+ messages in thread
From: Gustavo Luiz Duarte @ 2026-05-11 17:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Vlastimil Babka,
	Harry Yoo, Andrew Morton, Hao Li, Christoph Lameter,
	David Rientjes, Roman Gushchin, Breno Leitao, rcu, linux-kernel,
	linux-mm

Hi Joel,

On Thu, May 7, 2026 at 7:59 PM Joel Fernandes <joelagnelf@nvidia.com> wrote:
>
>
>
> On 5/7/2026 1:37 PM, Gustavo Luiz Duarte wrote:
> > There is currently no easy way to monitor how many RCU callbacks are
> > pending system-wide. The existing trace points provide per-event data
> > but require active tracing, which makes them awkward for fleet-wide
> > monitoring. Knowing the depth and stage of pending callbacks helps
> > admins reason about RCU health, gives an indirect signal of memory
> > held back by RCU, and is useful when tuning RCU parameters.
> >
> > This series adds a debugfs file at:
> >
> >   /sys/kernel/debug/rcu/pending_cbs
> >
> > that reports per-CPU pending callback counts with a "total" row.
> >
> > Patch 1 introduces the file with per-CPU columns for each segcblist
> > segment (done, wait, next_ready, next) plus a "lazy" column.
> >
> > Patch 2 extends the file with a "kfree_rcu" column reporting objects
> > queued in the batched kfree_rcu()/kvfree_rcu() path
> > (CONFIG_KVFREE_RCU_BATCHED), which has its own per-CPU queues outside
> > the main segmented callback list.
> >
> > Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
>
> You actually don't need debugfs for this. You can just use bpftrace and
> instrument trace_rcu_ (with other RCU tracing Kconfig options enabled?). I had
> something like that working some time ago.

My initial attempt with tracepoints probed trace_rcu_segcb_stats, but
that adds significant overhead to every callback enqueue/dequeue event,
which is too expensive for a production environment. I played a bit
more with bpftrace and got this working with an interval probe plus
some __per_cpu_offset pointer arithmetic (see below). It is not the
most maintainable code and has some race issues, but it is probably
acceptable for us if you believe that having this information easily
available doesn't add value for other use cases.

If anyone is interested, here is what I came up with:

interval:s:5 {
    printf("===== %s =====\n", strftime("%H:%M:%S", nsecs));
    // Base addresses of the per-CPU variables and the per-CPU offset table.
    $rdp_base = kaddr("rcu_data");
    $krc_base = kaddr("krc");
    $offsets  = (uint64 *)kaddr("__per_cpu_offset");

    for ($cpu : 0..ncpus) {
        // Each CPU's copy lives at base + __per_cpu_offset[cpu].
        $rdp  = (struct rcu_data *)($rdp_base + $offsets[$cpu]);
        $krcp = (struct kfree_rcu_cpu *)($krc_base + $offsets[$cpu]);

        // Objects queued in the kfree_rcu batching layer on this CPU.
        $kfree = $krcp->head_count.counter
               + $krcp->bulk_count[0].counter
               + $krcp->bulk_count[1].counter;

        // seglen[0..3] are the done, wait, next_ready and next segments.
        printf("cpu: %d done: %ld wait: %ld nr: %ld next: %ld lazy: %ld kfree: %d\n",
               $cpu,
               $rdp->cblist.seglen[0],
               $rdp->cblist.seglen[1],
               $rdp->cblist.seglen[2],
               $rdp->cblist.seglen[3],
               $rdp->lazy_len,
               $kfree);
    }
}
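
To make the pointer arithmetic in the script easier to follow, here is a
small userspace model of the same two steps in C. It is only a sketch:
struct krc_model, per_cpu_addr() and krc_model_count() are hypothetical names
invented for illustration, plain ints stand in for the kernel's atomic
counters, and nothing here touches real kernel memory.

```c
#include <stdint.h>

/* Simplified stand-in for the three counters the script adds up. */
struct krc_model {
	int head_count;
	int bulk_count[2];
};

/*
 * Model of the script's "$base + $offsets[$cpu]" step: a per-CPU
 * variable has one base address, and each CPU's copy sits at that
 * address plus the CPU's entry in the offset table.
 */
static void *per_cpu_addr(void *base, const uintptr_t *offsets, int cpu)
{
	return (void *)((uintptr_t)base + offsets[cpu]);
}

/* Sum of queued objects, matching the script's $kfree expression. */
static int krc_model_count(const struct krc_model *k)
{
	return k->head_count + k->bulk_count[0] + k->bulk_count[1];
}
```

In the model, an array of struct krc_model plays the role of the per-CPU
area, and the offset table holds each element's byte offset from the start of
the array. The script does the same thing against the kernel's
__per_cpu_offset table, without any locking, which is where the race issues
mentioned above come from.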

>
> Generally RCU doesn't add userspace interfaces randomly like that. I remember
> Paul ripped similar things out some time ago.

Debugfs is intentionally not a stable ABI, so the bar for adding
things useful for debugging and tuning seems lower than /proc or /sys
-- which is why I went with debugfs here.

