* [RFC PATCH net-next] page_pool: Add page_pool_release_stalled tracepoint
@ 2025-11-25 8:22 Leon Hwang
From: Leon Hwang @ 2025-11-25 8:22 UTC
To: netdev
Cc: hawk, ilias.apalodimas, rostedt, mhiramat, mathieu.desnoyers,
davem, edumazet, kuba, pabeni, horms, kerneljasonxing, lance.yang,
jiayuan.chen, linux-kernel, linux-trace-kernel, Leon Hwang,
Leon Huang Fu
Introduce a new tracepoint to track stalled page pool releases,
providing better observability for page pool lifecycle issues.
Problem:
Currently, when a page pool shutdown is stalled due to inflight pages,
the kernel only logs a warning message via pr_warn(). This has several
limitations:
1. After the initial DEFER_WARN_INTERVAL, the warning repeats every
DEFER_WARN_INTERVAL for as long as the release stays stalled, flooding
the kernel log and making it difficult to track the progression of
stalled releases
2. There's no structured way to monitor or analyze these events
3. Debugging tools cannot easily capture and correlate stalled pool
events with other network activity
Solution:
Add a new tracepoint, page_pool_release_stalled, that fires when a page
pool shutdown is stalled. The tracepoint captures:
- pool: pointer to the stalled page_pool
- inflight: number of pages still in flight
- sec: seconds since the release was deferred
The implementation also modifies the logging behavior:
- pr_warn() is only emitted while the stall age is between
DEFER_WARN_INTERVAL and 2 * DEFER_WARN_INTERVAL, i.e. at the first
periodic warning
- The tracepoint fires at every periodic warning, so monitoring tools can
still track the issue while the kernel log stays quiet
This allows developers and system administrators to:
- Use tools like perf, ftrace, or eBPF to monitor stalled releases
- Correlate page pool issues with network driver behavior
- Analyze patterns without parsing kernel logs
- Track the progression of inflight page counts over time
Signed-off-by: Leon Huang Fu <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/trace/events/page_pool.h | 22 ++++++++++++++++++++++
net/core/page_pool.c | 6 ++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 31825ed30032..c4205af9bf93 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -113,6 +113,28 @@ TRACE_EVENT(page_pool_update_nid,
 			  __entry->pool, __entry->pool_nid, __entry->new_nid)
 );
 
+TRACE_EVENT(page_pool_release_stalled,
+
+	TP_PROTO(const struct page_pool *pool, int inflight, int sec),
+
+	TP_ARGS(pool, inflight, sec),
+
+	TP_STRUCT__entry(
+		__field(const struct page_pool *, pool)
+		__field(int, inflight)
+		__field(int, sec)
+	),
+
+	TP_fast_assign(
+		__entry->pool = pool;
+		__entry->inflight = inflight;
+		__entry->sec = sec;
+	),
+
+	TP_printk("page_pool=%p id=%d inflight=%d sec=%d",
+		  __entry->pool, __entry->pool->user.id, __entry->inflight, __entry->sec)
+);
+
 #endif /* _TRACE_PAGE_POOL_H */
 
 /* This part must be outside protection */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a5edec485f1..9fd86749c705 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1218,8 +1218,10 @@ static void page_pool_release_retry(struct work_struct *wq)
 	    (!netdev || netdev == NET_PTR_POISON)) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
 
-		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
-			__func__, pool->user.id, inflight, sec);
+		if (sec >= DEFER_WARN_INTERVAL / HZ && sec < DEFER_WARN_INTERVAL * 2 / HZ)
+			pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
+				__func__, pool->user.id, inflight, sec);
+		trace_page_pool_release_stalled(pool, inflight, sec);
 		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
 	}
 
--
2.52.0
* Re: [RFC PATCH net-next] page_pool: Add page_pool_release_stalled tracepoint
From: Steven Rostedt @ 2025-11-25 16:23 UTC
To: Leon Hwang
Cc: netdev, hawk, ilias.apalodimas, mhiramat, mathieu.desnoyers,
davem, edumazet, kuba, pabeni, horms, kerneljasonxing, lance.yang,
jiayuan.chen, linux-kernel, linux-trace-kernel, Leon Huang Fu
On Tue, 25 Nov 2025 16:22:07 +0800
Leon Hwang <leon.hwang@linux.dev> wrote:
> +TRACE_EVENT(page_pool_release_stalled,
> +
> + TP_PROTO(const struct page_pool *pool, int inflight, int sec),
> +
> + TP_ARGS(pool, inflight, sec),
> +
> + TP_STRUCT__entry(
> + __field(const struct page_pool *, pool)
> + __field(int, inflight)
> + __field(int, sec)
> + ),
> +
> + TP_fast_assign(
> + __entry->pool = pool;
> + __entry->inflight = inflight;
> + __entry->sec = sec;
> + ),
> +
> + TP_printk("page_pool=%p id=%d inflight=%d sec=%d",
> + __entry->pool, __entry->pool->user.id, __entry->inflight, __entry->sec)
You can't do: __entry->pool->user.id
The TP_fast_assign() is executed when the tracepoint is triggered. The
TP_printk() is executed when the trace is read. That can happen seconds,
minutes, hours, days, even months after the pool was assigned.
That __entry->pool can very well be freed a long time ago.
If you need the id, you need to record it in the TP_fast_assign():
__entry->id = pool->user.id
and print that.
-- Steve
> +);
> +
* Re: [RFC PATCH net-next] page_pool: Add page_pool_release_stalled tracepoint
From: Leon Hwang @ 2025-11-26 2:16 UTC
To: Steven Rostedt
Cc: netdev, hawk, ilias.apalodimas, mhiramat, mathieu.desnoyers,
davem, edumazet, kuba, pabeni, horms, kerneljasonxing, lance.yang,
jiayuan.chen, linux-kernel, linux-trace-kernel, Leon Huang Fu
On 26/11/25 00:23, Steven Rostedt wrote:
> On Tue, 25 Nov 2025 16:22:07 +0800
> Leon Hwang <leon.hwang@linux.dev> wrote:
>
>> +TRACE_EVENT(page_pool_release_stalled,
>> +
>> + TP_PROTO(const struct page_pool *pool, int inflight, int sec),
>> +
>> + TP_ARGS(pool, inflight, sec),
>> +
>> + TP_STRUCT__entry(
>> + __field(const struct page_pool *, pool)
>> + __field(int, inflight)
>> + __field(int, sec)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->pool = pool;
>> + __entry->inflight = inflight;
>> + __entry->sec = sec;
>> + ),
>> +
>> + TP_printk("page_pool=%p id=%d inflight=%d sec=%d",
>> + __entry->pool, __entry->pool->user.id, __entry->inflight, __entry->sec)
>
> You can't do: __entry->pool->user.id
>
> The TP_fast_assign() is executed when the tracepoint is triggered. The
> TP_printk() is executed when the trace is read. That can happen seconds,
> minutes, hours, days, even months after the pool was assigned.
>
> That __entry->pool can very well be freed a long time ago.
>
> If you need the id, you need to record it in the TP_fast_assign():
>
> __entry->id = pool->user.id
>
> and print that.
>
> -- Steve
>
Hi Steve,
Thanks for the review.
Yes, the id is needed here (similar to what we emit in pr_warn()), so
I'll follow your suggestion and record it explicitly in TP_fast_assign():
__entry->id = pool->user.id;
And then update the print format to:
TP_printk("page_pool=%p id=%d inflight=%d sec=%d",
__entry->pool, __entry->id,
__entry->inflight, __entry->sec)
Thanks,
Leon
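Putting Steve's suggestion together, the revised event also needs a new field in TP_STRUCT__entry, not just the assignment. This is a sketch only, not a posted revision, assuming pool->user.id is a u32 (matching the %u used by the existing pr_warn(), hence %u rather than %d below). It is a fragment for include/trace/events/page_pool.h, not a standalone program.

```c
TRACE_EVENT(page_pool_release_stalled,

	TP_PROTO(const struct page_pool *pool, int inflight, int sec),

	TP_ARGS(pool, inflight, sec),

	TP_STRUCT__entry(
		__field(const struct page_pool *, pool)
		__field(u32, id)
		__field(int, inflight)
		__field(int, sec)
	),

	TP_fast_assign(
		__entry->pool = pool;
		/* Copy the id now: the pool may be freed before the trace is read. */
		__entry->id = pool->user.id;
		__entry->inflight = inflight;
		__entry->sec = sec;
	),

	/* Only values stored in __entry are used here; pool is printed, not dereferenced. */
	TP_printk("page_pool=%p id=%u inflight=%d sec=%d",
		  __entry->pool, __entry->id, __entry->inflight, __entry->sec)
);
```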