netdev.vger.kernel.org archive mirror
* [PATCH net-next v3] page_pool: Add page_pool_release_stalled tracepoint
@ 2026-01-02  7:17 Leon Hwang
  2026-01-02 11:43 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 3+ messages in thread
From: Leon Hwang @ 2026-01-02  7:17 UTC
  To: netdev
  Cc: Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	kerneljasonxing, lance.yang, jiayuan.chen, linux-kernel,
	linux-trace-kernel, Leon Hwang, Leon Huang Fu

Introduce a new tracepoint to track stalled page pool releases,
providing better observability for page pool lifecycle issues.

Problem:
Currently, when a page pool shutdown is stalled due to inflight pages,
the kernel only logs a warning message via pr_warn(). This has several
limitations:

1. The warning floods the kernel log after the initial DEFER_WARN_INTERVAL,
   making it difficult to track the progression of stalled releases
2. There's no structured way to monitor or analyze these events
3. Debugging tools cannot easily capture and correlate stalled pool
   events with other network activity

Solution:
Add a new tracepoint, page_pool_release_stalled, that fires when a page
pool shutdown is stalled. The tracepoint captures:
- pool: pointer to the stalled page_pool
- inflight: number of pages still in flight
- sec: seconds since the release was deferred

The implementation also modifies the logging behavior:
- pr_warn() is only emitted during the first warning interval
  (DEFER_WARN_INTERVAL to DEFER_WARN_INTERVAL*2)
- The tracepoint is fired always, reducing log noise while still
  allowing monitoring tools to track the issue

This allows developers and system administrators to:
- Use tools like perf, ftrace, or eBPF to monitor stalled releases
- Correlate page pool issues with network driver behavior
- Analyze patterns without parsing kernel logs
- Track the progression of inflight page counts over time

Signed-off-by: Leon Huang Fu <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
v2 -> v3:
 - Print id using '%u'.
 - https://lore.kernel.org/netdev/20260102061718.210248-1-leon.hwang@linux.dev/

v1 -> v2:
 - Drop RFC.
 - Store 'pool->user.id' to '__entry->id' (per Steven Rostedt).
 - https://lore.kernel.org/netdev/20251125082207.356075-1-leon.hwang@linux.dev/
---
 include/trace/events/page_pool.h | 24 ++++++++++++++++++++++++
 net/core/page_pool.c             |  6 ++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 31825ed30032..a851e0f6a384 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -113,6 +113,30 @@ TRACE_EVENT(page_pool_update_nid,
 		  __entry->pool, __entry->pool_nid, __entry->new_nid)
 );
 
+TRACE_EVENT(page_pool_release_stalled,
+
+	TP_PROTO(const struct page_pool *pool, int inflight, int sec),
+
+	TP_ARGS(pool, inflight, sec),
+
+	TP_STRUCT__entry(
+		__field(const struct page_pool *, pool)
+		__field(u32,			  id)
+		__field(int,			  inflight)
+		__field(int,			  sec)
+	),
+
+	TP_fast_assign(
+		__entry->pool		= pool;
+		__entry->id		= pool->user.id;
+		__entry->inflight	= inflight;
+		__entry->sec		= sec;
+	),
+
+	TP_printk("page_pool=%p id=%u inflight=%d sec=%d",
+		  __entry->pool, __entry->id, __entry->inflight, __entry->sec)
+);
+
 #endif /* _TRACE_PAGE_POOL_H */
 
 /* This part must be outside protection */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 265a729431bb..01564aa84c89 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1222,8 +1222,10 @@ static void page_pool_release_retry(struct work_struct *wq)
 	    (!netdev || netdev == NET_PTR_POISON)) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
 
-		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
-			__func__, pool->user.id, inflight, sec);
+		if (sec >= DEFER_WARN_INTERVAL / HZ && sec < DEFER_WARN_INTERVAL * 2 / HZ)
+			pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
+				__func__, pool->user.id, inflight, sec);
+		trace_page_pool_release_stalled(pool, inflight, sec);
 		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
 	}
 
-- 
2.52.0

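The pr_warn() gating in the net/core/page_pool.c hunk above can be modeled in user space. This is a minimal sketch, not kernel code: HZ and DEFER_WARN_INTERVAL are given assumed values here (the thread does not show their definitions), and stalled_seconds() and should_pr_warn() are hypothetical helper names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the patch. */
#define HZ 1000
#define DEFER_WARN_INTERVAL (60 * HZ)

/*
 * Seconds since the release was deferred. Mirrors the expression in the
 * hunk: (s32)((u32)jiffies - (u32)pool->defer_start) / HZ. The unsigned
 * 32-bit subtraction stays correct when the jiffies counter wraps.
 */
static int stalled_seconds(uint32_t jiffies_now, uint32_t defer_start)
{
	return (int32_t)(jiffies_now - defer_start) / HZ;
}

/*
 * True only inside the first warning window,
 * [DEFER_WARN_INTERVAL, DEFER_WARN_INTERVAL * 2), expressed in seconds.
 * The tracepoint itself is not gated by this check.
 */
static bool should_pr_warn(int sec)
{
	return sec >= DEFER_WARN_INTERVAL / HZ &&
	       sec < DEFER_WARN_INTERVAL * 2 / HZ;
}
```

Under these assumed values the window covers 60 through 119 seconds, so a pool that stays stalled longer keeps firing the tracepoint while the log stays quiet.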


* Re: [PATCH net-next v3] page_pool: Add page_pool_release_stalled tracepoint
  2026-01-02  7:17 [PATCH net-next v3] page_pool: Add page_pool_release_stalled tracepoint Leon Hwang
@ 2026-01-02 11:43 ` Jesper Dangaard Brouer
  2026-01-02 13:54   ` Leon Hwang
  0 siblings, 1 reply; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2026-01-02 11:43 UTC
  To: Leon Hwang, netdev
  Cc: Ilias Apalodimas, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, kerneljasonxing, lance.yang,
	jiayuan.chen, linux-kernel, linux-trace-kernel, Leon Huang Fu,
	Dragos Tatulea, kernel-team, Yan Zhai



On 02/01/2026 08.17, Leon Hwang wrote:
> Introduce a new tracepoint to track stalled page pool releases,
> providing better observability for page pool lifecycle issues.
> 

In general I like/support adding this tracepoint for "debuggability" of
page pool lifecycle issues.

For "observability" @Kuba added a netlink scheme[1][2] for page_pool[3], 
which gives us the ability to get events and list page_pools from userspace.
I've not used this myself (yet) so I need input from others if this is 
something that others have been using for page pool lifecycle issues?

Need input from @Kuba/others as the "page-pool-get"[4] doc states that
"Only Page Pools associated with a net_device can be listed".  Don't we
want the ability to list "invisible" page_pools to allow debugging issues?

  [1] https://docs.kernel.org/userspace-api/netlink/intro-specs.html
  [2] https://docs.kernel.org/userspace-api/netlink/index.html
  [3] https://docs.kernel.org/netlink/specs/netdev.html
  [4] https://docs.kernel.org/netlink/specs/netdev.html#page-pool-get

Looking at the code, I see that NETDEV_CMD_PAGE_POOL_CHANGE_NTF netlink
notification is only generated once (in page_pool_destroy) and not when
we retry in page_pool_release_retry (like this patch).  In that sense,
this patch/tracepoint is catching something more than netlink provides.
At first I thought we could add a netlink notification, but I can imagine
cases where this could generate too many netlink messages, e.g. a netdev
with 128 RX queues generating these every second for every RX queue.

I guess I've talked myself into liking this change. What do other
maintainers think?  (e.g. netlink scheme and debugging balance)


> Problem:
> Currently, when a page pool shutdown is stalled due to inflight pages,
> the kernel only logs a warning message via pr_warn(). This has several
> limitations:
> 
> 1. The warning floods the kernel log after the initial DEFER_WARN_INTERVAL,
>     making it difficult to track the progression of stalled releases
> 2. There's no structured way to monitor or analyze these events
> 3. Debugging tools cannot easily capture and correlate stalled pool
>     events with other network activity
> 
> Solution:
> Add a new tracepoint, page_pool_release_stalled, that fires when a page
> pool shutdown is stalled. The tracepoint captures:
> - pool: pointer to the stalled page_pool
> - inflight: number of pages still in flight
> - sec: seconds since the release was deferred
> 
> The implementation also modifies the logging behavior:
> - pr_warn() is only emitted during the first warning interval
>    (DEFER_WARN_INTERVAL to DEFER_WARN_INTERVAL*2)
> - The tracepoint is fired always, reducing log noise while still
>    allowing monitoring tools to track the issue
> 
> This allows developers and system administrators to:
> - Use tools like perf, ftrace, or eBPF to monitor stalled releases
> - Correlate page pool issues with network driver behavior
> - Analyze patterns without parsing kernel logs
> - Track the progression of inflight page counts over time
> 
> Signed-off-by: Leon Huang Fu <leon.huangfu@shopee.com>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> v2 -> v3:
>   - Print id using '%u'.
>   - https://lore.kernel.org/netdev/20260102061718.210248-1-leon.hwang@linux.dev/
> 
> v1 -> v2:
>   - Drop RFC.
>   - Store 'pool->user.id' to '__entry->id' (per Steven Rostedt).
>   - https://lore.kernel.org/netdev/20251125082207.356075-1-leon.hwang@linux.dev/
> ---
>   include/trace/events/page_pool.h | 24 ++++++++++++++++++++++++
>   net/core/page_pool.c             |  6 ++++--
>   2 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
> index 31825ed30032..a851e0f6a384 100644
> --- a/include/trace/events/page_pool.h
> +++ b/include/trace/events/page_pool.h
> @@ -113,6 +113,30 @@ TRACE_EVENT(page_pool_update_nid,
>   		  __entry->pool, __entry->pool_nid, __entry->new_nid)
>   );
>   
> +TRACE_EVENT(page_pool_release_stalled,
> +
> +	TP_PROTO(const struct page_pool *pool, int inflight, int sec),
> +
> +	TP_ARGS(pool, inflight, sec),
> +
> +	TP_STRUCT__entry(
> +		__field(const struct page_pool *, pool)
> +		__field(u32,			  id)
> +		__field(int,			  inflight)
> +		__field(int,			  sec)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pool		= pool;
> +		__entry->id		= pool->user.id;
> +		__entry->inflight	= inflight;
> +		__entry->sec		= sec;
> +	),
> +
> +	TP_printk("page_pool=%p id=%u inflight=%d sec=%d",
> +		  __entry->pool, __entry->id, __entry->inflight, __entry->sec)
> +);
> +
>   #endif /* _TRACE_PAGE_POOL_H */
>   
>   /* This part must be outside protection */
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 265a729431bb..01564aa84c89 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1222,8 +1222,10 @@ static void page_pool_release_retry(struct work_struct *wq)
>   	    (!netdev || netdev == NET_PTR_POISON)) {
>   		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
>   
> -		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
> -			__func__, pool->user.id, inflight, sec);
> +		if (sec >= DEFER_WARN_INTERVAL / HZ && sec < DEFER_WARN_INTERVAL * 2 / HZ)
> +			pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
> +				__func__, pool->user.id, inflight, sec);
> +		trace_page_pool_release_stalled(pool, inflight, sec);
>   		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
>   	}
>   



* Re: [PATCH net-next v3] page_pool: Add page_pool_release_stalled tracepoint
  2026-01-02 11:43 ` Jesper Dangaard Brouer
@ 2026-01-02 13:54   ` Leon Hwang
  0 siblings, 0 replies; 3+ messages in thread
From: Leon Hwang @ 2026-01-02 13:54 UTC
  To: Jesper Dangaard Brouer, netdev
  Cc: Ilias Apalodimas, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, kerneljasonxing, lance.yang,
	jiayuan.chen, linux-kernel, linux-trace-kernel, Leon Huang Fu,
	Dragos Tatulea, kernel-team, Yan Zhai



On 2026/1/2 19:43, Jesper Dangaard Brouer wrote:
> 
> 
> On 02/01/2026 08.17, Leon Hwang wrote:
>> Introduce a new tracepoint to track stalled page pool releases,
>> providing better observability for page pool lifecycle issues.
>>
> 
> In general I like/support adding this tracepoint for "debuggability" of
> page pool lifecycle issues.
> 
> For "observability" @Kuba added a netlink scheme[1][2] for page_pool[3],
> which gives us the ability to get events and list page_pools from
> userspace.
> I've not used this myself (yet) so I need input from others if this is
> something that others have been using for page pool lifecycle issues?
> 
> Need input from @Kuba/others as the "page-pool-get"[4] state that "Only
> Page Pools associated with a net_device can be listed".  Don't we want
> the ability to list "invisible" page_pool's to allow debugging issues?
> 
>  [1] https://docs.kernel.org/userspace-api/netlink/intro-specs.html
>  [2] https://docs.kernel.org/userspace-api/netlink/index.html
>  [3] https://docs.kernel.org/netlink/specs/netdev.html
>  [4] https://docs.kernel.org/netlink/specs/netdev.html#page-pool-get
> 
> Looking at the code, I see that NETDEV_CMD_PAGE_POOL_CHANGE_NTF netlink
> notification is only generated once (in page_pool_destroy) and not when
> we retry in page_pool_release_retry (like this patch).  In that sense,
> this patch/tracepoint is catching something more than netlink provides.
> At first I thought we could add a netlink notification, but I can imagine
> cases where this could generate too many netlink messages, e.g. a netdev
> with 128 RX queues generating these every second for every RX queue.
> 
> Guess, I've talked myself into liking this change, what do other
> maintainers think?  (e.g. netlink scheme and debugging balance)
> 

Hi Jesper,

Thanks for the thoughtful review and for sharing the context around the
existing netlink-based observability.

I ran into a real-world issue where stalled pages were still referenced
by dangling TCP sockets. I wrote up the investigation in more detail in
my blog post “let page inflight” [1] (unfortunately only available in
Chinese at the moment).

In practice, the hardest part was identifying *who* was still holding
references to the inflight pages. With the current tooling, it is very
difficult to introspect the active users of a page once it becomes stalled.

If we can expose more information about current page users, such as the
user type and a user pointer, it becomes much easier to debug these
issues using BPF-based tools. For example, by tracing
page_pool_state_hold and page_pool_state_release, tools like bpftrace
[2] or bpfsnoop [3] (which I implemented) can correlate inflight page
pointers with their active users. This significantly lowers the barrier
to diagnosing page pool lifecycle problems.
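
The correlation idea above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not how bpftrace or bpfsnoop work internally: it assumes a consumer receives one page pointer per page_pool_state_hold event and one per page_pool_state_release event, and it only tracks which pages remain inflight. Identifying the user of each page would need the extra fields discussed above. The names inflight_table, on_hold(), on_release() and MAX_TRACKED are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 1024 /* hypothetical capacity for this sketch */

/* Set of page pointers that were held but not yet released. */
struct inflight_table {
	uintptr_t pages[MAX_TRACKED]; /* 0 marks an empty slot */
	size_t count;                 /* number of occupied slots */
};

/* Record a page reported by a hold event. Returns 0, or -1 if rejected. */
static int on_hold(struct inflight_table *t, uintptr_t page)
{
	if (page == 0)
		return -1; /* 0 is reserved as the empty marker */
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (t->pages[i] == 0) {
			t->pages[i] = page;
			t->count++;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Forget a page reported by a release event. Returns 0, or -1 if unknown. */
static int on_release(struct inflight_table *t, uintptr_t page)
{
	if (page == 0)
		return -1;
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (t->pages[i] == page) {
			t->pages[i] = 0;
			t->count--;
			return 0;
		}
	}
	return -1; /* release without a matching hold */
}
```

Whatever remains in the table when page_pool_release_stalled fires is the set of pages worth chasing. A linear scan keeps the sketch short; a real tool would more likely use a hash map keyed by page pointer.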

As you noted, the existing netlink notifications are generated only at
page_pool_destroy, and not during retries in page_pool_release_retry. In
that sense, the proposed tracepoint captures a class of issues that
netlink does not currently cover, and does so without the risk of
generating excessive userspace events.

Thanks again for the feedback, and I’m happy to refine the approach
based on further input from you, Kuba, or other maintainers.

Links:
[1] https://blog.leonhw.com/post/linux-networking-6-inflight-page/
[2] https://github.com/bpftrace/bpftrace/
[3] https://github.com/bpfsnoop/bpfsnoop/

Thanks,
Leon

[...]


