* [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
@ 2026-01-19 10:21 Leon Hwang
2026-01-19 16:37 ` Jakub Kicinski
0 siblings, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-01-19 10:21 UTC
To: netdev
Cc: Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
kerneljasonxing, lance.yang, jiayuan.chen, linux-kernel,
linux-trace-kernel, Leon Hwang, Leon Huang Fu
Introduce a new tracepoint to track stalled page pool releases,
providing better observability for page pool lifecycle issues.

Problem:
Currently, when a page pool shutdown is stalled due to inflight pages,
the kernel only logs a warning message via pr_warn(). This has several
limitations:

1. The warning repeats every DEFER_WARN_INTERVAL for as long as the
   stall lasts, flooding the kernel log and making it difficult to
   track the progression of stalled releases
2. There's no structured way to monitor or analyze these events
3. Debugging tools cannot easily capture and correlate stalled pool
   events with other network activity

Solution:
Add a new tracepoint, page_pool_release_stalled, that fires when a page
pool shutdown is stalled. The tracepoint captures:

- pool: pointer to the stalled page_pool
- id: the pool's ID (pool->user.id)
- inflight: number of pages still in flight
- sec: seconds since the release was deferred

The implementation also modifies the logging behavior:

- pr_warn() is only emitted during the first warning interval
  (DEFER_WARN_INTERVAL to DEFER_WARN_INTERVAL*2)
- The tracepoint fires every time, reducing log noise while still
  allowing monitoring tools to track the issue

This allows developers and system administrators to:

- Use tools like perf, ftrace, or eBPF to monitor stalled releases
- Correlate page pool issues with network driver behavior
- Analyze patterns without parsing kernel logs
- Track the progression of inflight page counts over time

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Leon Huang Fu <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
v3 -> v4:
- Collect Reviewed-by from Steven, thanks.
- Collect Acked-by from Jesper, thanks.
- https://lore.kernel.org/netdev/20260102071745.291969-1-leon.hwang@linux.dev/
v2 -> v3:
- Print id using '%u'.
- https://lore.kernel.org/netdev/20260102061718.210248-1-leon.hwang@linux.dev/
v1 -> v2:
- Drop RFC.
- Store 'pool->user.id' to '__entry->id' (per Steven Rostedt).
- https://lore.kernel.org/netdev/20251125082207.356075-1-leon.hwang@linux.dev/
---
include/trace/events/page_pool.h | 24 ++++++++++++++++++++++++
net/core/page_pool.c | 6 ++++--
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 31825ed30032..a851e0f6a384 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -113,6 +113,30 @@ TRACE_EVENT(page_pool_update_nid,
 		  __entry->pool, __entry->pool_nid, __entry->new_nid)
 );
 
+TRACE_EVENT(page_pool_release_stalled,
+
+	TP_PROTO(const struct page_pool *pool, int inflight, int sec),
+
+	TP_ARGS(pool, inflight, sec),
+
+	TP_STRUCT__entry(
+		__field(const struct page_pool *, pool)
+		__field(u32, id)
+		__field(int, inflight)
+		__field(int, sec)
+	),
+
+	TP_fast_assign(
+		__entry->pool = pool;
+		__entry->id = pool->user.id;
+		__entry->inflight = inflight;
+		__entry->sec = sec;
+	),
+
+	TP_printk("page_pool=%p id=%u inflight=%d sec=%d",
+		  __entry->pool, __entry->id, __entry->inflight, __entry->sec)
+);
+
 #endif /* _TRACE_PAGE_POOL_H */
 
 /* This part must be outside protection */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 265a729431bb..01564aa84c89 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1222,8 +1222,10 @@ static void page_pool_release_retry(struct work_struct *wq)
 	    (!netdev || netdev == NET_PTR_POISON)) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
 
-		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
-			__func__, pool->user.id, inflight, sec);
+		if (sec >= DEFER_WARN_INTERVAL / HZ && sec < DEFER_WARN_INTERVAL * 2 / HZ)
+			pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
+				__func__, pool->user.id, inflight, sec);
+		trace_page_pool_release_stalled(pool, inflight, sec);
 		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
 	}
 
--
2.52.0
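
To make the gating in the page_pool_release_retry() hunk above easier to reason about, here is a minimal user-space sketch of the two expressions it relies on: the wrap-safe elapsed-seconds computation and the window check that decides whether pr_warn() is still emitted. The helper names stalled_seconds() and would_warn() are hypothetical, and the values chosen for HZ and DEFER_WARN_INTERVAL are assumptions made for illustration only; this is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumptions for this sketch: HZ is configuration dependent in a real
 * kernel, and DEFER_WARN_INTERVAL is taken to be 60 seconds of jiffies.
 */
#define HZ 250
#define DEFER_WARN_INTERVAL (60 * HZ)

/*
 * Elapsed whole seconds since the release was deferred. The unsigned
 * subtraction stays correct when the 32-bit jiffies value wraps, which
 * mirrors the (s32)((u32)jiffies - (u32)pool->defer_start) / HZ
 * expression in the patch.
 */
static int stalled_seconds(uint32_t now, uint32_t defer_start)
{
	return (int32_t)(now - defer_start) / HZ;
}

/*
 * True while the patched code would still call pr_warn(), that is,
 * only inside the first warning interval. The tracepoint is not gated
 * by this check and fires on every pass through the warning branch.
 */
static bool would_warn(int sec)
{
	return sec >= DEFER_WARN_INTERVAL / HZ &&
	       sec < DEFER_WARN_INTERVAL * 2 / HZ;
}
```

Under these assumptions, a stall observed at 60 seconds still reaches the log, while the same stall observed again at 120 seconds or later would only be visible through the tracepoint.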
* Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
2026-01-19 10:21 [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint Leon Hwang
@ 2026-01-19 16:37 ` Jakub Kicinski
2026-01-19 16:43 ` Jesper Dangaard Brouer
2026-01-20 3:16 ` Leon Hwang
0 siblings, 2 replies; 7+ messages in thread
From: Jakub Kicinski @ 2026-01-19 16:37 UTC
To: Leon Hwang
Cc: netdev, Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, kerneljasonxing,
lance.yang, jiayuan.chen, linux-kernel, linux-trace-kernel,
Leon Huang Fu
On Mon, 19 Jan 2026 18:21:19 +0800 Leon Hwang wrote:
> Introduce a new tracepoint to track stalled page pool releases,
> providing better observability for page pool lifecycle issues.
Sorry, I really want you to answer the questions from the last
paragraph of:
https://lore.kernel.org/netdev/20260104084347.5de3a537@kernel.org/
--
pw-bot: cr
* Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
2026-01-19 16:37 ` Jakub Kicinski
@ 2026-01-19 16:43 ` Jesper Dangaard Brouer
2026-01-19 17:23 ` Jakub Kicinski
2026-01-20 3:16 ` Leon Hwang
1 sibling, 1 reply; 7+ messages in thread
From: Jesper Dangaard Brouer @ 2026-01-19 16:43 UTC
To: Jakub Kicinski, Leon Hwang
Cc: netdev, Ilias Apalodimas, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, David S . Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, kerneljasonxing, lance.yang, jiayuan.chen,
linux-kernel, linux-trace-kernel, Leon Huang Fu
On 19/01/2026 17.37, Jakub Kicinski wrote:
> On Mon, 19 Jan 2026 18:21:19 +0800 Leon Hwang wrote:
>> Introduce a new tracepoint to track stalled page pool releases,
>> providing better observability for page pool lifecycle issues.
>
> Sorry, I really want you to answer the questions from the last
> paragraph of:
>
> https://lore.kernel.org/netdev/20260104084347.5de3a537@kernel.org/
Okay, I will answer that, if you will answer:
What monitoring tool did production people add metrics to?
People at CF recommend that I/we add this to prometheus node_exporter[0].
Perhaps somebody else already added this to some other FOSS tool?
--Jesper
[0] https://github.com/prometheus/node_exporter
* Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
2026-01-19 16:43 ` Jesper Dangaard Brouer
@ 2026-01-19 17:23 ` Jakub Kicinski
0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2026-01-19 17:23 UTC
To: Jesper Dangaard Brouer
Cc: Leon Hwang, netdev, Ilias Apalodimas, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, kerneljasonxing,
lance.yang, jiayuan.chen, linux-kernel, linux-trace-kernel,
Leon Huang Fu
On Mon, 19 Jan 2026 17:43:20 +0100 Jesper Dangaard Brouer wrote:
> On 19/01/2026 17.37, Jakub Kicinski wrote:
> > On Mon, 19 Jan 2026 18:21:19 +0800 Leon Hwang wrote:
> >> Introduce a new tracepoint to track stalled page pool releases,
> >> providing better observability for page pool lifecycle issues.
> >
> > Sorry, I really want you to answer the questions from the last
> > paragraph of:
> >
> > https://lore.kernel.org/netdev/20260104084347.5de3a537@kernel.org/
>
> Okay, I will answer that, if you will answer:
To be clear I was asking Leon about his setup :)
* Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
2026-01-19 16:37 ` Jakub Kicinski
2026-01-19 16:43 ` Jesper Dangaard Brouer
@ 2026-01-20 3:16 ` Leon Hwang
2026-01-20 23:29 ` Jakub Kicinski
1 sibling, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-01-20 3:16 UTC
To: Jakub Kicinski
Cc: netdev, Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, kerneljasonxing,
lance.yang, jiayuan.chen, linux-kernel, linux-trace-kernel,
Leon Huang Fu
On 20/1/26 00:37, Jakub Kicinski wrote:
> On Mon, 19 Jan 2026 18:21:19 +0800 Leon Hwang wrote:
>> Introduce a new tracepoint to track stalled page pool releases,
>> providing better observability for page pool lifecycle issues.
>
> Sorry, I really want you to answer the questions from the last
> paragraph of:
>
> https://lore.kernel.org/netdev/20260104084347.5de3a537@kernel.org/
Let me share a concrete case where this tracepoint would have helped,
and why netlink notifications were not a good fit.

I encountered the 'pr_warn()' messages during Mellanox NIC flapping on a
system using the 'mlx5_core' driver (kernel 6.6). The root cause turned
out to be an application-level issue: the IBM/sarama “Client SeekBroker
Connection Leak” [1].

In short, some TCP sockets became orphaned while still holding FINACK
skbs in their 'sk_receive_queue'. These skbs were holding inflight pages
from page pools. After NIC flapping, as long as those sockets were not
closed, the inflight pages could not be returned, and the corresponding
page pools could not be released. Once the orphaned sockets were
explicitly closed (as in [2]), the inflight pages were returned and the
page pools were eventually destroyed.

During the investigation, the dmesg output was noisy: there were many
inflight pages across multiple page pools, originating from many
orphaned sockets. This made it difficult to investigate and reason about
the issue using BPF tools.

In this scenario, a netlink notification does not seem like a good fit:

* The situation involved many page pools and many inflight pages.
* Emitting netlink notifications on each retry or stall would likely
  generate a large volume of messages.
* What was needed was not a stream of notifications, but the ability to
  observe and correlate page pool state over time.

A tracepoint fits this use case better. With a
'page_pool_release_stalled' tracepoint, it becomes straightforward to
use BPF tools to:

* Track which page pools are repeatedly stalled
* Correlate stalls with socket state, RX queues, or driver behavior
* Distinguish expected situations (e.g. orphaned sockets temporarily
  holding pages) from genuine kernel or driver issues

From my experience, this tracepoint complements the existing
netlink-based observability rather than duplicating it, while avoiding
the risk of excessive netlink traffic in pathological but realistic
scenarios such as NIC flapping combined with connection leaks.

Thanks,
Leon
[1] https://github.com/IBM/sarama/issues/3143
[2] https://github.com/IBM/sarama/pull/3384
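
As a rough illustration of the "track which page pools are repeatedly stalled" idea in the message above, the sketch below parses the event payload in the format given by the patch's TP_printk() string and keeps a small per-pool tally in user space. It is only a sketch under stated assumptions: the names stall_event, pool_stats, parse_stall_event() and record_stall() are hypothetical, the text that a tracer prints before the payload is assumed to vary and is skipped, and a real tool would more likely read the fields from a BPF program than from text.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One decoded page_pool_release_stalled event (hypothetical type). */
struct stall_event {
	char pool[32];		/* pointer text as printed by %p */
	unsigned int id;
	int inflight;
	int sec;
};

/* Per-pool tally; the table must start out zero-initialized. */
struct pool_stats {
	unsigned int id;
	int events;		/* 0 marks an unused slot */
	int last_inflight;
	int last_sec;
};

/* Decode the payload of one trace line. Returns 0 on success, -1 otherwise. */
static int parse_stall_event(const char *line, struct stall_event *ev)
{
	const char *p = strstr(line, "page_pool=");

	if (!p)
		return -1;
	if (sscanf(p, "page_pool=%31s id=%u inflight=%d sec=%d",
		   ev->pool, &ev->id, &ev->inflight, &ev->sec) != 4)
		return -1;
	return 0;
}

/*
 * Record an event against its pool id. Returns the number of stall
 * events seen so far for that pool, or -1 if the table is full.
 */
static int record_stall(struct pool_stats *tbl, size_t n,
			const struct stall_event *ev)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Skip slots already owned by a different pool. */
		if (tbl[i].events && tbl[i].id != ev->id)
			continue;
		tbl[i].id = ev->id;
		tbl[i].last_inflight = ev->inflight;
		tbl[i].last_sec = ev->sec;
		return ++tbl[i].events;
	}
	return -1;
}
```

A pool whose event count keeps growing while last_inflight stays flat would be a candidate for the orphaned-socket situation described above, whereas a shrinking last_inflight suggests pages are slowly being returned.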
* Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
2026-01-20 3:16 ` Leon Hwang
@ 2026-01-20 23:29 ` Jakub Kicinski
2026-01-21 2:17 ` Leon Hwang
0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2026-01-20 23:29 UTC
To: Leon Hwang
Cc: netdev, Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, kerneljasonxing,
lance.yang, jiayuan.chen, linux-kernel, linux-trace-kernel,
Leon Huang Fu
On Tue, 20 Jan 2026 11:16:20 +0800 Leon Hwang wrote:
> I encountered the 'pr_warn()' messages during Mellanox NIC flapping on a
> system using the 'mlx5_core' driver (kernel 6.6). The root cause turned
> out to be an application-level issue: the IBM/sarama “Client SeekBroker
> Connection Leak” [1].
The scenario you are describing matches the situations we run into
at Meta. With the upstream kernel you can find that the pages are
leaking based on stats, and if you care use drgn to locate them
(in the recv queue).
The 6.6 kernel did not have page pool stats. I feel quite odd about
adding more uAPI because someone is running a 2+ years old kernel
and doesn't have access to the already existing facilities.
* Re: [PATCH net-next v4] page_pool: Add page_pool_release_stalled tracepoint
2026-01-20 23:29 ` Jakub Kicinski
@ 2026-01-21 2:17 ` Leon Hwang
0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-01-21 2:17 UTC
To: Jakub Kicinski
Cc: netdev, Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, kerneljasonxing,
lance.yang, jiayuan.chen, linux-kernel, linux-trace-kernel,
Leon Huang Fu
On 21/1/26 07:29, Jakub Kicinski wrote:
> On Tue, 20 Jan 2026 11:16:20 +0800 Leon Hwang wrote:
>> I encountered the 'pr_warn()' messages during Mellanox NIC flapping on a
>> system using the 'mlx5_core' driver (kernel 6.6). The root cause turned
>> out to be an application-level issue: the IBM/sarama “Client SeekBroker
>> Connection Leak” [1].
>
> The scenario you are describing matches the situations we run into
> at Meta. With the upstream kernel you can find that the pages are
> leaking based on stats, and if you care use drgn to locate them
> (in the recv queue).
>
Thanks, that makes sense.

drgn indeed sounds helpful for locating the pages once it is confirmed
that the inflight pages are being held by the socket receive queue.

Before reaching that point, however, it was quite difficult to pinpoint
where those inflight pages were stuck. I was wondering whether there is
any other handy tool or method to help locate them earlier.

> The 6.6 kernel did not have page pool stats. I feel quite odd about
> adding more uAPI because someone is running a 2+ years old kernel
> and doesn't have access to the already existing facilities.

After checking the code again, I realized that the 6.6 kernel does
have page pool stats support.

Unfortunately, CONFIG_PAGE_POOL_STATS was not enabled in our Shopee
deployment, which is why those facilities were not available to us.

In any case, I understand your concern. I won’t pursue adding this
tracepoint further if it’s not something you’d like to see upstream.

Thanks,
Leon