linux-trace-kernel.vger.kernel.org archive mirror
From: Leon Hwang <leon.hwang@linux.dev>
To: Yunsheng Lin <linyunsheng@huawei.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	netdev@vger.kernel.org
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	kerneljasonxing@gmail.com, lance.yang@linux.dev,
	jiayuan.chen@linux.dev, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	Leon Huang Fu <leon.huangfu@shopee.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	kernel-team <kernel-team@cloudflare.com>,
	Yan Zhai <yan@cloudflare.com>
Subject: Re: [PATCH net-next v3] page_pool: Add page_pool_release_stalled tracepoint
Date: Mon, 5 Jan 2026 14:23:27 +0800
Message-ID: <130d8c90-6285-41b0-926e-d6df9791bcd4@linux.dev>
In-Reply-To: <dfc33064-f99f-4728-858f-95c80300bcff@huawei.com>



On 4/1/26 10:18, Yunsheng Lin wrote:
> On 2026/1/2 19:43, Jesper Dangaard Brouer wrote:
>>
>>
>> On 02/01/2026 08.17, Leon Hwang wrote:
>>> Introduce a new tracepoint to track stalled page pool releases,
>>> providing better observability for page pool lifecycle issues.
>>>
>>
>> In general I like/support adding this tracepoint for "debugability" of
>> page pool lifecycle issues.
>>
>> For "observability", @Kuba added a netlink scheme[1][2] for
>> page_pool[3], which gives us the ability to get events and list
>> page_pools from userspace. I've not used this myself (yet), so I need
>> input from others: is this something others have been using for page
>> pool lifecycle issues?
>>
>> Need input from @Kuba/others, as "page-pool-get"[4] states that "Only
>> Page Pools associated with a net_device can be listed".  Don't we want
>> the ability to list "invisible" page_pools to allow debugging issues?
>>
>>  [1] https://docs.kernel.org/userspace-api/netlink/intro-specs.html
>>  [2] https://docs.kernel.org/userspace-api/netlink/index.html
>>  [3] https://docs.kernel.org/netlink/specs/netdev.html
>>  [4] https://docs.kernel.org/netlink/specs/netdev.html#page-pool-get
>>
>> Looking at the code, I see that NETDEV_CMD_PAGE_POOL_CHANGE_NTF netlink
>> notification is only generated once (in page_pool_destroy) and not when
>> we retry in page_pool_release_retry (like this patch).  In that sense,
>> this patch/tracepoint is catching something more than netlink provides.
>> First I thought we could add a netlink notification, but I can imagine
>> cases this could generate too many netlink messages e.g. a netdev with
>> 128 RX queues generating these every second for every RX queue.
>>
>> Guess, I've talked myself into liking this change, what do other
>> maintainers think?  (e.g. netlink scheme and debugging balance)
>>
>>
>>> Problem:
>>> Currently, when a page pool shutdown is stalled due to inflight pages,
>>> the kernel only logs a warning message via pr_warn(). This has several
>>> limitations:
>>>
>>> 1. The warning floods the kernel log after the initial DEFER_WARN_INTERVAL,
>>>     making it difficult to track the progression of stalled releases
>>> 2. There's no structured way to monitor or analyze these events
>>> 3. Debugging tools cannot easily capture and correlate stalled pool
>>>     events with other network activity
>>>
>>> Solution:
>>> Add a new tracepoint, page_pool_release_stalled, that fires when a page
>>> pool shutdown is stalled. The tracepoint captures:
>>> - pool: pointer to the stalled page_pool
>>> - inflight: number of pages still in flight
>>> - sec: seconds since the release was deferred
>>>
>>> The implementation also modifies the logging behavior:
>>> - pr_warn() is only emitted during the first warning interval
>>>    (DEFER_WARN_INTERVAL to DEFER_WARN_INTERVAL*2)
>>> - The tracepoint is fired always, reducing log noise while still
>>>    allowing monitoring tools to track the issue
> 
> If the initial log is still present, I don't really see what's the benefit
> of re-triggering logs or tracepoints when the first two fields are unchanged
> and the last two fields can be inspected using some tool? If there are none,
> perhaps we only need to print the first trigger log and a log upon completion
> of page_pool destruction.
> 

Even though it is possible to inspect the last two fields (inflight and
sec) via the workqueue (e.g., by tracing page_pool_release_retry with BPF
tools), this is not a practical approach for routine monitoring or
debugging.

With the proposed tracepoint, obtaining these fields becomes
straightforward and lightweight, making it much easier to observe and
reason about stalled page pool releases in real systems.
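
As a rough illustration of the logging policy described in the patch,
here is a minimal userspace sketch, not the kernel code itself. The
60-second value and both helper names are assumptions made for this
example; the sketch also assumes the stall check is first evaluated once
one warning interval has elapsed.

```c
#include <stdbool.h>

/* Assumption: 60 s stands in for DEFER_WARN_INTERVAL expressed in seconds. */
#define WARN_INTERVAL_SEC 60u

/*
 * Hypothetical helper: pr_warn() is emitted only during the first
 * warning interval, i.e. for sec in [interval, 2 * interval).
 */
static bool stall_should_warn(unsigned int sec)
{
	return sec >= WARN_INTERVAL_SEC && sec < 2u * WARN_INTERVAL_SEC;
}

/*
 * Hypothetical helper: the tracepoint fires on every stall check once
 * the release has been deferred for at least one interval.
 */
static bool stall_should_trace(unsigned int sec)
{
	return sec >= WARN_INTERVAL_SEC;
}
```

With these assumptions, a check at 60 seconds both warns and traces,
while checks at 120 or 180 seconds only trace.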

In the issue I encountered, it was crucial to notice that the inflight
count was gradually decreasing over time. This gave me confidence that
some orphaned pages were eventually being returned to the page pool.
Based on that signal, I was then able to capture the call stack of
page_pool_put_defragged_page (kernel v6.6) to identify the code path
responsible for returning those pages.

Without repeated pr_warn logs or tracepoint events, it would have been
significantly harder to observe this progression and correlate it with
the eventual page returns.
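
To make the "gradually decreasing" signal concrete, here is a small
userspace sketch of the check I was effectively doing by eye on
successive events. The struct and function names are hypothetical and
only model the two fields that change between retries; they are not part
of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one stalled-release event. */
struct stall_sample {
	int inflight;	/* pages still in flight */
	int sec;	/* seconds since the release was deferred */
};

/*
 * Returns true when inflight never rises between consecutive samples
 * and ends lower than it started, i.e. pages are still being returned.
 */
static bool inflight_is_draining(const struct stall_sample *s, size_t n)
{
	if (n < 2)
		return false;

	for (size_t i = 1; i < n; i++) {
		if (s[i].inflight > s[i - 1].inflight)
			return false;
	}

	return s[n - 1].inflight < s[0].inflight;
}
```

A flat series (the same inflight count on every event) returns false,
which would point at leaked pages rather than slow returns.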

Thanks,
Leon
