From: Stanislav Fomichev <stfomichev@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, andrew+netdev@lunn.ch, horms@kernel.org,
David Wei <dw@davidwei.uk>,
michael.chan@broadcom.com, pavan.chebbi@broadcom.com,
hawk@kernel.org, ilias.apalodimas@linaro.org,
almasrymina@google.com, sdf@fomichev.me
Subject: Re: [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning
Date: Fri, 1 Aug 2025 13:55:09 -0700 [thread overview]
Message-ID: <aI0prRzAJkEXdkEa@mini-arch> (raw)
In-Reply-To: <20250801173011.2454447-1-kuba@kernel.org>
On 08/01, Jakub Kicinski wrote:
> Page pool can have pages "directly" (locklessly) recycled to it,
> if the NAPI that owns the page pool is scheduled to run on the same CPU.
> To make this safe we check that the NAPI is disabled while we destroy
> the page pool. In most cases NAPI and page pool lifetimes are tied
> together so this happens naturally.
>
> The queue API expects the following order of calls:
> -> mem_alloc
> alloc new pp
> -> stop
> napi_disable
> -> start
> napi_enable
> -> mem_free
> free old pp
>
> Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
> But the NAPIs are only stopped between ->stop and ->start. We created
> page_pool_disable_direct_recycling() to safely shut down the recycling
> in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
> have to worry about recycling any more.
>
> Unfortunately, the page_pool_disable_direct_recycling() is not enough
> to deal with failures which necessitate freeing the _new_ page pool.
> If we hit a failure in ->mem_alloc or ->stop the new page pool has
> to be freed while the NAPI is active (assuming driver attaches the
> page pool to an existing NAPI instance and doesn't reallocate NAPIs).
>
> Freeing the new page pool is technically safe because it hasn't been
> used for any packets, yet, so there can be no recycling. But the check
> in napi_assert_will_not_race() has no way of knowing that. We could
> check if page pool is empty but that'd make the check much less likely
> to trigger during development.
>
> Add page_pool_enable_direct_recycling(), pairing with
> page_pool_disable_direct_recycling(). It will allow us to create the new
> page pools in "disabled" state and only enable recycling when we know
> the reconfig operation will not fail.
>
> Coincidentally it will also let us re-enable the recycling for the old
> pool, if the reconfig failed:
>
> -> mem_alloc (new)
> -> stop (old)
> # disables direct recycling for old
> -> start (new)
> # fail!!
> -> start (old)
> # go back to old pp but direct recycling is lost :(
> -> mem_free (new)
>
> Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
> Tested-by: David Wei <dw@davidwei.uk>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> Thanks to David Wei for confirming the problem on bnxt and testing
> the fix. I hit this writing the fbnic support for ZC, TBH.
> Any driver where NAPI instance gets reused and not reallocated on each
> queue restart may have this problem. netdevsim doesn't 'cause the
> callbacks can't fail in funny ways there.
>
> CC: michael.chan@broadcom.com
> CC: pavan.chebbi@broadcom.com
> CC: hawk@kernel.org
> CC: ilias.apalodimas@linaro.org
> CC: dw@davidwei.uk
> CC: almasrymina@google.com
> CC: sdf@fomichev.me
> ---
> include/net/page_pool/types.h | 2 ++
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++++++-
> net/core/page_pool.c | 13 +++++++++++++
> 3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 431b593de709..1509a536cb85 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -265,6 +265,8 @@ struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
> struct xdp_mem_info;
>
> #ifdef CONFIG_PAGE_POOL
> +void page_pool_enable_direct_recycling(struct page_pool *pool,
> + struct napi_struct *napi);
> void page_pool_disable_direct_recycling(struct page_pool *pool);
> void page_pool_destroy(struct page_pool *pool);
> void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 5578ddcb465d..76a4c5ae8000 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -3819,7 +3819,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> if (BNXT_RX_PAGE_MODE(bp))
> pp.pool_size += bp->rx_ring_size / rx_size_fac;
> pp.nid = numa_node;
> - pp.napi = &rxr->bnapi->napi;
> pp.netdev = bp->dev;
> pp.dev = &bp->pdev->dev;
> pp.dma_dir = bp->rx_dir;
> @@ -3851,6 +3850,12 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> return PTR_ERR(pool);
> }
>
> +static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
> +{
> + page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
> + page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
We do bnxt_separate_head_pool check for the disable_direct_recycling
of head_pool. Is it safe to skip the check here because we always allocate two
pps from queue_mgmt callbacks? (not clear for me from a quick glance at
bnxt_alloc_rx_page_pool)
next prev parent reply other threads:[~2025-08-01 20:55 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-01 17:30 [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning Jakub Kicinski
2025-08-01 20:55 ` Stanislav Fomichev [this message]
2025-08-01 21:05 ` Jakub Kicinski
2025-08-01 21:36 ` Stanislav Fomichev
2025-08-04 14:17 ` Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aI0prRzAJkEXdkEa@mini-arch \
--to=stfomichev@gmail.com \
--cc=almasrymina@google.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=michael.chan@broadcom.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pavan.chebbi@broadcom.com \
--cc=sdf@fomichev.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.