* [PATCH net v2] net: page_pool: allow enabling recycling late, fix false positive warning
@ 2025-08-05 0:36 Jakub Kicinski
2025-08-05 8:03 ` Jesper Dangaard Brouer
2025-08-08 20:10 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 3+ messages in thread
From: Jakub Kicinski @ 2025-08-05 0:36 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
David Wei, michael.chan, pavan.chebbi, hawk, ilias.apalodimas,
almasrymina, sdf
Page pool can have pages "directly" (locklessly) recycled to it,
if the NAPI that owns the page pool is scheduled to run on the same CPU.
To make this safe we check that the NAPI is disabled while we destroy
the page pool. In most cases NAPI and page pool lifetimes are tied
together so this happens naturally.
The queue API expects the following order of calls:
-> mem_alloc
alloc new pp
-> stop
napi_disable
-> start
napi_enable
-> mem_free
free old pp
Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
But the NAPIs are only stopped between ->stop and ->start. We created
page_pool_disable_direct_recycling() to safely shut down the recycling
in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
have to worry about recycling any more.
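To make the ordering above concrete, here is a minimal sketch of how a
driver's queue management callbacks might place these calls. All foo_*
names are hypothetical and most parameters are trimmed; this is not taken
from any real driver:

#include <linux/err.h>
#include <linux/netdevice.h>
#include <net/page_pool/types.h>

struct foo_queue_mem {
	struct page_pool *pp;
};

static int foo_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
{
	struct foo_queue_mem *new = qmem;
	struct page_pool_params pp_params = {
		.netdev = dev,
		/* .napi set here ties the new pool to the NAPI right away;
		 * other params (pool_size, dma_dir, ...) elided.
		 */
	};

	new->pp = page_pool_create(&pp_params);
	return PTR_ERR_OR_ZERO(new->pp);
}

static int foo_queue_stop(struct net_device *dev, void *qmem, int idx)
{
	struct foo_queue_mem *old = qmem;

	/* NAPI is disabled around here; cut off lockless recycling for
	 * the pool that is about to be retired.
	 */
	page_pool_disable_direct_recycling(old->pp);
	return 0;
}

static void foo_queue_mem_free(struct net_device *dev, void *qmem)
{
	struct foo_queue_mem *old = qmem;

	/* Safe: direct recycling was already shut off in ->stop. */
	page_pool_destroy(old->pp);
}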
Unfortunately, page_pool_disable_direct_recycling() alone is not enough
to deal with failures which necessitate freeing the _new_ page pool.
If we hit a failure in ->mem_alloc or ->stop, the new page pool has
to be freed while the NAPI is active (assuming the driver attaches the
page pool to an existing NAPI instance and doesn't reallocate NAPIs).
Freeing the new page pool is technically safe because it hasn't been
used for any packets, yet, so there can be no recycling. But the check
in napi_assert_will_not_race() has no way of knowing that. We could
check whether the page pool is empty, but that'd make the check much less
likely to trigger during development.
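For context, the assertion keys off the NAPI state rather than the pool's
contents; conceptually it reduces to something like the sketch below, where
napi_looks_disabled() is a hypothetical stand-in for the real state check.
This is not the upstream napi_assert_will_not_race() body:

/* Conceptual sketch only -- not the upstream implementation.
 * napi_looks_disabled() is a made-up stand-in for the real NAPI state
 * check; the point is that the pool's fill level is never consulted.
 */
static void pool_destroy_race_check(const struct page_pool *pool)
{
	const struct napi_struct *napi = READ_ONCE(pool->p.napi);

	/* A pool still owned by a runnable NAPI could be receiving
	 * lockless recycles -- even if it never held a packet.
	 */
	WARN_ON(napi && !napi_looks_disabled(napi));
}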
Add page_pool_enable_direct_recycling(), pairing with
page_pool_disable_direct_recycling(). It will allow us to create the new
page pools in "disabled" state and only enable recycling when we know
the reconfig operation will not fail.
Coincidentally, it will also let us re-enable recycling for the old
pool if the reconfig failed:
-> mem_alloc (new)
-> stop (old)
# disables direct recycling for old
-> start (new)
# fail!!
-> start (old)
# go back to old pp but direct recycling is lost :(
-> mem_free (new)
The new helper is idempotent, to make life easier for drivers
which can operate in HDS mode and support zero-copy Rx.
Such a driver can call the helper twice whether it has two separate
pools or multiple references to a single pool.
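The bnxt hunks below are the real example of this pattern; condensed,
and with hypothetical foo_* names, the intended usage looks roughly like:

static struct page_pool *foo_create_pool_disabled(struct net_device *dev)
{
	struct page_pool_params pp_params = {
		.netdev = dev,
		/* .napi deliberately left NULL: the pool starts with direct
		 * recycling disabled, so destroying it on a failed restart
		 * will not trip the NAPI check.
		 */
	};

	return page_pool_create(&pp_params);
}

static void foo_commit_restart(struct page_pool *pool,
			       struct napi_struct *napi)
{
	/* Called only once the restart can no longer fail.  Idempotent,
	 * so invoking it for both head_pool and page_pool is fine even
	 * when they reference the same pool.
	 */
	page_pool_enable_direct_recycling(pool, napi);
}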
Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
Tested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- add kdoc
- WARN_ON_ONCE -> WARN_ON
v1: https://lore.kernel.org/20250801173011.2454447-1-kuba@kernel.org
Thanks to David Wei for confirming the problem on bnxt and testing
the fix. I hit this writing the fbnic support for ZC, TBH.
Any driver where the NAPI instance gets reused rather than reallocated on
each queue restart may have this problem. netdevsim doesn't, because the
callbacks can't fail in funny ways there.
CC: michael.chan@broadcom.com
CC: pavan.chebbi@broadcom.com
CC: hawk@kernel.org
CC: ilias.apalodimas@linaro.org
CC: dw@davidwei.uk
CC: almasrymina@google.com
CC: sdf@fomichev.me
---
include/net/page_pool/types.h | 2 ++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++++-
net/core/page_pool.c | 29 +++++++++++++++++++++++
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 431b593de709..1509a536cb85 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -265,6 +265,8 @@ struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
struct xdp_mem_info;
#ifdef CONFIG_PAGE_POOL
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi);
void page_pool_disable_direct_recycling(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5578ddcb465d..76a4c5ae8000 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3819,7 +3819,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
if (BNXT_RX_PAGE_MODE(bp))
pp.pool_size += bp->rx_ring_size / rx_size_fac;
pp.nid = numa_node;
- pp.napi = &rxr->bnapi->napi;
pp.netdev = bp->dev;
pp.dev = &bp->pdev->dev;
pp.dma_dir = bp->rx_dir;
@@ -3851,6 +3850,12 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
return PTR_ERR(pool);
}
+static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
+{
+ page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
+ page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
+}
+
static int bnxt_alloc_rx_agg_bmap(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
{
u16 mem_size;
@@ -3889,6 +3894,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
rc = bnxt_alloc_rx_page_pool(bp, rxr, cpu_node);
if (rc)
return rc;
+ bnxt_enable_rx_page_pool(rxr);
rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
if (rc < 0)
@@ -16031,6 +16037,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
goto err_reset;
}
+ bnxt_enable_rx_page_pool(rxr);
napi_enable_locked(&bnapi->napi);
bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 05e2e22a8f7c..343a6cac21e3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1201,6 +1201,35 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
pool->xdp_mem_id = mem->id;
}
+/**
+ * page_pool_enable_direct_recycling() - mark page pool as owned by NAPI
+ * @pool: page pool to modify
+ * @napi: NAPI instance to associate the page pool with
+ *
+ * Associate a page pool with a NAPI instance for lockless page recycling.
+ * This is useful when a new page pool has to be added to a NAPI instance
+ * without disabling that NAPI instance, to mark the point at which control
+ * path "hands over" the page pool to the NAPI instance. In most cases driver
+ * can simply set the @napi field in struct page_pool_params, and does not
+ * have to call this helper.
+ *
+ * The function is idempotent, but does not implement any refcounting.
+ * Single page_pool_disable_direct_recycling() will disable recycling,
+ * no matter how many times enable was called.
+ */
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi)
+{
+ if (READ_ONCE(pool->p.napi) == napi)
+ return;
+ WARN_ON(!napi || pool->p.napi);
+
+ mutex_lock(&page_pools_lock);
+ WRITE_ONCE(pool->p.napi, napi);
+ mutex_unlock(&page_pools_lock);
+}
+EXPORT_SYMBOL(page_pool_enable_direct_recycling);
+
void page_pool_disable_direct_recycling(struct page_pool *pool)
{
/* Disable direct recycling based on pool->cpuid.
--
2.50.1
* Re: [PATCH net v2] net: page_pool: allow enabling recycling late, fix false positive warning
2025-08-05 0:36 [PATCH net v2] net: page_pool: allow enabling recycling late, fix false positive warning Jakub Kicinski
@ 2025-08-05 8:03 ` Jesper Dangaard Brouer
2025-08-08 20:10 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2025-08-05 8:03 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, David Wei,
michael.chan, pavan.chebbi, ilias.apalodimas, almasrymina, sdf
On 05/08/2025 02.36, Jakub Kicinski wrote:
> Page pool can have pages "directly" (locklessly) recycled to it,
> if the NAPI that owns the page pool is scheduled to run on the same CPU.
> To make this safe we check that the NAPI is disabled while we destroy
> the page pool. In most cases NAPI and page pool lifetimes are tied
> together so this happens naturally.
>
> The queue API expects the following order of calls:
> -> mem_alloc
> alloc new pp
> -> stop
> napi_disable
> -> start
> napi_enable
> -> mem_free
> free old pp
>
> Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
> But the NAPIs are only stopped between ->stop and ->start. We created
> page_pool_disable_direct_recycling() to safely shut down the recycling
> in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
> have to worry about recycling any more.
>
> Unfortunately, page_pool_disable_direct_recycling() alone is not enough
> to deal with failures which necessitate freeing the _new_ page pool.
> If we hit a failure in ->mem_alloc or ->stop, the new page pool has
> to be freed while the NAPI is active (assuming the driver attaches the
> page pool to an existing NAPI instance and doesn't reallocate NAPIs).
>
> Freeing the new page pool is technically safe because it hasn't been
> used for any packets, yet, so there can be no recycling. But the check
> in napi_assert_will_not_race() has no way of knowing that. We could
> check whether the page pool is empty, but that'd make the check much less
> likely to trigger during development.
>
> Add page_pool_enable_direct_recycling(), pairing with
> page_pool_disable_direct_recycling(). It will allow us to create the new
> page pools in "disabled" state and only enable recycling when we know
> the reconfig operation will not fail.
>
> Coincidentally, it will also let us re-enable recycling for the old
> pool if the reconfig failed:
>
> -> mem_alloc (new)
> -> stop (old)
> # disables direct recycling for old
> -> start (new)
> # fail!!
> -> start (old)
> # go back to old pp but direct recycling is lost 🙁
> -> mem_free (new)
>
> The new helper is idempotent, to make life easier for drivers
> which can operate in HDS mode and support zero-copy Rx.
> Such a driver can call the helper twice whether it has two separate
> pools or multiple references to a single pool.
>
> Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
> Tested-by: David Wei <dw@davidwei.uk>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
> - add kdoc
> - WARN_ON_ONCE -> WARN_ON
> v1: https://lore.kernel.org/20250801173011.2454447-1-kuba@kernel.org
LGTM - thanks for adjusting :-)
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
--Jesper
* Re: [PATCH net v2] net: page_pool: allow enabling recycling late, fix false positive warning
2025-08-05 0:36 [PATCH net v2] net: page_pool: allow enabling recycling late, fix false positive warning Jakub Kicinski
2025-08-05 8:03 ` Jesper Dangaard Brouer
@ 2025-08-08 20:10 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-08 20:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, dw,
michael.chan, pavan.chebbi, hawk, ilias.apalodimas, almasrymina,
sdf
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 4 Aug 2025 17:36:54 -0700 you wrote:
> Page pool can have pages "directly" (locklessly) recycled to it,
> if the NAPI that owns the page pool is scheduled to run on the same CPU.
> To make this safe we check that the NAPI is disabled while we destroy
> the page pool. In most cases NAPI and page pool lifetimes are tied
> together so this happens naturally.
>
> The queue API expects the following order of calls:
> -> mem_alloc
> alloc new pp
> -> stop
> napi_disable
> -> start
> napi_enable
> -> mem_free
> free old pp
>
> [...]
Here is the summary with links:
- [net,v2] net: page_pool: allow enabling recycling late, fix false positive warning
https://git.kernel.org/netdev/net/c/64fdaa94bfe0
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html