* [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning
@ 2025-08-01 17:30 Jakub Kicinski
From: Jakub Kicinski @ 2025-08-01 17:30 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
David Wei, michael.chan, pavan.chebbi, hawk, ilias.apalodimas,
almasrymina, sdf
Page pool can have pages "directly" (locklessly) recycled to it,
if the NAPI that owns the page pool is scheduled to run on the same CPU.
To make this safe we check that the NAPI is disabled while we destroy
the page pool. In most cases NAPI and page pool lifetimes are tied
together so this happens naturally.
The queue API expects the following order of calls:
-> mem_alloc
alloc new pp
-> stop
napi_disable
-> start
napi_enable
-> mem_free
free old pp
Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
But the NAPIs are only stopped between ->stop and ->start. We created
page_pool_disable_direct_recycling() to safely shut down the recycling
in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
have to worry about recycling any more.
Unfortunately, page_pool_disable_direct_recycling() is not enough
to deal with failures which necessitate freeing the _new_ page pool.
If we hit a failure in ->mem_alloc or ->stop the new page pool has
to be freed while the NAPI is active (assuming driver attaches the
page pool to an existing NAPI instance and doesn't reallocate NAPIs).
Freeing the new page pool is technically safe because it hasn't been
used for any packets yet, so there can be no recycling. But the check
in napi_assert_will_not_race() has no way of knowing that. We could
check if the page pool is empty, but that'd make the check much less
likely to trigger during development.
Add page_pool_enable_direct_recycling(), pairing with
page_pool_disable_direct_recycling(). It will allow us to create the new
page pools in "disabled" state and only enable recycling when we know
the reconfig operation will not fail.
Coincidentally it will also let us re-enable the recycling for the old
pool, if the reconfig failed:
-> mem_alloc (new)
-> stop (old)
# disables direct recycling for old
-> start (new)
# fail!!
-> start (old)
# go back to old pp but direct recycling is lost :(
-> mem_free (new)
Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
Tested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Thanks to David Wei for confirming the problem on bnxt and testing
the fix. I hit this writing the fbnic support for ZC, TBH.
Any driver where the NAPI instance gets reused rather than reallocated
on each queue restart may have this problem. netdevsim doesn't, because
the callbacks can't fail in funny ways there.
CC: michael.chan@broadcom.com
CC: pavan.chebbi@broadcom.com
CC: hawk@kernel.org
CC: ilias.apalodimas@linaro.org
CC: dw@davidwei.uk
CC: almasrymina@google.com
CC: sdf@fomichev.me
---
include/net/page_pool/types.h | 2 ++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++++++-
net/core/page_pool.c | 13 +++++++++++++
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 431b593de709..1509a536cb85 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -265,6 +265,8 @@ struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
struct xdp_mem_info;
#ifdef CONFIG_PAGE_POOL
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi);
void page_pool_disable_direct_recycling(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5578ddcb465d..76a4c5ae8000 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3819,7 +3819,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
if (BNXT_RX_PAGE_MODE(bp))
pp.pool_size += bp->rx_ring_size / rx_size_fac;
pp.nid = numa_node;
- pp.napi = &rxr->bnapi->napi;
pp.netdev = bp->dev;
pp.dev = &bp->pdev->dev;
pp.dma_dir = bp->rx_dir;
@@ -3851,6 +3850,12 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
return PTR_ERR(pool);
}
+static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
+{
+ page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
+ page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
+}
+
static int bnxt_alloc_rx_agg_bmap(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
{
u16 mem_size;
@@ -3889,6 +3894,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
rc = bnxt_alloc_rx_page_pool(bp, rxr, cpu_node);
if (rc)
return rc;
+ bnxt_enable_rx_page_pool(rxr);
rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
if (rc < 0)
@@ -16031,6 +16037,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
goto err_reset;
}
+ bnxt_enable_rx_page_pool(rxr);
napi_enable_locked(&bnapi->napi);
bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 05e2e22a8f7c..c0ff1d0e0a7a 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1201,6 +1201,19 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
pool->xdp_mem_id = mem->id;
}
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi)
+{
+ if (READ_ONCE(pool->p.napi) == napi)
+ return;
+ WARN_ON_ONCE(!napi || pool->p.napi);
+
+ mutex_lock(&page_pools_lock);
+ WRITE_ONCE(pool->p.napi, napi);
+ mutex_unlock(&page_pools_lock);
+}
+EXPORT_SYMBOL(page_pool_enable_direct_recycling);
+
void page_pool_disable_direct_recycling(struct page_pool *pool)
{
/* Disable direct recycling based on pool->cpuid.
--
2.50.1
* Re: [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning
From: Stanislav Fomichev @ 2025-08-01 20:55 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, David Wei,
michael.chan, pavan.chebbi, hawk, ilias.apalodimas, almasrymina,
sdf
On 08/01, Jakub Kicinski wrote:
> Page pool can have pages "directly" (locklessly) recycled to it,
> if the NAPI that owns the page pool is scheduled to run on the same CPU.
> To make this safe we check that the NAPI is disabled while we destroy
> the page pool. In most cases NAPI and page pool lifetimes are tied
> together so this happens naturally.
>
> The queue API expects the following order of calls:
> -> mem_alloc
> alloc new pp
> -> stop
> napi_disable
> -> start
> napi_enable
> -> mem_free
> free old pp
>
> Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
> But the NAPIs are only stopped between ->stop and ->start. We created
> page_pool_disable_direct_recycling() to safely shut down the recycling
> in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
> have to worry about recycling any more.
>
> Unfortunately, the page_pool_disable_direct_recycling() is not enough
> to deal with failures which necessitate freeing the _new_ page pool.
> If we hit a failure in ->mem_alloc or ->stop the new page pool has
> to be freed while the NAPI is active (assuming driver attaches the
> page pool to an existing NAPI instance and doesn't reallocate NAPIs).
>
> Freeing the new page pool is technically safe because it hasn't been
> used for any packets, yet, so there can be no recycling. But the check
> in napi_assert_will_not_race() has no way of knowing that. We could
> check if page pool is empty but that'd make the check much less likely
> to trigger during development.
>
> Add page_pool_enable_direct_recycling(), pairing with
> page_pool_disable_direct_recycling(). It will allow us to create the new
> page pools in "disabled" state and only enable recycling when we know
> the reconfig operation will not fail.
>
> Coincidentally it will also let us re-enable the recycling for the old
> pool, if the reconfig failed:
>
> -> mem_alloc (new)
> -> stop (old)
> # disables direct recycling for old
> -> start (new)
> # fail!!
> -> start (old)
> # go back to old pp but direct recycling is lost :(
> -> mem_free (new)
>
> Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
> Tested-by: David Wei <dw@davidwei.uk>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> Thanks to David Wei for confirming the problem on bnxt and testing
> the fix. I hit this writing the fbnic support for ZC, TBH.
> Any driver where NAPI instance gets reused and not reallocated on each
> queue restart may have this problem. netdevsim doesn't 'cause the
> callbacks can't fail in funny ways there.
>
> CC: michael.chan@broadcom.com
> CC: pavan.chebbi@broadcom.com
> CC: hawk@kernel.org
> CC: ilias.apalodimas@linaro.org
> CC: dw@davidwei.uk
> CC: almasrymina@google.com
> CC: sdf@fomichev.me
> ---
> include/net/page_pool/types.h | 2 ++
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++++++-
> net/core/page_pool.c | 13 +++++++++++++
> 3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 431b593de709..1509a536cb85 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -265,6 +265,8 @@ struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
> struct xdp_mem_info;
>
> #ifdef CONFIG_PAGE_POOL
> +void page_pool_enable_direct_recycling(struct page_pool *pool,
> + struct napi_struct *napi);
> void page_pool_disable_direct_recycling(struct page_pool *pool);
> void page_pool_destroy(struct page_pool *pool);
> void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 5578ddcb465d..76a4c5ae8000 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -3819,7 +3819,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> if (BNXT_RX_PAGE_MODE(bp))
> pp.pool_size += bp->rx_ring_size / rx_size_fac;
> pp.nid = numa_node;
> - pp.napi = &rxr->bnapi->napi;
> pp.netdev = bp->dev;
> pp.dev = &bp->pdev->dev;
> pp.dma_dir = bp->rx_dir;
> @@ -3851,6 +3850,12 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> return PTR_ERR(pool);
> }
>
> +static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
> +{
> + page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
> + page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
We do a bnxt_separate_head_pool() check for the disable_direct_recycling
of head_pool. Is it safe to skip the check here because we always allocate two
pps from the queue_mgmt callbacks? (not clear to me from a quick glance at
bnxt_alloc_rx_page_pool)
* Re: [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning
From: Jakub Kicinski @ 2025-08-01 21:05 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, David Wei,
michael.chan, pavan.chebbi, hawk, ilias.apalodimas, almasrymina,
sdf
On Fri, 1 Aug 2025 13:55:09 -0700 Stanislav Fomichev wrote:
> > +static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
> > +{
> > + page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
> > + page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
>
> We do bnxt_separate_head_pool check for the disable_direct_recycling
> of head_pool. Is it safe to skip the check here because we always allocate two
> pps from queue_mgmt callbacks? (not clear for me from a quick glance at
> bnxt_alloc_rx_page_pool)
It's safe (I hope) because the helper is duplicate-call-friendly:
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi)
+{
+ if (READ_ONCE(pool->p.napi) == napi) <<< right here
+ return;
+ WARN_ON_ONCE(!napi || pool->p.napi);
+
+ mutex_lock(&page_pools_lock);
+ WRITE_ONCE(pool->p.napi, napi);
+ mutex_unlock(&page_pools_lock);
+}
We already have a refcount in page pool, I'm planning to add
page_pool_get() in net-next and remove the
if (bnxt_separate_head_pool)
before page_pool_destroy(), too.
* Re: [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning
From: Stanislav Fomichev @ 2025-08-01 21:36 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, David Wei,
michael.chan, pavan.chebbi, hawk, ilias.apalodimas, almasrymina,
sdf
On 08/01, Jakub Kicinski wrote:
> On Fri, 1 Aug 2025 13:55:09 -0700 Stanislav Fomichev wrote:
> > > +static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
> > > +{
> > > + page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
> > > + page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
> >
> > We do bnxt_separate_head_pool check for the disable_direct_recycling
> > of head_pool. Is it safe to skip the check here because we always allocate two
> > pps from queue_mgmt callbacks? (not clear for me from a quick glance at
> > bnxt_alloc_rx_page_pool)
>
> It's safe (I hope) because the helper is duplicate-call-friendly:
>
> +void page_pool_enable_direct_recycling(struct page_pool *pool,
> + struct napi_struct *napi)
> +{
> + if (READ_ONCE(pool->p.napi) == napi) <<< right here
> + return;
> + WARN_ON_ONCE(!napi || pool->p.napi);
> +
> + mutex_lock(&page_pools_lock);
> + WRITE_ONCE(pool->p.napi, napi);
> + mutex_unlock(&page_pools_lock);
> +}
>
> We already have a refcount in page pool, I'm planning to add
> page_pool_get() in net-next and remove the
>
> if (bnxt_separate_head_pool)
>
> before page_pool_destroy(), too.
Ah, I see, I missed the fact that page_pool and head_pool point to the
same address when we don't have separate pools. Makes sense, thanks!
* Re: [PATCH net] net: page_pool: allow enabling recycling late, fix false positive warning
From: Jesper Dangaard Brouer @ 2025-08-04 14:17 UTC (permalink / raw)
To: Stanislav Fomichev, Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, David Wei,
michael.chan, pavan.chebbi, ilias.apalodimas, almasrymina, sdf
On 01/08/2025 23.36, Stanislav Fomichev wrote:
> On 08/01, Jakub Kicinski wrote:
>> On Fri, 1 Aug 2025 13:55:09 -0700 Stanislav Fomichev wrote:
>>>> +static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
>>>> +{
>>>> + page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
>>>> + page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
>>>
>>> We do bnxt_separate_head_pool check for the disable_direct_recycling
>>> of head_pool. Is it safe to skip the check here because we always allocate two
>>> pps from queue_mgmt callbacks? (not clear for me from a quick glance at
>>> bnxt_alloc_rx_page_pool)
>>
>> It's safe (I hope) because the helper is duplicate-call-friendly:
>>
>> +void page_pool_enable_direct_recycling(struct page_pool *pool,
>> + struct napi_struct *napi)
>> +{
>> + if (READ_ONCE(pool->p.napi) == napi) <<< right here
>> + return;
>> + WARN_ON_ONCE(!napi || pool->p.napi);
Why only warn once?
(this is a setup code path, so it should not get invoked a lot)
>> +
>> + mutex_lock(&page_pools_lock);
>> + WRITE_ONCE(pool->p.napi, napi);
>> + mutex_unlock(&page_pools_lock);
>> +}
>>
>> We already have a refcount in page pool, I'm planning to add
>> page_pool_get() in net-next and remove the
>>
>> if (bnxt_separate_head_pool)
>>
>> before page_pool_destroy(), too.
>
> Ah, I see, I missed that fact that page_pool and head_pool point to the
> same address when we don't have separate pools. Makes sense, thanks!
If you depend on this side-effect of the API, we better document that
this is the intent of the API. E.g.
Calling this function several times with the same page_pool and napi is
safe and only enables it on the first call. Otherwise it is not allowed
to enable an already enabled page_pool.
(... that's what the WARN is about, right?)
The bnxt driver code use-case for doing this seems scary and hard to
follow, e.g. having 'page_pool' and 'head_pool' point to the same
address, sometimes... I would also add a comment about this in the
driver, but I find this optional and up to the driver maintainer.
--Jesper
pw-bot: cr