netdev.vger.kernel.org archive mirror
* [PATCH net] net: page_pool: don't try to stash the napi id
@ 2025-01-23 23:16 Jakub Kicinski
  2025-01-24 21:00 ` Mina Almasry
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2025-01-23 23:16 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	hawk, ilias.apalodimas, asml.silence, almasrymina, kaiyuanz,
	willemb, mkarsten, jdamato

Page ppol tried to cache the NAPI ID in page pool info to avoid
having a dependency on the life cycle of the NAPI instance.
Since commit under Fixes the NAPI ID is not populated until
napi_enable() and there's a good chance that page pool is
created before NAPI gets enabled.

Protect the NAPI pointer with the existing page pool mutex,
the reading path already holds it. napi_id itself we need
to READ_ONCE(), it's protected by netdev_lock() which we are
not holding in page pool.

Before this patch napi IDs were missing for mlx5:

 # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
  {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
  {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
  {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
  ...

After:

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
   'napi-id': 565},
  {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
   'napi-id': 525},
  {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
   'napi-id': 524},
  ...

Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: hawk@kernel.org
CC: ilias.apalodimas@linaro.org
CC: asml.silence@gmail.com
CC: almasrymina@google.com
CC: kaiyuanz@google.com
CC: willemb@google.com
CC: mkarsten@uwaterloo.ca
CC: jdamato@fastly.com
---
 include/net/page_pool/types.h |  1 -
 net/core/page_pool_priv.h     |  2 ++
 net/core/dev.c                |  2 +-
 net/core/page_pool.c          |  2 ++
 net/core/page_pool_user.c     | 15 +++++++++------
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..7f405672b089 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -237,7 +237,6 @@ struct page_pool {
 	struct {
 		struct hlist_node list;
 		u64 detach_time;
-		u32 napi_id;
 		u32 id;
 	} user;
 };
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index 57439787b9c2..2fb06d5f6d55 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -7,6 +7,8 @@
 
 #include "netmem_priv.h"
 
+extern struct mutex page_pools_lock;
+
 s32 page_pool_inflight(const struct page_pool *pool, bool strict);
 
 int page_pool_list(struct page_pool *pool);
diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..07b2bb1ce64f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6708,7 +6708,7 @@ void napi_resume_irqs(unsigned int napi_id)
 static void __napi_hash_add_with_id(struct napi_struct *napi,
 				    unsigned int napi_id)
 {
-	napi->napi_id = napi_id;
+	WRITE_ONCE(napi->napi_id, napi_id);
 	hlist_add_head_rcu(&napi->napi_hash_node,
 			   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
 }
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a3de752c5178..ed0f89373259 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1147,7 +1147,9 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
 	WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
 	WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
 
+	mutex_lock(&page_pools_lock);
 	WRITE_ONCE(pool->p.napi, NULL);
+	mutex_unlock(&page_pools_lock);
 }
 EXPORT_SYMBOL(page_pool_disable_direct_recycling);
 
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 48335766c1bf..6677e0c2e256 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -3,6 +3,7 @@
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
 #include <linux/xarray.h>
+#include <net/busy_poll.h>
 #include <net/net_debug.h>
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
@@ -14,10 +15,11 @@
 #include "netdev-genl-gen.h"
 
 static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
-/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
+/* Protects: page_pools, netdevice->page_pools, pool->p.napi, pool->slow.netdev,
+ *	pool->user.
  * Ordering: inside rtnl_lock
  */
-static DEFINE_MUTEX(page_pools_lock);
+DEFINE_MUTEX(page_pools_lock);
 
 /* Page pools are only reachable from user space (via netlink) if they are
  * linked to a netdev at creation time. Following page pool "visibility"
@@ -216,6 +218,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 {
 	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
 	size_t inflight, refsz;
+	unsigned int napi_id;
 	void *hdr;
 
 	hdr = genlmsg_iput(rsp, info);
@@ -229,8 +232,10 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 	    nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
 			pool->slow.netdev->ifindex))
 		goto err_cancel;
-	if (pool->user.napi_id &&
-	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
+
+	napi_id = pool->p.napi ? READ_ONCE(pool->p.napi->napi_id) : 0;
+	if (napi_id >= MIN_NAPI_ID &&
+	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, napi_id))
 		goto err_cancel;
 
 	inflight = page_pool_inflight(pool, false);
@@ -319,8 +324,6 @@ int page_pool_list(struct page_pool *pool)
 	if (pool->slow.netdev) {
 		hlist_add_head(&pool->user.list,
 			       &pool->slow.netdev->page_pools);
-		pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
-
 		netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
 	}
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-23 23:16 [PATCH net] net: page_pool: don't try to stash the napi id Jakub Kicinski
@ 2025-01-24 21:00 ` Mina Almasry
  2025-01-24 22:18   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 8+ messages in thread
From: Mina Almasry @ 2025-01-24 21:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, hawk,
	ilias.apalodimas, asml.silence, kaiyuanz, willemb, mkarsten,
	jdamato

On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Page ppol tried to cache the NAPI ID in page pool info to avoid

Page pool

> having a dependency on the life cycle of the NAPI instance.
> Since commit under Fixes the NAPI ID is not populated until
> napi_enable() and there's a good chance that page pool is
> created before NAPI gets enabled.
>
> Protect the NAPI pointer with the existing page pool mutex,
> the reading path already holds it. napi_id itself we need

The reading paths in page_pool.c don't hold the lock, no? Only the
reading paths in page_pool_user.c seem to do.

I could not immediately wrap my head around why pool->p.napi can be
accessed in page_pool_napi_local with no lock, but needs to be
protected in the code in page_pool_user.c. It seems
READ_ONCE/WRITE_ONCE protection is good enough to make sure
page_pool_napi_local doesn't race with
page_pool_disable_direct_recycling in a way that can crash (the
reading code either sees a valid pointer or NULL). Why is that not
good enough to also synchronize the accesses between
page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
the locking?

Is there some guarantee the napi won't change/get freed while
page_pool_napi_local is running, but can change while page_pool_nl_fill is
running?

> to READ_ONCE(), it's protected by netdev_lock() which we are
> not holding in page pool.
>
> Before this patch napi IDs were missing for mlx5:
>
>  # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get
>
>  [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
>   {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
>   {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
>   {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
>   ...
>
> After:
>
>  [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
>    'napi-id': 565},
>   {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
>    'napi-id': 525},
>   {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
>    'napi-id': 524},
>   ...
>
> Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: hawk@kernel.org
> CC: ilias.apalodimas@linaro.org
> CC: asml.silence@gmail.com
> CC: almasrymina@google.com
> CC: kaiyuanz@google.com
> CC: willemb@google.com
> CC: mkarsten@uwaterloo.ca
> CC: jdamato@fastly.com
> ---
>  include/net/page_pool/types.h |  1 -
>  net/core/page_pool_priv.h     |  2 ++
>  net/core/dev.c                |  2 +-
>  net/core/page_pool.c          |  2 ++
>  net/core/page_pool_user.c     | 15 +++++++++------
>  5 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index ed4cd114180a..7f405672b089 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -237,7 +237,6 @@ struct page_pool {
>         struct {
>                 struct hlist_node list;
>                 u64 detach_time;
> -               u32 napi_id;
>                 u32 id;
>         } user;
>  };
> diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
> index 57439787b9c2..2fb06d5f6d55 100644
> --- a/net/core/page_pool_priv.h
> +++ b/net/core/page_pool_priv.h
> @@ -7,6 +7,8 @@
>
>  #include "netmem_priv.h"
>
> +extern struct mutex page_pools_lock;
> +
>  s32 page_pool_inflight(const struct page_pool *pool, bool strict);
>
>  int page_pool_list(struct page_pool *pool);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index afa2282f2604..07b2bb1ce64f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6708,7 +6708,7 @@ void napi_resume_irqs(unsigned int napi_id)
>  static void __napi_hash_add_with_id(struct napi_struct *napi,
>                                     unsigned int napi_id)
>  {
> -       napi->napi_id = napi_id;
> +       WRITE_ONCE(napi->napi_id, napi_id);
>         hlist_add_head_rcu(&napi->napi_hash_node,
>                            &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
>  }
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a3de752c5178..ed0f89373259 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1147,7 +1147,9 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
>         WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
>         WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
>
> +       mutex_lock(&page_pools_lock);
>         WRITE_ONCE(pool->p.napi, NULL);
> +       mutex_unlock(&page_pools_lock);
>  }
>  EXPORT_SYMBOL(page_pool_disable_direct_recycling);
>
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 48335766c1bf..6677e0c2e256 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -3,6 +3,7 @@
>  #include <linux/mutex.h>
>  #include <linux/netdevice.h>
>  #include <linux/xarray.h>
> +#include <net/busy_poll.h>
>  #include <net/net_debug.h>
>  #include <net/netdev_rx_queue.h>
>  #include <net/page_pool/helpers.h>
> @@ -14,10 +15,11 @@
>  #include "netdev-genl-gen.h"
>
>  static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
> -/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
> +/* Protects: page_pools, netdevice->page_pools, pool->p.napi, pool->slow.netdev,
> + *     pool->user.
>   * Ordering: inside rtnl_lock
>   */
> -static DEFINE_MUTEX(page_pools_lock);
> +DEFINE_MUTEX(page_pools_lock);
>
>  /* Page pools are only reachable from user space (via netlink) if they are
>   * linked to a netdev at creation time. Following page pool "visibility"
> @@ -216,6 +218,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
>  {
>         struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
>         size_t inflight, refsz;
> +       unsigned int napi_id;
>         void *hdr;
>
>         hdr = genlmsg_iput(rsp, info);
> @@ -229,8 +232,10 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
>             nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
>                         pool->slow.netdev->ifindex))
>                 goto err_cancel;
> -       if (pool->user.napi_id &&
> -           nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
> +
> +       napi_id = pool->p.napi ? READ_ONCE(pool->p.napi->napi_id) : 0;

Following up on the above, I wonder if this can be similar to the code in
page_pool_napi_local to work without the mutex protection:

napi = READ_ONCE(pool->p.napi);
if (napi)
   napi_id = READ_ONCE(napi->napi_id);

> +       if (napi_id >= MIN_NAPI_ID &&

I think this check is added to filter out 0? Nit: I would check for 0
here, since any non zero napi_id should come from the napi->napi_id,
which should be valid, but not a necessary change.
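
Spelled out, combining the two suggestions (untested sketch; assumes a
local struct napi_struct *napi on top of the napi_id variable the
patch already adds):

	napi = READ_ONCE(pool->p.napi);
	napi_id = napi ? READ_ONCE(napi->napi_id) : 0;
	if (napi_id &&
	    nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, napi_id))
		goto err_cancel;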

> +           nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, napi_id))
>                 goto err_cancel;
>
>         inflight = page_pool_inflight(pool, false);
> @@ -319,8 +324,6 @@ int page_pool_list(struct page_pool *pool)
>         if (pool->slow.netdev) {
>                 hlist_add_head(&pool->user.list,
>                                &pool->slow.netdev->page_pools);
> -               pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
> -
>                 netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
>         }
>
> --
> 2.48.1
>

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-24 21:00 ` Mina Almasry
@ 2025-01-24 22:18   ` Toke Høiland-Jørgensen
  2025-01-24 23:49     ` Mina Almasry
  2025-01-25  1:31     ` Jakub Kicinski
  0 siblings, 2 replies; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-01-24 22:18 UTC (permalink / raw)
  To: Mina Almasry, Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, hawk,
	ilias.apalodimas, asml.silence, kaiyuanz, willemb, mkarsten,
	jdamato

Mina Almasry <almasrymina@google.com> writes:

> On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Page ppol tried to cache the NAPI ID in page pool info to avoid
>
> Page pool
>
>> having a dependency on the life cycle of the NAPI instance.
>> Since commit under Fixes the NAPI ID is not populated until
>> napi_enable() and there's a good chance that page pool is
>> created before NAPI gets enabled.
>>
>> Protect the NAPI pointer with the existing page pool mutex,
>> the reading path already holds it. napi_id itself we need
>
> The reading paths in page_pool.c don't hold the lock, no? Only the
> reading paths in page_pool_user.c seem to do.
>
> I could not immediately wrap my head around why pool->p.napi can be
> accessed in page_pool_napi_local with no lock, but needs to be
> protected in the code in page_pool_user.c. It seems
> READ_ONCE/WRITE_ONCE protection is good enough to make sure
> page_pool_napi_local doesn't race with
> page_pool_disable_direct_recycling in a way that can crash (the
> reading code either sees a valid pointer or NULL). Why is that not
> good enough to also synchronize the accesses between
> page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> the locking?

It actually seems that this is *not* currently the case. See the
discussion here:

https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/

IMO (as indicated in the message linked above), we should require users
to destroy the page pool before freeing the NAPI memory, rather than add
additional synchronisation.
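
I.e. roughly this teardown ordering on the driver side (sketch only;
the rq struct and its fields are made-up stand-ins, not any particular
driver's code):

	napi_disable(&rq->napi);
	page_pool_destroy(rq->page_pool);	/* pool gone first */
	netif_napi_del(&rq->napi);
	kfree(rq);				/* NAPI memory freed last */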

-Toke


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-24 22:18   ` Toke Høiland-Jørgensen
@ 2025-01-24 23:49     ` Mina Almasry
  2025-01-27 13:31       ` Toke Høiland-Jørgensen
  2025-01-25  1:31     ` Jakub Kicinski
  1 sibling, 1 reply; 8+ messages in thread
From: Mina Almasry @ 2025-01-24 23:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, hawk, ilias.apalodimas, asml.silence, kaiyuanz, willemb,
	mkarsten, jdamato

On Fri, Jan 24, 2025 at 2:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Mina Almasry <almasrymina@google.com> writes:
>
> > On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> Page ppol tried to cache the NAPI ID in page pool info to avoid
> >
> > Page pool
> >
> >> having a dependency on the life cycle of the NAPI instance.
> >> Since commit under Fixes the NAPI ID is not populated until
> >> napi_enable() and there's a good chance that page pool is
> >> created before NAPI gets enabled.
> >>
> >> Protect the NAPI pointer with the existing page pool mutex,
> >> the reading path already holds it. napi_id itself we need
> >
> > The reading paths in page_pool.c don't hold the lock, no? Only the
> > reading paths in page_pool_user.c seem to do.
> >
> > I could not immediately wrap my head around why pool->p.napi can be
> > accessed in page_pool_napi_local with no lock, but needs to be
> > protected in the code in page_pool_user.c. It seems
> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
> > page_pool_napi_local doesn't race with
> > page_pool_disable_direct_recycling in a way that can crash (the
> > reading code either sees a valid pointer or NULL). Why is that not
> > good enough to also synchronize the accesses between
> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> > the locking?
>
> It actually seems that this is *not* currently the case. See the
> discussion here:
>
> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
>
> IMO (as indicated in the message linked above), we should require users
> to destroy the page pool before freeing the NAPI memory, rather than add
> additional synchronisation.
>

Ah, I see. I wonder if we should make this part of the API via comment
and/or add DEBUG_NET_WARN_ON to catch misuse, something like:

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..3919ca302e95 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -257,6 +257,10 @@ struct xdp_mem_info;

 #ifdef CONFIG_PAGE_POOL
 void page_pool_disable_direct_recycling(struct page_pool *pool);
+
+/* page_pool_destroy or page_pool_disable_direct_recycling must be
+ * called before netif_napi_del if pool->p.napi is set.
+ */
 void page_pool_destroy(struct page_pool *pool);
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
                           const struct xdp_mem_info *mem);

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5c4b788b811b..dc82767b2516 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1161,6 +1161,8 @@ void page_pool_destroy(struct page_pool *pool)
        if (!page_pool_put(pool))
                return;

+       DEBUG_NET_WARN_ON(pool->p.napi && !napi_is_valid(pool->p.napi));
+
        page_pool_disable_direct_recycling(pool);
        page_pool_free_frag(pool);

I also took a quick spot check - which could be wrong - but it seems
to me both gve and bnxt free the napi before destroying the pool :(

But I think this entire discussion is unrelated to this patch, and
the mutex sync in this patch seems necessary for the page_pool_user.c
code which runs outside of softirq context:

Reviewed-by: Mina Almasry <almasrymina@google.com>


-- 
Thanks,
Mina

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-24 22:18   ` Toke Høiland-Jørgensen
  2025-01-24 23:49     ` Mina Almasry
@ 2025-01-25  1:31     ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-01-25  1:31 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Mina Almasry, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, hawk, ilias.apalodimas, asml.silence, kaiyuanz, willemb,
	mkarsten, jdamato

On Fri, 24 Jan 2025 23:18:08 +0100 Toke Høiland-Jørgensen wrote:
> > The reading paths in page_pool.c don't hold the lock, no? Only the
> > reading paths in page_pool_user.c seem to do.
> >
> > I could not immediately wrap my head around why pool->p.napi can be
> > accessed in page_pool_napi_local with no lock, but needs to be
> > protected in the code in page_pool_user.c. It seems
> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
> > page_pool_napi_local doesn't race with
> > page_pool_disable_direct_recycling in a way that can crash (the
> > reading code either sees a valid pointer or NULL). Why is that not
> > good enough to also synchronize the accesses between
> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
> > the locking?  
> 
> It actually seems that this is *not* currently the case. See the
> discussion here:
> 
> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
> 
> IMO (as indicated in the message linked above), we should require users
> to destroy the page pool before freeing the NAPI memory, rather than add
> additional synchronisation.

Agreed in general but this is a slightly different case. 
This sequence should be legal IMHO:

  page_pool_disable_direct_recycling()
  napi_disable()
  netif_napi_del()
  # free NAPI
  page_pool_destroy()

I'm not saying it's a good idea! but since
page_pool_disable_direct_recycling() detaches the NAPI,
logically someone could assume the above works.

I agree with you on datapath accesses, as discussed in the thread you
linked. But here the reader is not under RCU, so the RCU sync in NAPI
destruction does not protect us from reader stalling for a long time.
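
Concretely, without the mutex the window looks something like this
(illustrative interleaving, not real code):

  page_pool_nl_fill()                   driver teardown
  -------------------                   ---------------
  napi = READ_ONCE(pool->p.napi)
                                        page_pool_disable_direct_recycling()
                                        netif_napi_del(), free the NAPI
  READ_ONCE(napi->napi_id)              /* use-after-free */

With page_pools_lock taken around both the fill and the pointer clear,
the disable (and the free that follows it) can't land between the
pointer load and the napi_id load.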

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-24 23:49     ` Mina Almasry
@ 2025-01-27 13:31       ` Toke Høiland-Jørgensen
  2025-01-27 19:37         ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-01-27 13:31 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, hawk, ilias.apalodimas, asml.silence, kaiyuanz, willemb,
	mkarsten, jdamato

Mina Almasry <almasrymina@google.com> writes:

> On Fri, Jan 24, 2025 at 2:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Mina Almasry <almasrymina@google.com> writes:
>>
>> > On Thu, Jan 23, 2025 at 3:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >>
>> >> Page ppol tried to cache the NAPI ID in page pool info to avoid
>> >
>> > Page pool
>> >
>> >> having a dependency on the life cycle of the NAPI instance.
>> >> Since commit under Fixes the NAPI ID is not populated until
>> >> napi_enable() and there's a good chance that page pool is
>> >> created before NAPI gets enabled.
>> >>
>> >> Protect the NAPI pointer with the existing page pool mutex,
>> >> the reading path already holds it. napi_id itself we need
>> >
>> > The reading paths in page_pool.c don't hold the lock, no? Only the
>> > reading paths in page_pool_user.c seem to do.
>> >
>> > I could not immediately wrap my head around why pool->p.napi can be
>> > accessed in page_pool_napi_local with no lock, but needs to be
>> > protected in the code in page_pool_user.c. It seems
>> > READ_ONCE/WRITE_ONCE protection is good enough to make sure
>> > page_pool_napi_local doesn't race with
>> > page_pool_disable_direct_recycling in a way that can crash (the
>> > reading code either sees a valid pointer or NULL). Why is that not
>> > good enough to also synchronize the accesses between
>> > page_pool_disable_direct_recycling and page_pool_nl_fill? I.e., drop
>> > the locking?
>>
>> It actually seems that this is *not* currently the case. See the
>> discussion here:
>>
>> https://lore.kernel.org/all/8734h8qgmz.fsf@toke.dk/
>>
>> IMO (as indicated in the message linked above), we should require users
>> to destroy the page pool before freeing the NAPI memory, rather than add
>> additional synchronisation.
>>
>
> Ah, I see. I wonder if we should make this part of the API via comment
> and/or add DEBUG_NET_WARN_ON to catch misuse, something like:
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index ed4cd114180a..3919ca302e95 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -257,6 +257,10 @@ struct xdp_mem_info;
>
>  #ifdef CONFIG_PAGE_POOL
>  void page_pool_disable_direct_recycling(struct page_pool *pool);
> +
> +/* page_pool_destroy or page_pool_disable_direct_recycling must be
> + * called before netif_napi_del if pool->p.napi is set.
> + */
>  void page_pool_destroy(struct page_pool *pool);
>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>                            const struct xdp_mem_info *mem);
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 5c4b788b811b..dc82767b2516 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1161,6 +1161,8 @@ void page_pool_destroy(struct page_pool *pool)
>         if (!page_pool_put(pool))
>                 return;
>
> +       DEBUG_NET_WARN_ON(pool->p.napi && !napi_is_valid(pool->p.napi));
> +
>         page_pool_disable_direct_recycling(pool);
>         page_pool_free_frag(pool);

Yeah, good idea; care to send a proper patch? :)

> I also took a quick spot check - which could be wrong - but it seems
> to me both gve and bnxt free the napi before destroying the pool :(

Right, that fits with what Yunsheng found over in that other thread, at
least (for bnxt).

> But I think this entire discussion is unrelated to this patch, and
> the mutex sync in this patch seems necessary for the page_pool_user.c
> code which runs outside of softirq context:
>
> Reviewed-by: Mina Almasry <almasrymina@google.com>

Yeah, didn't really mean this comment to have anything to do with this
patch, just mentioned it since you were talking about the data path :)

For this patch, I agree, the mutex seems fine:

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-27 13:31       ` Toke Høiland-Jørgensen
@ 2025-01-27 19:37         ` Jakub Kicinski
  2025-01-27 19:41           ` Mina Almasry
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2025-01-27 19:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Mina Almasry, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, hawk, ilias.apalodimas, asml.silence, kaiyuanz, willemb,
	mkarsten, jdamato

On Mon, 27 Jan 2025 14:31:10 +0100 Toke Høiland-Jørgensen wrote:
> > +
> > +/* page_pool_destroy or page_pool_disable_direct_recycling must be
> > + * called before netif_napi_del if pool->p.napi is set.
> > + */

FWIW the comment is better placed on the warning, that's where people
will look when they hit it ;)

> >  void page_pool_destroy(struct page_pool *pool);
> >  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
> >                            const struct xdp_mem_info *mem);
> >
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 5c4b788b811b..dc82767b2516 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1161,6 +1161,8 @@ void page_pool_destroy(struct page_pool *pool)
> >         if (!page_pool_put(pool))
> >                 return;
> >
> > +       DEBUG_NET_WARN_ON(pool->p.napi && !napi_is_valid(pool->p.napi));

IDK what "napi_is_valid()" is. I think like this:

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a3de752c5178..837ed36472db 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1145,6 +1145,7 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
         * pool and NAPI are unlinked when NAPI is disabled.
         */
        WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
+       WARN_ON(!test_bit(NAPI_STATE_LISTED, &pool->p.napi->state));
        WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
 
        WRITE_ONCE(pool->p.napi, NULL);

Because page_pool_disable_direct_recycling() must also be called while
NAPI is listed. Technically we should also sync rcu if the driver calls
this directly, because NAPI may be reused :(
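
Roughly (sketch only, if a driver really went down that path):

	page_pool_disable_direct_recycling(pool);
	synchronize_rcu();	/* let in-flight softirq users of the old
				 * pool->p.napi drain before the NAPI
				 * instance is reused
				 */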

> >         page_pool_disable_direct_recycling(pool);
> >         page_pool_free_frag(pool);  
> 
> Yeah, good idea; care to send a proper patch? :)

...for net-next ? :)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] net: page_pool: don't try to stash the napi id
  2025-01-27 19:37         ` Jakub Kicinski
@ 2025-01-27 19:41           ` Mina Almasry
  0 siblings, 0 replies; 8+ messages in thread
From: Mina Almasry @ 2025-01-27 19:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, davem, netdev, edumazet, pabeni,
	andrew+netdev, horms, hawk, ilias.apalodimas, asml.silence,
	kaiyuanz, willemb, mkarsten, jdamato

On Mon, Jan 27, 2025 at 11:37 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 27 Jan 2025 14:31:10 +0100 Toke Høiland-Jørgensen wrote:
> > > +
> > > +/* page_pool_destroy or page_pool_disable_direct_recycling must be
> > > + * called before netif_napi_del if pool->p.napi is set.
> > > + */
>
> FWIW the comment is better placed on the warning, that's where people
> will look when they hit it ;)
>
> > >  void page_pool_destroy(struct page_pool *pool);
> > >  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
> > >                            const struct xdp_mem_info *mem);
> > >
> > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > index 5c4b788b811b..dc82767b2516 100644
> > > --- a/net/core/page_pool.c
> > > +++ b/net/core/page_pool.c
> > > @@ -1161,6 +1161,8 @@ void page_pool_destroy(struct page_pool *pool)
> > >         if (!page_pool_put(pool))
> > >                 return;
> > >
> > > +       DEBUG_NET_WARN_ON(pool->p.napi && !napi_is_valid(pool->p.napi));
>
> IDK what "napi_is_valid()" is. I think like this:
>

Yeah, napi_is_valid() was just pseudo code because I wasn't sure how
to actually check yet, but thanks for the diff below.

> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a3de752c5178..837ed36472db 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1145,6 +1145,7 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
>          * pool and NAPI are unlinked when NAPI is disabled.
>          */
>         WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
> +       WARN_ON(!test_bit(NAPI_STATE_LISTED, &pool->p.napi->state));
>         WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);
>
>         WRITE_ONCE(pool->p.napi, NULL);
>
> Because page_pool_disable_direct_recycling() must also be called while
> NAPI is listed. Technically we should also sync rcu if the driver calls
> this directly, because NAPI may be reused :(
>
> > >         page_pool_disable_direct_recycling(pool);
> > >         page_pool_free_frag(pool);
> >
> > Yeah, good idea; care to send a proper patch? :)
>
> ...for net-next ? :)

Will put it on my TODO list for when the tree reopens, thanks.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread

Thread overview: 8+ messages
2025-01-23 23:16 [PATCH net] net: page_pool: don't try to stash the napi id Jakub Kicinski
2025-01-24 21:00 ` Mina Almasry
2025-01-24 22:18   ` Toke Høiland-Jørgensen
2025-01-24 23:49     ` Mina Almasry
2025-01-27 13:31       ` Toke Høiland-Jørgensen
2025-01-27 19:37         ` Jakub Kicinski
2025-01-27 19:41           ` Mina Almasry
2025-01-25  1:31     ` Jakub Kicinski
