[PATCH net-next v7 0/2] Fix late DMA unmap crash for page pool

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v7 0/2] Fix late DMA unmap crash for page pool
@ 2025-04-04 10:18 Toke Høiland-Jørgensen
  2025-04-04 10:18 ` [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions Toke Høiland-Jørgensen
  2025-04-04 10:18 ` [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool Toke Høiland-Jørgensen
  0 siblings, 2 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-04 10:18 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox
  Cc: netdev, bpf, linux-rdma, linux-mm,
	Toke Høiland-Jørgensen, Qiuling Ren, Yuying Ma

This series fixes the late dma_unmap crash for page pool first reported
by Yonglong Liu in [0]. It is an alternative approach to the one
submitted by Yunsheng Lin, most recently in [1]. The first commit just
wraps some tests in a helper function, in preparation of the main change
in patch 2. See the commit message of patch 2 for the details.

-Toke

[0] https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
[1] https://lore.kernel.org/r/20250307092356.638242-1-linyunsheng@huawei.com

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
Changes in v7:
- Change WARN_ON logic if xarray alloc fails
- Don't leak xarray ID if page_pool_set_dma_addr_netmem() fails
- Unconditionally init xarray in page_pool_init()
- Rebase on current net-next
- Link to v6: https://lore.kernel.org/r/20250401-page-pool-track-dma-v6-0-8b83474870d4@redhat.com

Changes in v6:
- Add READ_ONCE() around both reads of pp->dma_sync
- Link to v5: https://lore.kernel.org/r/20250328-page-pool-track-dma-v5-0-55002af683ad@redhat.com

Changes in v5:
- Dereferencing pp->p.dev if pp->pma_sync is unset could lead to a
  crash, so make sure we don't do that.
- With the change above, patch 2 was just changing a single field, so
  squash it into patch 3
- Link to v4: https://lore.kernel.org/r/20250327-page-pool-track-dma-v4-0-b380dc6706d0@redhat.com

Changes in v4:
- Rebase on net-next
- Collect tags
- Link to v3: https://lore.kernel.org/r/20250326-page-pool-track-dma-v3-0-8e464016e0ac@redhat.com

Changes in v3:
- Use a full-width bool for pp->dma_sync instead of a full unsigned
  long (in patch 2), and leave pp->dma_sync_cpu alone.

- Link to v2: https://lore.kernel.org/r/20250325-page-pool-track-dma-v2-0-113ebc1946f3@redhat.com

Changes in v2:
- Always leave two bits at the top of pp_magic as zero, instead of one

- Add an rcu_read_lock() around __page_pool_dma_sync_for_device()

- Add a comment in poison.h with a reference to the bitmask definition

- Add a longer description of the logic of the bitmask definitions to
  the comment in types.h, and a summary of the security implications of
  using the pp_magic field to the commit message of patch 3

- Collect Mina's Reviewed-by and Yonglong's Tested-by tags

- Link to v1: https://lore.kernel.org/r/20250314-page-pool-track-dma-v1-0-c212e57a74c2@redhat.com

---
Toke Høiland-Jørgensen (2):
      page_pool: Move pp_magic check into helper functions
      page_pool: Track DMA-mapped pages and unmap them when destroying the pool

 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 +-
 include/linux/poison.h                           |  4 ++
 include/net/page_pool/types.h                    | 63 +++++++++++++++++-
 mm/page_alloc.c                                  |  9 +--
 net/core/netmem_priv.h                           | 33 +++++++++-
 net/core/page_pool.c                             | 81 ++++++++++++++++++++----
 net/core/skbuff.c                                | 16 +----
 net/core/xdp.c                                   |  4 +-
 8 files changed, 175 insertions(+), 39 deletions(-)
---
base-commit: acc4d5ff0b61eb1715c498b6536c38c1feb7f3c1
change-id: 20250310-page-pool-track-dma-0332343a460e



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-04 10:18 [PATCH net-next v7 0/2] Fix late DMA unmap crash for page pool Toke Høiland-Jørgensen
@ 2025-04-04 10:18 ` Toke Høiland-Jørgensen
  2025-04-06 18:56   ` Zi Yan
  2025-04-04 10:18 ` [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool Toke Høiland-Jørgensen
  1 sibling, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-04 10:18 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox
  Cc: netdev, bpf, linux-rdma, linux-mm,
	Toke Høiland-Jørgensen

Since we are about to stash some more information into the pp_magic
field, let's move the magic signature checks into a pair of helper
functions so it can be changed in one place.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
 include/net/page_pool/types.h                    | 18 ++++++++++++++++++
 mm/page_alloc.c                                  |  9 +++------
 net/core/netmem_priv.h                           |  5 +++++
 net/core/skbuff.c                                | 16 ++--------------
 net/core/xdp.c                                   |  4 ++--
 6 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f803e1c93590068d3a7829b0683be4af019266d1..5ce1b463b7a8dd7969e391618658d66f6e836cc1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -707,8 +707,8 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
 				page = xdpi.page.page;
 
-				/* No need to check ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
-				 * as we know this is a page_pool page.
+				/* No need to check page_pool_page_is_pp() as we
+				 * know this is a page_pool page.
 				 */
 				page_pool_recycle_direct(page->pp, page);
 			} while (++n < num);
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -54,6 +54,14 @@ struct pp_alloc_cache {
 	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
 };
 
+/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
+ * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
+ * the head page of compound page and bit 1 for pfmemalloc page.
+ * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
+ * the pfmemalloc page.
+ */
+#define PP_MAGIC_MASK ~0x3UL
+
 /**
  * struct page_pool_params - page pool parameters
  * @fast:	params accessed frequently on hotpath
@@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
 			   const struct xdp_mem_info *mem);
 void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
+
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
+}
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
 static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
 {
 }
+
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	return false;
+}
 #endif
 
 void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,6 +55,7 @@
 #include <linux/delayacct.h>
 #include <linux/cacheinfo.h>
 #include <linux/pgalloc_tag.h>
+#include <net/page_pool/types.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
@@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
 #ifdef CONFIG_MEMCG
 			page->memcg_data |
 #endif
-#ifdef CONFIG_PAGE_POOL
-			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
-#endif
+			page_pool_page_is_pp(page) |
 			(page->flags & check_flags)))
 		return false;
 
@@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
 #endif
-#ifdef CONFIG_PAGE_POOL
-	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
+	if (unlikely(page_pool_page_is_pp(page)))
 		bad_reason = "page_pool leak";
-#endif
 	return bad_reason;
 }
 
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
index 7eadb8393e002fd1cc2cef8a313d2ea7df76f301..f33162fd281c23e109273ba09950c5d0a2829bc9 100644
--- a/net/core/netmem_priv.h
+++ b/net/core/netmem_priv.h
@@ -18,6 +18,11 @@ static inline void netmem_clear_pp_magic(netmem_ref netmem)
 	__netmem_clear_lsb(netmem)->pp_magic = 0;
 }
 
+static inline bool netmem_is_pp(netmem_ref netmem)
+{
+	return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
+}
+
 static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
 {
 	__netmem_clear_lsb(netmem)->pp = pool;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6cbf77bc61fce74c934628fd74b3a2cb7809e464..74a2d886a35b518d55b6d3cafcb8442212f9beee 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -893,11 +893,6 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
-static bool is_pp_netmem(netmem_ref netmem)
-{
-	return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
-}
-
 int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
 		    unsigned int headroom)
 {
@@ -995,14 +990,7 @@ bool napi_pp_put_page(netmem_ref netmem)
 {
 	netmem = netmem_compound_head(netmem);
 
-	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-	 * in order to preserve any existing bits, such as bit 0 for the
-	 * head page of compound page and bit 1 for pfmemalloc page, so
-	 * mask those bits for freeing side when doing below checking,
-	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-	 * to avoid recycling the pfmemalloc page.
-	 */
-	if (unlikely(!is_pp_netmem(netmem)))
+	if (unlikely(!netmem_is_pp(netmem)))
 		return false;
 
 	page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
@@ -1042,7 +1030,7 @@ static int skb_pp_frag_ref(struct sk_buff *skb)
 
 	for (i = 0; i < shinfo->nr_frags; i++) {
 		head_netmem = netmem_compound_head(shinfo->frags[i].netmem);
-		if (likely(is_pp_netmem(head_netmem)))
+		if (likely(netmem_is_pp(head_netmem)))
 			page_pool_ref_netmem(head_netmem);
 		else
 			page_ref_inc(netmem_to_page(head_netmem));
diff --git a/net/core/xdp.c b/net/core/xdp.c
index f86eedad586a77eb63a96a85aa6d068d3e94f077..0ba73943c6eed873b3d1c681b3b9a802b590f2d9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -437,8 +437,8 @@ void __xdp_return(netmem_ref netmem, enum xdp_mem_type mem_type,
 		netmem = netmem_compound_head(netmem);
 		if (napi_direct && xdp_return_frame_no_direct())
 			napi_direct = false;
-		/* No need to check ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
-		 * as mem->type knows this a page_pool page
+		/* No need to check netmem_is_pp() as mem->type knows this a
+		 * page_pool page
 		 */
 		page_pool_put_full_netmem(netmem_get_pp(netmem), netmem,
 					  napi_direct);

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-04 10:18 ` [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions Toke Høiland-Jørgensen
@ 2025-04-06 18:56   ` Zi Yan
  2025-04-07  8:53     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2025-04-06 18:56 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, Eric Dumazet, Paolo Abeni,
	Ilias Apalodimas, Simon Horman, Andrew Morton, Mina Almasry,
	Yonglong Liu, Yunsheng Lin, Pavel Begunkov, Matthew Wilcox
  Cc: netdev, bpf, linux-rdma, linux-mm

On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
> Since we are about to stash some more information into the pp_magic
> field, let's move the magic signature checks into a pair of helper
> functions so it can be changed in one place.
>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>  mm/page_alloc.c                                  |  9 +++------
>  net/core/netmem_priv.h                           |  5 +++++
>  net/core/skbuff.c                                | 16 ++--------------
>  net/core/xdp.c                                   |  4 ++--
>  6 files changed, 32 insertions(+), 24 deletions(-)
>

<snip>

> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>  };
>  
> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
> + * the head page of compound page and bit 1 for pfmemalloc page.
> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
> + * the pfmemalloc page.
> + */
> +#define PP_MAGIC_MASK ~0x3UL
> +
>  /**
>   * struct page_pool_params - page pool parameters
>   * @fast:	params accessed frequently on hotpath
> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>  			   const struct xdp_mem_info *mem);
>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
> +
> +static inline bool page_pool_page_is_pp(struct page *page)
> +{
> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> +}
>  #else
>  static inline void page_pool_destroy(struct page_pool *pool)
>  {
> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>  {
>  }
> +
> +static inline bool page_pool_page_is_pp(struct page *page)
> +{
> +	return false;
> +}
>  #endif
>  
>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -55,6 +55,7 @@
>  #include <linux/delayacct.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/pgalloc_tag.h>
> +#include <net/page_pool/types.h>
>  #include <asm/div64.h>
>  #include "internal.h"
>  #include "shuffle.h"
> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>  #ifdef CONFIG_MEMCG
>  			page->memcg_data |
>  #endif
> -#ifdef CONFIG_PAGE_POOL
> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
> -#endif
> +			page_pool_page_is_pp(page) |
>  			(page->flags & check_flags)))
>  		return false;
>  
> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>  	if (unlikely(page->memcg_data))
>  		bad_reason = "page still charged to cgroup";
>  #endif
> -#ifdef CONFIG_PAGE_POOL
> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
> +	if (unlikely(page_pool_page_is_pp(page)))
>  		bad_reason = "page_pool leak";
> -#endif
>  	return bad_reason;
>  }
>  

I wonder if it is OK to make page allocation depend on page_pool from net/page_pool.
Would linux/mm.h be a better place for page_pool_page_is_pp()?

-- 
Best Regards,
Yan, Zi



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-06 18:56   ` Zi Yan
@ 2025-04-07  8:53     ` Toke Høiland-Jørgensen
  2025-04-07 11:53       ` Zi Yan
  2025-04-07 12:24       ` Zi Yan
  0 siblings, 2 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-07  8:53 UTC (permalink / raw)
  To: Zi Yan, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox
  Cc: netdev, bpf, linux-rdma, linux-mm

"Zi Yan" <ziy@nvidia.com> writes:

> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>> Since we are about to stash some more information into the pp_magic
>> field, let's move the magic signature checks into a pair of helper
>> functions so it can be changed in one place.
>>
>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>  mm/page_alloc.c                                  |  9 +++------
>>  net/core/netmem_priv.h                           |  5 +++++
>>  net/core/skbuff.c                                | 16 ++--------------
>>  net/core/xdp.c                                   |  4 ++--
>>  6 files changed, 32 insertions(+), 24 deletions(-)
>>
>
> <snip>
>
>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
>> --- a/include/net/page_pool/types.h
>> +++ b/include/net/page_pool/types.h
>> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>>  };
>>  
>> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
>> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
>> + * the head page of compound page and bit 1 for pfmemalloc page.
>> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
>> + * the pfmemalloc page.
>> + */
>> +#define PP_MAGIC_MASK ~0x3UL
>> +
>>  /**
>>   * struct page_pool_params - page pool parameters
>>   * @fast:	params accessed frequently on hotpath
>> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>>  			   const struct xdp_mem_info *mem);
>>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
>> +
>> +static inline bool page_pool_page_is_pp(struct page *page)
>> +{
>> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>> +}
>>  #else
>>  static inline void page_pool_destroy(struct page_pool *pool)
>>  {
>> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>>  {
>>  }
>> +
>> +static inline bool page_pool_page_is_pp(struct page *page)
>> +{
>> +	return false;
>> +}
>>  #endif
>>  
>>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -55,6 +55,7 @@
>>  #include <linux/delayacct.h>
>>  #include <linux/cacheinfo.h>
>>  #include <linux/pgalloc_tag.h>
>> +#include <net/page_pool/types.h>
>>  #include <asm/div64.h>
>>  #include "internal.h"
>>  #include "shuffle.h"
>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>  #ifdef CONFIG_MEMCG
>>  			page->memcg_data |
>>  #endif
>> -#ifdef CONFIG_PAGE_POOL
>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>> -#endif
>> +			page_pool_page_is_pp(page) |
>>  			(page->flags & check_flags)))
>>  		return false;
>>  
>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>  	if (unlikely(page->memcg_data))
>>  		bad_reason = "page still charged to cgroup";
>>  #endif
>> -#ifdef CONFIG_PAGE_POOL
>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>> +	if (unlikely(page_pool_page_is_pp(page)))
>>  		bad_reason = "page_pool leak";
>> -#endif
>>  	return bad_reason;
>>  }
>>  
>
> I wonder if it is OK to make page allocation depend on page_pool from
> net/page_pool.

Why? It's not really a dependency, just a header include with a static
inline function...

> Would linux/mm.h be a better place for page_pool_page_is_pp()?

That would require moving all the definitions introduced in patch 2,
which I don't think is appropriate.

-Toke



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07  8:53     ` Toke Høiland-Jørgensen
@ 2025-04-07 11:53       ` Zi Yan
  2025-04-07 12:24       ` Zi Yan
  1 sibling, 0 replies; 19+ messages in thread
From: Zi Yan @ 2025-04-07 11:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Matthew Wilcox
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, netdev, bpf, linux-rdma, linux-mm



--
Best Regards,
Yan, Zi

On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:

> "Zi Yan" <ziy@nvidia.com> writes:
>
>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>> Since we are about to stash some more information into the pp_magic
>>> field, let's move the magic signature checks into a pair of helper
>>> functions so it can be changed in one place.
>>>
>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
>>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>  mm/page_alloc.c                                  |  9 +++------
>>>  net/core/netmem_priv.h                           |  5 +++++
>>>  net/core/skbuff.c                                | 16 ++--------------
>>>  net/core/xdp.c                                   |  4 ++--
>>>  6 files changed, 32 insertions(+), 24 deletions(-)
>>>
>>
>> <snip>
>>
>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
>>> --- a/include/net/page_pool/types.h
>>> +++ b/include/net/page_pool/types.h
>>> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>>>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>>>  };
>>>
>>> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
>>> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
>>> + * the head page of compound page and bit 1 for pfmemalloc page.
>>> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
>>> + * the pfmemalloc page.
>>> + */
>>> +#define PP_MAGIC_MASK ~0x3UL
>>> +
>>>  /**
>>>   * struct page_pool_params - page pool parameters
>>>   * @fast:	params accessed frequently on hotpath
>>> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>>>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>>>  			   const struct xdp_mem_info *mem);
>>>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
>>> +
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> +}
>>>  #else
>>>  static inline void page_pool_destroy(struct page_pool *pool)
>>>  {
>>> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>>>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>>>  {
>>>  }
>>> +
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +	return false;
>>> +}
>>>  #endif
>>>
>>>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -55,6 +55,7 @@
>>>  #include <linux/delayacct.h>
>>>  #include <linux/cacheinfo.h>
>>>  #include <linux/pgalloc_tag.h>
>>> +#include <net/page_pool/types.h>
>>>  #include <asm/div64.h>
>>>  #include "internal.h"
>>>  #include "shuffle.h"
>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>  #ifdef CONFIG_MEMCG
>>>  			page->memcg_data |
>>>  #endif
>>> -#ifdef CONFIG_PAGE_POOL
>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>> -#endif
>>> +			page_pool_page_is_pp(page) |
>>>  			(page->flags & check_flags)))
>>>  		return false;
>>>
>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>  	if (unlikely(page->memcg_data))
>>>  		bad_reason = "page still charged to cgroup";
>>>  #endif
>>> -#ifdef CONFIG_PAGE_POOL
>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>  		bad_reason = "page_pool leak";
>>> -#endif
>>>  	return bad_reason;
>>>  }
>>>
>>
>> I wonder if it is OK to make page allocation depend on page_pool from
>> net/page_pool.
>
> Why? It's not really a dependency, just a header include with a static
> inline function...

The function is checking, not even modifying, an core mm data structure,
struct page, which is also used by almost all subsystems. I do not get
why the function is in net subsystem.

>
>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>
> That would require moving all the definitions introduced in patch 2,
> which I don't think is appropriate.

Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
in patch 2.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07  8:53     ` Toke Høiland-Jørgensen
  2025-04-07 11:53       ` Zi Yan
@ 2025-04-07 12:24       ` Zi Yan
  2025-04-07 13:14         ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 19+ messages in thread
From: Zi Yan @ 2025-04-07 12:24 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm

Resend to fix my signature.

On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:

> "Zi Yan" <ziy@nvidia.com> writes:
>
>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>> Since we are about to stash some more information into the pp_magic
>>> field, let's move the magic signature checks into a pair of helper
>>> functions so it can be changed in one place.
>>>
>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
>>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>  mm/page_alloc.c                                  |  9 +++------
>>>  net/core/netmem_priv.h                           |  5 +++++
>>>  net/core/skbuff.c                                | 16 ++--------------
>>>  net/core/xdp.c                                   |  4 ++--
>>>  6 files changed, 32 insertions(+), 24 deletions(-)
>>>
>>
>> <snip>
>>
>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
>>> --- a/include/net/page_pool/types.h
>>> +++ b/include/net/page_pool/types.h
>>> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>>>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>>>  };
>>>
>>> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
>>> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
>>> + * the head page of compound page and bit 1 for pfmemalloc page.
>>> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
>>> + * the pfmemalloc page.
>>> + */
>>> +#define PP_MAGIC_MASK ~0x3UL
>>> +
>>>  /**
>>>   * struct page_pool_params - page pool parameters
>>>   * @fast:	params accessed frequently on hotpath
>>> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>>>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>>>  			   const struct xdp_mem_info *mem);
>>>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
>>> +
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> +}
>>>  #else
>>>  static inline void page_pool_destroy(struct page_pool *pool)
>>>  {
>>> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>>>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>>>  {
>>>  }
>>> +
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +	return false;
>>> +}
>>>  #endif
>>>
>>>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -55,6 +55,7 @@
>>>  #include <linux/delayacct.h>
>>>  #include <linux/cacheinfo.h>
>>>  #include <linux/pgalloc_tag.h>
>>> +#include <net/page_pool/types.h>
>>>  #include <asm/div64.h>
>>>  #include "internal.h"
>>>  #include "shuffle.h"
>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>  #ifdef CONFIG_MEMCG
>>>  			page->memcg_data |
>>>  #endif
>>> -#ifdef CONFIG_PAGE_POOL
>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>> -#endif
>>> +			page_pool_page_is_pp(page) |
>>>  			(page->flags & check_flags)))
>>>  		return false;
>>>
>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>  	if (unlikely(page->memcg_data))
>>>  		bad_reason = "page still charged to cgroup";
>>>  #endif
>>> -#ifdef CONFIG_PAGE_POOL
>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>  		bad_reason = "page_pool leak";
>>> -#endif
>>>  	return bad_reason;
>>>  }
>>>
>>
>> I wonder if it is OK to make page allocation depend on page_pool from
>> net/page_pool.
>
> Why? It's not really a dependency, just a header include with a static
> inline function...

The function is checking, not even modifying, an core mm data structure,
struct page, which is also used by almost all subsystems. I do not get
why the function is in net subsystem.

>
>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>
> That would require moving all the definitions introduced in patch 2,
> which I don't think is appropriate.

Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
in patch 2.

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 12:24       ` Zi Yan
@ 2025-04-07 13:14         ` Toke Høiland-Jørgensen
  2025-04-07 13:36           ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-07 13:14 UTC (permalink / raw)
  To: Zi Yan
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm

Zi Yan <ziy@nvidia.com> writes:

> Resend to fix my signature.
>
> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>
>> "Zi Yan" <ziy@nvidia.com> writes:
>>
>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>> Since we are about to stash some more information into the pp_magic
>>>> field, let's move the magic signature checks into a pair of helper
>>>> functions so it can be changed in one place.
>>>>
>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
>>>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>>> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> ---
>>>>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>  mm/page_alloc.c                                  |  9 +++------
>>>>  net/core/netmem_priv.h                           |  5 +++++
>>>>  net/core/skbuff.c                                | 16 ++--------------
>>>>  net/core/xdp.c                                   |  4 ++--
>>>>  6 files changed, 32 insertions(+), 24 deletions(-)
>>>>
>>>
>>> <snip>
>>>
>>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>>> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
>>>> --- a/include/net/page_pool/types.h
>>>> +++ b/include/net/page_pool/types.h
>>>> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>>>>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>>>>  };
>>>>
>>>> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
>>>> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
>>>> + * the head page of compound page and bit 1 for pfmemalloc page.
>>>> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
>>>> + * the pfmemalloc page.
>>>> + */
>>>> +#define PP_MAGIC_MASK ~0x3UL
>>>> +
>>>>  /**
>>>>   * struct page_pool_params - page pool parameters
>>>>   * @fast:	params accessed frequently on hotpath
>>>> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>>>>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>>>>  			   const struct xdp_mem_info *mem);
>>>>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
>>>> +
>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>> +{
>>>> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>> +}
>>>>  #else
>>>>  static inline void page_pool_destroy(struct page_pool *pool)
>>>>  {
>>>> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>>>>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>>>>  {
>>>>  }
>>>> +
>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>> +{
>>>> +	return false;
>>>> +}
>>>>  #endif
>>>>
>>>>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -55,6 +55,7 @@
>>>>  #include <linux/delayacct.h>
>>>>  #include <linux/cacheinfo.h>
>>>>  #include <linux/pgalloc_tag.h>
>>>> +#include <net/page_pool/types.h>
>>>>  #include <asm/div64.h>
>>>>  #include "internal.h"
>>>>  #include "shuffle.h"
>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>  #ifdef CONFIG_MEMCG
>>>>  			page->memcg_data |
>>>>  #endif
>>>> -#ifdef CONFIG_PAGE_POOL
>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>> -#endif
>>>> +			page_pool_page_is_pp(page) |
>>>>  			(page->flags & check_flags)))
>>>>  		return false;
>>>>
>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>  	if (unlikely(page->memcg_data))
>>>>  		bad_reason = "page still charged to cgroup";
>>>>  #endif
>>>> -#ifdef CONFIG_PAGE_POOL
>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>  		bad_reason = "page_pool leak";
>>>> -#endif
>>>>  	return bad_reason;
>>>>  }
>>>>
>>>
>>> I wonder if it is OK to make page allocation depend on page_pool from
>>> net/page_pool.
>>
>> Why? It's not really a dependency, just a header include with a static
>> inline function...
>
> The function is checking, not even modifying, an core mm data structure,
> struct page, which is also used by almost all subsystems. I do not get
> why the function is in net subsystem.

Well, because it's using details of the PP definitions, so keeping it
there nicely encapsulates things. I mean, that's the whole point of
defining a wrapper function - encapsulating the logic :)

>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>
>> That would require moving all the definitions introduced in patch 2,
>> which I don't think is appropriate.
>
> Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
> in patch 2.

Look again. Patch 2 redefines PP_MAGIC_MASK in terms of all the other
definitions.

-Toke



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 13:14         ` Toke Høiland-Jørgensen
@ 2025-04-07 13:36           ` Zi Yan
  2025-04-07 14:15             ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2025-04-07 13:36 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm

On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:

> Zi Yan <ziy@nvidia.com> writes:
>
>> Resend to fix my signature.
>>
>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>
>>> "Zi Yan" <ziy@nvidia.com> writes:
>>>
>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>> Since we are about to stash some more information into the pp_magic
>>>>> field, let's move the magic signature checks into a pair of helper
>>>>> functions so it can be changed in one place.
>>>>>
>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
>>>>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>>>> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>> ---
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>>  mm/page_alloc.c                                  |  9 +++------
>>>>>  net/core/netmem_priv.h                           |  5 +++++
>>>>>  net/core/skbuff.c                                | 16 ++--------------
>>>>>  net/core/xdp.c                                   |  4 ++--
>>>>>  6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>
>>>>
>>>> <snip>
>>>>
>>>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>>>> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
>>>>> --- a/include/net/page_pool/types.h
>>>>> +++ b/include/net/page_pool/types.h
>>>>> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>>>>>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>>>>>  };
>>>>>
>>>>> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
>>>>> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
>>>>> + * the head page of compound page and bit 1 for pfmemalloc page.
>>>>> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
>>>>> + * the pfmemalloc page.
>>>>> + */
>>>>> +#define PP_MAGIC_MASK ~0x3UL
>>>>> +
>>>>>  /**
>>>>>   * struct page_pool_params - page pool parameters
>>>>>   * @fast:	params accessed frequently on hotpath
>>>>> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>>>>>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>>>>>  			   const struct xdp_mem_info *mem);
>>>>>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
>>>>> +
>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>> +{
>>>>> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>> +}
>>>>>  #else
>>>>>  static inline void page_pool_destroy(struct page_pool *pool)
>>>>>  {
>>>>> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>>>>>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>>>>>  {
>>>>>  }
>>>>> +
>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>> +{
>>>>> +	return false;
>>>>> +}
>>>>>  #endif
>>>>>
>>>>>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -55,6 +55,7 @@
>>>>>  #include <linux/delayacct.h>
>>>>>  #include <linux/cacheinfo.h>
>>>>>  #include <linux/pgalloc_tag.h>
>>>>> +#include <net/page_pool/types.h>
>>>>>  #include <asm/div64.h>
>>>>>  #include "internal.h"
>>>>>  #include "shuffle.h"
>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>  #ifdef CONFIG_MEMCG
>>>>>  			page->memcg_data |
>>>>>  #endif
>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>> -#endif
>>>>> +			page_pool_page_is_pp(page) |
>>>>>  			(page->flags & check_flags)))
>>>>>  		return false;
>>>>>
>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>  	if (unlikely(page->memcg_data))
>>>>>  		bad_reason = "page still charged to cgroup";
>>>>>  #endif
>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>>  		bad_reason = "page_pool leak";
>>>>> -#endif
>>>>>  	return bad_reason;
>>>>>  }
>>>>>
>>>>
>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>> net/page_pool.
>>>
>>> Why? It's not really a dependency, just a header include with a static
>>> inline function...
>>
>> The function is checking, not even modifying, an core mm data structure,
>> struct page, which is also used by almost all subsystems. I do not get
>> why the function is in net subsystem.
>
> Well, because it's using details of the PP definitions, so keeping it
> there nicely encapsulates things. I mean, that's the whole point of
> defining a wrapper function - encapsulating the logic :)
>
>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>
>>> That would require moving all the definitions introduced in patch 2,
>>> which I don't think is appropriate.
>>
>> Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
>> in patch 2.
>
> Look again. Patch 2 redefines PP_MAGIC_MASK in terms of all the other
> definitions.

OK. Just checked. Yes, the function depends on PP_MAGIC_MASK.

But net/types.h has a lot of unrelated page_pool functions and data structures
mm/page_alloc.c does not care about. Is there a way of moving page_pool_page_is_pp()
and its dependency to a separate header and including that in mm/page_alloc.c?

Looking at the use of page_pool_page_is_pp() in mm/page_alloc.c, it seems to be
just error checking. Why can't page_pool do the error checking?

Anyway, if you think the patch is the right thing to do, I would not object it.

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 13:36           ` Zi Yan
@ 2025-04-07 14:15             ` Zi Yan
  2025-04-07 14:43               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2025-04-07 14:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm

On 7 Apr 2025, at 9:36, Zi Yan wrote:

> On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:
>
>> Zi Yan <ziy@nvidia.com> writes:
>>
>>> Resend to fix my signature.
>>>
>>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>>
>>>> "Zi Yan" <ziy@nvidia.com> writes:
>>>>
>>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>>> Since we are about to stash some more information into the pp_magic
>>>>>> field, let's move the magic signature checks into a pair of helper
>>>>>> functions so it can be changed in one place.
>>>>>>
>>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>>> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
>>>>>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>>>>> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>> ---
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>>>  include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>>>  mm/page_alloc.c                                  |  9 +++------
>>>>>>  net/core/netmem_priv.h                           |  5 +++++
>>>>>>  net/core/skbuff.c                                | 16 ++--------------
>>>>>>  net/core/xdp.c                                   |  4 ++--
>>>>>>  6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>>>> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>>>>>> index 36eb57d73abc6cfc601e700ca08be20fb8281055..df0d3c1608929605224feb26173135ff37951ef8 100644
>>>>>> --- a/include/net/page_pool/types.h
>>>>>> +++ b/include/net/page_pool/types.h
>>>>>> @@ -54,6 +54,14 @@ struct pp_alloc_cache {
>>>>>>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>>>>>>  };
>>>>>>
>>>>>> +/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
>>>>>> + * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
>>>>>> + * the head page of compound page and bit 1 for pfmemalloc page.
>>>>>> + * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
>>>>>> + * the pfmemalloc page.
>>>>>> + */
>>>>>> +#define PP_MAGIC_MASK ~0x3UL
>>>>>> +
>>>>>>  /**
>>>>>>   * struct page_pool_params - page pool parameters
>>>>>>   * @fast:	params accessed frequently on hotpath
>>>>>> @@ -264,6 +272,11 @@ void page_pool_destroy(struct page_pool *pool);
>>>>>>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
>>>>>>  			   const struct xdp_mem_info *mem);
>>>>>>  void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
>>>>>> +
>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> +{
>>>>>> +	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>> +}
>>>>>>  #else
>>>>>>  static inline void page_pool_destroy(struct page_pool *pool)
>>>>>>  {
>>>>>> @@ -278,6 +291,11 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
>>>>>>  static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>>>>>>  {
>>>>>>  }
>>>>>> +
>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> +{
>>>>>> +	return false;
>>>>>> +}
>>>>>>  #endif
>>>>>>
>>>>>>  void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -55,6 +55,7 @@
>>>>>>  #include <linux/delayacct.h>
>>>>>>  #include <linux/cacheinfo.h>
>>>>>>  #include <linux/pgalloc_tag.h>
>>>>>> +#include <net/page_pool/types.h>
>>>>>>  #include <asm/div64.h>
>>>>>>  #include "internal.h"
>>>>>>  #include "shuffle.h"
>>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>>  #ifdef CONFIG_MEMCG
>>>>>>  			page->memcg_data |
>>>>>>  #endif
>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>>> -#endif
>>>>>> +			page_pool_page_is_pp(page) |
>>>>>>  			(page->flags & check_flags)))
>>>>>>  		return false;
>>>>>>
>>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>  	if (unlikely(page->memcg_data))
>>>>>>  		bad_reason = "page still charged to cgroup";
>>>>>>  #endif
>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>>>  		bad_reason = "page_pool leak";
>>>>>> -#endif
>>>>>>  	return bad_reason;
>>>>>>  }
>>>>>>
>>>>>
>>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>>> net/page_pool.
>>>>
>>>> Why? It's not really a dependency, just a header include with a static
>>>> inline function...
>>>
>>> The function is checking, not even modifying, an core mm data structure,
>>> struct page, which is also used by almost all subsystems. I do not get
>>> why the function is in net subsystem.
>>
>> Well, because it's using details of the PP definitions, so keeping it
>> there nicely encapsulates things. I mean, that's the whole point of
>> defining a wrapper function - encapsulating the logic :)
>>
>>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>>
>>>> That would require moving all the definitions introduced in patch 2,
>>>> which I don't think is appropriate.
>>>
>>> Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
>>> in patch 2.
>>
>> Look again. Patch 2 redefines PP_MAGIC_MASK in terms of all the other
>> definitions.
>
> OK. Just checked. Yes, the function depends on PP_MAGIC_MASK.
>
> But net/types.h has a lot of unrelated page_pool functions and data structures
> mm/page_alloc.c does not care about. Is there a way of moving page_pool_page_is_pp()
> and its dependency to a separate header and including that in mm/page_alloc.c?
>
> Looking at the use of page_pool_page_is_pp() in mm/page_alloc.c, it seems to be
> just error checking. Why can't page_pool do the error checking?

Or just remove page_pool_page_is_pp() in mm/page_alloc.c. Has it really been used?

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 14:15             ` Zi Yan
@ 2025-04-07 14:43               ` Jesper Dangaard Brouer
  2025-04-07 15:50                 ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2025-04-07 14:43 UTC (permalink / raw)
  To: Zi Yan, Toke Høiland-Jørgensen
  Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, Eric Dumazet, Paolo Abeni,
	Ilias Apalodimas, Simon Horman, Andrew Morton, Mina Almasry,
	Yonglong Liu, Yunsheng Lin, Pavel Begunkov, Matthew Wilcox,
	netdev, bpf, linux-rdma, linux-mm, kernel-team



On 07/04/2025 16.15, Zi Yan wrote:
> On 7 Apr 2025, at 9:36, Zi Yan wrote:
> 
>> On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:
>>
>>> Zi Yan<ziy@nvidia.com>  writes:
>>>
>>>> Resend to fix my signature.
>>>>
>>>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>>>
>>>>> "Zi Yan"<ziy@nvidia.com>  writes:
>>>>>
>>>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>>>> Since we are about to stash some more information into the pp_magic
>>>>>>> field, let's move the magic signature checks into a pair of helper
>>>>>>> functions so it can be changed in one place.
>>>>>>>
>>>>>>> Reviewed-by: Mina Almasry<almasrymina@google.com>
>>>>>>> Tested-by: Yonglong Liu<liuyonglong@huawei.com>
>>>>>>> Acked-by: Jesper Dangaard Brouer<hawk@kernel.org>
>>>>>>> Reviewed-by: Ilias Apalodimas<ilias.apalodimas@linaro.org>
>>>>>>> Signed-off-by: Toke Høiland-Jørgensen<toke@redhat.com>
>>>>>>> ---
>>>>>>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>>>>   include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>>>>   mm/page_alloc.c                                  |  9 +++------
>>>>>>>   net/core/netmem_priv.h                           |  5 +++++
>>>>>>>   net/core/skbuff.c                                | 16 ++--------------
>>>>>>>   net/core/xdp.c                                   |  4 ++--
>>>>>>>   6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>>>
>>>>>> <snip>
[...]

>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -55,6 +55,7 @@
>>>>>>>   #include <linux/delayacct.h>
>>>>>>>   #include <linux/cacheinfo.h>
>>>>>>>   #include <linux/pgalloc_tag.h>
>>>>>>> +#include <net/page_pool/types.h>
>>>>>>>   #include <asm/div64.h>
>>>>>>>   #include "internal.h"
>>>>>>>   #include "shuffle.h"
>>>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>>>   #ifdef CONFIG_MEMCG
>>>>>>>   			page->memcg_data |
>>>>>>>   #endif
>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>>>> -#endif
>>>>>>> +			page_pool_page_is_pp(page) |
>>>>>>>   			(page->flags & check_flags)))
>>>>>>>   		return false;
>>>>>>>
>>>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>>   	if (unlikely(page->memcg_data))
>>>>>>>   		bad_reason = "page still charged to cgroup";
>>>>>>>   #endif
>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>>>>   		bad_reason = "page_pool leak";
>>>>>>> -#endif
>>>>>>>   	return bad_reason;
>>>>>>>   }
>>>>>>>
>>>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>>>> net/page_pool.
>>>>> Why? It's not really a dependency, just a header include with a static
>>>>> inline function...
>>>> The function is checking, not even modifying, an core mm data structure,
>>>> struct page, which is also used by almost all subsystems. I do not get
>>>> why the function is in net subsystem.
>>> Well, because it's using details of the PP definitions, so keeping it
>>> there nicely encapsulates things. I mean, that's the whole point of
>>> defining a wrapper function - encapsulating the logic 🙂
>>>
>>>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>>> That would require moving all the definitions introduced in patch 2,
>>>>> which I don't think is appropriate.
>>>> Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
>>>> in patch 2.
>>> Look again. Patch 2 redefines PP_MAGIC_MASK in terms of all the other
>>> definitions.
>> OK. Just checked. Yes, the function depends on PP_MAGIC_MASK.
>>
>> But net/types.h has a lot of unrelated page_pool functions and data structures
>> mm/page_alloc.c does not care about. Is there a way of moving page_pool_page_is_pp()
>> and its dependency to a separate header and including that in mm/page_alloc.c?
>>
>> Looking at the use of page_pool_page_is_pp() in mm/page_alloc.c, it seems to be
>> just error checking. Why can't page_pool do the error checking?
 >
> Or just remove page_pool_page_is_pp() in mm/page_alloc.c. Has it really been used?

We have actually used this at Cloudflare to catch some page_pool bugs.
And this have been backported to our 6.1 and 6.6 kernels and we have
enabled needed config CONFIG_DEBUG_VM (which we measured have low enough
overhead to enable in production).  AFAIK this is also enabled for our
6.12 kernels.

--Jesper



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 14:43               ` Jesper Dangaard Brouer
@ 2025-04-07 15:50                 ` Zi Yan
  2025-04-07 16:05                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2025-04-07 15:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer
  Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, Eric Dumazet, Paolo Abeni,
	Ilias Apalodimas, Simon Horman, Andrew Morton, Mina Almasry,
	Yonglong Liu, Yunsheng Lin, Pavel Begunkov, Matthew Wilcox,
	netdev, bpf, linux-rdma, linux-mm, kernel-team

On 7 Apr 2025, at 10:43, Jesper Dangaard Brouer wrote:

> On 07/04/2025 16.15, Zi Yan wrote:
>> On 7 Apr 2025, at 9:36, Zi Yan wrote:
>>
>>> On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:
>>>
>>>> Zi Yan<ziy@nvidia.com>  writes:
>>>>
>>>>> Resend to fix my signature.
>>>>>
>>>>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>>>>
>>>>>> "Zi Yan"<ziy@nvidia.com>  writes:
>>>>>>
>>>>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>>>>> Since we are about to stash some more information into the pp_magic
>>>>>>>> field, let's move the magic signature checks into a pair of helper
>>>>>>>> functions so it can be changed in one place.
>>>>>>>>
>>>>>>>> Reviewed-by: Mina Almasry<almasrymina@google.com>
>>>>>>>> Tested-by: Yonglong Liu<liuyonglong@huawei.com>
>>>>>>>> Acked-by: Jesper Dangaard Brouer<hawk@kernel.org>
>>>>>>>> Reviewed-by: Ilias Apalodimas<ilias.apalodimas@linaro.org>
>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen<toke@redhat.com>
>>>>>>>> ---
>>>>>>>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>>>>>   include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>>>>>   mm/page_alloc.c                                  |  9 +++------
>>>>>>>>   net/core/netmem_priv.h                           |  5 +++++
>>>>>>>>   net/core/skbuff.c                                | 16 ++--------------
>>>>>>>>   net/core/xdp.c                                   |  4 ++--
>>>>>>>>   6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>>>>
>>>>>>> <snip>
> [...]
>
>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -55,6 +55,7 @@
>>>>>>>>   #include <linux/delayacct.h>
>>>>>>>>   #include <linux/cacheinfo.h>
>>>>>>>>   #include <linux/pgalloc_tag.h>
>>>>>>>> +#include <net/page_pool/types.h>
>>>>>>>>   #include <asm/div64.h>
>>>>>>>>   #include "internal.h"
>>>>>>>>   #include "shuffle.h"
>>>>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>>>>   #ifdef CONFIG_MEMCG
>>>>>>>>   			page->memcg_data |
>>>>>>>>   #endif
>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>>>>> -#endif
>>>>>>>> +			page_pool_page_is_pp(page) |
>>>>>>>>   			(page->flags & check_flags)))
>>>>>>>>   		return false;
>>>>>>>>
>>>>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>>>   	if (unlikely(page->memcg_data))
>>>>>>>>   		bad_reason = "page still charged to cgroup";
>>>>>>>>   #endif
>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>>>>>   		bad_reason = "page_pool leak";
>>>>>>>> -#endif
>>>>>>>>   	return bad_reason;
>>>>>>>>   }
>>>>>>>>
>>>>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>>>>> net/page_pool.
>>>>>> Why? It's not really a dependency, just a header include with a static
>>>>>> inline function...
>>>>> The function is checking, not even modifying, an core mm data structure,
>>>>> struct page, which is also used by almost all subsystems. I do not get
>>>>> why the function is in net subsystem.
>>>> Well, because it's using details of the PP definitions, so keeping it
>>>> there nicely encapsulates things. I mean, that's the whole point of
>>>> defining a wrapper function - encapsulating the logic 🙂
>>>>
>>>>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>>>> That would require moving all the definitions introduced in patch 2,
>>>>>> which I don't think is appropriate.

The patch at the bottom moves page_pool_page_is_pp() to mm.h and compiles.
The macros and the function use mm’s page->pp_magic, so I am not sure
why it is appropriate, especially the user of the macros, net/core/page_pool.c,
has already included mm.h.

>>>>> Why? I do not see page_pool_page_is_pp() or PP_SIGNATURE is used anywhere
>>>>> in patch 2.
>>>> Look again. Patch 2 redefines PP_MAGIC_MASK in terms of all the other
>>>> definitions.
>>> OK. Just checked. Yes, the function depends on PP_MAGIC_MASK.
>>>
>>> But net/types.h has a lot of unrelated page_pool functions and data structures
>>> mm/page_alloc.c does not care about. Is there a way of moving page_pool_page_is_pp()
>>> and its dependency to a separate header and including that in mm/page_alloc.c?
>>>
>>> Looking at the use of page_pool_page_is_pp() in mm/page_alloc.c, it seems to be
>>> just error checking. Why can't page_pool do the error checking?
>>
>> Or just remove page_pool_page_is_pp() in mm/page_alloc.c. Has it really been used?
>
> We have actually used this at Cloudflare to catch some page_pool bugs.
> And this have been backported to our 6.1 and 6.6 kernels and we have
> enabled needed config CONFIG_DEBUG_VM (which we measured have low enough
> overhead to enable in production).  AFAIK this is also enabled for our
> 6.12 kernels.

Got it. Thank you for the information.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b7f13f087954..a5c4dafcaa0f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4248,4 +4248,63 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 #define VM_SEALED_SYSMAP	VM_NONE
 #endif

+/*
+ * DMA mapping IDs
+ *
+ * When DMA-mapping a page, we allocate an ID (from an xarray) and stash this in
+ * the upper bits of page->pp_magic. We always want to be able to unambiguously
+ * identify page pool pages (using page_pool_page_is_pp()). Non-PP pages can
+ * have arbitrary kernel pointers stored in the same field as pp_magic (since it
+ * overlaps with page->lru.next), so we must ensure that we cannot mistake a
+ * valid kernel pointer with any of the values we write into this field.
+ *
+ * On architectures that set POISON_POINTER_DELTA, this is already ensured,
+ * since this value becomes part of PP_SIGNATURE; meaning we can just use the
+ * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
+ * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
+ * 0, we make sure that we leave the two topmost bits empty, as that guarantees
+ * we won't mistake a valid kernel pointer for a value we set, regardless of the
+ * VMSPLIT setting.
+ *
+ * Altogether, this means that the number of bits available is constrained by
+ * the size of an unsigned long (at the upper end, subtracting two bits per the
+ * above), and the definition of PP_SIGNATURE (with or without
+ * POISON_POINTER_DELTA).
+ */
+#define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
+#if POISON_POINTER_DELTA > 0
+/* PP_SIGNATURE includes POISON_POINTER_DELTA, so limit the size of the DMA
+ * index to not overlap with that if set
+ */
+#define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
+#else
+/* Always leave out the topmost two; see above. */
+#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+#endif
+
+#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
+				  PP_DMA_INDEX_SHIFT)
+#define PP_DMA_INDEX_LIMIT XA_LIMIT(1, BIT(PP_DMA_INDEX_BITS) - 1)
+
+/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
+ * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
+ * the head page of compound page and bit 1 for pfmemalloc page, as well as the
+ * bits used for the DMA index. page_is_pfmemalloc() is checked in
+ * __page_pool_put_page() to avoid recycling the pfmemalloc page.
+ */
+#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
+
+#ifdef CONFIG_PAGE_POOL
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
+}
+#else
+
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	return false;
+}
+#endif
+
 #endif /* _LINUX_MM_H */
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 5835d359ecd0..38ca7ac567cf 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -55,52 +55,6 @@ struct pp_alloc_cache {
 	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
 };

-/*
- * DMA mapping IDs
- *
- * When DMA-mapping a page, we allocate an ID (from an xarray) and stash this in
- * the upper bits of page->pp_magic. We always want to be able to unambiguously
- * identify page pool pages (using page_pool_page_is_pp()). Non-PP pages can
- * have arbitrary kernel pointers stored in the same field as pp_magic (since it
- * overlaps with page->lru.next), so we must ensure that we cannot mistake a
- * valid kernel pointer with any of the values we write into this field.
- *
- * On architectures that set POISON_POINTER_DELTA, this is already ensured,
- * since this value becomes part of PP_SIGNATURE; meaning we can just use the
- * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
- * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
- * 0, we make sure that we leave the two topmost bits empty, as that guarantees
- * we won't mistake a valid kernel pointer for a value we set, regardless of the
- * VMSPLIT setting.
- *
- * Altogether, this means that the number of bits available is constrained by
- * the size of an unsigned long (at the upper end, subtracting two bits per the
- * above), and the definition of PP_SIGNATURE (with or without
- * POISON_POINTER_DELTA).
- */
-#define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
-#if POISON_POINTER_DELTA > 0
-/* PP_SIGNATURE includes POISON_POINTER_DELTA, so limit the size of the DMA
- * index to not overlap with that if set
- */
-#define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
-#else
-/* Always leave out the topmost two; see above. */
-#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
-#endif
-
-#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
-				  PP_DMA_INDEX_SHIFT)
-#define PP_DMA_INDEX_LIMIT XA_LIMIT(1, BIT(PP_DMA_INDEX_BITS) - 1)
-
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
 /**
  * struct page_pool_params - page pool parameters
  * @fast:	params accessed frequently on hotpath
@@ -314,10 +268,6 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
 			   const struct xdp_mem_info *mem);
 void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);

-static inline bool page_pool_page_is_pp(struct page *page)
-{
-	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -333,10 +283,6 @@ static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
 {
 }

-static inline bool page_pool_page_is_pp(struct page *page)
-{
-	return false;
-}
 #endif

 void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b14f292da3db..a18340b32218 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,7 +55,6 @@
 #include <linux/delayacct.h>
 #include <linux/cacheinfo.h>
 #include <linux/pgalloc_tag.h>
-#include <net/page_pool/types.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"


Best Regards,
Yan, Zi


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 15:50                 ` Zi Yan
@ 2025-04-07 16:05                   ` Toke Høiland-Jørgensen
  2025-04-07 16:06                     ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-07 16:05 UTC (permalink / raw)
  To: Zi Yan, Jesper Dangaard Brouer
  Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, Eric Dumazet, Paolo Abeni,
	Ilias Apalodimas, Simon Horman, Andrew Morton, Mina Almasry,
	Yonglong Liu, Yunsheng Lin, Pavel Begunkov, Matthew Wilcox,
	netdev, bpf, linux-rdma, linux-mm, kernel-team

Zi Yan <ziy@nvidia.com> writes:

> On 7 Apr 2025, at 10:43, Jesper Dangaard Brouer wrote:
>
>> On 07/04/2025 16.15, Zi Yan wrote:
>>> On 7 Apr 2025, at 9:36, Zi Yan wrote:
>>>
>>>> On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:
>>>>
>>>>> Zi Yan<ziy@nvidia.com>  writes:
>>>>>
>>>>>> Resend to fix my signature.
>>>>>>
>>>>>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>>>>>
>>>>>>> "Zi Yan"<ziy@nvidia.com>  writes:
>>>>>>>
>>>>>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>>>>>> Since we are about to stash some more information into the pp_magic
>>>>>>>>> field, let's move the magic signature checks into a pair of helper
>>>>>>>>> functions so it can be changed in one place.
>>>>>>>>>
>>>>>>>>> Reviewed-by: Mina Almasry<almasrymina@google.com>
>>>>>>>>> Tested-by: Yonglong Liu<liuyonglong@huawei.com>
>>>>>>>>> Acked-by: Jesper Dangaard Brouer<hawk@kernel.org>
>>>>>>>>> Reviewed-by: Ilias Apalodimas<ilias.apalodimas@linaro.org>
>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen<toke@redhat.com>
>>>>>>>>> ---
>>>>>>>>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>>>>>>   include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>>>>>>   mm/page_alloc.c                                  |  9 +++------
>>>>>>>>>   net/core/netmem_priv.h                           |  5 +++++
>>>>>>>>>   net/core/skbuff.c                                | 16 ++--------------
>>>>>>>>>   net/core/xdp.c                                   |  4 ++--
>>>>>>>>>   6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>>>>>
>>>>>>>> <snip>
>> [...]
>>
>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>>> @@ -55,6 +55,7 @@
>>>>>>>>>   #include <linux/delayacct.h>
>>>>>>>>>   #include <linux/cacheinfo.h>
>>>>>>>>>   #include <linux/pgalloc_tag.h>
>>>>>>>>> +#include <net/page_pool/types.h>
>>>>>>>>>   #include <asm/div64.h>
>>>>>>>>>   #include "internal.h"
>>>>>>>>>   #include "shuffle.h"
>>>>>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>>>>>   #ifdef CONFIG_MEMCG
>>>>>>>>>   			page->memcg_data |
>>>>>>>>>   #endif
>>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>>>>>> -#endif
>>>>>>>>> +			page_pool_page_is_pp(page) |
>>>>>>>>>   			(page->flags & check_flags)))
>>>>>>>>>   		return false;
>>>>>>>>>
>>>>>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>>>>   	if (unlikely(page->memcg_data))
>>>>>>>>>   		bad_reason = "page still charged to cgroup";
>>>>>>>>>   #endif
>>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>>>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>>>>>>   		bad_reason = "page_pool leak";
>>>>>>>>> -#endif
>>>>>>>>>   	return bad_reason;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>>>>>> net/page_pool.
>>>>>>> Why? It's not really a dependency, just a header include with a static
>>>>>>> inline function...
>>>>>> The function is checking, not even modifying, an core mm data structure,
>>>>>> struct page, which is also used by almost all subsystems. I do not get
>>>>>> why the function is in net subsystem.
>>>>> Well, because it's using details of the PP definitions, so keeping it
>>>>> there nicely encapsulates things. I mean, that's the whole point of
>>>>> defining a wrapper function - encapsulating the logic 🙂
>>>>>
>>>>>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>>>>> That would require moving all the definitions introduced in patch 2,
>>>>>>> which I don't think is appropriate.
>
> The patch at the bottom moves page_pool_page_is_pp() to mm.h and compiles.
> The macros and the function use mm’s page->pp_magic, so I am not sure
> why it is appropriate, especially the user of the macros, net/core/page_pool.c,
> has already included mm.h.

Well, I kinda considered those details page_pool-internal. But okay, I
can move them if you prefer to have them in mm.h.

-Toke



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions
  2025-04-07 16:05                   ` Toke Høiland-Jørgensen
@ 2025-04-07 16:06                     ` Zi Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Zi Yan @ 2025-04-07 16:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm,
	kernel-team

On 7 Apr 2025, at 12:05, Toke Høiland-Jørgensen wrote:

> Zi Yan <ziy@nvidia.com> writes:
>
>> On 7 Apr 2025, at 10:43, Jesper Dangaard Brouer wrote:
>>
>>> On 07/04/2025 16.15, Zi Yan wrote:
>>>> On 7 Apr 2025, at 9:36, Zi Yan wrote:
>>>>
>>>>> On 7 Apr 2025, at 9:14, Toke Høiland-Jørgensen wrote:
>>>>>
>>>>>> Zi Yan<ziy@nvidia.com>  writes:
>>>>>>
>>>>>>> Resend to fix my signature.
>>>>>>>
>>>>>>> On 7 Apr 2025, at 4:53, Toke Høiland-Jørgensen wrote:
>>>>>>>
>>>>>>>> "Zi Yan"<ziy@nvidia.com>  writes:
>>>>>>>>
>>>>>>>>> On Fri Apr 4, 2025 at 6:18 AM EDT, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Since we are about to stash some more information into the pp_magic
>>>>>>>>>> field, let's move the magic signature checks into a pair of helper
>>>>>>>>>> functions so it can be changed in one place.
>>>>>>>>>>
>>>>>>>>>> Reviewed-by: Mina Almasry<almasrymina@google.com>
>>>>>>>>>> Tested-by: Yonglong Liu<liuyonglong@huawei.com>
>>>>>>>>>> Acked-by: Jesper Dangaard Brouer<hawk@kernel.org>
>>>>>>>>>> Reviewed-by: Ilias Apalodimas<ilias.apalodimas@linaro.org>
>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen<toke@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
>>>>>>>>>>   include/net/page_pool/types.h                    | 18 ++++++++++++++++++
>>>>>>>>>>   mm/page_alloc.c                                  |  9 +++------
>>>>>>>>>>   net/core/netmem_priv.h                           |  5 +++++
>>>>>>>>>>   net/core/skbuff.c                                | 16 ++--------------
>>>>>>>>>>   net/core/xdp.c                                   |  4 ++--
>>>>>>>>>>   6 files changed, 32 insertions(+), 24 deletions(-)
>>>>>>>>>>
>>>>>>>>> <snip>
>>> [...]
>>>
>>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>>>> index f51aa6051a99867d2d7d8c70aa7c30e523629951..347a3cc2c188f4a9ced85e0d198947be7c503526 100644
>>>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>>>> @@ -55,6 +55,7 @@
>>>>>>>>>>   #include <linux/delayacct.h>
>>>>>>>>>>   #include <linux/cacheinfo.h>
>>>>>>>>>>   #include <linux/pgalloc_tag.h>
>>>>>>>>>> +#include <net/page_pool/types.h>
>>>>>>>>>>   #include <asm/div64.h>
>>>>>>>>>>   #include "internal.h"
>>>>>>>>>>   #include "shuffle.h"
>>>>>>>>>> @@ -897,9 +898,7 @@ static inline bool page_expected_state(struct page *page,
>>>>>>>>>>   #ifdef CONFIG_MEMCG
>>>>>>>>>>   			page->memcg_data |
>>>>>>>>>>   #endif
>>>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>>>> -			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
>>>>>>>>>> -#endif
>>>>>>>>>> +			page_pool_page_is_pp(page) |
>>>>>>>>>>   			(page->flags & check_flags)))
>>>>>>>>>>   		return false;
>>>>>>>>>>
>>>>>>>>>> @@ -926,10 +925,8 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>>>>>>>>   	if (unlikely(page->memcg_data))
>>>>>>>>>>   		bad_reason = "page still charged to cgroup";
>>>>>>>>>>   #endif
>>>>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>>>>> -	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
>>>>>>>>>> +	if (unlikely(page_pool_page_is_pp(page)))
>>>>>>>>>>   		bad_reason = "page_pool leak";
>>>>>>>>>> -#endif
>>>>>>>>>>   	return bad_reason;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>> I wonder if it is OK to make page allocation depend on page_pool from
>>>>>>>>> net/page_pool.
>>>>>>>> Why? It's not really a dependency, just a header include with a static
>>>>>>>> inline function...
>>>>>>> The function is checking, not even modifying, an core mm data structure,
>>>>>>> struct page, which is also used by almost all subsystems. I do not get
>>>>>>> why the function is in net subsystem.
>>>>>> Well, because it's using details of the PP definitions, so keeping it
>>>>>> there nicely encapsulates things. I mean, that's the whole point of
>>>>>> defining a wrapper function - encapsulating the logic 🙂
>>>>>>
>>>>>>>>> Would linux/mm.h be a better place for page_pool_page_is_pp()?
>>>>>>>> That would require moving all the definitions introduced in patch 2,
>>>>>>>> which I don't think is appropriate.
>>
>> The patch at the bottom moves page_pool_page_is_pp() to mm.h and compiles.
>> The macros and the function use mm’s page->pp_magic, so I am not sure
>> why it is appropriate, especially the user of the macros, net/core/page_pool.c,
>> has already included mm.h.
>
> Well, I kinda considered those details page_pool-internal. But okay, I
> can move them if you prefer to have them in mm.h.

Thanks.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
  2025-04-04 10:18 [PATCH net-next v7 0/2] Fix late DMA unmap crash for page pool Toke Høiland-Jørgensen
  2025-04-04 10:18 ` [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions Toke Høiland-Jørgensen
@ 2025-04-04 10:18 ` Toke Høiland-Jørgensen
  2025-04-04 15:55   ` Alexander Lobakin
  1 sibling, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-04 10:18 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox
  Cc: netdev, bpf, linux-rdma, linux-mm,
	Toke Høiland-Jørgensen, Qiuling Ren, Yuying Ma

When enabling DMA mapping in page_pool, pages are kept DMA mapped until
they are released from the pool, to avoid the overhead of re-mapping the
pages every time they are used. This causes resource leaks and/or
crashes when there are pages still outstanding while the device is torn
down, because page_pool will attempt an unmap through a non-existent DMA
device on the subsequent page return.

To fix this, implement a simple tracking of outstanding DMA-mapped pages
in page pool using an xarray. This was first suggested by Mina[0], and
turns out to be fairly straight forward: We simply store pointers to
pages directly in the xarray with xa_alloc() when they are first DMA
mapped, and remove them from the array on unmap. Then, when a page pool
is torn down, it can simply walk the xarray and unmap all pages still
present there before returning, which also allows us to get rid of the
get/put_device() calls in page_pool. Using xa_cmpxchg(), no additional
synchronisation is needed, as a page will only ever be unmapped once.

To avoid having to walk the entire xarray on unmap to find the page
reference, we stash the ID assigned by xa_alloc() into the page
structure itself, using the upper bits of the pp_magic field. This
requires a couple of defines to avoid conflicting with the
POINTER_POISON_DELTA define, but this is all evaluated at compile-time,
so does not affect run-time performance. The bitmap calculations in this
patch gives the following number of bits for different architectures:

- 23 bits on 32-bit architectures
- 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
- 32 bits on other 64-bit architectures

Stashing a value into the unused bits of pp_magic does have the effect
that it can make the value stored there lie outside the unmappable
range (as governed by the mmap_min_addr sysctl), for architectures that
don't define ILLEGAL_POINTER_VALUE. This means that if one of the
pointers that is aliased to the pp_magic field (such as page->lru.next)
is dereferenced while the page is owned by page_pool, that could lead to
a dereference into userspace, which is a security concern. The risk of
this is mitigated by the fact that (a) we always clear pp_magic before
releasing a page from page_pool, and (b) this would need a
use-after-free bug for struct page, which can have many other risks
since page->lru.next is used as a generic list pointer in multiple
places in the kernel. As such, with this patch we take the position that
this risk is negligible in practice. For more discussion, see[1].

Since all the tracking added in this patch is performed on DMA
map/unmap, no additional code is needed in the fast path, meaning the
performance overhead of this tracking is negligible there. A
micro-benchmark shows that the total overhead of the tracking itself is
about 400 ns (39 cycles(tsc) 395.218 ns; sum for both map and unmap[2]).
Since this cost is only paid on DMA map and unmap, it seems like an
acceptable cost to fix the late unmap issue. Further optimisation can
narrow the cases where this cost is paid (for instance by eliding the
tracking when DMA map/unmap is a no-op).

The extra memory needed to track the pages is neatly encapsulated inside
xarray, which uses the 'struct xa_node' structure to track items. This
structure is 576 bytes long, with slots for 64 items, meaning that a
full node occurs only 9 bytes of overhead per slot it tracks (in
practice, it probably won't be this efficient, but in any case it should
be an acceptable overhead).

[0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@mail.gmail.com/
[1] https://lore.kernel.org/r/20250320023202.GA25514@openwall.com
[2] https://lore.kernel.org/r/ae07144c-9295-4c9d-a400-153bb689fe9e@huawei.com

Reported-by: Yonglong Liu <liuyonglong@huawei.com>
Closes: https://lore.kernel.org/r/8743264a-9700-4227-a556-5f931c720211@huawei.com
Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
Suggested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Qiuling Ren <qren@redhat.com>
Tested-by: Yuying Ma <yuma@redhat.com>
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/poison.h        |  4 +++
 include/net/page_pool/types.h | 53 ++++++++++++++++++++++++----
 net/core/netmem_priv.h        | 28 ++++++++++++++-
 net/core/page_pool.c          | 81 ++++++++++++++++++++++++++++++++++++-------
 4 files changed, 147 insertions(+), 19 deletions(-)

diff --git a/include/linux/poison.h b/include/linux/poison.h
index 331a9a996fa8746626afa63ea462b85ca3e5938b..5351efd710d5e21cc341f7bb533b1aeea4a0808a 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -70,6 +70,10 @@
 #define KEY_DESTROY		0xbd
 
 /********** net/core/page_pool.c **********/
+/*
+ * page_pool uses additional free bits within this value to store data, see the
+ * definition of PP_DMA_INDEX_MASK in include/net/page_pool/types.h
+ */
 #define PP_SIGNATURE		(0x40 + POISON_POINTER_DELTA)
 
 /********** net/core/skbuff.c **********/
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index df0d3c1608929605224feb26173135ff37951ef8..5835d359ecd0ac75dd737736926914ef7dd60646 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -6,6 +6,7 @@
 #include <linux/dma-direction.h>
 #include <linux/ptr_ring.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
 #include <net/netmem.h>
 
 #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
@@ -54,13 +55,51 @@ struct pp_alloc_cache {
 	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
 };
 
+/*
+ * DMA mapping IDs
+ *
+ * When DMA-mapping a page, we allocate an ID (from an xarray) and stash this in
+ * the upper bits of page->pp_magic. We always want to be able to unambiguously
+ * identify page pool pages (using page_pool_page_is_pp()). Non-PP pages can
+ * have arbitrary kernel pointers stored in the same field as pp_magic (since it
+ * overlaps with page->lru.next), so we must ensure that we cannot mistake a
+ * valid kernel pointer with any of the values we write into this field.
+ *
+ * On architectures that set POISON_POINTER_DELTA, this is already ensured,
+ * since this value becomes part of PP_SIGNATURE; meaning we can just use the
+ * space between the PP_SIGNATURE value (without POISON_POINTER_DELTA), and the
+ * lowest bits of POISON_POINTER_DELTA. On arches where POISON_POINTER_DELTA is
+ * 0, we make sure that we leave the two topmost bits empty, as that guarantees
+ * we won't mistake a valid kernel pointer for a value we set, regardless of the
+ * VMSPLIT setting.
+ *
+ * Altogether, this means that the number of bits available is constrained by
+ * the size of an unsigned long (at the upper end, subtracting two bits per the
+ * above), and the definition of PP_SIGNATURE (with or without
+ * POISON_POINTER_DELTA).
+ */
+#define PP_DMA_INDEX_SHIFT (1 + __fls(PP_SIGNATURE - POISON_POINTER_DELTA))
+#if POISON_POINTER_DELTA > 0
+/* PP_SIGNATURE includes POISON_POINTER_DELTA, so limit the size of the DMA
+ * index to not overlap with that if set
+ */
+#define PP_DMA_INDEX_BITS MIN(32, __ffs(POISON_POINTER_DELTA) - PP_DMA_INDEX_SHIFT)
+#else
+/* Always leave out the topmost two; see above. */
+#define PP_DMA_INDEX_BITS MIN(32, BITS_PER_LONG - PP_DMA_INDEX_SHIFT - 2)
+#endif
+
+#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
+				  PP_DMA_INDEX_SHIFT)
+#define PP_DMA_INDEX_LIMIT XA_LIMIT(1, BIT(PP_DMA_INDEX_BITS) - 1)
+
 /* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
  * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page.
- * page_is_pfmemalloc() is checked in __page_pool_put_page() to avoid recycling
- * the pfmemalloc page.
+ * the head page of compound page and bit 1 for pfmemalloc page, as well as the
+ * bits used for the DMA index. page_is_pfmemalloc() is checked in
+ * __page_pool_put_page() to avoid recycling the pfmemalloc page.
  */
-#define PP_MAGIC_MASK ~0x3UL
+#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
 
 /**
  * struct page_pool_params - page pool parameters
@@ -173,10 +212,10 @@ struct page_pool {
 	int cpuid;
 	u32 pages_state_hold_cnt;
 
-	bool has_init_callback:1;	/* slow::init_callback is set */
+	bool dma_sync;			/* Perform DMA sync for device */
 	bool dma_map:1;			/* Perform DMA mapping */
-	bool dma_sync:1;		/* Perform DMA sync for device */
 	bool dma_sync_for_cpu:1;	/* Perform DMA sync for cpu */
+	bool has_init_callback:1;	/* slow::init_callback is set */
 #ifdef CONFIG_PAGE_POOL_STATS
 	bool system:1;			/* This is a global percpu pool */
 #endif
@@ -229,6 +268,8 @@ struct page_pool {
 	void *mp_priv;
 	const struct memory_provider_ops *mp_ops;
 
+	struct xarray dma_mapped;
+
 #ifdef CONFIG_PAGE_POOL_STATS
 	/* recycle stats are per-cpu to avoid locking */
 	struct page_pool_recycle_stats __percpu *recycle_stats;
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
index f33162fd281c23e109273ba09950c5d0a2829bc9..cd95394399b40c3604934ba7898eeeeacb8aee99 100644
--- a/net/core/netmem_priv.h
+++ b/net/core/netmem_priv.h
@@ -5,7 +5,7 @@
 
 static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
 {
-	return __netmem_clear_lsb(netmem)->pp_magic;
+	return __netmem_clear_lsb(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
 }
 
 static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
@@ -15,6 +15,8 @@ static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
 
 static inline void netmem_clear_pp_magic(netmem_ref netmem)
 {
+	WARN_ON_ONCE(__netmem_clear_lsb(netmem)->pp_magic & PP_DMA_INDEX_MASK);
+
 	__netmem_clear_lsb(netmem)->pp_magic = 0;
 }
 
@@ -33,4 +35,28 @@ static inline void netmem_set_dma_addr(netmem_ref netmem,
 {
 	__netmem_clear_lsb(netmem)->dma_addr = dma_addr;
 }
+
+static inline unsigned long netmem_get_dma_index(netmem_ref netmem)
+{
+	unsigned long magic;
+
+	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
+		return 0;
+
+	magic = __netmem_clear_lsb(netmem)->pp_magic;
+
+	return (magic & PP_DMA_INDEX_MASK) >> PP_DMA_INDEX_SHIFT;
+}
+
+static inline void netmem_set_dma_index(netmem_ref netmem,
+					unsigned long id)
+{
+	unsigned long magic;
+
+	if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
+		return;
+
+	magic = netmem_get_pp_magic(netmem) | (id << PP_DMA_INDEX_SHIFT);
+	__netmem_clear_lsb(netmem)->pp_magic = magic;
+}
 #endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2d801580a6760eba9393e1cf67b01..370b66ca834a48cfad83d4a1d50cad11141ac642 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -272,13 +272,11 @@ static int page_pool_init(struct page_pool *pool,
 	}
 
 	atomic_set(&pool->pages_state_release_cnt, 0);
+	xa_init_flags(&pool->dma_mapped, XA_FLAGS_ALLOC1);
 
 	/* Driver calling page_pool_create() also call page_pool_destroy() */
 	refcount_set(&pool->user_cnt, 1);
 
-	if (pool->dma_map)
-		get_device(pool->p.dev);
-
 	if (pool->slow.flags & PP_FLAG_ALLOW_UNREADABLE_NETMEM) {
 		netdev_assert_locked(pool->slow.netdev);
 		rxq = __netif_get_rx_queue(pool->slow.netdev,
@@ -322,7 +320,7 @@ static void page_pool_uninit(struct page_pool *pool)
 	ptr_ring_cleanup(&pool->ring, NULL);
 
 	if (pool->dma_map)
-		put_device(pool->p.dev);
+		xa_destroy(&pool->dma_mapped);
 
 #ifdef CONFIG_PAGE_POOL_STATS
 	if (!pool->system)
@@ -463,13 +461,21 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
 			      netmem_ref netmem,
 			      u32 dma_sync_size)
 {
-	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
-		__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
+	if (READ_ONCE(pool->dma_sync) && dma_dev_need_sync(pool->p.dev)) {
+		rcu_read_lock();
+		/* re-check under rcu_read_lock() to sync with page_pool_scrub() */
+		if (READ_ONCE(pool->dma_sync))
+			__page_pool_dma_sync_for_device(pool, netmem,
+							dma_sync_size);
+		rcu_read_unlock();
+	}
 }
 
-static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
+static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t gfp)
 {
 	dma_addr_t dma;
+	int err;
+	u32 id;
 
 	/* Setup DMA mapping: use 'struct page' area for storing DMA-addr
 	 * since dma_addr_t can be either 32 or 64 bits and does not always fit
@@ -483,15 +489,28 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 	if (dma_mapping_error(pool->p.dev, dma))
 		return false;
 
-	if (page_pool_set_dma_addr_netmem(netmem, dma))
+	if (page_pool_set_dma_addr_netmem(netmem, dma)) {
+		WARN_ONCE(1, "unexpected DMA address, please report to netdev@");
 		goto unmap_failed;
+	}
 
+	if (in_softirq())
+		err = xa_alloc(&pool->dma_mapped, &id, netmem_to_page(netmem),
+			       PP_DMA_INDEX_LIMIT, gfp);
+	else
+		err = xa_alloc_bh(&pool->dma_mapped, &id, netmem_to_page(netmem),
+				  PP_DMA_INDEX_LIMIT, gfp);
+	if (err) {
+		WARN_ONCE(err != -ENOMEM, "couldn't track DMA mapping, please report to netdev@");
+		goto unmap_failed;
+	}
+
+	netmem_set_dma_index(netmem, id);
 	page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len);
 
 	return true;
 
 unmap_failed:
-	WARN_ONCE(1, "unexpected DMA address, please report to netdev@");
 	dma_unmap_page_attrs(pool->p.dev, dma,
 			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
 			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
@@ -508,7 +527,7 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 	if (unlikely(!page))
 		return NULL;
 
-	if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page_to_netmem(page)))) {
+	if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page_to_netmem(page), gfp))) {
 		put_page(page);
 		return NULL;
 	}
@@ -554,7 +573,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
 	 */
 	for (i = 0; i < nr_pages; i++) {
 		netmem = pool->alloc.cache[i];
-		if (dma_map && unlikely(!page_pool_dma_map(pool, netmem))) {
+		if (dma_map && unlikely(!page_pool_dma_map(pool, netmem, gfp))) {
 			put_page(netmem_to_page(netmem));
 			continue;
 		}
@@ -656,6 +675,8 @@ void page_pool_clear_pp_info(netmem_ref netmem)
 static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
 							 netmem_ref netmem)
 {
+	struct page *old, *page = netmem_to_page(netmem);
+	unsigned long id;
 	dma_addr_t dma;
 
 	if (!pool->dma_map)
@@ -664,6 +685,17 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
 		 */
 		return;
 
+	id = netmem_get_dma_index(netmem);
+	if (!id)
+		return;
+
+	if (in_softirq())
+		old = xa_cmpxchg(&pool->dma_mapped, id, page, NULL, 0);
+	else
+		old = xa_cmpxchg_bh(&pool->dma_mapped, id, page, NULL, 0);
+	if (old != page)
+		return;
+
 	dma = page_pool_get_dma_addr_netmem(netmem);
 
 	/* When page is unmapped, it cannot be returned to our pool */
@@ -671,6 +703,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
 			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
 			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
 	page_pool_set_dma_addr_netmem(netmem, 0);
+	netmem_set_dma_index(netmem, 0);
 }
 
 /* Disconnects a page (from a page_pool).  API users can have a need
@@ -1080,8 +1113,32 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
 
 static void page_pool_scrub(struct page_pool *pool)
 {
+	unsigned long id;
+	void *ptr;
+
 	page_pool_empty_alloc_cache_once(pool);
-	pool->destroy_cnt++;
+	if (!pool->destroy_cnt++ && pool->dma_map) {
+		if (pool->dma_sync) {
+			/* paired with READ_ONCE in
+			 * page_pool_dma_sync_for_device() and
+			 * __page_pool_dma_sync_for_cpu()
+			 */
+			WRITE_ONCE(pool->dma_sync, false);
+
+			/* Make sure all concurrent returns that may see the old
+			 * value of dma_sync (and thus perform a sync) have
+			 * finished before doing the unmapping below. Skip the
+			 * wait if the device doesn't actually need syncing, or
+			 * if there are no outstanding mapped pages.
+			 */
+			if (dma_dev_need_sync(pool->p.dev) &&
+			    !xa_empty(&pool->dma_mapped))
+				synchronize_net();
+		}
+
+		xa_for_each(&pool->dma_mapped, id, ptr)
+			__page_pool_release_page_dma(pool, page_to_netmem(ptr));
+	}
 
 	/* No more consumers should exist, but producers could still
 	 * be in-flight.

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
  2025-04-04 10:18 ` [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool Toke Høiland-Jørgensen
@ 2025-04-04 15:55   ` Alexander Lobakin
  2025-04-04 16:14     ` Alexander Lobakin
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2025-04-04 15:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm,
	Qiuling Ren, Yuying Ma

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Fri, 04 Apr 2025 12:18:36 +0200

> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
> they are released from the pool, to avoid the overhead of re-mapping the
> pages every time they are used. This causes resource leaks and/or
> crashes when there are pages still outstanding while the device is torn
> down, because page_pool will attempt an unmap through a non-existent DMA
> device on the subsequent page return.

[...]

> -#define PP_MAGIC_MASK ~0x3UL
> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>  
>  /**
>   * struct page_pool_params - page pool parameters
> @@ -173,10 +212,10 @@ struct page_pool {
>  	int cpuid;
>  	u32 pages_state_hold_cnt;
>  
> -	bool has_init_callback:1;	/* slow::init_callback is set */
> +	bool dma_sync;			/* Perform DMA sync for device */

Yunsheng said this change to a full bool is redundant in the v6 thread
¯\_(ツ)_/¯
I hope you've read it.

>  	bool dma_map:1;			/* Perform DMA mapping */
> -	bool dma_sync:1;		/* Perform DMA sync for device */
>  	bool dma_sync_for_cpu:1;	/* Perform DMA sync for cpu */
> +	bool has_init_callback:1;	/* slow::init_callback is set */
>  #ifdef CONFIG_PAGE_POOL_STATS
>  	bool system:1;			/* This is a global percpu pool */
>  #endif

Thanks,
Olek


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
  2025-04-04 15:55   ` Alexander Lobakin
@ 2025-04-04 16:14     ` Alexander Lobakin
  2025-04-05 12:50       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2025-04-04 16:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm,
	Qiuling Ren, Yuying Ma

From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Fri, 4 Apr 2025 17:55:43 +0200

> From: Toke Høiland-Jørgensen <toke@redhat.com>
> Date: Fri, 04 Apr 2025 12:18:36 +0200
> 
>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>> they are released from the pool, to avoid the overhead of re-mapping the
>> pages every time they are used. This causes resource leaks and/or
>> crashes when there are pages still outstanding while the device is torn
>> down, because page_pool will attempt an unmap through a non-existent DMA
>> device on the subsequent page return.
> 
> [...]
> 
>> -#define PP_MAGIC_MASK ~0x3UL
>> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>  
>>  /**
>>   * struct page_pool_params - page pool parameters
>> @@ -173,10 +212,10 @@ struct page_pool {
>>  	int cpuid;
>>  	u32 pages_state_hold_cnt;
>>  
>> -	bool has_init_callback:1;	/* slow::init_callback is set */
>> +	bool dma_sync;			/* Perform DMA sync for device */
> 
> Yunsheng said this change to a full bool is redundant in the v6 thread
> ¯\_(ツ)_/¯


Under v5, sorree >_<

> I hope you've read it.
> 
>>  	bool dma_map:1;			/* Perform DMA mapping */
>> -	bool dma_sync:1;		/* Perform DMA sync for device */
>>  	bool dma_sync_for_cpu:1;	/* Perform DMA sync for cpu */
>> +	bool has_init_callback:1;	/* slow::init_callback is set */
>>  #ifdef CONFIG_PAGE_POOL_STATS
>>  	bool system:1;			/* This is a global percpu pool */
>>  #endif

Thanks,
Olek


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
  2025-04-04 16:14     ` Alexander Lobakin
@ 2025-04-05 12:50       ` Toke Høiland-Jørgensen
  2025-04-07 11:26         ` Yunsheng Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-05 12:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
	Pavel Begunkov, Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm,
	Qiuling Ren, Yuying Ma

Alexander Lobakin <aleksander.lobakin@intel.com> writes:

> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Fri, 4 Apr 2025 17:55:43 +0200
>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> Date: Fri, 04 Apr 2025 12:18:36 +0200
>> 
>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>>> they are released from the pool, to avoid the overhead of re-mapping the
>>> pages every time they are used. This causes resource leaks and/or
>>> crashes when there are pages still outstanding while the device is torn
>>> down, because page_pool will attempt an unmap through a non-existent DMA
>>> device on the subsequent page return.
>> 
>> [...]
>> 
>>> -#define PP_MAGIC_MASK ~0x3UL
>>> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>  
>>>  /**
>>>   * struct page_pool_params - page pool parameters
>>> @@ -173,10 +212,10 @@ struct page_pool {
>>>  	int cpuid;
>>>  	u32 pages_state_hold_cnt;
>>>  
>>> -	bool has_init_callback:1;	/* slow::init_callback is set */
>>> +	bool dma_sync;			/* Perform DMA sync for device */
>> 
>> Yunsheng said this change to a full bool is redundant in the v6 thread
>> ¯\_(ツ)_/¯

AFAIU, the comment was that the second READ_ONCE() when reading the
field was redundant, because of the rcu_read_lock(). Which may be the
case, but I think keeping it makes the intent of the code clearer. And
in any case, it has nothing to do with changing the type of the field...

-Toke



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
  2025-04-05 12:50       ` Toke Høiland-Jørgensen
@ 2025-04-07 11:26         ` Yunsheng Lin
  2025-04-07 11:49           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2025-04-07 11:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Pavel Begunkov,
	Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm, Qiuling Ren,
	Yuying Ma

On 2025/4/5 20:50, Toke Høiland-Jørgensen wrote:
> Alexander Lobakin <aleksander.lobakin@intel.com> writes:
> 
>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Date: Fri, 4 Apr 2025 17:55:43 +0200
>>
>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>> Date: Fri, 04 Apr 2025 12:18:36 +0200
>>>
>>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>>>> they are released from the pool, to avoid the overhead of re-mapping the
>>>> pages every time they are used. This causes resource leaks and/or
>>>> crashes when there are pages still outstanding while the device is torn
>>>> down, because page_pool will attempt an unmap through a non-existent DMA
>>>> device on the subsequent page return.
>>>
>>> [...]
>>>
>>>> -#define PP_MAGIC_MASK ~0x3UL
>>>> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>  
>>>>  /**
>>>>   * struct page_pool_params - page pool parameters
>>>> @@ -173,10 +212,10 @@ struct page_pool {
>>>>  	int cpuid;
>>>>  	u32 pages_state_hold_cnt;
>>>>  
>>>> -	bool has_init_callback:1;	/* slow::init_callback is set */
>>>> +	bool dma_sync;			/* Perform DMA sync for device */
>>>
>>> Yunsheng said this change to a full bool is redundant in the v6 thread
>>> ¯\_(ツ)_/¯
> 
> AFAIU, the comment was that the second READ_ONCE() when reading the
> field was redundant, because of the rcu_read_lock(). Which may be the
> case, but I think keeping it makes the intent of the code clearer. And
> in any case, it has nothing to do with changing the type of the field...

For changing the type of the field part, there are only two outcomes here
when using bit field here:
1. The reading returns a correct value.
2. The reading returns a incorrect value.

So the question seems to be what would possibly go wrong when the reading
return an incorrect value when there is an additional reading under the rcu
read lock and there is a rcu sync after clearing pool->dma_sync? Considering
we only need to ensure there is no dma sync API called after rcu sync.

And it seems data_race() can be used to mark the above reading so that KCSAN
will not complain.

IOW, changing the type of the field part isn't that necessary as my understanding.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
  2025-04-07 11:26         ` Yunsheng Lin
@ 2025-04-07 11:49           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-04-07 11:49 UTC (permalink / raw)
  To: Yunsheng Lin, Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
	Andrew Morton, Mina Almasry, Yonglong Liu, Pavel Begunkov,
	Matthew Wilcox, netdev, bpf, linux-rdma, linux-mm, Qiuling Ren,
	Yuying Ma

Yunsheng Lin <linyunsheng@huawei.com> writes:

> On 2025/4/5 20:50, Toke Høiland-Jørgensen wrote:
>> Alexander Lobakin <aleksander.lobakin@intel.com> writes:
>> 
>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Date: Fri, 4 Apr 2025 17:55:43 +0200
>>>
>>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> Date: Fri, 04 Apr 2025 12:18:36 +0200
>>>>
>>>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>>>>> they are released from the pool, to avoid the overhead of re-mapping the
>>>>> pages every time they are used. This causes resource leaks and/or
>>>>> crashes when there are pages still outstanding while the device is torn
>>>>> down, because page_pool will attempt an unmap through a non-existent DMA
>>>>> device on the subsequent page return.
>>>>
>>>> [...]
>>>>
>>>>> -#define PP_MAGIC_MASK ~0x3UL
>>>>> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>  
>>>>>  /**
>>>>>   * struct page_pool_params - page pool parameters
>>>>> @@ -173,10 +212,10 @@ struct page_pool {
>>>>>  	int cpuid;
>>>>>  	u32 pages_state_hold_cnt;
>>>>>  
>>>>> -	bool has_init_callback:1;	/* slow::init_callback is set */
>>>>> +	bool dma_sync;			/* Perform DMA sync for device */
>>>>
>>>> Yunsheng said this change to a full bool is redundant in the v6 thread
>>>> ¯\_(ツ)_/¯
>> 
>> AFAIU, the comment was that the second READ_ONCE() when reading the
>> field was redundant, because of the rcu_read_lock(). Which may be the
>> case, but I think keeping it makes the intent of the code clearer. And
>> in any case, it has nothing to do with changing the type of the field...
>
> For changing the type of the field part, there are only two outcomes here
> when using bit field here:
> 1. The reading returns a correct value.
> 2. The reading returns a incorrect value.
>
> So the question seems to be what would possibly go wrong when the reading
> return an incorrect value when there is an additional reading under the rcu
> read lock and there is a rcu sync after clearing pool->dma_sync? Considering
> we only need to ensure there is no dma sync API called after rcu sync.

Okay, so your argument is basically that the barrier in rcu_read_lock()
should prevent the compiler from coalescing the two reads of the
pp->dma_sync field in page_pool_dma_sync_for_device()? And that
READ/WRITE_ONCE() are not needed for the same reason?

> And it seems data_race() can be used to mark the above reading so that KCSAN
> will not complain.

Where would you suggest to add those? Not sure such annotations would
improve readability relative to the current use of READ/WRITE_ONCE()?
The latter is more clear in communicating intent, I would say...

> IOW, changing the type of the field part isn't that necessary as my
> understanding.

Since changing the field doesn't change the size of the structure, I
would be inclined to keep the change for readability reasons, cf the
above.

-Toke



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-04-07 16:06 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-04 10:18 [PATCH net-next v7 0/2] Fix late DMA unmap crash for page pool Toke Høiland-Jørgensen
2025-04-04 10:18 ` [PATCH net-next v7 1/2] page_pool: Move pp_magic check into helper functions Toke Høiland-Jørgensen
2025-04-06 18:56   ` Zi Yan
2025-04-07  8:53     ` Toke Høiland-Jørgensen
2025-04-07 11:53       ` Zi Yan
2025-04-07 12:24       ` Zi Yan
2025-04-07 13:14         ` Toke Høiland-Jørgensen
2025-04-07 13:36           ` Zi Yan
2025-04-07 14:15             ` Zi Yan
2025-04-07 14:43               ` Jesper Dangaard Brouer
2025-04-07 15:50                 ` Zi Yan
2025-04-07 16:05                   ` Toke Høiland-Jørgensen
2025-04-07 16:06                     ` Zi Yan
2025-04-04 10:18 ` [PATCH net-next v7 2/2] page_pool: Track DMA-mapped pages and unmap them when destroying the pool Toke Høiland-Jørgensen
2025-04-04 15:55   ` Alexander Lobakin
2025-04-04 16:14     ` Alexander Lobakin
2025-04-05 12:50       ` Toke Høiland-Jørgensen
2025-04-07 11:26         ` Yunsheng Lin
2025-04-07 11:49           ` Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).