bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v6 0/9] Split netmem from struct page
@ 2025-06-20  4:12 Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

Hi all,

In this version, I'm posting non-controversial patches first.  I will
post the rest later.  This version has no update but adding given
'Reviewed-by's and 'Acked-by's.  See the changes below.

The MM subsystem is trying to reduce struct page to a single pointer.
The first step towards that is splitting struct page by its individual
users, as has already been done with folio and slab.  This patchset does
that for netmem which is used for page pools.

Matthew Wilcox tried and stopped the same work, you can see in:

   https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/

Mina Almasry already has done a lot fo prerequisite works by luck.  I
stacked my patches on the top of his work e.i. netmem.

I focused on removing the page pool members in struct page this time,
not moving the allocation code of page pool from net to mm.  It can be
done later if needed.

The final patch removing the page pool fields will be submitted once
all the converting work of page to netmem are done:

   1. converting of libeth_fqe by Tony Nguyen.
   2. converting of mlx5 by Tariq Toukan.
   3. converting of prueth_swdata.
   4. converting of freescale driver.

For our discussion, I'm sharing what the final patch looks like the
following.

	Byungchul
--8<--
commit 1847d9890f798456b21ccb27aac7545303048492
Author: Byungchul Park <byungchul@sk.com>
Date:   Wed May 28 20:44:55 2025 +0900

    mm, netmem: remove the page pool members in struct page
    
    Now that all the users of the page pool members in struct page have been
    gone, the members can be removed from struct page.
    
    However, since struct netmem_desc still uses the space in struct page,
    the important offsets should be checked properly, until struct
    netmem_desc has its own instance from slab.
    
    Remove the page pool members in struct page and modify static checkers
    for the offsets.
    
    Signed-off-by: Byungchul Park <byungchul@sk.com>

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 32ba5126e221..db2fe0d0ebbf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -120,17 +120,6 @@ struct page {
 			 */
 			unsigned long private;
 		};
-		struct {	/* page_pool used by netstack */
-			/**
-			 * @pp_magic: magic value to avoid recycling non
-			 * page_pool allocated pages.
-			 */
-			unsigned long pp_magic;
-			struct page_pool *pp;
-			unsigned long _pp_mapping_pad;
-			unsigned long dma_addr;
-			atomic_long_t pp_ref_count;
-		};
 		struct {	/* Tail pages of compound page */
 			unsigned long compound_head;	/* Bit zero is set */
 		};
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 8f354ae7d5c3..3414f184d018 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -42,11 +42,8 @@ struct netmem_desc {
 	static_assert(offsetof(struct page, pg) == \
 		      offsetof(struct netmem_desc, desc))
 NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
-NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
-NETMEM_DESC_ASSERT_OFFSET(pp, pp);
-NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
-NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
-NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+NETMEM_DESC_ASSERT_OFFSET(lru, pp_magic);
+NETMEM_DESC_ASSERT_OFFSET(mapping, _pp_mapping_pad);
 #undef NETMEM_DESC_ASSERT_OFFSET
 
 /*
---
Changes from v5 (no actual updates):
	1. Rease on net-next/main as of Jun 20.
	2. Add given 'Reviewed-by's and 'Acked-by's, thanks to all.
	3. Add missing cc's.

Changes from v4:
	1. Add given 'Reviewed-by's, thanks to all.
	2. Exclude potentially controversial patches.

Changes from v3:
	1. Relocates ->owner and ->type of net_iov out of netmem_desc
	   and make them be net_iov specific.
	2. Remove __force when casting struct page to struct netmem_desc.

Changes from v2:
	1. Introduce a netmem API, virt_to_head_netmem(), and use it
	   when it's needed.
	2. Introduce struct netmem_desc as a new struct and union'ed
	   with the existing fields in struct net_iov.
	3. Make page_pool_page_is_pp() access ->pp_magic through struct
	   netmem_desc instead of struct page.
	4. Move netmem alloc APIs from include/net/netmem.h to
	   net/core/netmem_priv.h.
	5. Apply trivial feedbacks, thanks to Mina, Pavel, and Toke.
	6. Add given 'Reviewed-by's, thanks to Mina.

Changes from v1:
	1. Rebase on net-next's main as of May 26.
	2. Check checkpatch.pl, feedbacked by SJ Park.
	3. Add converting of page to netmem in mt76.
	4. Revert 'mlx5: use netmem descriptor and APIs for page pool'
	   since it's on-going by Tariq Toukan.  I will wait for his
	   work to be done.
	5. Revert 'page_pool: use netmem APIs to access page->pp_magic
	   in page_pool_page_is_pp()' since we need more discussion.
	6. Revert 'mm, netmem: remove the page pool members in struct
	   page' since there are some prerequisite works to remove the
	   page pool fields from struct page.  I can submit this patch
	   separatedly later.
	7. Cancel relocating a page pool member in struct page.
	8. Modify static assert for offests and size of struct
	   netmem_desc.

Changes from rfc:
	1. Rebase on net-next's main branch.
	   https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
	2. Fix a build error reported by kernel test robot.
	   https://lore.kernel.org/all/202505100932.uzAMBW1y-lkp@intel.com/
	3. Add given 'Reviewed-by's, thanks to Mina and Ilias.
	4. Do static_assert() on the size of struct netmem_desc instead
	   of placing place-holder in struct page, feedbacked by
	   Matthew.
	5. Do struct_group_tagged(netmem_desc) on struct net_iov instead
	   of wholly renaming it to strcut netmem_desc, feedbacked by
	   Mina and Pavel.

Byungchul Park (9):
  netmem: introduce struct netmem_desc mirroring struct page
  page_pool: rename page_pool_return_page() to page_pool_return_netmem()
  page_pool: rename __page_pool_release_page_dma() to
    __page_pool_release_netmem_dma()
  page_pool: rename __page_pool_alloc_pages_slow() to
    __page_pool_alloc_netmems_slow()
  netmem: use _Generic to cover const casting for page_to_netmem()
  netmem: remove __netmem_get_pp()
  page_pool: make page_pool_get_dma_addr() just wrap
    page_pool_get_dma_addr_netmem()
  netmem: introduce a netmem API, virt_to_head_netmem()
  page_pool: access ->pp_magic through struct netmem_desc in
    page_pool_page_is_pp()

 include/linux/mm.h              |  12 ---
 include/net/netmem.h            | 138 ++++++++++++++++++++++----------
 include/net/page_pool/helpers.h |   7 +-
 mm/page_alloc.c                 |   1 +
 net/core/page_pool.c            |  36 ++++-----
 5 files changed, 117 insertions(+), 77 deletions(-)


base-commit: 4f4040ea5d3e4bebebbef9379f88085c8b99221c
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-23  9:32   ` David Hildenbrand
  2025-06-20  4:12 ` [PATCH net-next v6 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

To simplify struct page, the page pool members of struct page should be
moved to other, allowing these members to be removed from struct page.

Introduce a network memory descriptor to store the members, struct
netmem_desc, and make it union'ed with the existing fields in struct
net_iov, allowing to organize the fields of struct net_iov.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
---
 include/net/netmem.h | 94 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 21 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 7a1dafa3f080..e0aa67aca9bb 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -12,6 +12,50 @@
 #include <linux/mm.h>
 #include <net/net_debug.h>
 
+/* These fields in struct page are used by the page_pool and net stack:
+ *
+ *        struct {
+ *                unsigned long pp_magic;
+ *                struct page_pool *pp;
+ *                unsigned long _pp_mapping_pad;
+ *                unsigned long dma_addr;
+ *                atomic_long_t pp_ref_count;
+ *        };
+ *
+ * We mirror the page_pool fields here so the page_pool can access these
+ * fields without worrying whether the underlying fields belong to a
+ * page or netmem_desc.
+ *
+ * CAUTION: Do not update the fields in netmem_desc without also
+ * updating the anonymous aliasing union in struct net_iov.
+ */
+struct netmem_desc {
+	unsigned long _flags;
+	unsigned long pp_magic;
+	struct page_pool *pp;
+	unsigned long _pp_mapping_pad;
+	unsigned long dma_addr;
+	atomic_long_t pp_ref_count;
+};
+
+#define NETMEM_DESC_ASSERT_OFFSET(pg, desc)        \
+	static_assert(offsetof(struct page, pg) == \
+		      offsetof(struct netmem_desc, desc))
+NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
+NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
+NETMEM_DESC_ASSERT_OFFSET(pp, pp);
+NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
+NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
+NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+#undef NETMEM_DESC_ASSERT_OFFSET
+
+/*
+ * Since struct netmem_desc uses the space in struct page, the size
+ * should be checked, until struct netmem_desc has its own instance from
+ * slab, to avoid conflicting with other members within struct page.
+ */
+static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
+
 /* net_iov */
 
 DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
@@ -31,12 +75,25 @@ enum net_iov_type {
 };
 
 struct net_iov {
-	enum net_iov_type type;
-	unsigned long pp_magic;
-	struct page_pool *pp;
+	union {
+		struct netmem_desc desc;
+
+		/* XXX: The following part should be removed once all
+		 * the references to them are converted so as to be
+		 * accessed via netmem_desc e.g. niov->desc.pp instead
+		 * of niov->pp.
+		 */
+		struct {
+			unsigned long _flags;
+			unsigned long pp_magic;
+			struct page_pool *pp;
+			unsigned long _pp_mapping_pad;
+			unsigned long dma_addr;
+			atomic_long_t pp_ref_count;
+		};
+	};
 	struct net_iov_area *owner;
-	unsigned long dma_addr;
-	atomic_long_t pp_ref_count;
+	enum net_iov_type type;
 };
 
 struct net_iov_area {
@@ -48,27 +105,22 @@ struct net_iov_area {
 	unsigned long base_virtual;
 };
 
-/* These fields in struct page are used by the page_pool and net stack:
+/* net_iov is union'ed with struct netmem_desc mirroring struct page, so
+ * the page_pool can access these fields without worrying whether the
+ * underlying fields are accessed via netmem_desc or directly via
+ * net_iov, until all the references to them are converted so as to be
+ * accessed via netmem_desc e.g. niov->desc.pp instead of niov->pp.
  *
- *        struct {
- *                unsigned long pp_magic;
- *                struct page_pool *pp;
- *                unsigned long _pp_mapping_pad;
- *                unsigned long dma_addr;
- *                atomic_long_t pp_ref_count;
- *        };
- *
- * We mirror the page_pool fields here so the page_pool can access these fields
- * without worrying whether the underlying fields belong to a page or net_iov.
- *
- * The non-net stack fields of struct page are private to the mm stack and must
- * never be mirrored to net_iov.
+ * The non-net stack fields of struct page are private to the mm stack
+ * and must never be mirrored to net_iov.
  */
-#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
-	static_assert(offsetof(struct page, pg) == \
+#define NET_IOV_ASSERT_OFFSET(desc, iov)                    \
+	static_assert(offsetof(struct netmem_desc, desc) == \
 		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(_flags, _flags);
 NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
 NET_IOV_ASSERT_OFFSET(pp, pp);
+NET_IOV_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
 NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
 NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
 #undef NET_IOV_ASSERT_OFFSET
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

Now that page_pool_return_page() is for returning netmem, not struct
page, rename it to page_pool_return_netmem() to reflect what it does.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 net/core/page_pool.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ba7cf3e3c32f..3bf25e554f96 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -371,7 +371,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
 }
 EXPORT_SYMBOL(page_pool_create);
 
-static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem);
 
 static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 {
@@ -409,7 +409,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 			 * (2) break out to fallthrough to alloc_pages_node.
 			 * This limit stress on page buddy alloactor.
 			 */
-			page_pool_return_page(pool, netmem);
+			page_pool_return_netmem(pool, netmem);
 			alloc_stat_inc(pool, waive);
 			netmem = 0;
 			break;
@@ -712,7 +712,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
  * a regular page (that will eventually be returned to the normal
  * page-allocator via put_page).
  */
-void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
 {
 	int count;
 	bool put;
@@ -826,7 +826,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 	 * will be invoking put_page.
 	 */
 	recycle_stat_inc(pool, released_refcnt);
-	page_pool_return_page(pool, netmem);
+	page_pool_return_netmem(pool, netmem);
 
 	return 0;
 }
@@ -869,7 +869,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
 	if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
 		/* Cache full, fallback to free pages */
 		recycle_stat_inc(pool, ring_full);
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 EXPORT_SYMBOL(page_pool_put_unrefed_netmem);
@@ -912,7 +912,7 @@ static void page_pool_recycle_ring_bulk(struct page_pool *pool,
 	 * since put_page() with refcnt == 1 can be an expensive operation.
 	 */
 	for (; i < bulk_len; i++)
-		page_pool_return_page(pool, bulk[i]);
+		page_pool_return_netmem(pool, bulk[i]);
 }
 
 /**
@@ -995,7 +995,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
 		return netmem;
 	}
 
-	page_pool_return_page(pool, netmem);
+	page_pool_return_netmem(pool, netmem);
 	return 0;
 }
 
@@ -1009,7 +1009,7 @@ static void page_pool_free_frag(struct page_pool *pool)
 	if (!netmem || page_pool_unref_netmem(netmem, drain_count))
 		return;
 
-	page_pool_return_page(pool, netmem);
+	page_pool_return_netmem(pool, netmem);
 }
 
 netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
@@ -1076,7 +1076,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 			pr_crit("%s() page_pool refcnt %d violation\n",
 				__func__, netmem_ref_count(netmem));
 
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 
@@ -1109,7 +1109,7 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
 	 */
 	while (pool->alloc.count) {
 		netmem = pool->alloc.cache[--pool->alloc.count];
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 
@@ -1253,7 +1253,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	/* Flush pool alloc cache, as refill will check NUMA node */
 	while (pool->alloc.count) {
 		netmem = pool->alloc.cache[--pool->alloc.count];
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

Now that __page_pool_release_page_dma() is for releasing netmem, not
struct page, rename it to __page_pool_release_netmem_dma() to reflect
what it does.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 net/core/page_pool.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 3bf25e554f96..95ffa48c7c67 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -673,8 +673,8 @@ void page_pool_clear_pp_info(netmem_ref netmem)
 	netmem_set_pp(netmem, NULL);
 }
 
-static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
-							 netmem_ref netmem)
+static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
+							   netmem_ref netmem)
 {
 	struct page *old, *page = netmem_to_page(netmem);
 	unsigned long id;
@@ -721,7 +721,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
 	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
 		put = pool->mp_ops->release_netmem(pool, netmem);
 	else
-		__page_pool_release_page_dma(pool, netmem);
+		__page_pool_release_netmem_dma(pool, netmem);
 
 	/* This may be the last page returned, releasing the pool, so
 	 * it is not safe to reference pool afterwards.
@@ -1136,7 +1136,7 @@ static void page_pool_scrub(struct page_pool *pool)
 		}
 
 		xa_for_each(&pool->dma_mapped, id, ptr)
-			__page_pool_release_page_dma(pool, page_to_netmem(ptr));
+			__page_pool_release_netmem_dma(pool, page_to_netmem((struct page *)ptr));
 	}
 
 	/* No more consumers should exist, but producers could still
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
                   ` (2 preceding siblings ...)
  2025-06-20  4:12 ` [PATCH net-next v6 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

Now that __page_pool_alloc_pages_slow() is for allocating netmem, not
struct page, rename it to __page_pool_alloc_netmems_slow() to reflect
what it does.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/page_pool.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 95ffa48c7c67..05e2e22a8f7c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -544,8 +544,8 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 }
 
 /* slow path */
-static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
-							gfp_t gfp)
+static noinline netmem_ref __page_pool_alloc_netmems_slow(struct page_pool *pool,
+							  gfp_t gfp)
 {
 	const int bulk = PP_ALLOC_CACHE_REFILL;
 	unsigned int pp_order = pool->p.order;
@@ -615,7 +615,7 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
 	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
 		netmem = pool->mp_ops->alloc_netmems(pool, gfp);
 	else
-		netmem = __page_pool_alloc_pages_slow(pool, gfp);
+		netmem = __page_pool_alloc_netmems_slow(pool, gfp);
 	return netmem;
 }
 EXPORT_SYMBOL(page_pool_alloc_netmems);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 5/9] netmem: use _Generic to cover const casting for page_to_netmem()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
                   ` (3 preceding siblings ...)
  2025-06-20  4:12 ` [PATCH net-next v6 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp() Byungchul Park
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

The current page_to_netmem() doesn't cover const casting resulting in
trying to cast const struct page * to const netmem_ref fails.

To cover the case, change page_to_netmem() to use macro and _Generic.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/netmem.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index e0aa67aca9bb..e27ed0b9c82e 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -191,10 +191,9 @@ static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
 	return (__force netmem_ref)((unsigned long)niov | NET_IOV);
 }
 
-static inline netmem_ref page_to_netmem(const struct page *page)
-{
-	return (__force netmem_ref)page;
-}
+#define page_to_netmem(p)	(_Generic((p),			\
+	const struct page * :	(__force const netmem_ref)(p),	\
+	struct page * :		(__force netmem_ref)(p)))
 
 /**
  * virt_to_netmem - convert virtual memory pointer to a netmem reference
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
                   ` (4 preceding siblings ...)
  2025-06-20  4:12 ` [PATCH net-next v6 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-23  4:32   ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

There are no users of __netmem_get_pp().  Remove it.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 include/net/netmem.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index e27ed0b9c82e..d0a84557983d 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -245,22 +245,6 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
 	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
 }
 
-/**
- * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
- * @netmem: netmem reference to get the pointer from
- *
- * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
- * e.g. when it's a header buffer, performs faster and generates smaller
- * object code (avoids clearing the LSB). When @netmem points to IOV,
- * provokes invalid memory access.
- *
- * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
- */
-static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
-{
-	return __netmem_to_page(netmem)->pp;
-}
-
 static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
 {
 	return __netmem_clear_lsb(netmem)->pp;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
                   ` (5 preceding siblings ...)
  2025-06-20  4:12 ` [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
  8 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

The page pool members in struct page cannot be removed unless it's not
allowed to access any of them via struct page.

Do not access 'page->dma_addr' directly in page_pool_get_dma_addr() but
just wrap page_pool_get_dma_addr_netmem() safely.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/page_pool/helpers.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 773fc65780b5..db180626be06 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -444,12 +444,7 @@ static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
  */
 static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
 {
-	dma_addr_t ret = page->dma_addr;
-
-	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
-		ret <<= PAGE_SHIFT;
-
-	return ret;
+	return page_pool_get_dma_addr_netmem(page_to_netmem(page));
 }
 
 static inline void __page_pool_dma_sync_for_cpu(const struct page_pool *pool,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 8/9] netmem: introduce a netmem API, virt_to_head_netmem()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
                   ` (6 preceding siblings ...)
  2025-06-20  4:12 ` [PATCH net-next v6 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-20  4:12 ` [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
  8 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

To eliminate the use of struct page in page pool, the page pool code
should use netmem descriptor and APIs instead.

As part of the work, introduce a netmem API to convert a virtual address
to a head netmem allowing the code to use it rather than the existing
API, virt_to_head_page() for struct page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
 include/net/netmem.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index d0a84557983d..d49ed49d250b 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -276,6 +276,13 @@ static inline netmem_ref netmem_compound_head(netmem_ref netmem)
 	return page_to_netmem(compound_head(netmem_to_page(netmem)));
 }
 
+static inline netmem_ref virt_to_head_netmem(const void *x)
+{
+	netmem_ref netmem = virt_to_netmem(x);
+
+	return netmem_compound_head(netmem);
+}
+
 /**
  * __netmem_address - unsafely get pointer to the memory backing @netmem
  * @netmem: netmem reference to get the pointer for
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
                   ` (7 preceding siblings ...)
  2025-06-20  4:12 ` [PATCH net-next v6 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
@ 2025-06-20  4:12 ` Byungchul Park
  2025-06-23  9:16   ` David Hildenbrand
  8 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

To simplify struct page, the effort to separate its own descriptor from
struct page is required and the work for page pool is on going.

To achieve that, all the code should avoid directly accessing page pool
members of struct page.

Access ->pp_magic through struct netmem_desc instead of directly
accessing it through struct page in page_pool_page_is_pp().  Plus, move
page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
without header dependency issue.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
---
 include/linux/mm.h   | 12 ------------
 include/net/netmem.h | 14 ++++++++++++++
 mm/page_alloc.c      |  1 +
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0ef2ba0c667a..0b7f7f998085 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  */
 #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
 
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(struct page *page)
-{
-	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(struct page *page)
-{
-	return false;
-}
-#endif
-
 #endif /* _LINUX_MM_H */
diff --git a/include/net/netmem.h b/include/net/netmem.h
index d49ed49d250b..3d1b1dfc9ba5 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
  */
 static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
 
+#ifdef CONFIG_PAGE_POOL
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	struct netmem_desc *desc = (struct netmem_desc *)page;
+
+	return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
+}
+#else
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	return false;
+}
+#endif
+
 /* net_iov */
 
 DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ef3c07266b3..cc1d169853e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,6 +55,7 @@
 #include <linux/delayacct.h>
 #include <linux/cacheinfo.h>
 #include <linux/pgalloc_tag.h>
+#include <net/netmem.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp()
  2025-06-20  4:12 ` [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-06-23  4:32   ` Byungchul Park
  2025-06-24  0:17     ` Jakub Kicinski
  0 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2025-06-23  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

On Fri, Jun 20, 2025 at 01:12:21PM +0900, Byungchul Park wrote:
> There are no users of __netmem_get_pp().  Remove it.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>  include/net/netmem.h | 16 ----------------
>  1 file changed, 16 deletions(-)
> 
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index e27ed0b9c82e..d0a84557983d 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -245,22 +245,6 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
>  	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
>  }
>  
> -/**
> - * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
> - * @netmem: netmem reference to get the pointer from
> - *
> - * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
> - * e.g. when it's a header buffer, performs faster and generates smaller
> - * object code (avoids clearing the LSB). When @netmem points to IOV,
> - * provokes invalid memory access.
> - *
> - * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
> - */
> -static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
> -{
> -	return __netmem_to_page(netmem)->pp;
> -}
> -

In the meantime, libeth started to use __netmem_get_pp() again :(

Discard this patch please.  Do I have to resend this series with this
excluded?

	Byungchul

>  static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
>  {
>  	return __netmem_clear_lsb(netmem)->pp;
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-20  4:12 ` [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
@ 2025-06-23  9:16   ` David Hildenbrand
  2025-06-23 10:16     ` Byungchul Park
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-06-23  9:16 UTC (permalink / raw)
  To: Byungchul Park, willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola,
	hannes, ziy, jackmanb

On 20.06.25 06:12, Byungchul Park wrote:
> To simplify struct page, the effort to separate its own descriptor from
> struct page is required and the work for page pool is on going.
> 
> To achieve that, all the code should avoid directly accessing page pool
> members of struct page.
> 
> Access ->pp_magic through struct netmem_desc instead of directly
> accessing it through struct page in page_pool_page_is_pp().  Plus, move
> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> without header dependency issue.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>   include/linux/mm.h   | 12 ------------
>   include/net/netmem.h | 14 ++++++++++++++
>   mm/page_alloc.c      |  1 +
>   3 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0ef2ba0c667a..0b7f7f998085 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>    */
>   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>   
> -#ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> -	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> -#else
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> -	return false;
> -}
> -#endif
> -
>   #endif /* _LINUX_MM_H */
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index d49ed49d250b..3d1b1dfc9ba5 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>    */
>   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>   
> +#ifdef CONFIG_PAGE_POOL
> +static inline bool page_pool_page_is_pp(struct page *page)
> +{
> +	struct netmem_desc *desc = (struct netmem_desc *)page;
> +
> +	return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> +}
> +#else
> +static inline bool page_pool_page_is_pp(struct page *page)
> +{
> +	return false;
> +}
> +#endif

I wonder how helpful this cleanup is long-term.

page_pool_page_is_pp() is only called from mm/page_alloc.c, right? 
There, we want to make sure that no pagepool page is ever returned to 
the buddy.

How reasonable is this sanity check to have long-term? Wouldn't we be 
able to check that on some higher-level freeing path?

The reason I am commenting is that once we decouple "struct page" from 
"struct netmem_desc", we'd have to lookup here the corresponding "struct 
netmem_desc".

... but at that point here (when we free the actual pages), the "struct 
netmem_desc" would likely already have been freed separately (remember: 
it will be dynamically allocated).

With that in mind:

1) Is there a higher level "struct netmem_desc" freeing path where we 
could check that instead, so we don't have to cast from pages to 
netmem_desc at all.

2) How valuable are these sanity checks deep in the buddy?

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-20  4:12 ` [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
@ 2025-06-23  9:32   ` David Hildenbrand
  2025-06-23 10:28     ` Byungchul Park
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-06-23  9:32 UTC (permalink / raw)
  To: Byungchul Park, willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola,
	hannes, ziy, jackmanb

On 20.06.25 06:12, Byungchul Park wrote:
> To simplify struct page, the page pool members of struct page should be
> moved to other, allowing these members to be removed from struct page.
> 
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, and make it union'ed with the existing fields in struct
> net_iov, allowing to organize the fields of struct net_iov.

It would be great adding some result from the previous discussions in 
here, such as that the layout of "struct net_iov" can be changed because 
it is not a "struct page" overlay, what the next steps based on this 
patch are etc.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23  9:16   ` David Hildenbrand
@ 2025-06-23 10:16     ` Byungchul Park
  2025-06-23 11:13       ` Zi Yan
  0 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2025-06-23 10:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, ziy, jackmanb

On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
> On 20.06.25 06:12, Byungchul Park wrote:
> > To simplify struct page, the effort to separate its own descriptor from
> > struct page is required and the work for page pool is on going.
> > 
> > To achieve that, all the code should avoid directly accessing page pool
> > members of struct page.
> > 
> > Access ->pp_magic through struct netmem_desc instead of directly
> > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > without header dependency issue.
> > 
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Reviewed-by: Mina Almasry <almasrymina@google.com>
> > Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > Acked-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> >   include/linux/mm.h   | 12 ------------
> >   include/net/netmem.h | 14 ++++++++++++++
> >   mm/page_alloc.c      |  1 +
> >   3 files changed, 15 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 0ef2ba0c667a..0b7f7f998085 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >    */
> >   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > 
> > -#ifdef CONFIG_PAGE_POOL
> > -static inline bool page_pool_page_is_pp(struct page *page)
> > -{
> > -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > -}
> > -#else
> > -static inline bool page_pool_page_is_pp(struct page *page)
> > -{
> > -     return false;
> > -}
> > -#endif
> > -
> >   #endif /* _LINUX_MM_H */
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index d49ed49d250b..3d1b1dfc9ba5 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> >    */
> >   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> > 
> > +#ifdef CONFIG_PAGE_POOL
> > +static inline bool page_pool_page_is_pp(struct page *page)
> > +{
> > +     struct netmem_desc *desc = (struct netmem_desc *)page;
> > +
> > +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > +}
> > +#else
> > +static inline bool page_pool_page_is_pp(struct page *page)
> > +{
> > +     return false;
> > +}
> > +#endif
> 
> I wonder how helpful this cleanup is long-term.
> 
> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?

Yes.

> There, we want to make sure that no pagepool page is ever returned to
> the buddy.
> 
> How reasonable is this sanity check to have long-term? Wouldn't we be
> able to check that on some higher-level freeing path?
> 
> The reason I am commenting is that once we decouple "struct page" from
> "struct netmem_desc", we'd have to lookup here the corresponding "struct
> netmem_desc".
> 
> ... but at that point here (when we free the actual pages), the "struct
> netmem_desc" would likely already have been freed separately (remember:
> it will be dynamically allocated).
> 
> With that in mind:
> 
> 1) Is there a higher level "struct netmem_desc" freeing path where we
> could check that instead, so we don't have to cast from pages to
> netmem_desc at all.

I also thought it's too paranoiac.  However, I thought it's other issue
than this work.  That's why I left the API as is for now, it can be gone
once we get convinced the check is unnecessary in deep buddy.  Wrong?

> 2) How valuable are these sanity checks deep in the buddy?

That was also what I felt weird on.

	Byungchul

> --
> Cheers,
> 
> David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-23  9:32   ` David Hildenbrand
@ 2025-06-23 10:28     ` Byungchul Park
  2025-06-23 10:38       ` David Hildenbrand
  2025-06-23 12:18       ` Harry Yoo
  0 siblings, 2 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-23 10:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, ziy, jackmanb

On Mon, Jun 23, 2025 at 11:32:16AM +0200, David Hildenbrand wrote:
> On 20.06.25 06:12, Byungchul Park wrote:
> > To simplify struct page, the page pool members of struct page should be
> > moved to other, allowing these members to be removed from struct page.
> > 
> > Introduce a network memory descriptor to store the members, struct
> > netmem_desc, and make it union'ed with the existing fields in struct
> > net_iov, allowing to organize the fields of struct net_iov.
> 
> It would be great adding some result from the previous discussions in
> here, such as that the layout of "struct net_iov" can be changed because
> it is not a "struct page" overlay, what the next steps based on this

I think the network folks already know how to use and interpret their
data struct, struct net_iov for sure.. but I will add the comment if it
you think is needed.  Thanks for the comment.

	Byungchul

> patch are etc.
> 
> --
> Cheers,
> 
> David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-23 10:28     ` Byungchul Park
@ 2025-06-23 10:38       ` David Hildenbrand
  2025-06-23 12:18       ` Harry Yoo
  1 sibling, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-06-23 10:38 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, ziy, jackmanb

On 23.06.25 12:28, Byungchul Park wrote:
> On Mon, Jun 23, 2025 at 11:32:16AM +0200, David Hildenbrand wrote:
>> On 20.06.25 06:12, Byungchul Park wrote:
>>> To simplify struct page, the page pool members of struct page should be
>>> moved to other, allowing these members to be removed from struct page.
>>>
>>> Introduce a network memory descriptor to store the members, struct
>>> netmem_desc, and make it union'ed with the existing fields in struct
>>> net_iov, allowing to organize the fields of struct net_iov.
>>
>> It would be great adding some result from the previous discussions in
>> here, such as that the layout of "struct net_iov" can be changed because
>> it is not a "struct page" overlay, what the next steps based on this
> 
> I think the network folks already know how to use and interpret their
> data struct, struct net_iov for sure.. but I will add the comment if it
> you think is needed.  Thanks for the comment.

Well, you CC MM folks like me ... :)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 10:16     ` Byungchul Park
@ 2025-06-23 11:13       ` Zi Yan
  2025-06-23 11:25         ` Byungchul Park
  2025-06-23 14:58         ` David Hildenbrand
  0 siblings, 2 replies; 35+ messages in thread
From: Zi Yan @ 2025-06-23 11:13 UTC (permalink / raw)
  To: Byungchul Park, David Hildenbrand
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, jackmanb

On 23 Jun 2025, at 6:16, Byungchul Park wrote:

> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>> On 20.06.25 06:12, Byungchul Park wrote:
>>> To simplify struct page, the effort to separate its own descriptor from
>>> struct page is required and the work for page pool is on going.
>>>
>>> To achieve that, all the code should avoid directly accessing page pool
>>> members of struct page.
>>>
>>> Access ->pp_magic through struct netmem_desc instead of directly
>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>> without header dependency issue.
>>>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>> ---
>>>   include/linux/mm.h   | 12 ------------
>>>   include/net/netmem.h | 14 ++++++++++++++
>>>   mm/page_alloc.c      |  1 +
>>>   3 files changed, 15 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>    */
>>>   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>
>>> -#ifdef CONFIG_PAGE_POOL
>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>> -{
>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> -}
>>> -#else
>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>> -{
>>> -     return false;
>>> -}
>>> -#endif
>>> -
>>>   #endif /* _LINUX_MM_H */
>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>> --- a/include/net/netmem.h
>>> +++ b/include/net/netmem.h
>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>    */
>>>   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>
>>> +#ifdef CONFIG_PAGE_POOL
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
>>> +
>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> +}
>>> +#else
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +     return false;
>>> +}
>>> +#endif
>>
>> I wonder how helpful this cleanup is long-term.
>>
>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>
> Yes.
>
>> There, we want to make sure that no pagepool page is ever returned to
>> the buddy.
>>
>> How reasonable is this sanity check to have long-term? Wouldn't we be
>> able to check that on some higher-level freeing path?
>>
>> The reason I am commenting is that once we decouple "struct page" from
>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>> netmem_desc".
>>
>> ... but at that point here (when we free the actual pages), the "struct
>> netmem_desc" would likely already have been freed separately (remember:
>> it will be dynamically allocated).
>>
>> With that in mind:
>>
>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>> could check that instead, so we don't have to cast from pages to
>> netmem_desc at all.
>
> I also thought it's too paranoiac.  However, I thought it's other issue
> than this work.  That's why I left the API as is for now, it can be gone
> once we get convinced the check is unnecessary in deep buddy.  Wrong?
>
>> 2) How valuable are these sanity checks deep in the buddy?
>
> That was also what I felt weird on.

It seems very useful when I asked last time[1]:

|> We have actually used this at Cloudflare to catch some page_pool bugs.

[1] https://lore.kernel.org/linux-mm/4d35bda2-d032-49db-bb6e-b1d70f10d436@kernel.org/

--
Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 11:13       ` Zi Yan
@ 2025-06-23 11:25         ` Byungchul Park
  2025-06-23 14:58         ` David Hildenbrand
  1 sibling, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-23 11:25 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, almasrymina, ilias.apalodimas, harry.yoo, hawk,
	akpm, davem, john.fastabend, andrew+netdev, asml.silence, toke,
	tariqt, edumazet, pabeni, saeedm, leon, ast, daniel,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola, hannes, jackmanb

On Mon, Jun 23, 2025 at 07:13:21AM -0400, Zi Yan wrote:
> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
> > On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
> >> On 20.06.25 06:12, Byungchul Park wrote:
> >>> To simplify struct page, the effort to separate its own descriptor from
> >>> struct page is required and the work for page pool is on going.
> >>>
> >>> To achieve that, all the code should avoid directly accessing page pool
> >>> members of struct page.
> >>>
> >>> Access ->pp_magic through struct netmem_desc instead of directly
> >>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
> >>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> >>> without header dependency issue.
> >>>
> >>> Signed-off-by: Byungchul Park <byungchul@sk.com>
> >>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >>> Reviewed-by: Mina Almasry <almasrymina@google.com>
> >>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> >>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
> >>> ---
> >>>   include/linux/mm.h   | 12 ------------
> >>>   include/net/netmem.h | 14 ++++++++++++++
> >>>   mm/page_alloc.c      |  1 +
> >>>   3 files changed, 15 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index 0ef2ba0c667a..0b7f7f998085 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> >>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >>>    */
> >>>   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >>>
> >>> -#ifdef CONFIG_PAGE_POOL
> >>> -static inline bool page_pool_page_is_pp(struct page *page)
> >>> -{
> >>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >>> -}
> >>> -#else
> >>> -static inline bool page_pool_page_is_pp(struct page *page)
> >>> -{
> >>> -     return false;
> >>> -}
> >>> -#endif
> >>> -
> >>>   #endif /* _LINUX_MM_H */
> >>> diff --git a/include/net/netmem.h b/include/net/netmem.h
> >>> index d49ed49d250b..3d1b1dfc9ba5 100644
> >>> --- a/include/net/netmem.h
> >>> +++ b/include/net/netmem.h
> >>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> >>>    */
> >>>   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> >>>
> >>> +#ifdef CONFIG_PAGE_POOL
> >>> +static inline bool page_pool_page_is_pp(struct page *page)
> >>> +{
> >>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
> >>> +
> >>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >>> +}
> >>> +#else
> >>> +static inline bool page_pool_page_is_pp(struct page *page)
> >>> +{
> >>> +     return false;
> >>> +}
> >>> +#endif
> >>
> >> I wonder how helpful this cleanup is long-term.
> >>
> >> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
> >
> > Yes.
> >
> >> There, we want to make sure that no pagepool page is ever returned to
> >> the buddy.
> >>
> >> How reasonable is this sanity check to have long-term? Wouldn't we be
> >> able to check that on some higher-level freeing path?
> >>
> >> The reason I am commenting is that once we decouple "struct page" from
> >> "struct netmem_desc", we'd have to lookup here the corresponding "struct
> >> netmem_desc".
> >>
> >> ... but at that point here (when we free the actual pages), the "struct
> >> netmem_desc" would likely already have been freed separately (remember:
> >> it will be dynamically allocated).
> >>
> >> With that in mind:
> >>
> >> 1) Is there a higher level "struct netmem_desc" freeing path where we
> >> could check that instead, so we don't have to cast from pages to
> >> netmem_desc at all.
> >
> > I also thought it's too paranoiac.  However, I thought it's other issue
> > than this work.  That's why I left the API as is for now, it can be gone
> > once we get convinced the check is unnecessary in deep buddy.  Wrong?
> >
> >> 2) How valuable are these sanity checks deep in the buddy?
> >
> > That was also what I felt weird on.
> 
> It seems very useful when I asked last time[1]:
> 
> |> We have actually used this at Cloudflare to catch some page_pool bugs.

Indeed..  So I think it'd be better to leave the check as is until we
will be fully convinced on that issue, I ideally agree with David's
opinion tho.

	Byungchul

> [1] https://lore.kernel.org/linux-mm/4d35bda2-d032-49db-bb6e-b1d70f10d436@kernel.org/
> 
> --
> Best Regards,
> Yan, Zi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-23 10:28     ` Byungchul Park
  2025-06-23 10:38       ` David Hildenbrand
@ 2025-06-23 12:18       ` Harry Yoo
  2025-06-23 19:09         ` Mina Almasry
  1 sibling, 1 reply; 35+ messages in thread
From: Harry Yoo @ 2025-06-23 12:18 UTC (permalink / raw)
  To: Byungchul Park
  Cc: David Hildenbrand, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, almasrymina, ilias.apalodimas, hawk, akpm,
	davem, john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, ziy, jackmanb

On Mon, Jun 23, 2025 at 07:28:21PM +0900, Byungchul Park wrote:
> On Mon, Jun 23, 2025 at 11:32:16AM +0200, David Hildenbrand wrote:
> > On 20.06.25 06:12, Byungchul Park wrote:
> > > To simplify struct page, the page pool members of struct page should be
> > > moved to other, allowing these members to be removed from struct page.
> > > 
> > > Introduce a network memory descriptor to store the members, struct
> > > netmem_desc, and make it union'ed with the existing fields in struct
> > > net_iov, allowing to organize the fields of struct net_iov.
> > 
> > It would be great adding some result from the previous discussions in
> > here, such as that the layout of "struct net_iov" can be changed because
> > it is not a "struct page" overlay, what the next steps based on this
> 
> I think the network folks already know how to use and interpret their
> data struct, struct net_iov for sure.. but I will add the comment if it
> you think is needed.  Thanks for the comment.

I agree with David - it's not immediately obvious at first glance.
That was my feedback on the previous version as well :)

I think it'd be great to add that explanation, since this is where MM and
networking intersect.

> 	Byungchul
> 
> > patch are etc.
> > 
> > --
> > Cheers,
> > 
> > David / dhildenb

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 11:13       ` Zi Yan
  2025-06-23 11:25         ` Byungchul Park
@ 2025-06-23 14:58         ` David Hildenbrand
  2025-06-23 15:25           ` Zi Yan
  2025-06-23 17:06           ` Pavel Begunkov
  1 sibling, 2 replies; 35+ messages in thread
From: David Hildenbrand @ 2025-06-23 14:58 UTC (permalink / raw)
  To: Zi Yan, Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, jackmanb

On 23.06.25 13:13, Zi Yan wrote:
> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
> 
>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>> To simplify struct page, the effort to separate its own descriptor from
>>>> struct page is required and the work for page pool is on going.
>>>>
>>>> To achieve that, all the code should avoid directly accessing page pool
>>>> members of struct page.
>>>>
>>>> Access ->pp_magic through struct netmem_desc instead of directly
>>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>>> without header dependency issue.
>>>>
>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>>> ---
>>>>    include/linux/mm.h   | 12 ------------
>>>>    include/net/netmem.h | 14 ++++++++++++++
>>>>    mm/page_alloc.c      |  1 +
>>>>    3 files changed, 15 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>     */
>>>>    #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>
>>>> -#ifdef CONFIG_PAGE_POOL
>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>> -{
>>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>> -}
>>>> -#else
>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>> -{
>>>> -     return false;
>>>> -}
>>>> -#endif
>>>> -
>>>>    #endif /* _LINUX_MM_H */
>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>>> --- a/include/net/netmem.h
>>>> +++ b/include/net/netmem.h
>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>>     */
>>>>    static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>>
>>>> +#ifdef CONFIG_PAGE_POOL
>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>> +{
>>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
>>>> +
>>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>> +}
>>>> +#else
>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>> +{
>>>> +     return false;
>>>> +}
>>>> +#endif
>>>
>>> I wonder how helpful this cleanup is long-term.
>>>
>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>>
>> Yes.
>>
>>> There, we want to make sure that no pagepool page is ever returned to
>>> the buddy.
>>>
>>> How reasonable is this sanity check to have long-term? Wouldn't we be
>>> able to check that on some higher-level freeing path?
>>>
>>> The reason I am commenting is that once we decouple "struct page" from
>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>>> netmem_desc".
>>>
>>> ... but at that point here (when we free the actual pages), the "struct
>>> netmem_desc" would likely already have been freed separately (remember:
>>> it will be dynamically allocated).
>>>
>>> With that in mind:
>>>
>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>>> could check that instead, so we don't have to cast from pages to
>>> netmem_desc at all.
>>
>> I also thought it's too paranoiac.  However, I thought it's other issue
>> than this work.  That's why I left the API as is for now, it can be gone
>> once we get convinced the check is unnecessary in deep buddy.  Wrong?
>>
>>> 2) How valuable are these sanity checks deep in the buddy?
>>
>> That was also what I felt weird on.
> 
> It seems very useful when I asked last time[1]:
> 
> |> We have actually used this at Cloudflare to catch some page_pool bugs.

My question is rather, whether there is some higher-level freeing path 
for netmem_desc where we could check that instead (IOW, earlier).

Or is it really arbitrary put_page() (IOW, we assume that many possible 
references can be held)?

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 14:58         ` David Hildenbrand
@ 2025-06-23 15:25           ` Zi Yan
  2025-06-24 14:43             ` Toke Høiland-Jørgensen
  2025-06-23 17:06           ` Pavel Begunkov
  1 sibling, 1 reply; 35+ messages in thread
From: Zi Yan @ 2025-06-23 15:25 UTC (permalink / raw)
  To: David Hildenbrand, Toke Høiland-Jørgensen
  Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, almasrymina, ilias.apalodimas, harry.yoo, hawk,
	akpm, davem, john.fastabend, andrew+netdev, asml.silence, toke,
	tariqt, edumazet, pabeni, saeedm, leon, ast, daniel,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola, hannes, jackmanb

On 23 Jun 2025, at 10:58, David Hildenbrand wrote:

> On 23.06.25 13:13, Zi Yan wrote:
>> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
>>
>>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>> struct page is required and the work for page pool is on going.
>>>>>
>>>>> To achieve that, all the code should avoid directly accessing page pool
>>>>> members of struct page.
>>>>>
>>>>> Access ->pp_magic through struct netmem_desc instead of directly
>>>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>>>> without header dependency issue.
>>>>>
>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>>>> ---
>>>>>    include/linux/mm.h   | 12 ------------
>>>>>    include/net/netmem.h | 14 ++++++++++++++
>>>>>    mm/page_alloc.c      |  1 +
>>>>>    3 files changed, 15 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>     */
>>>>>    #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>
>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>> -{
>>>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>> -}
>>>>> -#else
>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>> -{
>>>>> -     return false;
>>>>> -}
>>>>> -#endif
>>>>> -
>>>>>    #endif /* _LINUX_MM_H */
>>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>>>> --- a/include/net/netmem.h
>>>>> +++ b/include/net/netmem.h
>>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>>>     */
>>>>>    static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>>>
>>>>> +#ifdef CONFIG_PAGE_POOL
>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>> +{
>>>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
>>>>> +
>>>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>> +}
>>>>> +#else
>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>> +{
>>>>> +     return false;
>>>>> +}
>>>>> +#endif
>>>>
>>>> I wonder how helpful this cleanup is long-term.
>>>>
>>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>>>
>>> Yes.
>>>
>>>> There, we want to make sure that no pagepool page is ever returned to
>>>> the buddy.
>>>>
>>>> How reasonable is this sanity check to have long-term? Wouldn't we be
>>>> able to check that on some higher-level freeing path?
>>>>
>>>> The reason I am commenting is that once we decouple "struct page" from
>>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>>>> netmem_desc".
>>>>
>>>> ... but at that point here (when we free the actual pages), the "struct
>>>> netmem_desc" would likely already have been freed separately (remember:
>>>> it will be dynamically allocated).
>>>>
>>>> With that in mind:
>>>>
>>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>>>> could check that instead, so we don't have to cast from pages to
>>>> netmem_desc at all.
>>>
>>> I also thought it's too paranoiac.  However, I thought it's other issue
>>> than this work.  That's why I left the API as is for now, it can be gone
>>> once we get convinced the check is unnecessary in deep buddy.  Wrong?
>>>
>>>> 2) How valuable are these sanity checks deep in the buddy?
>>>
>>> That was also what I felt weird on.
>>
>> It seems very useful when I asked last time[1]:
>>
>> |> We have actually used this at Cloudflare to catch some page_pool bugs.
>
> My question is rather, whether there is some higher-level freeing path for netmem_desc where we could check that instead (IOW, earlier).
>
> Or is it really arbitrary put_page() (IOW, we assume that many possible references can be held)?

+Toke, who I talked about this last time.

Maybe he can shed some light on it.


--
Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 14:58         ` David Hildenbrand
  2025-06-23 15:25           ` Zi Yan
@ 2025-06-23 17:06           ` Pavel Begunkov
  2025-06-23 17:28             ` Mina Almasry
  1 sibling, 1 reply; 35+ messages in thread
From: Pavel Begunkov @ 2025-06-23 17:06 UTC (permalink / raw)
  To: David Hildenbrand, Zi Yan, Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola,
	hannes, jackmanb

On 6/23/25 15:58, David Hildenbrand wrote:
> On 23.06.25 13:13, Zi Yan wrote:
>> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
>>
>>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>> struct page is required and the work for page pool is on going.
>>>>>
>>>>> To achieve that, all the code should avoid directly accessing page pool
>>>>> members of struct page.
>>>>>
>>>>> Access ->pp_magic through struct netmem_desc instead of directly
>>>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>>>> without header dependency issue.
>>>>>
>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>>>> ---
>>>>>    include/linux/mm.h   | 12 ------------
>>>>>    include/net/netmem.h | 14 ++++++++++++++
>>>>>    mm/page_alloc.c      |  1 +
>>>>>    3 files changed, 15 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>     */
>>>>>    #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>
>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>> -{
>>>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>> -}
>>>>> -#else
>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>> -{
>>>>> -     return false;
>>>>> -}
>>>>> -#endif
>>>>> -
>>>>>    #endif /* _LINUX_MM_H */
>>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>>>> --- a/include/net/netmem.h
>>>>> +++ b/include/net/netmem.h
>>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>>>     */
>>>>>    static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>>>
>>>>> +#ifdef CONFIG_PAGE_POOL
>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>> +{
>>>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
>>>>> +
>>>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>> +}
>>>>> +#else
>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>> +{
>>>>> +     return false;
>>>>> +}
>>>>> +#endif
>>>>
>>>> I wonder how helpful this cleanup is long-term.
>>>>
>>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>>>
>>> Yes.
>>>
>>>> There, we want to make sure that no pagepool page is ever returned to
>>>> the buddy.
>>>>
>>>> How reasonable is this sanity check to have long-term? Wouldn't we be
>>>> able to check that on some higher-level freeing path?
>>>>
>>>> The reason I am commenting is that once we decouple "struct page" from
>>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>>>> netmem_desc".
>>>>
>>>> ... but at that point here (when we free the actual pages), the "struct
>>>> netmem_desc" would likely already have been freed separately (remember:
>>>> it will be dynamically allocated).
>>>>
>>>> With that in mind:
>>>>
>>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>>>> could check that instead, so we don't have to cast from pages to
>>>> netmem_desc at all.

As you said, it's just a sanity check, all page pool pages should
be freed by the networking code. It checks the ownership with
netmem_is_pp(), which is basically the same as page_pool_page_is_pp()
but done though some aliasing.

static inline bool netmem_is_pp(netmem_ref netmem)
{
	return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
}

I assume there is no point in moving the check to skbuff.c as it
already does exactly same test, but we can probably just kill it.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 17:06           ` Pavel Begunkov
@ 2025-06-23 17:28             ` Mina Almasry
  2025-06-23 18:09               ` Vlastimil Babka
  2025-06-23 18:14               ` Pavel Begunkov
  0 siblings, 2 replies; 35+ messages in thread
From: Mina Almasry @ 2025-06-23 17:28 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: David Hildenbrand, Zi Yan, Byungchul Park, willy, netdev,
	linux-kernel, linux-mm, kernel_team, kuba, ilias.apalodimas,
	harry.yoo, hawk, akpm, davem, john.fastabend, andrew+netdev, toke,
	tariqt, edumazet, pabeni, saeedm, leon, ast, daniel,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola, hannes, jackmanb

On Mon, Jun 23, 2025 at 10:05 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 6/23/25 15:58, David Hildenbrand wrote:
> > On 23.06.25 13:13, Zi Yan wrote:
> >> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
> >>
> >>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
> >>>> On 20.06.25 06:12, Byungchul Park wrote:
> >>>>> To simplify struct page, the effort to separate its own descriptor from
> >>>>> struct page is required and the work for page pool is on going.
> >>>>>
> >>>>> To achieve that, all the code should avoid directly accessing page pool
> >>>>> members of struct page.
> >>>>>
> >>>>> Access ->pp_magic through struct netmem_desc instead of directly
> >>>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
> >>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> >>>>> without header dependency issue.
> >>>>>
> >>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
> >>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
> >>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> >>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
> >>>>> ---
> >>>>>    include/linux/mm.h   | 12 ------------
> >>>>>    include/net/netmem.h | 14 ++++++++++++++
> >>>>>    mm/page_alloc.c      |  1 +
> >>>>>    3 files changed, 15 insertions(+), 12 deletions(-)
> >>>>>
> >>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>>> index 0ef2ba0c667a..0b7f7f998085 100644
> >>>>> --- a/include/linux/mm.h
> >>>>> +++ b/include/linux/mm.h
> >>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >>>>>     */
> >>>>>    #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >>>>>
> >>>>> -#ifdef CONFIG_PAGE_POOL
> >>>>> -static inline bool page_pool_page_is_pp(struct page *page)
> >>>>> -{
> >>>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >>>>> -}
> >>>>> -#else
> >>>>> -static inline bool page_pool_page_is_pp(struct page *page)
> >>>>> -{
> >>>>> -     return false;
> >>>>> -}
> >>>>> -#endif
> >>>>> -
> >>>>>    #endif /* _LINUX_MM_H */
> >>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
> >>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
> >>>>> --- a/include/net/netmem.h
> >>>>> +++ b/include/net/netmem.h
> >>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> >>>>>     */
> >>>>>    static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> >>>>>
> >>>>> +#ifdef CONFIG_PAGE_POOL
> >>>>> +static inline bool page_pool_page_is_pp(struct page *page)
> >>>>> +{
> >>>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
> >>>>> +
> >>>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >>>>> +}
> >>>>> +#else
> >>>>> +static inline bool page_pool_page_is_pp(struct page *page)
> >>>>> +{
> >>>>> +     return false;
> >>>>> +}
> >>>>> +#endif
> >>>>
> >>>> I wonder how helpful this cleanup is long-term.
> >>>>
> >>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
> >>>
> >>> Yes.
> >>>
> >>>> There, we want to make sure that no pagepool page is ever returned to
> >>>> the buddy.
> >>>>
> >>>> How reasonable is this sanity check to have long-term? Wouldn't we be
> >>>> able to check that on some higher-level freeing path?
> >>>>
> >>>> The reason I am commenting is that once we decouple "struct page" from
> >>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
> >>>> netmem_desc".
> >>>>
> >>>> ... but at that point here (when we free the actual pages), the "struct
> >>>> netmem_desc" would likely already have been freed separately (remember:
> >>>> it will be dynamically allocated).
> >>>>
> >>>> With that in mind:
> >>>>
> >>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
> >>>> could check that instead, so we don't have to cast from pages to
> >>>> netmem_desc at all.
>
> As you said, it's just a sanity check, all page pool pages should
> be freed by the networking code. It checks the ownership with
> netmem_is_pp(), which is basically the same as page_pool_page_is_pp()
> but done though some aliasing.
>
> static inline bool netmem_is_pp(netmem_ref netmem)
> {
>         return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> }
>
> I assume there is no point in moving the check to skbuff.c as it
> already does exactly same test, but we can probably just kill it.
>

Even if we do kill it, maybe lets do that in a separate patch, and
maybe a separate series. I would recommend not complicating this one?

Also, AFAIU, this is about removing/moving the checks in
bad_page_reason() and page_expected_state()? I think this check does
fire sometimes. I saw at least 1 report in the last year of a
bad_page_reason() check firing because the page_pool got its
accounting wrong and released a page to the buddy allocator early, so
maybe that new patch that removes that check should explain why this
check is no longer necessary.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 17:28             ` Mina Almasry
@ 2025-06-23 18:09               ` Vlastimil Babka
  2025-06-23 18:14               ` Pavel Begunkov
  1 sibling, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2025-06-23 18:09 UTC (permalink / raw)
  To: Mina Almasry, Pavel Begunkov
  Cc: David Hildenbrand, Zi Yan, Byungchul Park, willy, netdev,
	linux-kernel, linux-mm, kernel_team, kuba, ilias.apalodimas,
	harry.yoo, hawk, akpm, davem, john.fastabend, andrew+netdev, toke,
	tariqt, edumazet, pabeni, saeedm, leon, ast, daniel,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mhocko, horms,
	linux-rdma, bpf, vishal.moola, hannes, jackmanb

On 6/23/25 19:28, Mina Almasry wrote:
> On Mon, Jun 23, 2025 at 10:05 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>>
>> As you said, it's just a sanity check, all page pool pages should
>> be freed by the networking code. It checks the ownership with
>> netmem_is_pp(), which is basically the same as page_pool_page_is_pp()
>> but done though some aliasing.
>>
>> static inline bool netmem_is_pp(netmem_ref netmem)
>> {
>>         return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
>> }
>>
>> I assume there is no point in moving the check to skbuff.c as it
>> already does exactly same test, but we can probably just kill it.
>>
> 
> Even if we do kill it, maybe lets do that in a separate patch, and
> maybe a separate series. I would recommend not complicating this one?
> 
> Also, AFAIU, this is about removing/moving the checks in
> bad_page_reason() and page_expected_state()? I think this check does
> fire sometimes. I saw at least 1 report in the last year of a
> bad_page_reason() check firing because the page_pool got its
> accounting wrong and released a page to the buddy allocator early, so
> maybe that new patch that removes that check should explain why this
> check is no longer necessary.

Note these checks are these days only done with CONFIG_DEBUG_VM, or
debugging/hardening options like debug_pagealloc/init_on_alloc/init_on_free,
see mem_debugging_and_hardening_init() and when check_pages_enabled static
key is enabled.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 17:28             ` Mina Almasry
  2025-06-23 18:09               ` Vlastimil Babka
@ 2025-06-23 18:14               ` Pavel Begunkov
  2025-06-24  1:54                 ` Byungchul Park
  1 sibling, 1 reply; 35+ messages in thread
From: Pavel Begunkov @ 2025-06-23 18:14 UTC (permalink / raw)
  To: Mina Almasry
  Cc: David Hildenbrand, Zi Yan, Byungchul Park, willy, netdev,
	linux-kernel, linux-mm, kernel_team, kuba, ilias.apalodimas,
	harry.yoo, hawk, akpm, davem, john.fastabend, andrew+netdev, toke,
	tariqt, edumazet, pabeni, saeedm, leon, ast, daniel,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola, hannes, jackmanb

On 6/23/25 18:28, Mina Almasry wrote:
> On Mon, Jun 23, 2025 at 10:05 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
...>> As you said, it's just a sanity check, all page pool pages should
>> be freed by the networking code. It checks the ownership with
>> netmem_is_pp(), which is basically the same as page_pool_page_is_pp()
>> but done though some aliasing.
>>
>> static inline bool netmem_is_pp(netmem_ref netmem)
>> {
>>          return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
>> }
>>
>> I assume there is no point in moving the check to skbuff.c as it
>> already does exactly same test, but we can probably just kill it.
>>
> 
> Even if we do kill it, maybe lets do that in a separate patch, and
> maybe a separate series. I would recommend not complicating this one?
FWIW, the discussion somewhat mentioned "long term", but I'm not
suggesting actually removing it, it serves the purpose. And in
long term the helper will be converted to use page->type / etc.
without touching pp fields, that should reduce the degree of
ugliness and make it more acceptable for keeping in mm.

> Also, AFAIU, this is about removing/moving the checks in
> bad_page_reason() and page_expected_state()? I think this check does
> fire sometimes. I saw at least 1 report in the last year of a
> bad_page_reason() check firing because the page_pool got its
> accounting wrong and released a page to the buddy allocator early, so
> maybe that new patch that removes that check should explain why this
> check is no longer necessary.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-23 12:18       ` Harry Yoo
@ 2025-06-23 19:09         ` Mina Almasry
  2025-06-23 19:28           ` Pavel Begunkov
  2025-06-24  1:17           ` Byungchul Park
  0 siblings, 2 replies; 35+ messages in thread
From: Mina Almasry @ 2025-06-23 19:09 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Byungchul Park, David Hildenbrand, willy, netdev, linux-kernel,
	linux-mm, kernel_team, kuba, ilias.apalodimas, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, ziy, jackmanb

On Mon, Jun 23, 2025 at 5:18 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Mon, Jun 23, 2025 at 07:28:21PM +0900, Byungchul Park wrote:
> > On Mon, Jun 23, 2025 at 11:32:16AM +0200, David Hildenbrand wrote:
> > > On 20.06.25 06:12, Byungchul Park wrote:
> > > > To simplify struct page, the page pool members of struct page should be
> > > > moved to other, allowing these members to be removed from struct page.
> > > >
> > > > Introduce a network memory descriptor to store the members, struct
> > > > netmem_desc, and make it union'ed with the existing fields in struct
> > > > net_iov, allowing to organize the fields of struct net_iov.
> > >
> > > It would be great adding some result from the previous discussions in
> > > here, such as that the layout of "struct net_iov" can be changed because
> > > it is not a "struct page" overlay, what the next steps based on this
> >
> > I think the network folks already know how to use and interpret their
> > data struct, struct net_iov for sure.. but I will add the comment if it
> > you think is needed.  Thanks for the comment.
>
> I agree with David - it's not immediately obvious at first glance.
> That was my feedback on the previous version as well :)
>
> I think it'd be great to add that explanation, since this is where MM and
> networking intersect.
>

I think a lot of people are now saying the same thing: (1) lets keep
net_iov and page/netmem_desc separate, and (2) lets add comments
explaining their relation so this intersection between MM and
networking is not confused in the long term .

For #1, concretely I would recommend removing the union inside struct
net_iov? And also revert all the changes to net_iov for that matter.
They are all to bring netmem_desc and net_iov closer together, but the
feedback is that we should keep them separate, and I kinda agree with
that. The fact that net_iov includes a netmem_desc in your patch makes
readers think they're very closely related.

For #2, add this comment (roughly) on top of struct net_iov? Untested
with kdoc and spell checker:

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 7a1dafa3f080..8fb2b294e5f2 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -30,6 +30,25 @@ enum net_iov_type {
        NET_IOV_MAX = ULONG_MAX
 };

+/* A memory descriptor representing abstract networking I/O vectors.
+ *
+ * net_iovs are allocated by networking code, and generally represent some
+ * abstract form of non-paged memory used by the networking stack. The size
+ * of the chunk is PAGE_SIZE.
+ *
+ * This memory can be any form of non-struct paged memory. Examples include
+ * imported dmabuf memory and imported io_uring memory. See
net_iov_type for all
+ * the supported types.
+ *
+ * @type: the type of the memory. Different types of net_iovs are supported.
+ * @pp_magic: pp field, similar to the one in struct page/struct netmem_desc.
+ * @pp: the pp this net_iov belongs to, if any.
+ * @owner: the net_iov_area this net_iov belongs to, if any.
+ * @dma_addr: the dma addrs of the net_iov. Needed for the network card to
+ * send/receive this net_iov.
+ * @pp_ref_count:  the pp ref count of this net_iov, exactly the same usage as
+ * struct page/struct netmem_desc.
+ */





--
Thanks,
Mina

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-23 19:09         ` Mina Almasry
@ 2025-06-23 19:28           ` Pavel Begunkov
  2025-06-24  1:17           ` Byungchul Park
  1 sibling, 0 replies; 35+ messages in thread
From: Pavel Begunkov @ 2025-06-23 19:28 UTC (permalink / raw)
  To: Mina Almasry, Harry Yoo
  Cc: Byungchul Park, David Hildenbrand, willy, netdev, linux-kernel,
	linux-mm, kernel_team, kuba, ilias.apalodimas, hawk, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola,
	hannes, ziy, jackmanb

On 6/23/25 20:09, Mina Almasry wrote:
> On Mon, Jun 23, 2025 at 5:18 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>>
>> On Mon, Jun 23, 2025 at 07:28:21PM +0900, Byungchul Park wrote:
>>> On Mon, Jun 23, 2025 at 11:32:16AM +0200, David Hildenbrand wrote:
>>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>>> To simplify struct page, the page pool members of struct page should be
>>>>> moved to other, allowing these members to be removed from struct page.
>>>>>
>>>>> Introduce a network memory descriptor to store the members, struct
>>>>> netmem_desc, and make it union'ed with the existing fields in struct
>>>>> net_iov, allowing to organize the fields of struct net_iov.
>>>>
>>>> It would be great adding some result from the previous discussions in
>>>> here, such as that the layout of "struct net_iov" can be changed because
>>>> it is not a "struct page" overlay, what the next steps based on this
>>>
>>> I think the network folks already know how to use and interpret their
>>> data struct, struct net_iov for sure.. but I will add the comment if it
>>> you think is needed.  Thanks for the comment.
>>
>> I agree with David - it's not immediately obvious at first glance.
>> That was my feedback on the previous version as well :)
>>
>> I think it'd be great to add that explanation, since this is where MM and
>> networking intersect.
>>
> 
> I think a lot of people are now saying the same thing: (1) lets keep
> net_iov and page/netmem_desc separate, and (2) lets add comments
> explaining their relation so this intersection between MM and
> networking is not confused in the long term .
> 
> For #1, concretely I would recommend removing the union inside struct
> net_iov? And also revert all the changes to net_iov for that matter.

Without it the merge logistics will get more complicated than it
should be because of conflicts with changes in the io_uring tree.
It's a temporary solution, I'll convert once the series is merged
and reaches io_uring.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp()
  2025-06-23  4:32   ` Byungchul Park
@ 2025-06-24  0:17     ` Jakub Kicinski
  2025-06-24  1:27       ` Byungchul Park
  0 siblings, 1 reply; 35+ messages in thread
From: Jakub Kicinski @ 2025-06-24  0:17 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

On Mon, 23 Jun 2025 13:32:07 +0900 Byungchul Park wrote:
> In the meantime, libeth started to use __netmem_get_pp() again :(
> 
> Discard this patch please.  Do I have to resend this series with this
> excluded?

You'll need to repost, unfortunately. If there's a build failure 
the CI doesn't execute any functional testing nodes.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-23 19:09         ` Mina Almasry
  2025-06-23 19:28           ` Pavel Begunkov
@ 2025-06-24  1:17           ` Byungchul Park
  1 sibling, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-24  1:17 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Harry Yoo, David Hildenbrand, willy, netdev, linux-kernel,
	linux-mm, kernel_team, kuba, ilias.apalodimas, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, ziy, jackmanb

On Mon, Jun 23, 2025 at 12:09:09PM -0700, Mina Almasry wrote:
> On Mon, Jun 23, 2025 at 5:18 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > On Mon, Jun 23, 2025 at 07:28:21PM +0900, Byungchul Park wrote:
> > > On Mon, Jun 23, 2025 at 11:32:16AM +0200, David Hildenbrand wrote:
> > > > On 20.06.25 06:12, Byungchul Park wrote:
> > > > > To simplify struct page, the page pool members of struct page should be
> > > > > moved to other, allowing these members to be removed from struct page.
> > > > >
> > > > > Introduce a network memory descriptor to store the members, struct
> > > > > netmem_desc, and make it union'ed with the existing fields in struct
> > > > > net_iov, allowing to organize the fields of struct net_iov.
> > > >
> > > > It would be great adding some result from the previous discussions in
> > > > here, such as that the layout of "struct net_iov" can be changed because
> > > > it is not a "struct page" overlay, what the next steps based on this
> > >
> > > I think the network folks already know how to use and interpret their
> > > data struct, struct net_iov for sure.. but I will add the comment if it
> > > you think is needed.  Thanks for the comment.
> >
> > I agree with David - it's not immediately obvious at first glance.
> > That was my feedback on the previous version as well :)
> >
> > I think it'd be great to add that explanation, since this is where MM and
> > networking intersect.
> >
> 
> I think a lot of people are now saying the same thing: (1) lets keep
> net_iov and page/netmem_desc separate, and (2) lets add comments
> explaining their relation so this intersection between MM and
> networking is not confused in the long term .
> 
> For #1, concretely I would recommend removing the union inside struct
> net_iov? And also revert all the changes to net_iov for that matter.

It seems like many got it wrong.  I didn't change the layout of net_iov
much.  I did nothing but replaced the existing pp fields in net_iov with
a wrapper, netmem_desc that still has the same fields.  Even with
net_iov reverted, net_iov still has the fields of netmemdesc.

Just to clarify, netmem_desc is the intersection but net_iov is not.
net_iov is a network's thing.

> They are all to bring netmem_desc and net_iov closer together, but the

It was already closer together even before this series, since netmem_ref
is used to unify the related usages.

> feedback is that we should keep them separate, and I kinda agree with
> that. The fact that net_iov includes a netmem_desc in your patch makes
> readers think they're very closely related.

Again, they were already very closely related before this series.  Of
course I agree with that it should be kept separated but it's another
issue.  It can be done on top of this series by e.g. Pavel as he said.

> For #2, add this comment (roughly) on top of struct net_iov? Untested
> with kdoc and spell checker:
> 
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 7a1dafa3f080..8fb2b294e5f2 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -30,6 +30,25 @@ enum net_iov_type {
>         NET_IOV_MAX = ULONG_MAX
>  };
> 
> +/* A memory descriptor representing abstract networking I/O vectors.
> + *
> + * net_iovs are allocated by networking code, and generally represent some
> + * abstract form of non-paged memory used by the networking stack. The size
> + * of the chunk is PAGE_SIZE.
> + *
> + * This memory can be any form of non-struct paged memory. Examples include
> + * imported dmabuf memory and imported io_uring memory. See
> net_iov_type for all
> + * the supported types.

The description should be changed depending on the result of discussion.
However, I will basically add this doc with some adjustment.

Thanks Mina.

	Byungchul

> + *
> + * @type: the type of the memory. Different types of net_iovs are supported.
> + * @pp_magic: pp field, similar to the one in struct page/struct netmem_desc.
> + * @pp: the pp this net_iov belongs to, if any.
> + * @owner: the net_iov_area this net_iov belongs to, if any.
> + * @dma_addr: the dma addrs of the net_iov. Needed for the network card to
> + * send/receive this net_iov.
> + * @pp_ref_count:  the pp ref count of this net_iov, exactly the same usage as
> + * struct page/struct netmem_desc.
> + */
> 
> 
> 
> 
> 
> --
> Thanks,
> Mina

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp()
  2025-06-24  0:17     ` Jakub Kicinski
@ 2025-06-24  1:27       ` Byungchul Park
  0 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-24  1:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola, hannes, ziy, jackmanb

On Mon, Jun 23, 2025 at 05:17:06PM -0700, Jakub Kicinski wrote:
> On Mon, 23 Jun 2025 13:32:07 +0900 Byungchul Park wrote:
> > In the meantime, libeth started to use __netmem_get_pp() again :(
> >
> > Discard this patch please.  Do I have to resend this series with this
> > excluded?
> 
> You'll need to repost, unfortunately. If there's a build failure
> the CI doesn't execute any functional testing nodes.

I will.  Thanks.

	Byungchul
> --
> pw-bot: cr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 18:14               ` Pavel Begunkov
@ 2025-06-24  1:54                 ` Byungchul Park
  0 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-24  1:54 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Mina Almasry, David Hildenbrand, Zi Yan, willy, netdev,
	linux-kernel, linux-mm, kernel_team, kuba, ilias.apalodimas,
	harry.yoo, hawk, akpm, davem, john.fastabend, andrew+netdev, toke,
	tariqt, edumazet, pabeni, saeedm, leon, ast, daniel,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola, hannes, jackmanb

On Mon, Jun 23, 2025 at 07:14:41PM +0100, Pavel Begunkov wrote:
> On 6/23/25 18:28, Mina Almasry wrote:
> > On Mon, Jun 23, 2025 at 10:05 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> ...>> As you said, it's just a sanity check, all page pool pages should
> > > be freed by the networking code. It checks the ownership with
> > > netmem_is_pp(), which is basically the same as page_pool_page_is_pp()
> > > but done though some aliasing.
> > > 
> > > static inline bool netmem_is_pp(netmem_ref netmem)
> > > {
> > >          return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > }
> > > 
> > > I assume there is no point in moving the check to skbuff.c as it
> > > already does exactly same test, but we can probably just kill it.
> > > 
> > 
> > Even if we do kill it, maybe lets do that in a separate patch, and
> > maybe a separate series. I would recommend not complicating this one?
> FWIW, the discussion somewhat mentioned "long term", but I'm not
> suggesting actually removing it, it serves the purpose. And in
> long term the helper will be converted to use page->type / etc.
> without touching pp fields, that should reduce the degree of
> ugliness and make it more acceptable for keeping in mm.

Agree.

	Byungchul
> 
> > Also, AFAIU, this is about removing/moving the checks in
> > bad_page_reason() and page_expected_state()? I think this check does
> > fire sometimes. I saw at least 1 report in the last year of a
> > bad_page_reason() check firing because the page_pool got its
> > accounting wrong and released a page to the buddy allocator early, so
> > maybe that new patch that removes that check should explain why this
> > check is no longer necessary.
> 
> --
> Pavel Begunkov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-23 15:25           ` Zi Yan
@ 2025-06-24 14:43             ` Toke Høiland-Jørgensen
  2025-06-24 14:56               ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-06-24 14:43 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand
  Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, almasrymina, ilias.apalodimas, harry.yoo, hawk,
	akpm, davem, john.fastabend, andrew+netdev, asml.silence, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, jackmanb, jesper@cloudflare.com

Zi Yan <ziy@nvidia.com> writes:

> On 23 Jun 2025, at 10:58, David Hildenbrand wrote:
>
>> On 23.06.25 13:13, Zi Yan wrote:
>>> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
>>>
>>>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>>>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>>> struct page is required and the work for page pool is on going.
>>>>>>
>>>>>> To achieve that, all the code should avoid directly accessing page pool
>>>>>> members of struct page.
>>>>>>
>>>>>> Access ->pp_magic through struct netmem_desc instead of directly
>>>>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>>>>> without header dependency issue.
>>>>>>
>>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>>>>> ---
>>>>>>    include/linux/mm.h   | 12 ------------
>>>>>>    include/net/netmem.h | 14 ++++++++++++++
>>>>>>    mm/page_alloc.c      |  1 +
>>>>>>    3 files changed, 15 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>>>>> --- a/include/linux/mm.h
>>>>>> +++ b/include/linux/mm.h
>>>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>>     */
>>>>>>    #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>>
>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> -{
>>>>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>> -}
>>>>>> -#else
>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> -{
>>>>>> -     return false;
>>>>>> -}
>>>>>> -#endif
>>>>>> -
>>>>>>    #endif /* _LINUX_MM_H */
>>>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>>>>> --- a/include/net/netmem.h
>>>>>> +++ b/include/net/netmem.h
>>>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>>>>     */
>>>>>>    static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>>>>
>>>>>> +#ifdef CONFIG_PAGE_POOL
>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> +{
>>>>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
>>>>>> +
>>>>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>> +}
>>>>>> +#else
>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> +{
>>>>>> +     return false;
>>>>>> +}
>>>>>> +#endif
>>>>>
>>>>> I wonder how helpful this cleanup is long-term.
>>>>>
>>>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>>>>
>>>> Yes.
>>>>
>>>>> There, we want to make sure that no pagepool page is ever returned to
>>>>> the buddy.
>>>>>
>>>>> How reasonable is this sanity check to have long-term? Wouldn't we be
>>>>> able to check that on some higher-level freeing path?
>>>>>
>>>>> The reason I am commenting is that once we decouple "struct page" from
>>>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>>>>> netmem_desc".
>>>>>
>>>>> ... but at that point here (when we free the actual pages), the "struct
>>>>> netmem_desc" would likely already have been freed separately (remember:
>>>>> it will be dynamically allocated).
>>>>>
>>>>> With that in mind:
>>>>>
>>>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>>>>> could check that instead, so we don't have to cast from pages to
>>>>> netmem_desc at all.
>>>>
>>>> I also thought it's too paranoiac.  However, I thought it's other issue
>>>> than this work.  That's why I left the API as is for now, it can be gone
>>>> once we get convinced the check is unnecessary in deep buddy.  Wrong?
>>>>
>>>>> 2) How valuable are these sanity checks deep in the buddy?
>>>>
>>>> That was also what I felt weird on.
>>>
>>> It seems very useful when I asked last time[1]:
>>>
>>> |> We have actually used this at Cloudflare to catch some page_pool bugs.
>>
>> My question is rather, whether there is some higher-level freeing path for netmem_desc where we could check that instead (IOW, earlier).
>>
>> Or is it really arbitrary put_page() (IOW, we assume that many possible references can be held)?
>
> +Toke, who I talked about this last time.
>
> Maybe he can shed some light on it.

As others have pointed out, basically, AFAIU: Yes, pages are *supposed*
to go through a common freeing path where this check could reside, but
we've had bugs where they ended up leaking anyway, which is why this
check in MM was added in the first place.

I don't recall the specifics of *what* the bug was; +Jesper who maybe does?

-Toke


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-24 14:43             ` Toke Høiland-Jørgensen
@ 2025-06-24 14:56               ` David Hildenbrand
  2025-06-25  1:24                 ` Byungchul Park
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2025-06-24 14:56 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Zi Yan
  Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, almasrymina, ilias.apalodimas, harry.yoo, hawk,
	akpm, davem, john.fastabend, andrew+netdev, asml.silence, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola, hannes, jackmanb, jesper@cloudflare.com

On 24.06.25 16:43, Toke Høiland-Jørgensen wrote:
> Zi Yan <ziy@nvidia.com> writes:
> 
>> On 23 Jun 2025, at 10:58, David Hildenbrand wrote:
>>
>>> On 23.06.25 13:13, Zi Yan wrote:
>>>> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
>>>>
>>>>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>>>>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>>>> struct page is required and the work for page pool is on going.
>>>>>>>
>>>>>>> To achieve that, all the code should avoid directly accessing page pool
>>>>>>> members of struct page.
>>>>>>>
>>>>>>> Access ->pp_magic through struct netmem_desc instead of directly
>>>>>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>>>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>>>>>> without header dependency issue.
>>>>>>>
>>>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>>>>>> ---
>>>>>>>     include/linux/mm.h   | 12 ------------
>>>>>>>     include/net/netmem.h | 14 ++++++++++++++
>>>>>>>     mm/page_alloc.c      |  1 +
>>>>>>>     3 files changed, 15 insertions(+), 12 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>>>>>> --- a/include/linux/mm.h
>>>>>>> +++ b/include/linux/mm.h
>>>>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>>>      */
>>>>>>>     #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>>>
>>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>> -{
>>>>>>> -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>>> -}
>>>>>>> -#else
>>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>> -{
>>>>>>> -     return false;
>>>>>>> -}
>>>>>>> -#endif
>>>>>>> -
>>>>>>>     #endif /* _LINUX_MM_H */
>>>>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>>>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>>>>>> --- a/include/net/netmem.h
>>>>>>> +++ b/include/net/netmem.h
>>>>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>>>>>      */
>>>>>>>     static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>>>>>
>>>>>>> +#ifdef CONFIG_PAGE_POOL
>>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>> +{
>>>>>>> +     struct netmem_desc *desc = (struct netmem_desc *)page;
>>>>>>> +
>>>>>>> +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>>> +}
>>>>>>> +#else
>>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>> +{
>>>>>>> +     return false;
>>>>>>> +}
>>>>>>> +#endif
>>>>>>
>>>>>> I wonder how helpful this cleanup is long-term.
>>>>>>
>>>>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> There, we want to make sure that no pagepool page is ever returned to
>>>>>> the buddy.
>>>>>>
>>>>>> How reasonable is this sanity check to have long-term? Wouldn't we be
>>>>>> able to check that on some higher-level freeing path?
>>>>>>
>>>>>> The reason I am commenting is that once we decouple "struct page" from
>>>>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>>>>>> netmem_desc".
>>>>>>
>>>>>> ... but at that point here (when we free the actual pages), the "struct
>>>>>> netmem_desc" would likely already have been freed separately (remember:
>>>>>> it will be dynamically allocated).
>>>>>>
>>>>>> With that in mind:
>>>>>>
>>>>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>>>>>> could check that instead, so we don't have to cast from pages to
>>>>>> netmem_desc at all.
>>>>>
>>>>> I also thought it's too paranoiac.  However, I thought it's other issue
>>>>> than this work.  That's why I left the API as is for now, it can be gone
>>>>> once we get convinced the check is unnecessary in deep buddy.  Wrong?
>>>>>
>>>>>> 2) How valuable are these sanity checks deep in the buddy?
>>>>>
>>>>> That was also what I felt weird on.
>>>>
>>>> It seems very useful when I asked last time[1]:
>>>>
>>>> |> We have actually used this at Cloudflare to catch some page_pool bugs.
>>>
>>> My question is rather, whether there is some higher-level freeing path for netmem_desc where we could check that instead (IOW, earlier).
>>>
>>> Or is it really arbitrary put_page() (IOW, we assume that many possible references can be held)?
>>
>> +Toke, who I talked about this last time.
>>
>> Maybe he can shed some light on it.
> 
> As others have pointed out, basically, AFAIU: Yes, pages are *supposed*
> to go through a common freeing path where this check could reside, but
> we've had bugs where they ended up leaking anyway, which is why this
> check in MM was added in the first place.

Okay, thanks. If we could be using a page type instead to catch such 
leaks to the page allocator, we could implement it without any such 
pp-specific checks.

page types are stored in page->page_type and overlay page->_mapcount 
right now.

Looking at "struct netmem_desc", page->_mapcount should not be overlayed 
(good!).


So, you could be setting the type when creating a "struct netmem_desc" 
page, and clearing the type when about to free the page. In the buddy, 
you can then check without any casts from page to whatever else if the 
type is still unexpectedly set. If still set, you know that there is 
unexpected freeing.


I'll note that page types will be a building blocks of memdescs, to 
descibe "what we are pointing at". See

https://kernelnewbies.org/MatthewWilcox/Memdescs

Willy already planned for a "Bump" type; I assume this would now be 
"NMDesc" or sth like that IIUC.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-24 14:56               ` David Hildenbrand
@ 2025-06-25  1:24                 ` Byungchul Park
  2025-06-26  6:35                   ` Byungchul Park
  0 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2025-06-25  1:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Toke Høiland-Jørgensen, Zi Yan, willy, netdev,
	linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, tariqt, edumazet, pabeni, saeedm,
	leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, horms, linux-rdma, bpf, vishal.moola, hannes,
	jackmanb, jesper@cloudflare.com

On Tue, Jun 24, 2025 at 04:56:32PM +0200, David Hildenbrand wrote:
> 
> On 24.06.25 16:43, Toke Høiland-Jørgensen wrote:
> > Zi Yan <ziy@nvidia.com> writes:
> > 
> > > On 23 Jun 2025, at 10:58, David Hildenbrand wrote:
> > > 
> > > > On 23.06.25 13:13, Zi Yan wrote:
> > > > > On 23 Jun 2025, at 6:16, Byungchul Park wrote:
> > > > > 
> > > > > > On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
> > > > > > > On 20.06.25 06:12, Byungchul Park wrote:
> > > > > > > > To simplify struct page, the effort to separate its own descriptor from
> > > > > > > > struct page is required and the work for page pool is on going.
> > > > > > > > 
> > > > > > > > To achieve that, all the code should avoid directly accessing page pool
> > > > > > > > members of struct page.
> > > > > > > > 
> > > > > > > > Access ->pp_magic through struct netmem_desc instead of directly
> > > > > > > > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > > > > > > > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > > > > > > > without header dependency issue.
> > > > > > > > 
> > > > > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > > > > > Reviewed-by: Mina Almasry <almasrymina@google.com>
> > > > > > > > Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> > > > > > > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > > > Acked-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > > > > ---
> > > > > > > >     include/linux/mm.h   | 12 ------------
> > > > > > > >     include/net/netmem.h | 14 ++++++++++++++
> > > > > > > >     mm/page_alloc.c      |  1 +
> > > > > > > >     3 files changed, 15 insertions(+), 12 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > > > index 0ef2ba0c667a..0b7f7f998085 100644
> > > > > > > > --- a/include/linux/mm.h
> > > > > > > > +++ b/include/linux/mm.h
> > > > > > > > @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > > > > > >      */
> > > > > > > >     #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > > > > > 
> > > > > > > > -#ifdef CONFIG_PAGE_POOL
> > > > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > -{
> > > > > > > > -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > > > -}
> > > > > > > > -#else
> > > > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > -{
> > > > > > > > -     return false;
> > > > > > > > -}
> > > > > > > > -#endif
> > > > > > > > -
> > > > > > > >     #endif /* _LINUX_MM_H */
> > > > > > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > > > > > index d49ed49d250b..3d1b1dfc9ba5 100644
> > > > > > > > --- a/include/net/netmem.h
> > > > > > > > +++ b/include/net/netmem.h
> > > > > > > > @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > > > > > > >      */
> > > > > > > >     static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> > > > > > > > 
> > > > > > > > +#ifdef CONFIG_PAGE_POOL
> > > > > > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > +{
> > > > > > > > +     struct netmem_desc *desc = (struct netmem_desc *)page;
> > > > > > > > +
> > > > > > > > +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > > > +}
> > > > > > > > +#else
> > > > > > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > +{
> > > > > > > > +     return false;
> > > > > > > > +}
> > > > > > > > +#endif
> > > > > > > 
> > > > > > > I wonder how helpful this cleanup is long-term.
> > > > > > > 
> > > > > > > page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > > There, we want to make sure that no pagepool page is ever returned to
> > > > > > > the buddy.
> > > > > > > 
> > > > > > > How reasonable is this sanity check to have long-term? Wouldn't we be
> > > > > > > able to check that on some higher-level freeing path?
> > > > > > > 
> > > > > > > The reason I am commenting is that once we decouple "struct page" from
> > > > > > > "struct netmem_desc", we'd have to lookup here the corresponding "struct
> > > > > > > netmem_desc".
> > > > > > > 
> > > > > > > ... but at that point here (when we free the actual pages), the "struct
> > > > > > > netmem_desc" would likely already have been freed separately (remember:
> > > > > > > it will be dynamically allocated).
> > > > > > > 
> > > > > > > With that in mind:
> > > > > > > 
> > > > > > > 1) Is there a higher level "struct netmem_desc" freeing path where we
> > > > > > > could check that instead, so we don't have to cast from pages to
> > > > > > > netmem_desc at all.
> > > > > > 
> > > > > > I also thought it's too paranoiac.  However, I thought it's other issue
> > > > > > than this work.  That's why I left the API as is for now, it can be gone
> > > > > > once we get convinced the check is unnecessary in deep buddy.  Wrong?
> > > > > > 
> > > > > > > 2) How valuable are these sanity checks deep in the buddy?
> > > > > > 
> > > > > > That was also what I felt weird on.
> > > > > 
> > > > > It seems very useful when I asked last time[1]:
> > > > > 
> > > > > |> We have actually used this at Cloudflare to catch some page_pool bugs.
> > > > 
> > > > My question is rather, whether there is some higher-level freeing path for netmem_desc where we could check that instead (IOW, earlier).
> > > > 
> > > > Or is it really arbitrary put_page() (IOW, we assume that many possible references can be held)?
> > > 
> > > +Toke, who I talked about this last time.
> > > 
> > > Maybe he can shed some light on it.
> > 
> > As others have pointed out, basically, AFAIU: Yes, pages are *supposed*
> > to go through a common freeing path where this check could reside, but
> > we've had bugs where they ended up leaking anyway, which is why this
> > check in MM was added in the first place.
> 
> Okay, thanks. If we could be using a page type instead to catch such
> leaks to the page allocator, we could implement it without any such
> pp-specific checks.
> 
> page types are stored in page->page_type and overlay page->_mapcount
> right now.
> 
> Looking at "struct netmem_desc", page->_mapcount should not be overlayed
> (good!).
> 
> 
> So, you could be setting the type when creating a "struct netmem_desc"
> page, and clearing the type when about to free the page. In the buddy,
> you can then check without any casts from page to whatever else if the
> type is still unexpectedly set. If still set, you know that there is
> unexpected freeing.

Yeah, this is what we all were looking forward to.  However, I decided
to use the pp field for the checking since it's not ready for now.

So.. Is the current approach okay *for now*, even though the approach
should be updated once the type can be checked inside mm later?

Or do you want me to wait for it to be ready before this netmem work to
remove the pp fields from struct page?

	Byungchul
> 
> I'll note that page types will be a building blocks of memdescs, to
> descibe "what we are pointing at". See
> 
> https://kernelnewbies.org/MatthewWilcox/Memdescs
> 
> Willy already planned for a "Bump" type; I assume this would now be
> "NMDesc" or sth like that IIUC.
> 
> --
> Cheers,
> 
> David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-25  1:24                 ` Byungchul Park
@ 2025-06-26  6:35                   ` Byungchul Park
  0 siblings, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2025-06-26  6:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Toke Høiland-Jørgensen, Zi Yan, willy, netdev,
	linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, tariqt, edumazet, pabeni, saeedm,
	leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, horms, linux-rdma, bpf, vishal.moola, hannes,
	jackmanb, jesper@cloudflare.com

On Wed, Jun 25, 2025 at 10:24:32AM +0900, Byungchul Park wrote:
> On Tue, Jun 24, 2025 at 04:56:32PM +0200, David Hildenbrand wrote:
> > 
> > On 24.06.25 16:43, Toke Høiland-Jørgensen wrote:
> > > Zi Yan <ziy@nvidia.com> writes:
> > > 
> > > > On 23 Jun 2025, at 10:58, David Hildenbrand wrote:
> > > > 
> > > > > On 23.06.25 13:13, Zi Yan wrote:
> > > > > > On 23 Jun 2025, at 6:16, Byungchul Park wrote:
> > > > > > 
> > > > > > > On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
> > > > > > > > On 20.06.25 06:12, Byungchul Park wrote:
> > > > > > > > > To simplify struct page, the effort to separate its own descriptor from
> > > > > > > > > struct page is required and the work for page pool is on going.
> > > > > > > > > 
> > > > > > > > > To achieve that, all the code should avoid directly accessing page pool
> > > > > > > > > members of struct page.
> > > > > > > > > 
> > > > > > > > > Access ->pp_magic through struct netmem_desc instead of directly
> > > > > > > > > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > > > > > > > > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > > > > > > > > without header dependency issue.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > > > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > > > > > > Reviewed-by: Mina Almasry <almasrymina@google.com>
> > > > > > > > > Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> > > > > > > > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > > > > Acked-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > > > > > ---
> > > > > > > > >     include/linux/mm.h   | 12 ------------
> > > > > > > > >     include/net/netmem.h | 14 ++++++++++++++
> > > > > > > > >     mm/page_alloc.c      |  1 +
> > > > > > > > >     3 files changed, 15 insertions(+), 12 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > > > > index 0ef2ba0c667a..0b7f7f998085 100644
> > > > > > > > > --- a/include/linux/mm.h
> > > > > > > > > +++ b/include/linux/mm.h
> > > > > > > > > @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > > > > > > >      */
> > > > > > > > >     #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > > > > > > 
> > > > > > > > > -#ifdef CONFIG_PAGE_POOL
> > > > > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > > -{
> > > > > > > > > -     return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > > > > -}
> > > > > > > > > -#else
> > > > > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > > -{
> > > > > > > > > -     return false;
> > > > > > > > > -}
> > > > > > > > > -#endif
> > > > > > > > > -
> > > > > > > > >     #endif /* _LINUX_MM_H */
> > > > > > > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > > > > > > index d49ed49d250b..3d1b1dfc9ba5 100644
> > > > > > > > > --- a/include/net/netmem.h
> > > > > > > > > +++ b/include/net/netmem.h
> > > > > > > > > @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > > > > > > > >      */
> > > > > > > > >     static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> > > > > > > > > 
> > > > > > > > > +#ifdef CONFIG_PAGE_POOL
> > > > > > > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > > +{
> > > > > > > > > +     struct netmem_desc *desc = (struct netmem_desc *)page;
> > > > > > > > > +
> > > > > > > > > +     return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > > > > +}
> > > > > > > > > +#else
> > > > > > > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > > +{
> > > > > > > > > +     return false;
> > > > > > > > > +}
> > > > > > > > > +#endif
> > > > > > > > 
> > > > > > > > I wonder how helpful this cleanup is long-term.
> > > > > > > > 
> > > > > > > > page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > > There, we want to make sure that no pagepool page is ever returned to
> > > > > > > > the buddy.
> > > > > > > > 
> > > > > > > > How reasonable is this sanity check to have long-term? Wouldn't we be
> > > > > > > > able to check that on some higher-level freeing path?
> > > > > > > > 
> > > > > > > > The reason I am commenting is that once we decouple "struct page" from
> > > > > > > > "struct netmem_desc", we'd have to lookup here the corresponding "struct
> > > > > > > > netmem_desc".
> > > > > > > > 
> > > > > > > > ... but at that point here (when we free the actual pages), the "struct
> > > > > > > > netmem_desc" would likely already have been freed separately (remember:
> > > > > > > > it will be dynamically allocated).
> > > > > > > > 
> > > > > > > > With that in mind:
> > > > > > > > 
> > > > > > > > 1) Is there a higher level "struct netmem_desc" freeing path where we
> > > > > > > > could check that instead, so we don't have to cast from pages to
> > > > > > > > netmem_desc at all.
> > > > > > > 
> > > > > > > I also thought it's too paranoiac.  However, I thought it's other issue
> > > > > > > than this work.  That's why I left the API as is for now, it can be gone
> > > > > > > once we get convinced the check is unnecessary in deep buddy.  Wrong?
> > > > > > > 
> > > > > > > > 2) How valuable are these sanity checks deep in the buddy?
> > > > > > > 
> > > > > > > That was also what I felt weird on.
> > > > > > 
> > > > > > It seems very useful when I asked last time[1]:
> > > > > > 
> > > > > > |> We have actually used this at Cloudflare to catch some page_pool bugs.
> > > > > 
> > > > > My question is rather, whether there is some higher-level freeing path for netmem_desc where we could check that instead (IOW, earlier).
> > > > > 
> > > > > Or is it really arbitrary put_page() (IOW, we assume that many possible references can be held)?
> > > > 
> > > > +Toke, who I talked about this last time.
> > > > 
> > > > Maybe he can shed some light on it.
> > > 
> > > As others have pointed out, basically, AFAIU: Yes, pages are *supposed*
> > > to go through a common freeing path where this check could reside, but
> > > we've had bugs where they ended up leaking anyway, which is why this
> > > check in MM was added in the first place.
> > 
> > Okay, thanks. If we could be using a page type instead to catch such
> > leaks to the page allocator, we could implement it without any such
> > pp-specific checks.
> > 
> > page types are stored in page->page_type and overlay page->_mapcount
> > right now.
> > 
> > Looking at "struct netmem_desc", page->_mapcount should not be overlayed
> > (good!).
> > 
> > 
> > So, you could be setting the type when creating a "struct netmem_desc"
> > page, and clearing the type when about to free the page. In the buddy,
> > you can then check without any casts from page to whatever else if the
> > type is still unexpectedly set. If still set, you know that there is
> > unexpected freeing.
> 
> Yeah, this is what we all were looking forward to.  However, I decided
> to use the pp field for the checking since it's not ready for now.
> 
> So.. Is the current approach okay *for now*, even though the approach
> should be updated once the type can be checked inside mm later?
> 
> Or do you want me to wait for it to be ready before this netmem work to
> remove the pp fields from struct page?

Just in case you may get it wrong, I'm still waiting for your answer,
even though I've sent v7 with this patch excluded.

To Willy and David,

Is the work to add pp type(or bump) going to be done shortly?  If yes,
I should wait for it.  If no, it'd better use the pp magic field *for
now*.

	Byungchul

> 	Byungchul
> > 
> > I'll note that page types will be a building blocks of memdescs, to
> > descibe "what we are pointing at". See
> > 
> > https://kernelnewbies.org/MatthewWilcox/Memdescs
> > 
> > Willy already planned for a "Bump" type; I assume this would now be
> > "NMDesc" or sth like that IIUC.
> > 
> > --
> > Cheers,
> > 
> > David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2025-06-26  6:35 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-20  4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
2025-06-23  9:32   ` David Hildenbrand
2025-06-23 10:28     ` Byungchul Park
2025-06-23 10:38       ` David Hildenbrand
2025-06-23 12:18       ` Harry Yoo
2025-06-23 19:09         ` Mina Almasry
2025-06-23 19:28           ` Pavel Begunkov
2025-06-24  1:17           ` Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp() Byungchul Park
2025-06-23  4:32   ` Byungchul Park
2025-06-24  0:17     ` Jakub Kicinski
2025-06-24  1:27       ` Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
2025-06-20  4:12 ` [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
2025-06-23  9:16   ` David Hildenbrand
2025-06-23 10:16     ` Byungchul Park
2025-06-23 11:13       ` Zi Yan
2025-06-23 11:25         ` Byungchul Park
2025-06-23 14:58         ` David Hildenbrand
2025-06-23 15:25           ` Zi Yan
2025-06-24 14:43             ` Toke Høiland-Jørgensen
2025-06-24 14:56               ` David Hildenbrand
2025-06-25  1:24                 ` Byungchul Park
2025-06-26  6:35                   ` Byungchul Park
2025-06-23 17:06           ` Pavel Begunkov
2025-06-23 17:28             ` Mina Almasry
2025-06-23 18:09               ` Vlastimil Babka
2025-06-23 18:14               ` Pavel Begunkov
2025-06-24  1:54                 ` Byungchul Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).