linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/9] Split netmem from struct page
@ 2025-06-09  4:32 Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
                   ` (10 more replies)
  0 siblings, 11 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

Hi all,

In this version, I'm posting non-controversial patches first.  I will
post the rest more carefully later.  In this version, no update has been
applied except excluding some patches from the previous version.  See
the changes below.

--8<---
The MM subsystem is trying to reduce struct page to a single pointer.
The first step towards that is splitting struct page by its individual
users, as has already been done with folio and slab.  This patchset does
that for netmem which is used for page pools.

Matthew Wilcox tried and stopped the same work, you can see in:

   https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/

Mina Almasry already has done a lot fo prerequisite works by luck.  I
stacked my patches on the top of his work e.i. netmem.

I focused on removing the page pool members in struct page this time,
not moving the allocation code of page pool from net to mm.  It can be
done later if needed.

The final patch removing the page pool fields will be submitted once
all the converting work of page to netmem are done:

   1. converting of libeth_fqe by Tony Nguyen.
   2. converting of mlx5 by Tariq Toukan.
   3. converting of prueth_swdata (on me).
   4. converting of freescale driver (on me).

For our discussion, I'm sharing what the final patch looks like the
following.

	Byungchul
--8<--
commit 1847d9890f798456b21ccb27aac7545303048492
Author: Byungchul Park <byungchul@sk.com>
Date:   Wed May 28 20:44:55 2025 +0900

    mm, netmem: remove the page pool members in struct page
    
    Now that all the users of the page pool members in struct page have been
    gone, the members can be removed from struct page.
    
    However, since struct netmem_desc still uses the space in struct page,
    the important offsets should be checked properly, until struct
    netmem_desc has its own instance from slab.
    
    Remove the page pool members in struct page and modify static checkers
    for the offsets.
    
    Signed-off-by: Byungchul Park <byungchul@sk.com>

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 32ba5126e221..db2fe0d0ebbf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -120,17 +120,6 @@ struct page {
 			 */
 			unsigned long private;
 		};
-		struct {	/* page_pool used by netstack */
-			/**
-			 * @pp_magic: magic value to avoid recycling non
-			 * page_pool allocated pages.
-			 */
-			unsigned long pp_magic;
-			struct page_pool *pp;
-			unsigned long _pp_mapping_pad;
-			unsigned long dma_addr;
-			atomic_long_t pp_ref_count;
-		};
 		struct {	/* Tail pages of compound page */
 			unsigned long compound_head;	/* Bit zero is set */
 		};
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 8f354ae7d5c3..3414f184d018 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -42,11 +42,8 @@ struct netmem_desc {
 	static_assert(offsetof(struct page, pg) == \
 		      offsetof(struct netmem_desc, desc))
 NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
-NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
-NETMEM_DESC_ASSERT_OFFSET(pp, pp);
-NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
-NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
-NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+NETMEM_DESC_ASSERT_OFFSET(lru, pp_magic);
+NETMEM_DESC_ASSERT_OFFSET(mapping, _pp_mapping_pad);
 #undef NETMEM_DESC_ASSERT_OFFSET
 
 /*
---
Changes from v4:
	1. Add given 'Reviewed-by's, thanks to all.
	2. Exclude potentially controversial patches.

Changes from v3:
	1. Relocates ->owner and ->type of net_iov out of netmem_desc
	   and make them be net_iov specific.
	2. Remove __force when casting struct page to struct netmem_desc.

Changes from v2:
	1. Introduce a netmem API, virt_to_head_netmem(), and use it
	   when it's needed.
	2. Introduce struct netmem_desc as a new struct and union'ed
	   with the existing fields in struct net_iov.
	3. Make page_pool_page_is_pp() access ->pp_magic through struct
	   netmem_desc instead of struct page.
	4. Move netmem alloc APIs from include/net/netmem.h to
	   net/core/netmem_priv.h.
	5. Apply trivial feedbacks, thanks to Mina, Pavel, and Toke.
	6. Add given 'Reviewed-by's, thanks to Mina.

Changes from v1:
	1. Rebase on net-next's main as of May 26.
	2. Check checkpatch.pl, feedbacked by SJ Park.
	3. Add converting of page to netmem in mt76.
	4. Revert 'mlx5: use netmem descriptor and APIs for page pool'
	   since it's on-going by Tariq Toukan.  I will wait for his
	   work to be done.
	5. Revert 'page_pool: use netmem APIs to access page->pp_magic
	   in page_pool_page_is_pp()' since we need more discussion.
	6. Revert 'mm, netmem: remove the page pool members in struct
	   page' since there are some prerequisite works to remove the
	   page pool fields from struct page.  I can submit this patch
	   separatedly later.
	7. Cancel relocating a page pool member in struct page.
	8. Modify static assert for offests and size of struct
	   netmem_desc.

Changes from rfc:
	1. Rebase on net-next's main branch.
	   https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
	2. Fix a build error reported by kernel test robot.
	   https://lore.kernel.org/all/202505100932.uzAMBW1y-lkp@intel.com/
	3. Add given 'Reviewed-by's, thanks to Mina and Ilias.
	4. Do static_assert() on the size of struct netmem_desc instead
	   of placing place-holder in struct page, feedbacked by
	   Matthew.
	5. Do struct_group_tagged(netmem_desc) on struct net_iov instead
	   of wholly renaming it to strcut netmem_desc, feedbacked by
	   Mina and Pavel.

Byungchul Park (9):
  netmem: introduce struct netmem_desc mirroring struct page
  page_pool: rename page_pool_return_page() to page_pool_return_netmem()
  page_pool: rename __page_pool_release_page_dma() to
    __page_pool_release_netmem_dma()
  page_pool: rename __page_pool_alloc_pages_slow() to
    __page_pool_alloc_netmems_slow()
  netmem: use _Generic to cover const casting for page_to_netmem()
  netmem: remove __netmem_get_pp()
  page_pool: make page_pool_get_dma_addr() just wrap
    page_pool_get_dma_addr_netmem()
  netmem: introduce a netmem API, virt_to_head_netmem()
  page_pool: access ->pp_magic through struct netmem_desc in
    page_pool_page_is_pp()

 include/linux/mm.h              |  12 ---
 include/net/netmem.h            | 138 ++++++++++++++++++++++----------
 include/net/page_pool/helpers.h |   7 +-
 mm/page_alloc.c                 |   1 +
 net/core/page_pool.c            |  36 ++++-----
 5 files changed, 117 insertions(+), 77 deletions(-)


base-commit: 90b83efa6701656e02c86e7df2cb1765ea602d07
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-09 19:32   ` Jakub Kicinski
                     ` (2 more replies)
  2025-06-09  4:32 ` [PATCH net-next 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
                   ` (9 subsequent siblings)
  10 siblings, 3 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

To simplify struct page, the page pool members of struct page should be
moved to other, allowing these members to be removed from struct page.

Introduce a network memory descriptor to store the members, struct
netmem_desc, and make it union'ed with the existing fields in struct
net_iov, allowing to organize the fields of struct net_iov.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
 include/net/netmem.h | 94 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 21 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 386164fb9c18..2687c8051ca5 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -12,6 +12,50 @@
 #include <linux/mm.h>
 #include <net/net_debug.h>
 
+/* These fields in struct page are used by the page_pool and net stack:
+ *
+ *        struct {
+ *                unsigned long pp_magic;
+ *                struct page_pool *pp;
+ *                unsigned long _pp_mapping_pad;
+ *                unsigned long dma_addr;
+ *                atomic_long_t pp_ref_count;
+ *        };
+ *
+ * We mirror the page_pool fields here so the page_pool can access these
+ * fields without worrying whether the underlying fields belong to a
+ * page or netmem_desc.
+ *
+ * CAUTION: Do not update the fields in netmem_desc without also
+ * updating the anonymous aliasing union in struct net_iov.
+ */
+struct netmem_desc {
+	unsigned long _flags;
+	unsigned long pp_magic;
+	struct page_pool *pp;
+	unsigned long _pp_mapping_pad;
+	unsigned long dma_addr;
+	atomic_long_t pp_ref_count;
+};
+
+#define NETMEM_DESC_ASSERT_OFFSET(pg, desc)        \
+	static_assert(offsetof(struct page, pg) == \
+		      offsetof(struct netmem_desc, desc))
+NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
+NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
+NETMEM_DESC_ASSERT_OFFSET(pp, pp);
+NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
+NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
+NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
+#undef NETMEM_DESC_ASSERT_OFFSET
+
+/*
+ * Since struct netmem_desc uses the space in struct page, the size
+ * should be checked, until struct netmem_desc has its own instance from
+ * slab, to avoid conflicting with other members within struct page.
+ */
+static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
+
 /* net_iov */
 
 DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
@@ -31,12 +75,25 @@ enum net_iov_type {
 };
 
 struct net_iov {
-	enum net_iov_type type;
-	unsigned long pp_magic;
-	struct page_pool *pp;
+	union {
+		struct netmem_desc desc;
+
+		/* XXX: The following part should be removed once all
+		 * the references to them are converted so as to be
+		 * accessed via netmem_desc e.g. niov->desc.pp instead
+		 * of niov->pp.
+		 */
+		struct {
+			unsigned long _flags;
+			unsigned long pp_magic;
+			struct page_pool *pp;
+			unsigned long _pp_mapping_pad;
+			unsigned long dma_addr;
+			atomic_long_t pp_ref_count;
+		};
+	};
 	struct net_iov_area *owner;
-	unsigned long dma_addr;
-	atomic_long_t pp_ref_count;
+	enum net_iov_type type;
 };
 
 struct net_iov_area {
@@ -48,27 +105,22 @@ struct net_iov_area {
 	unsigned long base_virtual;
 };
 
-/* These fields in struct page are used by the page_pool and net stack:
+/* net_iov is union'ed with struct netmem_desc mirroring struct page, so
+ * the page_pool can access these fields without worrying whether the
+ * underlying fields are accessed via netmem_desc or directly via
+ * net_iov, until all the references to them are converted so as to be
+ * accessed via netmem_desc e.g. niov->desc.pp instead of niov->pp.
  *
- *        struct {
- *                unsigned long pp_magic;
- *                struct page_pool *pp;
- *                unsigned long _pp_mapping_pad;
- *                unsigned long dma_addr;
- *                atomic_long_t pp_ref_count;
- *        };
- *
- * We mirror the page_pool fields here so the page_pool can access these fields
- * without worrying whether the underlying fields belong to a page or net_iov.
- *
- * The non-net stack fields of struct page are private to the mm stack and must
- * never be mirrored to net_iov.
+ * The non-net stack fields of struct page are private to the mm stack
+ * and must never be mirrored to net_iov.
  */
-#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
-	static_assert(offsetof(struct page, pg) == \
+#define NET_IOV_ASSERT_OFFSET(desc, iov)                    \
+	static_assert(offsetof(struct netmem_desc, desc) == \
 		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(_flags, _flags);
 NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
 NET_IOV_ASSERT_OFFSET(pp, pp);
+NET_IOV_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
 NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
 NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
 #undef NET_IOV_ASSERT_OFFSET
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-10 11:15   ` Ilias Apalodimas
  2025-06-09  4:32 ` [PATCH net-next 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

Now that page_pool_return_page() is for returning netmem, not struct
page, rename it to page_pool_return_netmem() to reflect what it does.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/page_pool.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4011eb305cee..460d11a31fbc 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -371,7 +371,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
 }
 EXPORT_SYMBOL(page_pool_create);
 
-static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem);
 
 static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 {
@@ -409,7 +409,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 			 * (2) break out to fallthrough to alloc_pages_node.
 			 * This limit stress on page buddy alloactor.
 			 */
-			page_pool_return_page(pool, netmem);
+			page_pool_return_netmem(pool, netmem);
 			alloc_stat_inc(pool, waive);
 			netmem = 0;
 			break;
@@ -712,7 +712,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
  * a regular page (that will eventually be returned to the normal
  * page-allocator via put_page).
  */
-void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
 {
 	int count;
 	bool put;
@@ -829,7 +829,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 	 * will be invoking put_page.
 	 */
 	recycle_stat_inc(pool, released_refcnt);
-	page_pool_return_page(pool, netmem);
+	page_pool_return_netmem(pool, netmem);
 
 	return 0;
 }
@@ -872,7 +872,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
 	if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
 		/* Cache full, fallback to free pages */
 		recycle_stat_inc(pool, ring_full);
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 EXPORT_SYMBOL(page_pool_put_unrefed_netmem);
@@ -915,7 +915,7 @@ static void page_pool_recycle_ring_bulk(struct page_pool *pool,
 	 * since put_page() with refcnt == 1 can be an expensive operation.
 	 */
 	for (; i < bulk_len; i++)
-		page_pool_return_page(pool, bulk[i]);
+		page_pool_return_netmem(pool, bulk[i]);
 }
 
 /**
@@ -998,7 +998,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
 		return netmem;
 	}
 
-	page_pool_return_page(pool, netmem);
+	page_pool_return_netmem(pool, netmem);
 	return 0;
 }
 
@@ -1012,7 +1012,7 @@ static void page_pool_free_frag(struct page_pool *pool)
 	if (!netmem || page_pool_unref_netmem(netmem, drain_count))
 		return;
 
-	page_pool_return_page(pool, netmem);
+	page_pool_return_netmem(pool, netmem);
 }
 
 netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
@@ -1079,7 +1079,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 			pr_crit("%s() page_pool refcnt %d violation\n",
 				__func__, netmem_ref_count(netmem));
 
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 
@@ -1112,7 +1112,7 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
 	 */
 	while (pool->alloc.count) {
 		netmem = pool->alloc.cache[--pool->alloc.count];
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 
@@ -1252,7 +1252,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	/* Flush pool alloc cache, as refill will check NUMA node */
 	while (pool->alloc.count) {
 		netmem = pool->alloc.cache[--pool->alloc.count];
-		page_pool_return_page(pool, netmem);
+		page_pool_return_netmem(pool, netmem);
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-10 11:11   ` Ilias Apalodimas
  2025-06-09  4:32 ` [PATCH net-next 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

Now that __page_pool_release_page_dma() is for releasing netmem, not
struct page, rename it to __page_pool_release_netmem_dma() to reflect
what it does.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/page_pool.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 460d11a31fbc..8d44d1abfaef 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -673,8 +673,8 @@ void page_pool_clear_pp_info(netmem_ref netmem)
 	netmem_set_pp(netmem, NULL);
 }
 
-static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
-							 netmem_ref netmem)
+static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
+							   netmem_ref netmem)
 {
 	struct page *old, *page = netmem_to_page(netmem);
 	unsigned long id;
@@ -721,7 +721,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
 	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
 		put = pool->mp_ops->release_netmem(pool, netmem);
 	else
-		__page_pool_release_page_dma(pool, netmem);
+		__page_pool_release_netmem_dma(pool, netmem);
 
 	/* This may be the last page returned, releasing the pool, so
 	 * it is not safe to reference pool afterwards.
@@ -1139,7 +1139,7 @@ static void page_pool_scrub(struct page_pool *pool)
 		}
 
 		xa_for_each(&pool->dma_mapped, id, ptr)
-			__page_pool_release_page_dma(pool, page_to_netmem(ptr));
+			__page_pool_release_netmem_dma(pool, page_to_netmem((struct page *)ptr));
 	}
 
 	/* No more consumers should exist, but producers could still
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (2 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

Now that __page_pool_alloc_pages_slow() is for allocating netmem, not
struct page, rename it to __page_pool_alloc_netmems_slow() to reflect
what it does.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/page_pool.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 8d44d1abfaef..a9189ad1044b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -544,8 +544,8 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 }
 
 /* slow path */
-static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
-							gfp_t gfp)
+static noinline netmem_ref __page_pool_alloc_netmems_slow(struct page_pool *pool,
+							  gfp_t gfp)
 {
 	const int bulk = PP_ALLOC_CACHE_REFILL;
 	unsigned int pp_order = pool->p.order;
@@ -615,7 +615,7 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
 	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
 		netmem = pool->mp_ops->alloc_netmems(pool, gfp);
 	else
-		netmem = __page_pool_alloc_pages_slow(pool, gfp);
+		netmem = __page_pool_alloc_netmems_slow(pool, gfp);
 	return netmem;
 }
 EXPORT_SYMBOL(page_pool_alloc_netmems);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 5/9] netmem: use _Generic to cover const casting for page_to_netmem()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (3 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 6/9] netmem: remove __netmem_get_pp() Byungchul Park
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

The current page_to_netmem() doesn't cover const casting resulting in
trying to cast const struct page * to const netmem_ref fails.

To cover the case, change page_to_netmem() to use macro and _Generic.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/netmem.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 2687c8051ca5..65bb87835664 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -195,10 +195,9 @@ static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
 	return (__force netmem_ref)((unsigned long)niov | NET_IOV);
 }
 
-static inline netmem_ref page_to_netmem(struct page *page)
-{
-	return (__force netmem_ref)page;
-}
+#define page_to_netmem(p)	(_Generic((p),			\
+	const struct page * :	(__force const netmem_ref)(p),	\
+	struct page * :		(__force netmem_ref)(p)))
 
 /**
  * virt_to_netmem - convert virtual memory pointer to a netmem reference
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 6/9] netmem: remove __netmem_get_pp()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (4 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-10 11:11   ` Ilias Apalodimas
  2025-06-09  4:32 ` [PATCH net-next 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

There are no users of __netmem_get_pp().  Remove it.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/netmem.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 65bb87835664..d4066fcb1fee 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -234,22 +234,6 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
 	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
 }
 
-/**
- * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
- * @netmem: netmem reference to get the pointer from
- *
- * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
- * e.g. when it's a header buffer, performs faster and generates smaller
- * object code (avoids clearing the LSB). When @netmem points to IOV,
- * provokes invalid memory access.
- *
- * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
- */
-static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
-{
-	return __netmem_to_page(netmem)->pp;
-}
-
 static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
 {
 	return __netmem_clear_lsb(netmem)->pp;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (5 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 6/9] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

The page pool members in struct page cannot be removed unless it's not
allowed to access any of them via struct page.

Do not access 'page->dma_addr' directly in page_pool_get_dma_addr() but
just wrap page_pool_get_dma_addr_netmem() safely.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/page_pool/helpers.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 93f2c31baf9b..387913b6c8bf 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -437,12 +437,7 @@ static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
  */
 static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
 {
-	dma_addr_t ret = page->dma_addr;
-
-	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
-		ret <<= PAGE_SHIFT;
-
-	return ret;
+	return page_pool_get_dma_addr_netmem(page_to_netmem(page));
 }
 
 static inline void __page_pool_dma_sync_for_cpu(const struct page_pool *pool,
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 8/9] netmem: introduce a netmem API, virt_to_head_netmem()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (6 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-09  4:32 ` [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

To eliminate the use of struct page in page pool, the page pool code
should use netmem descriptor and APIs instead.

As part of the work, introduce a netmem API to convert a virtual address
to a head netmem allowing the code to use it rather than the existing
API, virt_to_head_page() for struct page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
 include/net/netmem.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index d4066fcb1fee..d84ab624b489 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -265,6 +265,13 @@ static inline netmem_ref netmem_compound_head(netmem_ref netmem)
 	return page_to_netmem(compound_head(netmem_to_page(netmem)));
 }
 
+static inline netmem_ref virt_to_head_netmem(const void *x)
+{
+	netmem_ref netmem = virt_to_netmem(x);
+
+	return netmem_compound_head(netmem);
+}
+
 /**
  * __netmem_address - unsafely get pointer to the memory backing @netmem
  * @netmem: netmem reference to get the pointer for
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (7 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
@ 2025-06-09  4:32 ` Byungchul Park
  2025-06-09 17:39   ` Mina Almasry
  2025-06-19  9:55   ` Vlastimil Babka
  2025-06-09  6:49 ` [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
  2025-06-11 14:25 ` Pavel Begunkov
  10 siblings, 2 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  4:32 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

To simplify struct page, the effort to separate its own descriptor from
struct page is required and the work for page pool is on going.

To achieve that, all the code should avoid directly accessing page pool
members of struct page.

Access ->pp_magic through struct netmem_desc instead of directly
accessing it through struct page in page_pool_page_is_pp().  Plus, move
page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
without header dependency issue.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/mm.h   | 12 ------------
 include/net/netmem.h | 14 ++++++++++++++
 mm/page_alloc.c      |  1 +
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e51dba8398f7..f23560853447 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  */
 #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
 
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(struct page *page)
-{
-	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(struct page *page)
-{
-	return false;
-}
-#endif
-
 #endif /* _LINUX_MM_H */
diff --git a/include/net/netmem.h b/include/net/netmem.h
index d84ab624b489..8f354ae7d5c3 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
  */
 static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
 
+#ifdef CONFIG_PAGE_POOL
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	struct netmem_desc *desc = (struct netmem_desc *)page;
+
+	return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
+}
+#else
+static inline bool page_pool_page_is_pp(struct page *page)
+{
+	return false;
+}
+#endif
+
 /* net_iov */
 
 DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4f29e393f6af..be0752c0ac92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,6 +55,7 @@
 #include <linux/delayacct.h>
 #include <linux/cacheinfo.h>
 #include <linux/pgalloc_tag.h>
+#include <net/netmem.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/9] Split netmem from struct page
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (8 preceding siblings ...)
  2025-06-09  4:32 ` [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
@ 2025-06-09  6:49 ` Byungchul Park
  2025-06-11 14:25 ` Pavel Begunkov
  10 siblings, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-09  6:49 UTC (permalink / raw)
  To: willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, Jun 09, 2025 at 01:32:16PM +0900, Byungchul Park wrote:
> Hi all,

I'm so sorry.. I missed to add v5 into the prefix..

	Byungchul

> In this version, I'm posting non-controversial patches first.  I will
> post the rest more carefully later.  In this version, no update has been
> applied except excluding some patches from the previous version.  See
> the changes below.
> 
> --8<---
> The MM subsystem is trying to reduce struct page to a single pointer.
> The first step towards that is splitting struct page by its individual
> users, as has already been done with folio and slab.  This patchset does
> that for netmem which is used for page pools.
> 
> Matthew Wilcox tried and stopped the same work, you can see in:
> 
>    https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
> 
> Mina Almasry already has done a lot fo prerequisite works by luck.  I
> stacked my patches on the top of his work e.i. netmem.
> 
> I focused on removing the page pool members in struct page this time,
> not moving the allocation code of page pool from net to mm.  It can be
> done later if needed.
> 
> The final patch removing the page pool fields will be submitted once
> all the converting work of page to netmem are done:
> 
>    1. converting of libeth_fqe by Tony Nguyen.
>    2. converting of mlx5 by Tariq Toukan.
>    3. converting of prueth_swdata (on me).
>    4. converting of freescale driver (on me).
> 
> For our discussion, I'm sharing what the final patch looks like the
> following.
> 
> 	Byungchul
> --8<--
> commit 1847d9890f798456b21ccb27aac7545303048492
> Author: Byungchul Park <byungchul@sk.com>
> Date:   Wed May 28 20:44:55 2025 +0900
> 
>     mm, netmem: remove the page pool members in struct page
>     
>     Now that all the users of the page pool members in struct page have been
>     gone, the members can be removed from struct page.
>     
>     However, since struct netmem_desc still uses the space in struct page,
>     the important offsets should be checked properly, until struct
>     netmem_desc has its own instance from slab.
>     
>     Remove the page pool members in struct page and modify static checkers
>     for the offsets.
>     
>     Signed-off-by: Byungchul Park <byungchul@sk.com>
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 32ba5126e221..db2fe0d0ebbf 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -120,17 +120,6 @@ struct page {
>  			 */
>  			unsigned long private;
>  		};
> -		struct {	/* page_pool used by netstack */
> -			/**
> -			 * @pp_magic: magic value to avoid recycling non
> -			 * page_pool allocated pages.
> -			 */
> -			unsigned long pp_magic;
> -			struct page_pool *pp;
> -			unsigned long _pp_mapping_pad;
> -			unsigned long dma_addr;
> -			atomic_long_t pp_ref_count;
> -		};
>  		struct {	/* Tail pages of compound page */
>  			unsigned long compound_head;	/* Bit zero is set */
>  		};
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 8f354ae7d5c3..3414f184d018 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -42,11 +42,8 @@ struct netmem_desc {
>  	static_assert(offsetof(struct page, pg) == \
>  		      offsetof(struct netmem_desc, desc))
>  NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
> -NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
> -NETMEM_DESC_ASSERT_OFFSET(pp, pp);
> -NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
> -NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
> -NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> +NETMEM_DESC_ASSERT_OFFSET(lru, pp_magic);
> +NETMEM_DESC_ASSERT_OFFSET(mapping, _pp_mapping_pad);
>  #undef NETMEM_DESC_ASSERT_OFFSET
>  
>  /*
> ---
> Changes from v4:
> 	1. Add given 'Reviewed-by's, thanks to all.
> 	2. Exclude potentially controversial patches.
> 
> Changes from v3:
> 	1. Relocates ->owner and ->type of net_iov out of netmem_desc
> 	   and make them be net_iov specific.
> 	2. Remove __force when casting struct page to struct netmem_desc.
> 
> Changes from v2:
> 	1. Introduce a netmem API, virt_to_head_netmem(), and use it
> 	   when it's needed.
> 	2. Introduce struct netmem_desc as a new struct and union'ed
> 	   with the existing fields in struct net_iov.
> 	3. Make page_pool_page_is_pp() access ->pp_magic through struct
> 	   netmem_desc instead of struct page.
> 	4. Move netmem alloc APIs from include/net/netmem.h to
> 	   net/core/netmem_priv.h.
> 	5. Apply trivial feedbacks, thanks to Mina, Pavel, and Toke.
> 	6. Add given 'Reviewed-by's, thanks to Mina.
> 
> Changes from v1:
> 	1. Rebase on net-next's main as of May 26.
> 	2. Check checkpatch.pl, feedbacked by SJ Park.
> 	3. Add converting of page to netmem in mt76.
> 	4. Revert 'mlx5: use netmem descriptor and APIs for page pool'
> 	   since it's on-going by Tariq Toukan.  I will wait for his
> 	   work to be done.
> 	5. Revert 'page_pool: use netmem APIs to access page->pp_magic
> 	   in page_pool_page_is_pp()' since we need more discussion.
> 	6. Revert 'mm, netmem: remove the page pool members in struct
> 	   page' since there are some prerequisite works to remove the
> 	   page pool fields from struct page.  I can submit this patch
> 	   separatedly later.
> 	7. Cancel relocating a page pool member in struct page.
> 	8. Modify static assert for offests and size of struct
> 	   netmem_desc.
> 
> Changes from rfc:
> 	1. Rebase on net-next's main branch.
> 	   https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
> 	2. Fix a build error reported by kernel test robot.
> 	   https://lore.kernel.org/all/202505100932.uzAMBW1y-lkp@intel.com/
> 	3. Add given 'Reviewed-by's, thanks to Mina and Ilias.
> 	4. Do static_assert() on the size of struct netmem_desc instead
> 	   of placing place-holder in struct page, feedbacked by
> 	   Matthew.
> 	5. Do struct_group_tagged(netmem_desc) on struct net_iov instead
> 	   of wholly renaming it to strcut netmem_desc, feedbacked by
> 	   Mina and Pavel.
> 
> Byungchul Park (9):
>   netmem: introduce struct netmem_desc mirroring struct page
>   page_pool: rename page_pool_return_page() to page_pool_return_netmem()
>   page_pool: rename __page_pool_release_page_dma() to
>     __page_pool_release_netmem_dma()
>   page_pool: rename __page_pool_alloc_pages_slow() to
>     __page_pool_alloc_netmems_slow()
>   netmem: use _Generic to cover const casting for page_to_netmem()
>   netmem: remove __netmem_get_pp()
>   page_pool: make page_pool_get_dma_addr() just wrap
>     page_pool_get_dma_addr_netmem()
>   netmem: introduce a netmem API, virt_to_head_netmem()
>   page_pool: access ->pp_magic through struct netmem_desc in
>     page_pool_page_is_pp()
> 
>  include/linux/mm.h              |  12 ---
>  include/net/netmem.h            | 138 ++++++++++++++++++++++----------
>  include/net/page_pool/helpers.h |   7 +-
>  mm/page_alloc.c                 |   1 +
>  net/core/page_pool.c            |  36 ++++-----
>  5 files changed, 117 insertions(+), 77 deletions(-)
> 
> 
> base-commit: 90b83efa6701656e02c86e7df2cb1765ea602d07
> -- 
> 2.17.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-09  4:32 ` [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
@ 2025-06-09 17:39   ` Mina Almasry
  2025-06-10  1:45     ` Byungchul Park
  2025-06-19  9:55   ` Vlastimil Babka
  1 sibling, 1 reply; 40+ messages in thread
From: Mina Almasry @ 2025-06-09 17:39 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Sun, Jun 8, 2025 at 9:32 PM Byungchul Park <byungchul@sk.com> wrote:
>
> To simplify struct page, the effort to separate its own descriptor from
> struct page is required and the work for page pool is on going.
>
> To achieve that, all the code should avoid directly accessing page pool
> members of struct page.
>
> Access ->pp_magic through struct netmem_desc instead of directly
> accessing it through struct page in page_pool_page_is_pp().  Plus, move
> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> without header dependency issue.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/linux/mm.h   | 12 ------------
>  include/net/netmem.h | 14 ++++++++++++++
>  mm/page_alloc.c      |  1 +
>  3 files changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e51dba8398f7..f23560853447 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>   */
>  #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>
> -#ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> -       return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> -#else
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> -       return false;
> -}
> -#endif
> -
>  #endif /* _LINUX_MM_H */
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index d84ab624b489..8f354ae7d5c3 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>   */
>  static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>
> +#ifdef CONFIG_PAGE_POOL
> +static inline bool page_pool_page_is_pp(struct page *page)
> +{
> +       struct netmem_desc *desc = (struct netmem_desc *)page;
> +
> +       return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> +}
> +#else
> +static inline bool page_pool_page_is_pp(struct page *page)
> +{
> +       return false;
> +}
> +#endif
> +
>  /* net_iov */
>
>  DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4f29e393f6af..be0752c0ac92 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -55,6 +55,7 @@
>  #include <linux/delayacct.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/pgalloc_tag.h>
> +#include <net/netmem.h>

mm files starting to include netmem.h is a bit interesting. I did not
expect/want dependencies outside of net. If anything the netmem stuff
include linux/mm.h

But I don't have a butter suggestion here and I don't see any huge
problems with this off the top of my head, so

Reviewed-by: Mina Almasry <almasrymina@google.com>

Lets see if Jakub objects though. To be fair, we did put the netmem
private stuff in net/core/netmem_priv.h, so technically
include/net/netmem.h should be exportable indeed.

-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
@ 2025-06-09 19:32   ` Jakub Kicinski
  2025-06-10  1:30     ` Byungchul Park
  2025-06-11 20:53     ` Mina Almasry
  2025-06-17  2:20   ` Harry Yoo
  2025-06-19  9:32   ` Vlastimil Babka
  2 siblings, 2 replies; 40+ messages in thread
From: Jakub Kicinski @ 2025-06-09 19:32 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon,  9 Jun 2025 13:32:17 +0900 Byungchul Park wrote:
> To simplify struct page, the page pool members of struct page should be
> moved to other, allowing these members to be removed from struct page.
> 
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, and make it union'ed with the existing fields in struct
> net_iov, allowing to organize the fields of struct net_iov.

What's the intended relation between the types?

netmem_ref exists to clearly indicate that memory may not be readable.
Majority of memory we expect to allocate from page pool must be
kernel-readable. What's the plan for reading the "single pointer"
memory within the kernel?

I think you're approaching this problem from the easiest and least
relevant direction. Are you coordinating with David Howells?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-09 19:32   ` Jakub Kicinski
@ 2025-06-10  1:30     ` Byungchul Park
  2025-06-12  1:55       ` Jakub Kicinski
  2025-06-11 20:53     ` Mina Almasry
  1 sibling, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-10  1:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, Jun 09, 2025 at 12:32:55PM -0700, Jakub Kicinski wrote:
> On Mon,  9 Jun 2025 13:32:17 +0900 Byungchul Park wrote:
> > To simplify struct page, the page pool members of struct page should be
> > moved to other, allowing these members to be removed from struct page.
> > 
> > Introduce a network memory descriptor to store the members, struct
> > netmem_desc, and make it union'ed with the existing fields in struct
> > net_iov, allowing to organize the fields of struct net_iov.
> 
> What's the intended relation between the types?

One thing I'm trying to achieve is to remove pp fields from struct page,
and make network code use struct netmem_desc { pp fields; } instead of
sturc page for that purpose.

The reason why I union'ed it with the existing pp fields in struct
net_iov *temporarily* for now is, to fade out the existing pp fields
from struct net_iov so as to make the final form like:

	strcut net_iov {
		struct netmem_desc desc;

		net_iov_specific_variable_1;
		net_iov_specific_variable_2;
		...
	};

> netmem_ref exists to clearly indicate that memory may not be readable.
> Majority of memory we expect to allocate from page pool must be
> kernel-readable. What's the plan for reading the "single pointer"
> memory within the kernel?
> 
> I think you're approaching this problem from the easiest and least

No, I've never looked for the easiest way.  My bad if there are a better
way to achieve it.  What would you recommend?

> relevant direction. Are you coordinating with David Howells?

It's mm's project driven by Matthew Wilcox but as for page pool part,
I'm working alone.

   https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

	Byungchul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-09 17:39   ` Mina Almasry
@ 2025-06-10  1:45     ` Byungchul Park
  2025-06-11 14:30       ` Pavel Begunkov
  0 siblings, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-10  1:45 UTC (permalink / raw)
  To: Mina Almasry
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, Jun 09, 2025 at 10:39:06AM -0700, Mina Almasry wrote:
> On Sun, Jun 8, 2025 at 9:32 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > To simplify struct page, the effort to separate its own descriptor from
> > struct page is required and the work for page pool is on going.
> >
> > To achieve that, all the code should avoid directly accessing page pool
> > members of struct page.
> >
> > Access ->pp_magic through struct netmem_desc instead of directly
> > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > without header dependency issue.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > ---
> >  include/linux/mm.h   | 12 ------------
> >  include/net/netmem.h | 14 ++++++++++++++
> >  mm/page_alloc.c      |  1 +
> >  3 files changed, 15 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e51dba8398f7..f23560853447 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >   */
> >  #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >
> > -#ifdef CONFIG_PAGE_POOL
> > -static inline bool page_pool_page_is_pp(struct page *page)
> > -{
> > -       return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > -}
> > -#else
> > -static inline bool page_pool_page_is_pp(struct page *page)
> > -{
> > -       return false;
> > -}
> > -#endif
> > -
> >  #endif /* _LINUX_MM_H */
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index d84ab624b489..8f354ae7d5c3 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> >   */
> >  static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> >
> > +#ifdef CONFIG_PAGE_POOL
> > +static inline bool page_pool_page_is_pp(struct page *page)
> > +{
> > +       struct netmem_desc *desc = (struct netmem_desc *)page;
> > +
> > +       return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > +}
> > +#else
> > +static inline bool page_pool_page_is_pp(struct page *page)
> > +{
> > +       return false;
> > +}
> > +#endif
> > +
> >  /* net_iov */
> >
> >  DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 4f29e393f6af..be0752c0ac92 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -55,6 +55,7 @@
> >  #include <linux/delayacct.h>
> >  #include <linux/cacheinfo.h>
> >  #include <linux/pgalloc_tag.h>
> > +#include <net/netmem.h>
> 
> mm files starting to include netmem.h is a bit interesting. I did not
> expect/want dependencies outside of net. If anything the netmem stuff
> include linux/mm.h

That's what I also concerned.  However, now that there are no way to
check the type of memory in a general way but require to use one of pp
fields, page_pool_page_is_pp() should be served by pp code e.i. network
subsystem.

This should be changed once either 1) mm provides a general way to check
the type or 2) pp code is moved to mm code.  I think this approach
should acceptable until then.

> But I don't have a butter suggestion here and I don't see any huge
			^
			lol

	Byungchul

> problems with this off the top of my head, so
> 
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> 
> Lets see if Jakub objects though. To be fair, we did put the netmem
> private stuff in net/core/netmem_priv.h, so technically
> include/net/netmem.h should be exportable indeed.
> 
> -- 
> Thanks,
> Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 6/9] netmem: remove __netmem_get_pp()
  2025-06-09  4:32 ` [PATCH net-next 6/9] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-06-10 11:11   ` Ilias Apalodimas
  0 siblings, 0 replies; 40+ messages in thread
From: Ilias Apalodimas @ 2025-06-10 11:11 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, 9 Jun 2025 at 07:32, Byungchul Park <byungchul@sk.com> wrote:
>
> There are no users of __netmem_get_pp().  Remove it.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> ---

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>  include/net/netmem.h | 16 ----------------
>  1 file changed, 16 deletions(-)
>
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 65bb87835664..d4066fcb1fee 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -234,22 +234,6 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
>         return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
>  }
>
> -/**
> - * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
> - * @netmem: netmem reference to get the pointer from
> - *
> - * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
> - * e.g. when it's a header buffer, performs faster and generates smaller
> - * object code (avoids clearing the LSB). When @netmem points to IOV,
> - * provokes invalid memory access.
> - *
> - * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
> - */
> -static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
> -{
> -       return __netmem_to_page(netmem)->pp;
> -}
> -
>  static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
>  {
>         return __netmem_clear_lsb(netmem)->pp;
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma()
  2025-06-09  4:32 ` [PATCH net-next 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
@ 2025-06-10 11:11   ` Ilias Apalodimas
  0 siblings, 0 replies; 40+ messages in thread
From: Ilias Apalodimas @ 2025-06-10 11:11 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, 9 Jun 2025 at 07:32, Byungchul Park <byungchul@sk.com> wrote:
>
> Now that __page_pool_release_page_dma() is for releasing netmem, not
> struct page, rename it to __page_pool_release_netmem_dma() to reflect
> what it does.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> ---

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>  net/core/page_pool.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 460d11a31fbc..8d44d1abfaef 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -673,8 +673,8 @@ void page_pool_clear_pp_info(netmem_ref netmem)
>         netmem_set_pp(netmem, NULL);
>  }
>
> -static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
> -                                                        netmem_ref netmem)
> +static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
> +                                                          netmem_ref netmem)
>  {
>         struct page *old, *page = netmem_to_page(netmem);
>         unsigned long id;
> @@ -721,7 +721,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
>         if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
>                 put = pool->mp_ops->release_netmem(pool, netmem);
>         else
> -               __page_pool_release_page_dma(pool, netmem);
> +               __page_pool_release_netmem_dma(pool, netmem);
>
>         /* This may be the last page returned, releasing the pool, so
>          * it is not safe to reference pool afterwards.
> @@ -1139,7 +1139,7 @@ static void page_pool_scrub(struct page_pool *pool)
>                 }
>
>                 xa_for_each(&pool->dma_mapped, id, ptr)
> -                       __page_pool_release_page_dma(pool, page_to_netmem(ptr));
> +                       __page_pool_release_netmem_dma(pool, page_to_netmem((struct page *)ptr));
>         }
>
>         /* No more consumers should exist, but producers could still
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem()
  2025-06-09  4:32 ` [PATCH net-next 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
@ 2025-06-10 11:15   ` Ilias Apalodimas
  0 siblings, 0 replies; 40+ messages in thread
From: Ilias Apalodimas @ 2025-06-10 11:15 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, 9 Jun 2025 at 07:32, Byungchul Park <byungchul@sk.com> wrote:
>
> Now that page_pool_return_page() is for returning netmem, not struct
> page, rename it to page_pool_return_netmem() to reflect what it does.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> ---

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>  net/core/page_pool.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 4011eb305cee..460d11a31fbc 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -371,7 +371,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
>  }
>  EXPORT_SYMBOL(page_pool_create);
>
> -static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
> +static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem);
>
>  static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
>  {
> @@ -409,7 +409,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
>                          * (2) break out to fallthrough to alloc_pages_node.
>                          * This limit stress on page buddy alloactor.
>                          */
> -                       page_pool_return_page(pool, netmem);
> +                       page_pool_return_netmem(pool, netmem);
>                         alloc_stat_inc(pool, waive);
>                         netmem = 0;
>                         break;
> @@ -712,7 +712,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
>   * a regular page (that will eventually be returned to the normal
>   * page-allocator via put_page).
>   */
> -void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
> +static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
>  {
>         int count;
>         bool put;
> @@ -829,7 +829,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
>          * will be invoking put_page.
>          */
>         recycle_stat_inc(pool, released_refcnt);
> -       page_pool_return_page(pool, netmem);
> +       page_pool_return_netmem(pool, netmem);
>
>         return 0;
>  }
> @@ -872,7 +872,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>         if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
>                 /* Cache full, fallback to free pages */
>                 recycle_stat_inc(pool, ring_full);
> -               page_pool_return_page(pool, netmem);
> +               page_pool_return_netmem(pool, netmem);
>         }
>  }
>  EXPORT_SYMBOL(page_pool_put_unrefed_netmem);
> @@ -915,7 +915,7 @@ static void page_pool_recycle_ring_bulk(struct page_pool *pool,
>          * since put_page() with refcnt == 1 can be an expensive operation.
>          */
>         for (; i < bulk_len; i++)
> -               page_pool_return_page(pool, bulk[i]);
> +               page_pool_return_netmem(pool, bulk[i]);
>  }
>
>  /**
> @@ -998,7 +998,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
>                 return netmem;
>         }
>
> -       page_pool_return_page(pool, netmem);
> +       page_pool_return_netmem(pool, netmem);
>         return 0;
>  }
>
> @@ -1012,7 +1012,7 @@ static void page_pool_free_frag(struct page_pool *pool)
>         if (!netmem || page_pool_unref_netmem(netmem, drain_count))
>                 return;
>
> -       page_pool_return_page(pool, netmem);
> +       page_pool_return_netmem(pool, netmem);
>  }
>
>  netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
> @@ -1079,7 +1079,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
>                         pr_crit("%s() page_pool refcnt %d violation\n",
>                                 __func__, netmem_ref_count(netmem));
>
> -               page_pool_return_page(pool, netmem);
> +               page_pool_return_netmem(pool, netmem);
>         }
>  }
>
> @@ -1112,7 +1112,7 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
>          */
>         while (pool->alloc.count) {
>                 netmem = pool->alloc.cache[--pool->alloc.count];
> -               page_pool_return_page(pool, netmem);
> +               page_pool_return_netmem(pool, netmem);
>         }
>  }
>
> @@ -1252,7 +1252,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
>         /* Flush pool alloc cache, as refill will check NUMA node */
>         while (pool->alloc.count) {
>                 netmem = pool->alloc.cache[--pool->alloc.count];
> -               page_pool_return_page(pool, netmem);
> +               page_pool_return_netmem(pool, netmem);
>         }
>  }
>  EXPORT_SYMBOL(page_pool_update_nid);
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/9] Split netmem from struct page
  2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
                   ` (9 preceding siblings ...)
  2025-06-09  6:49 ` [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
@ 2025-06-11 14:25 ` Pavel Begunkov
  2025-06-11 20:48   ` Mina Almasry
  10 siblings, 1 reply; 40+ messages in thread
From: Pavel Begunkov @ 2025-06-11 14:25 UTC (permalink / raw)
  To: Byungchul Park, willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, toke, tariqt, edumazet, pabeni, saeedm, leon, ast,
	daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, horms, linux-rdma, bpf, vishal.moola

On 6/9/25 05:32, Byungchul Park wrote:
> Hi all,
> 
> In this version, I'm posting non-controversial patches first.  I will
> post the rest more carefully later.  In this version, no update has been
> applied except excluding some patches from the previous version.  See
> the changes below.

fwiw, I tried it with net_iov (zcrx), it didn't blow up during a
short test.

-- 
Pavel Begunkov



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-10  1:45     ` Byungchul Park
@ 2025-06-11 14:30       ` Pavel Begunkov
  2025-06-12  0:39         ` Byungchul Park
  2025-06-17  2:37         ` Harry Yoo
  0 siblings, 2 replies; 40+ messages in thread
From: Pavel Begunkov @ 2025-06-11 14:30 UTC (permalink / raw)
  To: Byungchul Park, Mina Almasry
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, toke, tariqt, edumazet, pabeni, saeedm, leon, ast,
	daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, horms, linux-rdma, bpf, vishal.moola

On 6/10/25 02:45, Byungchul Park wrote:
> On Mon, Jun 09, 2025 at 10:39:06AM -0700, Mina Almasry wrote:
>> On Sun, Jun 8, 2025 at 9:32 PM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> To simplify struct page, the effort to separate its own descriptor from
>>> struct page is required and the work for page pool is on going.
>>>
>>> To achieve that, all the code should avoid directly accessing page pool
>>> members of struct page.
>>>
>>> Access ->pp_magic through struct netmem_desc instead of directly
>>> accessing it through struct page in page_pool_page_is_pp().  Plus, move
>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>> without header dependency issue.
>>>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>   include/linux/mm.h   | 12 ------------
>>>   include/net/netmem.h | 14 ++++++++++++++
>>>   mm/page_alloc.c      |  1 +
>>>   3 files changed, 15 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index e51dba8398f7..f23560853447 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>    */
>>>   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>
>>> -#ifdef CONFIG_PAGE_POOL
>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>> -{
>>> -       return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> -}
>>> -#else
>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>> -{
>>> -       return false;
>>> -}
>>> -#endif
>>> -
>>>   #endif /* _LINUX_MM_H */
>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>> index d84ab624b489..8f354ae7d5c3 100644
>>> --- a/include/net/netmem.h
>>> +++ b/include/net/netmem.h
>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>    */
>>>   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>
>>> +#ifdef CONFIG_PAGE_POOL
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +       struct netmem_desc *desc = (struct netmem_desc *)page;
>>> +
>>> +       return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> +}
>>> +#else
>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>> +{
>>> +       return false;
>>> +}
>>> +#endif
>>> +
>>>   /* net_iov */
>>>
>>>   DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 4f29e393f6af..be0752c0ac92 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -55,6 +55,7 @@
>>>   #include <linux/delayacct.h>
>>>   #include <linux/cacheinfo.h>
>>>   #include <linux/pgalloc_tag.h>
>>> +#include <net/netmem.h>
>>
>> mm files starting to include netmem.h is a bit interesting. I did not
>> expect/want dependencies outside of net. If anything the netmem stuff
>> include linux/mm.h
> 
> That's what I also concerned.  However, now that there are no way to
> check the type of memory in a general way but require to use one of pp
> fields, page_pool_page_is_pp() should be served by pp code e.i. network
> subsystem.
> 
> This should be changed once either 1) mm provides a general way to check
> the type or 2) pp code is moved to mm code.  I think this approach
> should acceptable until then.

I'd argue in the end the helper should be in mm.h as mm is going to
dictate how to check the type and keep them enumerated.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>

-- 
Pavel Begunkov



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/9] Split netmem from struct page
  2025-06-11 14:25 ` Pavel Begunkov
@ 2025-06-11 20:48   ` Mina Almasry
  2025-06-12  0:40     ` Byungchul Park
  2025-06-16  7:45     ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 40+ messages in thread
From: Mina Almasry @ 2025-06-11 20:48 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Wed, Jun 11, 2025 at 7:24 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 6/9/25 05:32, Byungchul Park wrote:
> > Hi all,
> >
> > In this version, I'm posting non-controversial patches first.  I will
> > post the rest more carefully later.  In this version, no update has been
> > applied except excluding some patches from the previous version.  See
> > the changes below.
>
> fwiw, I tried it with net_iov (zcrx), it didn't blow up during a
> short test.
>

FWIW, I ran my devmem TCP tests, and pp benchmark regression tests.
Both look good to me. For the pp benchmark:

Before:

Fast path results:
no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.337 ns

ptr_ring results:
no-softirq-page_pool02 Per elem: 529 cycles(tsc) 196.073 ns

slow path results:
no-softirq-page_pool03 Per elem: 554 cycles(tsc) 205.195 ns

After:

Fast path results:
no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.401 ns

ptr_ring results:
no-softirq-page_pool02 Per elem: 530 cycles(tsc) 196.443 ns

slow path results:
no-softirq-page_pool03 Per elem: 551 cycles(tsc) 204.287 ns



-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-09 19:32   ` Jakub Kicinski
  2025-06-10  1:30     ` Byungchul Park
@ 2025-06-11 20:53     ` Mina Almasry
  2025-06-12  1:58       ` Jakub Kicinski
  1 sibling, 1 reply; 40+ messages in thread
From: Mina Almasry @ 2025-06-11 20:53 UTC (permalink / raw)
  To: Jakub Kicinski, David Howells
  Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
	kernel_team, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola

On Mon, Jun 9, 2025 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  9 Jun 2025 13:32:17 +0900 Byungchul Park wrote:
> > To simplify struct page, the page pool members of struct page should be
> > moved to other, allowing these members to be removed from struct page.
> >
> > Introduce a network memory descriptor to store the members, struct
> > netmem_desc, and make it union'ed with the existing fields in struct
> > net_iov, allowing to organize the fields of struct net_iov.
>
> What's the intended relation between the types?
>
> netmem_ref exists to clearly indicate that memory may not be readable.
> Majority of memory we expect to allocate from page pool must be
> kernel-readable. What's the plan for reading the "single pointer"
> memory within the kernel?
>
> I think you're approaching this problem from the easiest and least
> relevant direction. Are you coordinating with David Howells?

FWIW I did point David to this work in a tangentially related thread:

https://lore.kernel.org/netdev/CAHS8izMMU8QZrvXRiDjqwsBg_34s+dhvSyrU7XGMBuPF6eWyTA@mail.gmail.com/

I think yes it would be good to get a reviewed-by or acked-by from
Matthew or David to show that this approach is in line with their
plans?

From my end I tried to review to:

1. Make sure the changes are compatible with net_iov/netmem_ref.
2. Make sure what's implemented here is in line with the memdesc
effort Matthew lists here[1]. In particular the netmem_desc struct
introduced here is very similar to the zpdesc and ptdesc structs
mentioned as an example in [1].

[1] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-11 14:30       ` Pavel Begunkov
@ 2025-06-12  0:39         ` Byungchul Park
  2025-06-17  2:37         ` Harry Yoo
  1 sibling, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-12  0:39 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
	kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Wed, Jun 11, 2025 at 03:30:28PM +0100, Pavel Begunkov wrote:
> On 6/10/25 02:45, Byungchul Park wrote:
> > On Mon, Jun 09, 2025 at 10:39:06AM -0700, Mina Almasry wrote:
> > > On Sun, Jun 8, 2025 at 9:32 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > 
> > > > To simplify struct page, the effort to separate its own descriptor from
> > > > struct page is required and the work for page pool is on going.
> > > > 
> > > > To achieve that, all the code should avoid directly accessing page pool
> > > > members of struct page.
> > > > 
> > > > Access ->pp_magic through struct netmem_desc instead of directly
> > > > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > > > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > > > without header dependency issue.
> > > > 
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > ---
> > > >   include/linux/mm.h   | 12 ------------
> > > >   include/net/netmem.h | 14 ++++++++++++++
> > > >   mm/page_alloc.c      |  1 +
> > > >   3 files changed, 15 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > index e51dba8398f7..f23560853447 100644
> > > > --- a/include/linux/mm.h
> > > > +++ b/include/linux/mm.h
> > > > @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > >    */
> > > >   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > 
> > > > -#ifdef CONFIG_PAGE_POOL
> > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > -{
> > > > -       return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > -}
> > > > -#else
> > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > -{
> > > > -       return false;
> > > > -}
> > > > -#endif
> > > > -
> > > >   #endif /* _LINUX_MM_H */
> > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > index d84ab624b489..8f354ae7d5c3 100644
> > > > --- a/include/net/netmem.h
> > > > +++ b/include/net/netmem.h
> > > > @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > > >    */
> > > >   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> > > > 
> > > > +#ifdef CONFIG_PAGE_POOL
> > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > +{
> > > > +       struct netmem_desc *desc = (struct netmem_desc *)page;
> > > > +
> > > > +       return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > +}
> > > > +#else
> > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > +{
> > > > +       return false;
> > > > +}
> > > > +#endif
> > > > +
> > > >   /* net_iov */
> > > > 
> > > >   DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 4f29e393f6af..be0752c0ac92 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -55,6 +55,7 @@
> > > >   #include <linux/delayacct.h>
> > > >   #include <linux/cacheinfo.h>
> > > >   #include <linux/pgalloc_tag.h>
> > > > +#include <net/netmem.h>
> > > 
> > > mm files starting to include netmem.h is a bit interesting. I did not
> > > expect/want dependencies outside of net. If anything the netmem stuff
> > > include linux/mm.h
> > 
> > That's what I also concerned.  However, now that there are no way to
> > check the type of memory in a general way but require to use one of pp
> > fields, page_pool_page_is_pp() should be served by pp code e.i. network
> > subsystem.
> > 
> > This should be changed once either 1) mm provides a general way to check
> > the type or 2) pp code is moved to mm code.  I think this approach
> > should acceptable until then.
> 
> I'd argue in the end the helper should be in mm.h as mm is going to
> dictate how to check the type and keep them enumerated.

Definitely yes, I agree.  However, mm cannot do that for now.  With the
current way, it should be served by pp code since the type checking is
done through the pp fields.

	Byungchul
> 
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> 
> -- 
> Pavel Begunkov


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/9] Split netmem from struct page
  2025-06-11 20:48   ` Mina Almasry
@ 2025-06-12  0:40     ` Byungchul Park
  2025-06-16  7:45     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-12  0:40 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Pavel Begunkov, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Wed, Jun 11, 2025 at 01:48:36PM -0700, Mina Almasry wrote:
> On Wed, Jun 11, 2025 at 7:24 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >
> > On 6/9/25 05:32, Byungchul Park wrote:
> > > Hi all,
> > >
> > > In this version, I'm posting non-controversial patches first.  I will
> > > post the rest more carefully later.  In this version, no update has been
> > > applied except excluding some patches from the previous version.  See
> > > the changes below.
> >
> > fwiw, I tried it with net_iov (zcrx), it didn't blow up during a
> > short test.
> >
> 
> FWIW, I ran my devmem TCP tests, and pp benchmark regression tests.
> Both look good to me. For the pp benchmark:

Thanks for the test.

	Byungchul
> 
> Before:
> 
> Fast path results:
> no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.337 ns
> 
> ptr_ring results:
> no-softirq-page_pool02 Per elem: 529 cycles(tsc) 196.073 ns
> 
> slow path results:
> no-softirq-page_pool03 Per elem: 554 cycles(tsc) 205.195 ns
> 
> After:
> 
> Fast path results:
> no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.401 ns
> 
> ptr_ring results:
> no-softirq-page_pool02 Per elem: 530 cycles(tsc) 196.443 ns
> 
> slow path results:
> no-softirq-page_pool03 Per elem: 551 cycles(tsc) 204.287 ns
> 
> 
> 
> -- 
> Thanks,
> Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-10  1:30     ` Byungchul Park
@ 2025-06-12  1:55       ` Jakub Kicinski
  2025-06-13  1:13         ` Byungchul Park
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2025-06-12  1:55 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > What's the intended relation between the types?  
> 
> One thing I'm trying to achieve is to remove pp fields from struct page,
> and make network code use struct netmem_desc { pp fields; } instead of
> sturc page for that purpose.
> 
> The reason why I union'ed it with the existing pp fields in struct
> net_iov *temporarily* for now is, to fade out the existing pp fields
> from struct net_iov so as to make the final form like:

I see, I may have mixed up the complaints there. I thought the effort
was also about removing the need for the ref count. And Rx is
relatively light on use of ref counting. 

> > netmem_ref exists to clearly indicate that memory may not be readable.
> > Majority of memory we expect to allocate from page pool must be
> > kernel-readable. What's the plan for reading the "single pointer"
> > memory within the kernel?
> > 
> > I think you're approaching this problem from the easiest and least  
> 
> No, I've never looked for the easiest way.  My bad if there are a better
> way to achieve it.  What would you recommend?

Sorry, I don't mean that the approach you took is the easiest way out.
I meant that between Rx and Tx handling Rx is the easier part because 
we already have the suitable abstraction. It's true that we use more
fields in page struct on Rx, but I thought Tx is also more urgent
as there are open reports for networking taking references on slab
pages.

In any case, please make sure you maintain clear separation between
readable and unreadable memory in the code you produce.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-11 20:53     ` Mina Almasry
@ 2025-06-12  1:58       ` Jakub Kicinski
  0 siblings, 0 replies; 40+ messages in thread
From: Jakub Kicinski @ 2025-06-12  1:58 UTC (permalink / raw)
  To: Mina Almasry
  Cc: David Howells, Byungchul Park, willy, netdev, linux-kernel,
	linux-mm, kernel_team, ilias.apalodimas, harry.yoo, hawk, akpm,
	davem, john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola

On Wed, 11 Jun 2025 13:53:51 -0700 Mina Almasry wrote:
> I think yes it would be good to get a reviewed-by or acked-by from
> Matthew or David to show that this approach is in line with their
> plans?

Yes, most definitely!


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-12  1:55       ` Jakub Kicinski
@ 2025-06-13  1:13         ` Byungchul Park
  2025-06-14  2:19           ` Mina Almasry
  0 siblings, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-13  1:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
> On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > > What's the intended relation between the types?  
> > 
> > One thing I'm trying to achieve is to remove pp fields from struct page,
> > and make network code use struct netmem_desc { pp fields; } instead of
> > sturc page for that purpose.
> > 
> > The reason why I union'ed it with the existing pp fields in struct
> > net_iov *temporarily* for now is, to fade out the existing pp fields
> > from struct net_iov so as to make the final form like:
> 
> I see, I may have mixed up the complaints there. I thought the effort
> was also about removing the need for the ref count. And Rx is
> relatively light on use of ref counting. 
> 
> > > netmem_ref exists to clearly indicate that memory may not be readable.
> > > Majority of memory we expect to allocate from page pool must be
> > > kernel-readable. What's the plan for reading the "single pointer"
> > > memory within the kernel?
> > > 
> > > I think you're approaching this problem from the easiest and least  
> > 
> > No, I've never looked for the easiest way.  My bad if there are a better
> > way to achieve it.  What would you recommend?
> 
> Sorry, I don't mean that the approach you took is the easiest way out.
> I meant that between Rx and Tx handling Rx is the easier part because 
> we already have the suitable abstraction. It's true that we use more
> fields in page struct on Rx, but I thought Tx is also more urgent
> as there are open reports for networking taking references on slab
> pages.
> 
> In any case, please make sure you maintain clear separation between
> readable and unreadable memory in the code you produce.

Do you mean the current patches do not?  If yes, please point out one
as example, which would be helpful to extract action items.

If no, are there things I should do further for this series including
non-controversial patches only?

	Byungchul


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-13  1:13         ` Byungchul Park
@ 2025-06-14  2:19           ` Mina Almasry
  2025-06-17  2:31             ` netmem series needs some love and Acks from MM folks Harry Yoo
  2025-06-20  4:12             ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page Byungchul Park
  0 siblings, 2 replies; 40+ messages in thread
From: Mina Almasry @ 2025-06-14  2:19 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Jakub Kicinski, willy, netdev, linux-kernel, linux-mm,
	kernel_team, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola

On Thu, Jun 12, 2025 at 6:13 PM Byungchul Park <byungchul@sk.com> wrote:
>
> On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
> > On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > > > What's the intended relation between the types?
> > >
> > > One thing I'm trying to achieve is to remove pp fields from struct page,
> > > and make network code use struct netmem_desc { pp fields; } instead of
> > > sturc page for that purpose.
> > >
> > > The reason why I union'ed it with the existing pp fields in struct
> > > net_iov *temporarily* for now is, to fade out the existing pp fields
> > > from struct net_iov so as to make the final form like:
> >
> > I see, I may have mixed up the complaints there. I thought the effort
> > was also about removing the need for the ref count. And Rx is
> > relatively light on use of ref counting.
> >
> > > > netmem_ref exists to clearly indicate that memory may not be readable.
> > > > Majority of memory we expect to allocate from page pool must be
> > > > kernel-readable. What's the plan for reading the "single pointer"
> > > > memory within the kernel?
> > > >
> > > > I think you're approaching this problem from the easiest and least
> > >
> > > No, I've never looked for the easiest way.  My bad if there are a better
> > > way to achieve it.  What would you recommend?
> >
> > Sorry, I don't mean that the approach you took is the easiest way out.
> > I meant that between Rx and Tx handling Rx is the easier part because
> > we already have the suitable abstraction. It's true that we use more
> > fields in page struct on Rx, but I thought Tx is also more urgent
> > as there are open reports for networking taking references on slab
> > pages.
> >
> > In any case, please make sure you maintain clear separation between
> > readable and unreadable memory in the code you produce.
>
> Do you mean the current patches do not?  If yes, please point out one
> as example, which would be helpful to extract action items.
>

I think one thing we could do to improve separation between readable
(pages/netmem_desc) and unreadable (net_iov) is to remove the struct
netmem_desc field inside the net_iov, and instead just duplicate the
pp/pp_ref_count/etc fields. The current code gives off the impression
that net_iov may be a container of netmem_desc which is not really
accurate.

But I don't think that's a major blocker. I think maybe the real issue
is that there are no reviews from any mm maintainers? So I'm not 100%
sure this is in line with their memdesc plans. I think probably
patches 2->8 are generic netmem-ifications that are good to merge
anyway, but I would say patch 1 and 9 need a reviewed by from someone
on the mm side. Just my 2 cents.

Btw, this series has been marked as changes requested on patchwork, so
it is in need of a respin one way or another:

https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=byungchul&state=*&q=&archive=&delegate=

https://docs.kernel.org/process/maintainer-netdev.html#patch-status

-- 
Thanks,
Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/9] Split netmem from struct page
  2025-06-11 20:48   ` Mina Almasry
  2025-06-12  0:40     ` Byungchul Park
@ 2025-06-16  7:45     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 40+ messages in thread
From: Jesper Dangaard Brouer @ 2025-06-16  7:45 UTC (permalink / raw)
  To: Mina Almasry, Pavel Begunkov
  Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
	kernel_team, kuba, ilias.apalodimas, harry.yoo, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola



On 11/06/2025 22.48, Mina Almasry wrote:
> On Wed, Jun 11, 2025 at 7:24 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 6/9/25 05:32, Byungchul Park wrote:
>>> Hi all,
>>>
>>> In this version, I'm posting non-controversial patches first.  I will
>>> post the rest more carefully later.  In this version, no update has been
>>> applied except excluding some patches from the previous version.  See
>>> the changes below.
>>
>> fwiw, I tried it with net_iov (zcrx), it didn't blow up during a
>> short test.
>>
> 
> FWIW, I ran my devmem TCP tests, and pp benchmark regression tests.
> Both look good to me. For the pp benchmark:

Thanks for verifying micro-benchmarks wasn't affected by this change.

> Before:
> 
> Fast path results:
> no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.337 ns
> 
> ptr_ring results:
> no-softirq-page_pool02 Per elem: 529 cycles(tsc) 196.073 ns

I'm surprised that ptr_ring return is this expensive on your system Mina.
I'll reply on the [v4] benchmark import patch, as this needs some more 
investigations.

[v4] 
https://lore.kernel.org/all/20250615205914.835368-1-almasrymina@google.com/

> slow path results:
> no-softirq-page_pool03 Per elem: 554 cycles(tsc) 205.195 ns
> 
> After:
> 
> Fast path results:
> no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.401 ns
> 
> ptr_ring results:
> no-softirq-page_pool02 Per elem: 530 cycles(tsc) 196.443 ns
> 
> slow path results:
> no-softirq-page_pool03 Per elem: 551 cycles(tsc) 204.287 ns
> 
> 
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
  2025-06-09 19:32   ` Jakub Kicinski
@ 2025-06-17  2:20   ` Harry Yoo
  2025-06-19  9:32   ` Vlastimil Babka
  2 siblings, 0 replies; 40+ messages in thread
From: Harry Yoo @ 2025-06-17  2:20 UTC (permalink / raw)
  To: Byungchul Park
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Mon, Jun 09, 2025 at 01:32:17PM +0900, Byungchul Park wrote:
> To simplify struct page, the page pool members of struct page should be
> moved to other, allowing these members to be removed from struct page.
> 
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, and make it union'ed with the existing fields in struct
> net_iov, allowing to organize the fields of struct net_iov.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---

I think one point of confusion here is that net_iov mirrors some fields
in netmem_desc, even though net_iov itself does not overlay struct page.

Presumably the reason is because net_iovs may not be associated with
specific pages, but only with DMA addresses?

The only reason why it mirrors netmem_desc fields seems to be
"page_pool doesn't want to care if netmem_ref is netmem_desc or net_iov
 when doing something with netmem_ref".

Maybe it's worth clearly documenting that net_iov does not
overlay (I mean, does not share storage with) struct page, and
why it won't be a memdesc type in the memdesc world.

Other than that, it looks good to me:
Acked-by: Harry Yoo <harry.yoo@oracle.com>

>  include/net/netmem.h | 94 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 73 insertions(+), 21 deletions(-)
> 
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 386164fb9c18..2687c8051ca5 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -12,6 +12,50 @@
>  #include <linux/mm.h>
>  #include <net/net_debug.h>
>  
> +/* These fields in struct page are used by the page_pool and net stack:
> + *
> + *        struct {
> + *                unsigned long pp_magic;
> + *                struct page_pool *pp;
> + *                unsigned long _pp_mapping_pad;
> + *                unsigned long dma_addr;
> + *                atomic_long_t pp_ref_count;
> + *        };
> + *
> + * We mirror the page_pool fields here so the page_pool can access these
> + * fields without worrying whether the underlying fields belong to a
> + * page or netmem_desc.
> + *
> + * CAUTION: Do not update the fields in netmem_desc without also
> + * updating the anonymous aliasing union in struct net_iov.
> + */
> +struct netmem_desc {
> +	unsigned long _flags;
> +	unsigned long pp_magic;
> +	struct page_pool *pp;
> +	unsigned long _pp_mapping_pad;
> +	unsigned long dma_addr;
> +	atomic_long_t pp_ref_count;
> +};
> +
> +#define NETMEM_DESC_ASSERT_OFFSET(pg, desc)        \
> +	static_assert(offsetof(struct page, pg) == \
> +		      offsetof(struct netmem_desc, desc))
> +NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
> +NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
> +NETMEM_DESC_ASSERT_OFFSET(pp, pp);
> +NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
> +NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
> +NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> +#undef NETMEM_DESC_ASSERT_OFFSET
> +
> +/*
> + * Since struct netmem_desc uses the space in struct page, the size
> + * should be checked, until struct netmem_desc has its own instance from
> + * slab, to avoid conflicting with other members within struct page.
> + */
> +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> +
>  /* net_iov */
>  
>  DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> @@ -31,12 +75,25 @@ enum net_iov_type {
>  };
>  
>  struct net_iov {
> -	enum net_iov_type type;
> -	unsigned long pp_magic;
> -	struct page_pool *pp;
> +	union {
> +		struct netmem_desc desc;
> +
> +		/* XXX: The following part should be removed once all
> +		 * the references to them are converted so as to be
> +		 * accessed via netmem_desc e.g. niov->desc.pp instead
> +		 * of niov->pp.
> +		 */
> +		struct {
> +			unsigned long _flags;
> +			unsigned long pp_magic;
> +			struct page_pool *pp;
> +			unsigned long _pp_mapping_pad;
> +			unsigned long dma_addr;
> +			atomic_long_t pp_ref_count;
> +		};
> +	};
>  	struct net_iov_area *owner;
> -	unsigned long dma_addr;
> -	atomic_long_t pp_ref_count;
> +	enum net_iov_type type;
>  };
>  
>  struct net_iov_area {
> @@ -48,27 +105,22 @@ struct net_iov_area {
>  	unsigned long base_virtual;
>  };
>  
> -/* These fields in struct page are used by the page_pool and net stack:
> +/* net_iov is union'ed with struct netmem_desc mirroring struct page, so
> + * the page_pool can access these fields without worrying whether the
> + * underlying fields are accessed via netmem_desc or directly via
> + * net_iov, until all the references to them are converted so as to be
> + * accessed via netmem_desc e.g. niov->desc.pp instead of niov->pp.
>   *
> - *        struct {
> - *                unsigned long pp_magic;
> - *                struct page_pool *pp;
> - *                unsigned long _pp_mapping_pad;
> - *                unsigned long dma_addr;
> - *                atomic_long_t pp_ref_count;
> - *        };
> - *
> - * We mirror the page_pool fields here so the page_pool can access these fields
> - * without worrying whether the underlying fields belong to a page or net_iov.
> - *
> - * The non-net stack fields of struct page are private to the mm stack and must
> - * never be mirrored to net_iov.
> + * The non-net stack fields of struct page are private to the mm stack
> + * and must never be mirrored to net_iov.
>   */
> -#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
> -	static_assert(offsetof(struct page, pg) == \
> +#define NET_IOV_ASSERT_OFFSET(desc, iov)                    \
> +	static_assert(offsetof(struct netmem_desc, desc) == \
>  		      offsetof(struct net_iov, iov))
> +NET_IOV_ASSERT_OFFSET(_flags, _flags);
>  NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
>  NET_IOV_ASSERT_OFFSET(pp, pp);
> +NET_IOV_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
>  NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
>  NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>  #undef NET_IOV_ASSERT_OFFSET
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* netmem series needs some love and Acks from MM folks
  2025-06-14  2:19           ` Mina Almasry
@ 2025-06-17  2:31             ` Harry Yoo
  2025-06-17 16:09               ` David Hildenbrand
  2025-06-20  4:12             ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page Byungchul Park
  1 sibling, 1 reply; 40+ messages in thread
From: Harry Yoo @ 2025-06-17  2:31 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Byungchul Park, Jakub Kicinski, willy, netdev, linux-kernel,
	linux-mm, kernel_team, ilias.apalodimas, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola

On Fri, Jun 13, 2025 at 07:19:07PM -0700, Mina Almasry wrote:
> On Thu, Jun 12, 2025 at 6:13 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
> > > On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > > > > What's the intended relation between the types?
> > > >
> > > > One thing I'm trying to achieve is to remove pp fields from struct page,
> > > > and make network code use struct netmem_desc { pp fields; } instead of
> > > > sturc page for that purpose.
> > > >
> > > > The reason why I union'ed it with the existing pp fields in struct
> > > > net_iov *temporarily* for now is, to fade out the existing pp fields
> > > > from struct net_iov so as to make the final form like:
> > >
> > > I see, I may have mixed up the complaints there. I thought the effort
> > > was also about removing the need for the ref count. And Rx is
> > > relatively light on use of ref counting.
> > >
> > > > > netmem_ref exists to clearly indicate that memory may not be readable.
> > > > > Majority of memory we expect to allocate from page pool must be
> > > > > kernel-readable. What's the plan for reading the "single pointer"
> > > > > memory within the kernel?
> > > > >
> > > > > I think you're approaching this problem from the easiest and least
> > > >
> > > > No, I've never looked for the easiest way.  My bad if there are a better
> > > > way to achieve it.  What would you recommend?
> > >
> > > Sorry, I don't mean that the approach you took is the easiest way out.
> > > I meant that between Rx and Tx handling Rx is the easier part because
> > > we already have the suitable abstraction. It's true that we use more
> > > fields in page struct on Rx, but I thought Tx is also more urgent
> > > as there are open reports for networking taking references on slab
> > > pages.
> > >
> > > In any case, please make sure you maintain clear separation between
> > > readable and unreadable memory in the code you produce.
> >
> > Do you mean the current patches do not?  If yes, please point out one
> > as example, which would be helpful to extract action items.
> >
> 
> I think one thing we could do to improve separation between readable
> (pages/netmem_desc) and unreadable (net_iov) is to remove the struct
> netmem_desc field inside the net_iov, and instead just duplicate the
> pp/pp_ref_count/etc fields. The current code gives off the impression
> that net_iov may be a container of netmem_desc which is not really
> accurate.
> 
> But I don't think that's a major blocker. I think maybe the real issue
> is that there are no reviews from any mm maintainers?

Let's try changing the subject to draw some attention from MM people :)

> So I'm not 100%
> sure this is in line with their memdesc plans. I think probably
> patches 2->8 are generic netmem-ifications that are good to merge
> anyway, but I would say patch 1 and 9 need a reviewed by from someone
> on the mm side. Just my 2 cents.

As someone who worked on the zpdesc series, I think it is pretty much
in line with the memdesc plans.

I mean, it does differ a bit from the initial idea of generalizing it as
"bump" allocator, but overall, it's still aligned with the memdesc
plans, and looks like a starting point, IMHO.

> Btw, this series has been marked as changes requested on patchwork, so
> it is in need of a respin one way or another:

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-11 14:30       ` Pavel Begunkov
  2025-06-12  0:39         ` Byungchul Park
@ 2025-06-17  2:37         ` Harry Yoo
  1 sibling, 0 replies; 40+ messages in thread
From: Harry Yoo @ 2025-06-17  2:37 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Byungchul Park, Mina Almasry, willy, netdev, linux-kernel,
	linux-mm, kernel_team, kuba, ilias.apalodimas, hawk, akpm, davem,
	john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On Wed, Jun 11, 2025 at 03:30:28PM +0100, Pavel Begunkov wrote:
> On 6/10/25 02:45, Byungchul Park wrote:
> > On Mon, Jun 09, 2025 at 10:39:06AM -0700, Mina Almasry wrote:
> > > On Sun, Jun 8, 2025 at 9:32 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > 
> > > > To simplify struct page, the effort to separate its own descriptor from
> > > > struct page is required and the work for page pool is on going.
> > > > 
> > > > To achieve that, all the code should avoid directly accessing page pool
> > > > members of struct page.
> > > > 
> > > > Access ->pp_magic through struct netmem_desc instead of directly
> > > > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > > > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > > > without header dependency issue.
> > > > 
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > > ---
> > > >   include/linux/mm.h   | 12 ------------
> > > >   include/net/netmem.h | 14 ++++++++++++++
> > > >   mm/page_alloc.c      |  1 +
> > > >   3 files changed, 15 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > index e51dba8398f7..f23560853447 100644
> > > > --- a/include/linux/mm.h
> > > > +++ b/include/linux/mm.h
> > > > @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > >    */
> > > >   #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > 
> > > > -#ifdef CONFIG_PAGE_POOL
> > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > -{
> > > > -       return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > -}
> > > > -#else
> > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > -{
> > > > -       return false;
> > > > -}
> > > > -#endif
> > > > -
> > > >   #endif /* _LINUX_MM_H */
> > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > index d84ab624b489..8f354ae7d5c3 100644
> > > > --- a/include/net/netmem.h
> > > > +++ b/include/net/netmem.h
> > > > @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > > >    */
> > > >   static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> > > > 
> > > > +#ifdef CONFIG_PAGE_POOL
> > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > +{
> > > > +       struct netmem_desc *desc = (struct netmem_desc *)page;
> > > > +
> > > > +       return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > +}
> > > > +#else
> > > > +static inline bool page_pool_page_is_pp(struct page *page)
> > > > +{
> > > > +       return false;
> > > > +}
> > > > +#endif
> > > > +
> > > >   /* net_iov */
> > > > 
> > > >   DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 4f29e393f6af..be0752c0ac92 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -55,6 +55,7 @@
> > > >   #include <linux/delayacct.h>
> > > >   #include <linux/cacheinfo.h>
> > > >   #include <linux/pgalloc_tag.h>
> > > > +#include <net/netmem.h>
> > > 
> > > mm files starting to include netmem.h is a bit interesting. I did not
> > > expect/want dependencies outside of net. If anything the netmem stuff
> > > include linux/mm.h
> > 
> > That's what I also concerned.  However, now that there are no way to
> > check the type of memory in a general way but require to use one of pp
> > fields, page_pool_page_is_pp() should be served by pp code e.i. network
> > subsystem.
> > 
> > This should be changed once either 1) mm provides a general way to check
> > the type or 2) pp code is moved to mm code.  I think this approach
> > should acceptable until then.
> 
> I'd argue in the end the helper should be in mm.h as mm is going to
> dictate how to check the type and keep them enumerated.
>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>

Acked-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: netmem series needs some love and Acks from MM folks
  2025-06-17  2:31             ` netmem series needs some love and Acks from MM folks Harry Yoo
@ 2025-06-17 16:09               ` David Hildenbrand
  2025-06-17 16:32                 ` Harry Yoo
  2025-06-18  0:08                 ` Byungchul Park
  0 siblings, 2 replies; 40+ messages in thread
From: David Hildenbrand @ 2025-06-17 16:09 UTC (permalink / raw)
  To: Harry Yoo, Mina Almasry, willy
  Cc: Byungchul Park, Jakub Kicinski, netdev, linux-kernel, linux-mm,
	kernel_team, ilias.apalodimas, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola

On 17.06.25 04:31, Harry Yoo wrote:
> On Fri, Jun 13, 2025 at 07:19:07PM -0700, Mina Almasry wrote:
>> On Thu, Jun 12, 2025 at 6:13 PM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
>>>> On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
>>>>>> What's the intended relation between the types?
>>>>>
>>>>> One thing I'm trying to achieve is to remove pp fields from struct page,
>>>>> and make network code use struct netmem_desc { pp fields; } instead of
>>>>> sturc page for that purpose.
>>>>>
>>>>> The reason why I union'ed it with the existing pp fields in struct
>>>>> net_iov *temporarily* for now is, to fade out the existing pp fields
>>>>> from struct net_iov so as to make the final form like:
>>>>
>>>> I see, I may have mixed up the complaints there. I thought the effort
>>>> was also about removing the need for the ref count. And Rx is
>>>> relatively light on use of ref counting.
>>>>
>>>>>> netmem_ref exists to clearly indicate that memory may not be readable.
>>>>>> Majority of memory we expect to allocate from page pool must be
>>>>>> kernel-readable. What's the plan for reading the "single pointer"
>>>>>> memory within the kernel?
>>>>>>
>>>>>> I think you're approaching this problem from the easiest and least
>>>>>
>>>>> No, I've never looked for the easiest way.  My bad if there are a better
>>>>> way to achieve it.  What would you recommend?
>>>>
>>>> Sorry, I don't mean that the approach you took is the easiest way out.
>>>> I meant that between Rx and Tx handling Rx is the easier part because
>>>> we already have the suitable abstraction. It's true that we use more
>>>> fields in page struct on Rx, but I thought Tx is also more urgent
>>>> as there are open reports for networking taking references on slab
>>>> pages.
>>>>
>>>> In any case, please make sure you maintain clear separation between
>>>> readable and unreadable memory in the code you produce.
>>>
>>> Do you mean the current patches do not?  If yes, please point out one
>>> as example, which would be helpful to extract action items.
>>>
>>
>> I think one thing we could do to improve separation between readable
>> (pages/netmem_desc) and unreadable (net_iov) is to remove the struct
>> netmem_desc field inside the net_iov, and instead just duplicate the
>> pp/pp_ref_count/etc fields. The current code gives off the impression
>> that net_iov may be a container of netmem_desc which is not really
>> accurate.
>>
>> But I don't think that's a major blocker. I think maybe the real issue
>> is that there are no reviews from any mm maintainers?
> 
> Let's try changing the subject to draw some attention from MM people :)

Hi, it worked! :P

I hope Willy will find his way to this thread as well.

> 
>> So I'm not 100%
>> sure this is in line with their memdesc plans. I think probably
>> patches 2->8 are generic netmem-ifications that are good to merge
>> anyway, but I would say patch 1 and 9 need a reviewed by from someone
>> on the mm side. Just my 2 cents.
> 
> As someone who worked on the zpdesc series, I think it is pretty much
> in line with the memdesc plans.
> 
> I mean, it does differ a bit from the initial idea of generalizing it as
> "bump" allocator, but overall, it's still aligned with the memdesc
> plans, and looks like a starting point, IMHO.

Just to summarize (not that there is any misunderstanding), the first 
step of the memdesc plan is simple:

1) have a dedicated data-structure we will allocate alter dynamically.

2) Make it overlay "struct page" for now in a way that doesn't break things

3) Convert all users of "struct page" to the new data-structure

Later, the memdesc data-structure will then actually come be allocated 
dynamically, so "struct page" content will not apply anymore, and we can 
shrink "struct page".


What I see in this patch is exactly 1) and 2).

I am not 100% sure about existing "struct net_iov" and how that 
interacts with "struct page" overlay. I suspects it's just a dynamically 
allocated structure?

Because this patch changes the layout of "struct net_iov", which is a 
bit confusing at first sight?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: netmem series needs some love and Acks from MM folks
  2025-06-17 16:09               ` David Hildenbrand
@ 2025-06-17 16:32                 ` Harry Yoo
  2025-06-18  0:08                 ` Byungchul Park
  1 sibling, 0 replies; 40+ messages in thread
From: Harry Yoo @ 2025-06-17 16:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mina Almasry, willy, Byungchul Park, Jakub Kicinski, netdev,
	linux-kernel, linux-mm, kernel_team, ilias.apalodimas, hawk, akpm,
	davem, john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola

On Tue, Jun 17, 2025 at 06:09:36PM +0200, David Hildenbrand wrote:
> On 17.06.25 04:31, Harry Yoo wrote:
> > On Fri, Jun 13, 2025 at 07:19:07PM -0700, Mina Almasry wrote:
> > > On Thu, Jun 12, 2025 at 6:13 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > 
> > > > On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
> > > > > On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > > > > > > What's the intended relation between the types?
> > > > > > 
> > > > > > One thing I'm trying to achieve is to remove pp fields from struct page,
> > > > > > and make network code use struct netmem_desc { pp fields; } instead of
> > > > > > sturc page for that purpose.
> > > > > > 
> > > > > > The reason why I union'ed it with the existing pp fields in struct
> > > > > > net_iov *temporarily* for now is, to fade out the existing pp fields
> > > > > > from struct net_iov so as to make the final form like:
> > > > > 
> > > > > I see, I may have mixed up the complaints there. I thought the effort
> > > > > was also about removing the need for the ref count. And Rx is
> > > > > relatively light on use of ref counting.
> > > > > 
> > > > > > > netmem_ref exists to clearly indicate that memory may not be readable.
> > > > > > > Majority of memory we expect to allocate from page pool must be
> > > > > > > kernel-readable. What's the plan for reading the "single pointer"
> > > > > > > memory within the kernel?
> > > > > > > 
> > > > > > > I think you're approaching this problem from the easiest and least
> > > > > > 
> > > > > > No, I've never looked for the easiest way.  My bad if there are a better
> > > > > > way to achieve it.  What would you recommend?
> > > > > 
> > > > > Sorry, I don't mean that the approach you took is the easiest way out.
> > > > > I meant that between Rx and Tx handling Rx is the easier part because
> > > > > we already have the suitable abstraction. It's true that we use more
> > > > > fields in page struct on Rx, but I thought Tx is also more urgent
> > > > > as there are open reports for networking taking references on slab
> > > > > pages.
> > > > > 
> > > > > In any case, please make sure you maintain clear separation between
> > > > > readable and unreadable memory in the code you produce.
> > > > 
> > > > Do you mean the current patches do not?  If yes, please point out one
> > > > as example, which would be helpful to extract action items.
> > > > 
> > > 
> > > I think one thing we could do to improve separation between readable
> > > (pages/netmem_desc) and unreadable (net_iov) is to remove the struct
> > > netmem_desc field inside the net_iov, and instead just duplicate the
> > > pp/pp_ref_count/etc fields. The current code gives off the impression
> > > that net_iov may be a container of netmem_desc which is not really
> > > accurate.
> > > 
> > > But I don't think that's a major blocker. I think maybe the real issue
> > > is that there are no reviews from any mm maintainers?
> > 
> > Let's try changing the subject to draw some attention from MM people :)
> 
> Hi, it worked! :P

Glad it worked :)

> > 
> > > So I'm not 100%
> > > sure this is in line with their memdesc plans. I think probably
> > > patches 2->8 are generic netmem-ifications that are good to merge
> > > anyway, but I would say patch 1 and 9 need a reviewed by from someone
> > > on the mm side. Just my 2 cents.
> > 
> > As someone who worked on the zpdesc series, I think it is pretty much
> > in line with the memdesc plans.
> > 
> > I mean, it does differ a bit from the initial idea of generalizing it as
> > "bump" allocator, but overall, it's still aligned with the memdesc
> > plans, and looks like a starting point, IMHO.
> 
> Just to summarize (not that there is any misunderstanding), the first step
> of the memdesc plan is simple:
> 
> 1) have a dedicated data-structure we will allocate alter dynamically.
> 
> 2) Make it overlay "struct page" for now in a way that doesn't break things
> 
> 3) Convert all users of "struct page" to the new data-structure
> 
> Later, the memdesc data-structure will then actually come be allocated
> dynamically, so "struct page" content will not apply anymore, and we can
> shrink "struct page".
> 
> 
> What I see in this patch is exactly 1) and 2).

I believe 3) will be accompolished with a follow-up patch series from
Byungchul. The previous series was 18 patches, but divided to two series
and this is the first half.

> I am not 100% sure about existing "struct net_iov" and how that interacts
> with "struct page" overlay. I suspects it's just a dynamically allocated
> structure?

Yeah, that's indeed confusing. As mentioned here [1], it does not
overlay struct page at all. It is dynamically allocated from slab.

IIUC net_iov mirrors netmem_desc fields because netmem_ref might be
netmem_desc (overlays struct page) or net_iov (allocated from slab), and
by mirroring netmem_desc fields in net_iov, page pool doesn't need to care
if it's net_iov or netmem_desc when acccessing the common fields shared
between them.

[1] https://lore.kernel.org/linux-mm/aFDQ9X2WsXszNJ5M@hyeyoo

> Because this patch changes the layout of "struct net_iov", which is a bit
> confusing at first sight?

Yeah, better documented.

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: netmem series needs some love and Acks from MM folks
  2025-06-17 16:09               ` David Hildenbrand
  2025-06-17 16:32                 ` Harry Yoo
@ 2025-06-18  0:08                 ` Byungchul Park
  2025-06-18  7:51                   ` Pavel Begunkov
  1 sibling, 1 reply; 40+ messages in thread
From: Byungchul Park @ 2025-06-18  0:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Harry Yoo, Mina Almasry, willy, Jakub Kicinski, netdev,
	linux-kernel, linux-mm, kernel_team, ilias.apalodimas, hawk, akpm,
	davem, john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
	bpf, vishal.moola

On Tue, Jun 17, 2025 at 06:09:36PM +0200, David Hildenbrand wrote:
> On 17.06.25 04:31, Harry Yoo wrote:
> > On Fri, Jun 13, 2025 at 07:19:07PM -0700, Mina Almasry wrote:
> > > On Thu, Jun 12, 2025 at 6:13 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > 
> > > > On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
> > > > > On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > > > > > > What's the intended relation between the types?
> > > > > > 
> > > > > > One thing I'm trying to achieve is to remove pp fields from struct page,
> > > > > > and make network code use struct netmem_desc { pp fields; } instead of
> > > > > > sturc page for that purpose.
> > > > > > 
> > > > > > The reason why I union'ed it with the existing pp fields in struct
> > > > > > net_iov *temporarily* for now is, to fade out the existing pp fields
> > > > > > from struct net_iov so as to make the final form like:
> > > > > 
> > > > > I see, I may have mixed up the complaints there. I thought the effort
> > > > > was also about removing the need for the ref count. And Rx is
> > > > > relatively light on use of ref counting.
> > > > > 
> > > > > > > netmem_ref exists to clearly indicate that memory may not be readable.
> > > > > > > Majority of memory we expect to allocate from page pool must be
> > > > > > > kernel-readable. What's the plan for reading the "single pointer"
> > > > > > > memory within the kernel?
> > > > > > > 
> > > > > > > I think you're approaching this problem from the easiest and least
> > > > > > 
> > > > > > No, I've never looked for the easiest way.  My bad if there are a better
> > > > > > way to achieve it.  What would you recommend?
> > > > > 
> > > > > Sorry, I don't mean that the approach you took is the easiest way out.
> > > > > I meant that between Rx and Tx handling Rx is the easier part because
> > > > > we already have the suitable abstraction. It's true that we use more
> > > > > fields in page struct on Rx, but I thought Tx is also more urgent
> > > > > as there are open reports for networking taking references on slab
> > > > > pages.
> > > > > 
> > > > > In any case, please make sure you maintain clear separation between
> > > > > readable and unreadable memory in the code you produce.
> > > > 
> > > > Do you mean the current patches do not?  If yes, please point out one
> > > > as example, which would be helpful to extract action items.
> > > > 
> > > 
> > > I think one thing we could do to improve separation between readable
> > > (pages/netmem_desc) and unreadable (net_iov) is to remove the struct
> > > netmem_desc field inside the net_iov, and instead just duplicate the
> > > pp/pp_ref_count/etc fields. The current code gives off the impression
> > > that net_iov may be a container of netmem_desc which is not really
> > > accurate.
> > > 
> > > But I don't think that's a major blocker. I think maybe the real issue
> > > is that there are no reviews from any mm maintainers?
> > 
> > Let's try changing the subject to draw some attention from MM people :)
> 
> Hi, it worked! :P
> 
> I hope Willy will find his way to this thread as well.
> 
> > 
> > > So I'm not 100%
> > > sure this is in line with their memdesc plans. I think probably
> > > patches 2->8 are generic netmem-ifications that are good to merge
> > > anyway, but I would say patch 1 and 9 need a reviewed by from someone
> > > on the mm side. Just my 2 cents.
> > 
> > As someone who worked on the zpdesc series, I think it is pretty much
> > in line with the memdesc plans.
> > 
> > I mean, it does differ a bit from the initial idea of generalizing it as
> > "bump" allocator, but overall, it's still aligned with the memdesc
> > plans, and looks like a starting point, IMHO.
> 
> Just to summarize (not that there is any misunderstanding), the first
> step of the memdesc plan is simple:
> 
> 1) have a dedicated data-structure we will allocate alter dynamically.
> 
> 2) Make it overlay "struct page" for now in a way that doesn't break things
> 
> 3) Convert all users of "struct page" to the new data-structure
> 
> Later, the memdesc data-structure will then actually come be allocated
> dynamically, so "struct page" content will not apply anymore, and we can
> shrink "struct page".
> 
> 
> What I see in this patch is exactly 1) and 2).
> 
> I am not 100% sure about existing "struct net_iov" and how that
> interacts with "struct page" overlay. I suspects it's just a dynamically
> allocated structure?
> 
> Because this patch changes the layout of "struct net_iov", which is a
> bit confusing at first sight?

The changes of the layout was asked by network folks, that was to split
the struct net_iov fields to two, netmem_desc and net_iov specific ones.

How to organize struct net_iov further is up to the network folks, but
I believe the current layout should be the first step.

	Byungchul

> 
> --
> Cheers,
> 
> David / dhildenb


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: netmem series needs some love and Acks from MM folks
  2025-06-18  0:08                 ` Byungchul Park
@ 2025-06-18  7:51                   ` Pavel Begunkov
  0 siblings, 0 replies; 40+ messages in thread
From: Pavel Begunkov @ 2025-06-18  7:51 UTC (permalink / raw)
  To: Byungchul Park, David Hildenbrand
  Cc: Harry Yoo, Mina Almasry, willy, Jakub Kicinski, netdev,
	linux-kernel, linux-mm, kernel_team, ilias.apalodimas, hawk, akpm,
	davem, john.fastabend, andrew+netdev, toke, tariqt, edumazet,
	pabeni, saeedm, leon, ast, daniel, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
	vishal.moola

On 6/18/25 01:08, Byungchul Park wrote:
> On Tue, Jun 17, 2025 at 06:09:36PM +0200, David Hildenbrand wrote:
...>> 1) have a dedicated data-structure we will allocate alter dynamically.
>>
>> 2) Make it overlay "struct page" for now in a way that doesn't break things
>>
>> 3) Convert all users of "struct page" to the new data-structure
>>
>> Later, the memdesc data-structure will then actually come be allocated
>> dynamically, so "struct page" content will not apply anymore, and we can
>> shrink "struct page".
>>
>>
>> What I see in this patch is exactly 1) and 2).
>>
>> I am not 100% sure about existing "struct net_iov" and how that
>> interacts with "struct page" overlay. I suspects it's just a dynamically
>> allocated structure?
>>
>> Because this patch changes the layout of "struct net_iov", which is a
>> bit confusing at first sight?
> 
> The changes of the layout was asked by network folks, that was to split
> the struct net_iov fields to two, netmem_desc and net_iov specific ones.

That's right, and the moved fields should never be touched without
checking that it's a net_iov first. All of it will eventually converge
to netmem_desc without net_iov sticking out where it shouldn't, but my
personal preference would be to get this merged, as it's a good enough
base, and do the rest on top.

> How to organize struct net_iov further is up to the network folks, but
> I believe the current layout should be the first step.
-- 
Pavel Begunkov



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
  2025-06-09 19:32   ` Jakub Kicinski
  2025-06-17  2:20   ` Harry Yoo
@ 2025-06-19  9:32   ` Vlastimil Babka
  2 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2025-06-19  9:32 UTC (permalink / raw)
  To: Byungchul Park, willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola

On 6/9/25 06:32, Byungchul Park wrote:
> To simplify struct page, the page pool members of struct page should be
> moved to other, allowing these members to be removed from struct page.
> 
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, and make it union'ed with the existing fields in struct
> net_iov, allowing to organize the fields of struct net_iov.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>

I'm not willy or David, but I believe I know enough about the memdesc plans
and as I worked on the struct slab, can confirm this goes in pretty much the
same direction, so looks good to me. As for struct net_iov I trust the
networking people know what they're doing there and it seems to make sense
to me too.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/net/netmem.h | 94 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 73 insertions(+), 21 deletions(-)
> 
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 386164fb9c18..2687c8051ca5 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -12,6 +12,50 @@
>  #include <linux/mm.h>
>  #include <net/net_debug.h>
>  
> +/* These fields in struct page are used by the page_pool and net stack:
> + *
> + *        struct {
> + *                unsigned long pp_magic;
> + *                struct page_pool *pp;
> + *                unsigned long _pp_mapping_pad;
> + *                unsigned long dma_addr;
> + *                atomic_long_t pp_ref_count;
> + *        };
> + *
> + * We mirror the page_pool fields here so the page_pool can access these
> + * fields without worrying whether the underlying fields belong to a
> + * page or netmem_desc.
> + *
> + * CAUTION: Do not update the fields in netmem_desc without also
> + * updating the anonymous aliasing union in struct net_iov.
> + */
> +struct netmem_desc {
> +	unsigned long _flags;
> +	unsigned long pp_magic;
> +	struct page_pool *pp;
> +	unsigned long _pp_mapping_pad;
> +	unsigned long dma_addr;
> +	atomic_long_t pp_ref_count;
> +};
> +
> +#define NETMEM_DESC_ASSERT_OFFSET(pg, desc)        \
> +	static_assert(offsetof(struct page, pg) == \
> +		      offsetof(struct netmem_desc, desc))
> +NETMEM_DESC_ASSERT_OFFSET(flags, _flags);
> +NETMEM_DESC_ASSERT_OFFSET(pp_magic, pp_magic);
> +NETMEM_DESC_ASSERT_OFFSET(pp, pp);
> +NETMEM_DESC_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
> +NETMEM_DESC_ASSERT_OFFSET(dma_addr, dma_addr);
> +NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> +#undef NETMEM_DESC_ASSERT_OFFSET
> +
> +/*
> + * Since struct netmem_desc uses the space in struct page, the size
> + * should be checked, until struct netmem_desc has its own instance from
> + * slab, to avoid conflicting with other members within struct page.
> + */
> +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> +
>  /* net_iov */
>  
>  DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> @@ -31,12 +75,25 @@ enum net_iov_type {
>  };
>  
>  struct net_iov {
> -	enum net_iov_type type;
> -	unsigned long pp_magic;
> -	struct page_pool *pp;
> +	union {
> +		struct netmem_desc desc;
> +
> +		/* XXX: The following part should be removed once all
> +		 * the references to them are converted so as to be
> +		 * accessed via netmem_desc e.g. niov->desc.pp instead
> +		 * of niov->pp.
> +		 */
> +		struct {
> +			unsigned long _flags;
> +			unsigned long pp_magic;
> +			struct page_pool *pp;
> +			unsigned long _pp_mapping_pad;
> +			unsigned long dma_addr;
> +			atomic_long_t pp_ref_count;
> +		};
> +	};
>  	struct net_iov_area *owner;
> -	unsigned long dma_addr;
> -	atomic_long_t pp_ref_count;
> +	enum net_iov_type type;
>  };
>  
>  struct net_iov_area {
> @@ -48,27 +105,22 @@ struct net_iov_area {
>  	unsigned long base_virtual;
>  };
>  
> -/* These fields in struct page are used by the page_pool and net stack:
> +/* net_iov is union'ed with struct netmem_desc mirroring struct page, so
> + * the page_pool can access these fields without worrying whether the
> + * underlying fields are accessed via netmem_desc or directly via
> + * net_iov, until all the references to them are converted so as to be
> + * accessed via netmem_desc e.g. niov->desc.pp instead of niov->pp.
>   *
> - *        struct {
> - *                unsigned long pp_magic;
> - *                struct page_pool *pp;
> - *                unsigned long _pp_mapping_pad;
> - *                unsigned long dma_addr;
> - *                atomic_long_t pp_ref_count;
> - *        };
> - *
> - * We mirror the page_pool fields here so the page_pool can access these fields
> - * without worrying whether the underlying fields belong to a page or net_iov.
> - *
> - * The non-net stack fields of struct page are private to the mm stack and must
> - * never be mirrored to net_iov.
> + * The non-net stack fields of struct page are private to the mm stack
> + * and must never be mirrored to net_iov.
>   */
> -#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
> -	static_assert(offsetof(struct page, pg) == \
> +#define NET_IOV_ASSERT_OFFSET(desc, iov)                    \
> +	static_assert(offsetof(struct netmem_desc, desc) == \
>  		      offsetof(struct net_iov, iov))
> +NET_IOV_ASSERT_OFFSET(_flags, _flags);
>  NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
>  NET_IOV_ASSERT_OFFSET(pp, pp);
> +NET_IOV_ASSERT_OFFSET(_pp_mapping_pad, _pp_mapping_pad);
>  NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
>  NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>  #undef NET_IOV_ASSERT_OFFSET



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-09  4:32 ` [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
  2025-06-09 17:39   ` Mina Almasry
@ 2025-06-19  9:55   ` Vlastimil Babka
  2025-06-19 10:42     ` Byungchul Park
  1 sibling, 1 reply; 40+ messages in thread
From: Vlastimil Babka @ 2025-06-19  9:55 UTC (permalink / raw)
  To: Byungchul Park, willy, netdev
  Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
	ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
	andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
	saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola

On 6/9/25 06:32, Byungchul Park wrote:
> To simplify struct page, the effort to separate its own descriptor from
> struct page is required and the work for page pool is on going.
> 
> To achieve that, all the code should avoid directly accessing page pool
> members of struct page.
> 
> Access ->pp_magic through struct netmem_desc instead of directly
> accessing it through struct page in page_pool_page_is_pp().  Plus, move
> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> without header dependency issue.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

I'm not that worried about the new include as it's for a .c file and not
between .h files.

> ---
>  include/linux/mm.h   | 12 ------------
>  include/net/netmem.h | 14 ++++++++++++++
>  mm/page_alloc.c      |  1 +
>  3 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e51dba8398f7..f23560853447 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>   */
>  #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)

Do you plan to move also all these PP_ constants/macros (bunch more above
this one) to a net header?

Thanks,
Vlastimil



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
  2025-06-19  9:55   ` Vlastimil Babka
@ 2025-06-19 10:42     ` Byungchul Park
  0 siblings, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-19 10:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
	almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, david,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mhocko, horms,
	linux-rdma, bpf, vishal.moola

On Thu, Jun 19, 2025 at 11:55:03AM +0200, Vlastimil Babka wrote:
> On 6/9/25 06:32, Byungchul Park wrote:
> > To simplify struct page, the effort to separate its own descriptor from
> > struct page is required and the work for page pool is on going.
> >
> > To achieve that, all the code should avoid directly accessing page pool
> > members of struct page.
> >
> > Access ->pp_magic through struct netmem_desc instead of directly
> > accessing it through struct page in page_pool_page_is_pp().  Plus, move
> > page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
> > without header dependency issue.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
> I'm not that worried about the new include as it's for a .c file and not
> between .h files.
> 
> > ---
> >  include/linux/mm.h   | 12 ------------
> >  include/net/netmem.h | 14 ++++++++++++++
> >  mm/page_alloc.c      |  1 +
> >  3 files changed, 15 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e51dba8398f7..f23560853447 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4311,16 +4311,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >   */
> >  #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> 
> Do you plan to move also all these PP_ constants/macros (bunch more above
> this one) to a net header?

No, but it should.  I will consider it too.  Thanks.

	Byungchul

> 
> Thanks,
> Vlastimil
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page
  2025-06-14  2:19           ` Mina Almasry
  2025-06-17  2:31             ` netmem series needs some love and Acks from MM folks Harry Yoo
@ 2025-06-20  4:12             ` Byungchul Park
  1 sibling, 0 replies; 40+ messages in thread
From: Byungchul Park @ 2025-06-20  4:12 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jakub Kicinski, willy, netdev, linux-kernel, linux-mm,
	kernel_team, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
	john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
	edumazet, pabeni, saeedm, leon, ast, daniel, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, linux-rdma, bpf, vishal.moola

On Fri, Jun 13, 2025 at 07:19:07PM -0700, Mina Almasry wrote:
> On Thu, Jun 12, 2025 at 6:13 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Wed, Jun 11, 2025 at 06:55:42PM -0700, Jakub Kicinski wrote:
> > > On Tue, 10 Jun 2025 10:30:01 +0900 Byungchul Park wrote:
> > > > > What's the intended relation between the types?
> > > >
> > > > One thing I'm trying to achieve is to remove pp fields from struct page,
> > > > and make network code use struct netmem_desc { pp fields; } instead of
> > > > sturc page for that purpose.
> > > >
> > > > The reason why I union'ed it with the existing pp fields in struct
> > > > net_iov *temporarily* for now is, to fade out the existing pp fields
> > > > from struct net_iov so as to make the final form like:
> > >
> > > I see, I may have mixed up the complaints there. I thought the effort
> > > was also about removing the need for the ref count. And Rx is
> > > relatively light on use of ref counting.
> > >
> > > > > netmem_ref exists to clearly indicate that memory may not be readable.
> > > > > Majority of memory we expect to allocate from page pool must be
> > > > > kernel-readable. What's the plan for reading the "single pointer"
> > > > > memory within the kernel?
> > > > >
> > > > > I think you're approaching this problem from the easiest and least
> > > >
> > > > No, I've never looked for the easiest way.  My bad if there are a better
> > > > way to achieve it.  What would you recommend?
> > >
> > > Sorry, I don't mean that the approach you took is the easiest way out.
> > > I meant that between Rx and Tx handling Rx is the easier part because
> > > we already have the suitable abstraction. It's true that we use more
> > > fields in page struct on Rx, but I thought Tx is also more urgent
> > > as there are open reports for networking taking references on slab
> > > pages.
> > >
> > > In any case, please make sure you maintain clear separation between
> > > readable and unreadable memory in the code you produce.
> >
> > Do you mean the current patches do not?  If yes, please point out one
> > as example, which would be helpful to extract action items.
> >
> 
> I think one thing we could do to improve separation between readable
> (pages/netmem_desc) and unreadable (net_iov) is to remove the struct
> netmem_desc field inside the net_iov, and instead just duplicate the
> pp/pp_ref_count/etc fields. The current code gives off the impression
> that net_iov may be a container of netmem_desc which is not really
> accurate.
> 
> But I don't think that's a major blocker. I think maybe the real issue
> is that there are no reviews from any mm maintainers? So I'm not 100%
> sure this is in line with their memdesc plans. I think probably
> patches 2->8 are generic netmem-ifications that are good to merge
> anyway, but I would say patch 1 and 9 need a reviewed by from someone
> on the mm side. Just my 2 cents.
> 
> Btw, this series has been marked as changes requested on patchwork, so
> it is in need of a respin one way or another:

Some can be improved but the others not.  For example:

   +static noinline netmem_ref __page_pool_alloc_netmems_slow(struct page_pool *pool,
   +							  gfp_t gfp)

It complains about the long line length but no idea how to avoid it :(
I can do nothing but to ignore..

	Byungchul

> https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=byungchul&state=*&q=&archive=&delegate=
> 
> https://docs.kernel.org/process/maintainer-netdev.html#patch-status
> 
> -- 
> Thanks,
> Mina


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2025-06-20  4:12 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-09  4:32 [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
2025-06-09  4:32 ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
2025-06-09 19:32   ` Jakub Kicinski
2025-06-10  1:30     ` Byungchul Park
2025-06-12  1:55       ` Jakub Kicinski
2025-06-13  1:13         ` Byungchul Park
2025-06-14  2:19           ` Mina Almasry
2025-06-17  2:31             ` netmem series needs some love and Acks from MM folks Harry Yoo
2025-06-17 16:09               ` David Hildenbrand
2025-06-17 16:32                 ` Harry Yoo
2025-06-18  0:08                 ` Byungchul Park
2025-06-18  7:51                   ` Pavel Begunkov
2025-06-20  4:12             ` [PATCH net-next 1/9] netmem: introduce struct netmem_desc mirroring struct page Byungchul Park
2025-06-11 20:53     ` Mina Almasry
2025-06-12  1:58       ` Jakub Kicinski
2025-06-17  2:20   ` Harry Yoo
2025-06-19  9:32   ` Vlastimil Babka
2025-06-09  4:32 ` [PATCH net-next 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-06-10 11:15   ` Ilias Apalodimas
2025-06-09  4:32 ` [PATCH net-next 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-06-10 11:11   ` Ilias Apalodimas
2025-06-09  4:32 ` [PATCH net-next 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-06-09  4:32 ` [PATCH net-next 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-06-09  4:32 ` [PATCH net-next 6/9] netmem: remove __netmem_get_pp() Byungchul Park
2025-06-10 11:11   ` Ilias Apalodimas
2025-06-09  4:32 ` [PATCH net-next 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-06-09  4:32 ` [PATCH net-next 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
2025-06-09  4:32 ` [PATCH net-next 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
2025-06-09 17:39   ` Mina Almasry
2025-06-10  1:45     ` Byungchul Park
2025-06-11 14:30       ` Pavel Begunkov
2025-06-12  0:39         ` Byungchul Park
2025-06-17  2:37         ` Harry Yoo
2025-06-19  9:55   ` Vlastimil Babka
2025-06-19 10:42     ` Byungchul Park
2025-06-09  6:49 ` [PATCH net-next 0/9] Split netmem from struct page Byungchul Park
2025-06-11 14:25 ` Pavel Begunkov
2025-06-11 20:48   ` Mina Almasry
2025-06-12  0:40     ` Byungchul Park
2025-06-16  7:45     ` Jesper Dangaard Brouer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).