* [PATCH 00/18] Split netmem from struct page
@ 2025-05-23 3:25 Byungchul Park
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
` (19 more replies)
0 siblings, 20 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
The MM subsystem is trying to reduce struct page to a single pointer.
The first step towards that is splitting struct page by its individual
users, as has already been done with folio and slab. This patchset does
that for netmem which is used for page pools.
Matthew Wilcox previously attempted the same work and stopped; see:
https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
Mina Almasry has already done a lot of the prerequisite work, by luck
he said :). I stacked my patches on top of his work, i.e. netmem.
This time I focused on removing the page pool members from struct page,
not on moving the page pool allocation code from net to mm. That can be
done later if needed.
My rfc version of this work is:
https://lore.kernel.org/all/20250509115126.63190-1-byungchul@sk.com/
There is still a lot of work to do to remove the network subsystem's
dependency on struct page. I will continue working on this after this
base patchset is merged.
---
Changes from rfc:
1. Rebase on net-next's main branch
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
2. Fix a build error reported by kernel test robot
https://lore.kernel.org/all/202505100932.uzAMBW1y-lkp@intel.com/
3. Add the given 'Reviewed-by' tags, thanks to Mina and Ilias
4. Do static_assert() on the size of struct netmem_desc instead
of placing a place-holder in struct page, following feedback from
Matthew
5. Do struct_group_tagged(netmem_desc) on struct net_iov instead
of wholly renaming it to struct netmem_desc, following feedback
from Mina and Pavel
Byungchul Park (18):
netmem: introduce struct netmem_desc struct_group_tagged()'ed on
struct net_iov
netmem: introduce netmem alloc APIs to wrap page alloc APIs
page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order()
page_pool: rename __page_pool_alloc_page_order() to
__page_pool_alloc_large_netmem()
page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow()
page_pool: rename page_pool_return_page() to page_pool_return_netmem()
page_pool: use netmem put API in page_pool_return_netmem()
page_pool: rename __page_pool_release_page_dma() to
__page_pool_release_netmem_dma()
page_pool: rename __page_pool_put_page() to __page_pool_put_netmem()
page_pool: rename __page_pool_alloc_pages_slow() to
__page_pool_alloc_netmems_slow()
mlx4: use netmem descriptor and APIs for page pool
page_pool: use netmem APIs to access page->pp_magic in
page_pool_page_is_pp()
mlx5: use netmem descriptor and APIs for page pool
netmem: use _Generic to cover const casting for page_to_netmem()
netmem: remove __netmem_get_pp()
page_pool: make page_pool_get_dma_addr() just wrap
page_pool_get_dma_addr_netmem()
netdevsim: use netmem descriptor and APIs for page pool
mm, netmem: remove the page pool members in struct page
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 46 ++++----
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +-
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 18 ++--
.../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++-
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 66 ++++++------
drivers/net/netdevsim/netdev.c | 18 ++--
drivers/net/netdevsim/netdevsim.h | 2 +-
include/linux/mm.h | 5 +-
include/linux/mm_types.h | 11 --
include/linux/skbuff.h | 14 +++
include/net/netmem.h | 101 ++++++++++--------
include/net/page_pool/helpers.h | 11 +-
net/core/page_pool.c | 97 +++++++++--------
16 files changed, 221 insertions(+), 201 deletions(-)
base-commit: f44092606a3f153bb7e6b277006b1f4a5b914cfc
--
2.17.1
^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 9:01 ` Toke Høiland-Jørgensen
` (2 more replies)
2025-05-23 3:25 ` [PATCH 02/18] netmem: introduce netmem alloc APIs to wrap page alloc APIs Byungchul Park
` (18 subsequent siblings)
19 siblings, 3 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
To simplify struct page, the page pool members of struct page should be
moved elsewhere, allowing them to be removed from struct page.
Introduce a network memory descriptor, struct netmem_desc, to hold
those members, reusing struct net_iov, which already mirrors struct
page.
While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/linux/mm_types.h | 2 +-
include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
2 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 56d07edd01f9..873e820e1521 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -120,13 +120,13 @@ struct page {
unsigned long private;
};
struct { /* page_pool used by netstack */
+ unsigned long _pp_mapping_pad;
/**
* @pp_magic: magic value to avoid recycling non
* page_pool allocated pages.
*/
unsigned long pp_magic;
struct page_pool *pp;
- unsigned long _pp_mapping_pad;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 386164fb9c18..08e9d76cdf14 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -31,12 +31,41 @@ enum net_iov_type {
};
struct net_iov {
- enum net_iov_type type;
- unsigned long pp_magic;
- struct page_pool *pp;
- struct net_iov_area *owner;
- unsigned long dma_addr;
- atomic_long_t pp_ref_count;
+ /*
+ * XXX: Now that struct netmem_desc overlays on struct page,
+ * struct_group_tagged() should cover all of them. However,
+ * a separate struct netmem_desc should be declared and embedded,
+ * once struct netmem_desc is no longer overlayed but it has its
+ * own instance from slab. The final form should be:
+ *
+ * struct netmem_desc {
+ * unsigned long pp_magic;
+ * struct page_pool *pp;
+ * unsigned long dma_addr;
+ * atomic_long_t pp_ref_count;
+ * };
+ *
+ * struct net_iov {
+ * enum net_iov_type type;
+ * struct net_iov_area *owner;
+ * struct netmem_desc;
+ * };
+ */
+ struct_group_tagged(netmem_desc, desc,
+ /*
+ * only for struct net_iov
+ */
+ enum net_iov_type type;
+ struct net_iov_area *owner;
+
+ /*
+ * actually for struct netmem_desc
+ */
+ unsigned long pp_magic;
+ struct page_pool *pp;
+ unsigned long dma_addr;
+ atomic_long_t pp_ref_count;
+ );
};
struct net_iov_area {
@@ -51,9 +80,9 @@ struct net_iov_area {
/* These fields in struct page are used by the page_pool and net stack:
*
* struct {
+ * unsigned long _pp_mapping_pad;
* unsigned long pp_magic;
* struct page_pool *pp;
- * unsigned long _pp_mapping_pad;
* unsigned long dma_addr;
* atomic_long_t pp_ref_count;
* };
--
2.17.1
* [PATCH 02/18] netmem: introduce netmem alloc APIs to wrap page alloc APIs
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 3:25 ` [PATCH 03/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order() Byungchul Park
` (17 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
To eliminate the use of struct page in the page pool, the page pool
code should use the netmem descriptor and APIs instead.
As part of that work, introduce netmem alloc APIs so the code can use
them rather than the existing struct page APIs.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/netmem.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 08e9d76cdf14..29c005d70c4f 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -177,6 +177,19 @@ static inline netmem_ref page_to_netmem(struct page *page)
return (__force netmem_ref)page;
}
+static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
+ unsigned int order)
+{
+ return page_to_netmem(alloc_pages_node(nid, gfp_mask, order));
+}
+
+static inline unsigned long alloc_netmems_bulk_node(gfp_t gfp, int nid,
+ unsigned long nr_netmems, netmem_ref *netmem_array)
+{
+ return alloc_pages_bulk_node(gfp, nid, nr_netmems,
+ (struct page **)netmem_array);
+}
+
/**
* virt_to_netmem - convert virtual memory pointer to a netmem reference
* @data: host memory pointer to convert
--
2.17.1
* [PATCH 03/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
2025-05-23 3:25 ` [PATCH 02/18] netmem: introduce netmem alloc APIs to wrap page alloc APIs Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 3:25 ` [PATCH 04/18] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
` (16 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Use the netmem alloc/put APIs instead of the page alloc/put APIs in
__page_pool_alloc_page_order(), and make it return netmem_ref rather
than struct page *.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 974f3eef2efa..2680d38d3daf 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -518,29 +518,29 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t g
return false;
}
-static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
+static netmem_ref __page_pool_alloc_page_order(struct page_pool *pool,
gfp_t gfp)
{
- struct page *page;
+ netmem_ref netmem;
gfp |= __GFP_COMP;
- page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
- if (unlikely(!page))
- return NULL;
+ netmem = alloc_netmems_node(pool->p.nid, gfp, pool->p.order);
+ if (unlikely(!netmem))
+ return 0;
- if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page_to_netmem(page), gfp))) {
- put_page(page);
- return NULL;
+ if (pool->dma_map && unlikely(!page_pool_dma_map(pool, netmem, gfp))) {
+ put_netmem(netmem);
+ return 0;
}
alloc_stat_inc(pool, slow_high_order);
- page_pool_set_pp_info(pool, page_to_netmem(page));
+ page_pool_set_pp_info(pool, netmem);
/* Track how many pages are held 'in-flight' */
pool->pages_state_hold_cnt++;
- trace_page_pool_state_hold(pool, page_to_netmem(page),
+ trace_page_pool_state_hold(pool, netmem,
pool->pages_state_hold_cnt);
- return page;
+ return netmem;
}
/* slow path */
@@ -555,7 +555,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
/* Don't support bulk alloc for high-order pages */
if (unlikely(pp_order))
- return page_to_netmem(__page_pool_alloc_page_order(pool, gfp));
+ return __page_pool_alloc_page_order(pool, gfp);
/* Unnecessary as alloc cache is empty, but guarantees zero count */
if (unlikely(pool->alloc.count > 0))
--
2.17.1
* [PATCH 04/18] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (2 preceding siblings ...)
2025-05-23 3:25 ` [PATCH 03/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order() Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 3:25 ` [PATCH 05/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow() Byungchul Park
` (15 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Now that __page_pool_alloc_page_order() uses netmem alloc/put APIs, not
page alloc/put APIs, rename it to __page_pool_alloc_large_netmem() to
reflect what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2680d38d3daf..147cefe7a031 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -518,7 +518,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem, gfp_t g
return false;
}
-static netmem_ref __page_pool_alloc_page_order(struct page_pool *pool,
+static netmem_ref __page_pool_alloc_large_netmem(struct page_pool *pool,
gfp_t gfp)
{
netmem_ref netmem;
@@ -555,7 +555,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
/* Don't support bulk alloc for high-order pages */
if (unlikely(pp_order))
- return __page_pool_alloc_page_order(pool, gfp);
+ return __page_pool_alloc_large_netmem(pool, gfp);
/* Unnecessary as alloc cache is empty, but guarantees zero count */
if (unlikely(pool->alloc.count > 0))
--
2.17.1
* [PATCH 05/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (3 preceding siblings ...)
2025-05-23 3:25 ` [PATCH 04/18] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 3:25 ` [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
` (14 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Use netmem alloc/put APIs instead of page alloc/put APIs in
__page_pool_alloc_pages_slow().
While at it, improve some comments.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 147cefe7a031..cec126e85eff 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -551,7 +551,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
unsigned int pp_order = pool->p.order;
bool dma_map = pool->dma_map;
netmem_ref netmem;
- int i, nr_pages;
+ int i, nr_netmems;
/* Don't support bulk alloc for high-order pages */
if (unlikely(pp_order))
@@ -561,21 +561,21 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
if (unlikely(pool->alloc.count > 0))
return pool->alloc.cache[--pool->alloc.count];
- /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk */
+ /* Mark empty alloc.cache slots "empty" for alloc_netmems_bulk_node() */
memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
- nr_pages = alloc_pages_bulk_node(gfp, pool->p.nid, bulk,
- (struct page **)pool->alloc.cache);
- if (unlikely(!nr_pages))
+ nr_netmems = alloc_netmems_bulk_node(gfp, pool->p.nid, bulk,
+ pool->alloc.cache);
+ if (unlikely(!nr_netmems))
return 0;
- /* Pages have been filled into alloc.cache array, but count is zero and
- * page element have not been (possibly) DMA mapped.
+ /* Netmems have been filled into alloc.cache array, but count is
+ * zero and elements have not been (possibly) DMA mapped.
*/
- for (i = 0; i < nr_pages; i++) {
+ for (i = 0; i < nr_netmems; i++) {
netmem = pool->alloc.cache[i];
if (dma_map && unlikely(!page_pool_dma_map(pool, netmem, gfp))) {
- put_page(netmem_to_page(netmem));
+ put_netmem(netmem);
continue;
}
@@ -587,7 +587,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
pool->pages_state_hold_cnt);
}
- /* Return last page */
+ /* Return the last netmem */
if (likely(pool->alloc.count > 0)) {
netmem = pool->alloc.cache[--pool->alloc.count];
alloc_stat_inc(pool, slow);
@@ -595,7 +595,8 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
netmem = 0;
}
- /* When page just alloc'ed is should/must have refcnt 1. */
+ /* When a netmem has been just allocated, it should/must have
+ * refcnt 1. */
return netmem;
}
--
2.17.1
* [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (4 preceding siblings ...)
2025-05-23 3:25 ` [PATCH 05/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow() Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-28 3:18 ` Mina Almasry
2025-05-23 3:25 ` [PATCH 07/18] page_pool: use netmem put API in page_pool_return_netmem() Byungchul Park
` (13 subsequent siblings)
19 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Now that page_pool_return_page() is for returning netmem, not struct
page, rename it to page_pool_return_netmem() to reflect what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index cec126e85eff..1106d4759fc6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -371,7 +371,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
}
EXPORT_SYMBOL(page_pool_create);
-static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem);
static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
{
@@ -409,7 +409,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
* (2) break out to fallthrough to alloc_pages_node.
* This limit stress on page buddy alloactor.
*/
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
alloc_stat_inc(pool, waive);
netmem = 0;
break;
@@ -713,7 +713,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
* a regular page (that will eventually be returned to the normal
* page-allocator via put_page).
*/
-void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
{
int count;
bool put;
@@ -830,7 +830,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
* will be invoking put_page.
*/
recycle_stat_inc(pool, released_refcnt);
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
return 0;
}
@@ -873,7 +873,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
/* Cache full, fallback to free pages */
recycle_stat_inc(pool, ring_full);
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
EXPORT_SYMBOL(page_pool_put_unrefed_netmem);
@@ -916,7 +916,7 @@ static void page_pool_recycle_ring_bulk(struct page_pool *pool,
* since put_page() with refcnt == 1 can be an expensive operation.
*/
for (; i < bulk_len; i++)
- page_pool_return_page(pool, bulk[i]);
+ page_pool_return_netmem(pool, bulk[i]);
}
/**
@@ -999,7 +999,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
return netmem;
}
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
return 0;
}
@@ -1013,7 +1013,7 @@ static void page_pool_free_frag(struct page_pool *pool)
if (!netmem || page_pool_unref_netmem(netmem, drain_count))
return;
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
@@ -1080,7 +1080,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
pr_crit("%s() page_pool refcnt %d violation\n",
__func__, netmem_ref_count(netmem));
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
@@ -1113,7 +1113,7 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
*/
while (pool->alloc.count) {
netmem = pool->alloc.cache[--pool->alloc.count];
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
@@ -1253,7 +1253,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
/* Flush pool alloc cache, as refill will check NUMA node */
while (pool->alloc.count) {
netmem = pool->alloc.cache[--pool->alloc.count];
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
EXPORT_SYMBOL(page_pool_update_nid);
--
2.17.1
* [PATCH 07/18] page_pool: use netmem put API in page_pool_return_netmem()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (5 preceding siblings ...)
2025-05-23 3:25 ` [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 3:25 ` [PATCH 08/18] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
` (12 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Use netmem put API, put_netmem(), instead of put_page() in
page_pool_return_netmem().
While at it, delete #include <linux/mm.h>, since this patch removes
the last put_page() call in page_pool.c.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1106d4759fc6..00bd5898a25c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -20,7 +20,6 @@
#include <linux/dma-direction.h>
#include <linux/dma-mapping.h>
#include <linux/page-flags.h>
-#include <linux/mm.h> /* for put_page() */
#include <linux/poison.h>
#include <linux/ethtool.h>
#include <linux/netdevice.h>
@@ -711,7 +710,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
/* Disconnects a page (from a page_pool). API users can have a need
* to disconnect a page (from a page_pool), to allow it to be used as
* a regular page (that will eventually be returned to the normal
- * page-allocator via put_page).
+ * page-allocator via put_netmem() and then put_page()).
*/
static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
{
@@ -732,7 +731,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
if (put) {
page_pool_clear_pp_info(netmem);
- put_page(netmem_to_page(netmem));
+ put_netmem(netmem);
}
/* An optimization would be to call __free_pages(page, pool->p.order)
* knowing page is not part of page-cache (thus avoiding a
--
2.17.1
* [PATCH 08/18] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (6 preceding siblings ...)
2025-05-23 3:25 ` [PATCH 07/18] page_pool: use netmem put API in page_pool_return_netmem() Byungchul Park
@ 2025-05-23 3:25 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 09/18] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
` (11 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:25 UTC (permalink / raw)
To: willy, netdev
Now that __page_pool_release_page_dma() is for releasing netmem, not
struct page, rename it to __page_pool_release_netmem_dma() to reflect
what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 00bd5898a25c..fd71198afd8b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -673,7 +673,7 @@ void page_pool_clear_pp_info(netmem_ref netmem)
netmem_set_pp(netmem, NULL);
}
-static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
+static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
netmem_ref netmem)
{
struct page *old, *page = netmem_to_page(netmem);
@@ -721,7 +721,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
put = pool->mp_ops->release_netmem(pool, netmem);
else
- __page_pool_release_page_dma(pool, netmem);
+ __page_pool_release_netmem_dma(pool, netmem);
/* This may be the last page returned, releasing the pool, so
* it is not safe to reference pool afterwards.
@@ -1139,7 +1139,7 @@ static void page_pool_scrub(struct page_pool *pool)
}
xa_for_each(&pool->dma_mapped, id, ptr)
- __page_pool_release_page_dma(pool, page_to_netmem(ptr));
+ __page_pool_release_netmem_dma(pool, page_to_netmem((struct page *)ptr));
}
/* No more consumers should exist, but producers could still
--
2.17.1
* [PATCH 09/18] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (7 preceding siblings ...)
2025-05-23 3:25 ` [PATCH 08/18] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 10/18] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
` (10 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Now that __page_pool_put_page() puts netmem, not struct page, rename it
to __page_pool_put_netmem() to reflect what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index fd71198afd8b..01b5f6e65216 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -789,7 +789,7 @@ static bool __page_pool_page_can_be_recycled(netmem_ref netmem)
* subsystem.
*/
static __always_inline netmem_ref
-__page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
+__page_pool_put_netmem(struct page_pool *pool, netmem_ref netmem,
unsigned int dma_sync_size, bool allow_direct)
{
lockdep_assert_no_hardirq();
@@ -849,7 +849,7 @@ static bool page_pool_napi_local(const struct page_pool *pool)
/* Allow direct recycle if we have reasons to believe that we are
* in the same context as the consumer would run, so there's
* no possible race.
- * __page_pool_put_page() makes sure we're not in hardirq context
+ * __page_pool_put_netmem() makes sure we're not in hardirq context
* and interrupts are enabled prior to accessing the cache.
*/
cpuid = smp_processor_id();
@@ -868,7 +868,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
allow_direct = page_pool_napi_local(pool);
netmem =
- __page_pool_put_page(pool, netmem, dma_sync_size, allow_direct);
+ __page_pool_put_netmem(pool, netmem, dma_sync_size, allow_direct);
if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
/* Cache full, fallback to free pages */
recycle_stat_inc(pool, ring_full);
@@ -969,7 +969,7 @@ void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
continue;
}
- netmem = __page_pool_put_page(pool, netmem, -1,
+ netmem = __page_pool_put_netmem(pool, netmem, -1,
allow_direct);
/* Approved for bulk recycling in ptr_ring cache */
if (netmem)
--
2.17.1
* [PATCH 10/18] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (8 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 09/18] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 11/18] mlx4: use netmem descriptor and APIs for page pool Byungchul Park
` (9 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
Now that __page_pool_alloc_pages_slow() is for allocating netmem, not
struct page, rename it to __page_pool_alloc_netmems_slow() to reflect
what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 01b5f6e65216..1071cb3d63e5 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -543,7 +543,7 @@ static netmem_ref __page_pool_alloc_large_netmem(struct page_pool *pool,
}
/* slow path */
-static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
+static noinline netmem_ref __page_pool_alloc_netmems_slow(struct page_pool *pool,
gfp_t gfp)
{
const int bulk = PP_ALLOC_CACHE_REFILL;
@@ -615,7 +615,7 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
netmem = pool->mp_ops->alloc_netmems(pool, gfp);
else
- netmem = __page_pool_alloc_pages_slow(pool, gfp);
+ netmem = __page_pool_alloc_netmems_slow(pool, gfp);
return netmem;
}
EXPORT_SYMBOL(page_pool_alloc_netmems);
--
2.17.1
* [PATCH 11/18] mlx4: use netmem descriptor and APIs for page pool
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (9 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 10/18] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
` (8 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
To simplify struct page, each of its users needs a separate descriptor
split out of struct page; this work is ongoing for page pool.
Use netmem descriptor and APIs for page pool in mlx4 code.
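The overall conversion pattern can be sketched in user-space C. This is a
simplified, illustrative model only: the mock_* names are invented for this
sketch, and mock_netmem_ref stands in for the kernel's netmem_ref, which is
an opaque unsigned long whose low bit tags non-page (net_iov) memory.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for kernel types; not the real definitions. */
struct mock_page { unsigned long dma_addr; };

typedef unsigned long mock_netmem_ref;   /* models netmem_ref */
#define MOCK_NET_IOV 1UL                 /* LSB tags net_iov-backed memory */

/* page_to_netmem(): the pointer value itself becomes the reference. */
static mock_netmem_ref mock_page_to_netmem(struct mock_page *page)
{
	return (mock_netmem_ref)page;
}

/* netmem_to_page(): only meaningful when the LSB tag is clear. */
static struct mock_page *mock_netmem_to_page(mock_netmem_ref netmem)
{
	return (netmem & MOCK_NET_IOV) ? NULL : (struct mock_page *)netmem;
}
```

This is why the converted driver code assigns 0 (not NULL) to cleared
netmem fields: the reference is an integer-like handle, not a pointer.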
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 46 +++++++++++---------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 ++--
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
3 files changed, 31 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b33285d755b9..82c24931fa44 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -62,18 +62,18 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
int i;
for (i = 0; i < priv->num_frags; i++, frags++) {
- if (!frags->page) {
- frags->page = page_pool_alloc_pages(ring->pp, gfp);
- if (!frags->page) {
+ if (!frags->netmem) {
+ frags->netmem = page_pool_alloc_netmems(ring->pp, gfp);
+ if (!frags->netmem) {
ring->alloc_fail++;
return -ENOMEM;
}
- page_pool_fragment_page(frags->page, 1);
+ page_pool_fragment_netmem(frags->netmem, 1);
frags->page_offset = priv->rx_headroom;
ring->rx_alloc_pages++;
}
- dma = page_pool_get_dma_addr(frags->page);
+ dma = page_pool_get_dma_addr_netmem(frags->netmem);
rx_desc->data[i].addr = cpu_to_be64(dma + frags->page_offset);
}
return 0;
@@ -83,10 +83,10 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring,
struct mlx4_en_rx_alloc *frag)
{
- if (frag->page)
- page_pool_put_full_page(ring->pp, frag->page, false);
+ if (frag->netmem)
+ page_pool_put_full_netmem(ring->pp, frag->netmem, false);
/* We need to clear all fields, otherwise a change of priv->log_rx_info
- * could lead to see garbage later in frag->page.
+ * could lead to see garbage later in frag->netmem.
*/
memset(frag, 0, sizeof(*frag));
}
@@ -440,29 +440,33 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
unsigned int truesize = 0;
bool release = true;
int nr, frag_size;
- struct page *page;
+ netmem_ref netmem;
dma_addr_t dma;
/* Collect used fragments while replacing them in the HW descriptors */
for (nr = 0;; frags++) {
frag_size = min_t(int, length, frag_info->frag_size);
- page = frags->page;
- if (unlikely(!page))
+ netmem = frags->netmem;
+ if (unlikely(!netmem))
goto fail;
- dma = page_pool_get_dma_addr(page);
+ dma = page_pool_get_dma_addr_netmem(netmem);
dma_sync_single_range_for_cpu(priv->ddev, dma, frags->page_offset,
frag_size, priv->dma_dir);
- __skb_fill_page_desc(skb, nr, page, frags->page_offset,
+ __skb_fill_netmem_desc(skb, nr, netmem, frags->page_offset,
frag_size);
truesize += frag_info->frag_stride;
if (frag_info->frag_stride == PAGE_SIZE / 2) {
+ struct page *page = netmem_to_page(netmem);
+ atomic_long_t *pp_ref_count =
+ netmem_get_pp_ref_count_ref(netmem);
+
frags->page_offset ^= PAGE_SIZE / 2;
release = page_count(page) != 1 ||
- atomic_long_read(&page->pp_ref_count) != 1 ||
+ atomic_long_read(pp_ref_count) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
} else if (!priv->rx_headroom) {
@@ -476,9 +480,9 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
}
if (release) {
- frags->page = NULL;
+ frags->netmem = 0;
} else {
- page_pool_ref_page(page);
+ page_pool_ref_netmem(netmem);
}
nr++;
@@ -719,7 +723,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
int nr;
frags = ring->rx_info + (index << priv->log_rx_info);
- va = page_address(frags[0].page) + frags[0].page_offset;
+ va = netmem_address(frags[0].netmem) + frags[0].page_offset;
net_prefetchw(va);
/*
* make sure we read the CQE after we read the ownership bit
@@ -748,7 +752,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
/* Get pointer to first fragment since we haven't
* skb yet and cast it to ethhdr struct
*/
- dma = page_pool_get_dma_addr(frags[0].page);
+ dma = page_pool_get_dma_addr_netmem(frags[0].netmem);
dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
DMA_FROM_DEVICE);
@@ -788,7 +792,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
void *orig_data;
u32 act;
- dma = page_pool_get_dma_addr(frags[0].page);
+ dma = page_pool_get_dma_addr_netmem(frags[0].netmem);
dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma,
priv->frag_info[0].frag_size,
@@ -818,7 +822,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
ring->xdp_redirect++;
xdp_redir_flush = true;
- frags[0].page = NULL;
+ frags[0].netmem = 0;
goto next;
}
ring->xdp_redirect_fail++;
@@ -828,7 +832,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (likely(!mlx4_en_xmit_frame(ring, frags, priv,
length, cq_ring,
&doorbell_pending))) {
- frags[0].page = NULL;
+ frags[0].netmem = 0;
goto next;
}
trace_xdp_exception(dev, xdp_prog, act);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 87f35bcbeff8..b564a953da09 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -354,7 +354,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
struct page_pool *pool = ring->recycle_ring->pp;
/* Note that napi_mode = 0 means ndo_close() path, not budget = 0 */
- page_pool_put_full_page(pool, tx_info->page, !!napi_mode);
+ page_pool_put_full_netmem(pool, tx_info->netmem, !!napi_mode);
return tx_info->nr_txbb;
}
@@ -1191,10 +1191,10 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
tx_desc = ring->buf + (index << LOG_TXBB_SIZE);
data = &tx_desc->data;
- dma = page_pool_get_dma_addr(frame->page);
+ dma = page_pool_get_dma_addr_netmem(frame->netmem);
- tx_info->page = frame->page;
- frame->page = NULL;
+ tx_info->netmem = frame->netmem;
+ frame->netmem = 0;
tx_info->map0_dma = dma;
tx_info->nr_bytes = max_t(unsigned int, length, ETH_ZLEN);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ad0d91a75184..3ef9a0a1f783 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -213,7 +213,7 @@ enum cq_type {
struct mlx4_en_tx_info {
union {
struct sk_buff *skb;
- struct page *page;
+ netmem_ref netmem;
};
dma_addr_t map0_dma;
u32 map0_byte_count;
@@ -246,7 +246,7 @@ struct mlx4_en_tx_desc {
#define MLX4_EN_CX3_HIGH_ID 0x1005
struct mlx4_en_rx_alloc {
- struct page *page;
+ netmem_ref netmem;
u32 page_offset;
};
--
2.17.1
* [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (10 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 11/18] mlx4: use netmem descriptor and APIs for page pool Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 8:58 ` Toke Høiland-Jørgensen
2025-05-23 17:21 ` Mina Almasry
2025-05-23 3:26 ` [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool Byungchul Park
` (7 subsequent siblings)
19 siblings, 2 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
To simplify struct page, each of its users needs a separate descriptor
split out of struct page; this work is ongoing for page pool.
To achieve that, all code must stop accessing the page pool members of
struct page directly and instead use safe APIs.
Use netmem_is_pp() instead of directly accessing page->pp_magic in
page_pool_page_is_pp().
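The shape of the signature check behind netmem_is_pp() can be modeled in
user-space C. The constants below are illustrative only; the kernel derives
PP_SIGNATURE and PP_MAGIC_MASK differently (see include/linux/mm.h).

```c
#include <assert.h>

/* Illustrative values only, not the kernel's actual constants. */
#define MOCK_PP_SIGNATURE  0x40UL
#define MOCK_PP_MAGIC_MASK (~0x3UL)

struct mock_desc { unsigned long pp_magic; };

/* Mirrors the shape of the check: mask off the low bits that may be
 * reused for other purposes, then compare against the signature. */
static int mock_is_pp(const struct mock_desc *d)
{
	return (d->pp_magic & MOCK_PP_MAGIC_MASK) == MOCK_PP_SIGNATURE;
}
```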
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/linux/mm.h | 5 +----
net/core/page_pool.c | 5 +++++
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8dc012e84033..3f7c80fb73ce 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(struct page *page)
-{
- return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
+bool page_pool_page_is_pp(struct page *page);
#else
static inline bool page_pool_page_is_pp(struct page *page)
{
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1071cb3d63e5..37e667e6ca33 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1284,3 +1284,8 @@ void net_mp_niov_clear_page_pool(struct net_iov *niov)
page_pool_clear_pp_info(netmem);
}
+
+bool page_pool_page_is_pp(struct page *page)
+{
+ return netmem_is_pp(page_to_netmem(page));
+}
--
2.17.1
* [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (11 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 17:13 ` Mina Almasry
2025-05-23 3:26 ` [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
` (6 subsequent siblings)
19 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
To simplify struct page, each of its users needs a separate descriptor
split out of struct page; this work is ongoing for page pool.
Use netmem descriptor and APIs for page pool in mlx5 code.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +-
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 18 ++---
.../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++--
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 66 +++++++++----------
include/linux/skbuff.h | 14 ++++
include/net/page_pool/helpers.h | 4 ++
7 files changed, 73 insertions(+), 50 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 5b0d03b3efe8..ab36a4e86c42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -557,7 +557,7 @@ struct mlx5e_icosq {
} ____cacheline_aligned_in_smp;
struct mlx5e_frag_page {
- struct page *page;
+ netmem_ref netmem;
u16 frags;
};
@@ -629,7 +629,7 @@ struct mlx5e_dma_info {
dma_addr_t addr;
union {
struct mlx5e_frag_page *frag_page;
- struct page *page;
+ netmem_ref netmem;
};
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 5ce1b463b7a8..cead69ff8eee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -61,7 +61,7 @@ static inline bool
mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
struct xdp_buff *xdp)
{
- struct page *page = virt_to_page(xdp->data);
+ netmem_ref netmem = virt_to_netmem(xdp->data);
struct mlx5e_xmit_data_frags xdptxdf = {};
struct mlx5e_xmit_data *xdptxd;
struct xdp_frame *xdpf;
@@ -122,7 +122,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
* mode.
*/
- dma_addr = page_pool_get_dma_addr(page) + (xdpf->data - (void *)xdpf);
+ dma_addr = page_pool_get_dma_addr_netmem(netmem) + (xdpf->data - (void *)xdpf);
dma_sync_single_for_device(sq->pdev, dma_addr, xdptxd->len, DMA_BIDIRECTIONAL);
if (xdptxd->has_frags) {
@@ -134,7 +134,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
dma_addr_t addr;
u32 len;
- addr = page_pool_get_dma_addr(skb_frag_page(frag)) +
+ addr = page_pool_get_dma_addr_netmem(skb_frag_netmem(frag)) +
skb_frag_off(frag);
len = skb_frag_size(frag);
dma_sync_single_for_device(sq->pdev, addr, len,
@@ -157,19 +157,19 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
(union mlx5e_xdp_info)
{ .page.num = 1 + xdptxdf.sinfo->nr_frags });
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
- (union mlx5e_xdp_info) { .page.page = page });
+ (union mlx5e_xdp_info) { .page.netmem = netmem });
for (i = 0; i < xdptxdf.sinfo->nr_frags; i++) {
skb_frag_t *frag = &xdptxdf.sinfo->frags[i];
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
(union mlx5e_xdp_info)
- { .page.page = skb_frag_page(frag) });
+ { .page.netmem = skb_frag_netmem(frag) });
}
} else {
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
(union mlx5e_xdp_info) { .page.num = 1 });
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
- (union mlx5e_xdp_info) { .page.page = page });
+ (union mlx5e_xdp_info) { .page.netmem = netmem });
}
return true;
@@ -702,15 +702,15 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
num = xdpi.page.num;
do {
- struct page *page;
+ netmem_ref netmem;
xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
- page = xdpi.page.page;
+ netmem = xdpi.page.netmem;
/* No need to check page_pool_page_is_pp() as we
* know this is a page_pool page.
*/
- page_pool_recycle_direct(page->pp, page);
+ page_pool_recycle_direct_netmem(netmem_get_pp(netmem), netmem);
} while (++n < num);
break;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index 46ab0a9e8cdd..931f9922e5c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -90,7 +90,7 @@ union mlx5e_xdp_info {
union {
struct mlx5e_rq *rq;
u8 num;
- struct page *page;
+ netmem_ref netmem;
} page;
struct xsk_tx_metadata_compl xsk_meta;
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 9bd166f489e7..4d6a08502c5e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -708,24 +708,29 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
{
- rq->wqe_overflow.page = alloc_page(GFP_KERNEL);
- if (!rq->wqe_overflow.page)
+ struct page *page = alloc_page(GFP_KERNEL);
+
+ if (!page)
return -ENOMEM;
- rq->wqe_overflow.addr = dma_map_page(rq->pdev, rq->wqe_overflow.page, 0,
+ rq->wqe_overflow.addr = dma_map_page(rq->pdev, page, 0,
PAGE_SIZE, rq->buff.map_dir);
if (dma_mapping_error(rq->pdev, rq->wqe_overflow.addr)) {
- __free_page(rq->wqe_overflow.page);
+ __free_page(page);
return -ENOMEM;
}
+
+ rq->wqe_overflow.netmem = page_to_netmem(page);
return 0;
}
static void mlx5e_free_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
{
+ struct page *page = netmem_to_page(rq->wqe_overflow.netmem);
+
dma_unmap_page(rq->pdev, rq->wqe_overflow.addr, PAGE_SIZE,
rq->buff.map_dir);
- __free_page(rq->wqe_overflow.page);
+ __free_page(page);
}
static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 84b1ab8233b8..78ca93b7a7ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -276,16 +276,16 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
struct mlx5e_frag_page *frag_page)
{
- struct page *page;
+ netmem_ref netmem;
- page = page_pool_dev_alloc_pages(rq->page_pool);
- if (unlikely(!page))
+ netmem = page_pool_dev_alloc_netmem(rq->page_pool, NULL, NULL);
+ if (unlikely(!netmem))
return -ENOMEM;
- page_pool_fragment_page(page, MLX5E_PAGECNT_BIAS_MAX);
+ page_pool_fragment_netmem(netmem, MLX5E_PAGECNT_BIAS_MAX);
*frag_page = (struct mlx5e_frag_page) {
- .page = page,
+ .netmem = netmem,
.frags = 0,
};
@@ -296,10 +296,10 @@ static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
struct mlx5e_frag_page *frag_page)
{
u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
- struct page *page = frag_page->page;
+ netmem_ref netmem = frag_page->netmem;
- if (page_pool_unref_page(page, drain_count) == 0)
- page_pool_put_unrefed_page(rq->page_pool, page, -1, true);
+ if (page_pool_unref_netmem(netmem, drain_count) == 0)
+ page_pool_put_unrefed_netmem(rq->page_pool, netmem, -1, true);
}
static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
@@ -358,7 +358,7 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
headroom = i == 0 ? rq->buff.headroom : 0;
- addr = page_pool_get_dma_addr(frag->frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag->frag_page->netmem);
wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
}
@@ -501,7 +501,7 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
{
skb_frag_t *frag;
- dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+ dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len, rq->buff.map_dir);
if (!xdp_buff_has_frags(xdp)) {
@@ -514,9 +514,9 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
}
frag = &sinfo->frags[sinfo->nr_frags++];
- skb_frag_fill_page_desc(frag, frag_page->page, frag_offset, len);
+ skb_frag_fill_netmem_desc(frag, frag_page->netmem, frag_offset, len);
- if (page_is_pfmemalloc(frag_page->page))
+ if (netmem_is_pfmemalloc(frag_page->netmem))
xdp_buff_set_frag_pfmemalloc(xdp);
sinfo->xdp_frags_size += len;
}
@@ -527,27 +527,27 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
u32 frag_offset, u32 len,
unsigned int truesize)
{
- dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+ dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
u8 next_frag = skb_shinfo(skb)->nr_frags;
dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
rq->buff.map_dir);
- if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
+ if (skb_can_coalesce_netmem(skb, next_frag, frag_page->netmem, frag_offset)) {
skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
} else {
frag_page->frags++;
- skb_add_rx_frag(skb, next_frag, frag_page->page,
+ skb_add_rx_frag_netmem(skb, next_frag, frag_page->netmem,
frag_offset, len, truesize);
}
}
static inline void
mlx5e_copy_skb_header(struct mlx5e_rq *rq, struct sk_buff *skb,
- struct page *page, dma_addr_t addr,
+ netmem_ref netmem, dma_addr_t addr,
int offset_from, int dma_offset, u32 headlen)
{
- const void *from = page_address(page) + offset_from;
+ const void *from = netmem_address(netmem) + offset_from;
/* Aligning len to sizeof(long) optimizes memcpy performance */
unsigned int len = ALIGN(headlen, sizeof(long));
@@ -684,7 +684,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
goto err_unmap;
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
for (int j = 0; j < MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; j++) {
header_offset = mlx5e_shampo_hd_offset(index++);
@@ -794,7 +794,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
err = mlx5e_page_alloc_fragmented(rq, frag_page);
if (unlikely(err))
goto err_unmap;
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
.ptag = cpu_to_be64(addr | MLX5_EN_WR),
};
@@ -1212,7 +1212,7 @@ static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index)
struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
u16 head_offset = mlx5e_shampo_hd_offset(header_index) + rq->buff.headroom;
- return page_address(frag_page->page) + head_offset;
+ return netmem_address(frag_page->netmem) + head_offset;
}
static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4)
@@ -1673,11 +1673,11 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
dma_addr_t addr;
u32 frag_size;
- va = page_address(frag_page->page) + wi->offset;
+ va = netmem_address(frag_page->netmem) + wi->offset;
data = va + rx_headroom;
frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
frag_size, rq->buff.map_dir);
net_prefetch(data);
@@ -1727,10 +1727,10 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
frag_page = wi->frag_page;
- va = page_address(frag_page->page) + wi->offset;
+ va = netmem_address(frag_page->netmem) + wi->offset;
frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
rq->buff.frame0_sz, rq->buff.map_dir);
net_prefetchw(va); /* xdp_frame data area */
@@ -2003,12 +2003,12 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
if (prog) {
/* area for bpf_xdp_[store|load]_bytes */
- net_prefetchw(page_address(frag_page->page) + frag_offset);
+ net_prefetchw(netmem_address(frag_page->netmem) + frag_offset);
if (unlikely(mlx5e_page_alloc_fragmented(rq, &wi->linear_page))) {
rq->stats->buff_alloc_err++;
return NULL;
}
- va = page_address(wi->linear_page.page);
+ va = netmem_address(wi->linear_page.netmem);
net_prefetchw(va); /* xdp_frame data area */
linear_hr = XDP_PACKET_HEADROOM;
linear_data_len = 0;
@@ -2117,8 +2117,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
while (++pagep < frag_page);
}
/* copy header */
- addr = page_pool_get_dma_addr(head_page->page);
- mlx5e_copy_skb_header(rq, skb, head_page->page, addr,
+ addr = page_pool_get_dma_addr_netmem(head_page->netmem);
+ mlx5e_copy_skb_header(rq, skb, head_page->netmem, addr,
head_offset, head_offset, headlen);
/* skb linear part was allocated with headlen and aligned to long */
skb->tail += headlen;
@@ -2148,11 +2148,11 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
return NULL;
}
- va = page_address(frag_page->page) + head_offset;
+ va = netmem_address(frag_page->netmem) + head_offset;
data = va + rx_headroom;
frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_range_for_cpu(rq->pdev, addr, head_offset,
frag_size, rq->buff.map_dir);
net_prefetch(data);
@@ -2191,7 +2191,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
struct mlx5_cqe64 *cqe, u16 header_index)
{
struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
- dma_addr_t page_dma_addr = page_pool_get_dma_addr(frag_page->page);
+ dma_addr_t page_dma_addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
u16 head_offset = mlx5e_shampo_hd_offset(header_index);
dma_addr_t dma_addr = page_dma_addr + head_offset;
u16 head_size = cqe->shampo.header_size;
@@ -2200,7 +2200,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
void *hdr, *data;
u32 frag_size;
- hdr = page_address(frag_page->page) + head_offset;
+ hdr = netmem_address(frag_page->netmem) + head_offset;
data = hdr + rx_headroom;
frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + head_size);
@@ -2225,7 +2225,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
}
net_prefetchw(skb->data);
- mlx5e_copy_skb_header(rq, skb, frag_page->page, dma_addr,
+ mlx5e_copy_skb_header(rq, skb, frag_page->netmem, dma_addr,
head_offset + rx_headroom,
rx_headroom, head_size);
/* skb linear part was allocated with headlen and aligned to long */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5520524c93bf..faf59ea5b13f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3887,6 +3887,20 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
return false;
}
+static inline bool skb_can_coalesce_netmem(struct sk_buff *skb, int i,
+ const netmem_ref netmem, int off)
+{
+ if (skb_zcopy(skb))
+ return false;
+ if (i) {
+ const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];
+
+ return netmem == skb_frag_netmem(frag) &&
+ off == skb_frag_off(frag) + skb_frag_size(frag);
+ }
+ return false;
+}
+
static inline int __skb_linearize(struct sk_buff *skb)
{
return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM;
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 93f2c31baf9b..aa120f6d519a 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -150,6 +150,10 @@ static inline netmem_ref page_pool_dev_alloc_netmem(struct page_pool *pool,
{
gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
+ WARN_ON((!offset && size) || (offset && !size));
+ if (!offset || !size)
+ return page_pool_alloc_netmems(pool, gfp);
+
return page_pool_alloc_netmem(pool, offset, size, gfp);
}
--
2.17.1
* [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (12 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 17:14 ` Mina Almasry
2025-05-23 3:26 ` [PATCH 15/18] netmem: remove __netmem_get_pp() Byungchul Park
` (5 subsequent siblings)
19 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
The current page_to_netmem() doesn't handle const casting, so trying to
cast a const struct page * to a const netmem_ref fails to build.
To cover that case, convert page_to_netmem() to a macro that uses _Generic.
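The _Generic dispatch can be exercised in plain user-space C. This sketch
uses invented mock_* names in place of the kernel types; it shows that const
and non-const pointers are distinct types to _Generic, so one macro can
accept both where a single inline function could not.

```c
#include <assert.h>

struct mock_page { int id; };
typedef unsigned long mock_ref;   /* models netmem_ref */

/* One macro accepts both qualified and unqualified pointers. */
#define mock_page_to_ref(p) (_Generic((p),			\
	const struct mock_page *: (mock_ref)(p),		\
	struct mock_page *:       (mock_ref)(p)))

/* Demonstrates that _Generic really selects on constness. */
#define mock_is_const(p) _Generic((p),				\
	const struct mock_page *: 1,				\
	struct mock_page *: 0)
```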
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/netmem.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 29c005d70c4f..c2eb121181c2 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -172,10 +172,9 @@ static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
return (__force netmem_ref)((unsigned long)niov | NET_IOV);
}
-static inline netmem_ref page_to_netmem(struct page *page)
-{
- return (__force netmem_ref)page;
-}
+#define page_to_netmem(p) (_Generic((p), \
+ const struct page *: (__force const netmem_ref)(p), \
+ struct page *: (__force netmem_ref)(p)))
static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
unsigned int order)
--
2.17.1
* [PATCH 15/18] netmem: remove __netmem_get_pp()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (13 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 16/18] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
` (4 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
There are no users of __netmem_get_pp(). Remove it.
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
include/net/netmem.h | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index c2eb121181c2..c63a7e20f5f3 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -224,22 +224,6 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
}
-/**
- * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
- * @netmem: netmem reference to get the pointer from
- *
- * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
- * e.g. when it's a header buffer, performs faster and generates smaller
- * object code (avoids clearing the LSB). When @netmem points to IOV,
- * provokes invalid memory access.
- *
- * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
- */
-static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
-{
- return __netmem_to_page(netmem)->pp;
-}
-
static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
{
return __netmem_clear_lsb(netmem)->pp;
--
2.17.1
^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH 16/18] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem()
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (14 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 15/18] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 17/18] netdevsim: use netmem descriptor and APIs for page pool Byungchul Park
` (3 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
The page pool members in struct page cannot be removed while any code
still accesses them via struct page.
Do not access 'page->dma_addr' directly in page_pool_get_dma_addr() but
just wrap page_pool_get_dma_addr_netmem() safely.
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
include/net/page_pool/helpers.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index aa120f6d519a..bcd0c08fd5b8 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -441,12 +441,7 @@ static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
*/
static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
{
- dma_addr_t ret = page->dma_addr;
-
- if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
- ret <<= PAGE_SHIFT;
-
- return ret;
+ return page_pool_get_dma_addr_netmem(page_to_netmem(page));
}
static inline void __page_pool_dma_sync_for_cpu(const struct page_pool *pool,
--
2.17.1
^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH 17/18] netdevsim: use netmem descriptor and APIs for page pool
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (15 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 16/18] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
` (2 subsequent siblings)
19 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
To simplify struct page, its users need their own descriptors separated
out from it, and the work for page pool is ongoing.
Use netmem descriptor and APIs for page pool in netdevsim code.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
drivers/net/netdevsim/netdev.c | 18 +++++++++---------
drivers/net/netdevsim/netdevsim.h | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index af545d42961c..c550a234807c 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -821,7 +821,7 @@ nsim_pp_hold_read(struct file *file, char __user *data,
struct netdevsim *ns = file->private_data;
char buf[3] = "n\n";
- if (ns->page)
+ if (ns->netmem)
buf[0] = 'y';
return simple_read_from_buffer(data, count, ppos, buf, 2);
@@ -841,18 +841,18 @@ nsim_pp_hold_write(struct file *file, const char __user *data,
rtnl_lock();
ret = count;
- if (val == !!ns->page)
+ if (val == !!ns->netmem)
goto exit;
if (!netif_running(ns->netdev) && val) {
ret = -ENETDOWN;
} else if (val) {
- ns->page = page_pool_dev_alloc_pages(ns->rq[0]->page_pool);
- if (!ns->page)
+ ns->netmem = page_pool_dev_alloc_netmem(ns->rq[0]->page_pool, NULL, NULL);
+ if (!ns->netmem)
ret = -ENOMEM;
} else {
- page_pool_put_full_page(ns->page->pp, ns->page, false);
- ns->page = NULL;
+ page_pool_put_full_netmem(netmem_get_pp(ns->netmem), ns->netmem, false);
+ ns->netmem = 0;
}
exit:
@@ -1077,9 +1077,9 @@ void nsim_destroy(struct netdevsim *ns)
nsim_exit_netdevsim(ns);
/* Put this intentionally late to exercise the orphaning path */
- if (ns->page) {
- page_pool_put_full_page(ns->page->pp, ns->page, false);
- ns->page = NULL;
+ if (ns->netmem) {
+ page_pool_put_full_netmem(netmem_get_pp(ns->netmem), ns->netmem, false);
+ ns->netmem = 0;
}
free_netdev(dev);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index d04401f0bdf7..1dc51468a50c 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -138,7 +138,7 @@ struct netdevsim {
struct debugfs_u32_array dfs_ports[2];
} udp_ports;
- struct page *page;
+ netmem_ref netmem;
struct dentry *pp_dfs;
struct dentry *qr_dfs;
--
2.17.1
^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (16 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 17/18] netdevsim: use netmem descriptor and APIs for page pool Byungchul Park
@ 2025-05-23 3:26 ` Byungchul Park
2025-05-23 17:16 ` kernel test robot
2025-05-23 17:55 ` Mina Almasry
2025-05-23 6:20 ` [PATCH 00/18] Split netmem from " Taehee Yoo
2025-05-23 17:47 ` SeongJae Park
19 siblings, 2 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 3:26 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
Now that all the users of the page pool members in struct page are
gone, the members can be removed from struct page.
However, since struct netmem_desc still uses the space in struct page,
its size should be checked, until struct netmem_desc gets its own
instance from slab, to avoid conflicting with other members within
struct page.
Remove the page pool members in struct page and add a static checker for
the size.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/linux/mm_types.h | 11 -----------
include/net/netmem.h | 28 +++++-----------------------
2 files changed, 5 insertions(+), 34 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 873e820e1521..5a7864eb9d76 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -119,17 +119,6 @@ struct page {
*/
unsigned long private;
};
- struct { /* page_pool used by netstack */
- unsigned long _pp_mapping_pad;
- /**
- * @pp_magic: magic value to avoid recycling non
- * page_pool allocated pages.
- */
- unsigned long pp_magic;
- struct page_pool *pp;
- unsigned long dma_addr;
- atomic_long_t pp_ref_count;
- };
struct { /* Tail pages of compound page */
unsigned long compound_head; /* Bit zero is set */
};
diff --git a/include/net/netmem.h b/include/net/netmem.h
index c63a7e20f5f3..257c22398d7a 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -77,30 +77,12 @@ struct net_iov_area {
unsigned long base_virtual;
};
-/* These fields in struct page are used by the page_pool and net stack:
- *
- * struct {
- * unsigned long _pp_mapping_pad;
- * unsigned long pp_magic;
- * struct page_pool *pp;
- * unsigned long dma_addr;
- * atomic_long_t pp_ref_count;
- * };
- *
- * We mirror the page_pool fields here so the page_pool can access these fields
- * without worrying whether the underlying fields belong to a page or net_iov.
- *
- * The non-net stack fields of struct page are private to the mm stack and must
- * never be mirrored to net_iov.
+/* XXX: The page pool fields in struct page have been removed, but
+ * struct netmem_desc might still use that space in struct page. Thus,
+ * the size of struct netmem_desc must be kept under control until it
+ * has its own instance from slab.
*/
-#define NET_IOV_ASSERT_OFFSET(pg, iov) \
- static_assert(offsetof(struct page, pg) == \
- offsetof(struct net_iov, iov))
-NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
-NET_IOV_ASSERT_OFFSET(pp, pp);
-NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
-NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
-#undef NET_IOV_ASSERT_OFFSET
+static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov)
{
--
2.17.1
^ permalink raw reply related [flat|nested] 72+ messages in thread
* Re: [PATCH 00/18] Split netmem from struct page
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (17 preceding siblings ...)
2025-05-23 3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
@ 2025-05-23 6:20 ` Taehee Yoo
2025-05-23 7:47 ` Byungchul Park
2025-05-23 17:47 ` SeongJae Park
19 siblings, 1 reply; 72+ messages in thread
From: Taehee Yoo @ 2025-05-23 6:20 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
edumazet, pabeni, saeedm, leon, ast, daniel, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, linux-rdma, bpf, vishal.moola
On Fri, May 23, 2025 at 12:36 PM Byungchul Park <byungchul@sk.com> wrote:
>
Hi Byungchul,
Thanks a lot for this work!
> The MM subsystem is trying to reduce struct page to a single pointer.
> The first step towards that is splitting struct page by its individual
> users, as has already been done with folio and slab. This patchset does
> that for netmem which is used for page pools.
>
> Matthew Wilcox tried and stopped the same work, you can see in:
>
> https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
>
> Mina Almasry has already done a lot of prerequisite work by luck, he
> said :). I stacked my patches on top of his work, i.e. netmem.
>
> I focused on removing the page pool members in struct page this time,
> not moving the allocation code of page pool from net to mm. It can be
> done later if needed.
>
> My rfc version of this work is:
>
> https://lore.kernel.org/all/20250509115126.63190-1-byungchul@sk.com/
>
> There is still a lot of work to do to remove the dependency on struct
> page in the network subsystem. I will continue to work on this after
> this base patchset is merged.
There is a compile failure.
In file included from drivers/net/ethernet/intel/libeth/rx.c:4:
./include/net/libeth/rx.h: In function ‘libeth_rx_sync_for_cpu’:
./include/net/libeth/rx.h:140:40: error: ‘struct page’ has no member named ‘pp’
140 | page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
| ^~
drivers/net/ethernet/intel/libeth/rx.c: In function ‘libeth_rx_recycle_slow’:
drivers/net/ethernet/intel/libeth/rx.c:210:38: error: ‘struct page’
has no member named ‘pp’
210 | page_pool_recycle_direct(page->pp, page);
| ^~
make[7]: *** [scripts/Makefile.build:203:
drivers/net/ethernet/intel/libeth/rx.o] Error 1
make[6]: *** [scripts/Makefile.build:461:
drivers/net/ethernet/intel/libeth] Error 2
make[5]: *** [scripts/Makefile.build:461: drivers/net/ethernet/intel] Error 2
make[5]: *** Waiting for unfinished jobs....
There are page->pp usecases in drivers/net
./drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c:1574:
} else if (page->pp) {
./drivers/net/ethernet/freescale/fec_main.c:1046:
page_pool_put_page(page->pp, page, 0, false);
./drivers/net/ethernet/freescale/fec_main.c:1584:
page_pool_put_page(page->pp, page, 0, true);
./drivers/net/ethernet/freescale/fec_main.c:3351:
page_pool_put_page(page->pp, page, 0, false);
./drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c:370:
page_pool_recycle_direct(page->pp, page);
./drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c:395:
page_pool_recycle_direct(page->pp, page);
./drivers/net/ethernet/ti/icssg/icssg_common.c:111:
page_pool_recycle_direct(page->pp, swdata->data.page);
./drivers/net/ethernet/intel/idpf/idpf_txrx.c:389:
page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3254: u32 hr =
rx_buf->page->pp->p.offset;
./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3286: dst =
page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3287: src =
page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3305: u32 hr =
buf->page->pp->p.offset;
./drivers/net/ethernet/intel/libeth/rx.c:210:
page_pool_recycle_direct(page->pp, page);
./drivers/net/ethernet/intel/iavf/iavf_txrx.c:1200: u32 hr =
rx_buffer->page->pp->p.offset;
./drivers/net/ethernet/intel/iavf/iavf_txrx.c:1217: u32 hr =
rx_buffer->page->pp->p.offset;
./drivers/net/wireless/mediatek/mt76/mt76.h:1800:
page_pool_put_full_page(page->pp, page, allow_direct);
./include/net/libeth/rx.h:140: page_pool_dma_sync_for_cpu(page->pp,
page, fqe->offset, len);
Thanks a lot!
Taehee Yoo
>
> ---
>
> Changes from rfc:
> 1. Rebase on net-next's main branch
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
> 2. Fix a build error reported by kernel test robot
> https://lore.kernel.org/all/202505100932.uzAMBW1y-lkp@intel.com/
> 3. Add given 'Reviewed-by's, thanks to Mina and Ilias
> 4. Do static_assert() on the size of struct netmem_desc instead
> of placing a place-holder in struct page, as suggested by Matthew
> 5. Do struct_group_tagged(netmem_desc) on struct net_iov instead
> of wholly renaming it to struct netmem_desc, as suggested by
> Mina and Pavel
>
> Byungchul Park (18):
> netmem: introduce struct netmem_desc struct_group_tagged()'ed on
> struct net_iov
> netmem: introduce netmem alloc APIs to wrap page alloc APIs
> page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order()
> page_pool: rename __page_pool_alloc_page_order() to
> __page_pool_alloc_large_netmem()
> page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow()
> page_pool: rename page_pool_return_page() to page_pool_return_netmem()
> page_pool: use netmem put API in page_pool_return_netmem()
> page_pool: rename __page_pool_release_page_dma() to
> __page_pool_release_netmem_dma()
> page_pool: rename __page_pool_put_page() to __page_pool_put_netmem()
> page_pool: rename __page_pool_alloc_pages_slow() to
> __page_pool_alloc_netmems_slow()
> mlx4: use netmem descriptor and APIs for page pool
> page_pool: use netmem APIs to access page->pp_magic in
> page_pool_page_is_pp()
> mlx5: use netmem descriptor and APIs for page pool
> netmem: use _Generic to cover const casting for page_to_netmem()
> netmem: remove __netmem_get_pp()
> page_pool: make page_pool_get_dma_addr() just wrap
> page_pool_get_dma_addr_netmem()
> netdevsim: use netmem descriptor and APIs for page pool
> mm, netmem: remove the page pool members in struct page
>
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 46 ++++----
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +-
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +-
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 18 ++--
> .../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++-
> .../net/ethernet/mellanox/mlx5/core/en_rx.c | 66 ++++++------
> drivers/net/netdevsim/netdev.c | 18 ++--
> drivers/net/netdevsim/netdevsim.h | 2 +-
> include/linux/mm.h | 5 +-
> include/linux/mm_types.h | 11 --
> include/linux/skbuff.h | 14 +++
> include/net/netmem.h | 101 ++++++++++--------
> include/net/page_pool/helpers.h | 11 +-
> net/core/page_pool.c | 97 +++++++++--------
> 16 files changed, 221 insertions(+), 201 deletions(-)
>
>
> base-commit: f44092606a3f153bb7e6b277006b1f4a5b914cfc
> --
> 2.17.1
>
>
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 00/18] Split netmem from struct page
2025-05-23 6:20 ` [PATCH 00/18] Split netmem from " Taehee Yoo
@ 2025-05-23 7:47 ` Byungchul Park
0 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-23 7:47 UTC (permalink / raw)
To: Taehee Yoo
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
edumazet, pabeni, saeedm, leon, ast, daniel, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, linux-rdma, bpf, vishal.moola
On Fri, May 23, 2025 at 03:20:27PM +0900, Taehee Yoo wrote:
> On Fri, May 23, 2025 at 12:36 PM Byungchul Park <byungchul@sk.com> wrote:
> >
>
> Hi Byungchul,
> Thanks a lot for this work!
>
> > The MM subsystem is trying to reduce struct page to a single pointer.
> > The first step towards that is splitting struct page by its individual
> > users, as has already been done with folio and slab. This patchset does
> > that for netmem which is used for page pools.
> >
> > Matthew Wilcox tried and stopped the same work, you can see in:
> >
> > https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
> >
> > Mina Almasry has already done a lot of prerequisite work by luck, he
> > said :). I stacked my patches on top of his work, i.e. netmem.
> >
> > I focused on removing the page pool members in struct page this time,
> > not moving the allocation code of page pool from net to mm. It can be
> > done later if needed.
> >
> > My rfc version of this work is:
> >
> > https://lore.kernel.org/all/20250509115126.63190-1-byungchul@sk.com/
> >
> > There is still a lot of work to do to remove the dependency on struct
> > page in the network subsystem. I will continue to work on this after
> > this base patchset is merged.
>
> There is a compile failure.
Thanks a lot. I will fix it.
Byungchul
>
> In file included from drivers/net/ethernet/intel/libeth/rx.c:4:
> ./include/net/libeth/rx.h: In function ‘libeth_rx_sync_for_cpu’:
> ./include/net/libeth/rx.h:140:40: error: ‘struct page’ has no member named ‘pp’
> 140 | page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
> | ^~
> drivers/net/ethernet/intel/libeth/rx.c: In function ‘libeth_rx_recycle_slow’:
> drivers/net/ethernet/intel/libeth/rx.c:210:38: error: ‘struct page’
> has no member named ‘pp’
> 210 | page_pool_recycle_direct(page->pp, page);
> | ^~
> make[7]: *** [scripts/Makefile.build:203:
> drivers/net/ethernet/intel/libeth/rx.o] Error 1
> make[6]: *** [scripts/Makefile.build:461:
> drivers/net/ethernet/intel/libeth] Error 2
> make[5]: *** [scripts/Makefile.build:461: drivers/net/ethernet/intel] Error 2
> make[5]: *** Waiting for unfinished jobs....
>
> There are page->pp usecases in drivers/net
> ./drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c:1574:
> } else if (page->pp) {
> ./drivers/net/ethernet/freescale/fec_main.c:1046:
> page_pool_put_page(page->pp, page, 0, false);
> ./drivers/net/ethernet/freescale/fec_main.c:1584:
> page_pool_put_page(page->pp, page, 0, true);
> ./drivers/net/ethernet/freescale/fec_main.c:3351:
> page_pool_put_page(page->pp, page, 0, false);
> ./drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c:370:
> page_pool_recycle_direct(page->pp, page);
> ./drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c:395:
> page_pool_recycle_direct(page->pp, page);
> ./drivers/net/ethernet/ti/icssg/icssg_common.c:111:
> page_pool_recycle_direct(page->pp, swdata->data.page);
> ./drivers/net/ethernet/intel/idpf/idpf_txrx.c:389:
> page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
> ./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3254: u32 hr =
> rx_buf->page->pp->p.offset;
> ./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3286: dst =
> page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
> ./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3287: src =
> page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
> ./drivers/net/ethernet/intel/idpf/idpf_txrx.c:3305: u32 hr =
> buf->page->pp->p.offset;
> ./drivers/net/ethernet/intel/libeth/rx.c:210:
> page_pool_recycle_direct(page->pp, page);
> ./drivers/net/ethernet/intel/iavf/iavf_txrx.c:1200: u32 hr =
> rx_buffer->page->pp->p.offset;
> ./drivers/net/ethernet/intel/iavf/iavf_txrx.c:1217: u32 hr =
> rx_buffer->page->pp->p.offset;
> ./drivers/net/wireless/mediatek/mt76/mt76.h:1800:
> page_pool_put_full_page(page->pp, page, allow_direct);
> ./include/net/libeth/rx.h:140: page_pool_dma_sync_for_cpu(page->pp,
> page, fqe->offset, len);
>
> Thanks a lot!
> Taehee Yoo
>
> >
> > ---
> >
> > Changes from rfc:
> > 1. Rebase on net-next's main branch
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
> > 2. Fix a build error reported by kernel test robot
> > https://lore.kernel.org/all/202505100932.uzAMBW1y-lkp@intel.com/
> > 3. Add given 'Reviewed-by's, thanks to Mina and Ilias
> > 4. Do static_assert() on the size of struct netmem_desc instead
> > of placing a place-holder in struct page, as suggested by Matthew
> > 5. Do struct_group_tagged(netmem_desc) on struct net_iov instead
> > of wholly renaming it to struct netmem_desc, as suggested by
> > Mina and Pavel
> >
> > Byungchul Park (18):
> > netmem: introduce struct netmem_desc struct_group_tagged()'ed on
> > struct net_iov
> > netmem: introduce netmem alloc APIs to wrap page alloc APIs
> > page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order()
> > page_pool: rename __page_pool_alloc_page_order() to
> > __page_pool_alloc_large_netmem()
> > page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow()
> > page_pool: rename page_pool_return_page() to page_pool_return_netmem()
> > page_pool: use netmem put API in page_pool_return_netmem()
> > page_pool: rename __page_pool_release_page_dma() to
> > __page_pool_release_netmem_dma()
> > page_pool: rename __page_pool_put_page() to __page_pool_put_netmem()
> > page_pool: rename __page_pool_alloc_pages_slow() to
> > __page_pool_alloc_netmems_slow()
> > mlx4: use netmem descriptor and APIs for page pool
> > page_pool: use netmem APIs to access page->pp_magic in
> > page_pool_page_is_pp()
> > mlx5: use netmem descriptor and APIs for page pool
> > netmem: use _Generic to cover const casting for page_to_netmem()
> > netmem: remove __netmem_get_pp()
> > page_pool: make page_pool_get_dma_addr() just wrap
> > page_pool_get_dma_addr_netmem()
> > netdevsim: use netmem descriptor and APIs for page pool
> > mm, netmem: remove the page pool members in struct page
> >
> > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 46 ++++----
> > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +-
> > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
> > drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 18 ++--
> > .../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +-
> > .../net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++-
> > .../net/ethernet/mellanox/mlx5/core/en_rx.c | 66 ++++++------
> > drivers/net/netdevsim/netdev.c | 18 ++--
> > drivers/net/netdevsim/netdevsim.h | 2 +-
> > include/linux/mm.h | 5 +-
> > include/linux/mm_types.h | 11 --
> > include/linux/skbuff.h | 14 +++
> > include/net/netmem.h | 101 ++++++++++--------
> > include/net/page_pool/helpers.h | 11 +-
> > net/core/page_pool.c | 97 +++++++++--------
> > 16 files changed, 221 insertions(+), 201 deletions(-)
> >
> >
> > base-commit: f44092606a3f153bb7e6b277006b1f4a5b914cfc
> > --
> > 2.17.1
> >
> >
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-23 3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
@ 2025-05-23 8:58 ` Toke Høiland-Jørgensen
2025-05-23 17:21 ` Mina Almasry
1 sibling, 0 replies; 72+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-23 8:58 UTC (permalink / raw)
To: Byungchul Park, willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, tariqt, edumazet, pabeni, saeedm,
leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> To simplify struct page, its users need their own descriptors separated
> out from it, and the work for page pool is ongoing.
>
> To achieve that, all the code should avoid accessing page pool members
> of struct page directly, but use safe APIs for the purpose.
>
> Use netmem_is_pp() instead of directly accessing page->pp_magic in
> page_pool_page_is_pp().
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm.h | 5 +----
> net/core/page_pool.c | 5 +++++
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8dc012e84033..3f7c80fb73ce 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>
> #ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> +bool page_pool_page_is_pp(struct page *page);
Here you're turning an inline function into a function call, which has
performance implications. Please try to avoid that.
-Toke
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
@ 2025-05-23 9:01 ` Toke Høiland-Jørgensen
2025-05-26 0:56 ` Byungchul Park
2025-05-23 17:00 ` Mina Almasry
2025-05-27 2:50 ` Byungchul Park
2 siblings, 1 reply; 72+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-23 9:01 UTC (permalink / raw)
To: Byungchul Park, willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, tariqt, edumazet, pabeni, saeedm,
leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> To simplify struct page, the page pool members of struct page should be
> moved elsewhere, allowing them to be removed from struct page.
>
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, reusing struct net_iov that already mirrored struct page.
>
> While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm_types.h | 2 +-
> include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> 2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 56d07edd01f9..873e820e1521 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -120,13 +120,13 @@ struct page {
> unsigned long private;
> };
> struct { /* page_pool used by netstack */
> + unsigned long _pp_mapping_pad;
> /**
> * @pp_magic: magic value to avoid recycling non
> * page_pool allocated pages.
> */
> unsigned long pp_magic;
> struct page_pool *pp;
> - unsigned long _pp_mapping_pad;
> unsigned long dma_addr;
> atomic_long_t pp_ref_count;
> };
The reason that field is called "_pp_mapping_pad" is that it's supposed
to overlay the page->mapping field, so that none of the page_pool uses
set a value here. Moving it breaks that assumption. Once struct
netmem_desc is completely decoupled from struct page this obviously
doesn't matter, but I think it does today? At least, trying to use that
field for the DMA index broke things, which is why we ended up with the
bit-stuffing in pp_magic...
-Toke
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
2025-05-23 9:01 ` Toke Høiland-Jørgensen
@ 2025-05-23 17:00 ` Mina Almasry
2025-05-26 1:15 ` Byungchul Park
2025-05-27 2:50 ` Byungchul Park
2 siblings, 1 reply; 72+ messages in thread
From: Mina Almasry @ 2025-05-23 17:00 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>
> To simplify struct page, the page pool members of struct page should be
> moved elsewhere, allowing them to be removed from struct page.
>
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, reusing struct net_iov that already mirrored struct page.
>
> While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm_types.h | 2 +-
> include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> 2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 56d07edd01f9..873e820e1521 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -120,13 +120,13 @@ struct page {
> unsigned long private;
> };
> struct { /* page_pool used by netstack */
> + unsigned long _pp_mapping_pad;
> /**
> * @pp_magic: magic value to avoid recycling non
> * page_pool allocated pages.
> */
> unsigned long pp_magic;
> struct page_pool *pp;
> - unsigned long _pp_mapping_pad;
Like Toke says, moving this to the beginning of this struct is not
allowed. The first 3 bits of pp_magic are overlaid with page->lru so
the pp makes sure not to use them. _pp_mapping_pad is overlaid with
page->mapping, so the pp makes sure not to use it. AFAICT, this moving
of _pp_mapping_pad is not necessary for this patch. I think just drop
it.
> unsigned long dma_addr;
> atomic_long_t pp_ref_count;
> };
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 386164fb9c18..08e9d76cdf14 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -31,12 +31,41 @@ enum net_iov_type {
> };
>
> struct net_iov {
> - enum net_iov_type type;
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - struct net_iov_area *owner;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> + /*
> + * XXX: Now that struct netmem_desc overlays on struct page,
> + * struct_group_tagged() should cover all of them. However,
> + * a separate struct netmem_desc should be declared and embedded,
> + * once struct netmem_desc is no longer overlaid and instead has
> + * its own instance from slab. The final form should be:
> + *
> + * struct netmem_desc {
> + * unsigned long pp_magic;
> + * struct page_pool *pp;
> + * unsigned long dma_addr;
> + * atomic_long_t pp_ref_count;
> + * };
> + *
> + * struct net_iov {
> + * enum net_iov_type type;
> + * struct net_iov_area *owner;
> + * struct netmem_desc;
> + * };
> + */
I'm unclear on why moving to this format is a TODO for the future. Why
isn't this state in the comment the state in the code? I think I gave
the same code snippet on the RFC, but here again:
struct netmem_desc {
/**
* @pp_magic: magic value to avoid recycling non
* page_pool allocated pages.
*/
unsigned long pp_magic;
struct page_pool *pp;
unsigned long _pp_mapping_pad;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
(Roughly):
struct page {
...
struct { /* page_pool used by netstack */
struct netmem_desc;
};
...
};
struct net_iov {
enum net_iov_type type;
struct netmem_desc;
struct net_iov_area *owner;
}
AFAICT, this should work..?
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool
2025-05-23 3:26 ` [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool Byungchul Park
@ 2025-05-23 17:13 ` Mina Almasry
2025-05-26 3:08 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Mina Almasry @ 2025-05-23 17:13 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>
> To simplify struct page, the effort to separate its own descriptor from
> struct page is required, and the work for page pool is ongoing.
>
> Use netmem descriptor and APIs for page pool in mlx5 code.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
Just FYI, you're racing with Nvidia adding netmem support to mlx5 as
well. Probably they prefer to take their patch. So try to rebase on
top of that maybe? Up to you.
https://lore.kernel.org/netdev/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
I also wonder if you should send this through the net-next tree, since
it seems to race with changes that are going to land in net-next soon.
Up to you, I don't have any strong preference. But if you do send to
net-next, there are a bunch of extra rules to keep in mind:
https://docs.kernel.org/process/maintainer-netdev.html
--
Thanks,
Mina
* Re: [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem()
2025-05-23 3:26 ` [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
@ 2025-05-23 17:14 ` Mina Almasry
0 siblings, 0 replies; 72+ messages in thread
From: Mina Almasry @ 2025-05-23 17:14 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>
> The current page_to_netmem() doesn't cover const casting, so trying to
> cast const struct page * to const netmem_ref fails.
>
> To cover the case, change page_to_netmem() to use a macro and _Generic.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---
> include/net/netmem.h | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 29c005d70c4f..c2eb121181c2 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -172,10 +172,9 @@ static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
> return (__force netmem_ref)((unsigned long)niov | NET_IOV);
> }
>
> -static inline netmem_ref page_to_netmem(struct page *page)
> -{
> - return (__force netmem_ref)page;
> -}
> +#define page_to_netmem(p) (_Generic((p), \
> + const struct page *: (__force const netmem_ref)(p), \
> + struct page *: (__force netmem_ref)(p)))
>
> static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
> unsigned int order)
> --
> 2.17.1
>
--
Thanks,
Mina
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-23 3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
@ 2025-05-23 17:16 ` kernel test robot
2025-05-23 17:55 ` Mina Almasry
1 sibling, 0 replies; 72+ messages in thread
From: kernel test robot @ 2025-05-23 17:16 UTC (permalink / raw)
To: Byungchul Park, willy, netdev
Cc: oe-kbuild-all, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
edumazet, pabeni, saeedm, leon, ast, daniel, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko
Hi Byungchul,
kernel test robot noticed the following build errors:
[auto build test ERROR on f44092606a3f153bb7e6b277006b1f4a5b914cfc]
url: https://github.com/intel-lab-lkp/linux/commits/Byungchul-Park/netmem-introduce-struct-netmem_desc-struct_group_tagged-ed-on-struct-net_iov/20250523-112806
base: f44092606a3f153bb7e6b277006b1f4a5b914cfc
patch link: https://lore.kernel.org/r/20250523032609.16334-19-byungchul%40sk.com
patch subject: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
config: x86_64-rhel-9.4-kunit (https://download.01.org/0day-ci/archive/20250524/202505240152.9ODpQBK0-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250524/202505240152.9ODpQBK0-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505240152.9ODpQBK0-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/net/intel/libie/rx.h:7,
from drivers/net/ethernet/intel/iavf/iavf_txrx.c:5:
include/net/libeth/rx.h: In function 'libeth_rx_sync_for_cpu':
include/net/libeth/rx.h:140:40: error: 'struct page' has no member named 'pp'
140 | page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
| ^~
drivers/net/ethernet/intel/iavf/iavf_txrx.c: In function 'iavf_add_rx_frag':
>> drivers/net/ethernet/intel/iavf/iavf_txrx.c:1200:33: error: 'struct page' has no member named 'pp'
1200 | u32 hr = rx_buffer->page->pp->p.offset;
| ^~
drivers/net/ethernet/intel/iavf/iavf_txrx.c: In function 'iavf_build_skb':
drivers/net/ethernet/intel/iavf/iavf_txrx.c:1217:33: error: 'struct page' has no member named 'pp'
1217 | u32 hr = rx_buffer->page->pp->p.offset;
| ^~
--
In file included from drivers/net/ethernet/intel/idpf/idpf_txrx.c:4:
include/net/libeth/rx.h: In function 'libeth_rx_sync_for_cpu':
include/net/libeth/rx.h:140:40: error: 'struct page' has no member named 'pp'
140 | page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
| ^~
drivers/net/ethernet/intel/idpf/idpf_txrx.c: In function 'idpf_rx_page_rel':
>> drivers/net/ethernet/intel/idpf/idpf_txrx.c:389:45: error: 'struct page' has no member named 'pp'
389 | page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
| ^~
drivers/net/ethernet/intel/idpf/idpf_txrx.c: In function 'idpf_rx_add_frag':
drivers/net/ethernet/intel/idpf/idpf_txrx.c:3254:30: error: 'struct page' has no member named 'pp'
3254 | u32 hr = rx_buf->page->pp->p.offset;
| ^~
drivers/net/ethernet/intel/idpf/idpf_txrx.c: In function 'idpf_rx_hsplit_wa':
drivers/net/ethernet/intel/idpf/idpf_txrx.c:3286:64: error: 'struct page' has no member named 'pp'
3286 | dst = page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
| ^~
drivers/net/ethernet/intel/idpf/idpf_txrx.c:3287:64: error: 'struct page' has no member named 'pp'
3287 | src = page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
| ^~
drivers/net/ethernet/intel/idpf/idpf_txrx.c: In function 'idpf_rx_build_skb':
drivers/net/ethernet/intel/idpf/idpf_txrx.c:3305:27: error: 'struct page' has no member named 'pp'
3305 | u32 hr = buf->page->pp->p.offset;
| ^~
--
In file included from drivers/net/wireless/mediatek/mt76/mt76x2/../mt76x02.h:12,
from drivers/net/wireless/mediatek/mt76/mt76x2/mt76x2.h:23,
from drivers/net/wireless/mediatek/mt76/mt76x2/eeprom.c:9:
drivers/net/wireless/mediatek/mt76/mt76x2/../mt76.h: In function 'mt76_put_page_pool_buf':
>> drivers/net/wireless/mediatek/mt76/mt76x2/../mt76.h:1788:37: error: 'struct page' has no member named 'pp'
1788 | page_pool_put_full_page(page->pp, page, allow_direct);
| ^~
vim +1200 drivers/net/ethernet/intel/iavf/iavf_txrx.c
7f12ad741a4870 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Greg Rose 2013-12-21 1184
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1185 /**
56184e01c00d6d drivers/net/ethernet/intel/iavf/iavf_txrx.c Jesse Brandeburg 2018-09-14 1186 * iavf_add_rx_frag - Add contents of Rx buffer to sk_buff
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1187 * @skb: sk_buff to place the data into
5fa4caff59f251 drivers/net/ethernet/intel/iavf/iavf_txrx.c Alexander Lobakin 2024-04-18 1188 * @rx_buffer: buffer containing page to add
a0cfc3130eef54 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1189 * @size: packet length from rx_desc
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1190 *
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1191 * This function will add the data contained in rx_buffer->page to the skb.
fa2343e9034ce6 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1192 * It will just attach the page as a frag to the skb.
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1193 *
fa2343e9034ce6 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1194 * The function will then update the page offset.
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1195 **/
5fa4caff59f251 drivers/net/ethernet/intel/iavf/iavf_txrx.c Alexander Lobakin 2024-04-18 1196 static void iavf_add_rx_frag(struct sk_buff *skb,
5fa4caff59f251 drivers/net/ethernet/intel/iavf/iavf_txrx.c Alexander Lobakin 2024-04-18 1197 const struct libeth_fqe *rx_buffer,
a0cfc3130eef54 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1198 unsigned int size)
ab9ad98eb5f95b drivers/net/ethernet/intel/i40evf/i40e_txrx.c Jesse Brandeburg 2016-04-18 1199 {
5fa4caff59f251 drivers/net/ethernet/intel/iavf/iavf_txrx.c Alexander Lobakin 2024-04-18 @1200 u32 hr = rx_buffer->page->pp->p.offset;
efa14c3985828d drivers/net/ethernet/intel/iavf/iavf_txrx.c Mitch Williams 2019-05-14 1201
fa2343e9034ce6 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1202 skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
5fa4caff59f251 drivers/net/ethernet/intel/iavf/iavf_txrx.c Alexander Lobakin 2024-04-18 1203 rx_buffer->offset + hr, size, rx_buffer->truesize);
9a064128fc8489 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1204 }
9a064128fc8489 drivers/net/ethernet/intel/i40evf/i40e_txrx.c Alexander Duyck 2017-03-14 1205
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-23 3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
2025-05-23 8:58 ` Toke Høiland-Jørgensen
@ 2025-05-23 17:21 ` Mina Almasry
2025-05-26 2:23 ` Byungchul Park
1 sibling, 1 reply; 72+ messages in thread
From: Mina Almasry @ 2025-05-23 17:21 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>
> To simplify struct page, the effort to separate its own descriptor from
> struct page is required, and the work for page pool is ongoing.
>
> To achieve that, all the code should avoid accessing page pool members
> of struct page directly, but use safe APIs for the purpose.
>
> Use netmem_is_pp() instead of directly accessing page->pp_magic in
> page_pool_page_is_pp().
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm.h | 5 +----
> net/core/page_pool.c | 5 +++++
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8dc012e84033..3f7c80fb73ce 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>
> #ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
I vote for keeping this function as-is (do not convert it to netmem),
and instead modify it to access page->netmem_desc->pp_magic.
The reason is that page_pool_page_is_pp() is today only called from code
paths we have a page and not a netmem. Casting the page to a netmem
which will cast it back to a page pretty much is a waste of cpu
cycles. The page_pool is a place where we count cycles and we have
benchmarks to verify performance (I pointed you to
page_pool_bench_simple on the RFC).
So let's avoid the cpu cycles if possible.
--
Thanks,
Mina
* Re: [PATCH 00/18] Split netmem from struct page
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
` (18 preceding siblings ...)
2025-05-23 6:20 ` [PATCH 00/18] Split netmem from " Taehee Yoo
@ 2025-05-23 17:47 ` SeongJae Park
2025-05-26 1:16 ` Byungchul Park
19 siblings, 1 reply; 72+ messages in thread
From: SeongJae Park @ 2025-05-23 17:47 UTC (permalink / raw)
To: Byungchul Park
Cc: SeongJae Park, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
edumazet, pabeni, saeedm, leon, ast, daniel, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, linux-rdma, bpf, vishal.moola
Hi Byungchul,
On Fri, 23 May 2025 12:25:51 +0900 Byungchul Park <byungchul@sk.com> wrote:
> The MM subsystem is trying to reduce struct page to a single pointer.
> The first step towards that is splitting struct page by its individual
> users, as has already been done with folio and slab. This patchset does
> that for netmem which is used for page pools.
I found that checkpatch.pl outputs some complaints about a few patches of
this series. Most of the warnings and errors look non-critical or even
unnecessary, but it seems some of them would be better reduced, in my opinion.
Thanks,
SJ
[...]
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-23 3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-23 17:16 ` kernel test robot
@ 2025-05-23 17:55 ` Mina Almasry
2025-05-26 1:37 ` Byungchul Park
1 sibling, 1 reply; 72+ messages in thread
From: Mina Almasry @ 2025-05-23 17:55 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>
> Now that all the users of the page pool members in struct page are gone,
> the members can be removed from struct page.
>
> However, since struct netmem_desc might still use the space in struct
> page, the size of struct netmem_desc should be checked, until struct
> netmem_desc has its own instance from slab, to avoid conflicting with
> other members within struct page.
>
> Remove the page pool members in struct page and add a static check for
> the size.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm_types.h | 11 -----------
> include/net/netmem.h | 28 +++++-----------------------
> 2 files changed, 5 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 873e820e1521..5a7864eb9d76 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -119,17 +119,6 @@ struct page {
> */
> unsigned long private;
> };
> - struct { /* page_pool used by netstack */
> - unsigned long _pp_mapping_pad;
> - /**
> - * @pp_magic: magic value to avoid recycling non
> - * page_pool allocated pages.
> - */
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> - };
> struct { /* Tail pages of compound page */
> unsigned long compound_head; /* Bit zero is set */
> };
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index c63a7e20f5f3..257c22398d7a 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -77,30 +77,12 @@ struct net_iov_area {
> unsigned long base_virtual;
> };
>
> -/* These fields in struct page are used by the page_pool and net stack:
> - *
> - * struct {
> - * unsigned long _pp_mapping_pad;
> - * unsigned long pp_magic;
> - * struct page_pool *pp;
> - * unsigned long dma_addr;
> - * atomic_long_t pp_ref_count;
> - * };
> - *
> - * We mirror the page_pool fields here so the page_pool can access these fields
> - * without worrying whether the underlying fields belong to a page or net_iov.
> - *
> - * The non-net stack fields of struct page are private to the mm stack and must
> - * never be mirrored to net_iov.
> +/* XXX: The page pool fields in struct page have been removed but they
> + * might still use the space in struct page. Thus, the size of struct
> + * netmem_desc should be under control until struct netmem_desc has its
> + * own instance from slab.
> */
> -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
> - static_assert(offsetof(struct page, pg) == \
> - offsetof(struct net_iov, iov))
> -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
> -NET_IOV_ASSERT_OFFSET(pp, pp);
> -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
> -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> -#undef NET_IOV_ASSERT_OFFSET
> +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>
Removing these asserts is actually a bit dangerous. Functions like
netmem_or_pp_magic() rely on the fact that the offsets are the same
between struct page and struct net_iov to access these fields without
worrying about the type of the netmem. What we do in these helpers is
we clear the least significant bit of the netmem, and then access
the field. This works only because we verified at build time that the
offset is the same.
I think we have 3 options here:
1. Keep the asserts as-is, then in the follow up patch where we remove
netmem_desc from struct page, we update the asserts to make sure
struct page and struct net_iov can grab the netmem_desc in a uniform
way.
2. We remove the asserts, but all the helpers that rely on
__netmem_clear_lsb need to be modified to do custom handling of
net_iov vs page. Something like:
static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
{
if (netmem_is_net_iov(netmem)
netmem_to_net_iov(netmem)->pp_magic |= pp_magic;
else
netmem_to_page(netmem)->pp_magic |= pp_magic;
}
Option #2 requires extra checks, which may affect the performance
reported by page_pool_bench_simple that I pointed you to before.
3. We could swap out all the individual asserts for one assert, if
both page and net_iov have a netmem_desc subfield. This will also need
to be reworked when netmem_desc is eventually moved out of struct page
and is slab allocated:
NET_IOV_ASSERT_OFFSET(netmem_desc, netmem_desc);
--
Thanks,
Mina
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-23 9:01 ` Toke Høiland-Jørgensen
@ 2025-05-26 0:56 ` Byungchul Park
0 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 0:56 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
On Fri, May 23, 2025 at 11:01:01AM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > To simplify struct page, the page pool members of struct page should be
> > moved elsewhere, allowing them to be removed from struct page.
> >
> > Introduce a network memory descriptor to store the members, struct
> > netmem_desc, reusing struct net_iov that already mirrored struct page.
> >
> > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/linux/mm_types.h | 2 +-
> > include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> > 2 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 56d07edd01f9..873e820e1521 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -120,13 +120,13 @@ struct page {
> > unsigned long private;
> > };
> > struct { /* page_pool used by netstack */
> > + unsigned long _pp_mapping_pad;
> > /**
> > * @pp_magic: magic value to avoid recycling non
> > * page_pool allocated pages.
> > */
> > unsigned long pp_magic;
> > struct page_pool *pp;
> > - unsigned long _pp_mapping_pad;
> > unsigned long dma_addr;
> > atomic_long_t pp_ref_count;
> > };
>
> The reason that field is called "_pp_mapping_pad" is that it's supposed
> to overlay the page->mapping field, so that none of the page_pool uses
> set a value here. Moving it breaks that assumption. Once struct
Right. I will fix it. Thanks.
Byungchul
> netmem_desc is completely decoupled from struct page this obviously
> doesn't matter, but I think it does today? At least, trying to use that
> field for the DMA index broke things, which is why we ended up with the
> bit-stuffing in pp_magic...
>
> -Toke
>
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-23 17:00 ` Mina Almasry
@ 2025-05-26 1:15 ` Byungchul Park
0 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 1:15 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Fri, May 23, 2025 at 10:00:55AM -0700, Mina Almasry wrote:
> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > To simplify struct page, the page pool members of struct page should be
> > moved elsewhere, allowing them to be removed from struct page.
> >
> > Introduce a network memory descriptor to store the members, struct
> > netmem_desc, reusing struct net_iov that already mirrored struct page.
> >
> > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/linux/mm_types.h | 2 +-
> > include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> > 2 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 56d07edd01f9..873e820e1521 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -120,13 +120,13 @@ struct page {
> > unsigned long private;
> > };
> > struct { /* page_pool used by netstack */
> > + unsigned long _pp_mapping_pad;
> > /**
> > * @pp_magic: magic value to avoid recycling non
> > * page_pool allocated pages.
> > */
> > unsigned long pp_magic;
> > struct page_pool *pp;
> > - unsigned long _pp_mapping_pad;
>
> Like Toke says, moving this to the beginning of this struct is not
> allowed. The first 3 bits of pp_magic are overlaid with page->lru so
> the pp makes sure not to use them. _pp_mapping_pad is overlaid with
> page->mapping, so the pp makes sure not to use it. AFAICT, this moving
> of _pp_mapping_pad is not necessary for this patch. I think just drop
> it.
Sure, I will. Thanks.
> > unsigned long dma_addr;
> > atomic_long_t pp_ref_count;
> > };
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index 386164fb9c18..08e9d76cdf14 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -31,12 +31,41 @@ enum net_iov_type {
> > };
> >
> > struct net_iov {
> > - enum net_iov_type type;
> > - unsigned long pp_magic;
> > - struct page_pool *pp;
> > - struct net_iov_area *owner;
> > - unsigned long dma_addr;
> > - atomic_long_t pp_ref_count;
> > + /*
> > + * XXX: Now that struct netmem_desc overlays on struct page,
> > + * struct_group_tagged() should cover all of them. However,
> > + * a separate struct netmem_desc should be declared and embedded,
> > + * once struct netmem_desc is no longer overlaid and instead has
> > + * its own instance from slab. The final form should be:
> > + *
> > + * struct netmem_desc {
> > + * unsigned long pp_magic;
> > + * struct page_pool *pp;
> > + * unsigned long dma_addr;
> > + * atomic_long_t pp_ref_count;
> > + * };
> > + *
> > + * struct net_iov {
> > + * enum net_iov_type type;
> > + * struct net_iov_area *owner;
> > + * struct netmem_desc;
> > + * };
> > + */
>
> I'm unclear on why moving to this format is a TODO for the future. Why
> isn't this state in the comment the state in the code? I think I gave
> the same code snippet on the RFC, but here again:
>
> struct netmem_desc {
> /**
> * @pp_magic: magic value to avoid recycling non
> * page_pool allocated pages.
> */
> unsigned long pp_magic;
> struct page_pool *pp;
> unsigned long _pp_mapping_pad;
> unsigned long dma_addr;
> atomic_long_t pp_ref_count;
> };
>
> (Roughly):
>
> struct page {
> ...
> struct { /* page_pool used by netstack */
> struct netmem_desc;
This is unnecessary since it will be removed shortly.
> };
> ...
> };
>
> struct net_iov {
> enum net_iov_type type;
> struct netmem_desc;
This requires a huge change in a single commit since all the code
referring to any of the page pool fields, struct net_iov, and maybe
io_uring(?) should be altered at once.
Plus, many more changes are required since struct netmem_desc would no
longer overlay struct page with what you suggest, which breaks the
assumption in the current code that struct netmem_desc overlays struct
page.
So at the least, this work should start once the code no longer needs
that assumption.
Thoughts?
Byungchul
> struct net_iov_area *owner;
> }
>
> AFAICT, this should work..?
>
> --
> Thanks,
> Mina
* Re: [PATCH 00/18] Split netmem from struct page
2025-05-23 17:47 ` SeongJae Park
@ 2025-05-26 1:16 ` Byungchul Park
0 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 1:16 UTC (permalink / raw)
To: SeongJae Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, toke, tariqt,
edumazet, pabeni, saeedm, leon, ast, daniel, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, linux-rdma, bpf, vishal.moola
On Fri, May 23, 2025 at 10:47:48AM -0700, SeongJae Park wrote:
> Hi Byungchul,
>
> On Fri, 23 May 2025 12:25:51 +0900 Byungchul Park <byungchul@sk.com> wrote:
>
> > The MM subsystem is trying to reduce struct page to a single pointer.
> > The first step towards that is splitting struct page by its individual
> > users, as has already been done with folio and slab. This patchset does
> > that for netmem which is used for page pools.
>
> I found that checkpatch.pl outputs some complaints about a few patches of
> this series. Most of the warnings and errors look non-critical or even
> unnecessary, but it seems some of them would be better reduced, in my opinion.
Thanks for the suggestion. I will check it.
Byungchul
>
>
> Thanks,
> SJ
>
> [...]
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-23 17:55 ` Mina Almasry
@ 2025-05-26 1:37 ` Byungchul Park
2025-05-26 16:58 ` Pavel Begunkov
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 1:37 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Fri, May 23, 2025 at 10:55:54AM -0700, Mina Almasry wrote:
> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > Now that all the users of the page pool members in struct page are gone,
> > the members can be removed from struct page.
> >
> > However, since struct netmem_desc might still use the space in struct
> > page, the size of struct netmem_desc should be checked, until struct
> > netmem_desc has its own instance from slab, to avoid conflicting with
> > other members within struct page.
> >
> > Remove the page pool members in struct page and add a static check for
> > the size.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/linux/mm_types.h | 11 -----------
> > include/net/netmem.h | 28 +++++-----------------------
> > 2 files changed, 5 insertions(+), 34 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 873e820e1521..5a7864eb9d76 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -119,17 +119,6 @@ struct page {
> > */
> > unsigned long private;
> > };
> > - struct { /* page_pool used by netstack */
> > - unsigned long _pp_mapping_pad;
> > - /**
> > - * @pp_magic: magic value to avoid recycling non
> > - * page_pool allocated pages.
> > - */
> > - unsigned long pp_magic;
> > - struct page_pool *pp;
> > - unsigned long dma_addr;
> > - atomic_long_t pp_ref_count;
> > - };
> > struct { /* Tail pages of compound page */
> > unsigned long compound_head; /* Bit zero is set */
> > };
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index c63a7e20f5f3..257c22398d7a 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -77,30 +77,12 @@ struct net_iov_area {
> > unsigned long base_virtual;
> > };
> >
> > -/* These fields in struct page are used by the page_pool and net stack:
> > - *
> > - * struct {
> > - * unsigned long _pp_mapping_pad;
> > - * unsigned long pp_magic;
> > - * struct page_pool *pp;
> > - * unsigned long dma_addr;
> > - * atomic_long_t pp_ref_count;
> > - * };
> > - *
> > - * We mirror the page_pool fields here so the page_pool can access these fields
> > - * without worrying whether the underlying fields belong to a page or net_iov.
> > - *
> > - * The non-net stack fields of struct page are private to the mm stack and must
> > - * never be mirrored to net_iov.
> > +/* XXX: The page pool fields in struct page have been removed but they
> > + * might still use the space in struct page. Thus, the size of struct
> > + * netmem_desc should be under control until struct netmem_desc has its
> > + * own instance from slab.
> > */
> > -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
> > - static_assert(offsetof(struct page, pg) == \
> > - offsetof(struct net_iov, iov))
> > -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
> > -NET_IOV_ASSERT_OFFSET(pp, pp);
> > -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
> > -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > -#undef NET_IOV_ASSERT_OFFSET
> > +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> >
>
> Removing these asserts is actually a bit dangerous. Functions like
> netmem_or_pp_magic() rely on the fact that the offsets are the same
> between struct page and struct net_iov to access these fields without
Worth noting this patch removes the page pool fields from struct page.
However, yes, I will keep the necessary assertions, with some changes
applied so that they still work after the page pool fields are removed, like:
NET_IOV_ASSERT_OFFSET(lru, pp_magic);
NET_IOV_ASSERT_OFFSET(mapping, _pp_mapping_pad);
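The kind of overlay assertion discussed here can be sketched in plain userspace C; the struct layouts below are simplified, illustrative stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct page: lru and mapping are the mm
 * fields whose storage slots the page pool would reuse. */
struct fake_page {
	unsigned long flags;
	void *lru_next;		/* first word of the lru list_head */
	void *lru_prev;		/* second word of the lru list_head */
	void *mapping;
};

/* Simplified stand-in for struct net_iov / netmem_desc. */
struct fake_net_iov {
	unsigned long type;
	unsigned long pp_magic;
	void *pp;
	unsigned long _pp_mapping_pad;
};

/* Build-time guarantee that a field in one view overlays the
 * intended field in the other; compilation fails on layout drift. */
#define IOV_ASSERT_OFFSET(pg, iov)				\
	static_assert(offsetof(struct fake_page, pg) ==		\
		      offsetof(struct fake_net_iov, iov),	\
		      "page/net_iov layout mismatch")

IOV_ASSERT_OFFSET(lru_next, pp_magic);
IOV_ASSERT_OFFSET(mapping, _pp_mapping_pad);
```

The check costs nothing at runtime; a layout change on either side simply breaks the build.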
> worrying about the type of the netmem. What we do in these helpers is
> we clear the least significant bit of the netmem, and then access
> the field. This works only because we verified at build time that the
> offset is the same.
>
> I think we have 3 options here:
>
> 1. Keep the asserts as-is, then in the follow up patch where we remove
> netmem_desc from struct page, we update the asserts to make sure
> struct page and struct net_iov can grab the netmem_desc in a uniform
Ah. It's worth noting that I'm removing the page pool fields entirely
from struct page, instead of placing a place-holder as I did in the
RFC, as Matthew requested.
> way.
>
> 2. We remove the asserts, but all the helpers that rely on
> __netmem_clear_lsb need to be modified to do custom handling of
> net_iov vs page. Something like:
>
> static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
> {
> if (netmem_is_net_iov(netmem))
> netmem_to_net_iov(netmem)->pp_magic |= pp_magic;
> else
> netmem_to_page(netmem)->pp_magic |= pp_magic;
struct page should not have a pp_magic field once the page pool fields
are gone.
Byungchul
> }
>
> Option #2 requires extra checks, which may affect the performance
> reported by page_pool_bench_simple that I pointed you to before.
>
> 3. We could swap out all the individual asserts for one assert, if
> both page and net_iov have a netmem_desc subfield. This will also need
> to be reworked when netmem_desc is eventually moved out of struct page
> and is slab allocated:
>
> NET_IOV_ASSERT_OFFSET(netmem_desc, netmem_desc);
>
> --
> Thanks,
> Mina
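The helpers under discussion depend on the low-bit tagging trick: net_iov references carry bit 0 set, pages don't, and clearing the bit unconditionally yields a usable pointer only while both layouts agree. A minimal userspace sketch of the idea (names are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

#define NET_IOV_TAG 1UL

/* Common prefix shared by both backing types in this sketch. */
struct pool_desc {
	unsigned long pp_magic;
	long pp_ref_count;
};

typedef uintptr_t netmem_ref;

/* Pointers to these structs are at least word-aligned, so bit 0 is
 * free to encode "this is a net_iov rather than a page". */
static netmem_ref make_netmem(struct pool_desc *d, int is_net_iov)
{
	return (uintptr_t)d | (is_net_iov ? NET_IOV_TAG : 0);
}

/* Clear the tag without branching; valid for both variants only
 * because the fields sit at identical offsets in each. */
static struct pool_desc *netmem_clear_lsb(netmem_ref ref)
{
	return (struct pool_desc *)(ref & ~NET_IOV_TAG);
}
```

Option #2 above replaces the branchless clear with an explicit `is_net_iov` test per access, which is the extra cost Mina is pointing at.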
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-23 17:21 ` Mina Almasry
@ 2025-05-26 2:23 ` Byungchul Park
2025-05-26 2:36 ` Byungchul Park
2025-05-28 7:51 ` Pavel Begunkov
0 siblings, 2 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 2:23 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > To simplify struct page, the effort to separate its own descriptor from
> > struct page is required and the work for page pool is ongoing.
> >
> > To achieve that, all the code should avoid accessing page pool members
> > of struct page directly, but use safe APIs for the purpose.
> >
> > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> > page_pool_page_is_pp().
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/linux/mm.h | 5 +----
> > net/core/page_pool.c | 5 +++++
> > 2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 8dc012e84033..3f7c80fb73ce 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >
> > #ifdef CONFIG_PAGE_POOL
> > -static inline bool page_pool_page_is_pp(struct page *page)
> > -{
> > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > -}
>
> I vote for keeping this function as-is (do not convert it to netmem),
> and instead modify it to access page->netmem_desc->pp_magic.
Once the page pool fields are removed from struct page, struct page will
have neither struct netmem_desc nor the fields.
So it's inevitable to cast it to struct netmem_desc in order to refer to
pp_magic. Again, pp_magic is no longer associated with struct page.
Thoughts?
Byungchul
> The reason is that page_pool_is_pp() is today only called from code
> paths we have a page and not a netmem. Casting the page to a netmem
> which will cast it back to a page pretty much is a waste of cpu
> cycles. The page_pool is a place where we count cycles and we have
> benchmarks to verify performance (I pointed you to
> page_pool_bench_simple on the RFC).
>
> So lets avoid the cpu cycles if possible.
>
> --
> Thanks,
> Mina
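For context, the check being moved is a plain mask-and-compare on pp_magic; it can be sketched on its own, with illustrative mask and signature values (the real PP_DMA_INDEX_MASK and PP_SIGNATURE are defined differently in the kernel):

```c
#include <assert.h>

/* Illustrative values only; the kernel derives these differently. */
#define PP_DMA_INDEX_MASK	0x00ff0000UL
#define PP_MAGIC_MASK		(~(PP_DMA_INDEX_MASK | 0x3UL))
#define PP_SIGNATURE		0x40UL

/* A page belongs to a page pool iff pp_magic matches the signature
 * once the DMA-index bits and the two low tag bits are masked out. */
static int page_is_pp(unsigned long pp_magic)
{
	return (pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
}
```

Whatever field ends up holding the value, the test itself stays a single AND plus compare, which is why it is cheap enough for mm's hot paths.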
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 2:23 ` Byungchul Park
@ 2025-05-26 2:36 ` Byungchul Park
2025-05-26 8:40 ` Toke Høiland-Jørgensen
2025-05-28 7:51 ` Pavel Begunkov
1 sibling, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 2:36 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > >
> > > To simplify struct page, the effort to separate its own descriptor from
> > > struct page is required and the work for page pool is ongoing.
> > >
> > > To achieve that, all the code should avoid accessing page pool members
> > > of struct page directly, but use safe APIs for the purpose.
> > >
> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> > > page_pool_page_is_pp().
> > >
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > ---
> > > include/linux/mm.h | 5 +----
> > > net/core/page_pool.c | 5 +++++
> > > 2 files changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 8dc012e84033..3f7c80fb73ce 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > >
> > > #ifdef CONFIG_PAGE_POOL
> > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > -{
> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > -}
> >
> > I vote for keeping this function as-is (do not convert it to netmem),
> > and instead modify it to access page->netmem_desc->pp_magic.
>
> Once the page pool fields are removed from struct page, struct page will
> have neither struct netmem_desc nor the fields.
>
> So it's inevitable to cast it to struct netmem_desc in order to refer to
> pp_magic. Again, pp_magic is no longer associated with struct page.
Options that come to mind are:
1. use the lru field of struct page instead, with an appropriate
comment, but it looks so ugly.
2. instead of a full word for the magic, use a bit of flags or use
the private field for that purpose.
3. do not check the magic number for page pool.
4. more?
Byungchul
>
> Thoughts?
>
> Byungchul
>
> > The reason is that page_pool_is_pp() is today only called from code
> > paths we have a page and not a netmem. Casting the page to a netmem
> > which will cast it back to a page pretty much is a waste of cpu
> > cycles. The page_pool is a place where we count cycles and we have
> > benchmarks to verify performance (I pointed you to
> > page_pool_bench_simple on the RFC).
> >
> > So lets avoid the cpu cycles if possible.
> >
> > --
> > Thanks,
> > Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool
2025-05-23 17:13 ` Mina Almasry
@ 2025-05-26 3:08 ` Byungchul Park
2025-05-26 8:12 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 3:08 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Fri, May 23, 2025 at 10:13:27AM -0700, Mina Almasry wrote:
> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > To simplify struct page, the effort to separate its own descriptor from
> > struct page is required and the work for page pool is ongoing.
> >
> > Use netmem descriptor and APIs for page pool in mlx5 code.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
>
> Just FYI, you're racing with Nvidia adding netmem support to mlx5 as
> well. Probably they prefer to take their patch. So try to rebase on
> top of that maybe? Up to you.
>
> https://lore.kernel.org/netdev/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
>
> I also wonder if you should send this through the net-next tree, since
> it seem to race with changes that are going to land in net-next soon.
> Up to you, I don't have any strong preference. But if you do send to
> net-next, there are a bunch of extra rules to keep in mind:
>
> https://docs.kernel.org/process/maintainer-netdev.html
I can send it to net-next, but is it okay even if it's more than 15 patches?
Byungchul
>
> --
> Thanks,
> Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool
2025-05-26 3:08 ` Byungchul Park
@ 2025-05-26 8:12 ` Byungchul Park
2025-05-26 18:00 ` Mina Almasry
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 8:12 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 12:08:58PM +0900, Byungchul Park wrote:
> On Fri, May 23, 2025 at 10:13:27AM -0700, Mina Almasry wrote:
> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > >
> > > To simplify struct page, the effort to separate its own descriptor from
> > > struct page is required and the work for page pool is ongoing.
> > >
> > > Use netmem descriptor and APIs for page pool in mlx5 code.
> > >
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> >
> > Just FYI, you're racing with Nvidia adding netmem support to mlx5 as
> > well. Probably they prefer to take their patch. So try to rebase on
> > top of that maybe? Up to you.
> >
> > https://lore.kernel.org/netdev/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
> >
> > I also wonder if you should send this through the net-next tree, since
> > it seem to race with changes that are going to land in net-next soon.
> > Up to you, I don't have any strong preference. But if you do send to
> > net-next, there are a bunch of extra rules to keep in mind:
> >
> > https://docs.kernel.org/process/maintainer-netdev.html
It looks like I have to wait for net-next to reopen, maybe until the
next -rc1 is released. Right? However, I can see some patches being
posted now. Hm.
Byungchul
>
> I can send to net-next, but is it okay even if it's more than 15 patches?
>
> Byungchul
> >
> > --
> > Thanks,
> > Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 2:36 ` Byungchul Park
@ 2025-05-26 8:40 ` Toke Høiland-Jørgensen
2025-05-26 9:43 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-26 8:40 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, tariqt, edumazet, pabeni, saeedm,
leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
>> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>> > >
>> > > To simplify struct page, the effort to separate its own descriptor from
>> > > struct page is required and the work for page pool is ongoing.
>> > >
>> > > To achieve that, all the code should avoid accessing page pool members
>> > > of struct page directly, but use safe APIs for the purpose.
>> > >
>> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
>> > > page_pool_page_is_pp().
>> > >
>> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
>> > > ---
>> > > include/linux/mm.h | 5 +----
>> > > net/core/page_pool.c | 5 +++++
>> > > 2 files changed, 6 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
>> > > index 8dc012e84033..3f7c80fb73ce 100644
>> > > --- a/include/linux/mm.h
>> > > +++ b/include/linux/mm.h
>> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>> > >
>> > > #ifdef CONFIG_PAGE_POOL
>> > > -static inline bool page_pool_page_is_pp(struct page *page)
>> > > -{
>> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>> > > -}
>> >
>> > I vote for keeping this function as-is (do not convert it to netmem),
>> > and instead modify it to access page->netmem_desc->pp_magic.
>>
>> Once the page pool fields are removed from struct page, struct page will
>> have neither struct netmem_desc nor the fields.
>>
>> So it's inevitable to cast it to struct netmem_desc in order to refer to
>> pp_magic. Again, pp_magic is no longer associated with struct page.
>
> Options that come across my mind are:
>
> 1. use lru field of struct page instead, with appropriate comment but
> looks so ugly.
> 2. instead of a full word for the magic, use a bit of flags or use
> the private field for that purpose.
> 3. do not check magic number for page pool.
> 4. more?
I'm not sure I understand Mina's concern about CPU cycles from casting.
The casting is a compile-time thing, which shouldn't affect run-time
performance as long as the check is kept as an inline function. So it's
"just" a matter of exposing struct netmem_desc to mm.h so it can use it
in the inline definition. Unless I'm missing something?
-Toke
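Toke's point can be illustrated directly: a pointer cast between struct types is a compile-time reinterpretation that leaves the address untouched, so an inline helper built around it adds no runtime work. A minimal sketch with invented types:

```c
#include <assert.h>

struct page_like {
	unsigned long flags;
	unsigned long pp_magic;
};

struct desc_like {
	unsigned long flags;
	unsigned long pp_magic;
};

/* The cast emits no instructions; only the static type changes. The
 * two layouts must match for the dereference to stay meaningful. */
static inline struct desc_like *page_to_desc(struct page_like *p)
{
	return (struct desc_like *)p;
}
```

Any real cost would come from an added runtime branch (page vs net_iov), not from the cast itself.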
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 8:40 ` Toke Høiland-Jørgensen
@ 2025-05-26 9:43 ` Byungchul Park
2025-05-26 9:54 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 9:43 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
On Mon, May 26, 2025 at 10:40:30AM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
> >> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> >> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >> > >
> >> > > To simplify struct page, the effort to separate its own descriptor from
> >> > > struct page is required and the work for page pool is ongoing.
> >> > >
> >> > > To achieve that, all the code should avoid accessing page pool members
> >> > > of struct page directly, but use safe APIs for the purpose.
> >> > >
> >> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> >> > > page_pool_page_is_pp().
> >> > >
> >> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> >> > > ---
> >> > > include/linux/mm.h | 5 +----
> >> > > net/core/page_pool.c | 5 +++++
> >> > > 2 files changed, 6 insertions(+), 4 deletions(-)
> >> > >
> >> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> > > index 8dc012e84033..3f7c80fb73ce 100644
> >> > > --- a/include/linux/mm.h
> >> > > +++ b/include/linux/mm.h
> >> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >> > >
> >> > > #ifdef CONFIG_PAGE_POOL
> >> > > -static inline bool page_pool_page_is_pp(struct page *page)
> >> > > -{
> >> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >> > > -}
> >> >
> >> > I vote for keeping this function as-is (do not convert it to netmem),
> >> > and instead modify it to access page->netmem_desc->pp_magic.
> >>
> >> Once the page pool fields are removed from struct page, struct page will
> >> have neither struct netmem_desc nor the fields.
> >>
> >> So it's inevitable to cast it to struct netmem_desc in order to refer to
> >> pp_magic. Again, pp_magic is no longer associated with struct page.
> >
> > Options that come across my mind are:
> >
> > 1. use lru field of struct page instead, with appropriate comment but
> > looks so ugly.
> > 2. instead of a full word for the magic, use a bit of flags or use
> > the private field for that purpose.
> > 3. do not check magic number for page pool.
> > 4. more?
>
> I'm not sure I understand Mina's concern about CPU cycles from casting.
> The casting is a compile-time thing, which shouldn't affect run-time
I didn't mention it but yes.
> performance as long as the check is kept as an inline function. So it's
> "just" a matter of exposing struct netmem_desc to mm.h so it can use it
Then we should expose struct net_iov as well, but I'm afraid it looks
weird. Do you think it's okay?
As I said in another thread, embedding struct netmem_desc into struct
net_iov will require a huge single patch altering all the users of
struct net_iov.
Byungchul
> in the inline definition. Unless I'm missing something?
>
> -Toke
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 9:43 ` Byungchul Park
@ 2025-05-26 9:54 ` Toke Høiland-Jørgensen
2025-05-26 10:01 ` Byungchul Park
2025-05-28 5:14 ` Byungchul Park
0 siblings, 2 replies; 72+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-26 9:54 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> On Mon, May 26, 2025 at 10:40:30AM +0200, Toke Høiland-Jørgensen wrote:
>> Byungchul Park <byungchul@sk.com> writes:
>>
>> > On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
>> >> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>> >> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>> >> > >
>> >> > > To simplify struct page, the effort to separate its own descriptor from
>> >> > > struct page is required and the work for page pool is ongoing.
>> >> > >
>> >> > > To achieve that, all the code should avoid accessing page pool members
>> >> > > of struct page directly, but use safe APIs for the purpose.
>> >> > >
>> >> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
>> >> > > page_pool_page_is_pp().
>> >> > >
>> >> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
>> >> > > ---
>> >> > > include/linux/mm.h | 5 +----
>> >> > > net/core/page_pool.c | 5 +++++
>> >> > > 2 files changed, 6 insertions(+), 4 deletions(-)
>> >> > >
>> >> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
>> >> > > index 8dc012e84033..3f7c80fb73ce 100644
>> >> > > --- a/include/linux/mm.h
>> >> > > +++ b/include/linux/mm.h
>> >> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> >> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>> >> > >
>> >> > > #ifdef CONFIG_PAGE_POOL
>> >> > > -static inline bool page_pool_page_is_pp(struct page *page)
>> >> > > -{
>> >> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>> >> > > -}
>> >> >
>> >> > I vote for keeping this function as-is (do not convert it to netmem),
>> >> > and instead modify it to access page->netmem_desc->pp_magic.
>> >>
>> >> Once the page pool fields are removed from struct page, struct page will
>> >> have neither struct netmem_desc nor the fields.
>> >>
>> >> So it's inevitable to cast it to struct netmem_desc in order to refer to
>> >> pp_magic. Again, pp_magic is no longer associated with struct page.
>> >
>> > Options that come across my mind are:
>> >
>> > 1. use lru field of struct page instead, with appropriate comment but
>> > looks so ugly.
>> > 2. instead of a full word for the magic, use a bit of flags or use
>> > the private field for that purpose.
>> > 3. do not check magic number for page pool.
>> > 4. more?
>>
>> I'm not sure I understand Mina's concern about CPU cycles from casting.
>> The casting is a compile-time thing, which shouldn't affect run-time
>
> I didn't mention it but yes.
>
>> performance as long as the check is kept as an inline function. So it's
>> "just" a matter of exposing struct netmem_desc to mm.h so it can use it
>
> Then.. we should expose net_iov as well, but I'm afraid it looks weird.
> Do you think it's okay?
Well, it'll be ugly, I grant you that :)
Hmm, so another idea could be to add the pp_magic field to the inner
union that the lru field is in, and keep the page_pool_page_is_pp()
as-is. Then add an assert for offsetof(struct page, pp_magic) ==
offsetof(netmem_desc, pp_magic) on the netmem side, which can be removed
once the two structs no longer shadow each other?
That way you can still get rid of the embedded page_pool struct in
struct page, and the pp_magic field will just be a transition thing
until things are completely separated...
-Toke
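The transition Toke describes can be sketched: pp_magic joins the union that holds lru, page_pool_page_is_pp() keeps reading page->pp_magic, and one offset assert ties the shadowing together until the structs are fully split. The layouts below are illustrative stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Transitional stand-in for struct page: pp_magic shares storage
 * with the lru list head instead of having its own slot. */
struct fake_page {
	unsigned long flags;
	union {
		struct {
			void *next;
			void *prev;
		} lru;
		unsigned long pp_magic;
	};
};

/* Stand-in for netmem_desc while it still shadows struct page. */
struct fake_netmem_desc {
	unsigned long _pad;
	unsigned long pp_magic;
};

/* Removable once the two structs no longer shadow each other. */
static_assert(offsetof(struct fake_page, pp_magic) ==
	      offsetof(struct fake_netmem_desc, pp_magic),
	      "pp_magic must overlay between the two views");
```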
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 9:54 ` Toke Høiland-Jørgensen
@ 2025-05-26 10:01 ` Byungchul Park
2025-05-28 5:14 ` Byungchul Park
1 sibling, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-26 10:01 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
On Mon, May 26, 2025 at 11:54:33AM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > On Mon, May 26, 2025 at 10:40:30AM +0200, Toke Høiland-Jørgensen wrote:
> >> Byungchul Park <byungchul@sk.com> writes:
> >>
> >> > On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
> >> >> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> >> >> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >> >> > >
> >> >> > > To simplify struct page, the effort to separate its own descriptor from
> >> >> > > struct page is required and the work for page pool is ongoing.
> >> >> > >
> >> >> > > To achieve that, all the code should avoid accessing page pool members
> >> >> > > of struct page directly, but use safe APIs for the purpose.
> >> >> > >
> >> >> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> >> >> > > page_pool_page_is_pp().
> >> >> > >
> >> >> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> >> >> > > ---
> >> >> > > include/linux/mm.h | 5 +----
> >> >> > > net/core/page_pool.c | 5 +++++
> >> >> > > 2 files changed, 6 insertions(+), 4 deletions(-)
> >> >> > >
> >> >> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> >> > > index 8dc012e84033..3f7c80fb73ce 100644
> >> >> > > --- a/include/linux/mm.h
> >> >> > > +++ b/include/linux/mm.h
> >> >> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >> >> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >> >> > >
> >> >> > > #ifdef CONFIG_PAGE_POOL
> >> >> > > -static inline bool page_pool_page_is_pp(struct page *page)
> >> >> > > -{
> >> >> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >> >> > > -}
> >> >> >
> >> >> > I vote for keeping this function as-is (do not convert it to netmem),
> >> >> > and instead modify it to access page->netmem_desc->pp_magic.
> >> >>
> >> >> Once the page pool fields are removed from struct page, struct page will
> >> >> have neither struct netmem_desc nor the fields.
> >> >>
> >> >> So it's inevitable to cast it to struct netmem_desc in order to refer to
> >> >> pp_magic. Again, pp_magic is no longer associated with struct page.
> >> >
> >> > Options that come across my mind are:
> >> >
> >> > 1. use lru field of struct page instead, with appropriate comment but
> >> > looks so ugly.
> >> > 2. instead of a full word for the magic, use a bit of flags or use
> >> > the private field for that purpose.
> >> > 3. do not check magic number for page pool.
> >> > 4. more?
> >>
> >> I'm not sure I understand Mina's concern about CPU cycles from casting.
> >> The casting is a compile-time thing, which shouldn't affect run-time
> >
> > I didn't mention it but yes.
> >
> >> performance as long as the check is kept as an inline function. So it's
> >> "just" a matter of exposing struct netmem_desc to mm.h so it can use it
> >
> > Then.. we should expose net_iov as well, but I'm afraid it looks weird.
> > Do you think it's okay?
>
> Well, it'll be ugly, I grant you that :)
>
> Hmm, so another idea could be to add the pp_magic field to the inner
> union that the lru field is in, and keep the page_pool_page_is_pp()
> as-is. Then add an assert for offsetof(struct page, pp_magic) ==
> offsetof(netmem_desc, pp_magic) on the netmem side, which can be removed
> once the two structs no longer shadow each other?
It would work, but still that's what I wanted to avoid.
To Matthew and mm folks,
Does it look okay?
Byungchul
>
> That way you can still get rid of the embedded page_pool struct in
> struct page, and the pp_magic field will just be a transition thing
> until things are completely separated...
>
> -Toke
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-26 1:37 ` Byungchul Park
@ 2025-05-26 16:58 ` Pavel Begunkov
2025-05-26 17:33 ` Mina Almasry
2025-05-27 1:02 ` Byungchul Park
0 siblings, 2 replies; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-26 16:58 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, toke, tariqt, edumazet, pabeni, saeedm, leon, ast,
daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
On 5/26/25 02:37, Byungchul Park wrote:
> On Fri, May 23, 2025 at 10:55:54AM -0700, Mina Almasry wrote:
>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> Now that all the users of the page pool members in struct page have been
>>> gone, the members can be removed from struct page.
>>>
>>> However, since struct netmem_desc might still use the space in struct
>>> page, the size of struct netmem_desc should be checked, until struct
>>> netmem_desc has its own instance from slab, to avoid conflicting with
>>> other members within struct page.
>>>
>>> Remove the page pool members in struct page and add a static checker for
>>> the size.
>>>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> ---
>>> include/linux/mm_types.h | 11 -----------
>>> include/net/netmem.h | 28 +++++-----------------------
>>> 2 files changed, 5 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 873e820e1521..5a7864eb9d76 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -119,17 +119,6 @@ struct page {
>>> */
>>> unsigned long private;
>>> };
>>> - struct { /* page_pool used by netstack */
>>> - unsigned long _pp_mapping_pad;
>>> - /**
>>> - * @pp_magic: magic value to avoid recycling non
>>> - * page_pool allocated pages.
>>> - */
>>> - unsigned long pp_magic;
>>> - struct page_pool *pp;
>>> - unsigned long dma_addr;
>>> - atomic_long_t pp_ref_count;
>>> - };
>>> struct { /* Tail pages of compound page */
>>> unsigned long compound_head; /* Bit zero is set */
>>> };
>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>> index c63a7e20f5f3..257c22398d7a 100644
>>> --- a/include/net/netmem.h
>>> +++ b/include/net/netmem.h
>>> @@ -77,30 +77,12 @@ struct net_iov_area {
>>> unsigned long base_virtual;
>>> };
>>>
>>> -/* These fields in struct page are used by the page_pool and net stack:
>>> - *
>>> - * struct {
>>> - * unsigned long _pp_mapping_pad;
>>> - * unsigned long pp_magic;
>>> - * struct page_pool *pp;
>>> - * unsigned long dma_addr;
>>> - * atomic_long_t pp_ref_count;
>>> - * };
>>> - *
>>> - * We mirror the page_pool fields here so the page_pool can access these fields
>>> - * without worrying whether the underlying fields belong to a page or net_iov.
>>> - *
>>> - * The non-net stack fields of struct page are private to the mm stack and must
>>> - * never be mirrored to net_iov.
>>> +/* XXX: The page pool fields in struct page have been removed but they
>>> + * might still use the space in struct page. Thus, the size of struct
>>> + * netmem_desc should be under control until struct netmem_desc has its
>>> + * own instance from slab.
>>> */
>>> -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
>>> - static_assert(offsetof(struct page, pg) == \
>>> - offsetof(struct net_iov, iov))
>>> -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
>>> -NET_IOV_ASSERT_OFFSET(pp, pp);
>>> -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
>>> -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>> -#undef NET_IOV_ASSERT_OFFSET
>>> +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>
>>
>> Removing these asserts is actually a bit dangerous. Functions like
>> netmem_or_pp_magic() rely on the fact that the offsets are the same
>> between struct page and struct net_iov to access these fields without
>
> Worth noting this patch removes the page pool fields from struct page.
static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
{
return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
}
static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
{
return &__netmem_clear_lsb(netmem)->pp_ref_count;
}
That's a snippet of code after applying the series. So, let's say we
take a page; it's cast to netmem, then the netmem (as it was before)
is cast to net_iov. Before, it relied on net_iov and the pp part of
the page having the same layout, which was checked by static asserts,
but now, unless I'm mistaken, it's aligned in exactly the same way but
points to a seemingly random offset of the page. We should not be doing
that.
Just to be clear, I think casting pages to struct net_iov *, as it
currently is, is quite ugly, but that's something netmem_desc and this
effort can help with.
What you likely want to do is:
Patch 1:
struct page {
unsigned long flags;
union {
struct_group_tagged(netmem_desc, netmem_desc) {
// same layout as before
...
struct page_pool *pp;
...
};
}
}
struct net_iov {
unsigned long flags_padding;
union {
struct {
// same layout as in page + build asserts;
...
struct page_pool *pp;
...
};
struct netmem_desc desc;
};
};
struct netmem_desc *page_to_netmem_desc(struct page *page)
{
return &page->netmem_desc;
}
struct netmem_desc *netmem_to_desc(netmem_t netmem)
{
if (netmem_is_page(netmem))
return page_to_netmem_desc(netmem_to_page(netmem));
return &netmem_to_niov(netmem)->desc;
}
The compiler should be able to optimise the branch in netmem_to_desc(),
but we might need to help it a bit.
Then, patch 2 ... N convert page pool and everyone else accessing
those page fields directly to netmem_to_desc / etc.
And the final patch replaces the struct group in the page with a
new field:
struct netmem_desc {
struct page_pool *pp;
...
};
struct page {
unsigned long flags_padding;
union {
struct netmem_desc desc;
...
};
};
net_iov will drop its union in a later series to avoid conflicts.
btw, I don't think you need to convert page pool to netmem for this
to happen, so that can be done in a separate unrelated series. It's
18 patches, and netdev usually requires it to be no more than 15.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-26 16:58 ` Pavel Begunkov
@ 2025-05-26 17:33 ` Mina Almasry
2025-05-27 1:02 ` Byungchul Park
1 sibling, 0 replies; 72+ messages in thread
From: Mina Almasry @ 2025-05-26 17:33 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 9:57 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >> Removing these asserts is actually a bit dangerous. Functions like
> >> netmem_or_pp_magic() rely on the fact that the offsets are the same
> >> between struct page and struct net_iov to access these fields without
> >
> > Worth noting this patch removes the page pool fields from struct page.
>
> static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> {
> return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> }
>
> static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
> {
> return &__netmem_clear_lsb(netmem)->pp_ref_count;
> }
>
> That's a snippet of code after applying the series. So, let's say we
> take a page, it's casted to netmem, then the netmem (as it was before)
> is casted to net_iov. Before it relied on net_iov and the pp's part of
> the page having the same layout, which was checked by static asserts,
> but now, unless I'm mistaken, it's aligned in the exactly same way but
> points to a seemingly random offset of the page. We should not be doing
> that.
>
Agreed.
> Just to be clear, I think casting pages to struct net_iov *, as it
> currently is, is quite ugly, but that's something netmem_desc and this
> effort can help with.
>
Agreed it's quite ugly. It was done in the name of optimizing the page
pool benchmark to the extreme as far as I can remember. We could use
page pool benchmark numbers on this series to make sure these new
changes aren't regressing the fast path.
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool
2025-05-26 8:12 ` Byungchul Park
@ 2025-05-26 18:00 ` Mina Almasry
0 siblings, 0 replies; 72+ messages in thread
From: Mina Almasry @ 2025-05-26 18:00 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 1:12 AM Byungchul Park <byungchul@sk.com> wrote:
>
> On Mon, May 26, 2025 at 12:08:58PM +0900, Byungchul Park wrote:
> > On Fri, May 23, 2025 at 10:13:27AM -0700, Mina Almasry wrote:
> > > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > > >
> > > > To simplify struct page, the effort to separate its own descriptor from
> > > > struct page is required and the work for page pool is ongoing.
> > > >
> > > > Use netmem descriptor and APIs for page pool in mlx5 code.
> > > >
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > >
> > > Just FYI, you're racing with Nvidia adding netmem support to mlx5 as
> > > well. Probably they prefer to take their patch. So try to rebase on
> > > top of that maybe? Up to you.
> > >
> > > https://lore.kernel.org/netdev/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
> > >
> > > I also wonder if you should send this through the net-next tree, since
> > > it seem to race with changes that are going to land in net-next soon.
> > > Up to you, I don't have any strong preference. But if you do send to
> > > net-next, there are a bunch of extra rules to keep in mind:
> > >
> > > https://docs.kernel.org/process/maintainer-netdev.html
>
> It looks like I have to wait for net-next to reopen, maybe until the
> next -rc1 is released.. Right? However, I can see some patches posted now.
> Hm..
>
We try to stick to 15 patches, but I've seen up to 20 sometimes get reviewed.
net-next just closed unfortunately, so yes you'll need to wait until
it reopens. RFCs are welcome in the meantime, and if you want to stick
to mm-unstable that's fine by me too, FWIW.
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-26 16:58 ` Pavel Begunkov
2025-05-26 17:33 ` Mina Almasry
@ 2025-05-27 1:02 ` Byungchul Park
2025-05-27 1:31 ` Byungchul Park
2025-05-27 5:30 ` Pavel Begunkov
1 sibling, 2 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-27 1:02 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 05:58:10PM +0100, Pavel Begunkov wrote:
> On 5/26/25 02:37, Byungchul Park wrote:
> > On Fri, May 23, 2025 at 10:55:54AM -0700, Mina Almasry wrote:
> > > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > > >
> > > > Now that all the users of the page pool members in struct page have been
> > > > gone, the members can be removed from struct page.
> > > >
> > > > However, since struct netmem_desc might still use the space in struct
> > > > page, the size of struct netmem_desc should be checked, until struct
> > > > netmem_desc has its own instance from slab, to avoid conflicting with
> > > > other members within struct page.
> > > >
> > > > Remove the page pool members in struct page and add a static checker for
> > > > the size.
> > > >
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > ---
> > > > include/linux/mm_types.h | 11 -----------
> > > > include/net/netmem.h | 28 +++++-----------------------
> > > > 2 files changed, 5 insertions(+), 34 deletions(-)
> > > >
> > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > > index 873e820e1521..5a7864eb9d76 100644
> > > > --- a/include/linux/mm_types.h
> > > > +++ b/include/linux/mm_types.h
> > > > @@ -119,17 +119,6 @@ struct page {
> > > > */
> > > > unsigned long private;
> > > > };
> > > > - struct { /* page_pool used by netstack */
> > > > - unsigned long _pp_mapping_pad;
> > > > - /**
> > > > - * @pp_magic: magic value to avoid recycling non
> > > > - * page_pool allocated pages.
> > > > - */
> > > > - unsigned long pp_magic;
> > > > - struct page_pool *pp;
> > > > - unsigned long dma_addr;
> > > > - atomic_long_t pp_ref_count;
> > > > - };
> > > > struct { /* Tail pages of compound page */
> > > > unsigned long compound_head; /* Bit zero is set */
> > > > };
> > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > index c63a7e20f5f3..257c22398d7a 100644
> > > > --- a/include/net/netmem.h
> > > > +++ b/include/net/netmem.h
> > > > @@ -77,30 +77,12 @@ struct net_iov_area {
> > > > unsigned long base_virtual;
> > > > };
> > > >
> > > > -/* These fields in struct page are used by the page_pool and net stack:
> > > > - *
> > > > - * struct {
> > > > - * unsigned long _pp_mapping_pad;
> > > > - * unsigned long pp_magic;
> > > > - * struct page_pool *pp;
> > > > - * unsigned long dma_addr;
> > > > - * atomic_long_t pp_ref_count;
> > > > - * };
> > > > - *
> > > > - * We mirror the page_pool fields here so the page_pool can access these fields
> > > > - * without worrying whether the underlying fields belong to a page or net_iov.
> > > > - *
> > > > - * The non-net stack fields of struct page are private to the mm stack and must
> > > > - * never be mirrored to net_iov.
> > > > +/* XXX: The page pool fields in struct page have been removed but they
> > > > + * might still use the space in struct page. Thus, the size of struct
> > > > + * netmem_desc should be under control until struct netmem_desc has its
> > > > + * own instance from slab.
> > > > */
> > > > -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
> > > > - static_assert(offsetof(struct page, pg) == \
> > > > - offsetof(struct net_iov, iov))
> > > > -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
> > > > -NET_IOV_ASSERT_OFFSET(pp, pp);
> > > > -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
> > > > -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > > > -#undef NET_IOV_ASSERT_OFFSET
> > > > +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
> > > >
> > >
> > > Removing these asserts is actually a bit dangerous. Functions like
> > > netmem_or_pp_magic() rely on the fact that the offsets are the same
> > > between struct page and struct net_iov to access these fields without
> >
> > Worth noting this patch removes the page pool fields from struct page.
>
> static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> {
> return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> }
>
> static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
> {
> return &__netmem_clear_lsb(netmem)->pp_ref_count;
> }
>
> That's a snippet of code after applying the series. So, let's say we
> take a page, it's casted to netmem, then the netmem (as it was before)
> is casted to net_iov. Before it relied on net_iov and the pp's part of
> the page having the same layout, which was checked by static asserts,
> but now, unless I'm mistaken, it's aligned in the exactly same way but
> points to a seemingly random offset of the page. We should not be doing
> that.
I mentioned this in another thread. My bad. I will fix it.
> Just to be clear, I think casting pages to struct net_iov *, as it
> currently is, is quite ugly, but that's something netmem_desc and this
> effort can help with.
>
> What you likely want to do is:
>
> Patch 1:
>
> struct page {
> unsigned long flags;
> union {
> struct_group_tagged(netmem_desc, netmem_desc) {
> // same layout as before
> ...
> struct page_pool *pp;
> ...
> };
This part will be gone shortly. The problems come from the absence of
this part.
> }
> }
>
> struct net_iov {
> unsigned long flags_padding;
> union {
> struct {
> // same layout as in page + build asserts;
> ...
> struct page_pool *pp;
> ...
> };
> struct netmem_desc desc;
> };
> };
>
> struct netmem_desc *page_to_netmem_desc(struct page *page)
> {
> return &page->netmem_desc;
page will not have any netmem things in it after this, that matters.
> }
>
> struct netmem_desc *netmem_to_desc(netmem_t netmem)
> {
> if (netmem_is_page(netmem))
> return page_to_netmem_desc(netmem_to_page(netmem));
> return &netmem_to_niov(netmem)->desc;
> }
>
> The compiler should be able to optimise the branch in netmem_to_desc(),
> but we might need to help it a bit.
>
>
> Then, patch 2 ... N convert page pool and everyone else accessing
> those page fields directly to netmem_to_desc / etc.
>
> And the final patch replaces the struct group in the page with a
> new field:
>
> struct netmem_desc {
> struct page_pool *pp;
> ...
> };
>
> struct page {
> unsigned long flags_padding;
> union {
> struct netmem_desc desc;
^
should be gone.
Byungchul
> ...
> };
> };
>
> net_iov will drop its union in a later series to avoid conflicts.
>
> btw, I don't think you need to convert page pool to netmem for this
> to happen, so that can be done in a separate unrelated series. It's
> 18 patches, and netdev usually requires it to be no more than 15.
>
> --
> Pavel Begunkov
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-27 1:02 ` Byungchul Park
@ 2025-05-27 1:31 ` Byungchul Park
2025-05-27 5:30 ` Pavel Begunkov
1 sibling, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-27 1:31 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Tue, May 27, 2025 at 10:02:26AM +0900, Byungchul Park wrote:
> On Mon, May 26, 2025 at 05:58:10PM +0100, Pavel Begunkov wrote:
> > struct net_iov {
> > unsigned long flags_padding;
> > union {
> > struct {
> > // same layout as in page + build asserts;
> > ...
> > struct page_pool *pp;
> > ...
> > };
> > struct netmem_desc desc;
> > };
> > };
> >
> > struct netmem_desc *page_to_netmem_desc(struct page *page)
> > {
> > return &page->netmem_desc;
>
> page will not have any netmem things in it after this, that matters.
^
this patch series
Byungchul
>
> > }
> >
> > struct netmem_desc *netmem_to_desc(netmem_t netmem)
> > {
> > if (netmem_is_page(netmem))
> > return page_to_netmem_desc(netmem_to_page(netmem));
> > return &netmem_to_niov(netmem)->desc;
> > }
> >
> > The compiler should be able to optimise the branch in netmem_to_desc(),
> > but we might need to help it a bit.
> >
> >
> > Then, patch 2 ... N convert page pool and everyone else accessing
> > those page fields directly to netmem_to_desc / etc.
> >
> > And the final patch replaces the struct group in the page with a
> > new field:
> >
> > struct netmem_desc {
> > struct page_pool *pp;
> > ...
> > };
> >
> > struct page {
> > unsigned long flags_padding;
> > union {
> > struct netmem_desc desc;
> ^
> should be gone.
>
> Byungchul
> > ...
> > };
> > };
> >
> > net_iov will drop its union in a later series to avoid conflicts.
> >
> > btw, I don't think you need to convert page pool to netmem for this
> > to happen, so that can be done in a separate unrelated series. It's
> > 18 patches, and netdev usually requires it to be no more than 15.
> >
> > --
> > Pavel Begunkov
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
2025-05-23 9:01 ` Toke Høiland-Jørgensen
2025-05-23 17:00 ` Mina Almasry
@ 2025-05-27 2:50 ` Byungchul Park
2025-05-27 20:03 ` Mina Almasry
2 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-27 2:50 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Fri, May 23, 2025 at 12:25:52PM +0900, Byungchul Park wrote:
> To simplify struct page, the page pool members of struct page should be
> moved elsewhere, allowing these members to be removed from struct page.
>
> Introduce a network memory descriptor to store the members, struct
> netmem_desc, reusing struct net_iov that already mirrored struct page.
>
> While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm_types.h | 2 +-
> include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> 2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 56d07edd01f9..873e820e1521 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -120,13 +120,13 @@ struct page {
> unsigned long private;
> };
> struct { /* page_pool used by netstack */
> + unsigned long _pp_mapping_pad;
> /**
> * @pp_magic: magic value to avoid recycling non
> * page_pool allocated pages.
> */
> unsigned long pp_magic;
> struct page_pool *pp;
> - unsigned long _pp_mapping_pad;
> unsigned long dma_addr;
> atomic_long_t pp_ref_count;
> };
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 386164fb9c18..08e9d76cdf14 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -31,12 +31,41 @@ enum net_iov_type {
> };
>
> struct net_iov {
> - enum net_iov_type type;
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - struct net_iov_area *owner;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> + /*
> + * XXX: Now that struct netmem_desc overlays on struct page,
> + * struct_group_tagged() should cover all of them. However,
> + * a separate struct netmem_desc should be declared and embedded,
> + * once struct netmem_desc is no longer overlayed but it has its
> + * own instance from slab. The final form should be:
> + *
> + * struct netmem_desc {
> + * unsigned long pp_magic;
> + * struct page_pool *pp;
> + * unsigned long dma_addr;
> + * atomic_long_t pp_ref_count;
> + * };
> + *
> + * struct net_iov {
> + * enum net_iov_type type;
> + * struct net_iov_area *owner;
> + * struct netmem_desc;
> + * };
> + */
> + struct_group_tagged(netmem_desc, desc,
So.. For now, this is the best option we can pick. We can do all that
you told me once struct netmem_desc has its own instance from slab.
Again, it's because the page pool fields (or netmem things) from struct
page will be gone by this series.
Mina, thoughts?
Byungchul
> + /*
> + * only for struct net_iov
> + */
> + enum net_iov_type type;
> + struct net_iov_area *owner;
> +
> + /*
> + * actually for struct netmem_desc
> + */
> + unsigned long pp_magic;
> + struct page_pool *pp;
> + unsigned long dma_addr;
> + atomic_long_t pp_ref_count;
> + );
> };
>
> struct net_iov_area {
> @@ -51,9 +80,9 @@ struct net_iov_area {
> /* These fields in struct page are used by the page_pool and net stack:
> *
> * struct {
> + * unsigned long _pp_mapping_pad;
> * unsigned long pp_magic;
> * struct page_pool *pp;
> - * unsigned long _pp_mapping_pad;
> * unsigned long dma_addr;
> * atomic_long_t pp_ref_count;
> * };
> --
> 2.17.1
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-27 1:02 ` Byungchul Park
2025-05-27 1:31 ` Byungchul Park
@ 2025-05-27 5:30 ` Pavel Begunkov
2025-05-27 17:38 ` Mina Almasry
1 sibling, 1 reply; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-27 5:30 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On 5/27/25 02:02, Byungchul Park wrote:
...>> Patch 1:
>>
>> struct page {
>> unsigned long flags;
>> union {
>> struct_group_tagged(netmem_desc, netmem_desc) {
>> // same layout as before
>> ...
>> struct page_pool *pp;
>> ...
>> };
>
> This part will be gone shortly. The matters come from absence of this
> part.
Right, the problem is not having an explicit netmem_desc in struct
page and not using struct netmem_desc in all relevant helpers.
>> struct net_iov {
>> unsigned long flags_padding;
>> union {
>> struct {
>> // same layout as in page + build asserts;
>> ...
>> struct page_pool *pp;
>> ...
>> };
>> struct netmem_desc desc;
>> };
>> };
>>
>> struct netmem_desc *page_to_netmem_desc(struct page *page)
>> {
>> return &page->netmem_desc;
>
> page will not have any netmem things in it after this, that matters.
Ok, the question is where are you going to stash the fields?
We still need space to store them. Are you going to do the
indirection mm folks want?
AFAIK, the plan is that in the end pages will still have
netmem_desc but through an indirection. E.g.
static inline bool page_pool_page_is_pp(struct page *page)
{
return page->page_type == PAGE_PP_NET;
}
struct netmem_desc *page_to_netmem_desc(struct page *page)
{
return page->page_private;
}
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-27 5:30 ` Pavel Begunkov
@ 2025-05-27 17:38 ` Mina Almasry
2025-05-28 1:31 ` Byungchul Park
2025-05-28 7:21 ` Pavel Begunkov
0 siblings, 2 replies; 72+ messages in thread
From: Mina Almasry @ 2025-05-27 17:38 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 10:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 5/27/25 02:02, Byungchul Park wrote:
> ...>> Patch 1:
> >>
> >> struct page {
> >> unsigned long flags;
> >> union {
> >> struct_group_tagged(netmem_desc, netmem_desc) {
> >> // same layout as before
> >> ...
> >> struct page_pool *pp;
> >> ...
> >> };
> >
> > This part will be gone shortly. The matters come from absence of this
> > part.
>
> Right, the problem is not having an explicit netmem_desc in struct
> page and not using struct netmem_desc in all relevant helpers.
>
> >> struct net_iov {
> >> unsigned long flags_padding;
> >> union {
> >> struct {
> >> // same layout as in page + build asserts;
> >> ...
> >> struct page_pool *pp;
> >> ...
> >> };
> >> struct netmem_desc desc;
> >> };
> >> };
> >>
> >> struct netmem_desc *page_to_netmem_desc(struct page *page)
> >> {
> >> return &page->netmem_desc;
> >
> > page will not have any netmem things in it after this, that matters.
>
> Ok, the question is where are you going to stash the fields?
> We still need space to store them. Are you going to do the
> indirection mm folks want?
>
I think I see some confusion here. I'm not sure indirection is what mm
folks want. The memdesc effort has already been implemented for zpdesc
and ptdesc[1], and the approach they took is very different from this
series. zpdesc and ptdesc have created a struct that mirrors the
entirety of struct page, not a subfield of struct page with
indirection:
https://elixir.bootlin.com/linux/v6.14.3/source/mm/zpdesc.h#L29
I'm now a bit confused, because the code changes in this series do not
match the general approach that zpdesc and ptdesc have taken.
Byungchul, is the deviation in approach from zpdesc and ptdesc
intentional? And if so why? Should we follow the zpdesc and ptdesc
lead and implement a new struct that mirrors the entirety of struct
page?
[1] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-27 2:50 ` Byungchul Park
@ 2025-05-27 20:03 ` Mina Almasry
2025-05-28 1:21 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Mina Almasry @ 2025-05-27 20:03 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Mon, May 26, 2025 at 7:50 PM Byungchul Park <byungchul@sk.com> wrote:
>
> On Fri, May 23, 2025 at 12:25:52PM +0900, Byungchul Park wrote:
> > To simplify struct page, the page pool members of struct page should be
> > moved to other, allowing these members to be removed from struct page.
> >
> > Introduce a network memory descriptor to store the members, struct
> > netmem_desc, reusing struct net_iov that already mirrored struct page.
> >
> > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/linux/mm_types.h | 2 +-
> > include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> > 2 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 56d07edd01f9..873e820e1521 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -120,13 +120,13 @@ struct page {
> > unsigned long private;
> > };
> > struct { /* page_pool used by netstack */
> > + unsigned long _pp_mapping_pad;
> > /**
> > * @pp_magic: magic value to avoid recycling non
> > * page_pool allocated pages.
> > */
> > unsigned long pp_magic;
> > struct page_pool *pp;
> > - unsigned long _pp_mapping_pad;
> > unsigned long dma_addr;
> > atomic_long_t pp_ref_count;
> > };
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index 386164fb9c18..08e9d76cdf14 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -31,12 +31,41 @@ enum net_iov_type {
> > };
> >
> > struct net_iov {
> > - enum net_iov_type type;
> > - unsigned long pp_magic;
> > - struct page_pool *pp;
> > - struct net_iov_area *owner;
> > - unsigned long dma_addr;
> > - atomic_long_t pp_ref_count;
> > + /*
> > + * XXX: Now that struct netmem_desc overlays on struct page,
> > + * struct_group_tagged() should cover all of them. However,
> > + * a separate struct netmem_desc should be declared and embedded,
> > + * once struct netmem_desc is no longer overlayed but it has its
> > + * own instance from slab. The final form should be:
> > + *
> > + * struct netmem_desc {
> > + * unsigned long pp_magic;
> > + * struct page_pool *pp;
> > + * unsigned long dma_addr;
> > + * atomic_long_t pp_ref_count;
> > + * };
> > + *
> > + * struct net_iov {
> > + * enum net_iov_type type;
> > + * struct net_iov_area *owner;
> > + * struct netmem_desc;
> > + * };
> > + */
> > + struct_group_tagged(netmem_desc, desc,
>
> So.. For now, this is the best option we can pick. We can do all that
> you told me once struct netmem_desc has it own instance from slab.
>
> Again, it's because the page pool fields (or netmem things) from struct
> page will be gone by this series.
>
> Mina, thoughts?
>
Can you please post an updated series with the approach you have in
mind? I think this series as-is seems broken vis-à-vis the
_pp_mapping_pad field move, which looks incorrect. Pavel and I have also
commented on patch 18 that removing the ASSERTS seems incorrect as
it's breaking the symmetry between struct page and struct net_iov.
It's not clear to me if the fields are being removed from struct page,
where are they going... the approach ptdesc for example has taken is
to create a mirror of struct page, then show via asserts that the
mirror is equivalent to struct page, AFAIU:
https://elixir.bootlin.com/linux/v6.14.3/source/include/linux/mm_types.h#L437
Also the same approach for zpdesc:
https://elixir.bootlin.com/linux/v6.14.3/source/mm/zpdesc.h#L29
In this series you're removing the entries from struct page, I'm not
really sure where they went, and you're removing the asserts that we
have between net_iov and struct page so we're not even sure that those
are in sync anymore. I would suggest, for me at least, reposting with
the new types you have in mind and with clear asserts showing what is
meant to be in sync with (and overlay) what.
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-27 20:03 ` Mina Almasry
@ 2025-05-28 1:21 ` Byungchul Park
2025-05-28 3:47 ` Mina Almasry
2025-05-28 7:38 ` Pavel Begunkov
0 siblings, 2 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 1:21 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Tue, May 27, 2025 at 01:03:32PM -0700, Mina Almasry wrote:
> On Mon, May 26, 2025 at 7:50 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Fri, May 23, 2025 at 12:25:52PM +0900, Byungchul Park wrote:
> > > To simplify struct page, the page pool members of struct page should be
> > > moved to other, allowing these members to be removed from struct page.
> > >
> > > Introduce a network memory descriptor to store the members, struct
> > > netmem_desc, reusing struct net_iov that already mirrored struct page.
> > >
> > > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> > >
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > ---
> > > include/linux/mm_types.h | 2 +-
> > > include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> > > 2 files changed, 37 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 56d07edd01f9..873e820e1521 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -120,13 +120,13 @@ struct page {
> > > unsigned long private;
> > > };
> > > struct { /* page_pool used by netstack */
> > > + unsigned long _pp_mapping_pad;
> > > /**
> > > * @pp_magic: magic value to avoid recycling non
> > > * page_pool allocated pages.
> > > */
> > > unsigned long pp_magic;
> > > struct page_pool *pp;
> > > - unsigned long _pp_mapping_pad;
> > > unsigned long dma_addr;
> > > atomic_long_t pp_ref_count;
> > > };
> > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > index 386164fb9c18..08e9d76cdf14 100644
> > > --- a/include/net/netmem.h
> > > +++ b/include/net/netmem.h
> > > @@ -31,12 +31,41 @@ enum net_iov_type {
> > > };
> > >
> > > struct net_iov {
> > > - enum net_iov_type type;
> > > - unsigned long pp_magic;
> > > - struct page_pool *pp;
> > > - struct net_iov_area *owner;
> > > - unsigned long dma_addr;
> > > - atomic_long_t pp_ref_count;
> > > + /*
> > > + * XXX: Now that struct netmem_desc overlays on struct page,
> > > + * struct_group_tagged() should cover all of them. However,
> > > + * a separate struct netmem_desc should be declared and embedded,
> > > + * once struct netmem_desc is no longer overlayed but it has its
> > > + * own instance from slab. The final form should be:
> > > + *
> > > + * struct netmem_desc {
> > > + * unsigned long pp_magic;
> > > + * struct page_pool *pp;
> > > + * unsigned long dma_addr;
> > > + * atomic_long_t pp_ref_count;
> > > + * };
> > > + *
> > > + * struct net_iov {
> > > + * enum net_iov_type type;
> > > + * struct net_iov_area *owner;
> > > + * struct netmem_desc;
> > > + * };
> > > + */
> > > + struct_group_tagged(netmem_desc, desc,
> >
> > So.. For now, this is the best option we can pick. We can do all that
> > you told me once struct netmem_desc has its own instance from slab.
> >
> > Again, it's because the page pool fields (or netmem things) from struct
> > page will be gone by this series.
> >
> > Mina, thoughts?
> >
>
> Can you please post an updated series with the approach you have in
> mind? I think this series as-is seems broken vis-à-vis the
> _pp_mapping_pad move, which looks incorrect. Pavel and I have also
> commented on patch 18 that removing the ASSERTS seems incorrect as
> it's breaking the symmetry between struct page and struct net_iov.
I told you I will fix it. I will send the updated series shortly, but it
will be for *review* only, since we know this work can only be completed
once the following works have been done:
https://lore.kernel.org/all/20250520205920.2134829-2-anthony.l.nguyen@intel.com/
https://lore.kernel.org/all/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
> It's not clear to me if the fields are being removed from struct page,
> where are they going... the approach ptdesc for example has taken is
They are going to struct net_iov. Or I should introduce another struct
mirroring struct page as ptdesc did, which would be exactly the same as
struct net_iov. Do you think I should do that?
> to create a mirror of struct page, then show via asserts that the
> mirror is equivalent to struct page, AFAIU:
>
> https://elixir.bootlin.com/linux/v6.14.3/source/include/linux/mm_types.h#L437
>
> Also the same approach for zpdesc:
>
> https://elixir.bootlin.com/linux/v6.14.3/source/mm/zpdesc.h#L29
Okay, again. Thanks.
Byungchul
> In this series you're removing the entries from struct page, I'm not
> really sure where they went, and you're removing the asserts that we
> have between net_iov and struct page so we're not even sure that those
> are in sync anymore. I would suggest for me at least reposting with
> the new types you have in mind and with clear asserts showing what is
> meant to be in sync (and overlay) what.
>
> --
> Thanks,
> Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-27 17:38 ` Mina Almasry
@ 2025-05-28 1:31 ` Byungchul Park
2025-05-28 7:21 ` Pavel Begunkov
1 sibling, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 1:31 UTC (permalink / raw)
To: Mina Almasry
Cc: Pavel Begunkov, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Tue, May 27, 2025 at 10:38:43AM -0700, Mina Almasry wrote:
> On Mon, May 26, 2025 at 10:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >
> > On 5/27/25 02:02, Byungchul Park wrote:
> > ...>> Patch 1:
> > >>
> > >> struct page {
> > >> unsigned long flags;
> > >> union {
> > >> struct_group_tagged(netmem_desc, netmem_desc) {
> > >> // same layout as before
> > >> ...
> > >> struct page_pool *pp;
> > >> ...
> > >> };
> > >
> > > This part will be gone shortly. The problems come from the absence of
> > > this part.
> >
> > Right, the problem is not having an explicit netmem_desc in struct
> > page and not using struct netmem_desc in all relevant helpers.
> >
> > >> struct net_iov {
> > >> unsigned long flags_padding;
> > >> union {
> > >> struct {
> > >> // same layout as in page + build asserts;
> > >> ...
> > >> struct page_pool *pp;
> > >> ...
> > >> };
> > >> struct netmem_desc desc;
> > >> };
> > >> };
> > >>
> > >> struct netmem_desc *page_to_netmem_desc(struct page *page)
> > >> {
> > >> return &page->netmem_desc;
> > >
> > > page will not have any netmem things in it after this, that matters.
> >
> > Ok, the question is where are you going to stash the fields?
> > We still need space to store them. Are you going to do the
> > indirection mm folks want?
> >
>
> I think I see some confusion here. I'm not sure indirection is what mm
> folks want. The memdesc effort has already been implemented for zpdesc
> and ptdesc[1], and the approach they did is very different from this
> series. zpdesc and ptdesc have created a struct that mirrors the
It's struct netmem_desc. Just introducing a struct netmem_desc that looks
exactly the same as struct net_iov is ugly.
> entirety of struct page, not a subfield of struct page with
> indirection:
I think you got confused.
At the beginning, I tried to place a place-holder:
https://lore.kernel.org/all/20250512125103.GC45370@system.software.com/
but I changed direction as Matthew requested:
https://lore.kernel.org/all/aCK6J2YtA7vi1Kjz@casper.infradead.org/
So now, I will go with the same direction as the others. I will share
the updated version with the assert issues fixed.
Byungchul
>
> https://elixir.bootlin.com/linux/v6.14.3/source/mm/zpdesc.h#L29
>
> I'm now a bit confused, because the code changes in this series do not
> match the general approach that zpdesc and ptdesc have done.
> Byungchul, is the deviation in approach from zpdesc and ptdecs
> intentional? And if so why? Should we follow the zpdesc and ptdesc
> lead and implement a new struct that mirrors the entirety of struct
> page?
>
> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
>
> --
> Thanks,
> Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem()
2025-05-23 3:25 ` [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
@ 2025-05-28 3:18 ` Mina Almasry
0 siblings, 0 replies; 72+ messages in thread
From: Mina Almasry @ 2025-05-28 3:18 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>
> Now that page_pool_return_page() is for returning netmem, not struct
> page, rename it to page_pool_return_netmem() to reflect what it does.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-28 1:21 ` Byungchul Park
@ 2025-05-28 3:47 ` Mina Almasry
2025-05-28 5:03 ` Byungchul Park
2025-05-28 7:38 ` Pavel Begunkov
1 sibling, 1 reply; 72+ messages in thread
From: Mina Almasry @ 2025-05-28 3:47 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Tue, May 27, 2025 at 6:22 PM Byungchul Park <byungchul@sk.com> wrote:
>
> On Tue, May 27, 2025 at 01:03:32PM -0700, Mina Almasry wrote:
> > On Mon, May 26, 2025 at 7:50 PM Byungchul Park <byungchul@sk.com> wrote:
> > >
> > > On Fri, May 23, 2025 at 12:25:52PM +0900, Byungchul Park wrote:
> > > > To simplify struct page, the page pool members of struct page should be
> > > > moved to other, allowing these members to be removed from struct page.
> > > >
> > > > Introduce a network memory descriptor to store the members, struct
> > > > netmem_desc, reusing struct net_iov that already mirrored struct page.
> > > >
> > > > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> > > >
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > ---
> > > > include/linux/mm_types.h | 2 +-
> > > > include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> > > > 2 files changed, 37 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > > index 56d07edd01f9..873e820e1521 100644
> > > > --- a/include/linux/mm_types.h
> > > > +++ b/include/linux/mm_types.h
> > > > @@ -120,13 +120,13 @@ struct page {
> > > > unsigned long private;
> > > > };
> > > > struct { /* page_pool used by netstack */
> > > > + unsigned long _pp_mapping_pad;
> > > > /**
> > > > * @pp_magic: magic value to avoid recycling non
> > > > * page_pool allocated pages.
> > > > */
> > > > unsigned long pp_magic;
> > > > struct page_pool *pp;
> > > > - unsigned long _pp_mapping_pad;
> > > > unsigned long dma_addr;
> > > > atomic_long_t pp_ref_count;
> > > > };
> > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > index 386164fb9c18..08e9d76cdf14 100644
> > > > --- a/include/net/netmem.h
> > > > +++ b/include/net/netmem.h
> > > > @@ -31,12 +31,41 @@ enum net_iov_type {
> > > > };
> > > >
> > > > struct net_iov {
> > > > - enum net_iov_type type;
> > > > - unsigned long pp_magic;
> > > > - struct page_pool *pp;
> > > > - struct net_iov_area *owner;
> > > > - unsigned long dma_addr;
> > > > - atomic_long_t pp_ref_count;
> > > > + /*
> > > > + * XXX: Now that struct netmem_desc overlays on struct page,
> > > > + * struct_group_tagged() should cover all of them. However,
> > > > + * a separate struct netmem_desc should be declared and embedded,
> > > > + * once struct netmem_desc is no longer overlayed but it has its
> > > > + * own instance from slab. The final form should be:
> > > > + *
> > > > + * struct netmem_desc {
> > > > + * unsigned long pp_magic;
> > > > + * struct page_pool *pp;
> > > > + * unsigned long dma_addr;
> > > > + * atomic_long_t pp_ref_count;
> > > > + * };
> > > > + *
> > > > + * struct net_iov {
> > > > + * enum net_iov_type type;
> > > > + * struct net_iov_area *owner;
> > > > + * struct netmem_desc;
> > > > + * };
> > > > + */
> > > > + struct_group_tagged(netmem_desc, desc,
> > >
> > > So.. For now, this is the best option we can pick. We can do all that
> > > you told me once struct netmem_desc has its own instance from slab.
> > >
> > > Again, it's because the page pool fields (or netmem things) from struct
> > > page will be gone by this series.
> > >
> > > Mina, thoughts?
> > >
> >
> > Can you please post an updated series with the approach you have in
> > mind? I think this series as-is seems broken vis-à-vis the
> > _pp_mapping_pad move, which looks incorrect. Pavel and I have also
> > commented on patch 18 that removing the ASSERTS seems incorrect as
> > it's breaking the symmetry between struct page and struct net_iov.
>
> I told you I will fix it. I will send the updated series shortly for
> *review*. However, it will be for review since we know this work can be
> completed once the next works have been done:
>
> https://lore.kernel.org/all/20250520205920.2134829-2-anthony.l.nguyen@intel.com/
> https://lore.kernel.org/all/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
>
> > It's not clear to me if the fields are being removed from struct page,
> > where are they going... the approach ptdesc for example has taken is
>
> They are going to struct net_iov.
Oh. I see. My gut reaction is I'm not sure moving the page_pool fields
to struct net_iov will work.
struct net_iov shares some fields with struct page, but abstractly
it's very different.
struct page is allocated by the mm stack via things like alloc_pages
and can be passed to mm apis such as put_page() (called from
skb_frag_ref) and vm_insert_batch (called from
tcp_zerocopy_vm_insert_batch_error).
struct net_iov is kvmalloced by networking code (see
net_devmem_bind_dmabuf for example), and *must not* be passed to any
mm apis as it's not a struct page at all. Accidentally calling
vm_insert_batch on a struct net_iov will cause a kernel crash or some
memory corruption.
Thus, things that are abstractly this different should perhaps not share
the same in-kernel struct.
One thing that maybe could work is if struct net_iov has a field in it
which tells us whether it's actually a struct page that can be passed
to mm apis, or not a struct page which cannot be passed to mm apis.
> Or I should introduce another struct
Maybe introducing another struct is the answer; I'm not sure. The net
stack today already supports struct page and struct net_iov, with
netmem_ref acting as an abstraction over both. Adding a third struct,
and more checks to test whether something is a page, a net_iov, or the
new thing, will add overhead.
An additional problem is that there are probably hundreds or thousands
of references to 'page' in the net stack and drivers. I'm not sure
what you're going to do about those. Are you converting all those to
netmem or netmem_desc?
--
Thanks,
Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-28 3:47 ` Mina Almasry
@ 2025-05-28 5:03 ` Byungchul Park
2025-05-28 7:43 ` Pavel Begunkov
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 5:03 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, asml.silence, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Tue, May 27, 2025 at 08:47:54PM -0700, Mina Almasry wrote:
> On Tue, May 27, 2025 at 6:22 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Tue, May 27, 2025 at 01:03:32PM -0700, Mina Almasry wrote:
> > > On Mon, May 26, 2025 at 7:50 PM Byungchul Park <byungchul@sk.com> wrote:
> > > >
> > > > On Fri, May 23, 2025 at 12:25:52PM +0900, Byungchul Park wrote:
> > > > > To simplify struct page, the page pool members of struct page should be
> > > > > moved to other, allowing these members to be removed from struct page.
> > > > >
> > > > > Introduce a network memory descriptor to store the members, struct
> > > > > netmem_desc, reusing struct net_iov that already mirrored struct page.
> > > > >
> > > > > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> > > > >
> > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > ---
> > > > > include/linux/mm_types.h | 2 +-
> > > > > include/net/netmem.h | 43 +++++++++++++++++++++++++++++++++-------
> > > > > 2 files changed, 37 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > > > index 56d07edd01f9..873e820e1521 100644
> > > > > --- a/include/linux/mm_types.h
> > > > > +++ b/include/linux/mm_types.h
> > > > > @@ -120,13 +120,13 @@ struct page {
> > > > > unsigned long private;
> > > > > };
> > > > > struct { /* page_pool used by netstack */
> > > > > + unsigned long _pp_mapping_pad;
> > > > > /**
> > > > > * @pp_magic: magic value to avoid recycling non
> > > > > * page_pool allocated pages.
> > > > > */
> > > > > unsigned long pp_magic;
> > > > > struct page_pool *pp;
> > > > > - unsigned long _pp_mapping_pad;
> > > > > unsigned long dma_addr;
> > > > > atomic_long_t pp_ref_count;
> > > > > };
> > > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > > index 386164fb9c18..08e9d76cdf14 100644
> > > > > --- a/include/net/netmem.h
> > > > > +++ b/include/net/netmem.h
> > > > > @@ -31,12 +31,41 @@ enum net_iov_type {
> > > > > };
> > > > >
> > > > > struct net_iov {
> > > > > - enum net_iov_type type;
> > > > > - unsigned long pp_magic;
> > > > > - struct page_pool *pp;
> > > > > - struct net_iov_area *owner;
> > > > > - unsigned long dma_addr;
> > > > > - atomic_long_t pp_ref_count;
> > > > > + /*
> > > > > + * XXX: Now that struct netmem_desc overlays on struct page,
> > > > > + * struct_group_tagged() should cover all of them. However,
> > > > > + * a separate struct netmem_desc should be declared and embedded,
> > > > > + * once struct netmem_desc is no longer overlayed but it has its
> > > > > + * own instance from slab. The final form should be:
> > > > > + *
> > > > > + * struct netmem_desc {
> > > > > + * unsigned long pp_magic;
> > > > > + * struct page_pool *pp;
> > > > > + * unsigned long dma_addr;
> > > > > + * atomic_long_t pp_ref_count;
> > > > > + * };
> > > > > + *
> > > > > + * struct net_iov {
> > > > > + * enum net_iov_type type;
> > > > > + * struct net_iov_area *owner;
> > > > > + * struct netmem_desc;
> > > > > + * };
> > > > > + */
> > > > > + struct_group_tagged(netmem_desc, desc,
> > > >
> > > > So.. For now, this is the best option we can pick. We can do all that
> > > > you told me once struct netmem_desc has its own instance from slab.
> > > >
> > > > Again, it's because the page pool fields (or netmem things) from struct
> > > > page will be gone by this series.
> > > >
> > > > Mina, thoughts?
> > > >
> > >
> > > Can you please post an updated series with the approach you have in
> > > mind? I think this series as-is seems broken vis-à-vis the
> > > _pp_mapping_pad move, which looks incorrect. Pavel and I have also
> > > commented on patch 18 that removing the ASSERTS seems incorrect as
> > > it's breaking the symmetry between struct page and struct net_iov.
> >
> > I told you I will fix it. I will send the updated series shortly for
> > *review*. However, it will be for review since we know this work can be
> > completed once the next works have been done:
> >
> > https://lore.kernel.org/all/20250520205920.2134829-2-anthony.l.nguyen@intel.com/
> > https://lore.kernel.org/all/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
> >
> > > It's not clear to me if the fields are being removed from struct page,
> > > where are they going... the approach ptdesc for example has taken is
> >
> > They are going to struct net_iov.
Precisely speaking, to 'struct netmem_desc'.
> Oh. I see. My gut reaction is I'm not sure moving the page_pool fields
> to struct net_iov will work.
>
> struct net_iov shares some fields with struct page, but abstractly
> it's very different.
>
> struct page is allocated by the mm stack via things like alloc_pages
> and can be passed to mm apis such as put_page() (called from
> skb_frag_ref) and vm_insert_batch (called from
> tcp_zerocopy_vm_insert_batch_error).
>
> struct net_iov is kvmalloced by networking code (see
> net_devmem_bind_dmabuf for example), and *must not* be passed to any
> mm apis as it's not a struct page at all. Accidentally calling
> vm_insert_batch on a struct net_iov will cause a kernel crash or some
> memory corruption.
>
> Thus abstractly different things maybe should not share the same
> in-kernel struct.
>
> One thing that maybe could work is if struct net_iov has a field in it
> which tells us whether it's actually a struct page that can be passed
> to mm apis, or not a struct page which cannot be passed to mm apis.
>
> > Or I should introduce another struct
>
> maybe introducing another struct is the answer. I'm not sure. The net
The final form should be like:
struct netmem_desc {
struct page_pool *pp;
unsigned long dma_addr;
atomic_long_t ref_count;
};
struct net_iov {
struct netmem_desc;
enum net_iov_type type;
struct net_iov_area *owner;
...
};
However, now that overlaying on struct page is required, struct
netmem_desc should be almost the same as struct net_iov. So I'm not sure
we should introduce struct netmem_desc as a new struct alongside struct
net_iov.
> stack today already supports struct page and struct net_iov, with
> netmem_ref acting as an abstraction over both. Adding a 3rd struct and
> adding more checks to test if page or net_iov or something new will
> add overhead.
So I think the current form in this patch is a good option we can take
for now.
> An additional problem is that there are probably hundreds or thousands
> of references to 'page' in the net stack and drivers. I'm not sure
> what you're going to do about those. Are you converting all those to
> netmem or netmem_desc?
No. I will convert only the references for page pool.
Byungchul
>
> --
> Thanks,
> Mina
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 9:54 ` Toke Høiland-Jørgensen
2025-05-26 10:01 ` Byungchul Park
@ 2025-05-28 5:14 ` Byungchul Park
2025-05-28 7:35 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 5:14 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
On Mon, May 26, 2025 at 11:54:33AM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > On Mon, May 26, 2025 at 10:40:30AM +0200, Toke Høiland-Jørgensen wrote:
> >> Byungchul Park <byungchul@sk.com> writes:
> >>
> >> > On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
> >> >> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> >> >> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >> >> > >
> >> >> > > To simplify struct page, the effort to separate its own descriptor from
> >> >> > > struct page is required, and the work for page pool is ongoing.
> >> >> > >
> >> >> > > To achieve that, all the code should avoid accessing page pool members
> >> >> > > of struct page directly, but use safe APIs for the purpose.
> >> >> > >
> >> >> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> >> >> > > page_pool_page_is_pp().
> >> >> > >
> >> >> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> >> >> > > ---
> >> >> > > include/linux/mm.h | 5 +----
> >> >> > > net/core/page_pool.c | 5 +++++
> >> >> > > 2 files changed, 6 insertions(+), 4 deletions(-)
> >> >> > >
> >> >> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> >> > > index 8dc012e84033..3f7c80fb73ce 100644
> >> >> > > --- a/include/linux/mm.h
> >> >> > > +++ b/include/linux/mm.h
> >> >> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >> >> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >> >> > >
> >> >> > > #ifdef CONFIG_PAGE_POOL
> >> >> > > -static inline bool page_pool_page_is_pp(struct page *page)
> >> >> > > -{
> >> >> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >> >> > > -}
> >> >> >
> >> >> > I vote for keeping this function as-is (do not convert it to netmem),
> >> >> > and instead modify it to access page->netmem_desc->pp_magic.
> >> >>
> >> >> Once the page pool fields are removed from struct page, struct page will
> >> >> have neither struct netmem_desc nor the fields..
> >> >>
> >> >> So it's inevitable to cast it to netmem_desc in order to refer to
> >> >> pp_magic. Again, pp_magic is no longer associated with struct page.
> >> >
> >> > Options that come across my mind are:
> >> >
> >> > 1. use lru field of struct page instead, with appropriate comment but
> >> > looks so ugly.
> >> > 2. instead of a full word for the magic, use a bit of flags or use
> >> > the private field for that purpose.
> >> > 3. do not check magic number for page pool.
> >> > 4. more?
> >>
> >> I'm not sure I understand Mina's concern about CPU cycles from casting.
> >> The casting is a compile-time thing, which shouldn't affect run-time
> >
> > I didn't mention it but yes.
> >
> >> performance as long as the check is kept as an inline function. So it's
> >> "just" a matter of exposing struct netmem_desc to mm.h so it can use it
> >
> > Then.. we should expose net_iov as well, but I'm afraid it looks weird.
> > Do you think it's okay?
>
> Well, it'll be ugly, I grant you that :)
>
> Hmm, so another idea could be to add the pp_magic field to the inner
> union that the lru field is in, and keep the page_pool_page_is_pp()
> as-is. Then add an assert for offsetof(struct page, pp_magic) ==
> offsetof(netmem_desc, pp_magic) on the netmem side, which can be removed
> once the two structs no longer shadow each other?
>
> That way you can still get rid of the embedded page_pool struct in
> struct page, and the pp_magic field will just be a transition thing
> until things are completely separated...
Or what about doing it the way mm folks did in page_is_pfmemalloc()?
static inline bool page_pool_page_is_pp(struct page *page)
{
/*
* XXX: The space of page->lru.next is used as pp_magic in
* struct netmem_desc overlaying on struct page temporarily.
* This API will be unneeded shortly. Let's use the ugly but
* temporary way to access pp_magic until struct netmem_desc has
* its own instance.
*/
return (((unsigned long)page->lru.next) & PP_MAGIC_MASK) == PP_SIGNATURE;
}
Byungchul
>
> -Toke
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
2025-05-27 17:38 ` Mina Almasry
2025-05-28 1:31 ` Byungchul Park
@ 2025-05-28 7:21 ` Pavel Begunkov
1 sibling, 0 replies; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 7:21 UTC (permalink / raw)
To: Mina Almasry
Cc: Byungchul Park, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On 5/27/25 18:38, Mina Almasry wrote:
...>>>> struct netmem_desc *page_to_netmem_desc(struct page *page)
>>>> {
>>>> return &page->netmem_desc;
>>>
>>> page will not have any netmem things in it after this, that matters.
>>
>> Ok, the question is where are you going to stash the fields?
>> We still need space to store them. Are you going to do the
>> indirection mm folks want?
>>
>
> I think I see some confusion here. I'm not sure indirection is what mm
> folks want. The memdesc effort has already been implemented for zpdesc
To the best of my knowledge, it is. What you're looking at should be
a temporary state before all other users are converted, after which
mm will shrink the page in a single patch / small series.
> and ptdesc[1], and the approach they did is very different from this
> series. zpdesc and ptdesc have created a struct that mirrors the
> entirety of struct page, not a subfield of struct page with
> indirection:
>
> https://elixir.bootlin.com/linux/v6.14.3/source/mm/zpdesc.h#L29
>
> I'm now a bit confused, because the code changes in this series do not
> match the general approach that zpdesc and ptdesc have done.
In my estimation, the only bits that mm needs for a clean final
patch are a new struct with use-case-specific fields (i.e. netmem_desc),
a helper converting a page to it, and that everyone uses the helper
to access the fields. I'd argue a temporary placeholder in struct
page is an easier approach than separate overlays, but either is
fine to me.
> Byungchul, is the deviation in approach from zpdesc and ptdecs
> intentional? And if so why? Should we follow the zpdesc and ptdesc
> lead and implement a new struct that mirrors the entirety of struct
> page?
>
> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 5:14 ` Byungchul Park
@ 2025-05-28 7:35 ` Toke Høiland-Jørgensen
2025-05-28 8:15 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-28 7:35 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> On Mon, May 26, 2025 at 11:54:33AM +0200, Toke Høiland-Jørgensen wrote:
>> Byungchul Park <byungchul@sk.com> writes:
>>
>> > On Mon, May 26, 2025 at 10:40:30AM +0200, Toke Høiland-Jørgensen wrote:
>> >> Byungchul Park <byungchul@sk.com> writes:
>> >>
>> >> > On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
>> >> >> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>> >> >> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>> >> >> > >
>> >> >> > > To simplify struct page, the effort to separate its own descriptor from
>> >> >> > > struct page is required, and the work for page pool is ongoing.
>> >> >> > >
>> >> >> > > To achieve that, all the code should avoid accessing page pool members
>> >> >> > > of struct page directly, but use safe APIs for the purpose.
>> >> >> > >
>> >> >> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
>> >> >> > > page_pool_page_is_pp().
>> >> >> > >
>> >> >> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
>> >> >> > > ---
>> >> >> > > include/linux/mm.h | 5 +----
>> >> >> > > net/core/page_pool.c | 5 +++++
>> >> >> > > 2 files changed, 6 insertions(+), 4 deletions(-)
>> >> >> > >
>> >> >> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
>> >> >> > > index 8dc012e84033..3f7c80fb73ce 100644
>> >> >> > > --- a/include/linux/mm.h
>> >> >> > > +++ b/include/linux/mm.h
>> >> >> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>> >> >> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>> >> >> > >
>> >> >> > > #ifdef CONFIG_PAGE_POOL
>> >> >> > > -static inline bool page_pool_page_is_pp(struct page *page)
>> >> >> > > -{
>> >> >> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>> >> >> > > -}
>> >> >> >
>> >> >> > I vote for keeping this function as-is (do not convert it to netmem),
>> >> >> > and instead modify it to access page->netmem_desc->pp_magic.
>> >> >>
>> >> >> Once the page pool fields are removed from struct page, struct page will
>> >> >> have neither struct netmem_desc nor the fields..
>> >> >>
>> >> >> So it's unavoidable to cast it to netmem_desc in order to refer to
>> >> >> pp_magic. Again, pp_magic is no longer associated with struct page.
>> >> >
>> >> > Options that come across my mind are:
>> >> >
>> >> > 1. use lru field of struct page instead, with appropriate comment but
>> >> > looks so ugly.
>> >> > 2. instead of a full word for the magic, use a bit of flags or use
>> >> > the private field for that purpose.
>> >> > 3. do not check magic number for page pool.
>> >> > 4. more?
>> >>
>> >> I'm not sure I understand Mina's concern about CPU cycles from casting.
>> >> The casting is a compile-time thing, which shouldn't affect run-time
>> >
>> > I didn't mention it but yes.
>> >
>> >> performance as long as the check is kept as an inline function. So it's
>> >> "just" a matter of exposing struct netmem_desc to mm.h so it can use it
>> >
>> > Then.. we should expose net_iov as well, but I'm afraid it looks weird.
>> > Do you think it's okay?
>>
>> Well, it'll be ugly, I grant you that :)
>>
>> Hmm, so another idea could be to add the pp_magic field to the inner
>> union that the lru field is in, and keep the page_pool_page_is_pp()
>> as-is. Then add an assert for offsetof(struct page, pp_magic) ==
>> offsetof(netmem_desc, pp_magic) on the netmem side, which can be removed
>> once the two structs no longer shadow each other?
>>
>> That way you can still get rid of the embedded page_pool struct in
>> struct page, and the pp_magic field will just be a transition thing
>> until things are completely separated...
>
> Or what about to do that as mm folks did in page_is_pfmemalloc()?
>
> static inline bool page_pool_page_is_pp(struct page *page)
> {
> /*
> * XXX: The space of page->lru.next is used as pp_magic in
> * struct netmem_desc overlaying on struct page temporarily.
> * This API will be unneeded shortly. Let's use the ugly but
> * temporal way to access pp_magic until struct netmem_desc has
> * its own instance.
> */
> return (((unsigned long)page->lru.next) & PP_MAGIC_MASK) == PP_SIGNATURE;
> }
Sure, that can work as a temporary solution (maybe with a static assert
somewhere that pp_magic and lru have the same offsetof())?
-Toke
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-28 1:21 ` Byungchul Park
2025-05-28 3:47 ` Mina Almasry
@ 2025-05-28 7:38 ` Pavel Begunkov
1 sibling, 0 replies; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 7:38 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, toke, tariqt, edumazet, pabeni, saeedm, leon, ast,
daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
On 5/28/25 02:21, Byungchul Park wrote:
>>> So.. For now, this is the best option we can pick. We can do all that
>>> you told me once struct netmem_desc has its own instance from slab.
>>>
>>> Again, it's because the page pool fields (or netmem things) from struct
>>> page will be gone by this series.
>>>
>>> Mina, thoughts?
>>>
>>
>> Can you please post an updated series with the approach you have in
>> mind? I think this series as-is seems broken vis-a-vie the
>> _pp_padding_map param move that looks incorrect. Pavel and I have also
>> commented on patch 18 that removing the ASSERTS seems incorrect as
>> it's breaking the symmetry between struct page and struct net_iov.
>
> I told you I will fix it. I will send the updated series shortly for
> *review*. However, it will be for review only, since we know this work
> can be completed only once the follow-up work has been done:
Please don't forget to tag it with "RFC", otherwise nobody will
assume it's for review only.
--
Pavel Begunkov
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-28 5:03 ` Byungchul Park
@ 2025-05-28 7:43 ` Pavel Begunkov
2025-05-28 8:17 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 7:43 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, toke, tariqt, edumazet, pabeni, saeedm, leon, ast,
daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
On 5/28/25 06:03, Byungchul Park wrote:
...>> Thus abstractly different things maybe should not share the same
>> in-kernel struct.
>>
>> One thing that maybe could work is if struct net_iov has a field in it
>> which tells us whether it's actually a struct page that can be passed
>> to mm apis, or not a struct page which cannot be passed to mm apis.
>>
>>> Or I should introduce another struct
>>
>> maybe introducing another struct is the answer. I'm not sure. The net
>
> The final form should be like:
>
> struct netmem_desc {
> struct page_pool *pp;
> unsigned long dma_addr;
> atomic_long_t ref_count;
> };
>
> struct net_iov {
> struct netmem_desc;
> enum net_iov_type type;
> struct net_iov_area *owner;
> ...
> };
>
> However, now that overlaying on struct page is required, struct
> netmem_desc should be almost the same as struct net_iov. So I'm not sure if
> we should introduce struct netmem_desc as a new struct along with struct
> net_iov.
Yes, you should. Mina already explained that net_iov is not the same
thing as the net-specific sub-struct of the page. They have common
fields, but there are also net_iov (memory provider) specific fields
as well.
--
Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-26 2:23 ` Byungchul Park
2025-05-26 2:36 ` Byungchul Park
@ 2025-05-28 7:51 ` Pavel Begunkov
2025-05-28 8:14 ` Byungchul Park
1 sibling, 1 reply; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 7:51 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, davem, john.fastabend,
andrew+netdev, toke, tariqt, edumazet, pabeni, saeedm, leon, ast,
daniel, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
surenb, mhocko, horms, linux-rdma, bpf, vishal.moola
On 5/26/25 03:23, Byungchul Park wrote:
> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> To simplify struct page, the effort to separate its own descriptor from
>>> struct page is required, and the work for page pool is ongoing.
>>>
>>> To achieve that, all the code should avoid accessing page pool members
>>> of struct page directly, but use safe APIs for the purpose.
>>>
>>> Use netmem_is_pp() instead of directly accessing page->pp_magic in
>>> page_pool_page_is_pp().
>>>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> ---
>>> include/linux/mm.h | 5 +----
>>> net/core/page_pool.c | 5 +++++
>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 8dc012e84033..3f7c80fb73ce 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>
>>> #ifdef CONFIG_PAGE_POOL
>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>> -{
>>> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>> -}
>>
>> I vote for keeping this function as-is (do not convert it to netmem),
>> and instead modify it to access page->netmem_desc->pp_magic.
>
> Once the page pool fields are removed from struct page, struct page will
> have neither struct netmem_desc nor the fields..
>
> So it's unavoidable to cast it to netmem_desc in order to refer to
> pp_magic. Again, pp_magic is no longer associated with struct page.
>
> Thoughts?
Once the indirection / page shrinking is realized, the page is
supposed to have a type field, isn't it? And all pp_magic trickery
will be replaced with something like
page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
--
Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 7:51 ` Pavel Begunkov
@ 2025-05-28 8:14 ` Byungchul Park
2025-05-28 9:07 ` Pavel Begunkov
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 8:14 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
> On 5/26/25 03:23, Byungchul Park wrote:
> > On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> > > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > > >
> > > > To simplify struct page, the effort to separate its own descriptor from
> > > > struct page is required, and the work for page pool is ongoing.
> > > >
> > > > To achieve that, all the code should avoid accessing page pool members
> > > > of struct page directly, but use safe APIs for the purpose.
> > > >
> > > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> > > > page_pool_page_is_pp().
> > > >
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > ---
> > > > include/linux/mm.h | 5 +----
> > > > net/core/page_pool.c | 5 +++++
> > > > 2 files changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > index 8dc012e84033..3f7c80fb73ce 100644
> > > > --- a/include/linux/mm.h
> > > > +++ b/include/linux/mm.h
> > > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > >
> > > > #ifdef CONFIG_PAGE_POOL
> > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > -{
> > > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > -}
> > >
> > > I vote for keeping this function as-is (do not convert it to netmem),
> > > and instead modify it to access page->netmem_desc->pp_magic.
> >
> > Once the page pool fields are removed from struct page, struct page will
> > have neither struct netmem_desc nor the fields..
> >
> > So it's unavoidable to cast it to netmem_desc in order to refer to
> > pp_magic. Again, pp_magic is no longer associated with struct page.
> >
> > Thoughts?
>
> Once the indirection / page shrinking is realized, the page is
> supposed to have a type field, isn't it? And all pp_magic trickery
> will be replaced with something like
>
> page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
Agree, but we need a temporary solution until then. I will use the
following way for now:
https://lore.kernel.org/all/20250528051452.GB59539@system.software.com/
Byungchul
>
>
> --
> Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 7:35 ` Toke Høiland-Jørgensen
@ 2025-05-28 8:15 ` Byungchul Park
0 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 8:15 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, asml.silence, tariqt, edumazet,
pabeni, saeedm, leon, ast, daniel, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, linux-rdma,
bpf, vishal.moola
On Wed, May 28, 2025 at 09:35:03AM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > On Mon, May 26, 2025 at 11:54:33AM +0200, Toke Høiland-Jørgensen wrote:
> >> Byungchul Park <byungchul@sk.com> writes:
> >>
> >> > On Mon, May 26, 2025 at 10:40:30AM +0200, Toke Høiland-Jørgensen wrote:
> >> >> Byungchul Park <byungchul@sk.com> writes:
> >> >>
> >> >> > On Mon, May 26, 2025 at 11:23:07AM +0900, Byungchul Park wrote:
> >> >> >> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> >> >> >> > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> >> >> >> > >
> >> >> >> > > To simplify struct page, the effort to separate its own descriptor from
> >> >> >> > > struct page is required, and the work for page pool is ongoing.
> >> >> >> > >
> >> >> >> > > To achieve that, all the code should avoid accessing page pool members
> >> >> >> > > of struct page directly, but use safe APIs for the purpose.
> >> >> >> > >
> >> >> >> > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> >> >> >> > > page_pool_page_is_pp().
> >> >> >> > >
> >> >> >> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> >> >> >> > > ---
> >> >> >> > > include/linux/mm.h | 5 +----
> >> >> >> > > net/core/page_pool.c | 5 +++++
> >> >> >> > > 2 files changed, 6 insertions(+), 4 deletions(-)
> >> >> >> > >
> >> >> >> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> >> >> > > index 8dc012e84033..3f7c80fb73ce 100644
> >> >> >> > > --- a/include/linux/mm.h
> >> >> >> > > +++ b/include/linux/mm.h
> >> >> >> > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> >> >> >> > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> >> >> >> > >
> >> >> >> > > #ifdef CONFIG_PAGE_POOL
> >> >> >> > > -static inline bool page_pool_page_is_pp(struct page *page)
> >> >> >> > > -{
> >> >> >> > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> >> >> >> > > -}
> >> >> >> >
> >> >> >> > I vote for keeping this function as-is (do not convert it to netmem),
> >> >> >> > and instead modify it to access page->netmem_desc->pp_magic.
> >> >> >>
> >> >> >> Once the page pool fields are removed from struct page, struct page will
> >> >> >> have neither struct netmem_desc nor the fields..
> >> >> >>
> >> >> >> So it's unavoidable to cast it to netmem_desc in order to refer to
> >> >> >> pp_magic. Again, pp_magic is no longer associated with struct page.
> >> >> >
> >> >> > Options that come across my mind are:
> >> >> >
> >> >> > 1. use lru field of struct page instead, with appropriate comment but
> >> >> > looks so ugly.
> >> >> > 2. instead of a full word for the magic, use a bit of flags or use
> >> >> > the private field for that purpose.
> >> >> > 3. do not check magic number for page pool.
> >> >> > 4. more?
> >> >>
> >> >> I'm not sure I understand Mina's concern about CPU cycles from casting.
> >> >> The casting is a compile-time thing, which shouldn't affect run-time
> >> >
> >> > I didn't mention it but yes.
> >> >
> >> >> performance as long as the check is kept as an inline function. So it's
> >> >> "just" a matter of exposing struct netmem_desc to mm.h so it can use it
> >> >
> >> > Then.. we should expose net_iov as well, but I'm afraid it looks weird.
> >> > Do you think it's okay?
> >>
> >> Well, it'll be ugly, I grant you that :)
> >>
> >> Hmm, so another idea could be to add the pp_magic field to the inner
> >> union that the lru field is in, and keep the page_pool_page_is_pp()
> >> as-is. Then add an assert for offsetof(struct page, pp_magic) ==
> >> offsetof(netmem_desc, pp_magic) on the netmem side, which can be removed
> >> once the two structs no longer shadow each other?
> >>
> >> That way you can still get rid of the embedded page_pool struct in
> >> struct page, and the pp_magic field will just be a transition thing
> >> until things are completely separated...
> >
> > Or what about to do that as mm folks did in page_is_pfmemalloc()?
> >
> > static inline bool page_pool_page_is_pp(struct page *page)
> > {
> > /*
> > * XXX: The space of page->lru.next is used as pp_magic in
> > * struct netmem_desc overlaying on struct page temporarily.
> > * This API will be unneeded shortly. Let's use the ugly but
> > * temporal way to access pp_magic until struct netmem_desc has
> > * its own instance.
> > */
> > return (((unsigned long)page->lru.next) & PP_MAGIC_MASK) == PP_SIGNATURE;
> > }
>
> Sure, that can work as a temporary solution (maybe with a static assert
> somewhere that pp_magic and lru have the same offsetof())?
Sure. I will do that as I posted in the cover letter:
https://lore.kernel.org/all/20250528022911.73453-1-byungchul@sk.com/
Byungchul
>
> -Toke
* Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
2025-05-28 7:43 ` Pavel Begunkov
@ 2025-05-28 8:17 ` Byungchul Park
0 siblings, 0 replies; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 8:17 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Wed, May 28, 2025 at 08:43:34AM +0100, Pavel Begunkov wrote:
> On 5/28/25 06:03, Byungchul Park wrote:
> ...>> Thus abstractly different things maybe should not share the same
> > > in-kernel struct.
> > >
> > > One thing that maybe could work is if struct net_iov has a field in it
> > > which tells us whether it's actually a struct page that can be passed
> > > to mm apis, or not a struct page which cannot be passed to mm apis.
> > >
> > > > Or I should introduce another struct
> > >
> > > maybe introducing another struct is the answer. I'm not sure. The net
> >
> > The final form should be like:
> >
> > struct netmem_desc {
> > struct page_pool *pp;
> > unsigned long dma_addr;
> > atomic_long_t ref_count;
> > };
> >
> > struct net_iov {
> > struct netmem_desc;
> > enum net_iov_type type;
> > struct net_iov_area *owner;
> > ...
> > };
> >
> > However, now that overlaying on struct page is required, struct
> > netmem_desc should be almost same as struct net_iov. So I'm not sure if
> > we should introduce struct netmem_desc as a new struct along with struct
> > net_iov.
>
> Yes, you should. Mina already explained that net_iov is not the same
> thing as the net specific sub-struct of the page. They have common
> fields, but there are also net_iov (memory provider) specific fields
> as well.
Okay then. I will introduce a separate struct, netmem_desc, that has
similar fields to net_iov, plus the related static asserts for the offsets.
Byungchul
>
> --
> Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 8:14 ` Byungchul Park
@ 2025-05-28 9:07 ` Pavel Begunkov
2025-05-28 9:14 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 9:07 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On 5/28/25 09:14, Byungchul Park wrote:
> On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
>> On 5/26/25 03:23, Byungchul Park wrote:
>>> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>>>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>>>
>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>> struct page is required, and the work for page pool is ongoing.
>>>>>
>>>>> To achieve that, all the code should avoid accessing page pool members
>>>>> of struct page directly, but use safe APIs for the purpose.
>>>>>
>>>>> Use netmem_is_pp() instead of directly accessing page->pp_magic in
>>>>> page_pool_page_is_pp().
>>>>>
>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>> ---
>>>>> include/linux/mm.h | 5 +----
>>>>> net/core/page_pool.c | 5 +++++
>>>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index 8dc012e84033..3f7c80fb73ce 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>
>>>>> #ifdef CONFIG_PAGE_POOL
>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>> -{
>>>>> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>> -}
>>>>
>>>> I vote for keeping this function as-is (do not convert it to netmem),
>>>> and instead modify it to access page->netmem_desc->pp_magic.
>>>
>>> Once the page pool fields are removed from struct page, struct page will
>>> have neither struct netmem_desc nor the fields..
>>>
>>> So it's unavoidable to cast it to netmem_desc in order to refer to
>>> pp_magic. Again, pp_magic is no longer associated with struct page.
>>>
>>> Thoughts?
>>
>> Once the indirection / page shrinking is realized, the page is
>> supposed to have a type field, isn't it? And all pp_magic trickery
>> will be replaced with something like
>>
>> page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
>
> Agree, but we need a temporary solution until then. I will use the
> following way for now:
The question is: what is the problem that needs another temporary
solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
can continue using page->netmem_desc->pp_magic as before, and mm folks
will fix it up to page->type when it's time for that. And the compiler
will help by failing compilation if forgotten. You should be able to do
the same with the overlay option.
And, AFAIU, they want to remove/move the lru field in the same way?
In which case we'll get the same problem and need to re-alias it to
something else.
--
Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 9:07 ` Pavel Begunkov
@ 2025-05-28 9:14 ` Byungchul Park
2025-05-28 9:20 ` Pavel Begunkov
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 9:14 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Wed, May 28, 2025 at 10:07:52AM +0100, Pavel Begunkov wrote:
> On 5/28/25 09:14, Byungchul Park wrote:
> > On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
> > > On 5/26/25 03:23, Byungchul Park wrote:
> > > > On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> > > > > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > > >
> > > > > > To simplify struct page, the effort to separate its own descriptor from
> > > > > > struct page is required, and the work for page pool is ongoing.
> > > > > >
> > > > > > To achieve that, all the code should avoid accessing page pool members
> > > > > > of struct page directly, but use safe APIs for the purpose.
> > > > > >
> > > > > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> > > > > > page_pool_page_is_pp().
> > > > > >
> > > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > > ---
> > > > > > include/linux/mm.h | 5 +----
> > > > > > net/core/page_pool.c | 5 +++++
> > > > > > 2 files changed, 6 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > index 8dc012e84033..3f7c80fb73ce 100644
> > > > > > --- a/include/linux/mm.h
> > > > > > +++ b/include/linux/mm.h
> > > > > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > > > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > > >
> > > > > > #ifdef CONFIG_PAGE_POOL
> > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > -{
> > > > > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > -}
> > > > >
> > > > > I vote for keeping this function as-is (do not convert it to netmem),
> > > > > and instead modify it to access page->netmem_desc->pp_magic.
> > > >
> > > > Once the page pool fields are removed from struct page, struct page will
> > > > have neither struct netmem_desc nor the fields..
> > > >
> > > > So it's unavoidable to cast it to netmem_desc in order to refer to
> > > > pp_magic. Again, pp_magic is no longer associated with struct page.
> > > >
> > > > Thoughts?
> > >
> > > Once the indirection / page shrinking is realized, the page is
> > > supposed to have a type field, isn't it? And all pp_magic trickery
> > > will be replaced with something like
> > >
> > > page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
> >
> > Agree, but we need a temporary solution until then. I will use the
> > following way for now:
>
> The question is what is the problem that you need another temporary
> solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
I prefer using the placeholder, but Matthew does not. I explained it here:
https://lore.kernel.org/all/20250528013145.GB2986@system.software.com/
Now, I'm going the same way as the other approaches, e.g. ptdesc.
Byungchul
> can continue using page->netmem_desc->pp_magic as before, and mm folks
> will fix it up to page->type when it's time for that. And the compiler
> will help by failing compilation if forgotten. You should be able to do
> the same with the overlay option.
>
> And, AFAIU, they want to remove/move the lru field in the same way?
> In which case we'll get the same problem and need to re-alias it to
> something else.
>
> --
> Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 9:14 ` Byungchul Park
@ 2025-05-28 9:20 ` Pavel Begunkov
2025-05-28 9:33 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 9:20 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On 5/28/25 10:14, Byungchul Park wrote:
> On Wed, May 28, 2025 at 10:07:52AM +0100, Pavel Begunkov wrote:
>> On 5/28/25 09:14, Byungchul Park wrote:
>>> On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
>>>> On 5/26/25 03:23, Byungchul Park wrote:
>>>>> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>>>>>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>>>>>
>>>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>>>> struct page is required, and the work for page pool is ongoing.
>>>>>>>
>>>>>>> To achieve that, all the code should avoid accessing page pool members
>>>>>>> of struct page directly, but use safe APIs for the purpose.
>>>>>>>
>>>>>>> Use netmem_is_pp() instead of directly accessing page->pp_magic in
>>>>>>> page_pool_page_is_pp().
>>>>>>>
>>>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>>>> ---
>>>>>>> include/linux/mm.h | 5 +----
>>>>>>> net/core/page_pool.c | 5 +++++
>>>>>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>> index 8dc012e84033..3f7c80fb73ce 100644
>>>>>>> --- a/include/linux/mm.h
>>>>>>> +++ b/include/linux/mm.h
>>>>>>> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>>> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>>>
>>>>>>> #ifdef CONFIG_PAGE_POOL
>>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>> -{
>>>>>>> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>>> -}
>>>>>>
>>>>>> I vote for keeping this function as-is (do not convert it to netmem),
>>>>>> and instead modify it to access page->netmem_desc->pp_magic.
>>>>>
>>>>> Once the page pool fields are removed from struct page, struct page will
>>>>> have neither struct netmem_desc nor the fields..
>>>>>
>>>>> So it's unavoidable to cast it to netmem_desc in order to refer to
>>>>> pp_magic. Again, pp_magic is no longer associated with struct page.
>>>>>
>>>>> Thoughts?
>>>>
>>>> Once the indirection / page shrinking is realized, the page is
>>>> supposed to have a type field, isn't it? And all pp_magic trickery
>>>> will be replaced with something like
>>>>
>>>> page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
>>>
>>> Agree, but we need a temporary solution until then. I will use the
>>> following way for now:
>>
>> The question is what is the problem that you need another temporary
>> solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
>
> I prefer using the place-holder, but Matthew does not. I explained it:
>
> https://lore.kernel.org/all/20250528013145.GB2986@system.software.com/
>
> Now, I'm going with the same way as the other approaches e.g. ptdesc.
Sure, but that doesn't change my point
--
Pavel Begunkov
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 9:20 ` Pavel Begunkov
@ 2025-05-28 9:33 ` Byungchul Park
2025-05-28 9:51 ` Pavel Begunkov
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 9:33 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Wed, May 28, 2025 at 10:20:29AM +0100, Pavel Begunkov wrote:
> On 5/28/25 10:14, Byungchul Park wrote:
> > On Wed, May 28, 2025 at 10:07:52AM +0100, Pavel Begunkov wrote:
> > > On 5/28/25 09:14, Byungchul Park wrote:
> > > > On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
> > > > > On 5/26/25 03:23, Byungchul Park wrote:
> > > > > > On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> > > > > > > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > > > > >
> > > > > > > > To simplify struct page, the effort to separate its own descriptor from
> > > > > > > > struct page is required, and the work for page pool is ongoing.
> > > > > > > >
> > > > > > > > To achieve that, all the code should avoid accessing page pool members
> > > > > > > > of struct page directly, but use safe APIs for the purpose.
> > > > > > > >
> > > > > > > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> > > > > > > > page_pool_page_is_pp().
> > > > > > > >
> > > > > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > > > > ---
> > > > > > > > include/linux/mm.h | 5 +----
> > > > > > > > net/core/page_pool.c | 5 +++++
> > > > > > > > 2 files changed, 6 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > > > index 8dc012e84033..3f7c80fb73ce 100644
> > > > > > > > --- a/include/linux/mm.h
> > > > > > > > +++ b/include/linux/mm.h
> > > > > > > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > > > > > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > > > > >
> > > > > > > > #ifdef CONFIG_PAGE_POOL
> > > > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > -{
> > > > > > > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > > > -}
> > > > > > >
> > > > > > > I vote for keeping this function as-is (do not convert it to netmem),
> > > > > > > and instead modify it to access page->netmem_desc->pp_magic.
> > > > > >
> > > > > > Once the page pool fields are removed from struct page, struct page will
> > > > > > have neither struct netmem_desc nor the fields..
> > > > > >
> > > > > > So it's unevitable to cast it to netmem_desc in order to refer to
> > > > > > pp_magic. Again, pp_magic is no longer associated to struct page.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > Once the indirection / page shrinking is realized, the page is
> > > > > supposed to have a type field, isn't it? And all pp_magic trickery
> > > > > will be replaced with something like
> > > > >
> > > > > page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
> > > >
> > > > Agree, but we need a temporary solution until then. I will use the
> > > > following way for now:
> > >
> > > The question is what is the problem that you need another temporary
> > > solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
> >
> > I prefer using the place-holder, but Matthew does not. I explained it:
> >
> > https://lore.kernel.org/all/20250528013145.GB2986@system.software.com/
> >
> > Now, I'm going with the same way as the other approaches e.g. ptdesc.
>
> Sure, but that doesn't change my point
What's your point? The other approaches do not use place-holders, so I
don't get your point.
As I told you, I will introduce a new struct, netmem_desc, instead of
struct_group_tagged() on struct net_iov, and modify the static assert on
the offsets to keep the important fields aligned between struct page and
netmem_desc.
Then, does that address your point? Or could you explain your point in
more detail? Were you making other points than these?
Byungchul
>
> --
> Pavel Begunkov
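The offset-pinning plan described above (a separate struct netmem_desc whose key page-pool fields are kept at the same offsets as in struct page, enforced by static asserts) could look roughly like the sketch below. The struct layouts are simplified stand-ins for illustration, not the real kernel definitions.

```c
/* Rough sketch of pinning netmem_desc field offsets against struct
 * page with static asserts.  Layouts are illustrative stand-ins. */
#include <assert.h>
#include <stddef.h>

struct page_stub {                /* simplified stand-in for struct page */
	unsigned long flags;
	unsigned long pp_magic;
	void *pp;
	unsigned long dma_addr;
};

struct netmem_desc {              /* the separate descriptor */
	unsigned long _flags_pad;     /* lines up with page->flags */
	unsigned long pp_magic;
	void *pp;
	unsigned long dma_addr;
};

/* Keep the important fields at identical offsets so that casting a
 * struct page pointer to struct netmem_desc stays safe. */
static_assert(offsetof(struct netmem_desc, pp_magic) ==
	      offsetof(struct page_stub, pp_magic), "pp_magic offset");
static_assert(offsetof(struct netmem_desc, pp) ==
	      offsetof(struct page_stub, pp), "pp offset");
static_assert(offsetof(struct netmem_desc, dma_addr) ==
	      offsetof(struct page_stub, dma_addr), "dma_addr offset");
```

If any field moves in either struct, the build fails rather than silently corrupting the aliased layout.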
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 9:33 ` Byungchul Park
@ 2025-05-28 9:51 ` Pavel Begunkov
2025-05-28 10:44 ` Byungchul Park
0 siblings, 1 reply; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 9:51 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On 5/28/25 10:33, Byungchul Park wrote:
> On Wed, May 28, 2025 at 10:20:29AM +0100, Pavel Begunkov wrote:
>> On 5/28/25 10:14, Byungchul Park wrote:
>>> On Wed, May 28, 2025 at 10:07:52AM +0100, Pavel Begunkov wrote:
>>>> On 5/28/25 09:14, Byungchul Park wrote:
>>>>> On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
>>>>>> On 5/26/25 03:23, Byungchul Park wrote:
>>>>>>> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>>>>>>>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>>>>>>>
>>>>>>>>> To simplify struct page, the effort to seperate its own descriptor from
>>>>>>>>> struct page is required and the work for page pool is on going.
>>>>>>>>>
>>>>>>>>> To achieve that, all the code should avoid accessing page pool members
>>>>>>>>> of struct page directly, but use safe APIs for the purpose.
>>>>>>>>>
>>>>>>>>> Use netmem_is_pp() instead of directly accessing page->pp_magic in
>>>>>>>>> page_pool_page_is_pp().
>>>>>>>>>
>>>>>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>>>>>> ---
>>>>>>>>> include/linux/mm.h | 5 +----
>>>>>>>>> net/core/page_pool.c | 5 +++++
>>>>>>>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>>>> index 8dc012e84033..3f7c80fb73ce 100644
>>>>>>>>> --- a/include/linux/mm.h
>>>>>>>>> +++ b/include/linux/mm.h
>>>>>>>>> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>>>>> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>>>>>
>>>>>>>>> #ifdef CONFIG_PAGE_POOL
>>>>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>>>> -{
>>>>>>>>> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>>>>> -}
>>>>>>>>
>>>>>>>> I vote for keeping this function as-is (do not convert it to netmem),
>>>>>>>> and instead modify it to access page->netmem_desc->pp_magic.
>>>>>>>
>>>>>>> Once the page pool fields are removed from struct page, struct page will
>>>>>>> have neither struct netmem_desc nor the fields..
>>>>>>>
>>>>>>> So it's unevitable to cast it to netmem_desc in order to refer to
>>>>>>> pp_magic. Again, pp_magic is no longer associated to struct page.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Once the indirection / page shrinking is realized, the page is
>>>>>> supposed to have a type field, isn't it? And all pp_magic trickery
>>>>>> will be replaced with something like
>>>>>>
>>>>>> page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
>>>>>
>>>>> Agree, but we need a temporary solution until then. I will use the
>>>>> following way for now:
>>>>
>>>> The question is what is the problem that you need another temporary
>>>> solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
>>>
>>> I prefer using the place-holder, but Matthew does not. I explained it:
>>>
>>> https://lore.kernel.org/all/20250528013145.GB2986@system.software.com/
>>>
>>> Now, I'm going with the same way as the other approaches e.g. ptdesc.
>>
>> Sure, but that doesn't change my point
>
> What's your point? The other appoaches do not use place-holders. I
> don't get your point.
>
> As I told you, I will introduce a new struct, netmem_desc, instead of
> struct_group_tagged() on struct net_iov, and modify the static assert on
> the offsets to keep the important fields between struct page and
> netmem_desc.
>
> Then, is that following your point? Or could you explain your point in
> more detail? Did you say other points than these?
Then please read the message again first. I was replying to the
aliasing with "lru", and even at the place you cut the message it
says "for example", which was followed by "You should be able to
do the same with the overlay option.".
You can still continue to use pp_magic placed in the netmem_desc
until mm gets rid of it in favour of page->type. I hear that you're
saying it's temporary, but it's messy, and there is nothing more
persistent than a "temporary solution"; who knows when the final
conversion is actually going to happen.
--
Pavel Begunkov
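The two identification schemes being contrasted above (today's pp_magic signature check vs a future page->type check) can be sketched in compilable form. The PP_SIGNATURE value and the PAGE_TYPE_PP name are illustrative assumptions, not the kernel's actual definitions, and a 64-bit unsigned long is assumed.

```c
/* Contrast of the current pp_magic signature check with the future
 * page->type check suggested in the thread.  Values are illustrative. */
#include <assert.h>
#include <stdbool.h>

#define PP_SIGNATURE      0x40UL                 /* illustrative value */
#define PP_DMA_INDEX_MASK 0xffffffff00000000UL   /* illustrative value */
#define PP_MAGIC_MASK     (~(PP_DMA_INDEX_MASK | 0x3UL))

enum page_type { PAGE_TYPE_NORMAL, PAGE_TYPE_PP };

struct page_stub {                /* simplified stand-in for struct page */
	unsigned long pp_magic;       /* current scheme */
	enum page_type type;          /* future scheme */
};

/* Current: low bits of pp_magic carry a signature; the bits reused for
 * the DMA index (and the two low bits) are masked off first. */
static bool page_is_pp_by_magic(const struct page_stub *page)
{
	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
}

/* Future: a dedicated type field, with no magic-mask trickery. */
static bool page_is_pp_by_type(const struct page_stub *page)
{
	return page->type == PAGE_TYPE_PP;
}
```

The magic-mask variant keeps working even while the upper bits are borrowed for a DMA index, which is exactly the complexity a dedicated type field would remove.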
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 9:51 ` Pavel Begunkov
@ 2025-05-28 10:44 ` Byungchul Park
2025-05-28 10:54 ` Pavel Begunkov
0 siblings, 1 reply; 72+ messages in thread
From: Byungchul Park @ 2025-05-28 10:44 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On Wed, May 28, 2025 at 10:51:29AM +0100, Pavel Begunkov wrote:
> On 5/28/25 10:33, Byungchul Park wrote:
> > On Wed, May 28, 2025 at 10:20:29AM +0100, Pavel Begunkov wrote:
> > > On 5/28/25 10:14, Byungchul Park wrote:
> > > > On Wed, May 28, 2025 at 10:07:52AM +0100, Pavel Begunkov wrote:
> > > > > On 5/28/25 09:14, Byungchul Park wrote:
> > > > > > On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
> > > > > > > On 5/26/25 03:23, Byungchul Park wrote:
> > > > > > > > On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
> > > > > > > > > On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
> > > > > > > > > >
> > > > > > > > > > To simplify struct page, the effort to seperate its own descriptor from
> > > > > > > > > > struct page is required and the work for page pool is on going.
> > > > > > > > > >
> > > > > > > > > > To achieve that, all the code should avoid accessing page pool members
> > > > > > > > > > of struct page directly, but use safe APIs for the purpose.
> > > > > > > > > >
> > > > > > > > > > Use netmem_is_pp() instead of directly accessing page->pp_magic in
> > > > > > > > > > page_pool_page_is_pp().
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > > > > > > ---
> > > > > > > > > > include/linux/mm.h | 5 +----
> > > > > > > > > > net/core/page_pool.c | 5 +++++
> > > > > > > > > > 2 files changed, 6 insertions(+), 4 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > > > > > index 8dc012e84033..3f7c80fb73ce 100644
> > > > > > > > > > --- a/include/linux/mm.h
> > > > > > > > > > +++ b/include/linux/mm.h
> > > > > > > > > > @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > > > > > > > > > #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> > > > > > > > > >
> > > > > > > > > > #ifdef CONFIG_PAGE_POOL
> > > > > > > > > > -static inline bool page_pool_page_is_pp(struct page *page)
> > > > > > > > > > -{
> > > > > > > > > > - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> > > > > > > > > > -}
> > > > > > > > >
> > > > > > > > > I vote for keeping this function as-is (do not convert it to netmem),
> > > > > > > > > and instead modify it to access page->netmem_desc->pp_magic.
> > > > > > > >
> > > > > > > > Once the page pool fields are removed from struct page, struct page will
> > > > > > > > have neither struct netmem_desc nor the fields..
> > > > > > > >
> > > > > > > > So it's unevitable to cast it to netmem_desc in order to refer to
> > > > > > > > pp_magic. Again, pp_magic is no longer associated to struct page.
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > >
> > > > > > > Once the indirection / page shrinking is realized, the page is
> > > > > > > supposed to have a type field, isn't it? And all pp_magic trickery
> > > > > > > will be replaced with something like
> > > > > > >
> > > > > > > page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
> > > > > >
> > > > > > Agree, but we need a temporary solution until then. I will use the
> > > > > > following way for now:
> > > > >
> > > > > The question is what is the problem that you need another temporary
> > > > > solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
> > > >
> > > > I prefer using the place-holder, but Matthew does not. I explained it:
> > > >
> > > > https://lore.kernel.org/all/20250528013145.GB2986@system.software.com/
> > > >
> > > > Now, I'm going with the same way as the other approaches e.g. ptdesc.
> > >
> > > Sure, but that doesn't change my point
> >
> > What's your point? The other appoaches do not use place-holders. I
> > don't get your point.
> >
> > As I told you, I will introduce a new struct, netmem_desc, instead of
> > struct_group_tagged() on struct net_iov, and modify the static assert on
> > the offsets to keep the important fields between struct page and
> > netmem_desc.
> >
> > Then, is that following your point? Or could you explain your point in
> > more detail? Did you say other points than these?
>
> Then please read the message again first. I was replying to th
> aliasing with "lru", and even at the place you cut the message it
> says "for example", which was followed by "You should be able to
> do the same with the overlay option.".
With struct_group_tagged() on struct net_iov, I have no idea how to do
that. However, it's doable with a new separate struct, struct
netmem_desc, so that's what I will do.
Byungchul
>
> You can still continue to use pp_magic placed in the netmem_desc
> until mm gets rid of it in favour of page->type. I hear that you're
> saying it's temporary, but it's messy and there is nothing more
> persistent than a "temporary solution", who knows where the final
> conversion is going to happen.
>
> --
> Pavel Begunkov
>
* Re: [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp()
2025-05-28 10:44 ` Byungchul Park
@ 2025-05-28 10:54 ` Pavel Begunkov
0 siblings, 0 replies; 72+ messages in thread
From: Pavel Begunkov @ 2025-05-28 10:54 UTC (permalink / raw)
To: Byungchul Park
Cc: Mina Almasry, willy, netdev, linux-kernel, linux-mm, kernel_team,
kuba, ilias.apalodimas, harry.yoo, hawk, akpm, davem,
john.fastabend, andrew+netdev, toke, tariqt, edumazet, pabeni,
saeedm, leon, ast, daniel, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, horms, linux-rdma, bpf,
vishal.moola
On 5/28/25 11:44, Byungchul Park wrote:
> On Wed, May 28, 2025 at 10:51:29AM +0100, Pavel Begunkov wrote:
>> On 5/28/25 10:33, Byungchul Park wrote:
>>> On Wed, May 28, 2025 at 10:20:29AM +0100, Pavel Begunkov wrote:
>>>> On 5/28/25 10:14, Byungchul Park wrote:
>>>>> On Wed, May 28, 2025 at 10:07:52AM +0100, Pavel Begunkov wrote:
>>>>>> On 5/28/25 09:14, Byungchul Park wrote:
>>>>>>> On Wed, May 28, 2025 at 08:51:47AM +0100, Pavel Begunkov wrote:
>>>>>>>> On 5/26/25 03:23, Byungchul Park wrote:
>>>>>>>>> On Fri, May 23, 2025 at 10:21:17AM -0700, Mina Almasry wrote:
>>>>>>>>>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> To simplify struct page, the effort to seperate its own descriptor from
>>>>>>>>>>> struct page is required and the work for page pool is on going.
>>>>>>>>>>>
>>>>>>>>>>> To achieve that, all the code should avoid accessing page pool members
>>>>>>>>>>> of struct page directly, but use safe APIs for the purpose.
>>>>>>>>>>>
>>>>>>>>>>> Use netmem_is_pp() instead of directly accessing page->pp_magic in
>>>>>>>>>>> page_pool_page_is_pp().
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>>>>>>>> ---
>>>>>>>>>>> include/linux/mm.h | 5 +----
>>>>>>>>>>> net/core/page_pool.c | 5 +++++
>>>>>>>>>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>>>>>> index 8dc012e84033..3f7c80fb73ce 100644
>>>>>>>>>>> --- a/include/linux/mm.h
>>>>>>>>>>> +++ b/include/linux/mm.h
>>>>>>>>>>> @@ -4312,10 +4312,7 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>>>>>>> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>>>>>>>
>>>>>>>>>>> #ifdef CONFIG_PAGE_POOL
>>>>>>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>>>>>>> -{
>>>>>>>>>>> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>>>>>>> -}
>>>>>>>>>>
>>>>>>>>>> I vote for keeping this function as-is (do not convert it to netmem),
>>>>>>>>>> and instead modify it to access page->netmem_desc->pp_magic.
>>>>>>>>>
>>>>>>>>> Once the page pool fields are removed from struct page, struct page will
>>>>>>>>> have neither struct netmem_desc nor the fields..
>>>>>>>>>
>>>>>>>>> So it's unevitable to cast it to netmem_desc in order to refer to
>>>>>>>>> pp_magic. Again, pp_magic is no longer associated to struct page.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Once the indirection / page shrinking is realized, the page is
>>>>>>>> supposed to have a type field, isn't it? And all pp_magic trickery
>>>>>>>> will be replaced with something like
>>>>>>>>
>>>>>>>> page_pool_page_is_pp() { return page->type == PAGE_TYPE_PP; }
>>>>>>>
>>>>>>> Agree, but we need a temporary solution until then. I will use the
>>>>>>> following way for now:
>>>>>>
>>>>>> The question is what is the problem that you need another temporary
>>>>>> solution? If, for example, we go the placeholder way, page_pool_page_is_pp()
>>>>>
>>>>> I prefer using the place-holder, but Matthew does not. I explained it:
>>>>>
>>>>> https://lore.kernel.org/all/20250528013145.GB2986@system.software.com/
>>>>>
>>>>> Now, I'm going with the same way as the other approaches e.g. ptdesc.
>>>>
>>>> Sure, but that doesn't change my point
>>>
>>> What's your point? The other appoaches do not use place-holders. I
>>> don't get your point.
>>>
>>> As I told you, I will introduce a new struct, netmem_desc, instead of
>>> struct_group_tagged() on struct net_iov, and modify the static assert on
>>> the offsets to keep the important fields between struct page and
>>> netmem_desc.
>>>
>>> Then, is that following your point? Or could you explain your point in
>>> more detail? Did you say other points than these?
>>
>> Then please read the message again first. I was replying to th
>> aliasing with "lru", and even at the place you cut the message it
>> says "for example", which was followed by "You should be able to
>> do the same with the overlay option.".
>
> With struct_group_tagged() on struct net_iov, no idea about how to.
> However, it's doable with a new separate struct, struct netmem_desc.
static inline bool page_pool_page_is_pp(struct page *page)
{
	unsigned long pp_magic = page_to_netdesc(page)->pp_magic;

	return pp_magic == ...;
}
page_to_netdesc() is either a direct cast in the case of a full page
overlay, or "&page->netdesc" for the placeholder option.
--
Pavel Begunkov
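The two page_to_netdesc() options described above (a full-page overlay via a direct cast vs a placeholder field embedded in struct page) can be sketched as follows. All names mirror the thread's examples; the struct layouts and the PP_SIGNATURE value are illustrative stand-ins, not kernel definitions.

```c
/* Sketch of the two page_to_netdesc() conversion options discussed:
 * full overlay (cast) vs embedded placeholder field. */
#include <assert.h>
#include <stdbool.h>

#define PP_SIGNATURE 0x40UL       /* illustrative value */

struct netmem_desc {
	unsigned long pp_magic;
};

/* Option A: full overlay -- struct netmem_desc aliases struct page,
 * so the conversion is a direct cast. */
struct page_overlay {
	unsigned long pp_magic;       /* same offset as netmem_desc.pp_magic */
};

static struct netmem_desc *page_to_netdesc(struct page_overlay *page)
{
	return (struct netmem_desc *)page;
}

/* Option B: placeholder -- struct page embeds the descriptor, so the
 * conversion takes the address of the embedded field. */
struct page_placeholder {
	struct netmem_desc netdesc;
};

static struct netmem_desc *page_to_netdesc_ph(struct page_placeholder *page)
{
	return &page->netdesc;
}

/* Either way, the caller-facing check keeps the same shape: */
static bool page_pool_page_is_pp(struct page_overlay *page)
{
	return page_to_netdesc(page)->pp_magic == PP_SIGNATURE;
}
```

Only the conversion helper differs between the two options; callers such as page_pool_page_is_pp() are unchanged.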
end of thread, other threads:[~2025-05-28 10:53 UTC | newest]
Thread overview: 72+ messages
2025-05-23 3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
2025-05-23 3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
2025-05-23 9:01 ` Toke Høiland-Jørgensen
2025-05-26 0:56 ` Byungchul Park
2025-05-23 17:00 ` Mina Almasry
2025-05-26 1:15 ` Byungchul Park
2025-05-27 2:50 ` Byungchul Park
2025-05-27 20:03 ` Mina Almasry
2025-05-28 1:21 ` Byungchul Park
2025-05-28 3:47 ` Mina Almasry
2025-05-28 5:03 ` Byungchul Park
2025-05-28 7:43 ` Pavel Begunkov
2025-05-28 8:17 ` Byungchul Park
2025-05-28 7:38 ` Pavel Begunkov
2025-05-23 3:25 ` [PATCH 02/18] netmem: introduce netmem alloc APIs to wrap page alloc APIs Byungchul Park
2025-05-23 3:25 ` [PATCH 03/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order() Byungchul Park
2025-05-23 3:25 ` [PATCH 04/18] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
2025-05-23 3:25 ` [PATCH 05/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow() Byungchul Park
2025-05-23 3:25 ` [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-05-28 3:18 ` Mina Almasry
2025-05-23 3:25 ` [PATCH 07/18] page_pool: use netmem put API in page_pool_return_netmem() Byungchul Park
2025-05-23 3:25 ` [PATCH 08/18] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-05-23 3:26 ` [PATCH 09/18] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
2025-05-23 3:26 ` [PATCH 10/18] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-05-23 3:26 ` [PATCH 11/18] mlx4: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23 3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
2025-05-23 8:58 ` Toke Høiland-Jørgensen
2025-05-23 17:21 ` Mina Almasry
2025-05-26 2:23 ` Byungchul Park
2025-05-26 2:36 ` Byungchul Park
2025-05-26 8:40 ` Toke Høiland-Jørgensen
2025-05-26 9:43 ` Byungchul Park
2025-05-26 9:54 ` Toke Høiland-Jørgensen
2025-05-26 10:01 ` Byungchul Park
2025-05-28 5:14 ` Byungchul Park
2025-05-28 7:35 ` Toke Høiland-Jørgensen
2025-05-28 8:15 ` Byungchul Park
2025-05-28 7:51 ` Pavel Begunkov
2025-05-28 8:14 ` Byungchul Park
2025-05-28 9:07 ` Pavel Begunkov
2025-05-28 9:14 ` Byungchul Park
2025-05-28 9:20 ` Pavel Begunkov
2025-05-28 9:33 ` Byungchul Park
2025-05-28 9:51 ` Pavel Begunkov
2025-05-28 10:44 ` Byungchul Park
2025-05-28 10:54 ` Pavel Begunkov
2025-05-23 3:26 ` [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23 17:13 ` Mina Almasry
2025-05-26 3:08 ` Byungchul Park
2025-05-26 8:12 ` Byungchul Park
2025-05-26 18:00 ` Mina Almasry
2025-05-23 3:26 ` [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-05-23 17:14 ` Mina Almasry
2025-05-23 3:26 ` [PATCH 15/18] netmem: remove __netmem_get_pp() Byungchul Park
2025-05-23 3:26 ` [PATCH 16/18] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-05-23 3:26 ` [PATCH 17/18] netdevsim: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23 3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-23 17:16 ` kernel test robot
2025-05-23 17:55 ` Mina Almasry
2025-05-26 1:37 ` Byungchul Park
2025-05-26 16:58 ` Pavel Begunkov
2025-05-26 17:33 ` Mina Almasry
2025-05-27 1:02 ` Byungchul Park
2025-05-27 1:31 ` Byungchul Park
2025-05-27 5:30 ` Pavel Begunkov
2025-05-27 17:38 ` Mina Almasry
2025-05-28 1:31 ` Byungchul Park
2025-05-28 7:21 ` Pavel Begunkov
2025-05-23 6:20 ` [PATCH 00/18] Split netmem from " Taehee Yoo
2025-05-23 7:47 ` Byungchul Park
2025-05-23 17:47 ` SeongJae Park
2025-05-26 1:16 ` Byungchul Park