* [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
@ 2025-07-29 11:02 Byungchul Park
2025-07-29 14:20 ` Zi Yan
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Byungchul Park @ 2025-07-29 11:02 UTC (permalink / raw)
To: linux-mm, netdev
Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
andrew+netdev, edumazet, pabeni, akpm, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
linux-rdma, sfr
Changes from v2:
1. Rebase on linux-next as of Jul 29.
2. Skip 'niov->pp = NULL' when it's allocated using __GFP_ZERO.
3. Make trivial coding style changes. (feedback from Mina)
4. Add Co-developed-by, Acked-by, and Reviewed-by properly.
Thanks to all.
Changes from v1:
1. Rebase on linux-next.
2. Initialize net_iov->pp = NULL when allocating net_iov in
net_devmem_bind_dmabuf() and io_zcrx_create_area().
3. Use ->pp to identify whether a net_iov belongs to a page pool,
rather than always treating net_iov as pp.
4. Add Suggested-by: David Hildenbrand <david@redhat.com>.
---8<---
From 88bcb9907a0cef65a9c0adf35e144f9eb67e0542 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul@sk.com>
Date: Tue, 29 Jul 2025 19:49:44 +0900
Subject: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
The ->pp_magic field in struct page is currently used to identify whether
a page belongs to a page pool. However, ->pp_magic will be removed, and a
page type in struct page, i.e. PGTY_netpp, can be used for that purpose.
Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
and __ClearPageNetpp() instead, and remove the existing APIs accessing
->pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
netmem_clear_pp_magic().
For net_iov, use ->pp to identify whether it belongs to a page pool,
making sure that ->pp is NULL for any net_iov that does not.
This work was inspired by the following link:
[1] https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
While at it, move the page pool sanity check to the free path.
Suggested-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 2 +-
include/linux/mm.h | 27 +++----------------
include/linux/page-flags.h | 6 +++++
include/net/netmem.h | 2 +-
io_uring/zcrx.c | 4 +++
mm/page_alloc.c | 7 +++--
net/core/devmem.c | 1 +
net/core/netmem_priv.h | 23 +++++++---------
net/core/page_pool.c | 10 +++++--
9 files changed, 37 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 5d51600935a6..def274f5c1ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
page = xdpi.page.page;
- /* No need to check page_pool_page_is_pp() as we
+ /* No need to check PageNetpp() as we
* know this is a page_pool page.
*/
page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0d4ee569aa6b..d01b296e7184 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4171,10 +4171,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
* DMA mapping IDs for page_pool
*
* When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
- * stashes it in the upper bits of page->pp_magic. We always want to be able to
- * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
- * pages can have arbitrary kernel pointers stored in the same field as pp_magic
- * (since it overlaps with page->lru.next), so we must ensure that we cannot
+ * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
+ * arbitrary kernel pointers stored in the same field as pp_magic (since
+ * it overlaps with page->lru.next), so we must ensure that we cannot
* mistake a valid kernel pointer with any of the values we write into this
* field.
*
@@ -4205,26 +4204,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
PP_DMA_INDEX_SHIFT)
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
- return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
- return false;
-}
-#endif
-
#define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
#define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
#define PAGE_SNAPSHOT_PG_IDLE (1 << 2)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8d3fa3a91ce4..84247e39e9e7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -933,6 +933,7 @@ enum pagetype {
PGTY_zsmalloc = 0xf6,
PGTY_unaccepted = 0xf7,
PGTY_large_kmalloc = 0xf8,
+ PGTY_netpp = 0xf9,
PGTY_mapcount_underflow = 0xff
};
@@ -1077,6 +1078,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
FOLIO_TYPE_OPS(large_kmalloc, large_kmalloc)
+/*
+ * Marks page_pool allocated pages.
+ */
+PAGE_TYPE_OPS(Netpp, netpp, netpp)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/include/net/netmem.h b/include/net/netmem.h
index f7dacc9e75fd..3667334e16e7 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -298,7 +298,7 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
*/
#define pp_page_to_nmdesc(p) \
({ \
- DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p)); \
+ DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p)); \
__pp_page_to_nmdesc(p); \
})
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index e5ff49f3425e..f771bb3e756d 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -444,6 +444,10 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
area->freelist[i] = i;
atomic_set(&area->user_refs[i], 0);
niov->type = NET_IOV_IOURING;
+
+ /* niov->pp is already initialized to NULL by
+ * kvmalloc_array(__GFP_ZERO).
+ */
}
area->free_count = nr_iovs;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1d037f97c5f..2f6a55fab942 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1042,7 +1042,6 @@ static inline bool page_expected_state(struct page *page,
#ifdef CONFIG_MEMCG
page->memcg_data |
#endif
- page_pool_page_is_pp(page) |
(page->flags & check_flags)))
return false;
@@ -1069,8 +1068,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
if (unlikely(page->memcg_data))
bad_reason = "page still charged to cgroup";
#endif
- if (unlikely(page_pool_page_is_pp(page)))
- bad_reason = "page_pool leak";
return bad_reason;
}
@@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
folio->mapping = NULL;
}
- if (unlikely(page_has_type(page)))
+ if (unlikely(page_has_type(page))) {
+ WARN_ON_ONCE(PageNetpp(page));
/* Reset the page_type (which overlays _mapcount) */
page->page_type = UINT_MAX;
+ }
if (is_check_pages_enabled()) {
if (free_page_is_bad(page))
diff --git a/net/core/devmem.c b/net/core/devmem.c
index b3a62ca0df65..40e7a4ec9009 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -285,6 +285,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
niov = &owner->area.niovs[i];
niov->type = NET_IOV_DMABUF;
niov->owner = &owner->area;
+ niov->pp = NULL;
page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
net_devmem_get_dma_addr(niov));
if (direction == DMA_TO_DEVICE)
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
index cd95394399b4..4b90332d6c64 100644
--- a/net/core/netmem_priv.h
+++ b/net/core/netmem_priv.h
@@ -8,21 +8,18 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
return __netmem_clear_lsb(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
}
-static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
-{
- __netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
-}
-
-static inline void netmem_clear_pp_magic(netmem_ref netmem)
-{
- WARN_ON_ONCE(__netmem_clear_lsb(netmem)->pp_magic & PP_DMA_INDEX_MASK);
-
- __netmem_clear_lsb(netmem)->pp_magic = 0;
-}
-
static inline bool netmem_is_pp(netmem_ref netmem)
{
- return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
+ /* Use ->pp for net_iov to identify if it's pp, which requires
+ * that non-pp net_iov should have ->pp NULL'd.
+ */
+ if (netmem_is_net_iov(netmem))
+ return !!__netmem_clear_lsb(netmem)->pp;
+
+ /* For system memory, page type bit in struct page can be used
+ * to identify if it's pp.
+ */
+ return PageNetpp(__netmem_to_page(netmem));
}
static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 05e2e22a8f7c..37eeab76c41c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -654,7 +654,6 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
{
netmem_set_pp(netmem, pool);
- netmem_or_pp_magic(netmem, PP_SIGNATURE);
/* Ensuring all pages have been split into one fragment initially:
* page_pool_set_pp_info() is only called once for every page when it
@@ -665,12 +664,19 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
page_pool_fragment_netmem(netmem, 1);
if (pool->has_init_callback)
pool->slow.init_callback(netmem, pool->slow.init_arg);
+
+ /* If it's page-backed */
+ if (!netmem_is_net_iov(netmem))
+ __SetPageNetpp(__netmem_to_page(netmem));
}
void page_pool_clear_pp_info(netmem_ref netmem)
{
- netmem_clear_pp_magic(netmem);
netmem_set_pp(netmem, NULL);
+
+ /* If it's page-backed */
+ if (!netmem_is_net_iov(netmem))
+ __ClearPageNetpp(__netmem_to_page(netmem));
}
static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
base-commit: 54efec8782214652b331c50646013f8526570e8d
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-07-29 11:02 [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type Byungchul Park
@ 2025-07-29 14:20 ` Zi Yan
2025-08-01 23:08 ` Jakub Kicinski
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2025-07-29 14:20 UTC (permalink / raw)
To: Byungchul Park
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
tariqt, mbloch, andrew+netdev, edumazet, pabeni, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ilias.apalodimas, willy, brauner, kas,
yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
asml.silence, bpf, linux-rdma, sfr
On 29 Jul 2025, at 7:02, Byungchul Park wrote:
> Changes from v2:
> 1. Rebase on linux-next as of Jul 29.
> 2. Skip 'niov->pp = NULL' when it's allocated using __GFP_ZERO.
> 3. Make trivial coding style changes. (feedback from Mina)
> 4. Add Co-developed-by, Acked-by, and Reviewed-by properly.
> Thanks to all.
>
> Changes from v1:
> 1. Rebase on linux-next.
> 2. Initialize net_iov->pp = NULL when allocating net_iov in
> net_devmem_bind_dmabuf() and io_zcrx_create_area().
> 3. Use ->pp for net_iov to identify if it's pp rather than
> always consider net_iov as pp.
> 4. Add Suggested-by: David Hildenbrand <david@redhat.com>.
>
> ---8<---
> From 88bcb9907a0cef65a9c0adf35e144f9eb67e0542 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul@sk.com>
> Date: Tue, 29 Jul 2025 19:49:44 +0900
> Subject: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
>
> The ->pp_magic field in struct page is currently used to identify whether
> a page belongs to a page pool. However, ->pp_magic will be removed, and a
> page type in struct page, i.e. PGTY_netpp, can be used for that purpose.
>
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> ->pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
>
> For net_iov, use ->pp to identify whether it belongs to a page pool,
> making sure that ->pp is NULL for any net_iov that does not.
>
> This work was inspired by the following link:
>
> [1] https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>
> While at it, move the page pool sanity check to the free path.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 2 +-
> include/linux/mm.h | 27 +++----------------
> include/linux/page-flags.h | 6 +++++
> include/net/netmem.h | 2 +-
> io_uring/zcrx.c | 4 +++
> mm/page_alloc.c | 7 +++--
> net/core/devmem.c | 1 +
> net/core/netmem_priv.h | 23 +++++++---------
> net/core/page_pool.c | 10 +++++--
> 9 files changed, 37 insertions(+), 45 deletions(-)
>
<snip>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0d4ee569aa6b..d01b296e7184 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4171,10 +4171,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> * DMA mapping IDs for page_pool
> *
> * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
> - * stashes it in the upper bits of page->pp_magic. We always want to be able to
> - * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
> - * pages can have arbitrary kernel pointers stored in the same field as pp_magic
> - * (since it overlaps with page->lru.next), so we must ensure that we cannot
> + * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
> + * arbitrary kernel pointers stored in the same field as pp_magic (since
> + * it overlaps with page->lru.next), so we must ensure that we cannot
> * mistake a valid kernel pointer with any of the values we write into this
> * field.
> *
> @@ -4205,26 +4204,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
> PP_DMA_INDEX_SHIFT)
>
> -/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
> - * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
> - * the head page of compound page and bit 1 for pfmemalloc page, as well as the
> - * bits used for the DMA index. page_is_pfmemalloc() is checked in
> - * __page_pool_put_page() to avoid recycling the pfmemalloc page.
> - */
> -#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> -
> -#ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(const struct page *page)
> -{
> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> -#else
> -static inline bool page_pool_page_is_pp(const struct page *page)
> -{
> - return false;
> -}
> -#endif
> -
> #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
> #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
> #define PAGE_SNAPSHOT_PG_IDLE (1 << 2)
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 8d3fa3a91ce4..84247e39e9e7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -933,6 +933,7 @@ enum pagetype {
> PGTY_zsmalloc = 0xf6,
> PGTY_unaccepted = 0xf7,
> PGTY_large_kmalloc = 0xf8,
> + PGTY_netpp = 0xf9,
>
> PGTY_mapcount_underflow = 0xff
> };
> @@ -1077,6 +1078,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
> PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
> FOLIO_TYPE_OPS(large_kmalloc, large_kmalloc)
>
> +/*
> + * Marks page_pool allocated pages.
> + */
> +PAGE_TYPE_OPS(Netpp, netpp, netpp)
> +
> /**
> * PageHuge - Determine if the page belongs to hugetlbfs
> * @page: The page to test.
<snip>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d1d037f97c5f..2f6a55fab942 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1042,7 +1042,6 @@ static inline bool page_expected_state(struct page *page,
> #ifdef CONFIG_MEMCG
> page->memcg_data |
> #endif
> - page_pool_page_is_pp(page) |
> (page->flags & check_flags)))
> return false;
>
> @@ -1069,8 +1068,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> if (unlikely(page->memcg_data))
> bad_reason = "page still charged to cgroup";
> #endif
> - if (unlikely(page_pool_page_is_pp(page)))
> - bad_reason = "page_pool leak";
> return bad_reason;
> }
>
> @@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
> mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> folio->mapping = NULL;
> }
> - if (unlikely(page_has_type(page)))
> + if (unlikely(page_has_type(page))) {
> + WARN_ON_ONCE(PageNetpp(page));
> /* Reset the page_type (which overlays _mapcount) */
> page->page_type = UINT_MAX;
> + }
>
> if (is_check_pages_enabled()) {
> if (free_page_is_bad(page))
The mm part looks good to me.
Acked-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-07-29 11:02 [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type Byungchul Park
2025-07-29 14:20 ` Zi Yan
@ 2025-08-01 23:08 ` Jakub Kicinski
2025-08-04 1:03 ` Byungchul Park
2025-08-02 5:07 ` Stephen Rothwell
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-08-01 23:08 UTC (permalink / raw)
To: Byungchul Park
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, hawk, john.fastabend, sdf, saeedm, leon, tariqt,
mbloch, andrew+netdev, edumazet, pabeni, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
asml.silence, bpf, linux-rdma, sfr
On Tue, 29 Jul 2025 20:02:10 +0900 Byungchul Park wrote:
> [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
linux-next does not accept patches. This has to go either via networking or MM.
> - if (unlikely(page_has_type(page)))
> + if (unlikely(page_has_type(page))) {
Maybe add:
/* networking expects to clear its page type before releasing */
> + WARN_ON_ONCE(PageNetpp(page));
> /* Reset the page_type (which overlays _mapcount) */
> page->page_type = UINT_MAX;
> + }
> static inline bool netmem_is_pp(netmem_ref netmem)
> {
> - return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> + /* Use ->pp for net_iov to identify if it's pp,
Please try to use precise language, this code is confusing as is.
net_iov may _belong_ to a page pool.
* which requires that non-pp net_iov should have ->pp NULL'd.
I don't think this adds any information.
> + */
> + if (netmem_is_net_iov(netmem))
> + return !!__netmem_clear_lsb(netmem)->pp;
> +
> + /* For system memory, page type bit in struct page can be used
"page type bit" -> "page type", it's not a bit.
> + * to identify if it's pp.
... to identify pages which belong to a page pool.
> + */
> + return PageNetpp(__netmem_to_page(netmem));
> }
>
> static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 05e2e22a8f7c..37eeab76c41c 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -654,7 +654,6 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
> void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> {
> netmem_set_pp(netmem, pool);
> - netmem_or_pp_magic(netmem, PP_SIGNATURE);
>
> /* Ensuring all pages have been split into one fragment initially:
> * page_pool_set_pp_info() is only called once for every page when it
> @@ -665,12 +664,19 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> page_pool_fragment_netmem(netmem, 1);
> if (pool->has_init_callback)
> pool->slow.init_callback(netmem, pool->slow.init_arg);
> +
> + /* If it's page-backed */
Please don't add obvious comments.
> + if (!netmem_is_net_iov(netmem))
> + __SetPageNetpp(__netmem_to_page(netmem));
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-07-29 11:02 [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type Byungchul Park
2025-07-29 14:20 ` Zi Yan
2025-08-01 23:08 ` Jakub Kicinski
@ 2025-08-02 5:07 ` Stephen Rothwell
2025-08-04 1:17 ` Byungchul Park
2025-08-10 20:21 ` Pavel Begunkov
2025-08-13 6:09 ` Byungchul Park
4 siblings, 1 reply; 13+ messages in thread
From: Stephen Rothwell @ 2025-08-02 5:07 UTC (permalink / raw)
To: Byungchul Park
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
tariqt, mbloch, andrew+netdev, edumazet, pabeni, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
asml.silence, bpf, linux-rdma
Hi,
On Tue, 29 Jul 2025 20:02:10 +0900 Byungchul Park <byungchul@sk.com> wrote:
>
> Changes from v2:
> 1. Rebase on linux-next as of Jul 29.
Why are you basing development work on linux-next? That is a
constantly rebasing tree. Please base your work on some stable tree.
--
Cheers,
Stephen Rothwell
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-08-01 23:08 ` Jakub Kicinski
@ 2025-08-04 1:03 ` Byungchul Park
0 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2025-08-04 1:03 UTC (permalink / raw)
To: Jakub Kicinski
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, hawk, john.fastabend, sdf, saeedm, leon, tariqt,
mbloch, andrew+netdev, edumazet, pabeni, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
asml.silence, bpf, linux-rdma, sfr
On Fri, Aug 01, 2025 at 04:08:05PM -0700, Jakub Kicinski wrote:
> On Tue, 29 Jul 2025 20:02:10 +0900 Byungchul Park wrote:
> > [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
>
> linux-next does not accept patches. This has to go either via networking or MM.
Yeah. That's what I found. Thanks for the confirmation.
> > - if (unlikely(page_has_type(page)))
> > + if (unlikely(page_has_type(page))) {
>
> Maybe add :
>
> /* networking expects to clear its page type before releasing */
I will.
> > + WARN_ON_ONCE(PageNetpp(page));
> > /* Reset the page_type (which overlays _mapcount) */
> > page->page_type = UINT_MAX;
> > + }
>
> > static inline bool netmem_is_pp(netmem_ref netmem)
> > {
> > - return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> > + /* Use ->pp for net_iov to identify if it's pp,
>
> Please try to use precise language, this code is confusing as is.
> net_iov may _belong_ to a page pool.
>
> * which requires that non-pp net_iov should have ->pp NULL'd.
Thank you. I will.
> I don't think this adds any information.
>
> > + */
> > + if (netmem_is_net_iov(netmem))
> > + return !!__netmem_clear_lsb(netmem)->pp;
> > +
> > + /* For system memory, page type bit in struct page can be used
>
> "page type bit" -> "page type", it's not a bit.
>
> > + * to identify if it's pp.
>
> ... to identify pages which belong to a page pool.
I will.
> > + */
> > + return PageNetpp(__netmem_to_page(netmem));
> > }
> >
> > static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 05e2e22a8f7c..37eeab76c41c 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -654,7 +654,6 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
> > void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> > {
> > netmem_set_pp(netmem, pool);
> > - netmem_or_pp_magic(netmem, PP_SIGNATURE);
> >
> > /* Ensuring all pages have been split into one fragment initially:
> > * page_pool_set_pp_info() is only called once for every page when it
> > @@ -665,12 +664,19 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> > page_pool_fragment_netmem(netmem, 1);
> > if (pool->has_init_callback)
> > pool->slow.init_callback(netmem, pool->slow.init_arg);
> > +
> > + /* If it's page-backed */
>
> Please don't add obvious comments.
I added the comment since it's _not_ obvious. !net_iov only says that it
is not a net_iov, not that it is page-backed. However, since you, as the
maintainer, are fine without it, I will remove it. Thanks.
Byungchul
> > + if (!netmem_is_net_iov(netmem))
> > + __SetPageNetpp(__netmem_to_page(netmem));
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-08-02 5:07 ` Stephen Rothwell
@ 2025-08-04 1:17 ` Byungchul Park
2025-08-04 7:38 ` David Hildenbrand
0 siblings, 1 reply; 13+ messages in thread
From: Byungchul Park @ 2025-08-04 1:17 UTC (permalink / raw)
To: Stephen Rothwell
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
tariqt, mbloch, andrew+netdev, edumazet, pabeni, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
asml.silence, bpf, linux-rdma
On Sat, Aug 02, 2025 at 03:07:46PM +1000, Stephen Rothwell wrote:
> Hi,
>
> On Tue, 29 Jul 2025 20:02:10 +0900 Byungchul Park <byungchul@sk.com> wrote:
> >
> > Changes from v2:
> > 1. Rebase on linux-next as of Jul 29.
>
> Why are you basing development work on linux-next? That is a
> constantly rebasing tree. Please base your work on some stable tree.
Sorry about the confusion. I misunderstood how to handle patches
based on linux-next.
However, basing on linux-next is still required for this work since more
than one subsystem is involved, and it was requested by David Hildenbrand:
https://lore.kernel.org/all/20250728105701.GA21732@system.software.com/
I will keep basing on linux-next while targeting either the networking
or the mm tree.
Byungchul
> --
> Cheers,
> Stephen Rothwell
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-08-04 1:17 ` Byungchul Park
@ 2025-08-04 7:38 ` David Hildenbrand
0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2025-08-04 7:38 UTC (permalink / raw)
To: Byungchul Park, Stephen Rothwell
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
tariqt, mbloch, andrew+netdev, edumazet, pabeni, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
asml.silence, bpf, linux-rdma
On 04.08.25 03:17, Byungchul Park wrote:
> On Sat, Aug 02, 2025 at 03:07:46PM +1000, Stephen Rothwell wrote:
>> Hi,
>>
>> On Tue, 29 Jul 2025 20:02:10 +0900 Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> Changes from v2:
>>> 1. Rebase on linux-next as of Jul 29.
>>
>> Why are you basing development work on linux-next? That is a
>> constantly rebasing tree. Please base your work on some stable tree.
>
> Sorry about the confusion. I misunderstood how to handle patches
> based on linux-next.
>
> However, basing on linux-next is still required for this work since more
> than one subsystem is involved, and it was requested by David Hildenbrand:
>
> https://lore.kernel.org/all/20250728105701.GA21732@system.software.com/
>
> I will keep basing on linux-next while targeting either the networking
> or the mm tree.
I think this is the key part: there is nothing wrong with temporarily
basing your stuff on linux-next while we are waiting for this merge
window to end and for the relevant patches to show up in either tree
after the rebase.
You should mention below the "---" your intentions like "This patch is
supposed to go via the XXX tree, but it currently also depends on
patches in the YYY tree. For now, this patch is based on linux-next, but
will apply cleanly (or get rebased) after XXX was rebased."
Also, probably best to indicate the patch as being RFC (instead of
linux-next) until there is a stable "base".
--
Cheers,
David / dhildenb
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
2025-07-29 11:02 [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type Byungchul Park
` (2 preceding siblings ...)
2025-08-02 5:07 ` Stephen Rothwell
@ 2025-08-10 20:21 ` Pavel Begunkov
2025-08-11 1:09 ` Byungchul Park
2025-08-13 6:09 ` Byungchul Park
4 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2025-08-10 20:21 UTC (permalink / raw)
To: Byungchul Park, linux-mm, netdev
Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
andrew+netdev, edumazet, pabeni, akpm, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
usamaarif642, baolin.wang, almasrymina, toke, bpf, linux-rdma,
sfr
On 7/29/25 12:02, Byungchul Park wrote:
> Changes from v2:
> 1. Rebase on linux-next as of Jul 29.
> 2. Skip 'niov->pp = NULL' when it's allocated using __GFP_ZERO.
> 3. Make trivial coding style changes. (feedback from Mina)
> 4. Add Co-developed-by, Acked-by, and Reviewed-by properly.
> Thanks to all.
>
> Changes from v1:
> 1. Rebase on linux-next.
> 2. Initialize net_iov->pp = NULL when allocating net_iov in
> net_devmem_bind_dmabuf() and io_zcrx_create_area().
> 3. Use ->pp for net_iov to identify if it's pp rather than
> always consider net_iov as pp.
> 4. Add Suggested-by: David Hildenbrand <david@redhat.com>.
>
> ---8<---
> From 88bcb9907a0cef65a9c0adf35e144f9eb67e0542 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul@sk.com>
> Date: Tue, 29 Jul 2025 19:49:44 +0900
> Subject: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
That will conflict with "netmem: replace __netmem_clear_lsb() with
netmem_to_nmdesc()", it'll need some coordination.
--
Pavel Begunkov
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
From: Byungchul Park @ 2025-08-11 1:09 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-mm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
tariqt, mbloch, andrew+netdev, edumazet, pabeni, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke, bpf,
linux-rdma, sfr
On Sun, Aug 10, 2025 at 09:21:45PM +0100, Pavel Begunkov wrote:
> On 7/29/25 12:02, Byungchul Park wrote:
> > Changes from v2:
> > 1. Rebase on linux-next as of Jul 29.
> > 2. Skip 'niov->pp = NULL' when it's allocated using __GFP_ZERO.
> > 3. Change trivial coding style. (feedbacked by Mina)
> > 4. Add Co-developed-by, Acked-by, and Reviewed-by properly.
> > Thanks to all.
> >
> > Changes from v1:
> > 1. Rebase on linux-next.
> > 2. Initialize net_iov->pp = NULL when allocating net_iov in
> > net_devmem_bind_dmabuf() and io_zcrx_create_area().
> > 3. Use ->pp for net_iov to identify if it's pp rather than
> > always consider net_iov as pp.
> > 4. Add Suggested-by: David Hildenbrand <david@redhat.com>.
> >
> > ---8<---
> > From 88bcb9907a0cef65a9c0adf35e144f9eb67e0542 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park <byungchul@sk.com>
> > Date: Tue, 29 Jul 2025 19:49:44 +0900
> > Subject: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
>
> That will conflict with "netmem: replace __netmem_clear_lsb() with
> netmem_to_nmdesc()", it'll need some coordination.
Indeed. It would be better to rework this on top of "netmem: replace
__netmem_clear_lsb() with netmem_to_nmdesc()" then. You said you are going
to take that patch. Please let me know the progress so that I can track it
and rework this one.
Byungchul
> --
> Pavel Begunkov
>
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
From: Byungchul Park @ 2025-08-13 6:09 UTC (permalink / raw)
To: akpm, kuba
Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
linux-rdma, sfr, linux-mm, netdev
On Tue, Jul 29, 2025 at 08:02:10PM +0900, Byungchul Park wrote:
> Changes from v2:
> 1. Rebase on linux-next as of Jul 29.
> 2. Skip 'niov->pp = NULL' when it's allocated using __GFP_ZERO.
> 3. Change trivial coding style. (feedbacked by Mina)
> 4. Add Co-developed-by, Acked-by, and Reviewed-by properly.
> Thanks to all.
>
> Changes from v1:
> 1. Rebase on linux-next.
> 2. Initialize net_iov->pp = NULL when allocating net_iov in
> net_devmem_bind_dmabuf() and io_zcrx_create_area().
> 3. Use ->pp for net_iov to identify if it's pp rather than
> always consider net_iov as pp.
> 4. Add Suggested-by: David Hildenbrand <david@redhat.com>.
>
> ---8<---
> From 88bcb9907a0cef65a9c0adf35e144f9eb67e0542 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul@sk.com>
> Date: Tue, 29 Jul 2025 19:49:44 +0900
> Subject: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
>
> The ->pp_magic field in struct page is currently used to identify if a
> page belongs to a page pool. However, ->pp_magic will be removed and the
> page type bit in struct page, i.e. PGTY_netpp, can be used for that purpose.
>
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> ->pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
>
> For net_iov, use ->pp to identify if it's pp, with making sure that ->pp
> is NULL for non-pp net_iov.
>
> This work was inspired by the following link:
>
> [1] https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>
> While at it, move the sanity check for page pool to on free.
Hi, Andrew and Jakub
I will spin the next version with some modifications once the following
patch [1] gets merged.
[1] https://lore.kernel.org/all/a8643abedd208138d3d550db71631d5a2e4168d1.1754929026.git.asml.silence@gmail.com/
This touches both mm and networking. Which tree should I aim at, the mm
tree or the network tree? I prefer the network tree, though.
However, either is totally fine. Any suggestions?
Byungchul
> Suggested-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 2 +-
> include/linux/mm.h | 27 +++----------------
> include/linux/page-flags.h | 6 +++++
> include/net/netmem.h | 2 +-
> io_uring/zcrx.c | 4 +++
> mm/page_alloc.c | 7 +++--
> net/core/devmem.c | 1 +
> net/core/netmem_priv.h | 23 +++++++---------
> net/core/page_pool.c | 10 +++++--
> 9 files changed, 37 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index 5d51600935a6..def274f5c1ca 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
> xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
> page = xdpi.page.page;
>
> - /* No need to check page_pool_page_is_pp() as we
> + /* No need to check PageNetpp() as we
> * know this is a page_pool page.
> */
> page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0d4ee569aa6b..d01b296e7184 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4171,10 +4171,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> * DMA mapping IDs for page_pool
> *
> * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
> - * stashes it in the upper bits of page->pp_magic. We always want to be able to
> - * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
> - * pages can have arbitrary kernel pointers stored in the same field as pp_magic
> - * (since it overlaps with page->lru.next), so we must ensure that we cannot
> + * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
> + * arbitrary kernel pointers stored in the same field as pp_magic (since
> + * it overlaps with page->lru.next), so we must ensure that we cannot
> * mistake a valid kernel pointer with any of the values we write into this
> * field.
> *
> @@ -4205,26 +4204,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
> PP_DMA_INDEX_SHIFT)
>
> -/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
> - * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
> - * the head page of compound page and bit 1 for pfmemalloc page, as well as the
> - * bits used for the DMA index. page_is_pfmemalloc() is checked in
> - * __page_pool_put_page() to avoid recycling the pfmemalloc page.
> - */
> -#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> -
> -#ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(const struct page *page)
> -{
> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> -#else
> -static inline bool page_pool_page_is_pp(const struct page *page)
> -{
> - return false;
> -}
> -#endif
> -
> #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
> #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
> #define PAGE_SNAPSHOT_PG_IDLE (1 << 2)
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 8d3fa3a91ce4..84247e39e9e7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -933,6 +933,7 @@ enum pagetype {
> PGTY_zsmalloc = 0xf6,
> PGTY_unaccepted = 0xf7,
> PGTY_large_kmalloc = 0xf8,
> + PGTY_netpp = 0xf9,
>
> PGTY_mapcount_underflow = 0xff
> };
> @@ -1077,6 +1078,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
> PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
> FOLIO_TYPE_OPS(large_kmalloc, large_kmalloc)
>
> +/*
> + * Marks page_pool allocated pages.
> + */
> +PAGE_TYPE_OPS(Netpp, netpp, netpp)
> +
> /**
> * PageHuge - Determine if the page belongs to hugetlbfs
> * @page: The page to test.
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index f7dacc9e75fd..3667334e16e7 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -298,7 +298,7 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> */
> #define pp_page_to_nmdesc(p) \
> ({ \
> - DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p)); \
> + DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p)); \
> __pp_page_to_nmdesc(p); \
> })
>
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index e5ff49f3425e..f771bb3e756d 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -444,6 +444,10 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
> area->freelist[i] = i;
> atomic_set(&area->user_refs[i], 0);
> niov->type = NET_IOV_IOURING;
> +
> + /* niov->pp is already initialized to NULL by
> + * kvmalloc_array(__GFP_ZERO).
> + */
> }
>
> area->free_count = nr_iovs;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d1d037f97c5f..2f6a55fab942 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1042,7 +1042,6 @@ static inline bool page_expected_state(struct page *page,
> #ifdef CONFIG_MEMCG
> page->memcg_data |
> #endif
> - page_pool_page_is_pp(page) |
> (page->flags & check_flags)))
> return false;
>
> @@ -1069,8 +1068,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> if (unlikely(page->memcg_data))
> bad_reason = "page still charged to cgroup";
> #endif
> - if (unlikely(page_pool_page_is_pp(page)))
> - bad_reason = "page_pool leak";
> return bad_reason;
> }
>
> @@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
> mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> folio->mapping = NULL;
> }
> - if (unlikely(page_has_type(page)))
> + if (unlikely(page_has_type(page))) {
> + WARN_ON_ONCE(PageNetpp(page));
> /* Reset the page_type (which overlays _mapcount) */
> page->page_type = UINT_MAX;
> + }
>
> if (is_check_pages_enabled()) {
> if (free_page_is_bad(page))
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index b3a62ca0df65..40e7a4ec9009 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -285,6 +285,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> niov = &owner->area.niovs[i];
> niov->type = NET_IOV_DMABUF;
> niov->owner = &owner->area;
> + niov->pp = NULL;
> page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
> net_devmem_get_dma_addr(niov));
> if (direction == DMA_TO_DEVICE)
> diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
> index cd95394399b4..4b90332d6c64 100644
> --- a/net/core/netmem_priv.h
> +++ b/net/core/netmem_priv.h
> @@ -8,21 +8,18 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
> return __netmem_clear_lsb(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
> }
>
> -static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
> -{
> - __netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
> -}
> -
> -static inline void netmem_clear_pp_magic(netmem_ref netmem)
> -{
> - WARN_ON_ONCE(__netmem_clear_lsb(netmem)->pp_magic & PP_DMA_INDEX_MASK);
> -
> - __netmem_clear_lsb(netmem)->pp_magic = 0;
> -}
> -
> static inline bool netmem_is_pp(netmem_ref netmem)
> {
> - return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> + /* Use ->pp for net_iov to identify if it's pp, which requires
> + * that non-pp net_iov should have ->pp NULL'd.
> + */
> + if (netmem_is_net_iov(netmem))
> + return !!__netmem_clear_lsb(netmem)->pp;
> +
> + /* For system memory, page type bit in struct page can be used
> + * to identify if it's pp.
> + */
> + return PageNetpp(__netmem_to_page(netmem));
> }
>
> static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 05e2e22a8f7c..37eeab76c41c 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -654,7 +654,6 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
> void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> {
> netmem_set_pp(netmem, pool);
> - netmem_or_pp_magic(netmem, PP_SIGNATURE);
>
> /* Ensuring all pages have been split into one fragment initially:
> * page_pool_set_pp_info() is only called once for every page when it
> @@ -665,12 +664,19 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> page_pool_fragment_netmem(netmem, 1);
> if (pool->has_init_callback)
> pool->slow.init_callback(netmem, pool->slow.init_arg);
> +
> + /* If it's page-backed */
> + if (!netmem_is_net_iov(netmem))
> + __SetPageNetpp(__netmem_to_page(netmem));
> }
>
> void page_pool_clear_pp_info(netmem_ref netmem)
> {
> - netmem_clear_pp_magic(netmem);
> netmem_set_pp(netmem, NULL);
> +
> + /* If it's page-backed */
> + if (!netmem_is_net_iov(netmem))
> + __ClearPageNetpp(__netmem_to_page(netmem));
> }
>
> static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
>
> base-commit: 54efec8782214652b331c50646013f8526570e8d
> --
> 2.17.1
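For readers following the netmem_is_pp() change quoted above, here is a minimal user-space sketch of the dispatch it describes: a net_iov counts as page-pool owned when its ->pp is non-NULL, while a page-backed reference is identified by its page type. Everything prefixed nm_ or NM_ is a hypothetical stand-in invented for this sketch, including the split page/net_iov structs and the low-bit tagging helper. It is not the kernel's netmem code and only models the logic under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
struct nm_pool { int id; };

struct nm_page {
	uint32_t page_type;	/* UINT32_MAX means "no page type set" */
	struct nm_pool *pp;
};

struct nm_niov {
	struct nm_pool *pp;	/* must stay NULL while no pool owns it */
};

#define NM_PGTY_NETPP	0xf9u		/* value taken from the patch */
#define NM_NIOV_BIT	((uintptr_t)1)	/* low bit tags a net_iov reference */

typedef uintptr_t nm_ref;

static nm_ref nm_from_page(struct nm_page *p) { return (uintptr_t)p; }
static nm_ref nm_from_niov(struct nm_niov *n) { return (uintptr_t)n | NM_NIOV_BIT; }
static bool nm_is_niov(nm_ref r) { return (r & NM_NIOV_BIT) != 0; }
static struct nm_page *nm_to_page(nm_ref r) { return (struct nm_page *)r; }
static struct nm_niov *nm_to_niov(nm_ref r)
{
	return (struct nm_niov *)(r & ~NM_NIOV_BIT);
}

/* Models netmem_is_pp(): ->pp for net_iov, page type for pages. */
static bool nm_is_pp(nm_ref r)
{
	if (nm_is_niov(r))
		return nm_to_niov(r)->pp != NULL;
	return (nm_to_page(r)->page_type >> 24) == NM_PGTY_NETPP;
}

/* Models page_pool_set_pp_info(): only page-backed refs get the type. */
static void nm_set_pp_info(struct nm_pool *pool, nm_ref r)
{
	if (nm_is_niov(r)) {
		nm_to_niov(r)->pp = pool;
		return;
	}
	assert(nm_to_page(r)->page_type == UINT32_MAX);
	nm_to_page(r)->pp = pool;
	nm_to_page(r)->page_type = NM_PGTY_NETPP << 24;
}

/* Models page_pool_clear_pp_info(). */
static void nm_clear_pp_info(nm_ref r)
{
	if (nm_is_niov(r)) {
		nm_to_niov(r)->pp = NULL;
		return;
	}
	nm_to_page(r)->pp = NULL;
	nm_to_page(r)->page_type = UINT32_MAX;
}

/* Returns one bit per check that behaved as expected (0xf when all pass). */
static int nm_demo(void)
{
	struct nm_pool pool = { 1 };
	struct nm_page page = { UINT32_MAX, NULL };
	struct nm_niov niov = { NULL };
	nm_ref pr = nm_from_page(&page), nr = nm_from_niov(&niov);
	int ok = 0;

	ok |= (!nm_is_pp(pr) && !nm_is_pp(nr)) << 0;	/* fresh: neither is pp */
	nm_set_pp_info(&pool, pr);
	nm_set_pp_info(&pool, nr);
	ok |= (nm_is_pp(pr) && nm_is_pp(nr)) << 1;	/* both owned by the pool */
	nm_clear_pp_info(pr);
	ok |= (!nm_is_pp(pr) && nm_is_pp(nr)) << 2;	/* page released first */
	nm_clear_pp_info(nr);
	ok |= (!nm_is_pp(nr)) << 3;			/* niov released too */
	return ok;
}
```

The sketch also shows why the patch NULLs ->pp for a freshly created net_iov: with no magic value left, a stale non-NULL ->pp would make nm_is_pp() report a net_iov as pool-owned.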
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
From: Pavel Begunkov @ 2025-08-13 11:18 UTC (permalink / raw)
To: Byungchul Park, akpm, kuba
Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, hawk,
john.fastabend, sdf, saeedm, leon, tariqt, mbloch, andrew+netdev,
edumazet, pabeni, david, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, horms, jackmanb, hannes, ziy,
ilias.apalodimas, willy, brauner, kas, yuzhao, usamaarif642,
baolin.wang, almasrymina, toke, bpf, linux-rdma, sfr, linux-mm,
netdev
On 8/13/25 07:09, Byungchul Park wrote:
> On Tue, Jul 29, 2025 at 08:02:10PM +0900, Byungchul Park wrote:
...
>> For net_iov, use ->pp to identify if it's pp, with making sure that ->pp
>> is NULL for non-pp net_iov.
>>
>> This work was inspired by the following link:
>>
>> [1] https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>>
>> While at it, move the sanity check for page pool to on free.
>
> Hi, Andrew and Jakub
>
> I will spin the next version with some modifications once the following
> patch [1] gets merged.
>
> [1] https://lore.kernel.org/all/a8643abedd208138d3d550db71631d5a2e4168d1.1754929026.git.asml.silence@gmail.com/
>
> This touches both mm and networking. Which tree should I aim at, the mm
> tree or the network tree? I prefer the network tree, though.
>
> However, either is totally fine. Any suggestions?
It should go to net, there will be enough of conflicts otherwise.
mm maintainers, do you like it as a shared branch or can it just
go through the net tree?
It'd also be better to split the mm and net changes into separate
patches. Below is a patch I had before; it might need a rebase, though.
From: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu, 17 Jul 2025 11:46:21 +0100
Subject: [PATCH] mm: introduce a page type for page pool
Page pool currently uses ->pp_magic, which aliases lru.next, to check
whether a page belongs to it. Add a new page type; a later patch will
convert page pool to use it.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/linux/mm.h | 20 --------------------
include/linux/page-flags.h | 6 ++++++
mm/page_alloc.c | 7 +++----
3 files changed, 9 insertions(+), 24 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0d4ee569aa6b..21db02e92b33 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4205,26 +4205,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
PP_DMA_INDEX_SHIFT)
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
- return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
- return false;
-}
-#endif
-
#define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
#define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
#define PAGE_SNAPSHOT_PG_IDLE (1 << 2)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8d3fa3a91ce4..0afdf2ee3fbd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -933,6 +933,7 @@ enum pagetype {
PGTY_zsmalloc = 0xf6,
PGTY_unaccepted = 0xf7,
PGTY_large_kmalloc = 0xf8,
+ PGTY_net_pp = 0xf9,
PGTY_mapcount_underflow = 0xff
};
@@ -1077,6 +1078,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
FOLIO_TYPE_OPS(large_kmalloc, large_kmalloc)
+/*
+ * Marks pages allocated by page_pool (see net/core/page_pool.c).
+ */
+PAGE_TYPE_OPS(Net_pp, net_pp, net_pp)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1d037f97c5f..67dfd6d8a124 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1042,7 +1042,6 @@ static inline bool page_expected_state(struct page *page,
#ifdef CONFIG_MEMCG
page->memcg_data |
#endif
- page_pool_page_is_pp(page) |
(page->flags & check_flags)))
return false;
@@ -1069,8 +1068,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
if (unlikely(page->memcg_data))
bad_reason = "page still charged to cgroup";
#endif
- if (unlikely(page_pool_page_is_pp(page)))
- bad_reason = "page_pool leak";
return bad_reason;
}
@@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
folio->mapping = NULL;
}
- if (unlikely(page_has_type(page)))
+ if (unlikely(page_has_type(page))) {
+ WARN_ON_ONCE(PageNet_pp(page));
/* Reset the page_type (which overlays _mapcount) */
page->page_type = UINT_MAX;
+ }
if (is_check_pages_enabled()) {
if (free_page_is_bad(page))
--
Pavel Begunkov
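To illustrate what the mm-only patch above sets up, here is a small user-space model of a page type kept in the top byte of a 32-bit field, plus the free-time check that flags a page still marked as page-pool owned. All pt_/PT_ names are hypothetical, and the range check in pt_has_type() is a simplifying assumption. The kernel's PAGE_TYPE_OPS() and page_has_type() in include/linux/page-flags.h differ in detail, so treat this as a sketch rather than the real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_NONE		UINT32_MAX	/* no page type set */
#define PT_NETPP	0xf9u		/* value used by the patches */
#define PT_FIRST	0xf0u		/* model assumption: lowest type value */
#define PT_UNDERFLOW	0xffu		/* mirrors PGTY_mapcount_underflow */

struct pt_page { uint32_t page_type; };

/* Simplified stand-in for page_has_type(): top byte is a valid type. */
static bool pt_has_type(const struct pt_page *p)
{
	uint32_t top = p->page_type >> 24;

	return top >= PT_FIRST && top < PT_UNDERFLOW;
}

static bool pt_is_netpp(const struct pt_page *p)
{
	return (p->page_type >> 24) == PT_NETPP;
}

static void pt_set_netpp(struct pt_page *p)
{
	assert(p->page_type == PT_NONE);	/* a page carries one type at most */
	p->page_type = PT_NETPP << 24;
}

static void pt_clear_netpp(struct pt_page *p)
{
	assert(pt_is_netpp(p));
	p->page_type = PT_NONE;
}

/*
 * Models the hunk in free_pages_prepare(): report (the patch uses
 * WARN_ON_ONCE) when a page is freed while still marked as page_pool
 * owned, then reset the type either way.
 */
static bool pt_free_prepare(struct pt_page *p)
{
	bool leaked = false;

	if (pt_has_type(p)) {
		leaked = pt_is_netpp(p);
		p->page_type = PT_NONE;
	}
	return leaked;
}

/* Returns how many of the two frees below were flagged as leaks. */
static int pt_demo(void)
{
	struct pt_page released = { PT_NONE }, leaked = { PT_NONE };
	int warnings = 0;

	pt_set_netpp(&released);
	pt_clear_netpp(&released);		/* pool gave the page back first */
	warnings += pt_free_prepare(&released);

	pt_set_netpp(&leaked);			/* freed while still pool owned */
	warnings += pt_free_prepare(&leaked);

	return warnings;
}
```

In this model only the second free is flagged, which corresponds to the "page_pool leak" case that the removed page_bad_reason() check used to report.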
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
From: Jakub Kicinski @ 2025-08-13 14:52 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Byungchul Park, akpm, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, hawk, john.fastabend, sdf, saeedm, leon, tariqt,
mbloch, andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
usamaarif642, baolin.wang, almasrymina, toke, bpf, linux-rdma,
sfr, linux-mm, netdev
On Wed, 13 Aug 2025 12:18:56 +0100 Pavel Begunkov wrote:
> It should go to net, there will be enough of conflicts otherwise.
> mm maintainers, do you like it as a shared branch or can it just
> go through the net tree?
Looks like this is 100% in mm, and the work is not urgent at all.
So I'm happy for Andrew to take this, and dependent patches (if any)
can come in the next cycle.
> @@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
> mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> folio->mapping = NULL;
> }
> - if (unlikely(page_has_type(page)))
> + if (unlikely(page_has_type(page))) {
> + WARN_ON_ONCE(PageNet_pp(page));
I guess my ask to add a comment here got ignored?
> /* Reset the page_type (which overlays _mapcount) */
> page->page_type = UINT_MAX;
* Re: [PATCH linux-next v3] mm, page_pool: introduce a new page type for page pool in page type
From: Pavel Begunkov @ 2025-08-14 9:42 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Byungchul Park, akpm, linux-kernel, kernel_team, harry.yoo, ast,
daniel, davem, hawk, john.fastabend, sdf, saeedm, leon, tariqt,
mbloch, andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
usamaarif642, baolin.wang, almasrymina, toke, bpf, linux-rdma,
sfr, linux-mm, netdev
On 8/13/25 15:52, Jakub Kicinski wrote:
> On Wed, 13 Aug 2025 12:18:56 +0100 Pavel Begunkov wrote:
>> It should go to net, there will be enough of conflicts otherwise.
>> mm maintainers, do you like it as a shared branch or can it just
>> go through the net tree?
>
> Looks like this is 100% in mm, and the work is not urgent at all.
There is a slight dependency in rc1, but we should be able to
massage it to be mm only.
> So I'm happy for Andrew to take this, and dependent patches (if any)
> can come in the next cycle.
Yeah, good option. It'd be a good idea to cut the diff down to
avoid removing the relevant mm page state checks until the next
cycle.
>> @@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
>> mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>> folio->mapping = NULL;
>> }
>> - if (unlikely(page_has_type(page)))
>> + if (unlikely(page_has_type(page))) {
>> + WARN_ON_ONCE(PageNet_pp(page));
>
> I guess my ask to add a comment here got ignored?
It's an old patch attached as a point of reference. Any actual submission
surely will need to follow up on the reviews.
--
Pavel Begunkov