BPF List
* [PATCH v4] mm: introduce a new page type for page pool in page type
@ 2026-02-24  5:13 Byungchul Park
  2026-02-25  7:19 ` Mike Rapoport
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Byungchul Park @ 2026-02-24  5:13 UTC (permalink / raw)
  To: linux-mm, akpm, netdev
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073, dtatulea

Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
determine if a page belongs to a page pool.  However, with the planned
removal of @pp_magic, we should instead leverage the page_type in struct
page, such as PGTY_netpp, for this purpose.

Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
and __ClearPageNetpp() instead, and remove the existing APIs accessing
@pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
netmem_clear_pp_magic().

Plus, add @page_type to struct net_iov at the same offset as in struct
page so as to use the page_type APIs for struct net_iov as well.  While
at it, reorder @type and @owner in struct net_iov to avoid a hole and an
increase in the struct size.

This work was inspired by the following link:

  https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/

While at it, move the page pool sanity check to the free path.
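
For reference, the helpers generated by PAGE_TYPE_OPS() boil down to
roughly the following (a simplified sketch of the page-flags.h
machinery, not the exact expansion; the type tag lives in the top byte
of @page_type, and UINT_MAX means "no type"):

  static inline bool PageNetpp(const struct page *page)
  {
  	return (READ_ONCE(page->page_type) >> 24) == PGTY_netpp;
  }

  static inline void __SetPageNetpp(struct page *page)
  {
  	/* the page must be typeless before a type is assigned */
  	VM_BUG_ON_PAGE(page->page_type != UINT_MAX, page);
  	page->page_type = (unsigned int)PGTY_netpp << 24;
  }

  static inline void __ClearPageNetpp(struct page *page)
  {
  	VM_BUG_ON_PAGE(!PageNetpp(page), page);
  	page->page_type = UINT_MAX;
  }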

Suggested-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
The following 'Acked-by's were given only for the mm part:

  Acked-by: David Hildenbrand <david@redhat.com>
  Acked-by: Zi Yan <ziy@nvidia.com>
  Acked-by: Vlastimil Babka <vbabka@suse.cz>

This patch changes both mm and page-pool, but I'm currently targeting
the mm tree because Jakub asked and I also think it's more of an mm change.
See the following link:

  https://lore.kernel.org/all/20250813075212.051b5178@kernel.org/

Changes from v3:
	1. Rebase on mm-new as of Feb 24, 2026.
	2. Fix an issue reported by kernel test robot due to incorrect
	   type.
	3. Add 'Acked-by: Vlastimil Babka <vbabka@suse.cz>'.  Thanks.

Changes from v2:
	1. Fix an issue reported by kernel test robot due to incorrect
	   type in argument of __netmem_to_page().

Changes from v1:
	1. Drop the finalizing patch removing the pp fields of struct
	   page since I found that there is still code accessing a pp
	   field via struct page.  I will retry the finalizing patch
	   after resolving the issue.
---
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  2 +-
 include/linux/mm.h                            | 27 +++----------------
 include/linux/page-flags.h                    |  6 +++++
 include/net/netmem.h                          | 15 +++++++++--
 mm/page_alloc.c                               | 11 +++++---
 net/core/netmem_priv.h                        | 23 +++++++---------
 net/core/page_pool.c                          | 24 +++++++++++++++--
 7 files changed, 62 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 80f9fc10877ad..7d90d2485c787 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
 				page = xdpi.page.page;
 
-				/* No need to check page_pool_page_is_pp() as we
+				/* No need to check PageNetpp() as we
 				 * know this is a page_pool page.
 				 */
 				page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 13336340612e2..0db764b3d6b84 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4824,10 +4824,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  * DMA mapping IDs for page_pool
  *
  * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
- * stashes it in the upper bits of page->pp_magic. We always want to be able to
- * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
- * pages can have arbitrary kernel pointers stored in the same field as pp_magic
- * (since it overlaps with page->lru.next), so we must ensure that we cannot
+ * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
+ * arbitrary kernel pointers stored in the same field as pp_magic (since
+ * it overlaps with page->lru.next), so we must ensure that we cannot
  * mistake a valid kernel pointer with any of the values we write into this
  * field.
  *
@@ -4862,26 +4861,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
 				  PP_DMA_INDEX_SHIFT)
 
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
-	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
-	return false;
-}
-#endif
-
 #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
 #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
 #define PAGE_SNAPSHOT_PG_IDLE  (1 << 2)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0426cac91c0bb..30c4eb24e4edf 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -939,6 +939,7 @@ enum pagetype {
 	PGTY_zsmalloc		= 0xf6,
 	PGTY_unaccepted		= 0xf7,
 	PGTY_large_kmalloc	= 0xf8,
+	PGTY_netpp		= 0xf9,
 
 	PGTY_mapcount_underflow = 0xff
 };
@@ -1071,6 +1072,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
 PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
 PAGE_TYPE_OPS(LargeKmalloc, large_kmalloc, large_kmalloc)
 
+/*
+ * Marks page_pool allocated pages.
+ */
+PAGE_TYPE_OPS(Netpp, netpp, netpp)
+
 /**
  * PageHuge - Determine if the page belongs to hugetlbfs
  * @page: The page to test.
diff --git a/include/net/netmem.h b/include/net/netmem.h
index a96b3e5e5574c..85e3b26ec547f 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -110,10 +110,21 @@ struct net_iov {
 			atomic_long_t pp_ref_count;
 		};
 	};
-	struct net_iov_area *owner;
+
+	unsigned int page_type;
 	enum net_iov_type type;
+	struct net_iov_area *owner;
 };
 
+/* Make sure 'the offset of page_type in struct page == the offset of
+ * page_type in struct net_iov'.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov)			\
+	static_assert(offsetof(struct page, pg) ==	\
+		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(page_type, page_type);
+#undef NET_IOV_ASSERT_OFFSET
+
 struct net_iov_area {
 	/* Array of net_iovs for this area. */
 	struct net_iov *niovs;
@@ -256,7 +267,7 @@ static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
  */
 #define pp_page_to_nmdesc(p)						\
 ({									\
-	DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p));		\
+	DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p));				\
 	__pp_page_to_nmdesc(p);						\
 })
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d88c8c67ac0b7..cae9f93271469 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1079,7 +1079,6 @@ static inline bool page_expected_state(struct page *page,
 #ifdef CONFIG_MEMCG
 			page->memcg_data |
 #endif
-			page_pool_page_is_pp(page) |
 			(page->flags.f & check_flags)))
 		return false;
 
@@ -1106,8 +1105,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
 #endif
-	if (unlikely(page_pool_page_is_pp(page)))
-		bad_reason = "page_pool leak";
 	return bad_reason;
 }
 
@@ -1416,9 +1413,15 @@ __always_inline bool __free_pages_prepare(struct page *page,
 		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
 		folio->mapping = NULL;
 	}
-	if (unlikely(page_has_type(page)))
+	if (unlikely(page_has_type(page))) {
+		/* networking expects to clear its page type before releasing */
+		if (unlikely(PageNetpp(page))) {
+			bad_page(page, "page_pool leak");
+			return false;
+		}
 		/* Reset the page_type (which overlays _mapcount) */
 		page->page_type = UINT_MAX;
+	}
 
 	if (is_check_pages_enabled()) {
 		if (free_page_is_bad(page))
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
index 23175cb2bd866..3e6fde8f1726f 100644
--- a/net/core/netmem_priv.h
+++ b/net/core/netmem_priv.h
@@ -8,21 +8,18 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
 	return netmem_to_nmdesc(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
 }
 
-static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
-{
-	netmem_to_nmdesc(netmem)->pp_magic |= pp_magic;
-}
-
-static inline void netmem_clear_pp_magic(netmem_ref netmem)
-{
-	WARN_ON_ONCE(netmem_to_nmdesc(netmem)->pp_magic & PP_DMA_INDEX_MASK);
-
-	netmem_to_nmdesc(netmem)->pp_magic = 0;
-}
-
 static inline bool netmem_is_pp(netmem_ref netmem)
 {
-	return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
+	struct page *page;
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page.  This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	return PageNetpp(page);
 }
 
 static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 265a729431bb7..877bbf7a19389 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -702,8 +702,18 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
 
 void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
+	struct page *page;
+
 	netmem_set_pp(netmem, pool);
-	netmem_or_pp_magic(netmem, PP_SIGNATURE);
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page.  This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	__SetPageNetpp(page);
 
 	/* Ensuring all pages have been split into one fragment initially:
 	 * page_pool_set_pp_info() is only called once for every page when it
@@ -718,7 +728,17 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 
 void page_pool_clear_pp_info(netmem_ref netmem)
 {
-	netmem_clear_pp_magic(netmem);
+	struct page *page;
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page.  This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	__ClearPageNetpp(page);
+
 	netmem_set_pp(netmem, NULL);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-02-24  5:13 [PATCH v4] mm: introduce a new page type for page pool in page type Byungchul Park
@ 2026-02-25  7:19 ` Mike Rapoport
  2026-02-26 18:49 ` Johannes Weiner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Mike Rapoport @ 2026-02-25  7:19 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, surenb, mhocko, horms,
	jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner, kas,
	yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073, dtatulea

On Tue, Feb 24, 2026 at 02:13:47PM +0900, Byungchul Park wrote:
> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> determine if a page belongs to a page pool.  However, with the planned
> removal of @pp_magic, we should instead leverage the page_type in struct
> page, such as PGTY_netpp, for this purpose.
> 
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
> 
> Plus, add @page_type to struct net_iov at the same offset as in struct
> page so as to use the page_type APIs for struct net_iov as well.  While
> at it, reorder @type and @owner in struct net_iov to avoid a hole and an
> increase in the struct size.
> 
> This work was inspired by the following link:
> 
>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> 
> While at it, move the page pool sanity check to the free path.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-02-24  5:13 [PATCH v4] mm: introduce a new page type for page pool in page type Byungchul Park
  2026-02-25  7:19 ` Mike Rapoport
@ 2026-02-26 18:49 ` Johannes Weiner
  2026-03-16 22:29   ` Byungchul Park
  2026-05-13  9:00 ` [PATCH v4] " Dragos Tatulea
  2026-05-13  9:42 ` Lorenzo Stoakes
  3 siblings, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2026-02-26 18:49 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, ziy, ilias.apalodimas, willy, brauner, kas,
	yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073, dtatulea

On Tue, Feb 24, 2026 at 02:13:47PM +0900, Byungchul Park wrote:
> @@ -1416,9 +1413,15 @@ __always_inline bool __free_pages_prepare(struct page *page,
>  		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>  		folio->mapping = NULL;
>  	}
> -	if (unlikely(page_has_type(page)))
> +	if (unlikely(page_has_type(page))) {
> +		/* networking expects to clear its page type before releasing */
> +		if (unlikely(PageNetpp(page))) {

You can gate that on is_check_pages_enabled(), to avoid a new branch
in the !debug case.
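
(For context: is_check_pages_enabled() is a static-key test, roughly
the below in mm/page_alloc.c, so with the key off the test is patched
out at runtime rather than evaluated on every free.)

  /* on by default only when CONFIG_DEBUG_VM is set */
  DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled);

  static inline bool is_check_pages_enabled(void)
  {
  	return static_branch_unlikely(&check_pages_enabled);
  }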

Otherwise, the MM bits look good to me!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-02-26 18:49 ` Johannes Weiner
@ 2026-03-16 22:29   ` Byungchul Park
  2026-03-16 22:31     ` [PATCH v5] " Byungchul Park
  0 siblings, 1 reply; 26+ messages in thread
From: Byungchul Park @ 2026-03-16 22:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, ziy, ilias.apalodimas, willy, brauner, kas,
	yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073, dtatulea

On Thu, Feb 26, 2026 at 01:49:49PM -0500, Johannes Weiner wrote:
> On Tue, Feb 24, 2026 at 02:13:47PM +0900, Byungchul Park wrote:
> > @@ -1416,9 +1413,15 @@ __always_inline bool __free_pages_prepare(struct page *page,
> >               mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> >               folio->mapping = NULL;
> >       }
> > -     if (unlikely(page_has_type(page)))
> > +     if (unlikely(page_has_type(page))) {
> > +             /* networking expects to clear its page type before releasing */
> > +             if (unlikely(PageNetpp(page))) {
> 
> You can gate that on is_check_pages_enabled(), to avoid a new branch
> in the !debug case.

It's an unlikely condition, so it already behaves as if there were
almost no branch.  Do we need the additional effort?  No objection, but
I'm just curious.

I'm adding a v5 patch with the request applied in this mail thread anyway.

	Byungchul

> Otherwise, the MM bits look good to me!
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-16 22:29   ` Byungchul Park
@ 2026-03-16 22:31     ` Byungchul Park
  2026-03-17  9:20       ` Jesper Dangaard Brouer
  2026-03-19 23:31       ` Jakub Kicinski
  0 siblings, 2 replies; 26+ messages in thread
From: Byungchul Park @ 2026-03-16 22:31 UTC (permalink / raw)
  To: linux-mm, akpm, netdev
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073, dtatulea

Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
determine if a page belongs to a page pool.  However, with the planned
removal of @pp_magic, we should instead leverage the page_type in struct
page, such as PGTY_netpp, for this purpose.

Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
and __ClearPageNetpp() instead, and remove the existing APIs accessing
@pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
netmem_clear_pp_magic().

Plus, add @page_type to struct net_iov at the same offset as in struct
page so as to use the page_type APIs for struct net_iov as well.  While
at it, reorder @type and @owner in struct net_iov to avoid a hole and an
increase in the struct size.

This work was inspired by the following link:

  https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/

While at it, move the page pool sanity check to the free path.
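
To illustrate why the shared offset makes the cast in netmem_is_pp()
safe, here is a minimal userspace model (the field layout is
simplified, and NET_IOV is assumed to be bit 0 of a netmem_ref as in
netmem.h):

  #include <assert.h>
  #include <stddef.h>
  #include <stdio.h>

  #define NET_IOV 1UL

  struct page    { unsigned long flags; unsigned int page_type; };
  struct net_iov { unsigned long pad;   unsigned int page_type; };

  /* mirrors the static_assert added in netmem.h */
  static_assert(offsetof(struct page, page_type) ==
  	      offsetof(struct net_iov, page_type), "offsets must match");

  static unsigned int read_page_type(unsigned long netmem)
  {
  	/* clearing NET_IOV yields a pointer whose page_type field is
  	 * valid for both backing types thanks to the shared offset
  	 */
  	const struct page *page =
  		(const struct page *)(netmem & ~NET_IOV);

  	return page->page_type;
  }

  int main(void)
  {
  	struct net_iov niov = { .page_type = 0xf9u << 24 };
  	unsigned long netmem = (unsigned long)&niov | NET_IOV;

  	printf("type byte: 0x%x\n", read_page_type(netmem) >> 24);
  	return 0;
  }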

Suggested-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
The following 'Acked-by's were given only for the mm part:

  Acked-by: David Hildenbrand <david@redhat.com>
  Acked-by: Zi Yan <ziy@nvidia.com>
  Acked-by: Vlastimil Babka <vbabka@suse.cz>
  Acked-by: Johannes Weiner <hannes@cmpxchg.org>

This patch changes both mm and page-pool, but I'm currently targeting
the mm tree because Jakub asked and I also think it's more of an mm change.
See the following link:

  https://lore.kernel.org/all/20250813075212.051b5178@kernel.org/

Changes from v4:
	1. Gate the sanity check, 'if (unlikely(PageNetpp(page)))' on
	   the free path, with is_check_pages_enabled().  (Feedback
	   from Johannes Weiner)
	2. Add 'Acked-by's.  Thanks.

Changes from v3:
	1. Rebase on mm-new as of Feb 24, 2026.
	2. Fix an issue reported by kernel test robot due to incorrect
	   type.
	3. Add 'Acked-by: Vlastimil Babka <vbabka@suse.cz>'.  Thanks.

Changes from v2:
	1. Fix an issue reported by kernel test robot due to incorrect
	   type in argument of __netmem_to_page().

Changes from v1:
	1. Drop the finalizing patch removing the pp fields of struct
	   page since I found that there is still code accessing a pp
	   field via struct page.  I will retry the finalizing patch
	   after resolving the issue.
---
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  2 +-
 include/linux/mm.h                            | 27 +++----------------
 include/linux/page-flags.h                    |  6 +++++
 include/net/netmem.h                          | 15 +++++++++--
 mm/page_alloc.c                               | 13 ++++++---
 net/core/netmem_priv.h                        | 23 +++++++---------
 net/core/page_pool.c                          | 24 +++++++++++++++--
 7 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 80f9fc10877ad..7d90d2485c787 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
 				page = xdpi.page.page;
 
-				/* No need to check page_pool_page_is_pp() as we
+				/* No need to check PageNetpp() as we
 				 * know this is a page_pool page.
 				 */
 				page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52206525af7c7..6f0a3edb24e14 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5197,10 +5197,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
  * DMA mapping IDs for page_pool
  *
  * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
- * stashes it in the upper bits of page->pp_magic. We always want to be able to
- * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
- * pages can have arbitrary kernel pointers stored in the same field as pp_magic
- * (since it overlaps with page->lru.next), so we must ensure that we cannot
+ * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
+ * arbitrary kernel pointers stored in the same field as pp_magic (since
+ * it overlaps with page->lru.next), so we must ensure that we cannot
  * mistake a valid kernel pointer with any of the values we write into this
  * field.
  *
@@ -5235,26 +5234,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
 				  PP_DMA_INDEX_SHIFT)
 
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
-	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
-	return false;
-}
-#endif
-
 #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
 #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
 #define PAGE_SNAPSHOT_PG_IDLE  (1 << 2)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 7223f6f4e2b40..0e03d816e8b9d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -923,6 +923,7 @@ enum pagetype {
 	PGTY_zsmalloc		= 0xf6,
 	PGTY_unaccepted		= 0xf7,
 	PGTY_large_kmalloc	= 0xf8,
+	PGTY_netpp		= 0xf9,
 
 	PGTY_mapcount_underflow = 0xff
 };
@@ -1055,6 +1056,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
 PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
 PAGE_TYPE_OPS(LargeKmalloc, large_kmalloc, large_kmalloc)
 
+/*
+ * Marks page_pool allocated pages.
+ */
+PAGE_TYPE_OPS(Netpp, netpp, netpp)
+
 /**
  * PageHuge - Determine if the page belongs to hugetlbfs
  * @page: The page to test.
diff --git a/include/net/netmem.h b/include/net/netmem.h
index a96b3e5e5574c..85e3b26ec547f 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -110,10 +110,21 @@ struct net_iov {
 			atomic_long_t pp_ref_count;
 		};
 	};
-	struct net_iov_area *owner;
+
+	unsigned int page_type;
 	enum net_iov_type type;
+	struct net_iov_area *owner;
 };
 
+/* Make sure 'the offset of page_type in struct page == the offset of
+ * page_type in struct net_iov'.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov)			\
+	static_assert(offsetof(struct page, pg) ==	\
+		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(page_type, page_type);
+#undef NET_IOV_ASSERT_OFFSET
+
 struct net_iov_area {
 	/* Array of net_iovs for this area. */
 	struct net_iov *niovs;
@@ -256,7 +267,7 @@ static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
  */
 #define pp_page_to_nmdesc(p)						\
 ({									\
-	DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p));		\
+	DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p));				\
 	__pp_page_to_nmdesc(p);						\
 })
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f2fe46ff69a1..ee81f5c67c18f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
 #ifdef CONFIG_MEMCG
 			page->memcg_data |
 #endif
-			page_pool_page_is_pp(page) |
 			(page->flags.f & check_flags)))
 		return false;
 
@@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
 #endif
-	if (unlikely(page_pool_page_is_pp(page)))
-		bad_reason = "page_pool leak";
 	return bad_reason;
 }
 
@@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
 		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
 		folio->mapping = NULL;
 	}
-	if (unlikely(page_has_type(page)))
+	if (unlikely(page_has_type(page))) {
+		/* networking expects to clear its page type before releasing */
+		if (is_check_pages_enabled()) {
+			if (unlikely(PageNetpp(page))) {
+				bad_page(page, "page_pool leak");
+				return false;
+			}
+		}
 		/* Reset the page_type (which overlays _mapcount) */
 		page->page_type = UINT_MAX;
+	}
 
 	if (is_check_pages_enabled()) {
 		if (free_page_is_bad(page))
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
index 23175cb2bd866..3e6fde8f1726f 100644
--- a/net/core/netmem_priv.h
+++ b/net/core/netmem_priv.h
@@ -8,21 +8,18 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
 	return netmem_to_nmdesc(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
 }
 
-static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
-{
-	netmem_to_nmdesc(netmem)->pp_magic |= pp_magic;
-}
-
-static inline void netmem_clear_pp_magic(netmem_ref netmem)
-{
-	WARN_ON_ONCE(netmem_to_nmdesc(netmem)->pp_magic & PP_DMA_INDEX_MASK);
-
-	netmem_to_nmdesc(netmem)->pp_magic = 0;
-}
-
 static inline bool netmem_is_pp(netmem_ref netmem)
 {
-	return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
+	struct page *page;
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page.  This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	return PageNetpp(page);
 }
 
 static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 265a729431bb7..877bbf7a19389 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -702,8 +702,18 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
 
 void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
+	struct page *page;
+
 	netmem_set_pp(netmem, pool);
-	netmem_or_pp_magic(netmem, PP_SIGNATURE);
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page.  This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	__SetPageNetpp(page);
 
 	/* Ensuring all pages have been split into one fragment initially:
 	 * page_pool_set_pp_info() is only called once for every page when it
@@ -718,7 +728,17 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 
 void page_pool_clear_pp_info(netmem_ref netmem)
 {
-	netmem_clear_pp_magic(netmem);
+	struct page *page;
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page.  This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	__ClearPageNetpp(page);
+
 	netmem_set_pp(netmem, NULL);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-16 22:31     ` [PATCH v5] " Byungchul Park
@ 2026-03-17  9:20       ` Jesper Dangaard Brouer
  2026-03-17 10:03         ` Ilias Apalodimas
                           ` (2 more replies)
  2026-03-19 23:31       ` Jakub Kicinski
  1 sibling, 3 replies; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2026-03-17  9:20 UTC (permalink / raw)
  To: Byungchul Park, linux-mm, akpm, netdev, Jakub Kicinski,
	Mina Almasry, Toke Hoiland Jorgensen
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	john.fastabend, sdf, saeedm, leon, tariqt, mbloch, andrew+netdev,
	edumazet, pabeni, david, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, jackmanb, hannes, ziy,
	ilias.apalodimas, willy, brauner, kas, yuzhao, usamaarif642,
	baolin.wang, almasrymina, toke, asml.silence, bpf, linux-rdma,
	sfr, dw, ap420073, dtatulea




On 16/03/2026 23.31, Byungchul Park wrote:
> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> determine if a page belongs to a page pool.  However, with the planned
> removal of @pp_magic, we should instead leverage the page_type in struct
> page, such as PGTY_netpp, for this purpose.
> 
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
> 
> Plus, add @page_type to struct net_iov at the same offset as in struct
> page so as to use the page_type APIs for struct net_iov as well.  While
> at it, reorder @type and @owner in struct net_iov to avoid a hole and an
> increase in the struct size.
> 
> This work was inspired by the following link:
> 
>    https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> 
> While at it, move the page pool sanity check to the free path.
> 
[...]
see below

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9f2fe46ff69a1..ee81f5c67c18f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
>   #ifdef CONFIG_MEMCG
>   			page->memcg_data |
>   #endif
> -			page_pool_page_is_pp(page) |
>   			(page->flags.f & check_flags)))
>   		return false;
>   
> @@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>   	if (unlikely(page->memcg_data))
>   		bad_reason = "page still charged to cgroup";
>   #endif
> -	if (unlikely(page_pool_page_is_pp(page)))
> -		bad_reason = "page_pool leak";
>   	return bad_reason;
>   }
>   
> @@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
>   		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>   		folio->mapping = NULL;
>   	}
> -	if (unlikely(page_has_type(page)))
> +	if (unlikely(page_has_type(page))) {
> +		/* networking expects to clear its page type before releasing */
> +		if (is_check_pages_enabled()) {
> +			if (unlikely(PageNetpp(page))) {
> +				bad_page(page, "page_pool leak");
> +				return false;
> +			}
> +		}
>   		/* Reset the page_type (which overlays _mapcount) */
>   		page->page_type = UINT_MAX;
> +	}

I need some opinions here.  When CONFIG_DEBUG_VM is enabled we get help
finding (hard to find) page_pool leaks and mark the page as bad.

When CONFIG_DEBUG_VM is disabled we silently "allow" leaking.
Should we handle this more gracefully here? (release pp resources)

Allowing the page to be reused at this point, when a page_pool instance
is known to track its life-time and (likely) have the page DMA mapped,
seems problematic to me.  The code only clears page->page_type, but
IIRC Toke added DMA tracking in other fields (plus an xarray
internally).

I was going to suggest simply re-exporting page_pool_release_page(),
which was hidden by 535b9c61bdef ("net: page_pool: hide
page_pool_release_page()"), but that functionality was lost in
07e0c7d3179d ("net: page_pool: merge page_pool_release_page() with
page_pool_return_page()").  (Cc Jakub/Kuba for that choice.)
Simply call page_pool_release_page(page->pp, page).
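
Something along these lines on the free path, perhaps (illustrative
only; the helper no longer exists, so the call below is hypothetical):

  	if (unlikely(PageNetpp(page))) {
  		/* hypothetical: hand the page back to its pool so the
  		 * DMA mapping and the xarray tracking get torn down,
  		 * rather than letting the allocator reuse the page
  		 */
  		page_pool_release_page(pp_page_to_nmdesc(page)->pp, page);
  		bad_page(page, "page_pool leak");
  		return false;
  	}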

Opinions?

--Jesper

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-17  9:20       ` Jesper Dangaard Brouer
@ 2026-03-17 10:03         ` Ilias Apalodimas
  2026-03-17 11:06         ` Dragos Tatulea
  2026-03-18  2:02         ` Byungchul Park
  2 siblings, 0 replies; 26+ messages in thread
From: Ilias Apalodimas @ 2026-03-17 10:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Byungchul Park, linux-mm, akpm, netdev, Jakub Kicinski,
	Mina Almasry, Toke Hoiland Jorgensen, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, asml.silence, bpf, linux-rdma, sfr, dw,
	ap420073, dtatulea

Hi Jesper,

On Tue, 17 Mar 2026 at 11:20, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
>
>
>
> On 16/03/2026 23.31, Byungchul Park wrote:
> > Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> > determine if a page belongs to a page pool.  However, with the planned
> > removal of @pp_magic, we should instead leverage the page_type in struct
> > page, such as PGTY_netpp, for this purpose.
> >
> > Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> > and __ClearPageNetpp() instead, and remove the existing APIs accessing
> > @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> > netmem_clear_pp_magic().
> >
> > Plus, add @page_type to struct net_iov at the same offset as in struct
> > page so as to use the page_type APIs for struct net_iov as well.  While
> > at it, reorder @type and @owner in struct net_iov to avoid a hole and an
> > increase in the struct size.
> >
> > This work was inspired by the following link:
> >
> >    https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> >
> > While at it, move the page pool sanity check to the free path.
> >
> [...]
> see below
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9f2fe46ff69a1..ee81f5c67c18f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
> >   #ifdef CONFIG_MEMCG
> >                       page->memcg_data |
> >   #endif
> > -                     page_pool_page_is_pp(page) |
> >                       (page->flags.f & check_flags)))
> >               return false;
> >
> > @@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> >       if (unlikely(page->memcg_data))
> >               bad_reason = "page still charged to cgroup";
> >   #endif
> > -     if (unlikely(page_pool_page_is_pp(page)))
> > -             bad_reason = "page_pool leak";
> >       return bad_reason;
> >   }
> >
> > @@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
> >               mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> >               folio->mapping = NULL;
> >       }
> > -     if (unlikely(page_has_type(page)))
> > +     if (unlikely(page_has_type(page))) {
> > +             /* networking expects to clear its page type before releasing */
> > +             if (is_check_pages_enabled()) {
> > +                     if (unlikely(PageNetpp(page))) {
> > +                             bad_page(page, "page_pool leak");
> > +                             return false;
> > +                     }
> > +             }
> >               /* Reset the page_type (which overlays _mapcount) */
> >               page->page_type = UINT_MAX;
> > +     }
>
> I need some opinions here.  When CONFIG_DEBUG_VM is enabled we get help
> finding (hard to find) page_pool leaks and mark the page as bad.
>

You mean we get debug messages about the leak, right?  Nothing is
silently freed with that config enabled.

> When CONFIG_DEBUG_VM is disabled we silently "allow" leaking.
> Should we handle this more gracefully here? (release pp resources)
>
> > Allowing the page to be reused at this point, when a page_pool instance
> > is known to track its life-time and (likely) have the page DMA mapped,
> > seems problematic to me.  The code only clears page->page_type, but
> > IIRC Toke added DMA tracking in other fields (plus an xarray
> > internally).

If we can reliably track and release the pages, I think we should do
this regardless of whether CONFIG_DEBUG_VM is enabled.  But the
question is: how did that leak happen?  Is it a misuse of the API by
the driver?

Thanks
/Ilias
>
> I was going to suggest simply re-exporting page_pool_release_page(),
> which was hidden by 535b9c61bdef ("net: page_pool: hide
> page_pool_release_page()"), but that functionality was lost in
> 07e0c7d3179d ("net: page_pool: merge page_pool_release_page() with
> page_pool_return_page()").  (Cc Jakub/Kuba for that choice.)
> Simply call page_pool_release_page(page->pp, page).
>
> Opinions?
>
> --Jesper

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-17  9:20       ` Jesper Dangaard Brouer
  2026-03-17 10:03         ` Ilias Apalodimas
@ 2026-03-17 11:06         ` Dragos Tatulea
  2026-03-19 23:31           ` Jakub Kicinski
  2026-03-18  2:02         ` Byungchul Park
  2 siblings, 1 reply; 26+ messages in thread
From: Dragos Tatulea @ 2026-03-17 11:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Byungchul Park, linux-mm, akpm, netdev,
	Jakub Kicinski, Mina Almasry, Toke Hoiland Jorgensen
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem,
	john.fastabend, sdf, saeedm, leon, tariqt, mbloch, andrew+netdev,
	edumazet, pabeni, david, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, horms, jackmanb, hannes, ziy,
	ilias.apalodimas, willy, brauner, kas, yuzhao, usamaarif642,
	baolin.wang, asml.silence, bpf, linux-rdma, sfr, dw, ap420073



On 17.03.26 10:20, Jesper Dangaard Brouer wrote:
> 
> 
> 
> On 16/03/2026 23.31, Byungchul Park wrote:
>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
>> determine if a page belongs to a page pool.  However, with the planned
>> removal of @pp_magic, we should instead leverage the page_type in struct
>> page, such as PGTY_netpp, for this purpose.
>>
>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
>> netmem_clear_pp_magic().
>>
>> Plus, add @page_type to struct net_iov at the same offset as in struct
>> page so as to use the page_type APIs for struct net_iov as well.  While
>> at it, reorder @type and @owner in struct net_iov to avoid a hole and an
>> increase in the struct size.
>>
>> This work was inspired by the following link:
>>
>>    https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>>
>> While at it, move the page pool sanity check to the free path.
>>
> [...]
> see below
> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 9f2fe46ff69a1..ee81f5c67c18f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
>>   #ifdef CONFIG_MEMCG
>>               page->memcg_data |
>>   #endif
>> -            page_pool_page_is_pp(page) |
>>               (page->flags.f & check_flags)))
>>           return false;
>>   @@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>       if (unlikely(page->memcg_data))
>>           bad_reason = "page still charged to cgroup";
>>   #endif
>> -    if (unlikely(page_pool_page_is_pp(page)))
>> -        bad_reason = "page_pool leak";
>>       return bad_reason;
>>   }
>>   @@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
>>           mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>>           folio->mapping = NULL;
>>       }
>> -    if (unlikely(page_has_type(page)))
>> +    if (unlikely(page_has_type(page))) {
>> +        /* networking expects to clear its page type before releasing */
>> +        if (is_check_pages_enabled()) {
>> +            if (unlikely(PageNetpp(page))) {
>> +                bad_page(page, "page_pool leak");
>> +                return false;
>> +            }
>> +        }
>>           /* Reset the page_type (which overlays _mapcount) */
>>           page->page_type = UINT_MAX;
>> +    }
> 
> I need some opinions here.  When CONFIG_DEBUG_VM is enabled we get help
> finding (hard to find) page_pool leaks and mark the page as bad.
> 
> When CONFIG_DEBUG_VM is disabled we silently "allow" leaking.
Some leaks could still be caught at the page_pool level through the
YNL API for disconnected page_pools.  Not the nasty/interesting ones,
though...

> Should we handle this more gracefully here? (release pp resources)
> 
Releasing a leaked page_pool page could lead to many harder-to-track
side effects further down the line.  Isn't it better to simply leave
the page in a leaked state?

Having a way to track this leaked state would certainly be useful,
though.

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-17  9:20       ` Jesper Dangaard Brouer
  2026-03-17 10:03         ` Ilias Apalodimas
  2026-03-17 11:06         ` Dragos Tatulea
@ 2026-03-18  2:02         ` Byungchul Park
  2026-03-20 11:44           ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 26+ messages in thread
From: Byungchul Park @ 2026-03-18  2:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, akpm, netdev, Jakub Kicinski, Mina Almasry,
	Toke Hoiland Jorgensen, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, asml.silence, bpf, linux-rdma, sfr, dw,
	ap420073, dtatulea

On Tue, Mar 17, 2026 at 10:20:08AM +0100, Jesper Dangaard Brouer wrote:
> On 16/03/2026 23.31, Byungchul Park wrote:
> > Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> > determine if a page belongs to a page pool.  However, with the planned
> > removal of @pp_magic, we should instead leverage the page_type in struct
> > page, such as PGTY_netpp, for this purpose.
> > 
> > Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> > and __ClearPageNetpp() instead, and remove the existing APIs accessing
> > @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> > netmem_clear_pp_magic().
> > 
> > Plus, add @page_type to struct net_iov at the same offset as in struct
> > page so as to use the page_type APIs for struct net_iov as well.  While
> > at it, reorder @type and @owner in struct net_iov to avoid a hole and an
> > increase in the struct size.
> > 
> > This work was inspired by the following link:
> > 
> >    https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> > 
> > While at it, move the page pool sanity check to the free path.
> > 
> [...]
> see below
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9f2fe46ff69a1..ee81f5c67c18f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
> >   #ifdef CONFIG_MEMCG
> >                       page->memcg_data |
> >   #endif
> > -                     page_pool_page_is_pp(page) |
> >                       (page->flags.f & check_flags)))
> >               return false;
> > 
> > @@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> >       if (unlikely(page->memcg_data))
> >               bad_reason = "page still charged to cgroup";
> >   #endif
> > -     if (unlikely(page_pool_page_is_pp(page)))
> > -             bad_reason = "page_pool leak";
> >       return bad_reason;
> >   }
> > 
> > @@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
> >               mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> >               folio->mapping = NULL;
> >       }
> > -     if (unlikely(page_has_type(page)))
> > +     if (unlikely(page_has_type(page))) {
> > +             /* networking expects to clear its page type before releasing */
> > +             if (is_check_pages_enabled()) {
> > +                     if (unlikely(PageNetpp(page))) {
> > +                             bad_page(page, "page_pool leak");
> > +                             return false;
> > +                     }
> > +             }
> >               /* Reset the page_type (which overlays _mapcount) */
> >               page->page_type = UINT_MAX;
> > +     }
> 
> I need some opinions here.  When CONFIG_DEBUG_VM is enabled we get help
> finding (hard to find) page_pool leaks and mark the page as bad.
> 
> When CONFIG_DEBUG_VM is disabled we silently "allow" leaking.
> Should we handle this more gracefully here? (release pp resources)
> 
> > Allowing the page to be reused at this point, when a page_pool instance
> > is known to track its life-time and (likely) have the page DMA mapped,
> > seems problematic to me.  The code only clears page->page_type, but
> > IIRC Toke added DMA tracking in other fields (plus an xarray
> > internally).
> 
> I was going to suggest simply re-exporting page_pool_release_page(),
> which was hidden by 535b9c61bdef ("net: page_pool: hide
> page_pool_release_page()"), but that functionality was lost in
> 07e0c7d3179d ("net: page_pool: merge page_pool_release_page() with
> page_pool_return_page()").  (Cc Jakub/Kuba for that choice.)
> Simply call page_pool_release_page(page->pp, page).
> 
> Opinions?

This is not an issue related to this patch, but it's worth discussing. :)

Not only the page pool check but all of the bad-state checks are
skipped when is_check_pages_enabled() is off, which means they are all
suppressed in that case.

It is necessary to first clarify whether that behavior was intentional
or not.  Depending on the answer, we can discuss what to do better. :)

	Byungchul
> 
> --Jesper

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-17 11:06         ` Dragos Tatulea
@ 2026-03-19 23:31           ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-03-19 23:31 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Jesper Dangaard Brouer, Byungchul Park, linux-mm, akpm, netdev,
	Mina Almasry, Toke Hoiland Jorgensen, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073

On Tue, 17 Mar 2026 12:06:51 +0100 Dragos Tatulea wrote:
> > Should we handle this more gracefully here? (release pp resources)
> >   
> Releasing a leaked page_pool page could lead to many harder-to-track
> side effects further down the line.  Isn't it better to simply leave
> the page in a leaked state?

+1 that'd also be my intuition.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-16 22:31     ` [PATCH v5] " Byungchul Park
  2026-03-17  9:20       ` Jesper Dangaard Brouer
@ 2026-03-19 23:31       ` Jakub Kicinski
  1 sibling, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-03-19 23:31 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, hawk, john.fastabend, sdf, saeedm, leon, tariqt,
	mbloch, andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073, dtatulea

On Tue, 17 Mar 2026 07:31:13 +0900 Byungchul Park wrote:
> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> determine if a page belongs to a page pool.  However, with the planned
> removal of @pp_magic, we should instead leverage the page_type in struct
> page, such as PGTY_netpp, for this purpose.
> 
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
> 
> Plus, add @page_type to struct net_iov at the same offset as in struct
> page so as to use the page_type APIs for struct net_iov as well.  While
> at it, reorder @type and @owner in struct net_iov to avoid a hole and an
> increase in the struct size.

Acked-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-18  2:02         ` Byungchul Park
@ 2026-03-20 11:44           ` Jesper Dangaard Brouer
  2026-03-23 12:16             ` Ilias Apalodimas
  0 siblings, 1 reply; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2026-03-20 11:44 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-mm, akpm, netdev, Jakub Kicinski, Mina Almasry,
	Toke Hoiland Jorgensen, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, asml.silence, bpf, linux-rdma, sfr, dw,
	ap420073, dtatulea, Daniel Dao, kernel-team



On 18/03/2026 03.02, Byungchul Park wrote:
> On Tue, Mar 17, 2026 at 10:20:08AM +0100, Jesper Dangaard Brouer wrote:
>> On 16/03/2026 23.31, Byungchul Park wrote:
>>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
>>> determine if a page belongs to a page pool.  However, with the planned
>>> removal of @pp_magic, we should instead leverage the page_type in struct
>>> page, such as PGTY_netpp, for this purpose.
>>>
>>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
>>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
>>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
>>> netmem_clear_pp_magic().
>>>
>>> Plus, add @page_type to struct net_iov at the same offset as in struct
>>> page so as to use the page_type APIs for struct net_iov as well.  While
>>> at it, reorder @type and @owner in struct net_iov to avoid a hole and an
>>> increase in the struct size.
>>>
>>> This work was inspired by the following link:
>>>
>>>     https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>>>
>>> While at it, move the page pool sanity check to the free path.
>>>
>> [...]
>> see below
>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 9f2fe46ff69a1..ee81f5c67c18f 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
>>>    #ifdef CONFIG_MEMCG
>>>                        page->memcg_data |
>>>    #endif
>>> -                     page_pool_page_is_pp(page) |
>>>                        (page->flags.f & check_flags)))
>>>                return false;
>>>
>>> @@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>>>        if (unlikely(page->memcg_data))
>>>                bad_reason = "page still charged to cgroup";
>>>    #endif
>>> -     if (unlikely(page_pool_page_is_pp(page)))
>>> -             bad_reason = "page_pool leak";
>>>        return bad_reason;
>>>    }
>>>
>>> @@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
>>>                mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>>>                folio->mapping = NULL;
>>>        }
>>> -     if (unlikely(page_has_type(page)))
>>> +     if (unlikely(page_has_type(page))) {
>>> +             /* networking expects to clear its page type before releasing */
>>> +             if (is_check_pages_enabled()) {
>>> +                     if (unlikely(PageNetpp(page))) {
>>> +                             bad_page(page, "page_pool leak");
>>> +                             return false;
>>> +                     }
>>> +             }
>>>                /* Reset the page_type (which overlays _mapcount) */
>>>                page->page_type = UINT_MAX;
>>> +     }
>>
>> I need some opinions here.  When CONFIG_DEBUG_VM is enabled we get help
>> finding (hard to find) page_pool leaks and mark the page as bad.
>>
>> When CONFIG_DEBUG_VM is disabled we silently "allow" leaking.
>> Should we handle this more gracefully here? (release pp resources)
>>
>> Allowing the page to be reused at this point, when a page_pool instance
>> is known to track its life-time and (likely) have the page DMA mapped,
>> seems problematic to me.  The code only clears page->page_type, but
>> IIRC Toke added DMA tracking in other fields (plus an xarray
>> internally).
>>
>> I was going to suggest simply re-exporting page_pool_release_page(),
>> which was hidden by 535b9c61bdef ("net: page_pool: hide
>> page_pool_release_page()"), but that functionality was lost in
>> 07e0c7d3179d ("net: page_pool: merge page_pool_release_page() with
>> page_pool_return_page()").  (Cc Jakub/Kuba for that choice.)
>> Simply call page_pool_release_page(page->pp, page).
>>
>> Opinions?
> 
> This is not an issue related to this patch but it's worth discussing. :)
> 

Yes, looking at the existing code, you are actually keeping the same
problematic semantics of freeing a page that is still tracked by a
page_pool (DMA-mapped and life-time tracked), unless CONFIG_DEBUG_VM is
enabled.  Your code change just makes it clearer that this is the case.

In that sense this patch is correct and can be applied.
Let me clearly ACK this patch.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>


> Not only page pool state but any bad-state checks are also skipped when
> is_check_pages_enabled() is off, which means they are going to be
> suppressed in that case.
> 

Sadly yes. I'm mostly familiar with the page_pool, and I can say that
this isn't a good situation.  Like Dragos said [1], it would be better to
just let it leak.  Dragos gave a talk [2] on how to find these leaked
pages, if we let them stay leaked.


  [1] https://lore.kernel.org/all/20260319163109.58511b70@kernel.org/
  [2] https://netdevconf.info/0x19/sessions/tutorial/diagnosing-page-pool-leaks.html


> It is necessary to first clarify whether the action was intentional or
> not.  Depending on the answer, we can discuss what to do better. :)
> 
Acking this patch, as this discussion is a problem for "us" (netdev
people) to figure out.  Let's land this patch and then we can follow up.

--Jesper





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] mm: introduce a new page type for page pool in page type
  2026-03-20 11:44           ` Jesper Dangaard Brouer
@ 2026-03-23 12:16             ` Ilias Apalodimas
  0 siblings, 0 replies; 26+ messages in thread
From: Ilias Apalodimas @ 2026-03-23 12:16 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Byungchul Park, linux-mm, akpm, netdev, Jakub Kicinski,
	Mina Almasry, Toke Hoiland Jorgensen, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, asml.silence, bpf, linux-rdma, sfr, dw,
	ap420073, dtatulea, Daniel Dao, kernel-team

On Fri, 20 Mar 2026 at 13:44, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
>
>
> On 18/03/2026 03.02, Byungchul Park wrote:
> > On Tue, Mar 17, 2026 at 10:20:08AM +0100, Jesper Dangaard Brouer wrote:
> >> On 16/03/2026 23.31, Byungchul Park wrote:
> >>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> >>> determine if a page belongs to a page pool.  However, with the planned
> >>> removal of @pp_magic, we should instead leverage the page_type in struct
> >>> page, such as PGTY_netpp, for this purpose.
> >>>
> >>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> >>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> >>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> >>> netmem_clear_pp_magic().
> >>>
> >>> Plus, add @page_type to struct net_iov at the same offset as struct page
> >>> so as to use the page_type APIs for struct net_iov as well.  While at it,
> >>> reorder @type and @owner in struct net_iov to avoid a hole and
> >>> increasing the struct size.
> >>>
> >>> This work was inspired by the following link:
> >>>
> >>>     https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> >>>
> >>> While at it, move the sanity check for page pool to on the free path.
> >>>
> >> [...]
> >> see below
> >>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index 9f2fe46ff69a1..ee81f5c67c18f 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -1044,7 +1044,6 @@ static inline bool page_expected_state(struct page *page,
> >>>    #ifdef CONFIG_MEMCG
> >>>                        page->memcg_data |
> >>>    #endif
> >>> -                     page_pool_page_is_pp(page) |
> >>>                        (page->flags.f & check_flags)))
> >>>                return false;
> >>>
> >>> @@ -1071,8 +1070,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> >>>        if (unlikely(page->memcg_data))
> >>>                bad_reason = "page still charged to cgroup";
> >>>    #endif
> >>> -     if (unlikely(page_pool_page_is_pp(page)))
> >>> -             bad_reason = "page_pool leak";
> >>>        return bad_reason;
> >>>    }
> >>>
> >>> @@ -1381,9 +1378,17 @@ __always_inline bool __free_pages_prepare(struct page *page,
> >>>                mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> >>>                folio->mapping = NULL;
> >>>        }
> >>> -     if (unlikely(page_has_type(page)))
> >>> +     if (unlikely(page_has_type(page))) {
> >>> +             /* networking expects to clear its page type before releasing */
> >>> +             if (is_check_pages_enabled()) {
> >>> +                     if (unlikely(PageNetpp(page))) {
> >>> +                             bad_page(page, "page_pool leak");
> >>> +                             return false;
> >>> +                     }
> >>> +             }
> >>>                /* Reset the page_type (which overlays _mapcount) */
> >>>                page->page_type = UINT_MAX;
> >>> +     }
> >>
> >> I need some opinions here.  When CONFIG_DEBUG_VM is enabled we get help
> >> finding (hard to find) page_pool leaks and mark the page as bad.
> >>
> >> When CONFIG_DEBUG_VM is disabled we silently "allow" leaking.
> >> Should we handle this more gracefully here? (release pp resources)
> >>
> >> Allowing the page to be reused at this point, when a page_pool instance
> >> is known to track life-time and (likely) have the page DMA mapped, seems
> >> problematic to me.  Code only clears page->page_type, but IIRC Toke
> >> added DMA tracking in other fields (plus xarray internally).
> >>
> >> I was going to suggest to simply re-export page_pool_release_page() that
> >> was hidden by 535b9c61bdef ("net: page_pool: hide
> >> page_pool_release_page()"), but that functionality was lost in
> >> 07e0c7d3179d ("net: page_pool: merge page_pool_release_page() with
> >> page_pool_return_page()"). (Cc Jakub/Kuba for that choice).
> >> Simply page_pool_release_page(page->pp, page).
> >>
> >> Opinions?
> >
> > This is not an issue related to this patch but it's worth discussing. :)
> >
>
> Yes, looking at the existing code, you are actually keeping the same
> problematic semantics of freeing a page that is still tracked by a
> page_pool (DMA-mapped and life-time tracked), unless CONFIG_DEBUG_VM is
> enabled.  Your code change just makes it clearer that this is the case.
>
> In that sense this patch is correct and can be applied.
> Let me clearly ACK this patch.
>
> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>
>
> > Not only page pool state but any bad-state checks are also skipped when
> > is_check_pages_enabled() is off, which means they are going to be
> > suppressed in that case.
> >
>
> Sadly yes. I'm mostly familiar with the page_pool, and I can say that
> this isn't a good situation.  Like Dragos said [1], it would be better to
> just let it leak.  Dragos gave a talk [2] on how to find these leaked
> pages, if we let them stay leaked.
>
>
>   [1] https://lore.kernel.org/all/20260319163109.58511b70@kernel.org/
>   [2] https://netdevconf.info/0x19/sessions/tutorial/diagnosing-page-pool-leaks.html
>
>
> > It is necessary to first clarify whether the action was intentional or
> > not.  Depending on the answer, we can discuss what to do better. :)
> >
> Acking this patch, as this discussion is a problem for "us" (netdev
> people) to figure out.  Let's land this patch and then we can follow up.
>
> --Jesper
>
>
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-02-24  5:13 [PATCH v4] mm: introduce a new page type for page pool in page type Byungchul Park
  2026-02-25  7:19 ` Mike Rapoport
  2026-02-26 18:49 ` Johannes Weiner
@ 2026-05-13  9:00 ` Dragos Tatulea
  2026-05-13  9:12   ` Vlastimil Babka (SUSE)
                     ` (2 more replies)
  2026-05-13  9:42 ` Lorenzo Stoakes
  3 siblings, 3 replies; 26+ messages in thread
From: Dragos Tatulea @ 2026-05-13  9:00 UTC (permalink / raw)
  To: Byungchul Park, linux-mm, akpm, netdev
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073



On 24.02.26 06:13, Byungchul Park wrote:
> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> determine if a page belongs to a page pool.  However, with the planned
> removal of @pp_magic, we should instead leverage the page_type in struct
> page, such as PGTY_netpp, for this purpose.
> 
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
> 
> Plus, add @page_type to struct net_iov at the same offset as struct page
> so as to use the page_type APIs for struct net_iov as well.  While at it,
> reorder @type and @owner in struct net_iov to avoid a hole and
> increasing the struct size.
> 
> This work was inspired by the following link:
> 
>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> 
> While at it, move the sanity check for page pool to on the free path.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---

Seems like this patch broke tcp_mmap because
validate_page_before_insert() returns -EINVAL due
to a page having a type. Here's the full flow:

getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
below flow in the kernel:

tcp_zerocopy_receive()
-> tcp_zerocopy_vm_insert_batch()
  -> vm_insert_pages()
    -> insert_pages()
      -> insert_page_in_batch_locked()
        -> validate_page_before_insert() returns -EINVAL
           because page_has_type(page) is now true.

The patch below fixes the issue. But is this a valid fix?

diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..4cb12673f450 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2326,7 +2326,7 @@ static int validate_page_before_insert(struct vm_area_struct *vma,
                        return -EINVAL;
                return 0;
        }
-       if (folio_test_anon(folio) || page_has_type(page))
+       if (folio_test_anon(folio) || (page_has_type(page) && !PageNetpp(page)))
                return -EINVAL;
        flush_dcache_folio(folio);
        return 0;

Thanks,
Dragos



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13  9:00 ` [PATCH v4] " Dragos Tatulea
@ 2026-05-13  9:12   ` Vlastimil Babka (SUSE)
  2026-05-13  9:26     ` Pedro Falcato
  2026-05-13  9:34   ` David Hildenbrand (Arm)
  2026-05-13 12:18   ` Byungchul Park
  2 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-05-13  9:12 UTC (permalink / raw)
  To: Dragos Tatulea, Byungchul Park, linux-mm, akpm, netdev
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073

On 5/13/26 11:00, Dragos Tatulea wrote:
> 
> 
> On 24.02.26 06:13, Byungchul Park wrote:
>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
>> determine if a page belongs to a page pool.  However, with the planned
>> removal of @pp_magic, we should instead leverage the page_type in struct
>> page, such as PGTY_netpp, for this purpose.
>> 
>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
>> netmem_clear_pp_magic().
>> 
>> Plus, add @page_type to struct net_iov at the same offset as struct page
>> so as to use the page_type APIs for struct net_iov as well.  While at it,
>> reorder @type and @owner in struct net_iov to avoid a hole and
>> increasing the struct size.
>> 
>> This work was inspired by the following link:
>> 
>>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>> 
>> While at it, move the sanity check for page pool to on the free path.
>> 
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
> 
> Seems like this patch broke tcp_mmap because
> validate_page_before_insert() returns -EINVAL due
> to a page having a type. Here's the full flow:
> 
> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
> below flow in the kernel:
> 
> tcp_zerocopy_receive()
> -> tcp_zerocopy_vm_insert_batch()
>   -> vm_insert_pages()
>     -> insert_pages()
>       -> insert_page_in_batch_locked()
>         -> validate_page_before_insert() returns -EINVAL
>            because page_has_type(page) is now true.
> 
> The patch below fixes the issue. But is this a valid fix?

Hmm the check traces back to commit 0ee930e6cafa0 "mm/memory.c: prevent
mapping typed pages to userspace"

> Pages which use page_type must never be mapped to userspace as it would
> destroy their page type.  Add an explicit check for this instead of
> assuming that kernel drivers always get this right.

So uh, this doesn't look good I think.

> diff --git a/mm/memory.c b/mm/memory.c
> index ea6568571131..4cb12673f450 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2326,7 +2326,7 @@ static int validate_page_before_insert(struct vm_area_struct *vma,
>                         return -EINVAL;
>                 return 0;
>         }
> -       if (folio_test_anon(folio) || page_has_type(page))
> +       if (folio_test_anon(folio) || (page_has_type(page) && !PageNetpp(page)))
>                 return -EINVAL;
>         flush_dcache_folio(folio);
>         return 0;
> 
> Thanks,
> Dragos
> 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13  9:12   ` Vlastimil Babka (SUSE)
@ 2026-05-13  9:26     ` Pedro Falcato
  2026-05-13  9:36       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 26+ messages in thread
From: Pedro Falcato @ 2026-05-13  9:26 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Dragos Tatulea, Byungchul Park, linux-mm, akpm, netdev,
	linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073

On Wed, May 13, 2026 at 11:12:43AM +0200, Vlastimil Babka (SUSE) wrote:
> On 5/13/26 11:00, Dragos Tatulea wrote:
> > 
> > 
> > On 24.02.26 06:13, Byungchul Park wrote:
> >> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> >> determine if a page belongs to a page pool.  However, with the planned
> >> removal of @pp_magic, we should instead leverage the page_type in struct
> >> page, such as PGTY_netpp, for this purpose.
> >> 
> >> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> >> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> >> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> >> netmem_clear_pp_magic().
> >> 
> >> Plus, add @page_type to struct net_iov at the same offset as struct page
> >> so as to use the page_type APIs for struct net_iov as well.  While at it,
> >> reorder @type and @owner in struct net_iov to avoid a hole and
> >> increasing the struct size.
> >> 
> >> This work was inspired by the following link:
> >> 
> >>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> >> 
> >> While at it, move the sanity check for page pool to on the free path.
> >> 
> >> Suggested-by: David Hildenbrand <david@redhat.com>
> >> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> Signed-off-by: Byungchul Park <byungchul@sk.com>
> >> Acked-by: David Hildenbrand <david@redhat.com>
> >> Acked-by: Zi Yan <ziy@nvidia.com>
> >> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> >> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> ---
> > 
> > Seems like this patch broke tcp_mmap because
> > validate_page_before_insert() returns -EINVAL due
> > to a page having a type. Here's the full flow:
> > 
> > getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
> > below flow in the kernel:
> > 
> > tcp_zerocopy_receive()
> > -> tcp_zerocopy_vm_insert_batch()
> >   -> vm_insert_pages()
> >     -> insert_pages()
> >       -> insert_page_in_batch_locked()
> >         -> validate_page_before_insert() returns -EINVAL
> >            because page_has_type(page) is now true.
> > 
> > The patch below fixes the issue. But is this a valid fix?
> 
> Hmm the check traces back to commit 0ee930e6cafa0 "mm/memory.c: prevent
> mapping typed pages to userspace"
> 
> > Pages which use page_type must never be mapped to userspace as it would
> > destroy their page type.  Add an explicit check for this instead of
> > assuming that kernel drivers always get this right.
> 
> So uh, this doesn't look good I think.

Yep, you fundamentally can't map a page with a type as page type aliases with
mapcount. Even with the given diff, just mapping it will increment the mapcount
and wreak havoc. I think we need to revert this patch for now.

I'm not sure what the long term plan for this would be. If page types are moved
to memdesc types, then the two stop colliding and that could work. I don't know
if that's Willy's plan, however.

(then there's the other question: are page pool pages really folios? not really.
they are mappable, but they aren't part of the page cache, or anon, nor are
they in the LRU or have rmap capabilities. perhaps we need a different memdesc
for those. we're one step away from reinventing class polymorphism from first
principles ;)
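
For anyone following along, a minimal sketch of the aliasing described
above — a simplified view of the union in struct page (see
include/linux/mm_types.h for the real definition):

	union {			/* the same 4 bytes, two meanings */
		atomic_t _mapcount;	/* -1 == unmapped; rmap inc/decs it */
		unsigned int page_type;	/* PGTY_* value in the top byte */
	};

So any path that takes an rmap-style mapping reference effectively does
atomic_inc(&page->_mapcount) and scribbles over whatever PGTY_* value
was stored there, which is why the diff above is not enough.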

> 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index ea6568571131..4cb12673f450 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2326,7 +2326,7 @@ static int validate_page_before_insert(struct vm_area_struct *vma,
> >                         return -EINVAL;
> >                 return 0;
> >         }
> > -       if (folio_test_anon(folio) || page_has_type(page))
> > +       if (folio_test_anon(folio) || (page_has_type(page) && !PageNetpp(page)))
> >                 return -EINVAL;
> >         flush_dcache_folio(folio);
> >         return 0;
> > 
> > Thanks,
> > Dragos
> > 
> > 
> 
> 

-- 
Pedro

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13  9:00 ` [PATCH v4] " Dragos Tatulea
  2026-05-13  9:12   ` Vlastimil Babka (SUSE)
@ 2026-05-13  9:34   ` David Hildenbrand (Arm)
  2026-05-13 12:18   ` Byungchul Park
  2 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-13  9:34 UTC (permalink / raw)
  To: Dragos Tatulea, Byungchul Park, linux-mm, akpm, netdev
  Cc: linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, jackmanb, hannes, ziy,
	ilias.apalodimas, willy, brauner, kas, yuzhao, usamaarif642,
	baolin.wang, almasrymina, toke, asml.silence, bpf, linux-rdma,
	sfr, dw, ap420073

On 5/13/26 11:00, Dragos Tatulea wrote:
> 
> 
> On 24.02.26 06:13, Byungchul Park wrote:
>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
>> determine if a page belongs to a page pool.  However, with the planned
>> removal of @pp_magic, we should instead leverage the page_type in struct
>> page, such as PGTY_netpp, for this purpose.
>>
>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
>> netmem_clear_pp_magic().
>>
>> Plus, add @page_type to struct net_iov at the same offset as struct page
>> so as to use the page_type APIs for struct net_iov as well.  While at it,
>> reorder @type and @owner in struct net_iov to avoid a hole and
>> increasing the struct size.
>>
>> This work was inspired by the following link:
>>
>>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>>
>> While at it, move the sanity check for page pool to on the free path.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
> 
> Seems like this patch broke tcp_mmap because
> validate_page_before_insert() returns -EINVAL due
> to a page having a type. Here's the full flow:
> 
> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
> below flow in the kernel:
> 
> tcp_zerocopy_receive()
> -> tcp_zerocopy_vm_insert_batch()
>   -> vm_insert_pages()
>     -> insert_pages()
>       -> insert_page_in_batch_locked()
>         -> validate_page_before_insert() returns -EINVAL
>            because page_has_type(page) is now true.
> 
> The patch below fixes the issue. But is this a valid fix?
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index ea6568571131..4cb12673f450 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2326,7 +2326,7 @@ static int validate_page_before_insert(struct vm_area_struct *vma,
>                         return -EINVAL;
>                 return 0;
>         }
> -       if (folio_test_anon(folio) || page_has_type(page))
> +       if (folio_test_anon(folio) || (page_has_type(page) && !PageNetpp(page)))
>                 return -EINVAL;

That's wrong. The type is stored in page->_mapcount, and rmap code would corrupt
that type.

So if the pages are meant to be mapped to user space, it wouldn't be
possible like this.  Today ...

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13  9:26     ` Pedro Falcato
@ 2026-05-13  9:36       ` David Hildenbrand (Arm)
  2026-05-13 12:06         ` Dragos Tatulea
  0 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-13  9:36 UTC (permalink / raw)
  To: Pedro Falcato, Vlastimil Babka (SUSE)
  Cc: Dragos Tatulea, Byungchul Park, linux-mm, akpm, netdev,
	linux-kernel, kernel_team, harry.yoo, ast, daniel, davem, kuba,
	hawk, john.fastabend, sdf, saeedm, leon, tariqt, mbloch,
	andrew+netdev, edumazet, pabeni, lorenzo.stoakes, Liam.Howlett,
	vbabka, rppt, surenb, mhocko, horms, jackmanb, hannes, ziy,
	ilias.apalodimas, willy, brauner, kas, yuzhao, usamaarif642,
	baolin.wang, almasrymina, toke, asml.silence, bpf, linux-rdma,
	sfr, dw, ap420073

On 5/13/26 11:26, Pedro Falcato wrote:
> On Wed, May 13, 2026 at 11:12:43AM +0200, Vlastimil Babka (SUSE) wrote:
>> On 5/13/26 11:00, Dragos Tatulea wrote:
>>>
>>>
>>>
>>> Seems like this patch broke tcp_mmap because
>>> validate_page_before_insert() returns -EINVAL due
>>> to a page having a type. Here's the full flow:
>>>
>>> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
>>> below flow in the kernel:
>>>
>>> tcp_zerocopy_receive()
>>> -> tcp_zerocopy_vm_insert_batch()
>>>   -> vm_insert_pages()
>>>     -> insert_pages()
>>>       -> insert_page_in_batch_locked()
>>>         -> validate_page_before_insert() returns -EINVAL
>>>            because page_has_type(page) is now true.
>>>
>>> The patch below fixes the issue. But is this a valid fix?
>>
>> Hmm the check traces back to commit 0ee930e6cafa0 "mm/memory.c: prevent
>> mapping typed pages to userspace"
>>
>>> Pages which use page_type must never be mapped to userspace as it would
>>> destroy their page type.  Add an explicit check for this instead of
>>> assuming that kernel drivers always get this right.
>>
>> So uh, this doesn't look good I think.
> 
> Yep, you fundamentally can't map a page with a type as page type aliases with
> mapcount. Even with the given diff, just mapping it will increment the mapcount
> and wreak havoc. I think we need to revert this patch for now.
> 
> I'm not sure what the long term plan for this would be. If page types are moved
> to memdesc types, then the two stop colliding and that could work. I don't know
> if that's Willy's plan, however.
> 
> (then there's the other question: are page pool pages really folios? not really.
> they are mappable, but they aren't part of the page cache, or anon, nor are
> they in the LRU or have rmap capabilities. perhaps we need a different memdesc
> for those. we're one step away from reinventing class polymorphism from first
> principles ;)

Zi Yan is working on this: non-folio pages would no longer mess with
rmap/mapcounts, and page table walking code will identify them as
non-folio things and skip them.

It will take a while, though ...

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-02-24  5:13 [PATCH v4] mm: introduce a new page type for page pool in page type Byungchul Park
                   ` (2 preceding siblings ...)
  2026-05-13  9:00 ` [PATCH v4] " Dragos Tatulea
@ 2026-05-13  9:42 ` Lorenzo Stoakes
  3 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Stoakes @ 2026-05-13  9:42 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073, dtatulea

-cc previous kernel mail address
-cc David's very previous kernel mail address
+cc David's current mail address

Hi,

Just an annoying reminder to please check people's addresses via
get_maintainer.pl, as otherwise stuff might get missed! :)

Cheers, Lorenzo

On Tue, Feb 24, 2026 at 02:13:47PM +0900, Byungchul Park wrote:
> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> determine if a page belongs to a page pool.  However, with the planned
> removal of @pp_magic, we should instead leverage the page_type in struct
> page, such as PGTY_netpp, for this purpose.
>
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
>
> Plus, add @page_type to struct net_iov at the same offset as struct page
> so as to use the page_type APIs for struct net_iov as well.  While at it,
> reorder @type and @owner in struct net_iov to avoid a hole and
> increasing the struct size.
>
> This work was inspired by the following link:
>
>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>
> While at it, move the sanity check for page pool to on the free path.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> The following 'Acked-by's were given only for mm part:
>
>   Acked-by: David Hildenbrand <david@redhat.com>
>   Acked-by: Zi Yan <ziy@nvidia.com>
>   Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> This patch changes both mm and page-pool, but I'm currently targetting
> mm tree because Jakub asked and I also think it's more about mm change.
> See the following link:
>
>   https://lore.kernel.org/all/20250813075212.051b5178@kernel.org/
>
> Changes from v3:
> 	1. Rebase on mm-new as of Feb 24, 2026.
> 	2. Fix an issue reported by kernel test robot due to incorrect
> 	   type.
> 	3. Add 'Acked-by: Vlastimil Babka <vbabka@suse.cz>'.  Thanks.
>
> Changes from v2:
> 	1. Fix an issue reported by kernel test robot due to incorrect
> 	   type in argument of __netmem_to_page().
>
> Changes from v1:
> 	1. Drop the finalizing patch removing the pp fields of struct
> 	   page since I found that there is still code accessing a pp
> 	   field via struct page.  I will retry the finalizing patch
> 	   after resolving the issue.
> ---
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  2 +-
>  include/linux/mm.h                            | 27 +++----------------
>  include/linux/page-flags.h                    |  6 +++++
>  include/net/netmem.h                          | 15 +++++++++--
>  mm/page_alloc.c                               | 11 +++++---
>  net/core/netmem_priv.h                        | 23 +++++++---------
>  net/core/page_pool.c                          | 24 +++++++++++++++--
>  7 files changed, 62 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index 80f9fc10877ad..7d90d2485c787 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
>  				xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
>  				page = xdpi.page.page;
>
> -				/* No need to check page_pool_page_is_pp() as we
> +				/* No need to check PageNetpp() as we
>  				 * know this is a page_pool page.
>  				 */
>  				page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 13336340612e2..0db764b3d6b84 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4824,10 +4824,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>   * DMA mapping IDs for page_pool
>   *
>   * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
> - * stashes it in the upper bits of page->pp_magic. We always want to be able to
> - * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
> - * pages can have arbitrary kernel pointers stored in the same field as pp_magic
> - * (since it overlaps with page->lru.next), so we must ensure that we cannot
> + * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
> + * arbitrary kernel pointers stored in the same field as pp_magic (since
> + * it overlaps with page->lru.next), so we must ensure that we cannot
>   * mistake a valid kernel pointer with any of the values we write into this
>   * field.
>   *
> @@ -4862,26 +4861,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>  #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
>  				  PP_DMA_INDEX_SHIFT)
>
> -/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
> - * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
> - * the head page of compound page and bit 1 for pfmemalloc page, as well as the
> - * bits used for the DMA index. page_is_pfmemalloc() is checked in
> - * __page_pool_put_page() to avoid recycling the pfmemalloc page.
> - */
> -#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> -
> -#ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(const struct page *page)
> -{
> -	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> -#else
> -static inline bool page_pool_page_is_pp(const struct page *page)
> -{
> -	return false;
> -}
> -#endif
> -
>  #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
>  #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
>  #define PAGE_SNAPSHOT_PG_IDLE  (1 << 2)
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0426cac91c0bb..30c4eb24e4edf 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -939,6 +939,7 @@ enum pagetype {
>  	PGTY_zsmalloc		= 0xf6,
>  	PGTY_unaccepted		= 0xf7,
>  	PGTY_large_kmalloc	= 0xf8,
> +	PGTY_netpp		= 0xf9,
>
>  	PGTY_mapcount_underflow = 0xff
>  };
> @@ -1071,6 +1072,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
>  PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
>  PAGE_TYPE_OPS(LargeKmalloc, large_kmalloc, large_kmalloc)
>
> +/*
> + * Marks page_pool allocated pages.
> + */
> +PAGE_TYPE_OPS(Netpp, netpp, netpp)
> +
>  /**
>   * PageHuge - Determine if the page belongs to hugetlbfs
>   * @page: The page to test.
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index a96b3e5e5574c..85e3b26ec547f 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -110,10 +110,21 @@ struct net_iov {
>  			atomic_long_t pp_ref_count;
>  		};
>  	};
> -	struct net_iov_area *owner;
> +
> +	unsigned int page_type;
>  	enum net_iov_type type;
> +	struct net_iov_area *owner;
>  };
>
> +/* Make sure 'the offset of page_type in struct page == the offset of
> + * type in struct net_iov'.
> + */
> +#define NET_IOV_ASSERT_OFFSET(pg, iov)			\
> +	static_assert(offsetof(struct page, pg) ==	\
> +		      offsetof(struct net_iov, iov))
> +NET_IOV_ASSERT_OFFSET(page_type, page_type);
> +#undef NET_IOV_ASSERT_OFFSET
> +
>  struct net_iov_area {
>  	/* Array of net_iovs for this area. */
>  	struct net_iov *niovs;
> @@ -256,7 +267,7 @@ static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
>   */
>  #define pp_page_to_nmdesc(p)						\
>  ({									\
> -	DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p));		\
> +	DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p));				\
>  	__pp_page_to_nmdesc(p);						\
>  })
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d88c8c67ac0b7..cae9f93271469 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1079,7 +1079,6 @@ static inline bool page_expected_state(struct page *page,
>  #ifdef CONFIG_MEMCG
>  			page->memcg_data |
>  #endif
> -			page_pool_page_is_pp(page) |
>  			(page->flags.f & check_flags)))
>  		return false;
>
> @@ -1106,8 +1105,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>  	if (unlikely(page->memcg_data))
>  		bad_reason = "page still charged to cgroup";
>  #endif
> -	if (unlikely(page_pool_page_is_pp(page)))
> -		bad_reason = "page_pool leak";
>  	return bad_reason;
>  }
>
> @@ -1416,9 +1413,15 @@ __always_inline bool __free_pages_prepare(struct page *page,
>  		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>  		folio->mapping = NULL;
>  	}
> -	if (unlikely(page_has_type(page)))
> +	if (unlikely(page_has_type(page))) {
> +		/* networking expects to clear its page type before releasing */
> +		if (unlikely(PageNetpp(page))) {
> +			bad_page(page, "page_pool leak");
> +			return false;
> +		}
>  		/* Reset the page_type (which overlays _mapcount) */
>  		page->page_type = UINT_MAX;
> +	}
>
>  	if (is_check_pages_enabled()) {
>  		if (free_page_is_bad(page))
> diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
> index 23175cb2bd866..3e6fde8f1726f 100644
> --- a/net/core/netmem_priv.h
> +++ b/net/core/netmem_priv.h
> @@ -8,21 +8,18 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
>  	return netmem_to_nmdesc(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
>  }
>
> -static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
> -{
> -	netmem_to_nmdesc(netmem)->pp_magic |= pp_magic;
> -}
> -
> -static inline void netmem_clear_pp_magic(netmem_ref netmem)
> -{
> -	WARN_ON_ONCE(netmem_to_nmdesc(netmem)->pp_magic & PP_DMA_INDEX_MASK);
> -
> -	netmem_to_nmdesc(netmem)->pp_magic = 0;
> -}
> -
>  static inline bool netmem_is_pp(netmem_ref netmem)
>  {
> -	return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> +	struct page *page;
> +
> +	/* XXX: Now that the offset of page_type is shared between
> +	 * struct page and net_iov, just cast the netmem to struct page
> +	 * unconditionally by clearing NET_IOV if any, no matter whether
> +	 * it comes from struct net_iov or struct page.  This should be
> +	 * adjusted once the offset is no longer shared.
> +	 */
> +	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
> +	return PageNetpp(page);
>  }
>
>  static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 265a729431bb7..877bbf7a19389 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -702,8 +702,18 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
>
>  void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
>  {
> +	struct page *page;
> +
>  	netmem_set_pp(netmem, pool);
> -	netmem_or_pp_magic(netmem, PP_SIGNATURE);
> +
> +	/* XXX: Now that the offset of page_type is shared between
> +	 * struct page and net_iov, just cast the netmem to struct page
> +	 * unconditionally by clearing NET_IOV if any, no matter whether
> +	 * it comes from struct net_iov or struct page.  This should be
> +	 * adjusted once the offset is no longer shared.
> +	 */
> +	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
> +	__SetPageNetpp(page);
>
>  	/* Ensuring all pages have been split into one fragment initially:
>  	 * page_pool_set_pp_info() is only called once for every page when it
> @@ -718,7 +728,17 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
>
>  void page_pool_clear_pp_info(netmem_ref netmem)
>  {
> -	netmem_clear_pp_magic(netmem);
> +	struct page *page;
> +
> +	/* XXX: Now that the offset of page_type is shared between
> +	 * struct page and net_iov, just cast the netmem to struct page
> +	 * unconditionally by clearing NET_IOV if any, no matter whether
> +	 * it comes from struct net_iov or struct page.  This should be
> +	 * adjusted once the offset is no longer shared.
> +	 */
> +	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
> +	__ClearPageNetpp(page);
> +
>  	netmem_set_pp(netmem, NULL);
>  }
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13  9:36       ` David Hildenbrand (Arm)
@ 2026-05-13 12:06         ` Dragos Tatulea
  2026-05-13 12:11           ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 26+ messages in thread
From: Dragos Tatulea @ 2026-05-13 12:06 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Pedro Falcato, Vlastimil Babka (SUSE)
  Cc: Byungchul Park, linux-mm, akpm, netdev, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, kuba, hawk, john.fastabend, sdf,
	saeedm, leon, tariqt, mbloch, andrew+netdev, edumazet, pabeni,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073



On 13.05.26 11:36, David Hildenbrand (Arm) wrote:
> On 5/13/26 11:26, Pedro Falcato wrote:
>> On Wed, May 13, 2026 at 11:12:43AM +0200, Vlastimil Babka (SUSE) wrote:
>>> On 5/13/26 11:00, Dragos Tatulea wrote:
>>>>
>>>>
>>>>
>>>> Seems like this patch broke tcp_mmap because
>>>> validate_page_before_insert() returns -EINVAL due
>>>> to a page having a type. Here's the full flow:
>>>>
>>>> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
>>>> below flow in the kernel:
>>>>
>>>> tcp_zerocopy_receive()
>>>> -> tcp_zerocopy_vm_insert_batch()
>>>>   -> vm_insert_pages()
>>>>     -> insert_pages()
>>>>       -> insert_page_in_batch_locked()
>>>>         -> validate_page_before_insert() returns -EINVAL
>>>>            because page_has_type(page) is now true.
>>>>
>>>> The patch below fixes the issue. But is this a valid fix?
>>>
>>> Hmm the check traces back to commit 0ee930e6cafa0 "mm/memory.c: prevent
>>> mapping typed pages to userspace"
>>>
>>>> Pages which use page_type must never be mapped to userspace as it would
>>>> destroy their page type.  Add an explicit check for this instead of
>>>> assuming that kernel drivers always get this right.
>>>
>>> So uh, this doesn't look good I think.
>>
>> Yep, you fundamentally can't map a page with a type as page type aliases with
>> mapcount. Even with the given diff, just mapping it will increment the mapcount
>> and wreak havoc. I think we need to revert this patch for now.
>>
>> I'm not sure what the long term plan for this would be. If page types are moved
>> to memdesc types, then the two stop colliding and that could work. I don't know
>> if that's Willy's plan, however.
>>
>> (then there's the other question: are page pool pages really folios? not really.
>> they are mappable, but they aren't part of the page cache, or anon, nor are
>> they in the LRU or have rmap capabilities. perhaps we need a different memdesc
>> for those. we're one step away from reinventing class polymorphism from first
>> principles ;)
> 
> Zi Yan is working on this: non-folio pages would no longer mess with
> rmap/mapcounts, and page table walking code will identify them as
> non-folio things and skip them.
> 
> It will take a while, though ...
> 
So do I get it right that the path forward here is to revert this commit [1]
and wait until the work from Zi Yan is ready?

[1] db359fccf212 ("mm: introduce a new page type for page pool in page type")

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13 12:06         ` Dragos Tatulea
@ 2026-05-13 12:11           ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-13 12:11 UTC (permalink / raw)
  To: Dragos Tatulea, Pedro Falcato, Vlastimil Babka (SUSE)
  Cc: Byungchul Park, linux-mm, akpm, netdev, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, kuba, hawk, john.fastabend, sdf,
	saeedm, leon, tariqt, mbloch, andrew+netdev, edumazet, pabeni,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073

On 5/13/26 14:06, Dragos Tatulea wrote:
> 
> 
> On 13.05.26 11:36, David Hildenbrand (Arm) wrote:
>> On 5/13/26 11:26, Pedro Falcato wrote:
>>>
>>> Yep, you fundamentally can't map a page with a type as page type aliases with
>>> mapcount. Even with the given diff, just mapping it will increment the mapcount
>>> and wreak havoc. I think we need to revert this patch for now.
>>>
>>> I'm not sure what the long term plan for this would be. If page types are moved
>>> to memdesc types, then the two stop colliding and that could work. I don't know
>>> if that's Willy's plan, however.
>>>
>>> (then there's the other question: are page pool pages really folios? not really.
>>> they are mappable, but they aren't part of the page cache, or anon, nor are
>>> they in the LRU or have rmap capabilities. perhaps we need a different memdesc
>>> for those. we're one step away from reinventing class polymorphism from first
>>> principles ;)
>>
>> Zi Yan is working on this: non-folio pages would no longer mess with
>> rmap/mapcounts, and page table walking code will identify them as
>> non-folio things and skip them.
>>
>> It will take a while, though ...
>>
> So do I get it right that the path forward here is to revert this commit [1]
> and wait until the work from Zi Yan is ready?

If there is a requirement that these things will get mapped to user space, then,
yes, it currently doesn't work.

One could use remap_pfn_range() and friends instead, but I assume we don't
want to go down that path ...
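
For reference, something like this (a hypothetical mmap handler; the
page lookup helper is made up for illustration):

	static int pp_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct page *page = pp_lookup_page(file);	/* hypothetical */

		/* maps by pfn, so no rmap/_mapcount (and thus no
		 * page_type) involvement at all
		 */
		return remap_pfn_range(vma, vma->vm_start, page_to_pfn(page),
				       vma->vm_end - vma->vm_start,
				       vma->vm_page_prot);
	}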

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13  9:00 ` [PATCH v4] " Dragos Tatulea
  2026-05-13  9:12   ` Vlastimil Babka (SUSE)
  2026-05-13  9:34   ` David Hildenbrand (Arm)
@ 2026-05-13 12:18   ` Byungchul Park
  2026-05-13 12:29     ` David Hildenbrand (Arm)
  2 siblings, 1 reply; 26+ messages in thread
From: Byungchul Park @ 2026-05-13 12:18 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073

On Wed, May 13, 2026 at 11:00:51AM +0200, Dragos Tatulea wrote:
> On 24.02.26 06:13, Byungchul Park wrote:
> > Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> > determine if a page belongs to a page pool.  However, with the planned
> > removal of @pp_magic, we should instead leverage the page_type in struct
> > page, such as PGTY_netpp, for this purpose.
> >
> > Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> > and __ClearPageNetpp() instead, and remove the existing APIs accessing
> > @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> > netmem_clear_pp_magic().
> >
> > Plus, add @page_type to struct net_iov at the same offset as struct page
> > so as to use the page_type APIs for struct net_iov as well.  While at it,
> > reorder @type and @owner in struct net_iov to avoid a hole and
> > increasing the struct size.
> >
> > This work was inspired by the following link:
> >
> >   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> >
> > While at it, move the sanity check for page pool to on the free path.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> > Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Acked-by: Zi Yan <ziy@nvidia.com>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > ---
> 
> Seems like this patch broke tcp_mmap because
> validate_page_before_insert() returns -EINVAL due
> to a page having a type. Here's the full flow:
> 
> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
> below flow in the kernel:
> 
> tcp_zerocopy_receive()
> -> tcp_zerocopy_vm_insert_batch()
>   -> vm_insert_pages()
>     -> insert_pages()
>       -> insert_page_in_batch_locked()
>         -> validate_page_before_insert() returns -EINVAL
>            because page_has_type(page) is now true.
> 
> The patch below fixes the issue. But is this a valid fix?

Hi,

The problem comes from the fact that page_type and _mapcount are
union'ed, but there is a case where these two pieces of information need
to be kept at the same time.

Why don't we allow both pieces of information to be kept in those 4
bytes at the same time until Zi Yan's work on _mapcount and page_type is
done, instead of taking a step back?

This can be optimized further, but I suggest the approach just described:
---
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 64dc44832808..e5ec204866dc 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -185,8 +185,7 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
 {
 	int mapcount = atomic_read(&page->_mapcount) + 1;
 
-	if (page_mapcount_is_type(mapcount))
-		mapcount = 0;
+	mapcount = page_mapcount_clear_type(mapcount);
 	if (folio_test_large(folio))
 		mapcount += folio_entire_mapcount(folio);
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8260e28205e9..f45064796313 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1865,8 +1865,7 @@ static inline int folio_mapcount(const struct folio *folio)
 
 	if (likely(!folio_test_large(folio))) {
 		mapcount = atomic_read(&folio->_mapcount) + 1;
-		if (page_mapcount_is_type(mapcount))
-			mapcount = 0;
+		mapcount = page_mapcount_clear_type(mapcount);
 		return mapcount;
 	}
 	return folio_large_mapcount(folio);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0e03d816e8b9..f3b0d1fa262d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -934,9 +934,9 @@ static inline bool page_type_has_type(int page_type)
 }
 
 /* This takes a mapcount which is one more than page->_mapcount */
-static inline bool page_mapcount_is_type(unsigned int mapcount)
+static inline unsigned int page_mapcount_clear_type(unsigned int mapcount)
 {
-	return page_type_has_type(mapcount - 1);
+	return (unsigned int)(((int)(mapcount << 8)) >> 8);
 }
 
 static inline bool page_has_type(const struct page *page)
@@ -953,16 +953,20 @@ static __always_inline void __folio_set_##fname(struct folio *folio)	\
 {									\
 	if (folio_test_##fname(folio))					\
 		return;							\
-	VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX,	\
+	VM_BUG_ON_FOLIO(page_type_has_type(data_race(folio->page.page_type)), \
 			folio);						\
-	folio->page.page_type = (unsigned int)PGTY_##lname << 24;	\
+	folio->page.page_type &= ~(PGTY_mapcount_underflow << 24);	\
+	folio->page.page_type |= (unsigned int)PGTY_##lname << 24;	\
 }									\
 static __always_inline void __folio_clear_##fname(struct folio *folio)	\
 {									\
-	if (folio->page.page_type == UINT_MAX)				\
+	int mapcount;							\
+									\
+	if (!page_type_has_type(folio->page.page_type))			\
 		return;							\
 	VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);		\
-	folio->page.page_type = UINT_MAX;				\
+	mapcount = atomic_read(&folio->page._mapcount);			\
+	folio->page.page_type = page_mapcount_clear_type(mapcount);	\
 }
 
 #define PAGE_TYPE_OPS(uname, lname, fname)				\
@@ -975,15 +979,20 @@ static __always_inline void __SetPage##uname(struct page *page)		\
 {									\
 	if (Page##uname(page))						\
 		return;							\
-	VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page);	\
-	page->page_type = (unsigned int)PGTY_##lname << 24;		\
+	VM_BUG_ON_PAGE(page_type_has_type(data_race(page->page_type)),	\
+				page);					\
+	page->page_type &= ~(PGTY_mapcount_underflow << 24);		\
+	page->page_type |= (unsigned int)PGTY_##lname << 24;		\
 }									\
 static __always_inline void __ClearPage##uname(struct page *page)	\
 {									\
-	if (page->page_type == UINT_MAX)				\
+	int mapcount;							\
+									\
+	if (!page_type_has_type(page->page_type))			\
 		return;							\
 	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
-	page->page_type = UINT_MAX;					\
+	mapcount = atomic_read(&page->_mapcount);			\
+	page->page_type = page_mapcount_clear_type(mapcount);		\
 }
 
 /*
diff --git a/mm/debug.c b/mm/debug.c
index 77fa8fe1d641..9a932ded09d4 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -74,8 +74,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
 	int mapcount = atomic_read(&page->_mapcount) + 1;
 	char *type = "";
 
-	if (page_mapcount_is_type(mapcount))
-		mapcount = 0;
+	mapcount = page_mapcount_clear_type(mapcount);
 
 	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
 			folio_ref_count(folio), mapcount, mapping,
---
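
To illustrate the intended encoding with a quick userspace sketch (not
part of the patch): the PGTY_* value lives in the top byte while the
low 24 bits sign-extend back to the original mapcount, so the common
_mapcount == -1 case round-trips through a typed state:

	#include <stdio.h>

	/* same trick as page_mapcount_clear_type() above: drop the top
	 * byte and sign-extend the low 24 bits (relies on arithmetic
	 * right shift, as the patch does)
	 */
	static unsigned int clear_type(unsigned int v)
	{
		return (unsigned int)(((int)(v << 8)) >> 8);
	}

	int main(void)
	{
		unsigned int mapcount = 0xffffffff;	/* _mapcount == -1 */
		/* store PGTY_netpp (0xf9) in the top byte */
		unsigned int typed = (mapcount & 0x00ffffff) | (0xf9u << 24);

		/* prints: typed=0xf9ffffff restored=0xffffffff */
		printf("typed=%#x restored=%#x\n", typed, clear_type(typed));
		return 0;
	}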

Thoughts?

	Byungchul

> diff --git a/mm/memory.c b/mm/memory.c
> index ea6568571131..4cb12673f450 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2326,7 +2326,7 @@ static int validate_page_before_insert(struct vm_area_struct *vma,
>                         return -EINVAL;
>                 return 0;
>         }
> -       if (folio_test_anon(folio) || page_has_type(page))
> +       if (folio_test_anon(folio) || (page_has_type(page) && !PageNetpp(page)))
>                 return -EINVAL;
>         flush_dcache_folio(folio);
>         return 0;
> 
> Thanks,
> Dragos
> 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13 12:18   ` Byungchul Park
@ 2026-05-13 12:29     ` David Hildenbrand (Arm)
  2026-05-13 12:39       ` Byungchul Park
  0 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-13 12:29 UTC (permalink / raw)
  To: Byungchul Park, Dragos Tatulea
  Cc: linux-mm, akpm, netdev, linux-kernel, kernel_team, harry.yoo, ast,
	daniel, davem, kuba, hawk, john.fastabend, sdf, saeedm, leon,
	tariqt, mbloch, andrew+netdev, edumazet, pabeni, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, horms, jackmanb,
	hannes, ziy, ilias.apalodimas, willy, brauner, kas, yuzhao,
	usamaarif642, baolin.wang, almasrymina, toke, asml.silence, bpf,
	linux-rdma, sfr, dw, ap420073

On 5/13/26 14:18, Byungchul Park wrote:
> On Wed, May 13, 2026 at 11:00:51AM +0200, Dragos Tatulea wrote:
>> On 24.02.26 06:13, Byungchul Park wrote:
>>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
>>> determine if a page belongs to a page pool.  However, with the planned
>>> removal of @pp_magic, we should instead leverage the page_type in struct
>>> page, such as PGTY_netpp, for this purpose.
>>>
>>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
>>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
>>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
>>> netmem_clear_pp_magic().
>>>
>>> Plus, add @page_type to struct net_iov at the same offset as struct page
>>> so as to use the page_type APIs for struct net_iov as well.  While at it,
>>> reorder @type and @owner in struct net_iov to avoid a hole and
>>> increasing the struct size.
>>>
>>> This work was inspired by the following link:
>>>
>>>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>>>
>>> While at it, move the sanity check for page pool to on the free path.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>
>> Seems like this patch broke tcp_mmap because
>> validate_page_before_insert() returns -EINVAL due
>> to a page having a type. Here's the full flow:
>>
>> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
>> below flow in the kernel:
>>
>> tcp_zerocopy_receive()
>> -> tcp_zerocopy_vm_insert_batch()
>>   -> vm_insert_pages()
>>     -> insert_pages()
>>       -> insert_page_in_batch_locked()
>>         -> validate_page_before_insert() returns -EINVAL
>>            because page_has_type(page) is now true.
>>
>> The patch below fixes the issue. But is this a valid fix?
> 
> Hi,
> 
> The problem comes from the fact that page_type and _mapcount are
> union'ed, but there is a case where these two pieces of information need
> to be kept at the same time.
> 
> Why don't we allow both pieces of information to be kept in those 4
> bytes at the same time until Zi Yan's work on _mapcount and page_type is
> done, instead of taking a step back?
> 
> This can be optimized further, but I suggest the approach just described:
> ---
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 64dc44832808..e5ec204866dc 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -185,8 +185,7 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
>  {
>  	int mapcount = atomic_read(&page->_mapcount) + 1;
>  
> -	if (page_mapcount_is_type(mapcount))
> -		mapcount = 0;
> +	mapcount = page_mapcount_clear_type(mapcount);
>  	if (folio_test_large(folio))
>  		mapcount += folio_entire_mapcount(folio);
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8260e28205e9..f45064796313 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1865,8 +1865,7 @@ static inline int folio_mapcount(const struct folio *folio)
>  
>  	if (likely(!folio_test_large(folio))) {
>  		mapcount = atomic_read(&folio->_mapcount) + 1;
> -		if (page_mapcount_is_type(mapcount))
> -			mapcount = 0;
> +		mapcount = page_mapcount_clear_type(mapcount);
>  		return mapcount;
>  	}
>  	return folio_large_mapcount(folio);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0e03d816e8b9..f3b0d1fa262d 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -934,9 +934,9 @@ static inline bool page_type_has_type(int page_type)
>  }
>  
>  /* This takes a mapcount which is one more than page->_mapcount */
> -static inline bool page_mapcount_is_type(unsigned int mapcount)
> +static inline unsigned int page_mapcount_clear_type(unsigned int mapcount)
>  {
> -	return page_type_has_type(mapcount - 1);
> +	return (unsigned int)(((int)(mapcount << 8)) >> 8);
>  }
>  
>  static inline bool page_has_type(const struct page *page)
> @@ -953,16 +953,20 @@ static __always_inline void __folio_set_##fname(struct folio *folio)	\
>  {									\
>  	if (folio_test_##fname(folio))					\
>  		return;							\
> -	VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX,	\
> +	VM_BUG_ON_FOLIO(page_type_has_type(data_race(folio->page.page_type)), \
>  			folio);						\
> -	folio->page.page_type = (unsigned int)PGTY_##lname << 24;	\
> +	folio->page.page_type &= ~(PGTY_mapcount_underflow << 24);	\
> +	folio->page.page_type |= (unsigned int)PGTY_##lname << 24;	\
>  }									\
>  static __always_inline void __folio_clear_##fname(struct folio *folio)	\
>  {									\
> -	if (folio->page.page_type == UINT_MAX)				\
> +	int mapcount;							\
> +									\
> +	if (!page_type_has_type(folio->page.page_type))			\
>  		return;							\
>  	VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);		\
> -	folio->page.page_type = UINT_MAX;				\
> +	mapcount = atomic_read(&folio->page._mapcount);			\
> +	folio->page.page_type = page_mapcount_clear_type(mapcount);	\
>  }
>  
>  #define PAGE_TYPE_OPS(uname, lname, fname)				\
> @@ -975,15 +979,20 @@ static __always_inline void __SetPage##uname(struct page *page)		\
>  {									\
>  	if (Page##uname(page))						\
>  		return;							\
> -	VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page);	\
> -	page->page_type = (unsigned int)PGTY_##lname << 24;		\
> +	VM_BUG_ON_PAGE(page_type_has_type(data_race(page->page_type)),	\
> +				page);					\
> +	page->page_type &= ~(PGTY_mapcount_underflow << 24);		\
> +	page->page_type |= (unsigned int)PGTY_##lname << 24;		\
>  }									\
>  static __always_inline void __ClearPage##uname(struct page *page)	\
>  {									\
> -	if (page->page_type == UINT_MAX)				\
> +	int mapcount;							\
> +									\
> +	if (!page_type_has_type(page->page_type))			\
>  		return;							\
>  	VM_BUG_ON_PAGE(!Page##uname(page), page);			\
> -	page->page_type = UINT_MAX;					\
> +	mapcount = atomic_read(&page->_mapcount);			\
> +	page->page_type = page_mapcount_clear_type(mapcount);		\
>  }
>  
>  /*
> diff --git a/mm/debug.c b/mm/debug.c
> index 77fa8fe1d641..9a932ded09d4 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -74,8 +74,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
>  	int mapcount = atomic_read(&page->_mapcount) + 1;
>  	char *type = "";
>  
> -	if (page_mapcount_is_type(mapcount))
> -		mapcount = 0;
> +	mapcount = page_mapcount_clear_type(mapcount);
>  
>  	pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
>  			folio_ref_count(folio), mapcount, mapping,
> ---
> 
> Thoughts?

God no.

-- 
Cheers,

David
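
As an aside, the draft's page_mapcount_clear_type() masks off the type
byte by shifting it out and arithmetically shifting back in, which
sign-extends bit 23 of the 24-bit mapcount over the top byte.  A
standalone illustration of what it computes (a userspace demo; the 0x80
type byte is made up for the example, and the right shift of a negative
int is implementation-defined in ISO C, though compilers the kernel
supports do sign-extend):

#include <stdio.h>

/* Mirror of the draft helper: drop the top (type) byte, then
 * sign-extend bit 23 of the 24-bit mapcount back over it.
 */
static unsigned int page_mapcount_clear_type(unsigned int mapcount)
{
	return (unsigned int)(((int)(mapcount << 8)) >> 8);
}

int main(void)
{
	/* A made-up type byte (0x80) combined with a _mapcount of -1,
	 * i.e. 0xffffff in the low 24 bits.
	 */
	unsigned int combined = (0x80u << 24) | 0x00ffffffu;

	/* Prints ffffffff: the type byte is gone and the negative
	 * 24-bit mapcount is sign-extended back to -1.
	 */
	printf("%x\n", page_mapcount_clear_type(combined));
	return 0;
}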

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13 12:29     ` David Hildenbrand (Arm)
@ 2026-05-13 12:39       ` Byungchul Park
  2026-05-13 13:02         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 26+ messages in thread
From: Byungchul Park @ 2026-05-13 12:39 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Dragos Tatulea, linux-mm, akpm, netdev, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, kuba, hawk, john.fastabend, sdf,
	saeedm, leon, tariqt, mbloch, andrew+netdev, edumazet, pabeni,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073

On Wed, May 13, 2026 at 02:29:46PM +0200, David Hildenbrand (Arm) wrote:
> On 5/13/26 14:18, Byungchul Park wrote:
> > On Wed, May 13, 2026 at 11:00:51AM +0200, Dragos Tatulea wrote:
> >> On 24.02.26 06:13, Byungchul Park wrote:
> >>> Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
> >>> determine if a page belongs to a page pool.  However, with the planned
> >>> removal of @pp_magic, we should instead leverage the page_type in struct
> >>> page, such as PGTY_netpp, for this purpose.
> >>>
> >>> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> >>> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> >>> @pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> >>> netmem_clear_pp_magic().
> >>>
> >>> Plus, add @page_type to struct net_iov at the same offset as struct page
> >>> so as to use the page_type APIs for struct net_iov as well.  While at it,
> >>> reorder @type and @owner in struct net_iov to avoid a hole and
> >>> increasing the struct size.
> >>>
> >>> This work was inspired by the following link:
> >>>
> >>>   https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
> >>>
> >>> While at it, move the sanity check for page pool to on the free path.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
> >>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >>> Signed-off-by: Byungchul Park <byungchul@sk.com>
> >>> Acked-by: David Hildenbrand <david@redhat.com>
> >>> Acked-by: Zi Yan <ziy@nvidia.com>
> >>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> >>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >>> ---
> >>
> >> It seems this patch broke tcp_mmap, because
> >> validate_page_before_insert() now returns -EINVAL for
> >> a page that has a page type. Here's the full flow:
> >>
> >> getsockopt(TCP_ZEROCOPY_RECEIVE) returns -EINVAL because of the
> >> below flow in the kernel:
> >>
> >> tcp_zerocopy_receive()
> >> -> tcp_zerocopy_vm_insert_batch()
> >>   -> vm_insert_pages()
> >>     -> insert_pages()
> >>       -> insert_page_in_batch_locked()
> >>         -> validate_page_before_insert() returns -EINVAL
> >>            because page_has_type(page) is now true.
> >>
> >> The patch below fixes the issue. But is this a valid fix?
> >
> > Hi,
> >
> > The problem comes from the fact that page_type and _mapcount are
> > union'ed, but there is a case where both pieces of information need to
> > be kept at the same time.
> >
> > Why don't we allow both to be kept in the same 4 bytes until Zi Yan's
> > work on _mapcount and page_type is done, instead of taking a step back?
> >
> > It could be optimized further, but I suggest the approach I just
> > mentioned:
> > ---
> > diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> > index 64dc44832808..e5ec204866dc 100644
> > --- a/fs/proc/internal.h
> > +++ b/fs/proc/internal.h
> > @@ -185,8 +185,7 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
> >  {
> >       int mapcount = atomic_read(&page->_mapcount) + 1;
> >
> > -     if (page_mapcount_is_type(mapcount))
> > -             mapcount = 0;
> > +     mapcount = page_mapcount_clear_type(mapcount);
> >       if (folio_test_large(folio))
> >               mapcount += folio_entire_mapcount(folio);
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 8260e28205e9..f45064796313 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1865,8 +1865,7 @@ static inline int folio_mapcount(const struct folio *folio)
> >
> >       if (likely(!folio_test_large(folio))) {
> >               mapcount = atomic_read(&folio->_mapcount) + 1;
> > -             if (page_mapcount_is_type(mapcount))
> > -                     mapcount = 0;
> > +             mapcount = page_mapcount_clear_type(mapcount);
> >               return mapcount;
> >       }
> >       return folio_large_mapcount(folio);
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 0e03d816e8b9..f3b0d1fa262d 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -934,9 +934,9 @@ static inline bool page_type_has_type(int page_type)
> >  }
> >
> >  /* This takes a mapcount which is one more than page->_mapcount */
> > -static inline bool page_mapcount_is_type(unsigned int mapcount)
> > +static inline unsigned int page_mapcount_clear_type(unsigned int mapcount)
> >  {
> > -     return page_type_has_type(mapcount - 1);
> > +     return (unsigned int)(((int)(mapcount << 8)) >> 8);
> >  }
> >
> >  static inline bool page_has_type(const struct page *page)
> > @@ -953,16 +953,20 @@ static __always_inline void __folio_set_##fname(struct folio *folio)    \
> >  {                                                                    \
> >       if (folio_test_##fname(folio))                                  \
> >               return;                                                 \
> > -     VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX,   \
> > +     VM_BUG_ON_FOLIO(page_type_has_type(data_race(folio->page.page_type)), \
> >                       folio);                                         \
> > -     folio->page.page_type = (unsigned int)PGTY_##lname << 24;       \
> > +     folio->page.page_type &= ~(PGTY_mapcount_underflow << 24);      \
> > +     folio->page.page_type |= (unsigned int)PGTY_##lname << 24;      \
> >  }                                                                    \
> >  static __always_inline void __folio_clear_##fname(struct folio *folio)       \
> >  {                                                                    \
> > -     if (folio->page.page_type == UINT_MAX)                          \
> > +     int mapcount;                                                   \
> > +                                                                     \
> > +     if (!page_type_has_type(folio->page.page_type))                 \
> >               return;                                                 \
> >       VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);             \
> > -     folio->page.page_type = UINT_MAX;                               \
> > +     mapcount = atomic_read(&folio->page._mapcount);                 \
> > +     folio->page.page_type = page_mapcount_clear_type(mapcount);     \
> >  }
> >
> >  #define PAGE_TYPE_OPS(uname, lname, fname)                           \
> > @@ -975,15 +979,20 @@ static __always_inline void __SetPage##uname(struct page *page)         \
> >  {                                                                    \
> >       if (Page##uname(page))                                          \
> >               return;                                                 \
> > -     VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page);   \
> > -     page->page_type = (unsigned int)PGTY_##lname << 24;             \
> > +     VM_BUG_ON_PAGE(page_type_has_type(data_race(page->page_type)),  \
> > +                             page);                                  \
> > +     page->page_type &= ~(PGTY_mapcount_underflow << 24);            \
> > +     page->page_type |= (unsigned int)PGTY_##lname << 24;            \
> >  }                                                                    \
> >  static __always_inline void __ClearPage##uname(struct page *page)    \
> >  {                                                                    \
> > -     if (page->page_type == UINT_MAX)                                \
> > +     int mapcount;                                                   \
> > +                                                                     \
> > +     if (!page_type_has_type(page->page_type))                       \
> >               return;                                                 \
> >       VM_BUG_ON_PAGE(!Page##uname(page), page);                       \
> > -     page->page_type = UINT_MAX;                                     \
> > +     mapcount = atomic_read(&page->_mapcount);                       \
> > +     page->page_type = page_mapcount_clear_type(mapcount);           \
> >  }
> >
> >  /*
> > diff --git a/mm/debug.c b/mm/debug.c
> > index 77fa8fe1d641..9a932ded09d4 100644
> > --- a/mm/debug.c
> > +++ b/mm/debug.c
> > @@ -74,8 +74,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
> >       int mapcount = atomic_read(&page->_mapcount) + 1;
> >       char *type = "";
> >
> > -     if (page_mapcount_is_type(mapcount))
> > -             mapcount = 0;
> > +     mapcount = page_mapcount_clear_type(mapcount);
> >
> >       pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
> >                       folio_ref_count(folio), mapcount, mapping,
> > ---
> >
> > Thoughts?
> 
> God no.

This is not a final patch; it's meant to share the rough idea *with
code* - there may be more places in the code that need adjusting for
the change.  I typed up the draft quickly just to share the idea.

If pp type pages should be allowed to be mapped as well, then we need
to let a page keep both its type and its mapcount at the same time.
Am I missing something?

	Byungchul

> --
> Cheers,
> 
> David
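
The union being referred to, reduced to the two relevant members, can
be demonstrated with a simplified stand-in for struct page (not the
full kernel definition; the PGTY_* value is invented for illustration):

#include <stdio.h>

/* The mapcount and the page type share the same 4 bytes, so a page
 * can carry one or the other, not both.
 */
struct fake_page {
	union {
		int _mapcount;		/* -1 means "not mapped" */
		unsigned int page_type;	/* top byte holds a PGTY_* value */
	};
};

int main(void)
{
	struct fake_page p = { ._mapcount = -1 };

	/* Writing a type clobbers the mapcount view, and vice versa. */
	p.page_type = 0x80u << 24;	/* made-up PGTY_* value */
	printf("mapcount view: %d\n", p._mapcount);
	return 0;
}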

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13 12:39       ` Byungchul Park
@ 2026-05-13 13:02         ` David Hildenbrand (Arm)
  2026-05-13 13:26           ` Byungchul Park
  0 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-13 13:02 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Dragos Tatulea, linux-mm, akpm, netdev, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, kuba, hawk, john.fastabend, sdf,
	saeedm, leon, tariqt, mbloch, andrew+netdev, edumazet, pabeni,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073

On 5/13/26 14:39, Byungchul Park wrote:
> On Wed, May 13, 2026 at 02:29:46PM +0200, David Hildenbrand (Arm) wrote:
> > On 5/13/26 14:18, Byungchul Park wrote:
>>>
>>> Hi,
>>>
>>> The problem comes from the fact that page_type and _mapcount are
>>> union'ed, but there is a case where both pieces of information need to
>>> be kept at the same time.
>>>
>>> Why don't we allow both to be kept in the same 4 bytes until Zi Yan's
>>> work on _mapcount and page_type is done, instead of taking a step back?
>>>
>>> It could be optimized further, but I suggest the approach I just
>>> mentioned:
>>> ---
>>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
>>> index 64dc44832808..e5ec204866dc 100644
>>> --- a/fs/proc/internal.h
>>> +++ b/fs/proc/internal.h
>>> @@ -185,8 +185,7 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
>>>  {
>>>       int mapcount = atomic_read(&page->_mapcount) + 1;
>>>
>>> -     if (page_mapcount_is_type(mapcount))
>>> -             mapcount = 0;
>>> +     mapcount = page_mapcount_clear_type(mapcount);
>>>       if (folio_test_large(folio))
>>>               mapcount += folio_entire_mapcount(folio);
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 8260e28205e9..f45064796313 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1865,8 +1865,7 @@ static inline int folio_mapcount(const struct folio *folio)
>>>
>>>       if (likely(!folio_test_large(folio))) {
>>>               mapcount = atomic_read(&folio->_mapcount) + 1;
>>> -             if (page_mapcount_is_type(mapcount))
>>> -                     mapcount = 0;
>>> +             mapcount = page_mapcount_clear_type(mapcount);
>>>               return mapcount;
>>>       }
>>>       return folio_large_mapcount(folio);
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index 0e03d816e8b9..f3b0d1fa262d 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -934,9 +934,9 @@ static inline bool page_type_has_type(int page_type)
>>>  }
>>>
>>>  /* This takes a mapcount which is one more than page->_mapcount */
>>> -static inline bool page_mapcount_is_type(unsigned int mapcount)
>>> +static inline unsigned int page_mapcount_clear_type(unsigned int mapcount)
>>>  {
>>> -     return page_type_has_type(mapcount - 1);
>>> +     return (unsigned int)(((int)(mapcount << 8)) >> 8);
>>>  }
>>>
>>>  static inline bool page_has_type(const struct page *page)
>>> @@ -953,16 +953,20 @@ static __always_inline void __folio_set_##fname(struct folio *folio)    \
>>>  {                                                                    \
>>>       if (folio_test_##fname(folio))                                  \
>>>               return;                                                 \
>>> -     VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX,   \
>>> +     VM_BUG_ON_FOLIO(page_type_has_type(data_race(folio->page.page_type)), \
>>>                       folio);                                         \
>>> -     folio->page.page_type = (unsigned int)PGTY_##lname << 24;       \
>>> +     folio->page.page_type &= ~(PGTY_mapcount_underflow << 24);      \
>>> +     folio->page.page_type |= (unsigned int)PGTY_##lname << 24;      \
>>>  }                                                                    \
>>>  static __always_inline void __folio_clear_##fname(struct folio *folio)       \
>>>  {                                                                    \
>>> -     if (folio->page.page_type == UINT_MAX)                          \
>>> +     int mapcount;                                                   \
>>> +                                                                     \
>>> +     if (!page_type_has_type(folio->page.page_type))                 \
>>>               return;                                                 \
>>>       VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);             \
>>> -     folio->page.page_type = UINT_MAX;                               \
>>> +     mapcount = atomic_read(&folio->page._mapcount);                 \
>>> +     folio->page.page_type = page_mapcount_clear_type(mapcount);     \
>>>  }
>>>
>>>  #define PAGE_TYPE_OPS(uname, lname, fname)                           \
>>> @@ -975,15 +979,20 @@ static __always_inline void __SetPage##uname(struct page *page)         \
>>>  {                                                                    \
>>>       if (Page##uname(page))                                          \
>>>               return;                                                 \
>>> -     VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page);   \
>>> -     page->page_type = (unsigned int)PGTY_##lname << 24;             \
>>> +     VM_BUG_ON_PAGE(page_type_has_type(data_race(page->page_type)),  \
>>> +                             page);                                  \
>>> +     page->page_type &= ~(PGTY_mapcount_underflow << 24);            \
>>> +     page->page_type |= (unsigned int)PGTY_##lname << 24;            \
>>>  }                                                                    \
>>>  static __always_inline void __ClearPage##uname(struct page *page)    \
>>>  {                                                                    \
>>> -     if (page->page_type == UINT_MAX)                                \
>>> +     int mapcount;                                                   \
>>> +                                                                     \
>>> +     if (!page_type_has_type(page->page_type))                       \
>>>               return;                                                 \
>>>       VM_BUG_ON_PAGE(!Page##uname(page), page);                       \
>>> -     page->page_type = UINT_MAX;                                     \
>>> +     mapcount = atomic_read(&page->_mapcount);                       \
>>> +     page->page_type = page_mapcount_clear_type(mapcount);           \
>>>  }
>>>
>>>  /*
>>> diff --git a/mm/debug.c b/mm/debug.c
>>> index 77fa8fe1d641..9a932ded09d4 100644
>>> --- a/mm/debug.c
>>> +++ b/mm/debug.c
>>> @@ -74,8 +74,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
>>>       int mapcount = atomic_read(&page->_mapcount) + 1;
>>>       char *type = "";
>>>
>>> -     if (page_mapcount_is_type(mapcount))
>>> -             mapcount = 0;
>>> +     mapcount = page_mapcount_clear_type(mapcount);
>>>
>>>       pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
>>>                       folio_ref_count(folio), mapcount, mapping,
>>> ---
>>>
>>> Thoughts?
>>
>> God no.
> 
> This is not a final patch; it's meant to share the rough idea *with
> code* - there may be more places in the code that need adjusting for
> the change.  I typed up the draft quickly just to share the idea.
> 
> If pp type pages should be allowed to be mapped as well, then we need
> to let a page keep both its type and its mapcount at the same time.
> Am I missing something?

We don't want code to accidentally overflow mapcounts into these bits
and have them be wrongly detected as page types.

This is just very fragile.

-- 
Cheers,

David
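
Concretely, with page types encoded in bits 24-31, a mapcount update
that carries into bit 24 starts rewriting what looks like the type
byte.  A small standalone demo of the hazard (the 0x80 type value is
invented for illustration):

#include <assert.h>

int main(void)
{
	unsigned int field = 0x80u << 24;	/* pretend type byte */

	field |= 0x00ffffffu;	/* 24-bit mapcount already at its max */
	field += 1;		/* one more mapping: carries into bit 24 */

	/* The "type" byte silently changed from 0x80 to 0x81. */
	assert((field >> 24) == 0x81u);
	return 0;
}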

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] mm: introduce a new page type for page pool in page type
  2026-05-13 13:02         ` David Hildenbrand (Arm)
@ 2026-05-13 13:26           ` Byungchul Park
  0 siblings, 0 replies; 26+ messages in thread
From: Byungchul Park @ 2026-05-13 13:26 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Dragos Tatulea, linux-mm, akpm, netdev, linux-kernel, kernel_team,
	harry.yoo, ast, daniel, davem, kuba, hawk, john.fastabend, sdf,
	saeedm, leon, tariqt, mbloch, andrew+netdev, edumazet, pabeni,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	horms, jackmanb, hannes, ziy, ilias.apalodimas, willy, brauner,
	kas, yuzhao, usamaarif642, baolin.wang, almasrymina, toke,
	asml.silence, bpf, linux-rdma, sfr, dw, ap420073

On Wed, May 13, 2026 at 03:02:43PM +0200, David Hildenbrand (Arm) wrote:
> On 5/13/26 14:39, Byungchul Park wrote:
> > On Wed, May 13, 2026 at 02:29:46PM +0200, David Hildenbrand (Arm) wrote:
> > > On 5/13/26 14:18, Byungchul Park wrote:
> >>>
> >>> Hi,
> >>>
> >>> The problem comes from the fact that page_type and _mapcount are
> >>> union'ed, but there is a case where both pieces of information need to
> >>> be kept at the same time.
> >>>
> >>> Why don't we allow both to be kept in the same 4 bytes until Zi Yan's
> >>> work on _mapcount and page_type is done, instead of taking a step back?
> >>>
> >>> It could be optimized further, but I suggest the approach I just
> >>> mentioned:
> >>> ---
> >>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> >>> index 64dc44832808..e5ec204866dc 100644
> >>> --- a/fs/proc/internal.h
> >>> +++ b/fs/proc/internal.h
> >>> @@ -185,8 +185,7 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
> >>>  {
> >>>       int mapcount = atomic_read(&page->_mapcount) + 1;
> >>>
> >>> -     if (page_mapcount_is_type(mapcount))
> >>> -             mapcount = 0;
> >>> +     mapcount = page_mapcount_clear_type(mapcount);
> >>>       if (folio_test_large(folio))
> >>>               mapcount += folio_entire_mapcount(folio);
> >>>
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index 8260e28205e9..f45064796313 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> >>> @@ -1865,8 +1865,7 @@ static inline int folio_mapcount(const struct folio *folio)
> >>>
> >>>       if (likely(!folio_test_large(folio))) {
> >>>               mapcount = atomic_read(&folio->_mapcount) + 1;
> >>> -             if (page_mapcount_is_type(mapcount))
> >>> -                     mapcount = 0;
> >>> +             mapcount = page_mapcount_clear_type(mapcount);
> >>>               return mapcount;
> >>>       }
> >>>       return folio_large_mapcount(folio);
> >>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> >>> index 0e03d816e8b9..f3b0d1fa262d 100644
> >>> --- a/include/linux/page-flags.h
> >>> +++ b/include/linux/page-flags.h
> >>> @@ -934,9 +934,9 @@ static inline bool page_type_has_type(int page_type)
> >>>  }
> >>>
> >>>  /* This takes a mapcount which is one more than page->_mapcount */
> >>> -static inline bool page_mapcount_is_type(unsigned int mapcount)
> >>> +static inline unsigned int page_mapcount_clear_type(unsigned int mapcount)
> >>>  {
> >>> -     return page_type_has_type(mapcount - 1);
> >>> +     return (unsigned int)(((int)(mapcount << 8)) >> 8);
> >>>  }
> >>>
> >>>  static inline bool page_has_type(const struct page *page)
> >>> @@ -953,16 +953,20 @@ static __always_inline void __folio_set_##fname(struct folio *folio)    \
> >>>  {                                                                    \
> >>>       if (folio_test_##fname(folio))                                  \
> >>>               return;                                                 \
> >>> -     VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX,   \
> >>> +     VM_BUG_ON_FOLIO(page_type_has_type(data_race(folio->page.page_type)), \
> >>>                       folio);                                         \
> >>> -     folio->page.page_type = (unsigned int)PGTY_##lname << 24;       \
> >>> +     folio->page.page_type &= ~(PGTY_mapcount_underflow << 24);      \
> >>> +     folio->page.page_type |= (unsigned int)PGTY_##lname << 24;      \
> >>>  }                                                                    \
> >>>  static __always_inline void __folio_clear_##fname(struct folio *folio)       \
> >>>  {                                                                    \
> >>> -     if (folio->page.page_type == UINT_MAX)                          \
> >>> +     int mapcount;                                                   \
> >>> +                                                                     \
> >>> +     if (!page_type_has_type(folio->page.page_type))                 \
> >>>               return;                                                 \
> >>>       VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio);             \
> >>> -     folio->page.page_type = UINT_MAX;                               \
> >>> +     mapcount = atomic_read(&folio->page._mapcount);                 \
> >>> +     folio->page.page_type = page_mapcount_clear_type(mapcount);     \
> >>>  }
> >>>
> >>>  #define PAGE_TYPE_OPS(uname, lname, fname)                           \
> >>> @@ -975,15 +979,20 @@ static __always_inline void __SetPage##uname(struct page *page)         \
> >>>  {                                                                    \
> >>>       if (Page##uname(page))                                          \
> >>>               return;                                                 \
> >>> -     VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page);   \
> >>> -     page->page_type = (unsigned int)PGTY_##lname << 24;             \
> >>> +     VM_BUG_ON_PAGE(page_type_has_type(data_race(page->page_type)),  \
> >>> +                             page);                                  \
> >>> +     page->page_type &= ~(PGTY_mapcount_underflow << 24);            \
> >>> +     page->page_type |= (unsigned int)PGTY_##lname << 24;            \
> >>>  }                                                                    \
> >>>  static __always_inline void __ClearPage##uname(struct page *page)    \
> >>>  {                                                                    \
> >>> -     if (page->page_type == UINT_MAX)                                \
> >>> +     int mapcount;                                                   \
> >>> +                                                                     \
> >>> +     if (!page_type_has_type(page->page_type))                       \
> >>>               return;                                                 \
> >>>       VM_BUG_ON_PAGE(!Page##uname(page), page);                       \
> >>> -     page->page_type = UINT_MAX;                                     \
> >>> +     mapcount = atomic_read(&page->_mapcount);                       \
> >>> +     page->page_type = page_mapcount_clear_type(mapcount);           \
> >>>  }
> >>>
> >>>  /*
> >>> diff --git a/mm/debug.c b/mm/debug.c
> >>> index 77fa8fe1d641..9a932ded09d4 100644
> >>> --- a/mm/debug.c
> >>> +++ b/mm/debug.c
> >>> @@ -74,8 +74,7 @@ static void __dump_folio(const struct folio *folio, const struct page *page,
> >>>       int mapcount = atomic_read(&page->_mapcount) + 1;
> >>>       char *type = "";
> >>>
> >>> -     if (page_mapcount_is_type(mapcount))
> >>> -             mapcount = 0;
> >>> +     mapcount = page_mapcount_clear_type(mapcount);
> >>>
> >>>       pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
> >>>                       folio_ref_count(folio), mapcount, mapping,
> >>> ---
> >>>
> >>> Thoughts?
> >>
> >> God no.
> >
> > This is not a final patch; it's meant to share the rough idea *with
> > code* - there may be more places in the code that need adjusting for
> > the change.  I typed up the draft quickly just to share the idea.
> >
> > If pp type pages should be allowed to be mapped as well, then we need
> > to let a page keep both its type and its mapcount at the same time.
> > Am I missing something?
> 
> We don't want code to accidentally overflow mapcounts into these bits
> and have them be wrongly detected as page types.
> 
> This is just very fragile.

Okay.  Thanks for the explanation.  Also, the adjustment I mentioned
might not be as simple as I thought it'd be.

So sorry about the noise.  I'm a zombie now.  I'll think about it after
some rest.

	Byungchul

> --
> Cheers,
> 
> David

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2026-05-13 13:26 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-24  5:13 [PATCH v4] mm: introduce a new page type for page pool in page type Byungchul Park
2026-02-25  7:19 ` Mike Rapoport
2026-02-26 18:49 ` Johannes Weiner
2026-03-16 22:29   ` Byungchul Park
2026-03-16 22:31     ` [PATCH v5] " Byungchul Park
2026-03-17  9:20       ` Jesper Dangaard Brouer
2026-03-17 10:03         ` Ilias Apalodimas
2026-03-17 11:06         ` Dragos Tatulea
2026-03-19 23:31           ` Jakub Kicinski
2026-03-18  2:02         ` Byungchul Park
2026-03-20 11:44           ` Jesper Dangaard Brouer
2026-03-23 12:16             ` Ilias Apalodimas
2026-03-19 23:31       ` Jakub Kicinski
2026-05-13  9:00 ` [PATCH v4] " Dragos Tatulea
2026-05-13  9:12   ` Vlastimil Babka (SUSE)
2026-05-13  9:26     ` Pedro Falcato
2026-05-13  9:36       ` David Hildenbrand (Arm)
2026-05-13 12:06         ` Dragos Tatulea
2026-05-13 12:11           ` David Hildenbrand (Arm)
2026-05-13  9:34   ` David Hildenbrand (Arm)
2026-05-13 12:18   ` Byungchul Park
2026-05-13 12:29     ` David Hildenbrand (Arm)
2026-05-13 12:39       ` Byungchul Park
2026-05-13 13:02         ` David Hildenbrand (Arm)
2026-05-13 13:26           ` Byungchul Park
2026-05-13  9:42 ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox