From: Byungchul Park <byungchul@sk.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: linux-mm@kvack.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel_team@skhynix.com,
harry.yoo@oracle.com, ast@kernel.org, daniel@iogearbox.net,
davem@davemloft.net, hawk@kernel.org, john.fastabend@gmail.com,
sdf@fomichev.me, saeedm@nvidia.com, leon@kernel.org,
tariqt@nvidia.com, mbloch@nvidia.com, andrew+netdev@lunn.ch,
edumazet@google.com, pabeni@redhat.com,
akpm@linux-foundation.org, david@redhat.com,
lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, horms@kernel.org, jackmanb@google.com,
hannes@cmpxchg.org, ziy@nvidia.com, ilias.apalodimas@linaro.org,
willy@infradead.org, brauner@kernel.org, kas@kernel.org,
yuzhao@google.com, usamaarif642@gmail.com,
baolin.wang@linux.alibaba.com, almasrymina@google.com,
toke@redhat.com, asml.silence@gmail.com, bpf@vger.kernel.org,
linux-rdma@vger.kernel.org, sfr@canb.auug.org.au, dw@davidwei.uk,
ap420073@gmail.com, dtatulea@nvidia.com
Subject: Re: [RFC mm v5 1/2] page_pool: check nmdesc->pp to see its usage as page pool for net_iov not page-backed
Date: Wed, 12 Nov 2025 16:41:18 +0900 [thread overview]
Message-ID: <20251112074118.GA31149@system.software.com> (raw)
In-Reply-To: <20251107174129.62a3f39c@kernel.org>
On Fri, Nov 07, 2025 at 05:41:29PM -0800, Jakub Kicinski wrote:
> On Fri, 7 Nov 2025 13:47:08 +0900 Byungchul Park wrote:
> > The offset of page_type in struct page cannot be used in struct net_iov
> > for the same purpose, since the offset in struct net_iov is for storing
> > (struct net_iov_area *)owner.
>
> owner does not have to be at a fixed offset. Can we not move owner
> to _pp_mapping_pad ? Or reorder it with type, enum net_iov_type
> only has 2 values we can smoosh it with page_type easily.
At the beginning I tried to smoosh it with page_type but it requires the
additional logic or something to save and restore the value of enum
net_iov_type if the netmem is net_iov when updating page_type like:
if (netmem_is_net_iov(netmem))
type = get_type_from_niov(netmem_to_niov(netmem));
/* Unconditionally clear the value of enum net_iov_type. */
__{Set,Clear}PageNetpp(__netmem_to_page((__force unsigned long)netmem & ~NET_IOV));
if (netmem_is_net_iov(netmem))
set_type_to_niov(netmem_to_niov(netmem), type);
Or page_pool_{set,clear}_pp_info() callers should handle it in a similar
way. I'd like to check if you are okay with this approach.
Or I can make it simpler by adding page_type to struct net_iov like the
following. Jakub, do these approachs work for you?
Byungchul
---
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 5d51600935a6..def274f5c1ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
page = xdpi.page.page;
- /* No need to check page_pool_page_is_pp() as we
+ /* No need to check PageNetpp() as we
* know this is a page_pool page.
*/
page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a3f97c551ad8..081e365caa1a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4252,10 +4252,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
* DMA mapping IDs for page_pool
*
* When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
- * stashes it in the upper bits of page->pp_magic. We always want to be able to
- * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
- * pages can have arbitrary kernel pointers stored in the same field as pp_magic
- * (since it overlaps with page->lru.next), so we must ensure that we cannot
+ * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
+ * arbitrary kernel pointers stored in the same field as pp_magic (since
+ * it overlaps with page->lru.next), so we must ensure that we cannot
* mistake a valid kernel pointer with any of the values we write into this
* field.
*
@@ -4290,26 +4289,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
#define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
PP_DMA_INDEX_SHIFT)
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
-
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
- return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
- return false;
-}
-#endif
-
#define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
#define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
#define PAGE_SNAPSHOT_PG_IDLE (1 << 2)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0091ad1986bf..edf5418c91dd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -934,6 +934,7 @@ enum pagetype {
PGTY_zsmalloc = 0xf6,
PGTY_unaccepted = 0xf7,
PGTY_large_kmalloc = 0xf8,
+ PGTY_netpp = 0xf9,
PGTY_mapcount_underflow = 0xff
};
@@ -1078,6 +1079,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
FOLIO_TYPE_OPS(large_kmalloc, large_kmalloc)
+/*
+ * Marks page_pool allocated pages.
+ */
+PAGE_TYPE_OPS(Netpp, netpp, netpp)
+
/**
* PageHuge - Determine if the page belongs to hugetlbfs
* @page: The page to test.
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 651e2c62d1dd..b42d75ecd411 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -114,10 +114,21 @@ struct net_iov {
atomic_long_t pp_ref_count;
};
};
+
+ unsigned int page_type;
struct net_iov_area *owner;
enum net_iov_type type;
};
+/* Make sure 'the offset of page_type in struct page == the offset of
+ * type in struct net_iov'.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov) \
+ static_assert(offsetof(struct page, pg) == \
+ offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(page_type, page_type);
+#undef NET_IOV_ASSERT_OFFSET
+
struct net_iov_area {
/* Array of net_iovs for this area. */
struct net_iov *niovs;
@@ -260,7 +271,7 @@ static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
*/
#define pp_page_to_nmdesc(p) \
({ \
- DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p)); \
+ DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p)); \
__pp_page_to_nmdesc(p); \
})
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..01dd14123065 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1041,7 +1041,6 @@ static inline bool page_expected_state(struct page *page,
#ifdef CONFIG_MEMCG
page->memcg_data |
#endif
- page_pool_page_is_pp(page) |
(page->flags.f & check_flags)))
return false;
@@ -1068,8 +1067,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
if (unlikely(page->memcg_data))
bad_reason = "page still charged to cgroup";
#endif
- if (unlikely(page_pool_page_is_pp(page)))
- bad_reason = "page_pool leak";
return bad_reason;
}
@@ -1378,9 +1375,12 @@ __always_inline bool free_pages_prepare(struct page *page,
mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
folio->mapping = NULL;
}
- if (unlikely(page_has_type(page)))
+ if (unlikely(page_has_type(page))) {
+ /* networking expects to clear its page type before releasing */
+ WARN_ON_ONCE(PageNetpp(page));
/* Reset the page_type (which overlays _mapcount) */
page->page_type = UINT_MAX;
+ }
if (is_check_pages_enabled()) {
if (free_page_is_bad(page))
diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
index 23175cb2bd86..6b7d90b8ebb9 100644
--- a/net/core/netmem_priv.h
+++ b/net/core/netmem_priv.h
@@ -8,21 +8,15 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
return netmem_to_nmdesc(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
}
-static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
-{
- netmem_to_nmdesc(netmem)->pp_magic |= pp_magic;
-}
-
-static inline void netmem_clear_pp_magic(netmem_ref netmem)
-{
- WARN_ON_ONCE(netmem_to_nmdesc(netmem)->pp_magic & PP_DMA_INDEX_MASK);
-
- netmem_to_nmdesc(netmem)->pp_magic = 0;
-}
-
static inline bool netmem_is_pp(netmem_ref netmem)
{
- return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
+ /* XXX: Now that the offset of page_type is shared between
+ * struct page and net_iov, just cast the netmem to struct page
+ * unconditionally by clearing NET_IOV if any, no matter whether
+ * it comes from struct net_iov or struct page. This should be
+ * adjusted once the offset is no longer shared.
+ */
+ return PageNetpp(__netmem_to_page((__force unsigned long)netmem & ~NET_IOV));
}
static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a5edec485f1..27650ca789d1 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -699,7 +699,14 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
{
netmem_set_pp(netmem, pool);
- netmem_or_pp_magic(netmem, PP_SIGNATURE);
+
+ /* XXX: Now that the offset of page_type is shared between
+ * struct page and net_iov, just cast the netmem to struct page
+ * unconditionally by clearing NET_IOV if any, no matter whether
+ * it comes from struct net_iov or struct page. This should be
+ * adjusted once the offset is no longer shared.
+ */
+ __SetPageNetpp(__netmem_to_page((__force unsigned long)netmem & ~NET_IOV));
/* Ensuring all pages have been split into one fragment initially:
* page_pool_set_pp_info() is only called once for every page when it
@@ -714,7 +721,14 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
void page_pool_clear_pp_info(netmem_ref netmem)
{
- netmem_clear_pp_magic(netmem);
+ /* XXX: Now that the offset of page_type is shared between
+ * struct page and net_iov, just cast the netmem to struct page
+ * unconditionally by clearing NET_IOV if any, no matter whether
+ * it comes from struct net_iov or struct page. This should be
+ * adjusted once the offset is no longer shared.
+ */
+ __ClearPageNetpp(__netmem_to_page((__force unsigned long)netmem & ~NET_IOV));
+
netmem_set_pp(netmem, NULL);
}
next prev parent reply other threads:[~2025-11-12 7:41 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-03 7:51 [RFC mm v5 0/2] mm, page_pool: introduce a new page type for page pool in page type Byungchul Park
2025-11-03 7:51 ` [RFC mm v5 1/2] page_pool: check nmdesc->pp to see its usage as page pool for net_iov not page-backed Byungchul Park
2025-11-03 12:24 ` Toke Høiland-Jørgensen
2025-11-06 11:07 ` Pavel Begunkov
2025-11-07 1:33 ` Jakub Kicinski
2025-11-07 1:59 ` Byungchul Park
2025-11-07 2:08 ` Jakub Kicinski
2025-11-07 4:47 ` Byungchul Park
2025-11-08 1:41 ` Jakub Kicinski
2025-11-08 2:24 ` Byungchul Park
2025-11-08 2:29 ` Byungchul Park
2025-11-08 2:37 ` Jakub Kicinski
2025-11-10 1:09 ` Byungchul Park
2025-11-11 1:40 ` Byungchul Park
2025-11-11 1:56 ` Jakub Kicinski
2025-11-11 2:17 ` Byungchul Park
2025-11-11 2:45 ` Byungchul Park
2025-11-11 12:36 ` Toke Høiland-Jørgensen
2025-11-12 7:41 ` Byungchul Park [this message]
2025-11-15 1:23 ` Jakub Kicinski
2025-11-17 4:25 ` Byungchul Park
2025-11-03 7:51 ` [RFC mm v5 2/2] mm: introduce a new page type for page pool in page type Byungchul Park
2025-11-03 12:26 ` Toke Høiland-Jørgensen
2025-11-03 12:39 ` Byungchul Park
2025-11-03 14:50 ` Toke Høiland-Jørgensen
2025-11-06 11:08 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251112074118.GA31149@system.software.com \
--to=byungchul@sk.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=andrew+netdev@lunn.ch \
--cc=ap420073@gmail.com \
--cc=asml.silence@gmail.com \
--cc=ast@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=dtatulea@nvidia.com \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=hannes@cmpxchg.org \
--cc=harry.yoo@oracle.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jackmanb@google.com \
--cc=john.fastabend@gmail.com \
--cc=kas@kernel.org \
--cc=kernel_team@skhynix.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rdma@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mbloch@nvidia.com \
--cc=mhocko@suse.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rppt@kernel.org \
--cc=saeedm@nvidia.com \
--cc=sdf@fomichev.me \
--cc=sfr@canb.auug.org.au \
--cc=surenb@google.com \
--cc=tariqt@nvidia.com \
--cc=toke@redhat.com \
--cc=usamaarif642@gmail.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.