From: Byungchul Park <byungchul@sk.com>
To: linux-mm@kvack.org, netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel_team@skhynix.com,
harry.yoo@oracle.com, ast@kernel.org, daniel@iogearbox.net,
davem@davemloft.net, kuba@kernel.org, hawk@kernel.org,
john.fastabend@gmail.com, sdf@fomichev.me, saeedm@nvidia.com,
leon@kernel.org, tariqt@nvidia.com, mbloch@nvidia.com,
andrew+netdev@lunn.ch, edumazet@google.com, pabeni@redhat.com,
akpm@linux-foundation.org, david@redhat.com,
lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, horms@kernel.org, jackmanb@google.com,
hannes@cmpxchg.org, ziy@nvidia.com, ilias.apalodimas@linaro.org,
willy@infradead.org, brauner@kernel.org, kas@kernel.org,
yuzhao@google.com, usamaarif642@gmail.com,
baolin.wang@linux.alibaba.com, almasrymina@google.com,
toke@redhat.com, asml.silence@gmail.com, bpf@vger.kernel.org,
linux-rdma@vger.kernel.org, sfr@canb.auug.org.au
Subject: Re: [PATCH v2] mm, page_pool: introduce a new page type for page pool in page type
Date: Mon, 28 Jul 2025 19:57:46 +0900 [thread overview]
Message-ID: <20250728105746.GB21732@system.software.com> (raw)
In-Reply-To: <20250728052742.81294-1-byungchul@sk.com>
On Mon, Jul 28, 2025 at 02:27:42PM +0900, Byungchul Park wrote:
> Changes from v1:
> 1. Rebase on linux-next.
+cc sfr@canb.auug.org.au
Byungchul
> 2. Initialize net_iov->pp = NULL when allocating net_iov in
> net_devmem_bind_dmabuf() and io_zcrx_create_area().
> 3. Use ->pp for net_iov to identify if it's pp rather than
> always consider net_iov as pp.
> 4. Add Suggested-by: David Hildenbrand <david@redhat.com>.
>
> ---8<---
> >From 08b65324282bbe683a2479faef8eb24df249fd18 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul@sk.com>
> Date: Mon, 28 Jul 2025 14:07:17 +0900
> Subject: [PATCH v2] mm, page_pool: introduce a new page type for page pool in page type
>
> ->pp_magic field in struct page is current used to identify if a page
> belongs to a page pool. However, ->pp_magic will be removed and page
> type bit in struct page e.i. PGTY_netpp can be used for that purpose.
>
> Introduce and use the page type APIs e.g. PageNetpp(), __SetPageNetpp(),
> and __ClearPageNetpp() instead, and remove the existing APIs accessing
> ->pp_magic e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
> netmem_clear_pp_magic().
>
> For net_iov, use ->pp to identify if it's pp, with making sure that ->pp
> is NULL for non-pp net_iov.
>
> This work was inspired by the following link:
>
> [1] https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/
>
> While at it, move the sanity check for page pool to on free.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 2 +-
> include/linux/mm.h | 27 +++----------------
> include/linux/page-flags.h | 6 +++++
> io_uring/zcrx.c | 1 +
> mm/page_alloc.c | 7 +++--
> net/core/devmem.c | 1 +
> net/core/netmem_priv.h | 23 +++++++---------
> net/core/page_pool.c | 10 +++++--
> 8 files changed, 33 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index 5ce1b463b7a8..79a3ceff3952 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -707,7 +707,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
> xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
> page = xdpi.page.page;
>
> - /* No need to check page_pool_page_is_pp() as we
> + /* No need to check PageNetpp() as we
> * know this is a page_pool page.
> */
> page_pool_recycle_direct(page->pp, page);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3bfb566ad202..d01b296e7184 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4171,10 +4171,9 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> * DMA mapping IDs for page_pool
> *
> * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
> - * stashes it in the upper bits of page->pp_magic. We always want to be able to
> - * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
> - * pages can have arbitrary kernel pointers stored in the same field as pp_magic
> - * (since it overlaps with page->lru.next), so we must ensure that we cannot
> + * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
> + * arbitrary kernel pointers stored in the same field as pp_magic (since
> + * it overlaps with page->lru.next), so we must ensure that we cannot
> * mistake a valid kernel pointer with any of the values we write into this
> * field.
> *
> @@ -4205,26 +4204,6 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> #define PP_DMA_INDEX_MASK GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
> PP_DMA_INDEX_SHIFT)
>
> -/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
> - * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
> - * the head page of compound page and bit 1 for pfmemalloc page, as well as the
> - * bits used for the DMA index. page_is_pfmemalloc() is checked in
> - * __page_pool_put_page() to avoid recycling the pfmemalloc page.
> - */
> -#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
> -
> -#ifdef CONFIG_PAGE_POOL
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
> -}
> -#else
> -static inline bool page_pool_page_is_pp(struct page *page)
> -{
> - return false;
> -}
> -#endif
> -
> #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
> #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
> #define PAGE_SNAPSHOT_PG_IDLE (1 << 2)
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 8e4d6eda8a8d..548a68415845 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -935,6 +935,7 @@ enum pagetype {
> PGTY_zsmalloc = 0xf6,
> PGTY_unaccepted = 0xf7,
> PGTY_large_kmalloc = 0xf8,
> + PGTY_netpp = 0xf9,
>
> PGTY_mapcount_underflow = 0xff
> };
> @@ -1079,6 +1080,11 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
> PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
> FOLIO_TYPE_OPS(large_kmalloc, large_kmalloc)
>
> +/*
> + * Marks page_pool allocated pages.
> + */
> +PAGE_TYPE_OPS(Netpp, netpp, netpp)
> +
> /**
> * PageHuge - Determine if the page belongs to hugetlbfs
> * @page: The page to test.
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 100b75ab1e64..34634552cf74 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -444,6 +444,7 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
> area->freelist[i] = i;
> atomic_set(&area->user_refs[i], 0);
> niov->type = NET_IOV_IOURING;
> + niov->pp = NULL;
> }
>
> area->free_count = nr_iovs;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fa09154a799c..cb90d6a3fd9d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1042,7 +1042,6 @@ static inline bool page_expected_state(struct page *page,
> #ifdef CONFIG_MEMCG
> page->memcg_data |
> #endif
> - page_pool_page_is_pp(page) |
> (page->flags & check_flags)))
> return false;
>
> @@ -1069,8 +1068,6 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> if (unlikely(page->memcg_data))
> bad_reason = "page still charged to cgroup";
> #endif
> - if (unlikely(page_pool_page_is_pp(page)))
> - bad_reason = "page_pool leak";
> return bad_reason;
> }
>
> @@ -1379,9 +1376,11 @@ __always_inline bool free_pages_prepare(struct page *page,
> mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> folio->mapping = NULL;
> }
> - if (unlikely(page_has_type(page)))
> + if (unlikely(page_has_type(page))) {
> + WARN_ON_ONCE(PageNetpp(page));
> /* Reset the page_type (which overlays _mapcount) */
> page->page_type = UINT_MAX;
> + }
>
> if (is_check_pages_enabled()) {
> if (free_page_is_bad(page))
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index b3a62ca0df65..40e7a4ec9009 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -285,6 +285,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> niov = &owner->area.niovs[i];
> niov->type = NET_IOV_DMABUF;
> niov->owner = &owner->area;
> + niov->pp = NULL;
> page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
> net_devmem_get_dma_addr(niov));
> if (direction == DMA_TO_DEVICE)
> diff --git a/net/core/netmem_priv.h b/net/core/netmem_priv.h
> index cd95394399b4..4b90332d6c64 100644
> --- a/net/core/netmem_priv.h
> +++ b/net/core/netmem_priv.h
> @@ -8,21 +8,18 @@ static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
> return __netmem_clear_lsb(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
> }
>
> -static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
> -{
> - __netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
> -}
> -
> -static inline void netmem_clear_pp_magic(netmem_ref netmem)
> -{
> - WARN_ON_ONCE(__netmem_clear_lsb(netmem)->pp_magic & PP_DMA_INDEX_MASK);
> -
> - __netmem_clear_lsb(netmem)->pp_magic = 0;
> -}
> -
> static inline bool netmem_is_pp(netmem_ref netmem)
> {
> - return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
> + /* Use ->pp for net_iov to identify if it's pp, which requires
> + * that non-pp net_iov should have ->pp NULL'd.
> + */
> + if (netmem_is_net_iov(netmem))
> + return !!__netmem_clear_lsb(netmem)->pp;
> +
> + /* For system memory, page type bit in struct page can be used
> + * to identify if it's pp.
> + */
> + return PageNetpp(__netmem_to_page(netmem));
> }
>
> static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 05e2e22a8f7c..0a10f3026faa 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -654,7 +654,6 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict)
> void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> {
> netmem_set_pp(netmem, pool);
> - netmem_or_pp_magic(netmem, PP_SIGNATURE);
>
> /* Ensuring all pages have been split into one fragment initially:
> * page_pool_set_pp_info() is only called once for every page when it
> @@ -665,12 +664,19 @@ void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
> page_pool_fragment_netmem(netmem, 1);
> if (pool->has_init_callback)
> pool->slow.init_callback(netmem, pool->slow.init_arg);
> +
> + if (netmem_is_net_iov(netmem))
> + return;
> + __SetPageNetpp(__netmem_to_page(netmem));
> }
>
> void page_pool_clear_pp_info(netmem_ref netmem)
> {
> - netmem_clear_pp_magic(netmem);
> netmem_set_pp(netmem, NULL);
> +
> + if (netmem_is_net_iov(netmem))
> + return;
> + __ClearPageNetpp(__netmem_to_page(netmem));
> }
>
> static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
>
> base-commit: 97987520025658f30bb787a99ffbd9bbff9ffc9d
> --
> 2.17.1
next prev parent reply other threads:[~2025-07-28 10:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-28 5:27 [PATCH v2] mm, page_pool: introduce a new page type for page pool in page type Byungchul Park
2025-07-28 8:20 ` [PATCH v2 rebase as of Jul 28] " Byungchul Park
2025-07-28 17:50 ` Mina Almasry
2025-07-28 8:45 ` [PATCH v2] " David Hildenbrand
2025-07-28 10:57 ` Byungchul Park [this message]
2025-07-28 18:36 ` Pavel Begunkov
2025-07-28 18:39 ` Mina Almasry
2025-07-28 18:49 ` Pavel Begunkov
2025-07-29 1:04 ` Byungchul Park
2025-07-29 8:53 ` Pavel Begunkov
2025-07-29 0:53 ` Byungchul Park
2025-07-29 0:51 ` Byungchul Park
2025-07-29 1:19 ` Byungchul Park
2025-07-29 9:22 ` Pavel Begunkov
2025-07-29 10:22 ` Byungchul Park
2025-07-31 6:13 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250728105746.GB21732@system.software.com \
--to=byungchul@sk.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=andrew+netdev@lunn.ch \
--cc=asml.silence@gmail.com \
--cc=ast@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=edumazet@google.com \
--cc=hannes@cmpxchg.org \
--cc=harry.yoo@oracle.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jackmanb@google.com \
--cc=john.fastabend@gmail.com \
--cc=kas@kernel.org \
--cc=kernel_team@skhynix.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rdma@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mbloch@nvidia.com \
--cc=mhocko@suse.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rppt@kernel.org \
--cc=saeedm@nvidia.com \
--cc=sdf@fomichev.me \
--cc=sfr@canb.auug.org.au \
--cc=surenb@google.com \
--cc=tariqt@nvidia.com \
--cc=toke@redhat.com \
--cc=usamaarif642@gmail.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.