From: Mike Rapoport <rppt@kernel.org>
To: Li Zhe <lizhe.67@bytedance.com>
Cc: tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, arnd@arndb.de,
akpm@linux-foundation.org, david@kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 2/4] mm: add a template-based fast path for zone-device page init
Date: Mon, 18 May 2026 09:51:34 +0300
Message-ID: <agq29raPaAPlM8kV@kernel.org>
In-Reply-To: <20260515082045.63029-3-lizhe.67@bytedance.com>

Hi,

On Fri, May 15, 2026 at 04:20:43PM +0800, Li Zhe wrote:
> On 64-bit builds, memmap_init_zone_device() spends most of its time
> repeating the same struct page initialization for every PFN. Prepare a
> template page through the existing slow path once, then copy that
> template into each destination page and fix up the PFN-dependent state
> afterwards.
>
> Keep the optimized path disabled when the page_ref_set tracepoint is
> active, because the template-copy path bypasses set_page_count() and
> would otherwise hide the corresponding trace event.
>
> Non-64-bit builds continue to use the existing slow path.
ZONE_DEVICE depends on MEMORY_HOTPLUG, and MEMORY_HOTPLUG is only supported
on 64-bit, so there can't be 32-bit builds with ZONE_DEVICE functionality.
> Tested in a VM with a 100 GB fsdax namespace device configured with
> map=dev on Intel Ice Lake server. This test exercises the nd_pmem rebind
> path (pfns_per_compound == 1).
>
> Test procedure:
> Rebind the nd_pmem driver 30 times and collect the memmap initialization
> time from the pr_debug() output of memmap_init_zone_device().
>
> Base(v7.1-rc3):
> First binding: 1486 ms
> Average of subsequent rebinds: 273.52 ms
>
> With this patch:
> First binding: 1421 ms
> Average of subsequent rebinds: 246.14 ms
>
> This reduces the average rebind time from 273.52 ms to 246.14 ms, or
> about 10%.
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> mm/mm_init.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 96 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 5244acb96dbb..4c475c71a9d6 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1013,7 +1013,7 @@ static inline int zone_device_page_init_refcount(
> }
> }
>
> -static void __ref generic_init_zone_device_page(struct page *page,
> +static void __ref generic_init_zone_device_page_slow(struct page *page,
> unsigned long pfn, unsigned long zone_idx, int nid,
> struct dev_pagemap *pgmap)
> {
> @@ -1040,12 +1040,9 @@ static void __ref generic_init_zone_device_page(struct page *page,
> set_page_count(page, 0);
> }
>
> -static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> - unsigned long zone_idx, int nid,
> - struct dev_pagemap *pgmap)
> +static void __ref zone_device_page_init_pageblock(struct page *page,
> + unsigned long pfn)
Please move the split of the _pageblock helper into the first patch, so that
the first patch contains all the code movement.
> {
> - generic_init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> -
> /*
> * Mark the block movable so that blocks are reserved for
> * movable at startup. This will force kernel allocations
> @@ -1062,6 +1059,88 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> }
> }
>
> +static inline void __init_zone_device_page(struct page *page, unsigned long pfn,
> + unsigned long zone_idx, int nid,
> + struct dev_pagemap *pgmap)
> +{
> + generic_init_zone_device_page_slow(page, pfn, zone_idx, nid, pgmap);
> + zone_device_page_init_pageblock(page, pfn);
> +}
> +
> +#if BITS_PER_LONG == 64
> +static inline bool zone_device_page_init_optimization_enabled(void)
> +{
> + /*
> + * We use template pages and assign page->_refcount via memory copy.
> + * This means the optimized path bypasses set_page_count(), so the
> + * page_ref_set tracepoint cannot observe this initialization.
> + * Skip the optimized path when the tracepoint is enabled.
> + */
> + return !page_ref_tracepoint_active(page_ref_set);
> +}
> +
> +static inline void struct_page_layout_check(void)
> +{
> + BUILD_BUG_ON(sizeof(struct page) & (sizeof(u64) - 1));
Does it have to be a BUILD_BUG_ON()? Can't we fall back to the slow path if
struct page has a weird size?
Just do the check in zone_device_page_init_optimization_enabled().
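
As a rough illustration only (a userspace sketch, not the kernel code): struct
fake_page, size_is_u64_multiple() and the tracepoint_active parameter are
hypothetical stand-ins for struct page and
page_ref_tracepoint_active(page_ref_set). The size test is a compile-time
constant, so the compiler can still discard the fast path when it fails:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct page; the layout is illustrative only. */
struct fake_page {
	uint64_t flags;
	uint64_t words[7];
};

/* Hypothetical helper: true if an object of this size can be copied as whole u64 words. */
static inline bool size_is_u64_multiple(size_t size)
{
	return (size & (sizeof(uint64_t) - 1)) == 0;
}

/*
 * Userspace analogue of zone_device_page_init_optimization_enabled():
 * refuse the fast path for an odd-sized struct instead of breaking the
 * build, and also refuse it while the (assumed) tracepoint is active.
 */
static inline bool fast_path_enabled(bool tracepoint_active)
{
	if (!size_is_u64_multiple(sizeof(struct fake_page)))
		return false;

	return !tracepoint_active;
}
```
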
> +}
> +
> +static inline void init_template_page(struct page *template,
> + unsigned long pfn,
> + unsigned long zone_idx,
> + int nid,
> + struct dev_pagemap *pgmap)
The name should include zone_device to avoid confusion with regular pages.
> +{
> + generic_init_zone_device_page_slow(template, pfn, zone_idx, nid, pgmap);
> +}
> +
> +/*
> + * Initialize parts that differ from the template
> + */
> +static inline void generic_init_zone_device_page_finish(struct page *page,
> + unsigned long pfn)
> +{
> +#ifdef SECTION_IN_PAGE_FLAGS
> + set_page_section(page, pfn_to_section_nr(pfn));
Can we add a stub for set_page_section() for the !SECTION_IN_PAGE_FLAGS case
and drop the #ifdef here and in set_page_links()?
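
The stub pattern, sketched in userspace under assumptions: every fake_ and
FAKE_ name below is hypothetical, and the bit layout is invented for the
example rather than taken from the kernel. The point is only that the
conditional lives next to the helper, not in its callers:

```c
#include <stdint.h>

/* Stand-in for struct page; only the flags word matters here. */
struct fake_page {
	uint64_t flags;
};

/* Hypothetical layout: the section number sits in the top 16 bits of flags. */
#define FAKE_SECTIONS_SHIFT	48
#define FAKE_SECTIONS_MASK	0xffffULL

#ifdef FAKE_SECTION_IN_PAGE_FLAGS
static inline void fake_set_page_section(struct fake_page *page,
					 uint64_t section)
{
	page->flags &= ~(FAKE_SECTIONS_MASK << FAKE_SECTIONS_SHIFT);
	page->flags |= (section & FAKE_SECTIONS_MASK) << FAKE_SECTIONS_SHIFT;
}
#else
/* Empty stub: callers need no #ifdef and the call compiles away. */
static inline void fake_set_page_section(struct fake_page *page,
					 uint64_t section)
{
	(void)page;
	(void)section;
}
#endif

/* The caller is free of preprocessor conditionals. */
static inline void fake_page_finish(struct fake_page *page, uint64_t section)
{
	fake_set_page_section(page, section);
}
```
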
> +#endif
> +#ifdef WANT_PAGE_VIRTUAL
> + if (!is_highmem_idx(ZONE_DEVICE))
> + set_page_address(page, __va(pfn << PAGE_SHIFT));
set_page_address() is a no-op when !WANT_PAGE_VIRTUAL, so you can drop the #ifdef.
> +#endif
> +}
> +
> +static void init_zone_device_page_from_template(struct page *page,
> + unsigned long pfn, const struct page *template)
zone_device_page_init_from_template() please.
> +{
> + const u64 *src = (const u64 *)template;
> + u64 *dst = (u64 *)page;
> + unsigned int i;
> +
> + for (i = 0; i < sizeof(struct page) / sizeof(u64); i++)
> + dst[i] = src[i];
> + generic_init_zone_device_page_finish(page, pfn);
> + zone_device_page_init_pageblock(page, pfn);
> +}
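
For readers following the thread, the template-copy idea in the hunk above can
be sketched in userspace roughly as below. This is an illustration under
assumptions, not the patch: struct fake_page, its fields and all fake_*
functions are hypothetical, and a single pfn_state field stands in for the
PFN-dependent state that the real code fixes up after the copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct page: three PFN-independent words, one PFN-dependent. */
struct fake_page {
	uint64_t flags;
	uint64_t pgmap;
	uint64_t refcount;
	uint64_t pfn_state;
};

/* Slow path: initialize every field individually. */
static void fake_page_init_slow(struct fake_page *page, uint64_t pfn,
				uint64_t flags, uint64_t pgmap)
{
	page->flags = flags;
	page->pgmap = pgmap;
	page->refcount = 0;
	page->pfn_state = pfn;
}

/* Fast path: copy the template word by word, then fix up what differs. */
static void fake_page_init_from_template(struct fake_page *page, uint64_t pfn,
					 const struct fake_page *tmpl)
{
	const uint64_t *src = (const uint64_t *)tmpl;
	uint64_t *dst = (uint64_t *)page;
	size_t i;

	for (i = 0; i < sizeof(*page) / sizeof(uint64_t); i++)
		dst[i] = src[i];

	page->pfn_state = pfn;
}

/* Build the template once through the slow path, then stamp it out. */
static void fake_memmap_init(struct fake_page *map, size_t nr,
			     uint64_t start_pfn, uint64_t flags,
			     uint64_t pgmap)
{
	struct fake_page tmpl;
	size_t i;

	fake_page_init_slow(&tmpl, start_pfn, flags, pgmap);
	for (i = 0; i < nr; i++)
		fake_page_init_from_template(&map[i], start_pfn + i, &tmpl);
}
```

The word-wise copy is only valid when the struct size is a multiple of
sizeof(uint64_t), which is what the size check discussed earlier guards.
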
--
Sincerely yours,
Mike.