From: "Li Zhe" <lizhe.67@bytedance.com>
To: <akpm@linux-foundation.org>, <apopple@nvidia.com>,
<arnd@arndb.de>, <bp@alien8.de>, <dave.hansen@linux.intel.com>,
<david@kernel.org>, <kees@kernel.org>, <mingo@redhat.com>,
<rppt@kernel.org>, <tglx@kernel.org>
Cc: <linux-arch@vger.kernel.org>, <linux-hardening@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<x86@kernel.org>, <lizhe.67@bytedance.com>
Subject: [PATCH v5 8/8] mm: use memcpy_nt() in zone-device template copies
Date: Wed, 1 Jul 2026 17:05:53 +0800
Message-ID: <20260701090553.62691-9-lizhe.67@bytedance.com>
In-Reply-To: <20260701090553.62691-1-lizhe.67@bytedance.com>
The template fast path currently uses memcpy() for the actual struct
page copy. Switch zone_device_page_init_from_template() to memcpy_nt()
and add memcpy_nt_drain() before memmap_init_compound(), before
prep_compound_head() updates overlapping tail metadata, and before
returning from memmap_init_zone_device().
ZONE_DEVICE memmap initialization is largely write-once: each struct
page is populated once, and most destination cachelines are not expected
to be reused immediately afterwards. On x86, a regular cached memcpy()
can therefore incur write-allocate traffic, fetching each destination
cacheline into the cache before it is overwritten, and can fill the cache
with data that has little near-term reuse. Using memcpy_nt() lets this
path request non-temporal stores for that copy pattern, which can reduce
cache pollution and avoid part of the associated write-allocate
overhead, while architectures without a specialized backend still fall
back to memcpy().
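
As a rough user-space illustration of the idea above (not the kernel
implementation), the sketch below uses SSE2 streaming stores when they are
available and falls back to memcpy() otherwise. The names copy_nt() and
copy_nt_drain() are hypothetical stand-ins for this example only, and the
16-byte alignment and size conditions are assumptions of the sketch rather
than properties of memcpy_nt().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si128(), _mm_loadu_si128(), _mm_sfence() */
#endif

/*
 * Hypothetical user-space stand-in, NOT the kernel's memcpy_nt().
 * Uses non-temporal stores when SSE2 is available and the destination
 * is 16-byte aligned with a length that is a multiple of 16 bytes;
 * otherwise falls back to a regular memcpy().
 */
static void copy_nt(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
	if (((uintptr_t)dst & 15) == 0 && (len & 15) == 0) {
		__m128i *d = dst;
		const __m128i *s = src;

		/* Unaligned load from the source, streaming store to dst. */
		for (size_t i = 0; i < len / 16; i++)
			_mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
		return;
	}
#endif
	memcpy(dst, src, len);
}

/*
 * Hypothetical counterpart of a drain step: a store fence on SSE2
 * builds, a no-op when copy_nt() always used regular stores.
 */
static void copy_nt_drain(void)
{
#if defined(__SSE2__)
	_mm_sfence();
#endif
}
```

A caller following the pattern in this patch would issue a batch of
copy_nt() calls and then call copy_nt_drain() once before later stores
that must be ordered after the copies.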
When memcpy_nt() maps to non-temporal stores, order those stores before
memmap_init_compound(), before prep_compound_head() updates overlapping
compound metadata, and before returning from memmap_init_zone_device().
Keep sanitized builds on the slow path so KASAN/KMSAN retain their
instrumented stores.
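
The gating described above can be modelled outside the kernel as well. The
sketch below is a user-space analogue only: it uses the compiler's
sanitizer macros in place of IS_ENABLED(CONFIG_KASAN) and
IS_ENABLED(CONFIG_KMSAN), and a plain boolean in place of the
page_ref_set tracepoint check. BUILD_IS_SANITIZED and
template_fast_path_enabled() are hypothetical names for this example.

```c
#include <stdbool.h>

/* Compilers without __has_feature() simply report "no feature". */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Assumption: treat ASan/MSan builds like the KASAN/KMSAN case. */
#if defined(__SANITIZE_ADDRESS__) || \
	__has_feature(address_sanitizer) || \
	__has_feature(memory_sanitizer)
#define BUILD_IS_SANITIZED 1
#else
#define BUILD_IS_SANITIZED 0
#endif

/*
 * Hypothetical analogue of the fast-path check: refuse the fast path
 * in sanitized builds and while a (modelled) tracepoint is active.
 */
static bool template_fast_path_enabled(bool tracepoint_active)
{
	/* Sanitized builds keep their instrumented stores. */
	if (BUILD_IS_SANITIZED)
		return false;

	/* An active tracepoint must observe each individual update. */
	if (tracepoint_active)
		return false;

	return true;
}
```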
Tested in a VM with a 100 GB fsdax namespace device configured with
map=dev and a 100 GB devdax namespace (align=2097152) on an Intel Ice
Lake server.
Test procedure:
Rebind the nd_pmem and dax_pmem drivers 30 times and collect the memmap
initialization time from the pr_debug() output of
memmap_init_zone_device().
Base (v7.2-rc1):
  First binding for nd_pmem driver: 1456 ms
  Average of subsequent rebinds: 244.28 ms
  First binding for dax_pmem driver: 1462 ms
  Average of subsequent rebinds: 273.31 ms
With this series:
  First binding for nd_pmem driver: 1272 ms
  Average of subsequent rebinds: 96.79 ms
  First binding for dax_pmem driver: 1354 ms
  Average of subsequent rebinds: 119.04 ms
This reduces the average rebind time by about 60.4% for nd_pmem and
56.4% for dax_pmem.
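
For reference, the percentages above follow from the rebind averages as
(base - new) / base. A small helper, written here only to make the
arithmetic explicit (reduction_pct() is a hypothetical name), could look
like this:

```c
#include <assert.h>

/*
 * Relative reduction, in percent, from a baseline time to a new time.
 * Both arguments are in the same unit (milliseconds in this message).
 */
static double reduction_pct(double base_ms, double new_ms)
{
	assert(base_ms > 0.0);
	return (base_ms - new_ms) / base_ms * 100.0;
}
```

Calling reduction_pct(244.28, 96.79) and reduction_pct(273.31, 119.04)
gives the nd_pmem and dax_pmem figures quoted above after rounding to one
decimal place.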
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
mm/mm_init.c | 39 +++++++++++++++++++++++++++++++++++++--
1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 60794050bc07..eb8859a62f70 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1068,11 +1068,21 @@ static void __ref zone_device_page_init_slow(struct page *page,
static inline bool zone_device_page_init_optimization_enabled(void)
{
+ /*
+ * Keep sanitized builds on the slow path so their stores stay
+ * instrumented.
+ */
+ if (IS_ENABLED(CONFIG_KASAN) || IS_ENABLED(CONFIG_KMSAN))
+ return false;
+
/*
* The template fast path copies a preinitialized struct page image.
* Skip it when the page_ref_set tracepoint is enabled.
*/
- return !page_ref_tracepoint_active(page_ref_set);
+ if (page_ref_tracepoint_active(page_ref_set))
+ return false;
+
+ return true;
}
static inline void zone_device_template_page_init(struct page *template,
@@ -1117,7 +1127,7 @@ static void zone_device_page_init_from_template(struct page *page,
* to the destination page.
*/
zone_device_page_update_template(template, pfn);
- memcpy(page, template, sizeof(*page));
+ memcpy_nt(page, template, sizeof(*page));
}
/*
@@ -1188,6 +1198,15 @@ static void __ref memmap_init_compound(struct page *head,
zone_device_tail_page_init(page, pfn, zone_idx, nid,
pgmap, head, order);
}
+
+ /*
+ * When the template path is enabled, order the preceding tail-page copies
+ * before prep_compound_head() updates the overlapping compound metadata
+ * in the first tail-page descriptors. If memcpy_nt() fell back to
+ * regular cached stores, memcpy_nt_drain() may be a no-op.
+ */
+ if (use_template)
+ memcpy_nt_drain();
prep_compound_head(head, order);
}
@@ -1257,10 +1276,26 @@ void __ref memmap_init_zone_device(struct zone *zone,
if (pfns_per_compound == 1)
continue;
+ /*
+ * When the template path is enabled, order the preceding head-page copy
+ * before memmap_init_compound(), which immediately updates compound-head
+ * metadata. If memcpy_nt() fell back to regular cached stores,
+ * memcpy_nt_drain() may be a no-op.
+ */
+ if (use_template)
+ memcpy_nt_drain();
+
memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
compound_nr_pages(pfn, altmap, pgmap),
use_template);
}
+ /*
+ * Ensure any prior template copies are ordered before returning.
+ * On architectures where memcpy_nt() used regular cached stores,
+ * memcpy_nt_drain() may be a no-op.
+ */
+ if (use_template)
+ memcpy_nt_drain();
pageblock_migratetype_init_range(start_pfn, nr_pages, MIGRATE_MOVABLE);
--
2.20.1
Thread overview:
2026-07-01 9:05 [PATCH v5 0/8] mm: optimize zone-device memmap initialization Li Zhe
2026-07-01 9:05 ` [PATCH v5 1/8] mm: fix stale ZONE_DEVICE refcount comment Li Zhe
2026-07-01 9:05 ` [PATCH v5 2/8] mm: factor zone-device page init helpers out of __init_zone_device_page Li Zhe
2026-07-01 9:05 ` [PATCH v5 3/8] mm: add a set_page_section_from_pfn() helper Li Zhe
2026-07-01 9:05 ` [PATCH v5 4/8] mm: add a template-based fast path for zone-device page init Li Zhe
2026-07-03 14:06 ` Mike Rapoport
2026-07-01 9:05 ` [PATCH v5 5/8] mm: extend the template fast path to zone-device compound tails Li Zhe
2026-07-01 9:05 ` [PATCH v5 6/8] string: introduce memcpy_nt() helpers Li Zhe
2026-07-01 9:05 ` [PATCH v5 7/8] x86/string: extend memcpy_flushcache() fixed-size fastpaths Li Zhe
2026-07-01 9:05 ` Li Zhe [this message]