Linux-mm Archive on lore.kernel.org
* [PATCH 0/4] mm: speed up ZONE_DEVICE memmap initialization
@ 2026-05-15  8:20 Li Zhe
  2026-05-15  8:20 ` [PATCH 1/4] mm: factor zone-device page init helpers out of __init_zone_device_page Li Zhe
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Li Zhe @ 2026-05-15  8:20 UTC
  To: tglx, mingo, bp, dave.hansen, arnd, rppt, akpm, david
  Cc: x86, linux-kernel, linux-arch, linux-mm, lizhe.67

memmap_init_zone_device() can spend a substantial amount of time
initializing large ZONE_DEVICE ranges because it repeats nearly
identical struct page setup for every PFN.

This series reduces that overhead in four steps.

The first patch factors the reusable pieces out of
__init_zone_device_page() so later patches can share the same logic
without changing the existing slow path.

The second patch adds a template-based fast path for zone-device head
pages on 64-bit builds. Instead of rebuilding the same struct page state
for every PFN, it prepares a reusable page template once and copies it
to each destination page.
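
The template idea can be sketched in userspace C as follows. This is
only an illustration under stated assumptions: struct tmpl_page is a
hypothetical eight-word stand-in for struct page, and its field names,
TMPL_FLAGS, and the pfn_tag field (standing in for whatever
PFN-dependent state the real code has to fix up) are invented here.
It is not the code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct page: eight u64 words, no padding. */
struct tmpl_page {
	uint64_t flags;
	uint64_t pgmap;		/* a pointer in real code */
	uint64_t pfn_tag;	/* invented PFN-dependent field */
	uint64_t refcount;
	uint64_t unused[4];
};

_Static_assert(sizeof(struct tmpl_page) == 8 * sizeof(uint64_t),
	       "demo layout must be exactly eight u64 words");

#define TMPL_FLAGS 0x1000ULL	/* invented flag value */

/* Slow path: rebuild every field from scratch for one page. */
static void tmpl_init_slow(struct tmpl_page *page, uint64_t pfn,
			   uint64_t pgmap)
{
	memset(page, 0, sizeof(*page));
	page->flags = TMPL_FLAGS;
	page->pgmap = pgmap;
	page->pfn_tag = pfn;
	page->refcount = 1;
}

/* Fast path: build the template once, then copy it to each page. */
static void tmpl_init_range(struct tmpl_page *pages, size_t nr,
			    uint64_t first_pfn, uint64_t pgmap)
{
	struct tmpl_page tmpl;

	tmpl_init_slow(&tmpl, first_pfn, pgmap);
	for (size_t i = 0; i < nr; i++) {
		pages[i] = tmpl;			/* plain struct copy */
		pages[i].pfn_tag = first_pfn + i;	/* per-PFN fixup */
	}
}
```

In this sketch the fast path must produce exactly the bytes the slow
path would, which is why the layout is pinned with a static assertion
and the only per-page work left is the PFN-dependent fixup.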

The third patch extends the same template-based approach to compound
tails, so pfns_per_compound > 1 can also benefit from the fast path.
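
A rough userspace sketch of how the template approach might carry over
to compound tails is shown below. Everything in it is an assumption
made for illustration: struct cpd_page, its four-word size, the word
used for the head link, and the bit-0 tag on that link are all
invented, and the real tail setup in the series may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPD_WORDS 4		/* hypothetical page size in u64 words */
#define CPD_HEAD_LINK_WORD 1	/* hypothetical word holding the head link */

struct cpd_page {
	uint64_t w[CPD_WORDS];
};

/*
 * Initialize nr pages in groups of pfns_per_compound. The first page of
 * each group is copied from the head template; the rest are copied from
 * a tail template whose head link is refreshed once per group.
 */
static void cpd_init_range(struct cpd_page *pages, size_t nr,
			   size_t pfns_per_compound,
			   const struct cpd_page *head_tmpl,
			   const struct cpd_page *tail_tmpl)
{
	struct cpd_page tail = *tail_tmpl;

	if (pfns_per_compound == 0)
		return;

	for (size_t i = 0; i < nr; i++) {
		if (i % pfns_per_compound == 0) {
			pages[i] = *head_tmpl;
			/* Bit 0 marks the word as a tail link in this demo. */
			tail.w[CPD_HEAD_LINK_WORD] =
				(uint64_t)(uintptr_t)&pages[i] | 1;
		} else {
			pages[i] = tail;
		}
	}
}
```

The point of the sketch is that the tail template changes only once per
compound group, so each tail is still a straight copy.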

The last patch introduces arch_optimize_store_u64() and
arch_optimize_store_drain(), with a generic fallback and an x86-64
MOVNTI/SFENCE implementation, and uses them in the template-copy hot
path. It also refreshes the PFN-dependent fields in the reusable
template before each copy, so the hot path remains a fixed-offset store
sequence without post-copy normal stores to the destination page.
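
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of a store helper pair with a generic fallback and an x86-64
MOVNTI/SFENCE variant. The nt_store_u64() and nt_store_drain() names
and signatures are placeholders chosen for this example, not the
interface added by the series, and the eight-word page with its
PFN-dependent word at index 3 is likewise invented. The inline
assembly assumes GCC or Clang with the default AT&T syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NT_PAGE_WORDS 8	/* hypothetical page size in u64 words */
#define NT_PFN_WORD 3	/* hypothetical PFN-dependent word */

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* Non-temporal 64-bit store. */
static inline void nt_store_u64(uint64_t *dst, uint64_t val)
{
	__asm__ volatile("movnti %1, %0" : "=m"(*dst) : "r"(val));
}

/* Order the preceding non-temporal stores before later stores. */
static inline void nt_store_drain(void)
{
	__asm__ volatile("sfence" ::: "memory");
}
#else
/* Generic fallback: an ordinary store, nothing to drain. */
static inline void nt_store_u64(uint64_t *dst, uint64_t val)
{
	*dst = val;
}

static inline void nt_store_drain(void)
{
}
#endif

/*
 * Refresh the PFN-dependent word in the template, then copy the whole
 * template with fixed-offset stores. No normal store touches the
 * destination afterwards.
 */
static void nt_init_range(uint64_t *pages, size_t nr, uint64_t first_pfn,
			  uint64_t *tmpl)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t *dst = &pages[i * NT_PAGE_WORDS];

		tmpl[NT_PFN_WORD] = first_pfn + i;
		for (size_t w = 0; w < NT_PAGE_WORDS; w++)
			nt_store_u64(&dst[w], tmpl[w]);
	}
	nt_store_drain();
}
```

Updating the template rather than the destination keeps the cached
template as the only memory written with normal stores, and a single
drain after the loop is enough in this sketch because nothing reads the
destination pages before it.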

The optimized path is intentionally limited to 64-bit builds for now.
The current fast path copies struct page as a sequence of fixed-offset
u64 stores and relies on the set of struct page layouts used by current
64-bit configurations. Extending the same optimization to 32-bit builds
would need a separate store layout scheme and its own validation, so
this series keeps 32-bit on the existing slow path.

The helper interface is also meant to leave room for other
architecture-specific backends. This series only adds an x86-64
implementation because that is the only platform I was able to measure.
Other architectures, including arm64, may be able to add their own
optimized backend in follow-up work, but I do not currently have arm64
hardware available to validate that.

To preserve observability, the optimized path is disabled when the
page_ref_set tracepoint is enabled, because the template-copy path
bypasses set_page_count() and would otherwise hide that trace event.
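
The two conditions that gate the optimized path (64-bit only, and no
active page_ref_set tracepoint) can be pictured with a small userspace
sketch. gate_use_fast_path() and its boolean parameter are hypothetical
stand-ins for this example; the kernel code would use its own
configuration and tracepoint checks instead.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the template-copy path may be used.
 * page_ref_set_tracing models "the page_ref_set tracepoint is enabled".
 */
static bool gate_use_fast_path(bool page_ref_set_tracing)
{
	/* The u64 store layout is only set up for 64-bit builds. */
	if (sizeof(void *) != 8)
		return false;

	/* The copy would skip the trace event, so fall back when tracing. */
	if (page_ref_set_tracing)
		return false;

	return true;
}
```

When the check fails, initialization would simply take the existing
slow path, so tracing users see the same events as before.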

Patch summary:
  1. mm: factor zone-device page init helpers out of __init_zone_device_page
  2. mm: add a template-based fast path for zone-device page init
  3. mm: extend the template fast path to zone-device compound tails
  4. mm: use arch store helpers in zone-device template copies

Testing
=======

Tests were run in a VM on an Intel Ice Lake server.

Two PMEM configurations were used:
  - a 100 GB fsdax namespace configured with map=dev, which exercises
    the nd_pmem rebind path (pfns_per_compound == 1)
  - a 100 GB devdax namespace configured with align=2097152, which
    exercises the dax_pmem rebind path (pfns_per_compound > 1)

For each configuration, the corresponding driver was unbound and
rebound 30 times. Memmap initialization latency was collected from the
pr_debug() output of memmap_init_zone_device().

The first bind is reported separately, and the average of subsequent
rebinds is used as the steady-state result.
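
The reporting rule can be written down as a short C helper. This is a
sketch of the arithmetic only; struct bench_summary and
bench_summarize() are names made up for this example, and the
collection of latencies from the pr_debug() output is not shown.

```c
#include <assert.h>
#include <stddef.h>

struct bench_summary {
	double first_ms;	/* latency of the first bind */
	double steady_ms;	/* mean latency of all later rebinds */
};

/* ms[0] is the first bind; ms[1..n-1] are the subsequent rebinds. */
static struct bench_summary bench_summarize(const double *ms, size_t n)
{
	struct bench_summary s = { 0.0, 0.0 };
	double sum = 0.0;

	if (n == 0)
		return s;

	s.first_ms = ms[0];
	for (size_t i = 1; i < n; i++)
		sum += ms[i];
	if (n > 1)
		s.steady_ms = sum / (double)(n - 1);

	return s;
}
```

Keeping the first bind out of the mean prevents its much higher
latency from skewing the steady-state figure.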

Performance
===========

nd_pmem rebind, 100 GB fsdax namespace, map=dev
  Base (v7.1-rc3):
    First binding: 1486 ms
    Average of subsequent rebinds: 273.52 ms
  Full series:
    First binding: 1272 ms
    Average of subsequent rebinds: 104.59 ms

dax_pmem rebind, 100 GB devdax namespace, align=2097152
  Base (v7.1-rc3):
    First binding: 1515 ms
    Average of subsequent rebinds: 313.45 ms
  Full series:
    First binding: 1286 ms
    Average of subsequent rebinds: 116.93 ms
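
To put the figures above in relative terms, the usual speedup and
time-reduction formulas can be applied. The helper names below are
invented for this example; only the input numbers come from the
measurements.

```c
#include <assert.h>

/* How many times faster the new latency is than the base latency. */
static double perf_speedup(double base_ms, double new_ms)
{
	return base_ms / new_ms;
}

/* Percentage of the base latency that was eliminated. */
static double perf_reduction_pct(double base_ms, double new_ms)
{
	return 100.0 * (base_ms - new_ms) / base_ms;
}
```

With the numbers reported here, steady-state rebinds come out roughly
2.6x faster in both configurations (about 62% less time), and the
first bind takes about 14% to 15% less time.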

Li Zhe (4):
  mm: factor zone-device page init helpers out of
    __init_zone_device_page
  mm: add a template-based fast path for zone-device page init
  mm: extend the template fast path to zone-device compound tails
  mm: use arch store helpers in zone-device template copies

 arch/x86/include/asm/struct_page_init.h |  28 +++
 include/asm-generic/Kbuild              |   1 +
 include/asm-generic/struct_page_init.h  |  17 ++
 mm/mm_init.c                            | 260 +++++++++++++++++++++---
 4 files changed, 280 insertions(+), 26 deletions(-)
 create mode 100644 arch/x86/include/asm/struct_page_init.h
 create mode 100644 include/asm-generic/struct_page_init.h

-- 
2.20.1


Thread overview: 15+ messages
2026-05-15  8:20 [PATCH 0/4] mm: speed up ZONE_DEVICE memmap initialization Li Zhe
2026-05-15  8:20 ` [PATCH 1/4] mm: factor zone-device page init helpers out of __init_zone_device_page Li Zhe
2026-05-18  6:32   ` Mike Rapoport
2026-05-18  9:11     ` Li Zhe
2026-05-15  8:20 ` [PATCH 2/4] mm: add a template-based fast path for zone-device page init Li Zhe
2026-05-18  6:51   ` Mike Rapoport
2026-05-18  9:54     ` Li Zhe
2026-05-18 11:42       ` Mike Rapoport
2026-05-15  8:20 ` [PATCH 3/4] mm: extend the template fast path to zone-device compound tails Li Zhe
2026-05-15  8:20 ` [PATCH 4/4] mm: use arch store helpers in zone-device template copies Li Zhe
2026-05-18  0:32   ` Alistair Popple
2026-05-18  6:42     ` Li Zhe
2026-05-19  3:09     ` Balbir Singh
2026-05-18  6:23 ` [PATCH 0/4] mm: speed up ZONE_DEVICE memmap initialization Mike Rapoport
2026-05-18  8:57   ` Li Zhe
