From: Yuan Liu <yuan1.liu@intel.com>
To: David Hildenbrand <david@kernel.org>,
Oscar Salvador <osalvador@suse.de>,
Mike Rapoport <rppt@kernel.org>,
Wei Yang <richard.weiyang@gmail.com>
Cc: linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
Nanhai Zou <nanhai.zou@intel.com>, Yuan Liu <yuan1.liu@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
Tianyou Li <tianyou.li@intel.com>,
Chen Zhang <zhangchen.kidd@jd.com>,
linux-kernel@vger.kernel.org
Subject: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Thu, 19 Mar 2026 05:56:22 -0400
Message-ID: <20260319095622.1130380-1-yuan1.liu@intel.com>
When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
zone->contiguous is recomputed by walking the new zone's pfn range from
beginning to end, regardless of the zone's previous state. When the zone
spans a large pfn range, the cost of this traversal can be significant.
Add a new zone member, pages_with_online_memmap: the number of pages within
the zone that have an online memmap. It includes present pages and memory
holes that have a memmap. When spanned_pages == pages_with_online_memmap,
pfn_to_page() can be performed without further checks on any pfn within the
zone span.
The following memory hotplug tests for a VM [1], run in the environment [2],
show that this optimization significantly reduces memory hotplug and unplug
times [3].
+----------------+------+---------------+--------------+----------------+
| | Size | Time (before) | Time (after) | Time Reduction |
| +------+---------------+--------------+----------------+
| Plug Memory | 256G | 10s | 3s | 70% |
| +------+---------------+--------------+----------------+
| | 512G | 36s | 7s | 81% |
+----------------+------+---------------+--------------+----------------+
+----------------+------+---------------+--------------+----------------+
| | Size | Time (before) | Time (after) | Time Reduction |
| +------+---------------+--------------+----------------+
| Unplug Memory | 256G | 11s | 4s | 64% |
| +------+---------------+--------------+----------------+
| | 512G | 36s | 9s | 75% |
+----------------+------+---------------+--------------+----------------+
[1] Qemu commands to hotplug 256G/512G memory for a VM:
object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
qom-set vmem1 requested-size 256G/512G (Plug Memory)
qom-set vmem1 requested-size 0G (Unplug Memory)
[2] Hardware : Intel Icelake server
Guest Kernel : v7.0-rc4
Qemu : v9.0.0
Launch VM :
qemu-system-x86_64 -accel kvm -cpu host \
-drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
-drive file=./seed.img,format=raw,if=virtio \
-smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
-m 2G,slots=10,maxmem=2052472M \
-device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
-device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
-nographic -machine q35 \
-nic user,hostfwd=tcp::3000-:22
Guest kernel auto-onlines newly added memory blocks:
echo online > /sys/devices/system/memory/auto_online_blocks
[3] The time from issuing the QEMU commands in [1] until
'grep MemTotal /proc/meminfo' in the guest shows that all hotplugged
memory has been recognized.
Reported-by: Nanhai Zou <nanhai.zou@intel.com>
Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
Tested-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
Co-developed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
---
Documentation/mm/physical_memory.rst | 6 +++++
include/linux/mmzone.h | 22 ++++++++++++++-
mm/internal.h | 10 +++----
mm/memory_hotplug.c | 21 +++++----------
mm/mm_init.c | 40 +++++++++-------------------
5 files changed, 50 insertions(+), 49 deletions(-)
diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
index b76183545e5b..d324da29ac11 100644
--- a/Documentation/mm/physical_memory.rst
+++ b/Documentation/mm/physical_memory.rst
@@ -483,6 +483,12 @@ General
``present_pages`` should use ``get_online_mems()`` to get a stable value. It
is initialized by ``calculate_node_totalpages()``.
+``pages_with_online_memmap``
+ The number of pages within the zone that have an online memmap. This
+ includes present pages and memory holes that have a memmap. When
+ spanned_pages == pages_with_online_memmap, pfn_to_page() can be performed
+ without further checks on any pfn within the zone span.
+
``present_early_pages``
The present pages existing within the zone located on memory available since
early boot, excluding hotplugged memory. Defined only when
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..c7a136ce55c7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -943,6 +943,11 @@ struct zone {
* cma pages is present pages that are assigned for CMA use
* (MIGRATE_CMA).
*
+ * pages_with_online_memmap is the number of pages within the zone that
+ * have an online memmap. It includes present pages and memory holes that
+ * have a memmap. When spanned_pages == pages_with_online_memmap,
+ * pfn_to_page() can be performed without further checks on any pfn.
+ *
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
@@ -967,6 +972,7 @@ struct zone {
atomic_long_t managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
+ unsigned long pages_with_online_memmap;
#if defined(CONFIG_MEMORY_HOTPLUG)
unsigned long present_early_pages;
#endif
@@ -1051,7 +1057,6 @@ struct zone {
bool compact_blockskip_flush;
#endif
- bool contiguous;
CACHELINE_PADDING(_pad3_);
/* Zone statistics */
@@ -1124,6 +1129,21 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
}
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Returns: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+ return READ_ONCE(zone->spanned_pages) ==
+ READ_ONCE(zone->pages_with_online_memmap);
+}
+
static inline bool zone_is_initialized(const struct zone *zone)
{
return zone->initialized;
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..7c4c8ab68bde 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -793,21 +793,17 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
unsigned long end_pfn, struct zone *zone)
{
- if (zone->contiguous)
+ if (zone_is_contiguous(zone) && zone_spans_pfn(zone, start_pfn)) {
+ VM_BUG_ON(end_pfn > zone_end_pfn(zone));
return pfn_to_page(start_pfn);
+ }
return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}
-void set_zone_contiguous(struct zone *zone);
bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
unsigned long nr_pages);
-static inline void clear_zone_contiguous(struct zone *zone)
-{
- zone->contiguous = false;
-}
-
extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bc805029da51..2ba7a394a64b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_smallest_section_pfn(nid, zone, end_pfn,
zone_end_pfn(zone));
if (pfn) {
- zone->spanned_pages = zone_end_pfn(zone) - pfn;
+ WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
zone->zone_start_pfn = pfn;
} else {
zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ WRITE_ONCE(zone->spanned_pages, 0);
}
} else if (zone_end_pfn(zone) == end_pfn) {
/*
@@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
start_pfn);
if (pfn)
- zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+ WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1);
else {
zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ WRITE_ONCE(zone->spanned_pages, 0);
}
}
}
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
/*
* Zone shrinking code cannot properly deal with ZONE_DEVICE. So
- * we will not try to shrink the zones - which is okay as
- * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+ * we will not try to shrink the zones.
*/
if (zone_is_zone_device(zone))
return;
- clear_zone_contiguous(zone);
-
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
-
- set_zone_contiguous(zone);
}
/**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
- clear_zone_contiguous(zone);
-
if (zone_is_empty(zone))
init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype,
isolate_pageblock);
-
- set_zone_contiguous(zone);
}
struct auto_movable_stats {
@@ -1079,6 +1070,8 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
if (early_section(__pfn_to_section(page_to_pfn(page))))
zone->present_early_pages += nr_pages;
zone->present_pages += nr_pages;
+ WRITE_ONCE(zone->pages_with_online_memmap,
+ READ_ONCE(zone->pages_with_online_memmap) + nr_pages);
zone->zone_pgdat->node_present_pages += nr_pages;
if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index df34797691bd..96690e550024 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
unsigned long zone_start_pfn = zone->zone_start_pfn;
unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+ unsigned long zone_hole_start, zone_hole_end;
start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
@@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
false);
- if (*hole_pfn < start_pfn)
+ WRITE_ONCE(zone->pages_with_online_memmap,
+ READ_ONCE(zone->pages_with_online_memmap) +
+ (end_pfn - start_pfn));
+
+ if (*hole_pfn < start_pfn) {
init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+ zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
+ zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
+ if (zone_hole_start < zone_hole_end)
+ WRITE_ONCE(zone->pages_with_online_memmap,
+ READ_ONCE(zone->pages_with_online_memmap) +
+ (zone_hole_end - zone_hole_start));
+ }
*hole_pfn = end_pfn;
}
@@ -2261,28 +2273,6 @@ void __init init_cma_pageblock(struct page *page)
}
#endif
-void set_zone_contiguous(struct zone *zone)
-{
- unsigned long block_start_pfn = zone->zone_start_pfn;
- unsigned long block_end_pfn;
-
- block_end_pfn = pageblock_end_pfn(block_start_pfn);
- for (; block_start_pfn < zone_end_pfn(zone);
- block_start_pfn = block_end_pfn,
- block_end_pfn += pageblock_nr_pages) {
-
- block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
- if (!__pageblock_pfn_to_page(block_start_pfn,
- block_end_pfn, zone))
- return;
- cond_resched();
- }
-
- /* We confirm that there is no hole */
- zone->contiguous = true;
-}
-
/*
* Check if a PFN range intersects multiple zones on one or more
* NUMA nodes. Specify the @nid argument if it is known that this
@@ -2311,7 +2301,6 @@ bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
static void __init mem_init_print_info(void);
void __init page_alloc_init_late(void)
{
- struct zone *zone;
int nid;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -2345,9 +2334,6 @@ void __init page_alloc_init_late(void)
for_each_node_state(nid, N_MEMORY)
shuffle_free_memory(NODE_DATA(nid));
- for_each_populated_zone(zone)
- set_zone_contiguous(zone);
-
/* Initialize page ext after all struct pages are initialized. */
if (deferred_struct_pages)
page_ext_init();
--
2.47.3