* [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
@ 2026-04-01 7:01 Yuan Liu
2026-04-02 14:57 ` David Hildenbrand (Arm)
2026-04-04 11:11 ` Mike Rapoport
0 siblings, 2 replies; 5+ messages in thread
From: Yuan Liu @ 2026-04-01 7:01 UTC (permalink / raw)
To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo,
Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
to rebuild zone->contiguous. For large zones this is a significant cost
during memory hotplug and hot-unplug.
Add a new zone member pages_with_online_memmap that tracks the number of
pages within the zone span that have an online memmap (including present
pages and memory holes whose memmap has been initialized). When
spanned_pages == pages_with_online_memmap, the zone is contiguous and
pfn_to_page() can be called on any PFN in the zone span without further
pfn_valid() checks.
Only pages that fall within the current zone span are accounted towards
pages_with_online_memmap. A "too small" value is safe; it merely prevents
detecting a contiguous zone.
The following test cases of memory hotplug for a VM [1], tested in the
environment [2], show that this optimization can significantly reduce the
memory hotplug time [3].
+----------------+------+---------------+--------------+----------------+
| | Size | Time (before) | Time (after) | Time Reduction |
| +------+---------------+--------------+----------------+
| Plug Memory | 256G | 10s | 3s | 70% |
| +------+---------------+--------------+----------------+
| | 512G | 36s | 7s | 81% |
+----------------+------+---------------+--------------+----------------+
+----------------+------+---------------+--------------+----------------+
| | Size | Time (before) | Time (after) | Time Reduction |
| +------+---------------+--------------+----------------+
| Unplug Memory | 256G | 11s | 4s | 64% |
| +------+---------------+--------------+----------------+
| | 512G | 36s | 9s | 75% |
+----------------+------+---------------+--------------+----------------+
[1] Qemu commands to hotplug 256G/512G memory for a VM:
object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
qom-set vmem1 requested-size 256G/512G (Plug Memory)
qom-set vmem1 requested-size 0G (Unplug Memory)
[2] Hardware : Intel Icelake server
Guest Kernel : v7.0-rc4
Qemu : v9.0.0
Launch VM :
qemu-system-x86_64 -accel kvm -cpu host \
-drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
-drive file=./seed.img,format=raw,if=virtio \
-smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
-m 2G,slots=10,maxmem=2052472M \
-device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
-device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
-nographic -machine q35 \
-nic user,hostfwd=tcp::3000-:22
Guest kernel auto-onlines newly added memory blocks:
echo online > /sys/devices/system/memory/auto_online_blocks
[3] The time from typing the QEMU commands in [1] to when the output of
'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
memory is recognized.
Reported-by: Nanhai Zou <nanhai.zou@intel.com>
Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
Tested-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Co-developed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
---
Documentation/mm/physical_memory.rst | 11 +++++
drivers/base/memory.c | 6 +++
include/linux/mmzone.h | 44 +++++++++++++++++++
mm/internal.h | 8 +---
mm/memory_hotplug.c | 12 +-----
mm/mm_init.c | 64 +++++++++++++++++-----------
6 files changed, 102 insertions(+), 43 deletions(-)
diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
index b76183545e5b..e47e96ef6a6d 100644
--- a/Documentation/mm/physical_memory.rst
+++ b/Documentation/mm/physical_memory.rst
@@ -483,6 +483,17 @@ General
``present_pages`` should use ``get_online_mems()`` to get a stable value. It
is initialized by ``calculate_node_totalpages()``.
+``pages_with_online_memmap``
+ Tracks pages within the zone that have an online memmap (present pages and
+ memory holes whose memmap has been initialized). When ``spanned_pages`` ==
+ ``pages_with_online_memmap``, ``pfn_to_page()`` can be performed without
+ further checks on any PFN within the zone span.
+
+ Note: this counter may temporarily undercount when pages with an online
+ memmap exist outside the current zone span. Growing the zone to cover such
+ pages and later shrinking it back may result in a "too small" value. This is
+ safe: it merely prevents detecting a contiguous zone.
+
``present_early_pages``
The present pages existing within the zone located on memory available since
early boot, excluding hotplugged memory. Defined only when
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index a3091924918b..2b6b4e5508af 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,6 +246,7 @@ static int memory_block_online(struct memory_block *mem)
nr_vmemmap_pages = mem->altmap->free;
mem_hotplug_begin();
+ clear_zone_contiguous(zone);
if (nr_vmemmap_pages) {
ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
if (ret)
@@ -270,6 +271,7 @@ static int memory_block_online(struct memory_block *mem)
mem->zone = zone;
out:
+ set_zone_contiguous(zone);
mem_hotplug_done();
return ret;
}
@@ -282,6 +284,7 @@ static int memory_block_offline(struct memory_block *mem)
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
unsigned long nr_vmemmap_pages = 0;
+ struct zone *zone;
int ret;
if (!mem->zone)
@@ -294,7 +297,9 @@ static int memory_block_offline(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;
+ zone = mem->zone;
mem_hotplug_begin();
+ clear_zone_contiguous(zone);
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-nr_vmemmap_pages);
@@ -314,6 +319,7 @@ static int memory_block_offline(struct memory_block *mem)
mem->zone = NULL;
out:
+ set_zone_contiguous(zone);
mem_hotplug_done();
return ret;
}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..011df76a03b6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -943,6 +943,17 @@ struct zone {
* cma pages is present pages that are assigned for CMA use
* (MIGRATE_CMA).
*
+ * pages_with_online_memmap tracks pages in the zone that have an
+ * online memmap (present pages and holes whose memmap was initialized).
+ * When spanned_pages == pages_with_online_memmap, pfn_to_page() can
+ * be performed without further checks on any PFN in the zone span.
+ *
+ * Note: pages_with_online_memmap may temporarily undercount when pages
+ * with an online memmap exist outside the current zone span (e.g., from
+ * init_unavailable_range() during boot). Growing the zone to cover such
+ * pages and later shrinking it back may result in a "too small" value.
+ * This is safe: it merely prevents detecting a contiguous zone.
+ *
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
@@ -967,6 +978,7 @@ struct zone {
atomic_long_t managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
+ unsigned long pages_with_online_memmap;
#if defined(CONFIG_MEMORY_HOTPLUG)
unsigned long present_early_pages;
#endif
@@ -1601,6 +1613,38 @@ static inline bool zone_is_zone_device(const struct zone *zone)
}
#endif
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any PFN in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Note that missing synchronization with memory offlining makes any PFN
+ * traversal prone to races.
+ *
+ * ZONE_DEVICE zones are always marked non-contiguous.
+ *
+ * Return: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+ return zone->contiguous;
+}
+
+static inline void set_zone_contiguous(struct zone *zone)
+{
+ if (zone_is_zone_device(zone))
+ return;
+ if (zone->spanned_pages == zone->pages_with_online_memmap)
+ zone->contiguous = true;
+}
+
+static inline void clear_zone_contiguous(struct zone *zone)
+{
+ zone->contiguous = false;
+}
+
/*
* Returns true if a zone has pages managed by the buddy allocator.
* All the reclaim decisions have to use this function rather than
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..92fee035c3f2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -793,21 +793,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
unsigned long end_pfn, struct zone *zone)
{
- if (zone->contiguous)
+ if (zone_is_contiguous(zone))
return pfn_to_page(start_pfn);
return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}
-void set_zone_contiguous(struct zone *zone);
bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
unsigned long nr_pages);
-static inline void clear_zone_contiguous(struct zone *zone)
-{
- zone->contiguous = false;
-}
-
extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bc805029da51..cd9c89de6ed2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
/*
* Zone shrinking code cannot properly deal with ZONE_DEVICE. So
- * we will not try to shrink the zones - which is okay as
- * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+ * we will not try to shrink the zones.
*/
if (zone_is_zone_device(zone))
return;
- clear_zone_contiguous(zone);
-
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
-
- set_zone_contiguous(zone);
}
/**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
- clear_zone_contiguous(zone);
-
if (zone_is_empty(zone))
init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype,
isolate_pageblock);
-
- set_zone_contiguous(zone);
}
struct auto_movable_stats {
@@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
if (early_section(__pfn_to_section(page_to_pfn(page))))
zone->present_early_pages += nr_pages;
zone->present_pages += nr_pages;
+ zone->pages_with_online_memmap += nr_pages;
zone->zone_pgdat->node_present_pages += nr_pages;
if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index df34797691bd..b8187a22e90e 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
* zone/node above the hole except for the trailing pages in the last
* section that will be appended to the zone/node below.
*/
-static void __init init_unavailable_range(unsigned long spfn,
+static unsigned long __init init_unavailable_range(unsigned long spfn,
unsigned long epfn,
int zone, int node)
{
@@ -858,6 +858,36 @@ static void __init init_unavailable_range(unsigned long spfn,
if (pgcnt)
pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
node, zone_names[zone], pgcnt);
+ return pgcnt;
+}
+
+/*
+ * Initialize unavailable range [spfn, epfn) while accounting only the pages
+ * that fall within the zone span towards pages_with_online_memmap. Pages
+ * outside the zone span are still initialized but not accounted.
+ */
+static void __init init_unavailable_range_for_zone(struct zone *zone,
+ unsigned long spfn,
+ unsigned long epfn)
+{
+ int nid = zone_to_nid(zone);
+ int zid = zone_idx(zone);
+ unsigned long in_zone_start;
+ unsigned long in_zone_end;
+
+ in_zone_start = clamp(spfn, zone->zone_start_pfn, zone_end_pfn(zone));
+ in_zone_end = clamp(epfn, zone->zone_start_pfn, zone_end_pfn(zone));
+
+ if (spfn < in_zone_start)
+ init_unavailable_range(spfn, in_zone_start, zid, nid);
+
+ if (in_zone_start < in_zone_end)
+ zone->pages_with_online_memmap +=
+ init_unavailable_range(in_zone_start, in_zone_end,
+ zid, nid);
+
+ if (in_zone_end < epfn)
+ init_unavailable_range(in_zone_end, epfn, zid, nid);
}
/*
@@ -956,9 +986,10 @@ static void __init memmap_init_zone_range(struct zone *zone,
memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
false);
+ zone->pages_with_online_memmap += end_pfn - start_pfn;
if (*hole_pfn < start_pfn)
- init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+ init_unavailable_range_for_zone(zone, *hole_pfn, start_pfn);
*hole_pfn = end_pfn;
}
@@ -996,8 +1027,11 @@ static void __init memmap_init(void)
#else
end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
#endif
- if (hole_pfn < end_pfn)
- init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
+ if (hole_pfn < end_pfn) {
+ struct zone *zone = &NODE_DATA(nid)->node_zones[zone_id];
+
+ init_unavailable_range_for_zone(zone, hole_pfn, end_pfn);
+ }
}
#ifdef CONFIG_ZONE_DEVICE
@@ -2261,28 +2295,6 @@ void __init init_cma_pageblock(struct page *page)
}
#endif
-void set_zone_contiguous(struct zone *zone)
-{
- unsigned long block_start_pfn = zone->zone_start_pfn;
- unsigned long block_end_pfn;
-
- block_end_pfn = pageblock_end_pfn(block_start_pfn);
- for (; block_start_pfn < zone_end_pfn(zone);
- block_start_pfn = block_end_pfn,
- block_end_pfn += pageblock_nr_pages) {
-
- block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
- if (!__pageblock_pfn_to_page(block_start_pfn,
- block_end_pfn, zone))
- return;
- cond_resched();
- }
-
- /* We confirm that there is no hole */
- zone->contiguous = true;
-}
-
/*
* Check if a PFN range intersects multiple zones on one or more
* NUMA nodes. Specify the @nid argument if it is known that this
--
2.47.3
* Re: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-01 7:01 [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
@ 2026-04-02 14:57 ` David Hildenbrand (Arm)
2026-04-03 10:15 ` Liu, Yuan1
2026-04-04 11:11 ` Mike Rapoport
1 sibling, 1 reply; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-02 14:57 UTC (permalink / raw)
To: Yuan Liu, Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm, Yong Hu, Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen,
Pan Deng, Tianyou Li, Chen Zhang, linux-kernel
On 4/1/26 09:01, Yuan Liu wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> to rebuild zone->contiguous. For large zones this is a significant cost
> during memory hotplug and hot-unplug.
[...]
> diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
> index b76183545e5b..e47e96ef6a6d 100644
> --- a/Documentation/mm/physical_memory.rst
> +++ b/Documentation/mm/physical_memory.rst
> @@ -483,6 +483,17 @@ General
> ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
> is initialized by ``calculate_node_totalpages()``.
>
> +``pages_with_online_memmap``
> + Tracks pages within the zone that have an online memmap (present pages and
> + memory holes whose memmap has been initialized). When ``spanned_pages`` ==
> + ``pages_with_online_memmap``, ``pfn_to_page()`` can be performed without
> + further checks on any PFN within the zone span.
> +
> + Note: this counter may temporarily undercount when pages with an online
> + memmap exist outside the current zone span. Growing the zone to cover such
Maybe add here "This can only happen during boot, when initializing the
memmap of pages that do not fall into any zone span."
> + * we will not try to shrink the zones.
s/zone/it/ ?
[...]
> +
> +/*
> + * Initialize unavailable range [spfn, epfn) while accounting only the pages
> + * that fall within the zone span towards pages_with_online_memmap. Pages
> + * outside the zone span are still initialized but not accounted.
> + */
> +static void __init init_unavailable_range_for_zone(struct zone *zone,
> + unsigned long spfn,
> + unsigned long epfn)
Best to use double tab to fit this into a single line
unsigned long spfn, unsigned long epfn)
^ two tabs
> +{
> + int nid = zone_to_nid(zone);
> + int zid = zone_idx(zone);
Both can be const.
> + unsigned long in_zone_start;
> + unsigned long in_zone_end;
> +
> + in_zone_start = clamp(spfn, zone->zone_start_pfn, zone_end_pfn(zone));
> + in_zone_end = clamp(epfn, zone->zone_start_pfn, zone_end_pfn(zone));
> +
> + if (spfn < in_zone_start)
> + init_unavailable_range(spfn, in_zone_start, zid, nid);
> +
> + if (in_zone_start < in_zone_end)
> + zone->pages_with_online_memmap +=
> + init_unavailable_range(in_zone_start, in_zone_end,
> + zid, nid);
Best to use a temporary variable to make this easier to read.
pgcnt = init_unavailable_range(in_zone_start, ...
You can also exceed 80c a bit if it aids readability.
> +
> + if (in_zone_end < epfn)
> + init_unavailable_range(in_zone_end, epfn, zid, nid);
> }
Only nits, hoping we don't miss anything obvious (or any corner case :) ).
If Mike tells us that we are processing all pages during boot
appropriately, this should work.
Thanks!
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* RE: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-02 14:57 ` David Hildenbrand (Arm)
@ 2026-04-03 10:15 ` Liu, Yuan1
0 siblings, 0 replies; 5+ messages in thread
From: Liu, Yuan1 @ 2026-04-03 10:15 UTC (permalink / raw)
To: David Hildenbrand (Arm), Oscar Salvador, Mike Rapoport, Wei Yang
Cc: linux-mm@kvack.org, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu,
Chen, Yu C, Deng, Pan, Li, Tianyou, Chen Zhang,
linux-kernel@vger.kernel.org
> -----Original Message-----
> From: David Hildenbrand (Arm) <david@kernel.org>
> Sent: Thursday, April 2, 2026 10:57 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>; Oscar Salvador <osalvador@suse.de>;
> Mike Rapoport <rppt@kernel.org>; Wei Yang <richard.weiyang@gmail.com>
> Cc: linux-mm@kvack.org; Hu, Yong <yong.hu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>; Tim Chen <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu
> <qiuxu.zhuo@intel.com>; Chen, Yu C <yu.c.chen@intel.com>; Deng, Pan
> <pan.deng@intel.com>; Li, Tianyou <tianyou.li@intel.com>; Chen Zhang
> <zhangchen.kidd@jd.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous
> check when changing pfn range
>
> On 4/1/26 09:01, Yuan Liu wrote:
> > [...]
> >
> > diff --git a/Documentation/mm/physical_memory.rst
> b/Documentation/mm/physical_memory.rst
> > index b76183545e5b..e47e96ef6a6d 100644
> > --- a/Documentation/mm/physical_memory.rst
> > +++ b/Documentation/mm/physical_memory.rst
> > @@ -483,6 +483,17 @@ General
> > ``present_pages`` should use ``get_online_mems()`` to get a stable
> value. It
> > is initialized by ``calculate_node_totalpages()``.
> >
> > +``pages_with_online_memmap``
> > + Tracks pages within the zone that have an online memmap (present pages and
> > + memory holes whose memmap has been initialized). When ``spanned_pages`` ==
> > + ``pages_with_online_memmap``, ``pfn_to_page()`` can be performed without
> > + further checks on any PFN within the zone span.
> > +
> > + Note: this counter may temporarily undercount when pages with an online
> > + memmap exist outside the current zone span. Growing the zone to cover such
>
> Maybe add here "This can only happen during boot, when initializing the
> memmap of pages that do not fall into any zone span."
I will add it into the next version.
>
> > + * we will not try to shrink the zones.
>
> s/zone/it/ ?
> [...]
Sure
> > +
> > +/*
> > + * Initialize unavailable range [spfn, epfn) while accounting only the pages
> > + * that fall within the zone span towards pages_with_online_memmap. Pages
> > + * outside the zone span are still initialized but not accounted.
> > + */
> > +static void __init init_unavailable_range_for_zone(struct zone *zone,
> > + unsigned long spfn,
> > + unsigned long epfn)
>
> Best to use double tab to fit this into a single line
>
> unsigned long spfn, unsigned long epfn)
>
> ^ two tabs
Thanks, I will refine the coding style here.
> > +{
> > + int nid = zone_to_nid(zone);
> > + int zid = zone_idx(zone);
>
> Both can be const.
Yes, I will fix this in the next version.
> > + unsigned long in_zone_start;
> > + unsigned long in_zone_end;
> > +
> > + in_zone_start = clamp(spfn, zone->zone_start_pfn, zone_end_pfn(zone));
> > + in_zone_end = clamp(epfn, zone->zone_start_pfn, zone_end_pfn(zone));
> > +
> > + if (spfn < in_zone_start)
> > + init_unavailable_range(spfn, in_zone_start, zid, nid);
> > +
> > + if (in_zone_start < in_zone_end)
> > + zone->pages_with_online_memmap +=
> > + init_unavailable_range(in_zone_start, in_zone_end,
> > + zid, nid);
>
> Best to use a temporary variable to make this easier to read.
>
> pgcnt = init_unavailable_range(in_zone_start, ...
>
> You can also exceed 80c a bit if it aids readability.
Indeed, that is a better way.
> > +
> > + if (in_zone_end < epfn)
> > + init_unavailable_range(in_zone_end, epfn, zid, nid);
> > }
>
>
> Only nits, hoping we don't miss anything obvious (or any corner case :) ).
>
> If Mike tells us that we are processing all pages during boot
> appropriately, this should work.
Okay, I will update the next version after Mike’s review, and I really appreciate your comments.
> Thanks!
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>
> --
> Cheers,
>
> David
* Re: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-01 7:01 [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
2026-04-02 14:57 ` David Hildenbrand (Arm)
@ 2026-04-04 11:11 ` Mike Rapoport
2026-04-07 0:59 ` Liu, Yuan1
1 sibling, 1 reply; 5+ messages in thread
From: Mike Rapoport @ 2026-04-04 11:11 UTC (permalink / raw)
To: Yuan Liu
Cc: David Hildenbrand, Oscar Salvador, Wei Yang, linux-mm, Yong Hu,
Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li,
Chen Zhang, linux-kernel
On Wed, Apr 01, 2026 at 03:01:55AM -0400, Yuan Liu wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> to rebuild zone->contiguous. For large zones this is a significant cost
> during memory hotplug and hot-unplug.
...
> diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
> index b76183545e5b..e47e96ef6a6d 100644
> --- a/Documentation/mm/physical_memory.rst
> +++ b/Documentation/mm/physical_memory.rst
> @@ -483,6 +483,17 @@ General
> ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
> is initialized by ``calculate_node_totalpages()``.
>
> +``pages_with_online_memmap``
> + Tracks pages within the zone that have an online memmap (present pages and
Please spell out "memory map" rather than memmap in the documentation and
in the comments.
> + memory holes whose memmap has been initialized). When ``spanned_pages`` ==
> + ``pages_with_online_memmap``, ``pfn_to_page()`` can be performed without
> + further checks on any PFN within the zone span.
> +
> + Note: this counter may temporarily undercount when pages with an online
> + memmap exist outside the current zone span. Growing the zone to cover such
> + pages and later shrinking it back may result in a "too small" value. This is
> + safe: it merely prevents detecting a contiguous zone.
> +
> ``present_early_pages``
> The present pages existing within the zone located on memory available since
> early boot, excluding hotplugged memory. Defined only when
...
> +/*
> + * Initialize unavailable range [spfn, epfn) while accounting only the pages
> + * that fall within the zone span towards pages_with_online_memmap. Pages
> + * outside the zone span are still initialized but not accounted.
> + */
> +static void __init init_unavailable_range_for_zone(struct zone *zone,
> + unsigned long spfn,
> + unsigned long epfn)
> +{
> + int nid = zone_to_nid(zone);
> + int zid = zone_idx(zone);
> + unsigned long in_zone_start;
> + unsigned long in_zone_end;
> +
> + in_zone_start = clamp(spfn, zone->zone_start_pfn, zone_end_pfn(zone));
> + in_zone_end = clamp(epfn, zone->zone_start_pfn, zone_end_pfn(zone));
> +
> + if (spfn < in_zone_start)
> + init_unavailable_range(spfn, in_zone_start, zid, nid);
> +
> + if (in_zone_start < in_zone_end)
> + zone->pages_with_online_memmap +=
> + init_unavailable_range(in_zone_start, in_zone_end,
> + zid, nid);
> +
> + if (in_zone_end < epfn)
> + init_unavailable_range(in_zone_end, epfn, zid, nid);
> }
I think we can make it simpler, see below.
> /*
> @@ -956,9 +986,10 @@ static void __init memmap_init_zone_range(struct zone *zone,
> memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> false);
> + zone->pages_with_online_memmap += end_pfn - start_pfn;
>
> if (*hole_pfn < start_pfn)
> - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> + init_unavailable_range_for_zone(zone, *hole_pfn, start_pfn);
Here *hole_pfn is either inside the zone span or below it, and in the second
case it's enough to adjust the page count returned by init_unavailable_range()
by (zone_start_pfn - *hole_pfn).
> *hole_pfn = end_pfn;
> }
> @@ -996,8 +1027,11 @@ static void __init memmap_init(void)
> #else
> end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
> #endif
> - if (hole_pfn < end_pfn)
> - init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
> + if (hole_pfn < end_pfn) {
> + struct zone *zone = &NODE_DATA(nid)->node_zones[zone_id];
> +
> + init_unavailable_range_for_zone(zone, hole_pfn, end_pfn);
Here we know that the range is not in any zone span.
> + }
> }
>
--
Sincerely yours,
Mike.
* RE: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
2026-04-04 11:11 ` Mike Rapoport
@ 2026-04-07 0:59 ` Liu, Yuan1
0 siblings, 0 replies; 5+ messages in thread
From: Liu, Yuan1 @ 2026-04-07 0:59 UTC (permalink / raw)
To: Mike Rapoport
Cc: David Hildenbrand, Oscar Salvador, Wei Yang, linux-mm@kvack.org,
Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Chen, Yu C,
Deng, Pan, Li, Tianyou, Chen Zhang, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Mike Rapoport <rppt@kernel.org>
> Sent: Saturday, April 4, 2026 7:12 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: David Hildenbrand <david@kernel.org>; Oscar Salvador
> <osalvador@suse.de>; Wei Yang <richard.weiyang@gmail.com>;
> linux-mm@kvack.org; Hu, Yong <yong.hu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>; Tim Chen <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu
> <qiuxu.zhuo@intel.com>; Chen, Yu C <yu.c.chen@intel.com>; Deng, Pan
> <pan.deng@intel.com>; Li, Tianyou <tianyou.li@intel.com>; Chen Zhang
> <zhangchen.kidd@jd.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous
> check when changing pfn range
>
> On Wed, Apr 01, 2026 at 03:01:55AM -0400, Yuan Liu wrote:
> > When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> > zone, set_zone_contiguous() rescans the entire zone pageblock-by-
> pageblock
> > to rebuild zone->contiguous. For large zones this is a significant cost
> > during memory hotplug and hot-unplug.
>
> ...
>
> > diff --git a/Documentation/mm/physical_memory.rst
> b/Documentation/mm/physical_memory.rst
> > index b76183545e5b..e47e96ef6a6d 100644
> > --- a/Documentation/mm/physical_memory.rst
> > +++ b/Documentation/mm/physical_memory.rst
> > @@ -483,6 +483,17 @@ General
> > ``present_pages`` should use ``get_online_mems()`` to get a stable
> value. It
> > is initialized by ``calculate_node_totalpages()``.
> >
> > +``pages_with_online_memmap``
> > + Tracks pages within the zone that have an online memmap (present
> pages and
>
> Please spell out "memory map" rather than "memmap" in the documentation and
> in the comments.
Sure, I will fix it in the next version.
> > + memory holes whose memmap has been initialized). When
> ``spanned_pages`` ==
> > + ``pages_with_online_memmap``, ``pfn_to_page()`` can be performed
> without
> > + further checks on any PFN within the zone span.
> > +
> > + Note: this counter may temporarily undercount when pages with an
> online
> > + memmap exist outside the current zone span. Growing the zone to cover
> such
> > + pages and later shrinking it back may result in a "too small" value.
> This is
> > + safe: it merely prevents detecting a contiguous zone.
> > +
> > ``present_early_pages``
> > The present pages existing within the zone located on memory
> available since
> > early boot, excluding hotplugged memory. Defined only when
>
> ...
>
> > +/*
> > + * Initialize unavailable range [spfn, epfn) while accounting only the
> pages
> > + * that fall within the zone span towards pages_with_online_memmap.
> Pages
> > + * outside the zone span are still initialized but not accounted.
> > + */
> > +static void __init init_unavailable_range_for_zone(struct zone *zone,
> > + unsigned long spfn,
> > + unsigned long epfn)
> > +{
> > + int nid = zone_to_nid(zone);
> > + int zid = zone_idx(zone);
> > + unsigned long in_zone_start;
> > + unsigned long in_zone_end;
> > +
> > + in_zone_start = clamp(spfn, zone->zone_start_pfn,
> zone_end_pfn(zone));
> > + in_zone_end = clamp(epfn, zone->zone_start_pfn, zone_end_pfn(zone));
> > +
> > + if (spfn < in_zone_start)
> > + init_unavailable_range(spfn, in_zone_start, zid, nid);
> > +
> > + if (in_zone_start < in_zone_end)
> > + zone->pages_with_online_memmap +=
> > + init_unavailable_range(in_zone_start, in_zone_end,
> > + zid, nid);
> > +
> > + if (in_zone_end < epfn)
> > + init_unavailable_range(in_zone_end, epfn, zid, nid);
> > }
>
> I think we can make it simpler, see below.
>
> > /*
> > @@ -956,9 +986,10 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> > memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> > zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> > false);
> > + zone->pages_with_online_memmap += end_pfn - start_pfn;
> >
> > if (*hole_pfn < start_pfn)
> > - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > + init_unavailable_range_for_zone(zone, *hole_pfn, start_pfn);
>
> Here *hole_pfn is either inside the zone span or below it, and in the second
> case it's enough to adjust the page count returned by init_unavailable_range()
> by (zone_start_pfn - *hole_pfn).
Got it, I will refine it in the next version.
> > *hole_pfn = end_pfn;
> > }
> > @@ -996,8 +1027,11 @@ static void __init memmap_init(void)
> > #else
> > end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
> > #endif
> > - if (hole_pfn < end_pfn)
> > - init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
> > + if (hole_pfn < end_pfn) {
> > + struct zone *zone = &NODE_DATA(nid)->node_zones[zone_id];
> > +
> > + init_unavailable_range_for_zone(zone, hole_pfn, end_pfn);
>
> Here we know that the range is not in any zone span.
Indeed, the range here does not belong to the zone span.
Thank you for your review.
> > + }
> > }
> >
>
> --
> Sincerely yours,
> Mike.
end of thread, other threads:[~2026-04-07 0:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-01 7:01 [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
2026-04-02 14:57 ` David Hildenbrand (Arm)
2026-04-03 10:15 ` Liu, Yuan1
2026-04-04 11:11 ` Mike Rapoport
2026-04-07 0:59 ` Liu, Yuan1