public inbox for linux-mm@kvack.org
* [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
@ 2026-03-19  9:56 Yuan Liu
  2026-03-19 10:08 ` Liu, Yuan1
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Yuan Liu @ 2026-03-19  9:56 UTC (permalink / raw)
  To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang
  Cc: linux-mm, Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo,
	Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang, linux-kernel

When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked, it
updates zone->contiguous by checking the new zone's pfn range from beginning
to end, regardless of the previous state of the zone. When the zone's pfn
range is large, the cost of traversing the pfn range to update
zone->contiguous can be significant.

Add a new zone member, pages_with_online_memmap: the number of pages within
the zone that have an online memmap. It includes present pages and memory
holes that have a memmap. When spanned_pages == pages_with_online_memmap,
pfn_to_page() can be performed without further checks on any pfn within the
zone span.

The following test cases of memory hotplug for a VM [1], tested in the
environment [2], show that this optimization can significantly reduce the
memory hotplug time [3].

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Plug Memory    | 256G |      10s      |      3s      |       70%      |
|                +------+---------------+--------------+----------------+
|                | 512G |      36s      |      7s      |       81%      |
+----------------+------+---------------+--------------+----------------+

+----------------+------+---------------+--------------+----------------+
|                | Size | Time (before) | Time (after) | Time Reduction |
|                +------+---------------+--------------+----------------+
| Unplug Memory  | 256G |      11s      |      4s      |       64%      |
|                +------+---------------+--------------+----------------+
|                | 512G |      36s      |      9s      |       75%      |
+----------------+------+---------------+--------------+----------------+

[1] Qemu commands to hotplug 256G/512G memory for a VM:
    object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
    device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
    qom-set vmem1 requested-size 256G/512G (Plug Memory)
    qom-set vmem1 requested-size 0G (Unplug Memory)

[2] Hardware     : Intel Icelake server
    Guest Kernel : v7.0-rc4
    Qemu         : v9.0.0

    Launch VM    :
    qemu-system-x86_64 -accel kvm -cpu host \
    -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
    -drive file=./seed.img,format=raw,if=virtio \
    -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
    -m 2G,slots=10,maxmem=2052472M \
    -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
    -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
    -nographic -machine q35 \
    -nic user,hostfwd=tcp::3000-:22

    Guest kernel auto-onlines newly added memory blocks:
    echo online > /sys/devices/system/memory/auto_online_blocks

[3] The time from typing the QEMU commands in [1] to when the output of
    'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
    memory is recognized.

Reported-by: Nanhai Zou <nanhai.zou@intel.com>
Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
Tested-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
Co-developed-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
---
 Documentation/mm/physical_memory.rst |  6 +++++
 include/linux/mmzone.h               | 22 ++++++++++++++-
 mm/internal.h                        | 10 +++----
 mm/memory_hotplug.c                  | 21 +++++----------
 mm/mm_init.c                         | 40 +++++++++-------------------
 5 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
index b76183545e5b..d324da29ac11 100644
--- a/Documentation/mm/physical_memory.rst
+++ b/Documentation/mm/physical_memory.rst
@@ -483,6 +483,12 @@ General
   ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
   is initialized by ``calculate_node_totalpages()``.
 
+``pages_with_online_memmap``
+  The number of pages within the zone that have an online memmap. It includes
+  present pages and memory holes that have a memmap. When spanned_pages ==
+  pages_with_online_memmap, pfn_to_page() can be performed without further
+  checks on any pfn within the zone span.
+
 ``present_early_pages``
   The present pages existing within the zone located on memory available since
   early boot, excluding hotplugged memory. Defined only when
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..c7a136ce55c7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -943,6 +943,11 @@ struct zone {
 	 * cma pages is present pages that are assigned for CMA use
 	 * (MIGRATE_CMA).
 	 *
+	 * pages_with_online_memmap is the number of pages in the zone that have
+	 * an online memmap: present pages plus memory holes with a memmap.
+	 * When spanned_pages == pages_with_online_memmap, pfn_to_page() can be
+	 * performed without further checks on any pfn within the zone span.
+	 *
 	 * So present_pages may be used by memory hotplug or memory power
 	 * management logic to figure out unmanaged pages by checking
 	 * (present_pages - managed_pages). And managed_pages should be used
@@ -967,6 +972,7 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+	unsigned long		pages_with_online_memmap;
 #if defined(CONFIG_MEMORY_HOTPLUG)
 	unsigned long		present_early_pages;
 #endif
@@ -1051,7 +1057,6 @@ struct zone {
 	bool			compact_blockskip_flush;
 #endif
 
-	bool			contiguous;
 
 	CACHELINE_PADDING(_pad3_);
 	/* Zone statistics */
@@ -1124,6 +1129,21 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
 	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
 }
 
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Returns: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+	return READ_ONCE(zone->spanned_pages) ==
+		READ_ONCE(zone->pages_with_online_memmap);
+}
+
 static inline bool zone_is_initialized(const struct zone *zone)
 {
 	return zone->initialized;
diff --git a/mm/internal.h b/mm/internal.h
index cb0af847d7d9..7c4c8ab68bde 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -793,21 +793,17 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 				unsigned long end_pfn, struct zone *zone)
 {
-	if (zone->contiguous)
+	if (zone_is_contiguous(zone) && zone_spans_pfn(zone, start_pfn)) {
+		VM_BUG_ON(end_pfn > zone_end_pfn(zone));
 		return pfn_to_page(start_pfn);
+	}
 
 	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
 }
 
-void set_zone_contiguous(struct zone *zone);
 bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
 			   unsigned long nr_pages);
 
-static inline void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bc805029da51..2ba7a394a64b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
 						zone_end_pfn(zone));
 		if (pfn) {
-			zone->spanned_pages = zone_end_pfn(zone) - pfn;
+			WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
 			zone->zone_start_pfn = pfn;
 		} else {
 			zone->zone_start_pfn = 0;
-			zone->spanned_pages = 0;
+			WRITE_ONCE(zone->spanned_pages, 0);
 		}
 	} else if (zone_end_pfn(zone) == end_pfn) {
 		/*
@@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
 					       start_pfn);
 		if (pfn)
-			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+			WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1);
 		else {
 			zone->zone_start_pfn = 0;
-			zone->spanned_pages = 0;
+			WRITE_ONCE(zone->spanned_pages, 0);
 		}
 	}
 }
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
 
 	/*
 	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
-	 * we will not try to shrink the zones - which is okay as
-	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+	 * we will not try to shrink the zones.
 	 */
 	if (zone_is_zone_device(zone))
 		return;
 
-	clear_zone_contiguous(zone);
-
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
-
-	set_zone_contiguous(zone);
 }
 
 /**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	int nid = pgdat->node_id;
 
-	clear_zone_contiguous(zone);
-
 	if (zone_is_empty(zone))
 		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
 			 MEMINIT_HOTPLUG, altmap, migratetype,
 			 isolate_pageblock);
-
-	set_zone_contiguous(zone);
 }
 
 struct auto_movable_stats {
@@ -1079,6 +1070,8 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 	if (early_section(__pfn_to_section(page_to_pfn(page))))
 		zone->present_early_pages += nr_pages;
 	zone->present_pages += nr_pages;
+	WRITE_ONCE(zone->pages_with_online_memmap,
+		READ_ONCE(zone->pages_with_online_memmap) + nr_pages);
 	zone->zone_pgdat->node_present_pages += nr_pages;
 
 	if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index df34797691bd..96690e550024 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
 	unsigned long zone_start_pfn = zone->zone_start_pfn;
 	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+	unsigned long zone_hole_start, zone_hole_end;
 
 	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
 	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
@@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
 			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
 			  false);
 
-	if (*hole_pfn < start_pfn)
+	WRITE_ONCE(zone->pages_with_online_memmap,
+		   READ_ONCE(zone->pages_with_online_memmap) +
+		   (end_pfn - start_pfn));
+
+	if (*hole_pfn < start_pfn) {
 		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+		zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
+		zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
+		if (zone_hole_start < zone_hole_end)
+			WRITE_ONCE(zone->pages_with_online_memmap,
+				   READ_ONCE(zone->pages_with_online_memmap) +
+				   (zone_hole_end - zone_hole_start));
+	}
 
 	*hole_pfn = end_pfn;
 }
@@ -2261,28 +2273,6 @@ void __init init_cma_pageblock(struct page *page)
 }
 #endif
 
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-
-	block_end_pfn = pageblock_end_pfn(block_start_pfn);
-	for (; block_start_pfn < zone_end_pfn(zone);
-			block_start_pfn = block_end_pfn,
-			 block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					     block_end_pfn, zone))
-			return;
-		cond_resched();
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
 /*
  * Check if a PFN range intersects multiple zones on one or more
  * NUMA nodes. Specify the @nid argument if it is known that this
@@ -2311,7 +2301,6 @@ bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
 static void __init mem_init_print_info(void);
 void __init page_alloc_init_late(void)
 {
-	struct zone *zone;
 	int nid;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -2345,9 +2334,6 @@ void __init page_alloc_init_late(void)
 	for_each_node_state(nid, N_MEMORY)
 		shuffle_free_memory(NODE_DATA(nid));
 
-	for_each_populated_zone(zone)
-		set_zone_contiguous(zone);
-
 	/* Initialize page ext after all struct pages are initialized. */
 	if (deferred_struct_pages)
 		page_ext_init();
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-19  9:56 [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
@ 2026-03-19 10:08 ` Liu, Yuan1
  2026-03-20  3:13 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Liu, Yuan1 @ 2026-03-19 10:08 UTC (permalink / raw)
  To: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang
  Cc: linux-mm@kvack.org, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu,
	Chen, Yu C, Deng, Pan, Li, Tianyou, Chen Zhang,
	linux-kernel@vger.kernel.org

Hi David & Mike

I merged this patch into v6.19-rc8 for validation and observed that unplugging 256 GB takes 3 seconds, while unplugging 512 GB takes 7 seconds. Based on these numbers, I believe the performance regression in memory unplug is not caused by this patch.

Best Regards,
Liu, Yuan1

> -----Original Message-----
> From: Liu, Yuan1 <yuan1.liu@intel.com>
> Sent: Thursday, March 19, 2026 5:56 PM
> To: David Hildenbrand <david@kernel.org>; Oscar Salvador
> <osalvador@suse.de>; Mike Rapoport <rppt@kernel.org>; Wei Yang
> <richard.weiyang@gmail.com>
> Cc: linux-mm@kvack.org; Hu, Yong <yong.hu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>; Liu, Yuan1 <yuan1.liu@intel.com>; Tim Chen
> <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu <qiuxu.zhuo@intel.com>; Chen, Yu
> C <yu.c.chen@intel.com>; Deng, Pan <pan.deng@intel.com>; Li, Tianyou
> <tianyou.li@intel.com>; Chen Zhang <zhangchen.kidd@jd.com>; linux-
> kernel@vger.kernel.org
> Subject: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check
> when changing pfn range




* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-19  9:56 [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
  2026-03-19 10:08 ` Liu, Yuan1
@ 2026-03-20  3:13 ` Andrew Morton
  2026-03-23 10:56 ` David Hildenbrand (Arm)
  2026-03-23 11:51 ` Mike Rapoport
  3 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2026-03-20  3:13 UTC (permalink / raw)
  To: Yuan Liu
  Cc: David Hildenbrand, Oscar Salvador, Mike Rapoport, Wei Yang,
	linux-mm, Yong Hu, Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen,
	Pan Deng, Tianyou Li, Chen Zhang, linux-kernel

On Thu, 19 Mar 2026 05:56:22 -0400 Yuan Liu <yuan1.liu@intel.com> wrote:

> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
> it updates zone->contiguous by checking the new zone's pfn range from
> beginning to end, regardless of the previous state of the zone. When the
> zone's pfn range is large, the cost of traversing the pfn range to update
> zone->contiguous can be significant.
> 
> Add a new zone member, pages_with_online_memmap: the number of pages within
> the zone that have an online memmap. It includes present pages and memory
> holes that have a memmap. When spanned_pages == pages_with_online_memmap,
> pfn_to_page() can be performed without further checks on any pfn within the
> zone span.

AI review asks questions:
	https://sashiko.dev/#/patchset/20260319095622.1130380-1-yuan1.liu%40intel.com



* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-19  9:56 [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
  2026-03-19 10:08 ` Liu, Yuan1
  2026-03-20  3:13 ` Andrew Morton
@ 2026-03-23 10:56 ` David Hildenbrand (Arm)
  2026-03-23 11:31   ` Mike Rapoport
  2026-03-26  3:39   ` Liu, Yuan1
  2026-03-23 11:51 ` Mike Rapoport
  3 siblings, 2 replies; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 10:56 UTC (permalink / raw)
  To: Yuan Liu, Oscar Salvador, Mike Rapoport, Wei Yang
  Cc: linux-mm, Yong Hu, Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen,
	Pan Deng, Tianyou Li, Chen Zhang, linux-kernel

On 3/19/26 10:56, Yuan Liu wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
> it updates zone->contiguous by checking the new zone's pfn range from
> beginning to end, regardless of the previous state of the zone. When the
> zone's pfn range is large, the cost of traversing the pfn range to update
> zone->contiguous can be significant.
> 
> Add a new zone member, pages_with_online_memmap: the number of pages within
> the zone that have an online memmap. It includes present pages and memory
> holes that have a memmap. When spanned_pages == pages_with_online_memmap,
> pfn_to_page() can be performed without further checks on any pfn within the
> zone span.
> 
> The following test cases of memory hotplug for a VM [1], tested in the
> environment [2], show that this optimization can significantly reduce the
> memory hotplug time [3].
> 
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Plug Memory    | 256G |      10s      |      3s      |       70%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      36s      |      7s      |       81%      |
> +----------------+------+---------------+--------------+----------------+
> 
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Unplug Memory  | 256G |      11s      |      4s      |       64%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      36s      |      9s      |       75%      |
> +----------------+------+---------------+--------------+----------------+
> 
> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>     object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>     device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>     qom-set vmem1 requested-size 256G/512G (Plug Memory)
>     qom-set vmem1 requested-size 0G (Unplug Memory)
> 
> [2] Hardware     : Intel Icelake server
>     Guest Kernel : v7.0-rc4
>     Qemu         : v9.0.0
> 
>     Launch VM    :
>     qemu-system-x86_64 -accel kvm -cpu host \
>     -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>     -drive file=./seed.img,format=raw,if=virtio \
>     -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>     -m 2G,slots=10,maxmem=2052472M \
>     -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>     -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>     -nographic -machine q35 \
>     -nic user,hostfwd=tcp::3000-:22
> 
>     Guest kernel auto-onlines newly added memory blocks:
>     echo online > /sys/devices/system/memory/auto_online_blocks
> 
> [3] The time from typing the QEMU commands in [1] to when the output of
>     'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>     memory is recognized.
> 
> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
> Tested-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
> Reviewed-by: Pan Deng <pan.deng@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> Reviewed-by: Yuan Liu <yuan1.liu@intel.com>
> Co-developed-by: Tianyou Li <tianyou.li@intel.com>
> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> ---


[...]

>  
> +/**
> + * zone_is_contiguous - test whether a zone is contiguous
> + * @zone: the zone to test.
> + *
> + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
> + * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.

I think there is a small catch to it: users should protect against concurrent
memory offlining. I recall that, for compaction, there either was some
protection in place or the race window was effectively impossible to hit.

Maybe we should add here for completeness: "Note that missing synchronization
with memory offlining makes any PFN traversal prone to races."

> + *
> + * Returns: true if contiguous, otherwise false.
> + */
> +static inline bool zone_is_contiguous(const struct zone *zone)
> +{
> +	return READ_ONCE(zone->spanned_pages) ==
> +		READ_ONCE(zone->pages_with_online_memmap);

	       ^ should be vertically aligned

> +}
> +
>  static inline bool zone_is_initialized(const struct zone *zone)
>  {
>  	return zone->initialized;
> diff --git a/mm/internal.h b/mm/internal.h
> index cb0af847d7d9..7c4c8ab68bde 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -793,21 +793,17 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  				unsigned long end_pfn, struct zone *zone)
>  {
> -	if (zone->contiguous)
> +	if (zone_is_contiguous(zone) && zone_spans_pfn(zone, start_pfn)) {

Do we really need the zone_spans_pfn() check? The caller must make sure that the
zone spans the PFN range before calling this function.
Compaction does that by walking only PFNs in the range.


The old "if (zone->contiguous)" check also expected a caller to handle that.

> +		VM_BUG_ON(end_pfn > zone_end_pfn(zone));

No VM_BUG_ONs please. But I think we can also drop this.

>  		return pfn_to_page(start_pfn);
> +	}
>  
>  	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>  }
>  
> -void set_zone_contiguous(struct zone *zone);
>  bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
>  			   unsigned long nr_pages);
>  
> -static inline void clear_zone_contiguous(struct zone *zone)
> -{
> -	zone->contiguous = false;
> -}
> -
>  extern int __isolate_free_page(struct page *page, unsigned int order);
>  extern void __putback_isolated_page(struct page *page, unsigned int order,
>  				    int mt);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index bc805029da51..2ba7a394a64b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>  		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
>  						zone_end_pfn(zone));
>  		if (pfn) {
> -			zone->spanned_pages = zone_end_pfn(zone) - pfn;
> +			WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
>  			zone->zone_start_pfn = pfn;
>  		} else {
>  			zone->zone_start_pfn = 0;
> -			zone->spanned_pages = 0;
> +			WRITE_ONCE(zone->spanned_pages, 0);
>  		}
>  	} else if (zone_end_pfn(zone) == end_pfn) {
>  		/*
> @@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>  		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
>  					       start_pfn);
>  		if (pfn)
> -			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
> +			WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1);
>  		else {
>  			zone->zone_start_pfn = 0;
> -			zone->spanned_pages = 0;
> +			WRITE_ONCE(zone->spanned_pages, 0);
>  		}
>  	}
>  }

As the AI review points out, we should also make sure that
resize_zone_range() updates it with a WRITE_ONCE().

But I am starting to wonder whether, as a first step, we should leave
the zone->contiguous bool in place. Then we have to worry less about
reordering of reads/writes of spanned_pages vs. pages_with_online_memmap.

See below

[...]

> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index df34797691bd..96690e550024 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
>  	unsigned long zone_start_pfn = zone->zone_start_pfn;
>  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> +	unsigned long zone_hole_start, zone_hole_end;
>  
>  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
>  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
>  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>  			  false);
>  
> -	if (*hole_pfn < start_pfn)
> +	WRITE_ONCE(zone->pages_with_online_memmap,
> +		   READ_ONCE(zone->pages_with_online_memmap) +
> +		   (end_pfn - start_pfn));
> +
> +	if (*hole_pfn < start_pfn) {
>  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
> +		zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> +		if (zone_hole_start < zone_hole_end)
> +			WRITE_ONCE(zone->pages_with_online_memmap,
> +				   READ_ONCE(zone->pages_with_online_memmap) +
> +				   (zone_hole_end - zone_hole_start));
> +	}

The range can have larger holes without a memmap, and I think we would be
missing pages handled by the other init_unavailable_range() call?


There is one question for Mike, though: couldn't it happen that the
init_unavailable_range() call in memmap_init() would initialize
the memmap outside of the node/zone span? If so, I wonder whether we
would want to adjust the node+zone span to include these ranges.

Later memory onlining could make these ranges suddenly fall into the
node/zone span.

So that requires some thought.


Maybe we should start with this (untested):

From a73ee44bc93fbcb9cf2b995e27fb98c68415f7be Mon Sep 17 00:00:00 2001
From: Yuan Liu <yuan1.liu@intel.com>
Date: Thu, 19 Mar 2026 05:56:22 -0400
Subject: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when
 changing pfn range

[...]
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 Documentation/mm/physical_memory.rst |  6 ++++
 drivers/base/memory.c                |  5 ++++
 include/linux/mmzone.h               | 38 +++++++++++++++++++++++++
 mm/internal.h                        |  8 +-----
 mm/memory_hotplug.c                  | 12 ++------
 mm/mm_init.c                         | 42 ++++++++++------------------
 6 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
index 2398d87ac156..e4e188cd4887 100644
--- a/Documentation/mm/physical_memory.rst
+++ b/Documentation/mm/physical_memory.rst
@@ -483,6 +483,12 @@ General
   ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
   is initialized by ``calculate_node_totalpages()``.
 
+``pages_with_online_memmap``
+  The number of pages within the zone that have an online memmap. It
+  includes present pages and memory holes that have a memmap. When
+  spanned_pages == pages_with_online_memmap, pfn_to_page() can be performed
+  without further checks on any pfn within the zone span.
+
 ``present_early_pages``
   The present pages existing within the zone located on memory available since
   early boot, excluding hotplugged memory. Defined only when
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5380050b16b7..a367dde6e6fa 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,6 +246,7 @@ static int memory_block_online(struct memory_block *mem)
 		nr_vmemmap_pages = mem->altmap->free;
 
 	mem_hotplug_begin();
+	clear_zone_contiguous(zone);
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -270,6 +271,7 @@ static int memory_block_online(struct memory_block *mem)
 
 	mem->zone = zone;
 out:
+	set_zone_contiguous(zone);
 	mem_hotplug_done();
 	return ret;
 }
@@ -295,6 +297,8 @@ static int memory_block_offline(struct memory_block *mem)
 		nr_vmemmap_pages = mem->altmap->free;
 
 	mem_hotplug_begin();
+	clear_zone_contiguous(mem->zone);
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -314,6 +318,7 @@ static int memory_block_offline(struct memory_block *mem)
 
 	mem->zone = NULL;
 out:
+	set_zone_contiguous(mem->zone);
 	mem_hotplug_done();
 	return ret;
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e11513f581eb..463376349a2c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1029,6 +1029,11 @@ struct zone {
 	 * cma pages is present pages that are assigned for CMA use
 	 * (MIGRATE_CMA).
 	 *
+	 * pages_with_online_memmap is the number of pages within the zone that
+	 * have an online memmap, including present pages and memory holes that
+	 * have a memmap. When spanned_pages == pages_with_online_memmap,
+	 * pfn_to_page() works without further checks on any pfn in the zone span.
+	 *
 	 * So present_pages may be used by memory hotplug or memory power
 	 * management logic to figure out unmanaged pages by checking
 	 * (present_pages - managed_pages). And managed_pages should be used
@@ -1053,6 +1058,7 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+	unsigned long		pages_with_online_memmap;
 #if defined(CONFIG_MEMORY_HOTPLUG)
 	unsigned long		present_early_pages;
 #endif
@@ -1710,6 +1716,38 @@ static inline bool populated_zone(const struct zone *zone)
 	return zone->present_pages;
 }
 
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Note that missing synchronization with memory offlining makes any
+ * PFN traversal prone to races.
+ *
+ * ZONE_DEVICE zones are always marked non-contiguous.
+ *
+ * Returns: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+	return zone->contiguous;
+}
+
+static inline void set_zone_contiguous(struct zone *zone)
+{
+	if (!zone || zone_is_zone_device(zone))
+		return;
+	if (zone->spanned_pages == zone->pages_with_online_memmap)
+		zone->contiguous = true;
+}
+
+static inline void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
+}
+
 #ifdef CONFIG_NUMA
 static inline int zone_to_nid(const struct zone *zone)
 {
diff --git a/mm/internal.h b/mm/internal.h
index 532d78febf91..faec50e55a30 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -816,21 +816,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 				unsigned long end_pfn, struct zone *zone)
 {
-	if (zone->contiguous)
+	if (zone_is_contiguous(zone))
 		return pfn_to_page(start_pfn);
 
 	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
 }
 
-void set_zone_contiguous(struct zone *zone);
 bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
 			   unsigned long nr_pages);
 
-static inline void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70e620496cec..f29c0d70c970 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -558,18 +558,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
 
 	/*
 	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
-	 * we will not try to shrink the zones - which is okay as
-	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+	 * we will not try to shrink the zones.
 	 */
 	if (zone_is_zone_device(zone))
 		return;
 
-	clear_zone_contiguous(zone);
-
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
-
-	set_zone_contiguous(zone);
 }
 
 /**
@@ -746,8 +741,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	int nid = pgdat->node_id;
 
-	clear_zone_contiguous(zone);
-
 	if (zone_is_empty(zone))
 		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
@@ -775,8 +768,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
 			 MEMINIT_HOTPLUG, altmap, migratetype,
 			 isolate_pageblock);
-
-	set_zone_contiguous(zone);
 }
 
 struct auto_movable_stats {
@@ -1072,6 +1063,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 	if (early_section(__pfn_to_section(page_to_pfn(page))))
 		zone->present_early_pages += nr_pages;
 	zone->present_pages += nr_pages;
+	zone->pages_with_online_memmap += nr_pages;
 	zone->zone_pgdat->node_present_pages += nr_pages;
 
 	if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index e0f1e36cb9e4..6e5a8da7cdda 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -854,7 +854,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
  *   zone/node above the hole except for the trailing pages in the last
  *   section that will be appended to the zone/node below.
  */
-static void __init init_unavailable_range(unsigned long spfn,
+static unsigned long __init init_unavailable_range(unsigned long spfn,
 					  unsigned long epfn,
 					  int zone, int node)
 {
@@ -870,6 +870,7 @@ static void __init init_unavailable_range(unsigned long spfn,
 	if (pgcnt)
 		pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
 			node, zone_names[zone], pgcnt);
+	return pgcnt;
 }
 
 /*
@@ -958,6 +959,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
 	unsigned long zone_start_pfn = zone->zone_start_pfn;
 	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+	unsigned long hole_pfns;
 
 	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
 	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
@@ -968,9 +970,12 @@ static void __init memmap_init_zone_range(struct zone *zone,
 	memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
 			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
 			  false);
+	zone->pages_with_online_memmap = end_pfn - start_pfn;
 
-	if (*hole_pfn < start_pfn)
-		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+	if (*hole_pfn < start_pfn) {
+		hole_pfns = init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+		zone->pages_with_online_memmap += hole_pfns;
+	}
 
 	*hole_pfn = end_pfn;
 }
@@ -980,6 +985,7 @@ static void __init memmap_init(void)
 	unsigned long start_pfn, end_pfn;
 	unsigned long hole_pfn = 0;
 	int i, j, zone_id = 0, nid;
+	unsigned long hole_pfns;
 
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
 		struct pglist_data *node = NODE_DATA(nid);
@@ -1008,8 +1014,12 @@ static void __init memmap_init(void)
 #else
 	end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
 #endif
-	if (hole_pfn < end_pfn)
-		init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
+	if (hole_pfn < end_pfn) {
+		struct zone *zone = &NODE_DATA(nid)->node_zones[zone_id];
+
+		hole_pfns = init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
+		zone->pages_with_online_memmap += hole_pfns;
+	}
 }
 
 #ifdef CONFIG_ZONE_DEVICE
@@ -2273,28 +2283,6 @@ void __init init_cma_pageblock(struct page *page)
 }
 #endif
 
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-
-	block_end_pfn = pageblock_end_pfn(block_start_pfn);
-	for (; block_start_pfn < zone_end_pfn(zone);
-			block_start_pfn = block_end_pfn,
-			 block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					     block_end_pfn, zone))
-			return;
-		cond_resched();
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
 /*
  * Check if a PFN range intersects multiple zones on one or more
  * NUMA nodes. Specify the @nid argument if it is known that this
-- 
2.43.0


-- 
Cheers,

David


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-23 10:56 ` David Hildenbrand (Arm)
@ 2026-03-23 11:31   ` Mike Rapoport
  2026-03-23 11:42     ` David Hildenbrand (Arm)
  2026-03-26  3:39   ` Liu, Yuan1
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Rapoport @ 2026-03-23 11:31 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Yuan Liu, Oscar Salvador, Wei Yang, linux-mm, Yong Hu, Nanhai Zou,
	Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang,
	linux-kernel

On Mon, Mar 23, 2026 at 11:56:35AM +0100, David Hildenbrand (Arm) wrote:
> On 3/19/26 10:56, Yuan Liu wrote:

...

> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index df34797691bd..96690e550024 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
> >  	unsigned long zone_start_pfn = zone->zone_start_pfn;
> >  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
> >  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> > +	unsigned long zone_hole_start, zone_hole_end;
> >  
> >  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> >  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> > @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
> >  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> >  			  false);
> >  
> > -	if (*hole_pfn < start_pfn)
> > +	WRITE_ONCE(zone->pages_with_online_memmap,
> > +		   READ_ONCE(zone->pages_with_online_memmap) +
> > +		   (end_pfn - start_pfn));
> > +
> > +	if (*hole_pfn < start_pfn) {
> >  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
> > +		zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> > +		if (zone_hole_start < zone_hole_end)
> > +			WRITE_ONCE(zone->pages_with_online_memmap,
> > +				   READ_ONCE(zone->pages_with_online_memmap) +
> > +				   (zone_hole_end - zone_hole_start));
> > +	}
> 
> The range can have larger holes without a memmap, and I think we would be
> missing pages handled by the other init_unavailable_range() call?
> 
> 
> There is one question for Mike, though: couldn't it happen that the
> init_unavailable_range() call in memmap_init() would initialize
> the memmap outside of the node/zone span? 

Yes, and it most likely will.

A very common example is page 0 on x86 systems:

[    0.012196]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.012221] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.012205] Early memory node ranges
[    0.012206]   node   0: [mem 0x0000000000001000-0x000000000009efff]
 
The unavailable page in zone DMA is the page from 0x0 to 0x1000 that is
neither in node 0 nor in zone DMA.

For ZONE_NORMAL it would be a more pathological case where the zone/node span
ends in the middle of a section, but that's still possible.
 
> If so, I wonder whether we would want to adjust the node+zone span to
> include these ranges.
> 
> Later memory onlining could make these ranges suddenly fall into the
> node/zone span.

But doesn't memory onlining always happen at section boundaries?
 
> So that requires some thought.
> 
> 
> Maybe we should start with this (untested):
> 
> [...]
> -					     block_end_pfn, zone))
> -			return;
> -		cond_resched();
> -	}
> -
> -	/* We confirm that there is no hole */
> -	zone->contiguous = true;
> -}
> -
>  /*
>   * Check if a PFN range intersects multiple zones on one or more
>   * NUMA nodes. Specify the @nid argument if it is known that this
> -- 
> 2.43.0
> 
> 
> -- 
> Cheers,
> 
> David

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-23 11:31   ` Mike Rapoport
@ 2026-03-23 11:42     ` David Hildenbrand (Arm)
  2026-03-26  7:30       ` Liu, Yuan1
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 11:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Yuan Liu, Oscar Salvador, Wei Yang, linux-mm, Yong Hu, Nanhai Zou,
	Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li, Chen Zhang,
	linux-kernel

On 3/23/26 12:31, Mike Rapoport wrote:
> On Mon, Mar 23, 2026 at 11:56:35AM +0100, David Hildenbrand (Arm) wrote:
>> On 3/19/26 10:56, Yuan Liu wrote:
> 
> ...
> 
>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>> index df34797691bd..96690e550024 100644
>>> --- a/mm/mm_init.c
>>> +++ b/mm/mm_init.c
>>> @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
>>>  	unsigned long zone_start_pfn = zone->zone_start_pfn;
>>>  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>>>  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
>>> +	unsigned long zone_hole_start, zone_hole_end;
>>>  
>>>  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
>>>  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
>>> @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
>>>  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>>>  			  false);
>>>  
>>> -	if (*hole_pfn < start_pfn)
>>> +	WRITE_ONCE(zone->pages_with_online_memmap,
>>> +		   READ_ONCE(zone->pages_with_online_memmap) +
>>> +		   (end_pfn - start_pfn));
>>> +
>>> +	if (*hole_pfn < start_pfn) {
>>>  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
>>> +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
>>> +		zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
>>> +		if (zone_hole_start < zone_hole_end)
>>> +			WRITE_ONCE(zone->pages_with_online_memmap,
>>> +				   READ_ONCE(zone->pages_with_online_memmap) +
>>> +				   (zone_hole_end - zone_hole_start));
>>> +	}
>>
>> The range can have larger holes without a memmap, and I think we would be
>> missing pages handled by the other init_unavailable_range() call?
>>
>>
>> There is one question for Mike, though: couldn't it happen that the
>> init_unavailable_range() call in memmap_init() would initialize
>> the memmap outside of the node/zone span? 
> 
> Yes, and it most likely will.
> 
> Very common example is page 0 on x86 systems:
> 
> [    0.012196]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> [    0.012221] On node 0, zone DMA: 1 pages in unavailable ranges
> [    0.012205] Early memory node ranges
> [    0.012206]   node   0: [mem 0x0000000000001000-0x000000000009efff]
>  
> The unavailable page in zone DMA is the page from  0x0 to 0x1000 that is
> neither in node 0 nor in zone DMA.
> 
> For ZONE_NORMAL it would be a more pathological case when zone/node span
> ends in a middle of a section, but that's still possible.
>  
>> If so, I wonder whether we would want to adjust the node+zone space to
>> include these ranges.
>>
>> Later memory onlining could make these ranges suddenly fall into the
>> node/zone span.
> 
> But doesn't memory onlining always happen at section boundaries?

Sure, but assume ZONE_NORMAL ends in the middle of a section, and then
you hotplug the next section.

Then, the zone spans that memmap. zone->pages_with_online_memmap will be
wrong.

Once we unplug the hotplugged section, the zone shrinking code will
stumble over the hole PFNs and assume they belong to the zone.
zone->pages_with_online_memmap will be wrong.

zone->pages_with_online_memmap being wrong means that it is smaller than
it should be. I guess it would not be broken, but we would fail to detect
contiguous zones.

If there would be an easy way to avoid that, that would be cleaner.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-19  9:56 [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
                   ` (2 preceding siblings ...)
  2026-03-23 10:56 ` David Hildenbrand (Arm)
@ 2026-03-23 11:51 ` Mike Rapoport
  2026-03-26  7:32   ` Liu, Yuan1
  3 siblings, 1 reply; 13+ messages in thread
From: Mike Rapoport @ 2026-03-23 11:51 UTC (permalink / raw)
  To: Yuan Liu
  Cc: David Hildenbrand, Oscar Salvador, Wei Yang, linux-mm, Yong Hu,
	Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li,
	Chen Zhang, linux-kernel

Hi,

On Thu, Mar 19, 2026 at 05:56:22AM -0400, Yuan Liu wrote:

...

> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index df34797691bd..96690e550024 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
>  	unsigned long zone_start_pfn = zone->zone_start_pfn;
>  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> +	unsigned long zone_hole_start, zone_hole_end;
>  
>  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
>  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
>  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>  			  false);
>  
> -	if (*hole_pfn < start_pfn)
> +	WRITE_ONCE(zone->pages_with_online_memmap,
> +		   READ_ONCE(zone->pages_with_online_memmap) +
> +		   (end_pfn - start_pfn));
> +
> +	if (*hole_pfn < start_pfn) {
>  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
> +		zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> +		if (zone_hole_start < zone_hole_end)
> +			WRITE_ONCE(zone->pages_with_online_memmap,
> +				   READ_ONCE(zone->pages_with_online_memmap) +
> +				   (zone_hole_end - zone_hole_start));
> +	}

I didn't have time to review it, but this really jumped out at me.
memmap_init_zone_range() runs before SMP is up, so there is no need for
WRITE_ONCE()/READ_ONCE() here.

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-23 10:56 ` David Hildenbrand (Arm)
  2026-03-23 11:31   ` Mike Rapoport
@ 2026-03-26  3:39   ` Liu, Yuan1
  2026-03-26  9:23     ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 13+ messages in thread
From: Liu, Yuan1 @ 2026-03-26  3:39 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Oscar Salvador, Mike Rapoport, Wei Yang
  Cc: linux-mm@kvack.org, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu,
	Chen, Yu C, Deng, Pan, Li, Tianyou, Chen Zhang,
	linux-kernel@vger.kernel.org

> >
> > +/**
> > + * zone_is_contiguous - test whether a zone is contiguous
> > + * @zone: the zone to test.
> > + *
> > + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn
> in the
> > + * spanned zone without requiring pfn_valid() or pfn_to_online_page()
> checks.
> 
> I think there is a small catch to it: users should protect from concurrent
> memory offlining. I recall that, for compaction, there either was some
> protection
> in place or the race window was effectively impossible to hit.
> 
> Maybe we should add here for completion: "Note that missing
> synchronization with
> memory offlining makes any PFN traversal prone to races."

Sure, I will add it.

> > + *
> > + * Returns: true if contiguous, otherwise false.
> > + */
> > +static inline bool zone_is_contiguous(const struct zone *zone)
> > +{
> > +	return READ_ONCE(zone->spanned_pages) ==
> > +		READ_ONCE(zone->pages_with_online_memmap);
> 
> 	       ^ should be vertically aligned

Got it, thanks

> > diff --git a/mm/internal.h b/mm/internal.h
> > index cb0af847d7d9..7c4c8ab68bde 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -793,21 +793,17 @@ extern struct page
> *__pageblock_pfn_to_page(unsigned long start_pfn,
> >  static inline struct page *pageblock_pfn_to_page(unsigned long
> start_pfn,
> >  				unsigned long end_pfn, struct zone *zone)
> >  {
> > -	if (zone->contiguous)
> > +	if (zone_is_contiguous(zone) && zone_spans_pfn(zone, start_pfn)) {
> 
> Do we really need the zone_spans_pfn() check? The caller must make sure
> that the
> zone spans the PFN range before calling this function.
> Compaction does that by walking only PFNs in the range.

Got it, I will fix it.
 
> The old "if (zone->contiguous)" check also expected a caller to handle
> that.
> 
> > +		VM_BUG_ON(end_pfn > zone_end_pfn(zone));
> 
> No VM_BUG_ONs please. But I think we can also drop this.

Got it, thanks

> As the AI review points out, we should also make sure that
> resize_zone_range() updates it with a WRITE_ONCE().
> 
> But I am starting to wonder if we should as a first step leave
> the zone->contiguous bool in place. Then we have to worry less about
> reorderings of reading/writing spanned_pages vs. pages_with_online_memmap.
> 
> See below

Yes, I agree with retaining zone->contiguous
 
> [...]
> 
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index df34797691bd..96690e550024 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> >  	unsigned long zone_start_pfn = zone->zone_start_pfn;
> >  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
> >  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> > +	unsigned long zone_hole_start, zone_hole_end;
> >
> >  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> >  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> > @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> >  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> >  			  false);
> >
> > -	if (*hole_pfn < start_pfn)
> > +	WRITE_ONCE(zone->pages_with_online_memmap,
> > +		   READ_ONCE(zone->pages_with_online_memmap) +
> > +		   (end_pfn - start_pfn));
> > +
> > +	if (*hole_pfn < start_pfn) {
> >  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn,
> zone_end_pfn);
> > +		zone_hole_end = clamp(start_pfn, zone_start_pfn,
> zone_end_pfn);
> > +		if (zone_hole_start < zone_hole_end)
> > +			WRITE_ONCE(zone->pages_with_online_memmap,
> > +				   READ_ONCE(zone->pages_with_online_memmap) +
> > +				   (zone_hole_end - zone_hole_start));
> > +	}
> 
> The range can have larger holes without a memmap, and I think we would be
> missing pages handled by the other init_unavailable_range() call?

There is another init_unavailable_range() call in memmap_init().
However, it covers pages added for section alignment, which are not accounted in zone->spanned_pages.
Therefore, including them here would cause zone->pages_with_online_memmap to exceed zone->spanned_pages.

#ifdef CONFIG_SPARSEMEM
        end_pfn = round_up(end_pfn, PAGES_PER_SECTION);
#else
        end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
#endif
        if (hole_pfn < end_pfn)
                init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);

Regarding the AI feedback below:

init_unavailable_range() uses for_each_valid_pfn() and skips unmapped
sections that lack a mem_map. If the full hole size is added anyway,
pages_with_online_memmap could artificially match spanned_pages for
zones with missing sections.

Could this make zone_is_contiguous() incorrectly return true, bypassing
pfn_valid() checks and potentially dereferencing missing sections?

I think I can update pages_with_online_memmap in
init_unavailable_range() directly:

static void __init init_unavailable_range(unsigned long spfn,
                                          unsigned long epfn,
-                                         int zone, int node)
+                                         int zone_id, int node)
 {
+       struct zone *zone = &NODE_DATA(node)->node_zones[zone_id];
        unsigned long pfn;
        u64 pgcnt = 0;
+       u64 zone_memmap = 0;

        for_each_valid_pfn(pfn, spfn, epfn) {
-               __init_single_page(pfn_to_page(pfn), pfn, zone, node);
+               __init_single_page(pfn_to_page(pfn), pfn, zone_id, node);
                __SetPageReserved(pfn_to_page(pfn));
                pgcnt++;
+               if (zone_spans_pfn(zone, pfn))
+                       zone_memmap++;
        }

+       zone->pages_with_online_memmap += zone_memmap;
        if (pgcnt)
                pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
-                       node, zone_names[zone], pgcnt);
+                       node, zone_names[zone_id], pgcnt);

> There is one question for Mike, though: couldn't it happen that the
> init_unavailable_range() call in memmap_init() would initialize
> the memmap outside of the node/zone span? If so, I wonder whether we
> would want to adjust the node+zone space to include these ranges.
> 
> Later memory onlining could make these ranges suddenly fall into the
> node/zone span.
> 
> So that requires some thought.
> 
> 
> Maybe we should start with this (untested):

Sure, I will prepare new patch based on it, thanks.

> From a73ee44bc93fbcb9cf2b995e27fb98c68415f7be Mon Sep 17 00:00:00 2001
> From: Yuan Liu <yuan1.liu@intel.com>
> Date: Thu, 19 Mar 2026 05:56:22 -0400
> Subject: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check
> when
>  changing pfn range
> 
> [...]
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
>  Documentation/mm/physical_memory.rst |  6 ++++
>  drivers/base/memory.c                |  5 ++++
>  include/linux/mmzone.h               | 38 +++++++++++++++++++++++++
>  mm/internal.h                        |  8 +-----
>  mm/memory_hotplug.c                  | 12 ++------
>  mm/mm_init.c                         | 42 ++++++++++------------------
>  6 files changed, 67 insertions(+), 44 deletions(-)
> 
> diff --git a/Documentation/mm/physical_memory.rst
> b/Documentation/mm/physical_memory.rst
> index 2398d87ac156..e4e188cd4887 100644
> --- a/Documentation/mm/physical_memory.rst
> +++ b/Documentation/mm/physical_memory.rst
> @@ -483,6 +483,12 @@ General
>    ``present_pages`` should use ``get_online_mems()`` to get a stable
> value. It
>    is initialized by ``calculate_node_totalpages()``.
> 
> +``pages_with_online_memmap``
> +  The pages_with_online_memmap is pages within the zone that have an
> online
> +  memmap. It includes present pages and memory holes that have a memmap.
> When
> +  spanned_pages == pages_with_online_memmap, pfn_to_page() can be
> performed
> +  without further checks on any pfn within the zone span.
> +
>  ``present_early_pages``
>    The present pages existing within the zone located on memory available
> since
>    early boot, excluding hotplugged memory. Defined only when
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 5380050b16b7..a367dde6e6fa 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -246,6 +246,7 @@ static int memory_block_online(struct memory_block
> *mem)
>  		nr_vmemmap_pages = mem->altmap->free;
> 
>  	mem_hotplug_begin();
> +	clear_zone_contiguous(zone);
>  	if (nr_vmemmap_pages) {
>  		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
> zone);
>  		if (ret)
> @@ -270,6 +271,7 @@ static int memory_block_online(struct memory_block
> *mem)
> 
>  	mem->zone = zone;
>  out:
> +	set_zone_contiguous(zone);
>  	mem_hotplug_done();
>  	return ret;
>  }
> @@ -295,6 +297,8 @@ static int memory_block_offline(struct memory_block
> *mem)
>  		nr_vmemmap_pages = mem->altmap->free;
> 
>  	mem_hotplug_begin();
> +	clear_zone_contiguous(mem->zone);
> +
>  	if (nr_vmemmap_pages)
>  		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
>  					  -nr_vmemmap_pages);
> @@ -314,6 +318,7 @@ static int memory_block_offline(struct memory_block
> *mem)
> 
>  	mem->zone = NULL;
>  out:
> +	if (mem->zone)
> +		set_zone_contiguous(mem->zone);
>  	mem_hotplug_done();
>  	return ret;
>  }
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e11513f581eb..463376349a2c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1029,6 +1029,11 @@ struct zone {
>  	 * cma pages is present pages that are assigned for CMA use
>  	 * (MIGRATE_CMA).
>  	 *
> +	 * pages_with_online_memmap is pages within the zone that have an
> online
> +	 * memmap. It includes present pages and memory holes that have a
> memmap.
> +	 * When spanned_pages == pages_with_online_memmap, pfn_to_page() can
> be
> +	 * performed without further checks on any pfn within the zone span.
> +	 *
>  	 * So present_pages may be used by memory hotplug or memory power
>  	 * management logic to figure out unmanaged pages by checking
>  	 * (present_pages - managed_pages). And managed_pages should be used
> @@ -1053,6 +1058,7 @@ struct zone {
>  	atomic_long_t		managed_pages;
>  	unsigned long		spanned_pages;
>  	unsigned long		present_pages;
> +	unsigned long		pages_with_online_memmap;
>  #if defined(CONFIG_MEMORY_HOTPLUG)
>  	unsigned long		present_early_pages;
>  #endif
> @@ -1710,6 +1716,38 @@ static inline bool populated_zone(const struct zone
> *zone)
>  	return zone->present_pages;
>  }
> 
> +/**
> + * zone_is_contiguous - test whether a zone is contiguous
> + * @zone: the zone to test.
> + *
> + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in
> the
> + * spanned zone without requiring pfn_valid() or pfn_to_online_page()
> checks.
> + *
> + * Note that missing synchronization with memory offlining makes any
> + * PFN traversal prone to races.
> + *
> + * ZONE_DEVICE zones are always marked non-contiguous.
> + *
> + * Returns: true if contiguous, otherwise false.
> + */
> +static inline bool zone_is_contiguous(const struct zone *zone)
> +{
> +	return zone->contiguous;
> +}
> +
> +static inline void set_zone_contiguous(struct zone *zone)
> +{
> +	if (zone_is_zone_device(zone))
> +		return;
> +	if (zone->spanned_pages == zone->pages_with_online_memmap)
> +		zone->contiguous = true;
> +}
> +
> +static inline void clear_zone_contiguous(struct zone *zone)
> +{
> +	zone->contiguous = false;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static inline int zone_to_nid(const struct zone *zone)
>  {
> diff --git a/mm/internal.h b/mm/internal.h
> index 532d78febf91..faec50e55a30 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -816,21 +816,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned
> long start_pfn,
>  static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  				unsigned long end_pfn, struct zone *zone)
>  {
> -	if (zone->contiguous)
> +	if (zone_is_contiguous(zone))
>  		return pfn_to_page(start_pfn);
> 
>  	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>  }
> 
> -void set_zone_contiguous(struct zone *zone);
>  bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
>  			   unsigned long nr_pages);
> 
> -static inline void clear_zone_contiguous(struct zone *zone)
> -{
> -	zone->contiguous = false;
> -}
> -
>  extern int __isolate_free_page(struct page *page, unsigned int order);
>  extern void __putback_isolated_page(struct page *page, unsigned int
> order,
>  				    int mt);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 70e620496cec..f29c0d70c970 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -558,18 +558,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
> 
>  	/*
>  	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
> -	 * we will not try to shrink the zones - which is okay as
> -	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
> +	 * we will not try to shrink the zones.
>  	 */
>  	if (zone_is_zone_device(zone))
>  		return;
> 
> -	clear_zone_contiguous(zone);
> -
>  	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>  	update_pgdat_span(pgdat);
> -
> -	set_zone_contiguous(zone);
>  }
> 
>  /**
> @@ -746,8 +741,6 @@ void move_pfn_range_to_zone(struct zone *zone,
> unsigned long start_pfn,
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	int nid = pgdat->node_id;
> 
> -	clear_zone_contiguous(zone);
> -
>  	if (zone_is_empty(zone))
>  		init_currently_empty_zone(zone, start_pfn, nr_pages);
>  	resize_zone_range(zone, start_pfn, nr_pages);
> @@ -775,8 +768,6 @@ void move_pfn_range_to_zone(struct zone *zone,
> unsigned long start_pfn,
>  	memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
>  			 MEMINIT_HOTPLUG, altmap, migratetype,
>  			 isolate_pageblock);
> -
> -	set_zone_contiguous(zone);
>  }
> 
>  struct auto_movable_stats {
> @@ -1072,6 +1063,7 @@ void adjust_present_page_count(struct page *page,
> struct memory_group *group,
>  	if (early_section(__pfn_to_section(page_to_pfn(page))))
>  		zone->present_early_pages += nr_pages;
>  	zone->present_pages += nr_pages;
> +	zone->pages_with_online_memmap += nr_pages;
>  	zone->zone_pgdat->node_present_pages += nr_pages;
> 
>  	if (group && movable)
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index e0f1e36cb9e4..6e5a8da7cdda 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -854,7 +854,7 @@ overlap_memmap_init(unsigned long zone, unsigned long
> *pfn)
>   *   zone/node above the hole except for the trailing pages in the last
>   *   section that will be appended to the zone/node below.
>   */
> -static void __init init_unavailable_range(unsigned long spfn,
> +static unsigned long __init init_unavailable_range(unsigned long spfn,
>  					  unsigned long epfn,
>  					  int zone, int node)
>  {
> @@ -870,6 +870,7 @@ static void __init init_unavailable_range(unsigned
> long spfn,
>  	if (pgcnt)
>  		pr_info("On node %d, zone %s: %lld pages in unavailable
> ranges\n",
>  			node, zone_names[zone], pgcnt);
> +	return pgcnt;
>  }
> 
>  /*
> @@ -958,6 +959,7 @@ static void __init memmap_init_zone_range(struct zone
> *zone,
>  	unsigned long zone_start_pfn = zone->zone_start_pfn;
>  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> +	unsigned long hole_pfns;
> 
>  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
>  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> @@ -968,9 +970,12 @@ static void __init memmap_init_zone_range(struct zone
> *zone,
>  	memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
>  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>  			  false);
> +	zone->pages_with_online_memmap = end_pfn - start_pfn;
> 
> -	if (*hole_pfn < start_pfn)
> -		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> +	if (*hole_pfn < start_pfn) {
> +		hole_pfns = init_unavailable_range(*hole_pfn, start_pfn,
> zone_id, nid);
> +		zone->pages_with_online_memmap += hole_pfns;
> +	}
> 
>  	*hole_pfn = end_pfn;
>  }
> @@ -980,6 +985,7 @@ static void __init memmap_init(void)
>  	unsigned long start_pfn, end_pfn;
>  	unsigned long hole_pfn = 0;
>  	int i, j, zone_id = 0, nid;
> +	unsigned long hole_pfns;
> 
>  	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> {
>  		struct pglist_data *node = NODE_DATA(nid);
> @@ -1008,8 +1014,12 @@ static void __init memmap_init(void)
>  #else
>  	end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
>  #endif
> -	if (hole_pfn < end_pfn)
> -		init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
> +	if (hole_pfn < end_pfn) {
> +		struct zone *zone = &NODE_DATA(nid)->node_zones[zone_id];
> +
> +		hole_pfns = init_unavailable_range(hole_pfn, end_pfn, zone_id,
> nid);
> +		zone->pages_with_online_memmap += hole_pfns;
> +	}
>  }
> 
>  #ifdef CONFIG_ZONE_DEVICE
> @@ -2273,28 +2283,6 @@ void __init init_cma_pageblock(struct page *page)
>  }
>  #endif
> 
> -void set_zone_contiguous(struct zone *zone)
> -{
> -	unsigned long block_start_pfn = zone->zone_start_pfn;
> -	unsigned long block_end_pfn;
> -
> -	block_end_pfn = pageblock_end_pfn(block_start_pfn);
> -	for (; block_start_pfn < zone_end_pfn(zone);
> -			block_start_pfn = block_end_pfn,
> -			 block_end_pfn += pageblock_nr_pages) {
> -
> -		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> -
> -		if (!__pageblock_pfn_to_page(block_start_pfn,
> -					     block_end_pfn, zone))
> -			return;
> -		cond_resched();
> -	}
> -
> -	/* We confirm that there is no hole */
> -	zone->contiguous = true;
> -}
> -
>  /*
>   * Check if a PFN range intersects multiple zones on one or more
>   * NUMA nodes. Specify the @nid argument if it is known that this
> --
> 2.43.0
> 
> 
> --
> Cheers,
> 
> David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-23 11:42     ` David Hildenbrand (Arm)
@ 2026-03-26  7:30       ` Liu, Yuan1
  2026-03-26  7:38         ` Chen, Yu C
  0 siblings, 1 reply; 13+ messages in thread
From: Liu, Yuan1 @ 2026-03-26  7:30 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Mike Rapoport
  Cc: Oscar Salvador, Wei Yang, linux-mm@kvack.org, Hu, Yong,
	Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Chen, Yu C, Deng, Pan,
	Li, Tianyou, Chen Zhang, linux-kernel@vger.kernel.org

> -----Original Message-----
> From: David Hildenbrand (Arm) <david@kernel.org>
> Sent: Monday, March 23, 2026 7:42 PM
> To: Mike Rapoport <rppt@kernel.org>
> Cc: Liu, Yuan1 <yuan1.liu@intel.com>; Oscar Salvador <osalvador@suse.de>;
> Wei Yang <richard.weiyang@gmail.com>; linux-mm@kvack.org; Hu, Yong
> <yong.hu@intel.com>; Zou, Nanhai <nanhai.zou@intel.com>; Tim Chen
> <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu <qiuxu.zhuo@intel.com>; Chen, Yu
> C <yu.c.chen@intel.com>; Deng, Pan <pan.deng@intel.com>; Li, Tianyou
> <tianyou.li@intel.com>; Chen Zhang <zhangchen.kidd@jd.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous
> check when changing pfn range
> 
> On 3/23/26 12:31, Mike Rapoport wrote:
> > On Mon, Mar 23, 2026 at 11:56:35AM +0100, David Hildenbrand (Arm) wrote:
> >> On 3/19/26 10:56, Yuan Liu wrote:
> >
> > ...
> >
> >>> diff --git a/mm/mm_init.c b/mm/mm_init.c
> >>> index df34797691bd..96690e550024 100644
> >>> --- a/mm/mm_init.c
> >>> +++ b/mm/mm_init.c
> >>> @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> >>>  	unsigned long zone_start_pfn = zone->zone_start_pfn;
> >>>  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
> >>>  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> >>> +	unsigned long zone_hole_start, zone_hole_end;
> >>>
> >>>  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> >>>  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> >>> @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> >>>  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> >>>  			  false);
> >>>
> >>> -	if (*hole_pfn < start_pfn)
> >>> +	WRITE_ONCE(zone->pages_with_online_memmap,
> >>> +		   READ_ONCE(zone->pages_with_online_memmap) +
> >>> +		   (end_pfn - start_pfn));
> >>> +
> >>> +	if (*hole_pfn < start_pfn) {
> >>>  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> >>> +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn,
> zone_end_pfn);
> >>> +		zone_hole_end = clamp(start_pfn, zone_start_pfn,
> zone_end_pfn);
> >>> +		if (zone_hole_start < zone_hole_end)
> >>> +			WRITE_ONCE(zone->pages_with_online_memmap,
> >>> +				   READ_ONCE(zone->pages_with_online_memmap) +
> >>> +				   (zone_hole_end - zone_hole_start));
> >>> +	}
> >>
> >> The range can have larger holes without a memmap, and I think we would
> be
> >> missing pages handled by the other init_unavailable_range() call?
> >>
> >>
> >> There is one question for Mike, though: couldn't it happen that the
> >> init_unavailable_range() call in memmap_init() would initialize
> >> the memmap outside of the node/zone span?
> >
> > Yes, and it most likely will.
> >
> > Very common example is page 0 on x86 systems:
> >
> > [    0.012196]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > [    0.012221] On node 0, zone DMA: 1 pages in unavailable ranges
> > [    0.012205] Early memory node ranges
> > [    0.012206]   node   0: [mem 0x0000000000001000-0x000000000009efff]
> >
> > The unavailable page in zone DMA is the page from  0x0 to 0x1000 that is
> > neither in node 0 nor in zone DMA.
> >
> > For ZONE_NORMAL it would be a more pathological case when zone/node span
> > ends in a middle of a section, but that's still possible.
> >
> >> If so, I wonder whether we would want to adjust the node+zone space to
> >> include these ranges.
> >>
> >> Later memory onlining could make these ranges suddenly fall into the
> >> node/zone span.
> >
> > But doesn't memory onlining always happen at section boundaries?
> 
> Sure, but assume ZONE_NORMAL ends in the middle of a section, and then
> you hotplug the next section.
> 
> Then, the zone spans that memmap. zone->pages_with_online_memmap will be
> wrong.
> 
> Once we unplug the hotplugged section, zone shrinking code will stumble
> over the whole-pfns and assume they belong to the zone.
> zone->pages_with_online_memmap will be wrong.
> 
> zone->pages_with_online_memmap being wrong means that it is smaller than
> it should. I guess, it would not be broken, but we would fail to detect
> contiguous zones.
> 
> If there would be an easy way to avoid that, that would be cleaner.

I tried to capture your points in the draft code below.

+static void adjust_pages_with_online_memmap(struct zone *zone, long nr_pages,
+                                           long added_spanned_pages)
+{
+       if (added_spanned_pages == nr_pages)
+               zone->pages_with_online_memmap += nr_pages;
+       else
+               zone->pages_with_online_memmap += added_spanned_pages;
+}
 /*
  * Must be called with mem_hotplug_lock in write mode.
  */
@@ -1154,6 +1162,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
        const int nid = zone_to_nid(zone);
        int need_zonelists_rebuild = 0;
        unsigned long flags;
+       unsigned long old_spanned_pages = zone->spanned_pages;
        int ret;

        /*
@@ -1206,6 +1215,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,

        online_pages_range(pfn, nr_pages);
        adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
+       adjust_pages_with_online_memmap(zone, nr_pages,
+                                       zone->spanned_pages - old_spanned_pages);

        if (node_arg.nid >= 0)
                node_set_state(nid, N_MEMORY);
@@ -1905,6 +1916,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
        struct node_notify node_arg = {
                .nid = NUMA_NO_NODE,
        };
+       unsigned long old_spanned_pages = zone->spanned_pages;
        unsigned long flags;
        char *reason;
        int ret;
@@ -2051,6 +2063,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
        /* removal success */
        adjust_managed_page_count(pfn_to_page(start_pfn), -managed_pages);
        adjust_present_page_count(pfn_to_page(start_pfn), group, -nr_pages);
+       adjust_pages_with_online_memmap(zone, nr_pages,
+                                       zone->spanned_pages - old_spanned_pages);

Btw, can we introduce a new kernel command-line parameter to allow users to select 
the memory block size? This could also address the current issue.

Test Results as below, memory block size 128MB Vs. 2GB
+----------------+------+---------------+--------------+----------------+
|                | Size |    128MB      |    2GB       | Time Reduction |
|                +------+---------------+--------------+----------------+
| Plug Memory    | 256G |      10s      |       3s     |       70%      |
|                +------+---------------+--------------+----------------+
|                | 512G |      36s      |       7s     |       81%      |
+----------------+------+---------------+--------------+----------------+
 
+----------------+------+---------------+--------------+----------------+
|                | Size |    128MB      |    2GB       | Time Reduction |
|                +------+---------------+--------------+----------------+
| Unplug Memory  | 256G |      11s      |      3s      |       72%      |
|                +------+---------------+--------------+----------------+
|                | 512G |      36s      |      7s      |       81%      |
+----------------+------+---------------+--------------+----------------+

I see the UV system already has this (the kernel parameter is uv_memblksize).
Could we introduce a common kernel parameter for memory block size configuration?

--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1458,6 +1458,26 @@ int __init set_memory_block_size_order(unsigned int order)
        return 0;
 }

+static int __init cmdline_parse_memory_block_size(char *p)
+{
+    unsigned long size;
+    char *endp = p;
+    int ret;
+
+    size = memparse(p, &endp);
+    if (*endp != '\0' || !is_power_of_2(size))
+        return -EINVAL;
+
+    ret = set_memory_block_size_order(ilog2(size));
+    if (ret)
+        return ret;
+
+    pr_info("x86/mm: memory_block_size cmdline override: %luMB\n",
+        size >> 20);
+    return 0;
+}
+early_param("x86_memory_block_size", cmdline_parse_memory_block_size);

> --
> Cheers,
> 
> David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-23 11:51 ` Mike Rapoport
@ 2026-03-26  7:32   ` Liu, Yuan1
  0 siblings, 0 replies; 13+ messages in thread
From: Liu, Yuan1 @ 2026-03-26  7:32 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, Oscar Salvador, Wei Yang, linux-mm@kvack.org,
	Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Chen, Yu C,
	Deng, Pan, Li, Tianyou, Chen Zhang, linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Mike Rapoport <rppt@kernel.org>
> Sent: Monday, March 23, 2026 7:51 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: David Hildenbrand <david@kernel.org>; Oscar Salvador
> <osalvador@suse.de>; Wei Yang <richard.weiyang@gmail.com>; linux-
> mm@kvack.org; Hu, Yong <yong.hu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>; Tim Chen <tim.c.chen@linux.intel.com>; Zhuo, Qiuxu
> <qiuxu.zhuo@intel.com>; Chen, Yu C <yu.c.chen@intel.com>; Deng, Pan
> <pan.deng@intel.com>; Li, Tianyou <tianyou.li@intel.com>; Chen Zhang
> <zhangchen.kidd@jd.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous
> check when changing pfn range
> 
> Hi,
> 
> On Thu, Mar 19, 2026 at 05:56:22AM -0400, Yuan Liu wrote:
> 
> ...
> 
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index df34797691bd..96690e550024 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> >  	unsigned long zone_start_pfn = zone->zone_start_pfn;
> >  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
> >  	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> > +	unsigned long zone_hole_start, zone_hole_end;
> >
> >  	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> >  	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> > @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct
> zone *zone,
> >  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> >  			  false);
> >
> > -	if (*hole_pfn < start_pfn)
> > +	WRITE_ONCE(zone->pages_with_online_memmap,
> > +		   READ_ONCE(zone->pages_with_online_memmap) +
> > +		   (end_pfn - start_pfn));
> > +
> > +	if (*hole_pfn < start_pfn) {
> >  		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > +		zone_hole_start = clamp(*hole_pfn, zone_start_pfn,
> zone_end_pfn);
> > +		zone_hole_end = clamp(start_pfn, zone_start_pfn,
> zone_end_pfn);
> > +		if (zone_hole_start < zone_hole_end)
> > +			WRITE_ONCE(zone->pages_with_online_memmap,
> > +				   READ_ONCE(zone->pages_with_online_memmap) +
> > +				   (zone_hole_end - zone_hole_start));
> > +	}
> 
> I didn't have time to review it, but it really jumped at me.
> memmap_init_zone_range() runs before SMP, there is no need for
> WRITE_ONCE()/READ_ONCE() here.

Hi Mike

Thank you very much for taking the time to review this patch. I will fix it.

> --
> Sincerely yours,
> Mike.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-26  7:30       ` Liu, Yuan1
@ 2026-03-26  7:38         ` Chen, Yu C
  2026-03-26  9:53           ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 13+ messages in thread
From: Chen, Yu C @ 2026-03-26  7:38 UTC (permalink / raw)
  To: Liu, Yuan1, David Hildenbrand (Arm), Mike Rapoport
  Cc: Oscar Salvador, Wei Yang, linux-mm@kvack.org, Hu, Yong,
	Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Deng, Pan, Li, Tianyou,
	Chen Zhang, linux-kernel@vger.kernel.org

On 3/26/2026 3:30 PM, Liu, Yuan1 wrote:
>> -----Original Message-----

[ .... ]

> 
> Btw, can we introduce a new kernel command-line parameter to allow users to select
> the memory block size? This could also address the current issue.
> 
> Test Results as below, memory block size 128MB Vs. 2GB
> +----------------+------+---------------+--------------+----------------+
> |                | Size |    128MB      |    2GB       | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Plug Memory    | 256G |      10s      |       3s     |       70%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      36s      |       7s     |       81%      |
> +----------------+------+---------------+--------------+----------------+
>   
> +----------------+------+---------------+--------------+----------------+
> |                | Size |    128MB      |    2GB       | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Unplug Memory  | 256G |      11s      |      3s      |       72%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      36s      |      7s      |       81%      |
> +----------------+------+---------------+--------------+----------------+
> 
> And I see the UV system has already this (Kernel parameter is uv_memblksize).
> I think if we can introduce a common kernel parameter for memory block size configuration?
> 

Is it possible to turn uv_memblksize into a generic command-line
memblksize parameter without introducing an extra one?

thanks,
Chenyu


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-26  3:39   ` Liu, Yuan1
@ 2026-03-26  9:23     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-26  9:23 UTC (permalink / raw)
  To: Liu, Yuan1, Oscar Salvador, Mike Rapoport, Wei Yang
  Cc: linux-mm@kvack.org, Hu, Yong, Zou, Nanhai, Tim Chen, Zhuo, Qiuxu,
	Chen, Yu C, Deng, Pan, Li, Tianyou, Chen Zhang,
	linux-kernel@vger.kernel.org


> About the AI feedback below:
> 
> init_unavailable_range() uses for_each_valid_pfn() and skips unmapped
> sections that lack a mem_map. If the full hole size is added anyway,
> pages_with_online_memmap could artificially match spanned_pages for
> zones with missing sections.
> 
> Could this make zone_is_contiguous() incorrectly return true, bypassing
> pfn_valid() checks and potentially dereferencing missing sections?
> 
> I think I can update pages_with_online_memmap in
> init_unavailable_range() directly.

See how I handled it: I simply return the number of initialized pages
from the function. I think that should do.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
  2026-03-26  7:38         ` Chen, Yu C
@ 2026-03-26  9:53           ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-26  9:53 UTC (permalink / raw)
  To: Chen, Yu C, Liu, Yuan1, Mike Rapoport
  Cc: Oscar Salvador, Wei Yang, linux-mm@kvack.org, Hu, Yong,
	Zou, Nanhai, Tim Chen, Zhuo, Qiuxu, Deng, Pan, Li, Tianyou,
	Chen Zhang, linux-kernel@vger.kernel.org

On 3/26/26 08:38, Chen, Yu C wrote:
> On 3/26/2026 3:30 PM, Liu, Yuan1 wrote:
>>> -----Original Message-----
> 
> [ .... ]
> 
>>
>> Btw, can we introduce a new kernel command-line parameter to allow
>> users to select
>> the memory block size? This could also address the current issue.
>>
>> Test Results as below, memory block size 128MB Vs. 2GB
>> +----------------+------+---------------+--------------+----------------+
>> |                | Size |    128MB      |    2GB       | Time Reduction |
>> |                +------+---------------+--------------+----------------+
>> | Plug Memory    | 256G |      10s      |       3s     |       70%      |
>> |                +------+---------------+--------------+----------------+
>> |                | 512G |      36s      |       7s     |       81%      |
>> +----------------+------+---------------+--------------+----------------+
>>
>> +----------------+------+---------------+--------------+----------------+
>> |                | Size |    128MB      |    2GB       | Time Reduction |
>> |                +------+---------------+--------------+----------------+
>> | Unplug Memory  | 256G |      11s      |      3s      |       72%      |
>> |                +------+---------------+--------------+----------------+
>> |                | 512G |      36s      |      7s      |       81%      |
>> +----------------+------+---------------+--------------+----------------+
>>
>> And I see the UV system has already this (Kernel parameter is
>> uv_memblksize).
>> I think if we can introduce a common kernel parameter for memory block
>> size configuration?
>>
> 
> Is it possible to turn uv_memblksize into a generic command-line
> memblksize parameter without introducing an extra one?

We don't want that, and it's kind of a workaround for the problem. :)

I think we would want to only account pages towards
pages_with_online_memmap that fall within the zone span.

We will not account pages initialized that are outside the zone span.

Growing the zone and later trying to shrink it can only result in a
"too small" pages_with_online_memmap value. That is fine: it simply
prevents detecting "contiguous", so it's safe.

We can document that, and in the future we could handle it a bit nicer
(e.g., indicate these pages as being just fill material).

So indeed, I guess we want to teach init_unavailable_range() to only
account towards zone->pages_with_online_memmap whatever falls into the
zone span.

That could be done internally, or from the callers by calling
init_unavailable_range() once for the out-of-zone range and once for the
in-zone-range.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-03-26  9:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-19  9:56 [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
2026-03-19 10:08 ` Liu, Yuan1
2026-03-20  3:13 ` Andrew Morton
2026-03-23 10:56 ` David Hildenbrand (Arm)
2026-03-23 11:31   ` Mike Rapoport
2026-03-23 11:42     ` David Hildenbrand (Arm)
2026-03-26  7:30       ` Liu, Yuan1
2026-03-26  7:38         ` Chen, Yu C
2026-03-26  9:53           ` David Hildenbrand (Arm)
2026-03-26  3:39   ` Liu, Yuan1
2026-03-26  9:23     ` David Hildenbrand (Arm)
2026-03-23 11:51 ` Mike Rapoport
2026-03-26  7:32   ` Liu, Yuan1
