From: Mike Rapoport <rppt@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Yuan Liu <yuan1.liu@intel.com>,
Oscar Salvador <osalvador@suse.de>,
Wei Yang <richard.weiyang@gmail.com>,
linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
Nanhai Zou <nanhai.zou@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
Tianyou Li <tianyou.li@intel.com>,
Chen Zhang <zhangchen.kidd@jd.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Mon, 23 Mar 2026 13:31:09 +0200
Message-ID: <acEkfaycrJI-kWjk@kernel.org>
In-Reply-To: <48b497e5-1545-4376-a898-f3813a6ef989@kernel.org>
On Mon, Mar 23, 2026 at 11:56:35AM +0100, David Hildenbrand (Arm) wrote:
> On 3/19/26 10:56, Yuan Liu wrote:
...
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index df34797691bd..96690e550024 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -946,6 +946,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
> > unsigned long zone_start_pfn = zone->zone_start_pfn;
> > unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
> > int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> > + unsigned long zone_hole_start, zone_hole_end;
> >
> > start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> > end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> > @@ -957,8 +958,19 @@ static void __init memmap_init_zone_range(struct zone *zone,
> > zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> > false);
> >
> > - if (*hole_pfn < start_pfn)
> > + WRITE_ONCE(zone->pages_with_online_memmap,
> > + READ_ONCE(zone->pages_with_online_memmap) +
> > + (end_pfn - start_pfn));
> > +
> > + if (*hole_pfn < start_pfn) {
> > init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > + zone_hole_start = clamp(*hole_pfn, zone_start_pfn, zone_end_pfn);
> > + zone_hole_end = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> > + if (zone_hole_start < zone_hole_end)
> > + WRITE_ONCE(zone->pages_with_online_memmap,
> > + READ_ONCE(zone->pages_with_online_memmap) +
> > + (zone_hole_end - zone_hole_start));
> > + }
>
> The range can have larger holes without a memmap, and I think we would be
> missing pages handled by the other init_unavailable_range() call?
>
>
> There is one question for Mike, though: couldn't it happen that the
> init_unavailable_range() call in memmap_init() would initialize
> the memmap outside of the node/zone span?
Yes, and it most likely will.
A very common example is page 0 on x86 systems:
[    0.012196] DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.012205] Early memory node ranges
[    0.012206]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.012221] On node 0, zone DMA: 1 pages in unavailable ranges
The unavailable page in zone DMA is the page covering 0x0-0x1000, which is
neither in node 0 nor in zone DMA.
For ZONE_NORMAL it would be a more pathological case, where the zone/node
span ends in the middle of a section, but that's still possible.
> If so, I wonder whether we would want to adjust the node+zone space to
> include these ranges.
>
> Later memory onlining could make these ranges suddenly fall into the
> node/zone span.
But doesn't memory onlining always happen at section boundaries?
> So that requires some thought.
>
>
> Maybe we should start with this (untested):
>
> From a73ee44bc93fbcb9cf2b995e27fb98c68415f7be Mon Sep 17 00:00:00 2001
> From: Yuan Liu <yuan1.liu@intel.com>
> Date: Thu, 19 Mar 2026 05:56:22 -0400
> Subject: [PATCH] mm/memory hotplug/unplug: Optimize zone contiguous check when
> changing pfn range
>
> [...]
> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
> Documentation/mm/physical_memory.rst | 6 ++++
> drivers/base/memory.c | 5 ++++
> include/linux/mmzone.h | 38 +++++++++++++++++++++++++
> mm/internal.h | 8 +-----
> mm/memory_hotplug.c | 12 ++------
> mm/mm_init.c | 42 ++++++++++------------------
> 6 files changed, 67 insertions(+), 44 deletions(-)
>
> diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
> index 2398d87ac156..e4e188cd4887 100644
> --- a/Documentation/mm/physical_memory.rst
> +++ b/Documentation/mm/physical_memory.rst
> @@ -483,6 +483,12 @@ General
> ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
> is initialized by ``calculate_node_totalpages()``.
>
> +``pages_with_online_memmap``
> + The number of pages within the zone that have an online memmap. It includes
> + present pages and memory holes that have a memmap. When
> + spanned_pages == pages_with_online_memmap, pfn_to_page() can be performed
> + without further checks on any pfn within the zone span.
> +
> ``present_early_pages``
> The present pages existing within the zone located on memory available since
> early boot, excluding hotplugged memory. Defined only when
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 5380050b16b7..a367dde6e6fa 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -246,6 +246,7 @@ static int memory_block_online(struct memory_block *mem)
> nr_vmemmap_pages = mem->altmap->free;
>
> mem_hotplug_begin();
> + clear_zone_contiguous(zone);
> if (nr_vmemmap_pages) {
> ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
> if (ret)
> @@ -270,6 +271,7 @@ static int memory_block_online(struct memory_block *mem)
>
> mem->zone = zone;
> out:
> + set_zone_contiguous(zone);
> mem_hotplug_done();
> return ret;
> }
> @@ -295,6 +297,8 @@ static int memory_block_offline(struct memory_block *mem)
> nr_vmemmap_pages = mem->altmap->free;
>
> mem_hotplug_begin();
> + clear_zone_contiguous(mem->zone);
> +
> if (nr_vmemmap_pages)
> adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
> -nr_vmemmap_pages);
> @@ -314,6 +318,7 @@ static int memory_block_offline(struct memory_block *mem)
>
> + set_zone_contiguous(mem->zone);
> mem->zone = NULL;
> out:
> mem_hotplug_done();
> return ret;
> }
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e11513f581eb..463376349a2c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1029,6 +1029,11 @@ struct zone {
> * cma pages is present pages that are assigned for CMA use
> * (MIGRATE_CMA).
> *
> + * pages_with_online_memmap is the number of pages within the zone that have
> + * an online memmap. It includes present pages and memory holes that have a memmap.
> + * When spanned_pages == pages_with_online_memmap, pfn_to_page() can be
> + * performed without further checks on any pfn within the zone span.
> + *
> * So present_pages may be used by memory hotplug or memory power
> * management logic to figure out unmanaged pages by checking
> * (present_pages - managed_pages). And managed_pages should be used
> @@ -1053,6 +1058,7 @@ struct zone {
> atomic_long_t managed_pages;
> unsigned long spanned_pages;
> unsigned long present_pages;
> + unsigned long pages_with_online_memmap;
> #if defined(CONFIG_MEMORY_HOTPLUG)
> unsigned long present_early_pages;
> #endif
> @@ -1710,6 +1716,38 @@ static inline bool populated_zone(const struct zone *zone)
> return zone->present_pages;
> }
>
> +/**
> + * zone_is_contiguous - test whether a zone is contiguous
> + * @zone: the zone to test.
> + *
> + * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
> + * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
> + *
> + * Note that missing synchronization with memory offlining makes any
> + * PFN traversal prone to races.
> + *
> + * ZONE_DEVICE zones are always marked non-contiguous.
> + *
> + * Returns: true if contiguous, otherwise false.
> + */
> +static inline bool zone_is_contiguous(const struct zone *zone)
> +{
> + return zone->contiguous;
> +}
> +
> +static inline void set_zone_contiguous(struct zone *zone)
> +{
> + if (zone_is_zone_device(zone))
> + return;
> + if (zone->spanned_pages == zone->pages_with_online_memmap)
> + zone->contiguous = true;
> +}
> +
> +static inline void clear_zone_contiguous(struct zone *zone)
> +{
> + zone->contiguous = false;
> +}
> +
> #ifdef CONFIG_NUMA
> static inline int zone_to_nid(const struct zone *zone)
> {
> diff --git a/mm/internal.h b/mm/internal.h
> index 532d78febf91..faec50e55a30 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -816,21 +816,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> unsigned long end_pfn, struct zone *zone)
> {
> - if (zone->contiguous)
> + if (zone_is_contiguous(zone))
> return pfn_to_page(start_pfn);
>
> return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
> }
>
> -void set_zone_contiguous(struct zone *zone);
> bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
> unsigned long nr_pages);
>
> -static inline void clear_zone_contiguous(struct zone *zone)
> -{
> - zone->contiguous = false;
> -}
> -
> extern int __isolate_free_page(struct page *page, unsigned int order);
> extern void __putback_isolated_page(struct page *page, unsigned int order,
> int mt);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 70e620496cec..f29c0d70c970 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -558,18 +558,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
>
> /*
> * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
> - * we will not try to shrink the zones - which is okay as
> - * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
> + * we will not try to shrink the zones.
> */
> if (zone_is_zone_device(zone))
> return;
>
> - clear_zone_contiguous(zone);
> -
> shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
> update_pgdat_span(pgdat);
> -
> - set_zone_contiguous(zone);
> }
>
> /**
> @@ -746,8 +741,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> struct pglist_data *pgdat = zone->zone_pgdat;
> int nid = pgdat->node_id;
>
> - clear_zone_contiguous(zone);
> -
> if (zone_is_empty(zone))
> init_currently_empty_zone(zone, start_pfn, nr_pages);
> resize_zone_range(zone, start_pfn, nr_pages);
> @@ -775,8 +768,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
> MEMINIT_HOTPLUG, altmap, migratetype,
> isolate_pageblock);
> -
> - set_zone_contiguous(zone);
> }
>
> struct auto_movable_stats {
> @@ -1072,6 +1063,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
> if (early_section(__pfn_to_section(page_to_pfn(page))))
> zone->present_early_pages += nr_pages;
> zone->present_pages += nr_pages;
> + zone->pages_with_online_memmap += nr_pages;
> zone->zone_pgdat->node_present_pages += nr_pages;
>
> if (group && movable)
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index e0f1e36cb9e4..6e5a8da7cdda 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -854,7 +854,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> * zone/node above the hole except for the trailing pages in the last
> * section that will be appended to the zone/node below.
> */
> -static void __init init_unavailable_range(unsigned long spfn,
> +static unsigned long __init init_unavailable_range(unsigned long spfn,
> unsigned long epfn,
> int zone, int node)
> {
> @@ -870,6 +870,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> if (pgcnt)
> pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
> node, zone_names[zone], pgcnt);
> + return pgcnt;
> }
>
> /*
> @@ -958,6 +959,7 @@ static void __init memmap_init_zone_range(struct zone *zone,
> unsigned long zone_start_pfn = zone->zone_start_pfn;
> unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
> int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> + unsigned long hole_pfns;
>
> start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
> @@ -968,9 +970,12 @@ static void __init memmap_init_zone_range(struct zone *zone,
> memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> false);
> + zone->pages_with_online_memmap += end_pfn - start_pfn;
>
> - if (*hole_pfn < start_pfn)
> - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> + if (*hole_pfn < start_pfn) {
> + hole_pfns = init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> + zone->pages_with_online_memmap += hole_pfns;
> + }
>
> *hole_pfn = end_pfn;
> }
> @@ -980,6 +985,7 @@ static void __init memmap_init(void)
> unsigned long start_pfn, end_pfn;
> unsigned long hole_pfn = 0;
> int i, j, zone_id = 0, nid;
> + unsigned long hole_pfns;
>
> for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
> struct pglist_data *node = NODE_DATA(nid);
> @@ -1008,8 +1014,12 @@ static void __init memmap_init(void)
> #else
> end_pfn = round_up(end_pfn, MAX_ORDER_NR_PAGES);
> #endif
> - if (hole_pfn < end_pfn)
> - init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
> + if (hole_pfn < end_pfn) {
> + struct zone *zone = &NODE_DATA(nid)->node_zones[zone_id];
> +
> + hole_pfns = init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
> + zone->pages_with_online_memmap += hole_pfns;
> + }
> }
>
> #ifdef CONFIG_ZONE_DEVICE
> @@ -2273,28 +2283,6 @@ void __init init_cma_pageblock(struct page *page)
> }
> #endif
>
> -void set_zone_contiguous(struct zone *zone)
> -{
> - unsigned long block_start_pfn = zone->zone_start_pfn;
> - unsigned long block_end_pfn;
> -
> - block_end_pfn = pageblock_end_pfn(block_start_pfn);
> - for (; block_start_pfn < zone_end_pfn(zone);
> - block_start_pfn = block_end_pfn,
> - block_end_pfn += pageblock_nr_pages) {
> -
> - block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> -
> - if (!__pageblock_pfn_to_page(block_start_pfn,
> - block_end_pfn, zone))
> - return;
> - cond_resched();
> - }
> -
> - /* We confirm that there is no hole */
> - zone->contiguous = true;
> -}
> -
> /*
> * Check if a PFN range intersects multiple zones on one or more
> * NUMA nodes. Specify the @nid argument if it is known that this
> --
> 2.43.0
>
>
> --
> Cheers,
>
> David
--
Sincerely yours,
Mike.