Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
Cc: Yuan Liu <yuan1.liu@intel.com>,
	Oscar Salvador <osalvador@suse.de>,
	Wei Yang <richard.weiyang@gmail.com>,
	linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
	Nanhai Zou <nanhai.zou@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
	Tianyou Li <tianyou.li@intel.com>,
	Chen Zhang <zhangchen.kidd@jd.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Thu, 9 Apr 2026 17:08:18 +0200
Message-ID: <e86fee84-08d8-4563-8596-e40d8e196799@kernel.org>
In-Reply-To: <ade6RVeXxDt7ImP4@kernel.org>

On 4/9/26 16:40, Mike Rapoport wrote:
> On Wed, Apr 08, 2026 at 09:36:14AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/8/26 05:16, Yuan Liu wrote:
>>> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
>>> zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
>>> to rebuild zone->contiguous. For large zones this is a significant cost
>>> during memory hotplug and hot-unplug.
>>>
>>> Add a new zone member pages_with_online_memmap that tracks the number of
>>> pages within the zone span that have an online memory map (including present
>>> pages and memory holes whose memory map has been initialized). When
>>> spanned_pages == pages_with_online_memmap the zone is contiguous and
>>> pfn_to_page() can be called on any PFN in the zone span without further
>>> pfn_valid() checks.
>>>
>>> Only pages that fall within the current zone span are accounted towards
>>> pages_with_online_memmap. A "too small" value is safe, it merely prevents
>>> detecting a contiguous zone.
>>>
>>> The following test cases of memory hotplug for a VM [1], tested in the
>>> environment [2], show that this optimization can significantly reduce the
>>> memory hotplug time [3].
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Plug Memory    | 256G |      10s      |      3s      |       70%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      36s      |      7s      |       81%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> +----------------+------+---------------+--------------+----------------+
>>> |                | Size | Time (before) | Time (after) | Time Reduction |
>>> |                +------+---------------+--------------+----------------+
>>> | Unplug Memory  | 256G |      11s      |      4s      |       64%      |
>>> |                +------+---------------+--------------+----------------+
>>> |                | 512G |      36s      |      9s      |       75%      |
>>> +----------------+------+---------------+--------------+----------------+
>>>
>>> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>>>     object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>>>     device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>>>     qom-set vmem1 requested-size 256G/512G (Plug Memory)
>>>     qom-set vmem1 requested-size 0G (Unplug Memory)
>>>
>>> [2] Hardware     : Intel Icelake server
>>>     Guest Kernel : v7.0-rc4
>>>     Qemu         : v9.0.0
>>>
>>>     Launch VM    :
>>>     qemu-system-x86_64 -accel kvm -cpu host \
>>>     -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>>>     -drive file=./seed.img,format=raw,if=virtio \
>>>     -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>>>     -m 2G,slots=10,maxmem=2052472M \
>>>     -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>>>     -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>>>     -nographic -machine q35 \
>>>     -nic user,hostfwd=tcp::3000-:22
>>>
>>>     Guest kernel auto-onlines newly added memory blocks:
>>>     echo online > /sys/devices/system/memory/auto_online_blocks
>>>
>>> [3] The time from typing the QEMU commands in [1] to when the output of
>>>     'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
>>>     memory is recognized.
>>>
>>> Reported-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Reported-by: Chen Zhang <zhangchen.kidd@jd.com>
>>> Tested-by: Yuan Liu <yuan1.liu@intel.com>
>>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>> Reviewed-by: Yu C Chen <yu.c.chen@intel.com>
>>> Reviewed-by: Pan Deng <pan.deng@intel.com>
>>> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
>>> Co-developed-by: Tianyou Li <tianyou.li@intel.com>
>>> Signed-off-by: Tianyou Li <tianyou.li@intel.com>
>>> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
>>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>>> ---
>>
>> [...]
>>
>>> @@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>>>   *   zone/node above the hole except for the trailing pages in the last
>>>   *   section that will be appended to the zone/node below.
>>>   */
>>> -static void __init init_unavailable_range(unsigned long spfn,
>>> +static unsigned long __init init_unavailable_range(unsigned long spfn,
>>>  					  unsigned long epfn,
>>>  					  int zone, int node)
>>>  {
>>> @@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>>>  	if (pgcnt)
>>>  		pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
>>>  			node, zone_names[zone], pgcnt);
>>> +	return pgcnt;
>>>  }
>>>  
>>>  /*
>>> @@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
>>>  	memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
>>>  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
>>>  			  false);
>>> +	zone->pages_with_online_memmap += end_pfn - start_pfn;
>>>  
>>> -	if (*hole_pfn < start_pfn)
>>> -		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
>>> +	if (*hole_pfn < start_pfn) {
>>> +		unsigned long pgcnt;
>>> +
>>> +		if (*hole_pfn < zone_start_pfn) {
>>> +			init_unavailable_range(*hole_pfn, zone_start_pfn,
>>> +					       zone_id, nid);
>>> +			pgcnt = init_unavailable_range(zone_start_pfn,
>>> +					start_pfn, zone_id, nid);
>>
>> Indentation of parameters.
>>
>>> +		} else {
>>> +			pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
>>> +					zone_id, nid);
>>
>>
>> Same here.
>>
>>> +		}
>>> +		zone->pages_with_online_memmap += pgcnt;
>>> +	}
>>
>>
>> Maybe something like the following could make it nicer to read, just a
>> thought.
>>
>>
>> unsigned long hole_start_pfn = *hole_pfn;
>>
>> if (hole_start_pfn < zone_start_pfn) {
>> 	init_unavailable_range(hole_start_pfn, zone_start_pfn,
>> 			       zone_id, nid);
>> 	hole_start_pfn = zone_start_pfn;
>> }
>> pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
>> 			       zone_id, nid);
>>
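
For illustration only, here is a user-space model of the accounting that
the split above implies: the part of the hole below zone_start_pfn gets
its memmap initialized but is not counted, while the part inside the zone
span is added to pages_with_online_memmap. Everything prefixed with
model_ is hypothetical, and the stub simply assumes that every PFN in the
hole has a memmap, which the real init_unavailable_range() does not
guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the zone fields discussed in the patch. */
struct model_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	unsigned long pages_with_online_memmap;
};

/*
 * Stub (assumption): pretend every PFN in [spfn, epfn) got its memmap
 * initialized and return that count.
 */
static unsigned long model_init_unavailable_range(unsigned long spfn,
						  unsigned long epfn)
{
	return epfn > spfn ? epfn - spfn : 0;
}

/* Account the hole [hole_pfn, start_pfn) in front of a memory range. */
static void model_account_hole(struct model_zone *zone,
			       unsigned long hole_pfn,
			       unsigned long start_pfn)
{
	unsigned long hole_start_pfn = hole_pfn;

	/* Assumption: start_pfn was already clamped to the zone span. */
	assert(start_pfn >= zone->zone_start_pfn);

	if (hole_start_pfn >= start_pfn)
		return;

	if (hole_start_pfn < zone->zone_start_pfn) {
		/* Below the zone span: initialized, but not counted. */
		model_init_unavailable_range(hole_start_pfn,
					     zone->zone_start_pfn);
		hole_start_pfn = zone->zone_start_pfn;
	}
	zone->pages_with_online_memmap +=
		model_init_unavailable_range(hole_start_pfn, start_pfn);
}

/* The contiguity test described in the commit message. */
static bool model_zone_is_contiguous(const struct model_zone *zone)
{
	return zone->spanned_pages == zone->pages_with_online_memmap;
}
```

With a zone spanning PFNs 100..149, memory at 110..149 and a hole that
starts at PFN 90, only the ten hole pages at 100..109 are counted, so the
counter can reach spanned_pages without ever exceeding it.
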
> 
> Yeah, this looks better :)
> 
> sashiko had several comments
> https://sashiko.dev/#/patchset/20260408031615.1831922-1-yuan1.liu%40intel.com
> 
> I skipped the ones related to hotplug, but in the mm_init part the comment
> about zones that can have overlapping physical spans when mirrored
> kernelcore is enabled seems valid.

The set_zone_contiguous()/clear_zone_contiguous() comments can be
ignored, I think.

The comment about shrink_zone_span() is likely not realistic.
shrink_zone_span() would not shrink over boot holes.

Well, unless we have an odd case where the hole+memory starts in the
middle of a PAGES_PER_SUBSECTION chunk. But memory starting/ending in
the middle of a PAGES_PER_SUBSECTION chunk would already be problematic,
and I don't think such a case exists.

We could improve shrink_zone_span() to let
find_smallest_section_pfn()/find_biggest_section_pfn() test pfn_to_nid()
and page_zone() not only on the smallest/highest PFN, but also on the
highest/smallest PFN in a PAGES_PER_SUBSECTION chunk.

No need to test pfn_to_online_page() twice, as the result is the same
for all pages in a PAGES_PER_SUBSECTION chunk.
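
A rough user-space sketch of that idea, not kernel code: the model_*
names, the toy chunk size and the array-backed "memmap" are all
assumptions made for illustration. The online state is looked up once
per chunk, and the node/zone match is tested on both the first and the
last PFN of the chunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sizes for the model; the kernel's PAGES_PER_SUBSECTION is larger. */
#define MODEL_PAGES_PER_SUBSECTION 8UL
#define MODEL_NR_PFNS 32UL

/* Hypothetical per-PFN state standing in for the real memmap. */
struct model_pfn {
	bool online;
	int nid;
	int zone;
};

static struct model_pfn model_memmap[MODEL_NR_PFNS];

/* Test helper: describe the PFNs in [start, end). */
static void model_set_range(unsigned long start, unsigned long end,
			    bool online, int nid, int zone)
{
	unsigned long pfn;

	assert(start <= end && end <= MODEL_NR_PFNS);
	for (pfn = start; pfn < end; pfn++)
		model_memmap[pfn] = (struct model_pfn){ online, nid, zone };
}

/* Stand-in for the pfn_to_nid()/page_zone() comparison on one PFN. */
static bool model_pfn_matches(unsigned long pfn, int nid, int zone)
{
	return model_memmap[pfn].nid == nid && model_memmap[pfn].zone == zone;
}

/* Does the chunk starting at chunk_start belong to (nid, zone)? */
static bool model_chunk_in_zone(unsigned long chunk_start, int nid, int zone)
{
	unsigned long chunk_end = chunk_start + MODEL_PAGES_PER_SUBSECTION - 1;

	assert(chunk_start % MODEL_PAGES_PER_SUBSECTION == 0);
	assert(chunk_end < MODEL_NR_PFNS);

	/* Online state is uniform within a chunk: one lookup suffices. */
	if (!model_memmap[chunk_start].online)
		return false;

	/* Test both ends so a chunk split between zones is rejected. */
	return model_pfn_matches(chunk_start, nid, zone) &&
	       model_pfn_matches(chunk_end, nid, zone);
}

/* Lowest matching chunk start in [start, end), or end if there is none. */
static unsigned long model_find_smallest_chunk(unsigned long start,
					       unsigned long end,
					       int nid, int zone)
{
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn += MODEL_PAGES_PER_SUBSECTION)
		if (model_chunk_in_zone(pfn, nid, zone))
			return pfn;
	return end;
}
```

A chunk whose first PFN matches but whose last PFN belongs to another
zone is skipped here, whereas a check on the first PFN alone would
accept it. Returning "end" when nothing matches is a choice of this
model, not the convention of the kernel helpers.
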

-- 
Cheers,

David
