Date: Thu, 11 Dec 2025 06:07:45 +0100
From: Oscar Salvador
To: Tianyou Li
Cc: David Hildenbrand, Mike Rapoport, Wei Yang, linux-mm@kvack.org,
    Yong Hu, Nanhai Zou, Yuan Liu, Tim Chen, Qiuxu Zhuo, Yu C Chen,
    Pan Deng, Chen Zhang, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range
References: <20251208152544.1150732-1-tianyou.li@intel.com>
 <20251208152544.1150732-2-tianyou.li@intel.com>
In-Reply-To: <20251208152544.1150732-2-tianyou.li@intel.com>

On Mon, Dec 08, 2025 at 11:25:43PM +0800, Tianyou Li wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() is invoked,
> it updates zone->contiguous by checking the new zone's pfn range from the
> beginning to the end, regardless of the previous state of the old zone.
> When the zone's pfn range is large, the cost of traversing the pfn range
> to update zone->contiguous can be significant.
>
> Add fast paths to quickly detect cases where the zone is definitely not
> contiguous without scanning the new zone. The cases are: if the new range
> neither overlaps nor is adjacent to the previous range, zone->contiguous
> should be false; if the new range is adjacent to the previous range, only
> the new range needs to be checked; if the newly added pages cannot fill
> the hole of the previous zone, zone->contiguous should be false.
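To check the three growing cases against each other, here is a small user-space model. It is only a sketch under stated assumptions, not the patch's zone_contig_state_after_growing() (whose body is not quoted in this reply): `struct toy_zone` and `toy_state_after_growing()` are hypothetical names, "contiguous" is simplified to "no holes in the span" (the kernel actually checks at pageblock granularity), and the added range is assumed to be fully populated and not already present in the zone.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the tri-state enum the patch adds to mm/internal.h. */
enum zone_contig_state {
	ZONE_CONTIG_YES,
	ZONE_CONTIG_NO,
	ZONE_CONTIG_MAYBE,
};

/* Hypothetical stand-in for struct zone: span is [start_pfn, start_pfn + spanned_pages). */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
	bool contiguous;	/* simplified: true iff the span has no holes */
};

static unsigned long toy_zone_end_pfn(const struct toy_zone *z)
{
	return z->start_pfn + z->spanned_pages;
}

/*
 * Classify contiguity after adding [start_pfn, start_pfn + nr_pages).
 * Model assumption: the added range is fully populated and was absent before.
 */
static enum zone_contig_state
toy_state_after_growing(const struct toy_zone *z, unsigned long start_pfn,
			unsigned long nr_pages)
{
	const unsigned long end_pfn = start_pfn + nr_pages;

	assert(nr_pages > 0);

	/* Empty zone: it becomes exactly the new, fully populated range. */
	if (z->spanned_pages == 0)
		return ZONE_CONTIG_YES;

	/* Neither overlapping nor adjacent: a gap is left between the two. */
	if (end_pfn < z->start_pfn || start_pfn > toy_zone_end_pfn(z))
		return ZONE_CONTIG_NO;

	/* Adjacent: only the new range needs checking, and it is full here. */
	if (end_pfn == z->start_pfn || start_pfn == toy_zone_end_pfn(z))
		return z->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_NO;

	/* Overlapping: fewer new pages than the old hole always leaves a hole. */
	if (nr_pages < z->spanned_pages - z->present_pages)
		return ZONE_CONTIG_NO;

	/* The new pages might fill the hole: only a full scan can tell. */
	return ZONE_CONTIG_MAYBE;
}
```

In this model only the last case falls back to scanning; every other case is decided from the span and page counts alone, which is where the savings described in the changelog would come from.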
>
> The following test cases of memory hotplug for a VM [1], tested in the
> environment [2], show that this optimization can significantly reduce the
> memory hotplug time [3].
>
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Plug Memory    | 256G |      10s      |      2s      |       80%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      33s      |      6s      |       81%      |
> +----------------+------+---------------+--------------+----------------+
>
> +----------------+------+---------------+--------------+----------------+
> |                | Size | Time (before) | Time (after) | Time Reduction |
> |                +------+---------------+--------------+----------------+
> | Unplug Memory  | 256G |      10s      |      2s      |       80%      |
> |                +------+---------------+--------------+----------------+
> |                | 512G |      34s      |      6s      |       82%      |
> +----------------+------+---------------+--------------+----------------+
>
> [1] Qemu commands to hotplug 256G/512G memory for a VM:
>     object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
>     device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
>     qom-set vmem1 requested-size 256G/512G (Plug Memory)
>     qom-set vmem1 requested-size 0G (Unplug Memory)
>
> [2] Hardware     : Intel Icelake server
>     Guest Kernel : v6.18-rc2
>     Qemu         : v9.0.0
>
>     Launch VM    :
>     qemu-system-x86_64 -accel kvm -cpu host \
>     -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
>     -drive file=./seed.img,format=raw,if=virtio \
>     -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
>     -m 2G,slots=10,maxmem=2052472M \
>     -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
>     -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
>     -nographic -machine q35 \
>     -nic user,hostfwd=tcp::3000-:22
>
>     Guest kernel auto-onlines newly added memory blocks:
>     echo online > /sys/devices/system/memory/auto_online_blocks
>
> [3] The time from typing the QEMU commands in [1] to when
>     the output of 'grep MemTotal /proc/meminfo' in the guest reflects that
>     all hotplugged memory is recognized.
>
> Reported-by: Nanhai Zou
> Reported-by: Chen Zhang
> Tested-by: Yuan Liu
> Reviewed-by: Tim Chen
> Reviewed-by: Qiuxu Zhuo
> Reviewed-by: Yu C Chen
> Reviewed-by: Pan Deng
> Reviewed-by: Nanhai Zou
> Reviewed-by: Yuan Liu
> Signed-off-by: Tianyou Li

Overall this looks good to me, thanks Tianyou Li for working on this.
Just some minor comments below:

> ---
>  mm/internal.h       |  8 +++++-
>  mm/memory_hotplug.c | 64 ++++++++++++++++++++++++++++++++++++++++++---
>  mm/mm_init.c        | 13 +++++++--
>  3 files changed, 79 insertions(+), 6 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..1b5bba6526d4 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -730,7 +730,13 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>  }
>
> -void set_zone_contiguous(struct zone *zone);
> +enum zone_contig_state {
> +	ZONE_CONTIG_YES,
> +	ZONE_CONTIG_NO,
> +	ZONE_CONTIG_MAYBE,
> +};
> +
> +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state);
>  bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
>  				unsigned long nr_pages);
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0be83039c3b5..d711f6e2c87f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -544,6 +544,28 @@ static void update_pgdat_span(struct pglist_data *pgdat)
>  	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
>  }
>
> +static enum zone_contig_state __meminit zone_contig_state_after_shrinking(
> +		struct zone *zone, unsigned long start_pfn, unsigned long nr_pages)

Why do we need __meminit here? These functions are only used by the
memory-hotplug code, so we should not need it.

> +{
> +	const unsigned long end_pfn = start_pfn + nr_pages;
> +
> +	/*
> +	 * If the removed pfn range inside the original zone span, the contiguous
> +	 * property is surely false.
> +	 */
> +	if (start_pfn > zone->zone_start_pfn && end_pfn < zone_end_pfn(zone))
> +		return ZONE_CONTIG_NO;
> +
> +	/* If the removed pfn range is at the beginning or end of the
> +	 * original zone span, the contiguous property is preserved when
> +	 * the original zone is contiguous.
> +	 */
> +	if (start_pfn == zone->zone_start_pfn || end_pfn == zone_end_pfn(zone))
> +		return zone->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE;
> +
> +	return ZONE_CONTIG_MAYBE;
> +}
> +
>  void remove_pfn_range_from_zone(struct zone *zone,
>  				unsigned long start_pfn,
>  				unsigned long nr_pages)
> @@ -551,6 +573,7 @@ void remove_pfn_range_from_zone(struct zone *zone,
>  	const unsigned long end_pfn = start_pfn + nr_pages;
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	unsigned long pfn, cur_nr_pages;
> +	enum zone_contig_state contiguous_state = ZONE_CONTIG_MAYBE;

I think that new_contiguous_state is clearer, but I do not have a strong
opinion here.

>  	/* Poison struct pages because they are now uninitialized again. */
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> @@ -571,12 +594,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
>  	if (zone_is_zone_device(zone))
>  		return;
>
> +	contiguous_state = zone_contig_state_after_shrinking(zone, start_pfn, nr_pages);
>  	clear_zone_contiguous(zone);
>
>  	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>  	update_pgdat_span(pgdat);
>
> -	set_zone_contiguous(zone);
> +	set_zone_contiguous(zone, contiguous_state);
>  }
> ...
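To see how the quoted pieces fit together (the tri-state result, computing it before clear_zone_contiguous(), and the early returns that the mm_init.c hunk adds to set_zone_contiguous()), here is a small user-space model of the remove path. It is a sketch only: `struct toy_zone`, the `toy_*` helpers and `TOY_MAX_PFN` are hypothetical, contiguity is simplified to "every pfn in the span is present" (the kernel checks pageblocks), and shrink_zone_span() is reduced to trimming non-present pfns at both edges.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PFN 64UL	/* hypothetical: a tiny pfn space for the model */

/* Mirrors the tri-state enum the patch adds to mm/internal.h. */
enum zone_contig_state {
	ZONE_CONTIG_YES,
	ZONE_CONTIG_NO,
	ZONE_CONTIG_MAYBE,
};

/* Hypothetical stand-in for struct zone; present[] marks populated pfns. */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
	bool present[TOY_MAX_PFN];
	bool contiguous;
};

/* Same three decisions as the quoted zone_contig_state_after_shrinking(). */
static enum zone_contig_state
toy_state_after_shrinking(const struct toy_zone *z, unsigned long start_pfn,
			  unsigned long nr_pages)
{
	const unsigned long end_pfn = start_pfn + nr_pages;

	if (start_pfn > z->start_pfn && end_pfn < z->end_pfn)
		return ZONE_CONTIG_NO;		/* punches a hole */
	if (start_pfn == z->start_pfn || end_pfn == z->end_pfn)
		return z->contiguous ? ZONE_CONTIG_YES : ZONE_CONTIG_MAYBE;
	return ZONE_CONTIG_MAYBE;
}

/* Model of the new set_zone_contiguous(): only MAYBE pays for a scan. */
static void toy_set_zone_contiguous(struct toy_zone *z,
				    enum zone_contig_state state)
{
	if (state == ZONE_CONTIG_YES) {
		z->contiguous = true;
		return;
	}
	if (state == ZONE_CONTIG_NO)
		return;			/* stays false after the earlier clear */
	for (unsigned long pfn = z->start_pfn; pfn < z->end_pfn; pfn++)
		if (!z->present[pfn])
			return;		/* found a hole */
	z->contiguous = true;
}

/* Populate [start_pfn, end_pfn) and derive the initial state by scanning. */
static void toy_zone_init(struct toy_zone *z, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	assert(start_pfn <= end_pfn && end_pfn <= TOY_MAX_PFN);
	*z = (struct toy_zone){ .start_pfn = start_pfn, .end_pfn = end_pfn };
	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++)
		z->present[pfn] = true;
	toy_set_zone_contiguous(z, ZONE_CONTIG_MAYBE);
}

/* Model of remove_pfn_range_from_zone() for a non-device zone. */
static void toy_remove_range(struct toy_zone *z, unsigned long start_pfn,
			     unsigned long nr_pages)
{
	assert(start_pfn + nr_pages <= TOY_MAX_PFN);
	/* Must run before the clear: the helper reads z->contiguous. */
	const enum zone_contig_state new_state =
		toy_state_after_shrinking(z, start_pfn, nr_pages);

	z->contiguous = false;			/* clear_zone_contiguous() */
	for (unsigned long pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
		z->present[pfn] = false;
	/* Simplified shrink_zone_span(): trim non-present pfns at both edges. */
	while (z->start_pfn < z->end_pfn && !z->present[z->start_pfn])
		z->start_pfn++;
	while (z->end_pfn > z->start_pfn && !z->present[z->end_pfn - 1])
		z->end_pfn--;
	toy_set_zone_contiguous(z, new_state);
}
```

The model makes the ordering constraint visible: the state helper reads the old contiguous flag, so it has to run before the flag is cleared, which matches where both call sites in the patch compute it.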
> @@ -752,7 +809,8 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  {
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	int nid = pgdat->node_id;
> -
> +	const enum zone_contig_state contiguous_state =
> +		zone_contig_state_after_growing(zone, start_pfn, nr_pages);

Same comment as for remove_pfn_range_from_zone().

>  	clear_zone_contiguous(zone);
>
>  	if (zone_is_empty(zone))
> @@ -783,7 +841,7 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  			 MEMINIT_HOTPLUG, altmap, migratetype,
>  			 isolate_pageblock);
>
> -	set_zone_contiguous(zone);
> +	set_zone_contiguous(zone, contiguous_state);
>  }
>
>  struct auto_movable_stats {
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 7712d887b696..e296bd9fac9e 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -2263,11 +2263,19 @@ void __init init_cma_pageblock(struct page *page)
>  }
>  #endif
>
> -void set_zone_contiguous(struct zone *zone)
> +void set_zone_contiguous(struct zone *zone, enum zone_contig_state state)
>  {
>  	unsigned long block_start_pfn = zone->zone_start_pfn;
>  	unsigned long block_end_pfn;
>
> +	if (state == ZONE_CONTIG_YES) {
> +		zone->contiguous = true;
> +		return;
> +	}
> +
> +	if (state == ZONE_CONTIG_NO)
> +		return;
> +
>  	block_end_pfn = pageblock_end_pfn(block_start_pfn);
>  	for (; block_start_pfn < zone_end_pfn(zone);
>  			block_start_pfn = block_end_pfn,
> @@ -2283,6 +2291,7 @@ void set_zone_contiguous(struct zone *zone)
>
>  	/* We confirm that there is no hole */
>  	zone->contiguous = true;
> +

Not needed?

>  }

-- 
Oscar Salvador
SUSE Labs