Date: Thu, 9 Apr 2026 17:40:05 +0300
From: Mike Rapoport
To: "David Hildenbrand (Arm)"
Cc: Yuan Liu, Oscar Salvador, Wei Yang, linux-mm@kvack.org, Yong Hu,
	Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng, Tianyou Li,
	Chen Zhang, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
References: <20260408031615.1831922-1-yuan1.liu@intel.com>
	<17b821b6-0176-43d5-92f7-fe2a0c4f70cf@kernel.org>
In-Reply-To: <17b821b6-0176-43d5-92f7-fe2a0c4f70cf@kernel.org>

On Wed, Apr 08, 2026 at 09:36:14AM +0200, David Hildenbrand (Arm) wrote:
> On 4/8/26 05:16, Yuan Liu wrote:
> > When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> > zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> > to rebuild zone->contiguous. For large zones this is a significant cost
> > during memory hotplug and hot-unplug.
> >
> > Add a new zone member pages_with_online_memmap that tracks the number of
> > pages within the zone span that have an online memory map (including
> > present pages and memory holes whose memory map has been initialized).
> > When spanned_pages == pages_with_online_memmap the zone is contiguous and
> > pfn_to_page() can be called on any PFN in the zone span without further
> > pfn_valid() checks.
> >
> > Only pages that fall within the current zone span are accounted towards
> > pages_with_online_memmap. A "too small" value is safe, it merely prevents
> > detecting a contiguous zone.
> >
> > The following test cases of memory hotplug for a VM [1], tested in the
> > environment [2], show that this optimization can significantly reduce
> > the memory hotplug time [3].
> >
> > +-------------+------+---------------+--------------+----------------+
> > |             | Size | Time (before) | Time (after) | Time Reduction |
> > |             +------+---------------+--------------+----------------+
> > | Plug Memory | 256G | 10s           | 3s           | 70%            |
> > |             +------+---------------+--------------+----------------+
> > |             | 512G | 36s           | 7s           | 81%            |
> > +-------------+------+---------------+--------------+----------------+
> >
> > +---------------+------+---------------+--------------+----------------+
> > |               | Size | Time (before) | Time (after) | Time Reduction |
> > |               +------+---------------+--------------+----------------+
> > | Unplug Memory | 256G | 11s           | 4s           | 64%            |
> > |               +------+---------------+--------------+----------------+
> > |               | 512G | 36s           | 9s           | 75%            |
> > +---------------+------+---------------+--------------+----------------+
> >
> > [1] Qemu commands to hotplug 256G/512G memory for a VM:
> >     object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> >     device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> >     qom-set vmem1 requested-size 256G/512G (Plug Memory)
> >     qom-set vmem1 requested-size 0G (Unplug Memory)
> >
> > [2] Hardware     : Intel Icelake server
> >     Guest Kernel : v7.0-rc4
> >     Qemu         : v9.0.0
> >
> >     Launch VM :
> >     qemu-system-x86_64 -accel kvm -cpu host \
> >         -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> >         -drive file=./seed.img,format=raw,if=virtio \
> >         -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> >         -m 2G,slots=10,maxmem=2052472M \
> >         -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> >         -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> >         -nographic -machine q35 \
> >         -nic user,hostfwd=tcp::3000-:22
> >
> >     Guest kernel auto-onlines newly added memory blocks:
> >     echo online > /sys/devices/system/memory/auto_online_blocks
> >
> > [3] The time from typing the QEMU commands in [1] to when the output of
> >     'grep MemTotal /proc/meminfo' on Guest reflects that all hotplugged
> >     memory is recognized.
> >
> > Reported-by: Nanhai Zou
> > Reported-by: Chen Zhang
> > Tested-by: Yuan Liu
> > Reviewed-by: Tim Chen
> > Reviewed-by: Qiuxu Zhuo
> > Reviewed-by: Yu C Chen
> > Reviewed-by: Pan Deng
> > Reviewed-by: Nanhai Zou
> > Co-developed-by: Tianyou Li
> > Signed-off-by: Tianyou Li
> > Signed-off-by: Yuan Liu
> > Acked-by: David Hildenbrand (Arm)
> > ---
>
> [...]
>
> > @@ -842,7 +842,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> >   * zone/node above the hole except for the trailing pages in the last
> >   * section that will be appended to the zone/node below.
> >   */
> > -static void __init init_unavailable_range(unsigned long spfn,
> > +static unsigned long __init init_unavailable_range(unsigned long spfn,
> >  					  unsigned long epfn,
> >  					  int zone, int node)
> >  {
> > @@ -858,6 +858,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> >  	if (pgcnt)
> >  		pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
> >  			node, zone_names[zone], pgcnt);
> > +	return pgcnt;
> >  }
> >
> >  /*
> > @@ -956,9 +957,22 @@ static void __init memmap_init_zone_range(struct zone *zone,
> >  	memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
> >  			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
> >  			  false);
> > +	zone->pages_with_online_memmap += end_pfn - start_pfn;
> >
> > -	if (*hole_pfn < start_pfn)
> > -		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
> > +	if (*hole_pfn < start_pfn) {
> > +		unsigned long pgcnt;
> > +
> > +		if (*hole_pfn < zone_start_pfn) {
> > +			init_unavailable_range(*hole_pfn, zone_start_pfn,
> > +				zone_id, nid);
> > +			pgcnt = init_unavailable_range(zone_start_pfn,
> > +				start_pfn, zone_id, nid);
>
> Indentation of parameters.
>
> > +		} else {
> > +			pgcnt = init_unavailable_range(*hole_pfn, start_pfn,
> > +				zone_id, nid);
>
> Same here.
>
> > +		}
> > +		zone->pages_with_online_memmap += pgcnt;
> > +	}
>
> Maybe something like the following could make it nicer to read, just a
> thought.
>
> 	unsigned long hole_start_pfn = *hole_pfn;
>
> 	if (hole_start_pfn < zone_start_pfn) {
> 		init_unavailable_range(hole_start_pfn, zone_start_pfn,
> 				       zone_id, nid);
> 		hole_start_pfn = zone_start_pfn;
> 	}
> 	pgcnt = init_unavailable_range(hole_start_pfn, start_pfn,
> 				       zone_id, nid);

Yeah, this looks better :)

sashiko had several comments:

  https://sashiko.dev/#/patchset/20260408031615.1831922-1-yuan1.liu%40intel.com

I skipped the ones related to hotplug, but in the mm_init part the comment
about zones that can have overlapping physical spans when mirrored
kernelcore is enabled seems valid.

> --
> Cheers,
> David

--
Sincerely yours,
Mike.