Date: Tue, 14 Apr 2026 02:12:19 +0000
From: Wei Yang
To: "David Hildenbrand (Arm)"
Cc: Wei Yang, Yuan Liu, Oscar Salvador, Mike Rapoport, linux-mm@kvack.org,
    Yong Hu, Nanhai Zou, Tim Chen, Qiuxu Zhuo, Yu C Chen, Pan Deng,
    Tianyou Li, Chen Zhang, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Message-ID: <20260414021219.wayysugpfbzirzh6@master>
Reply-To: Wei Yang
References: <20260408031615.1831922-1-yuan1.liu@intel.com> <20260413130633.knzkliyqvjhuz2kd@master> <1928b6b0-2ec3-43ca-a41b-e880d974af04@kernel.org>
In-Reply-To: <1928b6b0-2ec3-43ca-a41b-e880d974af04@kernel.org>
User-Agent: NeoMutt/20170113 (1.7.2)

On Mon, Apr 13, 2026 at 08:24:05PM +0200, David Hildenbrand (Arm) wrote:
>> With the last memblock region fits in Node 1 Zone Normal.
>>
>> Then I punch a hole of 2M (subsection) size in this region with the
>> following change, to mimic a hole in the memory range:
>>
>> @@ -1372,5 +1372,8 @@ __init void e820__memblock_setup(void)
>>         /* Throw away partial pages: */
>>         memblock_trim_memory(PAGE_SIZE);
>>
>> +       memblock_remove(0x140000000, 0x200000);
>> +
>>         memblock_dump_all();
>>  }
>>
>> Then the memblock dump shows:
>>
>>   MEMBLOCK configuration:
>>    memory size = 0x000000017fd7dc00 reserved size = 0x0000000005a979c2
>>    memory.cnt  = 0x4
>>    memory[0x0]  [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
>>    memory[0x1]  [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
>> +- memory[0x2]  [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 1 flags: 0x0
>> +- memory[0x3]  [0x0000000140200000-0x00000001bfffffff], 0x000000007fe00000 bytes on node 1 flags: 0x0
>>
>> We can see the original single memblock region is divided into two, with
>> a hole of 2M in the middle.
>
>Yes, that makes sense.
>
>>
>> Not sure this is a reasonable mimic of a memory hole. I also tried to
>> punch a larger hole, e.g. 10M, and still see the behavioral change.
>>
>> The /proc/zoneinfo result:
>>
>> w/o patch
>>
>>   Node 1, zone   Normal
>>     pages free     469271
>>           boost    0
>>           min      8567
>>           low      10708
>>           high     12849
>>           promo    14990
>>           spanned  786432
>>           present  785920
>>           contigu  0        <--- zone is non-contiguous
>>           managed  766024
>>           cma      0
>>
>> with patch
>>
>>   Node 1, zone   Normal
>>     pages free     121098
>>           boost    0
>>           min      8665
>>           low      10831
>>           high     12997
>>           promo    15163
>>           spanned  786432
>>           present  785920
>>           contigu  1        <--- zone is contiguous
>>           managed  773041
>>           cma      0
>>
>> This shows we treated Node 1 Zone Normal as non-contiguous before, but
>> treat it as a contiguous zone after this patch.
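
As a sanity check on the numbers quoted above, here is a small user-space
sketch of the page arithmetic. It assumes 4 KiB pages (x86-64); the toy_*
names are hypothetical and not kernel symbols. It only confirms that
spanned matches the original 3 GiB region and that present is spanned
minus the 512 pages of the 2M hole.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define TOY_PAGE_SHIFT 12

/* Number of pages in the byte range [start, end), end exclusive. */
static uint64_t toy_bytes_to_pages(uint64_t start, uint64_t end)
{
	return (end - start) >> TOY_PAGE_SHIFT;
}

/* spanned: the original region 0x100000000-0x1bfffffff (inclusive). */
static uint64_t toy_spanned_pages(void)
{
	return toy_bytes_to_pages(0x100000000ULL, 0x1c0000000ULL);
}

/* present: spanned minus the 2M hole removed at 0x140000000. */
static uint64_t toy_present_pages(void)
{
	uint64_t hole = toy_bytes_to_pages(0x140000000ULL, 0x140200000ULL);

	return toy_spanned_pages() - hole;
}

/* Aborts if the arithmetic disagrees with the zoneinfo values above. */
static void toy_check_zoneinfo(void)
{
	assert(toy_spanned_pages() == 786432);
	assert(toy_present_pages() == 785920);
}
```

Calling toy_check_zoneinfo() from a test main() is enough to run the
check; spanned and present are the same with and without the patch, only
the contiguous flag differs.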
>>
>> Reason:
>>
>> set_zone_contiguous()
>>     __pageblock_pfn_to_page()
>>         pfn_to_online_page()
>>             pfn_section_valid()   <--- check subsection
>>
>> When SPARSEMEM_VMEMMAP is set, pfn_section_valid() checks the subsection
>> bit to decide if the pfn is valid. For a hole, the corresponding bit is
>> not set. So the zone is non-contiguous before the patch.
>>
>> After this patch, the memory map in this hole also contributes to
>> pages_with_online_memmap, so the zone is treated as contiguous.
>
>That means that mm init code actually initialized a memmap, so there is
>a memmap there that is properly initialized?
>
>So init_unavailable_range()->for_each_valid_pfn() processed these
>sub-section holes I guess.
>

Yes, I think so.

When memmap_init()->for_each_mem_pfn_range() iterates on the last memblock
region, init_unavailable_range() will init the hole.

>subsection_map_init() takes care of initializing the subsections. That
>happens before memmap_init() in free_area_init().
>

Yes. I guess you mean sparse_init_subsection_map().

>Is there a problem in for_each_valid_pfn()?
>
>And I think there is in first_valid_pfn:
>

You mean there is a problem in first_valid_pfn?

>	if (valid_section(ms) &&
>	    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
>		rcu_read_unlock_sched();
>		return pfn;
>	}
>
>The PFN is valid, but we actually care about whether it will be online.
>So likely, we should skip over sub-sections here also for early sections
>(even though the memmap exist, nobody should be looking at it, just like
>for an offline memory section).
>

And it should be like below?

	if (valid_section(ms) &&
	    pfn_section_first_valid(ms, &pfn)) {
		rcu_read_unlock_sched();
		return pfn;
	}

IIUC, this would skip the hole and leave the allocated memory map
uninitialized. Those pages then won't contribute to
pages_with_online_memmap, which in turn leaves the zone non-contiguous.

But we want the zone to be contiguous when we have a hole like this, right?
Sorry, I don't follow here.
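
To make my reading concrete, here is a minimal user-space sketch of the
two variants. It is a toy model, not the kernel code: the toy_* names and
the struct are hypothetical, valid_section() is left out, and the
constants assume x86-64 (4 KiB pages, 128 MiB sections, 2 MiB
subsections). The bitmap search stands in for what I understand
pfn_section_first_valid() to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: x86-64 values, 4 KiB pages. */
#define TOY_PAGES_PER_SECTION		(1ULL << 15)	/* 128 MiB */
#define TOY_PAGES_PER_SUBSECTION	(1ULL << 9)	/* 2 MiB */
#define TOY_SUBSECTIONS_PER_SECTION	64

/* Hypothetical stand-in for the per-section state used by the check. */
struct toy_section {
	bool early;			/* section was present at boot */
	uint64_t subsection_map;	/* bit n set: subsection n has memory */
};

/*
 * trust_early == true models the current condition, where an early
 * section accepts the pfn without looking at the subsection map.
 * trust_early == false models the suggested change: always consult the
 * map and advance *pfn to the next populated subsection.
 */
static bool toy_first_valid(const struct toy_section *ms, bool trust_early,
			    uint64_t *pfn)
{
	unsigned int idx, bit;

	if (trust_early && ms->early)
		return true;

	idx = (unsigned int)((*pfn & (TOY_PAGES_PER_SECTION - 1)) /
			     TOY_PAGES_PER_SUBSECTION);
	for (bit = idx; bit < TOY_SUBSECTIONS_PER_SECTION; bit++) {
		if (!(ms->subsection_map & (1ULL << bit)))
			continue;
		if (bit != idx)
			*pfn = (*pfn & ~(TOY_PAGES_PER_SECTION - 1)) +
			       bit * TOY_PAGES_PER_SUBSECTION;
		return true;
	}
	return false;
}

/* The section containing the 2M hole at 0x140000000: only bit 0 is clear. */
static void toy_check_hole(void)
{
	struct toy_section ms = { .early = true, .subsection_map = ~1ULL };
	uint64_t pfn;

	pfn = 0x140000;		/* first pfn of the hole */
	assert(toy_first_valid(&ms, true, &pfn) && pfn == 0x140000);

	pfn = 0x140000;
	assert(toy_first_valid(&ms, false, &pfn) && pfn == 0x140200);
}
```

In this model the current condition hands back the hole pfn itself, while
the changed condition jumps to 0x140200, i.e. the start of memory[0x3].
That is the skip I am asking about.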
>>
>> Some questions:
>>
>> I suspect with !SPARSEMEM_VMEMMAP, we always treat Zone Normal as
>> contiguous, because we don't set subsection bits. So it looks like the
>> behavior is different from SPARSEMEM_VMEMMAP. But I didn't manage to
>> build a kernel with !SPARSEMEM_VMEMMAP to verify.
>>
>> I see the discussion on defining zone->contiguous as "safe to use
>> pfn_to_page() for the whole zone". For this purpose, the current change
>> looks good to me, since we do allocate and init the memory map for holes.
>
>Right.
>
>>
>> But pageblock_pfn_to_page() is used for compaction and elsewhere. A pfn
>> with a memory map but no actual memory does not seem guaranteed to be a
>> usable page. So is the correct usage that, after pageblock_pfn_to_page()
>> returns a page, we should validate each page in the range before using
>> it? I am a little lost here.
>
>These non-existent pages (holes) are no different than allocated
>un-movable memory. So compaction code must deal with them. Just like
>smaller memory holes that don't cover a full memory section.
>

Thanks for the explanation.

>--
>Cheers,
>
>David

-- 
Wei Yang
Help you, Help me