From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>, Yuan Liu <yuan1.liu@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>,
Mike Rapoport <rppt@kernel.org>,
linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
Nanhai Zou <nanhai.zou@intel.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
Tianyou Li <tianyou.li@intel.com>,
Chen Zhang <zhangchen.kidd@jd.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Mon, 13 Apr 2026 20:24:05 +0200 [thread overview]
Message-ID: <1928b6b0-2ec3-43ca-a41b-e880d974af04@kernel.org> (raw)
In-Reply-To: <20260413130633.knzkliyqvjhuz2kd@master>
> With the last memblock region fits in Node 1 Zone Normal.
>
> Then I punch a hole in this region with 2M(subsection) size with following
> change, to mimic there is a hole in memory range:
>
> @@ -1372,5 +1372,8 @@ __init void e820__memblock_setup(void)
> /* Throw away partial pages: */
> memblock_trim_memory(PAGE_SIZE);
>
> + memblock_remove(0x140000000, 0x200000);
> +
> memblock_dump_all();
> }
>
> Then the memblock dump shows:
>
> MEMBLOCK configuration:
> memory size = 0x000000017fd7dc00 reserved size = 0x0000000005a97 9c2
> memory.cnt = 0x4
> memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
> memory[0x1] [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
> +- memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 1 flags: 0x0
> +- memory[0x3] [0x0000000140200000-0x00000001bfffffff], 0x000000007fe00000 bytes on node 1 flags: 0x0
>
> We can see the original one memblock region is divided into two, with a hole
> of 2M in the middle.
Yes, that makes sense.
>
> Not sure this is a reasonable mimic of memory hole. Also I tried to
> punch a larger hole, e.g. 10M, still see the behavioral change.
>
> The /proc/zoneinfo result:
>
> w/o patch
>
> Node 1, zone Normal
> pages free 469271
> boost 0
> min 8567
> low 10708
> high 12849
> promo 14990
> spanned 786432
> present 785920
> contigu 0 <--- zone is non-contiguous
> managed 766024
> cma 0
>
> with patch
>
> Node 1, zone Normal
> pages free 121098
> boost 0
> min 8665
> low 10831
> high 12997
> promo 15163
> spanned 786432
> present 785920
> contigu 1 <--- zone is contiguous
> managed 773041
> cma 0
>
> This shows we treat Node 1 Zone Normal as non-contiguous before, but treat
> it a contiguous zone after this patch.
>
> Reason:
>
> set_zone_contiguous()
> __pageblock_pfn_to_page()
> pfn_to_online_page()
> pfn_section_valid() <--- check subsection
>
> When SPARSEMEM_VMEMMEP is set, pfn_section_valid() checks subsection bit to
> decide if it is valid. For a hole, the corresponding bit is not set. So it
> is non-contiguous before the patch.
>
> After this patch, the memory map in this hole also contributes to
> pages_with_online_memmap, so it is treated as contiguous.
That means that mm init code actually initialized a memmap, so there is
a memmap there that is properly initialized?
So init_unavailable_range()->for_each_valid_pfn() processed these
sub-section holes I guess.
subsection_map_init() takes care of initializing the subsections. That
happens before memmap_init() in free_area_init().
Is there a problem in for_each_valid_pfn()?
And I think there is in first_valid_pfn:
if (valid_section(ms) &&
(early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
rcu_read_unlock_sched();
return pfn;
}
The PFN is valid, but we actually care about whether it will be online.
So likely, we should skip over sub-sections here also for early sections
(even though the memmap exist, nobody should be looking at it, just like
for an offline memory section).
>
> Some question:
>
> I suspect with !SPARSEMEM_VMEMMEP, we always treat Zone Normal as
> contiguous, because we don't set subsection. So it looks the behavior is
> different from SPARSEMEM_VMEMMEP. But I didn't manage to build kernel with
> !SPARSEMEM_VMEMMEP to verify.
>
> I see the discussion on defining zone->contiguous as safe to use
> pfn_to_page() for the whole zone. For this purpose, current change looks
> good to me. Since we do allocate and init memory map for holes.
Right.
>
> But pageblock_pfn_to_page() is used for compaction and other. A pfn with
> memory map but no actual memory seems not guarantee to be a usable page. So
> the correct usage of pageblock_pfn_to_page() is after
> pageblock_pfn_to_page() return a page, we should validate each page in the
> range before using? I am a little lost here.
These non-existent pages (holes) are no different than allocated
un-movable memory. So compaction code must deal with them. Just like
smaller memory holes that don't cover a full memory section.
--
Cheers,
David
next prev parent reply other threads:[~2026-04-13 18:24 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-08 3:16 [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range Yuan Liu
2026-04-08 7:36 ` David Hildenbrand (Arm)
2026-04-08 12:29 ` Liu, Yuan1
2026-04-08 12:31 ` David Hildenbrand (Arm)
2026-04-08 12:37 ` Liu, Yuan1
2026-04-09 14:40 ` Mike Rapoport
2026-04-09 15:08 ` David Hildenbrand (Arm)
2026-04-14 7:06 ` Liu, Yuan1
2026-04-14 9:24 ` David Hildenbrand (Arm)
2026-04-17 6:34 ` Liu, Yuan1
2026-04-17 9:00 ` David Hildenbrand (Arm)
2026-04-17 9:28 ` Liu, Yuan1
2026-04-20 14:03 ` Mike Rapoport
2026-04-21 0:00 ` Liu, Yuan1
2026-04-28 0:37 ` Liu, Yuan1
2026-04-13 13:06 ` Wei Yang
2026-04-13 18:24 ` David Hildenbrand (Arm) [this message]
2026-04-14 2:12 ` Wei Yang
2026-04-14 9:32 ` David Hildenbrand (Arm)
2026-04-15 2:30 ` Wei Yang
2026-04-15 9:11 ` David Hildenbrand (Arm)
2026-04-16 2:23 ` Wei Yang
2026-04-16 7:15 ` David Hildenbrand (Arm)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1928b6b0-2ec3-43ca-a41b-e880d974af04@kernel.org \
--to=david@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nanhai.zou@intel.com \
--cc=osalvador@suse.de \
--cc=pan.deng@intel.com \
--cc=qiuxu.zhuo@intel.com \
--cc=richard.weiyang@gmail.com \
--cc=rppt@kernel.org \
--cc=tianyou.li@intel.com \
--cc=tim.c.chen@linux.intel.com \
--cc=yong.hu@intel.com \
--cc=yu.c.chen@intel.com \
--cc=yuan1.liu@intel.com \
--cc=zhangchen.kidd@jd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.