From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Oscar Salvador <osalvador@suse.de>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling
Date: Fri, 20 Mar 2026 19:49:46 +0100
Message-ID: <306f0fd2-cba1-4f48-af05-248be2bc9506@kernel.org>
In-Reply-To: <f23aa64a-d78f-4c57-a08b-5c589889b7fc@lucifer.local>
On 3/17/26 20:48, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
>> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
>> usemap on the section with pgdat") quite some complexity to try
>> allocating memory for the "usemap" (storing pageblock information
>> per memory section) for a memory section close to the memory of the
>> "pgdat" of the node.
>>
>> The goal was to make memory hotunplug of boot memory more likely to
>> succeed. That commit also added some checks for circular dependencies
>> between two memory sections, whereby two memory sections would contain
>> each others usemap, turning bot memory sections un-removable.
>
> Typo: bot -> both. Presumably you are not talking about memory a bot of some
> kind allocated :P
>
>>
>> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
>> together") started allocating the usemap for multiple memory
>> sections on the same node in one chunk, effectively grouping all usemap
>> allocations of the same node in a single memblock allocation.
>>
>> We don't really give guarantees about memory hotunplug of boot memory, and
>> with the change in 2010, it is pretty much impossible in practice to get
>> any circular dependencies.
>
> Pretty much impossible? :) We can probably go so far as to say impossible, no?
Yes.
>
>>
>> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
>> pgdat") also added the comment:
>>
>> "Similarly, a pgdat can prevent a section being removed. If
>> section A contains a pgdat and section B
>> contains the usemap, both sections become inter-dependent."
>>
>> Given that we don't free the pgdat anymore, that comment (and handling)
>> does not apply.
>
> Isn't pgdat synonymous with a node and that's the data structure that describes
> a node right? Confusingly typedef'd from pglist_data to pg_data_t but then
> referred to as pgdat because all that makes so much sense :)
Yeah, in general we refer to the NODE_DATA as pgdat (grep for it and
you'll be surprised).
>
> But I'm confused, does a section containing a pgdat mean a section having the
> pgdat data structure literally allocated in it?
Yes. A "struct pglist_data" (the pgdat) placed in some memory section.
>
> A usemap is... something that tracks pageblock metadata I think right?
Yes. Essentially a large array of bytes, whereby each byte describes the
metadata of one pageblock (migratetype etc.).
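To make that concrete, here is a rough userspace toy model of the
"one byte per pageblock" description above. It is only a sketch: all
toy_* names, the enum values and the section size are made up for
illustration, and the actual in-kernel layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size; the real value depends on architecture and config. */
#define TOY_PAGEBLOCKS_PER_SECTION 64

/* Hypothetical subset of migratetypes, for illustration only. */
enum toy_migratetype {
	TOY_MIGRATE_UNMOVABLE = 0,
	TOY_MIGRATE_MOVABLE = 1,
};

/* Toy usemap: one byte of metadata per pageblock of a memory section. */
struct toy_usemap {
	uint8_t pageblock_flags[TOY_PAGEBLOCKS_PER_SECTION];
};

/* Record the migratetype of one pageblock in the section's usemap. */
static void toy_set_migratetype(struct toy_usemap *map, size_t pageblock,
				enum toy_migratetype mt)
{
	assert(pageblock < TOY_PAGEBLOCKS_PER_SECTION);
	map->pageblock_flags[pageblock] = (uint8_t)mt;
}

/* Look up the migratetype previously stored for one pageblock. */
static enum toy_migratetype toy_get_migratetype(const struct toy_usemap *map,
						size_t pageblock)
{
	assert(pageblock < TOY_PAGEBLOCKS_PER_SECTION);
	return (enum toy_migratetype)map->pageblock_flags[pageblock];
}
```

The point is only that the usemap is per-section metadata that must live
somewhere in memory, which is why its placement mattered to the 2008
commit at all.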
>
> Anyway I'm also confused by 'given we don't free the pgdat any more', but the
> comment says a 'pgdat can prevent a section being removed' rather than anything
> about it being removed?
Well, if a pgdat resides in some memory section, the fact that it is
unmovable turns the whole memory section unremovable -> hotunplug fails.
Assuming you could free the pgdat when the node goes offline, you
would turn that memory section removable again.
And I think that commit somehow assumed that the last memory section
could be removed if all it contains is the corresponding pgdat (which
was never the case).
>
> I guess it means the OTHER section could be prevented from being removed even
> after it's gone.. somehow?
>
> Anyway! I think maybe this could be clearer, somehow :)
I'm afraid the whole purpose of the original patch was sketchy, which is
also why I fail to even explain the original motivation clearly.
Now it's fortunately no longer required. :)
>
>>
>> So let's simply remove this complexity.
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>
> I think what you've done in the patch is right though, we're not doing any of
> these dances after a4322e1bad91 and pgdats sitting around mean we don't really
> care about where the usemap goes anyway I don't think so...
>
> I usemap and I find myself in a place where I give you a:
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>
Thanks ;)
[...]
>> -
>> #ifdef CONFIG_SPARSEMEM_VMEMMAP
>> unsigned long __init section_map_size(void)
>> {
>> @@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
>> unsigned long pnum, unsigned long flags)
>> {
>> BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
>> - check_usemap_section_nr(nid, sparse_usagebuf);
>> sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>> sparse_usagebuf, SECTION_IS_EARLY | flags);
>> sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
>> @@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
>> unsigned long size;
>>
>> size = mem_section_usage_size() * map_count;
>> - sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
>> - NODE_DATA(nid), size);
>> + sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
>
> I guess nid here is the same node as the pgdat?
Yes! Before, we used NODE_DATA(nid)->node_id, which is really just ... nid :)
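As a rough illustration of why the two are equivalent, here is a
userspace toy model (not kernel code; every toy_* name is hypothetical,
and it simply assumes each node's pgdat records its own node id):

```c
#include <assert.h>

#define TOY_MAX_NUMNODES 4	/* hypothetical node count */

/* Toy stand-in for the per-node pgdat; only the node id is modelled. */
struct toy_pglist_data {
	int node_id;
};

static struct toy_pglist_data toy_node_data[TOY_MAX_NUMNODES];

/* Toy stand-in for NODE_DATA(nid): look up the pgdat of a node. */
#define TOY_NODE_DATA(nid) (&toy_node_data[(nid)])

/* Assumption of this sketch: every pgdat stores the id of its own node. */
static void toy_init_nodes(void)
{
	for (int nid = 0; nid < TOY_MAX_NUMNODES; nid++)
		toy_node_data[nid].node_id = nid;
}

/* Old style: derive the target node id by going through the pgdat. */
static int toy_target_nid_old(int nid)
{
	return TOY_NODE_DATA(nid)->node_id;
}

/* New style: use the node id that was passed in directly. */
static int toy_target_nid_new(int nid)
{
	return nid;
}
```

Under that assumption both helpers return the same node id for every
valid nid, so passing nid straight to the allocator does not change
which node is targeted.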
--
Cheers,
David