From: Muchun Song <muchun.song@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
yinghai@kernel.org, Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm/sparse: remove sparse_buffer
Date: Thu, 9 Apr 2026 19:40:08 +0800
Message-ID: <70EF8E41-31A2-4B43-BABE-7218FD5F7271@linux.dev>
In-Reply-To: <b22f0af1-848d-45da-99b2-0c28b740e979@kernel.org>
> On Apr 8, 2026, at 21:40, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 4/7/26 10:39, Muchun Song wrote:
>> The sparse_buffer was originally introduced in commit 9bdac9142407
>> ("sparsemem: Put mem map for one node together.") to allocate a
>> contiguous block of memory for all memmaps of a NUMA node.
>>
>> However, the original commit message did not clearly state the actual
>> benefits or the necessity of keeping all memmap areas strictly
>> contiguous for a given node.
>
> We don't want the memmap to be scattered around, given that it is one of
> the biggest allocations during boot.
>
> It's related to not turning too many memory blocks/sections
> un-offlinable I think.
Hi David,
Got it.
>
> I always imagined that memblock would still keep these allocations close
> to each other. Can you verify if that is indeed true?
You raised a very interesting point about whether memblock keeps
these allocations close to each other. To verify it, I ran a test
on a 16GB VM (two NUMA nodes) and printed the actual physical
allocations, using the existing debug logs in arch/x86/mm/init_64.c
to trace the vmemmap_set_pmd() mappings. Here is what really happens:
When using vmemmap_alloc_block() without sparse_buffer, memblock
allocates one 2MB chunk per PMD. Because memblock allocates
top-down by default, the physical allocations look like this:
[ffe6475cc0000000-ffe6475cc01fffff] PMD -> [ff3cb082bfc00000-ff3cb082bfdfffff] on node 0
[ffe6475cc0200000-ffe6475cc03fffff] PMD -> [ff3cb082bfa00000-ff3cb082bfbfffff] on node 0
[ffe6475cc0400000-ffe6475cc05fffff] PMD -> [ff3cb082bf800000-ff3cb082bf9fffff] on node 0
[ffe6475cc0600000-ffe6475cc07fffff] PMD -> [ff3cb082bf600000-ff3cb082bf7fffff] on node 0
[ffe6475cc0800000-ffe6475cc09fffff] PMD -> [ff3cb082bf400000-ff3cb082bf5fffff] on node 0
[ffe6475cc0a00000-ffe6475cc0bfffff] PMD -> [ff3cb082bf200000-ff3cb082bf3fffff] on node 0
[ffe6475cc0c00000-ffe6475cc0dfffff] PMD -> [ff3cb082bf000000-ff3cb082bf1fffff] on node 0
[ffe6475cc0e00000-ffe6475cc0ffffff] PMD -> [ff3cb082bee00000-ff3cb082beffffff] on node 0
[ffe6475cc1000000-ffe6475cc11fffff] PMD -> [ff3cb082bec00000-ff3cb082bedfffff] on node 0
[ffe6475cc1200000-ffe6475cc13fffff] PMD -> [ff3cb082bea00000-ff3cb082bebfffff] on node 0
[ffe6475cc1400000-ffe6475cc15fffff] PMD -> [ff3cb082be800000-ff3cb082be9fffff] on node 0
[ffe6475cc1600000-ffe6475cc17fffff] PMD -> [ff3cb082be600000-ff3cb082be7fffff] on node 0
[ffe6475cc1800000-ffe6475cc19fffff] PMD -> [ff3cb082be400000-ff3cb082be5fffff] on node 0
[ffe6475cc1a00000-ffe6475cc1bfffff] PMD -> [ff3cb082be200000-ff3cb082be3fffff] on node 0
[ffe6475cc1c00000-ffe6475cc1dfffff] PMD -> [ff3cb082be000000-ff3cb082be1fffff] on node 0
[ffe6475cc1e00000-ffe6475cc1ffffff] PMD -> [ff3cb082bde00000-ff3cb082bdffffff] on node 0
[ffe6475cc2000000-ffe6475cc21fffff] PMD -> [ff3cb082bdc00000-ff3cb082bddfffff] on node 0
[ffe6475cc2200000-ffe6475cc23fffff] PMD -> [ff3cb082bda00000-ff3cb082bdbfffff] on node 0
[ffe6475cc2400000-ffe6475cc25fffff] PMD -> [ff3cb082bd800000-ff3cb082bd9fffff] on node 0
[ffe6475cc2600000-ffe6475cc27fffff] PMD -> [ff3cb082bd600000-ff3cb082bd7fffff] on node 0
[ffe6475cc2800000-ffe6475cc29fffff] PMD -> [ff3cb082bd400000-ff3cb082bd5fffff] on node 0
[ffe6475cc2a00000-ffe6475cc2bfffff] PMD -> [ff3cb082bd200000-ff3cb082bd3fffff] on node 0
[ffe6475cc2c00000-ffe6475cc2dfffff] PMD -> [ff3cb082bd000000-ff3cb082bd1fffff] on node 0
[ffe6475cc2e00000-ffe6475cc2ffffff] PMD -> [ff3cb082bce00000-ff3cb082bcffffff] on node 0
[ffe6475cc4000000-ffe6475cc41fffff] PMD -> [ff3cb082bcc00000-ff3cb082bcdfffff] on node 0
[ffe6475cc4200000-ffe6475cc43fffff] PMD -> [ff3cb082bca00000-ff3cb082bcbfffff] on node 0
[ffe6475cc4400000-ffe6475cc45fffff] PMD -> [ff3cb082bc800000-ff3cb082bc9fffff] on node 0
[ffe6475cc4600000-ffe6475cc47fffff] PMD -> [ff3cb082bc600000-ff3cb082bc7fffff] on node 0
[ffe6475cc4800000-ffe6475cc49fffff] PMD -> [ff3cb082bc400000-ff3cb082bc5fffff] on node 0
[ffe6475cc4a00000-ffe6475cc4bfffff] PMD -> [ff3cb082bc200000-ff3cb082bc3fffff] on node 0
[ffe6475cc4c00000-ffe6475cc4dfffff] PMD -> [ff3cb082bc000000-ff3cb082bc1fffff] on node 0
[ffe6475cc4e00000-ffe6475cc4ffffff] PMD -> [ff3cb082bbe00000-ff3cb082bbffffff] on node 0
[ffe6475cc5000000-ffe6475cc51fffff] PMD -> [ff3cb083bfa00000-ff3cb083bfbfffff] on node 1
[ffe6475cc5200000-ffe6475cc53fffff] PMD -> [ff3cb083bf800000-ff3cb083bf9fffff] on node 1
[ffe6475cc5400000-ffe6475cc55fffff] PMD -> [ff3cb083bf600000-ff3cb083bf7fffff] on node 1
[ffe6475cc5600000-ffe6475cc57fffff] PMD -> [ff3cb083bf400000-ff3cb083bf5fffff] on node 1
[ffe6475cc5800000-ffe6475cc59fffff] PMD -> [ff3cb083bf200000-ff3cb083bf3fffff] on node 1
[ffe6475cc5a00000-ffe6475cc5bfffff] PMD -> [ff3cb083bf000000-ff3cb083bf1fffff] on node 1
[ffe6475cc5c00000-ffe6475cc5dfffff] PMD -> [ff3cb083b6e00000-ff3cb083b6ffffff] on node 1
[ffe6475cc5e00000-ffe6475cc5ffffff] PMD -> [ff3cb083b6c00000-ff3cb083b6dfffff] on node 1
[ffe6475cc6000000-ffe6475cc61fffff] PMD -> [ff3cb083b6a00000-ff3cb083b6bfffff] on node 1
[ffe6475cc6200000-ffe6475cc63fffff] PMD -> [ff3cb083b6800000-ff3cb083b69fffff] on node 1
[ffe6475cc6400000-ffe6475cc65fffff] PMD -> [ff3cb083b6600000-ff3cb083b67fffff] on node 1
[ffe6475cc6600000-ffe6475cc67fffff] PMD -> [ff3cb083b6400000-ff3cb083b65fffff] on node 1
[ffe6475cc6800000-ffe6475cc69fffff] PMD -> [ff3cb083b6200000-ff3cb083b63fffff] on node 1
[ffe6475cc6a00000-ffe6475cc6bfffff] PMD -> [ff3cb083b6000000-ff3cb083b61fffff] on node 1
[ffe6475cc6c00000-ffe6475cc6dfffff] PMD -> [ff3cb083b5e00000-ff3cb083b5ffffff] on node 1
[ffe6475cc6e00000-ffe6475cc6ffffff] PMD -> [ff3cb083b5c00000-ff3cb083b5dfffff] on node 1
[ffe6475cc7000000-ffe6475cc71fffff] PMD -> [ff3cb083b5a00000-ff3cb083b5bfffff] on node 1
[ffe6475cc7200000-ffe6475cc73fffff] PMD -> [ff3cb083b5800000-ff3cb083b59fffff] on node 1
[ffe6475cc7400000-ffe6475cc75fffff] PMD -> [ff3cb083b5600000-ff3cb083b57fffff] on node 1
[ffe6475cc7600000-ffe6475cc77fffff] PMD -> [ff3cb083b5400000-ff3cb083b55fffff] on node 1
[ffe6475cc7800000-ffe6475cc79fffff] PMD -> [ff3cb083b5200000-ff3cb083b53fffff] on node 1
[ffe6475cc7a00000-ffe6475cc7bfffff] PMD -> [ff3cb083b5000000-ff3cb083b51fffff] on node 1
[ffe6475cc7c00000-ffe6475cc7dfffff] PMD -> [ff3cb083b4e00000-ff3cb083b4ffffff] on node 1
[ffe6475cc7e00000-ffe6475cc7ffffff] PMD -> [ff3cb083b4c00000-ff3cb083b4dfffff] on node 1
[ffe6475cc8000000-ffe6475cc81fffff] PMD -> [ff3cb083b4a00000-ff3cb083b4bfffff] on node 1
[ffe6475cc8200000-ffe6475cc83fffff] PMD -> [ff3cb083b4800000-ff3cb083b49fffff] on node 1
[ffe6475cc8400000-ffe6475cc85fffff] PMD -> [ff3cb083b4600000-ff3cb083b47fffff] on node 1
[ffe6475cc8600000-ffe6475cc87fffff] PMD -> [ff3cb083b4400000-ff3cb083b45fffff] on node 1
[ffe6475cc8800000-ffe6475cc89fffff] PMD -> [ff3cb083b4200000-ff3cb083b43fffff] on node 1
[ffe6475cc8a00000-ffe6475cc8bfffff] PMD -> [ff3cb083b4000000-ff3cb083b41fffff] on node 1
[ffe6475cc8c00000-ffe6475cc8dfffff] PMD -> [ff3cb083b3e00000-ff3cb083b3ffffff] on node 1
[ffe6475cc8e00000-ffe6475cc8ffffff] PMD -> [ff3cb083b3c00000-ff3cb083b3dfffff] on node 1
Notice that the physical chunks are adjacent to each other, just
in descending order!
So, they are NOT "scattered around" the whole node randomly.
Instead, they are packed densely back-to-back in a contiguous
physical range (simply handed out top-down in 2MB pieces); the
only exception is a single gap on node 1, presumably where
memblock skipped a reserved region.
Because they are packed this tightly within the same contiguous
physical range, they consume (or pollute) at most the same number
of memory blocks as a single contiguous allocation of the kind
sparse_buffer performed. Therefore, removing it will NOT turn
additional memory blocks/sections un-offlinable.
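For what it's worth, the adjacency claim can be checked mechanically
from the log above. A throwaway script like the following (purely a
hypothetical helper, not part of the patch; it assumes the exact
"PMD ->" line format shown, with three lines inlined as sample input)
reports the physical gap between consecutive chunks on the same node:

```python
import re

# Sample lines copied from the debug log above.
LOG = """\
[ffe6475cc0000000-ffe6475cc01fffff] PMD -> [ff3cb082bfc00000-ff3cb082bfdfffff] on node 0
[ffe6475cc0200000-ffe6475cc03fffff] PMD -> [ff3cb082bfa00000-ff3cb082bfbfffff] on node 0
[ffe6475cc0400000-ffe6475cc05fffff] PMD -> [ff3cb082bf800000-ff3cb082bf9fffff] on node 0
"""

# Capture the physical start/end and the node id of each mapping.
PAT = re.compile(r'\[\w+-\w+\] PMD -> \[(\w+)-(\w+)\] on node (\d+)')

def phys_gaps(log):
    """Return the gap (in bytes) between each chunk and the previous
    chunk on the same node; 0 means they are back-to-back top-down."""
    prev_start, gaps = {}, []
    for m in PAT.finditer(log):
        start, end, node = int(m.group(1), 16), int(m.group(2), 16), m.group(3)
        if node in prev_start:
            # Top-down allocation: this chunk should end one byte
            # below where the previous chunk began.
            gaps.append(prev_start[node] - (end + 1))
        prev_start[node] = start
    return gaps

print(phys_gaps(LOG))  # all zeros -> chunks are physically adjacent
```

Running it over the full log yields zero for every pair except the
one gap on node 1 mentioned above.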
It seems we can safely remove the sparse buffer preallocation
mechanism, don't you think?
Thanks,
Muchun
>
> --
> Cheers,
>
> David
Thread overview: 5+ messages
2026-04-07 8:39 [RFC PATCH] mm/sparse: remove sparse_buffer Muchun Song
2026-04-08 13:40 ` David Hildenbrand (Arm)
2026-04-09 11:40 ` Muchun Song [this message]
2026-04-09 12:29 ` David Hildenbrand (Arm)
2026-04-09 15:10 ` Mike Rapoport