From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Muchun Song <muchun.song@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	yinghai@kernel.org, Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm/sparse: remove sparse_buffer
Date: Thu, 9 Apr 2026 14:29:38 +0200	[thread overview]
Message-ID: <dcc3b56f-5d4c-4bb0-bfd8-49747df30f47@kernel.org> (raw)
In-Reply-To: <70EF8E41-31A2-4B43-BABE-7218FD5F7271@linux.dev>

On 4/9/26 13:40, Muchun Song wrote:
> 
> 
>> On Apr 8, 2026, at 21:40, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> On 4/7/26 10:39, Muchun Song wrote:
>>> The sparse_buffer was originally introduced in commit 9bdac9142407
>>> ("sparsemem: Put mem map for one node together.") to allocate a
>>> contiguous block of memory for all memmaps of a NUMA node.
>>>
>>> However, the original commit message did not clearly state the actual
>>> benefits or the necessity of keeping all memmap areas strictly
>>> contiguous for a given node.
>>
>> We don't want the memmap to be scattered around, given that it is one of
>> the biggest allocations during boot.
>>
>> It's related to not turning too many memory blocks/sections
>> un-offlinable, I think.
> 
> Hi David,
> 
> Got it.
> 
>>
>> I always imagined that memblock would still keep these allocations close
>> to each other. Can you verify if that is indeed true?
> 
> You raised a very interesting point about whether memblock keeps
> these allocations close to each other. I've done a thorough test
> on a 16GB VM by printing the actual physical allocations.
> 
> I enabled the existing debug logs in arch/x86/mm/init_64.c to
> trace the vmemmap_set_pmd allocations. Here is what really happens:
> 
> When using vmemmap_alloc_block without sparse_buffer, the
> memblock allocator allocates 2MB chunks. Because memblock
> allocates top-down by default, the physical allocations look
> like this:
> 
> [ffe6475cc0000000-ffe6475cc01fffff] PMD -> [ff3cb082bfc00000-ff3cb082bfdfffff] on node 0
> [ffe6475cc0200000-ffe6475cc03fffff] PMD -> [ff3cb082bfa00000-ff3cb082bfbfffff] on node 0
> [ffe6475cc0400000-ffe6475cc05fffff] PMD -> [ff3cb082bf800000-ff3cb082bf9fffff] on node 0
> [ffe6475cc0600000-ffe6475cc07fffff] PMD -> [ff3cb082bf600000-ff3cb082bf7fffff] on node 0
> [ffe6475cc0800000-ffe6475cc09fffff] PMD -> [ff3cb082bf400000-ff3cb082bf5fffff] on node 0
> [ffe6475cc0a00000-ffe6475cc0bfffff] PMD -> [ff3cb082bf200000-ff3cb082bf3fffff] on node 0
> [ffe6475cc0c00000-ffe6475cc0dfffff] PMD -> [ff3cb082bf000000-ff3cb082bf1fffff] on node 0
> [ffe6475cc0e00000-ffe6475cc0ffffff] PMD -> [ff3cb082bee00000-ff3cb082beffffff] on node 0
> [ffe6475cc1000000-ffe6475cc11fffff] PMD -> [ff3cb082bec00000-ff3cb082bedfffff] on node 0
> [ffe6475cc1200000-ffe6475cc13fffff] PMD -> [ff3cb082bea00000-ff3cb082bebfffff] on node 0
> [ffe6475cc1400000-ffe6475cc15fffff] PMD -> [ff3cb082be800000-ff3cb082be9fffff] on node 0
> [ffe6475cc1600000-ffe6475cc17fffff] PMD -> [ff3cb082be600000-ff3cb082be7fffff] on node 0
> [ffe6475cc1800000-ffe6475cc19fffff] PMD -> [ff3cb082be400000-ff3cb082be5fffff] on node 0
> [ffe6475cc1a00000-ffe6475cc1bfffff] PMD -> [ff3cb082be200000-ff3cb082be3fffff] on node 0
> [ffe6475cc1c00000-ffe6475cc1dfffff] PMD -> [ff3cb082be000000-ff3cb082be1fffff] on node 0
> [ffe6475cc1e00000-ffe6475cc1ffffff] PMD -> [ff3cb082bde00000-ff3cb082bdffffff] on node 0
> [ffe6475cc2000000-ffe6475cc21fffff] PMD -> [ff3cb082bdc00000-ff3cb082bddfffff] on node 0
> [ffe6475cc2200000-ffe6475cc23fffff] PMD -> [ff3cb082bda00000-ff3cb082bdbfffff] on node 0
> [ffe6475cc2400000-ffe6475cc25fffff] PMD -> [ff3cb082bd800000-ff3cb082bd9fffff] on node 0
> [ffe6475cc2600000-ffe6475cc27fffff] PMD -> [ff3cb082bd600000-ff3cb082bd7fffff] on node 0
> [ffe6475cc2800000-ffe6475cc29fffff] PMD -> [ff3cb082bd400000-ff3cb082bd5fffff] on node 0
> [ffe6475cc2a00000-ffe6475cc2bfffff] PMD -> [ff3cb082bd200000-ff3cb082bd3fffff] on node 0
> [ffe6475cc2c00000-ffe6475cc2dfffff] PMD -> [ff3cb082bd000000-ff3cb082bd1fffff] on node 0
> [ffe6475cc2e00000-ffe6475cc2ffffff] PMD -> [ff3cb082bce00000-ff3cb082bcffffff] on node 0
> [ffe6475cc4000000-ffe6475cc41fffff] PMD -> [ff3cb082bcc00000-ff3cb082bcdfffff] on node 0
> [ffe6475cc4200000-ffe6475cc43fffff] PMD -> [ff3cb082bca00000-ff3cb082bcbfffff] on node 0
> [ffe6475cc4400000-ffe6475cc45fffff] PMD -> [ff3cb082bc800000-ff3cb082bc9fffff] on node 0
> [ffe6475cc4600000-ffe6475cc47fffff] PMD -> [ff3cb082bc600000-ff3cb082bc7fffff] on node 0
> [ffe6475cc4800000-ffe6475cc49fffff] PMD -> [ff3cb082bc400000-ff3cb082bc5fffff] on node 0
> [ffe6475cc4a00000-ffe6475cc4bfffff] PMD -> [ff3cb082bc200000-ff3cb082bc3fffff] on node 0
> [ffe6475cc4c00000-ffe6475cc4dfffff] PMD -> [ff3cb082bc000000-ff3cb082bc1fffff] on node 0
> [ffe6475cc4e00000-ffe6475cc4ffffff] PMD -> [ff3cb082bbe00000-ff3cb082bbffffff] on node 0
> [ffe6475cc5000000-ffe6475cc51fffff] PMD -> [ff3cb083bfa00000-ff3cb083bfbfffff] on node 1
> [ffe6475cc5200000-ffe6475cc53fffff] PMD -> [ff3cb083bf800000-ff3cb083bf9fffff] on node 1
> [ffe6475cc5400000-ffe6475cc55fffff] PMD -> [ff3cb083bf600000-ff3cb083bf7fffff] on node 1
> [ffe6475cc5600000-ffe6475cc57fffff] PMD -> [ff3cb083bf400000-ff3cb083bf5fffff] on node 1
> [ffe6475cc5800000-ffe6475cc59fffff] PMD -> [ff3cb083bf200000-ff3cb083bf3fffff] on node 1
> [ffe6475cc5a00000-ffe6475cc5bfffff] PMD -> [ff3cb083bf000000-ff3cb083bf1fffff] on node 1
> [ffe6475cc5c00000-ffe6475cc5dfffff] PMD -> [ff3cb083b6e00000-ff3cb083b6ffffff] on node 1
> [ffe6475cc5e00000-ffe6475cc5ffffff] PMD -> [ff3cb083b6c00000-ff3cb083b6dfffff] on node 1
> [ffe6475cc6000000-ffe6475cc61fffff] PMD -> [ff3cb083b6a00000-ff3cb083b6bfffff] on node 1
> [ffe6475cc6200000-ffe6475cc63fffff] PMD -> [ff3cb083b6800000-ff3cb083b69fffff] on node 1
> [ffe6475cc6400000-ffe6475cc65fffff] PMD -> [ff3cb083b6600000-ff3cb083b67fffff] on node 1
> [ffe6475cc6600000-ffe6475cc67fffff] PMD -> [ff3cb083b6400000-ff3cb083b65fffff] on node 1
> [ffe6475cc6800000-ffe6475cc69fffff] PMD -> [ff3cb083b6200000-ff3cb083b63fffff] on node 1
> [ffe6475cc6a00000-ffe6475cc6bfffff] PMD -> [ff3cb083b6000000-ff3cb083b61fffff] on node 1
> [ffe6475cc6c00000-ffe6475cc6dfffff] PMD -> [ff3cb083b5e00000-ff3cb083b5ffffff] on node 1
> [ffe6475cc6e00000-ffe6475cc6ffffff] PMD -> [ff3cb083b5c00000-ff3cb083b5dfffff] on node 1
> [ffe6475cc7000000-ffe6475cc71fffff] PMD -> [ff3cb083b5a00000-ff3cb083b5bfffff] on node 1
> [ffe6475cc7200000-ffe6475cc73fffff] PMD -> [ff3cb083b5800000-ff3cb083b59fffff] on node 1
> [ffe6475cc7400000-ffe6475cc75fffff] PMD -> [ff3cb083b5600000-ff3cb083b57fffff] on node 1
> [ffe6475cc7600000-ffe6475cc77fffff] PMD -> [ff3cb083b5400000-ff3cb083b55fffff] on node 1
> [ffe6475cc7800000-ffe6475cc79fffff] PMD -> [ff3cb083b5200000-ff3cb083b53fffff] on node 1
> [ffe6475cc7a00000-ffe6475cc7bfffff] PMD -> [ff3cb083b5000000-ff3cb083b51fffff] on node 1
> [ffe6475cc7c00000-ffe6475cc7dfffff] PMD -> [ff3cb083b4e00000-ff3cb083b4ffffff] on node 1
> [ffe6475cc7e00000-ffe6475cc7ffffff] PMD -> [ff3cb083b4c00000-ff3cb083b4dfffff] on node 1
> [ffe6475cc8000000-ffe6475cc81fffff] PMD -> [ff3cb083b4a00000-ff3cb083b4bfffff] on node 1
> [ffe6475cc8200000-ffe6475cc83fffff] PMD -> [ff3cb083b4800000-ff3cb083b49fffff] on node 1
> [ffe6475cc8400000-ffe6475cc85fffff] PMD -> [ff3cb083b4600000-ff3cb083b47fffff] on node 1
> [ffe6475cc8600000-ffe6475cc87fffff] PMD -> [ff3cb083b4400000-ff3cb083b45fffff] on node 1
> [ffe6475cc8800000-ffe6475cc89fffff] PMD -> [ff3cb083b4200000-ff3cb083b43fffff] on node 1
> [ffe6475cc8a00000-ffe6475cc8bfffff] PMD -> [ff3cb083b4000000-ff3cb083b41fffff] on node 1
> [ffe6475cc8c00000-ffe6475cc8dfffff] PMD -> [ff3cb083b3e00000-ff3cb083b3ffffff] on node 1
> [ffe6475cc8e00000-ffe6475cc8ffffff] PMD -> [ff3cb083b3c00000-ff3cb083b3dfffff] on node 1
> 
> Notice that the physical chunks are strictly adjacent to each
> other, but in descending order!
> 
> So, they are NOT "scattered around" the whole node randomly.
> Instead, they are packed densely back-to-back in a single
> contiguous physical range (just mapped top-down in 2MB pieces).
> 
> Because they are packed tightly together within the same
> contiguous physical memory range, they will at most consume or
> pollute the exact same number of memory blocks as a single
> contiguous allocation (like sparse_buffer did). Therefore, this
> will NOT turn additional memory blocks/sections into an
> "un-offlinable" state.
> 
> It seems we can safely remove the sparse buffer preallocation
> mechanism, don't you think?

Yes, that's what I suspected. Is there a performance implication from doing
many individual memmap_alloc() calls, for example on a larger system with
many sections?
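
For context, the path each PMD-sized piece of memmap takes today is roughly
the following (a simplified sketch from memory, altmap handling omitted --
the real code lives in mm/sparse-vmemmap.c):

	void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node,
						 struct vmem_altmap *altmap)
	{
		void *ptr;

		/*
		 * Fast path: carve the piece out of the preallocated
		 * per-node sparse_buffer.
		 */
		ptr = sparse_buffer_alloc(size);
		if (!ptr)
			/* Fallback: one memblock allocation per piece. */
			ptr = vmemmap_alloc_block(size, node);
		return ptr;
	}

With the sparse_buffer gone, every 2MB piece becomes a separate memblock
allocation, as your log shows, so a node with many sections means many more
trips into memblock during early boot. It might well be in the noise, but it
would be good to see numbers on a bigger machine.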

-- 
Cheers,

David


Thread overview: 5+ messages
2026-04-07  8:39 [RFC PATCH] mm/sparse: remove sparse_buffer Muchun Song
2026-04-08 13:40 ` David Hildenbrand (Arm)
2026-04-09 11:40   ` Muchun Song
2026-04-09 12:29     ` David Hildenbrand (Arm) [this message]
2026-04-09 15:10       ` Mike Rapoport
