From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 9 Apr 2026 18:10:13 +0300
From: Mike Rapoport
To: "David Hildenbrand (Arm)"
Cc: Muchun Song, Muchun Song, Andrew Morton, yinghai@kernel.org,
	Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm/sparse: remove sparse_buffer
References: <20260407083951.2823915-1-songmuchun@bytedance.com>
	<70EF8E41-31A2-4B43-BABE-7218FD5F7271@linux.dev>
Content-Type: text/plain; charset=us-ascii

Hi,

On Thu, Apr 09, 2026 at 02:29:38PM +0200, David Hildenbrand (Arm) wrote:
> On 4/9/26 13:40, Muchun Song wrote:
> > 
> > 
> >> On Apr 8, 2026, at 21:40, David Hildenbrand (Arm) wrote:
> >> 
> >> On 4/7/26 10:39, Muchun Song wrote:
> >>> The sparse_buffer was originally introduced in commit 9bdac9142407
> >>> ("sparsemem: Put mem map for one node together.") to allocate a
> >>> contiguous block of memory for all memmaps of a NUMA node.
> >>> 
> >>> However, the original commit message did not clearly state the actual
> >>> benefits or the necessity of keeping all memmap areas strictly
> >>> contiguous for a given node.
> >> 
> >> We don't want the memmap to be scattered around, given that it is one of
> >> the biggest allocations during boot.
> >> 
> >> It's related to not turning too many memory blocks/sections
> >> un-offlinable I think.
> >> 
> >> I always imagined that memblock would still keep these allocations close
> >> to each other. Can you verify if that is indeed true?
> > 
> > You raised a very interesting point about whether memblock keeps
> > these allocations close to each other.
> > I've done a thorough test
> > on a 16GB VM by printing the actual physical allocations.

memblock always allocates in order, so if there are no other memblock
allocations between the calls to memmap_alloc(), all these allocations
will be together and they all will be coalesced to a single region in
memblock.reserved.

> > I enabled the existing debug logs in arch/x86/mm/init_64.c to
> > trace the vmemmap_set_pmd allocations. Here is what really happens:
> > 
> > When using vmemmap_alloc_block without sparse_buffer, the
> > memblock allocator allocates 2MB chunks. Because memblock
> > allocates top-down by default, the physical allocations look
> > like this:
> > 
> > [ffe6475cc0000000-ffe6475cc01fffff] PMD -> [ff3cb082bfc00000-ff3cb082bfdfffff] on node 0
> > [ffe6475cc0200000-ffe6475cc03fffff] PMD -> [ff3cb082bfa00000-ff3cb082bfbfffff] on node 0
> > [ffe6475cc0400000-ffe6475cc05fffff] PMD -> [ff3cb082bf800000-ff3cb082bf9fffff] on node 0
> > ...
> > 
> > Notice that the physical chunks are strictly adjacent to each
> > other, but in descending order!
> > 
> > So, they are NOT "scattered around" the whole node randomly.
> > Instead, they are packed densely back-to-back in a single
> > contiguous physical range (just mapped top-down in 2MB pieces).
> > 
> > Because they are packed tightly together within the same
> > contiguous physical memory range, they will at most consume or
> > pollute the exact same number of memory blocks as a single
> > contiguous allocation (like sparse_buffer did). Therefore, this
> > will NOT turn additional memory blocks/sections into an
> > "un-offlinable" state.
> > 
> > It seems we can safely remove the sparse buffer preallocation
> > mechanism, don't you think?
> 
> Yes, what I suspected. Is there a performance implication when doing
> many individual memmap_alloc(), for example, on a larger system with
> many sections?
memmap_alloc() will be slower than sparse_buffer_alloc(): allocating
from memblock is more involved than sparse_buffer_alloc()'s pointer
bump, but without measurements it's hard to tell how much it'll affect
overall sparse_init().

> -- 
> Cheers,
> 
> David

-- 
Sincerely yours,
Mike.