From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC PATCH] mm/sparse: remove sparse_buffer
From: Muchun Song
Date: Thu, 9 Apr 2026 19:40:08 +0800
To: "David Hildenbrand (Arm)"
Cc: Muchun Song, Andrew Morton, yinghai@kernel.org, Lorenzo Stoakes,
 "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
 Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-Id: <70EF8E41-31A2-4B43-BABE-7218FD5F7271@linux.dev>
References: <20260407083951.2823915-1-songmuchun@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Apr 8, 2026, at 21:40, David Hildenbrand (Arm) wrote:
> 
> On 4/7/26 10:39, Muchun Song wrote:
>> The sparse_buffer was originally introduced in commit 9bdac9142407
>> ("sparsemem: Put mem map for one node together.") to allocate a
>> contiguous block of memory for all memmaps of a NUMA node.
>> 
>> However, the original commit message did not clearly state the actual
>> benefits or the necessity of keeping all memmap areas strictly
>> contiguous for a given node.
> 
> We don't want the memmap to be scattered around, given that it is one of
> the biggest allocations during boot.
> 
> It's related to not turning too many memory blocks/sections
> un-offlinable I think.

Hi David,

Got it.

> 
> I always imagined that memblock would still keep these allocations close
> to each other. Can you verify if that is indeed true?
You raised a very interesting point about whether memblock keeps these
allocations close to each other. I tested this on a 16GB VM by printing
the actual physical allocations: I enabled the existing debug output in
arch/x86/mm/init_64.c that traces the vmemmap PMD mappings
(vmemmap_set_pmd). Here is what really happens: when vmemmap_alloc_block
is used without sparse_buffer, the memblock allocator hands out 2MB
chunks. Because memblock allocates top-down by default, the physical
allocations look like this:

[ffe6475cc0000000-ffe6475cc01fffff] PMD -> [ff3cb082bfc00000-ff3cb082bfdfffff] on node 0
[ffe6475cc0200000-ffe6475cc03fffff] PMD -> [ff3cb082bfa00000-ff3cb082bfbfffff] on node 0
[ffe6475cc0400000-ffe6475cc05fffff] PMD -> [ff3cb082bf800000-ff3cb082bf9fffff] on node 0
[ffe6475cc0600000-ffe6475cc07fffff] PMD -> [ff3cb082bf600000-ff3cb082bf7fffff] on node 0
[ffe6475cc0800000-ffe6475cc09fffff] PMD -> [ff3cb082bf400000-ff3cb082bf5fffff] on node 0
[ffe6475cc0a00000-ffe6475cc0bfffff] PMD -> [ff3cb082bf200000-ff3cb082bf3fffff] on node 0
[ffe6475cc0c00000-ffe6475cc0dfffff] PMD -> [ff3cb082bf000000-ff3cb082bf1fffff] on node 0
[ffe6475cc0e00000-ffe6475cc0ffffff] PMD -> [ff3cb082bee00000-ff3cb082beffffff] on node 0
[ffe6475cc1000000-ffe6475cc11fffff] PMD -> [ff3cb082bec00000-ff3cb082bedfffff] on node 0
[ffe6475cc1200000-ffe6475cc13fffff] PMD -> [ff3cb082bea00000-ff3cb082bebfffff] on node 0
[ffe6475cc1400000-ffe6475cc15fffff] PMD -> [ff3cb082be800000-ff3cb082be9fffff] on node 0
[ffe6475cc1600000-ffe6475cc17fffff] PMD -> [ff3cb082be600000-ff3cb082be7fffff] on node 0
[ffe6475cc1800000-ffe6475cc19fffff] PMD -> [ff3cb082be400000-ff3cb082be5fffff] on node 0
[ffe6475cc1a00000-ffe6475cc1bfffff] PMD -> [ff3cb082be200000-ff3cb082be3fffff] on node 0
[ffe6475cc1c00000-ffe6475cc1dfffff] PMD -> [ff3cb082be000000-ff3cb082be1fffff] on node 0
[ffe6475cc1e00000-ffe6475cc1ffffff] PMD -> [ff3cb082bde00000-ff3cb082bdffffff] on node 0
[ffe6475cc2000000-ffe6475cc21fffff] PMD -> [ff3cb082bdc00000-ff3cb082bddfffff] on node 0
[ffe6475cc2200000-ffe6475cc23fffff] PMD -> [ff3cb082bda00000-ff3cb082bdbfffff] on node 0
[ffe6475cc2400000-ffe6475cc25fffff] PMD -> [ff3cb082bd800000-ff3cb082bd9fffff] on node 0
[ffe6475cc2600000-ffe6475cc27fffff] PMD -> [ff3cb082bd600000-ff3cb082bd7fffff] on node 0
[ffe6475cc2800000-ffe6475cc29fffff] PMD -> [ff3cb082bd400000-ff3cb082bd5fffff] on node 0
[ffe6475cc2a00000-ffe6475cc2bfffff] PMD -> [ff3cb082bd200000-ff3cb082bd3fffff] on node 0
[ffe6475cc2c00000-ffe6475cc2dfffff] PMD -> [ff3cb082bd000000-ff3cb082bd1fffff] on node 0
[ffe6475cc2e00000-ffe6475cc2ffffff] PMD -> [ff3cb082bce00000-ff3cb082bcffffff] on node 0
[ffe6475cc4000000-ffe6475cc41fffff] PMD -> [ff3cb082bcc00000-ff3cb082bcdfffff] on node 0
[ffe6475cc4200000-ffe6475cc43fffff] PMD -> [ff3cb082bca00000-ff3cb082bcbfffff] on node 0
[ffe6475cc4400000-ffe6475cc45fffff] PMD -> [ff3cb082bc800000-ff3cb082bc9fffff] on node 0
[ffe6475cc4600000-ffe6475cc47fffff] PMD -> [ff3cb082bc600000-ff3cb082bc7fffff] on node 0
[ffe6475cc4800000-ffe6475cc49fffff] PMD -> [ff3cb082bc400000-ff3cb082bc5fffff] on node 0
[ffe6475cc4a00000-ffe6475cc4bfffff] PMD -> [ff3cb082bc200000-ff3cb082bc3fffff] on node 0
[ffe6475cc4c00000-ffe6475cc4dfffff] PMD -> [ff3cb082bc000000-ff3cb082bc1fffff] on node 0
[ffe6475cc4e00000-ffe6475cc4ffffff] PMD -> [ff3cb082bbe00000-ff3cb082bbffffff] on node 0
[ffe6475cc5000000-ffe6475cc51fffff] PMD -> [ff3cb083bfa00000-ff3cb083bfbfffff] on node 1
[ffe6475cc5200000-ffe6475cc53fffff] PMD -> [ff3cb083bf800000-ff3cb083bf9fffff] on node 1
[ffe6475cc5400000-ffe6475cc55fffff] PMD -> [ff3cb083bf600000-ff3cb083bf7fffff] on node 1
[ffe6475cc5600000-ffe6475cc57fffff] PMD -> [ff3cb083bf400000-ff3cb083bf5fffff] on node 1
[ffe6475cc5800000-ffe6475cc59fffff] PMD -> [ff3cb083bf200000-ff3cb083bf3fffff] on node 1
[ffe6475cc5a00000-ffe6475cc5bfffff] PMD -> [ff3cb083bf000000-ff3cb083bf1fffff] on node 1
[ffe6475cc5c00000-ffe6475cc5dfffff] PMD -> [ff3cb083b6e00000-ff3cb083b6ffffff] on node 1
[ffe6475cc5e00000-ffe6475cc5ffffff] PMD -> [ff3cb083b6c00000-ff3cb083b6dfffff] on node 1
[ffe6475cc6000000-ffe6475cc61fffff] PMD -> [ff3cb083b6a00000-ff3cb083b6bfffff] on node 1
[ffe6475cc6200000-ffe6475cc63fffff] PMD -> [ff3cb083b6800000-ff3cb083b69fffff] on node 1
[ffe6475cc6400000-ffe6475cc65fffff] PMD -> [ff3cb083b6600000-ff3cb083b67fffff] on node 1
[ffe6475cc6600000-ffe6475cc67fffff] PMD -> [ff3cb083b6400000-ff3cb083b65fffff] on node 1
[ffe6475cc6800000-ffe6475cc69fffff] PMD -> [ff3cb083b6200000-ff3cb083b63fffff] on node 1
[ffe6475cc6a00000-ffe6475cc6bfffff] PMD -> [ff3cb083b6000000-ff3cb083b61fffff] on node 1
[ffe6475cc6c00000-ffe6475cc6dfffff] PMD -> [ff3cb083b5e00000-ff3cb083b5ffffff] on node 1
[ffe6475cc6e00000-ffe6475cc6ffffff] PMD -> [ff3cb083b5c00000-ff3cb083b5dfffff] on node 1
[ffe6475cc7000000-ffe6475cc71fffff] PMD -> [ff3cb083b5a00000-ff3cb083b5bfffff] on node 1
[ffe6475cc7200000-ffe6475cc73fffff] PMD -> [ff3cb083b5800000-ff3cb083b59fffff] on node 1
[ffe6475cc7400000-ffe6475cc75fffff] PMD -> [ff3cb083b5600000-ff3cb083b57fffff] on node 1
[ffe6475cc7600000-ffe6475cc77fffff] PMD -> [ff3cb083b5400000-ff3cb083b55fffff] on node 1
[ffe6475cc7800000-ffe6475cc79fffff] PMD -> [ff3cb083b5200000-ff3cb083b53fffff] on node 1
[ffe6475cc7a00000-ffe6475cc7bfffff] PMD -> [ff3cb083b5000000-ff3cb083b51fffff] on node 1
[ffe6475cc7c00000-ffe6475cc7dfffff] PMD -> [ff3cb083b4e00000-ff3cb083b4ffffff] on node 1
[ffe6475cc7e00000-ffe6475cc7ffffff] PMD -> [ff3cb083b4c00000-ff3cb083b4dfffff] on node 1
[ffe6475cc8000000-ffe6475cc81fffff] PMD -> [ff3cb083b4a00000-ff3cb083b4bfffff] on node 1
[ffe6475cc8200000-ffe6475cc83fffff] PMD -> [ff3cb083b4800000-ff3cb083b49fffff] on node 1
[ffe6475cc8400000-ffe6475cc85fffff] PMD -> [ff3cb083b4600000-ff3cb083b47fffff] on node 1
[ffe6475cc8600000-ffe6475cc87fffff] PMD -> [ff3cb083b4400000-ff3cb083b45fffff] on node 1
[ffe6475cc8800000-ffe6475cc89fffff] PMD -> [ff3cb083b4200000-ff3cb083b43fffff] on node 1
[ffe6475cc8a00000-ffe6475cc8bfffff] PMD -> [ff3cb083b4000000-ff3cb083b41fffff] on node 1
[ffe6475cc8c00000-ffe6475cc8dfffff] PMD -> [ff3cb083b3e00000-ff3cb083b3ffffff] on node 1
[ffe6475cc8e00000-ffe6475cc8ffffff] PMD -> [ff3cb083b3c00000-ff3cb083b3dfffff] on node 1

Notice that the physical chunks are strictly adjacent to each other,
just in descending order. They are NOT "scattered around" the node
randomly; they are packed densely back-to-back in a contiguous physical
range, merely handed out top-down in 2MB pieces. Because they are packed
tightly within the same contiguous physical range, they consume (or
pollute) at most the same number of memory blocks as one single
contiguous allocation would, which is exactly what sparse_buffer
provided. Therefore, removing sparse_buffer does NOT turn additional
memory blocks/sections un-offlinable.

Given that, it seems we can safely remove the sparse_buffer
preallocation mechanism. What do you think?

Thanks,

Muchun

> 
> -- 
> Cheers,
> 
> David