Subject: Re: [PATCH] mm/sparse: fix BUILD_BUG_ON check for section map alignment
From: Muchun Song
Date: Wed, 1 Apr 2026 10:59:46 +0800
To: "David Hildenbrand (Arm)"
Cc: Muchun Song, Andrew Morton, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Petr Tesarik, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <7C90E910-D229-4F60-A62D-E893A89D58F2@linux.dev>
Message-Id: <76E85D67-FE97-4D36-80F8-BB0B3DCF70A6@linux.dev>
References: <20260331113023.2068075-1-songmuchun@bytedance.com>
 <32789381-f860-4b60-a1e1-4c97f6ed08b1@kernel.org>
 <7C90E910-D229-4F60-A62D-E893A89D58F2@linux.dev>

> On Apr 1, 2026, at 10:57, Muchun Song wrote:
> 
> 
> 
>> On Apr 1, 2026, at 04:29, David Hildenbrand (Arm) wrote:
>> 
>> On 3/31/26 13:30, Muchun Song wrote:
>>> The comment in mmzone.h states that the alignment requirement
>>> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
>>> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
>>> a byte offset scaled by sizeof(struct page). Thus, the actual
>>> alignment provided by the second term is PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)).
>>> 
>>> Update the compile-time check and the mmzone.h comment to
>>> accurately reflect this mathematically guaranteed alignment by
>>> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)). This avoids the issue of the check
>>> being overly restrictive on architectures like powerpc where
>>> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>>> 
>>> Also, remove the exhaustive per-architecture bit-width list from the
>>> comment; such details risk falling out of date over time and may
>>> inadvertently be left un-updated, while the existing BUILD_BUG_ON
>>> provides sufficient compile-time verification of the constraint.
>>> 
>>> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
>>> the smaller limit on all existing architectures.
>>> 
>>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>>> Signed-off-by: Muchun Song
>>> ---
>>>  include/linux/mmzone.h | 24 +++++++++---------------
>>>  mm/sparse.c            |  3 ++-
>>>  2 files changed, 11 insertions(+), 16 deletions(-)
>>> 
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 7bd0134c241c..584fa598ad75 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>>>  extern size_t mem_section_usage_size(void);
>>> 
>>>  /*
>>> - * We use the lower bits of the mem_map pointer to store
>>> - * a little bit of information. The pointer is calculated
>>> - * as mem_map - section_nr_to_pfn(pnum). The result is
>>> - * aligned to the minimum alignment of the two values:
>>> - * 1. All mem_map arrays are page-aligned.
>>> - * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
>>> - *    lowest bits. PFN_SECTION_SHIFT is arch-specific
>>> - *    (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
>>> - *    worst combination is powerpc with 256k pages,
>>> - *    which results in PFN_SECTION_SHIFT equal 6.
>>> - * To sum it up, at least 6 bits are available on all architectures.
>>> - * However, we can exceed 6 bits on some other architectures except
>>> - * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
>>> - * with the worst case of 64K pages on arm64) if we make sure the
>>> - * exceeded bit is not applicable to powerpc.
>>> + * We use the lower bits of the mem_map pointer to store a little bit of
>>> + * information. The pointer is calculated as mem_map - section_nr_to_pfn().
>>> + * The result is aligned to the minimum alignment of the two values:
>>> + *
>>> + * 1. All mem_map arrays are page-aligned.
>>> + * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
>>> + *    it is subtracted from a struct page pointer, the offset is scaled by
>>> + *    sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
>>> + *    __ffs(sizeof(struct page)).
>>>   */
>>>  enum {
>>>  	SECTION_MARKED_PRESENT_BIT,
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index dfabe554adf8..c2eb36bfb86d 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>>>  {
>>>  	unsigned long coded_mem_map =
>>>  		(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
>>> -	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
>>> +	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
>>> +				PAGE_SHIFT));
>> 
>> If that would trigger, wouldn't the memmap of a memory section be
>> smaller than a single page?
> 
> I don't think a memory section can be smaller than a page, because
> PFN_SECTION_SHIFT is defined as follows:
> 
> #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
> 
> Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,

Sorry, I meant to say that a memory section must be greater than a page.

> PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
> 
> Sorry, but I didn't understand what your concern is. Could you elaborate
> a bit more?
> 
>> 
>> Is this really something we should be concerned about? :)
>> 
> 
> When we continuously increase SECTION_MAP_LAST_BIT, it may trigger issues,
> because I expect to catch problems as early as possible at compile time. That
> was the motivation behind my change.
> 
> Thanks.
> 
>> -- 
>> Cheers,
>> 
>> David