public inbox for linux-mm@kvack.org
From: Muchun Song <muchun.song@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Petr Tesarik <ptesarik@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/sparse: fix BUILD_BUG_ON check for section map alignment
Date: Wed, 1 Apr 2026 10:59:46 +0800	[thread overview]
Message-ID: <76E85D67-FE97-4D36-80F8-BB0B3DCF70A6@linux.dev> (raw)
In-Reply-To: <7C90E910-D229-4F60-A62D-E893A89D58F2@linux.dev>



> On Apr 1, 2026, at 10:57, Muchun Song <muchun.song@linux.dev> wrote:
> 
> 
> 
>> On Apr 1, 2026, at 04:29, David Hildenbrand (Arm) <david@kernel.org> wrote:
>> 
>> On 3/31/26 13:30, Muchun Song wrote:
>>> The comment in mmzone.h states that the alignment requirement
>>> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
>>> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
>>> a byte offset scaled by sizeof(struct page). Thus, the actual
>>> alignment provided by the second term is PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)).
>>> 
>>> Update the compile-time check and the mmzone.h comment to
>>> accurately reflect this mathematically guaranteed alignment by
>>> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)). This avoids the issue of the check
>>> being overly restrictive on architectures like powerpc where
>>> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>>> 
>>> Also, remove the exhaustive per-architecture bit-width list from the
>>> comment; such details risk going stale as architectures change, and the
>>> existing BUILD_BUG_ON already provides compile-time verification of the
>>> constraint.
>>> 
>>> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
>>> the smaller limit on all existing architectures.
>>> 
>>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>> include/linux/mmzone.h | 24 +++++++++---------------
>>> mm/sparse.c            |  3 ++-
>>> 2 files changed, 11 insertions(+), 16 deletions(-)
>>> 
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 7bd0134c241c..584fa598ad75 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>>> extern size_t mem_section_usage_size(void);
>>> 
>>> /*
>>> - * We use the lower bits of the mem_map pointer to store
>>> - * a little bit of information.  The pointer is calculated
>>> - * as mem_map - section_nr_to_pfn(pnum).  The result is
>>> - * aligned to the minimum alignment of the two values:
>>> - *   1. All mem_map arrays are page-aligned.
>>> - *   2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
>>> - *      lowest bits.  PFN_SECTION_SHIFT is arch-specific
>>> - *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
>>> - *      worst combination is powerpc with 256k pages,
>>> - *      which results in PFN_SECTION_SHIFT equal 6.
>>> - * To sum it up, at least 6 bits are available on all architectures.
>>> - * However, we can exceed 6 bits on some other architectures except
>>> - * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
>>> - * with the worst case of 64K pages on arm64) if we make sure the
>>> - * exceeded bit is not applicable to powerpc.
>>> + * We use the lower bits of the mem_map pointer to store a little bit of
>>> + * information. The pointer is calculated as mem_map - section_nr_to_pfn().
>>> + * The result is aligned to the minimum alignment of the two values:
>>> + *
>>> + * 1. All mem_map arrays are page-aligned.
>>> + * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
>>> + *    it is subtracted from a struct page pointer, the offset is scaled by
>>> + *    sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
>>> + *    __ffs(sizeof(struct page)).
>>> */
>>> enum {
>>> SECTION_MARKED_PRESENT_BIT,
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index dfabe554adf8..c2eb36bfb86d 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>>> {
>>> unsigned long coded_mem_map =
>>> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
>>> -  BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
>>> +  BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
>>> +  PAGE_SHIFT));
>> 
>> If that would trigger, wouldn't the memmap of a memory section be
>> smaller than a single page?
> 
> I don't think a memory section can be smaller than a page, because
> PFN_SECTION_SHIFT is defined as follows:
> 
> #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
> 
> Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,

Sorry, I meant to say that a memory section must be greater than a page.

> PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
> 
> Sorry, but I didn't understand what your concern is. Could you elaborate
> a bit more?
> 
>> 
>> Is this really something we should be concerned about? :)
>> 
> 
> If we keep increasing SECTION_MAP_LAST_BIT, it may eventually trigger
> issues, and I want to catch such problems as early as possible, at compile
> time. That was the motivation behind my change.
> 
> Thanks.
> 
>> -- 
>> Cheers,
>> 
>> David





Thread overview: 15+ messages
2026-03-31 11:30 [PATCH] mm/sparse: fix BUILD_BUG_ON check for section map alignment Muchun Song
2026-03-31 19:55 ` Andrew Morton
2026-03-31 20:04   ` David Hildenbrand (Arm)
2026-04-01  2:47     ` Muchun Song
2026-03-31 20:07 ` Andrew Morton
2026-04-01  2:47   ` Muchun Song
2026-03-31 20:29 ` David Hildenbrand (Arm)
2026-04-01  2:57   ` Muchun Song
2026-04-01  2:59     ` Muchun Song [this message]
2026-04-01  4:01     ` Muchun Song
2026-04-01  7:08       ` David Hildenbrand (Arm)
2026-04-01  7:23         ` Muchun Song
2026-04-01  7:26           ` David Hildenbrand (Arm)
2026-04-01  7:28             ` Muchun Song
2026-04-01 16:33             ` Andrew Morton
