public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Hao Ge <hao.ge@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list
Date: Wed, 6 May 2026 10:23:09 +0800	[thread overview]
Message-ID: <0b9969e2-b208-46c2-a9a5-bf620239275a@linux.dev> (raw)
In-Reply-To: <20260430075239.efcceac3d83ede2a2d22158c@linux-foundation.org>

Hi Andrew and Suren


Sorry for the late reply, I was on holiday.


On 2026/4/30 22:52, Andrew Morton wrote:
> On Thu, 30 Apr 2026 10:02:26 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>
>> Pages allocated before page_ext is available have their codetag left
>> uninitialized. Track these early PFNs and clear their codetag in
>> clear_early_alloc_pfn_tag_refs() to avoid "alloc_tag was not set"
>> warnings when they are freed later.
>>
>> Currently a fixed-size array of 8192 entries is used, with a warning if
>> the limit is exceeded. However, the number of early allocations depends
>> on the number of CPUs and can be larger than 8192.
>>
>> Replace the fixed-size array with a dynamically allocated linked list
>> of pfn_pool structs. Each node is allocated via alloc_page() and mapped
>> to a pfn_pool containing a next pointer, an atomic slot counter, and a
>> PFN array that fills the remainder of the page.
>>
>> The tracking pages themselves are allocated via alloc_page(), which
>> would trigger __pgalloc_tag_add() -> alloc_tag_add_early_pfn() and
>> recurse indefinitely. Introduce __GFP_NO_CODETAG (reuses the
>> %__GFP_NO_OBJ_EXT bit) and pass gfp_flags through pgalloc_tag_add()
>> so that the early path can skip recording allocations that carry this
>> flag.
> Thanks.  AI review asked a couple of questions:
>
> 	https://sashiko.dev/#/patchset/20260430020226.34116-1-hao.ge@linux.dev

Yes, Sashiko is an excellent tool. Let me carefully go through the 
questions raised by its review.


About question 1 (About question 1 (__GFP_NO_CODETAG aliasing to 
__GFP_NO_OBJ_EXT may cause

SLUB page allocations to be erroneously skipped):


No. When SLUB allocates a new slab page via new_slab(),

it filters the flags with (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK),

https://elixir.bootlin.com/linux/v7.0.1/source/mm/slub.c#L3539

  which does not include __GFP_NO_OBJ_EXT. So __GFP_NO_OBJ_EXT is never 
passed

through to the page allocator for slab page allocations.


About question 2 (retaining caller's zone/placement flags may cause NULL

pointer dereference or violate mobility constraints):


Sashiko's analysis is correct.  As Suren also suggested, it will be 
fixed in

v5 by masking out GFP_ZONEMASK when allocating tracking pages,

so the internal allocation is no longer constrained by the original 
caller's zone

requirements.


> Please check to see if there's anything legitimate there?
>
> It also asked some different questions of the v3 patch:
> 	https://sashiko.dev/#/patchset/20260423083756.157902-1-hao.ge@linux.dev

For the v3 patch:


About question 3 (race in the cmpxchg failure path: page_ext becomes

available between alloc_page() and __free_page(), triggering "alloc_tag 
was not set"):


Good catch. This race was identified during the v3 review and the fix

(calling clear_page_tag_ref() before __free_page()) was added in v4.

Apologies for not documenting it in the v4 changelogs.

CPU A (__alloc_tag_add_early_pfn)                CPU B 
(clear_early_alloc_pfn_tag_refs)

alloc_page() -> codetag uninitialized

                         page_ext becomes available

cmpxchg() fails

__free_page -> WARN

Setting the codetag to CODETAG_EMPTY via clear_page_tag_ref() prevents

this warning.


About question 4 (accounting leak: tracking page gets fully accounted,

then clear_page_tag_ref() overwrites to CODETAG_EMPTY, __free_page()

skips subtraction):


Yes, this can happen. However, the tracking pages are allocated with

__GFP_NO_CODETAG and are only used during early boot. The leaked

accounting is at most one page per tracking node, which is negligible

for this debug-only code path.


Thanks

Best Regards

Hao


  reply	other threads:[~2026-05-06  2:24 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30  2:02 [PATCH v4] mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list Hao Ge
2026-04-30 14:52 ` Andrew Morton
2026-05-06  2:23   ` Hao Ge [this message]
2026-05-01 20:32 ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0b9969e2-b208-46c2-a9a5-bf620239275a@linux.dev \
    --to=hao.ge@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox