Re: [PATCH v4] mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Hao Ge <hao.ge@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list
Date: Wed, 6 May 2026 10:23:09 +0800	[thread overview]
Message-ID: <0b9969e2-b208-46c2-a9a5-bf620239275a@linux.dev> (raw)
In-Reply-To: <20260430075239.efcceac3d83ede2a2d22158c@linux-foundation.org>

Hi Andrew and Suren


Sorry for the late reply, I was on holiday.


On 2026/4/30 22:52, Andrew Morton wrote:
> On Thu, 30 Apr 2026 10:02:26 +0800 Hao Ge <hao.ge@linux.dev> wrote:
>
>> Pages allocated before page_ext is available have their codetag left
>> uninitialized. Track these early PFNs and clear their codetag in
>> clear_early_alloc_pfn_tag_refs() to avoid "alloc_tag was not set"
>> warnings when they are freed later.
>>
>> Currently a fixed-size array of 8192 entries is used, with a warning if
>> the limit is exceeded. However, the number of early allocations depends
>> on the number of CPUs and can be larger than 8192.
>>
>> Replace the fixed-size array with a dynamically allocated linked list
>> of pfn_pool structs. Each node is allocated via alloc_page() and mapped
>> to a pfn_pool containing a next pointer, an atomic slot counter, and a
>> PFN array that fills the remainder of the page.
>>
>> The tracking pages themselves are allocated via alloc_page(), which
>> would trigger __pgalloc_tag_add() -> alloc_tag_add_early_pfn() and
>> recurse indefinitely. Introduce __GFP_NO_CODETAG (reuses the
>> %__GFP_NO_OBJ_EXT bit) and pass gfp_flags through pgalloc_tag_add()
>> so that the early path can skip recording allocations that carry this
>> flag.
> Thanks.  AI review asked a couple of questions:
>
> 	https://sashiko.dev/#/patchset/20260430020226.34116-1-hao.ge@linux.dev

Yes, Sashiko is an excellent tool. Let me carefully go through the 
questions raised by its review.


About question 1 (About question 1 (__GFP_NO_CODETAG aliasing to 
__GFP_NO_OBJ_EXT may cause

SLUB page allocations to be erroneously skipped):


No. When SLUB allocates a new slab page via new_slab(),

it filters the flags with (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK),

https://elixir.bootlin.com/linux/v7.0.1/source/mm/slub.c#L3539

  which does not include __GFP_NO_OBJ_EXT. So __GFP_NO_OBJ_EXT is never 
passed

through to the page allocator for slab page allocations.


About question 2 (retaining caller's zone/placement flags may cause NULL

pointer dereference or violate mobility constraints):


Sashiko's analysis is correct.  As Suren also suggested, it will be 
fixed in

v5 by masking out GFP_ZONEMASK when allocating tracking pages,

so the internal allocation is no longer constrained by the original 
caller's zone

requirements.


> Please check to see if there's anything legitimate there?
>
> It also asked some different questions of the v3 patch:
> 	https://sashiko.dev/#/patchset/20260423083756.157902-1-hao.ge@linux.dev

For the v3 patch:


About question 3 (race in the cmpxchg failure path: page_ext becomes

available between alloc_page() and __free_page(), triggering "alloc_tag 
was not set"):


Good catch. This race was identified during the v3 review and the fix

(calling clear_page_tag_ref() before __free_page()) was added in v4.

Apologies for not documenting it in the v4 changelogs.

CPU A (__alloc_tag_add_early_pfn)                CPU B 
(clear_early_alloc_pfn_tag_refs)

alloc_page() -> codetag uninitialized

                         page_ext becomes available

cmpxchg() fails

__free_page -> WARN

Setting the codetag to CODETAG_EMPTY via clear_page_tag_ref() prevents

this warning.


About question 4 (accounting leak: tracking page gets fully accounted,

then clear_page_tag_ref() overwrites to CODETAG_EMPTY, __free_page()

skips subtraction):


Yes, this can happen. However, the tracking pages are allocated with

__GFP_NO_CODETAG and are only used during early boot. The leaked

accounting is at most one page per tracking node, which is negligible

for this debug-only code path.


Thanks

Best Regards

Hao

next prev parent reply	other threads:[~2026-05-06  2:24 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30  2:02 [PATCH v4] mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list Hao Ge
2026-04-30 14:52 ` Andrew Morton
2026-05-06  2:23   ` Hao Ge [this message]
2026-05-01 20:32 ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0b9969e2-b208-46c2-a9a5-bf620239275a@linux.dev \
    --to=hao.ge@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.