From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: "zhen.ni" <zhen.ni@easystack.cn>,
Andrew Morton <akpm@linux-foundation.org>
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering
Date: Thu, 18 Jun 2026 09:27:24 +0200 [thread overview]
Message-ID: <7b2125bb-c2d6-4678-9ae3-6f4e93034391@kernel.org> (raw)
In-Reply-To: <5ef656db-c6b6-4a2c-b6be-628e5214952f@easystack.cn>
On 6/17/26 10:52, zhen.ni wrote:
> 在 2026/5/26 03:58, Andrew Morton 写道:
>> On Mon, 25 May 2026 16:16:48 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:
>>
>>> This patch series introduces per-file-descriptor filtering capabilities to the
>>> page_owner feature.
>>
>> Thanks again. AI review has found a bunch of new things to get worried
>> about:
>> https://sashiko.dev/#/patchset/20260525081652.2210206-1-zhen.ni@easystack.cn
>>
>>
> Hi,
>
> Can this lead to an out-of-bounds memory read?
>
> The NUMA filter in page_owner (mm/page_owner.c:790-798) bypasses
> PF_POISONED_CHECK() to avoid triggering VM_BUG_ON during concurrent page
> allocation/free:
>
> int page_nid = memdesc_nid(page->flags);
>
> When NODE_NOT_IN_PAGE_FLAGS is defined, memdesc_nid() performs unchecked
> array access:
>
> int memdesc_nid(memdesc_flags_t mdf)
> {
> return section_to_node_table[memdesc_section(mdf)];
> }
>
> If page->flags is poisoned, memdesc_section() can return a garbage
> section_nr that causes out-of-bounds access.
>
> ## Lockless Access Safety Principle
>
> The page_owner iterator runs without locks, meaning pages can be
> allocated or freed concurrently. The fundamental design principle should be:
>
> "It's acceptable to skip a small number of abnormal pages, but panics
> must be prevented."
>
> In lockless iteration, TOCTOU is unavoidable - even with reference
> counting or RCU, page->flags can still be modified concurrently during
> access. Zone locks prevent this but are prohibitively expensive.
>
> ## Proposed Solution: Add nid to struct page_owner
>
> Record nid at allocation time when page state is stable, eliminating the
> need to extract it from page->flags during iteration:
>
> ### 1. Modify struct page_owner
>
> struct page_owner {
> unsigned short order;
> short last_migrate_reason;
> ...
> pid_t tgid;
> pid_t free_pid;
> pid_t free_tgid;
> int nid; // NEW
> };
>
> ### 2. Record nid during allocation
>
> static inline void __update_page_owner_handle(struct page *page, ...)
> {
> int nid = page_to_nid(page); // Safe in allocation context
>
> for_each_page_ext(page, 1 << order, page_ext, iter) {
> page_owner = get_page_owner(page_ext);
> page_owner->nid = nid;
> // ... other fields ...
> }
> }
>
> ### 3. Use saved nid in NUMA filter
>
> if (state->nid_filter_enabled) {
> int page_nid = page_owner->nid; // Direct read, safe
>
> if (!node_isset(page_nid, state->nid_filter)) {
> spin_unlock_irqrestore(&state->lock, flags);
> goto ext_put_continue;
> }
> }
>
> ### 4. Update nid on page migration
>
> // In split_page_owner() when page migrates
> page_owner->nid = page_to_nid(&newfolio->page);
>
This (presumably LLM) suggestion is a, let's say "lazy" solution to the
problem, leading to more memory usage. I'd be surprised if it's not possible
to read the nid in a way that avoids the hazards. If page_to_nid() can
trigger a VM_BUG_ON(), then I'd add a version without that VM_BUG_ON(),
handling the poisoned state gracefully - if it's poisoned, return e.g.
NUMA_NO_NODE and skip the page, or something.
> The remaining two issues can also be improved. If there are no
> additional comments, I will proceed with sending v10.
>
>
> Thanks,
> Zhen
next prev parent reply other threads:[~2026-06-18 7:27 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-25 8:16 [PATCH v9 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-05-25 8:16 ` [PATCH v9 1/4] mm/page_owner: add print_mode filter Zhen Ni
2026-05-25 8:16 ` [PATCH v9 2/4] mm/page_owner: add NUMA node filter Zhen Ni
2026-05-25 8:16 ` [PATCH v9 3/4] tools/mm: add page_owner_filter userspace tool Zhen Ni
2026-05-25 8:16 ` [PATCH v9 4/4] mm/page_owner: document page_owner filter Zhen Ni
2026-05-25 19:58 ` [PATCH v9 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Andrew Morton
2026-06-17 8:52 ` zhen.ni
2026-06-18 7:27 ` Vlastimil Babka (SUSE) [this message]
2026-06-18 8:15 ` zhen.ni
2026-06-23 6:52 ` Vlastimil Babka (SUSE)
2026-06-23 12:43 ` zhen.ni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7b2125bb-c2d6-4678-9ae3-6f4e93034391@kernel.org \
--to=vbabka@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=surenb@google.com \
--cc=zhen.ni@easystack.cn \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox