From: Lorenzo Stoakes <ljs@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Hui Zhu <hui.zhu@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Kairui Song <kasong@tencent.com>,
Qi Zheng <qi.zheng@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Hui Zhu <zhuhui@kylinos.cn>
Subject: Re: [PATCH v5] mm: assert exclusive nid/zonenum bits at the page/folio access sites
Date: Thu, 25 Jun 2026 13:07:33 +0100
Message-ID: <aj0YlpCkGySxt7-D@lucifer>
In-Reply-To: <2c4dd46a-4755-4bf5-8f14-2d73eb356e3e@kernel.org>

On Thu, Jun 25, 2026 at 01:53:14PM +0200, David Hildenbrand (Arm) wrote:
> On 6/25/26 09:18, Hui Zhu wrote:
> > From: Hui Zhu <zhuhui@kylinos.cn>
> >
> > KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
> > page->flags and folio_trylock()/folio_lock() concurrently doing
> > test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:
> >
> > BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp
> >
> > The node id and zone id occupy fixed bit-ranges of page->flags that
> > are set once at page init and never modified afterwards, so they can
> > never overlap with the low PG_locked/PG_waiters bits touched by the
> > folio lock path.
> >
> > ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
> > checks a by-value copy of the flags word, not the actual shared
> > page->flags/folio->flags being modified concurrently, so it doesn't
> > reliably assert anything about the real race. Move the assertion to
> > page_to_nid(), folio_nid(), page_zonenum() and folio_zonenum(), where
> > flags is dereferenced directly from the page/folio.
> >
> > On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body
> > folded to a constant, so page->flags/folio->flags was never actually
> > read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
> > folded away, so doing it unconditionally would add a pointless read
> > of page->flags/folio->flags and a check that can never fire. Keep
> > page_to_nid()/folio_nid() as plain "return 0" static inline stubs
> > under CONFIG_NUMA=n instead.
> >
> > Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > ---
> > Changelog:
> > v5:
> > According to the comments of Sashiko, guard the ASSERT_EXCLUSIVE_BITS()
> > calls with #ifndef NODE_NOT_IN_PAGE_FLAGS (for nid) and #if
> > ZONES_WIDTH != 0 (for zonenum).
> > According to the comments of David, avoid calling
> > PF_POISONED_CHECK(page) twice in page_to_nid().
> > According to the warning of lkp, switch the CONFIG_NUMA=n
> > page_to_nid()/folio_nid() stubs from macros to static inline functions.
> > v4:
> > According to the comments of Andrew and Sashiko, set
> > page_to_nid()/folio_nid() as static inline stubs returning 0
> > under CONFIG_NUMA=n.
> > v3:
> > According to the comments of Andrew and Sashiko, move
> > ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum()
> > into the page/folio call sites.
> > v2:
> > According to the comments of David, remove useless comments and use
> > ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
> > page_to_nid().
> >
> > include/linux/mm.h | 23 ++++++++++++++++++++++-
> > include/linux/mmzone.h | 7 ++++++-
> > 2 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 485df9c2dbdd..772bd1fc6fe7 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2294,15 +2294,36 @@ static inline int memdesc_nid(memdesc_flags_t mdf)
> > }
> > #endif
> >
> > +#ifdef CONFIG_NUMA
> > static inline int page_to_nid(const struct page *page)
> > {
> > -	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
> > +	const struct page *p = PF_POISONED_CHECK(page);
> > +
> > +#ifndef NODE_NOT_IN_PAGE_FLAGS
> > +	ASSERT_EXCLUSIVE_BITS(p->flags, NODES_MASK << NODES_PGSHIFT);
> > +#endif
> > +	return memdesc_nid(p->flags);
> > }
> >
> > static inline int folio_nid(const struct folio *folio)
> > {
> > +#ifndef NODE_NOT_IN_PAGE_FLAGS
> > +	ASSERT_EXCLUSIVE_BITS(folio->flags,
> > +			      NODES_MASK << NODES_PGSHIFT);
> > +#endif
>
> This is getting ugly, really. We're leaking implementation details from
> memdesc_nid() into folio_nid().
>
> Maybe just turn memdesc_nid() into a macro where we can just do that check
> internally? Not the best thing in this world, but better than this here.

Could also do:

	if (!IS_ENABLED(NODE_NOT_IN_PAGE_FLAGS))
		ASSERT_EXCLUSIVE_BITS(folio->flags,
				      NODES_MASK << NODES_PGSHIFT);

But not sure if it's that much better.

(There's precedent for that form of it in mm/numa_memblks.c)
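
For illustration only, a minimal userspace sketch of how the if-form behaves. It is not the patch: every SKETCH_* name, the field layout and the counting stand-in for ASSERT_EXCLUSIVE_BITS() are hypothetical. The macro trick is modelled on include/linux/kconfig.h and assumes the option is either undefined or defined to exactly 1.

```c
#include <assert.h>

/*
 * "Is this macro defined to 1?" as a compile-time 0/1 constant.
 * The trailing extra 0 keeps the variadic part non-empty in both cases.
 */
#define SKETCH_ARG_PLACEHOLDER_1 0,
#define SKETCH_TAKE_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED(x) SKETCH_IS_DEFINED_1(x)
#define SKETCH_IS_DEFINED_1(val) SKETCH_IS_DEFINED_2(SKETCH_ARG_PLACEHOLDER_##val)
#define SKETCH_IS_DEFINED_2(arg1_or_junk) SKETCH_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define SKETCH_OPT_ON 1		/* stands in for an option defined to 1 */
/* SKETCH_OPT_OFF is deliberately left undefined. */

#define SKETCH_NODES_MASK	0x3UL	/* made-up field layout */
#define SKETCH_NODES_PGSHIFT	8

_Static_assert(SKETCH_IS_DEFINED(SKETCH_OPT_ON) == 1, "defined option yields 1");
_Static_assert(SKETCH_IS_DEFINED(SKETCH_OPT_OFF) == 0, "undefined option yields 0");

static int sketch_checks_run;	/* how often the stand-in check ran */

/* Stand-in for the real assertion; it only counts invocations. */
static void sketch_assert_exclusive_bits(unsigned long flags, unsigned long mask)
{
	(void)flags;
	(void)mask;
	sketch_checks_run++;
}

/* Option set: the condition is constant false, so the check is skipped. */
static int sketch_nid_option_set(unsigned long flags)
{
	if (!SKETCH_IS_DEFINED(SKETCH_OPT_ON))
		sketch_assert_exclusive_bits(flags,
				SKETCH_NODES_MASK << SKETCH_NODES_PGSHIFT);
	return (int)((flags >> SKETCH_NODES_PGSHIFT) & SKETCH_NODES_MASK);
}

/* Option unset: the condition is constant true, so the check runs. */
static int sketch_nid_option_unset(unsigned long flags)
{
	if (!SKETCH_IS_DEFINED(SKETCH_OPT_OFF))
		sketch_assert_exclusive_bits(flags,
				SKETCH_NODES_MASK << SKETCH_NODES_PGSHIFT);
	return (int)((flags >> SKETCH_NODES_PGSHIFT) & SKETCH_NODES_MASK);
}

/* Usage: both variants extract the same field; only one runs the check. */
void sketch_demo(void)
{
	sketch_checks_run = 0;
	assert(sketch_nid_option_set(0x2a5UL) == 2);
	assert(sketch_checks_run == 0);
	assert(sketch_nid_option_unset(0x2a5UL) == 2);
	assert(sketch_checks_run == 1);
}
```

One difference from the #ifndef form: the preprocessor no longer removes the statement, so the compiler parses it in both configurations and everything it names has to be defined either way.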
>
> --
> Cheers,
>
> David

Thanks, Lorenzo