From: Leon Hwang <leon.hwang@linux.dev>
To: Hui Zhu <hui.zhu@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Kairui Song <kasong@tencent.com>,
Qi Zheng <qi.zheng@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hui Zhu <zhuhui@kylinos.cn>
Subject: Re: [PATCH v7] mm: assert exclusive nid/zonenum bits at the page/folio access sites
Date: Fri, 26 Jun 2026 13:04:51 +0800
Message-ID: <925c8686-9ff6-44c1-9780-63bd7cd8a1c3@linux.dev>
In-Reply-To: <20260626032012.1049667-1-hui.zhu@linux.dev>

On 26/6/26 11:20, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@kylinos.cn>
>
> KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
> page->flags and folio_trylock()/folio_lock() concurrently doing
> test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:
>
> BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp
>
> The node id and zone id occupy fixed bit-ranges of page->flags that
> are set once at page init and never modified afterwards, so they can
> never overlap with the low PG_locked/PG_waiters bits touched by the
> folio lock path.
>
> ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
> used to check a by-value copy of the flags word, not the actual shared
> page->flags/folio->flags being modified concurrently, so it didn't
> reliably assert anything about the real race.
>
> For zonenum, move the assertion out of memdesc_zonenum() into
> page_zonenum() and folio_zonenum(), where flags is dereferenced
> directly from the page/folio.
>
> For nid, turn memdesc_nid() into a macro instead, so the mdf argument
> is expanded as the caller's own flags expression
> (PF_POISONED_CHECK(page)->flags or folio->flags) rather than copied
> into a function parameter, letting ASSERT_EXCLUSIVE_BITS() check the
> real page->flags/folio->flags directly.
>
> On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body
> folded to a constant, so page->flags/folio->flags was never actually
> read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
> folded away, so doing it unconditionally would add a pointless read
> of page->flags/folio->flags and a check that can never fire. Keep
> page_to_nid()/folio_nid() as plain "return 0" static inline stubs
> under CONFIG_NUMA=n instead.
>
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> ---
> Changelog:
> v7:
> According to the comments of Sashiko, restrict the memdesc_nid() macro
> to CONFIG_NUMA, keeping a plain "return 0" static inline stub otherwise,
> and re-add a local page pointer in page_to_nid() to avoid evaluating
> PF_POISONED_CHECK(page) twice.
> v6:
> According to the comments of David, turn memdesc_nid() from a static
> inline function into a macro so ASSERT_EXCLUSIVE_BITS() can check the
> caller's page->flags/folio->flags directly.
> v5:
> According to the comments of Sashiko, guard the ASSERT_EXCLUSIVE_BITS()
> calls with #ifndef NODE_NOT_IN_PAGE_FLAGS (for nid) and #if
> ZONES_WIDTH != 0 (for zonenum).
> According to the comments of David, avoid calling
> PF_POISONED_CHECK(page) twice in page_to_nid().
> According to the warning of lkp, switch the CONFIG_NUMA=n
> page_to_nid()/folio_nid() stubs from macros to static inline functions.
> v4:
> According to the comments of Andrew and Sashiko, set
> page_to_nid()/folio_nid() as static inline stubs returning 0
> under CONFIG_NUMA=n.
> v3:
> According to the comments of Andrew and Sashiko, move
> ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum()
> into the page/folio call sites.
> v2:
> According to the comments of David, remove useless comments and use
> ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
> page_to_nid().
>
> include/linux/mm.h | 25 +++++++++++++++++++++++--
> include/linux/mmzone.h | 7 ++++++-
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 485df9c2dbdd..63fcf277b675 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2288,21 +2288,42 @@ static inline int page_zone_id(struct page *page)
>  #ifdef NODE_NOT_IN_PAGE_FLAGS
>  int memdesc_nid(memdesc_flags_t mdf);
>  #else
> +#ifdef CONFIG_NUMA
> +#define memdesc_nid(mdf) \
> +({ \
> +	ASSERT_EXCLUSIVE_BITS(mdf.f, NODES_MASK << NODES_PGSHIFT); \
> +	(int)((mdf.f >> NODES_PGSHIFT) & NODES_MASK); \
> +})
> +#else
>  static inline int memdesc_nid(memdesc_flags_t mdf)
>  {
> -	return (mdf.f >> NODES_PGSHIFT) & NODES_MASK;
> +	return 0;
>  }
>  #endif
>
> +#ifdef CONFIG_NUMA
>  static inline int page_to_nid(const struct page *page)
>  {
> -	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
> +	const struct page *p = PF_POISONED_CHECK(page);
> +
> +	return memdesc_nid(p->flags);
>  }
>
>  static inline int folio_nid(const struct folio *folio)
>  {
>  	return memdesc_nid(folio->flags);
>  }
> +#else
> +static inline int page_to_nid(const struct page *page)
> +{
> +	return 0;
> +}
> +
> +static inline int folio_nid(const struct folio *folio)
> +{
> +	return 0;
> +}
> +#endif
>
> #ifdef CONFIG_NUMA_BALANCING
> /* page access time bits needs to hold at least 4 seconds */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ca2712187147..1b4336098113 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1274,17 +1274,22 @@ static inline bool zone_is_empty(const struct zone *zone)
>
>  static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
>  {
> -	ASSERT_EXCLUSIVE_BITS(flags.f, ZONES_MASK << ZONES_PGSHIFT);
>  	return (flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
>  }
>
>  static inline enum zone_type page_zonenum(const struct page *page)
>  {
> +#if ZONES_WIDTH != 0
> +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> +#endif
>  	return memdesc_zonenum(page->flags);
>  }
>
>  static inline enum zone_type folio_zonenum(const struct folio *folio)
>  {
> +#if ZONES_WIDTH != 0
> +	ASSERT_EXCLUSIVE_BITS(folio->flags, ZONES_MASK << ZONES_PGSHIFT);
> +#endif
>  	return memdesc_zonenum(folio->flags);
>  }
>

Would it be better to factor these two '#if ZONES_WIDTH != 0' blocks out
into a common macro, with a comment explaining why the guard is needed?

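A rough, untested userspace sketch of what such a macro could look like.
ASSERT_EXCLUSIVE_ZONE_BITS() and make_flags() are made-up names that are not
part of the posted patch. The ZONES_* values, the flags layout and the
ASSERT_EXCLUSIVE_BITS() stub are stand-ins chosen only so the sketch compiles
outside the kernel, and the helpers return int rather than enum zone_type.
Passing (flags).f to the assertion is likewise an assumption here:

```c
#include <assert.h>

/* Stand-ins for kernel definitions; the values are assumptions. */
#define ZONES_WIDTH	2
#define ZONES_PGSHIFT	4
#define ZONES_MASK	((1UL << ZONES_WIDTH) - 1)

/* Stub: the real macro is a KCSAN annotation; this one only evaluates its arguments. */
#define ASSERT_EXCLUSIVE_BITS(var, mask) ((void)(var), (void)(mask))

typedef struct { unsigned long f; } memdesc_flags_t;
struct page { memdesc_flags_t flags; };
struct folio { memdesc_flags_t flags; };

/*
 * Hypothetical helper. The zone bits are set once at page init and never
 * modified afterwards, so no concurrent writer may touch them. With
 * ZONES_WIDTH == 0 there are no zone bits in the flags word, so the
 * assertion could never fire and is compiled out.
 */
#if ZONES_WIDTH != 0
#define ASSERT_EXCLUSIVE_ZONE_BITS(flags) \
	ASSERT_EXCLUSIVE_BITS((flags).f, ZONES_MASK << ZONES_PGSHIFT)
#else
#define ASSERT_EXCLUSIVE_ZONE_BITS(flags) do { } while (0)
#endif

static inline int memdesc_zonenum(memdesc_flags_t flags)
{
	return (flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
}

static inline int page_zonenum(const struct page *page)
{
	/* The macro expands on page->flags itself, not on a by-value copy. */
	ASSERT_EXCLUSIVE_ZONE_BITS(page->flags);
	return memdesc_zonenum(page->flags);
}

static inline int folio_zonenum(const struct folio *folio)
{
	ASSERT_EXCLUSIVE_ZONE_BITS(folio->flags);
	return memdesc_zonenum(folio->flags);
}

/* Demo-only helper: build a flags word from a zone number and unrelated low bits. */
static inline memdesc_flags_t make_flags(unsigned long zone, unsigned long low_bits)
{
	memdesc_flags_t mdf;

	assert(zone <= ZONES_MASK);
	assert((low_bits & (ZONES_MASK << ZONES_PGSHIFT)) == 0);
	mdf.f = (zone << ZONES_PGSHIFT) | low_bits;
	return mdf;
}
```

With something like this, page_zonenum() and folio_zonenum() each carry a
single line and the reason for the guard is documented in one place.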
Thanks,
Leon
Thread overview (6+ messages):
2026-06-26 3:20 [PATCH v7] mm: assert exclusive nid/zonenum bits at the page/folio access sites Hui Zhu
2026-06-26 4:54 ` kernel test robot
2026-06-26 4:54 ` kernel test robot
2026-06-26 5:04 ` Leon Hwang [this message]
2026-06-26 9:09 ` David Hildenbrand (Arm)
2026-06-26 6:05 ` kernel test robot