From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-177.mta0.migadu.com (out-177.mta0.migadu.com [91.218.175.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A5EED22301 for ; Fri, 26 Jun 2026 03:20:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.177 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782444031; cv=none; b=JZkxdrQm3UOfQ/Auf/tIMdr0s1D/u6tIjuJtUZ0YS8Ufg80TGfUXVzF7W3e8LJFjXLZgoGsASl7oyWbyYPDRQyXyL2RS/+ifSZuQM0cqq7OvpLQZ7NuI5djTFtl8k1I4WuT0yshMJK0my5WFAcSa76CjM/UoW/EvqkwBCCvHsbk= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782444031; c=relaxed/simple; bh=nrRvY+KdjD7rV4/ENggSTkD3/it6Yl82sTndKtzhigs=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=ljoelVcjLM7HQer4xj3/KjbUaueZks6Il0skwh6RigNBmghR+uUDo9TCD1Wr9Z7fy+bzujf7GdFAR1tEEEqEWcnEusEOorrPX+zknhMgHwzgyWoJ7cec9dPJ4MeG3k+tEmjRErRytukmCsIJYtvTGoFNb8b+PL49+8ummLUBYcg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=fZbEO1tL; arc=none smtp.client-ip=91.218.175.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="fZbEO1tL" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1782444027; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=Z+jrTmuUXWf0qKSmtsfEFj8rLfN5Zd2Qr+1+6/wY3MU=; b=fZbEO1tL0dvVBk98/Btdrsm1UwIe1ZFNJUZ8qZowfMyzeyh+IIqt23QNbfU4olfZu6PFNE 7ZEvTsDK6jr2tLHVEhkq4LRbFkAAUeSjeM0gmbNU2VQ63AqtHnC1AHbAZhmkCiXgYBBoOM MsauUDvBbHj2LyidwUALvJ9W0Da4D6Y=
From: Hui Zhu
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	"Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Kairui Song, Qi Zheng,
	Shakeel Butt, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hui Zhu
Subject: [PATCH v7] mm: assert exclusive nid/zonenum bits at the page/folio access sites
Date: Fri, 26 Jun 2026 11:20:12 +0800
Message-ID: <20260626032012.1049667-1-hui.zhu@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Migadu-Flow: FLOW_OUT

From: Hui Zhu

KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
page->flags and folio_trylock()/folio_lock() concurrently doing
test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:

  BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp

The node id and zone id occupy fixed bit-ranges of page->flags that are
set once at page init and never modified afterwards, so they can never
overlap with the low PG_locked/PG_waiters bits touched by the folio lock
path.

ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
used to check a by-value copy of the flags word, not the actual shared
page->flags/folio->flags being modified concurrently, so it didn't
reliably assert anything about the real race.
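
To illustrate the by-value problem outside the kernel, here is a minimal
user-space sketch. Every demo_* name and the bit layout (6 node bits at
the top of a 64-bit word) are assumptions made up for this example, and
DEMO_ASSERT_BITS() only records which address would be examined; it is
not ASSERT_EXCLUSIVE_BITS().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the real shifts and widths
 * depend on the kernel configuration.
 */
#define DEMO_PG_LOCKED_MASK	(UINT64_C(1) << 0)
#define DEMO_NODES_WIDTH	6
#define DEMO_NODES_PGSHIFT	(64 - DEMO_NODES_WIDTH)
#define DEMO_NODES_MASK		((UINT64_C(1) << DEMO_NODES_WIDTH) - 1)

typedef struct { uint64_t f; } demo_flags_t;
struct demo_page { demo_flags_t flags; };

/* Stand-in for the real check: remembers the address it was handed. */
static uintptr_t demo_checked;
#define DEMO_ASSERT_BITS(var)	((void)(demo_checked = (uintptr_t)&(var)))

/* Function form: the check examines the parameter, i.e. a private copy. */
static inline int demo_nid_func(demo_flags_t mdf)
{
	DEMO_ASSERT_BITS(mdf.f);
	return (int)((mdf.f >> DEMO_NODES_PGSHIFT) & DEMO_NODES_MASK);
}

/* Macro form: mdf expands to the caller's expression, the shared word. */
#define demo_nid_macro(mdf)						\
	(DEMO_ASSERT_BITS((mdf).f),					\
	 (int)(((mdf).f >> DEMO_NODES_PGSHIFT) & DEMO_NODES_MASK))

/* Returns 1 if the most recent check examined page's own flags word. */
static inline int demo_checked_shared(const struct demo_page *page)
{
	return demo_checked == (uintptr_t)&page->flags.f;
}
```

For a struct demo_page page, demo_nid_func(page.flags) leaves
demo_checked_shared(&page) false, while demo_nid_macro(page.flags) makes
it true. Both forms return the same node id whether or not
DEMO_PG_LOCKED_MASK is set, since the two bit-ranges do not overlap.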

For zonenum, move the assertion out of memdesc_zonenum() into
page_zonenum() and folio_zonenum(), where flags is dereferenced directly
from the page/folio.

For nid, turn memdesc_nid() into a macro instead, so the mdf argument is
expanded as the caller's own flags expression
(PF_POISONED_CHECK(page)->flags or folio->flags) rather than copied into
a function parameter, letting ASSERT_EXCLUSIVE_BITS() check the real
page->flags/folio->flags directly.

On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body folded
to a constant, so page->flags/folio->flags was never actually read.
ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be folded
away, so doing it unconditionally would add a pointless read of
page->flags/folio->flags and a check that can never fire. Keep
page_to_nid()/folio_nid() as plain "return 0" static inline stubs under
CONFIG_NUMA=n instead.

Signed-off-by: Hui Zhu
---
Changelog:
v7: According to the comments of Sashiko, restrict the memdesc_nid()
macro to CONFIG_NUMA, keeping a plain "return 0" static inline stub
otherwise, and re-add a local page pointer in page_to_nid() to avoid
evaluating PF_POISONED_CHECK(page) twice.
v6: According to the comments of David, turn memdesc_nid() from a static
inline function into a macro so ASSERT_EXCLUSIVE_BITS() can check the
caller's page->flags/folio->flags directly.
v5: According to the comments of Sashiko, guard the
ASSERT_EXCLUSIVE_BITS() calls with #ifndef NODE_NOT_IN_PAGE_FLAGS (for
nid) and #if ZONES_WIDTH != 0 (for zonenum).
According to the comments of David, avoid calling
PF_POISONED_CHECK(page) twice in page_to_nid().
According to the warning of lkp, switch the CONFIG_NUMA=n
page_to_nid()/folio_nid() stubs from macros to static inline functions.
v4: According to the comments of Andrew and Sashiko, set
page_to_nid()/folio_nid() as static inline stubs returning 0 under
CONFIG_NUMA=n.
v3: According to the comments of Andrew and Sashiko, move
ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum() into the
page/folio call sites.
v2: According to the comments of David, remove useless comments and use
ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
page_to_nid().

 include/linux/mm.h     | 25 +++++++++++++++++++++++--
 include/linux/mmzone.h |  7 ++++++-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..63fcf277b675 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2288,21 +2288,42 @@ static inline int page_zone_id(struct page *page)
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 int memdesc_nid(memdesc_flags_t mdf);
 #else
+#ifdef CONFIG_NUMA
+#define memdesc_nid(mdf)						\
+({									\
+	ASSERT_EXCLUSIVE_BITS(mdf.f, NODES_MASK << NODES_PGSHIFT);	\
+	(int)((mdf.f >> NODES_PGSHIFT) & NODES_MASK);			\
+})
+#else
 static inline int memdesc_nid(memdesc_flags_t mdf)
 {
-	return (mdf.f >> NODES_PGSHIFT) & NODES_MASK;
+	return 0;
 }
 #endif
 
+#ifdef CONFIG_NUMA
 static inline int page_to_nid(const struct page *page)
 {
-	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
+	const struct page *p = PF_POISONED_CHECK(page);
+
+	return memdesc_nid(p->flags);
 }
 
 static inline int folio_nid(const struct folio *folio)
 {
 	return memdesc_nid(folio->flags);
 }
+#else
+static inline int page_to_nid(const struct page *page)
+{
+	return 0;
+}
+
+static inline int folio_nid(const struct folio *folio)
+{
+	return 0;
+}
+#endif
 
 #ifdef CONFIG_NUMA_BALANCING
 /* page access time bits needs to hold at least 4 seconds */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ca2712187147..1b4336098113 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1274,17 +1274,22 @@ static inline bool zone_is_empty(const struct zone *zone)
 
 static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
 {
-	ASSERT_EXCLUSIVE_BITS(flags.f, ZONES_MASK << ZONES_PGSHIFT);
 	return (flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
+#if ZONES_WIDTH != 0
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
+#endif
 	return memdesc_zonenum(page->flags);
 }
 
 static inline enum zone_type folio_zonenum(const struct folio *folio)
 {
+#if ZONES_WIDTH != 0
+	ASSERT_EXCLUSIVE_BITS(folio->flags, ZONES_MASK << ZONES_PGSHIFT);
+#endif
 	return memdesc_zonenum(folio->flags);
 }
 
-- 
2.43.0