From: Hui Zhu
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
    Kairui Song, Qi Zheng, Shakeel Butt, Barry Song, Axel Rasmussen,
    Yuanchu Xie, Wei Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hui Zhu
Subject: [PATCH v6] mm: assert exclusive nid/zonenum bits at the page/folio access sites
Date: Fri, 26 Jun 2026 10:06:29 +0800
Message-ID: <20260626020629.1042041-1-hui.zhu@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Hui Zhu

KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
page->flags and folio_trylock()/folio_lock() concurrently doing
test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:

  BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp

The node id and zone id occupy fixed bit-ranges of page->flags that are
set once at page init and never modified afterwards, so they can never
overlap with the low PG_locked/PG_waiters bits touched by the folio lock
path.

ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
used to check a by-value copy of the flags word, not the actual shared
page->flags/folio->flags being modified concurrently, so it didn't
reliably assert anything about the real race.
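
As a rough userspace illustration of the bit-range argument above, the sketch below packs a node id and zone id into the top of a 64-bit word and sets a low lock bit. All DEMO_* names, widths and bit positions are hypothetical stand-ins chosen for the example; the real layout depends on the kernel configuration and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: node bits at the top of the
 * word, zone bits just below them, lock bit at the bottom. */
#define DEMO_BITS_PER_LONG  64
#define DEMO_NODES_WIDTH    6
#define DEMO_ZONES_WIDTH    2
#define DEMO_NODES_PGSHIFT  (DEMO_BITS_PER_LONG - DEMO_NODES_WIDTH)
#define DEMO_ZONES_PGSHIFT  (DEMO_NODES_PGSHIFT - DEMO_ZONES_WIDTH)
#define DEMO_NODES_MASK     ((1ULL << DEMO_NODES_WIDTH) - 1)
#define DEMO_ZONES_MASK     ((1ULL << DEMO_ZONES_WIDTH) - 1)
#define DEMO_PG_LOCKED      0	/* assumed low bit, stand-in for PG_locked */

/* Build a flags word once, as page init would. */
static inline uint64_t demo_make_flags(unsigned int nid, unsigned int zone)
{
	return ((uint64_t)(nid & DEMO_NODES_MASK) << DEMO_NODES_PGSHIFT) |
	       ((uint64_t)(zone & DEMO_ZONES_MASK) << DEMO_ZONES_PGSHIFT);
}

/* Stand-in for the lock path: only the low lock bit changes. */
static inline uint64_t demo_set_locked(uint64_t flags)
{
	return flags | (1ULL << DEMO_PG_LOCKED);
}

static inline int demo_nid(uint64_t flags)
{
	return (int)((flags >> DEMO_NODES_PGSHIFT) & DEMO_NODES_MASK);
}

static inline int demo_zonenum(uint64_t flags)
{
	return (int)((flags >> DEMO_ZONES_PGSHIFT) & DEMO_ZONES_MASK);
}
```

With this layout, setting the lock bit changes the word as a whole but leaves the extracted node and zone ids untouched, which is the property the assertion is meant to express for the masked bits only.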

For zonenum, move the assertion out of memdesc_zonenum() into
page_zonenum() and folio_zonenum(), where flags is dereferenced directly
from the page/folio.

For nid, turn memdesc_nid() into a macro instead, so the mdf argument is
expanded as the caller's own flags expression
(PF_POISONED_CHECK(page)->flags or folio->flags) rather than copied into
a function parameter, letting ASSERT_EXCLUSIVE_BITS() check the real
page->flags/folio->flags directly.

On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body folded
to a constant, so page->flags/folio->flags was never actually read.
ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be folded
away, so doing it unconditionally would add a pointless read of
page->flags/folio->flags and a check that can never fire. Keep
page_to_nid()/folio_nid() as plain "return 0" static inline stubs under
CONFIG_NUMA=n instead.

Signed-off-by: Hui Zhu
---
Changelog:
v6:
According to the comments of David, turn memdesc_nid() from a static
inline function into a macro so ASSERT_EXCLUSIVE_BITS() can check the
caller's page->flags/folio->flags directly.
v5:
According to the comments of Sashiko, guard the ASSERT_EXCLUSIVE_BITS()
calls with #ifndef NODE_NOT_IN_PAGE_FLAGS (for nid) and
#if ZONES_WIDTH != 0 (for zonenum).
According to the comments of David, avoid calling
PF_POISONED_CHECK(page) twice in page_to_nid().
According to the warning of lkp, switch the CONFIG_NUMA=n
page_to_nid()/folio_nid() stubs from macros to static inline functions.
v4:
According to the comments of Andrew and Sashiko, set
page_to_nid()/folio_nid() as static inline stubs returning 0 under
CONFIG_NUMA=n.
v3:
According to the comments of Andrew and Sashiko, move
ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum() into the
page/folio call sites.
v2:
According to the comments of David, remove useless comments and use
ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
page_to_nid().
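
Note for reviewers: the function-versus-macro distinction can be sketched in plain userspace C. Everything below is a hypothetical stand-in (demo_flags_t, DEMO_ASSERT_BITS, the shift and mask values); the mock check only records which address it was pointed at and is not the kernel's ASSERT_EXCLUSIVE_BITS().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for memdesc_flags_t: a struct wrapping one flags word. */
typedef struct { unsigned long f; } demo_flags_t;

struct demo_page { demo_flags_t flags; };

#define DEMO_SHIFT 8		/* arbitrary position for the example */
#define DEMO_MASK  0xfUL

/* Address most recently handed to the mock check, kept as an integer. */
static uintptr_t demo_checked_addr;

/* Mock of an address-based check: remembers the object it was given. */
#define DEMO_ASSERT_BITS(var) \
	((void)(demo_checked_addr = (uintptr_t)&(var)))

/* Function form: mdf is a private copy, so the check sees the copy. */
static inline int demo_nid_by_value(demo_flags_t mdf)
{
	DEMO_ASSERT_BITS(mdf.f);
	return (int)((mdf.f >> DEMO_SHIFT) & DEMO_MASK);
}

/* Macro form: mdf expands to the caller's expression, so the check sees
 * the caller's own flags word. */
#define demo_nid_macro(mdf) \
	(DEMO_ASSERT_BITS((mdf).f), \
	 (int)(((mdf).f >> DEMO_SHIFT) & DEMO_MASK))
```

Calling demo_nid_by_value(page.flags) leaves demo_checked_addr pointing at the parameter copy, while demo_nid_macro(page.flags) leaves it pointing at page.flags.f itself; both return the same value.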
 include/linux/mm.h     | 21 +++++++++++++++++----
 include/linux/mmzone.h |  7 ++++++-
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..6cce6dc621a9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2288,12 +2288,14 @@ static inline int page_zone_id(struct page *page)
 #ifdef NODE_NOT_IN_PAGE_FLAGS
 int memdesc_nid(memdesc_flags_t mdf);
 #else
-static inline int memdesc_nid(memdesc_flags_t mdf)
-{
-	return (mdf.f >> NODES_PGSHIFT) & NODES_MASK;
-}
+#define memdesc_nid(mdf)						\
+({									\
+	ASSERT_EXCLUSIVE_BITS(mdf.f, NODES_MASK << NODES_PGSHIFT);	\
+	(int)((mdf.f >> NODES_PGSHIFT) & NODES_MASK);			\
+})
 #endif
 
+#ifdef CONFIG_NUMA
 static inline int page_to_nid(const struct page *page)
 {
 	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
@@ -2303,6 +2305,17 @@ static inline int folio_nid(const struct folio *folio)
 {
 	return memdesc_nid(folio->flags);
 }
+#else
+static inline int page_to_nid(const struct page *page)
+{
+	return 0;
+}
+
+static inline int folio_nid(const struct folio *folio)
+{
+	return 0;
+}
+#endif
 
 #ifdef CONFIG_NUMA_BALANCING
 /* page access time bits needs to hold at least 4 seconds */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ca2712187147..1b4336098113 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1274,17 +1274,22 @@ static inline bool zone_is_empty(const struct zone *zone)
 
 static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
 {
-	ASSERT_EXCLUSIVE_BITS(flags.f, ZONES_MASK << ZONES_PGSHIFT);
 	return (flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
+#if ZONES_WIDTH != 0
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
+#endif
 	return memdesc_zonenum(page->flags);
 }
 
 static inline enum zone_type folio_zonenum(const struct folio *folio)
 {
+#if ZONES_WIDTH != 0
+	ASSERT_EXCLUSIVE_BITS(folio->flags, ZONES_MASK << ZONES_PGSHIFT);
+#endif
 	return memdesc_zonenum(folio->flags);
 }
 
-- 
2.43.0