Linux-mm Archive on lore.kernel.org
* [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites
@ 2026-06-25  5:39 Hui Zhu
  2026-06-25  6:46 ` David Hildenbrand (Arm)
  2026-06-25  6:50 ` kernel test robot
  0 siblings, 2 replies; 3+ messages in thread
From: Hui Zhu @ 2026-06-25  5:39 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Kairui Song, Qi Zheng,
	Shakeel Butt, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	linux-mm, linux-kernel
  Cc: Hui Zhu

From: Hui Zhu <zhuhui@kylinos.cn>

KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
page->flags and folio_trylock()/folio_lock() concurrently doing
test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:

  BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp

The node id and zone id occupy fixed bit-ranges of page->flags that
are set once at page init and never modified afterwards, so they can
never overlap with the low PG_locked/PG_waiters bits touched by the
folio lock path.

ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
checks a by-value copy of the flags word, not the actual shared
page->flags/folio->flags being modified concurrently, so it doesn't
reliably assert anything about the real race. Move the assertion to
page_to_nid(), folio_nid(), page_zonenum() and folio_zonenum(), where
flags is dereferenced directly from the page/folio.

On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body
folded to a constant, so page->flags/folio->flags was never actually
read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
folded away, so doing it unconditionally would add a pointless read
of page->flags/folio->flags and a check that can never fire. Keep
page_to_nid()/folio_nid() as plain "return 0" static inline stubs
under CONFIG_NUMA=n instead.

Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
Changelog:
v4:
According to the comments of Andrew and Sashiko, set
page_to_nid()/folio_nid() as static inline stubs returning 0
under CONFIG_NUMA=n.
v3:
According to the comments of Andrew and Sashiko, move
ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum()
into the page/folio call sites.
v2:
According to the comments of David, remove useless comments and use
ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
page_to_nid().

 include/linux/mm.h     | 9 +++++++++
 include/linux/mmzone.h | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..56b39194605a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2294,15 +2294,24 @@ static inline int memdesc_nid(memdesc_flags_t mdf)
 }
 #endif
 
+#ifdef CONFIG_NUMA
 static inline int page_to_nid(const struct page *page)
 {
+	ASSERT_EXCLUSIVE_BITS(PF_POISONED_CHECK(page)->flags,
+			      NODES_MASK << NODES_PGSHIFT);
 	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
 }
 
 static inline int folio_nid(const struct folio *folio)
 {
+	ASSERT_EXCLUSIVE_BITS(folio->flags,
+			      NODES_MASK << NODES_PGSHIFT);
 	return memdesc_nid(folio->flags);
 }
+#else
+#define page_to_nid(page) (0)
+#define folio_nid(folio) (0)
+#endif
 
 #ifdef CONFIG_NUMA_BALANCING
 /* page access time bits needs to hold at least 4 seconds */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ca2712187147..56dffa966343 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1274,17 +1274,18 @@ static inline bool zone_is_empty(const struct zone *zone)
 
 static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
 {
-	ASSERT_EXCLUSIVE_BITS(flags.f, ZONES_MASK << ZONES_PGSHIFT);
 	return (flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
 	return memdesc_zonenum(page->flags);
 }
 
 static inline enum zone_type folio_zonenum(const struct folio *folio)
 {
+	ASSERT_EXCLUSIVE_BITS(folio->flags, ZONES_MASK << ZONES_PGSHIFT);
 	return memdesc_zonenum(folio->flags);
 }
 
-- 
2.43.0
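
Editorial note on the commit message above: the claim that the node id field and the PG_locked bit never overlap can be illustrated with a small userspace sketch. Everything prefixed DEMO_/demo_ is hypothetical, and the field width and shift are made-up values, since the real layout depends on the kernel configuration. The extraction mirrors the shift-then-mask shape of memdesc_zonenum() in the diff. It does not model KCSAN or ASSERT_EXCLUSIVE_BITS() itself.

```c
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only. The real NODES_PGSHIFT and
 * NODES_MASK depend on the kernel configuration.
 */
#define DEMO_PG_LOCKED      0u	/* low bit used by the lock path */
#define DEMO_NODES_WIDTH    6u
#define DEMO_NODES_PGSHIFT  (64u - DEMO_NODES_WIDTH)
#define DEMO_NODES_MASK     ((UINT64_C(1) << DEMO_NODES_WIDTH) - 1)

/* Page-init stand-in: the node id is written into the high bits once. */
static inline uint64_t demo_init_flags(int nid)
{
	return ((uint64_t)nid & DEMO_NODES_MASK) << DEMO_NODES_PGSHIFT;
}

/* Same shift-then-mask shape as memdesc_zonenum() in the patch. */
static inline int demo_nid(uint64_t flags)
{
	return (int)((flags >> DEMO_NODES_PGSHIFT) & DEMO_NODES_MASK);
}

/* Lock-path stand-in: only the low lock bit changes. */
static inline uint64_t demo_set_locked(uint64_t flags)
{
	return flags | (UINT64_C(1) << DEMO_PG_LOCKED);
}
```

Under these assumptions, demo_nid() returns the same value before and after demo_set_locked(), because the two operations touch disjoint bit ranges of the word.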




* Re: [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites
  2026-06-25  5:39 [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites Hui Zhu
@ 2026-06-25  6:46 ` David Hildenbrand (Arm)
  2026-06-25  6:50 ` kernel test robot
  1 sibling, 0 replies; 3+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-25  6:46 UTC (permalink / raw)
  To: Hui Zhu, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kairui Song, Qi Zheng, Shakeel Butt, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, linux-mm, linux-kernel
  Cc: Hui Zhu

On 6/25/26 07:39, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@kylinos.cn>
> 
> KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
> page->flags and folio_trylock()/folio_lock() concurrently doing
> test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:
> 
>   BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp
> 
> The node id and zone id occupy fixed bit-ranges of page->flags that
> are set once at page init and never modified afterwards, so they can
> never overlap with the low PG_locked/PG_waiters bits touched by the
> folio lock path.
> 
> ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
> checks a by-value copy of the flags word, not the actual shared
> page->flags/folio->flags being modified concurrently, so it doesn't
> reliably assert anything about the real race.

Is that the case? I thought the existing ASSERT_EXCLUSIVE_BITS() reliably worked
before?

Maybe the compiler optimizing out a local copy sorted that for us.

> Move the assertion to
> page_to_nid(), folio_nid(), page_zonenum() and folio_zonenum(), where
> flags is dereferenced directly from the page/folio.
> 
> On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body
> folded to a constant, so page->flags/folio->flags was never actually
> read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
> folded away, so doing it unconditionally would add a pointless read
> of page->flags/folio->flags and a check that can never fire. Keep
> page_to_nid()/folio_nid() as plain "return 0" static inline stubs
> under CONFIG_NUMA=n instead.
> 
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> ---
> Changelog:
> v4:
> According to the comments of Andrew and Sashiko, set
> page_to_nid()/folio_nid() as static inline stubs returning 0
> under CONFIG_NUMA=n.
> v3:
> According to the comments of Andrew and Sashiko, move
> ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum()
> into the page/folio call sites.
> v2:
> According to the comments of David, remove useless comments and use
> ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
> page_to_nid().
> 
>  include/linux/mm.h     | 9 +++++++++
>  include/linux/mmzone.h | 3 ++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 485df9c2dbdd..56b39194605a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2294,15 +2294,24 @@ static inline int memdesc_nid(memdesc_flags_t mdf)
>  }
>  #endif
>  
> +#ifdef CONFIG_NUMA
>  static inline int page_to_nid(const struct page *page)
>  {
> +	ASSERT_EXCLUSIVE_BITS(PF_POISONED_CHECK(page)->flags,
> +			      NODES_MASK << NODES_PGSHIFT);

Performing the PF_POISONED_CHECK() twice is a bit odd. One time is sufficient,
maybe simply before both statements separately?

>  	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
>  }
>  
>  static inline int folio_nid(const struct folio *folio)
>  {
> +	ASSERT_EXCLUSIVE_BITS(folio->flags,
> +			      NODES_MASK << NODES_PGSHIFT);
>  	return memdesc_nid(folio->flags);
>  }
> +#else
> +#define page_to_nid(page) (0)
> +#define folio_nid(folio) (0)
> +#endif
>  

LGTM

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David
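
Editorial note: one possible reading of the PF_POISONED_CHECK() comment above is to run the check once, keep the returned pointer, and use it for both the assertion and the read. The userspace sketch below only shows that shape. All demo_/DEMO_ names are hypothetical stand-ins, the shift and mask are made-up values, and the assertion macro is a no-op that does not reproduce KCSAN behaviour.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	uint64_t flags;
};

/* Made-up layout values, for illustration only. */
#define DEMO_NODES_PGSHIFT 58u
#define DEMO_NODES_MASK    UINT64_C(0x3f)

/* Counts how often the poison check runs, so the sketch can be verified. */
static unsigned int demo_poison_checks;

/* Stand-in for PF_POISONED_CHECK(): validate, then hand the pointer back. */
static inline const struct demo_page *
demo_poisoned_check(const struct demo_page *page)
{
	demo_poison_checks++;
	return page;
}

/* No-op stand-in: evaluates both arguments and does nothing else. */
#define DEMO_ASSERT_EXCLUSIVE_BITS(var, mask) ((void)(var), (void)(mask))

static inline int demo_page_to_nid(const struct demo_page *page)
{
	/* Check once, then reuse the checked pointer for both statements. */
	const struct demo_page *checked = demo_poisoned_check(page);

	DEMO_ASSERT_EXCLUSIVE_BITS(checked->flags,
				   DEMO_NODES_MASK << DEMO_NODES_PGSHIFT);
	return (int)((checked->flags >> DEMO_NODES_PGSHIFT) & DEMO_NODES_MASK);
}
```

With this shape, a single call to demo_page_to_nid() bumps demo_poison_checks by one rather than two.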



* Re: [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites
  2026-06-25  5:39 [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites Hui Zhu
  2026-06-25  6:46 ` David Hildenbrand (Arm)
@ 2026-06-25  6:50 ` kernel test robot
  1 sibling, 0 replies; 3+ messages in thread
From: kernel test robot @ 2026-06-25  6:50 UTC (permalink / raw)
  To: Hui Zhu, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Kairui Song, Qi Zheng,
	Shakeel Butt, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	linux-kernel
  Cc: llvm, oe-kbuild-all, Linux Memory Management List, Hui Zhu

Hi Hui,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Hui-Zhu/mm-assert-exclusive-nid-zonenum-bits-at-the-page-folio-access-sites/20260625-134106
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260625053958.918738-1-hui.zhu%40linux.dev
patch subject: [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites
config: s390-allnoconfig (https://download.01.org/0day-ci/archive/20260625/202606251454.M74ab4Rw-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 6cc609bb250b21b47fc7d394b4019101e9983597)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251454.M74ab4Rw-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251454.M74ab4Rw-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/fork.c:258:18: warning: unused variable 'page' [-Wunused-variable]
     258 |                                 struct page *page = vm_area->pages[i];
         |                                              ^~~~
   1 warning generated.


vim +/page +258 kernel/fork.c

262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  240  
449e0b4ed5a16c7 Pasha Tatashin            2025-05-09  241  static bool try_release_thread_stack_to_cache(struct vm_struct *vm_area)
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  242  {
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  243  	unsigned int i;
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  244  	int nid;
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  245  
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  246  	/*
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  247  	 * Don't cache stacks if any of the pages don't match the local domain, unless
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  248  	 * there is no local memory to begin with.
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  249  	 *
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  250  	 * Note that lack of local memory does not automatically mean it makes no difference
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  251  	 * performance-wise which other domain backs the stack. In this case we are merely
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  252  	 * trying to avoid constantly going to vmalloc.
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  253  	 */
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  254  	scoped_guard(preempt) {
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  255  		nid = numa_node_id();
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  256  		if (node_state(nid, N_MEMORY)) {
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  257  			for (i = 0; i < vm_area->nr_pages; i++) {
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20 @258  				struct page *page = vm_area->pages[i];
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  259  				if (page_to_nid(page) != nid)
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  260  					return false;
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  261  			}
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  262  		}
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  263  
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  264  		for (i = 0; i < NR_CACHED_STACKS; i++) {
47e39c793367600 Uros Bizjak               2024-05-23  265  			struct vm_struct *tmp = NULL;
47e39c793367600 Uros Bizjak               2024-05-23  266  
449e0b4ed5a16c7 Pasha Tatashin            2025-05-09  267  			if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm_area))
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  268  				return true;
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  269  		}
262ef8e55b7ccd4 Mateusz Guzik             2025-11-20  270  	}
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  271  	return false;
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  272  }
e540bf3162e822d Sebastian Andrzej Siewior 2022-02-17  273  
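
Editorial note on the warning above: the v4 diff defines the CONFIG_NUMA=n stubs as macros, "#define page_to_nid(page) (0)", while the commit message describes static inline stubs. A macro of that form discards its argument during preprocessing, so a local such as 'page' at line 258 is never referenced and -Wunused-variable fires. The userspace sketch below shows two stub shapes that keep the argument in use. All demo_ names are hypothetical, and this is not a proposed kernel change.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct demo_page {
	unsigned long flags;
};

/*
 * Option A: a static inline stub. Passing a variable to a function counts
 * as a use of that variable, so callers see no unused-variable warning.
 */
static inline int demo_page_to_nid(const struct demo_page *page)
{
	(void)page;	/* unused on purpose: there is only one node */
	return 0;
}

/* Option B: keep a macro, but still evaluate the argument. */
#define demo_page_to_nid_macro(page) ((void)(page), 0)

/* Same loop shape as try_release_thread_stack_to_cache() in the report. */
static inline int demo_all_pages_on_node(struct demo_page **pages,
					 size_t nr_pages, int nid)
{
	for (size_t i = 0; i < nr_pages; i++) {
		struct demo_page *page = pages[i];

		if (demo_page_to_nid(page) != nid)
			return 0;
	}
	return 1;
}
```

Either shape leaves the caller in kernel/fork.c untouched. Option A also type-checks the argument, which the macro forms do not.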

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


