Linux-mm Archive on lore.kernel.org
* [PATCH v1] mm: fix data-race in folio_batch_count()
@ 2026-06-24  9:26 Xuewen Wang
  2026-06-24 13:54 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Xuewen Wang @ 2026-06-24  9:26 UTC
  To: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
  Cc: linux-mm, linux-kernel, Xuewen Wang

KCSAN reports:

BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move

write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
  folio_batch_add+0x30/0xe0

read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
  folio_batch_count+0x0/0x10
  cpu_needs_drain+0x253/0x430

The write side is a per-cpu local operation (folio_batch_add on the
CPU that owns the per-cpu batch), while cpu_needs_drain() reads
another CPU's per-cpu batch without locking. Reading a slightly stale
value is harmless -- it only determines whether to schedule a drain,
and a subsequent check will catch it.

Use READ_ONCE() to annotate the read and prevent load tearing, which
also suppresses the KCSAN warning.

Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
---
 include/linux/folio_batch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
index b45946adc50b..1e31e058e19d 100644
--- a/include/linux/folio_batch.h
+++ b/include/linux/folio_batch.h
@@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
 
 static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
 {
-	return fbatch->nr;
+	return READ_ONCE(fbatch->nr);
 }
 
 static inline unsigned int folio_batch_space(const struct folio_batch *fbatch)
-- 
2.25.1
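
For readers following the patch, here is a minimal userspace sketch of what the annotation changes. READ_ONCE() below is a simplified stand-in for the kernel macro, and struct folio_batch is reduced to the one-byte counter that the KCSAN report points at. Both are assumptions made for illustration, not the kernel definitions, and the two helper names are hypothetical.

```c
/* Userspace sketch only: simplified stand-ins, not the kernel definitions. */

/*
 * Assumption: reduced READ_ONCE(). The kernel macro adds compile-time
 * type checks; the core idea is a single volatile load.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Assumption: only the one-byte counter is modelled here. */
struct folio_batch {
	unsigned char nr;
};

/* Plain load: the compiler is free to merge, cache or repeat it. */
static inline unsigned int folio_batch_count_plain(const struct folio_batch *fbatch)
{
	return fbatch->nr;
}

/* Annotated load: the volatile access is performed once per call. */
static inline unsigned int folio_batch_count_once(const struct folio_batch *fbatch)
{
	return READ_ONCE(fbatch->nr);
}
```

Both variants return the same value in single-threaded use. The difference is only in what the compiler may assume when another CPU writes nr concurrently.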




* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
  2026-06-24  9:26 [PATCH v1] mm: fix data-race in folio_batch_count() Xuewen Wang
@ 2026-06-24 13:54 ` kernel test robot
  2026-06-24 14:23 ` Lorenzo Stoakes
  2026-06-24 14:35 ` kernel test robot
  2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2026-06-24 13:54 UTC
  To: Xuewen Wang, akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
  Cc: oe-kbuild-all, linux-mm, linux-kernel, Xuewen Wang

Hi Xuewen,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Xuewen-Wang/mm-fix-data-race-in-folio_batch_count/20260624-172724
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260624092606.1083449-1-wangxuewen%40kylinos.cn
patch subject: [PATCH v1] mm: fix data-race in folio_batch_count()
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20260624/202606242115.DmNPrSkD-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260624/202606242115.DmNPrSkD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606242115.DmNPrSkD-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:6:
   include/linux/folio_batch.h: In function 'folio_batch_count':
>> include/linux/folio_batch.h:56:16: error: implicit declaration of function 'READ_ONCE' [-Wimplicit-function-declaration]
      56 |         return READ_ONCE(fbatch->nr);
         |                ^~~~~~~~~


vim +/READ_ONCE +56 include/linux/folio_batch.h

    53	
    54	static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
    55	{
  > 56		return READ_ONCE(fbatch->nr);
    57	}
    58	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
  2026-06-24  9:26 [PATCH v1] mm: fix data-race in folio_batch_count() Xuewen Wang
  2026-06-24 13:54 ` kernel test robot
@ 2026-06-24 14:23 ` Lorenzo Stoakes
  2026-06-24 14:38   ` David Hildenbrand (Arm)
  2026-06-24 14:35 ` kernel test robot
  2 siblings, 1 reply; 5+ messages in thread
From: Lorenzo Stoakes @ 2026-06-24 14:23 UTC
  To: Xuewen Wang
  Cc: akpm, david, liam, vbabka, rppt, surenb, mhocko, linux-mm,
	linux-kernel

On Wed, Jun 24, 2026 at 05:26:06PM +0800, Xuewen Wang wrote:
> KCSAN reports:
>
> BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move

Where? A syzbot report? A local run? Please specify.

>
> write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
>   folio_batch_add+0x30/0xe0
>
> read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
>   folio_batch_count+0x0/0x10
>   cpu_needs_drain+0x253/0x430
>
> The write side is a per-cpu local operation (folio_batch_add on the
> CPU that owns the per-cpu batch), while cpu_needs_drain() reads
> another CPU's per-cpu batch without locking. Reading a slightly stale
> value is harmless -- it only determines whether to schedule a drain,

Then why are we adding a READ_ONCE() in such a core helper?

> and a subsequent check will catch it.

Where? Which check? Be specific.

>
> Use READ_ONCE() to annotate the read and prevent load tearing, which
> also suppresses the KCSAN warning.

Tearing on a single byte? Which architecture tears a single byte?

I think you're actually more concerned about the value being optimised out on
assumption of it not being updated elsewhere right?

But actually you're not, because everybody else uses a stack variable or _their
own_ per-CPU value?

Only cpu_needs_drain() is the odd one out right?

>
> Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
> ---
>  include/linux/folio_batch.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
> index b45946adc50b..1e31e058e19d 100644
> --- a/include/linux/folio_batch.h
> +++ b/include/linux/folio_batch.h
> @@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
>
>  static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
>  {
> -	return fbatch->nr;
> +	return READ_ONCE(fbatch->nr);

This isn't free, you're breaking optimisations here by doing that...

It feels like the wrong level of abstraction, but actually I think every other
case is either stack or per-CPU _on its own CPU_ (please check), in which case
we _can_ suppress the check here but I think best done with data_race().

And see the kernel test robot report: to fix the build you also need to add:

#include <linux/compiler.h>

>  }
>
>  static inline unsigned int folio_batch_space(const struct folio_batch *fbatch)
> --
> 2.25.1
>

Thanks, Lorenzo
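
A rough userspace sketch of the alternative suggested above: keep folio_batch_count() as a plain load and annotate only the one cross-CPU reader. The data_race() macro here is a stand-in that merely evaluates its argument (the kernel version additionally tells KCSAN the race is intentional), struct folio_batch is reduced to its counter, and remote_batch_needs_drain() is a hypothetical simplification of the check in cpu_needs_drain(), not the real function.

```c
/* Userspace sketch only: simplified stand-ins, not the kernel definitions. */

/*
 * Assumption: stand-in for the kernel's data_race(). It only evaluates
 * the expression here; in the kernel it also marks the access as an
 * intentional race for KCSAN.
 */
#define data_race(expr) (expr)

/* Assumption: only the one-byte counter is modelled here. */
struct folio_batch {
	unsigned char nr;
};

/* The core helper stays a plain load, so other callers are unaffected. */
static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
{
	return fbatch->nr;
}

/*
 * Hypothetical reduction of the cross-CPU check: a stale answer is
 * tolerated, so only this call site carries the annotation.
 */
static inline int remote_batch_needs_drain(const struct folio_batch *remote)
{
	return data_race(folio_batch_count(remote)) != 0;
}
```

The design point is locality: callers that operate on a stack batch or on their own CPU's batch keep the unannotated load, and the annotation documents the single place where a racy read is accepted.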



* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
  2026-06-24  9:26 [PATCH v1] mm: fix data-race in folio_batch_count() Xuewen Wang
  2026-06-24 13:54 ` kernel test robot
  2026-06-24 14:23 ` Lorenzo Stoakes
@ 2026-06-24 14:35 ` kernel test robot
  2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2026-06-24 14:35 UTC
  To: Xuewen Wang, akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
  Cc: llvm, oe-kbuild-all, linux-mm, linux-kernel, Xuewen Wang

Hi Xuewen,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Xuewen-Wang/mm-fix-data-race-in-folio_batch_count/20260624-172724
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260624092606.1083449-1-wangxuewen%40kylinos.cn
patch subject: [PATCH v1] mm: fix data-race in folio_batch_count()
config: i386-defconfig (https://download.01.org/0day-ci/archive/20260624/202606242209.fM2W0efm-lkp@intel.com/config)
compiler: clang version 22.1.3 (https://github.com/llvm/llvm-project e9846648fd6183ee6d8cbdb4502213fcf902a211)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260624/202606242209.fM2W0efm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606242209.fM2W0efm-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:6:
>> include/linux/folio_batch.h:56:9: error: call to undeclared function 'READ_ONCE'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      56 |         return READ_ONCE(fbatch->nr);
         |                ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
   In file included from include/linux/shmem_fs.h:6:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:5:
   In file included from include/linux/fs/super.h:5:
   In file included from include/linux/fs/super_types.h:13:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
      98 |                 return (set->sig[3] | set->sig[2] |
         |                         ^        ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
   In file included from include/linux/shmem_fs.h:6:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:5:
   In file included from include/linux/fs/super.h:5:
   In file included from include/linux/fs/super_types.h:13:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
      98 |                 return (set->sig[3] | set->sig[2] |
         |                                       ^        ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
   In file included from include/linux/shmem_fs.h:6:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:5:
   In file included from include/linux/fs/super.h:5:
   In file included from include/linux/fs/super_types.h:13:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     114 |                 return  (set1->sig[3] == set2->sig[3]) &&
         |                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
   In file included from include/linux/shmem_fs.h:6:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:5:
   In file included from include/linux/fs/super.h:5:
   In file included from include/linux/fs/super_types.h:13:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:114:27: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     114 |                 return  (set1->sig[3] == set2->sig[3]) &&
         |                                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
   In file included from include/linux/shmem_fs.h:6:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:5:
   In file included from include/linux/fs/super.h:5:
   In file included from include/linux/fs/super_types.h:13:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:115:5: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     115 |                         (set1->sig[2] == set2->sig[2]) &&
         |                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
   In file included from include/linux/shmem_fs.h:6:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:17:
   In file included from include/linux/fs.h:5:
   In file included from include/linux/fs/super.h:5:
   In file included from include/linux/fs/super_types.h:13:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:115:21: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     115 |                         (set1->sig[2] == set2->sig[2]) &&


vim +/READ_ONCE +56 include/linux/folio_batch.h

    53	
    54	static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
    55	{
  > 56		return READ_ONCE(fbatch->nr);
    57	}
    58	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
  2026-06-24 14:23 ` Lorenzo Stoakes
@ 2026-06-24 14:38   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-24 14:38 UTC
  To: Lorenzo Stoakes, Xuewen Wang
  Cc: akpm, liam, vbabka, rppt, surenb, mhocko, linux-mm, linux-kernel

On 6/24/26 16:23, Lorenzo Stoakes wrote:
> On Wed, Jun 24, 2026 at 05:26:06PM +0800, Xuewen Wang wrote:
>> KCSAN reports:
>>
>> BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move
> 
> Where? A syzbot report? A local run? Please specify.
> 
>>
>> write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
>>   folio_batch_add+0x30/0xe0
>>
>> read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
>>   folio_batch_count+0x0/0x10
>>   cpu_needs_drain+0x253/0x430
>>
>> The write side is a per-cpu local operation (folio_batch_add on the
>> CPU that owns the per-cpu batch), while cpu_needs_drain() reads
>> another CPU's per-cpu batch without locking. Reading a slightly stale
>> value is harmless -- it only determines whether to schedule a drain,
> 
> Then why are we adding a READ_ONCE() in such a core helper?
> 
>> and a subsequent check will catch it.
> 
> Where? Which check? Be specific.
> 
>>
>> Use READ_ONCE() to annotate the read and prevent load tearing, which
>> also suppresses the KCSAN warning.
> 
> Tearing on a single byte? Which architecture tears a single byte?
> 
> I think you're actually more concerned about the value being optimised out on
> assumption of it not being updated elsewhere right?
> 
> But actually you're not, because everybody else uses a stack variable or _their
> own_ per-CPU value?
> 
> Only cpu_needs_drain() is the odd one out right?
> 
>>
>> Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
>> ---
>>  include/linux/folio_batch.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
>> index b45946adc50b..1e31e058e19d 100644
>> --- a/include/linux/folio_batch.h
>> +++ b/include/linux/folio_batch.h
>> @@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
>>
>>  static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
>>  {
>> -	return fbatch->nr;
>> +	return READ_ONCE(fbatch->nr);
> 
> This isn't free, you're breaking optimisations here by doing that...
> 
> It feels like the wrong level of abstraction, but actually I think every other
> case is either stack or per-CPU _on its own CPU_ (please check), in which case
> we _can_ suppress the check here but I think best done with data_race().

Fully agreed.

-- 
Cheers,

David


