* [PATCH v1] mm: fix data-race in folio_batch_count()
@ 2026-06-24 9:26 Xuewen Wang
2026-06-24 13:54 ` kernel test robot
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Xuewen Wang @ 2026-06-24 9:26 UTC (permalink / raw)
To: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
Cc: linux-mm, linux-kernel, Xuewen Wang
KCSAN reports:
BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move
write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
folio_batch_add+0x30/0xe0
read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
folio_batch_count+0x0/0x10
cpu_needs_drain+0x253/0x430
The write side is a per-cpu local operation (folio_batch_add on the
CPU that owns the per-cpu batch), while cpu_needs_drain() reads
another CPU's per-cpu batch without locking. Reading a slightly stale
value is harmless -- it only determines whether to schedule a drain,
and a subsequent check will catch it.
Use READ_ONCE() to annotate the read and prevent load tearing, which
also suppresses the KCSAN warning.
Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
---
include/linux/folio_batch.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
index b45946adc50b..1e31e058e19d 100644
--- a/include/linux/folio_batch.h
+++ b/include/linux/folio_batch.h
@@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
{
- return fbatch->nr;
+ return READ_ONCE(fbatch->nr);
}
static inline unsigned int folio_batch_space(const struct folio_batch *fbatch)
--
2.25.1
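
For readers following the thread, here is a minimal userspace sketch of what the one-line change above does to the load. It is not kernel code: READ_ONCE_DEMO is a simplified stand-in for READ_ONCE() (assumed here to be no more than a volatile-qualified load, without the kernel macro's type checks and KCSAN hooks), and struct fbatch_once_demo, count_plain() and count_once() are hypothetical names made up for illustration. It relies on the GCC/Clang __typeof__ extension.

```c
/*
 * Simplified stand-in for READ_ONCE(): load through a volatile-qualified
 * pointer. Assumption: the real kernel macro does more than this.
 */
#define READ_ONCE_DEMO(x) (*(volatile __typeof__(x) *)&(x))

/* Cut-down stand-in for struct folio_batch: only the one-byte counter. */
struct fbatch_once_demo {
	unsigned char nr;
};

/* Shape of the helper before the patch: a plain load. */
static inline unsigned int count_plain(const struct fbatch_once_demo *fbatch)
{
	return fbatch->nr;
}

/* Shape of the helper after the patch: a volatile (marked) load. */
static inline unsigned int count_once(const struct fbatch_once_demo *fbatch)
{
	return READ_ONCE_DEMO(fbatch->nr);
}
```

Both helpers return the same value. The difference is only in what the compiler may do: the plain load can be merged with neighbouring loads or kept in a register across a loop, while the volatile load has to be emitted every time the expression is evaluated. That is the cost every caller of the helper would pay, including callers that only ever look at a batch on their own stack.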
* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
2026-06-24 9:26 [PATCH v1] mm: fix data-race in folio_batch_count() Xuewen Wang
@ 2026-06-24 13:54 ` kernel test robot
2026-06-24 14:23 ` Lorenzo Stoakes
2026-06-24 14:35 ` kernel test robot
2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2026-06-24 13:54 UTC (permalink / raw)
To: Xuewen Wang, akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
Cc: oe-kbuild-all, linux-mm, linux-kernel, Xuewen Wang
Hi Xuewen,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Xuewen-Wang/mm-fix-data-race-in-folio_batch_count/20260624-172724
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20260624092606.1083449-1-wangxuewen%40kylinos.cn
patch subject: [PATCH v1] mm: fix data-race in folio_batch_count()
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20260624/202606242115.DmNPrSkD-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260624/202606242115.DmNPrSkD-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606242115.DmNPrSkD-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:6:
include/linux/folio_batch.h: In function 'folio_batch_count':
>> include/linux/folio_batch.h:56:16: error: implicit declaration of function 'READ_ONCE' [-Wimplicit-function-declaration]
56 | return READ_ONCE(fbatch->nr);
| ^~~~~~~~~
vim +/READ_ONCE +56 include/linux/folio_batch.h
53
54 static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
55 {
> 56 return READ_ONCE(fbatch->nr);
57 }
58
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
2026-06-24 9:26 [PATCH v1] mm: fix data-race in folio_batch_count() Xuewen Wang
2026-06-24 13:54 ` kernel test robot
@ 2026-06-24 14:23 ` Lorenzo Stoakes
2026-06-24 14:38 ` David Hildenbrand (Arm)
2026-06-24 14:35 ` kernel test robot
2 siblings, 1 reply; 5+ messages in thread
From: Lorenzo Stoakes @ 2026-06-24 14:23 UTC (permalink / raw)
To: Xuewen Wang
Cc: akpm, david, liam, vbabka, rppt, surenb, mhocko, linux-mm,
linux-kernel
On Wed, Jun 24, 2026 at 05:26:06PM +0800, Xuewen Wang wrote:
> KCSAN reports:
>
> BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move
Where? A syzbot report? A local run? Please specify.
>
> write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
> folio_batch_add+0x30/0xe0
>
> read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
> folio_batch_count+0x0/0x10
> cpu_needs_drain+0x253/0x430
>
> The write side is a per-cpu local operation (folio_batch_add on the
> CPU that owns the per-cpu batch), while cpu_needs_drain() reads
> another CPU's per-cpu batch without locking. Reading a slightly stale
> value is harmless -- it only determines whether to schedule a drain,
Then why are we adding a READ_ONCE() in such a core helper?
> and a subsequent check will catch it.
Where? Which check? Be specific.
>
> Use READ_ONCE() to annotate the read and prevent load tearing, which
> also suppresses the KCSAN warning.
Tearing on a single byte? Which architecture tears a single byte?
I think you're actually more concerned about the value being optimised out on
assumption of it not being updated elsewhere right?
But actually you're not, because everybody else uses a stack variable or _their
own_ per-CPU value?
Only cpu_needs_drain() is the odd one out right?
>
> Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
> ---
> include/linux/folio_batch.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
> index b45946adc50b..1e31e058e19d 100644
> --- a/include/linux/folio_batch.h
> +++ b/include/linux/folio_batch.h
> @@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
>
> static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
> {
> - return fbatch->nr;
> + return READ_ONCE(fbatch->nr);
This isn't free, you're breaking optimisations here by doing that...
It feels like the wrong level of abstraction, but actually I think every other
case is either stack or per-CPU _on its own CPU_ (please check), in which case
we _can_ suppress the check here but I think best done with data_race().
And see the kernel test robot report, you also need to add:
	#include <linux/compiler.h>
for that.
> }
>
> static inline unsigned int folio_batch_space(const struct folio_batch *fbatch)
> --
> 2.25.1
>
Thanks, Lorenzo
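
One possible reading of the data_race() suggestion above, as a userspace sketch rather than a tested kernel patch. The names data_race_demo, struct fbatch_race_demo, demo_batch_count() and demo_needs_drain() are hypothetical. The stand-in macro assumes that data_race() evaluates its expression as a plain access and only tells KCSAN that the race is intentional; the KCSAN part cannot be reproduced outside the kernel and is omitted.

```c
/*
 * Stand-in for the kernel's data_race(): evaluate the expression as-is.
 * Assumption: the kernel macro additionally marks the access as an
 * intentional race for KCSAN, which is not modelled here.
 */
#define data_race_demo(expr) (expr)

/* Cut-down stand-in for struct folio_batch: only the one-byte counter. */
struct fbatch_race_demo {
	unsigned char nr;
};

/* The helper keeps a plain load, so callers lose no optimisations. */
static inline unsigned int demo_batch_count(const struct fbatch_race_demo *fbatch)
{
	return data_race_demo(fbatch->nr);
}

/*
 * Reduced model of the remote reader: it only asks whether another
 * CPU's batch is non-empty, so a slightly stale count is tolerable.
 */
static inline int demo_needs_drain(const struct fbatch_race_demo *remote)
{
	return demo_batch_count(remote) != 0;
}
```

The design point is that the annotation documents the race without changing code generation, which fits the argument that the remote read in cpu_needs_drain() is the only racy user. Whether the marking belongs in the helper or at that single call site is left open in the thread.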
* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
2026-06-24 14:23 ` Lorenzo Stoakes
@ 2026-06-24 14:38 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-24 14:38 UTC (permalink / raw)
To: Lorenzo Stoakes, Xuewen Wang
Cc: akpm, liam, vbabka, rppt, surenb, mhocko, linux-mm, linux-kernel
On 6/24/26 16:23, Lorenzo Stoakes wrote:
> On Wed, Jun 24, 2026 at 05:26:06PM +0800, Xuewen Wang wrote:
>> KCSAN reports:
>>
>> BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move
>
> Where? A syzbot report? A local run? Please specify.
>
>>
>> write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
>> folio_batch_add+0x30/0xe0
>>
>> read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
>> folio_batch_count+0x0/0x10
>> cpu_needs_drain+0x253/0x430
>>
>> The write side is a per-cpu local operation (folio_batch_add on the
>> CPU that owns the per-cpu batch), while cpu_needs_drain() reads
>> another CPU's per-cpu batch without locking. Reading a slightly stale
>> value is harmless -- it only determines whether to schedule a drain,
>
> Then why are we adding a READ_ONCE() in such a core helper?
>
>> and a subsequent check will catch it.
>
> Where? Which check? Be specific.
>
>>
>> Use READ_ONCE() to annotate the read and prevent load tearing, which
>> also suppresses the KCSAN warning.
>
> Tearing on a single byte? Which architecture tears a single byte?
>
> I think you're actually more concerned about the value being optimised out on
> assumption of it not being updated elsewhere right?
>
> But actually you're not, because everybody else uses a stack variable or _their
> own_ per-CPU value?
>
> Only cpu_needs_drain() is the odd one out right?
>
>>
>> Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
>> ---
>> include/linux/folio_batch.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
>> index b45946adc50b..1e31e058e19d 100644
>> --- a/include/linux/folio_batch.h
>> +++ b/include/linux/folio_batch.h
>> @@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
>>
>> static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
>> {
>> - return fbatch->nr;
>> + return READ_ONCE(fbatch->nr);
>
> This isn't free, you're breaking optimisations here by doing that...
>
> It feels like the wrong level of abstraction, but actually I think every other
> case is either stack or per-CPU _on its own CPU_ (please check), in which case
> we _can_ suppress the check here but I think best done with data_race().
Fully agreed.
--
Cheers,
David
* Re: [PATCH v1] mm: fix data-race in folio_batch_count()
2026-06-24 9:26 [PATCH v1] mm: fix data-race in folio_batch_count() Xuewen Wang
2026-06-24 13:54 ` kernel test robot
2026-06-24 14:23 ` Lorenzo Stoakes
@ 2026-06-24 14:35 ` kernel test robot
2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2026-06-24 14:35 UTC (permalink / raw)
To: Xuewen Wang, akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko
Cc: llvm, oe-kbuild-all, linux-mm, linux-kernel, Xuewen Wang
Hi Xuewen,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Xuewen-Wang/mm-fix-data-race-in-folio_batch_count/20260624-172724
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20260624092606.1083449-1-wangxuewen%40kylinos.cn
patch subject: [PATCH v1] mm: fix data-race in folio_batch_count()
config: i386-defconfig (https://download.01.org/0day-ci/archive/20260624/202606242209.fM2W0efm-lkp@intel.com/config)
compiler: clang version 22.1.3 (https://github.com/llvm/llvm-project e9846648fd6183ee6d8cbdb4502213fcf902a211)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260624/202606242209.fM2W0efm-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606242209.fM2W0efm-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:6:
>> include/linux/folio_batch.h:56:9: error: call to undeclared function 'READ_ONCE'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
56 | return READ_ONCE(fbatch->nr);
| ^
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
In file included from include/linux/shmem_fs.h:6:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:13:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
98 | return (set->sig[3] | set->sig[2] |
| ^ ~
arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
24 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
In file included from include/linux/shmem_fs.h:6:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:13:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
98 | return (set->sig[3] | set->sig[2] |
| ^ ~
arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
24 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
In file included from include/linux/shmem_fs.h:6:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:13:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
114 | return (set1->sig[3] == set2->sig[3]) &&
| ^ ~
arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
24 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
In file included from include/linux/shmem_fs.h:6:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:13:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:114:27: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
114 | return (set1->sig[3] == set2->sig[3]) &&
| ^ ~
arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
24 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
In file included from include/linux/shmem_fs.h:6:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:13:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:115:5: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
115 | (set1->sig[2] == set2->sig[2]) &&
| ^ ~
arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
24 | unsigned long sig[_NSIG_WORDS];
| ^
In file included from drivers/gpu/drm/i915/gem/i915_gem_shmem.c:7:
In file included from include/linux/shmem_fs.h:6:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:17:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:13:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:6:
include/linux/signal.h:115:21: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
115 | (set1->sig[2] == set2->sig[2]) &&
vim +/READ_ONCE +56 include/linux/folio_batch.h
53
54 static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
55 {
> 56 return READ_ONCE(fbatch->nr);
57 }
58
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki