public inbox for linux-kernel@vger.kernel.org
* [BUG] ext4: KCSAN: lockless i_es_all_nr reads in es_shrinker_info
@ 2026-04-23 17:37 Shuhao Fu
From: Shuhao Fu @ 2026-04-23 17:37 UTC (permalink / raw)
  To: Theodore Ts'o, linux-ext4; +Cc: linux-kernel

Hi,

Reading /proc/fs/ext4/<sb>/es_shrinker_info can overlap with extent-status
updates and trigger KCSAN reports on the per-inode ES counters (I saw this on
i_es_all_nr; i_es_shk_nr is read the same way in this proc path). As far as I
can tell, the user-visible impact is limited to stale or inconsistent procfs
stats output; I have no evidence of corruption or a crash from this path.

I reproduced this on a local KCSAN-instrumented tree based on linux commit
d8a9a4b11a13, using an x86_64 QEMU workload with userspace reader/writer loops.
To increase the race window, I added small debug-only hooks in my local tree:
after the writer updates the counter, it briefly delays and records which inode
it just touched; the proc reader then samples that inode's counters during the
s_es_list walk. I also wrapped the i_es_all_nr load in a local helper
ext4_es_shrinker_read_all_nr() so the read-side stack has a stable symbol;
upstream reads happen directly in ext4_seq_es_shrinker_info_show().

With that setup, KCSAN prints the following summary line (naming the two
racing functions):

  BUG: KCSAN: data-race in ext4_es_init_extent / ext4_es_shrinker_read_all_nr

The first clean hit in my local log was:

  read to 0xffff917cc15222c8 of 4 bytes by task 107 on cpu 0:
   ext4_es_shrinker_read_all_nr+0x26/0x50
   ext4_es_kcsan_probe_hot_inode+0x2b9/0x400
   ext4_seq_es_shrinker_info_show+0x9b/0xd40
   ...
   __x64_sys_sendfile64+0xc2/0x100
   do_syscall_64+0x13f/0x3c0

  write (reordered) to 0xffff917cc15222c8 of 4 bytes by task 108 on cpu 2:
   ext4_es_init_extent+0x6aa/0xa00
   __es_insert_extent+0x477/0xaa0
   ...
   ext4_do_fallocate+0x127/0x310
   __x64_sys_fallocate+0x75/0xb0

I then saw the same pair again later in the same run (for example around log
timestamps 129.529391 and 129.579938), still on the same 4-byte address.

It looks like i_es_all_nr and i_es_shk_nr are documented as protected by
i_es_lock, and writers update them under i_es_lock, but
ext4_seq_es_shrinker_info_show() reads them while walking the list under
s_es_lock (the list lock), not i_es_lock.
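
To make the lock mismatch concrete, here is a minimal userspace model of it.
This is only a sketch and not the ext4 code: every model_* name is
hypothetical, pthread mutexes stand in for i_es_lock and s_es_lock, and the
counter is a relaxed C11 atomic purely so that the userspace demo has defined
behaviour (the kernel fields are plain integers, which is what KCSAN flags).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the ES fields of struct ext4_inode_info. */
struct model_inode {
	pthread_mutex_t es_lock;   /* models i_es_lock */
	atomic_uint es_all_nr;     /* models i_es_all_nr */
	struct model_inode *next;  /* models the i_es_list linkage */
};

/* Hypothetical stand-in for the ES list and its lock in the sb info. */
struct model_sb {
	pthread_mutex_t list_lock; /* models s_es_lock */
	struct model_inode *head;  /* models s_es_list */
};

static void model_sb_init(struct model_sb *sb)
{
	pthread_mutex_init(&sb->list_lock, NULL);
	sb->head = NULL;
}

static void model_inode_init(struct model_inode *mi)
{
	pthread_mutex_init(&mi->es_lock, NULL);
	atomic_init(&mi->es_all_nr, 0);
	mi->next = NULL;
}

/* List membership changes are serialized by the list lock. */
static void model_sb_add(struct model_sb *sb, struct model_inode *mi)
{
	pthread_mutex_lock(&sb->list_lock);
	mi->next = sb->head;
	sb->head = mi;
	pthread_mutex_unlock(&sb->list_lock);
}

/* Writer side: the counter only changes while the per-inode lock is held. */
static void model_es_insert(struct model_inode *mi)
{
	unsigned int nr;

	pthread_mutex_lock(&mi->es_lock);
	nr = atomic_load_explicit(&mi->es_all_nr, memory_order_relaxed);
	atomic_store_explicit(&mi->es_all_nr, nr + 1, memory_order_relaxed);
	pthread_mutex_unlock(&mi->es_lock);
}

/*
 * Reader side: only the list lock is held, never es_lock, so each counter
 * load can overlap a writer. Returns the sum of the sampled counters and
 * stores the number of inodes walked in *inode_cnt.
 */
static unsigned int model_stats_walk(struct model_sb *sb,
				     unsigned int *inode_cnt)
{
	struct model_inode *mi;
	unsigned int total = 0;

	assert(inode_cnt != NULL);
	*inode_cnt = 0;
	pthread_mutex_lock(&sb->list_lock);
	for (mi = sb->head; mi; mi = mi->next) {
		(*inode_cnt)++;
		total += atomic_load_explicit(&mi->es_all_nr,
					      memory_order_relaxed);
	}
	pthread_mutex_unlock(&sb->list_lock);
	return total;
}
```

The list lock keeps the walk itself safe, but it says nothing about the
counters, so the values the reader sums are only a snapshot that may already
be stale by the time they are printed.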

The reproducer shape from normal userspace APIs is one reader loop running
cat /proc/fs/ext4/<sb>/es_shrinker_info while a writer loop runs fallocate,
buffered writes, punch-hole, and truncate on the same filesystem.
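
For reference, the two loops could be sketched in C roughly as follows. This
is not the exact reproducer I ran: the function names, offsets, and sizes are
made up for illustration, and the paths are supplied by the caller. Individual
operation failures are counted rather than treated as fatal, since the goal is
only to keep extent-status updates flowing.

```c
#define _GNU_SOURCE            /* for fallocate() and FALLOC_FL_* */
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reader loop: read the whole stats file repeatedly, as `cat` would.
 * Returns the number of passes that reached EOF without an error.
 */
static int es_reader_loop(const char *proc_path, int iterations)
{
	char buf[4096];
	int ok = 0;

	for (int i = 0; i < iterations; i++) {
		ssize_t n;
		int fd = open(proc_path, O_RDONLY);

		if (fd < 0)
			continue;
		do {
			n = read(fd, buf, sizeof(buf));
		} while (n > 0);
		close(fd);
		if (n == 0)
			ok++;
	}
	return ok;
}

/*
 * Writer loop: fallocate, buffered write, punch-hole, truncate on one file.
 * Returns the number of passes in which the file could be opened; if
 * failed_ops is non-NULL it receives the count of individual failed calls.
 */
static int es_writer_loop(const char *file_path, int iterations,
			  int *failed_ops)
{
	char block[4096];
	int opened = 0, failed = 0;

	memset(block, 'x', sizeof(block));
	for (int i = 0; i < iterations; i++) {
		int fd = open(file_path, O_CREAT | O_RDWR, 0600);

		if (fd < 0)
			continue;
		opened++;
		/* Preallocate 1 MiB (illustrative size). */
		if (fallocate(fd, 0, 0, 1 << 20) != 0)
			failed++;
		/* Buffered write into the preallocated range. */
		if (pwrite(fd, block, sizeof(block), 64 * 1024) < 0)
			failed++;
		/* Punch a hole that covers the block just written. */
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      4096, 128 * 1024) != 0)
			failed++;
		/* Drop everything so the next pass starts from scratch. */
		if (ftruncate(fd, 0) != 0)
			failed++;
		close(fd);
	}
	if (failed_ops)
		*failed_ops = failed;
	return opened;
}
```

The two functions are meant to run concurrently from separate processes or
threads, with the reader pointed at /proc/fs/ext4/<sb>/es_shrinker_info and
the writer at a file on the same ext4 filesystem.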

Since this appears to be an observational procfs stats path, would you prefer
marking these loads with data_race(...) so the intentionally approximate reads
are explicit and this path stops generating repeated KCSAN warnings?

The rough change I had in mind is:

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index ... .. ...
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@
 int ext4_seq_es_shrinker_info_show(struct seq_file *seq, void *v)
 {
 	...
 	list_for_each_entry(ei, &sbi->s_es_list, i_es_list) {
 		inode_cnt++;
 		ei_all_nr = data_race(ei->i_es_all_nr);
 		ei_shk_nr = data_race(ei->i_es_shk_nr);
 		...
 	}

If this direction is preferred, I can send a formal patch.

Thanks,
Shuhao
