public inbox for linux-ext4@vger.kernel.org
* BUG: KCSAN: data-race in _copy_to_iter / ext4_generic_delete_entry
@ 2026-03-11  8:04 Jianzhou Zhao
  2026-03-11 14:40 ` Theodore Tso
  0 siblings, 1 reply; 2+ messages in thread
From: Jianzhou Zhao @ 2026-03-11  8:04 UTC
  To: tytso, adilger.kernel, linux-ext4, linux-kernel

Subject: [BUG] ext4: KCSAN: data-race in _copy_to_iter / ext4_generic_delete_entry

Dear Maintainers,

We are writing to report a KCSAN-detected data race between `ext4` and the block device layer, found by our custom fuzzing tool, RacePilot. The race occurs when `ext4_generic_delete_entry()`, during an unlink, updates the `rec_len` of the preceding directory entry with a plain 2-byte store, while another task calls `read()` on the raw block device of the mounted filesystem. That read reaches `_copy_to_iter()`, which bulk-copies pages from the block device's page cache, including the page that holds the directory block ext4 is modifying. We observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context
==================================================================
BUG: KCSAN: data-race in _copy_to_iter / ext4_generic_delete_entry

write to 0xffff888033da2010 of 2 bytes by task 5608 on cpu 0:
 ext4_generic_delete_entry+0x358/0x470 fs/ext4/namei.c:2670
 ext4_delete_entry+0x16d/0x280 fs/ext4/namei.c:2724
 __ext4_unlink+0x504/0x6e0 fs/ext4/namei.c:3263
 ext4_unlink+0x25d/0x280 fs/ext4/namei.c:3312
 vfs_unlink+0x323/0x710 fs/namei.c:5409
 do_unlinkat+0x301/0x540 fs/namei.c:5480
 ...
 __x64_sys_unlink+0x7d/0xa0 fs/namei.c:5513

read to 0xffff888033da2000 of 1377 bytes by task 4793 on cpu 1:
 instrument_copy_to_user include/linux/instrumented.h:113 [inline]
 copy_to_user_iter lib/iov_iter.c:29 [inline]
 iterate_ubuf include/linux/iov_iter.h:31 [inline]
 iterate_and_advance2 include/linux/iov_iter.h:304 [inline]
 iterate_and_advance include/linux/iov_iter.h:332 [inline]
 _copy_to_iter+0x210/0xf10 lib/iov_iter.c:231
 copy_page_to_iter lib/iov_iter.c:412 [inline]
 copy_page_to_iter+0xd1/0x150 lib/iov_iter.c:399
 copy_folio_to_iter include/linux/uio.h:204 [inline]
 filemap_read+0x46e/0x8f0 mm/filemap.c:2899
 blkdev_read_iter+0x114/0x360 block/fops.c:868
 new_sync_read fs/read_write.c:502 [inline]
 vfs_read+0x5c8/0x820 fs/read_write.c:583
 ksys_read+0xbe/0x190 fs/read_write.c:730
 ...
 __x64_sys_read+0x41/0x50 fs/read_write.c:737

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 4793 Comm: systemd-udevd Not tainted 6.18.0-08691-g2061f18ad76e-dirty #50 PREEMPT(voluntary) 
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================

Execution Flow & Code Context
When deleting an entry from a directory block, `ext4_generic_delete_entry()` merges the deleted entry into the preceding directory entry (`pde`) by extending the latter's `rec_len` to cover the deleted record, and then zeroes the deleted record.
```c
// fs/ext4/namei.c (excerpt)
int ext4_generic_delete_entry(...)
{
	...
		if (de == de_del)  {
			if (pde) {
				pde->rec_len = ext4_rec_len_to_disk( // <-- plain (unmarked) 2-byte store
					ext4_rec_len_from_disk(pde->rec_len,
							       blocksize) +
					ext4_rec_len_from_disk(de->rec_len,
							       blocksize),
					blocksize);

				/* wipe entire dir_entry */
				memset(de, 0, ...);
			}
			...
		}
	...
}
```

Meanwhile, another process opens the block device node backing the mount (e.g. `/dev/sda1`) and reads from it; in the report above this is `systemd-udevd` (PID 4793). ext4 reads and modifies directory blocks through buffer heads that live in the block device's own page cache, so the filesystem and the raw device node share the same physical pages. `sys_read` enters `blkdev_read_iter()` and then `filemap_read()`, which copies the cached folio to user space without taking any lock that the ext4 writer holds. As a result, `copy_page_to_iter()` reads the exact bytes that are being modified.

Root Cause Analysis
KCSAN flags this race because the raw block device reader performs an unannotated bulk copy (`_copy_to_iter()`) of the page while the filesystem concurrently modifies a directory entry (`pde->rec_len`) in the same page-cache page. Reading the raw block device underneath a mounted filesystem is inherently racy, and Linux does not give such readers a coherent view of the data. However, because the `rec_len` update is a plain store, the compiler is free to tear it (store tearing), and KCSAN can report it repeatedly on this frequently executed path.
Unfortunately, we were unable to generate a reproducer for this bug.

Potential Impact
We believe this data race is benign. Tools that read the raw device underneath a live filesystem must already accept that the data they copy may be torn or mid-update. ext4 itself does not depend on this read path: its consistency comes from journal commits and directory locking, so the filesystem is not affected. The only practical consequence is that the unannotated store triggers KCSAN reports.

Proposed Fix
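Restricting direct block-device access to mounted filesystems would be the broader architectural change. As a local fix that silences KCSAN on this frequently executed path, we propose marking the record-length update with `WRITE_ONCE()`, so that it is performed as a single volatile store that the compiler may not tear, fuse, or omit. Note that the patch marks only the `rec_len` store; the `memset()` that wipes the deleted entry and the reader-side copy remain plain accesses.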

```diff
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2667,12 +2667,12 @@ int ext4_generic_delete_entry(struct inode *dir,
 			return -EFSCORRUPTED;
 		if (de == de_del)  {
 			if (pde) {
-				pde->rec_len = ext4_rec_len_to_disk(
+				WRITE_ONCE(pde->rec_len, ext4_rec_len_to_disk(
 					ext4_rec_len_from_disk(pde->rec_len,
 							       blocksize) +
 					ext4_rec_len_from_disk(de->rec_len,
 							       blocksize),
-					blocksize);
+					blocksize));
 
 				/* wipe entire dir_entry */
 				memset(de, 0, ext4_rec_len_from_disk(de->rec_len,
```

We hope this report is helpful.

Best regards,
RacePilot Team
