public inbox for linux-fsdevel@vger.kernel.org
* KCSAN: data-race in path_lookupat / vfs_rename
@ 2026-03-11  2:54     ` Jianzhou Zhao
  2026-03-11 10:38       ` KCSAN: data-race in __d_drop / retain_dentry Christian Brauner
  0 siblings, 1 reply; 6+ messages in thread
From: Jianzhou Zhao @ 2026-03-11  2:54 UTC
  To: jack, viro, brauner, linux-fsdevel, linux-kernel

Subject: [BUG] fs/namei: KCSAN: data-race in path_lookupat / vfs_rename (`d_flags`)

Dear VFS Maintainers,

Our fuzzing tool, RacePilot, has detected a data race in the VFS subsystem on a 6.18 kernel (`6.18.0-08691-g2061f18ad76e-dirty`).

The race occurs between `vfs_rename()` modifying a victim dentry's `d_flags` via `dont_mount()` and a concurrent path lookup (`path_lookupat()`) executing lockless reads on the same `d_flags` via `d_managed()`.

### Call Trace & Context

```
BUG: KCSAN: data-race in path_lookupat / vfs_rename

write to 0xffff88802d62b180 of 4 bytes by task 4204 on cpu 0:
 dont_mount include/linux/dcache.h:391 [inline]
 vfs_rename+0xf03/0x13a0 fs/namei.c:5963
 do_renameat2+0x6a9/0x8e0 fs/namei.c:6072
 ...

read to 0xffff88802d62b180 of 4 bytes by task 4691 on cpu 1:
 d_managed include/linux/dcache.h:412 [inline]
 step_into fs/namei.c:2102 [inline]
 walk_component fs/namei.c:2258 [inline]
 lookup_last fs/namei.c:2753 [inline]
 path_lookupat+0x1fc/0x740 fs/namei.c:2777
 ...

value changed: 0x00300080 -> 0x00004180
```

#### Execution Flow & Code Context:

**1. The Write Path (`vfs_rename` -> `dont_mount`)**

When a rename replaces an existing target (the victim), `vfs_rename()` marks the victim dentry with `dont_mount()` and then detaches any mounts attached to it:
```c
// fs/namei.c, vfs_rename()
	if (!(flags & RENAME_EXCHANGE) && target) {
		if (is_dir) {
			shrink_dcache_parent(new_dentry);
			target->i_flags |= S_DEAD;
		}
		dont_mount(new_dentry); // <--- Invokes dont_mount on the victim
		detach_mounts(new_dentry);
	}
```

Inside `dont_mount(struct dentry *dentry)` (`include/linux/dcache.h`), `d_flags` is updated while holding the `d_lock`:
```c
static inline void dont_mount(struct dentry *dentry)
{
	spin_lock(&dentry->d_lock);
	dentry->d_flags |= DCACHE_CANT_MOUNT; // <--- Read-Modify-Write
	spin_unlock(&dentry->d_lock);
}
```

**2. The Read Path (`path_lookupat` -> `step_into` -> `d_managed`)**

Concurrently, another thread performing a path lookup reaches the same dentry (the rename victim) as the last path component and inspects it in `step_into()`. To decide whether the dentry needs mount-point handling, it calls `d_managed()`:
```c
static inline bool d_managed(const struct dentry *dentry)
{
	return dentry->d_flags & DCACHE_MANAGED_DENTRY; // <--- Lockless Read
}
```

### Root Cause Analysis

The data race is triggered because `d_managed()` performs a plain read of `dentry->d_flags` without holding `d_lock` (and without `READ_ONCE()`), while `dont_mount()` concurrently modifies `dentry->d_flags` with a plain read-modify-write (`|= DCACHE_CANT_MOUNT`).

Although `dont_mount()` performs the write under `d_lock`, the lock only serializes writers that also take it; it provides no synchronization with a reader that never acquires it. KCSAN therefore reports the unmarked load in `d_managed()` and the unmarked store in `dont_mount()` as conflicting concurrent accesses to the same 4-byte word.

Regrettably, we were unable to create a reproduction program for this bug.

### Potential Impact

Because `d_flags` is an aligned 32-bit word, a torn read is highly unlikely on modern architectures. Formally, however, unmarked concurrent accesses constitute a data race, which is undefined behavior under the C memory model, and the compiler is in principle free to refetch or otherwise transform the load. In practice, the store in `dont_mount()` only sets `DCACHE_CANT_MOUNT`, which is not one of the bits in `DCACHE_MANAGED_DENTRY` (`DCACHE_MOUNTED`, `DCACHE_NEED_AUTOMOUNT`, `DCACHE_MANAGE_TRANSIT`), so this particular write cannot change the result of `d_managed()`. Moreover, since the victim dentry is concurrently being renamed over, a stale view of its flags would most likely be caught by the subsequent validation in `fs/namei.c`, sending the walk to a slower path. We therefore rate this as a minor data race whose main costs are formal undefined behavior and KCSAN warnings.

### Proposed Fix

To silence the KCSAN warning and follow kernel conventions for lockless readers, the read of `d_flags` in `d_managed()` should be annotated, either with `data_race()` or, more robustly, with `READ_ONCE()`. `READ_ONCE()` forces a single load of the whole word, which prevents the compiler from tearing, refetching, or fusing the access; `data_race()` only tells KCSAN that the race is intentional and does not constrain the compiler in the same way.

```diff
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -411,5 +411,5 @@ extern void dput(struct dentry *);
 
 static inline bool d_managed(const struct dentry *dentry)
 {
-	return dentry->d_flags & DCACHE_MANAGED_DENTRY;
+	return READ_ONCE(dentry->d_flags) & DCACHE_MANAGED_DENTRY;
 }
```
(Alternatively, `data_race()` could be used if load tearing of `d_flags` is not a concern on any supported architecture; the choice depends on the VFS conventions for lockless `d_flags` readers.)

We would be highly honored if this could be of any help.

Best regards,
RacePilot Team


* KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode
@ 2026-03-11  7:49   ` Jianzhou Zhao
  2026-03-11  2:54     ` KCSAN: data-race in path_lookupat / vfs_rename Jianzhou Zhao
  2026-03-17 12:37     ` KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode Jan Kara
  0 siblings, 2 replies; 6+ messages in thread
From: Jianzhou Zhao @ 2026-03-11  7:49 UTC
  To: linux-kernel, linux-fsdevel, jack, brauner, viro



Subject: [BUG] fs/buffer: KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode

Dear Maintainers,

We are writing to report a KCSAN-detected data race in `fs/buffer.c`, found by our custom fuzzing tool, RacePilot. The race occurs when `__remove_assoc_queue()` clears `bh->b_assoc_map` while `mark_buffer_dirty_inode()` performs a lockless, speculative read of the same field before deciding whether to take `i_private_lock`. We observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context
==================================================================
BUG: KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode

write to 0xffff88802a6cc1f8 of 8 bytes by task 25093 on cpu 1:
 __remove_assoc_queue+0xae/0xd0 fs/buffer.c:524
 fsync_buffers_list+0x183/0x750 fs/buffer.c:823
 sync_mapping_buffers+0x59/0x90 fs/buffer.c:585
 fat_file_fsync+0xbb/0x100 fs/fat/file.c:195
 vfs_fsync_range+0xe8/0x170 fs/sync.c:197
 generic_write_sync include/linux/fs.h:2630 [inline]
 generic_file_write_iter+0x1ee/0x210 mm/filemap.c:4494
 new_sync_write fs/read_write.c:605 [inline]
 vfs_write+0x78f/0x910 fs/read_write.c:701
 ksys_write+0xbe/0x190 fs/read_write.c:753
 __do_sys_write fs/read_write.c:764 [inline]
 __se_sys_write fs/read_write.c:761 [inline]
 __x64_sys_write+0x41/0x50 fs/read_write.c:761
 x64_sys_call+0x1022/0x2030 arch/x86/include/generated/asm/syscalls_64.h:2
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xae/0x2c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff88802a6cc1f8 of 8 bytes by task 25074 on cpu 0:
 mark_buffer_dirty_inode+0x9c/0x250 fs/buffer.c:711
 fat_mirror_bhs+0x280/0x3b0 fs/fat/fatent.c:417
 fat_alloc_clusters+0xaed/0xb90 fs/fat/fatent.c:568
 fat_add_cluster+0x34/0xc0 fs/fat/inode.c:111
 __fat_get_block fs/fat/inode.c:159 [inline]
 fat_get_block+0x3c4/0x550 fs/fat/inode.c:194
 __block_write_begin_int+0x29e/0xcd0 fs/buffer.c:2186
 block_write_begin+0x74/0xf0 fs/buffer.c:2297
 cont_write_begin+0x402/0x5d0 fs/buffer.c:2635
 fat_write_begin+0x4f/0xe0 fs/fat/inode.c:233
 generic_perform_write+0x13c/0x4c0 mm/filemap.c:4341
 __generic_file_write_iter+0x117/0x130 mm/filemap.c:4464
 generic_file_write_iter+0xa5/0x210 mm/filemap.c:4490
 new_sync_write fs/read_write.c:605 [inline]
 vfs_write+0x78f/0x910 fs/read_write.c:701
 ksys_write+0xbe/0x190 fs/read_write.c:753
 __do_sys_write fs/read_write.c:764 [inline]
 __se_sys_write fs/read_write.c:761 [inline]
 __x64_sys_write+0x41/0x50 fs/read_write.c:761
 x64_sys_call+0x1022/0x2030 arch/x86/include/generated/asm/syscalls_64.h:2
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xae/0x2c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0xffff88802a5fa008 -> 0x0000000000000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 25074 Comm: syz.2.1198 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #44 PREEMPT(voluntary) 
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================

Execution Flow & Code Context
During an fsync of an inode's associated buffers (e.g. from `fat_file_fsync()`), `fsync_buffers_list()` walks the inode's private buffer list. For each buffer it takes off the list, it calls `__remove_assoc_queue()`, which clears `bh->b_assoc_map` with a plain C store while holding `buffer_mapping->i_private_lock`:
```c
// fs/buffer.c
static void __remove_assoc_queue(struct buffer_head *bh)
{
	list_del_init(&bh->b_assoc_buffers);
	WARN_ON(!bh->b_assoc_map);
	bh->b_assoc_map = NULL; // <-- Plain concurrent write
}
```

Meanwhile, another task dirtying a buffer calls `mark_buffer_dirty_inode()`. This function first checks, without any lock, whether the buffer head is already associated with a mapping by reading `bh->b_assoc_map`. Only if the field is `NULL` does it take `buffer_mapping->i_private_lock`, move the buffer onto `mapping->i_private_list`, and set `b_assoc_map`:
```c
// fs/buffer.c
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
	...
	if (!bh->b_assoc_map) { // <-- Lockless plain concurrent read
		spin_lock(&buffer_mapping->i_private_lock);
		list_move_tail(&bh->b_assoc_buffers, &mapping->i_private_list);
		bh->b_assoc_map = mapping;
		spin_unlock(&buffer_mapping->i_private_lock);
	}
}
```
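The check-then-lock pattern above can be modelled in userspace to make its two paths explicit: a `NULL` map takes the lock and (re)associates the buffer, a non-`NULL` map skips both. This is a hypothetical sketch: `struct sk_bh`, `struct sk_mapping`, the small list helpers, and the `atomic_flag` spinlock stand in for `struct buffer_head`, `struct address_space`, `<linux/list.h>`, and `i_private_lock`, and the sketch merges `buffer_mapping` (the lock owner) and `mapping` (the list owner) into one object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sk_list { struct sk_list *next, *prev; };

static void sk_list_init(struct sk_list *l) { l->next = l->prev = l; }

/* Unlink e and leave it pointing at itself (like list_del_init()). */
static void sk_list_del_init(struct sk_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	sk_list_init(e);
}

/* Move e to the tail of head's list (like list_move_tail()). */
static void sk_list_move_tail(struct sk_list *e, struct sk_list *head)
{
	sk_list_del_init(e);
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

struct sk_mapping {
	atomic_flag lock;             /* stands in for i_private_lock */
	struct sk_list private_list;  /* stands in for i_private_list */
};

struct sk_bh {
	struct sk_list assoc;          /* stands in for b_assoc_buffers */
	struct sk_mapping *assoc_map;  /* stands in for b_assoc_map */
};

static void sk_lock(struct sk_mapping *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;
}

static void sk_unlock(struct sk_mapping *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Like __remove_assoc_queue(): the caller holds m->lock. */
static void sk_remove_assoc(struct sk_bh *bh)
{
	sk_list_del_init(&bh->assoc);
	assert(bh->assoc_map != NULL);
	bh->assoc_map = NULL;
}

/* Like the association step of mark_buffer_dirty_inode(). Returns 1 if the
 * lock was taken. In the kernel, the unlocked read below is the racy access. */
static int sk_mark_dirty(struct sk_bh *bh, struct sk_mapping *m)
{
	if (bh->assoc_map != NULL)
		return 0;
	sk_lock(m);
	sk_list_move_tail(&bh->assoc, &m->private_list);
	bh->assoc_map = m;
	sk_unlock(m);
	return 1;
}
```

Because `sk_list_move_tail()` first unlinks the entry, taking the locked path for a buffer that is already on a list moves it rather than inserting it twice.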

Root Cause Analysis
KCSAN reports a read/write data race because `__remove_assoc_queue()` writes `bh->b_assoc_map` under `i_private_lock` with a plain (unmarked) store, while `mark_buffer_dirty_inode()` concurrently evaluates `if (!bh->b_assoc_map)` outside the lock. The unlocked check is a deliberate optimization that avoids taking the spinlock for a buffer that is already associated.
Unfortunately, we were unable to generate a reproducer for this bug.

Potential Impact
From a control-flow perspective, this data race is largely benign. If the unlocked read sees `NULL`, the function takes the lock and associates the buffer; if it sees a stale non-`NULL` pointer, it skips both the lock and the list placement. We have not identified a concrete failure; only under load tearing or aggressive compiler transformations do we see a theoretical risk of inconsistent buffer associations, which could in turn cause buffers to be missed by `fsync`. In practice, the main cost is KCSAN noise on code that behaves as intended.

Proposed Fix
To tell KCSAN that this speculative read is intentional and expected, we propose wrapping the condition in `mark_buffer_dirty_inode()` with the `data_race()` macro. Note that `data_race()` only suppresses the report; it does not prevent load tearing. Additionally, using `WRITE_ONCE()` in `__remove_assoc_queue()` marks the store and rules out store tearing.

```diff
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -522,7 +522,7 @@ static void __remove_assoc_queue(struct buffer_head *bh)
 {
 	list_del_init(&bh->b_assoc_buffers);
 	WARN_ON(!bh->b_assoc_map);
-	bh->b_assoc_map = NULL;
+	WRITE_ONCE(bh->b_assoc_map, NULL);
 }
 
 int inode_has_buffers(struct inode *inode)
@@ -712,7 +712,7 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
 	} else {
 		BUG_ON(mapping->i_private_data != buffer_mapping);
 	}
-	if (!bh->b_assoc_map) {
+	if (!data_race(bh->b_assoc_map)) {
 		spin_lock(&buffer_mapping->i_private_lock);
 		list_move_tail(&bh->b_assoc_buffers, &mapping->i_private_list);
 		bh->b_assoc_map = mapping;
```

We would be highly honored if this could be of any help.

Best regards,
RacePilot Team


* KCSAN: data-race in step_into_slowpath / vfs_unlink
@ 2026-03-11  7:55 ` Jianzhou Zhao
  2026-03-11  7:49   ` KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode Jianzhou Zhao
  0 siblings, 1 reply; 6+ messages in thread
From: Jianzhou Zhao @ 2026-03-11  7:55 UTC
  To: linux-fsdevel, viro, brauner, jack, linux-kernel



Subject: [BUG] fs: KCSAN: data-race in step_into_slowpath / vfs_unlink

Dear Maintainers,

We are writing to report a KCSAN-detected data race in the VFS (`fs/namei.c` and `include/linux/dcache.h`), found by our custom fuzzing tool, RacePilot. The race concerns the `d_flags` field of a `dentry`: `dont_mount()` modifies the flags while a concurrent path walk reads them locklessly through `d_is_symlink()` inside `step_into_slowpath()`. We observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context
==================================================================
BUG: KCSAN: data-race in step_into_slowpath / vfs_unlink

write to 0xffff8880134699c0 of 4 bytes by task 3611 on cpu 1:
 dont_mount home/kfuzz/linux/include/linux/dcache.h:391 [inline]
 vfs_unlink+0x37d/0x6e0 home/kfuzz/linux/fs/namei.c:5392
 do_unlinkat+0x301/0x4e0 home/kfuzz/linux/fs/namei.c:5460
 __do_sys_unlink home/kfuzz/linux/fs/namei.c:5495 [inline]
 ...

read to 0xffff8880134699c0 of 4 bytes by task 3015 on cpu 0:
 __d_entry_type home/kfuzz/linux/include/linux/dcache.h:425 [inline]
 d_is_symlink home/kfuzz/linux/include/linux/dcache.h:455 [inline]
 step_into_slowpath+0x11d/0x910 home/kfuzz/linux/fs/namei.c:2068
 step_into home/kfuzz/linux/fs/namei.c:2119 [inline]
 walk_component home/kfuzz/linux/fs/namei.c:2258 [inline]
 lookup_last home/kfuzz/linux/fs/namei.c:2753 [inline]
 path_lookupat+0x422/0x740 home/kfuzz/linux/fs/namei.c:2777
 filename_lookup+0x1d3/0x3f0 home/kfuzz/linux/fs/namei.c:2806
 do_readlinkat.part.0+0x5f/0x250 home/kfuzz/linux/fs/stat.c:594
 ...

value changed: 0x00300080 -> 0x00004080

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 3015 Comm: systemd-udevd Not tainted 6.18.0-08691-g2061f18ad76e-dirty #42 PREEMPT(voluntary) 
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================

Execution Flow & Code Context
The race occurs while a process is unlinking a file (`vfs_unlink()`). As part of removing the dentry, `vfs_unlink()` calls `dont_mount()` to set `DCACHE_CANT_MOUNT` on it, which prevents anything from being mounted on the dentry afterwards. The update is made under the dentry's spinlock:
```c
// include/linux/dcache.h
static inline void dont_mount(struct dentry *dentry)
{
	spin_lock(&dentry->d_lock);
	dentry->d_flags |= DCACHE_CANT_MOUNT; // <-- Modifies d_flags
	spin_unlock(&dentry->d_lock);
}
```

Meanwhile, another process resolving a path (e.g., via `readlinkat()`) may pass through `step_into()` into `step_into_slowpath()`. The slow path decides whether it must follow a symbolic link by calling `d_is_symlink()`, which reads `dentry->d_flags` without taking the lock:
```c
// include/linux/dcache.h
static inline unsigned __d_entry_type(const struct dentry *dentry)
{
	return dentry->d_flags & DCACHE_ENTRY_TYPE; // <-- Plain concurrent read
}

static inline bool d_is_symlink(const struct dentry *dentry)
{
	return __d_entry_type(dentry) == DCACHE_SYMLINK_TYPE;
}
```

Root Cause Analysis
KCSAN reports a data race because one thread updates `d_flags` under `d_lock` with a plain read-modify-write (no `WRITE_ONCE()`), while a second thread walking the path reads the same word with a plain load in `__d_entry_type()` (`& DCACHE_ENTRY_TYPE`). Path lookup is largely designed to run without locks (RCU-walk), so other bits of `d_flags` can change under such a reader. KCSAN tracks the whole 4-byte word, so the store that sets `DCACHE_CANT_MOUNT` is reported against a load that only uses the entry-type bits.
Unfortunately, we were unable to generate a reproducer for this bug.

Potential Impact
We believe this data race is functionally benign. The write reported here, from `dont_mount()`, only sets `DCACHE_CANT_MOUNT`, which lies outside the `DCACHE_ENTRY_TYPE` mask read by `__d_entry_type()`, so it cannot by itself change the result of `d_is_symlink()`. Note, however, that the entry-type bits are not immutable over a dentry's lifetime (for example, a negative dentry acquires a type when it is instantiated), and the `value changed` line above shows several bits changing, including high-order ones, so the argument for benignity should not rest on those bits being constant. Finally, the missing annotation leaves the compiler free, in principle, to tear or refetch the load, and the resulting false-positive reports can obscure real concurrency bugs.

Proposed Fix
To document that lockless reads of `d_flags` are intended even while other bits are modified concurrently under `d_lock`, we propose marking the read in `__d_entry_type()` with `data_race()`.

```diff
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -426,7 +426,7 @@ static inline bool d_mountpoint(const struct dentry *dentry)
  */
 static inline unsigned __d_entry_type(const struct dentry *dentry)
 {
-	return dentry->d_flags & DCACHE_ENTRY_TYPE;
+	return data_race(dentry->d_flags) & DCACHE_ENTRY_TYPE;
 }
 
 static inline bool d_is_miss(const struct dentry *dentry)
```

We would be highly honored if this could be of any help.

Best regards,
RacePilot Team


* KCSAN: data-race in __d_drop / retain_dentry
@ 2026-03-11  8:02 Jianzhou Zhao
  2026-03-11  7:55 ` KCSAN: data-race in step_into_slowpath / vfs_unlink Jianzhou Zhao
  0 siblings, 1 reply; 6+ messages in thread
From: Jianzhou Zhao @ 2026-03-11  8:02 UTC
  To: linux-kernel, viro, brauner, jack, linux-fsdevel



Subject: [BUG] fs: KCSAN: data-race in __d_drop / retain_dentry

Dear Maintainers,

We are writing to report a KCSAN-detected data race in the VFS dcache (`fs/dcache.c` and `include/linux/list_bl.h`), found by our custom fuzzing tool, RacePilot. The race occurs when `__d_drop()` unhashes a dentry and clears `dentry->d_hash.pprev` while a lockless RCU reader in `retain_dentry()` calls `hlist_bl_unhashed()` to inspect `pprev`. We observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context
==================================================================
BUG: KCSAN: data-race in __d_drop / retain_dentry

write to 0xffff888013db3010 of 8 bytes by task 3021 on cpu 1:
 __d_drop fs/dcache.c:607 [inline]
 __d_drop+0x8c/0xd0 fs/dcache.c:601
 __dentry_kill+0xbd/0x3e0 fs/dcache.c:715
 dput fs/dcache.c:977 [inline]
 dput+0x123/0x220 fs/dcache.c:964
 handle_mounts fs/namei.c:1722 [inline]
 step_into_slowpath+0x688/0x960 fs/namei.c:2081
 ...
 __x64_sys_readlinkat+0x6f/0xa0 fs/stat.c:624

read to 0xffff888013db3010 of 8 bytes by task 4584 on cpu 0:
 hlist_bl_unhashed include/linux/list_bl.h:57 [inline]
 d_unhashed include/linux/dcache.h:374 [inline]
 retain_dentry+0x79/0x320 fs/dcache.c:809
 fast_dput fs/dcache.c:913 [inline]
 dput+0x97/0x220 fs/dcache.c:971
 end_dirop fs/namei.c:2939 [inline]
 do_unlinkat+0x332/0x540 fs/namei.c:5483
 ...
 __x64_sys_unlink+0x7d/0xa0 fs/namei.c:5513

value changed: 0xffff88807dbb55f8 -> 0x0000000000000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 4584 Comm: systemd-udevd Not tainted 6.18.0-08691-g2061f18ad76e-dirty #50 PREEMPT(voluntary) 
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================

Execution Flow & Code Context
When a dentry's last reference is dropped (e.g., in `dput()`), its teardown can go through `__dentry_kill()` and then `__d_drop()`. There, the VFS removes the dentry from the hash chain and sets the list node's `pprev` pointer to `NULL`, which is what marks the dentry as unhashed:
```c
// fs/dcache.c
void __d_drop(struct dentry *dentry)
{
	if (!d_unhashed(dentry)) {
		___d_drop(dentry);
		...
		dentry->d_hash.pprev = NULL; // <-- Plain concurrent write
		write_seqcount_invalidate(&dentry->d_seq);
	}
}
```

Meanwhile, another thread dropping a reference on the lockless fast path (`fast_dput()`, reached here from `do_unlinkat()`) calls `retain_dentry()` to decide whether the now-unreferenced dentry should stay in the cache. `retain_dentry()` checks `d_unhashed()`, which is implemented with `hlist_bl_unhashed()`:
```c
// include/linux/list_bl.h
static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
{
	return !h->pprev; // <-- Plain concurrent read
}
```
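The unhash-then-test sequence can be sketched in userspace with the annotations proposed in this report. The `sk_*` names and the simplified volatile-cast `SK_READ_ONCE()`/`SK_WRITE_ONCE()` macros are hypothetical stand-ins (GCC/Clang `__typeof__` assumed), and unlike the kernel's `hlist_bl`, this sketch does not use the low bit of the head pointer as a lock. The point is that "unhashed" is encoded solely as `pprev == NULL`, so the single pointer store at the end of `sk_drop()` is exactly what a lockless `sk_unhashed()` observes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define SK_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct sk_node { struct sk_node *next, **pprev; };
struct sk_head { struct sk_node *first; };

/* Insert n at the front of h (like hlist_bl_add_head(), minus the bit lock). */
static void sk_add_head(struct sk_node *n, struct sk_head *h)
{
	n->next = h->first;
	if (n->next)
		n->next->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n, then mark it unhashed, mirroring ___d_drop() + "pprev = NULL". */
static void sk_drop(struct sk_node *n)
{
	assert(n->pprev != NULL);
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	SK_WRITE_ONCE(n->pprev, NULL);   /* marked store, as proposed */
}

/* Like hlist_bl_unhashed() with the proposed READ_ONCE(). */
static bool sk_unhashed(const struct sk_node *n)
{
	return !SK_READ_ONCE(n->pprev);  /* marked load, as proposed */
}
```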

Root Cause Analysis
KCSAN reports a data race because `__d_drop()` clears `dentry->d_hash.pprev` with a plain store, while `retain_dentry()` reads `pprev` without holding the dentry's lock. The lockless check is intentional: when the lockless `retain_dentry()` call returns false (for example, because the dentry looks unhashed), `fast_dput()` falls through to a locked re-check (`locked: if (dentry->d_lockref.count || retain_dentry(dentry, true))`). Even so, reading and writing the 8-byte `pprev` pointer without `READ_ONCE()`/`WRITE_ONCE()` is formally a data race under the Linux-kernel memory model and leaves the compiler free, in principle, to tear or refetch the accesses.
Unfortunately, we were unable to generate a reproducer for this bug.

Potential Impact
In theory, this race matters only on architectures or with compilers that tear the load or store, producing a garbled non-`NULL` `pprev` value. Functionally, the impact is minimal: if the lockless `retain_dentry()` wrongly concludes that the dentry is unhashed, it returns false and `fast_dput()` repeats the check under `d_lock`, and we have not identified a direct vulnerability. The race does, however, generate false-positive reports that can obscure more relevant memory-safety bugs.

Proposed Fix
To document the lockless use of `hlist_bl_unhashed()` in the VFS and to silence KCSAN, we propose marking the store in `__d_drop()` with `WRITE_ONCE()` and the load in `hlist_bl_unhashed()` with `READ_ONCE()`.

```diff
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -604,7 +604,7 @@ void __d_drop(struct dentry *dentry)
 		___d_drop(dentry);
 		__sanitizer_obj_cov_trace_pc(268);
 		__sanitizer_obj_cov_trace_pc(436);
-		dentry->d_hash.pprev = NULL;
+		WRITE_ONCE(dentry->d_hash.pprev, NULL);
 		write_seqcount_invalidate(&dentry->d_seq);
 	}
 }
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -54,7 +54,7 @@ static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
 	__sanitizer_obj_cov_trace_pc(386);
 	__sanitizer_obj_cov_trace_pc(1106);
 	__sanitizer_obj_cov_trace_pc(1107);
-	return !h->pprev;
+	return !READ_ONCE(h->pprev);
 }
 
 static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h)
```

We would be highly honored if this could be of any help.

Best regards,
RacePilot Team


* Re: KCSAN: data-race in __d_drop / retain_dentry
  2026-03-11  2:54     ` KCSAN: data-race in path_lookupat / vfs_rename Jianzhou Zhao
@ 2026-03-11 10:38       ` Christian Brauner
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Brauner @ 2026-03-11 10:38 UTC
  To: Jianzhou Zhao; +Cc: linux-kernel, viro, jack, linux-fsdevel

On Wed, Mar 11, 2026 at 04:02:41PM +0800, Jianzhou Zhao wrote:
> 
> 
> Subject: [BUG] fs: KCSAN: data-race in __d_drop / retain_dentry
> Subject: [BUG] fs: KCSAN: data-race in step_into_slowpath / vfs_unlink
> Subject: [BUG] fs/buffer: KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode
> Subject: [BUG] fs/namei: KCSAN: data-race in path_lookupat / vfs_rename (`d_flags`)

Propose a patch series with fixes and appropriate arguments for adding
these annotations, please. All of the races indicated seem benign.


* Re: KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode
  2026-03-11  7:49   ` KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode Jianzhou Zhao
  2026-03-11  2:54     ` KCSAN: data-race in path_lookupat / vfs_rename Jianzhou Zhao
@ 2026-03-17 12:37     ` Jan Kara
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2026-03-17 12:37 UTC
  To: Jianzhou Zhao; +Cc: linux-kernel, linux-fsdevel, jack, brauner, viro

On Wed 11-03-26 15:49:05, Jianzhou Zhao wrote:
> 
> 
> Subject: [BUG] fs/buffer: KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode
> 
> Dear Maintainers,
> 
> We are writing to report a KCSAN-detected data race vulnerability within `fs/buffer.c`. This bug was found by our custom fuzzing tool, RacePilot. The race condition occurs when `__remove_assoc_queue` updates `bh->b_assoc_map` while `mark_buffer_dirty_inode` performs a lockless speculative read on the exact same variable before trying to acquire the associative lock. We observed this bug on the Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.
> 
> Call Trace & Context
> ==================================================================
> BUG: KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode

This one is completely harmless and the code will do the right thing. So I
don't think any fix is needed here. At best we could annotate the read with
data_race to silence KCSAN but I have larger refactoring in progress in
this area so let's bother with such cosmetic things afterwards.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


Thread overview: 6+ messages
2026-03-11  8:02 KCSAN: data-race in __d_drop / retain_dentry Jianzhou Zhao
2026-03-11  7:55 ` KCSAN: data-race in step_into_slowpath / vfs_unlink Jianzhou Zhao
2026-03-11  7:49   ` KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode Jianzhou Zhao
2026-03-11  2:54     ` KCSAN: data-race in path_lookupat / vfs_rename Jianzhou Zhao
2026-03-11 10:38       ` KCSAN: data-race in __d_drop / retain_dentry Christian Brauner
2026-03-17 12:37     ` KCSAN: data-race in __remove_assoc_queue / mark_buffer_dirty_inode Jan Kara
