* [PATCH] btrfs: annotate lockless read of defrag_bytes in should_nocow()
@ 2026-04-01 2:21 Cen Zhang
2026-04-21 3:33 ` David Sterba
0 siblings, 1 reply; 2+ messages in thread
From: Cen Zhang @ 2026-04-01 2:21 UTC (permalink / raw)
To: clm, dsterba; +Cc: linux-btrfs, linux-kernel, baijiaju1990, Cen Zhang
should_nocow() reads inode->defrag_bytes without holding inode->lock,
while btrfs_set_delalloc_extent() and btrfs_clear_delalloc_extent()
update it under that spinlock.
This is a data race. The read is a quick check used to decide whether
to fall back to COW for a NOCOW inode: if defrag_bytes is non-zero and
the range is tagged EXTENT_DEFRAG, we force COW so that defragmentation
can rewrite the extent. Reading a stale value is harmless because:
- A missed increment may skip COW once, but the defrag pass will
  redo the extent later.
- A stale non-zero may force an unnecessary COW, which is a minor
  efficiency loss, not a correctness issue.
On 64-bit platforms an aligned u64 load is naturally atomic so tearing
cannot happen. On 32-bit platforms u64 may tear, but we only test for
zero vs non-zero, so the heuristic stays correct regardless.
Add READ_ONCE() to prevent the compiler from caching or splitting the
load and to document the intentional lock-free pattern.
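
The tearing case described above can be made concrete with a small userspace sketch. It is an illustration only, not kernel code: torn_read() is a hypothetical helper that models a 32-bit reader which samples the high word before a concurrent update and the low word after it, and the two values are chosen purely for the example.

```c
#include <stdint.h>

/*
 * Hypothetical model of a torn 64-bit load on a 32-bit machine:
 * the high word is sampled before the writer runs, the low word after.
 * The real order of the two half-loads is up to the compiler.
 */
static uint64_t torn_read(uint64_t old_val, uint64_t new_val)
{
	uint32_t hi = (uint32_t)(old_val >> 32);	/* high half of the old value */
	uint32_t lo = (uint32_t)new_val;		/* low half of the new value */

	return ((uint64_t)hi << 32) | lo;
}

/* The only question should_nocow() asks of the counter. */
static int looks_nonzero(uint64_t seen)
{
	return seen != 0;
}
```

With old_val = 0x00000000ffffffff and new_val = 0x0000000100000000, torn_read() returns 0 even though both values are non-zero, so looks_nonzero() reports 0. In this model a torn read can therefore produce a spurious zero, which behaves like the missed-increment case listed above: COW may be skipped once and a later defrag pass would redo the extent.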
Fixes: 47059d930f0e ("Btrfs: make defragment work with nodatacow option")
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
fs/btrfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a6da98435ef7..afc5d75d2dcb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2420,7 +2420,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 static bool should_nocow(struct btrfs_inode *inode, u64 start, u64 end)
 {
 	if (inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)) {
-		if (inode->defrag_bytes &&
+		if (READ_ONCE(inode->defrag_bytes) &&
 		    btrfs_test_range_bit_exists(&inode->io_tree, start, end, EXTENT_DEFRAG))
 			return false;
 		return true;
--
2.34.1
* Re: [PATCH] btrfs: annotate lockless read of defrag_bytes in should_nocow()
From: David Sterba @ 2026-04-21 3:33 UTC (permalink / raw)
To: Cen Zhang; +Cc: clm, dsterba, linux-btrfs, linux-kernel, baijiaju1990
On Wed, Apr 01, 2026 at 10:21:53AM +0800, Cen Zhang wrote:
> should_nocow() reads inode->defrag_bytes without holding inode->lock,
> while btrfs_set_delalloc_extent() and btrfs_clear_delalloc_extent()
> update it under that spinlock.
>
> This is a data race. The read is a quick check used to decide whether
> to fall back to COW for a NOCOW inode: if defrag_bytes is non-zero and
> the range is tagged EXTENT_DEFRAG, we force COW so that defragmentation
> can rewrite the extent. Reading a stale value is harmless because:
>
> - A missed increment may skip COW once, but the defrag pass will
>   redo the extent later.
> - A stale non-zero may force an unnecessary COW, which is a minor
>   efficiency loss, not a correctness issue.
>
> On 64-bit platforms an aligned u64 load is naturally atomic so tearing
> cannot happen. On 32-bit platforms u64 may tear, but we only test for
> zero vs non-zero, so the heuristic stays correct regardless.
>
> Add READ_ONCE() to prevent the compiler from caching or splitting the
> load and to document the intentional lock-free pattern.
As explained, it's OK to do the unlocked read, but for that
data_race() is more convenient; we do not really need the READ_ONCE
semantics.
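
For comparison, the two annotations discussed in this thread can be sketched side by side in userspace C. Everything here is a stand-in and an assumption: the macros below are simplified mock-ups and not the kernel definitions (the kernel's data_race() marks an intentional race for KCSAN without forcing a volatile access), struct mock_inode and its range_tagged_defrag field are hypothetical replacements for struct btrfs_inode and the btrfs_test_range_bit_exists() check, and the inode is assumed to already have NODATACOW or PREALLOC set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins, NOT the kernel macros (uses the GCC/Clang __typeof__ extension). */
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define data_race(expr)	(expr)	/* mock: only documents the intentional race */

/* Hypothetical reduced inode: just the two inputs the check depends on. */
struct mock_inode {
	uint64_t defrag_bytes;
	bool range_tagged_defrag;	/* stands in for the EXTENT_DEFRAG range test */
};

/* Variant from the posted patch: volatile load of the counter. */
static bool should_nocow_read_once(struct mock_inode *inode)
{
	if (READ_ONCE(inode->defrag_bytes) && inode->range_tagged_defrag)
		return false;	/* force COW so defrag can rewrite the extent */
	return true;
}

/* Variant suggested in the reply: plain load, marked as a known race. */
static bool should_nocow_data_race(struct mock_inode *inode)
{
	if (data_race(inode->defrag_bytes) && inode->range_tagged_defrag)
		return false;
	return true;
}
```

Both variants return the same result for the same inputs; the difference lies only in what the annotation promises about the load, which is the point of the suggestion above.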