From: Sun YangKai <sunk67188@gmail.com>
To: Leo Martins <loemra.dev@gmail.com>, Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: prevent COW amplification during btrfs_search_slot
Date: Fri, 30 Jan 2026 17:37:15 +0800
Message-ID: <e5eee424-303d-423b-aead-2eccbf63b8ec@gmail.com>
In-Reply-To: <df47b1c0-c25e-4501-aaa0-bc73ce1fdc00@gmail.com>
On 2026/1/30 12:14, Sun YangKai wrote:
> On 2026/1/30 08:12, Leo Martins wrote:
>> On Thu, 29 Jan 2026 11:52:07 +0000 Filipe Manana <fdmanana@kernel.org> wrote:
>>> On Tue, Jan 27, 2026 at 8:43 PM Leo Martins <loemra.dev@gmail.com> wrote:
>>>> I've been investigating enospcs at Meta and have observed a strange
>>>> pattern where filesystems are enospcing with lots of unallocated space
>>>> (> 100G). Sample dmesg dump at bottom of message.
>>>>
>>>> btrfs_insert_delayed_dir_index is attempting to migrate some reservation
>>>> from the transaction block reserve and finding it exhausted leading to a
>>>> warning and enospc. This is a bug as the reservations are meant to be
>>>> worst case. It should be impossible to exhaust the transaction block
>>>> reserve.
>>>>
>>>> Some tracing of affected hosts revealed that there were single
>>>> btrfs_search_slot calls that were COWing 100s of times. I was able to
>>>> reproduce this behavior locally by creating a very constrained cgroup
>>>> and producing a lot of concurrent filesystem operations. Here's the
>>>> pattern:
>>>>
>>>> 1. btrfs_search_slot() begins tree traversal with cow=1
>>>> 2. Node at level N needs COW (old generation or WRITTEN flag set)
>>>> 3. btrfs_cow_block() allocates new node, updates parent pointer
>>>> 4. Traversal continues, but hits a condition requiring restart (e.g., node
>>>>    not cached, lock contention, need higher write_lock_level)
>>>> 5. btrfs_release_path() releases all locks and references
>>>> 6. Memory pressure triggers writeback on the COW'd node
>>>> 7. lock_extent_buffer_for_io() clears EXTENT_BUFFER_DIRTY and sets
>>>> BTRFS_HEADER_FLAG_WRITTEN
>>>> 8. goto again - traversal restarts from root
>>>> 9. Traversal reaches the freshly COW'd node
>>>> 10. should_cow_block() sees WRITTEN flag set, returns true
>>>> 11. btrfs_cow_block() allocates another new node - same logical position,
>>>>     new physical location, new reservation consumed
>>>> 12. Steps 4-11 repeat indefinitely under sustained memory pressure
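
The loop in steps 1-12 can be sketched as a toy user-space model. This is not kernel code: `toy_eb`, `toy_should_cow()` and `toy_search_cow_count()` are hypothetical names, and the model assumes that writeback hits the freshly COWed node on every restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an extent buffer header; not the kernel struct. */
struct toy_eb {
	uint64_t generation;	/* transaction that created this block */
	bool written;		/* models BTRFS_HEADER_FLAG_WRITTEN */
};

/* The two conditions named in steps 2 and 10. */
static bool toy_should_cow(const struct toy_eb *eb, uint64_t transid)
{
	return eb->generation != transid || eb->written;
}

/*
 * Simulate one search that is forced to restart `restarts` times.
 * If writeback_on_restart is true, the node is written back every time
 * the path is released (steps 5-7). Returns how often the node was COWed.
 */
static unsigned int toy_search_cow_count(uint64_t transid,
					 unsigned int restarts,
					 bool writeback_on_restart)
{
	struct toy_eb eb;
	unsigned int cows = 0;

	assert(transid > 0);
	eb.generation = transid - 1;	/* block from an older transaction */
	eb.written = true;

	for (unsigned int pass = 0; pass <= restarts; pass++) {
		if (toy_should_cow(&eb, transid)) {
			/* COW: the new block belongs to this transaction. */
			eb.generation = transid;
			eb.written = false;
			cows++;
		}
		/* Path released, memory pressure writes the node back. */
		if (pass < restarts && writeback_on_restart)
			eb.written = true;
	}
	return cows;
}
```

In this model, `toy_search_cow_count(10, 100, true)` COWs the node 101 times, while the same search without writeback between restarts COWs it once, which is the case the worst-case reservation is sized for.
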
>>>>
>>>> Note this behavior should be much harder to trigger since Boris's
>>>> AS_KERNEL_FILE changes that make it so that extent_buffer pages aren't
>>>> accounted for in user cgroups. However, I believe it would still be an
>>>> issue under global memory pressure.
>>>> Link: https://lore.kernel.org/linux-btrfs/cover.1755812945.git.boris@bur.io/
>>>>
>>>> This COW amplification breaks the idea that transaction reservations are
>>>> worst case as any search slot call could find itself in this COW loop and
>>>> exhaust its reservation.
>>>>
>>>> My proposed solution is to temporarily pin extent buffers for the
>>>> lifetime of btrfs_search_slot. This prevents the massive COW
>>>> amplification that can be seen during high memory pressure.
>>>>
>>>> The implementation uses a local xarray to track COW'd buffers for the
>>>> duration of the search. The xarray stores extent_buffer pointers without
>>>> taking additional references; this is safe because tracked buffers remain
>>>> dirty (writeback_blockers prevents the dirty bit from being cleared) and
>>>> dirty buffers cannot be reclaimed by memory pressure.
>>>>
>>>> Synchronization is provided by eb->lock: increments in
>>>> btrfs_search_slot_track_cow() occur while holding the write lock, and
>>>> the check in lock_extent_buffer_for_io() also holds the write lock via
>>>> btrfs_tree_lock(). Decrements don't require eb->lock because
>>>> writeback_blockers is atomic and merely indicates "don't write yet".
>>>> Once we decrement, we're done and don't care if writeback proceeds
>>>> immediately.
>>> This seems too complex to me.
>>>
>>> So this problem is very similar to some idea I had a few years ago but
>>> never managed to implement.
>>> It was about avoiding unnecessary COW, not for this space reservation
>>> exhaustion due to sustained memory pressure, but it would solve it
>>> too.
>>>
>>> The idea was that we do unnecessary COW in cases like this:
>>>
>>> 1) We COW a path in some tree and we are at transaction N;
>>>
>>> 2) Writeback happened for the extent buffers in that path while we are
>>> in the same transaction, because we reached the 32M limit and some
>>> task called btrfs_btree_balance_dirty() or something else triggered
>>> writeback of the btree inode;
>>>
>>> 3) While still at transaction N, we visit the same path to add an item
>>> to a leaf, or modify an item, whatever. Because the extent buffers
>>> have BTRFS_HEADER_FLAG_WRITTEN, we COW them again (should_cow_block()
>>> returns true).
>>>
>>> So during the lifetime of a transaction we can have a lot of
>>> unnecessary COW - we spend more time allocating extents, allocating
>>> memory, copying extent buffer data, use more space per transaction,
>>> etc.
>>>
>>> The idea was to not COW when an extent buffer has
>>> BTRFS_HEADER_FLAG_WRITTEN set, but only if its generation
>>> (btrfs_header_generation(eb)) matches the current transaction.
>>> That is safe because there's no committed tree that points to an
>>> extent buffer created in the current transaction.
>>>
>>> Any further modification to the extent buffer must be sure that the
>>> EXTENT_BUFFER_DIRTY flag is set, that the eb range is still in the
>>> transaction's dirty_pages io tree, etc, so that we don't miss writing
>>> the extent buffer to the same location again before the transaction
>>> commits the superblocks.
>>>
>>> Have you considered an approach like this?
>> I had not considered this, but it is a great idea.
>>
>> My first thought is that implementing this could be as simple
>> as removing the BTRFS_HEADER_FLAG_WRITTEN check. However, this
>> would mess with the assumptions around the log tree. From
>> btrfs_sync_log():
> After a quick look and some tests, I found that things might not be that
> easy. The problem is not only the log tree.
>> /*
>> * IO has been started, blocks of the log tree have WRITTEN flag set
>> * in their headers. new modifications of the log will be written to
>> * new positions. so it's safe to allow log writers to go in.
>> */
>>
>> ^ Assumes that WRITTEN blocks will be COW'd.
>>
>> The issue looks like:
>>
>> 1. fsync A COWs eb
>> 2. fsync A lock_extent_buffer_for_io(); sets WRITTEN, unlocks tree
>> 3. fsync B does __not__ COW eb and modifies it
>> 4. fsync A writes modified eb to disk
>> 5. CRASH; the log tree is corrupted
>>
>> One way to avoid that is to keep the current behavior for the log
>> tree, but that leaves the potential for COW amplification...
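
The fsync sequence in steps 1-5 above can be replayed in a toy user-space model. This is only a sketch under stated assumptions: `toy_log_eb` and `toy_replay_fsync_race()` are hypothetical names, the block content is reduced to a single integer, and the device is assumed to write whatever is in the block at step 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy log tree block; hypothetical, not the kernel's extent_buffer. */
struct toy_log_eb {
	int payload;	/* stands in for the items in the block */
	bool written;	/* models BTRFS_HEADER_FLAG_WRITTEN */
};

/*
 * Replay steps 1-4 in order. Returns the payload that reaches disk for
 * the block fsync A submitted and stores what fsync B ends up with in
 * *b_payload. cow_written selects the current behaviour (true) or the
 * hypothetical "do not COW WRITTEN blocks" behaviour (false).
 */
static int toy_replay_fsync_race(bool cow_written, int *b_payload)
{
	struct toy_log_eb eb = { .payload = 1, .written = false }; /* step 1 */

	assert(b_payload);
	eb.written = true;		/* step 2: IO started, tree unlocked */

	if (cow_written && eb.written) {
		/* step 3, current behaviour: fsync B works on a fresh copy */
		struct toy_log_eb copy = eb;

		copy.written = false;
		copy.payload = 2;
		*b_payload = copy.payload;
	} else {
		/* step 3, no COW: fsync B modifies the block in flight */
		eb.payload = 2;
		*b_payload = eb.payload;
	}

	return eb.payload;		/* step 4: what the device writes for A */
}
```

With `cow_written` set, the block fsync A submitted still holds the payload A intended; without it, A's write carries fsync B's modification, which is the situation step 5 turns into a corrupted log tree.
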
> I tested with a patch like this:
> @@ -624,14 +624,18 @@ static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
>  	if (btrfs_header_generation(buf) != trans->transid)
>  		return true;
> 
> -	if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
> -		return true;
> -
>  	/* Ensure we can see the FORCE_COW bit. */
>  	smp_mb__before_atomic();
>  	if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
>  		return true;
> 
> +	if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
> +		if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
> +			return true;
> +		btrfs_mark_buffer_dirty(trans, buf);
> +		return false;
> +	}
> +
>  	if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
>  		return false;
>
> And get some errors like this:
>
>
> [ +0.090163] [ T2589] run fstests btrfs/004 at 2026-01-30 11:53:37
> [ +0.432352] [T11685] BTRFS: device fsid 1fb397fc-97a7-44dd-9602-dd38b74bc391 devid 1 transid 8 /dev/loop1 (7:1) scanned by mount (11685)
> [ +0.000351] [T11685] BTRFS info (device loop1): first mount of filesystem 1fb397fc-97a7-44dd-9602-dd38b74bc391
> [ +0.000014] [T11685] BTRFS info (device loop1): using crc32c (crc32c-lib) checksum algorithm
> [ +0.001298] [T11685] BTRFS info (device loop1): checking UUID tree
> [ +0.000039] [T11685] BTRFS info (device loop1): enabling ssd optimizations
> [ +0.000003] [T11685] BTRFS info (device loop1): turning on async discard
> [ +0.000002] [T11685] BTRFS info (device loop1): enabling free space tree
> [ +1.051781] [T11703] page: refcount:2 mapcount:0 mapping:00000000eb6d7caa index:0x2348 pfn:0x1caebf
> [ +0.000008] [T11703] memcg:ffff9b3300263cc0
> [ +0.000003] [T11703] aops:0xffffffffc0354040 ino:1
> [ +0.000024] [T11703] flags: 0x4e0000000000423e(referenced|uptodate|dirty|lru|workingset|private|writeback|zone=1)
> [ +0.000007] [T11703] raw: 4e0000000000423e fffff74a872bb908 fffff74a84206a88 ffff9b33c6706880
> [ +0.000004] [T11703] raw: 0000000000002348 ffff9b334be522d0 00000002ffffffff ffff9b3300263cc0
> [ +0.000002] [T11703] page dumped because: eb page dump
> [ +0.000003] [T11703] BTRFS critical (device loop1): corrupt leaf: root=5 block=36995072 slot=118 ino=406 file_offset=94208, invalid ram_bytes for file extent, have 8660273067269322872, should be aligned to 4096
> [ +0.000013] [T11703] BTRFS info (device loop1): leaf 36995072 gen 33 total ptrs 128 free space 2857 owner 5
> [ +0.000006] [T11703] item 0 key (386 DIR_ITEM 238230307) itemoff 16249 itemsize 34
> [ +0.000004] [T11703] location key (462 1 0) type 2
> [ +0.000003] [T11703] transid 33 data_len 0 name_len 4
> [ +0.000003] [T11703] item 1 key (386 DIR_ITEM 1473745676) itemoff 16216 itemsize 33
> [ +0.000004] [T11703] location key (376 1 0) type 3
> [ +0.000002] [T11703] transid 30 data_len 0 name_len 3
> [ +0.000003] [T11703] item 2 key (386 DIR_ITEM 2243137595) itemoff 16182 itemsize 34
> [ +0.000004] [T11703] location key (413 1 0) type 1
> [ +0.000002] [T11703] transid 32 data_len 0 name_len 4
> ...
> [ +0.000001] [T11703] item 127 key (405 DIR_ITEM 828387202) itemoff 6057 itemsize 34
> [ +0.000002] [T11703] location key (479 1 0) type 3
> [ +0.000001] [T11703] transid 33 data_len 0 name_len 4
> [ +0.000002] [T11703] BTRFS error (device loop1): block=36995072 write time tree block corruption detected
> [ +0.003429] [T11703] BTRFS: error (device loop1) in btrfs_commit_transaction:2555: errno=-5 IO failure (Error while writing out transaction)
> [ +0.000007] [T11703] BTRFS info (device loop1 state E): forced readonly
> [ +0.000002] [T11703] BTRFS warning (device loop1 state E): Skipping commit of aborted transaction.
> [ +0.000002] [T11703] BTRFS error (device loop1 state EA): Transaction aborted (error -5)
> [ +0.000003] [T11703] BTRFS: error (device loop1 state EA) in cleanup_transaction:2037: errno=-5 IO failure
>
> The reported inode 406 is not even in the printed leaf. It looks like a
> data race, possibly caused by the following:
>
> During writeback we unlock the eb after setting the WRITTEN flag, and the
> eb should not be modified from then on because all future writes go to
> the COWed eb. However, with the WRITTEN flag check removed from
> should_cow_block(), we might write to an eb that has the WRITTEN flag set
> and might still be under IO.
I tried again with this:
@@ -624,14 +624,20 @@ static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
 	if (btrfs_header_generation(buf) != trans->transid)
 		return true;

-	if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
-		return true;
-
 	/* Ensure we can see the FORCE_COW bit. */
 	smp_mb__before_atomic();
 	if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
 		return true;

+	if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
+		if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
+			return true;
+		if (test_bit(EXTENT_BUFFER_WRITEBACK, &buf->bflags))
+			return true;
+		btrfs_mark_buffer_dirty(trans, buf);
+		return false;
+	}
+
 	if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
 		return false;
When WRITEBACK is set, this does a normal COW to prevent the data race.
That seems to fix the previous problem. However, I now get this:
[ +0.020843] [T15127] BTRFS error (device loop1): block=30687232 bad
generation, have 11 expect > 14
[ +0.000009] [T15127] item 0 key (256 INODE_ITEM 0) itemoff 16123
itemsize 160
[ +0.000004] [T15127] inode generation 3 transid 11 size 10 nbytes 16384
[ +0.000003] [T15127] block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127] rdev 0 sequence 1 flags 0x0
[ +0.000002] [T15127] atime 1769760651.0
[ +0.000002] [T15127] ctime 1769760652.250234845
[ +0.000002] [T15127] mtime 1769760652.250234845
[ +0.000001] [T15127] otime 1769760651.0
[ +0.000002] [T15127] item 1 key (256 INODE_REF 256) itemoff 16111
itemsize 12
[ +0.000003] [T15127] index 0 name_len 2
[ +0.000002] [T15127] item 2 key (256 DIR_ITEM 2030520461) itemoff
16076 itemsize 35
[ +0.000002] [T15127] location key (257 1 0) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 5
[ +0.000002] [T15127] item 3 key (256 DIR_INDEX 2) itemoff 16041
itemsize 35
[ +0.000002] [T15127] location key (257 1 0) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 5
[ +0.000002] [T15127] item 4 key (257 INODE_ITEM 0) itemoff 15881
itemsize 160
[ +0.000002] [T15127] inode generation 11 transid 11 size 12 nbytes 0
[ +0.000002] [T15127] block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127] rdev 0 sequence 19 flags 0x0
[ +0.000001] [T15127] atime 1769760652.250234845
[ +0.000002] [T15127] ctime 1769760652.256913323
[ +0.000002] [T15127] mtime 1769760652.256913323
[ +0.000001] [T15127] otime 1769760652.250234845
[ +0.000002] [T15127] item 5 key (257 INODE_REF 256) itemoff 15866
itemsize 15
[ +0.000002] [T15127] index 2 name_len 5
[ +0.000002] [T15127] item 6 key (257 DIR_ITEM 247980518) itemoff
15830 itemsize 36
[ +0.000002] [T15127] location key (256 132 18446744073709551615) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 6
[ +0.000002] [T15127] item 7 key (257 DIR_INDEX 2) itemoff 15794
itemsize 36
[ +0.000002] [T15127] location key (256 132 18446744073709551615) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 6
[ +0.000001] [T15127] BTRFS error (device loop1): block=30687232 write
time tree block corruption detected
[ +0.000017] [T15127] BTRFS error (device loop1): block=30703616 bad
generation, have 11 expect > 14
[ +0.000004] [T15127] item 0 key (13631488 BLOCK_GROUP_ITEM 8388608)
itemoff 16259 itemsize 24
[ +0.000003] [T15127] block group used 0 chunk_objectid 256 flags 1
[ +0.000002] [T15127] item 1 key (22020096 BLOCK_GROUP_ITEM 8388608)
itemoff 16235 itemsize 24
[ +0.000002] [T15127] block group used 16384 chunk_objectid 256 flags 34
[ +0.000002] [T15127] item 2 key (22036480 METADATA_ITEM 0) itemoff
16202 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 8 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 3
[ +0.000003] [T15127] item 3 key (30408704 BLOCK_GROUP_ITEM 268435456)
itemoff 16178 itemsize 24
[ +0.000002] [T15127] block group used 163840 chunk_objectid 256 flags 36
[ +0.000002] [T15127] item 4 key (30490624 METADATA_ITEM 0) itemoff
16145 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 5 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 7
[ +0.000002] [T15127] item 5 key (30523392 METADATA_ITEM 0) itemoff
16112 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 5 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 18446744073709551607
[ +0.000002] [T15127] item 6 key (30605312 METADATA_ITEM 0) itemoff
16079 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 9 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 4
[ +0.000002] [T15127] item 7 key (30687232 METADATA_ITEM 0) itemoff
16046 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 5
[ +0.000002] [T15127] item 8 key (30703616 METADATA_ITEM 0) itemoff
16013 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 2
[ +0.000002] [T15127] item 9 key (30720000 METADATA_ITEM 0) itemoff
15980 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 10
[ +0.000002] [T15127] item 10 key (30736384 METADATA_ITEM 0) itemoff
15947 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 8
[ +0.000002] [T15127] item 11 key (30752768 METADATA_ITEM 0) itemoff
15914 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 256
[ +0.000002] [T15127] item 12 key (30769152 METADATA_ITEM 0) itemoff
15881 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 1
[ +0.000002] [T15127] item 13 key (30785536 METADATA_ITEM 0) itemoff
15848 itemsize 33
[ +0.000002] [T15127] extent refs 1 gen 11 flags 2
[ +0.000002] [T15127] ref#0: tree block backref root 9
[ +0.000002] [T15127] BTRFS error (device loop1): block=30703616 write
time tree block corruption detected
[ +0.000012] [T15127] BTRFS error (device loop1): block=30720000 bad
generation, have 11 expect > 14
[ +0.000004] [T15127] item 0 key (13631488 FREE_SPACE_INFO 8388608)
itemoff 16275 itemsize 8
[ +0.000002] [T15127] item 1 key (13631488 FREE_SPACE_EXTENT 8388608)
itemoff 16275 itemsize 0
[ +0.000002] [T15127] item 2 key (22020096 FREE_SPACE_INFO 8388608)
itemoff 16267 itemsize 8
[ +0.000002] [T15127] item 3 key (22020096 FREE_SPACE_EXTENT 16384)
itemoff 16267 itemsize 0
[ +0.000003] [T15127] item 4 key (22052864 FREE_SPACE_EXTENT 8355840)
itemoff 16267 itemsize 0
[ +0.000002] [T15127] item 5 key (30408704 FREE_SPACE_INFO 268435456)
itemoff 16259 itemsize 8
[ +0.000002] [T15127] item 6 key (30408704 FREE_SPACE_EXTENT 81920)
itemoff 16259 itemsize 0
[ +0.000002] [T15127] item 7 key (30507008 FREE_SPACE_EXTENT 16384)
itemoff 16259 itemsize 0
[ +0.000002] [T15127] item 8 key (30539776 FREE_SPACE_EXTENT 65536)
itemoff 16259 itemsize 0
[ +0.000002] [T15127] item 9 key (30621696 FREE_SPACE_EXTENT 65536)
itemoff 16259 itemsize 0
[ +0.000003] [T15127] item 10 key (30801920 FREE_SPACE_EXTENT
268042240) itemoff 16259 itemsize 0
[ +0.000002] [T15127] BTRFS error (device loop1): block=30720000 write
time tree block corruption detected
[ +0.000010] [T15127] BTRFS error (device loop1): block=30736384 bad
generation, have 11 expect > 14
[ +0.000004] [T15127] item 0 key (0 QGROUP_STATUS 0) itemoff 16243
itemsize 40
[ +0.000003] [T15127] item 1 key (0 QGROUP_INFO 5) itemoff 16203
itemsize 40
[ +0.000002] [T15127] item 2 key (0 QGROUP_INFO 256) itemoff 16163
itemsize 40
[ +0.000002] [T15127] item 3 key (0 QGROUP_LIMIT 5) itemoff 16123
itemsize 40
[ +0.000002] [T15127] item 4 key (0 QGROUP_LIMIT 256) itemoff 16083
itemsize 40
[ +0.000003] [T15127] BTRFS error (device loop1): block=30736384 write
time tree block corruption detected
[ +0.000014] [T15127] BTRFS error (device loop1): block=30769152 bad
generation, have 11 expect > 14
[ +0.000004] [T15127] item 0 key (2 ROOT_ITEM 0) itemoff 15844
itemsize 439
[ +0.000002] [T15127] root data bytenr 30703616 refs 1
[ +0.000002] [T15127] item 1 key (4 ROOT_ITEM 0) itemoff 15405
itemsize 439
[ +0.000002] [T15127] root data bytenr 30605312 refs 1
[ +0.000001] [T15127] item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[ +0.000002] [T15127] index 0 name_len 7
[ +0.000002] [T15127] item 3 key (5 ROOT_ITEM 0) itemoff 14949
itemsize 439
[ +0.000002] [T15127] root data bytenr 30687232 refs 1
[ +0.000002] [T15127] item 4 key (5 ROOT_REF 256) itemoff 14925
itemsize 24
[ +0.000002] [T15127] item 5 key (6 INODE_ITEM 0) itemoff 14765
itemsize 160
[ +0.000002] [T15127] inode generation 3 transid 0 size 0 nbytes 16384
[ +0.000002] [T15127] block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127] rdev 0 sequence 0 flags 0x0
[ +0.000001] [T15127] atime 1769760651.0
[ +0.000002] [T15127] ctime 1769760651.0
[ +0.000002] [T15127] mtime 1769760651.0
[ +0.000001] [T15127] otime 1769760651.0
[ +0.000002] [T15127] item 6 key (6 INODE_REF 6) itemoff 14753 itemsize 12
[ +0.000002] [T15127] index 0 name_len 2
[ +0.000001] [T15127] item 7 key (6 DIR_ITEM 2378154706) itemoff 14716
itemsize 37
[ +0.000003] [T15127] location key (5 132 18446744073709551615) type 2
[ +0.000001] [T15127] transid 3 data_len 0 name_len 7
[ +0.000002] [T15127] item 8 key (7 ROOT_ITEM 0) itemoff 14277
itemsize 439
[ +0.000002] [T15127] root data bytenr 30490624 refs 1
[ +0.000002] [T15127] item 9 key (8 ROOT_ITEM 0) itemoff 13838
itemsize 439
[ +0.000002] [T15127] root data bytenr 30736384 refs 1
[ +0.000001] [T15127] item 10 key (9 ROOT_ITEM 0) itemoff 13399
itemsize 439
[ +0.000002] [T15127] root data bytenr 30785536 refs 1
[ +0.000002] [T15127] item 11 key (10 ROOT_ITEM 0) itemoff 12960
itemsize 439
[ +0.000002] [T15127] root data bytenr 30720000 refs 1
[ +0.000001] [T15127] item 12 key (256 ROOT_ITEM 11) itemoff 12521
itemsize 439
[ +0.000003] [T15127] root data bytenr 30752768 refs 1
[ +0.000001] [T15127] item 13 key (256 ROOT_BACKREF 5) itemoff 12497
itemsize 24
[ +0.000003] [T15127] item 14 key (18446744073709551607 ROOT_ITEM 0)
itemoff 12058 itemsize 439
[ +0.000002] [T15127] root data bytenr 30523392 refs 1
[ +0.000001] [T15127] BTRFS error (device loop1): block=30769152 write
time tree block corruption detected
[ +0.000012] [T15127] BTRFS error (device loop1): block=30801920 bad
generation, have 12 expect > 14
[ +0.000003] [T15127] item 0 key (0 QGROUP_STATUS 0) itemoff 16243
itemsize 40
[ +0.000003] [T15127] item 1 key (0 QGROUP_INFO 5) itemoff 16203
itemsize 40
[ +0.000002] [T15127] item 2 key (0 QGROUP_INFO 256) itemoff 16163
itemsize 40
[ +0.000002] [T15127] item 3 key (0 QGROUP_INFO 257) itemoff 16123
itemsize 40
[ +0.000002] [T15127] item 4 key (0 QGROUP_LIMIT 5) itemoff 16083
itemsize 40
[ +0.000002] [T15127] item 5 key (0 QGROUP_LIMIT 256) itemoff 16043
itemsize 40
[ +0.000002] [T15127] item 6 key (0 QGROUP_LIMIT 257) itemoff 16003
itemsize 40
[ +0.000002] [T15127] BTRFS error (device loop1): block=30801920 write
time tree block corruption detected
[ +0.000014] [T15127] BTRFS error (device loop1): block=30818304 bad
generation, have 12 expect > 14
[ +0.000003] [T15127] item 0 key (256 INODE_ITEM 0) itemoff 16123
itemsize 160
[ +0.000002] [T15127] inode generation 3 transid 11 size 10 nbytes 16384
[ +0.000002] [T15127] block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127] rdev 0 sequence 1 flags 0x0
[ +0.000002] [T15127] atime 1769760651.0
[ +0.000001] [T15127] ctime 1769760652.250234845
[ +0.000002] [T15127] mtime 1769760652.250234845
[ +0.000001] [T15127] otime 1769760651.0
[ +0.000002] [T15127] item 1 key (256 INODE_REF 256) itemoff 16111
itemsize 12
[ +0.000002] [T15127] index 0 name_len 2
[ +0.000002] [T15127] item 2 key (256 DIR_ITEM 2030520461) itemoff
16076 itemsize 35
[ +0.000002] [T15127] location key (257 1 0) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 5
[ +0.000001] [T15127] item 3 key (256 DIR_INDEX 2) itemoff 16041
itemsize 35
[ +0.000002] [T15127] location key (257 1 0) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 5
[ +0.000002] [T15127] item 4 key (257 INODE_ITEM 0) itemoff 15881
itemsize 160
[ +0.000002] [T15127] inode generation 11 transid 12 size 24 nbytes 0
[ +0.000002] [T15127] block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127] rdev 0 sequence 19 flags 0x0
[ +0.000001] [T15127] atime 1769760652.250234845
[ +0.000002] [T15127] ctime 1769760652.267621586
[ +0.000001] [T15127] mtime 1769760652.267621586
[ +0.000002] [T15127] otime 1769760652.250234845
[ +0.000002] [T15127] item 5 key (257 INODE_REF 256) itemoff 15866
itemsize 15
[ +0.000002] [T15127] index 2 name_len 5
[ +0.000001] [T15127] item 6 key (257 DIR_ITEM 247980518) itemoff
15830 itemsize 36
[ +0.000002] [T15127] location key (256 132 18446744073709551615) type 2
[ +0.000002] [T15127] transid 11 data_len 0 name_len 6
[ +0.000002] [T15127] item 7 key (257 DIR_ITEM 496439826) itemoff
15794 itemsize 36
[ +0.000002] [T15127] location key (257 132 18446744073709551615) type 2
[ +0.000002] [T15127] transid 12 data_len 0 name_len 6
[ +0.000001] [T15127] item 8 key (257 DIR_INDEX 2) itemoff 15758
itemsize 36
[ +0.000003] [T15127] location key (256 132 18446744073709551615) type 2
[ +0.000001] [T15127] transid 11 data_len 0 name_len 6
[ +0.000002] [T15127] item 9 key (257 DIR_INDEX 3) itemoff 15722
itemsize 36
[ +0.000002] [T15127] location key (257 132 18446744073709551615) type 2
[ +0.000002] [T15127] transid 12 data_len 0 name_len 6
[ +0.000001] [T15127] BTRFS error (device loop1): block=30818304 write
time tree block corruption detected
[ +0.000016] [T15127] BTRFS error (device loop1): block=30851072 bad
generation, have 12 expect > 14
[ +0.000004] [T15127] item 0 key (2 ROOT_ITEM 0) itemoff 15844
itemsize 439
[ +0.000002] [T15127] root data bytenr 30867456 refs 1
[ +0.000001] [T15127] item 1 key (4 ROOT_ITEM 0) itemoff 15405
itemsize 439
[ +0.000002] [T15127] root data bytenr 30605312 refs 1
[ +0.000002] [T15127] item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[ +0.000002] [T15127] index 0 name_len 7
[ +0.000001] [T15127] item 3 key (5 ROOT_ITEM 0) itemoff 14949
itemsize 439
[ +0.000002] [T15127] root data bytenr 30818304 refs 1
[ +0.000002] [T15127] item 4 key (5 ROOT_REF 256) itemoff 14925
itemsize 24
[ +0.000002] [T15127] item 5 key (5 ROOT_REF 257) itemoff 14901
itemsize 24
[ +0.000002] [T15127] item 6 key (6 INODE_ITEM 0) itemoff 14741
itemsize 160
[ +0.000002] [T15127] inode generation 3 transid 0 size 0 nbytes 16384
[ +0.000002] [T15127] block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000003] [T15127] rdev 0 sequence 0 flags 0x0
[ +0.000001] [T15127] atime 1769760651.0
[ +0.000002] [T15127] ctime 1769760651.0
[ +0.000003] [T15127] mtime 1769760651.0
[ +0.000002] [T15127] otime 1769760651.0
[ +0.000002] [T15127] item 7 key (6 INODE_REF 6) itemoff 14729 itemsize 12
[ +0.000003] [T15127] index 0 name_len 2
[ +0.000002] [T15127] item 8 key (6 DIR_ITEM 2378154706) itemoff 14692
itemsize 37
[ +0.000003] [T15127] location key (5 132 18446744073709551615) type 2
[ +0.000002] [T15127] transid 3 data_len 0 name_len 7
[ +0.000002] [T15127] item 9 key (7 ROOT_ITEM 0) itemoff 14253
itemsize 439
[ +0.000003] [T15127] root data bytenr 30490624 refs 1
[ +0.000002] [T15127] item 10 key (8 ROOT_ITEM 0) itemoff 13814
itemsize 439
[ +0.000002] [T15127] root data bytenr 30801920 refs 1
[ +0.000003] [T15127] item 11 key (9 ROOT_ITEM 0) itemoff 13375
itemsize 439
[ +0.000002] [T15127] root data bytenr 30900224 refs 1
[ +0.000002] [T15127] item 12 key (10 ROOT_ITEM 0) itemoff 12936
itemsize 439
[ +0.000003] [T15127] root data bytenr 30883840 refs 1
[ +0.000002] [T15127] item 13 key (256 ROOT_ITEM 11) itemoff 12497
itemsize 439
[ +0.000003] [T15127] root data bytenr 30752768 refs 1
[ +0.000002] [T15127] item 14 key (256 ROOT_BACKREF 5) itemoff 12473
itemsize 24
[ +0.000003] [T15127] item 15 key (257 ROOT_ITEM 12) itemoff 12034
itemsize 439
[ +0.000003] [T15127] root data bytenr 30834688 refs 1
[ +0.000002] [T15127] item 16 key (257 ROOT_BACKREF 5) itemoff 12010
itemsize 24
[ +0.000003] [T15127] item 17 key (18446744073709551607 ROOT_ITEM 0)
itemoff 11571 itemsize 439
[ +0.000004] [T15127] root data bytenr 30523392 refs 1
[ +0.000002] [T15127] BTRFS error (device loop1): block=30851072 write
time tree block corruption detected
There are many more lines with the same generation error, from the
fstests cases btrfs/122, btrfs/152, btrfs/210, btrfs/224, btrfs/316,
btrfs/320 and btrfs/340. I have no idea why it is trying to write ebs
older than the current transaction. It seems to be related to snapshots.
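
For reference, the order of checks in the second attempt above can be modelled in user space. This is a minimal sketch, not the kernel function: `toy_state` and `toy_should_cow_v2()` are hypothetical names, the kernel state is reduced to booleans, and the checks that follow the WRITTEN branch in the real should_cow_block() are left out, so the sketch simply returns false there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy view of the state the patched check looks at; names hypothetical. */
struct toy_state {
	uint64_t eb_generation;	/* btrfs_header_generation(buf) */
	uint64_t transid;	/* trans->transid */
	bool force_cow;		/* BTRFS_ROOT_FORCE_COW set on the root */
	bool written;		/* BTRFS_HEADER_FLAG_WRITTEN */
	bool is_log_tree;	/* root id == BTRFS_TREE_LOG_OBJECTID */
	bool under_writeback;	/* EXTENT_BUFFER_WRITEBACK */
	bool redirtied;		/* out: block is reused in place */
};

/* Returns true when the block must be COWed. */
static bool toy_should_cow_v2(struct toy_state *s)
{
	assert(s);
	s->redirtied = false;

	if (s->eb_generation != s->transid)
		return true;
	if (s->force_cow)
		return true;

	if (s->written) {
		if (s->is_log_tree)
			return true;	/* keep current log tree behaviour */
		if (s->under_writeback)
			return true;	/* block may still be under IO */
		/* Stands in for btrfs_mark_buffer_dirty(trans, buf). */
		s->redirtied = true;
		return false;
	}

	/* Remaining checks of the real function are omitted here. */
	return false;
}
```

In this model a block whose generation differs from the transid is always COWed, so the sketch does not by itself reproduce the "bad generation" errors above.
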
> To fix this, we need to check the DIRTY flag again to prevent writing an
> eb which has had new data written to it, and lock the eb before we
> actually do any IO-related work. I'm not familiar with the IO-related
> code, so please correct me if I got anything wrong.
>
> Thanks,
>
> Sun Yangkai