All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sun YangKai <sunk67188@gmail.com>
To: Leo Martins <loemra.dev@gmail.com>, Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: prevent COW amplification during btrfs_search_slot
Date: Fri, 30 Jan 2026 17:37:15 +0800	[thread overview]
Message-ID: <e5eee424-303d-423b-aead-2eccbf63b8ec@gmail.com> (raw)
In-Reply-To: <df47b1c0-c25e-4501-aaa0-bc73ce1fdc00@gmail.com>



On 2026/1/30 12:14, Sun YangKai wrote:
> On 2026/1/30 08:12, Leo Martins wrote:
>> On Thu, 29 Jan 2026 11:52:07 +0000 Filipe Manana<fdmanana@kernel.org> 
>> wrote:
>>> On Tue, Jan 27, 2026 at 8:43 PM Leo Martins<loemra.dev@gmail.com> wrote:
>>>> I've been investigating enospcs at Meta and have observed a strange
>>>> pattern where filesystems are enospcing with lots of unallocated space
>>>> (> 100G). Sample dmesg dump at bottom of message.
>>>>
>>>> btrfs_insert_delayed_dir_index is attempting to migrate some 
>>>> reservation
>>>> from the transaction block reserve and finding it exhausted leading 
>>>> to a
>>>> warning and enospc. This is a bug as the reservations are meant to be
>>>> worst case. It should be impossible to exhaust the transaction block
>>>> reserve.
>>>>
>>>> Some tracing of affected hosts revealed that there were single
>>>> btrfs_search_slot calls that were COWing 100s of times. I was able to
>>>> reproduce this behavior locally by creating a very constrained cgroup
>>>> and producing a lot of concurrent filesystem operations. Here's the
>>>> pattern:
>>>>
>>>>   1. btrfs_search_slot() begins tree traversal with cow=1
>>>>   2. Node at level N needs COW (old generation or WRITTEN flag set)
>>>>   3. btrfs_cow_block() allocates new node, updates parent pointer
>>>>   4. Traversal continues, but hits a condition requiring restart 
>>>> (e.g., node
>>>>      not cached, lock contention, need higher write_lock_level)
>>>>   5. btrfs_release_path() releases all locks and references
>>>>   6. Memory pressure triggers writeback on the COW'd node
>>>>   7. lock_extent_buffer_for_io() clears EXTENT_BUFFER_DIRTY and sets
>>>>      BTRFS_HEADER_FLAG_WRITTEN
>>>>   8. goto again - traversal restarts from root
>>>>   9. Traversal reaches the freshly COW'd node
>>>>   10. should_cow_block() sees WRITTEN flag set, returns true
>>>>   11. btrfs_cow_block() allocates another new node - same logical 
>>>> position,
>>>>       new physical location, new reservation consumed
>>>>   12. Steps 4-11 repeat indefinitely under sustained memory pressure
>>>>
>>>> Note this behavior should be much harder to trigger since Boris's
>>>> AS_KERNEL_FILE changes that make it so that extent_buffer pages aren't
>>>> accounted for in user cgroups. However, I believe it
>>>> would still be an issue under global memory pressure.
>>>> Link:https://lore.kernel.org/linux-btrfs/ 
>>>> cover.1755812945.git.boris@bur.io/
>>>>
>>>> This COW amplification breaks the idea that transaction reservations 
>>>> are
>>>> worst case as any search slot call could find itself in this COW 
>>>> loop and
>>>> exhaust its reservation.
>>>>
>>>> My proposed solution is to temporarily pin extent buffers for the
>>>> lifetime of btrfs_search_slot. This prevents the massive COW
>>>> amplification that can be seen during high memory pressure.
>>>>
>>>> The implementation uses a local xarray to track COW'd buffers for the
>>>> duration of the search. The xarray stores extent_buffer pointers 
>>>> without
>>>> taking additional references; this is safe because tracked buffers 
>>>> remain
>>>> dirty (writeback_blockers prevents the dirty bit from being cleared) 
>>>> and
>>>> dirty buffers cannot be reclaimed by memory pressure.
>>>>
>>>> Synchronization is provided by eb->lock: increments in
>>>> btrfs_search_slot_track_cow() occur while holding the write lock, and
>>>> the check in lock_extent_buffer_for_io() also holds the write lock via
>>>> btrfs_tree_lock(). Decrements don't require eb->lock because
>>>> writeback_blockers is atomic and merely indicates "don't write yet".
>>>> Once we decrement, we're done and don't care if writeback proceeds
>>>> immediately.
>>> This seems too complex to me.
>>>
>>> So this problem is very similar to some idea I had a few years ago but
>>> never managed to implement.
>>> It was about avoiding unnecessary COW, not for this space reservation
>>> exhaustion due to sustained memory pressure, but it would solve it
>>> too.
>>>
>>> The idea was that we do unnecessary COW in cases like this:
>>>
>>> 1) We COW a path in some tree and we are at transaction N;
>>>
>>> 2) Writeback happened for the extent buffers in that path while we are
>>> in the same transaction, because we reached the 32M limit and some
>>> task called btrfs_btree_balance_dirty() or something else triggered
>>> writeback of the btree inode;
>>>
>>> 3) While still at transaction N, we visit the same path to add an item
>>> to a leaf, or modify an item, whatever. Because the extent buffers
>>> have BTRFS_HEADER_FLAG_WRITTEN, we COW them again (should_cow_block()
>>> returns true).
>>>
>>> So during the lifetime of a transaction we can have a lot of
>>> unnecessary COW - we spend more time allocating extents, allocating
>>> memory, copying extent buffer data, use more space per transaction,
>>> etc.
>>>
>>> The idea was to not COW when an extent buffer has
>>> BTRFS_HEADER_FLAG_WRITTEN set, but only if its generation
>>> (btrfs_header_generation(eb)) matches the current transaction.
>>> That is safe because there's no committed tree that points to an
>>> extent buffer created in the current transaction.
>>>
>>> Any further modification to the extent buffer must be sure that the
>>> EXTENT_BUFFER_DIRTY flag is set, that the eb range is still in the
>>> transaction's dirty_pages io tree, etc, so that we don't miss writing
>>> the extent buffer to the same location again before the transaction
>>> commits the superblocks.
>>>
>>> Have you considered an approach like this?
>> I had not considered this, but it is a great idea.
>>
>> My first thought is that implementing this could be as simple
>> as removing the BTRFS_HEADER_FLAG_WRITTEN check. However, this
>> would mess with the assumptions around the log tree. From
>> btrfs_sync_log():
> After a fast glance and some tests, I found things might not be that 
> easy. The problem is not only the log tree.
>> /*
>>   * IO has been started, blocks of the log tree have WRITTEN flag set
>>   * in their headers. new modifications of the log will be written to
>>   * new positions. so it's safe to allow log writers to go in.
>>   */
>>
>> ^ Assumes that WRITTEN blocks will be COW'd.
>>
>> The issue looks like:
>>
>>   1. fsync A COWs eb
>>   2. fsync A lock_extent_buffer_for_io(); sets WRITTEN, unlocks tree
>>   3. fsync B does __not__ COW eb and modifies it
>>   4. fsync A writes modified eb to disk
>>   5. CRASH; the log tree is corrupted
>>
>> One way to avoid that is to keep the current behavior for the log
>> tree, but that leaves the potential for COW amplification...
> I tested with a patch like this:
> @@ -624,14 +624,18 @@ static inline bool should_cow_block(const struct 
> btrfs_trans_handle *trans,
>          if (btrfs_header_generation(buf) != trans->transid)
>                  return true;
> 
> -       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
> -               return true;
> -
>          /* Ensure we can see the FORCE_COW bit. */
>          smp_mb__before_atomic();
>          if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
>                  return true;
> 
> +       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
> +               if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
> +                       return true;
> +               btrfs_mark_buffer_dirty(trans, buf);
> +               return false;
> +       }
> +
>          if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
> 
>                  return false;
> 
> And get some errors like this:
> 
> 
> [  +0.090163] [ T2589] run fstests btrfs/004 at 2026-01-30 11:53:37
> [  +0.432352] [T11685] BTRFS: device fsid 1fb397fc-97a7-44dd-9602- 
> dd38b74bc391 devid 1 transid 8 /dev/loop1 (7:1) scanned by mount (11685)
> [  +0.000351] [T11685] BTRFS info (device loop1): first mount of 
> filesystem 1fb397fc-97a7-44dd-9602-dd38b74bc391
> [  +0.000014] [T11685] BTRFS info (device loop1): using crc32c (crc32c- 
> lib) checksum algorithm
> [  +0.001298] [T11685] BTRFS info (device loop1): checking UUID tree
> [  +0.000039] [T11685] BTRFS info (device loop1): enabling ssd 
> optimizations
> [  +0.000003] [T11685] BTRFS info (device loop1): turning on async discard
> [  +0.000002] [T11685] BTRFS info (device loop1): enabling free space tree
> [  +1.051781] [T11703] page: refcount:2 mapcount:0 
> mapping:00000000eb6d7caa index:0x2348 pfn:0x1caebf
> [  +0.000008] [T11703] memcg:ffff9b3300263cc0
> [  +0.000003] [T11703] aops:0xffffffffc0354040 ino:1
> [  +0.000024] [T11703] flags: 0x4e0000000000423e(referenced|uptodate| 
> dirty|lru|workingset|private|writeback|zone=1)
> [  +0.000007] [T11703] raw: 4e0000000000423e fffff74a872bb908 
> fffff74a84206a88 ffff9b33c6706880
> [  +0.000004] [T11703] raw: 0000000000002348 ffff9b334be522d0 
> 00000002ffffffff ffff9b3300263cc0
> [  +0.000002] [T11703] page dumped because: eb page dump
> [  +0.000003] [T11703] BTRFS critical (device loop1): corrupt leaf: 
> root=5 block=36995072 slot=118 ino=406 file_offset=94208, invalid 
> ram_bytes for file extent, have 8660273067269322872, should be aligned 
> to 4096
> [  +0.000013] [T11703] BTRFS info (device loop1): leaf 36995072 gen 33 
> total ptrs 128 free space 2857 owner 5
> [  +0.000006] [T11703]     item 0 key (386 DIR_ITEM 238230307) itemoff 
> 16249 itemsize 34
> [  +0.000004] [T11703]         location key (462 1 0) type 2
> [  +0.000003] [T11703]         transid 33 data_len 0 name_len 4
> [  +0.000003] [T11703]     item 1 key (386 DIR_ITEM 1473745676) itemoff 
> 16216 itemsize 33
> [  +0.000004] [T11703]         location key (376 1 0) type 3
> [  +0.000002] [T11703]         transid 30 data_len 0 name_len 3
> [  +0.000003] [T11703]     item 2 key (386 DIR_ITEM 2243137595) itemoff 
> 16182 itemsize 34
> [  +0.000004] [T11703]         location key (413 1 0) type 1
> [  +0.000002] [T11703]         transid 32 data_len 0 name_len 4
> ...
> [  +0.000001] [T11703]     item 127 key (405 DIR_ITEM 828387202) itemoff 
> 6057 itemsize 34
> [  +0.000002] [T11703]         location key (479 1 0) type 3
> [  +0.000001] [T11703]         transid 33 data_len 0 name_len 4
> [  +0.000002] [T11703] BTRFS error (device loop1): block=36995072 write 
> time tree block corruption detected
> [  +0.003429] [T11703] BTRFS: error (device loop1) in 
> btrfs_commit_transaction:2555: errno=-5 IO failure (Error while writing 
> out transaction)
> [  +0.000007] [T11703] BTRFS info (device loop1 state E): forced readonly
> [  +0.000002] [T11703] BTRFS warning (device loop1 state E): Skipping 
> commit of aborted transaction.
> [  +0.000002] [T11703] BTRFS error (device loop1 state EA): Transaction 
> aborted (error -5)
> [  +0.000003] [T11703] BTRFS: error (device loop1 state EA) in 
> cleanup_transaction:2037: errno=-5 IO failure
> 
> The reported 406 inode is even not in the printed leaf. It seems like a 
> data race maybe caused by:
> 
> We unlock the eb after setting the WRITTEN flag during write back, and 
> the eb should not get modified since then because all future writes will 
> use the cowed eb. However, with the WRITTEN flag check removed in 
> should_cow_block, we might write to the eb with WRITTEN flag set which 
> might be under io.

I tried again with this:

@@ -624,14 +624,20 @@ static inline bool should_cow_block(const struct 
btrfs_trans_handle *trans,
         if (btrfs_header_generation(buf) != trans->transid)
                 return true;

-       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
-               return true;
-
         /* Ensure we can see the FORCE_COW bit. */
         smp_mb__before_atomic();
         if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
                 return true;

+       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
+               if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
+                       return true;
+               if (test_bit(EXTENT_BUFFER_WRITEBACK, &buf->bflags))
+                       return true;
+               btrfs_mark_buffer_dirty(trans, buf);
+               return false;
+       }
+
         if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
                 return false;

When WRITEBACK is set, do a normal cow to prevent the data race. This 
seems to fix the previous problem. However, I got this:

[  +0.020843] [T15127] BTRFS error (device loop1): block=30687232 bad 
generation, have 11 expect > 14
[  +0.000009] [T15127] 	item 0 key (256 INODE_ITEM 0) itemoff 16123 
itemsize 160
[  +0.000004] [T15127] 		inode generation 3 transid 11 size 10 nbytes 16384
[  +0.000003] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 1 flags 0x0
[  +0.000002] [T15127] 		atime 1769760651.0
[  +0.000002] [T15127] 		ctime 1769760652.250234845
[  +0.000002] [T15127] 		mtime 1769760652.250234845
[  +0.000001] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 1 key (256 INODE_REF 256) itemoff 16111 
itemsize 12
[  +0.000003] [T15127] 		index 0 name_len 2
[  +0.000002] [T15127] 	item 2 key (256 DIR_ITEM 2030520461) itemoff 
16076 itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000002] [T15127] 	item 3 key (256 DIR_INDEX 2) itemoff 16041 
itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000002] [T15127] 	item 4 key (257 INODE_ITEM 0) itemoff 15881 
itemsize 160
[  +0.000002] [T15127] 		inode generation 11 transid 11 size 12 nbytes 0
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 19 flags 0x0
[  +0.000001] [T15127] 		atime 1769760652.250234845
[  +0.000002] [T15127] 		ctime 1769760652.256913323
[  +0.000002] [T15127] 		mtime 1769760652.256913323
[  +0.000001] [T15127] 		otime 1769760652.250234845
[  +0.000002] [T15127] 	item 5 key (257 INODE_REF 256) itemoff 15866 
itemsize 15
[  +0.000002] [T15127] 		index 2 name_len 5
[  +0.000002] [T15127] 	item 6 key (257 DIR_ITEM 247980518) itemoff 
15830 itemsize 36
[  +0.000002] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000002] [T15127] 	item 7 key (257 DIR_INDEX 2) itemoff 15794 
itemsize 36
[  +0.000002] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000001] [T15127] BTRFS error (device loop1): block=30687232 write 
time tree block corruption detected
[  +0.000017] [T15127] BTRFS error (device loop1): block=30703616 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (13631488 BLOCK_GROUP_ITEM 8388608) 
itemoff 16259 itemsize 24
[  +0.000003] [T15127] 		block group used 0 chunk_objectid 256 flags 1
[  +0.000002] [T15127] 	item 1 key (22020096 BLOCK_GROUP_ITEM 8388608) 
itemoff 16235 itemsize 24
[  +0.000002] [T15127] 		block group used 16384 chunk_objectid 256 flags 34
[  +0.000002] [T15127] 	item 2 key (22036480 METADATA_ITEM 0) itemoff 
16202 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 8 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 3
[  +0.000003] [T15127] 	item 3 key (30408704 BLOCK_GROUP_ITEM 268435456) 
itemoff 16178 itemsize 24
[  +0.000002] [T15127] 		block group used 163840 chunk_objectid 256 flags 36
[  +0.000002] [T15127] 	item 4 key (30490624 METADATA_ITEM 0) itemoff 
16145 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 5 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 7
[  +0.000002] [T15127] 	item 5 key (30523392 METADATA_ITEM 0) itemoff 
16112 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 5 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 18446744073709551607
[  +0.000002] [T15127] 	item 6 key (30605312 METADATA_ITEM 0) itemoff 
16079 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 9 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 4
[  +0.000002] [T15127] 	item 7 key (30687232 METADATA_ITEM 0) itemoff 
16046 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 5
[  +0.000002] [T15127] 	item 8 key (30703616 METADATA_ITEM 0) itemoff 
16013 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 2
[  +0.000002] [T15127] 	item 9 key (30720000 METADATA_ITEM 0) itemoff 
15980 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 10
[  +0.000002] [T15127] 	item 10 key (30736384 METADATA_ITEM 0) itemoff 
15947 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 8
[  +0.000002] [T15127] 	item 11 key (30752768 METADATA_ITEM 0) itemoff 
15914 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 256
[  +0.000002] [T15127] 	item 12 key (30769152 METADATA_ITEM 0) itemoff 
15881 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 1
[  +0.000002] [T15127] 	item 13 key (30785536 METADATA_ITEM 0) itemoff 
15848 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 9
[  +0.000002] [T15127] BTRFS error (device loop1): block=30703616 write 
time tree block corruption detected
[  +0.000012] [T15127] BTRFS error (device loop1): block=30720000 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (13631488 FREE_SPACE_INFO 8388608) 
itemoff 16275 itemsize 8
[  +0.000002] [T15127] 	item 1 key (13631488 FREE_SPACE_EXTENT 8388608) 
itemoff 16275 itemsize 0
[  +0.000002] [T15127] 	item 2 key (22020096 FREE_SPACE_INFO 8388608) 
itemoff 16267 itemsize 8
[  +0.000002] [T15127] 	item 3 key (22020096 FREE_SPACE_EXTENT 16384) 
itemoff 16267 itemsize 0
[  +0.000003] [T15127] 	item 4 key (22052864 FREE_SPACE_EXTENT 8355840) 
itemoff 16267 itemsize 0
[  +0.000002] [T15127] 	item 5 key (30408704 FREE_SPACE_INFO 268435456) 
itemoff 16259 itemsize 8
[  +0.000002] [T15127] 	item 6 key (30408704 FREE_SPACE_EXTENT 81920) 
itemoff 16259 itemsize 0
[  +0.000002] [T15127] 	item 7 key (30507008 FREE_SPACE_EXTENT 16384) 
itemoff 16259 itemsize 0
[  +0.000002] [T15127] 	item 8 key (30539776 FREE_SPACE_EXTENT 65536) 
itemoff 16259 itemsize 0
[  +0.000002] [T15127] 	item 9 key (30621696 FREE_SPACE_EXTENT 65536) 
itemoff 16259 itemsize 0
[  +0.000003] [T15127] 	item 10 key (30801920 FREE_SPACE_EXTENT 
268042240) itemoff 16259 itemsize 0
[  +0.000002] [T15127] BTRFS error (device loop1): block=30720000 write 
time tree block corruption detected
[  +0.000010] [T15127] BTRFS error (device loop1): block=30736384 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (0 QGROUP_STATUS 0) itemoff 16243 
itemsize 40
[  +0.000003] [T15127] 	item 1 key (0 QGROUP_INFO 5) itemoff 16203 
itemsize 40
[  +0.000002] [T15127] 	item 2 key (0 QGROUP_INFO 256) itemoff 16163 
itemsize 40
[  +0.000002] [T15127] 	item 3 key (0 QGROUP_LIMIT 5) itemoff 16123 
itemsize 40
[  +0.000002] [T15127] 	item 4 key (0 QGROUP_LIMIT 256) itemoff 16083 
itemsize 40
[  +0.000003] [T15127] BTRFS error (device loop1): block=30736384 write 
time tree block corruption detected
[  +0.000014] [T15127] BTRFS error (device loop1): block=30769152 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (2 ROOT_ITEM 0) itemoff 15844 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30703616 refs 1
[  +0.000002] [T15127] 	item 1 key (4 ROOT_ITEM 0) itemoff 15405 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30605312 refs 1
[  +0.000001] [T15127] 	item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[  +0.000002] [T15127] 		index 0 name_len 7
[  +0.000002] [T15127] 	item 3 key (5 ROOT_ITEM 0) itemoff 14949 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30687232 refs 1
[  +0.000002] [T15127] 	item 4 key (5 ROOT_REF 256) itemoff 14925 
itemsize 24
[  +0.000002] [T15127] 	item 5 key (6 INODE_ITEM 0) itemoff 14765 
itemsize 160
[  +0.000002] [T15127] 		inode generation 3 transid 0 size 0 nbytes 16384
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 0 flags 0x0
[  +0.000001] [T15127] 		atime 1769760651.0
[  +0.000002] [T15127] 		ctime 1769760651.0
[  +0.000002] [T15127] 		mtime 1769760651.0
[  +0.000001] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 6 key (6 INODE_REF 6) itemoff 14753 itemsize 12
[  +0.000002] [T15127] 		index 0 name_len 2
[  +0.000001] [T15127] 	item 7 key (6 DIR_ITEM 2378154706) itemoff 14716 
itemsize 37
[  +0.000003] [T15127] 		location key (5 132 18446744073709551615) type 2
[  +0.000001] [T15127] 		transid 3 data_len 0 name_len 7
[  +0.000002] [T15127] 	item 8 key (7 ROOT_ITEM 0) itemoff 14277 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30490624 refs 1
[  +0.000002] [T15127] 	item 9 key (8 ROOT_ITEM 0) itemoff 13838 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30736384 refs 1
[  +0.000001] [T15127] 	item 10 key (9 ROOT_ITEM 0) itemoff 13399 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30785536 refs 1
[  +0.000002] [T15127] 	item 11 key (10 ROOT_ITEM 0) itemoff 12960 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30720000 refs 1
[  +0.000001] [T15127] 	item 12 key (256 ROOT_ITEM 11) itemoff 12521 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30752768 refs 1
[  +0.000001] [T15127] 	item 13 key (256 ROOT_BACKREF 5) itemoff 12497 
itemsize 24
[  +0.000003] [T15127] 	item 14 key (18446744073709551607 ROOT_ITEM 0) 
itemoff 12058 itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30523392 refs 1
[  +0.000001] [T15127] BTRFS error (device loop1): block=30769152 write 
time tree block corruption detected
[  +0.000012] [T15127] BTRFS error (device loop1): block=30801920 bad 
generation, have 12 expect > 14
[  +0.000003] [T15127] 	item 0 key (0 QGROUP_STATUS 0) itemoff 16243 
itemsize 40
[  +0.000003] [T15127] 	item 1 key (0 QGROUP_INFO 5) itemoff 16203 
itemsize 40
[  +0.000002] [T15127] 	item 2 key (0 QGROUP_INFO 256) itemoff 16163 
itemsize 40
[  +0.000002] [T15127] 	item 3 key (0 QGROUP_INFO 257) itemoff 16123 
itemsize 40
[  +0.000002] [T15127] 	item 4 key (0 QGROUP_LIMIT 5) itemoff 16083 
itemsize 40
[  +0.000002] [T15127] 	item 5 key (0 QGROUP_LIMIT 256) itemoff 16043 
itemsize 40
[  +0.000002] [T15127] 	item 6 key (0 QGROUP_LIMIT 257) itemoff 16003 
itemsize 40
[  +0.000002] [T15127] BTRFS error (device loop1): block=30801920 write 
time tree block corruption detected
[  +0.000014] [T15127] BTRFS error (device loop1): block=30818304 bad 
generation, have 12 expect > 14
[  +0.000003] [T15127] 	item 0 key (256 INODE_ITEM 0) itemoff 16123 
itemsize 160
[  +0.000002] [T15127] 		inode generation 3 transid 11 size 10 nbytes 16384
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 1 flags 0x0
[  +0.000002] [T15127] 		atime 1769760651.0
[  +0.000001] [T15127] 		ctime 1769760652.250234845
[  +0.000002] [T15127] 		mtime 1769760652.250234845
[  +0.000001] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 1 key (256 INODE_REF 256) itemoff 16111 
itemsize 12
[  +0.000002] [T15127] 		index 0 name_len 2
[  +0.000002] [T15127] 	item 2 key (256 DIR_ITEM 2030520461) itemoff 
16076 itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000001] [T15127] 	item 3 key (256 DIR_INDEX 2) itemoff 16041 
itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000002] [T15127] 	item 4 key (257 INODE_ITEM 0) itemoff 15881 
itemsize 160
[  +0.000002] [T15127] 		inode generation 11 transid 12 size 24 nbytes 0
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 19 flags 0x0
[  +0.000001] [T15127] 		atime 1769760652.250234845
[  +0.000002] [T15127] 		ctime 1769760652.267621586
[  +0.000001] [T15127] 		mtime 1769760652.267621586
[  +0.000002] [T15127] 		otime 1769760652.250234845
[  +0.000002] [T15127] 	item 5 key (257 INODE_REF 256) itemoff 15866 
itemsize 15
[  +0.000002] [T15127] 		index 2 name_len 5
[  +0.000001] [T15127] 	item 6 key (257 DIR_ITEM 247980518) itemoff 
15830 itemsize 36
[  +0.000002] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000002] [T15127] 	item 7 key (257 DIR_ITEM 496439826) itemoff 
15794 itemsize 36
[  +0.000002] [T15127] 		location key (257 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 12 data_len 0 name_len 6
[  +0.000001] [T15127] 	item 8 key (257 DIR_INDEX 2) itemoff 15758 
itemsize 36
[  +0.000003] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000001] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000002] [T15127] 	item 9 key (257 DIR_INDEX 3) itemoff 15722 
itemsize 36
[  +0.000002] [T15127] 		location key (257 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 12 data_len 0 name_len 6
[  +0.000001] [T15127] BTRFS error (device loop1): block=30818304 write 
time tree block corruption detected
[  +0.000016] [T15127] BTRFS error (device loop1): block=30851072 bad 
generation, have 12 expect > 14
[  +0.000004] [T15127] 	item 0 key (2 ROOT_ITEM 0) itemoff 15844 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30867456 refs 1
[  +0.000001] [T15127] 	item 1 key (4 ROOT_ITEM 0) itemoff 15405 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30605312 refs 1
[  +0.000002] [T15127] 	item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[  +0.000002] [T15127] 		index 0 name_len 7
[  +0.000001] [T15127] 	item 3 key (5 ROOT_ITEM 0) itemoff 14949 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30818304 refs 1
[  +0.000002] [T15127] 	item 4 key (5 ROOT_REF 256) itemoff 14925 
itemsize 24
[  +0.000002] [T15127] 	item 5 key (5 ROOT_REF 257) itemoff 14901 
itemsize 24
[  +0.000002] [T15127] 	item 6 key (6 INODE_ITEM 0) itemoff 14741 
itemsize 160
[  +0.000002] [T15127] 		inode generation 3 transid 0 size 0 nbytes 16384
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000003] [T15127] 		rdev 0 sequence 0 flags 0x0
[  +0.000001] [T15127] 		atime 1769760651.0
[  +0.000002] [T15127] 		ctime 1769760651.0
[  +0.000003] [T15127] 		mtime 1769760651.0
[  +0.000002] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 7 key (6 INODE_REF 6) itemoff 14729 itemsize 12
[  +0.000003] [T15127] 		index 0 name_len 2
[  +0.000002] [T15127] 	item 8 key (6 DIR_ITEM 2378154706) itemoff 14692 
itemsize 37
[  +0.000003] [T15127] 		location key (5 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 3 data_len 0 name_len 7
[  +0.000002] [T15127] 	item 9 key (7 ROOT_ITEM 0) itemoff 14253 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30490624 refs 1
[  +0.000002] [T15127] 	item 10 key (8 ROOT_ITEM 0) itemoff 13814 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30801920 refs 1
[  +0.000003] [T15127] 	item 11 key (9 ROOT_ITEM 0) itemoff 13375 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30900224 refs 1
[  +0.000002] [T15127] 	item 12 key (10 ROOT_ITEM 0) itemoff 12936 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30883840 refs 1
[  +0.000002] [T15127] 	item 13 key (256 ROOT_ITEM 11) itemoff 12497 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30752768 refs 1
[  +0.000002] [T15127] 	item 14 key (256 ROOT_BACKREF 5) itemoff 12473 
itemsize 24
[  +0.000003] [T15127] 	item 15 key (257 ROOT_ITEM 12) itemoff 12034 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30834688 refs 1
[  +0.000002] [T15127] 	item 16 key (257 ROOT_BACKREF 5) itemoff 12010 
itemsize 24
[  +0.000003] [T15127] 	item 17 key (18446744073709551607 ROOT_ITEM 0) 
itemoff 11571 itemsize 439
[  +0.000004] [T15127] 		root data bytenr 30523392 refs 1
[  +0.000002] [T15127] BTRFS error (device loop1): block=30851072 write 
time tree block corruption detected

and a lot more lines with the same generation errors for btrfs/122 
btrfs/152 btrfs/210 btrfs/224 btrfs/316 btrfs/320 btrfs/340 fstest cases.

I have no idea why it's trying to write some ebs older than current 
transaction. Seems related with snapshots.

> To fix this, we need to check the DIRTY flag again to prevent writing a 
> eb which has some new data written, and lock the eb before we really 
> doing io related things. I'm not farmilar with io related code so please 
> correct me if I got anything wrong.
> 
> Thanks,
> 
> Sun Yangkai



  reply	other threads:[~2026-01-30  9:37 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-27 20:42 [PATCH] btrfs: prevent COW amplification during btrfs_search_slot Leo Martins
2026-01-28 21:48 ` Qu Wenruo
2026-01-29 19:30   ` Leo Martins
2026-01-29 11:52 ` Filipe Manana
2026-01-30  0:12   ` Leo Martins
2026-01-30  4:14     ` Sun YangKai
2026-01-30  9:37       ` Sun YangKai [this message]
2026-01-30 15:50         ` Sun YangKai
2026-01-30 16:11           ` Filipe Manana
2026-01-31  9:16             ` Sun YangKai
2026-01-30 12:49     ` Filipe Manana
2026-01-30 15:43       ` Boris Burkov
2026-01-30 15:57         ` Filipe Manana
2026-02-03  1:09           ` Leo Martins
2026-01-30 21:43       ` Leo Martins
2026-01-30 22:34         ` Qu Wenruo
2026-01-31  0:11           ` Boris Burkov
2026-01-31  1:06             ` Qu Wenruo
2026-01-31 17:16               ` Boris Burkov
2026-01-31 21:59                 ` Qu Wenruo
2026-02-10  7:45 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e5eee424-303d-423b-aead-2eccbf63b8ec@gmail.com \
    --to=sunk67188@gmail.com \
    --cc=fdmanana@kernel.org \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=loemra.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.