Re: [PATCH] btrfs: prevent COW amplification during btrfs_search_slot

From: Sun YangKai <sunk67188@gmail.com>
To: Leo Martins <loemra.dev@gmail.com>, Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: prevent COW amplification during btrfs_search_slot
Date: Fri, 30 Jan 2026 17:37:15 +0800
Message-ID: <e5eee424-303d-423b-aead-2eccbf63b8ec@gmail.com>
In-Reply-To: <df47b1c0-c25e-4501-aaa0-bc73ce1fdc00@gmail.com>



On 2026/1/30 12:14, Sun YangKai wrote:
> On 2026/1/30 08:12, Leo Martins wrote:
>> On Thu, 29 Jan 2026 11:52:07 +0000 Filipe Manana <fdmanana@kernel.org> wrote:
>>> On Tue, Jan 27, 2026 at 8:43 PM Leo Martins <loemra.dev@gmail.com> wrote:
>>>> I've been investigating ENOSPC errors at Meta and have observed a strange
>>>> pattern where filesystems hit ENOSPC despite having lots of unallocated
>>>> space (> 100G). Sample dmesg dump at bottom of message.
>>>>
>>>> btrfs_insert_delayed_dir_index is attempting to migrate some reservation
>>>> from the transaction block reserve and finding it exhausted, leading to a
>>>> warning and ENOSPC. This is a bug, as the reservations are meant to be
>>>> worst case. It should be impossible to exhaust the transaction block
>>>> reserve.
>>>>
>>>> Some tracing of affected hosts revealed that there were single
>>>> btrfs_search_slot calls that were COWing 100s of times. I was able to
>>>> reproduce this behavior locally by creating a very constrained cgroup
>>>> and producing a lot of concurrent filesystem operations. Here's the
>>>> pattern:
>>>>
>>>>   1. btrfs_search_slot() begins tree traversal with cow=1
>>>>   2. Node at level N needs COW (old generation or WRITTEN flag set)
>>>>   3. btrfs_cow_block() allocates new node, updates parent pointer
>>>>   4. Traversal continues, but hits a condition requiring restart
>>>>      (e.g., node not cached, lock contention, need higher
>>>>      write_lock_level)
>>>>   5. btrfs_release_path() releases all locks and references
>>>>   6. Memory pressure triggers writeback on the COW'd node
>>>>   7. lock_extent_buffer_for_io() clears EXTENT_BUFFER_DIRTY and sets
>>>>      BTRFS_HEADER_FLAG_WRITTEN
>>>>   8. goto again - traversal restarts from root
>>>>   9. Traversal reaches the freshly COW'd node
>>>>   10. should_cow_block() sees WRITTEN flag set, returns true
>>>>   11. btrfs_cow_block() allocates another new node - same logical
>>>>       position, new physical location, new reservation consumed
>>>>   12. Steps 4-11 repeat indefinitely under sustained memory pressure
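
The restart loop in steps 4-11 can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: struct eb_model, needs_cow(), search_once() and simulate_search() are hypothetical names, not kernel code, and the model assumes writeback always manages to run between a restart and the next traversal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state should_cow_block() looks at. */
struct eb_model {
	uint64_t generation; /* transaction that created this buffer */
	bool written;        /* models BTRFS_HEADER_FLAG_WRITTEN */
};

/* Current rule: COW on an old generation or when WRITTEN is set. */
static bool needs_cow(const struct eb_model *eb, uint64_t transid)
{
	return eb->generation != transid || eb->written;
}

/*
 * One traversal attempt over a single node. Returns true if the node was
 * COWed; the new copy belongs to the running transaction and is dirty.
 */
static bool search_once(struct eb_model *eb, uint64_t transid)
{
	if (!needs_cow(eb, transid))
		return false;
	eb->generation = transid;
	eb->written = false;
	return true;
}

/*
 * Count COWs for one logical search that is restarted 'restarts' times.
 * If 'writeback_between' is true, writeback marks the node WRITTEN after
 * every restart (steps 6-7), which forces another COW (steps 9-11).
 */
static unsigned int simulate_search(unsigned int restarts, bool writeback_between)
{
	struct eb_model eb = { .generation = 1, .written = true };
	const uint64_t transid = 2;
	unsigned int cows = 0;

	for (unsigned int attempt = 0; attempt <= restarts; attempt++) {
		if (search_once(&eb, transid))
			cows++;
		if (attempt < restarts && writeback_between)
			eb.written = true;
	}
	return cows;
}
```

In this model simulate_search(100, true) performs one COW per attempt, while simulate_search(100, false) performs a single COW no matter how often the search restarts. Each extra COW stands for one more block taken from a reservation that was sized for the single-COW case.
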
>>>>
>>>> Note this behavior should be much harder to trigger since Boris's
>>>> AS_KERNEL_FILE changes that make it so that extent_buffer pages aren't
>>>> accounted for in user cgroups. However, I believe it
>>>> would still be an issue under global memory pressure.
>>>> Link: https://lore.kernel.org/linux-btrfs/cover.1755812945.git.boris@bur.io/
>>>>
>>>> This COW amplification breaks the idea that transaction reservations are
>>>> worst case, as any btrfs_search_slot() call could find itself in this COW
>>>> loop and exhaust its reservation.
>>>>
>>>> My proposed solution is to temporarily pin extent buffers for the
>>>> lifetime of btrfs_search_slot. This prevents the massive COW
>>>> amplification that can be seen during high memory pressure.
>>>>
>>>> The implementation uses a local xarray to track COW'd buffers for the
>>>> duration of the search. The xarray stores extent_buffer pointers without
>>>> taking additional references; this is safe because tracked buffers remain
>>>> dirty (writeback_blockers prevents the dirty bit from being cleared) and
>>>> dirty buffers cannot be reclaimed by memory pressure.
>>>>
>>>> Synchronization is provided by eb->lock: increments in
>>>> btrfs_search_slot_track_cow() occur while holding the write lock, and
>>>> the check in lock_extent_buffer_for_io() also holds the write lock via
>>>> btrfs_tree_lock(). Decrements don't require eb->lock because
>>>> writeback_blockers is atomic and merely indicates "don't write yet".
>>>> Once we decrement, we're done and don't care if writeback proceeds
>>>> immediately.
>>> This seems too complex to me.
>>>
>>> So this problem is very similar to some idea I had a few years ago but
>>> never managed to implement.
>>> It was about avoiding unnecessary COW, not for this space reservation
>>> exhaustion due to sustained memory pressure, but it would solve it
>>> too.
>>>
>>> The idea was that we do unnecessary COW in cases like this:
>>>
>>> 1) We COW a path in some tree and we are at transaction N;
>>>
>>> 2) Writeback happened for the extent buffers in that path while we are
>>> in the same transaction, because we reached the 32M limit and some
>>> task called btrfs_btree_balance_dirty() or something else triggered
>>> writeback of the btree inode;
>>>
>>> 3) While still at transaction N, we visit the same path to add an item
>>> to a leaf, or modify an item, whatever. Because the extent buffers
>>> have BTRFS_HEADER_FLAG_WRITTEN, we COW them again (should_cow_block()
>>> returns true).
>>>
>>> So during the lifetime of a transaction we can have a lot of
>>> unnecessary COW - we spend more time allocating extents, allocating
>>> memory, copying extent buffer data, use more space per transaction,
>>> etc.
>>>
>>> The idea was to not COW when an extent buffer has
>>> BTRFS_HEADER_FLAG_WRITTEN set, but only if its generation
>>> (btrfs_header_generation(eb)) matches the current transaction.
>>> That is safe because there's no committed tree that points to an
>>> extent buffer created in the current transaction.
>>>
>>> Any further modification to the extent buffer must be sure that the
>>> EXTENT_BUFFER_DIRTY flag is set, that the eb range is still in the
>>> transaction's dirty_pages io tree, etc, so that we don't miss writing
>>> the extent buffer to the same location again before the transaction
>>> commits the superblocks.
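
The rule proposed above can be written down as a small predicate. This is a userspace sketch, not the kernel's should_cow_block(): struct eb_state, should_cow_current() and should_cow_proposed() are hypothetical names, the 'dirty' field only stands in for the re-dirtying requirement described in the last paragraph, and the FORCE_COW and relocation checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an extent buffer. */
struct eb_state {
	uint64_t generation; /* models btrfs_header_generation(eb) */
	bool written;        /* models BTRFS_HEADER_FLAG_WRITTEN */
	bool dirty;          /* models EXTENT_BUFFER_DIRTY */
};

/* Existing behavior: a WRITTEN buffer is always COWed again. */
static bool should_cow_current(const struct eb_state *eb, uint64_t transid)
{
	if (eb->generation != transid)
		return true;
	return eb->written;
}

/*
 * Proposed behavior: a buffer created in the running transaction is
 * reused even if it was already written, because no committed tree can
 * point to it. The buffer must be marked dirty again so that it is
 * rewritten to the same location before the transaction commits.
 */
static bool should_cow_proposed(struct eb_state *eb, uint64_t transid)
{
	if (eb->generation != transid)
		return true;
	if (eb->written)
		eb->dirty = true;
	return false;
}
```

The two predicates only differ for a buffer whose generation equals the running transaction and that has WRITTEN set: the current rule COWs it again, the proposed one reuses it and re-dirties it.
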
>>>
>>> Have you considered an approach like this?
>> I had not considered this, but it is a great idea.
>>
>> My first thought is that implementing this could be as simple
>> as removing the BTRFS_HEADER_FLAG_WRITTEN check. However, this
>> would mess with the assumptions around the log tree. From
>> btrfs_sync_log():
> After a quick look and some tests, I found that things might not be that
> easy. The problem is not only the log tree.
>> /*
>>   * IO has been started, blocks of the log tree have WRITTEN flag set
>>   * in their headers. new modifications of the log will be written to
>>   * new positions. so it's safe to allow log writers to go in.
>>   */
>>
>> ^ Assumes that WRITTEN blocks will be COW'd.
>>
>> The issue looks like:
>>
>>   1. fsync A COWs eb
>>   2. fsync A lock_extent_buffer_for_io(); sets WRITTEN, unlocks tree
>>   3. fsync B does __not__ COW eb and modifies it
>>   4. fsync A writes modified eb to disk
>>   5. CRASH; the log tree is corrupted
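
The five steps can be replayed sequentially in userspace to see why the in-place update is dangerous. A minimal sketch, assuming the block content reaches disk only after fsync B has run; struct log_eb, replay_fsync_race() and the single-integer payload are hypothetical simplifications, not kernel structures.

```c
#include <stdbool.h>

/* Hypothetical model of one log tree block. */
struct log_eb {
	int payload;  /* stands in for the block's items */
	bool written; /* models BTRFS_HEADER_FLAG_WRITTEN */
};

/*
 * Replays steps 1-4. Returns true if the content that reaches disk is
 * exactly what fsync A submitted.
 */
static bool replay_fsync_race(bool cow_written_blocks)
{
	struct log_eb eb = { .payload = 1, .written = false }; /* 1. A COWs eb */
	struct log_eb cow_copy;
	struct log_eb *target = &eb;
	int submitted;
	int on_disk;

	/* 2. A starts I/O: WRITTEN is set and the tree lock is dropped. */
	eb.written = true;
	submitted = eb.payload;

	/* 3. B modifies the log; with COW it gets a private copy first. */
	if (cow_written_blocks && eb.written) {
		cow_copy = eb;
		cow_copy.written = false;
		target = &cow_copy;
	}
	target->payload = 2;

	/* 4. A's write completes using whatever eb holds at that point. */
	on_disk = eb.payload;

	return on_disk == submitted;
}
```

With cow_written_blocks set, fsync B works on its own copy and the block written by fsync A is unchanged. Without it, the on-disk block contains B's modification, which fsync A never accounted for.
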
>>
>> One way to avoid that is to keep the current behavior for the log
>> tree, but that leaves the potential for COW amplification...
> I tested with a patch like this:
> @@ -624,14 +624,18 @@ static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
>          if (btrfs_header_generation(buf) != trans->transid)
>                  return true;
> 
> -       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
> -               return true;
> -
>          /* Ensure we can see the FORCE_COW bit. */
>          smp_mb__before_atomic();
>          if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
>                  return true;
> 
> +       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
> +               if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
> +                       return true;
> +               btrfs_mark_buffer_dirty(trans, buf);
> +               return false;
> +       }
> +
>          if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
> 
>                  return false;
> 
> And get some errors like this:
> 
> 
> [  +0.090163] [ T2589] run fstests btrfs/004 at 2026-01-30 11:53:37
> [  +0.432352] [T11685] BTRFS: device fsid 1fb397fc-97a7-44dd-9602-dd38b74bc391 devid 1 transid 8 /dev/loop1 (7:1) scanned by mount (11685)
> [  +0.000351] [T11685] BTRFS info (device loop1): first mount of filesystem 1fb397fc-97a7-44dd-9602-dd38b74bc391
> [  +0.000014] [T11685] BTRFS info (device loop1): using crc32c (crc32c-lib) checksum algorithm
> [  +0.001298] [T11685] BTRFS info (device loop1): checking UUID tree
> [  +0.000039] [T11685] BTRFS info (device loop1): enabling ssd optimizations
> [  +0.000003] [T11685] BTRFS info (device loop1): turning on async discard
> [  +0.000002] [T11685] BTRFS info (device loop1): enabling free space tree
> [  +1.051781] [T11703] page: refcount:2 mapcount:0 mapping:00000000eb6d7caa index:0x2348 pfn:0x1caebf
> [  +0.000008] [T11703] memcg:ffff9b3300263cc0
> [  +0.000003] [T11703] aops:0xffffffffc0354040 ino:1
> [  +0.000024] [T11703] flags: 0x4e0000000000423e(referenced|uptodate|dirty|lru|workingset|private|writeback|zone=1)
> [  +0.000007] [T11703] raw: 4e0000000000423e fffff74a872bb908 fffff74a84206a88 ffff9b33c6706880
> [  +0.000004] [T11703] raw: 0000000000002348 ffff9b334be522d0 00000002ffffffff ffff9b3300263cc0
> [  +0.000002] [T11703] page dumped because: eb page dump
> [  +0.000003] [T11703] BTRFS critical (device loop1): corrupt leaf: root=5 block=36995072 slot=118 ino=406 file_offset=94208, invalid ram_bytes for file extent, have 8660273067269322872, should be aligned to 4096
> [  +0.000013] [T11703] BTRFS info (device loop1): leaf 36995072 gen 33 total ptrs 128 free space 2857 owner 5
> [  +0.000006] [T11703]     item 0 key (386 DIR_ITEM 238230307) itemoff 16249 itemsize 34
> [  +0.000004] [T11703]         location key (462 1 0) type 2
> [  +0.000003] [T11703]         transid 33 data_len 0 name_len 4
> [  +0.000003] [T11703]     item 1 key (386 DIR_ITEM 1473745676) itemoff 16216 itemsize 33
> [  +0.000004] [T11703]         location key (376 1 0) type 3
> [  +0.000002] [T11703]         transid 30 data_len 0 name_len 3
> [  +0.000003] [T11703]     item 2 key (386 DIR_ITEM 2243137595) itemoff 16182 itemsize 34
> [  +0.000004] [T11703]         location key (413 1 0) type 1
> [  +0.000002] [T11703]         transid 32 data_len 0 name_len 4
> ...
> [  +0.000001] [T11703]     item 127 key (405 DIR_ITEM 828387202) itemoff 6057 itemsize 34
> [  +0.000002] [T11703]         location key (479 1 0) type 3
> [  +0.000001] [T11703]         transid 33 data_len 0 name_len 4
> [  +0.000002] [T11703] BTRFS error (device loop1): block=36995072 write time tree block corruption detected
> [  +0.003429] [T11703] BTRFS: error (device loop1) in btrfs_commit_transaction:2555: errno=-5 IO failure (Error while writing out transaction)
> [  +0.000007] [T11703] BTRFS info (device loop1 state E): forced readonly
> [  +0.000002] [T11703] BTRFS warning (device loop1 state E): Skipping commit of aborted transaction.
> [  +0.000002] [T11703] BTRFS error (device loop1 state EA): Transaction aborted (error -5)
> [  +0.000003] [T11703] BTRFS: error (device loop1 state EA) in cleanup_transaction:2037: errno=-5 IO failure
> 
> The reported inode 406 is not even in the printed leaf. This looks like a
> data race, possibly caused by the following:
> 
> During writeback we unlock the eb after setting the WRITTEN flag, and the
> eb should not be modified after that because all future writes go to the
> COWed eb. However, with the WRITTEN flag check removed from
> should_cow_block(), we might write to an eb that has the WRITTEN flag set
> and might still be under I/O.

I tried again with this:

@@ -624,14 +624,20 @@ static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
         if (btrfs_header_generation(buf) != trans->transid)
                 return true;

-       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
-               return true;
-
         /* Ensure we can see the FORCE_COW bit. */
         smp_mb__before_atomic();
         if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
                 return true;

+       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
+               if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
+                       return true;
+               if (test_bit(EXTENT_BUFFER_WRITEBACK, &buf->bflags))
+                       return true;
+               btrfs_mark_buffer_dirty(trans, buf);
+               return false;
+       }
+
         if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
                 return false;
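
For reference, the order of checks after this change can be mirrored in a userspace predicate, which makes it easy to enumerate the cases. This is a rough sketch only: struct cow_inputs and should_cow_model() are hypothetical names, the boolean fields stand in for the kernel flags named in the comments, and the memory barrier and the relocation checks that follow in the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened inputs of the modified should_cow_block(). */
struct cow_inputs {
	uint64_t eb_generation; /* btrfs_header_generation(buf) */
	uint64_t transid;       /* trans->transid */
	bool root_force_cow;    /* BTRFS_ROOT_FORCE_COW set on the root */
	bool is_log_root;       /* root id is BTRFS_TREE_LOG_OBJECTID */
	bool written;           /* BTRFS_HEADER_FLAG_WRITTEN */
	bool under_writeback;   /* EXTENT_BUFFER_WRITEBACK */
};

/*
 * Returns true if the block must be COWed. '*redirty' is set when a
 * WRITTEN block is reused and therefore has to be marked dirty again.
 */
static bool should_cow_model(const struct cow_inputs *in, bool *redirty)
{
	*redirty = false;

	if (in->eb_generation != in->transid)
		return true;
	if (in->root_force_cow)
		return true;
	if (in->written) {
		if (in->is_log_root)
			return true; /* keep the log tree assumption */
		if (in->under_writeback)
			return true; /* I/O in flight, do not touch the eb */
		*redirty = true;
		return false;
	}
	return false;
}
```

Only one combination reuses a WRITTEN block: same generation, no FORCE_COW, not a log root, and no writeback in flight.
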

When WRITEBACK is set, this does a normal COW to prevent the data race. That
seems to fix the previous problem. However, I now get this:

[  +0.020843] [T15127] BTRFS error (device loop1): block=30687232 bad 
generation, have 11 expect > 14
[  +0.000009] [T15127] 	item 0 key (256 INODE_ITEM 0) itemoff 16123 
itemsize 160
[  +0.000004] [T15127] 		inode generation 3 transid 11 size 10 nbytes 16384
[  +0.000003] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 1 flags 0x0
[  +0.000002] [T15127] 		atime 1769760651.0
[  +0.000002] [T15127] 		ctime 1769760652.250234845
[  +0.000002] [T15127] 		mtime 1769760652.250234845
[  +0.000001] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 1 key (256 INODE_REF 256) itemoff 16111 
itemsize 12
[  +0.000003] [T15127] 		index 0 name_len 2
[  +0.000002] [T15127] 	item 2 key (256 DIR_ITEM 2030520461) itemoff 
16076 itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000002] [T15127] 	item 3 key (256 DIR_INDEX 2) itemoff 16041 
itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000002] [T15127] 	item 4 key (257 INODE_ITEM 0) itemoff 15881 
itemsize 160
[  +0.000002] [T15127] 		inode generation 11 transid 11 size 12 nbytes 0
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 19 flags 0x0
[  +0.000001] [T15127] 		atime 1769760652.250234845
[  +0.000002] [T15127] 		ctime 1769760652.256913323
[  +0.000002] [T15127] 		mtime 1769760652.256913323
[  +0.000001] [T15127] 		otime 1769760652.250234845
[  +0.000002] [T15127] 	item 5 key (257 INODE_REF 256) itemoff 15866 
itemsize 15
[  +0.000002] [T15127] 		index 2 name_len 5
[  +0.000002] [T15127] 	item 6 key (257 DIR_ITEM 247980518) itemoff 
15830 itemsize 36
[  +0.000002] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000002] [T15127] 	item 7 key (257 DIR_INDEX 2) itemoff 15794 
itemsize 36
[  +0.000002] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000001] [T15127] BTRFS error (device loop1): block=30687232 write 
time tree block corruption detected
[  +0.000017] [T15127] BTRFS error (device loop1): block=30703616 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (13631488 BLOCK_GROUP_ITEM 8388608) 
itemoff 16259 itemsize 24
[  +0.000003] [T15127] 		block group used 0 chunk_objectid 256 flags 1
[  +0.000002] [T15127] 	item 1 key (22020096 BLOCK_GROUP_ITEM 8388608) 
itemoff 16235 itemsize 24
[  +0.000002] [T15127] 		block group used 16384 chunk_objectid 256 flags 34
[  +0.000002] [T15127] 	item 2 key (22036480 METADATA_ITEM 0) itemoff 
16202 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 8 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 3
[  +0.000003] [T15127] 	item 3 key (30408704 BLOCK_GROUP_ITEM 268435456) 
itemoff 16178 itemsize 24
[  +0.000002] [T15127] 		block group used 163840 chunk_objectid 256 flags 36
[  +0.000002] [T15127] 	item 4 key (30490624 METADATA_ITEM 0) itemoff 
16145 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 5 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 7
[  +0.000002] [T15127] 	item 5 key (30523392 METADATA_ITEM 0) itemoff 
16112 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 5 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 18446744073709551607
[  +0.000002] [T15127] 	item 6 key (30605312 METADATA_ITEM 0) itemoff 
16079 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 9 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 4
[  +0.000002] [T15127] 	item 7 key (30687232 METADATA_ITEM 0) itemoff 
16046 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 5
[  +0.000002] [T15127] 	item 8 key (30703616 METADATA_ITEM 0) itemoff 
16013 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 2
[  +0.000002] [T15127] 	item 9 key (30720000 METADATA_ITEM 0) itemoff 
15980 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 10
[  +0.000002] [T15127] 	item 10 key (30736384 METADATA_ITEM 0) itemoff 
15947 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 8
[  +0.000002] [T15127] 	item 11 key (30752768 METADATA_ITEM 0) itemoff 
15914 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 256
[  +0.000002] [T15127] 	item 12 key (30769152 METADATA_ITEM 0) itemoff 
15881 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 1
[  +0.000002] [T15127] 	item 13 key (30785536 METADATA_ITEM 0) itemoff 
15848 itemsize 33
[  +0.000002] [T15127] 		extent refs 1 gen 11 flags 2
[  +0.000002] [T15127] 		ref#0: tree block backref root 9
[  +0.000002] [T15127] BTRFS error (device loop1): block=30703616 write 
time tree block corruption detected
[  +0.000012] [T15127] BTRFS error (device loop1): block=30720000 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (13631488 FREE_SPACE_INFO 8388608) 
itemoff 16275 itemsize 8
[  +0.000002] [T15127] 	item 1 key (13631488 FREE_SPACE_EXTENT 8388608) 
itemoff 16275 itemsize 0
[  +0.000002] [T15127] 	item 2 key (22020096 FREE_SPACE_INFO 8388608) 
itemoff 16267 itemsize 8
[  +0.000002] [T15127] 	item 3 key (22020096 FREE_SPACE_EXTENT 16384) 
itemoff 16267 itemsize 0
[  +0.000003] [T15127] 	item 4 key (22052864 FREE_SPACE_EXTENT 8355840) 
itemoff 16267 itemsize 0
[  +0.000002] [T15127] 	item 5 key (30408704 FREE_SPACE_INFO 268435456) 
itemoff 16259 itemsize 8
[  +0.000002] [T15127] 	item 6 key (30408704 FREE_SPACE_EXTENT 81920) 
itemoff 16259 itemsize 0
[  +0.000002] [T15127] 	item 7 key (30507008 FREE_SPACE_EXTENT 16384) 
itemoff 16259 itemsize 0
[  +0.000002] [T15127] 	item 8 key (30539776 FREE_SPACE_EXTENT 65536) 
itemoff 16259 itemsize 0
[  +0.000002] [T15127] 	item 9 key (30621696 FREE_SPACE_EXTENT 65536) 
itemoff 16259 itemsize 0
[  +0.000003] [T15127] 	item 10 key (30801920 FREE_SPACE_EXTENT 
268042240) itemoff 16259 itemsize 0
[  +0.000002] [T15127] BTRFS error (device loop1): block=30720000 write 
time tree block corruption detected
[  +0.000010] [T15127] BTRFS error (device loop1): block=30736384 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (0 QGROUP_STATUS 0) itemoff 16243 
itemsize 40
[  +0.000003] [T15127] 	item 1 key (0 QGROUP_INFO 5) itemoff 16203 
itemsize 40
[  +0.000002] [T15127] 	item 2 key (0 QGROUP_INFO 256) itemoff 16163 
itemsize 40
[  +0.000002] [T15127] 	item 3 key (0 QGROUP_LIMIT 5) itemoff 16123 
itemsize 40
[  +0.000002] [T15127] 	item 4 key (0 QGROUP_LIMIT 256) itemoff 16083 
itemsize 40
[  +0.000003] [T15127] BTRFS error (device loop1): block=30736384 write 
time tree block corruption detected
[  +0.000014] [T15127] BTRFS error (device loop1): block=30769152 bad 
generation, have 11 expect > 14
[  +0.000004] [T15127] 	item 0 key (2 ROOT_ITEM 0) itemoff 15844 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30703616 refs 1
[  +0.000002] [T15127] 	item 1 key (4 ROOT_ITEM 0) itemoff 15405 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30605312 refs 1
[  +0.000001] [T15127] 	item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[  +0.000002] [T15127] 		index 0 name_len 7
[  +0.000002] [T15127] 	item 3 key (5 ROOT_ITEM 0) itemoff 14949 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30687232 refs 1
[  +0.000002] [T15127] 	item 4 key (5 ROOT_REF 256) itemoff 14925 
itemsize 24
[  +0.000002] [T15127] 	item 5 key (6 INODE_ITEM 0) itemoff 14765 
itemsize 160
[  +0.000002] [T15127] 		inode generation 3 transid 0 size 0 nbytes 16384
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 0 flags 0x0
[  +0.000001] [T15127] 		atime 1769760651.0
[  +0.000002] [T15127] 		ctime 1769760651.0
[  +0.000002] [T15127] 		mtime 1769760651.0
[  +0.000001] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 6 key (6 INODE_REF 6) itemoff 14753 itemsize 12
[  +0.000002] [T15127] 		index 0 name_len 2
[  +0.000001] [T15127] 	item 7 key (6 DIR_ITEM 2378154706) itemoff 14716 
itemsize 37
[  +0.000003] [T15127] 		location key (5 132 18446744073709551615) type 2
[  +0.000001] [T15127] 		transid 3 data_len 0 name_len 7
[  +0.000002] [T15127] 	item 8 key (7 ROOT_ITEM 0) itemoff 14277 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30490624 refs 1
[  +0.000002] [T15127] 	item 9 key (8 ROOT_ITEM 0) itemoff 13838 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30736384 refs 1
[  +0.000001] [T15127] 	item 10 key (9 ROOT_ITEM 0) itemoff 13399 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30785536 refs 1
[  +0.000002] [T15127] 	item 11 key (10 ROOT_ITEM 0) itemoff 12960 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30720000 refs 1
[  +0.000001] [T15127] 	item 12 key (256 ROOT_ITEM 11) itemoff 12521 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30752768 refs 1
[  +0.000001] [T15127] 	item 13 key (256 ROOT_BACKREF 5) itemoff 12497 
itemsize 24
[  +0.000003] [T15127] 	item 14 key (18446744073709551607 ROOT_ITEM 0) 
itemoff 12058 itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30523392 refs 1
[  +0.000001] [T15127] BTRFS error (device loop1): block=30769152 write 
time tree block corruption detected
[  +0.000012] [T15127] BTRFS error (device loop1): block=30801920 bad 
generation, have 12 expect > 14
[  +0.000003] [T15127] 	item 0 key (0 QGROUP_STATUS 0) itemoff 16243 
itemsize 40
[  +0.000003] [T15127] 	item 1 key (0 QGROUP_INFO 5) itemoff 16203 
itemsize 40
[  +0.000002] [T15127] 	item 2 key (0 QGROUP_INFO 256) itemoff 16163 
itemsize 40
[  +0.000002] [T15127] 	item 3 key (0 QGROUP_INFO 257) itemoff 16123 
itemsize 40
[  +0.000002] [T15127] 	item 4 key (0 QGROUP_LIMIT 5) itemoff 16083 
itemsize 40
[  +0.000002] [T15127] 	item 5 key (0 QGROUP_LIMIT 256) itemoff 16043 
itemsize 40
[  +0.000002] [T15127] 	item 6 key (0 QGROUP_LIMIT 257) itemoff 16003 
itemsize 40
[  +0.000002] [T15127] BTRFS error (device loop1): block=30801920 write 
time tree block corruption detected
[  +0.000014] [T15127] BTRFS error (device loop1): block=30818304 bad 
generation, have 12 expect > 14
[  +0.000003] [T15127] 	item 0 key (256 INODE_ITEM 0) itemoff 16123 
itemsize 160
[  +0.000002] [T15127] 		inode generation 3 transid 11 size 10 nbytes 16384
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 1 flags 0x0
[  +0.000002] [T15127] 		atime 1769760651.0
[  +0.000001] [T15127] 		ctime 1769760652.250234845
[  +0.000002] [T15127] 		mtime 1769760652.250234845
[  +0.000001] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 1 key (256 INODE_REF 256) itemoff 16111 
itemsize 12
[  +0.000002] [T15127] 		index 0 name_len 2
[  +0.000002] [T15127] 	item 2 key (256 DIR_ITEM 2030520461) itemoff 
16076 itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000001] [T15127] 	item 3 key (256 DIR_INDEX 2) itemoff 16041 
itemsize 35
[  +0.000002] [T15127] 		location key (257 1 0) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 5
[  +0.000002] [T15127] 	item 4 key (257 INODE_ITEM 0) itemoff 15881 
itemsize 160
[  +0.000002] [T15127] 		inode generation 11 transid 12 size 24 nbytes 0
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000002] [T15127] 		rdev 0 sequence 19 flags 0x0
[  +0.000001] [T15127] 		atime 1769760652.250234845
[  +0.000002] [T15127] 		ctime 1769760652.267621586
[  +0.000001] [T15127] 		mtime 1769760652.267621586
[  +0.000002] [T15127] 		otime 1769760652.250234845
[  +0.000002] [T15127] 	item 5 key (257 INODE_REF 256) itemoff 15866 
itemsize 15
[  +0.000002] [T15127] 		index 2 name_len 5
[  +0.000001] [T15127] 	item 6 key (257 DIR_ITEM 247980518) itemoff 
15830 itemsize 36
[  +0.000002] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000002] [T15127] 	item 7 key (257 DIR_ITEM 496439826) itemoff 
15794 itemsize 36
[  +0.000002] [T15127] 		location key (257 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 12 data_len 0 name_len 6
[  +0.000001] [T15127] 	item 8 key (257 DIR_INDEX 2) itemoff 15758 
itemsize 36
[  +0.000003] [T15127] 		location key (256 132 18446744073709551615) type 2
[  +0.000001] [T15127] 		transid 11 data_len 0 name_len 6
[  +0.000002] [T15127] 	item 9 key (257 DIR_INDEX 3) itemoff 15722 
itemsize 36
[  +0.000002] [T15127] 		location key (257 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 12 data_len 0 name_len 6
[  +0.000001] [T15127] BTRFS error (device loop1): block=30818304 write 
time tree block corruption detected
[  +0.000016] [T15127] BTRFS error (device loop1): block=30851072 bad 
generation, have 12 expect > 14
[  +0.000004] [T15127] 	item 0 key (2 ROOT_ITEM 0) itemoff 15844 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30867456 refs 1
[  +0.000001] [T15127] 	item 1 key (4 ROOT_ITEM 0) itemoff 15405 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30605312 refs 1
[  +0.000002] [T15127] 	item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[  +0.000002] [T15127] 		index 0 name_len 7
[  +0.000001] [T15127] 	item 3 key (5 ROOT_ITEM 0) itemoff 14949 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30818304 refs 1
[  +0.000002] [T15127] 	item 4 key (5 ROOT_REF 256) itemoff 14925 
itemsize 24
[  +0.000002] [T15127] 	item 5 key (5 ROOT_REF 257) itemoff 14901 
itemsize 24
[  +0.000002] [T15127] 	item 6 key (6 INODE_ITEM 0) itemoff 14741 
itemsize 160
[  +0.000002] [T15127] 		inode generation 3 transid 0 size 0 nbytes 16384
[  +0.000002] [T15127] 		block group 0 mode 40755 links 1 uid 0 gid 0
[  +0.000003] [T15127] 		rdev 0 sequence 0 flags 0x0
[  +0.000001] [T15127] 		atime 1769760651.0
[  +0.000002] [T15127] 		ctime 1769760651.0
[  +0.000003] [T15127] 		mtime 1769760651.0
[  +0.000002] [T15127] 		otime 1769760651.0
[  +0.000002] [T15127] 	item 7 key (6 INODE_REF 6) itemoff 14729 itemsize 12
[  +0.000003] [T15127] 		index 0 name_len 2
[  +0.000002] [T15127] 	item 8 key (6 DIR_ITEM 2378154706) itemoff 14692 
itemsize 37
[  +0.000003] [T15127] 		location key (5 132 18446744073709551615) type 2
[  +0.000002] [T15127] 		transid 3 data_len 0 name_len 7
[  +0.000002] [T15127] 	item 9 key (7 ROOT_ITEM 0) itemoff 14253 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30490624 refs 1
[  +0.000002] [T15127] 	item 10 key (8 ROOT_ITEM 0) itemoff 13814 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30801920 refs 1
[  +0.000003] [T15127] 	item 11 key (9 ROOT_ITEM 0) itemoff 13375 
itemsize 439
[  +0.000002] [T15127] 		root data bytenr 30900224 refs 1
[  +0.000002] [T15127] 	item 12 key (10 ROOT_ITEM 0) itemoff 12936 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30883840 refs 1
[  +0.000002] [T15127] 	item 13 key (256 ROOT_ITEM 11) itemoff 12497 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30752768 refs 1
[  +0.000002] [T15127] 	item 14 key (256 ROOT_BACKREF 5) itemoff 12473 
itemsize 24
[  +0.000003] [T15127] 	item 15 key (257 ROOT_ITEM 12) itemoff 12034 
itemsize 439
[  +0.000003] [T15127] 		root data bytenr 30834688 refs 1
[  +0.000002] [T15127] 	item 16 key (257 ROOT_BACKREF 5) itemoff 12010 
itemsize 24
[  +0.000003] [T15127] 	item 17 key (18446744073709551607 ROOT_ITEM 0) 
itemoff 11571 itemsize 439
[  +0.000004] [T15127] 		root data bytenr 30523392 refs 1
[  +0.000002] [T15127] BTRFS error (device loop1): block=30851072 write 
time tree block corruption detected

and many more lines with the same generation errors, for the fstests cases
btrfs/122, btrfs/152, btrfs/210, btrfs/224, btrfs/316, btrfs/320 and
btrfs/340.

I have no idea why it is trying to write ebs older than the current
transaction. It seems to be related to snapshots.

> To fix this, we need to check the DIRTY flag again to avoid writing an
> eb that has had new data written to it, and lock the eb before actually
> doing any I/O-related work. I'm not familiar with the I/O-related code, so
> please correct me if I got anything wrong.
> 
> Thanks,
> 
> Sun Yangkai
