Date: Fri, 30 Jan 2026 17:37:15 +0800
From: Sun YangKai
Subject: Re: [PATCH] btrfs: prevent COW amplification during btrfs_search_slot
To: Leo Martins, Filipe Manana
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
References: <20260130001254.83750-1-loemra.dev@gmail.com>

On 2026/1/30 12:14, Sun YangKai wrote:
> On 2026/1/30 08:12, Leo Martins wrote:
>> On Thu, 29 Jan 2026 11:52:07 +0000 Filipe Manana wrote:
>>> On Tue, Jan 27, 2026 at 8:43 PM Leo Martins wrote:
>>>> I've been investigating ENOSPC failures at Meta and have observed a
>>>> strange pattern where filesystems hit ENOSPC with lots of unallocated
>>>> space (> 100G). Sample dmesg dump at the bottom of the message.
>>>>
>>>> btrfs_insert_delayed_dir_index is attempting to migrate some
>>>> reservation from the transaction block reserve and finding it
>>>> exhausted, leading to a warning and ENOSPC. This is a bug, as the
>>>> reservations are meant to be worst case. It should be impossible to
>>>> exhaust the transaction block reserve.
>>>>
>>>> Some tracing of affected hosts revealed that there were single
>>>> btrfs_search_slot calls that were COWing hundreds of times. I was
>>>> able to reproduce this behavior locally by creating a very
>>>> constrained cgroup and producing a lot of concurrent filesystem
>>>> operations. Here's the pattern:
>>>>
>>>>   1. btrfs_search_slot() begins tree traversal with cow=1
>>>>   2. Node at level N needs COW (old generation or WRITTEN flag set)
>>>>   3. btrfs_cow_block() allocates a new node, updates the parent
>>>>      pointer
>>>>   4. Traversal continues, but hits a condition requiring a restart
>>>>      (e.g., node not cached, lock contention, need a higher
>>>>      write_lock_level)
>>>>   5. btrfs_release_path() releases all locks and references
>>>>   6. Memory pressure triggers writeback on the COW'd node
>>>>   7. lock_extent_buffer_for_io() clears EXTENT_BUFFER_DIRTY and sets
>>>>      BTRFS_HEADER_FLAG_WRITTEN
>>>>   8. goto again - traversal restarts from the root
>>>>   9. Traversal reaches the freshly COW'd node
>>>>   10. should_cow_block() sees the WRITTEN flag set, returns true
>>>>   11. btrfs_cow_block() allocates another new node - same logical
>>>>       position, new physical location, new reservation consumed
>>>>   12. Steps 4-11 repeat indefinitely under sustained memory pressure
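For reference, the check that fires in step 10 is the WRITTEN test in
should_cow_block() (fs/btrfs/ctree.c). A simplified sketch of the relevant
logic, reconstructed from the diff context quoted further down this thread
(the real function has additional relocation checks):

static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
				    const struct btrfs_root *root,
				    const struct extent_buffer *buf)
{
	/* Blocks created in an older transaction must always be COWed. */
	if (btrfs_header_generation(buf) != trans->transid)
		return true;

	/*
	 * Step 10 above: once writeback has stamped
	 * BTRFS_HEADER_FLAG_WRITTEN into the header, the block is COWed
	 * again even though it was created in this very transaction.
	 */
	if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
		return true;

	/* Ensure we can see the FORCE_COW bit. */
	smp_mb__before_atomic();
	if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
		return true;

	if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
		return false;

	/* ... handling of BTRFS_HEADER_FLAG_RELOC omitted ... */
	return false;
}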
>>>>
>>>> Note this behavior should be much harder to trigger since Boris's
>>>> AS_KERNEL_FILE changes, which make it so that extent_buffer pages
>>>> aren't accounted to user cgroups. However, I believe it would still
>>>> be an issue under global memory pressure.
>>>> Link: https://lore.kernel.org/linux-btrfs/cover.1755812945.git.boris@bur.io/
>>>>
>>>> This COW amplification breaks the idea that transaction reservations
>>>> are worst case, as any search slot call could find itself in this
>>>> COW loop and exhaust its reservation.
>>>>
>>>> My proposed solution is to temporarily pin extent buffers for the
>>>> lifetime of btrfs_search_slot. This prevents the massive COW
>>>> amplification that can be seen during high memory pressure.
>>>>
>>>> The implementation uses a local xarray to track COW'd buffers for
>>>> the duration of the search. The xarray stores extent_buffer pointers
>>>> without taking additional references; this is safe because tracked
>>>> buffers remain dirty (writeback_blockers prevents the dirty bit from
>>>> being cleared) and dirty buffers cannot be reclaimed by memory
>>>> pressure.
>>>>
>>>> Synchronization is provided by eb->lock: increments in
>>>> btrfs_search_slot_track_cow() occur while holding the write lock,
>>>> and the check in lock_extent_buffer_for_io() also holds the write
>>>> lock via btrfs_tree_lock(). Decrements don't require eb->lock
>>>> because writeback_blockers is atomic and merely indicates "don't
>>>> write yet". Once we decrement, we're done and don't care if
>>>> writeback proceeds immediately.
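For illustration, a minimal sketch of the tracking scheme described above,
using hypothetical helper names and assuming a new atomic
writeback_blockers counter on struct extent_buffer (the actual patch may
differ):

/* Hypothetical per-search COW tracking, local to btrfs_search_slot(). */
struct cow_tracker {
	struct xarray ebs;	/* eb->start -> extent_buffer, no extra refs */
};

/* xa_init(&tracker->ebs) is assumed to happen when the search starts. */

/* Called under the eb's write lock, right after btrfs_cow_block(). */
static int track_cowed_eb(struct cow_tracker *tracker,
			  struct extent_buffer *eb)
{
	/*
	 * Block writeback so EXTENT_BUFFER_DIRTY cannot be cleared; a
	 * dirty eb cannot be reclaimed, so storing the pointer without
	 * taking an extra reference is safe.
	 */
	atomic_inc(&eb->writeback_blockers);	/* assumed new field */
	return xa_err(xa_store(&tracker->ebs, eb->start, eb, GFP_NOFS));
}

/* Called from every btrfs_search_slot() exit path. */
static void release_cowed_ebs(struct cow_tracker *tracker)
{
	struct extent_buffer *eb;
	unsigned long index;

	xa_for_each(&tracker->ebs, index, eb)
		atomic_dec(&eb->writeback_blockers);	/* writes may resume */
	xa_destroy(&tracker->ebs);
}

lock_extent_buffer_for_io() would then skip (or requeue) any eb whose
writeback_blockers count is non-zero instead of clearing its dirty bit.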
>>> This seems too complex to me.
>>>
>>> So this problem is very similar to an idea I had a few years ago but
>>> never managed to implement.
>>> It was about avoiding unnecessary COW, not for this space reservation
>>> exhaustion due to sustained memory pressure, but it would solve it
>>> too.
>>>
>>> The idea was that we do unnecessary COW in cases like this:
>>>
>>> 1) We COW a path in some tree and we are at transaction N;
>>>
>>> 2) Writeback happened for the extent buffers in that path while we
>>> are in the same transaction, because we reached the 32M limit and
>>> some task called btrfs_btree_balance_dirty() or something else
>>> triggered writeback of the btree inode;
>>>
>>> 3) While still at transaction N, we visit the same path to add an
>>> item to a leaf, or modify an item, whatever. Because the extent
>>> buffers have BTRFS_HEADER_FLAG_WRITTEN, we COW them again
>>> (should_cow_block() returns true).
>>>
>>> So during the lifetime of a transaction we can have a lot of
>>> unnecessary COW - we spend more time allocating extents, allocating
>>> memory, copying extent buffer data, use more space per transaction,
>>> etc.
>>>
>>> The idea was to not COW when an extent buffer has
>>> BTRFS_HEADER_FLAG_WRITTEN set, but only if its generation
>>> (btrfs_header_generation(eb)) matches the current transaction.
>>> That is safe because there's no committed tree that points to an
>>> extent buffer created in the current transaction.
>>>
>>> Any further modification to the extent buffer must be sure that the
>>> EXTENT_BUFFER_DIRTY flag is set, that the eb range is still in the
>>> transaction's dirty_pages io tree, etc., so that we don't miss
>>> writing the extent buffer to the same location again before the
>>> transaction commits the superblocks.
>>>
>>> Have you considered an approach like this?
>>
>> I had not considered this, but it is a great idea.
>>
>> My first thought is that implementing this could be as simple as
>> removing the BTRFS_HEADER_FLAG_WRITTEN check. However, this would
>> mess with the assumptions around the log tree. From btrfs_sync_log():

After a fast glance and some tests, I found things might not be that
easy. The problem is not only the log tree.

>> /*
>>  * IO has been started, blocks of the log tree have WRITTEN flag set
>>  * in their headers. new modifications of the log will be written to
>>  * new positions. so it's safe to allow log writers to go in.
>>  */
>>
>> ^ Assumes that WRITTEN blocks will be COW'd.
>>
>> The issue looks like:
>>
>>   1. fsync A COWs eb
>>   2. fsync A lock_extent_buffer_for_io(); sets WRITTEN, unlocks tree
>>   3. fsync B does __not__ COW eb and modifies it
>>   4. fsync A writes the modified eb to disk
>>   5. CRASH; the log tree is corrupted
>>
>> One way to avoid that is to keep the current behavior for the log
>> tree, but that leaves the potential for COW amplification...
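For context on both of the races above, the writeback side described in
step 7 does roughly the following. This is a simplified sketch only, based
on the behavior described in this thread - the real
lock_extent_buffer_for_io() in fs/btrfs/extent_io.c does more, and the
exact point where EXTENT_BUFFER_WRITEBACK is set is an assumption here:

static bool lock_extent_buffer_for_io(struct extent_buffer *eb)
{
	bool ret = false;

	btrfs_tree_lock(eb);	/* serializes against modifying tasks */
	if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)) {
		/*
		 * From here on, any further modification is expected to
		 * COW the buffer: should_cow_block() keys off this header
		 * flag.
		 */
		btrfs_set_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
		set_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags);	/* assumed */
		ret = true;
	}
	btrfs_tree_unlock(eb);	/* the IO proceeds with the lock dropped */

	return ret;
}

Both fsync B in the race above and any "skip the COW" scheme step into the
window after btrfs_tree_unlock(), while the IO is still in flight.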
> I tested with a patch like this:
>
> @@ -624,14 +624,18 @@ static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
>         if (btrfs_header_generation(buf) != trans->transid)
>                 return true;
>
> -       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
> -               return true;
> -
>         /* Ensure we can see the FORCE_COW bit. */
>         smp_mb__before_atomic();
>         if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
>                 return true;
>
> +       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
> +               if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
> +                       return true;
> +               btrfs_mark_buffer_dirty(trans, buf);
> +               return false;
> +       }
> +
>         if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
>                 return false;
>
> And got some errors like this:
>
> [  +0.090163] [ T2589] run fstests btrfs/004 at 2026-01-30 11:53:37
> [  +0.432352] [T11685] BTRFS: device fsid 1fb397fc-97a7-44dd-9602-dd38b74bc391 devid 1 transid 8 /dev/loop1 (7:1) scanned by mount (11685)
> [  +0.000351] [T11685] BTRFS info (device loop1): first mount of filesystem 1fb397fc-97a7-44dd-9602-dd38b74bc391
> [  +0.000014] [T11685] BTRFS info (device loop1): using crc32c (crc32c-lib) checksum algorithm
> [  +0.001298] [T11685] BTRFS info (device loop1): checking UUID tree
> [  +0.000039] [T11685] BTRFS info (device loop1): enabling ssd optimizations
> [  +0.000003] [T11685] BTRFS info (device loop1): turning on async discard
> [  +0.000002] [T11685] BTRFS info (device loop1): enabling free space tree
> [  +1.051781] [T11703] page: refcount:2 mapcount:0 mapping:00000000eb6d7caa index:0x2348 pfn:0x1caebf
> [  +0.000008] [T11703] memcg:ffff9b3300263cc0
> [  +0.000003] [T11703] aops:0xffffffffc0354040 ino:1
> [  +0.000024] [T11703] flags: 0x4e0000000000423e(referenced|uptodate|dirty|lru|workingset|private|writeback|zone=1)
> [  +0.000007] [T11703] raw: 4e0000000000423e fffff74a872bb908 fffff74a84206a88 ffff9b33c6706880
> [  +0.000004] [T11703] raw: 0000000000002348 ffff9b334be522d0 00000002ffffffff ffff9b3300263cc0
> [  +0.000002] [T11703] page dumped because: eb page dump
> [  +0.000003] [T11703] BTRFS critical (device loop1): corrupt leaf: root=5 block=36995072 slot=118 ino=406 file_offset=94208, invalid ram_bytes for file extent, have 8660273067269322872, should be aligned to 4096
> [  +0.000013] [T11703] BTRFS info (device loop1): leaf 36995072 gen 33 total ptrs 128 free space 2857 owner 5
> [  +0.000006] [T11703]     item 0 key (386 DIR_ITEM 238230307) itemoff 16249 itemsize 34
> [  +0.000004] [T11703]         location key (462 1 0) type 2
> [  +0.000003] [T11703]         transid 33 data_len 0 name_len 4
> [  +0.000003] [T11703]     item 1 key (386 DIR_ITEM 1473745676) itemoff 16216 itemsize 33
> [  +0.000004] [T11703]         location key (376 1 0) type 3
> [  +0.000002] [T11703]         transid 30 data_len 0 name_len 3
> [  +0.000003] [T11703]     item 2 key (386 DIR_ITEM 2243137595) itemoff 16182 itemsize 34
> [  +0.000004] [T11703]         location key (413 1 0) type 1
> [  +0.000002] [T11703]         transid 32 data_len 0 name_len 4
> ...
> [  +0.000001] [T11703]     item 127 key (405 DIR_ITEM 828387202) itemoff 6057 itemsize 34
> [  +0.000002] [T11703]         location key (479 1 0) type 3
> [  +0.000001] [T11703]         transid 33 data_len 0 name_len 4
> [  +0.000002] [T11703] BTRFS error (device loop1): block=36995072 write time tree block corruption detected
> [  +0.003429] [T11703] BTRFS: error (device loop1) in btrfs_commit_transaction:2555: errno=-5 IO failure (Error while writing out transaction)
> [  +0.000007] [T11703] BTRFS info (device loop1 state E): forced readonly
> [  +0.000002] [T11703] BTRFS warning (device loop1 state E): Skipping commit of aborted transaction.
> [  +0.000002] [T11703] BTRFS error (device loop1 state EA): Transaction aborted (error -5)
> [  +0.000003] [T11703] BTRFS: error (device loop1 state EA) in cleanup_transaction:2037: errno=-5 IO failure
>
> The reported inode 406 is not even in the printed leaf. It seems like a
> data race, maybe caused by the following:
>
> We unlock the eb after setting the WRITTEN flag during writeback, and
> the eb should not get modified after that point, because all future
> writes will use the COWed eb. However, with the WRITTEN flag check
> removed from should_cow_block, we might write to an eb that has the
> WRITTEN flag set and is still under IO.

I tried again with this:

@@ -624,14 +624,20 @@ static inline bool should_cow_block(const struct btrfs_trans_handle *trans,
        if (btrfs_header_generation(buf) != trans->transid)
                return true;

-       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN))
-               return true;
-
        /* Ensure we can see the FORCE_COW bit. */
        smp_mb__before_atomic();
        if (test_bit(BTRFS_ROOT_FORCE_COW, &root->state))
                return true;

+       if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
+               if (btrfs_root_id(root) == BTRFS_TREE_LOG_OBJECTID)
+                       return true;
+               if (test_bit(EXTENT_BUFFER_WRITEBACK, &buf->bflags))
+                       return true;
+               btrfs_mark_buffer_dirty(trans, buf);
+               return false;
+       }
+
        if (btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID)
                return false;

When WRITEBACK is set, do a normal COW to prevent the data race. This
seems to fix the previous problem.
However, I got this:

[ +0.020843] [T15127] BTRFS error (device loop1): block=30687232 bad generation, have 11 expect > 14
[ +0.000009] [T15127]     item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
[ +0.000004] [T15127]         inode generation 3 transid 11 size 10 nbytes 16384
[ +0.000003] [T15127]         block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127]         rdev 0 sequence 1 flags 0x0
[ +0.000002] [T15127]         atime 1769760651.0
[ +0.000002] [T15127]         ctime 1769760652.250234845
[ +0.000002] [T15127]         mtime 1769760652.250234845
[ +0.000001] [T15127]         otime 1769760651.0
[ +0.000002] [T15127]     item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
[ +0.000003] [T15127]         index 0 name_len 2
[ +0.000002] [T15127]     item 2 key (256 DIR_ITEM 2030520461) itemoff 16076 itemsize 35
[ +0.000002] [T15127]         location key (257 1 0) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 5
[ +0.000002] [T15127]     item 3 key (256 DIR_INDEX 2) itemoff 16041 itemsize 35
[ +0.000002] [T15127]         location key (257 1 0) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 5
[ +0.000002] [T15127]     item 4 key (257 INODE_ITEM 0) itemoff 15881 itemsize 160
[ +0.000002] [T15127]         inode generation 11 transid 11 size 12 nbytes 0
[ +0.000002] [T15127]         block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127]         rdev 0 sequence 19 flags 0x0
[ +0.000001] [T15127]         atime 1769760652.250234845
[ +0.000002] [T15127]         ctime 1769760652.256913323
[ +0.000002] [T15127]         mtime 1769760652.256913323
[ +0.000001] [T15127]         otime 1769760652.250234845
[ +0.000002] [T15127]     item 5 key (257 INODE_REF 256) itemoff 15866 itemsize 15
[ +0.000002] [T15127]         index 2 name_len 5
[ +0.000002] [T15127]     item 6 key (257 DIR_ITEM 247980518) itemoff 15830 itemsize 36
[ +0.000002] [T15127]         location key (256 132 18446744073709551615) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 6
[ +0.000002] [T15127]     item 7 key (257 DIR_INDEX 2) itemoff 15794 itemsize 36
[ +0.000002] [T15127]         location key (256 132 18446744073709551615) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 6
[ +0.000001] [T15127] BTRFS error (device loop1): block=30687232 write time tree block corruption detected
[ +0.000017] [T15127] BTRFS error (device loop1): block=30703616 bad generation, have 11 expect > 14
[ +0.000004] [T15127]     item 0 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 16259 itemsize 24
[ +0.000003] [T15127]         block group used 0 chunk_objectid 256 flags 1
[ +0.000002] [T15127]     item 1 key (22020096 BLOCK_GROUP_ITEM 8388608) itemoff 16235 itemsize 24
[ +0.000002] [T15127]         block group used 16384 chunk_objectid 256 flags 34
[ +0.000002] [T15127]     item 2 key (22036480 METADATA_ITEM 0) itemoff 16202 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 8 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 3
[ +0.000003] [T15127]     item 3 key (30408704 BLOCK_GROUP_ITEM 268435456) itemoff 16178 itemsize 24
[ +0.000002] [T15127]         block group used 163840 chunk_objectid 256 flags 36
[ +0.000002] [T15127]     item 4 key (30490624 METADATA_ITEM 0) itemoff 16145 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 5 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 7
[ +0.000002] [T15127]     item 5 key (30523392 METADATA_ITEM 0) itemoff 16112 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 5 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 18446744073709551607
[ +0.000002] [T15127]     item 6 key (30605312 METADATA_ITEM 0) itemoff 16079 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 9 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 4
[ +0.000002] [T15127]     item 7 key (30687232 METADATA_ITEM 0) itemoff 16046 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 5
[ +0.000002] [T15127]     item 8 key (30703616 METADATA_ITEM 0) itemoff 16013 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 2
[ +0.000002] [T15127]     item 9 key (30720000 METADATA_ITEM 0) itemoff 15980 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 10
[ +0.000002] [T15127]     item 10 key (30736384 METADATA_ITEM 0) itemoff 15947 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 8
[ +0.000002] [T15127]     item 11 key (30752768 METADATA_ITEM 0) itemoff 15914 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 256
[ +0.000002] [T15127]     item 12 key (30769152 METADATA_ITEM 0) itemoff 15881 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 1
[ +0.000002] [T15127]     item 13 key (30785536 METADATA_ITEM 0) itemoff 15848 itemsize 33
[ +0.000002] [T15127]         extent refs 1 gen 11 flags 2
[ +0.000002] [T15127]         ref#0: tree block backref root 9
[ +0.000002] [T15127] BTRFS error (device loop1): block=30703616 write time tree block corruption detected
[ +0.000012] [T15127] BTRFS error (device loop1): block=30720000 bad generation, have 11 expect > 14
[ +0.000004] [T15127]     item 0 key (13631488 FREE_SPACE_INFO 8388608) itemoff 16275 itemsize 8
[ +0.000002] [T15127]     item 1 key (13631488 FREE_SPACE_EXTENT 8388608) itemoff 16275 itemsize 0
[ +0.000002] [T15127]     item 2 key (22020096 FREE_SPACE_INFO 8388608) itemoff 16267 itemsize 8
[ +0.000002] [T15127]     item 3 key (22020096 FREE_SPACE_EXTENT 16384) itemoff 16267 itemsize 0
[ +0.000003] [T15127]     item 4 key (22052864 FREE_SPACE_EXTENT 8355840) itemoff 16267 itemsize 0
[ +0.000002] [T15127]     item 5 key (30408704 FREE_SPACE_INFO 268435456) itemoff 16259 itemsize 8
[ +0.000002] [T15127]     item 6 key (30408704 FREE_SPACE_EXTENT 81920) itemoff 16259 itemsize 0
[ +0.000002] [T15127]     item 7 key (30507008 FREE_SPACE_EXTENT 16384) itemoff 16259 itemsize 0
[ +0.000002] [T15127]     item 8 key (30539776 FREE_SPACE_EXTENT 65536) itemoff 16259 itemsize 0
[ +0.000002] [T15127]     item 9 key (30621696 FREE_SPACE_EXTENT 65536) itemoff 16259 itemsize 0
[ +0.000003] [T15127]     item 10 key (30801920 FREE_SPACE_EXTENT 268042240) itemoff 16259 itemsize 0
[ +0.000002] [T15127] BTRFS error (device loop1): block=30720000 write time tree block corruption detected
[ +0.000010] [T15127] BTRFS error (device loop1): block=30736384 bad generation, have 11 expect > 14
[ +0.000004] [T15127]     item 0 key (0 QGROUP_STATUS 0) itemoff 16243 itemsize 40
[ +0.000003] [T15127]     item 1 key (0 QGROUP_INFO 5) itemoff 16203 itemsize 40
[ +0.000002] [T15127]     item 2 key (0 QGROUP_INFO 256) itemoff 16163 itemsize 40
[ +0.000002] [T15127]     item 3 key (0 QGROUP_LIMIT 5) itemoff 16123 itemsize 40
[ +0.000002] [T15127]     item 4 key (0 QGROUP_LIMIT 256) itemoff 16083 itemsize 40
[ +0.000003] [T15127] BTRFS error (device loop1): block=30736384 write time tree block corruption detected
[ +0.000014] [T15127] BTRFS error (device loop1): block=30769152 bad generation, have 11 expect > 14
[ +0.000004] [T15127]     item 0 key (2 ROOT_ITEM 0) itemoff 15844 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30703616 refs 1
[ +0.000002] [T15127]     item 1 key (4 ROOT_ITEM 0) itemoff 15405 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30605312 refs 1
[ +0.000001] [T15127]     item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[ +0.000002] [T15127]         index 0 name_len 7
[ +0.000002] [T15127]     item 3 key (5 ROOT_ITEM 0) itemoff 14949 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30687232 refs 1
[ +0.000002] [T15127]     item 4 key (5 ROOT_REF 256) itemoff 14925 itemsize 24
[ +0.000002] [T15127]     item 5 key (6 INODE_ITEM 0) itemoff 14765 itemsize 160
[ +0.000002] [T15127]         inode generation 3 transid 0 size 0 nbytes 16384
[ +0.000002] [T15127]         block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127]         rdev 0 sequence 0 flags 0x0
[ +0.000001] [T15127]         atime 1769760651.0
[ +0.000002] [T15127]         ctime 1769760651.0
[ +0.000002] [T15127]         mtime 1769760651.0
[ +0.000001] [T15127]         otime 1769760651.0
[ +0.000002] [T15127]     item 6 key (6 INODE_REF 6) itemoff 14753 itemsize 12
[ +0.000002] [T15127]         index 0 name_len 2
[ +0.000001] [T15127]     item 7 key (6 DIR_ITEM 2378154706) itemoff 14716 itemsize 37
[ +0.000003] [T15127]         location key (5 132 18446744073709551615) type 2
[ +0.000001] [T15127]         transid 3 data_len 0 name_len 7
[ +0.000002] [T15127]     item 8 key (7 ROOT_ITEM 0) itemoff 14277 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30490624 refs 1
[ +0.000002] [T15127]     item 9 key (8 ROOT_ITEM 0) itemoff 13838 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30736384 refs 1
[ +0.000001] [T15127]     item 10 key (9 ROOT_ITEM 0) itemoff 13399 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30785536 refs 1
[ +0.000002] [T15127]     item 11 key (10 ROOT_ITEM 0) itemoff 12960 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30720000 refs 1
[ +0.000001] [T15127]     item 12 key (256 ROOT_ITEM 11) itemoff 12521 itemsize 439
[ +0.000003] [T15127]         root data bytenr 30752768 refs 1
[ +0.000001] [T15127]     item 13 key (256 ROOT_BACKREF 5) itemoff 12497 itemsize 24
[ +0.000003] [T15127]     item 14 key (18446744073709551607 ROOT_ITEM 0) itemoff 12058 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30523392 refs 1
[ +0.000001] [T15127] BTRFS error (device loop1): block=30769152 write time tree block corruption detected
[ +0.000012] [T15127] BTRFS error (device loop1): block=30801920 bad generation, have 12 expect > 14
[ +0.000003] [T15127]     item 0 key (0 QGROUP_STATUS 0) itemoff 16243 itemsize 40
[ +0.000003] [T15127]     item 1 key (0 QGROUP_INFO 5) itemoff 16203 itemsize 40
[ +0.000002] [T15127]     item 2 key (0 QGROUP_INFO 256) itemoff 16163 itemsize 40
[ +0.000002] [T15127]     item 3 key (0 QGROUP_INFO 257) itemoff 16123 itemsize 40
[ +0.000002] [T15127]     item 4 key (0 QGROUP_LIMIT 5) itemoff 16083 itemsize 40
[ +0.000002] [T15127]     item 5 key (0 QGROUP_LIMIT 256) itemoff 16043 itemsize 40
[ +0.000002] [T15127]     item 6 key (0 QGROUP_LIMIT 257) itemoff 16003 itemsize 40
[ +0.000002] [T15127] BTRFS error (device loop1): block=30801920 write time tree block corruption detected
[ +0.000014] [T15127] BTRFS error (device loop1): block=30818304 bad generation, have 12 expect > 14
[ +0.000003] [T15127]     item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
[ +0.000002] [T15127]         inode generation 3 transid 11 size 10 nbytes 16384
[ +0.000002] [T15127]         block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127]         rdev 0 sequence 1 flags 0x0
[ +0.000002] [T15127]         atime 1769760651.0
[ +0.000001] [T15127]         ctime 1769760652.250234845
[ +0.000002] [T15127]         mtime 1769760652.250234845
[ +0.000001] [T15127]         otime 1769760651.0
[ +0.000002] [T15127]     item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
[ +0.000002] [T15127]         index 0 name_len 2
[ +0.000002] [T15127]     item 2 key (256 DIR_ITEM 2030520461) itemoff 16076 itemsize 35
[ +0.000002] [T15127]         location key (257 1 0) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 5
[ +0.000001] [T15127]     item 3 key (256 DIR_INDEX 2) itemoff 16041 itemsize 35
[ +0.000002] [T15127]         location key (257 1 0) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 5
[ +0.000002] [T15127]     item 4 key (257 INODE_ITEM 0) itemoff 15881 itemsize 160
[ +0.000002] [T15127]         inode generation 11 transid 12 size 24 nbytes 0
[ +0.000002] [T15127]         block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000002] [T15127]         rdev 0 sequence 19 flags 0x0
[ +0.000001] [T15127]         atime 1769760652.250234845
[ +0.000002] [T15127]         ctime 1769760652.267621586
[ +0.000001] [T15127]         mtime 1769760652.267621586
[ +0.000002] [T15127]         otime 1769760652.250234845
[ +0.000002] [T15127]     item 5 key (257 INODE_REF 256) itemoff 15866 itemsize 15
[ +0.000002] [T15127]         index 2 name_len 5
[ +0.000001] [T15127]     item 6 key (257 DIR_ITEM 247980518) itemoff 15830 itemsize 36
[ +0.000002] [T15127]         location key (256 132 18446744073709551615) type 2
[ +0.000002] [T15127]         transid 11 data_len 0 name_len 6
[ +0.000002] [T15127]     item 7 key (257 DIR_ITEM 496439826) itemoff 15794 itemsize 36
[ +0.000002] [T15127]         location key (257 132 18446744073709551615) type 2
[ +0.000002] [T15127]         transid 12 data_len 0 name_len 6
[ +0.000001] [T15127]     item 8 key (257 DIR_INDEX 2) itemoff 15758 itemsize 36
[ +0.000003] [T15127]         location key (256 132 18446744073709551615) type 2
[ +0.000001] [T15127]         transid 11 data_len 0 name_len 6
[ +0.000002] [T15127]     item 9 key (257 DIR_INDEX 3) itemoff 15722 itemsize 36
[ +0.000002] [T15127]         location key (257 132 18446744073709551615) type 2
[ +0.000002] [T15127]         transid 12 data_len 0 name_len 6
[ +0.000001] [T15127] BTRFS error (device loop1): block=30818304 write time tree block corruption detected
[ +0.000016] [T15127] BTRFS error (device loop1): block=30851072 bad generation, have 12 expect > 14
[ +0.000004] [T15127]     item 0 key (2 ROOT_ITEM 0) itemoff 15844 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30867456 refs 1
[ +0.000001] [T15127]     item 1 key (4 ROOT_ITEM 0) itemoff 15405 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30605312 refs 1
[ +0.000002] [T15127]     item 2 key (5 INODE_REF 6) itemoff 15388 itemsize 17
[ +0.000002] [T15127]         index 0 name_len 7
[ +0.000001] [T15127]     item 3 key (5 ROOT_ITEM 0) itemoff 14949 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30818304 refs 1
[ +0.000002] [T15127]     item 4 key (5 ROOT_REF 256) itemoff 14925 itemsize 24
[ +0.000002] [T15127]     item 5 key (5 ROOT_REF 257) itemoff 14901 itemsize 24
[ +0.000002] [T15127]     item 6 key (6 INODE_ITEM 0) itemoff 14741 itemsize 160
[ +0.000002] [T15127]         inode generation 3 transid 0 size 0 nbytes 16384
[ +0.000002] [T15127]         block group 0 mode 40755 links 1 uid 0 gid 0
[ +0.000003] [T15127]         rdev 0 sequence 0 flags 0x0
[ +0.000001] [T15127]         atime 1769760651.0
[ +0.000002] [T15127]         ctime 1769760651.0
[ +0.000003] [T15127]         mtime 1769760651.0
[ +0.000002] [T15127]         otime 1769760651.0
[ +0.000002] [T15127]     item 7 key (6 INODE_REF 6) itemoff 14729 itemsize 12
[ +0.000003] [T15127]         index 0 name_len 2
[ +0.000002] [T15127]     item 8 key (6 DIR_ITEM 2378154706) itemoff 14692 itemsize 37
[ +0.000003] [T15127]         location key (5 132 18446744073709551615) type 2
[ +0.000002] [T15127]         transid 3 data_len 0 name_len 7
[ +0.000002] [T15127]     item 9 key (7 ROOT_ITEM 0) itemoff 14253 itemsize 439
[ +0.000003] [T15127]         root data bytenr 30490624 refs 1
[ +0.000002] [T15127]     item 10 key (8 ROOT_ITEM 0) itemoff 13814 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30801920 refs 1
[ +0.000003] [T15127]     item 11 key (9 ROOT_ITEM 0) itemoff 13375 itemsize 439
[ +0.000002] [T15127]         root data bytenr 30900224 refs 1
[ +0.000002] [T15127]     item 12 key (10 ROOT_ITEM 0) itemoff 12936 itemsize 439
[ +0.000003] [T15127]         root data bytenr 30883840 refs 1
[ +0.000002] [T15127]     item 13 key (256 ROOT_ITEM 11) itemoff 12497 itemsize 439
[ +0.000003] [T15127]         root data bytenr 30752768 refs 1
[ +0.000002] [T15127]     item 14 key (256 ROOT_BACKREF 5) itemoff 12473 itemsize 24
[ +0.000003] [T15127]     item 15 key (257 ROOT_ITEM 12) itemoff 12034 itemsize 439
[ +0.000003] [T15127]         root data bytenr 30834688 refs 1
[ +0.000002] [T15127]     item 16 key (257 ROOT_BACKREF 5) itemoff 12010 itemsize 24
[ +0.000003] [T15127]     item 17 key (18446744073709551607 ROOT_ITEM 0) itemoff 11571 itemsize 439
[ +0.000004] [T15127]         root data bytenr 30523392 refs 1
[ +0.000002] [T15127] BTRFS error (device loop1): block=30851072 write time tree block corruption detected

And a lot more lines with the same generation errors in the btrfs/122,
btrfs/152, btrfs/210, btrfs/224, btrfs/316, btrfs/320 and btrfs/340
fstests cases. I have no idea why it's trying to write ebs older than the
current transaction. It seems related to snapshots.

> To fix this, we need to check the DIRTY flag again, to prevent writing
> an eb which has had new data written to it, and lock the eb before we
> really do the IO-related work. I'm not familiar with the IO code, so
> please correct me if I got anything wrong.
>
> Thanks,
>
> Sun Yangkai