Re: [PATCH] btrfs: reject root with mismatched level between root_item and node header

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Qu Wenruo <wqu@suse.com>
To: ZhengYuan Huang <gality369@gmail.com>
Cc: dsterba@suse.com, clm@fb.com, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, baijiaju1990@gmail.com,
	r33s3n6@gmail.com, zzzccc427@gmail.com, stable@vger.kernel.org
Subject: Re: [PATCH] btrfs: reject root with mismatched level between root_item and node header
Date: Fri, 13 Mar 2026 13:39:34 +1030	[thread overview]
Message-ID: <14bae5ed-c12c-4380-a9a0-7a714217913e@suse.com> (raw)
In-Reply-To: <CAOmEq9U14a=pwN_dw2M70gfujhMKki434cfmegoxcyUpkYs5bQ@mail.gmail.com>



在 2026/3/13 13:19, ZhengYuan Huang 写道:
> On Fri, Mar 13, 2026 at 5:29 AM Qu Wenruo <wqu@suse.com> wrote:
>> Nope, we have btrfs_tree_parent_check structure, which has all the
>> needed checks at read time.
>>
>> The point of using that other than doing it manually here is, if one
>> mirror is bad, but the other mirror is good, then we can still grab the
>> good copy, but checking it here means if we got the bad mirror first, we
>> have no more chance.
>>
>> And during read of root-node, we have already passed the proper level
>> into it.
>>
>> So the only possibility is, your fuzzing tool is modifying the memory
>> after the read check.
>>
>> If so, it's impossible to fix.
> 
> Thanks for the review and for pointing this out.
> 
> I agree that btrfs_tree_parent_check is the intended read-time
> verifier, but the crash path here relies on a cache-hit bypass where
> that verification is not re-run.
> 
> My earlier description may have been misleading, or at least not clear
> enough, so let me clarify the exact trigger path in more detail below.

My bad, I forgot to mention the correct way to fix: you should put the 
check into the cached hit path, other than adhocing random checks around.

The correct way is to add an optional @check parameter for 
btrfs_buffer_uptodate() so that cached extent buffer will still be checked.

> 
> Two different metadata blocks are involved:
> - Block A: a root-tree leaf containing root_item for tree 265 (this
> field is corrupted: root_item.level = 1)
>    item 11 key (265 ROOT_ITEM 0) itemoff 13489 itemsize 439
>        generation 4255 root_dirid 256 bytenr 18787663872 level 1 refs 1
>        lastsnap 4214 byte_limit 0 bytes_used 16384 flags 0x0(none)
>        uuid 4cc4bc58-9708-2848-a264-19b95269f104
>        ctransid 13 otransid 13 stransid 0 rtransid 0
>        ctime 1766050670.362764444 (2025-12-18 09:37:50)
>        otime 1766050670.362000000 (2025-12-18 09:37:50)
>        drop key (0 UNKNOWN.0 0) level 0
> 
> - Block B: the actual tree-265 root block at bytenr 18787663872
> (header.level = 0, otherwise valid)
>      item 57 key (18787663872 METADATA_ITEM 0) itemoff 14198 itemsize 33
>        refs 1 gen 4255 flags TREE_BLOCK
>        tree block skinny level 0
>        tree block backref root 265
> 
> In relocate_tree_blocks phase 1 (get_tree_block_key), block B is read
> with check.level = block->level = 0 (from extent-tree metadata for
> that extent item).
> This I/O path runs btrfs_validate_extent_buffer, and level check
> passes (found 0, expected 0).
> So block B becomes EXTENT_BUFFER_UPTODATE.
> 
> In phase 2 (build_backref_tree -> handle_indirect_tree_backref ->
> btrfs_get_fs_root -> read_tree_root_path), level is taken from
> root_item in block A via btrfs_root_level, so expected level becomes 1
> (corrupt value).
> Then read_tree_block is called for the same bytenr (block B), but now
> it hits EXTENT_BUFFER_UPTODATE and returns from
> read_extent_buffer_pages_nowait early.
> 
> On that cache-hit path, btrfs_validate_extent_buffer is not executed
> again, so no level mismatch check occurs for expected=1 vs actual=0.
> 
> Because no read error is returned on cache hit, mirror retry logic is
> never entered. So this is not a “bad mirror first, good mirror later”
> case: there is no second mirror attempt because the read already
> succeeded from cache.
> 
> read_tree_root_path then builds an inconsistent root object:
>    - root->root_item.level = 1 (from block A)
>    - root->node/commit_root level = 0 (from cached block B)
> 
> handle_indirect_tree_backref computes level = cur->level + 1 = 1,
> searches commit_root (actual level 0), path->nodes[1] remains NULL,
> and btrfs_node_blockptr(NULL, ...) crashes. So the issue is a
> cross-block consistency gap at root construction time, not post-read
> memory corruption by the fuzzer.
> 
> That is why the fix in read_tree_root_path (checking
> btrfs_header_level(root->node) == btrfs_root_level(&root->root_item))
> is needed even with btrfs_tree_parent_check in place.
> 
> To clarify, our fuzzing tool does not perform any in-memory
> modification during testing. In fact, this bug is not caused by memory
> corruption at all; it is triggered entirely by corrupted on-disk
> metadata together with a cache-hit path that skips re-validation of
> the root block. I have also uploaded the reproduction script to
> https://drive.google.com/drive/folders/1BPXcgVI4DLzDcufNyynOakKD4EKnfVCg.
> 
> To reproduce the issue:
> 1. Build the PoC program: gcc repro.c -o poc
> 2. Build the ublk helper program from the ublk codebase, which is
> used to provide the runtime corruption capability:
> g++ -std=c++20 -fcoroutines -O2 -o standalone_replay \
> standalone_replay_btrfs.cpp targets/ublksrv_tgt.cpp \
> -I. -Iinclude -Itargets/include \
> -L./lib/.libs -lublksrv -luring -lpthread
> 3. Attach the crafted image through ublk:
> ./standalone_replay add -t loop -f /path/to/image
> 4. Run the PoC: ./poc
> This reliably reproduces the bug.
> 
> Thanks,
> ZhengYuan Huang

     prev parent reply	other threads:[~2026-03-13  3:09 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12 10:22 [PATCH] btrfs: reject root with mismatched level between root_item and node header ZhengYuan Huang
2026-03-12 21:29 ` Qu Wenruo
2026-03-13  2:49   ` ZhengYuan Huang
2026-03-13  3:09     ` Qu Wenruo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14bae5ed-c12c-4380-a9a0-7a714217913e@suse.com \
    --to=wqu@suse.com \
    --cc=baijiaju1990@gmail.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=gality369@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=r33s3n6@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=zzzccc427@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox