Linux Btrfs filesystem development
From: Qu Wenruo <wqu@suse.com>
To: Teng Liu <27rabbitlt@gmail.com>, linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com, clm@fb.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] btrfs: wait for in-flight readahead BIOs on open_ctree() error
Date: Mon, 30 Mar 2026 08:36:15 +1030
Message-ID: <4a129696-0352-427f-9e0e-7962e789df57@suse.com>
In-Reply-To: <aclYf4R2XxlUkxAQ@rabbitArch>



On 2026/3/30 03:53, Teng Liu wrote:
> Thanks for your review!
> On 2026-03-29 17:33, Qu Wenruo wrote:
>>
>>
>> This doesn't make any sense to me.
> It confused me as well when I tried to reproduce the bug. The report
> claimed that btrfs_bio_counter_sub triggered a use-after-free, but this
> function lives in `dev-replace.c`, which, judging by the name, should
> have nothing to do with this setting.
> 
> However when I checked the function call chain:
> 
> open_ctree()
>    → btrfs_read_sys_array()          # OK — sys_chunk_array in superblock is intact
>    → load_super_root(chunk_root)     # OK — reads root node, passes validation
>    → btrfs_read_chunk_tree()
>        → btrfs_for_each_slot()
>            → readahead_tree_node_children(node)
>                → for each child pointer in the internal node:
>                    btrfs_readahead_node_child()
>                      → btrfs_readahead_tree_block()
>                        → read_extent_buffer_pages_nowait()
>                          → btrfs_submit_bbio()
>                            → btrfs_submit_chunk()
>                              → btrfs_bio_counter_inc_blocked()  ← bio_counter++
>                              → btrfs_map_block()
>                              → submit_bio()                     ← sent to USB drive

Even if you wait for all bios, it can still cause problems.

The bio counter only covers the btrfs bio layer, so btrfs_bio::end_io
is still called after btrfs_bio_counter_dec().

And if the whole fs_info has been freed by then, end_bbio_meta_read()
can still hit problems, as btrfs_validate_extent_buffer() will access eb
(bbio->private) and fs_info (eb->fs_info), which triggers a use-after-free.

So using that bio counter is not going to solve the problem; it only
narrows the race window and thus masks the problem.
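
The ordering described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: every model_* name is
hypothetical, and "freeing" fs_info is modelled by setting a flag so the
example stays safe to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every model_* name is hypothetical. */
struct model_fs_info {
	int bio_counter;	/* stands in for the btrfs bio counter */
	bool freed;		/* set once the error path has "freed" fs_info */
};

/* Submission: the bio layer takes a counter reference. */
static void model_submit(struct model_fs_info *fs)
{
	fs->bio_counter++;
}

/* Completion, step 1: the bio layer drops its counter reference. */
static void model_counter_dec(struct model_fs_info *fs)
{
	fs->bio_counter--;
}

/*
 * Error path that waits only for the counter and then frees.
 * Returns true if it went ahead and "freed" fs_info.
 */
static bool model_error_path(struct model_fs_info *fs)
{
	if (fs->bio_counter != 0)
		return false;	/* would keep waiting */
	fs->freed = true;	/* models freeing fs_info */
	return true;
}

/*
 * Completion, step 2: the end_io callback runs after the decrement and
 * still needs fs_info. Returns true if that access would be a
 * use-after-free in the real code.
 */
static bool model_end_io_touches_freed(const struct model_fs_info *fs)
{
	return fs->freed;
}
```

If the error path runs between step 1 and step 2, it sees a zero counter
and frees, and the later end_io access lands on freed memory.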

> 
> After submit_bio() sends BIO to USB drive, we continue on
> read_one_dev():
> 
> open_ctree()
>    → btrfs_read_sys_array()          # OK — sys_chunk_array in superblock is intact
>    → load_super_root(chunk_root)     # OK — reads root node, passes validation
>    → btrfs_read_chunk_tree()
>        → btrfs_for_each_slot()
>            → readahead_tree_node_children(node)
>              → bio_counter++ and submit_bio() sends BIO to USB drive
>            → read_one_dev()
> 
> This read_one_dev() will return an error since the leaf block is actually
> corrupted. Then open_ctree() will get into the error path and try to free
> fs_info.
> 
> After the USB device finishes the BIO, it will try to decrement the
> counter, but the fs_info is already freed.
> 
> Any suggestions on this?

The following ideas come to mind, but none of them seems as simple as
your current one:

1) Introduce a dedicated counter for metadata readahead/reads
    This seems to be the simplest of all.
    But its only user would be this error handling, so it may not be
    worth it.

2) Disable metadata readahead during open_ctree()
    This will slow down the mount, especially for a large extent tree
    without the block group tree (bgt) feature.

3) Use buffer_tree xarray to iterate through all ebs
    Since this is only for error handling of open_ctree(), we're fine to
    do the full xarray iteration, and wait for any eb that has
    EXTENT_BUFFER_READING flag.

    The problem is, we do not have a dedicated tag for ebs under read,
    unlike PAGECACHE_TAG_(TOWRITE|DIRTY), which makes it easy to catch
    all dirty/writeback ebs.
    So the only option is to go through each eb and check its flags.

    I think this is the one with minimal impact, but it may make this
    error handling path take much longer.

My personal preference is option 3).
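
A rough userspace sketch of what option 3) boils down to is below. It is
not kernel code: the model_* names are hypothetical, a plain array stands
in for the buffer_tree xarray, and the wait is passed in as a callback
instead of a real sleeping wait on EXTENT_BUFFER_READING.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: every MODEL_ and model_ name is hypothetical. */
#define MODEL_EB_READING	(1UL << 0)	/* stands in for EXTENT_BUFFER_READING */
#define MODEL_EB_DIRTY		(1UL << 1)	/* an unrelated flag, left untouched */

struct model_eb {
	unsigned long bflags;
};

/* Callback that blocks until the read on @eb has finished. */
typedef void (*model_wait_fn)(struct model_eb *eb);

/*
 * No tag marks ebs under read, so walk every eb and test its flags.
 * Returns how many ebs had to be waited on.
 */
static size_t model_wait_all_reads(struct model_eb *ebs, size_t nr,
				   model_wait_fn wait)
{
	size_t waited = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!(ebs[i].bflags & MODEL_EB_READING))
			continue;
		wait(&ebs[i]);
		waited++;
	}
	return waited;
}

/* Test stand-in for the wait: pretend the read completed right away. */
static void model_wait_read_done(struct model_eb *eb)
{
	eb->bflags &= ~MODEL_EB_READING;
}
```

The cost is linear in the number of ebs, which matches the concern about
a longer runtime, but it is only paid on the open_ctree() error path.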

> 
> 
>>
>> The wait and counter are all for dev-replace, not matching your description
>> of the generic metadata readahead.
>>
>> If you want to wait for all existing metadata reads, I didn't find a good
>> helper, thus you will need to go through all extent buffers and wait for
>> EXTENT_BUFFER_READING flags.
>>
>>
> 
> 

