From: Qu Wenruo <wqu@suse.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, AHN SEOK-YOUNG <iamsyahn@gmail.com>,
Teng Liu <27rabbitlt@gmail.com>
Subject: Re: [PATCH] btrfs: warn about extent buffer that can not be released
Date: Thu, 16 Apr 2026 08:32:40 +0930 [thread overview]
Message-ID: <ec6a7a95-28d2-41bc-85a7-ef36e58ff9bd@suse.com> (raw)
In-Reply-To: <CAL3q7H45KF=ptRmAEE32J07oeewV3h-Zq8k_-LhO15PnReL+2Q@mail.gmail.com>
在 2026/4/15 20:55, Filipe Manana 写道:
> On Sun, Apr 5, 2026 at 5:56 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> When we unmount the fs or during mount failures, btrfs will call
>> invalidate_inode_pages() to release all btree inode folios.
>>
>> However that function can return -EBUSY if any folios can not be
>> invalidated.
>> This can be caused by:
>>
>> - Some extent buffers are still held by btrfs
>> This is a logic error, as we should release all tree root nodes
>> during unmount and mount failure handling.
>>
>> - Some extent buffers are under readahead and haven't yet finished
>> This is much rarer but valid cases.
>> In that case we should wait for those extent buffers.
>>
>> Introduce a new helper invalidate_btree_folios() which will:
>>
>> - Call invalidate_inode_pages2() and catch its return value
>> If it returned 0 as expected, that's great and we can call it a day.
>>
>> - Otherwise go through each extent buffer in buffer_tree
>> For eb that still has EXTENT_BUFFER_READING flag, wait for them to
>> finish first.
>>
>> Then check if the refs of the eb is greater than 1, if so it means
>> it's still held by someone, give it a warning.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221270
>> Reported-by: AHN SEOK-YOUNG <iamsyahn@gmail.com>
>> Cc: Teng Liu <27rabbitlt@gmail.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> This new debug feature has exposed some problems, e.g. generic/648,
>> after an injected failure for device barrier, transaction is aborted but
>> we have a lot of dirty tree blocks unreleased and triggered the new
>> warning.
>>
>> We still need to fix these corner cases.
>>
>> Meanwhile if Ahn or Teng can give this patch a try for the existing
>> use-after-free problem, it would be appreciated a lot.
>> ---
>> fs/btrfs/disk-io.c | 52 ++++++++++++++++++++++++++++++++++++++++++--
>> fs/btrfs/extent_io.c | 6 -----
>> fs/btrfs/extent_io.h | 6 +++++
>> 3 files changed, 56 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b8ac3275d8f8..d450bfc6d89b 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3276,6 +3276,54 @@ static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
>> return false;
>> }
>>
>> +static void invalidate_btree_folios(struct btrfs_fs_info *fs_info)
>> +{
>> + XA_STATE(xas, &fs_info->buffer_tree, 0);
>> + unsigned int checked = 0;
>> + void *tmp;
>> + int ret;
>> +
>> + ret = invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
>> + if (likely(ret == 0))
>> + return;
>> +
>> + /*
>> + * Some btree pages can not be invalidated, this happens when some
>> + * tree blocks are still held (either by some pointer or readahead).
>> + */
>> + xas_lock_irq(&xas);
>> + xas_for_each(&xas, tmp, ULONG_MAX) {
>> + struct extent_buffer *eb = tmp;
>
> Why use the tmp variable? We can pass 'eb' to xas_for_each().
>
>> +
>> + /* Wait for any readahead first. */
>> + if (test_bit(EXTENT_BUFFER_READING, &eb->bflags)) {
>> + xas_pause(&xas);
>> + xas_unlock_irq(&xas);
>> + wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING,
>> + TASK_UNINTERRUPTIBLE);
>> + xas_lock_irq(&xas);
>> + }
>> +
>> + spin_lock(&eb->refs_lock);
>> + if (refcount_read(&eb->refs) > 1 || extent_buffer_under_io(eb)) {
>> + btrfs_warn(fs_info,
>> + "unbale to release extent buffer %llu owner %llu flags 0x%lx",
>
> unbale -> unable
>
>> + eb->start, btrfs_header_owner(eb), eb->bflags);
>> + if (IS_ENABLED(CONFIG_BTRFS_DEBUG))
>> + WARN_ON_ONCE(true);
>
> Pointless if, just do DEBUG_WARN();
The problem is we do not have DEBUG_WARN_ON_ONCE().
We already have cases triggering this during fstests, and there were a
lot of ebs that are dirty in that case.
DEBUG_WARN() will flood the dmesg with the same call trace for each
dirty eb.
I can add a new DEBUG_WARN_ON_ONCE() for this particular call site though.
Thanks,
Qu
next prev parent reply other threads:[~2026-04-15 23:02 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-05 4:56 [PATCH] btrfs: warn about extent buffer that can not be released Qu Wenruo
2026-04-05 10:05 ` 말란말야
2026-04-08 22:18 ` Teng Liu
2026-04-15 11:25 ` Filipe Manana
2026-04-15 23:02 ` Qu Wenruo [this message]
2026-04-16 10:22 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ec6a7a95-28d2-41bc-85a7-ef36e58ff9bd@suse.com \
--to=wqu@suse.com \
--cc=27rabbitlt@gmail.com \
--cc=fdmanana@kernel.org \
--cc=iamsyahn@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox