From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, AHN SEOK-YOUNG <iamsyahn@gmail.com>,
Teng Liu <27rabbitlt@gmail.com>
Subject: Re: [PATCH v3] btrfs: warn about extent buffer that can not be released
Date: Mon, 27 Apr 2026 17:48:05 +0200 [thread overview]
Message-ID: <20260427154805.GQ12792@twin.jikos.cz> (raw)
In-Reply-To: <4ac4a9f2c599841b00f39f3be082432f43130f3e.1776379191.git.wqu@suse.com>
On Fri, Apr 17, 2026 at 08:13:14AM +0930, Qu Wenruo wrote:
> When we unmount the fs or during mount failures, btrfs will call
> invalidate_inode_pages() to release all btree inode folios.
>
> However that function can return -EBUSY if any folios can not be
> invalidated.
> This can be caused by:
>
> - Some extent buffers are still held by btrfs
> This is a logic error, as we should release all tree root nodes
> during unmount and mount failure handling.
>
> - Some extent buffers are under readahead and haven't yet finished
> This is much rarer but valid cases.
> In that case we should wait for those extent buffers.
>
> Introduce a new helper invalidate_btree_folios() which will:
>
> - Call invalidate_inode_pages2() and catch its return value
> If it returned 0 as expected, that's great and we can call it a day.
>
> - Otherwise go through each extent buffer in buffer_tree
> Increase the ref by one first for the eb we're checking.
> This is to ensure the eb won't be freed after the readahead is
> finished.
>
> For eb that still has EXTENT_BUFFER_READING flag, wait for them to
> finish first.
>
> After waiting for the readahead, check the refs of the eb and if it's
> still dirty.
>
> If the eb refs is greater than 2 (one for the buffer tree, one hold by
> us), it means we are still holding the extent buffer somewhere else,
> which is a logic bug.
>
> If the eb is still dirty, it means a bug in transaction handling.
> Unfortunately there are already test cases triggering this warning, so
> our transaction cleanup hasn't done its work reliably.
>
> For either case, show a warning message about the eb, including its
> bytenr, owner, refs and flags.
> And if it's a debug build, also trigger WARN_ON_ONCE() so that fstests
> can properly catch such situation.
>
> Furthermore, to help debugging the unreleased extent buffers, output the
> transid of the current aborted transaction, so that we can know which
> transaction the unreleased extent buffers belong to.
>
> This will help future debugging as we're already hitting the new
> warnings from test cases like generic/388.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221270
> Reported-by: AHN SEOK-YOUNG <iamsyahn@gmail.com>
> Cc: Teng Liu <27rabbitlt@gmail.com>
> Tested-by: Teng Liu <27rabbitlt@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v3:
> - Revert the DEBUG_WANR_ON_ONCE() change
> As there is only one user, a simple
> WARN_ON_ONCE(IS_ENABLED(CONFIG_BTRFS_DEBUG)) is more than enough.
>
> - Output the generation of the unreleased eb too
> Since it's possible to have 2 transactions (one committing and reached
> UNBLOCKED state, one new running), the generation output will help us
> to know which transaction the unreleased eb belongs to.
>
> - Also output the transid when a transaction is aborted
> To co-operate with the above change for debugging.
>
> v2:
> - Add one extra ref before checking the eb
> Although readahead has one extra ref, after the readahead finished the
> extra ref will be dropped, and memory pressure can kick in to free the
> extent buffer.
>
> - Use rcu lock with xa_for_each() instead of xas lock and xas_for_each()
> Since we're holding one extra eb ref to prevent eb from disappearing,
> we no longer needs the more strict xas lock nor the extra xas
> pause/unlock.
>
> Although xa_for_each() is more time consuming, we're at the cold path
> already, not a huge cost.
>
> - Remove the temporarary void pointer
> And pass eb pointer directly into xas_for_each().
>
> - Introduce DEBUG_WARN_ON_ONCE() helper
> To follow the existing DEBUG_WARN() helper.
>
> - Fix a typo
>
> - Also fix the checkpatch warning on the exist DEBUG_WARN()
> ---
> fs/btrfs/disk-io.c | 49 ++++++++++++++++++++++++++++++++++++++++--
> fs/btrfs/extent_io.c | 6 ------
> fs/btrfs/extent_io.h | 6 ++++++
> fs/btrfs/transaction.h | 8 +++----
> 4 files changed, 57 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 7800a1b20290..241acdc16da1 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3272,6 +3272,51 @@ static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
> return false;
> }
>
> +static void invalidate_btree_folios(struct btrfs_fs_info *fs_info)
This is too close to the generic invalidate_inode_pages2, please add
btrfs_ prefix.
> +{
> + unsigned long index = 0;
> + struct extent_buffer *eb;
> + int ret;
> +
> + ret = invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> + if (likely(ret == 0))
> + return;
> +
> + /*
> + * Some btree pages can not be invalidated, this happens when some
> + * tree blocks are still held (either by some pointer or readahead).
> + */
> + rcu_read_lock();
> + xa_for_each(&fs_info->buffer_tree, index, eb) {
> + /* Increase the ref so that the eb won't disappear. */
> + if (!refcount_inc_not_zero(&eb->refs))
> + continue;
> + rcu_read_unlock();
> +
> + /* Wait for any readahead first. */
> + if (test_bit(EXTENT_BUFFER_READING, &eb->bflags))
> + wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING,
> + TASK_UNINTERRUPTIBLE);
> + /*
> + * The refs threshold is 2, one hold by us at the beginning
> + * of the loop, one for the ownership in the buffer tree.
> + */
> + if (unlikely(refcount_read(&eb->refs) > 2 ||
> + extent_buffer_under_io(eb))) {
> + WARN_ON_ONCE(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> + btrfs_warn(fs_info,
> + "unable to release extent buffer %llu owner %llu gen %llu refs %u flags 0x%lx",
> + eb->start, btrfs_header_owner(eb),
> + btrfs_header_generation(eb),
> + refcount_read(&eb->refs), eb->bflags);
> + }
> + free_extent_buffer(eb);
> + rcu_read_lock();
> + }
> + rcu_read_unlock();
> + invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> +}
> +
> int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices)
> {
> u32 sectorsize;
> @@ -3702,7 +3747,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> if (fs_info->data_reloc_root)
> btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
> free_root_pointers(fs_info, true);
> - invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> + invalidate_btree_folios(fs_info);
>
> fail_sb_buffer:
> btrfs_stop_all_workers(fs_info);
> @@ -4431,7 +4476,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
> * We must make sure there is not any read request to
> * submit after we stop all workers.
> */
> - invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> + invalidate_btree_folios(fs_info);
> btrfs_stop_all_workers(fs_info);
>
> /*
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 8d241a7a880f..4eab0f9909e3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2872,12 +2872,6 @@ bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
> return try_release_extent_state(io_tree, folio);
> }
>
> -static int extent_buffer_under_io(const struct extent_buffer *eb)
> -{
> - return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
> - test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> -}
> -
> static bool folio_range_has_eb(struct folio *folio)
> {
> struct btrfs_folio_state *bfs;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index fd209233317f..b284aee1bfb0 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -326,6 +326,12 @@ static inline bool extent_buffer_uptodate(const struct extent_buffer *eb)
> return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
> }
>
> +static inline bool extent_buffer_under_io(const struct extent_buffer *eb)
> +{
> + return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
> + test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> +}
> +
> int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
> unsigned long start, unsigned long len);
> void read_extent_buffer(const struct extent_buffer *eb, void *dst,
> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
> index 7d70fe486758..264dcd4b3788 100644
> --- a/fs/btrfs/transaction.h
> +++ b/fs/btrfs/transaction.h
> @@ -255,13 +255,13 @@ do { \
> __first = true; \
> if (WARN(btrfs_abort_should_print_stack(error), \
> KERN_ERR \
> - "BTRFS: Transaction aborted (error %d)\n", \
> - (error))) { \
> + "BTRFS: Transaction %llu aborted (error %d)\n", \
> + (trans)->transid, (error))) { \
> /* Stack trace printed. */ \
> } else { \
> btrfs_err((trans)->fs_info, \
> - "Transaction aborted (error %d)", \
> - (error)); \
> + "Transaction %llu aborted (error %d)", \
> + (trans)->transid, (error)); \
Adding the transaction number adds like 4KiB of object code because the
calls are inlined so we can have exact location and stack.
It could be possibly moved to __btrfs_abort_transaction() but with some
additinal shuffling of the code from macro to the handler.
next prev parent reply other threads:[~2026-04-27 15:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-16 22:43 [PATCH v3] btrfs: warn about extent buffer that can not be released Qu Wenruo
2026-04-27 15:48 ` David Sterba [this message]
2026-04-27 22:01 ` Qu Wenruo
2026-04-28 15:17 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260427154805.GQ12792@twin.jikos.cz \
--to=dsterba@suse.cz \
--cc=27rabbitlt@gmail.com \
--cc=iamsyahn@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox