[PATCH v4 0/2] btrfs: fix and detect dirty/held ebs at unmount time

Linux Btrfs filesystem development
 help / color / mirror / Atom feed

* [PATCH v4 0/2] btrfs: fix and detect dirty/held ebs at unmount time
@ 2026-04-30  1:07 Qu Wenruo
  2026-04-30  1:07 ` [PATCH v4 1/2] btrfs: only release the dirty pages io tree after successful writes Qu Wenruo
  2026-04-30  1:07 ` [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released Qu Wenruo
  0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2026-04-30  1:07 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v4:
- Add the patch into the series which fixes the dirty ebs

- Remove the transid output
  It's not really utilized for the particular bug, and can be added
  later easily.

- Rename the detecting function to invalidate_and_check_btree_folios()

- Update the commit message in the 1st patch
  The dirty ebs won't cause NULL pointer dereferences even all workers
  are already stopped.
  As in this particular case the fs is already in RO mode, thus all
  metadata writeback will fail immediately without going through the
  writeback path.

  But still better to fix the bug other than relying on extra safety
  nets.
  
v3:
- Revert the DEBUG_WANR_ON_ONCE() change
  As there is only one user, a simple
  WARN_ON_ONCE(IS_ENABLED(CONFIG_BTRFS_DEBUG)) is more than enough.

- Output the generation of the unreleased eb too
  Since it's possible to have 2 transactions (one committing and reached
  UNBLOCKED state, one new running), the generation output will help us
  to know which transaction the unreleased eb belongs to.

- Also output the transid when a transaction is aborted
  To co-operate with the above change for debugging.

v2:
- Add one extra ref before checking the eb
  Although readahead has one extra ref, after the readahead finished the
  extra ref will be dropped, and memory pressure can kick in to free the
  extent buffer.

- Use rcu lock with xa_for_each() instead of xas lock and xas_for_each()
  Since we're holding one extra eb ref to prevent eb from disappearing,
  we no longer needs the more strict xas lock nor the extra xas
  pause/unlock.

  Although xa_for_each() is more time consuming, we're at the cold path
  already, not a huge cost.

- Remove the temporarary void pointer
  And pass eb pointer directly into xas_for_each().

- Introduce DEBUG_WARN_ON_ONCE() helper
  To follow the existing DEBUG_WARN() helper.

- Fix a typo

- Also fix the checkpatch warning on the exist DEBUG_WARN()

There is a report about readahead triggering NULL pointer dereferences,
which inspired me to add a detection mechanism about any dirty/held ebs
during unmount.

It turns out that several error injection/shutdown test cases can
trigger the new detection mechanism, and after more debugging it shows
that we had a bug in the transaction abort path that doesn't properly
cleanup the dirty ebs.

The first patch fixes the bug first, then the second patch introduces
the new detection mechanism.

Qu Wenruo (2):
  btrfs: only release the dirty pages io tree after successful writes
  btrfs: warn about extent buffer that can not be released

 fs/btrfs/disk-io.c     | 55 ++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/extent_io.c   |  6 -----
 fs/btrfs/extent_io.h   |  6 +++++
 fs/btrfs/transaction.c |  9 +++----
 4 files changed, 63 insertions(+), 13 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/2] btrfs: only release the dirty pages io tree after successful writes
  2026-04-30  1:07 [PATCH v4 0/2] btrfs: fix and detect dirty/held ebs at unmount time Qu Wenruo
@ 2026-04-30  1:07 ` Qu Wenruo
  2026-04-30 12:53   ` Filipe Manana
  2026-04-30  1:07 ` [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released Qu Wenruo
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2026-04-30  1:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: stable

[BUG]
With the recent commit "btrfs: warn about extent buffer that can not be
released", we can trigger the following warning running test cases like
generic/388 at unmount:

 BTRFS critical (device dm-2 state E): emergency shutdown
 BTRFS error (device dm-2 state E): cow_file_range failed, root=5 inode=265 start=135168 len=118784 cur_offset=135168 cur_alloc_size=0: -5
 BTRFS error (device dm-2 state E): error while writing out transaction: -30
 BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction.
 BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30)
 BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem
 BTRFS info (device dm-2 state EA): forced readonly
 BTRFS error (device dm-2 state EA): failed to run delalloc range, root=5 ino=265 folio=135168 submit_bitmap=0 start=135168 len=118784: -5
 BTRFS info (device dm-2 state EA): last unmount of filesystem 8b3d8748-4710-4b5a-84d9-b072cb03be2d
 ------------[ cut here ]------------
 WARNING: disk-io.c:3306 at invalidate_btree_folios+0xfd/0x1ca [btrfs], CPU#4: umount/60183
 CPU: 4 UID: 0 PID: 60183 Comm: umount Tainted: G        W  OE       7.0.0-rc6-custom+ #365 PREEMPT(full)  5804053f02137e627472d94b5128cc9fcb110e88
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
 RIP: 0010:invalidate_btree_folios+0xfd/0x1ca [btrfs]
 Call Trace:
  <TASK>
  close_ctree+0x534/0x57a [btrfs eeeee2af86b856a32e0b81b75d427a17a62ffe29]
  generic_shutdown_super+0x89/0x1a0
  kill_anon_super+0x16/0x40
  btrfs_kill_super+0x16/0x20 [btrfs eeeee2af86b856a32e0b81b75d427a17a62ffe29]
  deactivate_locked_super+0x2d/0xb0
  cleanup_mnt+0xdc/0x140
  task_work_run+0x5a/0xa0
  exit_to_user_mode_loop+0x123/0x4b0
  do_syscall_64+0x288/0x7d0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
  </TASK>
 ---[ end trace 0000000000000000 ]---
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30507008 owner 1 gen 9 refs 2 flags 0x7
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30588928 owner 9 gen 9 refs 2 flags 0x7
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30605312 owner 257 gen 9 refs 2 flags 0x7
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30621696 owner 7 gen 9 refs 2 flags 0x7
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30638080 owner 258 gen 9 refs 2 flags 0x7
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30654464 owner 2 gen 9 refs 2 flags 0x7
 BTRFS warning (device dm-2 state EA): unable to release extent buffer 30670848 owner 10 gen 9 refs 2 flags 0x7

I'm using a stripped down version, which seems to trigger the warning
more reliably:

  _fsstress_pid=""
  workload()
  {
  	dmesg -C
  	mkfs.btrfs -f -K $dev > /dev/null
  	echo 1 > /sys/kernel/debug/clear_warn_once
  	mount $dev $mnt
  	$fsstress -w -n 1024 -p 4 -d $mnt &
  	_fsstress_pid=$!
  	sleep 0
  	$godown $mnt
  	pkill --echo -PIPE fsstress > /dev/null
  	wait $_fsstress_pid
  	unset _fsstress_pid
  	umount $mnt

  	if dmesg | grep -q "WARNING"; then
  		fail
  	fi
  }

  for (( i = 0; i < $runtime; i++ )); do
  	echo "=== $i/$runtime ==="
  	workload
  done

[CAUSE]
Inside btrfs_write_and_wait_transaction(), we first try to write all
dirty ebs, then wait for them to finish.

After that we call btrfs_extent_io_tree_release() to free all
extent states from dirty_pages io tree.

However if we hit an error from btrfs_write_marked_extent(), then we
still call btrfs_extent_io_tree_release() to clear that dirty_pages io
tree, which may contain dirty records that we haven't yet submitted.

Furthermore, the later transaction cleanup path will utilize that
dirty_pages io tree to properly cleanup those dirty ebs, but since it's
already empty, no dirty ebs are properly cleaned up, thus will later
trigger the warnings inside invalidate_btree_folios().

[FIX]
Normally such dirty ebs won't cause problems, as when the iput() is
called on the btree inode, the dirty ebs will be forcibly written back,
and since the fs is already in an error status, such writeback will not
reach disk and finish immediately.

But it's still better to get rid of such dirty ebs, if we ended up with
dirty ebs but the fs is not in an error status, then such writeback at
iput() time will be too late, as all workers are already stopped but
writeback will utilize workers, which will lead to NULL pointer
dereferences.

Instead of unconditionally calling btrfs_extent_io_tree_release(), only
call it if btrfs_write_and_wait_transaction() finished successfully, so
that @dirty_pages extent io tree is kept untouched for transaction
cleanup.

CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c     | 1 +
 fs/btrfs/transaction.c | 9 ++++-----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 308955f0592a..f28cef8217de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4679,6 +4679,7 @@ static void btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
 			free_extent_buffer_stale(eb);
 		}
 	}
+	btrfs_extent_io_tree_release(dirty_pages);
 }
 
 static void btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 248adb785051..194f581b36f3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1293,14 +1293,13 @@ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans)
 	blk_finish_plug(&plug);
 	ret2 = btrfs_wait_extents(fs_info, dirty_pages);
 
-	btrfs_extent_io_tree_release(&trans->transaction->dirty_pages);
-
 	if (ret)
 		return ret;
-	else if (ret2)
+	if (ret2)
 		return ret2;
-	else
-		return 0;
+
+	btrfs_extent_io_tree_release(&trans->transaction->dirty_pages);
+	return 0;
 }
 
 /*
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released
  2026-04-30  1:07 [PATCH v4 0/2] btrfs: fix and detect dirty/held ebs at unmount time Qu Wenruo
  2026-04-30  1:07 ` [PATCH v4 1/2] btrfs: only release the dirty pages io tree after successful writes Qu Wenruo
@ 2026-04-30  1:07 ` Qu Wenruo
  2026-04-30 12:48   ` Filipe Manana
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2026-04-30  1:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: AHN SEOK-YOUNG, Teng Liu

When we unmount the fs or during mount failures, btrfs will call
invalidate_inode_pages() to release all btree inode folios.

However that function can return -EBUSY if any folios can not be
invalidated.
This can be caused by:

- Some extent buffers are still held by btrfs
  This is a logic error, as we should release all tree root nodes
  during unmount and mount failure handling.

- Some extent buffers are under readahead and haven't yet finished
  These are much rarer but valid cases.
  In that case we should wait for those extent buffers.

Introduce a new helper invalidate_and_check_btree_folios() which will:

- Call invalidate_inode_pages2() and catch its return value
  If it returned 0 as expected, that's great and we can call it a day.

- Otherwise go through each extent buffer in buffer_tree
  Increase the ref by one first for the eb we're checking.
  This is to ensure the eb won't be freed after the readahead is
  finished.

  For ebs that still have EXTENT_BUFFER_READING flag, wait for them to
  finish first.

  After waiting for the readahead, check the refs of the eb and if it's
  still dirty.

  If the eb ref count is greater than 2 (one for the buffer tree, one
  held by us), it means we are still holding the extent buffer somewhere
  else, which is a code bug.

  If the eb is still dirty, it means a bug in transaction handling, e.g.
  the bug fixed by patch "btrfs: only release the dirty pages io tree
  after successful writes".

  For either case, show a warning message about the eb, including its
  bytenr, owner, refs and flags.
  And if it's a debug build, also trigger WARN_ON_ONCE() so that fstests
  can properly catch such situation.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=221270
Reported-by: AHN SEOK-YOUNG <iamsyahn@gmail.com>
Cc: Teng Liu <27rabbitlt@gmail.com>
Tested-by: Teng Liu <27rabbitlt@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 54 ++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/extent_io.c |  6 -----
 fs/btrfs/extent_io.h |  6 +++++
 3 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f28cef8217de..97299394515b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3272,6 +3272,56 @@ static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
 	return false;
 }
 
+/*
+ * Try to wait for any metadata readahead, and invalidate all btree folios.
+ *
+ * If the invalidation failed, report any dirty/held extent buffers.
+ */
+static void invalidate_and_check_btree_folios(struct btrfs_fs_info *fs_info)
+{
+	unsigned long index = 0;
+	struct extent_buffer *eb;
+	int ret;
+
+	ret = invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+	if (likely(ret == 0))
+		return;
+
+	/*
+	 * Some btree pages can not be invalidated, this happens when some
+	 * tree blocks are still held (either by some pointer or readahead).
+	 */
+	rcu_read_lock();
+	xa_for_each(&fs_info->buffer_tree, index, eb) {
+		/* Increase the ref so that the eb won't disappear. */
+		if (!refcount_inc_not_zero(&eb->refs))
+			continue;
+		rcu_read_unlock();
+
+		/* Wait for any readahead first. */
+		if (test_bit(EXTENT_BUFFER_READING, &eb->bflags))
+			wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING,
+				       TASK_UNINTERRUPTIBLE);
+		/*
+		 * The refs threshold is 2, one held by us at the beginning
+		 * of the loop, one for the ownership in the buffer tree.
+		 */
+		if (unlikely(refcount_read(&eb->refs) > 2 ||
+			     extent_buffer_under_io(eb))) {
+			WARN_ON_ONCE(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+			btrfs_warn(fs_info,
+			"unable to release extent buffer %llu owner %llu gen %llu refs %u flags 0x%lx",
+				   eb->start, btrfs_header_owner(eb),
+				   btrfs_header_generation(eb),
+				   refcount_read(&eb->refs), eb->bflags);
+		}
+		free_extent_buffer(eb);
+		rcu_read_lock();
+	}
+	rcu_read_unlock();
+	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+}
+
 int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices)
 {
 	u32 sectorsize;
@@ -3709,7 +3759,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (fs_info->data_reloc_root)
 		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
 	free_root_pointers(fs_info, true);
-	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+	invalidate_and_check_btree_folios(fs_info);
 
 fail_sb_buffer:
 	btrfs_stop_all_workers(fs_info);
@@ -4438,7 +4488,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	 * We must make sure there is not any read request to
 	 * submit after we stop all workers.
 	 */
-	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+	invalidate_and_check_btree_folios(fs_info);
 	btrfs_stop_all_workers(fs_info);
 
 	/*
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8800faa8b4be..9c9dffda2b07 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2882,12 +2882,6 @@ bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
 	return try_release_extent_state(io_tree, folio);
 }
 
-static int extent_buffer_under_io(const struct extent_buffer *eb)
-{
-	return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
-		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-}
-
 static bool folio_range_has_eb(struct folio *folio)
 {
 	struct btrfs_folio_state *bfs;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b310a5145cf6..7b4152387d88 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -327,6 +327,12 @@ static inline bool extent_buffer_uptodate(const struct extent_buffer *eb)
 	return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 }
 
+static inline bool extent_buffer_under_io(const struct extent_buffer *eb)
+{
+	return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
+		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+}
+
 int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 			 unsigned long start, unsigned long len);
 void read_extent_buffer(const struct extent_buffer *eb, void *dst,
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released
  2026-04-30  1:07 ` [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released Qu Wenruo
@ 2026-04-30 12:48   ` Filipe Manana
  2026-04-30 21:58     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2026-04-30 12:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, AHN SEOK-YOUNG, Teng Liu

On Thu, Apr 30, 2026 at 2:08 AM Qu Wenruo <wqu@suse.com> wrote:
>
> When we unmount the fs or during mount failures, btrfs will call
> invalidate_inode_pages() to release all btree inode folios.
>
> However that function can return -EBUSY if any folios can not be
> invalidated.
> This can be caused by:
>
> - Some extent buffers are still held by btrfs
>   This is a logic error, as we should release all tree root nodes
>   during unmount and mount failure handling.
>
> - Some extent buffers are under readahead and haven't yet finished
>   These are much rarer but valid cases.
>   In that case we should wait for those extent buffers.
>
> Introduce a new helper invalidate_and_check_btree_folios() which will:
>
> - Call invalidate_inode_pages2() and catch its return value
>   If it returned 0 as expected, that's great and we can call it a day.
>
> - Otherwise go through each extent buffer in buffer_tree
>   Increase the ref by one first for the eb we're checking.
>   This is to ensure the eb won't be freed after the readahead is
>   finished.
>
>   For ebs that still have EXTENT_BUFFER_READING flag, wait for them to
>   finish first.
>
>   After waiting for the readahead, check the refs of the eb and if it's
>   still dirty.
>
>   If the eb ref count is greater than 2 (one for the buffer tree, one
>   held by us), it means we are still holding the extent buffer somewhere
>   else, which is a code bug.
>
>   If the eb is still dirty, it means a bug in transaction handling, e.g.
>   the bug fixed by patch "btrfs: only release the dirty pages io tree
>   after successful writes".
>
>   For either case, show a warning message about the eb, including its
>   bytenr, owner, refs and flags.
>   And if it's a debug build, also trigger WARN_ON_ONCE() so that fstests
>   can properly catch such situation.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221270
> Reported-by: AHN SEOK-YOUNG <iamsyahn@gmail.com>
> Cc: Teng Liu <27rabbitlt@gmail.com>
> Tested-by: Teng Liu <27rabbitlt@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c   | 54 ++++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/extent_io.c |  6 -----
>  fs/btrfs/extent_io.h |  6 +++++
>  3 files changed, 58 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f28cef8217de..97299394515b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3272,6 +3272,56 @@ static bool fs_is_full_ro(const struct btrfs_fs_info *fs_info)
>         return false;
>  }
>
> +/*
> + * Try to wait for any metadata readahead, and invalidate all btree folios.
> + *
> + * If the invalidation failed, report any dirty/held extent buffers.
> + */
> +static void invalidate_and_check_btree_folios(struct btrfs_fs_info *fs_info)
> +{
> +       unsigned long index = 0;
> +       struct extent_buffer *eb;
> +       int ret;
> +
> +       ret = invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> +       if (likely(ret == 0))
> +               return;
> +
> +       /*
> +        * Some btree pages can not be invalidated, this happens when some
> +        * tree blocks are still held (either by some pointer or readahead).

"by some pointer" - this is really confusing. We want to say some
other task is holding a ref or perhaps we have a refcount leak
somewhere.

> +        */
> +       rcu_read_lock();
> +       xa_for_each(&fs_info->buffer_tree, index, eb) {
> +               /* Increase the ref so that the eb won't disappear. */
> +               if (!refcount_inc_not_zero(&eb->refs))
> +                       continue;
> +               rcu_read_unlock();
> +
> +               /* Wait for any readahead first. */
> +               if (test_bit(EXTENT_BUFFER_READING, &eb->bflags))
> +                       wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING,
> +                                      TASK_UNINTERRUPTIBLE);
> +               /*
> +                * The refs threshold is 2, one held by us at the beginning
> +                * of the loop, one for the ownership in the buffer tree.
> +                */
> +               if (unlikely(refcount_read(&eb->refs) > 2 ||
> +                            extent_buffer_under_io(eb))) {
> +                       WARN_ON_ONCE(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +                       btrfs_warn(fs_info,
> +                       "unable to release extent buffer %llu owner %llu gen %llu refs %u flags 0x%lx",
> +                                  eb->start, btrfs_header_owner(eb),
> +                                  btrfs_header_generation(eb),
> +                                  refcount_read(&eb->refs), eb->bflags);
> +               }
> +               free_extent_buffer(eb);
> +               rcu_read_lock();
> +       }
> +       rcu_read_unlock();
> +       invalidate_inode_pages2(fs_info->btree_inode->i_mapping);

Can we report/WARN_ON if invalidate_inode_pages2() returns an error again?

Otherwise:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +}
> +
>  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices)
>  {
>         u32 sectorsize;
> @@ -3709,7 +3759,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>         if (fs_info->data_reloc_root)
>                 btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
>         free_root_pointers(fs_info, true);
> -       invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> +       invalidate_and_check_btree_folios(fs_info);
>
>  fail_sb_buffer:
>         btrfs_stop_all_workers(fs_info);
> @@ -4438,7 +4488,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>          * We must make sure there is not any read request to
>          * submit after we stop all workers.
>          */
> -       invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> +       invalidate_and_check_btree_folios(fs_info);
>         btrfs_stop_all_workers(fs_info);
>
>         /*
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 8800faa8b4be..9c9dffda2b07 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2882,12 +2882,6 @@ bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
>         return try_release_extent_state(io_tree, folio);
>  }
>
> -static int extent_buffer_under_io(const struct extent_buffer *eb)
> -{
> -       return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
> -               test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> -}
> -
>  static bool folio_range_has_eb(struct folio *folio)
>  {
>         struct btrfs_folio_state *bfs;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index b310a5145cf6..7b4152387d88 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -327,6 +327,12 @@ static inline bool extent_buffer_uptodate(const struct extent_buffer *eb)
>         return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
>  }
>
> +static inline bool extent_buffer_under_io(const struct extent_buffer *eb)
> +{
> +       return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
> +               test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> +}
> +
>  int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
>                          unsigned long start, unsigned long len);
>  void read_extent_buffer(const struct extent_buffer *eb, void *dst,
> --
> 2.54.0
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 1/2] btrfs: only release the dirty pages io tree after successful writes
  2026-04-30  1:07 ` [PATCH v4 1/2] btrfs: only release the dirty pages io tree after successful writes Qu Wenruo
@ 2026-04-30 12:53   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2026-04-30 12:53 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, stable

On Thu, Apr 30, 2026 at 2:07 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> With the recent commit "btrfs: warn about extent buffer that can not be
> released",

It's a bit odd to refer to that patch as it comes after in the series.

> we can trigger the following warning running test cases like
> generic/388 at unmount:
>
>  BTRFS critical (device dm-2 state E): emergency shutdown
>  BTRFS error (device dm-2 state E): cow_file_range failed, root=5 inode=265 start=135168 len=118784 cur_offset=135168 cur_alloc_size=0: -5
>  BTRFS error (device dm-2 state E): error while writing out transaction: -30
>  BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction.
>  BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30)
>  BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem
>  BTRFS info (device dm-2 state EA): forced readonly
>  BTRFS error (device dm-2 state EA): failed to run delalloc range, root=5 ino=265 folio=135168 submit_bitmap=0 start=135168 len=118784: -5
>  BTRFS info (device dm-2 state EA): last unmount of filesystem 8b3d8748-4710-4b5a-84d9-b072cb03be2d
>  ------------[ cut here ]------------
>  WARNING: disk-io.c:3306 at invalidate_btree_folios+0xfd/0x1ca [btrfs], CPU#4: umount/60183

This stack trace is also outdated because the function is now named
invalidate_and_check_btree_folios().

Otherwise:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>  CPU: 4 UID: 0 PID: 60183 Comm: umount Tainted: G        W  OE       7.0.0-rc6-custom+ #365 PREEMPT(full)  5804053f02137e627472d94b5128cc9fcb110e88
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
>  RIP: 0010:invalidate_btree_folios+0xfd/0x1ca [btrfs]
>  Call Trace:
>   <TASK>
>   close_ctree+0x534/0x57a [btrfs eeeee2af86b856a32e0b81b75d427a17a62ffe29]
>   generic_shutdown_super+0x89/0x1a0
>   kill_anon_super+0x16/0x40
>   btrfs_kill_super+0x16/0x20 [btrfs eeeee2af86b856a32e0b81b75d427a17a62ffe29]
>   deactivate_locked_super+0x2d/0xb0
>   cleanup_mnt+0xdc/0x140
>   task_work_run+0x5a/0xa0
>   exit_to_user_mode_loop+0x123/0x4b0
>   do_syscall_64+0x288/0x7d0
>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   </TASK>
>  ---[ end trace 0000000000000000 ]---
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30507008 owner 1 gen 9 refs 2 flags 0x7
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30588928 owner 9 gen 9 refs 2 flags 0x7
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30605312 owner 257 gen 9 refs 2 flags 0x7
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30621696 owner 7 gen 9 refs 2 flags 0x7
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30638080 owner 258 gen 9 refs 2 flags 0x7
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30654464 owner 2 gen 9 refs 2 flags 0x7
>  BTRFS warning (device dm-2 state EA): unable to release extent buffer 30670848 owner 10 gen 9 refs 2 flags 0x7
>
> I'm using a stripped down version, which seems to trigger the warning
> more reliably:
>
>   _fsstress_pid=""
>   workload()
>   {
>         dmesg -C
>         mkfs.btrfs -f -K $dev > /dev/null
>         echo 1 > /sys/kernel/debug/clear_warn_once
>         mount $dev $mnt
>         $fsstress -w -n 1024 -p 4 -d $mnt &
>         _fsstress_pid=$!
>         sleep 0
>         $godown $mnt
>         pkill --echo -PIPE fsstress > /dev/null
>         wait $_fsstress_pid
>         unset _fsstress_pid
>         umount $mnt
>
>         if dmesg | grep -q "WARNING"; then
>                 fail
>         fi
>   }
>
>   for (( i = 0; i < $runtime; i++ )); do
>         echo "=== $i/$runtime ==="
>         workload
>   done
>
> [CAUSE]
> Inside btrfs_write_and_wait_transaction(), we first try to write all
> dirty ebs, then wait for them to finish.
>
> After that we call btrfs_extent_io_tree_release() to free all
> extent states from dirty_pages io tree.
>
> However if we hit an error from btrfs_write_marked_extent(), then we
> still call btrfs_extent_io_tree_release() to clear that dirty_pages io
> tree, which may contain dirty records that we haven't yet submitted.
>
> Furthermore, the later transaction cleanup path will utilize that
> dirty_pages io tree to properly cleanup those dirty ebs, but since it's
> already empty, no dirty ebs are properly cleaned up, thus will later
> trigger the warnings inside invalidate_btree_folios().
>
> [FIX]
> Normally such dirty ebs won't cause problems, as when the iput() is
> called on the btree inode, the dirty ebs will be forcibly written back,
> and since the fs is already in an error status, such writeback will not
> reach disk and finish immediately.
>
> But it's still better to get rid of such dirty ebs, if we ended up with
> dirty ebs but the fs is not in an error status, then such writeback at
> iput() time will be too late, as all workers are already stopped but
> writeback will utilize workers, which will lead to NULL pointer
> dereferences.
>
> Instead of unconditionally calling btrfs_extent_io_tree_release(), only
> call it if btrfs_write_and_wait_transaction() finished successfully, so
> that @dirty_pages extent io tree is kept untouched for transaction
> cleanup.
>
> CC: stable@vger.kernel.org # 6.1+
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c     | 1 +
>  fs/btrfs/transaction.c | 9 ++++-----
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 308955f0592a..f28cef8217de 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4679,6 +4679,7 @@ static void btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
>                         free_extent_buffer_stale(eb);
>                 }
>         }
> +       btrfs_extent_io_tree_release(dirty_pages);
>  }
>
>  static void btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 248adb785051..194f581b36f3 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1293,14 +1293,13 @@ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans)
>         blk_finish_plug(&plug);
>         ret2 = btrfs_wait_extents(fs_info, dirty_pages);
>
> -       btrfs_extent_io_tree_release(&trans->transaction->dirty_pages);
> -
>         if (ret)
>                 return ret;
> -       else if (ret2)
> +       if (ret2)
>                 return ret2;
> -       else
> -               return 0;
> +
> +       btrfs_extent_io_tree_release(&trans->transaction->dirty_pages);
> +       return 0;
>  }
>
>  /*
> --
> 2.54.0
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released
  2026-04-30 12:48   ` Filipe Manana
@ 2026-04-30 21:58     ` Qu Wenruo
  0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2026-04-30 21:58 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs, AHN SEOK-YOUNG, Teng Liu



在 2026/4/30 22:18, Filipe Manana 写道:
> On Thu, Apr 30, 2026 at 2:08 AM Qu Wenruo <wqu@suse.com> wrote:
[...]
>> +       }
>> +       rcu_read_unlock();
>> +       invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
> 
> Can we report/WARN_ON if invalidate_inode_pages2() returns an error again?

May not be a good idea, especially if we already hit a dirty eb and 
warned before.

Thanks,
Qu

> 
> Otherwise:
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Thanks.
> 
>> +}
>> +
>>   int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices)
>>   {
>>          u32 sectorsize;
>> @@ -3709,7 +3759,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>>          if (fs_info->data_reloc_root)
>>                  btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
>>          free_root_pointers(fs_info, true);
>> -       invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
>> +       invalidate_and_check_btree_folios(fs_info);
>>
>>   fail_sb_buffer:
>>          btrfs_stop_all_workers(fs_info);
>> @@ -4438,7 +4488,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>>           * We must make sure there is not any read request to
>>           * submit after we stop all workers.
>>           */
>> -       invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
>> +       invalidate_and_check_btree_folios(fs_info);
>>          btrfs_stop_all_workers(fs_info);
>>
>>          /*
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 8800faa8b4be..9c9dffda2b07 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2882,12 +2882,6 @@ bool try_release_extent_mapping(struct folio *folio, gfp_t mask)
>>          return try_release_extent_state(io_tree, folio);
>>   }
>>
>> -static int extent_buffer_under_io(const struct extent_buffer *eb)
>> -{
>> -       return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
>> -               test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>> -}
>> -
>>   static bool folio_range_has_eb(struct folio *folio)
>>   {
>>          struct btrfs_folio_state *bfs;
>> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
>> index b310a5145cf6..7b4152387d88 100644
>> --- a/fs/btrfs/extent_io.h
>> +++ b/fs/btrfs/extent_io.h
>> @@ -327,6 +327,12 @@ static inline bool extent_buffer_uptodate(const struct extent_buffer *eb)
>>          return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
>>   }
>>
>> +static inline bool extent_buffer_under_io(const struct extent_buffer *eb)
>> +{
>> +       return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
>> +               test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>> +}
>> +
>>   int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
>>                           unsigned long start, unsigned long len);
>>   void read_extent_buffer(const struct extent_buffer *eb, void *dst,
>> --
>> 2.54.0
>>
>>
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-30 21:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-30  1:07 [PATCH v4 0/2] btrfs: fix and detect dirty/held ebs at unmount time Qu Wenruo
2026-04-30  1:07 ` [PATCH v4 1/2] btrfs: only release the dirty pages io tree after successful writes Qu Wenruo
2026-04-30 12:53   ` Filipe Manana
2026-04-30  1:07 ` [PATCH v4 2/2] btrfs: warn about extent buffer that can not be released Qu Wenruo
2026-04-30 12:48   ` Filipe Manana
2026-04-30 21:58     ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox