* [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs
@ 2025-03-07 4:26 Qu Wenruo
2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07 4:26 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana
Although Filipe fixes a lot of cases where the delayed iputs ASSERT()
can be triggered, that's all focusing on the common cases where the fs
is still in a good shape.
However in my experimental 2K block size runs, the test case generic/475
has a very high chance that the emulated error makes the fs to flips
into an error status.
That error status will trigger a completely new path inside
close_ctree(), calling btrfs_error_commit_super() which flush and wait
for all ordered extents.
That btrfs_error_commit_super() will generate new delayed iputs
asynchronously, thus trigger the later ASSERT().
The first patch is to fix the error, at least I can no longer trigger
the ASSERT() in generic/475.
The second patch is to add an extra WARN_ON_ONCE() inside
btrfs_add_delayed_iput(), triggered if close_ctree() has processed after
btrfs_error_commit_super() calls.
Qu Wenruo (2):
btrfs: run btrfs_error_commit_super() early
btrfs: add extra warning if delayed iput is added when it's not
allowed
fs/btrfs/disk-io.c | 14 +++++++++++---
fs/btrfs/fs.h | 4 ++++
fs/btrfs/inode.c | 4 ++++
3 files changed, 19 insertions(+), 3 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] btrfs: run btrfs_error_commit_super() early
2025-03-07 4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo
@ 2025-03-07 4:26 ` Qu Wenruo
2025-03-07 11:36 ` Filipe Manana
2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07 4:26 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana
[BUG]
Even after all the error fixes related the
"ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can
still hit it reliably with my experimental 2K block size.
[CAUSE]
In my case, all the error is triggered after the fs is already in error
status.
I find the following call trace to be the cause of race:
Main thread | endio_write_workers
---------------------------------------+------------------------
close_ctree() |
|- btrfs_error_commit_super() |
| |- btrfs_cleanup_tranasction() |
| | |- btrfs_wait_ordered_roots() |
| |- btrfs_run_delayed_iputs() |
| | btrfs_finish_ordered_io()
| | |- btrfs_put_ordered_extent()
| | |- btrfs_add_delayed_iput()
|- ASSERT(list_empty(delayed_iputs)) |
!!! Triggered !!!
The root cause is that, btrfs_wait_ordered_roots() only wait for
ordered extents to finish their IOs, not to wait for them to finish and
removed.
[FIX]
Since btrfs_error_commit_super() will flush and wait for all ordered
extents, it should be executed early, before we start flushing the
workqueues.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b0f125d8efa0..320136a59db2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4320,6 +4320,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
/* clear out the rbtree of defraggable inodes */
btrfs_cleanup_defrag_inodes(fs_info);
+ /*
+ * Handle the error fs first, as it will flush and wait for
+ * all ordred extents.
+ * This will generate delayed iputs, thus we want to handle it first.
+ */
+ if (unlikely(BTRFS_FS_ERROR(fs_info)))
+ btrfs_error_commit_super(fs_info);
+
/*
* Wait for any fixup workers to complete.
* If we don't wait for them here and they are still running by the time
@@ -4410,9 +4418,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
btrfs_err(fs_info, "commit super ret %d", ret);
}
- if (BTRFS_FS_ERROR(fs_info))
- btrfs_error_commit_super(fs_info);
-
kthread_stop(fs_info->transaction_kthread);
kthread_stop(fs_info->cleaner_kthread);
--
2.48.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed
2025-03-07 4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo
2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
@ 2025-03-07 4:26 ` Qu Wenruo
2025-03-07 11:45 ` Filipe Manana
1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07 4:26 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana
Since I have triggered the ASSERT() on the delayed iput too many times,
now is the time to add some extra debug warnings for delayed iput.
After all btrfs_commit_super() calls, we should no longer allow any new
delayed iput being added.
So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set
after above mentioned timing.
And all btrfs_add_delayed_iput() will check that flag and give a
WARN_ON_ONCE().
I really hope this warning will never be triggered.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 3 +++
fs/btrfs/fs.h | 4 ++++
fs/btrfs/inode.c | 4 ++++
3 files changed, 11 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 320136a59db2..bb20c015b779 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
if (ret)
btrfs_err(fs_info, "commit super ret %d", ret);
}
+#ifdef CONFIG_BTRFS_DEBUG
+ set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
+#endif
kthread_stop(fs_info->transaction_kthread);
kthread_stop(fs_info->cleaner_kthread);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index b8c2e59ffc43..ee298dd0f568 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -117,6 +117,10 @@ enum {
/* Indicates there was an error cleaning up a log tree. */
BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
+#ifdef CONFIG_BTRFS_DEBUG
+ /* No more delayed iput can be queued. */
+ BTRFS_FS_STATE_NO_DELAYED_IPUT,
+#endif
BTRFS_FS_STATE_COUNT
};
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ac4858b70e7..d2bf81c08f13 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
return;
+#ifdef CONFIG_BTRFS_DEBUG
+ WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
+#endif
+
atomic_inc(&fs_info->nr_delayed_iputs);
/*
* Need to be irq safe here because we can be called from either an irq
--
2.48.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: run btrfs_error_commit_super() early
2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
@ 2025-03-07 11:36 ` Filipe Manana
0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2025-03-07 11:36 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> Even after all the error fixes related the
> "ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can
> still hit it reliably with my experimental 2K block size.
>
> [CAUSE]
> In my case, all the error is triggered after the fs is already in error
> status.
>
> I find the following call trace to be the cause of race:
>
> Main thread | endio_write_workers
> ---------------------------------------+------------------------
> close_ctree() |
> |- btrfs_error_commit_super() |
> | |- btrfs_cleanup_tranasction() |
btrfs_cleanup_tranasction() ->btrfs_cleanup_transaction()
> | | |- btrfs_wait_ordered_roots() |
It took me a bit to see where this call came from, which is from
btrfs_destroy_all_ordered_extents() which is called by
btrfs_cleanup_transaction().
Could be added here.
> | |- btrfs_run_delayed_iputs() |
> | | btrfs_finish_ordered_io()
> | | |- btrfs_put_ordered_extent()
> | | |- btrfs_add_delayed_iput()
> |- ASSERT(list_empty(delayed_iputs)) |
> !!! Triggered !!!
>
> The root cause is that, btrfs_wait_ordered_roots() only wait for
> ordered extents to finish their IOs, not to wait for them to finish and
> removed.
>
> [FIX]
> Since btrfs_error_commit_super() will flush and wait for all ordered
> extents, it should be executed early, before we start flushing the
> workqueues.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/disk-io.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b0f125d8efa0..320136a59db2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4320,6 +4320,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
> /* clear out the rbtree of defraggable inodes */
> btrfs_cleanup_defrag_inodes(fs_info);
>
> + /*
> + * Handle the error fs first, as it will flush and wait for
> + * all ordred extents.
ordred -> ordered
Those can be fixed when pushing to for-next.
Looks good:
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks.
> + * This will generate delayed iputs, thus we want to handle it first.
> + */
> + if (unlikely(BTRFS_FS_ERROR(fs_info)))
> + btrfs_error_commit_super(fs_info);
> +
> /*
> * Wait for any fixup workers to complete.
> * If we don't wait for them here and they are still running by the time
> @@ -4410,9 +4418,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
> btrfs_err(fs_info, "commit super ret %d", ret);
> }
>
> - if (BTRFS_FS_ERROR(fs_info))
> - btrfs_error_commit_super(fs_info);
> -
> kthread_stop(fs_info->transaction_kthread);
> kthread_stop(fs_info->cleaner_kthread);
>
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed
2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
@ 2025-03-07 11:45 ` Filipe Manana
2025-03-07 21:27 ` Qu Wenruo
0 siblings, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2025-03-07 11:45 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Since I have triggered the ASSERT() on the delayed iput too many times,
> now is the time to add some extra debug warnings for delayed iput.
>
> After all btrfs_commit_super() calls, we should no longer allow any new
> delayed iput being added.
>
> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set
> after above mentioned timing.
> And all btrfs_add_delayed_iput() will check that flag and give a
> WARN_ON_ONCE().
>
> I really hope this warning will never be triggered.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/disk-io.c | 3 +++
> fs/btrfs/fs.h | 4 ++++
> fs/btrfs/inode.c | 4 ++++
> 3 files changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 320136a59db2..bb20c015b779 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
> if (ret)
> btrfs_err(fs_info, "commit super ret %d", ret);
> }
> +#ifdef CONFIG_BTRFS_DEBUG
> + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
> +#endif
We'll want this earlier, right after we call
btrfs_run_delayed_iputs(), so that we don't miss the case where for
example we are forgetting to flush some queue and then the delayed
iput is added after running iputs and before calling
btfs_commit_super(), or right after calling btrfs_commit_super() and
before setting the bit.
>
> kthread_stop(fs_info->transaction_kthread);
> kthread_stop(fs_info->cleaner_kthread);
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index b8c2e59ffc43..ee298dd0f568 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -117,6 +117,10 @@ enum {
> /* Indicates there was an error cleaning up a log tree. */
> BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
>
> +#ifdef CONFIG_BTRFS_DEBUG
> + /* No more delayed iput can be queued. */
> + BTRFS_FS_STATE_NO_DELAYED_IPUT,
> +#endif
> BTRFS_FS_STATE_COUNT
> };
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8ac4858b70e7..d2bf81c08f13 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
> if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
> return;
>
> +#ifdef CONFIG_BTRFS_DEBUG
> + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
> +#endif
> +
I would get rid of the #ifdef CONFIG_BTRFS_DEBUG everywhere. It's very
cheap code, negligible.
Thanks.
> atomic_inc(&fs_info->nr_delayed_iputs);
> /*
> * Need to be irq safe here because we can be called from either an irq
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed
2025-03-07 11:45 ` Filipe Manana
@ 2025-03-07 21:27 ` Qu Wenruo
0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07 21:27 UTC (permalink / raw)
To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs
在 2025/3/7 22:15, Filipe Manana 写道:
> On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Since I have triggered the ASSERT() on the delayed iput too many times,
>> now is the time to add some extra debug warnings for delayed iput.
>>
>> After all btrfs_commit_super() calls, we should no longer allow any new
>> delayed iput being added.
>>
>> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set
>> after above mentioned timing.
>> And all btrfs_add_delayed_iput() will check that flag and give a
>> WARN_ON_ONCE().
>>
>> I really hope this warning will never be triggered.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/disk-io.c | 3 +++
>> fs/btrfs/fs.h | 4 ++++
>> fs/btrfs/inode.c | 4 ++++
>> 3 files changed, 11 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 320136a59db2..bb20c015b779 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>> if (ret)
>> btrfs_err(fs_info, "commit super ret %d", ret);
>> }
>> +#ifdef CONFIG_BTRFS_DEBUG
>> + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
>> +#endif
>
> We'll want this earlier, right after we call
> btrfs_run_delayed_iputs(), so that we don't miss the case where for
> example we are forgetting to flush some queue and then the delayed
> iput is added after running iputs and before calling
> btfs_commit_super(), or right after calling btrfs_commit_super() and
> before setting the bit.
I guess this means the call of btrfs_run_delayed_iputs() inside
btrfs_commit_super() should also be removed in this case?
Thanks,
Qu
>
>>
>> kthread_stop(fs_info->transaction_kthread);
>> kthread_stop(fs_info->cleaner_kthread);
>> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
>> index b8c2e59ffc43..ee298dd0f568 100644
>> --- a/fs/btrfs/fs.h
>> +++ b/fs/btrfs/fs.h
>> @@ -117,6 +117,10 @@ enum {
>> /* Indicates there was an error cleaning up a log tree. */
>> BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
>>
>> +#ifdef CONFIG_BTRFS_DEBUG
>> + /* No more delayed iput can be queued. */
>> + BTRFS_FS_STATE_NO_DELAYED_IPUT,
>> +#endif
>> BTRFS_FS_STATE_COUNT
>> };
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 8ac4858b70e7..d2bf81c08f13 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
>> if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
>> return;
>>
>> +#ifdef CONFIG_BTRFS_DEBUG
>> + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
>> +#endif
>> +
>
> I would get rid of the #ifdef CONFIG_BTRFS_DEBUG everywhere. It's very
> cheap code, negligible.
>
> Thanks.
>
>
>
>> atomic_inc(&fs_info->nr_delayed_iputs);
>> /*
>> * Need to be irq safe here because we can be called from either an irq
>> --
>> 2.48.1
>>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-03-07 21:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-07 4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo
2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
2025-03-07 11:36 ` Filipe Manana
2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
2025-03-07 11:45 ` Filipe Manana
2025-03-07 21:27 ` Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox