* [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs
@ 2025-03-07 4:26 Qu Wenruo
2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07 4:26 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana
Although Filipe fixes a lot of cases where the delayed iputs ASSERT()
can be triggered, that's all focusing on the common cases where the fs
is still in a good shape.
However in my experimental 2K block size runs, the test case generic/475
has a very high chance that the emulated error makes the fs to flips
into an error status.
That error status will trigger a completely new path inside
close_ctree(), calling btrfs_error_commit_super() which flush and wait
for all ordered extents.
That btrfs_error_commit_super() will generate new delayed iputs
asynchronously, thus trigger the later ASSERT().
The first patch is to fix the error, at least I can no longer trigger
the ASSERT() in generic/475.
The second patch is to add an extra WARN_ON_ONCE() inside
btrfs_add_delayed_iput(), triggered if close_ctree() has processed after
btrfs_error_commit_super() calls.
Qu Wenruo (2):
btrfs: run btrfs_error_commit_super() early
btrfs: add extra warning if delayed iput is added when it's not
allowed
fs/btrfs/disk-io.c | 14 +++++++++++---
fs/btrfs/fs.h | 4 ++++
fs/btrfs/inode.c | 4 ++++
3 files changed, 19 insertions(+), 3 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/2] btrfs: run btrfs_error_commit_super() early 2025-03-07 4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo @ 2025-03-07 4:26 ` Qu Wenruo 2025-03-07 11:36 ` Filipe Manana 2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo 1 sibling, 1 reply; 6+ messages in thread From: Qu Wenruo @ 2025-03-07 4:26 UTC (permalink / raw) To: linux-btrfs; +Cc: fdmanana [BUG] Even after all the error fixes related the "ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can still hit it reliably with my experimental 2K block size. [CAUSE] In my case, all the error is triggered after the fs is already in error status. I find the following call trace to be the cause of race: Main thread | endio_write_workers ---------------------------------------+------------------------ close_ctree() | |- btrfs_error_commit_super() | | |- btrfs_cleanup_tranasction() | | | |- btrfs_wait_ordered_roots() | | |- btrfs_run_delayed_iputs() | | | btrfs_finish_ordered_io() | | |- btrfs_put_ordered_extent() | | |- btrfs_add_delayed_iput() |- ASSERT(list_empty(delayed_iputs)) | !!! Triggered !!! The root cause is that, btrfs_wait_ordered_roots() only wait for ordered extents to finish their IOs, not to wait for them to finish and removed. [FIX] Since btrfs_error_commit_super() will flush and wait for all ordered extents, it should be executed early, before we start flushing the workqueues. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b0f125d8efa0..320136a59db2 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4320,6 +4320,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) /* clear out the rbtree of defraggable inodes */ btrfs_cleanup_defrag_inodes(fs_info); + /* + * Handle the error fs first, as it will flush and wait for + * all ordred extents. + * This will generate delayed iputs, thus we want to handle it first. + */ + if (unlikely(BTRFS_FS_ERROR(fs_info))) + btrfs_error_commit_super(fs_info); + /* * Wait for any fixup workers to complete. * If we don't wait for them here and they are still running by the time @@ -4410,9 +4418,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) btrfs_err(fs_info, "commit super ret %d", ret); } - if (BTRFS_FS_ERROR(fs_info)) - btrfs_error_commit_super(fs_info); - kthread_stop(fs_info->transaction_kthread); kthread_stop(fs_info->cleaner_kthread); -- 2.48.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: run btrfs_error_commit_super() early 2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo @ 2025-03-07 11:36 ` Filipe Manana 0 siblings, 0 replies; 6+ messages in thread From: Filipe Manana @ 2025-03-07 11:36 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > Even after all the error fixes related the > "ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can > still hit it reliably with my experimental 2K block size. > > [CAUSE] > In my case, all the error is triggered after the fs is already in error > status. > > I find the following call trace to be the cause of race: > > Main thread | endio_write_workers > ---------------------------------------+------------------------ > close_ctree() | > |- btrfs_error_commit_super() | > | |- btrfs_cleanup_tranasction() | btrfs_cleanup_tranasction() ->btrfs_cleanup_transaction() > | | |- btrfs_wait_ordered_roots() | It took me a bit to see where this call came from, which is from btrfs_destroy_all_ordered_extents() which is called by btrfs_cleanup_transaction(). Could be added here. > | |- btrfs_run_delayed_iputs() | > | | btrfs_finish_ordered_io() > | | |- btrfs_put_ordered_extent() > | | |- btrfs_add_delayed_iput() > |- ASSERT(list_empty(delayed_iputs)) | > !!! Triggered !!! > > The root cause is that, btrfs_wait_ordered_roots() only wait for > ordered extents to finish their IOs, not to wait for them to finish and > removed. > > [FIX] > Since btrfs_error_commit_super() will flush and wait for all ordered > extents, it should be executed early, before we start flushing the > workqueues. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index b0f125d8efa0..320136a59db2 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4320,6 +4320,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > /* clear out the rbtree of defraggable inodes */ > btrfs_cleanup_defrag_inodes(fs_info); > > + /* > + * Handle the error fs first, as it will flush and wait for > + * all ordred extents. ordred -> ordered Those can be fixed when pushing to for-next. Looks good: Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > + * This will generate delayed iputs, thus we want to handle it first. > + */ > + if (unlikely(BTRFS_FS_ERROR(fs_info))) > + btrfs_error_commit_super(fs_info); > + > /* > * Wait for any fixup workers to complete. > * If we don't wait for them here and they are still running by the time > @@ -4410,9 +4418,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > btrfs_err(fs_info, "commit super ret %d", ret); > } > > - if (BTRFS_FS_ERROR(fs_info)) > - btrfs_error_commit_super(fs_info); > - > kthread_stop(fs_info->transaction_kthread); > kthread_stop(fs_info->cleaner_kthread); > > -- > 2.48.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed 2025-03-07 4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo 2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo @ 2025-03-07 4:26 ` Qu Wenruo 2025-03-07 11:45 ` Filipe Manana 1 sibling, 1 reply; 6+ messages in thread From: Qu Wenruo @ 2025-03-07 4:26 UTC (permalink / raw) To: linux-btrfs; +Cc: fdmanana Since I have triggered the ASSERT() on the delayed iput too many times, now is the time to add some extra debug warnings for delayed iput. After all btrfs_commit_super() calls, we should no longer allow any new delayed iput being added. So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set after above mentioned timing. And all btrfs_add_delayed_iput() will check that flag and give a WARN_ON_ONCE(). I really hope this warning will never be triggered. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 3 +++ fs/btrfs/fs.h | 4 ++++ fs/btrfs/inode.c | 4 ++++ 3 files changed, 11 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 320136a59db2..bb20c015b779 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) if (ret) btrfs_err(fs_info, "commit super ret %d", ret); } +#ifdef CONFIG_BTRFS_DEBUG + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state); +#endif kthread_stop(fs_info->transaction_kthread); kthread_stop(fs_info->cleaner_kthread); diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index b8c2e59ffc43..ee298dd0f568 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -117,6 +117,10 @@ enum { /* Indicates there was an error cleaning up a log tree. */ BTRFS_FS_STATE_LOG_CLEANUP_ERROR, +#ifdef CONFIG_BTRFS_DEBUG + /* No more delayed iput can be queued. */ + BTRFS_FS_STATE_NO_DELAYED_IPUT, +#endif BTRFS_FS_STATE_COUNT }; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8ac4858b70e7..d2bf81c08f13 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode) if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1)) return; +#ifdef CONFIG_BTRFS_DEBUG + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state)); +#endif + atomic_inc(&fs_info->nr_delayed_iputs); /* * Need to be irq safe here because we can be called from either an irq -- 2.48.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed 2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo @ 2025-03-07 11:45 ` Filipe Manana 2025-03-07 21:27 ` Qu Wenruo 0 siblings, 1 reply; 6+ messages in thread From: Filipe Manana @ 2025-03-07 11:45 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote: > > Since I have triggered the ASSERT() on the delayed iput too many times, > now is the time to add some extra debug warnings for delayed iput. > > After all btrfs_commit_super() calls, we should no longer allow any new > delayed iput being added. > > So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set > after above mentioned timing. > And all btrfs_add_delayed_iput() will check that flag and give a > WARN_ON_ONCE(). > > I really hope this warning will never be triggered. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 3 +++ > fs/btrfs/fs.h | 4 ++++ > fs/btrfs/inode.c | 4 ++++ > 3 files changed, 11 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 320136a59db2..bb20c015b779 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > if (ret) > btrfs_err(fs_info, "commit super ret %d", ret); > } > +#ifdef CONFIG_BTRFS_DEBUG > + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state); > +#endif We'll want this earlier, right after we call btrfs_run_delayed_iputs(), so that we don't miss the case where for example we are forgetting to flush some queue and then the delayed iput is added after running iputs and before calling btfs_commit_super(), or right after calling btrfs_commit_super() and before setting the bit. > > kthread_stop(fs_info->transaction_kthread); > kthread_stop(fs_info->cleaner_kthread); > diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h > index b8c2e59ffc43..ee298dd0f568 100644 > --- a/fs/btrfs/fs.h > +++ b/fs/btrfs/fs.h > @@ -117,6 +117,10 @@ enum { > /* Indicates there was an error cleaning up a log tree. */ > BTRFS_FS_STATE_LOG_CLEANUP_ERROR, > > +#ifdef CONFIG_BTRFS_DEBUG > + /* No more delayed iput can be queued. */ > + BTRFS_FS_STATE_NO_DELAYED_IPUT, > +#endif > BTRFS_FS_STATE_COUNT > }; > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8ac4858b70e7..d2bf81c08f13 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode) > if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1)) > return; > > +#ifdef CONFIG_BTRFS_DEBUG > + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state)); > +#endif > + I would get rid of the #ifdef CONFIG_BTRFS_DEBUG everywhere. It's very cheap code, negligible. Thanks. > atomic_inc(&fs_info->nr_delayed_iputs); > /* > * Need to be irq safe here because we can be called from either an irq > -- > 2.48.1 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed 2025-03-07 11:45 ` Filipe Manana @ 2025-03-07 21:27 ` Qu Wenruo 0 siblings, 0 replies; 6+ messages in thread From: Qu Wenruo @ 2025-03-07 21:27 UTC (permalink / raw) To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs 在 2025/3/7 22:15, Filipe Manana 写道: > On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote: >> >> Since I have triggered the ASSERT() on the delayed iput too many times, >> now is the time to add some extra debug warnings for delayed iput. >> >> After all btrfs_commit_super() calls, we should no longer allow any new >> delayed iput being added. >> >> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set >> after above mentioned timing. >> And all btrfs_add_delayed_iput() will check that flag and give a >> WARN_ON_ONCE(). >> >> I really hope this warning will never be triggered. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 3 +++ >> fs/btrfs/fs.h | 4 ++++ >> fs/btrfs/inode.c | 4 ++++ >> 3 files changed, 11 insertions(+) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 320136a59db2..bb20c015b779 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) >> if (ret) >> btrfs_err(fs_info, "commit super ret %d", ret); >> } >> +#ifdef CONFIG_BTRFS_DEBUG >> + set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state); >> +#endif > > We'll want this earlier, right after we call > btrfs_run_delayed_iputs(), so that we don't miss the case where for > example we are forgetting to flush some queue and then the delayed > iput is added after running iputs and before calling > btfs_commit_super(), or right after calling btrfs_commit_super() and > before setting the bit. I guess this means the call of btrfs_run_delayed_iputs() inside btrfs_commit_super() should also be removed in this case? Thanks, Qu > >> >> kthread_stop(fs_info->transaction_kthread); >> kthread_stop(fs_info->cleaner_kthread); >> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h >> index b8c2e59ffc43..ee298dd0f568 100644 >> --- a/fs/btrfs/fs.h >> +++ b/fs/btrfs/fs.h >> @@ -117,6 +117,10 @@ enum { >> /* Indicates there was an error cleaning up a log tree. */ >> BTRFS_FS_STATE_LOG_CLEANUP_ERROR, >> >> +#ifdef CONFIG_BTRFS_DEBUG >> + /* No more delayed iput can be queued. */ >> + BTRFS_FS_STATE_NO_DELAYED_IPUT, >> +#endif >> BTRFS_FS_STATE_COUNT >> }; >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index 8ac4858b70e7..d2bf81c08f13 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode) >> if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1)) >> return; >> >> +#ifdef CONFIG_BTRFS_DEBUG >> + WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state)); >> +#endif >> + > > I would get rid of the #ifdef CONFIG_BTRFS_DEBUG everywhere. It's very > cheap code, negligible. > > Thanks. > > > >> atomic_inc(&fs_info->nr_delayed_iputs); >> /* >> * Need to be irq safe here because we can be called from either an irq >> -- >> 2.48.1 >> > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-03-07 21:27 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-07 4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo 2025-03-07 4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo 2025-03-07 11:36 ` Filipe Manana 2025-03-07 4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo 2025-03-07 11:45 ` Filipe Manana 2025-03-07 21:27 ` Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox