public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs
@ 2025-03-07  4:26 Qu Wenruo
  2025-03-07  4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
  2025-03-07  4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
  0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07  4:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana

Although Filipe fixes a lot of cases where the delayed iputs ASSERT()
can be triggered, that's all focusing on the common cases where the fs
is still in a good shape.

However in my experimental 2K block size runs, the test case generic/475
has a very high chance that the emulated error makes the fs to flips
into an error status.

That error status will trigger a completely new path inside
close_ctree(), calling btrfs_error_commit_super() which flush and wait
for all ordered extents.

That btrfs_error_commit_super() will generate new delayed iputs
asynchronously, thus trigger the later ASSERT().

The first patch is to fix the error, at least I can no longer trigger
the ASSERT() in generic/475.

The second patch is to add an extra WARN_ON_ONCE() inside
btrfs_add_delayed_iput(), triggered if close_ctree() has processed after
btrfs_error_commit_super() calls.

Qu Wenruo (2):
  btrfs: run btrfs_error_commit_super() early
  btrfs: add extra warning if delayed iput is added when it's not
    allowed

 fs/btrfs/disk-io.c | 14 +++++++++++---
 fs/btrfs/fs.h      |  4 ++++
 fs/btrfs/inode.c   |  4 ++++
 3 files changed, 19 insertions(+), 3 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] btrfs: run btrfs_error_commit_super() early
  2025-03-07  4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo
@ 2025-03-07  4:26 ` Qu Wenruo
  2025-03-07 11:36   ` Filipe Manana
  2025-03-07  4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07  4:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana

[BUG]
Even after all the error fixes related the
"ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can
still hit it reliably with my experimental 2K block size.

[CAUSE]
In my case, all the error is triggered after the fs is already in error
status.

I find the following call trace to be the cause of race:

           Main thread                 |     endio_write_workers
---------------------------------------+------------------------
close_ctree()                          |
|- btrfs_error_commit_super()          |
|  |- btrfs_cleanup_tranasction()      |
|  |  |- btrfs_wait_ordered_roots()    |
|  |- btrfs_run_delayed_iputs()        |
|                                      | btrfs_finish_ordered_io()
|                                      | |- btrfs_put_ordered_extent()
|                                      |    |- btrfs_add_delayed_iput()
|- ASSERT(list_empty(delayed_iputs))   |
   !!! Triggered !!!

The root cause is that, btrfs_wait_ordered_roots() only wait for
ordered extents to finish their IOs, not to wait for them to finish and
removed.

[FIX]
Since btrfs_error_commit_super() will flush and wait for all ordered
extents, it should be executed early, before we start flushing the
workqueues.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b0f125d8efa0..320136a59db2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4320,6 +4320,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	/* clear out the rbtree of defraggable inodes */
 	btrfs_cleanup_defrag_inodes(fs_info);
 
+	/*
+	 * Handle the error fs first, as it will flush and wait for
+	 * all ordred extents.
+	 * This will generate delayed iputs, thus we want to handle it first.
+	 */
+	if (unlikely(BTRFS_FS_ERROR(fs_info)))
+		btrfs_error_commit_super(fs_info);
+
 	/*
 	 * Wait for any fixup workers to complete.
 	 * If we don't wait for them here and they are still running by the time
@@ -4410,9 +4418,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 			btrfs_err(fs_info, "commit super ret %d", ret);
 	}
 
-	if (BTRFS_FS_ERROR(fs_info))
-		btrfs_error_commit_super(fs_info);
-
 	kthread_stop(fs_info->transaction_kthread);
 	kthread_stop(fs_info->cleaner_kthread);
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed
  2025-03-07  4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo
  2025-03-07  4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
@ 2025-03-07  4:26 ` Qu Wenruo
  2025-03-07 11:45   ` Filipe Manana
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07  4:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana

Since I have triggered the ASSERT() on the delayed iput too many times,
now is the time to add some extra debug warnings for delayed iput.

After all btrfs_commit_super() calls, we should no longer allow any new
delayed iput being added.

So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set
after above mentioned timing.
And all btrfs_add_delayed_iput() will check that flag and give a
WARN_ON_ONCE().

I really hope this warning will never be triggered.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 3 +++
 fs/btrfs/fs.h      | 4 ++++
 fs/btrfs/inode.c   | 4 ++++
 3 files changed, 11 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 320136a59db2..bb20c015b779 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 		if (ret)
 			btrfs_err(fs_info, "commit super ret %d", ret);
 	}
+#ifdef CONFIG_BTRFS_DEBUG
+	set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
+#endif
 
 	kthread_stop(fs_info->transaction_kthread);
 	kthread_stop(fs_info->cleaner_kthread);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index b8c2e59ffc43..ee298dd0f568 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -117,6 +117,10 @@ enum {
 	/* Indicates there was an error cleaning up a log tree. */
 	BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
 
+#ifdef CONFIG_BTRFS_DEBUG
+	/* No more delayed iput can be queued. */
+	BTRFS_FS_STATE_NO_DELAYED_IPUT,
+#endif
 	BTRFS_FS_STATE_COUNT
 };
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ac4858b70e7..d2bf81c08f13 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
 	if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
 		return;
 
+#ifdef CONFIG_BTRFS_DEBUG
+	WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
+#endif
+
 	atomic_inc(&fs_info->nr_delayed_iputs);
 	/*
 	 * Need to be irq safe here because we can be called from either an irq
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] btrfs: run btrfs_error_commit_super() early
  2025-03-07  4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
@ 2025-03-07 11:36   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2025-03-07 11:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> Even after all the error fixes related the
> "ASSERT(list_empty(&fs_info->delayed_iputs));" in close_ctree(), I can
> still hit it reliably with my experimental 2K block size.
>
> [CAUSE]
> In my case, all the error is triggered after the fs is already in error
> status.
>
> I find the following call trace to be the cause of race:
>
>            Main thread                 |     endio_write_workers
> ---------------------------------------+------------------------
> close_ctree()                          |
> |- btrfs_error_commit_super()          |
> |  |- btrfs_cleanup_tranasction()      |

btrfs_cleanup_tranasction() ->btrfs_cleanup_transaction()

> |  |  |- btrfs_wait_ordered_roots()    |

It took me a bit to see where this call came from, which is from
btrfs_destroy_all_ordered_extents() which is called by
btrfs_cleanup_transaction().
Could be added here.

> |  |- btrfs_run_delayed_iputs()        |
> |                                      | btrfs_finish_ordered_io()
> |                                      | |- btrfs_put_ordered_extent()
> |                                      |    |- btrfs_add_delayed_iput()
> |- ASSERT(list_empty(delayed_iputs))   |
>    !!! Triggered !!!
>
> The root cause is that, btrfs_wait_ordered_roots() only wait for
> ordered extents to finish their IOs, not to wait for them to finish and
> removed.
>
> [FIX]
> Since btrfs_error_commit_super() will flush and wait for all ordered
> extents, it should be executed early, before we start flushing the
> workqueues.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b0f125d8efa0..320136a59db2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4320,6 +4320,14 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>         /* clear out the rbtree of defraggable inodes */
>         btrfs_cleanup_defrag_inodes(fs_info);
>
> +       /*
> +        * Handle the error fs first, as it will flush and wait for
> +        * all ordred extents.

ordred -> ordered

Those can be fixed when pushing to for-next.

Looks good:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +        * This will generate delayed iputs, thus we want to handle it first.
> +        */
> +       if (unlikely(BTRFS_FS_ERROR(fs_info)))
> +               btrfs_error_commit_super(fs_info);
> +
>         /*
>          * Wait for any fixup workers to complete.
>          * If we don't wait for them here and they are still running by the time
> @@ -4410,9 +4418,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>                         btrfs_err(fs_info, "commit super ret %d", ret);
>         }
>
> -       if (BTRFS_FS_ERROR(fs_info))
> -               btrfs_error_commit_super(fs_info);
> -
>         kthread_stop(fs_info->transaction_kthread);
>         kthread_stop(fs_info->cleaner_kthread);
>
> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed
  2025-03-07  4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
@ 2025-03-07 11:45   ` Filipe Manana
  2025-03-07 21:27     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2025-03-07 11:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Since I have triggered the ASSERT() on the delayed iput too many times,
> now is the time to add some extra debug warnings for delayed iput.
>
> After all btrfs_commit_super() calls, we should no longer allow any new
> delayed iput being added.
>
> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set
> after above mentioned timing.
> And all btrfs_add_delayed_iput() will check that flag and give a
> WARN_ON_ONCE().
>
> I really hope this warning will never be triggered.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c | 3 +++
>  fs/btrfs/fs.h      | 4 ++++
>  fs/btrfs/inode.c   | 4 ++++
>  3 files changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 320136a59db2..bb20c015b779 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>                 if (ret)
>                         btrfs_err(fs_info, "commit super ret %d", ret);
>         }
> +#ifdef CONFIG_BTRFS_DEBUG
> +       set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
> +#endif

We'll want this earlier, right after we call
btrfs_run_delayed_iputs(), so that we don't miss the case where for
example we are forgetting to flush some queue and then the delayed
iput is added after running iputs and before calling
btfs_commit_super(), or right after calling btrfs_commit_super() and
before setting the bit.

>
>         kthread_stop(fs_info->transaction_kthread);
>         kthread_stop(fs_info->cleaner_kthread);
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index b8c2e59ffc43..ee298dd0f568 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -117,6 +117,10 @@ enum {
>         /* Indicates there was an error cleaning up a log tree. */
>         BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
>
> +#ifdef CONFIG_BTRFS_DEBUG
> +       /* No more delayed iput can be queued. */
> +       BTRFS_FS_STATE_NO_DELAYED_IPUT,
> +#endif
>         BTRFS_FS_STATE_COUNT
>  };
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8ac4858b70e7..d2bf81c08f13 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
>         if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
>                 return;
>
> +#ifdef CONFIG_BTRFS_DEBUG
> +       WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
> +#endif
> +

I would get rid of the #ifdef CONFIG_BTRFS_DEBUG everywhere. It's very
cheap code, negligible.

Thanks.



>         atomic_inc(&fs_info->nr_delayed_iputs);
>         /*
>          * Need to be irq safe here because we can be called from either an irq
> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed
  2025-03-07 11:45   ` Filipe Manana
@ 2025-03-07 21:27     ` Qu Wenruo
  0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-03-07 21:27 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



在 2025/3/7 22:15, Filipe Manana 写道:
> On Fri, Mar 7, 2025 at 4:27 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Since I have triggered the ASSERT() on the delayed iput too many times,
>> now is the time to add some extra debug warnings for delayed iput.
>>
>> After all btrfs_commit_super() calls, we should no longer allow any new
>> delayed iput being added.
>>
>> So introduce a new BTRFS_FS_STATE_NO_DELAYED_IPUT for debug builds, set
>> after above mentioned timing.
>> And all btrfs_add_delayed_iput() will check that flag and give a
>> WARN_ON_ONCE().
>>
>> I really hope this warning will never be triggered.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/disk-io.c | 3 +++
>>   fs/btrfs/fs.h      | 4 ++++
>>   fs/btrfs/inode.c   | 4 ++++
>>   3 files changed, 11 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 320136a59db2..bb20c015b779 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4417,6 +4417,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>>                  if (ret)
>>                          btrfs_err(fs_info, "commit super ret %d", ret);
>>          }
>> +#ifdef CONFIG_BTRFS_DEBUG
>> +       set_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state);
>> +#endif
>
> We'll want this earlier, right after we call
> btrfs_run_delayed_iputs(), so that we don't miss the case where for
> example we are forgetting to flush some queue and then the delayed
> iput is added after running iputs and before calling
> btfs_commit_super(), or right after calling btrfs_commit_super() and
> before setting the bit.

I guess this means the call of btrfs_run_delayed_iputs() inside
btrfs_commit_super() should also be removed in this case?

Thanks,
Qu

>
>>
>>          kthread_stop(fs_info->transaction_kthread);
>>          kthread_stop(fs_info->cleaner_kthread);
>> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
>> index b8c2e59ffc43..ee298dd0f568 100644
>> --- a/fs/btrfs/fs.h
>> +++ b/fs/btrfs/fs.h
>> @@ -117,6 +117,10 @@ enum {
>>          /* Indicates there was an error cleaning up a log tree. */
>>          BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
>>
>> +#ifdef CONFIG_BTRFS_DEBUG
>> +       /* No more delayed iput can be queued. */
>> +       BTRFS_FS_STATE_NO_DELAYED_IPUT,
>> +#endif
>>          BTRFS_FS_STATE_COUNT
>>   };
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 8ac4858b70e7..d2bf81c08f13 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -3435,6 +3435,10 @@ void btrfs_add_delayed_iput(struct btrfs_inode *inode)
>>          if (atomic_add_unless(&inode->vfs_inode.i_count, -1, 1))
>>                  return;
>>
>> +#ifdef CONFIG_BTRFS_DEBUG
>> +       WARN_ON_ONCE(test_bit(BTRFS_FS_STATE_NO_DELAYED_IPUT, &fs_info->fs_state));
>> +#endif
>> +
>
> I would get rid of the #ifdef CONFIG_BTRFS_DEBUG everywhere. It's very
> cheap code, negligible.
>
> Thanks.
>
>
>
>>          atomic_inc(&fs_info->nr_delayed_iputs);
>>          /*
>>           * Need to be irq safe here because we can be called from either an irq
>> --
>> 2.48.1
>>
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-03-07 21:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-07  4:26 [PATCH 0/2] btrfs: fix the delayed iputs ASSERT() when the fs Qu Wenruo
2025-03-07  4:26 ` [PATCH 1/2] btrfs: run btrfs_error_commit_super() early Qu Wenruo
2025-03-07 11:36   ` Filipe Manana
2025-03-07  4:26 ` [PATCH 2/2] btrfs: add extra warning if delayed iput is added when it's not allowed Qu Wenruo
2025-03-07 11:45   ` Filipe Manana
2025-03-07 21:27     ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox