* [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups
@ 2025-03-05 17:45 fdmanana
2025-03-05 17:45 ` [PATCH 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 17:45 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Fix a couple races that can result in adding delayed iputs during umount after
we no longer expect to find any, triggering an assertion failure. Plus a couple
cleanups. Details in the change logs.
Filipe Manana (4):
btrfs: fix non-empty delayed iputs list on unmount due to endio workers
btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
btrfs: move __btrfs_bio_end_io() code into its single caller
btrfs: move btrfs_cleanup_bio() code into its single caller
fs/btrfs/bio.c | 36 ++++++++++++++----------------------
fs/btrfs/disk-io.c | 23 +++++++++++++++++++++++
2 files changed, 37 insertions(+), 22 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers
2025-03-05 17:45 [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
@ 2025-03-05 17:45 ` fdmanana
2025-03-05 17:45 ` [PATCH 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers fdmanana
` (3 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 17:45 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
At close_ctree() after we have ran delayed iputs either through explicitly
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.
Sometimes this assertion may fail because delayed iputs may have been
added to the list after we last ran delayed iputs, and this happens due
to workers in the endio_workers workqueue still running. These workers can
do a final put on an ordered extent attached to a data bio, which results
in adding a delayed iput. This is done at btrfs_bio_end_io() and its
helper __btrfs_bio_end_io().
Fix this by flushing the endio_workers workqueue before running delayed
iputs at close_ctree().
David reported this when running generic/648.
Reported-by: David Sterba <dsterba@suse.com>
Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct btrfs_bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/disk-io.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d96ea974ef73..df8e075e69a3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4340,6 +4340,16 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
btrfs_flush_workqueue(fs_info->delalloc_workers);
+ /*
+ * We can also have ordered extents getting their last reference dropped
+ * from the endio_workers workqueue because for data bios we keep a
+ * reference on an ordered extent which gets dropped when running
+ * btrfs_bio_end_io() in that workqueue, and that final drop results in
+ * adding a delayed iput for the inode.
+ */
+ if (fs_info->endio_workers)
+ flush_workqueue(fs_info->endio_workers);
+
/*
* After we parked the cleaner kthread, ordered extents may have
* completed and created new delayed iputs. If one of the async reclaim
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
2025-03-05 17:45 [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
2025-03-05 17:45 ` [PATCH 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
@ 2025-03-05 17:45 ` fdmanana
2025-03-05 17:45 ` [PATCH 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller fdmanana
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 17:45 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
At close_ctree() after we have ran delayed iputs either through explicitly
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.
When we have compressed writes this assertion may fail because delayed
iputs may have been added to the list after we last ran delayed iputs.
This happens like this:
1) We have a compressed write bio executing;
2) We enter close_ctree() and flush the fs_info->endio_write_workers
queue which is the queue used for running ordered extent completion;
3) The compressed write bio finishes and enters
btrfs_finish_compressed_write_work(), where it calls
btrfs_finish_ordered_extent() which in turn calls
btrfs_queue_ordered_fn(), which queues a work item in the
fs_info->endio_write_workers queue that we have flushed before;
4) At close_ctree() we proceed, run all existing delayed iputs and
call btrfs_commit_super() (which also runs delayed iputs), but before
we run the following assertion below:
ASSERT(list_empty(&fs_info->delayed_iputs))
A delayed iput is added by the step below...
5) The ordered extent completion job queued in step 3 runs and results in
creating a delayed iput when dropping the last reference of the ordered
extent (a call to btrfs_put_ordered_extent() made from
btrfs_finish_one_ordered());
6) At this point the delayed iputs list is not empty, so the assertion at
close_ctree() fails.
Fix this by flushing the fs_info->compressed_write_workers queue at
close_ctree() before flushing the fs_info->endio_write_workers queue,
respecting the queue dependency as the later is responsible for the
execution of ordered extent completion.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/disk-io.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index df8e075e69a3..95277c05fefa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4350,6 +4350,19 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
if (fs_info->endio_workers)
flush_workqueue(fs_info->endio_workers);
+ /*
+ * When finishing a compressed write bio we schedule a work queue item
+ * to finish an ordered extent - btrfs_finish_compressed_write_work()
+ * calls btrfs_finish_ordered_extent() which in turns does a call to
+ * btrfs_queue_ordered_fn(), and that queues the ordered extent
+ * completion either in the endio_write_workers work queue or in the
+ * fs_info->endio_freespace_worker work queue. We flush those queues
+ * below, so before we flush them we must flush this queue for the
+ * workers of compressed writes.
+ */
+ if (fs_info->compressed_write_workers)
+ flush_workqueue(fs_info->compressed_write_workers);
+
/*
* After we parked the cleaner kthread, ordered extents may have
* completed and created new delayed iputs. If one of the async reclaim
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller
2025-03-05 17:45 [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
2025-03-05 17:45 ` [PATCH 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
2025-03-05 17:45 ` [PATCH 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers fdmanana
@ 2025-03-05 17:45 ` fdmanana
2025-03-05 17:45 ` [PATCH 4/4] btrfs: move btrfs_cleanup_bio() " fdmanana
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
4 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 17:45 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The __btrfs_bio_end_io() helper is trivial and has a single caller, so
there's no point in having a dedicated helper function. Further the double
underscore prefix in the name is discouraged. So get rid of it and move
its code into the caller (btrfs_bio_end_io()).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/bio.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2f32ee215c3f..07bbb0da2812 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -105,18 +105,6 @@ static void btrfs_cleanup_bio(struct btrfs_bio *bbio)
bio_put(&bbio->bio);
}
-static void __btrfs_bio_end_io(struct btrfs_bio *bbio)
-{
- if (bbio_has_ordered_extent(bbio)) {
- struct btrfs_ordered_extent *ordered = bbio->ordered;
-
- bbio->end_io(bbio);
- btrfs_put_ordered_extent(ordered);
- } else {
- bbio->end_io(bbio);
- }
-}
-
void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
{
bbio->bio.bi_status = status;
@@ -138,7 +126,15 @@ void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
/* Load split bio's error which might be set above. */
if (status == BLK_STS_OK)
bbio->bio.bi_status = READ_ONCE(bbio->status);
- __btrfs_bio_end_io(bbio);
+
+ if (bbio_has_ordered_extent(bbio)) {
+ struct btrfs_ordered_extent *ordered = bbio->ordered;
+
+ bbio->end_io(bbio);
+ btrfs_put_ordered_extent(ordered);
+ } else {
+ bbio->end_io(bbio);
+ }
}
}
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/4] btrfs: move btrfs_cleanup_bio() code into its single caller
2025-03-05 17:45 [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
` (2 preceding siblings ...)
2025-03-05 17:45 ` [PATCH 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller fdmanana
@ 2025-03-05 17:45 ` fdmanana
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
4 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 17:45 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The btrfs_cleanup_bio() helper is trivial and has a single caller, there's
no point in having a dedicated helper function. So get rid of it and move
its code into the caller (btrfs_bio_end_io()).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/bio.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 07bbb0da2812..e9840954bf4a 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -97,21 +97,17 @@ static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
return bbio;
}
-/* Free a bio that was never submitted to the underlying device. */
-static void btrfs_cleanup_bio(struct btrfs_bio *bbio)
-{
- if (bbio_has_ordered_extent(bbio))
- btrfs_put_ordered_extent(bbio->ordered);
- bio_put(&bbio->bio);
-}
-
void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
{
bbio->bio.bi_status = status;
if (bbio->bio.bi_pool == &btrfs_clone_bioset) {
struct btrfs_bio *orig_bbio = bbio->private;
- btrfs_cleanup_bio(bbio);
+ /* Free bio that was never submitted to the underlying device. */
+ if (bbio_has_ordered_extent(bbio))
+ btrfs_put_ordered_extent(bbio->ordered);
+ bio_put(&bbio->bio);
+
bbio = orig_bbio;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups
2025-03-05 17:45 [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
` (3 preceding siblings ...)
2025-03-05 17:45 ` [PATCH 4/4] btrfs: move btrfs_cleanup_bio() " fdmanana
@ 2025-03-05 18:17 ` fdmanana
2025-03-05 18:17 ` [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
` (5 more replies)
4 siblings, 6 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 18:17 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Fix a couple races that can result in adding delayed iputs during umount after
we no longer expect to find any, triggering an assertion failure. Plus a couple
cleanups. Details in the change logs.
V2: Removed the NULL checks for the workqueues in patches 1 and 2, as they
can never be NULL while at close_ctree() (they can only be NULL in error
paths from open_ctree()).
Filipe Manana (4):
btrfs: fix non-empty delayed iputs list on unmount due to endio workers
btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
btrfs: move __btrfs_bio_end_io() code into its single caller
btrfs: move btrfs_cleanup_bio() code into its single caller
fs/btrfs/bio.c | 36 ++++++++++++++----------------------
fs/btrfs/disk-io.c | 21 +++++++++++++++++++++
2 files changed, 35 insertions(+), 22 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
@ 2025-03-05 18:17 ` fdmanana
2025-03-05 22:46 ` Qu Wenruo
2025-03-05 18:17 ` [PATCH v2 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers fdmanana
` (4 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: fdmanana @ 2025-03-05 18:17 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
At close_ctree() after we have ran delayed iputs either through explicitly
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.
Sometimes this assertion may fail because delayed iputs may have been
added to the list after we last ran delayed iputs, and this happens due
to workers in the endio_workers workqueue still running. These workers can
do a final put on an ordered extent attached to a data bio, which results
in adding a delayed iput. This is done at btrfs_bio_end_io() and its
helper __btrfs_bio_end_io().
Fix this by flushing the endio_workers workqueue before running delayed
iputs at close_ctree().
David reported this when running generic/648.
Reported-by: David Sterba <dsterba@suse.com>
Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct btrfs_bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/disk-io.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d96ea974ef73..b6194ae97361 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
btrfs_flush_workqueue(fs_info->delalloc_workers);
+ /*
+ * We can also have ordered extents getting their last reference dropped
+ * from the endio_workers workqueue because for data bios we keep a
+ * reference on an ordered extent which gets dropped when running
+ * btrfs_bio_end_io() in that workqueue, and that final drop results in
+ * adding a delayed iput for the inode.
+ */
+ flush_workqueue(fs_info->endio_workers);
+
/*
* After we parked the cleaner kthread, ordered extents may have
* completed and created new delayed iputs. If one of the async reclaim
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
2025-03-05 18:17 ` [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
@ 2025-03-05 18:17 ` fdmanana
2025-03-05 18:17 ` [PATCH v2 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller fdmanana
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 18:17 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
At close_ctree() after we have ran delayed iputs either through explicitly
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.
When we have compressed writes this assertion may fail because delayed
iputs may have been added to the list after we last ran delayed iputs.
This happens like this:
1) We have a compressed write bio executing;
2) We enter close_ctree() and flush the fs_info->endio_write_workers
queue which is the queue used for running ordered extent completion;
3) The compressed write bio finishes and enters
btrfs_finish_compressed_write_work(), where it calls
btrfs_finish_ordered_extent() which in turn calls
btrfs_queue_ordered_fn(), which queues a work item in the
fs_info->endio_write_workers queue that we have flushed before;
4) At close_ctree() we proceed, run all existing delayed iputs and
call btrfs_commit_super() (which also runs delayed iputs), but before
we run the following assertion below:
ASSERT(list_empty(&fs_info->delayed_iputs))
A delayed iput is added by the step below...
5) The ordered extent completion job queued in step 3 runs and results in
creating a delayed iput when dropping the last reference of the ordered
extent (a call to btrfs_put_ordered_extent() made from
btrfs_finish_one_ordered());
6) At this point the delayed iputs list is not empty, so the assertion at
close_ctree() fails.
Fix this by flushing the fs_info->compressed_write_workers queue at
close_ctree() before flushing the fs_info->endio_write_workers queue,
respecting the queue dependency as the later is responsible for the
execution of ordered extent completion.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/disk-io.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6194ae97361..cae5113b91da 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4349,6 +4349,18 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
flush_workqueue(fs_info->endio_workers);
+ /*
+ * When finishing a compressed write bio we schedule a work queue item
+ * to finish an ordered extent - btrfs_finish_compressed_write_work()
+ * calls btrfs_finish_ordered_extent() which in turns does a call to
+ * btrfs_queue_ordered_fn(), and that queues the ordered extent
+ * completion either in the endio_write_workers work queue or in the
+ * fs_info->endio_freespace_worker work queue. We flush those queues
+ * below, so before we flush them we must flush this queue for the
+ * workers of compressed writes.
+ */
+ flush_workqueue(fs_info->compressed_write_workers);
+
/*
* After we parked the cleaner kthread, ordered extents may have
* completed and created new delayed iputs. If one of the async reclaim
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
2025-03-05 18:17 ` [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
2025-03-05 18:17 ` [PATCH v2 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers fdmanana
@ 2025-03-05 18:17 ` fdmanana
2025-03-05 18:17 ` [PATCH v2 4/4] btrfs: move btrfs_cleanup_bio() " fdmanana
` (2 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 18:17 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The __btrfs_bio_end_io() helper is trivial and has a single caller, so
there's no point in having a dedicated helper function. Further the double
underscore prefix in the name is discouraged. So get rid of it and move
its code into the caller (btrfs_bio_end_io()).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/bio.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2f32ee215c3f..07bbb0da2812 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -105,18 +105,6 @@ static void btrfs_cleanup_bio(struct btrfs_bio *bbio)
bio_put(&bbio->bio);
}
-static void __btrfs_bio_end_io(struct btrfs_bio *bbio)
-{
- if (bbio_has_ordered_extent(bbio)) {
- struct btrfs_ordered_extent *ordered = bbio->ordered;
-
- bbio->end_io(bbio);
- btrfs_put_ordered_extent(ordered);
- } else {
- bbio->end_io(bbio);
- }
-}
-
void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
{
bbio->bio.bi_status = status;
@@ -138,7 +126,15 @@ void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
/* Load split bio's error which might be set above. */
if (status == BLK_STS_OK)
bbio->bio.bi_status = READ_ONCE(bbio->status);
- __btrfs_bio_end_io(bbio);
+
+ if (bbio_has_ordered_extent(bbio)) {
+ struct btrfs_ordered_extent *ordered = bbio->ordered;
+
+ bbio->end_io(bbio);
+ btrfs_put_ordered_extent(ordered);
+ } else {
+ bbio->end_io(bbio);
+ }
}
}
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 4/4] btrfs: move btrfs_cleanup_bio() code into its single caller
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
` (2 preceding siblings ...)
2025-03-05 18:17 ` [PATCH v2 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller fdmanana
@ 2025-03-05 18:17 ` fdmanana
2025-03-05 22:54 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups Qu Wenruo
2025-03-06 10:09 ` Qu Wenruo
5 siblings, 0 replies; 16+ messages in thread
From: fdmanana @ 2025-03-05 18:17 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The btrfs_cleanup_bio() helper is trivial and has a single caller, there's
no point in having a dedicated helper function. So get rid of it and move
its code into the caller (btrfs_bio_end_io()).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/bio.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 07bbb0da2812..e9840954bf4a 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -97,21 +97,17 @@ static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
return bbio;
}
-/* Free a bio that was never submitted to the underlying device. */
-static void btrfs_cleanup_bio(struct btrfs_bio *bbio)
-{
- if (bbio_has_ordered_extent(bbio))
- btrfs_put_ordered_extent(bbio->ordered);
- bio_put(&bbio->bio);
-}
-
void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
{
bbio->bio.bi_status = status;
if (bbio->bio.bi_pool == &btrfs_clone_bioset) {
struct btrfs_bio *orig_bbio = bbio->private;
- btrfs_cleanup_bio(bbio);
+ /* Free bio that was never submitted to the underlying device. */
+ if (bbio_has_ordered_extent(bbio))
+ btrfs_put_ordered_extent(bbio->ordered);
+ bio_put(&bbio->bio);
+
bbio = orig_bbio;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers
2025-03-05 18:17 ` [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
@ 2025-03-05 22:46 ` Qu Wenruo
2025-03-05 22:50 ` Qu Wenruo
0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-03-05 22:46 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> At close_ctree() after we have ran delayed iputs either through explicitly
> calling btrfs_run_delayed_iputs() or later during the call to
> btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
> delayed iputs list is empty.
>
> Sometimes this assertion may fail because delayed iputs may have been
> added to the list after we last ran delayed iputs, and this happens due
> to workers in the endio_workers workqueue still running. These workers can
> do a final put on an ordered extent attached to a data bio, which results
> in adding a delayed iput. This is done at btrfs_bio_end_io() and its
> helper __btrfs_bio_end_io().
But the endio_workers workqueue is only utilized by data READ
operations, in function btrfs_end_io_wq(), which is called in
btrfs_simple_end_io().
Furthermore, for the endio_workers workqueue, for data inodes it only
handles btrfs_check_read_bio(), which shouldn't generate ordered extent
either.
Did I miss something here?
Thanks,
Qu
>
> Fix this by flushing the endio_workers workqueue before running delayed
> iputs at close_ctree().
>
> David reported this when running generic/648.
>
> Reported-by: David Sterba <dsterba@suse.com>
> Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct btrfs_bio")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/disk-io.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d96ea974ef73..b6194ae97361 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
> */
> btrfs_flush_workqueue(fs_info->delalloc_workers);
>
> + /*
> + * We can also have ordered extents getting their last reference dropped
> + * from the endio_workers workqueue because for data bios we keep a
> + * reference on an ordered extent which gets dropped when running
> + * btrfs_bio_end_io() in that workqueue, and that final drop results in
> + * adding a delayed iput for the inode.
> + */
> + flush_workqueue(fs_info->endio_workers);
> +
> /*
> * After we parked the cleaner kthread, ordered extents may have
> * completed and created new delayed iputs. If one of the async reclaim
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers
2025-03-05 22:46 ` Qu Wenruo
@ 2025-03-05 22:50 ` Qu Wenruo
2025-03-05 23:14 ` Filipe Manana
0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-03-05 22:50 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2025/3/6 09:16, Qu Wenruo 写道:
>
>
> 在 2025/3/6 04:47, fdmanana@kernel.org 写道:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> At close_ctree() after we have ran delayed iputs either through
>> explicitly
>> calling btrfs_run_delayed_iputs() or later during the call to
>> btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
>> delayed iputs list is empty.
>>
>> Sometimes this assertion may fail because delayed iputs may have been
>> added to the list after we last ran delayed iputs, and this happens due
>> to workers in the endio_workers workqueue still running. These workers
>> can
>> do a final put on an ordered extent attached to a data bio, which results
>> in adding a delayed iput. This is done at btrfs_bio_end_io() and its
>> helper __btrfs_bio_end_io().
>
> But the endio_workers workqueue is only utilized by data READ
> operations, in function btrfs_end_io_wq(), which is called in
> btrfs_simple_end_io().
>
> Furthermore, for the endio_workers workqueue, for data inodes it only
> handles btrfs_check_read_bio(), which shouldn't generate ordered extent
> either.
>
> Did I miss something here?
Oh, I missed the recently disabled (for subpage) uncached io.
So I guess the real fixes tag should be that not-yet-upstreamed uncached
io patch.
Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>> Fix this by flushing the endio_workers workqueue before running delayed
>> iputs at close_ctree().
>>
>> David reported this when running generic/648.
>>
>> Reported-by: David Sterba <dsterba@suse.com>
>> Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct
>> btrfs_bio")
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> ---
>> fs/btrfs/disk-io.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d96ea974ef73..b6194ae97361 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info
>> *fs_info)
>> */
>> btrfs_flush_workqueue(fs_info->delalloc_workers);
>> + /*
>> + * We can also have ordered extents getting their last reference
>> dropped
>> + * from the endio_workers workqueue because for data bios we keep a
>> + * reference on an ordered extent which gets dropped when running
>> + * btrfs_bio_end_io() in that workqueue, and that final drop
>> results in
>> + * adding a delayed iput for the inode.
>> + */
>> + flush_workqueue(fs_info->endio_workers);
>> +
>> /*
>> * After we parked the cleaner kthread, ordered extents may have
>> * completed and created new delayed iputs. If one of the async
>> reclaim
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
` (3 preceding siblings ...)
2025-03-05 18:17 ` [PATCH v2 4/4] btrfs: move btrfs_cleanup_bio() " fdmanana
@ 2025-03-05 22:54 ` Qu Wenruo
2025-03-06 10:09 ` Qu Wenruo
5 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-03-05 22:54 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> Fix a couple races that can result in adding delayed iputs during umount after
> we no longer expect to find any, triggering an assertion failure. Plus a couple
> cleanups. Details in the change logs.
>
> V2: Removed the NULL checks for the workqueues in patches 1 and 2, as they
> can never be NULL while at close_ctree() (they can only be NULL in error
> paths from open_ctree()).
Commented on the first patch that the fixed tag should be the uncached
io enablement.
Since before that we only handle data read operations in endio_workers,
which should not get ordered extent involved at all.
(Thankfully we get that feature disabled for now)
Otherwise looks good to me.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
>
> Filipe Manana (4):
> btrfs: fix non-empty delayed iputs list on unmount due to endio workers
> btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
> btrfs: move __btrfs_bio_end_io() code into its single caller
> btrfs: move btrfs_cleanup_bio() code into its single caller
>
> fs/btrfs/bio.c | 36 ++++++++++++++----------------------
> fs/btrfs/disk-io.c | 21 +++++++++++++++++++++
> 2 files changed, 35 insertions(+), 22 deletions(-)
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers
2025-03-05 22:50 ` Qu Wenruo
@ 2025-03-05 23:14 ` Filipe Manana
0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2025-03-05 23:14 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Mar 5, 2025 at 10:50 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> 在 2025/3/6 09:16, Qu Wenruo 写道:
> >
> >
> > 在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> At close_ctree() after we have ran delayed iputs either through
> >> explicitly
> >> calling btrfs_run_delayed_iputs() or later during the call to
> >> btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
> >> delayed iputs list is empty.
> >>
> >> Sometimes this assertion may fail because delayed iputs may have been
> >> added to the list after we last ran delayed iputs, and this happens due
> >> to workers in the endio_workers workqueue still running. These workers
> >> can
> >> do a final put on an ordered extent attached to a data bio, which results
> >> in adding a delayed iput. This is done at btrfs_bio_end_io() and its
> >> helper __btrfs_bio_end_io().
> >
> > But the endio_workers workqueue is only utilized by data READ
> > operations, in function btrfs_end_io_wq(), which is called in
> > btrfs_simple_end_io().
> >
> > Furthermore, for the endio_workers workqueue, for data inodes it only
> > handles btrfs_check_read_bio(), which shouldn't generate ordered extent
> > either.
> >
> > Did I miss something here?
>
> Oh, I missed the recently disabled (for subpage) uncached io.
>
> So I guess the real fixes tag should be that not-yet-upstreamed uncached
> io patch.
Oh yes, I picked a wrong commit somehow.
Since the uncached stuff is not in Linus' tree, I'll remove the Fixes
tag and just mention it happens for uncached IO added recently and the
subject of the patch that introduced it.
Thanks.
>
> Thanks,
> Qu
>
> >
> > Thanks,
> > Qu
> >
> >>
> >> Fix this by flushing the endio_workers workqueue before running delayed
> >> iputs at close_ctree().
> >>
> >> David reported this when running generic/648.
> >>
> >> Reported-by: David Sterba <dsterba@suse.com>
> >> Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct
> >> btrfs_bio")
> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >> ---
> >> fs/btrfs/disk-io.c | 9 +++++++++
> >> 1 file changed, 9 insertions(+)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index d96ea974ef73..b6194ae97361 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -4340,6 +4340,15 @@ void __cold close_ctree(struct btrfs_fs_info
> >> *fs_info)
> >> */
> >> btrfs_flush_workqueue(fs_info->delalloc_workers);
> >> + /*
> >> + * We can also have ordered extents getting their last reference
> >> dropped
> >> + * from the endio_workers workqueue because for data bios we keep a
> >> + * reference on an ordered extent which gets dropped when running
> >> + * btrfs_bio_end_io() in that workqueue, and that final drop
> >> results in
> >> + * adding a delayed iput for the inode.
> >> + */
> >> + flush_workqueue(fs_info->endio_workers);
> >> +
> >> /*
> >> * After we parked the cleaner kthread, ordered extents may have
> >> * completed and created new delayed iputs. If one of the async
> >> reclaim
> >
> >
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
` (4 preceding siblings ...)
2025-03-05 22:54 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups Qu Wenruo
@ 2025-03-06 10:09 ` Qu Wenruo
2025-03-06 16:00 ` Filipe Manana
5 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-03-06 10:09 UTC (permalink / raw)
To: fdmanana, linux-btrfs
在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> Fix a couple races that can result in adding delayed iputs during umount after
> we no longer expect to find any, triggering an assertion failure. Plus a couple
> cleanups. Details in the change logs.
>
> V2: Removed the NULL checks for the workqueues in patches 1 and 2, as they
> can never be NULL while at close_ctree() (they can only be NULL in error
> paths from open_ctree()).
BTW, even with this series applied, I can still trigger the delayed iput
ASSERT():
[ 5364.406135] BTRFS warning (device dm-10 state EA): direct IO failed
ino 259 op 0x0 offset 0x14800 len 18432 err no 10
[ 5364.406327] BTRFS warning (device dm-10 state EA): direct IO failed
ino 301 op 0x0 offset 0x112000 len 12288 err no 10
[ 5364.406443] BTRFS warning (device dm-10 state EA): direct IO failed
ino 284 op 0x0 offset 0x129000 len 40960 err no 10
[ 5364.408115] BTRFS warning (device dm-10 state EA): direct IO failed
ino 333 op 0x0 offset 0x43000 len 2048 err no 10
[ 5364.408350] BTRFS warning (device dm-10 state EA): direct IO failed
ino 7914 op 0x0 offset 0x34a000 len 43008 err no 10
[ 5364.636270] BTRFS info (device dm-10 state EA): last unmount of
filesystem 9c4c225e-d4c6-43d0-b8c9-4b3afb5cb3cc
[ 5364.639881] assertion failed: list_empty(&fs_info->delayed_iputs), in
fs/btrfs/disk-io.c:4419
[ 5364.641814] ------------[ cut here ]------------
[ 5364.642733] kernel BUG at fs/btrfs/disk-io.c:4419!
[ 5364.643712] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 5364.644880] CPU: 5 UID: 0 PID: 2672520 Comm: umount Tainted: G
W 6.14.0-rc5-custom+ #224
[ 5364.646787] Tainted: [W]=WARN
I have hit this at least twice, on x86_64 with the new experimental 2K
block size.
And this is the latest for-next, which does not have the uncached IO
patch at all.
Since I do not have compression enable for the mount options, I believe
there is some extra causes.
Thanks,
Qu
>
> Filipe Manana (4):
> btrfs: fix non-empty delayed iputs list on unmount due to endio workers
> btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
> btrfs: move __btrfs_bio_end_io() code into its single caller
> btrfs: move btrfs_cleanup_bio() code into its single caller
>
> fs/btrfs/bio.c | 36 ++++++++++++++----------------------
> fs/btrfs/disk-io.c | 21 +++++++++++++++++++++
> 2 files changed, 35 insertions(+), 22 deletions(-)
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups
2025-03-06 10:09 ` Qu Wenruo
@ 2025-03-06 16:00 ` Filipe Manana
0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2025-03-06 16:00 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Thu, Mar 6, 2025 at 10:09 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2025/3/6 04:47, fdmanana@kernel.org 写道:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Fix a couple races that can result in adding delayed iputs during umount after
> > we no longer expect to find any, triggering an assertion failure. Plus a couple
> > cleanups. Details in the change logs.
> >
> > V2: Removed the NULL checks for the workqueues in patches 1 and 2, as they
> > can never be NULL while at close_ctree() (they can only be NULL in error
> > paths from open_ctree()).
>
> BTW, even with this series applied, I can still trigger the delayed iput
> ASSERT():
>
> [ 5364.406135] BTRFS warning (device dm-10 state EA): direct IO failed
> ino 259 op 0x0 offset 0x14800 len 18432 err no 10
> [ 5364.406327] BTRFS warning (device dm-10 state EA): direct IO failed
> ino 301 op 0x0 offset 0x112000 len 12288 err no 10
> [ 5364.406443] BTRFS warning (device dm-10 state EA): direct IO failed
> ino 284 op 0x0 offset 0x129000 len 40960 err no 10
> [ 5364.408115] BTRFS warning (device dm-10 state EA): direct IO failed
> ino 333 op 0x0 offset 0x43000 len 2048 err no 10
> [ 5364.408350] BTRFS warning (device dm-10 state EA): direct IO failed
> ino 7914 op 0x0 offset 0x34a000 len 43008 err no 10
> [ 5364.636270] BTRFS info (device dm-10 state EA): last unmount of
> filesystem 9c4c225e-d4c6-43d0-b8c9-4b3afb5cb3cc
> [ 5364.639881] assertion failed: list_empty(&fs_info->delayed_iputs), in
> fs/btrfs/disk-io.c:4419
> [ 5364.641814] ------------[ cut here ]------------
> [ 5364.642733] kernel BUG at fs/btrfs/disk-io.c:4419!
> [ 5364.643712] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 5364.644880] CPU: 5 UID: 0 PID: 2672520 Comm: umount Tainted: G
> W 6.14.0-rc5-custom+ #224
> [ 5364.646787] Tainted: [W]=WARN
>
> I have hit this at least twice, on x86_64 with the new experimental 2K
> block size.
>
> And this is the latest for-next, which does not have the uncached IO
> patch at all.
>
> Since I do not have compression enable for the mount options, I believe
> there is some extra causes.
There is, in case of IO errors we can add delayed iputs from another queue,
which makes sense since generic/648 exercises IO error simulation with dmerror,
and your dmesg also shows IO errors. Patch here:
https://lore.kernel.org/linux-btrfs/b07f13dbfadfdb5884b21b97bbf1316c45d06a32.1741272910.git.fdmanana@suse.com/
Thanks.
>
> Thanks,
> Qu
>
> >
> > Filipe Manana (4):
> > btrfs: fix non-empty delayed iputs list on unmount due to endio workers
> > btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers
> > btrfs: move __btrfs_bio_end_io() code into its single caller
> > btrfs: move btrfs_cleanup_bio() code into its single caller
> >
> > fs/btrfs/bio.c | 36 ++++++++++++++----------------------
> > fs/btrfs/disk-io.c | 21 +++++++++++++++++++++
> > 2 files changed, 35 insertions(+), 22 deletions(-)
> >
>
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-03-06 16:01 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-05 17:45 [PATCH 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
2025-03-05 17:45 ` [PATCH 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
2025-03-05 17:45 ` [PATCH 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers fdmanana
2025-03-05 17:45 ` [PATCH 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller fdmanana
2025-03-05 17:45 ` [PATCH 4/4] btrfs: move btrfs_cleanup_bio() " fdmanana
2025-03-05 18:17 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups fdmanana
2025-03-05 18:17 ` [PATCH v2 1/4] btrfs: fix non-empty delayed iputs list on unmount due to endio workers fdmanana
2025-03-05 22:46 ` Qu Wenruo
2025-03-05 22:50 ` Qu Wenruo
2025-03-05 23:14 ` Filipe Manana
2025-03-05 18:17 ` [PATCH v2 2/4] btrfs: fix non-empty delayed iputs list on unmount due to compressed write workers fdmanana
2025-03-05 18:17 ` [PATCH v2 3/4] btrfs: move __btrfs_bio_end_io() code into its single caller fdmanana
2025-03-05 18:17 ` [PATCH v2 4/4] btrfs: move btrfs_cleanup_bio() " fdmanana
2025-03-05 22:54 ` [PATCH v2 0/4] btrfs: fix unexpected delayed iputs at umount time and cleanups Qu Wenruo
2025-03-06 10:09 ` Qu Wenruo
2025-03-06 16:00 ` Filipe Manana
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox