* [PATCH 0/2] a couple more freeze/shutdown fixes
@ 2023-12-05 13:24 Brian Foster
2023-12-05 13:24 ` [PATCH 1/2] bcachefs: don't attempt rw on unfreeze when shutdown Brian Foster
2023-12-05 13:24 ` [PATCH 2/2] bcachefs: return from fsync on writeback error to avoid early shutdown Brian Foster
0 siblings, 2 replies; 5+ messages in thread
From: Brian Foster @ 2023-12-05 13:24 UTC (permalink / raw)
To: linux-bcachefs
Hi Kent,
I managed to catch up a bit on the fsync thing we had talked about
earlier this year. Skimming through the original thread, I think this
mail [1] summarizes things best. The short of it is that we can address
at least a couple of the failures we're seeing with fstests
generic/441,484 with a couple small tweaks in bcachefs and fstests, but
these aren't necessarily the longer term fix.
Firstly, patch 1 is just another unrelated (un)freeze fixup I happened
across when hacking around. I don't know that it currently associates to
any related test failures. I just include it here for convenience.
Patch 2 of this series tweaks the fsync path to be a bit more deliberate
/ less aggressive to help avoid spurious shutdowns. The reasoning behind
this is that if fsync fails, the user can't be certain of the state of
things on disk anyways. What I've observed with this patch is that it
seems to prevent generic/484 failures (though not sure that is
guaranteed) and based on the original thread, it can address generic/441
when combined with an fstests tweak to allow the fs a bit of time to
idle before transitioning to the dm error table..
All in all, I still think this is a reasonable incremental improvement.
I think the longer term fix here is more something like the ability to
retry metadata I/O on failure such that we can be a little less
sensitive to emergency shutdowns. I had managed to hack up a quick
prototype of metadata I/O failure/retries a few weeks or so ago just to
explore how difficult it might be, and it didn't seem that bad IIRC. The
bigger question in my mind is how to deal with journal writes,
particularly if journal I/O is any more frequent than the common
filesystems fstests tends to accommodate (i.e. xfs, ext4, etc.). I
suspect this is worth discussing further in an upcoming call..
Also just as a data point, btrfs skips generic/441 in favor of its own
custom variant in btrfs/146. That test runs the same fsync tool, but it
looks like it sets up a combination of data striping (raid0) and
metadata replication on the fs presumably to facilitate data I/O errors
on single disk errors without triggering high level metadata errors.
This might be another option worth considering for bcachefs if we can do
something similar...
Thoughts, reviews, flames appreciated.
Brian
[1] https://lore.kernel.org/linux-bcachefs/Y+EduoshRHXec+XU@bfoster/
Brian Foster (2):
bcachefs: don't attempt rw on unfreeze when shutdown
bcachefs: return from fsync on writeback error to avoid early shutdown
fs/bcachefs/fs-io.c | 14 +++++++++-----
fs/bcachefs/fs.c | 3 +++
2 files changed, 12 insertions(+), 5 deletions(-)
--
2.42.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] bcachefs: don't attempt rw on unfreeze when shutdown
2023-12-05 13:24 [PATCH 0/2] a couple more freeze/shutdown fixes Brian Foster
@ 2023-12-05 13:24 ` Brian Foster
2023-12-05 19:12 ` Kent Overstreet
2023-12-05 13:24 ` [PATCH 2/2] bcachefs: return from fsync on writeback error to avoid early shutdown Brian Foster
1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2023-12-05 13:24 UTC (permalink / raw)
To: linux-bcachefs
The internal freeze mechanism in bcachefs mostly reuses the generic
rw<->ro transition code. If the fs happens to shutdown during or
after freeze, a transition back to rw can fail. This is expected,
but returning an error from the unfreeze callout prevents the
filesystem from being unfrozen.
Skip the read write transition if the fs is shutdown. This allows
the fs to unfreeze at the vfs level so writes will no longer block,
but will still fail due to the emergency read-only state of the fs.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/bcachefs/fs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 0d0a37cad2d4..480a61b60724 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -1733,6 +1733,9 @@ static int bch2_unfreeze(struct super_block *sb)
struct bch_fs *c = sb->s_fs_info;
int ret;
+ if (test_bit(BCH_FS_emergency_ro, &c->flags))
+ return 0;
+
down_write(&c->state_lock);
ret = bch2_fs_read_write(c);
up_write(&c->state_lock);
--
2.42.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] bcachefs: return from fsync on writeback error to avoid early shutdown
2023-12-05 13:24 [PATCH 0/2] a couple more freeze/shutdown fixes Brian Foster
2023-12-05 13:24 ` [PATCH 1/2] bcachefs: don't attempt rw on unfreeze when shutdown Brian Foster
@ 2023-12-05 13:24 ` Brian Foster
2023-12-05 19:12 ` Kent Overstreet
1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2023-12-05 13:24 UTC (permalink / raw)
To: linux-bcachefs
When investigating transient failures of generic/441 on bcachefs, it
was determined that the cause of the failure was a combination of
unconditional emergency shutdown and racing between background
journal activity and the test switchover from a working device
mapper table to an error injecting table.
Part of the reason for this sequence of events is that bcachefs
aggressively flushes as much as possible during fsync(), regardless
of errors. While this is reasonable behavior, it is technically
unnecessary because once an error is returned from fsync(), the
caller cannot make any assumptions about the resilience of data.
Tweak the bch2_fsync() logic to return an error on failure of any of
the steps involved in the flush. Note that this change alone does
not prevent generic/441 failure, but in combination with a test
tweak to avoid racing during the dm-error table switchover it avoids
the unnecessary shutdowns and allows the test to pass reliably on
bcachefs.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/bcachefs/fs-io.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/bcachefs/fs-io.c b/fs/bcachefs/fs-io.c
index 31f40e587a4f..98bd5babab19 100644
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@@ -192,13 +192,17 @@ int bch2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
struct bch_inode_info *inode = file_bch_inode(file);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
- int ret, ret2, ret3;
+ int ret;
ret = file_write_and_wait_range(file, start, end);
- ret2 = sync_inode_metadata(&inode->v, 1);
- ret3 = bch2_flush_inode(c, inode);
-
- return bch2_err_class(ret ?: ret2 ?: ret3);
+ if (ret)
+ goto out;
+ ret = sync_inode_metadata(&inode->v, 1);
+ if (ret)
+ goto out;
+ ret = bch2_flush_inode(c, inode);
+out:
+ return bch2_err_class(ret);
}
/* truncate: */
--
2.42.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] bcachefs: don't attempt rw on unfreeze when shutdown
2023-12-05 13:24 ` [PATCH 1/2] bcachefs: don't attempt rw on unfreeze when shutdown Brian Foster
@ 2023-12-05 19:12 ` Kent Overstreet
0 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2023-12-05 19:12 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs
On Tue, Dec 05, 2023 at 08:24:38AM -0500, Brian Foster wrote:
> The internal freeze mechanism in bcachefs mostly reuses the generic
> rw<->ro transition code. If the fs happens to shutdown during or
> after freeze, a transition back to rw can fail. This is expected,
> but returning an error from the unfreeze callout prevents the
> filesystem from being unfrozen.
>
> Skip the read write transition if the fs is shutdown. This allows
> the fs to unfreeze at the vfs level so writes will no longer block,
> but will still fail due to the emergency read-only state of the fs.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Applied
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] bcachefs: return from fsync on writeback error to avoid early shutdown
2023-12-05 13:24 ` [PATCH 2/2] bcachefs: return from fsync on writeback error to avoid early shutdown Brian Foster
@ 2023-12-05 19:12 ` Kent Overstreet
0 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2023-12-05 19:12 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs
On Tue, Dec 05, 2023 at 08:24:39AM -0500, Brian Foster wrote:
> When investigating transient failures of generic/441 on bcachefs, it
> was determined that the cause of the failure was a combination of
> unconditional emergency shutdown and racing between background
> journal activity and the test switchover from a working device
> mapper table to an error injecting table.
>
> Part of the reason for this sequence of events is that bcachefs
> aggressively flushes as much as possible during fsync(), regardless
> of errors. While this is reasonable behavior, it is technically
> unnecessary because once an error is returned from fsync(), the
> caller cannot make any assumptions about the resilience of data.
>
> Tweak the bch2_fsync() logic to return an error on failure of any of
> the steps involved in the flush. Note that this change alone does
> not prevent generic/441 failure, but in combination with a test
> tweak to avoid racing during the dm-error table switchover it avoids
> the unnecessary shutdowns and allows the test to pass reliably on
> bcachefs.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Applied
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-12-05 19:12 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-05 13:24 [PATCH 0/2] a couple more freeze/shutdown fixes Brian Foster
2023-12-05 13:24 ` [PATCH 1/2] bcachefs: don't attempt rw on unfreeze when shutdown Brian Foster
2023-12-05 19:12 ` Kent Overstreet
2023-12-05 13:24 ` [PATCH 2/2] bcachefs: return from fsync on writeback error to avoid early shutdown Brian Foster
2023-12-05 19:12 ` Kent Overstreet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox