public inbox for linux-bcachefs@vger.kernel.org
* [PATCH 0/6] [RFC WIP] bcachefs: online fsck
From: Kent Overstreet @ 2023-12-06 20:33 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet, djwong

I spent a couple of days hacking on code with Darrick, and online fsck
looks like it's coming _way_ sooner than I expected: various threads are
coming together nicely.

What works: most of the alloc info checks.

Everything except the original btree_gc.c code already runs while we're
RW, with internal processes actively modifying the filesystem: copygc,
rebalance, and the do_discard worker. In particular, this means the check
alloc info passes are already well tested in RW mode under active use.
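A minimal userspace sketch of how "safe to run online" could be
expressed: a per-pass flag, plus a runner that skips every pass not
flagged. The flag name PASS_ONLINE, the struct, and run_online_passes()
are hypothetical. They are not the actual interface from patches 3/6 and
4/6. The pass names are only modeled on bcachefs's. Which passes carry
the flag follows nothing more than the description above: alloc info
checks yes, btree_gc.c and fsck.c passes not yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag: pass is safe to run while the fs is RW and in use. */
#define PASS_ONLINE	(1U << 0)

struct recovery_pass {
	const char	*name;
	unsigned	flags;
	int		(*fn)(void);
};

static int nr_passes_run;

/* Stand-in for a real check; just counts how many passes ran. */
static int pass_stub(void)
{
	nr_passes_run++;
	return 0;
}

/*
 * Illustrative table only: alloc info checks are flagged online-safe,
 * btree_gc.c and fsck.c passes are not (yet).
 */
static const struct recovery_pass passes[] = {
	{ "check_allocations",	0,		pass_stub },	/* btree_gc.c */
	{ "check_alloc_info",	PASS_ONLINE,	pass_stub },
	{ "check_lrus",		PASS_ONLINE,	pass_stub },
	{ "check_inodes",	0,		pass_stub },	/* fsck.c */
	{ "check_dirents",	0,		pass_stub },	/* fsck.c */
};

/* Run only the passes flagged as online-safe; stop at the first error. */
static int run_online_passes(void)
{
	for (size_t i = 0; i < sizeof(passes) / sizeof(passes[0]); i++) {
		if (!(passes[i].flags & PASS_ONLINE))
			continue;

		printf("running %s\n", passes[i].name);
		int ret = passes[i].fn();
		if (ret)
			return ret;
	}
	return 0;
}
```

Called on a mounted fs, run_online_passes() would run only
check_alloc_info and check_lrus here; offline fsck would still walk the
whole table.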

There's one known concurrency bug in that part of fsck: we process gaps
in the alloc btree all at once, which is incorrect since we have no way
of taking range locks on the key cache. That bug doesn't affect online
fsck any more than it affects regular fsck.

I'm still debating how to fix that one. Just dropping the "process a gap
all at once" approach and going back to key-by-key would be a serious
performance regression for mkfs on huge filesystems, and for fsck when
the fs is still mostly empty.
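To make the trade-off concrete, here's a toy cost model. It is an
assumption that counts abstract steps rather than real btree lookups,
and it compares key-by-key processing with handling each gap as a single
step. On an empty filesystem with a million buckets, key-by-key does a
million steps while gap-at-once does one. The race lives in that same
shape: a gap is judged empty once and then acted on as a whole. Without
a range lock, nothing stops the key cache from gaining a key inside it
in between.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: buckets 0..nr_buckets-1; keys[] holds the sorted bucket
 * numbers that have an alloc key. Each function returns steps of work.
 */
static size_t work_key_by_key(size_t nr_buckets)
{
	size_t work = 0;

	for (size_t b = 0; b < nr_buckets; b++)
		work++;		/* look up and check bucket b, key or not */
	return work;
}

static size_t work_gap_at_once(size_t nr_buckets, const size_t *keys,
			       size_t nr_keys)
{
	size_t work = 0, pos = 0;

	for (size_t i = 0; i < nr_keys; i++) {
		assert(keys[i] >= pos && keys[i] < nr_buckets);
		if (keys[i] > pos)
			work++;	/* whole gap [pos, keys[i]) in one step */
		work++;		/* the key itself */
		pos = keys[i] + 1;
	}
	if (pos < nr_buckets)
		work++;		/* trailing gap */
	return work;
}
```

For example, with alloc keys at buckets 10 and 20 out of 1000,
work_gap_at_once() counts 5 steps (three gaps plus two keys), against
1000 for key-by-key.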

Todo: btree_gc.c

The old btree_gc code used to run while the fs was in use and RW (before
bcachefs was a filesystem); it was well tested and relied upon then, but
its traversal order meant that it had to block btree topology changes,
i.e. effectively any interior btree node update, while it ran. As
filesystems keep getting bigger, that is no longer feasible, so that code
will all need to be reworked and modernized (and possibly split into
multiple passes, e.g. splitting out checking of indirect extent
refcounts).
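As an illustration of what a split-out pass might do, here's a toy
refcount check. It assumes each indirect extent's refcount should equal
the number of reflink pointers overlapping it, and it uses simplified,
hypothetical structs rather than bcachefs's real key types. The nested
loop is there for clarity; a real pass would walk both btrees in sorted
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for an indirect extent and a pointer into it. */
struct indirect_extent {
	uint64_t	start, end;	/* [start, end) in reflink index space */
	uint64_t	refcount;	/* stored refcount being checked */
};

struct reflink_ptr {
	uint64_t	idx, len;	/* references [idx, idx + len) */
};

/*
 * Recount references for every indirect extent and repair mismatches.
 * Returns the number of extents whose stored refcount was wrong.
 */
static size_t check_indirect_refcounts(struct indirect_extent *e, size_t nr_e,
				       const struct reflink_ptr *p, size_t nr_p)
{
	size_t nr_fixed = 0;

	for (size_t i = 0; i < nr_e; i++) {
		uint64_t expected = 0;

		for (size_t j = 0; j < nr_p; j++)
			if (p[j].idx < e[i].end &&
			    p[j].idx + p[j].len > e[i].start)
				expected++;	/* pointer overlaps this extent */

		if (e[i].refcount != expected) {
			e[i].refcount = expected;
			nr_fixed++;
		}
	}
	return nr_fixed;
}
```

Because it only needs the extents and the pointers into them, a check
like this has no reason to be tied to the rest of gc's traversal.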

Todo: fsck.c

The passes that check filesystem-level structure were written assuming
the filesystem was not in use and not being modified. They use the
normal btree transaction API, which handles locking w.r.t. other
transactions, but they don't use it "correctly": they build up state as
they go (walking a btree path, tracking where extents end in different
snapshots), yet their btree transactions (trans_begin() ->
trans_commit()) are smaller than they'd need to be to protect that state
while they walk. If another transaction commits in between, that state
can silently go stale.
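A toy model of why the state has to live inside a single transaction.
It uses a hypothetical sequence counter bumped on every update, and
commit fails with -EAGAIN if the counter moved. That is optimistic
validation, loosely analogous to a transaction restart, and not how
bcachefs's btree locking actually works. The walker's "state" is just a
running sum of the keys. A simulated writer changes keys 0 and 3 once,
after the walker has read key 1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_KEYS 4

/* Toy "btree": a few keys plus a sequence number bumped on every update. */
struct toy_btree {
	int		keys[NR_KEYS];
	unsigned long	seq;
};

struct toy_trans {
	struct toy_btree	*b;
	unsigned long		start_seq;
};

typedef void (*interleave_fn)(struct toy_btree *b, size_t step);

static void toy_update(struct toy_btree *b, size_t i, int v)
{
	b->keys[i] = v;
	b->seq++;
}

static void toy_trans_begin(struct toy_trans *t, struct toy_btree *b)
{
	t->b = b;
	t->start_seq = b->seq;
}

/* Fails with -EAGAIN if anything was updated since toy_trans_begin(). */
static int toy_trans_commit(const struct toy_trans *t)
{
	return t->b->seq == t->start_seq ? 0 : -EAGAIN;
}

/* Simulated concurrent writer: fires once, after the walker read key 1. */
static void writer_once(struct toy_btree *b, size_t step)
{
	if (step == 1 && b->seq == 0) {
		toy_update(b, 0, 10);
		toy_update(b, 3, 10);
	}
}

/* Unsafe: the state (sum) outlives each small transaction. */
static int sum_per_key_trans(struct toy_btree *b, interleave_fn other)
{
	int sum = 0;

	for (size_t i = 0; i < NR_KEYS; i++) {
		struct toy_trans t;
		int ret;

		toy_trans_begin(&t, b);
		sum += b->keys[i];
		ret = toy_trans_commit(&t);
		assert(!ret);	/* each tiny transaction commits fine... */
		if (other)
			other(b, i);
	}
	return sum;		/* ...but the total may mix old and new keys */
}

/* Safe: state built inside one transaction, rebuilt on conflict. */
static int sum_one_trans(struct toy_btree *b, interleave_fn other)
{
	for (;;) {
		struct toy_trans t;
		int sum = 0;

		toy_trans_begin(&t, b);
		for (size_t i = 0; i < NR_KEYS; i++) {
			sum += b->keys[i];
			if (other)
				other(b, i);
		}
		if (!toy_trans_commit(&t))
			return sum;
		/* -EAGAIN: something changed under us; discard state, restart */
	}
}
```

Starting from four keys equal to 1, the per-key walk returns 13. That
matches neither the before state (4) nor the after state (22). The
single-transaction walk notices the conflict at commit, rebuilds its
state, and returns 22.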

The main reason for this is that btree_trans currently has a fixed limit
of 64 btree_paths (they live in a fixed-size array), which caps how large
a single transaction can get. I've started looking at lifting that limit
and making the array of paths dynamically growable, and I think that's
the way to go; the alternative would be adding extra outside locking to
every filesystem path that touches that state, which we don't want to do.
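A minimal userspace sketch of a growable paths array, using hypothetical
stand-in structs rather than the real struct btree_trans and struct
btree_path. In-kernel code would use krealloc() or similar with
appropriate GFP flags, and the initial size of 8 is arbitrary. The design
point worth noting is handing out indices instead of pointers: once the
array can be reallocated, any saved pointer into it may dangle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btree_path. */
struct toy_path {
	unsigned	btree_id;
	unsigned long	pos;
};

/* Hypothetical stand-in for the paths array in struct btree_trans. */
struct toy_trans_paths {
	struct toy_path	*paths;
	size_t		nr, size;
};

/*
 * Allocate a new path, doubling the array when it is full. Returns an
 * index rather than a pointer, since realloc() may move the array.
 */
static long path_alloc(struct toy_trans_paths *t, unsigned btree_id,
		       unsigned long pos)
{
	if (t->nr == t->size) {
		size_t new_size = t->size ? t->size * 2 : 8;
		struct toy_path *p = realloc(t->paths,
					     new_size * sizeof(struct toy_path));

		if (!p)
			return -1;	/* old array is still valid and owned by t */
		t->paths = p;
		t->size = new_size;
	}

	t->paths[t->nr] = (struct toy_path) { .btree_id = btree_id, .pos = pos };
	return (long) t->nr++;
}

static void paths_exit(struct toy_trans_paths *t)
{
	free(t->paths);
	t->paths = NULL;
	t->nr = t->size = 0;
}
```

Allocating 100 paths grows the array 8 -> 16 -> 32 -> 64 -> 128, and
every index returned along the way stays valid.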

Fortunately, other developments have made growing the max number of
btree_paths look a lot more reasonable than when I last considered it,
and fsck should be a good place to test that out and see what growing
pains we encounter.

Other misc todo:
 - the thread_with_file stuff can probably use further cleanup and
   refactoring; also, the file_operations we implement need a .poll()
   method so userspace doesn't spin when using O_NONBLOCK

 - ???

 - profit!
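On the .poll() item: without a .poll method, the VFS reports the file as
always readable. A userspace loop that waits in poll() after EAGAIN
therefore just spins. Here's a sketch of the userspace side that does
the right thing once .poll() exists. A pipe stands in for the fd returned
by the fsck ioctl, and set_nonblock() and read_nonblock_wait() are
hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put an already-open fd into non-blocking mode. */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read from a non-blocking fd without spinning: on EAGAIN, sleep in
 * poll() until the fd becomes readable (or hangs up), then retry.
 */
static ssize_t read_nonblock_wait(int fd, void *buf, size_t len)
{
	for (;;) {
		ssize_t n = read(fd, buf, len);

		if (n >= 0 || errno != EAGAIN)
			return n;

		struct pollfd pfd = { .fd = fd, .events = POLLIN };

		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			return -1;
	}
}
```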

Kent Overstreet (6):
  bcachefs: thread_with_file
  bcachefs: Add ability to redirect log output
  bcachefs: Mark recovery passes that are safe to run online
  bcachefs: bch2_run_online_recovery_passes()
  bcachefs: BCH_IOCTL_FSCK_OFFLINE
  bcachefs: BCH_IOCTL_FSCK_ONLINE

 fs/bcachefs/bcachefs.h       |  61 ++++--
 fs/bcachefs/bcachefs_ioctl.h |  24 +++
 fs/bcachefs/chardev.c        | 376 ++++++++++++++++++++++++++++++-----
 fs/bcachefs/opts.h           |   5 +
 fs/bcachefs/recovery.c       |  51 +++--
 fs/bcachefs/recovery.h       |   1 +
 fs/bcachefs/recovery_types.h |  73 +++----
 fs/bcachefs/super.c          |  28 +++
 8 files changed, 506 insertions(+), 113 deletions(-)

-- 
2.42.0




Thread overview: 13+ messages
2023-12-06 20:33 [PATCH 0/6] [RFC WIP] bcachefs: online fsck Kent Overstreet
2023-12-06 20:33 ` [PATCH 1/6] bcachefs: thread_with_file Kent Overstreet
2023-12-06 20:33 ` [PATCH 2/6] bcachefs: Add ability to redirect log output Kent Overstreet
2023-12-08 20:24   ` Brian Foster
2023-12-08 20:35     ` Kent Overstreet
2023-12-06 20:33 ` [PATCH 3/6] bcachefs: Mark recovery passes that are safe to run online Kent Overstreet
2023-12-06 20:33 ` [PATCH 4/6] bcachefs: bch2_run_online_recovery_passes() Kent Overstreet
2023-12-08 20:25   ` Brian Foster
2023-12-08 20:34     ` Kent Overstreet
2023-12-06 20:33 ` [PATCH 5/6] bcachefs: BCH_IOCTL_FSCK_OFFLINE Kent Overstreet
2023-12-08 20:26   ` Brian Foster
2023-12-08 20:33     ` Kent Overstreet
2023-12-06 20:33 ` [PATCH 6/6] bcachefs: BCH_IOCTL_FSCK_ONLINE Kent Overstreet
