From: Kent Overstreet <kent.overstreet@linux.dev>
To: linux-bcachefs@vger.kernel.org
Cc: Kent Overstreet <kent.overstreet@linux.dev>, djwong@kernel.org
Subject: [PATCH 0/6] [RFC WIP] bcachefs: online fsck
Date: Wed, 6 Dec 2023 15:33:04 -0500
Message-ID: <20231206203313.2197302-1-kent.overstreet@linux.dev>
Spent a couple days hacking code with Darrick, and online fsck seems to
be coming _way_ sooner than I expected/realized - various threads coming
together nicely.
What works - most of checking alloc info:
All but the original btree_gc.c code already runs while we're RW, with
various internal processes actively modifying the filesystem: copygc,
rebalance, the do_discard worker. In particular, this means the check
alloc info passes are already well tested in RW mode with active use.
There's one known concurrency bug in that part of fsck: we process gaps
in the alloc btree all at once, which is incorrect since we don't have
any way of taking range locks on the key cache. That bug doesn't affect
online fsck any more than regular fsck, though.

Still debating how to fix that one: just dropping the "process a gap all
at once" approach and going back to key-by-key would be a serious
regression in mkfs performance on huge filesystems, and in fsck
performance when the fs is still mostly empty.
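To make that tradeoff concrete, here's a toy model (not bcachefs code):
the alloc btree is reduced to a sorted array of the bucket numbers that
have a key, and everything else in [0, nbuckets) is a gap. The names
steps_key_by_key() and steps_gap_at_once() are made up for this sketch,
and it only counts units of work; it doesn't model locking or the key
cache at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model, not bcachefs code: 'present' is a sorted array of the bucket
 * numbers that have a key; buckets in [0, nbuckets) without one are gaps.
 * Both functions only count units of work.
 */

/* Key-by-key: every bucket in the keyspace costs one step, key or no key. */
uint64_t steps_key_by_key(const uint64_t *present, size_t nr,
			  uint64_t nbuckets)
{
	uint64_t steps = 0;
	size_t i = 0;

	for (uint64_t b = 0; b < nbuckets; b++) {
		if (i < nr && present[i] == b)
			i++;	/* existing key */
		steps++;	/* existing or missing, one step each */
	}
	return steps;
}

/* Gap-at-once: one step per existing key, one step per run of missing keys. */
uint64_t steps_gap_at_once(const uint64_t *present, size_t nr,
			   uint64_t nbuckets)
{
	uint64_t steps = 0, pos = 0;

	for (size_t i = 0; i < nr; i++) {
		/* input must be sorted, unique and in range */
		assert(present[i] >= pos && present[i] < nbuckets);

		if (present[i] > pos)
			steps++;	/* the whole gap [pos, present[i]) */
		steps++;		/* the existing key */
		pos = present[i] + 1;
	}
	if (pos < nbuckets)
		steps++;		/* trailing gap */
	return steps;
}
```

With keys at buckets 3, 4 and 1000 in a 1000000 bucket keyspace,
key-by-key takes 1000000 steps and gap-at-once takes 6. The catch is the
one described above: handling a gap in one step is only safe if nothing
else can touch keys inside that gap while we do it.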
Todo: btree_gc.c
The old btree_gc code used to run while the fs was in use and RW (before
bcachefs was a fs); it was well tested and relied upon then, but the
traversal order meant that it had to block btree topology changes,
effectively any interior btree node update. As filesystems keep getting
bigger that is definitely no longer feasible, so that code will all need
to be reworked and modernized (and possibly split into multiple passes,
e.g. splitting out checking of indirect extent refcounts).
Todo: fsck.c
The passes that check filesystem-level structure were written assuming
the filesystem was not in use or being modified. They use the normal
btree transaction API, which handles locking w.r.t. other transactions,
but they don't use it "correctly": they build up state as they go
(walking a btree path, tracking where extents end in different
snapshots), but the individual btree transactions (trans_begin() ->
trans_commit()) are too small to protect that state while they're
walking things.
The main reason for this is that there's currently a fixed limit of 64
on the number of btree_paths in the btree_trans object (it's a fixed
array). I've started looking at lifting that limit and making the array
of paths dynamically growable, and I think that's going to be the way to
go - the alternative would be introducing extra outside locking to all
the filesystem paths that touch that stuff, and we don't want to do
that.
Fortunately other developments have happened that make growing the max
btree_paths look a lot more reasonable than when last I was considering
it, and fsck should be a good place for testing that out and seeing what
growing pains we encounter.
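As a rough userspace illustration of what "dynamically growable" could
look like (a sketch only, not the actual patch): keep the 64-entry array
inline so the common case needs no allocation, and switch to a heap
array when it fills up. struct toy_path, struct toy_trans and the
toy_*() helpers are all hypothetical stand-ins, not real bcachefs types
or functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PATHS_INLINE	64	/* the current fixed limit */

struct toy_path {			/* hypothetical stand-in for a btree_path */
	unsigned		btree_id;
	unsigned long long	pos;
};

struct toy_trans {			/* hypothetical stand-in for a btree_trans */
	struct toy_path	*paths;		/* points at inline_paths until we grow */
	size_t		nr;
	size_t		capacity;
	struct toy_path	inline_paths[PATHS_INLINE];
};

void toy_trans_init(struct toy_trans *t)
{
	t->paths	= t->inline_paths;
	t->nr		= 0;
	t->capacity	= PATHS_INLINE;
}

/* Returns the index of the new path, or -1 if growing the array failed. */
long toy_path_get(struct toy_trans *t, unsigned btree_id,
		  unsigned long long pos)
{
	assert(t->paths != NULL);

	if (t->nr == t->capacity) {
		/* Full: double the capacity and move to a heap array */
		size_t new_capacity = t->capacity * 2;
		struct toy_path *p = malloc(new_capacity * sizeof(*p));

		if (!p)
			return -1;
		memcpy(p, t->paths, t->nr * sizeof(*p));
		if (t->paths != t->inline_paths)
			free(t->paths);
		t->paths	= p;
		t->capacity	= new_capacity;
	}

	t->paths[t->nr] = (struct toy_path) { .btree_id = btree_id, .pos = pos };
	return (long) t->nr++;
}

void toy_trans_exit(struct toy_trans *t)
{
	if (t->paths != t->inline_paths)
		free(t->paths);
	t->paths	= NULL;
	t->nr		= 0;
	t->capacity	= 0;
}
```

The sketch hands out indices rather than pointers because growing moves
the array, so any pointer into the old one would go stale. Whether the
real code wants indices, or some other way of keeping references valid
across a resize, is exactly the kind of growing pain to look for.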
Other misc todo:
- the thread_with_file stuff can probably use further cleanup and
  refactoring; also, the f_ops we implement need a .poll() so userspace
  doesn't spin when using O_NONBLOCK
- ???
- profit!
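For the .poll() item, this is roughly the userspace pattern it would
enable, sketched with plain POSIX calls (read(2), poll(2), fcntl(2)).
read_when_ready() and set_nonblocking() are made-up helper names, and
nothing here is specific to the fsck ioctls; it assumes only that the fd
is readable and O_NONBLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put an fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a non-blocking fd, sleeping in poll(2) instead of spinning
 * on EAGAIN. Returns bytes read, 0 on EOF, or -1 on error or timeout
 * (errno is set to ETIMEDOUT on timeout).
 */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
	assert(buf != NULL);

	for (;;) {
		ssize_t ret = read(fd, buf, len);

		if (ret >= 0)
			return ret;
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;

		/* Nothing to read yet: block in poll() until there is */
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int n = poll(&pfd, 1, timeout_ms);

		if (n == 0) {
			errno = ETIMEDOUT;
			return -1;
		}
		if (n < 0 && errno != EINTR)
			return -1;
		/* readable, hung up, or interrupted: retry the read */
	}
}
```

This only sleeps if the file actually reports readiness. As far as I
know, a file whose f_ops lack .poll() is treated as always ready, so the
loop above would degrade to exactly the spinning we want to avoid.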
Kent Overstreet (6):
  bcachefs: thread_with_file
  bcachefs: Add ability to redirect log output
  bcachefs: Mark recovery passses that are safe to run online
  bcachefs: bch2_run_online_recovery_passes()
  bcachefs: BCH_IOCTL_FSCK_OFFLINE
  bcachefs: BCH_IOCTL_FSCK_ONLINE

 fs/bcachefs/bcachefs.h       |  61 ++++--
 fs/bcachefs/bcachefs_ioctl.h |  24 +++
 fs/bcachefs/chardev.c        | 376 ++++++++++++++++++++++++++++++-----
 fs/bcachefs/opts.h           |   5 +
 fs/bcachefs/recovery.c       |  51 +++--
 fs/bcachefs/recovery.h       |   1 +
 fs/bcachefs/recovery_types.h |  73 +++----
 fs/bcachefs/super.c          |  28 +++
 8 files changed, 506 insertions(+), 113 deletions(-)
--
2.42.0