From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: bcachefs replica garbage collection
Date: Wed, 3 May 2023 10:05:15 -0400 [thread overview]
Message-ID: <ZFJqG43IUQIaMDMr@bfoster> (raw)
In-Reply-To: <ZFGZMl/0GGFwajoz@moria.home.lan>
On Tue, May 02, 2023 at 07:13:54PM -0400, Kent Overstreet wrote:
> On Tue, May 02, 2023 at 10:58:50AM -0400, Brian Foster wrote:
> > Hi Kent,
> >
> > I'm tracking an issue that seems to be related to bch_replica garbage
> > collection, where fsck reports the following:
> >
> > superblock not marked as containing replicas journal: 1/1 [1], not fixing
> >
> > ... and the fs doesn't want to mount (until repaired).
> >
> > The short of it looks like there's a window where we write out the sb
> > replica info with no BCH_DATA_journal usage where a shutdown/recovery
> > doesn't otherwise expect it. I've not fully root caused the problem yet,
> > but when looking into the code it looks like we have a couple different
> > mechanisms for clearing out unused replicas...
> >
> > The "old" _start/_end mechanism handles journal data only and is invoked
> > after pin flush. I presume the start/end calls wrap marking journal dev
> > replicas to ensure usage accounting is up to date, but I'm not totally
> > sure. Is that the case?
>
> It's not anything to do with usage accounting - we don't really do usage
> accounting for the journal, not in a way that would work for replicas
> tracking.
>
> So the old style replicas gc mechanism that
> bch2_journal_flush_device_pins() uses is your traditional mark and sweep
> gc - clear everything, then walk and remark everything that's currently
> in use.
>
Yeah, I think I grokked the fundamentals from the implementation. It
looks like it basically clears out the journal replica entries from the
_gc container, repopulates based on c->usage->replicas, and then updates
the sb/primary tables based on the result.
> > If so, any reason the marking via journal_write_done() isn't
> > sufficient?
>
> This might be the bug you're looking for - the journal write path should
> be doing the marking before the write, not after.
>
Quite possible. That certainly could be a vector to this transient
state. I'll look into it..
I am still a bit unclear on the marks in the flush path
(bch2_journal_flush_device_pins()), though. Am I following correctly
that this code marks the devices still backing journal entries? If so
and given the submit time marking, is that basically gc algorithmic
boilerplate where in context we'd expect all of those devices to already
be marked, or is there some particular situation where you'd expect that
path to mark a device that isn't already marked with journal data?
> > The "new" bch2_replicas_gc2() mechanism is invoked from various other
> > places and always propagates journal data replica entries. Is that
> > filter on journal data by design, or temporary because journal replica
> > gc is handled separately as above? If the latter, is there some explicit
> > reason the old mechanism was left around to handle journal data like
> > this?
> >
> > As noted above, I still need to track down the root cause of the
> > unexpectedly absent journal data replica sb field. It may be agnostic to
> > either gc mechanism, but I'm trying to understand the higher level
> > situation here a bit better before getting too deep into the weeds...
> > thanks.
>
> Applying this...
>
ACK.. that's useful context. Thanks.
Brian
> From: Kent Overstreet <kent.overstreet@linux.dev>
> Date: Tue, 2 May 2023 18:22:12 -0400
> Subject: [PATCH] bcachefs: Improved comment for bch2_replicas_gc2()
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>
> diff --git a/fs/bcachefs/replicas.c b/fs/bcachefs/replicas.c
> index 8935ff5899..8ae50dfd8c 100644
> --- a/fs/bcachefs/replicas.c
> +++ b/fs/bcachefs/replicas.c
> @@ -550,8 +550,14 @@ int bch2_replicas_gc_start(struct bch_fs *c, unsigned typemask)
> return 0;
> }
>
> -/* New much simpler mechanism for clearing out unneeded replicas entries: */
> -
> +/*
> + * New much simpler mechanism for clearing out unneeded replicas entries - drop
> + * replicas entries that have 0 sectors used.
> + *
> + * However, we don't track sector counts for journal usage, so this doesn't drop
> + * any BCH_DATA_journal entries; the old bch2_replicas_gc_(start|end) mechanism
> + * is retained for that.
> + */
> int bch2_replicas_gc2(struct bch_fs *c)
> {
> struct bch_replicas_cpu new = { 0 };
>
next prev parent reply other threads:[~2023-05-03 14:04 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-02 14:58 bcachefs replica garbage collection Brian Foster
2023-05-02 23:13 ` Kent Overstreet
2023-05-03 14:05 ` Brian Foster [this message]
2023-05-04 14:11 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZFJqG43IUQIaMDMr@bfoster \
--to=bfoster@redhat.com \
--cc=kent.overstreet@linux.dev \
--cc=linux-bcachefs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox