public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: [PATCH 5/5] bcachefs: use prejournaled key updates for write buffer flushes
Date: Fri, 21 Jul 2023 09:09:44 -0400	[thread overview]
Message-ID: <ZLqDmGvvkc8ZtBTa@bfoster> (raw)
In-Reply-To: <20230720212657.tfiootulv3aksq7f@moria.home.lan>

On Thu, Jul 20, 2023 at 05:26:57PM -0400, Kent Overstreet wrote:
> On Wed, Jul 19, 2023 at 08:53:06AM -0400, Brian Foster wrote:
> > The write buffer mechanism journals keys twice in certain
> > situations. A key is always journaled on write buffer insertion, and
> > is potentially journaled again if a write buffer flush falls into
> > either of the slow btree insert paths. This has shown to cause
> > journal recovery ordering problems in the event of an untimely
> > crash.
> > 
> > For example, consider if a key is inserted into index 0 of a write
> > buffer, the active write buffer switches to index 1, the key is
> > deleted in index 1, and then index 0 is flushed. If the original key
> > is rejournaled in the btree update from the index 0 flush, the (now
> > deleted) key is journaled in a seq buffer ahead of the latest
> > version of key (which was journaled when the key was deleted in
> > index 1). If the fs crashes while this is still observable in the
> > log, recovery sees the key from the btree update after the delete
> > key from the write buffer insert, which is the incorrect order. This
> > problem is occasionally reproduced by generic/388 and generally
> > manifests as one or more backpointer entry inconsistencies.
> > 
> > To avoid this problem, never rejournal write buffered key updates to
> > the associated btree. Instead, use prejournaled key updates to pass
> > the journal seq of the write buffer insert down to the btree insert,
> > which updates the btree leaf pin to reflect the seq of the key.
> > 
> > Note that tracking the seq is required instead of just using
> > NOJOURNAL here because otherwise we lose protection of the write
> > buffer pin when the buffer is flushed, which means the key can fall
> > off the tail of the on-disk journal before the btree leaf is flushed
> > and lead to similar recovery inconsistencies.
> 
> Fairly simple, and it looks like this fixes all the backpointers errors
> - nice, applied.
> 

Thanks!

> So journal replay does something similar, where we're doing a btree
> update for a key that already lives in a particular journal entry and
> the dependency needs to be recorded. Could you have a quick look to see
> if there's any cleanups/unification worth doing there? Or if not, just
> leave a note.
> 

Interesting. IIRC, one of the purposes of generic/388 was to test
crashes during recovery itself, so we should have at least some coverage
in that area.

Can you point to an example or area of code where this occurs with
recovery? (Or do you mean recovery in general has this rejournaling
behavior?)

> Also, did you look at adding the tracepoint we talked about, for when
> we're doing btree updates that are already at the end of the journal and
> immediately need to be flushed?
> 

Not yet.. I wanted to get this worked out first and then ran into some
sort of contention issue during some fstests that I had to dig into
before I got a chance to think about it. I'll send a separate mail on
that one and will probably have to pick up both once I'm back..

Brian


      reply	other threads:[~2023-07-21 13:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-19 12:53 [PATCH 0/5] bcachefs: write buffer journaling fixes Brian Foster
2023-07-19 12:53 ` [PATCH 1/5] bcachefs: remove duplicate code between backpointer update paths Brian Foster
2023-07-19 12:53 ` [PATCH 2/5] bcachefs: remove unnecessary btree_insert_key_leaf() wrapper Brian Foster
2023-07-19 12:53 ` [PATCH 3/5] bcachefs: fold bch2_trans_update_by_path_trace() into callers Brian Foster
2023-07-19 12:53 ` [PATCH 4/5] bcachefs: support btree updates of prejournaled keys Brian Foster
2023-07-19 12:53 ` [PATCH 5/5] bcachefs: use prejournaled key updates for write buffer flushes Brian Foster
2023-07-20 21:26   ` Kent Overstreet
2023-07-21 13:09     ` Brian Foster [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZLqDmGvvkc8ZtBTa@bfoster \
    --to=bfoster@redhat.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox