All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: fstests generic/441 -- occasional bcachefs failure
Date: Fri, 27 Jan 2023 09:50:05 -0500	[thread overview]
Message-ID: <Y9PkndpTQcHrZ+im@bfoster> (raw)
In-Reply-To: <Y9KXaSkuMZM663Vi@moria.home.lan>

On Thu, Jan 26, 2023 at 10:08:25AM -0500, Kent Overstreet wrote:
> On Wed, Jan 25, 2023 at 10:45:24AM -0500, Brian Foster wrote:
> > Hi Kent, list,
> > 
> > I've noticed occasional failures of generic/441 on bcachefs that look
> > something like this:
> > 
> > generic/441 11s ... - output mismatch (see /root/xfstests-dev/results//generic/441.out.bad)
> >     --- tests/generic/441.out   2022-10-11 14:20:36.986202162 -0400
> >     +++ /root/xfstests-dev/results//generic/441.out.bad 2023-01-24 13:33:54.977729904 -0500
> >     @@ -1,3 +1,3 @@
> >      QA output created by 441
> >      Format and mount
> >     -Test passed!
> >     +First fsync after reopen of fd[0] failed: Input/output error
> >     ...
> >     (Run 'diff -u /root/xfstests-dev/tests/generic/441.out /root/xfstests-dev/results//generic/441.out.bad'  to see the entire diff)
> > 
> > This test covers fsync error reporting across multiple fds. What it does
> > in a nutshell is open a bunch of fds, write to them and fsync them, and
> > then repeat that a couple times over after switching to an error table
> > (i.e. dm-error) and back to a working table, verifying error reporting
> > expectations at each step along the way.
> > 
> > The error above is associated with an fsync failure after switching from
> > an error table back to a working table, and essentially occurs because
> > the filesystem has shutdown and the bcachefs inode has a journal
> > sequence number ahead of what the journal shows has been flushed to
> > disk. That makes sense in general because even though the vfs error
> > reporting might not expect an error, the filesystem has been shutdown
> > and fsync has work to do. I've been digging into it a bit regardless to
> > try and understand why the behavior seems transient..
> > 
> > What I'm seeing looks like a matter of timing at the point the test
> > switches to an error table and issues the first fsync (where failure is
> > expected). IIUC, what I think is happening here in the scenario where
> > the test passes is something like this:
> 
> Hey Brian - my understanding of this bug was that it was even simpler.
> 
> As I understand it, the decreed semantics of fsync by the Powers That Be
> are that it should return a given error _once_  - if a subsequent fsync
> returns an error, then there must have been another write that errored.
> 

Yep, that's my understanding as well. This particular test is just
checking that the error reporting spreads across distinct fds for the
same underlying file. I.e., so when writeback fails and sets an error on
the mapping, each open fd/file should see an instance of that error on
fsync (but presumably not repeatedly unless pages are written and failed
again).

> So we just need to plumb our errors from the journal through the
> errseq_t mechanism. The tricky bit is that bch2_fsync() doesn't record
> or check if it's actually doing a journal flush or if the journal flush
> was already done.
> 

That's an interesting idea. I'd have to poke around the errseq stuff and
think about it a bit..

Something else that occurred to me while looking further at this is we
can also avoid the error in this case fairly easily by bailing out of
bch2_fsync() if page writeback fails, as opposed to the unconditional
flush -> sync meta -> flush log sequence that returns the first error
anyways. That would prevent marking the inode with a new sequence number
when I/Os are obviously failing. The caveat is that the test still
fails, now with a "Read-only file system" error instead of EIO, because
the filesystem is shutdown by the time the vfs write inode path actually
runs.

All in all ISTM the shutdown is the fundamental issue wrt to this test.
I'm not sure it's worth complicating the log flush error handling with
errseq and whatnot just to filter out certain errors when the fs has
seen a catastrophic failure/shutdown (also agree with your followup that
isn't the most critical bug in the world anyways).

Either way, I am trying to grok the commit/writeback/journaling path a
bit better here to try and understand why the fs shuts down in this test
whereas other fs' don't seem to. One thing I'm noticing is that this
whole path is rather asynchronous, making it a bit hard to follow. Any
tips on the best way to work through the big picture steps involved in
the path from a trans commit, through journaling, to eventual btree
writeback?

Thanks for the feedback..

Brian

> So I think we'd need to add another sequence number to
> bch_inode_info to track the sequence that was flushed, and then only
> call bch2_journal_flush_seq() if flushed_seq !=
> journal_seq_of_last_update, and then to make this work correctly when an
> inode has been evicted it might need to go in struct bch_inode, stored
> in the btree, as well.
> 


  parent reply	other threads:[~2023-01-27 14:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 15:45 fstests generic/441 -- occasional bcachefs failure Brian Foster
2023-01-26 15:08 ` Kent Overstreet
2023-01-27  7:21   ` Kent Overstreet
2023-01-27 14:50   ` Brian Foster [this message]
2023-01-30 17:06     ` Kent Overstreet
2023-01-31 16:04       ` Brian Foster
2023-02-01 14:34         ` Kent Overstreet
2023-02-02 15:50           ` Brian Foster
2023-02-02 17:09             ` Freezing (was: Re: fstests generic/441 -- occasional bcachefs failure) Kent Overstreet
2023-02-02 20:04               ` Brian Foster
2023-02-02 22:39                 ` Kent Overstreet
2023-02-03  0:51               ` Dave Chinner
2023-02-04  0:35                 ` Kent Overstreet
2023-02-07  0:03                   ` Dave Chinner
2023-02-16 20:04                     ` Eric Wheeler
2023-02-20 22:19                       ` Dave Chinner
2023-02-20 23:23                         ` Kent Overstreet
2023-02-02 22:56         ` fstests generic/441 -- occasional bcachefs failure Kent Overstreet
2023-02-04 21:33           ` Brian Foster
2023-02-04 22:15             ` Kent Overstreet
2023-02-06 15:33               ` Brian Foster
2023-02-06 22:18                 ` Kent Overstreet
2023-02-09 12:57                   ` Brian Foster
2023-02-09 14:58                     ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9PkndpTQcHrZ+im@bfoster \
    --to=bfoster@redhat.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.