From: Dave Chinner <david@fromorbit.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
linux-xfs@vger.kernel.org,
"Spraul Manfred (XC/QMM21-CT)" <Manfred.Spraul@de.bosch.com>
Subject: Re: Metadata CRC error detected at xfs_dir3_block_read_verify+0x9e/0xc0 [xfs], xfs_dir3_block block 0x86f58
Date: Thu, 17 Mar 2022 19:24:11 +1100 [thread overview]
Message-ID: <20220317082411.GA3927073@dread.disaster.area> (raw)
In-Reply-To: <21c13283-2a9f-4978-25e4-228e44ab74e6@colorfullife.com>
On Thu, Mar 17, 2022 at 07:49:02AM +0100, Manfred Spraul wrote:
> Hi Dave,
>
> [+Ted as the topic also applies to ext4]
>
> On 3/17/22 04:08, Dave Chinner wrote:
> > On Thu, Mar 17, 2022 at 01:47:05PM +1100, Dave Chinner wrote:
> > > On Wed, Mar 16, 2022 at 09:55:04AM +0100, Manfred Spraul wrote:
> > > > Hi Dave,
> > > >
> > > > On 3/14/22 16:18, Manfred Spraul wrote:
> > > >
> > > > But:
> > > >
> > > > I've checked the eMMC specification, and the spec allows torn writes to
> > > > happen:
> > > Yes, most storage only guarantees that sector writes are atomic and
> > > so multi-sector writes have no guarantees of being written
> > > atomically. IOWs, all storage technologies that currently exist are
> > > allowed to tear multi-sector writes.
> > >
> > > However, FUA writes are guaranteed to be whole on persistent storage
> > > regardless of size when the hardware signals completion. And any
> > > write that the hardware has signalled as complete before a cache
> > > flush is received is also guaranteed to be whole on persistent
> > > storage when the cache flush is signalled as complete by the
> > > hardware. These mechanisms provide protection against torn writes.
>
> My plan was to create a replay application that randomly creates disc images
> allowed by the writeback_cache_control documentation.
>
> https://www.kernel.org/doc/html/latest/block/writeback_cache_control.html
>
> And then check that the filesystem behaves as expected/defined.
We already have a tool that exercises stepwise flush/fua aware
write recovery for filesystem testing: dm-logwrites was written and
integrated into fstests years ago (2016?) by Josef Bacik for testing
btrfs recovery, but it is a generic solution that all filesystems
can use to test failure recovery....
See, for example, common/dmlogwrites and tests/generic/482 - g/482
uses fsstress to randomly modify the filesystem while dm-logwrites
records all the writes made by the filesystem. It then replays them
one flush/fua at a time, mounting the filesystem to ensure that it
can recover the filesystem, then runs filesystem checkers to ensure
that the filesystem does not have any corrupt metadata. Then it
replays to the next flush/fua and repeats.
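The replay loop can be expressed compactly as a sketch (Python; the `LogEntry` fields and the `fsck` callback are hypothetical in-memory stand-ins for dm-logwrites and the real mount/repair steps):

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    lba: int
    data: object = None     # None for a pure flush/fua barrier entry
    flush: bool = False
    fua: bool = False

def replay_and_check(log, fsck):
    """Replay recorded writes one flush/fua boundary at a time.
    fsck(image) stands in for mount + log recovery + repair check:
    it must succeed at every boundary, because everything up to a
    completed flush/fua is guaranteed on stable storage."""
    image = {}
    checkpoints = 0
    for e in log:
        if e.data is not None:
            image[e.lba] = e.data
        if e.flush or e.fua:
            assert fsck(dict(image)), f"inconsistent at checkpoint {checkpoints}"
            checkpoints += 1
    return checkpoints

# Trivial "filesystem" invariant: block 0 holds a count that must
# equal the number of data blocks present.
def fsck(image):
    return image.get(0, 0) == sum(1 for lba in image if lba != 0)

log = [
    LogEntry(1, "data"),
    LogEntry(0, 1),                  # "superblock": 1 data block
    LogEntry(0, flush=True),
    LogEntry(2, "data"),
    LogEntry(0, 2),                  # "superblock": 2 data blocks
    LogEntry(0, fua=True),
]
print(replay_and_check(log, fsck))   # prints 2
```

The real thing operates on block devices via the log-writes dm target rather than an in-memory dict, but the loop structure is the same: replay to a barrier, check, repeat.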
tools/dm-logwrite-replay provides a script and documents the
methodology to run step by step through replay of g/482 failures to
be able to reliably reproduce and diagnose the cause of the failure.
There's no need to re-invent the wheel if we've already got a
perfectly good one...
> > > > Is my understanding correct that XFS supports neither eMMC nor NVM devices?
> > > > (unless there is a battery backup that exceeds the guarantees from the spec)
> > > Incorrect.
> > >
> > > They are supported just fine because flush/FUA semantics provide
> > > guarantees against torn writes in normal operation. IOWs, torn
> > > writes are something that almost *never* happen in real life, even
> > > when power fails suddenly. Despite this, XFS can detect it has
> > > occurred (because broken storage is all too common!), and if it
> > > can't recover automatically, it will shut down and ask the user to
> > > correct the problem.
>
> So for xfs the behavior should be:
>
> - without torn writes: Mount always successful, no errors when accessing the
> content.
Yes.
Of course, there are software bugs, so mounts, recovery and
subsequent repair testing can still fail.
> - with torn writes: There may be errors that are detected only at
> runtime. The errors may ultimately cause a filesystem shutdown.
Yes, and they may even prevent the filesystem from being mounted
because recovery trips over them (e.g. processing pending unlinked
inodes or replaying incomplete intents).
> (commented dmesg is attached)
>
> The application I have in mind are embedded systems.
> I.e. there is no user that can correct something, the recovery strategy must
> be included in the design.
Good luck with that - storage hardware fails in ways that no
existing filesystem can automatically recover from 100% of the time.
And very few even attempt to do so because it is largely an
impossible requirement to fulfil. Torn writes are just the tip of
the iceberg....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
2022-03-13 15:47 Metadata CRC error detected at xfs_dir3_block_read_verify+0x9e/0xc0 [xfs], xfs_dir3_block block 0x86f58 Manfred Spraul
2022-03-13 22:46 ` Dave Chinner
2022-03-14 15:18 ` Manfred Spraul
2022-03-16 8:55 ` Manfred Spraul
2022-03-17 2:47 ` Dave Chinner
2022-03-17 3:08 ` Dave Chinner
2022-03-17 6:49 ` Manfred Spraul
2022-03-17 8:24 ` Dave Chinner [this message]
2022-03-17 16:09 ` Manfred Spraul
2022-03-17 14:50 ` Theodore Ts'o
2022-03-17 16:03 ` Manfred Spraul