From: Josef Bacik <jbacik@fb.com>
To: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>
Cc: Sage Weil <sage@newdream.net>,
	<lsf-pc@lists.linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Tue, 6 Jan 2015 11:39:27 -0500
Message-ID: <54AC0FBF.3020104@fb.com>
In-Reply-To: <20150106085347.GA15729@quack.suse.cz>

On 01/06/2015 03:53 AM, Jan Kara wrote:
> On Tue 06-01-15 08:47:55, Dave Chinner wrote:
>>> As things stand now the other devs are loath to touch any remotely exotic
>>> fs call, but that hardly seems ideal.  Hopefully a common framework for
>>> powerfail testing can improve on this.  Perhaps there are other ways we
>>> make it easier to tell what is (well) tested, and conversely ensure that
>>> those tests are well-aligned with what real users are doing...
>>
>> We don't actually need power failure (or even device failure)
>> infrastructure to test data integrity on failure. Filesystems just
>> need a shutdown method that stops any IO from being issued once the
>> shutdown flag is set. XFS has this and it's used by xfstests via the
>> "godown" utility to shut the filesystem down in various
>> circumstances. We've been using this for data integrity and log
>> recovery testing in xfstests for many years.
>>
>> Hence we know if the device behaves correctly w.r.t cache flushes
>> and FUA then the filesystem will behave correctly on power loss. We
>> don't need a device power fail simulator to tell us violating
>> fundamental architectural assumptions will corrupt filesystems....
>    I think that fs ioctl cannot easily simulate the situation where
> on-device volatile caches aren't properly flushed in all the necessary
> cases (we had bugs like this in ext3/4 in the past which were hit by real
> users).
>

Agreed, my dm thing was meant to expose problems where we do not wait on
IO properly before writing our super, a problem we've had at least twice
so far.  I wanted something that was nice and simple and would quickly
expose these kinds of bugs.
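
To make that class of bug concrete, here is a toy Python model.  It is
only a sketch and has nothing to do with the actual dm target code:
ToyDevice, commit() and is_consistent() are made-up names, and the
"device" is just a dict.  The assumption is that a submitted write is
lost on a crash unless somebody waited for it to complete.

```python
class ToyDevice:
    """Hypothetical block device used only for illustration."""

    def __init__(self):
        self.durable = {}    # block number -> data that survives a crash
        self.inflight = []   # submitted writes that have not completed yet

    def submit(self, block, data):
        # Queue a write without waiting for it.
        self.inflight.append((block, data))

    def wait(self):
        # Wait for every queued write to complete.
        for block, data in self.inflight:
            self.durable[block] = data
        self.inflight.clear()

    def write_and_wait(self, block, data):
        # Write a single block and wait for just that write.
        self.durable[block] = data

    def crash(self):
        # Anything still in flight is lost; return what is left on "disk".
        self.inflight.clear()
        return dict(self.durable)


def commit(dev, wait_for_data):
    """Write a new root at block 10, then a super at block 0 pointing to it."""
    dev.submit(10, "new tree root")
    if wait_for_data:
        dev.wait()           # the step the buggy path forgets
    dev.write_and_wait(0, "root=10")


def is_consistent(image):
    """A super is only valid if the block it points to made it to disk."""
    sb = image.get(0)
    if sb is None:
        return True          # no super written yet, nothing to break
    root = int(sb.split("=")[1])
    return root in image


for wait_for_data in (False, True):
    dev = ToyDevice()
    commit(dev, wait_for_data)
    print("waited:", wait_for_data, "consistent:", is_consistent(dev.crash()))
```

The point of the model is just that the super lands on disk while the
block it references does not, which is the failure the target is meant
to make easy to hit.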

> I also think that simulating the device failure in a different layer is
> simpler than checking for superblock flag in all the places where the
> filesystem submits IO (e.g. ext4 doesn't have dedicated buffer layer like
> xfs has and we rely on flusher thread to flush committed metadata to final
> location on disk so that writeback path completely avoids ext4 code - it's
> a generic writeback of the block device mapping). So I like the solution
> with the dm target more than a fs ioctl although I agree that it's more
> clumsy from the xfstests perspective.
>

I'm working on adding support to the xfstests fsx for emitting the proper
dm messages when it does an fsync, so we can easily build a test that
stresses fsync in all the horrible ways that fsx works.  Building tests
around the dm target I've written is pretty simple; you just do something
like:

create device
mkfs device
mark the mkfs in the log
mount device
do your operations
unmount
replay log in whichever way you want and verify the contents

The replay is handled by the library and some helper functions in
xfstests, so it's no more awkward than what we do with dm-flakey, and it
gives us a bit more reproducibility and lets us check more esoteric
failure conditions.
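
The mark-and-replay flow above can be sketched in a few lines of Python.
This is a toy model under stated assumptions, not the dm target or the
xfstests helpers: WriteLog and its methods are hypothetical names, the
log lives in memory, and a "device image" is a dict of block number to
data.

```python
class WriteLog:
    """Hypothetical in-memory log of writes and named marks."""

    def __init__(self):
        # Each entry is ("write", block, data) or ("mark", name).
        self.entries = []

    def write(self, block, data):
        self.entries.append(("write", block, data))

    def mark(self, name):
        self.entries.append(("mark", name))

    def replay_to_mark(self, name):
        """Rebuild the device image as it looked when `name` was logged."""
        image = {}
        for entry in self.entries:
            if entry[0] == "write":
                _, block, data = entry
                image[block] = data
            elif entry[0] == "mark" and entry[1] == name:
                return image
        raise KeyError("no such mark: " + name)


log = WriteLog()

log.write(0, "empty fs")        # mkfs device
log.mark("mkfs")                # mark the mkfs in the log

log.write(1, "file A")          # do your operations
log.mark("fsync-A")             # e.g. a mark emitted at fsync time
log.write(1, "file A, rewritten")
log.write(2, "file B")
log.mark("unmount")             # unmount

# Replay to whichever point you want and verify the contents.
at_fsync = log.replay_to_mark("fsync-A")
assert at_fsync[1] == "file A"  # fsynced data must be there
assert 2 not in at_fsync        # later writes must not leak in
```

Because the log is replayed in order, the same mark always yields the
same image, which is where the reproducibility comes from.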

Like Jan says, we all do things differently; we are all our own little
snowflakes.  I feel like a dm target is a nice solution where we can
impose a certain set of rules in very little code and all agree that
it's correct, and then build tests around that.  Then our current fs'es
will be well tested and any new fs'es will be equally well tested, all
without having to add fs-specific code that could be buggy.  Thanks,

Josef
