From: Brian Foster <bfoster@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: Josef Bacik <jbacik@fb.com>, Jan Kara <jack@suse.cz>,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Mon, 5 Jan 2015 14:02:44 -0500
Message-ID: <20150105190243.GA51005@bfoster.bfoster>
In-Reply-To: <alpine.DEB.2.00.1501050819500.20175@cobra.newdream.net>
On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > Hello,
> > > >
> > > > We have been doing pretty well at populating xfstests with loads of
> > > > tests to catch regressions and validate we're all working properly.
> > > > One thing that has been lacking is a good way to verify file system
> > > > integrity after a power fail. This is a core part of what file
> > > > systems are supposed to provide but it is probably the least tested
> > > > aspect. We have dm-flakey tests in xfstests to test fsync
> > > > correctness, but these tests do not catch the random horrible things
> > > > that can go wrong. We are still finding horrible scary things that
> > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > for.
> > > >
> > > > I have been working on an idea to do this better; some may have seen
> > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > thanks to discussions with Zach Brown. Obviously there will be a
> > > > lot changing in this area between now and March, but it
> > > > would be good to have everybody in the room talking about what they
> > > > would need to build a good and deterministic test to make sure we're
> > > > always giving a consistent file system and to make sure our fsync()
> > > > handling is working properly. Thanks,
> > > I agree we are lacking in testing this aspect. I just don't see much
> > > material for discussion there unless we have something more tangible:
> > > once we have an implementation, we can talk about its pros and cons,
> > > what still needs doing, etc.
> > >
> >
> > Right, that's what I was getting at. I have a solution and have sent it around,
> > but there don't seem to be many people interested in commenting on it.
> > I figure one of two things will happen:
> >
> > 1) My solution will go in before LSF, in which case YAY my job is done and
> > this is more of an [ATTEND] than a [TOPIC], or
> >
> > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > how we can integrate it into xfstests, future features, other areas we could
> > test etc.
> >
> > Maybe not a full-blown slot, but combined with an overall testing slot, or hell,
> > just a quick lightning talk. Thanks,
>
> I have a related topic that may make sense to fit into any discussion
> about this. Twice recently we've run into trouble using newish or less
> common (combinations of) syscalls.
>
> The first instance was with the use of sync_file_range to try to
> control/limit the amount of dirty data in the page cache. This, possibly
> in combination with posix_fadvise(DONTNEED), managed to break the
> writeback sequence in XFS and led to data corruption after power loss.
>
Was there a report or any other details on this one? In particular, I'm
wondering if this is related to the problem exposed by xfstests test
xfs/053...
Brian
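For readers less familiar with the pattern Sage describes: a common way to bound dirty page cache during a large sequential write is a "write-behind" loop that starts writeback of each chunk with sync_file_range(), waits on the previous chunk, and then drops it with posix_fadvise(POSIX_FADV_DONTNEED). The C sketch below is only an illustration under stated assumptions (Linux and glibc; the chunk size, the two-chunk window, and the helper names write_behind() and file_size() are hypothetical). It is not the code Sage's team actually ran, and it does not reproduce the XFS problem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: current size of the file behind fd, or -1. */
static off_t file_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : -1;
}

/*
 * Hypothetical write-behind loop: write nchunks copies of buf (chunk bytes
 * each), keeping roughly the last two chunks in the page cache.
 */
static int write_behind(int fd, const char *buf, size_t chunk, int nchunks)
{
    assert(chunk > 0 && nchunks > 0);
    for (int i = 0; i < nchunks; i++) {
        off_t off = (off_t)i * (off_t)chunk;
        size_t done = 0;
        while (done < chunk) {  /* pwrite() may write less than asked */
            ssize_t n = pwrite(fd, buf + done, chunk - done, off + (off_t)done);
            if (n <= 0)
                return -1;
            done += (size_t)n;
        }
        /* Start asynchronous writeback of the chunk just written. */
        if (sync_file_range(fd, off, (off_t)chunk, SYNC_FILE_RANGE_WRITE) < 0)
            return -1;
        if (i > 0) {
            off_t prev = off - (off_t)chunk;
            /* Wait until the previous chunk's pages have been written back... */
            if (sync_file_range(fd, prev, (off_t)chunk,
                                SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER) < 0)
                return -1;
            /* ...then ask the kernel to drop those now-clean pages. */
            (void)posix_fadvise(fd, prev, (off_t)chunk, POSIX_FADV_DONTNEED);
        }
    }
    /*
     * sync_file_range() writes back data pages only: it does not commit
     * file metadata or flush the device's volatile write cache, so it gives
     * no crash-safety guarantee. fdatasync() is still required for that.
     */
    return fdatasync(fd);
}
```

The design point worth noting is that the loop only controls how much dirty data accumulates; durability across a power failure rests entirely on the final fdatasync(), which is exactly the kind of combination a power-fail test harness would need to exercise.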
> The other issue we saw was just a general raft of FIEMAP bugs over the
> last year or two. We saw cases where even after fsync a fiemap result
> would not include all extents, and (not unexpectedly) lots of corner cases
> in several file systems, e.g., around partial blocks at end of file. (As
> far as I know everything we saw is resolved in current kernels.)
>
> I'm not so concerned with these specific bugs, but worried that we
> (perhaps naively) expected them to be pretty safe. Perhaps for FIEMAP
> this is a general case where a newish syscall/ioctl should be tested
> carefully with our workloads before being relied upon, and we could have
> worked to make sure e.g. xfstests has appropriate tests. For power fail
> testing in particular, though, right now it isn't clear who is testing
> what under what workloads, so the only really "safe" approach is to stick
> to whatever syscall combinations we think the rest of the world is using,
> or make sure we test ourselves.
>
> As things stand now, the other devs are loath to touch any remotely exotic
> fs call, but that hardly seems ideal. Hopefully a common framework for
> powerfail testing can improve on this. Perhaps there are other ways we can
> make it easier to tell what is (well) tested, and conversely ensure that
> those tests are well-aligned with what real users are doing...
>
> sage
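Sage mentions FIEMAP results that omitted extents even after fsync. A test for that property could look roughly like the C sketch below: write a file without holes, fsync() it, then sum the extent lengths returned by the FS_IOC_FIEMAP ioctl (with FIEMAP_FLAG_SYNC set) and compare against the file size. This is a coarse, hypothetical check under stated assumptions (Linux with kernel UAPI headers; the helper names fiemap_mapped_bytes() and fully_mapped_after_fsync() and the batch size are made up for illustration), not an existing xfstests case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* flags */
#include <linux/fs.h>       /* FS_IOC_FIEMAP */

#define EXTENT_BATCH 32     /* hypothetical: extents fetched per ioctl */

/*
 * Sum the lengths of all extents FIEMAP reports for fd. Returns 0 on
 * success or -errno on failure (e.g. -EOPNOTSUPP on a filesystem
 * without FIEMAP support).
 */
static int fiemap_mapped_bytes(int fd, uint64_t *mapped)
{
    size_t sz = sizeof(struct fiemap) +
                EXTENT_BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(sz);
    uint64_t start = 0;

    if (!fm)
        return -ENOMEM;
    *mapped = 0;
    for (;;) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC;   /* flush dirty data before mapping */
        fm->fm_extent_count = EXTENT_BATCH;
        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            int err = errno;
            free(fm);
            return -err;
        }
        if (fm->fm_mapped_extents == 0)
            break;                         /* nothing more to report */
        struct fiemap_extent *last = NULL;
        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            last = &fm->fm_extents[i];
            *mapped += last->fe_length;
        }
        assert(last != NULL);
        if (last->fe_flags & FIEMAP_EXTENT_LAST)
            break;
        uint64_t next = last->fe_logical + last->fe_length;
        if (next <= start)
            break;                         /* defensive: no forward progress */
        start = next;
    }
    free(fm);
    return 0;
}

/*
 * Hypothetical check for a hole-free file: after fsync(), the total
 * length of reported extents should be at least the file size.
 * Returns 1 if so, 0 if extents appear to be missing, or -errno.
 */
static int fully_mapped_after_fsync(int fd, off_t size)
{
    uint64_t mapped;
    if (fsync(fd) < 0)
        return -errno;
    int r = fiemap_mapped_bytes(fd, &mapped);
    if (r < 0)
        return r;
    /* Extents may extend past EOF to a block boundary, hence >= not ==. */
    return mapped >= (uint64_t)size;
}
```

Summing lengths is deliberately crude: it catches the "extents missing after fsync" symptom but not, say, wrong offsets or the partial-block corner cases at end of file that Sage also mentions, which would need a per-range comparison.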
Thread overview: 18+ messages
2014-12-08 22:11 [LSF/MM TOPIC] Working towards better power fail testing Josef Bacik
2014-12-10 11:27 ` [Lsf-pc] " Jan Kara
2014-12-10 15:09 ` Josef Bacik
2015-01-05 18:34 ` Sage Weil
2015-01-05 19:02 ` Brian Foster [this message]
2015-01-05 19:13 ` Sage Weil
2015-01-05 19:33 ` Brian Foster
2015-01-05 21:17 ` Jan Kara
2015-01-05 21:47 ` Dave Chinner
2015-01-05 22:26 ` Sage Weil
2015-01-05 23:27 ` Dave Chinner
2015-01-06 17:37 ` Sage Weil
2015-01-06 8:53 ` Jan Kara
2015-01-06 16:39 ` Josef Bacik
2015-01-06 22:07 ` Dave Chinner
2015-01-07 10:10 ` Jan Kara
2015-01-13 17:05 ` Dmitry Monakhov
2015-01-13 17:17 ` Josef Bacik