Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing

From: Brian Foster <bfoster@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: Josef Bacik <jbacik@fb.com>, Jan Kara <jack@suse.cz>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Mon, 5 Jan 2015 14:02:44 -0500
Message-ID: <20150105190243.GA51005@bfoster.bfoster>
In-Reply-To: <alpine.DEB.2.00.1501050819500.20175@cobra.newdream.net>

On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> On Wed, 10 Dec 2014, Josef Bacik wrote:
> > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > We have been doing pretty well at populating xfstests with loads of
> > > > tests to catch regressions and validate we're all working properly.
> > > > One thing that has been lacking is a good way to verify file system
> > > > integrity after a power fail.  This is a core part of what file
> > > > systems are supposed to provide but it is probably the least tested
> > > > aspect.  We have dm-flakey tests in xfstests to test fsync
> > > > correctness, but these tests do not catch the random horrible things
> > > > that can go wrong.  We are still finding horrible scary things that
> > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > for.
> > > > 
> > > > I have been working on an idea to do this better, some may have seen
> > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > thanks to discussions with Zach Brown.  Obviously there will be a
> > > > lot changing in this area in the time between now and March but it
> > > > would be good to have everybody in the room talking about what they
> > > > would need to build a good and deterministic test to make sure we're
> > > > always giving a consistent file system and to make sure our fsync()
> > > > handling is working properly.  Thanks,
> > >    I agree we are lacking in testing this aspect. I just don't see much
> > > material for discussion there unless we have something more tangible:
> > > once we have some implementation, we can talk about its pros and cons,
> > > what still needs doing, etc.
> > > 
> > 
> > Right, that's what I was getting at.  I have a solution and have sent it
> > around, but there don't seem to be many people interested in commenting on
> > it.  I figure one of two things will happen:
> > 
> > 1) My solution will go in before LSF, in which case YAY my job is done and
> > this is more of an [ATTEND] than a [TOPIC], or
> > 
> > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > how we can integrate it into xfstests, future features, other areas we could
> > test etc.
> > 
> > Maybe not a full-blown slot, but combined with an overall testing slot or,
> > hell, just a quick lightning talk.  Thanks,
> 
> I have a related topic that may make sense to fit into any discussion 
> about this. Twice recently we've run into trouble using newish or less 
> common (combinations of) syscalls.
> 
> The first instance was with the use of sync_file_range to try to 
> control/limit the amount of dirty data in the page cache.  This, possibly 
> in combination with posix_fadvise(DONTNEED), managed to break the 
> writeback sequence in XFS and led to data corruption after power loss.
> 

Was there a report or any other details on this one? In particular, I'm
wondering if this is related to the problem exposed by xfstests test
xfs/053...

Brian
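
For readers unfamiliar with the syscall combination discussed above, here is a minimal sketch of the general pattern: start writeback on each chunk with sync_file_range(), wait for the previous chunk, then drop it from the page cache with posix_fadvise(POSIX_FADV_DONTNEED). This is not the code from the report in question. The helper name write_bounded_dirty, the 64 KiB CHUNK size, and the fill byte are assumptions made for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024)  /* illustrative window size (assumption) */

/*
 * Hypothetical helper: write `total` bytes to `path` while keeping roughly
 * one chunk of dirty page cache outstanding.
 * Returns the final file size, or -1 on error.
 */
off_t write_bounded_dirty(const char *path, size_t total)
{
    static char buf[CHUNK];
    off_t off = 0, prev_off = 0;
    size_t prev_len = 0;
    struct stat st;
    int fd;

    assert(path != NULL);
    memset(buf, 0xab, sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while ((size_t)off < total) {
        size_t want = total - (size_t)off;
        ssize_t n;

        if (want > sizeof(buf))
            want = sizeof(buf);
        n = write(fd, buf, want);
        if (n <= 0)
            goto fail;

        /* Kick off asynchronous writeback of the range just written. */
        if (sync_file_range(fd, off, n, SYNC_FILE_RANGE_WRITE) != 0)
            goto fail;

        if (prev_len > 0) {
            /* Wait for the previous chunk to reach the device queue ... */
            if (sync_file_range(fd, prev_off, prev_len,
                                SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER) != 0)
                goto fail;
            /* ... then ask the kernel to drop it from the page cache. */
            if (posix_fadvise(fd, prev_off, prev_len,
                              POSIX_FADV_DONTNEED) != 0)
                goto fail;
        }
        prev_off = off;
        prev_len = (size_t)n;
        off += n;
    }

    /*
     * sync_file_range() does not write file metadata or flush the device
     * write cache, so durability still depends on fsync().
     */
    if (fsync(fd) != 0 || fstat(fd, &st) != 0)
        goto fail;
    close(fd);
    return st.st_size;

fail:
    close(fd);
    return -1;
}
```

A power-fail test for this pattern would run the loop, cut power (or simulate it) at an arbitrary point, and then check that everything covered by a completed fsync() is intact after recovery.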

> The other issue we saw was just a general raft of FIEMAP bugs over the 
> last year or two. We saw cases where even after fsync a fiemap result 
> would not include all extents, and (not unexpectedly) lots of corner cases 
> in several file systems, e.g., around partial blocks at end of file.  (As 
> far as I know everything we saw is resolved in current kernels.)
> 
> I'm not so concerned with these specific bugs, but worried that we 
> (perhaps naively) expected them to be pretty safe.  Perhaps for FIEMAP 
> this is a general case where a newish syscall/ioctl should be tested 
> carefully with our workloads before being relied upon, and we could have 
> worked to make sure e.g. xfstests has appropriate tests.  For power fail 
> testing in particular, though, right now it isn't clear who is testing 
> what under what workloads, so the only really "safe" approach is to stick 
> to whatever syscall combinations we think the rest of the world is using, 
> or make sure we test ourselves.
> 
> As things stand now the other devs are loath to touch any remotely exotic 
> fs call, but that hardly seems ideal.  Hopefully a common framework for 
> powerfail testing can improve on this.  Perhaps there are other ways we 
> can make it easier to tell what is (well) tested, and conversely ensure 
> that those tests are well-aligned with what real users are doing...
> 
> sage
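
The FIEMAP concern quoted above (a fiemap result that misses extents even after fsync) suggests a simple consistency check. The following is a rough sketch under stated assumptions, not an existing xfstests test: it walks a file's extents with FS_IOC_FIEMAP and sums their lengths so a caller can compare the total against the file size. The helper name fiemap_covered_bytes and the BATCH size are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define BATCH 32  /* extents fetched per ioctl call (assumption) */

/*
 * Hypothetical helper: sum the lengths of all extents FIEMAP reports for fd.
 * Returns the number of mapped bytes, or -1 with errno set if the ioctl
 * fails (for example, on filesystems that do not implement FIEMAP).
 */
long long fiemap_covered_bytes(int fd)
{
    size_t sz = sizeof(struct fiemap) + BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(sz);
    __u64 start = 0, covered = 0;
    int last = 0;

    if (!fm)
        return -1;

    while (!last) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC;   /* flush dirty data before mapping */
        fm->fm_extent_count = BATCH;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            int saved = errno;
            free(fm);
            errno = saved;
            return -1;
        }
        if (fm->fm_mapped_extents == 0)
            break;                          /* nothing more to report */

        for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
            struct fiemap_extent *e = &fm->fm_extents[i];
            __u64 next = e->fe_logical + e->fe_length;

            covered += e->fe_length;
            if (e->fe_flags & FIEMAP_EXTENT_LAST)
                last = 1;
            if (next <= start) {            /* guard against no forward progress */
                last = 1;
                break;
            }
            start = next;
        }
    }

    free(fm);
    return (long long)covered;
}
```

For a fully written, non-sparse file, a caller would expect the returned value to be at least the file size once fsync() has completed; a smaller value would indicate the kind of missing extent described above. Sparse files and filesystem-specific extent flags would need extra handling that this sketch leaves out.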
