From: Brian Foster <bfoster@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: Josef Bacik <jbacik@fb.com>, Jan Kara <jack@suse.cz>,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Mon, 5 Jan 2015 14:33:39 -0500
Message-ID: <20150105193338.GB51005@bfoster.bfoster>
In-Reply-To: <alpine.DEB.2.00.1501051111090.5967@cobra.newdream.net>

On Mon, Jan 05, 2015 at 11:13:28AM -0800, Sage Weil wrote:
> On Mon, 5 Jan 2015, Brian Foster wrote:
> > On Mon, Jan 05, 2015 at 10:34:57AM -0800, Sage Weil wrote:
> > > On Wed, 10 Dec 2014, Josef Bacik wrote:
> > > > On 12/10/2014 06:27 AM, Jan Kara wrote:
> > > > > On Mon 08-12-14 17:11:41, Josef Bacik wrote:
> > > > > > Hello,
> > > > > >
> > > > > > We have been doing pretty well at populating xfstests with loads of
> > > > > > tests to catch regressions and validate we're all working properly.
> > > > > > One thing that has been lacking is a good way to verify file system
> > > > > > integrity after a power fail. This is a core part of what file
> > > > > > systems are supposed to provide but it is probably the least tested
> > > > > > aspect. We have dm-flakey tests in xfstests to test fsync
> > > > > > correctness, but these tests do not catch the random horrible things
> > > > > > that can go wrong. We are still finding horrible scary things that
> > > > > > go wrong in Btrfs because it is simply hard to reproduce and test
> > > > > > for.
> > > > > >
> > > > > > I have been working on an idea to do this better, some may have seen
> > > > > > my dm-power-fail attempt, and I've got a new incarnation of the idea
> > > > > > thanks to discussions with Zach Brown. Obviously there will be a
> > > > > > lot changing in this area in the time between now and March but it
> > > > > > would be good to have everybody in the room talking about what they
> > > > > > would need to build a good and deterministic test to make sure we're
> > > > > > always giving a consistent file system and to make sure our fsync()
> > > > > > handling is working properly. Thanks,
> > > > > I agree we are lacking in testing this aspect. Just I don't see too much
> > > > > material for discussion there, unless we have something more tangible -
> > > > > when we have some implementation, we can talk about pros and cons of it,
> > > > > what still needs doing etc.
> > > > >
> > > >
> > > > Right that's what I was getting at. I have a solution and have sent it around
> > > > but there doesn't seem to be too many people interested in commenting on it.
> > > > I figure one of two things will happen
> > > >
> > > > 1) My solution will go in before LSF, in which case YAY my job is done and
> > > > this is more of an [ATTEND] than a [TOPIC], or
> > > >
> > > > 2) My solution hasn't gone in yet and I'd like to discuss my methodology and
> > > > how we can integrate it into xfstests, future features, other areas we could
> > > > test etc.
> > > >
> > > > Maybe not a full blown slot but combined with an overall testing slot or hell
> > > > just a quick lightning talk. Thanks,
> > >
> > > I have a related topic that may make sense to fit into any discussion
> > > about this. Twice recently we've run into trouble using newish or less
> > > common (combinations of) syscalls.
> > >
> > > The first instance was with the use of sync_file_range to try to
> > > control/limit the amount of dirty data in the page cache. This, possibly
> > > in combination with posix_fadvise(DONTNEED), managed to break the
> > > writeback sequence in XFS and led to data corruption after power loss.
> > >
> >
> > Was there a report or any other details on this one? In particular, I'm
> > wondering if this is related to the problem exposed by xfstests test
> > xfs/053...
>
> This is the original thread:
>
> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
>

Thanks. It does look similar to xfs/053, the intent of which was to
indirectly create the kind of writeback pattern that exposes this.

> Looks like 053 is about ACLs though?
>

generic/053 does something with ACLs; xfs/053 is the test of interest.
Regardless, from the thread above it sounds like Dave had homed in on
the cause.

Brian

> sage