From: Dave Chinner <david@fromorbit.com>
To: Lukas Czerner <lczerner@redhat.com>
Cc: Jayashree Mohan <jayashree2912@gmail.com>,
linux-xfs@vger.kernel.org,
Vijaychidambaram Velayudhan Pillai <vijay@cs.utexas.edu>,
Ashlie Martinez <ashmrtn@utexas.edu>,
fstests@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>
Subject: Re: XFS crash consistency bug : Loss of fsynced metadata operation
Date: Sat, 17 Mar 2018 14:16:26 +1100 [thread overview]
Message-ID: <20180317031626.GO18129@dastard> (raw)
In-Reply-To: <20180316054516.fehsluoqo2h2zvrw@rh_laptop>
On Fri, Mar 16, 2018 at 06:45:16AM +0100, Lukas Czerner wrote:
> On Fri, Mar 16, 2018 at 11:19:55AM +1100, Dave Chinner wrote:
> > On Thu, Mar 15, 2018 at 11:32:07AM +0100, Lukas Czerner wrote:
> > >
> > > And where the "cdcd" data is comming from ? Not from the
> > >
> > > "pwrite -S 1 0 64k"
> > >
> > > that's producing "0101".
> >
> > Ah, too much context switching. Makes me look like the village
> > idiot...
>
> that's how I was looking the past 4 emails ;)
>
> Anyway good that we finally understand each other. I think it'd be worth
> fixing the generic/042 to check for the actuall stale data not just
> expecting the file to be empty as it still can do some good.
I've got a fixed version of it. It starts by setting the file size
to 128k and then fsyncing ths file, so after the shutdown we should
always see a file full of zeros. Of course, the extent layout can be
anything, as long as the data returned after remount is all zeros.
> > Ok, I just spent some time looking at this in detail (event traces,
> > etc) and it is indeed as the test describes - an whole delalloc
> > extent allocation on partial overwrite. THe partial overwrite is
> > triggered by the fpunch/fzero code, and it appears the delalloc
> > converison is not paying attention to the start offset of the
> > writeback:
> >
> > xfs_writepage: dev 7:0 ino 0x7603 pgoff 0xf000 size 0x10000 delalloc 1
> > .....
> > xfs_alloc_near_greater: dev 7:0 agno 0 agbno 3784 minlen 16 maxlen 16
> > .....
> > xfs_write_extent: dev 7:0 ino 0x7603 offset 0 block 3784 count 16 flag 0
> >
> > IOWs, as I've said from the start (based on the test description)
> > this has nothing to do with the corruption issue CrashMonkey is
> > creating.
>
> Yeah they're probably not seeing this because of added fsync.
Yes - the fsync ensures data vs metadata ordering by writing and
waiting for data before flushing the log. In this case, there is
not data flush and so when the log is flushed in the shutdown,
the metadata hits the disk before the data, and that's where the
stale data exposure comes from.
THis is a well known issue - if allocation transactions are
committed before the data writes have been submitted, then a crash
can expose stale data - but it's very rarely seen with delalloc.
Historically it's always been seen with direct IO - that's why we
use unwritten extent allocation for direct IO at all times.
However, in XFS we can't do that with delayed allocation because
of the fact the information used by writeback is held on the
bufferhead and is not coherent with extent state. Hence we can't
change a delalloc extent to unwritten deep in the allocator, because
then the page writeback does the wrong thing as it expects to see a
delalloc extent not an unwritten extent. Fixing this problem is one
of the reasons we want to get rid of bufferheads from XFS.
I've got patches that have gone one round of review that remove
bufferhead state from the XFS writepage engine, but that will need
to fixed up and finished off before this problem can be fixed. IIRC,
when I wrote this test I'd just uncovered the problem, realised it
can't be fixed without bufferhead removal, and so the test was added
to remind us what needed fixing....
I actually have fixes that make XFS pass this test, but there's a
fair chunk of work still to be done before they are ready for
review. I'm thinking of closing this whole entirely by always using
unwritten extent allocate for delalloc writes inside EOF, which will
mean we could probably go back to using delalloc with extent size
hints, too (yay!)...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
prev parent reply other threads:[~2018-03-17 3:16 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-13 2:15 XFS crash consistency bug : Loss of fsynced metadata operation Jayashree Mohan
2018-03-13 4:21 ` Dave Chinner
2018-03-13 6:36 ` Amir Goldstein
2018-03-13 18:05 ` Jayashree Mohan
2018-03-13 16:57 ` Jayashree Mohan
2018-03-13 22:45 ` Dave Chinner
2018-03-14 13:16 ` Lukas Czerner
2018-03-14 13:32 ` Dave Chinner
2018-03-14 13:57 ` Lukas Czerner
2018-03-14 21:24 ` Dave Chinner
2018-03-15 6:15 ` Lukas Czerner
2018-03-15 10:06 ` Dave Chinner
2018-03-15 10:32 ` Lukas Czerner
2018-03-16 0:19 ` Dave Chinner
2018-03-16 5:45 ` Lukas Czerner
2018-03-17 3:16 ` Dave Chinner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180317031626.GO18129@dastard \
--to=david@fromorbit.com \
--cc=ashmrtn@utexas.edu \
--cc=fstests@vger.kernel.org \
--cc=jayashree2912@gmail.com \
--cc=lczerner@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=vijay@cs.utexas.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).