From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com, Viro@disappointment.disaster,
viro@ZenIV.linux.org.uk, Al@disappointment.disaster
Subject: Re: [PATCH 2/6] xfs: write failure beyond EOF truncates too much data
Date: Fri, 4 Apr 2014 11:26:36 -0400 [thread overview]
Message-ID: <20140404152636.GA40629@bfoster.bfoster> (raw)
In-Reply-To: <20140329151419.GA33827@bfoster.bfoster>
On Sat, Mar 29, 2014 at 11:14:19AM -0400, Brian Foster wrote:
> On Fri, Mar 21, 2014 at 09:11:46PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > If we fail a write beyond EOF and have to handle it in
> > xfs_vm_write_begin(), we truncate the inode back to the current inode
> > size. This doesn't take into account the fact that we may have
> > already made successful writes to the same page (in the case of block
> > size < page size) and hence we can truncate the page cache away from
> > blocks with valid data in them. If these blocks are delayed
> > allocation blocks, we now have a mismatch between the page cache and
> > the extent tree, and this will trigger - at minimum - a delayed
> > block count mismatch assert when the inode is evicted from the cache.
> > We can also trip over it when block mapping for direct IO - this is
> > the most common symptom seen from fsx and fsstress when run from
> > xfstests.
> >
> > Fix it by only truncating away the exact range we are updating state
> > for in this write_begin call.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_aops.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index e810243..6b4ecc8 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -1609,12 +1609,18 @@ xfs_vm_write_begin(
> > status = __block_write_begin(page, pos, len, xfs_get_blocks);
> > if (unlikely(status)) {
> > struct inode *inode = mapping->host;
> > + size_t isize = i_size_read(inode);
> >
> > xfs_vm_write_failed(inode, page, pos, len);
> > unlock_page(page);
> >
> > - if (pos + len > i_size_read(inode))
> > - truncate_pagecache(inode, i_size_read(inode));
>
> From what I can see, we have a write_begin/copy/write_end sequence per
> page and the inode size is updated down in the write_end path. If the
> write fails at write_begin time, then we haven't copied any data and the
> inode size hasn't changed.
>
> The intent of the original code looks like it wants to break off any
> blocks that might have been set up beyond EOF before the write happened
> to fail. Could you elaborate on how this can kill blocks that might have
> been written successfully? Doesn't this only affect post-EOF metadata?
>
> > + /*
> > + * If the write is beyond EOF, we only want to kill blocks
> > + * allocated in this write, not blocks that were previously
> > + * written successfully.
> > + */
> > + if (pos + len > isize)
> > + truncate_pagecache_range(inode, pos, pos + len);
>
> So the cache truncate now covers the range of the write. What happens if
> the part of the write is an overwrite? This seems similar to the problem
> you've described above... should the truncate start at isize instead?
>
I ran an experiment on this to confirm my suspicion here. I added a
quick little hack to fail any write_begin (set status=-1) for which pos
!= 0. With that in place:
xfs_io -fc "pwrite -b 2k 0 2k" /mnt/file
# hexdump /mnt/file
0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0000800
xfs_io -c "pwrite -b 2k 1k 2k" /mnt/file
(fails)
# hexdump /mnt/file
0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0000400 0000 0000 0000 0000 0000 0000 0000 0000
*
0000800
The failed extending write trashes the data over the previously valid
region.
Brian
> Brian
>
> >
> > page_cache_release(page);
> > page = NULL;
> > --
> > 1.9.0
> >
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-04-04 15:26 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-21 10:11 [RFC, PATCH 0/6] xfs: delalloc, DIO and corruption Dave Chinner
2014-03-21 10:11 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
2014-03-21 10:11 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
2014-03-29 15:14 ` Brian Foster
2014-04-04 15:26 ` Brian Foster [this message]
2014-04-04 21:26 ` Dave Chinner
2014-03-21 10:11 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
2014-03-21 10:11 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
2014-03-21 10:11 ` [PATCH 5/6] xfs: splitting delalloc extents can run out of reservation Dave Chinner
2014-04-04 13:37 ` Brian Foster
2014-04-04 21:31 ` Dave Chinner
2014-03-21 10:11 ` [PATCH 6/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
2014-03-31 17:22 ` [RFC, PATCH 0/6] xfs: delalloc, DIO and corruption Brian Foster
2014-03-31 20:17 ` Dave Chinner
2014-04-01 11:54 ` Brian Foster
-- strict thread matches above, loose matches on Subject: below --
2014-04-10 5:00 [PATCH 0/6 v2] xfs: delalloc, dio " Dave Chinner
2014-04-10 5:00 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
2014-04-10 10:35 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140404152636.GA40629@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=Al@disappointment.disaster \
--cc=Viro@disappointment.disaster \
--cc=david@fromorbit.com \
--cc=viro@ZenIV.linux.org.uk \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).