From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
xfs@oss.sgi.com, Dave Chinner <dchinner@redhat.com>
Subject: Re: fs corruption exposed by "xfs: increase prealloc size to double that of the previous extent"
Date: Mon, 17 Mar 2014 12:41:56 +1100
Message-ID: <20140317014156.GC7072@dastard>
In-Reply-To: <20140317012804.GU18016@ZenIV.linux.org.uk>

On Mon, Mar 17, 2014 at 01:28:04AM +0000, Al Viro wrote:
> On Mon, Mar 17, 2014 at 12:29:18AM +0000, Al Viro wrote:
>
> > I think I know what's going on - O_DIRECT write starting a bit before
> > EOF on a file with the last extent that can be grown. It fills
> > a buffer_head with b_size extending quite a bit past the EOF; the
> > blocks are really allocated. What causes the problem is that we
> > have the flags set for the *first* block. IOW, buffer_new(bh) is
> > false - the first block has already been allocated. And for
> > direct-io.c it means "no zeroing the tail of the last block".
>
> BTW, that's something I have directly observed - xfs_get_blocks_direct()
> called with iblock corresponding to a bit under 16Kb below EOF and
> returning with ->b_size equal to 700K and ->b_flags not containing BH_New.

What's the userspace IO pattern that triggers this?

> IOW, we really can't mix new and old blocks in that interface - not enough
> information is passed back to caller to be able to decide what does and
> what does not need zeroing out. It should be either all-new or all-old.

Right, and XFS should not be mixing old and new in the way you are
describing, and that's what I can't reproduce. See my reply on the
other thread. Probably best to continue there...

> And it's not just the EOF, of course - the beginning of a hole in a sparse
> file isn't any different from the end of file in that respect.

Except that XFS treats that differently - it does allocation as
unwritten extents there, and any mapping that covers an unwritten
block will always result in buffer_new() getting set...
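
To make the "all-new or all-old" constraint discussed above concrete, here
is a minimal userspace sketch. It is not kernel code and not how XFS or
direct-io.c is implemented: struct mapping, enum blk_state and all three
functions are hypothetical names invented for illustration. The model only
assumes what the thread states, namely that a mapping covers a range of
blocks but carries a single "new" flag for the whole range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: per-block truth known only to the filesystem. */
enum blk_state { BLK_OLD, BLK_NEW };

/* Hypothetical model of a mapping result: one flag for the whole range,
 * in the spirit of buffer_new(bh) on a buffer_head with a large b_size. */
struct mapping {
    size_t nblocks;   /* number of blocks the mapping covers */
    bool   is_new;    /* single flag describing the entire range */
};

/* Problem case from the thread: the mapping spans every block, but the
 * flag reflects only the first block. */
struct mapping map_first_block_flag(const enum blk_state *blks, size_t n)
{
    assert(n > 0);
    struct mapping m = { .nblocks = n, .is_new = (blks[0] == BLK_NEW) };
    return m;
}

/* All-old or all-new: stop the mapping at the first state change, so the
 * single flag is true for every block it covers. */
struct mapping map_uniform(const enum blk_state *blks, size_t n)
{
    assert(n > 0);
    size_t len = 1;
    while (len < n && blks[len] == blks[0])
        len++;
    struct mapping m = { .nblocks = len, .is_new = (blks[0] == BLK_NEW) };
    return m;
}

/* Count the blocks inside the mapping whose real state disagrees with the
 * flag, i.e. blocks the caller cannot classify correctly. */
size_t misclassified(const struct mapping *m, const enum blk_state *blks)
{
    size_t bad = 0;
    for (size_t i = 0; i < m->nblocks; i++)
        if ((blks[i] == BLK_NEW) != m->is_new)
            bad++;
    return bad;
}
```

With two old blocks followed by three new ones, the first-block-flag
mapping covers all five blocks and reports them as old, so the caller has
no way to tell that three of them are new. The uniform mapping returns
only the two old blocks and leaves the new ones for a second call.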
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs