From: Eryu Guan <eguan@redhat.com>
To: linux-xfs@vger.kernel.org
Cc: Eryu Guan <eguan@redhat.com>
Subject: [PATCH] xfs: flush the range before zero partial block range on truncate down
Date: Fri, 27 Oct 2017 20:53:28 +0800 [thread overview]
Message-ID: <20171027125328.25001-1-eguan@redhat.com> (raw)
On truncate down, if new size is not block size aligned, we zero the
rest of block via iomap_truncate_page() to avoid exposing stale data
to user, and iomap_truncate_page() skips zeroing if the range is
already in unwritten status or a hole.
But it's possible that a buffer write overwrites the unwritten
extent, which won't be converted to a normal extent until I/O
completion, and iomap_truncate_page() skips zeroing wrongly because
of the not-converted unwritten extent. This would cause a subsequent
mmap read sees non-zeros beyond EOF.
I occasionally see this in fsx runs in fstests generic/112, a
simplified fsx operation sequence is like (assuming 4k block size
xfs):
fallocate 0x0 0x1000 0x0 keep_size
write 0x0 0x1000 0x0
truncate 0x0 0x800 0x1000
punch_hole 0x0 0x800 0x800
mapread 0x0 0x800 0x800
This is introduced along with the switch to iomap based buffer write
support, commit 68a9f5e7007c ("xfs: implement iomap based buffered
write path"), because previously, block_truncate_page() did fs block
based check and only skipped holes.
Fix it by flushing the range we're going to zero to ensure
iomap_truncate_page() sees the final extent state and zeros the
range correctly.
Also fixed the wrong 'end' param of filemap_write_and_wait_range()
call while we're at it, the 'end' is inclusive and should be
'newsize - 1'.
Signed-off-by: Eryu Guan <eguan@redhat.com>
---
An fstests test case followed, I've verified it could reproduce the
bug reliably and quickly.
fs/xfs/xfs_iops.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 17081c77ef86..cd4c612cc23c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -824,6 +824,7 @@ xfs_setattr_size(
{
struct xfs_mount *mp = ip->i_mount;
struct inode *inode = VFS_I(ip);
+ struct address_space *mapping = inode->i_mapping;
xfs_off_t oldsize, newsize;
struct xfs_trans *tp;
int error;
@@ -878,8 +879,22 @@ xfs_setattr_size(
if (newsize > oldsize) {
error = xfs_zero_eof(ip, newsize, oldsize, &did_zeroing);
} else {
- error = iomap_truncate_page(inode, newsize, &did_zeroing,
- &xfs_iomap_ops);
+ /*
+ * Firstly write out the range that will be zeroed, otherwise
+ * iomap_truncate_page() may skip zeroing this range wrongly.
+ * Because the range could still be dirty after buffer writes
+ * over unwritten extents, and the extent still stays in
+ * UNWRITTEN state, because the conversion only happens at I/O
+ * completion.
+ */
+ unsigned int blocksize = i_blocksize(inode);
+ unsigned int off = newsize & (blocksize - 1);
+ if (off && mapping->nrpages)
+ error = filemap_write_and_wait_range(mapping, newsize,
+ newsize + (blocksize - off) - 1);
+ if (!error)
+ error = iomap_truncate_page(inode, newsize,
+ &did_zeroing, &xfs_iomap_ops);
}
if (error)
@@ -895,8 +910,8 @@ xfs_setattr_size(
*/
if (did_zeroing ||
(newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
- error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
- ip->i_d.di_size, newsize);
+ error = filemap_write_and_wait_range(mapping, ip->i_d.di_size,
+ newsize - 1);
if (error)
return error;
}
--
2.13.6
next reply other threads:[~2017-10-27 12:53 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-27 12:53 Eryu Guan [this message]
2017-10-28 6:05 ` [PATCH] xfs: flush the range before zero partial block range on truncate down Christoph Hellwig
2017-10-31 10:09 ` Eryu Guan
2017-10-31 17:11 ` Darrick J. Wong
2017-10-31 23:03 ` Dave Chinner
2017-10-31 22:58 ` Dave Chinner
2017-11-01 3:46 ` Eryu Guan
2017-11-01 4:44 ` Dave Chinner
2017-11-01 12:06 ` Eryu Guan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171027125328.25001-1-eguan@redhat.com \
--to=eguan@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).