linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eryu Guan <eguan@redhat.com>
To: linux-xfs@vger.kernel.org
Cc: Eryu Guan <eguan@redhat.com>
Subject: [PATCH] xfs: flush the range before zero partial block range on truncate down
Date: Fri, 27 Oct 2017 20:53:28 +0800	[thread overview]
Message-ID: <20171027125328.25001-1-eguan@redhat.com> (raw)

On truncate down, if new size is not block size aligned, we zero the
rest of block via iomap_truncate_page() to avoid exposing stale data
to user, and iomap_truncate_page() skips zeroing if the range is
already in unwritten status or a hole.

But it's possible that a buffer write overwrites the unwritten
extent, which won't be converted to a normal extent until I/O
completion, and iomap_truncate_page() skips zeroing wrongly because
of the not-converted unwritten extent. This would cause a subsequent
mmap read sees non-zeros beyond EOF.

I occasionally see this in fsx runs in fstests generic/112, a
simplified fsx operation sequence is like (assuming 4k block size
xfs):

  fallocate 0x0 0x1000 0x0 keep_size
  write 0x0 0x1000 0x0
  truncate 0x0 0x800 0x1000
  punch_hole 0x0 0x800 0x800
  mapread 0x0 0x800 0x800

This is introduced along with the switch to iomap based buffer write
support, commit 68a9f5e7007c ("xfs: implement iomap based buffered
write path"), because previously, block_truncate_page() did fs block
based check and only skipped holes.

Fix it by flushing the range we're going to zero to ensure
iomap_truncate_page() sees the final extent state and zeros the
range correctly.

Also fixed the wrong 'end' param of filemap_write_and_wait_range()
call while we're at it, the 'end' is inclusive and should be
'newsize - 1'.

Signed-off-by: Eryu Guan <eguan@redhat.com>
---

An fstests test case followed, I've verified it could reproduce the
bug reliably and quickly.

 fs/xfs/xfs_iops.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 17081c77ef86..cd4c612cc23c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -824,6 +824,7 @@ xfs_setattr_size(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct inode		*inode = VFS_I(ip);
+	struct address_space	*mapping = inode->i_mapping;
 	xfs_off_t		oldsize, newsize;
 	struct xfs_trans	*tp;
 	int			error;
@@ -878,8 +879,22 @@ xfs_setattr_size(
 	if (newsize > oldsize) {
 		error = xfs_zero_eof(ip, newsize, oldsize, &did_zeroing);
 	} else {
-		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_iomap_ops);
+		/*
+		 * Firstly write out the range that will be zeroed, otherwise
+		 * iomap_truncate_page() may skip zeroing this range wrongly.
+		 * Because the range could still be dirty after buffer writes
+		 * over unwritten extents, and the extent still stays in
+		 * UNWRITTEN state, because the conversion only happens at I/O
+		 * completion.
+		 */
+		unsigned int blocksize = i_blocksize(inode);
+		unsigned int off = newsize & (blocksize - 1);
+		if (off && mapping->nrpages)
+			error = filemap_write_and_wait_range(mapping, newsize,
+					newsize + (blocksize - off) - 1);
+		if (!error)
+			error = iomap_truncate_page(inode, newsize,
+					&did_zeroing, &xfs_iomap_ops);
 	}
 
 	if (error)
@@ -895,8 +910,8 @@ xfs_setattr_size(
 	 */
 	if (did_zeroing ||
 	    (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
-		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
-						      ip->i_d.di_size, newsize);
+		error = filemap_write_and_wait_range(mapping, ip->i_d.di_size,
+							newsize - 1);
 		if (error)
 			return error;
 	}
-- 
2.13.6


             reply	other threads:[~2017-10-27 12:53 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-27 12:53 Eryu Guan [this message]
2017-10-28  6:05 ` [PATCH] xfs: flush the range before zero partial block range on truncate down Christoph Hellwig
2017-10-31 10:09   ` Eryu Guan
2017-10-31 17:11     ` Darrick J. Wong
2017-10-31 23:03       ` Dave Chinner
2017-10-31 22:58 ` Dave Chinner
2017-11-01  3:46   ` Eryu Guan
2017-11-01  4:44     ` Dave Chinner
2017-11-01 12:06       ` Eryu Guan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171027125328.25001-1-eguan@redhat.com \
    --to=eguan@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).