Patch "xfs: truncate pagecache before writeback in xfs_setattr_size()" has been added to the 4.9-stable tree

All of lore.kernel.org
 help / color / mirror / Atom feed

From: <gregkh@linuxfoundation.org>
To: eguan@redhat.com, alexander.levin@verizon.com,
	bfoster@redhat.com, darrick.wong@oracle.com, dchinner@redhat.com,
	gregkh@linuxfoundation.org
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "xfs: truncate pagecache before writeback in xfs_setattr_size()" has been added to the 4.9-stable tree
Date: Mon, 18 Dec 2017 14:24:25 +0100	[thread overview]
Message-ID: <151360346512510@kroah.com> (raw)


This is a note to let you know that I've just added the patch titled

    xfs: truncate pagecache before writeback in xfs_setattr_size()

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-truncate-pagecache-before-writeback-in-xfs_setattr_size.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From foo@baz Mon Dec 18 14:12:35 CET 2017
From: Eryu Guan <eguan@redhat.com>
Date: Wed, 1 Nov 2017 21:43:50 -0700
Subject: xfs: truncate pagecache before writeback in xfs_setattr_size()

From: Eryu Guan <eguan@redhat.com>


[ Upstream commit 350976ae21873b0d36584ea005076356431b8f79 ]

On truncate down, if new size is not block size aligned, we zero the
rest of block to avoid exposing stale data to user, and
iomap_truncate_page() skips zeroing if the range is already in
unwritten state or a hole. Then we writeback from on-disk i_size to
the new size if this range hasn't been written to disk yet, and
truncate page cache beyond new EOF and set in-core i_size.

The problem is that we could write data between di_size and newsize
before removing the page cache beyond newsize, as the extents may
still be in unwritten state right after a buffer write. As such, the
page of data that newsize lies in has not been zeroed by page cache
invalidation before it is written, and xfs_do_writepage() hasn't
triggered it's "zero data beyond EOF" case because we haven't
updated in-core i_size yet. Then a subsequent mmap read could see
non-zeros past EOF.

I occasionally see this in fsx runs in fstests generic/112, a
simplified fsx operation sequence is like (assuming 4k block size
xfs):

  fallocate 0x0 0x1000 0x0 keep_size
  write 0x0 0x1000 0x0
  truncate 0x0 0x800 0x1000
  punch_hole 0x0 0x800 0x800
  mapread 0x0 0x800 0x800

where fallocate allocates unwritten extent but doesn't update
i_size, buffer write populates the page cache and extent is still
unwritten, truncate skips zeroing page past new EOF and writes the
page to disk, punch_hole invalidates the page cache, at last mapread
reads the block back and sees non-zero beyond EOF.

Fix it by moving truncate_setsize() to before writeback so the page
cache invalidation zeros the partial page at the new EOF. This also
triggers "zero data beyond EOF" in xfs_do_writepage() at writeback
time, because newsize has been set and page straddles the newsize.

Also fixed the wrong 'end' param of filemap_write_and_wait_range()
call while we're at it, the 'end' is inclusive and should be
'newsize - 1'.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_iops.c |   36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -871,22 +871,6 @@ xfs_setattr_size(
 		return error;
 
 	/*
-	 * We are going to log the inode size change in this transaction so
-	 * any previous writes that are beyond the on disk EOF and the new
-	 * EOF that have not been written out need to be written here.  If we
-	 * do not write the data out, we expose ourselves to the null files
-	 * problem. Note that this includes any block zeroing we did above;
-	 * otherwise those blocks may not be zeroed after a crash.
-	 */
-	if (did_zeroing ||
-	    (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
-		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
-						      ip->i_d.di_size, newsize);
-		if (error)
-			return error;
-	}
-
-	/*
 	 * We've already locked out new page faults, so now we can safely remove
 	 * pages from the page cache knowing they won't get refaulted until we
 	 * drop the XFS_MMAP_EXCL lock after the extent manipulations are
@@ -902,9 +886,29 @@ xfs_setattr_size(
 	 * user visible changes). There's not much we can do about this, except
 	 * to hope that the caller sees ENOMEM and retries the truncate
 	 * operation.
+	 *
+	 * And we update in-core i_size and truncate page cache beyond newsize
+	 * before writeback the [di_size, newsize] range, so we're guaranteed
+	 * not to write stale data past the new EOF on truncate down.
 	 */
 	truncate_setsize(inode, newsize);
 
+	/*
+	 * We are going to log the inode size change in this transaction so
+	 * any previous writes that are beyond the on disk EOF and the new
+	 * EOF that have not been written out need to be written here.  If we
+	 * do not write the data out, we expose ourselves to the null files
+	 * problem. Note that this includes any block zeroing we did above;
+	 * otherwise those blocks may not be zeroed after a crash.
+	 */
+	if (did_zeroing ||
+	    (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
+		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
+						ip->i_d.di_size, newsize - 1);
+		if (error)
+			return error;
+	}
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
 	if (error)
 		return error;


Patches currently in stable-queue which might be from eguan@redhat.com are

queue-4.9/ext4-fix-fdatasync-2-after-fallocate-2-operation.patch
queue-4.9/xfs-truncate-pagecache-before-writeback-in-xfs_setattr_size.patch

                 reply	other threads:[~2017-12-18 13:27 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=151360346512510@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=alexander.levin@verizon.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=eguan@redhat.com \
    --cc=stable-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.