From: Christoph Hellwig <hch@lst.de>
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
"Darrick J . Wong" <darrick.wong@oracle.com>
Subject: [PATCH 16/16] xfs: trim writepage mapping to within eof
Date: Thu, 19 Oct 2017 16:22:59 +0200
Message-ID: <20171019142259.20082-17-hch@lst.de>
In-Reply-To: <20171019142259.20082-1-hch@lst.de>
From: Brian Foster <bfoster@redhat.com>
commit 40214d128e07dd21bb07a8ed6a7fe2f911281ab2 upstream.
The writeback rework in commit fbcc02561359 ("xfs: Introduce
writeback context for writepages") introduced a subtle change in
behavior with regard to the block mapping used across the
->writepages() sequence. The previous xfs_cluster_write() code would
only flush pages up to EOF at the time of the writepage, thus
ensuring that any pages due to file-extending writes would be
handled on a separate cycle and with a new, updated block mapping.
The updated code establishes a block mapping in xfs_writepage_map()
that could extend beyond EOF if the file has post-eof preallocation.
Because we now use the generic writeback infrastructure and pass the
cached mapping to each writepage call, there is no implicit EOF
limit in place. If eofblocks trimming occurs during ->writepages(),
any post-eof portion of the cached mapping becomes invalid. The
eofblocks code has no means to serialize against writeback because
there are no pages associated with post-eof blocks. Therefore if an
eofblocks trim occurs and is followed by a file-extending buffered
write, not only has the mapping become invalid, but we could end up
writing a page to disk based on the invalid mapping.
Consider the following sequence of events:
- A buffered write creates a delalloc extent and post-eof
  speculative preallocation.
- Writeback starts and on the first writepage cycle, the delalloc
  extent is converted to real blocks (including the post-eof blocks)
  and the mapping is cached.
- The file is closed and xfs_release() trims post-eof blocks. The
  cached writeback mapping is now invalid.
- Another buffered write appends the file with a delalloc extent.
- The concurrent writeback cycle picks up the just written page
  because the writeback range end is LLONG_MAX. xfs_writepage_map()
  attributes it to the (now invalid) cached mapping and writes the
  data to an incorrect location on disk (and where the file offset is
  still backed by a delalloc extent).
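The sequence above can be replayed in a small userspace model. This is
only a sketch under stated assumptions, not kernel code: struct
toy_mapping, toy_imap_valid_untrimmed() and toy_replay_stale_mapping()
are hypothetical names, the block numbers are made up, and the range
check merely mirrors the pre-patch test in xfs_imap_valid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_bmbt_irec: a range of file blocks. */
struct toy_mapping {
	uint64_t startoff;	/* first file block covered */
	uint64_t blockcount;	/* number of file blocks covered */
};

/* Range check only, with no EOF trimming (models the pre-patch behavior). */
static bool
toy_imap_valid_untrimmed(const struct toy_mapping *m, uint64_t offset_fsb)
{
	assert(m);
	return offset_fsb >= m->startoff &&
		offset_fsb < m->startoff + m->blockcount;
}

/*
 * Replay the five steps with made-up numbers. Returns true if the
 * appended page is attributed to a cached mapping that no longer
 * matches the blocks actually allocated to the file.
 */
static bool
toy_replay_stale_mapping(void)
{
	/* Steps 1-2: EOF is at block 4; writeback converts and caches
	 * blocks 0..11, i.e. 8 blocks of post-eof preallocation. */
	struct toy_mapping cached = { .startoff = 0, .blockcount = 12 };
	uint64_t eof_fsb = 4;

	/* Step 3: release trims post-eof blocks; only 0..3 stay allocated.
	 * The cached mapping is not updated. */
	uint64_t allocated_blocks = eof_fsb;

	/* Step 4: an appending buffered write dirties block 4 (delalloc). */
	uint64_t appended_fsb = 4;

	/* Step 5: the same writeback cycle checks the new page against the
	 * cached mapping. */
	bool accepted = toy_imap_valid_untrimmed(&cached, appended_fsb);
	bool really_allocated = appended_fsb < allocated_blocks;

	return accepted && !really_allocated;
}
```

In this model toy_replay_stale_mapping() reports the problem case: the
page at block 4 passes the range check even though the blocks the
cached mapping describes there were freed in step 3.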
This problem is reproduced by xfstests test generic/464, which
triggers racing writes, appends, open/closes and writeback requests.
To address this problem, trim the mapping used during writeback to
within EOF when the mapping is validated. This ensures the mapping
is revalidated for any pages encountered beyond EOF as of the time
the current mapping was cached or last validated.
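The effect of trimming at validation time can be illustrated in
userspace. This is a simplified sketch, not the kernel implementation:
struct toy_irec and the toy_* helpers are hypothetical, the model
tracks only file offsets (no start block or delalloc state), and
block_log is an assumed stand-in for the filesystem block size shift.
The byte-to-block conversion rounds up, as XFS_B_TO_FSB() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of struct xfs_bmbt_irec. */
struct toy_irec {
	uint64_t startoff;	/* first file block covered */
	uint64_t blockcount;	/* number of file blocks covered */
};

/* Round a byte count up to whole blocks (same rounding as XFS_B_TO_FSB). */
static uint64_t
toy_b_to_fsb(uint64_t bytes, unsigned int block_log)
{
	assert(block_log < 64);
	return (bytes + ((UINT64_C(1) << block_log) - 1)) >> block_log;
}

/* Clamp irec to the block range [bno, bno + len). */
static void
toy_trim_extent(struct toy_irec *irec, uint64_t bno, uint64_t len)
{
	uint64_t end = bno + len;

	/* No overlap at all: the mapping becomes empty. */
	if (irec->startoff + irec->blockcount <= bno ||
	    irec->startoff >= end) {
		irec->blockcount = 0;
		return;
	}
	/* Cut off the part in front of bno. */
	if (irec->startoff < bno) {
		irec->blockcount -= bno - irec->startoff;
		irec->startoff = bno;
	}
	/* Cut off the part at or beyond end. */
	if (end < irec->startoff + irec->blockcount)
		irec->blockcount = end - irec->startoff;
}

/*
 * Model of the patched check: trim the cached mapping to the current
 * EOF first, then test whether the page offset falls inside it. The
 * trim modifies the cached mapping, so it persists across calls.
 */
static bool
toy_imap_valid(struct toy_irec *imap, uint64_t isize_bytes,
	       unsigned int block_log, uint64_t offset_bytes)
{
	uint64_t offset_fsb = offset_bytes >> block_log;

	toy_trim_extent(imap, 0, toy_b_to_fsb(isize_bytes, block_log));
	return offset_fsb >= imap->startoff &&
		offset_fsb < imap->startoff + imap->blockcount;
}
```

Because the trim is applied to the cached mapping itself, a mapping
validated while EOF was at block 4 stays limited to blocks 0..3 even
after a later append moves EOF forward. A page at block 4 then fails
the check and forces a fresh mapping lookup.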
Reported-by: Eryu Guan <eguan@redhat.com>
Diagnosed-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 11 +++++++++++
fs/xfs/libxfs/xfs_bmap.h | 1 +
fs/xfs/xfs_aops.c | 13 +++++++++++++
3 files changed, 25 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index d2f4ab175096..7eb99701054f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4057,6 +4057,17 @@ xfs_trim_extent(
 	}
 }
 
+/* trim extent to within eof */
+void
+xfs_trim_extent_eof(
+	struct xfs_bmbt_irec	*irec,
+	struct xfs_inode	*ip)
+
+{
+	xfs_trim_extent(irec, 0, XFS_B_TO_FSB(ip->i_mount,
+					      i_size_read(VFS_I(ip))));
+}
+
 /*
  * Trim the returned map to the required bounds
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index db53ac7ff6df..f1446d127120 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -196,6 +196,7 @@ void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
void xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
xfs_filblks_t len);
+void xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *);
int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
void xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c2dee43a2994..d31cd1ebd8e9 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -438,6 +438,19 @@ xfs_imap_valid(
 {
 	offset >>= inode->i_blkbits;
 
+	/*
+	 * We have to make sure the cached mapping is within EOF to protect
+	 * against eofblocks trimming on file release leaving us with a stale
+	 * mapping. Otherwise, a page for a subsequent file extending buffered
+	 * write could get picked up by this writeback cycle and written to the
+	 * wrong blocks.
+	 *
+	 * Note that what we really want here is a generic mapping invalidation
+	 * mechanism to protect us from arbitrary extent modifying contexts, not
+	 * just eofblocks.
+	 */
+	xfs_trim_extent_eof(imap, XFS_I(inode));
+
 	return offset >= imap->br_startoff &&
 		offset < imap->br_startoff + imap->br_blockcount;
 }
--
2.14.2
Thread overview: 18+ messages
2017-10-19 14:22 4.9-stable updates for XFS Christoph Hellwig
2017-10-19 14:22 ` [PATCH 01/16] xfs: don't unconditionally clear the reflink flag on zero-block files Christoph Hellwig
2017-10-19 14:22 ` [PATCH 02/16] xfs: evict CoW fork extents when performing finsert/fcollapse Christoph Hellwig
2017-10-19 14:22 ` [PATCH 03/16] fs/xfs: Use %pS printk format for direct addresses Christoph Hellwig
2017-10-19 14:22 ` [PATCH 04/16] xfs: report zeroed or not correctly in xfs_zero_range() Christoph Hellwig
2017-10-19 14:22 ` [PATCH 05/16] xfs: update i_size after unwritten conversion in dio completion Christoph Hellwig
2017-10-19 14:22 ` [PATCH 06/16] xfs: perag initialization should only touch m_ag_max_usable for AG 0 Christoph Hellwig
2017-10-19 14:22 ` [PATCH 07/16] xfs: Capture state of the right inode in xfs_iflush_done Christoph Hellwig
2017-10-19 14:22 ` [PATCH 08/16] xfs: always swap the cow forks when swapping extents Christoph Hellwig
2017-10-19 14:22 ` [PATCH 09/16] xfs: handle racy AIO in xfs_reflink_end_cow Christoph Hellwig
2017-10-19 14:22 ` [PATCH 10/16] xfs: Don't log uninitialised fields in inode structures Christoph Hellwig
2017-10-19 14:22 ` [PATCH 11/16] xfs: move more RT specific code under CONFIG_XFS_RT Christoph Hellwig
2017-10-19 14:22 ` [PATCH 12/16] xfs: don't change inode mode if ACL update fails Christoph Hellwig
2017-10-19 14:22 ` [PATCH 13/16] xfs: reinit btree pointer on attr tree inactivation walk Christoph Hellwig
2017-10-19 14:22 ` [PATCH 14/16] xfs: handle error if xfs_btree_get_bufs fails Christoph Hellwig
2017-10-19 14:22 ` [PATCH 15/16] xfs: cancel dirty pages on invalidation Christoph Hellwig
2017-10-19 14:22 ` Christoph Hellwig [this message]
2017-10-24 12:54 ` 4.9-stable updates for XFS Greg KH