From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH RFC 2/2] xfs: optimize eof page flush for iomap zeroing on truncate
Date: Fri, 28 Oct 2022 09:04:11 -0400
Message-ID: <20221028130411.977076-3-bfoster@redhat.com>
In-Reply-To: <20221028130411.977076-1-bfoster@redhat.com>
The flush that occurs just before xfs_truncate_page() during a
non-extending truncate exists to avoid potential stale data exposure
problems when iomap zeroing might be racing with buffered writes
over unwritten extents. However, we've had reports of this causing
significant performance regressions on overwrite workloads where the
flush serves no correctness purpose. For example, the uuidd
mechanism stores time metadata to a file on every generation
sequence. This involves a buffered (over)write followed by a
truncate of the file to its current size. If these uuids are used as
transaction IDs for a database application, then overall performance
can suffer tremendously from the repeated flushing on every truncate.

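For illustration, the I/O pattern described above can be approximated
from userspace with a buffered overwrite followed by a truncate to the
current size. This is only a sketch of the pattern, not uuidd's actual
code; the helper name overwrite_then_truncate() and its parameters are
hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from uuidd): write len bytes at offset 0
 * with buffered I/O, then truncate the file to its current size, repeating
 * iters times. The first pass may create or extend the file; later passes
 * are pure overwrites. Returns 0 on success, -1 on error.
 */
static int overwrite_then_truncate(const char *path, const void *buf,
				   size_t len, int iters)
{
	struct stat st;
	int fd, i, ret = -1;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;

	for (i = 0; i < iters; i++) {
		/* Buffered (over)write: dirties the pagecache it touches. */
		if (pwrite(fd, buf, len, 0) != (ssize_t)len)
			goto out;
		if (fstat(fd, &st) < 0)
			goto out;
		/* Non-extending truncate to the current size. */
		if (ftruncate(fd, st.st_size) < 0)
			goto out;
	}
	ret = 0;
out:
	close(fd);
	return ret;
}
```

Timing a loop like this on a file whose size is not block aligned,
before and after the change, would be one way to observe the cost of
the unconditional flush.
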
To avoid this problem, update the truncate path to only flush in
scenarios that are known to conflict with iomap zeroing. iomap skips
zeroing when it sees a hole or unwritten extent, so this essentially
means the filesystem should flush if either of those scenarios has
outstanding dirty pagecache and can skip the flush otherwise.

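As a rough model of that rule for the non-extending case, the flush
decision reduces to a small predicate. The sketch below is standalone
userspace C, not the kernel code; the enum eof_backing type and the
eof_flush_needed() name are hypothetical, and the real patch derives
the same inputs from filemap_range_needs_writeback() and a data fork
extent lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what backs the block containing the new EOF. */
enum eof_backing {
	EOF_BACKING_HOLE,
	EOF_BACKING_UNWRITTEN,
	EOF_BACKING_WRITTEN,
};

/*
 * Return true if the EOF page should be flushed before iomap zeroing.
 * blocksize must be a power of two.
 */
static bool eof_flush_needed(uint64_t newsize, uint32_t blocksize,
			     bool eof_page_dirty, enum eof_backing backing)
{
	/* Block-aligned (or zero) newsize: no partial block to zero. */
	if ((newsize & (blocksize - 1)) == 0)
		return false;

	/* Clean pagecache: there is no dirty data for iomap to miss. */
	if (!eof_page_dirty)
		return false;

	/* Dirty data over a hole or unwritten extent: iomap would skip it. */
	return backing == EOF_BACKING_HOLE ||
	       backing == EOF_BACKING_UNWRITTEN;
}
```

In this model an overwrite of an already written block, as in the
uuidd case, never triggers the flush.
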
The ideal longer term solution here is to avoid the need to flush
entirely and allow the zeroing to detect a dirty page and zero it
accordingly, but that is more involved because it may require changes
to the iomap interface. The purpose of this change is therefore to
address the performance regression in a manner straightforward
enough that it can be separated from further improvements.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/xfs_iops.c | 44 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 38 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d31e64db243f..37f78117557e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -782,7 +782,15 @@ xfs_truncate_zeroing(
xfs_off_t newsize,
bool *did_zeroing)
{
+ struct xfs_mount *mp = ip->i_mount;
+ struct inode *inode = VFS_I(ip);
+ struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
+ struct xfs_iext_cursor icur;
+ struct xfs_bmbt_irec got;
+ xfs_off_t end;
+ xfs_fileoff_t end_fsb = XFS_B_TO_FSBT(mp, newsize);
int error;
+ bool found;
if (newsize > oldsize) {
trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
@@ -790,16 +798,40 @@ xfs_truncate_zeroing(
did_zeroing);
}
+ /*
+ * No zeroing occurs if newsize is block aligned (or zero). The eof page
+ * is partially zeroed by the pagecache truncate, if necessary, and
+ * post-eof blocks are removed.
+ */
+ if ((newsize & (i_blocksize(inode) - 1)) == 0)
+ return 0;
+
/*
* iomap won't detect a dirty page over an unwritten block (or a cow
* block over a hole) and subsequently skips zeroing the newly post-EOF
- * portion of the page. Flush the new EOF to convert the block before
- * the pagecache truncate.
+ * portion of the page. To ensure proper zeroing occurs, flush the eof
+ * page if it is dirty and backed by a hole or unwritten extent in the
+ * data fork. This ensures that iomap sees the eof block in a state that
+ * warrants zeroing.
+ *
+ * This should eventually be handled in iomap processing so we don't
+ * have to flush at all. We do it here for now to avoid the additional
+ * latency in cases where it's not absolutely required.
*/
- error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, newsize - 1,
- newsize - 1);
- if (error)
- return error;
+ end = newsize - 1;
+ if (filemap_range_needs_writeback(inode->i_mapping, end, end)) {
+ xfs_ilock(ip, XFS_ILOCK_SHARED);
+ found = xfs_iext_lookup_extent(ip, ifp, end_fsb, &icur, &got);
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+ if (!found || got.br_startoff > end_fsb ||
+ got.br_state == XFS_EXT_UNWRITTEN) {
+ error = filemap_write_and_wait_range(inode->i_mapping,
+ end, end);
+ if (error)
+ return error;
+ }
+ }
return xfs_truncate_page(ip, newsize, did_zeroing);
}
--
2.37.3