From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 3/4] iomap: fix handling of dirty folios over unwritten extents
Date: Thu, 18 Jul 2024 09:02:11 -0400
Message-ID: <20240718130212.23905-4-bfoster@redhat.com>
In-Reply-To: <20240718130212.23905-1-bfoster@redhat.com>

iomap_zero_range() does not correctly handle unwritten mappings with
dirty folios in pagecache. It skips unwritten mappings
unconditionally as if they were already zeroed, and thus potentially
exposes stale data from a previous write if affected folios are not
written back before the zero range.

Most callers already flush the target range before zeroing for
unrelated, context-specific reasons, so this problem is not
necessarily prevalent. The known outliers (in XFS) are file
extension via buffered write and truncate. The truncate path issues
a flush to work around this iomap problem, but the file extension
path does not and thus can expose stale data if the current EOF is
unaligned and has a dirty folio over an unwritten block.

This patch implements a mechanism for making zero range
pagecache-aware for filesystems that support mapping validation
(i.e. folio_ops->iomap_valid()). Instead of just skipping unwritten
mappings, scan the corresponding pagecache range for dirty or
writeback folios. If any are found, zero them explicitly via
buffered write. Clean or uncached subranges of unwritten mappings
are skipped, as before.
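
As a rough illustration of the scan described above, here is a
userspace sketch (not the kernel code). It tracks one dirty flag per
fixed-size block and returns the number of bytes from pos up to the
first dirty block. The names clean_prefix_len() and BLOCK_SIZE are
hypothetical and exist only for this example; in the patch the
equivalent position comes from filemap_range_has_writeback().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 4096L /* assumed block size for this toy model */

/*
 * Return how many bytes starting at pos are clean, i.e. the distance
 * from pos to the start of the first dirty block that overlaps
 * [pos, pos + length). Returns length if nothing in range is dirty.
 */
static long clean_prefix_len(const bool *dirty, size_t nblocks,
                             long pos, long length)
{
    assert(pos >= 0 && length >= 0);
    if (length == 0)
        return 0;

    size_t first = (size_t)(pos / BLOCK_SIZE);
    size_t last = (size_t)((pos + length - 1) / BLOCK_SIZE);

    for (size_t i = first; i <= last && i < nblocks; i++) {
        if (dirty[i]) {
            long fpos = (long)i * BLOCK_SIZE;
            /* the dirty block may start before pos: clamp to zero */
            return (fpos > pos ? fpos : pos) - pos;
        }
    }
    return length;
}
```

A return value equal to length means the whole range is a skip
candidate. A smaller value is the offset of the first block that
needs explicit zeroing.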

The quirk with a post-iomap_begin() pagecache scan is that it is
racy with writeback and reclaim activity. Even if the higher-level
code holds the invalidate lock, nothing prevents a dirty folio from
being written back, cleaned, and even reclaimed sometime after
iomap_begin() returns an unwritten mapping but before a pagecache
scan might find the dirty folio. To handle this situation, we can
rely on the fact that writeback completion converts unwritten
extents in the fs before writeback state is cleared on the folio.

This means that a pagecache scan followed by a revalidation of an
unwritten mapping should either find a dirty folio if it exists,
or detect a mapping change if a dirty folio did exist and had been
cleaned sometime before the scan but after the unwritten mapping was
found. If the revalidation succeeds, then we can safely assume
nothing has been written back and skip the range. If the
revalidation fails, then we must assume any offset in the range
could have been modified by writeback. In other words, we must be
particularly careful to make sure that any uncached range we intend
to skip does not make it into iter.processed until the mapping is
revalidated.
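
The ordering argument above can be modelled with a small userspace
sketch. This is a toy, not the iomap implementation: struct toy_file,
the seq counter, and the function names are all hypothetical, and the
sequence counter merely stands in for whatever the filesystem uses to
detect a changed mapping. The point it shows is that because
writeback converts the extent before cleaning the folio, a scan
followed by revalidation never concludes "skip" for a block that was
dirty when the mapping was sampled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state for one block of an unwritten extent (hypothetical). */
struct toy_file {
    unsigned int seq;   /* bumped whenever the extent mapping changes */
    bool unwritten;     /* extent state in the "filesystem" */
    bool dirty;         /* folio state in the "pagecache" */
};

enum zero_action {
    ZERO_SKIP,          /* mapping still valid, nothing dirty: skip */
    ZERO_FOLIO,         /* dirty folio found: zero via buffered write */
    ZERO_REMAP,         /* mapping went stale: look it up again */
};

/* Writeback completion: convert the extent first, then clean the folio. */
static void toy_writeback(struct toy_file *f)
{
    assert(f != NULL);
    if (!f->dirty)
        return;
    f->unwritten = false;
    f->seq++;
    f->dirty = false;
}

/* Scan pagecache, then revalidate the mapping sampled at mapped_seq. */
static enum zero_action toy_zero_unwritten(const struct toy_file *f,
                                           unsigned int mapped_seq)
{
    assert(f != NULL);
    if (f->dirty)
        return ZERO_FOLIO;
    if (f->seq != mapped_seq)
        return ZERO_REMAP;
    return ZERO_SKIP;
}
```

Sampling seq, running toy_writeback() on a dirty block, and then
calling toy_zero_unwritten() with the sampled value yields ZERO_REMAP
rather than ZERO_SKIP, which is the case the revalidation is meant to
catch.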

Altogether, this allows zero range to handle dirty folios over
unwritten extents without needing to flush and wait for writeback
completion.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/iomap/buffered-io.c | 53 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 50 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a9425170df72..ea1d396ef445 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1385,6 +1385,23 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
}
EXPORT_SYMBOL_GPL(iomap_file_unshare);
+/*
+ * Scan an unwritten mapping for dirty pagecache and return the length of the
+ * clean or uncached range leading up to it. This is the range that zeroing may
+ * skip once the mapping is validated.
+ */
+static inline loff_t
+iomap_zero_iter_unwritten(struct iomap_iter *iter, loff_t pos, loff_t length)
+{
+ struct address_space *mapping = iter->inode->i_mapping;
+ loff_t fpos = pos;
+
+ if (!filemap_range_has_writeback(mapping, &fpos, length))
+ return length;
+ /* fpos can be smaller if the start folio is dirty */
+ return max(fpos, pos) - pos;
+}
+
static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
@@ -1393,16 +1410,46 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
loff_t written = 0;
/* already zeroed? we're done. */
- if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+ if (srcmap->type == IOMAP_HOLE)
return length;
do {
struct folio *folio;
int status;
size_t offset;
- size_t bytes = min_t(u64, SIZE_MAX, length);
+ size_t bytes;
+ loff_t pending = 0;
bool ret;
+ /*
+ * Determine the range of the unwritten mapping that is clean in
+ * pagecache. We can skip this range, but only if the mapping is
+ * still valid after the pagecache scan. This is because
+ * writeback may have cleaned folios after the mapping lookup
+ * but before we were able to find them here. If that occurs,
+ * then the mapping must now be stale and we must reprocess the
+ * range.
+ */
+ if (srcmap->type == IOMAP_UNWRITTEN) {
+ pending = iomap_zero_iter_unwritten(iter, pos, length);
+ if (pending == length) {
+ /* no dirty cache, revalidate and bounce as we're
+ * either done or the mapping is stale */
+ if (iomap_revalidate(iter))
+ written += pending;
+ break;
+ }
+
+ /*
+ * Found a dirty folio. Update pos/length to point at
+ * it. written is updated only after the mapping is
+ * revalidated by iomap_write_begin().
+ */
+ pos += pending;
+ length -= pending;
+ }
+
+ bytes = min_t(u64, SIZE_MAX, length);
status = iomap_write_begin(iter, pos, bytes, &folio);
if (status)
return status;
@@ -1422,7 +1469,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
pos += bytes;
length -= bytes;
- written += bytes;
+ written += bytes + pending;
} while (length > 0);
if (did_zero)
--
2.45.0