linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range
@ 2024-08-30 14:56 Brian Foster
  2024-08-30 14:56 ` [PATCH v3 1/2] iomap: fix handling of dirty folios over unwritten extents Brian Foster
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Brian Foster @ 2024-08-30 14:56 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, djwong, josef, david

Hi all,

Here's v3 of the iomap zero range flush fixes. No real changes here
other than comment updates to better explain the flush and stale logic.
The latest version of corresponding test support is posted here [1].
Thoughts, reviews, flames appreciated.

v3:
- Rework comment(s) in patch 2 to explain marking the mapping stale.
- Added R-b tags.
v2: https://lore.kernel.org/linux-fsdevel/20240828181912.41517-1-bfoster@redhat.com/
- Update comment in patch 2 to explain hole case.
v1: https://lore.kernel.org/linux-fsdevel/20240822145910.188974-1-bfoster@redhat.com/
- Alternative approach, flush instead of revalidate.
rfc: https://lore.kernel.org/linux-fsdevel/20240718130212.23905-1-bfoster@redhat.com/

Brian Foster (2):
  iomap: fix handling of dirty folios over unwritten extents
  iomap: make zero range flush conditional on unwritten mappings

 fs/iomap/buffered-io.c | 63 +++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_iops.c      | 10 -------
 2 files changed, 59 insertions(+), 14 deletions(-)

-- 
2.45.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v3 1/2] iomap: fix handling of dirty folios over unwritten extents
  2024-08-30 14:56 [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Brian Foster
@ 2024-08-30 14:56 ` Brian Foster
  2024-08-30 14:56 ` [PATCH v3 2/2] iomap: make zero range flush conditional on unwritten mappings Brian Foster
  2024-09-03  7:59 ` [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Christian Brauner
  2 siblings, 0 replies; 4+ messages in thread
From: Brian Foster @ 2024-08-30 14:56 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, djwong, josef, david

The iomap zero range implementation doesn't properly handle dirty
pagecache over unwritten mappings. It skips such mappings as if they
were pre-zeroed. If some part of an unwritten mapping is dirty in
pagecache from a previous write, the data in cache should be zeroed
as well. Instead, the data is left in cache and creates a stale data
exposure problem if writeback occurs sometime after the zero range.

Most callers are unaffected by this because the higher level
filesystem contexts that call zero range typically perform a filemap
flush of the target range for other reasons. A couple contexts that
don't otherwise need to flush are write file size extension and
truncate in XFS. The former path is currently susceptible to the
stale data exposure problem and the latter performs a flush
specifically to work around it.

This is clearly inconsistent and incomplete. As a first step toward
correcting behavior, lift the XFS workaround to iomap_zero_range()
and unconditionally flush the range before the zero range operation
proceeds. While this appears to be a bit of a big hammer, most all
users already do this from calling context save for the couple of
exceptions noted above. Future patches will optimize or elide this
flush while maintaining functional correctness.

Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/iomap/buffered-io.c | 10 ++++++++++
 fs/xfs/xfs_iops.c      | 10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86ac..3e846f43ff48 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1451,6 +1451,16 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	};
 	int ret;
 
+	/*
+	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
+	 * pagecache must be flushed to ensure stale data from previous
+	 * buffered writes is not exposed.
+	 */
+	ret = filemap_write_and_wait_range(inode->i_mapping,
+			pos, pos + len - 1);
+	if (ret)
+		return ret;
+
 	while ((ret = iomap_iter(&iter, ops)) > 0)
 		iter.processed = iomap_zero_iter(&iter, did_zero);
 	return ret;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1cdc8034f54d..ddd3697e6ecd 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -870,16 +870,6 @@ xfs_setattr_size(
 		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
 				&did_zeroing);
 	} else {
-		/*
-		 * iomap won't detect a dirty page over an unwritten block (or a
-		 * cow block over a hole) and subsequently skips zeroing the
-		 * newly post-EOF portion of the page. Flush the new EOF to
-		 * convert the block before the pagecache truncate.
-		 */
-		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
-						     newsize);
-		if (error)
-			return error;
 		error = xfs_truncate_page(ip, newsize, &did_zeroing);
 	}
 
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v3 2/2] iomap: make zero range flush conditional on unwritten mappings
  2024-08-30 14:56 [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Brian Foster
  2024-08-30 14:56 ` [PATCH v3 1/2] iomap: fix handling of dirty folios over unwritten extents Brian Foster
@ 2024-08-30 14:56 ` Brian Foster
  2024-09-03  7:59 ` [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Christian Brauner
  2 siblings, 0 replies; 4+ messages in thread
From: Brian Foster @ 2024-08-30 14:56 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, djwong, josef, david

iomap_zero_range() flushes pagecache to mitigate consistency
problems with dirty pagecache and unwritten mappings. The flush is
unconditional over the entire range because checking pagecache state
after mapping lookup is racy with writeback and reclaim. There are
ways around this using iomap's mapping revalidation mechanism, but
this is not supported by all iomap based filesystems and so is not a
generic solution.

There is another way around this limitation that is good enough to
filter the flush for most cases in practice. If we check for dirty
pagecache over the target range (instead of unconditionally flush),
we can keep track of whether the range was dirty before lookup and
defer the flush until/unless we see a combination of dirty cache
backed by an unwritten mapping. We don't necessarily know whether
the dirty cache was backed by the unwritten maping or some other
(written) part of the range, but the impliciation of a false
positive here is a spurious flush and thus relatively harmless.

Note that we also flush for hole mappings because iomap_zero_range()
is used for partial folio zeroing in some cases. For example, if a
folio straddles EOF on a sub-page FSB size fs, the post-eof portion
is hole-backed and dirtied/written via mapped write, and then i_size
increases before writeback can occur (which otherwise zeroes the
post-eof portion of the EOF folio), then the folio becomes
inconsistent with disk until reclaimed. A flush in this case
executes partial zeroing from writeback, and iomap knows that there
is otherwise no I/O to submit for hole backed mappings.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/iomap/buffered-io.c | 63 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3e846f43ff48..1ace16181ecf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1393,16 +1393,53 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
+/*
+ * Flush the remaining range of the iter and mark the current mapping stale.
+ * This is used when zero range sees an unwritten mapping that may have had
+ * dirty pagecache over it.
+ */
+static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
+{
+	struct address_space *mapping = i->inode->i_mapping;
+	loff_t end = i->pos + i->len - 1;
+
+	i->iomap.flags |= IOMAP_F_STALE;
+	return filemap_write_and_wait_range(mapping, i->pos, end);
+}
+
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
+		bool *range_dirty)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
 
-	/* already zeroed?  we're done. */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+	/*
+	 * We must zero subranges of unwritten mappings that might be dirty in
+	 * pagecache from previous writes. We only know whether the entire range
+	 * was clean or not, however, and dirty folios may have been written
+	 * back or reclaimed at any point after mapping lookup.
+	 *
+	 * The easiest way to deal with this is to flush pagecache to trigger
+	 * any pending unwritten conversions and then grab the updated extents
+	 * from the fs. The flush may change the current mapping, so mark it
+	 * stale for the iterator to remap it for the next pass to handle
+	 * properly.
+	 *
+	 * Note that holes are treated the same as unwritten because zero range
+	 * is (ab)used for partial folio zeroing in some cases. Hole backed
+	 * post-eof ranges can be dirtied via mapped write and the flush
+	 * triggers writeback time post-eof zeroing.
+	 */
+	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
+		if (*range_dirty) {
+			*range_dirty = false;
+			return iomap_zero_iter_flush_and_stale(iter);
+		}
+		/* range is clean and already zeroed, nothing to do */
 		return length;
+	}
 
 	do {
 		struct folio *folio;
@@ -1450,19 +1487,27 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		.flags		= IOMAP_ZERO,
 	};
 	int ret;
+	bool range_dirty;
 
 	/*
 	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
 	 * pagecache must be flushed to ensure stale data from previous
-	 * buffered writes is not exposed.
+	 * buffered writes is not exposed. A flush is only required for certain
+	 * types of mappings, but checking pagecache after mapping lookup is
+	 * racy with writeback and reclaim.
+	 *
+	 * Therefore, check the entire range first and pass along whether any
+	 * part of it is dirty. If so and an underlying mapping warrants it,
+	 * flush the cache at that point. This trades off the occasional false
+	 * positive (and spurious flush, if the dirty data and mapping don't
+	 * happen to overlap) for simplicity in handling a relatively uncommon
+	 * situation.
 	 */
-	ret = filemap_write_and_wait_range(inode->i_mapping,
-			pos, pos + len - 1);
-	if (ret)
-		return ret;
+	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
+					pos, pos + len - 1);
 
 	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_zero_iter(&iter, did_zero);
+		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range
  2024-08-30 14:56 [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Brian Foster
  2024-08-30 14:56 ` [PATCH v3 1/2] iomap: fix handling of dirty folios over unwritten extents Brian Foster
  2024-08-30 14:56 ` [PATCH v3 2/2] iomap: make zero range flush conditional on unwritten mappings Brian Foster
@ 2024-09-03  7:59 ` Christian Brauner
  2 siblings, 0 replies; 4+ messages in thread
From: Christian Brauner @ 2024-09-03  7:59 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christian Brauner, linux-xfs, djwong, josef, david, linux-fsdevel

On Fri, 30 Aug 2024 10:56:32 -0400, Brian Foster wrote:
> Here's v3 of the iomap zero range flush fixes. No real changes here
> other than comment updates to better explain the flush and stale logic.
> The latest version of corresponding test support is posted here [1].
> Thoughts, reviews, flames appreciated.
> 
> v3:
> - Rework comment(s) in patch 2 to explain marking the mapping stale.
> - Added R-b tags.
> v2: https://lore.kernel.org/linux-fsdevel/20240828181912.41517-1-bfoster@redhat.com/
> - Update comment in patch 2 to explain hole case.
> v1: https://lore.kernel.org/linux-fsdevel/20240822145910.188974-1-bfoster@redhat.com/
> - Alternative approach, flush instead of revalidate.
> rfc: https://lore.kernel.org/linux-fsdevel/20240718130212.23905-1-bfoster@redhat.com/
> 
> [...]

Applied to the vfs.blocksize branch of the vfs/vfs.git tree.
Patches in the vfs.blocksize branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.blocksize

[1/2] iomap: fix handling of dirty folios over unwritten extents
      https://git.kernel.org/vfs/vfs/c/e19d398f4eb8
[2/2] iomap: make zero range flush conditional on unwritten mappings
      https://git.kernel.org/vfs/vfs/c/2dbdb9dbad46

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-09-03  8:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-30 14:56 [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Brian Foster
2024-08-30 14:56 ` [PATCH v3 1/2] iomap: fix handling of dirty folios over unwritten extents Brian Foster
2024-08-30 14:56 ` [PATCH v3 2/2] iomap: make zero range flush conditional on unwritten mappings Brian Foster
2024-09-03  7:59 ` [PATCH v3 0/2] iomap: flush dirty cache over unwritten mappings on zero range Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).