public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] iomap: zero range flush fixes
@ 2024-11-15 20:01 Brian Foster
  2024-11-15 20:01 ` [PATCH v4 1/3] iomap: reset per-iter state on non-error iter advances Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Brian Foster @ 2024-11-15 20:01 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, hch, djwong

Hi all,

Here's v4 of the zero range flush improvements. No real major changes
here, mostly minor whitespace, naming issues, etc.

Brian

v4:
- Whitespace and variable naming fixes.
- Split off patch 4 to a separate post.
v3: https://lore.kernel.org/linux-fsdevel/20241108124246.198489-1-bfoster@redhat.com/
- Added new patch 1 to always reset per-iter state in iomap_iter.
- Dropped iomap_iter_init() helper.
- Misc. cleanups.
- Appended patch 4 to warn on zeroing beyond EOF.
v2: https://lore.kernel.org/linux-fsdevel/20241031140449.439576-1-bfoster@redhat.com/
- Added patch 1 to lift zeroed mapping handling code into caller.
- Split unaligned start range handling at the top level.
- Retain existing conditional flush behavior (vs. unconditional flush)
  for the remaining range.
v1: https://lore.kernel.org/linux-fsdevel/20241023143029.11275-1-bfoster@redhat.com/

Brian Foster (3):
  iomap: reset per-iter state on non-error iter advances
  iomap: lift zeroed mapping handling into iomap_zero_range()
  iomap: elide flush from partial eof zero range

 fs/iomap/buffered-io.c | 88 +++++++++++++++++++++---------------------
 fs/iomap/iter.c        | 11 +++---
 2 files changed, 50 insertions(+), 49 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v4 1/3] iomap: reset per-iter state on non-error iter advances
  2024-11-15 20:01 [PATCH v4 0/3] iomap: zero range flush fixes Brian Foster
@ 2024-11-15 20:01 ` Brian Foster
  2024-11-15 20:01 ` [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2024-11-15 20:01 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, hch, djwong

iomap_iter_advance() zeroes the processed and mapping fields on
every non-error iteration except for the last expected iteration
(i.e. return 0 expected to terminate the iteration loop). This
appears to be circumstantial as nothing currently relies on these
fields after the final iteration.

Therefore to better faciliate iomap_iter reuse in subsequent
patches, update iomap_iter_advance() to always reset per-iteration
state on successful completion.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/iter.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index 79a0614eaab7..3790918646af 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -22,26 +22,25 @@
 static inline int iomap_iter_advance(struct iomap_iter *iter)
 {
 	bool stale = iter->iomap.flags & IOMAP_F_STALE;
+	int ret = 1;
 
 	/* handle the previous iteration (if any) */
 	if (iter->iomap.length) {
 		if (iter->processed < 0)
 			return iter->processed;
-		if (!iter->processed && !stale)
-			return 0;
 		if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
 			return -EIO;
 		iter->pos += iter->processed;
 		iter->len -= iter->processed;
-		if (!iter->len)
-			return 0;
+		if (!iter->len || (!iter->processed && !stale))
+			ret = 0;
 	}
 
-	/* clear the state for the next iteration */
+	/* clear the per iteration state */
 	iter->processed = 0;
 	memset(&iter->iomap, 0, sizeof(iter->iomap));
 	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
-	return 1;
+	return ret;
 }
 
 static inline void iomap_iter_done(struct iomap_iter *iter)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range()
  2024-11-15 20:01 [PATCH v4 0/3] iomap: zero range flush fixes Brian Foster
  2024-11-15 20:01 ` [PATCH v4 1/3] iomap: reset per-iter state on non-error iter advances Brian Foster
@ 2024-11-15 20:01 ` Brian Foster
  2024-11-18  6:29   ` Christoph Hellwig
  2024-11-15 20:01 ` [PATCH v4 3/3] iomap: elide flush from partial eof zero range Brian Foster
  2024-11-20  8:33 ` [PATCH v4 0/3] iomap: zero range flush fixes Christian Brauner
  3 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2024-11-15 20:01 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, hch, djwong

In preparation for special handling of subranges, lift the zeroed
mapping logic from the iterator into the caller. Since this puts the
pagecache dirty check and flushing in the same place, streamline the
comments a bit as well.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 66 +++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 42 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ef0b68bccbb6..9c1aa0355c71 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1350,40 +1350,12 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
 	return filemap_write_and_wait_range(mapping, i->pos, end);
 }
 
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
-		bool *range_dirty)
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
-	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
 
-	/*
-	 * We must zero subranges of unwritten mappings that might be dirty in
-	 * pagecache from previous writes. We only know whether the entire range
-	 * was clean or not, however, and dirty folios may have been written
-	 * back or reclaimed at any point after mapping lookup.
-	 *
-	 * The easiest way to deal with this is to flush pagecache to trigger
-	 * any pending unwritten conversions and then grab the updated extents
-	 * from the fs. The flush may change the current mapping, so mark it
-	 * stale for the iterator to remap it for the next pass to handle
-	 * properly.
-	 *
-	 * Note that holes are treated the same as unwritten because zero range
-	 * is (ab)used for partial folio zeroing in some cases. Hole backed
-	 * post-eof ranges can be dirtied via mapped write and the flush
-	 * triggers writeback time post-eof zeroing.
-	 */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
-		if (*range_dirty) {
-			*range_dirty = false;
-			return iomap_zero_iter_flush_and_stale(iter);
-		}
-		/* range is clean and already zeroed, nothing to do */
-		return length;
-	}
-
 	do {
 		struct folio *folio;
 		int status;
@@ -1433,24 +1405,34 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	bool range_dirty;
 
 	/*
-	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
-	 * pagecache must be flushed to ensure stale data from previous
-	 * buffered writes is not exposed. A flush is only required for certain
-	 * types of mappings, but checking pagecache after mapping lookup is
-	 * racy with writeback and reclaim.
+	 * Zero range can skip mappings that are zero on disk so long as
+	 * pagecache is clean. If pagecache was dirty prior to zero range, the
+	 * mapping converts on writeback completion and so must be zeroed.
 	 *
-	 * Therefore, check the entire range first and pass along whether any
-	 * part of it is dirty. If so and an underlying mapping warrants it,
-	 * flush the cache at that point. This trades off the occasional false
-	 * positive (and spurious flush, if the dirty data and mapping don't
-	 * happen to overlap) for simplicity in handling a relatively uncommon
-	 * situation.
+	 * The simplest way to deal with this across a range is to flush
+	 * pagecache and process the updated mappings. To avoid an unconditional
+	 * flush, check pagecache state and only flush if dirty and the fs
+	 * returns a mapping that might convert on writeback.
 	 */
 	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
 					pos, pos + len - 1);
+	while ((ret = iomap_iter(&iter, ops)) > 0) {
+		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
 
-	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
+		if (srcmap->type == IOMAP_HOLE ||
+		    srcmap->type == IOMAP_UNWRITTEN) {
+			loff_t proc = iomap_length(&iter);
+
+			if (range_dirty) {
+				range_dirty = false;
+				proc = iomap_zero_iter_flush_and_stale(&iter);
+			}
+			iter.processed = proc;
+			continue;
+		}
+
+		iter.processed = iomap_zero_iter(&iter, did_zero);
+	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v4 3/3] iomap: elide flush from partial eof zero range
  2024-11-15 20:01 [PATCH v4 0/3] iomap: zero range flush fixes Brian Foster
  2024-11-15 20:01 ` [PATCH v4 1/3] iomap: reset per-iter state on non-error iter advances Brian Foster
  2024-11-15 20:01 ` [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
@ 2024-11-15 20:01 ` Brian Foster
  2024-11-20  8:33 ` [PATCH v4 0/3] iomap: zero range flush fixes Christian Brauner
  3 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2024-11-15 20:01 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, hch, djwong

iomap zero range flushes pagecache in certain situations to
determine which parts of the range might require zeroing if dirty
data is present in pagecache. The kernel robot recently reported a
regression associated with this flushing in the following stress-ng
workload on XFS:

stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64

This workload involves repeated small, strided, extending writes. On
XFS, this produces a pattern of post-eof speculative preallocation,
conversion of preallocation from delalloc to unwritten, dirtying
pagecache over newly unwritten blocks, and then rinse and repeat
from the new EOF. This leads to repetitive flushing of the EOF folio
via the zero range call XFS uses for writes that start beyond
current EOF.

To mitigate this problem, special case EOF block zeroing to prefer
zeroing the folio over a flush when the EOF folio is already dirty.
To do this, split out and open code handling of an unaligned start
offset. This brings most of the performance back by avoiding flushes
on zero range calls via write and truncate extension operations. The
flush doesn't occur in these situations because the entire range is
post-eof and therefore the folio that overlaps EOF is the only one
in the range.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9c1aa0355c71..af2f59817779 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1401,6 +1401,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		.len		= len,
 		.flags		= IOMAP_ZERO,
 	};
+	struct address_space *mapping = inode->i_mapping;
+	unsigned int blocksize = i_blocksize(inode);
+	unsigned int off = pos & (blocksize - 1);
+	loff_t plen = min_t(loff_t, len, blocksize - off);
 	int ret;
 	bool range_dirty;
 
@@ -1410,12 +1414,28 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	 * mapping converts on writeback completion and so must be zeroed.
 	 *
 	 * The simplest way to deal with this across a range is to flush
-	 * pagecache and process the updated mappings. To avoid an unconditional
-	 * flush, check pagecache state and only flush if dirty and the fs
-	 * returns a mapping that might convert on writeback.
+	 * pagecache and process the updated mappings. To avoid excessive
+	 * flushing on partial eof zeroing, special case it to zero the
+	 * unaligned start portion if already dirty in pagecache.
+	 */
+	if (off &&
+	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
+		iter.len = plen;
+		while ((ret = iomap_iter(&iter, ops)) > 0)
+			iter.processed = iomap_zero_iter(&iter, did_zero);
+
+		iter.len = len - (iter.pos - pos);
+		if (ret || !iter.len)
+			return ret;
+	}
+
+	/*
+	 * To avoid an unconditional flush, check pagecache state and only flush
+	 * if dirty and the fs returns a mapping that might convert on
+	 * writeback.
 	 */
 	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
-					pos, pos + len - 1);
+					iter.pos, iter.pos + iter.len - 1);
 	while ((ret = iomap_iter(&iter, ops)) > 0) {
 		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range()
  2024-11-15 20:01 ` [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
@ 2024-11-18  6:29   ` Christoph Hellwig
  2024-11-18 13:54     ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2024-11-18  6:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs, hch, djwong

On Fri, Nov 15, 2024 at 03:01:54PM -0500, Brian Foster wrote:
> In preparation for special handling of subranges, lift the zeroed
> mapping logic from the iterator into the caller. Since this puts the
> pagecache dirty check and flushing in the same place, streamline the
> comments a bit as well.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

I don't want to block this improvement on stylistic things, but
I still don't like moving more code than the function invocation into
the iter body.  I hope you're okay with me undoing that sooner or later.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range()
  2024-11-18  6:29   ` Christoph Hellwig
@ 2024-11-18 13:54     ` Brian Foster
  2024-11-18 14:14       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2024-11-18 13:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs, djwong

On Sun, Nov 17, 2024 at 10:29:50PM -0800, Christoph Hellwig wrote:
> On Fri, Nov 15, 2024 at 03:01:54PM -0500, Brian Foster wrote:
> > In preparation for special handling of subranges, lift the zeroed
> > mapping logic from the iterator into the caller. Since this puts the
> > pagecache dirty check and flushing in the same place, streamline the
> > comments a bit as well.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> I don't want to block this improvement on stylistic things, but
> I still don't like moving more code than the function invocation into
> the iter body.  I hope you're okay with me undoing that sooner or later.
> 
> 

I actually think it's easier for you to just fix it up according to your
needs rather than spin around on the list on it, since I'm not totally
clear on what the goal is here anyways.

Not sure if you saw my comment here [1], but my goal is to eventually
remove this code anyways in favor of something that supports more of a
sparse folio iteration. Whether it gets removed first or reworked in the
meantime as part of broader cleanups isn't such a big deal. I just want
to point that out so it's clear it's not worth trying too hard to
beautify it.

Brian

[1] https://lore.kernel.org/linux-fsdevel/ZzdgWkt1DRCTWfCv@bfoster/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range()
  2024-11-18 13:54     ` Brian Foster
@ 2024-11-18 14:14       ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-11-18 14:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs, djwong

On Mon, Nov 18, 2024 at 08:54:51AM -0500, Brian Foster wrote:
> I actually think it's easier for you to just fix it up according to your
> needs rather than spin around on the list on it, since I'm not totally
> clear on what the goal is here anyways.
> 
> Not sure if you saw my comment here [1], but my goal is to eventually
> remove this code anyways in favor of something that supports more of a
> sparse folio iteration. Whether it gets removed first or reworked in the
> meantime as part of broader cleanups isn't such a big deal. I just want
> to point that out so it's clear it's not worth trying too hard to
> beautify it.

Sounds good, thanks!


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 0/3] iomap: zero range flush fixes
  2024-11-15 20:01 [PATCH v4 0/3] iomap: zero range flush fixes Brian Foster
                   ` (2 preceding siblings ...)
  2024-11-15 20:01 ` [PATCH v4 3/3] iomap: elide flush from partial eof zero range Brian Foster
@ 2024-11-20  8:33 ` Christian Brauner
  3 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2024-11-20  8:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christian Brauner, linux-xfs, hch, djwong, linux-fsdevel

On Fri, 15 Nov 2024 15:01:52 -0500, Brian Foster wrote:
> Here's v4 of the zero range flush improvements. No real major changes
> here, mostly minor whitespace, naming issues, etc.
> 
> Brian
> 
> v4:
> - Whitespace and variable naming fixes.
> - Split off patch 4 to a separate post.
> v3: https://lore.kernel.org/linux-fsdevel/20241108124246.198489-1-bfoster@redhat.com/
> - Added new patch 1 to always reset per-iter state in iomap_iter.
> - Dropped iomap_iter_init() helper.
> - Misc. cleanups.
> - Appended patch 4 to warn on zeroing beyond EOF.
> v2: https://lore.kernel.org/linux-fsdevel/20241031140449.439576-1-bfoster@redhat.com/
> - Added patch 1 to lift zeroed mapping handling code into caller.
> - Split unaligned start range handling at the top level.
> - Retain existing conditional flush behavior (vs. unconditional flush)
>   for the remaining range.
> v1: https://lore.kernel.org/linux-fsdevel/20241023143029.11275-1-bfoster@redhat.com/
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/3] iomap: reset per-iter state on non-error iter advances
      https://git.kernel.org/vfs/vfs/c/cad47157a55d
[2/3] iomap: lift zeroed mapping handling into iomap_zero_range()
      https://git.kernel.org/vfs/vfs/c/46538f0b405b
[3/3] iomap: elide flush from partial eof zero range
      https://git.kernel.org/vfs/vfs/c/c07ba2d5979b

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-11-20  8:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-15 20:01 [PATCH v4 0/3] iomap: zero range flush fixes Brian Foster
2024-11-15 20:01 ` [PATCH v4 1/3] iomap: reset per-iter state on non-error iter advances Brian Foster
2024-11-15 20:01 ` [PATCH v4 2/3] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
2024-11-18  6:29   ` Christoph Hellwig
2024-11-18 13:54     ` Brian Foster
2024-11-18 14:14       ` Christoph Hellwig
2024-11-15 20:01 ` [PATCH v4 3/3] iomap: elide flush from partial eof zero range Brian Foster
2024-11-20  8:33 ` [PATCH v4 0/3] iomap: zero range flush fixes Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox