linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v5 0/7] iomap: zero range folio batch support
@ 2025-10-03 13:46 Brian Foster
  2025-10-03 13:46 ` [PATCH v5 1/7] filemap: add helper to look up dirty folios in a range Brian Foster
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

Hi all,

Only minor changes in v5, to the XFS errortag patch. I've kept the R-b
tags because the fundamental logic is the same, but the errortag
mechanism has been reworked and so that patch needed a rebase (which
turned out much simpler). A second look certainly couldn't hurt, but
otherwise the associated fstest still works as expected.

Note that the force zeroing fstests test has since been merged as
xfs/131. Otherwise, I still have some followup patches on top of this
for the ext4-on-iomap work, but it would be nice to move this along
before getting too far ahead with that.

Brian

--- Original cover letter ---

Hi all,

Here's a first non-RFC version of folio batch support for iomap. This
initially only targets zero range, the use case being zeroing of dirty
folios over unwritten mappings. There is potential to support other
operations in the future: iomap seek data/hole has raciness issues
similar to zero range, the prospect of using this for buffered writes
has been raised for granular locking purposes, etc.

The one major caveat with this zero range implementation is that it
doesn't look at iomap_folio_state to determine whether to zero a
sub-folio portion of the folio. Instead it just relies on whether the
folio was dirty or not. This means that spurious zeroing of unwritten
ranges is possible if a folio is dirty but the target range includes a
subrange that is not.

The reasoning is that this is essentially a complexity tradeoff. The
current use cases for iomap_zero_range() are limited mostly to partial
block zeroing scenarios. It's relatively harmless to zero an unwritten
block (i.e. not a correctness issue), and this is something that
filesystems have done in the past without much notice or issue. The
advantage is less code and this makes it a little easier to use a
filemap lookup function for the batch rather than open coding more logic
in iomap. That said, this can probably be enhanced to look at the ifs
(iomap_folio_state) in the future if the use case expands and/or other
operations justify it.
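
To make the tradeoff concrete, here is a rough sketch of the two
approaches (the zero_*() helpers are hypothetical placeholders, and
ifs_block_is_dirty() is an iomap-internal helper; treat this as
illustrative rather than the actual implementation):

	struct iomap_folio_state *ifs = folio->private;

	/*
	 * What this series does: a folio-granular check. If the folio
	 * is dirty or under writeback, presume the target range within
	 * it needs zeroing.
	 */
	if (folio_test_dirty(folio) || folio_test_writeback(folio))
		zero_range_in_folio(folio);		/* hypothetical */

	/*
	 * The avoided alternative: per-block granularity via the ifs,
	 * only zeroing sub-folio blocks that are actually dirty.
	 */
	if (ifs && ifs_block_is_dirty(folio, ifs, block))
		zero_block_in_folio(folio, block);	/* hypothetical */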

WRT testing, I've tested with and without a local hack to redirect
fallocate zero range calls to iomap_zero_range() in XFS. This helps test
beyond the partial block/folio use case, i.e. to cover boundary
conditions like full folio batch handling, etc. I recently added patch
7 in the spirit of that, which turns this logic into an XFS errortag.
Further comments on that are inline with patch 7.

Thoughts, reviews, flames appreciated.

Brian

v5:
- Rebased across XFS errortag mechanism rework.
v4: https://lore.kernel.org/linux-fsdevel/20250807144711.564137-1-bfoster@redhat.com/
- Update commit log description in patch 3.
- Added remaining R-b tags.
v3: https://lore.kernel.org/linux-fsdevel/20250714204122.349582-1-bfoster@redhat.com/
- Update commit log description in patch 2.
- Improve comments in patch 7.
v2: https://lore.kernel.org/linux-fsdevel/20250714132059.288129-1-bfoster@redhat.com/
- Move filemap patch to top. Add some comments and drop export.
- Drop unnecessary BUG_ON()s from iomap_write_begin() instead of moving.
- Added folio mapping check to batch codepath, improved comments.
v1: https://lore.kernel.org/linux-fsdevel/20250605173357.579720-1-bfoster@redhat.com/
- Dropped most prep patches from previous version (merged separately).
- Reworked dirty folio lookup to use find_get_entry() loop (new patch
  for filemap helper).
- Misc. bug fixes, code cleanups, comments, etc.
- Added (RFC) prospective patch for wider zero range test coverage.
RFCv2: https://lore.kernel.org/linux-fsdevel/20241213150528.1003662-1-bfoster@redhat.com/
- Port onto incremental advance, drop patch 1 from RFCv1.
- Moved batch into iomap_iter, dynamically allocate and drop flag.
- Tweak XFS patch to always trim zero range on EOF boundary.
RFCv1: https://lore.kernel.org/linux-fsdevel/20241119154656.774395-1-bfoster@redhat.com/

Brian Foster (7):
  filemap: add helper to look up dirty folios in a range
  iomap: remove pos+len BUG_ON() to after folio lookup
  iomap: optional zero range dirty folio processing
  xfs: always trim mapping to requested range for zero range
  xfs: fill dirty folios on zero range of unwritten mappings
  iomap: remove old partial eof zeroing optimization
  xfs: error tag to force zeroing on debug kernels

 fs/iomap/buffered-io.c       | 117 +++++++++++++++++++++++++----------
 fs/iomap/iter.c              |   6 ++
 fs/xfs/libxfs/xfs_errortag.h |   6 +-
 fs/xfs/xfs_file.c            |  29 ++++++---
 fs/xfs/xfs_iomap.c           |  38 +++++++++---
 include/linux/iomap.h        |   4 ++
 include/linux/pagemap.h      |   2 +
 mm/filemap.c                 |  58 +++++++++++++++++
 8 files changed, 210 insertions(+), 50 deletions(-)

-- 
2.51.0



* [PATCH v5 1/7] filemap: add helper to look up dirty folios in a range
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-03 13:46 ` [PATCH v5 2/7] iomap: remove pos+len BUG_ON() to after folio lookup Brian Foster
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

Add a new filemap_get_folios_dirty() helper to look up existing dirty
folios in a range and add them to a folio_batch. This is to support
optimization of certain iomap operations that only care about dirty
folios in a target range. For example, zero range only zeroes the subset
of dirty pages over unwritten mappings, seek hole/data may use similar
logic in the future, etc.

Note that the helper is intended for use under internal fs locks.
Therefore it trylocks folios in order to filter out clean folios.
This loosely follows the logic from filemap_range_has_writeback().
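
For reference, a minimal caller sketch (hypothetical; the real consumer
is the iomap zero range path added later in this series). Note that the
dirty state of each returned folio must be rechecked under folio lock
before acting on it:

	struct folio_batch fbatch;
	pgoff_t start = pos >> PAGE_SHIFT;
	pgoff_t end = (pos + len - 1) >> PAGE_SHIFT;

	folio_batch_init(&fbatch);
	while (filemap_get_folios_dirty(mapping, &start, end, &fbatch)) {
		unsigned int i;

		for (i = 0; i < folio_batch_count(&fbatch); i++) {
			struct folio *folio = fbatch.folios[i];

			/* lock folio, recheck dirty, zero, etc. */
		}
		folio_batch_release(&fbatch);
	}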

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 58 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 185644e288ea..0b471f0eb940 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -977,6 +977,8 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
 		pgoff_t *start, pgoff_t end, struct folio_batch *fbatch);
 unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
 		pgoff_t end, xa_mark_t tag, struct folio_batch *fbatch);
+unsigned filemap_get_folios_dirty(struct address_space *mapping,
+		pgoff_t *start, pgoff_t end, struct folio_batch *fbatch);
 
 struct folio *read_cache_folio(struct address_space *, pgoff_t index,
 		filler_t *filler, struct file *file);
diff --git a/mm/filemap.c b/mm/filemap.c
index a52dd38d2b4a..db9f2dac9c54 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2348,6 +2348,64 @@ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
 }
 EXPORT_SYMBOL(filemap_get_folios_tag);
 
+/**
+ * filemap_get_folios_dirty - Get a batch of dirty folios
+ * @mapping:	The address_space to search
+ * @start:	The starting folio index
+ * @end:	The final folio index (inclusive)
+ * @fbatch:	The batch to fill
+ *
+ * filemap_get_folios_dirty() works exactly like filemap_get_folios(), except
+ * the returned folios are presumed to be dirty or undergoing writeback. Dirty
+ * state is presumed because we don't block on folio lock nor want to miss
+ * folios. Callers that need to can recheck state upon locking the folio.
+ *
+ * This may not return all dirty folios if the batch gets filled up.
+ *
+ * Return: The number of folios found.
+ * Also update @start to be positioned for traversal of the next folio.
+ */
+unsigned filemap_get_folios_dirty(struct address_space *mapping, pgoff_t *start,
+			pgoff_t end, struct folio_batch *fbatch)
+{
+	XA_STATE(xas, &mapping->i_pages, *start);
+	struct folio *folio;
+
+	rcu_read_lock();
+	while ((folio = find_get_entry(&xas, end, XA_PRESENT)) != NULL) {
+		if (xa_is_value(folio))
+			continue;
+		if (folio_trylock(folio)) {
+			bool clean = !folio_test_dirty(folio) &&
+				     !folio_test_writeback(folio);
+			folio_unlock(folio);
+			if (clean) {
+				folio_put(folio);
+				continue;
+			}
+		}
+		if (!folio_batch_add(fbatch, folio)) {
+			unsigned long nr = folio_nr_pages(folio);
+			*start = folio->index + nr;
+			goto out;
+		}
+	}
+	/*
+	 * We come here when there is no folio beyond @end. We take care to not
+	 * overflow the index @start as it confuses some of the callers. This
+	 * breaks the iteration when there is a folio at index -1 but that is
+	 * already broken anyway.
+	 */
+	if (end == (pgoff_t)-1)
+		*start = (pgoff_t)-1;
+	else
+		*start = end + 1;
+out:
+	rcu_read_unlock();
+
+	return folio_batch_count(fbatch);
+}
+
 /*
  * CD/DVDs are error prone. When a medium error occurs, the driver may fail
  * a _large_ part of the i/o request. Imagine the worst scenario:
-- 
2.51.0



* [PATCH v5 2/7] iomap: remove pos+len BUG_ON() to after folio lookup
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
  2025-10-03 13:46 ` [PATCH v5 1/7] filemap: add helper to look up dirty folios in a range Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-03 13:46 ` [PATCH v5 3/7] iomap: optional zero range dirty folio processing Brian Foster
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

The bug checks at the top of iomap_write_begin() assume the pos/len
reflect exactly the next range to process. This may no longer be the
case once the get folio path is able to process a folio batch from
the filesystem. On top of that, len is already trimmed to within the
iomap/srcmap by iomap_length(), so these checks aren't terribly
useful. Remove the unnecessary BUG_ON() checks.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8b847a1e27f1..211644774bb8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -815,15 +815,12 @@ static int iomap_write_begin(struct iomap_iter *iter,
 		size_t *poffset, u64 *plen)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	loff_t pos = iter->pos;
+	loff_t pos;
 	u64 len = min_t(u64, SIZE_MAX, iomap_length(iter));
 	struct folio *folio;
 	int status = 0;
 
 	len = min_not_zero(len, *plen);
-	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
-	if (srcmap != &iter->iomap)
-		BUG_ON(pos + len > srcmap->offset + srcmap->length);
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
-- 
2.51.0



* [PATCH v5 3/7] iomap: optional zero range dirty folio processing
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
  2025-10-03 13:46 ` [PATCH v5 1/7] filemap: add helper to look up dirty folios in a range Brian Foster
  2025-10-03 13:46 ` [PATCH v5 2/7] iomap: remove pos+len BUG_ON() to after folio lookup Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-03 13:46 ` [PATCH v5 4/7] xfs: always trim mapping to requested range for zero range Brian Foster
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

The only way zero range can currently process unwritten mappings
with dirty pagecache is to check whether the range is dirty before
mapping lookup and then flush when at least one underlying mapping
is unwritten. This ordering is required to prevent iomap lookup from
racing with folio writeback and reclaim.

Since zero range can skip ranges of unwritten mappings that are
clean in cache, this operation can be improved by allowing the
filesystem to provide a set of dirty folios that require zeroing. In
turn, rather than flush or iterate file offsets, zero range can
iterate on folios in the batch and advance over clean or uncached
ranges in between.

Add a folio_batch in struct iomap and provide a helper for
filesystems to populate the batch at lookup time. Update the folio
lookup path to return the next folio in the batch, if provided, and
advance the iter if the folio starts beyond the current offset.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 89 +++++++++++++++++++++++++++++++++++++++---
 fs/iomap/iter.c        |  6 +++
 include/linux/iomap.h  |  4 ++
 3 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 211644774bb8..a830ae5f5ac7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -761,6 +761,28 @@ static struct folio *__iomap_get_folio(struct iomap_iter *iter,
 	if (!mapping_large_folio_support(iter->inode->i_mapping))
 		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
 
+	if (iter->fbatch) {
+		struct folio *folio = folio_batch_next(iter->fbatch);
+
+		if (!folio)
+			return NULL;
+
+		/*
+		 * The folio mapping generally shouldn't have changed based on
+		 * fs locks, but be consistent with filemap lookup and retry
+		 * the iter if it does.
+		 */
+		folio_lock(folio);
+		if (unlikely(folio->mapping != iter->inode->i_mapping)) {
+			iter->iomap.flags |= IOMAP_F_STALE;
+			folio_unlock(folio);
+			return NULL;
+		}
+
+		folio_get(folio);
+		return folio;
+	}
+
 	if (write_ops && write_ops->get_folio)
 		return write_ops->get_folio(iter, pos, len);
 	return iomap_get_folio(iter, pos, len);
@@ -821,6 +843,8 @@ static int iomap_write_begin(struct iomap_iter *iter,
 	int status = 0;
 
 	len = min_not_zero(len, *plen);
+	*foliop = NULL;
+	*plen = 0;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -829,6 +853,15 @@ static int iomap_write_begin(struct iomap_iter *iter,
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
+	/*
+	 * No folio means we're done with a batch. We still have range to
+	 * process so return and let the caller iterate and refill the batch.
+	 */
+	if (!folio) {
+		WARN_ON_ONCE(!iter->fbatch);
+		return 0;
+	}
+
 	/*
 	 * Now we have a locked folio, before we do anything with it we need to
 	 * check that the iomap we have cached is not stale. The inode extent
@@ -849,6 +882,21 @@ static int iomap_write_begin(struct iomap_iter *iter,
 		}
 	}
 
+	/*
+	 * The folios in a batch may not be contiguous. If we've skipped
+	 * forward, advance the iter to the pos of the current folio. If the
+	 * folio starts beyond the end of the mapping, it may have been trimmed
+	 * since the lookup for whatever reason. Return a NULL folio to
+	 * terminate the op.
+	 */
+	if (folio_pos(folio) > iter->pos) {
+		len = min_t(u64, folio_pos(folio) - iter->pos,
+				 iomap_length(iter));
+		status = iomap_iter_advance(iter, &len);
+		if (status || !len)
+			goto out_unlock;
+	}
+
 	pos = iomap_trim_folio_range(iter, folio, poffset, &len);
 
 	if (srcmap->type == IOMAP_INLINE)
@@ -1395,6 +1443,12 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
+		/* a NULL folio means we're done with a folio batch */
+		if (!folio) {
+			status = iomap_iter_advance_full(iter);
+			break;
+		}
+
 		/* warn about zeroing folios beyond eof that won't write back */
 		WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
 
@@ -1419,6 +1473,26 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 	return status;
 }
 
+loff_t
+iomap_fill_dirty_folios(
+	struct iomap_iter	*iter,
+	loff_t			offset,
+	loff_t			length)
+{
+	struct address_space	*mapping = iter->inode->i_mapping;
+	pgoff_t			start = offset >> PAGE_SHIFT;
+	pgoff_t			end = (offset + length - 1) >> PAGE_SHIFT;
+
+	iter->fbatch = kmalloc(sizeof(struct folio_batch), GFP_KERNEL);
+	if (!iter->fbatch)
+		return offset + length;
+	folio_batch_init(iter->fbatch);
+
+	filemap_get_folios_dirty(mapping, &start, end, iter->fbatch);
+	return (start << PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(iomap_fill_dirty_folios);
+
 int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops,
@@ -1448,7 +1522,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	 * flushing on partial eof zeroing, special case it to zero the
 	 * unaligned start portion if already dirty in pagecache.
 	 */
-	if (off &&
+	if (!iter.fbatch && off &&
 	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
 		iter.len = plen;
 		while ((ret = iomap_iter(&iter, ops)) > 0)
@@ -1465,13 +1539,18 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	 * if dirty and the fs returns a mapping that might convert on
 	 * writeback.
 	 */
-	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
-					iter.pos, iter.pos + iter.len - 1);
+	range_dirty = filemap_range_needs_writeback(mapping, iter.pos,
+					iter.pos + iter.len - 1);
 	while ((ret = iomap_iter(&iter, ops)) > 0) {
 		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
 
-		if (srcmap->type == IOMAP_HOLE ||
-		    srcmap->type == IOMAP_UNWRITTEN) {
+		if (WARN_ON_ONCE(iter.fbatch &&
+				 srcmap->type != IOMAP_UNWRITTEN))
+			return -EIO;
+
+		if (!iter.fbatch &&
+		    (srcmap->type == IOMAP_HOLE ||
+		     srcmap->type == IOMAP_UNWRITTEN)) {
 			s64 status;
 
 			if (range_dirty) {
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index cef77ca0c20b..66ca12aac57d 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -8,6 +8,12 @@
 
 static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
 {
+	if (iter->fbatch) {
+		folio_batch_release(iter->fbatch);
+		kfree(iter->fbatch);
+		iter->fbatch = NULL;
+	}
+
 	iter->status = 0;
 	memset(&iter->iomap, 0, sizeof(iter->iomap));
 	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73dceabc21c8..cd0f573156d6 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/mm_types.h>
 #include <linux/blkdev.h>
+#include <linux/pagevec.h>
 
 struct address_space;
 struct fiemap_extent_info;
@@ -241,6 +242,7 @@ struct iomap_iter {
 	unsigned flags;
 	struct iomap iomap;
 	struct iomap srcmap;
+	struct folio_batch *fbatch;
 	void *private;
 };
 
@@ -349,6 +351,8 @@ bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops,
 		const struct iomap_write_ops *write_ops);
+loff_t iomap_fill_dirty_folios(struct iomap_iter *iter, loff_t offset,
+		loff_t length);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops,
 		const struct iomap_write_ops *write_ops, void *private);
-- 
2.51.0



* [PATCH v5 4/7] xfs: always trim mapping to requested range for zero range
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
                   ` (2 preceding siblings ...)
  2025-10-03 13:46 ` [PATCH v5 3/7] iomap: optional zero range dirty folio processing Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-03 13:46 ` [PATCH v5 5/7] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

Refactor and tweak the IOMAP_ZERO logic in preparation to support
filling the folio batch for unwritten mappings. Drop the superfluous
imap offset check since the hole case has already been filtered out.
Split the delalloc case handling into a sub-branch, and always
trim the imap to the requested offset/count so it can be more easily
used to bound the range to lookup in pagecache.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d3f6e3e42a11..6a05e04ad5ba 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1767,21 +1767,20 @@ xfs_buffered_write_iomap_begin(
 	}
 
 	/*
-	 * For zeroing, trim a delalloc extent that extends beyond the EOF
-	 * block.  If it starts beyond the EOF block, convert it to an
+	 * For zeroing, trim extents that extend beyond the EOF block. If a
+	 * delalloc extent starts beyond the EOF block, convert it to an
 	 * unwritten extent.
 	 */
-	if ((flags & IOMAP_ZERO) && imap.br_startoff <= offset_fsb &&
-	    isnullstartblock(imap.br_startblock)) {
+	if (flags & IOMAP_ZERO) {
 		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
 
-		if (offset_fsb >= eof_fsb)
+		if (isnullstartblock(imap.br_startblock) &&
+		    offset_fsb >= eof_fsb)
 			goto convert_delay;
-		if (end_fsb > eof_fsb) {
+		if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
 			end_fsb = eof_fsb;
-			xfs_trim_extent(&imap, offset_fsb,
-					end_fsb - offset_fsb);
-		}
+
+		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 	}
 
 	/*
-- 
2.51.0



* [PATCH v5 5/7] xfs: fill dirty folios on zero range of unwritten mappings
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
                   ` (3 preceding siblings ...)
  2025-10-03 13:46 ` [PATCH v5 4/7] xfs: always trim mapping to requested range for zero range Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-03 13:46 ` [PATCH v5 6/7] iomap: remove old partial eof zeroing optimization Brian Foster
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

Use the iomap folio batch mechanism to select folios to zero on zero
range of unwritten mappings. Trim the resulting mapping if the batch
is filled (unlikely for current use cases) to distinguish between a
range to skip and one that requires another iteration due to a full
batch. For example, if the dirty folios over a large unwritten mapping
overflow the batch, trimming the mapping to the end of the lookup means
the next iteration resumes the lookup from that point rather than
treating the rest of the range as clean and skipping it.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6a05e04ad5ba..535bf3b8705d 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1702,6 +1702,8 @@ xfs_buffered_write_iomap_begin(
 	struct iomap		*iomap,
 	struct iomap		*srcmap)
 {
+	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
+						     iomap);
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
@@ -1773,6 +1775,7 @@ xfs_buffered_write_iomap_begin(
 	 */
 	if (flags & IOMAP_ZERO) {
 		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
+		u64 end;
 
 		if (isnullstartblock(imap.br_startblock) &&
 		    offset_fsb >= eof_fsb)
@@ -1780,6 +1783,26 @@ xfs_buffered_write_iomap_begin(
 		if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
 			end_fsb = eof_fsb;
 
+		/*
+		 * Look up dirty folios for unwritten mappings within EOF.
+		 * Providing this bypasses the flush iomap uses to trigger
+		 * extent conversion when unwritten mappings have dirty
+		 * pagecache in need of zeroing.
+		 *
+		 * Trim the mapping to the end pos of the lookup, which in turn
+		 * was trimmed to the end of the batch if it became full before
+		 * the end of the mapping.
+		 */
+		if (imap.br_state == XFS_EXT_UNWRITTEN &&
+		    offset_fsb < eof_fsb) {
+			loff_t len = min(count,
+					 XFS_FSB_TO_B(mp, imap.br_blockcount));
+
+			end = iomap_fill_dirty_folios(iter, offset, len);
+			end_fsb = min_t(xfs_fileoff_t, end_fsb,
+					XFS_B_TO_FSB(mp, end));
+		}
+
 		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
 	}
 
-- 
2.51.0



* [PATCH v5 6/7] iomap: remove old partial eof zeroing optimization
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
                   ` (4 preceding siblings ...)
  2025-10-03 13:46 ` [PATCH v5 5/7] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-03 13:46 ` [PATCH v5 7/7] xfs: error tag to force zeroing on debug kernels Brian Foster
  2025-10-07 11:12 ` [PATCH v5 0/7] iomap: zero range folio batch support Christian Brauner
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

iomap_zero_range() optimizes the partial eof block zeroing use case
by force zeroing if the mapping is dirty. This is to avoid frequent
flushing on file extending workloads, which hurts performance.

Now that the folio batch mechanism provides a more generic solution
and is used by the only real zero range user (XFS), this isolated
optimization is no longer needed. Remove the unnecessary code and
let callers use the folio batch or fall back to flushing by default.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 25 -------------------------
 1 file changed, 25 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a830ae5f5ac7..51ecb6d48feb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1506,34 +1506,9 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		.private	= private,
 	};
 	struct address_space *mapping = inode->i_mapping;
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
-	loff_t plen = min_t(loff_t, len, blocksize - off);
 	int ret;
 	bool range_dirty;
 
-	/*
-	 * Zero range can skip mappings that are zero on disk so long as
-	 * pagecache is clean. If pagecache was dirty prior to zero range, the
-	 * mapping converts on writeback completion and so must be zeroed.
-	 *
-	 * The simplest way to deal with this across a range is to flush
-	 * pagecache and process the updated mappings. To avoid excessive
-	 * flushing on partial eof zeroing, special case it to zero the
-	 * unaligned start portion if already dirty in pagecache.
-	 */
-	if (!iter.fbatch && off &&
-	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
-		iter.len = plen;
-		while ((ret = iomap_iter(&iter, ops)) > 0)
-			iter.status = iomap_zero_iter(&iter, did_zero,
-					write_ops);
-
-		iter.len = len - (iter.pos - pos);
-		if (ret || !iter.len)
-			return ret;
-	}
-
 	/*
 	 * To avoid an unconditional flush, check pagecache state and only flush
 	 * if dirty and the fs returns a mapping that might convert on
-- 
2.51.0



* [PATCH v5 7/7] xfs: error tag to force zeroing on debug kernels
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
                   ` (5 preceding siblings ...)
  2025-10-03 13:46 ` [PATCH v5 6/7] iomap: remove old partial eof zeroing optimization Brian Foster
@ 2025-10-03 13:46 ` Brian Foster
  2025-10-07 11:12 ` [PATCH v5 0/7] iomap: zero range folio batch support Christian Brauner
  7 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2025-10-03 13:46 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-xfs, linux-mm, hch, djwong, willy, brauner

iomap_zero_range() has to cover various corner cases that are
difficult to exercise on production kernels because its use is fairly
limited. For example, it is currently only used by XFS, and mostly only
in partial block zeroing cases.

While it's possible to test most of these functional cases, we can
provide more robust test coverage by co-opting fallocate zero range
to invoke zeroing of the entire range instead of the more efficient
block punch/allocate sequence. Add an errortag to occasionally
invoke forced zeroing.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_errortag.h |  6 ++++--
 fs/xfs/xfs_file.c            | 29 ++++++++++++++++++++++-------
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index de840abc0bcd..57e47077c75a 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -73,7 +73,8 @@
 #define XFS_ERRTAG_WRITE_DELAY_MS			43
 #define XFS_ERRTAG_EXCHMAPS_FINISH_ONE			44
 #define XFS_ERRTAG_METAFILE_RESV_CRITICAL		45
-#define XFS_ERRTAG_MAX					46
+#define XFS_ERRTAG_FORCE_ZERO_RANGE			46
+#define XFS_ERRTAG_MAX					47
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -133,7 +134,8 @@ XFS_ERRTAG(ATTR_LEAF_TO_NODE,	attr_leaf_to_node,	1) \
 XFS_ERRTAG(WB_DELAY_MS,		wb_delay_ms,		3000) \
 XFS_ERRTAG(WRITE_DELAY_MS,	write_delay_ms,		3000) \
 XFS_ERRTAG(EXCHMAPS_FINISH_ONE,	exchmaps_finish_one,	1) \
-XFS_ERRTAG(METAFILE_RESV_CRITICAL, metafile_resv_crit,	4)
+XFS_ERRTAG(METAFILE_RESV_CRITICAL, metafile_resv_crit,	4) \
+XFS_ERRTAG(FORCE_ZERO_RANGE,	force_zero_range,	4)
 #endif /* XFS_ERRTAG */
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 2702fef2c90c..5b9864c8582e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -27,6 +27,8 @@
 #include "xfs_file.h"
 #include "xfs_aops.h"
 #include "xfs_zone_alloc.h"
+#include "xfs_error.h"
+#include "xfs_errortag.h"
 
 #include <linux/dax.h>
 #include <linux/falloc.h>
@@ -1254,23 +1256,36 @@ xfs_falloc_zero_range(
 	struct xfs_zone_alloc_ctx *ac)
 {
 	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
 	unsigned int		blksize = i_blocksize(inode);
 	loff_t			new_size = 0;
 	int			error;
 
-	trace_xfs_zero_file_space(XFS_I(inode));
+	trace_xfs_zero_file_space(ip);
 
 	error = xfs_falloc_newsize(file, mode, offset, len, &new_size);
 	if (error)
 		return error;
 
-	error = xfs_free_file_space(XFS_I(inode), offset, len, ac);
-	if (error)
-		return error;
+	/*
+	 * Zero range implements a full zeroing mechanism but is only used in
+	 * limited situations. It is more efficient to allocate unwritten
+	 * extents than to perform zeroing here, so use an errortag to randomly
+	 * force zeroing on DEBUG kernels for added test coverage.
+	 */
+	if (XFS_TEST_ERROR(ip->i_mount,
+			   XFS_ERRTAG_FORCE_ZERO_RANGE)) {
+		error = xfs_zero_range(ip, offset, len, ac, NULL);
+	} else {
+		error = xfs_free_file_space(ip, offset, len, ac);
+		if (error)
+			return error;
 
-	len = round_up(offset + len, blksize) - round_down(offset, blksize);
-	offset = round_down(offset, blksize);
-	error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+		len = round_up(offset + len, blksize) -
+			round_down(offset, blksize);
+		offset = round_down(offset, blksize);
+		error = xfs_alloc_file_space(ip, offset, len);
+	}
 	if (error)
 		return error;
 	return xfs_falloc_setsize(file, new_size);
-- 
2.51.0



* Re: [PATCH v5 0/7] iomap: zero range folio batch support
  2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
                   ` (6 preceding siblings ...)
  2025-10-03 13:46 ` [PATCH v5 7/7] xfs: error tag to force zeroing on debug kernels Brian Foster
@ 2025-10-07 11:12 ` Christian Brauner
  2025-10-21  0:14   ` Joanne Koong
  7 siblings, 1 reply; 10+ messages in thread
From: Christian Brauner @ 2025-10-07 11:12 UTC (permalink / raw)
  To: linux-fsdevel, Brian Foster
  Cc: Christian Brauner, linux-xfs, linux-mm, hch, djwong, willy

On Fri, 03 Oct 2025 09:46:34 -0400, Brian Foster wrote:
> Only minor changes in v5 to the XFS errortag patch. I've kept the R-b
> tags because the fundamental logic is the same, but the errortag
> mechanism has been reworked and so that one needed a rebase (which turns
> out much simpler). A second look certainly couldn't hurt, but otherwise
> the associated fstest still works as expected.
> 
> Note that the force zeroing fstests test has since been merged as
> xfs/131. Otherwise I still have some followup patches to this work re:
> the ext4 on iomap work, but it would be nice to move this along before
> getting too far ahead with that.
> 
> [...]

Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
Patches in the vfs-6.19.iomap branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible, patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.iomap

[1/7] filemap: add helper to look up dirty folios in a range
      https://git.kernel.org/vfs/vfs/c/757f5ca76903
[2/7] iomap: remove pos+len BUG_ON() to after folio lookup
      https://git.kernel.org/vfs/vfs/c/e027b6ecb710
[3/7] iomap: optional zero range dirty folio processing
      https://git.kernel.org/vfs/vfs/c/5a9a21cb7706
[4/7] xfs: always trim mapping to requested range for zero range
      https://git.kernel.org/vfs/vfs/c/50dc360fa097
[5/7] xfs: fill dirty folios on zero range of unwritten mappings
      https://git.kernel.org/vfs/vfs/c/492258e4508a
[6/7] iomap: remove old partial eof zeroing optimization
      https://git.kernel.org/vfs/vfs/c/47520b756355
[7/7] xfs: error tag to force zeroing on debug kernels
      https://git.kernel.org/vfs/vfs/c/87a5ca9f6c56


* Re: [PATCH v5 0/7] iomap: zero range folio batch support
  2025-10-07 11:12 ` [PATCH v5 0/7] iomap: zero range folio batch support Christian Brauner
@ 2025-10-21  0:14   ` Joanne Koong
  0 siblings, 0 replies; 10+ messages in thread
From: Joanne Koong @ 2025-10-21  0:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Brian Foster, linux-xfs, linux-mm, hch, djwong,
	willy

On Tue, Oct 7, 2025 at 4:12 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, 03 Oct 2025 09:46:34 -0400, Brian Foster wrote:
> > Only minor changes in v5 to the XFS errortag patch. I've kept the R-b
> > tags because the fundamental logic is the same, but the errortag
> > mechanism has been reworked and so that one needed a rebase (which turns
> > out much simpler). A second look certainly couldn't hurt, but otherwise
> > the associated fstest still works as expected.
> >
> > Note that the force zeroing fstests test has since been merged as
> > xfs/131. Otherwise I still have some followup patches to this work re:
> > the ext4 on iomap work, but it would be nice to move this along before
> > getting too far ahead with that.
> >
> > [...]
>
> Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
> Patches in the vfs-6.19.iomap branch should appear in linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs-6.19.iomap
>
> [1/7] filemap: add helper to look up dirty folios in a range
>       https://git.kernel.org/vfs/vfs/c/757f5ca76903
> [2/7] iomap: remove pos+len BUG_ON() to after folio lookup
>       https://git.kernel.org/vfs/vfs/c/e027b6ecb710
> [3/7] iomap: optional zero range dirty folio processing
>       https://git.kernel.org/vfs/vfs/c/5a9a21cb7706

Hi Christian,

Thanks for all your work with managing the vfs iomap branch. I noticed
that for vfs-6.19.iomap, this series was merged after a prior patch in
the branch that changed the iomap_iter_advance() interface [1]. Given
that merge ordering, I think this 3rd patch needs the minor fixup below
to be compatible with the change made in [1], if you're able to fold
it in:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 72196e5021b1..36ee3290669a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -867,7 +867,8 @@ static int iomap_write_begin(struct iomap_iter *iter,
        if (folio_pos(folio) > iter->pos) {
                len = min_t(u64, folio_pos(folio) - iter->pos,
                                 iomap_length(iter));
-               status = iomap_iter_advance(iter, &len);
+               status = iomap_iter_advance(iter, len);
+               len = iomap_length(iter);
                if (status || !len)
                        goto out_unlock;
        }

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1aJf1cgpzmDz0d+8K5gOFBpk5whqPRFsWtQ0M3dpOOJ2Q@mail.gmail.com/T/#u

> [4/7] xfs: always trim mapping to requested range for zero range
>       https://git.kernel.org/vfs/vfs/c/50dc360fa097
> [5/7] xfs: fill dirty folios on zero range of unwritten mappings
>       https://git.kernel.org/vfs/vfs/c/492258e4508a
> [6/7] iomap: remove old partial eof zeroing optimization
>       https://git.kernel.org/vfs/vfs/c/47520b756355
> [7/7] xfs: error tag to force zeroing on debug kernels
>       https://git.kernel.org/vfs/vfs/c/87a5ca9f6c56
>
