public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup
@ 2026-01-29 15:50 Brian Foster
  2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
                   ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Brian Foster @ 2026-01-29 15:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

Hi all,

Here's v2 of the iomap zero range flush cleanup patches. Patch 1 from
v1 has already been merged separately, so it has been dropped.
Otherwise there are no major changes from v1. The remaining patches
lift the flush into XFS, fix up the insert range issue that the flush
implicitly suppressed, streamline the .iomap_begin() logic a bit, and
finally replace the just-lifted flush with proper use of the folio
batch mechanism. The end result is that the flush remains in iomap
zero range purely as a fallback for callers that do not provide a
folio batch for dirty pagecache over unwritten mappings.
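
The resulting fallback can be sketched roughly as follows (condensed
pseudocode based on the patches below; not compilable as-is):

```c
/*
 * Condensed pseudocode of the end-state iomap_zero_range() loop body:
 * the flush-and-retry path only triggers when the filesystem did not
 * hand back a folio batch for a dirty unwritten mapping.
 */
if (!(iter.iomap.flags & IOMAP_F_FOLIO_BATCH) &&
    (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)) {
	if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
		/* fallback: flush, mark the mapping stale and retry */
		range_dirty = false;
		status = iomap_zero_iter_flush_and_stale(&iter);
	} else {
		/* clean range or hole: already considered zeroed (elided) */
	}
}
```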

WRT some of the discussion on v1: I looked into changing how COW
blocks over data fork holes are reported in XFS as a first step, but I
eventually ran into complexity that would essentially duplicate some
of the hacks I'm trying to clean up. For example, we'd have to decide
whether to report a hole or a "data" mapping based on pagecache state,
and by the end this series adds some of that anyway by explicitly
doing the dirty folio lookup in this scenario. I plan to revisit this
on top of this series as a standalone XFS improvement, but haven't
gotten there yet.

The other thing that is a little more annoying was the failure of the
idea to essentially prep the shift at the point where patch 2 adds an
EOF folio flush [1]. That ordering leads to potential pagecache
inconsistency because the i_size update can zero and repopulate
pagecache. I'm open to other ideas here, but otherwise haven't been
able to come up with anything more clever/simple (including futzing
around with AI for suggestions). All in all, I still think it is
clearer to have the flush isolated in insert range, where it is
actually required, than to have a flush in iomap indirectly suppress
the problem.

Thoughts, reviews, flames appreciated.

Brian

[1] https://lore.kernel.org/linux-fsdevel/20251105001445.GW196370@frogsfrogsfrogs/

v2:
- Patch 1 from v1 merged separately.
- Fixed up iomap_fill_dirty_folios() call in patch 5.
v1: https://lore.kernel.org/linux-fsdevel/20251016190303.53881-1-bfoster@redhat.com/

Brian Foster (5):
  iomap, xfs: lift zero range hole mapping flush into xfs
  xfs: flush eof folio before insert range size update
  xfs: look up cow fork extent earlier for buffered iomap_begin
  xfs: only flush when COW fork blocks overlap data fork holes
  xfs: replace zero range flush with folio batch

 fs/iomap/buffered-io.c |  6 +--
 fs/xfs/xfs_file.c      | 17 +++++++++
 fs/xfs/xfs_iomap.c     | 87 ++++++++++++++++++++++++++++++------------
 3 files changed, 81 insertions(+), 29 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-01-29 15:50 [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup Brian Foster
@ 2026-01-29 15:50 ` Brian Foster
  2026-02-10 16:15   ` Christoph Hellwig
                     ` (2 more replies)
  2026-01-29 15:50 ` [PATCH v2 2/5] xfs: flush eof folio before insert range size update Brian Foster
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 36+ messages in thread
From: Brian Foster @ 2026-01-29 15:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

iomap zero range has a wart in that it also flushes dirty pagecache
over hole mappings (rather than only unwritten mappings). This was
included to accommodate a quirk in XFS where COW fork preallocation
can exist over a hole in the data fork, and the associated range is
reported as a hole. This is because the range actually is a hole,
but XFS also has an optimization where if COW fork blocks exist for
a range being written to, those blocks are used regardless of
whether the data fork blocks are shared or not. For zeroing, COW
fork blocks over a data fork hole are only relevant if the range is
dirty in pagecache, otherwise the range is already considered
zeroed.

The easiest way to deal with this corner case is to flush the
pagecache to trigger COW remapping into the data fork, and then
operate on the updated on-disk state. The problem is that ext4
cannot accommodate a flush from this context due to being a
transaction deadlock vector.

Outside of the hole quirk, ext4 can avoid the flush for zero range
by using the recently introduced folio batch lookup mechanism for
unwritten mappings. Therefore, take the next logical step and lift
the hole handling logic into the XFS iomap_begin handler. iomap will
still flush on unwritten mappings without a folio batch, and XFS
will flush and retry mapping lookups in the case where it would
otherwise report a hole with dirty pagecache during a zero range.

Note that this is intended to be a fairly straightforward lift and
otherwise not change behavior. Now that the flush exists within XFS,
follow on patches can further optimize it.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c |  2 +-
 fs/xfs/xfs_iomap.c     | 25 ++++++++++++++++++++++---
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6beb876658c0..807384d72311 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1620,7 +1620,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		     srcmap->type == IOMAP_UNWRITTEN)) {
 			s64 status;
 
-			if (range_dirty) {
+			if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
 				range_dirty = false;
 				status = iomap_zero_iter_flush_and_stale(&iter);
 			} else {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 37a1b33e9045..896d0dd07613 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1790,6 +1790,7 @@ xfs_buffered_write_iomap_begin(
 	if (error)
 		return error;
 
+restart:
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -1817,9 +1818,27 @@ xfs_buffered_write_iomap_begin(
 	if (eof)
 		imap.br_startoff = end_fsb; /* fake hole until the end */
 
-	/* We never need to allocate blocks for zeroing or unsharing a hole. */
-	if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
-	    imap.br_startoff > offset_fsb) {
+	/* We never need to allocate blocks for unsharing a hole. */
+	if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
+		goto out_unlock;
+	}
+
+	/*
+	 * We may need to zero over a hole in the data fork if it's fronted by
+	 * COW blocks and dirty pagecache. To make sure zeroing occurs, force
+	 * writeback to remap pending blocks and restart the lookup.
+	 */
+	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+		if (filemap_range_needs_writeback(inode->i_mapping, offset,
+						  offset + count - 1)) {
+			xfs_iunlock(ip, lockmode);
+			error = filemap_write_and_wait_range(inode->i_mapping,
+						offset, offset + count - 1);
+			if (error)
+				return error;
+			goto restart;
+		}
 		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
 		goto out_unlock;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/5] xfs: flush eof folio before insert range size update
  2026-01-29 15:50 [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup Brian Foster
  2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
@ 2026-01-29 15:50 ` Brian Foster
  2026-02-10 16:16   ` Christoph Hellwig
  2026-01-29 15:50 ` [PATCH v2 3/5] xfs: look up cow fork extent earlier for buffered iomap_begin Brian Foster
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-01-29 15:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

The flush in xfs_buffered_write_iomap_begin() for zero range over a
data fork hole fronted by COW fork prealloc is primarily designed to
provide correct zeroing behavior in particular pagecache conditions.
As it turns out, this also partially masks some odd behavior in
insert range (via zero range via setattr).

Insert range bumps i_size by the length of the new range, flushes,
unmaps pagecache and cancels COW prealloc, and then right shifts
extents from the end of the file back to the target offset of the
insert. Since the i_size update occurs before the pagecache flush,
this creates a transient situation where writeback around EOF can
behave differently.

This appears to be a corner case, but if the range happens to be
fronted by COW fork speculative preallocation and a large, dirty
folio that contains at least one full COW block beyond EOF, the
writeback after i_size is bumped may remap that COW fork block into
the data fork within EOF. The block is zeroed and then shifted back
out to post-eof, but this is unexpected in that it leads to a
written post-eof data fork block. This can cause a zero range
warning on a subsequent size extension, because we should never find
blocks that require physical zeroing beyond i_size.

To avoid this quirk, flush the EOF folio before the i_size update
during insert range. The entire range will be flushed, unmapped and
invalidated anyway, so this should be relatively unnoticeable.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_file.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 7874cf745af3..1f2730558165 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1227,6 +1227,23 @@ xfs_falloc_insert_range(
 	if (offset >= isize)
 		return -EINVAL;
 
+	/*
+	 * Let writeback clean up EOF folio state before we bump i_size. The
+	 * insert flushes before it starts shifting and under certain
+	 * circumstances we can write back blocks that should technically be
+	 * considered post-eof (and thus should not be submitted for writeback).
+	 *
+	 * For example, a large, dirty folio that spans EOF and is backed by
+	 * post-eof COW fork preallocation can cause block remap into the data
+	 * fork. This shifts back out beyond EOF, but creates an unexpectedly
+	 * written post-eof block. The insert is going to flush, unmap and
+	 * cancel prealloc across this whole range, so flush EOF now before we
+	 * bump i_size to provide consistent behavior.
+	 */
+	error = filemap_write_and_wait_range(inode->i_mapping, isize, isize);
+	if (error)
+		return error;
+
 	error = xfs_falloc_setsize(file, isize + len);
 	if (error)
 		return error;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/5] xfs: look up cow fork extent earlier for buffered iomap_begin
  2026-01-29 15:50 [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup Brian Foster
  2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
  2026-01-29 15:50 ` [PATCH v2 2/5] xfs: flush eof folio before insert range size update Brian Foster
@ 2026-01-29 15:50 ` Brian Foster
  2026-02-10 16:17   ` Christoph Hellwig
  2026-01-29 15:50 ` [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
  2026-01-29 15:50 ` [PATCH v2 5/5] xfs: replace zero range flush with folio batch Brian Foster
  4 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-01-29 15:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

To further isolate the need for flushing for zero range, we need to
know whether a hole in the data fork is fronted by blocks in the COW
fork or not. COW fork lookup currently occurs further down in the
function, after the zero range case is handled.

As a preparation step, lift the COW fork extent lookup to earlier in
the function, at the same time as the data fork lookup. Only the
lookup logic is lifted. The COW fork branch/reporting logic remains
as is to avoid any observable behavior change from an iomap
reporting perspective.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 896d0dd07613..0edab7af4a10 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1809,14 +1809,29 @@ xfs_buffered_write_iomap_begin(
 		goto out_unlock;
 
 	/*
-	 * Search the data fork first to look up our source mapping.  We
-	 * always need the data fork map, as we have to return it to the
-	 * iomap code so that the higher level write code can read data in to
-	 * perform read-modify-write cycles for unaligned writes.
+	 * Search the data fork first to look up our source mapping. We always
+	 * need the data fork map, as we have to return it to the iomap code so
+	 * that the higher level write code can read data in to perform
+	 * read-modify-write cycles for unaligned writes.
+	 *
+	 * Then search the COW fork extent list even if we did not find a data
+	 * fork extent. This serves two purposes: first this implements the
+	 * speculative preallocation using cowextsize, so that we also unshare
+	 * block adjacent to shared blocks instead of just the shared blocks
+	 * themselves. Second the lookup in the extent list is generally faster
+	 * than going out to the shared extent tree.
 	 */
 	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
 	if (eof)
 		imap.br_startoff = end_fsb; /* fake hole until the end */
+	if (xfs_is_cow_inode(ip)) {
+		if (!ip->i_cowfp) {
+			ASSERT(!xfs_is_reflink_inode(ip));
+			xfs_ifork_init_cow(ip);
+		}
+		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
+				&ccur, &cmap);
+	}
 
 	/* We never need to allocate blocks for unsharing a hole. */
 	if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
@@ -1883,24 +1898,13 @@ xfs_buffered_write_iomap_begin(
 	}
 
 	/*
-	 * Search the COW fork extent list even if we did not find a data fork
-	 * extent.  This serves two purposes: first this implements the
-	 * speculative preallocation using cowextsize, so that we also unshare
-	 * block adjacent to shared blocks instead of just the shared blocks
-	 * themselves.  Second the lookup in the extent list is generally faster
-	 * than going out to the shared extent tree.
+	 * Now that we've handled any operation specific special cases, at this
+	 * point we can report a COW mapping if found.
 	 */
-	if (xfs_is_cow_inode(ip)) {
-		if (!ip->i_cowfp) {
-			ASSERT(!xfs_is_reflink_inode(ip));
-			xfs_ifork_init_cow(ip);
-		}
-		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
-				&ccur, &cmap);
-		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
-			trace_xfs_reflink_cow_found(ip, &cmap);
-			goto found_cow;
-		}
+	if (xfs_is_cow_inode(ip) &&
+	    !cow_eof && cmap.br_startoff <= offset_fsb) {
+		trace_xfs_reflink_cow_found(ip, &cmap);
+		goto found_cow;
 	}
 
 	if (imap.br_startoff <= offset_fsb) {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes
  2026-01-29 15:50 [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup Brian Foster
                   ` (2 preceding siblings ...)
  2026-01-29 15:50 ` [PATCH v2 3/5] xfs: look up cow fork extent earlier for buffered iomap_begin Brian Foster
@ 2026-01-29 15:50 ` Brian Foster
  2026-02-10 16:19   ` Christoph Hellwig
  2026-02-17 15:06   ` Nirjhar Roy (IBM)
  2026-01-29 15:50 ` [PATCH v2 5/5] xfs: replace zero range flush with folio batch Brian Foster
  4 siblings, 2 replies; 36+ messages in thread
From: Brian Foster @ 2026-01-29 15:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

The zero range hole mapping flush case has been lifted from iomap
into XFS. Now that we have more mapping context available from the
->iomap_begin() handler, we can isolate the flush further to when we
know a hole is fronted by COW blocks.

Rather than purely rely on pagecache dirty state, explicitly check
for the case where a range is a hole in both forks. Otherwise trim
to the range where there does happen to be overlap and use that for
the pagecache writeback check. This might prevent some spurious
zeroing, but more importantly makes it easier to remove the flush
entirely.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0edab7af4a10..0e82b4ec8264 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1760,10 +1760,12 @@ xfs_buffered_write_iomap_begin(
 {
 	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
 						     iomap);
+	struct address_space	*mapping = inode->i_mapping;
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
 	struct xfs_bmbt_irec	imap, cmap;
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
@@ -1831,6 +1833,8 @@ xfs_buffered_write_iomap_begin(
 		}
 		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
 				&ccur, &cmap);
+		if (!cow_eof)
+			cow_fsb = cmap.br_startoff;
 	}
 
 	/* We never need to allocate blocks for unsharing a hole. */
@@ -1845,17 +1849,37 @@ xfs_buffered_write_iomap_begin(
 	 * writeback to remap pending blocks and restart the lookup.
 	 */
 	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
-		if (filemap_range_needs_writeback(inode->i_mapping, offset,
-						  offset + count - 1)) {
+		loff_t start, end;
+
+		imap.br_blockcount = imap.br_startoff - offset_fsb;
+		imap.br_startoff = offset_fsb;
+		imap.br_startblock = HOLESTARTBLOCK;
+		imap.br_state = XFS_EXT_NORM;
+
+		if (cow_fsb == NULLFILEOFF) {
+			goto found_imap;
+		} else if (cow_fsb > offset_fsb) {
+			xfs_trim_extent(&imap, offset_fsb,
+					cow_fsb - offset_fsb);
+			goto found_imap;
+		}
+
+		/* COW fork blocks overlap the hole */
+		xfs_trim_extent(&imap, offset_fsb,
+			    cmap.br_startoff + cmap.br_blockcount - offset_fsb);
+		start = XFS_FSB_TO_B(mp, imap.br_startoff);
+		end = XFS_FSB_TO_B(mp,
+				   imap.br_startoff + imap.br_blockcount) - 1;
+		if (filemap_range_needs_writeback(mapping, start, end)) {
 			xfs_iunlock(ip, lockmode);
-			error = filemap_write_and_wait_range(inode->i_mapping,
-						offset, offset + count - 1);
+			error = filemap_write_and_wait_range(mapping, start,
+							     end);
 			if (error)
 				return error;
 			goto restart;
 		}
-		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
-		goto out_unlock;
+
+		goto found_imap;
 	}
 
 	/*
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 5/5] xfs: replace zero range flush with folio batch
  2026-01-29 15:50 [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup Brian Foster
                   ` (3 preceding siblings ...)
  2026-01-29 15:50 ` [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
@ 2026-01-29 15:50 ` Brian Foster
  2026-02-10 16:21   ` Christoph Hellwig
  4 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-01-29 15:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

Now that the zero range pagecache flush is purely isolated to
providing zeroing correctness in this case, we can remove it and
replace it with the folio batch mechanism that is used for handling
unwritten extents.

This is still slightly odd in that XFS reports a hole vs. a mapping
that reflects the COW fork extents, but that has always been the
case in this situation and so is a separate issue. We drop the iomap
warning that assumes the folio batch is always associated with
unwritten mappings, but this was mainly a development assertion; the
core iomap folio batch code doesn't otherwise care much about the
mapping type once it is handed the set of folios to process.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c |  4 ----
 fs/xfs/xfs_iomap.c     | 16 ++++------------
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 807384d72311..68eb4bc056b6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1611,10 +1611,6 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	while ((ret = iomap_iter(&iter, ops)) > 0) {
 		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
 
-		if (WARN_ON_ONCE((iter.iomap.flags & IOMAP_F_FOLIO_BATCH) &&
-				 srcmap->type != IOMAP_UNWRITTEN))
-			return -EIO;
-
 		if (!(iter.iomap.flags & IOMAP_F_FOLIO_BATCH) &&
 		    (srcmap->type == IOMAP_HOLE ||
 		     srcmap->type == IOMAP_UNWRITTEN)) {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0e82b4ec8264..49ab4edf5ec3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1760,7 +1760,6 @@ xfs_buffered_write_iomap_begin(
 {
 	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
 						     iomap);
-	struct address_space	*mapping = inode->i_mapping;
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
@@ -1792,7 +1791,6 @@ xfs_buffered_write_iomap_begin(
 	if (error)
 		return error;
 
-restart:
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -1868,16 +1866,10 @@ xfs_buffered_write_iomap_begin(
 		xfs_trim_extent(&imap, offset_fsb,
 			    cmap.br_startoff + cmap.br_blockcount - offset_fsb);
 		start = XFS_FSB_TO_B(mp, imap.br_startoff);
-		end = XFS_FSB_TO_B(mp,
-				   imap.br_startoff + imap.br_blockcount) - 1;
-		if (filemap_range_needs_writeback(mapping, start, end)) {
-			xfs_iunlock(ip, lockmode);
-			error = filemap_write_and_wait_range(mapping, start,
-							     end);
-			if (error)
-				return error;
-			goto restart;
-		}
+		end = XFS_FSB_TO_B(mp, imap.br_startoff + imap.br_blockcount);
+		iomap_fill_dirty_folios(iter, &start, end, &iomap_flags);
+		xfs_trim_extent(&imap, offset_fsb,
+				XFS_B_TO_FSB(mp, start) - offset_fsb);
 
 		goto found_imap;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
@ 2026-02-10 16:15   ` Christoph Hellwig
  2026-02-10 19:14     ` Brian Foster
  2026-02-13  6:06   ` Christoph Hellwig
  2026-02-13 10:20   ` Nirjhar Roy (IBM)
  2 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-10 16:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

This looks sensible to me.  I'll add it to my zoned xfs test queue
to make sure it doesn't break anything there, but as the zoned
iomap_begin always allocates new space it should be fine.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/5] xfs: flush eof folio before insert range size update
  2026-01-29 15:50 ` [PATCH v2 2/5] xfs: flush eof folio before insert range size update Brian Foster
@ 2026-02-10 16:16   ` Christoph Hellwig
  2026-02-10 19:14     ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-10 16:16 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Thu, Jan 29, 2026 at 10:50:25AM -0500, Brian Foster wrote:
> The flush in xfs_buffered_write_iomap_begin() for zero range over a
> data fork hole fronted by COW fork prealloc is primarily designed to
> provide correct zeroing behavior in particular pagecache conditions.
> As it turns out, this also partially masks some odd behavior in
> insert range (via zero range via setattr).
> 
> Insert range bumps i_size by the length of the new range, flushes,
> unmaps pagecache and cancels COW prealloc, and then right shifts
> extents from the end of the file back to the target offset of the
> insert. Since the i_size update occurs before the pagecache flush,
> this creates a transient situation where writeback around EOF can
> behave differently.

Eww.

> To avoid this quirk, flush the EOF folio before the i_size update
> during insert range. The entire range will be flushed, unmapped and
> invalidated anyway, so this should be relatively unnoticeable.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/5] xfs: look up cow fork extent earlier for buffered iomap_begin
  2026-01-29 15:50 ` [PATCH v2 3/5] xfs: look up cow fork extent earlier for buffered iomap_begin Brian Foster
@ 2026-02-10 16:17   ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-10 16:17 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes
  2026-01-29 15:50 ` [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
@ 2026-02-10 16:19   ` Christoph Hellwig
  2026-02-10 19:18     ` Brian Foster
  2026-02-17 15:06   ` Nirjhar Roy (IBM)
  1 sibling, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-10 16:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Thu, Jan 29, 2026 at 10:50:27AM -0500, Brian Foster wrote:
> The zero range hole mapping flush case has been lifted from iomap
> into XFS. Now that we have more mapping context available from the
> ->iomap_begin() handler, we can isolate the flush further to when we
> know a hole is fronted by COW blocks.
> 
> Rather than purely rely on pagecache dirty state, explicitly check
> for the case where a range is a hole in both forks. Otherwise trim
> to the range where there does happen to be overlap and use that for
> the pagecache writeback check. This might prevent some spurious
> zeroing, but more importantly makes it easier to remove the flush
> entirely.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 0edab7af4a10..0e82b4ec8264 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1760,10 +1760,12 @@ xfs_buffered_write_iomap_begin(
>  {
>  	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
>  						     iomap);
> +	struct address_space	*mapping = inode->i_mapping;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> +	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
>  	struct xfs_bmbt_irec	imap, cmap;
>  	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
> @@ -1831,6 +1833,8 @@ xfs_buffered_write_iomap_begin(
>  		}
>  		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
>  				&ccur, &cmap);
> +		if (!cow_eof)
> +			cow_fsb = cmap.br_startoff;
>  	}
>  
>  	/* We never need to allocate blocks for unsharing a hole. */
> @@ -1845,17 +1849,37 @@ xfs_buffered_write_iomap_begin(
>  	 * writeback to remap pending blocks and restart the lookup.
>  	 */
>  	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> -		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> -						  offset + count - 1)) {
> +		loff_t start, end;
> +
> +		imap.br_blockcount = imap.br_startoff - offset_fsb;
> +		imap.br_startoff = offset_fsb;
> +		imap.br_startblock = HOLESTARTBLOCK;
> +		imap.br_state = XFS_EXT_NORM;
> +
> +		if (cow_fsb == NULLFILEOFF) {
> +			goto found_imap;
> +		} else if (cow_fsb > offset_fsb) {

No need for an else after a goto.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

I wonder if at some point the zeroing logic should be split into a
separate helper..

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/5] xfs: replace zero range flush with folio batch
  2026-01-29 15:50 ` [PATCH v2 5/5] xfs: replace zero range flush with folio batch Brian Foster
@ 2026-02-10 16:21   ` Christoph Hellwig
  2026-02-10 19:19     ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-10 16:21 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Thu, Jan 29, 2026 at 10:50:28AM -0500, Brian Foster wrote:
> Now that the zero range pagecache flush is purely isolated to
> providing zeroing correctness in this case, we can remove it and
> replace it with the folio batch mechanism that is used for handling
> unwritten extents.
> 
> This is still slightly odd in that XFS reports a hole vs. a mapping
> that reflects the COW fork extents, but that has always been the
> case in this situation and so is a separate issue

I wish we could fix it eventually.  One thing that came to my mind
a few times would be to do all writes through the COW fork, and
instead rename it to a 'staging' or similar fork, because it would
then contain all data not yet recorded into the inode.  But that
would be a huge lift..

The patch itself looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-02-10 16:15   ` Christoph Hellwig
@ 2026-02-10 19:14     ` Brian Foster
  2026-02-11 15:36       ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-02-10 19:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Tue, Feb 10, 2026 at 08:15:50AM -0800, Christoph Hellwig wrote:
> This looks sensible to me.  I'll add it to my zoned xfs test queue
> to make sure it doesn't break anything there, but as the zoned
> iomap_begin always allocates new space it should be fine.
> 
> 

Thanks. As you've probably noticed from looking through the rest of the
patches, this one is mainly a lift-and-shift initial step from iomap to
xfs. Let me know if anything breaks.

Brian


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/5] xfs: flush eof folio before insert range size update
  2026-02-10 16:16   ` Christoph Hellwig
@ 2026-02-10 19:14     ` Brian Foster
  0 siblings, 0 replies; 36+ messages in thread
From: Brian Foster @ 2026-02-10 19:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Tue, Feb 10, 2026 at 08:16:48AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 29, 2026 at 10:50:25AM -0500, Brian Foster wrote:
> > The flush in xfs_buffered_write_iomap_begin() for zero range over a
> > data fork hole fronted by COW fork prealloc is primarily designed to
> > provide correct zeroing behavior in particular pagecache conditions.
> > As it turns out, this also partially masks some odd behavior in
> > insert range (via zero range via setattr).
> > 
> > Insert range bumps i_size the length of the new range, flushes,
> > unmaps pagecache and cancels COW prealloc, and then right shifts
> > extents from the end of the file back to the target offset of the
> > insert. Since the i_size update occurs before the pagecache flush,
> > this creates a transient situation where writeback around EOF can
> > behave differently.
> 
> Eww.
> 

Yes. I'd still like to have something better here. I just don't know
what that is. I'm hoping that more eyes on this little quirk will
eventually lead to some better ideas on the insert range side.

Thanks for the review.

Brian

> > To avoid this quirk, flush the EOF folio before the i_size update
> > during insert range. The entire range will be flushed, unmapped and
> > invalidated anyways, so this should be relatively unnoticeable.
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes
  2026-02-10 16:19   ` Christoph Hellwig
@ 2026-02-10 19:18     ` Brian Foster
  0 siblings, 0 replies; 36+ messages in thread
From: Brian Foster @ 2026-02-10 19:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Tue, Feb 10, 2026 at 08:19:09AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 29, 2026 at 10:50:27AM -0500, Brian Foster wrote:
> > The zero range hole mapping flush case has been lifted from iomap
> > into XFS. Now that we have more mapping context available from the
> > ->iomap_begin() handler, we can isolate the flush further to when we
> > know a hole is fronted by COW blocks.
> > 
> > Rather than purely rely on pagecache dirty state, explicitly check
> > for the case where a range is a hole in both forks. Otherwise trim
> > to the range where there does happen to be overlap and use that for
> > the pagecache writeback check. This might prevent some spurious
> > zeroing, but more importantly makes it easier to remove the flush
> > entirely.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++------
> >  1 file changed, 30 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 0edab7af4a10..0e82b4ec8264 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
...
> > @@ -1845,17 +1849,37 @@ xfs_buffered_write_iomap_begin(
> >  	 * writeback to remap pending blocks and restart the lookup.
> >  	 */
> >  	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > -		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> > -						  offset + count - 1)) {
> > +		loff_t start, end;
> > +
> > +		imap.br_blockcount = imap.br_startoff - offset_fsb;
> > +		imap.br_startoff = offset_fsb;
> > +		imap.br_startblock = HOLESTARTBLOCK;
> > +		imap.br_state = XFS_EXT_NORM;
> > +
> > +		if (cow_fsb == NULLFILEOFF) {
> > +			goto found_imap;
> > +		} else if (cow_fsb > offset_fsb) {
> 
> No need for an else after a goto.
> 

I'll tweak that and send a new version with R-b tags if I don't hear
about anything from your zoned tests after a few days.

> Otherwise this looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> I wonder if at some point the zeroing logic should be split into a
> separate helper..
> 

Do you mean a standalone .iomap_begin() for zeroing ops? If so, I've had
a similar thought related to the comments on the next patch...

Brian


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/5] xfs: replace zero range flush with folio batch
  2026-02-10 16:21   ` Christoph Hellwig
@ 2026-02-10 19:19     ` Brian Foster
  2026-02-11 15:41       ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-02-10 19:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Tue, Feb 10, 2026 at 08:21:04AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 29, 2026 at 10:50:28AM -0500, Brian Foster wrote:
> > Now that the zero range pagecache flush is purely isolated to
> > providing zeroing correctness in this case, we can remove it and
> > replace it with the folio batch mechanism that is used for handling
> > unwritten extents.
> > 
> > This is still slightly odd in that XFS reports a hole vs. a mapping
> > that reflects the COW fork extents, but that has always been the
> > case in this situation and so a separate issue
> 
> I wish we could fix it eventually.  One thing that came to my mind
> a few times would be to do all writes through the COW fork, and
> instead renaming it a 'staging' or similar fork, because it then
> contains all data not recorded into the inode yet.  But that would
> be a huge lift..
> 

I failed to rework this as an initial step (as discussed in the prior
version of this series), but I got back to playing with it a bit after
posting this version and I think I have a prototype that so far seems to
work on top of this series.

For context, the idea is basically that instead of always reporting COW
fork blocks over data fork holes as a hole mapping, we report it as a
shared/COW mapping when the conditions that require zeroing are met:
1. the folio batch is not empty, and 2. the mapping is fully within
EOF. The caveat to the latter is
that we also have to split at the EOF boundary the same way the existing
zero range code does to trigger post-eof delalloc conversion.
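A rough, hypothetical distillation of that decision (names and types are mine, not actual XFS code):

```c
#include <stdbool.h>

enum zr_report { REPORT_HOLE, REPORT_COW };

/*
 * Toy sketch of the conditions described above: COW fork blocks over a
 * data fork hole are reported as a COW mapping only when (1) the folio
 * batch lookup found folios (i.e. dirty pagecache that needs zeroing)
 * and (2) the mapping sits fully within EOF.  Post-EOF ranges would be
 * split off, as the existing zero range code does.
 */
static enum zr_report zero_range_report(bool fbatch_nonempty,
					long long map_end, long long eof)
{
	if (fbatch_nonempty && map_end <= eof)
		return REPORT_COW;
	return REPORT_HOLE;
}
```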

This adds a bit more code in the same area as this series. It doesn't
seem terrible so far, but it was one reason I was wondering if this
perhaps warranted splitting off its own callback. The behavior for zero
range here is unique enough from standard read/write ops such that might
be a readability improvement. Since I'm not the only person with that
thought, I'll take a proper look and see if it's worth a prototype to go
along with the mapping behavior change..

Brian

> The patch itself looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-02-10 19:14     ` Brian Foster
@ 2026-02-11 15:36       ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-11 15:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Tue, Feb 10, 2026 at 02:14:13PM -0500, Brian Foster wrote:
> On Tue, Feb 10, 2026 at 08:15:50AM -0800, Christoph Hellwig wrote:
> > This looks sensible to me.  I'll add it to my zoned xfs test queue
> > to make sure it doesn't break anything there, but as the zoned
> > iomap_begin always allocates new space it should be fine.
> > 
> > 
> 
> Thanks. As you've probably noticed from looking through the rest of the
> patches, this one is mainly a lift-and-shift initial step from iomap to
> xfs. Let me know if anything breaks.

Well, to one of the iomap_begin instances.  We have an entirely
separate one for zoned mode, and we also fall back to the direct I/O
one when a rtextsize is set.  I don't think either needs the moved
handling, but I want to be sure.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/5] xfs: replace zero range flush with folio batch
  2026-02-10 19:19     ` Brian Foster
@ 2026-02-11 15:41       ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-11 15:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Tue, Feb 10, 2026 at 02:19:53PM -0500, Brian Foster wrote:
> This adds a bit more code in the same area as this series. It doesn't
> seem terrible so far, but it was one reason I was wondering if this
> perhaps warranted splitting off its own callback. The behavior for zero
> range here is unique enough from standard read/write ops that it might
> be a readability improvement. Since I'm not the only person with that
> thought, I'll take a proper look and see if it's worth a prototype to go
> along with the mapping behavior change..

Yeah, a separate handler might be cleaner.  Note that in
xfs_buffered_write_iomap_begin, the zeroing special case is also
a fairly large part of the function.

One thing that might help here is my old idea of splitting the lookup of
the existing mapping and the creation of a new mapping for dirtied data into
separate nested iomap iterations, preferably with a way to pass along the
mapping for in-place updates.  That way all this zeroing special casing
would go away, as we'd only perform an action if the read side lookup
returned an extent.  But so far this is just a crazy idea in my mind,
and I'm not really sure how well it would work in practice.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
  2026-02-10 16:15   ` Christoph Hellwig
@ 2026-02-13  6:06   ` Christoph Hellwig
  2026-02-13 17:37     ` Brian Foster
  2026-02-13 10:20   ` Nirjhar Roy (IBM)
  2 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-02-13  6:06 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

This patch makes generic/363 and xfs/131 fail on zoned XFS file systems.
A later patch also makes generic/127 fail, but I haven't fully bisected
the cause yet.

I'll see if I can make sense of it, but unfortunately I've got a fair
amount of travel coming up, so it might take a while.

If you want to give it a spin yourself, you can just add '-r zoned=1'
to the mkfs arguments for the test and scratch device.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
  2026-02-10 16:15   ` Christoph Hellwig
  2026-02-13  6:06   ` Christoph Hellwig
@ 2026-02-13 10:20   ` Nirjhar Roy (IBM)
  2026-02-13 16:24     ` Darrick J. Wong
  2 siblings, 1 reply; 36+ messages in thread
From: Nirjhar Roy (IBM) @ 2026-02-13 10:20 UTC (permalink / raw)
  To: Brian Foster, linux-fsdevel, linux-xfs

On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
> iomap zero range has a wart in that it also flushes dirty pagecache
> over hole mappings (rather than only unwritten mappings). This was
> included to accommodate a quirk in XFS where COW fork preallocation
> can exist over a hole in the data fork, and the associated range is
> reported as a hole. This is because the range actually is a hole,
> but XFS also has an optimization where if COW fork blocks exist for
> a range being written to, those blocks are used regardless of
> whether the data fork blocks are shared or not. For zeroing, COW
> fork blocks over a data fork hole are only relevant if the range is
> dirty in pagecache, otherwise the range is already considered
> zeroed.
> 
> The easiest way to deal with this corner case is to flush the
> pagecache to trigger COW remapping into the data fork, and then
> operate on the updated on-disk state. The problem is that ext4
> cannot accommodate a flush from this context due to being a
> transaction deadlock vector.
> 
> Outside of the hole quirk, ext4 can avoid the flush for zero range
> by using the recently introduced folio batch lookup mechanism for
> unwritten mappings. Therefore, take the next logical step and lift
> the hole handling logic into the XFS iomap_begin handler. iomap will
> still flush on unwritten mappings without a folio batch, and XFS
> will flush and retry mapping lookups in the case where it would
> otherwise report a hole with dirty pagecache during a zero range.
> 
> Note that this is intended to be a fairly straightforward lift and
> otherwise not change behavior. Now that the flush exists within XFS,
> follow on patches can further optimize it.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/iomap/buffered-io.c |  2 +-
>  fs/xfs/xfs_iomap.c     | 25 ++++++++++++++++++++++---
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 6beb876658c0..807384d72311 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1620,7 +1620,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  		     srcmap->type == IOMAP_UNWRITTEN)) {
>  			s64 status;
>  
> -			if (range_dirty) {
> +			if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
>  				range_dirty = false;
>  				status = iomap_zero_iter_flush_and_stale(&iter);
>  			} else {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 37a1b33e9045..896d0dd07613 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1790,6 +1790,7 @@ xfs_buffered_write_iomap_begin(
>  	if (error)
>  		return error;
>  
> +restart:
>  	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
>  	if (error)
>  		return error;
> @@ -1817,9 +1818,27 @@ xfs_buffered_write_iomap_begin(
>  	if (eof)
>  		imap.br_startoff = end_fsb; /* fake hole until the end */
>  
> -	/* We never need to allocate blocks for zeroing or unsharing a hole. */
> -	if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
> -	    imap.br_startoff > offset_fsb) {
> +	/* We never need to allocate blocks for unsharing a hole. */
> +	if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
> +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * We may need to zero over a hole in the data fork if it's fronted by
> +	 * COW blocks and dirty pagecache. To make sure zeroing occurs, force
> +	 * writeback to remap pending blocks and restart the lookup.
> +	 */
> +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> +		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> +						  offset + count - 1)) {
> +			xfs_iunlock(ip, lockmode);

I am a bit new to this section of the code - so a naive question:
Why do we need to unlock the inode here? Shouldn't the mappings be thread safe while the write/flush
is going on?
--NR
> +			error = filemap_write_and_wait_range(inode->i_mapping,
> +						offset, offset + count - 1);
> +			if (error)
> +				return error;
> +			goto restart;
> +		}
>  		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
>  		goto out_unlock;
>  	}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-02-13 10:20   ` Nirjhar Roy (IBM)
@ 2026-02-13 16:24     ` Darrick J. Wong
  2026-02-18 17:41       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2026-02-13 16:24 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: Brian Foster, linux-fsdevel, linux-xfs

On Fri, Feb 13, 2026 at 03:50:07PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
> > iomap zero range has a wart in that it also flushes dirty pagecache
> > over hole mappings (rather than only unwritten mappings). This was
> > included to accommodate a quirk in XFS where COW fork preallocation
> > can exist over a hole in the data fork, and the associated range is
> > reported as a hole. This is because the range actually is a hole,
> > but XFS also has an optimization where if COW fork blocks exist for
> > a range being written to, those blocks are used regardless of
> > whether the data fork blocks are shared or not. For zeroing, COW
> > fork blocks over a data fork hole are only relevant if the range is
> > dirty in pagecache, otherwise the range is already considered
> > zeroed.
> > 
> > The easiest way to deal with this corner case is to flush the
> > pagecache to trigger COW remapping into the data fork, and then
> > operate on the updated on-disk state. The problem is that ext4
> > cannot accommodate a flush from this context due to being a
> > transaction deadlock vector.
> > 
> > Outside of the hole quirk, ext4 can avoid the flush for zero range
> > by using the recently introduced folio batch lookup mechanism for
> > unwritten mappings. Therefore, take the next logical step and lift
> > the hole handling logic into the XFS iomap_begin handler. iomap will
> > still flush on unwritten mappings without a folio batch, and XFS
> > will flush and retry mapping lookups in the case where it would
> > otherwise report a hole with dirty pagecache during a zero range.
> > 
> > Note that this is intended to be a fairly straightforward lift and
> > otherwise not change behavior. Now that the flush exists within XFS,
> > follow on patches can further optimize it.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/iomap/buffered-io.c |  2 +-
> >  fs/xfs/xfs_iomap.c     | 25 ++++++++++++++++++++++---
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 6beb876658c0..807384d72311 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1620,7 +1620,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> >  		     srcmap->type == IOMAP_UNWRITTEN)) {
> >  			s64 status;
> >  
> > -			if (range_dirty) {
> > +			if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
> >  				range_dirty = false;
> >  				status = iomap_zero_iter_flush_and_stale(&iter);
> >  			} else {
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 37a1b33e9045..896d0dd07613 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1790,6 +1790,7 @@ xfs_buffered_write_iomap_begin(
> >  	if (error)
> >  		return error;
> >  
> > +restart:
> >  	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
> >  	if (error)
> >  		return error;
> > @@ -1817,9 +1818,27 @@ xfs_buffered_write_iomap_begin(
> >  	if (eof)
> >  		imap.br_startoff = end_fsb; /* fake hole until the end */
> >  
> > -	/* We never need to allocate blocks for zeroing or unsharing a hole. */
> > -	if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
> > -	    imap.br_startoff > offset_fsb) {
> > +	/* We never need to allocate blocks for unsharing a hole. */
> > +	if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
> > +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > +		goto out_unlock;
> > +	}
> > +
> > +	/*
> > +	 * We may need to zero over a hole in the data fork if it's fronted by
> > +	 * COW blocks and dirty pagecache. To make sure zeroing occurs, force
> > +	 * writeback to remap pending blocks and restart the lookup.
> > +	 */
> > +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > +		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> > +						  offset + count - 1)) {
> > +			xfs_iunlock(ip, lockmode);
> 
> I am a bit new to this section of the code - so a naive question:
> Why do we need to unlock the inode here? Shouldn't the mappings be thread safe while the write/flush
> is going on?

Writeback takes XFS_ILOCK, which we currently hold here (possibly in
exclusive mode) so we must drop it to write(back) and wait.
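As a toy model of that pattern (plain pthread mutex standing in for XFS_ILOCK; names are illustrative, not the real XFS API): holding the lock across the flush would self-deadlock, so the caller drops it, flushes, and restarts the lookup.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for XFS_ILOCK and pending pagecache state. */
static pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;
static bool needs_writeback = true;

static void writeback_toy(void)
{
	/* Writeback takes the inode lock itself, so the caller
	 * must not be holding it when this runs. */
	pthread_mutex_lock(&ilock);
	needs_writeback = false;
	pthread_mutex_unlock(&ilock);
}

static int iomap_begin_toy(void)
{
restart:
	pthread_mutex_lock(&ilock);
	if (needs_writeback) {
		pthread_mutex_unlock(&ilock);	/* drop before flushing */
		writeback_toy();
		goto restart;			/* redo the lookup */
	}
	pthread_mutex_unlock(&ilock);
	return 0;
}
```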

--D

> --NR
> > +			error = filemap_write_and_wait_range(inode->i_mapping,
> > +						offset, offset + count - 1);
> > +			if (error)
> > +				return error;
> > +			goto restart;
> > +		}
> >  		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> >  		goto out_unlock;
> >  	}
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-02-13  6:06   ` Christoph Hellwig
@ 2026-02-13 17:37     ` Brian Foster
  2026-03-02 19:02       ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-02-13 17:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Thu, Feb 12, 2026 at 10:06:50PM -0800, Christoph Hellwig wrote:
> This patch makes generic/363 and xfs/131 fail on zoned XFS file systems.
> A later patch also makes generic/127, but I haven't fully bisected
> the cause yet.
> 
> I'll see if I can make sense of it, but unfortunately I've got a fair
> amount of travel coming up, so it might take a while.
> 
> If you want to give it a spin yourself, you can just add '-r zoned=1'
> to the mkfs arguments for the test and scratch device.
> 
> 

Ok, thanks for testing and reporting. I'm a little buried atm as well,
but this isn't the most pressing series anyways. I'll dig into these
tests when I spin back to this..

Brian


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes
  2026-01-29 15:50 ` [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
  2026-02-10 16:19   ` Christoph Hellwig
@ 2026-02-17 15:06   ` Nirjhar Roy (IBM)
  2026-02-18 15:37     ` Brian Foster
  1 sibling, 1 reply; 36+ messages in thread
From: Nirjhar Roy (IBM) @ 2026-02-17 15:06 UTC (permalink / raw)
  To: Brian Foster, linux-fsdevel, linux-xfs

On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
> The zero range hole mapping flush case has been lifted from iomap
> into XFS. Now that we have more mapping context available from the
> ->iomap_begin() handler, we can isolate the flush further to when we
> know a hole is fronted by COW blocks.
> 
> Rather than purely rely on pagecache dirty state, explicitly check
> for the case where a range is a hole in both forks. Otherwise trim
> to the range where there does happen to be overlap and use that for
> the pagecache writeback check. This might prevent some spurious
> zeroing, but more importantly makes it easier to remove the flush
> entirely.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 0edab7af4a10..0e82b4ec8264 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1760,10 +1760,12 @@ xfs_buffered_write_iomap_begin(
>  {
>  	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
>  						     iomap);
> +	struct address_space	*mapping = inode->i_mapping;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> +	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
>  	struct xfs_bmbt_irec	imap, cmap;
>  	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
> @@ -1831,6 +1833,8 @@ xfs_buffered_write_iomap_begin(
>  		}
>  		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
>  				&ccur, &cmap);
> +		if (!cow_eof)
> +			cow_fsb = cmap.br_startoff;
>  	}
>  
>  	/* We never need to allocate blocks for unsharing a hole. */
> @@ -1845,17 +1849,37 @@ xfs_buffered_write_iomap_begin(
>  	 * writeback to remap pending blocks and restart the lookup.
>  	 */
>  	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> -		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> -						  offset + count - 1)) {
> +		loff_t start, end;

Nit: Tab between data type and identifier?

> +
> +		imap.br_blockcount = imap.br_startoff - offset_fsb;
> +		imap.br_startoff = offset_fsb;
> +		imap.br_startblock = HOLESTARTBLOCK;
> +		imap.br_state = XFS_EXT_NORM;
> +
> +		if (cow_fsb == NULLFILEOFF) {
> +			goto found_imap;
> +		} else if (cow_fsb > offset_fsb) {
> +			xfs_trim_extent(&imap, offset_fsb,
> +					cow_fsb - offset_fsb);
> +			goto found_imap;
> +		}
> +
> +		/* COW fork blocks overlap the hole */
> +		xfs_trim_extent(&imap, offset_fsb,
> +			    cmap.br_startoff + cmap.br_blockcount - offset_fsb);
> +		start = XFS_FSB_TO_B(mp, imap.br_startoff);
> +		end = XFS_FSB_TO_B(mp,
> +				   imap.br_startoff + imap.br_blockcount) - 1;

So, we are including the bytes in the block number (imap.br_startoff + imap.br_blockcount - 1)th,
right? That is why a -1 after XFS_FSB_TO_B()? 
--NR
> +		if (filemap_range_needs_writeback(mapping, start, end)) {
>  			xfs_iunlock(ip, lockmode);
> -			error = filemap_write_and_wait_range(inode->i_mapping,
> -						offset, offset + count - 1);
> +			error = filemap_write_and_wait_range(mapping, start,
> +							     end);
>  			if (error)
>  				return error;
>  			goto restart;
>  		}
> -		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> -		goto out_unlock;
> +
> +		goto found_imap;
>  	}
>  
>  	/*


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes
  2026-02-17 15:06   ` Nirjhar Roy (IBM)
@ 2026-02-18 15:37     ` Brian Foster
  2026-02-18 17:40       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-02-18 15:37 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: linux-fsdevel, linux-xfs

On Tue, Feb 17, 2026 at 08:36:50PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
> > The zero range hole mapping flush case has been lifted from iomap
> > into XFS. Now that we have more mapping context available from the
> > ->iomap_begin() handler, we can isolate the flush further to when we
> > know a hole is fronted by COW blocks.
> > 
> > Rather than purely rely on pagecache dirty state, explicitly check
> > for the case where a range is a hole in both forks. Otherwise trim
> > to the range where there does happen to be overlap and use that for
> > the pagecache writeback check. This might prevent some spurious
> > zeroing, but more importantly makes it easier to remove the flush
> > entirely.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++------
> >  1 file changed, 30 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 0edab7af4a10..0e82b4ec8264 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1760,10 +1760,12 @@ xfs_buffered_write_iomap_begin(
> >  {
> >  	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
> >  						     iomap);
> > +	struct address_space	*mapping = inode->i_mapping;
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> >  	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> > +	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
> >  	struct xfs_bmbt_irec	imap, cmap;
> >  	struct xfs_iext_cursor	icur, ccur;
> >  	xfs_fsblock_t		prealloc_blocks = 0;
> > @@ -1831,6 +1833,8 @@ xfs_buffered_write_iomap_begin(
> >  		}
> >  		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
> >  				&ccur, &cmap);
> > +		if (!cow_eof)
> > +			cow_fsb = cmap.br_startoff;
> >  	}
> >  
> >  	/* We never need to allocate blocks for unsharing a hole. */
> > @@ -1845,17 +1849,37 @@ xfs_buffered_write_iomap_begin(
> >  	 * writeback to remap pending blocks and restart the lookup.
> >  	 */
> >  	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > -		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> > -						  offset + count - 1)) {
> > +		loff_t start, end;
> 
> Nit: Tab between data type and identifier?
> 

Sure.

> > +
> > +		imap.br_blockcount = imap.br_startoff - offset_fsb;
> > +		imap.br_startoff = offset_fsb;
> > +		imap.br_startblock = HOLESTARTBLOCK;
> > +		imap.br_state = XFS_EXT_NORM;
> > +
> > +		if (cow_fsb == NULLFILEOFF) {
> > +			goto found_imap;
> > +		} else if (cow_fsb > offset_fsb) {
> > +			xfs_trim_extent(&imap, offset_fsb,
> > +					cow_fsb - offset_fsb);
> > +			goto found_imap;
> > +		}
> > +
> > +		/* COW fork blocks overlap the hole */
> > +		xfs_trim_extent(&imap, offset_fsb,
> > +			    cmap.br_startoff + cmap.br_blockcount - offset_fsb);
> > +		start = XFS_FSB_TO_B(mp, imap.br_startoff);
> > +		end = XFS_FSB_TO_B(mp,
> > +				   imap.br_startoff + imap.br_blockcount) - 1;
> 
> So, we are including the bytes in the block number (imap.br_startoff + imap.br_blockcount - 1)th,
> right? That is why a -1 after XFS_FSB_TO_B()? 

Not sure I follow what you mean by the "bytes in the block number"
phrasing..? Anyways, the XFS_FSB_TO_B() here should return the starting
byte offset of the first block beyond the range (exclusive). The -1
changes that to the last byte offset of the range we're interested in
(inclusive), which I believe is what the filemap api wants..
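In other words, something like the following sketch (the real XFS_FSB_TO_B() uses the mount's block log; the 4096-byte block size here is an assumption for illustration):

```c
#include <stdint.h>

/* Hypothetical stand-in for XFS_FSB_TO_B() with 4096-byte fs blocks. */
#define BLOCKLOG 12
static uint64_t fsb_to_b(uint64_t fsb)
{
	return fsb << BLOCKLOG;
}

/* Start of the range covering blocks [startoff, startoff + blockcount),
 * in bytes, inclusive. */
static uint64_t range_start(uint64_t startoff)
{
	return fsb_to_b(startoff);
}

/*
 * fsb_to_b(startoff + blockcount) is the first byte *past* the range,
 * so subtracting 1 yields the inclusive last byte that the filemap
 * range helpers expect.
 */
static uint64_t range_end(uint64_t startoff, uint64_t blockcount)
{
	return fsb_to_b(startoff + blockcount) - 1;
}
```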

Brian

> --NR
> > +		if (filemap_range_needs_writeback(mapping, start, end)) {
> >  			xfs_iunlock(ip, lockmode);
> > -			error = filemap_write_and_wait_range(inode->i_mapping,
> > -						offset, offset + count - 1);
> > +			error = filemap_write_and_wait_range(mapping, start,
> > +							     end);
> >  			if (error)
> >  				return error;
> >  			goto restart;
> >  		}
> > -		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > -		goto out_unlock;
> > +
> > +		goto found_imap;
> >  	}
> >  
> >  	/*
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes
  2026-02-18 15:37     ` Brian Foster
@ 2026-02-18 17:40       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 36+ messages in thread
From: Nirjhar Roy (IBM) @ 2026-02-18 17:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs


On 2/18/26 21:07, Brian Foster wrote:
> On Tue, Feb 17, 2026 at 08:36:50PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
>>> The zero range hole mapping flush case has been lifted from iomap
>>> into XFS. Now that we have more mapping context available from the
>>> ->iomap_begin() handler, we can isolate the flush further to when we
>>> know a hole is fronted by COW blocks.
>>>
>>> Rather than purely rely on pagecache dirty state, explicitly check
>>> for the case where a range is a hole in both forks. Otherwise trim
>>> to the range where there does happen to be overlap and use that for
>>> the pagecache writeback check. This might prevent some spurious
>>> zeroing, but more importantly makes it easier to remove the flush
>>> entirely.
>>>
>>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>>> ---
>>>   fs/xfs/xfs_iomap.c | 36 ++++++++++++++++++++++++++++++------
>>>   1 file changed, 30 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>>> index 0edab7af4a10..0e82b4ec8264 100644
>>> --- a/fs/xfs/xfs_iomap.c
>>> +++ b/fs/xfs/xfs_iomap.c
>>> @@ -1760,10 +1760,12 @@ xfs_buffered_write_iomap_begin(
>>>   {
>>>   	struct iomap_iter	*iter = container_of(iomap, struct iomap_iter,
>>>   						     iomap);
>>> +	struct address_space	*mapping = inode->i_mapping;
>>>   	struct xfs_inode	*ip = XFS_I(inode);
>>>   	struct xfs_mount	*mp = ip->i_mount;
>>>   	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>>   	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>>> +	xfs_fileoff_t		cow_fsb = NULLFILEOFF;
>>>   	struct xfs_bmbt_irec	imap, cmap;
>>>   	struct xfs_iext_cursor	icur, ccur;
>>>   	xfs_fsblock_t		prealloc_blocks = 0;
>>> @@ -1831,6 +1833,8 @@ xfs_buffered_write_iomap_begin(
>>>   		}
>>>   		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
>>>   				&ccur, &cmap);
>>> +		if (!cow_eof)
>>> +			cow_fsb = cmap.br_startoff;
>>>   	}
>>>   
>>>   	/* We never need to allocate blocks for unsharing a hole. */
>>> @@ -1845,17 +1849,37 @@ xfs_buffered_write_iomap_begin(
>>>   	 * writeback to remap pending blocks and restart the lookup.
>>>   	 */
>>>   	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
>>> -		if (filemap_range_needs_writeback(inode->i_mapping, offset,
>>> -						  offset + count - 1)) {
>>> +		loff_t start, end;
>> Nit: Tab between data type and identifier?
>>
> Sure.
>
>>> +
>>> +		imap.br_blockcount = imap.br_startoff - offset_fsb;
>>> +		imap.br_startoff = offset_fsb;
>>> +		imap.br_startblock = HOLESTARTBLOCK;
>>> +		imap.br_state = XFS_EXT_NORM;
>>> +
>>> +		if (cow_fsb == NULLFILEOFF) {
>>> +			goto found_imap;
>>> +		} else if (cow_fsb > offset_fsb) {
>>> +			xfs_trim_extent(&imap, offset_fsb,
>>> +					cow_fsb - offset_fsb);
>>> +			goto found_imap;
>>> +		}
>>> +
>>> +		/* COW fork blocks overlap the hole */
>>> +		xfs_trim_extent(&imap, offset_fsb,
>>> +			    cmap.br_startoff + cmap.br_blockcount - offset_fsb);
>>> +		start = XFS_FSB_TO_B(mp, imap.br_startoff);
>>> +		end = XFS_FSB_TO_B(mp,
>>> +				   imap.br_startoff + imap.br_blockcount) - 1;
>> So, we are including the bytes in the block number (imap.br_startoff + imap.br_blockcount - 1)th,
>> right? That is why a -1 after XFS_FSB_TO_B()?
> Not sure I follow what you mean by the "bytes in the block number"
> phrasing..? Anyways, the XFS_FSB_TO_B() here should return the starting
> byte offset of the first block beyond the range (exclusive). The -1
> changes that to the last byte offset of the range we're interested in
> (inclusive), which I believe is what the filemap api wants..

Yeah, that answers my question. Thank you.

--NR

>
> Brian
>
>> --NR
>>> +		if (filemap_range_needs_writeback(mapping, start, end)) {
>>>   			xfs_iunlock(ip, lockmode);
>>> -			error = filemap_write_and_wait_range(inode->i_mapping,
>>> -						offset, offset + count - 1);
>>> +			error = filemap_write_and_wait_range(mapping, start,
>>> +							     end);
>>>   			if (error)
>>>   				return error;
>>>   			goto restart;
>>>   		}
>>> -		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
>>> -		goto out_unlock;
>>> +
>>> +		goto found_imap;
>>>   	}
>>>   
>>>   	/*

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-02-13 16:24     ` Darrick J. Wong
@ 2026-02-18 17:41       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 36+ messages in thread
From: Nirjhar Roy (IBM) @ 2026-02-18 17:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-fsdevel, linux-xfs


On 2/13/26 21:54, Darrick J. Wong wrote:
> On Fri, Feb 13, 2026 at 03:50:07PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
>>> iomap zero range has a wart in that it also flushes dirty pagecache
>>> over hole mappings (rather than only unwritten mappings). This was
>>> included to accommodate a quirk in XFS where COW fork preallocation
>>> can exist over a hole in the data fork, and the associated range is
>>> reported as a hole. This is because the range actually is a hole,
>>> but XFS also has an optimization where if COW fork blocks exist for
>>> a range being written to, those blocks are used regardless of
>>> whether the data fork blocks are shared or not. For zeroing, COW
>>> fork blocks over a data fork hole are only relevant if the range is
>>> dirty in pagecache, otherwise the range is already considered
>>> zeroed.
>>>
>>> The easiest way to deal with this corner case is to flush the
>>> pagecache to trigger COW remapping into the data fork, and then
>>> operate on the updated on-disk state. The problem is that ext4
>>> cannot accommodate a flush from this context due to being a
>>> transaction deadlock vector.
>>>
>>> Outside of the hole quirk, ext4 can avoid the flush for zero range
>>> by using the recently introduced folio batch lookup mechanism for
>>> unwritten mappings. Therefore, take the next logical step and lift
>>> the hole handling logic into the XFS iomap_begin handler. iomap will
>>> still flush on unwritten mappings without a folio batch, and XFS
>>> will flush and retry mapping lookups in the case where it would
>>> otherwise report a hole with dirty pagecache during a zero range.
>>>
>>> Note that this is intended to be a fairly straightforward lift and
>>> otherwise not change behavior. Now that the flush exists within XFS,
>>> follow on patches can further optimize it.
>>>
>>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>>> ---
>>>   fs/iomap/buffered-io.c |  2 +-
>>>   fs/xfs/xfs_iomap.c     | 25 ++++++++++++++++++++++---
>>>   2 files changed, 23 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>>> index 6beb876658c0..807384d72311 100644
>>> --- a/fs/iomap/buffered-io.c
>>> +++ b/fs/iomap/buffered-io.c
>>> @@ -1620,7 +1620,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>>>   		     srcmap->type == IOMAP_UNWRITTEN)) {
>>>   			s64 status;
>>>   
>>> -			if (range_dirty) {
>>> +			if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
>>>   				range_dirty = false;
>>>   				status = iomap_zero_iter_flush_and_stale(&iter);
>>>   			} else {
>>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>>> index 37a1b33e9045..896d0dd07613 100644
>>> --- a/fs/xfs/xfs_iomap.c
>>> +++ b/fs/xfs/xfs_iomap.c
>>> @@ -1790,6 +1790,7 @@ xfs_buffered_write_iomap_begin(
>>>   	if (error)
>>>   		return error;
>>>   
>>> +restart:
>>>   	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
>>>   	if (error)
>>>   		return error;
>>> @@ -1817,9 +1818,27 @@ xfs_buffered_write_iomap_begin(
>>>   	if (eof)
>>>   		imap.br_startoff = end_fsb; /* fake hole until the end */
>>>   
>>> -	/* We never need to allocate blocks for zeroing or unsharing a hole. */
>>> -	if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
>>> -	    imap.br_startoff > offset_fsb) {
>>> +	/* We never need to allocate blocks for unsharing a hole. */
>>> +	if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
>>> +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
>>> +		goto out_unlock;
>>> +	}
>>> +
>>> +	/*
>>> +	 * We may need to zero over a hole in the data fork if it's fronted by
>>> +	 * COW blocks and dirty pagecache. To make sure zeroing occurs, force
>>> +	 * writeback to remap pending blocks and restart the lookup.
>>> +	 */
>>> +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
>>> +		if (filemap_range_needs_writeback(inode->i_mapping, offset,
>>> +						  offset + count - 1)) {
>>> +			xfs_iunlock(ip, lockmode);
>> I am a bit new to this section of the code - so a naive question:
>> Why do we need to unlock the inode here? Shouldn't the mappings be thread safe while the write/flush
>> is going on?
> Writeback takes XFS_ILOCK, which we currently hold here (possibly in
> exclusive mode) so we must drop it to write(back) and wait.

Okay, got it. Thank you.

--NR

>
> --D
>
>> --NR
>>> +			error = filemap_write_and_wait_range(inode->i_mapping,
>>> +						offset, offset + count - 1);
>>> +			if (error)
>>> +				return error;
>>> +			goto restart;
>>> +		}
>>>   		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
>>>   		goto out_unlock;
>>>   	}
>>
-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-02-13 17:37     ` Brian Foster
@ 2026-03-02 19:02       ` Brian Foster
  2026-03-03 14:37         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-03-02 19:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Fri, Feb 13, 2026 at 12:37:39PM -0500, Brian Foster wrote:
> On Thu, Feb 12, 2026 at 10:06:50PM -0800, Christoph Hellwig wrote:
> > This patch makes generic/363 and xfs/131 fail on zoned XFS file systems.
> > A later patch also makes generic/127 fail, but I haven't fully bisected
> > the cause yet.
> > 
> > I'll see if I can make sense of it, but unfortunately I've got a fair
> > amount of travel coming up, so it might take a while.
> > 
> > If you want to give it a spin yourself, you can just add '-r zoned=1'
> > to the mkfs arguments for the test and scratch device.
> > 
> > 
> 
> Ok, thanks for testing and reporting. I'm a little buried atm as well,
> but this isn't the most pressing series anyways. I'll dig into these
> tests when I spin back to this..
> 

I got a chance to look into this. Note that I don't reproduce a
generic/127 failure (even after running a few iters), so I don't know if
that might be related to something on your end. I reproduce the other
two and those looked like straight failures to zero. From that I think
the issue is just that xfs_zoned_buffered_write_iomap_begin() minimally
needs the flush treatment from patch 1. I.e., something like the
appended diff allows these tests to pass with -rzoned=1.

Technically this could use the folio batch helper, but given that we
don't use that for the unwritten case (and thus already rely on the
iomap flush), and that this is currently experimental, I think this is
probably fine for now. Perhaps if we lift zeroing off into its own set
of callbacks, that might be a good opportunity to clean this up in both
places.

I'll run some more testing with this on -rzoned=1 and barring major
issues plan to include this in the next version of the series..

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index fcb9463d0d8c..5b709b86d7cb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1590,6 +1590,7 @@ xfs_zoned_buffered_write_iomap_begin(
 {
 	struct iomap_iter	*iter =
 		container_of(iomap, struct iomap_iter, iomap);
+	struct address_space	*mapping = inode->i_mapping;
 	struct xfs_zone_alloc_ctx *ac = iter->private;
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1614,6 +1615,7 @@ xfs_zoned_buffered_write_iomap_begin(
 	if (error)
 		return error;
 
+restart:
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -1655,6 +1657,17 @@ xfs_zoned_buffered_write_iomap_begin(
 			 * We never need to allocate blocks for zeroing a hole.
 			 */
 			if (flags & IOMAP_ZERO) {
+				if (filemap_range_needs_writeback(mapping,
+						  offset, offset + count - 1)) {
+					xfs_iunlock(ip, lockmode);
+					error = filemap_write_and_wait_range(
+							mapping, offset,
+							offset + count - 1);
+					if (error)
+						return error;
+					goto restart;
+				}
+
 				xfs_hole_to_iomap(ip, iomap, offset_fsb,
 						smap.br_startoff);
 				goto out_unlock;


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-02 19:02       ` Brian Foster
@ 2026-03-03 14:37         ` Christoph Hellwig
  2026-03-03 19:00           ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-03 14:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Mon, Mar 02, 2026 at 02:02:10PM -0500, Brian Foster wrote:
> I got a chance to look into this. Note that I don't reproduce a
> generic/127 failure (even after running a few iters), so I don't know if
> that might be related to something on your end. I reproduce the other
> two and those looked like straight failures to zero. From that I think
> the issue is just that xfs_zoned_buffered_write_iomap_begin() minimally
> needs the flush treatment from patch 1. I.e., something like the
> appended diff allows these tests to pass with -rzoned=1.
> 
> Technically this could use the folio batch helper, but given that we
> don't use that for the unwritten case (and thus already rely on the
> iomap flush), and that this is currently experimental, I think this is
> probably fine for now. Perhaps if we lift zeroing off into its own set
> of callbacks, that might be a good opportunity to clean this up in both
> places.

Note that unwritten extents aren't supported for zoned rt inodes, so
that case doesn't actually exist.

The changes themselves look good.  I kinda hate the very deep
indentation, but I can't really see a good way to fix that easily.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-03 14:37         ` Christoph Hellwig
@ 2026-03-03 19:00           ` Brian Foster
  2026-03-04 13:13             ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-03-03 19:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Tue, Mar 03, 2026 at 06:37:09AM -0800, Christoph Hellwig wrote:
> On Mon, Mar 02, 2026 at 02:02:10PM -0500, Brian Foster wrote:
> > I got a chance to look into this. Note that I don't reproduce a
> > generic/127 failure (even after running a few iters), so I don't know if
> > that might be related to something on your end. I reproduce the other
> > two and those looked like straight failures to zero. From that I think
> > the issue is just that xfs_zoned_buffered_write_iomap_begin() minimally
> > needs the flush treatment from patch 1. I.e., something like the
> > appended diff allows these tests to pass with -rzoned=1.
> > 
> > Technically this could use the folio batch helper, but given that we
> > don't use that for the unwritten case (and thus already rely on the
> > iomap flush), and that this is currently experimental, I think this is
> > probably fine for now. Perhaps if we lift zeroing off into its own set
> > of callbacks, that might be a good opportunity to clean this up in both
> > places.
> 
> Note that unwritten extents aren't supported for zoned rt inodes, so
> that case doesn't actually exist.
> 

Oh I see. If I follow the high level flow here, zoned mode always writes
through COW fork delalloc, and then writeback appears to remove the
delalloc mapping and then does whatever physical zone allocation magic
further down in the submission path. So there are no unwritten extents
nor COW fork preallocation as far as I can tell.

I think that actually means the IOMAP_ZERO logic for the zoned
iomap_begin handler is slightly wrong as it is. I was originally
thinking this was just another COW fork prealloc situation, but in
actuality it looks like zoned mode intentionally creates this COW fork
blocks over data fork hole scenario on first write to a previously
unallocated file range.

IOMAP_ZERO returns a hole whenever one exists in the data fork, so that
means we're not properly reporting a data mapping up until the range is
allocated in the data fork (i.e. writeback occurs at least once). The
reason this has worked is presumably because iomap does the flush when
the range of a reported hole is dirty, so it retries the mapping lookup
after blocks are remapped and DTRT from there.

So the fix I posted works just the same.. lifting the flush just
preserves how things work today. But I think what this means is that we
should also be able to rework zoned mode IOMAP_ZERO handling to require
neither the flush nor dirty folio lookup. It should be able to return a
mapping to zero if blocks exist in either fork (allocating to COW fork
if necessary), otherwise report a hole.

Hmm.. maybe I'll take a look if we can do something like that from the
start. If it's not straightforward, I'll keep the flush fix for now and
come back to it later..

Brian

> The changes themselves look good.  I kinda hate the very deep
> indentation, but I can't really see a good way to fix that easily.
> 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-03 19:00           ` Brian Foster
@ 2026-03-04 13:13             ` Christoph Hellwig
  2026-03-04 14:17               ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-04 13:13 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-fsdevel, linux-xfs

On Tue, Mar 03, 2026 at 02:00:47PM -0500, Brian Foster wrote:
> Oh I see. If I follow the high level flow here, zoned mode always writes
> through COW fork delalloc, and then writeback appears to remove the
> delalloc mapping and then does whatever physical zone allocation magic
> further down in the submission path. So there are no unwritten extents
> nor COW fork preallocation as far as I can tell.

Yes.

> I think that actually means the IOMAP_ZERO logic for the zoned
> iomap_begin handler is slightly wrong as it is. I was originally
> thinking this was just another COW fork prealloc situation, but in
> actuality it looks like zoned mode intentionally creates this COW fork
> blocks over data fork hole scenario on first write to a previously
> unallocated file range.

Yes.

> IOMAP_ZERO returns a hole whenever one exists in the data fork, so that
> means we're not properly reporting a data mapping up until the range is
> allocated in the data fork (i.e. writeback occurs at least once). The
> reason this has worked is presumably because iomap does the flush when
> the range of a reported hole is dirty, so it retries the mapping lookup

Yeah.

> So the fix I posted works just the same.. lifting the flush just
> preserves how things work today. But I think what this means is that we
> should also be able to rework zoned mode IOMAP_ZERO handling to require
> neither the flush nor dirty folio lookup. It should be able to return a
> mapping to zero if blocks exist in either fork (allocating to COW fork
> if necessary), otherwise report a hole.

Yeah.  If there still is a delalloc mapping in the COW fork we could
actually steal that for zeroing.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-04 13:13             ` Christoph Hellwig
@ 2026-03-04 14:17               ` Brian Foster
  2026-03-04 14:41                 ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-03-04 14:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Mar 04, 2026 at 05:13:23AM -0800, Christoph Hellwig wrote:
> On Tue, Mar 03, 2026 at 02:00:47PM -0500, Brian Foster wrote:
> > Oh I see. If I follow the high level flow here, zoned mode always writes
> > through COW fork delalloc, and then writeback appears to remove the
> > delalloc mapping and then does whatever physical zone allocation magic
> > further down in the submission path. So there are no unwritten extents
> > nor COW fork preallocation as far as I can tell.
> 
> Yes.
> 
> > I think that actually means the IOMAP_ZERO logic for the zoned
> > iomap_begin handler is slightly wrong as it is. I was originally
> > thinking this was just another COW fork prealloc situation, but in
> > actuality it looks like zoned mode intentionally creates this COW fork
> > blocks over data fork hole scenario on first write to a previously
> > unallocated file range.
> 
> Yes.
> 
> > IOMAP_ZERO returns a hole whenever one exists in the data fork, so that
> > means we're not properly reporting a data mapping up until the range is
> > allocated in the data fork (i.e. writeback occurs at least once). The
> > reason this has worked is presumably because iomap does the flush when
> > the range of a reported hole is dirty, so it retries the mapping lookup
> 
> Yeah.
> 
> > So the fix I posted works just the same.. lifting the flush just
> > preserves how things work today. But I think what this means is that we
> > should also be able to rework zoned mode IOMAP_ZERO handling to require
> > neither the flush nor dirty folio lookup. It should be able to return a
> > mapping to zero if blocks exist in either fork (allocating to COW fork
> > if necessary), otherwise report a hole.
> 
> Yeah.  If there still is a delalloc mapping in the COW fork we could
> actually steal that for zeroing.
> 
> 

I tested the change below but it ended up failing xfs/131. Some fast and
loose (i.e. LLM assisted) trace analysis suggests the issue is that this
particular situation is racy. I.e., we write to a sparse file range and
> add COW fork delalloc, writeback kicks in and drops the delalloc
mapping, then zeroing occurs over said range and finds holes in both
forks, then zone I/O completion occurs and maps blocks into the data
fork.

So this still seems like generally the right idea to me, but we probably
need to find a way to avoid the transient hole situation on an unlocked
inode. For example, maybe the COW fork delalloc could stay around
longer, or transfer to the data fork at writeback time if the data fork
range happens to be a hole.

But that's just handwaving and beyond the scope of this series. For now
I'll probably go back to the flush fix and document some of this in the
patch for future reference..

Brian

--- 8< ---

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 255b650c3790..533d44633177 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1651,14 +1651,6 @@ xfs_zoned_buffered_write_iomap_begin(
 				&smap))
 			smap.br_startoff = end_fsb; /* fake hole until EOF */
 		if (smap.br_startoff > offset_fsb) {
-			/*
-			 * We never need to allocate blocks for zeroing a hole.
-			 */
-			if (flags & IOMAP_ZERO) {
-				xfs_hole_to_iomap(ip, iomap, offset_fsb,
-						smap.br_startoff);
-				goto out_unlock;
-			}
 			end_fsb = min(end_fsb, smap.br_startoff);
 		} else {
 			end_fsb = min(end_fsb,
@@ -1690,6 +1682,15 @@ xfs_zoned_buffered_write_iomap_begin(
 	count_fsb = min3(end_fsb - offset_fsb, XFS_MAX_BMBT_EXTLEN,
 			 XFS_B_TO_FSB(mp, 1024 * PAGE_SIZE));
 
+	/*
+	 * We don't allocate blocks for zeroing a hole, but we only report a
+	 * hole in zoned mode if one exists in both the COW and data forks.
+	 */
+	if ((flags & IOMAP_ZERO) && srcmap->type == IOMAP_HOLE) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, end_fsb);
+		goto out_unlock;
+	}
+
 	/*
 	 * The block reservation is supposed to cover all blocks that the
 	 * operation could possible write, but there is a nasty corner case


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-04 14:17               ` Brian Foster
@ 2026-03-04 14:41                 ` Christoph Hellwig
  2026-03-04 15:02                   ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-04 14:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Wed, Mar 04, 2026 at 09:17:33AM -0500, Brian Foster wrote:
> I tested the change below but it ended up failing xfs/131. Some fast and
> loose (i.e. LLM assisted) trace analysis suggests the issue is that this
> particular situation is racy. I.e., we write to a sparse file range and
> > add COW fork delalloc, writeback kicks in and drops the delalloc
> mapping, then zeroing occurs over said range and finds holes in both
> forks, then zone I/O completion occurs and maps blocks into the data
> fork.

Yes, that can happen.  But the folio will be locked or have the
writeback bit set over that whole period, so I can't see how writeback
can actually race with that?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-04 14:41                 ` Christoph Hellwig
@ 2026-03-04 15:02                   ` Brian Foster
  2026-03-04 17:04                     ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-03-04 15:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Mar 04, 2026 at 06:41:23AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 04, 2026 at 09:17:33AM -0500, Brian Foster wrote:
> > I tested the change below but it ended up failing xfs/131. Some fast and
> > loose (i.e. LLM assisted) trace analysis suggests the issue is that this
> > particular situation is racy. I.e., we write to a sparse file range and
> > add COW fork delalloc, writeback kicks in and drops the delalloc
> > mapping, then zeroing occurs over said range and finds holes in both
> > forks, then zone I/O completion occurs and maps blocks into the data
> > fork.
> 
> Yes, that can happen.  But the folio will be locked or have the
> writeback bit set over that whole period, so I can't see how writeback
> can actually race with that?
> 

I think that's why the flush (i.e. current behavior) works..

The change in my last mail attempts to replace the flush with improved
reporting to only report a hole if one exists in both COW/data forks at
the same time. So basically if the data fork is currently a hole and
writeback kicks in, xfs_zoned_map_blocks() deletes the COW fork mapping
and carries on with zoned writeback handling. At that point if zeroing
occurs, it would see holes in both the COW (just removed) and data fork
(not yet mapped in) and think there's nothing to do.

The idea is that if say we instead did something like transfer delalloc
into the data fork at writeback time, if the data fork had a hole, then
we could always tell from the iomap mapping whether we need to zero or
not without flushing or consulting pagecache at all.

Brian


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-04 15:02                   ` Brian Foster
@ 2026-03-04 17:04                     ` Brian Foster
  2026-03-05 14:11                       ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-03-04 17:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Wed, Mar 04, 2026 at 10:02:09AM -0500, Brian Foster wrote:
> On Wed, Mar 04, 2026 at 06:41:23AM -0800, Christoph Hellwig wrote:
> > On Wed, Mar 04, 2026 at 09:17:33AM -0500, Brian Foster wrote:
> > > I tested the change below but it ended up failing xfs/131. Some fast and
> > > loose (i.e. LLM assisted) trace analysis suggests the issue is that this
> > > particular situation is racy. I.e., we write to a sparse file range and
> > > add COW fork delalloc, writeback kicks in and drops the delalloc
> > > mapping, then zeroing occurs over said range and finds holes in both
> > > forks, then zone I/O completion occurs and maps blocks into the data
> > > fork.
> > 
> > Yes, that can happen.  But the folio will be locked or have the
> > writeback bit set over that whole period, so I can't see how writeback
> > can actually race with that?
> > 
> 
> I think that's why the flush (i.e. current behavior) works..
> 
> The change in my last mail attempts to replace the flush with improved
> reporting to only report a hole if one exists in both COW/data forks at
> the same time. So basically if the data fork is currently a hole and
> writeback kicks in, xfs_zoned_map_blocks() deletes the COW fork mapping
> and carries on with zoned writeback handling. At that point if zeroing
> occurs, it would see holes in both the COW (just removed) and data fork
> (not yet mapped in) and think there's nothing to do.
> 
> The idea is that if say we instead did something like transfer delalloc
> into the data fork at writeback time, if the data fork had a hole, then
> we could always tell from the iomap mapping whether we need to zero or
> not without flushing or consulting pagecache at all.
> 

This patch seems to work on a quick test. It's basically the two patches
squashed together (I'd post them as independent patches), so nothing too
different, but if we fix up the zero logic first that helps clean up the
indentation as a bonus.

Brian

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index be86d43044df..27470ec8372b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1590,6 +1590,7 @@ xfs_zoned_buffered_write_iomap_begin(
 {
 	struct iomap_iter	*iter =
 		container_of(iomap, struct iomap_iter, iomap);
+	struct address_space	*mapping = inode->i_mapping;
 	struct xfs_zone_alloc_ctx *ac = iter->private;
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1614,6 +1615,7 @@ xfs_zoned_buffered_write_iomap_begin(
 	if (error)
 		return error;
 
+restart:
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -1651,14 +1653,6 @@ xfs_zoned_buffered_write_iomap_begin(
 				&smap))
 			smap.br_startoff = end_fsb; /* fake hole until EOF */
 		if (smap.br_startoff > offset_fsb) {
-			/*
-			 * We never need to allocate blocks for zeroing a hole.
-			 */
-			if (flags & IOMAP_ZERO) {
-				xfs_hole_to_iomap(ip, iomap, offset_fsb,
-						smap.br_startoff);
-				goto out_unlock;
-			}
 			end_fsb = min(end_fsb, smap.br_startoff);
 		} else {
 			end_fsb = min(end_fsb,
@@ -1690,6 +1684,31 @@ xfs_zoned_buffered_write_iomap_begin(
 	count_fsb = min3(end_fsb - offset_fsb, XFS_MAX_BMBT_EXTLEN,
 			 XFS_B_TO_FSB(mp, 1024 * PAGE_SIZE));
 
+	/*
+	 * We don't allocate blocks for zeroing a hole, but we only report a
+	 * hole in zoned mode if one exists in both the COW and data forks.
+	 *
+	 * There is currently a corner case where writeback removes the COW fork
+	 * mapping and unlocks the inode, leaving a transient state where a hole
+	 * exists in both forks until write completion maps blocks into the data
+	 * fork. Until we can avoid this transient hole state, detect and avoid
+	 * this with a flush of any such range that appears dirty in pagecache.
+	 */
+	if ((flags & IOMAP_ZERO) && srcmap->type == IOMAP_HOLE) {
+		if (filemap_range_needs_writeback(mapping, offset,
+						  offset + count - 1)) {
+			xfs_iunlock(ip, lockmode);
+			error = filemap_write_and_wait_range(mapping, offset,
+							    offset + count - 1);
+			if (error)
+				return error;
+			goto restart;
+		}
+
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, end_fsb);
+		goto out_unlock;
+	}
+
 	/*
 	 * The block reservation is supposed to cover all blocks that the
 	 * operation could possible write, but there is a nasty corner case


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-04 17:04                     ` Brian Foster
@ 2026-03-05 14:11                       ` Christoph Hellwig
  2026-03-05 15:06                         ` Brian Foster
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-05 14:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Wed, Mar 04, 2026 at 12:04:04PM -0500, Brian Foster wrote:
> This patch seems to work on a quick test. It's basically the two patches
> squashed together (I'd post them as independent patches), so nothing too
> different, but if we fix up the zero logic first that helps clean up the
> indentation as a bonus.

Cool, thanks a lot for putting in this extra effort!

Cosmetic comment on the comment:

> +	/*
> +	 * We don't allocate blocks for zeroing a hole, but we only report a
> +	 * hole in zoned mode if one exists in both the COW and data forks.

I'd reword this a bit as:

	/*
	 * When zeroing, don't allocate blocks for holes as they are
	 * already zeroes, but we need to ensure that no extents exist in
	 * both the data and COW fork to ensure this really is a hole.
> +	 *
> +	 * There is currently a corner case where writeback removes the COW fork
> +	 * mapping and unlocks the inode, leaving a transient state where a hole
> +	 * exists in both forks until write completion maps blocks into the data
> +	 * fork. Until we can avoid this transient hole state, detect and avoid
> +	 * this with a flush of any such range that appears dirty in pagecache.
> +	 */


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-05 14:11                       ` Christoph Hellwig
@ 2026-03-05 15:06                         ` Brian Foster
  2026-03-05 16:10                           ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Foster @ 2026-03-05 15:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-xfs

On Thu, Mar 05, 2026 at 06:11:24AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 04, 2026 at 12:04:04PM -0500, Brian Foster wrote:
> > This patch seems to work on a quick test. It's basically the two patches
> > squashed together (I'd post them as independent patches), so nothing too
> > different, but if we fix up the zero logic first that helps clean up the
> > indentation as a bonus.
> 
> Cool, thanks a lot for putting in this extra effort!
> 
> Cosmetic comment on the comment:
> 
> > +	/*
> > +	 * We don't allocate blocks for zeroing a hole, but we only report a
> > +	 * hole in zoned mode if one exists in both the COW and data forks.
> 
> I'd reword this a bit as:
> 
> 	/*
> 	 * When zeroing, don't allocate blocks for holes as they are already
> 	 * zeroes, but we need to ensure that no extents exist in both
> 	 * the data and COW fork to ensure this really is a hole.
> > +	 *
> > +	 * There is currently a corner case where writeback removes the COW fork
> > +	 * mapping and unlocks the inode, leaving a transient state where a hole
> > +	 * exists in both forks until write completion maps blocks into the data
> > +	 * fork. Until we can avoid this transient hole state, detect and avoid
> > +	 * this with a flush of any such range that appears dirty in pagecache.
> > +	 */
> 
> 

Sure. I had reworked the latter part of the comment as well. With both
the changes it currently looks like:

        /*
         * When zeroing, don't allocate blocks for holes as they are already
         * zeroes, but we need to ensure that no extents exist in both the data
         * and COW fork to ensure this really is a hole.
         *
         * A window exists where we might observe a hole in both forks with
         * valid data in cache. Writeback removes the COW fork blocks on
         * submission but doesn't remap into the data fork until completion. If
         * the data fork was previously a hole, we'll fail to zero. Until we
         * find a way to avoid this transient state, check for dirty pagecache
         * and flush to wait on blocks to land in the data fork.
         */

Brian


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
  2026-03-05 15:06                         ` Brian Foster
@ 2026-03-05 16:10                           ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2026-03-05 16:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On Thu, Mar 05, 2026 at 10:06:35AM -0500, Brian Foster wrote:
> Sure. I had reworked the latter part of the comment as well. With both
> the changes it currently looks like:
> 
>         /*
>          * When zeroing, don't allocate blocks for holes as they are already
>          * zeroes, but we need to ensure that no extents exist in both the data

overly long line here

>          * submission but doesn't remap into the data fork until completion. If

and here.

Otherwise looks great.


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2026-03-05 16:11 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-29 15:50 [PATCH v2 0/5] iomap, xfs: improve zero range flushing and lookup Brian Foster
2026-01-29 15:50 ` [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
2026-02-10 16:15   ` Christoph Hellwig
2026-02-10 19:14     ` Brian Foster
2026-02-11 15:36       ` Christoph Hellwig
2026-02-13  6:06   ` Christoph Hellwig
2026-02-13 17:37     ` Brian Foster
2026-03-02 19:02       ` Brian Foster
2026-03-03 14:37         ` Christoph Hellwig
2026-03-03 19:00           ` Brian Foster
2026-03-04 13:13             ` Christoph Hellwig
2026-03-04 14:17               ` Brian Foster
2026-03-04 14:41                 ` Christoph Hellwig
2026-03-04 15:02                   ` Brian Foster
2026-03-04 17:04                     ` Brian Foster
2026-03-05 14:11                       ` Christoph Hellwig
2026-03-05 15:06                         ` Brian Foster
2026-03-05 16:10                           ` Christoph Hellwig
2026-02-13 10:20   ` Nirjhar Roy (IBM)
2026-02-13 16:24     ` Darrick J. Wong
2026-02-18 17:41       ` Nirjhar Roy (IBM)
2026-01-29 15:50 ` [PATCH v2 2/5] xfs: flush eof folio before insert range size update Brian Foster
2026-02-10 16:16   ` Christoph Hellwig
2026-02-10 19:14     ` Brian Foster
2026-01-29 15:50 ` [PATCH v2 3/5] xfs: look up cow fork extent earlier for buffered iomap_begin Brian Foster
2026-02-10 16:17   ` Christoph Hellwig
2026-01-29 15:50 ` [PATCH v2 4/5] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
2026-02-10 16:19   ` Christoph Hellwig
2026-02-10 19:18     ` Brian Foster
2026-02-17 15:06   ` Nirjhar Roy (IBM)
2026-02-18 15:37     ` Brian Foster
2026-02-18 17:40       ` Nirjhar Roy (IBM)
2026-01-29 15:50 ` [PATCH v2 5/5] xfs: replace zero range flush with folio batch Brian Foster
2026-02-10 16:21   ` Christoph Hellwig
2026-02-10 19:19     ` Brian Foster
2026-02-11 15:41       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox