From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
Date: Wed, 4 Mar 2026 09:17:33 -0500
Message-ID: <aag-_c8G_L5MQ42m@bfoster>
In-Reply-To: <aagv8y96vGHvbOdX@infradead.org>
On Wed, Mar 04, 2026 at 05:13:23AM -0800, Christoph Hellwig wrote:
> On Tue, Mar 03, 2026 at 02:00:47PM -0500, Brian Foster wrote:
> > Oh I see. If I follow the high level flow here, zoned mode always writes
> > through COW fork delalloc, and then writeback appears to remove the
> > delalloc mapping and then does whatever physical zone allocation magic
> > further down in the submission path. So there are no unwritten extents
> > nor COW fork preallocation as far as I can tell.
>
> Yes.
>
> > I think that actually means the IOMAP_ZERO logic for the zoned
> > iomap_begin handler is slightly wrong as it is. I was originally
> > thinking this was just another COW fork prealloc situation, but in
> > actuality it looks like zoned mode intentionally creates this COW fork
> > blocks over data fork hole scenario on first write to a previously
> > unallocated file range.
>
> Yes.
>
> > IOMAP_ZERO returns a hole whenever one exists in the data fork, so that
> > means we're not properly reporting a data mapping up until the range is
> > allocated in the data fork (i.e. writeback occurs at least once). The
> > reason this has worked is presumably because iomap does the flush when
> > the range of a reported hole is dirty, so it retries the mapping lookup
>
> Yeah.
>
> > So the fix I posted works just the same.. lifting the flush just
> > preserves how things work today. But I think what this means is that we
> > should also be able to rework zoned mode IOMAP_ZERO handling to require
> > neither the flush nor dirty folio lookup. It should be able to return a
> > mapping to zero if blocks exist in either fork (allocating to COW fork
> > if necessary), otherwise report a hole.
>
> Yeah. If there still is a delalloc mapping in the COW fork we could
> actually steal that for zeroing.
>
>
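
For readers less familiar with the existing behavior referred to above: zoned mode has worked so far because iomap flushes a dirty range that was reported as a hole and then retries the mapping lookup. Below is a rough single-range model of that flow. All names are hypothetical and do not exist in XFS or iomap, and the flush is treated as write-and-wait, so I/O completion has mapped the data fork by the time of the retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-range model; none of these names exist in XFS/iomap. */
struct range_state {
	bool data_mapped;	/* data fork has blocks over the range */
	bool cow_delalloc;	/* COW fork has delalloc over the range */
	bool dirty;		/* page cache over the range is dirty */
};

/* Pre-series zoned IOMAP_ZERO lookup: a hole whenever the data fork is one. */
static bool lookup_reports_hole(const struct range_state *s)
{
	return !s->data_mapped;
}

/* Stand-in for flushing the range and waiting for I/O completion. */
static void flush_and_wait(struct range_state *s)
{
	assert(s->dirty);
	s->dirty = false;
	s->cow_delalloc = false;	/* writeback consumes the COW delalloc */
	s->data_mapped = true;		/* completion maps blocks into the data fork */
}

/* Returns true if zeroing ends up with a mapping it has to zero. */
static bool zero_range_lookup(struct range_state *s)
{
	if (lookup_reports_hole(s) && s->dirty) {
		flush_and_wait(s);
		/* retry the lookup now that the data fork is populated */
	}
	return !lookup_reports_hole(s);
}
```

A freshly written range ({ false, true, true }) gets flushed and then zeroed, while a clean hole ({ false, false, false }) is skipped without any I/O.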

I tested the change below, but it ended up failing xfs/131. Some fast and
loose (i.e., LLM-assisted) trace analysis suggests the issue is that this
particular situation is racy: we write to a sparse file range and add COW
fork delalloc, writeback kicks in and drops the delalloc mapping, zeroing
then runs over that range and finds holes in both forks, and only after
that does zone I/O completion map blocks into the data fork.
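
The interleaving above can be replayed in a toy single-range model. Everything here is hypothetical (no locking, no real extents, no zone allocation), and the zeroing check mirrors the rule in the patch below: a hole is reported only when both forks are empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-range model of a zoned inode; not XFS code. */
struct range_state {
	bool data_mapped;	/* data fork has real blocks */
	bool cow_delalloc;	/* COW fork has a delalloc reservation */
	bool io_in_flight;	/* writeback submitted, completion pending */
};

/* Zoned buffered writes always go through COW fork delalloc. */
static void buffered_write(struct range_state *s)
{
	s->cow_delalloc = true;
}

/* Writeback drops the delalloc mapping; zone allocation happens later. */
static void writeback_submit(struct range_state *s)
{
	assert(s->cow_delalloc);
	s->cow_delalloc = false;
	s->io_in_flight = true;
}

/* Zone I/O completion maps the written blocks into the data fork. */
static void io_complete(struct range_state *s)
{
	assert(s->io_in_flight);
	s->io_in_flight = false;
	s->data_mapped = true;
}

/* Patched IOMAP_ZERO rule: report a hole only if both forks are empty. */
static bool zero_sees_hole(const struct range_state *s)
{
	return !s->data_mapped && !s->cow_delalloc;
}

/*
 * Replays write -> writeback -> zeroing -> completion. Returns true if
 * zeroing saw a hole over a range that ends up mapped in the data fork.
 */
static bool replay_race(void)
{
	struct range_state s = { false, false, false };
	bool saw_hole;

	buffered_write(&s);
	writeback_submit(&s);
	saw_hole = zero_sees_hole(&s);	/* the transient hole window */
	io_complete(&s);
	return saw_hole && s.data_mapped;
}
```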

So this still seems like generally the right idea to me, but we probably
need a way to avoid the transient hole on an unlocked inode. For example,
maybe the COW fork delalloc could stay around longer, or be transferred to
the data fork at writeback time if the corresponding data fork range
happens to be a hole.
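
As an equally handwavy illustration of the second idea, here is the same kind of toy model with the COW fork delalloc handed over to the data fork at writeback time when the data fork is a hole there. It assumes a delalloc extent could live in the data fork of a zoned inode, which is not established here; all names are hypothetical. The model only shows that zeroing would then never observe a hole across the write, writeback and completion sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not XFS code. */
enum ext { EXT_HOLE, EXT_DELALLOC, EXT_REAL };

struct range_state {
	enum ext data;		/* data fork state over the range */
	enum ext cow;		/* COW fork state over the range */
	bool io_in_flight;	/* writeback submitted, completion pending */
};

static void buffered_write(struct range_state *s)
{
	s->cow = EXT_DELALLOC;
}

/* Variant: hand the delalloc to the data fork instead of just dropping it. */
static void writeback_submit_transfer(struct range_state *s)
{
	assert(s->cow == EXT_DELALLOC);
	s->cow = EXT_HOLE;
	if (s->data == EXT_HOLE)
		s->data = EXT_DELALLOC;
	s->io_in_flight = true;
}

static void io_complete(struct range_state *s)
{
	assert(s->io_in_flight);
	s->io_in_flight = false;
	s->data = EXT_REAL;
}

static bool zero_sees_hole(const struct range_state *s)
{
	return s->data == EXT_HOLE && s->cow == EXT_HOLE;
}

/* Checks zeroing at every point of the write -> writeback -> completion flow. */
static bool any_transient_hole(void)
{
	struct range_state s = { EXT_HOLE, EXT_HOLE, false };
	bool hole = false;

	buffered_write(&s);
	hole |= zero_sees_hole(&s);
	writeback_submit_transfer(&s);
	hole |= zero_sees_hole(&s);
	io_complete(&s);
	hole |= zero_sees_hole(&s);
	return hole;
}
```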

But that's just handwaving and beyond the scope of this series. For now
I'll probably go back to the flush fix and document some of this in the
patch for future reference.

Brian
--- 8< ---
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 255b650c3790..533d44633177 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1651,14 +1651,6 @@ xfs_zoned_buffered_write_iomap_begin(
 				&smap))
 		smap.br_startoff = end_fsb; /* fake hole until EOF */
 	if (smap.br_startoff > offset_fsb) {
-		/*
-		 * We never need to allocate blocks for zeroing a hole.
-		 */
-		if (flags & IOMAP_ZERO) {
-			xfs_hole_to_iomap(ip, iomap, offset_fsb,
-					smap.br_startoff);
-			goto out_unlock;
-		}
 		end_fsb = min(end_fsb, smap.br_startoff);
 	} else {
 		end_fsb = min(end_fsb,
@@ -1690,6 +1682,15 @@ xfs_zoned_buffered_write_iomap_begin(
 	count_fsb = min3(end_fsb - offset_fsb, XFS_MAX_BMBT_EXTLEN,
 			 XFS_B_TO_FSB(mp, 1024 * PAGE_SIZE));
 
+	/*
+	 * We don't allocate blocks for zeroing a hole, but we only report a
+	 * hole in zoned mode if one exists in both the COW and data forks.
+	 */
+	if ((flags & IOMAP_ZERO) && srcmap->type == IOMAP_HOLE) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, end_fsb);
+		goto out_unlock;
+	}
+
 	/*
 	 * The block reservation is supposed to cover all blocks that the
 	 * operation could possible write, but there is a nasty corner case
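
For reference, here is the decision the patch is going for, reduced to a standalone function. The names are hypothetical, and this assumes (as the placement of the new check suggests) that an existing COW fork extent is found and returned earlier in xfs_zoned_buffered_write_iomap_begin(), so the new check only runs when the COW fork has nothing over the offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a model of the patched check, not XFS code. */
enum zero_action {
	ZERO_USE_COW_EXTENT,	/* COW fork already has an extent over the offset */
	ZERO_ALLOC_COW,		/* data fork has blocks: zero out of place via COW */
	ZERO_REPORT_HOLE,	/* neither fork has blocks: nothing to zero */
};

static enum zero_action zoned_zero_action(bool data_has_blocks,
					  bool cow_has_extent)
{
	/* an existing COW fork mapping (e.g. delalloc) is used as is */
	if (cow_has_extent)
		return ZERO_USE_COW_EXTENT;
	/* the new check: report a hole only if the data fork is a hole too */
	if (!data_has_blocks)
		return ZERO_REPORT_HOLE;
	return ZERO_ALLOC_COW;
}
```

Note that this only describes a single lookup; as the race above shows, "both forks are holes" can be a transient state while writeback is in flight.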