From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/6] iomap, xfs: lift zero range hole mapping flush into xfs
Date: Wed, 5 Nov 2025 10:33:16 -0500 [thread overview]
Message-ID: <aQtuPFHtzm8-zeqS@bfoster> (raw)
In-Reply-To: <20251105003114.GY196370@frogsfrogsfrogs>
On Tue, Nov 04, 2025 at 04:31:14PM -0800, Darrick J. Wong wrote:
> On Thu, Oct 16, 2025 at 03:02:59PM -0400, Brian Foster wrote:
> > iomap zero range has a wart in that it also flushes dirty pagecache
> > over hole mappings (rather than only unwritten mappings). This was
> > included to accommodate a quirk in XFS where COW fork preallocation
> > can exist over a hole in the data fork, and the associated range is
> > reported as a hole. This is because the range actually is a hole,
> > but XFS also has an optimization where if COW fork blocks exist for
> > a range being written to, those blocks are used regardless of
> > whether the data fork blocks are shared or not. For zeroing, COW
> > fork blocks over a data fork hole are only relevant if the range is
> > dirty in pagecache, otherwise the range is already considered
> > zeroed.
>
> It occurs to me that the situation (unwritten cow mapping, hole in data
> fork) results in iomap_iter::iomap getting the unwritten mapping, and
> iomap_iter::srcmap getting the hole mapping. iomap_iter_srcmap returns
> iomap_itere::iomap because srcmap.type == HOLE.
>
> But then you have ext4 where there is no cow fork, so it will only ever
> set iomap_iter::iomap, leaving iomap_iter::srcmap set to the default.
> The default srcmap is a HOLE.
>
> So iomap can't distinguish between xfs' speculative cow over a hole
> behavior vs. ext4 just being simple. I wonder if we actually need to
> introduce a new iomap type for "pure overwrite"?
>
I definitely think we need a better solution here in iomap. The current
iomap/srcmap management/handling is quite confusing. What that solution
is, I'm not sure.
> The reason I say that that in designing the fuse-iomap uapi, it was a
> lot easier to understand the programming model if there was always
> explicit read and write mappings being sent back and forth; and a new
> type FUSE_IOMAP_TYPE_PURE_OVERWRITE that could be stored in the write
> mapping to mean "just look at the read mapping". If such a beast were
> ported to the core iomap code then maybe that would help here?
>
I'm not following what this means. Separate read/write mappings for each
individual iomap operation (i.e. "read from here, write to there"), or
separate iomap structures to be used for read ops vs. write ops, or
something else..?
> A hole with an out-of-place mapping needs a flush (or maybe just go find
> the pagecache and zero it), whereas a hole with nothing else backing it
> clearly doesn't need any action at all.
>
> Does that help?
>
This kind of sounds like what we're already doing in iomap, so I suspect
I'm missing something on the fuse side...
WRT this patchset, I'm trying to address the underlying problems that
require the flush-a-dirty-hole hack that provides zeroing correctness
for XFS. This needs to be lifted out of iomap because it also causes
problems for ext4, but can be bypassed completely for XFS as well by the
end of the series. The first part is just a straight lift into xfs, but
the next patches replace the flush with use of the folio batch, and
split off the band-aid case down into insert range where this flush was
also indirectly suppressing issues.
For the former, if you look at the last few patches the main reason we
rely on the flush-a-dirty-hole hack is that we don't actually report the
mappings correctly in XFS for zero range with respect to allowable
behavior. We just report a hole if one exists in the data fork. So this
is trying to encode that "COW fork prealloc over data fork hole"
scenario correctly for zero range, and also identify when we need to
consider whether the mapping range is dirty in cache (i.e. unwritten COW
blocks).
So yes in general I think we need to improve on iomap reporting somehow,
but I don't necessarily see how that avoids the need (or desire) to fix
up the iomap_begin logic. I also think it's confusing enough that it
should probably be a separate discussion (I'd probably need to stare at
the fuse-related proposition to grok it).
Ultimately the flush in zero range should go away completely except for
the default/fallback case where the fs supports zero range, fails to
check pagecache itself, and iomap has otherwise detected that the range
over an unwritten mapping was dirty. There has been some discussion over
potentially lifting the batch lookup into iomap as well, but there are
some details that would need to be worked out to determine whether that
can be done safely.
Brian
> --D
>
> > The easiest way to deal with this corner case is to flush the
> > pagecache to trigger COW remapping into the data fork, and then
> > operate on the updated on-disk state. The problem is that ext4
> > cannot accommodate a flush from this context due to being a
> > transaction deadlock vector.
> >
> > Outside of the hole quirk, ext4 can avoid the flush for zero range
> > by using the recently introduced folio batch lookup mechanism for
> > unwritten mappings. Therefore, take the next logical step and lift
> > the hole handling logic into the XFS iomap_begin handler. iomap will
> > still flush on unwritten mappings without a folio batch, and XFS
> > will flush and retry mapping lookups in the case where it would
> > otherwise report a hole with dirty pagecache during a zero range.
> >
> > Note that this is intended to be a fairly straightforward lift and
> > otherwise not change behavior. Now that the flush exists within XFS,
> > follow on patches can further optimize it.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/iomap/buffered-io.c | 2 +-
> > fs/xfs/xfs_iomap.c | 25 ++++++++++++++++++++++---
> > 2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 05ff82c5432e..d6de689374c3 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1543,7 +1543,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> > srcmap->type == IOMAP_UNWRITTEN)) {
> > s64 status;
> >
> > - if (range_dirty) {
> > + if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
> > range_dirty = false;
> > status = iomap_zero_iter_flush_and_stale(&iter);
> > } else {
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 01833aca37ac..b84c94558cc9 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1734,6 +1734,7 @@ xfs_buffered_write_iomap_begin(
> > if (error)
> > return error;
> >
> > +restart:
> > error = xfs_ilock_for_iomap(ip, flags, &lockmode);
> > if (error)
> > return error;
> > @@ -1761,9 +1762,27 @@ xfs_buffered_write_iomap_begin(
> > if (eof)
> > imap.br_startoff = end_fsb; /* fake hole until the end */
> >
> > - /* We never need to allocate blocks for zeroing or unsharing a hole. */
> > - if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
> > - imap.br_startoff > offset_fsb) {
> > + /* We never need to allocate blocks for unsharing a hole. */
> > + if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
> > + xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * We may need to zero over a hole in the data fork if it's fronted by
> > + * COW blocks and dirty pagecache. To make sure zeroing occurs, force
> > + * writeback to remap pending blocks and restart the lookup.
> > + */
> > + if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > + if (filemap_range_needs_writeback(inode->i_mapping, offset,
> > + offset + count - 1)) {
> > + xfs_iunlock(ip, lockmode);
> > + error = filemap_write_and_wait_range(inode->i_mapping,
> > + offset, offset + count - 1);
> > + if (error)
> > + return error;
> > + goto restart;
> > + }
> > xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > goto out_unlock;
> > }
> > --
> > 2.51.0
> >
> >
>
next prev parent reply other threads:[~2025-11-05 15:28 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 19:02 [PATCH 0/6] iomap, xfs: improve zero range flushing and lookup Brian Foster
2025-10-16 19:02 ` [PATCH 1/6] iomap: replace folio_batch allocation with stack allocation Brian Foster
2025-11-05 0:07 ` Darrick J. Wong
2025-11-05 15:27 ` Brian Foster
2025-11-05 21:41 ` Darrick J. Wong
2025-11-06 15:51 ` Brian Foster
2025-11-06 15:58 ` Darrick J. Wong
2025-10-16 19:02 ` [PATCH 2/6] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
2025-11-05 0:31 ` Darrick J. Wong
2025-11-05 15:33 ` Brian Foster [this message]
2025-11-05 22:23 ` Darrick J. Wong
2025-11-06 15:52 ` Brian Foster
2025-11-06 23:32 ` Darrick J. Wong
2025-11-07 13:52 ` Brian Foster
2025-11-07 13:59 ` Christoph Hellwig
2025-11-07 13:57 ` Christoph Hellwig
2025-11-07 13:55 ` Christoph Hellwig
2025-10-16 19:03 ` [PATCH 3/6] xfs: flush eof folio before insert range size update Brian Foster
2025-11-05 0:14 ` Darrick J. Wong
2025-11-05 15:34 ` Brian Foster
2025-11-05 22:15 ` Darrick J. Wong
2025-10-16 19:03 ` [PATCH 4/6] xfs: look up cow fork extent earlier for buffered iomap_begin Brian Foster
2025-11-05 22:26 ` Darrick J. Wong
2025-10-16 19:03 ` [PATCH 5/6] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
2025-10-16 19:03 ` [PATCH 6/6] xfs: replace zero range flush with folio batch Brian Foster
2025-11-05 22:37 ` Darrick J. Wong
2025-11-06 15:53 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aQtuPFHtzm8-zeqS@bfoster \
--to=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).