From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/6] xfs: flush eof folio before insert range size update
Date: Tue, 18 Nov 2025 15:08:35 -0500 [thread overview]
Message-ID: <aRzSQypQIad3TsBT@bfoster> (raw)
In-Reply-To: <aQtughoBHt6LRTUx@bfoster>
On Wed, Nov 05, 2025 at 10:34:26AM -0500, Brian Foster wrote:
> On Tue, Nov 04, 2025 at 04:14:45PM -0800, Darrick J. Wong wrote:
> > On Thu, Oct 16, 2025 at 03:03:00PM -0400, Brian Foster wrote:
> > > The flush in xfs_buffered_write_iomap_begin() for zero range over a
> > > data fork hole fronted by COW fork prealloc is primarily designed to
> > > provide correct zeroing behavior in particular pagecache conditions.
> > > As it turns out, this also partially masks some odd behavior in
> > > insert range (via zero range via setattr).
> > >
> > > Insert range bumps i_size the length of the new range, flushes,
> > > unmaps pagecache and cancels COW prealloc, and then right shifts
> > > extents from the end of the file back to the target offset of the
> > > insert. Since the i_size update occurs before the pagecache flush,
> > > this creates a transient situation where writeback around EOF can
> > > behave differently.
> >
> > Why not flush the file from @offset to EOF, flush the COW
> > preallocations, extend i_size, and only then start shifting extents?
> > That would seem a lot more straightforward to me.
> >
>
> I agree. I noted in the cover letter that I started with this approach
> of reordering the existing sequence of operations, but the factoring
> looked ugly enough that I stopped and wanted to solicit input.
>
Well this is annoying.. I looked into lifting the prepare shift bits a
level up and just invoking it before the truncate, but that actually
appears to be incorrect in at least one case. I.e., the truncate up for
insert range invokes zero eof for partial zeroing of the eof block,
which brings a folio back into pagecache (after the lifted flush/unmap)
within the range being shfited. This then creates a post-shift cache
inconsistency. This is resolved by restoring the flush/unmap between the
truncate and extent shift, so we end up just having it in both places.
So atm I'm not really sure if there's a better option than what this
patch is doing. There are probably other ways to implement it, like
perhaps introducing ability to convert blocks that are shifted out from
eof, but that just seems like unnecessary complexity for basically
implementing the same sort of thing.
I dunno.. insert range is odd in this regard. I'll think a bit more
about it while I look into the zero range cow mapping reporting thing.
Brian
> The details of that fell out of my brain since I posted this,
> unfortunately. I suspect it may have been related to layering or
> something wrt the prepare_shift factoring, but I'll take another look in
> that direction for v2 and once I've got some feedback on the rest of the
> series.. Thanks.
>
> Brian
>
> > --D
> >
> > > This appears to be corner case situation, but if happens to be
> > > fronted by COW fork speculative preallocation and a large, dirty
> > > folio that contains at least one full COW block beyond EOF, the
> > > writeback after i_size is bumped may remap that COW fork block into
> > > the data fork within EOF. The block is zeroed and then shifted back
> > > out to post-eof, but this is unexpected in that it leads to a
> > > written post-eof data fork block. This can cause a zero range
> > > warning on a subsequent size extension, because we should never find
> > > blocks that require physical zeroing beyond i_size.
> > >
> > > To avoid this quirk, flush the EOF folio before the i_size update
> > > during insert range. The entire range will be flushed, unmapped and
> > > invalidated anyways, so this should be relatively unnoticeable.
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > fs/xfs/xfs_file.c | 17 +++++++++++++++++
> > > 1 file changed, 17 insertions(+)
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index 5b9864c8582e..cc3a9674ad40 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1226,6 +1226,23 @@ xfs_falloc_insert_range(
> > > if (offset >= isize)
> > > return -EINVAL;
> > >
> > > + /*
> > > + * Let writeback clean up EOF folio state before we bump i_size. The
> > > + * insert flushes before it starts shifting and under certain
> > > + * circumstances we can write back blocks that should technically be
> > > + * considered post-eof (and thus should not be submitted for writeback).
> > > + *
> > > + * For example, a large, dirty folio that spans EOF and is backed by
> > > + * post-eof COW fork preallocation can cause block remap into the data
> > > + * fork. This shifts back out beyond EOF, but creates an expectedly
> > > + * written post-eof block. The insert is going to flush, unmap and
> > > + * cancel prealloc across this whole range, so flush EOF now before we
> > > + * bump i_size to provide consistent behavior.
> > > + */
> > > + error = filemap_write_and_wait_range(inode->i_mapping, isize, isize);
> > > + if (error)
> > > + return error;
> > > +
> > > error = xfs_falloc_setsize(file, isize + len);
> > > if (error)
> > > return error;
> > > --
> > > 2.51.0
> > >
> > >
> >
>
>
next prev parent reply other threads:[~2025-11-18 20:08 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 19:02 [PATCH 0/6] iomap, xfs: improve zero range flushing and lookup Brian Foster
2025-10-16 19:02 ` [PATCH 1/6] iomap: replace folio_batch allocation with stack allocation Brian Foster
2025-11-05 0:07 ` Darrick J. Wong
2025-11-05 15:27 ` Brian Foster
2025-11-05 21:41 ` Darrick J. Wong
2025-11-06 15:51 ` Brian Foster
2025-11-06 15:58 ` Darrick J. Wong
2025-10-16 19:02 ` [PATCH 2/6] iomap, xfs: lift zero range hole mapping flush into xfs Brian Foster
2025-11-05 0:31 ` Darrick J. Wong
2025-11-05 15:33 ` Brian Foster
2025-11-05 22:23 ` Darrick J. Wong
2025-11-06 15:52 ` Brian Foster
2025-11-06 23:32 ` Darrick J. Wong
2025-11-07 13:52 ` Brian Foster
2025-11-07 13:59 ` Christoph Hellwig
2025-11-07 13:57 ` Christoph Hellwig
2025-11-07 13:55 ` Christoph Hellwig
2025-10-16 19:03 ` [PATCH 3/6] xfs: flush eof folio before insert range size update Brian Foster
2025-11-05 0:14 ` Darrick J. Wong
2025-11-05 15:34 ` Brian Foster
2025-11-05 22:15 ` Darrick J. Wong
2025-11-18 20:08 ` Brian Foster [this message]
2025-10-16 19:03 ` [PATCH 4/6] xfs: look up cow fork extent earlier for buffered iomap_begin Brian Foster
2025-11-05 22:26 ` Darrick J. Wong
2025-10-16 19:03 ` [PATCH 5/6] xfs: only flush when COW fork blocks overlap data fork holes Brian Foster
2025-10-16 19:03 ` [PATCH 6/6] xfs: replace zero range flush with folio batch Brian Foster
2025-11-05 22:37 ` Darrick J. Wong
2025-11-06 15:53 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRzSQypQIad3TsBT@bfoster \
--to=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).