Re: [RFC] gfs2: Do not call iomap_zero_range beyond eof

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	gfs2@lists.linux.dev
Subject: Re: [RFC] gfs2: Do not call iomap_zero_range beyond eof
Date: Thu, 8 May 2025 12:34:33 -0400	[thread overview]
Message-ID: <aBzdGfHZmx-9j8xi@bfoster> (raw)
In-Reply-To: <20250508160422.GN1035866@frogsfrogsfrogs>

On Thu, May 08, 2025 at 09:04:22AM -0700, Darrick J. Wong wrote:
> On Thu, May 08, 2025 at 11:19:37AM -0400, Brian Foster wrote:
> > On Thu, May 08, 2025 at 08:04:46AM -0700, Darrick J. Wong wrote:
> > > On Thu, May 08, 2025 at 10:30:30AM -0400, Brian Foster wrote:
> > > > On Thu, May 08, 2025 at 03:34:27PM +0200, Andreas Gruenbacher wrote:
> > > > > Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof
> > > > > folio"), iomap_zero_range() warns when asked to zero a folio beyond eof.
> > > > > The warning triggers on the following code path:
> > > 
> > > Which warning?  This one?
> > > 
> > > 	/* warn about zeroing folios beyond eof that won't write back */
> > > 	WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
> > > 
> > > If so, then why are there folios that start entirely beyond EOF?
> > > 
> > 
> > Yeah.. this gfs2 instance is simply a case of their punch hole mechanism
> > does unconditional partial folio zeroing via iomap zero range, so if a
> > punch hole occurs on some unaligned range of post-eof blocks, it will
> > basically create and perform zeroing of post-eof folios. IIUC the caveat
> > here is that these blocks are all zeroed on alloc (unwritten extents are
> > apparently not a thing in gfs2), so the punch time zeroing and warning
> > are spurious. Andreas can correct me if I have any of that wrong.
> 
> Oh, right, because iomap_zero_iter calls iomap_write_begin, which
> allocates a new folio completely beyond EOF, and then we see that new
> folio and WARN about it before scribbling on the folio and dirtying it.
> Correct?
> 
> If so then yeah, it doesn't seem useful to do that... unless the file
> size immediately gets extended such that at least one byte of the dirty
> folio is within EOF.  Even then, that seems like a stretch...
> 

Yep, agreed. An i_size update after a zero op would come after locks are
dropped and whatnot as well, so seems racy and wrong.

> > > > > 
> > > > >   gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > > > >     __gfs2_punch_hole()
> > > > >       gfs2_block_zero_range()
> > > > >         iomap_zero_range()
> > > > > 
> > > > > So far, gfs2 is just zeroing out partial pages at the beginning and end
> > > > > of the range, whether beyond eof or not.  The data beyond eof is already
> > > > > expected to be all zeroes, though.  Truncate the range passed to
> > > > > iomap_zero_range().
> > > > > 
> > > > > As an alternative approach, we could also implicitly truncate the range
> > > > > inside iomap_zero_range() instead of issuing a warning.  Any thoughts?
> > > > > 
> > > > 
> > > > Thanks Andreas. The more I think about this the more it seems like
> > > > lifting this logic into iomap is a reasonable compromise between just
> > > > dropping the warning and forcing individual filesystems to work around
> > > > it. The original intent of the warning was to have something to catch
> > > > subtle bad behavior since zero range did update i_size for so long.
> > > > 
> > > > OTOH I think it's reasonable to argue that we shouldn't need to warn in
> > > > situations where we could just enforce correct behavior. Also, I believe
> > > > we introduced something similar to avoid post-eof weirdness wrt unshare
> > > > range [1], so precedent exists.
> > > > 
> > > > I'm interested if others have opinions on the iomap side.. (though as I
> > > > write this it looks like hch sits on the side of not tweaking iomap).
> > > 
> > > IIRC XFS calls iomap_zero_range during file extending operations to zero
> > > the tail of a folio that spans EOF, so you'd have to allow for that too.
> > > 
> > 
> > Yeah, good point. Perhaps we'd want to bail on a folio that starts
> > beyond EOF with this approach, similar to the warning logic.
> 
> ...because I don't see much use in zeroing and dirtying a folio that
> starts well beyond EOF since iomap_writepage_handle_eof will ignore it
> and there are several gigantic comments in buffered-io.c about clamping
> to EOF.
> 
> <shrug> But maybe I'm missing a usecase?
> 

Not that I'm aware of.

Brian

> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > [1] a311a08a4237 ("iomap: constrain the file range passed to iomap_file_unshare")
> > > > 
> > > > > Thanks,
> > > > > Andreas
> > > > > 
> > > > > --
> > > > > 
> > > > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > > > > 
> > > > > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> > > > > index b81984def58e..d9a4309cd414 100644
> > > > > --- a/fs/gfs2/bmap.c
> > > > > +++ b/fs/gfs2/bmap.c
> > > > > @@ -1301,6 +1301,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
> > > > >  				 unsigned int length)
> > > > >  {
> > > > >  	BUG_ON(current->journal_info);
> > > > > +	if (from > inode->i_size)
> > > > > +		return 0;
> > > > > +	if (from + length > inode->i_size)
> > > > > +		length = inode->i_size - from;
> > > > >  	return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops,
> > > > >  			NULL);
> > > > >  }
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
>

next prev parent reply	other threads:[~2025-05-08 16:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-08 13:34 [RFC] gfs2: Do not call iomap_zero_range beyond eof Andreas Gruenbacher
2025-05-08 14:17 ` Christoph Hellwig
2025-05-08 14:30 ` Brian Foster
2025-05-08 15:04   ` Darrick J. Wong
2025-05-08 15:19     ` Brian Foster
2025-05-08 16:04       ` Darrick J. Wong
2025-05-08 16:34         ` Brian Foster [this message]
2025-05-08 16:36         ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBzdGfHZmx-9j8xi@bfoster \
    --to=bfoster@redhat.com \
    --cc=agruenba@redhat.com \
    --cc=brauner@kernel.org \
    --cc=djwong@kernel.org \
    --cc=gfs2@lists.linux.dev \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.