linux-fsdevel.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Zhang Yi <yi.zhang@huaweicloud.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, brauner@kernel.org,
	david@fromorbit.com, chandanbabu@kernel.org, jack@suse.cz,
	willy@infradead.org, yi.zhang@huawei.com,
	chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [RFC PATCH v4 1/8] iomap: zeroing needs to be pagecache aware
Date: Fri, 31 May 2024 11:43:22 -0400
Message-ID: <ZlnwGhuvUBLyiE6J@bfoster>
In-Reply-To: <20240531140358.GF52987@frogsfrogsfrogs>

On Fri, May 31, 2024 at 07:03:58AM -0700, Darrick J. Wong wrote:
> On Fri, May 31, 2024 at 06:11:25AM -0700, Christoph Hellwig wrote:
> > On Wed, May 29, 2024 at 05:51:59PM +0800, Zhang Yi wrote:
> > > XXX: how do we detect an iomap containing a cow mapping over a hole
> > > in iomap_zero_iter()? The XFS code implies this case also needs to
> > > zero the page cache if there is data present, so trigger for page
> > > cache lookup only in iomap_zero_iter() needs to handle this case as
> > > well.
> > 
> > If there is no data in the page cache and either a hole or unwritten
> > extent it really should not matter what is in the COW fork, as there
> > obviously isn't any data we could zero.
> > 
> > If there is data in the page cache for something that is marked as
> > a hole in the srcmap, but we have data in the COW fork due to
> > COW extsize preallocation we'd need to zero it, but as the
> > xfs iomap ops don't return a separate srcmap for that case we
> > should be fine.  Or am I missing something?
> 
> It might be useful to skip the scan for dirty pagecache if both forks
> have holes, since (in theory) that's never possible on xfs.
> 
> OTOH maybe there are filesystems that allow dirty pagecache over a hole?
> 

IIRC there was a case where dirty cache can exist over what is reported
as a hole to zero range. I want to say it was something like a COW
prealloc over a data fork hole followed by a buffered write and then a
zero range, but I don't recall the details. That is all something that
should be fixed on the lookup side anyways.

Brian

> > > + * Note: when zeroing unwritten extents, we might have data in the page cache
> > > + * over an unwritten extent. In this case, we want to do a pure lookup on the
> > > + * page cache and not create a new folio as we don't need to perform zeroing on
> > > + * unwritten extents if there is no cached data over the given range.
> > >   */
> > >  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
> > >  {
> > >  	fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
> > >  
> > > +	if (iter->flags & IOMAP_ZERO) {
> > > +		const struct iomap *srcmap = iomap_iter_srcmap(iter);
> > > +
> > > +		if (srcmap->type == IOMAP_UNWRITTEN)
> > > +			fgp &= ~FGP_CREAT;
> > > +	}
> > 
> > Nit:  The comment would probably stand out a little better if it was
> > right next to the IOMAP_ZERO conditional instead of above the
> > function.
> 
> Agreed.
> 
> > > +		if (status) {
> > > +			if (status == -ENOENT) {
> > > +				/*
> > > +				 * Unwritten extents need to have page cache
> > > +				 * lookups done to determine if they have data
> > > +				 * over them that needs zeroing. If there is no
> > > +				 * data, we'll get -ENOENT returned here, so we
> > > +				 * can just skip over this index.
> > > +				 */
> > > +				WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN);
> > 
> > I'd return -EIO if the WARN_ON triggers.
> > 
> > > +loop_continue:
> > 
> > While I'm no stranger to gotos for loop control something trips me
> > up about jumping to the end of the loop.  Here is what I could come
> > up with instead.  Not arguing it's objectively better, but I somehow
> > like it a little better:
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 700b22d6807783..81378f7cd8d7ff 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1412,49 +1412,56 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> >  		bool ret;
> >  
> >  		status = iomap_write_begin(iter, pos, bytes, &folio);
> > -		if (status) {
> > -			if (status == -ENOENT) {
> > -				/*
> > -				 * Unwritten extents need to have page cache
> > -				 * lookups done to determine if they have data
> > -				 * over them that needs zeroing. If there is no
> > -				 * data, we'll get -ENOENT returned here, so we
> > -				 * can just skip over this index.
> > -				 */
> > -				WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN);
> > -				if (bytes > PAGE_SIZE - offset_in_page(pos))
> > -					bytes = PAGE_SIZE - offset_in_page(pos);
> > -				goto loop_continue;
> > -			}
> > +		if (status && status != -ENOENT)
> >  			return status;
> > -		}
> > -		if (iter->iomap.flags & IOMAP_F_STALE)
> > -			break;
> >  
> > -		offset = offset_in_folio(folio, pos);
> > -		if (bytes > folio_size(folio) - offset)
> > -			bytes = folio_size(folio) - offset;
> > +		if (status == -ENOENT) {
> > +			/*
> > +			 * If we end up here, we did not find a folio in the
> > +			 * page cache for an unwritten extent and thus can
> > +			 * skip over the range.
> > +			 */
> > +			if (WARN_ON_ONCE(srcmap->type != IOMAP_UNWRITTEN))
> > +				return -EIO;
> >  
> > -		/*
> > -		 * If the folio over an unwritten extent is clean (i.e. because
> > -		 * it has been read from), then it already contains zeros. Hence
> > -		 * we can just skip it.
> > -		 */
> > -		if (srcmap->type == IOMAP_UNWRITTEN &&
> > -		    !folio_test_dirty(folio)) {
> > -			folio_unlock(folio);
> > -			goto loop_continue;
> > +			/*
> > +			 * XXX: It would be nice if we could get the offset of
> > +			 * the next entry in the pagecache so that we don't have
> > +			 * to iterate one page at a time here.
> > +			 */
> > +			offset = offset_in_page(pos);
> > +			if (bytes > PAGE_SIZE - offset)
> > +				bytes = PAGE_SIZE - offset;
> 
> Why is it PAGE_SIZE here and not folio_size() like below?
> 
> (I know you're just copying the existing code; I'm merely wondering if
> this is some minor bug.)
> 
> --D
> 
> > +		} else {
> > +			if (iter->iomap.flags & IOMAP_F_STALE)
> > +				break;
> > +
> > +			offset = offset_in_folio(folio, pos);
> > +			if (bytes > folio_size(folio) - offset)
> > +				bytes = folio_size(folio) - offset;
> > +		
> > +			/*
> > +			 * If the folio over an unwritten extent is clean (i.e.
> > +			 * because it has only been read from), then it already
> > +			 * contains zeros.  Hence we can just skip it.
> > +			 */
> > +			if (srcmap->type == IOMAP_UNWRITTEN &&
> > +			    !folio_test_dirty(folio)) {
> > +				folio_unlock(folio);
> > +				status = -ENOENT;
> > +			}
> >  		}
> >  
> > -		folio_zero_range(folio, offset, bytes);
> > -		folio_mark_accessed(folio);
> > +		if (status != -ENOENT) {
> > +			folio_zero_range(folio, offset, bytes);
> > +			folio_mark_accessed(folio);
> >  
> > -		ret = iomap_write_end(iter, pos, bytes, bytes, folio);
> > -		__iomap_put_folio(iter, pos, bytes, folio);
> > -		if (WARN_ON_ONCE(!ret))
> > -			return -EIO;
> > +			ret = iomap_write_end(iter, pos, bytes, bytes, folio);
> > +			__iomap_put_folio(iter, pos, bytes, folio);
> > +			if (WARN_ON_ONCE(!ret))
> > +				return -EIO;
> > +		}
> >  
> > -loop_continue:
> >  		pos += bytes;
> >  		length -= bytes;
> >  		written += bytes;
> > 
> 


