From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing
Date: Mon, 13 Jan 2025 09:32:37 -0500 [thread overview]
Message-ID: <Z4UkBfnm5kSdYdv3@bfoster> (raw)
In-Reply-To: <Z4SbwEbcp5AlxMIv@infradead.org>
On Sun, Jan 12, 2025 at 08:51:12PM -0800, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 12:53:19PM -0500, Brian Foster wrote:
> > processing.
> >
> > For example, if we have a largish dirty folio backed by an unwritten
> > extent with maybe a single block that is actually dirty, would we be
> > alright to just zero the requested portion of the folio as long as some
> > part of the folio is dirty? Given the historical ad hoc nature of XFS
> > speculative prealloc zeroing, personally I don't see that as much of an
> > issue in practice as long as subsequent reads return zeroes, but I could
> > be missing something.
>
> That's a very good question I haven't thought about much yet. And
> every time I try to think of speculative preallocations and their
> implications my head begins to implode..
>
Heh. Just some context on my thought process, FWIW.. We've obviously
zeroed the newly exposed file range for writes that start beyond EOF for
quite some time. This new post-EOF range may or may not have been backed
by speculative prealloc, and if so, that prealloc may be either delalloc
or unwritten extents depending on whether writeback occurred on the EOF
extent (assuming large enough free extents, etc.) before the extending
write.
In turn, this means that an extending-write zero range would have either
physically zeroed delalloc extents or skipped unwritten blocks,
depending on the situation. Personally, I don't think it really matters
which, as there is no real guarantee that "all blocks not previously
written to are unwritten," for example, but rather just that "all blocks
not written to return zeroes on read." For that reason, I'm _hoping_
that we can keep this simple and just accept some potential spurious
zeroing on folios that are already dirty, but I'm open to arguments
against that.
Note that the post-EOF zero range behavior changed in XFS over the past
few releases in that it now always converts post-EOF delalloc to
unwritten in iomap_begin(), but IIRC this was to deal with some other
unrelated issue. I also don't think that change is necessarily right
because it can significantly increase the rate of physical block
allocations in some workloads, but that's a separate issue, particularly
now that zero range doesn't update i_size.. ;P
> > > > static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > > > {
> > > > + if (iter->fbatch) {
> > > > + folio_batch_release(iter->fbatch);
> > > > + kfree(iter->fbatch);
> > > > + iter->fbatch = NULL;
> > > > + }
> > >
> > > Does it make sense to free the fbatch allocation on every iteration,
> > > or should we keep the memory allocation around and only free it after
> > > the last iteration?
> > >
> >
> > In the current implementation the existence of the fbatch is what
> > controls the folio lookup path, so we'd only want it for unwritten
> > mappings. That said, this could be done differently with a flag or
> > something that indicates whether to use the batch. Given that we release
> > the folios anyways and zero range isn't the most frequent thing, I
> > figured this keeps things simple for now. I don't really have a strong
> > preference for either approach, however.
>
> I was just worried about the overhead of allocating and freeing
> it all the time. OTOH we probably rarely have more than a single
> extent to process with the batch right now.
>
Ok. This maintains zero range performance in my testing so far, so I'm
going to keep things simple for now until there's a reason to do
otherwise. I'm open to changing it in future iterations, and I suspect
it might change anyway if this ends up being used for more operations...
Thanks again for the comments.
Brian
Thread overview: 13+ messages
2024-12-13 15:05 [PATCH RFCv2 0/4] iomap: zero range folio batch processing prototype Brian Foster
2024-12-13 15:05 ` [PATCH RFCv2 1/4] iomap: prep work for folio_batch support Brian Foster
2024-12-13 15:05 ` [PATCH RFCv2 2/4] iomap: optional zero range dirty folio processing Brian Foster
2025-01-09 7:20 ` Christoph Hellwig
2025-01-10 17:53 ` Brian Foster
2025-01-13 4:51 ` Christoph Hellwig
2025-01-13 14:32 ` Brian Foster [this message]
2025-01-15 5:47 ` Christoph Hellwig
2025-01-16 14:14 ` Brian Foster
2024-12-13 15:05 ` [PATCH RFCv2 3/4] xfs: always trim mapping to requested range for zero range Brian Foster
2025-01-09 7:22 ` Christoph Hellwig
2024-12-13 15:05 ` [PATCH RFCv2 4/4] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
2025-01-09 7:26 ` Christoph Hellwig