From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] iomap: elide zero range flush from partial eof zeroing
Date: Thu, 24 Oct 2024 10:08:17 -0700 [thread overview]
Message-ID: <20241024170817.GK21853@frogsfrogsfrogs> (raw)
In-Reply-To: <20241023143029.11275-1-bfoster@redhat.com>
On Wed, Oct 23, 2024 at 10:30:29AM -0400, Brian Foster wrote:
> iomap zero range performs a pagecache flush upon seeing unwritten
> extents with dirty pagecache in order to determine accurate
> subranges that require direct zeroing. This is to support an
> optimization where clean, unwritten ranges are skipped as they are
> already zero on-disk.
>
> Certain use cases for zero range are more sensitive to flush latency
> than others. The kernel test robot recently reported a regression in
> the following stress-ng workload on XFS:
>
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
>
> This workload involves a series of small, strided, file-extending
> writes. On XFS, this produces a pattern of allocating post-eof
> speculative preallocation, converting preallocation to unwritten on
> zero range calls, dirtying pagecache over the converted mapping, and
> then repeating the sequence again from the updated EOF. This
> basically produces a sequence of pagecache flushes on the partial
> EOF block zeroing use case of zero range.
>
> To mitigate this problem, special case the EOF block zeroing use
> case to prefer zeroing over a pagecache flush when the EOF folio is
> already dirty. This brings most of the performance back by avoiding
> flushes on write and truncate extension operations, while preserving
> the ability for iomap to flush and properly process larger ranges.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> Hi iomap maintainers,
>
> This is an incremental optimization for the regression reported by the
> test robot here[1]. I'm not totally convinced this is necessary as an
> immediate fix, but the discussion on that thread was enough to suggest
> it could be. I don't really love the factoring, but I had to play a bit
> of whack-a-mole between fstests and stress-ng to restore performance and
> still maintain behavior expectations for some of the tests.
>
> On a positive note, exploring this gave me what I think is a better idea
> for dealing with zero range overall, so I'm working on a followup to
> this that reworks it by splitting zero range across block alignment
> boundaries (similar to how something like truncate page range works, for
> example). This simplifies things by isolating the dirty range check to a
> single folio on an unaligned start offset, which lets the _iter() call
> do a skip or zero (i.e. no more flush_and_stale()), and then
> unconditionally flush the aligned portion to end-of-range. The latter
> flush should be a no-op for every use case I've seen so far, so this
> might entirely avoid the need for anything more complex for zero range.
>
> In summary, I'm posting this as an optional and more "stable-worthy"
> patch for reference and for the maintainers to consider as they like. I
> think it's reasonable to include if we are concerned about this
> particular stress-ng test and are OK with it as a transient solution.
> But if it were up to me, I'd probably sit on it for a bit to determine
> if a more practical user/workload is affected by this, particularly
> knowing that I'm trying to rework it. This could always be applied as a
> stable fix if really needed, but I just don't think the slightly more
> invasive rework is appropriate for -rc.
>
> Thoughts, reviews, flames appreciated.
>
> Brian
>
> [1] https://lore.kernel.org/linux-xfs/202410141536.1167190b-oliver.sang@intel.com/
>
> fs/iomap/buffered-io.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index aa587b2142e2..8fd25b14d120 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1372,6 +1372,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> loff_t pos = iter->pos;
> loff_t length = iomap_length(iter);
> loff_t written = 0;
> + bool eof_zero = false;
>
> /*
> * We must zero subranges of unwritten mappings that might be dirty in
> @@ -1391,12 +1392,23 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> * triggers writeback time post-eof zeroing.
> */
> if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> - if (*range_dirty) {
> + /* range is clean and already zeroed, nothing to do */
> + if (!*range_dirty)
> + return length;
> +
> + /* flush for anything other than partial eof zeroing */
> + if (pos != i_size_read(iter->inode) ||
> + (pos % i_blocksize(iter->inode)) == 0) {
> *range_dirty = false;
> return iomap_zero_iter_flush_and_stale(iter);
> }
> - /* range is clean and already zeroed, nothing to do */
> - return length;
> + /*
> + * Special case partial EOF zeroing. Since we know the EOF
> + * folio is dirty, prefer in-memory zeroing for it. This avoids
> + * excessive flush latency on frequent file size extending
> + * operations.
> + */
> + eof_zero = true;
> }
>
> do {
> @@ -1415,6 +1427,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> offset = offset_in_folio(folio, pos);
> if (bytes > folio_size(folio) - offset)
> bytes = folio_size(folio) - offset;
> + if (eof_zero && length > bytes)
> + length = bytes;
What does this do? I think this causes the loop to break after putting
the folio that caches @pos? And then I guess we go around the loop in
iomap_zero_range again if there were more bytes to zero after this
folio?
--D
>
> folio_zero_range(folio, offset, bytes);
> folio_mark_accessed(folio);
> --
> 2.46.2
>
>