public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] iomap: elide flush from partial eof zero range
Date: Tue, 5 Nov 2024 16:11:30 -0800	[thread overview]
Message-ID: <20241106001130.GP21836@frogsfrogsfrogs> (raw)
In-Reply-To: <20241031140449.439576-3-bfoster@redhat.com>

On Thu, Oct 31, 2024 at 10:04:48AM -0400, Brian Foster wrote:
> iomap zero range flushes pagecache in certain situations to
> determine which parts of the range might require zeroing if dirty
> data is present in pagecache. The kernel robot recently reported a
> regression associated with this flushing in the following stress-ng
> workload on XFS:
> 
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
> 
> This workload involves repeated small, strided, extending writes. On
> XFS, this produces a pattern of post-eof speculative preallocation,
> conversion of preallocation from delalloc to unwritten, dirtying
> pagecache over newly unwritten blocks, and then rinse and repeat
> from the new EOF. This leads to repetitive flushing of the EOF folio
> via the zero range call XFS uses for writes that start beyond
> current EOF.
> 
> To mitigate this problem, special case EOF block zeroing to prefer
> zeroing the folio over a flush when the EOF folio is already dirty.
> To do this, split out and open code handling of an unaligned start
> offset. This brings most of the performance back by avoiding flushes
> on zero range calls via write and truncate extension operations. The
> flush doesn't occur in these situations because the entire range is
> post-eof and therefore the folio that overlaps EOF is the only one
> in the range.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Cc: <stable@vger.kernel.org> # v6.12-rc1
Fixes: 7d9b474ee4cc37 ("iomap: make zero range flush conditional on unwritten mappings")

perhaps?

> ---
>  fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 60386cb7b9ef..343a2fa29bec 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -227,6 +227,18 @@ static void ifs_free(struct folio *folio)
>  	kfree(ifs);
>  }
>  
> +/* helper to reset an iter for reuse */
> +static inline void
> +iomap_iter_init(struct iomap_iter *iter, struct inode *inode, loff_t pos,
> +		loff_t len, unsigned flags)

Nit: maybe call this iomap_iter_reset() ?

Also I wonder if it's really safe to zero iomap_iter::private?
Won't doing that leave a minor logic bomb?

> +{
> +	memset(iter, 0, sizeof(*iter));
> +	iter->inode = inode;
> +	iter->pos = pos;
> +	iter->len = len;
> +	iter->flags = flags;
> +}
> +
>  /*
>   * Calculate the range inside the folio that we actually need to read.
>   */
> @@ -1416,6 +1428,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  		.len		= len,
>  		.flags		= IOMAP_ZERO,
>  	};
> +	struct address_space *mapping = inode->i_mapping;
> +	unsigned int blocksize = i_blocksize(inode);
> +	unsigned int off = pos & (blocksize - 1);
> +	loff_t plen = min_t(loff_t, len, blocksize - off);
>  	int ret;
>  	bool range_dirty;
>  
> @@ -1425,12 +1441,30 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	 * mapping converts on writeback completion and must be zeroed.
>  	 *
>  	 * The simplest way to deal with this is to flush pagecache and process
> -	 * the updated mappings. To avoid an unconditional flush, check dirty
> -	 * state and defer the flush until a combination of dirty pagecache and
> -	 * at least one mapping that might convert on writeback is seen.
> +	 * the updated mappings. First, special case the partial eof zeroing
> +	 * use case since it is more performance sensitive. Zero the start of
> +	 * the range if unaligned and already dirty in pagecache.
> +	 */
> +	if (off &&
> +	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
> +		iter.len = plen;
> +		while ((ret = iomap_iter(&iter, ops)) > 0)
> +			iter.processed = iomap_zero_iter(&iter, did_zero);
> +
> +		/* reset iterator for the rest of the range */
> +		iomap_iter_init(&iter, inode, iter.pos,
> +			len - (iter.pos - pos), IOMAP_ZERO);

Nit: maybe one more tab ^ here?

Also from the previous thread: can you reset the original iter instead
of declaring a second one by zeroing the mappings/processed fields,
re-expanding iter::len, and resetting iter::flags?

I guess we'll still do the flush if the start of the zeroing range
aligns with an fsblock?  I guess if you're going to do a lot of small
extensions then once per fsblock isn't too bad?

--D

> +		if (ret || !iter.len)
> +			return ret;
> +	}
> +
> +	/*
> +	 * To avoid an unconditional flush, check dirty state and defer the
> +	 * flush until a combination of dirty pagecache and at least one
> +	 * mapping that might convert on writeback is seen.
>  	 */
>  	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> -					pos, pos + len - 1);
> +					iter.pos, iter.pos + iter.len - 1);
>  	while ((ret = iomap_iter(&iter, ops)) > 0) {
>  		const struct iomap *s = iomap_iter_srcmap(&iter);
>  		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> -- 
> 2.46.2
> 
> 

  reply	other threads:[~2024-11-06  0:11 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-31 14:04 [PATCH v2 0/2] iomap: avoid flushes for partial eof zeroing Brian Foster
2024-10-31 14:04 ` [PATCH v2 1/2] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
2024-11-06  0:14   ` Darrick J. Wong
2024-10-31 14:04 ` [PATCH v2 2/2] iomap: elide flush from partial eof zero range Brian Foster
2024-11-06  0:11   ` Darrick J. Wong [this message]
2024-11-06 14:13     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241106001130.GP21836@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=bfoster@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox