From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] iomap: elide flush from partial eof zero range
Date: Tue, 5 Nov 2024 16:11:30 -0800
Message-ID: <20241106001130.GP21836@frogsfrogsfrogs>
In-Reply-To: <20241031140449.439576-3-bfoster@redhat.com>

On Thu, Oct 31, 2024 at 10:04:48AM -0400, Brian Foster wrote:
> iomap zero range flushes pagecache in certain situations to
> determine which parts of the range might require zeroing if dirty
> data is present in pagecache. The kernel robot recently reported a
> regression associated with this flushing in the following stress-ng
> workload on XFS:
>
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
>
> This workload involves repeated small, strided, extending writes. On
> XFS, this produces a pattern of post-eof speculative preallocation,
> conversion of preallocation from delalloc to unwritten, dirtying
> pagecache over newly unwritten blocks, and then rinse and repeat
> from the new EOF. This leads to repetitive flushing of the EOF folio
> via the zero range call XFS uses for writes that start beyond
> current EOF.
>
> To mitigate this problem, special case EOF block zeroing to prefer
> zeroing the folio over a flush when the EOF folio is already dirty.
> To do this, split out and open code handling of an unaligned start
> offset. This brings most of the performance back by avoiding flushes
> on zero range calls via write and truncate extension operations. The
> flush doesn't occur in these situations because the entire range is
> post-eof and therefore the folio that overlaps EOF is the only one
> in the range.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Cc: <stable@vger.kernel.org> # v6.12-rc1
Fixes: 7d9b474ee4cc37 ("iomap: make zero range flush conditional on unwritten mappings")

perhaps?

> ---
> fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 60386cb7b9ef..343a2fa29bec 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -227,6 +227,18 @@ static void ifs_free(struct folio *folio)
> kfree(ifs);
> }
>
> +/* helper to reset an iter for reuse */
> +static inline void
> +iomap_iter_init(struct iomap_iter *iter, struct inode *inode, loff_t pos,
> + loff_t len, unsigned flags)

Nit: maybe call this iomap_iter_reset()?

Also, I wonder if it's really safe to zero iomap_iter::private?
Won't doing that leave a minor logic bomb?
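
To make the concern above concrete, here is a userspace sketch of a reset
that keeps the caller-owned fields and clears only the per-iteration state.
It is illustrative only: struct demo_iter is a simplified stand-in and not
the real struct iomap_iter, and the names demo_iter_reset, iomap_len and
srcmap_len are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct iomap_iter; NOT the kernel definition. */
struct demo_iter {
	void *inode;		/* opaque in this sketch */
	long long pos;
	long long len;
	long long processed;
	unsigned flags;
	long long iomap_len;	/* hypothetical stand-in for the cached mapping */
	long long srcmap_len;	/* hypothetical stand-in for the cached srcmap */
	void *private;		/* caller-owned; must survive a reset */
};

/*
 * Reset the iterator to cover [pos, pos + len) while preserving the fields
 * the caller set up (inode, flags, private). Everything else is zeroed.
 */
static void demo_iter_reset(struct demo_iter *iter, long long pos,
			    long long len)
{
	void *inode;
	void *private;
	unsigned flags;

	assert(iter != NULL);

	/* Save the caller-owned fields before wiping the structure. */
	inode = iter->inode;
	private = iter->private;
	flags = iter->flags;

	memset(iter, 0, sizeof(*iter));

	iter->inode = inode;
	iter->private = private;
	iter->flags = flags;
	iter->pos = pos;
	iter->len = len;
}
```

With a helper shaped like this, a caller that later starts passing a private
pointer would not silently lose it across the reset. Whether that matters for
iomap_zero_range() today depends on whether any caller sets the field.
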
> +{
> +	memset(iter, 0, sizeof(*iter));
> +	iter->inode = inode;
> +	iter->pos = pos;
> +	iter->len = len;
> +	iter->flags = flags;
> +}
> +
> /*
> * Calculate the range inside the folio that we actually need to read.
> */
> @@ -1416,6 +1428,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> .len = len,
> .flags = IOMAP_ZERO,
> };
> + struct address_space *mapping = inode->i_mapping;
> + unsigned int blocksize = i_blocksize(inode);
> + unsigned int off = pos & (blocksize - 1);
> + loff_t plen = min_t(loff_t, len, blocksize - off);
> int ret;
> bool range_dirty;
>
> @@ -1425,12 +1441,30 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> * mapping converts on writeback completion and must be zeroed.
> *
> * The simplest way to deal with this is to flush pagecache and process
> - * the updated mappings. To avoid an unconditional flush, check dirty
> - * state and defer the flush until a combination of dirty pagecache and
> - * at least one mapping that might convert on writeback is seen.
> + * the updated mappings. First, special case the partial eof zeroing
> + * use case since it is more performance sensitive. Zero the start of
> + * the range if unaligned and already dirty in pagecache.
> + */
> +	if (off &&
> +	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
> +		iter.len = plen;
> +		while ((ret = iomap_iter(&iter, ops)) > 0)
> +			iter.processed = iomap_zero_iter(&iter, did_zero);
> +
> +		/* reset iterator for the rest of the range */
> +		iomap_iter_init(&iter, inode, iter.pos,
> +			len - (iter.pos - pos), IOMAP_ZERO);

Nit: maybe one more tab ^ here?

Also, from the previous thread: can you reset the original iter instead
of declaring a second one? That would mean zeroing the mappings/processed
fields, re-expanding iter::len, and resetting iter::flags.

I guess we'll still do the flush if the start of the zeroing range
aligns with an fsblock? I guess if you're going to do a lot of small
extensions then once per fsblock isn't too bad?
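
The alignment point can be checked with a small userspace sketch of the
off/plen arithmetic from the hunk above. This is only an illustration: the
helper names partial_block_len and takes_partial_path are hypothetical, and
the result of filemap_range_needs_writeback() is modelled as a plain boolean
argument rather than queried from pagecache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mirror of the quoted computation:
 *   off  = pos & (blocksize - 1)
 *   plen = min(len, blocksize - off)
 * blocksize must be a power of two. The offset within the block is
 * returned through off_out when it is non-NULL.
 */
static long long partial_block_len(long long pos, long long len,
				   unsigned blocksize, unsigned *off_out)
{
	unsigned off;
	long long to_end;

	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	off = (unsigned)(pos & (long long)(blocksize - 1));
	to_end = (long long)blocksize - off;
	if (off_out != NULL)
		*off_out = off;
	return len < to_end ? len : to_end;
}

/*
 * The special case only applies when the start is unaligned AND the first
 * block is dirty in pagecache ("dirty" stands in for the writeback check).
 */
static bool takes_partial_path(long long pos, unsigned blocksize, bool dirty)
{
	return (pos & (long long)(blocksize - 1)) != 0 && dirty;
}
```

For a 4096-byte block size, a range starting at pos = 5000 has off = 904 and
at most 3192 bytes handled by the special case, while a range starting at
pos = 8192 has off = 0 and skips it, which is the once-per-fsblock flush
mentioned above.
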
--D
> +		if (ret || !iter.len)
> +			return ret;
> +	}
> +
> + /*
> + * To avoid an unconditional flush, check dirty state and defer the
> + * flush until a combination of dirty pagecache and at least one
> + * mapping that might convert on writeback is seen.
> */
> range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> - pos, pos + len - 1);
> + iter.pos, iter.pos + iter.len - 1);
> while ((ret = iomap_iter(&iter, ops)) > 0) {
> const struct iomap *s = iomap_iter_srcmap(&iter);
> if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> --
> 2.46.2
>
>
Thread overview: 6+ messages
2024-10-31 14:04 [PATCH v2 0/2] iomap: avoid flushes for partial eof zeroing Brian Foster
2024-10-31 14:04 ` [PATCH v2 1/2] iomap: lift zeroed mapping handling into iomap_zero_range() Brian Foster
2024-11-06 0:14 ` Darrick J. Wong
2024-10-31 14:04 ` [PATCH v2 2/2] iomap: elide flush from partial eof zero range Brian Foster
2024-11-06 0:11 ` Darrick J. Wong [this message]
2024-11-06 14:13 ` Brian Foster