From: Dave Chinner <dgc@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>,
Fengnan Chang <fengnanchang@gmail.com>,
brauner@kernel.org, hch@infradead.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
lidiangang@bytedance.com,
Fengnan Chang <changfengnan@bytedance.com>
Subject: Re: [PATCH] iomap: avoid memset iomap when iter is done
Date: Fri, 17 Apr 2026 08:42:05 +1000 [thread overview]
Message-ID: <aeFlvXDShzIuOU9z@dread> (raw)
In-Reply-To: <20260416152705.GC114239@frogsfrogsfrogs>
On Thu, Apr 16, 2026 at 08:27:05AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 16, 2026 at 09:27:22AM -0400, Brian Foster wrote:
> > On Thu, Apr 16, 2026 at 11:06:42AM +0800, Fengnan Chang wrote:
> > > When iomap_iter() finishes its iteration (returns <= 0), it is no longer
> > > necessary to memset the entire iomap and srcmap structures.
> > >
> > > In high-IOPS scenarios (like 4k randread NVMe polling with io_uring),
> > > where the majority of I/Os complete in a single extent map, this wasted
> > > memory write bandwidth, as the caller will just discard the iterator.
> > >
> > > Use this command to test:
> > > taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> > > -n1 -P1 /mnt/testfile
> > > IOPS improve about 5% on ext4 and XFS.
> > >
> > > However, we MUST still call iomap_iter_reset_iomap() to release the
> > > folio_batch if IOMAP_F_FOLIO_BATCH is set, otherwise we leak page
> > > references. Therefore, split the cleanup logic: always release the
> > > folio_batch, but skip the memset() when ret <= 0.
> > >
> > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> > > ---
> > > fs/iomap/iter.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> > > index c04796f6e57f..91eb5e6165ff 100644
> > > --- a/fs/iomap/iter.c
> > > +++ b/fs/iomap/iter.c
> > > @@ -15,8 +15,6 @@ static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > > }
> > >
> > > iter->status = 0;
> > > - memset(&iter->iomap, 0, sizeof(iter->iomap));
> > > - memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > > }
> > >
> > > /* Advance the current iterator position and decrement the remaining length */
> > > @@ -106,6 +104,9 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> > > if (ret <= 0)
> > > return ret;
> > >
> > > + memset(&iter->iomap, 0, sizeof(iter->iomap));
> > > + memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > > +
> >
> > This seems reasonable to me in principle, but it feels a little odd to
> > leave a reset helper that doesn't really do a "reset." I wonder if this
> > should be refactored into an iomap_iter_complete() (i.e. "complete an
> > iteration") helper that includes the ret assignment logic just above the
> > reset call and returns it, and then maybe leave a oneline comment above
> > the memset so somebody doesn't blindly fold it back in the future. So
> > for example:
> >
> > ret = iomap_iter_complete(iter);
> > if (ret <= 0)
> > return ret;
> >
> > /* save cycles and only clear the mappings if we plan to iterate */
> > memset(..);
> > ...
> >
> > We'd probably have to recheck some of the iter state within the new
> > helper, but that doesn't seem like a big deal to me. Thoughts?
>
> What kind of computer is this where there's a 5% hit to iops from a
> memset of ~150 bytes?
Even small costs can have a big impact when you have to pay it many
times.
i.e. 2 million IOPS * 2 * 72 bytes per IO = 288MB/s of memory being
zeroed unnecessarily in very small (inefficient) chunks.
That's definitely enough to cause a 5% drop in IOPS when the
workload is CPU or memory bandwidth bound....
-Dave.
--
Dave Chinner
dgc@kernel.org
next prev parent reply other threads:[~2026-04-16 22:42 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-16 3:06 [PATCH] iomap: avoid memset iomap when iter is done Fengnan Chang
2026-04-16 13:27 ` Brian Foster
2026-04-16 15:27 ` Darrick J. Wong
2026-04-16 22:42 ` Dave Chinner [this message]
2026-04-17 2:15 ` changfengnan
2026-04-16 23:20 ` Mateusz Guzik
2026-04-17 2:19 ` changfengnan
2026-04-17 7:27 ` Christoph Hellwig
2026-04-17 9:15 ` changfengnan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aeFlvXDShzIuOU9z@dread \
--to=dgc@kernel.org \
--cc=bfoster@redhat.com \
--cc=brauner@kernel.org \
--cc=changfengnan@bytedance.com \
--cc=djwong@kernel.org \
--cc=fengnanchang@gmail.com \
--cc=hch@infradead.org \
--cc=lidiangang@bytedance.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.