From: Dave Chinner <dgc@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>,
Fengnan Chang <fengnanchang@gmail.com>,
brauner@kernel.org, hch@infradead.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
lidiangang@bytedance.com,
Fengnan Chang <changfengnan@bytedance.com>
Subject: Re: [PATCH] iomap: avoid memset iomap when iter is done
Date: Fri, 17 Apr 2026 08:42:05 +1000 [thread overview]
Message-ID: <aeFlvXDShzIuOU9z@dread> (raw)
In-Reply-To: <20260416152705.GC114239@frogsfrogsfrogs>
On Thu, Apr 16, 2026 at 08:27:05AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 16, 2026 at 09:27:22AM -0400, Brian Foster wrote:
> > On Thu, Apr 16, 2026 at 11:06:42AM +0800, Fengnan Chang wrote:
> > > When iomap_iter() finishes its iteration (returns <= 0), it is no longer
> > > necessary to memset the entire iomap and srcmap structures.
> > >
> > > In high-IOPS scenarios (like 4k randread NVMe polling with io_uring),
> > > where the majority of I/Os complete in a single extent map, this wasted
> > > memory write bandwidth, as the caller will just discard the iterator.
> > >
> > > Use this command to test:
> > > taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> > > -n1 -P1 /mnt/testfile
> > > IOPS improve about 5% on ext4 and XFS.
> > >
> > > However, we MUST still call iomap_iter_reset_iomap() to release the
> > > folio_batch if IOMAP_F_FOLIO_BATCH is set, otherwise we leak page
> > > references. Therefore, split the cleanup logic: always release the
> > > folio_batch, but skip the memset() when ret <= 0.
> > >
> > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> > > ---
> > > fs/iomap/iter.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> > > index c04796f6e57f..91eb5e6165ff 100644
> > > --- a/fs/iomap/iter.c
> > > +++ b/fs/iomap/iter.c
> > > @@ -15,8 +15,6 @@ static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > > }
> > >
> > > iter->status = 0;
> > > - memset(&iter->iomap, 0, sizeof(iter->iomap));
> > > - memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > > }
> > >
> > > /* Advance the current iterator position and decrement the remaining length */
> > > @@ -106,6 +104,9 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> > > if (ret <= 0)
> > > return ret;
> > >
> > > + memset(&iter->iomap, 0, sizeof(iter->iomap));
> > > + memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > > +
> >
> > This seems reasonable to me in principle, but it feels a little odd to
> > leave a reset helper that doesn't really do a "reset." I wonder if this
> > should be refactored into an iomap_iter_complete() (i.e. "complete an
> > iteration") helper that includes the ret assignment logic just above the
> > reset call and returns it, and then maybe leave a oneline comment above
> > the memset so somebody doesn't blindly fold it back in the future. So
> > for example:
> >
> > ret = iomap_iter_complete(iter);
> > if (ret <= 0)
> > return ret;
> >
> > /* save cycles and only clear the mappings if we plan to iterate */
> > memset(..);
> > ...
> >
> > We'd probably have to recheck some of the iter state within the new
> > helper, but that doesn't seem like a big deal to me. Thoughts?
>
> What kind of computer is this where there's a 5% hit to iops from a
> memset of ~150 bytes?
Even small costs can have a big impact when you have to pay it many
times.
i.e. 2 million IOPS * 2 * 72 bytes per IO = 288MB/s of memory being
zeroed unnecessarily in very small (inefficient) chunks.
That's definitely enough to cause a 5% drop in IOPS when the
workload is CPU or memory bandwidth bound....
-Dave.
--
Dave Chinner
dgc@kernel.org
next prev parent reply other threads:[~2026-04-16 22:42 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-16 3:06 [PATCH] iomap: avoid memset iomap when iter is done Fengnan Chang
2026-04-16 13:27 ` Brian Foster
2026-04-16 15:27 ` Darrick J. Wong
2026-04-16 22:42 ` Dave Chinner [this message]
2026-04-17 2:15 ` changfengnan
2026-04-16 23:20 ` Mateusz Guzik
2026-04-17 2:19 ` changfengnan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aeFlvXDShzIuOU9z@dread \
--to=dgc@kernel.org \
--cc=bfoster@redhat.com \
--cc=brauner@kernel.org \
--cc=changfengnan@bytedance.com \
--cc=djwong@kernel.org \
--cc=fengnanchang@gmail.com \
--cc=hch@infradead.org \
--cc=lidiangang@bytedance.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox