public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] iomap: avoid memset iomap when iter is done
@ 2026-04-16  3:06 Fengnan Chang
  2026-04-16 13:27 ` Brian Foster
  0 siblings, 1 reply; 5+ messages in thread
From: Fengnan Chang @ 2026-04-16  3:06 UTC (permalink / raw)
  To: brauner, djwong, hch, linux-xfs, linux-fsdevel, linux-ext4
  Cc: lidiangang, Fengnan Chang

When iomap_iter() finishes its iteration (returns <= 0), it is no longer
necessary to memset the entire iomap and srcmap structures.

In high-IOPS scenarios (like 4k randread on NVMe with io_uring
polling), where the majority of I/Os complete within a single extent
mapping, this wastes memory write bandwidth, as the caller will just
discard the iterator.

Tested with the following command:
taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
-n1 -P1 /mnt/testfile
IOPS improves by about 5% on both ext4 and XFS.

However, we MUST still call iomap_iter_reset_iomap() to release the
folio_batch if IOMAP_F_FOLIO_BATCH is set; otherwise we leak page
references. Therefore, split the cleanup logic: always release the
folio_batch, but skip the memset() when ret <= 0.

Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
---
 fs/iomap/iter.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index c04796f6e57f..91eb5e6165ff 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -15,8 +15,6 @@ static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
 	}
 
 	iter->status = 0;
-	memset(&iter->iomap, 0, sizeof(iter->iomap));
-	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
 }
 
 /* Advance the current iterator position and decrement the remaining length */
@@ -106,6 +104,9 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
 	if (ret <= 0)
 		return ret;
 
+	memset(&iter->iomap, 0, sizeof(iter->iomap));
+	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
+
 begin:
 	ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
 			       &iter->iomap, &iter->srcmap);
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] iomap: avoid memset iomap when iter is done
  2026-04-16  3:06 [PATCH] iomap: avoid memset iomap when iter is done Fengnan Chang
@ 2026-04-16 13:27 ` Brian Foster
  2026-04-16 15:27   ` Darrick J. Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Foster @ 2026-04-16 13:27 UTC (permalink / raw)
  To: Fengnan Chang
  Cc: brauner, djwong, hch, linux-xfs, linux-fsdevel, linux-ext4,
	lidiangang, Fengnan Chang

On Thu, Apr 16, 2026 at 11:06:42AM +0800, Fengnan Chang wrote:
> When iomap_iter() finishes its iteration (returns <= 0), it is no longer
> necessary to memset the entire iomap and srcmap structures.
> 
> In high-IOPS scenarios (like 4k randread on NVMe with io_uring
> polling), where the majority of I/Os complete within a single extent
> mapping, this wastes memory write bandwidth, as the caller will just
> discard the iterator.
> 
> Tested with the following command:
> taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> -n1 -P1 /mnt/testfile
> IOPS improves by about 5% on both ext4 and XFS.
> 
> However, we MUST still call iomap_iter_reset_iomap() to release the
> folio_batch if IOMAP_F_FOLIO_BATCH is set, otherwise we leak page
> references. Therefore, split the cleanup logic: always release the
> folio_batch, but skip the memset() when ret <= 0.
> 
> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> ---
>  fs/iomap/iter.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> index c04796f6e57f..91eb5e6165ff 100644
> --- a/fs/iomap/iter.c
> +++ b/fs/iomap/iter.c
> @@ -15,8 +15,6 @@ static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
>  	}
>  
>  	iter->status = 0;
> -	memset(&iter->iomap, 0, sizeof(iter->iomap));
> -	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
>  }
>  
>  /* Advance the current iterator position and decrement the remaining length */
> @@ -106,6 +104,9 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
>  	if (ret <= 0)
>  		return ret;
>  
> +	memset(&iter->iomap, 0, sizeof(iter->iomap));
> +	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> +

This seems reasonable to me in principle, but it feels a little odd to
leave a reset helper that doesn't really do a "reset." I wonder if this
should be refactored into an iomap_iter_complete() (i.e. "complete an
iteration") helper that includes the ret assignment logic just above the
reset call and returns it, and then maybe leave a one-line comment above
the memset so somebody doesn't blindly fold it back in the future. So
for example:

	ret = iomap_iter_complete(iter);
	if (ret <= 0)
		return ret;

	/* save cycles and only clear the mappings if we plan to iterate */
	memset(..);
	...

We'd probably have to recheck some of the iter state within the new
helper, but that doesn't seem like a big deal to me. Thoughts?
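
A rough shape of that helper might be the following. This is a
hypothetical, untested sketch of Brian's suggestion, not a patch; the
actual ret computation that currently sits above the reset call would
move in here:

```c
/*
 * Hypothetical sketch only -- "complete an iteration": compute the
 * return value for this pass and release per-iteration state, but
 * leave the mappings dirty.  The caller zeroes them only if it is
 * actually going to iterate again.
 */
static int iomap_iter_complete(struct iomap_iter *iter)
{
	int ret;

	/* the existing ret logic from iomap_iter() would move here */
	ret = iter->status;

	iomap_iter_reset_iomap(iter);	/* still drops the folio_batch */
	return ret;
}
```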

Brian

>  begin:
>  	ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
>  			       &iter->iomap, &iter->srcmap);
> -- 
> 2.39.5 (Apple Git-154)
> 
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] iomap: avoid memset iomap when iter is done
  2026-04-16 13:27 ` Brian Foster
@ 2026-04-16 15:27   ` Darrick J. Wong
  2026-04-16 22:42     ` Dave Chinner
  2026-04-16 23:20     ` Mateusz Guzik
  0 siblings, 2 replies; 5+ messages in thread
From: Darrick J. Wong @ 2026-04-16 15:27 UTC (permalink / raw)
  To: Brian Foster
  Cc: Fengnan Chang, brauner, hch, linux-xfs, linux-fsdevel, linux-ext4,
	lidiangang, Fengnan Chang

On Thu, Apr 16, 2026 at 09:27:22AM -0400, Brian Foster wrote:
> On Thu, Apr 16, 2026 at 11:06:42AM +0800, Fengnan Chang wrote:
> > When iomap_iter() finishes its iteration (returns <= 0), it is no longer
> > necessary to memset the entire iomap and srcmap structures.
> > 
> > In high-IOPS scenarios (like 4k randread on NVMe with io_uring
> > polling), where the majority of I/Os complete within a single extent
> > mapping, this wastes memory write bandwidth, as the caller will just
> > discard the iterator.
> > 
> > Tested with the following command:
> > taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> > -n1 -P1 /mnt/testfile
> > IOPS improves by about 5% on both ext4 and XFS.
> > 
> > However, we MUST still call iomap_iter_reset_iomap() to release the
> > folio_batch if IOMAP_F_FOLIO_BATCH is set, otherwise we leak page
> > references. Therefore, split the cleanup logic: always release the
> > folio_batch, but skip the memset() when ret <= 0.
> > 
> > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> > ---
> >  fs/iomap/iter.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> > index c04796f6e57f..91eb5e6165ff 100644
> > --- a/fs/iomap/iter.c
> > +++ b/fs/iomap/iter.c
> > @@ -15,8 +15,6 @@ static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> >  	}
> >  
> >  	iter->status = 0;
> > -	memset(&iter->iomap, 0, sizeof(iter->iomap));
> > -	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> >  }
> >  
> >  /* Advance the current iterator position and decrement the remaining length */
> > @@ -106,6 +104,9 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> >  	if (ret <= 0)
> >  		return ret;
> >  
> > +	memset(&iter->iomap, 0, sizeof(iter->iomap));
> > +	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > +
> 
> This seems reasonable to me in principle, but it feels a little odd to
> leave a reset helper that doesn't really do a "reset." I wonder if this
> should be refactored into an iomap_iter_complete() (i.e. "complete an
> iteration") helper that includes the ret assignment logic just above the
> reset call and returns it, and then maybe leave a one-line comment above
> the memset so somebody doesn't blindly fold it back in the future. So
> for example:
> 
> 	ret = iomap_iter_complete(iter);
> 	if (ret <= 0)
> 		return ret;
> 
> 	/* save cycles and only clear the mappings if we plan to iterate */
> 	memset(..);
> 	...
> 
> We'd probably have to recheck some of the iter state within the new
> helper, but that doesn't seem like a big deal to me. Thoughts?

What kind of computer is this where there's a 5% hit to iops from a
memset of ~150 bytes?

--D

> Brian
> 
> >  begin:
> >  	ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
> >  			       &iter->iomap, &iter->srcmap);
> > -- 
> > 2.39.5 (Apple Git-154)
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] iomap: avoid memset iomap when iter is done
  2026-04-16 15:27   ` Darrick J. Wong
@ 2026-04-16 22:42     ` Dave Chinner
  2026-04-16 23:20     ` Mateusz Guzik
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2026-04-16 22:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Brian Foster, Fengnan Chang, brauner, hch, linux-xfs,
	linux-fsdevel, linux-ext4, lidiangang, Fengnan Chang

On Thu, Apr 16, 2026 at 08:27:05AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 16, 2026 at 09:27:22AM -0400, Brian Foster wrote:
> > On Thu, Apr 16, 2026 at 11:06:42AM +0800, Fengnan Chang wrote:
> > > When iomap_iter() finishes its iteration (returns <= 0), it is no longer
> > > necessary to memset the entire iomap and srcmap structures.
> > > 
> > > In high-IOPS scenarios (like 4k randread on NVMe with io_uring
> > > polling), where the majority of I/Os complete within a single extent
> > > mapping, this wastes memory write bandwidth, as the caller will just
> > > discard the iterator.
> > > 
> > > Tested with the following command:
> > > taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> > > -n1 -P1 /mnt/testfile
> > > IOPS improves by about 5% on both ext4 and XFS.
> > > 
> > > However, we MUST still call iomap_iter_reset_iomap() to release the
> > > folio_batch if IOMAP_F_FOLIO_BATCH is set, otherwise we leak page
> > > references. Therefore, split the cleanup logic: always release the
> > > folio_batch, but skip the memset() when ret <= 0.
> > > 
> > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> > > ---
> > >  fs/iomap/iter.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> > > index c04796f6e57f..91eb5e6165ff 100644
> > > --- a/fs/iomap/iter.c
> > > +++ b/fs/iomap/iter.c
> > > @@ -15,8 +15,6 @@ static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > >  	}
> > >  
> > >  	iter->status = 0;
> > > -	memset(&iter->iomap, 0, sizeof(iter->iomap));
> > > -	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > >  }
> > >  
> > >  /* Advance the current iterator position and decrement the remaining length */
> > > @@ -106,6 +104,9 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> > >  	if (ret <= 0)
> > >  		return ret;
> > >  
> > > +	memset(&iter->iomap, 0, sizeof(iter->iomap));
> > > +	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> > > +
> > 
> > This seems reasonable to me in principle, but it feels a little odd to
> > leave a reset helper that doesn't really do a "reset." I wonder if this
> > should be refactored into an iomap_iter_complete() (i.e. "complete an
> > iteration") helper that includes the ret assignment logic just above the
> > reset call and returns it, and then maybe leave a one-line comment above
> > the memset so somebody doesn't blindly fold it back in the future. So
> > for example:
> > 
> > 	ret = iomap_iter_complete(iter);
> > 	if (ret <= 0)
> > 		return ret;
> > 
> > 	/* save cycles and only clear the mappings if we plan to iterate */
> > 	memset(..);
> > 	...
> > 
> > We'd probably have to recheck some of the iter state within the new
> > helper, but that doesn't seem like a big deal to me. Thoughts?
> 
> What kind of computer is this where there's a 5% hit to iops from a
> memset of ~150 bytes?

Even a small cost can have a big impact when you have to pay it many
times.

i.e. 2 million IOPS * 2 * 72 bytes per IO = 288MB/s of memory being
zeroed unnecessarily in very small (inefficient) chunks.

That's definitely enough to cause a 5% drop in IOPS when the
workload is CPU or memory bandwidth bound....

-Dave.
-- 
Dave Chinner
dgc@kernel.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] iomap: avoid memset iomap when iter is done
  2026-04-16 15:27   ` Darrick J. Wong
  2026-04-16 22:42     ` Dave Chinner
@ 2026-04-16 23:20     ` Mateusz Guzik
  1 sibling, 0 replies; 5+ messages in thread
From: Mateusz Guzik @ 2026-04-16 23:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Brian Foster, Fengnan Chang, brauner, hch, linux-xfs,
	linux-fsdevel, linux-ext4, lidiangang, Fengnan Chang

On Thu, Apr 16, 2026 at 08:27:05AM -0700, Darrick J. Wong wrote:
> What kind of computer is this where there's a 5% hit to iops from a
> memset of ~150 bytes?
> 

Such a memset is not free in its own right, but here the real problem
is *how* it is done: by default gcc emits an inlined 'rep stosq', which
has utterly horrid performance (same as 'rep movsq'), and in this
particular case the two memsets in the source code result in two
separate 'rep stosq' invocations.

I have complained about this numerous times; see this to get a taste:
https://lore.kernel.org/all/20250605164733.737543-1-mjguzik@gmail.com/

Also see
https://lore.kernel.org/all/202504181042.54ea2b8a-lkp@intel.com/

If memory serves right, gcc is going to fix this, so the problem will
largely clear itself up down the road.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-04-16 23:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-16  3:06 [PATCH] iomap: avoid memset iomap when iter is done Fengnan Chang
2026-04-16 13:27 ` Brian Foster
2026-04-16 15:27   ` Darrick J. Wong
2026-04-16 22:42     ` Dave Chinner
2026-04-16 23:20     ` Mateusz Guzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox