Linux XFS filesystem development
* don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
       [not found] ` <58bef9af-0926-4948-b917-e38c3793f596@linux.alibaba.com>
@ 2026-06-12  6:25   ` Christoph Hellwig
  2026-06-12  6:54     ` Gao Xiang
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-12  6:25 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yifan Zhao, linux-erofs, linux-kernel, yekelu1, jingrui,
	zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong

On Fri, Jun 12, 2026 at 11:42:38AM +0800, Gao Xiang wrote:
> > Reported-by: Kelu Ye <yekelu1@huawei.com>
> > Assisted-by: Codex:GPT-5.5
> > Signed-off-by: Yifan Zhao <zhaoyifan28@huawei.com>
> 
> I think it's an iomap bug instead, see:
> 
> iomap_bio_read_folio_range(), we should fix iomap instead.

Yes.  iomap should not try to build bios over iomap boundaries.  That
has caused various issues.  Ritesh ran into that with the ext2 port
back in the day, and I actually ran into it again with an
under-development xfs feature.

Can you try this patch?

---
From 297230cc3c08cbfef3670b08c4e35813c18c523e Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 7 Jun 2026 08:53:20 +0200
Subject: iomap: submit read bio after each extent

This keeps bios from crossing RTG boundaries in XFS and probably fixes
all kinds of other stuff.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d55b936e6986..3642a11c102f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -597,12 +597,13 @@ void iomap_read_folio(const struct iomap_ops *ops,
 
 	trace_iomap_readpage(iter.inode, 1);
 
-	while ((ret = iomap_iter(&iter, ops)) > 0)
+	while ((ret = iomap_iter(&iter, ops)) > 0) {
 		iter.status = iomap_read_folio_iter(&iter, ctx,
 				&bytes_submitted);
-
-	if (ctx->read_ctx && ctx->ops->submit_read)
-		ctx->ops->submit_read(&iter, ctx);
+		if (ctx->read_ctx && ctx->ops->submit_read)
+			ctx->ops->submit_read(&iter, ctx);
+		ctx->read_ctx = NULL;
+	}
 
 	if (ctx->cur_folio)
 		iomap_read_end(ctx->cur_folio, bytes_submitted);
@@ -664,12 +665,13 @@ void iomap_readahead(const struct iomap_ops *ops,
 
 	trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
 
-	while (iomap_iter(&iter, ops) > 0)
+	while (iomap_iter(&iter, ops) > 0) {
 		iter.status = iomap_readahead_iter(&iter, ctx,
 					&cur_bytes_submitted);
-
-	if (ctx->read_ctx && ctx->ops->submit_read)
-		ctx->ops->submit_read(&iter, ctx);
+		if (ctx->read_ctx && ctx->ops->submit_read)
+			ctx->ops->submit_read(&iter, ctx);
+		ctx->read_ctx = NULL;
+	}
 
 	if (ctx->cur_folio)
 		iomap_read_end(ctx->cur_folio, cur_bytes_submitted);
-- 
2.53.0
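
For readers following along, the effect of the patch above can be sketched with a small userspace model. This is not kernel code: `struct extent` and `count_read_bios()` are hypothetical stand-ins for `struct iomap` and the read loop, and the "old" mode assumes, as the thread suggests, that merging only looks at sector contiguity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one mapped extent (struct iomap). */
struct extent {
	int bdev;               /* device id the extent lives on */
	unsigned long sector;   /* first sector on that device */
	unsigned long nsectors; /* length in sectors */
};

/*
 * Count the bios a read over exts[0..n) would issue in this model.
 * per_extent == false: keep appending to the current bio while the next
 * extent starts at the sector where the bio ends (bdev is not compared).
 * per_extent == true: the bio is submitted at the end of every extent.
 * *crossed_dev is set if any bio ended up covering two devices.
 */
static size_t count_read_bios(const struct extent *exts, size_t n,
			      bool per_extent, bool *crossed_dev)
{
	size_t bios = 0;
	bool have_bio = false;
	int bio_bdev = 0;
	unsigned long bio_end = 0;

	*crossed_dev = false;
	for (size_t i = 0; i < n; i++) {
		if (have_bio && bio_end == exts[i].sector) {
			/* Sector-contiguous, so merged into the open bio. */
			if (bio_bdev != exts[i].bdev)
				*crossed_dev = true;
		} else {
			bios++;
			have_bio = true;
			bio_bdev = exts[i].bdev;
		}
		bio_end = exts[i].sector + exts[i].nsectors;
		if (per_extent)
			have_bio = false; /* models ctx->read_ctx = NULL */
	}
	return bios;
}
```

With two 8-sector extents, the first on device 0 at sector 8 and the second on device 1 at sector 16, the merging mode returns one bio that crosses devices, while the per-extent mode returns two bios and never crosses.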



* Re: don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
  2026-06-12  6:25   ` don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks Christoph Hellwig
@ 2026-06-12  6:54     ` Gao Xiang
  2026-06-12  7:10       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Gao Xiang @ 2026-06-12  6:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yifan Zhao, linux-erofs, linux-kernel, yekelu1, jingrui,
	zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong

Hi Christoph,

On 2026/6/12 14:25, Christoph Hellwig wrote:
> On Fri, Jun 12, 2026 at 11:42:38AM +0800, Gao Xiang wrote:
>>> Reported-by: Kelu Ye <yekelu1@huawei.com>
>>> Assisted-by: Codex:GPT-5.5
>>> Signed-off-by: Yifan Zhao <zhaoyifan28@huawei.com>
>>
>> I think it's an iomap bug instead, see:
>>
>> iomap_bio_read_folio_range(), we should fix iomap instead.
> 
> Yes.  iomap should not try to build bios over iomap boundaries.
> caused various issues.  Ritesh ran into that with the ext2 port
> back in the day, and I actually ran into it again with an under
> development xfs feature.
> 
> Can you try this patch?

Hmm, currently erofs can return a block-sized iomap (if the chunk
size is 4k) even if it could be merged with the following chunks.

Previously this worked fairly well, since consecutive chunks were
added to the current bio when possible, but after this patch
there will be a lot of 4k bios.

If iomap goes this way, I could make iomap_begin map more chunks
in one shot.  That needs more changes in erofs, but it's fine
anyway.

... I was thinking of the following diff (whitespace-damaged):

diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
index 4504f4633f17..241df96a16a6 100644
--- a/fs/iomap/bio.c
+++ b/fs/iomap/bio.c
@@ -142,6 +142,7 @@ int iomap_bio_read_folio_range(const struct iomap_iter *iter,

         if (!bio ||
             bio_end_sector(bio) != iomap_sector(&iter->iomap, iter->pos) ||
+            bio->bi_bdev != iter->iomap.bdev ||
             bio->bi_iter.bi_size > iomap_max_bio_size(&iter->iomap) - plen ||
             !bio_add_folio(bio, folio, plen, offset_in_folio(folio, iter->pos)))
                 iomap_read_alloc_bio(iter, ctx, plen);


But either way works for me, since this is an iomap design
decision.
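
As a rough illustration of what the bi_bdev check in the diff above buys, here is a userspace model of the merge predicate. The types `model_chunk` and `model_bio` and the `max_sectors` parameter are assumptions made for this sketch. They only mirror the three conditions visible in the diff (sector contiguity, same device, size cap) and are not the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types, not the kernel's struct iomap / struct bio. */
struct model_chunk {
	int bdev;
	unsigned long sector;
	unsigned long nsectors;
};

struct model_bio {
	int bdev;
	unsigned long end_sector;
	unsigned long nsectors;
};

/* Append only if contiguous, on the same device, and under the size cap. */
static bool model_can_append(const struct model_bio *bio,
			     const struct model_chunk *c,
			     unsigned long max_sectors)
{
	return bio->end_sector == c->sector &&
	       bio->bdev == c->bdev &&
	       bio->nsectors <= max_sectors - c->nsectors;
}

/* Count the bios needed to read chunks[0..n) under that predicate. */
static size_t model_count_bios(const struct model_chunk *chunks, size_t n,
			       unsigned long max_sectors)
{
	struct model_bio bio = { 0 };
	size_t bios = 0;

	for (size_t i = 0; i < n; i++) {
		/* Keeps the subtraction in model_can_append from wrapping. */
		assert(chunks[i].nsectors <= max_sectors);
		if (bios > 0 && model_can_append(&bio, &chunks[i], max_sectors)) {
			bio.nsectors += chunks[i].nsectors;
		} else {
			bios++;
			bio.bdev = chunks[i].bdev;
			bio.nsectors = chunks[i].nsectors;
		}
		bio.end_sector = chunks[i].sector + chunks[i].nsectors;
	}
	return bios;
}
```

In this model, three contiguous 4k chunks on device 0 followed by a sector-contiguous chunk on device 1 yield two bios: the same-device chunks still merge, and only the device switch forces a new bio.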

Thanks,
Gao Xiang




* Re: don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
  2026-06-12  6:54     ` Gao Xiang
@ 2026-06-12  7:10       ` Christoph Hellwig
  2026-06-12  7:19         ` Gao Xiang
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-12  7:10 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, Yifan Zhao, linux-erofs, linux-kernel, yekelu1,
	jingrui, zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong

On Fri, Jun 12, 2026 at 02:54:47PM +0800, Gao Xiang wrote:
> hmm, currently erofs could return block-sized iomap (if the chunk
> size is 4k) even it can be merged with the following chunks.
> 
> Previously it was fairly good since consecutive chunks will be
> added to the current bio if possible, but after this patch,
> there will be a lot of 4k bios.
> 
> But if iomap goes into this way, I could make iomap_begin maps
> more chunks in one shot, but that needs more changes in erofs,
> it's fine anyway.
> 
> ... I was thinking the following diff (space-damaged):

That should work too for your case.  But we definitely have various
cases where merging over iomaps is a bad idea.  You'll also end up with
other efficiency gains by merging consecutive entries, especially for
direct I/O and when using large folios.



* Re: don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
  2026-06-12  7:10       ` Christoph Hellwig
@ 2026-06-12  7:19         ` Gao Xiang
  2026-06-12  7:35           ` Gao Xiang
  2026-06-12  8:01           ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Gao Xiang @ 2026-06-12  7:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yifan Zhao, linux-erofs, linux-kernel, yekelu1, jingrui,
	zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong



On 2026/6/12 15:10, Christoph Hellwig wrote:
> On Fri, Jun 12, 2026 at 02:54:47PM +0800, Gao Xiang wrote:
>> hmm, currently erofs could return block-sized iomap (if the chunk
>> size is 4k) even it can be merged with the following chunks.
>>
>> Previously it was fairly good since consecutive chunks will be
>> added to the current bio if possible, but after this patch,
>> there will be a lot of 4k bios.
>>
>> But if iomap goes into this way, I could make iomap_begin maps
>> more chunks in one shot, but that needs more changes in erofs,
>> it's fine anyway.
>>
>> ... I was thinking the following diff (space-damaged):
> 
> That should work too for your case.  But we definitively have various
> cases where merging over iomaps is a bad idea.  You'll also end up with
> other efficiency gains by merging consecutive entries, especially for
> direct I/O and when using large folios.

Yes, optimizing the erofs chunk mapping would be more
efficient.  I will work out a patch soon, and Yifan can test
your patch in parallel.

Also, if "iomap: submit read bio after each extent" is
applied, I guess some of the merging conditions in
iomap_bio_read_folio_range() can be removed, since they
would never be reached (dead code).
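
To make "mapping more chunks in one shot" concrete, a minimal userspace sketch follows. The chunk table layout and the name `map_extent()` are hypothetical and are not the erofs on-disk format or API. The idea is just to extend one mapping across following chunks while they stay on the same device and remain physically contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chunk index entry: device id and physical start block. */
struct chunk_map {
	int dev;
	unsigned long pblk;
};

/*
 * Return how many chunks, starting at index 'first', can be reported as a
 * single extent: same device and physically back to back.
 */
static size_t map_extent(const struct chunk_map *tbl, size_t nchunks,
			 size_t first, unsigned long blks_per_chunk)
{
	size_t n = 1;

	assert(first < nchunks);
	while (first + n < nchunks &&
	       tbl[first + n].dev == tbl[first].dev &&
	       tbl[first + n].pblk == tbl[first].pblk + n * blks_per_chunk)
		n++;
	return n;
}
```

For a table of single-block chunks at blocks 100, 101, 102 and 200 on device 0 and block 201 on device 1, a mapping that starts at chunk 0 covers three chunks, while one that starts at chunk 3 covers a single chunk because the next chunk is on a different device.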

Thanks,
Gao Xiang




* Re: don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
  2026-06-12  7:19         ` Gao Xiang
@ 2026-06-12  7:35           ` Gao Xiang
  2026-06-12  8:04             ` Christoph Hellwig
  2026-06-12  8:01           ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Gao Xiang @ 2026-06-12  7:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yifan Zhao, linux-erofs, linux-kernel, yekelu1, jingrui,
	zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong



On 2026/6/12 15:19, Gao Xiang wrote:
> 
> 
> On 2026/6/12 15:10, Christoph Hellwig wrote:
>> On Fri, Jun 12, 2026 at 02:54:47PM +0800, Gao Xiang wrote:
>>> hmm, currently erofs could return block-sized iomap (if the chunk
>>> size is 4k) even it can be merged with the following chunks.
>>>
>>> Previously it was fairly good since consecutive chunks will be
>>> added to the current bio if possible, but after this patch,
>>> there will be a lot of 4k bios.
>>>
>>> But if iomap goes into this way, I could make iomap_begin maps
>>> more chunks in one shot, but that needs more changes in erofs,
>>> it's fine anyway.
>>>
>>> ... I was thinking the following diff (space-damaged):
>>
>> That should work too for your case.  But we definitively have various
>> cases where merging over iomaps is a bad idea.  You'll also end up with
>> other efficiency gains by merging consecutive entries, especially for
>> direct I/O and when using large folios.
> 
> Yes, optimizing erofs chunk mapping would be more
> efficient, will work out one soon, but Yifan can test
> your patch in parallel.
> 
> Also, if "iomap: submit read bio after each extent" is
> applied, I guess some merging condition in
> iomap_bio_read_folio_range() can be removed since they
> won't be reached in any case. (deadcode)

BTW, there may be some edge cases like:
written | hole | written | hole | written ...

If bios cannot span multiple iomaps, the number of bios could be
amplified according to the interleaving pattern, even if all the
written data is consecutive on disk (the block allocator may
allocate written blocks consecutively).

Anyway, I am not trying to argue about these cases (although both
the previous buffer-head and mpage code paths merge them, apart
from some specific exceptions).  Maybe it's just a purely
artificial pattern and I'm worrying too much.
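
The amplification in this pattern can be counted with a small userspace model. The `file_extent` type and `bios_for_pattern()` are hypothetical, and the model assumes that holes are zero-filled without I/O and that the cross-extent mode keeps the bio open across a hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ext_type { EXT_HOLE, EXT_WRITTEN };

/* Hypothetical model of one file extent as returned by the mapping layer. */
struct file_extent {
	enum ext_type type;
	unsigned long sector;   /* on-disk start, ignored for holes */
	unsigned long nsectors;
};

/*
 * Count read bios for the extent list e[0..n).
 * merge_across == true: a bio stays open across extent boundaries and
 * grows while the next written extent is sector-contiguous with it.
 * merge_across == false: every written extent gets its own bio.
 */
static size_t bios_for_pattern(const struct file_extent *e, size_t n,
			       bool merge_across)
{
	size_t bios = 0;
	bool open = false;
	unsigned long end = 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].type == EXT_HOLE)
			continue; /* zero-filled in memory, no bio needed */
		if (!(open && end == e[i].sector))
			bios++;
		open = merge_across;
		end = e[i].sector + e[i].nsectors;
	}
	return bios;
}
```

For written | hole | written | hole | written with the three written extents at sectors 0, 8 and 16 (8 sectors each), the model issues one bio when merging across extents and three bios when each extent is submitted separately.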

Thanks,
Gao Xiang




* Re: don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
  2026-06-12  7:19         ` Gao Xiang
  2026-06-12  7:35           ` Gao Xiang
@ 2026-06-12  8:01           ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-12  8:01 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, Yifan Zhao, linux-erofs, linux-kernel, yekelu1,
	jingrui, zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong

On Fri, Jun 12, 2026 at 03:19:30PM +0800, Gao Xiang wrote:
> 
> 
> On 2026/6/12 15:10, Christoph Hellwig wrote:
> > On Fri, Jun 12, 2026 at 02:54:47PM +0800, Gao Xiang wrote:
> > > hmm, currently erofs could return block-sized iomap (if the chunk
> > > size is 4k) even it can be merged with the following chunks.
> > > 
> > > Previously it was fairly good since consecutive chunks will be
> > > added to the current bio if possible, but after this patch,
> > > there will be a lot of 4k bios.
> > > 
> > > But if iomap goes into this way, I could make iomap_begin maps
> > > more chunks in one shot, but that needs more changes in erofs,
> > > it's fine anyway.
> > > 
> > > ... I was thinking the following diff (space-damaged):
> > 
> > That should work too for your case.  But we definitively have various
> > cases where merging over iomaps is a bad idea.  You'll also end up with
> > other efficiency gains by merging consecutive entries, especially for
> > direct I/O and when using large folios.
> 
> Yes, optimizing erofs chunk mapping would be more
> efficient, will work out one soon, but Yifan can test
> your patch in parallel.
> 
> Also, if "iomap: submit read bio after each extent" is
> applied, I guess some merging condition in
> iomap_bio_read_folio_range() can be removed since they
> won't be reached in any case. (deadcode)

I guess we can't hit the sector check anymore indeed, assuming
we never get non-contiguous readahead requests, which I think is
true.



* Re: don't merge bios over iomap boundaries, was: Re: [PATCH] erofs: prevent buffered read bio merges across device chunks
  2026-06-12  7:35           ` Gao Xiang
@ 2026-06-12  8:04             ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2026-06-12  8:04 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, Yifan Zhao, linux-erofs, linux-kernel, yekelu1,
	jingrui, zhukeqian1, Ritesh Harjani, Darrick J. Wong, linux-xfs,
	Joanne Koong

On Fri, Jun 12, 2026 at 03:35:38PM +0800, Gao Xiang wrote:
> btw, there may be be some edge cases like:
> written | hole | written | hole | written ...
> 
> and if bios cannot across multiple iomaps, bios could be
> amplified according to the shuffle pattern even all written
> data is consecutive on disk (the block allocator may
> allocate written blocks consecutively.)
> 
> Anyway, I never tried to argue with this cases (yet both
> previous buffer-head and mpage codebase will merge this
> except for some specific exceptions), maybe it's just a
> pure artificial pattern and I'm worried too much.

We actually just had something like this come up for XFS even
with the current merging:

https://lore.kernel.org/linux-xfs/6csdtjn33va4ivyycr4uh2ogac22xput4kgzxzt3mczdkvwjaf@37audfdijskv/T/#t


although this involves REQ_NOWAIT and thus is a bit more complicated.
But the merging scheme discussed there should also help with your
above case in general.

