From: Brian Foster <bfoster@redhat.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>,
Christoph Hellwig <hch@infradead.org>,
brauner@kernel.org, djwong@kernel.org,
linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
Date: Sat, 18 Oct 2025 06:30:03 -0400 [thread overview]
Message-ID: <aPNsK9D3RtAjccOA@bfoster> (raw)
In-Reply-To: <CAJnrk1YLuiRyVwEW6nR1fUnvWf1Ozu1hK4LKn3mRWKeECVyMAA@mail.gmail.com>
On Fri, Oct 17, 2025 at 01:19:39PM -0700, Joanne Koong wrote:
> On Fri, Oct 17, 2025 at 11:23 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Fri, Oct 17, 2025 at 11:33:31AM -0400, Brian Foster wrote:
> > > On Thu, Oct 16, 2025 at 03:39:27PM -0700, Joanne Koong wrote:
> > > > On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > > > > > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > Hi Joanne,
> > > > > > > >
> > > > > > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > >>>
> > > > > > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > > > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > > > > > >>> response to this patchset version [2]).
> > > > > > > > >>>
> > > > > > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > > > > > >>
> > > > > > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > > > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > > > > > >
> > > > > > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > > > > > that branch has been published publicly yet so maybe just patching it
> > > > > > > > > in locally will work best.
> > > > > > > > >
> > > > > > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > > > > > repro, though that should probably work too?) directly with the
> > > > > > > > > kconfig they had.
> > > > > > > >
> > > > > > > > I believe it's a regression somewhere since it's a valid
> > > > > > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > > > > > block-aligned of course), you could add a print just before
> > > > > > > > erofs_iomap_begin() returns.
> > > > > > >
> > > > > > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > > > > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > > > > > somewhere as we shouldn't be running into the situation where erofs is
> > > > > > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > > > > > and length. I'll track the offending commit down tomorrow.
> > > > > > >
> > > > > >
> > > > > > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > > > > > iomap_iter advances") that changed this behavior for erofs such that
> > > > > > the read iteration continues even after encountering an IOMAP_INLINE
> > > > > > extent, whereas before, the iteration stopped after reading in the
> > > > > > iomap inline extent. This leads erofs to end up in the situation where
> > > > > > it calls into iomap_adjust_read_range() with a non-block-aligned
> > > > > > position/length (on that subsequent iteration).
> > > > > >
> > > > > > In particular, this change in commit bc264fea0f6f to iomap_iter():
> > > > > >
> > > > > > - if (ret > 0 && !iter->processed && !stale)
> > > > > > + if (ret > 0 && !advanced && !stale)
> > > > > >
> > > > > > For iomap inline extents, iter->processed is 0, which stopped the
> > > > > > iteration before. But now, advanced (which is iter->pos -
> > > > > > iter->iter_start_pos) is used which will continue the iteration (since
> > > > > > the iter is advanced after reading in the iomap inline extents).
> > > > > >
> > > > > > Erofs is able to handle subsequent iterations after iomap_inline
> > > > > > extents because erofs_iomap_begin() checks the block map and returns
> > > > > > IOMAP_HOLE if it's not mapped
> > > > > > if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > > > > > iomap->type = IOMAP_HOLE;
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > but I think what probably would be better is a separate patch that
> > > > > > reverts this back to the original behavior of stopping the iteration
> > > > > > after IOMAP_INLINE extents are read in.
> > > > > >
> > > > >
> > > > > Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> > > > > still didn't advance the iter at all by that point. It just returned 0
> > > > > and this caused iomap_iter() to break out of the iteration loop.
> > > > >
> > > > > The logic noted above in iomap_iter() is basically saying to break out
> > > > > if the iteration did nothing, which is a bit of a hacky way to terminate
> > > > > an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> > > > > report the bytes processed and then terminate some other way more
> > > > > naturally. I see Gao actually fixed this sometime later in commit
> > > > > b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> > > > > the inline read path started to advance the iter.
> > > >
> > > > That's a good point, the fix in commit b26816b4e320 is what led to the
> > > > new erofs behavior, not commit bc264fea0f6f.
> > > >
> > > > >
> > > > > TBH, the behavior described above where we advance over the inline
> > > > > mapping and then report any remaining iter length as a hole also sounds
> > > > > like reasonably appropriate behavior to me. I suppose you could argue
> > > > > that the inline case should just terminate the iter, which perhaps means
> > > > > it should call iomap_iter_advance_full() instead. That technically
> > > > > hardcodes that we will never process mappings beyond an inline mapping
> > > > > into iomap. That bugs me a little bit, but is also probably always going
> > > > > to be true so doesn't seem like that big of a deal.
> > > >
> > > > Reporting any remaining iter length as a hole also sounds reasonable
> > > > to me but it seems that this may add additional work that may not be
> > > > trivial. For example, I'm looking at erofs's erofs_map_blocks() call
> > > > which they would need to do to figure out it should be an iomap hole.
> > > > It seems a bit nonideal to me that any filesystem using iomap inline
> > > > data would also have to make sure they cover this case, which maybe
> > > > they already need to do that, I'm not sure, but it seems like an extra
> > > > thing they would now need to account for.
> > > >
> > >
> > > To make sure I follow, you're talking about any potential non-erofs
> > > cases, right? I thought it was mentioned that erofs already handled this
> > > correctly by returning a hole. So I take this to mean other users would
> > > need to handle this similarly.
>
> Yes, sorry if my previous wording was confusing. Erofs already handles
> this correctly but there's some overhead with having to determine it's
> a hole.
>
> > >
> > > Poking around, I see that ext4 uses iomap and IOMAP_INLINE in a couple
> > > isolated cases. One looks like swapfile activation and the other appears
> > > to be related to fiemap. Any idea if those are working correctly? At
> > > first look it kind of looks like they check and return error for offset
> > > beyond the specified length..
> > >
> >
> > JFYI from digging further.. ext4 returns -ENOENT for this "offset beyond
> > inline size" case. I didn't see any error returns from fiemap/filefrag
> > on a quick test against an inline data file. It looks like
> > iomap_fiemap() actually ignores errors either if the previous mapping is
> > non-hole, or catches -ENOENT explicitly to handle an attribute mapping
> > case. So afaict this all seems to work Ok..
> >
> > I'm not sure this is all that relevant for the swap case. mkswap
> > apparently requires a file at least 40k in size, which iiuc is beyond
> > the scope of files with inline data..
>
> I think gfs2 also uses IOMAP_INLINE. In __gfs2_iomap_get(), it looks
> like they report a hole if the offset is beyond the inode size, so
> that looks okay.
>
> >
> > Brian
> >
> > > > One scenario I'm imagining maybe being useful in the future is
> > > > supporting chained inline mappings, in which case we would still want
> > > > to continue processing after the first inline mapping, but we could
> > > > also address that if it does come up.
> > > >
> > >
> > > Yeah..
> > >
> > > > >
> > > > > If we wanted to consider it an optimization so we didn't always do this
> > > > > extra iter on inline files, perhaps another variant of that could be an
> > > > > EOF flag or some such that the fs could set to trigger a full advance
> > > > > after the current mapping. OTOH you could argue that's what inline
> > > > > already is so maybe that's overthinking it. Just a thought. Hm?
> > > > >
> > > >
> > > > Would non-inline iomap types also find that flag helpful or is the
> > > > intention for it to be inline specific? I guess the flag could also be
> > > > used by non-inline types to stop the iteration short, if there's a use
> > > > case for that?
> > > >
> > >
> > > ... I hadn't really considered that. This was just me thinking out loud
> > > about trying to handle things more generically.
> > >
> > > Having thought more about it, I think either way it might just make
> > > sense to fix the inline read case to do the full advance (assuming it is
> > > verified this fixes the problem) to restore historical iomap behavior.
>
> Hmm, the iter is already fully advanced after the inline data is read.
> I think the issue is that when the iomap mapping length is less than
> the iter len (the iter len will always be the size of the folio but
> the length for the inline data could be less), then advancing it fully
> through iomap_iter_advance_full() will only advance it up to the iomap
> length. iter->len will then still have a positive value which makes
> iomap_iter() continue iterating.
>
Derp.. yeah, I had my wires crossed thinking the full advance completed
the full request, when really it's just the current iteration. Thanks
for pointing that out.
> I don't see a cleaner way of restoring historical iomap behavior than
> adding that eof/stop flag.
>
Hmm.. so then aside from the oddly aligned iter case warning that you've
already fixed, ISTM that everything otherwise works correctly, right? If
so, I wouldn't be in a huge rush to do the flag thing.
In theory, there may still be similar options to do what I was thinking,
like perhaps just zero out remaining iter->len in that particular inline
path (maybe via an iomap_iter_advance_inline() helper for example). It's
not totally clear to me it's worth it though if everything is pretty
much working as is.
Brian
> Thanks,
> Joanne
>
>
> > > Then if there is ever value to support multiple inline mappings somehow
> > > or another, we could revisit the flag idea.
>
> > >
> > > Brian
> > >
> > > > Thanks,
> > > > Joanne
>
next prev parent reply other threads:[~2025-10-18 10:25 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
2025-10-09 22:56 ` [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
2025-10-13 3:00 ` Christoph Hellwig
2025-10-15 17:34 ` Joanne Koong
2025-10-15 17:40 ` Gao Xiang
2025-10-15 17:49 ` Joanne Koong
2025-10-15 18:06 ` Gao Xiang
2025-10-15 18:21 ` Joanne Koong
2025-10-15 18:39 ` Gao Xiang
2025-10-16 0:36 ` Joanne Koong
2025-10-16 1:27 ` Joanne Koong
2025-10-16 1:58 ` Gao Xiang
2025-10-16 22:03 ` Joanne Koong
2025-10-17 0:03 ` Gao Xiang
2025-10-17 18:41 ` Joanne Koong
2025-10-17 22:07 ` Gao Xiang
2025-10-17 23:22 ` Joanne Koong
2025-10-18 0:12 ` Gao Xiang
2025-10-20 23:25 ` Joanne Koong
2025-10-21 1:39 ` Gao Xiang
2025-10-16 11:29 ` Brian Foster
2025-10-16 22:39 ` Joanne Koong
2025-10-17 15:33 ` Brian Foster
2025-10-17 18:27 ` Brian Foster
2025-10-17 20:19 ` Joanne Koong
2025-10-18 10:30 ` Brian Foster [this message]
2025-10-20 21:39 ` Joanne Koong
2025-10-15 18:28 ` Gao Xiang
2025-10-09 22:56 ` [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
2025-10-13 3:01 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 3/9] iomap: optimize pending async writeback accounting Joanne Koong
2025-10-13 3:04 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
2025-10-13 3:06 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes Joanne Koong
2025-10-13 3:06 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes Joanne Koong
2025-10-13 3:08 ` Christoph Hellwig
2025-10-14 0:04 ` Joanne Koong
2025-10-14 4:14 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
2025-10-13 3:09 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
2025-10-13 3:13 ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
2025-10-13 3:13 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPNsK9D3RtAjccOA@bfoster \
--to=bfoster@redhat.com \
--cc=brauner@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=hsiangkao@linux.alibaba.com \
--cc=joannelkoong@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).