* [PATCH v1 0/1] iomap: don't mark folio uptodate if read IO has bytes pending
@ 2026-02-19 0:39 Joanne Koong
2026-02-19 0:39 ` [PATCH v1 1/1] " Joanne Koong
0 siblings, 1 reply; 10+ messages in thread
From: Joanne Koong @ 2026-02-19 0:39 UTC (permalink / raw)
To: brauner; +Cc: willy, wegao, sashal, djwong, hch, linux-fsdevel
This is a fix for this scenario:
->read_folio() gets called on a folio size that is 16k while the file is 4k:
a) ifs->read_bytes_pending gets initialized to 16k
b) ->read_folio_range() is called for the 4k read
c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
0 to 4k range is marked uptodate
d) the post-eof blocks are zeroed and marked uptodate in the call to
iomap_set_range_uptodate()
e) iomap_set_range_uptodate() sees all the ranges are marked
uptodate and it marks the folio uptodate
f) iomap_read_end() gets called to subtract the 12k from
ifs->read_bytes_pending. it too sees all the ranges are marked
uptodate and marks the folio uptodate using XOR
g) the XOR call clears the uptodate flag on the folio
The same situation can occur if the last range read for the folio is done as an
inline read and all the previous ranges have already completed by the time the
inline read completes.
For more context, the full discussion can be found in [1]. There was a
discussion about alternative approaches in that thread, but they had more
complications.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1Z9za5w4FoJqTGx50zR2haHHaoot1KJViQyEHJQq4=34w@mail.gmail.com/#t
Joanne Koong (1):
iomap: don't mark folio uptodate if read IO has bytes pending
fs/iomap/buffered-io.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 0:39 [PATCH v1 0/1] iomap: don't mark folio uptodate if read IO has bytes pending Joanne Koong
@ 2026-02-19 0:39 ` Joanne Koong
2026-02-19 2:45 ` Darrick J. Wong
0 siblings, 1 reply; 10+ messages in thread
From: Joanne Koong @ 2026-02-19 0:39 UTC (permalink / raw)
To: brauner; +Cc: willy, wegao, sashal, djwong, hch, linux-fsdevel
If a folio has ifs metadata attached to it and the folio is partially
read in through an async IO helper with the rest of it then being read
in through post-EOF zeroing or as inline data, and the helper
successfully finishes the read first, then post-EOF zeroing / reading
inline will mark the folio as uptodate in iomap_set_range_uptodate().
This is a problem because when the read completion path later calls
iomap_read_end(), it will call folio_end_read(), which sets the uptodate
bit using XOR semantics. Calling folio_end_read() on a folio that was
already marked uptodate clears the uptodate bit.
Fix this by not marking the folio as uptodate if the read IO has bytes
pending. The folio uptodate state will be set in the read completion
path through iomap_end_read() -> folio_end_read().
Reported-by: Wei Gao <wegao@suse.com>
Suggested-by: Sasha Levin <sashal@kernel.org>
Tested-by: Wei Gao <wegao@suse.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: b2f35ac4146d ("iomap: add caller-provided callbacks for read and readahead")
---
fs/iomap/buffered-io.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 58887513b894..4fc5ce963feb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -80,18 +80,27 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
{
struct iomap_folio_state *ifs = folio->private;
unsigned long flags;
- bool uptodate = true;
+ bool mark_uptodate = true;
if (folio_test_uptodate(folio))
return;
if (ifs) {
spin_lock_irqsave(&ifs->state_lock, flags);
- uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+ /*
+ * If a read with bytes pending is in progress, we must not call
+ * folio_mark_uptodate(). The read completion path
+ * (iomap_read_end()) will call folio_end_read(), which uses XOR
+ * semantics to set the uptodate bit. If we set it here, the XOR
+ * in folio_end_read() will clear it, leaving the folio not
+ * uptodate.
+ */
+ mark_uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
+ !ifs->read_bytes_pending;
spin_unlock_irqrestore(&ifs->state_lock, flags);
}
- if (uptodate)
+ if (mark_uptodate)
folio_mark_uptodate(folio);
}
--
2.47.3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 0:39 ` [PATCH v1 1/1] " Joanne Koong
@ 2026-02-19 2:45 ` Darrick J. Wong
2026-02-19 4:23 ` Matthew Wilcox
0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2026-02-19 2:45 UTC (permalink / raw)
To: Joanne Koong; +Cc: brauner, willy, wegao, sashal, hch, linux-fsdevel
On Wed, Feb 18, 2026 at 04:39:11PM -0800, Joanne Koong wrote:
> If a folio has ifs metadata attached to it and the folio is partially
> read in through an async IO helper with the rest of it then being read
> in through post-EOF zeroing or as inline data, and the helper
> successfully finishes the read first, then post-EOF zeroing / reading
> inline will mark the folio as uptodate in iomap_set_range_uptodate().
>
> This is a problem because when the read completion path later calls
> iomap_read_end(), it will call folio_end_read(), which sets the uptodate
> bit using XOR semantics. Calling folio_end_read() on a folio that was
> already marked uptodate clears the uptodate bit.
Aha, I wondered if that xor thing was going to come back to bite us.
> Fix this by not marking the folio as uptodate if the read IO has bytes
> pending. The folio uptodate state will be set in the read completion
> path through iomap_end_read() -> folio_end_read().
>
> Reported-by: Wei Gao <wegao@suse.com>
> Suggested-by: Sasha Levin <sashal@kernel.org>
> Tested-by: Wei Gao <wegao@suse.com>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Fixes: b2f35ac4146d ("iomap: add caller-provided callbacks for read and readahead")
I would add:
Link: https://lore.kernel.org/linux-fsdevel/aYbmy8JdgXwsGaPP@autotest-wegao.qe.prg2.suse.org/
Cc: <stable@vger.kernel.org> # v6.19
since the recent discussion around this was sort of buried in a
different thread, and the original patch is now in a released kernel.
> ---
> fs/iomap/buffered-io.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 58887513b894..4fc5ce963feb 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -80,18 +80,27 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> {
> struct iomap_folio_state *ifs = folio->private;
> unsigned long flags;
> - bool uptodate = true;
> + bool mark_uptodate = true;
>
> if (folio_test_uptodate(folio))
> return;
>
> if (ifs) {
> spin_lock_irqsave(&ifs->state_lock, flags);
> - uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> + /*
> + * If a read with bytes pending is in progress, we must not call
> + * folio_mark_uptodate(). The read completion path
> + * (iomap_read_end()) will call folio_end_read(), which uses XOR
> + * semantics to set the uptodate bit. If we set it here, the XOR
> + * in folio_end_read() will clear it, leaving the folio not
> + * uptodate.
Yeah, that makes sense. How difficult is this to write up as an fstest?
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> + */
> + mark_uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
> + !ifs->read_bytes_pending;
> spin_unlock_irqrestore(&ifs->state_lock, flags);
> }
>
> - if (uptodate)
> + if (mark_uptodate)
> folio_mark_uptodate(folio);
> }
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 2:45 ` Darrick J. Wong
@ 2026-02-19 4:23 ` Matthew Wilcox
2026-02-19 6:11 ` Darrick J. Wong
2026-02-20 23:45 ` Darrick J. Wong
0 siblings, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2026-02-19 4:23 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Joanne Koong, brauner, wegao, sashal, hch, linux-fsdevel
On Wed, Feb 18, 2026 at 06:45:34PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 18, 2026 at 04:39:11PM -0800, Joanne Koong wrote:
> > If a folio has ifs metadata attached to it and the folio is partially
> > read in through an async IO helper with the rest of it then being read
> > in through post-EOF zeroing or as inline data, and the helper
> > successfully finishes the read first, then post-EOF zeroing / reading
> > inline will mark the folio as uptodate in iomap_set_range_uptodate().
> >
> > This is a problem because when the read completion path later calls
> > iomap_read_end(), it will call folio_end_read(), which sets the uptodate
> > bit using XOR semantics. Calling folio_end_read() on a folio that was
> > already marked uptodate clears the uptodate bit.
>
> Aha, I wondered if that xor thing was going to come back to bite us.
This isn't "the xor thing has come back to bite us". This is "the iomap
code is now too complicated and I cannot figure out how to explain to
Joanne that there's really a simple way to do this".
I'm going to have to set aside my current projects and redo the iomap
readahead/read_folio code myself, aren't I?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 4:23 ` Matthew Wilcox
@ 2026-02-19 6:11 ` Darrick J. Wong
2026-02-19 6:11 ` Christoph Hellwig
2026-02-20 22:13 ` Joanne Koong
2026-02-20 23:45 ` Darrick J. Wong
1 sibling, 2 replies; 10+ messages in thread
From: Darrick J. Wong @ 2026-02-19 6:11 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Joanne Koong, brauner, wegao, sashal, hch, linux-fsdevel
On Thu, Feb 19, 2026 at 04:23:23AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 18, 2026 at 06:45:34PM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 18, 2026 at 04:39:11PM -0800, Joanne Koong wrote:
> > > If a folio has ifs metadata attached to it and the folio is partially
> > > read in through an async IO helper with the rest of it then being read
> > > in through post-EOF zeroing or as inline data, and the helper
> > > successfully finishes the read first, then post-EOF zeroing / reading
> > > inline will mark the folio as uptodate in iomap_set_range_uptodate().
> > >
> > > This is a problem because when the read completion path later calls
> > > iomap_read_end(), it will call folio_end_read(), which sets the uptodate
> > > bit using XOR semantics. Calling folio_end_read() on a folio that was
> > > already marked uptodate clears the uptodate bit.
> >
> > Aha, I wondered if that xor thing was going to come back to bite us.
>
> This isn't "the xor thing has come back to bite us". This is "the iomap
> code is now too complicated and I cannot figure out how to explain to
> Joanne that there's really a simple way to do this".
>
> I'm going to have to set aside my current projects and redo the iomap
> readahead/read_folio code myself, aren't I?
Well you could try explaining to me what that simpler way is?
/me gets the sense he's missing a discussion somewhere...
--D
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 6:11 ` Darrick J. Wong
@ 2026-02-19 6:11 ` Christoph Hellwig
2026-02-20 22:13 ` Joanne Koong
1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2026-02-19 6:11 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Matthew Wilcox, Joanne Koong, brauner, wegao, sashal, hch,
linux-fsdevel
On Wed, Feb 18, 2026 at 10:11:01PM -0800, Darrick J. Wong wrote:
> > This isn't "the xor thing has come back to bite us". This is "the iomap
> > code is now too complicated and I cannot figure out how to explain to
> > Joanne that there's really a simple way to do this".
> >
> > I'm going to have to set aside my current projects and redo the iomap
> > readahead/read_folio code myself, aren't I?
>
> Well you could try explaining to me what that simpler way is?
>
> /me gets the sense he's missing a discussion somewhere...
Same.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 6:11 ` Darrick J. Wong
2026-02-19 6:11 ` Christoph Hellwig
@ 2026-02-20 22:13 ` Joanne Koong
1 sibling, 0 replies; 10+ messages in thread
From: Joanne Koong @ 2026-02-20 22:13 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Matthew Wilcox, brauner, wegao, sashal, hch, linux-fsdevel
On Wed, Feb 18, 2026 at 10:11 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Thu, Feb 19, 2026 at 04:23:23AM +0000, Matthew Wilcox wrote:
> > On Wed, Feb 18, 2026 at 06:45:34PM -0800, Darrick J. Wong wrote:
> > > On Wed, Feb 18, 2026 at 04:39:11PM -0800, Joanne Koong wrote:
> > > > If a folio has ifs metadata attached to it and the folio is partially
> > > > read in through an async IO helper with the rest of it then being read
> > > > in through post-EOF zeroing or as inline data, and the helper
> > > > successfully finishes the read first, then post-EOF zeroing / reading
> > > > inline will mark the folio as uptodate in iomap_set_range_uptodate().
> > > >
> > > > This is a problem because when the read completion path later calls
> > > > iomap_read_end(), it will call folio_end_read(), which sets the uptodate
> > > > bit using XOR semantics. Calling folio_end_read() on a folio that was
> > > > already marked uptodate clears the uptodate bit.
> > >
> > > Aha, I wondered if that xor thing was going to come back to bite us.
> >
> > This isn't "the xor thing has come back to bite us". This is "the iomap
> > code is now too complicated and I cannot figure out how to explain to
> > Joanne that there's really a simple way to do this".
> >
> > I'm going to have to set aside my current projects and redo the iomap
> > readahead/read_folio code myself, aren't I?
>
> Well you could try explaining to me what that simpler way is?
>
> /me gets the sense he's missing a discussion somewhere...
This is the link to the prior discussion
https://lore.kernel.org/linux-fsdevel/20251223223018.3295372-1-sashal@kernel.org/T/#mbd61eaa5fd1e8922caa479720232628e39b8c9da
Thanks,
Joanne
>
> --D
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-19 4:23 ` Matthew Wilcox
2026-02-19 6:11 ` Darrick J. Wong
@ 2026-02-20 23:45 ` Darrick J. Wong
2026-02-23 23:53 ` Joanne Koong
1 sibling, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2026-02-20 23:45 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Joanne Koong, brauner, wegao, sashal, hch, linux-fsdevel
On Thu, Feb 19, 2026 at 04:23:23AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 18, 2026 at 06:45:34PM -0800, Darrick J. Wong wrote:
> > On Wed, Feb 18, 2026 at 04:39:11PM -0800, Joanne Koong wrote:
> > > If a folio has ifs metadata attached to it and the folio is partially
> > > read in through an async IO helper with the rest of it then being read
> > > in through post-EOF zeroing or as inline data, and the helper
> > > successfully finishes the read first, then post-EOF zeroing / reading
> > > inline will mark the folio as uptodate in iomap_set_range_uptodate().
> > >
> > > This is a problem because when the read completion path later calls
> > > iomap_read_end(), it will call folio_end_read(), which sets the uptodate
> > > bit using XOR semantics. Calling folio_end_read() on a folio that was
> > > already marked uptodate clears the uptodate bit.
> >
> > Aha, I wondered if that xor thing was going to come back to bite us.
>
> This isn't "the xor thing has come back to bite us". This is "the iomap
> code is now too complicated and I cannot figure out how to explain to
> Joanne that there's really a simple way to do this".
>
> I'm going to have to set aside my current projects and redo the iomap
> readahead/read_folio code myself, aren't I?
<willy and I had a chat; this is a clumsy non-AI summary of it>
I started looking at folio read state management in iomap, and made a
few observations that (I hope) match what willy's grumpy about.
There are three ways that iomap can be reading into the pagecache:
a) async ->readahead,
b) synchronous ->read_folio (page faults), and
c) synchronous ->read_folio_range (pagecache write).
(Note that (b) can call a different ->read_folio_range than (c), though
all implementations seem to have the same function)
All three of these IO paths share the behavior that they try to fill out
the folio's contents and set the corresponding folio/ifs uptodate bits
if that succeeds. Folio contents can come from anywhere, whether it's:
i) zeroing memory,
ii) copying from an inlinedata buffer, or
iii) asynchronously fetching the contents from somewhere
In the case of (c) above, if the read fails then we fail the write, and
if the read succeeds then we start copying to the pagecache.
However, (a) and (b) have this additional read_bytes_pending field in
the ifs that implements some extra tracking. AFAICT the purpose of this
field is to ensure that we don't call folio_end_read prematurely if
there's an async read in progress. This can happen if iomap_iter
returns a negative errno on a partially processed folio, I think?
read_bytes_pending is initialized to the folio_size() at the start of a
read and subtracted from when parts of the folio are supplied, whether
that's synchronous zeroing or asynchronous read ioend completion. When
the field reaches zero, we can then call folio_end_read().
But then there are twists, like the fact that we only call
iomap_read_init() to set read_bytes_pending if we decide to do an
asynchronous read. Or that iomap_read_end and iomap_finish_folio_read
have awfully similar code. I think in the case of (i) and (ii) we also
don't touch read_pending_bytes at all, and merely set the uptodate bits?
This is confusing to me. It would be more straightforward (I think) if
we just did it for all cases instead of adding more conditionals. IOWs,
how hard would it be to consolidate the read code so that there's one
function that iomap calls when it has filled out part of a folio. Is
that possible, even though we shouldn't be calling folio_end_read during
a pagecache write?
At the end of the day, however, there's a bug in Linus' tree and we need
to fix it, so Joanne's patch is a sufficient bandaid until we can go
clean this up.
--D
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-20 23:45 ` Darrick J. Wong
@ 2026-02-23 23:53 ` Joanne Koong
2026-02-24 15:16 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Joanne Koong @ 2026-02-23 23:53 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Matthew Wilcox, brauner, wegao, sashal, hch, linux-fsdevel
On Fri, Feb 20, 2026 at 3:45 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Thu, Feb 19, 2026 at 04:23:23AM +0000, Matthew Wilcox wrote:
> > On Wed, Feb 18, 2026 at 06:45:34PM -0800, Darrick J. Wong wrote:
> > > On Wed, Feb 18, 2026 at 04:39:11PM -0800, Joanne Koong wrote:
> > > > If a folio has ifs metadata attached to it and the folio is partially
> > > > read in through an async IO helper with the rest of it then being read
> > > > in through post-EOF zeroing or as inline data, and the helper
> > > > successfully finishes the read first, then post-EOF zeroing / reading
> > > > inline will mark the folio as uptodate in iomap_set_range_uptodate().
> > > >
> > > > This is a problem because when the read completion path later calls
> > > > iomap_read_end(), it will call folio_end_read(), which sets the uptodate
> > > > bit using XOR semantics. Calling folio_end_read() on a folio that was
> > > > already marked uptodate clears the uptodate bit.
> > >
> > > Aha, I wondered if that xor thing was going to come back to bite us.
> >
> > This isn't "the xor thing has come back to bite us". This is "the iomap
> > code is now too complicated and I cannot figure out how to explain to
> > Joanne that there's really a simple way to do this".
> >
> > I'm going to have to set aside my current projects and redo the iomap
> > readahead/read_folio code myself, aren't I?
>
> <willy and I had a chat; this is a clumsy non-AI summary of it>
>
> I started looking at folio read state management in iomap, and made a
> few observations that (I hope) match what willy's grumpy about.
>
> There are three ways that iomap can be reading into the pagecache:
> a) async ->readahead,
> b) synchronous ->read_folio (page faults), and
b) is async as well. The code for b) and a) are exactly the same (the
logic in iomap_read_folio_iter())
> c) synchronous ->read_folio_range (pagecache write).
>
> (Note that (b) can call a different ->read_folio_range than (c), though
> all implementations seem to have the same function)
>
> All three of these IO paths share the behavior that they try to fill out
> the folio's contents and set the corresponding folio/ifs uptodate bits
> if that succeeds. Folio contents can come from anywhere, whether it's:
>
> i) zeroing memory,
> ii) copying from an inlinedata buffer, or
> iii) asynchronously fetching the contents from somewhere
>
> In the case of (c) above, if the read fails then we fail the write, and
> if the read succeeds then we start copying to the pagecache.
>
> However, (a) and (b) have this additional read_bytes_pending field in
> the ifs that implements some extra tracking. AFAICT the purpose of this
> field is to ensure that we don't call folio_end_read prematurely if
> there's an async read in progress. This can happen if iomap_iter
> returns a negative errno on a partially processed folio, I think?
>
> read_bytes_pending is initialized to the folio_size() at the start of a
> read and subtracted from when parts of the folio are supplied, whether
> that's synchronous zeroing or asynchronous read ioend completion. When
Synchronous zeroing does not update read_bytes_pending, only async
read completions do.
> the field reaches zero, we can then call folio_end_read().
>
> But then there are twists, like the fact that we only call
> iomap_read_init() to set read_bytes_pending if we decide to do an
> asynchronous read. Or that iomap_read_end and iomap_finish_folio_read
> have awfully similar code. I think in the case of (i) and (ii) we also
> don't touch read_pending_bytes at all, and merely set the uptodate bits?
>
> This is confusing to me. It would be more straightforward (I think) if
> we just did it for all cases instead of adding more conditionals. IOWs,
> how hard would it be to consolidate the read code so that there's one
> function that iomap calls when it has filled out part of a folio. Is
> that possible, even though we shouldn't be calling folio_end_read during
> a pagecache write?
imo, I don't think the synchronous ->read_folio_range() for buffered
writes should be consolidated with the async read logic. If we have
the synchronous write path setting read_bytes_pending, that adds extra
overhead with having to acquire/release the spinlock for every range
read in. It also makes the handling more complicated (eg now having to
differentiate whether the folio was read in for a read vs. a write).
Synchronous ->read_folio_range() for buffered writes is extremely
simple and self-contained right now and I think it should be kept that
way.
For async reads, I agree that there are a bunch of different edge
cases that arise from i) ii) and iii), and from the fact that a folio
could be composed of a mixture of i) ii) and iii).
The motivation for adding read_bytes_pending was so we could know
which async read finishes last. eg this example scenario: read a 64k
folio where the first and last page are not uptodate but everything in
between is
* ->read_folio_range() for 0 to 4k
* ->read_folio_range() for 60k to 64k
These two async read calls may be two different I/O requests that
complete at different times but only the last finisher should call
folio_end_read().
I don't think having the zeroing and inline read paths also
manipulating read_bytes_pending helps here. This was discussed a bit
in [1] but I think it runs into other edge cases / race conditions [2]
that would need to be accounted for and makes some paths more
suboptimal (eg unnecessary ifs allocations and spinlock acquires). But
maybe I'm missing something here and there is a better approach for
doing this?
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1YcuhKwbZLo-11=umcTzH_OJ+bdwZq5=XjeJo8gb9e5ig@mail.gmail.com/T/#md09648082a96122ec1e541993872e0c43da5105f
[2] https://lore.kernel.org/linux-fsdevel/CAJnrk1YcuhKwbZLo-11=umcTzH_OJ+bdwZq5=XjeJo8gb9e5ig@mail.gmail.com/T/#mdc49b649378798fa9e850c9c6914c8c6af5e2895
>
> At the end of the day, however, there's a bug in Linus' tree and we need
> to fix it, so Joanne's patch is a sufficient bandaid until we can go
> clean this up.
>
> --D
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/1] iomap: don't mark folio uptodate if read IO has bytes pending
2026-02-23 23:53 ` Joanne Koong
@ 2026-02-24 15:16 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2026-02-24 15:16 UTC (permalink / raw)
To: Joanne Koong
Cc: Darrick J. Wong, Matthew Wilcox, brauner, wegao, sashal, hch,
linux-fsdevel
On Mon, Feb 23, 2026 at 03:53:15PM -0800, Joanne Koong wrote:
> > There are three ways that iomap can be reading into the pagecache:
> > a) async ->readahead,
> > b) synchronous ->read_folio (page faults), and
>
> b) is async as well. The code for b) and a) are exactly the same (the
> logic in iomap_read_folio_iter())
Yes.
> > This is confusing to me. It would be more straightforward (I think) if
> > we just did it for all cases instead of adding more conditionals. IOWs,
> > how hard would it be to consolidate the read code so that there's one
> > function that iomap calls when it has filled out part of a folio. Is
> > that possible, even though we shouldn't be calling folio_end_read during
> > a pagecache write?
>
> imo, I don't think the synchronous ->read_folio_range() for buffered
> writes should be consolidated with the async read logic.
Yes. I've been thinking about that on and off, but unfortunately so far
I've not come up with a good idea how to merge the code. Doing so would
be very useful for many reasons.
The problem with that isn't really async vs sync; ->read_folio clearly
shows you you turn underlying asynchronous logic into a synchronous call.
It's really about the range logic, where the writer preparation might
want to only read the head and the tail segments of a folio.
But if we can merge that into the main implementation and have a single
core implementation we'd be much better off.
Anyone looking for a "little" project? :)
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-02-24 15:16 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-19 0:39 [PATCH v1 0/1] iomap: don't mark folio uptodate if read IO has bytes pending Joanne Koong
2026-02-19 0:39 ` [PATCH v1 1/1] " Joanne Koong
2026-02-19 2:45 ` Darrick J. Wong
2026-02-19 4:23 ` Matthew Wilcox
2026-02-19 6:11 ` Darrick J. Wong
2026-02-19 6:11 ` Christoph Hellwig
2026-02-20 22:13 ` Joanne Koong
2026-02-20 23:45 ` Darrick J. Wong
2026-02-23 23:53 ` Joanne Koong
2026-02-24 15:16 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox