From: Sasha Levin <sashal@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Wei Gao <wegao@suse.com>,
willy@infradead.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
Date: Tue, 10 Feb 2026 19:00:05 -0500 [thread overview]
Message-ID: <aYvGhQI2ryNR7VLQ@laps> (raw)
In-Reply-To: <CAJnrk1aPs2J_EerLROxtiHAKTyU2NHBkRXpS=-yunEsC9epAWw@mail.gmail.com>
On Tue, Feb 10, 2026 at 02:18:06PM -0800, Joanne Koong wrote:
>On Mon, Feb 9, 2026 at 4:40 PM Wei Gao <wegao@suse.com> wrote:
>>
>> On Mon, Feb 09, 2026 at 04:20:01PM -0800, Joanne Koong wrote:
>> > On Mon, Feb 9, 2026 at 4:12 PM Wei Gao <wegao@suse.com> wrote:
>> > >
>> > > On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
>> > > > On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
>> > > > >
>> > > > > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
>> > > > > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
>> > > > > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
>> > > > > > > >
>> > > > > > >
>> > > > > > > Hi Sasha,
>> > > > > > >
>> > > > > > > Thanks for your patch and for the detailed writeup.
>> > > > > >
>> > > > > > Thanks for looking into this!
>> > > > > >
>> > > > > > > > When iomap uses large folios, per-block uptodate tracking is managed via
>> > > > > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
>> > > > > > > > to become inconsistent with the folio's uptodate flag.
>> > > > > > > >
>> > > > > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
>> > > > > > > > set the uptodate bit and clear the locked bit:
>> > > > > > > >
>> > > > > > > > Thread A (read completion): Thread B (concurrent write):
>> > > > > > > > -------------------------------- --------------------------------
>> > > > > > > > iomap_finish_folio_read()
>> > > > > > > > spin_lock(state_lock)
>> > > > > > > > ifs_set_range_uptodate() -> true
>> > > > > > > > spin_unlock(state_lock)
>> > > > > > > > iomap_set_range_uptodate()
>> > > > > > > > spin_lock(state_lock)
>> > > > > > > > ifs_set_range_uptodate() -> true
>> > > > > > > > spin_unlock(state_lock)
>> > > > > > > > folio_mark_uptodate(folio)
>> > > > > > > > folio_end_read(folio, true)
>> > > > > > > > folio_xor_flags() // XOR CLEARS uptodate!
>> > > > > > >
>> > > > > > > The part I'm confused about here is how this can happen between a
>> > > > > > > concurrent read and write. My understanding is that the folio is
>> > > > > > > locked when the read occurs and locked when the write occurs and both
>> > > > > > > locks get dropped only when the read or write finishes. Looking at
>> > > > > > > iomap code, I see iomap_set_range_uptodate() getting called in
>> > > > > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
>> > > > > > > both those places the folio lock is held while this is called. I'm not
>> > > > > > > seeing how the read and write race in the diagram can happen, but
>> > > > > > > maybe I'm missing something here?
>> > > > > >
>> > > > > > Hmm, you're right... The folio lock should prevent concurrent read/write
>> > > > > > access. Looking at this again, I suspect that FUSE was calling
>> > > > > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
>> > > > > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
>> > > > > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
>> > > > > > showing all blocks uptodate?
>> > > > >
>> > > > > Hi Sasha
>> > > > > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
>> > > > > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
>> > > > > tst_test.c:1985: TINFO: === Testing on ntfs ===
>> > > > > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
>> > > > > Failed to set locale, using default 'C'.
>> > > > > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically. It has been set to 0.
>> > > > > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically. It has been set to 0.
>> > > > > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically. It has been set to 0.
>> > > > > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
>> > > > > Windows will not be able to boot from this device.
>> > > > > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
>> > > > > tst_test.c:1302: TINFO: Trying FUSE...
>> > > > > tst_test.c:1953: TBROK: Test killed by SIGBUS!
>> > > > >
>> > > > > Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
>> > > > > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
>> > > > > then iomap_set_range_uptodate marks the folio as Uptodate globally, after folio_xor_flags folio's uptodate become 0 again, finally trigger
>> > > > > an SIGBUS issue in filemap_fault.
>> > > >
>> > > > Hi Wei,
>> > > >
>> > > > Thanks for your report. afaict, this scenario occurs only if the
>> > > > server is a fuseblk server with a block size different from the memory
>> > > > page size and if the file size is less than the size of the folio
>> > > > being read in.
>> > > Thanks for checking this and give quick feedback :)
>> > > >
>> > > > Could you verify that this snippet from Sasha's patch fixes the issue?:
>> > > Yes, Sasha's patch can fixes the issue.
>> >
>> > I think just those lines I pasted from Sasha's patch is the relevant
>> > fix. Could you verify that just those lines (without the changes
>> > from the rest of his patch) fixes the issue?
>> Yes, i just add two lines change in iomap_set_range_uptodate can fixes
>> the issue.
>
>Great, thank you for confirming.
>
>Sasha, would you mind submitting this snippet of your patch as the fix
>for the EOF zeroing issue? I think it could be restructured to
>
>diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>index 1fe19b4ee2f4..412e661871f8 100644
>--- a/fs/iomap/buffered-io.c
>+++ b/fs/iomap/buffered-io.c
>@@ -87,7 +87,16 @@ static void iomap_set_range_uptodate(struct folio
>*folio, size_t off,
>
> if (ifs) {
> spin_lock_irqsave(&ifs->state_lock, flags);
>- uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
>+ /*
>+ * If a read is in progress, we must NOT call
>folio_mark_uptodate.
>+ * The read completion path (iomap_finish_folio_read or
>+ * iomap_read_end) will call folio_end_read() which uses XOR
>+ * semantics to set the uptodate bit. If we set it here, the XOR
>+ * in folio_end_read() will clear it, leaving the folio not
>+ * uptodate.
>+ */
>+ uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
>+ !ifs->read_bytes_pending;
> spin_unlock_irqrestore(&ifs->state_lock, flags);
> }
>
>to be a bit more concise.
>
>If you're busy and don't have the bandwidth, I'm happy to forward the
>patch on your behalf with your Signed-off-by / authorship.
Thanks for the offer Joanna!
Since you've done all the triaging work here, please go ahead and submit it -
something like a Suggested-by would be more than enought for me :)
--
Thanks,
Sasha
next prev parent reply other threads:[~2026-02-11 0:00 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 0:25 [PATCH v5 00/14] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-26 0:25 ` [PATCH v5 01/14] iomap: move bio read logic into helper function Joanne Koong
2025-09-26 0:25 ` [PATCH v5 02/14] iomap: move read/readahead bio submission " Joanne Koong
2025-09-26 0:25 ` [PATCH v5 03/14] iomap: store read/readahead bio generically Joanne Koong
2025-09-26 0:25 ` [PATCH v5 04/14] iomap: iterate over folio mapping in iomap_readpage_iter() Joanne Koong
2025-09-26 0:26 ` [PATCH v5 05/14] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
2025-09-26 0:26 ` [PATCH v5 06/14] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
2025-09-26 0:26 ` [PATCH v5 07/14] iomap: track pending read bytes more optimally Joanne Koong
2025-10-23 19:34 ` Brian Foster
2025-10-24 0:01 ` Joanne Koong
2025-10-24 16:25 ` Joanne Koong
2025-10-24 17:14 ` Brian Foster
2025-10-24 19:48 ` Joanne Koong
2025-10-24 21:55 ` Joanne Koong
2025-10-27 12:16 ` Brian Foster
2025-10-24 17:21 ` Matthew Wilcox
2025-10-24 19:22 ` Joanne Koong
2025-10-24 20:59 ` Matthew Wilcox
2025-10-24 21:37 ` Darrick J. Wong
2025-10-24 21:58 ` Joanne Koong
2025-09-26 0:26 ` [PATCH v5 08/14] iomap: set accurate iter->pos when reading folio ranges Joanne Koong
2025-09-26 0:26 ` [PATCH v5 09/14] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-26 0:26 ` [PATCH v5 10/14] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-26 0:26 ` [PATCH v5 11/14] iomap: make iomap_read_folio() a void return Joanne Koong
2025-09-26 0:26 ` [PATCH v5 12/14] fuse: use iomap for read_folio Joanne Koong
2025-12-23 22:30 ` [RFC PATCH 0/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read Sasha Levin
2025-12-23 22:30 ` [RFC PATCH 1/1] " Sasha Levin
2025-12-24 1:12 ` Joanne Koong
2025-12-24 1:31 ` Sasha Levin
2026-02-07 7:16 ` Wei Gao
2026-02-09 19:08 ` Joanne Koong
2026-02-10 0:12 ` Wei Gao
2026-02-10 0:20 ` Joanne Koong
2026-02-10 0:40 ` Wei Gao
2026-02-10 22:18 ` Joanne Koong
2026-02-11 0:00 ` Sasha Levin [this message]
2026-02-11 3:11 ` Matthew Wilcox
2026-02-11 19:33 ` Joanne Koong
2026-02-11 21:03 ` Matthew Wilcox
2026-02-11 23:13 ` Joanne Koong
2026-02-12 19:31 ` Matthew Wilcox
2026-02-13 0:53 ` Joanne Koong
2025-12-24 2:10 ` Matthew Wilcox
2025-12-24 15:43 ` Sasha Levin
2025-12-24 17:27 ` Matthew Wilcox
2025-12-24 21:21 ` Sasha Levin
2025-12-30 0:58 ` Joanne Koong
2025-09-26 0:26 ` [PATCH v5 13/14] fuse: use iomap for readahead Joanne Koong
2025-09-26 0:26 ` [PATCH v5 14/14] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
2025-09-29 9:38 ` [PATCH v5 00/14] fuse: use iomap for buffered reads + readahead Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYvGhQI2ryNR7VLQ@laps \
--to=sashal@kernel.org \
--cc=joannelkoong@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=wegao@suse.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.