public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Wei Gao <wegao@suse.com>,
	willy@infradead.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
Date: Tue, 10 Feb 2026 19:00:05 -0500	[thread overview]
Message-ID: <aYvGhQI2ryNR7VLQ@laps> (raw)
In-Reply-To: <CAJnrk1aPs2J_EerLROxtiHAKTyU2NHBkRXpS=-yunEsC9epAWw@mail.gmail.com>

On Tue, Feb 10, 2026 at 02:18:06PM -0800, Joanne Koong wrote:
>On Mon, Feb 9, 2026 at 4:40 PM Wei Gao <wegao@suse.com> wrote:
>>
>> On Mon, Feb 09, 2026 at 04:20:01PM -0800, Joanne Koong wrote:
>> > On Mon, Feb 9, 2026 at 4:12 PM Wei Gao <wegao@suse.com> wrote:
>> > >
>> > > On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
>> > > > On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
>> > > > >
>> > > > > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
>> > > > > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
>> > > > > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
>> > > > > > > >
>> > > > > > >
>> > > > > > > Hi Sasha,
>> > > > > > >
>> > > > > > > Thanks for your patch and for the detailed writeup.
>> > > > > >
>> > > > > > Thanks for looking into this!
>> > > > > >
>> > > > > > > > When iomap uses large folios, per-block uptodate tracking is managed via
>> > > > > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
>> > > > > > > > to become inconsistent with the folio's uptodate flag.
>> > > > > > > >
>> > > > > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
>> > > > > > > > set the uptodate bit and clear the locked bit:
>> > > > > > > >
>> > > > > > > >   Thread A (read completion):          Thread B (concurrent write):
>> > > > > > > >   --------------------------------     --------------------------------
>> > > > > > > >   iomap_finish_folio_read()
>> > > > > > > >     spin_lock(state_lock)
>> > > > > > > >     ifs_set_range_uptodate() -> true
>> > > > > > > >     spin_unlock(state_lock)
>> > > > > > > >                                        iomap_set_range_uptodate()
>> > > > > > > >                                          spin_lock(state_lock)
>> > > > > > > >                                          ifs_set_range_uptodate() -> true
>> > > > > > > >                                          spin_unlock(state_lock)
>> > > > > > > >                                          folio_mark_uptodate(folio)
>> > > > > > > >     folio_end_read(folio, true)
>> > > > > > > >       folio_xor_flags()  // XOR CLEARS uptodate!
>> > > > > > >
>> > > > > > > The part I'm confused about here is how this can happen between a
>> > > > > > > concurrent read and write. My understanding is that the folio is
>> > > > > > > locked when the read occurs and locked when the write occurs and both
>> > > > > > > locks get dropped only when the read or write finishes. Looking at
>> > > > > > > iomap code, I see iomap_set_range_uptodate() getting called in
>> > > > > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
>> > > > > > > both those places the folio lock is held while this is called. I'm not
>> > > > > > > seeing how the read and write race in the diagram can happen, but
>> > > > > > > maybe I'm missing something here?
>> > > > > >
>> > > > > > Hmm, you're right... The folio lock should prevent concurrent read/write
>> > > > > > access. Looking at this again, I suspect that FUSE was calling
>> > > > > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
>> > > > > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
>> > > > > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
>> > > > > > showing all blocks uptodate?
>> > > > >
>> > > > > Hi Sasha
>> > > > > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
>> > > > > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
>> > > > > tst_test.c:1985: TINFO: === Testing on ntfs ===
>> > > > > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
>> > > > > Failed to set locale, using default 'C'.
>> > > > > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
>> > > > > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
>> > > > > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
>> > > > > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
>> > > > > Windows will not be able to boot from this device.
>> > > > > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
>> > > > > tst_test.c:1302: TINFO: Trying FUSE...
>> > > > > tst_test.c:1953: TBROK: Test killed by SIGBUS!
>> > > > >
>> > > > > Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
>> > > > > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
>> > > > > then iomap_set_range_uptodate marks the folio as Uptodate globally, after folio_xor_flags folio's uptodate become 0 again, finally trigger
>> > > > > an SIGBUS issue in filemap_fault.
>> > > >
>> > > > Hi Wei,
>> > > >
>> > > > Thanks for your report. afaict, this scenario occurs only if the
>> > > > server is a fuseblk server with a block size different from the memory
>> > > > page size and if the file size is less than the size of the folio
>> > > > being read in.
>> > > Thanks for checking this and give quick feedback :)
>> > > >
>> > > > Could you verify that this snippet from Sasha's patch fixes the issue?:
>> > > Yes, Sasha's patch can fixes the issue.
>> >
>> > I think just those lines I pasted from Sasha's patch is the relevant
>> > fix. Could you verify that just those lines (without the changes
>> > from the rest of his patch) fixes the issue?
>> Yes, i just add two lines change in iomap_set_range_uptodate can fixes
>> the issue.
>
>Great, thank you for confirming.
>
>Sasha, would you mind submitting this snippet of your patch as the fix
>for the EOF zeroing issue? I think it could be restructured to
>
>diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>index 1fe19b4ee2f4..412e661871f8 100644
>--- a/fs/iomap/buffered-io.c
>+++ b/fs/iomap/buffered-io.c
>@@ -87,7 +87,16 @@ static void iomap_set_range_uptodate(struct folio
>*folio, size_t off,
>
>        if (ifs) {
>                spin_lock_irqsave(&ifs->state_lock, flags);
>-               uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
>+               /*
>+                * If a read is in progress, we must NOT call
>folio_mark_uptodate.
>+                * The read completion path (iomap_finish_folio_read or
>+                * iomap_read_end) will call folio_end_read() which uses XOR
>+                * semantics to set the uptodate bit. If we set it here, the XOR
>+                * in folio_end_read() will clear it, leaving the folio not
>+                * uptodate.
>+                */
>+               uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
>+                       !ifs->read_bytes_pending;
>                spin_unlock_irqrestore(&ifs->state_lock, flags);
>        }
>
>to be a bit more concise.
>
>If you're busy and don't have the bandwidth, I'm happy to forward the
>patch on your behalf with your Signed-off-by / authorship.

Thanks for the offer Joanna!

Since you've done all the triaging work here, please go ahead and submit it -
something like a Suggested-by would be more than enought for me :)

-- 
Thanks,
Sasha

  reply	other threads:[~2026-02-11  0:00 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-26  0:25 [PATCH v5 00/14] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-26  0:25 ` [PATCH v5 01/14] iomap: move bio read logic into helper function Joanne Koong
2025-09-26  0:25 ` [PATCH v5 02/14] iomap: move read/readahead bio submission " Joanne Koong
2025-09-26  0:25 ` [PATCH v5 03/14] iomap: store read/readahead bio generically Joanne Koong
2025-09-26  0:25 ` [PATCH v5 04/14] iomap: iterate over folio mapping in iomap_readpage_iter() Joanne Koong
2025-09-26  0:26 ` [PATCH v5 05/14] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
2025-09-26  0:26 ` [PATCH v5 06/14] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
2025-09-26  0:26 ` [PATCH v5 07/14] iomap: track pending read bytes more optimally Joanne Koong
2025-10-23 19:34   ` Brian Foster
2025-10-24  0:01     ` Joanne Koong
2025-10-24 16:25       ` Joanne Koong
2025-10-24 17:14         ` Brian Foster
2025-10-24 19:48           ` Joanne Koong
2025-10-24 21:55             ` Joanne Koong
2025-10-27 12:16               ` Brian Foster
2025-10-24 17:21         ` Matthew Wilcox
2025-10-24 19:22           ` Joanne Koong
2025-10-24 20:59             ` Matthew Wilcox
2025-10-24 21:37               ` Darrick J. Wong
2025-10-24 21:58               ` Joanne Koong
2025-09-26  0:26 ` [PATCH v5 08/14] iomap: set accurate iter->pos when reading folio ranges Joanne Koong
2025-09-26  0:26 ` [PATCH v5 09/14] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-26  0:26 ` [PATCH v5 10/14] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-26  0:26 ` [PATCH v5 11/14] iomap: make iomap_read_folio() a void return Joanne Koong
2025-09-26  0:26 ` [PATCH v5 12/14] fuse: use iomap for read_folio Joanne Koong
2025-12-23 22:30   ` [RFC PATCH 0/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read Sasha Levin
2025-12-23 22:30     ` [RFC PATCH 1/1] " Sasha Levin
2025-12-24  1:12       ` Joanne Koong
2025-12-24  1:31         ` Sasha Levin
2026-02-07  7:16           ` Wei Gao
2026-02-09 19:08             ` Joanne Koong
2026-02-10  0:12               ` Wei Gao
2026-02-10  0:20                 ` Joanne Koong
2026-02-10  0:40                   ` Wei Gao
2026-02-10 22:18                     ` Joanne Koong
2026-02-11  0:00                       ` Sasha Levin [this message]
2026-02-11  3:11                       ` Matthew Wilcox
2026-02-11 19:33                         ` Joanne Koong
2026-02-11 21:03                           ` Matthew Wilcox
2026-02-11 23:13                             ` Joanne Koong
2026-02-12 19:31                               ` Matthew Wilcox
2026-02-13  0:53                                 ` Joanne Koong
2025-12-24  2:10         ` Matthew Wilcox
2025-12-24 15:43           ` Sasha Levin
2025-12-24 17:27             ` Matthew Wilcox
2025-12-24 21:21               ` Sasha Levin
2025-12-30  0:58                 ` Joanne Koong
2025-09-26  0:26 ` [PATCH v5 13/14] fuse: use iomap for readahead Joanne Koong
2025-09-26  0:26 ` [PATCH v5 14/14] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
2025-09-29  9:38 ` [PATCH v5 00/14] fuse: use iomap for buffered reads + readahead Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYvGhQI2ryNR7VLQ@laps \
    --to=sashal@kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=wegao@suse.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox