public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 0/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
       [not found] <20250926002609.1302233-13-joannelkoong@gmail.com>
@ 2025-12-23 22:30 ` Sasha Levin
  2025-12-23 22:30   ` [RFC PATCH 1/1] " Sasha Levin
  0 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2025-12-23 22:30 UTC (permalink / raw)
  To: joannelkoong; +Cc: willy, linux-fsdevel, linux-kernel, Sasha Levin

Hi Joanne,

While testing with your FUSE iomap patchset that recently landed upstream,
I ran into a warning in ifs_free() where the folio's uptodate flag didn't
match the ifs per-block uptodate bitmap. The warning was triggered during
FUSE-based filesystem unmount when running the LTP writev03 test.

After some investigation, I believe the root cause is a race condition
that has existed since commit 7a4847e54cc1 ("iomap: use folio_end_read()")
but was difficult to trigger until now. The issue is that folio_end_read()
uses XOR semantics to set the uptodate bit, so if iomap_set_range_uptodate()
calls folio_mark_uptodate() while a read is in progress, the subsequent
folio_end_read() will XOR and clear the uptodate bit.

The FUSE iomap enablement seems to have created the right conditions to
expose this race - likely due to different file extent patterns in
FUSE-based filesystems (like NTFS-3G) compared to native filesystems
like XFS/ext4.

The fix checks read_bytes_pending under the state_lock in
iomap_set_range_uptodate() and skips calling folio_mark_uptodate() if a
read is in progress, letting the read completion path handle it.

I'm not very familiar with the iomap internals, so I'd really appreciate
your review and feedback on whether this approach is correct.

Thanks,
Sasha

Sasha Levin (1):
  iomap: fix race between iomap_set_range_uptodate and folio_end_read

 fs/fuse/dev.c          |  3 +-
 fs/fuse/file.c         |  6 ++--
 fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++++++++++++++++++---
 include/linux/iomap.h  |  2 ++
 4 files changed, 68 insertions(+), 8 deletions(-)

-- 
2.51.0



* [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-23 22:30 ` [RFC PATCH 0/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read Sasha Levin
@ 2025-12-23 22:30   ` Sasha Levin
  2025-12-24  1:12     ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2025-12-23 22:30 UTC (permalink / raw)
  To: joannelkoong; +Cc: willy, linux-fsdevel, linux-kernel, Sasha Levin

When iomap uses large folios, per-block uptodate tracking is managed via
iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
to become inconsistent with the folio's uptodate flag.

The race occurs because folio_end_read() uses XOR semantics to atomically
set the uptodate bit and clear the locked bit:

  Thread A (read completion):          Thread B (concurrent write):
  --------------------------------     --------------------------------
  iomap_finish_folio_read()
    spin_lock(state_lock)
    ifs_set_range_uptodate() -> true
    spin_unlock(state_lock)
                                       iomap_set_range_uptodate()
                                         spin_lock(state_lock)
                                         ifs_set_range_uptodate() -> true
                                         spin_unlock(state_lock)
                                         folio_mark_uptodate(folio)
    folio_end_read(folio, true)
      folio_xor_flags()  // XOR CLEARS uptodate!

Result: folio is NOT uptodate, but ifs says all blocks ARE uptodate.

Fix by checking read_bytes_pending in iomap_set_range_uptodate() under the
lock. If a read is in progress, skip calling folio_mark_uptodate() - the
read completion path will handle it via folio_end_read().

The warning was triggered during FUSE-based filesystem (e.g., NTFS-3G)
unmount when the LTP writev03 test was run:

  WARNING: fs/iomap/buffered-io.c at ifs_free
  Call trace:
   ifs_free
   iomap_invalidate_folio
   truncate_cleanup_folio
   truncate_inode_pages_range
   truncate_inode_pages_final
   fuse_evict_inode
   ...
   fuse_kill_sb_blk

Fixes: 7a4847e54cc1 ("iomap: use folio_end_read()")
Assisted-by: claude-opus-4-5-20251101
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/fuse/dev.c          |  3 +-
 fs/fuse/file.c         |  6 ++--
 fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++++++++++++++++++---
 include/linux/iomap.h  |  2 ++
 4 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6d59cbc877c6..50e84e913589 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -11,6 +11,7 @@
 #include "fuse_dev_i.h"
 
 #include <linux/init.h>
+#include <linux/iomap.h>
 #include <linux/module.h>
 #include <linux/poll.h>
 #include <linux/sched/signal.h>
@@ -1820,7 +1821,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 		if (!folio_test_uptodate(folio) && !err && offset == 0 &&
 		    (nr_bytes == folio_size(folio) || file_size == end)) {
 			folio_zero_segment(folio, nr_bytes, folio_size(folio));
-			folio_mark_uptodate(folio);
+			iomap_set_range_uptodate(folio, 0, folio_size(folio));
 		}
 		folio_unlock(folio);
 		folio_put(folio);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3abe38416199 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1216,13 +1216,13 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 		struct folio *folio = ap->folios[i];
 
 		if (err) {
-			folio_clear_uptodate(folio);
+			iomap_clear_folio_uptodate(folio);
 		} else {
 			if (count >= folio_size(folio) - offset)
 				count -= folio_size(folio) - offset;
 			else {
 				if (short_write)
-					folio_clear_uptodate(folio);
+					iomap_clear_folio_uptodate(folio);
 				count = 0;
 			}
 			offset = 0;
@@ -1305,7 +1305,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 
 		/* If we copied full folio, mark it uptodate */
 		if (tmp == folio_size(folio))
-			folio_mark_uptodate(folio);
+			iomap_set_range_uptodate(folio, 0, folio_size(folio));
 
 		if (folio_test_uptodate(folio)) {
 			folio_unlock(folio);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e5c1ca440d93..7ceda24cf6a7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -74,8 +74,7 @@ static bool ifs_set_range_uptodate(struct folio *folio,
 	return ifs_is_fully_uptodate(folio, ifs);
 }
 
-static void iomap_set_range_uptodate(struct folio *folio, size_t off,
-		size_t len)
+void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	unsigned long flags;
@@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 	if (ifs) {
 		spin_lock_irqsave(&ifs->state_lock, flags);
 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+		/*
+		 * If a read is in progress, we must NOT call folio_mark_uptodate
+		 * here. The read completion path (iomap_finish_folio_read or
+		 * iomap_read_end) will call folio_end_read() which uses XOR
+		 * semantics to set the uptodate bit. If we set it here, the XOR
+		 * in folio_end_read() will clear it, leaving the folio not
+		 * uptodate while the ifs says all blocks are uptodate.
+		 */
+		if (uptodate && ifs->read_bytes_pending)
+			uptodate = false;
 		spin_unlock_irqrestore(&ifs->state_lock, flags);
 	}
 
 	if (uptodate)
 		folio_mark_uptodate(folio);
 }
+EXPORT_SYMBOL_GPL(iomap_set_range_uptodate);
+
+void iomap_clear_folio_uptodate(struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	if (ifs) {
+		struct inode *inode = folio->mapping->host;
+		unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+		unsigned long flags;
+
+		spin_lock_irqsave(&ifs->state_lock, flags);
+		/*
+		 * If a read is in progress, don't clear the uptodate state.
+		 * The read completion path will handle the folio state, and
+		 * clearing here would race with iomap_finish_folio_read()
+		 * potentially causing ifs/folio uptodate state mismatch.
+		 */
+		if (ifs->read_bytes_pending) {
+			spin_unlock_irqrestore(&ifs->state_lock, flags);
+			return;
+		}
+		bitmap_clear(ifs->state, 0, nr_blocks);
+		spin_unlock_irqrestore(&ifs->state_lock, flags);
+	}
+	folio_clear_uptodate(folio);
+}
+EXPORT_SYMBOL_GPL(iomap_clear_folio_uptodate);
 
 /*
  * Find the next dirty block in the folio. end_blk is inclusive.
@@ -399,8 +436,17 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
 		spin_unlock_irqrestore(&ifs->state_lock, flags);
 	}
 
-	if (finished)
+	if (finished) {
+		/*
+		 * If uptodate is true but the folio is already marked uptodate,
+		 * folio_end_read's XOR semantics would clear the uptodate bit.
+		 * This should never happen because iomap_set_range_uptodate()
+		 * skips calling folio_mark_uptodate() when read_bytes_pending
+		 * is non-zero, ensuring only the read completion path sets it.
+		 */
+		WARN_ON_ONCE(uptodate && folio_test_uptodate(folio));
 		folio_end_read(folio, uptodate);
+	}
 }
 EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
 
@@ -481,8 +527,19 @@ static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
 		if (end_read)
 			uptodate = ifs_is_fully_uptodate(folio, ifs);
 		spin_unlock_irq(&ifs->state_lock);
-		if (end_read)
+		if (end_read) {
+			/*
+			 * If uptodate is true but the folio is already marked
+			 * uptodate, folio_end_read's XOR semantics would clear
+			 * the uptodate bit. This should never happen because
+			 * iomap_set_range_uptodate() skips calling
+			 * folio_mark_uptodate() when read_bytes_pending is
+			 * non-zero, ensuring only the read completion path
+			 * sets it.
+			 */
+			WARN_ON_ONCE(uptodate && folio_test_uptodate(folio));
 			folio_end_read(folio, uptodate);
+		}
 	} else if (!bytes_submitted) {
 		/*
 		 * If there were no bytes submitted, this means we are
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 520e967cb501..3c2ad88d16b6 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -345,6 +345,8 @@ void iomap_read_folio(const struct iomap_ops *ops,
 void iomap_readahead(const struct iomap_ops *ops,
 		struct iomap_read_folio_ctx *ctx);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
+void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len);
+void iomap_clear_folio_uptodate(struct folio *folio);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
-- 
2.51.0



* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-23 22:30   ` [RFC PATCH 1/1] " Sasha Levin
@ 2025-12-24  1:12     ` Joanne Koong
  2025-12-24  1:31       ` Sasha Levin
  2025-12-24  2:10       ` Matthew Wilcox
  0 siblings, 2 replies; 22+ messages in thread
From: Joanne Koong @ 2025-12-24  1:12 UTC (permalink / raw)
  To: Sasha Levin; +Cc: willy, linux-fsdevel, linux-kernel

On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
>

Hi Sasha,

Thanks for your patch and for the detailed writeup.

> When iomap uses large folios, per-block uptodate tracking is managed via
> iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> to become inconsistent with the folio's uptodate flag.
>
> The race occurs because folio_end_read() uses XOR semantics to atomically
> set the uptodate bit and clear the locked bit:
>
>   Thread A (read completion):          Thread B (concurrent write):
>   --------------------------------     --------------------------------
>   iomap_finish_folio_read()
>     spin_lock(state_lock)
>     ifs_set_range_uptodate() -> true
>     spin_unlock(state_lock)
>                                        iomap_set_range_uptodate()
>                                          spin_lock(state_lock)
>                                          ifs_set_range_uptodate() -> true
>                                          spin_unlock(state_lock)
>                                          folio_mark_uptodate(folio)
>     folio_end_read(folio, true)
>       folio_xor_flags()  // XOR CLEARS uptodate!

The part I'm confused about here is how this can happen between a
concurrent read and write. My understanding is that the folio is
locked when the read occurs and locked when the write occurs and both
locks get dropped only when the read or write finishes. Looking at
iomap code, I see iomap_set_range_uptodate() getting called in
__iomap_write_begin() and __iomap_write_end() for the writes, but in
both those places the folio lock is held while this is called. I'm not
seeing how the read and write race in the diagram can happen, but
maybe I'm missing something here?

>
> Result: folio is NOT uptodate, but ifs says all blocks ARE uptodate.

Ah I see the WARN_ON_ONCE() in ifs_free:
        WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
                        folio_test_uptodate(folio));

Just to confirm, are you seeing that the folio is not marked uptodate
but the ifs blocks are? Or are the ifs blocks not uptodate but the
folio is?

>
> Fix by checking read_bytes_pending in iomap_set_range_uptodate() under the
> lock. If a read is in progress, skip calling folio_mark_uptodate() - the
> read completion path will handle it via folio_end_read().
>
> The warning was triggered during FUSE-based filesystem (e.g., NTFS-3G)
> unmount when the LTP writev03 test was run:
>
>   WARNING: fs/iomap/buffered-io.c at ifs_free
>   Call trace:
>    ifs_free
>    iomap_invalidate_folio
>    truncate_cleanup_folio
>    truncate_inode_pages_range
>    truncate_inode_pages_final
>    fuse_evict_inode
>    ...
>    fuse_kill_sb_blk
>
> Fixes: 7a4847e54cc1 ("iomap: use folio_end_read()")
> Assisted-by: claude-opus-4-5-20251101
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/fuse/dev.c          |  3 +-
>  fs/fuse/file.c         |  6 ++--
>  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  2 ++
>  4 files changed, 68 insertions(+), 8 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 6d59cbc877c6..50e84e913589 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -11,6 +11,7 @@
>  #include "fuse_dev_i.h"
>
>  #include <linux/init.h>
> +#include <linux/iomap.h>
>  #include <linux/module.h>
>  #include <linux/poll.h>
>  #include <linux/sched/signal.h>
> @@ -1820,7 +1821,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
>                 if (!folio_test_uptodate(folio) && !err && offset == 0 &&
>                     (nr_bytes == folio_size(folio) || file_size == end)) {
>                         folio_zero_segment(folio, nr_bytes, folio_size(folio));
> -                       folio_mark_uptodate(folio);
> +                       iomap_set_range_uptodate(folio, 0, folio_size(folio));
>                 }
>                 folio_unlock(folio);
>                 folio_put(folio);
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 01bc894e9c2b..3abe38416199 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1216,13 +1216,13 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
>                 struct folio *folio = ap->folios[i];
>
>                 if (err) {
> -                       folio_clear_uptodate(folio);
> +                       iomap_clear_folio_uptodate(folio);
>                 } else {
>                         if (count >= folio_size(folio) - offset)
>                                 count -= folio_size(folio) - offset;
>                         else {
>                                 if (short_write)
> -                                       folio_clear_uptodate(folio);
> +                                       iomap_clear_folio_uptodate(folio);
>                                 count = 0;
>                         }
>                         offset = 0;
> @@ -1305,7 +1305,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
>
>                 /* If we copied full folio, mark it uptodate */
>                 if (tmp == folio_size(folio))
> -                       folio_mark_uptodate(folio);
> +                       iomap_set_range_uptodate(folio, 0, folio_size(folio));
>
>                 if (folio_test_uptodate(folio)) {
>                         folio_unlock(folio);
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e5c1ca440d93..7ceda24cf6a7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -74,8 +74,7 @@ static bool ifs_set_range_uptodate(struct folio *folio,
>         return ifs_is_fully_uptodate(folio, ifs);
>  }
>
> -static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> -               size_t len)
> +void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len)
>  {
>         struct iomap_folio_state *ifs = folio->private;
>         unsigned long flags;
> @@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>         if (ifs) {
>                 spin_lock_irqsave(&ifs->state_lock, flags);
>                 uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> +               /*
> +                * If a read is in progress, we must NOT call folio_mark_uptodate
> +                * here. The read completion path (iomap_finish_folio_read or
> +                * iomap_read_end) will call folio_end_read() which uses XOR
> +                * semantics to set the uptodate bit. If we set it here, the XOR
> +                * in folio_end_read() will clear it, leaving the folio not
> +                * uptodate while the ifs says all blocks are uptodate.
> +                */
> +               if (uptodate && ifs->read_bytes_pending)
> +                       uptodate = false;

Does the warning you saw in ifs_free() still go away without the
changes here to iomap_set_range_uptodate() or is this change here
necessary?  I'm asking mostly because I'm not seeing how
iomap_set_range_uptodate() can be called while the read is in
progress, as the logic should be already protected by the folio locks.

>                 spin_unlock_irqrestore(&ifs->state_lock, flags);
>         }
>
>         if (uptodate)
>                 folio_mark_uptodate(folio);
>  }
> +EXPORT_SYMBOL_GPL(iomap_set_range_uptodate);
> +
> +void iomap_clear_folio_uptodate(struct folio *folio)
> +{
> +       struct iomap_folio_state *ifs = folio->private;
> +
> +       if (ifs) {
> +               struct inode *inode = folio->mapping->host;
> +               unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
> +               unsigned long flags;
> +
> +               spin_lock_irqsave(&ifs->state_lock, flags);
> +               /*
> +                * If a read is in progress, don't clear the uptodate state.
> +                * The read completion path will handle the folio state, and
> +                * clearing here would race with iomap_finish_folio_read()
> +                * potentially causing ifs/folio uptodate state mismatch.
> +                */
> +               if (ifs->read_bytes_pending) {
> +                       spin_unlock_irqrestore(&ifs->state_lock, flags);
> +                       return;
> +               }
> +               bitmap_clear(ifs->state, 0, nr_blocks);
> +               spin_unlock_irqrestore(&ifs->state_lock, flags);
> +       }
> +       folio_clear_uptodate(folio);
> +}
> +EXPORT_SYMBOL_GPL(iomap_clear_folio_uptodate);
>
>  /*
>   * Find the next dirty block in the folio. end_blk is inclusive.
> @@ -399,8 +436,17 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
>                 spin_unlock_irqrestore(&ifs->state_lock, flags);
>         }
>
> -       if (finished)
> +       if (finished) {
> +               /*
> +                * If uptodate is true but the folio is already marked uptodate,
> +                * folio_end_read's XOR semantics would clear the uptodate bit.
> +                * This should never happen because iomap_set_range_uptodate()
> +                * skips calling folio_mark_uptodate() when read_bytes_pending
> +                * is non-zero, ensuring only the read completion path sets it.
> +                */
> +               WARN_ON_ONCE(uptodate && folio_test_uptodate(folio));

Matthew pointed out in another thread [1] that folio_end_read() already
has built-in warnings against double unlocks and double uptodate-setting:

        VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
        VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio), folio);

but imo the WARN_ON_ONCE() here is nice to have too, as I don't think
most builds enable CONFIG_DEBUG_VM.

[1] https://lore.kernel.org/linux-fsdevel/aPu1ilw6Tq6tKPrf@casper.infradead.org/

Thanks,
Joanne
>                 folio_end_read(folio, uptodate);
> +       }
>  }
>  EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
>
> @@ -481,8 +527,19 @@ static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
>                 if (end_read)
>                         uptodate = ifs_is_fully_uptodate(folio, ifs);
>                 spin_unlock_irq(&ifs->state_lock);
> -               if (end_read)
> +               if (end_read) {
> +                       /*
> +                        * If uptodate is true but the folio is already marked
> +                        * uptodate, folio_end_read's XOR semantics would clear
> +                        * the uptodate bit. This should never happen because
> +                        * iomap_set_range_uptodate() skips calling
> +                        * folio_mark_uptodate() when read_bytes_pending is
> +                        * non-zero, ensuring only the read completion path
> +                        * sets it.
> +                        */
> +                       WARN_ON_ONCE(uptodate && folio_test_uptodate(folio));
>                         folio_end_read(folio, uptodate);
> +               }
>         } else if (!bytes_submitted) {
>                 /*
>                  * If there were no bytes submitted, this means we are
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 520e967cb501..3c2ad88d16b6 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -345,6 +345,8 @@ void iomap_read_folio(const struct iomap_ops *ops,
>  void iomap_readahead(const struct iomap_ops *ops,
>                 struct iomap_read_folio_ctx *ctx);
>  bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
> +void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len);
> +void iomap_clear_folio_uptodate(struct folio *folio);
>  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
>  bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
>  void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
> --
> 2.51.0
>


* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24  1:12     ` Joanne Koong
@ 2025-12-24  1:31       ` Sasha Levin
  2026-02-07  7:16         ` Wei Gao
  2025-12-24  2:10       ` Matthew Wilcox
  1 sibling, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2025-12-24  1:31 UTC (permalink / raw)
  To: Joanne Koong; +Cc: willy, linux-fsdevel, linux-kernel

On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
>On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
>>
>
>Hi Sasha,
>
>Thanks for your patch and for the detailed writeup.

Thanks for looking into this!

>> When iomap uses large folios, per-block uptodate tracking is managed via
>> iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
>> to become inconsistent with the folio's uptodate flag.
>>
>> The race occurs because folio_end_read() uses XOR semantics to atomically
>> set the uptodate bit and clear the locked bit:
>>
>>   Thread A (read completion):          Thread B (concurrent write):
>>   --------------------------------     --------------------------------
>>   iomap_finish_folio_read()
>>     spin_lock(state_lock)
>>     ifs_set_range_uptodate() -> true
>>     spin_unlock(state_lock)
>>                                        iomap_set_range_uptodate()
>>                                          spin_lock(state_lock)
>>                                          ifs_set_range_uptodate() -> true
>>                                          spin_unlock(state_lock)
>>                                          folio_mark_uptodate(folio)
>>     folio_end_read(folio, true)
>>       folio_xor_flags()  // XOR CLEARS uptodate!
>
>The part I'm confused about here is how this can happen between a
>concurrent read and write. My understanding is that the folio is
>locked when the read occurs and locked when the write occurs and both
>locks get dropped only when the read or write finishes. Looking at
>iomap code, I see iomap_set_range_uptodate() getting called in
>__iomap_write_begin() and __iomap_write_end() for the writes, but in
>both those places the folio lock is held while this is called. I'm not
>seeing how the read and write race in the diagram can happen, but
>maybe I'm missing something here?

Hmm, you're right... The folio lock should prevent concurrent read/write
access. Looking at this again, I suspect that FUSE was calling
folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
ifs bits. For example, in fuse_send_write_pages() on write error, it calls
folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
showing all blocks uptodate?

>>
>> Result: folio is NOT uptodate, but ifs says all blocks ARE uptodate.
>
>Ah I see the WARN_ON_ONCE() in ifs_free:
>        WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
>                        folio_test_uptodate(folio));
>
>Just to confirm, are you seeing that the folio is not marked uptodate
>but the ifs blocks are? Or are the ifs blocks not uptodate but the
>folio is?

The former: folio is NOT uptodate but ifs shows all blocks ARE uptodate
(state=0xffff with 16 blocks)

>>
>> Fix by checking read_bytes_pending in iomap_set_range_uptodate() under the
>> lock. If a read is in progress, skip calling folio_mark_uptodate() - the
>> read completion path will handle it via folio_end_read().
>>
>> The warning was triggered during FUSE-based filesystem (e.g., NTFS-3G)
>> unmount when the LTP writev03 test was run:
>>
>>   WARNING: fs/iomap/buffered-io.c at ifs_free
>>   Call trace:
>>    ifs_free
>>    iomap_invalidate_folio
>>    truncate_cleanup_folio
>>    truncate_inode_pages_range
>>    truncate_inode_pages_final
>>    fuse_evict_inode
>>    ...
>>    fuse_kill_sb_blk
>>
>> Fixes: 7a4847e54cc1 ("iomap: use folio_end_read()")
>> Assisted-by: claude-opus-4-5-20251101
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>> ---
>>  fs/fuse/dev.c          |  3 +-
>>  fs/fuse/file.c         |  6 ++--
>>  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++++++++++++++++++---
>>  include/linux/iomap.h  |  2 ++
>>  4 files changed, 68 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 6d59cbc877c6..50e84e913589 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -11,6 +11,7 @@
>>  #include "fuse_dev_i.h"
>>
>>  #include <linux/init.h>
>> +#include <linux/iomap.h>
>>  #include <linux/module.h>
>>  #include <linux/poll.h>
>>  #include <linux/sched/signal.h>
>> @@ -1820,7 +1821,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
>>                 if (!folio_test_uptodate(folio) && !err && offset == 0 &&
>>                     (nr_bytes == folio_size(folio) || file_size == end)) {
>>                         folio_zero_segment(folio, nr_bytes, folio_size(folio));
>> -                       folio_mark_uptodate(folio);
>> +                       iomap_set_range_uptodate(folio, 0, folio_size(folio));
>>                 }
>>                 folio_unlock(folio);
>>                 folio_put(folio);
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 01bc894e9c2b..3abe38416199 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -1216,13 +1216,13 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
>>                 struct folio *folio = ap->folios[i];
>>
>>                 if (err) {
>> -                       folio_clear_uptodate(folio);
>> +                       iomap_clear_folio_uptodate(folio);
>>                 } else {
>>                         if (count >= folio_size(folio) - offset)
>>                                 count -= folio_size(folio) - offset;
>>                         else {
>>                                 if (short_write)
>> -                                       folio_clear_uptodate(folio);
>> +                                       iomap_clear_folio_uptodate(folio);
>>                                 count = 0;
>>                         }
>>                         offset = 0;
>> @@ -1305,7 +1305,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
>>
>>                 /* If we copied full folio, mark it uptodate */
>>                 if (tmp == folio_size(folio))
>> -                       folio_mark_uptodate(folio);
>> +                       iomap_set_range_uptodate(folio, 0, folio_size(folio));
>>
>>                 if (folio_test_uptodate(folio)) {
>>                         folio_unlock(folio);
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index e5c1ca440d93..7ceda24cf6a7 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -74,8 +74,7 @@ static bool ifs_set_range_uptodate(struct folio *folio,
>>         return ifs_is_fully_uptodate(folio, ifs);
>>  }
>>
>> -static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>> -               size_t len)
>> +void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len)
>>  {
>>         struct iomap_folio_state *ifs = folio->private;
>>         unsigned long flags;
>> @@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>>         if (ifs) {
>>                 spin_lock_irqsave(&ifs->state_lock, flags);
>>                 uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
>> +               /*
>> +                * If a read is in progress, we must NOT call folio_mark_uptodate
>> +                * here. The read completion path (iomap_finish_folio_read or
>> +                * iomap_read_end) will call folio_end_read() which uses XOR
>> +                * semantics to set the uptodate bit. If we set it here, the XOR
>> +                * in folio_end_read() will clear it, leaving the folio not
>> +                * uptodate while the ifs says all blocks are uptodate.
>> +                */
>> +               if (uptodate && ifs->read_bytes_pending)
>> +                       uptodate = false;
>
>Does the warning you saw in ifs_free() still go away without the
>changes here to iomap_set_range_uptodate() or is this change here
>necessary?  I'm asking mostly because I'm not seeing how
>iomap_set_range_uptodate() can be called while the read is in
>progress, as the logic should be already protected by the folio locks.

Yes, the warning goes away even without this part. I don't think that this is
necessary - I just kept it while figuring out the race.

>>                 spin_unlock_irqrestore(&ifs->state_lock, flags);
>>         }
>>
>>         if (uptodate)
>>                 folio_mark_uptodate(folio);
>>  }
>> +EXPORT_SYMBOL_GPL(iomap_set_range_uptodate);
>> +
>> +void iomap_clear_folio_uptodate(struct folio *folio)
>> +{
>> +       struct iomap_folio_state *ifs = folio->private;
>> +
>> +       if (ifs) {
>> +               struct inode *inode = folio->mapping->host;
>> +               unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
>> +               unsigned long flags;
>> +
>> +               spin_lock_irqsave(&ifs->state_lock, flags);
>> +               /*
>> +                * If a read is in progress, don't clear the uptodate state.
>> +                * The read completion path will handle the folio state, and
>> +                * clearing here would race with iomap_finish_folio_read()
>> +                * potentially causing ifs/folio uptodate state mismatch.
>> +                */
>> +               if (ifs->read_bytes_pending) {
>> +                       spin_unlock_irqrestore(&ifs->state_lock, flags);
>> +                       return;
>> +               }
>> +               bitmap_clear(ifs->state, 0, nr_blocks);
>> +               spin_unlock_irqrestore(&ifs->state_lock, flags);
>> +       }
>> +       folio_clear_uptodate(folio);
>> +}
>> +EXPORT_SYMBOL_GPL(iomap_clear_folio_uptodate);
>>
>>  /*
>>   * Find the next dirty block in the folio. end_blk is inclusive.
>> @@ -399,8 +436,17 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
>>                 spin_unlock_irqrestore(&ifs->state_lock, flags);
>>         }
>>
>> -       if (finished)
>> +       if (finished) {
>> +               /*
>> +                * If uptodate is true but the folio is already marked uptodate,
>> +                * folio_end_read's XOR semantics would clear the uptodate bit.
>> +                * This should never happen because iomap_set_range_uptodate()
>> +                * skips calling folio_mark_uptodate() when read_bytes_pending
>> +                * is non-zero, ensuring only the read completion path sets it.
>> +                */
>> +               WARN_ON_ONCE(uptodate && folio_test_uptodate(folio));
>
>Matthew pointed out in another thread [1] that folio_end_read() has
>already the warnings against double-unlocks or double-uptodates
>in-built:
>
>        VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>        VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio), folio);
>
>but imo the WARN_ON_ONCE() here is nice to have too, as I don't think
>most builds enable CONFIG_DEBUG_VM.
>
>[1] https://lore.kernel.org/linux-fsdevel/aPu1ilw6Tq6tKPrf@casper.infradead.org/
>
>Thanks,
>Joanne
>>                 folio_end_read(folio, uptodate);
>> +       }
>>  }
>>  EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
>>
>> @@ -481,8 +527,19 @@ static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
>>                 if (end_read)
>>                         uptodate = ifs_is_fully_uptodate(folio, ifs);
>>                 spin_unlock_irq(&ifs->state_lock);
>> -               if (end_read)
>> +               if (end_read) {
>> +                       /*
>> +                        * If uptodate is true but the folio is already marked
>> +                        * uptodate, folio_end_read's XOR semantics would clear
>> +                        * the uptodate bit. This should never happen because
>> +                        * iomap_set_range_uptodate() skips calling
>> +                        * folio_mark_uptodate() when read_bytes_pending is
>> +                        * non-zero, ensuring only the read completion path
>> +                        * sets it.
>> +                        */
>> +                       WARN_ON_ONCE(uptodate && folio_test_uptodate(folio));
>>                         folio_end_read(folio, uptodate);
>> +               }
>>         } else if (!bytes_submitted) {
>>                 /*
>>                  * If there were no bytes submitted, this means we are
>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>> index 520e967cb501..3c2ad88d16b6 100644
>> --- a/include/linux/iomap.h
>> +++ b/include/linux/iomap.h
>> @@ -345,6 +345,8 @@ void iomap_read_folio(const struct iomap_ops *ops,
>>  void iomap_readahead(const struct iomap_ops *ops,
>>                 struct iomap_read_folio_ctx *ctx);
>>  bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
>> +void iomap_set_range_uptodate(struct folio *folio, size_t off, size_t len);
>> +void iomap_clear_folio_uptodate(struct folio *folio);
>>  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
>>  bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
>>  void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
>> --
>> 2.51.0
>>

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24  1:12     ` Joanne Koong
  2025-12-24  1:31       ` Sasha Levin
@ 2025-12-24  2:10       ` Matthew Wilcox
  2025-12-24 15:43         ` Sasha Levin
  1 sibling, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2025-12-24  2:10 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> >
> 
> Hi Sasha,
> 
> Thanks for your patch and for the detailed writeup.

The important line to note is:

Assisted-by: claude-opus-4-5-20251101

So Sasha has produced a very convincingly worded writeup that's
hallucinated.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24  2:10       ` Matthew Wilcox
@ 2025-12-24 15:43         ` Sasha Levin
  2025-12-24 17:27           ` Matthew Wilcox
  0 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2025-12-24 15:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Joanne Koong, linux-fsdevel, linux-kernel

On Wed, Dec 24, 2025 at 02:10:19AM +0000, Matthew Wilcox wrote:
>On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
>> On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
>> >
>>
>> Hi Sasha,
>>
>> Thanks for your patch and for the detailed writeup.
>
>The important line to note is:
>
>Assisted-by: claude-opus-4-5-20251101
>
>So Sasha has produced a very convincingly worded writeup that's
>hallucinated.

And spent a few hours trying to figure it out so I could unblock testing, but
sure - thanks.

Here's the full log:
https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.18-rc7-13806-gb927546677c8/testrun/30618654/suite/log-parser-test/test/exception-warning-fsiomapbuffered-io-at-ifs_free/log
, happy to test any patches you might have.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24 15:43         ` Sasha Levin
@ 2025-12-24 17:27           ` Matthew Wilcox
  2025-12-24 21:21             ` Sasha Levin
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2025-12-24 17:27 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Joanne Koong, linux-fsdevel, linux-kernel

On Wed, Dec 24, 2025 at 10:43:58AM -0500, Sasha Levin wrote:
> On Wed, Dec 24, 2025 at 02:10:19AM +0000, Matthew Wilcox wrote:
> > So Sasha has produced a very convincingly worded writeup that's
> > hallucinated.
> 
> And spent a few hours trying to figure it out so I could unblock testing, but
> sure - thanks.

When you produce a convincingly worded writeup that's utterly wrong,
and have a reputation for using AI, that's the kind of reaction you're
going to get.

> Here's the full log:
> https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.18-rc7-13806-gb927546677c8/testrun/30618654/suite/log-parser-test/test/exception-warning-fsiomapbuffered-io-at-ifs_free/log
> , happy to test any patches you might have.

That's actually much more helpful because it removes your incorrect
assumptions about what's going on.

 WARNING: fs/iomap/buffered-io.c:254 at ifs_free+0x130/0x148, CPU#0: msync04/406

That's this one:

        WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
                        folio_test_uptodate(folio));

which would be fully explained by fuse calling folio_clear_uptodate()
in fuse_send_write_pages().  I have come to believe that allowing
filesystems to call folio_clear_uptodate() is just dangerous.  It
causes assertions to fire all over the place (eg if the page is mapped
into memory, the MM contains assertions that it must be uptodate).

So I think the first step is simply to delete the folio_clear_uptodate()
calls in fuse:

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..b819ede407d5 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1194,7 +1194,6 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	struct fuse_file *ff = file->private_data;
 	struct fuse_mount *fm = ff->fm;
 	unsigned int offset, i;
-	bool short_write;
 	int err;
 
 	for (i = 0; i < ap->num_folios; i++)
@@ -1209,22 +1208,16 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	if (!err && ia->write.out.size > count)
 		err = -EIO;
 
-	short_write = ia->write.out.size < count;
 	offset = ap->descs[0].offset;
 	count = ia->write.out.size;
 	for (i = 0; i < ap->num_folios; i++) {
 		struct folio *folio = ap->folios[i];
 
-		if (err) {
-			folio_clear_uptodate(folio);
-		} else {
+		if (!err) {
 			if (count >= folio_size(folio) - offset)
 				count -= folio_size(folio) - offset;
-			else {
-				if (short_write)
-					folio_clear_uptodate(folio);
+			else
 				count = 0;
-			}
 			offset = 0;
 		}
 		if (ia->write.folio_locked && (i == ap->num_folios - 1))

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24 17:27           ` Matthew Wilcox
@ 2025-12-24 21:21             ` Sasha Levin
  2025-12-30  0:58               ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2025-12-24 21:21 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Joanne Koong, linux-fsdevel, linux-kernel

On Wed, Dec 24, 2025 at 05:27:03PM +0000, Matthew Wilcox wrote:
>On Wed, Dec 24, 2025 at 10:43:58AM -0500, Sasha Levin wrote:
>> On Wed, Dec 24, 2025 at 02:10:19AM +0000, Matthew Wilcox wrote:
>> > So Sasha has produced a very convincingly worded writeup that's
>> > hallucinated.
>>
>> And spent a few hours trying to figure it out so I could unblock testing, but
>> sure - thanks.
>
>When you produce a convincingly worded writeup that's utterly wrong,
>and have a reputation for using AI, that's the kind of reaction you're
>going to get.

A rude and unprofessional one?

>> Here's the full log:
>> https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.18-rc7-13806-gb927546677c8/testrun/30618654/suite/log-parser-test/test/exception-warning-fsiomapbuffered-io-at-ifs_free/log
>> , happy to test any patches you might have.
>
>That's actually much more helpful because it removes your incorrect
>assumptions about what's going on.
>
> WARNING: fs/iomap/buffered-io.c:254 at ifs_free+0x130/0x148, CPU#0: msync04/406
>
>That's this one:
>
>        WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
>                        folio_test_uptodate(folio));
>
>which would be fully explained by fuse calling folio_clear_uptodate()
>in fuse_send_write_pages().  I have come to believe that allowing
>filesystems to call folio_clear_uptodate() is just dangerous.  It
>causes assertions to fire all over the place (eg if the page is mapped
>into memory, the MM contains assertions that it must be uptodate).
>
>So I think the first step is simply to delete the folio_clear_uptodate()
>calls in fuse:

[snip]

Here's the log of a run with the change you've provided applied: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.18-rc7-13807-g26a15474eb13/testrun/30620754/suite/log-parser-test/test/exception-warning-fsiomapbuffered-io-at-ifs_free/log

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24 21:21             ` Sasha Levin
@ 2025-12-30  0:58               ` Joanne Koong
  0 siblings, 0 replies; 22+ messages in thread
From: Joanne Koong @ 2025-12-30  0:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Matthew Wilcox, linux-fsdevel, linux-kernel

On Wed, Dec 24, 2025 at 1:21 PM Sasha Levin <sashal@kernel.org> wrote:
>
> On Wed, Dec 24, 2025 at 05:27:03PM +0000, Matthew Wilcox wrote:
> >
> > WARNING: fs/iomap/buffered-io.c:254 at ifs_free+0x130/0x148, CPU#0: msync04/406
> >
> >That's this one:
> >
> >        WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
> >                        folio_test_uptodate(folio));
> >
> >which would be fully explained by fuse calling folio_clear_uptodate()
> >in fuse_send_write_pages().  I have come to believe that allowing
> >filesystems to call folio_clear_uptodate() is just dangerous.  It
> >causes assertions to fire all over the place (eg if the page is mapped
> >into memory, the MM contains assertions that it must be uptodate).
> >
> >So I think the first step is simply to delete the folio_clear_uptodate()
> >calls in fuse:

Hmm... this fuse_perform_write() call path is for writethrough. In
writethrough, fuse first writes the data to the page cache and then to
the server. Because the writes happen in that order (eg first to the
page cache, then to the server), the clear uptodate is needed when the
server write is a short write or an error, since we can't revert the
page cache data back to its original content. For example: we want to
write 2 KB starting at offset 0, the folio representing that range in
the page cache is uptodate, we retrieve that folio and write 2 KB to
it, then when we try writing it to the server, the server can only
write out 1 KB. Now there's a discrepancy between the page cache
contents and the disk contents, and we're unable to make them
consistent by undoing the page cache write for the chunk between 1 KB
and 2 KB. If we could switch the ordering and write to the server
first and then to the page cache, we could get rid of the clear
uptodates, but switching the ordering requires a bigger change: we'd
need to add support for copying out data from a userspace iter to the
server (currently, only copying out data from folios is supported).
I'm happy to work on this though if you think we should try our best
to fully eradicate folio_clear_uptodate() from fuse.

There's also another folio_clear_uptodate() call in
fuse_try_move_folio() in fuse/dev.c when the server gifts pages to the
kernel through vmsplice. This one I think is needed, or else
folio_end_read() will XOR the uptodate state of an already uptodate
folio (commit 76a51ac ("fuse: clear PG_uptodate when using a stolen
page") says a bit more about this).

> [snip]
>
> Here's the log of a run with the change you've provided applied: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.18-rc7-13807-g26a15474eb13/testrun/30620754/suite/log-parser-test/test/exception-warning-fsiomapbuffered-io-at-ifs_free/log

Hmm, I think this WARN_ON_ONCE is getting triggered from the
folio_mark_uptodate() call in fuse_fill_write_pages().

This is happening because iomap integration hasn't (yet) been added to
the fuse writethrough path, as it's not necessary / urgent (whereas
for buffered writes, it is in order for fuse to use large folios). imo
updating the folio uptodate/dirty state but not the bitmap is
logically fine as the worst outcome from this is that we miss being
able to skip some extra read calls that we could have saved if we did add
the iomap bitmap integration. However, I didn't realize there's a
WARN_ON_ONCE checking the ifs uptodate bitmap state (but curiously no
WARN_ON_ONCE checking the ifs dirty bitmap state).

With that said, I think it makes sense to either a) do the
iomap_set_range_uptodate() / iomap_clear_folio_uptodate() bitmap
updating you proposed as a fix for this WARN_ON_ONCE for now to
unblock things, until iomap integration gets added to the fuse
writethrough path, which I'll now prioritize, or b) remove that
warning. The warning does seem otherwise useful though so it seems
like we should probably just go with a).

Thanks,
Joanne

>
> --
> Thanks,
> Sasha

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2025-12-24  1:31       ` Sasha Levin
@ 2026-02-07  7:16         ` Wei Gao
  2026-02-09 19:08           ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Gao @ 2026-02-07  7:16 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Joanne Koong, willy, linux-fsdevel, linux-kernel, wegao

On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> > > 
> > 
> > Hi Sasha,
> > 
> > Thanks for your patch and for the detailed writeup.
> 
> Thanks for looking into this!
> 
> > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > to become inconsistent with the folio's uptodate flag.
> > > 
> > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > set the uptodate bit and clear the locked bit:
> > > 
> > >   Thread A (read completion):          Thread B (concurrent write):
> > >   --------------------------------     --------------------------------
> > >   iomap_finish_folio_read()
> > >     spin_lock(state_lock)
> > >     ifs_set_range_uptodate() -> true
> > >     spin_unlock(state_lock)
> > >                                        iomap_set_range_uptodate()
> > >                                          spin_lock(state_lock)
> > >                                          ifs_set_range_uptodate() -> true
> > >                                          spin_unlock(state_lock)
> > >                                          folio_mark_uptodate(folio)
> > >     folio_end_read(folio, true)
> > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > 
> > The part I'm confused about here is how this can happen between a
> > concurrent read and write. My understanding is that the folio is
> > locked when the read occurs and locked when the write occurs and both
> > locks get dropped only when the read or write finishes. Looking at
> > iomap code, I see iomap_set_range_uptodate() getting called in
> > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > both those places the folio lock is held while this is called. I'm not
> > seeing how the read and write race in the diagram can happen, but
> > maybe I'm missing something here?
> 
> Hmm, you're right... The folio lock should prevent concurrent read/write
> access. Looking at this again, I suspect that FUSE was calling
> folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> showing all blocks uptodate?

Hi Sasha
On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
tst_test.c:1985: TINFO: === Testing on ntfs ===
tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
Failed to set locale, using default 'C'.
The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
Windows will not be able to boot from this device.
tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
tst_test.c:1302: TINFO: Trying FUSE...
tst_test.c:1953: TBROK: Test killed by SIGBUS!

Root Cause Analysis: When a page fault triggers fuse_read_folio, iomap_read_folio_iter handles the request. For a 64KB page,
after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
and iomap_set_range_uptodate marks the folio as Uptodate globally. folio_xor_flags then toggles the folio's uptodate bit back to 0,
which finally triggers a SIGBUS in filemap_fault.

So your iomap_set_range_uptodate patch fixes the above failure, since it prevents marking the folio uptodate while the read is pending.
Hope my findings are helpful.

> 
> -- 
> Thanks,
> Sasha
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-07  7:16         ` Wei Gao
@ 2026-02-09 19:08           ` Joanne Koong
  2026-02-10  0:12             ` Wei Gao
  0 siblings, 1 reply; 22+ messages in thread
From: Joanne Koong @ 2026-02-09 19:08 UTC (permalink / raw)
  To: Wei Gao; +Cc: Sasha Levin, willy, linux-fsdevel, linux-kernel

On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
>
> On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> > > >
> > >
> > > Hi Sasha,
> > >
> > > Thanks for your patch and for the detailed writeup.
> >
> > Thanks for looking into this!
> >
> > > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > > to become inconsistent with the folio's uptodate flag.
> > > >
> > > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > > set the uptodate bit and clear the locked bit:
> > > >
> > > >   Thread A (read completion):          Thread B (concurrent write):
> > > >   --------------------------------     --------------------------------
> > > >   iomap_finish_folio_read()
> > > >     spin_lock(state_lock)
> > > >     ifs_set_range_uptodate() -> true
> > > >     spin_unlock(state_lock)
> > > >                                        iomap_set_range_uptodate()
> > > >                                          spin_lock(state_lock)
> > > >                                          ifs_set_range_uptodate() -> true
> > > >                                          spin_unlock(state_lock)
> > > >                                          folio_mark_uptodate(folio)
> > > >     folio_end_read(folio, true)
> > > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > >
> > > The part I'm confused about here is how this can happen between a
> > > concurrent read and write. My understanding is that the folio is
> > > locked when the read occurs and locked when the write occurs and both
> > > locks get dropped only when the read or write finishes. Looking at
> > > iomap code, I see iomap_set_range_uptodate() getting called in
> > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > > both those places the folio lock is held while this is called. I'm not
> > > seeing how the read and write race in the diagram can happen, but
> > > maybe I'm missing something here?
> >
> > Hmm, you're right... The folio lock should prevent concurrent read/write
> > access. Looking at this again, I suspect that FUSE was calling
> > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> > showing all blocks uptodate?
>
> Hi Sasha
> On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
> the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
> tst_test.c:1985: TINFO: === Testing on ntfs ===
> tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
> Failed to set locale, using default 'C'.
> The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
> Windows will not be able to boot from this device.
> tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
> tst_test.c:1302: TINFO: Trying FUSE...
> tst_test.c:1953: TBROK: Test killed by SIGBUS!
>
> Root Cause Analysis: When a page fault triggers fuse_read_folio, iomap_read_folio_iter handles the request. For a 64KB page,
> after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
> and iomap_set_range_uptodate marks the folio as Uptodate globally. folio_xor_flags then toggles the folio's uptodate bit back to 0,
> which finally triggers a SIGBUS in filemap_fault.

Hi Wei,

Thanks for your report. afaict, this scenario occurs only if the
server is a fuseblk server with a block size different from the memory
page size and if the file size is less than the size of the folio
being read in.

Could you verify that this snippet from Sasha's patch fixes the issue?:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e5c1ca440d93..7ceda24cf6a7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 	if (ifs) {
 		spin_lock_irqsave(&ifs->state_lock, flags);
 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+		/*
+		 * If a read is in progress, we must NOT call folio_mark_uptodate
+		 * here. The read completion path (iomap_finish_folio_read or
+		 * iomap_read_end) will call folio_end_read() which uses XOR
+		 * semantics to set the uptodate bit. If we set it here, the XOR
+		 * in folio_end_read() will clear it, leaving the folio not
+		 * uptodate while the ifs says all blocks are uptodate.
+		 */
+		if (uptodate && ifs->read_bytes_pending)
+			uptodate = false;
 		spin_unlock_irqrestore(&ifs->state_lock, flags);
 	}

Thanks,
Joanne

>
> > So your iomap_set_range_uptodate patch fixes the above failure, since it prevents marking the folio uptodate while the read is pending.
> Hope my findings are helpful.
>
> >
> > --
> > Thanks,
> > Sasha
> >

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-09 19:08           ` Joanne Koong
@ 2026-02-10  0:12             ` Wei Gao
  2026-02-10  0:20               ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Gao @ 2026-02-10  0:12 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Sasha Levin, willy, linux-fsdevel, linux-kernel

On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
> On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
> >
> > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> > > > >
> > > >
> > > > Hi Sasha,
> > > >
> > > > Thanks for your patch and for the detailed writeup.
> > >
> > > Thanks for looking into this!
> > >
> > > > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > > > to become inconsistent with the folio's uptodate flag.
> > > > >
> > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > > > set the uptodate bit and clear the locked bit:
> > > > >
> > > > >   Thread A (read completion):          Thread B (concurrent write):
> > > > >   --------------------------------     --------------------------------
> > > > >   iomap_finish_folio_read()
> > > > >     spin_lock(state_lock)
> > > > >     ifs_set_range_uptodate() -> true
> > > > >     spin_unlock(state_lock)
> > > > >                                        iomap_set_range_uptodate()
> > > > >                                          spin_lock(state_lock)
> > > > >                                          ifs_set_range_uptodate() -> true
> > > > >                                          spin_unlock(state_lock)
> > > > >                                          folio_mark_uptodate(folio)
> > > > >     folio_end_read(folio, true)
> > > > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > > >
> > > > The part I'm confused about here is how this can happen between a
> > > > concurrent read and write. My understanding is that the folio is
> > > > locked when the read occurs and locked when the write occurs and both
> > > > locks get dropped only when the read or write finishes. Looking at
> > > > iomap code, I see iomap_set_range_uptodate() getting called in
> > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > > > both those places the folio lock is held while this is called. I'm not
> > > > seeing how the read and write race in the diagram can happen, but
> > > > maybe I'm missing something here?
> > >
> > > Hmm, you're right... The folio lock should prevent concurrent read/write
> > > access. Looking at this again, I suspect that FUSE was calling
> > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> > > showing all blocks uptodate?
> >
> > Hi Sasha
> > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
> > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
> > tst_test.c:1985: TINFO: === Testing on ntfs ===
> > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
> > Failed to set locale, using default 'C'.
> > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
> > Windows will not be able to boot from this device.
> > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
> > tst_test.c:1302: TINFO: Trying FUSE...
> > tst_test.c:1953: TBROK: Test killed by SIGBUS!
> >
> > Root Cause Analysis: When a page fault triggers fuse_read_folio, iomap_read_folio_iter handles the request. For a 64KB page,
> > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
> > and iomap_set_range_uptodate marks the folio as Uptodate globally. folio_xor_flags then toggles the folio's uptodate bit back to 0,
> > which finally triggers a SIGBUS in filemap_fault.
> 
> Hi Wei,
> 
> Thanks for your report. afaict, this scenario occurs only if the
> server is a fuseblk server with a block size different from the memory
> page size and if the file size is less than the size of the folio
> being read in.
Thanks for checking this and giving quick feedback :)
> 
> Could you verify that this snippet from Sasha's patch fixes the issue?:
Yes, Sasha's patch fixes the issue.
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e5c1ca440d93..7ceda24cf6a7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> 	if (ifs) {
> 		spin_lock_irqsave(&ifs->state_lock, flags);
> 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> +		/*
> +		 * If a read is in progress, we must NOT call folio_mark_uptodate
> +		 * here. The read completion path (iomap_finish_folio_read or
> +		 * iomap_read_end) will call folio_end_read() which uses XOR
> +		 * semantics to set the uptodate bit. If we set it here, the XOR
> +		 * in folio_end_read() will clear it, leaving the folio not
> +		 * uptodate while the ifs says all blocks are uptodate.
> +		 */
> +		if (uptodate && ifs->read_bytes_pending)
> +			uptodate = false;
> 		spin_unlock_irqrestore(&ifs->state_lock, flags);
> 	}
> 
> Thanks,
> Joanne
> 
> >
> > So your iomap_set_range_uptodate patch can fix the above failed case since it blocks setting the folio's uptodate flag to 1.
> > Hope my findings are helpful.
> >
> > >
> > > --
> > > Thanks,
> > > Sasha
> > >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-10  0:12             ` Wei Gao
@ 2026-02-10  0:20               ` Joanne Koong
  2026-02-10  0:40                 ` Wei Gao
  0 siblings, 1 reply; 22+ messages in thread
From: Joanne Koong @ 2026-02-10  0:20 UTC (permalink / raw)
  To: Wei Gao; +Cc: Sasha Levin, willy, linux-fsdevel, linux-kernel

On Mon, Feb 9, 2026 at 4:12 PM Wei Gao <wegao@suse.com> wrote:
>
> On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
> > On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
> > >
> > > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> > > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> > > > > >
> > > > >
> > > > > Hi Sasha,
> > > > >
> > > > > Thanks for your patch and for the detailed writeup.
> > > >
> > > > Thanks for looking into this!
> > > >
> > > > > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > > > > to become inconsistent with the folio's uptodate flag.
> > > > > >
> > > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > > > > set the uptodate bit and clear the locked bit:
> > > > > >
> > > > > >   Thread A (read completion):          Thread B (concurrent write):
> > > > > >   --------------------------------     --------------------------------
> > > > > >   iomap_finish_folio_read()
> > > > > >     spin_lock(state_lock)
> > > > > >     ifs_set_range_uptodate() -> true
> > > > > >     spin_unlock(state_lock)
> > > > > >                                        iomap_set_range_uptodate()
> > > > > >                                          spin_lock(state_lock)
> > > > > >                                          ifs_set_range_uptodate() -> true
> > > > > >                                          spin_unlock(state_lock)
> > > > > >                                          folio_mark_uptodate(folio)
> > > > > >     folio_end_read(folio, true)
> > > > > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > > > >
> > > > > The part I'm confused about here is how this can happen between a
> > > > > concurrent read and write. My understanding is that the folio is
> > > > > locked when the read occurs and locked when the write occurs and both
> > > > > locks get dropped only when the read or write finishes. Looking at
> > > > > iomap code, I see iomap_set_range_uptodate() getting called in
> > > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > > > > both those places the folio lock is held while this is called. I'm not
> > > > > seeing how the read and write race in the diagram can happen, but
> > > > > maybe I'm missing something here?
> > > >
> > > > Hmm, you're right... The folio lock should prevent concurrent read/write
> > > > access. Looking at this again, I suspect that FUSE was calling
> > > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> > > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> > > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> > > > showing all blocks uptodate?
> > >
> > > Hi Sasha
> > > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
> > > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
> > > tst_test.c:1985: TINFO: === Testing on ntfs ===
> > > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
> > > Failed to set locale, using default 'C'.
> > > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
> > > Windows will not be able to boot from this device.
> > > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
> > > tst_test.c:1302: TINFO: Trying FUSE...
> > > tst_test.c:1953: TBROK: Test killed by SIGBUS!
> > >
> > > Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
> > > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
> > > then iomap_set_range_uptodate marks the folio as Uptodate globally; after folio_xor_flags the folio's Uptodate flag becomes 0 again, finally triggering
> > > a SIGBUS issue in filemap_fault.
> >
> > Hi Wei,
> >
> > Thanks for your report. afaict, this scenario occurs only if the
> > server is a fuseblk server with a block size different from the memory
> > page size and if the file size is less than the size of the folio
> > being read in.
> Thanks for checking this and giving quick feedback :)
> >
> > Could you verify that this snippet from Sasha's patch fixes the issue?:
> Yes, Sasha's patch fixes the issue.

I think just those lines I pasted from Sasha's patch are the relevant
fix. Could you verify that just those lines (without the changes
from the rest of his patch) fixes the issue?

Thanks,
Joanne


> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index e5c1ca440d93..7ceda24cf6a7 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> > 	if (ifs) {
> > 		spin_lock_irqsave(&ifs->state_lock, flags);
> > 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> > +		/*
> > +		 * If a read is in progress, we must NOT call folio_mark_uptodate
> > +		 * here. The read completion path (iomap_finish_folio_read or
> > +		 * iomap_read_end) will call folio_end_read() which uses XOR
> > +		 * semantics to set the uptodate bit. If we set it here, the XOR
> > +		 * in folio_end_read() will clear it, leaving the folio not
> > +		 * uptodate while the ifs says all blocks are uptodate.
> > +		 */
> > +		if (uptodate && ifs->read_bytes_pending)
> > +			uptodate = false;
> > 		spin_unlock_irqrestore(&ifs->state_lock, flags);
> > 	}
> >
> > Thanks,
> > Joanne
> >
> > >
> > > So your iomap_set_range_uptodate patch can fix the above failed case since it blocks setting the folio's uptodate flag to 1.
> > > Hope my findings are helpful.
> > >
> > > >
> > > > --
> > > > Thanks,
> > > > Sasha
> > > >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-10  0:20               ` Joanne Koong
@ 2026-02-10  0:40                 ` Wei Gao
  2026-02-10 22:18                   ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Gao @ 2026-02-10  0:40 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Sasha Levin, willy, linux-fsdevel, linux-kernel

On Mon, Feb 09, 2026 at 04:20:01PM -0800, Joanne Koong wrote:
> On Mon, Feb 9, 2026 at 4:12 PM Wei Gao <wegao@suse.com> wrote:
> >
> > On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
> > > On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
> > > >
> > > > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> > > > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > > > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> > > > > > >
> > > > > >
> > > > > > Hi Sasha,
> > > > > >
> > > > > > Thanks for your patch and for the detailed writeup.
> > > > >
> > > > > Thanks for looking into this!
> > > > >
> > > > > > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > > > > > to become inconsistent with the folio's uptodate flag.
> > > > > > >
> > > > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > > > > > set the uptodate bit and clear the locked bit:
> > > > > > >
> > > > > > >   Thread A (read completion):          Thread B (concurrent write):
> > > > > > >   --------------------------------     --------------------------------
> > > > > > >   iomap_finish_folio_read()
> > > > > > >     spin_lock(state_lock)
> > > > > > >     ifs_set_range_uptodate() -> true
> > > > > > >     spin_unlock(state_lock)
> > > > > > >                                        iomap_set_range_uptodate()
> > > > > > >                                          spin_lock(state_lock)
> > > > > > >                                          ifs_set_range_uptodate() -> true
> > > > > > >                                          spin_unlock(state_lock)
> > > > > > >                                          folio_mark_uptodate(folio)
> > > > > > >     folio_end_read(folio, true)
> > > > > > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > > > > >
> > > > > > The part I'm confused about here is how this can happen between a
> > > > > > concurrent read and write. My understanding is that the folio is
> > > > > > locked when the read occurs and locked when the write occurs and both
> > > > > > locks get dropped only when the read or write finishes. Looking at
> > > > > > iomap code, I see iomap_set_range_uptodate() getting called in
> > > > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > > > > > both those places the folio lock is held while this is called. I'm not
> > > > > > seeing how the read and write race in the diagram can happen, but
> > > > > > maybe I'm missing something here?
> > > > >
> > > > > Hmm, you're right... The folio lock should prevent concurrent read/write
> > > > > access. Looking at this again, I suspect that FUSE was calling
> > > > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> > > > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> > > > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> > > > > showing all blocks uptodate?
> > > >
> > > > Hi Sasha
> > > > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
> > > > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
> > > > tst_test.c:1985: TINFO: === Testing on ntfs ===
> > > > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
> > > > Failed to set locale, using default 'C'.
> > > > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
> > > > Windows will not be able to boot from this device.
> > > > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
> > > > tst_test.c:1302: TINFO: Trying FUSE...
> > > > tst_test.c:1953: TBROK: Test killed by SIGBUS!
> > > >
> > > > Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
> > > > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
> > > > then iomap_set_range_uptodate marks the folio as Uptodate globally; after folio_xor_flags the folio's Uptodate flag becomes 0 again, finally triggering
> > > > a SIGBUS issue in filemap_fault.
> > >
> > > Hi Wei,
> > >
> > > Thanks for your report. afaict, this scenario occurs only if the
> > > server is a fuseblk server with a block size different from the memory
> > > page size and if the file size is less than the size of the folio
> > > being read in.
> > Thanks for checking this and giving quick feedback :)
> > >
> > > Could you verify that this snippet from Sasha's patch fixes the issue?:
> > Yes, Sasha's patch fixes the issue.
> 
> I think just those lines I pasted from Sasha's patch are the relevant
> fix. Could you verify that just those lines (without the changes
> from the rest of his patch) fixes the issue?
Yes, just adding this two-line change in iomap_set_range_uptodate fixes
the issue.
+		if (uptodate && ifs->read_bytes_pending)
+			uptodate = false;
> 
> Thanks,
> Joanne
> 
> 
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index e5c1ca440d93..7ceda24cf6a7 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> > > 	if (ifs) {
> > > 		spin_lock_irqsave(&ifs->state_lock, flags);
> > > 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> > > +		/*
> > > +		 * If a read is in progress, we must NOT call folio_mark_uptodate
> > > +		 * here. The read completion path (iomap_finish_folio_read or
> > > +		 * iomap_read_end) will call folio_end_read() which uses XOR
> > > +		 * semantics to set the uptodate bit. If we set it here, the XOR
> > > +		 * in folio_end_read() will clear it, leaving the folio not
> > > +		 * uptodate while the ifs says all blocks are uptodate.
> > > +		 */
> > > +		if (uptodate && ifs->read_bytes_pending)
> > > +			uptodate = false;
> > > 		spin_unlock_irqrestore(&ifs->state_lock, flags);
> > > 	}
> > >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > So your iomap_set_range_uptodate patch can fix the above failed case since it blocks setting the folio's uptodate flag to 1.
> > > > Hope my findings are helpful.
> > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Sasha
> > > > >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-10  0:40                 ` Wei Gao
@ 2026-02-10 22:18                   ` Joanne Koong
  2026-02-11  0:00                     ` Sasha Levin
  2026-02-11  3:11                     ` Matthew Wilcox
  0 siblings, 2 replies; 22+ messages in thread
From: Joanne Koong @ 2026-02-10 22:18 UTC (permalink / raw)
  To: Wei Gao; +Cc: Sasha Levin, willy, linux-fsdevel, linux-kernel

On Mon, Feb 9, 2026 at 4:40 PM Wei Gao <wegao@suse.com> wrote:
>
> On Mon, Feb 09, 2026 at 04:20:01PM -0800, Joanne Koong wrote:
> > On Mon, Feb 9, 2026 at 4:12 PM Wei Gao <wegao@suse.com> wrote:
> > >
> > > On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
> > > > On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
> > > > >
> > > > > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> > > > > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > > > > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
> > > > > > > >
> > > > > > >
> > > > > > > Hi Sasha,
> > > > > > >
> > > > > > > Thanks for your patch and for the detailed writeup.
> > > > > >
> > > > > > Thanks for looking into this!
> > > > > >
> > > > > > > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > > > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > > > > > > to become inconsistent with the folio's uptodate flag.
> > > > > > > >
> > > > > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > > > > > > set the uptodate bit and clear the locked bit:
> > > > > > > >
> > > > > > > >   Thread A (read completion):          Thread B (concurrent write):
> > > > > > > >   --------------------------------     --------------------------------
> > > > > > > >   iomap_finish_folio_read()
> > > > > > > >     spin_lock(state_lock)
> > > > > > > >     ifs_set_range_uptodate() -> true
> > > > > > > >     spin_unlock(state_lock)
> > > > > > > >                                        iomap_set_range_uptodate()
> > > > > > > >                                          spin_lock(state_lock)
> > > > > > > >                                          ifs_set_range_uptodate() -> true
> > > > > > > >                                          spin_unlock(state_lock)
> > > > > > > >                                          folio_mark_uptodate(folio)
> > > > > > > >     folio_end_read(folio, true)
> > > > > > > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > > > > > >
> > > > > > > The part I'm confused about here is how this can happen between a
> > > > > > > concurrent read and write. My understanding is that the folio is
> > > > > > > locked when the read occurs and locked when the write occurs and both
> > > > > > > locks get dropped only when the read or write finishes. Looking at
> > > > > > > iomap code, I see iomap_set_range_uptodate() getting called in
> > > > > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > > > > > > both those places the folio lock is held while this is called. I'm not
> > > > > > > seeing how the read and write race in the diagram can happen, but
> > > > > > > maybe I'm missing something here?
> > > > > >
> > > > > > Hmm, you're right... The folio lock should prevent concurrent read/write
> > > > > > access. Looking at this again, I suspect that FUSE was calling
> > > > > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> > > > > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> > > > > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> > > > > > showing all blocks uptodate?
> > > > >
> > > > > Hi Sasha
> > > > > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
> > > > > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
> > > > > tst_test.c:1985: TINFO: === Testing on ntfs ===
> > > > > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
> > > > > Failed to set locale, using default 'C'.
> > > > > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > > > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > > > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> > > > > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
> > > > > Windows will not be able to boot from this device.
> > > > > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
> > > > > tst_test.c:1302: TINFO: Trying FUSE...
> > > > > tst_test.c:1953: TBROK: Test killed by SIGBUS!
> > > > >
> > > > > Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
> > > > > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
> > > > > then iomap_set_range_uptodate marks the folio as Uptodate globally; after folio_xor_flags the folio's Uptodate flag becomes 0 again, finally triggering
> > > > > a SIGBUS issue in filemap_fault.
> > > >
> > > > Hi Wei,
> > > >
> > > > Thanks for your report. afaict, this scenario occurs only if the
> > > > server is a fuseblk server with a block size different from the memory
> > > > page size and if the file size is less than the size of the folio
> > > > being read in.
> > > Thanks for checking this and giving quick feedback :)
> > > >
> > > > Could you verify that this snippet from Sasha's patch fixes the issue?:
> > > Yes, Sasha's patch fixes the issue.
> >
> > I think just those lines I pasted from Sasha's patch are the relevant
> > fix. Could you verify that just those lines (without the changes
> > from the rest of his patch) fixes the issue?
> Yes, just adding this two-line change in iomap_set_range_uptodate fixes
> the issue.

Great, thank you for confirming.

Sasha, would you mind submitting this snippet of your patch as the fix
for the EOF zeroing issue? I think it could be restructured to

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1fe19b4ee2f4..412e661871f8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -87,7 +87,16 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,

        if (ifs) {
                spin_lock_irqsave(&ifs->state_lock, flags);
-               uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+               /*
+                * If a read is in progress, we must NOT call folio_mark_uptodate.
+                * The read completion path (iomap_finish_folio_read or
+                * iomap_read_end) will call folio_end_read() which uses XOR
+                * semantics to set the uptodate bit. If we set it here, the XOR
+                * in folio_end_read() will clear it, leaving the folio not
+                * uptodate.
+                */
+               uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
+                       !ifs->read_bytes_pending;
                spin_unlock_irqrestore(&ifs->state_lock, flags);
        }

to be a bit more concise.

If you're busy and don't have the bandwidth, I'm happy to forward the
patch on your behalf with your Signed-off-by / authorship.

Thanks,
Joanne
> +               if (uptodate && ifs->read_bytes_pending)
> +                       uptodate = false;
> >
> > Thanks,
> > Joanne
> >
> >
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index e5c1ca440d93..7ceda24cf6a7 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> > > > 	if (ifs) {
> > > > 		spin_lock_irqsave(&ifs->state_lock, flags);
> > > > 		uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> > > > +		/*
> > > > +		 * If a read is in progress, we must NOT call folio_mark_uptodate
> > > > +		 * here. The read completion path (iomap_finish_folio_read or
> > > > +		 * iomap_read_end) will call folio_end_read() which uses XOR
> > > > +		 * semantics to set the uptodate bit. If we set it here, the XOR
> > > > +		 * in folio_end_read() will clear it, leaving the folio not
> > > > +		 * uptodate while the ifs says all blocks are uptodate.
> > > > +		 */
> > > > +		if (uptodate && ifs->read_bytes_pending)
> > > > +			uptodate = false;
> > > > 		spin_unlock_irqrestore(&ifs->state_lock, flags);
> > > > 	}
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > >
> > > > > So your iomap_set_range_uptodate patch can fix the above failed case since it blocks setting the folio's uptodate flag to 1.
> > > > > Hope my findings are helpful.
> > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Sasha
> > > > > >

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-10 22:18                   ` Joanne Koong
@ 2026-02-11  0:00                     ` Sasha Levin
  2026-02-11  3:11                     ` Matthew Wilcox
  1 sibling, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-02-11  0:00 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Wei Gao, willy, linux-fsdevel, linux-kernel

On Tue, Feb 10, 2026 at 02:18:06PM -0800, Joanne Koong wrote:
>On Mon, Feb 9, 2026 at 4:40 PM Wei Gao <wegao@suse.com> wrote:
>>
>> On Mon, Feb 09, 2026 at 04:20:01PM -0800, Joanne Koong wrote:
>> > On Mon, Feb 9, 2026 at 4:12 PM Wei Gao <wegao@suse.com> wrote:
>> > >
>> > > On Mon, Feb 09, 2026 at 11:08:50AM -0800, Joanne Koong wrote:
>> > > > On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@suse.com> wrote:
>> > > > >
>> > > > > On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
>> > > > > > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
>> > > > > > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@kernel.org> wrote:
>> > > > > > > >
>> > > > > > >
>> > > > > > > Hi Sasha,
>> > > > > > >
>> > > > > > > Thanks for your patch and for the detailed writeup.
>> > > > > >
>> > > > > > Thanks for looking into this!
>> > > > > >
>> > > > > > > > When iomap uses large folios, per-block uptodate tracking is managed via
>> > > > > > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
>> > > > > > > > to become inconsistent with the folio's uptodate flag.
>> > > > > > > >
>> > > > > > > > The race occurs because folio_end_read() uses XOR semantics to atomically
>> > > > > > > > set the uptodate bit and clear the locked bit:
>> > > > > > > >
>> > > > > > > >   Thread A (read completion):          Thread B (concurrent write):
>> > > > > > > >   --------------------------------     --------------------------------
>> > > > > > > >   iomap_finish_folio_read()
>> > > > > > > >     spin_lock(state_lock)
>> > > > > > > >     ifs_set_range_uptodate() -> true
>> > > > > > > >     spin_unlock(state_lock)
>> > > > > > > >                                        iomap_set_range_uptodate()
>> > > > > > > >                                          spin_lock(state_lock)
>> > > > > > > >                                          ifs_set_range_uptodate() -> true
>> > > > > > > >                                          spin_unlock(state_lock)
>> > > > > > > >                                          folio_mark_uptodate(folio)
>> > > > > > > >     folio_end_read(folio, true)
>> > > > > > > >       folio_xor_flags()  // XOR CLEARS uptodate!
>> > > > > > >
>> > > > > > > The part I'm confused about here is how this can happen between a
>> > > > > > > concurrent read and write. My understanding is that the folio is
>> > > > > > > locked when the read occurs and locked when the write occurs and both
>> > > > > > > locks get dropped only when the read or write finishes. Looking at
>> > > > > > > iomap code, I see iomap_set_range_uptodate() getting called in
>> > > > > > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
>> > > > > > > both those places the folio lock is held while this is called. I'm not
>> > > > > > > seeing how the read and write race in the diagram can happen, but
>> > > > > > > maybe I'm missing something here?
>> > > > > >
>> > > > > > Hmm, you're right... The folio lock should prevent concurrent read/write
>> > > > > > access. Looking at this again, I suspect that FUSE was calling
>> > > > > > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
>> > > > > > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
>> > > > > > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
>> > > > > > showing all blocks uptodate?
>> > > > >
>> > > > > Hi Sasha
>> > > > > On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
>> > > > > the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
>> > > > > tst_test.c:1985: TINFO: === Testing on ntfs ===
>> > > > > tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
>> > > > > Failed to set locale, using default 'C'.
>> > > > > The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
>> > > > > The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
>> > > > > The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
>> > > > > To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
>> > > > > Windows will not be able to boot from this device.
>> > > > > tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
>> > > > > tst_test.c:1302: TINFO: Trying FUSE...
>> > > > > tst_test.c:1953: TBROK: Test killed by SIGBUS!
>> > > > >
>> > > > > Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
>> > > > > after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
>> > > > > then iomap_set_range_uptodate marks the folio as Uptodate globally; after folio_xor_flags the folio's Uptodate flag becomes 0 again, finally triggering
>> > > > > a SIGBUS issue in filemap_fault.
>> > > >
>> > > > Hi Wei,
>> > > >
>> > > > Thanks for your report. afaict, this scenario occurs only if the
>> > > > server is a fuseblk server with a block size different from the memory
>> > > > page size and if the file size is less than the size of the folio
>> > > > being read in.
>> > > Thanks for checking this and giving quick feedback :)
>> > > >
>> > > > Could you verify that this snippet from Sasha's patch fixes the issue?:
>> > > Yes, Sasha's patch fixes the issue.
>> >
>> > I think just those lines I pasted from Sasha's patch are the relevant
>> > fix. Could you verify that just those lines (without the changes
>> > from the rest of his patch) fixes the issue?
>> Yes, just adding this two-line change in iomap_set_range_uptodate fixes
>> the issue.
>
>Great, thank you for confirming.
>
>Sasha, would you mind submitting this snippet of your patch as the fix
>for the EOF zeroing issue? I think it could be restructured to
>
>diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>index 1fe19b4ee2f4..412e661871f8 100644
>--- a/fs/iomap/buffered-io.c
>+++ b/fs/iomap/buffered-io.c
>@@ -87,7 +87,16 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>
>        if (ifs) {
>                spin_lock_irqsave(&ifs->state_lock, flags);
>-               uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
>+               /*
>+                * If a read is in progress, we must NOT call folio_mark_uptodate.
>+                * The read completion path (iomap_finish_folio_read or
>+                * iomap_read_end) will call folio_end_read() which uses XOR
>+                * semantics to set the uptodate bit. If we set it here, the XOR
>+                * in folio_end_read() will clear it, leaving the folio not
>+                * uptodate.
>+                */
>+               uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
>+                       !ifs->read_bytes_pending;
>                spin_unlock_irqrestore(&ifs->state_lock, flags);
>        }
>
>to be a bit more concise.
>
>If you're busy and don't have the bandwidth, I'm happy to forward the
>patch on your behalf with your Signed-off-by / authorship.

Thanks for the offer, Joanne!

Since you've done all the triaging work here, please go ahead and submit it -
something like a Suggested-by would be more than enough for me :)

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-10 22:18                   ` Joanne Koong
  2026-02-11  0:00                     ` Sasha Levin
@ 2026-02-11  3:11                     ` Matthew Wilcox
  2026-02-11 19:33                       ` Joanne Koong
  1 sibling, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2026-02-11  3:11 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Wei Gao, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Feb 10, 2026 at 02:18:06PM -0800, Joanne Koong wrote:
>                 spin_lock_irqsave(&ifs->state_lock, flags);
> -               uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> +               /*
> > +                * If a read is in progress, we must NOT call folio_mark_uptodate.
> +                * The read completion path (iomap_finish_folio_read or
> +                * iomap_read_end) will call folio_end_read() which uses XOR
> +                * semantics to set the uptodate bit. If we set it here, the XOR
> +                * in folio_end_read() will clear it, leaving the folio not
> +                * uptodate.
> +                */
> +               uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
> +                       !ifs->read_bytes_pending;
>                 spin_unlock_irqrestore(&ifs->state_lock, flags);

This can't possibly be the right fix.  There's some horrible confusion
here.  It should not be possible to have read bytes pending _and_ the
entire folio be uptodate.  That's an invariant that should always be
maintained.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-11  3:11                     ` Matthew Wilcox
@ 2026-02-11 19:33                       ` Joanne Koong
  2026-02-11 21:03                         ` Matthew Wilcox
  0 siblings, 1 reply; 22+ messages in thread
From: Joanne Koong @ 2026-02-11 19:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Wei Gao, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Feb 10, 2026 at 7:11 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Feb 10, 2026 at 02:18:06PM -0800, Joanne Koong wrote:
> >                 spin_lock_irqsave(&ifs->state_lock, flags);
> > -               uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> > +               /*
> > > +                * If a read is in progress, we must NOT call folio_mark_uptodate.
> > +                * The read completion path (iomap_finish_folio_read or
> > +                * iomap_read_end) will call folio_end_read() which uses XOR
> > +                * semantics to set the uptodate bit. If we set it here, the XOR
> > +                * in folio_end_read() will clear it, leaving the folio not
> > +                * uptodate.
> > +                */
> > +               uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
> > +                       !ifs->read_bytes_pending;
> >                 spin_unlock_irqrestore(&ifs->state_lock, flags);
>
> This can't possibly be the right fix.  There's some horrible confusion
> here.  It should not be possible to have read bytes pending _and_ the
> entire folio be uptodate.  That's an invariant that should always be
> maintained.

ifs->read_bytes_pending gets initialized to the folio size, but if the
file being read in is smaller than the size of the folio, then we
reach this scenario because the file has been read in but
ifs->read_bytes_pending is still a positive value because it
represents the bytes between the end of the file and the end of the
folio. If the folio size is 16k and the file size is 4k:
  a) ifs->read_bytes_pending gets initialized to 16k
  b) ->read_folio_range() is called for the 4k read
  c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
0 to 4k range is marked uptodate
  d) the post-eof blocks are zeroed and marked uptodate in the call to
iomap_set_range_uptodate()
  e) iomap_set_range_uptodate() sees all the ranges are marked
uptodate and it marks the folio uptodate
  f) iomap_read_end() gets called to subtract the 12k from
ifs->read_bytes_pending. it too sees all the ranges are marked
uptodate and marks the folio uptodate

The same scenario could happen for IOMAP_INLINE mappings if part of
the folio is read in through ->read_folio_range() and then the rest is
read in as inline data.

An alternative solution is to not have zeroed-out / inlined mappings
call iomap_read_end(), eg something like this [1], but this adds
additional complexity and doesn't work if there's additional mappings
for the folio after a non-IOMAP_MAPPED mapping.

Is there a better approach that I'm missing?

Thanks,
Joanne

[1] https://github.com/joannekoong/linux/commit/de48d3c29db8ae654300341e3eec12497df54673

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-11 19:33                       ` Joanne Koong
@ 2026-02-11 21:03                         ` Matthew Wilcox
  2026-02-11 23:13                           ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2026-02-11 21:03 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Wei Gao, Sasha Levin, linux-fsdevel, linux-kernel

On Wed, Feb 11, 2026 at 11:33:05AM -0800, Joanne Koong wrote:
> ifs->read_bytes_pending gets initialized to the folio size, but if the
> file being read in is smaller than the size of the folio, then we
> reach this scenario because the file has been read in but
> ifs->read_bytes_pending is still a positive value because it
> represents the bytes between the end of the file and the end of the
> folio. If the folio size is 16k and the file size is 4k:
>   a) ifs->read_bytes_pending gets initialized to 16k
>   b) ->read_folio_range() is called for the 4k read
>   c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
> 0 to 4k range is marked uptodate
>   d) the post-eof blocks are zeroed and marked uptodate in the call to
> iomap_set_range_uptodate()

This is the bug then.  If they're marked uptodate, read_bytes_pending
should be decremented at the same time.  Now, I appreciate that
iomap_set_range_uptodate() is called both from iomap_read_folio_iter()
and __iomap_write_begin(), and it can't decrement read_bytes_pending
in the latter case.  Perhaps a flag or a second length parameter is
the solution?

>   e) iomap_set_range_uptodate() sees all the ranges are marked
> uptodate and it marks the folio uptodate
>   f) iomap_read_end() gets called to subtract the 12k from
> ifs->read_bytes_pending. it too sees all the ranges are marked
> uptodate and marks the folio uptodate
> 
> The same scenario could happen for IOMAP_INLINE mappings if part of
> the folio is read in through ->read_folio_range() and then the rest is
> read in as inline data.

This is basically the same case as post-eof.

> An alternative solution is to not have zeroed-out / inlined mappings
> call iomap_read_end(), eg something like this [1], but this adds
> additional complexity and doesn't work if there's additional mappings
> for the folio after a non-IOMAP_MAPPED mapping.
> 
> Is there a better approach that I'm missing?
> 
> Thanks,
> Joanne
> 
> [1] https://github.com/joannekoong/linux/commit/de48d3c29db8ae654300341e3eec12497df54673

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-11 21:03                         ` Matthew Wilcox
@ 2026-02-11 23:13                           ` Joanne Koong
  2026-02-12 19:31                             ` Matthew Wilcox
  0 siblings, 1 reply; 22+ messages in thread
From: Joanne Koong @ 2026-02-11 23:13 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Wei Gao, Sasha Levin, linux-fsdevel, linux-kernel

On Wed, Feb 11, 2026 at 1:03 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 11, 2026 at 11:33:05AM -0800, Joanne Koong wrote:
> > ifs->read_bytes_pending gets initialized to the folio size, but if the
> > file being read in is smaller than the size of the folio, then we
> > reach this scenario because the file has been read in but
> > ifs->read_bytes_pending is still a positive value because it
> > represents the bytes between the end of the file and the end of the
> > folio. If the folio size is 16k and the file size is 4k:
> >   a) ifs->read_bytes_pending gets initialized to 16k
> >   b) ->read_folio_range() is called for the 4k read
> >   c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
> > 0 to 4k range is marked uptodate
> >   d) the post-eof blocks are zeroed and marked uptodate in the call to
> > iomap_set_range_uptodate()
>
> This is the bug then.  If they're marked uptodate, read_bytes_pending
> should be decremented at the same time.  Now, I appreciate that
> iomap_set_range_uptodate() is called both from iomap_read_folio_iter()
> and __iomap_write_begin(), and it can't decrement read_bytes_pending
> in the latter case.  Perhaps a flag or a second length parameter is
> the solution?

I don't think it's enough to decrement read_bytes_pending by the
zeroed/read-inline length because there are these two edge cases:
a) some blocks in the folio were already uptodate from the very
beginning and skipped for IO but not decremented yet from
ifs->read_bytes_pending, which means in iomap_read_end(),
ifs->read_bytes_pending would be > 0 and the uptodate flag could get
XORed again. This means we need to also decrement read_bytes_pending
by bytes_submitted as well for this case
b) the async ->read_folio_range() callback finishes after the
zeroing's read_bytes_pending decrement and calls folio_end_read(), so
we need to assign ctx->cur_folio to NULL

I think the code would have to look something like [1] (this is
similar to the alternative approach I mentioned in my previous reply
but fixed up to cover some more edge cases).

Thanks,
Joanne

[1] https://github.com/joannekoong/linux/commit/b42f47726433a8130e8c27d1b43b16e27dfd6960

>
> >   e) iomap_set_range_uptodate() sees all the ranges are marked
> > uptodate and it marks the folio uptodate
> >   f) iomap_read_end() gets called to subtract the 12k from
> > ifs->read_bytes_pending. it too sees all the ranges are marked
> > uptodate and marks the folio uptodate
> >
> > The same scenario could happen for IOMAP_INLINE mappings if part of
> > the folio is read in through ->read_folio_range() and then the rest is
> > read in as inline data.
>
> This is basically the same case as post-eof.
>
> > An alternative solution is to not have zeroed-out / inlined mappings
> > call iomap_read_end(), eg something like this [1], but this adds
> > additional complexity and doesn't work if there's additional mappings
> > for the folio after a non-IOMAP_MAPPED mapping.

(I was wrong about it not working for cases where there are additional
mappings after a non-IOMAP_MAPPED mapping, since both
inline-read/zeroing are no-ops if the entire folio is already
uptodate)

> >
> > Is there a better approach that I'm missing?
> >
> > Thanks,
> > Joanne
> >
> > [1] https://github.com/joannekoong/linux/commit/de48d3c29db8ae654300341e3eec12497df54673

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-11 23:13                           ` Joanne Koong
@ 2026-02-12 19:31                             ` Matthew Wilcox
  2026-02-13  0:53                               ` Joanne Koong
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2026-02-12 19:31 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Wei Gao, Sasha Levin, linux-fsdevel, linux-kernel

On Wed, Feb 11, 2026 at 03:13:48PM -0800, Joanne Koong wrote:
> On Wed, Feb 11, 2026 at 1:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 11, 2026 at 11:33:05AM -0800, Joanne Koong wrote:
> > > ifs->read_bytes_pending gets initialized to the folio size, but if the
> > > file being read in is smaller than the size of the folio, then we
> > > reach this scenario because the file has been read in but
> > > ifs->read_bytes_pending is still a positive value because it
> > > represents the bytes between the end of the file and the end of the
> > > folio. If the folio size is 16k and the file size is 4k:
> > >   a) ifs->read_bytes_pending gets initialized to 16k
> > >   b) ->read_folio_range() is called for the 4k read
> > >   c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
> > > 0 to 4k range is marked uptodate
> > >   d) the post-eof blocks are zeroed and marked uptodate in the call to
> > > iomap_set_range_uptodate()
> >
> > This is the bug then.  If they're marked uptodate, read_bytes_pending
> > should be decremented at the same time.  Now, I appreciate that
> > iomap_set_range_uptodate() is called both from iomap_read_folio_iter()
> > and __iomap_write_begin(), and it can't decrement read_bytes_pending
> > in the latter case.  Perhaps a flag or a second length parameter is
> > the solution?
> 
> I don't think it's enough to decrement read_bytes_pending by the
> zeroed/read-inline length because there's these two edge cases:
> a) some blocks in the folio were already uptodate from the very
> beginning and skipped for IO but not decremented yet from
> ifs->read_bytes_pending, which means in iomap_read_end(),
> ifs->read_bytes_pending would be > 0 and the uptodate flag could get
> XORed again. This means we need to also decrement read_bytes_pending
> by bytes_submitted as well for this case

Hm, that's a good one.  It can't happen for readahead, but it can happen
if we start out by writing to some blocks of a folio, then call
read_folio to get the remaining blocks uptodate.  We could avoid it
happening by initialising read_bytes_pending to folio_size() -
bitmap_weight(ifs->uptodate) * block_size.

> b) the async ->read_folio_range() callback finishes after the
> zeroing's read_bytes_pending decrement and calls folio_end_read(), so
> we need to assign ctx->cur_folio to NULL

If we return 'finished' from iomap_finish_folio_read(), we can handle
this?

> I think the code would have to look something like [1] (this is
> similar to the alternative approach I mentioned in my previous reply
> but fixed up to cover some more edge cases).
> 
> Thanks,
> Joanne
> 
> [1] https://github.com/joannekoong/linux/commit/b42f47726433a8130e8c27d1b43b16e27dfd6960

I think we can do everything we need with a suitably modified
iomap_finish_folio_read() rather than the new iomap_finish_read_range().

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
  2026-02-12 19:31                             ` Matthew Wilcox
@ 2026-02-13  0:53                               ` Joanne Koong
  0 siblings, 0 replies; 22+ messages in thread
From: Joanne Koong @ 2026-02-13  0:53 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Wei Gao, Sasha Levin, linux-fsdevel, linux-kernel

On Thu, Feb 12, 2026 at 11:31 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 11, 2026 at 03:13:48PM -0800, Joanne Koong wrote:
> > On Wed, Feb 11, 2026 at 1:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Wed, Feb 11, 2026 at 11:33:05AM -0800, Joanne Koong wrote:
> > > > ifs->read_bytes_pending gets initialized to the folio size, but if the
> > > > file being read in is smaller than the size of the folio, then we
> > > > reach this scenario because the file has been read in but
> > > > ifs->read_bytes_pending is still a positive value because it
> > > > represents the bytes between the end of the file and the end of the
> > > > folio. If the folio size is 16k and the file size is 4k:
> > > >   a) ifs->read_bytes_pending gets initialized to 16k
> > > >   b) ->read_folio_range() is called for the 4k read
> > > >   c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
> > > > 0 to 4k range is marked uptodate
> > > >   d) the post-eof blocks are zeroed and marked uptodate in the call to
> > > > iomap_set_range_uptodate()
> > >
> > > This is the bug then.  If they're marked uptodate, read_bytes_pending
> > > should be decremented at the same time.  Now, I appreciate that
> > > iomap_set_range_uptodate() is called both from iomap_read_folio_iter()
> > > and __iomap_write_begin(), and it can't decrement read_bytes_pending
> > > in the latter case.  Perhaps a flag or a second length parameter is
> > > the solution?
> >
> > I don't think it's enough to decrement read_bytes_pending by the
> > zeroed/read-inline length because there's these two edge cases:
> > a) some blocks in the folio were already uptodate from the very
> > beginning and skipped for IO but not decremented yet from
> > ifs->read_bytes_pending, which means in iomap_read_end(),
> > ifs->read_bytes_pending would be > 0 and the uptodate flag could get
> > XORed again. This means we need to also decrement read_bytes_pending
> > by bytes_submitted as well for this case
>
> Hm, that's a good one.  It can't happen for readahead, but it can happen
> if we start out by writing to some blocks of a folio, then call
> read_folio to get the remaining blocks uptodate.  We could avoid it
> happening by initialising read_bytes_pending to folio_size() -
> bitmap_weight(ifs->uptodate) * block_size.

This is an interesting idea but if we do this then I think this adds
some more edge cases. For example, the range being inlined or zeroed
may have some already uptodate blocks (eg from a prior buffered write)
so we'll need to calculate how many already-existing uptodate bytes
there are in that range to avoid over-decrementing
ifs->read_bytes_pending. I think we would also have to move the
ifs_alloc() and iomap_read_init() calls to the very beginning of
iomap_read_folio_iter() before any iomap_read_inline_data() call
because there could be the case where a folio has an ifs that was
allocated from a prior write, so if we call iomap_finish_folio_read()
after iomap_read_inline_data(), the folio's ifs->read_bytes_pending
now must be initialized before the inline read. Whereas before, we had
some more optimal behavior with being able to entirely skip the ifs
allocation and read initialization if the entire folio gets read
inline.

>
> > b) the async ->read_folio_range() callback finishes after the
> > zeroing's read_bytes_pending decrement and calls folio_end_read(), so
> > we need to assign ctx->cur_folio to NULL
>
> If we return 'finished' from iomap_finish_folio_read(), we can handle
> this?

I think there is still this scenario:
- ->read_folio gets called on an 8k-size folio for a 4k-size file
- iomap_read_init() is called, ifs->read_bytes_pending is now 8k
- make async ->read_folio_range() call to read in 4k
- iomap zeroes out folio from 4k to 8k, then calls
iomap_finish_folio_read() with off = 4k and len = 4k
- in iomap_finish_folio_read(), decrement ifs->read_bytes_pending by
len. ifs->read_bytes_pending is now 4k
- async ->read_folio_range() completes read, calls
iomap_finish_folio_read() with off=0 and len = 4k, which now
decrements ifs->read_bytes_pending by 4k. read_bytes_pending is now 0,
so folio_end_read() gets called. folio should now not be touched by
iomap
- iomap still has valid ctx->cur_folio, and calls iomap_read_end on
ctx->cur_folio

This is the same issue as the one in
https://lore.kernel.org/linux-fsdevel/20260126224107.2182262-2-joannelkoong@gmail.com/

We could always set ctx->cur_folio to NULL after inline/zeroing calls
iomap_finish_folio_read() regardless of whether it actually ended the
read or not, but then this runs into issues for zeroing. The zeroing
can be triggered by non-EOF cases, eg if the first mapping is an
IOMAP_HOLE and then the rest of the folio is mapped. We may still need
to read in the rest of the folio, so we can't just set ctx->cur_folio
to NULL. I guess one workaround is to explicitly check if the zeroing
is for IOMAP_MAPPED types and if so then always set ctx->cur_folio to
NULL, but I think this just gets uglier / more complex to understand
and I'm not sure if there are other edge cases I'm missing that we'd
need to account for. One other idea is to try avoiding the
iomap_read_end() call for non-error cases if we use your
bitmap_weight() idea above, then it wouldn't matter in that scenario
above if ctx->cur_folio points to a folio that already had read ended
on it. But I think that also just makes the code harder to
read/understand.

The original patch seemed cleanest to me; maybe if we renamed uptodate
to mark_uptodate, it'd be more palatable?  eg

@@ -80,18 +80,19 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 {
        struct iomap_folio_state *ifs = folio->private;
        unsigned long flags;
-       bool uptodate = true;
+       bool mark_uptodate = true;

        if (folio_test_uptodate(folio))
                return;

        if (ifs) {
                spin_lock_irqsave(&ifs->state_lock, flags);
-               uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+               mark_uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
+                       !ifs->read_bytes_pending;
                spin_unlock_irqrestore(&ifs->state_lock, flags);
        }

-       if (uptodate)
+       if (mark_uptodate)
                folio_mark_uptodate(folio);
 }


Thanks,
Joanne

>
> > I think the code would have to look something like [1] (this is
> > similar to the alternative approach I mentioned in my previous reply
> > but fixed up to cover some more edge cases).
> >
> > Thanks,
> > Joanne
> >
> > [1] https://github.com/joannekoong/linux/commit/b42f47726433a8130e8c27d1b43b16e27dfd6960
>
> I think we can do everything we need with a suitably modified
> iomap_finish_folio_read() rather than the new iomap_finish_read_range().

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-02-13  0:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250926002609.1302233-13-joannelkoong@gmail.com>
2025-12-23 22:30 ` [RFC PATCH 0/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read Sasha Levin
2025-12-23 22:30   ` [RFC PATCH 1/1] " Sasha Levin
2025-12-24  1:12     ` Joanne Koong
2025-12-24  1:31       ` Sasha Levin
2026-02-07  7:16         ` Wei Gao
2026-02-09 19:08           ` Joanne Koong
2026-02-10  0:12             ` Wei Gao
2026-02-10  0:20               ` Joanne Koong
2026-02-10  0:40                 ` Wei Gao
2026-02-10 22:18                   ` Joanne Koong
2026-02-11  0:00                     ` Sasha Levin
2026-02-11  3:11                     ` Matthew Wilcox
2026-02-11 19:33                       ` Joanne Koong
2026-02-11 21:03                         ` Matthew Wilcox
2026-02-11 23:13                           ` Joanne Koong
2026-02-12 19:31                             ` Matthew Wilcox
2026-02-13  0:53                               ` Joanne Koong
2025-12-24  2:10       ` Matthew Wilcox
2025-12-24 15:43         ` Sasha Levin
2025-12-24 17:27           ` Matthew Wilcox
2025-12-24 21:21             ` Sasha Levin
2025-12-30  0:58               ` Joanne Koong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox