[PATCH v1 0/9] iomap: buffered io changes

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v1 0/9] iomap: buffered io changes
@ 2025-10-09 22:56 Joanne Koong
  2025-10-09 22:56 ` [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
                   ` (8 more replies)
  0 siblings, 9 replies; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

This series is on top of commit 267652e9d474 ("Merge branch 'vfs-6.18.async'
into vfs.all") and patches [1][2][3] in Christian's vfs.all tree.

Patches 8 and 9 (using find_next_bit() for bitmap scanning) were pulled from
another patchset [4]. Patch 8 includes Darrick's nifty
'for_each_clean_block()' macro suggestion and includes expliciting handling
the "if (start_blk == end_blk)" case in ifs_find_dirty_range() to make it less
confusing, per Brian's feedback.

This series was run through fstests on fuse passthrough_hp as a sanity-check.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20250919214250.4144807-1-joannelkoong@gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
[3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/
[4] https://lore.kernel.org/linux-fsdevel/20250829233942.3607248-1-joannelkoong@gmail.com/


Joanne Koong (9):
  iomap: account for unaligned end offsets when truncating read range
  docs: document iomap writeback's iomap_finish_folio_write()
    requirement
  iomap: optimize pending async writeback accounting
  iomap: simplify ->read_folio_range() error handling for reads
  iomap: simplify when reads can be skipped for writes
  iomap: optimize reads for non-block-aligned writes
  iomap: use loff_t for file positions and offsets in writeback code
  iomap: use find_next_bit() for dirty bitmap scanning
  iomap: use find_next_bit() for uptodate bitmap scanning

 .../filesystems/iomap/operations.rst          |  10 +-
 fs/fuse/file.c                                |  18 +-
 fs/iomap/buffered-io.c                        | 281 ++++++++++++------
 fs/iomap/ioend.c                              |   2 -
 fs/xfs/xfs_aops.c                             |   8 +-
 include/linux/iomap.h                         |  15 +-
 6 files changed, 208 insertions(+), 126 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:00   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

The end position to start truncating from may be at an offset into a
block, which under the current logic would result in overtruncation.

Adjust the calculation to account for unaligned end offsets.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/iomap/buffered-io.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f9ae72713f74..1c6575b7e583 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -218,6 +218,22 @@ static void ifs_free(struct folio *folio)
 	kfree(ifs);
 }
 
+/*
+ * Calculate how many bytes to truncate based off the number of blocks to
+ * truncate and the end position to start truncating from.
+ */
+static size_t iomap_bytes_to_truncate(loff_t end_pos, unsigned block_bits,
+		unsigned blocks_truncated)
+{
+	unsigned block_size = 1 << block_bits;
+	unsigned block_offset = end_pos & (block_size - 1);
+
+	if (!block_offset)
+		return blocks_truncated << block_bits;
+
+	return ((blocks_truncated - 1) << block_bits) + block_offset;
+}
+
 /*
  * Calculate the range inside the folio that we actually need to read.
  */
@@ -263,7 +279,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		/* truncate len if we find any trailing uptodate block(s) */
 		while (++i <= last) {
 			if (ifs_block_is_uptodate(ifs, i)) {
-				plen -= (last - i + 1) * block_size;
+				plen -= iomap_bytes_to_truncate(*pos + plen,
+						block_bits, last - i + 1);
 				last = i - 1;
 				break;
 			}
@@ -279,7 +296,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
-			plen -= (last - end) * block_size;
+			plen -= iomap_bytes_to_truncate(*pos + plen, block_bits,
+					last - end);
 	}
 
 	*offp = poff;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
  2025-10-09 22:56 ` [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:01   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 3/9] iomap: optimize pending async writeback accounting Joanne Koong
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Document that iomap_finish_folio_write() must be called after writeback
on the range completes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 Documentation/filesystems/iomap/operations.rst | 3 +++
 include/linux/iomap.h                          | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index cef3c3e76e9e..018cfd13b9fa 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -361,6 +361,9 @@ The fields are as follows:
     delalloc reservations to avoid having delalloc reservations for
     clean pagecache.
     This function must be supplied by the filesystem.
+    If this succeeds, iomap_finish_folio_write() must be called once writeback
+    completes for the range, regardless of whether the writeback succeeded or
+    failed.
 
   - ``writeback_submit``: Submit the previous built writeback context.
     Block based file systems should use the iomap_ioend_writeback_submit
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6d864b446b6e..e6fa812229dc 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -431,6 +431,10 @@ struct iomap_writeback_ops {
 	 * An existing mapping from a previous call to this method can be reused
 	 * by the file system if it is still valid.
 	 *
+	 * If this succeeds, iomap_finish_folio_write() must be called once
+	 * writeback completes for the range, regardless of whether the
+	 * writeback succeeded or failed.
+	 *
 	 * Returns the number of bytes processed or a negative errno.
 	 */
 	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 3/9] iomap: optimize pending async writeback accounting
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
  2025-10-09 22:56 ` [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
  2025-10-09 22:56 ` [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:04   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Pending writebacks must be accounted for to determine when all requests
have completed and writeback on the folio should be ended. Currently
this is done by atomically incrementing ifs->write_bytes_pending for
every range to be written back.

Instead, the number of atomic operations can be minimized by setting
ifs->write_bytes_pending to the folio size, internally tracking how many
bytes are written back asynchronously, and then after sending off all
the requests, decrementing ifs->write_bytes_pending by the number of
bytes not written back asynchronously. Now, for N ranges written back,
only N + 2 atomic operations are required instead of 2N + 2.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/file.c         |  4 ++--
 fs/iomap/buffered-io.c | 45 +++++++++++++++++-------------------------
 fs/iomap/ioend.c       |  2 --
 include/linux/iomap.h  |  2 --
 4 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7c9c00784e33..01d378f8de18 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1883,7 +1883,8 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
 		 * scope of the fi->lock alleviates xarray lock
 		 * contention and noticeably improves performance.
 		 */
-		iomap_finish_folio_write(inode, ap->folios[i], 1);
+		iomap_finish_folio_write(inode, ap->folios[i],
+					 ap->descs[i].length);
 		dec_wb_stat(&bdi->wb, WB_WRITEBACK);
 		wb_writeout_inc(&bdi->wb);
 	}
@@ -2225,7 +2226,6 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		ap = &wpa->ia.ap;
 	}
 
-	iomap_start_folio_write(inode, folio, 1);
 	fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
 				      offset, len);
 	data->nr_bytes += len;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1c6575b7e583..7f914d5ac25d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1552,16 +1552,16 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
 }
 EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
-		size_t len)
+static void iomap_writeback_init(struct inode *inode, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
 	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
-	if (ifs)
-		atomic_add(len, &ifs->write_bytes_pending);
+	if (ifs) {
+		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+		atomic_set(&ifs->write_bytes_pending, folio_size(folio));
+	}
 }
-EXPORT_SYMBOL_GPL(iomap_start_folio_write);
 
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len)
@@ -1578,7 +1578,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
 static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
-		bool *wb_pending)
+		unsigned *wb_bytes_pending)
 {
 	do {
 		ssize_t ret;
@@ -1591,12 +1591,11 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		rlen -= ret;
 		pos += ret;
 
-		/*
-		 * Holes are not be written back by ->writeback_range, so track
+		/* Holes are not written back by ->writeback_range, so track
 		 * if we did handle anything that is not a hole here.
 		 */
 		if (wpc->iomap.type != IOMAP_HOLE)
-			*wb_pending = true;
+			*wb_bytes_pending += ret;
 	} while (rlen);
 
 	return 0;
@@ -1667,7 +1666,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	u64 pos = folio_pos(folio);
 	u64 end_pos = pos + folio_size(folio);
 	u64 end_aligned = 0;
-	bool wb_pending = false;
+	unsigned wb_bytes_pending = 0;
 	int error = 0;
 	u32 rlen;
 
@@ -1687,14 +1686,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 			iomap_set_range_dirty(folio, 0, end_pos - pos);
 		}
 
-		/*
-		 * Keep the I/O completion handler from clearing the writeback
-		 * bit until we have submitted all blocks by adding a bias to
-		 * ifs->write_bytes_pending, which is dropped after submitting
-		 * all blocks.
-		 */
-		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
-		iomap_start_folio_write(inode, folio, 1);
+		iomap_writeback_init(inode, folio);
 	}
 
 	/*
@@ -1709,13 +1701,13 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	end_aligned = round_up(end_pos, i_blocksize(inode));
 	while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
 		error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
-				&wb_pending);
+				&wb_bytes_pending);
 		if (error)
 			break;
 		pos += rlen;
 	}
 
-	if (wb_pending)
+	if (wb_bytes_pending)
 		wpc->nr_folios++;
 
 	/*
@@ -1732,13 +1724,12 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 * already at this point.  In that case we need to clear the writeback
 	 * bit ourselves right after unlocking the page.
 	 */
-	if (ifs) {
-		if (atomic_dec_and_test(&ifs->write_bytes_pending))
-			folio_end_writeback(folio);
-	} else {
-		if (!wb_pending)
-			folio_end_writeback(folio);
-	}
+	if (ifs)
+		iomap_finish_folio_write(inode, folio,
+			folio_size(folio) - wb_bytes_pending);
+	else if (!wb_bytes_pending)
+		folio_end_writeback(folio);
+
 	mapping_set_error(inode->i_mapping, error);
 	return error;
 }
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index b49fa75eab26..86f44922ed3b 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -194,8 +194,6 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
 	if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
 		goto new_ioend;
 
-	iomap_start_folio_write(wpc->inode, folio, map_len);
-
 	/*
 	 * Clamp io_offset and io_size to the incore EOF so that ondisk
 	 * file size updates in the ioend completion are byte-accurate.
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e6fa812229dc..a156a9964938 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -474,8 +474,6 @@ int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
 
 void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
 		int error);
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
-		size_t len);
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
                   ` (2 preceding siblings ...)
  2025-10-09 22:56 ` [PATCH v1 3/9] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:06   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes Joanne Koong
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Instead of requiring that the caller calls iomap_finish_folio_read()
even if the ->read_folio_range() callback returns an error, account for
this internally in iomap instead, which makes the interface simpler and
makes it match writeback's ->read_folio_range() error handling
expectations.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 .../filesystems/iomap/operations.rst          |  7 +++--
 fs/fuse/file.c                                | 10 ++-----
 fs/iomap/buffered-io.c                        | 26 +++++++++----------
 include/linux/iomap.h                         |  5 ++--
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 018cfd13b9fa..dd05d95ebb3e 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -149,10 +149,9 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
 iomap calls these functions:
 
   - ``read_folio_range``: Called to read in the range. This must be provided
-    by the caller. The caller is responsible for calling
-    iomap_finish_folio_read() after reading in the folio range. This should be
-    done even if an error is encountered during the read. This returns 0 on
-    success or a negative error on failure.
+    by the caller. If this succeeds, iomap_finish_folio_read() must be called
+    after the range is read in, regardless of whether the read succeeded or
+    failed.
 
   - ``submit_read``: Submit any pending read requests. This function is
     optional.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01d378f8de18..591789adb00b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -916,13 +916,6 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
 
 	if (ctx->rac) {
 		ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
-		/*
-		 * If fuse_handle_readahead was successful, fuse_readpages_end
-		 * will do the iomap_finish_folio_read, else we need to call it
-		 * here
-		 */
-		if (ret)
-			iomap_finish_folio_read(folio, off, len, ret);
 	} else {
 		/*
 		 *  for non-readahead read requests, do reads synchronously
@@ -930,7 +923,8 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
 		 *  out-of-order reads
 		 */
 		ret = fuse_do_readfolio(file, folio, off, len);
-		iomap_finish_folio_read(folio, off, len, ret);
+		if (!ret)
+			iomap_finish_folio_read(folio, off, len, ret);
 	}
 	return ret;
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7f914d5ac25d..dc05ed647ba5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -377,26 +377,16 @@ static void iomap_read_init(struct folio *folio)
 		size_t len = folio_size(folio);
 
 		spin_lock_irq(&ifs->state_lock);
-		ifs->read_bytes_pending += len;
+		WARN_ON_ONCE(ifs->read_bytes_pending != 0);
+		ifs->read_bytes_pending = len;
 		spin_unlock_irq(&ifs->state_lock);
 	}
 }
 
 static void iomap_read_end(struct folio *folio, size_t bytes_pending)
 {
-	struct iomap_folio_state *ifs;
-
-	/*
-	 * If there are no bytes pending, this means we are responsible for
-	 * unlocking the folio here, since no IO helper has taken ownership of
-	 * it.
-	 */
-	if (!bytes_pending) {
-		folio_unlock(folio);
-		return;
-	}
+	struct iomap_folio_state *ifs = folio->private;
 
-	ifs = folio->private;
 	if (ifs) {
 		bool end_read, uptodate;
 		size_t bytes_accounted = folio_size(folio) - bytes_pending;
@@ -415,6 +405,14 @@ static void iomap_read_end(struct folio *folio, size_t bytes_pending)
 		spin_unlock_irq(&ifs->state_lock);
 		if (end_read)
 			folio_end_read(folio, uptodate);
+	} else if (!bytes_pending) {
+		/*
+		 * If there are no bytes pending, this means we are responsible
+		 * for unlocking the folio here, since no IO helper has taken
+		 * ownership of it. If there are bytes pending, then the IO
+		 * helper will end the read via iomap_finish_folio_read().
+		 */
+		folio_unlock(folio);
 	}
 }
 
@@ -462,10 +460,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 		} else {
 			if (!*bytes_pending)
 				iomap_read_init(folio);
-			*bytes_pending += plen;
 			ret = ctx->ops->read_folio_range(iter, ctx, plen);
 			if (ret)
 				return ret;
+			*bytes_pending += plen;
 		}
 
 		ret = iomap_iter_advance(iter, plen);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a156a9964938..c417bb8718e3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -491,9 +491,8 @@ struct iomap_read_ops {
 	/*
 	 * Read in a folio range.
 	 *
-	 * The caller is responsible for calling iomap_finish_folio_read() after
-	 * reading in the folio range. This should be done even if an error is
-	 * encountered during the read.
+	 * If this succeeds, iomap_finish_folio_read() must be called after the
+	 * range is read in, regardless of whether the read succeeded or failed.
 	 *
 	 * Returns 0 on success or a negative error on failure.
 	 */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
                   ` (3 preceding siblings ...)
  2025-10-09 22:56 ` [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:06   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes Joanne Koong
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Currently, the logic for skipping the read range for a write is

if (!(iter->flags & IOMAP_UNSHARE) &&
    (from <= poff || from >= poff + plen) &&
    (to <= poff || to >= poff + plen))

which breaks down to skipping the read if any of these are true:
a) from <= poff && to <= poff
b) from <= poff && to >= poff + plen
c) from >= poff + plen && to <= poff
d) from >= poff + plen && to >= poff + plen

This can be simplified to
if (!(iter->flags & IOMAP_UNSHARE) &&
    (from <= poff && to >= poff + plen))

from the following reasoning:

a) from <= poff && to <= poff
This reduces to 'to <= poff' since it is guaranteed that 'from <= to'
(since to = from + len). It is not possible for 'from <= to' to be true
here because we only reach here if plen > 0 (thanks to the preceding 'if
(plen == 0)' check that would break us out of the loop). If 'to <=
poff', plen would have to be 0 since poff and plen get adjusted in
lockstep for uptodate blocks. This means we can eliminate this check.

c) from >= poff + plen && to <= poff
This is not possible since 'from <= to' and 'plen > 0'. We can eliminate
this check.

d) from >= poff + plen && to >= poff + plen
This reduces to 'from >= poff + plen' since 'from <= to'.
It is not possible for 'from >= poff + plen' to be true here. We only
reach here if plen > 0 and for writes, poff and plen will always be
block-aligned, which means poff <= from < poff + plen. We can eliminate
this check.

The only valid check is b) from <= poff && to >= poff + plen.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/iomap/buffered-io.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index dc05ed647ba5..0ad8c8a218f3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -714,9 +714,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
 		if (plen == 0)
 			break;
 
+		/*
+		 * If the read range will be entirely overwritten by the write,
+		 * we can skip having to zero/read it in.
+		 */
 		if (!(iter->flags & IOMAP_UNSHARE) &&
-		    (from <= poff || from >= poff + plen) &&
-		    (to <= poff || to >= poff + plen))
+		    (from <= poff && to >= poff + plen))
 			continue;
 
 		if (iomap_block_needs_zeroing(iter, block_start)) {
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
                   ` (4 preceding siblings ...)
  2025-10-09 22:56 ` [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:08   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

If a write is block-aligned (i.e., write offset and length are both
block-aligned), no reads should be necessary.

If the write starts or ends at a non-block-aligned offset, the write
should only need to read in at most two blocks, the starting block and
the ending block. Any intermediary blocks should be skipped since they
will be completely overwritten.

Currently for non-block-aligned writes, the entire range gets read in
including intermediary blocks.

Optimize the logic to read in only the necessary blocks.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/iomap/buffered-io.c | 39 ++++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0ad8c8a218f3..372e14f7ab57 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -686,6 +686,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
+	unsigned int block_bits = iter->inode->i_blkbits;
 	unsigned int nr_blocks = i_blocks_per_folio(iter->inode, folio);
 	size_t from = offset_in_folio(folio, pos), to = from + len;
 	size_t poff, plen;
@@ -714,13 +715,37 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
 		if (plen == 0)
 			break;
 
-		/*
-		 * If the read range will be entirely overwritten by the write,
-		 * we can skip having to zero/read it in.
-		 */
-		if (!(iter->flags & IOMAP_UNSHARE) &&
-		    (from <= poff && to >= poff + plen))
-			continue;
+		if (!(iter->flags & IOMAP_UNSHARE)) {
+			/*
+			 * If the read range will be entirely overwritten by the
+			 * write, we can skip having to zero/read it in.
+			 */
+			if (from <= poff && to >= poff + plen)
+				continue;
+
+			/*
+			 * If the write starts at a non-block-aligned offset
+			 * (from > poff), read only the first block. Any
+			 * intermediate blocks will be skipped in the next
+			 * iteration.
+			 *
+			 * Exception: skip this optimization if the write spans
+			 * only two blocks and ends at a non-block-aligned
+			 * offset.
+			 */
+			if (from > poff) {
+			       if ((plen >> block_bits) > 2 ||
+						to >= poff + plen)
+					plen = block_size;
+			} else if (to < poff + plen) {
+				/*
+				 * Else if the write ends at an offset into the
+				 * last block, read in just the last block.
+				 */
+				poff = poff + plen - block_size;
+				plen = block_size;
+			}
+		}
 
 		if (iomap_block_needs_zeroing(iter, block_start)) {
 			if (WARN_ON_ONCE(iter->flags & IOMAP_UNSHARE))
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
                   ` (5 preceding siblings ...)
  2025-10-09 22:56 ` [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:09   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
  2025-10-09 22:56 ` [PATCH v1 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Use loff_t instead of u64 for file positions and offsets, consistent
with kernel VFS conventions.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/file.c         |  4 ++--
 fs/iomap/buffered-io.c | 17 +++++++++--------
 fs/xfs/xfs_aops.c      |  8 ++++----
 include/linux/iomap.h  |  4 ++--
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 591789adb00b..c44c058feeb0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2172,8 +2172,8 @@ static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
 }
 
 static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
-					  struct folio *folio, u64 pos,
-					  unsigned len, u64 end_pos)
+					  struct folio *folio, loff_t pos,
+					  unsigned len, loff_t end_pos)
 {
 	struct fuse_fill_wb_data *data = wpc->wb_ctx;
 	struct fuse_writepage_args *wpa = data->wpa;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 372e14f7ab57..66c47404787f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -86,7 +86,8 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
 }
 
 static unsigned ifs_find_dirty_range(struct folio *folio,
-		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+		struct iomap_folio_state *ifs, loff_t *range_start,
+		loff_t range_end)
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned start_blk =
@@ -110,8 +111,8 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
 	return nblks << inode->i_blkbits;
 }
 
-static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
-		u64 range_end)
+static unsigned iomap_find_dirty_range(struct folio *folio, loff_t *range_start,
+		loff_t range_end)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
@@ -1603,7 +1604,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
 static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
+		struct folio *folio, loff_t pos, u32 rlen, loff_t end_pos,
 		unsigned *wb_bytes_pending)
 {
 	do {
@@ -1634,7 +1635,7 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
  * i_size, adjust end_pos and zero all data beyond i_size.
  */
 static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
-		u64 *end_pos)
+		loff_t *end_pos)
 {
 	u64 isize = i_size_read(inode);
 
@@ -1689,9 +1690,9 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = wpc->inode;
-	u64 pos = folio_pos(folio);
-	u64 end_pos = pos + folio_size(folio);
-	u64 end_aligned = 0;
+	loff_t pos = folio_pos(folio);
+	loff_t end_pos = pos + folio_size(folio);
+	loff_t end_aligned = 0;
 	unsigned wb_bytes_pending = 0;
 	int error = 0;
 	u32 rlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0c2ed00733f2..593a34832116 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -480,9 +480,9 @@ static ssize_t
 xfs_writeback_range(
 	struct iomap_writepage_ctx *wpc,
 	struct folio		*folio,
-	u64			offset,
+	loff_t			offset,
 	unsigned int		len,
-	u64			end_pos)
+	loff_t			end_pos)
 {
 	ssize_t			ret;
 
@@ -630,9 +630,9 @@ static ssize_t
 xfs_zoned_writeback_range(
 	struct iomap_writepage_ctx *wpc,
 	struct folio		*folio,
-	u64			offset,
+	loff_t			offset,
 	unsigned int		len,
-	u64			end_pos)
+	loff_t			end_pos)
 {
 	ssize_t			ret;
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c417bb8718e3..8045d4c430ae 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -438,8 +438,8 @@ struct iomap_writeback_ops {
 	 * Returns the number of bytes processed or a negative errno.
 	 */
 	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
-			struct folio *folio, u64 pos, unsigned int len,
-			u64 end_pos);
+			struct folio *folio, loff_t pos, unsigned int len,
+			loff_t end_pos);
 
 	/*
 	 * Submit a writeback context previously build up by ->writeback_range.
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
                   ` (6 preceding siblings ...)
  2025-10-09 22:56 ` [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:13   ` Christoph Hellwig
  2025-10-09 22:56 ` [PATCH v1 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Use find_next_bit()/find_next_zero_bit() for iomap dirty bitmap
scanning. This uses __ffs() internally and is more efficient for
finding the next dirty or clean bit than manually iterating through the
bitmap range testing every bit.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
---
 fs/iomap/buffered-io.c | 73 ++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 21 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 66c47404787f..37d2b76ca230 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -76,15 +76,49 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		folio_mark_uptodate(folio);
 }
 
-static inline bool ifs_block_is_dirty(struct folio *folio,
-		struct iomap_folio_state *ifs, int block)
+/**
+* ifs_next_dirty_block - find the next dirty block in the folio
+* @folio: The folio
+* @start_blk: Block number to begin searching at
+* @end_blk: Last block number (inclusive) to search
+*
+* If no dirty block is found, this will return end_blk + 1.
+*/
+static unsigned ifs_next_dirty_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
 {
+	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+	unsigned int blks = i_blocks_per_folio(inode, folio);
+
+	return find_next_bit(ifs->state, blks + end_blk + 1,
+			blks + start_blk) - blks;
+}
+
+/**
+* ifs_next_clean_block - find the next clean block in the folio
+* @folio: The folio
+* @start_blk: Block number to begin searching at
+* @end_blk: Last block number (inclusive) to search
+*
+* If no clean block is found, this will return end_blk + 1.
+*/
+static unsigned ifs_next_clean_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks = i_blocks_per_folio(inode, folio);
 
-	return test_bit(block + blks_per_folio, ifs->state);
+	return find_next_zero_bit(ifs->state, blks + end_blk + 1,
+			blks + start_blk) - blks;
 }
 
+#define for_each_clean_block(folio, blk, last_blk) \
+	for ((blk) = ifs_next_clean_block((folio), (blk), (last_blk)); \
+	     (blk) <= (last_blk); \
+	     (blk) = ifs_next_clean_block((folio), (blk) + 1, (last_blk)))
+
 static unsigned ifs_find_dirty_range(struct folio *folio,
 		struct iomap_folio_state *ifs, loff_t *range_start,
 		loff_t range_end)
@@ -94,18 +128,17 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
 		offset_in_folio(folio, *range_start) >> inode->i_blkbits;
 	unsigned end_blk = min_not_zero(
 		offset_in_folio(folio, range_end) >> inode->i_blkbits,
-		i_blocks_per_folio(inode, folio));
-	unsigned nblks = 1;
+		i_blocks_per_folio(inode, folio)) - 1;
+	unsigned nblks;
 
-	while (!ifs_block_is_dirty(folio, ifs, start_blk))
-		if (++start_blk == end_blk)
-			return 0;
-
-	while (start_blk + nblks < end_blk) {
-		if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
-			break;
-		nblks++;
-	}
+	start_blk = ifs_next_dirty_block(folio, start_blk, end_blk);
+	if (start_blk > end_blk)
+		return 0;
+	else if (start_blk == end_blk)
+		nblks = 1;
+	else
+		nblks = ifs_next_clean_block(folio, start_blk + 1, end_blk)
+				- start_blk;
 
 	*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
 	return nblks << inode->i_blkbits;
@@ -1102,7 +1135,7 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 		struct folio *folio, loff_t start_byte, loff_t end_byte,
 		struct iomap *iomap, iomap_punch_t punch)
 {
-	unsigned int first_blk, last_blk, i;
+	unsigned int first_blk, last_blk;
 	loff_t last_byte;
 	u8 blkbits = inode->i_blkbits;
 	struct iomap_folio_state *ifs;
@@ -1121,11 +1154,9 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 			folio_pos(folio) + folio_size(folio) - 1);
 	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
 	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
-	for (i = first_blk; i <= last_blk; i++) {
-		if (!ifs_block_is_dirty(folio, ifs, i))
-			punch(inode, folio_pos(folio) + (i << blkbits),
-				    1 << blkbits, iomap);
-	}
+	for_each_clean_block(folio, first_blk, last_blk)
+		punch(inode, folio_pos(folio) + (first_blk << blkbits),
+			    1 << blkbits, iomap);
 }
 
 static void iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 9/9] iomap: use find_next_bit() for uptodate bitmap scanning
  2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
                   ` (7 preceding siblings ...)
  2025-10-09 22:56 ` [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
@ 2025-10-09 22:56 ` Joanne Koong
  2025-10-13  3:13   ` Christoph Hellwig
  8 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-09 22:56 UTC (permalink / raw)
  To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team

Use find_next_bit()/find_next_zero_bit() for iomap uptodate bitmap
scanning. This uses __ffs() internally and is more efficient for
finding the next uptodate or non-uptodate bit than manually iterating
through the bitmap range testing every bit.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
---
 fs/iomap/buffered-io.c | 62 ++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 37d2b76ca230..043c10d22db9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -38,10 +38,36 @@ static inline bool ifs_is_fully_uptodate(struct folio *folio,
 	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
 }
 
-static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
-		unsigned int block)
+/**
+* ifs_next_uptodate_block - find the next uptodate block in the folio
+* @folio: The folio
+* @start_blk: Block number to begin searching at
+* @end_blk: Last block number (inclusive) to search
+*
+* If no uptodate block is found, this will return end_blk + 1.
+*/
+static unsigned ifs_next_uptodate_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
 {
-	return test_bit(block, ifs->state);
+	struct iomap_folio_state *ifs = folio->private;
+
+	return find_next_bit(ifs->state, end_blk + 1, start_blk);
+}
+
+/**
+* ifs_next_nonuptodate_block - find the next non-uptodate block in the folio
+* @folio: The folio
+* @start_blk: Block number to begin searching at
+* @end_blk: Last block number (inclusive) to search
+*
+* If no non-uptodate block is found, this will return end_blk + 1.
+*/
+static unsigned ifs_next_nonuptodate_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	return find_next_zero_bit(ifs->state, end_blk + 1, start_blk);
 }
 
 static bool ifs_set_range_uptodate(struct folio *folio,
@@ -291,14 +317,11 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	 * to avoid reading in already uptodate ranges.
 	 */
 	if (ifs) {
-		unsigned int i, blocks_skipped;
+		unsigned int next, blocks_skipped;
 
-		/* move forward for each leading block marked uptodate */
-		for (i = first; i <= last; i++)
-			if (!ifs_block_is_uptodate(ifs, i))
-				break;
+		next = ifs_next_nonuptodate_block(folio, first, last);
+		blocks_skipped = next - first;
 
-		blocks_skipped = i - first;
 		if (blocks_skipped) {
 			unsigned long block_offset = *pos & (block_size - 1);
 			unsigned bytes_skipped =
@@ -308,15 +331,15 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 			poff += bytes_skipped;
 			plen -= bytes_skipped;
 		}
-		first = i;
+		first = next;
 
-		/* truncate len if we find any trailing uptodate block(s) */
-		while (++i <= last) {
-			if (ifs_block_is_uptodate(ifs, i)) {
+		/* find any trailing uptodate block(s) */
+		if (++next <= last) {
+			next = ifs_next_uptodate_block(folio, next, last);
+			if (next <= last) {
 				plen -= iomap_bytes_to_truncate(*pos + plen,
-						block_bits, last - i + 1);
-				last = i - 1;
-				break;
+						block_bits, last - next + 1);
+				last = next - 1;
 			}
 		}
 	}
@@ -609,7 +632,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned first, last, i;
+	unsigned first, last;
 
 	if (!ifs)
 		return false;
@@ -621,10 +644,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	first = from >> inode->i_blkbits;
 	last = (from + count - 1) >> inode->i_blkbits;
 
-	for (i = first; i <= last; i++)
-		if (!ifs_block_is_uptodate(ifs, i))
-			return false;
-	return true;
+	return ifs_next_nonuptodate_block(folio, first, last) > last;
 }
 EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-09 22:56 ` [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
@ 2025-10-13  3:00   ` Christoph Hellwig
  2025-10-15 17:34     ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:00 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
> The end position to start truncating from may be at an offset into a
> block, which under the current logic would result in overtruncation.
> 
> Adjust the calculation to account for unaligned end offsets.

Should this get a fixes tag?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement
  2025-10-09 22:56 ` [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
@ 2025-10-13  3:01   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:01 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 3/9] iomap: optimize pending async writeback accounting
  2025-10-09 22:56 ` [PATCH v1 3/9] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-10-13  3:04   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:04 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

On Thu, Oct 09, 2025 at 03:56:05PM -0700, Joanne Koong wrote:
>  
>  static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
>  		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
> -		bool *wb_pending)
> +		unsigned *wb_bytes_pending)

Maybe shorten the variable name to bytes_pending or wb_bytes?
The current name feels a bit too verbose (not really a major issue,
just thinking out aloud)

> -		/*
> -		 * Holes are not be written back by ->writeback_range, so track
> +		/* Holes are not written back by ->writeback_range, so track

Please stick to the normal kernel comment style that was used here
previously.

Otherwise this looks good to me.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads
  2025-10-09 22:56 ` [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-10-13  3:06   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:06 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

(although we'll really need to test all this very careful for the
configurations finding writeback bugs)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes
  2025-10-09 22:56 ` [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes Joanne Koong
@ 2025-10-13  3:06   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:06 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

>  		if (!(iter->flags & IOMAP_UNSHARE) &&
> +		    (from <= poff && to >= poff + plen))

No need for the inner braces on the new line.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes
  2025-10-09 22:56 ` [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes Joanne Koong
@ 2025-10-13  3:08   ` Christoph Hellwig
  2025-10-14  0:04     ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:08 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

On Thu, Oct 09, 2025 at 03:56:08PM -0700, Joanne Koong wrote:
> If a write is block-aligned (i.e., write offset and length are both
> block-aligned), no reads should be necessary.
> 
> If the write starts or ends at a non-block-aligned offset, the write
> should only need to read in at most two blocks, the starting block and
> the ending block. Any intermediary blocks should be skipped since they
> will be completely overwritten.
> 
> Currently for non-block-aligned writes, the entire range gets read in
> including intermediary blocks.
> 
> Optimize the logic to read in only the necessary blocks.

At least for block based file systems doing a single read in the typical
block size range is cheaper than two smaller reads.  This is especially
true for hard drivers with seeks, but even for SSDs I doubt the two
reads are more efficient.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-10-09 22:56 ` [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-10-13  3:09   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:09 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

On Thu, Oct 09, 2025 at 03:56:09PM -0700, Joanne Koong wrote:
> Use loff_t instead of u64 for file positions and offsets, consistent
> with kernel VFS conventions.

Which for historical reasons makes this a signed value, but I can't
find anything where that would cause problems.  But maybe mention
that in the commit log?

Otheriwse looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning
  2025-10-09 22:56 ` [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
@ 2025-10-13  3:13   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:13 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

On Thu, Oct 09, 2025 at 03:56:10PM -0700, Joanne Koong wrote:
> Use find_next_bit()/find_next_zero_bit() for iomap dirty bitmap
> scanning. This uses __ffs() internally and is more efficient for
> finding the next dirty or clean bit than manually iterating through the
> bitmap range testing every bit.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 73 ++++++++++++++++++++++++++++++------------
>  1 file changed, 52 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 66c47404787f..37d2b76ca230 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -76,15 +76,49 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>  		folio_mark_uptodate(folio);
>  }
>  
> -static inline bool ifs_block_is_dirty(struct folio *folio,
> -		struct iomap_folio_state *ifs, int block)
> +/**
> +* ifs_next_dirty_block - find the next dirty block in the folio
> +* @folio: The folio
> +* @start_blk: Block number to begin searching at
> +* @end_blk: Last block number (inclusive) to search
> +*
> +* If no dirty block is found, this will return end_blk + 1.
> +*/

I'm not a huge fan of using the very verbose kerneldoc comments for
static functions where this isn't even turned into generated
documentation.  Can you shorten the comments into normal non-structured
ones?

> +	if (start_blk > end_blk)
> +		return 0;
> +	else if (start_blk == end_blk)

No need for an else after a return.

> +		nblks = ifs_next_clean_block(folio, start_blk + 1, end_blk)
> +				- start_blk;

Kernel style keeps the operator before the line break.

> +	for_each_clean_block(folio, first_blk, last_blk)

Given that this is the only user of this macro in the entire series,
what is the point of it?  Just open coding the search would be simpler.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 9/9] iomap: use find_next_bit() for uptodate bitmap scanning
  2025-10-09 22:56 ` [PATCH v1 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
@ 2025-10-13  3:13   ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-13  3:13 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team

Same comment about the kerneldoc comments as for the last patch, but
otherwise this looks good.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes
  2025-10-13  3:08   ` Christoph Hellwig
@ 2025-10-14  0:04     ` Joanne Koong
  2025-10-14  4:14       ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-14  0:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: brauner, djwong, bfoster, linux-fsdevel, kernel-team

On Sun, Oct 12, 2025 at 8:08 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Oct 09, 2025 at 03:56:08PM -0700, Joanne Koong wrote:
> > If a write is block-aligned (i.e., write offset and length are both
> > block-aligned), no reads should be necessary.
> >
> > If the write starts or ends at a non-block-aligned offset, the write
> > should only need to read in at most two blocks, the starting block and
> > the ending block. Any intermediary blocks should be skipped since they
> > will be completely overwritten.
> >
> > Currently for non-block-aligned writes, the entire range gets read in
> > including intermediary blocks.
> >
> > Optimize the logic to read in only the necessary blocks.
>
> At least for block based file systems doing a single read in the typical
> block size range is cheaper than two smaller reads.  This is especially
> true for hard drivers with seeks, but even for SSDs I doubt the two
> reads are more efficient.
>

Ahh okay, that makes sense. I'll drop this patch then.

Thanks for reviewing the series. I'll make your suggested edits for v2.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes
  2025-10-14  0:04     ` Joanne Koong
@ 2025-10-14  4:14       ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2025-10-14  4:14 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Mon, Oct 13, 2025 at 05:04:30PM -0700, Joanne Koong wrote:
> Ahh okay, that makes sense. I'll drop this patch then.

If this is a big enough win for fuse / network file systems we can
find a way to opt into this behavior.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-13  3:00   ` Christoph Hellwig
@ 2025-10-15 17:34     ` Joanne Koong
  2025-10-15 17:40       ` Gao Xiang
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-15 17:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: brauner, djwong, bfoster, linux-fsdevel, kernel-team

On Sun, Oct 12, 2025 at 8:00 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
> > The end position to start truncating from may be at an offset into a
> > block, which under the current logic would result in overtruncation.
> >
> > Adjust the calculation to account for unaligned end offsets.
>
> Should this get a fixes tag?

I don't think this needs a fixes tag because when it was originally
written (in commit 9dc55f1389f9 ("iomap: add support for sub-pagesize
buffered I/O without buffer heads") in 2018), it was only used by xfs.
think it was when erofs started using iomap that iomap mappings could
represent non-block-aligned data.


Thanks,
Joanne

>
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 17:34     ` Joanne Koong
@ 2025-10-15 17:40       ` Gao Xiang
  2025-10-15 17:49         ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Gao Xiang @ 2025-10-15 17:40 UTC (permalink / raw)
  To: Joanne Koong, Christoph Hellwig
  Cc: brauner, djwong, bfoster, linux-fsdevel, kernel-team



On 2025/10/16 01:34, Joanne Koong wrote:
> On Sun, Oct 12, 2025 at 8:00 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
>>> The end position to start truncating from may be at an offset into a
>>> block, which under the current logic would result in overtruncation.
>>>
>>> Adjust the calculation to account for unaligned end offsets.
>>
>> Should this get a fixes tag?
> 
> I don't think this needs a fixes tag because when it was originally
> written (in commit 9dc55f1389f9 ("iomap: add support for sub-pagesize
> buffered I/O without buffer heads") in 2018), it was only used by xfs.
> think it was when erofs started using iomap that iomap mappings could
> represent non-block-aligned data.

What non-block-aligned data exactly? erofs is a strictly block-aligned
filesystem except for tail inline data.

Is it inline data? gfs2 also uses the similar inline data logic.

Thanks,
Gao Xiang

> 
> 
> Thanks,
> Joanne
> 
>>
>> Otherwise looks good:
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 17:40       ` Gao Xiang
@ 2025-10-15 17:49         ` Joanne Koong
  2025-10-15 18:06           ` Gao Xiang
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-15 17:49 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Wed, Oct 15, 2025 at 10:40 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2025/10/16 01:34, Joanne Koong wrote:
> > On Sun, Oct 12, 2025 at 8:00 PM Christoph Hellwig <hch@infradead.org> wrote:
> >>
> >> On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
> >>> The end position to start truncating from may be at an offset into a
> >>> block, which under the current logic would result in overtruncation.
> >>>
> >>> Adjust the calculation to account for unaligned end offsets.
> >>
> >> Should this get a fixes tag?
> >
> > I don't think this needs a fixes tag because when it was originally
> > written (in commit 9dc55f1389f9 ("iomap: add support for sub-pagesize
> > buffered I/O without buffer heads") in 2018), it was only used by xfs.
> > think it was when erofs started using iomap that iomap mappings could
> > represent non-block-aligned data.
>
> What non-block-aligned data exactly? erofs is a strictly block-aligned
> filesystem except for tail inline data.
>
> Is it inline data? gfs2 also uses the similar inline data logic.

This is where I encountered it in erofs: [1] for the "WARNING in
iomap_iter_advance" syz repro. (this syzbot report was generated in
response to this patchset version [2]).

When I ran that syz program locally, I remember seeing pos=116 and length=3980.

Thanks,
Joanne

[1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
[2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/

>
> Thanks,
> Gao Xiang
>
> >
> >
> > Thanks,
> > Joanne
> >
> >>
> >> Otherwise looks good:
> >>
> >> Reviewed-by: Christoph Hellwig <hch@lst.de>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 17:49         ` Joanne Koong
@ 2025-10-15 18:06           ` Gao Xiang
  2025-10-15 18:21             ` Joanne Koong
  2025-10-15 18:28             ` Gao Xiang
  0 siblings, 2 replies; 46+ messages in thread
From: Gao Xiang @ 2025-10-15 18:06 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/16 01:49, Joanne Koong wrote:
> On Wed, Oct 15, 2025 at 10:40 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> On 2025/10/16 01:34, Joanne Koong wrote:
>>> On Sun, Oct 12, 2025 at 8:00 PM Christoph Hellwig <hch@infradead.org> wrote:
>>>>
>>>> On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
>>>>> The end position to start truncating from may be at an offset into a
>>>>> block, which under the current logic would result in overtruncation.
>>>>>
>>>>> Adjust the calculation to account for unaligned end offsets.
>>>>
>>>> Should this get a fixes tag?
>>>
>>> I don't think this needs a fixes tag because when it was originally
>>> written (in commit 9dc55f1389f9 ("iomap: add support for sub-pagesize
>>> buffered I/O without buffer heads") in 2018), it was only used by xfs.
>>> think it was when erofs started using iomap that iomap mappings could
>>> represent non-block-aligned data.
>>
>> What non-block-aligned data exactly? erofs is a strictly block-aligned
>> filesystem except for tail inline data.
>>
>> Is it inline data? gfs2 also uses the similar inline data logic.
> 
> This is where I encountered it in erofs: [1] for the "WARNING in
> iomap_iter_advance" syz repro. (this syzbot report was generated in
> response to this patchset version [2]).
> 
> When I ran that syz program locally, I remember seeing pos=116 and length=3980.

I just ran the C repro locally with the upstream codebase (but I
didn't use the related Kconfig), and it doesn't show anything.

I feel strange why pos is unaligned, does this warning show
without your patchset on your side?

Thanks,
Gao Xiang

> 
> Thanks,
> Joanne
> 
> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> 
>>
>> Thanks,
>> Gao Xiang
>>
>>>
>>>
>>> Thanks,
>>> Joanne
>>>
>>>>
>>>> Otherwise looks good:
>>>>
>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 18:06           ` Gao Xiang
@ 2025-10-15 18:21             ` Joanne Koong
  2025-10-15 18:39               ` Gao Xiang
  2025-10-15 18:28             ` Gao Xiang
  1 sibling, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-15 18:21 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
>
>
> On 2025/10/16 01:49, Joanne Koong wrote:
> > On Wed, Oct 15, 2025 at 10:40 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>
> >> On 2025/10/16 01:34, Joanne Koong wrote:
> >>> On Sun, Oct 12, 2025 at 8:00 PM Christoph Hellwig <hch@infradead.org> wrote:
> >>>>
> >>>> On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
> >>>>> The end position to start truncating from may be at an offset into a
> >>>>> block, which under the current logic would result in overtruncation.
> >>>>>
> >>>>> Adjust the calculation to account for unaligned end offsets.
> >>>>
> >>>> Should this get a fixes tag?
> >>>
> >>> I don't think this needs a fixes tag because when it was originally
> >>> written (in commit 9dc55f1389f9 ("iomap: add support for sub-pagesize
> >>> buffered I/O without buffer heads") in 2018), it was only used by xfs.
> >>> think it was when erofs started using iomap that iomap mappings could
> >>> represent non-block-aligned data.
> >>
> >> What non-block-aligned data exactly? erofs is a strictly block-aligned
> >> filesystem except for tail inline data.
> >>
> >> Is it inline data? gfs2 also uses the similar inline data logic.
> >
> > This is where I encountered it in erofs: [1] for the "WARNING in
> > iomap_iter_advance" syz repro. (this syzbot report was generated in
> > response to this patchset version [2]).
> >
> > When I ran that syz program locally, I remember seeing pos=116 and length=3980.
>
> I just ran the C repro locally with the upstream codebase (but I
> didn't use the related Kconfig), and it doesn't show anything.

Which upstream commit are you running it on? It needs to be run on top
of this patchset [1] but without this fix [2]. These changes are in
Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
that branch has been published publicly yet so maybe just patching it
in locally will work best.

When I reproed it last month, I used the syz executor (not the C
repro, though that should probably work too?) directly with the
kconfig they had.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
[2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
[3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6

>
> I feel strange why pos is unaligned, does this warning show
> without your patchset on your side?
>
> Thanks,
> Gao Xiang
>
> >
> > Thanks,
> > Joanne
> >
> > [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> >
> >>
> >> Thanks,
> >> Gao Xiang
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Joanne
> >>>
> >>>>
> >>>> Otherwise looks good:
> >>>>
> >>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 18:06           ` Gao Xiang
  2025-10-15 18:21             ` Joanne Koong
@ 2025-10-15 18:28             ` Gao Xiang
  1 sibling, 0 replies; 46+ messages in thread
From: Gao Xiang @ 2025-10-15 18:28 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/16 02:06, Gao Xiang wrote:
> 
> 
> On 2025/10/16 01:49, Joanne Koong wrote:
>> On Wed, Oct 15, 2025 at 10:40 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>
>>> On 2025/10/16 01:34, Joanne Koong wrote:
>>>> On Sun, Oct 12, 2025 at 8:00 PM Christoph Hellwig <hch@infradead.org> wrote:
>>>>>
>>>>> On Thu, Oct 09, 2025 at 03:56:03PM -0700, Joanne Koong wrote:
>>>>>> The end position to start truncating from may be at an offset into a
>>>>>> block, which under the current logic would result in overtruncation.
>>>>>>
>>>>>> Adjust the calculation to account for unaligned end offsets.
>>>>>
>>>>> Should this get a fixes tag?
>>>>
>>>> I don't think this needs a fixes tag because when it was originally
>>>> written (in commit 9dc55f1389f9 ("iomap: add support for sub-pagesize
>>>> buffered I/O without buffer heads") in 2018), it was only used by xfs.
>>>> think it was when erofs started using iomap that iomap mappings could
>>>> represent non-block-aligned data.
>>>
>>> What non-block-aligned data exactly? erofs is a strictly block-aligned
>>> filesystem except for tail inline data.
>>>
>>> Is it inline data? gfs2 also uses the similar inline data logic.
>>
>> This is where I encountered it in erofs: [1] for the "WARNING in
>> iomap_iter_advance" syz repro. (this syzbot report was generated in
>> response to this patchset version [2]).
>>
>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> 
> I just ran the C repro locally with the upstream codebase (but I
> didn't use the related Kconfig), and it doesn't show anything.
> 
> I feel strange why pos is unaligned, does this warning show
> without your patchset on your side?

I've confirmed that it's a valid IOMAP_INLINE extent returned by
.iomap_begin():

iomap {offset 0 length 116 type 4(IOMAP_INLINE) addr 1184}

Because INLINE extent can return unaligned lengths, and the
new warning is unexpected.

If it's a regression out of iomap itself or out of the new
patchset, please add "fixes" tag.

Thanks,
Gao Xiang

> 
> Thanks,
> Gao Xiang
> 
>>
>> Thanks,
>> Joanne
>>
>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
>>
>>>
>>> Thanks,
>>> Gao Xiang
>>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 18:21             ` Joanne Koong
@ 2025-10-15 18:39               ` Gao Xiang
  2025-10-16  0:36                 ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Gao Xiang @ 2025-10-15 18:39 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

Hi Joanne,

On 2025/10/16 02:21, Joanne Koong wrote:
> On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

...

>>>
>>> This is where I encountered it in erofs: [1] for the "WARNING in
>>> iomap_iter_advance" syz repro. (this syzbot report was generated in
>>> response to this patchset version [2]).
>>>
>>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
>>
>> I just ran the C repro locally with the upstream codebase (but I
>> didn't use the related Kconfig), and it doesn't show anything.
> 
> Which upstream commit are you running it on? It needs to be run on top
> of this patchset [1] but without this fix [2]. These changes are in
> Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> that branch has been published publicly yet so maybe just patching it
> in locally will work best.
> 
> When I reproed it last month, I used the syz executor (not the C
> repro, though that should probably work too?) directly with the
> kconfig they had.

I believe it's a regression somewhere since it's a valid
IOMAP_INLINE extent (since it's inlined, the length is not
block-aligned of course), you could add a print just before
erofs_iomap_begin() returns.

Also see my reply:
https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com

I'm not sure if it caused user-visible regressions since
erofs images work properly with upstream code (unlike a
previous regression fixed by commit b26816b4e320 ("iomap:
fix inline data on buffered read")).

But a fixes tag is needed since it causes an unexpected
WARNING at least.

Thanks,
Gao Xiang

> 
> Thanks,
> Joanne
> 
> [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> 
>>
>> I feel strange why pos is unaligned, does this warning show
>> without your patchset on your side?
>>
>> Thanks,
>> Gao Xiang
>>
>>>
>>> Thanks,
>>> Joanne
>>>
>>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
>>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
>>>
>>>>
>>>> Thanks,
>>>> Gao Xiang
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Joanne
>>>>>
>>>>>>
>>>>>> Otherwise looks good:
>>>>>>
>>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-15 18:39               ` Gao Xiang
@ 2025-10-16  0:36                 ` Joanne Koong
  2025-10-16  1:27                   ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-16  0:36 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> Hi Joanne,
>
> On 2025/10/16 02:21, Joanne Koong wrote:
> > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> ...
>
> >>>
> >>> This is where I encountered it in erofs: [1] for the "WARNING in
> >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> >>> response to this patchset version [2]).
> >>>
> >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> >>
> >> I just ran the C repro locally with the upstream codebase (but I
> >> didn't use the related Kconfig), and it doesn't show anything.
> >
> > Which upstream commit are you running it on? It needs to be run on top
> > of this patchset [1] but without this fix [2]. These changes are in
> > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > that branch has been published publicly yet so maybe just patching it
> > in locally will work best.
> >
> > When I reproed it last month, I used the syz executor (not the C
> > repro, though that should probably work too?) directly with the
> > kconfig they had.
>
> I believe it's a regression somewhere since it's a valid
> IOMAP_INLINE extent (since it's inlined, the length is not
> block-aligned of course), you could add a print just before
> erofs_iomap_begin() returns.

Ok, so if erofs is strictly block-aligned except for tail inline data
(eg the IOMAP_INLINE extent), then I agree, there is a regression
somewhere as we shouldn't be running into the situation where erofs is
calling iomap_adjust_read_range() with a non-block-aligned position
and length. I'll track the offending commit down tomorrow.


Thanks,
Joanne

>
> Also see my reply:
> https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
>
> I'm not sure if it caused user-visible regressions since
> erofs images work properly with upstream code (unlike a
> previous regression fixed by commit b26816b4e320 ("iomap:
> fix inline data on buffered read")).
>
> But a fixes tag is needed since it causes an unexpected
> WARNING at least.
>
> Thanks,
> Gao Xiang
>
> >
> > Thanks,
> > Joanne
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> > [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> >
> >>
> >> I feel strange why pos is unaligned, does this warning show
> >> without your patchset on your side?
> >>
> >> Thanks,
> >> Gao Xiang
> >>
> >>>
> >>> Thanks,
> >>> Joanne
> >>>
> >>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> >>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> >>>
> >>>>
> >>>> Thanks,
> >>>> Gao Xiang
> >>>>
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Joanne
> >>>>>
> >>>>>>
> >>>>>> Otherwise looks good:
> >>>>>>
> >>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >>>>
> >>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16  0:36                 ` Joanne Koong
@ 2025-10-16  1:27                   ` Joanne Koong
  2025-10-16  1:58                     ` Gao Xiang
  2025-10-16 11:29                     ` Brian Foster
  0 siblings, 2 replies; 46+ messages in thread
From: Joanne Koong @ 2025-10-16  1:27 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >
> > Hi Joanne,
> >
> > On 2025/10/16 02:21, Joanne Koong wrote:
> > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >
> > ...
> >
> > >>>
> > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > >>> response to this patchset version [2]).
> > >>>
> > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > >>
> > >> I just ran the C repro locally with the upstream codebase (but I
> > >> didn't use the related Kconfig), and it doesn't show anything.
> > >
> > > Which upstream commit are you running it on? It needs to be run on top
> > > of this patchset [1] but without this fix [2]. These changes are in
> > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > that branch has been published publicly yet so maybe just patching it
> > > in locally will work best.
> > >
> > > When I reproed it last month, I used the syz executor (not the C
> > > repro, though that should probably work too?) directly with the
> > > kconfig they had.
> >
> > I believe it's a regression somewhere since it's a valid
> > IOMAP_INLINE extent (since it's inlined, the length is not
> > block-aligned of course), you could add a print just before
> > erofs_iomap_begin() returns.
>
> Ok, so if erofs is strictly block-aligned except for tail inline data
> (eg the IOMAP_INLINE extent), then I agree, there is a regression
> somewhere as we shouldn't be running into the situation where erofs is
> calling iomap_adjust_read_range() with a non-block-aligned position
> and length. I'll track the offending commit down tomorrow.
>

Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
iomap_iter advances") that changed this behavior for erofs such that
the read iteration continues even after encountering an IOMAP_INLINE
extent, whereas before, the iteration stopped after reading in the
iomap inline extent. This leads erofs to end up in the situation where
it calls into iomap_adjust_read_range() with a non-block-aligned
position/length (on that subsequent iteration).

In particular, this change in commit bc264fea0f6f to iomap_iter():

-       if (ret > 0 && !iter->processed && !stale)
+       if (ret > 0 && !advanced && !stale)

For iomap inline extents, iter->processed is 0, which stopped the
iteration before. But now, advanced (which is iter->pos -
iter->iter_start_pos) is used which will continue the iteration (since
the iter is advanced after reading in the iomap inline extents).

Erofs is able to handle subsequent iterations after iomap_inline
extents because erofs_iomap_begin() checks the block map and returns
IOMAP_HOLE if it's not mapped
        if (!(map.m_flags & EROFS_MAP_MAPPED)) {
                iomap->type = IOMAP_HOLE;
                return 0;
        }

but I think what probably would be better is a separate patch that
reverts this back to the original behavior of stopping the iteration
after IOMAP_INLINE extents are read in.

So I don't think this patch should have a fixes: tag for that commit.
It seems to me like no one was hitting this path before with a
non-block-aligned position and offset. Though now there will be a use
case for it, which is fuse.

Thanks,
Joanne

>
> Thanks,
> Joanne
>
> >
> > Also see my reply:
> > https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
> >
> > I'm not sure if it caused user-visible regressions since
> > erofs images work properly with upstream code (unlike a
> > previous regression fixed by commit b26816b4e320 ("iomap:
> > fix inline data on buffered read")).
> >
> > But a fixes tag is needed since it causes an unexpected
> > WARNING at least.
> >
> > Thanks,
> > Gao Xiang
> >
> > >
> > > Thanks,
> > > Joanne
> > >
> > > [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> > > [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> > > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> > >
> > >>
> > >> I feel strange why pos is unaligned, does this warning show
> > >> without your patchset on your side?
> > >>
> > >> Thanks,
> > >> Gao Xiang
> > >>
> > >>>
> > >>> Thanks,
> > >>> Joanne
> > >>>
> > >>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > >>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> > >>>
> > >>>>
> > >>>> Thanks,
> > >>>> Gao Xiang
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Joanne
> > >>>>>
> > >>>>>>
> > >>>>>> Otherwise looks good:
> > >>>>>>
> > >>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> > >>>>
> > >>
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16  1:27                   ` Joanne Koong
@ 2025-10-16  1:58                     ` Gao Xiang
  2025-10-16 22:03                       ` Joanne Koong
  2025-10-16 11:29                     ` Brian Foster
  1 sibling, 1 reply; 46+ messages in thread
From: Gao Xiang @ 2025-10-16  1:58 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/16 09:27, Joanne Koong wrote:
> On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>
>>> Hi Joanne,
>>>
>>> On 2025/10/16 02:21, Joanne Koong wrote:
>>>> On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>
>>> ...
>>>
>>>>>>
>>>>>> This is where I encountered it in erofs: [1] for the "WARNING in
>>>>>> iomap_iter_advance" syz repro. (this syzbot report was generated in
>>>>>> response to this patchset version [2]).
>>>>>>
>>>>>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
>>>>>
>>>>> I just ran the C repro locally with the upstream codebase (but I
>>>>> didn't use the related Kconfig), and it doesn't show anything.
>>>>
>>>> Which upstream commit are you running it on? It needs to be run on top
>>>> of this patchset [1] but without this fix [2]. These changes are in
>>>> Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
>>>> that branch has been published publicly yet so maybe just patching it
>>>> in locally will work best.
>>>>
>>>> When I reproed it last month, I used the syz executor (not the C
>>>> repro, though that should probably work too?) directly with the
>>>> kconfig they had.
>>>
>>> I believe it's a regression somewhere since it's a valid
>>> IOMAP_INLINE extent (since it's inlined, the length is not
>>> block-aligned of course), you could add a print just before
>>> erofs_iomap_begin() returns.
>>
>> Ok, so if erofs is strictly block-aligned except for tail inline data
>> (eg the IOMAP_INLINE extent), then I agree, there is a regression
>> somewhere as we shouldn't be running into the situation where erofs is
>> calling iomap_adjust_read_range() with a non-block-aligned position
>> and length. I'll track the offending commit down tomorrow.
>>
> 
> Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> iomap_iter advances") that changed this behavior for erofs such that
> the read iteration continues even after encountering an IOMAP_INLINE
> extent, whereas before, the iteration stopped after reading in the
> iomap inline extent. This leads erofs to end up in the situation where
> it calls into iomap_adjust_read_range() with a non-block-aligned
> position/length (on that subsequent iteration).
> 
> In particular, this change in commit bc264fea0f6f to iomap_iter():
> 
> -       if (ret > 0 && !iter->processed && !stale)
> +       if (ret > 0 && !advanced && !stale)
> 
> For iomap inline extents, iter->processed is 0, which stopped the
> iteration before. But now, advanced (which is iter->pos -
> iter->iter_start_pos) is used which will continue the iteration (since
> the iter is advanced after reading in the iomap inline extents).
> 
> Erofs is able to handle subsequent iterations after iomap_inline
> extents because erofs_iomap_begin() checks the block map and returns
> IOMAP_HOLE if it's not mapped
>          if (!(map.m_flags & EROFS_MAP_MAPPED)) {
>                  iomap->type = IOMAP_HOLE;
>                  return 0;
>          }
> 
> but I think what probably would be better is a separate patch that
> reverts this back to the original behavior of stopping the iteration
> after IOMAP_INLINE extents are read in.

Thanks for your analysis, currently I don't have enough time to look
into that (working on other stuffs), but according to your analysis,
it seems EROFS don't have a user-visible regression in the Linus'
upstream.

> 
> So I don't think this patch should have a fixes: tag for that commit.
> It seems to me like no one was hitting this path before with a
> non-block-aligned position and offset. Though now there will be a use
> case for it, which is fuse.

To make it simplified, the issue is that:
  - Previously, before your fuse iomap read patchset (assuming Christian
    is already applied), there was no WARNING out of there;

  - A new WARNING should be considered as a kernel regression.

I'm not sure if Christian's iomap branch allows to be rebased, so in
principle, either:

  - You insert this patch before the WARNING patch in the fuse patchset
    goes in, so no Fixes tag is needed because the WARNING doesn't
    exist anymore,

Or

  - Add the Fixes tag to point to the commit which causes the WARNING
one.

Why it's important? because each upstream commit can be (back)ported
to stable or downstream kernels, a WARNING should be considered as
a new regression and needs a fix commit to be ported together, even
those patches will be merged into upstream together.

Thanks,
Gao Xiang

> 
> Thanks,
> Joanne
> 
>>
>> Thanks,
>> Joanne
>>
>>>
>>> Also see my reply:
>>> https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
>>>
>>> I'm not sure if it caused user-visible regressions since
>>> erofs images work properly with upstream code (unlike a
>>> previous regression fixed by commit b26816b4e320 ("iomap:
>>> fix inline data on buffered read")).
>>>
>>> But a fixes tag is needed since it causes an unexpected
>>> WARNING at least.
>>>
>>> Thanks,
>>> Gao Xiang
>>>
>>>>
>>>> Thanks,
>>>> Joanne
>>>>
>>>> [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
>>>> [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
>>>> [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
>>>>
>>>>>
>>>>> I feel strange why pos is unaligned, does this warning show
>>>>> without your patchset on your side?
>>>>>
>>>>> Thanks,
>>>>> Gao Xiang
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Joanne
>>>>>>
>>>>>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
>>>>>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gao Xiang
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Joanne
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Otherwise looks good:
>>>>>>>>>
>>>>>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>>>>>
>>>>>
>>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16  1:27                   ` Joanne Koong
  2025-10-16  1:58                     ` Gao Xiang
@ 2025-10-16 11:29                     ` Brian Foster
  2025-10-16 22:39                       ` Joanne Koong
  1 sibling, 1 reply; 46+ messages in thread
From: Brian Foster @ 2025-10-16 11:29 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > >
> > > Hi Joanne,
> > >
> > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > >
> > > ...
> > >
> > > >>>
> > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > >>> response to this patchset version [2]).
> > > >>>
> > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > >>
> > > >> I just ran the C repro locally with the upstream codebase (but I
> > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > >
> > > > Which upstream commit are you running it on? It needs to be run on top
> > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > that branch has been published publicly yet so maybe just patching it
> > > > in locally will work best.
> > > >
> > > > When I reproed it last month, I used the syz executor (not the C
> > > > repro, though that should probably work too?) directly with the
> > > > kconfig they had.
> > >
> > > I believe it's a regression somewhere since it's a valid
> > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > block-aligned of course), you could add a print just before
> > > erofs_iomap_begin() returns.
> >
> > Ok, so if erofs is strictly block-aligned except for tail inline data
> > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > somewhere as we shouldn't be running into the situation where erofs is
> > calling iomap_adjust_read_range() with a non-block-aligned position
> > and length. I'll track the offending commit down tomorrow.
> >
> 
> Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> iomap_iter advances") that changed this behavior for erofs such that
> the read iteration continues even after encountering an IOMAP_INLINE
> extent, whereas before, the iteration stopped after reading in the
> iomap inline extent. This leads erofs to end up in the situation where
> it calls into iomap_adjust_read_range() with a non-block-aligned
> position/length (on that subsequent iteration).
> 
> In particular, this change in commit bc264fea0f6f to iomap_iter():
> 
> -       if (ret > 0 && !iter->processed && !stale)
> +       if (ret > 0 && !advanced && !stale)
> 
> For iomap inline extents, iter->processed is 0, which stopped the
> iteration before. But now, advanced (which is iter->pos -
> iter->iter_start_pos) is used which will continue the iteration (since
> the iter is advanced after reading in the iomap inline extents).
> 
> Erofs is able to handle subsequent iterations after iomap_inline
> extents because erofs_iomap_begin() checks the block map and returns
> IOMAP_HOLE if it's not mapped
>         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
>                 iomap->type = IOMAP_HOLE;
>                 return 0;
>         }
> 
> but I think what probably would be better is a separate patch that
> reverts this back to the original behavior of stopping the iteration
> after IOMAP_INLINE extents are read in.
> 

Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
still didn't advance the iter at all by that point. It just returned 0
and this caused iomap_iter() to break out of the iteration loop.

The logic noted above in iomap_iter() is basically saying to break out
if the iteration did nothing, which is a bit of a hacky way to terminate
an IOMAP_INLINE read. The proper thing to do in that path IMO is to
report the bytes processed and then terminate some other way more
naturally. I see Gao actually fixed this sometime later in commit
b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
the inline read path started to advance the iter.

TBH, the behavior described above where we advance over the inline
mapping and then report any remaining iter length as a hole also sounds
like reasonably appropriate behavior to me. I suppose you could argue
that the inline case should just terminate the iter, which perhaps means
it should call iomap_iter_advance_full() instead. That technically
hardcodes that we will never process mappings beyond an inline mapping
into iomap. That bugs me a little bit, but is also probably always going
to be true so doesn't seem like that big of a deal.

If we wanted to consider it an optimization so we didn't always do this
extra iter on inline files, perhaps another variant of that could be an
EOF flag or some such that the fs could set to trigger a full advance
after the current mapping. OTOH you could argue that's what inline
already is so maybe that's overthinking it. Just a thought. Hm?

Brian

> So I don't think this patch should have a fixes: tag for that commit.
> It seems to me like no one was hitting this path before with a
> non-block-aligned position and offset. Though now there will be a use
> case for it, which is fuse.
> 
> Thanks,
> Joanne
> 
> >
> > Thanks,
> > Joanne
> >
> > >
> > > Also see my reply:
> > > https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
> > >
> > > I'm not sure if it caused user-visible regressions since
> > > erofs images work properly with upstream code (unlike a
> > > previous regression fixed by commit b26816b4e320 ("iomap:
> > > fix inline data on buffered read")).
> > >
> > > But a fixes tag is needed since it causes an unexpected
> > > WARNING at least.
> > >
> > > Thanks,
> > > Gao Xiang
> > >
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> > > > [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> > > > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> > > >
> > > >>
> > > >> I feel strange why pos is unaligned, does this warning show
> > > >> without your patchset on your side?
> > > >>
> > > >> Thanks,
> > > >> Gao Xiang
> > > >>
> > > >>>
> > > >>> Thanks,
> > > >>> Joanne
> > > >>>
> > > >>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > > >>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> > > >>>
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Gao Xiang
> > > >>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Joanne
> > > >>>>>
> > > >>>>>>
> > > >>>>>> Otherwise looks good:
> > > >>>>>>
> > > >>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > >>>>
> > > >>
> > >
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16  1:58                     ` Gao Xiang
@ 2025-10-16 22:03                       ` Joanne Koong
  2025-10-17  0:03                         ` Gao Xiang
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-16 22:03 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2025/10/16 09:27, Joanne Koong wrote:
> > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >>
> >> On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>>
> >>> Hi Joanne,
> >>>
> >>> On 2025/10/16 02:21, Joanne Koong wrote:
> >>>> On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>>
> >>> ...
> >>>
> >>>>>>
> >>>>>> This is where I encountered it in erofs: [1] for the "WARNING in
> >>>>>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> >>>>>> response to this patchset version [2]).
> >>>>>>
> >>>>>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> >>>>>
> >>>>> I just ran the C repro locally with the upstream codebase (but I
> >>>>> didn't use the related Kconfig), and it doesn't show anything.
> >>>>
> >>>> Which upstream commit are you running it on? It needs to be run on top
> >>>> of this patchset [1] but without this fix [2]. These changes are in
> >>>> Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> >>>> that branch has been published publicly yet so maybe just patching it
> >>>> in locally will work best.
> >>>>
> >>>> When I reproed it last month, I used the syz executor (not the C
> >>>> repro, though that should probably work too?) directly with the
> >>>> kconfig they had.
> >>>
> >>> I believe it's a regression somewhere since it's a valid
> >>> IOMAP_INLINE extent (since it's inlined, the length is not
> >>> block-aligned of course), you could add a print just before
> >>> erofs_iomap_begin() returns.
> >>
> >> Ok, so if erofs is strictly block-aligned except for tail inline data
> >> (eg the IOMAP_INLINE extent), then I agree, there is a regression
> >> somewhere as we shouldn't be running into the situation where erofs is
> >> calling iomap_adjust_read_range() with a non-block-aligned position
> >> and length. I'll track the offending commit down tomorrow.
> >>
> >
> > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > iomap_iter advances") that changed this behavior for erofs such that
> > the read iteration continues even after encountering an IOMAP_INLINE
> > extent, whereas before, the iteration stopped after reading in the
> > iomap inline extent. This leads erofs to end up in the situation where
> > it calls into iomap_adjust_read_range() with a non-block-aligned
> > position/length (on that subsequent iteration).
> >
> > In particular, this change in commit bc264fea0f6f to iomap_iter():
> >
> > -       if (ret > 0 && !iter->processed && !stale)
> > +       if (ret > 0 && !advanced && !stale)
> >
> > For iomap inline extents, iter->processed is 0, which stopped the
> > iteration before. But now, advanced (which is iter->pos -
> > iter->iter_start_pos) is used which will continue the iteration (since
> > the iter is advanced after reading in the iomap inline extents).
> >
> > Erofs is able to handle subsequent iterations after iomap_inline
> > extents because erofs_iomap_begin() checks the block map and returns
> > IOMAP_HOLE if it's not mapped
> >          if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> >                  iomap->type = IOMAP_HOLE;
> >                  return 0;
> >          }
> >
> > but I think what probably would be better is a separate patch that
> > reverts this back to the original behavior of stopping the iteration
> > after IOMAP_INLINE extents are read in.
>
> Thanks for your analysis, currently I don't have enough time to look
> into that (working on other stuffs), but according to your analysis,
> it seems EROFS don't have a user-visible regression in the Linus'
> upstream.
>
> >
> > So I don't think this patch should have a fixes: tag for that commit.
> > It seems to me like no one was hitting this path before with a
> > non-block-aligned position and offset. Though now there will be a use
> > case for it, which is fuse.
>
> To make it simplified, the issue is that:
>   - Previously, before your fuse iomap read patchset (assuming Christian
>     is already applied), there was no WARNING out of there;
>
>   - A new WARNING should be considered as a kernel regression.

No, the warning was always there. As shown in the syzbot report [1],
the warning that triggers is this one in iomap_iter_advance()

int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
{
        if (WARN_ON_ONCE(*count > iomap_length(iter)))
                return -EIO;
        ...
}

which was there even prior to the fuse iomap read patchset.

Erofs could still trigger this warning even without the fuse iomap
read patchset changes. So I don't think this qualifies as a new
warning that's caused by the fuse iomap read changes. What the fuse
iomap read patchset does is make it more likely for some specific
reads (like the syzbot generated one) to trigger the warning because
now the iterator is advanced incrementally which detects the underflow
length value (hence triggering the warning) whereas before, the bug
could be masked if the underflow of len and overflow of pos that is
returned from iomap_adjust_read_range() offset each other perfectly.
But other reads (for example ones where there are trailing uptodate
blocks that need to be truncated) can already trigger this warning
before the fuse iomap read changes and after the fuse iomap read
changes.

With that said, commit b26816b4e32 ("iomap: fix inline data on
buffered read") is the commit that led erofs to this new behavior of
calling into iomap_adjust_read_range() with non-block-aligned offsets
and lengths and can trigger the warning for some reads. So since this
patch fixes that behavior introduced by that commit, I'll add a Fixes
tag referencing that commit.

Thanks,
Joanne


[1] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/

>
> I'm not sure if Christian's iomap branch allows to be rebased, so in
> principle, either:
>
>   - You insert this patch before the WARNING patch in the fuse patchset
>     goes in, so no Fixes tag is needed because the WARNING doesn't
>     exist anymore,
>
> Or
>
>   - Add the Fixes tag to point to the commit which causes the WARNING
> one.
>
> Why it's important? because each upstream commit can be (back)ported
> to stable or downstream kernels, a WARNING should be considered as
> a new regression and needs a fix commit to be ported together, even
> those patches will be merged into upstream together.
>
> Thanks,
> Gao Xiang
>
> >
> > Thanks,
> > Joanne
> >
> >>
> >> Thanks,
> >> Joanne
> >>
> >>>
> >>> Also see my reply:
> >>> https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
> >>>
> >>> I'm not sure if it caused user-visible regressions since
> >>> erofs images work properly with upstream code (unlike a
> >>> previous regression fixed by commit b26816b4e320 ("iomap:
> >>> fix inline data on buffered read")).
> >>>
> >>> But a fixes tag is needed since it causes an unexpected
> >>> WARNING at least.
> >>>
> >>> Thanks,
> >>> Gao Xiang
> >>>
> >>>>
> >>>> Thanks,
> >>>> Joanne
> >>>>
> >>>> [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> >>>> [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> >>>> [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> >>>>
> >>>>>
> >>>>> I feel strange why pos is unaligned, does this warning show
> >>>>> without your patchset on your side?
> >>>>>
> >>>>> Thanks,
> >>>>> Gao Xiang
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Joanne
> >>>>>>
> >>>>>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> >>>>>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Gao Xiang
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Joanne
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Otherwise looks good:
> >>>>>>>>>
> >>>>>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >>>>>>>
> >>>>>
> >>>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16 11:29                     ` Brian Foster
@ 2025-10-16 22:39                       ` Joanne Koong
  2025-10-17 15:33                         ` Brian Foster
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-16 22:39 UTC (permalink / raw)
  To: Brian Foster
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > >
> > > > Hi Joanne,
> > > >
> > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > >
> > > > ...
> > > >
> > > > >>>
> > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > >>> response to this patchset version [2]).
> > > > >>>
> > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > >>
> > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > >
> > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > that branch has been published publicly yet so maybe just patching it
> > > > > in locally will work best.
> > > > >
> > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > repro, though that should probably work too?) directly with the
> > > > > kconfig they had.
> > > >
> > > > I believe it's a regression somewhere since it's a valid
> > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > block-aligned of course), you could add a print just before
> > > > erofs_iomap_begin() returns.
> > >
> > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > somewhere as we shouldn't be running into the situation where erofs is
> > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > and length. I'll track the offending commit down tomorrow.
> > >
> >
> > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > iomap_iter advances") that changed this behavior for erofs such that
> > the read iteration continues even after encountering an IOMAP_INLINE
> > extent, whereas before, the iteration stopped after reading in the
> > iomap inline extent. This leads erofs to end up in the situation where
> > it calls into iomap_adjust_read_range() with a non-block-aligned
> > position/length (on that subsequent iteration).
> >
> > In particular, this change in commit bc264fea0f6f to iomap_iter():
> >
> > -       if (ret > 0 && !iter->processed && !stale)
> > +       if (ret > 0 && !advanced && !stale)
> >
> > For iomap inline extents, iter->processed is 0, which stopped the
> > iteration before. But now, advanced (which is iter->pos -
> > iter->iter_start_pos) is used which will continue the iteration (since
> > the iter is advanced after reading in the iomap inline extents).
> >
> > Erofs is able to handle subsequent iterations after iomap_inline
> > extents because erofs_iomap_begin() checks the block map and returns
> > IOMAP_HOLE if it's not mapped
> >         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> >                 iomap->type = IOMAP_HOLE;
> >                 return 0;
> >         }
> >
> > but I think what probably would be better is a separate patch that
> > reverts this back to the original behavior of stopping the iteration
> > after IOMAP_INLINE extents are read in.
> >
>
> Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> still didn't advance the iter at all by that point. It just returned 0
> and this caused iomap_iter() to break out of the iteration loop.
>
> The logic noted above in iomap_iter() is basically saying to break out
> if the iteration did nothing, which is a bit of a hacky way to terminate
> an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> report the bytes processed and then terminate some other way more
> naturally. I see Gao actually fixed this sometime later in commit
> b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> the inline read path started to advance the iter.

That's a good point, the fix in commit b26816b4e320 is what led to the
new erofs behavior, not commit bc264fea0f6f.

>
> TBH, the behavior described above where we advance over the inline
> mapping and then report any remaining iter length as a hole also sounds
> like reasonably appropriate behavior to me. I suppose you could argue
> that the inline case should just terminate the iter, which perhaps means
> it should call iomap_iter_advance_full() instead. That technically
> hardcodes that we will never process mappings beyond an inline mapping
> into iomap. That bugs me a little bit, but is also probably always going
> to be true so doesn't seem like that big of a deal.

Reporting any remaining iter length as a hole also sounds reasonable
to me but it seems that this may add additional work that may not be
trivial. For example, I'm looking at erofs's erofs_map_blocks() call
which they would need to do to figure out it should be an iomap hole.
It seems a bit nonideal to me that any filesystem using iomap inline
data would also have to make sure they cover this case, which maybe
they already need to do that, I'm not sure, but it seems like an extra
thing they would now need to account for.

One scenario I'm imagining maybe being useful in the future is
supporting chained inline mappings, in which case we would still want
to continue processing after the first inline mapping, but we could
also address that if it does come up.

>
> If we wanted to consider it an optimization so we didn't always do this
> extra iter on inline files, perhaps another variant of that could be an
> EOF flag or some such that the fs could set to trigger a full advance
> after the current mapping. OTOH you could argue that's what inline
> already is so maybe that's overthinking it. Just a thought. Hm?
>

Would non-inline iomap types also find that flag helpful or is the
intention for it to be inline specific? I guess the flag could also be
used by non-inline types to stop the iteration short, if there's a use
case for that?

Thanks,
Joanne

> Brian
>
> > So I don't think this patch should have a fixes: tag for that commit.
> > It seems to me like no one was hitting this path before with a
> > non-block-aligned position and offset. Though now there will be a use
> > case for it, which is fuse.
> >
> > Thanks,
> > Joanne
> >
> > >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > Also see my reply:
> > > > https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
> > > >
> > > > I'm not sure if it caused user-visible regressions since
> > > > erofs images work properly with upstream code (unlike a
> > > > previous regression fixed by commit b26816b4e320 ("iomap:
> > > > fix inline data on buffered read")).
> > > >
> > > > But a fixes tag is needed since it causes an unexpected
> > > > WARNING at least.
> > > >
> > > > Thanks,
> > > > Gao Xiang
> > > >
> > > > >
> > > > > Thanks,
> > > > > Joanne
> > > > >
> > > > > [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> > > > > [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> > > > > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> > > > >
> > > > >>
> > > > >> I feel strange why pos is unaligned, does this warning show
> > > > >> without your patchset on your side?
> > > > >>
> > > > >> Thanks,
> > > > >> Gao Xiang
> > > > >>
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Joanne
> > > > >>>
> > > > >>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > > > >>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> > > > >>>
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Gao Xiang
> > > > >>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Joanne
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>> Otherwise looks good:
> > > > >>>>>>
> > > > >>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > >>>>
> > > > >>
> > > >
> >
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16 22:03                       ` Joanne Koong
@ 2025-10-17  0:03                         ` Gao Xiang
  2025-10-17 18:41                           ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Gao Xiang @ 2025-10-17  0:03 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/17 06:03, Joanne Koong wrote:
> On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

...

>>
>>>
>>> So I don't think this patch should have a fixes: tag for that commit.
>>> It seems to me like no one was hitting this path before with a
>>> non-block-aligned position and offset. Though now there will be a use
>>> case for it, which is fuse.
>>
>> To make it simplified, the issue is that:
>>    - Previously, before your fuse iomap read patchset (assuming Christian
>>      is already applied), there was no WARNING out of there;
>>
>>    - A new WARNING should be considered as a kernel regression.
> 
> No, the warning was always there. As shown in the syzbot report [1],
> the warning that triggers is this one in iomap_iter_advance()
> 
> int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
> {
>          if (WARN_ON_ONCE(*count > iomap_length(iter)))
>                  return -EIO;
>          ...
> }
> 
> which was there even prior to the fuse iomap read patchset.
> 
> Erofs could still trigger this warning even without the fuse iomap
> read patchset changes. So I don't think this qualifies as a new
> warning that's caused by the fuse iomap read changes.

No, I'm pretty sure the current Linus upstream doesn't have this
issue, because:

  - I've checked it against v6.17 with the C repro and related
    Kconfig (with make olddefconfig revised);

  - IOMAP_INLINE is pretty common for directories and regular
    inodes, if it has such warning syzbot should be reported
    much earlier (d9dc477ff6a2 was commited at Feb 26, 2025;
    and b26816b4e320 was commited at Mar 19, 2025) in the dashboard
    (https://syzkaller.appspot.com/upstream/s/erofs) rather
    than triggered directly by your fuse read patchset.

Could you also check with v6.17 codebase?

Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-16 22:39                       ` Joanne Koong
@ 2025-10-17 15:33                         ` Brian Foster
  2025-10-17 18:27                           ` Brian Foster
  0 siblings, 1 reply; 46+ messages in thread
From: Brian Foster @ 2025-10-17 15:33 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Thu, Oct 16, 2025 at 03:39:27PM -0700, Joanne Koong wrote:
> On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > >
> > > > > Hi Joanne,
> > > > >
> > > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > >>>
> > > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > > >>> response to this patchset version [2]).
> > > > > >>>
> > > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > > >>
> > > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > > >
> > > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > > that branch has been published publicly yet so maybe just patching it
> > > > > > in locally will work best.
> > > > > >
> > > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > > repro, though that should probably work too?) directly with the
> > > > > > kconfig they had.
> > > > >
> > > > > I believe it's a regression somewhere since it's a valid
> > > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > > block-aligned of course), you could add a print just before
> > > > > erofs_iomap_begin() returns.
> > > >
> > > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > > somewhere as we shouldn't be running into the situation where erofs is
> > > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > > and length. I'll track the offending commit down tomorrow.
> > > >
> > >
> > > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > > iomap_iter advances") that changed this behavior for erofs such that
> > > the read iteration continues even after encountering an IOMAP_INLINE
> > > extent, whereas before, the iteration stopped after reading in the
> > > iomap inline extent. This leads erofs to end up in the situation where
> > > it calls into iomap_adjust_read_range() with a non-block-aligned
> > > position/length (on that subsequent iteration).
> > >
> > > In particular, this change in commit bc264fea0f6f to iomap_iter():
> > >
> > > -       if (ret > 0 && !iter->processed && !stale)
> > > +       if (ret > 0 && !advanced && !stale)
> > >
> > > For iomap inline extents, iter->processed is 0, which stopped the
> > > iteration before. But now, advanced (which is iter->pos -
> > > iter->iter_start_pos) is used which will continue the iteration (since
> > > the iter is advanced after reading in the iomap inline extents).
> > >
> > > Erofs is able to handle subsequent iterations after iomap_inline
> > > extents because erofs_iomap_begin() checks the block map and returns
> > > IOMAP_HOLE if it's not mapped
> > >         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > >                 iomap->type = IOMAP_HOLE;
> > >                 return 0;
> > >         }
> > >
> > > but I think what probably would be better is a separate patch that
> > > reverts this back to the original behavior of stopping the iteration
> > > after IOMAP_INLINE extents are read in.
> > >
> >
> > Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> > still didn't advance the iter at all by that point. It just returned 0
> > and this caused iomap_iter() to break out of the iteration loop.
> >
> > The logic noted above in iomap_iter() is basically saying to break out
> > if the iteration did nothing, which is a bit of a hacky way to terminate
> > an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> > report the bytes processed and then terminate some other way more
> > naturally. I see Gao actually fixed this sometime later in commit
> > b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> > the inline read path started to advance the iter.
> 
> That's a good point, the fix in commit b26816b4e320 is what led to the
> new erofs behavior, not commit bc264fea0f6f.
> 
> >
> > TBH, the behavior described above where we advance over the inline
> > mapping and then report any remaining iter length as a hole also sounds
> > like reasonably appropriate behavior to me. I suppose you could argue
> > that the inline case should just terminate the iter, which perhaps means
> > it should call iomap_iter_advance_full() instead. That technically
> > hardcodes that we will never process mappings beyond an inline mapping
> > into iomap. That bugs me a little bit, but is also probably always going
> > to be true so doesn't seem like that big of a deal.
> 
> Reporting any remaining iter length as a hole also sounds reasonable
> to me but it seems that this may add additional work that may not be
> trivial. For example, I'm looking at erofs's erofs_map_blocks() call
> which they would need to do to figure out it should be an iomap hole.
> It seems a bit nonideal to me that any filesystem using iomap inline
> data would also have to make sure they cover this case, which maybe
> they already need to do that, I'm not sure, but it seems like an extra
> thing they would now need to account for.
> 

To make sure I follow, you're talking about any potential non-erofs
cases, right? I thought it was mentioned that erofs already handled this
correctly by returning a hole. So I take this to mean other users would
need to handle this similarly.

Poking around, I see that ext4 uses iomap and IOMAP_INLINE in a couple
isolated cases. One looks like swapfile activation and the other appears
to be related to fiemap. Any idea if those are working correctly? At
first look it kind of looks like they check and return error for offset
beyond the specified length..

> One scenario I'm imagining maybe being useful in the future is
> supporting chained inline mappings, in which case we would still want
> to continue processing after the first inline mapping, but we could
> also address that if it does come up.
> 

Yeah..

> >
> > If we wanted to consider it an optimization so we didn't always do this
> > extra iter on inline files, perhaps another variant of that could be an
> > EOF flag or some such that the fs could set to trigger a full advance
> > after the current mapping. OTOH you could argue that's what inline
> > already is so maybe that's overthinking it. Just a thought. Hm?
> >
> 
> Would non-inline iomap types also find that flag helpful or is the
> intention for it to be inline specific? I guess the flag could also be
> used by non-inline types to stop the iteration short, if there's a use
> case for that?
> 

... I hadn't really considered that. This was just me thinking out loud
about trying to handle things more generically.

Having thought more about it, I think either way it might just make
sense to fix the inline read case to do the full advance (assuming it is
verified this fixes the problem) to restore historical iomap behavior.
Then if there is ever value to support multiple inline mappings somehow
or another, we could revisit the flag idea.

Brian

> Thanks,
> Joanne
> 
> > Brian
> >
> > > So I don't think this patch should have a fixes: tag for that commit.
> > > It seems to me like no one was hitting this path before with a
> > > non-block-aligned position and offset. Though now there will be a use
> > > case for it, which is fuse.
> > >
> > > Thanks,
> > > Joanne
> > >
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > >
> > > > > Also see my reply:
> > > > > https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
> > > > >
> > > > > I'm not sure if it caused user-visible regressions since
> > > > > erofs images work properly with upstream code (unlike a
> > > > > previous regression fixed by commit b26816b4e320 ("iomap:
> > > > > fix inline data on buffered read")).
> > > > >
> > > > > But a fixes tag is needed since it causes an unexpected
> > > > > WARNING at least.
> > > > >
> > > > > Thanks,
> > > > > Gao Xiang
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Joanne
> > > > > >
> > > > > > [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> > > > > > [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> > > > > > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> > > > > >
> > > > > >>
> > > > > >> I feel strange why pos is unaligned, does this warning show
> > > > > >> without your patchset on your side?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Gao Xiang
> > > > > >>
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Joanne
> > > > > >>>
> > > > > >>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > > > > >>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> > > > > >>>
> > > > > >>>>
> > > > > >>>> Thanks,
> > > > > >>>> Gao Xiang
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> Thanks,
> > > > > >>>>> Joanne
> > > > > >>>>>
> > > > > >>>>>>
> > > > > >>>>>> Otherwise looks good:
> > > > > >>>>>>
> > > > > >>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > >>>>
> > > > > >>
> > > > >
> > >
> >
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17 15:33                         ` Brian Foster
@ 2025-10-17 18:27                           ` Brian Foster
  2025-10-17 20:19                             ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Brian Foster @ 2025-10-17 18:27 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Fri, Oct 17, 2025 at 11:33:31AM -0400, Brian Foster wrote:
> On Thu, Oct 16, 2025 at 03:39:27PM -0700, Joanne Koong wrote:
> > On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > > > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > >
> > > > > > Hi Joanne,
> > > > > >
> > > > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > >>>
> > > > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > > > >>> response to this patchset version [2]).
> > > > > > >>>
> > > > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > > > >>
> > > > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > > > >
> > > > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > > > that branch has been published publicly yet so maybe just patching it
> > > > > > > in locally will work best.
> > > > > > >
> > > > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > > > repro, though that should probably work too?) directly with the
> > > > > > > kconfig they had.
> > > > > >
> > > > > > I believe it's a regression somewhere since it's a valid
> > > > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > > > block-aligned of course), you could add a print just before
> > > > > > erofs_iomap_begin() returns.
> > > > >
> > > > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > > > somewhere as we shouldn't be running into the situation where erofs is
> > > > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > > > and length. I'll track the offending commit down tomorrow.
> > > > >
> > > >
> > > > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > > > iomap_iter advances") that changed this behavior for erofs such that
> > > > the read iteration continues even after encountering an IOMAP_INLINE
> > > > extent, whereas before, the iteration stopped after reading in the
> > > > iomap inline extent. This leads erofs to end up in the situation where
> > > > it calls into iomap_adjust_read_range() with a non-block-aligned
> > > > position/length (on that subsequent iteration).
> > > >
> > > > In particular, this change in commit bc264fea0f6f to iomap_iter():
> > > >
> > > > -       if (ret > 0 && !iter->processed && !stale)
> > > > +       if (ret > 0 && !advanced && !stale)
> > > >
> > > > For iomap inline extents, iter->processed is 0, which stopped the
> > > > iteration before. But now, advanced (which is iter->pos -
> > > > iter->iter_start_pos) is used which will continue the iteration (since
> > > > the iter is advanced after reading in the iomap inline extents).
> > > >
> > > > Erofs is able to handle subsequent iterations after iomap_inline
> > > > extents because erofs_iomap_begin() checks the block map and returns
> > > > IOMAP_HOLE if it's not mapped
> > > >         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > > >                 iomap->type = IOMAP_HOLE;
> > > >                 return 0;
> > > >         }
> > > >
> > > > but I think what probably would be better is a separate patch that
> > > > reverts this back to the original behavior of stopping the iteration
> > > > after IOMAP_INLINE extents are read in.
> > > >
> > >
> > > Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> > > still didn't advance the iter at all by that point. It just returned 0
> > > and this caused iomap_iter() to break out of the iteration loop.
> > >
> > > The logic noted above in iomap_iter() is basically saying to break out
> > > if the iteration did nothing, which is a bit of a hacky way to terminate
> > > an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> > > report the bytes processed and then terminate some other way more
> > > naturally. I see Gao actually fixed this sometime later in commit
> > > b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> > > the inline read path started to advance the iter.
> > 
> > That's a good point, the fix in commit b26816b4e320 is what led to the
> > new erofs behavior, not commit bc264fea0f6f.
> > 
> > >
> > > TBH, the behavior described above where we advance over the inline
> > > mapping and then report any remaining iter length as a hole also sounds
> > > like reasonably appropriate behavior to me. I suppose you could argue
> > > that the inline case should just terminate the iter, which perhaps means
> > > it should call iomap_iter_advance_full() instead. That technically
> > > hardcodes that we will never process mappings beyond an inline mapping
> > > into iomap. That bugs me a little bit, but is also probably always going
> > > to be true so doesn't seem like that big of a deal.
> > 
> > Reporting any remaining iter length as a hole also sounds reasonable
> > to me but it seems that this may add additional work that may not be
> > trivial. For example, I'm looking at erofs's erofs_map_blocks() call
> > which they would need to do to figure out it should be an iomap hole.
> > It seems a bit nonideal to me that any filesystem using iomap inline
> > data would also have to make sure they cover this case, which maybe
> > they already need to do that, I'm not sure, but it seems like an extra
> > thing they would now need to account for.
> > 
> 
> To make sure I follow, you're talking about any potential non-erofs
> cases, right? I thought it was mentioned that erofs already handled this
> correctly by returning a hole. So I take this to mean other users would
> need to handle this similarly.
> 
> Poking around, I see that ext4 uses iomap and IOMAP_INLINE in a couple
> isolated cases. One looks like swapfile activation and the other appears
> to be related to fiemap. Any idea if those are working correctly? At
> first look it kind of looks like they check and return error for offset
> beyond the specified length..
> 

JFYI from digging further.. ext4 returns -ENOENT for this "offset beyond
inline size" case. I didn't see any error returns from fiemap/filefrag
on a quick test against an inline data file. It looks like
iomap_fiemap() actually ignores errors either if the previous mapping is
non-hole, or catches -ENOENT explicitly to handle an attribute mapping
case. So afaict this all seems to work Ok..

I'm not sure this is all that relevant for the swap case. mkswap
apparently requires a file at least 40k in size, which iiuc is beyond
the scope of files with inline data..

Brian

> > One scenario I'm imagining maybe being useful in the future is
> > supporting chained inline mappings, in which case we would still want
> > to continue processing after the first inline mapping, but we could
> > also address that if it does come up.
> > 
> 
> Yeah..
> 
> > >
> > > If we wanted to consider it an optimization so we didn't always do this
> > > extra iter on inline files, perhaps another variant of that could be an
> > > EOF flag or some such that the fs could set to trigger a full advance
> > > after the current mapping. OTOH you could argue that's what inline
> > > already is so maybe that's overthinking it. Just a thought. Hm?
> > >
> > 
> > Would non-inline iomap types also find that flag helpful or is the
> > intention for it to be inline specific? I guess the flag could also be
> > used by non-inline types to stop the iteration short, if there's a use
> > case for that?
> > 
> 
> ... I hadn't really considered that. This was just me thinking out loud
> about trying to handle things more generically.
> 
> Having thought more about it, I think either way it might just make
> sense to fix the inline read case to do the full advance (assuming it is
> verified this fixes the problem) to restore historical iomap behavior.
> Then if there is ever value to support multiple inline mappings somehow
> or another, we could revisit the flag idea.
> 
> Brian
> 
> > Thanks,
> > Joanne
> > 
> > > Brian
> > >
> > > > So I don't think this patch should have a fixes: tag for that commit.
> > > > It seems to me like no one was hitting this path before with a
> > > > non-block-aligned position and offset. Though now there will be a use
> > > > case for it, which is fuse.
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > >
> > > > > Thanks,
> > > > > Joanne
> > > > >
> > > > > >
> > > > > > Also see my reply:
> > > > > > https://lore.kernel.org/r/cff53c73-f050-44e2-9c61-96552c0e85ab@linux.alibaba.com
> > > > > >
> > > > > > I'm not sure if it caused user-visible regressions since
> > > > > > erofs images work properly with upstream code (unlike a
> > > > > > previous regression fixed by commit b26816b4e320 ("iomap:
> > > > > > fix inline data on buffered read")).
> > > > > >
> > > > > > But a fixes tag is needed since it causes an unexpected
> > > > > > WARNING at least.
> > > > > >
> > > > > > Thanks,
> > > > > > Gao Xiang
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Joanne
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#t
> > > > > > > [2] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> > > > > > > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/T/#m4ce4707bf98077cde4d1d4845425de30cf2b00f6
> > > > > > >
> > > > > > >>
> > > > > > >> I feel strange why pos is unaligned, does this warning show
> > > > > > >> without your patchset on your side?
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Gao Xiang
> > > > > > >>
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>> Joanne
> > > > > > >>>
> > > > > > >>> [1] https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > > > > > >>> [2] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/
> > > > > > >>>
> > > > > > >>>>
> > > > > > >>>> Thanks,
> > > > > > >>>> Gao Xiang
> > > > > > >>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> Thanks,
> > > > > > >>>>> Joanne
> > > > > > >>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> Otherwise looks good:
> > > > > > >>>>>>
> > > > > > >>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > > >>>>
> > > > > > >>
> > > > > >
> > > >
> > >
> > 
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17  0:03                         ` Gao Xiang
@ 2025-10-17 18:41                           ` Joanne Koong
  2025-10-17 22:07                             ` Gao Xiang
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-17 18:41 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Thu, Oct 16, 2025 at 5:03 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2025/10/17 06:03, Joanne Koong wrote:
> > On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> ...
>
> >>
> >>>
> >>> So I don't think this patch should have a fixes: tag for that commit.
> >>> It seems to me like no one was hitting this path before with a
> >>> non-block-aligned position and offset. Though now there will be a use
> >>> case for it, which is fuse.
> >>
> >> To make it simplified, the issue is that:
> >>    - Previously, before your fuse iomap read patchset (assuming Christian
> >>      is already applied), there was no WARNING out of there;
> >>
> >>    - A new WARNING should be considered as a kernel regression.
> >
> > No, the warning was always there. As shown in the syzbot report [1],
> > the warning that triggers is this one in iomap_iter_advance()
> >
> > int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
> > {
> >          if (WARN_ON_ONCE(*count > iomap_length(iter)))
> >                  return -EIO;
> >          ...
> > }
> >
> > which was there even prior to the fuse iomap read patchset.
> >
> > Erofs could still trigger this warning even without the fuse iomap
> > read patchset changes. So I don't think this qualifies as a new
> > warning that's caused by the fuse iomap read changes.
>
> No, I'm pretty sure the current Linus upstream doesn't have this
> issue, because:
>
>   - I've checked it against v6.17 with the C repro and related
>     Kconfig (with make olddefconfig revised);
>
>   - IOMAP_INLINE is pretty common for directories and regular
>     inodes, if it has such warning syzbot should be reported
>     much earlier (d9dc477ff6a2 was commited at Feb 26, 2025;
>     and b26816b4e320 was commited at Mar 19, 2025) in the dashboard
>     (https://syzkaller.appspot.com/upstream/s/erofs) rather
>     than triggered directly by your fuse read patchset.
>
> Could you also check with v6.17 codebase?

I think we are talking about two different things. By "this issue"
what you're talking about is the syzbot read example program that
triggers the warning on erofs, but by "this issue", what I was talking
about is the iomap_iter_advance() warning being triggerable
generically without the fuse read patchset, not just by erofs.

If we're talking about the syzbot erofs warning being triggered, then
this patch is irrelevant to be talking about, because it is this other
patch [1] that fixes that issue. That patch got merged in before any
of the fuse iomap read changes. There is no regression to erofs.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/

>
> Thanks,
> Gao Xiang
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17 18:27                           ` Brian Foster
@ 2025-10-17 20:19                             ` Joanne Koong
  2025-10-18 10:30                               ` Brian Foster
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-17 20:19 UTC (permalink / raw)
  To: Brian Foster
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Fri, Oct 17, 2025 at 11:23 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Fri, Oct 17, 2025 at 11:33:31AM -0400, Brian Foster wrote:
> > On Thu, Oct 16, 2025 at 03:39:27PM -0700, Joanne Koong wrote:
> > > On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > > > > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > Hi Joanne,
> > > > > > >
> > > > > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > >>>
> > > > > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > > > > >>> response to this patchset version [2]).
> > > > > > > >>>
> > > > > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > > > > >>
> > > > > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > > > > >
> > > > > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > > > > that branch has been published publicly yet so maybe just patching it
> > > > > > > > in locally will work best.
> > > > > > > >
> > > > > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > > > > repro, though that should probably work too?) directly with the
> > > > > > > > kconfig they had.
> > > > > > >
> > > > > > > I believe it's a regression somewhere since it's a valid
> > > > > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > > > > block-aligned of course), you could add a print just before
> > > > > > > erofs_iomap_begin() returns.
> > > > > >
> > > > > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > > > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > > > > somewhere as we shouldn't be running into the situation where erofs is
> > > > > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > > > > and length. I'll track the offending commit down tomorrow.
> > > > > >
> > > > >
> > > > > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > > > > iomap_iter advances") that changed this behavior for erofs such that
> > > > > the read iteration continues even after encountering an IOMAP_INLINE
> > > > > extent, whereas before, the iteration stopped after reading in the
> > > > > iomap inline extent. This leads erofs to end up in the situation where
> > > > > it calls into iomap_adjust_read_range() with a non-block-aligned
> > > > > position/length (on that subsequent iteration).
> > > > >
> > > > > In particular, this change in commit bc264fea0f6f to iomap_iter():
> > > > >
> > > > > -       if (ret > 0 && !iter->processed && !stale)
> > > > > +       if (ret > 0 && !advanced && !stale)
> > > > >
> > > > > For iomap inline extents, iter->processed is 0, which stopped the
> > > > > iteration before. But now, advanced (which is iter->pos -
> > > > > iter->iter_start_pos) is used which will continue the iteration (since
> > > > > the iter is advanced after reading in the iomap inline extents).
> > > > >
> > > > > Erofs is able to handle subsequent iterations after iomap_inline
> > > > > extents because erofs_iomap_begin() checks the block map and returns
> > > > > IOMAP_HOLE if it's not mapped
> > > > >         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > > > >                 iomap->type = IOMAP_HOLE;
> > > > >                 return 0;
> > > > >         }
> > > > >
> > > > > but I think what probably would be better is a separate patch that
> > > > > reverts this back to the original behavior of stopping the iteration
> > > > > after IOMAP_INLINE extents are read in.
> > > > >
> > > >
> > > > Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> > > > still didn't advance the iter at all by that point. It just returned 0
> > > > and this caused iomap_iter() to break out of the iteration loop.
> > > >
> > > > The logic noted above in iomap_iter() is basically saying to break out
> > > > if the iteration did nothing, which is a bit of a hacky way to terminate
> > > > an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> > > > report the bytes processed and then terminate some other way more
> > > > naturally. I see Gao actually fixed this sometime later in commit
> > > > b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> > > > the inline read path started to advance the iter.
> > >
> > > That's a good point, the fix in commit b26816b4e320 is what led to the
> > > new erofs behavior, not commit bc264fea0f6f.
> > >
> > > >
> > > > TBH, the behavior described above where we advance over the inline
> > > > mapping and then report any remaining iter length as a hole also sounds
> > > > like reasonably appropriate behavior to me. I suppose you could argue
> > > > that the inline case should just terminate the iter, which perhaps means
> > > > it should call iomap_iter_advance_full() instead. That technically
> > > > hardcodes that we will never process mappings beyond an inline mapping
> > > > into iomap. That bugs me a little bit, but is also probably always going
> > > > to be true so doesn't seem like that big of a deal.
> > >
> > > Reporting any remaining iter length as a hole also sounds reasonable
> > > to me but it seems that this may add additional work that may not be
> > > trivial. For example, I'm looking at erofs's erofs_map_blocks() call
> > > which they would need to do to figure out it should be an iomap hole.
> > > It seems a bit nonideal to me that any filesystem using iomap inline
> > > data would also have to make sure they cover this case, which maybe
> > > they already need to do that, I'm not sure, but it seems like an extra
> > > thing they would now need to account for.
> > >
> >
> > To make sure I follow, you're talking about any potential non-erofs
> > cases, right? I thought it was mentioned that erofs already handled this
> > correctly by returning a hole. So I take this to mean other users would
> > need to handle this similarly.

Yes, sorry if my previous wording was confusing. Erofs already handles
this correctly but there's some overhead with having to determine it's
a hole.

> >
> > Poking around, I see that ext4 uses iomap and IOMAP_INLINE in a couple
> > isolated cases. One looks like swapfile activation and the other appears
> > to be related to fiemap. Any idea if those are working correctly? At
> > first look it kind of looks like they check and return error for offset
> > beyond the specified length..
> >
>
> JFYI from digging further.. ext4 returns -ENOENT for this "offset beyond
> inline size" case. I didn't see any error returns from fiemap/filefrag
> on a quick test against an inline data file. It looks like
> iomap_fiemap() actually ignores errors either if the previous mapping is
> non-hole, or catches -ENOENT explicitly to handle an attribute mapping
> case. So afaict this all seems to work Ok..
>
> I'm not sure this is all that relevant for the swap case. mkswap
> apparently requires a file at least 40k in size, which iiuc is beyond
> the scope of files with inline data..

I think gfs2 also uses IOMAP_INLINE. In __gfs2_iomap_get(), it looks
like they report a hole if the offset is beyond the inode size, so
that looks okay.

>
> Brian
>
> > > One scenario I'm imagining maybe being useful in the future is
> > > supporting chained inline mappings, in which case we would still want
> > > to continue processing after the first inline mapping, but we could
> > > also address that if it does come up.
> > >
> >
> > Yeah..
> >
> > > >
> > > > If we wanted to consider it an optimization so we didn't always do this
> > > > extra iter on inline files, perhaps another variant of that could be an
> > > > EOF flag or some such that the fs could set to trigger a full advance
> > > > after the current mapping. OTOH you could argue that's what inline
> > > > already is so maybe that's overthinking it. Just a thought. Hm?
> > > >
> > >
> > > Would non-inline iomap types also find that flag helpful or is the
> > > intention for it to be inline specific? I guess the flag could also be
> > > used by non-inline types to stop the iteration short, if there's a use
> > > case for that?
> > >
> >
> > ... I hadn't really considered that. This was just me thinking out loud
> > about trying to handle things more generically.
> >
> > Having thought more about it, I think either way it might just make
> > sense to fix the inline read case to do the full advance (assuming it is
> > verified this fixes the problem) to restore historical iomap behavior.

Hmm, the iter is already fully advanced after the inline data is read.
I think the issue is that when the iomap mapping length is less than
the iter len (the iter len will always be the size of the folio but
the length for the inline data could be less), then advancing it fully
through iomap_iter_advance_full() will only advance it up to the iomap
length. iter->len will then still have a positive value which makes
iomap_iter() continue iterating.

I don't see a cleaner way of restoring historical iomap behavior than
adding that eof/stop flag.

Thanks,
Joanne


> > Then if there is ever value to support multiple inline mappings somehow
> > or another, we could revisit the flag idea.

> >
> > Brian
> >
> > > Thanks,
> > > Joanne

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17 18:41                           ` Joanne Koong
@ 2025-10-17 22:07                             ` Gao Xiang
  2025-10-17 23:22                               ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Gao Xiang @ 2025-10-17 22:07 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/18 02:41, Joanne Koong wrote:
> On Thu, Oct 16, 2025 at 5:03 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> On 2025/10/17 06:03, Joanne Koong wrote:
>>> On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> ...
>>
>>>>
>>>>>
>>>>> So I don't think this patch should have a fixes: tag for that commit.
>>>>> It seems to me like no one was hitting this path before with a
>>>>> non-block-aligned position and offset. Though now there will be a use
>>>>> case for it, which is fuse.
>>>>
>>>> To make it simplified, the issue is that:
>>>>     - Previously, before your fuse iomap read patchset (assuming Christian
>>>>       is already applied), there was no WARNING out of there;
>>>>
>>>>     - A new WARNING should be considered as a kernel regression.
>>>
>>> No, the warning was always there. As shown in the syzbot report [1],
>>> the warning that triggers is this one in iomap_iter_advance()
>>>
>>> int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
>>> {
>>>           if (WARN_ON_ONCE(*count > iomap_length(iter)))
>>>                   return -EIO;
>>>           ...
>>> }
>>>
>>> which was there even prior to the fuse iomap read patchset.
>>>
>>> Erofs could still trigger this warning even without the fuse iomap
>>> read patchset changes. So I don't think this qualifies as a new
>>> warning that's caused by the fuse iomap read changes.
>>
>> No, I'm pretty sure the current Linus upstream doesn't have this
>> issue, because:
>>
>>    - I've checked it against v6.17 with the C repro and related
>>      Kconfig (with make olddefconfig revised);
>>
>>    - IOMAP_INLINE is pretty common for directories and regular
>>      inodes, if it has such warning syzbot should be reported
>>      much earlier (d9dc477ff6a2 was commited at Feb 26, 2025;
>>      and b26816b4e320 was commited at Mar 19, 2025) in the dashboard
>>      (https://syzkaller.appspot.com/upstream/s/erofs) rather
>>      than triggered directly by your fuse read patchset.
>>
>> Could you also check with v6.17 codebase?
> 
> I think we are talking about two different things. By "this issue"
> what you're talking about is the syzbot read example program that
> triggers the warning on erofs, but by "this issue", what I was talking
> about is the iomap_iter_advance() warning being triggerable
> generically without the fuse read patchset, not just by erofs.

Ah, yes.  Sorry the subjects of those two patches are similar,
I got them mixed up.  I thought you resent the previous patch
in this patchset.

> 
> If we're talking about the syzbot erofs warning being triggered, then
> this patch is irrelevant to be talking about, because it is this other
> patch [1] that fixes that issue. That patch got merged in before any
> of the fuse iomap read changes. There is no regression to erofs.

Can you confirm this since I can't open the link below:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.iomap

[1/1] iomap: adjust read range correctly for non-block-aligned positions
       https://git.kernel.org/vfs/vfs/c/94b11133d6f6

As you said, if this commit is prior to the iomap read patchset, that
would be fine.  Otherwise it would be better to add a fixes tag to
that commit to point out this patch should be ported together to avoid
the new warning.

Thanks,
Gao Xiang


> 
> Thanks,
> Joanne
> 
> [1] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> 
>>
>> Thanks,
>> Gao Xiang
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17 22:07                             ` Gao Xiang
@ 2025-10-17 23:22                               ` Joanne Koong
  2025-10-18  0:12                                 ` Gao Xiang
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-17 23:22 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Fri, Oct 17, 2025 at 3:07 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
>
> On 2025/10/18 02:41, Joanne Koong wrote:
> > On Thu, Oct 16, 2025 at 5:03 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>
> >> On 2025/10/17 06:03, Joanne Koong wrote:
> >>> On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>
> >> ...
> >>
> >>>>
> >>>>>
> >>>>> So I don't think this patch should have a fixes: tag for that commit.
> >>>>> It seems to me like no one was hitting this path before with a
> >>>>> non-block-aligned position and offset. Though now there will be a use
> >>>>> case for it, which is fuse.
> >>>>
> >>>> To make it simplified, the issue is that:
> >>>>     - Previously, before your fuse iomap read patchset (assuming Christian
> >>>>       is already applied), there was no WARNING out of there;
> >>>>
> >>>>     - A new WARNING should be considered as a kernel regression.
> >>>
> >>> No, the warning was always there. As shown in the syzbot report [1],
> >>> the warning that triggers is this one in iomap_iter_advance()
> >>>
> >>> int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
> >>> {
> >>>           if (WARN_ON_ONCE(*count > iomap_length(iter)))
> >>>                   return -EIO;
> >>>           ...
> >>> }
> >>>
> >>> which was there even prior to the fuse iomap read patchset.
> >>>
> >>> Erofs could still trigger this warning even without the fuse iomap
> >>> read patchset changes. So I don't think this qualifies as a new
> >>> warning that's caused by the fuse iomap read changes.
> >>
> >> No, I'm pretty sure the current Linus upstream doesn't have this
> >> issue, because:
> >>
> >>    - I've checked it against v6.17 with the C repro and related
> >>      Kconfig (with make olddefconfig revised);
> >>
> >>    - IOMAP_INLINE is pretty common for directories and regular
> >>      inodes, if it has such warning syzbot should be reported
> >>      much earlier (d9dc477ff6a2 was commited at Feb 26, 2025;
> >>      and b26816b4e320 was commited at Mar 19, 2025) in the dashboard
> >>      (https://syzkaller.appspot.com/upstream/s/erofs) rather
> >>      than triggered directly by your fuse read patchset.
> >>
> >> Could you also check with v6.17 codebase?
> >
> > I think we are talking about two different things. By "this issue"
> > what you're talking about is the syzbot read example program that
> > triggers the warning on erofs, but by "this issue", what I was talking
> > about is the iomap_iter_advance() warning being triggerable
> > generically without the fuse read patchset, not just by erofs.
>
> Ah, yes.  Sorry the subjects of those two patches are similar,
> I got them mixed up.  I thought you resent the previous patch
> in this patchset.
>
> >
> > If we're talking about the syzbot erofs warning being triggered, then
> > this patch is irrelevant to be talking about, because it is this other
> > patch [1] that fixes that issue. That patch got merged in before any
> > of the fuse iomap read changes. There is no regression to erofs.
>
> Can you confirm this since I can't open the link below:
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs-6.19.iomap
>
> [1/1] iomap: adjust read range correctly for non-block-aligned positions
>        https://git.kernel.org/vfs/vfs/c/94b11133d6f6
>

I don't think the vfs-6.19.iomap branch is publicly available yet,
which is why the link doesn't work.

From the merge timestamps in [1] and [2], the fix was applied to the
branch 3 minutes before the fuse iomap changes were applied.
Additionally, in the cover letter of the fuse iomap read patchset [3],
it calls out that the patchset is rebased on top of that fix.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20250929-gefochten-bergwacht-b43b132a78d9@brauner/
[2] https://lore.kernel.org/linux-fsdevel/20250929-salzbergwerk-ungnade-8a16d724415e@brauner/
[3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/

> As you said, if this commit is prior to the iomap read patchset, that
> would be fine.  Otherwise it would be better to add a fixes tag to
> that commit to point out this patch should be ported together to avoid
> the new warning.
>
> Thanks,
> Gao Xiang
>
>
> >
> > Thanks,
> > Joanne
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> >
> >>
> >> Thanks,
> >> Gao Xiang
> >>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17 23:22                               ` Joanne Koong
@ 2025-10-18  0:12                                 ` Gao Xiang
  2025-10-20 23:25                                   ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Gao Xiang @ 2025-10-18  0:12 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/18 07:22, Joanne Koong wrote:
> On Fri, Oct 17, 2025 at 3:07 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>>
>> On 2025/10/18 02:41, Joanne Koong wrote:
>>> On Thu, Oct 16, 2025 at 5:03 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>>
>>>> On 2025/10/17 06:03, Joanne Koong wrote:
>>>>> On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>>
>>>> ...
>>>>
>>>>>>
>>>>>>>
>>>>>>> So I don't think this patch should have a fixes: tag for that commit.
>>>>>>> It seems to me like no one was hitting this path before with a
>>>>>>> non-block-aligned position and offset. Though now there will be a use
>>>>>>> case for it, which is fuse.
>>>>>>
>>>>>> To make it simplified, the issue is that:
>>>>>>      - Previously, before your fuse iomap read patchset (assuming Christian
>>>>>>        is already applied), there was no WARNING out of there;
>>>>>>
>>>>>>      - A new WARNING should be considered as a kernel regression.
>>>>>
>>>>> No, the warning was always there. As shown in the syzbot report [1],
>>>>> the warning that triggers is this one in iomap_iter_advance()
>>>>>
>>>>> int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
>>>>> {
>>>>>            if (WARN_ON_ONCE(*count > iomap_length(iter)))
>>>>>                    return -EIO;
>>>>>            ...
>>>>> }
>>>>>
>>>>> which was there even prior to the fuse iomap read patchset.
>>>>>
>>>>> Erofs could still trigger this warning even without the fuse iomap
>>>>> read patchset changes. So I don't think this qualifies as a new
>>>>> warning that's caused by the fuse iomap read changes.
>>>>
>>>> No, I'm pretty sure the current Linus upstream doesn't have this
>>>> issue, because:
>>>>
>>>>     - I've checked it against v6.17 with the C repro and related
>>>>       Kconfig (with make olddefconfig revised);
>>>>
>>>>     - IOMAP_INLINE is pretty common for directories and regular
>>>>       inodes, if it has such warning syzbot should be reported
>>>>       much earlier (d9dc477ff6a2 was commited at Feb 26, 2025;
>>>>       and b26816b4e320 was commited at Mar 19, 2025) in the dashboard
>>>>       (https://syzkaller.appspot.com/upstream/s/erofs) rather
>>>>       than triggered directly by your fuse read patchset.
>>>>
>>>> Could you also check with v6.17 codebase?
>>>
>>> I think we are talking about two different things. By "this issue"
>>> what you're talking about is the syzbot read example program that
>>> triggers the warning on erofs, but by "this issue", what I was talking
>>> about is the iomap_iter_advance() warning being triggerable
>>> generically without the fuse read patchset, not just by erofs.
>>
>> Ah, yes.  Sorry the subjects of those two patches are similar,
>> I got them mixed up.  I thought you resent the previous patch
>> in this patchset.
>>
>>>
>>> If we're talking about the syzbot erofs warning being triggered, then
>>> this patch is irrelevant to be talking about, because it is this other
>>> patch [1] that fixes that issue. That patch got merged in before any
>>> of the fuse iomap read changes. There is no regression to erofs.
>>
>> Can you confirm this since I can't open the link below:
>>
>> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
>> branch: vfs-6.19.iomap
>>
>> [1/1] iomap: adjust read range correctly for non-block-aligned positions
>>         https://git.kernel.org/vfs/vfs/c/94b11133d6f6
>>
> 
> I don't think the vfs-6.19.iomap branch is publicly available yet,
> which is why the link doesn't work.
> 
>  From the merge timestamps in [1] and [2], the fix was applied to the
> branch 3 minutes before the fuse iomap changes were applied.
> Additionally, in the cover letter of the fuse iomap read patchset [3],
> it calls out that the patchset is rebased on top of that fix.

Ok, make sense, thanks.

Thanks,
Gao Xiang

> 
> Thanks,
> Joanne
> 
> [1] https://lore.kernel.org/linux-fsdevel/20250929-gefochten-bergwacht-b43b132a78d9@brauner/
> [2] https://lore.kernel.org/linux-fsdevel/20250929-salzbergwerk-ungnade-8a16d724415e@brauner/
> [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/
> 
>> As you said, if this commit is prior to the iomap read patchset, that
>> would be fine.  Otherwise it would be better to add a fixes tag to
>> that commit to point out this patch should be ported together to avoid
>> the new warning.
>>
>> Thanks,
>> Gao Xiang
>>
>>
>>>
>>> Thanks,
>>> Joanne
>>>
>>> [1] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
>>>
>>>>
>>>> Thanks,
>>>> Gao Xiang
>>>>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-17 20:19                             ` Joanne Koong
@ 2025-10-18 10:30                               ` Brian Foster
  2025-10-20 21:39                                 ` Joanne Koong
  0 siblings, 1 reply; 46+ messages in thread
From: Brian Foster @ 2025-10-18 10:30 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Fri, Oct 17, 2025 at 01:19:39PM -0700, Joanne Koong wrote:
> On Fri, Oct 17, 2025 at 11:23 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Fri, Oct 17, 2025 at 11:33:31AM -0400, Brian Foster wrote:
> > > On Thu, Oct 16, 2025 at 03:39:27PM -0700, Joanne Koong wrote:
> > > > On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > > > > > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > Hi Joanne,
> > > > > > > >
> > > > > > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > >>>
> > > > > > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > > > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > > > > > >>> response to this patchset version [2]).
> > > > > > > > >>>
> > > > > > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > > > > > >>
> > > > > > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > > > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > > > > > >
> > > > > > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > > > > > that branch has been published publicly yet so maybe just patching it
> > > > > > > > > in locally will work best.
> > > > > > > > >
> > > > > > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > > > > > repro, though that should probably work too?) directly with the
> > > > > > > > > kconfig they had.
> > > > > > > >
> > > > > > > > I believe it's a regression somewhere since it's a valid
> > > > > > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > > > > > block-aligned of course), you could add a print just before
> > > > > > > > erofs_iomap_begin() returns.
> > > > > > >
> > > > > > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > > > > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > > > > > somewhere as we shouldn't be running into the situation where erofs is
> > > > > > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > > > > > and length. I'll track the offending commit down tomorrow.
> > > > > > >
> > > > > >
> > > > > > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > > > > > iomap_iter advances") that changed this behavior for erofs such that
> > > > > > the read iteration continues even after encountering an IOMAP_INLINE
> > > > > > extent, whereas before, the iteration stopped after reading in the
> > > > > > iomap inline extent. This leads erofs to end up in the situation where
> > > > > > it calls into iomap_adjust_read_range() with a non-block-aligned
> > > > > > position/length (on that subsequent iteration).
> > > > > >
> > > > > > In particular, this change in commit bc264fea0f6f to iomap_iter():
> > > > > >
> > > > > > -       if (ret > 0 && !iter->processed && !stale)
> > > > > > +       if (ret > 0 && !advanced && !stale)
> > > > > >
> > > > > > For iomap inline extents, iter->processed is 0, which stopped the
> > > > > > iteration before. But now, advanced (which is iter->pos -
> > > > > > iter->iter_start_pos) is used which will continue the iteration (since
> > > > > > the iter is advanced after reading in the iomap inline extents).
> > > > > >
> > > > > > Erofs is able to handle subsequent iterations after iomap_inline
> > > > > > extents because erofs_iomap_begin() checks the block map and returns
> > > > > > IOMAP_HOLE if it's not mapped
> > > > > >         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > > > > >                 iomap->type = IOMAP_HOLE;
> > > > > >                 return 0;
> > > > > >         }
> > > > > >
> > > > > > but I think what probably would be better is a separate patch that
> > > > > > reverts this back to the original behavior of stopping the iteration
> > > > > > after IOMAP_INLINE extents are read in.
> > > > > >
> > > > >
> > > > > Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> > > > > still didn't advance the iter at all by that point. It just returned 0
> > > > > and this caused iomap_iter() to break out of the iteration loop.
> > > > >
> > > > > The logic noted above in iomap_iter() is basically saying to break out
> > > > > if the iteration did nothing, which is a bit of a hacky way to terminate
> > > > > an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> > > > > report the bytes processed and then terminate some other way more
> > > > > naturally. I see Gao actually fixed this sometime later in commit
> > > > > b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> > > > > the inline read path started to advance the iter.
> > > >
> > > > That's a good point, the fix in commit b26816b4e320 is what led to the
> > > > new erofs behavior, not commit bc264fea0f6f.
> > > >
> > > > >
> > > > > TBH, the behavior described above where we advance over the inline
> > > > > mapping and then report any remaining iter length as a hole also sounds
> > > > > like reasonably appropriate behavior to me. I suppose you could argue
> > > > > that the inline case should just terminate the iter, which perhaps means
> > > > > it should call iomap_iter_advance_full() instead. That technically
> > > > > hardcodes that we will never process mappings beyond an inline mapping
> > > > > into iomap. That bugs me a little bit, but is also probably always going
> > > > > to be true so doesn't seem like that big of a deal.
> > > >
> > > > Reporting any remaining iter length as a hole also sounds reasonable
> > > > to me but it seems that this may add additional work that may not be
> > > > trivial. For example, I'm looking at erofs's erofs_map_blocks() call
> > > > which they would need to do to figure out it should be an iomap hole.
> > > > It seems a bit nonideal to me that any filesystem using iomap inline
> > > > data would also have to make sure they cover this case, which maybe
> > > > they already need to do that, I'm not sure, but it seems like an extra
> > > > thing they would now need to account for.
> > > >
> > >
> > > To make sure I follow, you're talking about any potential non-erofs
> > > cases, right? I thought it was mentioned that erofs already handled this
> > > correctly by returning a hole. So I take this to mean other users would
> > > need to handle this similarly.
> 
> Yes, sorry if my previous wording was confusing. Erofs already handles
> this correctly but there's some overhead with having to determine it's
> a hole.
> 
> > >
> > > Poking around, I see that ext4 uses iomap and IOMAP_INLINE in a couple
> > > isolated cases. One looks like swapfile activation and the other appears
> > > to be related to fiemap. Any idea if those are working correctly? At
> > > first look it kind of looks like they check and return error for offset
> > > beyond the specified length..
> > >
> >
> > JFYI from digging further.. ext4 returns -ENOENT for this "offset beyond
> > inline size" case. I didn't see any error returns from fiemap/filefrag
> > on a quick test against an inline data file. It looks like
> > iomap_fiemap() actually ignores errors either if the previous mapping is
> > non-hole, or catches -ENOENT explicitly to handle an attribute mapping
> > case. So afaict this all seems to work Ok..
> >
> > I'm not sure this is all that relevant for the swap case. mkswap
> > apparently requires a file at least 40k in size, which iiuc is beyond
> > the scope of files with inline data..
> 
> I think gfs2 also uses IOMAP_INLINE. In __gfs2_iomap_get(), it looks
> like they report a hole if the offset is beyond the inode size, so
> that looks okay.
> 
> >
> > Brian
> >
> > > > One scenario I'm imagining maybe being useful in the future is
> > > > supporting chained inline mappings, in which case we would still want
> > > > to continue processing after the first inline mapping, but we could
> > > > also address that if it does come up.
> > > >
> > >
> > > Yeah..
> > >
> > > > >
> > > > > If we wanted to consider it an optimization so we didn't always do this
> > > > > extra iter on inline files, perhaps another variant of that could be an
> > > > > EOF flag or some such that the fs could set to trigger a full advance
> > > > > after the current mapping. OTOH you could argue that's what inline
> > > > > already is so maybe that's overthinking it. Just a thought. Hm?
> > > > >
> > > >
> > > > Would non-inline iomap types also find that flag helpful or is the
> > > > intention for it to be inline specific? I guess the flag could also be
> > > > used by non-inline types to stop the iteration short, if there's a use
> > > > case for that?
> > > >
> > >
> > > ... I hadn't really considered that. This was just me thinking out loud
> > > about trying to handle things more generically.
> > >
> > > Having thought more about it, I think either way it might just make
> > > sense to fix the inline read case to do the full advance (assuming it is
> > > verified this fixes the problem) to restore historical iomap behavior.
> 
> Hmm, the iter is already fully advanced after the inline data is read.
> I think the issue is that when the iomap mapping length is less than
> the iter len (the iter len will always be the size of the folio but
> the length for the inline data could be less), then advancing it fully
> through iomap_iter_advance_full() will only advance it up to the iomap
> length. iter->len will then still have a positive value which makes
> iomap_iter() continue iterating.
> 

Derp.. yeah, I had my wires crossed thinking the full advance completed
the full request, when really it's just the current iteration. Thanks
for pointing that out.

> I don't see a cleaner way of restoring historical iomap behavior than
> adding that eof/stop flag.
> 

Hmm.. so then aside from the oddly aligned iter case warning that you've
already fixed, ISTM that everything otherwise works correctly, right? If
so, I wouldn't be in a huge rush to do the flag thing.

In theory, there may still be similar options to do what I was thinking,
like perhaps just zero out remaining iter->len in that particular inline
path (maybe via an iomap_iter_advance_inline() helper for example). It's
not totally clear to me it's worth it though if everything is pretty
much working as is.

Brian

> Thanks,
> Joanne
> 
> 
> > > Then if there is ever value to support multiple inline mappings somehow
> > > or another, we could revisit the flag idea.
> 
> > >
> > > Brian
> > >
> > > > Thanks,
> > > > Joanne
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-18 10:30                               ` Brian Foster
@ 2025-10-20 21:39                                 ` Joanne Koong
  0 siblings, 0 replies; 46+ messages in thread
From: Joanne Koong @ 2025-10-20 21:39 UTC (permalink / raw)
  To: Brian Foster
  Cc: Gao Xiang, Christoph Hellwig, brauner, djwong, linux-fsdevel,
	kernel-team

On Sat, Oct 18, 2025 at 3:25 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Fri, Oct 17, 2025 at 01:19:39PM -0700, Joanne Koong wrote:
> > On Fri, Oct 17, 2025 at 11:23 AM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Fri, Oct 17, 2025 at 11:33:31AM -0400, Brian Foster wrote:
> > > > On Thu, Oct 16, 2025 at 03:39:27PM -0700, Joanne Koong wrote:
> > > > > On Thu, Oct 16, 2025 at 4:25 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 06:27:10PM -0700, Joanne Koong wrote:
> > > > > > > On Wed, Oct 15, 2025 at 5:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 15, 2025 at 11:39 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Joanne,
> > > > > > > > >
> > > > > > > > > On 2025/10/16 02:21, Joanne Koong wrote:
> > > > > > > > > > On Wed, Oct 15, 2025 at 11:06 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > >>>
> > > > > > > > > >>> This is where I encountered it in erofs: [1] for the "WARNING in
> > > > > > > > > >>> iomap_iter_advance" syz repro. (this syzbot report was generated in
> > > > > > > > > >>> response to this patchset version [2]).
> > > > > > > > > >>>
> > > > > > > > > >>> When I ran that syz program locally, I remember seeing pos=116 and length=3980.
> > > > > > > > > >>
> > > > > > > > > >> I just ran the C repro locally with the upstream codebase (but I
> > > > > > > > > >> didn't use the related Kconfig), and it doesn't show anything.
> > > > > > > > > >
> > > > > > > > > > Which upstream commit are you running it on? It needs to be run on top
> > > > > > > > > > of this patchset [1] but without this fix [2]. These changes are in
> > > > > > > > > > Christian's vfs-6.19.iomap branch in his vfs tree but I don't think
> > > > > > > > > > that branch has been published publicly yet so maybe just patching it
> > > > > > > > > > in locally will work best.
> > > > > > > > > >
> > > > > > > > > > When I reproed it last month, I used the syz executor (not the C
> > > > > > > > > > repro, though that should probably work too?) directly with the
> > > > > > > > > > kconfig they had.
> > > > > > > > >
> > > > > > > > > I believe it's a regression somewhere since it's a valid
> > > > > > > > > IOMAP_INLINE extent (since it's inlined, the length is not
> > > > > > > > > block-aligned of course), you could add a print just before
> > > > > > > > > erofs_iomap_begin() returns.
> > > > > > > >
> > > > > > > > Ok, so if erofs is strictly block-aligned except for tail inline data
> > > > > > > > (eg the IOMAP_INLINE extent), then I agree, there is a regression
> > > > > > > > somewhere as we shouldn't be running into the situation where erofs is
> > > > > > > > calling iomap_adjust_read_range() with a non-block-aligned position
> > > > > > > > and length. I'll track the offending commit down tomorrow.
> > > > > > > >
> > > > > > >
> > > > > > > Ok, I think it's commit bc264fea0f6f ("iomap: support incremental
> > > > > > > iomap_iter advances") that changed this behavior for erofs such that
> > > > > > > the read iteration continues even after encountering an IOMAP_INLINE
> > > > > > > extent, whereas before, the iteration stopped after reading in the
> > > > > > > iomap inline extent. This leads erofs to end up in the situation where
> > > > > > > it calls into iomap_adjust_read_range() with a non-block-aligned
> > > > > > > position/length (on that subsequent iteration).
> > > > > > >
> > > > > > > In particular, this change in commit bc264fea0f6f to iomap_iter():
> > > > > > >
> > > > > > > -       if (ret > 0 && !iter->processed && !stale)
> > > > > > > +       if (ret > 0 && !advanced && !stale)
> > > > > > >
> > > > > > > For iomap inline extents, iter->processed is 0, which stopped the
> > > > > > > iteration before. But now, advanced (which is iter->pos -
> > > > > > > iter->iter_start_pos) is used which will continue the iteration (since
> > > > > > > the iter is advanced after reading in the iomap inline extents).
> > > > > > >
> > > > > > > Erofs is able to handle subsequent iterations after iomap_inline
> > > > > > > extents because erofs_iomap_begin() checks the block map and returns
> > > > > > > IOMAP_HOLE if it's not mapped
> > > > > > >         if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> > > > > > >                 iomap->type = IOMAP_HOLE;
> > > > > > >                 return 0;
> > > > > > >         }
> > > > > > >
> > > > > > > but I think what probably would be better is a separate patch that
> > > > > > > reverts this back to the original behavior of stopping the iteration
> > > > > > > after IOMAP_INLINE extents are read in.
> > > > > > >
> > > > > >
> > > > > > Hmm.. so as of commit bc264fea0f6f, it looks like the read_inline() path
> > > > > > still didn't advance the iter at all by that point. It just returned 0
> > > > > > and this caused iomap_iter() to break out of the iteration loop.
> > > > > >
> > > > > > The logic noted above in iomap_iter() is basically saying to break out
> > > > > > if the iteration did nothing, which is a bit of a hacky way to terminate
> > > > > > an IOMAP_INLINE read. The proper thing to do in that path IMO is to
> > > > > > report the bytes processed and then terminate some other way more
> > > > > > naturally. I see Gao actually fixed this sometime later in commit
> > > > > > b26816b4e320 ("iomap: fix inline data on buffered read"), which is when
> > > > > > the inline read path started to advance the iter.
> > > > >
> > > > > That's a good point, the fix in commit b26816b4e320 is what led to the
> > > > > new erofs behavior, not commit bc264fea0f6f.
> > > > >
> > > > > >
> > > > > > TBH, the behavior described above where we advance over the inline
> > > > > > mapping and then report any remaining iter length as a hole also sounds
> > > > > > like reasonably appropriate behavior to me. I suppose you could argue
> > > > > > that the inline case should just terminate the iter, which perhaps means
> > > > > > it should call iomap_iter_advance_full() instead. That technically
> > > > > > hardcodes that we will never process mappings beyond an inline mapping
> > > > > > into iomap. That bugs me a little bit, but is also probably always going
> > > > > > to be true so doesn't seem like that big of a deal.
> > > > >
> > > > > Reporting any remaining iter length as a hole also sounds reasonable
> > > > > to me but it seems that this may add additional work that may not be
> > > > > trivial. For example, I'm looking at erofs's erofs_map_blocks() call
> > > > > which they would need to do to figure out it should be an iomap hole.
> > > > > It seems a bit nonideal to me that any filesystem using iomap inline
> > > > > data would also have to make sure they cover this case, which maybe
> > > > > they already need to do that, I'm not sure, but it seems like an extra
> > > > > thing they would now need to account for.
> > > > >
> > > >
> > > > To make sure I follow, you're talking about any potential non-erofs
> > > > cases, right? I thought it was mentioned that erofs already handled this
> > > > correctly by returning a hole. So I take this to mean other users would
> > > > need to handle this similarly.
> >
> > Yes, sorry if my previous wording was confusing. Erofs already handles
> > this correctly but there's some overhead with having to determine it's
> > a hole.
> >
> > > >
> > > > Poking around, I see that ext4 uses iomap and IOMAP_INLINE in a couple
> > > > isolated cases. One looks like swapfile activation and the other appears
> > > > to be related to fiemap. Any idea if those are working correctly? At
> > > > first look it kind of looks like they check and return error for offset
> > > > beyond the specified length..
> > > >
> > >
> > > JFYI from digging further.. ext4 returns -ENOENT for this "offset beyond
> > > inline size" case. I didn't see any error returns from fiemap/filefrag
> > > on a quick test against an inline data file. It looks like
> > > iomap_fiemap() actually ignores errors either if the previous mapping is
> > > non-hole, or catches -ENOENT explicitly to handle an attribute mapping
> > > case. So afaict this all seems to work Ok..
> > >
> > > I'm not sure this is all that relevant for the swap case. mkswap
> > > apparently requires a file at least 40k in size, which iiuc is beyond
> > > the scope of files with inline data..
> >
> > I think gfs2 also uses IOMAP_INLINE. In __gfs2_iomap_get(), it looks
> > like they report a hole if the offset is beyond the inode size, so
> > that looks okay.
> >
> > >
> > > Brian
> > >
> > > > > One scenario I'm imagining maybe being useful in the future is
> > > > > supporting chained inline mappings, in which case we would still want
> > > > > to continue processing after the first inline mapping, but we could
> > > > > also address that if it does come up.
> > > > >
> > > >
> > > > Yeah..
> > > >
> > > > > >
> > > > > > If we wanted to consider it an optimization so we didn't always do this
> > > > > > extra iter on inline files, perhaps another variant of that could be an
> > > > > > EOF flag or some such that the fs could set to trigger a full advance
> > > > > > after the current mapping. OTOH you could argue that's what inline
> > > > > > already is so maybe that's overthinking it. Just a thought. Hm?
> > > > > >
> > > > >
> > > > > Would non-inline iomap types also find that flag helpful or is the
> > > > > intention for it to be inline specific? I guess the flag could also be
> > > > > used by non-inline types to stop the iteration short, if there's a use
> > > > > case for that?
> > > > >
> > > >
> > > > ... I hadn't really considered that. This was just me thinking out loud
> > > > about trying to handle things more generically.
> > > >
> > > > Having thought more about it, I think either way it might just make
> > > > sense to fix the inline read case to do the full advance (assuming it is
> > > > verified this fixes the problem) to restore historical iomap behavior.
> >
> > Hmm, the iter is already fully advanced after the inline data is read.
> > I think the issue is that when the iomap mapping length is less than
> > the iter len (the iter len will always be the size of the folio but
> > the length for the inline data could be less), then advancing it fully
> > through iomap_iter_advance_full() will only advance it up to the iomap
> > length. iter->len will then still have a positive value which makes
> > iomap_iter() continue iterating.
> >
>
> Derp.. yeah, I had my wires crossed thinking the full advance completed
> the full request, when really it's just the current iteration. Thanks
> for pointing that out.
>
> > I don't see a cleaner way of restoring historical iomap behavior than
> > adding that eof/stop flag.
> >
>
> Hmm.. so then aside from the oddly aligned iter case warning that you've
> already fixed, ISTM that everything otherwise works correctly, right? If
> so, I wouldn't be in a huge rush to do the flag thing.
>
> In theory, there may still be similar options to do what I was thinking,
> like perhaps just zero out remaining iter->len in that particular inline
> path (maybe via an iomap_iter_advance_inline() helper for example). It's
> not totally clear to me it's worth it though if everything is pretty
> much working as is.

Sounds good to me to leave it as is. I was originally thinking it
might be nice to avoid the extra mapping overhead for determining it's
an iomap_hole but I guess it wouldn't actually be much overhead since
in most cases the metadata would be cached from the previous
iomap_inline call and not require a disk read.

Thanks for weighing in on this!

>
> Brian
>
> > Thanks,
> > Joanne
> >
> >
> > > > Then if there is ever value to support multiple inline mappings somehow
> > > > or another, we could revisit the flag idea.
> >
> > > >
> > > > Brian
> > > >
> > > > > Thanks,
> > > > > Joanne
> >
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-18  0:12                                 ` Gao Xiang
@ 2025-10-20 23:25                                   ` Joanne Koong
  2025-10-21  1:39                                     ` Gao Xiang
  0 siblings, 1 reply; 46+ messages in thread
From: Joanne Koong @ 2025-10-20 23:25 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team

On Fri, Oct 17, 2025 at 5:13 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2025/10/18 07:22, Joanne Koong wrote:
> > On Fri, Oct 17, 2025 at 3:07 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>
> >>
> >> On 2025/10/18 02:41, Joanne Koong wrote:
> >>> On Thu, Oct 16, 2025 at 5:03 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>>>
> >>>> On 2025/10/17 06:03, Joanne Koong wrote:
> >>>>> On Wed, Oct 15, 2025 at 6:58 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>>>
> >>>>>>>
> >>>>>>> So I don't think this patch should have a fixes: tag for that commit.
> >>>>>>> It seems to me like no one was hitting this path before with a
> >>>>>>> non-block-aligned position and offset. Though now there will be a use
> >>>>>>> case for it, which is fuse.
> >>>>>>
> >>>>>> To make it simplified, the issue is that:
> >>>>>>      - Previously, before your fuse iomap read patchset (assuming Christian
> >>>>>>        is already applied), there was no WARNING out of there;
> >>>>>>
> >>>>>>      - A new WARNING should be considered as a kernel regression.
> >>>>>
> >>>>> No, the warning was always there. As shown in the syzbot report [1],
> >>>>> the warning that triggers is this one in iomap_iter_advance()
> >>>>>
> >>>>> int iomap_iter_advance(struct iomap_iter *iter, u64 *count)
> >>>>> {
> >>>>>            if (WARN_ON_ONCE(*count > iomap_length(iter)))
> >>>>>                    return -EIO;
> >>>>>            ...
> >>>>> }
> >>>>>
> >>>>> which was there even prior to the fuse iomap read patchset.
> >>>>>
> >>>>> Erofs could still trigger this warning even without the fuse iomap
> >>>>> read patchset changes. So I don't think this qualifies as a new
> >>>>> warning that's caused by the fuse iomap read changes.
> >>>>
> >>>> No, I'm pretty sure the current Linus upstream doesn't have this
> >>>> issue, because:
> >>>>
> >>>>     - I've checked it against v6.17 with the C repro and related
> >>>>       Kconfig (with make olddefconfig revised);
> >>>>
> >>>>     - IOMAP_INLINE is pretty common for directories and regular
> >>>>       inodes, if it has such warning syzbot should be reported
> >>>>       much earlier (d9dc477ff6a2 was commited at Feb 26, 2025;
> >>>>       and b26816b4e320 was commited at Mar 19, 2025) in the dashboard
> >>>>       (https://syzkaller.appspot.com/upstream/s/erofs) rather
> >>>>       than triggered directly by your fuse read patchset.
> >>>>
> >>>> Could you also check with v6.17 codebase?
> >>>
> >>> I think we are talking about two different things. By "this issue"
> >>> what you're talking about is the syzbot read example program that
> >>> triggers the warning on erofs, but by "this issue", what I was talking
> >>> about is the iomap_iter_advance() warning being triggerable
> >>> generically without the fuse read patchset, not just by erofs.
> >>
> >> Ah, yes.  Sorry the subjects of those two patches are similar,
> >> I got them mixed up.  I thought you resent the previous patch
> >> in this patchset.
> >>
> >>>
> >>> If we're talking about the syzbot erofs warning being triggered, then
> >>> this patch is irrelevant to be talking about, because it is this other
> >>> patch [1] that fixes that issue. That patch got merged in before any
> >>> of the fuse iomap read changes. There is no regression to erofs.
> >>
> >> Can you confirm this since I can't open the link below:
> >>
> >> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> >> branch: vfs-6.19.iomap
> >>
> >> [1/1] iomap: adjust read range correctly for non-block-aligned positions
> >>         https://git.kernel.org/vfs/vfs/c/94b11133d6f6
> >>
> >
> > I don't think the vfs-6.19.iomap branch is publicly available yet,
> > which is why the link doesn't work.
> >
> >  From the merge timestamps in [1] and [2], the fix was applied to the
> > branch 3 minutes before the fuse iomap changes were applied.
> > Additionally, in the cover letter of the fuse iomap read patchset [3],
> > it calls out that the patchset is rebased on top of that fix.
>
> Ok, make sense, thanks.

The vfs-6.19.iomap branch is now available. I just triple-checked and
can confirm that commit 7aa6bc3e8766 ("iomap: adjust read range
correctly for non-block-aligned positions") was merged into the tree
prior to commit e0c95d2290c1 ("iomap: set accurate iter->pos when
reading folio ranges").

Thanks,
Joanne

>
> Thanks,
> Gao Xiang
>
> >
> > Thanks,
> > Joanne
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20250929-gefochten-bergwacht-b43b132a78d9@brauner/
> > [2] https://lore.kernel.org/linux-fsdevel/20250929-salzbergwerk-ungnade-8a16d724415e@brauner/
> > [3] https://lore.kernel.org/linux-fsdevel/20250926002609.1302233-1-joannelkoong@gmail.com/
> >
> >> As you said, if this commit is prior to the iomap read patchset, that
> >> would be fine.  Otherwise it would be better to add a fixes tag to
> >> that commit to point out this patch should be ported together to avoid
> >> the new warning.
> >>
> >> Thanks,
> >> Gao Xiang
> >>
> >>
> >>>
> >>> Thanks,
> >>> Joanne
> >>>
> >>> [1] https://lore.kernel.org/linux-fsdevel/20250922180042.1775241-1-joannelkoong@gmail.com/
> >>>
> >>>>
> >>>> Thanks,
> >>>> Gao Xiang
> >>>>
> >>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range
  2025-10-20 23:25                                   ` Joanne Koong
@ 2025-10-21  1:39                                     ` Gao Xiang
  0 siblings, 0 replies; 46+ messages in thread
From: Gao Xiang @ 2025-10-21  1:39 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, djwong, bfoster, linux-fsdevel,
	kernel-team



On 2025/10/21 07:25, Joanne Koong wrote:
> On Fri, Oct 17, 2025 at 5:13 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> On 2025/10/18 07:22, Joanne Koong wrote:
>>> On Fri, Oct 17, 2025 at 3:07 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>>
>>>>
>>>> On 2025/10/18 02:41, Joanne Koong wrote:

...

>>>>
>>>> Can you confirm this since I can't open the link below:
>>>>
>>>> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
>>>> branch: vfs-6.19.iomap
>>>>
>>>> [1/1] iomap: adjust read range correctly for non-block-aligned positions
>>>>          https://git.kernel.org/vfs/vfs/c/94b11133d6f6
>>>>
>>>
>>> I don't think the vfs-6.19.iomap branch is publicly available yet,
>>> which is why the link doesn't work.
>>>
>>>   From the merge timestamps in [1] and [2], the fix was applied to the
>>> branch 3 minutes before the fuse iomap changes were applied.
>>> Additionally, in the cover letter of the fuse iomap read patchset [3],
>>> it calls out that the patchset is rebased on top of that fix.
>>
>> Ok, make sense, thanks.
> 
> The vfs-6.19.iomap branch is now available. I just triple-checked and
> can confirm that commit 7aa6bc3e8766 ("iomap: adjust read range
> correctly for non-block-aligned positions") was merged into the tree
> prior to commit e0c95d2290c1 ("iomap: set accurate iter->pos when
> reading folio ranges").

Hi Joanne, thanks for the confirmation again,
sounds fine with me.

Thanks,
Gao Xiang

> 
> Thanks,
> Joanne
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2025-10-21  1:39 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09 22:56 [PATCH v1 0/9] iomap: buffered io changes Joanne Koong
2025-10-09 22:56 ` [PATCH v1 1/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
2025-10-13  3:00   ` Christoph Hellwig
2025-10-15 17:34     ` Joanne Koong
2025-10-15 17:40       ` Gao Xiang
2025-10-15 17:49         ` Joanne Koong
2025-10-15 18:06           ` Gao Xiang
2025-10-15 18:21             ` Joanne Koong
2025-10-15 18:39               ` Gao Xiang
2025-10-16  0:36                 ` Joanne Koong
2025-10-16  1:27                   ` Joanne Koong
2025-10-16  1:58                     ` Gao Xiang
2025-10-16 22:03                       ` Joanne Koong
2025-10-17  0:03                         ` Gao Xiang
2025-10-17 18:41                           ` Joanne Koong
2025-10-17 22:07                             ` Gao Xiang
2025-10-17 23:22                               ` Joanne Koong
2025-10-18  0:12                                 ` Gao Xiang
2025-10-20 23:25                                   ` Joanne Koong
2025-10-21  1:39                                     ` Gao Xiang
2025-10-16 11:29                     ` Brian Foster
2025-10-16 22:39                       ` Joanne Koong
2025-10-17 15:33                         ` Brian Foster
2025-10-17 18:27                           ` Brian Foster
2025-10-17 20:19                             ` Joanne Koong
2025-10-18 10:30                               ` Brian Foster
2025-10-20 21:39                                 ` Joanne Koong
2025-10-15 18:28             ` Gao Xiang
2025-10-09 22:56 ` [PATCH v1 2/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
2025-10-13  3:01   ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 3/9] iomap: optimize pending async writeback accounting Joanne Koong
2025-10-13  3:04   ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 4/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
2025-10-13  3:06   ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 5/9] iomap: simplify when reads can be skipped for writes Joanne Koong
2025-10-13  3:06   ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 6/9] iomap: optimize reads for non-block-aligned writes Joanne Koong
2025-10-13  3:08   ` Christoph Hellwig
2025-10-14  0:04     ` Joanne Koong
2025-10-14  4:14       ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
2025-10-13  3:09   ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
2025-10-13  3:13   ` Christoph Hellwig
2025-10-09 22:56 ` [PATCH v1 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
2025-10-13  3:13   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).