linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/9] iomap: buffered io changes
@ 2025-11-11 19:36 Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 1/9] iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted Joanne Koong
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

This series is on top of the vfs-6.19.iomap branch (head commit ca3557a68684)
in Christian's vfs tree.

Thanks,
Joanne

Changelog
---------

v3: https://lore.kernel.org/linux-fsdevel/20251104205119.1600045-1-joannelkoong@gmail.com/
v3 -> v4:
* Add patch 1 for renaming bytes_pending/bytes_accounted
* Fix bug in patch 5 to account for the case where the folio has an ifs but
  iomap_read_init() was never called.
* Add Darrick's Reviewed-bys
* Rebase

v2: https://lore.kernel.org/linux-fsdevel/20251021164353.3854086-1-joannelkoong@gmail.com/
v2 -> v3:
* Fix race when writing back all bytes of a folio (patch 3)
* Rename from bytes_pending to bytes_submitted (patch 3)
* Add more comments about logic (patch 3)
* Change bytes_submitted from unsigned to size_t (patch 3) (Matthew)

v1: https://lore.kernel.org/linux-fsdevel/20251009225611.3744728-1-joannelkoong@gmail.com/
v1 -> v2:
* Incorporate Christoph's feedback (drop non-block-aligned writes patch, fix
  bitmap scanning function comments, use more concise variable name, etc)
* For loff_t patch, fix up .writeback_range() callback for zonefs, gfs2, and
  block 

Joanne Koong (9):
  iomap: rename bytes_pending/bytes_accounted to
    bytes_submitted/bytes_not_submitted
  iomap: account for unaligned end offsets when truncating read range
  docs: document iomap writeback's iomap_finish_folio_write()
    requirement
  iomap: optimize pending async writeback accounting
  iomap: simplify ->read_folio_range() error handling for reads
  iomap: simplify when reads can be skipped for writes
  iomap: use loff_t for file positions and offsets in writeback code
  iomap: use find_next_bit() for dirty bitmap scanning
  iomap: use find_next_bit() for uptodate bitmap scanning

 .../filesystems/iomap/operations.rst          |  10 +-
 block/fops.c                                  |   3 +-
 fs/fuse/file.c                                |  18 +-
 fs/gfs2/bmap.c                                |   3 +-
 fs/iomap/buffered-io.c                        | 307 +++++++++++-------
 fs/iomap/ioend.c                              |   2 -
 fs/xfs/xfs_aops.c                             |   8 +-
 fs/zonefs/file.c                              |   3 +-
 include/linux/iomap.h                         |  15 +-
 9 files changed, 219 insertions(+), 150 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 1/9] iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 2/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

The naming "bytes_pending" and "bytes_accounted" may be confusing and
could be better named. Rename this to "bytes_submitted" and
"bytes_not_submitted" to make it more clear that these are bytes we
passed to the IO helper to read in.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6ae031ac8058..7dcb8bbc9484 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -394,16 +394,16 @@ static void iomap_read_init(struct folio *folio)
  * Else the IO helper will end the read after all submitted ranges have been
  * read.
  */
-static void iomap_read_end(struct folio *folio, size_t bytes_pending)
+static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
 {
 	struct iomap_folio_state *ifs;
 
 	/*
-	 * If there are no bytes pending, this means we are responsible for
+	 * If there are no bytes submitted, this means we are responsible for
 	 * unlocking the folio here, since no IO helper has taken ownership of
 	 * it.
 	 */
-	if (!bytes_pending) {
+	if (!bytes_submitted) {
 		folio_unlock(folio);
 		return;
 	}
@@ -416,11 +416,11 @@ static void iomap_read_end(struct folio *folio, size_t bytes_pending)
 		 * read_bytes_pending but skipped for IO.
 		 * The +1 accounts for the bias we added in iomap_read_init().
 		 */
-		size_t bytes_accounted = folio_size(folio) + 1 -
-				bytes_pending;
+		size_t bytes_not_submitted = folio_size(folio) + 1 -
+				bytes_submitted;
 
 		spin_lock_irq(&ifs->state_lock);
-		ifs->read_bytes_pending -= bytes_accounted;
+		ifs->read_bytes_pending -= bytes_not_submitted;
 		/*
 		 * If !ifs->read_bytes_pending, this means all pending reads
 		 * by the IO helper have already completed, which means we need
@@ -437,7 +437,7 @@ static void iomap_read_end(struct folio *folio, size_t bytes_pending)
 }
 
 static int iomap_read_folio_iter(struct iomap_iter *iter,
-		struct iomap_read_folio_ctx *ctx, size_t *bytes_pending)
+		struct iomap_read_folio_ctx *ctx, size_t *bytes_submitted)
 {
 	const struct iomap *iomap = &iter->iomap;
 	loff_t pos = iter->pos;
@@ -478,9 +478,9 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 			folio_zero_range(folio, poff, plen);
 			iomap_set_range_uptodate(folio, poff, plen);
 		} else {
-			if (!*bytes_pending)
+			if (!*bytes_submitted)
 				iomap_read_init(folio);
-			*bytes_pending += plen;
+			*bytes_submitted += plen;
 			ret = ctx->ops->read_folio_range(iter, ctx, plen);
 			if (ret)
 				return ret;
@@ -504,39 +504,40 @@ void iomap_read_folio(const struct iomap_ops *ops,
 		.pos		= folio_pos(folio),
 		.len		= folio_size(folio),
 	};
-	size_t bytes_pending = 0;
+	size_t bytes_submitted = 0;
 	int ret;
 
 	trace_iomap_readpage(iter.inode, 1);
 
 	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.status = iomap_read_folio_iter(&iter, ctx, &bytes_pending);
+		iter.status = iomap_read_folio_iter(&iter, ctx,
+				&bytes_submitted);
 
 	if (ctx->ops->submit_read)
 		ctx->ops->submit_read(ctx);
 
-	iomap_read_end(folio, bytes_pending);
+	iomap_read_end(folio, bytes_submitted);
 }
 EXPORT_SYMBOL_GPL(iomap_read_folio);
 
 static int iomap_readahead_iter(struct iomap_iter *iter,
-		struct iomap_read_folio_ctx *ctx, size_t *cur_bytes_pending)
+		struct iomap_read_folio_ctx *ctx, size_t *cur_bytes_submitted)
 {
 	int ret;
 
 	while (iomap_length(iter)) {
 		if (ctx->cur_folio &&
 		    offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
-			iomap_read_end(ctx->cur_folio, *cur_bytes_pending);
+			iomap_read_end(ctx->cur_folio, *cur_bytes_submitted);
 			ctx->cur_folio = NULL;
 		}
 		if (!ctx->cur_folio) {
 			ctx->cur_folio = readahead_folio(ctx->rac);
 			if (WARN_ON_ONCE(!ctx->cur_folio))
 				return -EINVAL;
-			*cur_bytes_pending = 0;
+			*cur_bytes_submitted = 0;
 		}
-		ret = iomap_read_folio_iter(iter, ctx, cur_bytes_pending);
+		ret = iomap_read_folio_iter(iter, ctx, cur_bytes_submitted);
 		if (ret)
 			return ret;
 	}
@@ -568,19 +569,19 @@ void iomap_readahead(const struct iomap_ops *ops,
 		.pos	= readahead_pos(rac),
 		.len	= readahead_length(rac),
 	};
-	size_t cur_bytes_pending;
+	size_t cur_bytes_submitted;
 
 	trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
 
 	while (iomap_iter(&iter, ops) > 0)
 		iter.status = iomap_readahead_iter(&iter, ctx,
-					&cur_bytes_pending);
+					&cur_bytes_submitted);
 
 	if (ctx->ops->submit_read)
 		ctx->ops->submit_read(ctx);
 
 	if (ctx->cur_folio)
-		iomap_read_end(ctx->cur_folio, cur_bytes_pending);
+		iomap_read_end(ctx->cur_folio, cur_bytes_submitted);
 }
 EXPORT_SYMBOL_GPL(iomap_readahead);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 2/9] iomap: account for unaligned end offsets when truncating read range
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 1/9] iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 3/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

The end position to start truncating from may be at an offset into a
block, which under the current logic would result in overtruncation.

Adjust the calculation to account for unaligned end offsets.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7dcb8bbc9484..0eb439b523b1 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -218,6 +218,22 @@ static void ifs_free(struct folio *folio)
 	kfree(ifs);
 }
 
+/*
+ * Calculate how many bytes to truncate based off the number of blocks to
+ * truncate and the end position to start truncating from.
+ */
+static size_t iomap_bytes_to_truncate(loff_t end_pos, unsigned block_bits,
+		unsigned blocks_truncated)
+{
+	unsigned block_size = 1 << block_bits;
+	unsigned block_offset = end_pos & (block_size - 1);
+
+	if (!block_offset)
+		return blocks_truncated << block_bits;
+
+	return ((blocks_truncated - 1) << block_bits) + block_offset;
+}
+
 /*
  * Calculate the range inside the folio that we actually need to read.
  */
@@ -263,7 +279,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		/* truncate len if we find any trailing uptodate block(s) */
 		while (++i <= last) {
 			if (ifs_block_is_uptodate(ifs, i)) {
-				plen -= (last - i + 1) * block_size;
+				plen -= iomap_bytes_to_truncate(*pos + plen,
+						block_bits, last - i + 1);
 				last = i - 1;
 				break;
 			}
@@ -279,7 +296,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
-			plen -= (last - end) * block_size;
+			plen -= iomap_bytes_to_truncate(*pos + plen, block_bits,
+					last - end);
 	}
 
 	*offp = poff;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 3/9] docs: document iomap writeback's iomap_finish_folio_write() requirement
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 1/9] iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 2/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 4/9] iomap: optimize pending async writeback accounting Joanne Koong
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Document that iomap_finish_folio_write() must be called after writeback
on the range completes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 Documentation/filesystems/iomap/operations.rst | 3 +++
 include/linux/iomap.h                          | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index c88205132039..4d30723be7fa 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -361,6 +361,9 @@ The fields are as follows:
     delalloc reservations to avoid having delalloc reservations for
     clean pagecache.
     This function must be supplied by the filesystem.
+    If this succeeds, iomap_finish_folio_write() must be called once writeback
+    completes for the range, regardless of whether the writeback succeeded or
+    failed.
 
   - ``writeback_submit``: Submit the previous built writeback context.
     Block based file systems should use the iomap_ioend_writeback_submit
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b1ac08c7474..a5032e456079 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -435,6 +435,10 @@ struct iomap_writeback_ops {
 	 * An existing mapping from a previous call to this method can be reused
 	 * by the file system if it is still valid.
 	 *
+	 * If this succeeds, iomap_finish_folio_write() must be called once
+	 * writeback completes for the range, regardless of whether the
+	 * writeback succeeded or failed.
+	 *
 	 * Returns the number of bytes processed or a negative errno.
 	 */
 	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 4/9] iomap: optimize pending async writeback accounting
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (2 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 3/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Pending writebacks must be accounted for to determine when all requests
have completed and writeback on the folio should be ended. Currently
this is done by atomically incrementing ifs->write_bytes_pending for
every range to be written back.

Instead, the number of atomic operations can be minimized by setting
ifs->write_bytes_pending to the folio size, internally tracking how many
bytes are written back asynchronously, and then after sending off all
the requests, decrementing ifs->write_bytes_pending by the number of
bytes not written back asynchronously. Now, for N ranges written back,
only N + 2 atomic operations are required instead of 2N + 2.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/fuse/file.c         |  4 +--
 fs/iomap/buffered-io.c | 58 +++++++++++++++++++++++++-----------------
 fs/iomap/ioend.c       |  2 --
 include/linux/iomap.h  |  2 --
 4 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8275b6681b9b..b343a6f37563 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1885,7 +1885,8 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
 		 * scope of the fi->lock alleviates xarray lock
 		 * contention and noticeably improves performance.
 		 */
-		iomap_finish_folio_write(inode, ap->folios[i], 1);
+		iomap_finish_folio_write(inode, ap->folios[i],
+					 ap->descs[i].length);
 
 	wake_up(&fi->page_waitq);
 }
@@ -2221,7 +2222,6 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		ap = &wpa->ia.ap;
 	}
 
-	iomap_start_folio_write(inode, folio, 1);
 	fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
 				      offset, len);
 	data->nr_bytes += len;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0eb439b523b1..1873a2f74883 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1641,16 +1641,25 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
 }
 EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
-		size_t len)
+static void iomap_writeback_init(struct inode *inode, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
 	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
-	if (ifs)
-		atomic_add(len, &ifs->write_bytes_pending);
+	if (ifs) {
+		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+		/*
+		 * Set this to the folio size. After processing the folio for
+		 * writeback in iomap_writeback_folio(), we'll subtract any
+		 * ranges not written back.
+		 *
+		 * We do this because otherwise, we would have to atomically
+		 * increment ifs->write_bytes_pending every time a range in the
+		 * folio needs to be written back.
+		 */
+		atomic_set(&ifs->write_bytes_pending, folio_size(folio));
+	}
 }
-EXPORT_SYMBOL_GPL(iomap_start_folio_write);
 
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len)
@@ -1667,7 +1676,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
 static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
-		bool *wb_pending)
+		size_t *bytes_submitted)
 {
 	do {
 		ssize_t ret;
@@ -1681,11 +1690,11 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		pos += ret;
 
 		/*
-		 * Holes are not be written back by ->writeback_range, so track
+		 * Holes are not written back by ->writeback_range, so track
 		 * if we did handle anything that is not a hole here.
 		 */
 		if (wpc->iomap.type != IOMAP_HOLE)
-			*wb_pending = true;
+			*bytes_submitted += ret;
 	} while (rlen);
 
 	return 0;
@@ -1756,7 +1765,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	u64 pos = folio_pos(folio);
 	u64 end_pos = pos + folio_size(folio);
 	u64 end_aligned = 0;
-	bool wb_pending = false;
+	size_t bytes_submitted = 0;
 	int error = 0;
 	u32 rlen;
 
@@ -1776,14 +1785,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 			iomap_set_range_dirty(folio, 0, end_pos - pos);
 		}
 
-		/*
-		 * Keep the I/O completion handler from clearing the writeback
-		 * bit until we have submitted all blocks by adding a bias to
-		 * ifs->write_bytes_pending, which is dropped after submitting
-		 * all blocks.
-		 */
-		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
-		iomap_start_folio_write(inode, folio, 1);
+		iomap_writeback_init(inode, folio);
 	}
 
 	/*
@@ -1798,13 +1800,13 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	end_aligned = round_up(end_pos, i_blocksize(inode));
 	while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
 		error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
-				&wb_pending);
+				&bytes_submitted);
 		if (error)
 			break;
 		pos += rlen;
 	}
 
-	if (wb_pending)
+	if (bytes_submitted)
 		wpc->nr_folios++;
 
 	/*
@@ -1822,12 +1824,20 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 * bit ourselves right after unlocking the page.
 	 */
 	if (ifs) {
-		if (atomic_dec_and_test(&ifs->write_bytes_pending))
-			folio_end_writeback(folio);
-	} else {
-		if (!wb_pending)
-			folio_end_writeback(folio);
+		/*
+		 * Subtract any bytes that were initially accounted to
+		 * write_bytes_pending but skipped for writeback.
+		 */
+		size_t bytes_not_submitted = folio_size(folio) -
+				bytes_submitted;
+
+		if (bytes_not_submitted)
+			iomap_finish_folio_write(inode, folio,
+					bytes_not_submitted);
+	} else if (!bytes_submitted) {
+		folio_end_writeback(folio);
 	}
+
 	mapping_set_error(inode->i_mapping, error);
 	return error;
 }
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index b49fa75eab26..86f44922ed3b 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -194,8 +194,6 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
 	if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
 		goto new_ioend;
 
-	iomap_start_folio_write(wpc->inode, folio, map_len);
-
 	/*
 	 * Clamp io_offset and io_size to the incore EOF so that ondisk
 	 * file size updates in the ioend completion are byte-accurate.
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a5032e456079..b49e47f069db 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -478,8 +478,6 @@ int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
 
 void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
 		int error);
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
-		size_t len);
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (3 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 4/9] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-17 20:46   ` Matthew Wilcox
  2025-11-11 19:36 ` [PATCH v4 6/9] iomap: simplify when reads can be skipped for writes Joanne Koong
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Instead of requiring that the caller calls iomap_finish_folio_read()
even if the ->read_folio_range() callback returns an error, account for
this internally in iomap instead, which makes the interface simpler and
makes it match writeback's ->read_folio_range() error handling
expectations.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/iomap/operations.rst          |  7 +--
 fs/fuse/file.c                                | 10 +--
 fs/iomap/buffered-io.c                        | 63 ++++++++++---------
 include/linux/iomap.h                         |  5 +-
 4 files changed, 41 insertions(+), 44 deletions(-)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 4d30723be7fa..64f4baf5750e 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -149,10 +149,9 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
 iomap calls these functions:
 
   - ``read_folio_range``: Called to read in the range. This must be provided
-    by the caller. The caller is responsible for calling
-    iomap_finish_folio_read() after reading in the folio range. This should be
-    done even if an error is encountered during the read. This returns 0 on
-    success or a negative error on failure.
+    by the caller. If this succeeds, iomap_finish_folio_read() must be called
+    after the range is read in, regardless of whether the read succeeded or
+    failed.
 
   - ``submit_read``: Submit any pending read requests. This function is
     optional.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b343a6f37563..7bcb650a9f26 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -922,13 +922,6 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
 
 	if (ctx->rac) {
 		ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
-		/*
-		 * If fuse_handle_readahead was successful, fuse_readpages_end
-		 * will do the iomap_finish_folio_read, else we need to call it
-		 * here
-		 */
-		if (ret)
-			iomap_finish_folio_read(folio, off, len, ret);
 	} else {
 		/*
 		 *  for non-readahead read requests, do reads synchronously
@@ -936,7 +929,8 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
 		 *  out-of-order reads
 		 */
 		ret = fuse_do_readfolio(file, folio, off, len);
-		iomap_finish_folio_read(folio, off, len, ret);
+		if (!ret)
+			iomap_finish_folio_read(folio, off, len, ret);
 	}
 	return ret;
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1873a2f74883..c82b5b24d4b3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -398,7 +398,8 @@ static void iomap_read_init(struct folio *folio)
 		 * has already finished reading in the entire folio.
 		 */
 		spin_lock_irq(&ifs->state_lock);
-		ifs->read_bytes_pending += len + 1;
+		WARN_ON_ONCE(ifs->read_bytes_pending != 0);
+		ifs->read_bytes_pending = len + 1;
 		spin_unlock_irq(&ifs->state_lock);
 	}
 }
@@ -414,43 +415,47 @@ static void iomap_read_init(struct folio *folio)
  */
 static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
 {
-	struct iomap_folio_state *ifs;
-
-	/*
-	 * If there are no bytes submitted, this means we are responsible for
-	 * unlocking the folio here, since no IO helper has taken ownership of
-	 * it.
-	 */
-	if (!bytes_submitted) {
-		folio_unlock(folio);
-		return;
-	}
+	struct iomap_folio_state *ifs = folio->private;
 
-	ifs = folio->private;
 	if (ifs) {
 		bool end_read, uptodate;
-		/*
-		 * Subtract any bytes that were initially accounted to
-		 * read_bytes_pending but skipped for IO.
-		 * The +1 accounts for the bias we added in iomap_read_init().
-		 */
-		size_t bytes_not_submitted = folio_size(folio) + 1 -
-				bytes_submitted;
 
 		spin_lock_irq(&ifs->state_lock);
-		ifs->read_bytes_pending -= bytes_not_submitted;
-		/*
-		 * If !ifs->read_bytes_pending, this means all pending reads
-		 * by the IO helper have already completed, which means we need
-		 * to end the folio read here. If ifs->read_bytes_pending != 0,
-		 * the IO helper will end the folio read.
-		 */
-		end_read = !ifs->read_bytes_pending;
+		if (!ifs->read_bytes_pending) {
+			WARN_ON_ONCE(bytes_submitted);
+			end_read = true;
+		} else {
+			/*
+			 * Subtract any bytes that were initially accounted to
+			 * read_bytes_pending but skipped for IO. The +1
+			 * accounts for the bias we added in iomap_read_init().
+			 */
+			size_t bytes_not_submitted = folio_size(folio) + 1 -
+					bytes_submitted;
+			ifs->read_bytes_pending -= bytes_not_submitted;
+			/*
+			 * If !ifs->read_bytes_pending, this means all pending
+			 * reads by the IO helper have already completed, which
+			 * means we need to end the folio read here. If
+			 * ifs->read_bytes_pending != 0, the IO helper will end
+			 * the folio read.
+			 */
+			end_read = !ifs->read_bytes_pending;
+		}
 		if (end_read)
 			uptodate = ifs_is_fully_uptodate(folio, ifs);
 		spin_unlock_irq(&ifs->state_lock);
 		if (end_read)
 			folio_end_read(folio, uptodate);
+	} else if (!bytes_submitted) {
+		/*
+		 * If there were no bytes submitted, this means we are
+		 * responsible for unlocking the folio here, since no IO helper
+		 * has taken ownership of it. If there were bytes submitted,
+		 * then the IO helper will end the read via
+		 * iomap_finish_folio_read().
+		 */
+		folio_unlock(folio);
 	}
 }
 
@@ -498,10 +503,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 		} else {
 			if (!*bytes_submitted)
 				iomap_read_init(folio);
-			*bytes_submitted += plen;
 			ret = ctx->ops->read_folio_range(iter, ctx, plen);
 			if (ret)
 				return ret;
+			*bytes_submitted += plen;
 		}
 
 		ret = iomap_iter_advance(iter, plen);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b49e47f069db..520e967cb501 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -495,9 +495,8 @@ struct iomap_read_ops {
 	/*
 	 * Read in a folio range.
 	 *
-	 * The caller is responsible for calling iomap_finish_folio_read() after
-	 * reading in the folio range. This should be done even if an error is
-	 * encountered during the read.
+	 * If this succeeds, iomap_finish_folio_read() must be called after the
+	 * range is read in, regardless of whether the read succeeded or failed.
 	 *
 	 * Returns 0 on success or a negative error on failure.
 	 */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 6/9] iomap: simplify when reads can be skipped for writes
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (4 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Currently, the logic for skipping the read range for a write is

if (!(iter->flags & IOMAP_UNSHARE) &&
    (from <= poff || from >= poff + plen) &&
    (to <= poff || to >= poff + plen))

which breaks down to skipping the read if any of these are true:
a) from <= poff && to <= poff
b) from <= poff && to >= poff + plen
c) from >= poff + plen && to <= poff
d) from >= poff + plen && to >= poff + plen

This can be simplified to
if (!(iter->flags & IOMAP_UNSHARE) && from <= poff && to >= poff + plen)

from the following reasoning:

a) from <= poff && to <= poff
This reduces to 'to <= poff' since it is guaranteed that 'from <= to'
(since to = from + len). It is not possible for 'from <= to' to be true
here because we only reach here if plen > 0 (thanks to the preceding 'if
(plen == 0)' check that would break us out of the loop). If 'to <=
poff', plen would have to be 0 since poff and plen get adjusted in
lockstep for uptodate blocks. This means we can eliminate this check.

c) from >= poff + plen && to <= poff
This is not possible since 'from <= to' and 'plen > 0'. We can eliminate
this check.

d) from >= poff + plen && to >= poff + plen
This reduces to 'from >= poff + plen' since 'from <= to'.
It is not possible for 'from >= poff + plen' to be true here. We only
reach here if plen > 0 and for writes, poff and plen will always be
block-aligned, which means poff <= from < poff + plen. We can eliminate
this check.

The only valid check is b) from <= poff && to >= poff + plen.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c82b5b24d4b3..17449ea13420 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -758,9 +758,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
 		if (plen == 0)
 			break;
 
-		if (!(iter->flags & IOMAP_UNSHARE) &&
-		    (from <= poff || from >= poff + plen) &&
-		    (to <= poff || to >= poff + plen))
+		/*
+		 * If the read range will be entirely overwritten by the write,
+		 * we can skip having to zero/read it in.
+		 */
+		if (!(iter->flags & IOMAP_UNSHARE) && from <= poff &&
+		    to >= poff + plen)
 			continue;
 
 		if (iomap_block_needs_zeroing(iter, block_start)) {
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (5 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 6/9] iomap: simplify when reads can be skipped for writes Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-19  3:40   ` Matthew Wilcox
  2025-11-11 19:36 ` [PATCH v4 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Use loff_t instead of u64 for file positions and offsets to be
consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
signed for historical reasons but this has no practical effect.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 block/fops.c           |  3 ++-
 fs/fuse/file.c         |  4 ++--
 fs/gfs2/bmap.c         |  3 ++-
 fs/iomap/buffered-io.c | 17 +++++++++--------
 fs/xfs/xfs_aops.c      |  8 ++++----
 fs/zonefs/file.c       |  3 ++-
 include/linux/iomap.h  |  4 ++--
 7 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 4dad9c2d5796..d2b96143b40f 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -550,7 +550,8 @@ static void blkdev_readahead(struct readahead_control *rac)
 }
 
 static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+		struct folio *folio, loff_t offset, unsigned int len,
+		loff_t end_pos)
 {
 	loff_t isize = i_size_read(wpc->inode);
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7bcb650a9f26..6d5e44cbd62e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2168,8 +2168,8 @@ static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
 }
 
 static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
-					  struct folio *folio, u64 pos,
-					  unsigned len, u64 end_pos)
+					  struct folio *folio, loff_t pos,
+					  unsigned len, loff_t end_pos)
 {
 	struct fuse_fill_wb_data *data = wpc->wb_ctx;
 	struct fuse_writepage_args *wpa = data->wpa;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 131091520de6..2b61b057151b 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2473,7 +2473,8 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
 }
 
 static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+		struct folio *folio, loff_t offset, unsigned int len,
+		loff_t end_pos)
 {
 	if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(wpc->inode))))
 		return -EIO;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 17449ea13420..98c4665addb2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -86,7 +86,8 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
 }
 
 static unsigned ifs_find_dirty_range(struct folio *folio,
-		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+		struct iomap_folio_state *ifs, loff_t *range_start,
+		loff_t range_end)
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned start_blk =
@@ -110,8 +111,8 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
 	return nblks << inode->i_blkbits;
 }
 
-static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
-		u64 range_end)
+static unsigned iomap_find_dirty_range(struct folio *folio, loff_t *range_start,
+		loff_t range_end)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
@@ -1683,7 +1684,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
 static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
+		struct folio *folio, loff_t pos, u32 rlen, loff_t end_pos,
 		size_t *bytes_submitted)
 {
 	do {
@@ -1715,7 +1716,7 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
  * i_size, adjust end_pos and zero all data beyond i_size.
  */
 static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
-		u64 *end_pos)
+		loff_t *end_pos)
 {
 	u64 isize = i_size_read(inode);
 
@@ -1770,9 +1771,9 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = wpc->inode;
-	u64 pos = folio_pos(folio);
-	u64 end_pos = pos + folio_size(folio);
-	u64 end_aligned = 0;
+	loff_t pos = folio_pos(folio);
+	loff_t end_pos = pos + folio_size(folio);
+	loff_t end_aligned = 0;
 	size_t bytes_submitted = 0;
 	int error = 0;
 	u32 rlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0c2ed00733f2..593a34832116 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -480,9 +480,9 @@ static ssize_t
 xfs_writeback_range(
 	struct iomap_writepage_ctx *wpc,
 	struct folio		*folio,
-	u64			offset,
+	loff_t			offset,
 	unsigned int		len,
-	u64			end_pos)
+	loff_t			end_pos)
 {
 	ssize_t			ret;
 
@@ -630,9 +630,9 @@ static ssize_t
 xfs_zoned_writeback_range(
 	struct iomap_writepage_ctx *wpc,
 	struct folio		*folio,
-	u64			offset,
+	loff_t			offset,
 	unsigned int		len,
-	u64			end_pos)
+	loff_t			end_pos)
 {
 	ssize_t			ret;
 
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index c1e5e30e90a0..d748ed99ac2d 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -126,7 +126,8 @@ static void zonefs_readahead(struct readahead_control *rac)
  * which implies that the page range can only be within the fixed inode size.
  */
 static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 offset, unsigned len, u64 end_pos)
+		struct folio *folio, loff_t offset, unsigned len,
+		loff_t end_pos)
 {
 	struct zonefs_zone *z = zonefs_inode_zone(wpc->inode);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 520e967cb501..351c7bd9653d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -442,8 +442,8 @@ struct iomap_writeback_ops {
 	 * Returns the number of bytes processed or a negative errno.
 	 */
 	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
-			struct folio *folio, u64 pos, unsigned int len,
-			u64 end_pos);
+			struct folio *folio, loff_t pos, unsigned int len,
+			loff_t end_pos);
 
 	/*
 	 * Submit a writeback context previously build up by ->writeback_range.
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 8/9] iomap: use find_next_bit() for dirty bitmap scanning
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (6 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-11 19:36 ` [PATCH v4 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
  2025-11-12 10:34 ` [PATCH v4 0/9] iomap: buffered io changes Christian Brauner
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

Use find_next_bit()/find_next_zero_bit() for iomap dirty bitmap
scanning. This uses __ffs() internally and is more efficient for
finding the next dirty or clean bit than iterating through the bitmap
range testing every bit.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 61 ++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 98c4665addb2..1a7146d6ba49 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -76,13 +76,34 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		folio_mark_uptodate(folio);
 }
 
-static inline bool ifs_block_is_dirty(struct folio *folio,
-		struct iomap_folio_state *ifs, int block)
+/*
+ * Find the next dirty block in the folio. end_blk is inclusive.
+ * If no dirty block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_dirty_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
 {
+	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+	unsigned int blks = i_blocks_per_folio(inode, folio);
+
+	return find_next_bit(ifs->state, blks + end_blk + 1,
+			blks + start_blk) - blks;
+}
+
+/*
+ * Find the next clean block in the folio. end_blk is inclusive.
+ * If no clean block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_clean_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks = i_blocks_per_folio(inode, folio);
 
-	return test_bit(block + blks_per_folio, ifs->state);
+	return find_next_zero_bit(ifs->state, blks + end_blk + 1,
+			blks + start_blk) - blks;
 }
 
 static unsigned ifs_find_dirty_range(struct folio *folio,
@@ -94,18 +115,17 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
 		offset_in_folio(folio, *range_start) >> inode->i_blkbits;
 	unsigned end_blk = min_not_zero(
 		offset_in_folio(folio, range_end) >> inode->i_blkbits,
-		i_blocks_per_folio(inode, folio));
-	unsigned nblks = 1;
-
-	while (!ifs_block_is_dirty(folio, ifs, start_blk))
-		if (++start_blk == end_blk)
-			return 0;
+		i_blocks_per_folio(inode, folio)) - 1;
+	unsigned nblks;
 
-	while (start_blk + nblks < end_blk) {
-		if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
-			break;
-		nblks++;
-	}
+	start_blk = ifs_next_dirty_block(folio, start_blk, end_blk);
+	if (start_blk > end_blk)
+		return 0;
+	if (start_blk == end_blk)
+		nblks = 1;
+	else
+		nblks = ifs_next_clean_block(folio, start_blk + 1, end_blk) -
+				start_blk;
 
 	*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
 	return nblks << inode->i_blkbits;
@@ -1167,7 +1187,7 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 		struct folio *folio, loff_t start_byte, loff_t end_byte,
 		struct iomap *iomap, iomap_punch_t punch)
 {
-	unsigned int first_blk, last_blk, i;
+	unsigned int first_blk, last_blk;
 	loff_t last_byte;
 	u8 blkbits = inode->i_blkbits;
 	struct iomap_folio_state *ifs;
@@ -1186,10 +1206,11 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 			folio_pos(folio) + folio_size(folio) - 1);
 	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
 	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
-	for (i = first_blk; i <= last_blk; i++) {
-		if (!ifs_block_is_dirty(folio, ifs, i))
-			punch(inode, folio_pos(folio) + (i << blkbits),
-				    1 << blkbits, iomap);
+	while ((first_blk = ifs_next_clean_block(folio, first_blk, last_blk))
+		       <= last_blk) {
+		punch(inode, folio_pos(folio) + (first_blk << blkbits),
+				1 << blkbits, iomap);
+		first_blk++;
 	}
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 9/9] iomap: use find_next_bit() for uptodate bitmap scanning
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (7 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
@ 2025-11-11 19:36 ` Joanne Koong
  2025-11-12 10:34 ` [PATCH v4 0/9] iomap: buffered io changes Christian Brauner
  9 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-11 19:36 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

Use find_next_bit()/find_next_zero_bit() for iomap uptodate bitmap
scanning. This uses __ffs() internally and is more efficient for
finding the next uptodate or non-uptodate bit than iterating through the
the bitmap range testing every bit.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c | 52 ++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1a7146d6ba49..0475d949e5a0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -38,10 +38,28 @@ static inline bool ifs_is_fully_uptodate(struct folio *folio,
 	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
 }
 
-static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
-		unsigned int block)
+/*
+ * Find the next uptodate block in the folio. end_blk is inclusive.
+ * If no uptodate block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_uptodate_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
 {
-	return test_bit(block, ifs->state);
+	struct iomap_folio_state *ifs = folio->private;
+
+	return find_next_bit(ifs->state, end_blk + 1, start_blk);
+}
+
+/*
+ * Find the next non-uptodate block in the folio. end_blk is inclusive.
+ * If no non-uptodate block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_nonuptodate_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	return find_next_zero_bit(ifs->state, end_blk + 1, start_blk);
 }
 
 static bool ifs_set_range_uptodate(struct folio *folio,
@@ -278,14 +296,11 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	 * to avoid reading in already uptodate ranges.
 	 */
 	if (ifs) {
-		unsigned int i, blocks_skipped;
+		unsigned int next, blocks_skipped;
 
-		/* move forward for each leading block marked uptodate */
-		for (i = first; i <= last; i++)
-			if (!ifs_block_is_uptodate(ifs, i))
-				break;
+		next = ifs_next_nonuptodate_block(folio, first, last);
+		blocks_skipped = next - first;
 
-		blocks_skipped = i - first;
 		if (blocks_skipped) {
 			unsigned long block_offset = *pos & (block_size - 1);
 			unsigned bytes_skipped =
@@ -295,15 +310,15 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 			poff += bytes_skipped;
 			plen -= bytes_skipped;
 		}
-		first = i;
+		first = next;
 
 		/* truncate len if we find any trailing uptodate block(s) */
-		while (++i <= last) {
-			if (ifs_block_is_uptodate(ifs, i)) {
+		if (++next <= last) {
+			next = ifs_next_uptodate_block(folio, next, last);
+			if (next <= last) {
 				plen -= iomap_bytes_to_truncate(*pos + plen,
-						block_bits, last - i + 1);
-				last = i - 1;
-				break;
+						block_bits, last - next + 1);
+				last = next - 1;
 			}
 		}
 	}
@@ -640,7 +655,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned first, last, i;
+	unsigned first, last;
 
 	if (!ifs)
 		return false;
@@ -652,10 +667,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	first = from >> inode->i_blkbits;
 	last = (from + count - 1) >> inode->i_blkbits;
 
-	for (i = first; i <= last; i++)
-		if (!ifs_block_is_uptodate(ifs, i))
-			return false;
-	return true;
+	return ifs_next_nonuptodate_block(folio, first, last) > last;
 }
 EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 0/9] iomap: buffered io changes
  2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
                   ` (8 preceding siblings ...)
  2025-11-11 19:36 ` [PATCH v4 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
@ 2025-11-12 10:34 ` Christian Brauner
  9 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2025-11-12 10:34 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christian Brauner, hch, djwong, bfoster, linux-fsdevel,
	kernel-team

On Tue, 11 Nov 2025 11:36:49 -0800, Joanne Koong wrote:
> This series is on top of the vfs-6.19.iomap branch (head commit ca3557a68684)
> in Christian's vfs tree.
> 
> Thanks,
> Joanne
> 
> Changelog
> ---------
> 
> [...]

Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
Patches in the vfs-6.19.iomap branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.iomap

[1/9] iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted
      https://git.kernel.org/vfs/vfs/c/a0f1cabe294c
[2/9] iomap: account for unaligned end offsets when truncating read range
      https://git.kernel.org/vfs/vfs/c/9d875e0eef8e
[3/9] docs: document iomap writeback's iomap_finish_folio_write() requirement
      https://git.kernel.org/vfs/vfs/c/7e6cea5ae2f5
[4/9] iomap: optimize pending async writeback accounting
      https://git.kernel.org/vfs/vfs/c/6b1fd2281fb0
[5/9] iomap: simplify ->read_folio_range() error handling for reads
      https://git.kernel.org/vfs/vfs/c/f8eaf79406fe
[6/9] iomap: simplify when reads can be skipped for writes
      https://git.kernel.org/vfs/vfs/c/a298febc47e0
[7/9] iomap: use loff_t for file positions and offsets in writeback code
      https://git.kernel.org/vfs/vfs/c/b94488503277
[8/9] iomap: use find_next_bit() for dirty bitmap scanning
      https://git.kernel.org/vfs/vfs/c/e46cdbfa2029
[9/9] iomap: use find_next_bit() for uptodate bitmap scanning
      https://git.kernel.org/vfs/vfs/c/608f00b56c31

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-11 19:36 ` [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-11-17 20:46   ` Matthew Wilcox
  2025-11-17 23:26     ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2025-11-17 20:46 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 11, 2025 at 11:36:54AM -0800, Joanne Koong wrote:
> Instead of requiring that the caller calls iomap_finish_folio_read()
> even if the ->read_folio_range() callback returns an error, account for
> this internally in iomap instead, which makes the interface simpler and
> makes it match writeback's ->read_folio_range() error handling
> expectations.

Bisection of next-20251117 leads to this patch (commit
f8eaf79406fe9415db0e7a5c175b50cb01265199)

Here's the failure:

generic/008       run fstests generic/008 at 2025-11-17 20:40:31
page: refcount:5 mapcount:0 mapping:00000000101f858e index:0x4 pfn:0x12d4f8
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff8881120315c0
aops:xfs_address_space_operations ino:83 dentry name(?):"008.2222"
flags: 0x8000000000014069(locked|uptodate|lru|private|head|reclaim|zone=2)
raw: 8000000000014069 ffffea0004b69f48 ffffea0004a0a508 ffff8881139d83f0
raw: 0000000000000004 ffff8881070b4420 00000005ffffffff ffff8881120315c0
head: 8000000000014069 ffffea0004b69f48 ffffea0004a0a508 ffff8881139d83f0
head: 0000000000000004 ffff8881070b4420 00000005ffffffff ffff8881120315c0
head: 8000000000000202 ffffea0004b53e01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000004
page dumped because: VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:1538!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 2607 Comm: xfs_io Not tainted 6.18.0-rc1-ktest-00033-gf8eaf79406fe #151 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:folio_end_read+0x68/0x70
Code: 8e e9 04 00 0f 0b 48 8b 07 48 89 c2 48 c1 ea 03 a8 08 74 00 83 e2 01 b8 09 00 00 00 74 c2 48 c7 c6 a0 6e 3e 82 e8 68 e9 04 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff888112333870 EFLAGS: 00010286
RAX: 000000000000004b RBX: ffffea0004b53e00 RCX: 0000000000000027
RDX: ffff888179657c08 RSI: 0000000000000001 RDI: ffff888179657c00
RBP: ffff888112333870 R08: 00000000fffbffff R09: ffff88817f1fdfa8
R10: 0000000000000003 R11: 0000000000000000 R12: ffff8881070b4420
R13: 0000000000000001 R14: ffff888112333b28 R15: ffffea0004b53e00
FS:  00007f7b1ad91880(0000) GS:ffff8881f6b0b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d7455da018 CR3: 00000001111ac000 CR4: 0000000000750eb0
PKRU: 55555554
Call Trace:
 <TASK>
 iomap_read_end+0xac/0x130
 iomap_readahead+0x1e1/0x330
 xfs_vm_readahead+0x3d/0x50
 read_pages+0x69/0x270
 page_cache_ra_order+0x2c2/0x4d0
 page_cache_async_ra+0x204/0x3c0
 filemap_readahead.isra.0+0x67/0x80
 filemap_get_pages+0x376/0x8a0
 ? find_held_lock+0x31/0x90
 ? try_charge_memcg+0x21a/0x750
 ? lock_acquire+0xb2/0x290
 ? __memcg_kmem_charge_page+0x160/0x3c0
 filemap_read+0x106/0x4c0
 ? __might_fault+0x35/0x80
 generic_file_read_iter+0xbc/0x110
 xfs_file_buffered_read+0xa9/0x110
 xfs_file_read_iter+0x82/0xf0
 vfs_read+0x277/0x360
 __x64_sys_pread64+0x7a/0xa0
 x64_sys_call+0x1b03/0x1da0
 do_syscall_64+0x6a/0x2e0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f7b1b0efd07
Code: 08 89 3c 24 48 89 4c 24 18 e8 55 76 fa ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 a5 76 fa ff 48 8b
RSP: 002b:00007ffc4a4b8080 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f7b1b0efd07
RDX: 0000000000001000 RSI: 000055d7455d8000 RDI: 0000000000000003
RBP: 0000000000001000 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000001
R13: 0000000000020000 R14: 000000000001f000 R15: 0000000000001000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:folio_end_read+0x68/0x70
Code: 8e e9 04 00 0f 0b 48 8b 07 48 89 c2 48 c1 ea 03 a8 08 74 00 83 e2 01 b8 09 00 00 00 74 c2 48 c7 c6 a0 6e 3e 82 e8 68 e9 04 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff888112333870 EFLAGS: 00010286
RAX: 000000000000004b RBX: ffffea0004b53e00 RCX: 0000000000000027
RDX: ffff888179657c08 RSI: 0000000000000001 RDI: ffff888179657c00
RBP: ffff888112333870 R08: 00000000fffbffff R09: ffff88817f1fdfa8
R10: 0000000000000003 R11: 0000000000000000 R12: ffff8881070b4420
R13: 0000000000000001 R14: ffff888112333b28 R15: ffffea0004b53e00
FS:  00007f7b1ad91880(0000) GS:ffff8881f6b0b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d7455da018 CR3: 00000001111ac000 CR4: 0000000000750eb0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception ]---

You're calling folio_end_read(folio, true) for a folio which is already
marked uptodate!  I haven't looked through your patch to see what the
problem is yet.  Very reproducible, you only have to run generic/008
with a 1kB blocksize XFS.  And CONFIG_VM_DEBUG set, of course.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-17 20:46   ` Matthew Wilcox
@ 2025-11-17 23:26     ` Joanne Koong
  0 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2025-11-17 23:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: brauner, hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Mon, Nov 17, 2025 at 12:46 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 11, 2025 at 11:36:54AM -0800, Joanne Koong wrote:
> > Instead of requiring that the caller calls iomap_finish_folio_read()
> > even if the ->read_folio_range() callback returns an error, account for
> > this internally in iomap instead, which makes the interface simpler and
> > makes it match writeback's ->read_folio_range() error handling
> > expectations.
>
> Bisection of next-20251117 leads to this patch (commit
> f8eaf79406fe9415db0e7a5c175b50cb01265199)
>
> Here's the failure:
>
> generic/008       run fstests generic/008 at 2025-11-17 20:40:31
> page: refcount:5 mapcount:0 mapping:00000000101f858e index:0x4 pfn:0x12d4f8
> head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> memcg:ffff8881120315c0
> aops:xfs_address_space_operations ino:83 dentry name(?):"008.2222"
> flags: 0x8000000000014069(locked|uptodate|lru|private|head|reclaim|zone=2)
> raw: 8000000000014069 ffffea0004b69f48 ffffea0004a0a508 ffff8881139d83f0
> raw: 0000000000000004 ffff8881070b4420 00000005ffffffff ffff8881120315c0
> head: 8000000000014069 ffffea0004b69f48 ffffea0004a0a508 ffff8881139d83f0
> head: 0000000000000004 ffff8881070b4420 00000005ffffffff ffff8881120315c0
> head: 8000000000000202 ffffea0004b53e01 00000000ffffffff 00000000ffffffff
> head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000004
> page dumped because: VM_BUG_ON_FOLIO(success && folio_test_uptodate(folio))
> ------------[ cut here ]------------
> kernel BUG at mm/filemap.c:1538!
> Oops: invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 1 UID: 0 PID: 2607 Comm: xfs_io Not tainted 6.18.0-rc1-ktest-00033-gf8eaf79406fe #151 NONE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:folio_end_read+0x68/0x70
> Code: 8e e9 04 00 0f 0b 48 8b 07 48 89 c2 48 c1 ea 03 a8 08 74 00 83 e2 01 b8 09 00 00 00 74 c2 48 c7 c6 a0 6e 3e 82 e8 68 e9 04 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> RSP: 0018:ffff888112333870 EFLAGS: 00010286
> RAX: 000000000000004b RBX: ffffea0004b53e00 RCX: 0000000000000027
> RDX: ffff888179657c08 RSI: 0000000000000001 RDI: ffff888179657c00
> RBP: ffff888112333870 R08: 00000000fffbffff R09: ffff88817f1fdfa8
> R10: 0000000000000003 R11: 0000000000000000 R12: ffff8881070b4420
> R13: 0000000000000001 R14: ffff888112333b28 R15: ffffea0004b53e00
> FS:  00007f7b1ad91880(0000) GS:ffff8881f6b0b000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055d7455da018 CR3: 00000001111ac000 CR4: 0000000000750eb0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  iomap_read_end+0xac/0x130
>  iomap_readahead+0x1e1/0x330
>  xfs_vm_readahead+0x3d/0x50
>  read_pages+0x69/0x270
>  page_cache_ra_order+0x2c2/0x4d0
>  page_cache_async_ra+0x204/0x3c0
>  filemap_readahead.isra.0+0x67/0x80
>  filemap_get_pages+0x376/0x8a0
>  ? find_held_lock+0x31/0x90
>  ? try_charge_memcg+0x21a/0x750
>  ? lock_acquire+0xb2/0x290
>  ? __memcg_kmem_charge_page+0x160/0x3c0
>  filemap_read+0x106/0x4c0
>  ? __might_fault+0x35/0x80
>  generic_file_read_iter+0xbc/0x110
>  xfs_file_buffered_read+0xa9/0x110
>  xfs_file_read_iter+0x82/0xf0
>  vfs_read+0x277/0x360
>  __x64_sys_pread64+0x7a/0xa0
>  x64_sys_call+0x1b03/0x1da0
>  do_syscall_64+0x6a/0x2e0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f7b1b0efd07
> Code: 08 89 3c 24 48 89 4c 24 18 e8 55 76 fa ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 a5 76 fa ff 48 8b
> RSP: 002b:00007ffc4a4b8080 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
> RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f7b1b0efd07
> RDX: 0000000000001000 RSI: 000055d7455d8000 RDI: 0000000000000003
> RBP: 0000000000001000 R08: 0000000000000000 R09: 0000000000000003
> R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000001
> R13: 0000000000020000 R14: 000000000001f000 R15: 0000000000001000
>  </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:folio_end_read+0x68/0x70
> Code: 8e e9 04 00 0f 0b 48 8b 07 48 89 c2 48 c1 ea 03 a8 08 74 00 83 e2 01 b8 09 00 00 00 74 c2 48 c7 c6 a0 6e 3e 82 e8 68 e9 04 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> RSP: 0018:ffff888112333870 EFLAGS: 00010286
> RAX: 000000000000004b RBX: ffffea0004b53e00 RCX: 0000000000000027
> RDX: ffff888179657c08 RSI: 0000000000000001 RDI: ffff888179657c00
> RBP: ffff888112333870 R08: 00000000fffbffff R09: ffff88817f1fdfa8
> R10: 0000000000000003 R11: 0000000000000000 R12: ffff8881070b4420
> R13: 0000000000000001 R14: ffff888112333b28 R15: ffffea0004b53e00
> FS:  00007f7b1ad91880(0000) GS:ffff8881f6b0b000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055d7455da018 CR3: 00000001111ac000 CR4: 0000000000750eb0
> PKRU: 55555554
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> You're calling folio_end_read(folio, true) for a folio which is already
> marked uptodate!  I haven't looked through your patch to see what the
> problem is yet.  Very reproducible, you only have to run generic/008
> with a 1kB blocksize XFS.  And CONFIG_VM_DEBUG set, of course.

Ok, I see how we're getting into this state. For the case where all
the blocks get zeroed instead of read in, the
iomap_set_range_uptodate() call could have also called
folio_mark_uptodate() already.

This fixes the issue:
+++ b/fs/iomap/buffered-io.c
@@ -423,7 +423,9 @@ static void iomap_read_end(struct folio *folio,
size_t bytes_submitted)
                spin_lock_irq(&ifs->state_lock);
                if (!ifs->read_bytes_pending) {
                        WARN_ON_ONCE(bytes_submitted);
-                       end_read = true;
+                       spin_unlock_irq(&ifs->state_lock);
+                       folio_unlock(folio);
+                       return;
                } else {

I'll submit this fix to Christian's branch.

Thanks for giving the repro.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-11 19:36 ` [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-11-19  3:40   ` Matthew Wilcox
  2025-11-19 18:10     ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2025-11-19  3:40 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 11, 2025 at 11:36:56AM -0800, Joanne Koong wrote:
> Use loff_t instead of u64 for file positions and offsets to be
> consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
> signed for historical reasons but this has no practical effect.

generic/303       run fstests generic/303 at 2025-11-19 03:27:51
XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1569
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 8 UID: 0 PID: 2422 Comm: cp Not tainted 6.18.0-rc1-ktest-00035-gb94488503277 #169 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:assfail+0x3c/0x46
Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
PKRU: 55555554
Call Trace:
 <TASK>
 xfs_reflink_remap_blocks+0x259/0x450
 xfs_file_remap_range+0xe9/0x3d0
 vfs_clone_file_range+0xde/0x460
 ioctl_file_clone+0x50/0xc0
 __x64_sys_ioctl+0x619/0x9d0
 ? do_sys_openat2+0x99/0xd0
 x64_sys_call+0xed0/0x1da0
 do_syscall_64+0x6a/0x2e0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fc4cd34d37b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffeb4734050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc4cd34d37b
RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
RBP: 00007ffeb4734490 R08: 00007ffeb4734660 R09: 0000000000000002
R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000008000 R15: 0000000000000002
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:assfail+0x3c/0x46
Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception ]---


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-19  3:40   ` Matthew Wilcox
@ 2025-11-19 18:10     ` Joanne Koong
  2025-11-19 18:27       ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2025-11-19 18:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: brauner, hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 18, 2025 at 7:40 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 11, 2025 at 11:36:56AM -0800, Joanne Koong wrote:
> > Use loff_t instead of u64 for file positions and offsets to be
> > consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
> > signed for historical reasons but this has no practical effect.
>
> generic/303       run fstests generic/303 at 2025-11-19 03:27:51
> XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1569
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:102!
> Oops: invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 8 UID: 0 PID: 2422 Comm: cp Not tainted 6.18.0-rc1-ktest-00035-gb94488503277 #169 NONE
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:assfail+0x3c/0x46
> Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
> RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
> RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
> RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
> RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
> R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
> R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
> FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  xfs_reflink_remap_blocks+0x259/0x450
>  xfs_file_remap_range+0xe9/0x3d0
>  vfs_clone_file_range+0xde/0x460
>  ioctl_file_clone+0x50/0xc0
>  __x64_sys_ioctl+0x619/0x9d0
>  ? do_sys_openat2+0x99/0xd0
>  x64_sys_call+0xed0/0x1da0
>  do_syscall_64+0x6a/0x2e0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7fc4cd34d37b
> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> RSP: 002b:00007ffeb4734050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc4cd34d37b
> RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
> RBP: 00007ffeb4734490 R08: 00007ffeb4734660 R09: 0000000000000002
> R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000001
> R13: 0000000000000000 R14: 0000000000008000 R15: 0000000000000002
>  </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:assfail+0x3c/0x46
> Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
> RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
> RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
> RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
> RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
> R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
> R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
> FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
> PKRU: 55555554
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Fatal exception ]---
>

First off, it's becoming clear to me that I didn't test this patchset
adequately enough. I had run xfstests on fuse but didn't run it on
XFS. My apologies for that, I should have done that and caught these
bugs myself, and will certainly do so next time.

For this test failure, it's because the change from u64 to loff_t is
overflowing loff_t on xfs. The failure is coming from this line change
in particular:

-static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
- u64 range_end)
+static unsigned iomap_find_dirty_range(struct folio *folio, loff_t
*range_start,
+ loff_t range_end)

which is called when writing back the folio (in iomap_writeback_folio()).

I added some printks and it's overflowing because we are trying to
write back a 4096-byte folio starting at position 9223372036854771712
(2^63 - 4096) in the file which results in an overflowed end_pos of
9223372036854775808 (2^63) when calculating folio_pos + folio_size.

I'm assuming XFS uses these large folio positions as a sentinel/marker
and that it's not actually a folio at position 9223372036854771712,
but either way, this patch doesn't seem viable with how XFS currently
works and I think it needs to get dropped.

I'm going to run the rest of the xfstests suite on XFS for this
patchset series to verify there are no other issues.

Thanks,
Joanne

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-19 18:10     ` Joanne Koong
@ 2025-11-19 18:27       ` Darrick J. Wong
  2025-11-19 19:17         ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2025-11-19 18:27 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Matthew Wilcox, brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Wed, Nov 19, 2025 at 10:10:40AM -0800, Joanne Koong wrote:
> On Tue, Nov 18, 2025 at 7:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Nov 11, 2025 at 11:36:56AM -0800, Joanne Koong wrote:
> > > Use loff_t instead of u64 for file positions and offsets to be
> > > consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
> > > signed for historical reasons but this has no practical effect.
> >
> > generic/303       run fstests generic/303 at 2025-11-19 03:27:51
> > XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1569
> > ------------[ cut here ]------------
> > kernel BUG at fs/xfs/xfs_message.c:102!
> > Oops: invalid opcode: 0000 [#1] SMP NOPTI
> > CPU: 8 UID: 0 PID: 2422 Comm: cp Not tainted 6.18.0-rc1-ktest-00035-gb94488503277 #169 NONE
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > RIP: 0010:assfail+0x3c/0x46
> > Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
> > RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
> > RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
> > RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
> > RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
> > R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
> > R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
> > FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
> > PKRU: 55555554
> > Call Trace:
> >  <TASK>
> >  xfs_reflink_remap_blocks+0x259/0x450
> >  xfs_file_remap_range+0xe9/0x3d0
> >  vfs_clone_file_range+0xde/0x460
> >  ioctl_file_clone+0x50/0xc0
> >  __x64_sys_ioctl+0x619/0x9d0
> >  ? do_sys_openat2+0x99/0xd0
> >  x64_sys_call+0xed0/0x1da0
> >  do_syscall_64+0x6a/0x2e0
> >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > RIP: 0033:0x7fc4cd34d37b
> > Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> > RSP: 002b:00007ffeb4734050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc4cd34d37b
> > RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
> > RBP: 00007ffeb4734490 R08: 00007ffeb4734660 R09: 0000000000000002
> > R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000001
> > R13: 0000000000000000 R14: 0000000000008000 R15: 0000000000000002
> >  </TASK>
> > Modules linked in:
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:assfail+0x3c/0x46
> > Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
> > RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
> > RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
> > RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
> > RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
> > R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
> > R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
> > FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
> > PKRU: 55555554
> > Kernel panic - not syncing: Fatal exception
> > Kernel Offset: disabled
> > ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> 
> First off, it's becoming clear to me that I didn't test this patchset
> adequately enough. I had run xfstests on fuse but didn't run it on
> XFS. My apologies for that, I should have done that and caught these
> bugs myself, and will certainly do so next time.
> 
> For this test failure, it's because the change from u64 to loff_t is
> overflowing loff_t on xfs. The failure is coming from this line change
> in particular:
> 
> -static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
> - u64 range_end)
> +static unsigned iomap_find_dirty_range(struct folio *folio, loff_t
> *range_start,
> + loff_t range_end)
> 
> which is called when writing back the folio (in iomap_writeback_folio()).
> 
> I added some printks and it's overflowing because we are trying to
> write back a 4096-byte folio starting at position 9223372036854771712
> (2^63 - 4096) in the file which results in an overflowed end_pos of
> 9223372036854775808 (2^63) when calculating folio_pos + folio_size.
> 
> I'm assuming XFS uses these large folio positions as a sentinel/marker
> and that it's not actually a folio at position 9223372036854771712,
> but either way, this patch doesn't seem viable with how XFS currently
> works and I think it needs to get dropped.

xfs supports 9223372036854775807-byte files, so 0x7FFFFFFFFFFFF000
is a valid location for a folio.

--D

> I'm going to run the rest of the xfstests suite on XFS for this
> patchset series to verify there are no other issues.
> 
> Thanks,
> Joanne

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-19 18:27       ` Darrick J. Wong
@ 2025-11-19 19:17         ` Joanne Koong
  2025-11-19 19:35           ` Matthew Wilcox
  0 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2025-11-19 19:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox, brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Wed, Nov 19, 2025 at 10:27 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, Nov 19, 2025 at 10:10:40AM -0800, Joanne Koong wrote:
> > On Tue, Nov 18, 2025 at 7:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Nov 11, 2025 at 11:36:56AM -0800, Joanne Koong wrote:
> > > > Use loff_t instead of u64 for file positions and offsets to be
> > > > consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
> > > > signed for historical reasons but this has no practical effect.
> > >
> > > generic/303       run fstests generic/303 at 2025-11-19 03:27:51
> > > XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1569
> > > ------------[ cut here ]------------
> > > kernel BUG at fs/xfs/xfs_message.c:102!
> > > Oops: invalid opcode: 0000 [#1] SMP NOPTI
> > > CPU: 8 UID: 0 PID: 2422 Comm: cp Not tainted 6.18.0-rc1-ktest-00035-gb94488503277 #169 NONE
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > > RIP: 0010:assfail+0x3c/0x46
> > > Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
> > > RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
> > > RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
> > > RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
> > > RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
> > > R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
> > > R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
> > > FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
> > > PKRU: 55555554
> > > Call Trace:
> > >  <TASK>
> > >  xfs_reflink_remap_blocks+0x259/0x450
> > >  xfs_file_remap_range+0xe9/0x3d0
> > >  vfs_clone_file_range+0xde/0x460
> > >  ioctl_file_clone+0x50/0xc0
> > >  __x64_sys_ioctl+0x619/0x9d0
> > >  ? do_sys_openat2+0x99/0xd0
> > >  x64_sys_call+0xed0/0x1da0
> > >  do_syscall_64+0x6a/0x2e0
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > RIP: 0033:0x7fc4cd34d37b
> > > Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> > > RSP: 002b:00007ffeb4734050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc4cd34d37b
> > > RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
> > > RBP: 00007ffeb4734490 R08: 00007ffeb4734660 R09: 0000000000000002
> > > R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000001
> > > R13: 0000000000000000 R14: 0000000000008000 R15: 0000000000000002
> > >  </TASK>
> > > Modules linked in:
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:assfail+0x3c/0x46
> > > Code: c2 e0 cc 40 82 48 89 f1 48 89 fe 48 c7 c7 e3 60 45 82 48 89 e5 e8 e4 fd ff ff 8a 05 16 98 55 01 3c 01 76 02 0f 0b a8 01 74 02 <0f> 0b 0f 0b 5d c3 cc cc cc cc 48 8d 45 10 4c 8d 6c 24 10 48 89 e2
> > > RSP: 0018:ffff888111433cf8 EFLAGS: 00010202
> > > RAX: 00000000ffffff01 RBX: 0007ffffffffffff RCX: 000000007fffffff
> > > RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff824560e3
> > > RBP: ffff888111433cf8 R08: 0000000000000000 R09: 000000000000000a
> > > R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000001
> > > R13: 00000000ffffff8b R14: ffff888105280000 R15: 0007ffffffffffff
> > > FS:  00007fc4cd191580(0000) GS:ffff8881f6ccb000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00005612bbfade30 CR3: 000000011146c000 CR4: 0000000000750eb0
> > > PKRU: 55555554
> > > Kernel panic - not syncing: Fatal exception
> > > Kernel Offset: disabled
> > > ---[ end Kernel panic - not syncing: Fatal exception ]---
> > >
> >
> > First off, it's becoming clear to me that I didn't test this patchset
> > adequately enough. I had run xfstests on fuse but didn't run it on
> > XFS. My apologies for that, I should have done that and caught these
> > bugs myself, and will certainly do so next time.
> >
> > For this test failure, it's because the change from u64 to loff_t is
> > overflowing loff_t on xfs. The failure is coming from this line change
> > in particular:
> >
> > -static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
> > - u64 range_end)
> > +static unsigned iomap_find_dirty_range(struct folio *folio, loff_t
> > *range_start,
> > + loff_t range_end)
> >
> > which is called when writing back the folio (in iomap_writeback_folio()).
> >
> > I added some printks and it's overflowing because we are trying to
> > write back a 4096-byte folio starting at position 9223372036854771712
> > (2^63 - 4096) in the file which results in an overflowed end_pos of
> > 9223372036854775808 (2^63) when calculating folio_pos + folio_size.
> >
> > I'm assuming XFS uses these large folio positions as a sentinel/marker
> > and that it's not actually a folio at position 9223372036854771712,
> > but either way, this patch doesn't seem viable with how XFS currently
> > works and I think it needs to get dropped.
>
> xfs supports 9223372036854775807-byte files, so 0x7FFFFFFFFFFFF000
> is a valid location for a folio.

Is 9223372036854775807 the last valid file position supported on xfs
or does xfs also support positions beyond that?

Thanks,
Joanne
>
> --D
>
> > I'm going to run the rest of the xfstests suite on XFS for this
> > patchset series to verify there are no other issues.
> >
> > Thanks,
> > Joanne

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-19 19:17         ` Joanne Koong
@ 2025-11-19 19:35           ` Matthew Wilcox
  2025-11-20  0:38             ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2025-11-19 19:35 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Darrick J. Wong, brauner, hch, bfoster, linux-fsdevel,
	kernel-team, Christoph Hellwig

On Wed, Nov 19, 2025 at 11:17:41AM -0800, Joanne Koong wrote:
> On Wed, Nov 19, 2025 at 10:27 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > xfs supports 9223372036854775807-byte files, so 0x7FFFFFFFFFFFF000
> > is a valid location for a folio.
> 
> Is 9223372036854775807 the last valid file position supported on xfs
> or does xfs also support positions beyond that?

#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE        ((loff_t)ULONG_MAX << PAGE_SHIFT)
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE        ((loff_t)LLONG_MAX)
#endif

Linux declines to support files beyond 2^63-1.  Today, anyway.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-19 19:35           ` Matthew Wilcox
@ 2025-11-20  0:38             ` Joanne Koong
  2025-11-25  9:22               ` Christian Brauner
  0 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2025-11-20  0:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Darrick J. Wong, brauner, hch, bfoster, linux-fsdevel,
	kernel-team, Christoph Hellwig

On Wed, Nov 19, 2025 at 11:35 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Nov 19, 2025 at 11:17:41AM -0800, Joanne Koong wrote:
> > On Wed, Nov 19, 2025 at 10:27 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > > xfs supports 9223372036854775807-byte files, so 0x7FFFFFFFFFFFF000
> > > is a valid location for a folio.
> >
> > Is 9223372036854775807 the last valid file position supported on xfs
> > or does xfs also support positions beyond that?
>
> #if BITS_PER_LONG==32
> #define MAX_LFS_FILESIZE        ((loff_t)ULONG_MAX << PAGE_SHIFT)
> #elif BITS_PER_LONG==64
> #define MAX_LFS_FILESIZE        ((loff_t)LLONG_MAX)
> #endif
>
> Linux declines to support files beyond 2^63-1.  Today, anyway.

Ah I see, thanks.

I was wrong then - it's not an xfs specific thing, 9223372036854775807
is a valid file position on other filesystems as well.

I think it's best to drop this patch. If we were to keep it,
accounting for the loff_t overflow in "end_pos" and "end_aligned"
makes the whole thing look worse and more confusing.

Christian, could you please drop this patch from your iomap vfs tree?

Thanks,
Joanne

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-20  0:38             ` Joanne Koong
@ 2025-11-25  9:22               ` Christian Brauner
  0 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2025-11-25  9:22 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Matthew Wilcox, Darrick J. Wong, hch, bfoster, linux-fsdevel,
	kernel-team, Christoph Hellwig

> Christian, could you please drop this patch from your iomap vfs tree?

Done.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-11-25  9:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-11 19:36 [PATCH v4 0/9] iomap: buffered io changes Joanne Koong
2025-11-11 19:36 ` [PATCH v4 1/9] iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted Joanne Koong
2025-11-11 19:36 ` [PATCH v4 2/9] iomap: account for unaligned end offsets when truncating read range Joanne Koong
2025-11-11 19:36 ` [PATCH v4 3/9] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
2025-11-11 19:36 ` [PATCH v4 4/9] iomap: optimize pending async writeback accounting Joanne Koong
2025-11-11 19:36 ` [PATCH v4 5/9] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
2025-11-17 20:46   ` Matthew Wilcox
2025-11-17 23:26     ` Joanne Koong
2025-11-11 19:36 ` [PATCH v4 6/9] iomap: simplify when reads can be skipped for writes Joanne Koong
2025-11-11 19:36 ` [PATCH v4 7/9] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
2025-11-19  3:40   ` Matthew Wilcox
2025-11-19 18:10     ` Joanne Koong
2025-11-19 18:27       ` Darrick J. Wong
2025-11-19 19:17         ` Joanne Koong
2025-11-19 19:35           ` Matthew Wilcox
2025-11-20  0:38             ` Joanne Koong
2025-11-25  9:22               ` Christian Brauner
2025-11-11 19:36 ` [PATCH v4 8/9] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
2025-11-11 19:36 ` [PATCH v4 9/9] iomap: use find_next_bit() for uptodate " Joanne Koong
2025-11-12 10:34 ` [PATCH v4 0/9] iomap: buffered io changes Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).