[PATCH v3 0/8] iomap: buffered io changes

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/8] iomap: buffered io changes
@ 2025-11-04 20:51 Joanne Koong
  2025-11-04 20:51 ` [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team


v2: https://lore.kernel.org/linux-fsdevel/20251021164353.3854086-1-joannelkoong@gmail.com/
v2 -> v3:
* Fix race when writing back all bytes of a folio (patch 3)
* Rename from bytes_pending to bytes_submitted (patch 3)
* Add more comments about logic (patch 3)
* Change bytes_submitted from unsigned to size_t (patch 3) (Matthew)

v1: https://lore.kernel.org/linux-fsdevel/20251009225611.3744728-1-joannelkoong@gmail.com/
v1 -> v2:
* Incorporate Christoph's feedback (drop non-block-aligned writes patch, fix
  bitmap scanning function comments, use more concise variable name, etc)
* For loff_t patch, fix up .writeback_range() callback for zonefs, gfs2, and
  block 


Joanne Koong (8):
  iomap: account for unaligned end offsets when truncating read range
  docs: document iomap writeback's iomap_finish_folio_write()
    requirement
  iomap: optimize pending async writeback accounting
  iomap: simplify ->read_folio_range() error handling for reads
  iomap: simplify when reads can be skipped for writes
  iomap: use loff_t for file positions and offsets in writeback code
  iomap: use find_next_bit() for dirty bitmap scanning
  iomap: use find_next_bit() for uptodate bitmap scanning

 .../filesystems/iomap/operations.rst          |  10 +-
 block/fops.c                                  |   3 +-
 fs/fuse/file.c                                |  18 +-
 fs/gfs2/bmap.c                                |   3 +-
 fs/iomap/buffered-io.c                        | 244 +++++++++++-------
 fs/iomap/ioend.c                              |   2 -
 fs/xfs/xfs_aops.c                             |   8 +-
 fs/zonefs/file.c                              |   3 +-
 include/linux/iomap.h                         |  15 +-
 9 files changed, 184 insertions(+), 122 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:27   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

The end position to start truncating from may be at an offset into a
block, which under the current logic would result in overtruncation.

Adjust the calculation to account for unaligned end offsets.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f099c086cbe8..c79b59e52a49 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -218,6 +218,22 @@ static void ifs_free(struct folio *folio)
 	kfree(ifs);
 }
 
+/*
+ * Calculate how many bytes to truncate based off the number of blocks to
+ * truncate and the end position to start truncating from.
+ */
+static size_t iomap_bytes_to_truncate(loff_t end_pos, unsigned block_bits,
+		unsigned blocks_truncated)
+{
+	unsigned block_size = 1 << block_bits;
+	unsigned block_offset = end_pos & (block_size - 1);
+
+	if (!block_offset)
+		return blocks_truncated << block_bits;
+
+	return ((blocks_truncated - 1) << block_bits) + block_offset;
+}
+
 /*
  * Calculate the range inside the folio that we actually need to read.
  */
@@ -263,7 +279,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		/* truncate len if we find any trailing uptodate block(s) */
 		while (++i <= last) {
 			if (ifs_block_is_uptodate(ifs, i)) {
-				plen -= (last - i + 1) * block_size;
+				plen -= iomap_bytes_to_truncate(*pos + plen,
+						block_bits, last - i + 1);
 				last = i - 1;
 				break;
 			}
@@ -279,7 +296,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 		unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
-			plen -= (last - end) * block_size;
+			plen -= iomap_bytes_to_truncate(*pos + plen, block_bits,
+					last - end);
 	}
 
 	*offp = poff;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range
  2025-11-04 20:51 ` [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
@ 2025-11-05  1:27   ` Darrick J. Wong
  2025-11-06 17:08     ` Joanne Koong
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:27 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 04, 2025 at 12:51:12PM -0800, Joanne Koong wrote:
> The end position to start truncating from may be at an offset into a
> block, which under the current logic would result in overtruncation.
> 
> Adjust the calculation to account for unaligned end offsets.

When would the read offset be in the middle of a block?  My guess is
that happens when fuse reads some unaligned number of bytes into a folio
and then stops?  Can this also happen with inlinedata mappings?  I think
those are the only two conditions where iomap isn't dealing with
fsblock-aligned mappings and/or IOs.

(Obviously the pagecache copies themselves aren't required to be
aligned)

--D

> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f099c086cbe8..c79b59e52a49 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -218,6 +218,22 @@ static void ifs_free(struct folio *folio)
>  	kfree(ifs);
>  }
>  
> +/*
> + * Calculate how many bytes to truncate based off the number of blocks to
> + * truncate and the end position to start truncating from.
> + */
> +static size_t iomap_bytes_to_truncate(loff_t end_pos, unsigned block_bits,
> +		unsigned blocks_truncated)
> +{
> +	unsigned block_size = 1 << block_bits;
> +	unsigned block_offset = end_pos & (block_size - 1);
> +
> +	if (!block_offset)
> +		return blocks_truncated << block_bits;
> +
> +	return ((blocks_truncated - 1) << block_bits) + block_offset;
> +}
> +
>  /*
>   * Calculate the range inside the folio that we actually need to read.
>   */
> @@ -263,7 +279,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
>  		/* truncate len if we find any trailing uptodate block(s) */
>  		while (++i <= last) {
>  			if (ifs_block_is_uptodate(ifs, i)) {
> -				plen -= (last - i + 1) * block_size;
> +				plen -= iomap_bytes_to_truncate(*pos + plen,
> +						block_bits, last - i + 1);
>  				last = i - 1;
>  				break;
>  			}
> @@ -279,7 +296,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
>  		unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
>  
>  		if (first <= end && last > end)
> -			plen -= (last - end) * block_size;
> +			plen -= iomap_bytes_to_truncate(*pos + plen, block_bits,
> +					last - end);
>  	}
>  
>  	*offp = poff;
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range
  2025-11-05  1:27   ` Darrick J. Wong
@ 2025-11-06 17:08     ` Joanne Koong
  2025-11-06 23:08       ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-06 17:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 4, 2025 at 5:27 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Nov 04, 2025 at 12:51:12PM -0800, Joanne Koong wrote:
> > The end position to start truncating from may be at an offset into a
> > block, which under the current logic would result in overtruncation.
> >
> > Adjust the calculation to account for unaligned end offsets.
>
> When would the read offset be in the middle of a block?  My guess is
> that happens when fuse reads some unaligned number of bytes into a folio
> and then stops?  Can this also happen with inlinedata mappings?  I think
> those are the only two conditions where iomap isn't dealing with
> fsblock-aligned mappings and/or IOs.
>

The end offset may be in the middle of a block if the filesystem sets
an unaligned mapping size (the length of the read used is derived from
iomap_length(), which considers iter->iomap.length which is set by the
filesystem's ->iomap_begin() handler). This is what we saw on erofs
for inline mappings for example in this syzbot report [1].

[1] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/

Thanks,
Joanne

> (Obviously the pagecache copies themselves aren't required to be
> aligned)
>
> --D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range
  2025-11-06 17:08     ` Joanne Koong
@ 2025-11-06 23:08       ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-06 23:08 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Thu, Nov 06, 2025 at 09:08:47AM -0800, Joanne Koong wrote:
> On Tue, Nov 4, 2025 at 5:27 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Nov 04, 2025 at 12:51:12PM -0800, Joanne Koong wrote:
> > > The end position to start truncating from may be at an offset into a
> > > block, which under the current logic would result in overtruncation.
> > >
> > > Adjust the calculation to account for unaligned end offsets.
> >
> > When would the read offset be in the middle of a block?  My guess is
> > that happens when fuse reads some unaligned number of bytes into a folio
> > and then stops?  Can this also happen with inlinedata mappings?  I think
> > those are the only two conditions where iomap isn't dealing with
> > fsblock-aligned mappings and/or IOs.
> >
> 
> The end offset may be in the middle of a block if the filesystem sets
> an unaligned mapping size (the length of the read used is derived from
> iomap_length(), which considers iter->iomap.length which is set by the
> filesystem's ->iomap_begin() handler). This is what we saw on erofs
> for inline mappings for example in this syzbot report [1].
> 
> [1] https://lore.kernel.org/linux-fsdevel/68ca71bd.050a0220.2ff435.04fc.GAE@google.com/

Ahh ok.  Then I've understood this patch sufficiently. :)
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> 
> Thanks,
> Joanne
> 
> > (Obviously the pagecache copies themselves aren't required to be
> > aligned)
> >
> > --D
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
  2025-11-04 20:51 ` [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:22   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 3/8] iomap: optimize pending async writeback accounting Joanne Koong
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Document that iomap_finish_folio_write() must be called after writeback
on the range completes.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/iomap/operations.rst | 3 +++
 include/linux/iomap.h                          | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index c88205132039..4d30723be7fa 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -361,6 +361,9 @@ The fields are as follows:
     delalloc reservations to avoid having delalloc reservations for
     clean pagecache.
     This function must be supplied by the filesystem.
+    If this succeeds, iomap_finish_folio_write() must be called once writeback
+    completes for the range, regardless of whether the writeback succeeded or
+    failed.
 
   - ``writeback_submit``: Submit the previous built writeback context.
     Block based file systems should use the iomap_ioend_writeback_submit
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b1ac08c7474..a5032e456079 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -435,6 +435,10 @@ struct iomap_writeback_ops {
 	 * An existing mapping from a previous call to this method can be reused
 	 * by the file system if it is still valid.
 	 *
+	 * If this succeeds, iomap_finish_folio_write() must be called once
+	 * writeback completes for the range, regardless of whether the
+	 * writeback succeeded or failed.
+	 *
 	 * Returns the number of bytes processed or a negative errno.
 	 */
 	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement
  2025-11-04 20:51 ` [PATCH v3 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
@ 2025-11-05  1:22   ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:22 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 04, 2025 at 12:51:13PM -0800, Joanne Koong wrote:
> Document that iomap_finish_folio_write() must be called after writeback
> on the range completes.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Makes sense,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  Documentation/filesystems/iomap/operations.rst | 3 +++
>  include/linux/iomap.h                          | 4 ++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index c88205132039..4d30723be7fa 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -361,6 +361,9 @@ The fields are as follows:
>      delalloc reservations to avoid having delalloc reservations for
>      clean pagecache.
>      This function must be supplied by the filesystem.
> +    If this succeeds, iomap_finish_folio_write() must be called once writeback
> +    completes for the range, regardless of whether the writeback succeeded or
> +    failed.
>  
>    - ``writeback_submit``: Submit the previous built writeback context.
>      Block based file systems should use the iomap_ioend_writeback_submit
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 8b1ac08c7474..a5032e456079 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -435,6 +435,10 @@ struct iomap_writeback_ops {
>  	 * An existing mapping from a previous call to this method can be reused
>  	 * by the file system if it is still valid.
>  	 *
> +	 * If this succeeds, iomap_finish_folio_write() must be called once
> +	 * writeback completes for the range, regardless of whether the
> +	 * writeback succeeded or failed.
> +	 *
>  	 * Returns the number of bytes processed or a negative errno.
>  	 */
>  	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 3/8] iomap: optimize pending async writeback accounting
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
  2025-11-04 20:51 ` [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
  2025-11-04 20:51 ` [PATCH v3 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:28   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Pending writebacks must be accounted for to determine when all requests
have completed and writeback on the folio should be ended. Currently
this is done by atomically incrementing ifs->write_bytes_pending for
every range to be written back.

Instead, the number of atomic operations can be minimized by setting
ifs->write_bytes_pending to the folio size, internally tracking how many
bytes are written back asynchronously, and then after sending off all
the requests, decrementing ifs->write_bytes_pending by the number of
bytes not written back asynchronously. Now, for N ranges written back,
only N + 2 atomic operations are required instead of 2N + 2.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fuse/file.c         |  4 +--
 fs/iomap/buffered-io.c | 58 +++++++++++++++++++++++++-----------------
 fs/iomap/ioend.c       |  2 --
 include/linux/iomap.h  |  2 --
 4 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8275b6681b9b..b343a6f37563 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1885,7 +1885,8 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
 		 * scope of the fi->lock alleviates xarray lock
 		 * contention and noticeably improves performance.
 		 */
-		iomap_finish_folio_write(inode, ap->folios[i], 1);
+		iomap_finish_folio_write(inode, ap->folios[i],
+					 ap->descs[i].length);
 
 	wake_up(&fi->page_waitq);
 }
@@ -2221,7 +2222,6 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		ap = &wpa->ia.ap;
 	}
 
-	iomap_start_folio_write(inode, folio, 1);
 	fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
 				      offset, len);
 	data->nr_bytes += len;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c79b59e52a49..e3171462ba08 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1641,16 +1641,25 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
 }
 EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
-		size_t len)
+static void iomap_writeback_init(struct inode *inode, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
 	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
-	if (ifs)
-		atomic_add(len, &ifs->write_bytes_pending);
+	if (ifs) {
+		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+		/*
+		 * Set this to the folio size. After processing the folio for
+		 * writeback in iomap_writeback_folio(), we'll subtract any
+		 * ranges not written back.
+		 *
+		 * We do this because otherwise, we would have to atomically
+		 * increment ifs->write_bytes_pending every time a range in the
+		 * folio needs to be written back.
+		 */
+		atomic_set(&ifs->write_bytes_pending, folio_size(folio));
+	}
 }
-EXPORT_SYMBOL_GPL(iomap_start_folio_write);
 
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len)
@@ -1667,7 +1676,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
 static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
-		bool *wb_pending)
+		size_t *bytes_submitted)
 {
 	do {
 		ssize_t ret;
@@ -1681,11 +1690,11 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
 		pos += ret;
 
 		/*
-		 * Holes are not be written back by ->writeback_range, so track
+		 * Holes are not written back by ->writeback_range, so track
 		 * if we did handle anything that is not a hole here.
 		 */
 		if (wpc->iomap.type != IOMAP_HOLE)
-			*wb_pending = true;
+			*bytes_submitted += ret;
 	} while (rlen);
 
 	return 0;
@@ -1756,7 +1765,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	u64 pos = folio_pos(folio);
 	u64 end_pos = pos + folio_size(folio);
 	u64 end_aligned = 0;
-	bool wb_pending = false;
+	size_t bytes_submitted = 0;
 	int error = 0;
 	u32 rlen;
 
@@ -1776,14 +1785,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 			iomap_set_range_dirty(folio, 0, end_pos - pos);
 		}
 
-		/*
-		 * Keep the I/O completion handler from clearing the writeback
-		 * bit until we have submitted all blocks by adding a bias to
-		 * ifs->write_bytes_pending, which is dropped after submitting
-		 * all blocks.
-		 */
-		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
-		iomap_start_folio_write(inode, folio, 1);
+		iomap_writeback_init(inode, folio);
 	}
 
 	/*
@@ -1798,13 +1800,13 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	end_aligned = round_up(end_pos, i_blocksize(inode));
 	while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
 		error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
-				&wb_pending);
+				&bytes_submitted);
 		if (error)
 			break;
 		pos += rlen;
 	}
 
-	if (wb_pending)
+	if (bytes_submitted)
 		wpc->nr_folios++;
 
 	/*
@@ -1822,12 +1824,20 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 	 * bit ourselves right after unlocking the page.
 	 */
 	if (ifs) {
-		if (atomic_dec_and_test(&ifs->write_bytes_pending))
-			folio_end_writeback(folio);
-	} else {
-		if (!wb_pending)
-			folio_end_writeback(folio);
+		/*
+		 * Subtract any bytes that were initially accounted to
+		 * write_bytes_pending but skipped for writeback.
+		 */
+		size_t bytes_not_submitted = folio_size(folio) -
+				bytes_submitted;
+
+		if (bytes_not_submitted)
+			iomap_finish_folio_write(inode, folio,
+					bytes_not_submitted);
+	} else if (!bytes_submitted) {
+		folio_end_writeback(folio);
 	}
+
 	mapping_set_error(inode->i_mapping, error);
 	return error;
 }
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index b49fa75eab26..86f44922ed3b 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -194,8 +194,6 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
 	if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
 		goto new_ioend;
 
-	iomap_start_folio_write(wpc->inode, folio, map_len);
-
 	/*
 	 * Clamp io_offset and io_size to the incore EOF so that ondisk
 	 * file size updates in the ioend completion are byte-accurate.
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a5032e456079..b49e47f069db 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -478,8 +478,6 @@ int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
 
 void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
 		int error);
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
-		size_t len);
 void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 		size_t len);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/8] iomap: optimize pending async writeback accounting
  2025-11-04 20:51 ` [PATCH v3 3/8] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-11-05  1:28   ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:28 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 04, 2025 at 12:51:14PM -0800, Joanne Koong wrote:
> Pending writebacks must be accounted for to determine when all requests
> have completed and writeback on the folio should be ended. Currently
> this is done by atomically incrementing ifs->write_bytes_pending for
> every range to be written back.
> 
> Instead, the number of atomic operations can be minimized by setting
> ifs->write_bytes_pending to the folio size, internally tracking how many
> bytes are written back asynchronously, and then after sending off all
> the requests, decrementing ifs->write_bytes_pending by the number of
> bytes not written back asynchronously. Now, for N ranges written back,
> only N + 2 atomic operations are required instead of 2N + 2.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Seems reasonable to bias write_bytes_pending upwards and then decrement
it as needed; and then you don't have problems with transient
write_bytes_pending==0.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/fuse/file.c         |  4 +--
>  fs/iomap/buffered-io.c | 58 +++++++++++++++++++++++++-----------------
>  fs/iomap/ioend.c       |  2 --
>  include/linux/iomap.h  |  2 --
>  4 files changed, 36 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 8275b6681b9b..b343a6f37563 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1885,7 +1885,8 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
>  		 * scope of the fi->lock alleviates xarray lock
>  		 * contention and noticeably improves performance.
>  		 */
> -		iomap_finish_folio_write(inode, ap->folios[i], 1);
> +		iomap_finish_folio_write(inode, ap->folios[i],
> +					 ap->descs[i].length);
>  
>  	wake_up(&fi->page_waitq);
>  }
> @@ -2221,7 +2222,6 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
>  		ap = &wpa->ia.ap;
>  	}
>  
> -	iomap_start_folio_write(inode, folio, 1);
>  	fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
>  				      offset, len);
>  	data->nr_bytes += len;
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index c79b59e52a49..e3171462ba08 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1641,16 +1641,25 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
>  }
>  EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
>  
> -void iomap_start_folio_write(struct inode *inode, struct folio *folio,
> -		size_t len)
> +static void iomap_writeback_init(struct inode *inode, struct folio *folio)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  
>  	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
> -	if (ifs)
> -		atomic_add(len, &ifs->write_bytes_pending);
> +	if (ifs) {
> +		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
> +		/*
> +		 * Set this to the folio size. After processing the folio for
> +		 * writeback in iomap_writeback_folio(), we'll subtract any
> +		 * ranges not written back.
> +		 *
> +		 * We do this because otherwise, we would have to atomically
> +		 * increment ifs->write_bytes_pending every time a range in the
> +		 * folio needs to be written back.
> +		 */
> +		atomic_set(&ifs->write_bytes_pending, folio_size(folio));
> +	}
>  }
> -EXPORT_SYMBOL_GPL(iomap_start_folio_write);
>  
>  void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
>  		size_t len)
> @@ -1667,7 +1676,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
>  
>  static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
>  		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
> -		bool *wb_pending)
> +		size_t *bytes_submitted)
>  {
>  	do {
>  		ssize_t ret;
> @@ -1681,11 +1690,11 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
>  		pos += ret;
>  
>  		/*
> -		 * Holes are not be written back by ->writeback_range, so track
> +		 * Holes are not written back by ->writeback_range, so track
>  		 * if we did handle anything that is not a hole here.
>  		 */
>  		if (wpc->iomap.type != IOMAP_HOLE)
> -			*wb_pending = true;
> +			*bytes_submitted += ret;
>  	} while (rlen);
>  
>  	return 0;
> @@ -1756,7 +1765,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  	u64 pos = folio_pos(folio);
>  	u64 end_pos = pos + folio_size(folio);
>  	u64 end_aligned = 0;
> -	bool wb_pending = false;
> +	size_t bytes_submitted = 0;
>  	int error = 0;
>  	u32 rlen;
>  
> @@ -1776,14 +1785,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  			iomap_set_range_dirty(folio, 0, end_pos - pos);
>  		}
>  
> -		/*
> -		 * Keep the I/O completion handler from clearing the writeback
> -		 * bit until we have submitted all blocks by adding a bias to
> -		 * ifs->write_bytes_pending, which is dropped after submitting
> -		 * all blocks.
> -		 */
> -		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
> -		iomap_start_folio_write(inode, folio, 1);
> +		iomap_writeback_init(inode, folio);
>  	}
>  
>  	/*
> @@ -1798,13 +1800,13 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  	end_aligned = round_up(end_pos, i_blocksize(inode));
>  	while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
>  		error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
> -				&wb_pending);
> +				&bytes_submitted);
>  		if (error)
>  			break;
>  		pos += rlen;
>  	}
>  
> -	if (wb_pending)
> +	if (bytes_submitted)
>  		wpc->nr_folios++;
>  
>  	/*
> @@ -1822,12 +1824,20 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  	 * bit ourselves right after unlocking the page.
>  	 */
>  	if (ifs) {
> -		if (atomic_dec_and_test(&ifs->write_bytes_pending))
> -			folio_end_writeback(folio);
> -	} else {
> -		if (!wb_pending)
> -			folio_end_writeback(folio);
> +		/*
> +		 * Subtract any bytes that were initially accounted to
> +		 * write_bytes_pending but skipped for writeback.
> +		 */
> +		size_t bytes_not_submitted = folio_size(folio) -
> +				bytes_submitted;
> +
> +		if (bytes_not_submitted)
> +			iomap_finish_folio_write(inode, folio,
> +					bytes_not_submitted);
> +	} else if (!bytes_submitted) {
> +		folio_end_writeback(folio);
>  	}
> +
>  	mapping_set_error(inode->i_mapping, error);
>  	return error;
>  }
> diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> index b49fa75eab26..86f44922ed3b 100644
> --- a/fs/iomap/ioend.c
> +++ b/fs/iomap/ioend.c
> @@ -194,8 +194,6 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
>  	if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
>  		goto new_ioend;
>  
> -	iomap_start_folio_write(wpc->inode, folio, map_len);
> -
>  	/*
>  	 * Clamp io_offset and io_size to the incore EOF so that ondisk
>  	 * file size updates in the ioend completion are byte-accurate.
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index a5032e456079..b49e47f069db 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -478,8 +478,6 @@ int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
>  
>  void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
>  		int error);
> -void iomap_start_folio_write(struct inode *inode, struct folio *folio,
> -		size_t len);
>  void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
>  		size_t len);
>  
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
                   ` (2 preceding siblings ...)
  2025-11-04 20:51 ` [PATCH v3 3/8] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:50   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 5/8] iomap: simplify when reads can be skipped for writes Joanne Koong
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Instead of requiring that the caller calls iomap_finish_folio_read()
even if the ->read_folio_range() callback returns an error, account for
this internally in iomap instead, which makes the interface simpler and
makes it match writeback's ->read_folio_range() error handling
expectations.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../filesystems/iomap/operations.rst          |  7 +++--
 fs/fuse/file.c                                | 10 ++-----
 fs/iomap/buffered-io.c                        | 27 +++++++++----------
 include/linux/iomap.h                         |  5 ++--
 4 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 4d30723be7fa..64f4baf5750e 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -149,10 +149,9 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
 iomap calls these functions:
 
   - ``read_folio_range``: Called to read in the range. This must be provided
-    by the caller. The caller is responsible for calling
-    iomap_finish_folio_read() after reading in the folio range. This should be
-    done even if an error is encountered during the read. This returns 0 on
-    success or a negative error on failure.
+    by the caller. If this succeeds, iomap_finish_folio_read() must be called
+    after the range is read in, regardless of whether the read succeeded or
+    failed.
 
   - ``submit_read``: Submit any pending read requests. This function is
     optional.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b343a6f37563..7bcb650a9f26 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -922,13 +922,6 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
 
 	if (ctx->rac) {
 		ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
-		/*
-		 * If fuse_handle_readahead was successful, fuse_readpages_end
-		 * will do the iomap_finish_folio_read, else we need to call it
-		 * here
-		 */
-		if (ret)
-			iomap_finish_folio_read(folio, off, len, ret);
 	} else {
 		/*
 		 *  for non-readahead read requests, do reads synchronously
@@ -936,7 +929,8 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
 		 *  out-of-order reads
 		 */
 		ret = fuse_do_readfolio(file, folio, off, len);
-		iomap_finish_folio_read(folio, off, len, ret);
+		if (!ret)
+			iomap_finish_folio_read(folio, off, len, ret);
 	}
 	return ret;
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e3171462ba08..0f14d2a91f49 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -398,7 +398,8 @@ static void iomap_read_init(struct folio *folio)
 		 * finished reading in the entire folio.
 		 */
 		spin_lock_irq(&ifs->state_lock);
-		ifs->read_bytes_pending += len + 1;
+		WARN_ON_ONCE(ifs->read_bytes_pending != 0);
+		ifs->read_bytes_pending = len + 1;
 		spin_unlock_irq(&ifs->state_lock);
 	}
 }
@@ -414,19 +415,8 @@ static void iomap_read_init(struct folio *folio)
  */
 static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
 {
-	struct iomap_folio_state *ifs;
-
-	/*
-	 * If there are no bytes submitted, this means we are responsible for
-	 * unlocking the folio here, since no IO helper has taken ownership of
-	 * it.
-	 */
-	if (!bytes_submitted) {
-		folio_unlock(folio);
-		return;
-	}
+	struct iomap_folio_state *ifs = folio->private;
 
-	ifs = folio->private;
 	if (ifs) {
 		bool end_read, uptodate;
 		/*
@@ -451,6 +441,15 @@ static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
 		spin_unlock_irq(&ifs->state_lock);
 		if (end_read)
 			folio_end_read(folio, uptodate);
+	} else if (!bytes_submitted) {
+		/*
+		 * If there were no bytes submitted, this means we are
+		 * responsible for unlocking the folio here, since no IO helper
+		 * has taken ownership of it. If there were bytes submitted,
+		 * then the IO helper will end the read via
+		 * iomap_finish_folio_read().
+		 */
+		folio_unlock(folio);
 	}
 }
 
@@ -498,10 +497,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
 		} else {
 			if (!*bytes_submitted)
 				iomap_read_init(folio);
-			*bytes_submitted += plen;
 			ret = ctx->ops->read_folio_range(iter, ctx, plen);
 			if (ret)
 				return ret;
+			*bytes_submitted += plen;
 		}
 
 		ret = iomap_iter_advance(iter, plen);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b49e47f069db..520e967cb501 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -495,9 +495,8 @@ struct iomap_read_ops {
 	/*
 	 * Read in a folio range.
 	 *
-	 * The caller is responsible for calling iomap_finish_folio_read() after
-	 * reading in the folio range. This should be done even if an error is
-	 * encountered during the read.
+	 * If this succeeds, iomap_finish_folio_read() must be called after the
+	 * range is read in, regardless of whether the read succeeded or failed.
 	 *
 	 * Returns 0 on success or a negative error on failure.
 	 */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-04 20:51 ` [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-11-05  1:50   ` Darrick J. Wong
  2025-11-06 17:17     ` Joanne Koong
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:50 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 04, 2025 at 12:51:15PM -0800, Joanne Koong wrote:
> Instead of requiring that the caller calls iomap_finish_folio_read()
> even if the ->read_folio_range() callback returns an error, account for
> this internally in iomap instead, which makes the interface simpler and
> makes it match writeback's ->read_folio_range() error handling
> expectations.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  .../filesystems/iomap/operations.rst          |  7 +++--
>  fs/fuse/file.c                                | 10 ++-----
>  fs/iomap/buffered-io.c                        | 27 +++++++++----------
>  include/linux/iomap.h                         |  5 ++--
>  4 files changed, 20 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 4d30723be7fa..64f4baf5750e 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -149,10 +149,9 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
>  iomap calls these functions:
>  
>    - ``read_folio_range``: Called to read in the range. This must be provided
> -    by the caller. The caller is responsible for calling
> -    iomap_finish_folio_read() after reading in the folio range. This should be
> -    done even if an error is encountered during the read. This returns 0 on
> -    success or a negative error on failure.
> +    by the caller. If this succeeds, iomap_finish_folio_read() must be called
> +    after the range is read in, regardless of whether the read succeeded or
> +    failed.
>  
>    - ``submit_read``: Submit any pending read requests. This function is
>      optional.
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index b343a6f37563..7bcb650a9f26 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -922,13 +922,6 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
>  
>  	if (ctx->rac) {
>  		ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
> -		/*
> -		 * If fuse_handle_readahead was successful, fuse_readpages_end
> -		 * will do the iomap_finish_folio_read, else we need to call it
> -		 * here
> -		 */
> -		if (ret)
> -			iomap_finish_folio_read(folio, off, len, ret);
>  	} else {
>  		/*
>  		 *  for non-readahead read requests, do reads synchronously
> @@ -936,7 +929,8 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
>  		 *  out-of-order reads
>  		 */
>  		ret = fuse_do_readfolio(file, folio, off, len);
> -		iomap_finish_folio_read(folio, off, len, ret);
> +		if (!ret)
> +			iomap_finish_folio_read(folio, off, len, ret);
>  	}
>  	return ret;
>  }
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e3171462ba08..0f14d2a91f49 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -398,7 +398,8 @@ static void iomap_read_init(struct folio *folio)
>  		 * finished reading in the entire folio.
>  		 */
>  		spin_lock_irq(&ifs->state_lock);
> -		ifs->read_bytes_pending += len + 1;
> +		WARN_ON_ONCE(ifs->read_bytes_pending != 0);
> +		ifs->read_bytes_pending = len + 1;
>  		spin_unlock_irq(&ifs->state_lock);
>  	}
>  }
> @@ -414,19 +415,8 @@ static void iomap_read_init(struct folio *folio)
>   */
>  static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
>  {
> -	struct iomap_folio_state *ifs;
> -
> -	/*
> -	 * If there are no bytes submitted, this means we are responsible for
> -	 * unlocking the folio here, since no IO helper has taken ownership of
> -	 * it.
> -	 */
> -	if (!bytes_submitted) {
> -		folio_unlock(folio);
> -		return;
> -	}
> +	struct iomap_folio_state *ifs = folio->private;
>  
> -	ifs = folio->private;
>  	if (ifs) {
>  		bool end_read, uptodate;
>  		/*
> @@ -451,6 +441,15 @@ static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
>  		spin_unlock_irq(&ifs->state_lock);
>  		if (end_read)
>  			folio_end_read(folio, uptodate);
> +	} else if (!bytes_submitted) {
> +		/*
> +		 * If there were no bytes submitted, this means we are
> +		 * responsible for unlocking the folio here, since no IO helper
> +		 * has taken ownership of it. If there were bytes submitted,
> +		 * then the IO helper will end the read via
> +		 * iomap_finish_folio_read().
> +		 */
> +		folio_unlock(folio);
>  	}
>  }
>  
> @@ -498,10 +497,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>  		} else {
>  			if (!*bytes_submitted)
>  				iomap_read_init(folio);
> -			*bytes_submitted += plen;
>  			ret = ctx->ops->read_folio_range(iter, ctx, plen);
>  			if (ret)
>  				return ret;
> +			*bytes_submitted += plen;

Hrmm.  Is this the main change of this patch?  We don't increment
bytes_submitted if ->read_folio_range returns an error, which then means
that fuse doesn't have to call iomap_finish_folio_read to decrement
*bytes_submitted?

(and apparently the bio read_folio_range can't fail so no changes are
needed there)

--D

>  		}
>  
>  		ret = iomap_iter_advance(iter, plen);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index b49e47f069db..520e967cb501 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -495,9 +495,8 @@ struct iomap_read_ops {
>  	/*
>  	 * Read in a folio range.
>  	 *
> -	 * The caller is responsible for calling iomap_finish_folio_read() after
> -	 * reading in the folio range. This should be done even if an error is
> -	 * encountered during the read.
> +	 * If this succeeds, iomap_finish_folio_read() must be called after the
> +	 * range is read in, regardless of whether the read succeeded or failed.
>  	 *
>  	 * Returns 0 on success or a negative error on failure.
>  	 */
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-05  1:50   ` Darrick J. Wong
@ 2025-11-06 17:17     ` Joanne Koong
  2025-11-06 23:09       ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-06 17:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 4, 2025 at 5:50 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Nov 04, 2025 at 12:51:15PM -0800, Joanne Koong wrote:
> > Instead of requiring that the caller calls iomap_finish_folio_read()
> > even if the ->read_folio_range() callback returns an error, account for
> > this internally in iomap instead, which makes the interface simpler and
> > makes it match writeback's ->read_folio_range() error handling
> > expectations.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  .../filesystems/iomap/operations.rst          |  7 +++--
> >  fs/fuse/file.c                                | 10 ++-----
> >  fs/iomap/buffered-io.c                        | 27 +++++++++----------
> >  include/linux/iomap.h                         |  5 ++--
> >  4 files changed, 20 insertions(+), 29 deletions(-)
> >
> > @@ -498,10 +497,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> >               } else {
> >                       if (!*bytes_submitted)
> >                               iomap_read_init(folio);
> > -                     *bytes_submitted += plen;
> >                       ret = ctx->ops->read_folio_range(iter, ctx, plen);
> >                       if (ret)
> >                               return ret;
> > +                     *bytes_submitted += plen;
>
> Hrmm.  Is this the main change of this patch?  We don't increment
> bytes_submitted if ->read_folio_range returns an error, which then means
> that fuse doesn't have to call iomap_finish_folio_read to decrement
> *bytes_submitted?
>
> (and apparently the bio read_folio_range can't fail so no changes are
> needed there)

Yes, that is the motivation for the change. And to make the interface
consistent with how the ->read_folio_iter() callback for writeback
handles errors.

Thanks,
Joanne

>
> --D

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads
  2025-11-06 17:17     ` Joanne Koong
@ 2025-11-06 23:09       ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-06 23:09 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Thu, Nov 06, 2025 at 09:17:02AM -0800, Joanne Koong wrote:
> On Tue, Nov 4, 2025 at 5:50 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Tue, Nov 04, 2025 at 12:51:15PM -0800, Joanne Koong wrote:
> > > Instead of requiring that the caller calls iomap_finish_folio_read()
> > > even if the ->read_folio_range() callback returns an error, account for
> > > this internally in iomap instead, which makes the interface simpler and
> > > makes it match writeback's ->read_folio_range() error handling
> > > expectations.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  .../filesystems/iomap/operations.rst          |  7 +++--
> > >  fs/fuse/file.c                                | 10 ++-----
> > >  fs/iomap/buffered-io.c                        | 27 +++++++++----------
> > >  include/linux/iomap.h                         |  5 ++--
> > >  4 files changed, 20 insertions(+), 29 deletions(-)
> > >
> > > @@ -498,10 +497,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> > >               } else {
> > >                       if (!*bytes_submitted)
> > >                               iomap_read_init(folio);
> > > -                     *bytes_submitted += plen;
> > >                       ret = ctx->ops->read_folio_range(iter, ctx, plen);
> > >                       if (ret)
> > >                               return ret;
> > > +                     *bytes_submitted += plen;
> >
> > Hrmm.  Is this the main change of this patch?  We don't increment
> > bytes_submitted if ->read_folio_range returns an error, which then means
> > that fuse doesn't have to call iomap_finish_folio_read to decrement
> > *bytes_submitted?
> >
> > (and apparently the bio read_folio_range can't fail so no changes are
> > needed there)
> 
> Yes, that is the motivation for the change. And to make the interface
> consistent with how the ->read_folio_iter() callback for writeback
> handles errors.

Cool!  I think I understand this well enough to say
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> 
> Thanks,
> Joanne
> 
> >
> > --D
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 5/8] iomap: simplify when reads can be skipped for writes
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
                   ` (3 preceding siblings ...)
  2025-11-04 20:51 ` [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:40   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 6/8] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Currently, the logic for skipping the read range for a write is

if (!(iter->flags & IOMAP_UNSHARE) &&
    (from <= poff || from >= poff + plen) &&
    (to <= poff || to >= poff + plen))

which breaks down to skipping the read if any of these are true:
a) from <= poff && to <= poff
b) from <= poff && to >= poff + plen
c) from >= poff + plen && to <= poff
d) from >= poff + plen && to >= poff + plen

This can be simplified to
if (!(iter->flags & IOMAP_UNSHARE) && from <= poff && to >= poff + plen)

from the following reasoning:

a) from <= poff && to <= poff
This reduces to 'to <= poff' since it is guaranteed that 'from <= to'
(since to = from + len). It is not possible for 'from <= to' to be true
here because we only reach here if plen > 0 (thanks to the preceding 'if
(plen == 0)' check that would break us out of the loop). If 'to <=
poff', plen would have to be 0 since poff and plen get adjusted in
lockstep for uptodate blocks. This means we can eliminate this check.

c) from >= poff + plen && to <= poff
This is not possible since 'from <= to' and 'plen > 0'. We can eliminate
this check.

d) from >= poff + plen && to >= poff + plen
This reduces to 'from >= poff + plen' since 'from <= to'.
It is not possible for 'from >= poff + plen' to be true here. We only
reach here if plen > 0 and for writes, poff and plen will always be
block-aligned, which means poff <= from < poff + plen. We can eliminate
this check.

The only valid check is b) from <= poff && to >= poff + plen.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0f14d2a91f49..c02d33bff3d0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -752,9 +752,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
 		if (plen == 0)
 			break;
 
-		if (!(iter->flags & IOMAP_UNSHARE) &&
-		    (from <= poff || from >= poff + plen) &&
-		    (to <= poff || to >= poff + plen))
+		/*
+		 * If the read range will be entirely overwritten by the write,
+		 * we can skip having to zero/read it in.
+		 */
+		if (!(iter->flags & IOMAP_UNSHARE) && from <= poff &&
+		    to >= poff + plen)
 			continue;
 
 		if (iomap_block_needs_zeroing(iter, block_start)) {
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/8] iomap: simplify when reads can be skipped for writes
  2025-11-04 20:51 ` [PATCH v3 5/8] iomap: simplify when reads can be skipped for writes Joanne Koong
@ 2025-11-05  1:40   ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:40 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 04, 2025 at 12:51:16PM -0800, Joanne Koong wrote:
> Currently, the logic for skipping the read range for a write is
> 
> if (!(iter->flags & IOMAP_UNSHARE) &&
>     (from <= poff || from >= poff + plen) &&
>     (to <= poff || to >= poff + plen))
> 
> which breaks down to skipping the read if any of these are true:
> a) from <= poff && to <= poff
> b) from <= poff && to >= poff + plen
> c) from >= poff + plen && to <= poff
> d) from >= poff + plen && to >= poff + plen
> 
> This can be simplified to
> if (!(iter->flags & IOMAP_UNSHARE) && from <= poff && to >= poff + plen)
> 
> from the following reasoning:
> 
> a) from <= poff && to <= poff
> This reduces to 'to <= poff' since it is guaranteed that 'from <= to'
> (since to = from + len). It is not possible for 'from <= to' to be true
> here because we only reach here if plen > 0 (thanks to the preceding 'if
> (plen == 0)' check that would break us out of the loop). If 'to <=
> poff', plen would have to be 0 since poff and plen get adjusted in
> lockstep for uptodate blocks. This means we can eliminate this check.
> 
> c) from >= poff + plen && to <= poff
> This is not possible since 'from <= to' and 'plen > 0'. We can eliminate
> this check.
> 
> d) from >= poff + plen && to >= poff + plen
> This reduces to 'from >= poff + plen' since 'from <= to'.
> It is not possible for 'from >= poff + plen' to be true here. We only
> reach here if plen > 0 and for writes, poff and plen will always be
> block-aligned, which means poff <= from < poff + plen. We can eliminate
> this check.
> 
> The only valid check is b) from <= poff && to >= poff + plen.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Heh yeah, makes sense to me
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 0f14d2a91f49..c02d33bff3d0 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -752,9 +752,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
>  		if (plen == 0)
>  			break;
>  
> -		if (!(iter->flags & IOMAP_UNSHARE) &&
> -		    (from <= poff || from >= poff + plen) &&
> -		    (to <= poff || to >= poff + plen))
> +		/*
> +		 * If the read range will be entirely overwritten by the write,
> +		 * we can skip having to zero/read it in.
> +		 */
> +		if (!(iter->flags & IOMAP_UNSHARE) && from <= poff &&
> +		    to >= poff + plen)
>  			continue;
>  
>  		if (iomap_block_needs_zeroing(iter, block_start)) {
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 6/8] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
                   ` (4 preceding siblings ...)
  2025-11-04 20:51 ` [PATCH v3 5/8] iomap: simplify when reads can be skipped for writes Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:41   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 7/8] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

Use loff_t instead of u64 for file positions and offsets to be
consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
signed for historical reasons but this has no practical effect.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/fops.c           |  3 ++-
 fs/fuse/file.c         |  4 ++--
 fs/gfs2/bmap.c         |  3 ++-
 fs/iomap/buffered-io.c | 17 +++++++++--------
 fs/xfs/xfs_aops.c      |  8 ++++----
 fs/zonefs/file.c       |  3 ++-
 include/linux/iomap.h  |  4 ++--
 7 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 4dad9c2d5796..d2b96143b40f 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -550,7 +550,8 @@ static void blkdev_readahead(struct readahead_control *rac)
 }
 
 static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+		struct folio *folio, loff_t offset, unsigned int len,
+		loff_t end_pos)
 {
 	loff_t isize = i_size_read(wpc->inode);
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7bcb650a9f26..6d5e44cbd62e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2168,8 +2168,8 @@ static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
 }
 
 static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
-					  struct folio *folio, u64 pos,
-					  unsigned len, u64 end_pos)
+					  struct folio *folio, loff_t pos,
+					  unsigned len, loff_t end_pos)
 {
 	struct fuse_fill_wb_data *data = wpc->wb_ctx;
 	struct fuse_writepage_args *wpa = data->wpa;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 131091520de6..2b61b057151b 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2473,7 +2473,8 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
 }
 
 static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+		struct folio *folio, loff_t offset, unsigned int len,
+		loff_t end_pos)
 {
 	if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(wpc->inode))))
 		return -EIO;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c02d33bff3d0..420fe2865927 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -86,7 +86,8 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
 }
 
 static unsigned ifs_find_dirty_range(struct folio *folio,
-		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+		struct iomap_folio_state *ifs, loff_t *range_start,
+		loff_t range_end)
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned start_blk =
@@ -110,8 +111,8 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
 	return nblks << inode->i_blkbits;
 }
 
-static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
-		u64 range_end)
+static unsigned iomap_find_dirty_range(struct folio *folio, loff_t *range_start,
+		loff_t range_end)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
@@ -1677,7 +1678,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
 EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
 
 static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
+		struct folio *folio, loff_t pos, u32 rlen, loff_t end_pos,
 		size_t *bytes_submitted)
 {
 	do {
@@ -1709,7 +1710,7 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
  * i_size, adjust end_pos and zero all data beyond i_size.
  */
 static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
-		u64 *end_pos)
+		loff_t *end_pos)
 {
 	u64 isize = i_size_read(inode);
 
@@ -1764,9 +1765,9 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = wpc->inode;
-	u64 pos = folio_pos(folio);
-	u64 end_pos = pos + folio_size(folio);
-	u64 end_aligned = 0;
+	loff_t pos = folio_pos(folio);
+	loff_t end_pos = pos + folio_size(folio);
+	loff_t end_aligned = 0;
 	size_t bytes_submitted = 0;
 	int error = 0;
 	u32 rlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0c2ed00733f2..593a34832116 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -480,9 +480,9 @@ static ssize_t
 xfs_writeback_range(
 	struct iomap_writepage_ctx *wpc,
 	struct folio		*folio,
-	u64			offset,
+	loff_t			offset,
 	unsigned int		len,
-	u64			end_pos)
+	loff_t			end_pos)
 {
 	ssize_t			ret;
 
@@ -630,9 +630,9 @@ static ssize_t
 xfs_zoned_writeback_range(
 	struct iomap_writepage_ctx *wpc,
 	struct folio		*folio,
-	u64			offset,
+	loff_t			offset,
 	unsigned int		len,
-	u64			end_pos)
+	loff_t			end_pos)
 {
 	ssize_t			ret;
 
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index c1e5e30e90a0..d748ed99ac2d 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -126,7 +126,8 @@ static void zonefs_readahead(struct readahead_control *rac)
  * which implies that the page range can only be within the fixed inode size.
  */
 static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
-		struct folio *folio, u64 offset, unsigned len, u64 end_pos)
+		struct folio *folio, loff_t offset, unsigned len,
+		loff_t end_pos)
 {
 	struct zonefs_zone *z = zonefs_inode_zone(wpc->inode);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 520e967cb501..351c7bd9653d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -442,8 +442,8 @@ struct iomap_writeback_ops {
 	 * Returns the number of bytes processed or a negative errno.
 	 */
 	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
-			struct folio *folio, u64 pos, unsigned int len,
-			u64 end_pos);
+			struct folio *folio, loff_t pos, unsigned int len,
+			loff_t end_pos);
 
 	/*
 	 * Submit a writeback context previously build up by ->writeback_range.
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 6/8] iomap: use loff_t for file positions and offsets in writeback code
  2025-11-04 20:51 ` [PATCH v3 6/8] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-11-05  1:41   ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:41 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team,
	Christoph Hellwig

On Tue, Nov 04, 2025 at 12:51:17PM -0800, Joanne Koong wrote:
> Use loff_t instead of u64 for file positions and offsets to be
> consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
> signed for historical reasons but this has no practical effect.

Let's hope iomap never encounters a FOP_UNSIGNED_OFFSET filesystem.

(should we warn somewhere if we do?)

> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  block/fops.c           |  3 ++-
>  fs/fuse/file.c         |  4 ++--
>  fs/gfs2/bmap.c         |  3 ++-
>  fs/iomap/buffered-io.c | 17 +++++++++--------
>  fs/xfs/xfs_aops.c      |  8 ++++----
>  fs/zonefs/file.c       |  3 ++-
>  include/linux/iomap.h  |  4 ++--
>  7 files changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index 4dad9c2d5796..d2b96143b40f 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -550,7 +550,8 @@ static void blkdev_readahead(struct readahead_control *rac)
>  }
>  
>  static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
> -		struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
> +		struct folio *folio, loff_t offset, unsigned int len,
> +		loff_t end_pos)
>  {
>  	loff_t isize = i_size_read(wpc->inode);
>  
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 7bcb650a9f26..6d5e44cbd62e 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2168,8 +2168,8 @@ static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
>  }
>  
>  static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> -					  struct folio *folio, u64 pos,
> -					  unsigned len, u64 end_pos)
> +					  struct folio *folio, loff_t pos,
> +					  unsigned len, loff_t end_pos)
>  {
>  	struct fuse_fill_wb_data *data = wpc->wb_ctx;
>  	struct fuse_writepage_args *wpa = data->wpa;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 131091520de6..2b61b057151b 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -2473,7 +2473,8 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
>  }
>  
>  static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
> -		struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
> +		struct folio *folio, loff_t offset, unsigned int len,
> +		loff_t end_pos)
>  {
>  	if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(wpc->inode))))
>  		return -EIO;
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index c02d33bff3d0..420fe2865927 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -86,7 +86,8 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
>  }
>  
>  static unsigned ifs_find_dirty_range(struct folio *folio,
> -		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
> +		struct iomap_folio_state *ifs, loff_t *range_start,
> +		loff_t range_end)
>  {
>  	struct inode *inode = folio->mapping->host;
>  	unsigned start_blk =
> @@ -110,8 +111,8 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
>  	return nblks << inode->i_blkbits;
>  }
>  
> -static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
> -		u64 range_end)
> +static unsigned iomap_find_dirty_range(struct folio *folio, loff_t *range_start,
> +		loff_t range_end)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  
> @@ -1677,7 +1678,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
>  EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
>  
>  static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> -		struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
> +		struct folio *folio, loff_t pos, u32 rlen, loff_t end_pos,
>  		size_t *bytes_submitted)
>  {
>  	do {
> @@ -1709,7 +1710,7 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
>   * i_size, adjust end_pos and zero all data beyond i_size.
>   */
>  static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
> -		u64 *end_pos)
> +		loff_t *end_pos)
>  {
>  	u64 isize = i_size_read(inode);
>  
> @@ -1764,9 +1765,9 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  	struct inode *inode = wpc->inode;
> -	u64 pos = folio_pos(folio);
> -	u64 end_pos = pos + folio_size(folio);
> -	u64 end_aligned = 0;
> +	loff_t pos = folio_pos(folio);
> +	loff_t end_pos = pos + folio_size(folio);
> +	loff_t end_aligned = 0;
>  	size_t bytes_submitted = 0;
>  	int error = 0;
>  	u32 rlen;
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 0c2ed00733f2..593a34832116 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -480,9 +480,9 @@ static ssize_t
>  xfs_writeback_range(
>  	struct iomap_writepage_ctx *wpc,
>  	struct folio		*folio,
> -	u64			offset,
> +	loff_t			offset,
>  	unsigned int		len,
> -	u64			end_pos)
> +	loff_t			end_pos)
>  {
>  	ssize_t			ret;
>  
> @@ -630,9 +630,9 @@ static ssize_t
>  xfs_zoned_writeback_range(
>  	struct iomap_writepage_ctx *wpc,
>  	struct folio		*folio,
> -	u64			offset,
> +	loff_t			offset,
>  	unsigned int		len,
> -	u64			end_pos)
> +	loff_t			end_pos)
>  {
>  	ssize_t			ret;
>  
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index c1e5e30e90a0..d748ed99ac2d 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -126,7 +126,8 @@ static void zonefs_readahead(struct readahead_control *rac)
>   * which implies that the page range can only be within the fixed inode size.
>   */
>  static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
> -		struct folio *folio, u64 offset, unsigned len, u64 end_pos)
> +		struct folio *folio, loff_t offset, unsigned len,
> +		loff_t end_pos)
>  {
>  	struct zonefs_zone *z = zonefs_inode_zone(wpc->inode);
>  
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 520e967cb501..351c7bd9653d 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -442,8 +442,8 @@ struct iomap_writeback_ops {
>  	 * Returns the number of bytes processed or a negative errno.
>  	 */
>  	ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
> -			struct folio *folio, u64 pos, unsigned int len,
> -			u64 end_pos);
> +			struct folio *folio, loff_t pos, unsigned int len,
> +			loff_t end_pos);
>  
>  	/*
>  	 * Submit a writeback context previously build up by ->writeback_range.
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 7/8] iomap: use find_next_bit() for dirty bitmap scanning
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
                   ` (5 preceding siblings ...)
  2025-11-04 20:51 ` [PATCH v3 6/8] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:42   ` Darrick J. Wong
  2025-11-04 20:51 ` [PATCH v3 8/8] iomap: use find_next_bit() for uptodate " Joanne Koong
  2025-11-11 12:09 ` [PATCH v3 0/8] iomap: buffered io changes Christian Brauner
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

Use find_next_bit()/find_next_zero_bit() for iomap dirty bitmap
scanning. This uses __ffs() internally and is more efficient for
finding the next dirty or clean bit than iterating through the bitmap
range testing every bit.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
---
 fs/iomap/buffered-io.c | 61 ++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 420fe2865927..3c9a4b773186 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -76,13 +76,34 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
 		folio_mark_uptodate(folio);
 }
 
-static inline bool ifs_block_is_dirty(struct folio *folio,
-		struct iomap_folio_state *ifs, int block)
+/*
+ * Find the next dirty block in the folio. end_blk is inclusive.
+ * If no dirty block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_dirty_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
 {
+	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+	unsigned int blks = i_blocks_per_folio(inode, folio);
+
+	return find_next_bit(ifs->state, blks + end_blk + 1,
+			blks + start_blk) - blks;
+}
+
+/*
+ * Find the next clean block in the folio. end_blk is inclusive.
+ * If no clean block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_clean_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned int blks = i_blocks_per_folio(inode, folio);
 
-	return test_bit(block + blks_per_folio, ifs->state);
+	return find_next_zero_bit(ifs->state, blks + end_blk + 1,
+			blks + start_blk) - blks;
 }
 
 static unsigned ifs_find_dirty_range(struct folio *folio,
@@ -94,18 +115,17 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
 		offset_in_folio(folio, *range_start) >> inode->i_blkbits;
 	unsigned end_blk = min_not_zero(
 		offset_in_folio(folio, range_end) >> inode->i_blkbits,
-		i_blocks_per_folio(inode, folio));
-	unsigned nblks = 1;
-
-	while (!ifs_block_is_dirty(folio, ifs, start_blk))
-		if (++start_blk == end_blk)
-			return 0;
+		i_blocks_per_folio(inode, folio)) - 1;
+	unsigned nblks;
 
-	while (start_blk + nblks < end_blk) {
-		if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
-			break;
-		nblks++;
-	}
+	start_blk = ifs_next_dirty_block(folio, start_blk, end_blk);
+	if (start_blk > end_blk)
+		return 0;
+	if (start_blk == end_blk)
+		nblks = 1;
+	else
+		nblks = ifs_next_clean_block(folio, start_blk + 1, end_blk) -
+				start_blk;
 
 	*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
 	return nblks << inode->i_blkbits;
@@ -1161,7 +1181,7 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 		struct folio *folio, loff_t start_byte, loff_t end_byte,
 		struct iomap *iomap, iomap_punch_t punch)
 {
-	unsigned int first_blk, last_blk, i;
+	unsigned int first_blk, last_blk;
 	loff_t last_byte;
 	u8 blkbits = inode->i_blkbits;
 	struct iomap_folio_state *ifs;
@@ -1180,10 +1200,11 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 			folio_pos(folio) + folio_size(folio) - 1);
 	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
 	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
-	for (i = first_blk; i <= last_blk; i++) {
-		if (!ifs_block_is_dirty(folio, ifs, i))
-			punch(inode, folio_pos(folio) + (i << blkbits),
-				    1 << blkbits, iomap);
+	while ((first_blk = ifs_next_clean_block(folio, first_blk, last_blk))
+		       <= last_blk) {
+		punch(inode, folio_pos(folio) + (first_blk << blkbits),
+				1 << blkbits, iomap);
+		first_blk++;
 	}
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/8] iomap: use find_next_bit() for dirty bitmap scanning
  2025-11-04 20:51 ` [PATCH v3 7/8] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
@ 2025-11-05  1:42   ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:42 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team

On Tue, Nov 04, 2025 at 12:51:18PM -0800, Joanne Koong wrote:
> Use find_next_bit()/find_next_zero_bit() for iomap dirty bitmap
> scanning. This uses __ffs() internally and is more efficient for
> finding the next dirty or clean bit than iterating through the bitmap
> range testing every bit.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Suggested-by: Christoph Hellwig <hch@infradead.org>

Oh hey, that's a nice cleanup
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 61 ++++++++++++++++++++++++++++--------------
>  1 file changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 420fe2865927..3c9a4b773186 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -76,13 +76,34 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
>  		folio_mark_uptodate(folio);
>  }
>  
> -static inline bool ifs_block_is_dirty(struct folio *folio,
> -		struct iomap_folio_state *ifs, int block)
> +/*
> + * Find the next dirty block in the folio. end_blk is inclusive.
> + * If no dirty block is found, this will return end_blk + 1.
> + */
> +static unsigned ifs_next_dirty_block(struct folio *folio,
> +		unsigned start_blk, unsigned end_blk)
>  {
> +	struct iomap_folio_state *ifs = folio->private;
>  	struct inode *inode = folio->mapping->host;
> -	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> +	unsigned int blks = i_blocks_per_folio(inode, folio);
> +
> +	return find_next_bit(ifs->state, blks + end_blk + 1,
> +			blks + start_blk) - blks;
> +}
> +
> +/*
> + * Find the next clean block in the folio. end_blk is inclusive.
> + * If no clean block is found, this will return end_blk + 1.
> + */
> +static unsigned ifs_next_clean_block(struct folio *folio,
> +		unsigned start_blk, unsigned end_blk)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	struct inode *inode = folio->mapping->host;
> +	unsigned int blks = i_blocks_per_folio(inode, folio);
>  
> -	return test_bit(block + blks_per_folio, ifs->state);
> +	return find_next_zero_bit(ifs->state, blks + end_blk + 1,
> +			blks + start_blk) - blks;
>  }
>  
>  static unsigned ifs_find_dirty_range(struct folio *folio,
> @@ -94,18 +115,17 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
>  		offset_in_folio(folio, *range_start) >> inode->i_blkbits;
>  	unsigned end_blk = min_not_zero(
>  		offset_in_folio(folio, range_end) >> inode->i_blkbits,
> -		i_blocks_per_folio(inode, folio));
> -	unsigned nblks = 1;
> -
> -	while (!ifs_block_is_dirty(folio, ifs, start_blk))
> -		if (++start_blk == end_blk)
> -			return 0;
> +		i_blocks_per_folio(inode, folio)) - 1;
> +	unsigned nblks;
>  
> -	while (start_blk + nblks < end_blk) {
> -		if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
> -			break;
> -		nblks++;
> -	}
> +	start_blk = ifs_next_dirty_block(folio, start_blk, end_blk);
> +	if (start_blk > end_blk)
> +		return 0;
> +	if (start_blk == end_blk)
> +		nblks = 1;
> +	else
> +		nblks = ifs_next_clean_block(folio, start_blk + 1, end_blk) -
> +				start_blk;
>  
>  	*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
>  	return nblks << inode->i_blkbits;
> @@ -1161,7 +1181,7 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
>  		struct folio *folio, loff_t start_byte, loff_t end_byte,
>  		struct iomap *iomap, iomap_punch_t punch)
>  {
> -	unsigned int first_blk, last_blk, i;
> +	unsigned int first_blk, last_blk;
>  	loff_t last_byte;
>  	u8 blkbits = inode->i_blkbits;
>  	struct iomap_folio_state *ifs;
> @@ -1180,10 +1200,11 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
>  			folio_pos(folio) + folio_size(folio) - 1);
>  	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
>  	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
> -	for (i = first_blk; i <= last_blk; i++) {
> -		if (!ifs_block_is_dirty(folio, ifs, i))
> -			punch(inode, folio_pos(folio) + (i << blkbits),
> -				    1 << blkbits, iomap);
> +	while ((first_blk = ifs_next_clean_block(folio, first_blk, last_blk))
> +		       <= last_blk) {
> +		punch(inode, folio_pos(folio) + (first_blk << blkbits),
> +				1 << blkbits, iomap);
> +		first_blk++;
>  	}
>  }
>  
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 8/8] iomap: use find_next_bit() for uptodate bitmap scanning
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
                   ` (6 preceding siblings ...)
  2025-11-04 20:51 ` [PATCH v3 7/8] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
@ 2025-11-04 20:51 ` Joanne Koong
  2025-11-05  1:42   ` Darrick J. Wong
  2025-11-11 12:09 ` [PATCH v3 0/8] iomap: buffered io changes Christian Brauner
  8 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-11-04 20:51 UTC (permalink / raw)
  To: brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

Use find_next_bit()/find_next_zero_bit() for iomap uptodate bitmap
scanning. This uses __ffs() internally and is more efficient for
finding the next uptodate or non-uptodate bit than iterating through the
the bitmap range testing every bit.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
---
 fs/iomap/buffered-io.c | 52 ++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3c9a4b773186..03dd524b69d2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -38,10 +38,28 @@ static inline bool ifs_is_fully_uptodate(struct folio *folio,
 	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
 }
 
-static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
-		unsigned int block)
+/*
+ * Find the next uptodate block in the folio. end_blk is inclusive.
+ * If no uptodate block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_uptodate_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
 {
-	return test_bit(block, ifs->state);
+	struct iomap_folio_state *ifs = folio->private;
+
+	return find_next_bit(ifs->state, end_blk + 1, start_blk);
+}
+
+/*
+ * Find the next non-uptodate block in the folio. end_blk is inclusive.
+ * If no non-uptodate block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_nonuptodate_block(struct folio *folio,
+		unsigned start_blk, unsigned end_blk)
+{
+	struct iomap_folio_state *ifs = folio->private;
+
+	return find_next_zero_bit(ifs->state, end_blk + 1, start_blk);
 }
 
 static bool ifs_set_range_uptodate(struct folio *folio,
@@ -278,14 +296,11 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	 * to avoid reading in already uptodate ranges.
 	 */
 	if (ifs) {
-		unsigned int i, blocks_skipped;
+		unsigned int next, blocks_skipped;
 
-		/* move forward for each leading block marked uptodate */
-		for (i = first; i <= last; i++)
-			if (!ifs_block_is_uptodate(ifs, i))
-				break;
+		next = ifs_next_nonuptodate_block(folio, first, last);
+		blocks_skipped = next - first;
 
-		blocks_skipped = i - first;
 		if (blocks_skipped) {
 			unsigned long block_offset = *pos & (block_size - 1);
 			unsigned bytes_skipped =
@@ -295,15 +310,15 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 			poff += bytes_skipped;
 			plen -= bytes_skipped;
 		}
-		first = i;
+		first = next;
 
 		/* truncate len if we find any trailing uptodate block(s) */
-		while (++i <= last) {
-			if (ifs_block_is_uptodate(ifs, i)) {
+		if (++next <= last) {
+			next = ifs_next_uptodate_block(folio, next, last);
+			if (next <= last) {
 				plen -= iomap_bytes_to_truncate(*pos + plen,
-						block_bits, last - i + 1);
-				last = i - 1;
-				break;
+						block_bits, last - next + 1);
+				last = next - 1;
 			}
 		}
 	}
@@ -634,7 +649,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned first, last, i;
+	unsigned first, last;
 
 	if (!ifs)
 		return false;
@@ -646,10 +661,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	first = from >> inode->i_blkbits;
 	last = (from + count - 1) >> inode->i_blkbits;
 
-	for (i = first; i <= last; i++)
-		if (!ifs_block_is_uptodate(ifs, i))
-			return false;
-	return true;
+	return ifs_next_nonuptodate_block(folio, first, last) > last;
 }
 EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 8/8] iomap: use find_next_bit() for uptodate bitmap scanning
  2025-11-04 20:51 ` [PATCH v3 8/8] iomap: use find_next_bit() for uptodate " Joanne Koong
@ 2025-11-05  1:42   ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2025-11-05  1:42 UTC (permalink / raw)
  To: Joanne Koong; +Cc: brauner, hch, bfoster, linux-fsdevel, kernel-team

On Tue, Nov 04, 2025 at 12:51:19PM -0800, Joanne Koong wrote:
> Use find_next_bit()/find_next_zero_bit() for iomap uptodate bitmap
> scanning. This uses __ffs() internally and is more efficient for
> finding the next uptodate or non-uptodate bit than iterating through the
> the bitmap range testing every bit.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Suggested-by: Christoph Hellwig <hch@infradead.org>

Here too!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 52 ++++++++++++++++++++++++++----------------
>  1 file changed, 32 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3c9a4b773186..03dd524b69d2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -38,10 +38,28 @@ static inline bool ifs_is_fully_uptodate(struct folio *folio,
>  	return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>  }
>  
> -static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
> -		unsigned int block)
> +/*
> + * Find the next uptodate block in the folio. end_blk is inclusive.
> + * If no uptodate block is found, this will return end_blk + 1.
> + */
> +static unsigned ifs_next_uptodate_block(struct folio *folio,
> +		unsigned start_blk, unsigned end_blk)
>  {
> -	return test_bit(block, ifs->state);
> +	struct iomap_folio_state *ifs = folio->private;
> +
> +	return find_next_bit(ifs->state, end_blk + 1, start_blk);
> +}
> +
> +/*
> + * Find the next non-uptodate block in the folio. end_blk is inclusive.
> + * If no non-uptodate block is found, this will return end_blk + 1.
> + */
> +static unsigned ifs_next_nonuptodate_block(struct folio *folio,
> +		unsigned start_blk, unsigned end_blk)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +
> +	return find_next_zero_bit(ifs->state, end_blk + 1, start_blk);
>  }
>  
>  static bool ifs_set_range_uptodate(struct folio *folio,
> @@ -278,14 +296,11 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
>  	 * to avoid reading in already uptodate ranges.
>  	 */
>  	if (ifs) {
> -		unsigned int i, blocks_skipped;
> +		unsigned int next, blocks_skipped;
>  
> -		/* move forward for each leading block marked uptodate */
> -		for (i = first; i <= last; i++)
> -			if (!ifs_block_is_uptodate(ifs, i))
> -				break;
> +		next = ifs_next_nonuptodate_block(folio, first, last);
> +		blocks_skipped = next - first;
>  
> -		blocks_skipped = i - first;
>  		if (blocks_skipped) {
>  			unsigned long block_offset = *pos & (block_size - 1);
>  			unsigned bytes_skipped =
> @@ -295,15 +310,15 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
>  			poff += bytes_skipped;
>  			plen -= bytes_skipped;
>  		}
> -		first = i;
> +		first = next;
>  
>  		/* truncate len if we find any trailing uptodate block(s) */
> -		while (++i <= last) {
> -			if (ifs_block_is_uptodate(ifs, i)) {
> +		if (++next <= last) {
> +			next = ifs_next_uptodate_block(folio, next, last);
> +			if (next <= last) {
>  				plen -= iomap_bytes_to_truncate(*pos + plen,
> -						block_bits, last - i + 1);
> -				last = i - 1;
> -				break;
> +						block_bits, last - next + 1);
> +				last = next - 1;
>  			}
>  		}
>  	}
> @@ -634,7 +649,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  	struct inode *inode = folio->mapping->host;
> -	unsigned first, last, i;
> +	unsigned first, last;
>  
>  	if (!ifs)
>  		return false;
> @@ -646,10 +661,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
>  	first = from >> inode->i_blkbits;
>  	last = (from + count - 1) >> inode->i_blkbits;
>  
> -	for (i = first; i <= last; i++)
> -		if (!ifs_block_is_uptodate(ifs, i))
> -			return false;
> -	return true;
> +	return ifs_next_nonuptodate_block(folio, first, last) > last;
>  }
>  EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>  
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/8] iomap: buffered io changes
  2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
                   ` (7 preceding siblings ...)
  2025-11-04 20:51 ` [PATCH v3 8/8] iomap: use find_next_bit() for uptodate " Joanne Koong
@ 2025-11-11 12:09 ` Christian Brauner
  2025-11-11 18:05   ` Joanne Koong
  8 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2025-11-11 12:09 UTC (permalink / raw)
  To: Joanne Koong; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

On Tue, Nov 04, 2025 at 12:51:11PM -0800, Joanne Koong wrote:
> 
> v2: https://lore.kernel.org/linux-fsdevel/20251021164353.3854086-1-joannelkoong@gmail.com/
> v2 -> v3:
> * Fix race when writing back all bytes of a folio (patch 3)
> * Rename from bytes_pending to bytes_submitted (patch 3)
> * Add more comments about logic (patch 3)
> * Change bytes_submitted from unsigned to size_t (patch 3) (Matthew)
> 
> v1: https://lore.kernel.org/linux-fsdevel/20251009225611.3744728-1-joannelkoong@gmail.com/
> v1 -> v2:
> * Incorporate Christoph's feedback (drop non-block-aligned writes patch, fix
>   bitmap scanning function comments, use more concise variable name, etc)
> * For loff_t patch, fix up .writeback_range() callback for zonefs, gfs2, and
>   block 

Joanne, I think we're ready to take this upstream. Could you do me a
favor and rebase this on top of vfs-6.19.iomap and resend?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/8] iomap: buffered io changes
  2025-11-11 12:09 ` [PATCH v3 0/8] iomap: buffered io changes Christian Brauner
@ 2025-11-11 18:05   ` Joanne Koong
  0 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-11-11 18:05 UTC (permalink / raw)
  To: Christian Brauner; +Cc: hch, djwong, bfoster, linux-fsdevel, kernel-team

On Tue, Nov 11, 2025 at 4:09 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Nov 04, 2025 at 12:51:11PM -0800, Joanne Koong wrote:
> >
> > v2: https://lore.kernel.org/linux-fsdevel/20251021164353.3854086-1-joannelkoong@gmail.com/
> > v2 -> v3:
> > * Fix race when writing back all bytes of a folio (patch 3)
> > * Rename from bytes_pending to bytes_submitted (patch 3)
> > * Add more comments about logic (patch 3)
> > * Change bytes_submitted from unsigned to size_t (patch 3) (Matthew)
> >
> > v1: https://lore.kernel.org/linux-fsdevel/20251009225611.3744728-1-joannelkoong@gmail.com/
> > v1 -> v2:
> > * Incorporate Christoph's feedback (drop non-block-aligned writes patch, fix
> >   bitmap scanning function comments, use more concise variable name, etc)
> > * For loff_t patch, fix up .writeback_range() callback for zonefs, gfs2, and
> >   block
>
> Joanne, I think we're ready to take this upstream. Could you do me a
> favor and rebase this on top of vfs-6.19.iomap and resend?

Absolutely, I will rebase and resend this today.

Thanks,
Joanne

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-11-11 18:05 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-04 20:51 [PATCH v3 0/8] iomap: buffered io changes Joanne Koong
2025-11-04 20:51 ` [PATCH v3 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
2025-11-05  1:27   ` Darrick J. Wong
2025-11-06 17:08     ` Joanne Koong
2025-11-06 23:08       ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
2025-11-05  1:22   ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 3/8] iomap: optimize pending async writeback accounting Joanne Koong
2025-11-05  1:28   ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 4/8] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
2025-11-05  1:50   ` Darrick J. Wong
2025-11-06 17:17     ` Joanne Koong
2025-11-06 23:09       ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 5/8] iomap: simplify when reads can be skipped for writes Joanne Koong
2025-11-05  1:40   ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 6/8] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
2025-11-05  1:41   ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 7/8] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
2025-11-05  1:42   ` Darrick J. Wong
2025-11-04 20:51 ` [PATCH v3 8/8] iomap: use find_next_bit() for uptodate " Joanne Koong
2025-11-05  1:42   ` Darrick J. Wong
2025-11-11 12:09 ` [PATCH v3 0/8] iomap: buffered io changes Christian Brauner
2025-11-11 18:05   ` Joanne Koong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).