* [PATCH v2 0/8] iomap: buffered io changes
@ 2025-10-21 16:43 Joanne Koong
2025-10-21 16:43 ` [PATCH v2 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team
v1: https://lore.kernel.org/linux-fsdevel/20251009225611.3744728-1-joannelkoong@gmail.com/
v1 -> v2:
* Incorporate Christoph's feedback (drop the non-block-aligned writes patch,
fix bitmap scanning function comments, use a more concise variable name, etc.)
* For the loff_t patch, fix up the .writeback_range() callback for zonefs, gfs2,
and block
Joanne Koong (8):
iomap: account for unaligned end offsets when truncating read range
docs: document iomap writeback's iomap_finish_folio_write()
requirement
iomap: optimize pending async writeback accounting
iomap: simplify ->read_folio_range() error handling for reads
iomap: simplify when reads can be skipped for writes
iomap: use loff_t for file positions and offsets in writeback code
iomap: use find_next_bit() for dirty bitmap scanning
iomap: use find_next_bit() for uptodate bitmap scanning
.../filesystems/iomap/operations.rst | 10 +-
block/fops.c | 3 +-
fs/fuse/file.c | 18 +-
fs/gfs2/bmap.c | 3 +-
fs/iomap/buffered-io.c | 229 +++++++++++-------
fs/iomap/ioend.c | 2 -
fs/xfs/xfs_aops.c | 8 +-
fs/zonefs/file.c | 3 +-
include/linux/iomap.h | 15 +-
9 files changed, 167 insertions(+), 124 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 1/8] iomap: account for unaligned end offsets when truncating read range
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-21 16:43 ` [PATCH v2 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team,
Christoph Hellwig
The end position to start truncating from may fall at an unaligned offset
within a block, which under the current logic results in over-truncation.
Adjust the calculation to account for unaligned end offsets.
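As a rough illustration, a minimal userspace sketch of the corrected
arithmetic; bytes_to_truncate() just mirrors the iomap_bytes_to_truncate()
helper added below, and the sizes are made up:

#include <stdio.h>

/* how many bytes the trailing blocks cover, given that end_pos may sit
 * partway into the last truncated block */
static unsigned bytes_to_truncate(long long end_pos, unsigned block_bits,
				  unsigned blocks_truncated)
{
	unsigned block_size = 1u << block_bits;
	unsigned block_offset = end_pos & (block_size - 1);

	if (!block_offset)
		return blocks_truncated << block_bits;
	return ((blocks_truncated - 1) << block_bits) + block_offset;
}

int main(void)
{
	/* 1k blocks (block_bits = 10), truncating 3 trailing blocks */
	printf("%u\n", bytes_to_truncate(4096, 10, 3)); /* aligned end: 3072 */
	/* end_pos 100 bytes into a block: 2148, where the old
	 * '(last - i + 1) * block_size' math would have subtracted the full
	 * 3072 and over-truncated */
	printf("%u\n", bytes_to_truncate(4196, 10, 3));
	return 0;
}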
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 72196e5021b1..636d2398c9b4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -218,6 +218,22 @@ static void ifs_free(struct folio *folio)
kfree(ifs);
}
+/*
+ * Calculate how many bytes to truncate based off the number of blocks to
+ * truncate and the end position to start truncating from.
+ */
+static size_t iomap_bytes_to_truncate(loff_t end_pos, unsigned block_bits,
+ unsigned blocks_truncated)
+{
+ unsigned block_size = 1 << block_bits;
+ unsigned block_offset = end_pos & (block_size - 1);
+
+ if (!block_offset)
+ return blocks_truncated << block_bits;
+
+ return ((blocks_truncated - 1) << block_bits) + block_offset;
+}
+
/*
* Calculate the range inside the folio that we actually need to read.
*/
@@ -263,7 +279,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
/* truncate len if we find any trailing uptodate block(s) */
while (++i <= last) {
if (ifs_block_is_uptodate(ifs, i)) {
- plen -= (last - i + 1) * block_size;
+ plen -= iomap_bytes_to_truncate(*pos + plen,
+ block_bits, last - i + 1);
last = i - 1;
break;
}
@@ -279,7 +296,8 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
if (first <= end && last > end)
- plen -= (last - end) * block_size;
+ plen -= iomap_bytes_to_truncate(*pos + plen, block_bits,
+ last - end);
}
*offp = poff;
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
2025-10-21 16:43 ` [PATCH v2 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-21 16:43 ` [PATCH v2 3/8] iomap: optimize pending async writeback accounting Joanne Koong
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team,
Christoph Hellwig
Document that iomap_finish_folio_write() must be called after writeback
on the range completes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
Documentation/filesystems/iomap/operations.rst | 3 +++
include/linux/iomap.h | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index c88205132039..4d30723be7fa 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -361,6 +361,9 @@ The fields are as follows:
delalloc reservations to avoid having delalloc reservations for
clean pagecache.
This function must be supplied by the filesystem.
+ If this succeeds, iomap_finish_folio_write() must be called once writeback
+ completes for the range, regardless of whether the writeback succeeded or
+ failed.
- ``writeback_submit``: Submit the previous built writeback context.
Block based file systems should use the iomap_ioend_writeback_submit
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 65d123114883..d1a1e993cfe7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -435,6 +435,10 @@ struct iomap_writeback_ops {
* An existing mapping from a previous call to this method can be reused
* by the file system if it is still valid.
*
+ * If this succeeds, iomap_finish_folio_write() must be called once
+ * writeback completes for the range, regardless of whether the
+ * writeback succeeded or failed.
+ *
* Returns the number of bytes processed or a negative errno.
*/
ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 3/8] iomap: optimize pending async writeback accounting
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
2025-10-21 16:43 ` [PATCH v2 1/8] iomap: account for unaligned end offsets when truncating read range Joanne Koong
2025-10-21 16:43 ` [PATCH v2 2/8] docs: document iomap writeback's iomap_finish_folio_write() requirement Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-22 4:49 ` Christoph Hellwig
2025-10-22 14:23 ` Matthew Wilcox
2025-10-21 16:43 ` [PATCH v2 4/8] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
` (4 subsequent siblings)
7 siblings, 2 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team
Pending writebacks must be accounted for to determine when all requests
have completed and writeback on the folio should be ended. Currently
this is done by atomically incrementing ifs->write_bytes_pending for
every range to be written back.
Instead, the number of atomic operations can be minimized by setting
ifs->write_bytes_pending to the folio size, internally tracking how many
bytes are written back asynchronously, and then after sending off all
the requests, decrementing ifs->write_bytes_pending by the number of
bytes not written back asynchronously. Now, for N ranges written back,
only N + 2 atomic operations are required instead of 2N + 2.
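A rough userspace sketch of the new bookkeeping (names and sizes are
illustrative only; the real code operates on ifs->write_bytes_pending):

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
	unsigned folio_size = 16384, range_len = 4096, nranges = 3;
	unsigned bytes_pending = 0;	/* submitted bytes, plain counter */
	atomic_uint write_bytes_pending;

	/* one atomic set up front instead of a bias add */
	atomic_store(&write_bytes_pending, folio_size);

	for (unsigned i = 0; i < nranges; i++)
		bytes_pending += range_len;	/* no atomic add per range */

	/* one atomic sub for whatever part of the folio was not submitted */
	atomic_fetch_sub(&write_bytes_pending, folio_size - bytes_pending);

	/* each async completion still does one sub; writeback ends at zero */
	for (unsigned i = 0; i < nranges; i++)
		if (atomic_fetch_sub(&write_bytes_pending, range_len) == range_len)
			printf("all writeback complete\n");
	return 0;
}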
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 4 ++--
fs/iomap/buffered-io.c | 44 +++++++++++++++++-------------------------
fs/iomap/ioend.c | 2 --
include/linux/iomap.h | 2 --
4 files changed, 20 insertions(+), 32 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8275b6681b9b..b343a6f37563 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1885,7 +1885,8 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
* scope of the fi->lock alleviates xarray lock
* contention and noticeably improves performance.
*/
- iomap_finish_folio_write(inode, ap->folios[i], 1);
+ iomap_finish_folio_write(inode, ap->folios[i],
+ ap->descs[i].length);
wake_up(&fi->page_waitq);
}
@@ -2221,7 +2222,6 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
ap = &wpa->ia.ap;
}
- iomap_start_folio_write(inode, folio, 1);
fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
offset, len);
data->nr_bytes += len;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 636d2398c9b4..06d6abda5f75 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1603,16 +1603,16 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
- size_t len)
+static void iomap_writeback_init(struct inode *inode, struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
- if (ifs)
- atomic_add(len, &ifs->write_bytes_pending);
+ if (ifs) {
+ WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+ atomic_set(&ifs->write_bytes_pending, folio_size(folio));
+ }
}
-EXPORT_SYMBOL_GPL(iomap_start_folio_write);
void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
size_t len)
@@ -1629,7 +1629,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
- bool *wb_pending)
+ unsigned *bytes_pending)
{
do {
ssize_t ret;
@@ -1643,11 +1643,11 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
pos += ret;
/*
- * Holes are not be written back by ->writeback_range, so track
+ * Holes are not written back by ->writeback_range, so track
* if we did handle anything that is not a hole here.
*/
if (wpc->iomap.type != IOMAP_HOLE)
- *wb_pending = true;
+ *bytes_pending += ret;
} while (rlen);
return 0;
@@ -1718,7 +1718,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
u64 end_aligned = 0;
- bool wb_pending = false;
+ unsigned bytes_pending = 0;
int error = 0;
u32 rlen;
@@ -1738,14 +1738,7 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
iomap_set_range_dirty(folio, 0, end_pos - pos);
}
- /*
- * Keep the I/O completion handler from clearing the writeback
- * bit until we have submitted all blocks by adding a bias to
- * ifs->write_bytes_pending, which is dropped after submitting
- * all blocks.
- */
- WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
- iomap_start_folio_write(inode, folio, 1);
+ iomap_writeback_init(inode, folio);
}
/*
@@ -1760,13 +1753,13 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
end_aligned = round_up(end_pos, i_blocksize(inode));
while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
- &wb_pending);
+ &bytes_pending);
if (error)
break;
pos += rlen;
}
- if (wb_pending)
+ if (bytes_pending)
wpc->nr_folios++;
/*
@@ -1783,13 +1776,12 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
* already at this point. In that case we need to clear the writeback
* bit ourselves right after unlocking the page.
*/
- if (ifs) {
- if (atomic_dec_and_test(&ifs->write_bytes_pending))
- folio_end_writeback(folio);
- } else {
- if (!wb_pending)
- folio_end_writeback(folio);
- }
+ if (ifs)
+ iomap_finish_folio_write(inode, folio,
+ folio_size(folio) - bytes_pending);
+ else if (!bytes_pending)
+ folio_end_writeback(folio);
+
mapping_set_error(inode->i_mapping, error);
return error;
}
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index b49fa75eab26..86f44922ed3b 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -194,8 +194,6 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
goto new_ioend;
- iomap_start_folio_write(wpc->inode, folio, map_len);
-
/*
* Clamp io_offset and io_size to the incore EOF so that ondisk
* file size updates in the ioend completion are byte-accurate.
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index d1a1e993cfe7..655f60187f8f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -478,8 +478,6 @@ int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
int error);
-void iomap_start_folio_write(struct inode *inode, struct folio *folio,
- size_t len);
void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
size_t len);
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 4/8] iomap: simplify ->read_folio_range() error handling for reads
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
` (2 preceding siblings ...)
2025-10-21 16:43 ` [PATCH v2 3/8] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-21 16:43 ` [PATCH v2 5/8] iomap: simplify when reads can be skipped for writes Joanne Koong
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team,
Christoph Hellwig
Instead of requiring that the caller call iomap_finish_folio_read()
even if the ->read_folio_range() callback returns an error, account for
this internally in iomap. This makes the interface simpler and matches
writeback's ->writeback_range() error handling expectations.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../filesystems/iomap/operations.rst | 7 +++--
fs/fuse/file.c | 10 ++-----
fs/iomap/buffered-io.c | 26 +++++++++----------
include/linux/iomap.h | 5 ++--
4 files changed, 19 insertions(+), 29 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 4d30723be7fa..64f4baf5750e 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -149,10 +149,9 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
iomap calls these functions:
- ``read_folio_range``: Called to read in the range. This must be provided
- by the caller. The caller is responsible for calling
- iomap_finish_folio_read() after reading in the folio range. This should be
- done even if an error is encountered during the read. This returns 0 on
- success or a negative error on failure.
+ by the caller. If this succeeds, iomap_finish_folio_read() must be called
+ after the range is read in, regardless of whether the read succeeded or
+ failed.
- ``submit_read``: Submit any pending read requests. This function is
optional.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b343a6f37563..7bcb650a9f26 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -922,13 +922,6 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
if (ctx->rac) {
ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
- /*
- * If fuse_handle_readahead was successful, fuse_readpages_end
- * will do the iomap_finish_folio_read, else we need to call it
- * here
- */
- if (ret)
- iomap_finish_folio_read(folio, off, len, ret);
} else {
/*
* for non-readahead read requests, do reads synchronously
@@ -936,7 +929,8 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
* out-of-order reads
*/
ret = fuse_do_readfolio(file, folio, off, len);
- iomap_finish_folio_read(folio, off, len, ret);
+ if (!ret)
+ iomap_finish_folio_read(folio, off, len, ret);
}
return ret;
}
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 06d6abda5f75..82ebfa07735f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -377,26 +377,16 @@ static void iomap_read_init(struct folio *folio)
size_t len = folio_size(folio);
spin_lock_irq(&ifs->state_lock);
- ifs->read_bytes_pending += len;
+ WARN_ON_ONCE(ifs->read_bytes_pending != 0);
+ ifs->read_bytes_pending = len;
spin_unlock_irq(&ifs->state_lock);
}
}
static void iomap_read_end(struct folio *folio, size_t bytes_pending)
{
- struct iomap_folio_state *ifs;
-
- /*
- * If there are no bytes pending, this means we are responsible for
- * unlocking the folio here, since no IO helper has taken ownership of
- * it.
- */
- if (!bytes_pending) {
- folio_unlock(folio);
- return;
- }
+ struct iomap_folio_state *ifs = folio->private;
- ifs = folio->private;
if (ifs) {
bool end_read, uptodate;
size_t bytes_accounted = folio_size(folio) - bytes_pending;
@@ -415,6 +405,14 @@ static void iomap_read_end(struct folio *folio, size_t bytes_pending)
spin_unlock_irq(&ifs->state_lock);
if (end_read)
folio_end_read(folio, uptodate);
+ } else if (!bytes_pending) {
+ /*
+ * If there are no bytes pending, this means we are responsible
+ * for unlocking the folio here, since no IO helper has taken
+ * ownership of it. If there are bytes pending, then the IO
+ * helper will end the read via iomap_finish_folio_read().
+ */
+ folio_unlock(folio);
}
}
@@ -462,10 +460,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
} else {
if (!*bytes_pending)
iomap_read_init(folio);
- *bytes_pending += plen;
ret = ctx->ops->read_folio_range(iter, ctx, plen);
if (ret)
return ret;
+ *bytes_pending += plen;
}
ret = iomap_iter_advance(iter, plen);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 655f60187f8f..b999f900e23b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -495,9 +495,8 @@ struct iomap_read_ops {
/*
* Read in a folio range.
*
- * The caller is responsible for calling iomap_finish_folio_read() after
- * reading in the folio range. This should be done even if an error is
- * encountered during the read.
+ * If this succeeds, iomap_finish_folio_read() must be called after the
+ * range is read in, regardless of whether the read succeeded or failed.
*
* Returns 0 on success or a negative error on failure.
*/
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 5/8] iomap: simplify when reads can be skipped for writes
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
` (3 preceding siblings ...)
2025-10-21 16:43 ` [PATCH v2 4/8] iomap: simplify ->read_folio_range() error handling for reads Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-21 16:43 ` [PATCH v2 6/8] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team,
Christoph Hellwig
Currently, the logic for skipping the read range for a write is
if (!(iter->flags & IOMAP_UNSHARE) &&
(from <= poff || from >= poff + plen) &&
(to <= poff || to >= poff + plen))
which breaks down to skipping the read if any of these are true:
a) from <= poff && to <= poff
b) from <= poff && to >= poff + plen
c) from >= poff + plen && to <= poff
d) from >= poff + plen && to >= poff + plen
This can be simplified to
if (!(iter->flags & IOMAP_UNSHARE) && from <= poff && to >= poff + plen)
from the following reasoning:
a) from <= poff && to <= poff
This reduces to 'to <= poff' since it is guaranteed that 'from <= to'
(since to = from + len). It is not possible for 'to <= poff' to be true
here because we only reach here if plen > 0 (thanks to the preceding
'if (plen == 0)' check that breaks us out of the loop): if 'to <= poff',
plen would have to be 0, since poff and plen get adjusted in lockstep
for uptodate blocks. This means we can eliminate this check.
c) from >= poff + plen && to <= poff
This is not possible since 'from <= to' and 'plen > 0'. We can eliminate
this check.
d) from >= poff + plen && to >= poff + plen
This reduces to 'from >= poff + plen' since 'from <= to'.
It is not possible for 'from >= poff + plen' to be true here. We only
reach here if plen > 0 and for writes, poff and plen will always be
block-aligned, which means poff <= from < poff + plen. We can eliminate
this check.
The only valid check is b) from <= poff && to >= poff + plen.
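A brute-force cross-check of that equivalence, written as a userspace sketch
that assumes exactly the invariants argued above (plen > 0, block-aligned
poff/plen, poff <= from < poff + plen, and from < to):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	const unsigned bs = 4, folio = 32;	/* toy block and folio sizes */

	for (unsigned poff = 0; poff < folio; poff += bs)
		for (unsigned plen = bs; poff + plen <= folio; plen += bs)
			for (unsigned from = poff; from < poff + plen; from++)
				for (unsigned to = from + 1; to <= folio; to++) {
					int old_skip =
						(from <= poff || from >= poff + plen) &&
						(to <= poff || to >= poff + plen);
					int new_skip =
						from <= poff && to >= poff + plen;
					assert(old_skip == new_skip);
				}
	printf("old and new skip conditions agree\n");
	return 0;
}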
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 82ebfa07735f..9f1bc191beca 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -714,9 +714,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
if (plen == 0)
break;
- if (!(iter->flags & IOMAP_UNSHARE) &&
- (from <= poff || from >= poff + plen) &&
- (to <= poff || to >= poff + plen))
+ /*
+ * If the read range will be entirely overwritten by the write,
+ * we can skip having to zero/read it in.
+ */
+ if (!(iter->flags & IOMAP_UNSHARE) && from <= poff &&
+ to >= poff + plen)
continue;
if (iomap_block_needs_zeroing(iter, block_start)) {
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 6/8] iomap: use loff_t for file positions and offsets in writeback code
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
` (4 preceding siblings ...)
2025-10-21 16:43 ` [PATCH v2 5/8] iomap: simplify when reads can be skipped for writes Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-21 16:43 ` [PATCH v2 7/8] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
2025-10-21 16:43 ` [PATCH v2 8/8] iomap: use find_next_bit() for uptodate " Joanne Koong
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team,
Christoph Hellwig
Use loff_t instead of u64 for file positions and offsets to be
consistent with kernel VFS conventions. Both are 64-bit types. loff_t is
signed for historical reasons but this has no practical effect.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
block/fops.c | 3 ++-
fs/fuse/file.c | 4 ++--
fs/gfs2/bmap.c | 3 ++-
fs/iomap/buffered-io.c | 17 +++++++++--------
fs/xfs/xfs_aops.c | 8 ++++----
fs/zonefs/file.c | 3 ++-
include/linux/iomap.h | 4 ++--
7 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 4dad9c2d5796..d2b96143b40f 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -550,7 +550,8 @@ static void blkdev_readahead(struct readahead_control *rac)
}
static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+ struct folio *folio, loff_t offset, unsigned int len,
+ loff_t end_pos)
{
loff_t isize = i_size_read(wpc->inode);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7bcb650a9f26..6d5e44cbd62e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2168,8 +2168,8 @@ static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
}
static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 pos,
- unsigned len, u64 end_pos)
+ struct folio *folio, loff_t pos,
+ unsigned len, loff_t end_pos)
{
struct fuse_fill_wb_data *data = wpc->wb_ctx;
struct fuse_writepage_args *wpa = data->wpa;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 131091520de6..2b61b057151b 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2473,7 +2473,8 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
}
static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+ struct folio *folio, loff_t offset, unsigned int len,
+ loff_t end_pos)
{
if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(wpc->inode))))
return -EIO;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9f1bc191beca..a8baf62fef30 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -86,7 +86,8 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
}
static unsigned ifs_find_dirty_range(struct folio *folio,
- struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+ struct iomap_folio_state *ifs, loff_t *range_start,
+ loff_t range_end)
{
struct inode *inode = folio->mapping->host;
unsigned start_blk =
@@ -110,8 +111,8 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
return nblks << inode->i_blkbits;
}
-static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
- u64 range_end)
+static unsigned iomap_find_dirty_range(struct folio *folio, loff_t *range_start,
+ loff_t range_end)
{
struct iomap_folio_state *ifs = folio->private;
@@ -1629,7 +1630,7 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
+ struct folio *folio, loff_t pos, u32 rlen, loff_t end_pos,
unsigned *bytes_pending)
{
do {
@@ -1661,7 +1662,7 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
* i_size, adjust end_pos and zero all data beyond i_size.
*/
static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
- u64 *end_pos)
+ loff_t *end_pos)
{
u64 isize = i_size_read(inode);
@@ -1716,9 +1717,9 @@ int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
struct inode *inode = wpc->inode;
- u64 pos = folio_pos(folio);
- u64 end_pos = pos + folio_size(folio);
- u64 end_aligned = 0;
+ loff_t pos = folio_pos(folio);
+ loff_t end_pos = pos + folio_size(folio);
+ loff_t end_aligned = 0;
unsigned bytes_pending = 0;
int error = 0;
u32 rlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0c2ed00733f2..593a34832116 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -480,9 +480,9 @@ static ssize_t
xfs_writeback_range(
struct iomap_writepage_ctx *wpc,
struct folio *folio,
- u64 offset,
+ loff_t offset,
unsigned int len,
- u64 end_pos)
+ loff_t end_pos)
{
ssize_t ret;
@@ -630,9 +630,9 @@ static ssize_t
xfs_zoned_writeback_range(
struct iomap_writepage_ctx *wpc,
struct folio *folio,
- u64 offset,
+ loff_t offset,
unsigned int len,
- u64 end_pos)
+ loff_t end_pos)
{
ssize_t ret;
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index c1e5e30e90a0..d748ed99ac2d 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -126,7 +126,8 @@ static void zonefs_readahead(struct readahead_control *rac)
* which implies that the page range can only be within the fixed inode size.
*/
static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 offset, unsigned len, u64 end_pos)
+ struct folio *folio, loff_t offset, unsigned len,
+ loff_t end_pos)
{
struct zonefs_zone *z = zonefs_inode_zone(wpc->inode);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b999f900e23b..cc8c85140b00 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -442,8 +442,8 @@ struct iomap_writeback_ops {
* Returns the number of bytes processed or a negative errno.
*/
ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 pos, unsigned int len,
- u64 end_pos);
+ struct folio *folio, loff_t pos, unsigned int len,
+ loff_t end_pos);
/*
* Submit a writeback context previously build up by ->writeback_range.
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 7/8] iomap: use find_next_bit() for dirty bitmap scanning
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
` (5 preceding siblings ...)
2025-10-21 16:43 ` [PATCH v2 6/8] iomap: use loff_t for file positions and offsets in writeback code Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
2025-10-21 16:43 ` [PATCH v2 8/8] iomap: use find_next_bit() for uptodate " Joanne Koong
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team
Use find_next_bit()/find_next_zero_bit() for iomap dirty bitmap
scanning. These use __ffs() internally and are more efficient at
finding the next dirty or clean bit than iterating through the bitmap
range and testing every bit.
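For reference, a userspace sketch of the lookup pattern. The find_next_bit()
below is only a naive stand-in for the kernel helper's semantics (first set
bit at or after 'start', or 'size' if none); the dirty bits live in the
[blks, 2 * blks) half of the shared ifs->state bitmap, hence the offsets:

#include <stdio.h>

static unsigned long find_next_bit(const unsigned long *map,
				   unsigned long size, unsigned long start)
{
	for (; start < size; start++)
		if (map[start / (8 * sizeof(long))] &
		    (1UL << (start % (8 * sizeof(long)))))
			return start;
	return size;
}

int main(void)
{
	unsigned long state[1] = { 0 };
	unsigned blks = 8;			/* blocks per folio */
	unsigned start_blk = 0, end_blk = blks - 1;

	state[0] |= 1UL << (blks + 3);		/* block 3 is dirty */
	state[0] |= 1UL << (blks + 5);		/* block 5 is dirty */

	/* same offsetting as ifs_next_dirty_block(): scan the dirty half */
	unsigned next = find_next_bit(state, blks + end_blk + 1,
				      blks + start_blk) - blks;
	printf("first dirty block: %u\n", next);	/* prints 3 */
	return 0;
}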
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
---
fs/iomap/buffered-io.c | 61 ++++++++++++++++++++++++++++--------------
1 file changed, 41 insertions(+), 20 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a8baf62fef30..156ec619b1e5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -76,13 +76,34 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
folio_mark_uptodate(folio);
}
-static inline bool ifs_block_is_dirty(struct folio *folio,
- struct iomap_folio_state *ifs, int block)
+/*
+ * Find the next dirty block in the folio. end_blk is inclusive.
+ * If no dirty block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_dirty_block(struct folio *folio,
+ unsigned start_blk, unsigned end_blk)
{
+ struct iomap_folio_state *ifs = folio->private;
struct inode *inode = folio->mapping->host;
- unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+ unsigned int blks = i_blocks_per_folio(inode, folio);
+
+ return find_next_bit(ifs->state, blks + end_blk + 1,
+ blks + start_blk) - blks;
+}
+
+/*
+ * Find the next clean block in the folio. end_blk is inclusive.
+ * If no clean block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_clean_block(struct folio *folio,
+ unsigned start_blk, unsigned end_blk)
+{
+ struct iomap_folio_state *ifs = folio->private;
+ struct inode *inode = folio->mapping->host;
+ unsigned int blks = i_blocks_per_folio(inode, folio);
- return test_bit(block + blks_per_folio, ifs->state);
+ return find_next_zero_bit(ifs->state, blks + end_blk + 1,
+ blks + start_blk) - blks;
}
static unsigned ifs_find_dirty_range(struct folio *folio,
@@ -94,18 +115,17 @@ static unsigned ifs_find_dirty_range(struct folio *folio,
offset_in_folio(folio, *range_start) >> inode->i_blkbits;
unsigned end_blk = min_not_zero(
offset_in_folio(folio, range_end) >> inode->i_blkbits,
- i_blocks_per_folio(inode, folio));
- unsigned nblks = 1;
-
- while (!ifs_block_is_dirty(folio, ifs, start_blk))
- if (++start_blk == end_blk)
- return 0;
+ i_blocks_per_folio(inode, folio)) - 1;
+ unsigned nblks;
- while (start_blk + nblks < end_blk) {
- if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
- break;
- nblks++;
- }
+ start_blk = ifs_next_dirty_block(folio, start_blk, end_blk);
+ if (start_blk > end_blk)
+ return 0;
+ if (start_blk == end_blk)
+ nblks = 1;
+ else
+ nblks = ifs_next_clean_block(folio, start_blk + 1, end_blk) -
+ start_blk;
*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
return nblks << inode->i_blkbits;
@@ -1122,7 +1142,7 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
struct folio *folio, loff_t start_byte, loff_t end_byte,
struct iomap *iomap, iomap_punch_t punch)
{
- unsigned int first_blk, last_blk, i;
+ unsigned int first_blk, last_blk;
loff_t last_byte;
u8 blkbits = inode->i_blkbits;
struct iomap_folio_state *ifs;
@@ -1141,10 +1161,11 @@ static void iomap_write_delalloc_ifs_punch(struct inode *inode,
folio_pos(folio) + folio_size(folio) - 1);
first_blk = offset_in_folio(folio, start_byte) >> blkbits;
last_blk = offset_in_folio(folio, last_byte) >> blkbits;
- for (i = first_blk; i <= last_blk; i++) {
- if (!ifs_block_is_dirty(folio, ifs, i))
- punch(inode, folio_pos(folio) + (i << blkbits),
- 1 << blkbits, iomap);
+ while ((first_blk = ifs_next_clean_block(folio, first_blk, last_blk))
+ <= last_blk) {
+ punch(inode, folio_pos(folio) + (first_blk << blkbits),
+ 1 << blkbits, iomap);
+ first_blk++;
}
}
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 8/8] iomap: use find_next_bit() for uptodate bitmap scanning
2025-10-21 16:43 [PATCH v2 0/8] iomap: buffered io changes Joanne Koong
` (6 preceding siblings ...)
2025-10-21 16:43 ` [PATCH v2 7/8] iomap: use find_next_bit() for dirty bitmap scanning Joanne Koong
@ 2025-10-21 16:43 ` Joanne Koong
7 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-21 16:43 UTC (permalink / raw)
To: brauner; +Cc: djwong, hch, bfoster, linux-fsdevel, kernel-team
Use find_next_bit()/find_next_zero_bit() for iomap uptodate bitmap
scanning. These use __ffs() internally and are more efficient at
finding the next uptodate or non-uptodate bit than iterating through
the bitmap range and testing every bit.
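Similarly, a userspace sketch of the new iomap_is_partially_uptodate() check.
find_next_zero_bit() below is a naive stand-in for the kernel helper (first
clear bit at or after 'start', or 'size' if none); uptodate bits occupy the
[0, blks) half of ifs->state, so no offset is needed:

#include <stdio.h>

static unsigned long find_next_zero_bit(const unsigned long *map,
					unsigned long size, unsigned long start)
{
	for (; start < size; start++)
		if (!(map[start / (8 * sizeof(long))] &
		      (1UL << (start % (8 * sizeof(long))))))
			return start;
	return size;
}

int main(void)
{
	unsigned long state[1] = { 0 };
	unsigned first = 2, last = 5;

	for (unsigned i = first; i <= last; i++)
		state[0] |= 1UL << i;		/* blocks 2..5 are uptodate */

	/* the range is fully uptodate iff no clear bit is found inside it */
	int uptodate = find_next_zero_bit(state, last + 1, first) > last;
	printf("blocks %u..%u fully uptodate: %d\n", first, last, uptodate);
	return 0;
}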
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
---
fs/iomap/buffered-io.c | 52 ++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 20 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 156ec619b1e5..0b842f350eec 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -38,10 +38,28 @@ static inline bool ifs_is_fully_uptodate(struct folio *folio,
return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
}
-static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
- unsigned int block)
+/*
+ * Find the next uptodate block in the folio. end_blk is inclusive.
+ * If no uptodate block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_uptodate_block(struct folio *folio,
+ unsigned start_blk, unsigned end_blk)
{
- return test_bit(block, ifs->state);
+ struct iomap_folio_state *ifs = folio->private;
+
+ return find_next_bit(ifs->state, end_blk + 1, start_blk);
+}
+
+/*
+ * Find the next non-uptodate block in the folio. end_blk is inclusive.
+ * If no non-uptodate block is found, this will return end_blk + 1.
+ */
+static unsigned ifs_next_nonuptodate_block(struct folio *folio,
+ unsigned start_blk, unsigned end_blk)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ return find_next_zero_bit(ifs->state, end_blk + 1, start_blk);
}
static bool ifs_set_range_uptodate(struct folio *folio,
@@ -278,14 +296,11 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
* to avoid reading in already uptodate ranges.
*/
if (ifs) {
- unsigned int i, blocks_skipped;
+ unsigned int next, blocks_skipped;
- /* move forward for each leading block marked uptodate */
- for (i = first; i <= last; i++)
- if (!ifs_block_is_uptodate(ifs, i))
- break;
+ next = ifs_next_nonuptodate_block(folio, first, last);
+ blocks_skipped = next - first;
- blocks_skipped = i - first;
if (blocks_skipped) {
unsigned long block_offset = *pos & (block_size - 1);
unsigned bytes_skipped =
@@ -295,15 +310,15 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
poff += bytes_skipped;
plen -= bytes_skipped;
}
- first = i;
+ first = next;
/* truncate len if we find any trailing uptodate block(s) */
- while (++i <= last) {
- if (ifs_block_is_uptodate(ifs, i)) {
+ if (++next <= last) {
+ next = ifs_next_uptodate_block(folio, next, last);
+ if (next <= last) {
plen -= iomap_bytes_to_truncate(*pos + plen,
- block_bits, last - i + 1);
- last = i - 1;
- break;
+ block_bits, last - next + 1);
+ last = next - 1;
}
}
}
@@ -596,7 +611,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
{
struct iomap_folio_state *ifs = folio->private;
struct inode *inode = folio->mapping->host;
- unsigned first, last, i;
+ unsigned first, last;
if (!ifs)
return false;
@@ -608,10 +623,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
first = from >> inode->i_blkbits;
last = (from + count - 1) >> inode->i_blkbits;
- for (i = first; i <= last; i++)
- if (!ifs_block_is_uptodate(ifs, i))
- return false;
- return true;
+ return ifs_next_nonuptodate_block(folio, first, last) > last;
}
EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
--
2.47.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/8] iomap: optimize pending async writeback accounting
2025-10-21 16:43 ` [PATCH v2 3/8] iomap: optimize pending async writeback accounting Joanne Koong
@ 2025-10-22 4:49 ` Christoph Hellwig
2025-10-22 14:23 ` Matthew Wilcox
1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2025-10-22 4:49 UTC (permalink / raw)
To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/8] iomap: optimize pending async writeback accounting
2025-10-21 16:43 ` [PATCH v2 3/8] iomap: optimize pending async writeback accounting Joanne Koong
2025-10-22 4:49 ` Christoph Hellwig
@ 2025-10-22 14:23 ` Matthew Wilcox
2025-10-22 21:29 ` Joanne Koong
1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2025-10-22 14:23 UTC (permalink / raw)
To: Joanne Koong; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team
On Tue, Oct 21, 2025 at 09:43:47AM -0700, Joanne Koong wrote:
> static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
> - bool *wb_pending)
> + unsigned *bytes_pending)
This makes me nervous. You're essentially saying "we'll never support
a folio larger than 2GiB" and I disagree. I think for it to become a
practical reality to cache files in 4GiB or larger chunks, we'll need
to see about five more doublings in I/O bandwidth. Looking at the
progression of PCIe recently that's only about 15 years out.
I'd recommend using size_t here to match folio_size().
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/8] iomap: optimize pending async writeback accounting
2025-10-22 14:23 ` Matthew Wilcox
@ 2025-10-22 21:29 ` Joanne Koong
0 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2025-10-22 21:29 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: brauner, djwong, hch, bfoster, linux-fsdevel, kernel-team
On Wed, Oct 22, 2025 at 7:23 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 21, 2025 at 09:43:47AM -0700, Joanne Koong wrote:
> > static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> > struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
> > - bool *wb_pending)
> > + unsigned *bytes_pending)
>
> This makes me nervous. You're essentially saying "we'll never support
> a folio larger than 2GiB" and I disagree. I think for it to become a
> practical reality to cache files in 4GiB or larger chunks, we'll need
> to see about five more doublings in I/O bandwidth. Looking at the
> progression of PCIe recently that's only about 15 years out.
>
> I'd recommend using size_t here to match folio_size().
>
I will change this to a size_t. Thanks for looking at this.
^ permalink raw reply [flat|nested] 12+ messages in thread