* [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback
@ 2025-06-13 21:46 Joanne Koong
2025-06-13 21:46 ` [PATCH v2 01/16] iomap: move buffered io CONFIG_BLOCK dependent logic into separate file Joanne Koong
` (15 more replies)
0 siblings, 16 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
This series adds fuse iomap support for buffered writes and dirty folio
writeback. This is needed so that granular uptodate and dirty tracking can
be used in fuse when large folios are enabled. This has two big advantages:
for writes, only the relevant portions of a folio need to be read into the
page cache instead of the entire folio, and for writeback, only the dirty
portions of a folio need to be written back instead of the entire folio.
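As a rough illustration of what granular tracking buys, per-block state can be kept as a small bitmap along the lines of iomap's iomap_folio_state, with uptodate bits in the low half and dirty bits in the high half. This is a simplified standalone sketch, not the kernel code: no locking, fixed folio geometry, and made-up names.

```c
/* Simplified sketch of per-block uptodate/dirty tracking for one folio.
 * Two bits per block: bits [0, bpf) are uptodate, [bpf, 2*bpf) are dirty.
 * Illustrative only -- not the kernel's iomap_folio_state. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCKS_PER_FOLIO 16	/* e.g. a 64K folio with 4K blocks */

struct demo_folio_state {
	uint64_t state;		/* 2 * 16 = 32 bits used */
};

static void demo_mark_uptodate(struct demo_folio_state *ifs,
			       unsigned first_blk, unsigned nr)
{
	for (unsigned b = first_blk; b < first_blk + nr; b++)
		ifs->state |= 1ULL << b;
}

static bool demo_block_is_uptodate(const struct demo_folio_state *ifs,
				   unsigned blk)
{
	return ifs->state & (1ULL << blk);
}

static void demo_mark_dirty(struct demo_folio_state *ifs,
			    unsigned first_blk, unsigned nr)
{
	for (unsigned b = first_blk; b < first_blk + nr; b++)
		ifs->state |= 1ULL << (BLOCKS_PER_FOLIO + b);
}

static bool demo_block_is_dirty(const struct demo_folio_state *ifs,
				unsigned blk)
{
	return ifs->state & (1ULL << (BLOCKS_PER_FOLIO + blk));
}
```

With this, a write to a sub-range only needs to read in the not-yet-uptodate blocks it touches, and writeback only needs to walk the dirty bits rather than write the whole folio.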
This patchset does three things, in order:
a) Decouple iomap/buffered-io.c code from the CONFIG_BLOCK dependency, as some
environments that run fuse may not have CONFIG_BLOCK set
b) Add support to iomap buffered io for generic write and writeback that is
not dependent on bios
c) Add fuse integration with iomap
Patches 3 and 5 are obviated by the refactoring done later on in patches 10
and 11, but I kept the patchset in this order in the hope of making it
easier to follow logically.
This series was run through fstests with large folios enabled and through
some quick sanity checks on passthrough_hp with a) writing 1 GB in 1 MB chunks
and then going back and dirtying a few bytes in each chunk, and b) writing 50
MB in 1 MB chunks and re-dirtying the entire chunk across several runs.
a) showed about a 40% speedup with iomap support added and b) showed
roughly the same performance.
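Sanity check a) can be sketched roughly as follows. This is a scaled-down standalone sketch; the file path, chunk count, and helper name are made up for illustration and the real runs used passthrough_hp with 1 GB of data.

```c
/* Hypothetical sketch of sanity check (a): write the file in 1 MiB
 * chunks, sync, then dirty a few bytes at the start of each chunk and
 * sync again.  With granular dirty tracking, the second sync should
 * write back only the small dirty ranges, not whole (large) folios. */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK	(1 << 20)	/* 1 MiB */
#define NCHUNKS	8		/* scaled down from 1 GiB */

static int run_dirty_bytes_test(const char *path)
{
	char *buf = malloc(CHUNK);
	int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

	if (fd < 0 || !buf)
		return -1;
	memset(buf, 'a', CHUNK);

	/* Pass 1: write the whole file in 1 MiB chunks. */
	for (int i = 0; i < NCHUNKS; i++)
		if (write(fd, buf, CHUNK) != CHUNK)
			return -1;
	if (fsync(fd))
		return -1;

	/* Pass 2: dirty only a few bytes in each chunk. */
	for (int i = 0; i < NCHUNKS; i++)
		if (pwrite(fd, "bbbb", 4, (off_t)i * CHUNK) != 4)
			return -1;
	if (fsync(fd))
		return -1;

	free(buf);
	return close(fd);
}
```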
This patchset does not enable large folios yet. That will be sent out in a
separate future patchset.
This series is on top of commit 27605c8c0 ("Merge tag 'net-6.16-rc2'...") in
the linux tree.
Thanks,
Joanne
Changelog
---------
v1 -> v2:
* Drop IOMAP_IN_MEM type and just use IOMAP_MAPPED for fuse
* Separate out new helper functions added to iomap into separate commits
* Update iomap documentation
* Clean up iomap_writeback_dirty_folio() locking logic w/ christoph's
recommendation
* Refactor ->map_blocks() to generic ->writeback_folio()
* Refactor ->submit_ioend() to generic ->writeback_complete()
* Add patch for changing 'count' to 'async_writeback'
* Rebase commits onto linux branch instead of fuse branch
v1: https://lore.kernel.org/linux-fsdevel/20250606233803.1421259-1-joannelkoong@gmail.com/
Joanne Koong (16):
iomap: move buffered io CONFIG_BLOCK dependent logic into separate
file
iomap: iomap_read_folio_sync() -> iomap_bio_read_folio_sync()
iomap: iomap_add_to_ioend() -> iomap_bio_add_to_ioend()
iomap: add wrapper function iomap_bio_readpage()
iomap: add wrapper function iomap_bio_ioend_error()
iomap: add wrapper function iomap_submit_bio()
iomap: decouple buffered-io.o from CONFIG_BLOCK
iomap: add read_folio_sync() handler for buffered writes
iomap: change 'count' to 'async_writeback'
iomap: replace ->map_blocks() with generic ->writeback_folio() for
writeback
iomap: replace ->submit_ioend() with generic ->writeback_complete()
for writeback
iomap: support more customized writeback handling
iomap: add iomap_writeback_dirty_folio()
fuse: use iomap for buffered writes
fuse: use iomap for writeback
fuse: use iomap for folio laundering
.../filesystems/iomap/operations.rst | 65 ++-
block/fops.c | 7 +-
fs/fuse/Kconfig | 1 +
fs/fuse/file.c | 308 +++++-------
fs/gfs2/bmap.c | 7 +-
fs/iomap/Makefile | 5 +-
fs/iomap/buffered-io-bio.c | 365 ++++++++++++++
fs/iomap/buffered-io.c | 471 +++---------------
fs/iomap/internal.h | 40 ++
fs/xfs/xfs_aops.c | 28 +-
fs/zonefs/file.c | 7 +-
include/linux/iomap.h | 88 +++-
12 files changed, 775 insertions(+), 617 deletions(-)
create mode 100644 fs/iomap/buffered-io-bio.c
--
2.47.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 01/16] iomap: move buffered io CONFIG_BLOCK dependent logic into separate file
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 02/16] iomap: iomap_read_folio_sync() -> iomap_bio_read_folio_sync() Joanne Koong
` (14 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Move the bulk of the buffered io logic that depends on CONFIG_BLOCK into a
separate file, buffered-io-bio.c. This works towards allowing callers that
do not have CONFIG_BLOCK set to also use iomap for buffered io and hook
into internal features such as granular dirty and uptodate tracking for
large folios.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/Makefile | 1 +
fs/iomap/buffered-io-bio.c | 210 +++++++++++++++++++++++++++++++++++
fs/iomap/buffered-io.c | 222 +------------------------------------
fs/iomap/internal.h | 25 +++++
4 files changed, 239 insertions(+), 219 deletions(-)
create mode 100644 fs/iomap/buffered-io-bio.c
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index 69e8ebb41302..fb7e8a7a3da4 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_FS_IOMAP) += iomap.o
iomap-y += trace.o \
iter.o
iomap-$(CONFIG_BLOCK) += buffered-io.o \
+ buffered-io-bio.o \
direct-io.o \
ioend.o \
fiemap.o \
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
new file mode 100644
index 000000000000..24f5ede7af3d
--- /dev/null
+++ b/fs/iomap/buffered-io-bio.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ * Copyright (C) 2016-2023 Christoph Hellwig.
+ */
+#include <linux/bio.h>
+#include <linux/buffer_head.h>
+#include <linux/iomap.h>
+#include <linux/writeback.h>
+
+#include "internal.h"
+
+int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
+ size_t poff, size_t plen, const struct iomap *iomap)
+{
+ struct bio_vec bvec;
+ struct bio bio;
+
+ bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
+ bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
+ bio_add_folio_nofail(&bio, folio, plen, poff);
+ return submit_bio_wait(&bio);
+}
+
+static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
+ size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
+ WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
+
+ if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
+ folio_end_writeback(folio);
+}
+
+/*
+ * We're now finished for good with this ioend structure. Update the page
+ * state, release holds on bios, and finally free up memory. Do not use the
+ * ioend after this.
+ */
+u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
+{
+ struct inode *inode = ioend->io_inode;
+ struct bio *bio = &ioend->io_bio;
+ struct folio_iter fi;
+ u32 folio_count = 0;
+
+ if (ioend->io_error) {
+ mapping_set_error(inode->i_mapping, ioend->io_error);
+ if (!bio_flagged(bio, BIO_QUIET)) {
+ pr_err_ratelimited(
+"%s: writeback error on inode %lu, offset %lld, sector %llu",
+ inode->i_sb->s_id, inode->i_ino,
+ ioend->io_offset, ioend->io_sector);
+ }
+ }
+
+ /* walk all folios in bio, ending page IO on them */
+ bio_for_each_folio_all(fi, bio) {
+ iomap_finish_folio_write(inode, fi.folio, fi.length);
+ folio_count++;
+ }
+
+ bio_put(bio); /* frees the ioend */
+ return folio_count;
+}
+
+static void iomap_writepage_end_bio(struct bio *bio)
+{
+ struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
+
+ ioend->io_error = blk_status_to_errno(bio->bi_status);
+ iomap_finish_ioend_buffered(ioend);
+}
+
+static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct inode *inode, loff_t pos,
+ u16 ioend_flags)
+{
+ struct bio *bio;
+
+ bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
+ REQ_OP_WRITE | wbc_to_write_flags(wbc),
+ GFP_NOFS, &iomap_ioend_bioset);
+ bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
+ bio->bi_end_io = iomap_writepage_end_bio;
+ bio->bi_write_hint = inode->i_write_hint;
+ wbc_init_bio(wbc, bio);
+ wpc->nr_folios = 0;
+ return iomap_init_ioend(inode, bio, pos, ioend_flags);
+}
+
+static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
+ u16 ioend_flags)
+{
+ if (ioend_flags & IOMAP_IOEND_BOUNDARY)
+ return false;
+ if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=
+ (wpc->ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
+ return false;
+ if (pos != wpc->ioend->io_offset + wpc->ioend->io_size)
+ return false;
+ if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
+ iomap_sector(&wpc->iomap, pos) !=
+ bio_end_sector(&wpc->ioend->io_bio))
+ return false;
+ /*
+ * Limit ioend bio chain lengths to minimise IO completion latency. This
+ * also prevents long tight loops ending page writeback on all the
+ * folios in the ioend.
+ */
+ if (wpc->nr_folios >= IOEND_BATCH_SIZE)
+ return false;
+ return true;
+}
+
+/*
+ * Test to see if we have an existing ioend structure that we could append to
+ * first; otherwise finish off the current ioend and start another.
+ *
+ * If a new ioend is created and cached, the old ioend is submitted to the block
+ * layer instantly. Batching optimisations are provided by higher level block
+ * plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
+ */
+int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio,
+ struct inode *inode, loff_t pos, loff_t end_pos,
+ unsigned len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+ size_t poff = offset_in_folio(folio, pos);
+ unsigned int ioend_flags = 0;
+ int error;
+
+ if (wpc->iomap.type == IOMAP_UNWRITTEN)
+ ioend_flags |= IOMAP_IOEND_UNWRITTEN;
+ if (wpc->iomap.flags & IOMAP_F_SHARED)
+ ioend_flags |= IOMAP_IOEND_SHARED;
+ if (folio_test_dropbehind(folio))
+ ioend_flags |= IOMAP_IOEND_DONTCACHE;
+ if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
+ ioend_flags |= IOMAP_IOEND_BOUNDARY;
+
+ if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
+new_ioend:
+ error = iomap_submit_ioend(wpc, 0);
+ if (error)
+ return error;
+ wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos,
+ ioend_flags);
+ }
+
+ if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
+ goto new_ioend;
+
+ if (ifs)
+ atomic_add(len, &ifs->write_bytes_pending);
+
+ /*
+ * Clamp io_offset and io_size to the incore EOF so that ondisk
+ * file size updates in the ioend completion are byte-accurate.
+ * This avoids recovering files with zeroed tail regions when
+ * writeback races with appending writes:
+ *
+ * Thread 1: Thread 2:
+ * ------------ -----------
+ * write [A, A+B]
+ * update inode size to A+B
+ * submit I/O [A, A+BS]
+ * write [A+B, A+B+C]
+ * update inode size to A+B+C
+ * <I/O completes, updates disk size to min(A+B+C, A+BS)>
+ * <power failure>
+ *
+ * After reboot:
+ * 1) with A+B+C < A+BS, the file has zero padding in range
+ * [A+B, A+B+C]
+ *
+ * |< Block Size (BS) >|
+ * |DDDDDDDDDDDD0000000000000|
+ * ^ ^ ^
+ * A A+B A+B+C
+ * (EOF)
+ *
+ * 2) with A+B+C > A+BS, the file has zero padding in range
+ * [A+B, A+BS]
+ *
+ * |< Block Size (BS) >|< Block Size (BS) >|
+ * |DDDDDDDDDDDD0000000000000|00000000000000000000000000|
+ * ^ ^ ^ ^
+ * A A+B A+BS A+B+C
+ * (EOF)
+ *
+ * D = Valid Data
+ * 0 = Zero Padding
+ *
+ * Note that this defeats the ability to chain the ioends of
+ * appending writes.
+ */
+ wpc->ioend->io_size += len;
+ if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
+ wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
+
+ wbc_account_cgroup_owner(wbc, folio, len);
+ return 0;
+}
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3729391a18f3..47e27459da4d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -21,23 +21,6 @@
#include "../internal.h"
-/*
- * Structure allocated for each folio to track per-block uptodate, dirty state
- * and I/O completions.
- */
-struct iomap_folio_state {
- spinlock_t state_lock;
- unsigned int read_bytes_pending;
- atomic_t write_bytes_pending;
-
- /*
- * Each block has two bits in this bitmap:
- * Bits [0..blocks_per_folio) has the uptodate status.
- * Bits [b_p_f...(2*b_p_f)) has the dirty status.
- */
- unsigned long state[];
-};
-
static inline bool ifs_is_fully_uptodate(struct folio *folio,
struct iomap_folio_state *ifs)
{
@@ -52,8 +35,8 @@ static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
return test_bit(block, ifs->state);
}
-static bool ifs_set_range_uptodate(struct folio *folio,
- struct iomap_folio_state *ifs, size_t off, size_t len)
+bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs,
+ size_t off, size_t len)
{
struct inode *inode = folio->mapping->host;
unsigned int first_blk = off >> inode->i_blkbits;
@@ -667,18 +650,6 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
pos + len - 1);
}
-static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
- size_t poff, size_t plen, const struct iomap *iomap)
-{
- struct bio_vec bvec;
- struct bio bio;
-
- bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
- bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
- bio_add_folio_nofail(&bio, folio, plen, poff);
- return submit_bio_wait(&bio);
-}
-
static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
struct folio *folio)
{
@@ -1535,58 +1506,6 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
-static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
- size_t len)
-{
- struct iomap_folio_state *ifs = folio->private;
-
- WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
- WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
-
- if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
- folio_end_writeback(folio);
-}
-
-/*
- * We're now finished for good with this ioend structure. Update the page
- * state, release holds on bios, and finally free up memory. Do not use the
- * ioend after this.
- */
-u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
-{
- struct inode *inode = ioend->io_inode;
- struct bio *bio = &ioend->io_bio;
- struct folio_iter fi;
- u32 folio_count = 0;
-
- if (ioend->io_error) {
- mapping_set_error(inode->i_mapping, ioend->io_error);
- if (!bio_flagged(bio, BIO_QUIET)) {
- pr_err_ratelimited(
-"%s: writeback error on inode %lu, offset %lld, sector %llu",
- inode->i_sb->s_id, inode->i_ino,
- ioend->io_offset, ioend->io_sector);
- }
- }
-
- /* walk all folios in bio, ending page IO on them */
- bio_for_each_folio_all(fi, bio) {
- iomap_finish_folio_write(inode, fi.folio, fi.length);
- folio_count++;
- }
-
- bio_put(bio); /* frees the ioend */
- return folio_count;
-}
-
-static void iomap_writepage_end_bio(struct bio *bio)
-{
- struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
-
- ioend->io_error = blk_status_to_errno(bio->bi_status);
- iomap_finish_ioend_buffered(ioend);
-}
-
/*
* Submit an ioend.
*
@@ -1596,7 +1515,7 @@ static void iomap_writepage_end_bio(struct bio *bio)
* with the error status here to run the normal I/O completion handler to clear
* the writeback bit and let the file system process the errors.
*/
-static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
+int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
{
if (!wpc->ioend)
return error;
@@ -1625,141 +1544,6 @@ static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
return error;
}
-static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct inode *inode, loff_t pos,
- u16 ioend_flags)
-{
- struct bio *bio;
-
- bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
- REQ_OP_WRITE | wbc_to_write_flags(wbc),
- GFP_NOFS, &iomap_ioend_bioset);
- bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
- bio->bi_end_io = iomap_writepage_end_bio;
- bio->bi_write_hint = inode->i_write_hint;
- wbc_init_bio(wbc, bio);
- wpc->nr_folios = 0;
- return iomap_init_ioend(inode, bio, pos, ioend_flags);
-}
-
-static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
- u16 ioend_flags)
-{
- if (ioend_flags & IOMAP_IOEND_BOUNDARY)
- return false;
- if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=
- (wpc->ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
- return false;
- if (pos != wpc->ioend->io_offset + wpc->ioend->io_size)
- return false;
- if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
- iomap_sector(&wpc->iomap, pos) !=
- bio_end_sector(&wpc->ioend->io_bio))
- return false;
- /*
- * Limit ioend bio chain lengths to minimise IO completion latency. This
- * also prevents long tight loops ending page writeback on all the
- * folios in the ioend.
- */
- if (wpc->nr_folios >= IOEND_BATCH_SIZE)
- return false;
- return true;
-}
-
-/*
- * Test to see if we have an existing ioend structure that we could append to
- * first; otherwise finish off the current ioend and start another.
- *
- * If a new ioend is created and cached, the old ioend is submitted to the block
- * layer instantly. Batching optimisations are provided by higher level block
- * plugging.
- *
- * At the end of a writeback pass, there will be a cached ioend remaining on the
- * writepage context that the caller will need to submit.
- */
-static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, loff_t pos, loff_t end_pos,
- unsigned len)
-{
- struct iomap_folio_state *ifs = folio->private;
- size_t poff = offset_in_folio(folio, pos);
- unsigned int ioend_flags = 0;
- int error;
-
- if (wpc->iomap.type == IOMAP_UNWRITTEN)
- ioend_flags |= IOMAP_IOEND_UNWRITTEN;
- if (wpc->iomap.flags & IOMAP_F_SHARED)
- ioend_flags |= IOMAP_IOEND_SHARED;
- if (folio_test_dropbehind(folio))
- ioend_flags |= IOMAP_IOEND_DONTCACHE;
- if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
- ioend_flags |= IOMAP_IOEND_BOUNDARY;
-
- if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
-new_ioend:
- error = iomap_submit_ioend(wpc, 0);
- if (error)
- return error;
- wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos,
- ioend_flags);
- }
-
- if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
- goto new_ioend;
-
- if (ifs)
- atomic_add(len, &ifs->write_bytes_pending);
-
- /*
- * Clamp io_offset and io_size to the incore EOF so that ondisk
- * file size updates in the ioend completion are byte-accurate.
- * This avoids recovering files with zeroed tail regions when
- * writeback races with appending writes:
- *
- * Thread 1: Thread 2:
- * ------------ -----------
- * write [A, A+B]
- * update inode size to A+B
- * submit I/O [A, A+BS]
- * write [A+B, A+B+C]
- * update inode size to A+B+C
- * <I/O completes, updates disk size to min(A+B+C, A+BS)>
- * <power failure>
- *
- * After reboot:
- * 1) with A+B+C < A+BS, the file has zero padding in range
- * [A+B, A+B+C]
- *
- * |< Block Size (BS) >|
- * |DDDDDDDDDDDD0000000000000|
- * ^ ^ ^
- * A A+B A+B+C
- * (EOF)
- *
- * 2) with A+B+C > A+BS, the file has zero padding in range
- * [A+B, A+BS]
- *
- * |< Block Size (BS) >|< Block Size (BS) >|
- * |DDDDDDDDDDDD0000000000000|00000000000000000000000000|
- * ^ ^ ^ ^
- * A A+B A+BS A+B+C
- * (EOF)
- *
- * D = Valid Data
- * 0 = Zero Padding
- *
- * Note that this defeats the ability to chain the ioends of
- * appending writes.
- */
- wpc->ioend->io_size += len;
- if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
- wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
-
- wbc_account_cgroup_owner(wbc, folio, len);
- return 0;
-}
-
static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
struct inode *inode, u64 pos, u64 end_pos,
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index f6992a3bf66a..2fc1796053da 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -4,7 +4,32 @@
#define IOEND_BATCH_SIZE 4096
+/*
+ * Structure allocated for each folio to track per-block uptodate, dirty state
+ * and I/O completions.
+ */
+struct iomap_folio_state {
+ spinlock_t state_lock;
+ unsigned int read_bytes_pending;
+ atomic_t write_bytes_pending;
+
+ /*
+ * Each block has two bits in this bitmap:
+ * Bits [0..blocks_per_folio) has the uptodate status.
+ * Bits [b_p_f...(2*b_p_f)) has the dirty status.
+ */
+ unsigned long state[];
+};
+
u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend);
u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
+bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs,
+ size_t off, size_t len);
+int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error);
+int iomap_read_folio_sync(loff_t block_start, struct folio *folio, size_t poff,
+ size_t plen, const struct iomap *iomap);
+int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio,
+ struct inode *inode, loff_t pos, loff_t end_pos, unsigned len);
#endif /* _IOMAP_INTERNAL_H */
--
2.47.1
* [PATCH v2 02/16] iomap: iomap_read_folio_sync() -> iomap_bio_read_folio_sync()
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
2025-06-13 21:46 ` [PATCH v2 01/16] iomap: move buffered io CONFIG_BLOCK dependent logic into separate file Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 03/16] iomap: iomap_add_to_ioend() -> iomap_bio_add_to_ioend() Joanne Koong
` (13 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Rename iomap_read_folio_sync() to iomap_bio_read_folio_sync() to indicate
its dependency on the block io layer, and add a CONFIG_BLOCK check so that
iomap_bio_read_folio_sync() returns -ENOSYS when called in environments
where CONFIG_BLOCK is not set.
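The stub pattern used here can be illustrated in plain C: when the feature is compiled out, the helper collapses to a macro that evaluates to -ENOSYS, so call sites can handle the error uniformly without per-site #ifdefs. The names below are illustrative, not the kernel's.

```c
/* Minimal sketch of the compile-out stub pattern.  Toggle the
 * CONFIG_BLOCK define to emulate a block-enabled build.  All names
 * here are made up for illustration. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* #define CONFIG_BLOCK 1 */

#ifdef CONFIG_BLOCK
static int demo_bio_read_folio_sync(long block_start)
{
	/* a real bio-based synchronous read would go here */
	(void)block_start;
	return 0;
}
#else
/* Compiled-out stub: ignores its arguments, evaluates to -ENOSYS. */
#define demo_bio_read_folio_sync(...)	(-ENOSYS)
#endif

static int demo_write_begin(long block_start)
{
	int ret = demo_bio_read_folio_sync(block_start);

	if (ret == -ENOSYS)
		fprintf(stderr,
			"no block layer: caller must supply its own read path\n");
	return ret;
}
```

A caller that can supply its own read path (as fuse does later in this series) checks for -ENOSYS and falls back accordingly.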
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io-bio.c | 2 +-
fs/iomap/buffered-io.c | 2 +-
fs/iomap/internal.h | 9 +++++++--
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index 24f5ede7af3d..c1132ff4a502 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -10,7 +10,7 @@
#include "internal.h"
-int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
+int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
size_t poff, size_t plen, const struct iomap *iomap)
{
struct bio_vec bvec;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 47e27459da4d..227cbd9a3e9e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -702,7 +702,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
if (iter->flags & IOMAP_NOWAIT)
return -EAGAIN;
- status = iomap_read_folio_sync(block_start, folio,
+ status = iomap_bio_read_folio_sync(block_start, folio,
poff, plen, srcmap);
if (status)
return status;
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index 2fc1796053da..9efdbf82795e 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -26,10 +26,15 @@ u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs,
size_t off, size_t len);
int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error);
-int iomap_read_folio_sync(loff_t block_start, struct folio *folio, size_t poff,
- size_t plen, const struct iomap *iomap);
int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
struct inode *inode, loff_t pos, loff_t end_pos, unsigned len);
+#ifdef CONFIG_BLOCK
+int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
+ size_t poff, size_t plen, const struct iomap *iomap);
+#else
+#define iomap_bio_read_folio_sync(...) (-ENOSYS)
+#endif /* CONFIG_BLOCK */
+
#endif /* _IOMAP_INTERNAL_H */
--
2.47.1
* [PATCH v2 03/16] iomap: iomap_add_to_ioend() -> iomap_bio_add_to_ioend()
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
2025-06-13 21:46 ` [PATCH v2 01/16] iomap: move buffered io CONFIG_BLOCK dependent logic into separate file Joanne Koong
2025-06-13 21:46 ` [PATCH v2 02/16] iomap: iomap_read_folio_sync() -> iomap_bio_read_folio_sync() Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage() Joanne Koong
` (12 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Rename iomap_add_to_ioend() to iomap_bio_add_to_ioend() to indicate its
dependency on the block io layer, and add a CONFIG_BLOCK check so that
iomap_bio_add_to_ioend() returns -ENOSYS when called in environments
where CONFIG_BLOCK is not set.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io-bio.c | 2 +-
fs/iomap/buffered-io.c | 4 ++--
fs/iomap/internal.h | 7 ++++---
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index c1132ff4a502..798cb59dbbf4 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -126,7 +126,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
* At the end of a writeback pass, there will be a cached ioend remaining on the
* writepage context that the caller will need to submit.
*/
-int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
struct inode *inode, loff_t pos, loff_t end_pos,
unsigned len)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 227cbd9a3e9e..b7b7222a1700 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1571,8 +1571,8 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
case IOMAP_HOLE:
break;
default:
- error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
- end_pos, map_len);
+ error = iomap_bio_add_to_ioend(wpc, wbc, folio, inode,
+ pos, end_pos, map_len);
if (!error)
(*count)++;
break;
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index 9efdbf82795e..7fa3114c5d16 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -26,15 +26,16 @@ u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs,
size_t off, size_t len);
int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error);
-int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, loff_t pos, loff_t end_pos, unsigned len);
#ifdef CONFIG_BLOCK
int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
size_t poff, size_t plen, const struct iomap *iomap);
+int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio,
+ struct inode *inode, loff_t pos, loff_t end_pos, unsigned len);
#else
#define iomap_bio_read_folio_sync(...) (-ENOSYS)
+#define iomap_bio_add_to_ioend(...) (-ENOSYS)
#endif /* CONFIG_BLOCK */
#endif /* _IOMAP_INTERNAL_H */
--
2.47.1
* [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (2 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 03/16] iomap: iomap_add_to_ioend() -> iomap_bio_add_to_ioend() Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-16 12:49 ` Christoph Hellwig
2025-06-13 21:46 ` [PATCH v2 05/16] iomap: add wrapper function iomap_bio_ioend_error() Joanne Koong
` (11 subsequent siblings)
15 siblings, 1 reply; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Add a wrapper function, iomap_bio_readpage(), around the bio readpage
logic so that callers that do not have CONFIG_BLOCK set may also use
iomap for buffered io.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io-bio.c | 71 ++++++++++++++++++++++++++++++++++++++
fs/iomap/buffered-io.c | 71 +-------------------------------------
fs/iomap/internal.h | 11 ++++++
3 files changed, 83 insertions(+), 70 deletions(-)
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index 798cb59dbbf4..e27a43291653 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -10,6 +10,77 @@
#include "internal.h"
+static void iomap_finish_folio_read(struct folio *folio, size_t off,
+ size_t len, int error)
+{
+ struct iomap_folio_state *ifs = folio->private;
+ bool uptodate = !error;
+ bool finished = true;
+
+ if (ifs) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&ifs->state_lock, flags);
+ if (!error)
+ uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
+ ifs->read_bytes_pending -= len;
+ finished = !ifs->read_bytes_pending;
+ spin_unlock_irqrestore(&ifs->state_lock, flags);
+ }
+
+ if (finished)
+ folio_end_read(folio, uptodate);
+}
+
+static void iomap_read_end_io(struct bio *bio)
+{
+ int error = blk_status_to_errno(bio->bi_status);
+ struct folio_iter fi;
+
+ bio_for_each_folio_all(fi, bio)
+ iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+ bio_put(bio);
+}
+
+void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
+ struct iomap_readpage_ctx *ctx, size_t poff, size_t plen,
+ loff_t length)
+{
+ struct folio *folio = ctx->cur_folio;
+ sector_t sector;
+
+ sector = iomap_sector(iomap, pos);
+ if (!ctx->bio ||
+ bio_end_sector(ctx->bio) != sector ||
+ !bio_add_folio(ctx->bio, folio, plen, poff)) {
+ gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
+ gfp_t orig_gfp = gfp;
+ unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
+
+ if (ctx->bio)
+ submit_bio(ctx->bio);
+
+ if (ctx->rac) /* same as readahead_gfp_mask */
+ gfp |= __GFP_NORETRY | __GFP_NOWARN;
+ ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+ REQ_OP_READ, gfp);
+ /*
+ * If the bio_alloc fails, try it again for a single page to
+ * avoid having to deal with partial page reads. This emulates
+ * what do_mpage_read_folio does.
+ */
+ if (!ctx->bio) {
+ ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
+ orig_gfp);
+ }
+ if (ctx->rac)
+ ctx->bio->bi_opf |= REQ_RAHEAD;
+ ctx->bio->bi_iter.bi_sector = sector;
+ ctx->bio->bi_end_io = iomap_read_end_io;
+ bio_add_folio_nofail(ctx->bio, folio, plen, poff);
+ }
+}
+
int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
size_t poff, size_t plen, const struct iomap *iomap)
{
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b7b7222a1700..45c701af3f0c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -267,45 +267,6 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
*lenp = plen;
}
-static void iomap_finish_folio_read(struct folio *folio, size_t off,
- size_t len, int error)
-{
- struct iomap_folio_state *ifs = folio->private;
- bool uptodate = !error;
- bool finished = true;
-
- if (ifs) {
- unsigned long flags;
-
- spin_lock_irqsave(&ifs->state_lock, flags);
- if (!error)
- uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
- ifs->read_bytes_pending -= len;
- finished = !ifs->read_bytes_pending;
- spin_unlock_irqrestore(&ifs->state_lock, flags);
- }
-
- if (finished)
- folio_end_read(folio, uptodate);
-}
-
-static void iomap_read_end_io(struct bio *bio)
-{
- int error = blk_status_to_errno(bio->bi_status);
- struct folio_iter fi;
-
- bio_for_each_folio_all(fi, bio)
- iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
- bio_put(bio);
-}
-
-struct iomap_readpage_ctx {
- struct folio *cur_folio;
- bool cur_folio_in_bio;
- struct bio *bio;
- struct readahead_control *rac;
-};
-
/**
* iomap_read_inline_data - copy inline data into the page cache
* @iter: iteration structure
@@ -354,7 +315,6 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
struct folio *folio = ctx->cur_folio;
struct iomap_folio_state *ifs;
size_t poff, plen;
- sector_t sector;
int ret;
if (iomap->type == IOMAP_INLINE) {
@@ -383,36 +343,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
spin_unlock_irq(&ifs->state_lock);
}
- sector = iomap_sector(iomap, pos);
- if (!ctx->bio ||
- bio_end_sector(ctx->bio) != sector ||
- !bio_add_folio(ctx->bio, folio, plen, poff)) {
- gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
- gfp_t orig_gfp = gfp;
- unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
-
- if (ctx->bio)
- submit_bio(ctx->bio);
-
- if (ctx->rac) /* same as readahead_gfp_mask */
- gfp |= __GFP_NORETRY | __GFP_NOWARN;
- ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
- /*
- * If the bio_alloc fails, try it again for a single page to
- * avoid having to deal with partial page reads. This emulates
- * what do_mpage_read_folio does.
- */
- if (!ctx->bio) {
- ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
- orig_gfp);
- }
- if (ctx->rac)
- ctx->bio->bi_opf |= REQ_RAHEAD;
- ctx->bio->bi_iter.bi_sector = sector;
- ctx->bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(ctx->bio, folio, plen, poff);
- }
+ iomap_bio_readpage(iomap, pos, ctx, poff, plen, length);
done:
/*
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index 7fa3114c5d16..bbef4b947633 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -21,6 +21,13 @@ struct iomap_folio_state {
unsigned long state[];
};
+struct iomap_readpage_ctx {
+ struct folio *cur_folio;
+ bool cur_folio_in_bio;
+ struct bio *bio;
+ struct readahead_control *rac;
+};
+
u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend);
u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs,
@@ -33,9 +40,13 @@ int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
struct inode *inode, loff_t pos, loff_t end_pos, unsigned len);
+void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
+ struct iomap_readpage_ctx *ctx, size_t poff, size_t plen,
+ loff_t length);
#else
#define iomap_bio_read_folio_sync(...) (-ENOSYS)
#define iomap_bio_add_to_ioend(...) (-ENOSYS)
+#define iomap_bio_readpage(...) ((void)0)
#endif /* CONFIG_BLOCK */
#endif /* _IOMAP_INTERNAL_H */
--
2.47.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 05/16] iomap: add wrapper function iomap_bio_ioend_error()
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (3 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage() Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 06/16] iomap: add wrapper function iomap_submit_bio() Joanne Koong
` (10 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Add a wrapper function, iomap_bio_ioend_error(), around the bio error
handling so that callers that do not have CONFIG_BLOCK set may also use
iomap for buffered io.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io-bio.c | 6 ++++++
fs/iomap/buffered-io.c | 6 ++----
fs/iomap/internal.h | 2 ++
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index e27a43291653..89c06cabbb1b 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -145,6 +145,12 @@ static void iomap_writepage_end_bio(struct bio *bio)
iomap_finish_ioend_buffered(ioend);
}
+void iomap_bio_ioend_error(struct iomap_writepage_ctx *wpc, int error)
+{
+ wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
+ bio_endio(&wpc->ioend->io_bio);
+}
+
static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct inode *inode, loff_t pos,
u16 ioend_flags)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 45c701af3f0c..9ce792adf8a4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1466,10 +1466,8 @@ int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
submit_bio(&wpc->ioend->io_bio);
}
- if (error) {
- wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
- bio_endio(&wpc->ioend->io_bio);
- }
+ if (error)
+ iomap_bio_ioend_error(wpc, error);
wpc->ioend = NULL;
return error;
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index bbef4b947633..664554ffb8bf 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -43,10 +43,12 @@ int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
struct iomap_readpage_ctx *ctx, size_t poff, size_t plen,
loff_t length);
+void iomap_bio_ioend_error(struct iomap_writepage_ctx *wpc, int error);
#else
#define iomap_bio_read_folio_sync(...) (-ENOSYS)
#define iomap_bio_add_to_ioend(...) (-ENOSYS)
#define iomap_bio_readpage(...) ((void)0)
+#define iomap_bio_ioend_error(...) ((void)0)
#endif /* CONFIG_BLOCK */
#endif /* _IOMAP_INTERNAL_H */
--
2.47.1

* [PATCH v2 06/16] iomap: add wrapper function iomap_submit_bio()
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (4 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 05/16] iomap: add wrapper function iomap_bio_ioend_error() Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 07/16] iomap: decouple buffered-io.o from CONFIG_BLOCK Joanne Koong
` (9 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Add a wrapper function, iomap_submit_bio(), around bio submission so
that callers that do not have CONFIG_BLOCK set may also use iomap for
buffered io.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io-bio.c | 5 +++++
fs/iomap/buffered-io.c | 6 +++---
fs/iomap/internal.h | 2 ++
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index 89c06cabbb1b..d5dfa1b3eef7 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -81,6 +81,11 @@ void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
}
}
+void iomap_submit_bio(struct bio *bio)
+{
+ submit_bio(bio);
+}
+
int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
size_t poff, size_t plen, const struct iomap *iomap)
{
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9ce792adf8a4..882e55a1d75c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -388,7 +388,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
iter.status = iomap_read_folio_iter(&iter, &ctx);
if (ctx.bio) {
- submit_bio(ctx.bio);
+ iomap_submit_bio(ctx.bio);
WARN_ON_ONCE(!ctx.cur_folio_in_bio);
} else {
WARN_ON_ONCE(ctx.cur_folio_in_bio);
@@ -460,7 +460,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
iter.status = iomap_readahead_iter(&iter, &ctx);
if (ctx.bio)
- submit_bio(ctx.bio);
+ iomap_submit_bio(ctx.bio);
if (ctx.cur_folio) {
if (!ctx.cur_folio_in_bio)
folio_unlock(ctx.cur_folio);
@@ -1463,7 +1463,7 @@ int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
error = -EIO;
if (!error)
- submit_bio(&wpc->ioend->io_bio);
+ iomap_submit_bio(&wpc->ioend->io_bio);
}
if (error)
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index 664554ffb8bf..27e8a174dc3f 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -44,11 +44,13 @@ void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
struct iomap_readpage_ctx *ctx, size_t poff, size_t plen,
loff_t length);
void iomap_bio_ioend_error(struct iomap_writepage_ctx *wpc, int error);
+void iomap_submit_bio(struct bio *bio);
#else
#define iomap_bio_read_folio_sync(...) (-ENOSYS)
#define iomap_bio_add_to_ioend(...) (-ENOSYS)
#define iomap_bio_readpage(...) ((void)0)
#define iomap_bio_ioend_error(...) ((void)0)
+#define iomap_submit_bio(...) ((void)0)
#endif /* CONFIG_BLOCK */
#endif /* _IOMAP_INTERNAL_H */
--
2.47.1
* [PATCH v2 07/16] iomap: decouple buffered-io.o from CONFIG_BLOCK
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (5 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 06/16] iomap: add wrapper function iomap_submit_bio() Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 08/16] iomap: add read_folio_sync() handler for buffered writes Joanne Koong
` (8 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Now that buffered-io.o no longer depends on CONFIG_BLOCK, drop that
dependency in the Makefile so that filesystems without CONFIG_BLOCK set
may still use iomap for buffered io.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/Makefile | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index fb7e8a7a3da4..477d78c58807 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,9 +9,9 @@ ccflags-y += -I $(src) # needed for trace events
obj-$(CONFIG_FS_IOMAP) += iomap.o
iomap-y += trace.o \
- iter.o
-iomap-$(CONFIG_BLOCK) += buffered-io.o \
- buffered-io-bio.o \
+ iter.o \
+ buffered-io.o
+iomap-$(CONFIG_BLOCK) += buffered-io-bio.o \
direct-io.o \
ioend.o \
fiemap.o \
--
2.47.1
* [PATCH v2 08/16] iomap: add read_folio_sync() handler for buffered writes
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (6 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 07/16] iomap: decouple buffered-io.o from CONFIG_BLOCK Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 09/16] iomap: change 'count' to 'async_writeback' Joanne Koong
` (7 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Add a read_folio_sync() handler for buffered writes that filesystems may
pass in if they wish to customize how the contents of a folio are read in
synchronously.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/iomap/operations.rst | 7 +++++++
fs/iomap/buffered-io.c | 19 ++++++++++++++++---
include/linux/iomap.h | 11 +++++++++++
3 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 3b628e370d88..9f0e8a46cc8c 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -72,6 +72,9 @@ default behaviors of iomap:
void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+ int (*read_folio_sync)(loff_t block_start, struct folio *folio,
+ size_t off, size_t len,
+ const struct iomap *iomap, void *private);
};
iomap calls these functions:
@@ -102,6 +105,10 @@ iomap calls these functions:
<https://lore.kernel.org/all/20221123055812.747923-8-david@fromorbit.com/>`_
to allocate, install, and lock that folio.
+ - ``read_folio_sync``: Called to synchronously read in the range that will
+ be written to. If this function is not provided, iomap will default to
+ submitting a bio read request.
+
For the pagecache, races can happen if writeback doesn't take
``i_rwsem`` or ``invalidate_lock`` and updates mapping information.
Races can also happen if the filesystem allows concurrent writes.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 882e55a1d75c..7063a1132694 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -581,10 +581,23 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
pos + len - 1);
}
+static int iomap_read_folio_sync(const struct iomap_iter *iter,
+ loff_t block_start, struct folio *folio, size_t poff,
+ size_t plen)
+{
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
+
+ if (folio_ops && folio_ops->read_folio_sync)
+ return folio_ops->read_folio_sync(block_start, folio, poff,
+ plen, srcmap, iter->private);
+
+ return iomap_bio_read_folio_sync(block_start, folio, poff, plen, srcmap);
+}
+
static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
struct folio *folio)
{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct iomap_folio_state *ifs;
loff_t pos = iter->pos;
loff_t block_size = i_blocksize(iter->inode);
@@ -633,8 +646,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
if (iter->flags & IOMAP_NOWAIT)
return -EAGAIN;
- status = iomap_bio_read_folio_sync(block_start, folio,
- poff, plen, srcmap);
+ status = iomap_read_folio_sync(iter, block_start, folio,
+ poff, plen);
if (status)
return status;
}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 522644d62f30..51cf3e863caf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -174,6 +174,17 @@ struct iomap_folio_ops {
* locked by the iomap code.
*/
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+
+ /*
+ * Optional if the filesystem wishes to provide a custom handler for
+ * reading in the contents of a folio, otherwise iomap will default to
+ * submitting a bio read request.
+ *
+ * The read must be done synchronously.
+ */
+ int (*read_folio_sync)(loff_t block_start, struct folio *folio,
+ size_t off, size_t len, const struct iomap *iomap,
+ void *private);
};
/*
--
2.47.1
* [PATCH v2 09/16] iomap: change 'count' to 'async_writeback'
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (7 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 08/16] iomap: add read_folio_sync() handler for buffered writes Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-16 12:52 ` Christoph Hellwig
2025-06-13 21:46 ` [PATCH v2 10/16] iomap: replace ->map_blocks() with generic ->writeback_folio() for writeback Joanne Koong
` (6 subsequent siblings)
15 siblings, 1 reply; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Rename "count" to "async_writeback" to better reflect its function, and
since it is only ever used as a boolean, change its type from unsigned to
bool.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7063a1132694..2f620ebe20e2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1489,7 +1489,7 @@ int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
struct inode *inode, u64 pos, u64 end_pos,
- unsigned dirty_len, unsigned *count)
+ unsigned dirty_len, bool *async_writeback)
{
int error;
@@ -1516,7 +1516,7 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
error = iomap_bio_add_to_ioend(wpc, wbc, folio, inode,
pos, end_pos, map_len);
if (!error)
- (*count)++;
+ *async_writeback = true;
break;
}
dirty_len -= map_len;
@@ -1603,7 +1603,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
u64 end_aligned = 0;
- unsigned count = 0;
+ bool async_writeback = false;
int error = 0;
u32 rlen;
@@ -1647,13 +1647,13 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
end_aligned = round_up(end_pos, i_blocksize(inode));
while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
- pos, end_pos, rlen, &count);
+ pos, end_pos, rlen, &async_writeback);
if (error)
break;
pos += rlen;
}
- if (count)
+ if (async_writeback)
wpc->nr_folios++;
/*
@@ -1675,7 +1675,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
if (atomic_dec_and_test(&ifs->write_bytes_pending))
folio_end_writeback(folio);
} else {
- if (!count)
+ if (!async_writeback)
folio_end_writeback(folio);
}
mapping_set_error(inode->i_mapping, error);
--
2.47.1
* [PATCH v2 10/16] iomap: replace ->map_blocks() with generic ->writeback_folio() for writeback
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (8 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 09/16] iomap: change 'count' to 'async_writeback' Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-16 12:54 ` Christoph Hellwig
2025-06-13 21:46 ` [PATCH v2 11/16] iomap: replace ->submit_ioend() with generic ->writeback_complete() " Joanne Koong
` (5 subsequent siblings)
15 siblings, 1 reply; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
As part of the larger effort to have the iomap buffered io code support
generic io, replace map_blocks() with writeback_folio() and move the bio
writeback code into a helper function, iomap_bio_writeback_folio(), that
callers using bios can invoke directly.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/iomap/operations.rst | 38 +++++-----
block/fops.c | 7 +-
fs/gfs2/bmap.c | 7 +-
fs/iomap/buffered-io-bio.c | 49 +++++++++++-
fs/iomap/buffered-io.c | 74 +++++--------------
fs/iomap/internal.h | 4 -
fs/xfs/xfs_aops.c | 14 +++-
fs/zonefs/file.c | 7 +-
include/linux/iomap.h | 42 ++++++++---
9 files changed, 148 insertions(+), 94 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 9f0e8a46cc8c..5d018d504145 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -278,7 +278,7 @@ writeback.
It does not lock ``i_rwsem`` or ``invalidate_lock``.
The dirty bit will be cleared for all folios run through the
-``->map_blocks`` machinery described below even if the writeback fails.
+``->writeback_folio`` machinery described below even if the writeback fails.
This is to prevent dirty folio clots when storage devices fail; an
``-EIO`` is recorded for userspace to collect via ``fsync``.
@@ -290,29 +290,33 @@ The ``ops`` structure must be specified and is as follows:
.. code-block:: c
struct iomap_writeback_ops {
- int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset, unsigned len);
+ int (*writeback_folio)(struct iomap_writeback_folio_range *ctx);
int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
void (*discard_folio)(struct folio *folio, loff_t pos);
};
The fields are as follows:
- - ``map_blocks``: Sets ``wpc->iomap`` to the space mapping of the file
- range (in bytes) given by ``offset`` and ``len``.
- iomap calls this function for each dirty fs block in each dirty folio,
- though it will `reuse mappings
+ - ``writeback_folio``: iomap calls this function for each dirty fs block
+ in each dirty folio, though it will `reuse mappings
<https://lore.kernel.org/all/20231207072710.176093-15-hch@lst.de/>`_
for runs of contiguous dirty fsblocks within a folio.
- Do not return ``IOMAP_INLINE`` mappings here; the ``->iomap_end``
- function must deal with persisting written data.
- Do not return ``IOMAP_DELALLOC`` mappings here; iomap currently
- requires mapping to allocated space.
- Filesystems can skip a potentially expensive mapping lookup if the
- mappings have not changed.
- This revalidation must be open-coded by the filesystem; it is
- unclear if ``iomap::validity_cookie`` can be reused for this
- purpose.
+ For blocks that need to be mapped first, please take a look at
+ ``iomap_bio_writeback_folio`` which takes in a ``iomap_map_blocks_t``
+ mapping function. For that mapping function,
+
+ * Set ``wpc->iomap`` to the space mapping of the file range (in bytes)
+ given by ``offset`` and ``len``.
+ * Do not return ``IOMAP_INLINE`` mappings here; the ``->iomap_end``
+ function must deal with persisting written data.
+ * Do not return ``IOMAP_DELALLOC`` mappings here; iomap currently
+ requires mapping to allocated space.
+ * Filesystems can skip a potentially expensive mapping lookup if the
+ mappings have not changed.
+ * This revalidation must be open-coded by the filesystem; it is
+ unclear if ``iomap::validity_cookie`` can be reused for this
+ purpose.
+
This function must be supplied by the filesystem.
- ``submit_ioend``: Allows the file systems to hook into writeback bio
@@ -323,7 +327,7 @@ The fields are as follows:
transactions from process context before submitting the bio.
This function is optional.
- - ``discard_folio``: iomap calls this function after ``->map_blocks``
+ - ``discard_folio``: iomap calls this function after ``->writeback_folio``
fails to schedule I/O for any part of a dirty folio.
The function should throw away any reservations that may have been
made for the write.
diff --git a/block/fops.c b/block/fops.c
index 1309861d4c2c..c35fe1495fd2 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -551,8 +551,13 @@ static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc,
IOMAP_WRITE, &wpc->iomap, NULL);
}
+static int blkdev_writeback_folio(struct iomap_writeback_folio_range *ctx)
+{
+ return iomap_bio_writeback_folio(ctx, blkdev_map_blocks);
+}
+
static const struct iomap_writeback_ops blkdev_writeback_ops = {
- .map_blocks = blkdev_map_blocks,
+ .writeback_folio = blkdev_writeback_folio,
};
static int blkdev_writepages(struct address_space *mapping,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 7703d0471139..d13dfa986e18 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2486,6 +2486,11 @@ static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode,
return ret;
}
+static int gfs2_writeback_folio(struct iomap_writeback_folio_range *ctx)
+{
+ return iomap_bio_writeback_folio(ctx, gfs2_map_blocks);
+}
+
const struct iomap_writeback_ops gfs2_writeback_ops = {
- .map_blocks = gfs2_map_blocks,
+ .writeback_folio = gfs2_writeback_folio,
};
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index d5dfa1b3eef7..e052fc8b46c1 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -9,6 +9,7 @@
#include <linux/writeback.h>
#include "internal.h"
+#include "trace.h"
static void iomap_finish_folio_read(struct folio *folio, size_t off,
size_t len, int error)
@@ -208,7 +209,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
* At the end of a writeback pass, there will be a cached ioend remaining on the
* writepage context that the caller will need to submit.
*/
-int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
+static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
struct inode *inode, loff_t pos, loff_t end_pos,
unsigned len)
@@ -290,3 +291,49 @@ int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
wbc_account_cgroup_owner(wbc, folio, len);
return 0;
}
+
+int iomap_bio_writeback_folio(struct iomap_writeback_folio_range *ctx,
+ iomap_map_blocks_t map_blocks)
+{
+ struct iomap_writepage_ctx *wpc = ctx->wpc;
+ struct folio *folio = ctx->folio;
+ u64 pos = ctx->pos;
+ u64 end_pos = ctx->end_pos;
+ u32 dirty_len = ctx->dirty_len;
+ struct writeback_control *wbc = ctx->wbc;
+ struct inode *inode = folio->mapping->host;
+ int error;
+
+ do {
+ unsigned map_len;
+
+ error = map_blocks(wpc, inode, pos, dirty_len);
+ if (error)
+ break;
+ trace_iomap_writepage_map(inode, pos, dirty_len, &wpc->iomap);
+
+ map_len = min_t(u64, dirty_len,
+ wpc->iomap.offset + wpc->iomap.length - pos);
+ WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+
+ switch (wpc->iomap.type) {
+ case IOMAP_INLINE:
+ WARN_ON_ONCE(1);
+ error = -EIO;
+ break;
+ case IOMAP_HOLE:
+ break;
+ default:
+ error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
+ end_pos, map_len);
+ if (!error)
+ ctx->async_writeback = true;
+ break;
+ }
+ dirty_len -= map_len;
+ pos += map_len;
+ } while (dirty_len && !error);
+
+ return error;
+}
+EXPORT_SYMBOL_GPL(iomap_bio_writeback_folio);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2f620ebe20e2..2b8d733f65da 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1486,57 +1486,6 @@ int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
return error;
}
-static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, u64 pos, u64 end_pos,
- unsigned dirty_len, bool *async_writeback)
-{
- int error;
-
- do {
- unsigned map_len;
-
- error = wpc->ops->map_blocks(wpc, inode, pos, dirty_len);
- if (error)
- break;
- trace_iomap_writepage_map(inode, pos, dirty_len, &wpc->iomap);
-
- map_len = min_t(u64, dirty_len,
- wpc->iomap.offset + wpc->iomap.length - pos);
- WARN_ON_ONCE(!folio->private && map_len < dirty_len);
-
- switch (wpc->iomap.type) {
- case IOMAP_INLINE:
- WARN_ON_ONCE(1);
- error = -EIO;
- break;
- case IOMAP_HOLE:
- break;
- default:
- error = iomap_bio_add_to_ioend(wpc, wbc, folio, inode,
- pos, end_pos, map_len);
- if (!error)
- *async_writeback = true;
- break;
- }
- dirty_len -= map_len;
- pos += map_len;
- } while (dirty_len && !error);
-
- /*
- * We cannot cancel the ioend directly here on error. We may have
- * already set other pages under writeback and hence we have to run I/O
- * completion to mark the error state of the pages under writeback
- * appropriately.
- *
- * Just let the file system know what portion of the folio failed to
- * map.
- */
- if (error && wpc->ops->discard_folio)
- wpc->ops->discard_folio(folio, pos);
- return error;
-}
-
/*
* Check interaction of the folio with the file end.
*
@@ -1603,9 +1552,14 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
u64 end_aligned = 0;
- bool async_writeback = false;
int error = 0;
u32 rlen;
+ struct iomap_writeback_folio_range ctx = {
+ .wpc = wpc,
+ .wbc = wbc,
+ .folio = folio,
+ .end_pos = end_pos,
+ };
WARN_ON_ONCE(!folio_test_locked(folio));
WARN_ON_ONCE(folio_test_dirty(folio));
@@ -1646,14 +1600,20 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
*/
end_aligned = round_up(end_pos, i_blocksize(inode));
while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
- error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
- pos, end_pos, rlen, &async_writeback);
- if (error)
+ ctx.pos = pos;
+ ctx.dirty_len = rlen;
+ WARN_ON(!wpc->ops->writeback_folio);
+ error = wpc->ops->writeback_folio(&ctx);
+
+ if (error) {
+ if (wpc->ops->discard_folio)
+ wpc->ops->discard_folio(folio, pos);
break;
+ }
pos += rlen;
}
- if (async_writeback)
+ if (ctx.async_writeback)
wpc->nr_folios++;
/*
@@ -1675,7 +1635,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
if (atomic_dec_and_test(&ifs->write_bytes_pending))
folio_end_writeback(folio);
} else {
- if (!async_writeback)
+ if (!ctx.async_writeback)
folio_end_writeback(folio);
}
mapping_set_error(inode->i_mapping, error);
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index 27e8a174dc3f..6efb5905bf4f 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -37,9 +37,6 @@ int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error);
#ifdef CONFIG_BLOCK
int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
size_t poff, size_t plen, const struct iomap *iomap);
-int iomap_bio_add_to_ioend(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, loff_t pos, loff_t end_pos, unsigned len);
void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
struct iomap_readpage_ctx *ctx, size_t poff, size_t plen,
loff_t length);
@@ -47,7 +44,6 @@ void iomap_bio_ioend_error(struct iomap_writepage_ctx *wpc, int error);
void iomap_submit_bio(struct bio *bio);
#else
#define iomap_bio_read_folio_sync(...) (-ENOSYS)
-#define iomap_bio_add_to_ioend(...) (-ENOSYS)
#define iomap_bio_readpage(...) ((void)0)
#define iomap_bio_ioend_error(...) ((void)0)
#define iomap_submit_bio(...) ((void)0)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 63151feb9c3f..8878c015bd48 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -455,6 +455,11 @@ xfs_ioend_needs_wq_completion(
return false;
}
+static int xfs_writeback_folio(struct iomap_writeback_folio_range *ctx)
+{
+ return iomap_bio_writeback_folio(ctx, xfs_map_blocks);
+}
+
static int
xfs_submit_ioend(
struct iomap_writepage_ctx *wpc,
@@ -526,7 +531,7 @@ xfs_discard_folio(
}
static const struct iomap_writeback_ops xfs_writeback_ops = {
- .map_blocks = xfs_map_blocks,
+ .writeback_folio = xfs_writeback_folio,
.submit_ioend = xfs_submit_ioend,
.discard_folio = xfs_discard_folio,
};
@@ -608,6 +613,11 @@ xfs_zoned_map_blocks(
return 0;
}
+static int xfs_zoned_writeback_folio(struct iomap_writeback_folio_range *ctx)
+{
+ return iomap_bio_writeback_folio(ctx, xfs_zoned_map_blocks);
+}
+
static int
xfs_zoned_submit_ioend(
struct iomap_writepage_ctx *wpc,
@@ -621,7 +631,7 @@ xfs_zoned_submit_ioend(
}
static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
- .map_blocks = xfs_zoned_map_blocks,
+ .writeback_folio = xfs_zoned_writeback_folio,
.submit_ioend = xfs_zoned_submit_ioend,
.discard_folio = xfs_discard_folio,
};
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 42e2c0065bb3..11901e40e810 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -145,8 +145,13 @@ static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc,
IOMAP_WRITE, &wpc->iomap, NULL);
}
+static int zonefs_writeback_folio(struct iomap_writeback_folio_range *ctx)
+{
+ return iomap_bio_writeback_folio(ctx, zonefs_write_map_blocks);
+}
+
static const struct iomap_writeback_ops zonefs_writeback_ops = {
- .map_blocks = zonefs_write_map_blocks,
+ .writeback_folio = zonefs_writeback_folio,
};
static int zonefs_writepages(struct address_space *mapping,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 51cf3e863caf..fe827948035d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -16,6 +16,7 @@ struct inode;
struct iomap_iter;
struct iomap_dio;
struct iomap_writepage_ctx;
+struct iomap_writeback_folio_range;
struct iov_iter;
struct kiocb;
struct page;
@@ -427,18 +428,13 @@ static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
struct iomap_writeback_ops {
/*
- * Required, maps the blocks so that writeback can be performed on
- * the range starting at offset.
+ * Required.
*
- * Can return arbitrarily large regions, but we need to call into it at
- * least once per folio to allow the file systems to synchronize with
- * the write path that could be invalidating mappings.
- *
- * An existing mapping from a previous call to this method can be reused
- * by the file system if it is still valid.
+ * If the writeback is done asynchronously, the caller should set
+ * ctx->async_writeback to true and is responsible for ending writeback
+ * on the folio once all the dirty ranges have been written out.
*/
- int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset, unsigned len);
+ int (*writeback_folio)(struct iomap_writeback_folio_range *ctx);
/*
* Optional, allows the file systems to hook into bio submission,
@@ -464,6 +460,16 @@ struct iomap_writepage_ctx {
u32 nr_folios; /* folios added to the ioend */
};
+struct iomap_writeback_folio_range {
+ struct iomap_writepage_ctx *wpc;
+ struct writeback_control *wbc;
+ struct folio *folio;
+ u64 pos;
+ u64 end_pos;
+ u32 dirty_len;
+ bool async_writeback; /* should get set to true if writeback is async */
+};
+
struct iomap_ioend *iomap_init_ioend(struct inode *inode, struct bio *bio,
loff_t file_offset, u16 ioend_flags);
struct iomap_ioend *iomap_split_ioend(struct iomap_ioend *ioend,
@@ -541,4 +547,20 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
extern struct bio_set iomap_ioend_bioset;
+/*
+ * Maps the blocks so that writeback can be performed on the range
+ * starting at offset.
+ *
+ * Can return arbitrarily large regions, but we need to call into it at
+ * least once per folio to allow the file systems to synchronize with
+ * the write path that could be invalidating mappings.
+ *
+ * An existing mapping from a previous call to this method can be reused
+ * by the file system if it is still valid.
+ */
+typedef int iomap_map_blocks_t(struct iomap_writepage_ctx *wpc,
+ struct inode *inode, loff_t offset, unsigned int len);
+int iomap_bio_writeback_folio(struct iomap_writeback_folio_range *ctx,
+ iomap_map_blocks_t map_blocks);
+
#endif /* LINUX_IOMAP_H */
--
2.47.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 11/16] iomap: replace ->submit_ioend() with generic ->writeback_complete() for writeback
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (9 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 10/16] iomap: replace ->map_blocks() with generic ->writeback_folio() for writeback Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 12/16] iomap: support more customized writeback handling Joanne Koong
` (4 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
As part of the larger effort to have iomap buffered io code support
generic io, replace submit_ioend() with writeback_complete() and move the
bio ioend code into a helper function, iomap_bio_writeback_complete(),
that callers using bios can directly invoke.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/iomap/operations.rst | 18 ++++----
fs/iomap/buffered-io-bio.c | 42 ++++++++++++++++++-
fs/iomap/buffered-io.c | 38 +++--------------
fs/iomap/internal.h | 4 +-
fs/xfs/xfs_aops.c | 14 ++++++-
include/linux/iomap.h | 26 ++++++++----
6 files changed, 88 insertions(+), 54 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 5d018d504145..47213c810622 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -291,7 +291,7 @@ The ``ops`` structure must be specified and is as follows:
struct iomap_writeback_ops {
int (*writeback_folio)(struct iomap_writeback_folio_range *ctx);
- int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
+ int (*writeback_complete)(struct iomap_writepage_ctx *wpc, int status);
void (*discard_folio)(struct folio *folio, loff_t pos);
};
@@ -319,13 +319,15 @@ The fields are as follows:
This function must be supplied by the filesystem.
- - ``submit_ioend``: Allows the file systems to hook into writeback bio
- submission.
- This might include pre-write space accounting updates, or installing
- a custom ``->bi_end_io`` function for internal purposes, such as
- deferring the ioend completion to a workqueue to run metadata update
- transactions from process context before submitting the bio.
- This function is optional.
+ - ``writeback_complete``: Allows the file systems to execute any logic that
+ needs to happen after ``->writeback_folio`` has been called for all dirty
+ folios. This might include hooking into writeback bio submission for
+ pre-write space accounting updates, or installing a custom ``->bi_end_io``
+ function for internal purposes, such as deferring the ioend completion to
+ a workqueue to run metadata update transactions from process context
+ before submitting the bio.
+ This function is optional. If this function is not provided, iomap will
+ default to ``iomap_bio_writeback_complete``.
- ``discard_folio``: iomap calls this function after ``->writeback_folio``
fails to schedule I/O for any part of a dirty folio.
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index e052fc8b46c1..e9f26a938c8d 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -151,7 +151,7 @@ static void iomap_writepage_end_bio(struct bio *bio)
iomap_finish_ioend_buffered(ioend);
}
-void iomap_bio_ioend_error(struct iomap_writepage_ctx *wpc, int error)
+static void iomap_ioend_error(struct iomap_writepage_ctx *wpc, int error)
{
wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
bio_endio(&wpc->ioend->io_bio);
@@ -230,7 +230,7 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
new_ioend:
- error = iomap_submit_ioend(wpc, 0);
+ error = iomap_writeback_complete(wpc, 0);
if (error)
return error;
wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos,
@@ -337,3 +337,41 @@ int iomap_bio_writeback_folio(struct iomap_writeback_folio_range *ctx,
return error;
}
EXPORT_SYMBOL_GPL(iomap_bio_writeback_folio);
+
+/*
+ * Submit an ioend.
+ *
+ * If @error is non-zero, it means that we have a situation where some part of
+ * the submission process has failed after we've marked pages for writeback.
+ * We cannot cancel ioend directly in that case, so call the bio end I/O handler
+ * with the error status here to run the normal I/O completion handler to clear
+ * the writeback bit and let the file system process the errors.
+ */
+int iomap_bio_writeback_complete(struct iomap_writepage_ctx *wpc, int error,
+ iomap_submit_ioend_t submit_ioend)
+{
+ if (!wpc->ioend)
+ return error;
+
+ /*
+ * Let the file systems prepare the I/O submission and hook in an I/O
+ * completion handler. This also needs to happen after a failure has
+ * occurred so that the file system end I/O handler gets called
+ * to clean up.
+ */
+ if (submit_ioend) {
+ error = submit_ioend(wpc, error);
+ } else {
+ if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
+ error = -EIO;
+ if (!error)
+ iomap_submit_bio(&wpc->ioend->io_bio);
+ }
+
+ if (error)
+ iomap_ioend_error(wpc, error);
+
+ wpc->ioend = NULL;
+ return error;
+}
+EXPORT_SYMBOL_GPL(iomap_bio_writeback_complete);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2b8d733f65da..bdf917ae56dc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1450,39 +1450,13 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
-/*
- * Submit an ioend.
- *
- * If @error is non-zero, it means that we have a situation where some part of
- * the submission process has failed after we've marked pages for writeback.
- * We cannot cancel ioend directly in that case, so call the bio end I/O handler
- * with the error status here to run the normal I/O completion handler to clear
- * the writeback bit and let the file system proess the errors.
- */
-int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
+int iomap_writeback_complete(struct iomap_writepage_ctx *wpc, int error)
{
- if (!wpc->ioend)
- return error;
-
- /*
- * Let the file systems prepare the I/O submission and hook in an I/O
- * comletion handler. This also needs to happen in case after a
- * failure happened so that the file system end I/O handler gets called
- * to clean up.
- */
- if (wpc->ops->submit_ioend) {
- error = wpc->ops->submit_ioend(wpc, error);
- } else {
- if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
- error = -EIO;
- if (!error)
- iomap_submit_bio(&wpc->ioend->io_bio);
- }
-
- if (error)
- iomap_bio_ioend_error(wpc, error);
+ if (wpc->ops->writeback_complete)
+ error = wpc->ops->writeback_complete(wpc, error);
+ else
+ error = iomap_bio_writeback_complete(wpc, error, NULL);
- wpc->ioend = NULL;
return error;
}
@@ -1661,6 +1635,6 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
wpc->ops = ops;
while ((folio = writeback_iter(mapping, wbc, folio, &error)))
error = iomap_writepage_map(wpc, wbc, folio);
- return iomap_submit_ioend(wpc, error);
+ return iomap_writeback_complete(wpc, error);
}
EXPORT_SYMBOL_GPL(iomap_writepages);
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index 6efb5905bf4f..bfd3f3be845a 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -32,7 +32,7 @@ u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend);
u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs,
size_t off, size_t len);
-int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error);
+int iomap_writeback_complete(struct iomap_writepage_ctx *wpc, int error);
#ifdef CONFIG_BLOCK
int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
@@ -40,12 +40,10 @@ int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
void iomap_bio_readpage(const struct iomap *iomap, loff_t pos,
struct iomap_readpage_ctx *ctx, size_t poff, size_t plen,
loff_t length);
-void iomap_bio_ioend_error(struct iomap_writepage_ctx *wpc, int error);
void iomap_submit_bio(struct bio *bio);
#else
#define iomap_bio_read_folio_sync(...) (-ENOSYS)
#define iomap_bio_readpage(...) ((void)0)
-#define iomap_bio_ioend_error(...) ((void)0)
#define iomap_submit_bio(...) ((void)0)
#endif /* CONFIG_BLOCK */
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8878c015bd48..63745ff68250 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -493,6 +493,11 @@ xfs_submit_ioend(
return 0;
}
+static int xfs_writeback_complete(struct iomap_writepage_ctx *wpc, int error)
+{
+ return iomap_bio_writeback_complete(wpc, error, xfs_submit_ioend);
+}
+
/*
* If the folio has delalloc blocks on it, the caller is asking us to punch them
* out. If we don't, we can leave a stale delalloc mapping covered by a clean
@@ -532,7 +537,7 @@ xfs_discard_folio(
static const struct iomap_writeback_ops xfs_writeback_ops = {
.writeback_folio = xfs_writeback_folio,
- .submit_ioend = xfs_submit_ioend,
+ .writeback_complete = xfs_writeback_complete,
.discard_folio = xfs_discard_folio,
};
@@ -630,9 +635,14 @@ xfs_zoned_submit_ioend(
return 0;
}
+static int xfs_zoned_writeback_complete(struct iomap_writepage_ctx *wpc, int error)
+{
+ return iomap_bio_writeback_complete(wpc, error, xfs_zoned_submit_ioend);
+}
+
static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
.writeback_folio = xfs_zoned_writeback_folio,
- .submit_ioend = xfs_zoned_submit_ioend,
+ .writeback_complete = xfs_zoned_writeback_complete,
.discard_folio = xfs_discard_folio,
};
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index fe827948035d..f4350e59fe7e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -437,14 +437,10 @@ struct iomap_writeback_ops {
int (*writeback_folio)(struct iomap_writeback_folio_range *ctx);
/*
- * Optional, allows the file systems to hook into bio submission,
- * including overriding the bi_end_io handler.
- *
- * Returns 0 if the bio was successfully submitted, or a negative
- * error code if status was non-zero or another error happened and
- * the bio could not be submitted.
+ * Optional, allows the file system to hook in once ->writeback_folio()
+ * has been called on all dirty ranges.
*/
- int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
+ int (*writeback_complete)(struct iomap_writepage_ctx *wpc, int status);
/*
* Optional, allows the file system to discard state on a page where
@@ -563,4 +559,20 @@ typedef int iomap_map_blocks_t(struct iomap_writepage_ctx *wpc,
int iomap_bio_writeback_folio(struct iomap_writeback_folio_range *ctx,
iomap_map_blocks_t map_blocks);
+#ifdef CONFIG_BLOCK
+/*
+ * Allows the file systems to hook into bio submission, including overriding
+ * the bi_end_io handler.
+ *
+ * Returns 0 if the bio was successfully submitted, or a negative
+ * error code if status was non-zero or another error happened and
+ * the bio could not be submitted.
+ */
+typedef int iomap_submit_ioend_t(struct iomap_writepage_ctx *wpc, int error);
+int iomap_bio_writeback_complete(struct iomap_writepage_ctx *wpc, int error,
+ iomap_submit_ioend_t submit_ioend);
+#else
+#define iomap_bio_writeback_complete(...) (-ENOSYS)
+#endif /* CONFIG_BLOCK */
+
#endif /* LINUX_IOMAP_H */
--
2.47.1
* [PATCH v2 12/16] iomap: support more customized writeback handling
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (10 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 11/16] iomap: replace ->submit_ioend() with generic ->writeback_complete() " Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 13/16] iomap: add iomap_writeback_dirty_folio() Joanne Koong
` (3 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Add convenience helpers for more customized writeback handling.
For example, the caller may wish to use iomap_start_folio_write() and
iomap_finish_folio_write() for tracking when writeback state needs to be
ended on the folio.
Also add a void *private field that callers can use to pass data into
the ->writeback_folio() handler.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io-bio.c | 12 ------------
fs/iomap/buffered-io.c | 24 ++++++++++++++++++++++++
include/linux/iomap.h | 6 ++++++
3 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/fs/iomap/buffered-io-bio.c b/fs/iomap/buffered-io-bio.c
index e9f26a938c8d..2463e3b39f98 100644
--- a/fs/iomap/buffered-io-bio.c
+++ b/fs/iomap/buffered-io-bio.c
@@ -99,18 +99,6 @@ int iomap_bio_read_folio_sync(loff_t block_start, struct folio *folio,
return submit_bio_wait(&bio);
}
-static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
- size_t len)
-{
- struct iomap_folio_state *ifs = folio->private;
-
- WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
- WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
-
- if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
- folio_end_writeback(folio);
-}
-
/*
* We're now finished for good with this ioend structure. Update the page
* state, release holds on bios, and finally free up memory. Do not use the
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bdf917ae56dc..25ae1d53eccb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1638,3 +1638,27 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
return iomap_writeback_complete(wpc, error);
}
EXPORT_SYMBOL_GPL(iomap_writepages);
+
+void iomap_start_folio_write(struct inode *inode, struct folio *folio,
+ size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
+ if (ifs)
+ atomic_add(len, &ifs->write_bytes_pending);
+}
+EXPORT_SYMBOL_GPL(iomap_start_folio_write);
+
+void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
+ size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
+ WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
+
+ if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
+ folio_end_writeback(folio);
+}
+EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f4350e59fe7e..3115b00ff410 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -454,6 +454,7 @@ struct iomap_writepage_ctx {
struct iomap_ioend *ioend;
const struct iomap_writeback_ops *ops;
u32 nr_folios; /* folios added to the ioend */
+ void *private;
};
struct iomap_writeback_folio_range {
@@ -575,4 +576,9 @@ int iomap_bio_writeback_complete(struct iomap_writepage_ctx *wpc, int error,
#define iomap_bio_writeback_complete(...) (-ENOSYS)
#endif /* CONFIG_BLOCK */
+void iomap_start_folio_write(struct inode *inode, struct folio *folio,
+ size_t len);
+void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
+ size_t len);
+
#endif /* LINUX_IOMAP_H */
--
2.47.1
* [PATCH v2 13/16] iomap: add iomap_writeback_dirty_folio()
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (11 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 12/16] iomap: support more customized writeback handling Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 14/16] fuse: use iomap for buffered writes Joanne Koong
` (2 subsequent siblings)
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Add iomap_writeback_dirty_folio() for writing back a dirty folio.
One use case of this is for laundering a folio.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 27 +++++++++++++++++++--------
include/linux/iomap.h | 3 +++
2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 25ae1d53eccb..d47abeefe92b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1518,7 +1518,7 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
return true;
}
-static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
+static int iomap_writeback(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
@@ -1541,10 +1541,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
trace_iomap_writepage(inode, pos, folio_size(folio));
- if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
- folio_unlock(folio);
+ if (!iomap_writepage_handle_eof(folio, inode, &end_pos))
return 0;
- }
WARN_ON_ONCE(end_pos <= pos);
if (i_blocks_per_folio(inode, folio) > 1) {
@@ -1602,9 +1600,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
* But we may end up either not actually writing any blocks, or (when
* there are multiple blocks in a folio) all I/O might have finished
* already at this point. In that case we need to clear the writeback
- * bit ourselves right after unlocking the page.
+ * bit ourselves.
*/
- folio_unlock(folio);
if (ifs) {
if (atomic_dec_and_test(&ifs->write_bytes_pending))
folio_end_writeback(folio);
@@ -1633,12 +1630,26 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
return -EIO;
wpc->ops = ops;
- while ((folio = writeback_iter(mapping, wbc, folio, &error)))
- error = iomap_writepage_map(wpc, wbc, folio);
+ while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
+ error = iomap_writeback(wpc, wbc, folio);
+ folio_unlock(folio);
+ }
return iomap_writeback_complete(wpc, error);
}
EXPORT_SYMBOL_GPL(iomap_writepages);
+int iomap_writeback_dirty_folio(struct folio *folio,
+ struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
+ const struct iomap_writeback_ops *ops)
+{
+ int error;
+
+ wpc->ops = ops;
+ error = iomap_writeback(wpc, wbc, folio);
+ return iomap_writeback_complete(wpc, error);
+}
+EXPORT_SYMBOL_GPL(iomap_writeback_dirty_folio);
+
void iomap_start_folio_write(struct inode *inode, struct folio *folio,
size_t len)
{
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 3115b00ff410..95646346dff5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -478,6 +478,9 @@ void iomap_sort_ioends(struct list_head *ioend_list);
int iomap_writepages(struct address_space *mapping,
struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
const struct iomap_writeback_ops *ops);
+int iomap_writeback_dirty_folio(struct folio *folio,
+ struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
+ const struct iomap_writeback_ops *ops);
/*
* Flags for direct I/O ->end_io:
--
2.47.1
* [PATCH v2 14/16] fuse: use iomap for buffered writes
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (12 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 13/16] iomap: add iomap_writeback_dirty_folio() Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 15/16] fuse: use iomap for writeback Joanne Koong
2025-06-13 21:46 ` [PATCH v2 16/16] fuse: use iomap for folio laundering Joanne Koong
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Have buffered writes go through iomap. This has two advantages:
* granular large folio synchronous reads
* granular large folio dirty tracking
If for example there is a 1 MB large folio and a write issued at pos 1
to pos 1 MB - 2, only the head and tail pages will need to be read in
and marked uptodate instead of the entire folio needing to be read in.
Non-relevant trailing pages are also skipped (e.g. if a write to a 1 MB
large folio is issued at pos 1 to 4097, only the first two pages are
read in and the ones after that are skipped).
iomap also has granular dirty tracking. This is useful in that when it
comes to writeback time, only the dirty portions of the large folio will
be written instead of having to write out the entire folio. For example
if there is a 1 MB large folio and only 2 bytes in it are dirty, only
the page for those dirty bytes will be written out. Please note that
granular writeback is only done once fuse also uses iomap in writeback
(separate commit).
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/Kconfig | 1 +
fs/fuse/file.c | 141 ++++++++++++++++++------------------------------
2 files changed, 53 insertions(+), 89 deletions(-)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index ca215a3cba3e..a774166264de 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -2,6 +2,7 @@
config FUSE_FS
tristate "FUSE (Filesystem in Userspace) support"
select FS_POSIX_ACL
+ select FS_IOMAP
help
With FUSE it is possible to implement a fully functional filesystem
in a userspace program.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f102afc03359..59ff1dfd755b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -21,6 +21,7 @@
#include <linux/filelock.h>
#include <linux/splice.h>
#include <linux/task_io_accounting_ops.h>
+#include <linux/iomap.h>
static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, int opcode,
@@ -788,12 +789,16 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
}
}
-static int fuse_do_readfolio(struct file *file, struct folio *folio)
+static int fuse_do_readfolio(struct file *file, struct folio *folio,
+ size_t off, size_t len)
{
struct inode *inode = folio->mapping->host;
struct fuse_mount *fm = get_fuse_mount(inode);
- loff_t pos = folio_pos(folio);
- struct fuse_folio_desc desc = { .length = folio_size(folio) };
+ loff_t pos = folio_pos(folio) + off;
+ struct fuse_folio_desc desc = {
+ .offset = off,
+ .length = len,
+ };
struct fuse_io_args ia = {
.ap.args.page_zeroing = true,
.ap.args.out_pages = true,
@@ -820,8 +825,6 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
if (res < desc.length)
fuse_short_read(inode, attr_ver, res, &ia.ap);
- folio_mark_uptodate(folio);
-
return 0;
}
@@ -834,13 +837,25 @@ static int fuse_read_folio(struct file *file, struct folio *folio)
if (fuse_is_bad(inode))
goto out;
- err = fuse_do_readfolio(file, folio);
+ err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
+ if (!err)
+ folio_mark_uptodate(folio);
+
fuse_invalidate_atime(inode);
out:
folio_unlock(folio);
return err;
}
+static int fuse_iomap_read_folio_sync(loff_t block_start, struct folio *folio,
+ size_t off, size_t len, const struct iomap *iomap,
+ void *private)
+{
+ struct file *file = private;
+
+ return fuse_do_readfolio(file, folio, off, len);
+}
+
static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
int err)
{
@@ -1375,6 +1390,25 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
}
}
+static const struct iomap_folio_ops fuse_iomap_folio_ops = {
+ .read_folio_sync = fuse_iomap_read_folio_sync,
+};
+
+static int fuse_write_iomap_begin(struct inode *inode, loff_t offset,
+ loff_t length, unsigned int flags,
+ struct iomap *iomap, struct iomap *srcmap)
+{
+ iomap->type = IOMAP_MAPPED;
+ iomap->folio_ops = &fuse_iomap_folio_ops;
+ iomap->length = length;
+ iomap->offset = offset;
+ return 0;
+}
+
+static const struct iomap_ops fuse_write_iomap_ops = {
+ .iomap_begin = fuse_write_iomap_begin,
+};
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1384,6 +1418,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = mapping->host;
ssize_t err, count;
struct fuse_conn *fc = get_fuse_conn(inode);
+ bool writeback = false;
if (fc->writeback_cache) {
/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1397,8 +1432,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
file_inode(file))) {
goto writethrough;
}
-
- return generic_file_write_iter(iocb, from);
+ writeback = true;
}
writethrough:
@@ -1420,6 +1454,13 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
written = direct_write_fallback(iocb, from, written,
fuse_perform_write(iocb, from));
+ } else if (writeback) {
+ /*
+ * Use iomap so that we can do granular uptodate reads
+ * and granular dirty tracking for large folios.
+ */
+ written = iomap_file_buffered_write(iocb, from,
+ &fuse_write_iomap_ops, file);
} else {
written = fuse_perform_write(iocb, from);
}
@@ -2209,84 +2250,6 @@ static int fuse_writepages(struct address_space *mapping,
return err;
}
-/*
- * It's worthy to make sure that space is reserved on disk for the write,
- * but how to implement it without killing performance need more thinking.
- */
-static int fuse_write_begin(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
-{
- pgoff_t index = pos >> PAGE_SHIFT;
- struct fuse_conn *fc = get_fuse_conn(file_inode(file));
- struct folio *folio;
- loff_t fsize;
- int err = -ENOMEM;
-
- WARN_ON(!fc->writeback_cache);
-
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
- mapping_gfp_mask(mapping));
- if (IS_ERR(folio))
- goto error;
-
- if (folio_test_uptodate(folio) || len >= folio_size(folio))
- goto success;
- /*
- * Check if the start of this folio comes after the end of file,
- * in which case the readpage can be optimized away.
- */
- fsize = i_size_read(mapping->host);
- if (fsize <= folio_pos(folio)) {
- size_t off = offset_in_folio(folio, pos);
- if (off)
- folio_zero_segment(folio, 0, off);
- goto success;
- }
- err = fuse_do_readfolio(file, folio);
- if (err)
- goto cleanup;
-success:
- *foliop = folio;
- return 0;
-
-cleanup:
- folio_unlock(folio);
- folio_put(folio);
-error:
- return err;
-}
-
-static int fuse_write_end(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, unsigned copied,
- struct folio *folio, void *fsdata)
-{
- struct inode *inode = folio->mapping->host;
-
- /* Haven't copied anything? Skip zeroing, size extending, dirtying. */
- if (!copied)
- goto unlock;
-
- pos += copied;
- if (!folio_test_uptodate(folio)) {
- /* Zero any unwritten bytes at the end of the page */
- size_t endoff = pos & ~PAGE_MASK;
- if (endoff)
- folio_zero_segment(folio, endoff, PAGE_SIZE);
- folio_mark_uptodate(folio);
- }
-
- if (pos > inode->i_size)
- i_size_write(inode, pos);
-
- folio_mark_dirty(folio);
-
-unlock:
- folio_unlock(folio);
- folio_put(folio);
-
- return copied;
-}
-
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
@@ -3144,12 +3107,12 @@ static const struct address_space_operations fuse_file_aops = {
.readahead = fuse_readahead,
.writepages = fuse_writepages,
.launder_folio = fuse_launder_folio,
- .dirty_folio = filemap_dirty_folio,
+ .dirty_folio = iomap_dirty_folio,
+ .release_folio = iomap_release_folio,
+ .invalidate_folio = iomap_invalidate_folio,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
.direct_IO = fuse_direct_IO,
- .write_begin = fuse_write_begin,
- .write_end = fuse_write_end,
};
void fuse_init_file_inode(struct inode *inode, unsigned int flags)
--
2.47.1
* [PATCH v2 15/16] fuse: use iomap for writeback
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
` (13 preceding siblings ...)
2025-06-13 21:46 ` [PATCH v2 14/16] fuse: use iomap for buffered writes Joanne Koong
@ 2025-06-13 21:46 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 16/16] fuse: use iomap for folio laundering Joanne Koong
15 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Use iomap for dirty folio writeback in ->writepages().
This allows for granular dirty writeback of large folios.
Only the dirty portions of the large folio will be written instead of
having to write out the entire folio. For example if there is a 1 MB
large folio and only 2 bytes in it are dirty, only the page for those
dirty bytes will be written out.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 118 +++++++++++++++++++++++++++++--------------------
1 file changed, 70 insertions(+), 48 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 59ff1dfd755b..db6804f6cc1d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1835,7 +1835,7 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
* scope of the fi->lock alleviates xarray lock
* contention and noticeably improves performance.
*/
- folio_end_writeback(ap->folios[i]);
+ iomap_finish_folio_write(inode, ap->folios[i], 1);
dec_wb_stat(&bdi->wb, WB_WRITEBACK);
wb_writeout_inc(&bdi->wb);
}
@@ -2022,19 +2022,20 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
}
static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
- uint32_t folio_index)
+ uint32_t folio_index, loff_t offset, unsigned len)
{
struct inode *inode = folio->mapping->host;
struct fuse_args_pages *ap = &wpa->ia.ap;
ap->folios[folio_index] = folio;
- ap->descs[folio_index].offset = 0;
- ap->descs[folio_index].length = folio_size(folio);
+ ap->descs[folio_index].offset = offset;
+ ap->descs[folio_index].length = len;
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
}
static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
+ size_t offset,
struct fuse_file *ff)
{
struct inode *inode = folio->mapping->host;
@@ -2047,7 +2048,7 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
return NULL;
fuse_writepage_add_to_bucket(fc, wpa);
- fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio), 0);
+ fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio) + offset, 0);
wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
wpa->inode = inode;
wpa->ia.ff = ff;
@@ -2103,7 +2104,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
struct inode *inode;
unsigned int max_folios;
- unsigned int nr_pages;
+ unsigned int nr_bytes;
};
static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
@@ -2145,21 +2146,28 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
}
static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
+ loff_t offset, unsigned len,
struct fuse_args_pages *ap,
struct fuse_fill_wb_data *data)
{
+ struct folio *prev_folio;
+ struct fuse_folio_desc prev_desc;
+
WARN_ON(!ap->num_folios);
/* Reached max pages */
- if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
+ if ((data->nr_bytes + len) / PAGE_SIZE > fc->max_pages)
return true;
/* Reached max write bytes */
- if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
+ if (data->nr_bytes + len > fc->max_write)
return true;
/* Discontinuity */
- if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio->index)
+ prev_folio = ap->folios[ap->num_folios - 1];
+ prev_desc = ap->descs[ap->num_folios - 1];
+ if ((folio_pos(prev_folio) + prev_desc.offset + prev_desc.length) !=
+ folio_pos(folio) + offset)
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2169,85 +2177,99 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
return false;
}
-static int fuse_writepages_fill(struct folio *folio,
- struct writeback_control *wbc, void *_data)
+static int fuse_iomap_writeback_folio(struct iomap_writeback_folio_range *ctx)
{
- struct fuse_fill_wb_data *data = _data;
+ struct fuse_fill_wb_data *data = ctx->wpc->private;
struct fuse_writepage_args *wpa = data->wpa;
+ struct folio *folio = ctx->folio;
struct fuse_args_pages *ap = &wpa->ia.ap;
- struct inode *inode = data->inode;
- struct fuse_inode *fi = get_fuse_inode(inode);
+ struct inode *inode = folio->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- int err;
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ loff_t offset = offset_in_folio(folio, ctx->pos);
+ unsigned len = ctx->dirty_len;
+
+ /* len will always be page aligned */
+ WARN_ON_ONCE(len & (PAGE_SIZE - 1));
if (!data->ff) {
- err = -EIO;
data->ff = fuse_write_file_get(fi);
if (!data->ff)
- goto out_unlock;
+ return -EIO;
}
- if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
+ iomap_start_folio_write(inode, folio, 1);
+ ctx->async_writeback = true;
+
+ if (wpa && fuse_writepage_need_send(fc, folio, offset, len, ap, data)) {
fuse_writepages_send(data);
data->wpa = NULL;
- data->nr_pages = 0;
+ data->nr_bytes = 0;
}
if (data->wpa == NULL) {
- err = -ENOMEM;
- wpa = fuse_writepage_args_setup(folio, data->ff);
+ wpa = fuse_writepage_args_setup(folio, offset, data->ff);
if (!wpa)
- goto out_unlock;
+ return -ENOMEM;
fuse_file_get(wpa->ia.ff);
data->max_folios = 1;
ap = &wpa->ia.ap;
}
- folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
- data->nr_pages += folio_nr_pages(folio);
+ fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
+ offset, len);
+ data->nr_bytes += len;
- err = 0;
ap->num_folios++;
if (!data->wpa)
data->wpa = wpa;
-out_unlock:
- folio_unlock(folio);
- return err;
+ return 0;
+}
+
+static int fuse_iomap_writeback_complete(struct iomap_writepage_ctx *wpc, int error)
+{
+ struct fuse_fill_wb_data *data = wpc->private;
+
+ WARN_ON_ONCE(!data);
+
+ if (data->wpa) {
+ WARN_ON(!data->wpa->ia.ap.num_folios);
+ fuse_writepages_send(data);
+ }
+
+ if (data->ff)
+ fuse_file_put(data->ff, false);
+
+ return error;
}
+static const struct iomap_writeback_ops fuse_writeback_ops = {
+ .writeback_folio = fuse_iomap_writeback_folio,
+ .writeback_complete = fuse_iomap_writeback_complete,
+};
+
static int fuse_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- struct fuse_fill_wb_data data;
- int err;
+ struct fuse_fill_wb_data data = {
+ .inode = inode,
+ };
+ struct iomap_writepage_ctx wpc = {
+ .iomap.type = IOMAP_MAPPED,
+ .private = &data,
+ };
- err = -EIO;
if (fuse_is_bad(inode))
- goto out;
+ return -EIO;
if (wbc->sync_mode == WB_SYNC_NONE &&
fc->num_background >= fc->congestion_threshold)
return 0;
- data.inode = inode;
- data.wpa = NULL;
- data.ff = NULL;
- data.nr_pages = 0;
-
- err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
- if (data.wpa) {
- WARN_ON(!data.wpa->ia.ap.num_folios);
- fuse_writepages_send(&data);
- }
- if (data.ff)
- fuse_file_put(data.ff, false);
-
-out:
- return err;
+ return iomap_writepages(mapping, wbc, &wpc, &fuse_writeback_ops);
}
static int fuse_launder_folio(struct folio *folio)
--
2.47.1
* [PATCH v2 16/16] fuse: use iomap for folio laundering
From: Joanne Koong @ 2025-06-13 21:46 UTC (permalink / raw)
To: linux-fsdevel
Cc: hch, djwong, anuj1072538, miklos, brauner, linux-xfs, kernel-team
Use iomap for folio laundering, which will do granular dirty
writeback when laundering a large folio.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 49 +++++++++----------------------------------------
1 file changed, 9 insertions(+), 40 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index db6804f6cc1d..800f478ad683 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2060,45 +2060,6 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
return wpa;
}
-static int fuse_writepage_locked(struct folio *folio)
-{
- struct address_space *mapping = folio->mapping;
- struct inode *inode = mapping->host;
- struct fuse_inode *fi = get_fuse_inode(inode);
- struct fuse_writepage_args *wpa;
- struct fuse_args_pages *ap;
- struct fuse_file *ff;
- int error = -EIO;
-
- ff = fuse_write_file_get(fi);
- if (!ff)
- goto err;
-
- wpa = fuse_writepage_args_setup(folio, ff);
- error = -ENOMEM;
- if (!wpa)
- goto err_writepage_args;
-
- ap = &wpa->ia.ap;
- ap->num_folios = 1;
-
- folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, 0);
-
- spin_lock(&fi->lock);
- list_add_tail(&wpa->queue_entry, &fi->queued_writes);
- fuse_flush_writepages(inode);
- spin_unlock(&fi->lock);
-
- return 0;
-
-err_writepage_args:
- fuse_file_put(ff, false);
-err:
- mapping_set_error(folio->mapping, error);
- return error;
-}
-
struct fuse_fill_wb_data {
struct fuse_writepage_args *wpa;
struct fuse_file *ff;
@@ -2275,8 +2236,16 @@ static int fuse_writepages(struct address_space *mapping,
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
+ struct fuse_fill_wb_data data = {
+ .inode = folio->mapping->host,
+ };
+ struct iomap_writepage_ctx wpc = {
+ .iomap.type = IOMAP_MAPPED,
+ .private = &data,
+ };
+
if (folio_clear_dirty_for_io(folio)) {
- err = fuse_writepage_locked(folio);
+ err = iomap_writeback_dirty_folio(folio, NULL, &wpc, &fuse_writeback_ops);
if (!err)
folio_wait_writeback(folio);
}
--
2.47.1
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Christoph Hellwig @ 2025-06-16 12:49 UTC (permalink / raw)
To: Joanne Koong
Cc: linux-fsdevel, hch, djwong, anuj1072538, miklos, brauner,
linux-xfs, kernel-team
On Fri, Jun 13, 2025 at 02:46:29PM -0700, Joanne Koong wrote:
> Add a wrapper function, iomap_bio_readpage(), around the bio readpage
> logic so that callers that do not have CONFIG_BLOCK set may also use
> iomap for buffered io.
As far as I can tell nothing in this series actually uses the non-block
read path, and I also don't really understand how the current split
would facilitate that. Can you explain a bit more where this is going?
* Re: [PATCH v2 09/16] iomap: change 'count' to 'async_writeback'
From: Christoph Hellwig @ 2025-06-16 12:52 UTC (permalink / raw)
To: Joanne Koong
Cc: linux-fsdevel, hch, djwong, anuj1072538, miklos, brauner,
linux-xfs, kernel-team
On Fri, Jun 13, 2025 at 02:46:34PM -0700, Joanne Koong wrote:
> Rename "count" to "async_writeback" to better reflect its function and
> since it is used as a boolean, change its type from unsigned to bool.
Not sure async_writeback is really the right name here; the way it is
used just indicates that there is some writeback going on. That
generally is asynchronous, as otherwise performance would suck, but the
important bit is that the responsibility for finishing the folio
writeback has shifted to the caller.
* Re: [PATCH v2 10/16] iomap: replace ->map_blocks() with generic ->writeback_folio() for writeback
From: Christoph Hellwig @ 2025-06-16 12:54 UTC (permalink / raw)
To: Joanne Koong
Cc: linux-fsdevel, hch, djwong, anuj1072538, miklos, brauner,
linux-xfs, kernel-team
On Fri, Jun 13, 2025 at 02:46:35PM -0700, Joanne Koong wrote:
> As part of the larger effort to have iomap buffered io code support
> generic io, replace map_blocks() with writeback_folio() and move the
> bio writeback code into a helper function, iomap_bio_writeback_folio(),
> that callers using bios can directly invoke.
Hmm, what I had in mind with my suggestion was to only have a single
callback, where the guts of the current code are just called by the
block-based file systems.
I ended up implementing this this morning to see if it's feasible,
and it works fine so far. Let me send out what I've got.
* Re: [PATCH v2 09/16] iomap: change 'count' to 'async_writeback'
From: Joanne Koong @ 2025-06-16 18:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, djwong, anuj1072538, miklos, brauner, linux-xfs,
kernel-team
On Mon, Jun 16, 2025 at 5:52 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Jun 13, 2025 at 02:46:34PM -0700, Joanne Koong wrote:
> > Rename "count" to "async_writeback" to better reflect its function and
> > since it is used as a boolean, change its type from unsigned to bool.
>
> Not sure async_writeback is really the right name here, the way it is
> used is just that there is any writeback going on. Which generally
> is asynchronous as otherwise performance would suck, but the important
> bit is that the responsibility for finishing the folio writeback shifted
> to the caller.
I like your name "wb_pending" a lot better.
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Joanne Koong @ 2025-06-16 19:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, djwong, anuj1072538, miklos, brauner, linux-xfs,
kernel-team
On Mon, Jun 16, 2025 at 5:49 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Jun 13, 2025 at 02:46:29PM -0700, Joanne Koong wrote:
> > Add a wrapper function, iomap_bio_readpage(), around the bio readpage
> > logic so that callers that do not have CONFIG_BLOCK set may also use
> > iomap for buffered io.
>
> As far as I can tell nothing in this series actually uses the non-block
> read path, and I also don't really understand how the current split
> would facilitate that. Can you explain a bit more where this is going?
>
Nothing in this series uses the iomap read path, but fuse might be
used in environments where CONFIG_BLOCK isn't set. What I'm trying to
do with this patch is move the logic in iomap readpage that's block /
bio dependent out of buffered-io.c and gate that behind a #ifdef
CONFIG_BLOCK check so that fuse can use buffered-io.c without breaking
compilation for non-CONFIG_BLOCK environments.
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Christoph Hellwig @ 2025-06-17 4:38 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, linux-fsdevel, djwong, anuj1072538, miklos,
brauner, linux-xfs, kernel-team
On Mon, Jun 16, 2025 at 12:18:21PM -0700, Joanne Koong wrote:
> Nothing in this series uses the iomap read path, but fuse might be
> used in environments where CONFIG_BLOCK isn't set. What I'm trying to
> do with this patch is move the logic in iomap readpage that's block /
> bio dependent out of buffered-io.c and gate that behind a #ifdef
> CONFIG_BLOCK check so that fuse can use buffered-io.c without breaking
> compilation for non-CONFIG_BLOCK environments
Ah, ok. Are you fine with getting something that works for fuse first,
and then we look into !CONFIG_BLOCK environments as a next step?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Joanne Koong @ 2025-06-17 17:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, djwong, anuj1072538, miklos, brauner, linux-xfs,
kernel-team
On Mon, Jun 16, 2025 at 9:38 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Jun 16, 2025 at 12:18:21PM -0700, Joanne Koong wrote:
> > Nothing in this series uses the iomap read path, but fuse might be
> > used in environments where CONFIG_BLOCK isn't set. What I'm trying to
> > do with this patch is move the logic in iomap readpage that's block /
> > bio dependent out of buffered-io.c and gate that behind a #ifdef
> > CONFIG_BLOCK check so that fuse can use buffered-io.c without breaking
> > compilation for non-CONFIG_BLOCK environments
>
> Ah, ok. Are you fine with getting something that works for fuse first,
> and then we look into !CONFIG_BLOCK environments as a next step?
I think the fuse iomap work has a hard dependency on the CONFIG_BLOCK
work, else it would break backwards compatibility for fuse (e.g.
non-CONFIG_BLOCK environments wouldn't be able to compile or use fuse
anymore).
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Christoph Hellwig @ 2025-06-18 4:45 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, linux-fsdevel, djwong, anuj1072538, miklos,
brauner, linux-xfs, kernel-team
On Tue, Jun 17, 2025 at 10:20:38AM -0700, Joanne Koong wrote:
> > Ah, ok. Are you fine with getting something that works for fuse first,
> > and then we look into !CONFIG_BLOCK environments as a next step?
>
> I think the fuse iomap work has a hard dependency on the CONFIG_BLOCK
> work else it would break backwards compatibility for fuse (eg
> non-CONFIG_BLOCK environments wouldn't be able to compile/use fuse
> anymore)
Sure. What I mean is that I want to do this last before getting the
series ready to merge. I.e., don't bother with it until we have something
we're all fine with on a conceptual level.
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Joanne Koong @ 2025-06-18 19:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, djwong, anuj1072538, miklos, brauner, linux-xfs,
kernel-team
On Tue, Jun 17, 2025 at 9:45 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jun 17, 2025 at 10:20:38AM -0700, Joanne Koong wrote:
> > > Ah, ok. Are you fine with getting something that works for fuse first,
> > > and then we look into !CONFIG_BLOCK environments as a next step?
> >
> > I think the fuse iomap work has a hard dependency on the CONFIG_BLOCK
> > work else it would break backwards compatibility for fuse (eg
> > non-CONFIG_BLOCK environments wouldn't be able to compile/use fuse
> > anymore)
>
> Sure. What I mean is that I want to do this last before getting the
> series ready to merge. I.e. don't bother with until we have something
> we're all fine with on a conceptual level.
I'm pausing this patchset until yours lands and then I was planning to
rebase this (the CONFIG_BLOCK and fuse specifics) on top of yours. Not
sure if that's what you mean or not, but yes, happy to go with
whatever you think works best.
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Christoph Hellwig @ 2025-06-23 7:42 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, linux-fsdevel, djwong, anuj1072538, miklos,
brauner, linux-xfs, kernel-team
On Wed, Jun 18, 2025 at 12:17:14PM -0700, Joanne Koong wrote:
> > Sure. What I mean is that I want to do this last before getting the
> > series ready to merge. I.e. don't bother with until we have something
> > we're all fine with on a conceptual level.
>
> I'm pausing this patchset until yours lands and then I was planning to
> rebase this (the CONFIG_BLOCK and fuse specifics) on top of yours. Not
> sure if that's what you mean or not, but yes, happy to go with
> whatever you think works best.
It's not going to land without a user...
At some point we'll need the fuse side of this to go ahead. I'm happy
to either hand control of the series to you, or work with you on a
common tree to make that happen.
* Re: [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage()
From: Joanne Koong @ 2025-06-23 20:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, djwong, anuj1072538, miklos, brauner, linux-xfs,
kernel-team
On Mon, Jun 23, 2025 at 12:42 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Jun 18, 2025 at 12:17:14PM -0700, Joanne Koong wrote:
> > > Sure. What I mean is that I want to do this last before getting the
> > > series ready to merge. I.e. don't bother with until we have something
> > > we're all fine with on a conceptual level.
> >
> > I'm pausing this patchset until yours lands and then I was planning to
> > rebase this (the CONFIG_BLOCK and fuse specifics) on top of yours. Not
> > sure if that's what you mean or not, but yes, happy to go with
> > whatever you think works best.
>
> It's not going to land without a user..
>
> At some point we'll need to fuse side of this to go ahead. I'm happy
> to either hand control of the series to you, or work with you on a
> common tree to make that happen.
I will send v3 today with:
Patches 1 to 11: the patches in your patchset in [1]
Patches 12 to 15: the fuse patches in this patchset (14/16, 15/16, and 16/16)
and temporarily drop the CONFIG_BLOCK patches until the series is
ready to merge.
[1] https://lore.kernel.org/linux-fsdevel/20250617105514.3393938-1-hch@lst.de/
Thread overview: 28+ messages
2025-06-13 21:46 [PATCH v2 00/16] fuse: use iomap for buffered writes + writeback Joanne Koong
2025-06-13 21:46 ` [PATCH v2 01/16] iomap: move buffered io CONFIG_BLOCK dependent logic into separate file Joanne Koong
2025-06-13 21:46 ` [PATCH v2 02/16] iomap: iomap_read_folio_sync() -> iomap_bio_read_folio_sync() Joanne Koong
2025-06-13 21:46 ` [PATCH v2 03/16] iomap: iomap_add_to_ioend() -> iomap_bio_add_to_ioend() Joanne Koong
2025-06-13 21:46 ` [PATCH v2 04/16] iomap: add wrapper function iomap_bio_readpage() Joanne Koong
2025-06-16 12:49 ` Christoph Hellwig
2025-06-16 19:18 ` Joanne Koong
2025-06-17 4:38 ` Christoph Hellwig
2025-06-17 17:20 ` Joanne Koong
2025-06-18 4:45 ` Christoph Hellwig
2025-06-18 19:17 ` Joanne Koong
2025-06-23 7:42 ` Christoph Hellwig
2025-06-23 20:53 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 05/16] iomap: add wrapper function iomap_bio_ioend_error() Joanne Koong
2025-06-13 21:46 ` [PATCH v2 06/16] iomap: add wrapper function iomap_submit_bio() Joanne Koong
2025-06-13 21:46 ` [PATCH v2 07/16] iomap: decouple buffered-io.o from CONFIG_BLOCK Joanne Koong
2025-06-13 21:46 ` [PATCH v2 08/16] iomap: add read_folio_sync() handler for buffered writes Joanne Koong
2025-06-13 21:46 ` [PATCH v2 09/16] iomap: change 'count' to 'async_writeback' Joanne Koong
2025-06-16 12:52 ` Christoph Hellwig
2025-06-16 18:49 ` Joanne Koong
2025-06-13 21:46 ` [PATCH v2 10/16] iomap: replace ->map_blocks() with generic ->writeback_folio() for writeback Joanne Koong
2025-06-16 12:54 ` Christoph Hellwig
2025-06-13 21:46 ` [PATCH v2 11/16] iomap: replace ->submit_ioend() with generic ->writeback_complete() " Joanne Koong
2025-06-13 21:46 ` [PATCH v2 12/16] iomap: support more customized writeback handling Joanne Koong
2025-06-13 21:46 ` [PATCH v2 13/16] iomap: add iomap_writeback_dirty_folio() Joanne Koong
2025-06-13 21:46 ` [PATCH v2 14/16] fuse: use iomap for buffered writes Joanne Koong
2025-06-13 21:46 ` [PATCH v2 15/16] fuse: use iomap for writeback Joanne Koong
2025-06-13 21:46 ` [PATCH v2 16/16] fuse: use iomap for folio laundering Joanne Koong