* [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead
@ 2025-09-08 18:51 Joanne Koong
2025-09-08 18:51 ` [PATCH v2 01/16] iomap: move async bio read logic into helper function Joanne Koong
` (15 more replies)
0 siblings, 16 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
This series adds fuse iomap support for buffered reads and readahead.
This is needed so that granular uptodate tracking can be used in fuse
when large folios are enabled, so that only the non-uptodate portions
of a folio need to be read in instead of the entire folio. It is also
needed in order to turn on large folios for servers that use the
writeback cache: otherwise there is a race condition that may lead to
data corruption when a partial write is followed by a read that happens
before the write has undergone writeback. Since the folio is not marked
uptodate by the partial write, the read will read in the entire folio
from disk, which will overwrite the partial write.
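To make the race concrete (illustrative offsets, assuming a 2M large
folio without granular uptodate tracking):

    1. An application does a partial write of bytes 0-4095 of the
       folio; the data sits in the writeback cache and the folio is
       not marked uptodate.
    2. A read of the folio arrives before the write has undergone
       writeback.
    3. Since the folio is not uptodate, the entire 2M folio is read in
       from the server, which still holds the old bytes 0-4095.
    4. The stale data overwrites the cached partial write.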
This is on top of commit d02ae3528998 ("Merge branch 'kernel-6.18.clone3'
into vfs.all") in Christian's vfs tree.
This series was run through fstests on fuse passthrough_hp with an
out-of-kernel patch enabling fuse large folios.
This patchset does not enable large folios on fuse yet. That will be part
of a different patchset.
Thanks,
Joanne
Changelog
---------
v1: https://lore.kernel.org/linux-fsdevel/20250829235627.4053234-1-joannelkoong@gmail.com/
v1 -> v2:
* Don't pass in caller-provided arg through iter->private, pass it through
ctx->private instead (Darrick & Christoph)
* Separate 'bias' for ifs->read_bytes_pending into separate patch (Christoph)
* Rework read/readahead interface to take in struct iomap_read_folio_ctx
(Christoph)
* Add patch for removing fuse fc->blkbits workaround, now that Miklos's tree
has been merged into Christian's
Joanne Koong (16):
iomap: move async bio read logic into helper function
iomap: move read/readahead bio submission logic into helper function
iomap: rename cur_folio_in_bio to folio_owned
iomap: store read/readahead bio generically
iomap: propagate iomap_read_folio() error to caller
iomap: iterate over entire folio in iomap_readpage_iter()
iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
iomap: add public start/finish folio read helpers
iomap: make iomap_read_folio_ctx->folio_owned internal
iomap: add caller-provided callbacks for read and readahead
iomap: add bias for async read requests
iomap: move read/readahead logic out of CONFIG_BLOCK guard
fuse: use iomap for read_folio
fuse: use iomap for readahead
fuse: remove fc->blkbits workaround for partial writes
.../filesystems/iomap/operations.rst | 42 +++
block/fops.c | 14 +-
fs/erofs/data.c | 14 +-
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 291 ++++++++++-------
fs/fuse/fuse_i.h | 8 -
fs/fuse/inode.c | 13 +-
fs/gfs2/aops.c | 21 +-
fs/iomap/buffered-io.c | 307 ++++++++++--------
fs/xfs/xfs_aops.c | 14 +-
fs/zonefs/file.c | 14 +-
include/linux/iomap.h | 45 ++-
12 files changed, 509 insertions(+), 276 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v2 01/16] iomap: move async bio read logic into helper function
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:09 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 02/16] iomap: move read/readahead bio submission " Joanne Koong
` (14 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Move the iomap_readpage_iter() async bio read logic into a separate
helper function. This is needed to make iomap read/readahead more
generically usable, especially for filesystems that do not require
CONFIG_BLOCK.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 68 ++++++++++++++++++++++++++----------------
1 file changed, 42 insertions(+), 26 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fd827398afd2..13854fb6ad86 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -357,36 +357,21 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
-static int iomap_readpage_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+/**
+ * Read in a folio range asynchronously through bios.
+ *
+ * This should only be used for read/readahead, not for buffered writes.
+ * Buffered writes must read in the folio synchronously.
+ */
+static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
{
+ struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
- loff_t pos = iter->pos;
+ struct iomap_folio_state *ifs = folio->private;
+ size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
- struct folio *folio = ctx->cur_folio;
- struct iomap_folio_state *ifs;
- size_t poff, plen;
sector_t sector;
- int ret;
-
- if (iomap->type == IOMAP_INLINE) {
- ret = iomap_read_inline_data(iter, folio);
- if (ret)
- return ret;
- return iomap_iter_advance(iter, &length);
- }
-
- /* zero post-eof blocks as the page may be mapped */
- ifs = ifs_alloc(iter->inode, folio, iter->flags);
- iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
- if (plen == 0)
- goto done;
-
- if (iomap_block_needs_zeroing(iter, pos)) {
- folio_zero_range(folio, poff, plen);
- iomap_set_range_uptodate(folio, poff, plen);
- goto done;
- }
ctx->cur_folio_in_bio = true;
if (ifs) {
@@ -425,6 +410,37 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
ctx->bio->bi_end_io = iomap_read_end_io;
bio_add_folio_nofail(ctx->bio, folio, plen, poff);
}
+}
+
+static int iomap_readpage_iter(struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx)
+{
+ const struct iomap *iomap = &iter->iomap;
+ loff_t pos = iter->pos;
+ loff_t length = iomap_length(iter);
+ struct folio *folio = ctx->cur_folio;
+ size_t poff, plen;
+ int ret;
+
+ if (iomap->type == IOMAP_INLINE) {
+ ret = iomap_read_inline_data(iter, folio);
+ if (ret)
+ return ret;
+ return iomap_iter_advance(iter, &length);
+ }
+
+ /* zero post-eof blocks as the page may be mapped */
+ ifs_alloc(iter->inode, folio, iter->flags);
+ iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
+ if (plen == 0)
+ goto done;
+
+ if (iomap_block_needs_zeroing(iter, pos)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ } else {
+ iomap_read_folio_range_bio_async(iter, ctx, pos, plen);
+ }
done:
/*
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 02/16] iomap: move read/readahead bio submission logic into helper function
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-08 18:51 ` [PATCH v2 01/16] iomap: move async bio read logic into helper function Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:09 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 03/16] iomap: rename cur_folio_in_bio to folio_owned Joanne Koong
` (13 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Move the read/readahead bio submission logic into a separate helper
function.
This is needed to make iomap read/readahead more generically usable,
especially for filesystems that do not require CONFIG_BLOCK.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 13854fb6ad86..a3b02ed5328f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -357,6 +357,14 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
+static void iomap_submit_read_bio(struct iomap_readpage_ctx *ctx)
+{
+ struct bio *bio = ctx->bio;
+
+ if (bio)
+ submit_bio(bio);
+}
+
/**
* Read in a folio range asynchronously through bios.
*
@@ -388,8 +396,7 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
gfp_t orig_gfp = gfp;
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
- if (ctx->bio)
- submit_bio(ctx->bio);
+ iomap_submit_read_bio(ctx);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
@@ -484,13 +491,10 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.status = iomap_read_folio_iter(&iter, &ctx);
- if (ctx.bio) {
- submit_bio(ctx.bio);
- WARN_ON_ONCE(!ctx.cur_folio_in_bio);
- } else {
- WARN_ON_ONCE(ctx.cur_folio_in_bio);
+ iomap_submit_read_bio(&ctx);
+
+ if (!ctx.cur_folio_in_bio)
folio_unlock(folio);
- }
/*
* Just like mpage_readahead and block_read_full_folio, we always
@@ -556,12 +560,10 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
while (iomap_iter(&iter, ops) > 0)
iter.status = iomap_readahead_iter(&iter, &ctx);
- if (ctx.bio)
- submit_bio(ctx.bio);
- if (ctx.cur_folio) {
- if (!ctx.cur_folio_in_bio)
- folio_unlock(ctx.cur_folio);
- }
+ iomap_submit_read_bio(&ctx);
+
+ if (ctx.cur_folio && !ctx.cur_folio_in_bio)
+ folio_unlock(ctx.cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 03/16] iomap: rename cur_folio_in_bio to folio_owned
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-08 18:51 ` [PATCH v2 01/16] iomap: move async bio read logic into helper function Joanne Koong
2025-09-08 18:51 ` [PATCH v2 02/16] iomap: move read/readahead bio submission " Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:10 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 04/16] iomap: store read/readahead bio generically Joanne Koong
` (12 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
The purpose of struct iomap_readpage_ctx's cur_folio_in_bio is to track
whether the folio is owned by the bio (in which case the bio is
responsible for unlocking the folio) or whether it needs to be unlocked
by iomap. Rename it to folio_owned to make the purpose clearer and so
that, when the iomap read/readahead logic is made generic, the name
also makes sense for filesystems that don't use bios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a3b02ed5328f..598998269107 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -352,7 +352,12 @@ static void iomap_read_end_io(struct bio *bio)
struct iomap_readpage_ctx {
struct folio *cur_folio;
- bool cur_folio_in_bio;
+ /*
+ * Is the folio owned by this readpage context, or by some
+ * external IO helper? Either way, the owner of the folio is
+ * responsible for unlocking it when the read completes.
+ */
+ bool folio_owned;
struct bio *bio;
struct readahead_control *rac;
};
@@ -381,7 +386,7 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
loff_t length = iomap_length(iter);
sector_t sector;
- ctx->cur_folio_in_bio = true;
+ ctx->folio_owned = true;
if (ifs) {
spin_lock_irq(&ifs->state_lock);
ifs->read_bytes_pending += plen;
@@ -493,7 +498,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
iomap_submit_read_bio(&ctx);
- if (!ctx.cur_folio_in_bio)
+ if (!ctx.folio_owned)
folio_unlock(folio);
/*
@@ -513,13 +518,13 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
while (iomap_length(iter)) {
if (ctx->cur_folio &&
offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
- if (!ctx->cur_folio_in_bio)
+ if (!ctx->folio_owned)
folio_unlock(ctx->cur_folio);
ctx->cur_folio = NULL;
}
if (!ctx->cur_folio) {
ctx->cur_folio = readahead_folio(ctx->rac);
- ctx->cur_folio_in_bio = false;
+ ctx->folio_owned = false;
}
ret = iomap_readpage_iter(iter, ctx);
if (ret)
@@ -562,7 +567,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
iomap_submit_read_bio(&ctx);
- if (ctx.cur_folio && !ctx.cur_folio_in_bio)
+ if (ctx.cur_folio && !ctx.folio_owned)
folio_unlock(ctx.cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 04/16] iomap: store read/readahead bio generically
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (2 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 03/16] iomap: rename cur_folio_in_bio to folio_owned Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:11 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
` (11 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Store the iomap_readpage_ctx bio generically as a "void *private".
This makes the read/readahead interface more generic, which allows it to
be used by filesystems that may not be block-based and may not have
CONFIG_BLOCK set.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 598998269107..a83a94bc0be9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -358,13 +358,13 @@ struct iomap_readpage_ctx {
* responsible for unlocking it when the read completes.
*/
bool folio_owned;
- struct bio *bio;
+ void *private;
struct readahead_control *rac;
};
static void iomap_submit_read_bio(struct iomap_readpage_ctx *ctx)
{
- struct bio *bio = ctx->bio;
+ struct bio *bio = ctx->private;
if (bio)
submit_bio(bio);
@@ -385,6 +385,7 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
sector_t sector;
+ struct bio *bio = ctx->private;
ctx->folio_owned = true;
if (ifs) {
@@ -394,9 +395,8 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
}
sector = iomap_sector(iomap, pos);
- if (!ctx->bio ||
- bio_end_sector(ctx->bio) != sector ||
- !bio_add_folio(ctx->bio, folio, plen, poff)) {
+ if (!bio || bio_end_sector(bio) != sector ||
+ !bio_add_folio(bio, folio, plen, poff)) {
gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
gfp_t orig_gfp = gfp;
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
@@ -405,22 +405,21 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
- ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+ bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
REQ_OP_READ, gfp);
/*
* If the bio_alloc fails, try it again for a single page to
* avoid having to deal with partial page reads. This emulates
* what do_mpage_read_folio does.
*/
- if (!ctx->bio) {
- ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
- orig_gfp);
- }
+ if (!bio)
+ bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
if (ctx->rac)
- ctx->bio->bi_opf |= REQ_RAHEAD;
- ctx->bio->bi_iter.bi_sector = sector;
- ctx->bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(ctx->bio, folio, plen, poff);
+ bio->bi_opf |= REQ_RAHEAD;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_end_io = iomap_read_end_io;
+ bio_add_folio_nofail(bio, folio, plen, poff);
+ ctx->private = bio;
}
}
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (3 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 04/16] iomap: store read/readahead bio generically Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:13 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 06/16] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
` (10 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Propagate any error encountered in iomap_read_folio() back up to its
caller (otherwise filemap_read_folio() passes a default -EIO up to
callers). This matches how other filesystems handle their
->read_folio() errors.
Remove the out of date comment about setting the folio error flag.
Folio error flags were removed in commit 1f56eedf7ff7 ("iomap:
Remove calls to set and clear folio error flag").
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a83a94bc0be9..51d204f0e077 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -500,12 +500,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
if (!ctx.folio_owned)
folio_unlock(folio);
- /*
- * Just like mpage_readahead and block_read_full_folio, we always
- * return 0 and just set the folio error flag on errors. This
- * should be cleaned up throughout the stack eventually.
- */
- return 0;
+ return ret;
}
EXPORT_SYMBOL_GPL(iomap_read_folio);
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 06/16] iomap: iterate over entire folio in iomap_readpage_iter()
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (4 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:15 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 07/16] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
` (9 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Iterate over all non-uptodate ranges in a single call to
iomap_readpage_iter() instead of leaving the partial folio iteration to
the caller.
This will be useful for supporting caller-provided async folio read
callbacks (added in a later commit), because that will require tracking
when the first and last async read requests for a folio are sent, in
order to prevent premature read completion of the folio.
This additionally makes the iomap_readahead_iter() logic a bit simpler.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 67 ++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 38 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 51d204f0e077..fc8fa24ae7db 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -431,6 +431,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
loff_t length = iomap_length(iter);
struct folio *folio = ctx->cur_folio;
size_t poff, plen;
+ loff_t count;
int ret;
if (iomap->type == IOMAP_INLINE) {
@@ -442,39 +443,29 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
/* zero post-eof blocks as the page may be mapped */
ifs_alloc(iter->inode, folio, iter->flags);
- iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
- if (plen == 0)
- goto done;
- if (iomap_block_needs_zeroing(iter, pos)) {
- folio_zero_range(folio, poff, plen);
- iomap_set_range_uptodate(folio, poff, plen);
- } else {
- iomap_read_folio_range_bio_async(iter, ctx, pos, plen);
- }
-
-done:
- /*
- * Move the caller beyond our range so that it keeps making progress.
- * For that, we have to include any leading non-uptodate ranges, but
- * we can skip trailing ones as they will be handled in the next
- * iteration.
- */
- length = pos - iter->pos + plen;
- return iomap_iter_advance(iter, &length);
-}
+ length = min_t(loff_t, length,
+ folio_size(folio) - offset_in_folio(folio, pos));
+ while (length) {
+ iomap_adjust_read_range(iter->inode, folio, &pos,
+ length, &poff, &plen);
+ count = pos - iter->pos + plen;
+ if (plen == 0)
+ return iomap_iter_advance(iter, &count);
-static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
-{
- int ret;
+ if (iomap_block_needs_zeroing(iter, pos)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ } else {
+ iomap_read_folio_range_bio_async(iter, ctx, pos, plen);
+ }
- while (iomap_length(iter)) {
- ret = iomap_readpage_iter(iter, ctx);
+ length -= count;
+ ret = iomap_iter_advance(iter, &count);
if (ret)
return ret;
+ pos = iter->pos;
}
-
return 0;
}
@@ -493,7 +484,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx);
+ iter.status = iomap_readpage_iter(&iter, &ctx);
iomap_submit_read_bio(&ctx);
@@ -510,16 +501,16 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
int ret;
while (iomap_length(iter)) {
- if (ctx->cur_folio &&
- offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
- if (!ctx->folio_owned)
- folio_unlock(ctx->cur_folio);
- ctx->cur_folio = NULL;
- }
- if (!ctx->cur_folio) {
- ctx->cur_folio = readahead_folio(ctx->rac);
- ctx->folio_owned = false;
- }
+ if (ctx->cur_folio && !ctx->folio_owned)
+ folio_unlock(ctx->cur_folio);
+ ctx->cur_folio = readahead_folio(ctx->rac);
+ /*
+ * We should never in practice hit this case since
+ * the iter length matches the readahead length.
+ */
+ if (WARN_ON_ONCE(!ctx->cur_folio))
+ return -EINVAL;
+ ctx->folio_owned = false;
ret = iomap_readpage_iter(iter, ctx);
if (ret)
return ret;
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 07/16] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (5 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 06/16] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:16 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 08/16] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
` (8 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
->readpage was deprecated and reads are now on folios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fc8fa24ae7db..c376a793e4c5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -423,7 +423,7 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
}
}
-static int iomap_readpage_iter(struct iomap_iter *iter,
+static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
const struct iomap *iomap = &iter->iomap;
@@ -484,7 +484,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_readpage_iter(&iter, &ctx);
+ iter.status = iomap_read_folio_iter(&iter, &ctx);
iomap_submit_read_bio(&ctx);
@@ -511,7 +511,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
if (WARN_ON_ONCE(!ctx->cur_folio))
return -EINVAL;
ctx->folio_owned = false;
- ret = iomap_readpage_iter(iter, ctx);
+ ret = iomap_read_folio_iter(iter, ctx);
if (ret)
return ret;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 08/16] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (6 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 07/16] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:16 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 09/16] iomap: add public start/finish folio read helpers Joanne Koong
` (7 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
->readpage was deprecated and reads are now on folios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c376a793e4c5..008042108c68 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -350,7 +350,7 @@ static void iomap_read_end_io(struct bio *bio)
bio_put(bio);
}
-struct iomap_readpage_ctx {
+struct iomap_read_folio_ctx {
struct folio *cur_folio;
/*
* Is the folio owned by this readpage context, or by some
@@ -362,7 +362,7 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
-static void iomap_submit_read_bio(struct iomap_readpage_ctx *ctx)
+static void iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
{
struct bio *bio = ctx->private;
@@ -377,7 +377,7 @@ static void iomap_submit_read_bio(struct iomap_readpage_ctx *ctx)
* Buffered writes must read in the folio synchronously.
*/
static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
+ struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
@@ -424,7 +424,7 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
}
static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx)
{
const struct iomap *iomap = &iter->iomap;
loff_t pos = iter->pos;
@@ -476,7 +476,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
.pos = folio_pos(folio),
.len = folio_size(folio),
};
- struct iomap_readpage_ctx ctx = {
+ struct iomap_read_folio_ctx ctx = {
.cur_folio = folio,
};
int ret;
@@ -496,7 +496,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
EXPORT_SYMBOL_GPL(iomap_read_folio);
static int iomap_readahead_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx)
{
int ret;
@@ -541,7 +541,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
.pos = readahead_pos(rac),
.len = readahead_length(rac),
};
- struct iomap_readpage_ctx ctx = {
+ struct iomap_read_folio_ctx ctx = {
.rac = rac,
};
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 09/16] iomap: add public start/finish folio read helpers
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (7 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 08/16] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:16 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 10/16] iomap: make iomap_read_folio_ctx->folio_owned internal Joanne Koong
` (6 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Move the ifs->read_bytes_pending increment logic into a separate
helper, iomap_start_folio_read(), which will be needed later on by
caller-provided read callbacks (added in a later commit) for
read/readahead. This is the counterpart to the existing
iomap_finish_folio_read().
Make iomap_start_folio_read() and iomap_finish_folio_read() publicly
accessible so that caller-provided read callbacks can use them.
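For illustration, a minimal sketch of how such a caller-provided
callback (added in a later commit) might bracket a synchronous range
read with these helpers; my_fs_read_folio_range() and
my_fs_read_range() are hypothetical filesystem helpers:

    static int my_fs_read_folio_range(const struct iomap_iter *iter,
                    struct iomap_read_folio_ctx *ctx, loff_t pos,
                    size_t len)
    {
            struct folio *folio = ctx->cur_folio;
            size_t poff = offset_in_folio(folio, pos);
            int error;

            iomap_start_folio_read(folio, len);
            /* fill folio bytes [poff, poff + len) from the backing store */
            error = my_fs_read_range(iter->inode, folio, poff, len);
            /* must be called even if the read failed */
            iomap_finish_folio_read(folio, poff, len, error);
            return error;
    }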
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 26 +++++++++++++++++---------
include/linux/iomap.h | 3 +++
2 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 008042108c68..50de09426c96 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -317,9 +317,20 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
return 0;
}
-#ifdef CONFIG_BLOCK
-static void iomap_finish_folio_read(struct folio *folio, size_t off,
- size_t len, int error)
+void iomap_start_folio_read(struct folio *folio, size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ if (ifs) {
+ spin_lock_irq(&ifs->state_lock);
+ ifs->read_bytes_pending += len;
+ spin_unlock_irq(&ifs->state_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(iomap_start_folio_read);
+
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error)
{
struct iomap_folio_state *ifs = folio->private;
bool uptodate = !error;
@@ -339,7 +350,9 @@ static void iomap_finish_folio_read(struct folio *folio, size_t off,
if (finished)
folio_end_read(folio, uptodate);
}
+EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
+#ifdef CONFIG_BLOCK
static void iomap_read_end_io(struct bio *bio)
{
int error = blk_status_to_errno(bio->bi_status);
@@ -381,18 +394,13 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
- struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
sector_t sector;
struct bio *bio = ctx->private;
ctx->folio_owned = true;
- if (ifs) {
- spin_lock_irq(&ifs->state_lock);
- ifs->read_bytes_pending += plen;
- spin_unlock_irq(&ifs->state_lock);
- }
+ iomap_start_folio_read(folio, plen);
sector = iomap_sector(iomap, pos);
if (!bio || bio_end_sector(bio) != sector ||
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73dceabc21c8..0938c4a57f4c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -467,6 +467,9 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len);
int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
+void iomap_start_folio_read(struct folio *folio, size_t len);
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error);
void iomap_start_folio_write(struct inode *inode, struct folio *folio,
size_t len);
void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 10/16] iomap: make iomap_read_folio_ctx->folio_owned internal
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (8 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 09/16] iomap: add public start/finish folio read helpers Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:17 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead Joanne Koong
` (5 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
struct iomap_read_folio_ctx will be made a public interface when
read/readahead takes in caller-provided callbacks.
To make the interface simpler for end users, keep track of the folio
ownership state internally instead of exposing it in struct
iomap_read_folio_ctx.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 50de09426c96..d38459740180 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -365,12 +365,6 @@ static void iomap_read_end_io(struct bio *bio)
struct iomap_read_folio_ctx {
struct folio *cur_folio;
- /*
- * Is the folio owned by this readpage context, or by some
- * external IO helper? Either way, the owner of the folio is
- * responsible for unlocking it when the read completes.
- */
- bool folio_owned;
void *private;
struct readahead_control *rac;
};
@@ -399,7 +393,6 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
sector_t sector;
struct bio *bio = ctx->private;
- ctx->folio_owned = true;
iomap_start_folio_read(folio, plen);
sector = iomap_sector(iomap, pos);
@@ -432,7 +425,7 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
}
static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
{
const struct iomap *iomap = &iter->iomap;
loff_t pos = iter->pos;
@@ -465,6 +458,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
folio_zero_range(folio, poff, plen);
iomap_set_range_uptodate(folio, poff, plen);
} else {
+ *cur_folio_owned = true;
iomap_read_folio_range_bio_async(iter, ctx, pos, plen);
}
@@ -487,16 +481,22 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
struct iomap_read_folio_ctx ctx = {
.cur_folio = folio,
};
+ /*
+ * If an external IO helper takes ownership of the folio,
+ * it is responsible for unlocking it when the read completes.
+ */
+ bool cur_folio_owned = false;
int ret;
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx);
+ iter.status = iomap_read_folio_iter(&iter, &ctx,
+ &cur_folio_owned);
iomap_submit_read_bio(&ctx);
- if (!ctx.folio_owned)
+ if (!cur_folio_owned)
folio_unlock(folio);
return ret;
@@ -504,12 +504,13 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
EXPORT_SYMBOL_GPL(iomap_read_folio);
static int iomap_readahead_iter(struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx,
+ bool *cur_folio_owned)
{
int ret;
while (iomap_length(iter)) {
- if (ctx->cur_folio && !ctx->folio_owned)
+ if (ctx->cur_folio && !*cur_folio_owned)
folio_unlock(ctx->cur_folio);
ctx->cur_folio = readahead_folio(ctx->rac);
/*
@@ -518,8 +519,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
*/
if (WARN_ON_ONCE(!ctx->cur_folio))
return -EINVAL;
- ctx->folio_owned = false;
- ret = iomap_read_folio_iter(iter, ctx);
+ *cur_folio_owned = false;
+ ret = iomap_read_folio_iter(iter, ctx, cur_folio_owned);
if (ret)
return ret;
}
@@ -552,15 +553,21 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
struct iomap_read_folio_ctx ctx = {
.rac = rac,
};
+ /*
+ * If an external IO helper takes ownership of the folio,
+ * it is responsible for unlocking it when the read completes.
+ */
+ bool cur_folio_owned = false;
trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
while (iomap_iter(&iter, ops) > 0)
- iter.status = iomap_readahead_iter(&iter, &ctx);
+ iter.status = iomap_readahead_iter(&iter, &ctx,
+ &cur_folio_owned);
iomap_submit_read_bio(&ctx);
- if (ctx.cur_folio && !ctx.folio_owned)
+ if (ctx.cur_folio && !cur_folio_owned)
folio_unlock(ctx.cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (9 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 10/16] iomap: make iomap_read_folio_ctx->folio_owned internal Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-09 0:14 ` Gao Xiang
2025-09-11 11:26 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 12/16] iomap: add bias for async read requests Joanne Koong
` (4 subsequent siblings)
15 siblings, 2 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Add caller-provided callbacks for read and readahead so that the iomap
read/readahead code can be used generically, especially by filesystems
that are not block-based.
In particular, this:
* Modifies the read and readahead interface to take in a
struct iomap_read_folio_ctx that is publicly defined as:
struct iomap_read_folio_ctx {
        const struct iomap_read_ops *ops;
        struct folio *cur_folio;
        struct readahead_control *rac;
        void *private;
};
where struct iomap_read_ops is defined as:
struct iomap_read_ops {
        int (*read_folio_range)(const struct iomap_iter *iter,
                        struct iomap_read_folio_ctx *ctx,
                        loff_t pos, size_t len);
        int (*read_submit)(struct iomap_read_folio_ctx *ctx);
};
read_folio_range() reads in the folio range and must be provided by
the caller. read_submit() is optional and is used for submitting any
pending read requests.
iomap_read_folio() must set ops->read_folio_range() and
cur_folio, and iomap_readahead() must set
ops->read_folio_range() and rac.
* Modifies existing filesystems that use iomap for read and readahead
to use the new API (a sketch of the calling convention follows below).
There is no change in functionality for these filesystems.
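As an example, the caller side for a hypothetical non-block filesystem
("myfs") mirrors the conversions of the block-based filesystems below:

    static int myfs_read_folio(struct file *file, struct folio *folio)
    {
            struct iomap_read_folio_ctx ctx = {
                    .ops = &myfs_iomap_read_ops, /* caller-provided callbacks */
                    .cur_folio = folio,
            };

            return iomap_read_folio(&myfs_iomap_ops, &ctx);
    }

myfs_iomap_read_ops and myfs_iomap_ops are assumed to be defined by the
filesystem.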
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/iomap/operations.rst | 42 ++++++++++++++
block/fops.c | 14 ++++-
fs/erofs/data.c | 14 ++++-
fs/gfs2/aops.c | 21 +++++--
fs/iomap/buffered-io.c | 58 ++++++++++---------
fs/xfs/xfs_aops.c | 14 ++++-
fs/zonefs/file.c | 14 ++++-
include/linux/iomap.h | 42 +++++++++++++-
8 files changed, 178 insertions(+), 41 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 067ed8e14ef3..be890192287c 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -135,6 +135,30 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
* ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
+``struct iomap_read_ops``
+--------------------------
+
+.. code-block:: c
+
+ struct iomap_read_ops {
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, loff_t pos,
+ size_t len);
+ int (*read_submit)(struct iomap_read_folio_ctx *ctx);
+ };
+
+iomap calls these functions:
+
+ - ``read_folio_range``: Called to read in the range (read can be done
+ synchronously or asynchronously). This must be provided by the caller.
+ The caller is responsible for calling iomap_start_folio_read() and
+ iomap_finish_folio_read() before and after reading the folio range. This
+ should be done even if an error is encountered during the read. This
+ returns 0 on success or a negative error on failure.
+
+ - ``read_submit``: Submit any pending read requests. This function is
+ optional. This returns 0 on success or a negative error on failure.
+
Internal per-Folio State
------------------------
@@ -182,6 +206,24 @@ The ``flags`` argument to ``->iomap_begin`` will be set to zero.
The pagecache takes whatever locks it needs before calling the
filesystem.
+Both ``iomap_readahead`` and ``iomap_read_folio`` pass in a ``struct
+iomap_read_folio_ctx``:
+
+.. code-block:: c
+
+ struct iomap_read_folio_ctx {
+ const struct iomap_read_ops *ops;
+ struct folio *cur_folio;
+ struct readahead_control *rac;
+ void *private;
+ };
+
+``iomap_readahead`` must set ``ops->read_folio_range()`` and ``rac``.
+``iomap_read_folio`` must set ``ops->read_folio_range()`` and ``cur_folio``.
+Both can optionally set ``ops->read_submit()`` and/or ``private``. ``private``
+is used to pass in any custom data the caller needs accessible in the ops
+callbacks.
+
Buffered Writes
---------------
diff --git a/block/fops.c b/block/fops.c
index ddbc69c0922b..00d9728a9b08 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -533,12 +533,22 @@ const struct address_space_operations def_blk_aops = {
#else /* CONFIG_BUFFER_HEAD */
static int blkdev_read_folio(struct file *file, struct folio *folio)
{
- return iomap_read_folio(folio, &blkdev_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .cur_folio = folio,
+ };
+
+ return iomap_read_folio(&blkdev_iomap_ops, &ctx);
}
static void blkdev_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &blkdev_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .rac = rac,
+ };
+
+ iomap_readahead(&blkdev_iomap_ops, &ctx);
}
static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 3b1ba571c728..3f27db03310d 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -369,17 +369,27 @@ int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
*/
static int erofs_read_folio(struct file *file, struct folio *folio)
{
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .cur_folio = folio,
+ };
+
trace_erofs_read_folio(folio, true);
- return iomap_read_folio(folio, &erofs_iomap_ops);
+ return iomap_read_folio(&erofs_iomap_ops, &ctx);
}
static void erofs_readahead(struct readahead_control *rac)
{
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .rac = rac,
+ };
+
trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
readahead_count(rac), true);
- return iomap_readahead(rac, &erofs_iomap_ops);
+ return iomap_readahead(&erofs_iomap_ops, &ctx);
}
static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 47d74afd63ac..1a8567a41f03 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -428,7 +428,12 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
if (!gfs2_is_jdata(ip) ||
(i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
- error = iomap_read_folio(folio, &gfs2_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .cur_folio = folio,
+ };
+
+ error = iomap_read_folio(&gfs2_iomap_ops, &ctx);
} else if (gfs2_is_stuffed(ip)) {
error = stuffed_read_folio(ip, folio);
} else {
@@ -498,12 +503,18 @@ static void gfs2_readahead(struct readahead_control *rac)
struct inode *inode = rac->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
- if (gfs2_is_stuffed(ip))
+ if (gfs2_is_stuffed(ip)) {
;
- else if (gfs2_is_jdata(ip))
+ } else if (gfs2_is_jdata(ip)) {
mpage_readahead(rac, gfs2_block_map);
- else
- iomap_readahead(rac, &gfs2_iomap_ops);
+ } else {
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .rac = rac,
+ };
+
+ iomap_readahead(&gfs2_iomap_ops, &ctx);
+ }
}
/**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d38459740180..6fafe3b30563 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -363,18 +363,14 @@ static void iomap_read_end_io(struct bio *bio)
bio_put(bio);
}
-struct iomap_read_folio_ctx {
- struct folio *cur_folio;
- void *private;
- struct readahead_control *rac;
-};
-
-static void iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
+static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
{
struct bio *bio = ctx->private;
if (bio)
submit_bio(bio);
+
+ return 0;
}
/**
@@ -383,7 +379,7 @@ static void iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
* This should only be used for read/readahead, not for buffered writes.
* Buffered writes must read in the folio synchronously.
*/
-static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
+static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
{
struct folio *folio = ctx->cur_folio;
@@ -422,8 +418,15 @@ static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
bio_add_folio_nofail(bio, folio, plen, poff);
ctx->private = bio;
}
+ return 0;
}
+const struct iomap_read_ops iomap_read_bios_ops = {
+ .read_folio_range = iomap_read_folio_range_bio_async,
+ .read_submit = iomap_submit_read_bio,
+};
+EXPORT_SYMBOL_GPL(iomap_read_bios_ops);
+
static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
{
@@ -459,7 +462,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
iomap_set_range_uptodate(folio, poff, plen);
} else {
*cur_folio_owned = true;
- iomap_read_folio_range_bio_async(iter, ctx, pos, plen);
+ ret = ctx->ops->read_folio_range(iter, ctx, pos,
+ plen);
+ if (ret)
+ return ret;
}
length -= count;
@@ -471,35 +477,35 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
return 0;
}
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
+int iomap_read_folio(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx)
{
+ struct folio *folio = ctx->cur_folio;
struct iomap_iter iter = {
.inode = folio->mapping->host,
.pos = folio_pos(folio),
.len = folio_size(folio),
};
- struct iomap_read_folio_ctx ctx = {
- .cur_folio = folio,
- };
/*
* If an external IO helper takes ownership of the folio,
* it is responsible for unlocking it when the read completes.
*/
bool cur_folio_owned = false;
- int ret;
+ int ret, submit_ret = 0;
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx,
+ iter.status = iomap_read_folio_iter(&iter, ctx,
&cur_folio_owned);
- iomap_submit_read_bio(&ctx);
+ if (ctx->ops->read_submit)
+ submit_ret = ctx->ops->read_submit(ctx);
if (!cur_folio_owned)
folio_unlock(folio);
- return ret;
+ return ret ? ret : submit_ret;
}
EXPORT_SYMBOL_GPL(iomap_read_folio);
@@ -530,8 +536,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
/**
* iomap_readahead - Attempt to read pages from a file.
- * @rac: Describes the pages to be read.
* @ops: The operations vector for the filesystem.
+ * @ctx: The ctx used for issuing readahead.
*
* This function is for filesystems to call to implement their readahead
* address_space operation.
@@ -543,16 +549,15 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
* function is called with memalloc_nofs set, so allocations will not cause
* the filesystem to be reentered.
*/
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx)
{
+ struct readahead_control *rac = ctx->rac;
struct iomap_iter iter = {
.inode = rac->mapping->host,
.pos = readahead_pos(rac),
.len = readahead_length(rac),
};
- struct iomap_read_folio_ctx ctx = {
- .rac = rac,
- };
/*
* If an external IO helper takes ownership of the folio,
* it is responsible for unlocking it when the read completes.
@@ -562,13 +567,14 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
while (iomap_iter(&iter, ops) > 0)
- iter.status = iomap_readahead_iter(&iter, &ctx,
+ iter.status = iomap_readahead_iter(&iter, ctx,
&cur_folio_owned);
- iomap_submit_read_bio(&ctx);
+ if (ctx->ops->read_submit)
+ ctx->ops->read_submit(ctx);
- if (ctx.cur_folio && !cur_folio_owned)
- folio_unlock(ctx.cur_folio);
+ if (ctx->cur_folio && !cur_folio_owned)
+ folio_unlock(ctx->cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1ee4f835ac3c..124f30e567f4 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -742,14 +742,24 @@ xfs_vm_read_folio(
struct file *unused,
struct folio *folio)
{
- return iomap_read_folio(folio, &xfs_read_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .cur_folio = folio,
+ };
+
+ return iomap_read_folio(&xfs_read_iomap_ops, &ctx);
}
STATIC void
xfs_vm_readahead(
struct readahead_control *rac)
{
- iomap_readahead(rac, &xfs_read_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .rac = rac,
+ };
+
+ iomap_readahead(&xfs_read_iomap_ops, &ctx);
}
static int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index fd3a5922f6c3..254562842347 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -112,12 +112,22 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
static int zonefs_read_folio(struct file *unused, struct folio *folio)
{
- return iomap_read_folio(folio, &zonefs_read_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .cur_folio = folio,
+ };
+
+ return iomap_read_folio(&zonefs_read_iomap_ops, &ctx);
}
static void zonefs_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &zonefs_read_iomap_ops);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_read_bios_ops,
+ .rac = rac,
+ };
+
+ iomap_readahead(&zonefs_read_iomap_ops, &ctx);
}
/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0938c4a57f4c..0c6424f70237 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -16,6 +16,7 @@ struct inode;
struct iomap_iter;
struct iomap_dio;
struct iomap_writepage_ctx;
+struct iomap_read_folio_ctx;
struct iov_iter;
struct kiocb;
struct page;
@@ -339,8 +340,10 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
-void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
+int iomap_read_folio(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx);
+void iomap_readahead(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
@@ -478,6 +481,41 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio);
int iomap_writepages(struct iomap_writepage_ctx *wpc);
+struct iomap_read_folio_ctx {
+ const struct iomap_read_ops *ops;
+ struct folio *cur_folio;
+ struct readahead_control *rac;
+ void *private;
+};
+
+struct iomap_read_ops {
+ /*
+ * Read in a folio range.
+ *
+ * The read can be done synchronously or asynchronously. The caller is
+ * responsible for calling iomap_start_folio_read() and
+ * iomap_finish_folio_read() before and after reading in the folio
+ * range. This should be done even if an error is encountered during the
+ * read.
+ *
+ * Returns 0 on success or a negative error on failure.
+ */
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, loff_t pos,
+ size_t len);
+
+ /*
+ * Submit any pending read requests.
+ *
+ * This is optional.
+ *
+ * Returns 0 on success or a negative error on failure.
+ */
+ int (*read_submit)(struct iomap_read_folio_ctx *ctx);
+};
+
+extern const struct iomap_read_ops iomap_read_bios_ops;
+
/*
* Flags for direct I/O ->end_io:
*/
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (10 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-11 11:31 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
` (3 subsequent siblings)
15 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Non-block-based filesystems will be using iomap read/readahead. If they
handle reading in ranges asynchronously and fulfill those read requests
on an ongoing basis (instead of all together at the end), then the read
on the folio may be ended prematurely if earlier async requests
complete before the later ones have been issued.
For example, take a large folio and a readahead request for 16 pages in
that folio, where the readahead is split into 4 async requests. If the
first request is sent off and completes before we have sent off the
second request, then when the first request calls
iomap_finish_folio_read(), ifs->read_bytes_pending will be 0, which
ends the read and unlocks the folio prematurely.
To mitigate this, a "bias" is added to ifs->read_bytes_pending before
the first range is forwarded to the caller and removed after the last
range has been forwarded.
iomap writeback does the same with its async requests to prevent
prematurely ending writeback.
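A worked example of the counter with the bias, using the 4-request
split above (illustrative sizes, 16k per request):

    read_bytes_pending = 0
    bias added before the first range is forwarded  -> 1
    request 1 sent (16k)                            -> 16385
    request 1 completes                             -> 1 (read not ended)
    requests 2-4 sent (48k)                         -> 49153
    bias removed after the last range is forwarded  -> 49152
    requests 2-4 complete                           -> 0, folio_end_read()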
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 43 ++++++++++++++++++++++++++++++++----------
1 file changed, 33 insertions(+), 10 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6fafe3b30563..f673e03f4ffb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -329,8 +329,8 @@ void iomap_start_folio_read(struct folio *folio, size_t len)
}
EXPORT_SYMBOL_GPL(iomap_start_folio_read);
-void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
- int error)
+static void __iomap_finish_folio_read(struct folio *folio, size_t off,
+ size_t len, int error, bool update_bitmap)
{
struct iomap_folio_state *ifs = folio->private;
bool uptodate = !error;
@@ -340,7 +340,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
unsigned long flags;
spin_lock_irqsave(&ifs->state_lock, flags);
- if (!error)
+ if (!error && update_bitmap)
uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
ifs->read_bytes_pending -= len;
finished = !ifs->read_bytes_pending;
@@ -350,6 +350,12 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
if (finished)
folio_end_read(folio, uptodate);
}
+
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error)
+{
+ return __iomap_finish_folio_read(folio, off, len, error, true);
+}
EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
#ifdef CONFIG_BLOCK
@@ -434,9 +440,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
loff_t pos = iter->pos;
loff_t length = iomap_length(iter);
struct folio *folio = ctx->cur_folio;
+ struct iomap_folio_state *ifs;
size_t poff, plen;
loff_t count;
- int ret;
+ int ret = 0;
if (iomap->type == IOMAP_INLINE) {
ret = iomap_read_inline_data(iter, folio);
@@ -446,7 +453,14 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
}
/* zero post-eof blocks as the page may be mapped */
- ifs_alloc(iter->inode, folio, iter->flags);
+ ifs = ifs_alloc(iter->inode, folio, iter->flags);
+
+ /*
+ * Add a bias to ifs->read_bytes_pending so that a read is ended only
+ * after all the ranges have been read in.
+ */
+ if (ifs)
+ iomap_start_folio_read(folio, 1);
length = min_t(loff_t, length,
folio_size(folio) - offset_in_folio(folio, pos));
@@ -454,8 +468,10 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
iomap_adjust_read_range(iter->inode, folio, &pos,
length, &poff, &plen);
count = pos - iter->pos + plen;
- if (plen == 0)
- return iomap_iter_advance(iter, &count);
+ if (plen == 0) {
+ ret = iomap_iter_advance(iter, &count);
+ break;
+ }
if (iomap_block_needs_zeroing(iter, pos)) {
folio_zero_range(folio, poff, plen);
@@ -465,16 +481,23 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
ret = ctx->ops->read_folio_range(iter, ctx, pos,
plen);
if (ret)
- return ret;
+ break;
}
length -= count;
ret = iomap_iter_advance(iter, &count);
if (ret)
- return ret;
+ break;
pos = iter->pos;
}
- return 0;
+
+ if (ifs) {
+ __iomap_finish_folio_read(folio, 0, 1, ret, false);
+ /* __iomap_finish_folio_read takes care of any unlocking */
+ *cur_folio_owned = true;
+ }
+
+ return ret;
}
int iomap_read_folio(const struct iomap_ops *ops,
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (11 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 12/16] iomap: add bias for async read requests Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-09 2:14 ` Gao Xiang
2025-09-11 11:44 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 14/16] fuse: use iomap for read_folio Joanne Koong
` (2 subsequent siblings)
15 siblings, 2 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
There is no longer a dependency on CONFIG_BLOCK in the iomap read and
readahead logic. Move this logic out of the CONFIG_BLOCK guard. This
allows non-block-based filesystems to use iomap for reads/readahead.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 151 +++++++++++++++++++++--------------------
1 file changed, 76 insertions(+), 75 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f673e03f4ffb..c424e8c157dd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -358,81 +358,6 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
}
EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
-#ifdef CONFIG_BLOCK
-static void iomap_read_end_io(struct bio *bio)
-{
- int error = blk_status_to_errno(bio->bi_status);
- struct folio_iter fi;
-
- bio_for_each_folio_all(fi, bio)
- iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
- bio_put(bio);
-}
-
-static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
-{
- struct bio *bio = ctx->private;
-
- if (bio)
- submit_bio(bio);
-
- return 0;
-}
-
-/**
- * Read in a folio range asynchronously through bios.
- *
- * This should only be used for read/readahead, not for buffered writes.
- * Buffered writes must read in the folio synchronously.
- */
-static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
-{
- struct folio *folio = ctx->cur_folio;
- const struct iomap *iomap = &iter->iomap;
- size_t poff = offset_in_folio(folio, pos);
- loff_t length = iomap_length(iter);
- sector_t sector;
- struct bio *bio = ctx->private;
-
- iomap_start_folio_read(folio, plen);
-
- sector = iomap_sector(iomap, pos);
- if (!bio || bio_end_sector(bio) != sector ||
- !bio_add_folio(bio, folio, plen, poff)) {
- gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
- gfp_t orig_gfp = gfp;
- unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
-
- iomap_submit_read_bio(ctx);
-
- if (ctx->rac) /* same as readahead_gfp_mask */
- gfp |= __GFP_NORETRY | __GFP_NOWARN;
- bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
- /*
- * If the bio_alloc fails, try it again for a single page to
- * avoid having to deal with partial page reads. This emulates
- * what do_mpage_read_folio does.
- */
- if (!bio)
- bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
- if (ctx->rac)
- bio->bi_opf |= REQ_RAHEAD;
- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(bio, folio, plen, poff);
- ctx->private = bio;
- }
- return 0;
-}
-
-const struct iomap_read_ops iomap_read_bios_ops = {
- .read_folio_range = iomap_read_folio_range_bio_async,
- .read_submit = iomap_submit_read_bio,
-};
-EXPORT_SYMBOL_GPL(iomap_read_bios_ops);
-
static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
{
@@ -601,6 +526,82 @@ void iomap_readahead(const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_readahead);
+#ifdef CONFIG_BLOCK
+static void iomap_read_end_io(struct bio *bio)
+{
+ int error = blk_status_to_errno(bio->bi_status);
+ struct folio_iter fi;
+
+ bio_for_each_folio_all(fi, bio)
+ iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+ bio_put(bio);
+}
+
+static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
+{
+ struct bio *bio = ctx->private;
+
+ if (bio)
+ submit_bio(bio);
+
+ return 0;
+}
+
+/**
+ * Read in a folio range asynchronously through bios.
+ *
+ * This should only be used for read/readahead, not for buffered writes.
+ * Buffered writes must read in the folio synchronously.
+ */
+static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
+{
+ struct folio *folio = ctx->cur_folio;
+ const struct iomap *iomap = &iter->iomap;
+ size_t poff = offset_in_folio(folio, pos);
+ loff_t length = iomap_length(iter);
+ sector_t sector;
+ struct bio *bio = ctx->private;
+
+ iomap_start_folio_read(folio, plen);
+
+ sector = iomap_sector(iomap, pos);
+ if (!bio || bio_end_sector(bio) != sector ||
+ !bio_add_folio(bio, folio, plen, poff)) {
+ gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
+ gfp_t orig_gfp = gfp;
+ unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
+
+ if (bio)
+ submit_bio(bio);
+
+ if (ctx->rac) /* same as readahead_gfp_mask */
+ gfp |= __GFP_NORETRY | __GFP_NOWARN;
+ bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+ REQ_OP_READ, gfp);
+ /*
+ * If the bio_alloc fails, try it again for a single page to
+ * avoid having to deal with partial page reads. This emulates
+ * what do_mpage_read_folio does.
+ */
+ if (!bio)
+ bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
+ if (ctx->rac)
+ bio->bi_opf |= REQ_RAHEAD;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_end_io = iomap_read_end_io;
+ bio_add_folio_nofail(bio, folio, plen, poff);
+ ctx->private = bio;
+ }
+ return 0;
+}
+
+const struct iomap_read_ops iomap_read_bios_ops = {
+ .read_folio_range = iomap_read_folio_range_bio_async,
+ .read_submit = iomap_submit_read_bio,
+};
+EXPORT_SYMBOL_GPL(iomap_read_bios_ops);
+
static int iomap_read_folio_range(const struct iomap_iter *iter,
struct folio *folio, loff_t pos, size_t len)
{
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 14/16] fuse: use iomap for read_folio
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (12 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 15/16] fuse: use iomap for readahead Joanne Koong
2025-09-08 18:51 ` [PATCH v2 16/16] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
15 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Read folio data into the page cache using iomap. This gives us granular
uptodate tracking for large folios, which optimizes how much data needs
to be read in. If some portions of the folio are already uptodate (e.g.
through a prior write), we only need to read in the non-uptodate
portions.
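To illustrate the savings, a minimal user-space sketch (invented names;
in the kernel the per-block uptodate state lives in
struct iomap_folio_state):

    #include <stdbool.h>
    #include <stdio.h>

    #define BLOCKS_PER_FOLIO 4

    int main(void)
    {
            /* blocks 0-1 were made uptodate by a prior partial write */
            bool uptodate[BLOCKS_PER_FOLIO] = { true, true, false, false };

            /* read_folio only needs to fetch the stale blocks */
            for (int i = 0; i < BLOCKS_PER_FOLIO; i++)
                    if (!uptodate[i])
                            printf("read block %d from the server\n", i);
            return 0;
    }

Previously the whole folio had to be read in whenever it was not fully
uptodate.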
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 79 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 56 insertions(+), 23 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4adcf09d4b01..5b75a461f8e1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -828,22 +828,69 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio,
return 0;
}
+static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ iomap->type = IOMAP_MAPPED;
+ iomap->length = length;
+ iomap->offset = offset;
+ return 0;
+}
+
+static const struct iomap_ops fuse_iomap_ops = {
+ .iomap_begin = fuse_iomap_begin,
+};
+
+struct fuse_fill_read_data {
+ struct file *file;
+};
+
+static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx,
+ loff_t pos, size_t len)
+{
+ struct fuse_fill_read_data *data = ctx->private;
+ struct folio *folio = ctx->cur_folio;
+ size_t off = offset_in_folio(folio, pos);
+ struct file *file = data->file;
+ int ret;
+
+ /*
+ * for non-readahead read requests, do reads synchronously since
+ * it's not guaranteed that the server can handle out-of-order reads
+ */
+ iomap_start_folio_read(folio, len);
+ ret = fuse_do_readfolio(file, folio, off, len);
+ iomap_finish_folio_read(folio, off, len, ret);
+ return ret;
+}
+
+static const struct iomap_read_ops fuse_iomap_read_ops = {
+ .read_folio_range = fuse_iomap_read_folio_range_async,
+};
+
static int fuse_read_folio(struct file *file, struct folio *folio)
{
struct inode *inode = folio->mapping->host;
- int err;
+ struct fuse_fill_read_data data = {
+ .file = file,
+ };
+ struct iomap_read_folio_ctx ctx = {
+ .cur_folio = folio,
+ .ops = &fuse_iomap_read_ops,
+ .private = &data,
- err = -EIO;
- if (fuse_is_bad(inode))
- goto out;
+ };
+ int err;
- err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
- if (!err)
- folio_mark_uptodate(folio);
+ if (fuse_is_bad(inode)) {
+ folio_unlock(folio);
+ return -EIO;
+ }
+ err = iomap_read_folio(&fuse_iomap_ops, &ctx);
fuse_invalidate_atime(inode);
- out:
- folio_unlock(folio);
return err;
}
@@ -1394,20 +1441,6 @@ static const struct iomap_write_ops fuse_iomap_write_ops = {
.read_folio_range = fuse_iomap_read_folio_range,
};
-static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
- unsigned int flags, struct iomap *iomap,
- struct iomap *srcmap)
-{
- iomap->type = IOMAP_MAPPED;
- iomap->length = length;
- iomap->offset = offset;
- return 0;
-}
-
-static const struct iomap_ops fuse_iomap_ops = {
- .iomap_begin = fuse_iomap_begin,
-};
-
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 15/16] fuse: use iomap for readahead
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (13 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 14/16] fuse: use iomap for read_folio Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 16/16] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
15 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Do readahead in fuse using iomap. This gives us granular uptodate
tracking for large folios, which optimizes how much data needs to be
read in. If some portions of the folio are already uptodate (e.g. through
a prior write), we only need to read in the non-uptodate portions.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 224 ++++++++++++++++++++++++++++---------------------
1 file changed, 128 insertions(+), 96 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 5b75a461f8e1..3f57b5c6e037 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -844,8 +844,68 @@ static const struct iomap_ops fuse_iomap_ops = {
struct fuse_fill_read_data {
struct file *file;
+
+ /*
+ * Fields below are used if sending the read request
+ * asynchronously.
+ */
+ struct fuse_conn *fc;
+ struct fuse_io_args *ia;
+ unsigned int nr_bytes;
};
+/* forward declarations */
+static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
+ unsigned cur_bytes, bool write);
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+ unsigned int count, bool async);
+
+static int fuse_handle_readahead(struct folio *folio,
+ struct readahead_control *rac,
+ struct fuse_fill_read_data *data, loff_t pos,
+ size_t len)
+{
+ struct fuse_io_args *ia = data->ia;
+ size_t off = offset_in_folio(folio, pos);
+ struct fuse_conn *fc = data->fc;
+ struct fuse_args_pages *ap;
+ unsigned int nr_pages;
+
+ if (ia && fuse_folios_need_send(fc, pos, len, &ia->ap, data->nr_bytes,
+ false)) {
+ fuse_send_readpages(ia, data->file, data->nr_bytes,
+ fc->async_read);
+ data->nr_bytes = 0;
+ data->ia = NULL;
+ ia = NULL;
+ }
+ if (!ia) {
+ if (fc->num_background >= fc->congestion_threshold &&
+ rac->ra->async_size >= readahead_count(rac))
+ /*
+ * Congested and only async pages left, so skip the
+ * rest.
+ */
+ return -EAGAIN;
+
+ nr_pages = min(fc->max_pages, readahead_count(rac));
+ data->ia = fuse_io_alloc(NULL, nr_pages);
+ if (!data->ia)
+ return -ENOMEM;
+ ia = data->ia;
+ }
+ folio_get(folio);
+ ap = &ia->ap;
+ ap->folios[ap->num_folios] = folio;
+ ap->descs[ap->num_folios].offset = off;
+ ap->descs[ap->num_folios].length = len;
+ data->nr_bytes += len;
+ ap->num_folios++;
+
+ return 0;
+}
+
static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx,
loff_t pos, size_t len)
@@ -856,18 +916,41 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
struct file *file = data->file;
int ret;
- /*
- * for non-readahead read requests, do reads synchronously since
- * it's not guaranteed that the server can handle out-of-order reads
- */
iomap_start_folio_read(folio, len);
- ret = fuse_do_readfolio(file, folio, off, len);
- iomap_finish_folio_read(folio, off, len, ret);
+ if (ctx->rac) {
+ ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
+ /*
+ * If fuse_handle_readahead was successful, fuse_readpages_end
+ * will do the iomap_finish_folio_read, else we need to call it
+ * here
+ */
+ if (ret)
+ iomap_finish_folio_read(folio, off, len, ret);
+ } else {
+ /*
+ * for non-readahead read requests, do reads synchronously
+ * since it's not guaranteed that the server can handle
+ * out-of-order reads
+ */
+ ret = fuse_do_readfolio(file, folio, off, len);
+ iomap_finish_folio_read(folio, off, len, ret);
+ }
return ret;
}
+static int fuse_iomap_read_submit(struct iomap_read_folio_ctx *ctx)
+{
+ struct fuse_fill_read_data *data = ctx->private;
+
+ if (data->ia)
+ fuse_send_readpages(data->ia, data->file, data->nr_bytes,
+ data->fc->async_read);
+ return 0;
+}
+
static const struct iomap_read_ops fuse_iomap_read_ops = {
.read_folio_range = fuse_iomap_read_folio_range_async,
+ .read_submit = fuse_iomap_read_submit,
};
static int fuse_read_folio(struct file *file, struct folio *folio)
@@ -930,7 +1013,8 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
}
for (i = 0; i < ap->num_folios; i++) {
- folio_end_read(ap->folios[i], !err);
+ iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
+ ap->descs[i].length, err);
folio_put(ap->folios[i]);
}
if (ia->ff)
@@ -940,7 +1024,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
}
static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
- unsigned int count)
+ unsigned int count, bool async)
{
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
@@ -962,7 +1046,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
fuse_read_args_fill(ia, file, pos, count, FUSE_READ);
ia->read.attr_ver = fuse_get_attr_version(fm->fc);
- if (fm->fc->async_read) {
+ if (async) {
ia->ff = fuse_file_get(ff);
ap->args.end = fuse_readpages_end;
err = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
@@ -979,81 +1063,20 @@ static void fuse_readahead(struct readahead_control *rac)
{
struct inode *inode = rac->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- unsigned int max_pages, nr_pages;
- struct folio *folio = NULL;
+ struct fuse_fill_read_data data = {
+ .file = rac->file,
+ .fc = fc,
+ };
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &fuse_iomap_read_ops,
+ .rac = rac,
+ .private = &data
+ };
if (fuse_is_bad(inode))
return;
- max_pages = min_t(unsigned int, fc->max_pages,
- fc->max_read / PAGE_SIZE);
-
- /*
- * This is only accurate the first time through, since readahead_folio()
- * doesn't update readahead_count() from the previous folio until the
- * next call. Grab nr_pages here so we know how many pages we're going
- * to have to process. This means that we will exit here with
- * readahead_count() == folio_nr_pages(last_folio), but we will have
- * consumed all of the folios, and read_pages() will call
- * readahead_folio() again which will clean up the rac.
- */
- nr_pages = readahead_count(rac);
-
- while (nr_pages) {
- struct fuse_io_args *ia;
- struct fuse_args_pages *ap;
- unsigned cur_pages = min(max_pages, nr_pages);
- unsigned int pages = 0;
-
- if (fc->num_background >= fc->congestion_threshold &&
- rac->ra->async_size >= readahead_count(rac))
- /*
- * Congested and only async pages left, so skip the
- * rest.
- */
- break;
-
- ia = fuse_io_alloc(NULL, cur_pages);
- if (!ia)
- break;
- ap = &ia->ap;
-
- while (pages < cur_pages) {
- unsigned int folio_pages;
-
- /*
- * This returns a folio with a ref held on it.
- * The ref needs to be held until the request is
- * completed, since the splice case (see
- * fuse_try_move_page()) drops the ref after it's
- * replaced in the page cache.
- */
- if (!folio)
- folio = __readahead_folio(rac);
-
- folio_pages = folio_nr_pages(folio);
- if (folio_pages > cur_pages - pages) {
- /*
- * Large folios belonging to fuse will never
- * have more pages than max_pages.
- */
- WARN_ON(!pages);
- break;
- }
-
- ap->folios[ap->num_folios] = folio;
- ap->descs[ap->num_folios].length = folio_size(folio);
- ap->num_folios++;
- pages += folio_pages;
- folio = NULL;
- }
- fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
- nr_pages -= pages;
- }
- if (folio) {
- folio_end_read(folio, false);
- folio_put(folio);
- }
+ iomap_readahead(&fuse_iomap_ops, &ctx);
}
static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
@@ -2084,7 +2107,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
unsigned int max_folios;
/*
- * nr_bytes won't overflow since fuse_writepage_need_send() caps
+ * nr_bytes won't overflow since fuse_folios_need_send() caps
* wb requests to never exceed fc->max_pages (which has an upper bound
* of U16_MAX).
*/
@@ -2129,14 +2152,15 @@ static void fuse_writepages_send(struct inode *inode,
spin_unlock(&fi->lock);
}
-static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
- unsigned len, struct fuse_args_pages *ap,
- struct fuse_fill_wb_data *data)
+static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
+ unsigned cur_bytes, bool write)
{
struct folio *prev_folio;
struct fuse_folio_desc prev_desc;
- unsigned bytes = data->nr_bytes + len;
+ unsigned bytes = cur_bytes + len;
loff_t prev_pos;
+ size_t max_bytes = write ? fc->max_write : fc->max_read;
WARN_ON(!ap->num_folios);
@@ -2144,8 +2168,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
if ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT > fc->max_pages)
return true;
- /* Reached max write bytes */
- if (bytes > fc->max_write)
+ if (bytes > max_bytes)
return true;
/* Discontinuity */
@@ -2155,11 +2178,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
if (prev_pos != pos)
return true;
- /* Need to grow the pages array? If so, did the expansion fail? */
- if (ap->num_folios == data->max_folios &&
- !fuse_pages_realloc(data, fc->max_pages))
- return true;
-
return false;
}
@@ -2183,10 +2201,24 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
return -EIO;
}
- if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
- fuse_writepages_send(inode, data);
- data->wpa = NULL;
- data->nr_bytes = 0;
+ if (wpa) {
+ bool send = fuse_folios_need_send(fc, pos, len, ap,
+ data->nr_bytes, true);
+
+ if (!send) {
+ /*
+ * Need to grow the pages array? If so, did the
+ * expansion fail?
+ */
+ send = (ap->num_folios == data->max_folios) &&
+ !fuse_pages_realloc(data, fc->max_pages);
+ }
+
+ if (send) {
+ fuse_writepages_send(inode, data);
+ data->wpa = NULL;
+ data->nr_bytes = 0;
+ }
}
if (data->wpa == NULL) {
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH v2 16/16] fuse: remove fc->blkbits workaround for partial writes
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (14 preceding siblings ...)
2025-09-08 18:51 ` [PATCH v2 15/16] fuse: use iomap for readahead Joanne Koong
@ 2025-09-08 18:51 ` Joanne Koong
15 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-08 18:51 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Now that fuse is integrated with iomap for read/readahead, we can
remove the workaround added in commit bd24d2108e9c ("fuse: fix fuseblk
i_blkbits for iomap partial writes"), which was needed to avoid a race
condition where an iomap partial write could be overwritten by a read
if blocksize < PAGE_SIZE. With granular uptodate tracking of blocks,
that race can no longer occur.
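Concretely, the race being guarded against went roughly like this
(assuming blocksize < PAGE_SIZE and the writeback cache enabled):

  1. A buffered write fills the first block of a folio; the folio is
     dirty but, without sub-folio uptodate tracking, not marked
     uptodate.
  2. A read arrives before writeback and, seeing a non-uptodate folio,
     reads the entire folio in from the server.
  3. The stale server data lands on top of the bytes written in step 1.

With iomap read/readahead, step 2 only reads the blocks that are not
uptodate, so the partial write survives.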
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/dir.c | 2 +-
fs/fuse/fuse_i.h | 8 --------
fs/fuse/inode.c | 13 +------------
3 files changed, 2 insertions(+), 21 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5c569c3cb53f..ebee7e0b1cd3 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1199,7 +1199,7 @@ static void fuse_fillattr(struct mnt_idmap *idmap, struct inode *inode,
if (attr->blksize != 0)
blkbits = ilog2(attr->blksize);
else
- blkbits = fc->blkbits;
+ blkbits = inode->i_sb->s_blocksize_bits;
stat->blksize = 1 << blkbits;
}
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index cc428d04be3e..1647eb7ca6fa 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -975,14 +975,6 @@ struct fuse_conn {
/* Request timeout (in jiffies). 0 = no timeout */
unsigned int req_timeout;
} timeout;
-
- /*
- * This is a workaround until fuse uses iomap for reads.
- * For fuseblk servers, this represents the blocksize passed in at
- * mount time and for regular fuse servers, this is equivalent to
- * inode->i_blkbits.
- */
- u8 blkbits;
};
/*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 7ddfd2b3cc9c..3bfd83469d9f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -292,7 +292,7 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
if (attr->blksize)
fi->cached_i_blkbits = ilog2(attr->blksize);
else
- fi->cached_i_blkbits = fc->blkbits;
+ fi->cached_i_blkbits = inode->i_sb->s_blocksize_bits;
/*
* Don't set the sticky bit in i_mode, unless we want the VFS
@@ -1810,21 +1810,10 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
err = -EINVAL;
if (!sb_set_blocksize(sb, ctx->blksize))
goto err;
- /*
- * This is a workaround until fuse hooks into iomap for reads.
- * Use PAGE_SIZE for the blocksize else if the writeback cache
- * is enabled, buffered writes go through iomap and a read may
- * overwrite partially written data if blocksize < PAGE_SIZE
- */
- fc->blkbits = sb->s_blocksize_bits;
- if (ctx->blksize != PAGE_SIZE &&
- !sb_set_blocksize(sb, PAGE_SIZE))
- goto err;
#endif
} else {
sb->s_blocksize = PAGE_SIZE;
sb->s_blocksize_bits = PAGE_SHIFT;
- fc->blkbits = sb->s_blocksize_bits;
}
sb->s_subtype = ctx->subtype;
--
2.47.3
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-08 18:51 ` [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead Joanne Koong
@ 2025-09-09 0:14 ` Gao Xiang
2025-09-09 0:40 ` Gao Xiang
2025-09-09 15:24 ` Joanne Koong
2025-09-11 11:26 ` Christoph Hellwig
1 sibling, 2 replies; 60+ messages in thread
From: Gao Xiang @ 2025-09-09 0:14 UTC (permalink / raw)
To: Joanne Koong, djwong, hch
Cc: brauner, miklos, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Hi Joanne,
On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
> Add caller-provided callbacks for read and readahead so that it can be
> used generically, especially by filesystems that are not block-based.
>
> In particular, this:
> * Modifies the read and readahead interface to take in a
> struct iomap_read_folio_ctx that is publicly defined as:
>
> struct iomap_read_folio_ctx {
> const struct iomap_read_ops *ops;
> struct folio *cur_folio;
> struct readahead_control *rac;
> void *private;
> };
>
> where struct iomap_read_ops is defined as:
>
> struct iomap_read_ops {
> int (*read_folio_range)(const struct iomap_iter *iter,
> struct iomap_read_folio_ctx *ctx,
> loff_t pos, size_t len);
> int (*read_submit)(struct iomap_read_folio_ctx *ctx);
> };
>
No, I don't think adding another `.private` to
`struct iomap_read_folio_ctx` makes any sense, because:
- `struct iomap_iter *iter` already has `.private`, and I think
it's mainly used per request; your new
`.read_folio_range` already passes
`const struct iomap_iter *iter`, which has `.private`.
I don't think a read-specific `.private` is useful in any
case; see also below.
- `struct iomap_read_folio_ctx` cannot be accessed by previous
.iomap_{begin,end} helpers, which means `struct iomap_read_ops`
is only useful for FUSE read iter/submit logic.
Also, after my change, the prototypes will be:
int iomap_read_folio(const struct iomap_ops *ops,
struct iomap_read_folio_ctx *ctx, void *private2);
void iomap_readahead(const struct iomap_ops *ops,
struct iomap_read_folio_ctx *ctx, void *private2);
That is pretty weird, because `.iomap_{begin,end}` in principle can
only use `struct iomap_iter *` but has no way to access
`struct iomap_read_folio_ctx` to get enough context for
read requests.
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-09 0:14 ` Gao Xiang
@ 2025-09-09 0:40 ` Gao Xiang
2025-09-09 15:24 ` Joanne Koong
1 sibling, 0 replies; 60+ messages in thread
From: Gao Xiang @ 2025-09-09 0:40 UTC (permalink / raw)
To: Joanne Koong, djwong, hch
Cc: brauner, miklos, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 09, 2025 at 08:14:49AM +0800, Gao Xiang wrote:
> Hi Joanne,
>
> On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
> > Add caller-provided callbacks for read and readahead so that it can be
> > used generically, especially by filesystems that are not block-based.
> >
> > In particular, this:
> > * Modifies the read and readahead interface to take in a
> > struct iomap_read_folio_ctx that is publicly defined as:
> >
> > struct iomap_read_folio_ctx {
> > const struct iomap_read_ops *ops;
> > struct folio *cur_folio;
> > struct readahead_control *rac;
> > void *private;
> > };
> >
> > where struct iomap_read_ops is defined as:
> >
> > struct iomap_read_ops {
> > int (*read_folio_range)(const struct iomap_iter *iter,
> > struct iomap_read_folio_ctx *ctx,
> > loff_t pos, size_t len);
> > int (*read_submit)(struct iomap_read_folio_ctx *ctx);
> > };
> >
>
> No, I don't think adding another `.private` to
> `struct iomap_read_folio_ctx` makes any sense, because:
>
> - `struct iomap_iter *iter` already has `.private`, and I think
> it's mainly used per request; your new
> `.read_folio_range` already passes
> `const struct iomap_iter *iter`, which has `.private`.
> I don't think a read-specific `.private` is useful in any
> case; see also below.
>
> - `struct iomap_read_folio_ctx` cannot be accessed by previous
> .iomap_{begin,end} helpers, which means `struct iomap_read_ops`
> is only useful for FUSE read iter/submit logic.
>
> Also, after my change, the prototypes will be:
>
> int iomap_read_folio(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx, void *private2);
> void iomap_readahead(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx, void *private2);
>
btw, if the iomap folks really think it looks clean to pass in two
different `private` pointers like this, I'm fine. Basically:
I need a way to create an on-stack context in `erofs_read_folio()`
and `erofs_readahead()` and pass it down to .iomap_{begin,end},
because the current `.iomap_begin` and `.iomap_end` have no way to
get at the new on-stack context: they can only see the inode, pos,
len, etc.
As Darrick mentioned, the `iter = container_of(iomap)` usage in
`xfs_zoned_buffered_write_iomap_begin()` and
`xfs_buffered_write_delalloc_punch()` makes me uneasy as well,
because it couples `struct iomap *` and `struct iomap_iter *` with
iomap implementation internals: `struct iomap_iter` contains two
`struct iomap`s, and without further details it's hard to assume the
one passed in is `iter->iomap`.
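For reference, the pattern in question looks roughly like this (a
sketch of the usage being criticized, not the exact XFS code):

    struct iomap_iter *iter =
            container_of(iomap, struct iomap_iter, iomap);

i.e. the callback receives only a `struct iomap *` and reaches back to
the enclosing iterator, which is only safe if the pointer passed in
really is `iter->iomap` and not the other embedded `struct iomap`
(`iter->srcmap`).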
> That is pretty weird, because `.iomap_{begin,end}` in principle can
> only use `struct iomap_iter *` but has no way to access
> `struct iomap_read_folio_ctx` to get enough context for
> read requests.
>
> Thanks,
> Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-08 18:51 ` [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
@ 2025-09-09 2:14 ` Gao Xiang
2025-09-09 15:33 ` Joanne Koong
2025-09-11 11:44 ` Christoph Hellwig
1 sibling, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-09 2:14 UTC (permalink / raw)
To: Joanne Koong, brauner, miklos
Cc: hch, djwong, linux-block, gfs2, linux-fsdevel, kernel-team,
linux-xfs, linux-doc
On 2025/9/9 02:51, Joanne Koong wrote:
> There is no longer a dependency on CONFIG_BLOCK in the iomap read and
> readahead logic. Move this logic out of the CONFIG_BLOCK guard. This
> allows non-block-based filesystems to use iomap for reads/readahead.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 151 +++++++++++++++++++++--------------------
> 1 file changed, 76 insertions(+), 75 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f673e03f4ffb..c424e8c157dd 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -358,81 +358,6 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> }
> EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
>
> -#ifdef CONFIG_BLOCK
> -static void iomap_read_end_io(struct bio *bio)
> -{
> - int error = blk_status_to_errno(bio->bi_status);
> - struct folio_iter fi;
> -
> - bio_for_each_folio_all(fi, bio)
> - iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> - bio_put(bio);
> -}
> -
> -static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
> -{
> - struct bio *bio = ctx->private;
> -
> - if (bio)
> - submit_bio(bio);
> -
> - return 0;
> -}
> -
> -/**
> - * Read in a folio range asynchronously through bios.
> - *
> - * This should only be used for read/readahead, not for buffered writes.
> - * Buffered writes must read in the folio synchronously.
> - */
> -static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
> - struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
> -{
> - struct folio *folio = ctx->cur_folio;
> - const struct iomap *iomap = &iter->iomap;
> - size_t poff = offset_in_folio(folio, pos);
> - loff_t length = iomap_length(iter);
> - sector_t sector;
> - struct bio *bio = ctx->private;
> -
> - iomap_start_folio_read(folio, plen);
> -
> - sector = iomap_sector(iomap, pos);
> - if (!bio || bio_end_sector(bio) != sector ||
> - !bio_add_folio(bio, folio, plen, poff)) {
> - gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> - gfp_t orig_gfp = gfp;
> - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> -
> - iomap_submit_read_bio(ctx);
> -
> - if (ctx->rac) /* same as readahead_gfp_mask */
> - gfp |= __GFP_NORETRY | __GFP_NOWARN;
> - bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> - REQ_OP_READ, gfp);
> - /*
> - * If the bio_alloc fails, try it again for a single page to
> - * avoid having to deal with partial page reads. This emulates
> - * what do_mpage_read_folio does.
> - */
> - if (!bio)
> - bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> - if (ctx->rac)
> - bio->bi_opf |= REQ_RAHEAD;
> - bio->bi_iter.bi_sector = sector;
> - bio->bi_end_io = iomap_read_end_io;
> - bio_add_folio_nofail(bio, folio, plen, poff);
> - ctx->private = bio;
> - }
> - return 0;
> -}
> -
> -const struct iomap_read_ops iomap_read_bios_ops = {
> - .read_folio_range = iomap_read_folio_range_bio_async,
> - .read_submit = iomap_submit_read_bio,
> -};
> -EXPORT_SYMBOL_GPL(iomap_read_bios_ops);
> -
> static int iomap_read_folio_iter(struct iomap_iter *iter,
> struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
> {
> @@ -601,6 +526,82 @@ void iomap_readahead(const struct iomap_ops *ops,
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> +#ifdef CONFIG_BLOCK
> +static void iomap_read_end_io(struct bio *bio)
> +{
> + int error = blk_status_to_errno(bio->bi_status);
> + struct folio_iter fi;
> +
> + bio_for_each_folio_all(fi, bio)
> + iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> + bio_put(bio);
> +}
> +
> +static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
> +{
> + struct bio *bio = ctx->private;
> +
> + if (bio)
> + submit_bio(bio);
> +
> + return 0;
> +}
> +
> +/**
> + * Read in a folio range asynchronously through bios.
> + *
> + * This should only be used for read/readahead, not for buffered writes.
> + * Buffered writes must read in the folio synchronously.
> + */
> +static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
> + struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
> +{
> + struct folio *folio = ctx->cur_folio;
> + const struct iomap *iomap = &iter->iomap;
> + size_t poff = offset_in_folio(folio, pos);
> + loff_t length = iomap_length(iter);
> + sector_t sector;
> + struct bio *bio = ctx->private;
> +
> + iomap_start_folio_read(folio, plen);
> +
> + sector = iomap_sector(iomap, pos);
> + if (!bio || bio_end_sector(bio) != sector ||
> + !bio_add_folio(bio, folio, plen, poff)) {
> + gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> + gfp_t orig_gfp = gfp;
> + unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> +
> + if (bio)
> + submit_bio(bio);
> +
> + if (ctx->rac) /* same as readahead_gfp_mask */
> + gfp |= __GFP_NORETRY | __GFP_NOWARN;
> + bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> + REQ_OP_READ, gfp);
> + /*
> + * If the bio_alloc fails, try it again for a single page to
> + * avoid having to deal with partial page reads. This emulates
> + * what do_mpage_read_folio does.
> + */
> + if (!bio)
> + bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> + if (ctx->rac)
> + bio->bi_opf |= REQ_RAHEAD;
> + bio->bi_iter.bi_sector = sector;
> + bio->bi_end_io = iomap_read_end_io;
> + bio_add_folio_nofail(bio, folio, plen, poff);
> + ctx->private = bio;
Yes, I understand some way is needed to isolate bio from non-bio
based filesystems, and I also agree `bio` shouldn't be stashed
into `iter->private`, since that would just be an abuse, as mentioned
in:
https://lore.kernel.org/r/20250903203031.GM1587915@frogsfrogsfrogs
https://lore.kernel.org/r/aLkskcgl3Z91oIVB@infradead.org
However, the naming of `(struct iomap_read_folio_ctx)->private`
really confuses me, because a `private` field in
`read_folio_ctx` reads much like a filesystem read context rather
than something that is just used internally by iomap as the `bio` for
block-based filesystems.
Also, the existence of `iter->private` makes the naming of
`ctx->private` even more confusing, at least in my view.
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-09 0:14 ` Gao Xiang
2025-09-09 0:40 ` Gao Xiang
@ 2025-09-09 15:24 ` Joanne Koong
2025-09-09 23:21 ` Gao Xiang
1 sibling, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-09 15:24 UTC (permalink / raw)
To: Joanne Koong, djwong, hch, brauner, miklos, hsiangkao,
linux-block, gfs2, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Mon, Sep 8, 2025 at 8:14 PM Gao Xiang <xiang@kernel.org> wrote:
>
> Hi Joanne,
>
> On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
> > Add caller-provided callbacks for read and readahead so that it can be
> > used generically, especially by filesystems that are not block-based.
> >
> > In particular, this:
> > * Modifies the read and readahead interface to take in a
> > struct iomap_read_folio_ctx that is publicly defined as:
> >
> > struct iomap_read_folio_ctx {
> > const struct iomap_read_ops *ops;
> > struct folio *cur_folio;
> > struct readahead_control *rac;
> > void *private;
> > };
> >
> > where struct iomap_read_ops is defined as:
> >
> > struct iomap_read_ops {
> > int (*read_folio_range)(const struct iomap_iter *iter,
> > struct iomap_read_folio_ctx *ctx,
> > loff_t pos, size_t len);
> > int (*read_submit)(struct iomap_read_folio_ctx *ctx);
> > };
> >
>
> No, I don't think adding another `.private` to
> `struct iomap_read_folio_ctx` makes any sense, because:
>
> - `struct iomap_iter *iter` already has `.private`, and I think
> it's mainly used per request; your new
> `.read_folio_range` already passes
> `const struct iomap_iter *iter`, which has `.private`.
> I don't think a read-specific `.private` is useful in any
> case; see also below.
>
> - `struct iomap_read_folio_ctx` cannot be accessed by previous
> .iomap_{begin,end} helpers, which means `struct iomap_read_ops`
> is only useful for FUSE read iter/submit logic.
>
> Also, after my change, the prototypes will be:
>
> int iomap_read_folio(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx, void *private2);
> void iomap_readahead(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx, void *private2);
>
> That is pretty weird, because `.iomap_{begin,end}` in principle can
> only use `struct iomap_iter *` but has no way to access
> `struct iomap_read_folio_ctx` to get enough context for
> read requests.
Hi Gao,
If I'm understanding what you're proposing correctly, imo it doesn't
make sense to have one shared data pointer between iomap
read/readahead and the iomap_{begin,end} helpers, because
a) I don't think it's guaranteed that the data needed by
read/readahead and iomap_{begin,end} is the same. I guess we could
combine the data each needs altogether into one struct, but it seems
simpler and cleaner to me to just have the two be separate.
b) I'm not sure about the erofs use case, but at least for what I'm
seeing for fuse and the block-based filesystems currently using iomap,
the data needed by iomap read/readahead (e.g. bios, fuse's
fuse_fill_read_data) is irrelevant to iomap_{begin,end} and it seems
unclean to expose that extraneous info. (btw I don't think it's true
that iomap_iter is mainly used per request - for readahead, for
example, iomap_{begin,end} is called before and after we service the
entire readahead, not per request, whereas .read_folio_range() is
called per request).
c) imo iomap_{begin,end} is meant to be a more generic interface and I
don't think it makes sense to tie read-specific data to it. For
example, some filesystems (e.g. gfs2) use the same iomap_ops across
different file operations (e.g. buffered writes, direct I/O, reads,
bmap, etc).
Thanks,
Joanne
>
> Thanks,
> Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-09 2:14 ` Gao Xiang
@ 2025-09-09 15:33 ` Joanne Koong
2025-09-10 4:59 ` Gao Xiang
0 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-09 15:33 UTC (permalink / raw)
To: Gao Xiang
Cc: brauner, miklos, hch, djwong, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Mon, Sep 8, 2025 at 10:14 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2025/9/9 02:51, Joanne Koong wrote:
> > There is no longer a dependency on CONFIG_BLOCK in the iomap read and
> > readahead logic. Move this logic out of the CONFIG_BLOCK guard. This
> > allows non-block-based filesystems to use iomap for reads/readahead.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > fs/iomap/buffered-io.c | 151 +++++++++++++++++++++--------------------
> > 1 file changed, 76 insertions(+), 75 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index f673e03f4ffb..c424e8c157dd 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -358,81 +358,6 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> > }
> > +
> > +/**
> > + * Read in a folio range asynchronously through bios.
> > + *
> > + * This should only be used for read/readahead, not for buffered writes.
> > + * Buffered writes must read in the folio synchronously.
> > + */
> > +static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
> > + struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
> > +{
> > + struct folio *folio = ctx->cur_folio;
> > + const struct iomap *iomap = &iter->iomap;
> > + size_t poff = offset_in_folio(folio, pos);
> > + loff_t length = iomap_length(iter);
> > + sector_t sector;
> > + struct bio *bio = ctx->private;
> > +
> > + iomap_start_folio_read(folio, plen);
> > +
> > + sector = iomap_sector(iomap, pos);
> > + if (!bio || bio_end_sector(bio) != sector ||
> > + !bio_add_folio(bio, folio, plen, poff)) {
> > + gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> > + gfp_t orig_gfp = gfp;
> > + unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> > +
> > + if (bio)
> > + submit_bio(bio);
> > +
> > + if (ctx->rac) /* same as readahead_gfp_mask */
> > + gfp |= __GFP_NORETRY | __GFP_NOWARN;
> > + bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> > + REQ_OP_READ, gfp);
> > + /*
> > + * If the bio_alloc fails, try it again for a single page to
> > + * avoid having to deal with partial page reads. This emulates
> > + * what do_mpage_read_folio does.
> > + */
> > + if (!bio)
> > + bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> > + if (ctx->rac)
> > + bio->bi_opf |= REQ_RAHEAD;
> > + bio->bi_iter.bi_sector = sector;
> > + bio->bi_end_io = iomap_read_end_io;
> > + bio_add_folio_nofail(bio, folio, plen, poff);
> > + ctx->private = bio;
>
> Yes, I understand some way is needed to isolate bio from non-bio
> based filesystems, and I also agree `bio` shouldn't be stashed
> into `iter->private`, since that would just be an abuse, as mentioned
> in:
> https://lore.kernel.org/r/20250903203031.GM1587915@frogsfrogsfrogs
> https://lore.kernel.org/r/aLkskcgl3Z91oIVB@infradead.org
>
> However, the naming of `(struct iomap_read_folio_ctx)->private`
> really confuses me, because a `private` field in
> `read_folio_ctx` reads much like a filesystem read context rather
> than something that is just used internally by iomap as the `bio` for
> block-based filesystems.
>
> Also, the existence of `iter->private` makes the naming of
> `ctx->private` even more confusing, at least in my view.
Do you think "ctx->data" would be better? Or is there something else
you had in mind?
Thanks,
Joanne
>
> Thanks,
> Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-09 15:24 ` Joanne Koong
@ 2025-09-09 23:21 ` Gao Xiang
2025-09-10 17:41 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-09 23:21 UTC (permalink / raw)
To: Joanne Koong, djwong, hch, brauner, miklos, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
Hi Joanne,
On 2025/9/9 23:24, Joanne Koong wrote:
> On Mon, Sep 8, 2025 at 8:14 PM Gao Xiang <xiang@kernel.org> wrote:
>>
>> Hi Joanne,
>>
>> On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
>>> Add caller-provided callbacks for read and readahead so that it can be
>>> used generically, especially by filesystems that are not block-based.
>>>
>>> In particular, this:
>>> * Modifies the read and readahead interface to take in a
>>> struct iomap_read_folio_ctx that is publicly defined as:
>>>
>>> struct iomap_read_folio_ctx {
>>> const struct iomap_read_ops *ops;
>>> struct folio *cur_folio;
>>> struct readahead_control *rac;
>>> void *private;
>>> };
>>>
>>> where struct iomap_read_ops is defined as:
>>>
>>> struct iomap_read_ops {
>>> int (*read_folio_range)(const struct iomap_iter *iter,
>>> struct iomap_read_folio_ctx *ctx,
>>> loff_t pos, size_t len);
>>> int (*read_submit)(struct iomap_read_folio_ctx *ctx);
>>> };
>>>
>>
>> No, I don't think adding another `.private` to
>> `struct iomap_read_folio_ctx` makes any sense, because:
>>
>> - `struct iomap_iter *iter` already has `.private`, and I think
>> it's mainly used per request; your new
>> `.read_folio_range` already passes
>> `const struct iomap_iter *iter`, which has `.private`.
>> I don't think a read-specific `.private` is useful in any
>> case; see also below.
>>
>> - `struct iomap_read_folio_ctx` cannot be accessed by previous
>> .iomap_{begin,end} helpers, which means `struct iomap_read_ops`
>> is only useful for FUSE read iter/submit logic.
>>
>> Also, after my change, the prototypes will be:
>>
>> int iomap_read_folio(const struct iomap_ops *ops,
>> struct iomap_read_folio_ctx *ctx, void *private2);
>> void iomap_readahead(const struct iomap_ops *ops,
>> struct iomap_read_folio_ctx *ctx, void *private2);
>>
>> That is pretty weird, because `.iomap_{begin,end}` in principle can
>> only use `struct iomap_iter *` but has no way to access
>> `struct iomap_read_folio_ctx` to get enough context for
>> read requests.
>
> Hi Gao,
>
> If I'm understanding what you're proposing correctly, imo it doesn't
> make sense to have one shared data pointer between iomap
> read/readahead and the iomap_{begin,end} helpers, because
My main concern is the two `private` names here: I would like to add
a `private` to iomap_read/readahead(), much like __iomap_dio_rw(), at
least to make our new feature work efficiently.
>
> a) I don't think it's guaranteed that the data needed by
> read/readahead and iomap_{begin,end} is the same. I guess we could
> combine the data each needs altogether into one struct, but it seems
> simpler and cleaner to me to just have the two be separate.
>
> b) I'm not sure about the erofs use case, but at least for what I'm
> seeing for fuse and the block-based filesystems currently using iomap,
> the data needed by iomap read/readahead (e.g. bios, fuse's
> fuse_fill_read_data) is irrelevant to iomap_{begin,end} and it seems
> unclean to expose that extraneous info. (btw I don't think it's true
> that iomap_iter is mainly used per request - for readahead, for
> example, iomap_{begin,end} is called before and after we service the
> entire readahead, not per request, whereas .read_folio_range() is
> called per request).
By `per-request` I meant a single sync read or readahead request,
e.g. one triggered by the VFS or mm.
>
> c) imo iomap_{begin,end} is meant to be a more generic interface and I
> don't think it makes sense to tie read-specific data to it. For
> example, some filesystems (e.g. gfs2) use the same iomap_ops across
> different file operations (e.g. buffered writes, direct I/O, reads,
> bmap, etc).
Previously, `.iomap_{begin,end}` participated in the buffered read and
write I/O paths (except for page writeback, of course); as you said, in
principle users only need to care about fields in `struct iomap_iter`.
`struct iomap_readpage_ctx` is currently used as an internal structure
which is completely invisible to filesystems (IOWs, filesystems don't
need to care about or specify any of it).
After your proposal, the renamed `struct iomap_read_folio_ctx` will be
exposed to individual filesystems too, but that makes two external
context structures for buffered reads (`struct iomap_iter` and
`struct iomap_read_folio_ctx`) instead of one.
I'm not saying your proposal doesn't work, but:
- this is unlike `struct iomap_writepage_ctx`, because the writeback
path doesn't involve `struct iomap_iter`; it has exactly that one
`struct iomap_writepage_ctx` context, and all callbacks use only
that;
- take a look at `iomap_dio_rw` and `iomap_dio_ops`; I think it's
somewhat similar to the new `struct iomap_read_ops` to some extent,
but dio currently also exposes exactly one context
(`struct iomap_iter`) to users.
- take a look at `iomap_write_ops`; it also exposes
`struct iomap_iter` only. You may say `folio`, `pos`, `len` could be
wrapped in another `struct iomap_write_ctx` if needed, but that is
not designed to be exposed or specified by write_iter (e.g.
fuse_cache_write_iter).
In short, traditionally the buffered read/write external context is
the single `struct iomap_iter` (`struct iomap_readpage_ctx` is for
iomap-internal use only). After your proposal there will be two
external contexts specified by users (for .read_folio and .readahead),
but `.iomap_{begin,end}` is unable to get at one of them, unlike the
current writeback and direct I/O paths (which each use one external
context too).
Separating this into two contexts works for your use case, but it may
cause issues, since future developers will have to decide where to
place new context fields for the buffered I/O paths
(`struct iomap_iter` or `struct iomap_read_folio_ctx`); it's still
possible, but may cause further churn from a codebase perspective.
That is my minor concern, but my main concern is still the `private`
naming.
Thanks,
Gao Xiang
>
>
> Thanks,
> Joanne
>
>>
>> Thanks,
>> Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-09 15:33 ` Joanne Koong
@ 2025-09-10 4:59 ` Gao Xiang
2025-09-11 11:37 ` Christoph Hellwig
0 siblings, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-10 4:59 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On 2025/9/9 23:33, Joanne Koong wrote:
> On Mon, Sep 8, 2025 at 10:14 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> On 2025/9/9 02:51, Joanne Koong wrote:
>>> There is no longer a dependency on CONFIG_BLOCK in the iomap read and
>>> readahead logic. Move this logic out of the CONFIG_BLOCK guard. This
>>> allows non-block-based filesystems to use iomap for reads/readahead.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> ---
>>> fs/iomap/buffered-io.c | 151 +++++++++++++++++++++--------------------
>>> 1 file changed, 76 insertions(+), 75 deletions(-)
>>>
>>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>>> index f673e03f4ffb..c424e8c157dd 100644
>>> --- a/fs/iomap/buffered-io.c
>>> +++ b/fs/iomap/buffered-io.c
>>> @@ -358,81 +358,6 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
>>> }
>>> +
>>> +/**
>>> + * Read in a folio range asynchronously through bios.
>>> + *
>>> + * This should only be used for read/readahead, not for buffered writes.
>>> + * Buffered writes must read in the folio synchronously.
>>> + */
>>> +static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
>>> + struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
>>> +{
>>> + struct folio *folio = ctx->cur_folio;
>>> + const struct iomap *iomap = &iter->iomap;
>>> + size_t poff = offset_in_folio(folio, pos);
>>> + loff_t length = iomap_length(iter);
>>> + sector_t sector;
>>> + struct bio *bio = ctx->private;
>>> +
>>> + iomap_start_folio_read(folio, plen);
>>> +
>>> + sector = iomap_sector(iomap, pos);
>>> + if (!bio || bio_end_sector(bio) != sector ||
>>> + !bio_add_folio(bio, folio, plen, poff)) {
>>> + gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
>>> + gfp_t orig_gfp = gfp;
>>> + unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
>>> +
>>> + if (bio)
>>> + submit_bio(bio);
>>> +
>>> + if (ctx->rac) /* same as readahead_gfp_mask */
>>> + gfp |= __GFP_NORETRY | __GFP_NOWARN;
>>> + bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
>>> + REQ_OP_READ, gfp);
>>> + /*
>>> + * If the bio_alloc fails, try it again for a single page to
>>> + * avoid having to deal with partial page reads. This emulates
>>> + * what do_mpage_read_folio does.
>>> + */
>>> + if (!bio)
>>> + bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
>>> + if (ctx->rac)
>>> + bio->bi_opf |= REQ_RAHEAD;
>>> + bio->bi_iter.bi_sector = sector;
>>> + bio->bi_end_io = iomap_read_end_io;
>>> + bio_add_folio_nofail(bio, folio, plen, poff);
>>> + ctx->private = bio;
>>
>> Yes, I understand some way is needed to isolate bio from non-bio
>> based filesystems, and I also agree `bio` shouldn't be stashed
>> into `iter->private`, since that would just be an abuse, as mentioned
>> in:
>> https://lore.kernel.org/r/20250903203031.GM1587915@frogsfrogsfrogs
>> https://lore.kernel.org/r/aLkskcgl3Z91oIVB@infradead.org
>>
>> However, the naming of `(struct iomap_read_folio_ctx)->private`
>> really confuses me, because a `private` field in
>> `read_folio_ctx` reads much like a filesystem read context rather
>> than something that is just used internally by iomap as the `bio` for
>> block-based filesystems.
>>
>> Also, the existence of `iter->private` makes the naming of
>> `ctx->private` even more confusing, at least in my view.
>
> Do you think "ctx->data" would be better? Or is there something else
> you had in mind?
At least it sounds better on my side, but anyway it's just my own
overall thought. If other folks have a different idea, I don't have a
strong opinion; I just need something for my own use, as previously
said.
Thanks,
Gao Xiang
>
> Thanks,
> Joanne
>>
>> Thanks,
>> Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-09 23:21 ` Gao Xiang
@ 2025-09-10 17:41 ` Joanne Koong
2025-09-11 11:19 ` Christoph Hellwig
0 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-10 17:41 UTC (permalink / raw)
To: Gao Xiang
Cc: djwong, hch, brauner, miklos, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 9, 2025 at 7:21 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> Hi Joanne,
>
> On 2025/9/9 23:24, Joanne Koong wrote:
> > On Mon, Sep 8, 2025 at 8:14 PM Gao Xiang <xiang@kernel.org> wrote:
> >>
> >> Hi Joanne,
> >>
> >> On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
> >>> Add caller-provided callbacks for read and readahead so that it can be
> >>> used generically, especially by filesystems that are not block-based.
> >>>
> >>> In particular, this:
> >>> * Modifies the read and readahead interface to take in a
> >>> struct iomap_read_folio_ctx that is publicly defined as:
> >>>
> >>> struct iomap_read_folio_ctx {
> >>> const struct iomap_read_ops *ops;
> >>> struct folio *cur_folio;
> >>> struct readahead_control *rac;
> >>> void *private;
> >>> };
> >>>
> >>> where struct iomap_read_ops is defined as:
> >>>
> >>> struct iomap_read_ops {
> >>> int (*read_folio_range)(const struct iomap_iter *iter,
> >>> struct iomap_read_folio_ctx *ctx,
> >>> loff_t pos, size_t len);
> >>> int (*read_submit)(struct iomap_read_folio_ctx *ctx);
> >>> };
> >>>
> >>
> >> No, I don't think adding another `.private` to
> >> `struct iomap_read_folio_ctx` makes any sense, because:
> >>
> >> - `struct iomap_iter *iter` already has `.private`, and I think
> >> it's mainly used per request; your new
> >> `.read_folio_range` already passes
> >> `const struct iomap_iter *iter`, which has `.private`.
> >> I don't think a read-specific `.private` is useful in any
> >> case; see also below.
> >>
> >> - `struct iomap_read_folio_ctx` cannot be accessed by previous
> >> .iomap_{begin,end} helpers, which means `struct iomap_read_ops`
> >> is only useful for FUSE read iter/submit logic.
> >>
> >> Also, after my change, the prototypes will be:
> >>
> >> int iomap_read_folio(const struct iomap_ops *ops,
> >> struct iomap_read_folio_ctx *ctx, void *private2);
> >> void iomap_readahead(const struct iomap_ops *ops,
> >> struct iomap_read_folio_ctx *ctx, void *private2);
> >>
> >> It is pretty weird because `.iomap_{begin,end}` in principle can
> >> only use `struct iomap_iter *` but has no way to access
> >> `struct iomap_read_folio_ctx` to get enough context for
> >> read requests.
> >
> > Hi Gao,
> >
> > imo I don't think it makes sense to, if I'm understanding what you're
> > proposing correctly, have one shared data pointer between iomap
> > read/readahead and the iomap_{begin,end} helpers because
>
> My main concern is the two `private` namings here: I would like to add
> a `private` to iomap_read/readahead(), much like __iomap_dio_rw(), at
> least to make our new feature work efficiently.
>
> >
> > a) I don't think it's guaranteed that the data needed by
> > read/readahead and iomap_{begin,end} is the same. I guess we could
> > combine the data each needs altogether into one struct, but it seems
> > simpler and cleaner to me to just have the two be separate.
> >
> > b) I'm not sure about the erofs use case, but at least for what I'm
> > seeing for fuse and the block-based filesystems currently using iomap,
> > the data needed by iomap read/readahead (eg bios, the fuse
> > fuse_fill_read_data) is irrelevant for iomap_{begin,end} and it seems
> > unclean to expose that extraneous info. (btw I don't think it's true
> > that iomap_iter is mainly used for per-request usage - for readahead
> > for example, iomap_{begin,end} is called before and after we service
> > the entire readahead, not called per request, whereas
> > .read_folio_range() is called per request).
>
> By `per-request` I meant a single sync read or readahead request,
> which is triggered by the VFS or MM, for example.
>
> >
> > c) imo iomap_{begin,end} is meant to be a more generic interface and I
> > don't think it makes sense to tie read-specific data to it. For
> > example, some filesystems (eg gfs2) use the same iomap_ops across
> > different file operations (eg buffered writes, direct io, reads, bmap,
> > etc).
>
> Previously `.iomap_{begin,end}` participated in the buffered read and write
> I/O paths (except for page writeback of course); as you said, in
> principle users only need to care about fields in `struct iomap_iter`.
>
> `struct iomap_readpage_ctx` is currently used as an internal structure
> which is completely invisible to filesystems (IOWs, filesystems don't
> need to care or specify any of that).
>
> After your proposal, the newly renamed `struct iomap_read_folio_ctx` will be
> exposed to individual filesystems too, but that makes two external
> context structures for buffered reads (`struct iomap_iter` and
> `struct iomap_read_folio_ctx`) instead of one.
>
> I'm not saying your proposal doesn't work, but:
>
> - which is unlike `struct iomap_writepage_ctx`, because the writeback
> path doesn't have `struct iomap_iter` involved; it has only that
> exact one `struct iomap_writepage_ctx` context and all callbacks
> use that only;
>
> - take a look at `iomap_dio_rw` and `iomap_dio_ops`; I think it's
> somewhat similar to the new `struct iomap_read_ops` to some
> extent, but dio currently also exposes exactly one context
> (`struct iomap_iter`) to users.
>
> - take a look at `iomap_write_ops`; it also exposes
> `struct iomap_iter` only. You may say `folio`, `pos`, `len` could be
> wrapped in another `struct iomap_write_ctx` if needed, but that is
> not designed to be exposed or specified by write_iter (e.g.
> fuse_cache_write_iter).
>
> In short, traditionally the buffered read/write external context is
> the single unique `struct iomap_iter` (`struct iomap_readpage_ctx`
> is only for iomap internal use); after your proposal there will be
> two external contexts specified by users (.read_folio and .readahead),
> but `.iomap_{begin,end}` is unable to get at one of them, which is
> unlike the current writeback and direct I/O paths (they use one
> external context too).
>
> Separating into two contexts works for your use case, but it may
> cause issues since future developers will have to decide where to
> place new context fields for the buffered I/O paths
> (`struct iomap_iter` or `struct iomap_read_folio_ctx`); it's still
> possible but may cause further churn from a codebase perspective.
>
> That is my minor concern, but my main concern is still `private`
> naming.
Hi Gao,
In my mind, the big question is whether or not the data the
filesystems pass in is logically shared by both iomap_begin/end and
buffered reads/writes/dio callbacks, or whether the data needed by
both are basically separate entities that have to be frankensteined
together so that they can be passed in through iter->private. My sense
of the read/readahead code is that the data needed by iomap begin/end
vs buffered reads are basically logically separate entities. I see
your point about how the existing code for buffered writes and dio in
iomap has them combined into one, but imo, if the iomap_iter data is
a separate entity from the data needed in the callbacks, then those
pointers should be separate.
But I am also happy to change this back to the way it was
for v1, where everything just went through iter->private. I don't feel
strongly about this decision; I'm happy with whichever way we go.
Thanks,
Joanne
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 01/16] iomap: move async bio read logic into helper function
2025-09-08 18:51 ` [PATCH v2 01/16] iomap: move async bio read logic into helper function Joanne Koong
@ 2025-09-11 11:09 ` Christoph Hellwig
2025-09-12 16:01 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:09 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
> +static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
The _async here feels very misplaced, because pretty much everything in
the area except for the odd write_begin helper is async, and the postfix
does not match the method name.
Also, as a general note on naming, having common prefixes for sets
of related helpers is nice. Maybe use iomap_bio_* for all the bio
helpers we're adding here? We can then fix the direct I/O code to
match that later.
> {
> + struct folio *folio = ctx->cur_folio;
> const struct iomap *iomap = &iter->iomap;
> - loff_t pos = iter->pos;
Looking at the caller, it seems we should not need the pos argument if
we adjust pos just after calculating count at the beginning of the loop.
I think that would be a much better interface.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 02/16] iomap: move read/readahead bio submission logic into helper function
2025-09-08 18:51 ` [PATCH v2 02/16] iomap: move read/readahead bio submission " Joanne Koong
@ 2025-09-11 11:09 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:09 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
This looks good except for the naming scheme mentioned in the last
patch.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 03/16] iomap: rename cur_folio_in_bio to folio_owned
2025-09-08 18:51 ` [PATCH v2 03/16] iomap: rename cur_folio_in_bio to folio_owned Joanne Koong
@ 2025-09-11 11:10 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:10 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Mon, Sep 08, 2025 at 11:51:09AM -0700, Joanne Koong wrote:
> The purpose of struct iomap_readpage_ctx's cur_folio_in_bio is to track
> whether the folio is owned by the bio (where thus the bio is responsible
> for unlocking the folio) or if it needs to be unlocked by iomap. Rename
> this to folio_owned to make the purpose more clear and so that when
> iomap read/readahead logic is made generic, the name also makes sense
> for filesystems that don't use bios.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a3b02ed5328f..598998269107 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -352,7 +352,12 @@ static void iomap_read_end_io(struct bio *bio)
>
> struct iomap_readpage_ctx {
> struct folio *cur_folio;
> - bool cur_folio_in_bio;
> + /*
> + * Is the folio owned by this readpage context, or by some
> + * external IO helper? Either way, the owner of the folio is
> + * responsible for unlocking it when the read completes.
> + */
Nit: you can use up to 80 characters for your comments.
Otherwise this looks good.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 04/16] iomap: store read/readahead bio generically
2025-09-08 18:51 ` [PATCH v2 04/16] iomap: store read/readahead bio generically Joanne Koong
@ 2025-09-11 11:11 ` Christoph Hellwig
2025-09-12 16:10 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:11 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
> + void *private;
private is always a bit annoying to grep for. Maybe fsprivate or
read_ctx instead?
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller
2025-09-08 18:51 ` [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
@ 2025-09-11 11:13 ` Christoph Hellwig
2025-09-12 16:28 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:13 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Mon, Sep 08, 2025 at 11:51:11AM -0700, Joanne Koong wrote:
> Propagate any error encountered in iomap_read_folio() back up to its
> caller (otherwise a default -EIO will be passed up by
> filemap_read_folio() to callers). This is standard behavior for how
> other filesystems handle their ->read_folio() errors as well.
>
> Remove the out of date comment about setting the folio error flag.
> Folio error flags were removed in commit 1f56eedf7ff7 ("iomap:
> Remove calls to set and clear folio error flag").
As mentioned last time I actually think this is a bad idea, and the most
common helpers (mpage_read_folio and block_read_full_folio) will not
return an error here.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 06/16] iomap: iterate over entire folio in iomap_readpage_iter()
2025-09-08 18:51 ` [PATCH v2 06/16] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
@ 2025-09-11 11:15 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:15 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Mon, Sep 08, 2025 at 11:51:12AM -0700, Joanne Koong wrote:
> + /*
> + * We should never in practice hit this case since
> + * the iter length matches the readahead length.
> + */
This can also use up all 80 characters for the comment. Otherwise looks
fine:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 07/16] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
2025-09-08 18:51 ` [PATCH v2 07/16] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
@ 2025-09-11 11:16 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:16 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 08/16] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
2025-09-08 18:51 ` [PATCH v2 08/16] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
@ 2025-09-11 11:16 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:16 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 09/16] iomap: add public start/finish folio read helpers
2025-09-08 18:51 ` [PATCH v2 09/16] iomap: add public start/finish folio read helpers Joanne Koong
@ 2025-09-11 11:16 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:16 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 10/16] iomap: make iomap_read_folio_ctx->folio_owned internal
2025-09-08 18:51 ` [PATCH v2 10/16] iomap: make iomap_read_folio_ctx->folio_owned internal Joanne Koong
@ 2025-09-11 11:17 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:17 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Mon, Sep 08, 2025 at 11:51:16AM -0700, Joanne Koong wrote:
> struct iomap_read_folio_ctx will be made a public interface when
> read/readahead takes in caller-provided callbacks.
>
> To make the interface simpler for end users, keep track of the folio
> ownership state internally instead of exposing it in struct
> iomap_read_folio_ctx.
This looks fine, but can we merge this with the patch renaming the
flag, please? Otherwise the previous rename is just pointless churn.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-10 17:41 ` Joanne Koong
@ 2025-09-11 11:19 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:19 UTC (permalink / raw)
To: Joanne Koong
Cc: Gao Xiang, djwong, hch, brauner, miklos, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Wed, Sep 10, 2025 at 01:41:25PM -0400, Joanne Koong wrote:
> In my mind, the big question is whether or not the data the
> filesystems pass in is logically shared by both iomap_begin/end and
> buffered reads/writes/dio callbacks, or whether the data needed by
> both are basically separate entities
They are separate entities.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-08 18:51 ` [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-09 0:14 ` Gao Xiang
@ 2025-09-11 11:26 ` Christoph Hellwig
2025-09-12 17:36 ` Joanne Koong
1 sibling, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:26 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
> + - ``read_folio_range``: Called to read in the range (read can be done
> + synchronously or asynchronously). This must be provided by the caller.
As far as I can tell, the interface is always based on an asynchronous
operation, but doesn't preclude completing it right away. So the above
is a little misleading.
> + struct iomap_read_folio_ctx ctx = {
> + .ops = &iomap_read_bios_ops,
> + .cur_folio = folio,
> + };
>
> + return iomap_read_folio(&blkdev_iomap_ops, &ctx);
> + struct iomap_read_folio_ctx ctx = {
> + .ops = &iomap_read_bios_ops,
> + .rac = rac,
> + };
> +
> + iomap_readahead(&blkdev_iomap_ops, &ctx);
Can you add iomap_bio_read_folio and iomap_bio_readahead inline helpers
to reduce this boilerplate code duplicated in various file systems?
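Something like the below untested sketch is what I have in mind, based
on the two ctx initializations quoted above (the helper names are only
a suggestion):

static inline int iomap_bio_read_folio(struct folio *folio,
		const struct iomap_ops *ops)
{
	struct iomap_read_folio_ctx ctx = {
		.ops		= &iomap_read_bios_ops,
		.cur_folio	= folio,
	};

	return iomap_read_folio(ops, &ctx);
}

static inline void iomap_bio_readahead(struct readahead_control *rac,
		const struct iomap_ops *ops)
{
	struct iomap_read_folio_ctx ctx = {
		.ops	= &iomap_read_bios_ops,
		.rac	= rac,
	};

	iomap_readahead(ops, &ctx);
}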
> -static void iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
> +static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
> {
> struct bio *bio = ctx->private;
>
> if (bio)
> submit_bio(bio);
> +
> + return 0;
Submission interfaces that can return errors both synchronously and
asynchronously are extremely error prone. I'd be much happier if this
interface could not return errors.
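With purely asynchronous error reporting, an implementation could look
roughly like the sketch below, where fs_send_read() is a made-up
stand-in for the actual I/O submission:

static void example_read_folio_range(const struct iomap_iter *iter,
		struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
{
	struct folio *folio = ctx->cur_folio;
	int error;

	iomap_start_folio_read(folio, plen);
	error = fs_send_read(iter, folio, pos, plen);
	if (error)
		/* report the failure the way a failed bio completion would */
		iomap_finish_folio_read(folio, offset_in_folio(folio, pos),
				plen, error);
}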
> +const struct iomap_read_ops iomap_read_bios_ops = {
> + .read_folio_range = iomap_read_folio_range_bio_async,
> + .read_submit = iomap_submit_read_bio,
> +};
Please use tabs to align struct initializers before the '='.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-08 18:51 ` [PATCH v2 12/16] iomap: add bias for async read requests Joanne Koong
@ 2025-09-11 11:31 ` Christoph Hellwig
2025-09-12 17:30 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:31 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
> +static void __iomap_finish_folio_read(struct folio *folio, size_t off,
> + size_t len, int error, bool update_bitmap)
> {
> struct iomap_folio_state *ifs = folio->private;
> bool uptodate = !error;
> @@ -340,7 +340,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> unsigned long flags;
>
> spin_lock_irqsave(&ifs->state_lock, flags);
> - if (!error)
> + if (!error && update_bitmap)
> uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
This code sharing keeps confusing me a bit. I think it's technically
perfectly fine, but not helpful for readability. We'd solve that by
open coding the !update_bitmap case in iomap_read_folio_iter, which
would also allow using spin_lock_irq instead of spin_lock_irqsave there
as a nice little micro-optimization. If we'd then also get rid of the
error return from ->read_folio_range and always do asynchronous error
returns, it would be even simpler.
Or maybe I just need to live with the magic bitmap update, but the
fact that "len" is sometimes an actual length and sometimes just a
counter for read_bytes_pending keeps confusing me.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-10 4:59 ` Gao Xiang
@ 2025-09-11 11:37 ` Christoph Hellwig
2025-09-11 12:29 ` Gao Xiang
0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:37 UTC (permalink / raw)
To: Gao Xiang
Cc: Joanne Koong, brauner, miklos, hch, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Wed, Sep 10, 2025 at 12:59:41PM +0800, Gao Xiang wrote:
> At least it sounds better on my side, but anyway it's just
> my own overall thought. If other folks have a different idea,
> I don't have a strong opinion; I just need something for my own
> use, as previously said.
I already dropped my two suggestions on the earlier patch. Not totally
happy about either of my suggestions or "data", but in full agreement that
it should be something other than private.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-08 18:51 ` [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
2025-09-09 2:14 ` Gao Xiang
@ 2025-09-11 11:44 ` Christoph Hellwig
2025-09-16 23:23 ` Joanne Koong
1 sibling, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-11 11:44 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Mon, Sep 08, 2025 at 11:51:19AM -0700, Joanne Koong wrote:
> There is no longer a dependency on CONFIG_BLOCK in the iomap read and
> readahead logic. Move this logic out of the CONFIG_BLOCK guard. This
> allows non-block-based filesystems to use iomap for reads/readahead.
Please move the bio code into a new file. Example patch attached below
that does just that without addressing any of the previous comments:
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index f7e1c8534c46..a572b8808524 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -14,5 +14,6 @@ iomap-y += trace.o \
 iomap-$(CONFIG_BLOCK)	+= direct-io.o \
 				ioend.o \
 				fiemap.o \
-				seek.o
+				seek.o \
+				bio.o
 iomap-$(CONFIG_SWAP)	+= swapfile.o
diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
new file mode 100644
index 000000000000..bcb87441be9f
--- /dev/null
+++ b/fs/iomap/bio.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ * Copyright (C) 2016-2023 Christoph Hellwig.
+ */
+#include <linux/iomap.h>
+#include <linux/pagemap.h>
+#include "internal.h"
+#include "trace.h"
+
+static void iomap_read_end_io(struct bio *bio)
+{
+	int error = blk_status_to_errno(bio->bi_status);
+	struct folio_iter fi;
+
+	bio_for_each_folio_all(fi, bio)
+		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+	bio_put(bio);
+}
+
+static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
+{
+	struct bio *bio = ctx->private;
+
+	if (bio)
+		submit_bio(bio);
+
+	return 0;
+}
+
+/**
+ * Read in a folio range asynchronously through bios.
+ *
+ * This should only be used for read/readahead, not for buffered writes.
+ * Buffered writes must read in the folio synchronously.
+ */
+static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
+		struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
+{
+	struct folio *folio = ctx->cur_folio;
+	const struct iomap *iomap = &iter->iomap;
+	size_t poff = offset_in_folio(folio, pos);
+	loff_t length = iomap_length(iter);
+	sector_t sector;
+	struct bio *bio = ctx->private;
+
+	iomap_start_folio_read(folio, plen);
+
+	sector = iomap_sector(iomap, pos);
+	if (!bio || bio_end_sector(bio) != sector ||
+	    !bio_add_folio(bio, folio, plen, poff)) {
+		gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
+		gfp_t orig_gfp = gfp;
+		unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
+
+		if (bio)
+			submit_bio(bio);
+
+		if (ctx->rac) /* same as readahead_gfp_mask */
+			gfp |= __GFP_NORETRY | __GFP_NOWARN;
+		bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+				REQ_OP_READ, gfp);
+		/*
+		 * If the bio_alloc fails, try it again for a single page to
+		 * avoid having to deal with partial page reads. This emulates
+		 * what do_mpage_read_folio does.
+		 */
+		if (!bio)
+			bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
+		if (ctx->rac)
+			bio->bi_opf |= REQ_RAHEAD;
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_end_io = iomap_read_end_io;
+		bio_add_folio_nofail(bio, folio, plen, poff);
+		ctx->private = bio;
+	}
+	return 0;
+}
+
+const struct iomap_read_ops iomap_read_bios_ops = {
+	.read_folio_range = iomap_read_folio_range_bio_async,
+	.read_submit = iomap_submit_read_bio,
+};
+EXPORT_SYMBOL_GPL(iomap_read_bios_ops);
+
+int iomap_read_folio_range(const struct iomap_iter *iter,
+		struct folio *folio, loff_t pos, size_t len)
+{
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
+	struct bio_vec bvec;
+	struct bio bio;
+
+	bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
+	bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
+	bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
+	return submit_bio_wait(&bio);
+}
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c424e8c157dd..48626c11f3d8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -8,6 +8,7 @@
#include <linux/writeback.h>
#include <linux/swap.h>
#include <linux/migrate.h>
+#include "internal.h"
#include "trace.h"
#include "../internal.h"
@@ -526,103 +527,6 @@ void iomap_readahead(const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_readahead);
-#ifdef CONFIG_BLOCK
-static void iomap_read_end_io(struct bio *bio)
-{
- int error = blk_status_to_errno(bio->bi_status);
- struct folio_iter fi;
-
- bio_for_each_folio_all(fi, bio)
- iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
- bio_put(bio);
-}
-
-static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
-{
- struct bio *bio = ctx->private;
-
- if (bio)
- submit_bio(bio);
-
- return 0;
-}
-
-/**
- * Read in a folio range asynchronously through bios.
- *
- * This should only be used for read/readahead, not for buffered writes.
- * Buffered writes must read in the folio synchronously.
- */
-static int iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
-{
- struct folio *folio = ctx->cur_folio;
- const struct iomap *iomap = &iter->iomap;
- size_t poff = offset_in_folio(folio, pos);
- loff_t length = iomap_length(iter);
- sector_t sector;
- struct bio *bio = ctx->private;
-
- iomap_start_folio_read(folio, plen);
-
- sector = iomap_sector(iomap, pos);
- if (!bio || bio_end_sector(bio) != sector ||
- !bio_add_folio(bio, folio, plen, poff)) {
- gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
- gfp_t orig_gfp = gfp;
- unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
-
- if (bio)
- submit_bio(bio);
-
- if (ctx->rac) /* same as readahead_gfp_mask */
- gfp |= __GFP_NORETRY | __GFP_NOWARN;
- bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
- /*
- * If the bio_alloc fails, try it again for a single page to
- * avoid having to deal with partial page reads. This emulates
- * what do_mpage_read_folio does.
- */
- if (!bio)
- bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
- if (ctx->rac)
- bio->bi_opf |= REQ_RAHEAD;
- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(bio, folio, plen, poff);
- ctx->private = bio;
- }
- return 0;
-}
-
-const struct iomap_read_ops iomap_read_bios_ops = {
- .read_folio_range = iomap_read_folio_range_bio_async,
- .read_submit = iomap_submit_read_bio,
-};
-EXPORT_SYMBOL_GPL(iomap_read_bios_ops);
-
-static int iomap_read_folio_range(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
- struct bio_vec bvec;
- struct bio bio;
-
- bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
- bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
- bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
- return submit_bio_wait(&bio);
-}
-#else
-static int iomap_read_folio_range(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- WARN_ON_ONCE(1);
- return -EIO;
-}
-#endif /* CONFIG_BLOCK */
-
/*
* iomap_is_partially_uptodate checks whether blocks within a folio are
* uptodate or not.
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index d05cb3aed96e..dc6e95c93f13 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -6,4 +6,17 @@
 u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
+#ifdef CONFIG_BLOCK
+int iomap_read_folio_range(const struct iomap_iter *iter,
+		struct folio *folio, loff_t pos, size_t len);
+#else
+static inline int iomap_read_folio_range(const struct iomap_iter *iter,
+		struct folio *folio, loff_t pos, size_t len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+#endif /* CONFIG_BLOCK */
+
+
 #endif /* _IOMAP_INTERNAL_H */
^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-11 11:37 ` Christoph Hellwig
@ 2025-09-11 12:29 ` Gao Xiang
2025-09-11 19:45 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-11 12:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Joanne Koong, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
Hi Christoph,
On 2025/9/11 19:37, Christoph Hellwig wrote:
> On Wed, Sep 10, 2025 at 12:59:41PM +0800, Gao Xiang wrote:
>> At least it sounds better on my side, but anyway it's just
>> my own overall thought. If other folks have a different idea,
>> I don't have a strong opinion; I just need something for my own
>> use, as previously said.
>
> I already dropped my two suggestions on the earlier patch. Not totally
> happy about either of my suggestions or "data", but in full agreement that
> it should be something other than private.
To just quote your previous comment and try to discuss here:
```
On Wed, Sep 10, 2025 at 01:41:25PM -0400, Joanne Koong wrote:
> In my mind, the big question is whether or not the data the
> filesystems pass in is logically shared by both iomap_begin/end and
> buffered reads/writes/dio callbacks, or whether the data needed by
> both are basically separate entities
They are separate entities.
```
I'll try pushing on this again because I'm still not quite sure it's
a good idea; let's take this FUSE iomap-read proposal (but sorry,
honestly I haven't fully looked into the whole series.)
```
struct fuse_fill_read_data {
	struct file *file;
+
+	/*
+	 * Fields below are used if sending the read request
+	 * asynchronously.
+	 */
+	struct fuse_conn *fc;
+	struct fuse_io_args *ia;
+	unsigned int nr_bytes;
};
```
which is just a new FUSE-only context for
`struct iomap_read_folio_ctx`; it's not used by .iomap_{begin,end}.
That is basically because FUSE _currently_ doesn't have a logical-to-physical
mapping requirement (except for another fuse_iomap_begin in dax.c):
```
static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
			    unsigned int flags, struct iomap *iomap,
			    struct iomap *srcmap)
{
	iomap->type = IOMAP_MAPPED;
	iomap->length = length;
	iomap->offset = offset;
	return 0;
}
```
But if FUSE or some other fs later needs to request L2P information
in its .iomap_begin() and needs to send L2P requests to a userspace
daemon to confirm where to get the physical data (maybe somewhat
like Darrick's work, but I don't have extra time to dig into that
either) rather than something that totally bypasses the iomap L2P logic
as above, then I'm not sure the current `iomap_iter->private` is
quite separate from `struct iomap_read_folio_ctx->private`; it seems
both need fs-specific extra contexts for the same I/O flow.
I think the reason why `struct iomap_read_folio_ctx->private` is
introduced is basically that previous iomap filesystems are all
bio-based, and they share the `bio` concept in common, but
`iter->private` was not designed for this usage.
But fuse's `struct iomap_read_folio_ctx` and
`struct fuse_fill_read_data` are too FUSE-specific; I cannot
see them being shared by other filesystems in the near future.
It is much like a single-filesystem-specific concept, and
nothing like `bio` at all.
I've already racked my brains on this but I have no better
idea than the current callback-hook model (so I don't want to argue
more). Anyway, I really think it should be carefully designed
(because the current FUSE .iomap_{begin,end} callbacks are just no-ops,
but that is fuse-specific). If folks really think Joanne's
work is already best, or that we can live with it, I'm totally fine.
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-11 12:29 ` Gao Xiang
@ 2025-09-11 19:45 ` Joanne Koong
2025-09-12 0:06 ` Gao Xiang
0 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-11 19:45 UTC (permalink / raw)
To: Gao Xiang
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 8:29 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> Hi Christoph,
>
> On 2025/9/11 19:37, Christoph Hellwig wrote:
> > On Wed, Sep 10, 2025 at 12:59:41PM +0800, Gao Xiang wrote:
> >> At least it sounds better on my side, but anyway it's just
> >> my own overall thought. If other folks have a different idea,
> >> I don't have a strong opinion; I just need something for my own
> >> use, as previously said.
> >
> > I already dropped my two suggestions on the earlier patch. Not totally
> > happy about either of my suggestions or "data", but in full agreement that
> > it should be something other than private.
>
> To just quote your previous comment and try to discuss here:
>
> ```
> On Wed, Sep 10, 2025 at 01:41:25PM -0400, Joanne Koong wrote:
> > In my mind, the big question is whether or not the data the
> > filesystems pass in is logically shared by both iomap_begin/end and
> > buffered reads/writes/dio callbacks, or whether the data needed by
> > both are basically separate entities
>
> They are separate entities.
> ```
>
Hi Gao,
> I'll try pushing on this again because I'm still not quite sure it's
> a good idea; let's take this FUSE iomap-read proposal (but sorry,
> honestly I haven't fully looked into the whole series.)
>
> ```
> struct fuse_fill_read_data {
> struct file *file;
> +
> + /*
> + * Fields below are used if sending the read request
> + * asynchronously.
> + */
> + struct fuse_conn *fc;
> + struct fuse_io_args *ia;
> + unsigned int nr_bytes;
> };
> ```
>
> which is just a new FUSE-only context for
> `struct iomap_read_folio_ctx`; it's not used by .iomap_{begin,end}.
> That is basically because FUSE _currently_ doesn't have a logical-to-physical
> mapping requirement (except for another fuse_iomap_begin in dax.c):
I don't think this is true. The other filesystems in the kernel using
iomap that do need logical-to-physical mappings also do not have their
context for `struct iomap_read_folio_ctx` (the struct bio) used by
.iomap_{begin,end} either. As I see it, the purpose of the `struct
iomap_read_folio_ctx` context is for processing/issuing the reads, and
the context for .iomap_{begin,end} is for doing all the mapping /
general metadata tracking stuff - even for the filesystems that have
logical-to-physical mapping requirements, their usage of the
context is for processing/submitting the bio read requests, which imo
sits a layer below the more high-level iomap_{begin,end}.
> ```
> static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> unsigned int flags, struct iomap *iomap,
> struct iomap *srcmap)
> {
> iomap->type = IOMAP_MAPPED;
> iomap->length = length;
> iomap->offset = offset;
> return 0;
> }
> ```
>
> But if FUSE or some other fs later needs to request L2P information
> in their .iomap_begin() and need to send L2P requests to userspace
> daemon to confirm where to get the physical data (maybe somewhat
> like Darrick's work but I don't have extra time to dig into that
> either) rather than just something totally bypass iomap-L2P logic
> as above, then I'm not sure the current `iomap_iter->private` is
> quite separate from `struct iomap_read_folio_ctx->private`; it seems
If in the future this case arises, the L2P mapping info is accessible
by the read callback in the current design. `.read_folio_range()`
passes the iomap iter to the filesystem and they can access
iter->private to get the L2P mapping data they need.
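To sketch what that could look like (the fs_* names below are made up
purely for illustration):

static int fs_read_folio_range(const struct iomap_iter *iter,
		struct iomap_read_folio_ctx *ctx, loff_t pos, size_t len)
{
	struct fs_l2p_data *l2p = iter->private;	/* mapping state */
	struct fs_read_data *read = ctx->private;	/* read-path state */

	/* issue the read for [pos, pos + len) using both contexts */
	return fs_issue_read(l2p, read, ctx->cur_folio, pos, len);
}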
> both need fs-specific extra contexts for the same I/O flow.
>
> I think the reason why `struct iomap_read_folio_ctx->private` is
> introduced is basically that previous iomap filesystems are all
> bio-based, and they share the `bio` concept in common, but
> `iter->private` was not designed for this usage.
>
> But fuse's `struct iomap_read_folio_ctx` and
> `struct fuse_fill_read_data` are too FUSE-specific; I cannot
> see them being shared by other filesystems in the near future.
> It is much like a single-filesystem-specific concept, and
> nothing like `bio` at all.
Currently fuse is the only non-block-based filesystem using iomap, but
I don't see why there wouldn't be more in the future. For example,
while looking at some of the netfs code, a lot of the core
functionality looks the same between that and iomap, and I think it
might be a good idea to have netfs use iomap's interface in the future
so that it can get the large folio dirty/uptodate tracking and
any other large folio features, like more granular writeback stats
accounting, for free.
Thanks,
Joanne
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-11 19:45 ` Joanne Koong
@ 2025-09-12 0:06 ` Gao Xiang
2025-09-12 1:09 ` Gao Xiang
0 siblings, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-12 0:06 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On 2025/9/12 03:45, Joanne Koong wrote:
> On Thu, Sep 11, 2025 at 8:29 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
...
>> ```
>>
>> But if FUSE or some other fs later needs to request L2P information
>> in their .iomap_begin() and need to send L2P requests to userspace
>> daemon to confirm where to get the physical data (maybe somewhat
>> like Darrick's work but I don't have extra time to dig into that
>> either) rather than just something totally bypass iomap-L2P logic
>> as above, then I'm not sure the current `iomap_iter->private` is
>> quite separate from `struct iomap_read_folio_ctx->private`; it seems
>
> If in the future this case arises, the L2P mapping info is accessible
> by the read callback in the current design. `.read_folio_range()`
> passes the iomap iter to the filesystem and they can access
> iter->private to get the L2P mapping data they need.
The question is what gets exposed via `iter->private` then; take
an example:
```
struct file *file;
```
your .read_folio_range() needs `file->private_data` to get
the `struct fuse_file`, so `file` is kept in
`struct iomap_read_folio_ctx`.
If `file->private_data` is to be used for `.iomap_begin()`
as well, what's your proposal then?
Duplicate the same `file` pointer in both the
`struct iomap_read_folio_ctx` and `iter->private` contexts?
>
>> both need fs-specific extra contexts for the same I/O flow.
>>
>> I think the reason why `struct iomap_read_folio_ctx->private` is
>> introduced is basically that previous iomap filesystems are all
>> bio-based, and they share the `bio` concept in common, but
>> `iter->private` was not designed for this usage.
>>
>> But fuse's `struct iomap_read_folio_ctx` and
>> `struct fuse_fill_read_data` are too FUSE-specific; I cannot
>> see them being shared by other filesystems in the near future.
>> It is much like a single-filesystem-specific concept, and
>> nothing like `bio` at all.
>
> Currently fuse is the only non-block-based filesystem using iomap but
> I don't see why there wouldn't be more in the future. For example,
> while looking at some of the netfs code, a lot of the core
> functionality looks the same between that and iomap and I think it
> might be a good idea to have netfs in the future use iomap's interface
> so that it can get the large folio dirty/uptodate tracking stuff and
> any other large folio stuff like more granular writeback stats
> accounting for free.
I think you need to ask David about this idea; I told him to
switch fscache to use iomap in 2022, before netfs was fully out [1],
but I don't see it happening.
[1] https://lore.kernel.org/linux-fsdevel/YfivxC9S52FlyKoL@B-P7TQMD6M-0146/
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-12 0:06 ` Gao Xiang
@ 2025-09-12 1:09 ` Gao Xiang
2025-09-12 1:10 ` Gao Xiang
0 siblings, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-12 1:09 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On 2025/9/12 08:06, Gao Xiang wrote:
>
>
> On 2025/9/12 03:45, Joanne Koong wrote:
>> On Thu, Sep 11, 2025 at 8:29 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> ...
>
>>> ```
>>>
>>> But if FUSE or some other fs later needs to request L2P information
>>> in their .iomap_begin() and need to send L2P requests to userspace
>>> daemon to confirm where to get the physical data (maybe somewhat
>>> like Darrick's work but I don't have extra time to dig into that
>>> either) rather than just something totally bypass iomap-L2P logic
>>> as above, then I'm not sure the current `iomap_iter->private` is
>>> quite separate from `struct iomap_read_folio_ctx->private`; it seems
>>
>> If in the future this case arises, the L2P mapping info is accessible
>> by the read callback in the current design. `.read_folio_range()`
>> passes the iomap iter to the filesystem and they can access
>> iter->private to get the L2P mapping data they need.
>
> The question is what exposes to `iter->private` then, take
> an example:
>
> ```
> struct file *file;
> ```
>
> your .read_folio_range() needs `file->private_data` to get
> `struct fuse_file` so `file` is kept into
> `struct iomap_read_folio_ctx`.
>
> If `file->private_data` will be used for `.iomap_begin()`
> as well, what's your proposal then?
>
> Duplicate the same `file` pointer in both
> `struct iomap_read_folio_ctx` and `iter->private` context?
It's just a not-so-appropriate example, because
`struct file *` and `struct fuse_file *` are widely used
in the (buffered/direct) read/write flow, but Darrick's work
doesn't use `file` in .iomap_{begin,end}.
But you may find that the `file` pointer is already used for
both the FUSE buffered write path and your proposal, e.g.
for buffered write:
	/*
	 * Use iomap so that we can do granular uptodate reads
	 * and granular dirty tracking for large folios.
	 */
	written = iomap_file_buffered_write(iocb, from,
					    &fuse_iomap_ops,
					    &fuse_iomap_write_ops,
					    file);
I am just trying to say that if there is a case/feature which needs
something previously in `struct iomap_read_folio_ctx` to
be available in .iomap_{begin,end} too, you have to either:
- duplicate this in `iter->private` as well; or
- move this to `iter->private` entirely.
The problem is that both `iter->private` and
`struct iomap_read_folio_ctx` are filesystem-specific;
I can see no clear boundary for deciding which one a given
field belongs in. It seems just like an artificial choice.
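To make the two placements concrete (hypothetical fs_* types, just
for illustration):

```
/* placement 1: duplicate the pointer in both contexts */
struct fs_map_data {		/* via iter->private */
	struct file *file;
};
struct fs_read_data {		/* via ctx->private */
	struct file *file;	/* duplicated */
};

/* placement 2: keep `file` in iter->private only, and have
 * .read_folio_range() reach it through the passed-in iter */
```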
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-12 1:09 ` Gao Xiang
@ 2025-09-12 1:10 ` Gao Xiang
2025-09-12 19:56 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Gao Xiang @ 2025-09-12 1:10 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On 2025/9/12 09:09, Gao Xiang wrote:
>
>
> On 2025/9/12 08:06, Gao Xiang wrote:
>>
>>
>> On 2025/9/12 03:45, Joanne Koong wrote:
>>> On Thu, Sep 11, 2025 at 8:29 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> ...
>>
>>>> ```
>>>>
>>>> But if FUSE or some other fs later needs to request L2P information
>>>> in their .iomap_begin() and need to send L2P requests to userspace
>>>> daemon to confirm where to get the physical data (maybe somewhat
>>>> like Darrick's work but I don't have extra time to dig into that
>>>> either) rather than just something totally bypass iomap-L2P logic
>>>> as above, then I'm not sure the current `iomap_iter->private` is
>>>> quite separate from `struct iomap_read_folio_ctx->private`; it seems
>>>
>>> If in the future this case arises, the L2P mapping info is accessible
>>> by the read callback in the current design. `.read_folio_range()`
>>> passes the iomap iter to the filesystem and they can access
>>> iter->private to get the L2P mapping data they need.
>>
>> The question is what exposes to `iter->private` then, take
>> an example:
>>
>> ```
>> struct file *file;
>> ```
>>
>> your .read_folio_range() needs `file->private_data` to get
>> `struct fuse_file` so `file` is kept into
>> `struct iomap_read_folio_ctx`.
>>
>> If `file->private_data` will be used for `.iomap_begin()`
>> as well, what's your proposal then?
>>
>> Duplicate the same `file` pointer in both
>> `struct iomap_read_folio_ctx` and `iter->private` context?
>
> It's just a not-so-appropriate example because
> `struct file *` and `struct fuse_file *` are widely used
> in the (buffer/direct) read/write flow but Darrick's work
> doesn't use `file` in .iomap_{begin/end}.
>
> But you may find out `file` pointer is already used for
> both FUSE buffer write and your proposal, e.g.
>
> buffer write:
> /*
> * Use iomap so that we can do granular uptodate reads
> * and granular dirty tracking for large folios.
> */
> written = iomap_file_buffered_write(iocb, from,
> &fuse_iomap_ops,
> &fuse_iomap_write_ops,
> file);
And your buffered write per-fs context seems to just use
`iter->private` entirely to keep `file` instead.
>
>
> I just try to say if there is a case/feature which needs
> something previously in `struct iomap_read_folio_ctx` to
> be available in .iomap_{begin,end} too, you have to either:
> - duplicate this in `iter->private` as well;
> - move this to `iter->private` entirely.
>
> The problem is that both `iter->private` and
> `struct iomap_read_folio_ctx` are filesystem-specific,
> I can only see there is no clear boundary to leave something
> in which one. It seems just like an artificial choice.
>
> Thanks,
> Gao Xiang
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 01/16] iomap: move async bio read logic into helper function
2025-09-11 11:09 ` Christoph Hellwig
@ 2025-09-12 16:01 ` Joanne Koong
0 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 16:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 7:13 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> > +static void iomap_read_folio_range_bio_async(const struct iomap_iter *iter,
> > + struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
>
> The _async here feels very misplaced, because pretty much everything in
> the area except for the odd write_begin helper is async, and the postfix
> does not match the method name.
>
> Also, as a general note on naming, having common prefixes for sets
> of related helpers is nice. Maybe use iomap_bio_* for all the bio
> helpers we're adding here? We can then fix the direct I/O code to
> match that later.
Great point, I'll change this to "iomap_bio_read_folio_range()" for
the async version and then "iomap_bio_read_folio_range_sync()" for the
synchronous version.
>
> > {
> > + struct folio *folio = ctx->cur_folio;
> > const struct iomap *iomap = &iter->iomap;
> > - loff_t pos = iter->pos;
>
> Looking at the caller, it seems we should not need the pos argument if
> we adjust pos just after calculating count at the beginning of the loop.
> I think that would be a much better interface.
>
Sounds good. In that case I think we should do the same for the
buffered write ->read_folio_range() API later too, to have it
match.
Thanks,
Joanne
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 04/16] iomap: store read/readahead bio generically
2025-09-11 11:11 ` Christoph Hellwig
@ 2025-09-12 16:10 ` Joanne Koong
0 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 16:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 7:11 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> > + void *private;
>
> private is always a bit annoying to grep for. Maybe fsprivate or
> read_ctx instead?
>
I'll change this to read_ctx. It'll match the "wb_ctx" in struct
iomap_writepage_ctx.
Thanks,
Joanne
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller
2025-09-11 11:13 ` Christoph Hellwig
@ 2025-09-12 16:28 ` Joanne Koong
2025-09-15 16:05 ` Christoph Hellwig
0 siblings, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 16:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 7:13 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Sep 08, 2025 at 11:51:11AM -0700, Joanne Koong wrote:
> > Propagate any error encountered in iomap_read_folio() back up to its
> > caller (otherwise a default -EIO will be passed up by
> > filemap_read_folio() to callers). This is standard behavior for how
> > other filesystems handle their ->read_folio() errors as well.
> >
> > Remove the out of date comment about setting the folio error flag.
> > Folio error flags were removed in commit 1f56eedf7ff7 ("iomap:
> > Remove calls to set and clear folio error flag").
>
> As mentioned last time I actually think this is a bad idea, and the most
> common helpers (mpage_read_folio and block_read_full_folio) will not
> return an error here.
>
>
I'll drop this. I interpreted Matthew's comment to mean the error
return isn't useful for ->readahead but is for ->read_folio.
If iomap_read_folio() doesn't do error returns and always just returns
0, then maybe we should make it `void iomap_read_folio(...)`
instead of `int iomap_read_folio(...)`.
Thanks,
Joanne
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-11 11:31 ` Christoph Hellwig
@ 2025-09-12 17:30 ` Joanne Koong
2025-09-15 16:05 ` Christoph Hellwig
2025-09-16 19:14 ` Joanne Koong
0 siblings, 2 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 17:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 7:31 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> > +static void __iomap_finish_folio_read(struct folio *folio, size_t off,
> > + size_t len, int error, bool update_bitmap)
> > {
> > struct iomap_folio_state *ifs = folio->private;
> > bool uptodate = !error;
> > @@ -340,7 +340,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> > unsigned long flags;
> >
> > spin_lock_irqsave(&ifs->state_lock, flags);
> > - if (!error)
> > + if (!error && update_bitmap)
> > uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
>
> This code sharing keeps confusing me a bit. I think it's technically
> perfectly fine, but not helpful for readability. We'd solve that by
> open coding the !update_bitmap case in iomap_read_folio_iter, which
> would also allow using spin_lock_irq instead of spin_lock_irqsave there
> as a nice little micro-optimization. If we'd then also get rid of the
> error return from ->read_folio_range and always do asynchronous error
> returns, it would be even simpler.
>
> Or maybe I just need to live with the magic bitmap update, but the
> fact that "len" is sometimes an actual length and sometimes just a
> counter for read_bytes_pending keeps confusing me.
>
I think you're right, this is probably clearer without trying to share
the function.
I think maybe we can make this even simpler. Right now we mark the
bitmap uptodate every time a range is read in, but I think instead we
can just do one bitmap uptodate operation for the entire folio when
the read has completely finished. If we do this, then we can make
"ifs->read_bytes_pending" an atomic_t again, since we no longer save
an atomic operation by doing it under the spinlock (eg what
commit f45b494e2a "iomap: protect read_bytes_pending with the
state_lock" optimized). And then this bias thing can just become:
	if (ifs) {
		if (atomic_dec_and_test(&ifs->read_bytes_pending))
			folio_end_read(folio, !ret);
		*cur_folio_owned = true;
	}
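The submission side would then pair with it like this (just a sketch,
assuming ifs->read_bytes_pending goes back to being an atomic_t,
roughly the pre-f45b494e2a scheme):

static void iomap_start_folio_read(struct folio *folio, size_t len)
{
	struct iomap_folio_state *ifs = folio->private;

	if (ifs)
		atomic_add(len, &ifs->read_bytes_pending);
}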
Thanks,
Joanne
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead
2025-09-11 11:26 ` Christoph Hellwig
@ 2025-09-12 17:36 ` Joanne Koong
0 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 17:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 7:26 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Sep 08, 2025 at 11:51:17AM -0700, Joanne Koong wrote:
> > + - ``read_folio_range``: Called to read in the range (read can be done
> > + synchronously or asynchronously). This must be provided by the caller.
>
> As far as I can tell, the interface is always based on an asynchronous
> operation, but doesn't preclude completing it right away. So the above
> is a little misleading.
>
> > + struct iomap_read_folio_ctx ctx = {
> > + .ops = &iomap_read_bios_ops,
> > + .cur_folio = folio,
> > + };
> >
> > + return iomap_read_folio(&blkdev_iomap_ops, &ctx);
>
> > + struct iomap_read_folio_ctx ctx = {
> > + .ops = &iomap_read_bios_ops,
> > + .rac = rac,
> > + };
> > +
> > + iomap_readahead(&blkdev_iomap_ops, &ctx);
>
> Can you add iomap_bio_read_folio and iomap_bio_readahead inline helpers
> to reduce this boilerplate code duplicated in various file systems?
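Sure, roughly something like this (a sketch):

static inline int iomap_bio_read_folio(struct folio *folio,
		const struct iomap_ops *ops)
{
	struct iomap_read_folio_ctx ctx = {
		.ops		= &iomap_read_bios_ops,
		.cur_folio	= folio,
	};

	return iomap_read_folio(ops, &ctx);
}

static inline void iomap_bio_readahead(struct readahead_control *rac,
		const struct iomap_ops *ops)
{
	struct iomap_read_folio_ctx ctx = {
		.ops	= &iomap_read_bios_ops,
		.rac	= rac,
	};

	iomap_readahead(ops, &ctx);
}

which would shrink those callers to one-liners.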
>
> > -static void iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
> > +static int iomap_submit_read_bio(struct iomap_read_folio_ctx *ctx)
> > {
> > struct bio *bio = ctx->private;
> >
> > if (bio)
> > submit_bio(bio);
> > +
> > + return 0;
>
> Submission interfaces that can return errors both synchronously and
> asynchronously are extremely error prone. I'd be much happier if this
> interface could not return errors.
Sounds great, I will make these changes you suggested here and in your
comments on the other patches too.
Thank you for reviewing this patchset.
>
> > +const struct iomap_read_ops iomap_read_bios_ops = {
> > + .read_folio_range = iomap_read_folio_range_bio_async,
> > + .read_submit = iomap_submit_read_bio,
> > +};
>
> Please use tabs to align struct initializers before the '='.
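Will do, ie:

const struct iomap_read_ops iomap_read_bios_ops = {
	.read_folio_range	= iomap_read_folio_range_bio_async,
	.read_submit		= iomap_submit_read_bio,
};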
>
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-12 1:10 ` Gao Xiang
@ 2025-09-12 19:56 ` Joanne Koong
2025-09-12 20:09 ` Joanne Koong
2025-09-12 23:20 ` Gao Xiang
0 siblings, 2 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 19:56 UTC (permalink / raw)
To: Gao Xiang
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 9:11 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2025/9/12 09:09, Gao Xiang wrote:
> >
> >
> > On 2025/9/12 08:06, Gao Xiang wrote:
> >>
> >>
> >> On 2025/9/12 03:45, Joanne Koong wrote:
> >>> On Thu, Sep 11, 2025 at 8:29 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>
> >>>> But if FUSE or some other fs later needs to request L2P information
> >>>> in their .iomap_begin() and needs to send L2P requests to a userspace
> >>>> daemon to confirm where to get the physical data (maybe somewhat
> >>>> like Darrick's work, but I don't have extra time to dig into that
> >>>> either), rather than something that totally bypasses iomap-L2P logic
> >>>> as above, then I'm not sure the current `iomap_iter->private` is
> >>>> quite separate from `struct iomap_read_folio_ctx->private`; it seems
> >>>
> >>> If in the future this case arises, the L2P mapping info is accessible
> >>> by the read callback in the current design. `.read_folio_range()`
> >>> passes the iomap iter to the filesystem and they can access
> >>> iter->private to get the L2P mapping data they need.
> >>
> >> The question is what gets exposed through `iter->private` then; take
> >> an example:
> >>
> >> ```
> >> struct file *file;
> >> ```
> >>
> >> your .read_folio_range() needs `file->private_data` to get
> >> `struct fuse_file`, so `file` is kept in
> >> `struct iomap_read_folio_ctx`.
> >>
> >> If `file->private_data` will be used for `.iomap_begin()`
> >> as well, what's your proposal then?
> >>
> >> Duplicate the same `file` pointer in both
> >> `struct iomap_read_folio_ctx` and `iter->private` context?
> >
> > It's just a not-so-appropriate example because
> > `struct file *` and `struct fuse_file *` are widely used
> > in the (buffer/direct) read/write flow but Darrick's work
> > doesn't use `file` in .iomap_{begin/end}.
> >
> > But you may find that the `file` pointer is already used for
> > both the FUSE buffer write and your proposal, e.g.
> >
> > buffer write:
> > /*
> > * Use iomap so that we can do granular uptodate reads
> > * and granular dirty tracking for large folios.
> > */
> > written = iomap_file_buffered_write(iocb, from,
> > &fuse_iomap_ops,
> > &fuse_iomap_write_ops,
> > file);
>
> And your buffer write per-fs context seems to just use
> `iter->private` entirely instead to keep `file`.
>
I don’t think the iomap buffered writes interface is a good model to
use here. I looked a bit at some of the other iomap file operations
and I think we should just pass operation-specific data through an
operation-specific context for those too, eg for buffered writes and
dio, modifying the interface from

ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
		const struct iomap_ops *ops,
		const struct iomap_write_ops *write_ops, void *private);
ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
		unsigned int dio_flags, void *private, size_t done_before);

to something like

ssize_t iomap_file_buffered_write(const struct iomap_ops *ops,
		struct iomap_write_folio_ctx *ctx);
ssize_t iomap_dio_rw(const struct iomap_ops *ops,
		struct iomap_dio_ctx *ctx);
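where the write ctx could carry something like this (a purely
hypothetical sketch; the field names are mine, not from the series):

struct iomap_write_folio_ctx {
	struct kiocb			*iocb;
	struct iov_iter			*from;
	const struct iomap_write_ops	*write_ops;
	void				*data;	/* op-specific, eg the fuse file */
};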
There’s one filesystem besides fuse that uses “iter->private”, and
that’s for xfs zoned inodes (xfs_zoned_buffered_write_iomap_begin()),
which passes the struct xfs_zone_alloc_ctx* through iter->private;
afaict it's used for tracking block reservations. Imo that's what
iter->private should be used for: tracking the more high-level
metadata stuff, while the lower-level, operation-specific details go
through the ctx->data fields. That seems the cleanest design to me.
I think we should rename the iter->private field to something like
"iter->metadata" to make that delineation clearer. I'm not sure what
the iomap maintainers think, but that is my opinion.
I think if in the future there is a case/feature which needs something
that previously lived in one of the operation-specific ctxes, it seems
fine to me to have both iter->private and ctx->data point to the same
thing.
Thanks,
Joanne
> >
> >
> > I'm just trying to say that if there is a case/feature which needs
> > something previously in `struct iomap_read_folio_ctx` to
> > be available in .iomap_{begin,end} too, you have to either:
> > - duplicate this in `iter->private` as well;
> > - move this to `iter->private` entirely.
> >
> > The problem is that both `iter->private` and
> > `struct iomap_read_folio_ctx` are filesystem-specific;
> > I can't see a clear boundary for deciding which one something
> > belongs in. It seems just like an artificial choice.
> >
> > Thanks,
> > Gao Xiang
> >
> >>
> >>
> >>>
> >>>> both need fs-specific extra contexts for the same I/O flow.
> >>>>
> >>>> I think the reason why `struct iomap_read_folio_ctx->private` is
> >>>> introduced is basically that previous iomap filesystems are all
> >>>> bio-based, and they share the `bio` concept in common, but
> >>>> `iter->private` was not designed for this usage.
> >>>>
> >>>> But fuse `struct iomap_read_folio_ctx` and
> >>>> `struct fuse_fill_read_data` are too FUSE-specific; I cannot
> >>>> see how they could be shared by other filesystems in the near
> >>>> future. They are much like a single-filesystem-specific concept,
> >>>> and not like `bio` at all.
> >>>
> >>> Currently fuse is the only non-block-based filesystem using iomap but
> >>> I don't see why there wouldn't be more in the future. For example,
> >>> while looking at some of the netfs code, a lot of the core
> >>> functionality looks the same between that and iomap and I think it
> >>> might be a good idea to have netfs in the future use iomap's interface
> >>> so that it can get the large folio dirty/uptodate tracking stuff and
> >>> any other large folio stuff like more granular writeback stats
> >>> accounting for free.
> >>
> >> I think you need to ask David about this idea; I told him to
> >> switch fscache to use iomap in 2022, before netfs was fully out [1],
> >> but I don't see it happening.
> >>
> >> [1] https://lore.kernel.org/linux-fsdevel/YfivxC9S52FlyKoL@B-P7TQMD6M-0146/
> >>
> >> Thanks,
> >> Gao Xiang
> >
>
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-12 19:56 ` Joanne Koong
@ 2025-09-12 20:09 ` Joanne Koong
2025-09-12 23:35 ` Gao Xiang
2025-09-12 23:20 ` Gao Xiang
1 sibling, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-12 20:09 UTC (permalink / raw)
To: Gao Xiang
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Fri, Sep 12, 2025 at 3:56 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> [...]
>
> > >> I think you need to ask David about this idea; I told him to
> > >> switch fscache to use iomap in 2022, before netfs was fully out [1],
> > >> but I don't see it happening.
> > >>
> > >> [1] https://lore.kernel.org/linux-fsdevel/YfivxC9S52FlyKoL@B-P7TQMD6M-0146/
(sorry, I just saw this part of the email, otherwise I would have
included it in the previous message)
Thanks for the link to the thread. My understanding is that the large
folio optimizations were added to iomap in July 2023 (afaict from the
git history), and while iomap is entangled with the block layer, it's
becoming more of a generic interface now. Maybe it makes more sense to
go through iomap's interface now than it did in 2022, but of course
David has the most context on this.
Thanks,
Joanne
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-12 19:56 ` Joanne Koong
2025-09-12 20:09 ` Joanne Koong
@ 2025-09-12 23:20 ` Gao Xiang
1 sibling, 0 replies; 60+ messages in thread
From: Gao Xiang @ 2025-09-12 23:20 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On 2025/9/13 03:56, Joanne Koong wrote:
> [...]
>
> There’s one filesystem besides fuse that uses “iter->private”, and
> that’s for xfs zoned inodes (xfs_zoned_buffered_write_iomap_begin()),
> which passes the struct xfs_zone_alloc_ctx* through iter->private;
> afaict it's used for tracking block reservations. Imo that's what
> iter->private should be used for: tracking the more high-level
> metadata stuff, while the lower-level, operation-specific details go
> through the ctx->data fields. That seems the cleanest design to me.
> I think we should rename the iter->private field to something like
> "iter->metadata" to make that delineation clearer. I'm not sure what
> the iomap maintainers think, but that is my opinion.
In short, I don't think new "low-level" and "high-level" concepts are
really useful even for disk fses.
>
> I think if in the future there is a case/feature which needs something
> that previously lived in one of the operation-specific ctxes, it seems
> fine to me to have both iter->private and ctx->data point to the same
> thing.
>
I want to stop this topic here; it's totally up to the iomap
maintainers to decide what future iomap looks like, but I still keep
my strong reservations (which you can ignore) based on my own code
design taste.
Thanks,
Gao Xiang
>
> Thanks,
> Joanne
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-12 20:09 ` Joanne Koong
@ 2025-09-12 23:35 ` Gao Xiang
0 siblings, 0 replies; 60+ messages in thread
From: Gao Xiang @ 2025-09-12 23:35 UTC (permalink / raw)
To: Joanne Koong, Darrick J. Wong, Christoph Hellwig
Cc: brauner, miklos, linux-block, gfs2, linux-fsdevel, kernel-team,
linux-xfs, linux-doc
On 2025/9/13 04:09, Joanne Koong wrote:
> On Fri, Sep 12, 2025 at 3:56 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
...
>>>>> but I don't see it will happen.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-fsdevel/YfivxC9S52FlyKoL@B-P7TQMD6M-0146/
>
> (sorry, just saw this part of the email otherwise I would have
> included this in the previous message)
>
> Thanks for the link to the thread. My understanding is that the large
> folio optimizations stuff was added to iomap in July 2023 (afaict from
> the git history) and iomap is entangled with the block layer but it's
> becoming more of a generic interface now. Maybe now it makes sense to
> go through iomap's interface than it did in 2022, but of course David
> has the most context on this.
Again, I really think the iomap callback model is not a good design,
especially as it becomes a more generic thing, and it seems inflexible
compared with other interfaces like the page cache (which also has
callbacks, but many of them are just a few entry points into the IO
flows) or the bio kAPIs.
As in the previous example, network filesystems generally don't need
any L2P logic (in principle, FUSE is more like a network filesystem),
but they still have to implement those iomap dummy callbacks and
ignore `iomap->addr`.
As for per-block dirty/uptodate tracking, that is just an atomic
feature to manage sub-folio metadata, but iomap originally grew out
of XFS and is basically the standard flow for disk/pmem fses.
I really think better generic interfaces are like lego bricks, so
that filesystems can optionally use any of those atomic features
instead of calling into iomap's maze-like {read,write,writeback}
helpers and doing different work in the callback hooks (which not
all filesystems even need).
I've mentioned this too in
https://lore.kernel.org/r/d631c71f-9d0d-405f-862d-b881767b1945@linux.alibaba.com
https://lore.kernel.org/r/20250905152118.GE1587915@frogsfrogsfrogs
Thanks,
Gao Xiang
* Re: [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller
2025-09-12 16:28 ` Joanne Koong
@ 2025-09-15 16:05 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-15 16:05 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, hsiangkao,
linux-block, gfs2, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Sep 12, 2025 at 12:28:02PM -0400, Joanne Koong wrote:
> I'll drop this. I interpreted Matthew's comment to mean the error
> return isn't useful for ->readahead but is for ->read_folio.
>
> If iomap_read_folio() doesn't do error returns and always just returns
> 0, then maybe we should just make it `void iomap_read_folio(...)`
> instead of `int iomap_read_folio(...)` then.
Yes, more void returns also really help to simplify the code flow.
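Ie something like (a sketch):

void iomap_read_folio(const struct iomap_ops *ops,
		struct iomap_read_folio_ctx *ctx);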
* Re: [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-12 17:30 ` Joanne Koong
@ 2025-09-15 16:05 ` Christoph Hellwig
2025-09-16 19:14 ` Joanne Koong
1 sibling, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-15 16:05 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, hsiangkao,
linux-block, gfs2, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Sep 12, 2025 at 01:30:35PM -0400, Joanne Koong wrote:
> I think you're right, this is probably clearer without trying to share
> the function.
>
> I think maybe we can make this even simpler. Right now we mark the
> bitmap uptodate every time a range is read in, but instead we could
> just do one bitmap uptodate operation for the entire folio when the
> read has completely finished. If we do this, then we can turn
> "ifs->read_bytes_pending" back into an atomic_t, since we no longer
> save an atomic operation by updating it under the spinlock (eg what
> commit f45b494e2a "iomap: protect read_bytes_pending with the
> state_lock" optimized). And then this bias thing can just become:
That's a really good idea, and also gets us closer to the write side.
* Re: [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-12 17:30 ` Joanne Koong
2025-09-15 16:05 ` Christoph Hellwig
@ 2025-09-16 19:14 ` Joanne Koong
2025-09-19 15:04 ` Christoph Hellwig
1 sibling, 1 reply; 60+ messages in thread
From: Joanne Koong @ 2025-09-16 19:14 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Fri, Sep 12, 2025 at 10:30 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Sep 11, 2025 at 7:31 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > > +static void __iomap_finish_folio_read(struct folio *folio, size_t off,
> > > + size_t len, int error, bool update_bitmap)
> > > {
> > > struct iomap_folio_state *ifs = folio->private;
> > > bool uptodate = !error;
> > > @@ -340,7 +340,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> > > unsigned long flags;
> > >
> > > spin_lock_irqsave(&ifs->state_lock, flags);
> > > - if (!error)
> > > + if (!error && update_bitmap)
> > > uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> >
> > This code sharing keeps confusing me a bit. I think it's technically
> > perfectly fine, but not helpful for readability. We'd solve that by
> > open coding the !update_bitmap case in iomap_read_folio_iter. Which
> > would also allow using spin_lock_irq instead of spin_lock_irqsave there
> > as a nice little micro-optimization. If we'd then also get rid of the
> > error return from ->read_folio_range and always do asynchronous error
> > returns it would be even simpler.
> >
> > Or maybe I just need to live with the magic bitmap update, but the
> > fact that "len" sometimes is an actual length, and sometimes just a
> > counter for read_bytes_pending keeps confusing me
> >
>
> I think you're right, this is probably clearer without trying to share
> the function.
>
> I think maybe we can make this even simpler. Right now we mark the
> bitmap uptodate every time a range is read in, but instead we could
> just do one bitmap uptodate operation for the entire folio when the
> read has completely finished. If we do this, then we can turn
> "ifs->read_bytes_pending" back into an atomic_t, since we no longer
> save an atomic operation by updating it under the spinlock (eg what
> commit f45b494e2a "iomap: protect read_bytes_pending with the
> state_lock" optimized). And then this bias thing can just become:
>
> if (ifs) {
> 	if (atomic_dec_and_test(&ifs->read_bytes_pending))
> 		folio_end_read(folio, !ret);
> 	*cur_folio_owned = true;
> }
>
This idea doesn't work, unfortunately, because reading in a range
might fail. I'll change this to open-code the !update_bitmap case
with spin_lock_irq, like Christoph suggested.
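Ie roughly (a sketch; `bias` and `ret` come from the surrounding
submission loop):

if (ifs) {
	bool finished;

	/* open-coded bias drop: no bytes were read here, so there is
	 * no uptodate bitmap update */
	spin_lock_irq(&ifs->state_lock);
	ifs->read_bytes_pending -= bias;
	finished = !ifs->read_bytes_pending;
	spin_unlock_irq(&ifs->state_lock);
	if (finished)
		folio_end_read(folio, !ret);
	*cur_folio_owned = true;
}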
* Re: [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-09-11 11:44 ` Christoph Hellwig
@ 2025-09-16 23:23 ` Joanne Koong
0 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-16 23:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 11, 2025 at 4:44 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Sep 08, 2025 at 11:51:19AM -0700, Joanne Koong wrote:
> > There is no longer a dependency on CONFIG_BLOCK in the iomap read and
> > readahead logic. Move this logic out of the CONFIG_BLOCK guard. This
> > allows non-block-based filesystems to use iomap for reads/readahead.
>
> Please move the bio code into a new file. Example patch attached below
> that does just that without addressing any of the previous comments:
>
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index f7e1c8534c46..a572b8808524 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -14,5 +14,6 @@ iomap-y += trace.o \
> iomap-$(CONFIG_BLOCK) += direct-io.o \
> ioend.o \
> fiemap.o \
> - seek.o
> + seek.o \
> + bio.o
> iomap-$(CONFIG_SWAP) += swapfile.o
...
The version of this for v3 is pretty much exactly what you wrote. I'll
add a Signed-off-by attributing the patch to you when I send it out.
Thanks,
Joanne
* Re: [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-16 19:14 ` Joanne Koong
@ 2025-09-19 15:04 ` Christoph Hellwig
2025-09-19 17:58 ` Joanne Koong
0 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-09-19 15:04 UTC (permalink / raw)
To: Joanne Koong
Cc: Christoph Hellwig, brauner, miklos, djwong, hsiangkao,
linux-block, gfs2, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Tue, Sep 16, 2025 at 12:14:05PM -0700, Joanne Koong wrote:
> > I think you're right, this is probably clearer without trying to share
> > the function.
> >
> > I think maybe we can make this even simpler. Right now we mark the
> > bitmap uptodate every time a range is read in, but instead we could
> > just do one bitmap uptodate operation for the entire folio when the
> > read has completely finished. If we do this, then we can turn
> > "ifs->read_bytes_pending" back into an atomic_t, since we no longer
> > save an atomic operation by updating it under the spinlock (eg what
> > commit f45b494e2a "iomap: protect read_bytes_pending with the
> > state_lock" optimized). And then this bias thing can just become:
> >
> > if (ifs) {
> > 	if (atomic_dec_and_test(&ifs->read_bytes_pending))
> > 		folio_end_read(folio, !ret);
> > 	*cur_folio_owned = true;
> > }
> >
>
> This idea doesn't work unfortunately because reading in a range might fail.
As in the asynchronous read generates an error, but finishes faster
than the submitting context calling the atomic_dec_and_test here?
Yes, that is possible, although rare. But we'd need a way to pass
that information on somehow. PG_uptodate/folio uptodate would make
sense for that, but right now we expect folio_end_read to set that.
And I fail to understand the logic of folio_end_read - it should clear
the locked bit and add the uptodate one, but I have no idea how
it makes that happen.
* Re: [PATCH v2 12/16] iomap: add bias for async read requests
2025-09-19 15:04 ` Christoph Hellwig
@ 2025-09-19 17:58 ` Joanne Koong
0 siblings, 0 replies; 60+ messages in thread
From: Joanne Koong @ 2025-09-19 17:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brauner, miklos, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Fri, Sep 19, 2025 at 8:04 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Sep 16, 2025 at 12:14:05PM -0700, Joanne Koong wrote:
> > > I think you're right, this is probably clearer without trying to share
> > > the function.
> > >
> > > I think maybe we can make this even simpler. Right now we mark the
> > > bitmap uptodate every time a range is read in, but instead we could
> > > just do one bitmap uptodate operation for the entire folio when the
> > > read has completely finished. If we do this, then we can turn
> > > "ifs->read_bytes_pending" back into an atomic_t, since we no longer
> > > save an atomic operation by updating it under the spinlock (eg what
> > > commit f45b494e2a "iomap: protect read_bytes_pending with the
> > > state_lock" optimized). And then this bias thing can just become:
> > >
> > > if (ifs) {
> > > 	if (atomic_dec_and_test(&ifs->read_bytes_pending))
> > > 		folio_end_read(folio, !ret);
> > > 	*cur_folio_owned = true;
> > > }
> > >
> >
> > This idea doesn't work unfortunately because reading in a range might fail.
>
> As in the asynchronous read generates an error, but finishes faster
> than the submitting context calling the atomic_dec_and_test here?
>
> Yes, that is possible, although rare. But we'd need a way to pass
> that information on somehow. PG_uptodate/folio uptodate would make
We could use the upper bit of read_bytes_pending to track whether any
error occurred, but that still means that if there's an error, we'd
miss marking the ranges that were successfully read in as uptodate.
But that's a great point; maybe that's fine, since that should not
happen often anyway.
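Eg the encoding could be roughly (a sketch; the bit choice and names
are mine):

/* hypothetical: bit 31 latches an error, bits 0-30 count pending bytes */
#define IOMAP_READ_ERROR	(1U << 31)

if (error)
	atomic_or(IOMAP_READ_ERROR, &ifs->read_bytes_pending);
/* whichever completion drops the byte count to zero sees any error
 * latched before it */
v = atomic_sub_return(len, &ifs->read_bytes_pending);
if (!(v & ~IOMAP_READ_ERROR))
	folio_end_read(folio, !(v & IOMAP_READ_ERROR));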
> sense for that, but right now we expect folio_end_read to set that.
> And I fail to understand the logic of folio_end_read - it should clear
> the locked bit and add the uptodate one, but I have no idea how
> it makes that happen.
>
I think that happens in the folio_xor_flags_has_waiters() ->
xor_unlock_is_negative_byte() call, which does an xor using the
PG_locked/PG_uptodate mask, but the naming of the function kind of
obfuscates that.
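For reference, folio_end_read() is roughly the following (simplified
from mm/filemap.c, so double-check me):

void folio_end_read(struct folio *folio, bool success)
{
	/* PG_locked is set and PG_uptodate is clear on entry, so one
	 * xor clears the lock bit and, on success, sets the uptodate
	 * bit in a single atomic op */
	unsigned long mask = 1 << PG_locked;

	if (likely(success))
		mask |= 1 << PG_uptodate;
	if (folio_xor_flags_has_waiters(folio, mask))
		folio_wake_bit(folio, PG_locked);
}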
Thread overview: 60+ messages
2025-09-08 18:51 [PATCH v2 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-08 18:51 ` [PATCH v2 01/16] iomap: move async bio read logic into helper function Joanne Koong
2025-09-11 11:09 ` Christoph Hellwig
2025-09-12 16:01 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 02/16] iomap: move read/readahead bio submission " Joanne Koong
2025-09-11 11:09 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 03/16] iomap: rename cur_folio_in_bio to folio_owned Joanne Koong
2025-09-11 11:10 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 04/16] iomap: store read/readahead bio generically Joanne Koong
2025-09-11 11:11 ` Christoph Hellwig
2025-09-12 16:10 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
2025-09-11 11:13 ` Christoph Hellwig
2025-09-12 16:28 ` Joanne Koong
2025-09-15 16:05 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 06/16] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
2025-09-11 11:15 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 07/16] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
2025-09-11 11:16 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 08/16] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
2025-09-11 11:16 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 09/16] iomap: add public start/finish folio read helpers Joanne Koong
2025-09-11 11:16 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 10/16] iomap: make iomap_read_folio_ctx->folio_owned internal Joanne Koong
2025-09-11 11:17 ` Christoph Hellwig
2025-09-08 18:51 ` [PATCH v2 11/16] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-09 0:14 ` Gao Xiang
2025-09-09 0:40 ` Gao Xiang
2025-09-09 15:24 ` Joanne Koong
2025-09-09 23:21 ` Gao Xiang
2025-09-10 17:41 ` Joanne Koong
2025-09-11 11:19 ` Christoph Hellwig
2025-09-11 11:26 ` Christoph Hellwig
2025-09-12 17:36 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 12/16] iomap: add bias for async read requests Joanne Koong
2025-09-11 11:31 ` Christoph Hellwig
2025-09-12 17:30 ` Joanne Koong
2025-09-15 16:05 ` Christoph Hellwig
2025-09-16 19:14 ` Joanne Koong
2025-09-19 15:04 ` Christoph Hellwig
2025-09-19 17:58 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 13/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
2025-09-09 2:14 ` Gao Xiang
2025-09-09 15:33 ` Joanne Koong
2025-09-10 4:59 ` Gao Xiang
2025-09-11 11:37 ` Christoph Hellwig
2025-09-11 12:29 ` Gao Xiang
2025-09-11 19:45 ` Joanne Koong
2025-09-12 0:06 ` Gao Xiang
2025-09-12 1:09 ` Gao Xiang
2025-09-12 1:10 ` Gao Xiang
2025-09-12 19:56 ` Joanne Koong
2025-09-12 20:09 ` Joanne Koong
2025-09-12 23:35 ` Gao Xiang
2025-09-12 23:20 ` Gao Xiang
2025-09-11 11:44 ` Christoph Hellwig
2025-09-16 23:23 ` Joanne Koong
2025-09-08 18:51 ` [PATCH v2 14/16] fuse: use iomap for read_folio Joanne Koong
2025-09-08 18:51 ` [PATCH v2 15/16] fuse: use iomap for readahead Joanne Koong
2025-09-08 18:51 ` [PATCH v2 16/16] fuse: remove fc->blkbits workaround for partial writes Joanne Koong