* [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead
@ 2025-08-29 23:56 Joanne Koong
2025-08-29 23:56 ` [PATCH v1 01/16] iomap: move async bio read logic into helper function Joanne Koong
` (15 more replies)
0 siblings, 16 replies; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
This series adds fuse iomap support for buffered reads and readahead.
This is needed so that granular uptodate tracking can be used in fuse
when large folios are enabled, so that only the needed portions of a
folio are read in instead of the entire folio. It is also needed in
order to turn on large folios for servers that use the writeback cache:
without it there is a race condition that may lead to data corruption
when a partial write is followed by a read that happens before the
write has undergone writeback. The partial write does not mark the
folio uptodate, so the read reads in the entire folio from disk, which
overwrites the partial write.
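Concretely, the race looks like this (assuming a multi-block large
folio and no granular uptodate tracking):
  1. A partial write dirties part of the folio. The folio is not marked
     uptodate because only part of it was written.
  2. Before the dirty data has been written back, a read comes in. The
     folio is not uptodate, so the entire folio is read in from disk.
  3. The data read in lands on top of the not-yet-written-back bytes
     from step 1, overwriting the partial write.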
Part of this work is modifying the iomap interface to support non-bio
reads and to work in environments that do not have CONFIG_BLOCK
enabled, which is what patches 1 to 6 do.
This is on top of commit 4f702205 ("Merge branch 'vfs-6.18.rust' into
vfs.all") in Christian's vfs tree.
This series was run through fstests on fuse passthrough_hp with an
out-of-tree kernel patch enabling fuse large folios.
This patchset does not enable large folios on fuse yet. That will be part
of a different patchset.
Thanks,
Joanne
Joanne Koong (16):
iomap: move async bio read logic into helper function
iomap: rename cur_folio_in_bio to folio_unlocked
iomap: refactor read/readahead completion
iomap: use iomap_iter->private for stashing read/readahead bio
iomap: propagate iomap_read_folio() error to caller
iomap: move read/readahead logic out of CONFIG_BLOCK guard
iomap: iterate through entire folio in iomap_readpage_iter()
iomap: rename iomap_readpage_iter() to iomap_readfolio_iter()
iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx
iomap: add iomap_start_folio_read() helper
iomap: make start folio read and finish folio read public APIs
iomap: add iomap_read_ops for read and readahead
iomap: add a private arg for read and readahead
fuse: use iomap for read_folio
fuse: use iomap for readahead
fuse: remove fuse_readpages_end() null mapping check
.../filesystems/iomap/operations.rst | 19 ++
block/fops.c | 4 +-
fs/erofs/data.c | 4 +-
fs/fuse/file.c | 298 +++++++++-------
fs/gfs2/aops.c | 4 +-
fs/iomap/buffered-io.c | 321 +++++++++++-------
fs/xfs/xfs_aops.c | 4 +-
fs/zonefs/file.c | 4 +-
include/linux/iomap.h | 24 +-
9 files changed, 412 insertions(+), 270 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v1 01/16] iomap: move async bio read logic into helper function
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:16 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked Joanne Koong
` (14 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Move the iomap_readpage_iter() async bio read logic into a separate
helper function. This is needed to make iomap read/readahead more
generically usable, especially for filesystems that do not require
CONFIG_BLOCK.
Rename iomap_read_folio_range() to iomap_read_folio_range_sync() to
differentiate between the synchronous and asynchronous bio folio read
calls.
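After this change the two helpers look like this (signatures as in the
diff below):

	/* async: adds the range to a read bio that is submitted later */
	static void iomap_read_folio_range_async(const struct iomap_iter *iter,
			struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen);

	/* sync: builds an on-stack bio and waits for it to complete */
	static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
			struct folio *folio, loff_t pos, size_t len);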
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 68 ++++++++++++++++++++++++------------------
1 file changed, 39 insertions(+), 29 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fd827398afd2..f8bdb2428819 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -357,36 +357,15 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
-static int iomap_readpage_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+static void iomap_read_folio_range_async(const struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
{
+ struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
- loff_t pos = iter->pos;
+ struct iomap_folio_state *ifs = folio->private;
+ size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
- struct folio *folio = ctx->cur_folio;
- struct iomap_folio_state *ifs;
- size_t poff, plen;
sector_t sector;
- int ret;
-
- if (iomap->type == IOMAP_INLINE) {
- ret = iomap_read_inline_data(iter, folio);
- if (ret)
- return ret;
- return iomap_iter_advance(iter, &length);
- }
-
- /* zero post-eof blocks as the page may be mapped */
- ifs = ifs_alloc(iter->inode, folio, iter->flags);
- iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
- if (plen == 0)
- goto done;
-
- if (iomap_block_needs_zeroing(iter, pos)) {
- folio_zero_range(folio, poff, plen);
- iomap_set_range_uptodate(folio, poff, plen);
- goto done;
- }
ctx->cur_folio_in_bio = true;
if (ifs) {
@@ -425,6 +404,37 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
ctx->bio->bi_end_io = iomap_read_end_io;
bio_add_folio_nofail(ctx->bio, folio, plen, poff);
}
+}
+
+static int iomap_readpage_iter(struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx)
+{
+ const struct iomap *iomap = &iter->iomap;
+ loff_t pos = iter->pos;
+ loff_t length = iomap_length(iter);
+ struct folio *folio = ctx->cur_folio;
+ size_t poff, plen;
+ int ret;
+
+ if (iomap->type == IOMAP_INLINE) {
+ ret = iomap_read_inline_data(iter, folio);
+ if (ret)
+ return ret;
+ return iomap_iter_advance(iter, &length);
+ }
+
+ /* zero post-eof blocks as the page may be mapped */
+ ifs_alloc(iter->inode, folio, iter->flags);
+ iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
+ if (plen == 0)
+ goto done;
+
+ if (iomap_block_needs_zeroing(iter, pos)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ } else {
+ iomap_read_folio_range_async(iter, ctx, pos, plen);
+ }
done:
/*
@@ -549,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
}
EXPORT_SYMBOL_GPL(iomap_readahead);
-static int iomap_read_folio_range(const struct iomap_iter *iter,
+static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
struct folio *folio, loff_t pos, size_t len)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
@@ -562,7 +572,7 @@ static int iomap_read_folio_range(const struct iomap_iter *iter,
return submit_bio_wait(&bio);
}
#else
-static int iomap_read_folio_range(const struct iomap_iter *iter,
+static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
struct folio *folio, loff_t pos, size_t len)
{
WARN_ON_ONCE(1);
@@ -739,7 +749,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
status = write_ops->read_folio_range(iter,
folio, block_start, plen);
else
- status = iomap_read_folio_range(iter,
+ status = iomap_read_folio_range_sync(iter,
folio, block_start, plen);
if (status)
return status;
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-08-29 23:56 ` [PATCH v1 01/16] iomap: move async bio read logic into helper function Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:26 ` [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 03/16] iomap: refactor read/readahead completion Joanne Koong
` (13 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
The purpose of struct iomap_readpage_ctx's cur_folio_in_bio is to track
whether the folio still needs to be unlocked. Rename it to
folio_unlocked to make the purpose clearer and so that, when the iomap
read/readahead logic is made generic, the name also makes sense for
filesystems that don't use bios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f8bdb2428819..4b173aad04ed 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -352,7 +352,7 @@ static void iomap_read_end_io(struct bio *bio)
struct iomap_readpage_ctx {
struct folio *cur_folio;
- bool cur_folio_in_bio;
+ bool folio_unlocked;
struct bio *bio;
struct readahead_control *rac;
};
@@ -367,7 +367,7 @@ static void iomap_read_folio_range_async(const struct iomap_iter *iter,
loff_t length = iomap_length(iter);
sector_t sector;
- ctx->cur_folio_in_bio = true;
+ ctx->folio_unlocked = true;
if (ifs) {
spin_lock_irq(&ifs->state_lock);
ifs->read_bytes_pending += plen;
@@ -480,9 +480,9 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
if (ctx.bio) {
submit_bio(ctx.bio);
- WARN_ON_ONCE(!ctx.cur_folio_in_bio);
+ WARN_ON_ONCE(!ctx.folio_unlocked);
} else {
- WARN_ON_ONCE(ctx.cur_folio_in_bio);
+ WARN_ON_ONCE(ctx.folio_unlocked);
folio_unlock(folio);
}
@@ -503,13 +503,13 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
while (iomap_length(iter)) {
if (ctx->cur_folio &&
offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
- if (!ctx->cur_folio_in_bio)
+ if (!ctx->folio_unlocked)
folio_unlock(ctx->cur_folio);
ctx->cur_folio = NULL;
}
if (!ctx->cur_folio) {
ctx->cur_folio = readahead_folio(ctx->rac);
- ctx->cur_folio_in_bio = false;
+ ctx->folio_unlocked = false;
}
ret = iomap_readpage_iter(iter, ctx);
if (ret)
@@ -552,10 +552,8 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
if (ctx.bio)
submit_bio(ctx.bio);
- if (ctx.cur_folio) {
- if (!ctx.cur_folio_in_bio)
- folio_unlock(ctx.cur_folio);
- }
+ if (ctx.cur_folio && !ctx.folio_unlocked)
+ folio_unlock(ctx.cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 03/16] iomap: refactor read/readahead completion
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-08-29 23:56 ` [PATCH v1 01/16] iomap: move async bio read logic into helper function Joanne Koong
2025-08-29 23:56 ` [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-08-29 23:56 ` [PATCH v1 04/16] iomap: use iomap_iter->private for stashing read/readahead bio Joanne Koong
` (12 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Refactor the read/readahead completion logic into two new functions,
iomap_readfolio_complete() and iomap_readfolio_submit(). This is a step
towards making iomap read/readahead generic so that the code can later
be moved out of the CONFIG_BLOCK guard.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4b173aad04ed..f2bfb3e17bb0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -447,6 +447,20 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
return iomap_iter_advance(iter, &length);
}
+static void iomap_readfolio_submit(const struct iomap_readpage_ctx *ctx)
+{
+ if (ctx->bio)
+ submit_bio(ctx->bio);
+}
+
+static void iomap_readfolio_complete(const struct iomap_readpage_ctx *ctx)
+{
+ iomap_readfolio_submit(ctx);
+
+ if (ctx->cur_folio && !ctx->folio_unlocked)
+ folio_unlock(ctx->cur_folio);
+}
+
static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
@@ -478,13 +492,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.status = iomap_read_folio_iter(&iter, &ctx);
- if (ctx.bio) {
- submit_bio(ctx.bio);
- WARN_ON_ONCE(!ctx.folio_unlocked);
- } else {
- WARN_ON_ONCE(ctx.folio_unlocked);
- folio_unlock(folio);
- }
+ iomap_readfolio_complete(&ctx);
/*
* Just like mpage_readahead and block_read_full_folio, we always
@@ -550,10 +558,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
while (iomap_iter(&iter, ops) > 0)
iter.status = iomap_readahead_iter(&iter, &ctx);
- if (ctx.bio)
- submit_bio(ctx.bio);
- if (ctx.cur_folio && !ctx.folio_unlocked)
- folio_unlock(ctx.cur_folio);
+ iomap_readfolio_complete(&ctx);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 04/16] iomap: use iomap_iter->private for stashing read/readahead bio
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (2 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 03/16] iomap: refactor read/readahead completion Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:30 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
` (11 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Use the iomap_iter->private field for stashing any read/readahead bios
instead of defining the bio as part of the iomap_readpage_ctx struct.
This makes the read/readahead interface more generic, since some
filesystems that will be using iomap for read/readahead may not have
CONFIG_BLOCK set.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 49 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 24 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f2bfb3e17bb0..9db233a4a82c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -353,11 +353,10 @@ static void iomap_read_end_io(struct bio *bio)
struct iomap_readpage_ctx {
struct folio *cur_folio;
bool folio_unlocked;
- struct bio *bio;
struct readahead_control *rac;
};
-static void iomap_read_folio_range_async(const struct iomap_iter *iter,
+static void iomap_read_folio_range_async(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
{
struct folio *folio = ctx->cur_folio;
@@ -365,6 +364,7 @@ static void iomap_read_folio_range_async(const struct iomap_iter *iter,
struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
+ struct bio *bio = iter->private;
sector_t sector;
ctx->folio_unlocked = true;
@@ -375,34 +375,32 @@ static void iomap_read_folio_range_async(const struct iomap_iter *iter,
}
sector = iomap_sector(iomap, pos);
- if (!ctx->bio ||
- bio_end_sector(ctx->bio) != sector ||
- !bio_add_folio(ctx->bio, folio, plen, poff)) {
+ if (!bio || bio_end_sector(bio) != sector ||
+ !bio_add_folio(bio, folio, plen, poff)) {
gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
gfp_t orig_gfp = gfp;
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
- if (ctx->bio)
- submit_bio(ctx->bio);
+ if (bio)
+ submit_bio(bio);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
- ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
+ bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+ REQ_OP_READ, gfp);
/*
* If the bio_alloc fails, try it again for a single page to
* avoid having to deal with partial page reads. This emulates
* what do_mpage_read_folio does.
*/
- if (!ctx->bio) {
- ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
- orig_gfp);
- }
+ if (!bio)
+ bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
+ iter->private = bio;
if (ctx->rac)
- ctx->bio->bi_opf |= REQ_RAHEAD;
- ctx->bio->bi_iter.bi_sector = sector;
- ctx->bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(ctx->bio, folio, plen, poff);
+ bio->bi_opf |= REQ_RAHEAD;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_end_io = iomap_read_end_io;
+ bio_add_folio_nofail(bio, folio, plen, poff);
}
}
@@ -447,15 +445,18 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
return iomap_iter_advance(iter, &length);
}
-static void iomap_readfolio_submit(const struct iomap_readpage_ctx *ctx)
+static void iomap_readfolio_submit(const struct iomap_iter *iter)
{
- if (ctx->bio)
- submit_bio(ctx->bio);
+ struct bio *bio = iter->private;
+
+ if (bio)
+ submit_bio(bio);
}
-static void iomap_readfolio_complete(const struct iomap_readpage_ctx *ctx)
+static void iomap_readfolio_complete(const struct iomap_iter *iter,
+ const struct iomap_readpage_ctx *ctx)
{
- iomap_readfolio_submit(ctx);
+ iomap_readfolio_submit(iter);
if (ctx->cur_folio && !ctx->folio_unlocked)
folio_unlock(ctx->cur_folio);
@@ -492,7 +493,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.status = iomap_read_folio_iter(&iter, &ctx);
- iomap_readfolio_complete(&ctx);
+ iomap_readfolio_complete(&iter, &ctx);
/*
* Just like mpage_readahead and block_read_full_folio, we always
@@ -558,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
while (iomap_iter(&iter, ops) > 0)
iter.status = iomap_readahead_iter(&iter, &ctx);
- iomap_readfolio_complete(&ctx);
+ iomap_readfolio_complete(&iter, &ctx);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 05/16] iomap: propagate iomap_read_folio() error to caller
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (3 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 04/16] iomap: use iomap_iter->private for stashing read/readahead bio Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:32 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 06/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
` (10 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Propagate any error encountered in iomap_read_folio() back up to its
caller (otherwise a default -EIO will be passed up to callers by
filemap_read_folio()). This matches how other filesystems handle their
->read_folio() errors.
Remove the out of date comment about setting the folio error flag.
Folio error flags were removed in commit 1f56eedf7ff7 ("iomap:
Remove calls to set and clear folio error flag").
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9db233a4a82c..8dd26c50e5ea 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -495,12 +495,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
iomap_readfolio_complete(&iter, &ctx);
- /*
- * Just like mpage_readahead and block_read_full_folio, we always
- * return 0 and just set the folio error flag on errors. This
- * should be cleaned up throughout the stack eventually.
- */
- return 0;
+ return ret;
}
EXPORT_SYMBOL_GPL(iomap_read_folio);
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 06/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (4 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-08-29 23:56 ` [PATCH v1 07/16] iomap: iterate through entire folio in iomap_readpage_iter() Joanne Koong
` (9 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
There is no longer a dependency on CONFIG_BLOCK in the iomap read and
readahead logic. Move this logic out of the CONFIG_BLOCK guard.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 81 ++++++++++++++++++++++++------------------
1 file changed, 46 insertions(+), 35 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8dd26c50e5ea..f26544fbcb36 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -317,6 +317,12 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
return 0;
}
+struct iomap_readpage_ctx {
+ struct folio *cur_folio;
+ bool folio_unlocked;
+ struct readahead_control *rac;
+};
+
#ifdef CONFIG_BLOCK
static void iomap_finish_folio_read(struct folio *folio, size_t off,
size_t len, int error)
@@ -350,12 +356,6 @@ static void iomap_read_end_io(struct bio *bio)
bio_put(bio);
}
-struct iomap_readpage_ctx {
- struct folio *cur_folio;
- bool folio_unlocked;
- struct readahead_control *rac;
-};
-
static void iomap_read_folio_range_async(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
{
@@ -404,6 +404,46 @@ static void iomap_read_folio_range_async(struct iomap_iter *iter,
}
}
+static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
+{
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
+ struct bio_vec bvec;
+ struct bio bio;
+
+ bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
+ bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
+ bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
+ return submit_bio_wait(&bio);
+}
+
+static void iomap_readfolio_submit(const struct iomap_iter *iter)
+{
+ struct bio *bio = iter->private;
+
+ if (bio)
+ submit_bio(bio);
+}
+#else
+static void iomap_read_folio_range_async(struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx, loff_t pos, size_t len)
+{
+ WARN_ON_ONCE(1);
+}
+
+static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
+{
+ WARN_ON_ONCE(1);
+ return -EIO;
+}
+
+static void iomap_readfolio_submit(const struct iomap_iter *iter)
+{
+ WARN_ON_ONCE(1);
+}
+#endif /* CONFIG_BLOCK */
+
static int iomap_readpage_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
@@ -445,14 +485,6 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
return iomap_iter_advance(iter, &length);
}
-static void iomap_readfolio_submit(const struct iomap_iter *iter)
-{
- struct bio *bio = iter->private;
-
- if (bio)
- submit_bio(bio);
-}
-
static void iomap_readfolio_complete(const struct iomap_iter *iter,
const struct iomap_readpage_ctx *ctx)
{
@@ -558,27 +590,6 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
}
EXPORT_SYMBOL_GPL(iomap_readahead);
-static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
- struct bio_vec bvec;
- struct bio bio;
-
- bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
- bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
- bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
- return submit_bio_wait(&bio);
-}
-#else
-static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- WARN_ON_ONCE(1);
- return -EIO;
-}
-#endif /* CONFIG_BLOCK */
-
/*
* iomap_is_partially_uptodate checks whether blocks within a folio are
* uptodate or not.
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 07/16] iomap: iterate through entire folio in iomap_readpage_iter()
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (5 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 06/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:43 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 08/16] iomap: rename iomap_readpage_iter() to iomap_readfolio_iter() Joanne Koong
` (8 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Iterate through the entire folio in iomap_readpage_iter() in one go
instead of in pieces. This will be needed for supporting user-provided
async read folio callbacks (not yet added). This additionally makes the
iomap_readahead_iter() logic simpler to follow.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 76 ++++++++++++++++++------------------------
1 file changed, 33 insertions(+), 43 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f26544fbcb36..75bbef386b62 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -452,6 +452,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
loff_t length = iomap_length(iter);
struct folio *folio = ctx->cur_folio;
size_t poff, plen;
+ loff_t count;
int ret;
if (iomap->type == IOMAP_INLINE) {
@@ -463,26 +464,30 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
/* zero post-eof blocks as the page may be mapped */
ifs_alloc(iter->inode, folio, iter->flags);
- iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
- if (plen == 0)
- goto done;
- if (iomap_block_needs_zeroing(iter, pos)) {
- folio_zero_range(folio, poff, plen);
- iomap_set_range_uptodate(folio, poff, plen);
- } else {
- iomap_read_folio_range_async(iter, ctx, pos, plen);
- }
+ length = min_t(loff_t, length,
+ folio_size(folio) - offset_in_folio(folio, pos));
+ while (length) {
+ iomap_adjust_read_range(iter->inode, folio, &pos,
+ length, &poff, &plen);
+ count = pos - iter->pos + plen;
+ if (plen == 0)
+ return iomap_iter_advance(iter, &count);
-done:
- /*
- * Move the caller beyond our range so that it keeps making progress.
- * For that, we have to include any leading non-uptodate ranges, but
- * we can skip trailing ones as they will be handled in the next
- * iteration.
- */
- length = pos - iter->pos + plen;
- return iomap_iter_advance(iter, &length);
+ if (iomap_block_needs_zeroing(iter, pos)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ } else {
+ iomap_read_folio_range_async(iter, ctx, pos, plen);
+ }
+
+ length -= count;
+ ret = iomap_iter_advance(iter, &count);
+ if (ret)
+ return ret;
+ pos = iter->pos;
+ }
+ return 0;
}
static void iomap_readfolio_complete(const struct iomap_iter *iter,
@@ -494,20 +499,6 @@ static void iomap_readfolio_complete(const struct iomap_iter *iter,
folio_unlock(ctx->cur_folio);
}
-static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
-{
- int ret;
-
- while (iomap_length(iter)) {
- ret = iomap_readpage_iter(iter, ctx);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
{
struct iomap_iter iter = {
@@ -523,7 +514,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx);
+ iter.status = iomap_readpage_iter(&iter, &ctx);
iomap_readfolio_complete(&iter, &ctx);
@@ -537,16 +528,15 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
int ret;
while (iomap_length(iter)) {
- if (ctx->cur_folio &&
- offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
- if (!ctx->folio_unlocked)
- folio_unlock(ctx->cur_folio);
- ctx->cur_folio = NULL;
- }
- if (!ctx->cur_folio) {
- ctx->cur_folio = readahead_folio(ctx->rac);
- ctx->folio_unlocked = false;
- }
+ if (ctx->cur_folio && !ctx->folio_unlocked)
+ folio_unlock(ctx->cur_folio);
+ ctx->cur_folio = readahead_folio(ctx->rac);
+ /*
+ * We should never in practice hit this case since
+ * the iter length matches the readahead length.
+ */
+ WARN_ON(!ctx->cur_folio);
+ ctx->folio_unlocked = false;
ret = iomap_readpage_iter(iter, ctx);
if (ret)
return ret;
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 08/16] iomap: rename iomap_readpage_iter() to iomap_readfolio_iter()
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (6 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 07/16] iomap: iterate through entire folio in iomap_readpage_iter() Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-08-29 23:56 ` [PATCH v1 09/16] iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx Joanne Koong
` (7 subsequent siblings)
15 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
->readpage was deprecated and reads are now on folios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 75bbef386b62..743112c7f8e6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -444,7 +444,7 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
}
#endif /* CONFIG_BLOCK */
-static int iomap_readpage_iter(struct iomap_iter *iter,
+static int iomap_readfolio_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
const struct iomap *iomap = &iter->iomap;
@@ -514,7 +514,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_readpage_iter(&iter, &ctx);
+ iter.status = iomap_readfolio_iter(&iter, &ctx);
iomap_readfolio_complete(&iter, &ctx);
@@ -537,7 +537,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
*/
WARN_ON(!ctx->cur_folio);
ctx->folio_unlocked = false;
- ret = iomap_readpage_iter(iter, ctx);
+ ret = iomap_readfolio_iter(iter, ctx);
if (ret)
return ret;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 09/16] iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (7 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 08/16] iomap: rename iomap_readpage_iter() to iomap_readfolio_iter() Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:44 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 10/16] iomap: add iomap_start_folio_read() helper Joanne Koong
` (6 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
->readpage was deprecated and reads are now on folios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 743112c7f8e6..a3a9b6146c2f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -317,7 +317,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
return 0;
}
-struct iomap_readpage_ctx {
+struct iomap_readfolio_ctx {
struct folio *cur_folio;
bool folio_unlocked;
struct readahead_control *rac;
@@ -357,7 +357,7 @@ static void iomap_read_end_io(struct bio *bio)
}
static void iomap_read_folio_range_async(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
+ struct iomap_readfolio_ctx *ctx, loff_t pos, size_t plen)
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
@@ -426,7 +426,7 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
}
#else
static void iomap_read_folio_range_async(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx, loff_t pos, size_t len)
+ struct iomap_readfolio_ctx *ctx, loff_t pos, size_t len)
{
WARN_ON_ONCE(1);
}
@@ -445,7 +445,7 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
#endif /* CONFIG_BLOCK */
static int iomap_readfolio_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+ struct iomap_readfolio_ctx *ctx)
{
const struct iomap *iomap = &iter->iomap;
loff_t pos = iter->pos;
@@ -491,7 +491,7 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
}
static void iomap_readfolio_complete(const struct iomap_iter *iter,
- const struct iomap_readpage_ctx *ctx)
+ const struct iomap_readfolio_ctx *ctx)
{
iomap_readfolio_submit(iter);
@@ -506,7 +506,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
.pos = folio_pos(folio),
.len = folio_size(folio),
};
- struct iomap_readpage_ctx ctx = {
+ struct iomap_readfolio_ctx ctx = {
.cur_folio = folio,
};
int ret;
@@ -523,7 +523,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
EXPORT_SYMBOL_GPL(iomap_read_folio);
static int iomap_readahead_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+ struct iomap_readfolio_ctx *ctx)
{
int ret;
@@ -567,7 +567,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
.pos = readahead_pos(rac),
.len = readahead_length(rac),
};
- struct iomap_readpage_ctx ctx = {
+ struct iomap_readfolio_ctx ctx = {
.rac = rac,
};
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 10/16] iomap: add iomap_start_folio_read() helper
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (8 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 09/16] iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:52 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 11/16] iomap: make start folio read and finish folio read public APIs Joanne Koong
` (5 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Move the ifs read_bytes_pending addition logic into a separate helper,
iomap_start_folio_read(), which will be needed later on by user-provided
read callbacks (not yet added) for read/readahead. This is the
counterpart to the existing iomap_finish_folio_read().
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a3a9b6146c2f..6a9f9a9e591f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -324,6 +324,17 @@ struct iomap_readfolio_ctx {
};
#ifdef CONFIG_BLOCK
+static void iomap_start_folio_read(struct folio *folio, size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ if (ifs) {
+ spin_lock_irq(&ifs->state_lock);
+ ifs->read_bytes_pending += len;
+ spin_unlock_irq(&ifs->state_lock);
+ }
+}
+
static void iomap_finish_folio_read(struct folio *folio, size_t off,
size_t len, int error)
{
@@ -361,18 +372,13 @@ static void iomap_read_folio_range_async(struct iomap_iter *iter,
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
- struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
struct bio *bio = iter->private;
sector_t sector;
ctx->folio_unlocked = true;
- if (ifs) {
- spin_lock_irq(&ifs->state_lock);
- ifs->read_bytes_pending += plen;
- spin_unlock_irq(&ifs->state_lock);
- }
+ iomap_start_folio_read(folio, plen);
sector = iomap_sector(iomap, pos);
if (!bio || bio_end_sector(bio) != sector ||
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 11/16] iomap: make start folio read and finish folio read public APIs
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (9 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 10/16] iomap: add iomap_start_folio_read() helper Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 20:53 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead Joanne Koong
` (4 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Make iomap_start_folio_read() and iomap_finish_folio_read() publicly
accessible. These need to be accessible in order to support
user-provided read folio callbacks for read/readahead.
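For example, a read_folio_range() callback that issues an asynchronous
request would bracket it roughly like this (a sketch; only the two
iomap helpers are from this patch):

	/* at submission time, before sending the request */
	iomap_start_folio_read(folio, len);

	/* later, in the request's completion handler */
	iomap_finish_folio_read(folio, offset_in_folio(folio, pos), len,
				error);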
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 10 ++++++----
include/linux/iomap.h | 3 +++
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6a9f9a9e591f..5d153c6b16b6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -323,8 +323,7 @@ struct iomap_readfolio_ctx {
struct readahead_control *rac;
};
-#ifdef CONFIG_BLOCK
-static void iomap_start_folio_read(struct folio *folio, size_t len)
+void iomap_start_folio_read(struct folio *folio, size_t len)
{
struct iomap_folio_state *ifs = folio->private;
@@ -334,9 +333,10 @@ static void iomap_start_folio_read(struct folio *folio, size_t len)
spin_unlock_irq(&ifs->state_lock);
}
}
+EXPORT_SYMBOL_GPL(iomap_start_folio_read);
-static void iomap_finish_folio_read(struct folio *folio, size_t off,
- size_t len, int error)
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error)
{
struct iomap_folio_state *ifs = folio->private;
bool uptodate = !error;
@@ -356,7 +356,9 @@ static void iomap_finish_folio_read(struct folio *folio, size_t off,
if (finished)
folio_end_read(folio, uptodate);
}
+EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
+#ifdef CONFIG_BLOCK
static void iomap_read_end_io(struct bio *bio)
{
int error = blk_status_to_errno(bio->bi_status);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73dceabc21c8..0938c4a57f4c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -467,6 +467,9 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len);
int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
+void iomap_start_folio_read(struct folio *folio, size_t len);
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error);
void iomap_start_folio_write(struct inode *inode, struct folio *folio,
size_t len);
void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (10 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 11/16] iomap: make start folio read and finish folio read public APIs Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 21:08 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 13/16] iomap: add a private arg " Joanne Koong
` (3 subsequent siblings)
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Add a "struct iomap_read_ops" that contains a read_folio_range()
callback that callers can provide as a custom handler for reading in a
folio range, if the caller does not wish to issue bio read requests
(which otherwise is the default behavior). read_folio_range() may read
the request asynchronously or synchronously. The caller is responsible
for calling iomap_start_folio_read()/iomap_finish_folio_read() when
reading the folio range.
This makes it so that non-block based filesystems may use iomap for
reads.
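As a rough illustration of this contract, a non-block filesystem could
wire the callback up as follows (a minimal sketch: the examplefs_*
names, examplefs_iomap_ops, and the examplefs_fetch_range() helper are
hypothetical; only the iomap calls are from this series):

	static int examplefs_read_folio_range(const struct iomap_iter *iter,
			struct folio *folio, loff_t pos, size_t len)
	{
		size_t off = offset_in_folio(folio, pos);
		int err;

		/* account the in-flight bytes before starting the read */
		iomap_start_folio_read(folio, len);

		/* hypothetical helper that reads [pos, pos + len) into the folio */
		err = examplefs_fetch_range(iter->inode, folio, pos, len);

		/* must be called even on error to balance read_bytes_pending */
		iomap_finish_folio_read(folio, off, len, err);
		return err;
	}

	static const struct iomap_read_ops examplefs_read_ops = {
		.read_folio_range = examplefs_read_folio_range,
	};

	static int examplefs_read_folio(struct file *file, struct folio *folio)
	{
		return iomap_read_folio(folio, &examplefs_iomap_ops,
					&examplefs_read_ops);
	}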
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/iomap/operations.rst | 19 +++++
block/fops.c | 4 +-
fs/erofs/data.c | 4 +-
fs/gfs2/aops.c | 4 +-
fs/iomap/buffered-io.c | 79 +++++++++++++------
fs/xfs/xfs_aops.c | 4 +-
fs/zonefs/file.c | 4 +-
include/linux/iomap.h | 21 ++++-
8 files changed, 105 insertions(+), 34 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 067ed8e14ef3..215053f0779d 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -57,6 +57,25 @@ The following address space operations can be wrapped easily:
* ``bmap``
* ``swap_activate``
+``struct iomap_read_ops``
+--------------------------
+
+.. code-block:: c
+
+ struct iomap_read_ops {
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
+ };
+
+iomap calls these functions:
+
+ - ``read_folio_range``: Called to read in the range (read does not need to
+ be synchronous). The caller is responsible for calling
+ iomap_start_folio_read() and iomap_finish_folio_read() when reading the
+ folio range. This should be done even if an error is encountered during
+ the read. If this function is not provided by the caller, then iomap
+ will default to issuing asynchronous bio read requests.
+
``struct iomap_write_ops``
--------------------------
diff --git a/block/fops.c b/block/fops.c
index ddbc69c0922b..b42e16d0eb35 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -533,12 +533,12 @@ const struct address_space_operations def_blk_aops = {
#else /* CONFIG_BUFFER_HEAD */
static int blkdev_read_folio(struct file *file, struct folio *folio)
{
- return iomap_read_folio(folio, &blkdev_iomap_ops);
+ return iomap_read_folio(folio, &blkdev_iomap_ops, NULL);
}
static void blkdev_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &blkdev_iomap_ops);
+ iomap_readahead(rac, &blkdev_iomap_ops, NULL);
}
static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 3b1ba571c728..ea451f233263 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -371,7 +371,7 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
{
trace_erofs_read_folio(folio, true);
- return iomap_read_folio(folio, &erofs_iomap_ops);
+ return iomap_read_folio(folio, &erofs_iomap_ops, NULL);
}
static void erofs_readahead(struct readahead_control *rac)
@@ -379,7 +379,7 @@ static void erofs_readahead(struct readahead_control *rac)
trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
readahead_count(rac), true);
- return iomap_readahead(rac, &erofs_iomap_ops);
+ return iomap_readahead(rac, &erofs_iomap_ops, NULL);
}
static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 47d74afd63ac..bf531bcfd8a0 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -428,7 +428,7 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
if (!gfs2_is_jdata(ip) ||
(i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
- error = iomap_read_folio(folio, &gfs2_iomap_ops);
+ error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL);
} else if (gfs2_is_stuffed(ip)) {
error = stuffed_read_folio(ip, folio);
} else {
@@ -503,7 +503,7 @@ static void gfs2_readahead(struct readahead_control *rac)
else if (gfs2_is_jdata(ip))
mpage_readahead(rac, gfs2_block_map);
else
- iomap_readahead(rac, &gfs2_iomap_ops);
+ iomap_readahead(rac, &gfs2_iomap_ops, NULL);
}
/**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 5d153c6b16b6..06f2c857de64 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -335,8 +335,8 @@ void iomap_start_folio_read(struct folio *folio, size_t len)
}
EXPORT_SYMBOL_GPL(iomap_start_folio_read);
-void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
- int error)
+static void __iomap_finish_folio_read(struct folio *folio, size_t off,
+ size_t len, int error, bool update_bitmap)
{
struct iomap_folio_state *ifs = folio->private;
bool uptodate = !error;
@@ -346,7 +346,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
unsigned long flags;
spin_lock_irqsave(&ifs->state_lock, flags);
- if (!error)
+ if (!error && update_bitmap)
uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
ifs->read_bytes_pending -= len;
finished = !ifs->read_bytes_pending;
@@ -356,6 +356,12 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
if (finished)
folio_end_read(folio, uptodate);
}
+
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error)
+{
+ return __iomap_finish_folio_read(folio, off, len, error, true);
+}
EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
#ifdef CONFIG_BLOCK
@@ -379,7 +385,6 @@ static void iomap_read_folio_range_async(struct iomap_iter *iter,
struct bio *bio = iter->private;
sector_t sector;
- ctx->folio_unlocked = true;
iomap_start_folio_read(folio, plen);
sector = iomap_sector(iomap, pos);
@@ -453,15 +458,17 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
#endif /* CONFIG_BLOCK */
static int iomap_readfolio_iter(struct iomap_iter *iter,
- struct iomap_readfolio_ctx *ctx)
+ struct iomap_readfolio_ctx *ctx,
+ const struct iomap_read_ops *read_ops)
{
const struct iomap *iomap = &iter->iomap;
+ struct iomap_folio_state *ifs;
loff_t pos = iter->pos;
loff_t length = iomap_length(iter);
struct folio *folio = ctx->cur_folio;
size_t poff, plen;
loff_t count;
- int ret;
+ int ret = 0;
if (iomap->type == IOMAP_INLINE) {
ret = iomap_read_inline_data(iter, folio);
@@ -471,7 +478,14 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
}
/* zero post-eof blocks as the page may be mapped */
- ifs_alloc(iter->inode, folio, iter->flags);
+ ifs = ifs_alloc(iter->inode, folio, iter->flags);
+
+ /*
+ * Add a bias to ifs->read_bytes_pending so that a read is ended only
+ * after all the ranges have been read in.
+ */
+ if (ifs)
+ iomap_start_folio_read(folio, 1);
length = min_t(loff_t, length,
folio_size(folio) - offset_in_folio(folio, pos));
@@ -479,35 +493,53 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
iomap_adjust_read_range(iter->inode, folio, &pos,
length, &poff, &plen);
count = pos - iter->pos + plen;
- if (plen == 0)
- return iomap_iter_advance(iter, &count);
+ if (plen == 0) {
+ ret = iomap_iter_advance(iter, &count);
+ break;
+ }
if (iomap_block_needs_zeroing(iter, pos)) {
folio_zero_range(folio, poff, plen);
iomap_set_range_uptodate(folio, poff, plen);
} else {
- iomap_read_folio_range_async(iter, ctx, pos, plen);
+ ctx->folio_unlocked = true;
+ if (read_ops && read_ops->read_folio_range) {
+ ret = read_ops->read_folio_range(iter, folio, pos, plen);
+ if (ret)
+ break;
+ } else {
+ iomap_read_folio_range_async(iter, ctx, pos, plen);
+ }
}
length -= count;
ret = iomap_iter_advance(iter, &count);
if (ret)
- return ret;
+ break;
pos = iter->pos;
}
- return 0;
+
+ if (ifs) {
+ __iomap_finish_folio_read(folio, 0, 1, ret, false);
+ ctx->folio_unlocked = true;
+ }
+
+ return ret;
}
static void iomap_readfolio_complete(const struct iomap_iter *iter,
- const struct iomap_readfolio_ctx *ctx)
+ const struct iomap_readfolio_ctx *ctx,
+ const struct iomap_read_ops *read_ops)
{
- iomap_readfolio_submit(iter);
+ if (!read_ops || !read_ops->read_folio_range)
+ iomap_readfolio_submit(iter);
if (ctx->cur_folio && !ctx->folio_unlocked)
folio_unlock(ctx->cur_folio);
}
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
+int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
+ const struct iomap_read_ops *read_ops)
{
struct iomap_iter iter = {
.inode = folio->mapping->host,
@@ -522,16 +554,17 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_readfolio_iter(&iter, &ctx);
+ iter.status = iomap_readfolio_iter(&iter, &ctx, read_ops);
- iomap_readfolio_complete(&iter, &ctx);
+ iomap_readfolio_complete(&iter, &ctx, read_ops);
return ret;
}
EXPORT_SYMBOL_GPL(iomap_read_folio);
static int iomap_readahead_iter(struct iomap_iter *iter,
- struct iomap_readfolio_ctx *ctx)
+ struct iomap_readfolio_ctx *ctx,
+ const struct iomap_read_ops *read_ops)
{
int ret;
@@ -545,7 +578,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
*/
WARN_ON(!ctx->cur_folio);
ctx->folio_unlocked = false;
- ret = iomap_readfolio_iter(iter, ctx);
+ ret = iomap_readfolio_iter(iter, ctx, read_ops);
if (ret)
return ret;
}
@@ -557,6 +590,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
* iomap_readahead - Attempt to read pages from a file.
* @rac: Describes the pages to be read.
* @ops: The operations vector for the filesystem.
+ * @read_ops: Optional ops callers can pass in if they want custom handling.
*
* This function is for filesystems to call to implement their readahead
* address_space operation.
@@ -568,7 +602,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
* function is called with memalloc_nofs set, so allocations will not cause
* the filesystem to be reentered.
*/
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
+ const struct iomap_read_ops *read_ops)
{
struct iomap_iter iter = {
.inode = rac->mapping->host,
@@ -582,9 +617,9 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
while (iomap_iter(&iter, ops) > 0)
- iter.status = iomap_readahead_iter(&iter, &ctx);
+ iter.status = iomap_readahead_iter(&iter, &ctx, read_ops);
- iomap_readfolio_complete(&iter, &ctx);
+ iomap_readfolio_complete(&iter, &ctx, read_ops);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1ee4f835ac3c..fb2150c0825a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -742,14 +742,14 @@ xfs_vm_read_folio(
struct file *unused,
struct folio *folio)
{
- return iomap_read_folio(folio, &xfs_read_iomap_ops);
+ return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
}
STATIC void
xfs_vm_readahead(
struct readahead_control *rac)
{
- iomap_readahead(rac, &xfs_read_iomap_ops);
+ iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
}
static int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index fd3a5922f6c3..96470daf4d3f 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -112,12 +112,12 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
static int zonefs_read_folio(struct file *unused, struct folio *folio)
{
- return iomap_read_folio(folio, &zonefs_read_iomap_ops);
+ return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL);
}
static void zonefs_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &zonefs_read_iomap_ops);
+ iomap_readahead(rac, &zonefs_read_iomap_ops, NULL);
}
/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0938c4a57f4c..a7247439aeb5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -178,6 +178,21 @@ struct iomap_write_ops {
struct folio *folio, loff_t pos, size_t len);
};
+struct iomap_read_ops {
+ /*
+ * If the filesystem doesn't provide a custom handler for reading in the
+ * contents of a folio, iomap will default to issuing asynchronous bio
+ * read requests.
+ *
+ * The read does not need to be done synchronously. The caller is
+ * responsible for calling iomap_start_folio_read() and
+ * iomap_finish_folio_read() when reading the folio range. This should
+ * be done even if an error is encountered during the read.
+ */
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
+};
+
/*
* Flags for iomap_begin / iomap_end. No flag implies a read.
*/
@@ -339,8 +354,10 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
-void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
+int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
+ const struct iomap_read_ops *read_ops);
+void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops,
+ const struct iomap_read_ops *read_ops);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 13/16] iomap: add a private arg for read and readahead
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (11 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-08-30 1:54 ` Gao Xiang
2025-09-03 21:11 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 14/16] fuse: use iomap for read_folio Joanne Koong
` (2 subsequent siblings)
15 siblings, 2 replies; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Add a void *private arg for read and readahead that filesystems passing
in custom read callbacks can use. Stash it in the existing private
field of the iomap_iter.
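A minimal sketch of the intended usage (the examplefs_* names and state
are hypothetical; only the iomap interfaces are from this series):

	static void examplefs_readahead(struct readahead_control *rac)
	{
		/* hypothetical per-call state for the read callbacks */
		struct examplefs_ra_state state = {};

		iomap_readahead(rac, &examplefs_iomap_ops, &examplefs_read_ops,
				&state);
	}

	static int examplefs_read_folio_range(const struct iomap_iter *iter,
			struct folio *folio, loff_t pos, size_t len)
	{
		/* the pointer passed to iomap_readahead() comes back here */
		struct examplefs_ra_state *state = iter->private;

		/* hypothetical helper that issues the read for this range */
		return examplefs_send_read(state, folio, pos, len);
	}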
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
block/fops.c | 4 ++--
fs/erofs/data.c | 4 ++--
fs/gfs2/aops.c | 4 ++--
fs/iomap/buffered-io.c | 8 ++++++--
fs/xfs/xfs_aops.c | 4 ++--
fs/zonefs/file.c | 4 ++--
include/linux/iomap.h | 4 ++--
7 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index b42e16d0eb35..57ae886c7b1a 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -533,12 +533,12 @@ const struct address_space_operations def_blk_aops = {
#else /* CONFIG_BUFFER_HEAD */
static int blkdev_read_folio(struct file *file, struct folio *folio)
{
- return iomap_read_folio(folio, &blkdev_iomap_ops, NULL);
+ return iomap_read_folio(folio, &blkdev_iomap_ops, NULL, NULL);
}
static void blkdev_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &blkdev_iomap_ops, NULL);
+ iomap_readahead(rac, &blkdev_iomap_ops, NULL, NULL);
}
static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index ea451f233263..2ea338448ca1 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -371,7 +371,7 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
{
trace_erofs_read_folio(folio, true);
- return iomap_read_folio(folio, &erofs_iomap_ops, NULL);
+ return iomap_read_folio(folio, &erofs_iomap_ops, NULL, NULL);
}
static void erofs_readahead(struct readahead_control *rac)
@@ -379,7 +379,7 @@ static void erofs_readahead(struct readahead_control *rac)
trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
readahead_count(rac), true);
- return iomap_readahead(rac, &erofs_iomap_ops, NULL);
+ return iomap_readahead(rac, &erofs_iomap_ops, NULL, NULL);
}
static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index bf531bcfd8a0..211a0f7b1416 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -428,7 +428,7 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
if (!gfs2_is_jdata(ip) ||
(i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
- error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL);
+ error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL, NULL);
} else if (gfs2_is_stuffed(ip)) {
error = stuffed_read_folio(ip, folio);
} else {
@@ -503,7 +503,7 @@ static void gfs2_readahead(struct readahead_control *rac)
else if (gfs2_is_jdata(ip))
mpage_readahead(rac, gfs2_block_map);
else
- iomap_readahead(rac, &gfs2_iomap_ops, NULL);
+ iomap_readahead(rac, &gfs2_iomap_ops, NULL, NULL);
}
/**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 06f2c857de64..d68dd7f63923 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -539,12 +539,13 @@ static void iomap_readfolio_complete(const struct iomap_iter *iter,
}
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
- const struct iomap_read_ops *read_ops)
+ const struct iomap_read_ops *read_ops, void *private)
{
struct iomap_iter iter = {
.inode = folio->mapping->host,
.pos = folio_pos(folio),
.len = folio_size(folio),
+ .private = private,
};
struct iomap_readfolio_ctx ctx = {
.cur_folio = folio,
@@ -591,6 +592,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
* @rac: Describes the pages to be read.
* @ops: The operations vector for the filesystem.
* @read_ops: Optional ops callers can pass in if they want custom handling.
+ * @private: If passed in, this will be usable by the caller in any
+ * read_ops callbacks.
*
* This function is for filesystems to call to implement their readahead
* address_space operation.
@@ -603,12 +606,13 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
* the filesystem to be reentered.
*/
void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
- const struct iomap_read_ops *read_ops)
+ const struct iomap_read_ops *read_ops, void *private)
{
struct iomap_iter iter = {
.inode = rac->mapping->host,
.pos = readahead_pos(rac),
.len = readahead_length(rac),
+ .private = private,
};
struct iomap_readfolio_ctx ctx = {
.rac = rac,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index fb2150c0825a..5e71a3888e6d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -742,14 +742,14 @@ xfs_vm_read_folio(
struct file *unused,
struct folio *folio)
{
- return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
+ return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL, NULL);
}
STATIC void
xfs_vm_readahead(
struct readahead_control *rac)
{
- iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
+ iomap_readahead(rac, &xfs_read_iomap_ops, NULL, NULL);
}
static int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 96470daf4d3f..182bb473a82b 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -112,12 +112,12 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
static int zonefs_read_folio(struct file *unused, struct folio *folio)
{
- return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL);
+ return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL, NULL);
}
static void zonefs_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &zonefs_read_iomap_ops, NULL);
+ iomap_readahead(rac, &zonefs_read_iomap_ops, NULL, NULL);
}
/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a7247439aeb5..9bc7900dd448 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -355,9 +355,9 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
- const struct iomap_read_ops *read_ops);
+ const struct iomap_read_ops *read_ops, void *private);
void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops,
- const struct iomap_read_ops *read_ops);
+ const struct iomap_read_ops *read_ops, void *private);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 14/16] fuse: use iomap for read_folio
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (12 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 13/16] iomap: add a private arg " Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 21:13 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 15/16] fuse: use iomap for readahead Joanne Koong
2025-08-29 23:56 ` [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check Joanne Koong
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Read folio data into the page cache using iomap. This gives us granular
uptodate tracking for large folios, which optimizes how much data needs
to be read in. If some portions of the folio are already uptodate (e.g.
through a prior write), we only need to read in the non-uptodate
portions. For example, if a prior write left the first 4k of a 1MB folio
uptodate, only the remaining 1020k needs to be read in from the server.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 72 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 49 insertions(+), 23 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 5525a4520b0f..bdfb13cdee4b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -828,22 +828,62 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio,
return 0;
}
+static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ iomap->type = IOMAP_MAPPED;
+ iomap->length = length;
+ iomap->offset = offset;
+ return 0;
+}
+
+static const struct iomap_ops fuse_iomap_ops = {
+ .iomap_begin = fuse_iomap_begin,
+};
+
+struct fuse_fill_read_data {
+ struct file *file;
+};
+
+static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos,
+ size_t len)
+{
+ struct fuse_fill_read_data *data = iter->private;
+ struct file *file = data->file;
+ size_t off = offset_in_folio(folio, pos);
+ int ret;
+
+ /*
+ * for non-readahead read requests, do reads synchronously since
+ * it's not guaranteed that the server can handle out-of-order reads
+ */
+ iomap_start_folio_read(folio, len);
+ ret = fuse_do_readfolio(file, folio, off, len);
+ iomap_finish_folio_read(folio, off, len, ret);
+ return ret;
+}
+
+static const struct iomap_read_ops fuse_iomap_read_ops = {
+ .read_folio_range = fuse_iomap_read_folio_range_async,
+};
+
static int fuse_read_folio(struct file *file, struct folio *folio)
{
struct inode *inode = folio->mapping->host;
+ struct fuse_fill_read_data data = {
+ .file = file,
+ };
int err;
- err = -EIO;
- if (fuse_is_bad(inode))
- goto out;
-
- err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
- if (!err)
- folio_mark_uptodate(folio);
+ if (fuse_is_bad(inode)) {
+ folio_unlock(folio);
+ return -EIO;
+ }
+ err = iomap_read_folio(folio, &fuse_iomap_ops, &fuse_iomap_read_ops, &data);
fuse_invalidate_atime(inode);
- out:
- folio_unlock(folio);
return err;
}
@@ -1394,20 +1434,6 @@ static const struct iomap_write_ops fuse_iomap_write_ops = {
.read_folio_range = fuse_iomap_read_folio_range,
};
-static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
- unsigned int flags, struct iomap *iomap,
- struct iomap *srcmap)
-{
- iomap->type = IOMAP_MAPPED;
- iomap->length = length;
- iomap->offset = offset;
- return 0;
-}
-
-static const struct iomap_ops fuse_iomap_ops = {
- .iomap_begin = fuse_iomap_begin,
-};
-
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 15/16] fuse: use iomap for readahead
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (13 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 14/16] fuse: use iomap for read_folio Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-03 21:17 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check Joanne Koong
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Do readahead in fuse using iomap. This gives us granular uptodate
tracking for large folios, which optimizes how much data needs to be
read in. If some portions of the folio are already uptodate (e.g.
through a prior write), we only need to read in the non-uptodate
portions.
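In outline (a simplified paraphrase of the diff below, not the exact
code), the read_folio_range callback now batches folio ranges for
readahead instead of reading them synchronously:

	/*
	 * Readahead case of fuse_iomap_read_folio_range_async(), sketched:
	 *
	 * 1. iomap_start_folio_read(folio, len) marks the range as having
	 *    a read in flight.
	 * 2. If adding this range to the current fuse_io_args batch would
	 *    exceed fc->max_pages / fc->max_read or break contiguity
	 *    (fuse_folios_need_send()), the batch is sent via
	 *    fuse_send_readpages() and a new one is allocated.
	 * 3. The folio range is appended to the batch with a folio
	 *    reference held.
	 * 4. On request completion, fuse_readpages_end() calls
	 *    iomap_finish_folio_read() for every batched range.
	 */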
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 214 +++++++++++++++++++++++++++----------------------
1 file changed, 118 insertions(+), 96 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bdfb13cdee4b..1659603f4cb6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -844,8 +844,73 @@ static const struct iomap_ops fuse_iomap_ops = {
struct fuse_fill_read_data {
struct file *file;
+ /*
+ * We need to track this because non-readahead requests can't be sent
+ * asynchronously.
+ */
+ bool readahead : 1;
+
+ /*
+ * Fields below are used if sending the read request
+ * asynchronously.
+ */
+ struct fuse_conn *fc;
+ struct readahead_control *rac;
+ struct fuse_io_args *ia;
+ unsigned int nr_bytes;
};
+/* forward declarations */
+static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
+ unsigned cur_bytes, bool write);
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+ unsigned int count, bool async);
+
+static int fuse_handle_readahead(struct folio *folio,
+ struct fuse_fill_read_data *data, loff_t pos,
+ size_t len)
+{
+ struct fuse_io_args *ia = data->ia;
+ size_t off = offset_in_folio(folio, pos);
+ struct fuse_conn *fc = data->fc;
+ struct fuse_args_pages *ap;
+
+ if (ia && fuse_folios_need_send(fc, pos, len, &ia->ap, data->nr_bytes,
+ false)) {
+ fuse_send_readpages(ia, data->file, data->nr_bytes,
+ fc->async_read);
+ data->nr_bytes = 0;
+ ia = NULL;
+ }
+ if (!ia) {
+ struct readahead_control *rac = data->rac;
+ unsigned nr_pages = min(fc->max_pages, readahead_count(rac));
+
+ if (fc->num_background >= fc->congestion_threshold &&
+ rac->ra->async_size >= readahead_count(rac))
+ /*
+ * Congested and only async pages left, so skip the
+ * rest.
+ */
+ return -EAGAIN;
+
+ data->ia = fuse_io_alloc(NULL, nr_pages);
+ if (!data->ia)
+ return -ENOMEM;
+ ia = data->ia;
+ }
+ folio_get(folio);
+ ap = &ia->ap;
+ ap->folios[ap->num_folios] = folio;
+ ap->descs[ap->num_folios].offset = off;
+ ap->descs[ap->num_folios].length = len;
+ data->nr_bytes += len;
+ ap->num_folios++;
+
+ return 0;
+}
+
static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
struct folio *folio, loff_t pos,
size_t len)
@@ -855,13 +920,24 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
size_t off = offset_in_folio(folio, pos);
int ret;
- /*
- * for non-readahead read requests, do reads synchronously since
- * it's not guaranteed that the server can handle out-of-order reads
- */
iomap_start_folio_read(folio, len);
- ret = fuse_do_readfolio(file, folio, off, len);
- iomap_finish_folio_read(folio, off, len, ret);
+ if (data->readahead) {
+ ret = fuse_handle_readahead(folio, data, pos, len);
+ /*
+ * If fuse_handle_readahead was successful, fuse_readpages_end
+ * will do the iomap_finish_folio_read, else we need to call it
+ * here
+ */
+ if (ret)
+ iomap_finish_folio_read(folio, off, len, ret);
+ } else {
+ /*
+ * for non-readahead read requests, do reads synchronously since
+ * it's not guaranteed that the server can handle out-of-order reads
+ */
+ ret = fuse_do_readfolio(file, folio, off, len);
+ iomap_finish_folio_read(folio, off, len, ret);
+ }
return ret;
}
@@ -923,7 +999,8 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
}
for (i = 0; i < ap->num_folios; i++) {
- folio_end_read(ap->folios[i], !err);
+ iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
+ ap->descs[i].length, err);
folio_put(ap->folios[i]);
}
if (ia->ff)
@@ -933,7 +1010,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
}
static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
- unsigned int count)
+ unsigned int count, bool async)
{
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
@@ -955,7 +1032,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
fuse_read_args_fill(ia, file, pos, count, FUSE_READ);
ia->read.attr_ver = fuse_get_attr_version(fm->fc);
- if (fm->fc->async_read) {
+ if (async) {
ia->ff = fuse_file_get(ff);
ap->args.end = fuse_readpages_end;
err = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
@@ -972,81 +1049,20 @@ static void fuse_readahead(struct readahead_control *rac)
{
struct inode *inode = rac->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- unsigned int max_pages, nr_pages;
- struct folio *folio = NULL;
+ struct fuse_fill_read_data data = {
+ .file = rac->file,
+ .readahead = true,
+ .fc = fc,
+ .rac = rac,
+ };
if (fuse_is_bad(inode))
return;
- max_pages = min_t(unsigned int, fc->max_pages,
- fc->max_read / PAGE_SIZE);
-
- /*
- * This is only accurate the first time through, since readahead_folio()
- * doesn't update readahead_count() from the previous folio until the
- * next call. Grab nr_pages here so we know how many pages we're going
- * to have to process. This means that we will exit here with
- * readahead_count() == folio_nr_pages(last_folio), but we will have
- * consumed all of the folios, and read_pages() will call
- * readahead_folio() again which will clean up the rac.
- */
- nr_pages = readahead_count(rac);
-
- while (nr_pages) {
- struct fuse_io_args *ia;
- struct fuse_args_pages *ap;
- unsigned cur_pages = min(max_pages, nr_pages);
- unsigned int pages = 0;
-
- if (fc->num_background >= fc->congestion_threshold &&
- rac->ra->async_size >= readahead_count(rac))
- /*
- * Congested and only async pages left, so skip the
- * rest.
- */
- break;
-
- ia = fuse_io_alloc(NULL, cur_pages);
- if (!ia)
- break;
- ap = &ia->ap;
-
- while (pages < cur_pages) {
- unsigned int folio_pages;
-
- /*
- * This returns a folio with a ref held on it.
- * The ref needs to be held until the request is
- * completed, since the splice case (see
- * fuse_try_move_page()) drops the ref after it's
- * replaced in the page cache.
- */
- if (!folio)
- folio = __readahead_folio(rac);
-
- folio_pages = folio_nr_pages(folio);
- if (folio_pages > cur_pages - pages) {
- /*
- * Large folios belonging to fuse will never
- * have more pages than max_pages.
- */
- WARN_ON(!pages);
- break;
- }
-
- ap->folios[ap->num_folios] = folio;
- ap->descs[ap->num_folios].length = folio_size(folio);
- ap->num_folios++;
- pages += folio_pages;
- folio = NULL;
- }
- fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
- nr_pages -= pages;
- }
- if (folio) {
- folio_end_read(folio, false);
- folio_put(folio);
- }
+ iomap_readahead(rac, &fuse_iomap_ops, &fuse_iomap_read_ops, &data);
+ if (data.ia)
+ fuse_send_readpages(data.ia, data.file, data.nr_bytes,
+ fc->async_read);
}
static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
@@ -2077,7 +2093,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
unsigned int max_folios;
/*
- * nr_bytes won't overflow since fuse_writepage_need_send() caps
+ * nr_bytes won't overflow since fuse_folios_need_send() caps
* wb requests to never exceed fc->max_pages (which has an upper bound
* of U16_MAX).
*/
@@ -2122,14 +2138,15 @@ static void fuse_writepages_send(struct inode *inode,
spin_unlock(&fi->lock);
}
-static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
- unsigned len, struct fuse_args_pages *ap,
- struct fuse_fill_wb_data *data)
+static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
+ unsigned cur_bytes, bool write)
{
struct folio *prev_folio;
struct fuse_folio_desc prev_desc;
- unsigned bytes = data->nr_bytes + len;
+ unsigned bytes = cur_bytes + len;
loff_t prev_pos;
+ size_t max_bytes = write ? fc->max_write : fc->max_read;
WARN_ON(!ap->num_folios);
@@ -2137,8 +2154,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
if ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT > fc->max_pages)
return true;
- /* Reached max write bytes */
- if (bytes > fc->max_write)
+ if (bytes > max_bytes)
return true;
/* Discontinuity */
@@ -2148,11 +2164,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
if (prev_pos != pos)
return true;
- /* Need to grow the pages array? If so, did the expansion fail? */
- if (ap->num_folios == data->max_folios &&
- !fuse_pages_realloc(data, fc->max_pages))
- return true;
-
return false;
}
@@ -2176,10 +2187,21 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
return -EIO;
}
- if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
- fuse_writepages_send(inode, data);
- data->wpa = NULL;
- data->nr_bytes = 0;
+ if (wpa) {
+ bool send = fuse_folios_need_send(fc, pos, len, ap, data->nr_bytes,
+ true);
+
+ if (!send) {
+ /* Need to grow the pages array? If so, did the expansion fail? */
+ send = (ap->num_folios == data->max_folios) &&
+ !fuse_pages_realloc(data, fc->max_pages);
+ }
+
+ if (send) {
+ fuse_writepages_send(inode, data);
+ data->wpa = NULL;
+ data->nr_bytes = 0;
+ }
}
if (data->wpa == NULL) {
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
` (14 preceding siblings ...)
2025-08-29 23:56 ` [PATCH v1 15/16] fuse: use iomap for readahead Joanne Koong
@ 2025-08-29 23:56 ` Joanne Koong
2025-09-02 9:21 ` Miklos Szeredi
15 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-08-29 23:56 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, linux-fsdevel, kernel-team, linux-xfs, linux-doc
Remove extra logic in fuse_readpages_end() that checks against null
folio mappings. This was added in commit ce534fb05292 ("fuse: allow
splice to move pages"):
"Since the remove_from_page_cache() + add_to_page_cache_locked()
are non-atomic it is possible that the page cache is repopulated in
between the two and add_to_page_cache_locked() will fail. This
could be fixed by creating a new atomic replace_page_cache_page()
function.
fuse_readpages_end() needed to be reworked so it works even if
page->mapping is NULL for some or all pages which can happen if the
add_to_page_cache_locked() failed."
Commit ef6a3c63112e ("mm: add replace_page_cache_page() function") added
atomic page cache replacement, which means the check against null
mappings can be removed.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1659603f4cb6..87078f40d446 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -981,22 +981,20 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
struct fuse_args_pages *ap = &ia->ap;
size_t count = ia->read.in.size;
size_t num_read = args->out_args[0].size;
- struct address_space *mapping = NULL;
-
- for (i = 0; mapping == NULL && i < ap->num_folios; i++)
- mapping = ap->folios[i]->mapping;
+ struct address_space *mapping;
+ struct inode *inode;
- if (mapping) {
- struct inode *inode = mapping->host;
+ WARN_ON_ONCE(!ap->num_folios);
+ mapping = ap->folios[0]->mapping;
+ inode = mapping->host;
- /*
- * Short read means EOF. If file size is larger, truncate it
- */
- if (!err && num_read < count)
- fuse_short_read(inode, ia->read.attr_ver, num_read, ap);
+ /*
+ * Short read means EOF. If file size is larger, truncate it
+ */
+ if (!err && num_read < count)
+ fuse_short_read(inode, ia->read.attr_ver, num_read, ap);
- fuse_invalidate_atime(inode);
- }
+ fuse_invalidate_atime(inode);
for (i = 0; i < ap->num_folios; i++) {
iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
--
2.47.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v1 13/16] iomap: add a private arg for read and readahead
2025-08-29 23:56 ` [PATCH v1 13/16] iomap: add a private arg " Joanne Koong
@ 2025-08-30 1:54 ` Gao Xiang
2025-09-02 21:24 ` Joanne Koong
2025-09-03 21:11 ` Darrick J. Wong
1 sibling, 1 reply; 34+ messages in thread
From: Gao Xiang @ 2025-08-30 1:54 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, linux-fsdevel, kernel-team,
linux-xfs, linux-doc
Hi Joanne,
On Fri, Aug 29, 2025 at 04:56:24PM -0700, Joanne Koong wrote:
> Add a void *private arg for read and readahead which filesystems that
> pass in custom read callbacks can use. Stash this in the existing
> private field in the iomap_iter.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> block/fops.c | 4 ++--
> fs/erofs/data.c | 4 ++--
> fs/gfs2/aops.c | 4 ++--
> fs/iomap/buffered-io.c | 8 ++++++--
> fs/xfs/xfs_aops.c | 4 ++--
> fs/zonefs/file.c | 4 ++--
> include/linux/iomap.h | 4 ++--
> 7 files changed, 18 insertions(+), 14 deletions(-)
>
...
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
> - const struct iomap_read_ops *read_ops)
> + const struct iomap_read_ops *read_ops, void *private)
> {
> struct iomap_iter iter = {
> .inode = folio->mapping->host,
> .pos = folio_pos(folio),
> .len = folio_size(folio),
> + .private = private,
> };
Will this whole work land in v6.18?
If not, may I ask if this patch can be moved earlier in this
patchset so that it can be applied separately? (I tried but had no
luck.)
Because I also need a similar approach for the EROFS iomap page
cache sharing feature: EROFS uncompressed I/Os go through iomap, and
extra information needs a proper way to be passed down to
iomap_{begin,end}, also via the extra `.private` pointer.
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check
2025-08-29 23:56 ` [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check Joanne Koong
@ 2025-09-02 9:21 ` Miklos Szeredi
2025-09-02 21:19 ` Joanne Koong
0 siblings, 1 reply; 34+ messages in thread
From: Miklos Szeredi @ 2025-09-02 9:21 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, hch, djwong, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Sat, 30 Aug 2025 at 01:58, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Remove extra logic in fuse_readpages_end() that checks against null
> folio mappings. This was added in commit ce534fb05292 ("fuse: allow
> splice to move pages"):
>
> "Since the remove_from_page_cache() + add_to_page_cache_locked()
> are non-atomic it is possible that the page cache is repopulated in
> between the two and add_to_page_cache_locked() will fail. This
> could be fixed by creating a new atomic replace_page_cache_page()
> function.
>
> fuse_readpages_end() needed to be reworked so it works even if
> page->mapping is NULL for some or all pages which can happen if the
> add_to_page_cache_locked() failed."
>
> Commit ef6a3c63112e ("mm: add replace_page_cache_page() function") added
> atomic page cache replacement, which means the check against null
> mappings can be removed.
If I understand correctly this is independent of the patchset and can
be applied without it.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check
2025-09-02 9:21 ` Miklos Szeredi
@ 2025-09-02 21:19 ` Joanne Koong
0 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2025-09-02 21:19 UTC (permalink / raw)
To: Miklos Szeredi
Cc: brauner, hch, djwong, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Tue, Sep 2, 2025 at 2:22 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sat, 30 Aug 2025 at 01:58, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > Remove extra logic in fuse_readpages_end() that checks against null
> > folio mappings. This was added in commit ce534fb05292 ("fuse: allow
> > splice to move pages"):
> >
> > "Since the remove_from_page_cache() + add_to_page_cache_locked()
> > are non-atomic it is possible that the page cache is repopulated in
> > between the two and add_to_page_cache_locked() will fail. This
> > could be fixed by creating a new atomic replace_page_cache_page()
> > function.
> >
> > fuse_readpages_end() needed to be reworked so it works even if
> > page->mapping is NULL for some or all pages which can happen if the
> > add_to_page_cache_locked() failed."
> >
> > Commit ef6a3c63112e ("mm: add replace_page_cache_page() function") added
> > atomic page cache replacement, which means the check against null
> > mappings can be removed.
>
> If I understand correctly this is independent of the patchset and can
> be applied without it.
Yes, this and patch 05/16 ("iomap: propagate iomap_read_folio() error
to caller"), patch 08/16 ("iomap: rename iomap_readpage_iter() to
iomap_readfolio_iter()"), and patch 09/16 ("iomap: rename
iomap_readpage_ctx struct to iomap_readfolio_ctx") in the series are
independent of the fuse iomap read/readahead functionality.
My thinking was that it would be more cohesive to have everything in
one place so that there are fewer patches scattered about, but I'm
realizing now it was probably more confusing than helpful.
Thanks,
Joanne
>
> Thanks,
> Miklos
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 13/16] iomap: add a private arg for read and readahead
2025-08-30 1:54 ` Gao Xiang
@ 2025-09-02 21:24 ` Joanne Koong
2025-09-03 1:55 ` Gao Xiang
0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2025-09-02 21:24 UTC (permalink / raw)
To: Joanne Koong, brauner, miklos, hch, djwong, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Fri, Aug 29, 2025 at 6:54 PM Gao Xiang <xiang@kernel.org> wrote:
>
> Hi Joanne,
>
> On Fri, Aug 29, 2025 at 04:56:24PM -0700, Joanne Koong wrote:
> > Add a void *private arg for read and readahead which filesystems that
> > pass in custom read callbacks can use. Stash this in the existing
> > private field in the iomap_iter.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > block/fops.c | 4 ++--
> > fs/erofs/data.c | 4 ++--
> > fs/gfs2/aops.c | 4 ++--
> > fs/iomap/buffered-io.c | 8 ++++++--
> > fs/xfs/xfs_aops.c | 4 ++--
> > fs/zonefs/file.c | 4 ++--
> > include/linux/iomap.h | 4 ++--
> > 7 files changed, 18 insertions(+), 14 deletions(-)
> >
>
> ...
>
> > int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
> > - const struct iomap_read_ops *read_ops)
> > + const struct iomap_read_ops *read_ops, void *private)
> > {
> > struct iomap_iter iter = {
> > .inode = folio->mapping->host,
> > .pos = folio_pos(folio),
> > .len = folio_size(folio),
> > + .private = private,
> > };
>
> Will this whole work land in v6.18?
>
> If not, may I ask if this patch can be moved earlier in this
> patchset so that it can be applied separately? (I tried but had no
> luck.)
>
> Because I also need a similar approach for the EROFS iomap page
> cache sharing feature: EROFS uncompressed I/Os go through iomap, and
> extra information needs a proper way to be passed down to
> iomap_{begin,end}, also via the extra `.private` pointer.
Hi Gao,
I'm not sure whether this will be landed for v6.18 but I'm happy to
shift this patch to the beginning of the patchset for applying
separately.
Thanks,
Joanne
>
> Thanks,
> Gao Xiang
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 13/16] iomap: add a private arg for read and readahead
2025-09-02 21:24 ` Joanne Koong
@ 2025-09-03 1:55 ` Gao Xiang
0 siblings, 0 replies; 34+ messages in thread
From: Gao Xiang @ 2025-09-03 1:55 UTC (permalink / raw)
To: Joanne Koong, brauner, miklos, hch, djwong, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On 2025/9/3 05:24, Joanne Koong wrote:
> On Fri, Aug 29, 2025 at 6:54 PM Gao Xiang <xiang@kernel.org> wrote:
>>
>> Hi Joanne,
>>
>> On Fri, Aug 29, 2025 at 04:56:24PM -0700, Joanne Koong wrote:
>>> Add a void *private arg for read and readahead which filesystems that
>>> pass in custom read callbacks can use. Stash this in the existing
>>> private field in the iomap_iter.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> ---
>>> block/fops.c | 4 ++--
>>> fs/erofs/data.c | 4 ++--
>>> fs/gfs2/aops.c | 4 ++--
>>> fs/iomap/buffered-io.c | 8 ++++++--
>>> fs/xfs/xfs_aops.c | 4 ++--
>>> fs/zonefs/file.c | 4 ++--
>>> include/linux/iomap.h | 4 ++--
>>> 7 files changed, 18 insertions(+), 14 deletions(-)
>>>
>>
>> ...
>>
>>> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
>>> - const struct iomap_read_ops *read_ops)
>>> + const struct iomap_read_ops *read_ops, void *private)
>>> {
>>> struct iomap_iter iter = {
>>> .inode = folio->mapping->host,
>>> .pos = folio_pos(folio),
>>> .len = folio_size(folio),
>>> + .private = private,
>>> };
>>
>> Will this whole work land in v6.18?
>>
>> If not, may I ask if this patch can be moved earlier in this
>> patchset so that it can be applied separately? (I tried but had no
>> luck.)
>>
>> Because I also need a similar approach for the EROFS iomap page
>> cache sharing feature: EROFS uncompressed I/Os go through iomap, and
>> extra information needs a proper way to be passed down to
>> iomap_{begin,end}, also via the extra `.private` pointer.
>
> Hi Gao,
>
> I'm not sure whether this will be landed for v6.18 but I'm happy to
> shift this patch to the beginning of the patchset for applying
> separately.
Yeah, thanks. At least this common patch can then be applied easily
(e.g. forming a common commit id for both features if really needed),
since the other iomap/FUSE patches are not dependencies of our new
feature and shouldn't be coupled with our development branch later.
Thanks,
Gao Xiang
>
> Thanks,
> Joanne
>>
>> Thanks,
>> Gao Xiang
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 01/16] iomap: move async bio read logic into helper function
2025-08-29 23:56 ` [PATCH v1 01/16] iomap: move async bio read logic into helper function Joanne Koong
@ 2025-09-03 20:16 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:16 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:12PM -0700, Joanne Koong wrote:
> Move the iomap_readpage_iter() async bio read logic into a separate
> helper function. This is needed to make iomap read/readahead more
> generically usable, especially for filesystems that do not require
> CONFIG_BLOCK.
>
> Rename iomap_read_folio_range() to iomap_read_folio_range_sync() to
> differentiate between the synchronous and asynchronous bio folio read
> calls.
Hrmm. Readahead is asynchronous, whereas reading in data as part of an
unaligned write to a file must be synchronous. How about naming it
iomap_readahead_folio_range()?
Oh wait, iomap_read_folio also calls iomap_readpage_iter, which uses the
readahead paths to fill out a folio, but then waits for the folio lock
to drop, which effectively makes it ... a synchronous user of
asynchronous code.
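(Roughly what the VFS does around ->read_folio, heavily simplified --
from memory, so the details may be off:)

	/* mm/filemap.c, filemap_read_folio(), approximately: */
	error = filler(file, folio);	/* ->read_folio; the I/O may still
					 * be in flight when this returns */
	if (error)
		return error;
	/* the read completion unlocks the folio */
	error = folio_wait_locked_killable(folio);
	if (error)
		return error;
	return folio_test_uptodate(folio) ? 0 : -EIO;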
Bleh, naming is hard. Though the code splitting seems fine...
--D
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 68 ++++++++++++++++++++++++------------------
> 1 file changed, 39 insertions(+), 29 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index fd827398afd2..f8bdb2428819 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -357,36 +357,15 @@ struct iomap_readpage_ctx {
> struct readahead_control *rac;
> };
>
> -static int iomap_readpage_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> +static void iomap_read_folio_range_async(const struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
> {
> + struct folio *folio = ctx->cur_folio;
> const struct iomap *iomap = &iter->iomap;
> - loff_t pos = iter->pos;
> + struct iomap_folio_state *ifs = folio->private;
> + size_t poff = offset_in_folio(folio, pos);
> loff_t length = iomap_length(iter);
> - struct folio *folio = ctx->cur_folio;
> - struct iomap_folio_state *ifs;
> - size_t poff, plen;
> sector_t sector;
> - int ret;
> -
> - if (iomap->type == IOMAP_INLINE) {
> - ret = iomap_read_inline_data(iter, folio);
> - if (ret)
> - return ret;
> - return iomap_iter_advance(iter, &length);
> - }
> -
> - /* zero post-eof blocks as the page may be mapped */
> - ifs = ifs_alloc(iter->inode, folio, iter->flags);
> - iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> - if (plen == 0)
> - goto done;
> -
> - if (iomap_block_needs_zeroing(iter, pos)) {
> - folio_zero_range(folio, poff, plen);
> - iomap_set_range_uptodate(folio, poff, plen);
> - goto done;
> - }
>
> ctx->cur_folio_in_bio = true;
> if (ifs) {
> @@ -425,6 +404,37 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> ctx->bio->bi_end_io = iomap_read_end_io;
> bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> }
> +}
> +
> +static int iomap_readpage_iter(struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx)
> +{
> + const struct iomap *iomap = &iter->iomap;
> + loff_t pos = iter->pos;
> + loff_t length = iomap_length(iter);
> + struct folio *folio = ctx->cur_folio;
> + size_t poff, plen;
> + int ret;
> +
> + if (iomap->type == IOMAP_INLINE) {
> + ret = iomap_read_inline_data(iter, folio);
> + if (ret)
> + return ret;
> + return iomap_iter_advance(iter, &length);
> + }
> +
> + /* zero post-eof blocks as the page may be mapped */
> + ifs_alloc(iter->inode, folio, iter->flags);
> + iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> + if (plen == 0)
> + goto done;
> +
> + if (iomap_block_needs_zeroing(iter, pos)) {
> + folio_zero_range(folio, poff, plen);
> + iomap_set_range_uptodate(folio, poff, plen);
> + } else {
> + iomap_read_folio_range_async(iter, ctx, pos, plen);
> + }
>
> done:
> /*
> @@ -549,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> -static int iomap_read_folio_range(const struct iomap_iter *iter,
> +static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
> struct folio *folio, loff_t pos, size_t len)
> {
> const struct iomap *srcmap = iomap_iter_srcmap(iter);
> @@ -562,7 +572,7 @@ static int iomap_read_folio_range(const struct iomap_iter *iter,
> return submit_bio_wait(&bio);
> }
> #else
> -static int iomap_read_folio_range(const struct iomap_iter *iter,
> +static int iomap_read_folio_range_sync(const struct iomap_iter *iter,
> struct folio *folio, loff_t pos, size_t len)
> {
> WARN_ON_ONCE(1);
> @@ -739,7 +749,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
> status = write_ops->read_folio_range(iter,
> folio, block_start, plen);
> else
> - status = iomap_read_folio_range(iter,
> + status = iomap_read_folio_range_sync(iter,
> folio, block_start, plen);
> if (status)
> return status;
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked
2025-08-29 23:56 ` [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked Joanne Koong
@ 2025-09-03 20:26 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:26 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:13PM -0700, Joanne Koong wrote:
> The purpose of struct iomap_readpage_ctx's cur_folio_in_bio is to track
> if the folio needs to be unlocked or not. Rename this to folio_unlocked
> to make the purpose more clear and so that when iomap read/readahead
> logic is made generic, the name also makes sense for filesystems that
> don't use bios.
Hrmmm. The problem is, "cur_folio_in_bio" captures the meaning that the
(locked) folio is attached to the bio, so the bio's end_io function has to
unlock the folio. The readahead context is basically borrowing the
folio and cannot unlock the folio itself.
The name folio_unlocked doesn't capture the change in ownership, it just
fixates on the lock state, which (imo) is a side effect of the folio lock
ownership.
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f8bdb2428819..4b173aad04ed 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -352,7 +352,7 @@ static void iomap_read_end_io(struct bio *bio)
>
> struct iomap_readpage_ctx {
> struct folio *cur_folio;
> - bool cur_folio_in_bio;
> + bool folio_unlocked;
Maybe this ought to be called cur_folio_borrowed?
/*
* Folio readahead can transfer ownership of a folio lock to
* an external reader (e.g. bios) with the expectation that
* the new owner will unlock the folio when the readahead is
* complete. Under these circumstances, the readahead context
* is merely borrowing the folio and must not unlock it.
*/
bool cur_folio_borrowed;
> struct bio *bio;
> struct readahead_control *rac;
> };
> @@ -367,7 +367,7 @@ static void iomap_read_folio_range_async(const struct iomap_iter *iter,
> loff_t length = iomap_length(iter);
> sector_t sector;
>
> - ctx->cur_folio_in_bio = true;
> + ctx->folio_unlocked = true;
ctx->cur_folio_borrowed = true;
> if (ifs) {
> spin_lock_irq(&ifs->state_lock);
> ifs->read_bytes_pending += plen;
> @@ -480,9 +480,9 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
>
> if (ctx.bio) {
> submit_bio(ctx.bio);
> - WARN_ON_ONCE(!ctx.cur_folio_in_bio);
> + WARN_ON_ONCE(!ctx.folio_unlocked);
> } else {
> - WARN_ON_ONCE(ctx.cur_folio_in_bio);
> + WARN_ON_ONCE(ctx.folio_unlocked);
> folio_unlock(folio);
> }
>
> @@ -503,13 +503,13 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> while (iomap_length(iter)) {
> if (ctx->cur_folio &&
> offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
> - if (!ctx->cur_folio_in_bio)
> + if (!ctx->folio_unlocked)
> folio_unlock(ctx->cur_folio);
> ctx->cur_folio = NULL;
> }
> if (!ctx->cur_folio) {
> ctx->cur_folio = readahead_folio(ctx->rac);
> - ctx->cur_folio_in_bio = false;
> + ctx->folio_unlocked = false;
ctx->cur_folio_borrowed = false;
> }
> ret = iomap_readpage_iter(iter, ctx);
> if (ret)
> @@ -552,10 +552,8 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
>
> if (ctx.bio)
> submit_bio(ctx.bio);
> - if (ctx.cur_folio) {
> - if (!ctx.cur_folio_in_bio)
> - folio_unlock(ctx.cur_folio);
> - }
> + if (ctx.cur_folio && !ctx.folio_unlocked)
> + folio_unlock(ctx.cur_folio);
if (ctx.cur_folio && !ctx.cur_folio_borrowed)
folio_unlock(ctx.cur_folio);
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 04/16] iomap: use iomap_iter->private for stashing read/readahead bio
2025-08-29 23:56 ` [PATCH v1 04/16] iomap: use iomap_iter->private for stashing read/readahead bio Joanne Koong
@ 2025-09-03 20:30 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:30 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:15PM -0700, Joanne Koong wrote:
> Use the iomap_iter->private field for stashing any read/readahead bios
> instead of defining the bio as part of the iomap_readpage_ctx struct.
> This makes the read/readahead interface more generic. Some filesystems
> that will be using iomap for read/readahead may not have CONFIG_BLOCK
> set.
Sorry, but I don't like abusing iomap_iter::private because (a) it's a
void pointer which means shenanigans; and (b) private exists to store
some private data for an iomap caller, not iomap itself.
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 49 +++++++++++++++++++++---------------------
> 1 file changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f2bfb3e17bb0..9db233a4a82c 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -353,11 +353,10 @@ static void iomap_read_end_io(struct bio *bio)
> struct iomap_readpage_ctx {
> struct folio *cur_folio;
> bool folio_unlocked;
> - struct bio *bio;
Does this work if you do:
#ifdef CONFIG_BLOCK
struct bio *bio;
#endif
Hm? Possibly with a forward declaration of struct bio to shut the
compiler up?
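(i.e. something like this sketch, with the rest of the struct as-is:

	struct bio;

	struct iomap_readpage_ctx {
		struct folio *cur_folio;
		bool folio_unlocked;
#ifdef CONFIG_BLOCK
		struct bio *bio;
#endif
		struct readahead_control *rac;
	};
)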
--D
> struct readahead_control *rac;
> };
>
> -static void iomap_read_folio_range_async(const struct iomap_iter *iter,
> +static void iomap_read_folio_range_async(struct iomap_iter *iter,
> struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
> {
> struct folio *folio = ctx->cur_folio;
> @@ -365,6 +364,7 @@ static void iomap_read_folio_range_async(const struct iomap_iter *iter,
> struct iomap_folio_state *ifs = folio->private;
> size_t poff = offset_in_folio(folio, pos);
> loff_t length = iomap_length(iter);
> + struct bio *bio = iter->private;
> sector_t sector;
>
> ctx->folio_unlocked = true;
> @@ -375,34 +375,32 @@ static void iomap_read_folio_range_async(const struct iomap_iter *iter,
> }
>
> sector = iomap_sector(iomap, pos);
> - if (!ctx->bio ||
> - bio_end_sector(ctx->bio) != sector ||
> - !bio_add_folio(ctx->bio, folio, plen, poff)) {
> + if (!bio || bio_end_sector(bio) != sector ||
> + !bio_add_folio(bio, folio, plen, poff)) {
> gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> gfp_t orig_gfp = gfp;
> unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
>
> - if (ctx->bio)
> - submit_bio(ctx->bio);
> + if (bio)
> + submit_bio(bio);
>
> if (ctx->rac) /* same as readahead_gfp_mask */
> gfp |= __GFP_NORETRY | __GFP_NOWARN;
> - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> - REQ_OP_READ, gfp);
> + bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> + REQ_OP_READ, gfp);
> /*
> * If the bio_alloc fails, try it again for a single page to
> * avoid having to deal with partial page reads. This emulates
> * what do_mpage_read_folio does.
> */
> - if (!ctx->bio) {
> - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> - orig_gfp);
> - }
> + if (!bio)
> + bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> + iter->private = bio;
> if (ctx->rac)
> - ctx->bio->bi_opf |= REQ_RAHEAD;
> - ctx->bio->bi_iter.bi_sector = sector;
> - ctx->bio->bi_end_io = iomap_read_end_io;
> - bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> + bio->bi_opf |= REQ_RAHEAD;
> + bio->bi_iter.bi_sector = sector;
> + bio->bi_end_io = iomap_read_end_io;
> + bio_add_folio_nofail(bio, folio, plen, poff);
> }
> }
>
> @@ -447,15 +445,18 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> return iomap_iter_advance(iter, &length);
> }
>
> -static void iomap_readfolio_submit(const struct iomap_readpage_ctx *ctx)
> +static void iomap_readfolio_submit(const struct iomap_iter *iter)
> {
> - if (ctx->bio)
> - submit_bio(ctx->bio);
> + struct bio *bio = iter->private;
> +
> + if (bio)
> + submit_bio(bio);
> }
>
> -static void iomap_readfolio_complete(const struct iomap_readpage_ctx *ctx)
> +static void iomap_readfolio_complete(const struct iomap_iter *iter,
> + const struct iomap_readpage_ctx *ctx)
> {
> - iomap_readfolio_submit(ctx);
> + iomap_readfolio_submit(iter);
>
> if (ctx->cur_folio && !ctx->folio_unlocked)
> folio_unlock(ctx->cur_folio);
> @@ -492,7 +493,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> while ((ret = iomap_iter(&iter, ops)) > 0)
> iter.status = iomap_read_folio_iter(&iter, &ctx);
>
> - iomap_readfolio_complete(&ctx);
> + iomap_readfolio_complete(&iter, &ctx);
>
> /*
> * Just like mpage_readahead and block_read_full_folio, we always
> @@ -558,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> while (iomap_iter(&iter, ops) > 0)
> iter.status = iomap_readahead_iter(&iter, &ctx);
>
> - iomap_readfolio_complete(&ctx);
> + iomap_readfolio_complete(&iter, &ctx);
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 05/16] iomap: propagate iomap_read_folio() error to caller
2025-08-29 23:56 ` [PATCH v1 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
@ 2025-09-03 20:32 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:32 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:16PM -0700, Joanne Koong wrote:
> Propagate any error encountered in iomap_read_folio() back up to its
> caller (otherwise a default -EIO will be passed up by
> filemap_read_folio() to callers). This is standard behavior for how
> other filesystems handle their ->read_folio() errors as well.
>
> Remove the out of date comment about setting the folio error flag.
> Folio error flags were removed in commit 1f56eedf7ff7 ("iomap:
> Remove calls to set and clear folio error flag").
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
This seems correct to me
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9db233a4a82c..8dd26c50e5ea 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -495,12 +495,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
>
> iomap_readfolio_complete(&iter, &ctx);
>
> - /*
> - * Just like mpage_readahead and block_read_full_folio, we always
> - * return 0 and just set the folio error flag on errors. This
> - * should be cleaned up throughout the stack eventually.
> - */
> - return 0;
> + return ret;
> }
> EXPORT_SYMBOL_GPL(iomap_read_folio);
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 07/16] iomap: iterate through entire folio in iomap_readpage_iter()
2025-08-29 23:56 ` [PATCH v1 07/16] iomap: iterate through entire folio in iomap_readpage_iter() Joanne Koong
@ 2025-09-03 20:43 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:43 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:18PM -0700, Joanne Koong wrote:
> Iterate through the entire folio in iomap_readpage_iter() in one go
> instead of in pieces. This will be needed for supporting user-provided
> async read folio callbacks (not yet added). This additionally makes the
> iomap_readahead_iter() logic simpler to follow.
This might be a good time to change the name since you're not otherwise
changing the function declaration, and there ought to be /some/
indication that the behavior isn't the same anymore.
Otherwise, this looks correct to me.
--D
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 76 ++++++++++++++++++------------------------
> 1 file changed, 33 insertions(+), 43 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f26544fbcb36..75bbef386b62 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -452,6 +452,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> loff_t length = iomap_length(iter);
> struct folio *folio = ctx->cur_folio;
> size_t poff, plen;
> + loff_t count;
> int ret;
>
> if (iomap->type == IOMAP_INLINE) {
> @@ -463,26 +464,30 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
>
> /* zero post-eof blocks as the page may be mapped */
> ifs_alloc(iter->inode, folio, iter->flags);
> - iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> - if (plen == 0)
> - goto done;
>
> - if (iomap_block_needs_zeroing(iter, pos)) {
> - folio_zero_range(folio, poff, plen);
> - iomap_set_range_uptodate(folio, poff, plen);
> - } else {
> - iomap_read_folio_range_async(iter, ctx, pos, plen);
> - }
> + length = min_t(loff_t, length,
> + folio_size(folio) - offset_in_folio(folio, pos));
> + while (length) {
> + iomap_adjust_read_range(iter->inode, folio, &pos,
> + length, &poff, &plen);
> + count = pos - iter->pos + plen;
> + if (plen == 0)
> + return iomap_iter_advance(iter, &count);
>
> -done:
> - /*
> - * Move the caller beyond our range so that it keeps making progress.
> - * For that, we have to include any leading non-uptodate ranges, but
> - * we can skip trailing ones as they will be handled in the next
> - * iteration.
> - */
> - length = pos - iter->pos + plen;
> - return iomap_iter_advance(iter, &length);
> + if (iomap_block_needs_zeroing(iter, pos)) {
> + folio_zero_range(folio, poff, plen);
> + iomap_set_range_uptodate(folio, poff, plen);
> + } else {
> + iomap_read_folio_range_async(iter, ctx, pos, plen);
> + }
> +
> + length -= count;
> + ret = iomap_iter_advance(iter, &count);
> + if (ret)
> + return ret;
> + pos = iter->pos;
> + }
> + return 0;
> }
>
> static void iomap_readfolio_complete(const struct iomap_iter *iter,
> @@ -494,20 +499,6 @@ static void iomap_readfolio_complete(const struct iomap_iter *iter,
> folio_unlock(ctx->cur_folio);
> }
>
> -static int iomap_read_folio_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> -{
> - int ret;
> -
> - while (iomap_length(iter)) {
> - ret = iomap_readpage_iter(iter, ctx);
> - if (ret)
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> {
> struct iomap_iter iter = {
> @@ -523,7 +514,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.status = iomap_read_folio_iter(&iter, &ctx);
> + iter.status = iomap_readpage_iter(&iter, &ctx);
>
> iomap_readfolio_complete(&iter, &ctx);
>
> @@ -537,16 +528,15 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> int ret;
>
> while (iomap_length(iter)) {
> - if (ctx->cur_folio &&
> - offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
> - if (!ctx->folio_unlocked)
> - folio_unlock(ctx->cur_folio);
> - ctx->cur_folio = NULL;
> - }
> - if (!ctx->cur_folio) {
> - ctx->cur_folio = readahead_folio(ctx->rac);
> - ctx->folio_unlocked = false;
> - }
> + if (ctx->cur_folio && !ctx->folio_unlocked)
> + folio_unlock(ctx->cur_folio);
> + ctx->cur_folio = readahead_folio(ctx->rac);
> + /*
> + * We should never in practice hit this case since
> + * the iter length matches the readahead length.
> + */
> + WARN_ON(!ctx->cur_folio);
> + ctx->folio_unlocked = false;
> ret = iomap_readpage_iter(iter, ctx);
> if (ret)
> return ret;
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 09/16] iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx
2025-08-29 23:56 ` [PATCH v1 09/16] iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx Joanne Koong
@ 2025-09-03 20:44 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:44 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:20PM -0700, Joanne Koong wrote:
> ->readpage was deprecated and reads are now on folios.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
For this and the previous rename patches,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 743112c7f8e6..a3a9b6146c2f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -317,7 +317,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
> return 0;
> }
>
> -struct iomap_readpage_ctx {
> +struct iomap_readfolio_ctx {
> struct folio *cur_folio;
> bool folio_unlocked;
> struct readahead_control *rac;
> @@ -357,7 +357,7 @@ static void iomap_read_end_io(struct bio *bio)
> }
>
> static void iomap_read_folio_range_async(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
> + struct iomap_readfolio_ctx *ctx, loff_t pos, size_t plen)
> {
> struct folio *folio = ctx->cur_folio;
> const struct iomap *iomap = &iter->iomap;
> @@ -426,7 +426,7 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
> }
> #else
> static void iomap_read_folio_range_async(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx, loff_t pos, size_t len)
> + struct iomap_readfolio_ctx *ctx, loff_t pos, size_t len)
> {
> WARN_ON_ONCE(1);
> }
> @@ -445,7 +445,7 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
> #endif /* CONFIG_BLOCK */
>
> static int iomap_readfolio_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> + struct iomap_readfolio_ctx *ctx)
> {
> const struct iomap *iomap = &iter->iomap;
> loff_t pos = iter->pos;
> @@ -491,7 +491,7 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
> }
>
> static void iomap_readfolio_complete(const struct iomap_iter *iter,
> - const struct iomap_readpage_ctx *ctx)
> + const struct iomap_readfolio_ctx *ctx)
> {
> iomap_readfolio_submit(iter);
>
> @@ -506,7 +506,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> .pos = folio_pos(folio),
> .len = folio_size(folio),
> };
> - struct iomap_readpage_ctx ctx = {
> + struct iomap_readfolio_ctx ctx = {
> .cur_folio = folio,
> };
> int ret;
> @@ -523,7 +523,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> EXPORT_SYMBOL_GPL(iomap_read_folio);
>
> static int iomap_readahead_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> + struct iomap_readfolio_ctx *ctx)
> {
> int ret;
>
> @@ -567,7 +567,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> .pos = readahead_pos(rac),
> .len = readahead_length(rac),
> };
> - struct iomap_readpage_ctx ctx = {
> + struct iomap_readfolio_ctx ctx = {
> .rac = rac,
> };
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 10/16] iomap: add iomap_start_folio_read() helper
2025-08-29 23:56 ` [PATCH v1 10/16] iomap: add iomap_start_folio_read() helper Joanne Koong
@ 2025-09-03 20:52 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:52 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:21PM -0700, Joanne Koong wrote:
> Move ifs read_bytes_pending addition logic into a separate helper,
> iomap_start_folio_read(), which will be needed later on by user-provided
> read callbacks (not yet added). This is the
> counterpart to the already-existing iomap_finish_folio_read().
Looks ok but aren't your new fuse functions going to need
iomap_start_folio_read? In which case, don't they need to be outside of
#ifdef CONFIG_BLOCK? Why not put them there and avoid patch 11?
Eh, whatever, the end result is the same
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a3a9b6146c2f..6a9f9a9e591f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -324,6 +324,17 @@ struct iomap_readfolio_ctx {
> };
>
> #ifdef CONFIG_BLOCK
> +static void iomap_start_folio_read(struct folio *folio, size_t len)
> +{
> + struct iomap_folio_state *ifs = folio->private;
> +
> + if (ifs) {
> + spin_lock_irq(&ifs->state_lock);
> + ifs->read_bytes_pending += len;
> + spin_unlock_irq(&ifs->state_lock);
> + }
> +}
> +
> static void iomap_finish_folio_read(struct folio *folio, size_t off,
> size_t len, int error)
> {
> @@ -361,18 +372,13 @@ static void iomap_read_folio_range_async(struct iomap_iter *iter,
> {
> struct folio *folio = ctx->cur_folio;
> const struct iomap *iomap = &iter->iomap;
> - struct iomap_folio_state *ifs = folio->private;
> size_t poff = offset_in_folio(folio, pos);
> loff_t length = iomap_length(iter);
> struct bio *bio = iter->private;
> sector_t sector;
>
> ctx->folio_unlocked = true;
> - if (ifs) {
> - spin_lock_irq(&ifs->state_lock);
> - ifs->read_bytes_pending += plen;
> - spin_unlock_irq(&ifs->state_lock);
> - }
> + iomap_start_folio_read(folio, plen);
>
> sector = iomap_sector(iomap, pos);
> if (!bio || bio_end_sector(bio) != sector ||
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 11/16] iomap: make start folio read and finish folio read public APIs
2025-08-29 23:56 ` [PATCH v1 11/16] iomap: make start folio read and finish folio read public APIs Joanne Koong
@ 2025-09-03 20:53 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 20:53 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:22PM -0700, Joanne Koong wrote:
> Make iomap_start_folio_read() and iomap_finish_folio_read() publicly
> accessible. These need to be accessible in order to support
> user-provided read folio callbacks for read/readahead.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Looks decent,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 10 ++++++----
> include/linux/iomap.h | 3 +++
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 6a9f9a9e591f..5d153c6b16b6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -323,8 +323,7 @@ struct iomap_readfolio_ctx {
> struct readahead_control *rac;
> };
>
> -#ifdef CONFIG_BLOCK
> -static void iomap_start_folio_read(struct folio *folio, size_t len)
> +void iomap_start_folio_read(struct folio *folio, size_t len)
> {
> struct iomap_folio_state *ifs = folio->private;
>
> @@ -334,9 +333,10 @@ static void iomap_start_folio_read(struct folio *folio, size_t len)
> spin_unlock_irq(&ifs->state_lock);
> }
> }
> +EXPORT_SYMBOL_GPL(iomap_start_folio_read);
>
> -static void iomap_finish_folio_read(struct folio *folio, size_t off,
> - size_t len, int error)
> +void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> + int error)
> {
> struct iomap_folio_state *ifs = folio->private;
> bool uptodate = !error;
> @@ -356,7 +356,9 @@ static void iomap_finish_folio_read(struct folio *folio, size_t off,
> if (finished)
> folio_end_read(folio, uptodate);
> }
> +EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
>
> +#ifdef CONFIG_BLOCK
> static void iomap_read_end_io(struct bio *bio)
> {
> int error = blk_status_to_errno(bio->bi_status);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 73dceabc21c8..0938c4a57f4c 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -467,6 +467,9 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> loff_t pos, loff_t end_pos, unsigned int dirty_len);
> int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
>
> +void iomap_start_folio_read(struct folio *folio, size_t len);
> +void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> + int error);
> void iomap_start_folio_write(struct inode *inode, struct folio *folio,
> size_t len);
> void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead
2025-08-29 23:56 ` [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead Joanne Koong
@ 2025-09-03 21:08 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 21:08 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:23PM -0700, Joanne Koong wrote:
> Add a "struct iomap_read_ops" that contains a read_folio_range()
> callback that callers can provide as a custom handler for reading in a
> folio range, if the caller does not wish to issue bio read requests
> (which otherwise is the default behavior). read_folio_range() may read
> the request asynchronously or synchronously. The caller is responsible
> for calling iomap_start_folio_read()/iomap_finish_folio_read() when
> reading the folio range.
>
> This makes it so that non-block based filesystems may use iomap for
> reads.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> .../filesystems/iomap/operations.rst | 19 +++++
> block/fops.c | 4 +-
> fs/erofs/data.c | 4 +-
> fs/gfs2/aops.c | 4 +-
> fs/iomap/buffered-io.c | 79 +++++++++++++------
> fs/xfs/xfs_aops.c | 4 +-
> fs/zonefs/file.c | 4 +-
> include/linux/iomap.h | 21 ++++-
> 8 files changed, 105 insertions(+), 34 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 067ed8e14ef3..215053f0779d 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -57,6 +57,25 @@ The following address space operations can be wrapped easily:
> * ``bmap``
> * ``swap_activate``
>
> +``struct iomap_read_ops``
> +--------------------------
> +
> +.. code-block:: c
> +
> + struct iomap_read_ops {
> + int (*read_folio_range)(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos, size_t len);
> + };
> +
> +iomap calls these functions:
> +
> + - ``read_folio_range``: Called to read in the range (read does not need to
> + be synchronous). The caller is responsible for calling
Er... does this perform the read synchronously or asynchronously?
Does the implementer need to know? How does iomap figure out what
happened?
My guess is that iomap_finish_folio_read unlocks the folio, and anyone
who cared is by this point already waiting on the folio lock? So it's
actually not important if the ->read_folio_range implementation runs
async or not; the key is that the folio stays locked until we've
completed the read IO?
> + iomap_start_folio_read() and iomap_finish_folio_read() when reading the
> + folio range. This should be done even if an error is encountered during
> + the read. If this function is not provided by the caller, then iomap
> + will default to issuing asynchronous bio read requests.
What is this function supposed to return? The usual 0 or negative
errno?
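Both guesses match the patch: the iter loop treats the return value as
0-or-negative-errno (it breaks on nonzero), and the folio stays locked
until the last iomap_finish_folio_read() drains read_bytes_pending and
calls folio_end_read(). A minimal sketch of a synchronous implementation,
with myfs_read_range() as a hypothetical filesystem helper; the iomap
calls are the ones added in patches 10 and 11:

	static int myfs_read_folio_range(const struct iomap_iter *iter,
					 struct folio *folio, loff_t pos,
					 size_t len)
	{
		size_t off = offset_in_folio(folio, pos);
		int ret;

		/* Account for this in-flight range before reading. */
		iomap_start_folio_read(folio, len);

		/* Synchronous read into the folio (hypothetical helper). */
		ret = myfs_read_range(iter->inode, folio, off, len);

		/*
		 * Pair with iomap_finish_folio_read() even on error; once
		 * read_bytes_pending drains, this marks the range uptodate
		 * (on success) and ends the read via folio_end_read().
		 */
		iomap_finish_folio_read(folio, off, len, ret);

		/* 0 on success, negative errno on failure. */
		return ret;
	}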
> +
> ``struct iomap_write_ops``
> --------------------------
>
> diff --git a/block/fops.c b/block/fops.c
> index ddbc69c0922b..b42e16d0eb35 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -533,12 +533,12 @@ const struct address_space_operations def_blk_aops = {
> #else /* CONFIG_BUFFER_HEAD */
> static int blkdev_read_folio(struct file *file, struct folio *folio)
> {
> - return iomap_read_folio(folio, &blkdev_iomap_ops);
> + return iomap_read_folio(folio, &blkdev_iomap_ops, NULL);
> }
>
> static void blkdev_readahead(struct readahead_control *rac)
> {
> - iomap_readahead(rac, &blkdev_iomap_ops);
> + iomap_readahead(rac, &blkdev_iomap_ops, NULL);
> }
>
> static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 3b1ba571c728..ea451f233263 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -371,7 +371,7 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
> {
> trace_erofs_read_folio(folio, true);
>
> - return iomap_read_folio(folio, &erofs_iomap_ops);
> + return iomap_read_folio(folio, &erofs_iomap_ops, NULL);
> }
>
> static void erofs_readahead(struct readahead_control *rac)
> @@ -379,7 +379,7 @@ static void erofs_readahead(struct readahead_control *rac)
> trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
> readahead_count(rac), true);
>
> - return iomap_readahead(rac, &erofs_iomap_ops);
> + return iomap_readahead(rac, &erofs_iomap_ops, NULL);
> }
>
> static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 47d74afd63ac..bf531bcfd8a0 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -428,7 +428,7 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
>
> if (!gfs2_is_jdata(ip) ||
> (i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
> - error = iomap_read_folio(folio, &gfs2_iomap_ops);
> + error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL);
> } else if (gfs2_is_stuffed(ip)) {
> error = stuffed_read_folio(ip, folio);
> } else {
> @@ -503,7 +503,7 @@ static void gfs2_readahead(struct readahead_control *rac)
> else if (gfs2_is_jdata(ip))
> mpage_readahead(rac, gfs2_block_map);
> else
> - iomap_readahead(rac, &gfs2_iomap_ops);
> + iomap_readahead(rac, &gfs2_iomap_ops, NULL);
> }
>
> /**
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 5d153c6b16b6..06f2c857de64 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -335,8 +335,8 @@ void iomap_start_folio_read(struct folio *folio, size_t len)
> }
> EXPORT_SYMBOL_GPL(iomap_start_folio_read);
>
> -void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> - int error)
> +static void __iomap_finish_folio_read(struct folio *folio, size_t off,
> + size_t len, int error, bool update_bitmap)
> {
> struct iomap_folio_state *ifs = folio->private;
> bool uptodate = !error;
> @@ -346,7 +346,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> unsigned long flags;
>
> spin_lock_irqsave(&ifs->state_lock, flags);
> - if (!error)
> + if (!error && update_bitmap)
> uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
When do we /not/ want to set uptodate after a successful read? I guess
iomap_read_folio_range_async goes through the bio machinery and sets
uptodate via iomap_finish_folio_read()? Does the ->read_folio_range
function need to set the uptodate bits itself? Possibly by calling
iomap_finish_folio_read as well?
> ifs->read_bytes_pending -= len;
> finished = !ifs->read_bytes_pending;
> @@ -356,6 +356,12 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> if (finished)
> folio_end_read(folio, uptodate);
> }
> +
> +void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> + int error)
> +{
> + return __iomap_finish_folio_read(folio, off, len, error, true);
> +}
> EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
>
> #ifdef CONFIG_BLOCK
> @@ -379,7 +385,6 @@ static void iomap_read_folio_range_async(struct iomap_iter *iter,
> struct bio *bio = iter->private;
> sector_t sector;
>
> - ctx->folio_unlocked = true;
> iomap_start_folio_read(folio, plen);
>
> sector = iomap_sector(iomap, pos);
> @@ -453,15 +458,17 @@ static void iomap_readfolio_submit(const struct iomap_iter *iter)
> #endif /* CONFIG_BLOCK */
>
> static int iomap_readfolio_iter(struct iomap_iter *iter,
> - struct iomap_readfolio_ctx *ctx)
> + struct iomap_readfolio_ctx *ctx,
> + const struct iomap_read_ops *read_ops)
> {
> const struct iomap *iomap = &iter->iomap;
> + struct iomap_folio_state *ifs;
> loff_t pos = iter->pos;
> loff_t length = iomap_length(iter);
> struct folio *folio = ctx->cur_folio;
> size_t poff, plen;
> loff_t count;
> - int ret;
> + int ret = 0;
>
> if (iomap->type == IOMAP_INLINE) {
> ret = iomap_read_inline_data(iter, folio);
> @@ -471,7 +478,14 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
> }
>
> /* zero post-eof blocks as the page may be mapped */
> - ifs_alloc(iter->inode, folio, iter->flags);
> + ifs = ifs_alloc(iter->inode, folio, iter->flags);
> +
> + /*
> + * Add a bias to ifs->read_bytes_pending so that a read is ended only
> + * after all the ranges have been read in.
> + */
> + if (ifs)
> + iomap_start_folio_read(folio, 1);
>
> length = min_t(loff_t, length,
> folio_size(folio) - offset_in_folio(folio, pos));
> @@ -479,35 +493,53 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
> iomap_adjust_read_range(iter->inode, folio, &pos,
> length, &poff, &plen);
> count = pos - iter->pos + plen;
> - if (plen == 0)
> - return iomap_iter_advance(iter, &count);
> + if (plen == 0) {
> + ret = iomap_iter_advance(iter, &count);
> + break;
> + }
>
> if (iomap_block_needs_zeroing(iter, pos)) {
> folio_zero_range(folio, poff, plen);
> iomap_set_range_uptodate(folio, poff, plen);
> } else {
> - iomap_read_folio_range_async(iter, ctx, pos, plen);
> + ctx->folio_unlocked = true;
> + if (read_ops && read_ops->read_folio_range) {
> + ret = read_ops->read_folio_range(iter, folio, pos, plen);
> + if (ret)
> + break;
> + } else {
> + iomap_read_folio_range_async(iter, ctx, pos, plen);
> + }
> }
>
> length -= count;
> ret = iomap_iter_advance(iter, &count);
> if (ret)
> - return ret;
> + break;
> pos = iter->pos;
> }
> - return 0;
> +
> + if (ifs) {
> + __iomap_finish_folio_read(folio, 0, 1, ret, false);
> + ctx->folio_unlocked = true;
Er.... so we subtract 1 from read_bytes_pending? I thought the
->read_folio_range ioend was supposed to decrease that?
--D
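The 1 dropped here is the bias taken at the top of iomap_readfolio_iter();
each ->read_folio_range completion still subtracts its own len. A toy
userspace model of the counting (plain C, not kernel code) shows why the
bias keeps the folio read from ending while the iter loop is still
dispatching ranges:

	#include <stdio.h>

	static unsigned int read_bytes_pending;

	/* Mirrors the "finished" check: end the folio read at zero. */
	static int finish(unsigned int len)
	{
		read_bytes_pending -= len;
		return read_bytes_pending == 0;
	}

	int main(void)
	{
		read_bytes_pending += 1;	/* bias, before any dispatch */

		read_bytes_pending += 4096;	/* dispatch first range */
		printf("range completes: end read? %d\n", finish(4096)); /* 0 */

		/* iter loop done: __iomap_finish_folio_read(folio, 0, 1, ...) */
		printf("bias dropped:    end read? %d\n", finish(1));	  /* 1 */
		return 0;
	}

Without the bias, a range that completes synchronously would drive
read_bytes_pending to zero and end the read before the remaining ranges
had even been dispatched.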
> + }
> +
> + return ret;
> }
>
> static void iomap_readfolio_complete(const struct iomap_iter *iter,
> - const struct iomap_readfolio_ctx *ctx)
> + const struct iomap_readfolio_ctx *ctx,
> + const struct iomap_read_ops *read_ops)
> {
> - iomap_readfolio_submit(iter);
> + if (!read_ops || !read_ops->read_folio_range)
> + iomap_readfolio_submit(iter);
>
> if (ctx->cur_folio && !ctx->folio_unlocked)
> folio_unlock(ctx->cur_folio);
> }
>
> -int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> +int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
> + const struct iomap_read_ops *read_ops)
> {
> struct iomap_iter iter = {
> .inode = folio->mapping->host,
> @@ -522,16 +554,17 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.status = iomap_readfolio_iter(&iter, &ctx);
> + iter.status = iomap_readfolio_iter(&iter, &ctx, read_ops);
>
> - iomap_readfolio_complete(&iter, &ctx);
> + iomap_readfolio_complete(&iter, &ctx, read_ops);
>
> return ret;
> }
> EXPORT_SYMBOL_GPL(iomap_read_folio);
>
> static int iomap_readahead_iter(struct iomap_iter *iter,
> - struct iomap_readfolio_ctx *ctx)
> + struct iomap_readfolio_ctx *ctx,
> + const struct iomap_read_ops *read_ops)
> {
> int ret;
>
> @@ -545,7 +578,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> */
> WARN_ON(!ctx->cur_folio);
> ctx->folio_unlocked = false;
> - ret = iomap_readfolio_iter(iter, ctx);
> + ret = iomap_readfolio_iter(iter, ctx, read_ops);
> if (ret)
> return ret;
> }
> @@ -557,6 +590,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> * iomap_readahead - Attempt to read pages from a file.
> * @rac: Describes the pages to be read.
> * @ops: The operations vector for the filesystem.
> + * @read_ops: Optional ops callers can pass in if they want custom handling.
> *
> * This function is for filesystems to call to implement their readahead
> * address_space operation.
> @@ -568,7 +602,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> * function is called with memalloc_nofs set, so allocations will not cause
> * the filesystem to be reentered.
> */
> -void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> +void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
> + const struct iomap_read_ops *read_ops)
> {
> struct iomap_iter iter = {
> .inode = rac->mapping->host,
> @@ -582,9 +617,9 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
>
> while (iomap_iter(&iter, ops) > 0)
> - iter.status = iomap_readahead_iter(&iter, &ctx);
> + iter.status = iomap_readahead_iter(&iter, &ctx, read_ops);
>
> - iomap_readfolio_complete(&iter, &ctx);
> + iomap_readfolio_complete(&iter, &ctx, read_ops);
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 1ee4f835ac3c..fb2150c0825a 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -742,14 +742,14 @@ xfs_vm_read_folio(
> struct file *unused,
> struct folio *folio)
> {
> - return iomap_read_folio(folio, &xfs_read_iomap_ops);
> + return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
> }
>
> STATIC void
> xfs_vm_readahead(
> struct readahead_control *rac)
> {
> - iomap_readahead(rac, &xfs_read_iomap_ops);
> + iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
> }
>
> static int
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index fd3a5922f6c3..96470daf4d3f 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -112,12 +112,12 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
>
> static int zonefs_read_folio(struct file *unused, struct folio *folio)
> {
> - return iomap_read_folio(folio, &zonefs_read_iomap_ops);
> + return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL);
> }
>
> static void zonefs_readahead(struct readahead_control *rac)
> {
> - iomap_readahead(rac, &zonefs_read_iomap_ops);
> + iomap_readahead(rac, &zonefs_read_iomap_ops, NULL);
> }
>
> /*
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 0938c4a57f4c..a7247439aeb5 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -178,6 +178,21 @@ struct iomap_write_ops {
> struct folio *folio, loff_t pos, size_t len);
> };
>
> +struct iomap_read_ops {
> + /*
> + * If the filesystem doesn't provide a custom handler for reading in the
> + * contents of a folio, iomap will default to issuing asynchronous bio
> + * read requests.
> + *
> + * The read does not need to be done synchronously. The caller is
> + * responsible for calling iomap_start_folio_read() and
> + * iomap_finish_folio_read() when reading the folio range. This should
> + * be done even if an error is encountered during the read.
> + */
> + int (*read_folio_range)(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos, size_t len);
> +};
> +
> /*
> * Flags for iomap_begin / iomap_end. No flag implies a read.
> */
> @@ -339,8 +354,10 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
> ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> const struct iomap_ops *ops,
> const struct iomap_write_ops *write_ops, void *private);
> -int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
> -void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
> +int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
> + const struct iomap_read_ops *read_ops);
> +void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops,
> + const struct iomap_read_ops *read_ops);
> bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
> struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
> bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 13/16] iomap: add a private arg for read and readahead
2025-08-29 23:56 ` [PATCH v1 13/16] iomap: add a private arg " Joanne Koong
2025-08-30 1:54 ` Gao Xiang
@ 2025-09-03 21:11 ` Darrick J. Wong
1 sibling, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 21:11 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:24PM -0700, Joanne Koong wrote:
> Add a void *private arg for read and readahead which filesystems that
> pass in custom read callbacks can use. Stash this in the existing
> private field in the iomap_iter.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Seems reasonable to me, though what happens if an iomap user passes a
non-NULL private pointer here but no read ops, and then iomap_readahead
tries to store a bio in there?
(This is why I disliked that previous patch so strongly)
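A guard along these lines would make the hazard explicit; this helper is
purely illustrative and not part of the series:

	/*
	 * Hypothetical sanity check: the default bio path stashes its bio
	 * in iter->private, so a caller-supplied private pointer is only
	 * safe when a custom ->read_folio_range is provided.
	 */
	static inline bool iomap_read_private_ok(const struct iomap_read_ops *read_ops,
						 void *private)
	{
		return !private || (read_ops && read_ops->read_folio_range);
	}

iomap_read_folio() and iomap_readahead() could then
WARN_ON_ONCE(!iomap_read_private_ok(read_ops, private)) before filling
out the iter.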
--D
> ---
> block/fops.c | 4 ++--
> fs/erofs/data.c | 4 ++--
> fs/gfs2/aops.c | 4 ++--
> fs/iomap/buffered-io.c | 8 ++++++--
> fs/xfs/xfs_aops.c | 4 ++--
> fs/zonefs/file.c | 4 ++--
> include/linux/iomap.h | 4 ++--
> 7 files changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/block/fops.c b/block/fops.c
> index b42e16d0eb35..57ae886c7b1a 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -533,12 +533,12 @@ const struct address_space_operations def_blk_aops = {
> #else /* CONFIG_BUFFER_HEAD */
> static int blkdev_read_folio(struct file *file, struct folio *folio)
> {
> - return iomap_read_folio(folio, &blkdev_iomap_ops, NULL);
> + return iomap_read_folio(folio, &blkdev_iomap_ops, NULL, NULL);
> }
>
> static void blkdev_readahead(struct readahead_control *rac)
> {
> - iomap_readahead(rac, &blkdev_iomap_ops, NULL);
> + iomap_readahead(rac, &blkdev_iomap_ops, NULL, NULL);
> }
>
> static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index ea451f233263..2ea338448ca1 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -371,7 +371,7 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
> {
> trace_erofs_read_folio(folio, true);
>
> - return iomap_read_folio(folio, &erofs_iomap_ops, NULL);
> + return iomap_read_folio(folio, &erofs_iomap_ops, NULL, NULL);
> }
>
> static void erofs_readahead(struct readahead_control *rac)
> @@ -379,7 +379,7 @@ static void erofs_readahead(struct readahead_control *rac)
> trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
> readahead_count(rac), true);
>
> - return iomap_readahead(rac, &erofs_iomap_ops, NULL);
> + return iomap_readahead(rac, &erofs_iomap_ops, NULL, NULL);
> }
>
> static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index bf531bcfd8a0..211a0f7b1416 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -428,7 +428,7 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
>
> if (!gfs2_is_jdata(ip) ||
> (i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
> - error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL);
> + error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL, NULL);
> } else if (gfs2_is_stuffed(ip)) {
> error = stuffed_read_folio(ip, folio);
> } else {
> @@ -503,7 +503,7 @@ static void gfs2_readahead(struct readahead_control *rac)
> else if (gfs2_is_jdata(ip))
> mpage_readahead(rac, gfs2_block_map);
> else
> - iomap_readahead(rac, &gfs2_iomap_ops, NULL);
> + iomap_readahead(rac, &gfs2_iomap_ops, NULL, NULL);
> }
>
> /**
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 06f2c857de64..d68dd7f63923 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -539,12 +539,13 @@ static void iomap_readfolio_complete(const struct iomap_iter *iter,
> }
>
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
> - const struct iomap_read_ops *read_ops)
> + const struct iomap_read_ops *read_ops, void *private)
> {
> struct iomap_iter iter = {
> .inode = folio->mapping->host,
> .pos = folio_pos(folio),
> .len = folio_size(folio),
> + .private = private,
> };
> struct iomap_readfolio_ctx ctx = {
> .cur_folio = folio,
> @@ -591,6 +592,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> * @rac: Describes the pages to be read.
> * @ops: The operations vector for the filesystem.
> * @read_ops: Optional ops callers can pass in if they want custom handling.
> + * @private: If passed in, this will be usable by the caller in any
> + * read_ops callbacks.
> *
> * This function is for filesystems to call to implement their readahead
> * address_space operation.
> @@ -603,12 +606,13 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> * the filesystem to be reentered.
> */
> void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
> - const struct iomap_read_ops *read_ops)
> + const struct iomap_read_ops *read_ops, void *private)
> {
> struct iomap_iter iter = {
> .inode = rac->mapping->host,
> .pos = readahead_pos(rac),
> .len = readahead_length(rac),
> + .private = private,
> };
> struct iomap_readfolio_ctx ctx = {
> .rac = rac,
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index fb2150c0825a..5e71a3888e6d 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -742,14 +742,14 @@ xfs_vm_read_folio(
> struct file *unused,
> struct folio *folio)
> {
> - return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
> + return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL, NULL);
> }
>
> STATIC void
> xfs_vm_readahead(
> struct readahead_control *rac)
> {
> - iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
> + iomap_readahead(rac, &xfs_read_iomap_ops, NULL, NULL);
> }
>
> static int
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index 96470daf4d3f..182bb473a82b 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -112,12 +112,12 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
>
> static int zonefs_read_folio(struct file *unused, struct folio *folio)
> {
> - return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL);
> + return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL, NULL);
> }
>
> static void zonefs_readahead(struct readahead_control *rac)
> {
> - iomap_readahead(rac, &zonefs_read_iomap_ops, NULL);
> + iomap_readahead(rac, &zonefs_read_iomap_ops, NULL, NULL);
> }
>
> /*
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index a7247439aeb5..9bc7900dd448 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -355,9 +355,9 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> const struct iomap_ops *ops,
> const struct iomap_write_ops *write_ops, void *private);
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
> - const struct iomap_read_ops *read_ops);
> + const struct iomap_read_ops *read_ops, void *private);
> void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops,
> - const struct iomap_read_ops *read_ops);
> + const struct iomap_read_ops *read_ops, void *private);
> bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
> struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
> bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 14/16] fuse: use iomap for read_folio
2025-08-29 23:56 ` [PATCH v1 14/16] fuse: use iomap for read_folio Joanne Koong
@ 2025-09-03 21:13 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 21:13 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:25PM -0700, Joanne Koong wrote:
> Read folio data into the page cache using iomap. This gives us granular
> uptodate tracking for large folios, which optimizes how much data needs
> to be read in. If some portions of the folio are already uptodate (eg
> through a prior write), we only need to read in the non-uptodate
> portions.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Looks fine to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/fuse/file.c | 72 ++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 49 insertions(+), 23 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 5525a4520b0f..bdfb13cdee4b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -828,22 +828,62 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio,
> return 0;
> }
>
> +static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> + unsigned int flags, struct iomap *iomap,
> + struct iomap *srcmap)
> +{
> + iomap->type = IOMAP_MAPPED;
> + iomap->length = length;
> + iomap->offset = offset;
> + return 0;
> +}
> +
> +static const struct iomap_ops fuse_iomap_ops = {
> + .iomap_begin = fuse_iomap_begin,
> +};
> +
> +struct fuse_fill_read_data {
> + struct file *file;
> +};
> +
> +static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos,
> + size_t len)
> +{
> + struct fuse_fill_read_data *data = iter->private;
> + struct file *file = data->file;
> + size_t off = offset_in_folio(folio, pos);
> + int ret;
> +
> + /*
> + * for non-readahead read requests, do reads synchronously since
> + * it's not guaranteed that the server can handle out-of-order reads
> + */
> + iomap_start_folio_read(folio, len);
> + ret = fuse_do_readfolio(file, folio, off, len);
> + iomap_finish_folio_read(folio, off, len, ret);
> + return ret;
> +}
> +
> +static const struct iomap_read_ops fuse_iomap_read_ops = {
> + .read_folio_range = fuse_iomap_read_folio_range_async,
> +};
> +
> static int fuse_read_folio(struct file *file, struct folio *folio)
> {
> struct inode *inode = folio->mapping->host;
> + struct fuse_fill_read_data data = {
> + .file = file,
> + };
> int err;
>
> - err = -EIO;
> - if (fuse_is_bad(inode))
> - goto out;
> -
> - err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
> - if (!err)
> - folio_mark_uptodate(folio);
> + if (fuse_is_bad(inode)) {
> + folio_unlock(folio);
> + return -EIO;
> + }
>
> + err = iomap_read_folio(folio, &fuse_iomap_ops, &fuse_iomap_read_ops, &data);
> fuse_invalidate_atime(inode);
> - out:
> - folio_unlock(folio);
> return err;
> }
>
> @@ -1394,20 +1434,6 @@ static const struct iomap_write_ops fuse_iomap_write_ops = {
> .read_folio_range = fuse_iomap_read_folio_range,
> };
>
> -static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> - unsigned int flags, struct iomap *iomap,
> - struct iomap *srcmap)
> -{
> - iomap->type = IOMAP_MAPPED;
> - iomap->length = length;
> - iomap->offset = offset;
> - return 0;
> -}
> -
> -static const struct iomap_ops fuse_iomap_ops = {
> - .iomap_begin = fuse_iomap_begin,
> -};
> -
> static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> struct file *file = iocb->ki_filp;
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 15/16] fuse: use iomap for readahead
2025-08-29 23:56 ` [PATCH v1 15/16] fuse: use iomap for readahead Joanne Koong
@ 2025-09-03 21:17 ` Darrick J. Wong
0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2025-09-03 21:17 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, linux-fsdevel, kernel-team, linux-xfs,
linux-doc
On Fri, Aug 29, 2025 at 04:56:26PM -0700, Joanne Koong wrote:
> Do readahead in fuse using iomap. This gives us granular uptodate
> tracking for large folios, which optimizes how much data needs to be
> read in. If some portions of the folio are already uptodate (eg through
> a prior write), we only need to read in the non-uptodate portions.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/fuse/file.c | 214 +++++++++++++++++++++++++++----------------------
> 1 file changed, 118 insertions(+), 96 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index bdfb13cdee4b..1659603f4cb6 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -844,8 +844,73 @@ static const struct iomap_ops fuse_iomap_ops = {
>
> struct fuse_fill_read_data {
> struct file *file;
> + /*
> + * We need to track this because non-readahead requests can't be sent
> + * asynchronously.
> + */
> + bool readahead : 1;
> +
> + /*
> + * Fields below are used if sending the read request
> + * asynchronously.
> + */
> + struct fuse_conn *fc;
> + struct readahead_control *rac;
> + struct fuse_io_args *ia;
> + unsigned int nr_bytes;
> };
>
> +/* forward declarations */
> +static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
> + unsigned len, struct fuse_args_pages *ap,
> + unsigned cur_bytes, bool write);
> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> + unsigned int count, bool async);
> +
> +static int fuse_handle_readahead(struct folio *folio,
> + struct fuse_fill_read_data *data, loff_t pos,
> + size_t len)
> +{
> + struct fuse_io_args *ia = data->ia;
> + size_t off = offset_in_folio(folio, pos);
> + struct fuse_conn *fc = data->fc;
> + struct fuse_args_pages *ap;
> +
> + if (ia && fuse_folios_need_send(fc, pos, len, &ia->ap, data->nr_bytes,
> + false)) {
> + fuse_send_readpages(ia, data->file, data->nr_bytes,
> + fc->async_read);
> + data->nr_bytes = 0;
> + ia = NULL;
> + }
> + if (!ia) {
> + struct readahead_control *rac = data->rac;
> + unsigned nr_pages = min(fc->max_pages, readahead_count(rac));
> +
> + if (fc->num_background >= fc->congestion_threshold &&
> + rac->ra->async_size >= readahead_count(rac))
> + /*
> + * Congested and only async pages left, so skip the
> + * rest.
> + */
> + return -EAGAIN;
> +
> + data->ia = fuse_io_alloc(NULL, nr_pages);
> + if (!data->ia)
> + return -ENOMEM;
> + ia = data->ia;
> + }
> + folio_get(folio);
> + ap = &ia->ap;
> + ap->folios[ap->num_folios] = folio;
> + ap->descs[ap->num_folios].offset = off;
> + ap->descs[ap->num_folios].length = len;
> + data->nr_bytes += len;
> + ap->num_folios++;
> +
> + return 0;
> +}
> +
> static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
> struct folio *folio, loff_t pos,
> size_t len)
> @@ -855,13 +920,24 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
> size_t off = offset_in_folio(folio, pos);
> int ret;
>
> - /*
> - * for non-readahead read requests, do reads synchronously since
> - * it's not guaranteed that the server can handle out-of-order reads
> - */
> iomap_start_folio_read(folio, len);
> - ret = fuse_do_readfolio(file, folio, off, len);
> - iomap_finish_folio_read(folio, off, len, ret);
> + if (data->readahead) {
> + ret = fuse_handle_readahead(folio, data, pos, len);
> + /*
> + * If fuse_handle_readahead was successful, fuse_readpages_end
> + * will do the iomap_finish_folio_read, else we need to call it
> + * here
> + */
> + if (ret)
> + iomap_finish_folio_read(folio, off, len, ret);
> + } else {
> + /*
> + * for non-readahead read requests, do reads synchronously since
> + * it's not guaranteed that the server can handle out-of-order reads
> + */
> + ret = fuse_do_readfolio(file, folio, off, len);
> + iomap_finish_folio_read(folio, off, len, ret);
> + }
> return ret;
> }
>
> @@ -923,7 +999,8 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> }
>
> for (i = 0; i < ap->num_folios; i++) {
> - folio_end_read(ap->folios[i], !err);
> + iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
> + ap->descs[i].length, err);
> folio_put(ap->folios[i]);
> }
> if (ia->ff)
> @@ -933,7 +1010,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> }
>
> static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> - unsigned int count)
> + unsigned int count, bool async)
> {
> struct fuse_file *ff = file->private_data;
> struct fuse_mount *fm = ff->fm;
> @@ -955,7 +1032,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
>
> fuse_read_args_fill(ia, file, pos, count, FUSE_READ);
> ia->read.attr_ver = fuse_get_attr_version(fm->fc);
> - if (fm->fc->async_read) {
> + if (async) {
> ia->ff = fuse_file_get(ff);
> ap->args.end = fuse_readpages_end;
> err = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
> @@ -972,81 +1049,20 @@ static void fuse_readahead(struct readahead_control *rac)
> {
> struct inode *inode = rac->mapping->host;
> struct fuse_conn *fc = get_fuse_conn(inode);
> - unsigned int max_pages, nr_pages;
> - struct folio *folio = NULL;
> + struct fuse_fill_read_data data = {
> + .file = rac->file,
> + .readahead = true,
> + .fc = fc,
> + .rac = rac,
> + };
>
> if (fuse_is_bad(inode))
> return;
>
> - max_pages = min_t(unsigned int, fc->max_pages,
> - fc->max_read / PAGE_SIZE);
> -
> - /*
> - * This is only accurate the first time through, since readahead_folio()
> - * doesn't update readahead_count() from the previous folio until the
> - * next call. Grab nr_pages here so we know how many pages we're going
> - * to have to process. This means that we will exit here with
> - * readahead_count() == folio_nr_pages(last_folio), but we will have
> - * consumed all of the folios, and read_pages() will call
> - * readahead_folio() again which will clean up the rac.
> - */
> - nr_pages = readahead_count(rac);
> -
> - while (nr_pages) {
> - struct fuse_io_args *ia;
> - struct fuse_args_pages *ap;
> - unsigned cur_pages = min(max_pages, nr_pages);
> - unsigned int pages = 0;
> -
> - if (fc->num_background >= fc->congestion_threshold &&
> - rac->ra->async_size >= readahead_count(rac))
> - /*
> - * Congested and only async pages left, so skip the
> - * rest.
> - */
> - break;
> -
> - ia = fuse_io_alloc(NULL, cur_pages);
> - if (!ia)
> - break;
> - ap = &ia->ap;
> -
> - while (pages < cur_pages) {
> - unsigned int folio_pages;
> -
> - /*
> - * This returns a folio with a ref held on it.
> - * The ref needs to be held until the request is
> - * completed, since the splice case (see
> - * fuse_try_move_page()) drops the ref after it's
> - * replaced in the page cache.
> - */
> - if (!folio)
> - folio = __readahead_folio(rac);
> -
> - folio_pages = folio_nr_pages(folio);
> - if (folio_pages > cur_pages - pages) {
> - /*
> - * Large folios belonging to fuse will never
> - * have more pages than max_pages.
> - */
> - WARN_ON(!pages);
> - break;
> - }
> -
> - ap->folios[ap->num_folios] = folio;
> - ap->descs[ap->num_folios].length = folio_size(folio);
> - ap->num_folios++;
> - pages += folio_pages;
> - folio = NULL;
> - }
> - fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
> - nr_pages -= pages;
> - }
> - if (folio) {
> - folio_end_read(folio, false);
> - folio_put(folio);
> - }
> + iomap_readahead(rac, &fuse_iomap_ops, &fuse_iomap_read_ops, &data);
> + if (data.ia)
> + fuse_send_readpages(data.ia, data.file, data.nr_bytes,
> + fc->async_read);
> }
>
> static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
> @@ -2077,7 +2093,7 @@ struct fuse_fill_wb_data {
> struct fuse_file *ff;
> unsigned int max_folios;
> /*
> - * nr_bytes won't overflow since fuse_writepage_need_send() caps
> + * nr_bytes won't overflow since fuse_folios_need_send() caps
> * wb requests to never exceed fc->max_pages (which has an upper bound
> * of U16_MAX).
> */
> @@ -2122,14 +2138,15 @@ static void fuse_writepages_send(struct inode *inode,
> spin_unlock(&fi->lock);
> }
>
> -static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> - unsigned len, struct fuse_args_pages *ap,
> - struct fuse_fill_wb_data *data)
> +static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
> + unsigned len, struct fuse_args_pages *ap,
> + unsigned cur_bytes, bool write)
> {
> struct folio *prev_folio;
> struct fuse_folio_desc prev_desc;
> - unsigned bytes = data->nr_bytes + len;
> + unsigned bytes = cur_bytes + len;
> loff_t prev_pos;
> + size_t max_bytes = write ? fc->max_write : fc->max_read;
>
> WARN_ON(!ap->num_folios);
>
> @@ -2137,8 +2154,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> if ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT > fc->max_pages)
> return true;
>
> - /* Reached max write bytes */
> - if (bytes > fc->max_write)
> + if (bytes > max_bytes)
> return true;
>
> /* Discontinuity */
> @@ -2148,11 +2164,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> if (prev_pos != pos)
> return true;
>
> - /* Need to grow the pages array? If so, did the expansion fail? */
> - if (ap->num_folios == data->max_folios &&
> - !fuse_pages_realloc(data, fc->max_pages))
> - return true;
> -
> return false;
> }
>
> @@ -2176,10 +2187,21 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> return -EIO;
> }
>
> - if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
> - fuse_writepages_send(inode, data);
> - data->wpa = NULL;
> - data->nr_bytes = 0;
> + if (wpa) {
> + bool send = fuse_folios_need_send(fc, pos, len, ap, data->nr_bytes,
> + true);
> +
> + if (!send) {
> + /* Need to grow the pages array? If so, did the expansion fail? */
> + send = (ap->num_folios == data->max_folios) &&
> + !fuse_pages_realloc(data, fc->max_pages);
> + }
What purpose does this code relocation serve? I gather the idea here is
that writes need to reallocate the pages array, whereas readahead can
simply constrain itself to whatever's already allocated?
--D
> +
> + if (send) {
> + fuse_writepages_send(inode, data);
> + data->wpa = NULL;
> + data->nr_bytes = 0;
> + }
> }
>
> if (data->wpa == NULL) {
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
Thread overview: 34+ messages
2025-08-29 23:56 [PATCH v1 00/16] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-08-29 23:56 ` [PATCH v1 01/16] iomap: move async bio read logic into helper function Joanne Koong
2025-09-03 20:16 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked Joanne Koong
2025-09-03 20:26 ` [PATCH v1 02/16] iomap: rename cur_folio_in_bio to folio_unlocked Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 03/16] iomap: refactor read/readahead completion Joanne Koong
2025-08-29 23:56 ` [PATCH v1 04/16] iomap: use iomap_iter->private for stashing read/readahead bio Joanne Koong
2025-09-03 20:30 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 05/16] iomap: propagate iomap_read_folio() error to caller Joanne Koong
2025-09-03 20:32 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 06/16] iomap: move read/readahead logic out of CONFIG_BLOCK guard Joanne Koong
2025-08-29 23:56 ` [PATCH v1 07/16] iomap: iterate through entire folio in iomap_readpage_iter() Joanne Koong
2025-09-03 20:43 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 08/16] iomap: rename iomap_readpage_iter() to iomap_readfolio_iter() Joanne Koong
2025-08-29 23:56 ` [PATCH v1 09/16] iomap: rename iomap_readpage_ctx struct to iomap_readfolio_ctx Joanne Koong
2025-09-03 20:44 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 10/16] iomap: add iomap_start_folio_read() helper Joanne Koong
2025-09-03 20:52 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 11/16] iomap: make start folio read and finish folio read public APIs Joanne Koong
2025-09-03 20:53 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead Joanne Koong
2025-09-03 21:08 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 13/16] iomap: add a private arg " Joanne Koong
2025-08-30 1:54 ` Gao Xiang
2025-09-02 21:24 ` Joanne Koong
2025-09-03 1:55 ` Gao Xiang
2025-09-03 21:11 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 14/16] fuse: use iomap for read_folio Joanne Koong
2025-09-03 21:13 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 15/16] fuse: use iomap for readahead Joanne Koong
2025-09-03 21:17 ` Darrick J. Wong
2025-08-29 23:56 ` [PATCH v1 16/16] fuse: remove fuse_readpages_end() null mapping check Joanne Koong
2025-09-02 9:21 ` Miklos Szeredi
2025-09-02 21:19 ` Joanne Koong