* refactor the iomap writeback code v4
@ 2025-07-08 13:51 Christoph Hellwig
2025-07-08 13:51 ` [PATCH 01/14] iomap: header diet Christoph Hellwig
` (14 more replies)
0 siblings, 15 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Hi all,
this is an alternative approach to the writeback part of the
"fuse: use iomap for buffered writes + writeback" series from Joanne.
It doesn't try to make the code build without CONFIG_BLOCK yet.
The big difference compared to Joanne's version is that I hope the
split between the generic and ioend/bio based writeback code is a bit
cleaner here. We have two methods that define the split between the
generic writeback code and its implementation, and all knowledge
of ioends and bios now sits below that layer.
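For reference, the resulting split boils down to these two methods; this
is just a restatement of the interface as it stands after patches 4 and
5, not a new API:

    struct iomap_writeback_ops {
            /* map a dirty range and queue it up for writeback */
            ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
                            struct folio *folio, u64 pos, unsigned int len,
                            u64 end_pos);
            /* submit the writeback context built by ->writeback_range */
            int (*writeback_submit)(struct iomap_writepage_ctx *wpc,
                            int error);
    };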
This version passes testing on xfs, and gets as far as mainline does
for gfs2 (crashes in generic/361).
Changes since v3:
- add a patch to drop unused includes
- drop the iomap_writepage_ctx renaming - we should do this separately,
including the variable names if desired
- add a comment about special casing of holes in iomap_writeback_range
- split the cleanups to iomap_read_folio_sync into a separate prep patch
- explain the IOMAP_HOLE check in xfs_iomap_valid
- explain the iomap_writeback_folio later folio unlock vs dropbehind
- some cargo culting for the #$W# RST formatting
- "improve" the documentation coverage a bit
Changes since v2:
- rename iomap_writepage_ctx to iomap_writeback_ctx
- keep local map_blocks helpers in XFS
- allow building the writeback and write code for !CONFIG_BLOCK
Changes since v1:
- fix iomap reuse in block/zonefs/gfs2
- catch too large return value from ->writeback_range
- mention the correct file name in a commit log
- add patches for folio laundering
- add patches for read-modify-write in the generic write helpers
Diffstat:
Documentation/filesystems/iomap/design.rst | 3
Documentation/filesystems/iomap/operations.rst | 57 +-
block/fops.c | 37 +
fs/gfs2/aops.c | 8
fs/gfs2/bmap.c | 48 +-
fs/gfs2/bmap.h | 1
fs/gfs2/file.c | 3
fs/iomap/Makefile | 6
fs/iomap/buffered-io.c | 554 +++++++------------------
fs/iomap/direct-io.c | 5
fs/iomap/fiemap.c | 3
fs/iomap/internal.h | 1
fs/iomap/ioend.c | 220 +++++++++
fs/iomap/iter.c | 1
fs/iomap/seek.c | 4
fs/iomap/swapfile.c | 3
fs/iomap/trace.c | 1
fs/iomap/trace.h | 4
fs/xfs/xfs_aops.c | 212 +++++----
fs/xfs/xfs_file.c | 6
fs/xfs/xfs_iomap.c | 12
fs/xfs/xfs_iomap.h | 1
fs/xfs/xfs_reflink.c | 3
fs/zonefs/file.c | 40 +
include/linux/iomap.h | 82 ++-
25 files changed, 705 insertions(+), 610 deletions(-)
* [PATCH 01/14] iomap: header diet
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 19:45 ` Darrick J. Wong
` (2 more replies)
2025-07-08 13:51 ` [PATCH 02/14] iomap: pass more arguments using the iomap writeback context Christoph Hellwig
` (13 subsequent siblings)
14 siblings, 3 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Drop various unused #include statements.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 10 ----------
fs/iomap/direct-io.c | 5 -----
fs/iomap/fiemap.c | 3 ---
fs/iomap/iter.c | 1 -
fs/iomap/seek.c | 4 ----
fs/iomap/swapfile.c | 3 ---
fs/iomap/trace.c | 1 -
7 files changed, 27 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3729391a18f3..addf6ed13061 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -3,18 +3,8 @@
* Copyright (C) 2010 Red Hat, Inc.
* Copyright (C) 2016-2023 Christoph Hellwig.
*/
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
#include <linux/iomap.h>
-#include <linux/pagemap.h>
-#include <linux/uio.h>
#include <linux/buffer_head.h>
-#include <linux/dax.h>
-#include <linux/writeback.h>
-#include <linux/swap.h>
-#include <linux/bio.h>
-#include <linux/sched/signal.h>
#include <linux/migrate.h>
#include "internal.h"
#include "trace.h"
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 844261a31156..6f25d4cfea9f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -3,14 +3,9 @@
* Copyright (C) 2010 Red Hat, Inc.
* Copyright (c) 2016-2025 Christoph Hellwig.
*/
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
#include <linux/fscrypt.h>
#include <linux/pagemap.h>
#include <linux/iomap.h>
-#include <linux/backing-dev.h>
-#include <linux/uio.h>
#include <linux/task_io_accounting_ops.h>
#include "internal.h"
#include "trace.h"
diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index 80675c42e94e..d11dadff8286 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -2,9 +2,6 @@
/*
* Copyright (c) 2016-2021 Christoph Hellwig.
*/
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
#include <linux/iomap.h>
#include <linux/fiemap.h>
#include <linux/pagemap.h>
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index 6ffc6a7b9ba5..cef77ca0c20b 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -3,7 +3,6 @@
* Copyright (C) 2010 Red Hat, Inc.
* Copyright (c) 2016-2021 Christoph Hellwig.
*/
-#include <linux/fs.h>
#include <linux/iomap.h>
#include "trace.h"
diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index 04d7919636c1..56db2dd4b10d 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -3,12 +3,8 @@
* Copyright (C) 2017 Red Hat, Inc.
* Copyright (c) 2018-2021 Christoph Hellwig.
*/
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
#include <linux/iomap.h>
#include <linux/pagemap.h>
-#include <linux/pagevec.h>
static int iomap_seek_hole_iter(struct iomap_iter *iter,
loff_t *hole_pos)
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index c1a762c10ce4..0db77c449467 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -3,9 +3,6 @@
* Copyright (C) 2018 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*/
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
#include <linux/iomap.h>
#include <linux/swap.h>
diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
index 728d5443daf5..da217246b1a9 100644
--- a/fs/iomap/trace.c
+++ b/fs/iomap/trace.c
@@ -3,7 +3,6 @@
* Copyright (c) 2019 Christoph Hellwig
*/
#include <linux/iomap.h>
-#include <linux/uio.h>
/*
* We include this last to have the helpers above available for the trace
--
2.47.2
* [PATCH 02/14] iomap: pass more arguments using the iomap writeback context
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
2025-07-08 13:51 ` [PATCH 01/14] iomap: header diet Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 19:45 ` Darrick J. Wong
2025-07-08 13:51 ` [PATCH 03/14] iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks Christoph Hellwig
` (12 subsequent siblings)
14 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Johannes Thumshirn
Add inode and wbc fields to pass the inode and the writeback control
that are needed throughout the writeback call chain, and let the callers
initialize all fields in the writeback context before calling
iomap_writepages to simplify the argument passing.
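As a sketch, a writepages method now looks like the following; the
example_* names are placeholders, the real conversions are in the diff
below:

    static int example_writepages(struct address_space *mapping,
                    struct writeback_control *wbc)
    {
            struct iomap_writepage_ctx wpc = {
                    .inode  = mapping->host,
                    .wbc    = wbc,
                    .ops    = &example_writeback_ops,
            };

            return iomap_writepages(&wpc);
    }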
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
block/fops.c | 8 +++++--
fs/gfs2/aops.c | 8 +++++--
fs/iomap/buffered-io.c | 52 +++++++++++++++++++-----------------------
fs/xfs/xfs_aops.c | 24 +++++++++++++------
fs/zonefs/file.c | 8 +++++--
include/linux/iomap.h | 6 ++---
6 files changed, 61 insertions(+), 45 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 1309861d4c2c..3394263d942b 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -558,9 +558,13 @@ static const struct iomap_writeback_ops blkdev_writeback_ops = {
static int blkdev_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
- struct iomap_writepage_ctx wpc = { };
+ struct iomap_writepage_ctx wpc = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .ops = &blkdev_writeback_ops
+ };
- return iomap_writepages(mapping, wbc, &wpc, &blkdev_writeback_ops);
+ return iomap_writepages(&wpc);
}
const struct address_space_operations def_blk_aops = {
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 14f204cd5a82..47d74afd63ac 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -159,7 +159,11 @@ static int gfs2_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping);
- struct iomap_writepage_ctx wpc = { };
+ struct iomap_writepage_ctx wpc = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .ops = &gfs2_writeback_ops,
+ };
int ret;
/*
@@ -168,7 +172,7 @@ static int gfs2_writepages(struct address_space *mapping,
* want balance_dirty_pages() to loop indefinitely trying to write out
* pages held in the ail that it can't find.
*/
- ret = iomap_writepages(mapping, wbc, &wpc, &gfs2_writeback_ops);
+ ret = iomap_writepages(&wpc);
if (ret == 0 && wbc->nr_to_write > 0)
set_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags);
return ret;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index addf6ed13061..2806ec1e0b5e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1616,20 +1616,19 @@ static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
}
static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct inode *inode, loff_t pos,
- u16 ioend_flags)
+ loff_t pos, u16 ioend_flags)
{
struct bio *bio;
bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
- REQ_OP_WRITE | wbc_to_write_flags(wbc),
+ REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc),
GFP_NOFS, &iomap_ioend_bioset);
bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
bio->bi_end_io = iomap_writepage_end_bio;
- bio->bi_write_hint = inode->i_write_hint;
- wbc_init_bio(wbc, bio);
+ bio->bi_write_hint = wpc->inode->i_write_hint;
+ wbc_init_bio(wpc->wbc, bio);
wpc->nr_folios = 0;
- return iomap_init_ioend(inode, bio, pos, ioend_flags);
+ return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
}
static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
@@ -1668,9 +1667,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
* writepage context that the caller will need to submit.
*/
static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, loff_t pos, loff_t end_pos,
- unsigned len)
+ struct folio *folio, loff_t pos, loff_t end_pos, unsigned len)
{
struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
@@ -1691,8 +1688,7 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
error = iomap_submit_ioend(wpc, 0);
if (error)
return error;
- wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos,
- ioend_flags);
+ wpc->ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
}
if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
@@ -1746,24 +1742,24 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
- wbc_account_cgroup_owner(wbc, folio, len);
+ wbc_account_cgroup_owner(wpc->wbc, folio, len);
return 0;
}
static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, u64 pos, u64 end_pos,
- unsigned dirty_len, unsigned *count)
+ struct folio *folio, u64 pos, u64 end_pos, unsigned dirty_len,
+ unsigned *count)
{
int error;
do {
unsigned map_len;
- error = wpc->ops->map_blocks(wpc, inode, pos, dirty_len);
+ error = wpc->ops->map_blocks(wpc, wpc->inode, pos, dirty_len);
if (error)
break;
- trace_iomap_writepage_map(inode, pos, dirty_len, &wpc->iomap);
+ trace_iomap_writepage_map(wpc->inode, pos, dirty_len,
+ &wpc->iomap);
map_len = min_t(u64, dirty_len,
wpc->iomap.offset + wpc->iomap.length - pos);
@@ -1777,8 +1773,8 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
case IOMAP_HOLE:
break;
default:
- error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
- end_pos, map_len);
+ error = iomap_add_to_ioend(wpc, folio, pos, end_pos,
+ map_len);
if (!error)
(*count)++;
break;
@@ -1860,10 +1856,10 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
}
static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct folio *folio)
+ struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
- struct inode *inode = folio->mapping->host;
+ struct inode *inode = wpc->inode;
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
u64 end_aligned = 0;
@@ -1910,8 +1906,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
*/
end_aligned = round_up(end_pos, i_blocksize(inode));
while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
- error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
- pos, end_pos, rlen, &count);
+ error = iomap_writepage_map_blocks(wpc, folio, pos, end_pos,
+ rlen, &count);
if (error)
break;
pos += rlen;
@@ -1947,10 +1943,9 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
}
int
-iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
- struct iomap_writepage_ctx *wpc,
- const struct iomap_writeback_ops *ops)
+iomap_writepages(struct iomap_writepage_ctx *wpc)
{
+ struct address_space *mapping = wpc->inode->i_mapping;
struct folio *folio = NULL;
int error;
@@ -1962,9 +1957,8 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
PF_MEMALLOC))
return -EIO;
- wpc->ops = ops;
- while ((folio = writeback_iter(mapping, wbc, folio, &error)))
- error = iomap_writepage_map(wpc, wbc, folio);
+ while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error)))
+ error = iomap_writepage_map(wpc, folio);
return iomap_submit_ioend(wpc, error);
}
EXPORT_SYMBOL_GPL(iomap_writepages);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 63151feb9c3f..65485a52df3b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -636,19 +636,29 @@ xfs_vm_writepages(
xfs_iflags_clear(ip, XFS_ITRUNCATED);
if (xfs_is_zoned_inode(ip)) {
- struct xfs_zoned_writepage_ctx xc = { };
+ struct xfs_zoned_writepage_ctx xc = {
+ .ctx = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .ops = &xfs_zoned_writeback_ops
+ },
+ };
int error;
- error = iomap_writepages(mapping, wbc, &xc.ctx,
- &xfs_zoned_writeback_ops);
+ error = iomap_writepages(&xc.ctx);
if (xc.open_zone)
xfs_open_zone_put(xc.open_zone);
return error;
} else {
- struct xfs_writepage_ctx wpc = { };
-
- return iomap_writepages(mapping, wbc, &wpc.ctx,
- &xfs_writeback_ops);
+ struct xfs_writepage_ctx wpc = {
+ .ctx = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .ops = &xfs_writeback_ops
+ },
+ };
+
+ return iomap_writepages(&wpc.ctx);
}
}
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 42e2c0065bb3..edca4bbe4b72 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -152,9 +152,13 @@ static const struct iomap_writeback_ops zonefs_writeback_ops = {
static int zonefs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
- struct iomap_writepage_ctx wpc = { };
+ struct iomap_writepage_ctx wpc = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .ops = &zonefs_writeback_ops,
+ };
- return iomap_writepages(mapping, wbc, &wpc, &zonefs_writeback_ops);
+ return iomap_writepages(&wpc);
}
static int zonefs_swap_activate(struct swap_info_struct *sis,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 522644d62f30..00179c9387c5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -448,6 +448,8 @@ struct iomap_writeback_ops {
struct iomap_writepage_ctx {
struct iomap iomap;
+ struct inode *inode;
+ struct writeback_control *wbc;
struct iomap_ioend *ioend;
const struct iomap_writeback_ops *ops;
u32 nr_folios; /* folios added to the ioend */
@@ -461,9 +463,7 @@ void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
void iomap_ioend_try_merge(struct iomap_ioend *ioend,
struct list_head *more_ioends);
void iomap_sort_ioends(struct list_head *ioend_list);
-int iomap_writepages(struct address_space *mapping,
- struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
- const struct iomap_writeback_ops *ops);
+int iomap_writepages(struct iomap_writepage_ctx *wpc);
/*
* Flags for direct I/O ->end_io:
--
2.47.2
* [PATCH 03/14] iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
2025-07-08 13:51 ` [PATCH 01/14] iomap: header diet Christoph Hellwig
2025-07-08 13:51 ` [PATCH 02/14] iomap: pass more arguments using the iomap writeback context Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 04/14] iomap: refactor the writeback interface Christoph Hellwig
` (11 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Johannes Thumshirn
From: Joanne Koong <joannelkoong@gmail.com>
We don't care about the count of outstanding ioends, just whether there
is one. Replace the count variable passed to iomap_writepage_map_blocks
with a boolean to make that clearer.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: rename the variable, update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2806ec1e0b5e..372342bfffa3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1748,7 +1748,7 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
struct folio *folio, u64 pos, u64 end_pos, unsigned dirty_len,
- unsigned *count)
+ bool *wb_pending)
{
int error;
@@ -1776,7 +1776,7 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
error = iomap_add_to_ioend(wpc, folio, pos, end_pos,
map_len);
if (!error)
- (*count)++;
+ *wb_pending = true;
break;
}
dirty_len -= map_len;
@@ -1863,7 +1863,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
u64 end_aligned = 0;
- unsigned count = 0;
+ bool wb_pending = false;
int error = 0;
u32 rlen;
@@ -1907,13 +1907,13 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
end_aligned = round_up(end_pos, i_blocksize(inode));
while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
error = iomap_writepage_map_blocks(wpc, folio, pos, end_pos,
- rlen, &count);
+ rlen, &wb_pending);
if (error)
break;
pos += rlen;
}
- if (count)
+ if (wb_pending)
wpc->nr_folios++;
/*
@@ -1935,7 +1935,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
if (atomic_dec_and_test(&ifs->write_bytes_pending))
folio_end_writeback(folio);
} else {
- if (!count)
+ if (!wb_pending)
folio_end_writeback(folio);
}
mapping_set_error(inode->i_mapping, error);
--
2.47.2
* [PATCH 04/14] iomap: refactor the writeback interface
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (2 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 03/14] iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 19:44 ` Darrick J. Wong
2025-07-08 13:51 ` [PATCH 05/14] iomap: hide ioends from the generic writeback code Christoph Hellwig
` (10 subsequent siblings)
14 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Damien Le Moal
Replace ->map_blocks with a new ->writeback_range, which differs in the
following ways:
- it must also queue up the I/O for writeback; this is done by calling
into the slightly refactored and extended iomap_add_to_ioend for each
region
- it can handle only a part of the requested region, that is, the retry
loop for partial mappings moves to the caller
- it handles cleanup on failures as well, and thus also replaces the
discard_folio method that was previously only implemented by XFS.
This will allow the iomap writeback code to also be used by file systems
that are not block based, such as fuse.
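The expected shape of a block based ->writeback_range is sketched below;
the example_* helpers are placeholders for the per-filesystem mapping
and discard logic, the real conversions follow in the diff:

    static ssize_t example_writeback_range(struct iomap_writepage_ctx *wpc,
                    struct folio *folio, u64 offset, unsigned int len,
                    u64 end_pos)
    {
            ssize_t ret;

            /* revalidate or look up the mapping for this range */
            ret = example_map_blocks(wpc, offset, len);
            if (!ret)
                    ret = iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
            if (ret < 0)
                    /* throw away reservations made for the failed range */
                    example_discard_folio(folio, offset);
            return ret;
    }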
Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Acked-by: Damien Le Moal <dlemoal@kernel.org> # zonefs
---
.../filesystems/iomap/operations.rst | 32 ++---
block/fops.c | 25 ++--
fs/gfs2/bmap.c | 26 ++--
fs/iomap/buffered-io.c | 96 ++++++-------
fs/iomap/trace.h | 2 +-
fs/xfs/xfs_aops.c | 128 +++++++++++-------
fs/zonefs/file.c | 28 ++--
include/linux/iomap.h | 21 ++-
8 files changed, 197 insertions(+), 161 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 3b628e370d88..f07c8fdb2046 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -271,7 +271,7 @@ writeback.
It does not lock ``i_rwsem`` or ``invalidate_lock``.
The dirty bit will be cleared for all folios run through the
-``->map_blocks`` machinery described below even if the writeback fails.
+``->writeback_range`` machinery described below even if the writeback fails.
This is to prevent dirty folio clots when storage devices fail; an
``-EIO`` is recorded for userspace to collect via ``fsync``.
@@ -283,15 +283,14 @@ The ``ops`` structure must be specified and is as follows:
.. code-block:: c
struct iomap_writeback_ops {
- int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset, unsigned len);
- int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
- void (*discard_folio)(struct folio *folio, loff_t pos);
+ int (*writeback_range)(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
+ int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
};
The fields are as follows:
- - ``map_blocks``: Sets ``wpc->iomap`` to the space mapping of the file
+ - ``writeback_range``: Sets ``wpc->iomap`` to the space mapping of the file
range (in bytes) given by ``offset`` and ``len``.
iomap calls this function for each dirty fs block in each dirty folio,
though it will `reuse mappings
@@ -306,6 +305,15 @@ The fields are as follows:
This revalidation must be open-coded by the filesystem; it is
unclear if ``iomap::validity_cookie`` can be reused for this
purpose.
+
If this method fails to schedule I/O for any part of a dirty folio, it
+ should throw away any reservations that may have been made for the write.
+ The folio will be marked clean and an ``-EIO`` recorded in the
+ pagecache.
+ Filesystems can use this callback to `remove
+ <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_
+ delalloc reservations to avoid having delalloc reservations for
+ clean pagecache.
This function must be supplied by the filesystem.
- ``submit_ioend``: Allows the file systems to hook into writeback bio
@@ -316,18 +324,6 @@ The fields are as follows:
transactions from process context before submitting the bio.
This function is optional.
- - ``discard_folio``: iomap calls this function after ``->map_blocks``
- fails to schedule I/O for any part of a dirty folio.
- The function should throw away any reservations that may have been
- made for the write.
- The folio will be marked clean and an ``-EIO`` recorded in the
- pagecache.
- Filesystems can use this callback to `remove
- <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_
- delalloc reservations to avoid having delalloc reservations for
- clean pagecache.
- This function is optional.
-
Pagecache Writeback Completion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/block/fops.c b/block/fops.c
index 3394263d942b..b500ff8f55dd 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -537,22 +537,29 @@ static void blkdev_readahead(struct readahead_control *rac)
iomap_readahead(rac, &blkdev_iomap_ops);
}
-static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc,
- struct inode *inode, loff_t offset, unsigned int len)
+static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
{
- loff_t isize = i_size_read(inode);
+ loff_t isize = i_size_read(wpc->inode);
if (WARN_ON_ONCE(offset >= isize))
return -EIO;
- if (offset >= wpc->iomap.offset &&
- offset < wpc->iomap.offset + wpc->iomap.length)
- return 0;
- return blkdev_iomap_begin(inode, offset, isize - offset,
- IOMAP_WRITE, &wpc->iomap, NULL);
+
+ if (offset < wpc->iomap.offset ||
+ offset >= wpc->iomap.offset + wpc->iomap.length) {
+ int error;
+
+ error = blkdev_iomap_begin(wpc->inode, offset, isize - offset,
+ IOMAP_WRITE, &wpc->iomap, NULL);
+ if (error)
+ return error;
+ }
+
+ return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
}
static const struct iomap_writeback_ops blkdev_writeback_ops = {
- .map_blocks = blkdev_map_blocks,
+ .writeback_range = blkdev_writeback_range,
};
static int blkdev_writepages(struct address_space *mapping,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 7703d0471139..0cc41de54aba 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2469,23 +2469,25 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
return error;
}
-static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset, unsigned int len)
+static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
{
- int ret;
-
- if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(inode))))
+ if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(wpc->inode))))
return -EIO;
- if (offset >= wpc->iomap.offset &&
- offset < wpc->iomap.offset + wpc->iomap.length)
- return 0;
+ if (offset < wpc->iomap.offset ||
+ offset >= wpc->iomap.offset + wpc->iomap.length) {
+ int ret;
- memset(&wpc->iomap, 0, sizeof(wpc->iomap));
- ret = gfs2_iomap_get(inode, offset, INT_MAX, &wpc->iomap);
- return ret;
+ memset(&wpc->iomap, 0, sizeof(wpc->iomap));
+ ret = gfs2_iomap_get(wpc->inode, offset, INT_MAX, &wpc->iomap);
+ if (ret)
+ return ret;
+ }
+
+ return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
}
const struct iomap_writeback_ops gfs2_writeback_ops = {
- .map_blocks = gfs2_map_blocks,
+ .writeback_range = gfs2_writeback_range,
};
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 372342bfffa3..7d9cd05c36bb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1666,14 +1666,30 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
* At the end of a writeback pass, there will be a cached ioend remaining on the
* writepage context that the caller will need to submit.
*/
-static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
- struct folio *folio, loff_t pos, loff_t end_pos, unsigned len)
+ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
+ loff_t pos, loff_t end_pos, unsigned int dirty_len)
{
struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
unsigned int ioend_flags = 0;
+ unsigned int map_len = min_t(u64, dirty_len,
+ wpc->iomap.offset + wpc->iomap.length - pos);
int error;
+ trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap);
+
+ WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+
+ switch (wpc->iomap.type) {
+ case IOMAP_INLINE:
+ WARN_ON_ONCE(1);
+ return -EIO;
+ case IOMAP_HOLE:
+ return map_len;
+ default:
+ break;
+ }
+
if (wpc->iomap.type == IOMAP_UNWRITTEN)
ioend_flags |= IOMAP_IOEND_UNWRITTEN;
if (wpc->iomap.flags & IOMAP_F_SHARED)
@@ -1691,11 +1707,11 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
wpc->ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
}
- if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
+ if (!bio_add_folio(&wpc->ioend->io_bio, folio, map_len, poff))
goto new_ioend;
if (ifs)
- atomic_add(len, &ifs->write_bytes_pending);
+ atomic_add(map_len, &ifs->write_bytes_pending);
/*
* Clamp io_offset and io_size to the incore EOF so that ondisk
@@ -1738,63 +1754,39 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
* Note that this defeats the ability to chain the ioends of
* appending writes.
*/
- wpc->ioend->io_size += len;
+ wpc->ioend->io_size += map_len;
if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
- wbc_account_cgroup_owner(wpc->wbc, folio, len);
- return 0;
+ wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
+ return map_len;
}
+EXPORT_SYMBOL_GPL(iomap_add_to_ioend);
-static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 pos, u64 end_pos, unsigned dirty_len,
+static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
bool *wb_pending)
{
- int error;
-
do {
- unsigned map_len;
-
- error = wpc->ops->map_blocks(wpc, wpc->inode, pos, dirty_len);
- if (error)
- break;
- trace_iomap_writepage_map(wpc->inode, pos, dirty_len,
- &wpc->iomap);
+ ssize_t ret;
- map_len = min_t(u64, dirty_len,
- wpc->iomap.offset + wpc->iomap.length - pos);
- WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+ ret = wpc->ops->writeback_range(wpc, folio, pos, rlen, end_pos);
+ if (WARN_ON_ONCE(ret == 0 || ret > rlen))
+ return -EIO;
+ if (ret < 0)
+ return ret;
+ rlen -= ret;
+ pos += ret;
- switch (wpc->iomap.type) {
- case IOMAP_INLINE:
- WARN_ON_ONCE(1);
- error = -EIO;
- break;
- case IOMAP_HOLE:
- break;
- default:
- error = iomap_add_to_ioend(wpc, folio, pos, end_pos,
- map_len);
- if (!error)
- *wb_pending = true;
- break;
- }
- dirty_len -= map_len;
- pos += map_len;
- } while (dirty_len && !error);
+ /*
+ * Holes are not written back by ->writeback_range, so track
+ * if we did handle anything that is not a hole here.
+ */
+ if (wpc->iomap.type != IOMAP_HOLE)
+ *wb_pending = true;
+ } while (rlen);
- /*
- * We cannot cancel the ioend directly here on error. We may have
- * already set other pages under writeback and hence we have to run I/O
- * completion to mark the error state of the pages under writeback
- * appropriately.
- *
- * Just let the file system know what portion of the folio failed to
- * map.
- */
- if (error && wpc->ops->discard_folio)
- wpc->ops->discard_folio(folio, pos);
- return error;
+ return 0;
}
/*
@@ -1906,8 +1898,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
*/
end_aligned = round_up(end_pos, i_blocksize(inode));
while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
- error = iomap_writepage_map_blocks(wpc, folio, pos, end_pos,
- rlen, &wb_pending);
+ error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
+ &wb_pending);
if (error)
break;
pos += rlen;
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 455cc6f90be0..aaea02c9560a 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -169,7 +169,7 @@ DEFINE_EVENT(iomap_class, name, \
DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
-TRACE_EVENT(iomap_writepage_map,
+TRACE_EVENT(iomap_add_to_ioend,
TP_PROTO(struct inode *inode, u64 pos, unsigned int dirty_len,
struct iomap *iomap),
TP_ARGS(inode, pos, dirty_len, iomap),
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 65485a52df3b..f6d44ab78442 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -233,6 +233,47 @@ xfs_end_bio(
spin_unlock_irqrestore(&ip->i_ioend_lock, flags);
}
+/*
+ * We cannot cancel the ioend directly on error. We may have already set other
+ * pages under writeback and hence we have to run I/O completion to mark the
+ * error state of the pages under writeback appropriately.
+ *
+ * If the folio has delalloc blocks on it, the caller is asking us to punch them
+ * out. If we don't, we can leave a stale delalloc mapping covered by a clean
+ * page that needs to be dirtied again before the delalloc mapping can be
+ * converted. This stale delalloc mapping can trip up a later direct I/O read
+ * operation on the same region.
+ *
+ * We prevent this by truncating away the delalloc regions on the folio. Because
+ * they are delalloc, we can do this without needing a transaction. Indeed - if
+ * we get ENOSPC errors, we have to be able to do this truncation without a
+ * transaction as there is no space left for block reservation (typically why
+ * we see a ENOSPC in writeback).
+ */
+static void
+xfs_discard_folio(
+ struct folio *folio,
+ loff_t pos)
+{
+ struct xfs_inode *ip = XFS_I(folio->mapping->host);
+ struct xfs_mount *mp = ip->i_mount;
+
+ if (xfs_is_shutdown(mp))
+ return;
+
+ xfs_alert_ratelimited(mp,
+ "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
+ folio, ip->i_ino, pos);
+
+ /*
+ * The end of the punch range is always the offset of the first
+ * byte of the next folio. Hence the end offset is only dependent on the
+ * folio itself and not the start offset that is passed in.
+ */
+ xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
+ folio_pos(folio) + folio_size(folio), NULL);
+}
+
/*
* Fast revalidation of the cached writeback mapping. Return true if the current
* mapping is valid, false otherwise.
@@ -278,13 +319,12 @@ xfs_imap_valid(
static int
xfs_map_blocks(
struct iomap_writepage_ctx *wpc,
- struct inode *inode,
loff_t offset,
unsigned int len)
{
- struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_inode *ip = XFS_I(wpc->inode);
struct xfs_mount *mp = ip->i_mount;
- ssize_t count = i_blocksize(inode);
+ ssize_t count = i_blocksize(wpc->inode);
xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count);
xfs_fileoff_t cow_fsb;
@@ -436,6 +476,24 @@ xfs_map_blocks(
return 0;
}
+static ssize_t
+xfs_writeback_range(
+ struct iomap_writepage_ctx *wpc,
+ struct folio *folio,
+ u64 offset,
+ unsigned int len,
+ u64 end_pos)
+{
+ ssize_t ret;
+
+ ret = xfs_map_blocks(wpc, offset, len);
+ if (!ret)
+ ret = iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
+ if (ret < 0)
+ xfs_discard_folio(folio, offset);
+ return ret;
+}
+
static bool
xfs_ioend_needs_wq_completion(
struct iomap_ioend *ioend)
@@ -488,47 +546,9 @@ xfs_submit_ioend(
return 0;
}
-/*
- * If the folio has delalloc blocks on it, the caller is asking us to punch them
- * out. If we don't, we can leave a stale delalloc mapping covered by a clean
- * page that needs to be dirtied again before the delalloc mapping can be
- * converted. This stale delalloc mapping can trip up a later direct I/O read
- * operation on the same region.
- *
- * We prevent this by truncating away the delalloc regions on the folio. Because
- * they are delalloc, we can do this without needing a transaction. Indeed - if
- * we get ENOSPC errors, we have to be able to do this truncation without a
- * transaction as there is no space left for block reservation (typically why
- * we see a ENOSPC in writeback).
- */
-static void
-xfs_discard_folio(
- struct folio *folio,
- loff_t pos)
-{
- struct xfs_inode *ip = XFS_I(folio->mapping->host);
- struct xfs_mount *mp = ip->i_mount;
-
- if (xfs_is_shutdown(mp))
- return;
-
- xfs_alert_ratelimited(mp,
- "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
- folio, ip->i_ino, pos);
-
- /*
- * The end of the punch range is always the offset of the first
- * byte of the next folio. Hence the end offset is only dependent on the
- * folio itself and not the start offset that is passed in.
- */
- xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
- folio_pos(folio) + folio_size(folio), NULL);
-}
-
static const struct iomap_writeback_ops xfs_writeback_ops = {
- .map_blocks = xfs_map_blocks,
+ .writeback_range = xfs_writeback_range,
.submit_ioend = xfs_submit_ioend,
- .discard_folio = xfs_discard_folio,
};
struct xfs_zoned_writepage_ctx {
@@ -545,11 +565,10 @@ XFS_ZWPC(struct iomap_writepage_ctx *ctx)
static int
xfs_zoned_map_blocks(
struct iomap_writepage_ctx *wpc,
- struct inode *inode,
loff_t offset,
unsigned int len)
{
- struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_inode *ip = XFS_I(wpc->inode);
struct xfs_mount *mp = ip->i_mount;
xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + len);
@@ -608,6 +627,24 @@ xfs_zoned_map_blocks(
return 0;
}
+static ssize_t
+xfs_zoned_writeback_range(
+ struct iomap_writepage_ctx *wpc,
+ struct folio *folio,
+ u64 offset,
+ unsigned int len,
+ u64 end_pos)
+{
+ ssize_t ret;
+
+ ret = xfs_zoned_map_blocks(wpc, offset, len);
+ if (!ret)
+ ret = iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
+ if (ret < 0)
+ xfs_discard_folio(folio, offset);
+ return ret;
+}
+
static int
xfs_zoned_submit_ioend(
struct iomap_writepage_ctx *wpc,
@@ -621,9 +658,8 @@ xfs_zoned_submit_ioend(
}
static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
- .map_blocks = xfs_zoned_map_blocks,
+ .writeback_range = xfs_zoned_writeback_range,
.submit_ioend = xfs_zoned_submit_ioend,
- .discard_folio = xfs_discard_folio,
};
STATIC int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index edca4bbe4b72..c88e2c851753 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -124,29 +124,33 @@ static void zonefs_readahead(struct readahead_control *rac)
* Map blocks for page writeback. This is used only on conventional zone files,
* which implies that the page range can only be within the fixed inode size.
*/
-static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc,
- struct inode *inode, loff_t offset,
- unsigned int len)
+static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 offset, unsigned len, u64 end_pos)
{
- struct zonefs_zone *z = zonefs_inode_zone(inode);
+ struct zonefs_zone *z = zonefs_inode_zone(wpc->inode);
if (WARN_ON_ONCE(zonefs_zone_is_seq(z)))
return -EIO;
- if (WARN_ON_ONCE(offset >= i_size_read(inode)))
+ if (WARN_ON_ONCE(offset >= i_size_read(wpc->inode)))
return -EIO;
/* If the mapping is already OK, nothing needs to be done */
- if (offset >= wpc->iomap.offset &&
- offset < wpc->iomap.offset + wpc->iomap.length)
- return 0;
+ if (offset < wpc->iomap.offset ||
+ offset >= wpc->iomap.offset + wpc->iomap.length) {
+ int error;
+
+ error = zonefs_write_iomap_begin(wpc->inode, offset,
+ z->z_capacity - offset, IOMAP_WRITE,
+ &wpc->iomap, NULL);
+ if (error)
+ return error;
+ }
- return zonefs_write_iomap_begin(inode, offset,
- z->z_capacity - offset,
- IOMAP_WRITE, &wpc->iomap, NULL);
+ return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
}
static const struct iomap_writeback_ops zonefs_writeback_ops = {
- .map_blocks = zonefs_write_map_blocks,
+ .writeback_range = zonefs_writeback_range,
};
static int zonefs_writepages(struct address_space *mapping,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 00179c9387c5..625d7911a2b5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -416,18 +416,20 @@ static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
struct iomap_writeback_ops {
/*
- * Required, maps the blocks so that writeback can be performed on
- * the range starting at offset.
+ * Required, performs writeback on the passed in range
*
- * Can return arbitrarily large regions, but we need to call into it at
+ * Can map arbitrarily large regions, but we need to call into it at
* least once per folio to allow the file systems to synchronize with
* the write path that could be invalidating mappings.
*
* An existing mapping from a previous call to this method can be reused
* by the file system if it is still valid.
+ *
+ * Returns the number of bytes processed or a negative errno.
*/
- int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset, unsigned len);
+ ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 pos, unsigned int len,
+ u64 end_pos);
/*
* Optional, allows the file systems to hook into bio submission,
@@ -438,12 +440,6 @@ struct iomap_writeback_ops {
* the bio could not be submitted.
*/
int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
-
- /*
- * Optional, allows the file system to discard state on a page where
- * we failed to submit any I/O.
- */
- void (*discard_folio)(struct folio *folio, loff_t pos);
};
struct iomap_writepage_ctx {
@@ -463,6 +459,9 @@ void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
void iomap_ioend_try_merge(struct iomap_ioend *ioend,
struct list_head *more_ioends);
void iomap_sort_ioends(struct list_head *ioend_list);
+ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
+ loff_t pos, loff_t end_pos, unsigned int dirty_len);
+
int iomap_writepages(struct iomap_writepage_ctx *wpc);
/*
--
2.47.2
* [PATCH 05/14] iomap: hide ioends from the generic writeback code
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (3 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 04/14] iomap: refactor the writeback interface Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 19:42 ` Darrick J. Wong
2025-07-08 13:51 ` [PATCH 06/14] iomap: add public helpers for uptodate state manipulation Christoph Hellwig
` (9 subsequent siblings)
14 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Damien Le Moal
Replace the ioend pointer in iomap_writepage_ctx with a void *wb_ctx
one to facilitate non-block, non-ioend writeback for use by fuse. Rename
the submit_ioend method to writeback_submit and make it mandatory so
that the generic writeback code stops seeing ioends and bios.
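With that, a simple block based file system wires up writeback like
this (the writeback_range implementation is the placeholder from patch
4, the rest matches the block device and gfs2 conversions below):

    static const struct iomap_writeback_ops example_writeback_ops = {
            .writeback_range        = example_writeback_range,
            .writeback_submit       = iomap_ioend_writeback_submit,
    };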
Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
---
.../filesystems/iomap/operations.rst | 17 ++--
block/fops.c | 1 +
fs/gfs2/bmap.c | 1 +
fs/iomap/buffered-io.c | 91 ++++++++++---------
fs/xfs/xfs_aops.c | 60 ++++++------
fs/zonefs/file.c | 1 +
include/linux/iomap.h | 19 ++--
7 files changed, 100 insertions(+), 90 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index f07c8fdb2046..4b93c5f7841a 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -284,8 +284,8 @@ The ``ops`` structure must be specified and is as follows:
struct iomap_writeback_ops {
int (*writeback_range)(struct iomap_writepage_ctx *wpc,
- struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
- int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
+ struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
+ int (*writeback_submit)(struct iomap_writepage_ctx *wpc, int error);
};
The fields are as follows:
@@ -316,13 +316,15 @@ The fields are as follows:
clean pagecache.
This function must be supplied by the filesystem.
- - ``submit_ioend``: Allows the file systems to hook into writeback bio
- submission.
+ - ``writeback_submit``: Submit the previously built writeback context.
+ Block based file systems should use the ``iomap_ioend_writeback_submit``
+ helper, other file systems can implement their own.
+ File systems can optionally hook into writeback bio submission.
This might include pre-write space accounting updates, or installing
a custom ``->bi_end_io`` function for internal purposes, such as
deferring the ioend completion to a workqueue to run metadata update
transactions from process context before submitting the bio.
- This function is optional.
+ This function must be supplied by the filesystem.
Pagecache Writeback Completion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -336,10 +338,9 @@ If the write failed, it will also set the error bits on the folios and
the address space.
This can happen in interrupt or process context, depending on the
storage device.
-
Filesystems that need to update internal bookkeeping (e.g. unwritten
-extent conversions) should provide a ``->submit_ioend`` function to
-set ``struct iomap_end::bio::bi_end_io`` to its own function.
+extent conversions) should set their own bi_end_io on the bios
+submitted by ``->writeback_submit``.
This function should call ``iomap_finish_ioends`` after finishing its
own work (e.g. unwritten extent conversion).
diff --git a/block/fops.c b/block/fops.c
index b500ff8f55dd..0845737c0320 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -560,6 +560,7 @@ static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
static const struct iomap_writeback_ops blkdev_writeback_ops = {
.writeback_range = blkdev_writeback_range,
+ .writeback_submit = iomap_ioend_writeback_submit,
};
static int blkdev_writepages(struct address_space *mapping,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0cc41de54aba..86045d3577b7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2490,4 +2490,5 @@ static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
const struct iomap_writeback_ops gfs2_writeback_ops = {
.writeback_range = gfs2_writeback_range,
+ .writeback_submit = iomap_ioend_writeback_submit,
};
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7d9cd05c36bb..eccb9109467a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1569,7 +1569,7 @@ u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
return folio_count;
}
-static void iomap_writepage_end_bio(struct bio *bio)
+static void ioend_writeback_end_bio(struct bio *bio)
{
struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
@@ -1578,42 +1578,30 @@ static void iomap_writepage_end_bio(struct bio *bio)
}
/*
- * Submit an ioend.
- *
- * If @error is non-zero, it means that we have a situation where some part of
- * the submission process has failed after we've marked pages for writeback.
- * We cannot cancel ioend directly in that case, so call the bio end I/O handler
- * with the error status here to run the normal I/O completion handler to clear
- * the writeback bit and let the file system proess the errors.
+ * We cannot cancel the ioend directly in case of an error, so call the bio end
+ * I/O handler with the error status here to run the normal I/O completion
+ * handler.
*/
-static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
+int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error)
{
- if (!wpc->ioend)
- return error;
+ struct iomap_ioend *ioend = wpc->wb_ctx;
- /*
- * Let the file systems prepare the I/O submission and hook in an I/O
- * comletion handler. This also needs to happen in case after a
- * failure happened so that the file system end I/O handler gets called
- * to clean up.
- */
- if (wpc->ops->submit_ioend) {
- error = wpc->ops->submit_ioend(wpc, error);
- } else {
- if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
- error = -EIO;
- if (!error)
- submit_bio(&wpc->ioend->io_bio);
- }
+ if (!ioend->io_bio.bi_end_io)
+ ioend->io_bio.bi_end_io = ioend_writeback_end_bio;
+
+ if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
+ error = -EIO;
if (error) {
- wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
- bio_endio(&wpc->ioend->io_bio);
+ ioend->io_bio.bi_status = errno_to_blk_status(error);
+ bio_endio(&ioend->io_bio);
+ return error;
}
- wpc->ioend = NULL;
- return error;
+ submit_bio(&ioend->io_bio);
+ return 0;
}
+EXPORT_SYMBOL_GPL(iomap_ioend_writeback_submit);
static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
loff_t pos, u16 ioend_flags)
@@ -1624,7 +1612,6 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc),
GFP_NOFS, &iomap_ioend_bioset);
bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
- bio->bi_end_io = iomap_writepage_end_bio;
bio->bi_write_hint = wpc->inode->i_write_hint;
wbc_init_bio(wpc->wbc, bio);
wpc->nr_folios = 0;
@@ -1634,16 +1621,17 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
u16 ioend_flags)
{
+ struct iomap_ioend *ioend = wpc->wb_ctx;
+
if (ioend_flags & IOMAP_IOEND_BOUNDARY)
return false;
if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=
- (wpc->ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
+ (ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
return false;
- if (pos != wpc->ioend->io_offset + wpc->ioend->io_size)
+ if (pos != ioend->io_offset + ioend->io_size)
return false;
if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
- iomap_sector(&wpc->iomap, pos) !=
- bio_end_sector(&wpc->ioend->io_bio))
+ iomap_sector(&wpc->iomap, pos) != bio_end_sector(&ioend->io_bio))
return false;
/*
* Limit ioend bio chain lengths to minimise IO completion latency. This
@@ -1669,6 +1657,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len)
{
+ struct iomap_ioend *ioend = wpc->wb_ctx;
struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
unsigned int ioend_flags = 0;
@@ -1699,15 +1688,17 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
ioend_flags |= IOMAP_IOEND_BOUNDARY;
- if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
+ if (!ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
new_ioend:
- error = iomap_submit_ioend(wpc, 0);
- if (error)
- return error;
- wpc->ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
+ if (ioend) {
+ error = wpc->ops->writeback_submit(wpc, 0);
+ if (error)
+ return error;
+ }
+ wpc->wb_ctx = ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
}
- if (!bio_add_folio(&wpc->ioend->io_bio, folio, map_len, poff))
+ if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
goto new_ioend;
if (ifs)
@@ -1754,9 +1745,9 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
* Note that this defeats the ability to chain the ioends of
* appending writes.
*/
- wpc->ioend->io_size += map_len;
- if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
- wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
+ ioend->io_size += map_len;
+ if (ioend->io_offset + ioend->io_size > end_pos)
+ ioend->io_size = end_pos - ioend->io_offset;
wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
return map_len;
@@ -1951,6 +1942,18 @@ iomap_writepages(struct iomap_writepage_ctx *wpc)
while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error)))
error = iomap_writepage_map(wpc, folio);
- return iomap_submit_ioend(wpc, error);
+
+ /*
+ * If @error is non-zero, it means that we have a situation where some
+ * part of the submission process has failed after we've marked pages
+ * for writeback.
+ *
+ * We cannot cancel the writeback directly in that case, so always call
+ * ->writeback_submit to run the I/O completion handler to clear the
+ * writeback bit and let the file system process the errors.
+ */
+ if (wpc->wb_ctx)
+ return wpc->ops->writeback_submit(wpc, error);
+ return error;
}
EXPORT_SYMBOL_GPL(iomap_writepages);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f6d44ab78442..1ee4f835ac3c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -514,41 +514,40 @@ xfs_ioend_needs_wq_completion(
}
static int
-xfs_submit_ioend(
- struct iomap_writepage_ctx *wpc,
- int status)
+xfs_writeback_submit(
+ struct iomap_writepage_ctx *wpc,
+ int error)
{
- struct iomap_ioend *ioend = wpc->ioend;
- unsigned int nofs_flag;
+ struct iomap_ioend *ioend = wpc->wb_ctx;
/*
- * We can allocate memory here while doing writeback on behalf of
- * memory reclaim. To avoid memory allocation deadlocks set the
- * task-wide nofs context for the following operations.
+ * Convert CoW extents to regular.
+ *
+ * We can allocate memory here while doing writeback on behalf of memory
+ * reclaim. To avoid memory allocation deadlocks, set the task-wide
+ * nofs context.
*/
- nofs_flag = memalloc_nofs_save();
+ if (!error && (ioend->io_flags & IOMAP_IOEND_SHARED)) {
+ unsigned int nofs_flag;
- /* Convert CoW extents to regular */
- if (!status && (ioend->io_flags & IOMAP_IOEND_SHARED)) {
- status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
+ nofs_flag = memalloc_nofs_save();
+ error = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
ioend->io_offset, ioend->io_size);
+ memalloc_nofs_restore(nofs_flag);
}
- memalloc_nofs_restore(nofs_flag);
-
- /* send ioends that might require a transaction to the completion wq */
+ /*
+ * Send ioends that might require a transaction to the completion wq.
+ */
if (xfs_ioend_needs_wq_completion(ioend))
ioend->io_bio.bi_end_io = xfs_end_bio;
- if (status)
- return status;
- submit_bio(&ioend->io_bio);
- return 0;
+ return iomap_ioend_writeback_submit(wpc, error);
}
static const struct iomap_writeback_ops xfs_writeback_ops = {
.writeback_range = xfs_writeback_range,
- .submit_ioend = xfs_submit_ioend,
+ .writeback_submit = xfs_writeback_submit,
};
struct xfs_zoned_writepage_ctx {
@@ -646,20 +645,25 @@ xfs_zoned_writeback_range(
}
static int
-xfs_zoned_submit_ioend(
- struct iomap_writepage_ctx *wpc,
- int status)
+xfs_zoned_writeback_submit(
+ struct iomap_writepage_ctx *wpc,
+ int error)
{
- wpc->ioend->io_bio.bi_end_io = xfs_end_bio;
- if (status)
- return status;
- xfs_zone_alloc_and_submit(wpc->ioend, &XFS_ZWPC(wpc)->open_zone);
+ struct iomap_ioend *ioend = wpc->wb_ctx;
+
+ ioend->io_bio.bi_end_io = xfs_end_bio;
+ if (error) {
+ ioend->io_bio.bi_status = errno_to_blk_status(error);
+ bio_endio(&ioend->io_bio);
+ return error;
+ }
+ xfs_zone_alloc_and_submit(ioend, &XFS_ZWPC(wpc)->open_zone);
return 0;
}
static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
.writeback_range = xfs_zoned_writeback_range,
- .submit_ioend = xfs_zoned_submit_ioend,
+ .writeback_submit = xfs_zoned_writeback_submit,
};
STATIC int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index c88e2c851753..fee9403ad49b 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -151,6 +151,7 @@ static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
static const struct iomap_writeback_ops zonefs_writeback_ops = {
.writeback_range = zonefs_writeback_range,
+ .writeback_submit = iomap_ioend_writeback_submit,
};
static int zonefs_writepages(struct address_space *mapping,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 625d7911a2b5..9f32dd8dc075 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -391,8 +391,7 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
/*
* Structure for writeback I/O completions.
*
- * File systems implementing ->submit_ioend (for buffered I/O) or ->submit_io
- * for direct I/O) can split a bio generated by iomap. In that case the parent
+ * File systems can split a bio generated by iomap. In that case the parent
* ioend it was split from is recorded in ioend->io_parent.
*/
struct iomap_ioend {
@@ -416,7 +415,7 @@ static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
struct iomap_writeback_ops {
/*
- * Required, performs writeback on the passed in range
+ * Performs writeback on the passed in range
*
* Can map arbitrarily large regions, but we need to call into it at
* least once per folio to allow the file systems to synchronize with
@@ -432,23 +431,22 @@ struct iomap_writeback_ops {
u64 end_pos);
/*
- * Optional, allows the file systems to hook into bio submission,
- * including overriding the bi_end_io handler.
+ * Submit a writeback context previously built up by ->writeback_range.
*
- * Returns 0 if the bio was successfully submitted, or a negative
- * error code if status was non-zero or another error happened and
- * the bio could not be submitted.
+ * Returns 0 if the context was successfully submitted, or a negative
+ * error code if not. If @error is non-zero a failure occurred, and
+ * the writeback context should be completed with an error.
*/
- int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
+ int (*writeback_submit)(struct iomap_writepage_ctx *wpc, int error);
};
struct iomap_writepage_ctx {
struct iomap iomap;
struct inode *inode;
struct writeback_control *wbc;
- struct iomap_ioend *ioend;
const struct iomap_writeback_ops *ops;
u32 nr_folios; /* folios added to the ioend */
+ void *wb_ctx; /* pending writeback context */
};
struct iomap_ioend *iomap_init_ioend(struct inode *inode, struct bio *bio,
@@ -461,6 +459,7 @@ void iomap_ioend_try_merge(struct iomap_ioend *ioend,
void iomap_sort_ioends(struct list_head *ioend_list);
ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len);
+int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
int iomap_writepages(struct iomap_writepage_ctx *wpc);
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 06/14] iomap: add public helpers for uptodate state manipulation
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (4 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 05/14] iomap: hide ioends from the generic writeback code Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 07/14] iomap: move all ioend handling to ioend.c Christoph Hellwig
` (8 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster
From: Joanne Koong <joannelkoong@gmail.com>
Add a new iomap_start_folio_write helper to abstract away the
write_bytes_pending handling, and export both it and the existing
iomap_finish_folio_write for use by the non-iomap writeback code in fuse.
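As a rough sketch of the intended use (everything named example_* below
is hypothetical and not part of this series), a non-iomap writeback path
would bracket its per-folio I/O with the two helpers:

/*
 * Account the bytes under I/O before submitting, and drop them again
 * from the I/O completion so that folio_end_writeback() runs once all
 * pending bytes have completed.
 */
static void example_writeback_range(struct inode *inode,
                struct folio *folio, size_t poff, size_t plen)
{
        iomap_start_folio_write(inode, folio, plen);
        example_submit_write(folio, poff, plen);        /* fs-specific I/O */
}

static void example_write_completion(struct inode *inode,
                struct folio *folio, size_t plen)
{
        iomap_finish_folio_write(inode, folio, plen);
}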
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 20 +++++++++++++++-----
include/linux/iomap.h | 5 +++++
2 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index eccb9109467a..aaeaceba8adc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1525,7 +1525,18 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
-static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
+void iomap_start_folio_write(struct inode *inode, struct folio *folio,
+ size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
+ if (ifs)
+ atomic_add(len, &ifs->write_bytes_pending);
+}
+EXPORT_SYMBOL_GPL(iomap_start_folio_write);
+
+void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
size_t len)
{
struct iomap_folio_state *ifs = folio->private;
@@ -1536,6 +1547,7 @@ static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
if (!ifs || atomic_sub_and_test(len, &ifs->write_bytes_pending))
folio_end_writeback(folio);
}
+EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
/*
* We're now finished for good with this ioend structure. Update the page
@@ -1658,7 +1670,6 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len)
{
struct iomap_ioend *ioend = wpc->wb_ctx;
- struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
unsigned int ioend_flags = 0;
unsigned int map_len = min_t(u64, dirty_len,
@@ -1701,8 +1712,7 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
goto new_ioend;
- if (ifs)
- atomic_add(map_len, &ifs->write_bytes_pending);
+ iomap_start_folio_write(wpc->inode, folio, map_len);
/*
* Clamp io_offset and io_size to the incore EOF so that ondisk
@@ -1875,7 +1885,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
* all blocks.
*/
WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
- atomic_inc(&ifs->write_bytes_pending);
+ iomap_start_folio_write(inode, folio, 1);
}
/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9f32dd8dc075..cbf9d299a616 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -461,6 +461,11 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len);
int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
+void iomap_start_folio_write(struct inode *inode, struct folio *folio,
+ size_t len);
+void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
+ size_t len);
+
int iomap_writepages(struct iomap_writepage_ctx *wpc);
/*
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 07/14] iomap: move all ioend handling to ioend.c
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (5 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 06/14] iomap: add public helpers for uptodate state manipulation Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 08/14] iomap: rename iomap_writepage_map to iomap_writeback_folio Christoph Hellwig
` (7 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster
Now that the writeback code has the proper abstractions, all the ioend
code can be self-contained in ioend.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 215 ----------------------------------------
fs/iomap/internal.h | 1 -
fs/iomap/ioend.c | 220 ++++++++++++++++++++++++++++++++++++++++-
3 files changed, 219 insertions(+), 217 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index aaeaceba8adc..6c29c5043309 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1549,221 +1549,6 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
}
EXPORT_SYMBOL_GPL(iomap_finish_folio_write);
-/*
- * We're now finished for good with this ioend structure. Update the page
- * state, release holds on bios, and finally free up memory. Do not use the
- * ioend after this.
- */
-u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
-{
- struct inode *inode = ioend->io_inode;
- struct bio *bio = &ioend->io_bio;
- struct folio_iter fi;
- u32 folio_count = 0;
-
- if (ioend->io_error) {
- mapping_set_error(inode->i_mapping, ioend->io_error);
- if (!bio_flagged(bio, BIO_QUIET)) {
- pr_err_ratelimited(
-"%s: writeback error on inode %lu, offset %lld, sector %llu",
- inode->i_sb->s_id, inode->i_ino,
- ioend->io_offset, ioend->io_sector);
- }
- }
-
- /* walk all folios in bio, ending page IO on them */
- bio_for_each_folio_all(fi, bio) {
- iomap_finish_folio_write(inode, fi.folio, fi.length);
- folio_count++;
- }
-
- bio_put(bio); /* frees the ioend */
- return folio_count;
-}
-
-static void ioend_writeback_end_bio(struct bio *bio)
-{
- struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
-
- ioend->io_error = blk_status_to_errno(bio->bi_status);
- iomap_finish_ioend_buffered(ioend);
-}
-
-/*
- * We cannot cancel the ioend directly in case of an error, so call the bio end
- * I/O handler with the error status here to run the normal I/O completion
- * handler.
- */
-int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error)
-{
- struct iomap_ioend *ioend = wpc->wb_ctx;
-
- if (!ioend->io_bio.bi_end_io)
- ioend->io_bio.bi_end_io = ioend_writeback_end_bio;
-
- if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
- error = -EIO;
-
- if (error) {
- ioend->io_bio.bi_status = errno_to_blk_status(error);
- bio_endio(&ioend->io_bio);
- return error;
- }
-
- submit_bio(&ioend->io_bio);
- return 0;
-}
-EXPORT_SYMBOL_GPL(iomap_ioend_writeback_submit);
-
-static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
- loff_t pos, u16 ioend_flags)
-{
- struct bio *bio;
-
- bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
- REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc),
- GFP_NOFS, &iomap_ioend_bioset);
- bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
- bio->bi_write_hint = wpc->inode->i_write_hint;
- wbc_init_bio(wpc->wbc, bio);
- wpc->nr_folios = 0;
- return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
-}
-
-static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
- u16 ioend_flags)
-{
- struct iomap_ioend *ioend = wpc->wb_ctx;
-
- if (ioend_flags & IOMAP_IOEND_BOUNDARY)
- return false;
- if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=
- (ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
- return false;
- if (pos != ioend->io_offset + ioend->io_size)
- return false;
- if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
- iomap_sector(&wpc->iomap, pos) != bio_end_sector(&ioend->io_bio))
- return false;
- /*
- * Limit ioend bio chain lengths to minimise IO completion latency. This
- * also prevents long tight loops ending page writeback on all the
- * folios in the ioend.
- */
- if (wpc->nr_folios >= IOEND_BATCH_SIZE)
- return false;
- return true;
-}
-
-/*
- * Test to see if we have an existing ioend structure that we could append to
- * first; otherwise finish off the current ioend and start another.
- *
- * If a new ioend is created and cached, the old ioend is submitted to the block
- * layer instantly. Batching optimisations are provided by higher level block
- * plugging.
- *
- * At the end of a writeback pass, there will be a cached ioend remaining on the
- * writepage context that the caller will need to submit.
- */
-ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
- loff_t pos, loff_t end_pos, unsigned int dirty_len)
-{
- struct iomap_ioend *ioend = wpc->wb_ctx;
- size_t poff = offset_in_folio(folio, pos);
- unsigned int ioend_flags = 0;
- unsigned int map_len = min_t(u64, dirty_len,
- wpc->iomap.offset + wpc->iomap.length - pos);
- int error;
-
- trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap);
-
- WARN_ON_ONCE(!folio->private && map_len < dirty_len);
-
- switch (wpc->iomap.type) {
- case IOMAP_INLINE:
- WARN_ON_ONCE(1);
- return -EIO;
- case IOMAP_HOLE:
- return map_len;
- default:
- break;
- }
-
- if (wpc->iomap.type == IOMAP_UNWRITTEN)
- ioend_flags |= IOMAP_IOEND_UNWRITTEN;
- if (wpc->iomap.flags & IOMAP_F_SHARED)
- ioend_flags |= IOMAP_IOEND_SHARED;
- if (folio_test_dropbehind(folio))
- ioend_flags |= IOMAP_IOEND_DONTCACHE;
- if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
- ioend_flags |= IOMAP_IOEND_BOUNDARY;
-
- if (!ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
-new_ioend:
- if (ioend) {
- error = wpc->ops->writeback_submit(wpc, 0);
- if (error)
- return error;
- }
- wpc->wb_ctx = ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
- }
-
- if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
- goto new_ioend;
-
- iomap_start_folio_write(wpc->inode, folio, map_len);
-
- /*
- * Clamp io_offset and io_size to the incore EOF so that ondisk
- * file size updates in the ioend completion are byte-accurate.
- * This avoids recovering files with zeroed tail regions when
- * writeback races with appending writes:
- *
- * Thread 1: Thread 2:
- * ------------ -----------
- * write [A, A+B]
- * update inode size to A+B
- * submit I/O [A, A+BS]
- * write [A+B, A+B+C]
- * update inode size to A+B+C
- * <I/O completes, updates disk size to min(A+B+C, A+BS)>
- * <power failure>
- *
- * After reboot:
- * 1) with A+B+C < A+BS, the file has zero padding in range
- * [A+B, A+B+C]
- *
- * |< Block Size (BS) >|
- * |DDDDDDDDDDDD0000000000000|
- * ^ ^ ^
- * A A+B A+B+C
- * (EOF)
- *
- * 2) with A+B+C > A+BS, the file has zero padding in range
- * [A+B, A+BS]
- *
- * |< Block Size (BS) >|< Block Size (BS) >|
- * |DDDDDDDDDDDD0000000000000|00000000000000000000000000|
- * ^ ^ ^ ^
- * A A+B A+BS A+B+C
- * (EOF)
- *
- * D = Valid Data
- * 0 = Zero Padding
- *
- * Note that this defeats the ability to chain the ioends of
- * appending writes.
- */
- ioend->io_size += map_len;
- if (ioend->io_offset + ioend->io_size > end_pos)
- ioend->io_size = end_pos - ioend->io_offset;
-
- wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
- return map_len;
-}
-EXPORT_SYMBOL_GPL(iomap_add_to_ioend);
-
static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
bool *wb_pending)
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index f6992a3bf66a..d05cb3aed96e 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -4,7 +4,6 @@
#define IOEND_BATCH_SIZE 4096
-u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend);
u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
#endif /* _IOMAP_INTERNAL_H */
diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index 18894ebba6db..b49fa75eab26 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -1,10 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright (c) 2024-2025 Christoph Hellwig.
+ * Copyright (c) 2016-2025 Christoph Hellwig.
*/
#include <linux/iomap.h>
#include <linux/list_sort.h>
+#include <linux/pagemap.h>
+#include <linux/writeback.h>
#include "internal.h"
+#include "trace.h"
struct bio_set iomap_ioend_bioset;
EXPORT_SYMBOL_GPL(iomap_ioend_bioset);
@@ -28,6 +31,221 @@ struct iomap_ioend *iomap_init_ioend(struct inode *inode,
}
EXPORT_SYMBOL_GPL(iomap_init_ioend);
+/*
+ * We're now finished for good with this ioend structure. Update the folio
+ * state, release holds on bios, and finally free up memory. Do not use the
+ * ioend after this.
+ */
+static u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
+{
+ struct inode *inode = ioend->io_inode;
+ struct bio *bio = &ioend->io_bio;
+ struct folio_iter fi;
+ u32 folio_count = 0;
+
+ if (ioend->io_error) {
+ mapping_set_error(inode->i_mapping, ioend->io_error);
+ if (!bio_flagged(bio, BIO_QUIET)) {
+ pr_err_ratelimited(
+"%s: writeback error on inode %lu, offset %lld, sector %llu",
+ inode->i_sb->s_id, inode->i_ino,
+ ioend->io_offset, ioend->io_sector);
+ }
+ }
+
+ /* walk all folios in bio, ending page IO on them */
+ bio_for_each_folio_all(fi, bio) {
+ iomap_finish_folio_write(inode, fi.folio, fi.length);
+ folio_count++;
+ }
+
+ bio_put(bio); /* frees the ioend */
+ return folio_count;
+}
+
+static void ioend_writeback_end_bio(struct bio *bio)
+{
+ struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
+
+ ioend->io_error = blk_status_to_errno(bio->bi_status);
+ iomap_finish_ioend_buffered(ioend);
+}
+
+/*
+ * We cannot cancel the ioend directly in case of an error, so call the bio end
+ * I/O handler with the error status here to run the normal I/O completion
+ * handler.
+ */
+int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error)
+{
+ struct iomap_ioend *ioend = wpc->wb_ctx;
+
+ if (!ioend->io_bio.bi_end_io)
+ ioend->io_bio.bi_end_io = ioend_writeback_end_bio;
+
+ if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
+ error = -EIO;
+
+ if (error) {
+ ioend->io_bio.bi_status = errno_to_blk_status(error);
+ bio_endio(&ioend->io_bio);
+ return error;
+ }
+
+ submit_bio(&ioend->io_bio);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(iomap_ioend_writeback_submit);
+
+static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
+ loff_t pos, u16 ioend_flags)
+{
+ struct bio *bio;
+
+ bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
+ REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc),
+ GFP_NOFS, &iomap_ioend_bioset);
+ bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
+ bio->bi_write_hint = wpc->inode->i_write_hint;
+ wbc_init_bio(wpc->wbc, bio);
+ wpc->nr_folios = 0;
+ return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
+}
+
+static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
+ u16 ioend_flags)
+{
+ struct iomap_ioend *ioend = wpc->wb_ctx;
+
+ if (ioend_flags & IOMAP_IOEND_BOUNDARY)
+ return false;
+ if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=
+ (ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
+ return false;
+ if (pos != ioend->io_offset + ioend->io_size)
+ return false;
+ if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
+ iomap_sector(&wpc->iomap, pos) != bio_end_sector(&ioend->io_bio))
+ return false;
+ /*
+ * Limit ioend bio chain lengths to minimise IO completion latency. This
+ * also prevents long tight loops ending page writeback on all the
+ * folios in the ioend.
+ */
+ if (wpc->nr_folios >= IOEND_BATCH_SIZE)
+ return false;
+ return true;
+}
+
+/*
+ * Test to see if we have an existing ioend structure that we could append to
+ * first; otherwise finish off the current ioend and start another.
+ *
+ * If a new ioend is created and cached, the old ioend is submitted to the block
+ * layer instantly. Batching optimisations are provided by higher level block
+ * plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
+ */
+ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
+ loff_t pos, loff_t end_pos, unsigned int dirty_len)
+{
+ struct iomap_ioend *ioend = wpc->wb_ctx;
+ size_t poff = offset_in_folio(folio, pos);
+ unsigned int ioend_flags = 0;
+ unsigned int map_len = min_t(u64, dirty_len,
+ wpc->iomap.offset + wpc->iomap.length - pos);
+ int error;
+
+ trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap);
+
+ WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+
+ switch (wpc->iomap.type) {
+ case IOMAP_INLINE:
+ WARN_ON_ONCE(1);
+ return -EIO;
+ case IOMAP_HOLE:
+ return map_len;
+ default:
+ break;
+ }
+
+ if (wpc->iomap.type == IOMAP_UNWRITTEN)
+ ioend_flags |= IOMAP_IOEND_UNWRITTEN;
+ if (wpc->iomap.flags & IOMAP_F_SHARED)
+ ioend_flags |= IOMAP_IOEND_SHARED;
+ if (folio_test_dropbehind(folio))
+ ioend_flags |= IOMAP_IOEND_DONTCACHE;
+ if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
+ ioend_flags |= IOMAP_IOEND_BOUNDARY;
+
+ if (!ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
+new_ioend:
+ if (ioend) {
+ error = wpc->ops->writeback_submit(wpc, 0);
+ if (error)
+ return error;
+ }
+ wpc->wb_ctx = ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
+ }
+
+ if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
+ goto new_ioend;
+
+ iomap_start_folio_write(wpc->inode, folio, map_len);
+
+ /*
+ * Clamp io_offset and io_size to the incore EOF so that ondisk
+ * file size updates in the ioend completion are byte-accurate.
+ * This avoids recovering files with zeroed tail regions when
+ * writeback races with appending writes:
+ *
+ * Thread 1: Thread 2:
+ * ------------ -----------
+ * write [A, A+B]
+ * update inode size to A+B
+ * submit I/O [A, A+BS]
+ * write [A+B, A+B+C]
+ * update inode size to A+B+C
+ * <I/O completes, updates disk size to min(A+B+C, A+BS)>
+ * <power failure>
+ *
+ * After reboot:
+ * 1) with A+B+C < A+BS, the file has zero padding in range
+ * [A+B, A+B+C]
+ *
+ * |< Block Size (BS) >|
+ * |DDDDDDDDDDDD0000000000000|
+ * ^ ^ ^
+ * A A+B A+B+C
+ * (EOF)
+ *
+ * 2) with A+B+C > A+BS, the file has zero padding in range
+ * [A+B, A+BS]
+ *
+ * |< Block Size (BS) >|< Block Size (BS) >|
+ * |DDDDDDDDDDDD0000000000000|00000000000000000000000000|
+ * ^ ^ ^ ^
+ * A A+B A+BS A+B+C
+ * (EOF)
+ *
+ * D = Valid Data
+ * 0 = Zero Padding
+ *
+ * Note that this defeats the ability to chain the ioends of
+ * appending writes.
+ */
+ ioend->io_size += map_len;
+ if (ioend->io_offset + ioend->io_size > end_pos)
+ ioend->io_size = end_pos - ioend->io_offset;
+
+ wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
+ return map_len;
+}
+EXPORT_SYMBOL_GPL(iomap_add_to_ioend);
+
static u32 iomap_finish_ioend(struct iomap_ioend *ioend, int error)
{
if (ioend->io_parent) {
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 08/14] iomap: rename iomap_writepage_map to iomap_writeback_folio
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (6 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 07/14] iomap: move all ioend handling to ioend.c Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 21:12 ` Joanne Koong
2025-07-08 13:51 ` [PATCH 09/14] iomap: move folio_unlock out of iomap_writeback_folio Christoph Hellwig
` (6 subsequent siblings)
14 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster
->writepage is gone, and our naming wasn't always that great to start
with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 10 +++++-----
fs/iomap/trace.h | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6c29c5043309..c1075f3027ac 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1581,7 +1581,7 @@ static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
* If the folio is entirely beyond i_size, return false. If it straddles
* i_size, adjust end_pos and zero all data beyond i_size.
*/
-static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
+static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
u64 *end_pos)
{
u64 isize = i_size_read(inode);
@@ -1633,7 +1633,7 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
return true;
}
-static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
+static int iomap_writeback_folio(struct iomap_writepage_ctx *wpc,
struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
@@ -1649,9 +1649,9 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
WARN_ON_ONCE(folio_test_dirty(folio));
WARN_ON_ONCE(folio_test_writeback(folio));
- trace_iomap_writepage(inode, pos, folio_size(folio));
+ trace_iomap_writeback_folio(inode, pos, folio_size(folio));
- if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+ if (!iomap_writeback_handle_eof(folio, inode, &end_pos)) {
folio_unlock(folio);
return 0;
}
@@ -1736,7 +1736,7 @@ iomap_writepages(struct iomap_writepage_ctx *wpc)
return -EIO;
while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error)))
- error = iomap_writepage_map(wpc, folio);
+ error = iomap_writeback_folio(wpc, folio);
/*
* If @error is non-zero, it means that we have a situation where some
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index aaea02c9560a..6ad66e6ba653 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -79,7 +79,7 @@ DECLARE_EVENT_CLASS(iomap_range_class,
DEFINE_EVENT(iomap_range_class, name, \
TP_PROTO(struct inode *inode, loff_t off, u64 len),\
TP_ARGS(inode, off, len))
-DEFINE_RANGE_EVENT(iomap_writepage);
+DEFINE_RANGE_EVENT(iomap_writeback_folio);
DEFINE_RANGE_EVENT(iomap_release_folio);
DEFINE_RANGE_EVENT(iomap_invalidate_folio);
DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail);
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 09/14] iomap: move folio_unlock out of iomap_writeback_folio
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (7 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 08/14] iomap: rename iomap_writepage_map to iomap_writeback_folio Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 10/14] iomap: export iomap_writeback_folio Christoph Hellwig
` (5 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
From: Joanne Koong <joannelkoong@gmail.com>
Move unlocking the folio out of iomap_writeback_folio into the caller.
This means the end-writeback machinery now runs with the folio still
locked when no writeback happened, or when writeback completed extremely
fast.

Note that holding the folio lock over the call to folio_end_writeback in
iomap_writeback_folio means that the dropbehind handling there will never
run, because the trylock fails. The only way this can happen is if
writeback either never wrote any dirty data at all, in which case the
dropbehind handling isn't needed, or if all writeback finished instantly,
which is rather unlikely. Even in the latter case the dropbehind handling
is an optional optimization, so skipping it will not cause correctness
issues.
This prepares for exporting iomap_writeback_folio for use in folio
laundering.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c1075f3027ac..1c18925070ca 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1651,10 +1651,8 @@ static int iomap_writeback_folio(struct iomap_writepage_ctx *wpc,
trace_iomap_writeback_folio(inode, pos, folio_size(folio));
- if (!iomap_writeback_handle_eof(folio, inode, &end_pos)) {
- folio_unlock(folio);
+ if (!iomap_writeback_handle_eof(folio, inode, &end_pos))
return 0;
- }
WARN_ON_ONCE(end_pos <= pos);
if (i_blocks_per_folio(inode, folio) > 1) {
@@ -1708,7 +1706,6 @@ static int iomap_writeback_folio(struct iomap_writepage_ctx *wpc,
* already at this point. In that case we need to clear the writeback
* bit ourselves right after unlocking the page.
*/
- folio_unlock(folio);
if (ifs) {
if (atomic_dec_and_test(&ifs->write_bytes_pending))
folio_end_writeback(folio);
@@ -1735,8 +1732,10 @@ iomap_writepages(struct iomap_writepage_ctx *wpc)
PF_MEMALLOC))
return -EIO;
- while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error)))
+ while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error))) {
error = iomap_writeback_folio(wpc, folio);
+ folio_unlock(folio);
+ }
/*
* If @error is non-zero, it means that we have a situation where some
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 10/14] iomap: export iomap_writeback_folio
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (8 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 09/14] iomap: move folio_unlock out of iomap_writeback_folio Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 11/14] iomap: replace iomap_folio_ops with iomap_write_ops Christoph Hellwig
` (4 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Allow fuse to use iomap_writeback_folio for folio laundering. Note
that the caller needs to manually submit the pending writeback context.
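A minimal sketch of the laundering pattern this enables (the actual fuse
wiring is not part of this patch, and example_launder is hypothetical):

static int example_launder(struct iomap_writepage_ctx *wpc,
                struct folio *folio)
{
        int error;

        /* the folio is locked here; the caller unlocks it afterwards */
        error = iomap_writeback_folio(wpc, folio);

        /* a pending writeback context must be submitted manually */
        if (wpc->wb_ctx)
                error = wpc->ops->writeback_submit(wpc, error);
        return error;
}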
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 4 ++--
include/linux/iomap.h | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1c18925070ca..ddb4363359e2 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1633,8 +1633,7 @@ static bool iomap_writeback_handle_eof(struct folio *folio, struct inode *inode,
return true;
}
-static int iomap_writeback_folio(struct iomap_writepage_ctx *wpc,
- struct folio *folio)
+int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
struct inode *inode = wpc->inode;
@@ -1716,6 +1715,7 @@ static int iomap_writeback_folio(struct iomap_writepage_ctx *wpc,
mapping_set_error(inode->i_mapping, error);
return error;
}
+EXPORT_SYMBOL_GPL(iomap_writeback_folio);
int
iomap_writepages(struct iomap_writepage_ctx *wpc)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index cbf9d299a616..b65d3f063bb0 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -466,6 +466,7 @@ void iomap_start_folio_write(struct inode *inode, struct folio *folio,
void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
size_t len);
+int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio);
int iomap_writepages(struct iomap_writepage_ctx *wpc);
/*
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 11/14] iomap: replace iomap_folio_ops with iomap_write_ops
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (9 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 10/14] iomap: export iomap_writeback_folio Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync Christoph Hellwig
` (3 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Damien Le Moal
The iomap_folio_ops are only used for buffered writes, including the zero
and unshare variants. Rename them to iomap_write_ops to better describe
the usage, and pass them through the call chain like the other
operation-specific methods instead of through the iomap.
xfs_iomap_valid grows an IOMAP_HOLE check to keep the existing behavior
that never attached the folio_ops to an iomap representing a hole.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
---
Documentation/filesystems/iomap/design.rst | 3 -
.../filesystems/iomap/operations.rst | 8 +-
block/fops.c | 3 +-
fs/gfs2/bmap.c | 21 ++---
fs/gfs2/bmap.h | 1 +
fs/gfs2/file.c | 3 +-
fs/iomap/buffered-io.c | 79 +++++++++++--------
fs/xfs/xfs_file.c | 6 +-
fs/xfs/xfs_iomap.c | 12 ++-
fs/xfs/xfs_iomap.h | 1 +
fs/xfs/xfs_reflink.c | 3 +-
fs/zonefs/file.c | 3 +-
include/linux/iomap.h | 22 +++---
13 files changed, 89 insertions(+), 76 deletions(-)
diff --git a/Documentation/filesystems/iomap/design.rst b/Documentation/filesystems/iomap/design.rst
index f2df9b6df988..0f7672676c0b 100644
--- a/Documentation/filesystems/iomap/design.rst
+++ b/Documentation/filesystems/iomap/design.rst
@@ -167,7 +167,6 @@ structure below:
struct dax_device *dax_dev;
void *inline_data;
void *private;
- const struct iomap_folio_ops *folio_ops;
u64 validity_cookie;
};
@@ -292,8 +291,6 @@ The fields are as follows:
<https://lore.kernel.org/all/20180619164137.13720-7-hch@lst.de/>`_.
This value will be passed unchanged to ``->iomap_end``.
- * ``folio_ops`` will be covered in the section on pagecache operations.
-
* ``validity_cookie`` is a magic freshness value set by the filesystem
that should be used to detect stale mappings.
For pagecache operations this is critical for correct operation
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 4b93c5f7841a..a9b48ce4af92 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -57,16 +57,12 @@ The following address space operations can be wrapped easily:
* ``bmap``
* ``swap_activate``
-``struct iomap_folio_ops``
+``struct iomap_write_ops``
--------------------------
-The ``->iomap_begin`` function for pagecache operations may set the
-``struct iomap::folio_ops`` field to an ops structure to override
-default behaviors of iomap:
-
.. code-block:: c
- struct iomap_folio_ops {
+ struct iomap_write_ops {
struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
unsigned len);
void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
diff --git a/block/fops.c b/block/fops.c
index 0845737c0320..0c2c010ff303 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -723,7 +723,8 @@ blkdev_direct_write(struct kiocb *iocb, struct iov_iter *from)
static ssize_t blkdev_buffered_write(struct kiocb *iocb, struct iov_iter *from)
{
- return iomap_file_buffered_write(iocb, from, &blkdev_iomap_ops, NULL);
+ return iomap_file_buffered_write(iocb, from, &blkdev_iomap_ops, NULL,
+ NULL);
}
/*
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 86045d3577b7..131091520de6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -963,12 +963,16 @@ static struct folio *
gfs2_iomap_get_folio(struct iomap_iter *iter, loff_t pos, unsigned len)
{
struct inode *inode = iter->inode;
+ struct gfs2_inode *ip = GFS2_I(inode);
unsigned int blockmask = i_blocksize(inode) - 1;
struct gfs2_sbd *sdp = GFS2_SB(inode);
unsigned int blocks;
struct folio *folio;
int status;
+ if (!gfs2_is_jdata(ip) && !gfs2_is_stuffed(ip))
+ return iomap_get_folio(iter, pos, len);
+
blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
status = gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
if (status)
@@ -987,7 +991,7 @@ static void gfs2_iomap_put_folio(struct inode *inode, loff_t pos,
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
- if (!gfs2_is_stuffed(ip))
+ if (gfs2_is_jdata(ip) && !gfs2_is_stuffed(ip))
gfs2_trans_add_databufs(ip->i_gl, folio,
offset_in_folio(folio, pos),
copied);
@@ -995,13 +999,14 @@ static void gfs2_iomap_put_folio(struct inode *inode, loff_t pos,
folio_unlock(folio);
folio_put(folio);
- if (tr->tr_num_buf_new)
- __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-
- gfs2_trans_end(sdp);
+ if (gfs2_is_jdata(ip) || gfs2_is_stuffed(ip)) {
+ if (tr->tr_num_buf_new)
+ __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+ gfs2_trans_end(sdp);
+ }
}
-static const struct iomap_folio_ops gfs2_iomap_folio_ops = {
+const struct iomap_write_ops gfs2_iomap_write_ops = {
.get_folio = gfs2_iomap_get_folio,
.put_folio = gfs2_iomap_put_folio,
};
@@ -1078,8 +1083,6 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
gfs2_trans_end(sdp);
}
- if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
- iomap->folio_ops = &gfs2_iomap_folio_ops;
return 0;
out_trans_end:
@@ -1304,7 +1307,7 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from, loff_t length
return 0;
length = min(length, inode->i_size - from);
return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops,
- NULL);
+ &gfs2_iomap_write_ops, NULL);
}
#define GFS2_JTRUNC_REVOKES 8192
diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h
index 4e8b1e8ebdf3..6cdc72dd55a3 100644
--- a/fs/gfs2/bmap.h
+++ b/fs/gfs2/bmap.h
@@ -44,6 +44,7 @@ static inline void gfs2_write_calc_reserv(const struct gfs2_inode *ip,
}
extern const struct iomap_ops gfs2_iomap_ops;
+extern const struct iomap_write_ops gfs2_iomap_write_ops;
extern const struct iomap_writeback_ops gfs2_writeback_ops;
int gfs2_unstuff_dinode(struct gfs2_inode *ip);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index fd1147aa3891..2908f5bee21d 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1058,7 +1058,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
}
pagefault_disable();
- ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops, NULL);
+ ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops,
+ &gfs2_iomap_write_ops, NULL);
pagefault_enable();
if (ret > 0)
written += ret;
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ddb4363359e2..b04c00dd6768 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -732,28 +732,27 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
return 0;
}
-static struct folio *__iomap_get_folio(struct iomap_iter *iter, size_t len)
+static struct folio *__iomap_get_folio(struct iomap_iter *iter,
+ const struct iomap_write_ops *write_ops, size_t len)
{
- const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
loff_t pos = iter->pos;
if (!mapping_large_folio_support(iter->inode->i_mapping))
len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
- if (folio_ops && folio_ops->get_folio)
- return folio_ops->get_folio(iter, pos, len);
- else
- return iomap_get_folio(iter, pos, len);
+ if (write_ops && write_ops->get_folio)
+ return write_ops->get_folio(iter, pos, len);
+ return iomap_get_folio(iter, pos, len);
}
-static void __iomap_put_folio(struct iomap_iter *iter, size_t ret,
+static void __iomap_put_folio(struct iomap_iter *iter,
+ const struct iomap_write_ops *write_ops, size_t ret,
struct folio *folio)
{
- const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
loff_t pos = iter->pos;
- if (folio_ops && folio_ops->put_folio) {
- folio_ops->put_folio(iter->inode, pos, ret, folio);
+ if (write_ops && write_ops->put_folio) {
+ write_ops->put_folio(iter->inode, pos, ret, folio);
} else {
folio_unlock(folio);
folio_put(folio);
@@ -790,10 +789,10 @@ static int iomap_write_begin_inline(const struct iomap_iter *iter,
* offset, and length. Callers can optionally pass a max length *plen,
* otherwise init to zero.
*/
-static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
+static int iomap_write_begin(struct iomap_iter *iter,
+ const struct iomap_write_ops *write_ops, struct folio **foliop,
size_t *poffset, u64 *plen)
{
- const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
loff_t pos = iter->pos;
u64 len = min_t(u64, SIZE_MAX, iomap_length(iter));
@@ -808,7 +807,7 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
if (fatal_signal_pending(current))
return -EINTR;
- folio = __iomap_get_folio(iter, len);
+ folio = __iomap_get_folio(iter, write_ops, len);
if (IS_ERR(folio))
return PTR_ERR(folio);
@@ -822,8 +821,8 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
* could do the wrong thing here (zero a page range incorrectly or fail
* to zero) and corrupt data.
*/
- if (folio_ops && folio_ops->iomap_valid) {
- bool iomap_valid = folio_ops->iomap_valid(iter->inode,
+ if (write_ops && write_ops->iomap_valid) {
+ bool iomap_valid = write_ops->iomap_valid(iter->inode,
&iter->iomap);
if (!iomap_valid) {
iter->iomap.flags |= IOMAP_F_STALE;
@@ -849,8 +848,7 @@ static int iomap_write_begin(struct iomap_iter *iter, struct folio **foliop,
return 0;
out_unlock:
- __iomap_put_folio(iter, 0, folio);
-
+ __iomap_put_folio(iter, write_ops, 0, folio);
return status;
}
@@ -922,7 +920,8 @@ static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
return __iomap_write_end(iter->inode, pos, len, copied, folio);
}
-static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
+static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
+ const struct iomap_write_ops *write_ops)
{
ssize_t total_written = 0;
int status = 0;
@@ -966,7 +965,8 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
break;
}
- status = iomap_write_begin(iter, &folio, &offset, &bytes);
+ status = iomap_write_begin(iter, write_ops, &folio, &offset,
+ &bytes);
if (unlikely(status)) {
iomap_write_failed(iter->inode, iter->pos, bytes);
break;
@@ -995,7 +995,7 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
i_size_write(iter->inode, pos + written);
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
}
- __iomap_put_folio(iter, written, folio);
+ __iomap_put_folio(iter, write_ops, written, folio);
if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
@@ -1028,7 +1028,8 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
ssize_t
iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
- const struct iomap_ops *ops, void *private)
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops, void *private)
{
struct iomap_iter iter = {
.inode = iocb->ki_filp->f_mapping->host,
@@ -1045,7 +1046,7 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
iter.flags |= IOMAP_DONTCACHE;
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_write_iter(&iter, i);
+ iter.status = iomap_write_iter(&iter, i, write_ops);
if (unlikely(iter.pos == iocb->ki_pos))
return ret;
@@ -1279,7 +1280,8 @@ void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
}
EXPORT_SYMBOL_GPL(iomap_write_delalloc_release);
-static int iomap_unshare_iter(struct iomap_iter *iter)
+static int iomap_unshare_iter(struct iomap_iter *iter,
+ const struct iomap_write_ops *write_ops)
{
struct iomap *iomap = &iter->iomap;
u64 bytes = iomap_length(iter);
@@ -1294,14 +1296,15 @@ static int iomap_unshare_iter(struct iomap_iter *iter)
bool ret;
bytes = min_t(u64, SIZE_MAX, bytes);
- status = iomap_write_begin(iter, &folio, &offset, &bytes);
+ status = iomap_write_begin(iter, write_ops, &folio, &offset,
+ &bytes);
if (unlikely(status))
return status;
if (iomap->flags & IOMAP_F_STALE)
break;
ret = iomap_write_end(iter, bytes, bytes, folio);
- __iomap_put_folio(iter, bytes, folio);
+ __iomap_put_folio(iter, write_ops, bytes, folio);
if (WARN_ON_ONCE(!ret))
return -EIO;
@@ -1319,7 +1322,8 @@ static int iomap_unshare_iter(struct iomap_iter *iter)
int
iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
- const struct iomap_ops *ops)
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops)
{
struct iomap_iter iter = {
.inode = inode,
@@ -1334,7 +1338,7 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
iter.len = min(len, size - pos);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_unshare_iter(&iter);
+ iter.status = iomap_unshare_iter(&iter, write_ops);
return ret;
}
EXPORT_SYMBOL_GPL(iomap_file_unshare);
@@ -1353,7 +1357,8 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
return filemap_write_and_wait_range(mapping, i->pos, end);
}
-static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
+static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
+ const struct iomap_write_ops *write_ops)
{
u64 bytes = iomap_length(iter);
int status;
@@ -1364,7 +1369,8 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
bool ret;
bytes = min_t(u64, SIZE_MAX, bytes);
- status = iomap_write_begin(iter, &folio, &offset, &bytes);
+ status = iomap_write_begin(iter, write_ops, &folio, &offset,
+ &bytes);
if (status)
return status;
if (iter->iomap.flags & IOMAP_F_STALE)
@@ -1377,7 +1383,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
folio_mark_accessed(folio);
ret = iomap_write_end(iter, bytes, bytes, folio);
- __iomap_put_folio(iter, bytes, folio);
+ __iomap_put_folio(iter, write_ops, bytes, folio);
if (WARN_ON_ONCE(!ret))
return -EIO;
@@ -1393,7 +1399,8 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
int
iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
- const struct iomap_ops *ops, void *private)
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops, void *private)
{
struct iomap_iter iter = {
.inode = inode,
@@ -1423,7 +1430,8 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
iter.len = plen;
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_zero_iter(&iter, did_zero);
+ iter.status = iomap_zero_iter(&iter, did_zero,
+ write_ops);
iter.len = len - (iter.pos - pos);
if (ret || !iter.len)
@@ -1454,7 +1462,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
continue;
}
- iter.status = iomap_zero_iter(&iter, did_zero);
+ iter.status = iomap_zero_iter(&iter, did_zero, write_ops);
}
return ret;
}
@@ -1462,7 +1470,8 @@ EXPORT_SYMBOL_GPL(iomap_zero_range);
int
iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
- const struct iomap_ops *ops, void *private)
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops, void *private)
{
unsigned int blocksize = i_blocksize(inode);
unsigned int off = pos & (blocksize - 1);
@@ -1471,7 +1480,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
if (!off)
return 0;
return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops,
- private);
+ write_ops, private);
}
EXPORT_SYMBOL_GPL(iomap_truncate_page);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0b41b18debf3..0cbeae61f3a4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -979,7 +979,8 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
- &xfs_buffered_write_iomap_ops, NULL);
+ &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
+ NULL);
/*
* If we hit a space limit, try to free up some lingering preallocated
@@ -1059,7 +1060,8 @@ xfs_file_buffered_write_zoned(
retry:
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
- &xfs_buffered_write_iomap_ops, &ac);
+ &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
+ &ac);
if (ret == -ENOSPC && !cleared_space) {
/*
* Kick off writeback to convert delalloc space and release the
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ff05e6b1b0bb..2e94a9435002 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -79,6 +79,9 @@ xfs_iomap_valid(
{
struct xfs_inode *ip = XFS_I(inode);
+ if (iomap->type == IOMAP_HOLE)
+ return true;
+
if (iomap->validity_cookie !=
xfs_iomap_inode_sequence(ip, iomap->flags)) {
trace_xfs_iomap_invalid(ip, iomap);
@@ -89,7 +92,7 @@ xfs_iomap_valid(
return true;
}
-static const struct iomap_folio_ops xfs_iomap_folio_ops = {
+const struct iomap_write_ops xfs_iomap_write_ops = {
.iomap_valid = xfs_iomap_valid,
};
@@ -151,7 +154,6 @@ xfs_bmbt_to_iomap(
iomap->flags |= IOMAP_F_DIRTY;
iomap->validity_cookie = sequence_cookie;
- iomap->folio_ops = &xfs_iomap_folio_ops;
return 0;
}
@@ -2198,7 +2200,8 @@ xfs_zero_range(
return dax_zero_range(inode, pos, len, did_zero,
&xfs_dax_write_iomap_ops);
return iomap_zero_range(inode, pos, len, did_zero,
- &xfs_buffered_write_iomap_ops, ac);
+ &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
+ ac);
}
int
@@ -2214,5 +2217,6 @@ xfs_truncate_page(
return dax_truncate_page(inode, pos, did_zero,
&xfs_dax_write_iomap_ops);
return iomap_truncate_page(inode, pos, did_zero,
- &xfs_buffered_write_iomap_ops, ac);
+ &xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
+ ac);
}
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 674f8ac1b9bd..ebcce7d49446 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -57,5 +57,6 @@ extern const struct iomap_ops xfs_seek_iomap_ops;
extern const struct iomap_ops xfs_xattr_iomap_ops;
extern const struct iomap_ops xfs_dax_write_iomap_ops;
extern const struct iomap_ops xfs_atomic_write_cow_iomap_ops;
+extern const struct iomap_write_ops xfs_iomap_write_ops;
#endif /* __XFS_IOMAP_H__*/
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index ad3bcb76d805..3f177b4ec131 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1881,7 +1881,8 @@ xfs_reflink_unshare(
&xfs_dax_write_iomap_ops);
else
error = iomap_file_unshare(inode, offset, len,
- &xfs_buffered_write_iomap_ops);
+ &xfs_buffered_write_iomap_ops,
+ &xfs_iomap_write_ops);
if (error)
goto out;
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index fee9403ad49b..24c29c10e27f 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -572,7 +572,8 @@ static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
if (ret <= 0)
goto inode_unlock;
- ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops, NULL);
+ ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops,
+ NULL, NULL);
if (ret == -EIO)
zonefs_io_error(inode, true);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b65d3f063bb0..80f543cc4fe8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -101,8 +101,6 @@ struct vm_fault;
*/
#define IOMAP_NULL_ADDR -1ULL /* addr is not valid */
-struct iomap_folio_ops;
-
struct iomap {
u64 addr; /* disk offset of mapping, bytes */
loff_t offset; /* file offset of mapping, bytes */
@@ -113,7 +111,6 @@ struct iomap {
struct dax_device *dax_dev; /* dax_dev for dax operations */
void *inline_data;
void *private; /* filesystem private */
- const struct iomap_folio_ops *folio_ops;
u64 validity_cookie; /* used with .iomap_valid() */
};
@@ -143,16 +140,11 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
}
/*
- * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio
- * and put_folio will be called for each folio written to. This only applies
- * to buffered writes as unbuffered writes will not typically have folios
- * associated with them.
- *
* When get_folio succeeds, put_folio will always be called to do any
* cleanup work necessary. put_folio is responsible for unlocking and putting
* @folio.
*/
-struct iomap_folio_ops {
+struct iomap_write_ops {
struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
unsigned len);
void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
@@ -335,7 +327,8 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
}
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
- const struct iomap_ops *ops, void *private);
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops, void *private);
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
@@ -344,11 +337,14 @@ bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
- const struct iomap_ops *ops);
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops);
int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
- bool *did_zero, const struct iomap_ops *ops, void *private);
+ bool *did_zero, const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops, void *private);
int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
- const struct iomap_ops *ops, void *private);
+ const struct iomap_ops *ops,
+ const struct iomap_write_ops *write_ops, void *private);
vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
void *private);
typedef void (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (10 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 11/14] iomap: replace iomap_folio_ops with iomap_write_ops Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 19:40 ` Darrick J. Wong
2025-07-08 21:23 ` Joanne Koong
2025-07-08 13:51 ` [PATCH 13/14] iomap: add read_folio_range() handler for buffered writes Christoph Hellwig
` (2 subsequent siblings)
14 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Pass the iomap_iter and derive the map inside iomap_read_folio_sync
instead of in the caller, and use the more descriptive srcmap name for
the source iomap. Stop passing the offset-into-folio argument, as it
can be derived from the folio and the file offset. Rename the
variables for the offset into the file and the length to be more
descriptive and to match the rest of the code.

Rename the function itself to iomap_read_folio_range to make its use
clearer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b04c00dd6768..c73048062cb1 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -657,22 +657,22 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
pos + len - 1);
}
-static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
- size_t poff, size_t plen, const struct iomap *iomap)
+static int iomap_read_folio_range(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
{
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct bio_vec bvec;
struct bio bio;
- bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
- bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
- bio_add_folio_nofail(&bio, folio, plen, poff);
+ bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
+ bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
+ bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
return submit_bio_wait(&bio);
}
static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
struct folio *folio)
{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct iomap_folio_state *ifs;
loff_t pos = iter->pos;
loff_t block_size = i_blocksize(iter->inode);
@@ -721,8 +721,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
if (iter->flags & IOMAP_NOWAIT)
return -EAGAIN;
- status = iomap_read_folio_sync(block_start, folio,
- poff, plen, srcmap);
+ status = iomap_read_folio_range(iter, folio,
+ block_start, plen);
if (status)
return status;
}
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 13/14] iomap: add read_folio_range() handler for buffered writes
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (11 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 13:51 ` [PATCH 14/14] iomap: build the writeback code without CONFIG_BLOCK Christoph Hellwig
2025-07-08 21:30 ` refactor the iomap writeback code v4 Joanne Koong
14 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Add a read_folio_range() handler for buffered writes that filesystems
may pass in if they wish to provide a custom way to synchronously read
in the contents of a folio.
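A hedged sketch of such a handler (example_read stands in for the
filesystem's own synchronous read path):

static int example_read_folio_range(const struct iomap_iter *iter,
                struct folio *folio, loff_t pos, size_t len)
{
        /* the read must complete before returning */
        return example_read(iter->inode, folio,
                        offset_in_folio(folio, pos), len);
}

static const struct iomap_write_ops example_write_ops = {
        .read_folio_range       = example_read_folio_range,
};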
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: renamed to read_folio_range, pass less arguments]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
Documentation/filesystems/iomap/operations.rst | 6 ++++++
fs/iomap/buffered-io.c | 13 +++++++++----
include/linux/iomap.h | 10 ++++++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index a9b48ce4af92..067ed8e14ef3 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -68,6 +68,8 @@ The following address space operations can be wrapped easily:
void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
};
iomap calls these functions:
@@ -123,6 +125,10 @@ iomap calls these functions:
``->iomap_valid``, then the iomap should be considered stale and the
validation failed.
+ - ``read_folio_range``: Called to synchronously read in the range that will
+ be written to. If this function is not provided, iomap will default to
+ submitting a bio read request.
+
These ``struct kiocb`` flags are significant for buffered I/O with iomap:
* ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c73048062cb1..b885267828d8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -670,7 +670,8 @@ static int iomap_read_folio_range(const struct iomap_iter *iter,
return submit_bio_wait(&bio);
}
-static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
+static int __iomap_write_begin(const struct iomap_iter *iter,
+ const struct iomap_write_ops *write_ops, size_t len,
struct folio *folio)
{
struct iomap_folio_state *ifs;
@@ -721,8 +722,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
if (iter->flags & IOMAP_NOWAIT)
return -EAGAIN;
- status = iomap_read_folio_range(iter, folio,
- block_start, plen);
+ if (write_ops && write_ops->read_folio_range)
+ status = write_ops->read_folio_range(iter,
+ folio, block_start, plen);
+ else
+ status = iomap_read_folio_range(iter,
+ folio, block_start, plen);
if (status)
return status;
}
@@ -838,7 +843,7 @@ static int iomap_write_begin(struct iomap_iter *iter,
else if (srcmap->flags & IOMAP_F_BUFFER_HEAD)
status = __block_write_begin_int(folio, pos, len, NULL, srcmap);
else
- status = __iomap_write_begin(iter, len, folio);
+ status = __iomap_write_begin(iter, write_ops, len, folio);
if (unlikely(status))
goto out_unlock;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 80f543cc4fe8..73dceabc21c8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -166,6 +166,16 @@ struct iomap_write_ops {
* locked by the iomap code.
*/
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+
+ /*
+ * Optional: allows the filesystem to provide a custom handler for
+ * reading in the contents of a folio; otherwise iomap will default to
+ * submitting a bio read request.
+ *
+ * The read must be done synchronously.
+ */
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
};
/*
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 14/14] iomap: build the writeback code without CONFIG_BLOCK
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (12 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 13/14] iomap: add read_folio_range() handler for buffered writes Christoph Hellwig
@ 2025-07-08 13:51 ` Christoph Hellwig
2025-07-08 21:27 ` Joanne Koong
2025-07-08 21:30 ` refactor the iomap writeback code v4 Joanne Koong
14 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-08 13:51 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Allow fuse to use the iomap writeback code even when CONFIG_BLOCK is
not enabled. Do this with an ifdef instead of a separate file to keep
the iomap_folio_state local to buffered-io.c.
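Note that for a filesystem built without CONFIG_BLOCK (such as fuse)
the generic folio read fallback degrades to a WARN/-EIO stub, so the
->read_folio_range handler from the previous patch becomes effectively
mandatory there. A minimal sketch, with a hypothetical
myfs_read_folio_range:

	static const struct iomap_write_ops myfs_write_ops = {
		/*
		 * Required without CONFIG_BLOCK: the generic fallback
		 * cannot issue bios and just returns -EIO.
		 */
		.read_folio_range	= myfs_read_folio_range,
	};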
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/iomap/Makefile | 6 +--
fs/iomap/buffered-io.c | 113 ++++++++++++++++++++++-------------------
2 files changed, 64 insertions(+), 55 deletions(-)
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index 69e8ebb41302..f7e1c8534c46 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,9 +9,9 @@ ccflags-y += -I $(src) # needed for trace events
obj-$(CONFIG_FS_IOMAP) += iomap.o
iomap-y += trace.o \
- iter.o
-iomap-$(CONFIG_BLOCK) += buffered-io.o \
- direct-io.o \
+ iter.o \
+ buffered-io.o
+iomap-$(CONFIG_BLOCK) += direct-io.o \
ioend.o \
fiemap.o \
seek.o
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b885267828d8..3a94a6b34aa9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -274,6 +274,46 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
*lenp = plen;
}
+static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
+ loff_t pos)
+{
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
+
+ return srcmap->type != IOMAP_MAPPED ||
+ (srcmap->flags & IOMAP_F_NEW) ||
+ pos >= i_size_read(iter->inode);
+}
+
+/**
+ * iomap_read_inline_data - copy inline data into the page cache
+ * @iter: iteration structure
+ * @folio: folio to copy to
+ *
+ * Copy the inline data in @iter into @folio and zero out the rest of the folio.
+ * Only a single IOMAP_INLINE extent is allowed at the end of each file.
+ * Returns zero for success to complete the read, or the usual negative errno.
+ */
+static int iomap_read_inline_data(const struct iomap_iter *iter,
+ struct folio *folio)
+{
+ const struct iomap *iomap = iomap_iter_srcmap(iter);
+ size_t size = i_size_read(iter->inode) - iomap->offset;
+ size_t offset = offset_in_folio(folio, iomap->offset);
+
+ if (folio_test_uptodate(folio))
+ return 0;
+
+ if (WARN_ON_ONCE(size > iomap->length))
+ return -EIO;
+ if (offset > 0)
+ ifs_alloc(iter->inode, folio, iter->flags);
+
+ folio_fill_tail(folio, offset, iomap->inline_data, size);
+ iomap_set_range_uptodate(folio, offset, folio_size(folio) - offset);
+ return 0;
+}
+
+#ifdef CONFIG_BLOCK
static void iomap_finish_folio_read(struct folio *folio, size_t off,
size_t len, int error)
{
@@ -313,45 +353,6 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
-/**
- * iomap_read_inline_data - copy inline data into the page cache
- * @iter: iteration structure
- * @folio: folio to copy to
- *
- * Copy the inline data in @iter into @folio and zero out the rest of the folio.
- * Only a single IOMAP_INLINE extent is allowed at the end of each file.
- * Returns zero for success to complete the read, or the usual negative errno.
- */
-static int iomap_read_inline_data(const struct iomap_iter *iter,
- struct folio *folio)
-{
- const struct iomap *iomap = iomap_iter_srcmap(iter);
- size_t size = i_size_read(iter->inode) - iomap->offset;
- size_t offset = offset_in_folio(folio, iomap->offset);
-
- if (folio_test_uptodate(folio))
- return 0;
-
- if (WARN_ON_ONCE(size > iomap->length))
- return -EIO;
- if (offset > 0)
- ifs_alloc(iter->inode, folio, iter->flags);
-
- folio_fill_tail(folio, offset, iomap->inline_data, size);
- iomap_set_range_uptodate(folio, offset, folio_size(folio) - offset);
- return 0;
-}
-
-static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
- loff_t pos)
-{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
-
- return srcmap->type != IOMAP_MAPPED ||
- (srcmap->flags & IOMAP_F_NEW) ||
- pos >= i_size_read(iter->inode);
-}
-
static int iomap_readpage_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
@@ -544,6 +545,27 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
}
EXPORT_SYMBOL_GPL(iomap_readahead);
+static int iomap_read_folio_range(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
+{
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
+ struct bio_vec bvec;
+ struct bio bio;
+
+ bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
+ bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
+ bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
+ return submit_bio_wait(&bio);
+}
+#else
+static int iomap_read_folio_range(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
+{
+ WARN_ON_ONCE(1);
+ return -EIO;
+}
+#endif /* CONFIG_BLOCK */
+
/*
* iomap_is_partially_uptodate checks whether blocks within a folio are
* uptodate or not.
@@ -657,19 +679,6 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
pos + len - 1);
}
-static int iomap_read_folio_range(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
- struct bio_vec bvec;
- struct bio bio;
-
- bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
- bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
- bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
- return submit_bio_wait(&bio);
-}
-
static int __iomap_write_begin(const struct iomap_iter *iter,
const struct iomap_write_ops *write_ops, size_t len,
struct folio *folio)
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync
2025-07-08 13:51 ` [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync Christoph Hellwig
@ 2025-07-08 19:40 ` Darrick J. Wong
2025-07-08 21:23 ` Joanne Koong
1 sibling, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2025-07-08 19:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
On Tue, Jul 08, 2025 at 03:51:18PM +0200, Christoph Hellwig wrote:
> Pass the iomap_iter and derive the map inside iomap_read_folio_sync
> instead of in the caller, and use the more descriptive srcmap name for
> the source iomap. Stop passing the offset into folio argument as it
> can be derived from the folio and the file offset. Rename the
> variables for the offset into the file and the length to be more
> descriptive and match the rest of the code.
>
> Rename the function itself to iomap_read_folio_range to make its use
> clearer.
>
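For illustration, the call at the sole call site goes from

	status = iomap_read_folio_sync(block_start, folio,
			poff, plen, srcmap);

to

	status = iomap_read_folio_range(iter, folio,
			block_start, plen);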
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Much clearer, thank you!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index b04c00dd6768..c73048062cb1 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -657,22 +657,22 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
> pos + len - 1);
> }
>
> -static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
> - size_t poff, size_t plen, const struct iomap *iomap)
> +static int iomap_read_folio_range(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos, size_t len)
> {
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
> struct bio_vec bvec;
> struct bio bio;
>
> - bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
> - bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
> - bio_add_folio_nofail(&bio, folio, plen, poff);
> + bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
> + bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
> + bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
> return submit_bio_wait(&bio);
> }
>
> static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
> struct folio *folio)
> {
> - const struct iomap *srcmap = iomap_iter_srcmap(iter);
> struct iomap_folio_state *ifs;
> loff_t pos = iter->pos;
> loff_t block_size = i_blocksize(iter->inode);
> @@ -721,8 +721,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
> if (iter->flags & IOMAP_NOWAIT)
> return -EAGAIN;
>
> - status = iomap_read_folio_sync(block_start, folio,
> - poff, plen, srcmap);
> + status = iomap_read_folio_range(iter, folio,
> + block_start, plen);
> if (status)
> return status;
> }
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 05/14] iomap: hide ioends from the generic writeback code
2025-07-08 13:51 ` [PATCH 05/14] iomap: hide ioends from the generic writeback code Christoph Hellwig
@ 2025-07-08 19:42 ` Darrick J. Wong
0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2025-07-08 19:42 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Damien Le Moal
On Tue, Jul 08, 2025 at 03:51:11PM +0200, Christoph Hellwig wrote:
> Replace the ioend pointer in iomap_writepage_ctx with a void *wb_ctx
> one to facilitate non-block, non-ioend writeback for fuse. Rename
> the submit_ioend method to writeback_submit and make it mandatory so
> that the generic writeback code stops seeing ioends and bios.
>
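As a sketch of the resulting wiring, a block-based filesystem now
supplies both hooks and can simply reuse the exported ioend helper
(the myfs_* names are illustrative):

	static const struct iomap_writeback_ops myfs_writeback_ops = {
		.writeback_range	= myfs_writeback_range,
		/* now mandatory; ioend/bio knowledge lives behind it */
		.writeback_submit	= iomap_ioend_writeback_submit,
	};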
> Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Acked-by: Damien Le Moal <dlemoal@kernel.org>
Looks good,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> .../filesystems/iomap/operations.rst | 17 ++--
> block/fops.c | 1 +
> fs/gfs2/bmap.c | 1 +
> fs/iomap/buffered-io.c | 91 ++++++++++---------
> fs/xfs/xfs_aops.c | 60 ++++++------
> fs/zonefs/file.c | 1 +
> include/linux/iomap.h | 19 ++--
> 7 files changed, 100 insertions(+), 90 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index f07c8fdb2046..4b93c5f7841a 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -284,8 +284,8 @@ The ``ops`` structure must be specified and is as follows:
>
> struct iomap_writeback_ops {
> int (*writeback_range)(struct iomap_writepage_ctx *wpc,
> - struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
> - int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
> + struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
> + int (*writeback_submit)(struct iomap_writepage_ctx *wpc, int error);
> };
>
> The fields are as follows:
> @@ -316,13 +316,15 @@ The fields are as follows:
> clean pagecache.
> This function must be supplied by the filesystem.
>
> - - ``submit_ioend``: Allows the file systems to hook into writeback bio
> - submission.
> + - ``writeback_submit``: Submit the previously built writeback context.
> + Block based file systems should use the iomap_ioend_writeback_submit
> + helper, other file systems can implement their own.
> + File systems can optionally hook into writeback bio submission.
> This might include pre-write space accounting updates, or installing
> a custom ``->bi_end_io`` function for internal purposes, such as
> deferring the ioend completion to a workqueue to run metadata update
> transactions from process context before submitting the bio.
> - This function is optional.
> + This function must be supplied by the filesystem.
>
> Pagecache Writeback Completion
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> @@ -336,10 +338,9 @@ If the write failed, it will also set the error bits on the folios and
> the address space.
> This can happen in interrupt or process context, depending on the
> storage device.
> -
> Filesystems that need to update internal bookkeeping (e.g. unwritten
> -extent conversions) should provide a ``->submit_ioend`` function to
> -set ``struct iomap_end::bio::bi_end_io`` to its own function.
> +extent conversions) should set their own bi_end_io on the bios
> +submitted by ``->writeback_submit``.
> This function should call ``iomap_finish_ioends`` after finishing its
> own work (e.g. unwritten extent conversion).
>
> diff --git a/block/fops.c b/block/fops.c
> index b500ff8f55dd..0845737c0320 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -560,6 +560,7 @@ static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
>
> static const struct iomap_writeback_ops blkdev_writeback_ops = {
> .writeback_range = blkdev_writeback_range,
> + .writeback_submit = iomap_ioend_writeback_submit,
> };
>
> static int blkdev_writepages(struct address_space *mapping,
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 0cc41de54aba..86045d3577b7 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -2490,4 +2490,5 @@ static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
>
> const struct iomap_writeback_ops gfs2_writeback_ops = {
> .writeback_range = gfs2_writeback_range,
> + .writeback_submit = iomap_ioend_writeback_submit,
> };
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7d9cd05c36bb..eccb9109467a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1569,7 +1569,7 @@ u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
> return folio_count;
> }
>
> -static void iomap_writepage_end_bio(struct bio *bio)
> +static void ioend_writeback_end_bio(struct bio *bio)
> {
> struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
>
> @@ -1578,42 +1578,30 @@ static void iomap_writepage_end_bio(struct bio *bio)
> }
>
> /*
> - * Submit an ioend.
> - *
> - * If @error is non-zero, it means that we have a situation where some part of
> - * the submission process has failed after we've marked pages for writeback.
> - * We cannot cancel ioend directly in that case, so call the bio end I/O handler
> - * with the error status here to run the normal I/O completion handler to clear
> - * the writeback bit and let the file system proess the errors.
> + * We cannot cancel the ioend directly in case of an error, so call the bio end
> + * I/O handler with the error status here to run the normal I/O completion
> + * handler.
> */
> -static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
> +int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error)
> {
> - if (!wpc->ioend)
> - return error;
> + struct iomap_ioend *ioend = wpc->wb_ctx;
>
> - /*
> - * Let the file systems prepare the I/O submission and hook in an I/O
> - * comletion handler. This also needs to happen in case after a
> - * failure happened so that the file system end I/O handler gets called
> - * to clean up.
> - */
> - if (wpc->ops->submit_ioend) {
> - error = wpc->ops->submit_ioend(wpc, error);
> - } else {
> - if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
> - error = -EIO;
> - if (!error)
> - submit_bio(&wpc->ioend->io_bio);
> - }
> + if (!ioend->io_bio.bi_end_io)
> + ioend->io_bio.bi_end_io = ioend_writeback_end_bio;
> +
> + if (WARN_ON_ONCE(wpc->iomap.flags & IOMAP_F_ANON_WRITE))
> + error = -EIO;
>
> if (error) {
> - wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
> - bio_endio(&wpc->ioend->io_bio);
> + ioend->io_bio.bi_status = errno_to_blk_status(error);
> + bio_endio(&ioend->io_bio);
> + return error;
> }
>
> - wpc->ioend = NULL;
> - return error;
> + submit_bio(&ioend->io_bio);
> + return 0;
> }
> +EXPORT_SYMBOL_GPL(iomap_ioend_writeback_submit);
>
> static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> loff_t pos, u16 ioend_flags)
> @@ -1624,7 +1612,6 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc),
> GFP_NOFS, &iomap_ioend_bioset);
> bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
> - bio->bi_end_io = iomap_writepage_end_bio;
> bio->bi_write_hint = wpc->inode->i_write_hint;
> wbc_init_bio(wpc->wbc, bio);
> wpc->nr_folios = 0;
> @@ -1634,16 +1621,17 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
> u16 ioend_flags)
> {
> + struct iomap_ioend *ioend = wpc->wb_ctx;
> +
> if (ioend_flags & IOMAP_IOEND_BOUNDARY)
> return false;
> if ((ioend_flags & IOMAP_IOEND_NOMERGE_FLAGS) !=
> - (wpc->ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
> + (ioend->io_flags & IOMAP_IOEND_NOMERGE_FLAGS))
> return false;
> - if (pos != wpc->ioend->io_offset + wpc->ioend->io_size)
> + if (pos != ioend->io_offset + ioend->io_size)
> return false;
> if (!(wpc->iomap.flags & IOMAP_F_ANON_WRITE) &&
> - iomap_sector(&wpc->iomap, pos) !=
> - bio_end_sector(&wpc->ioend->io_bio))
> + iomap_sector(&wpc->iomap, pos) != bio_end_sector(&ioend->io_bio))
> return false;
> /*
> * Limit ioend bio chain lengths to minimise IO completion latency. This
> @@ -1669,6 +1657,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
> ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> loff_t pos, loff_t end_pos, unsigned int dirty_len)
> {
> + struct iomap_ioend *ioend = wpc->wb_ctx;
> struct iomap_folio_state *ifs = folio->private;
> size_t poff = offset_in_folio(folio, pos);
> unsigned int ioend_flags = 0;
> @@ -1699,15 +1688,17 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
> ioend_flags |= IOMAP_IOEND_BOUNDARY;
>
> - if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
> + if (!ioend || !iomap_can_add_to_ioend(wpc, pos, ioend_flags)) {
> new_ioend:
> - error = iomap_submit_ioend(wpc, 0);
> - if (error)
> - return error;
> - wpc->ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
> + if (ioend) {
> + error = wpc->ops->writeback_submit(wpc, 0);
> + if (error)
> + return error;
> + }
> + wpc->wb_ctx = ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
> }
>
> - if (!bio_add_folio(&wpc->ioend->io_bio, folio, map_len, poff))
> + if (!bio_add_folio(&ioend->io_bio, folio, map_len, poff))
> goto new_ioend;
>
> if (ifs)
> @@ -1754,9 +1745,9 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> * Note that this defeats the ability to chain the ioends of
> * appending writes.
> */
> - wpc->ioend->io_size += map_len;
> - if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
> - wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
> + ioend->io_size += map_len;
> + if (ioend->io_offset + ioend->io_size > end_pos)
> + ioend->io_size = end_pos - ioend->io_offset;
>
> wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
> return map_len;
> @@ -1951,6 +1942,18 @@ iomap_writepages(struct iomap_writepage_ctx *wpc)
>
> while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error)))
> error = iomap_writepage_map(wpc, folio);
> - return iomap_submit_ioend(wpc, error);
> +
> + /*
> + * If @error is non-zero, it means that we have a situation where some
> + * part of the submission process has failed after we've marked pages
> + * for writeback.
> + *
> + * We cannot cancel the writeback directly in that case, so always call
> + * ->writeback_submit to run the I/O completion handler to clear the
> + * writeback bit and let the file system process the errors.
> + */
> + if (wpc->wb_ctx)
> + return wpc->ops->writeback_submit(wpc, error);
> + return error;
> }
> EXPORT_SYMBOL_GPL(iomap_writepages);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index f6d44ab78442..1ee4f835ac3c 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -514,41 +514,40 @@ xfs_ioend_needs_wq_completion(
> }
>
> static int
> -xfs_submit_ioend(
> - struct iomap_writepage_ctx *wpc,
> - int status)
> +xfs_writeback_submit(
> + struct iomap_writepage_ctx *wpc,
> + int error)
> {
> - struct iomap_ioend *ioend = wpc->ioend;
> - unsigned int nofs_flag;
> + struct iomap_ioend *ioend = wpc->wb_ctx;
>
> /*
> - * We can allocate memory here while doing writeback on behalf of
> - * memory reclaim. To avoid memory allocation deadlocks set the
> - * task-wide nofs context for the following operations.
> + * Convert CoW extents to regular.
> + *
> + * We can allocate memory here while doing writeback on behalf of memory
> + * reclaim. To avoid memory allocation deadlocks, set the task-wide
> + * nofs context.
> */
> - nofs_flag = memalloc_nofs_save();
> + if (!error && (ioend->io_flags & IOMAP_IOEND_SHARED)) {
> + unsigned int nofs_flag;
>
> - /* Convert CoW extents to regular */
> - if (!status && (ioend->io_flags & IOMAP_IOEND_SHARED)) {
> - status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> + nofs_flag = memalloc_nofs_save();
> + error = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> ioend->io_offset, ioend->io_size);
> + memalloc_nofs_restore(nofs_flag);
> }
>
> - memalloc_nofs_restore(nofs_flag);
> -
> - /* send ioends that might require a transaction to the completion wq */
> + /*
> + * Send ioends that might require a transaction to the completion wq.
> + */
> if (xfs_ioend_needs_wq_completion(ioend))
> ioend->io_bio.bi_end_io = xfs_end_bio;
>
> - if (status)
> - return status;
> - submit_bio(&ioend->io_bio);
> - return 0;
> + return iomap_ioend_writeback_submit(wpc, error);
> }
>
> static const struct iomap_writeback_ops xfs_writeback_ops = {
> .writeback_range = xfs_writeback_range,
> - .submit_ioend = xfs_submit_ioend,
> + .writeback_submit = xfs_writeback_submit,
> };
>
> struct xfs_zoned_writepage_ctx {
> @@ -646,20 +645,25 @@ xfs_zoned_writeback_range(
> }
>
> static int
> -xfs_zoned_submit_ioend(
> - struct iomap_writepage_ctx *wpc,
> - int status)
> +xfs_zoned_writeback_submit(
> + struct iomap_writepage_ctx *wpc,
> + int error)
> {
> - wpc->ioend->io_bio.bi_end_io = xfs_end_bio;
> - if (status)
> - return status;
> - xfs_zone_alloc_and_submit(wpc->ioend, &XFS_ZWPC(wpc)->open_zone);
> + struct iomap_ioend *ioend = wpc->wb_ctx;
> +
> + ioend->io_bio.bi_end_io = xfs_end_bio;
> + if (error) {
> + ioend->io_bio.bi_status = errno_to_blk_status(error);
> + bio_endio(&ioend->io_bio);
> + return error;
> + }
> + xfs_zone_alloc_and_submit(ioend, &XFS_ZWPC(wpc)->open_zone);
> return 0;
> }
>
> static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
> .writeback_range = xfs_zoned_writeback_range,
> - .submit_ioend = xfs_zoned_submit_ioend,
> + .writeback_submit = xfs_zoned_writeback_submit,
> };
>
> STATIC int
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index c88e2c851753..fee9403ad49b 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -151,6 +151,7 @@ static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
>
> static const struct iomap_writeback_ops zonefs_writeback_ops = {
> .writeback_range = zonefs_writeback_range,
> + .writeback_submit = iomap_ioend_writeback_submit,
> };
>
> static int zonefs_writepages(struct address_space *mapping,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 625d7911a2b5..9f32dd8dc075 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -391,8 +391,7 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
> /*
> * Structure for writeback I/O completions.
> *
> - * File systems implementing ->submit_ioend (for buffered I/O) or ->submit_io
> - * for direct I/O) can split a bio generated by iomap. In that case the parent
> + * File systems can split a bio generated by iomap. In that case the parent
> * ioend it was split from is recorded in ioend->io_parent.
> */
> struct iomap_ioend {
> @@ -416,7 +415,7 @@ static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
>
> struct iomap_writeback_ops {
> /*
> - * Required, performs writeback on the passed in range
> + * Performs writeback on the passed in range
> *
> * Can map arbitrarily large regions, but we need to call into it at
> * least once per folio to allow the file systems to synchronize with
> @@ -432,23 +431,22 @@ struct iomap_writeback_ops {
> u64 end_pos);
>
> /*
> - * Optional, allows the file systems to hook into bio submission,
> - * including overriding the bi_end_io handler.
> + * Submit a writeback context previously built up by ->writeback_range.
> *
> - * Returns 0 if the bio was successfully submitted, or a negative
> - * error code if status was non-zero or another error happened and
> - * the bio could not be submitted.
> + * Returns 0 if the context was successfully submitted, or a negative
> + * error code if not. If @error is non-zero a failure occurred, and
> + * the writeback context should be completed with an error.
> */
> - int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
> + int (*writeback_submit)(struct iomap_writepage_ctx *wpc, int error);
> };
>
> struct iomap_writepage_ctx {
> struct iomap iomap;
> struct inode *inode;
> struct writeback_control *wbc;
> - struct iomap_ioend *ioend;
> const struct iomap_writeback_ops *ops;
> u32 nr_folios; /* folios added to the ioend */
> + void *wb_ctx; /* pending writeback context */
> };
>
> struct iomap_ioend *iomap_init_ioend(struct inode *inode, struct bio *bio,
> @@ -461,6 +459,7 @@ void iomap_ioend_try_merge(struct iomap_ioend *ioend,
> void iomap_sort_ioends(struct list_head *ioend_list);
> ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> loff_t pos, loff_t end_pos, unsigned int dirty_len);
> +int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
>
> int iomap_writepages(struct iomap_writepage_ctx *wpc);
>
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 04/14] iomap: refactor the writeback interface
2025-07-08 13:51 ` [PATCH 04/14] iomap: refactor the writeback interface Christoph Hellwig
@ 2025-07-08 19:44 ` Darrick J. Wong
0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2025-07-08 19:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Damien Le Moal
On Tue, Jul 08, 2025 at 03:51:10PM +0200, Christoph Hellwig wrote:
> Replace ->map_blocks with a new ->writeback_range, which differs in the
> following ways:
>
> - it must also queue up the I/O for writeback, that is, it calls into
> the slightly refactored and extended-in-scope iomap_add_to_ioend for
> each region
> - it can handle only a part of the requested region, that is, the
> retry loop for partial mappings moves to the caller
> - it handles cleanup on failures as well, and thus also replaces the
> discard_folio method only implemented by XFS.
>
> This will allow the iomap writeback code to also be used by file
> systems that are not block based, like fuse.
>
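Sketched with hypothetical myfs_* helpers (mirroring the block and gfs2
conversions below), the common ->writeback_range pattern is to
revalidate or refresh the cached mapping and then queue the range:

	static ssize_t myfs_writeback_range(struct iomap_writepage_ctx *wpc,
			struct folio *folio, u64 pos, unsigned int len,
			u64 end_pos)
	{
		/* reuse the cached mapping if it still covers pos */
		if (pos < wpc->iomap.offset ||
		    pos >= wpc->iomap.offset + wpc->iomap.length) {
			int error = myfs_map_range(wpc, pos, len);

			if (error)
				return error;
		}

		/* queue as much of the range as the mapping covers */
		return iomap_add_to_ioend(wpc, folio, pos, end_pos, len);
	}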
> Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Acked-by: Damien Le Moal <dlemoal@kernel.org> # zonefs
Looks good to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> .../filesystems/iomap/operations.rst | 32 ++---
> block/fops.c | 25 ++--
> fs/gfs2/bmap.c | 26 ++--
> fs/iomap/buffered-io.c | 96 ++++++-------
> fs/iomap/trace.h | 2 +-
> fs/xfs/xfs_aops.c | 128 +++++++++++-------
> fs/zonefs/file.c | 28 ++--
> include/linux/iomap.h | 21 ++-
> 8 files changed, 197 insertions(+), 161 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 3b628e370d88..f07c8fdb2046 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -271,7 +271,7 @@ writeback.
> It does not lock ``i_rwsem`` or ``invalidate_lock``.
>
> The dirty bit will be cleared for all folios run through the
> -``->map_blocks`` machinery described below even if the writeback fails.
> +``->writeback_range`` machinery described below even if the writeback fails.
> This is to prevent dirty folio clots when storage devices fail; an
> ``-EIO`` is recorded for userspace to collect via ``fsync``.
>
> @@ -283,15 +283,14 @@ The ``ops`` structure must be specified and is as follows:
> .. code-block:: c
>
> struct iomap_writeback_ops {
> - int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
> - loff_t offset, unsigned len);
> - int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
> - void (*discard_folio)(struct folio *folio, loff_t pos);
> + int (*writeback_range)(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
> + int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
> };
>
> The fields are as follows:
>
> - - ``map_blocks``: Sets ``wpc->iomap`` to the space mapping of the file
> + - ``writeback_range``: Sets ``wpc->iomap`` to the space mapping of the file
> range (in bytes) given by ``offset`` and ``len``.
> iomap calls this function for each dirty fs block in each dirty folio,
> though it will `reuse mappings
> @@ -306,6 +305,15 @@ The fields are as follows:
> This revalidation must be open-coded by the filesystem; it is
> unclear if ``iomap::validity_cookie`` can be reused for this
> purpose.
> +
> + If this method fails to schedule I/O for any part of a dirty folio, it
> + should throw away any reservations that may have been made for the write.
> + The folio will be marked clean and an ``-EIO`` recorded in the
> + pagecache.
> + Filesystems can use this callback to `remove
> + <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_
> + delalloc reservations to avoid having delalloc reservations for
> + clean pagecache.
> This function must be supplied by the filesystem.
>
> - ``submit_ioend``: Allows the file systems to hook into writeback bio
> @@ -316,18 +324,6 @@ The fields are as follows:
> transactions from process context before submitting the bio.
> This function is optional.
>
> - - ``discard_folio``: iomap calls this function after ``->map_blocks``
> - fails to schedule I/O for any part of a dirty folio.
> - The function should throw away any reservations that may have been
> - made for the write.
> - The folio will be marked clean and an ``-EIO`` recorded in the
> - pagecache.
> - Filesystems can use this callback to `remove
> - <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_
> - delalloc reservations to avoid having delalloc reservations for
> - clean pagecache.
> - This function is optional.
> -
> Pagecache Writeback Completion
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/block/fops.c b/block/fops.c
> index 3394263d942b..b500ff8f55dd 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -537,22 +537,29 @@ static void blkdev_readahead(struct readahead_control *rac)
> iomap_readahead(rac, &blkdev_iomap_ops);
> }
>
> -static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc,
> - struct inode *inode, loff_t offset, unsigned int len)
> +static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
> {
> - loff_t isize = i_size_read(inode);
> + loff_t isize = i_size_read(wpc->inode);
>
> if (WARN_ON_ONCE(offset >= isize))
> return -EIO;
> - if (offset >= wpc->iomap.offset &&
> - offset < wpc->iomap.offset + wpc->iomap.length)
> - return 0;
> - return blkdev_iomap_begin(inode, offset, isize - offset,
> - IOMAP_WRITE, &wpc->iomap, NULL);
> +
> + if (offset < wpc->iomap.offset ||
> + offset >= wpc->iomap.offset + wpc->iomap.length) {
> + int error;
> +
> + error = blkdev_iomap_begin(wpc->inode, offset, isize - offset,
> + IOMAP_WRITE, &wpc->iomap, NULL);
> + if (error)
> + return error;
> + }
> +
> + return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
> }
>
> static const struct iomap_writeback_ops blkdev_writeback_ops = {
> - .map_blocks = blkdev_map_blocks,
> + .writeback_range = blkdev_writeback_range,
> };
>
> static int blkdev_writepages(struct address_space *mapping,
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 7703d0471139..0cc41de54aba 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -2469,23 +2469,25 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
> return error;
> }
>
> -static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode,
> - loff_t offset, unsigned int len)
> +static ssize_t gfs2_writeback_range(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
> {
> - int ret;
> -
> - if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(inode))))
> + if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(wpc->inode))))
> return -EIO;
>
> - if (offset >= wpc->iomap.offset &&
> - offset < wpc->iomap.offset + wpc->iomap.length)
> - return 0;
> + if (offset < wpc->iomap.offset ||
> + offset >= wpc->iomap.offset + wpc->iomap.length) {
> + int ret;
>
> - memset(&wpc->iomap, 0, sizeof(wpc->iomap));
> - ret = gfs2_iomap_get(inode, offset, INT_MAX, &wpc->iomap);
> - return ret;
> + memset(&wpc->iomap, 0, sizeof(wpc->iomap));
> + ret = gfs2_iomap_get(wpc->inode, offset, INT_MAX, &wpc->iomap);
> + if (ret)
> + return ret;
> + }
> +
> + return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
> }
>
> const struct iomap_writeback_ops gfs2_writeback_ops = {
> - .map_blocks = gfs2_map_blocks,
> + .writeback_range = gfs2_writeback_range,
> };
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 372342bfffa3..7d9cd05c36bb 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1666,14 +1666,30 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
> * At the end of a writeback pass, there will be a cached ioend remaining on the
> * writepage context that the caller will need to submit.
> */
> -static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> - struct folio *folio, loff_t pos, loff_t end_pos, unsigned len)
> +ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> + loff_t pos, loff_t end_pos, unsigned int dirty_len)
> {
> struct iomap_folio_state *ifs = folio->private;
> size_t poff = offset_in_folio(folio, pos);
> unsigned int ioend_flags = 0;
> + unsigned int map_len = min_t(u64, dirty_len,
> + wpc->iomap.offset + wpc->iomap.length - pos);
> int error;
>
> + trace_iomap_add_to_ioend(wpc->inode, pos, dirty_len, &wpc->iomap);
> +
> + WARN_ON_ONCE(!folio->private && map_len < dirty_len);
> +
> + switch (wpc->iomap.type) {
> + case IOMAP_INLINE:
> + WARN_ON_ONCE(1);
> + return -EIO;
> + case IOMAP_HOLE:
> + return map_len;
> + default:
> + break;
> + }
> +
> if (wpc->iomap.type == IOMAP_UNWRITTEN)
> ioend_flags |= IOMAP_IOEND_UNWRITTEN;
> if (wpc->iomap.flags & IOMAP_F_SHARED)
> @@ -1691,11 +1707,11 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> wpc->ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
> }
>
> - if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
> + if (!bio_add_folio(&wpc->ioend->io_bio, folio, map_len, poff))
> goto new_ioend;
>
> if (ifs)
> - atomic_add(len, &ifs->write_bytes_pending);
> + atomic_add(map_len, &ifs->write_bytes_pending);
>
> /*
> * Clamp io_offset and io_size to the incore EOF so that ondisk
> @@ -1738,63 +1754,39 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> * Note that this defeats the ability to chain the ioends of
> * appending writes.
> */
> - wpc->ioend->io_size += len;
> + wpc->ioend->io_size += map_len;
> if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
> wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
>
> - wbc_account_cgroup_owner(wpc->wbc, folio, len);
> - return 0;
> + wbc_account_cgroup_owner(wpc->wbc, folio, map_len);
> + return map_len;
> }
> +EXPORT_SYMBOL_GPL(iomap_add_to_ioend);
>
> -static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
> - struct folio *folio, u64 pos, u64 end_pos, unsigned dirty_len,
> +static int iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 pos, u32 rlen, u64 end_pos,
> bool *wb_pending)
> {
> - int error;
> -
> do {
> - unsigned map_len;
> -
> - error = wpc->ops->map_blocks(wpc, wpc->inode, pos, dirty_len);
> - if (error)
> - break;
> - trace_iomap_writepage_map(wpc->inode, pos, dirty_len,
> - &wpc->iomap);
> + ssize_t ret;
>
> - map_len = min_t(u64, dirty_len,
> - wpc->iomap.offset + wpc->iomap.length - pos);
> - WARN_ON_ONCE(!folio->private && map_len < dirty_len);
> + ret = wpc->ops->writeback_range(wpc, folio, pos, rlen, end_pos);
> + if (WARN_ON_ONCE(ret == 0 || ret > rlen))
> + return -EIO;
> + if (ret < 0)
> + return ret;
> + rlen -= ret;
> + pos += ret;
>
> - switch (wpc->iomap.type) {
> - case IOMAP_INLINE:
> - WARN_ON_ONCE(1);
> - error = -EIO;
> - break;
> - case IOMAP_HOLE:
> - break;
> - default:
> - error = iomap_add_to_ioend(wpc, folio, pos, end_pos,
> - map_len);
> - if (!error)
> - *wb_pending = true;
> - break;
> - }
> - dirty_len -= map_len;
> - pos += map_len;
> - } while (dirty_len && !error);
> + /*
> + * Holes are not written back by ->writeback_range, so track
> + * if we did handle anything that is not a hole here.
> + */
> + if (wpc->iomap.type != IOMAP_HOLE)
> + *wb_pending = true;
> + } while (rlen);
>
> - /*
> - * We cannot cancel the ioend directly here on error. We may have
> - * already set other pages under writeback and hence we have to run I/O
> - * completion to mark the error state of the pages under writeback
> - * appropriately.
> - *
> - * Just let the file system know what portion of the folio failed to
> - * map.
> - */
> - if (error && wpc->ops->discard_folio)
> - wpc->ops->discard_folio(folio, pos);
> - return error;
> + return 0;
> }
>
> /*
> @@ -1906,8 +1898,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> */
> end_aligned = round_up(end_pos, i_blocksize(inode));
> while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
> - error = iomap_writepage_map_blocks(wpc, folio, pos, end_pos,
> - rlen, &wb_pending);
> + error = iomap_writeback_range(wpc, folio, pos, rlen, end_pos,
> + &wb_pending);
> if (error)
> break;
> pos += rlen;
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 455cc6f90be0..aaea02c9560a 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -169,7 +169,7 @@ DEFINE_EVENT(iomap_class, name, \
> DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
> DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
>
> -TRACE_EVENT(iomap_writepage_map,
> +TRACE_EVENT(iomap_add_to_ioend,
> TP_PROTO(struct inode *inode, u64 pos, unsigned int dirty_len,
> struct iomap *iomap),
> TP_ARGS(inode, pos, dirty_len, iomap),
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 65485a52df3b..f6d44ab78442 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -233,6 +233,47 @@ xfs_end_bio(
> spin_unlock_irqrestore(&ip->i_ioend_lock, flags);
> }
>
> +/*
> + * We cannot cancel the ioend directly on error. We may have already set other
> + * pages under writeback and hence we have to run I/O completion to mark the
> + * error state of the pages under writeback appropriately.
> + *
> + * If the folio has delalloc blocks on it, the caller is asking us to punch them
> + * out. If we don't, we can leave a stale delalloc mapping covered by a clean
> + * page that needs to be dirtied again before the delalloc mapping can be
> + * converted. This stale delalloc mapping can trip up a later direct I/O read
> + * operation on the same region.
> + *
> + * We prevent this by truncating away the delalloc regions on the folio. Because
> + * they are delalloc, we can do this without needing a transaction. Indeed - if
> + * we get ENOSPC errors, we have to be able to do this truncation without a
> + * transaction as there is no space left for block reservation (typically why
> + * we see a ENOSPC in writeback).
> + */
> +static void
> +xfs_discard_folio(
> + struct folio *folio,
> + loff_t pos)
> +{
> + struct xfs_inode *ip = XFS_I(folio->mapping->host);
> + struct xfs_mount *mp = ip->i_mount;
> +
> + if (xfs_is_shutdown(mp))
> + return;
> +
> + xfs_alert_ratelimited(mp,
> + "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> + folio, ip->i_ino, pos);
> +
> + /*
> + * The end of the punch range is always the offset of the first
> + * byte of the next folio. Hence the end offset is only dependent on the
> + * folio itself and not the start offset that is passed in.
> + */
> + xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
> + folio_pos(folio) + folio_size(folio), NULL);
> +}
> +
> /*
> * Fast revalidation of the cached writeback mapping. Return true if the current
> * mapping is valid, false otherwise.
> @@ -278,13 +319,12 @@ xfs_imap_valid(
> static int
> xfs_map_blocks(
> struct iomap_writepage_ctx *wpc,
> - struct inode *inode,
> loff_t offset,
> unsigned int len)
> {
> - struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_inode *ip = XFS_I(wpc->inode);
> struct xfs_mount *mp = ip->i_mount;
> - ssize_t count = i_blocksize(inode);
> + ssize_t count = i_blocksize(wpc->inode);
> xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
> xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count);
> xfs_fileoff_t cow_fsb;
> @@ -436,6 +476,24 @@ xfs_map_blocks(
> return 0;
> }
>
> +static ssize_t
> +xfs_writeback_range(
> + struct iomap_writepage_ctx *wpc,
> + struct folio *folio,
> + u64 offset,
> + unsigned int len,
> + u64 end_pos)
> +{
> + ssize_t ret;
> +
> + ret = xfs_map_blocks(wpc, offset, len);
> + if (!ret)
> + ret = iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
> + if (ret < 0)
> + xfs_discard_folio(folio, offset);
> + return ret;
> +}
> +
> static bool
> xfs_ioend_needs_wq_completion(
> struct iomap_ioend *ioend)
> @@ -488,47 +546,9 @@ xfs_submit_ioend(
> return 0;
> }
>
> -/*
> - * If the folio has delalloc blocks on it, the caller is asking us to punch them
> - * out. If we don't, we can leave a stale delalloc mapping covered by a clean
> - * page that needs to be dirtied again before the delalloc mapping can be
> - * converted. This stale delalloc mapping can trip up a later direct I/O read
> - * operation on the same region.
> - *
> - * We prevent this by truncating away the delalloc regions on the folio. Because
> - * they are delalloc, we can do this without needing a transaction. Indeed - if
> - * we get ENOSPC errors, we have to be able to do this truncation without a
> - * transaction as there is no space left for block reservation (typically why
> - * we see a ENOSPC in writeback).
> - */
> -static void
> -xfs_discard_folio(
> - struct folio *folio,
> - loff_t pos)
> -{
> - struct xfs_inode *ip = XFS_I(folio->mapping->host);
> - struct xfs_mount *mp = ip->i_mount;
> -
> - if (xfs_is_shutdown(mp))
> - return;
> -
> - xfs_alert_ratelimited(mp,
> - "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> - folio, ip->i_ino, pos);
> -
> - /*
> - * The end of the punch range is always the offset of the first
> - * byte of the next folio. Hence the end offset is only dependent on the
> - * folio itself and not the start offset that is passed in.
> - */
> - xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
> - folio_pos(folio) + folio_size(folio), NULL);
> -}
> -
> static const struct iomap_writeback_ops xfs_writeback_ops = {
> - .map_blocks = xfs_map_blocks,
> + .writeback_range = xfs_writeback_range,
> .submit_ioend = xfs_submit_ioend,
> - .discard_folio = xfs_discard_folio,
> };
>
> struct xfs_zoned_writepage_ctx {
> @@ -545,11 +565,10 @@ XFS_ZWPC(struct iomap_writepage_ctx *ctx)
> static int
> xfs_zoned_map_blocks(
> struct iomap_writepage_ctx *wpc,
> - struct inode *inode,
> loff_t offset,
> unsigned int len)
> {
> - struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_inode *ip = XFS_I(wpc->inode);
> struct xfs_mount *mp = ip->i_mount;
> xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
> xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + len);
> @@ -608,6 +627,24 @@ xfs_zoned_map_blocks(
> return 0;
> }
>
> +static ssize_t
> +xfs_zoned_writeback_range(
> + struct iomap_writepage_ctx *wpc,
> + struct folio *folio,
> + u64 offset,
> + unsigned int len,
> + u64 end_pos)
> +{
> + ssize_t ret;
> +
> + ret = xfs_zoned_map_blocks(wpc, offset, len);
> + if (!ret)
> + ret = iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
> + if (ret < 0)
> + xfs_discard_folio(folio, offset);
> + return ret;
> +}
> +
> static int
> xfs_zoned_submit_ioend(
> struct iomap_writepage_ctx *wpc,
> @@ -621,9 +658,8 @@ xfs_zoned_submit_ioend(
> }
>
> static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
> - .map_blocks = xfs_zoned_map_blocks,
> + .writeback_range = xfs_zoned_writeback_range,
> .submit_ioend = xfs_zoned_submit_ioend,
> - .discard_folio = xfs_discard_folio,
> };
>
> STATIC int
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index edca4bbe4b72..c88e2c851753 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -124,29 +124,33 @@ static void zonefs_readahead(struct readahead_control *rac)
> * Map blocks for page writeback. This is used only on conventional zone files,
> * which implies that the page range can only be within the fixed inode size.
> */
> -static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc,
> - struct inode *inode, loff_t offset,
> - unsigned int len)
> +static ssize_t zonefs_writeback_range(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 offset, unsigned len, u64 end_pos)
> {
> - struct zonefs_zone *z = zonefs_inode_zone(inode);
> + struct zonefs_zone *z = zonefs_inode_zone(wpc->inode);
>
> if (WARN_ON_ONCE(zonefs_zone_is_seq(z)))
> return -EIO;
> - if (WARN_ON_ONCE(offset >= i_size_read(inode)))
> + if (WARN_ON_ONCE(offset >= i_size_read(wpc->inode)))
> return -EIO;
>
> /* If the mapping is already OK, nothing needs to be done */
> - if (offset >= wpc->iomap.offset &&
> - offset < wpc->iomap.offset + wpc->iomap.length)
> - return 0;
> + if (offset < wpc->iomap.offset ||
> + offset >= wpc->iomap.offset + wpc->iomap.length) {
> + int error;
> +
> + error = zonefs_write_iomap_begin(wpc->inode, offset,
> + z->z_capacity - offset, IOMAP_WRITE,
> + &wpc->iomap, NULL);
> + if (error)
> + return error;
> + }
>
> - return zonefs_write_iomap_begin(inode, offset,
> - z->z_capacity - offset,
> - IOMAP_WRITE, &wpc->iomap, NULL);
> + return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
> }
>
> static const struct iomap_writeback_ops zonefs_writeback_ops = {
> - .map_blocks = zonefs_write_map_blocks,
> + .writeback_range = zonefs_writeback_range,
> };
>
> static int zonefs_writepages(struct address_space *mapping,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 00179c9387c5..625d7911a2b5 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -416,18 +416,20 @@ static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
>
> struct iomap_writeback_ops {
> /*
> - * Required, maps the blocks so that writeback can be performed on
> - * the range starting at offset.
> + * Required, performs writeback on the passed in range
> *
> - * Can return arbitrarily large regions, but we need to call into it at
> + * Can map arbitrarily large regions, but we need to call into it at
> * least once per folio to allow the file systems to synchronize with
> * the write path that could be invalidating mappings.
> *
> * An existing mapping from a previous call to this method can be reused
> * by the file system if it is still valid.
> + *
> + * Returns the number of bytes processed or a negative errno.
> */
> - int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
> - loff_t offset, unsigned len);
> + ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 pos, unsigned int len,
> + u64 end_pos);
>
> /*
> * Optional, allows the file systems to hook into bio submission,
> @@ -438,12 +440,6 @@ struct iomap_writeback_ops {
> * the bio could not be submitted.
> */
> int (*submit_ioend)(struct iomap_writepage_ctx *wpc, int status);
> -
> - /*
> - * Optional, allows the file system to discard state on a page where
> - * we failed to submit any I/O.
> - */
> - void (*discard_folio)(struct folio *folio, loff_t pos);
> };
>
> struct iomap_writepage_ctx {
> @@ -463,6 +459,9 @@ void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
> void iomap_ioend_try_merge(struct iomap_ioend *ioend,
> struct list_head *more_ioends);
> void iomap_sort_ioends(struct list_head *ioend_list);
> +ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
> + loff_t pos, loff_t end_pos, unsigned int dirty_len);
> +
> int iomap_writepages(struct iomap_writepage_ctx *wpc);
>
> /*
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 02/14] iomap: pass more arguments using the iomap writeback context
2025-07-08 13:51 ` [PATCH 02/14] iomap: pass more arguments using the iomap writeback context Christoph Hellwig
@ 2025-07-08 19:45 ` Darrick J. Wong
0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2025-07-08 19:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster, Johannes Thumshirn
On Tue, Jul 08, 2025 at 03:51:08PM +0200, Christoph Hellwig wrote:
> Add inode and wbc fields to pass the inode and writeback context that
> are needed in the entire writeback call chain, and let the callers
> initialize all fields in the writeback context before calling
> iomap_writepages to simplify the argument passing.
>
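For illustration, a converted ->writepages implementation then boils
down to this (hypothetical myfs_* names):

	static int myfs_writepages(struct address_space *mapping,
			struct writeback_control *wbc)
	{
		struct iomap_writepage_ctx wpc = {
			.inode	= mapping->host,
			.wbc	= wbc,
			.ops	= &myfs_writeback_ops,
		};

		return iomap_writepages(&wpc);
	}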
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Yess smaller callsites,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> block/fops.c | 8 +++++--
> fs/gfs2/aops.c | 8 +++++--
> fs/iomap/buffered-io.c | 52 +++++++++++++++++++-----------------------
> fs/xfs/xfs_aops.c | 24 +++++++++++++------
> fs/zonefs/file.c | 8 +++++--
> include/linux/iomap.h | 6 ++---
> 6 files changed, 61 insertions(+), 45 deletions(-)
>
> diff --git a/block/fops.c b/block/fops.c
> index 1309861d4c2c..3394263d942b 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -558,9 +558,13 @@ static const struct iomap_writeback_ops blkdev_writeback_ops = {
> static int blkdev_writepages(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> - struct iomap_writepage_ctx wpc = { };
> + struct iomap_writepage_ctx wpc = {
> + .inode = mapping->host,
> + .wbc = wbc,
> + .ops = &blkdev_writeback_ops
> + };
>
> - return iomap_writepages(mapping, wbc, &wpc, &blkdev_writeback_ops);
> + return iomap_writepages(&wpc);
> }
>
> const struct address_space_operations def_blk_aops = {
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 14f204cd5a82..47d74afd63ac 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -159,7 +159,11 @@ static int gfs2_writepages(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping);
> - struct iomap_writepage_ctx wpc = { };
> + struct iomap_writepage_ctx wpc = {
> + .inode = mapping->host,
> + .wbc = wbc,
> + .ops = &gfs2_writeback_ops,
> + };
> int ret;
>
> /*
> @@ -168,7 +172,7 @@ static int gfs2_writepages(struct address_space *mapping,
> * want balance_dirty_pages() to loop indefinitely trying to write out
> * pages held in the ail that it can't find.
> */
> - ret = iomap_writepages(mapping, wbc, &wpc, &gfs2_writeback_ops);
> + ret = iomap_writepages(&wpc);
> if (ret == 0 && wbc->nr_to_write > 0)
> set_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags);
> return ret;
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index addf6ed13061..2806ec1e0b5e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1616,20 +1616,19 @@ static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
> }
>
> static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> - struct writeback_control *wbc, struct inode *inode, loff_t pos,
> - u16 ioend_flags)
> + loff_t pos, u16 ioend_flags)
> {
> struct bio *bio;
>
> bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
> - REQ_OP_WRITE | wbc_to_write_flags(wbc),
> + REQ_OP_WRITE | wbc_to_write_flags(wpc->wbc),
> GFP_NOFS, &iomap_ioend_bioset);
> bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
> bio->bi_end_io = iomap_writepage_end_bio;
> - bio->bi_write_hint = inode->i_write_hint;
> - wbc_init_bio(wbc, bio);
> + bio->bi_write_hint = wpc->inode->i_write_hint;
> + wbc_init_bio(wpc->wbc, bio);
> wpc->nr_folios = 0;
> - return iomap_init_ioend(inode, bio, pos, ioend_flags);
> + return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
> }
>
> static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
> @@ -1668,9 +1667,7 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos,
> * writepage context that the caller will need to submit.
> */
> static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> - struct writeback_control *wbc, struct folio *folio,
> - struct inode *inode, loff_t pos, loff_t end_pos,
> - unsigned len)
> + struct folio *folio, loff_t pos, loff_t end_pos, unsigned len)
> {
> struct iomap_folio_state *ifs = folio->private;
> size_t poff = offset_in_folio(folio, pos);
> @@ -1691,8 +1688,7 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> error = iomap_submit_ioend(wpc, 0);
> if (error)
> return error;
> - wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos,
> - ioend_flags);
> + wpc->ioend = iomap_alloc_ioend(wpc, pos, ioend_flags);
> }
>
> if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
> @@ -1746,24 +1742,24 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> if (wpc->ioend->io_offset + wpc->ioend->io_size > end_pos)
> wpc->ioend->io_size = end_pos - wpc->ioend->io_offset;
>
> - wbc_account_cgroup_owner(wbc, folio, len);
> + wbc_account_cgroup_owner(wpc->wbc, folio, len);
> return 0;
> }
>
> static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
> - struct writeback_control *wbc, struct folio *folio,
> - struct inode *inode, u64 pos, u64 end_pos,
> - unsigned dirty_len, unsigned *count)
> + struct folio *folio, u64 pos, u64 end_pos, unsigned dirty_len,
> + unsigned *count)
> {
> int error;
>
> do {
> unsigned map_len;
>
> - error = wpc->ops->map_blocks(wpc, inode, pos, dirty_len);
> + error = wpc->ops->map_blocks(wpc, wpc->inode, pos, dirty_len);
> if (error)
> break;
> - trace_iomap_writepage_map(inode, pos, dirty_len, &wpc->iomap);
> + trace_iomap_writepage_map(wpc->inode, pos, dirty_len,
> + &wpc->iomap);
>
> map_len = min_t(u64, dirty_len,
> wpc->iomap.offset + wpc->iomap.length - pos);
> @@ -1777,8 +1773,8 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
> case IOMAP_HOLE:
> break;
> default:
> - error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
> - end_pos, map_len);
> + error = iomap_add_to_ioend(wpc, folio, pos, end_pos,
> + map_len);
> if (!error)
> (*count)++;
> break;
> @@ -1860,10 +1856,10 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
> }
>
> static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> - struct writeback_control *wbc, struct folio *folio)
> + struct folio *folio)
> {
> struct iomap_folio_state *ifs = folio->private;
> - struct inode *inode = folio->mapping->host;
> + struct inode *inode = wpc->inode;
> u64 pos = folio_pos(folio);
> u64 end_pos = pos + folio_size(folio);
> u64 end_aligned = 0;
> @@ -1910,8 +1906,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> */
> end_aligned = round_up(end_pos, i_blocksize(inode));
> while ((rlen = iomap_find_dirty_range(folio, &pos, end_aligned))) {
> - error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
> - pos, end_pos, rlen, &count);
> + error = iomap_writepage_map_blocks(wpc, folio, pos, end_pos,
> + rlen, &count);
> if (error)
> break;
> pos += rlen;
> @@ -1947,10 +1943,9 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> }
>
> int
> -iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> - struct iomap_writepage_ctx *wpc,
> - const struct iomap_writeback_ops *ops)
> +iomap_writepages(struct iomap_writepage_ctx *wpc)
> {
> + struct address_space *mapping = wpc->inode->i_mapping;
> struct folio *folio = NULL;
> int error;
>
> @@ -1962,9 +1957,8 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
> PF_MEMALLOC))
> return -EIO;
>
> - wpc->ops = ops;
> - while ((folio = writeback_iter(mapping, wbc, folio, &error)))
> - error = iomap_writepage_map(wpc, wbc, folio);
> + while ((folio = writeback_iter(mapping, wpc->wbc, folio, &error)))
> + error = iomap_writepage_map(wpc, folio);
> return iomap_submit_ioend(wpc, error);
> }
> EXPORT_SYMBOL_GPL(iomap_writepages);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 63151feb9c3f..65485a52df3b 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -636,19 +636,29 @@ xfs_vm_writepages(
> xfs_iflags_clear(ip, XFS_ITRUNCATED);
>
> if (xfs_is_zoned_inode(ip)) {
> - struct xfs_zoned_writepage_ctx xc = { };
> + struct xfs_zoned_writepage_ctx xc = {
> + .ctx = {
> + .inode = mapping->host,
> + .wbc = wbc,
> + .ops = &xfs_zoned_writeback_ops
> + },
> + };
> int error;
>
> - error = iomap_writepages(mapping, wbc, &xc.ctx,
> - &xfs_zoned_writeback_ops);
> + error = iomap_writepages(&xc.ctx);
> if (xc.open_zone)
> xfs_open_zone_put(xc.open_zone);
> return error;
> } else {
> - struct xfs_writepage_ctx wpc = { };
> -
> - return iomap_writepages(mapping, wbc, &wpc.ctx,
> - &xfs_writeback_ops);
> + struct xfs_writepage_ctx wpc = {
> + .ctx = {
> + .inode = mapping->host,
> + .wbc = wbc,
> + .ops = &xfs_writeback_ops
> + },
> + };
> +
> + return iomap_writepages(&wpc.ctx);
> }
> }
>
> diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
> index 42e2c0065bb3..edca4bbe4b72 100644
> --- a/fs/zonefs/file.c
> +++ b/fs/zonefs/file.c
> @@ -152,9 +152,13 @@ static const struct iomap_writeback_ops zonefs_writeback_ops = {
> static int zonefs_writepages(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> - struct iomap_writepage_ctx wpc = { };
> + struct iomap_writepage_ctx wpc = {
> + .inode = mapping->host,
> + .wbc = wbc,
> + .ops = &zonefs_writeback_ops,
> + };
>
> - return iomap_writepages(mapping, wbc, &wpc, &zonefs_writeback_ops);
> + return iomap_writepages(&wpc);
> }
>
> static int zonefs_swap_activate(struct swap_info_struct *sis,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 522644d62f30..00179c9387c5 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -448,6 +448,8 @@ struct iomap_writeback_ops {
>
> struct iomap_writepage_ctx {
> struct iomap iomap;
> + struct inode *inode;
> + struct writeback_control *wbc;
> struct iomap_ioend *ioend;
> const struct iomap_writeback_ops *ops;
> u32 nr_folios; /* folios added to the ioend */
> @@ -461,9 +463,7 @@ void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
> void iomap_ioend_try_merge(struct iomap_ioend *ioend,
> struct list_head *more_ioends);
> void iomap_sort_ioends(struct list_head *ioend_list);
> -int iomap_writepages(struct address_space *mapping,
> - struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
> - const struct iomap_writeback_ops *ops);
> +int iomap_writepages(struct iomap_writepage_ctx *wpc);
>
> /*
> * Flags for direct I/O ->end_io:
> --
> 2.47.2
>
>
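All the conversions above follow the same pattern, so for anyone
skimming the diff, a filesystem's ->writepages now boils down to
something like the sketch below ("foo" and foo_writeback_ops are
made-up names for illustration, not part of the patch):

static int foo_writepages(struct address_space *mapping,
		struct writeback_control *wbc)
{
	/*
	 * Everything iomap needs for this writeback pass - the inode,
	 * the writeback_control and the ops table - now travels in the
	 * context instead of being passed to iomap_writepages() as
	 * separate arguments.
	 */
	struct iomap_writepage_ctx wpc = {
		.inode	= mapping->host,
		.wbc	= wbc,
		.ops	= &foo_writeback_ops,
	};

	return iomap_writepages(&wpc);
}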
* Re: [PATCH 01/14] iomap: header diet
2025-07-08 13:51 ` [PATCH 01/14] iomap: header diet Christoph Hellwig
@ 2025-07-08 19:45 ` Darrick J. Wong
2025-07-08 21:09 ` Joanne Koong
2025-07-09 9:09 ` kernel test robot
2 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2025-07-08 19:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
On Tue, Jul 08, 2025 at 03:51:07PM +0200, Christoph Hellwig wrote:
> Drop various unused #include statements.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Assuming the build bots don't hate this,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 10 ----------
> fs/iomap/direct-io.c | 5 -----
> fs/iomap/fiemap.c | 3 ---
> fs/iomap/iter.c | 1 -
> fs/iomap/seek.c | 4 ----
> fs/iomap/swapfile.c | 3 ---
> fs/iomap/trace.c | 1 -
> 7 files changed, 27 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3729391a18f3..addf6ed13061 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -3,18 +3,8 @@
> * Copyright (C) 2010 Red Hat, Inc.
> * Copyright (C) 2016-2023 Christoph Hellwig.
> */
> -#include <linux/module.h>
> -#include <linux/compiler.h>
> -#include <linux/fs.h>
> #include <linux/iomap.h>
> -#include <linux/pagemap.h>
> -#include <linux/uio.h>
> #include <linux/buffer_head.h>
> -#include <linux/dax.h>
> -#include <linux/writeback.h>
> -#include <linux/swap.h>
> -#include <linux/bio.h>
> -#include <linux/sched/signal.h>
> #include <linux/migrate.h>
> #include "internal.h"
> #include "trace.h"
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 844261a31156..6f25d4cfea9f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -3,14 +3,9 @@
> * Copyright (C) 2010 Red Hat, Inc.
> * Copyright (c) 2016-2025 Christoph Hellwig.
> */
> -#include <linux/module.h>
> -#include <linux/compiler.h>
> -#include <linux/fs.h>
> #include <linux/fscrypt.h>
> #include <linux/pagemap.h>
> #include <linux/iomap.h>
> -#include <linux/backing-dev.h>
> -#include <linux/uio.h>
> #include <linux/task_io_accounting_ops.h>
> #include "internal.h"
> #include "trace.h"
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index 80675c42e94e..d11dadff8286 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -2,9 +2,6 @@
> /*
> * Copyright (c) 2016-2021 Christoph Hellwig.
> */
> -#include <linux/module.h>
> -#include <linux/compiler.h>
> -#include <linux/fs.h>
> #include <linux/iomap.h>
> #include <linux/fiemap.h>
> #include <linux/pagemap.h>
> diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> index 6ffc6a7b9ba5..cef77ca0c20b 100644
> --- a/fs/iomap/iter.c
> +++ b/fs/iomap/iter.c
> @@ -3,7 +3,6 @@
> * Copyright (C) 2010 Red Hat, Inc.
> * Copyright (c) 2016-2021 Christoph Hellwig.
> */
> -#include <linux/fs.h>
> #include <linux/iomap.h>
> #include "trace.h"
>
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index 04d7919636c1..56db2dd4b10d 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -3,12 +3,8 @@
> * Copyright (C) 2017 Red Hat, Inc.
> * Copyright (c) 2018-2021 Christoph Hellwig.
> */
> -#include <linux/module.h>
> -#include <linux/compiler.h>
> -#include <linux/fs.h>
> #include <linux/iomap.h>
> #include <linux/pagemap.h>
> -#include <linux/pagevec.h>
>
> static int iomap_seek_hole_iter(struct iomap_iter *iter,
> loff_t *hole_pos)
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index c1a762c10ce4..0db77c449467 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -3,9 +3,6 @@
> * Copyright (C) 2018 Oracle. All Rights Reserved.
> * Author: Darrick J. Wong <darrick.wong@oracle.com>
> */
> -#include <linux/module.h>
> -#include <linux/compiler.h>
> -#include <linux/fs.h>
> #include <linux/iomap.h>
> #include <linux/swap.h>
>
> diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
> index 728d5443daf5..da217246b1a9 100644
> --- a/fs/iomap/trace.c
> +++ b/fs/iomap/trace.c
> @@ -3,7 +3,6 @@
> * Copyright (c) 2019 Christoph Hellwig
> */
> #include <linux/iomap.h>
> -#include <linux/uio.h>
>
> /*
> * We include this last to have the helpers above available for the trace
> --
> 2.47.2
>
>
* Re: [PATCH 01/14] iomap: header diet
2025-07-08 13:51 ` [PATCH 01/14] iomap: header diet Christoph Hellwig
2025-07-08 19:45 ` Darrick J. Wong
@ 2025-07-08 21:09 ` Joanne Koong
2025-07-09 9:09 ` kernel test robot
2 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-07-08 21:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
On Tue, Jul 8, 2025 at 6:51 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Drop various unused #include statements.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
LGTM. Btw, in the 7th patch, when the ioend handling logic gets moved to
ioend.c, the #include "internal.h" in buffered-io.c can be dropped too,
but that's a minor triviality.
> ---
> fs/iomap/buffered-io.c | 10 ----------
> fs/iomap/direct-io.c | 5 -----
> fs/iomap/fiemap.c | 3 ---
> fs/iomap/iter.c | 1 -
> fs/iomap/seek.c | 4 ----
> fs/iomap/swapfile.c | 3 ---
> fs/iomap/trace.c | 1 -
> 7 files changed, 27 deletions(-)
>
* Re: [PATCH 08/14] iomap: rename iomap_writepage_map to iomap_writeback_folio
2025-07-08 13:51 ` [PATCH 08/14] iomap: rename iomap_writepage_map to iomap_writeback_folio Christoph Hellwig
@ 2025-07-08 21:12 ` Joanne Koong
0 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-07-08 21:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2, Brian Foster
On Tue, Jul 8, 2025 at 6:52 AM Christoph Hellwig <hch@lst.de> wrote:
>
> ->writepage is gone, and our naming wasn't always that great to start
> with.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 10 +++++-----
> fs/iomap/trace.h | 2 +-
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
* Re: [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync
2025-07-08 13:51 ` [PATCH 12/14] iomap: improve argument passing to iomap_read_folio_sync Christoph Hellwig
2025-07-08 19:40 ` Darrick J. Wong
@ 2025-07-08 21:23 ` Joanne Koong
1 sibling, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-07-08 21:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
On Tue, Jul 8, 2025 at 6:52 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Pass the iomap_iter and derive the map inside iomap_read_folio_sync
> instead of in the caller, and use the more descriptive srcmap name for
> the source iomap. Stop passing the offset into folio argument as it
> can be derived from the folio and the file offset. Rename the
> variables for the offset into the file and the length to be more
> descriptive and match the rest of the code.
>
> Rename the function itself to iomap_read_folio_range to make the use
> more clear.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
* Re: [PATCH 14/14] iomap: build the writeback code without CONFIG_BLOCK
2025-07-08 13:51 ` [PATCH 14/14] iomap: build the writeback code without CONFIG_BLOCK Christoph Hellwig
@ 2025-07-08 21:27 ` Joanne Koong
0 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-07-08 21:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
On Tue, Jul 8, 2025 at 6:52 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Allow fuse to use the iomap writeback code even when CONFIG_BLOCK is
> not enabled. Do this with an ifdef instead of a separate file to keep
> the iomap_folio_state local to buffered-io.c.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/Makefile | 6 +--
> fs/iomap/buffered-io.c | 113 ++++++++++++++++++++++-------------------
> 2 files changed, 64 insertions(+), 55 deletions(-)
* Re: refactor the iomap writeback code v4
2025-07-08 13:51 refactor the iomap writeback code v4 Christoph Hellwig
` (13 preceding siblings ...)
2025-07-08 13:51 ` [PATCH 14/14] iomap: build the writeback code without CONFIG_BLOCK Christoph Hellwig
@ 2025-07-08 21:30 ` Joanne Koong
14 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-07-08 21:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
On Tue, Jul 8, 2025 at 6:51 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi all,
>
> this is an alternative approach to the writeback part of the
> "fuse: use iomap for buffered writes + writeback" series from Joanne.
> It doesn't try to make the code build without CONFIG_BLOCK yet.
>
> The big difference compared to Joanne's version is that I hope the
> split between the generic and ioend/bio based writeback code is a bit
> cleaner here. We have two methods that define the split between the
> generic writeback code, and the implementation of it, and all knowledge
> of ioends and bios now sits below that layer.
>
> This version passes testing on xfs, and gets as far as mainline for
> gfs2 (crashes in generic/361).
>
> Changes since v3:
> - add a patch to drop unused includes
> - drop the iomap_writepage_ctx renaming - we should do this separately and
> including the variable names if desired
> - add a comment about special casing of holes in iomap_writeback_range
> - split the cleanups to iomap_read_folio_sync into a separate prep patch
> - explain the IOMAP_HOLE check in xfs_iomap_valid
> - explain the iomap_writeback_folio later folio unlock vs dropbehind
> - some cargo culting for the #$W# RST formatting
> - "improve" the documentation coverage a bit
>
> Changes since v2:
> - rename iomap_writepage_ctx to iomap_writeback_ctx
> - keep local map_blocks helpers in XFS
> - allow building the writeback and write code for !CONFIG_BLOCK
>
> Changes since v1:
> - fix iomap reuse in block/zonefs/gfs2
> - catch too large return value from ->writeback_range
> - mention the correct file name in a commit log
> - add patches for folio laundering
> - add patches for read/modify write in the generic write helpers
>
> Diffstat:
> Documentation/filesystems/iomap/design.rst | 3
> Documentation/filesystems/iomap/operations.rst | 57 +-
> block/fops.c | 37 +
> fs/gfs2/aops.c | 8
> fs/gfs2/bmap.c | 48 +-
> fs/gfs2/bmap.h | 1
> fs/gfs2/file.c | 3
> fs/iomap/Makefile | 6
> fs/iomap/buffered-io.c | 554 +++++++------------------
> fs/iomap/direct-io.c | 5
> fs/iomap/fiemap.c | 3
> fs/iomap/internal.h | 1
> fs/iomap/ioend.c | 220 +++++++++
> fs/iomap/iter.c | 1
> fs/iomap/seek.c | 4
> fs/iomap/swapfile.c | 3
> fs/iomap/trace.c | 1
> fs/iomap/trace.h | 4
> fs/xfs/xfs_aops.c | 212 +++++----
> fs/xfs/xfs_file.c | 6
> fs/xfs/xfs_iomap.c | 12
> fs/xfs/xfs_iomap.h | 1
> fs/xfs/xfs_reflink.c | 3
> fs/zonefs/file.c | 40 +
> include/linux/iomap.h | 82 ++-
> 25 files changed, 705 insertions(+), 610 deletions(-)
Thanks Christoph for all your work on this. I'll pull this and put v4
of the fuse iomap changes on top of this. I'll send that out this
week.
* Re: [PATCH 01/14] iomap: header diet
2025-07-08 13:51 ` [PATCH 01/14] iomap: header diet Christoph Hellwig
2025-07-08 19:45 ` Darrick J. Wong
2025-07-08 21:09 ` Joanne Koong
@ 2025-07-09 9:09 ` kernel test robot
2025-07-09 17:53 ` Joanne Koong
2 siblings, 1 reply; 28+ messages in thread
From: kernel test robot @ 2025-07-09 9:09 UTC (permalink / raw)
To: Christoph Hellwig, Christian Brauner
Cc: oe-kbuild-all, Darrick J. Wong, Joanne Koong, linux-xfs,
linux-fsdevel, linux-doc, linux-block, gfs2
Hi Christoph,
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on xfs-linux/for-next gfs2/for-next linus/master v6.16-rc5 next-20250708]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/iomap-pass-more-arguments-using-the-iomap-writeback-context/20250708-225155
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250708135132.3347932-2-hch%40lst.de
patch subject: [PATCH 01/14] iomap: header diet
config: arc-randconfig-001-20250709 (https://download.01.org/0day-ci/archive/20250709/202507091656.dMTKUTBY-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250709/202507091656.dMTKUTBY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507091656.dMTKUTBY-lkp@intel.com/
All errors/warnings (new ones prefixed by >>):
fs/iomap/buffered-io.c: In function 'iomap_dirty_folio':
>> fs/iomap/buffered-io.c:642:9: error: implicit declaration of function 'filemap_dirty_folio'; did you mean 'iomap_dirty_folio'? [-Werror=implicit-function-declaration]
return filemap_dirty_folio(mapping, folio);
^~~~~~~~~~~~~~~~~~~
iomap_dirty_folio
fs/iomap/buffered-io.c: In function 'iomap_write_iter':
>> fs/iomap/buffered-io.c:930:58: error: 'BDP_ASYNC' undeclared (first use in this function); did you mean 'I_SYNC'?
unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
^~~~~~~~~
I_SYNC
fs/iomap/buffered-io.c:930:58: note: each undeclared identifier is reported only once for each function it appears in
>> fs/iomap/buffered-io.c:945:12: error: implicit declaration of function 'balance_dirty_pages_ratelimited_flags' [-Werror=implicit-function-declaration]
status = balance_dirty_pages_ratelimited_flags(mapping,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/iomap/buffered-io.c: In function 'iomap_unshare_iter':
>> fs/iomap/buffered-io.c:1309:3: error: implicit declaration of function 'balance_dirty_pages_ratelimited'; did you mean 'pr_alert_ratelimited'? [-Werror=implicit-function-declaration]
balance_dirty_pages_ratelimited(iter->inode->i_mapping);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pr_alert_ratelimited
fs/iomap/buffered-io.c: In function 'iomap_zero_iter':
>> fs/iomap/buffered-io.c:1376:3: error: implicit declaration of function 'folio_mark_accessed'; did you mean 'folio_wait_locked'? [-Werror=implicit-function-declaration]
folio_mark_accessed(folio);
^~~~~~~~~~~~~~~~~~~
folio_wait_locked
fs/iomap/buffered-io.c: In function 'iomap_alloc_ioend':
fs/iomap/buffered-io.c:1624:26: error: implicit declaration of function 'wbc_to_write_flags'; did you mean 'do_pipe_flags'? [-Werror=implicit-function-declaration]
REQ_OP_WRITE | wbc_to_write_flags(wbc),
^~~~~~~~~~~~~~~~~~
do_pipe_flags
fs/iomap/buffered-io.c:1629:2: error: implicit declaration of function 'wbc_init_bio'; did you mean 'arc_init_IRQ'? [-Werror=implicit-function-declaration]
wbc_init_bio(wbc, bio);
^~~~~~~~~~~~
arc_init_IRQ
fs/iomap/buffered-io.c: In function 'iomap_add_to_ioend':
fs/iomap/buffered-io.c:1748:2: error: implicit declaration of function 'wbc_account_cgroup_owner'; did you mean 'pr_cont_cgroup_name'? [-Werror=implicit-function-declaration]
wbc_account_cgroup_owner(wbc, folio, len);
^~~~~~~~~~~~~~~~~~~~~~~~
pr_cont_cgroup_name
fs/iomap/buffered-io.c: In function 'iomap_writepages':
>> fs/iomap/buffered-io.c:1965:18: error: implicit declaration of function 'writeback_iter'; did you mean 'write_lock_irq'? [-Werror=implicit-function-declaration]
while ((folio = writeback_iter(mapping, wbc, folio, &error)))
^~~~~~~~~~~~~~
write_lock_irq
>> fs/iomap/buffered-io.c:1965:16: warning: assignment to 'struct folio *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
while ((folio = writeback_iter(mapping, wbc, folio, &error)))
^
cc1: some warnings being treated as errors
vim +642 fs/iomap/buffered-io.c
8306a5f5630552 Matthew Wilcox (Oracle)  2021-04-28  634
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  635  bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  636  {
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  637  	struct inode *inode = mapping->host;
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  638  	size_t len = folio_size(folio);
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  639
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  640  	ifs_alloc(inode, folio, 0);
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  641  	iomap_set_range_dirty(folio, 0, len);
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10 @642  	return filemap_dirty_folio(mapping, folio);
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  643  }
4ce02c67972211 Ritesh Harjani (IBM)     2023-07-10  644  EXPORT_SYMBOL_GPL(iomap_dirty_folio);
4ce02c67972211 Ritesh Harjani (IBM 2023-07-10 645)
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 01/14] iomap: header diet
2025-07-09 9:09 ` kernel test robot
@ 2025-07-09 17:53 ` Joanne Koong
0 siblings, 0 replies; 28+ messages in thread
From: Joanne Koong @ 2025-07-09 17:53 UTC (permalink / raw)
To: kernel test robot
Cc: Christoph Hellwig, Christian Brauner, oe-kbuild-all,
Darrick J. Wong, linux-xfs, linux-fsdevel, linux-doc, linux-block,
gfs2
On Wed, Jul 9, 2025 at 2:10 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Christoph,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on brauner-vfs/vfs.all]
> [also build test ERROR on xfs-linux/for-next gfs2/for-next linus/master v6.16-rc5 next-20250708]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting a patch, we suggest using '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/iomap-pass-more-arguments-using-the-iomap-writeback-context/20250708-225155
> base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
> patch link: https://lore.kernel.org/r/20250708135132.3347932-2-hch%40lst.de
> patch subject: [PATCH 01/14] iomap: header diet
> config: arc-randconfig-001-20250709 (https://download.01.org/0day-ci/archive/20250709/202507091656.dMTKUTBY-lkp@intel.com/config)
> compiler: arc-linux-gcc (GCC) 8.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250709/202507091656.dMTKUTBY-lkp@intel.com/reproduce)
>
Adding #include <linux/writeback.h> and #include <linux/swap.h> (for
folio_mark_accessed()) back to buffered-io.c fixes it for this config.
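Concretely, that would look something like the following at the top of
fs/iomap/buffered-io.c, on top of the include list patch 01 leaves
behind (which of the two headers pulls in each missing declaration may
vary by config):

#include <linux/iomap.h>
#include <linux/buffer_head.h>
#include <linux/writeback.h>	/* wbc_init_bio(), writeback_iter(), BDP_ASYNC */
#include <linux/swap.h>		/* folio_mark_accessed() */
#include <linux/migrate.h>
#include "internal.h"
#include "trace.h"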
* [PATCH 13/14] iomap: add read_folio_range() handler for buffered writes
2025-07-10 13:33 refactor the iomap writeback code v5 Christoph Hellwig
@ 2025-07-10 13:33 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2025-07-10 13:33 UTC (permalink / raw)
To: Christian Brauner
Cc: Darrick J. Wong, Joanne Koong, linux-xfs, linux-fsdevel,
linux-doc, linux-block, gfs2
Add a read_folio_range() handler for buffered writes that filesystems
may pass in if they wish to provide their own way of synchronously
reading in the contents of a folio.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: renamed to read_folio_range, pass less arguments]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
Documentation/filesystems/iomap/operations.rst | 6 ++++++
fs/iomap/buffered-io.c | 13 +++++++++----
include/linux/iomap.h | 10 ++++++++++
3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index a9b48ce4af92..067ed8e14ef3 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -68,6 +68,8 @@ The following address space operations can be wrapped easily:
void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
};
iomap calls these functions:
@@ -123,6 +125,10 @@ iomap calls these functions:
``->iomap_valid``, then the iomap should considered stale and the
validation failed.
+ - ``read_folio_range``: Called to synchronously read in the range that will
+ be written to. If this function is not provided, iomap will default to
+ submitting a bio read request.
+
These ``struct kiocb`` flags are significant for buffered I/O with iomap:
* ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8a44f56a3a80..aed4fc30a849 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -671,7 +671,8 @@ static int iomap_read_folio_range(const struct iomap_iter *iter,
return submit_bio_wait(&bio);
}
-static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
+static int __iomap_write_begin(const struct iomap_iter *iter,
+ const struct iomap_write_ops *write_ops, size_t len,
struct folio *folio)
{
struct iomap_folio_state *ifs;
@@ -722,8 +723,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, size_t len,
if (iter->flags & IOMAP_NOWAIT)
return -EAGAIN;
- status = iomap_read_folio_range(iter, folio,
- block_start, plen);
+ if (write_ops && write_ops->read_folio_range)
+ status = write_ops->read_folio_range(iter,
+ folio, block_start, plen);
+ else
+ status = iomap_read_folio_range(iter,
+ folio, block_start, plen);
if (status)
return status;
}
@@ -839,7 +844,7 @@ static int iomap_write_begin(struct iomap_iter *iter,
else if (srcmap->flags & IOMAP_F_BUFFER_HEAD)
status = __block_write_begin_int(folio, pos, len, NULL, srcmap);
else
- status = __iomap_write_begin(iter, len, folio);
+ status = __iomap_write_begin(iter, write_ops, len, folio);
if (unlikely(status))
goto out_unlock;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 80f543cc4fe8..73dceabc21c8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -166,6 +166,16 @@ struct iomap_write_ops {
* locked by the iomap code.
*/
bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+
+ /*
+ * Optional if the filesystem wishes to provide a custom handler for
+ * reading in the contents of a folio, otherwise iomap will default to
+ * submitting a bio read request.
+ *
+ * The read must be done synchronously.
+ */
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
};
/*
--
2.47.2
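For context, a filesystem opting in would wire the callback up roughly
as sketched below; foo_read_folio_range() and foo_sync_read() are
invented names, and iomap_write_ops is the ops table introduced earlier
in this series and handed to the buffered write helpers:

/* Hypothetical synchronous range read for a made-up filesystem. */
static int foo_read_folio_range(const struct iomap_iter *iter,
		struct folio *folio, loff_t pos, size_t len)
{
	/*
	 * Fill the folio bytes covering [pos, pos + len) from the
	 * backing store; the read must have completed by the time this
	 * returns.  Return 0 on success or a negative errno.
	 */
	return foo_sync_read(iter->inode, folio, pos, len);
}

static const struct iomap_write_ops foo_write_ops = {
	.read_folio_range	= foo_read_folio_range,
};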