* [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback
@ 2025-07-09 22:10 Joanne Koong
2025-07-09 22:10 ` [PATCH v4 1/5] fuse: use iomap for buffered writes Joanne Koong
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Joanne Koong @ 2025-07-09 22:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
This series adds fuse iomap support for buffered writes and dirty folio
writeback. This is needed so that granular uptodate and dirty tracking can
be used in fuse when large folios are enabled. This has two big advantages.
For writes, instead of the entire folio needing to be read into the page
cache, only the relevant portions need to be. For writeback, only the
dirty portions need to be written back instead of the entire folio.
This patchset is on top of Christoph's iomap patchset [1] which is in his git
tree git://git.infradead.org/users/hch/misc.git iomap-writeback-refactor.
The changes in this patchset can be found in this git tree,
https://github.com/joannekoong/linux/commits/fuse_iomap/
Please note that this patchset does not enable large folios yet. That will be
sent out in a separate future patchset.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20250708135132.3347932-1-hch@lst.de/
Changelog
---------
v3 -> v4:
* Get rid of writethrough goto (Miklos)
* Move iomap_start_folio_write call to after error check (Darrick)
* Tidy up args for fuse_writepage_need_send() (me)
v3:
https://lore.kernel.org/linux-fsdevel/20250624022135.832899-1-joannelkoong@gmail.com/
v2 -> v3:
* Fix up fuse patches to use iomap APIs from Christoph's patches
* Drop CONFIG_BLOCK patches
* Add patch to use iomap for invalidation and partial uptodateness check
* Add patch for refactoring fuse writeback to use iomap_writepage_ctx inode
v2:
https://lore.kernel.org/linux-fsdevel/20250613214642.2903225-1-joannelkoong@gmail.com/
v1 -> v2:
* Drop IOMAP_IN_MEM type and just use IOMAP_MAPPED for fuse
* Separate out new helper functions added to iomap into separate commits
* Update iomap documentation
* Clean up iomap_writeback_dirty_folio() locking logic w/ Christoph's
recommendation
* Refactor ->map_blocks() to generic ->writeback_folio()
* Refactor ->submit_ioend() to generic ->writeback_complete()
* Add patch for changing 'count' to 'async_writeback'
* Rebase commits onto linux branch instead of fuse branch
v1:
https://lore.kernel.org/linux-fsdevel/20250606233803.1421259-1-joannelkoong@gmail.com/
Joanne Koong (5):
fuse: use iomap for buffered writes
fuse: use iomap for writeback
fuse: use iomap for folio laundering
fuse: hook into iomap for invalidating and checking partial
uptodateness
fuse: refactor writeback to use iomap_writepage_ctx inode
fs/fuse/Kconfig | 1 +
fs/fuse/file.c | 339 +++++++++++++++++++++---------------------------
2 files changed, 148 insertions(+), 192 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v4 1/5] fuse: use iomap for buffered writes
2025-07-09 22:10 [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback Joanne Koong
@ 2025-07-09 22:10 ` Joanne Koong
2025-07-12 4:46 ` Darrick J. Wong
2025-07-09 22:10 ` [PATCH v4 2/5] fuse: use iomap for writeback Joanne Koong
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Joanne Koong @ 2025-07-09 22:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Have buffered writes go through iomap. This has two advantages:
* granular large folio synchronous reads
* granular large folio dirty tracking
If for example there is a 1 MB large folio and a write issued at pos 1
to pos 1 MB - 2, only the head and tail pages will need to be read in
and marked uptodate instead of the entire folio needing to be read in.
Non-relevant trailing pages are also skipped (e.g. if for a 1 MB large
folio a write is issued at pos 1 to 4099, only the first two pages are
read in and the ones after that are skipped).
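The head/tail page selection described above can be sketched as a toy model
(hypothetical helper for illustration only; the real iomap code tracks
uptodate state per block in its ifs bitmap rather than computing pages this
way):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/*
 * Toy model: for a buffered write covering [pos, pos + len), only a
 * partially overwritten head or tail page needs to be read in first.
 * Fully overwritten pages, and pages past the end of the write, are
 * skipped.  Stores the page index, or -1 if no read is needed on that
 * end.
 */
static void pages_needing_read(size_t pos, size_t len,
			       long *head_page, long *tail_page)
{
	size_t end = pos + len;

	*head_page = (pos % PAGE_SIZE) ? (long)(pos / PAGE_SIZE) : -1;
	*tail_page = (end % PAGE_SIZE) ? (long)(end / PAGE_SIZE) : -1;
}
```

For the example above (a write at pos 1 through 4099), this yields pages 0
and 1; everything after page 1 is left untouched.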
iomap also has granular dirty tracking. This is useful in that when it
comes to writeback time, only the dirty portions of the large folio will
be written instead of having to write out the entire folio. For example
if there is a 1 MB large folio and only 2 bytes in it are dirty, only
the page for those dirty bytes get written out. Please note that
granular writeback is only done once fuse also uses iomap in writeback
(separate commit).
.release_folio needs to be set to iomap_release_folio so that any
allocated iomap ifs structs get freed.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/Kconfig | 1 +
fs/fuse/file.c | 148 ++++++++++++++++++------------------------------
2 files changed, 55 insertions(+), 94 deletions(-)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index ca215a3cba3e..a774166264de 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -2,6 +2,7 @@
config FUSE_FS
tristate "FUSE (Filesystem in Userspace) support"
select FS_POSIX_ACL
+ select FS_IOMAP
help
With FUSE it is possible to implement a fully functional filesystem
in a userspace program.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 47006d0753f1..cadad61ef7df 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -21,6 +21,7 @@
#include <linux/filelock.h>
#include <linux/splice.h>
#include <linux/task_io_accounting_ops.h>
+#include <linux/iomap.h>
static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, int opcode,
@@ -788,12 +789,16 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
}
}
-static int fuse_do_readfolio(struct file *file, struct folio *folio)
+static int fuse_do_readfolio(struct file *file, struct folio *folio,
+ size_t off, size_t len)
{
struct inode *inode = folio->mapping->host;
struct fuse_mount *fm = get_fuse_mount(inode);
- loff_t pos = folio_pos(folio);
- struct fuse_folio_desc desc = { .length = folio_size(folio) };
+ loff_t pos = folio_pos(folio) + off;
+ struct fuse_folio_desc desc = {
+ .offset = off,
+ .length = len,
+ };
struct fuse_io_args ia = {
.ap.args.page_zeroing = true,
.ap.args.out_pages = true,
@@ -820,8 +825,6 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
if (res < desc.length)
fuse_short_read(inode, attr_ver, res, &ia.ap);
- folio_mark_uptodate(folio);
-
return 0;
}
@@ -834,13 +837,26 @@ static int fuse_read_folio(struct file *file, struct folio *folio)
if (fuse_is_bad(inode))
goto out;
- err = fuse_do_readfolio(file, folio);
+ err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
+ if (!err)
+ folio_mark_uptodate(folio);
+
fuse_invalidate_atime(inode);
out:
folio_unlock(folio);
return err;
}
+static int fuse_iomap_read_folio_range(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos,
+ size_t len)
+{
+ struct file *file = iter->private;
+ size_t off = offset_in_folio(folio, pos);
+
+ return fuse_do_readfolio(file, folio, off, len);
+}
+
static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
int err)
{
@@ -1374,6 +1390,24 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
}
}
+static const struct iomap_write_ops fuse_iomap_write_ops = {
+ .read_folio_range = fuse_iomap_read_folio_range,
+};
+
+static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ iomap->type = IOMAP_MAPPED;
+ iomap->length = length;
+ iomap->offset = offset;
+ return 0;
+}
+
+static const struct iomap_ops fuse_iomap_ops = {
+ .iomap_begin = fuse_iomap_begin,
+};
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1383,6 +1417,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = mapping->host;
ssize_t err, count;
struct fuse_conn *fc = get_fuse_conn(inode);
+ bool writeback = false;
if (fc->writeback_cache) {
/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1391,16 +1426,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (err)
return err;
- if (fc->handle_killpriv_v2 &&
- setattr_should_drop_suidgid(idmap,
- file_inode(file))) {
- goto writethrough;
- }
-
- return generic_file_write_iter(iocb, from);
+ if (!fc->handle_killpriv_v2 ||
+ !setattr_should_drop_suidgid(idmap, file_inode(file)))
+ writeback = true;
}
-writethrough:
inode_lock(inode);
err = count = generic_write_checks(iocb, from);
@@ -1419,6 +1449,15 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
written = direct_write_fallback(iocb, from, written,
fuse_perform_write(iocb, from));
+ } else if (writeback) {
+ /*
+ * Use iomap so that we can do granular uptodate reads
+ * and granular dirty tracking for large folios.
+ */
+ written = iomap_file_buffered_write(iocb, from,
+ &fuse_iomap_ops,
+ &fuse_iomap_write_ops,
+ file);
} else {
written = fuse_perform_write(iocb, from);
}
@@ -2208,84 +2247,6 @@ static int fuse_writepages(struct address_space *mapping,
return err;
}
-/*
- * It's worthy to make sure that space is reserved on disk for the write,
- * but how to implement it without killing performance need more thinking.
- */
-static int fuse_write_begin(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
-{
- pgoff_t index = pos >> PAGE_SHIFT;
- struct fuse_conn *fc = get_fuse_conn(file_inode(file));
- struct folio *folio;
- loff_t fsize;
- int err = -ENOMEM;
-
- WARN_ON(!fc->writeback_cache);
-
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
- mapping_gfp_mask(mapping));
- if (IS_ERR(folio))
- goto error;
-
- if (folio_test_uptodate(folio) || len >= folio_size(folio))
- goto success;
- /*
- * Check if the start of this folio comes after the end of file,
- * in which case the readpage can be optimized away.
- */
- fsize = i_size_read(mapping->host);
- if (fsize <= folio_pos(folio)) {
- size_t off = offset_in_folio(folio, pos);
- if (off)
- folio_zero_segment(folio, 0, off);
- goto success;
- }
- err = fuse_do_readfolio(file, folio);
- if (err)
- goto cleanup;
-success:
- *foliop = folio;
- return 0;
-
-cleanup:
- folio_unlock(folio);
- folio_put(folio);
-error:
- return err;
-}
-
-static int fuse_write_end(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, unsigned copied,
- struct folio *folio, void *fsdata)
-{
- struct inode *inode = folio->mapping->host;
-
- /* Haven't copied anything? Skip zeroing, size extending, dirtying. */
- if (!copied)
- goto unlock;
-
- pos += copied;
- if (!folio_test_uptodate(folio)) {
- /* Zero any unwritten bytes at the end of the page */
- size_t endoff = pos & ~PAGE_MASK;
- if (endoff)
- folio_zero_segment(folio, endoff, PAGE_SIZE);
- folio_mark_uptodate(folio);
- }
-
- if (pos > inode->i_size)
- i_size_write(inode, pos);
-
- folio_mark_dirty(folio);
-
-unlock:
- folio_unlock(folio);
- folio_put(folio);
-
- return copied;
-}
-
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
@@ -3144,11 +3105,10 @@ static const struct address_space_operations fuse_file_aops = {
.writepages = fuse_writepages,
.launder_folio = fuse_launder_folio,
.dirty_folio = filemap_dirty_folio,
+ .release_folio = iomap_release_folio,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
.direct_IO = fuse_direct_IO,
- .write_begin = fuse_write_begin,
- .write_end = fuse_write_end,
};
void fuse_init_file_inode(struct inode *inode, unsigned int flags)
--
2.47.1
* [PATCH v4 2/5] fuse: use iomap for writeback
2025-07-09 22:10 [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback Joanne Koong
2025-07-09 22:10 ` [PATCH v4 1/5] fuse: use iomap for buffered writes Joanne Koong
@ 2025-07-09 22:10 ` Joanne Koong
2025-07-12 4:41 ` Darrick J. Wong
2025-07-09 22:10 ` [PATCH v4 3/5] fuse: use iomap for folio laundering Joanne Koong
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Joanne Koong @ 2025-07-09 22:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Use iomap for dirty folio writeback in ->writepages().
This allows for granular dirty writeback of large folios.
Only the dirty portions of the large folio will be written instead of
having to write out the entire folio. For example if there is a 1 MB
large folio and only 2 bytes in it are dirty, only the page for those
dirty bytes will be written out.
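The dirty-range selection described above can be modeled with a toy helper
(an illustrative sketch, not the iomap implementation; iomap keeps a
per-block dirty bitmap in its ifs struct and hands ->writeback_range only
the dirty spans of the folio):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk a per-block dirty bitmap for a folio and report contiguous dirty
 * byte ranges, skipping clean blocks entirely.  Fills ranges[n][0] with
 * the byte offset and ranges[n][1] with the length of each range, and
 * returns the number of ranges found.
 */
static size_t dirty_ranges(const bool *dirty, size_t nblocks,
			   size_t block_size,
			   size_t (*ranges)[2], size_t max_ranges)
{
	size_t n = 0, i = 0;

	while (i < nblocks && n < max_ranges) {
		if (!dirty[i]) {
			i++;
			continue;
		}
		size_t start = i;

		while (i < nblocks && dirty[i])
			i++;
		ranges[n][0] = start * block_size;
		ranges[n][1] = (i - start) * block_size;
		n++;
	}
	return n;
}
```

With a 1 MB folio (256 4k blocks) where only the first block is dirty, a
single 4096-byte range at offset 0 is emitted rather than the whole 1 MB.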
.dirty_folio needs to be set to iomap_dirty_folio so that the bitmap
iomap uses for dirty tracking correctly reflects dirty regions that need
to be written back.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 127 +++++++++++++++++++++++++++++--------------------
1 file changed, 76 insertions(+), 51 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index cadad61ef7df..70bbc8f26459 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1832,7 +1832,7 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
* scope of the fi->lock alleviates xarray lock
* contention and noticeably improves performance.
*/
- folio_end_writeback(ap->folios[i]);
+ iomap_finish_folio_write(inode, ap->folios[i], 1);
dec_wb_stat(&bdi->wb, WB_WRITEBACK);
wb_writeout_inc(&bdi->wb);
}
@@ -2019,19 +2019,20 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
}
static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
- uint32_t folio_index)
+ uint32_t folio_index, loff_t offset, unsigned len)
{
struct inode *inode = folio->mapping->host;
struct fuse_args_pages *ap = &wpa->ia.ap;
ap->folios[folio_index] = folio;
- ap->descs[folio_index].offset = 0;
- ap->descs[folio_index].length = folio_size(folio);
+ ap->descs[folio_index].offset = offset;
+ ap->descs[folio_index].length = len;
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
}
static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
+ size_t offset,
struct fuse_file *ff)
{
struct inode *inode = folio->mapping->host;
@@ -2044,7 +2045,7 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
return NULL;
fuse_writepage_add_to_bucket(fc, wpa);
- fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio), 0);
+ fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio) + offset, 0);
wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
wpa->inode = inode;
wpa->ia.ff = ff;
@@ -2070,7 +2071,7 @@ static int fuse_writepage_locked(struct folio *folio)
if (!ff)
goto err;
- wpa = fuse_writepage_args_setup(folio, ff);
+ wpa = fuse_writepage_args_setup(folio, 0, ff);
error = -ENOMEM;
if (!wpa)
goto err_writepage_args;
@@ -2079,7 +2080,7 @@ static int fuse_writepage_locked(struct folio *folio)
ap->num_folios = 1;
folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, 0);
+ fuse_writepage_args_page_fill(wpa, folio, 0, 0, folio_size(folio));
spin_lock(&fi->lock);
list_add_tail(&wpa->queue_entry, &fi->queued_writes);
@@ -2100,7 +2101,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
struct inode *inode;
unsigned int max_folios;
- unsigned int nr_pages;
+ unsigned int nr_bytes;
};
static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
@@ -2141,22 +2142,29 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
spin_unlock(&fi->lock);
}
-static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
- struct fuse_args_pages *ap,
+static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
struct fuse_fill_wb_data *data)
{
+ struct folio *prev_folio;
+ struct fuse_folio_desc prev_desc;
+ loff_t prev_pos;
+
WARN_ON(!ap->num_folios);
/* Reached max pages */
- if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
+ if ((data->nr_bytes + len) / PAGE_SIZE > fc->max_pages)
return true;
/* Reached max write bytes */
- if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
+ if (data->nr_bytes + len > fc->max_write)
return true;
/* Discontinuity */
- if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio->index)
+ prev_folio = ap->folios[ap->num_folios - 1];
+ prev_desc = ap->descs[ap->num_folios - 1];
+ prev_pos = folio_pos(prev_folio) + prev_desc.offset + prev_desc.length;
+ if (prev_pos != pos)
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2166,85 +2174,102 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
return false;
}
-static int fuse_writepages_fill(struct folio *folio,
- struct writeback_control *wbc, void *_data)
+static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 pos,
+ unsigned len, u64 end_pos)
{
- struct fuse_fill_wb_data *data = _data;
+ struct fuse_fill_wb_data *data = wpc->wb_ctx;
struct fuse_writepage_args *wpa = data->wpa;
struct fuse_args_pages *ap = &wpa->ia.ap;
struct inode *inode = data->inode;
struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = get_fuse_conn(inode);
- int err;
+ loff_t offset = offset_in_folio(folio, pos);
+
+ WARN_ON_ONCE(!data);
+ /* len will always be page aligned */
+ WARN_ON_ONCE(len & (PAGE_SIZE - 1));
if (!data->ff) {
- err = -EIO;
data->ff = fuse_write_file_get(fi);
if (!data->ff)
- goto out_unlock;
+ return -EIO;
}
- if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
+ if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
fuse_writepages_send(data);
data->wpa = NULL;
- data->nr_pages = 0;
+ data->nr_bytes = 0;
}
if (data->wpa == NULL) {
- err = -ENOMEM;
- wpa = fuse_writepage_args_setup(folio, data->ff);
+ wpa = fuse_writepage_args_setup(folio, offset, data->ff);
if (!wpa)
- goto out_unlock;
+ return -ENOMEM;
fuse_file_get(wpa->ia.ff);
data->max_folios = 1;
ap = &wpa->ia.ap;
}
- folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
- data->nr_pages += folio_nr_pages(folio);
+ iomap_start_folio_write(inode, folio, 1);
+ fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
+ offset, len);
+ data->nr_bytes += len;
- err = 0;
ap->num_folios++;
if (!data->wpa)
data->wpa = wpa;
-out_unlock:
- folio_unlock(folio);
- return err;
+ return len;
+}
+
+static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
+ int error)
+{
+ struct fuse_fill_wb_data *data = wpc->wb_ctx;
+
+ WARN_ON_ONCE(!data);
+
+ if (data->wpa) {
+ WARN_ON(!data->wpa->ia.ap.num_folios);
+ fuse_writepages_send(data);
+ }
+
+ if (data->ff)
+ fuse_file_put(data->ff, false);
+
+ return error;
}
+static const struct iomap_writeback_ops fuse_writeback_ops = {
+ .writeback_range = fuse_iomap_writeback_range,
+ .writeback_submit = fuse_iomap_writeback_submit,
+};
+
static int fuse_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- struct fuse_fill_wb_data data;
- int err;
+ struct fuse_fill_wb_data data = {
+ .inode = inode,
+ };
+ struct iomap_writepage_ctx wpc = {
+ .inode = inode,
+ .iomap.type = IOMAP_MAPPED,
+ .wbc = wbc,
+ .ops = &fuse_writeback_ops,
+ .wb_ctx = &data,
+ };
- err = -EIO;
if (fuse_is_bad(inode))
- goto out;
+ return -EIO;
if (wbc->sync_mode == WB_SYNC_NONE &&
fc->num_background >= fc->congestion_threshold)
return 0;
- data.inode = inode;
- data.wpa = NULL;
- data.ff = NULL;
- data.nr_pages = 0;
-
- err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
- if (data.wpa) {
- WARN_ON(!data.wpa->ia.ap.num_folios);
- fuse_writepages_send(&data);
- }
- if (data.ff)
- fuse_file_put(data.ff, false);
-
-out:
- return err;
+ return iomap_writepages(&wpc);
}
static int fuse_launder_folio(struct folio *folio)
@@ -3104,7 +3129,7 @@ static const struct address_space_operations fuse_file_aops = {
.readahead = fuse_readahead,
.writepages = fuse_writepages,
.launder_folio = fuse_launder_folio,
- .dirty_folio = filemap_dirty_folio,
+ .dirty_folio = iomap_dirty_folio,
.release_folio = iomap_release_folio,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
--
2.47.1
* [PATCH v4 3/5] fuse: use iomap for folio laundering
2025-07-09 22:10 [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback Joanne Koong
2025-07-09 22:10 ` [PATCH v4 1/5] fuse: use iomap for buffered writes Joanne Koong
2025-07-09 22:10 ` [PATCH v4 2/5] fuse: use iomap for writeback Joanne Koong
@ 2025-07-09 22:10 ` Joanne Koong
2025-07-09 22:10 ` [PATCH v4 4/5] fuse: hook into iomap for invalidating and checking partial uptodateness Joanne Koong
2025-07-09 22:10 ` [PATCH v4 5/5] fuse: refactor writeback to use iomap_writepage_ctx inode Joanne Koong
4 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2025-07-09 22:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Use iomap for folio laundering, which will do granular dirty
writeback when laundering a large folio.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 52 ++++++++++++--------------------------------------
1 file changed, 12 insertions(+), 40 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 70bbc8f26459..d7ee03fdccee 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2057,45 +2057,6 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
return wpa;
}
-static int fuse_writepage_locked(struct folio *folio)
-{
- struct address_space *mapping = folio->mapping;
- struct inode *inode = mapping->host;
- struct fuse_inode *fi = get_fuse_inode(inode);
- struct fuse_writepage_args *wpa;
- struct fuse_args_pages *ap;
- struct fuse_file *ff;
- int error = -EIO;
-
- ff = fuse_write_file_get(fi);
- if (!ff)
- goto err;
-
- wpa = fuse_writepage_args_setup(folio, 0, ff);
- error = -ENOMEM;
- if (!wpa)
- goto err_writepage_args;
-
- ap = &wpa->ia.ap;
- ap->num_folios = 1;
-
- folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, 0, 0, folio_size(folio));
-
- spin_lock(&fi->lock);
- list_add_tail(&wpa->queue_entry, &fi->queued_writes);
- fuse_flush_writepages(inode);
- spin_unlock(&fi->lock);
-
- return 0;
-
-err_writepage_args:
- fuse_file_put(ff, false);
-err:
- mapping_set_error(folio->mapping, error);
- return error;
-}
-
struct fuse_fill_wb_data {
struct fuse_writepage_args *wpa;
struct fuse_file *ff;
@@ -2275,8 +2236,19 @@ static int fuse_writepages(struct address_space *mapping,
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
+ struct fuse_fill_wb_data data = {
+ .inode = folio->mapping->host,
+ };
+ struct iomap_writepage_ctx wpc = {
+ .inode = folio->mapping->host,
+ .iomap.type = IOMAP_MAPPED,
+ .ops = &fuse_writeback_ops,
+ .wb_ctx = &data,
+ };
+
if (folio_clear_dirty_for_io(folio)) {
- err = fuse_writepage_locked(folio);
+ err = iomap_writeback_folio(&wpc, folio);
+ err = fuse_iomap_writeback_submit(&wpc, err);
if (!err)
folio_wait_writeback(folio);
}
--
2.47.1
* [PATCH v4 4/5] fuse: hook into iomap for invalidating and checking partial uptodateness
2025-07-09 22:10 [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback Joanne Koong
` (2 preceding siblings ...)
2025-07-09 22:10 ` [PATCH v4 3/5] fuse: use iomap for folio laundering Joanne Koong
@ 2025-07-09 22:10 ` Joanne Koong
2025-07-09 22:10 ` [PATCH v4 5/5] fuse: refactor writeback to use iomap_writepage_ctx inode Joanne Koong
4 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2025-07-09 22:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Hook into iomap_invalidate_folio() so that if the entire folio is being
invalidated during truncation, the dirty state is cleared and the folio
doesn't get written back. The folio's corresponding ifs struct will
also be freed.
Hook into iomap_is_partially_uptodate() since iomap tracks uptodateness
granularly when it does buffered writes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d7ee03fdccee..669789043a8e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3103,6 +3103,8 @@ static const struct address_space_operations fuse_file_aops = {
.launder_folio = fuse_launder_folio,
.dirty_folio = iomap_dirty_folio,
.release_folio = iomap_release_folio,
+ .invalidate_folio = iomap_invalidate_folio,
+ .is_partially_uptodate = iomap_is_partially_uptodate,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
.direct_IO = fuse_direct_IO,
--
2.47.1
* [PATCH v4 5/5] fuse: refactor writeback to use iomap_writepage_ctx inode
2025-07-09 22:10 [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback Joanne Koong
` (3 preceding siblings ...)
2025-07-09 22:10 ` [PATCH v4 4/5] fuse: hook into iomap for invalidating and checking partial uptodateness Joanne Koong
@ 2025-07-09 22:10 ` Joanne Koong
4 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2025-07-09 22:10 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
struct iomap_writepage_ctx includes a pointer to the file inode. In
writeback, use that instead of also passing the inode into
fuse_fill_wb_data.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 669789043a8e..e6745590ef1e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2060,21 +2060,20 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
struct fuse_fill_wb_data {
struct fuse_writepage_args *wpa;
struct fuse_file *ff;
- struct inode *inode;
unsigned int max_folios;
unsigned int nr_bytes;
};
-static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
+static bool fuse_pages_realloc(struct fuse_fill_wb_data *data,
+ unsigned int max_pages)
{
struct fuse_args_pages *ap = &data->wpa->ia.ap;
- struct fuse_conn *fc = get_fuse_conn(data->inode);
struct folio **folios;
struct fuse_folio_desc *descs;
unsigned int nfolios = min_t(unsigned int,
max_t(unsigned int, data->max_folios * 2,
FUSE_DEFAULT_MAX_PAGES_PER_REQ),
- fc->max_pages);
+ max_pages);
WARN_ON(nfolios <= data->max_folios);
folios = fuse_folios_alloc(nfolios, GFP_NOFS, &descs);
@@ -2091,10 +2090,10 @@ static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
return true;
}
-static void fuse_writepages_send(struct fuse_fill_wb_data *data)
+static void fuse_writepages_send(struct inode *inode,
+ struct fuse_fill_wb_data *data)
{
struct fuse_writepage_args *wpa = data->wpa;
- struct inode *inode = data->inode;
struct fuse_inode *fi = get_fuse_inode(inode);
spin_lock(&fi->lock);
@@ -2129,7 +2128,8 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
- if (ap->num_folios == data->max_folios && !fuse_pages_realloc(data))
+ if (ap->num_folios == data->max_folios &&
+ !fuse_pages_realloc(data, fc->max_pages))
return true;
return false;
@@ -2142,7 +2142,7 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
struct fuse_fill_wb_data *data = wpc->wb_ctx;
struct fuse_writepage_args *wpa = data->wpa;
struct fuse_args_pages *ap = &wpa->ia.ap;
- struct inode *inode = data->inode;
+ struct inode *inode = wpc->inode;
struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = get_fuse_conn(inode);
loff_t offset = offset_in_folio(folio, pos);
@@ -2158,7 +2158,7 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
}
if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
- fuse_writepages_send(data);
+ fuse_writepages_send(inode, data);
data->wpa = NULL;
data->nr_bytes = 0;
}
@@ -2193,7 +2193,7 @@ static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
if (data->wpa) {
WARN_ON(!data->wpa->ia.ap.num_folios);
- fuse_writepages_send(data);
+ fuse_writepages_send(wpc->inode, data);
}
if (data->ff)
@@ -2212,9 +2212,7 @@ static int fuse_writepages(struct address_space *mapping,
{
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- struct fuse_fill_wb_data data = {
- .inode = inode,
- };
+ struct fuse_fill_wb_data data = {};
struct iomap_writepage_ctx wpc = {
.inode = inode,
.iomap.type = IOMAP_MAPPED,
@@ -2236,9 +2234,7 @@ static int fuse_writepages(struct address_space *mapping,
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
- struct fuse_fill_wb_data data = {
- .inode = folio->mapping->host,
- };
+ struct fuse_fill_wb_data data = {};
struct iomap_writepage_ctx wpc = {
.inode = folio->mapping->host,
.iomap.type = IOMAP_MAPPED,
--
2.47.1
* Re: [PATCH v4 2/5] fuse: use iomap for writeback
2025-07-09 22:10 ` [PATCH v4 2/5] fuse: use iomap for writeback Joanne Koong
@ 2025-07-12 4:41 ` Darrick J. Wong
2025-07-14 21:43 ` Joanne Koong
0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2025-07-12 4:41 UTC (permalink / raw)
To: Joanne Koong; +Cc: linux-fsdevel, hch, miklos, brauner, anuj20.g, kernel-team
On Wed, Jul 09, 2025 at 03:10:20PM -0700, Joanne Koong wrote:
> Use iomap for dirty folio writeback in ->writepages().
> This allows for granular dirty writeback of large folios.
>
> Only the dirty portions of the large folio will be written instead of
> having to write out the entire folio. For example if there is a 1 MB
> large folio and only 2 bytes in it are dirty, only the page for those
> dirty bytes will be written out.
>
> .dirty_folio needs to be set to iomap_dirty_folio so that the bitmap
> iomap uses for dirty tracking correctly reflects dirty regions that need
> to be written back.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/fuse/file.c | 127 +++++++++++++++++++++++++++++--------------------
> 1 file changed, 76 insertions(+), 51 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index cadad61ef7df..70bbc8f26459 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1832,7 +1832,7 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
> * scope of the fi->lock alleviates xarray lock
> * contention and noticeably improves performance.
> */
> - folio_end_writeback(ap->folios[i]);
> + iomap_finish_folio_write(inode, ap->folios[i], 1);
> dec_wb_stat(&bdi->wb, WB_WRITEBACK);
> wb_writeout_inc(&bdi->wb);
> }
> @@ -2019,19 +2019,20 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
> }
>
> static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
> - uint32_t folio_index)
> + uint32_t folio_index, loff_t offset, unsigned len)
> {
> struct inode *inode = folio->mapping->host;
> struct fuse_args_pages *ap = &wpa->ia.ap;
>
> ap->folios[folio_index] = folio;
> - ap->descs[folio_index].offset = 0;
> - ap->descs[folio_index].length = folio_size(folio);
> + ap->descs[folio_index].offset = offset;
> + ap->descs[folio_index].length = len;
>
> inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
> }
>
> static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
> + size_t offset,
> struct fuse_file *ff)
> {
> struct inode *inode = folio->mapping->host;
> @@ -2044,7 +2045,7 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
> return NULL;
>
> fuse_writepage_add_to_bucket(fc, wpa);
> - fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio), 0);
> + fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio) + offset, 0);
> wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
> wpa->inode = inode;
> wpa->ia.ff = ff;
> @@ -2070,7 +2071,7 @@ static int fuse_writepage_locked(struct folio *folio)
> if (!ff)
> goto err;
>
> - wpa = fuse_writepage_args_setup(folio, ff);
> + wpa = fuse_writepage_args_setup(folio, 0, ff);
> error = -ENOMEM;
> if (!wpa)
> goto err_writepage_args;
> @@ -2079,7 +2080,7 @@ static int fuse_writepage_locked(struct folio *folio)
> ap->num_folios = 1;
>
> folio_start_writeback(folio);
> - fuse_writepage_args_page_fill(wpa, folio, 0);
> + fuse_writepage_args_page_fill(wpa, folio, 0, 0, folio_size(folio));
>
> spin_lock(&fi->lock);
> list_add_tail(&wpa->queue_entry, &fi->queued_writes);
> @@ -2100,7 +2101,7 @@ struct fuse_fill_wb_data {
> struct fuse_file *ff;
> struct inode *inode;
> unsigned int max_folios;
> - unsigned int nr_pages;
> + unsigned int nr_bytes;
I don't know if fuse servers are ever realistically going to end up with
a large number of 1M folios, but at least in theory iomap is capable of
queuing ~4096 folios into a single writeback context. Does this need to
account for that?
> };
>
> static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
> @@ -2141,22 +2142,29 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
> spin_unlock(&fi->lock);
> }
>
> -static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
> - struct fuse_args_pages *ap,
> +static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> + unsigned len, struct fuse_args_pages *ap,
> struct fuse_fill_wb_data *data)
> {
> + struct folio *prev_folio;
> + struct fuse_folio_desc prev_desc;
> + loff_t prev_pos;
> +
> WARN_ON(!ap->num_folios);
>
> /* Reached max pages */
> - if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
> + if ((data->nr_bytes + len) / PAGE_SIZE > fc->max_pages)
>> PAGE_SHIFT ?
Otherwise this looks decent to me.
--D
> return true;
>
> /* Reached max write bytes */
> - if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
> + if (data->nr_bytes + len > fc->max_write)
> return true;
>
> /* Discontinuity */
> - if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio->index)
> + prev_folio = ap->folios[ap->num_folios - 1];
> + prev_desc = ap->descs[ap->num_folios - 1];
> + prev_pos = folio_pos(prev_folio) + prev_desc.offset + prev_desc.length;
> + if (prev_pos != pos)
> return true;
>
> /* Need to grow the pages array? If so, did the expansion fail? */
> @@ -2166,85 +2174,102 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
> return false;
> }
>
> -static int fuse_writepages_fill(struct folio *folio,
> - struct writeback_control *wbc, void *_data)
> +static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> + struct folio *folio, u64 pos,
> + unsigned len, u64 end_pos)
> {
> - struct fuse_fill_wb_data *data = _data;
> + struct fuse_fill_wb_data *data = wpc->wb_ctx;
> struct fuse_writepage_args *wpa = data->wpa;
> struct fuse_args_pages *ap = &wpa->ia.ap;
> struct inode *inode = data->inode;
> struct fuse_inode *fi = get_fuse_inode(inode);
> struct fuse_conn *fc = get_fuse_conn(inode);
> - int err;
> + loff_t offset = offset_in_folio(folio, pos);
> +
> + WARN_ON_ONCE(!data);
> + /* len will always be page aligned */
> + WARN_ON_ONCE(len & (PAGE_SIZE - 1));
>
> if (!data->ff) {
> - err = -EIO;
> data->ff = fuse_write_file_get(fi);
> if (!data->ff)
> - goto out_unlock;
> + return -EIO;
> }
>
> - if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
> + if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
> fuse_writepages_send(data);
> data->wpa = NULL;
> - data->nr_pages = 0;
> + data->nr_bytes = 0;
> }
>
> if (data->wpa == NULL) {
> - err = -ENOMEM;
> - wpa = fuse_writepage_args_setup(folio, data->ff);
> + wpa = fuse_writepage_args_setup(folio, offset, data->ff);
> if (!wpa)
> - goto out_unlock;
> + return -ENOMEM;
> fuse_file_get(wpa->ia.ff);
> data->max_folios = 1;
> ap = &wpa->ia.ap;
> }
> - folio_start_writeback(folio);
>
> - fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
> - data->nr_pages += folio_nr_pages(folio);
> + iomap_start_folio_write(inode, folio, 1);
> + fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
> + offset, len);
> + data->nr_bytes += len;
>
> - err = 0;
> ap->num_folios++;
> if (!data->wpa)
> data->wpa = wpa;
> -out_unlock:
> - folio_unlock(folio);
>
> - return err;
> + return len;
> +}
> +
> +static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
> + int error)
> +{
> + struct fuse_fill_wb_data *data = wpc->wb_ctx;
> +
> + WARN_ON_ONCE(!data);
> +
> + if (data->wpa) {
> + WARN_ON(!data->wpa->ia.ap.num_folios);
> + fuse_writepages_send(data);
> + }
> +
> + if (data->ff)
> + fuse_file_put(data->ff, false);
> +
> + return error;
> }
>
> +static const struct iomap_writeback_ops fuse_writeback_ops = {
> + .writeback_range = fuse_iomap_writeback_range,
> + .writeback_submit = fuse_iomap_writeback_submit,
> +};
> +
> static int fuse_writepages(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> struct inode *inode = mapping->host;
> struct fuse_conn *fc = get_fuse_conn(inode);
> - struct fuse_fill_wb_data data;
> - int err;
> + struct fuse_fill_wb_data data = {
> + .inode = inode,
> + };
> + struct iomap_writepage_ctx wpc = {
> + .inode = inode,
> + .iomap.type = IOMAP_MAPPED,
> + .wbc = wbc,
> + .ops = &fuse_writeback_ops,
> + .wb_ctx = &data,
> + };
>
> - err = -EIO;
> if (fuse_is_bad(inode))
> - goto out;
> + return -EIO;
>
> if (wbc->sync_mode == WB_SYNC_NONE &&
> fc->num_background >= fc->congestion_threshold)
> return 0;
>
> - data.inode = inode;
> - data.wpa = NULL;
> - data.ff = NULL;
> - data.nr_pages = 0;
> -
> - err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
> - if (data.wpa) {
> - WARN_ON(!data.wpa->ia.ap.num_folios);
> - fuse_writepages_send(&data);
> - }
> - if (data.ff)
> - fuse_file_put(data.ff, false);
> -
> -out:
> - return err;
> + return iomap_writepages(&wpc);
> }
>
> static int fuse_launder_folio(struct folio *folio)
> @@ -3104,7 +3129,7 @@ static const struct address_space_operations fuse_file_aops = {
> .readahead = fuse_readahead,
> .writepages = fuse_writepages,
> .launder_folio = fuse_launder_folio,
> - .dirty_folio = filemap_dirty_folio,
> + .dirty_folio = iomap_dirty_folio,
> .release_folio = iomap_release_folio,
> .migrate_folio = filemap_migrate_folio,
> .bmap = fuse_bmap,
> --
> 2.47.1
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v4 1/5] fuse: use iomap for buffered writes
2025-07-09 22:10 ` [PATCH v4 1/5] fuse: use iomap for buffered writes Joanne Koong
@ 2025-07-12 4:46 ` Darrick J. Wong
2025-07-12 6:13 ` Amir Goldstein
2025-07-14 11:40 ` Christoph Hellwig
0 siblings, 2 replies; 11+ messages in thread
From: Darrick J. Wong @ 2025-07-12 4:46 UTC (permalink / raw)
To: Joanne Koong; +Cc: linux-fsdevel, hch, miklos, brauner, anuj20.g, kernel-team
On Wed, Jul 09, 2025 at 03:10:19PM -0700, Joanne Koong wrote:
> Have buffered writes go through iomap. This has two advantages:
> * granular large folio synchronous reads
> * granular large folio dirty tracking
>
> If for example there is a 1 MB large folio and a write issued at pos 1
> to pos 1 MB - 2, only the head and tail pages will need to be read in
> and marked uptodate instead of the entire folio needing to be read in.
> Non-relevant trailing pages are also skipped (eg if for a 1 MB large
> folio a write is issued at pos 1 to 4099, only the first two pages are
> read in and the ones after that are skipped).
>
> iomap also has granular dirty tracking. This is useful in that when it
> comes to writeback time, only the dirty portions of the large folio will
> be written instead of having to write out the entire folio. For example
> if there is a 1 MB large folio and only 2 bytes in it are dirty, only
> the page for those dirty bytes get written out. Please note that
> granular writeback is only done once fuse also uses iomap in writeback
> (separate commit).
>
> .release_folio needs to be set to iomap_release_folio so that any
> allocated iomap ifs structs get freed.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/fuse/Kconfig | 1 +
> fs/fuse/file.c | 148 ++++++++++++++++++------------------------------
> 2 files changed, 55 insertions(+), 94 deletions(-)
>
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index ca215a3cba3e..a774166264de 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -2,6 +2,7 @@
> config FUSE_FS
> tristate "FUSE (Filesystem in Userspace) support"
> select FS_POSIX_ACL
> + select FS_IOMAP
> help
> With FUSE it is possible to implement a fully functional filesystem
> in a userspace program.
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 47006d0753f1..cadad61ef7df 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -21,6 +21,7 @@
> #include <linux/filelock.h>
> #include <linux/splice.h>
> #include <linux/task_io_accounting_ops.h>
> +#include <linux/iomap.h>
>
> static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
> unsigned int open_flags, int opcode,
> @@ -788,12 +789,16 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
> }
> }
>
> -static int fuse_do_readfolio(struct file *file, struct folio *folio)
> +static int fuse_do_readfolio(struct file *file, struct folio *folio,
> + size_t off, size_t len)
> {
> struct inode *inode = folio->mapping->host;
> struct fuse_mount *fm = get_fuse_mount(inode);
> - loff_t pos = folio_pos(folio);
> - struct fuse_folio_desc desc = { .length = folio_size(folio) };
> + loff_t pos = folio_pos(folio) + off;
> + struct fuse_folio_desc desc = {
> + .offset = off,
> + .length = len,
> + };
> struct fuse_io_args ia = {
> .ap.args.page_zeroing = true,
> .ap.args.out_pages = true,
> @@ -820,8 +825,6 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
> if (res < desc.length)
> fuse_short_read(inode, attr_ver, res, &ia.ap);
>
> - folio_mark_uptodate(folio);
> -
> return 0;
> }
>
> @@ -834,13 +837,26 @@ static int fuse_read_folio(struct file *file, struct folio *folio)
> if (fuse_is_bad(inode))
> goto out;
>
> - err = fuse_do_readfolio(file, folio);
> + err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
> + if (!err)
> + folio_mark_uptodate(folio);
> +
> fuse_invalidate_atime(inode);
> out:
> folio_unlock(folio);
> return err;
> }
>
> +static int fuse_iomap_read_folio_range(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos,
> + size_t len)
> +{
> + struct file *file = iter->private;
> + size_t off = offset_in_folio(folio, pos);
> +
> + return fuse_do_readfolio(file, folio, off, len);
> +}
> +
> static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> int err)
> {
> @@ -1374,6 +1390,24 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
> }
> }
>
> +static const struct iomap_write_ops fuse_iomap_write_ops = {
> + .read_folio_range = fuse_iomap_read_folio_range,
> +};
> +
> +static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> + unsigned int flags, struct iomap *iomap,
> + struct iomap *srcmap)
> +{
> + iomap->type = IOMAP_MAPPED;
> + iomap->length = length;
> + iomap->offset = offset;
> + return 0;
> +}
> +
> +static const struct iomap_ops fuse_iomap_ops = {
> + .iomap_begin = fuse_iomap_begin,
> +};
> +
> static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> struct file *file = iocb->ki_filp;
> @@ -1383,6 +1417,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct inode *inode = mapping->host;
> ssize_t err, count;
> struct fuse_conn *fc = get_fuse_conn(inode);
> + bool writeback = false;
>
> if (fc->writeback_cache) {
> /* Update size (EOF optimization) and mode (SUID clearing) */
> @@ -1391,16 +1426,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> if (err)
> return err;
>
> - if (fc->handle_killpriv_v2 &&
> - setattr_should_drop_suidgid(idmap,
> - file_inode(file))) {
> - goto writethrough;
> - }
> -
> - return generic_file_write_iter(iocb, from);
> + if (!fc->handle_killpriv_v2 ||
> + !setattr_should_drop_suidgid(idmap, file_inode(file)))
> + writeback = true;
> }
>
> -writethrough:
> inode_lock(inode);
>
> err = count = generic_write_checks(iocb, from);
> @@ -1419,6 +1449,15 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> goto out;
> written = direct_write_fallback(iocb, from, written,
> fuse_perform_write(iocb, from));
Random unrelated question: does anyone know why fuse handles IOCB_DIRECT
in its fuse_cache_{read,write}_iter functions and /also/ sets
->direct_IO? I thought filesystems only did one or the other, not both.
Anyway your changes look reasonable to me, so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> + } else if (writeback) {
> + /*
> + * Use iomap so that we can do granular uptodate reads
> + * and granular dirty tracking for large folios.
> + */
> + written = iomap_file_buffered_write(iocb, from,
> + &fuse_iomap_ops,
> + &fuse_iomap_write_ops,
> + file);
> } else {
> written = fuse_perform_write(iocb, from);
> }
> @@ -2208,84 +2247,6 @@ static int fuse_writepages(struct address_space *mapping,
> return err;
> }
>
> -/*
> - * It's worthy to make sure that space is reserved on disk for the write,
> - * but how to implement it without killing performance need more thinking.
> - */
> -static int fuse_write_begin(struct file *file, struct address_space *mapping,
> - loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
> -{
> - pgoff_t index = pos >> PAGE_SHIFT;
> - struct fuse_conn *fc = get_fuse_conn(file_inode(file));
> - struct folio *folio;
> - loff_t fsize;
> - int err = -ENOMEM;
> -
> - WARN_ON(!fc->writeback_cache);
> -
> - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> - mapping_gfp_mask(mapping));
> - if (IS_ERR(folio))
> - goto error;
> -
> - if (folio_test_uptodate(folio) || len >= folio_size(folio))
> - goto success;
> - /*
> - * Check if the start of this folio comes after the end of file,
> - * in which case the readpage can be optimized away.
> - */
> - fsize = i_size_read(mapping->host);
> - if (fsize <= folio_pos(folio)) {
> - size_t off = offset_in_folio(folio, pos);
> - if (off)
> - folio_zero_segment(folio, 0, off);
> - goto success;
> - }
> - err = fuse_do_readfolio(file, folio);
> - if (err)
> - goto cleanup;
> -success:
> - *foliop = folio;
> - return 0;
> -
> -cleanup:
> - folio_unlock(folio);
> - folio_put(folio);
> -error:
> - return err;
> -}
> -
> -static int fuse_write_end(struct file *file, struct address_space *mapping,
> - loff_t pos, unsigned len, unsigned copied,
> - struct folio *folio, void *fsdata)
> -{
> - struct inode *inode = folio->mapping->host;
> -
> - /* Haven't copied anything? Skip zeroing, size extending, dirtying. */
> - if (!copied)
> - goto unlock;
> -
> - pos += copied;
> - if (!folio_test_uptodate(folio)) {
> - /* Zero any unwritten bytes at the end of the page */
> - size_t endoff = pos & ~PAGE_MASK;
> - if (endoff)
> - folio_zero_segment(folio, endoff, PAGE_SIZE);
> - folio_mark_uptodate(folio);
> - }
> -
> - if (pos > inode->i_size)
> - i_size_write(inode, pos);
> -
> - folio_mark_dirty(folio);
> -
> -unlock:
> - folio_unlock(folio);
> - folio_put(folio);
> -
> - return copied;
> -}
> -
> static int fuse_launder_folio(struct folio *folio)
> {
> int err = 0;
> @@ -3144,11 +3105,10 @@ static const struct address_space_operations fuse_file_aops = {
> .writepages = fuse_writepages,
> .launder_folio = fuse_launder_folio,
> .dirty_folio = filemap_dirty_folio,
> + .release_folio = iomap_release_folio,
> .migrate_folio = filemap_migrate_folio,
> .bmap = fuse_bmap,
> .direct_IO = fuse_direct_IO,
> - .write_begin = fuse_write_begin,
> - .write_end = fuse_write_end,
> };
>
> void fuse_init_file_inode(struct inode *inode, unsigned int flags)
> --
> 2.47.1
>
>
* Re: [PATCH v4 1/5] fuse: use iomap for buffered writes
2025-07-12 4:46 ` Darrick J. Wong
@ 2025-07-12 6:13 ` Amir Goldstein
2025-07-14 11:40 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2025-07-12 6:13 UTC (permalink / raw)
To: Darrick J. Wong, Bernd Schubert
Cc: Joanne Koong, linux-fsdevel, hch, miklos, brauner, anuj20.g,
kernel-team
On Sat, Jul 12, 2025 at 6:46 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, Jul 09, 2025 at 03:10:19PM -0700, Joanne Koong wrote:
> > Have buffered writes go through iomap. This has two advantages:
> > * granular large folio synchronous reads
> > * granular large folio dirty tracking
> >
> > If for example there is a 1 MB large folio and a write issued at pos 1
> > to pos 1 MB - 2, only the head and tail pages will need to be read in
> > and marked uptodate instead of the entire folio needing to be read in.
> > Non-relevant trailing pages are also skipped (eg if for a 1 MB large
> > folio a write is issued at pos 1 to 4099, only the first two pages are
> > read in and the ones after that are skipped).
> >
> > iomap also has granular dirty tracking. This is useful in that when it
> > comes to writeback time, only the dirty portions of the large folio will
> > be written instead of having to write out the entire folio. For example
> > if there is a 1 MB large folio and only 2 bytes in it are dirty, only
> > the page for those dirty bytes get written out. Please note that
> > granular writeback is only done once fuse also uses iomap in writeback
> > (separate commit).
> >
> > .release_folio needs to be set to iomap_release_folio so that any
> > allocated iomap ifs structs get freed.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > fs/fuse/Kconfig | 1 +
> > fs/fuse/file.c | 148 ++++++++++++++++++------------------------------
> > 2 files changed, 55 insertions(+), 94 deletions(-)
> >
> > diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> > index ca215a3cba3e..a774166264de 100644
> > --- a/fs/fuse/Kconfig
> > +++ b/fs/fuse/Kconfig
> > @@ -2,6 +2,7 @@
> > config FUSE_FS
> > tristate "FUSE (Filesystem in Userspace) support"
> > select FS_POSIX_ACL
> > + select FS_IOMAP
> > help
> > With FUSE it is possible to implement a fully functional filesystem
> > in a userspace program.
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 47006d0753f1..cadad61ef7df 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -21,6 +21,7 @@
> > #include <linux/filelock.h>
> > #include <linux/splice.h>
> > #include <linux/task_io_accounting_ops.h>
> > +#include <linux/iomap.h>
> >
> > static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
> > unsigned int open_flags, int opcode,
> > @@ -788,12 +789,16 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
> > }
> > }
> >
> > -static int fuse_do_readfolio(struct file *file, struct folio *folio)
> > +static int fuse_do_readfolio(struct file *file, struct folio *folio,
> > + size_t off, size_t len)
> > {
> > struct inode *inode = folio->mapping->host;
> > struct fuse_mount *fm = get_fuse_mount(inode);
> > - loff_t pos = folio_pos(folio);
> > - struct fuse_folio_desc desc = { .length = folio_size(folio) };
> > + loff_t pos = folio_pos(folio) + off;
> > + struct fuse_folio_desc desc = {
> > + .offset = off,
> > + .length = len,
> > + };
> > struct fuse_io_args ia = {
> > .ap.args.page_zeroing = true,
> > .ap.args.out_pages = true,
> > @@ -820,8 +825,6 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
> > if (res < desc.length)
> > fuse_short_read(inode, attr_ver, res, &ia.ap);
> >
> > - folio_mark_uptodate(folio);
> > -
> > return 0;
> > }
> >
> > @@ -834,13 +837,26 @@ static int fuse_read_folio(struct file *file, struct folio *folio)
> > if (fuse_is_bad(inode))
> > goto out;
> >
> > - err = fuse_do_readfolio(file, folio);
> > + err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
> > + if (!err)
> > + folio_mark_uptodate(folio);
> > +
> > fuse_invalidate_atime(inode);
> > out:
> > folio_unlock(folio);
> > return err;
> > }
> >
> > +static int fuse_iomap_read_folio_range(const struct iomap_iter *iter,
> > + struct folio *folio, loff_t pos,
> > + size_t len)
> > +{
> > + struct file *file = iter->private;
> > + size_t off = offset_in_folio(folio, pos);
> > +
> > + return fuse_do_readfolio(file, folio, off, len);
> > +}
> > +
> > static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> > int err)
> > {
> > @@ -1374,6 +1390,24 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
> > }
> > }
> >
> > +static const struct iomap_write_ops fuse_iomap_write_ops = {
> > + .read_folio_range = fuse_iomap_read_folio_range,
> > +};
> > +
> > +static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > + unsigned int flags, struct iomap *iomap,
> > + struct iomap *srcmap)
> > +{
> > + iomap->type = IOMAP_MAPPED;
> > + iomap->length = length;
> > + iomap->offset = offset;
> > + return 0;
> > +}
> > +
> > +static const struct iomap_ops fuse_iomap_ops = {
> > + .iomap_begin = fuse_iomap_begin,
> > +};
> > +
> > static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > {
> > struct file *file = iocb->ki_filp;
> > @@ -1383,6 +1417,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > struct inode *inode = mapping->host;
> > ssize_t err, count;
> > struct fuse_conn *fc = get_fuse_conn(inode);
> > + bool writeback = false;
> >
> > if (fc->writeback_cache) {
> > /* Update size (EOF optimization) and mode (SUID clearing) */
> > @@ -1391,16 +1426,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > if (err)
> > return err;
> >
> > - if (fc->handle_killpriv_v2 &&
> > - setattr_should_drop_suidgid(idmap,
> > - file_inode(file))) {
> > - goto writethrough;
> > - }
> > -
> > - return generic_file_write_iter(iocb, from);
> > + if (!fc->handle_killpriv_v2 ||
> > + !setattr_should_drop_suidgid(idmap, file_inode(file)))
> > + writeback = true;
> > }
> >
> > -writethrough:
> > inode_lock(inode);
> >
> > err = count = generic_write_checks(iocb, from);
> > @@ -1419,6 +1449,15 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > goto out;
> > written = direct_write_fallback(iocb, from, written,
> > fuse_perform_write(iocb, from));
>
> Random unrelated question: does anyone know why fuse handles IOCB_DIRECT
> in its fuse_cache_{read,write}_iter functions and /also/ sets
> ->direct_IO? I thought filesystems only did one or the other, not both.
>
I think it has to do with the difference in handling async aio and sync aio
and the difference between user requested O_DIRECT and server
requested FOPEN_DIRECT_IO.
I think Bernd had some patches to further unify the related code.
Thanks,
Amir.
* Re: [PATCH v4 1/5] fuse: use iomap for buffered writes
2025-07-12 4:46 ` Darrick J. Wong
2025-07-12 6:13 ` Amir Goldstein
@ 2025-07-14 11:40 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2025-07-14 11:40 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Joanne Koong, linux-fsdevel, hch, miklos, brauner, anuj20.g,
kernel-team
On Fri, Jul 11, 2025 at 09:46:11PM -0700, Darrick J. Wong wrote:
[fullquote deleted. Any chance you could only quote the actually relevant
parts as per usual email etiquette?]
> > @@ -1419,6 +1449,15 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > goto out;
> > written = direct_write_fallback(iocb, from, written,
> > fuse_perform_write(iocb, from));
>
> Random unrelated question: does anyone know why fuse handles IOCB_DIRECT
> in its fuse_cache_{read,write}_iter functions and /also/ sets
> ->direct_IO? I thought filesystems only did one or the other, not both.
Nothing really should be setting ->direct_IO these days except for
legacy reasons. It's another one of those methods that aren't methods
but just callbacks that require file system specific context.
* Re: [PATCH v4 2/5] fuse: use iomap for writeback
2025-07-12 4:41 ` Darrick J. Wong
@ 2025-07-14 21:43 ` Joanne Koong
0 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2025-07-14 21:43 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-fsdevel, hch, miklos, brauner, anuj20.g, kernel-team
On Fri, Jul 11, 2025 at 9:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, Jul 09, 2025 at 03:10:20PM -0700, Joanne Koong wrote:
> > Use iomap for dirty folio writeback in ->writepages().
> > This allows for granular dirty writeback of large folios.
> >
> > Only the dirty portions of the large folio will be written instead of
> > having to write out the entire folio. For example if there is a 1 MB
> > large folio and only 2 bytes in it are dirty, only the page for those
> > dirty bytes will be written out.
> >
> > .dirty_folio needs to be set to iomap_dirty_folio so that the bitmap
> > iomap uses for dirty tracking correctly reflects dirty regions that need
> > to be written back.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > fs/fuse/file.c | 127 +++++++++++++++++++++++++++++--------------------
> > 1 file changed, 76 insertions(+), 51 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index cadad61ef7df..70bbc8f26459 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -2100,7 +2101,7 @@ struct fuse_fill_wb_data {
> > struct fuse_file *ff;
> > struct inode *inode;
> > unsigned int max_folios;
> > - unsigned int nr_pages;
> > + unsigned int nr_bytes;
>
> I don't know if fuse servers are ever realistically going to end up with
> a large number of 1M folios, but at least in theory iomap is capable of
> queuing ~4096 folios into a single writeback context. Does this need to
> account for that?
In fuse_writepage_need_send(), the writeback request gets sent out if
max pages would be exceeded (eg if ((data->nr_bytes + len) / PAGE_SIZE >
fc->max_pages)). max_pages is capped at 65535, which gives a byte limit
of ~256 MB (ie 65535 * PAGE_SIZE), so I think having unsigned int here
for nr_bytes is okay.
>
> > };
> >
> > static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
> > @@ -2141,22 +2142,29 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
> > spin_unlock(&fi->lock);
> > }
> >
> > -static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
> > - struct fuse_args_pages *ap,
> > +static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> > + unsigned len, struct fuse_args_pages *ap,
> > struct fuse_fill_wb_data *data)
> > {
> > + struct folio *prev_folio;
> > + struct fuse_folio_desc prev_desc;
> > + loff_t prev_pos;
> > +
> > WARN_ON(!ap->num_folios);
> >
> > /* Reached max pages */
> > - if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
> > + if ((data->nr_bytes + len) / PAGE_SIZE > fc->max_pages)
>
> >> PAGE_SHIFT ?
Nice, I'll change this to >> PAGE_SHIFT. Thanks for looking through
the patchset.
>
> Otherwise this looks decent to me.
>
> --D
>
> > return true;
end of thread, other threads:[~2025-07-14 21:43 UTC | newest]
Thread overview: 11+ messages
2025-07-09 22:10 [PATCH v4 0/5] fuse: use iomap for buffered writes + writeback Joanne Koong
2025-07-09 22:10 ` [PATCH v4 1/5] fuse: use iomap for buffered writes Joanne Koong
2025-07-12 4:46 ` Darrick J. Wong
2025-07-12 6:13 ` Amir Goldstein
2025-07-14 11:40 ` Christoph Hellwig
2025-07-09 22:10 ` [PATCH v4 2/5] fuse: use iomap for writeback Joanne Koong
2025-07-12 4:41 ` Darrick J. Wong
2025-07-14 21:43 ` Joanne Koong
2025-07-09 22:10 ` [PATCH v4 3/5] fuse: use iomap for folio laundering Joanne Koong
2025-07-09 22:10 ` [PATCH v4 4/5] fuse: hook into iomap for invalidating and checking partial uptodateness Joanne Koong
2025-07-09 22:10 ` [PATCH v4 5/5] fuse: refactor writeback to use iomap_writepage_ctx inode Joanne Koong