* [PATCH v5 1/5] fuse: use iomap for buffered writes
From: Joanne Koong @ 2025-07-15 20:21 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Have buffered writes go through iomap. This has two advantages:
* granular large folio synchronous reads
* granular large folio dirty tracking
For example, if there is a 1 MB large folio and a write is issued from
pos 1 to pos 1 MB - 2, only the head and tail pages need to be read in
and marked uptodate instead of the entire folio. Non-relevant trailing
pages are also skipped (e.g. for a 1 MB large folio, if a write is
issued from pos 1 to 4099, only the first two pages are read in and the
ones after that are skipped).
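As a rough userspace sketch of the page arithmetic in that example (the
4k page size, 1 MB folio size, and write range are illustrative values
only, not fuse/iomap code):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SZ		4096ULL
#define FOLIO_SZ	(1024ULL * 1024)	/* 1 MB large folio */

int main(void)
{
	uint64_t pos = 1, end = 4100;		/* write covers bytes 1..4099 */
	uint64_t first = pos / PAGE_SZ;		/* first page touched */
	uint64_t last = (end - 1) / PAGE_SZ;	/* last page touched */

	/* Only partially overwritten pages need to be read in. */
	for (uint64_t p = first; p <= last; p++) {
		uint64_t pstart = p * PAGE_SZ, pend = pstart + PAGE_SZ;
		int full = pos <= pstart && end >= pend;

		printf("page %llu: %s\n", (unsigned long long)p,
		       full ? "fully overwritten, no read" : "read in");
	}
	printf("pages %llu..%llu: skipped\n",
	       (unsigned long long)(last + 1),
	       (unsigned long long)(FOLIO_SZ / PAGE_SZ - 1));
	return 0;
}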
iomap also has granular dirty tracking. This is useful in that when it
comes to writeback time, only the dirty portions of the large folio will
be written instead of the entire folio. For example, if there is a 1 MB
large folio and only 2 bytes in it are dirty, only the page containing
those dirty bytes gets written out. Please note that granular writeback
only happens once fuse also uses iomap for writeback (separate commit).
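A minimal userspace sketch of the per-block dirty bitmap idea (the 4k
block size and 2-byte write are illustrative; iomap's real per-folio
state, the ifs struct mentioned below, is more involved):

#include <stdio.h>
#include <stdbool.h>

#define BLK_SZ	4096
#define NBLKS	(1024 * 1024 / BLK_SZ)	/* 1 MB folio, 4k blocks */

static bool dirty[NBLKS];

static void mark_dirty(unsigned int off, unsigned int len)
{
	for (unsigned int b = off / BLK_SZ; b <= (off + len - 1) / BLK_SZ; b++)
		dirty[b] = true;
}

int main(void)
{
	mark_dirty(8192, 2);	/* 2 dirty bytes at offset 8192 */

	/* Writeback only touches blocks whose dirty bit is set. */
	for (unsigned int b = 0; b < NBLKS; b++)
		if (dirty[b])
			printf("write back bytes %u..%u\n",
			       b * BLK_SZ, (b + 1) * BLK_SZ - 1);
	return 0;
}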
.release_folio needs to be set to iomap_release_folio so that any
allocated iomap ifs structs get freed.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/Kconfig | 1 +
fs/fuse/file.c | 148 ++++++++++++++++++------------------------------
2 files changed, 55 insertions(+), 94 deletions(-)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index ca215a3cba3e..a774166264de 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -2,6 +2,7 @@
config FUSE_FS
tristate "FUSE (Filesystem in Userspace) support"
select FS_POSIX_ACL
+ select FS_IOMAP
help
With FUSE it is possible to implement a fully functional filesystem
in a userspace program.
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 47006d0753f1..cadad61ef7df 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -21,6 +21,7 @@
#include <linux/filelock.h>
#include <linux/splice.h>
#include <linux/task_io_accounting_ops.h>
+#include <linux/iomap.h>
static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, int opcode,
@@ -788,12 +789,16 @@ static void fuse_short_read(struct inode *inode, u64 attr_ver, size_t num_read,
}
}
-static int fuse_do_readfolio(struct file *file, struct folio *folio)
+static int fuse_do_readfolio(struct file *file, struct folio *folio,
+ size_t off, size_t len)
{
struct inode *inode = folio->mapping->host;
struct fuse_mount *fm = get_fuse_mount(inode);
- loff_t pos = folio_pos(folio);
- struct fuse_folio_desc desc = { .length = folio_size(folio) };
+ loff_t pos = folio_pos(folio) + off;
+ struct fuse_folio_desc desc = {
+ .offset = off,
+ .length = len,
+ };
struct fuse_io_args ia = {
.ap.args.page_zeroing = true,
.ap.args.out_pages = true,
@@ -820,8 +825,6 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
if (res < desc.length)
fuse_short_read(inode, attr_ver, res, &ia.ap);
- folio_mark_uptodate(folio);
-
return 0;
}
@@ -834,13 +837,26 @@ static int fuse_read_folio(struct file *file, struct folio *folio)
if (fuse_is_bad(inode))
goto out;
- err = fuse_do_readfolio(file, folio);
+ err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
+ if (!err)
+ folio_mark_uptodate(folio);
+
fuse_invalidate_atime(inode);
out:
folio_unlock(folio);
return err;
}
+static int fuse_iomap_read_folio_range(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos,
+ size_t len)
+{
+ struct file *file = iter->private;
+ size_t off = offset_in_folio(folio, pos);
+
+ return fuse_do_readfolio(file, folio, off, len);
+}
+
static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
int err)
{
@@ -1374,6 +1390,24 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
}
}
+static const struct iomap_write_ops fuse_iomap_write_ops = {
+ .read_folio_range = fuse_iomap_read_folio_range,
+};
+
+static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ iomap->type = IOMAP_MAPPED;
+ iomap->length = length;
+ iomap->offset = offset;
+ return 0;
+}
+
+static const struct iomap_ops fuse_iomap_ops = {
+ .iomap_begin = fuse_iomap_begin,
+};
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1383,6 +1417,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = mapping->host;
ssize_t err, count;
struct fuse_conn *fc = get_fuse_conn(inode);
+ bool writeback = false;
if (fc->writeback_cache) {
/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1391,16 +1426,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (err)
return err;
- if (fc->handle_killpriv_v2 &&
- setattr_should_drop_suidgid(idmap,
- file_inode(file))) {
- goto writethrough;
- }
-
- return generic_file_write_iter(iocb, from);
+ if (!fc->handle_killpriv_v2 ||
+ !setattr_should_drop_suidgid(idmap, file_inode(file)))
+ writeback = true;
}
-writethrough:
inode_lock(inode);
err = count = generic_write_checks(iocb, from);
@@ -1419,6 +1449,15 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
written = direct_write_fallback(iocb, from, written,
fuse_perform_write(iocb, from));
+ } else if (writeback) {
+ /*
+ * Use iomap so that we can do granular uptodate reads
+ * and granular dirty tracking for large folios.
+ */
+ written = iomap_file_buffered_write(iocb, from,
+ &fuse_iomap_ops,
+ &fuse_iomap_write_ops,
+ file);
} else {
written = fuse_perform_write(iocb, from);
}
@@ -2208,84 +2247,6 @@ static int fuse_writepages(struct address_space *mapping,
return err;
}
-/*
- * It's worthy to make sure that space is reserved on disk for the write,
- * but how to implement it without killing performance need more thinking.
- */
-static int fuse_write_begin(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
-{
- pgoff_t index = pos >> PAGE_SHIFT;
- struct fuse_conn *fc = get_fuse_conn(file_inode(file));
- struct folio *folio;
- loff_t fsize;
- int err = -ENOMEM;
-
- WARN_ON(!fc->writeback_cache);
-
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
- mapping_gfp_mask(mapping));
- if (IS_ERR(folio))
- goto error;
-
- if (folio_test_uptodate(folio) || len >= folio_size(folio))
- goto success;
- /*
- * Check if the start of this folio comes after the end of file,
- * in which case the readpage can be optimized away.
- */
- fsize = i_size_read(mapping->host);
- if (fsize <= folio_pos(folio)) {
- size_t off = offset_in_folio(folio, pos);
- if (off)
- folio_zero_segment(folio, 0, off);
- goto success;
- }
- err = fuse_do_readfolio(file, folio);
- if (err)
- goto cleanup;
-success:
- *foliop = folio;
- return 0;
-
-cleanup:
- folio_unlock(folio);
- folio_put(folio);
-error:
- return err;
-}
-
-static int fuse_write_end(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, unsigned copied,
- struct folio *folio, void *fsdata)
-{
- struct inode *inode = folio->mapping->host;
-
- /* Haven't copied anything? Skip zeroing, size extending, dirtying. */
- if (!copied)
- goto unlock;
-
- pos += copied;
- if (!folio_test_uptodate(folio)) {
- /* Zero any unwritten bytes at the end of the page */
- size_t endoff = pos & ~PAGE_MASK;
- if (endoff)
- folio_zero_segment(folio, endoff, PAGE_SIZE);
- folio_mark_uptodate(folio);
- }
-
- if (pos > inode->i_size)
- i_size_write(inode, pos);
-
- folio_mark_dirty(folio);
-
-unlock:
- folio_unlock(folio);
- folio_put(folio);
-
- return copied;
-}
-
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
@@ -3144,11 +3105,10 @@ static const struct address_space_operations fuse_file_aops = {
.writepages = fuse_writepages,
.launder_folio = fuse_launder_folio,
.dirty_folio = filemap_dirty_folio,
+ .release_folio = iomap_release_folio,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
.direct_IO = fuse_direct_IO,
- .write_begin = fuse_write_begin,
- .write_end = fuse_write_end,
};
void fuse_init_file_inode(struct inode *inode, unsigned int flags)
--
2.47.1
* [PATCH v5 2/5] fuse: use iomap for writeback
From: Joanne Koong @ 2025-07-15 20:21 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Use iomap for dirty folio writeback in ->writepages().
This allows for granular dirty writeback of large folios.
Only the dirty portions of the large folio will be written instead of
the entire folio. For example, if there is a 1 MB large folio and only 2
bytes in it are dirty, only the page containing those dirty bytes will
be written out.
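As a userspace sketch of how the dirty, page-aligned ranges handed back
by iomap get batched into write requests (the ranges and the 64k cap are
made-up example values; the real limits are fc->max_pages and
fc->max_write checked in fuse_writepage_need_send() below):

#include <stdio.h>

struct range {
	unsigned long long pos;
	unsigned int len;
};

int main(void)
{
	/* Dirty ranges as ->writeback_range would see them. */
	struct range r[] = {
		{ 0,     8192 },	/* dirty head of a folio */
		{ 8192,  4096 },	/* contiguous: same request */
		{ 65536, 4096 },	/* discontinuity: new request */
	};
	const unsigned int max_write = 64 * 1024;	/* example cap */
	unsigned long long next_pos = 0;
	unsigned int nr_bytes = 0, req = 0;

	for (size_t i = 0; i < sizeof(r) / sizeof(r[0]); i++) {
		if (nr_bytes &&
		    (nr_bytes + r[i].len > max_write || r[i].pos != next_pos)) {
			printf("send request %u: %u bytes\n", req++, nr_bytes);
			nr_bytes = 0;
		}
		nr_bytes += r[i].len;
		next_pos = r[i].pos + r[i].len;
	}
	if (nr_bytes)
		printf("send request %u: %u bytes\n", req, nr_bytes);
	return 0;
}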
.dirty_folio needs to be set to iomap_dirty_folio so that the bitmap
iomap uses for dirty tracking correctly reflects dirty regions that need
to be written back.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/file.c | 133 ++++++++++++++++++++++++++++++-------------------
1 file changed, 82 insertions(+), 51 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index cadad61ef7df..93a96cdf56e1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1832,7 +1832,7 @@ static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
* scope of the fi->lock alleviates xarray lock
* contention and noticeably improves performance.
*/
- folio_end_writeback(ap->folios[i]);
+ iomap_finish_folio_write(inode, ap->folios[i], 1);
dec_wb_stat(&bdi->wb, WB_WRITEBACK);
wb_writeout_inc(&bdi->wb);
}
@@ -2019,19 +2019,20 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
}
static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
- uint32_t folio_index)
+ uint32_t folio_index, loff_t offset, unsigned len)
{
struct inode *inode = folio->mapping->host;
struct fuse_args_pages *ap = &wpa->ia.ap;
ap->folios[folio_index] = folio;
- ap->descs[folio_index].offset = 0;
- ap->descs[folio_index].length = folio_size(folio);
+ ap->descs[folio_index].offset = offset;
+ ap->descs[folio_index].length = len;
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
}
static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
+ size_t offset,
struct fuse_file *ff)
{
struct inode *inode = folio->mapping->host;
@@ -2044,7 +2045,7 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
return NULL;
fuse_writepage_add_to_bucket(fc, wpa);
- fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio), 0);
+ fuse_write_args_fill(&wpa->ia, ff, folio_pos(folio) + offset, 0);
wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
wpa->inode = inode;
wpa->ia.ff = ff;
@@ -2070,7 +2071,7 @@ static int fuse_writepage_locked(struct folio *folio)
if (!ff)
goto err;
- wpa = fuse_writepage_args_setup(folio, ff);
+ wpa = fuse_writepage_args_setup(folio, 0, ff);
error = -ENOMEM;
if (!wpa)
goto err_writepage_args;
@@ -2079,7 +2080,7 @@ static int fuse_writepage_locked(struct folio *folio)
ap->num_folios = 1;
folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, 0);
+ fuse_writepage_args_page_fill(wpa, folio, 0, 0, folio_size(folio));
spin_lock(&fi->lock);
list_add_tail(&wpa->queue_entry, &fi->queued_writes);
@@ -2100,7 +2101,12 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
struct inode *inode;
unsigned int max_folios;
- unsigned int nr_pages;
+ /*
+ * nr_bytes won't overflow since fuse_writepage_need_send() caps
+ * wb requests to never exceed fc->max_pages (which has an upper bound
+ * of U16_MAX).
+ */
+ unsigned int nr_bytes;
};
static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
@@ -2141,22 +2147,30 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
spin_unlock(&fi->lock);
}
-static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
- struct fuse_args_pages *ap,
+static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
struct fuse_fill_wb_data *data)
{
+ struct folio *prev_folio;
+ struct fuse_folio_desc prev_desc;
+ unsigned bytes = data->nr_bytes + len;
+ loff_t prev_pos;
+
WARN_ON(!ap->num_folios);
/* Reached max pages */
- if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
+ if ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT > fc->max_pages)
return true;
/* Reached max write bytes */
- if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
+ if (bytes > fc->max_write)
return true;
/* Discontinuity */
- if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio->index)
+ prev_folio = ap->folios[ap->num_folios - 1];
+ prev_desc = ap->descs[ap->num_folios - 1];
+ prev_pos = folio_pos(prev_folio) + prev_desc.offset + prev_desc.length;
+ if (prev_pos != pos)
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2166,85 +2180,102 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
return false;
}
-static int fuse_writepages_fill(struct folio *folio,
- struct writeback_control *wbc, void *_data)
+static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 pos,
+ unsigned len, u64 end_pos)
{
- struct fuse_fill_wb_data *data = _data;
+ struct fuse_fill_wb_data *data = wpc->wb_ctx;
struct fuse_writepage_args *wpa = data->wpa;
struct fuse_args_pages *ap = &wpa->ia.ap;
struct inode *inode = data->inode;
struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = get_fuse_conn(inode);
- int err;
+ loff_t offset = offset_in_folio(folio, pos);
+
+ WARN_ON_ONCE(!data);
+ /* len will always be page aligned */
+ WARN_ON_ONCE(len & (PAGE_SIZE - 1));
if (!data->ff) {
- err = -EIO;
data->ff = fuse_write_file_get(fi);
if (!data->ff)
- goto out_unlock;
+ return -EIO;
}
- if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
+ if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
fuse_writepages_send(data);
data->wpa = NULL;
- data->nr_pages = 0;
+ data->nr_bytes = 0;
}
if (data->wpa == NULL) {
- err = -ENOMEM;
- wpa = fuse_writepage_args_setup(folio, data->ff);
+ wpa = fuse_writepage_args_setup(folio, offset, data->ff);
if (!wpa)
- goto out_unlock;
+ return -ENOMEM;
fuse_file_get(wpa->ia.ff);
data->max_folios = 1;
ap = &wpa->ia.ap;
}
- folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
- data->nr_pages += folio_nr_pages(folio);
+ iomap_start_folio_write(inode, folio, 1);
+ fuse_writepage_args_page_fill(wpa, folio, ap->num_folios,
+ offset, len);
+ data->nr_bytes += len;
- err = 0;
ap->num_folios++;
if (!data->wpa)
data->wpa = wpa;
-out_unlock:
- folio_unlock(folio);
- return err;
+ return len;
+}
+
+static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
+ int error)
+{
+ struct fuse_fill_wb_data *data = wpc->wb_ctx;
+
+ WARN_ON_ONCE(!data);
+
+ if (data->wpa) {
+ WARN_ON(!data->wpa->ia.ap.num_folios);
+ fuse_writepages_send(data);
+ }
+
+ if (data->ff)
+ fuse_file_put(data->ff, false);
+
+ return error;
}
+static const struct iomap_writeback_ops fuse_writeback_ops = {
+ .writeback_range = fuse_iomap_writeback_range,
+ .writeback_submit = fuse_iomap_writeback_submit,
+};
+
static int fuse_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- struct fuse_fill_wb_data data;
- int err;
+ struct fuse_fill_wb_data data = {
+ .inode = inode,
+ };
+ struct iomap_writepage_ctx wpc = {
+ .inode = inode,
+ .iomap.type = IOMAP_MAPPED,
+ .wbc = wbc,
+ .ops = &fuse_writeback_ops,
+ .wb_ctx = &data,
+ };
- err = -EIO;
if (fuse_is_bad(inode))
- goto out;
+ return -EIO;
if (wbc->sync_mode == WB_SYNC_NONE &&
fc->num_background >= fc->congestion_threshold)
return 0;
- data.inode = inode;
- data.wpa = NULL;
- data.ff = NULL;
- data.nr_pages = 0;
-
- err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
- if (data.wpa) {
- WARN_ON(!data.wpa->ia.ap.num_folios);
- fuse_writepages_send(&data);
- }
- if (data.ff)
- fuse_file_put(data.ff, false);
-
-out:
- return err;
+ return iomap_writepages(&wpc);
}
static int fuse_launder_folio(struct folio *folio)
@@ -3104,7 +3135,7 @@ static const struct address_space_operations fuse_file_aops = {
.readahead = fuse_readahead,
.writepages = fuse_writepages,
.launder_folio = fuse_launder_folio,
- .dirty_folio = filemap_dirty_folio,
+ .dirty_folio = iomap_dirty_folio,
.release_folio = iomap_release_folio,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
--
2.47.1
* Re: [PATCH v5 2/5] fuse: use iomap for writeback
From: Darrick J. Wong @ 2025-07-15 20:57 UTC (permalink / raw)
To: Joanne Koong; +Cc: linux-fsdevel, hch, miklos, brauner, anuj20.g, kernel-team
On Tue, Jul 15, 2025 at 01:21:19PM -0700, Joanne Koong wrote:
> Use iomap for dirty folio writeback in ->writepages().
> This allows for granular dirty writeback of large folios.
>
> Only the dirty portions of the large folio will be written instead of
> having to write out the entire folio. For example if there is a 1 MB
> large folio and only 2 bytes in it are dirty, only the page for those
> dirty bytes will be written out.
>
> .dirty_folio needs to be set to iomap_dirty_folio so that the bitmap
> iomap uses for dirty tracking correctly reflects dirty regions that need
> to be written back.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Thanks for adding the comment about nr_max_bytes and whatnot!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
* [PATCH v5 3/5] fuse: use iomap for folio laundering
From: Joanne Koong @ 2025-07-15 20:21 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Use iomap for folio laundering, which will do granular dirty
writeback when laundering a large folio.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 52 ++++++++++++--------------------------------------
1 file changed, 12 insertions(+), 40 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 93a96cdf56e1..0b57a7b0cd8e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2057,45 +2057,6 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
return wpa;
}
-static int fuse_writepage_locked(struct folio *folio)
-{
- struct address_space *mapping = folio->mapping;
- struct inode *inode = mapping->host;
- struct fuse_inode *fi = get_fuse_inode(inode);
- struct fuse_writepage_args *wpa;
- struct fuse_args_pages *ap;
- struct fuse_file *ff;
- int error = -EIO;
-
- ff = fuse_write_file_get(fi);
- if (!ff)
- goto err;
-
- wpa = fuse_writepage_args_setup(folio, 0, ff);
- error = -ENOMEM;
- if (!wpa)
- goto err_writepage_args;
-
- ap = &wpa->ia.ap;
- ap->num_folios = 1;
-
- folio_start_writeback(folio);
- fuse_writepage_args_page_fill(wpa, folio, 0, 0, folio_size(folio));
-
- spin_lock(&fi->lock);
- list_add_tail(&wpa->queue_entry, &fi->queued_writes);
- fuse_flush_writepages(inode);
- spin_unlock(&fi->lock);
-
- return 0;
-
-err_writepage_args:
- fuse_file_put(ff, false);
-err:
- mapping_set_error(folio->mapping, error);
- return error;
-}
-
struct fuse_fill_wb_data {
struct fuse_writepage_args *wpa;
struct fuse_file *ff;
@@ -2281,8 +2242,19 @@ static int fuse_writepages(struct address_space *mapping,
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
+ struct fuse_fill_wb_data data = {
+ .inode = folio->mapping->host,
+ };
+ struct iomap_writepage_ctx wpc = {
+ .inode = folio->mapping->host,
+ .iomap.type = IOMAP_MAPPED,
+ .ops = &fuse_writeback_ops,
+ .wb_ctx = &data,
+ };
+
if (folio_clear_dirty_for_io(folio)) {
- err = fuse_writepage_locked(folio);
+ err = iomap_writeback_folio(&wpc, folio);
+ err = fuse_iomap_writeback_submit(&wpc, err);
if (!err)
folio_wait_writeback(folio);
}
--
2.47.1
* [PATCH v5 4/5] fuse: hook into iomap for invalidating and checking partial uptodateness
From: Joanne Koong @ 2025-07-15 20:21 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
Hook into iomap_invalidate_folio() so that if the entire folio is being
invalidated during truncation, the dirty state is cleared and the folio
doesn't get written back. The folio's corresponding ifs struct is also
freed.
Hook into iomap_is_partially_uptodate() since iomap tracks uptodateness
granularly when it does buffered writes.
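A userspace sketch of the query ->is_partially_uptodate answers for a
partially uptodate large folio (the block size and example state are
illustrative only, not iomap's implementation):

#include <stdio.h>
#include <stdbool.h>

#define BLK_SZ	4096
#define NBLKS	256			/* 1 MB folio, 4k blocks */

static bool uptodate[NBLKS];

static bool range_uptodate(unsigned int off, unsigned int len)
{
	for (unsigned int b = off / BLK_SZ; b <= (off + len - 1) / BLK_SZ; b++)
		if (!uptodate[b])
			return false;
	return true;
}

int main(void)
{
	uptodate[0] = uptodate[1] = true;	/* only the first two pages read in */

	/* A read inside the uptodate range needs no I/O ... */
	printf("bytes 0..8191 uptodate: %d\n", range_uptodate(0, 8192));
	/* ... but one touching an unread page does. */
	printf("bytes 4096..12287 uptodate: %d\n", range_uptodate(4096, 8192));
	return 0;
}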
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0b57a7b0cd8e..096c5ffc6a57 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3109,6 +3109,8 @@ static const struct address_space_operations fuse_file_aops = {
.launder_folio = fuse_launder_folio,
.dirty_folio = iomap_dirty_folio,
.release_folio = iomap_release_folio,
+ .invalidate_folio = iomap_invalidate_folio,
+ .is_partially_uptodate = iomap_is_partially_uptodate,
.migrate_folio = filemap_migrate_folio,
.bmap = fuse_bmap,
.direct_IO = fuse_direct_IO,
--
2.47.1
* [PATCH v5 5/5] fuse: refactor writeback to use iomap_writepage_ctx inode
From: Joanne Koong @ 2025-07-15 20:21 UTC (permalink / raw)
To: linux-fsdevel; +Cc: hch, miklos, brauner, djwong, anuj20.g, kernel-team
struct iomap_writepage_ctx includes a pointer to the file inode. In
writeback, use that instead of also passing the inode into
fuse_fill_wb_data.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/fuse/file.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 096c5ffc6a57..617fd1b562fd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2060,7 +2060,6 @@ static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio
struct fuse_fill_wb_data {
struct fuse_writepage_args *wpa;
struct fuse_file *ff;
- struct inode *inode;
unsigned int max_folios;
/*
* nr_bytes won't overflow since fuse_writepage_need_send() caps
@@ -2070,16 +2069,16 @@ struct fuse_fill_wb_data {
unsigned int nr_bytes;
};
-static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
+static bool fuse_pages_realloc(struct fuse_fill_wb_data *data,
+ unsigned int max_pages)
{
struct fuse_args_pages *ap = &data->wpa->ia.ap;
- struct fuse_conn *fc = get_fuse_conn(data->inode);
struct folio **folios;
struct fuse_folio_desc *descs;
unsigned int nfolios = min_t(unsigned int,
max_t(unsigned int, data->max_folios * 2,
FUSE_DEFAULT_MAX_PAGES_PER_REQ),
- fc->max_pages);
+ max_pages);
WARN_ON(nfolios <= data->max_folios);
folios = fuse_folios_alloc(nfolios, GFP_NOFS, &descs);
@@ -2096,10 +2095,10 @@ static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
return true;
}
-static void fuse_writepages_send(struct fuse_fill_wb_data *data)
+static void fuse_writepages_send(struct inode *inode,
+ struct fuse_fill_wb_data *data)
{
struct fuse_writepage_args *wpa = data->wpa;
- struct inode *inode = data->inode;
struct fuse_inode *fi = get_fuse_inode(inode);
spin_lock(&fi->lock);
@@ -2135,7 +2134,8 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
- if (ap->num_folios == data->max_folios && !fuse_pages_realloc(data))
+ if (ap->num_folios == data->max_folios &&
+ !fuse_pages_realloc(data, fc->max_pages))
return true;
return false;
@@ -2148,7 +2148,7 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
struct fuse_fill_wb_data *data = wpc->wb_ctx;
struct fuse_writepage_args *wpa = data->wpa;
struct fuse_args_pages *ap = &wpa->ia.ap;
- struct inode *inode = data->inode;
+ struct inode *inode = wpc->inode;
struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = get_fuse_conn(inode);
loff_t offset = offset_in_folio(folio, pos);
@@ -2164,7 +2164,7 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
}
if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
- fuse_writepages_send(data);
+ fuse_writepages_send(inode, data);
data->wpa = NULL;
data->nr_bytes = 0;
}
@@ -2199,7 +2199,7 @@ static int fuse_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
if (data->wpa) {
WARN_ON(!data->wpa->ia.ap.num_folios);
- fuse_writepages_send(data);
+ fuse_writepages_send(wpc->inode, data);
}
if (data->ff)
@@ -2218,9 +2218,7 @@ static int fuse_writepages(struct address_space *mapping,
{
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- struct fuse_fill_wb_data data = {
- .inode = inode,
- };
+ struct fuse_fill_wb_data data = {};
struct iomap_writepage_ctx wpc = {
.inode = inode,
.iomap.type = IOMAP_MAPPED,
@@ -2242,9 +2240,7 @@ static int fuse_writepages(struct address_space *mapping,
static int fuse_launder_folio(struct folio *folio)
{
int err = 0;
- struct fuse_fill_wb_data data = {
- .inode = folio->mapping->host,
- };
+ struct fuse_fill_wb_data data = {};
struct iomap_writepage_ctx wpc = {
.inode = folio->mapping->host,
.iomap.type = IOMAP_MAPPED,
--
2.47.1
* Re: [PATCH v5 0/5] fuse: use iomap for buffered writes + writeback
From: Christian Brauner @ 2025-07-17 7:55 UTC (permalink / raw)
To: linux-fsdevel, Joanne Koong
Cc: Christian Brauner, hch, miklos, djwong, anuj20.g, kernel-team
On Tue, 15 Jul 2025 13:21:17 -0700, Joanne Koong wrote:
> This series adds fuse iomap support for buffered writes and dirty folio
> writeback. This is needed so that granular uptodate and dirty tracking can
> be used in fuse when large folios are enabled. This has two big advantages.
> For writes, instead of the entire folio needing to be read into the page
> cache, only the relevant portions need to be. For writeback, only the
> dirty portions need to be written back instead of the entire folio.
>
> [...]
Applied to the vfs-6.17.iomap branch of the vfs/vfs.git tree.
Patches in the vfs-6.17.iomap branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.17.iomap
[1/5] fuse: use iomap for buffered writes
https://git.kernel.org/vfs/vfs/c/a4c9ab1d4975
[2/5] fuse: use iomap for writeback
https://git.kernel.org/vfs/vfs/c/ef7e7cbb323f
[3/5] fuse: use iomap for folio laundering
https://git.kernel.org/vfs/vfs/c/1097a87dcb74
[4/5] fuse: hook into iomap for invalidating and checking partial uptodateness
https://git.kernel.org/vfs/vfs/c/707c5d3471e3
[5/5] fuse: refactor writeback to use iomap_writepage_ctx inode
https://git.kernel.org/vfs/vfs/c/6e2f4d8a6118