* [PATCH v6 00/11] fuse: support large folios
@ 2025-05-12 22:58 Joanne Koong
2025-05-12 22:58 ` [PATCH v6 01/11] fuse: support copying " Joanne Koong
` (11 more replies)
0 siblings, 12 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team
This patchset adds support for large folios in fuse.
This does not yet switch fuse to using large folios. Enabling large folios in
fuse depends on granular dirty-page tracking, which will be added in a
separate patchset that has fuse use iomap [1]. A followup (also future work)
is needed as well so that dirty page balancing does not tank performance for
unprivileged servers, where bdi limits lead to subpar throttling [1]. Both
must land before large folios are enabled for fuse.
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@mail.gmail.com/#t
Changelog:
v5:
https://lore.kernel.org/linux-fsdevel/20250426000828.3216220-1-joannelkoong@gmail.com/
v5 -> v6:
* Add Bernd's reviewed-bys
* Initialize err to 0 for refactoring fuse_fill_write_pages()
(Dan and syzbot)
* Add comment for readahead about size of large folio (Bernd)
* Use WARN_ON for readahead size sanity-checking
v4:
https://lore.kernel.org/linux-fsdevel/20250123012448.2479372-1-joannelkoong@gmail.com/
v4 -> v5:
* Now that temp pages are removed in FUSE, resubmit v3.
v3:
https://lore.kernel.org/linux-fsdevel/20241213221818.322371-1-joannelkoong@gmail.com/
v3 -> v4:
* Add Jeff's reviewed-bys
* Drop writeback large folios changes, drop turning large folios on. These
will be part of a separate future patchset
v2:
https://lore.kernel.org/linux-fsdevel/20241125220537.3663725-1-joannelkoong@gmail.com/
v2 -> v3:
* Fix direct io parsing to check each extracted page instead of assuming all
pages in a large folio will be used (Matthew)
v1:
https://lore.kernel.org/linux-fsdevel/20241109001258.2216604-1-joannelkoong@gmail.com/
v1 -> v2:
* Change naming from "non-writeback write" to "writethrough write"
* Fix deadlock for writethrough writes by calling fault_in_iov_iter_readable()
first before __filemap_get_folio() (Josef)
* For readahead, retain original folio_size() for descs.length (Josef)
* Use folio_zero_range() api in fuse_copy_folio() (Josef)
* Add Josef's reviewed-bys
Joanne Koong (11):
fuse: support copying large folios
fuse: support large folios for retrieves
fuse: refactor fuse_fill_write_pages()
fuse: support large folios for writethrough writes
fuse: support large folios for folio reads
fuse: support large folios for symlinks
fuse: support large folios for stores
fuse: support large folios for queued writes
fuse: support large folios for readahead
fuse: optimize direct io large folios processing
fuse: support large folios for writeback
fs/fuse/dev.c | 126 ++++++++++++++++++-----------------
fs/fuse/dir.c | 8 +--
fs/fuse/file.c | 153 +++++++++++++++++++++++++++++--------------
fs/fuse/fuse_dev_i.h | 2 +-
4 files changed, 172 insertions(+), 117 deletions(-)
--
2.47.1
* [PATCH v6 01/11] fuse: support copying large folios
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-14 0:18 ` Matthew Wilcox
2025-05-12 22:58 ` [PATCH v6 02/11] fuse: support large folios for retrieves Joanne Koong
` (10 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Currently, all folios associated with fuse are one page size. As part of
the work to enable large folios, this commit adds support for copying
to/from folios larger than one page size.
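For illustration only (not part of the patch): a minimal user-space sketch of the two size checks that change from PAGE_SIZE to the folio size here. The struct and function names below (model_folio, model_copy_into_folio, model_can_move) are hypothetical stand-ins, and memset()/memcpy() stand in for folio_zero_range() and the real copy path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct folio: a flat buffer of `size` bytes. */
struct model_folio {
	unsigned char *data;
	size_t size;	/* a multiple of the page size, possibly > 1 page */
};

/*
 * Copy `count` bytes from `src` into the folio at `offset`. When zeroing
 * is requested and the copy is shorter than the folio, the whole folio is
 * cleared first so that no stale bytes remain around the copied range.
 */
static void model_copy_into_folio(struct model_folio *folio, size_t offset,
				  const unsigned char *src, size_t count,
				  int zeroing)
{
	assert(offset + count <= folio->size);

	if (zeroing && count < folio->size)
		memset(folio->data, 0, folio->size);
	memcpy(folio->data + offset, src, count);
}

/*
 * Moving (stealing) a buffer instead of copying is only attempted when
 * the transfer starts at offset 0 and covers the entire folio.
 */
static int model_can_move(size_t offset, size_t count, size_t folio_size)
{
	return offset == 0 && count == folio_size;
}
```

With an 8192-byte folio, a 4-byte copy with zeroing leaves every byte outside the copied range zero, and a 4096-byte transfer no longer qualifies for a move because it covers only half of the folio.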
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev.c | 84 +++++++++++++++++++-------------------------
fs/fuse/fuse_dev_i.h | 2 +-
2 files changed, 37 insertions(+), 49 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 155bb6aeaef5..7b0e3a394480 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -955,10 +955,10 @@ static int fuse_check_folio(struct folio *folio)
* folio that was originally in @pagep will lose a reference and the new
* folio returned in @pagep will carry a reference.
*/
-static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
+static int fuse_try_move_folio(struct fuse_copy_state *cs, struct folio **foliop)
{
int err;
- struct folio *oldfolio = page_folio(*pagep);
+ struct folio *oldfolio = *foliop;
struct folio *newfolio;
struct pipe_buffer *buf = cs->pipebufs;
@@ -979,7 +979,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
cs->pipebufs++;
cs->nr_segs--;
- if (cs->len != PAGE_SIZE)
+ if (cs->len != folio_size(oldfolio))
goto out_fallback;
if (!pipe_buf_try_steal(cs->pipe, buf))
@@ -1025,7 +1025,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
if (test_bit(FR_ABORTED, &cs->req->flags))
err = -ENOENT;
else
- *pagep = &newfolio->page;
+ *foliop = newfolio;
spin_unlock(&cs->req->waitq.lock);
if (err) {
@@ -1058,8 +1058,8 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
goto out_put_old;
}
-static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
- unsigned offset, unsigned count)
+static int fuse_ref_folio(struct fuse_copy_state *cs, struct folio *folio,
+ unsigned offset, unsigned count)
{
struct pipe_buffer *buf;
int err;
@@ -1067,17 +1067,17 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
if (cs->nr_segs >= cs->pipe->max_usage)
return -EIO;
- get_page(page);
+ folio_get(folio);
err = unlock_request(cs->req);
if (err) {
- put_page(page);
+ folio_put(folio);
return err;
}
fuse_copy_finish(cs);
buf = cs->pipebufs;
- buf->page = page;
+ buf->page = &folio->page;
buf->offset = offset;
buf->len = count;
@@ -1089,20 +1089,21 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
}
/*
- * Copy a page in the request to/from the userspace buffer. Must be
+ * Copy a folio in the request to/from the userspace buffer. Must be
* done atomically
*/
-static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
- unsigned offset, unsigned count, int zeroing)
+static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop,
+ unsigned offset, unsigned count, int zeroing)
{
int err;
- struct page *page = *pagep;
+ struct folio *folio = *foliop;
+ size_t size = folio ? folio_size(folio) : 0;
- if (page && zeroing && count < PAGE_SIZE)
- clear_highpage(page);
+ if (folio && zeroing && count < size)
+ folio_zero_range(folio, 0, size);
while (count) {
- if (cs->write && cs->pipebufs && page) {
+ if (cs->write && cs->pipebufs && folio) {
/*
* Can't control lifetime of pipe buffers, so always
* copy user pages.
@@ -1112,12 +1113,12 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
if (err)
return err;
} else {
- return fuse_ref_page(cs, page, offset, count);
+ return fuse_ref_folio(cs, folio, offset, count);
}
} else if (!cs->len) {
- if (cs->move_pages && page &&
- offset == 0 && count == PAGE_SIZE) {
- err = fuse_try_move_page(cs, pagep);
+ if (cs->move_folios && folio &&
+ offset == 0 && count == folio_size(folio)) {
+ err = fuse_try_move_folio(cs, foliop);
if (err <= 0)
return err;
} else {
@@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
return err;
}
}
- if (page) {
- void *mapaddr = kmap_local_page(page);
- void *buf = mapaddr + offset;
+ if (folio) {
+ void *mapaddr = kmap_local_folio(folio, offset);
+ void *buf = mapaddr;
offset += fuse_copy_do(cs, &buf, &count);
kunmap_local(mapaddr);
} else
offset += fuse_copy_do(cs, NULL, &count);
}
- if (page && !cs->write)
- flush_dcache_page(page);
+ if (folio && !cs->write)
+ flush_dcache_folio(folio);
return 0;
}
-/* Copy pages in the request to/from userspace buffer */
-static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
- int zeroing)
+/* Copy folios in the request to/from userspace buffer */
+static int fuse_copy_folios(struct fuse_copy_state *cs, unsigned nbytes,
+ int zeroing)
{
unsigned i;
struct fuse_req *req = cs->req;
@@ -1151,23 +1152,12 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
int err;
unsigned int offset = ap->descs[i].offset;
unsigned int count = min(nbytes, ap->descs[i].length);
- struct page *orig, *pagep;
-
- orig = pagep = &ap->folios[i]->page;
- err = fuse_copy_page(cs, &pagep, offset, count, zeroing);
+ err = fuse_copy_folio(cs, &ap->folios[i], offset, count, zeroing);
if (err)
return err;
nbytes -= count;
-
- /*
- * fuse_copy_page may have moved a page from a pipe instead of
- * copying into our given page, so update the folios if it was
- * replaced.
- */
- if (pagep != orig)
- ap->folios[i] = page_folio(pagep);
}
return 0;
}
@@ -1197,7 +1187,7 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
for (i = 0; !err && i < numargs; i++) {
struct fuse_arg *arg = &args[i];
if (i == numargs - 1 && argpages)
- err = fuse_copy_pages(cs, arg->size, zeroing);
+ err = fuse_copy_folios(cs, arg->size, zeroing);
else
err = fuse_copy_one(cs, arg->value, arg->size);
}
@@ -1786,7 +1776,6 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
num = outarg.size;
while (num) {
struct folio *folio;
- struct page *page;
unsigned int this_num;
folio = filemap_grab_folio(mapping, index);
@@ -1794,9 +1783,8 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
if (IS_ERR(folio))
goto out_iput;
- page = &folio->page;
this_num = min_t(unsigned, num, folio_size(folio) - offset);
- err = fuse_copy_page(cs, &page, offset, this_num, 0);
+ err = fuse_copy_folio(cs, &folio, offset, this_num, 0);
if (!folio_test_uptodate(folio) && !err && offset == 0 &&
(this_num == folio_size(folio) || file_size == end)) {
folio_zero_segment(folio, this_num, folio_size(folio));
@@ -2037,8 +2025,8 @@ static int fuse_notify_inc_epoch(struct fuse_conn *fc)
static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
unsigned int size, struct fuse_copy_state *cs)
{
- /* Don't try to move pages (yet) */
- cs->move_pages = false;
+ /* Don't try to move folios (yet) */
+ cs->move_folios = false;
switch (code) {
case FUSE_NOTIFY_POLL:
@@ -2189,7 +2177,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
spin_unlock(&fpq->lock);
cs->req = req;
if (!req->args->page_replace)
- cs->move_pages = false;
+ cs->move_folios = false;
if (oh.error)
err = nbytes != sizeof(oh) ? -EINVAL : 0;
@@ -2307,7 +2295,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
cs.pipe = pipe;
if (flags & SPLICE_F_MOVE)
- cs.move_pages = true;
+ cs.move_folios = true;
ret = fuse_dev_do_write(fud, &cs, len);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index db136e045925..5a9bd771a319 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -30,7 +30,7 @@ struct fuse_copy_state {
unsigned int len;
unsigned int offset;
bool write:1;
- bool move_pages:1;
+ bool move_folios:1;
bool is_uring:1;
struct {
unsigned int copied_sz; /* copied size into the user buffer */
--
2.47.1
* [PATCH v6 02/11] fuse: support large folios for retrieves
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
2025-05-12 22:58 ` [PATCH v6 01/11] fuse: support copying " Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
` (9 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Add support for folios larger than one page size for retrieves.
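For illustration only (not part of the patch): a user-space sketch of the per-folio arithmetic the retrieve loop needs once a folio may span several pages. It mirrors the folio_offset / nr_bytes / nr_pages computation in the diff; the names model_step and model_retrieve_step and the fixed 4 KiB page size are assumptions of this sketch.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)

struct model_step {
	unsigned int folio_offset;	/* byte offset inside the folio */
	unsigned int nr_bytes;		/* bytes taken from this folio */
	unsigned int nr_pages;		/* pages to advance the index by */
};

/*
 * index:       page index the caller wants to read from
 * folio_index: page index of the first page of the folio found there
 * folio_size:  size of that folio in bytes
 * offset:      byte offset inside the page at `index`
 * num:         bytes still wanted
 */
static struct model_step model_retrieve_step(unsigned long index,
					     unsigned long folio_index,
					     unsigned int folio_size,
					     unsigned int offset,
					     unsigned int num)
{
	struct model_step s;
	unsigned int avail;

	assert(index >= folio_index);
	assert(offset < MODEL_PAGE_SIZE);

	s.folio_offset = ((index - folio_index) << MODEL_PAGE_SHIFT) + offset;
	assert(s.folio_offset < folio_size);

	avail = folio_size - s.folio_offset;
	s.nr_bytes = avail < num ? avail : num;
	s.nr_pages = (offset + s.nr_bytes + MODEL_PAGE_SIZE - 1) >>
		     MODEL_PAGE_SHIFT;
	return s;
}
```

For a four-page folio starting at page index 8, a lookup at index 9 with a 100-byte page offset yields folio_offset 4196 and consumes the remaining 12188 bytes, which advances the index by 3 pages to the start of the next folio.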
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 7b0e3a394480..fb81c0a1c6cd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1837,7 +1837,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
unsigned int num;
unsigned int offset;
size_t total_len = 0;
- unsigned int num_pages, cur_pages = 0;
+ unsigned int num_pages;
struct fuse_conn *fc = fm->fc;
struct fuse_retrieve_args *ra;
size_t args_size = sizeof(*ra);
@@ -1855,6 +1855,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
num_pages = min(num_pages, fc->max_pages);
+ num = min(num, num_pages << PAGE_SHIFT);
args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0]));
@@ -1875,25 +1876,29 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
index = outarg->offset >> PAGE_SHIFT;
- while (num && cur_pages < num_pages) {
+ while (num) {
struct folio *folio;
- unsigned int this_num;
+ unsigned int folio_offset;
+ unsigned int nr_bytes;
+ unsigned int nr_pages;
folio = filemap_get_folio(mapping, index);
if (IS_ERR(folio))
break;
- this_num = min_t(unsigned, num, PAGE_SIZE - offset);
+ folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+ nr_bytes = min(folio_size(folio) - folio_offset, num);
+ nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
ap->folios[ap->num_folios] = folio;
- ap->descs[ap->num_folios].offset = offset;
- ap->descs[ap->num_folios].length = this_num;
+ ap->descs[ap->num_folios].offset = folio_offset;
+ ap->descs[ap->num_folios].length = nr_bytes;
ap->num_folios++;
- cur_pages++;
offset = 0;
- num -= this_num;
- total_len += this_num;
- index++;
+ num -= nr_bytes;
+ total_len += nr_bytes;
+ index += nr_pages;
}
ra->inarg.offset = outarg->offset;
ra->inarg.size = total_len;
--
2.47.1
* [PATCH v6 03/11] fuse: refactor fuse_fill_write_pages()
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
2025-05-12 22:58 ` [PATCH v6 01/11] fuse: support copying " Joanne Koong
2025-05-12 22:58 ` [PATCH v6 02/11] fuse: support large folios for retrieves Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 04/11] fuse: support large folios for writethrough writes Joanne Koong
` (8 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Refactor the logic in fuse_fill_write_pages() for copying out write
data. This will make the future change for supporting large folios for
writes easier. No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/file.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e203dd4fcc0f..6b77daa2fbce 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1132,21 +1132,21 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
struct fuse_args_pages *ap = &ia->ap;
struct fuse_conn *fc = get_fuse_conn(mapping->host);
unsigned offset = pos & (PAGE_SIZE - 1);
- unsigned int nr_pages = 0;
size_t count = 0;
- int err;
+ unsigned int num;
+ int err = 0;
+
+ num = min(iov_iter_count(ii), fc->max_write);
+ num = min(num, max_pages << PAGE_SHIFT);
ap->args.in_pages = true;
ap->descs[0].offset = offset;
- do {
+ while (num) {
size_t tmp;
struct folio *folio;
pgoff_t index = pos >> PAGE_SHIFT;
- size_t bytes = min_t(size_t, PAGE_SIZE - offset,
- iov_iter_count(ii));
-
- bytes = min_t(size_t, bytes, fc->max_write - count);
+ unsigned bytes = min(PAGE_SIZE - offset, num);
again:
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
@@ -1178,14 +1178,13 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
goto again;
}
- err = 0;
ap->folios[ap->num_folios] = folio;
ap->descs[ap->num_folios].length = tmp;
ap->num_folios++;
- nr_pages++;
count += tmp;
pos += tmp;
+ num -= tmp;
offset += tmp;
if (offset == PAGE_SIZE)
offset = 0;
@@ -1200,10 +1199,9 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
ia->write.folio_locked = true;
break;
}
- if (!fc->big_writes)
+ if (!fc->big_writes || offset != 0)
break;
- } while (iov_iter_count(ii) && count < fc->max_write &&
- nr_pages < max_pages && offset == 0);
+ }
return count > 0 ? count : err;
}
--
2.47.1
* [PATCH v6 04/11] fuse: support large folios for writethrough writes
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (2 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 05/11] fuse: support large folios for folio reads Joanne Koong
` (7 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team
Add support for folios larger than one page size for writethrough
writes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6b77daa2fbce..2d9bc484e87a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1146,7 +1146,8 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
size_t tmp;
struct folio *folio;
pgoff_t index = pos >> PAGE_SHIFT;
- unsigned bytes = min(PAGE_SIZE - offset, num);
+ unsigned int bytes;
+ unsigned int folio_offset;
again:
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
@@ -1159,7 +1160,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
if (mapping_writably_mapped(mapping))
flush_dcache_folio(folio);
- tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
+ folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+ bytes = min(folio_size(folio) - folio_offset, num);
+
+ tmp = copy_folio_from_iter_atomic(folio, folio_offset, bytes, ii);
flush_dcache_folio(folio);
if (!tmp) {
@@ -1179,6 +1183,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
}
ap->folios[ap->num_folios] = folio;
+ ap->descs[ap->num_folios].offset = folio_offset;
ap->descs[ap->num_folios].length = tmp;
ap->num_folios++;
@@ -1186,11 +1191,11 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
pos += tmp;
num -= tmp;
offset += tmp;
- if (offset == PAGE_SIZE)
+ if (offset == folio_size(folio))
offset = 0;
- /* If we copied full page, mark it uptodate */
- if (tmp == PAGE_SIZE)
+ /* If we copied full folio, mark it uptodate */
+ if (tmp == folio_size(folio))
folio_mark_uptodate(folio);
if (folio_test_uptodate(folio)) {
--
2.47.1
* [PATCH v6 05/11] fuse: support large folios for folio reads
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (3 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 04/11] fuse: support large folios for writethrough writes Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 06/11] fuse: support large folios for symlinks Joanne Koong
` (6 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Add support for folios larger than one page size for folio reads into
the page cache.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2d9bc484e87a..8efdca3ce566 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -793,7 +793,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
struct inode *inode = folio->mapping->host;
struct fuse_mount *fm = get_fuse_mount(inode);
loff_t pos = folio_pos(folio);
- struct fuse_folio_desc desc = { .length = PAGE_SIZE };
+ struct fuse_folio_desc desc = { .length = folio_size(folio) };
struct fuse_io_args ia = {
.ap.args.page_zeroing = true,
.ap.args.out_pages = true,
--
2.47.1
* [PATCH v6 06/11] fuse: support large folios for symlinks
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (4 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 05/11] fuse: support large folios for folio reads Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 07/11] fuse: support large folios for stores Joanne Koong
` (5 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Support large folios for symlinks and change the name from
fuse_readlink_page() to fuse_readlink_folio().
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dir.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1fb0b15a6088..3003119559e8 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1629,10 +1629,10 @@ static int fuse_permission(struct mnt_idmap *idmap,
return err;
}
-static int fuse_readlink_page(struct inode *inode, struct folio *folio)
+static int fuse_readlink_folio(struct inode *inode, struct folio *folio)
{
struct fuse_mount *fm = get_fuse_mount(inode);
- struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 };
+ struct fuse_folio_desc desc = { .length = folio_size(folio) - 1 };
struct fuse_args_pages ap = {
.num_folios = 1,
.folios = &folio,
@@ -1687,7 +1687,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
if (!folio)
goto out_err;
- err = fuse_readlink_page(inode, folio);
+ err = fuse_readlink_folio(inode, folio);
if (err) {
folio_put(folio);
goto out_err;
@@ -2277,7 +2277,7 @@ void fuse_init_dir(struct inode *inode)
static int fuse_symlink_read_folio(struct file *null, struct folio *folio)
{
- int err = fuse_readlink_page(folio->mapping->host, folio);
+ int err = fuse_readlink_folio(folio->mapping->host, folio);
if (!err)
folio_mark_uptodate(folio);
--
2.47.1
* [PATCH v6 07/11] fuse: support large folios for stores
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (5 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 06/11] fuse: support large folios for symlinks Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 08/11] fuse: support large folios for queued writes Joanne Koong
` (4 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team
Add support for folios larger than one page size for stores.
Also change variable naming from "this_num" to "nr_bytes".
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/dev.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fb81c0a1c6cd..a6ee8cd0f5cb 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1776,18 +1776,23 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
num = outarg.size;
while (num) {
struct folio *folio;
- unsigned int this_num;
+ unsigned int folio_offset;
+ unsigned int nr_bytes;
+ unsigned int nr_pages;
folio = filemap_grab_folio(mapping, index);
err = PTR_ERR(folio);
if (IS_ERR(folio))
goto out_iput;
- this_num = min_t(unsigned, num, folio_size(folio) - offset);
- err = fuse_copy_folio(cs, &folio, offset, this_num, 0);
+ folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+ nr_bytes = min_t(unsigned, num, folio_size(folio) - folio_offset);
+ nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ err = fuse_copy_folio(cs, &folio, folio_offset, nr_bytes, 0);
if (!folio_test_uptodate(folio) && !err && offset == 0 &&
- (this_num == folio_size(folio) || file_size == end)) {
- folio_zero_segment(folio, this_num, folio_size(folio));
+ (nr_bytes == folio_size(folio) || file_size == end)) {
+ folio_zero_segment(folio, nr_bytes, folio_size(folio));
folio_mark_uptodate(folio);
}
folio_unlock(folio);
@@ -1796,9 +1801,9 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
if (err)
goto out_iput;
- num -= this_num;
+ num -= nr_bytes;
offset = 0;
- index++;
+ index += nr_pages;
}
err = 0;
--
2.47.1
* [PATCH v6 08/11] fuse: support large folios for queued writes
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (6 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 07/11] fuse: support large folios for stores Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 09/11] fuse: support large folios for readahead Joanne Koong
` (3 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Add support for folios larger than one page size for queued writes.
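For illustration only (not part of the patch): a user-space sketch of why the payload size has to be summed from the per-folio descriptors once folios can differ in size. model_desc and model_data_size are hypothetical names; the field layout only loosely follows struct fuse_folio_desc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-folio descriptor. */
struct model_desc {
	unsigned int offset;	/* byte offset of the data inside the folio */
	unsigned int length;	/* number of valid bytes in the folio */
};

/* Total payload of a request: the sum of all descriptor lengths. */
static uint64_t model_data_size(const struct model_desc *descs,
				size_t num_folios)
{
	uint64_t total = 0;
	size_t i;

	assert(descs != NULL || num_folios == 0);

	for (i = 0; i < num_folios; i++)
		total += descs[i].length;
	return total;
}
```

With three folios of 4096, 16384 and 8192 bytes the sum is 28672, whereas multiplying the folio count by a 4096-byte page size would give 12288.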
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/file.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8efdca3ce566..f221a45b4bad 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1789,11 +1789,14 @@ __releases(fi->lock)
__acquires(fi->lock)
{
struct fuse_inode *fi = get_fuse_inode(wpa->inode);
+ struct fuse_args_pages *ap = &wpa->ia.ap;
struct fuse_write_in *inarg = &wpa->ia.write.in;
- struct fuse_args *args = &wpa->ia.ap.args;
- /* Currently, all folios in FUSE are one page */
- __u64 data_size = wpa->ia.ap.num_folios * PAGE_SIZE;
- int err;
+ struct fuse_args *args = &ap->args;
+ __u64 data_size = 0;
+ int err, i;
+
+ for (i = 0; i < ap->num_folios; i++)
+ data_size += ap->descs[i].length;
fi->writectr++;
if (inarg->offset + data_size <= size) {
--
2.47.1
* [PATCH v6 09/11] fuse: support large folios for readahead
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (7 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 08/11] fuse: support large folios for queued writes Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-12 22:58 ` [PATCH v6 10/11] fuse: optimize direct io large folios processing Joanne Koong
` (2 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team
Add support for folios larger than one page size for readahead.
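For illustration only (not part of the patch): a user-space sketch of how a readahead window made of variably sized folios can be packed into requests of at most max_pages pages, with a folio that does not fit carried over to the next request. model_batch_readahead is a hypothetical name, and the sketch assumes (as the patch's comment does) that no folio is larger than max_pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * folio_pages[i] is the page count of the i-th folio in the readahead
 * window. Writes the page count of each request to req_pages[] and
 * returns the number of requests.
 */
static size_t model_batch_readahead(const unsigned int *folio_pages,
				    size_t nr_folios, unsigned int max_pages,
				    unsigned int *req_pages, size_t req_cap)
{
	unsigned int nr_pages = 0;
	size_t next = 0, nr_reqs = 0, i;

	for (i = 0; i < nr_folios; i++) {
		/* Assumption: a folio never exceeds the request limit. */
		assert(folio_pages[i] > 0 && folio_pages[i] <= max_pages);
		nr_pages += folio_pages[i];
	}

	while (nr_pages) {
		unsigned int cur_pages = nr_pages < max_pages ?
					 nr_pages : max_pages;
		unsigned int pages = 0;

		while (pages < cur_pages) {
			unsigned int fp = folio_pages[next];

			/* Does not fit: leave it for the next request. */
			if (fp > cur_pages - pages)
				break;
			pages += fp;
			next++;
		}

		assert(pages > 0);	/* guaranteed by the size assumption */
		assert(nr_reqs < req_cap);
		req_pages[nr_reqs++] = pages;
		nr_pages -= pages;
	}
	return nr_reqs;
}
```

For folios of 4, 4 and 2 pages with max_pages of 6, the first request carries only the first folio (4 pages) because the second one would overflow it, and the second request carries the remaining 6 pages.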
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f221a45b4bad..07ff81469a59 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
fuse_io_free(ia);
}
-static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+ unsigned int count)
{
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
struct fuse_args_pages *ap = &ia->ap;
loff_t pos = folio_pos(ap->folios[0]);
- /* Currently, all folios in FUSE are one page */
- size_t count = ap->num_folios << PAGE_SHIFT;
ssize_t res;
int err;
@@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
struct inode *inode = rac->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
unsigned int max_pages, nr_pages;
+ struct folio *folio = NULL;
if (fuse_is_bad(inode))
return;
@@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
while (nr_pages) {
struct fuse_io_args *ia;
struct fuse_args_pages *ap;
- struct folio *folio;
unsigned cur_pages = min(max_pages, nr_pages);
+ unsigned int pages = 0;
if (fc->num_background >= fc->congestion_threshold &&
rac->ra->async_size >= readahead_count(rac))
@@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
ia = fuse_io_alloc(NULL, cur_pages);
if (!ia)
- return;
+ break;
ap = &ia->ap;
- while (ap->num_folios < cur_pages) {
+ while (pages < cur_pages) {
+ unsigned int folio_pages;
+
/*
* This returns a folio with a ref held on it.
* The ref needs to be held until the request is
@@ -963,13 +965,31 @@ static void fuse_readahead(struct readahead_control *rac)
* fuse_try_move_page()) drops the ref after it's
* replaced in the page cache.
*/
- folio = __readahead_folio(rac);
+ if (!folio)
+ folio = __readahead_folio(rac);
+
+ folio_pages = folio_nr_pages(folio);
+ if (folio_pages > cur_pages - pages) {
+ /*
+ * Large folios belonging to fuse will never
+ * have more pages than max_pages.
+ */
+ WARN_ON(!pages);
+ break;
+ }
+
ap->folios[ap->num_folios] = folio;
ap->descs[ap->num_folios].length = folio_size(folio);
ap->num_folios++;
+ pages += folio_pages;
+ folio = NULL;
}
- fuse_send_readpages(ia, rac->file);
- nr_pages -= cur_pages;
+ fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
+ nr_pages -= pages;
+ }
+ if (folio) {
+ folio_end_read(folio, false);
+ folio_put(folio);
}
}
--
2.47.1
* [PATCH v6 10/11] fuse: optimize direct io large folios processing
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (8 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 09/11] fuse: support large folios for readahead Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-13 7:19 ` Miklos Szeredi
2025-05-12 22:58 ` [PATCH v6 11/11] fuse: support large folios for writeback Joanne Koong
2025-05-13 7:32 ` [PATCH v6 00/11] fuse: support large folios Miklos Szeredi
11 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
Optimize processing folios larger than one page size for the direct io
case. If contiguous pages are part of the same folio, collate the
processing instead of processing each page in the folio separately.
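For illustration only (not part of the patch): a user-space sketch of collating extracted pages into one descriptor per run of pages from the same folio. model_page, model_seg and model_collate are hypothetical names, and the merge test used here (same folio and the next page index) is this sketch's own choice rather than a copy of the condition in the diff.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)

/* An extracted page, identified by its folio and its index in the folio. */
struct model_page {
	int folio_id;
	unsigned int idx;
};

/* One descriptor: a byte range inside a single folio. */
struct model_seg {
	int folio_id;
	unsigned int offset;
	unsigned int length;
};

/*
 * start is the byte offset into the first page, nbytes the total number
 * of bytes covered by pages[]. Returns the number of descriptors written.
 */
static size_t model_collate(const struct model_page *pages, size_t npages,
			    unsigned int start, unsigned int nbytes,
			    struct model_seg *segs, size_t cap)
{
	size_t n = 0, i;

	assert(start < MODEL_PAGE_SIZE);

	for (i = 0; i < npages && nbytes; i++) {
		unsigned int room = MODEL_PAGE_SIZE - start;
		unsigned int len = nbytes < room ? nbytes : room;

		if (i > 0 && pages[i].folio_id == pages[i - 1].folio_id &&
		    pages[i].idx == pages[i - 1].idx + 1) {
			/* Next page of the same folio: extend the run. */
			segs[n - 1].length += len;
		} else {
			assert(n < cap);
			segs[n].folio_id = pages[i].folio_id;
			segs[n].offset = start +
					 (pages[i].idx << MODEL_PAGE_SHIFT);
			segs[n].length = len;
			n++;
		}
		start = 0;
		nbytes -= len;
	}
	return n;
}
```

For the page sequence A0, A1, B0, A3 with a 100-byte starting offset, the sketch produces three descriptors: one covering A0 and A1 together, one for B0, and a separate one for A3, since the run through folio A was interrupted by the COW page.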
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 14 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 07ff81469a59..e4d86ced9aac 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1491,7 +1491,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
}
while (nbytes < *nbytesp && nr_pages < max_pages) {
- unsigned nfolios, i;
+ struct folio *prev_folio = NULL;
+ unsigned npages, i;
size_t start;
ret = iov_iter_extract_pages(ii, &pages,
@@ -1503,23 +1504,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
nbytes += ret;
- nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
+ npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
- for (i = 0; i < nfolios; i++) {
- struct folio *folio = page_folio(pages[i]);
- unsigned int offset = start +
- (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
- unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
+ /*
+ * We must check each extracted page. We can't assume every page
+ * in a large folio is used. For example, userspace may mmap() a
+ * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
+ * a large folio, in which case the extracted pages could be
+ *
+ * folio A page 0
+ * folio A page 1
+ * folio B page 0
+ * folio A page 3
+ *
+ * where folio A belongs to the file and folio B is an anonymous
+ * COW page.
+ */
+ for (i = 0; i < npages && ret; i++) {
+ struct folio *folio;
+ unsigned int offset;
+ unsigned int len;
+
+ WARN_ON(!pages[i]);
+ folio = page_folio(pages[i]);
+
+ len = min_t(unsigned int, ret, PAGE_SIZE - start);
+
+ if (folio == prev_folio && pages[i] != pages[i - 1]) {
+ WARN_ON(ap->folios[ap->num_folios - 1] != folio);
+ ap->descs[ap->num_folios - 1].length += len;
+ WARN_ON(ap->descs[ap->num_folios - 1].length > folio_size(folio));
+ } else {
+ offset = start + (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
+ ap->descs[ap->num_folios].offset = offset;
+ ap->descs[ap->num_folios].length = len;
+ ap->folios[ap->num_folios] = folio;
+ start = 0;
+ ap->num_folios++;
+ prev_folio = folio;
+ }
- ap->descs[ap->num_folios].offset = offset;
- ap->descs[ap->num_folios].length = len;
- ap->folios[ap->num_folios] = folio;
- start = 0;
ret -= len;
- ap->num_folios++;
}
-
- nr_pages += nfolios;
+ nr_pages += npages;
}
kfree(pages);
--
2.47.1
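The collation described in this patch can be modelled outside the kernel. What follows is a minimal userspace sketch, not the patch's code: struct page_ref, struct desc, collate_pages() and PAGE_SZ are hypothetical stand-ins for struct page, the fuse folio descriptor, the loop in fuse_get_user_pages() and PAGE_SIZE. It assumes 4 KiB pages, and it merges a page into the previous descriptor only when it is the next page of the same folio, which is a slightly stricter check than the one in the diff above.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for an extracted page: owning folio and index in it. */
struct page_ref {
	int folio_id;
	unsigned int idx;
};

/* Hypothetical stand-in for one folio descriptor (offset/length in bytes). */
struct desc {
	int folio_id;
	unsigned int offset;
	unsigned int length;
};

/*
 * Turn npages extracted pages covering `bytes` bytes, beginning `start`
 * bytes into the first page, into descriptors. Runs of adjacent pages from
 * the same folio share one descriptor. Returns the number of descriptors.
 */
size_t collate_pages(const struct page_ref *pages, size_t npages,
		     unsigned int start, size_t bytes, struct desc *descs)
{
	size_t ndescs = 0;

	assert(start < PAGE_SZ);

	for (size_t i = 0; i < npages && bytes; i++) {
		size_t len = PAGE_SZ - start;

		if (len > bytes)
			len = bytes;

		if (i > 0 && pages[i].folio_id == pages[i - 1].folio_id &&
		    pages[i].idx == pages[i - 1].idx + 1) {
			/* Next page of the same folio: extend the last descriptor. */
			descs[ndescs - 1].length += (unsigned int)len;
		} else {
			/* New folio, or a gap within the folio: start a descriptor. */
			descs[ndescs].folio_id = pages[i].folio_id;
			descs[ndescs].offset = start + pages[i].idx * PAGE_SZ;
			descs[ndescs].length = (unsigned int)len;
			ndescs++;
		}
		start = 0;	/* only the first page can begin mid-page */
		bytes -= len;
	}
	return ndescs;
}
```

Fed the sequence from the code comment (folio A page 0, folio A page 1, folio B page 0, folio A page 3), this model yields three descriptors: two pages of A, one page of B, and a separate descriptor for page 3 of A.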
* [PATCH v6 11/11] fuse: support large folios for writeback
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (9 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 10/11] fuse: optimize direct io large folios processing Joanne Koong
@ 2025-05-12 22:58 ` Joanne Koong
2025-05-13 7:32 ` [PATCH v6 00/11] fuse: support large folios Miklos Szeredi
11 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-12 22:58 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team
Add support for folios larger than one page size for writeback.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e4d86ced9aac..b27cdbd4bffe 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2015,7 +2015,7 @@ static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struc
ap->folios[folio_index] = folio;
ap->descs[folio_index].offset = 0;
- ap->descs[folio_index].length = PAGE_SIZE;
+ ap->descs[folio_index].length = folio_size(folio);
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
}
@@ -2089,6 +2089,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
struct inode *inode;
unsigned int max_folios;
+ unsigned int nr_pages;
};
static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
@@ -2136,15 +2137,15 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
WARN_ON(!ap->num_folios);
/* Reached max pages */
- if (ap->num_folios == fc->max_pages)
+ if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
return true;
/* Reached max write bytes */
- if ((ap->num_folios + 1) * PAGE_SIZE > fc->max_write)
+ if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
return true;
/* Discontinuity */
- if (ap->folios[ap->num_folios - 1]->index + 1 != folio_index(folio))
+ if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio_index(folio))
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2175,6 +2176,7 @@ static int fuse_writepages_fill(struct folio *folio,
if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
fuse_writepages_send(data);
data->wpa = NULL;
+ data->nr_pages = 0;
}
if (data->wpa == NULL) {
@@ -2189,6 +2191,7 @@ static int fuse_writepages_fill(struct folio *folio,
folio_start_writeback(folio);
fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
+ data->nr_pages += folio_nr_pages(folio);
err = 0;
ap->num_folios++;
@@ -2219,6 +2222,7 @@ static int fuse_writepages(struct address_space *mapping,
data.inode = inode;
data.wpa = NULL;
data.ff = NULL;
+ data.nr_pages = 0;
err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
if (data.wpa) {
--
2.47.1
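The three send conditions in fuse_writepage_need_send() after this patch can be illustrated with a small userspace model. This is a sketch under assumptions, not kernel code: struct folio_model, struct wb_batch, struct wb_limits, batch_need_send() and batch_add() are hypothetical names, PAGE_SZ assumes 4 KiB pages, and the array-growth check from the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumed page size for this sketch */

/* Hypothetical model of a folio: first page index and number of pages. */
struct folio_model {
	unsigned long index;
	unsigned int nr_pages;
};

/* Hypothetical model of the request being built. */
struct wb_batch {
	unsigned int nr_pages;		/* mirrors data->nr_pages */
	unsigned long next_index;	/* index right after the last folio added */
};

/* Hypothetical model of the connection limits (fc->max_pages, fc->max_write). */
struct wb_limits {
	unsigned int max_pages;
	size_t max_write;		/* bytes */
};

/* Return true if the batch must be sent before `f` can be added. */
bool batch_need_send(const struct wb_batch *b, const struct folio_model *f,
		     const struct wb_limits *lim)
{
	assert(b->nr_pages > 0);	/* only asked for a non-empty batch */

	/* Adding the whole folio would exceed the page limit. */
	if (b->nr_pages + f->nr_pages > lim->max_pages)
		return true;

	/* Adding the whole folio would exceed the byte limit. */
	if ((size_t)b->nr_pages * PAGE_SZ + (size_t)f->nr_pages * PAGE_SZ >
	    lim->max_write)
		return true;

	/* The folio does not start where the previous one ended. */
	return b->next_index != f->index;
}

/* Account for a folio added to the batch. */
void batch_add(struct wb_batch *b, const struct folio_model *f)
{
	b->nr_pages += f->nr_pages;
	b->next_index = f->index + f->nr_pages;
}
```

Counting pages rather than folios is the point of the change: with large folios, the number of folios in a request no longer says how many pages or bytes it carries.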
* Re: [PATCH v6 10/11] fuse: optimize direct io large folios processing
2025-05-12 22:58 ` [PATCH v6 10/11] fuse: optimize direct io large folios processing Joanne Koong
@ 2025-05-13 7:19 ` Miklos Szeredi
2025-05-13 20:39 ` Joanne Koong
0 siblings, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2025-05-13 7:19 UTC (permalink / raw)
To: Joanne Koong
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
On Tue, 13 May 2025 at 00:59, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Optimize processing folios larger than one page size for the direct io
> case. If contiguous pages are part of the same folio, collate the
> processing instead of processing each page in the folio separately.
This patch is sort of special in the series, since the others are
basically no-op until large folios are enabled.
Did you validate this in particular? Is there a good way to test
direct I/O on a buffer with mixed folio sizes?
Thanks,
Miklos
* Re: [PATCH v6 00/11] fuse: support large folios
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
` (10 preceding siblings ...)
2025-05-12 22:58 ` [PATCH v6 11/11] fuse: support large folios for writeback Joanne Koong
@ 2025-05-13 7:32 ` Miklos Szeredi
11 siblings, 0 replies; 23+ messages in thread
From: Miklos Szeredi @ 2025-05-13 7:32 UTC (permalink / raw)
To: Joanne Koong
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team
On Tue, 13 May 2025 at 00:59, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This patchset adds support for large folios in fuse.
>
> This does not yet switch fuse to using large folios. Using large folios in
> fuse is dependent on adding granular dirty-page tracking. This will be done
> in a separate patchset that will have fuse use iomap [1]. There also needs
> to be a followup (also part of future work) for having dirty page balancing
> not tank performance for unprivileged servers where bdi limits lead to subpar
> throttling [1], before enabling large folios for fuse.
Looks good, applied. Thanks for taking care of this.
Thanks,
Miklos
* Re: [PATCH v6 10/11] fuse: optimize direct io large folios processing
2025-05-13 7:19 ` Miklos Szeredi
@ 2025-05-13 20:39 ` Joanne Koong
2025-05-15 8:27 ` Miklos Szeredi
0 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-05-13 20:39 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
On Tue, May 13, 2025 at 12:19 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 13 May 2025 at 00:59, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > Optimize processing folios larger than one page size for the direct io
> > case. If contiguous pages are part of the same folio, collate the
> > processing instead of processing each page in the folio separately.
>
> This patch is sort of special in the series, since the others are
> basically no-op until large folios are enabled.
>
> Did you validate this in particular? Is there a good way to test
> direct I/O on a buffer with mixed folio sizes?
Hi Miklos,
No, I did not validate this case in particular. I'm happy to drop this
patch for now and resend it when large folios get turned on, if you
prefer that. It seems like it'd be good to add this case to xfstests.
Matthew mentioned in [1] that this can get triggered by:
"Userspace may mmap() a file PROT_WRITE, MAP_PRIVATE.
If they store to the middle of a large folio (the file that is mmaped
may be on a filesystem that does support large folios, rather than
fuse), then we'll have, eg:
folio A page 0
folio A page 1
folio B page 0
folio A page 3
where folio A belongs to the file and folio B is an anonymous COW page."
Looking at iov_iter_extract_pages() more closely though, I'm realizing
now that this function extracts only a list of *contiguous* pages, so
I don't think we even run into a case where the extracted pages that
get returned can have interleaved pages from another folio.
Ideally though, the long-term solution would be having an
iov_iter_extract_folios() API instead of having to use
iov_iter_extract_pages() as a workaround. I'm hoping to take a stab at
implementing that after the large folios work is done.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/Z1cSy1OUxPZ2kzYT@casper.infradead.org/
>
> Thanks,
> Miklos
>
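The userspace half of the scenario Matthew describes (mmap() a file PROT_WRITE, MAP_PRIVATE, then store to the middle of it) can be reproduced with a short program. This is a minimal sketch, assuming a Linux/POSIX system with a writable /tmp; the helper name private_store_is_cow() is hypothetical. It only shows that the store lands in a private copy and not in the file. Whether the file's page cache is backed by large folios depends on the filesystem and is not visible from this code.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map a four-page temporary file MAP_PRIVATE, store one byte into the third
 * page, and compare the mapping with the file contents.
 * Returns 1 if the store is visible only through the mapping, 0 if it also
 * reached the file, and -1 on a setup error.
 */
int private_store_is_cow(void)
{
	char path[] = "/tmp/cow-sketch-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	char *map;
	char byte = 0;
	int result = -1;
	int fd;

	if (page <= 0)
		return -1;
	len = 4 * (size_t)page;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file disappears once fd is closed */

	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Store to the middle of the mapping: that page becomes a private copy. */
	map[2 * page] = 'X';

	/* The file itself must still hold the original zero byte. */
	if (pread(fd, &byte, 1, (off_t)(2 * page)) == 1)
		result = (map[2 * page] == 'X' && byte == 0);

	munmap(map, len);
	close(fd);
	return result;
}
```

A buffer like this, handed to a direct I/O read or write on a fuse file, is the kind of input the per-page check in the patch is meant to handle.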
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-12 22:58 ` [PATCH v6 01/11] fuse: support copying " Joanne Koong
@ 2025-05-14 0:18 ` Matthew Wilcox
2025-05-14 22:59 ` Joanne Koong
2025-05-15 8:26 ` Miklos Szeredi
0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2025-05-14 0:18 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef,
kernel-team, Bernd Schubert
On Mon, May 12, 2025 at 03:58:30PM -0700, Joanne Koong wrote:
> @@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> return err;
> }
> }
> - if (page) {
> - void *mapaddr = kmap_local_page(page);
> - void *buf = mapaddr + offset;
> + if (folio) {
> + void *mapaddr = kmap_local_folio(folio, offset);
> + void *buf = mapaddr;
> offset += fuse_copy_do(cs, &buf, &count);
> kunmap_local(mapaddr);
kmap_local_folio() only maps the page which contains 'offset'.
following what the functions in highmem.h do, i'd suggest something
like:
if (folio) {
void *mapaddr = kmap_local_folio(folio, offset);
void *buf = mapaddr;
if (folio_test_highmem(folio) &&
size > PAGE_SIZE - offset_in_page(offset))
size = PAGE_SIZE - offset_in_page(offset);
offset += fuse_copy_do(cs, &buf, &count);
kunmap_local(mapaddr);
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-14 0:18 ` Matthew Wilcox
@ 2025-05-14 22:59 ` Joanne Koong
2025-05-15 2:05 ` Matthew Wilcox
2025-05-15 8:26 ` Miklos Szeredi
1 sibling, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-05-14 22:59 UTC (permalink / raw)
To: Matthew Wilcox
Cc: miklos, linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef,
kernel-team, Bernd Schubert
On Tue, May 13, 2025 at 5:18 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, May 12, 2025 at 03:58:30PM -0700, Joanne Koong wrote:
> > @@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> > return err;
> > }
> > }
> > - if (page) {
> > - void *mapaddr = kmap_local_page(page);
> > - void *buf = mapaddr + offset;
> > + if (folio) {
> > + void *mapaddr = kmap_local_folio(folio, offset);
> > + void *buf = mapaddr;
> > offset += fuse_copy_do(cs, &buf, &count);
> > kunmap_local(mapaddr);
>
> kmap_local_folio() only maps the page which contains 'offset'.
> following what the functions in highmem.h do, i'd suggest something
> like:
>
> if (folio) {
> void *mapaddr = kmap_local_folio(folio, offset);
> void *buf = mapaddr;
>
> if (folio_test_highmem(folio) &&
> size > PAGE_SIZE - offset_in_page(offset))
> size = PAGE_SIZE - offset_in_page(offset);
> offset += fuse_copy_do(cs, &buf, &count);
> kunmap_local(mapaddr);
>
Ahh okay, I see, thanks. Do you think it makes sense to change
kmap_local_folio() to kmap all the pages in the folio if the folio is
in highmem instead of the caller needing to do that for each page in
the folio one by one? We would need a kunmap_local_folio() where we
pass in the folio so that we know how many pages need to be unmapped,
but it seems to me like with large folios, every caller will be
running into this issue, so maybe we should just have
kmap_local_folio() handle it?
Thanks,
Joanne
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-14 22:59 ` Joanne Koong
@ 2025-05-15 2:05 ` Matthew Wilcox
2025-05-15 18:50 ` Joanne Koong
0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2025-05-15 2:05 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef,
kernel-team, Bernd Schubert
On Wed, May 14, 2025 at 03:59:50PM -0700, Joanne Koong wrote:
> On Tue, May 13, 2025 at 5:18 PM Matthew Wilcox <willy@infradead.org> wrote:
> > kmap_local_folio() only maps the page which contains 'offset'.
> > following what the functions in highmem.h do, i'd suggest something
> > like:
> >
> > if (folio) {
> > void *mapaddr = kmap_local_folio(folio, offset);
> > void *buf = mapaddr;
> >
> > if (folio_test_highmem(folio) &&
> > size > PAGE_SIZE - offset_in_page(offset))
> > size = PAGE_SIZE - offset_in_page(offset);
> > offset += fuse_copy_do(cs, &buf, &count);
> > kunmap_local(mapaddr);
> >
> Ahh okay, I see, thanks. Do you think it makes sense to change
> kmap_local_folio() to kmap all the pages in the folio if the folio is
> in highmem instead of the caller needing to do that for each page in
> the folio one by one? We would need a kunmap_local_folio() where we
> pass in the folio so that we know how many pages need to be unmapped,
> but it seems to me like with large folios, every caller will be
> running into this issue, so maybe we should just have
> kmap_local_folio() handle it?
Spoken like someone who hasn't looked into the implementation of
kmap_local at all ;-)
Basically, this isn't possible. There's only space for 16 pages to be
mapped at once, and we might want to copy from one folio to another, so
we'd be limited to a maximum folio size of 8 pages (order 3). Expanding the reserved
space for kmap is hard because it's primarily used on 32-bit machines
and we're very constrained in VA space.
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-14 0:18 ` Matthew Wilcox
2025-05-14 22:59 ` Joanne Koong
@ 2025-05-15 8:26 ` Miklos Szeredi
2025-05-15 17:53 ` Joanne Koong
1 sibling, 1 reply; 23+ messages in thread
From: Miklos Szeredi @ 2025-05-15 8:26 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Joanne Koong, linux-fsdevel, bernd.schubert, jlayton, jefflexu,
josef, kernel-team, Bernd Schubert
On Wed, 14 May 2025 at 02:18, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, May 12, 2025 at 03:58:30PM -0700, Joanne Koong wrote:
> > @@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> > return err;
> > }
> > }
> > - if (page) {
> > - void *mapaddr = kmap_local_page(page);
> > - void *buf = mapaddr + offset;
> > + if (folio) {
> > + void *mapaddr = kmap_local_folio(folio, offset);
> > + void *buf = mapaddr;
> > offset += fuse_copy_do(cs, &buf, &count);
> > kunmap_local(mapaddr);
Fixed version pushed.
Thanks,
Miklos
* Re: [PATCH v6 10/11] fuse: optimize direct io large folios processing
2025-05-13 20:39 ` Joanne Koong
@ 2025-05-15 8:27 ` Miklos Szeredi
0 siblings, 0 replies; 23+ messages in thread
From: Miklos Szeredi @ 2025-05-15 8:27 UTC (permalink / raw)
To: Joanne Koong
Cc: linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef, willy,
kernel-team, Bernd Schubert
On Tue, 13 May 2025 at 22:39, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Tue, May 13, 2025 at 12:19 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Tue, 13 May 2025 at 00:59, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > Optimize processing folios larger than one page size for the direct io
> > > case. If contiguous pages are part of the same folio, collate the
> > > processing instead of processing each page in the folio separately.
> >
> > This patch is sort of special in the series, since the others are
> > basically no-op until large folios are enabled.
> >
> > Did you validate this in particular? Is there a good way to test
> > direct I/O on a buffer with mixed folio sizes?
>
> Hi Miklos,
>
> No, I did not validate this case in particular. I'm happy to drop this
> patch for now and resend it when large folios get turned on, if you
> prefer that. It seems like it'd be good to add this case to xfstests.
Dropped for now.
Thanks,
Miklos
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-15 8:26 ` Miklos Szeredi
@ 2025-05-15 17:53 ` Joanne Koong
2025-05-16 9:47 ` Miklos Szeredi
0 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2025-05-15 17:53 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Matthew Wilcox, linux-fsdevel, bernd.schubert, jlayton, jefflexu,
josef, kernel-team, Bernd Schubert
On Thu, May 15, 2025 at 1:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 14 May 2025 at 02:18, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, May 12, 2025 at 03:58:30PM -0700, Joanne Koong wrote:
> > > @@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> > > return err;
> > > }
> > > }
> > > - if (page) {
> > > - void *mapaddr = kmap_local_page(page);
> > > - void *buf = mapaddr + offset;
> > > + if (folio) {
> > > + void *mapaddr = kmap_local_folio(folio, offset);
> > > + void *buf = mapaddr;
> > > offset += fuse_copy_do(cs, &buf, &count);
> > > kunmap_local(mapaddr);
>
> Fixed version pushed.
I think this needs to be:
if (folio) {
void *mapaddr = kmap_local_folio(folio, offset);
void *buf = mapaddr;
unsigned copy = count;
unsigned bytes_copied;
if (folio_test_highmem(folio) && count >
PAGE_SIZE - offset_in_page(offset))
copy = PAGE_SIZE - offset_in_page(offset);
bytes_copied = fuse_copy_do(cs, &buf, &copy);
kunmap_local(mapaddr);
offset += bytes_copied;
count -= bytes_copied;
else it will only copy the first page of the highmem folio.
Thanks,
Joanne
>
> Thanks,
> Miklos
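The effect of clamping each copy to the end of the current page can be shown with a userspace model. This is a sketch, not the fuse code: copy_to_folio_by_page() and PAGE_SZ are hypothetical, the "folio" is an ordinary buffer, memcpy() stands in for fuse_copy_do(), and the clamp is applied unconditionally, whereas the snippet above applies it only when folio_test_highmem() is true.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u	/* assumed page size for this sketch */

/*
 * Copy `count` bytes from `src` into `folio_mem` starting at byte `offset`,
 * never letting a single copy cross a page boundary. In the kernel each
 * iteration would be bracketed by kmap_local_folio() and kunmap_local().
 * Returns the number of per-page copies performed.
 */
unsigned int copy_to_folio_by_page(unsigned char *folio_mem, size_t offset,
				   const unsigned char *src, size_t count)
{
	unsigned int chunks = 0;

	while (count) {
		size_t room = PAGE_SZ - offset % PAGE_SZ;	/* bytes left in this page */
		size_t copy = count < room ? count : room;

		memcpy(folio_mem + offset, src, copy);
		src += copy;
		offset += copy;
		count -= copy;
		chunks++;
	}
	return chunks;
}
```

Copying 6000 bytes to offset 3000 takes three chunks here (1096, 4096 and 808 bytes). Advancing offset and reducing count by the bytes actually copied is what lets the loop continue into the next page.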
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-15 2:05 ` Matthew Wilcox
@ 2025-05-15 18:50 ` Joanne Koong
0 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2025-05-15 18:50 UTC (permalink / raw)
To: Matthew Wilcox
Cc: miklos, linux-fsdevel, bernd.schubert, jlayton, jefflexu, josef,
kernel-team, Bernd Schubert
On Wed, May 14, 2025 at 7:06 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, May 14, 2025 at 03:59:50PM -0700, Joanne Koong wrote:
> > On Tue, May 13, 2025 at 5:18 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > kmap_local_folio() only maps the page which contains 'offset'.
> > > following what the functions in highmem.h do, i'd suggest something
> > > like:
> > >
> > > if (folio) {
> > > void *mapaddr = kmap_local_folio(folio, offset);
> > > void *buf = mapaddr;
> > >
> > > if (folio_test_highmem(folio) &&
> > > size > PAGE_SIZE - offset_in_page(offset))
> > > size = PAGE_SIZE - offset_in_page(offset);
> > > offset += fuse_copy_do(cs, &buf, &count);
> > > kunmap_local(mapaddr);
> > >
> > Ahh okay, I see, thanks. Do you think it makes sense to change
> > kmap_local_folio() to kmap all the pages in the folio if the folio is
> > in highmem instead of the caller needing to do that for each page in
> > the folio one by one? We would need a kunmap_local_folio() where we
> > pass in the folio so that we know how many pages need to be unmapped,
> > but it seems to me like with large folios, every caller will be
> > running into this issue, so maybe we should just have
> > kmap_local_folio() handle it?
>
> Spoken like someone who hasn't looked into the implementation of
> kmap_local at all ;-)
>
> Basically, this isn't possible. There's only space for 16 pages to be
> mapped at once, and we might want to copy from one folio to another, so
> we'd be limited to a maximum folio size of 8 pages (order 3). Expanding the reserved
> space for kmap is hard because it's primarily used on 32-bit machines
> and we're very constrained in VA space.
Ah okay, I see, thanks.
* Re: [PATCH v6 01/11] fuse: support copying large folios
2025-05-15 17:53 ` Joanne Koong
@ 2025-05-16 9:47 ` Miklos Szeredi
0 siblings, 0 replies; 23+ messages in thread
From: Miklos Szeredi @ 2025-05-16 9:47 UTC (permalink / raw)
To: Joanne Koong
Cc: Matthew Wilcox, linux-fsdevel, bernd.schubert, jlayton, jefflexu,
josef, kernel-team, Bernd Schubert
On Thu, 15 May 2025 at 19:54, Joanne Koong <joannelkoong@gmail.com> wrote:
> I think this needs to be:
>
> if (folio) {
> void *mapaddr = kmap_local_folio(folio, offset);
> void *buf = mapaddr;
> unsigned copy = count;
> unsigned bytes_copied;
>
> if (folio_test_highmem(folio) && count >
> PAGE_SIZE - offset_in_page(offset))
> copy = PAGE_SIZE - offset_in_page(offset);
>
> bytes_copied = fuse_copy_do(cs, &buf, &copy);
> kunmap_local(mapaddr);
> offset += bytes_copied;
> count -= bytes_copied;
>
> else it will only copy the first page of the highmem folio.
Right. Fix applied.
Thanks,
Miklos
Thread overview: 23+ messages
2025-05-12 22:58 [PATCH v6 00/11] fuse: support large folios Joanne Koong
2025-05-12 22:58 ` [PATCH v6 01/11] fuse: support copying " Joanne Koong
2025-05-14 0:18 ` Matthew Wilcox
2025-05-14 22:59 ` Joanne Koong
2025-05-15 2:05 ` Matthew Wilcox
2025-05-15 18:50 ` Joanne Koong
2025-05-15 8:26 ` Miklos Szeredi
2025-05-15 17:53 ` Joanne Koong
2025-05-16 9:47 ` Miklos Szeredi
2025-05-12 22:58 ` [PATCH v6 02/11] fuse: support large folios for retrieves Joanne Koong
2025-05-12 22:58 ` [PATCH v6 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
2025-05-12 22:58 ` [PATCH v6 04/11] fuse: support large folios for writethrough writes Joanne Koong
2025-05-12 22:58 ` [PATCH v6 05/11] fuse: support large folios for folio reads Joanne Koong
2025-05-12 22:58 ` [PATCH v6 06/11] fuse: support large folios for symlinks Joanne Koong
2025-05-12 22:58 ` [PATCH v6 07/11] fuse: support large folios for stores Joanne Koong
2025-05-12 22:58 ` [PATCH v6 08/11] fuse: support large folios for queued writes Joanne Koong
2025-05-12 22:58 ` [PATCH v6 09/11] fuse: support large folios for readahead Joanne Koong
2025-05-12 22:58 ` [PATCH v6 10/11] fuse: optimize direct io large folios processing Joanne Koong
2025-05-13 7:19 ` Miklos Szeredi
2025-05-13 20:39 ` Joanne Koong
2025-05-15 8:27 ` Miklos Szeredi
2025-05-12 22:58 ` [PATCH v6 11/11] fuse: support large folios for writeback Joanne Koong
2025-05-13 7:32 ` [PATCH v6 00/11] fuse: support large folios Miklos Szeredi