* [PATCH v5 00/11] fuse: support large folios
@ 2025-04-26 0:08 Joanne Koong
2025-04-26 0:08 ` [PATCH v5 01/11] fuse: support copying " Joanne Koong
` (10 more replies)
0 siblings, 11 replies; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
This patchset adds support for large folios in fuse.
This does not yet switch fuse to using large folios. Enabling large folios
in fuse depends on adding granular dirty-page tracking, which will be done
in a separate patchset that converts fuse to iomap [1]. Before large folios
can be enabled, there also needs to be follow-up work (also part of future
work) so that dirty-page balancing does not tank performance for
unprivileged servers, where bdi limits lead to subpar throttling [1].
This patchset (v5) is pretty much identical to v3 except for fixing up
readahead error handling.
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@mail.gmail.com/#t
Changelog:
v4: https://lore.kernel.org/linux-fsdevel/20250123012448.2479372-1-joannelkoong@gmail.com/
v4 -> v5:
* Now that temp pages are removed in FUSE, resubmit v3.
v3: https://lore.kernel.org/linux-fsdevel/20241213221818.322371-1-joannelkoong@gmail.com/
v3 -> v4:
* Add Jeff's reviewed-bys
* Drop writeback large folios changes, drop turning large folios on. These
will be part of a separate future patchset
v2: https://lore.kernel.org/linux-fsdevel/20241125220537.3663725-1-joannelkoong@gmail.com/
v2 -> v3:
* Fix direct io parsing to check each extracted page instead of assuming all
pages in a large folio will be used (Matthew)
v1: https://lore.kernel.org/linux-fsdevel/20241109001258.2216604-1-joannelkoong@gmail.com/
v1 -> v2:
* Change naming from "non-writeback write" to "writethrough write"
* Fix deadlock for writethrough writes by calling fault_in_iov_iter_readable()
first before __filemap_get_folio() (Josef)
* For readahead, retain original folio_size() for descs.length (Josef)
* Use folio_zero_range() api in fuse_copy_folio() (Josef)
* Add Josef's reviewed-bys
Joanne Koong (11):
fuse: support copying large folios
fuse: support large folios for retrieves
fuse: refactor fuse_fill_write_pages()
fuse: support large folios for writethrough writes
fuse: support large folios for folio reads
fuse: support large folios for symlinks
fuse: support large folios for stores
fuse: support large folios for queued writes
fuse: support large folios for readahead
fuse: optimize direct io large folios processing
fuse: support large folios for writeback
fs/fuse/dev.c | 126 ++++++++++++++++++------------------
fs/fuse/dir.c | 8 +--
fs/fuse/file.c | 148 +++++++++++++++++++++++++++++--------------
fs/fuse/fuse_dev_i.h | 2 +-
4 files changed, 169 insertions(+), 115 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v5 01/11] fuse: support copying large folios
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 18:05 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 02/11] fuse: support large folios for retrieves Joanne Koong
` (9 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Currently, all folios associated with fuse are one page in size. As part of
the work to enable large folios, this commit adds support for copying
to/from folios larger than one page.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/dev.c | 84 +++++++++++++++++++-------------------------
fs/fuse/fuse_dev_i.h | 2 +-
2 files changed, 37 insertions(+), 49 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 155bb6aeaef5..7b0e3a394480 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -955,10 +955,10 @@ static int fuse_check_folio(struct folio *folio)
* folio that was originally in @pagep will lose a reference and the new
* folio returned in @pagep will carry a reference.
*/
-static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
+static int fuse_try_move_folio(struct fuse_copy_state *cs, struct folio **foliop)
{
int err;
- struct folio *oldfolio = page_folio(*pagep);
+ struct folio *oldfolio = *foliop;
struct folio *newfolio;
struct pipe_buffer *buf = cs->pipebufs;
@@ -979,7 +979,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
cs->pipebufs++;
cs->nr_segs--;
- if (cs->len != PAGE_SIZE)
+ if (cs->len != folio_size(oldfolio))
goto out_fallback;
if (!pipe_buf_try_steal(cs->pipe, buf))
@@ -1025,7 +1025,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
if (test_bit(FR_ABORTED, &cs->req->flags))
err = -ENOENT;
else
- *pagep = &newfolio->page;
+ *foliop = newfolio;
spin_unlock(&cs->req->waitq.lock);
if (err) {
@@ -1058,8 +1058,8 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
goto out_put_old;
}
-static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
- unsigned offset, unsigned count)
+static int fuse_ref_folio(struct fuse_copy_state *cs, struct folio *folio,
+ unsigned offset, unsigned count)
{
struct pipe_buffer *buf;
int err;
@@ -1067,17 +1067,17 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
if (cs->nr_segs >= cs->pipe->max_usage)
return -EIO;
- get_page(page);
+ folio_get(folio);
err = unlock_request(cs->req);
if (err) {
- put_page(page);
+ folio_put(folio);
return err;
}
fuse_copy_finish(cs);
buf = cs->pipebufs;
- buf->page = page;
+ buf->page = &folio->page;
buf->offset = offset;
buf->len = count;
@@ -1089,20 +1089,21 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
}
/*
- * Copy a page in the request to/from the userspace buffer. Must be
+ * Copy a folio in the request to/from the userspace buffer. Must be
* done atomically
*/
-static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
- unsigned offset, unsigned count, int zeroing)
+static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop,
+ unsigned offset, unsigned count, int zeroing)
{
int err;
- struct page *page = *pagep;
+ struct folio *folio = *foliop;
+ size_t size = folio_size(folio);
- if (page && zeroing && count < PAGE_SIZE)
- clear_highpage(page);
+ if (folio && zeroing && count < size)
+ folio_zero_range(folio, 0, size);
while (count) {
- if (cs->write && cs->pipebufs && page) {
+ if (cs->write && cs->pipebufs && folio) {
/*
* Can't control lifetime of pipe buffers, so always
* copy user pages.
@@ -1112,12 +1113,12 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
if (err)
return err;
} else {
- return fuse_ref_page(cs, page, offset, count);
+ return fuse_ref_folio(cs, folio, offset, count);
}
} else if (!cs->len) {
- if (cs->move_pages && page &&
- offset == 0 && count == PAGE_SIZE) {
- err = fuse_try_move_page(cs, pagep);
+ if (cs->move_folios && folio &&
+ offset == 0 && count == folio_size(folio)) {
+ err = fuse_try_move_folio(cs, foliop);
if (err <= 0)
return err;
} else {
@@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
return err;
}
}
- if (page) {
- void *mapaddr = kmap_local_page(page);
- void *buf = mapaddr + offset;
+ if (folio) {
+ void *mapaddr = kmap_local_folio(folio, offset);
+ void *buf = mapaddr;
offset += fuse_copy_do(cs, &buf, &count);
kunmap_local(mapaddr);
} else
offset += fuse_copy_do(cs, NULL, &count);
}
- if (page && !cs->write)
- flush_dcache_page(page);
+ if (folio && !cs->write)
+ flush_dcache_folio(folio);
return 0;
}
-/* Copy pages in the request to/from userspace buffer */
-static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
- int zeroing)
+/* Copy folios in the request to/from userspace buffer */
+static int fuse_copy_folios(struct fuse_copy_state *cs, unsigned nbytes,
+ int zeroing)
{
unsigned i;
struct fuse_req *req = cs->req;
@@ -1151,23 +1152,12 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
int err;
unsigned int offset = ap->descs[i].offset;
unsigned int count = min(nbytes, ap->descs[i].length);
- struct page *orig, *pagep;
-
- orig = pagep = &ap->folios[i]->page;
- err = fuse_copy_page(cs, &pagep, offset, count, zeroing);
+ err = fuse_copy_folio(cs, &ap->folios[i], offset, count, zeroing);
if (err)
return err;
nbytes -= count;
-
- /*
- * fuse_copy_page may have moved a page from a pipe instead of
- * copying into our given page, so update the folios if it was
- * replaced.
- */
- if (pagep != orig)
- ap->folios[i] = page_folio(pagep);
}
return 0;
}
@@ -1197,7 +1187,7 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
for (i = 0; !err && i < numargs; i++) {
struct fuse_arg *arg = &args[i];
if (i == numargs - 1 && argpages)
- err = fuse_copy_pages(cs, arg->size, zeroing);
+ err = fuse_copy_folios(cs, arg->size, zeroing);
else
err = fuse_copy_one(cs, arg->value, arg->size);
}
@@ -1786,7 +1776,6 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
num = outarg.size;
while (num) {
struct folio *folio;
- struct page *page;
unsigned int this_num;
folio = filemap_grab_folio(mapping, index);
@@ -1794,9 +1783,8 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
if (IS_ERR(folio))
goto out_iput;
- page = &folio->page;
this_num = min_t(unsigned, num, folio_size(folio) - offset);
- err = fuse_copy_page(cs, &page, offset, this_num, 0);
+ err = fuse_copy_folio(cs, &folio, offset, this_num, 0);
if (!folio_test_uptodate(folio) && !err && offset == 0 &&
(this_num == folio_size(folio) || file_size == end)) {
folio_zero_segment(folio, this_num, folio_size(folio));
@@ -2037,8 +2025,8 @@ static int fuse_notify_inc_epoch(struct fuse_conn *fc)
static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
unsigned int size, struct fuse_copy_state *cs)
{
- /* Don't try to move pages (yet) */
- cs->move_pages = false;
+ /* Don't try to move folios (yet) */
+ cs->move_folios = false;
switch (code) {
case FUSE_NOTIFY_POLL:
@@ -2189,7 +2177,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
spin_unlock(&fpq->lock);
cs->req = req;
if (!req->args->page_replace)
- cs->move_pages = false;
+ cs->move_folios = false;
if (oh.error)
err = nbytes != sizeof(oh) ? -EINVAL : 0;
@@ -2307,7 +2295,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
cs.pipe = pipe;
if (flags & SPLICE_F_MOVE)
- cs.move_pages = true;
+ cs.move_folios = true;
ret = fuse_dev_do_write(fud, &cs, len);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index db136e045925..5a9bd771a319 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -30,7 +30,7 @@ struct fuse_copy_state {
unsigned int len;
unsigned int offset;
bool write:1;
- bool move_pages:1;
+ bool move_folios:1;
bool is_uring:1;
struct {
unsigned int copied_sz; /* copied size into the user buffer */
--
2.47.1
* [PATCH v5 02/11] fuse: support large folios for retrieves
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
2025-04-26 0:08 ` [PATCH v5 01/11] fuse: support copying " Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 18:07 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
` (8 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for retrieves.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/dev.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 7b0e3a394480..fb81c0a1c6cd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1837,7 +1837,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
unsigned int num;
unsigned int offset;
size_t total_len = 0;
- unsigned int num_pages, cur_pages = 0;
+ unsigned int num_pages;
struct fuse_conn *fc = fm->fc;
struct fuse_retrieve_args *ra;
size_t args_size = sizeof(*ra);
@@ -1855,6 +1855,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
num_pages = min(num_pages, fc->max_pages);
+ num = min(num, num_pages << PAGE_SHIFT);
args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0]));
@@ -1875,25 +1876,29 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
index = outarg->offset >> PAGE_SHIFT;
- while (num && cur_pages < num_pages) {
+ while (num) {
struct folio *folio;
- unsigned int this_num;
+ unsigned int folio_offset;
+ unsigned int nr_bytes;
+ unsigned int nr_pages;
folio = filemap_get_folio(mapping, index);
if (IS_ERR(folio))
break;
- this_num = min_t(unsigned, num, PAGE_SIZE - offset);
+ folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+ nr_bytes = min(folio_size(folio) - folio_offset, num);
+ nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
ap->folios[ap->num_folios] = folio;
- ap->descs[ap->num_folios].offset = offset;
- ap->descs[ap->num_folios].length = this_num;
+ ap->descs[ap->num_folios].offset = folio_offset;
+ ap->descs[ap->num_folios].length = nr_bytes;
ap->num_folios++;
- cur_pages++;
offset = 0;
- num -= this_num;
- total_len += this_num;
- index++;
+ num -= nr_bytes;
+ total_len += nr_bytes;
+ index += nr_pages;
}
ra->inarg.offset = outarg->offset;
ra->inarg.size = total_len;
--
2.47.1
* [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages()
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
2025-04-26 0:08 ` [PATCH v5 01/11] fuse: support copying " Joanne Koong
2025-04-26 0:08 ` [PATCH v5 02/11] fuse: support large folios for retrieves Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-04-28 5:32 ` Dan Carpenter
2025-05-04 18:08 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 04/11] fuse: support large folios for writethrough writes Joanne Koong
` (7 subsequent siblings)
10 siblings, 2 replies; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Refactor the logic in fuse_fill_write_pages() for copying out write
data. This will make the future change for supporting large folios for
writes easier. No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e203dd4fcc0f..edc86485065e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1132,21 +1132,21 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
struct fuse_args_pages *ap = &ia->ap;
struct fuse_conn *fc = get_fuse_conn(mapping->host);
unsigned offset = pos & (PAGE_SIZE - 1);
- unsigned int nr_pages = 0;
size_t count = 0;
+ unsigned int num;
int err;
+ num = min(iov_iter_count(ii), fc->max_write);
+ num = min(num, max_pages << PAGE_SHIFT);
+
ap->args.in_pages = true;
ap->descs[0].offset = offset;
- do {
+ while (num) {
size_t tmp;
struct folio *folio;
pgoff_t index = pos >> PAGE_SHIFT;
- size_t bytes = min_t(size_t, PAGE_SIZE - offset,
- iov_iter_count(ii));
-
- bytes = min_t(size_t, bytes, fc->max_write - count);
+ unsigned bytes = min(PAGE_SIZE - offset, num);
again:
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
@@ -1182,10 +1182,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
ap->folios[ap->num_folios] = folio;
ap->descs[ap->num_folios].length = tmp;
ap->num_folios++;
- nr_pages++;
count += tmp;
pos += tmp;
+ num -= tmp;
offset += tmp;
if (offset == PAGE_SIZE)
offset = 0;
@@ -1200,10 +1200,9 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
ia->write.folio_locked = true;
break;
}
- if (!fc->big_writes)
+ if (!fc->big_writes || offset != 0)
break;
- } while (iov_iter_count(ii) && count < fc->max_write &&
- nr_pages < max_pages && offset == 0);
+ }
return count > 0 ? count : err;
}
--
2.47.1
* [PATCH v5 04/11] fuse: support large folios for writethrough writes
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (2 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 18:40 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 05/11] fuse: support large folios for folio reads Joanne Koong
` (6 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for writethrough
writes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index edc86485065e..e44b6d26c1c6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1146,7 +1146,8 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
size_t tmp;
struct folio *folio;
pgoff_t index = pos >> PAGE_SHIFT;
- unsigned bytes = min(PAGE_SIZE - offset, num);
+ unsigned int bytes;
+ unsigned int folio_offset;
again:
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
@@ -1159,7 +1160,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
if (mapping_writably_mapped(mapping))
flush_dcache_folio(folio);
- tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
+ folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+ bytes = min(folio_size(folio) - folio_offset, num);
+
+ tmp = copy_folio_from_iter_atomic(folio, folio_offset, bytes, ii);
flush_dcache_folio(folio);
if (!tmp) {
@@ -1180,6 +1184,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
err = 0;
ap->folios[ap->num_folios] = folio;
+ ap->descs[ap->num_folios].offset = folio_offset;
ap->descs[ap->num_folios].length = tmp;
ap->num_folios++;
@@ -1187,11 +1192,11 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
pos += tmp;
num -= tmp;
offset += tmp;
- if (offset == PAGE_SIZE)
+ if (offset == folio_size(folio))
offset = 0;
- /* If we copied full page, mark it uptodate */
- if (tmp == PAGE_SIZE)
+ /* If we copied full folio, mark it uptodate */
+ if (tmp == folio_size(folio))
folio_mark_uptodate(folio);
if (folio_test_uptodate(folio)) {
--
2.47.1
* [PATCH v5 05/11] fuse: support large folios for folio reads
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (3 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 04/11] fuse: support large folios for writethrough writes Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 18:58 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 06/11] fuse: support large folios for symlinks Joanne Koong
` (5 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for folio reads into
the page cache.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e44b6d26c1c6..0ca3b31c59f9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -793,7 +793,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
struct inode *inode = folio->mapping->host;
struct fuse_mount *fm = get_fuse_mount(inode);
loff_t pos = folio_pos(folio);
- struct fuse_folio_desc desc = { .length = PAGE_SIZE };
+ struct fuse_folio_desc desc = { .length = folio_size(folio) };
struct fuse_io_args ia = {
.ap.args.page_zeroing = true,
.ap.args.out_pages = true,
--
2.47.1
* [PATCH v5 06/11] fuse: support large folios for symlinks
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (4 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 05/11] fuse: support large folios for folio reads Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 19:04 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 07/11] fuse: support large folios for stores Joanne Koong
` (4 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Support large folios for symlinks and rename fuse_readlink_page() to
fuse_readlink_folio().
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/dir.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1fb0b15a6088..3003119559e8 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1629,10 +1629,10 @@ static int fuse_permission(struct mnt_idmap *idmap,
return err;
}
-static int fuse_readlink_page(struct inode *inode, struct folio *folio)
+static int fuse_readlink_folio(struct inode *inode, struct folio *folio)
{
struct fuse_mount *fm = get_fuse_mount(inode);
- struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 };
+ struct fuse_folio_desc desc = { .length = folio_size(folio) - 1 };
struct fuse_args_pages ap = {
.num_folios = 1,
.folios = &folio,
@@ -1687,7 +1687,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
if (!folio)
goto out_err;
- err = fuse_readlink_page(inode, folio);
+ err = fuse_readlink_folio(inode, folio);
if (err) {
folio_put(folio);
goto out_err;
@@ -2277,7 +2277,7 @@ void fuse_init_dir(struct inode *inode)
static int fuse_symlink_read_folio(struct file *null, struct folio *folio)
{
- int err = fuse_readlink_page(folio->mapping->host, folio);
+ int err = fuse_readlink_folio(folio->mapping->host, folio);
if (!err)
folio_mark_uptodate(folio);
--
2.47.1
* [PATCH v5 07/11] fuse: support large folios for stores
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (5 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 06/11] fuse: support large folios for symlinks Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-04-26 0:08 ` [PATCH v5 08/11] fuse: support large folios for queued writes Joanne Koong
` (3 subsequent siblings)
10 siblings, 0 replies; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for stores.
Also change variable naming from "this_num" to "nr_bytes".
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/dev.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fb81c0a1c6cd..a6ee8cd0f5cb 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1776,18 +1776,23 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
num = outarg.size;
while (num) {
struct folio *folio;
- unsigned int this_num;
+ unsigned int folio_offset;
+ unsigned int nr_bytes;
+ unsigned int nr_pages;
folio = filemap_grab_folio(mapping, index);
err = PTR_ERR(folio);
if (IS_ERR(folio))
goto out_iput;
- this_num = min_t(unsigned, num, folio_size(folio) - offset);
- err = fuse_copy_folio(cs, &folio, offset, this_num, 0);
+ folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+ nr_bytes = min_t(unsigned, num, folio_size(folio) - folio_offset);
+ nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ err = fuse_copy_folio(cs, &folio, folio_offset, nr_bytes, 0);
if (!folio_test_uptodate(folio) && !err && offset == 0 &&
- (this_num == folio_size(folio) || file_size == end)) {
- folio_zero_segment(folio, this_num, folio_size(folio));
+ (nr_bytes == folio_size(folio) || file_size == end)) {
+ folio_zero_segment(folio, nr_bytes, folio_size(folio));
folio_mark_uptodate(folio);
}
folio_unlock(folio);
@@ -1796,9 +1801,9 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
if (err)
goto out_iput;
- num -= this_num;
+ num -= nr_bytes;
offset = 0;
- index++;
+ index += nr_pages;
}
err = 0;
--
2.47.1
* [PATCH v5 08/11] fuse: support large folios for queued writes
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (6 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 07/11] fuse: support large folios for stores Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 19:08 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 09/11] fuse: support large folios for readahead Joanne Koong
` (2 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for queued writes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0ca3b31c59f9..1d38486fae50 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1790,11 +1790,14 @@ __releases(fi->lock)
__acquires(fi->lock)
{
struct fuse_inode *fi = get_fuse_inode(wpa->inode);
+ struct fuse_args_pages *ap = &wpa->ia.ap;
struct fuse_write_in *inarg = &wpa->ia.write.in;
- struct fuse_args *args = &wpa->ia.ap.args;
- /* Currently, all folios in FUSE are one page */
- __u64 data_size = wpa->ia.ap.num_folios * PAGE_SIZE;
- int err;
+ struct fuse_args *args = &ap->args;
+ __u64 data_size = 0;
+ int err, i;
+
+ for (i = 0; i < ap->num_folios; i++)
+ data_size += ap->descs[i].length;
fi->writectr++;
if (inarg->offset + data_size <= size) {
--
2.47.1
* [PATCH v5 09/11] fuse: support large folios for readahead
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (7 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 08/11] fuse: support large folios for queued writes Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 19:13 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 10/11] fuse: optimize direct io large folios processing Joanne Koong
2025-04-26 0:08 ` [PATCH v5 11/11] fuse: support large folios for writeback Joanne Koong
10 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for readahead.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1d38486fae50..9a31f2a516b9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
fuse_io_free(ia);
}
-static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+ unsigned int count)
{
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
struct fuse_args_pages *ap = &ia->ap;
loff_t pos = folio_pos(ap->folios[0]);
- /* Currently, all folios in FUSE are one page */
- size_t count = ap->num_folios << PAGE_SHIFT;
ssize_t res;
int err;
@@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
struct inode *inode = rac->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
unsigned int max_pages, nr_pages;
+ struct folio *folio = NULL;
if (fuse_is_bad(inode))
return;
@@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
while (nr_pages) {
struct fuse_io_args *ia;
struct fuse_args_pages *ap;
- struct folio *folio;
unsigned cur_pages = min(max_pages, nr_pages);
+ unsigned int pages = 0;
if (fc->num_background >= fc->congestion_threshold &&
rac->ra->async_size >= readahead_count(rac))
@@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
ia = fuse_io_alloc(NULL, cur_pages);
if (!ia)
- return;
+ break;
ap = &ia->ap;
- while (ap->num_folios < cur_pages) {
+ while (pages < cur_pages) {
+ unsigned int folio_pages;
+
/*
* This returns a folio with a ref held on it.
* The ref needs to be held until the request is
@@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
* fuse_try_move_page()) drops the ref after it's
* replaced in the page cache.
*/
- folio = __readahead_folio(rac);
+ if (!folio)
+ folio = __readahead_folio(rac);
+
+ folio_pages = folio_nr_pages(folio);
+ if (folio_pages > cur_pages - pages)
+ break;
+
ap->folios[ap->num_folios] = folio;
ap->descs[ap->num_folios].length = folio_size(folio);
ap->num_folios++;
+ pages += folio_pages;
+ folio = NULL;
+ }
+ if (!pages) {
+ fuse_io_free(ia);
+ break;
}
- fuse_send_readpages(ia, rac->file);
- nr_pages -= cur_pages;
+ fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
+ nr_pages -= pages;
+ }
+ if (folio) {
+ folio_end_read(folio, false);
+ folio_put(folio);
}
}
--
2.47.1
* [PATCH v5 10/11] fuse: optimize direct io large folios processing
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (8 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 09/11] fuse: support large folios for readahead Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
2025-05-04 19:15 ` Bernd Schubert
2025-07-04 10:24 ` David Hildenbrand
2025-04-26 0:08 ` [PATCH v5 11/11] fuse: support large folios for writeback Joanne Koong
10 siblings, 2 replies; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Optimize processing folios larger than one page size for the direct io
case. If contiguous pages are part of the same folio, collate the
processing instead of processing each page in the folio separately.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 14 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9a31f2a516b9..61eaec1c993b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1490,7 +1490,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
}
while (nbytes < *nbytesp && nr_pages < max_pages) {
- unsigned nfolios, i;
+ struct folio *prev_folio = NULL;
+ unsigned npages, i;
size_t start;
ret = iov_iter_extract_pages(ii, &pages,
@@ -1502,23 +1503,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
nbytes += ret;
- nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
+ npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
- for (i = 0; i < nfolios; i++) {
- struct folio *folio = page_folio(pages[i]);
- unsigned int offset = start +
- (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
- unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
+ /*
+ * We must check each extracted page. We can't assume every page
+ * in a large folio is used. For example, userspace may mmap() a
+ * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
+ * a large folio, in which case the extracted pages could be
+ *
+ * folio A page 0
+ * folio A page 1
+ * folio B page 0
+ * folio A page 3
+ *
+ * where folio A belongs to the file and folio B is an anonymous
+ * COW page.
+ */
+ for (i = 0; i < npages && ret; i++) {
+ struct folio *folio;
+ unsigned int offset;
+ unsigned int len;
+
+ WARN_ON(!pages[i]);
+ folio = page_folio(pages[i]);
+
+ len = min_t(unsigned int, ret, PAGE_SIZE - start);
+
+ if (folio == prev_folio && pages[i] != pages[i - 1]) {
+ WARN_ON(ap->folios[ap->num_folios - 1] != folio);
+ ap->descs[ap->num_folios - 1].length += len;
+ WARN_ON(ap->descs[ap->num_folios - 1].length > folio_size(folio));
+ } else {
+ offset = start + (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
+ ap->descs[ap->num_folios].offset = offset;
+ ap->descs[ap->num_folios].length = len;
+ ap->folios[ap->num_folios] = folio;
+ start = 0;
+ ap->num_folios++;
+ prev_folio = folio;
+ }
- ap->descs[ap->num_folios].offset = offset;
- ap->descs[ap->num_folios].length = len;
- ap->folios[ap->num_folios] = folio;
- start = 0;
ret -= len;
- ap->num_folios++;
}
-
- nr_pages += nfolios;
+ nr_pages += npages;
}
kfree(pages);
--
2.47.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v5 11/11] fuse: support large folios for writeback
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
` (9 preceding siblings ...)
2025-04-26 0:08 ` [PATCH v5 10/11] fuse: optimize direct io large folios processing Joanne Koong
@ 2025-04-26 0:08 ` Joanne Koong
10 siblings, 0 replies; 31+ messages in thread
From: Joanne Koong @ 2025-04-26 0:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
Add support for folios larger than one page size for writeback.
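The three send-triggering checks this patch rewrites in `fuse_writepage_need_send()` can be modelled as pure arithmetic. This is a simplified sketch with invented parameter names, not the kernel function:

```c
#include <assert.h>

/*
 * Model of fuse_writepage_need_send() after the large-folio conversion:
 * a new request is needed if adding this folio would exceed the page or
 * byte limits, or if it is not contiguous with the previous folio.
 */
static int need_send_model(unsigned int nr_pages, unsigned int folio_pages,
			   unsigned int max_pages,
			   unsigned long bytes_so_far, unsigned int folio_bytes,
			   unsigned long max_write,
			   unsigned long prev_next_index, unsigned long folio_index)
{
	if (nr_pages + folio_pages > max_pages)
		return 1;	/* would exceed max pages */
	if (bytes_so_far + folio_bytes > max_write)
		return 1;	/* would exceed max write bytes */
	if (prev_next_index != folio_index)
		return 1;	/* discontiguous with the previous folio */
	return 0;
}
```

Note why the patch tracks `data->nr_pages` separately: with variable-size folios, `ap->num_folios` no longer equals the number of pages, so the old `num_folios`-based limits would under-count.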
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fuse/file.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 61eaec1c993b..5e7187446730 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2014,7 +2014,7 @@ static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struc
ap->folios[folio_index] = folio;
ap->descs[folio_index].offset = 0;
- ap->descs[folio_index].length = PAGE_SIZE;
+ ap->descs[folio_index].length = folio_size(folio);
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
}
@@ -2088,6 +2088,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
struct inode *inode;
unsigned int max_folios;
+ unsigned int nr_pages;
};
static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
@@ -2135,15 +2136,15 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
WARN_ON(!ap->num_folios);
/* Reached max pages */
- if (ap->num_folios == fc->max_pages)
+ if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
return true;
/* Reached max write bytes */
- if ((ap->num_folios + 1) * PAGE_SIZE > fc->max_write)
+ if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
return true;
/* Discontinuity */
- if (ap->folios[ap->num_folios - 1]->index + 1 != folio_index(folio))
+ if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio_index(folio))
return true;
/* Need to grow the pages array? If so, did the expansion fail? */
@@ -2174,6 +2175,7 @@ static int fuse_writepages_fill(struct folio *folio,
if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
fuse_writepages_send(data);
data->wpa = NULL;
+ data->nr_pages = 0;
}
if (data->wpa == NULL) {
@@ -2188,6 +2190,7 @@ static int fuse_writepages_fill(struct folio *folio,
folio_start_writeback(folio);
fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
+ data->nr_pages += folio_nr_pages(folio);
err = 0;
ap->num_folios++;
@@ -2218,6 +2221,7 @@ static int fuse_writepages(struct address_space *mapping,
data.inode = inode;
data.wpa = NULL;
data.ff = NULL;
+ data.nr_pages = 0;
err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
if (data.wpa) {
--
2.47.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages()
2025-04-26 0:08 ` [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
@ 2025-04-28 5:32 ` Dan Carpenter
2025-04-28 22:10 ` Joanne Koong
2025-05-04 18:08 ` Bernd Schubert
1 sibling, 1 reply; 31+ messages in thread
From: Dan Carpenter @ 2025-04-28 5:32 UTC (permalink / raw)
To: oe-kbuild, Joanne Koong, miklos
Cc: lkp, oe-kbuild-all, linux-fsdevel, jlayton, jefflexu, josef,
bernd.schubert, willy, kernel-team
Hi Joanne,
kernel test robot noticed the following build warnings:
[If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/fuse-support-copying-large-folios/20250426-081219
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
patch link: https://lore.kernel.org/r/20250426000828.3216220-4-joannelkoong%40gmail.com
patch subject: [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages()
config: i386-randconfig-141-20250426 (https://download.01.org/0day-ci/archive/20250427/202504270319.GmkEM1Xg-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202504270319.GmkEM1Xg-lkp@intel.com/
smatch warnings:
fs/fuse/file.c:1207 fuse_fill_write_pages() error: uninitialized symbol 'err'.
vim +/err +1207 fs/fuse/file.c
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1127 static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1128 struct address_space *mapping,
338f2e3f3341a9 Miklos Szeredi 2019-09-10 1129 struct iov_iter *ii, loff_t pos,
338f2e3f3341a9 Miklos Szeredi 2019-09-10 1130 unsigned int max_pages)
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1131 {
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1132 struct fuse_args_pages *ap = &ia->ap;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1133 struct fuse_conn *fc = get_fuse_conn(mapping->host);
09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 1134 unsigned offset = pos & (PAGE_SIZE - 1);
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1135 size_t count = 0;
dfda790dfda452 Joanne Koong 2025-04-25 1136 unsigned int num;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1137 int err;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1138
dfda790dfda452 Joanne Koong 2025-04-25 1139 num = min(iov_iter_count(ii), fc->max_write);
Can iov_iter_count() return zero here?
dfda790dfda452 Joanne Koong 2025-04-25 1140 num = min(num, max_pages << PAGE_SHIFT);
dfda790dfda452 Joanne Koong 2025-04-25 1141
338f2e3f3341a9 Miklos Szeredi 2019-09-10 1142 ap->args.in_pages = true;
68bfb7eb7f7de3 Joanne Koong 2024-10-24 1143 ap->descs[0].offset = offset;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1144
dfda790dfda452 Joanne Koong 2025-04-25 1145 while (num) {
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1146 size_t tmp;
9bafbe7ae01321 Josef Bacik 2024-09-30 1147 struct folio *folio;
09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 1148 pgoff_t index = pos >> PAGE_SHIFT;
dfda790dfda452 Joanne Koong 2025-04-25 1149 unsigned bytes = min(PAGE_SIZE - offset, num);
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1150
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1151 again:
9bafbe7ae01321 Josef Bacik 2024-09-30 1152 folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
9bafbe7ae01321 Josef Bacik 2024-09-30 1153 mapping_gfp_mask(mapping));
9bafbe7ae01321 Josef Bacik 2024-09-30 1154 if (IS_ERR(folio)) {
9bafbe7ae01321 Josef Bacik 2024-09-30 1155 err = PTR_ERR(folio);
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1156 break;
9bafbe7ae01321 Josef Bacik 2024-09-30 1157 }
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1158
931e80e4b3263d anfei zhou 2010-02-02 1159 if (mapping_writably_mapped(mapping))
9bafbe7ae01321 Josef Bacik 2024-09-30 1160 flush_dcache_folio(folio);
931e80e4b3263d anfei zhou 2010-02-02 1161
9bafbe7ae01321 Josef Bacik 2024-09-30 1162 tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
9bafbe7ae01321 Josef Bacik 2024-09-30 1163 flush_dcache_folio(folio);
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1164
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1165 if (!tmp) {
9bafbe7ae01321 Josef Bacik 2024-09-30 1166 folio_unlock(folio);
9bafbe7ae01321 Josef Bacik 2024-09-30 1167 folio_put(folio);
faa794dd2e17e7 Dave Hansen 2025-01-29 1168
faa794dd2e17e7 Dave Hansen 2025-01-29 1169 /*
faa794dd2e17e7 Dave Hansen 2025-01-29 1170 * Ensure forward progress by faulting in
faa794dd2e17e7 Dave Hansen 2025-01-29 1171 * while not holding the folio lock:
faa794dd2e17e7 Dave Hansen 2025-01-29 1172 */
faa794dd2e17e7 Dave Hansen 2025-01-29 1173 if (fault_in_iov_iter_readable(ii, bytes)) {
faa794dd2e17e7 Dave Hansen 2025-01-29 1174 err = -EFAULT;
faa794dd2e17e7 Dave Hansen 2025-01-29 1175 break;
faa794dd2e17e7 Dave Hansen 2025-01-29 1176 }
faa794dd2e17e7 Dave Hansen 2025-01-29 1177
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1178 goto again;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1179 }
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1180
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1181 err = 0;
f2ef459bab7326 Joanne Koong 2024-10-24 1182 ap->folios[ap->num_folios] = folio;
68bfb7eb7f7de3 Joanne Koong 2024-10-24 1183 ap->descs[ap->num_folios].length = tmp;
f2ef459bab7326 Joanne Koong 2024-10-24 1184 ap->num_folios++;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1185
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1186 count += tmp;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1187 pos += tmp;
dfda790dfda452 Joanne Koong 2025-04-25 1188 num -= tmp;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1189 offset += tmp;
09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 1190 if (offset == PAGE_SIZE)
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1191 offset = 0;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1192
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1193 /* If we copied full page, mark it uptodate */
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1194 if (tmp == PAGE_SIZE)
9bafbe7ae01321 Josef Bacik 2024-09-30 1195 folio_mark_uptodate(folio);
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1196
9bafbe7ae01321 Josef Bacik 2024-09-30 1197 if (folio_test_uptodate(folio)) {
9bafbe7ae01321 Josef Bacik 2024-09-30 1198 folio_unlock(folio);
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1199 } else {
f2ef459bab7326 Joanne Koong 2024-10-24 1200 ia->write.folio_locked = true;
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1201 break;
4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1202 }
dfda790dfda452 Joanne Koong 2025-04-25 1203 if (!fc->big_writes || offset != 0)
78bb6cb9a890d3 Miklos Szeredi 2008-05-12 1204 break;
dfda790dfda452 Joanne Koong 2025-04-25 1205 }
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1206
ea9b9907b82a09 Nicholas Piggin 2008-04-30 @1207 return count > 0 ? count : err;
ea9b9907b82a09 Nicholas Piggin 2008-04-30 1208 }
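The Smatch warning boils down to one control-flow path: if the clamped `num` is zero on entry, the `while (num)` body never runs, `count` stays zero, and the uninitialized `err` is returned. A minimal model of that return path (simplified, with an invented function name) shows why initializing `err` to 0 is sufficient:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the fuse_fill_write_pages() return path. */
static long fill_write_pages_model(size_t iter_count, size_t max_write)
{
	size_t count = 0;
	long err = 0;	/* the fix: without "= 0", err is indeterminate when num == 0 */
	size_t num = iter_count < max_write ? iter_count : max_write;

	while (num) {
		/* copy up to one page's worth; assume the copy always succeeds */
		size_t tmp = num < 4096 ? num : 4096;
		count += tmp;
		num -= tmp;
	}
	/* with iter_count == 0, count is 0 and err is what gets returned */
	return count > 0 ? (long)count : err;
}
```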
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages()
2025-04-28 5:32 ` Dan Carpenter
@ 2025-04-28 22:10 ` Joanne Koong
0 siblings, 0 replies; 31+ messages in thread
From: Joanne Koong @ 2025-04-28 22:10 UTC (permalink / raw)
To: Dan Carpenter
Cc: oe-kbuild, miklos, lkp, oe-kbuild-all, linux-fsdevel, jlayton,
jefflexu, josef, bernd.schubert, willy, kernel-team
On Sun, Apr 27, 2025 at 10:32 PM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> Hi Joanne,
>
> kernel test robot noticed the following build warnings:
>
> [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/fuse-support-copying-large-folios/20250426-081219
> base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
> patch link: https://lore.kernel.org/r/20250426000828.3216220-4-joannelkoong%40gmail.com
> patch subject: [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages()
> config: i386-randconfig-141-20250426 (https://download.01.org/0day-ci/archive/20250427/202504270319.GmkEM1Xg-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202504270319.GmkEM1Xg-lkp@intel.com/
>
> smatch warnings:
> fs/fuse/file.c:1207 fuse_fill_write_pages() error: uninitialized symbol 'err'.
>
> vim +/err +1207 fs/fuse/file.c
>
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1127 static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1128 struct address_space *mapping,
> 338f2e3f3341a9 Miklos Szeredi 2019-09-10 1129 struct iov_iter *ii, loff_t pos,
> 338f2e3f3341a9 Miklos Szeredi 2019-09-10 1130 unsigned int max_pages)
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1131 {
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1132 struct fuse_args_pages *ap = &ia->ap;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1133 struct fuse_conn *fc = get_fuse_conn(mapping->host);
> 09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 1134 unsigned offset = pos & (PAGE_SIZE - 1);
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1135 size_t count = 0;
> dfda790dfda452 Joanne Koong 2025-04-25 1136 unsigned int num;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1137 int err;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1138
> dfda790dfda452 Joanne Koong 2025-04-25 1139 num = min(iov_iter_count(ii), fc->max_write);
>
> Can iov_iter_count() return zero here?
>
> dfda790dfda452 Joanne Koong 2025-04-25 1140 num = min(num, max_pages << PAGE_SHIFT);
> dfda790dfda452 Joanne Koong 2025-04-25 1141
> 338f2e3f3341a9 Miklos Szeredi 2019-09-10 1142 ap->args.in_pages = true;
> 68bfb7eb7f7de3 Joanne Koong 2024-10-24 1143 ap->descs[0].offset = offset;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1144
> dfda790dfda452 Joanne Koong 2025-04-25 1145 while (num) {
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1146 size_t tmp;
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1147 struct folio *folio;
> 09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 1148 pgoff_t index = pos >> PAGE_SHIFT;
> dfda790dfda452 Joanne Koong 2025-04-25 1149 unsigned bytes = min(PAGE_SIZE - offset, num);
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1150
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1151 again:
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1152 folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1153 mapping_gfp_mask(mapping));
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1154 if (IS_ERR(folio)) {
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1155 err = PTR_ERR(folio);
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1156 break;
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1157 }
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1158
> 931e80e4b3263d anfei zhou 2010-02-02 1159 if (mapping_writably_mapped(mapping))
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1160 flush_dcache_folio(folio);
> 931e80e4b3263d anfei zhou 2010-02-02 1161
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1162 tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1163 flush_dcache_folio(folio);
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1164
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1165 if (!tmp) {
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1166 folio_unlock(folio);
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1167 folio_put(folio);
> faa794dd2e17e7 Dave Hansen 2025-01-29 1168
> faa794dd2e17e7 Dave Hansen 2025-01-29 1169 /*
> faa794dd2e17e7 Dave Hansen 2025-01-29 1170 * Ensure forward progress by faulting in
> faa794dd2e17e7 Dave Hansen 2025-01-29 1171 * while not holding the folio lock:
> faa794dd2e17e7 Dave Hansen 2025-01-29 1172 */
> faa794dd2e17e7 Dave Hansen 2025-01-29 1173 if (fault_in_iov_iter_readable(ii, bytes)) {
> faa794dd2e17e7 Dave Hansen 2025-01-29 1174 err = -EFAULT;
> faa794dd2e17e7 Dave Hansen 2025-01-29 1175 break;
> faa794dd2e17e7 Dave Hansen 2025-01-29 1176 }
> faa794dd2e17e7 Dave Hansen 2025-01-29 1177
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1178 goto again;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1179 }
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1180
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1181 err = 0;
> f2ef459bab7326 Joanne Koong 2024-10-24 1182 ap->folios[ap->num_folios] = folio;
> 68bfb7eb7f7de3 Joanne Koong 2024-10-24 1183 ap->descs[ap->num_folios].length = tmp;
> f2ef459bab7326 Joanne Koong 2024-10-24 1184 ap->num_folios++;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1185
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1186 count += tmp;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1187 pos += tmp;
> dfda790dfda452 Joanne Koong 2025-04-25 1188 num -= tmp;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1189 offset += tmp;
> 09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 1190 if (offset == PAGE_SIZE)
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1191 offset = 0;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1192
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1193 /* If we copied full page, mark it uptodate */
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1194 if (tmp == PAGE_SIZE)
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1195 folio_mark_uptodate(folio);
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1196
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1197 if (folio_test_uptodate(folio)) {
> 9bafbe7ae01321 Josef Bacik 2024-09-30 1198 folio_unlock(folio);
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1199 } else {
> f2ef459bab7326 Joanne Koong 2024-10-24 1200 ia->write.folio_locked = true;
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1201 break;
> 4f06dd92b5d0a6 Vivek Goyal 2020-10-21 1202 }
> dfda790dfda452 Joanne Koong 2025-04-25 1203 if (!fc->big_writes || offset != 0)
> 78bb6cb9a890d3 Miklos Szeredi 2008-05-12 1204 break;
> dfda790dfda452 Joanne Koong 2025-04-25 1205 }
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1206
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 @1207 return count > 0 ? count : err;
> ea9b9907b82a09 Nicholas Piggin 2008-04-30 1208 }
>
I'll initialize err to 0 in v6. I'll wait for more reviews/comments on
the patchset before sending that out.
Thanks,
Joanne
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 01/11] fuse: support copying large folios
2025-04-26 0:08 ` [PATCH v5 01/11] fuse: support copying " Joanne Koong
@ 2025-05-04 18:05 ` Bernd Schubert
0 siblings, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 18:05 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Currently, all folios associated with fuse are one page size. As part of
> the work to enable large folios, this commit adds support for copying
> to/from folios larger than one page size.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev.c | 84 +++++++++++++++++++-------------------------
> fs/fuse/fuse_dev_i.h | 2 +-
> 2 files changed, 37 insertions(+), 49 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 155bb6aeaef5..7b0e3a394480 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -955,10 +955,10 @@ static int fuse_check_folio(struct folio *folio)
> * folio that was originally in @pagep will lose a reference and the new
> * folio returned in @pagep will carry a reference.
> */
> -static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
> +static int fuse_try_move_folio(struct fuse_copy_state *cs, struct folio **foliop)
> {
> int err;
> - struct folio *oldfolio = page_folio(*pagep);
> + struct folio *oldfolio = *foliop;
> struct folio *newfolio;
> struct pipe_buffer *buf = cs->pipebufs;
>
> @@ -979,7 +979,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
> cs->pipebufs++;
> cs->nr_segs--;
>
> - if (cs->len != PAGE_SIZE)
> + if (cs->len != folio_size(oldfolio))
> goto out_fallback;
>
> if (!pipe_buf_try_steal(cs->pipe, buf))
> @@ -1025,7 +1025,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
> if (test_bit(FR_ABORTED, &cs->req->flags))
> err = -ENOENT;
> else
> - *pagep = &newfolio->page;
> + *foliop = newfolio;
> spin_unlock(&cs->req->waitq.lock);
>
> if (err) {
> @@ -1058,8 +1058,8 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
> goto out_put_old;
> }
>
> -static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
> - unsigned offset, unsigned count)
> +static int fuse_ref_folio(struct fuse_copy_state *cs, struct folio *folio,
> + unsigned offset, unsigned count)
> {
> struct pipe_buffer *buf;
> int err;
> @@ -1067,17 +1067,17 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
> if (cs->nr_segs >= cs->pipe->max_usage)
> return -EIO;
>
> - get_page(page);
> + folio_get(folio);
> err = unlock_request(cs->req);
> if (err) {
> - put_page(page);
> + folio_put(folio);
> return err;
> }
>
> fuse_copy_finish(cs);
>
> buf = cs->pipebufs;
> - buf->page = page;
> + buf->page = &folio->page;
> buf->offset = offset;
> buf->len = count;
>
> @@ -1089,20 +1089,21 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
> }
>
> /*
> - * Copy a page in the request to/from the userspace buffer. Must be
> + * Copy a folio in the request to/from the userspace buffer. Must be
> * done atomically
> */
> -static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> - unsigned offset, unsigned count, int zeroing)
> +static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop,
> + unsigned offset, unsigned count, int zeroing)
> {
> int err;
> - struct page *page = *pagep;
> + struct folio *folio = *foliop;
> + size_t size = folio_size(folio);
>
> - if (page && zeroing && count < PAGE_SIZE)
> - clear_highpage(page);
> + if (folio && zeroing && count < size)
> + folio_zero_range(folio, 0, size);
>
> while (count) {
> - if (cs->write && cs->pipebufs && page) {
> + if (cs->write && cs->pipebufs && folio) {
> /*
> * Can't control lifetime of pipe buffers, so always
> * copy user pages.
> @@ -1112,12 +1113,12 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> if (err)
> return err;
> } else {
> - return fuse_ref_page(cs, page, offset, count);
> + return fuse_ref_folio(cs, folio, offset, count);
> }
> } else if (!cs->len) {
> - if (cs->move_pages && page &&
> - offset == 0 && count == PAGE_SIZE) {
> - err = fuse_try_move_page(cs, pagep);
> + if (cs->move_folios && folio &&
> + offset == 0 && count == folio_size(folio)) {
> + err = fuse_try_move_folio(cs, foliop);
> if (err <= 0)
> return err;
> } else {
> @@ -1126,22 +1127,22 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
> return err;
> }
> }
> - if (page) {
> - void *mapaddr = kmap_local_page(page);
> - void *buf = mapaddr + offset;
> + if (folio) {
> + void *mapaddr = kmap_local_folio(folio, offset);
> + void *buf = mapaddr;
> offset += fuse_copy_do(cs, &buf, &count);
> kunmap_local(mapaddr);
> } else
> offset += fuse_copy_do(cs, NULL, &count);
> }
> - if (page && !cs->write)
> - flush_dcache_page(page);
> + if (folio && !cs->write)
> + flush_dcache_folio(folio);
> return 0;
> }
>
> -/* Copy pages in the request to/from userspace buffer */
> -static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
> - int zeroing)
> +/* Copy folios in the request to/from userspace buffer */
> +static int fuse_copy_folios(struct fuse_copy_state *cs, unsigned nbytes,
> + int zeroing)
> {
> unsigned i;
> struct fuse_req *req = cs->req;
> @@ -1151,23 +1152,12 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
> int err;
> unsigned int offset = ap->descs[i].offset;
> unsigned int count = min(nbytes, ap->descs[i].length);
> - struct page *orig, *pagep;
> -
> - orig = pagep = &ap->folios[i]->page;
>
> - err = fuse_copy_page(cs, &pagep, offset, count, zeroing);
> + err = fuse_copy_folio(cs, &ap->folios[i], offset, count, zeroing);
> if (err)
> return err;
>
> nbytes -= count;
> -
> - /*
> - * fuse_copy_page may have moved a page from a pipe instead of
> - * copying into our given page, so update the folios if it was
> - * replaced.
> - */
> - if (pagep != orig)
> - ap->folios[i] = page_folio(pagep);
> }
> return 0;
> }
> @@ -1197,7 +1187,7 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
> for (i = 0; !err && i < numargs; i++) {
> struct fuse_arg *arg = &args[i];
> if (i == numargs - 1 && argpages)
> - err = fuse_copy_pages(cs, arg->size, zeroing);
> + err = fuse_copy_folios(cs, arg->size, zeroing);
> else
> err = fuse_copy_one(cs, arg->value, arg->size);
> }
> @@ -1786,7 +1776,6 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
> num = outarg.size;
> while (num) {
> struct folio *folio;
> - struct page *page;
> unsigned int this_num;
>
> folio = filemap_grab_folio(mapping, index);
> @@ -1794,9 +1783,8 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
> if (IS_ERR(folio))
> goto out_iput;
>
> - page = &folio->page;
> this_num = min_t(unsigned, num, folio_size(folio) - offset);
> - err = fuse_copy_page(cs, &page, offset, this_num, 0);
> + err = fuse_copy_folio(cs, &folio, offset, this_num, 0);
> if (!folio_test_uptodate(folio) && !err && offset == 0 &&
> (this_num == folio_size(folio) || file_size == end)) {
> folio_zero_segment(folio, this_num, folio_size(folio));
> @@ -2037,8 +2025,8 @@ static int fuse_notify_inc_epoch(struct fuse_conn *fc)
> static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
> unsigned int size, struct fuse_copy_state *cs)
> {
> - /* Don't try to move pages (yet) */
> - cs->move_pages = false;
> + /* Don't try to move folios (yet) */
> + cs->move_folios = false;
>
> switch (code) {
> case FUSE_NOTIFY_POLL:
> @@ -2189,7 +2177,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
> spin_unlock(&fpq->lock);
> cs->req = req;
> if (!req->args->page_replace)
> - cs->move_pages = false;
> + cs->move_folios = false;
>
> if (oh.error)
> err = nbytes != sizeof(oh) ? -EINVAL : 0;
> @@ -2307,7 +2295,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
> cs.pipe = pipe;
>
> if (flags & SPLICE_F_MOVE)
> - cs.move_pages = true;
> + cs.move_folios = true;
>
> ret = fuse_dev_do_write(fud, &cs, len);
>
> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index db136e045925..5a9bd771a319 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -30,7 +30,7 @@ struct fuse_copy_state {
> unsigned int len;
> unsigned int offset;
> bool write:1;
> - bool move_pages:1;
> + bool move_folios:1;
> bool is_uring:1;
> struct {
> unsigned int copied_sz; /* copied size into the user buffer */
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 02/11] fuse: support large folios for retrieves
2025-04-26 0:08 ` [PATCH v5 02/11] fuse: support large folios for retrieves Joanne Koong
@ 2025-05-04 18:07 ` Bernd Schubert
0 siblings, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 18:07 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Add support for folios larger than one page size for retrieves.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 7b0e3a394480..fb81c0a1c6cd 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1837,7 +1837,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> unsigned int num;
> unsigned int offset;
> size_t total_len = 0;
> - unsigned int num_pages, cur_pages = 0;
> + unsigned int num_pages;
> struct fuse_conn *fc = fm->fc;
> struct fuse_retrieve_args *ra;
> size_t args_size = sizeof(*ra);
> @@ -1855,6 +1855,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>
> num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> num_pages = min(num_pages, fc->max_pages);
> + num = min(num, num_pages << PAGE_SHIFT);
>
> args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0]));
>
> @@ -1875,25 +1876,29 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>
> index = outarg->offset >> PAGE_SHIFT;
>
> - while (num && cur_pages < num_pages) {
> + while (num) {
> struct folio *folio;
> - unsigned int this_num;
> + unsigned int folio_offset;
> + unsigned int nr_bytes;
> + unsigned int nr_pages;
>
> folio = filemap_get_folio(mapping, index);
> if (IS_ERR(folio))
> break;
>
> - this_num = min_t(unsigned, num, PAGE_SIZE - offset);
> + folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
> + nr_bytes = min(folio_size(folio) - folio_offset, num);
> + nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +
> ap->folios[ap->num_folios] = folio;
> - ap->descs[ap->num_folios].offset = offset;
> - ap->descs[ap->num_folios].length = this_num;
> + ap->descs[ap->num_folios].offset = folio_offset;
> + ap->descs[ap->num_folios].length = nr_bytes;
> ap->num_folios++;
> - cur_pages++;
>
> offset = 0;
> - num -= this_num;
> - total_len += this_num;
> - index++;
> + num -= nr_bytes;
> + total_len += nr_bytes;
> + index += nr_pages;
> }
> ra->inarg.offset = outarg->offset;
> ra->inarg.size = total_len;
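The per-iteration descriptor math in this hunk can be checked in isolation. A sketch assuming 4 KiB pages, with the two computations lifted into helper functions (names invented for illustration):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1u << PAGE_SHIFT)

/*
 * With large folios, the target page index may sit in the middle of a
 * folio, so the byte offset within the folio is the page distance from
 * the folio's first page plus the intra-page offset.
 */
static unsigned int retrieve_folio_offset(unsigned long index,
					  unsigned long folio_index,
					  unsigned int offset)
{
	return ((index - folio_index) << PAGE_SHIFT) + offset;
}

/* Bytes usable from this folio: what remains of the folio, capped by num. */
static unsigned int retrieve_nr_bytes(unsigned int folio_size,
				      unsigned int folio_offset,
				      unsigned int num)
{
	unsigned int avail = folio_size - folio_offset;

	return avail < num ? avail : num;
}
```

For example, retrieving from page index 3 of a 16-page (64 KiB) folio that starts at index 0, with an intra-page offset of 256 bytes, gives a folio offset of 12544 and up to 52992 usable bytes — a single descriptor where the old per-page code would have needed thirteen.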
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages()
2025-04-26 0:08 ` [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
2025-04-28 5:32 ` Dan Carpenter
@ 2025-05-04 18:08 ` Bernd Schubert
1 sibling, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 18:08 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Refactor the logic in fuse_fill_write_pages() for copying out write
> data. This will make the future change for supporting large folios for
> writes easier. No functional changes.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/file.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index e203dd4fcc0f..edc86485065e 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1132,21 +1132,21 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> struct fuse_args_pages *ap = &ia->ap;
> struct fuse_conn *fc = get_fuse_conn(mapping->host);
> unsigned offset = pos & (PAGE_SIZE - 1);
> - unsigned int nr_pages = 0;
> size_t count = 0;
> + unsigned int num;
> int err;
>
> + num = min(iov_iter_count(ii), fc->max_write);
> + num = min(num, max_pages << PAGE_SHIFT);
> +
> ap->args.in_pages = true;
> ap->descs[0].offset = offset;
>
> - do {
> + while (num) {
> size_t tmp;
> struct folio *folio;
> pgoff_t index = pos >> PAGE_SHIFT;
> - size_t bytes = min_t(size_t, PAGE_SIZE - offset,
> - iov_iter_count(ii));
> -
> - bytes = min_t(size_t, bytes, fc->max_write - count);
> + unsigned bytes = min(PAGE_SIZE - offset, num);
>
> again:
> folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> @@ -1182,10 +1182,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> ap->folios[ap->num_folios] = folio;
> ap->descs[ap->num_folios].length = tmp;
> ap->num_folios++;
> - nr_pages++;
>
> count += tmp;
> pos += tmp;
> + num -= tmp;
> offset += tmp;
> if (offset == PAGE_SIZE)
> offset = 0;
> @@ -1200,10 +1200,9 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> ia->write.folio_locked = true;
> break;
> }
> - if (!fc->big_writes)
> + if (!fc->big_writes || offset != 0)
> break;
> - } while (iov_iter_count(ii) && count < fc->max_write &&
> - nr_pages < max_pages && offset == 0);
> + }
>
> return count > 0 ? count : err;
> }
* Re: [PATCH v5 04/11] fuse: support large folios for writethrough writes
2025-04-26 0:08 ` [PATCH v5 04/11] fuse: support large folios for writethrough writes Joanne Koong
@ 2025-05-04 18:40 ` Bernd Schubert
2025-05-05 21:36 ` Joanne Koong
0 siblings, 1 reply; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 18:40 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Add support for folios larger than one page size for writethrough
> writes.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/fuse/file.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index edc86485065e..e44b6d26c1c6 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1146,7 +1146,8 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> size_t tmp;
> struct folio *folio;
> pgoff_t index = pos >> PAGE_SHIFT;
> - unsigned bytes = min(PAGE_SIZE - offset, num);
> + unsigned int bytes;
> + unsigned int folio_offset;
>
> again:
> folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> @@ -1159,7 +1160,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> if (mapping_writably_mapped(mapping))
> flush_dcache_folio(folio);
>
> - tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
> + folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
> + bytes = min(folio_size(folio) - folio_offset, num);
> +
> + tmp = copy_folio_from_iter_atomic(folio, folio_offset, bytes, ii);
> flush_dcache_folio(folio);
>
> if (!tmp) {
> @@ -1180,6 +1184,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
>
> err = 0;
> ap->folios[ap->num_folios] = folio;
> + ap->descs[ap->num_folios].offset = folio_offset;
> ap->descs[ap->num_folios].length = tmp;
> ap->num_folios++;
>
> @@ -1187,11 +1192,11 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> pos += tmp;
> num -= tmp;
> offset += tmp;
> - if (offset == PAGE_SIZE)
> + if (offset == folio_size(folio))
> offset = 0;
>
> - /* If we copied full page, mark it uptodate */
> - if (tmp == PAGE_SIZE)
> + /* If we copied full folio, mark it uptodate */
> + if (tmp == folio_size(folio))
> folio_mark_uptodate(folio);
Here I am confused. I think tmp can be a subpart of the folio: say
the folio is 2MB and the again loop somehow iterates through the
folio in smaller steps. The folio would then be entirely written out,
but tmp might not equal folio_size(). Doesn't this need to sum up tmp
per folio and then use that value? And I actually wonder if we could
use the "(offset == folio_size(folio))" check above as well. At least
if the initial offset for a folio is 0 it should work.
Thanks,
Bernd
>
> if (folio_test_uptodate(folio)) {
* Re: [PATCH v5 05/11] fuse: support large folios for folio reads
2025-04-26 0:08 ` [PATCH v5 05/11] fuse: support large folios for folio reads Joanne Koong
@ 2025-05-04 18:58 ` Bernd Schubert
0 siblings, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 18:58 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Add support for folios larger than one page size for folio reads into
> the page cache.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/file.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index e44b6d26c1c6..0ca3b31c59f9 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -793,7 +793,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
> struct inode *inode = folio->mapping->host;
> struct fuse_mount *fm = get_fuse_mount(inode);
> loff_t pos = folio_pos(folio);
> - struct fuse_folio_desc desc = { .length = PAGE_SIZE };
> + struct fuse_folio_desc desc = { .length = folio_size(folio) };
> struct fuse_io_args ia = {
> .ap.args.page_zeroing = true,
> .ap.args.out_pages = true,
* Re: [PATCH v5 06/11] fuse: support large folios for symlinks
2025-04-26 0:08 ` [PATCH v5 06/11] fuse: support large folios for symlinks Joanne Koong
@ 2025-05-04 19:04 ` Bernd Schubert
0 siblings, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 19:04 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Support large folios for symlinks and change the name from
> fuse_getlink_page() to fuse_getlink_folio().
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dir.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 1fb0b15a6088..3003119559e8 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1629,10 +1629,10 @@ static int fuse_permission(struct mnt_idmap *idmap,
> return err;
> }
>
> -static int fuse_readlink_page(struct inode *inode, struct folio *folio)
> +static int fuse_readlink_folio(struct inode *inode, struct folio *folio)
> {
> struct fuse_mount *fm = get_fuse_mount(inode);
> - struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 };
> + struct fuse_folio_desc desc = { .length = folio_size(folio) - 1 };
> struct fuse_args_pages ap = {
> .num_folios = 1,
> .folios = &folio,
> @@ -1687,7 +1687,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
> if (!folio)
> goto out_err;
>
> - err = fuse_readlink_page(inode, folio);
> + err = fuse_readlink_folio(inode, folio);
> if (err) {
> folio_put(folio);
> goto out_err;
> @@ -2277,7 +2277,7 @@ void fuse_init_dir(struct inode *inode)
>
> static int fuse_symlink_read_folio(struct file *null, struct folio *folio)
> {
> - int err = fuse_readlink_page(folio->mapping->host, folio);
> + int err = fuse_readlink_folio(folio->mapping->host, folio);
>
> if (!err)
> folio_mark_uptodate(folio);
* Re: [PATCH v5 08/11] fuse: support large folios for queued writes
2025-04-26 0:08 ` [PATCH v5 08/11] fuse: support large folios for queued writes Joanne Koong
@ 2025-05-04 19:08 ` Bernd Schubert
0 siblings, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 19:08 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Add support for folios larger than one page size for queued writes.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/file.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 0ca3b31c59f9..1d38486fae50 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1790,11 +1790,14 @@ __releases(fi->lock)
> __acquires(fi->lock)
> {
> struct fuse_inode *fi = get_fuse_inode(wpa->inode);
> + struct fuse_args_pages *ap = &wpa->ia.ap;
> struct fuse_write_in *inarg = &wpa->ia.write.in;
> - struct fuse_args *args = &wpa->ia.ap.args;
> - /* Currently, all folios in FUSE are one page */
> - __u64 data_size = wpa->ia.ap.num_folios * PAGE_SIZE;
> - int err;
> + struct fuse_args *args = &ap->args;
> + __u64 data_size = 0;
> + int err, i;
> +
> + for (i = 0; i < ap->num_folios; i++)
> + data_size += ap->descs[i].length;
>
> fi->writectr++;
> if (inarg->offset + data_size <= size) {
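With mixed folio sizes the request payload can no longer be computed as num_folios * PAGE_SIZE; it has to be summed per descriptor, as the hunk above does. A minimal userspace model (the descriptor layout here is assumed, not the real fuse_folio_desc):

```c
#include <assert.h>
#include <stddef.h>

struct folio_desc {
	size_t offset;
	size_t length;
};

/* Model of the data_size computation above: the total payload is the
 * sum of each folio segment's length, independent of folio sizes. */
static size_t req_data_size(const struct folio_desc *descs,
			    unsigned int num_folios)
{
	size_t data_size = 0;
	unsigned int i;

	for (i = 0; i < num_folios; i++)
		data_size += descs[i].length;
	return data_size;
}
```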
* Re: [PATCH v5 09/11] fuse: support large folios for readahead
2025-04-26 0:08 ` [PATCH v5 09/11] fuse: support large folios for readahead Joanne Koong
@ 2025-05-04 19:13 ` Bernd Schubert
2025-05-05 14:40 ` Darrick J. Wong
0 siblings, 1 reply; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 19:13 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Add support for folios larger than one page size for readahead.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
> 1 file changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 1d38486fae50..9a31f2a516b9 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> fuse_io_free(ia);
> }
>
> -static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> + unsigned int count)
> {
> struct fuse_file *ff = file->private_data;
> struct fuse_mount *fm = ff->fm;
> struct fuse_args_pages *ap = &ia->ap;
> loff_t pos = folio_pos(ap->folios[0]);
> - /* Currently, all folios in FUSE are one page */
> - size_t count = ap->num_folios << PAGE_SHIFT;
> ssize_t res;
> int err;
>
> @@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
> struct inode *inode = rac->mapping->host;
> struct fuse_conn *fc = get_fuse_conn(inode);
> unsigned int max_pages, nr_pages;
> + struct folio *folio = NULL;
>
> if (fuse_is_bad(inode))
> return;
> @@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
> while (nr_pages) {
> struct fuse_io_args *ia;
> struct fuse_args_pages *ap;
> - struct folio *folio;
> unsigned cur_pages = min(max_pages, nr_pages);
> + unsigned int pages = 0;
>
> if (fc->num_background >= fc->congestion_threshold &&
> rac->ra->async_size >= readahead_count(rac))
> @@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
>
> ia = fuse_io_alloc(NULL, cur_pages);
> if (!ia)
> - return;
> + break;
> ap = &ia->ap;
>
> - while (ap->num_folios < cur_pages) {
> + while (pages < cur_pages) {
> + unsigned int folio_pages;
> +
> /*
> * This returns a folio with a ref held on it.
> * The ref needs to be held until the request is
> @@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
> * fuse_try_move_page()) drops the ref after it's
> * replaced in the page cache.
> */
> - folio = __readahead_folio(rac);
> + if (!folio)
> + folio = __readahead_folio(rac);
> +
> + folio_pages = folio_nr_pages(folio);
> + if (folio_pages > cur_pages - pages)
> + break;
> +
Hmm, so let's assume this would be a 2MB folio, but fc->max_pages is
limited to 1MB - then we don't do read-ahead anymore?
Thanks,
Bernd
> ap->folios[ap->num_folios] = folio;
> ap->descs[ap->num_folios].length = folio_size(folio);
> ap->num_folios++;
> + pages += folio_pages;
> + folio = NULL;
> + }
> + if (!pages) {
> + fuse_io_free(ia);
> + break;
> }
> - fuse_send_readpages(ia, rac->file);
> - nr_pages -= cur_pages;
> + fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
> + nr_pages -= pages;
> + }
> + if (folio) {
> + folio_end_read(folio, false);
> + folio_put(folio);
> }
> }
>
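The reworked loop packs whole folios into each request and carries a non-fitting folio over to the next request. A userspace sketch of that packing policy, with the request and folio bookkeeping modeled as plain counters rather than the real fuse structures:

```c
#include <assert.h>

/* Model: pack a sequence of folios (given by their page counts) into
 * requests of at most max_pages pages each. Returns the number of
 * requests sent; a folio that would overflow the current request
 * starts the next one, mirroring the carried-over `folio` above. */
static unsigned int pack_readahead(const unsigned int *folio_pages,
				   unsigned int nr_folios,
				   unsigned int max_pages)
{
	unsigned int reqs = 0, pages = 0, i = 0;

	while (i < nr_folios) {
		if (folio_pages[i] > max_pages)
			break;		/* can't make progress; bail like !pages */
		if (pages + folio_pages[i] > max_pages) {
			reqs++;		/* send the current request */
			pages = 0;	/* carried-over folio opens the next */
			continue;
		}
		pages += folio_pages[i++];
	}
	if (pages)
		reqs++;
	return reqs;
}
```

Note the model also reproduces the behavior Bernd asks about below: a single folio larger than max_pages makes no progress and produces zero requests.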
* Re: [PATCH v5 10/11] fuse: optimize direct io large folios processing
2025-04-26 0:08 ` [PATCH v5 10/11] fuse: optimize direct io large folios processing Joanne Koong
@ 2025-05-04 19:15 ` Bernd Schubert
2025-07-04 10:24 ` David Hildenbrand
1 sibling, 0 replies; 31+ messages in thread
From: Bernd Schubert @ 2025-05-04 19:15 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, willy, kernel-team
On 4/26/25 02:08, Joanne Koong wrote:
> Optimize processing folios larger than one page size for the direct io
> case. If contiguous pages are part of the same folio, collate the
> processing instead of processing each page in the folio separately.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 41 insertions(+), 14 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 9a31f2a516b9..61eaec1c993b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1490,7 +1490,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> }
>
> while (nbytes < *nbytesp && nr_pages < max_pages) {
> - unsigned nfolios, i;
> + struct folio *prev_folio = NULL;
> + unsigned npages, i;
> size_t start;
>
> ret = iov_iter_extract_pages(ii, &pages,
> @@ -1502,23 +1503,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
>
> nbytes += ret;
>
> - nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
> + npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
>
> - for (i = 0; i < nfolios; i++) {
> - struct folio *folio = page_folio(pages[i]);
> - unsigned int offset = start +
> - (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
> - unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
> + /*
> + * We must check each extracted page. We can't assume every page
> + * in a large folio is used. For example, userspace may mmap() a
> + * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
> + * a large folio, in which case the extracted pages could be
> + *
> + * folio A page 0
> + * folio A page 1
> + * folio B page 0
> + * folio A page 3
> + *
> + * where folio A belongs to the file and folio B is an anonymous
> + * COW page.
> + */
> + for (i = 0; i < npages && ret; i++) {
> + struct folio *folio;
> + unsigned int offset;
> + unsigned int len;
> +
> + WARN_ON(!pages[i]);
> + folio = page_folio(pages[i]);
> +
> + len = min_t(unsigned int, ret, PAGE_SIZE - start);
> +
> + if (folio == prev_folio && pages[i] != pages[i - 1]) {
> + WARN_ON(ap->folios[ap->num_folios - 1] != folio);
> + ap->descs[ap->num_folios - 1].length += len;
> + WARN_ON(ap->descs[ap->num_folios - 1].length > folio_size(folio));
> + } else {
> + offset = start + (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
> + ap->descs[ap->num_folios].offset = offset;
> + ap->descs[ap->num_folios].length = len;
> + ap->folios[ap->num_folios] = folio;
> + start = 0;
> + ap->num_folios++;
> + prev_folio = folio;
> + }
>
> - ap->descs[ap->num_folios].offset = offset;
> - ap->descs[ap->num_folios].length = len;
> - ap->folios[ap->num_folios] = folio;
> - start = 0;
> ret -= len;
> - ap->num_folios++;
> }
> -
> - nr_pages += nfolios;
> + nr_pages += npages;
> }
> kfree(pages);
>
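The collation rule above - extend the previous descriptor when the next extracted page belongs to the same folio and is a distinct page - can be modeled in userspace with integer folio/page ids standing in for the real pointers (a sketch, not the kernel code):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

struct page_ref { int folio_id; int page_idx; };
struct desc { size_t offset; size_t length; };

/* Model of the collation above: consecutive extracted pages that belong
 * to the same folio (and are distinct pages) extend the previous
 * descriptor instead of opening a new one. `start` is the byte offset
 * into the first page, `ret` the total extracted bytes. Returns the
 * number of descriptors produced. */
static unsigned int collate(const struct page_ref *pages, unsigned int npages,
			    size_t start, size_t ret, struct desc *descs)
{
	unsigned int n = 0, i;
	int prev_folio = -1;

	for (i = 0; i < npages && ret; i++) {
		size_t len = PAGE_SIZE - start < ret ? PAGE_SIZE - start : ret;

		/* n > 0 implies i > 0, so pages[i - 1] is valid */
		if (n && pages[i].folio_id == prev_folio &&
		    pages[i].page_idx != pages[i - 1].page_idx) {
			descs[n - 1].length += len;
		} else {
			descs[n].offset = start +
				(size_t)pages[i].page_idx * PAGE_SIZE;
			descs[n].length = len;
			start = 0;
			prev_folio = pages[i].folio_id;
			n++;
		}
		ret -= len;
	}
	return n;
}
```

Running this on the COW example from the patch's comment (folio A pages 0 and 1, folio B page 0, folio A page 3) yields three descriptors, with the first covering both contiguous pages of folio A.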
* Re: [PATCH v5 09/11] fuse: support large folios for readahead
2025-05-04 19:13 ` Bernd Schubert
@ 2025-05-05 14:40 ` Darrick J. Wong
2025-05-05 15:23 ` Bernd Schubert
0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2025-05-05 14:40 UTC (permalink / raw)
To: Bernd Schubert
Cc: Joanne Koong, miklos, linux-fsdevel, jlayton, jefflexu, josef,
willy, kernel-team
On Sun, May 04, 2025 at 09:13:44PM +0200, Bernd Schubert wrote:
>
>
> On 4/26/25 02:08, Joanne Koong wrote:
> > Add support for folios larger than one page size for readahead.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
> > 1 file changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 1d38486fae50..9a31f2a516b9 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> > fuse_io_free(ia);
> > }
> >
> > -static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
> > +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> > + unsigned int count)
> > {
> > struct fuse_file *ff = file->private_data;
> > struct fuse_mount *fm = ff->fm;
> > struct fuse_args_pages *ap = &ia->ap;
> > loff_t pos = folio_pos(ap->folios[0]);
> > - /* Currently, all folios in FUSE are one page */
> > - size_t count = ap->num_folios << PAGE_SHIFT;
> > ssize_t res;
> > int err;
> >
> > @@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
> > struct inode *inode = rac->mapping->host;
> > struct fuse_conn *fc = get_fuse_conn(inode);
> > unsigned int max_pages, nr_pages;
> > + struct folio *folio = NULL;
> >
> > if (fuse_is_bad(inode))
> > return;
> > @@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
> > while (nr_pages) {
> > struct fuse_io_args *ia;
> > struct fuse_args_pages *ap;
> > - struct folio *folio;
> > unsigned cur_pages = min(max_pages, nr_pages);
> > + unsigned int pages = 0;
> >
> > if (fc->num_background >= fc->congestion_threshold &&
> > rac->ra->async_size >= readahead_count(rac))
> > @@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
> >
> > ia = fuse_io_alloc(NULL, cur_pages);
> > if (!ia)
> > - return;
> > + break;
> > ap = &ia->ap;
> >
> > - while (ap->num_folios < cur_pages) {
> > + while (pages < cur_pages) {
> > + unsigned int folio_pages;
> > +
> > /*
> > * This returns a folio with a ref held on it.
> > * The ref needs to be held until the request is
> > @@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
> > * fuse_try_move_page()) drops the ref after it's
> > * replaced in the page cache.
> > */
> > - folio = __readahead_folio(rac);
> > + if (!folio)
> > + folio = __readahead_folio(rac);
> > +
> > + folio_pages = folio_nr_pages(folio);
> > + if (folio_pages > cur_pages - pages)
> > + break;
> > +
>
> Hmm, so let's assume this would be a 2MB folio, but fc->max_pages is
> limited to 1MB - we not do read-ahead anymore?
It's hard for me to say without seeing the actual enablement patches,
but filesystems are supposed to call mapping_set_folio_order_range to
constrain the sizes of the folios that the pagecache requests.
--D
> Thanks,
> Bernd
>
>
> > ap->folios[ap->num_folios] = folio;
> > ap->descs[ap->num_folios].length = folio_size(folio);
> > ap->num_folios++;
> > + pages += folio_pages;
> > + folio = NULL;
> > + }
> > + if (!pages) {
> > + fuse_io_free(ia);
> > + break;
> > }
> > - fuse_send_readpages(ia, rac->file);
> > - nr_pages -= cur_pages;
> > + fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
> > + nr_pages -= pages;
> > + }
> > + if (folio) {
> > + folio_end_read(folio, false);
> > + folio_put(folio);
> > }
> > }
> >
>
>
* Re: [PATCH v5 09/11] fuse: support large folios for readahead
2025-05-05 14:40 ` Darrick J. Wong
@ 2025-05-05 15:23 ` Bernd Schubert
2025-05-05 22:05 ` Joanne Koong
0 siblings, 1 reply; 31+ messages in thread
From: Bernd Schubert @ 2025-05-05 15:23 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Joanne Koong, miklos, linux-fsdevel, jlayton, jefflexu, josef,
willy, kernel-team
On 5/5/25 16:40, Darrick J. Wong wrote:
> On Sun, May 04, 2025 at 09:13:44PM +0200, Bernd Schubert wrote:
>>
>>
>> On 4/26/25 02:08, Joanne Koong wrote:
>>> Add support for folios larger than one page size for readahead.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>> fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
>>> 1 file changed, 27 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>> index 1d38486fae50..9a31f2a516b9 100644
>>> --- a/fs/fuse/file.c
>>> +++ b/fs/fuse/file.c
>>> @@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
>>> fuse_io_free(ia);
>>> }
>>>
>>> -static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
>>> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
>>> + unsigned int count)
>>> {
>>> struct fuse_file *ff = file->private_data;
>>> struct fuse_mount *fm = ff->fm;
>>> struct fuse_args_pages *ap = &ia->ap;
>>> loff_t pos = folio_pos(ap->folios[0]);
>>> - /* Currently, all folios in FUSE are one page */
>>> - size_t count = ap->num_folios << PAGE_SHIFT;
>>> ssize_t res;
>>> int err;
>>>
>>> @@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
>>> struct inode *inode = rac->mapping->host;
>>> struct fuse_conn *fc = get_fuse_conn(inode);
>>> unsigned int max_pages, nr_pages;
>>> + struct folio *folio = NULL;
>>>
>>> if (fuse_is_bad(inode))
>>> return;
>>> @@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
>>> while (nr_pages) {
>>> struct fuse_io_args *ia;
>>> struct fuse_args_pages *ap;
>>> - struct folio *folio;
>>> unsigned cur_pages = min(max_pages, nr_pages);
>>> + unsigned int pages = 0;
>>>
>>> if (fc->num_background >= fc->congestion_threshold &&
>>> rac->ra->async_size >= readahead_count(rac))
>>> @@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
>>>
>>> ia = fuse_io_alloc(NULL, cur_pages);
>>> if (!ia)
>>> - return;
>>> + break;
>>> ap = &ia->ap;
>>>
>>> - while (ap->num_folios < cur_pages) {
>>> + while (pages < cur_pages) {
>>> + unsigned int folio_pages;
>>> +
>>> /*
>>> * This returns a folio with a ref held on it.
>>> * The ref needs to be held until the request is
>>> @@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
>>> * fuse_try_move_page()) drops the ref after it's
>>> * replaced in the page cache.
>>> */
>>> - folio = __readahead_folio(rac);
>>> + if (!folio)
>>> + folio = __readahead_folio(rac);
>>> +
>>> + folio_pages = folio_nr_pages(folio);
>>> + if (folio_pages > cur_pages - pages)
>>> + break;
>>> +
>>
>> Hmm, so let's assume this would be a 2MB folio, but fc->max_pages is
>> limited to 1MB - we not do read-ahead anymore?
>
> It's hard for me to say without seeing the actual enablement patches,
> but filesystems are supposed to call mapping_set_folio_order_range to
> constrain the sizes of the folios that the pagecache requests.
I think large folios do not get enabled yet in this series. Could we have
a comment here that the folio size is supposed to be restricted to
fc->max_pages? And wouldn't that be a case for unlikely()?
Thanks,
Bernd
* Re: [PATCH v5 04/11] fuse: support large folios for writethrough writes
2025-05-04 18:40 ` Bernd Schubert
@ 2025-05-05 21:36 ` Joanne Koong
0 siblings, 0 replies; 31+ messages in thread
From: Joanne Koong @ 2025-05-05 21:36 UTC (permalink / raw)
To: Bernd Schubert
Cc: miklos, linux-fsdevel, jlayton, jefflexu, josef, willy,
kernel-team
On Sun, May 4, 2025 at 11:40 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 4/26/25 02:08, Joanne Koong wrote:
> > Add support for folios larger than one page size for writethrough
> > writes.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/fuse/file.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index edc86485065e..e44b6d26c1c6 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -1146,7 +1146,8 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> > size_t tmp;
> > struct folio *folio;
> > pgoff_t index = pos >> PAGE_SHIFT;
> > - unsigned bytes = min(PAGE_SIZE - offset, num);
> > + unsigned int bytes;
> > + unsigned int folio_offset;
> >
> > again:
> > folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> > @@ -1159,7 +1160,10 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> > if (mapping_writably_mapped(mapping))
> > flush_dcache_folio(folio);
> >
> > - tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
> > + folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
> > + bytes = min(folio_size(folio) - folio_offset, num);
> > +
> > + tmp = copy_folio_from_iter_atomic(folio, folio_offset, bytes, ii);
> > flush_dcache_folio(folio);
> >
> > if (!tmp) {f
> > @@ -1180,6 +1184,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> >
> > err = 0;
> > ap->folios[ap->num_folios] = folio;
> > + ap->descs[ap->num_folios].offset = folio_offset;
> > ap->descs[ap->num_folios].length = tmp;
> > ap->num_folios++;
> >
> > @@ -1187,11 +1192,11 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
> > pos += tmp;
> > num -= tmp;
> > offset += tmp;
> > - if (offset == PAGE_SIZE)
> > + if (offset == folio_size(folio))
> > offset = 0;
> >
> > - /* If we copied full page, mark it uptodate */
> > - if (tmp == PAGE_SIZE)
> > + /* If we copied full folio, mark it uptodate */
> > + if (tmp == folio_size(folio))
> > folio_mark_uptodate(folio);
>
> Here am I confused. I think tmp can be a subpart of the folio, let's say
> the folio is 2MB and somehow the again loop would iterate through the
> folio in smaller steps. So the folio would be entirely written out, but
> tmp might not be folio_size? Doesn't this need to sum up tmp for per
> folio and then use that value? And I actually wonder if we could use
> the above "(offset == folio_size(folio)" as well. At least if the
> initial offset for a folio is 0 it should work.
>
Hi Bernd,
Thanks for taking a look at this series and reviewing.
I don't think this scenario is possible. In copy_folio_from_iter_atomic(),
which ends up calling into __copy_from_iter(), I don't see anywhere
where only a subpart of the folio can be copied out. The iter is a
ubuf, and from what I see, copy_folio_from_iter_atomic() will either
return 0 or memcpy all requested bytes into the folio (unless bytes is
greater than the bytes contained in the iter, which isn't possible
here).
Thanks,
Joanne
>
> Thanks,
> Bernd
>
> >
> > if (folio_test_uptodate(folio)) {
>
* Re: [PATCH v5 09/11] fuse: support large folios for readahead
2025-05-05 15:23 ` Bernd Schubert
@ 2025-05-05 22:05 ` Joanne Koong
0 siblings, 0 replies; 31+ messages in thread
From: Joanne Koong @ 2025-05-05 22:05 UTC (permalink / raw)
To: Bernd Schubert
Cc: Darrick J. Wong, miklos, linux-fsdevel, jlayton, jefflexu, josef,
willy, kernel-team
On Mon, May 5, 2025 at 8:23 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 5/5/25 16:40, Darrick J. Wong wrote:
> > On Sun, May 04, 2025 at 09:13:44PM +0200, Bernd Schubert wrote:
> >>
> >>
> >> On 4/26/25 02:08, Joanne Koong wrote:
> >>> Add support for folios larger than one page size for readahead.
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> >>> ---
> >>> fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
> >>> 1 file changed, 27 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>> index 1d38486fae50..9a31f2a516b9 100644
> >>> --- a/fs/fuse/file.c
> >>> +++ b/fs/fuse/file.c
> >>> @@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> >>> fuse_io_free(ia);
> >>> }
> >>>
> >>> -static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
> >>> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> >>> + unsigned int count)
> >>> {
> >>> struct fuse_file *ff = file->private_data;
> >>> struct fuse_mount *fm = ff->fm;
> >>> struct fuse_args_pages *ap = &ia->ap;
> >>> loff_t pos = folio_pos(ap->folios[0]);
> >>> - /* Currently, all folios in FUSE are one page */
> >>> - size_t count = ap->num_folios << PAGE_SHIFT;
> >>> ssize_t res;
> >>> int err;
> >>>
> >>> @@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
> >>> struct inode *inode = rac->mapping->host;
> >>> struct fuse_conn *fc = get_fuse_conn(inode);
> >>> unsigned int max_pages, nr_pages;
> >>> + struct folio *folio = NULL;
> >>>
> >>> if (fuse_is_bad(inode))
> >>> return;
> >>> @@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
> >>> while (nr_pages) {
> >>> struct fuse_io_args *ia;
> >>> struct fuse_args_pages *ap;
> >>> - struct folio *folio;
> >>> unsigned cur_pages = min(max_pages, nr_pages);
> >>> + unsigned int pages = 0;
> >>>
> >>> if (fc->num_background >= fc->congestion_threshold &&
> >>> rac->ra->async_size >= readahead_count(rac))
> >>> @@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>
> >>> ia = fuse_io_alloc(NULL, cur_pages);
> >>> if (!ia)
> >>> - return;
> >>> + break;
> >>> ap = &ia->ap;
> >>>
> >>> - while (ap->num_folios < cur_pages) {
> >>> + while (pages < cur_pages) {
> >>> + unsigned int folio_pages;
> >>> +
> >>> /*
> >>> * This returns a folio with a ref held on it.
> >>> * The ref needs to be held until the request is
> >>> @@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
> >>> * fuse_try_move_page()) drops the ref after it's
> >>> * replaced in the page cache.
> >>> */
> >>> - folio = __readahead_folio(rac);
> >>> + if (!folio)
> >>> + folio = __readahead_folio(rac);
> >>> +
> >>> + folio_pages = folio_nr_pages(folio);
> >>> + if (folio_pages > cur_pages - pages)
> >>> + break;
> >>> +
> >>
> >> Hmm, so let's assume this would be a 2MB folio, but fc->max_pages is
> >> limited to 1MB - we not do read-ahead anymore?
> >
> > It's hard for me to say without seeing the actual enablement patches,
> > but filesystems are supposed to call mapping_set_folio_order_range to
> > constrain the sizes of the folios that the pagecache requests.
Yes, exactly. For enabling fuse, I envision adding something like this
in fuse_init_file_inode():
max_pages = min(min(fc->max_write, fc->max_read) >> PAGE_SHIFT, fc->max_pages);
max_order = ilog2(max_pages);
mapping_set_folio_order_range(inode->i_mapping, 0, max_order);
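For reference, the max_order derivation in that sketch reduces to a floor log2 of a clamped page count (mapping_set_folio_order_range takes an order, i.e. a log2 of a page count). A userspace model, assuming 4K pages and byte-valued max_write/max_read:

```c
#include <assert.h>

/* Userspace model of ilog2(): floor of log base 2, as used above to
 * turn a maximum page count into a maximum folio order. */
static unsigned int ilog2_model(unsigned long n)
{
	unsigned int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/* Model of the max_order computation sketched above: clamp the folio
 * size to the smaller of max_write/max_read (in pages) and max_pages,
 * then take the order. */
static unsigned int fuse_max_order(unsigned long max_write,
				   unsigned long max_read,
				   unsigned long max_pages)
{
	unsigned long io_pages = (max_write < max_read ?
				  max_write : max_read) >> 12;
	unsigned long mp = io_pages < max_pages ? io_pages : max_pages;

	return ilog2_model(mp);
}
```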
>
> I think large folios do not get enabled ye in this series. Could we have
> a comment here that folio size is supposed to be restricted to
> fc->max_pages? And wouldn't that be a case for unlikely()?
Large folios are not enabled yet in this series. The cover letter
explains a bit why,
"This does not yet switch fuse to using large folios. Using large folios in
fuse is dependent on adding granular dirty-page tracking. This will be done
in a separate patchset that will have fuse use iomap [1]. There also needs
to be a followup (also part of future work) for having dirty page balancing
not tank performance for unprivileged servers where bdi limits lead to subpar
throttling [1], before enabling large folios for fuse."
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@mail.gmail.com/#t
I'll add a comment about this in v6.
Thanks,
Joanne
>
>
> Thanks,
> Bernd
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 10/11] fuse: optimize direct io large folios processing
2025-04-26 0:08 ` [PATCH v5 10/11] fuse: optimize direct io large folios processing Joanne Koong
2025-05-04 19:15 ` Bernd Schubert
@ 2025-07-04 10:24 ` David Hildenbrand
2025-07-07 23:27 ` Joanne Koong
1 sibling, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-07-04 10:24 UTC (permalink / raw)
To: Joanne Koong, miklos
Cc: linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert, willy,
kernel-team
On 26.04.25 02:08, Joanne Koong wrote:
> Optimize processing folios larger than one page size for the direct io
> case. If contiguous pages are part of the same folio, collate the
> processing instead of processing each page in the folio separately.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 41 insertions(+), 14 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 9a31f2a516b9..61eaec1c993b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1490,7 +1490,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> }
>
> while (nbytes < *nbytesp && nr_pages < max_pages) {
> - unsigned nfolios, i;
> + struct folio *prev_folio = NULL;
> + unsigned npages, i;
> size_t start;
>
> ret = iov_iter_extract_pages(ii, &pages,
> @@ -1502,23 +1503,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
>
> nbytes += ret;
>
> - nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
> + npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
>
> - for (i = 0; i < nfolios; i++) {
> - struct folio *folio = page_folio(pages[i]);
> - unsigned int offset = start +
> - (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
> - unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
> + /*
> + * We must check each extracted page. We can't assume every page
> + * in a large folio is used. For example, userspace may mmap() a
> + * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
> + * a large folio, in which case the extracted pages could be
> + *
> + * folio A page 0
> + * folio A page 1
> + * folio B page 0
> + * folio A page 3
> + *
> + * where folio A belongs to the file and folio B is an anonymous
> + * COW page.
> + */
> + for (i = 0; i < npages && ret; i++) {
> + struct folio *folio;
> + unsigned int offset;
> + unsigned int len;
> +
> + WARN_ON(!pages[i]);
> + folio = page_folio(pages[i]);
> +
> + len = min_t(unsigned int, ret, PAGE_SIZE - start);
> +
> + if (folio == prev_folio && pages[i] != pages[i - 1]) {
I don't really understand the "pages[i] != pages[i - 1]" part.
Why would you have two equal page pointers in there?
Something that might be simpler to understand and implement would be using
num_pages_contiguous()
from
https://lore.kernel.org/all/20250704062602.33500-2-lizhe.67@bytedance.com/T/#u
and then just making sure that we don't exceed the current folio, if we
ever get contiguous pages that cross a folio.
--
Cheers,
David / dhildenb
* Re: [PATCH v5 10/11] fuse: optimize direct io large folios processing
2025-07-04 10:24 ` David Hildenbrand
@ 2025-07-07 23:27 ` Joanne Koong
2025-07-08 16:05 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: Joanne Koong @ 2025-07-07 23:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: miklos, linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert,
willy, kernel-team
On Fri, Jul 4, 2025 at 3:24 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 26.04.25 02:08, Joanne Koong wrote:
> > Optimize processing folios larger than one page size for the direct io
> > case. If contiguous pages are part of the same folio, collate the
> > processing instead of processing each page in the folio separately.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
> > 1 file changed, 41 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 9a31f2a516b9..61eaec1c993b 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -1490,7 +1490,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> > }
> >
> > while (nbytes < *nbytesp && nr_pages < max_pages) {
> > - unsigned nfolios, i;
> > + struct folio *prev_folio = NULL;
> > + unsigned npages, i;
> > size_t start;
> >
> > ret = iov_iter_extract_pages(ii, &pages,
> > @@ -1502,23 +1503,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> >
> > nbytes += ret;
> >
> > - nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
> > + npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
> >
> > - for (i = 0; i < nfolios; i++) {
> > - struct folio *folio = page_folio(pages[i]);
> > - unsigned int offset = start +
> > - (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
> > - unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
> > + /*
> > + * We must check each extracted page. We can't assume every page
> > + * in a large folio is used. For example, userspace may mmap() a
> > + * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
> > + * a large folio, in which case the extracted pages could be
> > + *
> > + * folio A page 0
> > + * folio A page 1
> > + * folio B page 0
> > + * folio A page 3
> > + *
> > + * where folio A belongs to the file and folio B is an anonymous
> > + * COW page.
> > + */
> > + for (i = 0; i < npages && ret; i++) {
> > + struct folio *folio;
> > + unsigned int offset;
> > + unsigned int len;
> > +
> > + WARN_ON(!pages[i]);
> > + folio = page_folio(pages[i]);
> > +
> > + len = min_t(unsigned int, ret, PAGE_SIZE - start);
> > +
> > + if (folio == prev_folio && pages[i] != pages[i - 1]) {
>
> I don't really understand the "pages[i] != pages[i - 1]" part.
>
> Why would you have two equal page pointers in there?
>
The pages extracted are user pages from a userspace iovec. AFAICT,
there's the possibility, eg if userspace mmaps() the file with
copy-on-write, that the same physical page could back multiple
contiguous virtual addresses.
>
> Something that might be simpler to understand and implement would be using
>
> num_pages_contiguous()
>
> from
>
> https://lore.kernel.org/all/20250704062602.33500-2-lizhe.67@bytedance.com/T/#u
>
> and then just making sure that we don't exceed the current folio, if we
> ever get contiguous pages that cross a folio.
Thanks for the link. I think here it's common that the pages array
would hold pages from multiple different folios, so maybe a new helper
num_pages_contiguous_folio() would be useful to return the number of
contiguous pages that are within the scope of the same folio.
Thanks,
Joanne
>
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v5 10/11] fuse: optimize direct io large folios processing
2025-07-07 23:27 ` Joanne Koong
@ 2025-07-08 16:05 ` David Hildenbrand
2025-07-08 23:14 ` Joanne Koong
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-07-08 16:05 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert,
willy, kernel-team
On 08.07.25 01:27, Joanne Koong wrote:
> On Fri, Jul 4, 2025 at 3:24 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 26.04.25 02:08, Joanne Koong wrote:
>>> Optimize processing folios larger than one page size for the direct io
>>> case. If contiguous pages are part of the same folio, collate the
>>> processing instead of processing each page in the folio separately.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>> fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
>>> 1 file changed, 41 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>> index 9a31f2a516b9..61eaec1c993b 100644
>>> --- a/fs/fuse/file.c
>>> +++ b/fs/fuse/file.c
>>> @@ -1490,7 +1490,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
>>> }
>>>
>>> while (nbytes < *nbytesp && nr_pages < max_pages) {
>>> - unsigned nfolios, i;
>>> + struct folio *prev_folio = NULL;
>>> + unsigned npages, i;
>>> size_t start;
>>>
>>> ret = iov_iter_extract_pages(ii, &pages,
>>> @@ -1502,23 +1503,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
>>>
>>> nbytes += ret;
>>>
>>> - nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
>>> + npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
>>>
>>> - for (i = 0; i < nfolios; i++) {
>>> - struct folio *folio = page_folio(pages[i]);
>>> - unsigned int offset = start +
>>> - (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
>>> - unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
>>> + /*
>>> + * We must check each extracted page. We can't assume every page
>>> + * in a large folio is used. For example, userspace may mmap() a
>>> + * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
>>> + * a large folio, in which case the extracted pages could be
>>> + *
>>> + * folio A page 0
>>> + * folio A page 1
>>> + * folio B page 0
>>> + * folio A page 3
>>> + *
>>> + * where folio A belongs to the file and folio B is an anonymous
>>> + * COW page.
>>> + */
>>> + for (i = 0; i < npages && ret; i++) {
>>> + struct folio *folio;
>>> + unsigned int offset;
>>> + unsigned int len;
>>> +
>>> + WARN_ON(!pages[i]);
>>> + folio = page_folio(pages[i]);
>>> +
>>> + len = min_t(unsigned int, ret, PAGE_SIZE - start);
>>> +
>>> + if (folio == prev_folio && pages[i] != pages[i - 1]) {
>>
>> I don't really understand the "pages[i] != pages[i - 1]" part.
>>
>> Why would you have two equal page pointers in there?
>>
>
> The pages extracted are user pages from a userspace iovec. AFAICT,
> there's the possibility, eg if userspace mmaps() the file with
> copy-on-write, that the same physical page could back multiple
> contiguous virtual addresses.
Yes, but I was rather curious why that would be a condition we are
checking. It's quite the ... corner case :)
>
>>
>> Something that might be simpler to understand and implement would be using
>>
>> num_pages_contiguous()
>>
>> from
>>
>> https://lore.kernel.org/all/20250704062602.33500-2-lizhe.67@bytedance.com/T/#u
>>
>> and then just making sure that we don't exceed the current folio, if we
>> ever get contiguous pages that cross a folio.
>
> Thanks for the link. I think here it's common that the pages array
> would hold pages from multiple different folios, so maybe a new helper
> num_pages_contiguous_folio() would be useful to return back the number
> of contiguous pages that are within the scope of the same folio.
Yes, something like that can be useful.
--
Cheers,
David / dhildenb
* Re: [PATCH v5 10/11] fuse: optimize direct io large folios processing
2025-07-08 16:05 ` David Hildenbrand
@ 2025-07-08 23:14 ` Joanne Koong
0 siblings, 0 replies; 31+ messages in thread
From: Joanne Koong @ 2025-07-08 23:14 UTC (permalink / raw)
To: David Hildenbrand
Cc: miklos, linux-fsdevel, jlayton, jefflexu, josef, bernd.schubert,
willy, kernel-team
On Tue, Jul 8, 2025 at 9:05 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.07.25 01:27, Joanne Koong wrote:
> > On Fri, Jul 4, 2025 at 3:24 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 26.04.25 02:08, Joanne Koong wrote:
> >>> Optimize processing folios larger than one page size for the direct io
> >>> case. If contiguous pages are part of the same folio, collate the
> >>> processing instead of processing each page in the folio separately.
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> >>> ---
> >>> fs/fuse/file.c | 55 +++++++++++++++++++++++++++++++++++++-------------
> >>> 1 file changed, 41 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>> index 9a31f2a516b9..61eaec1c993b 100644
> >>> --- a/fs/fuse/file.c
> >>> +++ b/fs/fuse/file.c
> >>> @@ -1490,7 +1490,8 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> >>> }
> >>>
> >>> while (nbytes < *nbytesp && nr_pages < max_pages) {
> >>> - unsigned nfolios, i;
> >>> + struct folio *prev_folio = NULL;
> >>> + unsigned npages, i;
> >>> size_t start;
> >>>
> >>> ret = iov_iter_extract_pages(ii, &pages,
> >>> @@ -1502,23 +1503,49 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> >>>
> >>> nbytes += ret;
> >>>
> >>> - nfolios = DIV_ROUND_UP(ret + start, PAGE_SIZE);
> >>> + npages = DIV_ROUND_UP(ret + start, PAGE_SIZE);
> >>>
> >>> - for (i = 0; i < nfolios; i++) {
> >>> - struct folio *folio = page_folio(pages[i]);
> >>> - unsigned int offset = start +
> >>> - (folio_page_idx(folio, pages[i]) << PAGE_SHIFT);
> >>> - unsigned int len = min_t(unsigned int, ret, PAGE_SIZE - start);
> >>> + /*
> >>> + * We must check each extracted page. We can't assume every page
> >>> + * in a large folio is used. For example, userspace may mmap() a
> >>> + * file PROT_WRITE, MAP_PRIVATE, and then store to the middle of
> >>> + * a large folio, in which case the extracted pages could be
> >>> + *
> >>> + * folio A page 0
> >>> + * folio A page 1
> >>> + * folio B page 0
> >>> + * folio A page 3
> >>> + *
> >>> + * where folio A belongs to the file and folio B is an anonymous
> >>> + * COW page.
> >>> + */
> >>> + for (i = 0; i < npages && ret; i++) {
> >>> + struct folio *folio;
> >>> + unsigned int offset;
> >>> + unsigned int len;
> >>> +
> >>> + WARN_ON(!pages[i]);
> >>> + folio = page_folio(pages[i]);
> >>> +
> >>> + len = min_t(unsigned int, ret, PAGE_SIZE - start);
> >>> +
> >>> + if (folio == prev_folio && pages[i] != pages[i - 1]) {
> >>
> >> I don't really understand the "pages[i] != pages[i - 1]" part.
> >>
> >> Why would you have two equal page pointers in there?
> >>
> >
> > The pages extracted are user pages from a userspace iovec. AFAICT,
> > there's the possibility, eg if userspace mmaps() the file with
> > copy-on-write, that the same physical page could back multiple
> > contiguous virtual addresses.
>
> Yes, but I was rather curious why that would be a condition we are
> checking. It's quite the ... corner case :)
>
Agreed, definitely the corner case :)
In the fuse code, later on when the buffer gets copied to/from the
server, it'll use ap->descs[index].length as the number of bytes to
copy. If we don't check for this duplicate-page corner case, then
it'll copy from the wrong offsets in the folio, which may even lead to
a page fault if the folio is only one page. If you're curious, this
buffer-copying logic happens in fuse_copy_args() -> fuse_copy_folios()
-> fuse_copy_folio().
> >
> >>
> >> Something that might be simpler to understand and implement would be using
> >>
> >> num_pages_contiguous()
> >>
> >> from
> >>
> >> https://lore.kernel.org/all/20250704062602.33500-2-lizhe.67@bytedance.com/T/#u
> >>
> >> and then just making sure that we don't exceed the current folio, if we
> >> ever get contiguous pages that cross a folio.
> >
> > Thanks for the link. I think here it's common that the pages array
> > would hold pages from multiple different folios, so maybe a new helper
> > num_pages_contiguous_folio() would be useful to return the number
> > of contiguous pages that are within the scope of the same folio.
>
> Yes, something like that can be useful.
>
> --
> Cheers,
>
> David / dhildenb
>
>
Thread overview: 31+ messages
2025-04-26 0:08 [PATCH v5 00/11] fuse: support large folios Joanne Koong
2025-04-26 0:08 ` [PATCH v5 01/11] fuse: support copying " Joanne Koong
2025-05-04 18:05 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 02/11] fuse: support large folios for retrieves Joanne Koong
2025-05-04 18:07 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 03/11] fuse: refactor fuse_fill_write_pages() Joanne Koong
2025-04-28 5:32 ` Dan Carpenter
2025-04-28 22:10 ` Joanne Koong
2025-05-04 18:08 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 04/11] fuse: support large folios for writethrough writes Joanne Koong
2025-05-04 18:40 ` Bernd Schubert
2025-05-05 21:36 ` Joanne Koong
2025-04-26 0:08 ` [PATCH v5 05/11] fuse: support large folios for folio reads Joanne Koong
2025-05-04 18:58 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 06/11] fuse: support large folios for symlinks Joanne Koong
2025-05-04 19:04 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 07/11] fuse: support large folios for stores Joanne Koong
2025-04-26 0:08 ` [PATCH v5 08/11] fuse: support large folios for queued writes Joanne Koong
2025-05-04 19:08 ` Bernd Schubert
2025-04-26 0:08 ` [PATCH v5 09/11] fuse: support large folios for readahead Joanne Koong
2025-05-04 19:13 ` Bernd Schubert
2025-05-05 14:40 ` Darrick J. Wong
2025-05-05 15:23 ` Bernd Schubert
2025-05-05 22:05 ` Joanne Koong
2025-04-26 0:08 ` [PATCH v5 10/11] fuse: optimize direct io large folios processing Joanne Koong
2025-05-04 19:15 ` Bernd Schubert
2025-07-04 10:24 ` David Hildenbrand
2025-07-07 23:27 ` Joanne Koong
2025-07-08 16:05 ` David Hildenbrand
2025-07-08 23:14 ` Joanne Koong
2025-04-26 0:08 ` [PATCH v5 11/11] fuse: support large folios for writeback Joanne Koong