From: "Matthew R. Ochs" <mochs@nvidia.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Bernd Schubert <bschubert@ddn.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2] fuse: back uncached readdir buffers with pages
Date: Tue, 28 Apr 2026 16:29:38 -0700
Message-ID: <20260428233028.2747981-1-mochs@nvidia.com>

Commit dabb90391028 ("fuse: increase readdir buffer size") changed
fuse_readdir_uncached() to size its temporary buffer from ctx->count.
That is useful for overlayfs and other in-kernel callers that use
INT_MAX to indicate an unlimited directory read.

The buffer is currently capped at fc->max_pages << PAGE_SHIFT bytes.
However, fc->max_pages is a page-count limit, while fc->max_write is the
negotiated byte-sized payload limit. Using only fc->max_pages can produce
a READDIR request larger than the server is prepared to handle, especially
when the server and client use different page sizes.
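
As a rough illustration of the sizing problem described above, the following
userspace sketch models the capped request size. It is not kernel code:
readdir_bufsize() and its helpers are hypothetical names, and the limits used
in the usage note (256 pages, 1 MiB) are assumed values chosen only to show
the effect of a 64K page size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the request sizing; not kernel code. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }
static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }

/*
 * count:     size requested by the caller (models ctx->count);
 *            values <= 0 fall back to one page
 * page_size: client page size in bytes
 * max_pages: page-count limit (models fc->max_pages)
 * max_write: byte-sized payload limit (models fc->max_write)
 */
static size_t readdir_bufsize(int count, size_t page_size,
                              size_t max_pages, size_t max_write)
{
    assert(page_size > 0);

    /* Upper bound: the smaller of the page-based and byte-based limits. */
    size_t max_bufsize = min_sz(max_pages * page_size, max_write);
    size_t want = count > 0 ? (size_t)count : page_size;

    /* Ask for at least one page, but never more than the upper bound. */
    return min_sz(max_sz(want, page_size), max_bufsize);
}
```

With the assumed limits, an "unlimited" read on a 64K-page client would be
sized at 256 * 64K = 16 MiB if only the page count were applied; taking the
byte limit into account as well brings it down to 1 MiB.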

The larger buffer is also currently supplied as a kvec output argument.
For virtiofs, kvec arguments are copied through req->argbuf, which is
allocated with kmalloc(..., GFP_ATOMIC). A large readdir buffer can
therefore require a multi-megabyte contiguous atomic allocation and fail
with -ENOMEM.

This was observed with a 64K-page guest on a 4K-page host, using an
overlayfs mount whose lower directory is on virtiofs. Reading a merged
directory through overlayfs failed with:

  ls: reading directory '<path>': Cannot allocate memory

Avoid the oversized request and the large bounce-buffer allocation by
capping the requested byte size by both fc->max_pages and fc->max_write,
then backing the uncached readdir output with pages and setting out_pages.
The virtiofs transport can then pass the pages as scatter-gather entries
instead of copying the output through argbuf.
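
To make the page-backed layout concrete, here is a small userspace sketch of
how a byte-sized buffer might be split into per-page lengths, with a short
final page when the size is not page-aligned. pages_needed() and
page_desc_length() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to back bufsize bytes (rounds up). */
static size_t pages_needed(size_t bufsize, size_t page_size)
{
    assert(page_size > 0);
    return (bufsize + page_size - 1) / page_size;
}

/*
 * Bytes of the buffer that land in page i: every page is full except
 * possibly the last one, which holds whatever remains.
 */
static size_t page_desc_length(size_t bufsize, size_t page_size, size_t i)
{
    assert(i < pages_needed(bufsize, page_size));

    size_t remaining = bufsize - i * page_size;

    return remaining < page_size ? remaining : page_size;
}
```

Each (page, length) pair corresponds to one scatter-gather entry, so no
single contiguous allocation of the full buffer size is needed.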

Map the pages with vm_map_ram() only while parsing the returned dirents,
so the existing parser can continue to operate on a linear kernel mapping.
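
One way to see why a linear view is convenient: dirent records are
variable-length and only 8-byte aligned, so a record may straddle a page
boundary. The userspace sketch below walks such records in a flat buffer.
struct demo_dirent_hdr is an assumed mirror of the fixed part of the FUSE
dirent layout (ino, off, namelen, type, followed by the name), and
put_dirent() and count_dirents() are hypothetical helpers, not the kernel
parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of the fixed dirent header; the name bytes follow it. */
struct demo_dirent_hdr {
    uint64_t ino;
    uint64_t off;
    uint32_t namelen;
    uint32_t type;
};

/* Records are padded to a multiple of 8 bytes. */
#define DEMO_DIRENT_ALIGN(x) \
    (((x) + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1))

/* Append one record at pos; the caller must ensure buf is large enough. */
static size_t put_dirent(unsigned char *buf, size_t pos, uint64_t ino,
                         const char *name)
{
    struct demo_dirent_hdr hdr = {
        .ino = ino, .off = 0, .namelen = (uint32_t)strlen(name), .type = 0,
    };
    size_t reclen = DEMO_DIRENT_ALIGN(sizeof(hdr) + hdr.namelen);

    memset(buf + pos, 0, reclen);
    memcpy(buf + pos, &hdr, sizeof(hdr));
    memcpy(buf + pos + sizeof(hdr), name, hdr.namelen);
    return pos + reclen;
}

/* Count complete records in a linear buffer; stop at a truncated one. */
static size_t count_dirents(const unsigned char *buf, size_t nbytes)
{
    size_t n = 0;

    while (nbytes >= sizeof(struct demo_dirent_hdr)) {
        struct demo_dirent_hdr hdr;
        size_t reclen;

        memcpy(&hdr, buf, sizeof(hdr));
        reclen = DEMO_DIRENT_ALIGN(sizeof(hdr) + hdr.namelen);
        if (hdr.namelen == 0 || reclen > nbytes)
            break;

        buf += reclen;
        nbytes -= reclen;
        n++;
    }
    return n;
}
```

Because the walk only advances a pointer through contiguous memory, it needs
no special handling for records that cross from one backing page to the next.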

Fixes: dabb90391028 ("fuse: increase readdir buffer size")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
v2:
- Reworked uncached readdir to use output pages and out_pages, per Miklos.
- Cap the requested byte size by both fc->max_pages and fc->max_write.
- Map pages with vm_map_ram() only while parsing returned dirents.
- Verified with --overlay-rwdir across 4K/64K host and guest page sizes.
- Link to v1: https://lore.kernel.org/all/20260428021304.2338592-1-mochs@nvidia.com/

 fs/fuse/readdir.c | 67 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index db5ae8ec1030..27162084a683 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -12,6 +12,7 @@
#include <linux/posix_acl.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>
+#include <linux/vmalloc.h>
static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
{
@@ -343,17 +344,45 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
struct fuse_mount *fm = get_fuse_mount(inode);
struct fuse_conn *fc = fm->fc;
struct fuse_io_args ia = {};
- struct fuse_args *args = &ia.ap.args;
+ struct fuse_args_pages *ap = &ia.ap;
+ struct fuse_args *args = &ap->args;
+ struct page **pages;
void *buf;
- size_t bufsize = clamp((unsigned int) ctx->count, PAGE_SIZE, fc->max_pages << PAGE_SHIFT);
+ size_t max_bufsize = min_t(size_t, (size_t)fc->max_pages << PAGE_SHIFT,
+ fc->max_write);
+ size_t count = ctx->count > 0 ? ctx->count : PAGE_SIZE;
+ size_t bufsize = min_t(size_t, max_t(size_t, count, PAGE_SIZE),
+ max_bufsize);
+ unsigned int nr_pages = DIV_ROUND_UP(bufsize, PAGE_SIZE);
u64 attr_version = 0, evict_ctr = 0;
bool locked;
+ unsigned int nr_alloc = 0;
+ unsigned int i;
- buf = kvmalloc(bufsize, GFP_KERNEL);
- if (!buf)
+ pages = kvcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
+ if (!pages)
return -ENOMEM;
- args->out_args[0].value = buf;
+ while (nr_alloc < nr_pages) {
+ unsigned int last = nr_alloc;
+
+ nr_alloc = alloc_pages_bulk(GFP_KERNEL, nr_pages, pages);
+ if (nr_alloc == last)
+ goto nomem;
+ }
+
+ ap->folios = fuse_folios_alloc(nr_pages, GFP_KERNEL, &ap->descs);
+ if (!ap->folios)
+ goto nomem;
+
+ for (i = 0; i < nr_pages; i++) {
+ ap->folios[i] = page_folio(pages[i]);
+ ap->descs[i].length = min_t(size_t,
+ bufsize - (size_t)i * PAGE_SIZE,
+ PAGE_SIZE);
+ }
+ ap->num_folios = nr_pages;
+ args->out_pages = true;
plus = fuse_use_readdirplus(inode, ctx);
if (plus) {
@@ -372,17 +401,35 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
if (ff->open_flags & FOPEN_CACHE_DIR)
fuse_readdir_cache_end(file, ctx->pos);
- } else if (plus) {
- res = parse_dirplusfile(buf, res, file, ctx, attr_version,
- evict_ctr);
} else {
- res = parse_dirfile(buf, res, file, ctx);
+ buf = vm_map_ram(pages, nr_pages, -1);
+ if (!buf) {
+ res = -ENOMEM;
+ } else {
+ if (plus)
+ res = parse_dirplusfile(buf, res, file, ctx,
+ attr_version,
+ evict_ctr);
+ else
+ res = parse_dirfile(buf, res, file, ctx);
+
+ vm_unmap_ram(buf, nr_pages);
+ }
}
}
- kvfree(buf);
fuse_invalidate_atime(inode);
+
+out:
+ kfree(ap->folios);
+ for (i = 0; i < nr_alloc; i++)
+ __free_page(pages[i]);
+ kvfree(pages);
return res;
+
+nomem:
+ res = -ENOMEM;
+ goto out;
}
enum fuse_parse_result {
--
2.50.1