* [PATCH] fuse: do not treat unlimited readdir count as a buffer size
@ 2026-04-28 2:13 Matthew R. Ochs
2026-04-28 13:11 ` Miklos Szeredi
0 siblings, 1 reply; 3+ messages in thread
From: Matthew R. Ochs @ 2026-04-28 2:13 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, linux-kernel, stable
Commit dabb90391028 ("fuse: increase readdir buffer size") changed
fuse_readdir_uncached() to size its temporary buffer from ctx->count,
clamped to the negotiated FUSE maximum request size.

That is correct for normal userspace getdents callers, where ctx->count is
the userspace dirent buffer size. It is not correct for in-kernel callers
that use the VFS sentinel values documented for struct dir_context.count:
0 means unknown and INT_MAX means unlimited.

Overlayfs uses INT_MAX when reading merged directories. After
dabb90391028, FUSE interprets that sentinel as a real size request and
expands the readdir buffer to fc->max_pages << PAGE_SHIFT.

For virtiofs, the output kvec is included in the request bounce buffer
allocated by copy_args_to_argbuf():

    req->argbuf = kmalloc(len, GFP_ATOMIC);

On a 64K-page guest, this can require a multi-megabyte contiguous
GFP_ATOMIC allocation. In the failing setup, a 64K-page guest on a 4K-page
host negotiated max_pages=124, so the computed buffer was about 8MB. The
same guest on a 64K-page host negotiated max_pages=16, limiting the
computed buffer to 1MB and masking the bug.

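As a worked check of the sizes quoted above, here is a minimal userspace sketch of the upper bound fc->max_pages << PAGE_SHIFT. The helper name readdir_buf_limit() and its plain parameters are illustrative assumptions, not kernel code; a page shift of 16 stands in for a 64K-page guest.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the upper bound "fc->max_pages << PAGE_SHIFT".
 * max_pages and page_shift are plain parameters here (assumption); in the
 * kernel they come from the FUSE connection and the guest page size.
 */
static size_t readdir_buf_limit(unsigned int max_pages, unsigned int page_shift)
{
	assert(page_shift < 32);	/* keep the model's shift well defined */
	return (size_t)max_pages << page_shift;
}
```

With these inputs, readdir_buf_limit(124, 16) gives 8126464 bytes (7.75 MiB, the "about 8MB" case) and readdir_buf_limit(16, 16) gives 1048576 bytes (1 MiB).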
One way to reproduce this is a 64K-page guest on a 4K-page host with an
overlayfs mount whose lower directory is on virtiofs. Reading a merged
directory through overlayfs can then fail with:

    ls: reading directory '<path>': Cannot allocate memory

Treat unknown and unlimited counts the same way fuse_readdir_uncached()
did before dabb90391028: use PAGE_SIZE. Keep the larger readdir buffer
for callers that provide a meaningful positive count.

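The selection rule described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: readdir_bufsize() and clamp_size() are hypothetical names, and page_size and max_bytes stand in for PAGE_SIZE and fc->max_pages << PAGE_SHIFT. It mirrors the unsigned cast used in the patch, so only 0 and INT_MAX are treated as sentinels.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: clamp v into [lo, hi], assuming lo <= hi. */
static size_t clamp_size(size_t v, size_t lo, size_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Userspace model of the patched bufsize selection.
 * count:     models dir_context.count (0 = unknown, INT_MAX = unlimited)
 * page_size: stand-in for PAGE_SIZE
 * max_bytes: stand-in for fc->max_pages << PAGE_SHIFT
 */
static size_t readdir_bufsize(int count, size_t page_size, size_t max_bytes)
{
	unsigned int c = (unsigned int)count;	/* same cast as the patch */

	assert(page_size <= max_bytes);

	/* Sentinels fall back to a single page, as before dabb90391028. */
	if (c == 0 || c == (unsigned int)INT_MAX)
		return page_size;

	/* A real size request is honoured within [page_size, max_bytes]. */
	return clamp_size((size_t)c, page_size, max_bytes);
}
```

For example, readdir_bufsize(INT_MAX, 65536, 8126464) returns 65536 rather than the full 8126464-byte limit, while a getdents-style request such as readdir_bufsize(32768, 4096, 1048576) still returns 32768.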
Fixes: dabb90391028 ("fuse: increase readdir buffer size")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
fs/fuse/readdir.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index c2aae2eef086..0e436c563efb 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -341,7 +341,10 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 	struct fuse_io_args ia = {};
 	struct fuse_args *args = &ia.ap.args;
 	void *buf;
-	size_t bufsize = clamp((unsigned int) ctx->count, PAGE_SIZE, fc->max_pages << PAGE_SHIFT);
+	unsigned int count = (unsigned int)ctx->count;
+	size_t bufsize = (count && count != (unsigned int)INT_MAX) ?
+		clamp(count, (unsigned int)PAGE_SIZE, fc->max_pages << PAGE_SHIFT) :
+		PAGE_SIZE;
 	u64 attr_version = 0, evict_ctr = 0;
 	bool locked;
--
2.50.1
* Re: [PATCH] fuse: do not treat unlimited readdir count as a buffer size
From: Miklos Szeredi @ 2026-04-28 13:11 UTC (permalink / raw)
To: Matthew R. Ochs; +Cc: Bernd Schubert, linux-fsdevel, linux-kernel, stable
On Tue, 28 Apr 2026 at 04:13, Matthew R. Ochs <mochs@nvidia.com> wrote:

> For virtiofs, the output kvec is included in the request bounce buffer
> allocated by copy_args_to_argbuf():
>
>     req->argbuf = kmalloc(len, GFP_ATOMIC);

Ugh. The real bug here is inappropriate use of the bounce buffer.
fuse_readdir_uncached() should instead supply an array of pages.

It's a little more complicated, but would fix this properly: overlayfs
does want to get as much of the directory as possible in one go to be
most efficient.

I'd go with vmalloc -> alloc_pages_bulk, then vm_map_ram() before
parsing the result.

Thanks,
Miklos
* Re: [PATCH] fuse: do not treat unlimited readdir count as a buffer size
From: Matt Ochs @ 2026-04-28 22:36 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
> On Apr 28, 2026, at 08:11, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 28 Apr 2026 at 04:13, Matthew R. Ochs <mochs@nvidia.com> wrote:
>
>> For virtiofs, the output kvec is included in the request bounce buffer
>> allocated by copy_args_to_argbuf():
>>
>> req->argbuf = kmalloc(len, GFP_ATOMIC);
>
> Ugh. The real bug here is inappropriate use of the bounce buffer.
> fuse_readdir_uncached() should instead supply an array of pages.
>
> It's a little more complicated, but would fix this properly: overlayfs
> does want to get as much of the directory as possible in one go to be
> most efficient.
>
> I'd go with vmalloc -> alloc_pages_bulk, then vm_map_ram() before
> parsing the result.
>
Thanks, that makes sense. I reworked the fix along those lines: uncached
readdir now supplies output pages via out_pages and uses vm_map_ram() only
for the existing parser.

Testing also showed that the request size needs to be capped by
fc->max_write as well as fc->max_pages. fc->max_pages is a page-count
limit, and with a 4K host / 64K guest, virtiofsd advertised 124 pages,
which the guest turned into an ~8 MiB READDIR. virtiofsd's byte-sized
payload limit was 1 MiB, so the page-backed version still failed until the
fc->max_write cap was added.

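To illustrate the double cap described above, here is a hypothetical userspace sketch. The v2 code is not part of this thread, so readdir_req_limit() and its parameters are assumptions that merely mirror the names fc->max_pages and fc->max_write; the limit is taken as the smaller of the page-count bound and the byte bound.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: the READDIR request size is limited both by the
 * page-count limit (max_pages << page_shift) and by the byte limit
 * (max_write). This is not the actual v2 code.
 */
static size_t readdir_req_limit(unsigned int max_pages, unsigned int page_shift,
				size_t max_write)
{
	size_t by_pages;

	assert(page_shift < 32);	/* keep the model's shift well defined */
	by_pages = (size_t)max_pages << page_shift;

	/* Whichever limit is smaller wins. */
	return by_pages < max_write ? by_pages : max_write;
}
```

With 124 pages on a 64K-page guest and a 1 MiB byte limit, readdir_req_limit(124, 16, 1048576) returns 1048576 instead of 8126464, which matches the failure mode reported for the page-backed version without the extra cap.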
I’ll send out a v2 shortly.
-matt