public inbox for linux-fsdevel@vger.kernel.org
* fuse: buffered reads limited to 256KB regardless of negotiated max_pages
@ 2026-03-16 14:54 jim.harris
  2026-03-24 13:21 ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: jim.harris @ 2026-03-16 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Stefan Hajnoczi, Max Gurtovoy, Idan Zach,
	Konrad Sztyber

Hi all,

We have a FUSE server that advertises max_write=1MB and max_pages=256
in the FUSE_INIT response. Buffered sequential writes arrive at the
server at the full 1MB as expected. However, buffered sequential reads
are capped at 256KB per FUSE READ request.

The cap comes from the BDI readahead window. bdi->ra_pages defaults to
VM_READAHEAD_PAGES (32 pages / 128KB). For sequential access (e.g. cp),
posix_fadvise(POSIX_FADV_SEQUENTIAL) doubles the per-file readahead
window to 2 * bdi->ra_pages (256KB), producing the observed 256KB
limit. A 1MB application read() results in four sequential 256KB
round trips to the FUSE server instead of one.

In process_init_reply(), the kernel processes the
server's max_readahead response like this:

    ra_pages = arg->max_readahead / PAGE_SIZE;
    fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);

Since bdi->ra_pages starts at VM_READAHEAD_PAGES (128KB), and the
kernel sends this value as init_in->max_readahead, the server can only
decrease readahead -- never increase it. Even if the server responds
with max_readahead=1MB, the min() clamps it back to 128KB.

Other filesystems set ra_pages or io_pages based on server/device
capabilities:

  - SMB/CIFS sets ra_pages directly (2 * rsize, or from mount option)
  - Ceph sets ra_pages directly from mount option
  - 9P sets both ra_pages and io_pages from maxdata
  - NFS sets io_pages from rsize

I see two possible approaches and would like feedback:

Option A: Fix the max_readahead negotiation

Replace the current:

    fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);

with:

    fm->sb->s_bdi->ra_pages = min(ra_pages, fc->max_pages);

This uses the server's max_readahead response directly, capped by
fc->max_pages for safety. I think this is backward compatible:
existing servers that echo the kernel's 128KB value get the same
result. Servers that return a lower value still reduce it. Only
servers that return a higher value see changed behavior.

FUSE servers can opt in by advertising a larger max_readahead in the
FUSE_INIT response.

Option B: Set io_pages from max_pages

Set bdi->io_pages after FUSE_INIT negotiation:

    fm->sb->s_bdi->io_pages = fc->max_pages;

This matches what NFS does (setting io_pages from rsize). In
ondemand_readahead(), a request larger than ra->ra_pages is allowed
to grow up to min(request size, bdi->io_pages), so a large io_pages
would allow larger readahead submissions for large reads.

This is simpler since no server-side change is needed. However, it
bypasses the max_readahead protocol field, making max_readahead
effectively meaningless for any FUSE connection with a large
max_pages.

In both cases, fc->max_pages is already clamped by
fc->max_pages_limit, which for virtio-fs accounts for the virtqueue
descriptor count.

Thoughts?

Thanks,
Jim


Thread overview: 5+ messages
2026-03-16 14:54 fuse: buffered reads limited to 256KB regardless of negotiated max_pages jim.harris
2026-03-24 13:21 ` Stefan Hajnoczi
2026-03-24 16:05   ` Bernd Schubert
2026-03-24 17:11     ` Darrick J. Wong
2026-03-24 17:40       ` Bernd Schubert
