public inbox for linux-fsdevel@vger.kernel.org
* fuse: buffered reads limited to 256KB regardless of negotiated max_pages
@ 2026-03-16 14:54 jim.harris
  2026-03-24 13:21 ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: jim.harris @ 2026-03-16 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Stefan Hajnoczi, Max Gurtovoy, Idan Zach,
	Konrad Sztyber

Hi all,

We have a FUSE server that advertises max_write=1MB and max_pages=256
in the FUSE_INIT response. Buffered sequential writes arrive at the
server at the full 1MB as expected. However, buffered sequential reads
are capped at 256KB per FUSE READ request.

The cap comes from the BDI readahead window. bdi->ra_pages defaults to
VM_READAHEAD_PAGES (32 pages / 128KB). For sequential access (e.g. cp),
posix_fadvise(POSIX_FADV_SEQUENTIAL) doubles the per-file readahead
window to 2 * bdi->ra_pages (256KB), producing the observed 256KB
limit. A 1MB application read() results in four sequential 256KB
round trips to the FUSE server instead of one.

In process_init_reply(), the kernel processes the
server's max_readahead response like this:

    ra_pages = arg->max_readahead / PAGE_SIZE;
    fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);

Since bdi->ra_pages starts at VM_READAHEAD_PAGES (128KB), and the
kernel sends this value as init_in->max_readahead, the server can only
decrease readahead -- never increase it. Even if the server responds
with max_readahead=1MB, the min() clamps it back to 128KB.

Other filesystems set ra_pages or io_pages based on server/device
capabilities:

  - SMB/CIFS sets ra_pages directly (2 * rsize, or from mount option)
  - Ceph sets ra_pages directly from mount option
  - 9P sets both ra_pages and io_pages from maxdata
  - NFS sets io_pages from rsize

I see two possible approaches and would like feedback:

Option A: Fix the max_readahead negotiation

Replace the current:

    fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);

with:

    fm->sb->s_bdi->ra_pages = min(ra_pages, fc->max_pages);

This uses the server's max_readahead response directly, capped by
fc->max_pages for safety. I think this is backward compatible:
existing servers that echo the kernel's 128KB value get the same
result. Servers that return a lower value still reduce it. Only
servers that return a higher value see changed behavior.

FUSE servers can opt in by advertising a larger max_readahead in the
FUSE_INIT response.

Option B: Set io_pages from max_pages

Set bdi->io_pages after FUSE_INIT negotiation:

    fm->sb->s_bdi->io_pages = fc->max_pages;

This matches what NFS does (setting io_pages from rsize). In
ondemand_readahead(), a request larger than ra->ra_pages is allowed
to grow up to min(request size, bdi->io_pages), so a large io_pages
would allow larger readahead submissions for large reads.

This is simpler since no server-side change is needed. However, it
bypasses the max_readahead protocol field, making max_readahead
effectively meaningless for any FUSE connection with a large
max_pages.

In both cases, fc->max_pages is already clamped by
fc->max_pages_limit, which for virtio-fs accounts for the virtqueue
descriptor count.

Thoughts?

Thanks,
Jim


Thread overview: 5+ messages
2026-03-16 14:54 fuse: buffered reads limited to 256KB regardless of negotiated max_pages jim.harris
2026-03-24 13:21 ` Stefan Hajnoczi
2026-03-24 16:05   ` Bernd Schubert
2026-03-24 17:11     ` Darrick J. Wong
2026-03-24 17:40       ` Bernd Schubert
