* FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
@ 2025-11-26 15:07 Abhishek Gupta
  2025-11-26 19:11 ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: Abhishek Gupta @ 2025-11-26 15:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: miklos, bschubert, Swetha Vadlakonda

Hello Team,

I am observing a performance regression in the FUSE subsystem on kernel 6.14 compared to 6.8/6.11 when using the legacy/standard FUSE interface (a userspace daemon issuing standard reads on /dev/fuse).

Summary of issue: On kernels 6.8 and 6.11, increasing iodepth in fio (using ioengine=io_uring) results in near-linear performance scaling. On kernel 6.14, using the exact same userspace binary, increasing iodepth yields no performance improvement (behavior resembles iodepth=1).

Environment:
- Workload: GCSFuse (userspace daemon) + fio
- fio config: random read, ioengine=io_uring, direct=1, iodepth=4
- CPU: Intel
- Daemon: Go-based. It uses a serialized reader loop on /dev/fuse that immediately spawns a goroutine per request, so it can serve requests in parallel.
- Kernel config: CONFIG_FUSE_IO_URING=y is enabled, but the daemon does not register for the ring (legacy mode).

Benchmark observations:
- Kernel 6.8/6.11: with iodepth=4, we observe ~3.5-4x throughput compared to iodepth=1.
- Kernel 6.14: with iodepth=4, throughput is identical to iodepth=1. Parallelism is effectively lost.

Is this a known issue? I would appreciate any insights or pointers.

Thanks & Regards,
Abhishek

^ permalink raw reply	[flat|nested] 12+ messages in thread
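For readers unfamiliar with the daemon pattern described in the report, here is a minimal Go sketch of a serialized /dev/fuse reader loop that dispatches one goroutine per request. It is hypothetical, not the actual GCSFuse code; in particular, the assumption that the mounted /dev/fuse fd is inherited as fd 3 and the empty handleRequest body are illustrative only.

package main

import (
	"log"
	"os"
)

// handleRequest would parse the FUSE request header and write the reply
// back to the same device fd; the body is elided because this sketch
// only shows the concurrency shape.
func handleRequest(dev *os.File, req []byte) {
	// ... decode opcode, perform the operation, write the reply to dev ...
}

func main() {
	// Assumption: the process inherited an already-mounted /dev/fuse fd
	// as fd 3 (how GCSFuse actually obtains its fd differs).
	dev := os.NewFile(3, "/dev/fuse")
	for {
		buf := make([]byte, 1<<20) // fresh buffer per request so goroutines don't race
		n, err := dev.Read(buf)    // serialized: one outstanding read on the device
		if err != nil {
			log.Fatal(err)
		}
		go handleRequest(dev, buf[:n]) // parallel: each request is served concurrently
	}
}

With this shape, requests the kernel has queued (e.g., when fio keeps iodepth=4 in flight) can be pulled back-to-back by the single reader and served in parallel, which is the scaling the report says disappeared in 6.14.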
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-11-26 15:07 FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring) Abhishek Gupta
@ 2025-11-26 19:11 ` Bernd Schubert
  2025-11-27 13:37   ` Abhishek Gupta
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2025-11-26 19:11 UTC (permalink / raw)
  To: Abhishek Gupta, linux-fsdevel@vger.kernel.org
  Cc: miklos@szeredi.hu, Swetha Vadlakonda

Hi Abhishek,

On 11/26/25 16:07, Abhishek Gupta wrote:
> [...]
> Is this a known issue? I would appreciate any insights or pointers.

Could you give your exact fio line? I'm not aware of such a regression.

bschubert2@imesrv3 ~>fio --directory=/tmp/dest --name=iops.\$jobnum --rw=randread --bs=4k --size=1G --numjobs=1 --iodepth=1 --time_based --runtime=30s --group_reporting --ioengine=io_uring --direct=1
iops.$jobnum: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.36
Starting 1 process
iops.$jobnum: Laying out IO file (1 file / 1024MiB)
...
Run status group 0 (all jobs):
   READ: bw=178MiB/s (186MB/s), 178MiB/s-178MiB/s (186MB/s-186MB/s), io=5331MiB (5590MB), run=30001-30001msec

bschubert2@imesrv3 ~>fio --directory=/tmp/dest --name=iops.\$jobnum --rw=randread --bs=4k --size=1G --numjobs=1 --iodepth=4 --time_based --runtime=30s --group_reporting --ioengine=io_uring --direct=1
iops.$jobnum: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=4
fio-3.36
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=673MiB/s][r=172k IOPS][eta 00m:00s]
iops.$jobnum: (groupid=0, jobs=1): err= 0: pid=52012: Wed Nov 26 20:08:17 2025
...
Run status group 0 (all jobs):
   READ: bw=673MiB/s (706MB/s), 673MiB/s-673MiB/s (706MB/s-706MB/s), io=19.7GiB (21.2GB), run=30001-30001msec

This is with libfuse `example/passthrough_hp -o allow_other --nopassthrough --foreground /tmp/source /tmp/dest`

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-11-26 19:11 ` Bernd Schubert
@ 2025-11-27 13:37   ` Abhishek Gupta
  2025-11-27 23:05     ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: Abhishek Gupta @ 2025-11-27 13:37 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

Hi Bernd,

Thanks for looking into this.
Please find below the fio output on the 6.11 & 6.14 kernel versions.

On kernel 6.11

~/gcsfuse$ uname -a
Linux abhishek-c4-192-west4a 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

iodepth=1
:~/fio-fio-3.38$ ./fio --name=randread --rw=randread --ioengine=io_uring --thread --filename_format='/home/abhishekmgupta_google_com/bucket/$jobnum' --filesize=1G --time_based=1 --runtime=15s --bs=4K --numjobs=1 --iodepth=1 --group_reporting=1 --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.38
Starting 1 thread
...
Run status group 0 (all jobs):
   READ: bw=3311KiB/s (3391kB/s), 3311KiB/s-3311KiB/s (3391kB/s-3391kB/s), io=48.5MiB (50.9MB), run=15001-15001msec

iodepth=4
:~/fio-fio-3.38$ ./fio --name=randread --rw=randread --ioengine=io_uring --thread --filename_format='/home/abhishekmgupta_google_com/bucket/$jobnum' --filesize=1G --time_based=1 --runtime=15s --bs=4K --numjobs=1 --iodepth=4 --group_reporting=1 --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=4
fio-3.38
Starting 1 thread
...
Run status group 0 (all jobs):
   READ: bw=11.0MiB/s (11.6MB/s), 11.0MiB/s-11.0MiB/s (11.6MB/s-11.6MB/s), io=166MiB (174MB), run=15002-15002msec

On kernel 6.14

:~$ uname -a
Linux abhishek-west4a-2504 6.14.0-1019-gcp #20-Ubuntu SMP Wed Oct 15 00:41:12 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

iodepth=1
:~$ fio --name=randread --rw=randread --ioengine=io_uring --thread --filename_format='/home/abhishekmgupta_google_com/bucket/$jobnum' --filesize=1G --time_based=1 --runtime=15s --bs=4K --numjobs=1 --iodepth=1 --group_reporting=1 --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.38
Starting 1 thread
...
Run status group 0 (all jobs):
   READ: bw=3576KiB/s (3662kB/s), 3576KiB/s-3576KiB/s (3662kB/s-3662kB/s), io=52.4MiB (54.9MB), run=15001-15001msec

iodepth=4
:~$ fio --name=randread --rw=randread --ioengine=io_uring --thread --filename_format='/home/abhishekmgupta_google_com/bucket/$jobnum' --filesize=1G --time_based=1 --runtime=15s --bs=4K --numjobs=1 --iodepth=4 --group_reporting=1 --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=4
fio-3.38
...
Run status group 0 (all jobs):
   READ: bw=3863KiB/s (3956kB/s), 3863KiB/s-3863KiB/s (3956kB/s-3956kB/s), io=56.6MiB (59.3MB), run=15001-15001msec

Thanks,
Abhishek

On Thu, Nov 27, 2025 at 12:41 AM Bernd Schubert <bschubert@ddn.com> wrote:
> [...]
> Could you give your exact fio line? I'm not aware of such a regression.
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-11-27 13:37   ` Abhishek Gupta
@ 2025-11-27 23:05     ` Bernd Schubert
  2025-12-02 10:42       ` Abhishek Gupta
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2025-11-27 23:05 UTC (permalink / raw)
  To: Abhishek Gupta, Bernd Schubert
  Cc: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

Hi Abhishek,

On 11/27/25 14:37, Abhishek Gupta wrote:
> Hi Bernd,
>
> Thanks for looking into this.
> Please find below the fio output on the 6.11 & 6.14 kernel versions.
> [...]

Assuming I find some time over the weekend, and given that I don't know anything about Google Cloud, how can I reproduce this?

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-11-27 23:05     ` Bernd Schubert
@ 2025-12-02 10:42       ` Abhishek Gupta
  [not found]             ` <CAPr64AKYisa=_X5fAB1ozgb3SoarKm19TD3hgwhX9csD92iBzA@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Abhishek Gupta @ 2025-12-02 10:42 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

Hi Bernd,

Apologies for the delay in responding.

Here are the steps to reproduce the FUSE performance issue locally using a simple read-bench FUSE filesystem (a scripted version of steps 2 and 3 follows this message):

1. Set up the FUSE filesystem:
git clone https://github.com/jacobsa/fuse.git jacobsa-fuse
cd jacobsa-fuse/samples/mount_readbenchfs
# Replace <mnt_dir> with your desired mount point
go run mount.go --mount_point <mnt_dir>

2. Run the fio benchmark (iodepth 1):
fio --name=randread --rw=randread --ioengine=io_uring --thread --filename=<mnt_dir>/test --filesize=1G --time_based=1 --runtime=5s --bs=4K --numjobs=1 --iodepth=1 --direct=1 --group_reporting=1

3. Run the fio benchmark (iodepth 4):
fio --name=randread --rw=randread --ioengine=io_uring --thread --filename=<mnt_dir>/test --filesize=1G --time_based=1 --runtime=5s --bs=4K --numjobs=1 --iodepth=4 --direct=1 --group_reporting=1

Example results on kernel 6.14 (regression observed)

The following output shows the lack of scaling on my machine with kernel 6.14:

Kernel:
Linux abhishek-west4a-2504 6.14.0-1019-gcp #20-Ubuntu SMP Wed Oct 15 00:41:12 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

iodepth=1:
   READ: bw=74.3MiB/s (77.9MB/s), ... io=372MiB (390MB), run=5001-5001msec

iodepth=4:
   READ: bw=87.6MiB/s (91.9MB/s), ... io=438MiB (459MB), run=5000-5000msec

Thanks,
Abhishek

On Fri, Nov 28, 2025 at 4:35 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> [...]
> Assuming I find some time over the weekend, and given that I don't know
> anything about Google Cloud, how can I reproduce this?
>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread
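For convenience, steps 2 and 3 above can be driven as a single loop for a quick A/B comparison. This is a sketch only: it assumes the filesystem from step 1 is still mounted, that <mnt_dir> is substituted as above, and that fio 3.x is installed.

for d in 1 4; do
  fio --name=randread --rw=randread --ioengine=io_uring --thread \
      --filename=<mnt_dir>/test --filesize=1G --time_based=1 --runtime=5s \
      --bs=4K --numjobs=1 --iodepth=$d --direct=1 --group_reporting=1 \
      | grep 'READ:'   # keep only the summary bandwidth line per run
done

On an affected kernel the two READ lines come out nearly identical; per the original report, an unaffected kernel should show roughly 3.5-4x higher bandwidth for the iodepth=4 run.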
[parent not found: <CAPr64AKYisa=_X5fAB1ozgb3SoarKm19TD3hgwhX9csD92iBzA@mail.gmail.com>]
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  [not found]             ` <CAPr64AKYisa=_X5fAB1ozgb3SoarKm19TD3hgwhX9csD92iBzA@mail.gmail.com>
@ 2025-12-08 17:52               ` Bernd Schubert
  2025-12-08 22:56                 ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2025-12-08 17:52 UTC (permalink / raw)
  To: Abhishek Gupta, Bernd Schubert
  Cc: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

Hi Abhishek,

yes, I was able to run it today, will send out a mail later. Sorry, rather busy with other work.

Best,
Bernd

On 12/8/25 18:43, Abhishek Gupta wrote:
> Hi Bernd,
>
> Were you able to reproduce the issue locally using the steps I provided?
> Please let me know if you require any further information or assistance.
>
> Thanks,
> Abhishek
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-12-08 17:52               ` Bernd Schubert
@ 2025-12-08 22:56                 ` Bernd Schubert
  2025-12-09 17:16                   ` Abhishek Gupta
  2025-12-15  4:30                   ` Joanne Koong
  1 sibling, 2 replies; 12+ messages in thread
From: Bernd Schubert @ 2025-12-08 22:56 UTC (permalink / raw)
  To: Bernd Schubert, Abhishek Gupta
  Cc: linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

Hi Abhishek,

really sorry for the delay. I can see the same as you do: no improvement with --iodepth, although increasing the number of fio threads/jobs helps.

Interestingly, this is not what I'm seeing with passthrough_hp, at least I think so.

I had run quite some tests here
https://lore.kernel.org/r/20251003-reduced-nr-ring-queues_3-v2-6-742ff1a8fc58@ddn.com
focused on io-uring, but I had also done some tests with legacy fuse. I was hoping to manage a re-run today before sending this mail, but it is much too late now. Will try in the morning.

Thanks,
Bernd

On 12/8/25 18:52, Bernd Schubert wrote:
> Hi Abhishek,
>
> yes, I was able to run it today, will send out a mail later. Sorry,
> rather busy with other work.
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-12-08 22:56                 ` Bernd Schubert
@ 2025-12-09 17:16                   ` Abhishek Gupta
  2025-12-15  4:30                   ` Joanne Koong
  0 siblings, 0 replies; 12+ messages in thread
From: Abhishek Gupta @ 2025-12-09 17:16 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

Hi Bernd,

No worries. Thanks for the update. I look forward to hearing your findings.

Thanks,
Abhishek

On Tue, Dec 9, 2025 at 4:27 AM Bernd Schubert <bernd@bsbernd.com> wrote:
> [...]
> I was hoping to manage a re-run today before sending this mail, but it
> is much too late now. Will try in the morning.
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-12-08 22:56                 ` Bernd Schubert
@ 2025-12-15  4:30                   ` Joanne Koong
  2025-12-17  9:17                     ` Abhishek Gupta
  1 sibling, 1 reply; 12+ messages in thread
From: Joanne Koong @ 2025-12-15 4:30 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Abhishek Gupta, linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda

On Tue, Dec 9, 2025 at 6:57 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>
> Hi Abhishek,
>
> really sorry for the delay. I can see the same as you do: no improvement
> with --iodepth, although increasing the number of fio threads/jobs helps.
>
> Interestingly, this is not what I'm seeing with passthrough_hp,
> at least I think so.

I'm not seeing this regression on passthrough_hp either. On my local vm (on top of the fuse for-next tree) I'm seeing ~13 MiB/s for iodepth=1 and ~70 MiB/s for iodepth=4.

Abhishek, are you able to git bisect this to the commit that causes your regression?

Thanks,
Joanne

> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
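For readers following along, a bisect of the kind Joanne suggests, scoped to the FUSE code between the last known-good and first known-bad mainline releases, would look roughly like this. This is a sketch only: the Ubuntu -gcp kernels carry distro patches, so the distro's own tags or branches may be needed instead of the mainline v6.11/v6.14 tags, and if the culprit lies outside fs/fuse (for example in the io_uring core) the path filter would have to be dropped.

git bisect start v6.14 v6.11 -- fs/fuse   # bad first, then good; only commits touching fs/fuse
# at each step: build and boot the candidate kernel, re-run the
# iodepth=1 and iodepth=4 fio commands from earlier in the thread,
# then mark the result:
git bisect good    # or: git bisect bad
git bisect reset   # when the first bad commit has been identified

Each round requires a rebuild and a reboot, so limiting the bisect to fs/fuse keeps the number of candidate commits manageable.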
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-12-15  4:30                   ` Joanne Koong
@ 2025-12-17  9:17                     ` Abhishek Gupta
  2025-12-17 10:43                       ` Bernd Schubert
  2025-12-17 11:34                       ` Horst Birthelmer
  0 siblings, 2 replies; 12+ messages in thread
From: Abhishek Gupta @ 2025-12-17 9:17 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Bernd Schubert, Bernd Schubert, linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda, Vikas Jain (GCS)

Hi Joanne, Bernd,

I'm seeing this regression on passthrough_hp as well. I checked it on 6.14.0-1019-gcp and was getting 11.7 MiB/s with iodepth 1 and 15.6 MiB/s with iodepth 4. To remove ambiguity (due to kernel versions), I tried it on stock kernel 6.17 as well. Please find more details below:

# Installed stock kernel 6.17
$ uname -a
Linux abhishek-ubuntu2510.us-west4-a.c.gcs-fuse-test.internal 6.17.0 #2 SMP Tue Dec 16 12:14:53 UTC 2025 x86_64 GNU/Linux

# Running it as sudo to ensure passthrough is allowed (and we don't get a permission error for passthrough)
$ sudo ./example/passthrough_hp --debug ~/test_source/ ~/test_mount/
DEBUG: lookup(): name=test2.bin, parent=1
DEBUG:do_lookup:410 inode 3527901 count 1
DEBUG: lookup(): created userspace inode 3527901; fd = 9
DEBUG: setup shared backing file 1 for inode 136392323632296
DEBUG: closed backing file 1 for inode 136392323632296

# iodepth 1
$ sudo fio --name=randread --rw=randread --ioengine=io_uring --thread --filename_format='/home/abhishekmgupta_google_com/test_mount/test.bin' --filesize=1G --time_based=1 --runtime=15s --bs=4K --numjobs=1 --iodepth=1 --group_reporting=1 --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.39
Starting 1 thread
...
Run status group 0 (all jobs):
   READ: bw=11.4MiB/s (11.9MB/s), 11.4MiB/s-11.4MiB/s (11.9MB/s-11.9MB/s), io=170MiB (179MB), run=15001-15001msec

# iodepth 4
$ sudo fio --name=randread --rw=randread --ioengine=io_uring --thread --filename_format='/home/abhishekmgupta_google_com/test_mount/test.bin' --filesize=1G --time_based=1 --runtime=15s --bs=4K --numjobs=1 --iodepth=4 --group_reporting=1 --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=4
fio-3.39
Starting 1 thread
...
Run status group 0 (all jobs):
   READ: bw=18.3MiB/s (19.2MB/s), 18.3MiB/s-18.3MiB/s (19.2MB/s-19.2MB/s), io=275MiB (288MB), run=15002-15002msec

Also, I tried to build the for-next branch against both kernel 6.18 and 6.17 (to figure out the culprit commit), but I got compilation errors. Which kernel version should I build the for-next branch against?

Thanks,
Abhishek

On Mon, Dec 15, 2025 at 10:00 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> [...]
> Abhishek, are you able to git bisect this to the commit that causes
> your regression?
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-12-17  9:17                     ` Abhishek Gupta
@ 2025-12-17 10:43                       ` Bernd Schubert
  0 siblings, 0 replies; 12+ messages in thread
From: Bernd Schubert @ 2025-12-17 10:43 UTC (permalink / raw)
  To: Abhishek Gupta, Joanne Koong
  Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda, Vikas Jain (GCS)

Hi Abhishek,

[comments inline - on linux mailing lists this is much preferred].

On 12/17/25 10:17, Abhishek Gupta wrote:
> Hi Joanne, Bernd,
>
> I'm seeing this regression on passthrough_hp as well. I checked it on
> 6.14.0-1019-gcp and was getting 11.7 MiB/s with iodepth 1 and 15.6
> MiB/s with iodepth 4. To remove ambiguity (due to kernel versions), I
> tried it on stock kernel 6.17 as well. Please find more details below:

If you can reproduce it with libfuse and passthrough_hp, it will certainly be easier for us.

> # iodepth 1
> [...]
> Run status group 0 (all jobs):
>    READ: bw=11.4MiB/s (11.9MB/s), 11.4MiB/s-11.4MiB/s (11.9MB/s-11.9MB/s), io=170MiB (179MB), run=15001-15001msec
>
> # iodepth 4
> [...]
> Run status group 0 (all jobs):
>    READ: bw=18.3MiB/s (19.2MB/s), 18.3MiB/s-18.3MiB/s (19.2MB/s-19.2MB/s), io=275MiB (288MB), run=15002-15002msec

So here I'm confused:

--iodepth=1: 11.4 MiB/s
--iodepth=4: 18.3 MiB/s

At least there is some advantage to --iodepth=4. Would it be possible to provide results for an older kernel that doesn't have the regression you are seeing with passthrough_hp?

> Also, I tried to build the for-next branch against both kernel 6.18 and
> 6.17 (to figure out the culprit commit), but I got compilation errors.
> Which kernel version should I build the for-next branch against?

Dunno, it should not fail to compile. And you are trying to figure out a regression between 6.11 and 6.14, aren't you? So it should be a bisect between these two versions?

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring)
  2025-12-17  9:17                     ` Abhishek Gupta
  2025-12-17 10:43                       ` Bernd Schubert
@ 2025-12-17 11:34                       ` Horst Birthelmer
  0 siblings, 0 replies; 12+ messages in thread
From: Horst Birthelmer @ 2025-12-17 11:34 UTC (permalink / raw)
  To: Abhishek Gupta
  Cc: Joanne Koong, Bernd Schubert, Bernd Schubert, linux-fsdevel@vger.kernel.org, miklos@szeredi.hu, Swetha Vadlakonda, Vikas Jain (GCS)

On Wed, Dec 17, 2025 at 02:47:00PM +0530, Abhishek Gupta wrote:
> Hi Joanne, Bernd,
>
> I'm seeing this regression on passthrough_hp as well. I checked it on
> 6.14.0-1019-gcp and was getting 11.7 MiB/s with iodepth 1 and 15.6
> MiB/s with iodepth 4. To remove ambiguity (due to kernel versions), I
> tried it on stock kernel 6.17 as well.
> [...]

Hi Abhishek,

since I have been debugging some memory problems on 6.12, I'm somewhat familiar with the changes since then. After diffing fs/fuse/ between v6.14 and v6.17, the big topics of that whole journey were the move to folios, the request timeouts, and the iomap changes. None of these should make that kind of difference. I'd rather expect it to be slightly faster due to the removal of the tmp pages.

IIRC the default passthrough_hp does not use io-uring. Are you using fuse over io-uring or the normal device?

Cheers,
Horst

^ permalink raw reply	[flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2025-11-26 15:07 FUSE: [Regression] Fuse legacy path performance scaling lost in v6.14 vs v6.8/6.11 (iodepth scaling with io_uring) Abhishek Gupta
2025-11-26 19:11 ` Bernd Schubert
2025-11-27 13:37 ` Abhishek Gupta
2025-11-27 23:05 ` Bernd Schubert
2025-12-02 10:42 ` Abhishek Gupta
[not found] ` <CAPr64AKYisa=_X5fAB1ozgb3SoarKm19TD3hgwhX9csD92iBzA@mail.gmail.com>
2025-12-08 17:52 ` Bernd Schubert
2025-12-08 22:56 ` Bernd Schubert
2025-12-09 17:16 ` Abhishek Gupta
2025-12-15 4:30 ` Joanne Koong
2025-12-17 9:17 ` Abhishek Gupta
2025-12-17 10:43 ` Bernd Schubert
2025-12-17 11:34 ` Horst Birthelmer