* [PATCH 0/1 RFC] FUSE Export Coroutine Integration Cover Letter
@ 2025-03-15 17:30 saz97
  2025-03-17 20:56 ` Stefan Hajnoczi
  0 siblings, 1 reply; 4+ messages in thread
From: saz97 @ 2025-03-15 17:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: hreitz, kwolf, stefanha, qemu-block, saz97

Signed-off-by: Changzhi Xie <sa.z@qq.com>

FUSE Export Coroutine Integration Cover Letter

This patch series refactors QEMU's FUSE export module to leverage coroutines for read/write operations, 
addressing concurrency limitations and aligning with QEMU's asynchronous I/O model. The changes 
demonstrate measurable performance improvements while simplifying resource management.

1. Technical Implementation
Key modifications address prior review feedback (Stefan Hajnoczi) and optimize execution flow:

1.1 Coroutine Integration
Convert fuse_read()/fuse_write() to launch coroutines (fuse_*_coroutine)
Utilize non-blocking blk_co_pread()/blk_co_pwrite() for block layer access
Eliminate main loop blocking during heavy I/O workloads

1.2 Buffer Management
Removed explicit buffer pre-allocation in read_from_fuse_export()
Replaced fuse_buf_free() with g_free() due to libfuse3 API constraints

1.3 Resource Lifecycle
Moved in_flight decrement and blk_exp_unref() into coroutines
Added FUSE opcode checks (FUSE_READ/FUSE_WRITE) to prevent premature cleanup

1.4 Structural Improvements
Simplified FuseIORequest structure:
  Removed redundant fuse_ino_t and fuse_file_info fields
  Retained minimal parameter passing requirements
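
The lifecycle change in 1.1 and 1.3 can be illustrated outside QEMU. The following is only an analogy in Python asyncio, not the patch itself: Export, co_pread, co_pwrite, handle_request and dispatch are hypothetical names chosen for this sketch, and none of them are QEMU or libfuse APIs. It shows the dispatcher incrementing an in-flight counter and returning immediately, while the decrement happens inside the coroutine once the I/O has completed.

```python
import asyncio


class Export:
    """Toy stand-in for a block export (hypothetical, not QEMU's BlockExport)."""

    def __init__(self):
        self.in_flight = 0
        self.data = bytearray(4096)

    async def co_pread(self, offset, size):
        await asyncio.sleep(0)  # yield, as a non-blocking block-layer read would
        return bytes(self.data[offset:offset + size])

    async def co_pwrite(self, offset, buf):
        await asyncio.sleep(0)  # yield, as a non-blocking block-layer write would
        self.data[offset:offset + len(buf)] = buf
        return len(buf)


async def handle_request(exp, opcode, offset, payload):
    try:
        if opcode == "READ":
            return await exp.co_pread(offset, payload)
        return await exp.co_pwrite(offset, payload)
    finally:
        # Cleanup runs inside the coroutine, only after the I/O has finished.
        exp.in_flight -= 1


def dispatch(exp, opcode, offset, payload):
    # The dispatcher only accounts for the request and schedules the coroutine.
    exp.in_flight += 1
    return asyncio.create_task(handle_request(exp, opcode, offset, payload))


async def demo():
    exp = Export()
    written = await dispatch(exp, "WRITE", 0, b"abcd")
    r1 = dispatch(exp, "READ", 0, 2)
    r2 = dispatch(exp, "READ", 2, 2)
    peak = exp.in_flight  # both reads are outstanding at this point
    data = await asyncio.gather(r1, r2)
    return peak, written, data, exp.in_flight


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the dispatcher decremented the counter itself right after scheduling, the count could reach zero while requests were still running, which is the premature cleanup that 1.3 is meant to prevent.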

2. Performance Validation
Tested with fio using a 4K random read/write pattern; each result is the average of 5 runs:
fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 --rw=randrw --bs=4k --time_based=1

Key Results

Metric          iodepth=1                iodepth=64
Read Latency    ▼ 2.7% (3.8k→3k ns)      ▼ 1.3% (4.7M→4.6M ns)
Write Latency   ▼ 3.6% (112k→108k ns)    ▼ 2.8% (5.2M→5.0M ns)
Read IOPS       4740 → 4729 (±0.2%)      ▲ 2.1% (6391→6529)
Write IOPS      4738 → 4727 (±0.2%)      ▲ 2.2% (6390→6529)
Throughput      ~18.9 MB/s (stable)      ▲ 2.1% (25.6→26.1 MB/s)

Analysis

High Concurrency (iodepth=64):
Sustained throughput gains (+2.1-2.2%) suggest improved scalability
Latency reductions are consistent with reduced contention in concurrent operations

saz97 (1):
  Integration coroutines into fuse export

 block/export/fuse.c | 189 +++++++++++++++++++++++++++++++-------------
 1 file changed, 132 insertions(+), 57 deletions(-)

-- 
2.34.1



* [PATCH 0/1 RFC] FUSE Export Coroutine Integration Cover Letter
@ 2025-03-24  8:05 saz97
  2025-03-24 14:41 ` Stefan Hajnoczi
  0 siblings, 1 reply; 4+ messages in thread
From: saz97 @ 2025-03-24  8:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: hreitz, kwolf, stefanha, qemu-block, saz97

This patch series refactors QEMU's FUSE export module to leverage coroutines for read/write operations,
addressing concurrency limitations and aligning with QEMU's asynchronous I/O model. The changes
demonstrate measurable performance improvements while simplifying resource management.

1. Technical implementation

   Following Stefan's suggestion, I moved the processing logic of read_from_fuse_export() into a coroutine for buffer management,
   and changed fuse_getattr() to call bdrv_co_get_allocated_file_size().

2. Performance summary

   For the coroutine_integration_fuse test, the average results for iodepth=1 and iodepth=64 are as follows:
    Average results for iodepth=1:
    Metric       coroutine_integration_fuse   origin           Improvement
    Read_IOPS    4492.88                      4309.39          4.25%
    Write_IOPS   4500.68                      4318.68          4.21%
    Read_BW      17971.00 KB/s                17237.30 KB/s    4.26%
    Write_BW     18002.50 KB/s                17274.30 KB/s    4.23%

    Average results for iodepth=64:
    Metric       coroutine_integration_fuse   origin           Improvement
    Read_IOPS    5576.93                      5347.13          4.29%
    Write_IOPS   5569.55                      5337.33          4.33%
    Read_BW      22311.40 KB/s                21392.20 KB/s    4.31%
    Write_BW     22282.20 KB/s                21353.20 KB/s    4.34%
   Although all metrics show improvements, the gains are concentrated in the 4.2%–4.3% range, which is lower than expected. Further investigation using gprof reveals the reasons for this limited improvement.
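
   As a sanity check on the figures above, the relative improvement and the bandwidth implied by an IOPS value can be recomputed from the quoted averages. This is a minimal sketch: the helper names are made up for illustration, and it assumes that "KB/s" in the report means KiB/s at a 4 KiB block size. Small differences in the last digit compared with the reported percentages may come from rounding or from how the per-run values were averaged.

```python
def improvement_pct(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def bandwidth_kib_s(iops, block_size_kib=4):
    """Bandwidth implied by an IOPS figure at a fixed block size (assumed 4 KiB)."""
    return iops * block_size_kib


# (patched, baseline) Read_IOPS averages quoted in the summary
read_iops = {
    "iodepth=1": (4492.88, 4309.39),
    "iodepth=64": (5576.93, 5347.13),
}

for name, (patched, baseline) in read_iops.items():
    print(f"{name}: {improvement_pct(patched, baseline):.2f}% read IOPS change, "
          f"{bandwidth_kib_s(patched):.1f} KiB/s implied read bandwidth")
```

   The implied bandwidth for iodepth=1 comes out within 1 KiB/s of the reported Read_BW, so the IOPS and bandwidth columns describe the same measurement.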

3. Performance Bottlenecks Identified via gprof
   After running a fio test with the following command:
   fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 \
    --rw=randrw --bs=4k --time_based=1 --name=job1 \
    --filename=/mnt/qemu-fuse --iodepth=64
   and analyzing the execution profile using gprof, the following issues were identified:

   3.1 Increased Overall Execution Time
   In the original implementation, fuse_write + blk_pwrite accounted for 8.7% of total execution time (6.0% + 2.7%).
   After refactoring, fuse_write_coroutine + blk_co_pwrite now accounts for 43.1% (22.9% + 20.2%).
   This suggests that coroutine overhead is contributing significantly to execution time.

   3.2 Increased Read and Write Calls
   fuse_write calls increased from 173,400 → 333,232.
   fuse_read calls increased from 173,526 → 332,931.
   This indicates that the coroutine-based approach is introducing redundant I/O calls, likely due to unnecessary coroutine switches.

   3.3 Significant Coroutine Overhead
   qemu_coroutine_enter is now called 1,572,803 times, compared to ~476,057 previously.
   This frequent coroutine switching introduces unnecessary overhead, limiting the expected performance improvements.
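
   Because the number of fuse_read/fuse_write calls also differs between the two profiles, it helps to normalize the qemu_coroutine_enter count by the number of requests. The sketch below uses only the call counts quoted in 3.2 and 3.3 (the "before" enter count is approximate in the report); the dictionary layout and the helper name are illustrative assumptions.

```python
# Call counts taken from the gprof figures quoted above.
before = {"fuse_read": 173_526, "fuse_write": 173_400, "qemu_coroutine_enter": 476_057}
after = {"fuse_read": 332_931, "fuse_write": 333_232, "qemu_coroutine_enter": 1_572_803}


def enters_per_request(counts):
    """qemu_coroutine_enter calls per FUSE read/write request."""
    requests = counts["fuse_read"] + counts["fuse_write"]
    return counts["qemu_coroutine_enter"] / requests


for label, counts in (("before", before), ("after", after)):
    print(f"{label}: {enters_per_request(counts):.2f} coroutine enters per request")

print(f"total enter ratio: {after['qemu_coroutine_enter'] / before['qemu_coroutine_enter']:.2f}x")
```

   On these numbers the per-request figure rises from roughly 1.4 to roughly 2.4 coroutine enters, so about one additional enter per request is attributable to the new code path rather than to the higher request count.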

saz97 (1):
  Integration coroutines into fuse export

 block/export/fuse.c | 190 +++++++++++++++++++++++++++++---------------
 1 file changed, 126 insertions(+), 64 deletions(-)

-- 
2.34.1




