From: Hanna Czenczek <hreitz@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH 11/15] fuse: Manually process requests (without libfuse)
Date: Fri, 4 Apr 2025 14:36:46 +0200
Message-ID: <d4dc6324-110d-42ba-bfe1-366b937de40e@redhat.com>
In-Reply-To: <20250327153503.GK37458@fedora>
On 27.03.25 16:35, Stefan Hajnoczi wrote:
> On Tue, Mar 25, 2025 at 05:06:51PM +0100, Hanna Czenczek wrote:
>> Manually read requests from the /dev/fuse FD and process them, without
>> using libfuse. This allows us to safely add parallel request processing
>> in coroutines later, without having to worry about libfuse internals.
>> (Technically, we already have exactly that problem with
>> read_from_fuse_export()/read_from_fuse_fd() nesting.)
>>
>> We will continue to use libfuse for mounting the filesystem; fusermount3
>> is effectively a helper program of libfuse, so it should know best how
>> to interact with it. (Doing it manually without libfuse, while doable,
>> is a bit of a pain, and it is not clear to me how stable the "protocol"
>> actually is.)
>>
>> Take the opportunity of this fairly major rewrite to update the Copyright
>> line with corrected information that has surfaced in the meantime.
>>
>> Here are some benchmarks from before this patch (4k, iodepth=16, libaio;
>> except 'sync', which uses iodepth=1 and pvsync2):
>>
>> file:
>> read:
>> seq aio: 78.6k ±1.3k IOPS
>> rand aio: 39.3k ±2.9k
>> seq sync: 32.5k ±0.7k
>> rand sync: 9.9k ±0.1k
>> write:
>> seq aio: 61.9k ±0.5k
>> rand aio: 61.2k ±0.6k
>> seq sync: 27.9k ±0.2k
>> rand sync: 27.6k ±0.4k
>> null:
>> read:
>> seq aio: 214.0k ±5.9k
>> rand aio: 212.7k ±4.5k
>> seq sync: 90.3k ±6.5k
>> rand sync: 89.7k ±5.1k
>> write:
>> seq aio: 203.9k ±1.5k
>> rand aio: 201.4k ±3.6k
>> seq sync: 86.1k ±6.2k
>> rand sync: 84.9k ±5.3k
>>
>> And with this patch applied:
>>
>> file:
>> read:
>> seq aio: 76.6k ±1.8k (- 3 %)
>> rand aio: 26.7k ±0.4k (-32 %)
>> seq sync: 47.7k ±1.2k (+47 %)
>> rand sync: 10.1k ±0.2k (+ 2 %)
>> write:
>> seq aio: 58.1k ±0.5k (- 6 %)
>> rand aio: 58.1k ±0.5k (- 5 %)
>> seq sync: 36.3k ±0.3k (+30 %)
>> rand sync: 36.1k ±0.4k (+31 %)
>> null:
>> read:
>> seq aio: 268.4k ±3.4k (+25 %)
>> rand aio: 265.3k ±2.1k (+25 %)
>> seq sync: 134.3k ±2.7k (+49 %)
>> rand sync: 132.4k ±1.4k (+48 %)
>> write:
>> seq aio: 275.3k ±1.7k (+35 %)
>> rand aio: 272.3k ±1.9k (+35 %)
>> seq sync: 130.7k ±1.6k (+52 %)
>> rand sync: 127.4k ±2.4k (+50 %)
>>
>> So clearly the AIO file results are actually not good, and random reads
>> are indeed quite terrible. On the other hand, we can see from the sync
>> and null results that request handling should in theory be quicker. How
>> does this fit together?
>>
>> I believe the bad AIO results are an artifact of the accidental parallel
>> request processing we have due to nested polling: Depending on how the
>> actual request processing is structured and how long request processing
>> takes, more or fewer requests will be submitted in parallel. So because
>> of the restructuring, I think this patch accidentally changes how many
>> requests end up being submitted in parallel, which decreases
>> performance.
>>
>> (I have seen something like this before: In RSD, without having
>> implemented a polling mode, the debug build tended to have better
>> performance than the more optimized release build, because the debug
>> build, taking longer to submit requests, ended up processing more
>> requests in parallel.)
>>
>> In any case, once we use coroutines throughout the code, performance
>> will improve again across the board.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>> block/export/fuse.c | 793 +++++++++++++++++++++++++++++++-------------
>> 1 file changed, 567 insertions(+), 226 deletions(-)
>>
>> diff --git a/block/export/fuse.c b/block/export/fuse.c
>> index 3dd50badb3..407b101018 100644
>> --- a/block/export/fuse.c
>> +++ b/block/export/fuse.c
[...]
>> +/**
>> + * Check the FUSE FD for whether it is readable or not. Because we cannot
>> + * reasonably do this without reading a request at the same time, also read and
>> + * process that request if any.
>> + * (To be used as a poll handler for the FUSE FD.)
>> + */
>> +static bool poll_fuse_fd(void *opaque)
>> +{
>> + return read_from_fuse_fd(opaque);
>> +}
> The other io_poll() callbacks in QEMU peek at memory whereas this one
> invokes the read(2) syscall. Two reasons why this is a problem:
> 1. Syscall latency is too high. Other fd handlers will be delayed by
> microseconds.
> 2. This doesn't scale. If every component in QEMU does this then the
> event loop degrades to O(n) of non-blocking read(2) syscalls where n
> is the number of fds.
>
> Also, handling the request inside the io_poll() callback skews
> AioContext's time accounting because time spent handling the request
> will be accounted as "polling time". The adaptive polling calculation
> will think it polled for longer than it did.
>
> If there is no way to peek at memory, please don't implement the
> io_poll() callback.
Got it, thanks!
Hanna