qemu-devel.nongnu.org archive mirror
From: Hanna Czenczek <hreitz@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH 11/15] fuse: Manually process requests (without libfuse)
Date: Fri, 4 Apr 2025 14:36:46 +0200
Message-ID: <d4dc6324-110d-42ba-bfe1-366b937de40e@redhat.com>
In-Reply-To: <20250327153503.GK37458@fedora>

On 27.03.25 16:35, Stefan Hajnoczi wrote:
> On Tue, Mar 25, 2025 at 05:06:51PM +0100, Hanna Czenczek wrote:
>> Manually read requests from the /dev/fuse FD and process them, without
>> using libfuse.  This allows us to safely add parallel request processing
>> in coroutines later, without having to worry about libfuse internals.
>> (Technically, we already have exactly that problem with
>> read_from_fuse_export()/read_from_fuse_fd() nesting.)
>>
>> We will continue to use libfuse for mounting the filesystem; fusermount3
>> is effectively a helper program of libfuse, so it should know best how
>> to interact with it.  (Doing it manually without libfuse, while doable,
>> is a bit of a pain, and it is not clear to me how stable the "protocol"
>> actually is.)
>>
>> Take this opportunity of quite a major rewrite to update the Copyright
>> line with corrected information that has surfaced in the meantime.
>>
>> Here are some benchmarks from before this patch (4k, iodepth=16, libaio;
>> except 'sync', which are iodepth=1 and pvsync2):
>>
>> file:
>>    read:
>>      seq aio:   78.6k ±1.3k IOPS
>>      rand aio:  39.3k ±2.9k
>>      seq sync:  32.5k ±0.7k
>>      rand sync:  9.9k ±0.1k
>>    write:
>>      seq aio:   61.9k ±0.5k
>>      rand aio:  61.2k ±0.6k
>>      seq sync:  27.9k ±0.2k
>>      rand sync: 27.6k ±0.4k
>> null:
>>    read:
>>      seq aio:   214.0k ±5.9k
>>      rand aio:  212.7k ±4.5k
>>      seq sync:   90.3k ±6.5k
>>      rand sync:  89.7k ±5.1k
>>    write:
>>      seq aio:   203.9k ±1.5k
>>      rand aio:  201.4k ±3.6k
>>      seq sync:   86.1k ±6.2k
>>      rand sync:  84.9k ±5.3k
>>
>> And with this patch applied:
>>
>> file:
>>    read:
>>      seq aio:   76.6k ±1.8k (- 3 %)
>>      rand aio:  26.7k ±0.4k (-32 %)
>>      seq sync:  47.7k ±1.2k (+47 %)
>>      rand sync: 10.1k ±0.2k (+ 2 %)
>>    write:
>>      seq aio:   58.1k ±0.5k (- 6 %)
>>      rand aio:  58.1k ±0.5k (- 5 %)
>>      seq sync:  36.3k ±0.3k (+30 %)
>>      rand sync: 36.1k ±0.4k (+31 %)
>> null:
>>    read:
>>      seq aio:   268.4k ±3.4k (+25 %)
>>      rand aio:  265.3k ±2.1k (+25 %)
>>      seq sync:  134.3k ±2.7k (+49 %)
>>      rand sync: 132.4k ±1.4k (+48 %)
>>    write:
>>      seq aio:   275.3k ±1.7k (+35 %)
>>      rand aio:  272.3k ±1.9k (+35 %)
>>      seq sync:  130.7k ±1.6k (+52 %)
>>      rand sync: 127.4k ±2.4k (+50 %)
>>
>> So clearly the AIO file results are actually not good, and random reads
>> are indeed quite terrible.  On the other hand, we can see from the sync
>> and null results that request handling should in theory be quicker.  How
>> does this fit together?
>>
>> I believe the bad AIO results are an artifact of the accidental parallel
>> request processing we have due to nested polling: Depending on how the
>> actual request processing is structured and how long request processing
>> takes, more or fewer requests will be submitted in parallel.  So because
>> of the restructuring, I think this patch accidentally changes how many
>> requests end up being submitted in parallel, which decreases
>> performance.
>>
>> (I have seen something like this before: In RSD, without having
>> implemented a polling mode, the debug build tended to have better
>> performance than the more optimized release build, because the debug
>> build, taking longer to submit requests, ended up processing more
>> requests in parallel.)
>>
>> In any case, once we use coroutines throughout the code, performance
>> will improve again across the board.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   block/export/fuse.c | 793 +++++++++++++++++++++++++++++++-------------
>>   1 file changed, 567 insertions(+), 226 deletions(-)
>>
>> diff --git a/block/export/fuse.c b/block/export/fuse.c
>> index 3dd50badb3..407b101018 100644
>> --- a/block/export/fuse.c
>> +++ b/block/export/fuse.c

[...]

>> +/**
>> + * Check the FUSE FD for whether it is readable or not.  Because we cannot
>> + * reasonably do this without reading a request at the same time, also read and
>> + * process that request if any.
>> + * (To be used as a poll handler for the FUSE FD.)
>> + */
>> +static bool poll_fuse_fd(void *opaque)
>> +{
>> +    return read_from_fuse_fd(opaque);
>> +}
> The other io_poll() callbacks in QEMU peek at memory whereas this one
> invokes the read(2) syscall. Two reasons why this is a problem:
> 1. Syscall latency is too high. Other fd handlers will be delayed by
>     microseconds.
> 2. This doesn't scale. If every component in QEMU does this then the
>     event loop degrades to O(n) of non-blocking read(2) syscalls where n
>     is the number of fds.
>
> Also, handling the request inside the io_poll() callback skews
> AioContext's time accounting because time spent handling the request
> will be accounted as "polling time". The adaptive polling calculation
> will think it polled for longer than it did.
>
> If there is no way to peek at memory, please don't implement the
> io_poll() callback.

Got it, thanks!

Hanna



