From: Hanna Czenczek <hreitz@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, Fam Zheng <fam@euphon.net>
Subject: Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler
Date: Mon, 22 Jan 2024 18:41:35 +0100
Message-ID: <3bb5aa0e-ae0a-4fda-a5b5-1bfac86651ac@redhat.com>
In-Reply-To: <67a36617-9e61-4778-aebf-1e667cb51120@proxmox.com>
On 05.01.24 15:30, Fiona Ebner wrote:
> Am 05.01.24 um 14:43 schrieb Fiona Ebner:
>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
>>> On 1/3/24 12:40, Fiona Ebner wrote:
>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>>>> with the patch, but I did run into an assertion failure when trying to
>>>> verify that it fixes my original stuck-guest-IO issue. See below for the
>>>> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
>>>>
>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because
>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>>>> acquire the device’s context, so this should be directly callable from
>>>>> any context.
>>>> I guess this is not true anymore now that the AioContext locking was
>>>> removed?
>>> Good point and, in fact, even before it was much safer to use
>>> virtio_queue_notify() instead. Not only does it use the event notifier
>>> handler, but it also calls it in the right thread/AioContext just by
>>> doing event_notifier_set().
>>>
>> But with virtio_queue_notify() using the event notifier, the
>> CPU-usage-spike issue is present:
>>
>>>> Back to the CPU-usage-spike issue: I experimented around and it doesn't
>>>> seem to matter whether I notify the virt queue before or after attaching
>>>> the notifiers. But there's another functional difference. My patch
>>>> called virtio_queue_notify() which contains this block:
>>>>
>>>>> if (vq->host_notifier_enabled) {
>>>>> event_notifier_set(&vq->host_notifier);
>>>>> } else if (vq->handle_output) {
>>>>> vq->handle_output(vdev, vq);
>>>> In my testing, the first branch was taken, calling event_notifier_set().
>>>> Hanna's patch uses virtio_queue_notify_vq() and there,
>>>> vq->handle_output() will be called. That seems to be the relevant
>>>> difference regarding the CPU-usage-spike issue.
>> I should mention that this is with a VirtIO SCSI disk. I also attempted
>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
>> haven't managed to yet.
>>
>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
>> the queues (but only one) will always show as nonempty. And then,
>> run_poll_handlers_once() will always detect progress, which explains
>> the CPU usage.
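For reference, that poll handler is roughly the following (paraphrased
from hw/virtio/virtio.c from memory, so the details may be slightly
off):

static bool virtio_queue_host_notifier_aio_poll(void *opaque)
{
    EventNotifier *n = opaque;
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

    /* Report "progress" whenever the ring looks non-empty, i.e. whenever
     * the avail index is ahead of last_avail_idx, regardless of whether
     * the handler will actually do anything with the buffers. */
    return vq->vring.desc && !virtio_queue_empty(vq);
}

That matches the shadow_avail_idx vs. last_avail_idx observation further
down.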
>>
>> The following shows
>> 1. vq address
>> 2. number of times vq was passed to virtio_queue_host_notifier_aio_poll()
>> 3. number of times the result of virtio_queue_host_notifier_aio_poll()
>> was true for the vq
>>
>>> 0x555fd93f9c80 17162000 0
>>> 0x555fd93f9e48 17162000 6
>>> 0x555fd93f9ee0 17162000 0
>>> 0x555fd93f9d18 17162000 17162000
>>> 0x555fd93f9db0 17162000 0
>>> 0x555fd93f9f78 17162000 0
>> And for the problematic one, the reason it is seen as nonempty is:
>>
>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
> s->events_dropped is false in my testing, so
> virtio_scsi_handle_event_vq() doesn't do anything.
>
>> Those values stay like this while the call counts above increase.
>>
>> So is something going wrong with the indices when the event notifier
>> is set from the QEMU side (in the main thread)?
>>
>> The guest is Debian 12 with a 6.1 kernel.
So, trying to figure out a new RFC version:
About the stack trace you, Fiona, posted: As far as I understand, it
happens because having virtio_blk_drained_end() call
virtio_queue_notify_vq() wasn’t safe after all, and we need to use
virtio_queue_notify() instead. Right?
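I.e. something along these lines (rough sketch, untested, written from
memory rather than against the actual tree):

static void virtio_blk_drained_end(void *opaque)
{
    VirtIOBlock *s = opaque;
    VirtIODevice *vdev = VIRTIO_DEVICE(s);
    AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);

    for (uint16_t i = 0; i < s->conf.num_queues; i++) {
        VirtQueue *vq = virtio_get_queue(vdev, i);
        virtio_queue_aio_attach_host_notifier(vq, ctx);
        /* Kick via the event notifier instead of calling
         * virtio_queue_notify_vq(): event_notifier_set() makes the
         * handler run in the queue's own AioContext rather than
         * directly in the main thread. */
        virtio_queue_notify(vdev, i);
    }
}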
However, you say using virtio_queue_notify() instead causes busy loops
of doing nothing in virtio-scsi (what you describe above). I mean,
better than a crash, but, you know. :)
I don’t have any prior knowledge about the event handling,
unfortunately. The fact that 8 buffers are available but we don’t use
any sounds OK to me; as far as I understand, we only use those buffers
if we have events to push into them, so as long as we don’t, we won’t.
The question is: shouldn’t its poll handler return false as long as
there are no events (i.e. events_dropped is false)? Would that solve
it?
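If so, maybe something like giving the event queue its own poll
callback that also checks events_dropped could work. Rough sketch,
untested; the function name is made up, and I’m not sure off-hand what
the cleanest way is to get from the VirtQueue back to the VirtIOSCSI
state:

static bool virtio_scsi_event_vq_aio_poll(void *opaque)
{
    EventNotifier *n = opaque;
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
    /* Assumes some way to get back to the device, e.g. a helper like
     * virtio_vq_vdev(); whatever we use, this part would have to live
     * where VirtQueue is actually defined (hw/virtio/virtio.c). */
    VirtIOSCSI *s = VIRTIO_SCSI(virtio_vq_vdev(vq));

    /* Only report poll progress when there actually is a dropped event
     * to deliver; an available but unused buffer alone no longer counts
     * as progress. */
    return s->events_dropped && !virtio_queue_empty(vq);
}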
Hanna