From: Kevin Wolf <kwolf@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-block@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
richard.henderson@linaro.org, qemu-devel@nongnu.org,
Thomas Lamprecht <t.lamprecht@proxmox.com>,
Hanna Reitz <hreitz@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>,
"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PULL 29/32] virtio-blk: implement BlockDevOps->drained_begin()
Date: Fri, 8 Dec 2023 09:32:58 +0100 [thread overview]
Message-ID: <ZXLUuoOawqQpyodD@redhat.com> (raw)
In-Reply-To: <c3e115ff-c143-4d1b-901c-6b386d012eac@proxmox.com>
On 07.12.2023 16:22, Fiona Ebner wrote:
> On 03.11.23 14:12, Fiona Ebner wrote:
> > Hi,
> >
> > I ran into a strange issue where guest IO would get completely stuck
> > during certain block jobs a while ago and finally managed to find a
> > small reproducer [0]. I'm using a VM with virtio-blk-pci (or
> > virtio-scsi-pci) with an iothread and running
> >
> > fio --name=file --size=100M --direct=1 --rw=randwrite --bs=4k
> > --ioengine=psync --numjobs=5 --runtime=1200 --time_based
> >
> > in the guest. Then I'm issuing the QMP command with the reproducer in a
> > loop. Usually, the guest IO gets stuck after about 1-3 minutes;
> > sometimes fio manages to continue at a lower speed for a while (but
> > trying to Ctrl+C it or doing other IO in the guest is already
> > broken), which I guess could be a hint that it's an issue with notifiers?
> >
> > Bisecting (to declare a commit good, I waited 10 minutes) led me to this
> > patch, i.e. commit 1665d9326f ("virtio-blk: implement
> > BlockDevOps->drained_begin()") and for SCSI, I verified that the issue
> > similarly starts happening after 766aa2de0f ("virtio-scsi: implement
> > BlockDevOps->drained_begin()").
> >
> > Both issues are still present on current master (i.e. 1c98a821a2
> > ("tests/qtest: Introduce tests for AMD/Xilinx Versal TRNG device"))
> >
> > Happy to provide more information and hints about how to debug the issue
> > further.
> >
>
> I think I was finally able to get to the bottom of this and have a
> plausible-sounding pet theory now. It involves the VirtIO notifier
> optimization during poll mode.
>
> Let's step through some debug prints I added. The first number is always
> the thread ID (I'm sorry that I used warn_report rather than proper tracing):
>
> > 247050 nodefd 29 poll_set_started 1
>
> The iothread starts poll mode for the node with fd 29 which is the
> virtio host notifier.
>
> > 247050 0x55e515185270 poll begin for vq
> > 247050 0x55e515185270 setting notification for vq 0
>
> virtio_queue_set_notification is called to disable notification.
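> 
> (For context, my understanding is that the io_poll_begin/io_poll_end
> callbacks registered for the host notifier essentially just toggle the
> virtqueue notification flag, roughly like the sketch below. The function
> names here are made up; only the virtio_queue_set_notification() calls
> are the point:
> 
>     /* rough sketch of the poll-mode callbacks, not verbatim QEMU code */
>     static void my_poll_begin(VirtQueue *vq)
>     {
>         /* busy polling: the iothread checks the ring itself, so tell
>          * the guest not to bother kicking the eventfd */
>         virtio_queue_set_notification(vq, 0);
>     }
> 
>     static void my_poll_end(VirtQueue *vq)
>     {
>         /* leaving poll mode: re-enable notifications before blocking,
>          * otherwise a guest kick would go unnoticed */
>         virtio_queue_set_notification(vq, 1);
>     }
> 
> If the poll_end side never runs, notifications stay disabled and the
> guest has no reason to kick the eventfd again.)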
>
> > 247050 nodefd 29 poll_set_started 1 done
> > 247050 0x55e515185270 handle vq suppress_notifications 0 num_reqs 1
> > 247050 0x55e515185270 handle vq suppress_notifications 0 num_reqs 4
>
> virtio-blk handles some requests; note that suppress_notifications is 0
> because we are in poll mode.
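> 
> (The reason it is 0: as far as I can tell, the request handler snapshots
> the notification state on entry and only toggles notifications itself if
> they were still enabled at that point, roughly along these lines; this is
> only the shape of the loop, not verbatim code:
> 
>     bool suppress_notifications = virtio_queue_get_notification(vq);
> 
>     do {
>         if (suppress_notifications) {
>             virtio_queue_set_notification(vq, 0);
>         }
>         /* ... pop and handle the queued requests ... */
>         if (suppress_notifications) {
>             virtio_queue_set_notification(vq, 1);
>         }
>     } while (!virtio_queue_empty(vq));
> 
> So with poll mode having disabled notifications already, the handler
> will not re-enable them either; that is left to the poll_end callback.)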
>
> > 247048 nodefd 29 addr 0x55e51496ed70 marking as deleted
>
> Main thread marks the node for deletion when beginning drain, i.e.
> detaches the host notifier.
>
> > 247048 nodefd 29 addr 0x55e513cdcd20 is_new 1 adding node
>
> Main thread adds a new node when ending drain, i.e. attaches the host
> notifier.
>
> > 247050 nodefd 29 addr 0x55e51496ed70 remove deleted handler
>
> The iothread removes the handler marked for removal, in particular from
> the node_poll list: QLIST_SAFE_REMOVE(node, node_poll);
>
> > 247050 disabling poll mode before fdmon_ops->wait
>
> This is just before the call to
> poll_set_started(ctx, &ready_list, false)
>
> Whoops!! Nobody ends poll mode for the node with fd 29, because the old
> node was already deleted from the node_poll list and the new node is not
> part of it, i.e. nobody has started poll mode for the new node.
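> 
> To convince myself that this is really the shape of the problem, I wrote
> a small stand-alone toy model. This is not QEMU code, just my mental
> model of the list handling: "on_poll_list" stands for membership in the
> node_poll list, "notify" for the virtqueue notification flag that the
> poll callbacks toggle:
> 
>     #include <stdbool.h>
>     #include <stdio.h>
> 
>     static bool notify = true;              /* virtqueue notification flag */
> 
>     struct handler { bool on_poll_list; };  /* membership in node_poll */
> 
>     static void poll_begin(void) { notify = false; }  /* io_poll_begin */
>     static void poll_end(void)   { notify = true;  }  /* io_poll_end   */
> 
>     int main(void)
>     {
>         struct handler old_node = { .on_poll_list = true };
>         struct handler new_node = { .on_poll_list = false };
> 
>         /* iothread enters poll mode: only handlers on the poll list
>          * are visited */
>         if (old_node.on_poll_list) {
>             poll_begin();
>         }
> 
>         /* drained_begin deletes old_node, drained_end adds new_node;
>          * new_node never had poll mode started for it, so it is not on
>          * the poll list, and notifications are still disabled */
>         old_node.on_poll_list = false;
> 
>         /* iothread leaves poll mode before blocking: again only
>          * handlers on the poll list are visited, so poll_end() runs
>          * for nobody */
>         if (old_node.on_poll_list) {
>             poll_end();
>         }
>         if (new_node.on_poll_list) {
>             poll_end();
>         }
> 
>         /* prints "no": the guest never kicks the eventfd again */
>         printf("notifications enabled: %s\n", notify ? "yes" : "no");
>         return 0;
>     }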
>
> > 247050 0x55e515185270 handle vq suppress_notifications 0 num_reqs 0
>
> fdmon_ops->wait() returns one last time (not sure why), but with no
> actual requests.
>
> > 247050 disabling poll mode before fdmon_ops->wait
>
> After this, fdmon_ops->wait() (fdmon_poll_wait in my case) will just
> wait forever (or until QMP 'stop' and 'cont' are issued, which restarts
> the dataplane).
>
>
> A minimal workaround seems to be calling either
> event_notifier_set(virtio_queue_get_host_notifier(vq));
> or
> virtio_queue_set_notification(vq, true);
> in drained_end (for both VirtIO SCSI and block).
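> 
> For virtio-blk, that would look roughly like the sketch below in the
> drained_end handler. The surrounding function body is written from
> memory and may not match the current code exactly; only the added
> notification call is the point:
> 
>     static void virtio_blk_drained_end(void *opaque)
>     {
>         VirtIOBlock *s = opaque;
>         VirtIODevice *vdev = VIRTIO_DEVICE(s);
>         AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
> 
>         for (uint16_t i = 0; i < s->conf.num_queues; i++) {
>             VirtQueue *vq = virtio_get_queue(vdev, i);
> 
>             virtio_queue_aio_attach_host_notifier(vq, ctx);
> 
>             /* workaround: do not leave the queue with notifications
>              * disabled by a poll mode that drain interrupted */
>             virtio_queue_set_notification(vq, true);
>             /* or alternatively:
>              * event_notifier_set(virtio_queue_get_host_notifier(vq));
>              */
>         }
>     }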
>
> But is this an actual issue with the AIO interface/implementation? Or
> should it rather be considered a bug in the VirtIO SCSI/block drain
> implementation, because of the notification optimization?

I'm not involved in it myself, but the general theme reminds me of this
downstream bug that Hanna analysed recently:
https://issues.redhat.com/browse/RHEL-3934
Does it look like the same root cause to you?

Kevin