From: Kevin Wolf <kwolf@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-block@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
richard.henderson@linaro.org, qemu-devel@nongnu.org,
Thomas Lamprecht <t.lamprecht@proxmox.com>,
Hanna Reitz <hreitz@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>,
"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PULL 29/32] virtio-blk: implement BlockDevOps->drained_begin()
Date: Fri, 8 Dec 2023 09:32:58 +0100
Message-ID: <ZXLUuoOawqQpyodD@redhat.com>
In-Reply-To: <c3e115ff-c143-4d1b-901c-6b386d012eac@proxmox.com>

On 07.12.2023 at 16:22, Fiona Ebner wrote:
> On 03.11.23 at 14:12, Fiona Ebner wrote:
> > Hi,
> >
> > I ran into a strange issue where guest IO would get completely stuck
> > during certain block jobs a while ago and finally managed to find a
> > small reproducer [0]. I'm using a VM with virtio-blk-pci (or
> > virtio-scsi-pci) with an iothread and running
> >
> > fio --name=file --size=100M --direct=1 --rw=randwrite --bs=4k
> > --ioengine=psync --numjobs=5 --runtime=1200 --time_based
> >
> > in the guest. Then I'm issuing the QMP command from the reproducer in a
> > loop. Usually, the guest IO will get stuck after about 1-3 minutes.
> > Sometimes fio can manage to continue at a lower speed for a while (but
> > trying to Ctrl+C it or doing other IO in the guest will already be
> > broken), which I guess could be a hint that it's an issue with notifiers?
> >
> > Bisecting (to declare a commit good, I waited 10 minutes) led me to this
> > patch, i.e. commit 1665d9326f ("virtio-blk: implement
> > BlockDevOps->drained_begin()") and for SCSI, I verified that the issue
> > similarly starts happening after 766aa2de0f ("virtio-scsi: implement
> > BlockDevOps->drained_begin()").
> >
> > Both issues are still present on current master (i.e. 1c98a821a2
> > ("tests/qtest: Introduce tests for AMD/Xilinx Versal TRNG device"))
> >
> > Happy to provide more information and hints about how to debug the issue
> > further.
> >
>
> I think I was finally able to get to the bottom of this and have a
> plausible-sounding pet theory now. It involves the VirtIO notifier
> optimization during poll mode.
>
> Let's step through some debug prints I added. First number is always the
> thread ID (I'm sorry that I used warn_report rather than proper tracing):
>
> > 247050 nodefd 29 poll_set_started 1
>
> The iothread starts poll mode for the node with fd 29 which is the
> virtio host notifier.
>
> > 247050 0x55e515185270 poll begin for vq
> > 247050 0x55e515185270 setting notification for vq 0
>
> virtio_queue_set_notification is called to disable notification.
>
> > 247050 nodefd 29 poll_set_started 1 done
> > 247050 0x55e515185270 handle vq suppress_notifications 0 num_reqs 1
> > 247050 0x55e515185270 handle vq suppress_notifications 0 num_reqs 4
>
> virtio-blk handles some requests. Note that suppress_notifications is 0
> because we are in poll mode.
>
> > 247048 nodefd 29 addr 0x55e51496ed70 marking as deleted
>
> Main thread marks the node for deletion when beginning drain, i.e.
> detaches the host notifier.
>
> > 247048 nodefd 29 addr 0x55e513cdcd20 is_new 1 adding node
>
> Main thread adds a new node when ending drain, i.e. attaches the host
> notifier.
>
> > 247050 nodefd 29 addr 0x55e51496ed70 remove deleted handler
>
> The iothread removes the handler marked for removal. In particular, it
> removes it from the node_poll list: QLIST_SAFE_REMOVE(node, node_poll);
>
> > 247050 disabling poll mode before fdmon_ops->wait
>
> This is just before the call to
> poll_set_started(ctx, &ready_list, false)
>
> Whoops!! Nobody ends poll mode for the node with fd 29, because the old
> node was already deleted from the node_poll list and the new node is not
> part of it, i.e. nobody has started poll mode for the new node.
>
> > 247050 0x55e515185270 handle vq suppress_notifications 0 num_reqs 0
>
> fdmon_ops->wait() returns one last time (not sure why), but with no actual
> requests.
>
> > 247050 disabling poll mode before fdmon_ops->wait
>
> After this, fdmon_ops->wait() (it's fdmon_poll_wait in my case) will
> just wait forever (or until QMP 'stop' and 'cont' are triggered, which
> restarts the dataplane).
>
>
> A minimal workaround seems to be either calling
> event_notifier_set(virtio_queue_get_host_notifier(vq));
> or
> virtio_queue_set_notification(vq, true);
> in drained_end (for both VirtIO SCSI/block).
>
> But is this an actual issue with the AIO interface/implementation? Or
> should it rather be considered a bug in the VirtIO SCSI/block drain
> implementation, because of the notification optimization?

I'm not involved in it myself, but the general theme reminds me of this
downstream bug that Hanna analysed recently:

https://issues.redhat.com/browse/RHEL-3934

Does it look like the same root cause to you?

Kevin
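
For readers following the quoted trace, the sequence Fiona describes can be
replayed in a small stand-alone model. This is only a sketch under stated
assumptions, not QEMU code: every Toy* type and toy_* function is
hypothetical, the node_poll list is reduced to a per-node flag, and
io_poll_begin/io_poll_end are reduced to toggling one boolean on the
virtqueue. The workaround branch stands in for calling
virtio_queue_set_notification(vq, true) in drained_end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: tracks only whether
 * guest-to-host notifications are currently enabled. */
typedef struct {
    bool notification_enabled;
} ToyVirtQueue;

/* Hypothetical stand-in for an fd handler node. */
typedef struct {
    ToyVirtQueue *vq;
    bool in_poll_list;   /* models membership in the node_poll list */
    bool deleted;        /* models "marked as deleted" */
} ToyNode;

#define TOY_MAX_NODES 4

typedef struct {
    ToyNode nodes[TOY_MAX_NODES];
    size_t n_nodes;
} ToyCtx;

static ToyNode *toy_add_node(ToyCtx *ctx, ToyVirtQueue *vq)
{
    ToyNode *node = &ctx->nodes[ctx->n_nodes++];
    node->vq = vq;
    node->in_poll_list = false;  /* a new node has not started polling */
    node->deleted = false;
    return node;
}

/* Models poll begin/end: only live nodes on the poll list are visited.
 * Starting poll mode disables notifications, ending it re-enables them. */
static void toy_poll_set_started(ToyCtx *ctx, bool started)
{
    for (size_t i = 0; i < ctx->n_nodes; i++) {
        ToyNode *node = &ctx->nodes[i];
        if (!node->in_poll_list || node->deleted) {
            continue;
        }
        node->vq->notification_enabled = !started;
    }
}

/* Models the iothread dropping deleted nodes from the poll list. */
static void toy_remove_deleted(ToyCtx *ctx)
{
    for (size_t i = 0; i < ctx->n_nodes; i++) {
        if (ctx->nodes[i].deleted) {
            ctx->nodes[i].in_poll_list = false;
        }
    }
}

/* Replays the trace; returns whether notifications end up enabled. */
bool toy_run(bool apply_workaround)
{
    ToyVirtQueue vq = { .notification_enabled = true };
    ToyCtx ctx = { .n_nodes = 0 };

    ToyNode *old_node = toy_add_node(&ctx, &vq);
    old_node->in_poll_list = true;

    toy_poll_set_started(&ctx, true);    /* iothread: poll begin */
    assert(!vq.notification_enabled);

    old_node->deleted = true;            /* main thread: drain begin */
    toy_add_node(&ctx, &vq);             /* main thread: drain end */
    if (apply_workaround) {
        vq.notification_enabled = true;  /* re-enable in drained_end */
    }

    toy_remove_deleted(&ctx);            /* iothread: remove old node */
    toy_poll_set_started(&ctx, false);   /* iothread: poll end, no nodes */

    return vq.notification_enabled;
}
```

In this model, toy_run(false) leaves notifications disabled, which matches
the stuck guest in the report, while toy_run(true) leaves them enabled. It
says nothing about whether the real fix belongs in the AIO code or in the
VirtIO drain implementation.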
Thread overview: 40+ messages
2023-05-30 16:32 [PULL 00/32] Block layer patches Kevin Wolf
2023-05-30 16:32 ` [PULL 01/32] block-coroutine-wrapper: Take AioContext lock in no_co_wrappers Kevin Wolf
2023-05-30 16:32 ` [PULL 02/32] block: Clarify locking rules for bdrv_open(_inherit)() Kevin Wolf
2023-05-30 16:32 ` [PULL 03/32] block: Take main AioContext lock when calling bdrv_open() Kevin Wolf
2023-05-30 16:32 ` [PULL 04/32] block-backend: Fix blk_new_open() for iothreads Kevin Wolf
2023-05-30 16:32 ` [PULL 05/32] mirror: Hold main AioContext lock for calling bdrv_open_backing_file() Kevin Wolf
2023-05-30 16:32 ` [PULL 06/32] qcow2: Fix open with 'file' in iothread Kevin Wolf
2023-05-30 16:32 ` [PULL 07/32] raw-format: " Kevin Wolf
2023-05-30 16:32 ` [PULL 08/32] copy-before-write: Fix open with child " Kevin Wolf
2023-05-30 16:32 ` [PULL 09/32] block: Take AioContext lock in bdrv_open_driver() Kevin Wolf
2023-05-30 16:32 ` [PULL 10/32] block: Fix AioContext locking in bdrv_insert_node() Kevin Wolf
2023-05-30 16:32 ` [PULL 11/32] iotests: Make verify_virtio_scsi_pci_or_ccw() public Kevin Wolf
2023-05-30 16:32 ` [PULL 12/32] iotests: Test blockdev-create in iothread Kevin Wolf
2023-05-30 16:32 ` [PULL 13/32] block-backend: split blk_do_set_aio_context() Kevin Wolf
2023-05-30 16:32 ` [PULL 14/32] hw/qdev: introduce qdev_is_realized() helper Kevin Wolf
2023-05-30 16:32 ` [PULL 15/32] virtio-scsi: avoid race between unplug and transport event Kevin Wolf
2023-05-30 16:32 ` [PULL 16/32] virtio-scsi: stop using aio_disable_external() during unplug Kevin Wolf
2023-05-30 16:32 ` [PULL 17/32] util/vhost-user-server: rename refcount to in_flight counter Kevin Wolf
2023-05-30 16:32 ` [PULL 18/32] block/export: wait for vhost-user-blk requests when draining Kevin Wolf
2023-05-30 16:32 ` [PULL 19/32] block/export: stop using is_external in vhost-user-blk server Kevin Wolf
2023-05-30 16:32 ` [PULL 20/32] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Kevin Wolf
2023-05-30 16:32 ` [PULL 21/32] block: add blk_in_drain() API Kevin Wolf
2023-05-30 16:32 ` [PULL 22/32] block: drain from main loop thread in bdrv_co_yield_to_drain() Kevin Wolf
2023-05-30 16:32 ` [PULL 23/32] xen-block: implement BlockDevOps->drained_begin() Kevin Wolf
2023-05-30 16:32 ` [PULL 24/32] hw/xen: do not set is_external=true on evtchn fds Kevin Wolf
2023-05-30 16:32 ` [PULL 25/32] block/export: rewrite vduse-blk drain code Kevin Wolf
2023-05-30 16:32 ` [PULL 26/32] block/export: don't require AioContext lock around blk_exp_ref/unref() Kevin Wolf
2023-05-30 16:32 ` [PULL 27/32] block/fuse: do not set is_external=true on FUSE fd Kevin Wolf
2023-05-30 16:32 ` [PULL 28/32] virtio: make it possible to detach host notifier from any thread Kevin Wolf
2023-05-30 16:32 ` [PULL 29/32] virtio-blk: implement BlockDevOps->drained_begin() Kevin Wolf
2023-11-03 13:12 ` Fiona Ebner
2023-11-13 14:38 ` Fiona Ebner
2023-12-07 15:22 ` Fiona Ebner
2023-12-08 8:32 ` Kevin Wolf [this message]
2023-12-11 10:48 ` Fiona Ebner
2023-12-13 21:19 ` Stefan Hajnoczi
2023-05-30 16:32 ` [PULL 30/32] virtio-scsi: " Kevin Wolf
2023-05-30 16:32 ` [PULL 31/32] virtio: do not set is_external=true on host notifiers Kevin Wolf
2023-05-30 16:32 ` [PULL 32/32] aio: remove aio_disable_external() API Kevin Wolf
2023-05-30 18:33 ` [PULL 00/32] Block layer patches Richard Henderson