From: Paolo Bonzini <pbonzini@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4] virtio-scsi: restart DMA after iothread
Date: Wed, 12 Jun 2019 14:32:30 +0200 [thread overview]
Message-ID: <298bea90-7b26-6a07-3f3f-65dacaba9fe5@redhat.com> (raw)
In-Reply-To: <20190612120421.20336-1-stefanha@redhat.com>
On 12/06/19 14:04, Stefan Hajnoczi wrote:
> When the 'cont' command resumes guest execution the vm change state
> handlers are invoked. Unfortunately there is no explicit ordering
> between vm change state handlers. When two layers of code both use vm
> change state handlers, we don't control which handler runs first.
>
> virtio-scsi with iothreads hits a deadlock when a failed SCSI command is
> restarted and completes before the iothread is re-initialized.
>
> This patch introduces priorities for VM change state handlers so the
> IOThread is guaranteed to be initialized before DMA requests are
> restarted.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v4:
> Paolo and Michael were interested in a priorities system. Kevin wasn't
> convinced. Here is a patch implementing the priorities approach so you
> can decide whether you prefer this or not.
I like this better than the others, but I do agree with Kevin that the
names aren't great and PRIO_IOTHREAD has nothing to do with iothreads
really.
Maybe the priority should be simply the "depth" of the device's bus, so
2 for scsi because we know it always has a controller between the device
and the machine and 1 for everything else. Maybe who knows, in the
future the vmstate_change handler could be moved in DeviceClass and
propagated across buses[1]
So I vote for this patch but with VM_CHANGE_STATE_HANDLER_PRIO_IOTHREAD
renamed to VM_CHANGE_STATE_HANDLER_PRIO_DEVICE and
VM_CHANGE_STATE_HANDLER_PRIO_DEVICE renamed to
VM_CHANGE_STATE_HANDLER_PRIO_DISK (which is consistent with the naming
of scsi-disk, though not with ide-drive...).
Paolo
[1] With care, because currently the initialization order for stop is
virtio-device -> virtio-pci -> scsi-bus (and the opposite for start). I
am not sure that the order for virtio-pci and virtio-device could be
exchanged, as would be the case if you followed the bus order
pci->virtio->scsi.
> The customer has now confirmed that the deadlock is fixed. I have
> changed the Subject line from RFC to PATCH.
>
> include/sysemu/sysemu.h | 10 ++++++++++
> hw/scsi/scsi-bus.c | 6 ++++--
> hw/virtio/virtio.c | 6 ++++--
> vl.c | 44 +++++++++++++++++++++++++++++++----------
> 4 files changed, 52 insertions(+), 14 deletions(-)
>
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 61579ae71e..1a4db092c7 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -27,8 +27,18 @@ bool runstate_store(char *str, size_t size);
> typedef struct vm_change_state_entry VMChangeStateEntry;
> typedef void VMChangeStateHandler(void *opaque, int running, RunState state);
>
> +enum {
> + /* Low priorities run first when the VM starts */
> + VM_CHANGE_STATE_HANDLER_PRIO_UNDEFINED = 0,
> + VM_CHANGE_STATE_HANDLER_PRIO_IOTHREAD = 100,
> + VM_CHANGE_STATE_HANDLER_PRIO_DEVICE = 200,
> + /* High priorities run first when the VM stops */
> +};
> +
> VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
> void *opaque);
> +VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
> + VMChangeStateHandler *cb, void *opaque, int priority);
> void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
> void vm_state_notify(int running, RunState state);
>
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index c480553083..eda5b9a19e 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -206,8 +206,10 @@ static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
> error_propagate(errp, local_err);
> return;
> }
> - dev->vmsentry = qemu_add_vm_change_state_handler(scsi_dma_restart_cb,
> - dev);
> + dev->vmsentry = qemu_add_vm_change_state_handler_prio(
> + scsi_dma_restart_cb,
> + dev,
> + VM_CHANGE_STATE_HANDLER_PRIO_DEVICE);
> }
>
> static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 07f4a64b48..9256af587a 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2354,8 +2354,10 @@ void virtio_init(VirtIODevice *vdev, const char *name,
> } else {
> vdev->config = NULL;
> }
> - vdev->vmstate = qemu_add_vm_change_state_handler(virtio_vmstate_change,
> - vdev);
> + vdev->vmstate = qemu_add_vm_change_state_handler_prio(
> + virtio_vmstate_change,
> + vdev,
> + VM_CHANGE_STATE_HANDLER_PRIO_IOTHREAD);
> vdev->device_endian = virtio_default_endian();
> vdev->use_guest_notifier_mask = true;
> }
> diff --git a/vl.c b/vl.c
> index cd1fbc4cdc..26c82063d2 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1469,27 +1469,45 @@ static int machine_help_func(QemuOpts *opts, MachineState *machine)
> struct vm_change_state_entry {
> VMChangeStateHandler *cb;
> void *opaque;
> - QLIST_ENTRY (vm_change_state_entry) entries;
> + QTAILQ_ENTRY (vm_change_state_entry) entries;
> + int priority;
> };
>
> -static QLIST_HEAD(, vm_change_state_entry) vm_change_state_head;
> +static QTAILQ_HEAD(, vm_change_state_entry) vm_change_state_head;
>
> -VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
> - void *opaque)
> +VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
> + VMChangeStateHandler *cb, void *opaque, int priority)
> {
> VMChangeStateEntry *e;
> + VMChangeStateEntry *other;
>
> e = g_malloc0(sizeof (*e));
> -
> e->cb = cb;
> e->opaque = opaque;
> - QLIST_INSERT_HEAD(&vm_change_state_head, e, entries);
> + e->priority = priority;
> +
> + /* Keep list sorted in ascending priority order */
> + QTAILQ_FOREACH(other, &vm_change_state_head, entries) {
> + if (priority < other->priority) {
> + QTAILQ_INSERT_BEFORE(other, e, entries);
> + return e;
> + }
> + }
> +
> + QTAILQ_INSERT_TAIL(&vm_change_state_head, e, entries);
> return e;
> }
>
> +VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
> + void *opaque)
> +{
> + return qemu_add_vm_change_state_handler_prio(cb, opaque,
> + VM_CHANGE_STATE_HANDLER_PRIO_UNDEFINED);
> +}
> +
> void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
> {
> - QLIST_REMOVE (e, entries);
> + QTAILQ_REMOVE (&vm_change_state_head, e, entries);
> g_free (e);
> }
>
> @@ -1499,8 +1517,14 @@ void vm_state_notify(int running, RunState state)
>
> trace_vm_state_notify(running, state, RunState_str(state));
>
> - QLIST_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
> - e->cb(e->opaque, running, state);
> + if (running) {
> + QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
> + e->cb(e->opaque, running, state);
> + }
> + } else {
> + QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
> + e->cb(e->opaque, running, state);
> + }
> }
> }
>
> @@ -3009,7 +3033,7 @@ int main(int argc, char **argv, char **envp)
> exit(1);
> }
>
> - QLIST_INIT (&vm_change_state_head);
> + QTAILQ_INIT (&vm_change_state_head);
> os_setup_early_signal_handling();
>
> cpu_option = NULL;
>
next prev parent reply other threads:[~2019-06-12 12:57 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-12 12:04 [Qemu-devel] [PATCH v4] virtio-scsi: restart DMA after iothread Stefan Hajnoczi
2019-06-12 12:32 ` Paolo Bonzini [this message]
2019-06-12 14:06 ` no-reply
2019-06-17 12:29 ` Kevin Wolf
2019-06-17 17:23 ` Paolo Bonzini
2019-06-17 17:58 ` Kevin Wolf
2019-06-17 18:27 ` Paolo Bonzini
2019-06-20 16:18 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=298bea90-7b26-6a07-3f3f-65dacaba9fe5@redhat.com \
--to=pbonzini@redhat.com \
--cc=fam@euphon.net \
--cc=kwolf@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).