* [Qemu-devel] [RFC v3 1/3] virtio: add vdc->vmchange_state() callback
2019-05-24 18:36 [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Stefan Hajnoczi
@ 2019-05-24 18:36 ` Stefan Hajnoczi
2019-05-24 18:36 ` [Qemu-devel] [RFC v3 2/3] scsi: add scsi_bus_dma_restart() Stefan Hajnoczi
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2019-05-24 18:36 UTC (permalink / raw)
To: qemu-devel
Cc: Fam Zheng, Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi,
Michael S. Tsirkin
The core virtio code invokes ->set_status() followed by
->ioeventfd_start() when the guest resumes execution. Both of these
functions are also called in other cases unrelated to vm change state.
This patch introduces ->vmstate_change() so that devices can act on
guest pause/resume. The existing qemu_add_vm_change_state_handler() API
isn't usable by virtio devices since the ordering between vm change
state handlers is undefined. The new ->vmstate_change() callback is
always invoked after ->set_status() and ->ioeventfd_start() when
resuming a guest.
A later patch makes use of this new callback.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/hw/virtio/virtio.h | 7 +++++++
hw/virtio/virtio.c | 9 +++++++++
2 files changed, 16 insertions(+)
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 27c0efc3d0..5742efa1d7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -158,6 +158,13 @@ typedef struct VirtioDeviceClass {
void (*save)(VirtIODevice *vdev, QEMUFile *f);
int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
const VMStateDescription *vmsd;
+
+ /* Called when the device should start/stop running because the guest was
+ * resumed/paused. Note that this takes VIRTIO_CONFIG_S_DRIVER_OK into
+ * account so running is true iff the guest is resumed and the guest driver
+ * has already indicated it is ready.
+ */
+ void (*vmstate_change)(VirtIODevice *vdev, bool running);
} VirtioDeviceClass;
void virtio_instance_init_common(Object *proxy_obj, void *data,
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4805727b53..cdf869456b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2291,6 +2291,7 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state)
VirtIODevice *vdev = opaque;
BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+ VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
bool backend_run = running && vdev->started;
vdev->vm_running = running;
@@ -2298,10 +2299,18 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state)
virtio_set_status(vdev, vdev->status);
}
+ if (!backend_run && vdc->vmstate_change) {
+ vdc->vmstate_change(vdev, backend_run);
+ }
+
if (k->vmstate_change) {
k->vmstate_change(qbus->parent, backend_run);
}
+ if (backend_run && vdc->vmstate_change) {
+ vdc->vmstate_change(vdev, backend_run);
+ }
+
if (!backend_run) {
virtio_set_status(vdev, vdev->status);
}
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [Qemu-devel] [RFC v3 2/3] scsi: add scsi_bus_dma_restart()
2019-05-24 18:36 [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Stefan Hajnoczi
2019-05-24 18:36 ` [Qemu-devel] [RFC v3 1/3] virtio: add vdc->vmchange_state() callback Stefan Hajnoczi
@ 2019-05-24 18:36 ` Stefan Hajnoczi
2019-05-24 18:36 ` [Qemu-devel] [RFC v3 3/3] virtio-scsi: fix iothread deadlock on 'cont' Stefan Hajnoczi
2019-05-24 18:47 ` [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Paolo Bonzini
3 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2019-05-24 18:36 UTC (permalink / raw)
To: qemu-devel
Cc: Fam Zheng, Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi,
Michael S. Tsirkin
By default scsi-bus.c registers vm change state handlers for SCSIDevice
instances and restarts DMA when guest execution is resumed.
Unfortunately virtio-scsi with iothreads has a special ordering
requirement that the core virtio code's vm change state handler runs
before scsi-bus.c's vm change state handler.
This patch allows SCSI busses to disable the default vm change state
handler for DMA restart. The new scsi_bus_dma_restart() API allows them
to manually restart DMA at a time of their choice.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/hw/scsi/scsi.h | 5 +++++
hw/scsi/scsi-bus.c | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index acef25faa4..e9cc563daa 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -120,6 +120,10 @@ struct SCSIReqOps {
struct SCSIBusInfo {
int tcq;
int max_channel, max_target, max_lun;
+
+ /* Will the bus call scsi_bus_dma_restart() itself? */
+ bool custom_dma_restart;
+
int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
void *hba_private);
void (*transfer_data)(SCSIRequest *req, uint32_t arg);
@@ -160,6 +164,7 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
const char *serial, Error **errp);
void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
void scsi_legacy_handle_cmdline(void);
+void scsi_bus_dma_restart(SCSIBus *bus);
SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
uint32_t tag, uint32_t lun, void *hba_private);
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c480553083..d2c5a1652b 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -134,6 +134,27 @@ void scsi_req_retry(SCSIRequest *req)
req->retry = true;
}
+static void scsi_device_dma_restart(SCSIDevice *dev)
+{
+ if (!dev->bh) {
+ AioContext *ctx = blk_get_aio_context(dev->conf.blk);
+ dev->bh = aio_bh_new(ctx, scsi_dma_restart_bh, dev);
+ qemu_bh_schedule(dev->bh);
+ }
+}
+
+void scsi_bus_dma_restart(SCSIBus *bus)
+{
+ BusChild *kid;
+
+ QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
+ DeviceState *qdev = kid->child;
+ SCSIDevice *dev = SCSI_DEVICE(qdev);
+
+ scsi_device_dma_restart(dev);
+ }
+}
+
static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
{
SCSIDevice *s = opaque;
@@ -141,11 +162,8 @@ static void scsi_dma_restart_cb(void *opaque, int running, RunState state)
if (!running) {
return;
}
- if (!s->bh) {
- AioContext *ctx = blk_get_aio_context(s->conf.blk);
- s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
- qemu_bh_schedule(s->bh);
- }
+
+ scsi_device_dma_restart(s);
}
static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
@@ -206,8 +224,13 @@ static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
error_propagate(errp, local_err);
return;
}
- dev->vmsentry = qemu_add_vm_change_state_handler(scsi_dma_restart_cb,
- dev);
+
+ if (bus->info->custom_dma_restart) {
+ dev->vmsentry = NULL;
+ } else {
+ dev->vmsentry = qemu_add_vm_change_state_handler(scsi_dma_restart_cb,
+ dev);
+ }
}
static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [Qemu-devel] [RFC v3 3/3] virtio-scsi: fix iothread deadlock on 'cont'
2019-05-24 18:36 [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Stefan Hajnoczi
2019-05-24 18:36 ` [Qemu-devel] [RFC v3 1/3] virtio: add vdc->vmchange_state() callback Stefan Hajnoczi
2019-05-24 18:36 ` [Qemu-devel] [RFC v3 2/3] scsi: add scsi_bus_dma_restart() Stefan Hajnoczi
@ 2019-05-24 18:36 ` Stefan Hajnoczi
2019-05-24 18:47 ` [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Paolo Bonzini
3 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2019-05-24 18:36 UTC (permalink / raw)
To: qemu-devel
Cc: Fam Zheng, Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi,
Michael S. Tsirkin
When the 'cont' command resumes guest execution the vm change state
handlers are invoked. Unfortunately there is no explicit ordering
between vm change state handlers. When two layers of code both use vm
change state handlers, we don't control which handler runs first.
virtio-scsi with iothreads hits a deadlock when a failed SCSI command is
restarted and completes before the iothread is re-initialized.
This patch makes sure that DMA restart happens after the iothread has
been started again.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
hw/scsi/virtio-scsi.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 839f120256..236a0ee873 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -846,12 +846,28 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
}
+static void virtio_scsi_vmstate_change(VirtIODevice *vdev, bool running)
+{
+ VirtIOSCSI *s = VIRTIO_SCSI(vdev);
+
+ if (running) {
+ scsi_bus_dma_restart(&s->bus);
+ }
+}
+
static struct SCSIBusInfo virtio_scsi_scsi_info = {
.tcq = true,
.max_channel = VIRTIO_SCSI_MAX_CHANNEL,
.max_target = VIRTIO_SCSI_MAX_TARGET,
.max_lun = VIRTIO_SCSI_MAX_LUN,
+ /* We call scsi_bus_dma_restart() ourselves to control the ordering between
+ * ->start_ioeventfd() and DMA restart. Do it in
+ * virtio_scsi_vmstate_change(), which is called by the core virtio code
+ * after ->start_ioeventfd().
+ */
+ .custom_dma_restart = true,
+
.complete = virtio_scsi_command_complete,
.cancel = virtio_scsi_request_cancelled,
.change = virtio_scsi_change,
@@ -986,6 +1002,7 @@ static void virtio_scsi_class_init(ObjectClass *klass, void *data)
vdc->reset = virtio_scsi_reset;
vdc->start_ioeventfd = virtio_scsi_dataplane_start;
vdc->stop_ioeventfd = virtio_scsi_dataplane_stop;
+ vdc->vmstate_change = virtio_scsi_vmstate_change;
hc->plug = virtio_scsi_hotplug;
hc->unplug = virtio_scsi_hotunplug;
}
--
2.21.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers
2019-05-24 18:36 [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Stefan Hajnoczi
` (2 preceding siblings ...)
2019-05-24 18:36 ` [Qemu-devel] [RFC v3 3/3] virtio-scsi: fix iothread deadlock on 'cont' Stefan Hajnoczi
@ 2019-05-24 18:47 ` Paolo Bonzini
2019-05-24 19:08 ` Michael S. Tsirkin
2019-05-29 22:10 ` Kevin Wolf
3 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2019-05-24 18:47 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel; +Cc: Fam Zheng, Kevin Wolf, Michael S. Tsirkin
On 24/05/19 20:36, Stefan Hajnoczi wrote:
> v3:
> * Fix s/k->vmstate_change/vdc->vmstate_change/
> * Still RFC, waiting for customer to confirm this fixes the issue
> v2:
> * Do it properly with a clean API instead of deferring to a BH!
> Thanks for encouraging me to do this, Kevin.
>
> These patches solve a deadlock when the 'cont' command is used and there are
> failed requests on a virtio-scsi device with iothreads. The deadlock itself is
> actually not the thing we need to fix because we should never reach that case
> anyway. Instead we need to make sure DMA restart is only performed after the
> virtio-scsi iothread is re-initialized.
custom_dma_restart is a bit ugly... Do you think it would make sense to
order the VMStateChange handlers using some kind of enum (with the order
unspecified within the category)?
We could start with
VMSTATECHANGE_PRIO_UNKNOWN = 0 (if needed?)
VMSTATECHANGE_PRIO_IOTHREAD = 100
VMSTATECHANGE_PRIO_DEVICE = 200
where higher priorities run first on stop and last on resume.
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers
2019-05-24 18:47 ` [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Paolo Bonzini
@ 2019-05-24 19:08 ` Michael S. Tsirkin
2019-05-29 22:10 ` Kevin Wolf
1 sibling, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-05-24 19:08 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, Kevin Wolf, qemu-devel, Stefan Hajnoczi
On Fri, May 24, 2019 at 08:47:18PM +0200, Paolo Bonzini wrote:
> On 24/05/19 20:36, Stefan Hajnoczi wrote:
> > v3:
> > * Fix s/k->vmstate_change/vdc->vmstate_change/
> > * Still RFC, waiting for customer to confirm this fixes the issue
> > v2:
> > * Do it properly with a clean API instead of deferring to a BH!
> > Thanks for encouraging me to do this, Kevin.
> >
> > These patches solve a deadlock when the 'cont' command is used and there are
> > failed requests on a virtio-scsi device with iothreads. The deadlock itself is
> > actually not the thing we need to fix because we should never reach that case
> > anyway. Instead we need to make sure DMA restart is only performed after the
> > virtio-scsi iothread is re-initialized.
>
> custom_dma_restart is a bit ugly... Do you think it would make sense to
> order the VMStateChange handlers using some kind of enum (with the order
> unspecified within the category)?
>
> We could start with
>
> VMSTATECHANGE_PRIO_UNKNOWN = 0 (if needed?)
Yes I think it's a good idea to explicitly say I don't care
about the order like this.
> VMSTATECHANGE_PRIO_IOTHREAD = 100
> VMSTATECHANGE_PRIO_DEVICE = 200
>
> where higher priorities run first on stop and last on resume.
>
> Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers
2019-05-24 18:47 ` [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers Paolo Bonzini
2019-05-24 19:08 ` Michael S. Tsirkin
@ 2019-05-29 22:10 ` Kevin Wolf
2019-05-30 8:27 ` Paolo Bonzini
1 sibling, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2019-05-29 22:10 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin
Am 24.05.2019 um 20:47 hat Paolo Bonzini geschrieben:
> On 24/05/19 20:36, Stefan Hajnoczi wrote:
> > v3:
> > * Fix s/k->vmstate_change/vdc->vmstate_change/
> > * Still RFC, waiting for customer to confirm this fixes the issue
> > v2:
> > * Do it properly with a clean API instead of deferring to a BH!
> > Thanks for encouraging me to do this, Kevin.
> >
> > These patches solve a deadlock when the 'cont' command is used and there are
> > failed requests on a virtio-scsi device with iothreads. The deadlock itself is
> > actually not the thing we need to fix because we should never reach that case
> > anyway. Instead we need to make sure DMA restart is only performed after the
> > virtio-scsi iothread is re-initialized.
>
> custom_dma_restart is a bit ugly... Do you think it would make sense to
> order the VMStateChange handlers using some kind of enum (with the order
> unspecified within the category)?
>
> We could start with
>
> VMSTATECHANGE_PRIO_UNKNOWN = 0 (if needed?)
> VMSTATECHANGE_PRIO_IOTHREAD = 100
> VMSTATECHANGE_PRIO_DEVICE = 200
>
> where higher priorities run first on stop and last on resume.
I don't think this is as nice because stopping or resuming a device
could involve some async operation (e.g. be delegated to a BH). In this
case, a device on a child bus must still wait for the BH (or other async
part) to be completed before it can resume its own operation.
Basically, a single flat list of global VM state handlers wasn't a good
design, because what we actually need in such cases is something
hierarchical.
Kevin
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers
2019-05-29 22:10 ` Kevin Wolf
@ 2019-05-30 8:27 ` Paolo Bonzini
2019-05-30 8:52 ` Kevin Wolf
0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2019-05-30 8:27 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Fam Zheng, qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin
On 30/05/19 00:10, Kevin Wolf wrote:
> I don't think this is as nice because stopping or resuming a device
> could involve some async operation (e.g. be delegated to a BH). In this
> case, a device on a child bus must still wait for the BH (or other async
> part) to be completed before it can resume its own operation.
>
> Basically, a single flat list of global VM state handlers wasn't a good
> design, because what we actually need in such cases is something
> hierarchical.
So add an AioContext state change handler?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [RFC v3 0/3] scsi: restart dma after vm change state handlers
2019-05-30 8:27 ` Paolo Bonzini
@ 2019-05-30 8:52 ` Kevin Wolf
0 siblings, 0 replies; 9+ messages in thread
From: Kevin Wolf @ 2019-05-30 8:52 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Fam Zheng, qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin
Am 30.05.2019 um 10:27 hat Paolo Bonzini geschrieben:
> On 30/05/19 00:10, Kevin Wolf wrote:
> > I don't think this is as nice because stopping or resuming a device
> > could involve some async operation (e.g. be delegated to a BH). In this
> > case, a device on a child bus must still wait for the BH (or other async
> > part) to be completed before it can resume its own operation.
> >
> > Basically, a single flat list of global VM state handlers wasn't a good
> > design, because what we actually need in such cases is something
> > hierarchical.
>
> So add an AioContext state change handler?
Does anything really change for the AioContext that could cause a
callback? As I read the code, only the virtio-scsi device state really
changes and it doesn't do more with the AioContext than just taking the
lock for a while.
But in any case, inferring whether the HBA is ready from some AioContext
state change, even if it were technically possible, is rather indirect
and more a hack than a proper solution in my book. So a callback from
the HBA to its bus feels like the correct approach.
Kevin
^ permalink raw reply [flat|nested] 9+ messages in thread