virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] virtio: (Partially) enable suspend/resume support
@ 2010-09-29 12:12 Amit Shah
  2010-10-03 15:53 ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2010-09-29 12:12 UTC (permalink / raw)
  To: Virtualization List; +Cc: Amit Shah, Anthony Liguori, Michael S. Tsirkin

Let host know of our state (partial yet) so that the host PCI device is
up to where we were when we were suspended.

This is still an RFC as this doesn't completely work: an unused device
at the time of suspend will work fine after resume, but a device that
has seen some activity doesn't behave well after resume.  Especially,
host->guest communication doesn't go through.  This could be due to
interrupts/MSI not being wired properly.  I haven't looked at this part
of the problem yet.

We also need a per-driver resume callback which can update the devices
with driver-specific data.  Eg, for virtio_console, the guest port
open/close status will have to be known to the device.

QEMU also starts using the queues as soon as the VIRTIO_PCI_QUEUE_PFN
command is sent, so it has to be taught to only start using queues when
the device is ready and the queues are set up.

However, I just wanted to send this out to get reactions / comments.

Not-yet-Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_pci.c  |   34 ++++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_ring.c |    8 ++++++++
 include/linux/virtio.h       |    2 ++
 include/linux/virtio_pci.h   |    4 +++-
 4 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index ef8d9d5..a3c7f59 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -698,8 +698,42 @@ static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
 
 static int virtio_pci_resume(struct pci_dev *pci_dev)
 {
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_pci_vq_info *info;
+	unsigned long flags;
+
 	pci_restore_state(pci_dev);
 	pci_set_power_state(pci_dev, PCI_D0);
+
+	iowrite32(vp_dev->vdev.features[0], vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
+	if (vp_dev->msix_used_vectors)
+		iowrite16(vp_dev->msix_used_vectors,
+			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		/* Select the queue we're interested in */
+		iowrite16(info->queue_index,
+			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+
+		/* Update the last idx we sent data in */
+		iowrite16(virtqueue_get_avail_idx(info->vq),
+			  vp_dev->ioaddr + VIRTIO_PCI_AVAIL_IDX);
+
+		if (info->msix_vector != VIRTIO_MSI_NO_VECTOR) {
+			iowrite16(info->msix_vector,
+				  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+		}
+
+		/* activate the queue */
+		iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
+			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+
+	}
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	vp_set_status(&vp_dev->vdev, VIRTIO_CONFIG_S_DRIVER_OK);
+
 	return 0;
 }
 #endif
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1475ed6..3eb91d1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -237,6 +237,14 @@ add_head:
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
 
+u16 virtqueue_get_avail_idx(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->vring.avail->idx;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_avail_idx);
+
 void virtqueue_kick(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index aff5b4f..cdb04ec 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -88,6 +88,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
+u16 virtqueue_get_avail_idx(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
index 9a3d7c4..aa3e584 100644
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -55,9 +55,11 @@
 /* Vector value used to disable MSI for queue */
 #define VIRTIO_MSI_NO_VECTOR            0xffff
 
+#define VIRTIO_PCI_AVAIL_IDX		24
+
 /* The remaining space is defined by each driver as the per-driver
  * configuration space */
-#define VIRTIO_PCI_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
+#define VIRTIO_PCI_CONFIG(dev)		26
 
 /* Virtio ABI version, this must match exactly */
 #define VIRTIO_PCI_ABI_VERSION		0
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-09-29 12:12 [RFC PATCH] virtio: (Partially) enable suspend/resume support Amit Shah
@ 2010-10-03 15:53 ` Michael S. Tsirkin
  2010-10-05 13:45   ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2010-10-03 15:53 UTC (permalink / raw)
  To: Amit Shah; +Cc: Anthony Liguori, Virtualization List

On Wed, Sep 29, 2010 at 05:42:24PM +0530, Amit Shah wrote:
> Let host know of our state (partial yet) so that the host PCI device is
> up to where we were when we were suspended.
> 
> This is still an RFC as this doesn't completely work: an unused device
> at the time of suspend will work fine after resume, but a device that
> has seen some activity doesn't behave well after resume.  Especially,
> host->guest communication doesn't go through.  This could be due to
> interrupts/MSI not being wired properly.  I haven't looked at this part
> of the problem yet.
> 
> We also need a per-driver resume callback which can update the devices
> with driver-specific data.  Eg, for virtio_console, the guest port
> open/close status will have to be known to the device.
> 
> QEMU also starts using the queues as soon as the VIRTIO_PCI_QUEUE_PFN
> command is sent, so it has to be taught to only start using queues when
> the device is ready and the queues are set up.
> 
> However, I just wanted to send this out to get reactions / comments.
> 
> Not-yet-Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_pci.c  |   34 ++++++++++++++++++++++++++++++++++
>  drivers/virtio/virtio_ring.c |    8 ++++++++
>  include/linux/virtio.h       |    2 ++
>  include/linux/virtio_pci.h   |    4 +++-
>  4 files changed, 47 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index ef8d9d5..a3c7f59 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -698,8 +698,42 @@ static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
>  
>  static int virtio_pci_resume(struct pci_dev *pci_dev)
>  {
> +	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> +	struct virtio_pci_vq_info *info;
> +	unsigned long flags;
> +
>  	pci_restore_state(pci_dev);
>  	pci_set_power_state(pci_dev, PCI_D0);
> +
> +	iowrite32(vp_dev->vdev.features[0], vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);

need spaces around +

> +	if (vp_dev->msix_used_vectors)
> +		iowrite16(vp_dev->msix_used_vectors,
> +			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);

I think this is wrong, msix_used_vectors is a counter not a vector
number, try with 2 vectors and it will break I think: this is shared
vectors configuration.  We just never had a need to remember the config
vector number, either do remember it, or rely on the fact that it is
always 0 in our code: and maybe add BUG_ON in vp_request_msix_vectors
where it is first assigned. The condition would simply be
msix_enabled.



> +
> +	spin_lock_irqsave(&vp_dev->lock, flags);
> +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> +		/* Select the queue we're interested in */
> +		iowrite16(info->queue_index,
> +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> +
> +		/* Update the last idx we sent data in */
> +		iowrite16(virtqueue_get_avail_idx(info->vq),
> +			  vp_dev->ioaddr + VIRTIO_PCI_AVAIL_IDX);

Interesting. Could we just reset the ring instead?
I think this would also solve the qemu problem you
outline, won't it?

> +
> +		if (info->msix_vector != VIRTIO_MSI_NO_VECTOR) {
> +			iowrite16(info->msix_vector,
> +				  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
> +		}

actually it's probably better to write msix_vector if it is
VIRTIO_MSI_NO_VECTOR as well, just to make sure
we are in a known state.  The condition
should be that msix is enabled in the device.

Further, please read the value back and verify that it was
written successfully: on error you get VIRTIO_MSI_NO_VECTOR.


> +
> +		/* activate the queue */
> +		iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +
> +	}
> +	spin_unlock_irqrestore(&vp_dev->lock, flags);
> +
> +	vp_set_status(&vp_dev->vdev, VIRTIO_CONFIG_S_DRIVER_OK);
> +
>  	return 0;
>  }
>  #endif
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1475ed6..3eb91d1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -237,6 +237,14 @@ add_head:
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
>  
> +u16 virtqueue_get_avail_idx(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->vring.avail->idx;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_get_avail_idx);
> +
>  void virtqueue_kick(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index aff5b4f..cdb04ec 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -88,6 +88,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>  
>  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
>  
> +u16 virtqueue_get_avail_idx(struct virtqueue *vq);
> +

This is a bit of a layering violation, in that the avail index
belongs to the ring internals. Possibly the best solution to expose it
is to put the index in the ring. This was one of the ideas
proposed for vring2. Other options include getting rid of the
index entirely.

>  /**
>   * virtio_device - representation of a device using virtio
>   * @index: unique position on the virtio bus
> diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
> index 9a3d7c4..aa3e584 100644
> --- a/include/linux/virtio_pci.h
> +++ b/include/linux/virtio_pci.h
> @@ -55,9 +55,11 @@
>  /* Vector value used to disable MSI for queue */
>  #define VIRTIO_MSI_NO_VECTOR            0xffff
>  
> +#define VIRTIO_PCI_AVAIL_IDX		24
> +
>  /* The remaining space is defined by each driver as the per-driver
>   * configuration space */
> -#define VIRTIO_PCI_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
> +#define VIRTIO_PCI_CONFIG(dev)		26

This will silently break all devices out there, won't it?

>  
>  /* Virtio ABI version, this must match exactly */
>  #define VIRTIO_PCI_ABI_VERSION		0
> -- 
> 1.7.2.3
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-10-03 15:53 ` Michael S. Tsirkin
@ 2010-10-05 13:45   ` Amit Shah
  2010-10-05 15:23     ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2010-10-05 13:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Anthony Liguori, Virtualization List

(Re-adding Rusty to CC, looks like the linux-foundation ml drops CCs.)

On (Sun) Oct 03 2010 [17:53:59], Michael S. Tsirkin wrote:
> On Wed, Sep 29, 2010 at 05:42:24PM +0530, Amit Shah wrote:
> > Let host know of our state (partial yet) so that the host PCI device is
> > up to where we were when we were suspended.
> > 
> > This is still an RFC as this doesn't completely work: an unused device
> > at the time of suspend will work fine after resume, but a device that
> > has seen some activity doesn't behave well after resume.  Especially,
> > host->guest communication doesn't go through.  This could be due to
> > interrupts/MSI not being wired properly.  I haven't looked at this part
> > of the problem yet.
> > 
> > We also need a per-driver resume callback which can update the devices
> > with driver-specific data.  Eg, for virtio_console, the guest port
> > open/close status will have to be known to the device.
> > 
> > QEMU also starts using the queues as soon as the VIRTIO_PCI_QUEUE_PFN
> > command is sent, so it has to be taught to only start using queues when
> > the device is ready and the queues are set up.
> > 
> > However, I just wanted to send this out to get reactions / comments.
> > 
> > Not-yet-Signed-off-by: Amit Shah <amit.shah@redhat.com>
> > ---
> >  drivers/virtio/virtio_pci.c  |   34 ++++++++++++++++++++++++++++++++++
> >  drivers/virtio/virtio_ring.c |    8 ++++++++
> >  include/linux/virtio.h       |    2 ++
> >  include/linux/virtio_pci.h   |    4 +++-
> >  4 files changed, 47 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > index ef8d9d5..a3c7f59 100644
> > --- a/drivers/virtio/virtio_pci.c
> > +++ b/drivers/virtio/virtio_pci.c
> > @@ -698,8 +698,42 @@ static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
> >  
> >  static int virtio_pci_resume(struct pci_dev *pci_dev)
> >  {
> > +	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > +	struct virtio_pci_vq_info *info;
> > +	unsigned long flags;
> > +
> >  	pci_restore_state(pci_dev);
> >  	pci_set_power_state(pci_dev, PCI_D0);
> > +
> > +	iowrite32(vp_dev->vdev.features[0], vp_dev->ioaddr+VIRTIO_PCI_GUEST_FEATURES);
> 
> need spaces around +

That bit is taken from a place elsewhere in the file -- I guess it's
that way to keep everything on the same line.  But in the final
submission, I'll split up the line as I've done the others below.

> > +	if (vp_dev->msix_used_vectors)
> > +		iowrite16(vp_dev->msix_used_vectors,
> > +			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
> 
> I think this is wrong, msix_used_vectors is a counter not a vector
> number, try with 2 vectors and it will break I think: this is shared
> vectors configuration.  We just never had a need to remember the config
> vector number, either do remember it, or rely on the fact that it is
> always 0 in our code: and maybe add BUG_ON in vp_request_msix_vectors
> where it is first assigned. The condition would simply be
> msix_enabled.

Yes, I hadn't paid attention to MSI, this helps -- I'll adjust the code
accordingly.

> > +
> > +	spin_lock_irqsave(&vp_dev->lock, flags);
> > +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> > +		/* Select the queue we're interested in */
> > +		iowrite16(info->queue_index,
> > +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > +
> > +		/* Update the last idx we sent data in */
> > +		iowrite16(virtqueue_get_avail_idx(info->vq),
> > +			  vp_dev->ioaddr + VIRTIO_PCI_AVAIL_IDX);
> 
> Interesting. Could we just reset the ring instead?
> I think this would also solve the qemu problem you
> outline, won't it?

The problem here is qemu doesn't "know" we went into suspend and came
out of it.  When going to suspend, qemu could've received a kick
notification and would've been just about to process some queue entries.
Now when we resume and reset the ring, qemu could crash anyway seeing
invalid index values.  I think that's happening now anyway.

However, I think the current qemu problem I have might be related to me
not introducing memory barriers here.

> > +		if (info->msix_vector != VIRTIO_MSI_NO_VECTOR) {
> > +			iowrite16(info->msix_vector,
> > +				  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
> > +		}
> 
> actually it's probably better to write msix_vector if it is
> VIRTIO_MSI_NO_VECTOR as well, just to make sure
> we are in a known state.  The condition
> should be that msix is enabled in the device.
> 
> Further, please read the value back and verify that it was
> written successfully: on error you get VIRTIO_MSI_NO_VECTOR.

Yes, error handling is to be done yet.

> > +		/* activate the queue */
> > +		iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > +
> > +	}
> > +	spin_unlock_irqrestore(&vp_dev->lock, flags);
> > +
> > +	vp_set_status(&vp_dev->vdev, VIRTIO_CONFIG_S_DRIVER_OK);
> > +
> >  	return 0;
> >  }
> >  #endif
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 1475ed6..3eb91d1 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -237,6 +237,14 @@ add_head:
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
> >  
> > +u16 virtqueue_get_avail_idx(struct virtqueue *_vq)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->vring.avail->idx;
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_get_avail_idx);
> > +
> >  void virtqueue_kick(struct virtqueue *_vq)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index aff5b4f..cdb04ec 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -88,6 +88,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >  
> >  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
> >  
> > +u16 virtqueue_get_avail_idx(struct virtqueue *vq);
> > +
> 
> This is a bit of a layering violation, in that the avail index
> belongs to the ring internals.

Yes, it is.  This exercise is basically to find out what's needed to
make suspend work; we can design the final solution once we have a
better idea of what's involved.

> Possibly the best solution to expose it
> is to put the index in the ring. This was one of the ideas
> proposed for vring2. Other options include getting rid of the
> index entirely.
> 
> >  /**
> >   * virtio_device - representation of a device using virtio
> >   * @index: unique position on the virtio bus
> > diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
> > index 9a3d7c4..aa3e584 100644
> > --- a/include/linux/virtio_pci.h
> > +++ b/include/linux/virtio_pci.h
> > @@ -55,9 +55,11 @@
> >  /* Vector value used to disable MSI for queue */
> >  #define VIRTIO_MSI_NO_VECTOR            0xffff
> >  
> > +#define VIRTIO_PCI_AVAIL_IDX		24
> > +
> >  /* The remaining space is defined by each driver as the per-driver
> >   * configuration space */
> > -#define VIRTIO_PCI_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
> > +#define VIRTIO_PCI_CONFIG(dev)		26
> 
> This will silently break all devices out there, won't it?

Sure it does :-)  (Has to be handled when this is proposed as a fix
officially.)

		Amit

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-10-05 13:45   ` Amit Shah
@ 2010-10-05 15:23     ` Michael S. Tsirkin
  2010-10-06 11:54       ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2010-10-05 15:23 UTC (permalink / raw)
  To: Amit Shah; +Cc: Anthony Liguori, Virtualization List

On Tue, Oct 05, 2010 at 07:15:31PM +0530, Amit Shah wrote:
> > > +
> > > +	spin_lock_irqsave(&vp_dev->lock, flags);
> > > +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> > > +		/* Select the queue we're interested in */
> > > +		iowrite16(info->queue_index,
> > > +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > > +
> > > +		/* Update the last idx we sent data in */
> > > +		iowrite16(virtqueue_get_avail_idx(info->vq),
> > > +			  vp_dev->ioaddr + VIRTIO_PCI_AVAIL_IDX);
> > 
> > Interesting. Could we just reset the ring instead?
> > I think this would also solve the qemu problem you
> > outline, won't it?
> 
> The problem here is qemu doesn't "know" we went into suspend and came
> out of it.  When going to suspend, qemu could've received a kick
> notification and would've been just about to process some queue entries.
> Now when we resume and reset the ring, qemu could crash anyway seeing
> invalid index values.

Hmm, I don't completely understand this.  When there's a reset I expect
this to discard any previous kicks. No?

>  I think that's happening now anyway.

How can one reproduce this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-10-05 15:23     ` Michael S. Tsirkin
@ 2010-10-06 11:54       ` Amit Shah
  2010-11-05 10:15         ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2010-10-06 11:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Anthony Liguori, Virtualization List

On (Tue) Oct 05 2010 [17:23:19], Michael S. Tsirkin wrote:
> On Tue, Oct 05, 2010 at 07:15:31PM +0530, Amit Shah wrote:
> > > > +
> > > > +	spin_lock_irqsave(&vp_dev->lock, flags);
> > > > +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> > > > +		/* Select the queue we're interested in */
> > > > +		iowrite16(info->queue_index,
> > > > +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > > > +
> > > > +		/* Update the last idx we sent data in */
> > > > +		iowrite16(virtqueue_get_avail_idx(info->vq),
> > > > +			  vp_dev->ioaddr + VIRTIO_PCI_AVAIL_IDX);
> > > 
> > > Interesting. Could we just reset the ring instead?
> > > I think this would also solve the qemu problem you
> > > outline, won't it?
> > 
> > The problem here is qemu doesn't "know" we went into suspend and came
> > out of it.  When going to suspend, qemu could've received a kick
> > notification and would've been just about to process some queue entries.
> > Now when we resume and reset the ring, qemu could crash anyway seeing
> > invalid index values.
> 
> Hmm, I don't completely understand this.  When there's a reset I expect
> this to discard any previous kicks. No?

I'm talking of a situation like this:


	Guest					Host

   virtqueue_add_buf()
   virtqueue_kick()
					virtqueue_pop() (in progress)

    -->   suspend


Now there will be some buffers in the virtqueue but the host wouldn't
know on the next resume.  So we want to keep the ring state in the
current state so that the host continues consuming from where it left
off.  

Now this wouldn't crash on resume for virtio-net, but for virtio-serial
(which uses chardevs) and also for virtio-block, I guess there are more
problems.

For example, for virtio-serial, if the machine is re-started with a
chardev connected, the virtqueue_num_heads() function gets called, which
results in the 'Guest moved used index' message.

		Amit

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-10-06 11:54       ` Amit Shah
@ 2010-11-05 10:15         ` Michael S. Tsirkin
  2010-11-22 13:45           ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2010-11-05 10:15 UTC (permalink / raw)
  To: Amit Shah; +Cc: Anthony Liguori, Virtualization List

On Wed, Oct 06, 2010 at 05:24:18PM +0530, Amit Shah wrote:
> On (Tue) Oct 05 2010 [17:23:19], Michael S. Tsirkin wrote:
> > On Tue, Oct 05, 2010 at 07:15:31PM +0530, Amit Shah wrote:
> > > > > +
> > > > > +	spin_lock_irqsave(&vp_dev->lock, flags);
> > > > > +	list_for_each_entry(info, &vp_dev->virtqueues, node) {
> > > > > +		/* Select the queue we're interested in */
> > > > > +		iowrite16(info->queue_index,
> > > > > +			  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > > > > +
> > > > > +		/* Update the last idx we sent data in */
> > > > > +		iowrite16(virtqueue_get_avail_idx(info->vq),
> > > > > +			  vp_dev->ioaddr + VIRTIO_PCI_AVAIL_IDX);
> > > > 
> > > > Interesting. Could we just reset the ring instead?
> > > > I think this would also solve the qemu problem you
> > > > outline, won't it?
> > > 
> > > The problem here is qemu doesn't "know" we went into suspend and came
> > > out of it.  When going to suspend, qemu could've received a kick
> > > notification and would've been just about to process some queue entries.
> > > Now when we resume and reset the ring, qemu could crash anyway seeing
> > > invalid index values.
> > 
> > Hmm, I don't completely understand this.  When there's a reset I expect
> > this to discard any previous kicks. No?
> 
> I'm talking of a situation like this:
> 
> 
> 	Guest					Host
> 
>    virtqueue_add_buf()
>    virtqueue_kick()
> 					virtqueue_pop() (in progress)
> 
>     -->   suspend
> 
> 
> Now there will be some buffers in the virtqueue but the host wouldn't
> know on the next resume.  So we want to keep the ring state in the
> current state so that the host continues consuming from where it left
> off.  

I still don't see it.  why don't we reset on resume?
If there's a reset host must either discard or
flush out operations in progress.

> Now this wouldn't crash on resume for virtio-net, but for virtio-serial
> (which uses chardevs) and also for virtio-block, I guess there are more
> problems.
> 
> For example, for virtio-serial, if the machine is re-started with a
> chardev connected, the virtqueue_num_heads() function gets called, which
> results in the 'Guest moved used index' message.
> 
> 		Amit

Guest moved used index means a vring related bug?
When the ring is reset both host and guest should start from 0,
and index must be 0 too.

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-11-05 10:15         ` Michael S. Tsirkin
@ 2010-11-22 13:45           ` Amit Shah
  2010-11-22 14:54             ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2010-11-22 13:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Matthew Garrett, Anthony Liguori, Virtualization List

(Adding Matthew Garrett and Vadim Rosenfeld)

On (Fri) Nov 05 2010 [12:15:36], Michael S. Tsirkin wrote:
> I still don't see it.  why don't we reset on resume?
> If there's a reset host must either discard or
> flush out operations in progress.

OK, let me list how virtio-serial works, then let's see how Windows and
Linux differ in their suspend/resume implementations and then discuss
solutions.

virtio-serial, in its probe routine, exchanges some information over a
control vq and per-port vqs -- the number of ports, the number of open
ports, the names assigned to the ports, etc.

When Linux starts up, it goes about doing the regular init, calling the
probe routine.  Later, when it detects there is a suspended image
available, it restores that image.

After the image is restored, the vring counts in qemu reflect the
transfer that's taken place in the probe routine, whereas the vring
counts in the guest kernel reflect the pre-suspend values.

The Linux kernel's suspend/resume notifiers currently offer the
following notifications:
- Preparing to go to suspended state (tasks will be frozen now)
- Preparing to restore the image saved at hibernate-time
- Restore succeeded (called after user-space threads are thawed)


What the Windows driver does is destroy all the virtqueues before
suspend and re-init all of them on restore.  This works well, but in the
Linux case, we don't have a notifier that gets called after restore
succeeds and before user-space tasks are thawed, which means for an open
virtio-serial port, a userspace app doing non-stop writes may find the
communication channel broken because the underlying vq vanished.

Windows has one problem with the balloon driver too -- inflate a
balloon, hibernate.  Start the machine, restore image.  Windows thinks
balloon is inflated.  qemu thinks it's deflated.  This can be solved by
Windows providing a balloon size update to qemu but will most perhaps
need a change to qemu.

For network ports, I guess it'll be the same situation - getting the
network up before userspace notices.

		Amit

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-11-22 13:45           ` Amit Shah
@ 2010-11-22 14:54             ` Michael S. Tsirkin
  2010-11-22 15:01               ` Matthew Garrett
  2010-11-23  3:18               ` Amit Shah
  0 siblings, 2 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2010-11-22 14:54 UTC (permalink / raw)
  To: Amit Shah; +Cc: Matthew Garrett, Anthony Liguori, Virtualization List

On Mon, Nov 22, 2010 at 07:15:37PM +0530, Amit Shah wrote:
> (Adding Matthew Garrett and Vadim Rosenfeld)
> 
> On (Fri) Nov 05 2010 [12:15:36], Michael S. Tsirkin wrote:
> > I still don't see it.  why don't we reset on resume?
> > If there's a reset host must either discard or
> > flush out operations in progress.
> 
> OK, let me list how virtio-serial works, then let's see how Windows and
> Linux differ in their suspend/resume implementations and then discuss
> solutions.
> 
> virtio-serial, in its probe routine, exchanges some information over a
> control vq and per-port vqs -- the number of ports, the number of open
> ports, the names assigned to the ports, etc.
> 
> When Linux starts up, it goes about doing the regular init, calling the
> probe routine.  Later, when it detects there is a suspended image
> available, it restores that image.
> 
> After the image is restored, the vring counts in qemu reflect the
> transfer that's taken place in the probe routine, whereas the vring
> counts in the guest kernel reflect the pre-suspend values.
> 
> The Linux kernel's suspend/resume notifiers currently offer the
> following notifications:
> - Preparing to go to suspended state (tasks will be frozen now)
> - Preparing to restore the image saved at hibernate-time
> - Restore succeeded (called after user-space threads are thawed)
> 
> 
> What the Windows driver does is destroy all the virtqueues before
> suspend and re-init all of them on restore.  This works well, but in the
> Linux case, we don't have a notifier that gets called after restore
> succeeds and before user-space tasks are thawed,

Let's add one?

> which means for an open
> virtio-serial port, a userspace app doing non-stop writes may find the
> communication channel broken because the underlying vq vanished.
> 
> Windows has one problem with the balloon driver too -- inflate a
> balloon, hibernate.  Start the machine, restore image.  Windows thinks
> balloon is inflated.  qemu thinks it's deflated.  This can be solved by
> Windows providing a balloon size update to qemu but will most perhaps
> need a change to qemu.
> 
> For network ports, I guess it'll be the same situation - getting the
> network up before userspace notices.
> 
> 		Amit

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-11-22 14:54             ` Michael S. Tsirkin
@ 2010-11-22 15:01               ` Matthew Garrett
  2010-11-23  3:18               ` Amit Shah
  1 sibling, 0 replies; 10+ messages in thread
From: Matthew Garrett @ 2010-11-22 15:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Amit Shah, Anthony Liguori, Virtualization List

On Mon, Nov 22, 2010 at 04:54:54PM +0200, Michael S. Tsirkin wrote:

> > What the Windows driver does is destroy all the virtqueues before
> > suspend and re-init all of them on restore.  This works well, but in the
> > Linux case, we don't have a notifier that gets called after restore
> > succeeds and before user-space tasks are thawed,
> 
> Let's add one?

May be difficult while retaining kABI, but it's certainly something that 
can go upstream. If it's not possible within kABI constraints then we'll 
need a workaround for 6.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] virtio: (Partially) enable suspend/resume support
  2010-11-22 14:54             ` Michael S. Tsirkin
  2010-11-22 15:01               ` Matthew Garrett
@ 2010-11-23  3:18               ` Amit Shah
  1 sibling, 0 replies; 10+ messages in thread
From: Amit Shah @ 2010-11-23  3:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Matthew Garrett, Anthony Liguori, Virtualization List

On (Mon) Nov 22 2010 [16:54:54], Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 07:15:37PM +0530, Amit Shah wrote:
> > 
> > The Linux kernel's suspend/resume notifiers currently offer the
> > following notifications:
> > - Preparing to go to suspended state (tasks will be frozen now)
> > - Preparing to restore the image saved at hibernate-time
> > - Restore succeeded (called after user-space threads are thawed)
> > 
> > 
> > What the Windows driver does is destroy all the virtqueues before
> > suspend and re-init all of them on restore.  This works well, but in the
> > Linux case, we don't have a notifier that gets called after restore
> > succeeds and before user-space tasks are thawed,
> 
> Let's add one?

Yes, that's where we need to start from.

		Amit

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-11-23  3:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-29 12:12 [RFC PATCH] virtio: (Partially) enable suspend/resume support Amit Shah
2010-10-03 15:53 ` Michael S. Tsirkin
2010-10-05 13:45   ` Amit Shah
2010-10-05 15:23     ` Michael S. Tsirkin
2010-10-06 11:54       ` Amit Shah
2010-11-05 10:15         ` Michael S. Tsirkin
2010-11-22 13:45           ` Amit Shah
2010-11-22 14:54             ` Michael S. Tsirkin
2010-11-22 15:01               ` Matthew Garrett
2010-11-23  3:18               ` Amit Shah

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).