Linux virtualization list
* Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
From: Stefan Hajnoczi @ 2011-12-07  9:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, kvm, mst, netdev, virtualization, levinsasha928,
	bhutchings
In-Reply-To: <4EDED785.80804@redhat.com>

On Wed, Dec 7, 2011 at 3:03 AM, Jason Wang <jasowang@redhat.com> wrote:
> On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:
>>
>> On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang<jasowang@redhat.com>  wrote:
>>>
>>> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
>>>>
>>>> On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang<jasowang@redhat.com>
>>>>  wrote:
>>>>>
>>>>> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>>>>>
>>>>>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang<jasowang@redhat.com>
>>>>>>  wrote:
>>>>
>>>> The vcpus are just threads and may not be bound to physical CPUs, so
>>>> what is the big picture here?  Is the guest even in the position to
>>>> set the best queue mappings today?
>>>
>>>
>>> Not sure it could publish the best mapping, but the idea is to make sure
>>> the packets of a flow are handled by the same guest vcpu, and maybe the
>>> same vhost thread, in order to eliminate packet reordering and lock
>>> contention. But this assumption does not take into account the bouncing
>>> of vhost or vcpu threads, which would also affect the result.
>>
>> Okay, this is why I'd like to know what the big picture here is.  What
>> solution are you proposing?  How are we going to have everything from
>> guest application, guest kernel, host threads, and host NIC driver
>> play along so we get the right steering up the entire stack.  I think
>> there needs to be an answer to that before changing virtio-net to add
>> any steering mechanism.
>
>
> Considering the complexity of host NICs, each with its own steering
> features, this series takes a first step, with minimal effort, toward
> letting the guest driver and host tap/macvtap co-operate the way a
> physical NIC does. There may be other methods, but performance numbers
> are also needed to give the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

Stefan
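
For readers following the thread: the goal described in the quoted text (every packet of a flow lands on the same queue, and so on the same vcpu and possibly the same vhost thread) can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration. `struct flow_key`, `flow_hash()`, `steer_record()`, `steer_lookup()` and the table size are hypothetical names and values; they are not taken from the RFC series or from any physical NIC.

```c
#include <stdint.h>

#define STEER_TABLE_SIZE 256u      /* hypothetical size; must be a power of two */
#define STEER_NO_ENTRY   0xffffu   /* marks an unused table slot */

/* Hypothetical flow identity: an IPv4 address/port 4-tuple. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Maps a flow-hash bucket to the queue that last carried the flow. */
struct steer_table {
	uint16_t queue[STEER_TABLE_SIZE];
};

/* FNV-1a style byte mix over the key fields (illustrative only). */
uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t words[3] = { k->saddr, k->daddr,
			      ((uint32_t)k->sport << 16) | k->dport };
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xffu;
			h *= 16777619u;
		}
	}
	return h;
}

void steer_init(struct steer_table *t)
{
	for (unsigned int i = 0; i < STEER_TABLE_SIZE; i++)
		t->queue[i] = STEER_NO_ENTRY;
}

/* Transmit side: remember which queue this flow was sent on. */
void steer_record(struct steer_table *t, const struct flow_key *k, uint16_t q)
{
	t->queue[flow_hash(k) & (STEER_TABLE_SIZE - 1)] = q;
}

/*
 * Receive side: use the recorded queue if there is a valid one,
 * otherwise fall back to hash % nqueues.  nqueues must be > 0.
 */
uint16_t steer_lookup(const struct steer_table *t, const struct flow_key *k,
		      uint16_t nqueues)
{
	uint32_t h = flow_hash(k);
	uint16_t q = t->queue[h & (STEER_TABLE_SIZE - 1)];

	if (q == STEER_NO_ENTRY || q >= nqueues)
		return (uint16_t)(h % nqueues);
	return q;
}
```

The sketch only shows that a given flow keeps mapping to one queue. It says nothing about where the vcpu or vhost thread serving that queue is scheduled, which is the open question raised in this thread.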


* Re: [PATCH v4 00/12] virtio: s4 support
From: Amit Shah @ 2011-12-07  7:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Michael S. Tsirkin, levinsasha928,
	Virtualization List
In-Reply-To: <87wra8j13m.fsf@rustcorp.com.au>

On (Wed) 07 Dec 2011 [17:54:29], Rusty Russell wrote:
> On Wed,  7 Dec 2011 01:18:38 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> > Hi,
> > 
> > These patches add support for S4 to virtio (pci) and all drivers.
> 
> Dumb meta-question: why do we want to hibernate virtual machines?

Not a dumb question at all :)

But that doesn't mean I can't give a dumb answer: "Because We Can".

> I figure there's a reason, but it seems a bit weird :)

Well, there is one reason right now: migrating storage along with
VMs.  The guest needs to sync all data to the disk before the target
host accesses the image file.  One way to make sure guests don't access
the disk is by adding a new guest command to stop disk accesses.
However, we already have one way of making guests stop doing whatever
they are by putting them into S4 state, and then waking them up on the
remote, with them thinking nothing about them has changed.

(Did I manage to make this sound desirable after the answer above? :)

	     Amit


* Re: [net-next RFC PATCH 0/5] Series short description
From: Rusty Russell @ 2011-12-07  7:30 UTC (permalink / raw)
  To: Jason Wang, krkumar2, kvm, mst, netdev, virtualization,
	levinsasha928, bhutchings
In-Reply-To: <20111205085603.6116.65101.stgit@dhcp-8-146.nay.redhat.com>

On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> multiple queue virtio-net: flow steering through host/guest cooperation
> 
> Hello all:
> 
> This is a rough series that adds guest/host cooperation for flow
> steering, based on Krish Kumar's multiple queue virtio-net
> driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.


* Re: [PATCH v4 00/12] virtio: s4 support
From: Rusty Russell @ 2011-12-07  7:24 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

On Wed,  7 Dec 2011 01:18:38 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> Hi,
> 
> These patches add support for S4 to virtio (pci) and all drivers.

Dumb meta-question: why do we want to hibernate virtual machines?

I figure there's a reason, but it seems a bit weird :)

Thanks,
Rusty.


* Re: [PATCH v4 12/12] virtio: balloon: Add freeze, restore handlers to support S4
From: Amit Shah @ 2011-12-07  4:50 UTC (permalink / raw)
  To: Virtualization List; +Cc: linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <5deccc36afa59032f0e3b10a653773bad511f303.1323199985.git.amit.shah@redhat.com>

On (Wed) 07 Dec 2011 [01:18:50], Amit Shah wrote:

[snip]

> Now to not race with a host issuing ballooning requests while we are in
> the process of freezing, we just exit from the vballoon kthread when the
> processes are asked to freeze.  Upon thaw and restore, we re-start the
> thread.

Actually this isn't necessary.  I over-zealously killed the thread
when it's not really necessary: the thread is frozen before calling
the freeze() callback and is thawed only after the restore() or thaw()
callbacks are done, so we're exactly in the same state with or without
keeping the kthread around (just that the PID of the kthread will
change).  So I'll back out this change for the next revision.


		Amit


* Re: [PATCH v4 01/12] virtio: pci: switch to new PM API
From: Amit Shah @ 2011-12-07  3:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Michael S. Tsirkin, levinsasha928,
	Virtualization List
In-Reply-To: <201112062312.36498.rjw@sisk.pl>

Hi Rafael,

On (Tue) 06 Dec 2011 [23:12:36], Rafael J. Wysocki wrote:
> Hi,
> 
> On Tuesday, December 06, 2011, Amit Shah wrote:
> > The older PM API doesn't have a way to get notifications on hibernate
> > events.  Switch to the newer one that gives us those notifications.
> > 
> > Signed-off-by: Amit Shah <amit.shah@redhat.com>
> > ---
> >  drivers/virtio/virtio_pci.c |   16 ++++++++++++----
> >  1 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > index 03d1984..23e1532 100644
> > --- a/drivers/virtio/virtio_pci.c
> > +++ b/drivers/virtio/virtio_pci.c
> > @@ -708,19 +708,28 @@ static void __devexit virtio_pci_remove(struct pci_dev *pci_dev)
> >  }
> >  
> >  #ifdef CONFIG_PM
> > -static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
> > +static int virtio_pci_suspend(struct device *dev)
> >  {
> > +	struct pci_dev *pci_dev = to_pci_dev(dev);
> > +
> >  	pci_save_state(pci_dev);
> >  	pci_set_power_state(pci_dev, PCI_D3hot);
> >  	return 0;
> >  }
> >  
> > -static int virtio_pci_resume(struct pci_dev *pci_dev)
> > +static int virtio_pci_resume(struct device *dev)
> >  {
> > +	struct pci_dev *pci_dev = to_pci_dev(dev);
> > +
> >  	pci_restore_state(pci_dev);
> >  	pci_set_power_state(pci_dev, PCI_D0);
> >  	return 0;
> >  }
> > +
> > +static const struct dev_pm_ops virtio_pci_pm_ops = {
> > +	.suspend = virtio_pci_suspend,
> > +	.resume  = virtio_pci_resume,
> > +};
> >  #endif
> 
> You seem to have forgotten about hibernation callbacks.

This patch just moves to the new API keeping everything else the same.
The hibernation callbacks come in patch 2.

>  Please use
> one of the macros defined in include/linux/pm.h if you want to use the same
> callback routines for hibernation.

No, they're different functions, so I don't use the macros.

Thanks,

		Amit
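
To make the exchange above concrete: the macros in include/linux/pm.h (for example SIMPLE_DEV_PM_OPS) point the hibernation callbacks at the same two routines used for suspend and resume, which is why they do not fit when freeze and restore need their own functions. The sketch below is a userspace model only, written under that assumption. `struct model_pm_ops`, `MODEL_SAME_CALLBACKS` and the `demo_*` functions are hypothetical stand-ins, not kernel definitions and not the code from patch 2.

```c
/* Hypothetical device with a single state field. */
struct model_dev {
	int state;
};

/* Userspace stand-in for a subset of the kernel's struct dev_pm_ops. */
struct model_pm_ops {
	int (*suspend)(struct model_dev *);
	int (*resume)(struct model_dev *);
	int (*freeze)(struct model_dev *);
	int (*thaw)(struct model_dev *);
	int (*restore)(struct model_dev *);
};

/* Models the "same two routines for everything" style of macro. */
#define MODEL_SAME_CALLBACKS(suspend_fn, resume_fn) { \
	.suspend = (suspend_fn), .resume = (resume_fn), \
	.freeze  = (suspend_fn), .thaw   = (resume_fn), \
	.restore = (resume_fn), \
}

enum { DEMO_ON, DEMO_SUSPENDED, DEMO_FROZEN };

int demo_suspend(struct model_dev *d) { d->state = DEMO_SUSPENDED; return 0; }
int demo_resume(struct model_dev *d)  { d->state = DEMO_ON; return 0; }
int demo_freeze(struct model_dev *d)  { d->state = DEMO_FROZEN; return 0; }
int demo_restore(struct model_dev *d) { d->state = DEMO_ON; return 0; }

/* Variant 1: hibernation reuses the suspend/resume routines. */
const struct model_pm_ops shared_ops =
	MODEL_SAME_CALLBACKS(demo_suspend, demo_resume);

/* Variant 2: hibernation gets its own routines, filled in by hand. */
const struct model_pm_ops split_ops = {
	.suspend = demo_suspend,
	.resume  = demo_resume,
	.freeze  = demo_freeze,
	.thaw    = demo_restore,
	.restore = demo_restore,
};
```

With the first variant, `freeze` and `suspend` are the same pointer; with the second, each hibernation callback can do different work, which matches the reason given above for not using the macros.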


* Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
From: Jason Wang @ 2011-12-07  3:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: krkumar2, kvm, mst, netdev, virtualization, levinsasha928,
	bhutchings
In-Reply-To: <CAJSP0QXsLwvH5xYj6h0E_V4VLg6DuUc-GKXu9esEYzL2MFcFGw@mail.gmail.com>

On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:
> On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang<jasowang@redhat.com>  wrote:
>> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
>>> On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang<jasowang@redhat.com>    wrote:
>>>> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>>>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang<jasowang@redhat.com>
>>>>>   wrote:
>>> The vcpus are just threads and may not be bound to physical CPUs, so
>>> what is the big picture here?  Is the guest even in the position to
>>> set the best queue mappings today?
>>
>> Not sure it could publish the best mapping, but the idea is to make sure
>> the packets of a flow are handled by the same guest vcpu, and maybe the
>> same vhost thread, in order to eliminate packet reordering and lock
>> contention. But this assumption does not take into account the bouncing
>> of vhost or vcpu threads, which would also affect the result.
> Okay, this is why I'd like to know what the big picture here is.  What
> solution are you proposing?  How are we going to have everything from
> guest application, guest kernel, host threads, and host NIC driver
> play along so we get the right steering up the entire stack.  I think
> there needs to be an answer to that before changing virtio-net to add
> any steering mechanism.

Considering the complexity of host NICs, each with its own steering
features, this series takes a first step, with minimal effort, toward
letting the guest driver and host tap/macvtap co-operate the way a
physical NIC does. There may be other methods, but performance numbers
are also needed to give the answer.


* Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
From: Sridhar Samudrala @ 2011-12-06 23:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, xma, kvm, virtualization, levinsasha928, netdev,
	bhutchings
In-Reply-To: <20111206161422.GA3245@redhat.com>

On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:
> On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:
>> On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
>>> On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang<jasowang@redhat.com>   wrote:
>>>> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang<jasowang@redhat.com>     wrote:
>>>>>> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>>>>>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang<jasowang@redhat.com>
>>>>>>>   wrote:
>>>>> The vcpus are just threads and may not be bound to physical CPUs, so
>>>>> what is the big picture here?  Is the guest even in the position to
>>>>> set the best queue mappings today?
>>>> Not sure it could publish the best mapping, but the idea is to make sure
>>>> the packets of a flow are handled by the same guest vcpu, and maybe the
>>>> same vhost thread, in order to eliminate packet reordering and lock
>>>> contention. But this assumption does not take into account the bouncing
>>>> of vhost or vcpu threads, which would also affect the result.
>>> Okay, this is why I'd like to know what the big picture here is.  What
>>> solution are you proposing?  How are we going to have everything from
>>> guest application, guest kernel, host threads, and host NIC driver
>>> play along so we get the right steering up the entire stack.  I think
>>> there needs to be an answer to that before changing virtio-net to add
>>> any steering mechanism.
>>>
>>>
>> Yes. Also the current model of a vhost thread per VM's interface
>> doesn't help with packet steering all the way from the guest to the
>> host physical NIC.
>>
>> I think we need to have vhost thread(s) per-CPU that can handle
>> packets to/from physical NIC's TX/RX queues.
>> Currently we have a single vhost thread for a VM's i/f that handles
>> all the packets from various flows coming from a multi-queue
>> physical NIC.
>>
>> Thanks
>> Sridhar
> It's not hard to try that:
> 1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
>     this will convert our thread to a workqueue
> 2. convert the workqueue to a per-cpu one
>
> It didn't work that well in the past, but YMMV
Yes. I tried this before we went ahead with the per-interface vhost
threading model.
At that time, per-cpu vhost showed a regression with a single VM, and
per-vq vhost showed good performance improvements up to 8 VMs.

So just making it per-cpu would not be enough. I think we may need a way
to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is splitting the vhost thread into
separate threads for TX and RX. Shirley is doing some work in this area
and she is seeing perf. improvements as long as the TX and RX threads
are on the same cpu-socket.
>
> On the surface I'd say a single thread makes some sense
> as long as guest uses a single queue.
>
But this may not be scalable long term when we want to support a large
number of VMs, each having multiple virtio-net interfaces with multiple
queues.

Thanks
Sridhar


* Re: [PATCH v4 01/12] virtio: pci: switch to new PM API
From: Rafael J. Wysocki @ 2011-12-06 22:12 UTC (permalink / raw)
  To: Amit Shah
  Cc: linux-kernel, Michael S. Tsirkin, levinsasha928,
	Virtualization List
In-Reply-To: <f34bcde3d5103640176cf6a2cf8c534417771f08.1323199985.git.amit.shah@redhat.com>

Hi,

On Tuesday, December 06, 2011, Amit Shah wrote:
> The older PM API doesn't have a way to get notifications on hibernate
> events.  Switch to the newer one that gives us those notifications.
> 
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_pci.c |   16 ++++++++++++----
>  1 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index 03d1984..23e1532 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -708,19 +708,28 @@ static void __devexit virtio_pci_remove(struct pci_dev *pci_dev)
>  }
>  
>  #ifdef CONFIG_PM
> -static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
> +static int virtio_pci_suspend(struct device *dev)
>  {
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +
>  	pci_save_state(pci_dev);
>  	pci_set_power_state(pci_dev, PCI_D3hot);
>  	return 0;
>  }
>  
> -static int virtio_pci_resume(struct pci_dev *pci_dev)
> +static int virtio_pci_resume(struct device *dev)
>  {
> +	struct pci_dev *pci_dev = to_pci_dev(dev);
> +
>  	pci_restore_state(pci_dev);
>  	pci_set_power_state(pci_dev, PCI_D0);
>  	return 0;
>  }
> +
> +static const struct dev_pm_ops virtio_pci_pm_ops = {
> +	.suspend = virtio_pci_suspend,
> +	.resume  = virtio_pci_resume,
> +};
>  #endif

You seem to have forgotten about hibernation callbacks.  Please use
one of the macros defined in include/linux/pm.h if you want to use the same
callback routines for hibernation.

>  static struct pci_driver virtio_pci_driver = {
> @@ -729,8 +738,7 @@ static struct pci_driver virtio_pci_driver = {
>  	.probe		= virtio_pci_probe,
>  	.remove		= __devexit_p(virtio_pci_remove),
>  #ifdef CONFIG_PM
> -	.suspend	= virtio_pci_suspend,
> -	.resume		= virtio_pci_resume,
> +	.driver.pm	= &virtio_pci_pm_ops,
>  #endif
>  };

Thanks,
Rafael


* [PATCH v4 12/12] virtio: balloon: Add freeze, restore handlers to support S4
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

Handling balloon hibernate / restore is tricky.  If the balloon was
inflated before going into the hibernation state, upon resume, the host
will not have any memory of that.  Any pages that were passed on to the
host earlier would most likely be invalid, and the host will have to
re-balloon to the previous value to get in the pre-hibernate state.

So the only sane thing for the guest to do here is to discard all the
pages that were put in the balloon.  When to discard the pages is the
next question.

One solution is to deflate the balloon just before writing the image to
the disk (in the freeze() PM callback).  However, asking for pages from
the host just to discard them immediately after seems wasteful of
resources.  Hence, it makes sense to do this by just fudging our
counters soon after wakeup.  This means we don't deflate the balloon
before sleep, and also don't put unnecessary pressure on the host.

This also helps in the thaw case: if the freeze fails for whatever
reason, the balloon should continue to remain in the inflated state.
This was tested by issuing 'swapoff -a' and trying to go into the S4
state.  That fails, and the balloon stays inflated, as expected.  Both
the host and the guest are happy.

Now to not race with a host issuing ballooning requests while we are in
the process of freezing, we just exit from the vballoon kthread when the
processes are asked to freeze.  Upon thaw and restore, we re-start the
thread.

Finally, in the restore() callback, we empty the list of pages that were
previously given off to the host, add the appropriate number of pages to
the totalram_pages counter, reset the num_pages counter to 0, and
all is fine.

As a last step, delete the vqs on the freeze callback to prepare for
hibernation, and re-create them in the restore and thaw callbacks to
resume normal operation.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_balloon.c |   79 ++++++++++++++++++++++++++++++++++++++-
 1 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8bf99be..10ec638 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -258,7 +258,13 @@ static int balloon(void *_vballoon)
 	while (!kthread_should_stop()) {
 		s64 diff;
 
-		try_to_freeze();
+		/*
+		 * On suspend, we want to exit this thread.  We will
+		 * start a new thread on resume.
+		 */
+		if (freezing(current))
+			break;
+
 		wait_event_interruptible(vb->config_change,
 					 (diff = towards_target(vb)) != 0
 					 || vb->need_stats_update
@@ -365,6 +371,72 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
 	kfree(vb);
 }
 
+#ifdef CONFIG_PM
+static int virtballoon_freeze(struct virtio_device *vdev)
+{
+	/* Ensure we don't get any more requests from the host */
+	vdev->config->reset(vdev);
+
+	/*
+	 * The kthread is already gone as a result of the PM code
+	 * issuing a freeze request.
+	 */
+
+	vdev->config->del_vqs(vdev);
+	return 0;
+}
+
+static int restore_common(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+	int err;
+
+	/*
+	 * If init_vqs below fails, a subsequent module removal
+	 * shouldn't cause us to dereference invalid pointers!
+	 */
+	vb->thread = NULL;
+
+	err = init_vqs(vdev->priv);
+	if (err)
+		return err;
+
+	vb->thread = kthread_run(balloon, vb, "vballoon");
+	if (IS_ERR(vb->thread)) {
+		err = PTR_ERR(vb->thread);
+		vb->thread = NULL;
+	}
+	return err;
+}
+
+static int virtballoon_thaw(struct virtio_device *vdev)
+{
+	return restore_common(vdev);
+}
+
+static int virtballoon_restore(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+	struct page *page, *page2;
+
+	/* We're starting from a clean slate */
+	vb->num_pages = 0;
+
+	/*
+	 * If a request wasn't complete at the time of freezing, this
+	 * could have been set.
+	 */
+	vb->need_stats_update = 0;
+
+	/* We don't have these pages in the balloon anymore! */
+	list_for_each_entry_safe(page, page2, &vb->pages, lru) {
+		list_del(&page->lru);
+		totalram_pages++;
+	}
+	return restore_common(vdev);
+}
+#endif
+
 static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
@@ -379,6 +451,11 @@ static struct virtio_driver virtio_balloon_driver = {
 	.probe =	virtballoon_probe,
 	.remove =	__devexit_p(virtballoon_remove),
 	.config_changed = virtballoon_changed,
+#ifdef CONFIG_PM
+	.freeze	=	virtballoon_freeze,
+	.restore =	virtballoon_restore,
+	.thaw =		virtballoon_thaw,
+#endif
 };
 
 static int __init init(void)
-- 
1.7.7.3
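
As an illustration of the counter fix-up described in the commit message (forget the ballooned pages on restore instead of deflating before hibernation), here is a minimal userspace model. It is a sketch under stated assumptions: `struct model_page`, `struct model_balloon`, `model_totalram_pages` and the two functions are hypothetical stand-ins for the driver's page list, `num_pages` and `totalram_pages`, and no host communication is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for a page on the balloon's list. */
struct model_page {
	struct model_page *next;
};

struct model_balloon {
	struct model_page *pages;   /* pages handed to the host */
	unsigned int num_pages;     /* current balloon size, in pages */
	int need_stats_update;
};

/* Models the global count of RAM the guest considers usable. */
unsigned long model_totalram_pages;

/* Inflate by one page: the guest gives it up, so usable RAM shrinks. */
void model_balloon_inflate_one(struct model_balloon *vb, struct model_page *pg)
{
	pg->next = vb->pages;
	vb->pages = pg;
	vb->num_pages++;
	model_totalram_pages--;
}

/*
 * Restore after hibernation: the host no longer knows about the old
 * balloon, so empty the list and give every page back to the usable
 * count, without asking the host for anything.
 */
void model_balloon_restore(struct model_balloon *vb)
{
	vb->num_pages = 0;
	vb->need_stats_update = 0;

	while (vb->pages) {
		struct model_page *pg = vb->pages;

		vb->pages = pg->next;
		pg->next = NULL;
		model_totalram_pages++;
	}
}
```

After `model_balloon_restore()` the model is back at its pre-inflation accounting, which is the state the host would then re-balloon from.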


* [PATCH v4 11/12] virtio: balloon: Move out vq initialization into separate function
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

The probe and PM restore functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_balloon.c |   48 ++++++++++++++++++++++++--------------
 1 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 22f7c69..8bf99be 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -275,32 +275,21 @@ static int balloon(void *_vballoon)
 	return 0;
 }
 
-static int virtballoon_probe(struct virtio_device *vdev)
+static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtio_balloon *vb;
 	struct virtqueue *vqs[3];
 	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
 	const char *names[] = { "inflate", "deflate", "stats" };
 	int err, nvqs;
 
-	vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
-	if (!vb) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	INIT_LIST_HEAD(&vb->pages);
-	vb->num_pages = 0;
-	init_waitqueue_head(&vb->config_change);
-	vb->vdev = vdev;
-	vb->need_stats_update = 0;
-
-	/* We expect two virtqueues: inflate and deflate,
-	 * and optionally stat. */
+	/*
+	 * We expect two virtqueues: inflate and deflate, and
+	 * optionally stat.
+	 */
 	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
-	err = vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names);
+	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
 	if (err)
-		goto out_free_vb;
+		return err;
 
 	vb->inflate_vq = vqs[0];
 	vb->deflate_vq = vqs[1];
@@ -317,6 +306,29 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
 	}
+	return 0;
+}
+
+static int virtballoon_probe(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb;
+	int err;
+
+	vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL);
+	if (!vb) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	INIT_LIST_HEAD(&vb->pages);
+	vb->num_pages = 0;
+	init_waitqueue_head(&vb->config_change);
+	vb->vdev = vdev;
+	vb->need_stats_update = 0;
+
+	err = init_vqs(vb);
+	if (err)
+		goto out_free_vb;
 
 	vb->thread = kthread_run(balloon, vb, "vballoon");
 	if (IS_ERR(vb->thread)) {
-- 
1.7.7.3


* [PATCH v4 10/12] virtio: balloon: ensure thread exists before stopping it
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

The vballoon thread could have exited earlier and not re-started.
Ensure we don't try to stop a non-existent thread.

This can happen if the balloon driver goes into S4 state and the thread
exits (this code lands in the next patch).  If, however, on restore, the
vqs fail to initialise, the vballoon thread will not be re-created.
Upon a subsequent module removal in that state, we will end up
dereferencing an invalid pointer without this patch.
---
 drivers/virtio/virtio_balloon.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 94fd738..22f7c69 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -338,7 +338,9 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
-	kthread_stop(vb->thread);
+	/* Thread may not have started on restore after a suspend */
+	if (vb->thread)
+		kthread_stop(vb->thread);
 
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-- 
1.7.7.3


* [PATCH v4 09/12] virtio: net: Add freeze, restore handlers to support S4
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

Remove all the vqs, disable napi and detach from the netdev on
hibernation.

Re-create vqs after restoring from a hibernated image, re-enable napi
and re-attach the netdev.  This keeps networking working across
hibernation.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/net/virtio_net.c |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 697a0fc..1378f3c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1151,6 +1151,38 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
 	free_netdev(vi->dev);
 }
 
+#ifdef CONFIG_PM
+static int virtnet_freeze(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+
+	netif_device_detach(vi->dev);
+	if (netif_running(vi->dev))
+		napi_disable(&vi->napi);
+
+	remove_vq_common(vi);
+
+	return 0;
+}
+
+static int virtnet_restore(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+	int err;
+
+	err = init_vqs(vi);
+	if (err)
+		return err;
+
+	try_fill_recv(vi, GFP_KERNEL);
+	if (netif_running(vi->dev))
+		virtnet_napi_enable(vi);
+
+	netif_device_attach(vi->dev);
+	return 0;
+}
+#endif
+
 static struct virtio_device_id id_table[] = {
 	{ VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
 	{ 0 },
@@ -1175,6 +1207,10 @@ static struct virtio_driver virtio_net_driver = {
 	.probe =	virtnet_probe,
 	.remove =	__devexit_p(virtnet_remove),
 	.config_changed = virtnet_config_changed,
+#ifdef CONFIG_PM
+	.freeze =	virtnet_freeze,
+	.restore =	virtnet_restore,
+#endif
 };
 
 static int __init init(void)
-- 
1.7.7.3


* [PATCH v4 08/12] virtio: net: Move out vq and vq buf removal into separate function
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

The remove and PM freeze functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/net/virtio_net.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6baa563..697a0fc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1123,24 +1123,29 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	BUG_ON(vi->num != 0);
 }
 
-static void __devexit virtnet_remove(struct virtio_device *vdev)
+static void remove_vq_common(struct virtnet_info *vi)
 {
-	struct virtnet_info *vi = vdev->priv;
-
 	/* Stop all the virtqueues. */
-	vdev->config->reset(vdev);
-
+	vi->vdev->config->reset(vi->vdev);
 
-	unregister_netdev(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
 
-	vdev->config->del_vqs(vi->vdev);
+	vi->vdev->config->del_vqs(vi->vdev);
 
 	while (vi->pages)
 		__free_pages(get_a_page(vi, GFP_KERNEL), 0);
+}
+
+static void __devexit virtnet_remove(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+
+	unregister_netdev(vi->dev);
+
+	remove_vq_common(vi);
 
 	free_percpu(vi->stats);
 	free_netdev(vi->dev);
-- 
1.7.7.3


* [PATCH v4 07/12] virtio: net: Move out vq initialization into separate function
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

The probe and PM restore functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/net/virtio_net.c |   47 +++++++++++++++++++++++++++------------------
 1 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6ee8410..6baa563 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -954,15 +954,38 @@ static void virtnet_config_changed(struct virtio_device *vdev)
 	virtnet_update_status(vi);
 }
 
+static int init_vqs(struct virtnet_info *vi)
+{
+	struct virtqueue *vqs[3];
+	vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
+	const char *names[] = { "input", "output", "control" };
+	int nvqs, err;
+
+	/* We expect two virtqueues, receive then send,
+	 * and optionally control. */
+	nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
+
+	err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
+	if (err)
+		return err;
+
+	vi->rvq = vqs[0];
+	vi->svq = vqs[1];
+
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+		vi->cvq = vqs[2];
+
+		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
+			vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
+	}
+	return 0;
+}
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int err;
 	struct net_device *dev;
 	struct virtnet_info *vi;
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
-	const char *names[] = { "input", "output", "control" };
-	int nvqs;
 
 	/* Allocate ourselves a network device with room for our info */
 	dev = alloc_etherdev(sizeof(struct virtnet_info));
@@ -1034,24 +1057,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
 		vi->mergeable_rx_bufs = true;
 
-	/* We expect two virtqueues, receive then send,
-	 * and optionally control. */
-	nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
-
-	err = vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names);
+	err = init_vqs(vi);
 	if (err)
 		goto free_stats;
 
-	vi->rvq = vqs[0];
-	vi->svq = vqs[1];
-
-	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
-		vi->cvq = vqs[2];
-
-		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
-			dev->features |= NETIF_F_HW_VLAN_FILTER;
-	}
-
 	err = register_netdev(dev);
 	if (err) {
 		pr_debug("virtio_net: registering device failed\n");
-- 
1.7.7.3
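
The point of this refactoring (one helper that both probe and the PM restore path call, with the queue count depending on a negotiated feature) can be shown with a small userspace model. This is a sketch only: `struct model_netdev`, `MODEL_F_CTRL_VQ` and the `model_net_*` functions are hypothetical names, and nothing here touches real virtqueues.

```c
#include <stdint.h>

#define MODEL_F_CTRL_VQ (1u << 0)   /* hypothetical feature bit */

struct model_netdev {
	uint32_t features;   /* negotiated feature bits */
	int nvqs;            /* queues currently set up, 0 if none */
	int has_cvq;         /* 1 if a control queue was set up */
};

/* Shared helper: receive and send queues, plus control if negotiated. */
int model_net_init_vqs(struct model_netdev *d)
{
	d->has_cvq = (d->features & MODEL_F_CTRL_VQ) != 0;
	d->nvqs = d->has_cvq ? 3 : 2;
	return 0;
}

void model_net_del_vqs(struct model_netdev *d)
{
	d->nvqs = 0;
	d->has_cvq = 0;
}

/* First-time setup: record the features, then build the queues. */
int model_net_probe(struct model_netdev *d, uint32_t features)
{
	d->features = features;
	return model_net_init_vqs(d);
}

/* Hibernation: tear the queues down before the image is written. */
int model_net_freeze(struct model_netdev *d)
{
	model_net_del_vqs(d);
	return 0;
}

/* Resume from the image: rebuild the queues with the same helper. */
int model_net_restore(struct model_netdev *d)
{
	return model_net_init_vqs(d);
}
```

Because both paths go through `model_net_init_vqs()`, the queue layout after a restore is the same as after the original probe for the same feature bits.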


* [PATCH v4 06/12] virtio: blk: Add freeze, restore handlers to support S4
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

Delete the vq and flush any pending requests from the block queue on the
freeze callback to prepare for hibernation.

Re-create the vq in the restore callback to resume normal function.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/block/virtio_blk.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 467f218..a9147a6 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -568,6 +568,40 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 	ida_simple_remove(&vd_index_ida, index);
 }
 
+#ifdef CONFIG_PM
+static int virtblk_freeze(struct virtio_device *vdev)
+{
+	struct virtio_blk *vblk = vdev->priv;
+
+	/* Ensure we don't receive any more interrupts */
+	vdev->config->reset(vdev);
+
+	flush_work(&vblk->config_work);
+
+	spin_lock_irq(vblk->disk->queue->queue_lock);
+	blk_stop_queue(vblk->disk->queue);
+	spin_unlock_irq(vblk->disk->queue->queue_lock);
+	blk_sync_queue(vblk->disk->queue);
+
+	vdev->config->del_vqs(vdev);
+	return 0;
+}
+
+static int virtblk_restore(struct virtio_device *vdev)
+{
+	struct virtio_blk *vblk = vdev->priv;
+	int ret;
+
+	ret = init_vq(vdev->priv);
+	if (!ret) {
+		spin_lock_irq(vblk->disk->queue->queue_lock);
+		blk_start_queue(vblk->disk->queue);
+		spin_unlock_irq(vblk->disk->queue->queue_lock);
+	}
+	return ret;
+}
+#endif
+
 static const struct virtio_device_id id_table[] = {
 	{ VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID },
 	{ 0 },
@@ -593,6 +627,10 @@ static struct virtio_driver __refdata virtio_blk = {
 	.probe			= virtblk_probe,
 	.remove			= __devexit_p(virtblk_remove),
 	.config_changed		= virtblk_config_changed,
+#ifdef CONFIG_PM
+	.freeze			= virtblk_freeze,
+	.restore		= virtblk_restore,
+#endif
 };
 
 static int __init init(void)
-- 
1.7.7.3


* [PATCH v4 05/12] virtio: blk: Move out vq initialization to separate function
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

The probe and PM restore functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/block/virtio_blk.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4d0b70a..467f218 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -349,6 +349,18 @@ static void virtblk_config_changed(struct virtio_device *vdev)
 	queue_work(virtblk_wq, &vblk->config_work);
 }
 
+static int init_vq(struct virtio_blk *vblk)
+{
+	int err = 0;
+
+	/* We expect one virtqueue, for output. */
+	vblk->vq = virtio_find_single_vq(vblk->vdev, blk_done, "requests");
+	if (IS_ERR(vblk->vq))
+		err = PTR_ERR(vblk->vq);
+
+	return err;
+}
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
@@ -390,12 +402,9 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	sg_init_table(vblk->sg, vblk->sg_elems);
 	INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
 
-	/* We expect one virtqueue, for output. */
-	vblk->vq = virtio_find_single_vq(vdev, blk_done, "requests");
-	if (IS_ERR(vblk->vq)) {
-		err = PTR_ERR(vblk->vq);
+	err = init_vq(vblk);
+	if (err)
 		goto out_free_vblk;
-	}
 
 	vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
 	if (!vblk->pool) {
-- 
1.7.7.3


* [PATCH v4 04/12] virtio: console: Add freeze and restore handlers to support S4
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

Remove all vqs and associated buffers in the freeze callback which
prepares us to go into hibernation state.  On restore, re-create all the
vqs and populate the input vqs with buffers to get to the pre-hibernate
state.

Note: Any outstanding unconsumed buffers are discarded, which means
there's a possibility of data loss if the host or the guest had not
consumed data already present in the vqs.  This can be addressed in a
later patch series, perhaps in virtio common code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/char/virtio_console.c |   58 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e14f5aa..fd2fd6f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1844,6 +1844,60 @@ static unsigned int features[] = {
 	VIRTIO_CONSOLE_F_MULTIPORT,
 };
 
+#ifdef CONFIG_PM
+static int virtcons_freeze(struct virtio_device *vdev)
+{
+	struct ports_device *portdev;
+	struct port *port;
+
+	portdev = vdev->priv;
+
+	vdev->config->reset(vdev);
+
+	cancel_work_sync(&portdev->control_work);
+	remove_controlq_data(portdev);
+
+	list_for_each_entry(port, &portdev->ports, list) {
+		/*
+		 * We'll ask the host later if the new invocation has
+		 * the port opened or closed.
+		 */
+		port->host_connected = false;
+		remove_port_data(port);
+	}
+	remove_vqs(portdev);
+
+	return 0;
+}
+
+static int virtcons_restore(struct virtio_device *vdev)
+{
+	struct ports_device *portdev;
+	struct port *port;
+	int ret;
+
+	portdev = vdev->priv;
+
+	ret = init_vqs(portdev);
+	if (ret)
+		return ret;
+
+	if (use_multiport(portdev))
+		fill_queue(portdev->c_ivq, &portdev->cvq_lock);
+
+	list_for_each_entry(port, &portdev->ports, list) {
+		port->in_vq = portdev->in_vqs[port->id];
+		port->out_vq = portdev->out_vqs[port->id];
+
+		fill_queue(port->in_vq, &port->inbuf_lock);
+
+		/* Get port open/close status on the host */
+		send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
+	}
+	return 0;
+}
+#endif
+
 static struct virtio_driver virtio_console = {
 	.feature_table = features,
 	.feature_table_size = ARRAY_SIZE(features),
@@ -1853,6 +1907,10 @@ static struct virtio_driver virtio_console = {
 	.probe =	virtcons_probe,
 	.remove =	virtcons_remove,
 	.config_changed = config_intr,
+#ifdef CONFIG_PM
+	.freeze =	virtcons_freeze,
+	.restore =	virtcons_restore,
+#endif
 };
 
 static int __init init(void)
-- 
1.7.7.3


* [PATCH v4 03/12] virtio: console: Move out vq and vq buf removal into separate functions
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

This common code will be shared with the PM freeze function.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/char/virtio_console.c |   68 ++++++++++++++++++++++++-----------------
 1 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 8e3c46d..e14f5aa 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1271,6 +1271,20 @@ static void remove_port(struct kref *kref)
 	kfree(port);
 }
 
+static void remove_port_data(struct port *port)
+{
+	struct port_buffer *buf;
+
+	/* Remove unused data this port might have received. */
+	discard_port_data(port);
+
+	reclaim_consumed_buffers(port);
+
+	/* Remove buffers we queued up for the Host to send us data in. */
+	while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
+		free_buf(buf);
+}
+
 /*
  * Port got unplugged.  Remove port from portdev's list and drop the
  * kref reference.  If no userspace has this port opened, it will
@@ -1278,8 +1292,6 @@ static void remove_port(struct kref *kref)
  */
 static void unplug_port(struct port *port)
 {
-	struct port_buffer *buf;
-
 	spin_lock_irq(&port->portdev->ports_lock);
 	list_del(&port->list);
 	spin_unlock_irq(&port->portdev->ports_lock);
@@ -1300,14 +1312,7 @@ static void unplug_port(struct port *port)
 		hvc_remove(port->cons.hvc);
 	}
 
-	/* Remove unused data this port might have received. */
-	discard_port_data(port);
-
-	reclaim_consumed_buffers(port);
-
-	/* Remove buffers we queued up for the Host to send us data in. */
-	while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
-		free_buf(buf);
+	remove_port_data(port);
 
 	/*
 	 * We should just assume the device itself has gone off --
@@ -1659,6 +1664,28 @@ static const struct file_operations portdev_fops = {
 	.owner = THIS_MODULE,
 };
 
+static void remove_vqs(struct ports_device *portdev)
+{
+	portdev->vdev->config->del_vqs(portdev->vdev);
+	kfree(portdev->in_vqs);
+	kfree(portdev->out_vqs);
+}
+
+static void remove_controlq_data(struct ports_device *portdev)
+{
+	struct port_buffer *buf;
+	unsigned int len;
+
+	if (!use_multiport(portdev))
+		return;
+
+	while ((buf = virtqueue_get_buf(portdev->c_ivq, &len)))
+		free_buf(buf);
+
+	while ((buf = virtqueue_detach_unused_buf(portdev->c_ivq)))
+		free_buf(buf);
+}
+
 /*
  * Once we're further in boot, we get probed like any other virtio
  * device.
@@ -1764,9 +1791,7 @@ free_vqs:
 	/* The host might want to notify mgmt sw about device add failure */
 	__send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
 			   VIRTIO_CONSOLE_DEVICE_READY, 0);
-	vdev->config->del_vqs(vdev);
-	kfree(portdev->in_vqs);
-	kfree(portdev->out_vqs);
+	remove_vqs(portdev);
 free_chrdev:
 	unregister_chrdev(portdev->chr_major, "virtio-portsdev");
 free:
@@ -1804,21 +1829,8 @@ static void virtcons_remove(struct virtio_device *vdev)
 	 * have to just stop using the port, as the vqs are going
 	 * away.
 	 */
-	if (use_multiport(portdev)) {
-		struct port_buffer *buf;
-		unsigned int len;
-
-		while ((buf = virtqueue_get_buf(portdev->c_ivq, &len)))
-			free_buf(buf);
-
-		while ((buf = virtqueue_detach_unused_buf(portdev->c_ivq)))
-			free_buf(buf);
-	}
-
-	vdev->config->del_vqs(vdev);
-	kfree(portdev->in_vqs);
-	kfree(portdev->out_vqs);
-
+	remove_controlq_data(portdev);
+	remove_vqs(portdev);
 	kfree(portdev);
 }
 
-- 
1.7.7.3


* [PATCH v4 02/12] virtio: pci: add PM notification handlers for restore, freeze, thaw, poweroff
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

Handle thaw, restore and freeze notifications from the PM core.  Expose
these to individual virtio drivers that can quiesce and resume vq
operations.  For drivers not implementing the thaw() method, use the
restore method instead.

These functions also save the device status so that the device can be
put back in its pre-freeze state on restore or thaw.  The PCI device is
disabled in the freeze function and re-enabled on restore or thaw.
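
The thaw-to-restore fallback described above can be sketched in plain
userspace C.  This is only an illustration: struct demo_device,
struct demo_driver and demo_bus_thaw() are hypothetical stand-ins for
struct virtio_device, struct virtio_driver and virtio_pci_thaw(), not
the kernel code itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct virtio_device; counts callback use. */
struct demo_device {
	int thaw_calls;
	int restore_calls;
};

/* Hypothetical stand-in for the PM hooks added to struct virtio_driver. */
struct demo_driver {
	int (*freeze)(struct demo_device *dev);
	int (*thaw)(struct demo_device *dev);    /* optional */
	int (*restore)(struct demo_device *dev); /* optional */
};

/* Example driver callbacks that only record that they ran. */
static int demo_thaw(struct demo_device *dev)
{
	dev->thaw_calls++;
	return 0;
}

static int demo_restore(struct demo_device *dev)
{
	dev->restore_calls++;
	return 0;
}

/*
 * Bus-level thaw: prefer the driver's thaw() hook and fall back to
 * restore() when the driver does not provide thaw().
 */
static int demo_bus_thaw(const struct demo_driver *drv,
			 struct demo_device *dev)
{
	if (!drv)
		return 0;
	if (drv->thaw)
		return drv->thaw(dev);
	if (drv->restore)
		return drv->restore(dev);
	return 0;
}
```

A driver that fills in only .restore is therefore still called when a
failed hibernation attempt is thawed.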

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_pci.c |   85 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h      |    5 +++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 23e1532..bd33603 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -55,6 +55,10 @@ struct virtio_pci_device
 	unsigned msix_vectors;
 	/* Vectors allocated, excluding per-vq vectors if any */
 	unsigned msix_used_vectors;
+
+	/* Status saved during hibernate/restore */
+	u8 saved_status;
+
 	/* Whether we have vector per vq */
 	bool per_vq_vectors;
 };
@@ -726,9 +730,90 @@ static int virtio_pci_resume(struct device *dev)
 	return 0;
 }
 
+static int virtio_pci_freeze(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = 0;
+	vp_dev->saved_status = vp_get_status(&vp_dev->vdev);
+	if (drv && drv->freeze)
+		ret = drv->freeze(&vp_dev->vdev);
+
+	if (!ret)
+		pci_disable_device(pci_dev);
+	return ret;
+}
+
+static int restore_common(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	int ret;
+
+	ret = pci_enable_device(pci_dev);
+	if (ret)
+		return ret;
+	pci_set_master(pci_dev);
+	vp_set_status(&vp_dev->vdev, vp_dev->saved_status);
+	vp_finalize_features(&vp_dev->vdev);
+
+	return ret;
+}
+
+static int virtio_pci_restore(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	ret = restore_common(dev);
+	if (!ret && drv && drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	return ret;
+}
+
+static int virtio_pci_thaw(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
+	struct virtio_driver *drv;
+	int ret;
+
+	ret = restore_common(dev);
+	if (ret)
+		return ret;
+
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+	if (!drv)
+		return ret;
+
+	if (drv->thaw)
+		ret = drv->thaw(&vp_dev->vdev);
+	else if (drv->restore)
+		ret = drv->restore(&vp_dev->vdev);
+
+	return ret;
+}
+
 static const struct dev_pm_ops virtio_pci_pm_ops = {
 	.suspend = virtio_pci_suspend,
 	.resume  = virtio_pci_resume,
+	.freeze  = virtio_pci_freeze,
+	.thaw    = virtio_pci_thaw,
+	.restore = virtio_pci_restore,
+	.poweroff = virtio_pci_suspend,
 };
 #endif
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 4c069d8..92902ab 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -146,6 +146,11 @@ struct virtio_driver {
 	int (*probe)(struct virtio_device *dev);
 	void (*remove)(struct virtio_device *dev);
 	void (*config_changed)(struct virtio_device *dev);
+#ifdef CONFIG_PM
+	int (*freeze)(struct virtio_device *dev);
+	int (*thaw)(struct virtio_device *dev);
+	int (*restore)(struct virtio_device *dev);
+#endif
 };
 
 int register_virtio_driver(struct virtio_driver *drv);
-- 
1.7.7.3


* [PATCH v4 01/12] virtio: pci: switch to new PM API
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin
In-Reply-To: <cover.1323199985.git.amit.shah@redhat.com>

The older PM API doesn't have a way to get notifications on hibernate
events.  Switch to the newer one that gives us those notifications.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_pci.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 03d1984..23e1532 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -708,19 +708,28 @@ static void __devexit virtio_pci_remove(struct pci_dev *pci_dev)
 }
 
 #ifdef CONFIG_PM
-static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t state)
+static int virtio_pci_suspend(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
 	pci_save_state(pci_dev);
 	pci_set_power_state(pci_dev, PCI_D3hot);
 	return 0;
 }
 
-static int virtio_pci_resume(struct pci_dev *pci_dev)
+static int virtio_pci_resume(struct device *dev)
 {
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
 	pci_restore_state(pci_dev);
 	pci_set_power_state(pci_dev, PCI_D0);
 	return 0;
 }
+
+static const struct dev_pm_ops virtio_pci_pm_ops = {
+	.suspend = virtio_pci_suspend,
+	.resume  = virtio_pci_resume,
+};
 #endif
 
 static struct pci_driver virtio_pci_driver = {
@@ -729,8 +738,7 @@ static struct pci_driver virtio_pci_driver = {
 	.probe		= virtio_pci_probe,
 	.remove		= __devexit_p(virtio_pci_remove),
 #ifdef CONFIG_PM
-	.suspend	= virtio_pci_suspend,
-	.resume		= virtio_pci_resume,
+	.driver.pm	= &virtio_pci_pm_ops,
 #endif
 };
 
-- 
1.7.7.3


* [PATCH v4 00/12] virtio: s4 support
From: Amit Shah @ 2011-12-06 19:48 UTC (permalink / raw)
  To: Virtualization List
  Cc: Amit Shah, linux-kernel, levinsasha928, Michael S. Tsirkin

Hi,

These patches add support for S4 to virtio (pci) and all drivers.

For each driver, all vqs are removed before hibernation, and then
re-created after restore.  Some driver-specific uninit and init work
is also done in the freeze and restore functions.
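
As a rough userspace illustration of that pattern (not the driver code
from this series), the sketch below tears down a queue and its buffers
on freeze and rebuilds them on restore.  All toy_* names, the buffer
count and the buffer size are assumptions made up for the example.

```c
#include <errno.h>
#include <stdlib.h>

#define TOY_VQ_BUFS 4	/* arbitrary number of input buffers */

/* Hypothetical stand-in for a virtqueue with pre-posted buffers. */
struct toy_vq {
	void *bufs[TOY_VQ_BUFS];
	int nbufs;
};

/* Hypothetical device; vq is NULL while the device is frozen. */
struct toy_dev {
	struct toy_vq *vq;
};

/* Create the queue and fill it with buffers (probe and restore path). */
static int toy_init_vq(struct toy_dev *dev)
{
	struct toy_vq *vq = calloc(1, sizeof(*vq));

	if (!vq)
		return -ENOMEM;

	for (vq->nbufs = 0; vq->nbufs < TOY_VQ_BUFS; vq->nbufs++) {
		vq->bufs[vq->nbufs] = malloc(64);
		if (!vq->bufs[vq->nbufs])
			goto fail;
	}
	dev->vq = vq;
	return 0;

fail:
	while (vq->nbufs > 0)
		free(vq->bufs[--vq->nbufs]);
	free(vq);
	return -ENOMEM;
}

/* Discard outstanding buffers and delete the queue (remove and freeze path). */
static void toy_remove_vq(struct toy_dev *dev)
{
	struct toy_vq *vq = dev->vq;

	if (!vq)
		return;
	while (vq->nbufs > 0)
		free(vq->bufs[--vq->nbufs]);
	free(vq);
	dev->vq = NULL;
}

/* Freeze: nothing of the queue survives hibernation. */
static int toy_freeze(struct toy_dev *dev)
{
	toy_remove_vq(dev);
	return 0;
}

/* Restore: rebuild the queue exactly as probe would. */
static int toy_restore(struct toy_dev *dev)
{
	return toy_init_vq(dev);
}
```

Sharing the init and remove helpers between probe/remove and
freeze/restore mirrors why the series first moves that code into
separate functions.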

All the drivers in testing work fine:

* virtio-blk is used for the only disk in the VM, IO works fine before
  and after.  'dd if=/dev/zero of=/tmp/bigfile bs=1024 count=200000'
  across S4 gives same sha1sum for the file in the guest as well as
  one that's created without invoking S4.

* virtio-console: port IO keeps working fine before and after.
  * If a port is waiting for data from the host (blocking read(2)
    call), this works fine in both the cases: host-side connection is
    available or unavailable after resume.  In case the host-side
    connection isn't available, the blocking call is terminated.  If
    it is available, the call continues to remain in blocked state
    till further data arrives.

* virtio-net: ping remains active across S4.

* virtio-balloon: Works fine before and after.  Forgets the ballooned
  value across S4 (see details in commit log). Maintains ballooned
  value on failed freeze.

All of these tests are run in parallel.

I have some more tests lined up along similar lines.  I'll reply
here if something breaks.

Please review and apply if appropriate,

v4:
 - Disable / enable napi across S4 (Michael S. Tsirkin)
 - Balloon: lots of improvements (I had neglected this driver thinking
   it was a simple one, but this one needed the most thought!  Check
   the commit log for patch 12 for details.)
 - Net, Blk: Reset device as the first operation on freeze

v3:
 - Reset vqs before deleting them (Sasha Levin)
 - Flush block queue before freeze (Rusty)
 - Detach netdev before freeze (Michael S. Tsirkin)

v2:
 - fix checkpatch errors/warnings

Amit Shah (12):
  virtio: pci: switch to new PM API
  virtio: pci: add PM notification handlers for restore, freeze, thaw,
    poweroff
  virtio: console: Move out vq and vq buf removal into separate
    functions
  virtio: console: Add freeze and restore handlers to support S4
  virtio: blk: Move out vq initialization to separate function
  virtio: blk: Add freeze, restore handlers to support S4
  virtio: net: Move out vq initialization into separate function
  virtio: net: Move out vq and vq buf removal into separate function
  virtio: net: Add freeze, restore handlers to support S4
  virtio: balloon: ensure thread exists before stopping it
  virtio: balloon: Move out vq initialization into separate function
  virtio: balloon: Add freeze, restore handlers to support S4

 drivers/block/virtio_blk.c      |   57 +++++++++++++++--
 drivers/char/virtio_console.c   |  126 +++++++++++++++++++++++++++++--------
 drivers/net/virtio_net.c        |  102 ++++++++++++++++++++++--------
 drivers/virtio/virtio_balloon.c |  131 +++++++++++++++++++++++++++++++++------
 drivers/virtio/virtio_pci.c     |  101 +++++++++++++++++++++++++++++-
 include/linux/virtio.h          |    5 ++
 6 files changed, 439 insertions(+), 83 deletions(-)

-- 
1.7.7.3


* Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
From: Ben Hutchings @ 2011-12-06 17:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: krkumar2, kvm, mst, netdev, virtualization, levinsasha928
In-Reply-To: <4EDDC35C.2070100@redhat.com>

On Tue, 2011-12-06 at 15:25 +0800, Jason Wang wrote:
> On 12/06/2011 04:42 AM, Ben Hutchings wrote:
[...]
> > This is not a proper implementation of ndo_rx_flow_steer.  If you steer
> > a flow by changing the RSS table this can easily cause packet reordering
> > in other flows.  The filtering should be more precise, ideally matching
> > exactly a single flow by e.g. VID and IP 5-tuple.
> >
> > I think you need to add a second hash table which records exactly which
> > flow is supposed to be steered.  Also, you must call
> > rps_may_expire_flow() to check whether an entry in this table may be
> > replaced; otherwise you can cause packet reordering in the flow that was
> > previously being steered.
> >
> > Finally, this function must return the table index it assigned, so that
> > rps_may_expire_flow() works.
> 
> Thanks for the explanation. How about documenting this briefly in scaling.txt?
[...]

I believe scaling.txt is intended for users/administrators, not
developers.

The documentation for implementers of accelerated RFS is in the comment
for struct net_device_ops and the commit message adding it.  But I
really should improve that comment.
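
To make the review comments quoted above concrete, here is a minimal
userspace sketch of an exact-match filter table.  It is not the
virtio-net or any real driver implementation: struct flow_key,
flow_steer() and the may_expire callback are hypothetical, and the
callback merely stands in for the role rps_may_expire_flow() plays in
the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FLOW_TABLE_SIZE 64

/* Exact flow identity: VLAN ID plus IP 5-tuple (illustrative layout). */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint16_t vid;
	uint8_t proto;
};

/* One steering filter; the table index doubles as the filter ID. */
struct flow_filter {
	bool in_use;
	struct flow_key key;
	uint16_t rxq;
	uint32_t flow_id;
};

static struct flow_filter flow_table[FLOW_TABLE_SIZE];

/* Stand-in for rps_may_expire_flow(): may this filter be replaced? */
typedef bool (*may_expire_fn)(uint16_t rxq, uint32_t flow_id,
			      unsigned int filter_id);

/* Trivial example policies, used only to exercise the sketch. */
static bool never_expire(uint16_t rxq, uint32_t flow_id, unsigned int id)
{
	(void)rxq; (void)flow_id; (void)id;
	return false;
}

static bool always_expire(uint16_t rxq, uint32_t flow_id, unsigned int id)
{
	(void)rxq; (void)flow_id; (void)id;
	return true;
}

static bool flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->vid == b->vid && a->proto == b->proto;
}

/* Toy hash; a real driver would use the hash supplied with the packet. */
static unsigned int flow_key_hash(const struct flow_key *k)
{
	uint32_t h = k->saddr ^ (k->daddr * 2654435761u);

	h ^= ((uint32_t)k->sport << 16) | k->dport;
	h ^= ((uint32_t)k->vid << 8) | k->proto;
	h ^= h >> 16;
	return h % FLOW_TABLE_SIZE;
}

/*
 * Steer one exact flow to rxq.  A slot owned by a different flow is
 * only replaced if may_expire() allows it, so the old flow is not
 * reordered.  Returns the table index (filter ID) or -EBUSY.
 */
static int flow_steer(const struct flow_key *key, uint16_t rxq,
		      uint32_t flow_id, may_expire_fn may_expire)
{
	unsigned int idx = flow_key_hash(key);
	struct flow_filter *f = &flow_table[idx];

	if (f->in_use && !flow_key_equal(&f->key, key) &&
	    !may_expire(f->rxq, f->flow_id, idx))
		return -EBUSY;

	f->in_use = true;
	f->key = *key;
	f->rxq = rxq;
	f->flow_id = flow_id;
	return (int)idx;
}
```

Returning the index matters because the expiry check is keyed by that
filter ID; changing a shared hash-to-queue entry instead would move
every flow that hashes to it.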

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support
From: Ben Hutchings @ 2011-12-06 17:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: krkumar2, kvm, mst, netdev, virtualization, levinsasha928
In-Reply-To: <4EDDC27D.9050608@redhat.com>

On Tue, 2011-12-06 at 15:21 +0800, Jason Wang wrote:
> On 12/06/2011 04:09 AM, Ben Hutchings wrote:
> > On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote:
> >> This patch adds a simple flow director to the tun/tap device. It is just a
> >> page that contains the hash-to-queue mapping, which can be changed by
> >> user-space. The backend (tap/macvtap) queries this table to get
> >> the desired queue of a packet when it sends packets to userspace.
> > This is just flow hashing (RSS), not flow steering.
> >
> >> The page address is set through a new kind of ioctl, TUNSETFD, and
> >> is pinned until device exit or until another page is specified.
> > [...]
> >
> > You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead.
> >
> > Ben.
> >
> 
> I don't fully understand this. The page belongs to the guest, and the
> idea is to let the guest driver easily change any entry. It looks like, if
> ethtool_set_rxfh_indir() is used, this kind of change is not easy, as it
> needs one copy and can only accept the whole table as its parameter.

Sorry, yes, I was misreading this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.
From: Miche Baker-Harvey @ 2011-12-06 17:05 UTC (permalink / raw)
  To: Amit Shah
  Cc: Stephen Rothwell, xen-devel, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, linux-kernel, virtualization,
	Anton Blanchard, Mike Waychison, ppc-dev, Greg Kroah-Hartman,
	Eric Northrup
In-Reply-To: <20111205105452.GB27683@amit-x200.redhat.com>

Amit,

Ah, indeed.  I am not using MSI-X, so virtio_pci::vp_try_to_find_vqs()
calls vp_request_intx() and sets up an interrupt callback.  From
there, when an interrupt occurs, the stack looks something like this:

virtio_pci::vp_interrupt()
  virtio_pci::vp_vring_interrupt()
    virtio_ring::vring_interrupt()
      vq->vq.callback()  <-- in this case, that's virtio_console::control_intr()
        workqueue::schedule_work()
          workqueue::queue_work()
            queue_work_on(get_cpu())  <-- queues the work on the current CPU.

I'm not doing anything to keep multiple control messages from being
sent concurrently to the guest, and we will take those interrupts on
any CPU. I've confirmed that the two instances of
handle_control_message() are occurring on different CPUs.

Should this work?  I don't see anywhere that QEMU is serializing the
sending of data to the control queue in the guest, and there's no
serialization in control_intr().  I don't understand why you are not
seeing the concurrent execution of handle_control_message().  Are you
taking all your interrupts on a single CPU, maybe?  Or is there some
other serialization in user space?

Miche


On Mon, Dec 5, 2011 at 2:54 AM, Amit Shah <amit.shah@redhat.com> wrote:
> On (Tue) 29 Nov 2011 [09:50:41], Miche Baker-Harvey wrote:
>> Good grief!  Sorry for the spacing mess-up!  Here's a resend with reformatting.
>>
>> Amit,
>> We aren't using either QEMU or kvmtool, but we are using KVM.  All
>
> So it's a different userspace?  Any chance this different userspace is
> causing these problems to appear?  Esp. since I couldn't reproduce
> with qemu.
>
>> the issues we are seeing happen when we try to establish multiple
>> virtioconsoles at boot time.  The command line isn't relevant, but I
>> can tell you the protocol that's passing between the host (kvm) and
>> the guest (see the end of this message).
>>
>> We do go through the control_work_handler(), but it's not
>> providing synchronization.  Here's a trace of the
>> control_work_handler() and handle_control_message() calls; note that
>> there are two concurrent calls to control_work_handler().
>
> Ah; how does that happen?  control_work_handler() should just be
> invoked once, and if there are any more pending work items to be
> consumed, they should be done within the loop inside
> control_work_handler().
>
>> I decorated control_work_handler() with a "lifetime" marker, and
>> passed this value to handle_control_message(), so we can see which
>> control messages are being handled from which instance of
>> the control_work_handler() thread.
>>
>> Notice that we enter control_work_handler() a second time before
>> the handling of the second PORT_ADD message is complete. The
>> first CONSOLE_PORT message is handled by the second
>> control_work_handler() call, but the second is handled by the first
>> control_work_handler() call.
>>
>> root@myubuntu:~# dmesg | grep MBH
>> [3371055.808738] control_work_handler #1
>> [3371055.809372] + #1 handle_control_message PORT_ADD
>> [3371055.810169] - handle_control_message PORT_ADD
>> [3371055.810170] + #1 handle_control_message PORT_ADD
>> [3371055.810244]  control_work_handler #2
>> [3371055.810245] + #2 handle_control_message CONSOLE_PORT
>> [3371055.810246]  got hvc_ports_mutex
>> [3371055.810578] - handle_control_message PORT_ADD
>> [3371055.810579] + #1 handle_control_message CONSOLE_PORT
>> [3371055.810580]  trylock of hvc_ports_mutex failed
>> [3371055.811352]  got hvc_ports_mutex
>> [3371055.811370] - handle_control_message CONSOLE_PORT
>> [3371055.816609] - handle_control_message CONSOLE_PORT
>>
>> So, I'm guessing the bug is that there shouldn't be two instances of
>> control_work_handler() running simultaneously?
>
> Yep, I assumed we did that but apparently not.  Do you plan to chase
> this one down?
>
>                Amit
>


