From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NZV4l-0007QQ-VY
	for qemu-devel@nongnu.org; Mon, 25 Jan 2010 14:58:20 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NZV4h-0007JJ-Ag
	for qemu-devel@nongnu.org; Mon, 25 Jan 2010 14:58:19 -0500
Received: from [199.232.76.173] (port=44355 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NZV4h-0007J6-52
	for qemu-devel@nongnu.org; Mon, 25 Jan 2010 14:58:15 -0500
Received: from mail-fx0-f222.google.com ([209.85.220.222]:48603)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1NZV4g-00048Z-Hw
	for qemu-devel@nongnu.org; Mon, 25 Jan 2010 14:58:14 -0500
Received: by fxm22 with SMTP id 22so8284955fxm.2
	for <qemu-devel@nongnu.org>; Mon, 25 Jan 2010 11:58:13 -0800 (PST)
Message-ID: <4B5DF7D0.7050609@codemonkey.ws>
Date: Mon, 25 Jan 2010 13:58:08 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <cover.1263230079.git.mst@redhat.com>
	<20100111172326.GB12084@redhat.com>
In-Reply-To: <20100111172326.GB12084@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [PATCH-RFC 13/13] virtio-net: connect to vhost net
	backend
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org

On 01/11/2010 11:23 AM, Michael S. Tsirkin wrote:
> start/stop backend on driver start/stop
>
> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> ---
>   hw/virtio-net.c |   40 ++++++++++++++++++++++++++++++++++++++++
>   1 files changed, 40 insertions(+), 0 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index c2a389f..99169e1 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -17,6 +17,7 @@
>   #include "net/tap.h"
>   #include "qemu-timer.h"
>   #include "virtio-net.h"
> +#include "vhost_net.h"
>
>   #define VIRTIO_NET_VM_VERSION    11
>
> @@ -47,6 +48,7 @@ typedef struct VirtIONet
>       uint8_t nomulti;
>       uint8_t nouni;
>       uint8_t nobcast;
> +    uint8_t vhost_started;
>       struct {
>           int in_use;
>           int first_multi;
> @@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
>       n->nomulti = 0;
>       n->nouni = 0;
>       n->nobcast = 0;
> +    if (n->vhost_started) {
> +        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
> +        n->vhost_started = 0;
> +    }
>
>       /* Flush any MAC and VLAN filter table state */
>       n->mac_table.in_use = 0;
> @@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = {
>       .link_status_changed = virtio_net_set_link_status,
>   };
>
> +static void virtio_net_set_status(struct VirtIODevice *vdev)
> +{
> +    VirtIONet *n = to_virtio_net(vdev);
> +    if (!n->nic->nc.peer) {
> +        return;
> +    }
> +    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> +        return;
> +    }
> +
> +    if (!tap_get_vhost_net(n->nic->nc.peer)) {
> +        return;
> +    }
> +    if (!!n->vhost_started == !!(vdev->status&  VIRTIO_CONFIG_S_DRIVER_OK)) {
> +        return;
> +    }
> +    if (vdev->status&  VIRTIO_CONFIG_S_DRIVER_OK) {
> +        int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
> +        if (r<  0) {
> +            fprintf(stderr, "unable to start vhost net: "
> +                    "falling back on userspace virtio\n");
> +        } else {
> +            n->vhost_started = 1;
> +        }
> +    } else {
> +        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
> +        n->vhost_started = 0;
> +    }
> +}
> +
>    

This function does a number of bad things.  It makes virtio-net have 
specific knowledge of backends (like tap) and then has virtio-net pass 
device specific state (vdev) to a network backend.

Ultimately, the following things need to happen:

1) when a driver is ready to begin operating:
   a) virtio-net needs to tell vhost the location of the ring in 
physical memory
   b) virtio-net needs to tell vhost about any notification it receives 
(allowing kvm to shortcut userspace)
   c) virtio-net needs to tell vhost about which irq is being used 
(allowing kvm to shortcut userspace)

What this suggests is that we need an API for the network backends to do 
the following:

  1) probe whether ring passthrough is supported
  2) set the *virtual* address of the ring elements
  3) hand the backend some sort of notification object for sending and 
receiving notifications
  4) stop ring passthrough

vhost should not need any direct knowledge of the device.  This 
interface should be enough to communicating the required data.  I think 
the only bit that is a little fuzzy and perhaps non-obvious for the 
current patches is the notification object.  I would expect it look 
something like:

typedef struct IOEvent {
   int type;
   void (*notify)(IOEvent *);
   void (*on_notify)(IOEvent *, void (*cb)(IOEvent *, void *));
} IOEvent;

And then we would have

typedef struct KVMIOEvent {
   IOEvent event = {.type = KVM_IO_EVENT};
   int fd;
} KVMIOEvent;

if (kvm_enabled()) in virtio-net, we would allocate a KVMIOEvent for the 
PIO notification and hand that to vhost.  vhost would check event.type 
and if it's KVM_IOEVENT, downcast and use that to get at the file 
descriptor.

The KVMIOEvent should be created, not in the set status callback, but in 
the PCI map callback.  And in the PCI map callback, cpu_single_env 
should be passed to a kvm specific function to create this KVMIOEvent 
object that contains the created eventfd() that's handed to kvm via ioctl.

It doesn't have to be exactly like this, but the point of all of this is 
that these KVM specific mechanism (which are really implementation 
details) should not be pervasive throughout the QEMU interfaces.  There 
should also be strong separation between the vhost-net code and the 
virtio-net device.

Regards,

Anthony Liguori


>   VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
>   {
>       VirtIONet *n;
> @@ -835,6 +871,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
>       n->vdev.set_features = virtio_net_set_features;
>       n->vdev.bad_features = virtio_net_bad_features;
>       n->vdev.reset = virtio_net_reset;
> +    n->vdev.set_status = virtio_net_set_status;
>       n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
>       n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx);
>       n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
> @@ -864,6 +901,9 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
>   void virtio_net_exit(VirtIODevice *vdev)
>   {
>       VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
> +    if (n->vhost_started) {
> +        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
> +    }
>
>       qemu_purge_queued_packets(&n->nic->nc);
>
>