qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: [Qemu-devel] Re: [PATCH-RFC 13/13] virtio-net: connect to vhost net backend
Date: Mon, 25 Jan 2010 15:00:16 -0600	[thread overview]
Message-ID: <4B5E0660.6010400@codemonkey.ws> (raw)
In-Reply-To: <20100125202711.GA16928@redhat.com>

On 01/25/2010 02:27 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 25, 2010 at 01:58:08PM -0600, Anthony Liguori wrote:
>    
>> On 01/11/2010 11:23 AM, Michael S. Tsirkin wrote:
>>      
>>> start/stop backend on driver start/stop
>>>
>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>> ---
>>>    hw/virtio-net.c |   40 ++++++++++++++++++++++++++++++++++++++++
>>>    1 files changed, 40 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>>> index c2a389f..99169e1 100644
>>> --- a/hw/virtio-net.c
>>> +++ b/hw/virtio-net.c
>>> @@ -17,6 +17,7 @@
>>>    #include "net/tap.h"
>>>    #include "qemu-timer.h"
>>>    #include "virtio-net.h"
>>> +#include "vhost_net.h"
>>>
>>>    #define VIRTIO_NET_VM_VERSION    11
>>>
>>> @@ -47,6 +48,7 @@ typedef struct VirtIONet
>>>        uint8_t nomulti;
>>>        uint8_t nouni;
>>>        uint8_t nobcast;
>>> +    uint8_t vhost_started;
>>>        struct {
>>>            int in_use;
>>>            int first_multi;
>>> @@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>        n->nomulti = 0;
>>>        n->nouni = 0;
>>>        n->nobcast = 0;
>>> +    if (n->vhost_started) {
>>> +        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
>>> +        n->vhost_started = 0;
>>> +    }
>>>
>>>        /* Flush any MAC and VLAN filter table state */
>>>        n->mac_table.in_use = 0;
>>> @@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = {
>>>        .link_status_changed = virtio_net_set_link_status,
>>>    };
>>>
>>> +static void virtio_net_set_status(struct VirtIODevice *vdev)
>>> +{
>>> +    VirtIONet *n = to_virtio_net(vdev);
>>> +    if (!n->nic->nc.peer) {
>>> +        return;
>>> +    }
>>> +    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (!tap_get_vhost_net(n->nic->nc.peer)) {
>>> +        return;
>>> +    }
>>> +    if (!!n->vhost_started == !!(vdev->status&   VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> +        return;
>>> +    }
>>> +    if (vdev->status&   VIRTIO_CONFIG_S_DRIVER_OK) {
>>> +        int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
>>> +        if (r<   0) {
>>> +            fprintf(stderr, "unable to start vhost net: "
>>> +                    "falling back on userspace virtio\n");
>>> +        } else {
>>> +            n->vhost_started = 1;
>>> +        }
>>> +    } else {
>>> +        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
>>> +        n->vhost_started = 0;
>>> +    }
>>> +}
>>> +
>>>
>>>        
>> This function does a number of bad things.  It makes virtio-net have
>> specific knowledge of backends (like tap) and then has virtio-net pass
>> device specific state (vdev) to a network backend.
>>
>> Ultimately, the following things need to happen:
>>
>> 1) when a driver is ready to begin operating:
>>    a) virtio-net needs to tell vhost the location of the ring in physical
>> memory
>>    b) virtio-net needs to tell vhost about any notification it receives
>> (allowing kvm to shortcut userspace)
>>    c) virtio-net needs to tell vhost about which irq is being used
>> (allowing kvm to shortcut userspace)
>>
>> What this suggests is that we need an API for the network backends to do
>> the following:
>>
>>   1) probe whether ring passthrough is supported
>>   2) set the *virtual* address of the ring elements
>>   3) hand the backend some sort of notification object for sending and
>> receiving notifications
>>   4) stop ring passthrough
>>      
> Look at how vnet_hdr is setup: frontend probes backend, and has 'if
> (backend has vnet header) blabla' vhost is really very similar, so I
> would like to do it in the same way.
>    

vnet_hdr is IMHO a really bad example to copy from.

vnet_hdr leaks into the migration state via n->has_vnet_hdr.  What this 
means is that if you want to migrate from -net tap -net nic,model=virtio 
to -net user -net nic,model=virtio, it will fail.

This is a hard problem to solve in qemu though because it would require 
that we implement software GSO which so far, no one has stepped up to do.

We don't have to repeat this with vhost-net though because unlike 
vnet_hdr, we don't have to expose anything to the guest.

> Generally I do not believe in abstractions that only have one
> implementation behind it: you only know how to abstract interface after you
> have more than one implementation.  So whoever writes another frontend
> that can use vhost will be in a better position to add infrastructure to
> abstract both that new thing and virtio.
>    

I agree with you, but at the same time, I also believe that layering 
violations should be avoided.  I'm not suggesting that you need to make 
anything other than the vhost-net + virtio-net case work.  Just make the 
interfaces abstract enough that you don't need to hand a vdev to 
vhost-net and such that you don't have to pass kvm specific data 
structure (ioeventfd) in what are supposed to be generic interfaces.

>> vhost should not need any direct knowledge of the device.  This
>> interface should be enough to communicating the required data.  I think
>> the only bit that is a little fuzzy and perhaps non-obvious for the
>> current patches is the notification object.  I would expect it look
>> something like:
>>
>> typedef struct IOEvent {
>>    int type;
>>    void (*notify)(IOEvent *);
>>    void (*on_notify)(IOEvent *, void (*cb)(IOEvent *, void *));
>> } IOEvent;
>> And then we would have
>>
>> typedef struct KVMIOEvent {
>>    IOEvent event = {.type = KVM_IO_EVENT};
>>    int fd;
>> } KVMIOEvent;
>>
>> if (kvm_enabled()) in virtio-net, we would allocate a KVMIOEvent for the
>> PIO notification and hand that to vhost.  vhost would check event.type
>> and if it's KVM_IOEVENT, downcast and use that to get at the file
>> descriptor.
>>      
> Since we don't have any other IOEvents, I just put the fd
> in the generic structure directly. If other types surface
> we'll see how to generalize it.
>    

I'd feel a whole lot better if we didn't pass the fd around and instead 
passed around a structure.  With just a tiny bit of layering, we can 
even avoid making that structure KVM specific :-)

>> The KVMIOEvent should be created, not in the set status callback, but in
>> the PCI map callback.  And in the PCI map callback, cpu_single_env
>> should be passed to a kvm specific function to create this KVMIOEvent
>> object that contains the created eventfd() that's handed to kvm via
>> ioctl.
>>      
> So this specific thing does not work very well because with irqchip, we
> want to bypass notification and send irq directly from kernel.
> So I created a structure but it does not have callbacks,
> only the fd.
>    

Your structure is an int, right?  I understand the limits due to lack of 
in-kernel irqchip but I still think passing around an fd is a mistake.

>>   There
>> should also be strong separation between the vhost-net code and the
>> virtio-net device.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>      
> I don't see the point of this last idea.  vhost is virtio accelerator,
> not a generic network backend.  Whoever wants to use vhost for
> non-virtio frontends will have to add infrastructure for this
> separation, but I do not believe it's practical or desirable.
>    

vhost is a userspace interface to inject packets into a network device 
just like a raw socket or a tun/tap device is.  It happens to have some 
interesting features like it supports remapping of physical addresses 
and it implements interfaces for notifications that can conveniently be 
used by KVM to bypass userspace in the fast paths.

We should think of it this way for the same reason that vhost-net 
doesn't live in kvm.ko.  If it's easy to isolate something and make it 
more generic, it's almost always the right thing to do.  In this case, 
isolating vhost-net from virtio-net in qemu just involves passing some 
information in a function call verses passing a non-public data 
structure that is then accessed directly.  Regardless of whether you 
agree with my view of vhost-net, the argument alone to avoid making a 
non-public structure public should be enough of an argument.

Regards,

Anthony Liguori

  reply	other threads:[~2010-01-25 21:00 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1263230079.git.mst@redhat.com>
2010-01-11 17:16 ` [Qemu-devel] [PATCH-RFC 01/13] virtio: export virtqueue structure Michael S. Tsirkin
2010-01-12 22:32   ` [Qemu-devel] " Anthony Liguori
2010-01-25 19:10     ` Michael S. Tsirkin
2010-01-25 19:23       ` Anthony Liguori
2010-01-11 17:17 ` [Qemu-devel] [PATCH-RFC 02/13] kvm: add API to set ioeventfd Michael S. Tsirkin
2010-01-12 22:35   ` [Qemu-devel] " Anthony Liguori
2010-01-25 19:20     ` Michael S. Tsirkin
2010-01-25 19:28       ` Anthony Liguori
2010-01-25 19:28         ` Michael S. Tsirkin
2010-01-25 19:44           ` Anthony Liguori
2010-01-11 17:17 ` [Qemu-devel] [PATCH-RFC 03/13] virtio: add iofd/irqfd support Michael S. Tsirkin
2010-01-12 22:36   ` [Qemu-devel] " Anthony Liguori
2010-01-13 10:50     ` Michael S. Tsirkin
2010-01-11 17:17 ` [Qemu-devel] [PATCH-RFC 04/13] virtio-pci: fill in irqfd/queufd support Michael S. Tsirkin
2010-01-11 17:19 ` [Qemu-devel] [PATCH-RFC 05/13] syborg_virtio: add irqfd/eventfd support Michael S. Tsirkin
2010-01-11 17:20 ` [Qemu-devel] [PATCH-RFC 06/13] s390-virtio: fill in irqfd support Michael S. Tsirkin
2010-01-11 17:20 ` [Qemu-devel] [PATCH-RFC 07/13] virtio: move typedef to qemu-common Michael S. Tsirkin
2010-01-11 17:20 ` [Qemu-devel] [PATCH-RFC 08/13] net/tap: add interface to get device fd Michael S. Tsirkin
2010-01-11 17:22 ` [Qemu-devel] [PATCH-RFC 09/13] tap: add vhost/vhostfd options Michael S. Tsirkin
2010-01-12 22:39   ` [Qemu-devel] " Anthony Liguori
2010-01-13 10:59     ` Michael S. Tsirkin
2010-01-12 22:42   ` Anthony Liguori
2010-01-11 17:22 ` [Qemu-devel] [PATCH-RFC 10/13] tap: add API to retrieve vhost net header Michael S. Tsirkin
2010-01-11 17:23 ` [Qemu-devel] [PATCH-RFC 12/13] virtio: add status change callback Michael S. Tsirkin
2010-01-11 17:23 ` [Qemu-devel] [PATCH-RFC 13/13] virtio-net: connect to vhost net backend Michael S. Tsirkin
2010-01-25 19:58   ` [Qemu-devel] " Anthony Liguori
2010-01-25 20:27     ` Michael S. Tsirkin
2010-01-25 21:00       ` Anthony Liguori [this message]
2010-01-25 21:01         ` Michael S. Tsirkin
2010-01-25 21:05         ` Michael S. Tsirkin
2010-01-25 21:11           ` Anthony Liguori
2010-02-24  3:14         ` Paul Brook
2010-02-24  5:29           ` Michael S. Tsirkin
2010-02-24 11:30             ` Paul Brook
2010-02-24 11:46               ` Michael S. Tsirkin
2010-02-24 12:26                 ` Paul Brook
2010-02-24 12:40                   ` Michael S. Tsirkin
2010-02-24 15:16                 ` Anthony Liguori
2010-02-24 14:51           ` Anthony Liguori
2010-01-11 17:23 ` [Qemu-devel] [PATCH-RFC 11/13] vhost net support Michael S. Tsirkin
2010-01-12 22:45   ` [Qemu-devel] " Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B5E0660.6010400@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).