From: "Michael S. Tsirkin" <mst@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: qemu-devel@nongnu.org
Subject: [Qemu-devel] Re: [PATCH-RFC 13/13] virtio-net: connect to vhost net backend
Date: Mon, 25 Jan 2010 23:05:13 +0200 [thread overview]
Message-ID: <20100125210513.GD16928@redhat.com> (raw)
In-Reply-To: <4B5E0660.6010400@codemonkey.ws>
On Mon, Jan 25, 2010 at 03:00:16PM -0600, Anthony Liguori wrote:
> On 01/25/2010 02:27 PM, Michael S. Tsirkin wrote:
>> On Mon, Jan 25, 2010 at 01:58:08PM -0600, Anthony Liguori wrote:
>>
>>> On 01/11/2010 11:23 AM, Michael S. Tsirkin wrote:
>>>
>>>> start/stop backend on driver start/stop
>>>>
>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>> ---
>>>> hw/virtio-net.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>> 1 files changed, 40 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>>>> index c2a389f..99169e1 100644
>>>> --- a/hw/virtio-net.c
>>>> +++ b/hw/virtio-net.c
>>>> @@ -17,6 +17,7 @@
>>>> #include "net/tap.h"
>>>> #include "qemu-timer.h"
>>>> #include "virtio-net.h"
>>>> +#include "vhost_net.h"
>>>>
>>>> #define VIRTIO_NET_VM_VERSION 11
>>>>
>>>> @@ -47,6 +48,7 @@ typedef struct VirtIONet
>>>> uint8_t nomulti;
>>>> uint8_t nouni;
>>>> uint8_t nobcast;
>>>> + uint8_t vhost_started;
>>>> struct {
>>>> int in_use;
>>>> int first_multi;
>>>> @@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>> n->nomulti = 0;
>>>> n->nouni = 0;
>>>> n->nobcast = 0;
>>>> + if (n->vhost_started) {
>>>> + vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
>>>> + n->vhost_started = 0;
>>>> + }
>>>>
>>>> /* Flush any MAC and VLAN filter table state */
>>>> n->mac_table.in_use = 0;
>>>> @@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = {
>>>> .link_status_changed = virtio_net_set_link_status,
>>>> };
>>>>
>>>> +static void virtio_net_set_status(struct VirtIODevice *vdev)
>>>> +{
>>>> + VirtIONet *n = to_virtio_net(vdev);
>>>> + if (!n->nic->nc.peer) {
>>>> + return;
>>>> + }
>>>> + if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
>>>> + return;
>>>> + }
>>>> +
>>>> + if (!tap_get_vhost_net(n->nic->nc.peer)) {
>>>> + return;
>>>> + }
>>>> + if (!!n->vhost_started == !!(vdev->status& VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>> + return;
>>>> + }
>>>> + if (vdev->status& VIRTIO_CONFIG_S_DRIVER_OK) {
>>>> + int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
>>>> + if (r< 0) {
>>>> + fprintf(stderr, "unable to start vhost net: "
>>>> + "falling back on userspace virtio\n");
>>>> + } else {
>>>> + n->vhost_started = 1;
>>>> + }
>>>> + } else {
>>>> + vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
>>>> + n->vhost_started = 0;
>>>> + }
>>>> +}
>>>> +
>>>>
>>>>
>>> This function does a number of bad things. It makes virtio-net have
>>> specific knowledge of backends (like tap) and then has virtio-net pass
>>> device specific state (vdev) to a network backend.
>>>
>>> Ultimately, the following things need to happen:
>>>
>>> 1) when a driver is ready to begin operating:
>>> a) virtio-net needs to tell vhost the location of the ring in physical
>>> memory
>>> b) virtio-net needs to tell vhost about any notification it receives
>>> (allowing kvm to shortcut userspace)
>>> c) virtio-net needs to tell vhost about which irq is being used
>>> (allowing kvm to shortcut userspace)
>>>
>>> What this suggests is that we need an API for the network backends to do
>>> the following:
>>>
>>> 1) probe whether ring passthrough is supported
>>> 2) set the *virtual* address of the ring elements
>>> 3) hand the backend some sort of notification object for sending and
>>> receiving notifications
>>> 4) stop ring passthrough
>>>
>> Look at how vnet_hdr is setup: frontend probes backend, and has 'if
>> (backend has vnet header) blabla' vhost is really very similar, so I
>> would like to do it in the same way.
>>
>
> vnet_hdr is IMHO a really bad example to copy from.
>
> vnet_hdr leaks into the migration state via n->has_vnet_hdr. What this
> means is that if you want to migrate from -net tap -net nic,model=virtio
> to -net user -net nic,model=virtio, it will fail.
>
> This is a hard problem to solve in qemu though because it would require
> that we implement software GSO which so far, no one has stepped up to do.
>
> We don't have to repeat this with vhost-net though because unlike
> vnet_hdr, we don't have to expose anything to the guest.
>
>> Generally I do not believe in abstractions that only have one
>> implementation behind it: you only know how to abstract interface after you
>> have more than one implementation. So whoever writes another frontend
>> that can use vhost will be in a better position to add infrastructure to
>> abstract both that new thing and virtio.
>>
>
> I agree with you, but at the same time, I also believe that layering
> violations should be avoided. I'm not suggesting that you need to make
> anything other than the vhost-net + virtio-net case work. Just make the
> interfaces abstract enough that you don't need to hand a vdev to
> vhost-net and such that you don't have to pass kvm specific data
> structure (ioeventfd) in what are supposed to be generic interfaces.
>
>>> vhost should not need any direct knowledge of the device. This
>>> interface should be enough to communicating the required data. I think
>>> the only bit that is a little fuzzy and perhaps non-obvious for the
>>> current patches is the notification object. I would expect it look
>>> something like:
>>>
>>> typedef struct IOEvent {
>>> int type;
>>> void (*notify)(IOEvent *);
>>> void (*on_notify)(IOEvent *, void (*cb)(IOEvent *, void *));
>>> } IOEvent;
>>> And then we would have
>>>
>>> typedef struct KVMIOEvent {
>>> IOEvent event = {.type = KVM_IO_EVENT};
>>> int fd;
>>> } KVMIOEvent;
>>>
>>> if (kvm_enabled()) in virtio-net, we would allocate a KVMIOEvent for the
>>> PIO notification and hand that to vhost. vhost would check event.type
>>> and if it's KVM_IOEVENT, downcast and use that to get at the file
>>> descriptor.
>>>
>> Since we don't have any other IOEvents, I just put the fd
>> in the generic structure directly. If other types surface
>> we'll see how to generalize it.
>>
>
> I'd feel a whole lot better if we didn't pass the fd around and instead
> passed around a structure. With just a tiny bit of layering, we can
> even avoid making that structure KVM specific :-)
>
>>> The KVMIOEvent should be created, not in the set status callback, but in
>>> the PCI map callback. And in the PCI map callback, cpu_single_env
>>> should be passed to a kvm specific function to create this KVMIOEvent
>>> object that contains the created eventfd() that's handed to kvm via
>>> ioctl.
>>>
>> So this specific thing does not work very well because with irqchip, we
>> want to bypass notification and send irq directly from kernel.
>> So I created a structure but it does not have callbacks,
>> only the fd.
>>
>
> Your structure is an int, right?
it *has* an int:
struct EventNotifier {
int fd;
};
> I understand the limits due to lack of
> in-kernel irqchip but I still think passing around an fd is a mistake.
So API's will all use EventNotifier *.
We'll be able to add downcasting etc if/when we need it.
>>> There
>>> should also be strong separation between the vhost-net code and the
>>> virtio-net device.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>
>> I don't see the point of this last idea. vhost is virtio accelerator,
>> not a generic network backend. Whoever wants to use vhost for
>> non-virtio frontends will have to add infrastructure for this
>> separation, but I do not believe it's practical or desirable.
>>
>
> vhost is a userspace interface to inject packets into a network device
> just like a raw socket or a tun/tap device is. It happens to have some
> interesting features like it supports remapping of physical addresses
> and it implements interfaces for notifications that can conveniently be
> used by KVM to bypass userspace in the fast paths.
>
> We should think of it this way for the same reason that vhost-net
> doesn't live in kvm.ko. If it's easy to isolate something and make it
> more generic, it's almost always the right thing to do. In this case,
> isolating vhost-net from virtio-net in qemu just involves passing some
> information in a function call verses passing a non-public data
> structure that is then accessed directly. Regardless of whether you
> agree with my view of vhost-net, the argument alone to avoid making a
> non-public structure public should be enough of an argument.
>
> Regards,
>
> Anthony Liguori
I'll add accessors, it's not a big deal.
next prev parent reply other threads:[~2010-01-25 21:08 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1263230079.git.mst@redhat.com>
2010-01-11 17:16 ` [Qemu-devel] [PATCH-RFC 01/13] virtio: export virtqueue structure Michael S. Tsirkin
2010-01-12 22:32 ` [Qemu-devel] " Anthony Liguori
2010-01-25 19:10 ` Michael S. Tsirkin
2010-01-25 19:23 ` Anthony Liguori
2010-01-11 17:17 ` [Qemu-devel] [PATCH-RFC 02/13] kvm: add API to set ioeventfd Michael S. Tsirkin
2010-01-12 22:35 ` [Qemu-devel] " Anthony Liguori
2010-01-25 19:20 ` Michael S. Tsirkin
2010-01-25 19:28 ` Anthony Liguori
2010-01-25 19:28 ` Michael S. Tsirkin
2010-01-25 19:44 ` Anthony Liguori
2010-01-11 17:17 ` [Qemu-devel] [PATCH-RFC 03/13] virtio: add iofd/irqfd support Michael S. Tsirkin
2010-01-12 22:36 ` [Qemu-devel] " Anthony Liguori
2010-01-13 10:50 ` Michael S. Tsirkin
2010-01-11 17:17 ` [Qemu-devel] [PATCH-RFC 04/13] virtio-pci: fill in irqfd/queufd support Michael S. Tsirkin
2010-01-11 17:19 ` [Qemu-devel] [PATCH-RFC 05/13] syborg_virtio: add irqfd/eventfd support Michael S. Tsirkin
2010-01-11 17:20 ` [Qemu-devel] [PATCH-RFC 06/13] s390-virtio: fill in irqfd support Michael S. Tsirkin
2010-01-11 17:20 ` [Qemu-devel] [PATCH-RFC 07/13] virtio: move typedef to qemu-common Michael S. Tsirkin
2010-01-11 17:20 ` [Qemu-devel] [PATCH-RFC 08/13] net/tap: add interface to get device fd Michael S. Tsirkin
2010-01-11 17:22 ` [Qemu-devel] [PATCH-RFC 09/13] tap: add vhost/vhostfd options Michael S. Tsirkin
2010-01-12 22:39 ` [Qemu-devel] " Anthony Liguori
2010-01-13 10:59 ` Michael S. Tsirkin
2010-01-12 22:42 ` Anthony Liguori
2010-01-11 17:22 ` [Qemu-devel] [PATCH-RFC 10/13] tap: add API to retrieve vhost net header Michael S. Tsirkin
2010-01-11 17:23 ` [Qemu-devel] [PATCH-RFC 12/13] virtio: add status change callback Michael S. Tsirkin
2010-01-11 17:23 ` [Qemu-devel] [PATCH-RFC 13/13] virtio-net: connect to vhost net backend Michael S. Tsirkin
2010-01-25 19:58 ` [Qemu-devel] " Anthony Liguori
2010-01-25 20:27 ` Michael S. Tsirkin
2010-01-25 21:00 ` Anthony Liguori
2010-01-25 21:01 ` Michael S. Tsirkin
2010-01-25 21:05 ` Michael S. Tsirkin [this message]
2010-01-25 21:11 ` Anthony Liguori
2010-02-24 3:14 ` Paul Brook
2010-02-24 5:29 ` Michael S. Tsirkin
2010-02-24 11:30 ` Paul Brook
2010-02-24 11:46 ` Michael S. Tsirkin
2010-02-24 12:26 ` Paul Brook
2010-02-24 12:40 ` Michael S. Tsirkin
2010-02-24 15:16 ` Anthony Liguori
2010-02-24 14:51 ` Anthony Liguori
2010-01-11 17:23 ` [Qemu-devel] [PATCH-RFC 11/13] vhost net support Michael S. Tsirkin
2010-01-12 22:45 ` [Qemu-devel] " Anthony Liguori
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100125210513.GD16928@redhat.com \
--to=mst@redhat.com \
--cc=anthony@codemonkey.ws \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).