From: "Michael S. Tsirkin" <mst@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: amit.shah@redhat.com, quintela@redhat.com, qemu-devel@nongnu.org,
kraxel@redhat.com
Subject: [Qemu-devel] Re: [PATCHv2 09/12] vhost: vhost net support
Date: Sat, 27 Feb 2010 21:38:24 +0200 [thread overview]
Message-ID: <20100227193824.GA26389@redhat.com> (raw)
In-Reply-To: <4B87E62B.5000207@codemonkey.ws>
On Fri, Feb 26, 2010 at 09:18:03AM -0600, Anthony Liguori wrote:
> On 02/26/2010 08:49 AM, Michael S. Tsirkin wrote:
>>
>> KVM code needs all kind of work-arounds for KVM specific issues.
>> It also assumes that KVM is registered at startup, so it
>> does not try to optimize finding slots.
>>
>
> No, the slot mapping changes dynamically so KVM certainly needs to
> optimize this.
Maybe, but it does not, KVM algorithms are n^2 or worse.
> But the point is, why can't we keep a central list of slots somewhere
> that KVM and vhost-net can both use? I'm not saying we use a common
> function to do this work, I'm saying qemu should maintain a proper slot
> list than anyone can access.
>
>> I propose merging this as is, and then someone who has an idea
>> how to do this better can come and unify the code.
>>
>
> Like I said, this has been a huge source of very subtle bugs in the
> past. I'm open to hearing what other people think, but I'm concerned
> that if we merge this code, we'll end up facing some nasty bugs that
> could easily be eliminated by just using the code in kvm-all that has
> already been tested rather extensively.
>
> There really aren't that many work-arounds in the code BTW. The work
> arounds just result in a couple of extra slots too so they shouldn't be
> a burden to vhost.
>
>> Mine has no bugs, let's switch to it!
>>
>> Seriously, need to tread very carefully here.
>> This is why I say: merge it, then look at how to reuse code.
>>
>
> Once it's merged, there's no incentive to look at reusing code.
> Again, I don't think this is a huge burden to vhost. The two bits of code
> literally do exactly the same thing. They just use different data
> structures that ultimately contain the same values.
Not exactly. For example kvm track ROM and video ram addresses.
>>> C++ habits die hard :-)
>>>
>>
>> What's that about?
>>
>
> '++i' is an odd thing to do in C in a for() loop. We're not explicit
> about it in Coding Style but the vast majority of code just does
> 'i++'.
Ugh. Do we really need to specify every little thing?
>>>> + vq->desc = cpu_physical_memory_map(a,&l, 0);
>>>> + if (!vq->desc || l != s) {
>>>> + r = -ENOMEM;
>>>> + goto fail_alloc;
>>>> + }
>>>> + s = l = offsetof(struct vring_avail, ring) +
>>>> + sizeof(u_int64_t) * vq->num;
>>>> + a = virtio_queue_get_avail(vdev, idx);
>>>> + vq->avail = cpu_physical_memory_map(a,&l, 0);
>>>> + if (!vq->avail || l != s) {
>>>> + r = -ENOMEM;
>>>> + goto fail_alloc;
>>>> + }
>>>>
>>>>
>>> You don't unmap avail/desc on failure. map() may fail because the ring
>>> cross MMIO memory and you run out of a bounce buffer.
>>>
>>> IMHO, it would be better to attempt to map the full ring at once and
>>> then if that doesn't succeed, bail out. You can still pass individual
>>> pointers via vhost ioctls but within qemu, it's much easier to deal with
>>> the whole ring at a time.
>>>
>> + a = virtio_queue_get_desc(vdev, idx);
>> I prefer to keep as much logic about ring layout as possible
>> in virtio.c
>>
>
> Well, the downside is that you need to deal with the error path and
> cleanup paths and it becomes more complicated.
>
>>>> + s = l = offsetof(struct vring_used, ring) +
>>>> + sizeof(struct vring_used_elem) * vq->num;
>>>>
>>>>
>>> This is unfortunate. We redefine this structures in qemu to avoid
>>> depending on Linux headers.
>>>
>> And we should for e.g. windows portability.
>>
>>
>>> But you're using the linux versions instead
>>> of the qemu versions. Is it really necessary for vhost.h to include
>>> virtio.h?
>>>
>> Yes. And anyway, vhost does not exist on non-linux systems so there
>> is no issue IMO.
>>
>
> Yeah, like I said, it's unfortunate because it means a read of vhost and
> a reader of virtio.c is likely to get confused. I'm not saying there's
> an easy solution, it's just unfortunate.
>
>>>> + vq->used_phys = a = virtio_queue_get_used(vdev, idx);
>>>> + vq->used = cpu_physical_memory_map(a,&l, 1);
>>>> + if (!vq->used || l != s) {
>>>> + r = -ENOMEM;
>>>> + goto fail_alloc;
>>>> + }
>>>> +
>>>> + r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
>>>> + if (r< 0) {
>>>> + r = -errno;
>>>> + goto fail_alloc;
>>>> + }
>>>> + if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
>>>> + fprintf(stderr, "binding does not support irqfd/queuefd\n");
>>>> + r = -ENOSYS;
>>>> + goto fail_alloc;
>>>> + }
>>>> + r = vdev->binding->guest_notifier(vdev->binding_opaque, idx, true);
>>>> + if (r< 0) {
>>>> + fprintf(stderr, "Error binding guest notifier: %d\n", -r);
>>>> + goto fail_guest_notifier;
>>>> + }
>>>> +
>>>> + r = vdev->binding->host_notifier(vdev->binding_opaque, idx, true);
>>>> + if (r< 0) {
>>>> + fprintf(stderr, "Error binding host notifier: %d\n", -r);
>>>> + goto fail_host_notifier;
>>>> + }
>>>> +
>>>> + file.fd = event_notifier_get_fd(virtio_queue_host_notifier(q));
>>>> + r = ioctl(dev->control, VHOST_SET_VRING_KICK,&file);
>>>> + if (r) {
>>>> + goto fail_kick;
>>>> + }
>>>> +
>>>> + file.fd = event_notifier_get_fd(virtio_queue_guest_notifier(q));
>>>> + r = ioctl(dev->control, VHOST_SET_VRING_CALL,&file);
>>>> + if (r) {
>>>> + goto fail_call;
>>>> + }
>>>>
>>>>
>>> This function would be a bit more reasonable if it were split into
>>> sections FWIW.
>>>
>> Not sure what do you mean here.
>>
>
> Just a suggestion. For instance, moving the setting up of the notifiers
> to a separate function would help with readability IMHO.
Hmm. I'll look into it.
I actually think that for functions that just do a list of things
unconditionally, without branches or loops, or with just error handling
as here, it is perfectly fine for them to be of any length.
>>
>>> You never unmap() the mapped memory and you're cheating by assuming that
>>> the virtio rings have a constant mapping for the life time of a guest.
>>> That's not technically true. My concern is that since a guest can
>>> trigger remappings (by adjusting PCI mappings) badness can ensue.
>>>
>> I do not know how this can happen. What do PCI mappings have to do with this?
>> Please explain. If it can, vhost will need notification to update.
>>
>
> If a guest modifies the bar for an MMIO region such that it happens to
> exist in RAM, while this is a bad thing for the guest to do, I don't
> think we do anything to stop it. When the region gets remapped, the
> result will be that the mapping will change.
So IMO this is the bug. If there's a BAR that matches RAM
physical address, it should never get mapped. Any idea how
to check this?
> Within qemu, because we carry the qemu_mutex, we know that the mappings
> are fixed as long as we're in qemu. We're very careful to assume that
> we don't rely on a mapping past when we drop the qemu_mutex.
>
> With vhost, you register a slot table and update it whenever mappings
> change. I think that's good enough for dealing with ram addresses. But
> you pass the virtual address for the rings and assume those mappings
> never change.
So, the issue IMO is that an MMIO address gets passed instead of RAM.
There's no reason to put virtio rings not in RAM, we just need to
verify this.
>
> I'm pretty sure a guest can cause those to change and I'm not 100% sure,
> but I think it's a potential source of exploits if you assume a mapping.
> In the very least, a guest can trick vhost into writing to ram that it
> wouldn't normally write to.
This seems harmless. guest can write anywhere in ram, anyway.
>>> If you're going this way, I'd suggest making static inlines in the
>>> header file instead of polluting the C file. It's more common to search
>>> within a C file and having two declarations can get annoying.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>> The issue with inline is that this means that virtio net will depend on
>> target (need to be recompiled). As it is, a single object can link with
>> vhost and non-vhost versions.
>>
>
> Fair enough.
>
> Regards,
>
> Anthony Liguori
next prev parent reply other threads:[~2010-02-27 19:41 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-25 18:27 [Qemu-devel] [PATCHv2 00/12] vhost-net: upstream integration Michael S. Tsirkin
2010-02-25 18:27 ` [Qemu-devel] [PATCHv2 05/12] virtio: add APIs for queue fields Michael S. Tsirkin
2010-02-25 18:49 ` Blue Swirl
2010-02-26 14:53 ` Michael S. Tsirkin
2010-02-25 19:25 ` [Qemu-devel] " Anthony Liguori
2010-02-26 8:46 ` Gleb Natapov
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 09/12] vhost: vhost net support Michael S. Tsirkin
2010-02-25 19:04 ` [Qemu-devel] " Juan Quintela
2010-02-26 14:32 ` Michael S. Tsirkin
2010-02-26 14:38 ` Anthony Liguori
2010-02-26 14:54 ` Michael S. Tsirkin
2010-02-25 19:44 ` Anthony Liguori
2010-02-26 14:49 ` Michael S. Tsirkin
2010-02-26 15:18 ` Anthony Liguori
2010-02-27 19:38 ` Michael S. Tsirkin [this message]
2010-02-28 1:59 ` Paul Brook
2010-02-28 10:15 ` Michael S. Tsirkin
2010-02-28 12:45 ` Paul Brook
2010-02-28 14:44 ` Michael S. Tsirkin
2010-02-28 15:23 ` Paul Brook
2010-02-28 15:37 ` Michael S. Tsirkin
2010-02-28 16:02 ` Anthony Liguori
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 02/12] kvm: add API to set ioeventfd Michael S. Tsirkin
2010-02-25 19:19 ` [Qemu-devel] " Anthony Liguori
2010-03-02 17:41 ` Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 04/12] virtio: add notifier support Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 01/12] tap: add interface to get device fd Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 07/12] virtio: move typedef to qemu-common Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 10/12] tap: add vhost/vhostfd options Michael S. Tsirkin
2010-02-25 19:47 ` [Qemu-devel] " Anthony Liguori
2010-02-26 14:51 ` Michael S. Tsirkin
2010-02-26 15:23 ` Anthony Liguori
2010-02-27 19:44 ` Michael S. Tsirkin
2010-02-28 16:08 ` Anthony Liguori
2010-02-28 17:19 ` Michael S. Tsirkin
2010-02-28 20:57 ` Anthony Liguori
2010-02-28 21:01 ` Michael S. Tsirkin
2010-02-28 22:38 ` Anthony Liguori
2010-02-28 22:39 ` Paul Brook
2010-03-01 19:27 ` Michael S. Tsirkin
2010-03-01 21:54 ` Anthony Liguori
2010-03-02 9:57 ` Michael S. Tsirkin
2010-03-02 14:07 ` Anthony Liguori
2010-03-02 14:33 ` Paul Brook
2010-03-02 14:39 ` Anthony Liguori
2010-03-02 14:55 ` Paul Brook
2010-03-02 15:33 ` Anthony Liguori
2010-03-02 15:53 ` Paul Brook
2010-03-02 15:56 ` Michael S. Tsirkin
2010-03-02 16:12 ` Anthony Liguori
2010-03-02 16:21 ` Marcelo Tosatti
2010-03-02 16:12 ` Marcelo Tosatti
2010-03-02 16:56 ` Anthony Liguori
2010-03-02 17:00 ` Michael S. Tsirkin
2010-03-02 18:00 ` Marcelo Tosatti
2010-03-02 18:13 ` Anthony Liguori
2010-03-02 22:41 ` Paul Brook
2010-03-03 14:15 ` Anthony Liguori
2010-03-03 14:43 ` Paul Brook
2010-03-03 16:24 ` Marcelo Tosatti
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 11/12] tap: add API to retrieve vhost net header Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 06/12] virtio: add set_status callback Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 08/12] virtio-pci: fill in notifier support Michael S. Tsirkin
2010-02-25 19:30 ` [Qemu-devel] " Anthony Liguori
2010-02-28 20:02 ` Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 03/12] notifier: event notifier implementation Michael S. Tsirkin
2010-02-25 19:22 ` [Qemu-devel] " Anthony Liguori
2010-02-28 19:59 ` Michael S. Tsirkin
2010-02-25 18:28 ` [Qemu-devel] [PATCHv2 12/12] virtio-net: vhost net support Michael S. Tsirkin
2010-02-25 19:49 ` [Qemu-devel] Re: [PATCHv2 00/12] vhost-net: upstream integration Anthony Liguori
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100227193824.GA26389@redhat.com \
--to=mst@redhat.com \
--cc=amit.shah@redhat.com \
--cc=anthony@codemonkey.ws \
--cc=kraxel@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).