From: Avi Kivity <avi@redhat.com>
To: Gregory Haskins <gregory.haskins@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>,
kvm@vger.kernel.org, alacrityvm-devel@lists.sourceforge.net,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v3 3/6] vbus: add a "vbus-proxy" bus model for vbus_driver objects
Date: Wed, 19 Aug 2009 10:11:42 +0300 [thread overview]
Message-ID: <4A8BA5AE.3030308@redhat.com> (raw)
In-Reply-To: <4A8B9B79.9050004@gmail.com>
On 08/19/2009 09:28 AM, Gregory Haskins wrote:
> Avi Kivity wrote:
>
>> On 08/18/2009 05:46 PM, Gregory Haskins wrote:
>>
>>>
>>>> Can you explain how vbus achieves RDMA?
>>>>
>>>> I also don't see the connection to real time guests.
>>>>
>>>>
>>> Both of these are still in development. Trying to stay true to the
>>> "release early and often" mantra, the core vbus technology is being
>>> pushed now so it can be reviewed. Stay tuned for these other
>>> developments.
>>>
>>>
>> Hopefully you can outline how it works. AFAICT, RDMA and kernel bypass
>> will need device assignment. If you're bypassing the call into the host
>> kernel, it doesn't really matter how that call is made, does it?
>>
> This is for things like the setup of queue-pairs, and the transport of
> door-bells, and ib-verbs. I am not on the team doing that work, so I am
> not an expert in this area. What I do know is having a flexible and
> low-latency signal-path was deemed a key requirement.
>
That's not a full bypass, then. AFAIK kernel bypass has userspace
talking directly to the device.
Given that both virtio and vbus can use ioeventfds, I don't see how one
can perform better than the other.
> For real-time, a big part of it is relaying the guest scheduler state to
> the host, but in a smart way. For instance, the cpu priority for each
> vcpu is in a shared-table. When the priority is raised, we can simply
> update the table without taking a VMEXIT. When it is lowered, we need
> to inform the host of the change in case the underlying task needs to
> reschedule.
>
This is best done using cr8/tpr so you don't have to exit at all. See
also my vtpr support for Windows which does this in software, generally
avoiding the exit even when lowering priority.
> This is where the really fast call() type mechanism is important.
>
> Its also about having the priority flow-end to end, and having the vcpu
> interrupt state affect the task-priority, etc (e.g. pending interrupts
> affect the vcpu task prio).
>
> etc, etc.
>
> I can go on and on (as you know ;), but will wait till this work is more
> concrete and proven.
>
Generally cpu state shouldn't flow through a device but rather through
MSRs, hypercalls, and cpu registers.
> Basically, what it comes down to is both vbus and vhost need
> configuration/management. Vbus does it with sysfs/configfs, and vhost
> does it with ioctls. I ultimately decided to go with sysfs/configfs
> because, at least that the time I looked, it seemed like the "blessed"
> way to do user->kernel interfaces.
>
I really dislike that trend but that's an unrelated discussion.
>> They need to be connected to the real world somehow. What about
>> security? can any user create a container and devices and link them to
>> real interfaces? If not, do you need to run the VM as root?
>>
> Today it has to be root as a result of weak mode support in configfs, so
> you have me there. I am looking for help patching this limitation, though.
>
>
Well, do you plan to address this before submission for inclusion?
>> I hope everyone agrees that it's an important issue for me and that I
>> have to consider non-Linux guests. I also hope that you're considering
>> non-Linux guests since they have considerable market share.
>>
> I didn't mean non-Linux guests are not important. I was disagreeing
> with your assertion that it only works if its PCI. There are numerous
> examples of IHV/ISV "bridge" implementations deployed in Windows, no?
>
I don't know.
> If vbus is exposed as a PCI-BRIDGE, how is this different?
>
Technically it would work, but given you're not interested in Windows,
who would write a driver?
>> Given I'm not the gateway to inclusion of vbus/venet, you don't need to
>> ask me anything. I'm still free to give my opinion.
>>
> Agreed, and I didn't mean to suggest otherwise. It not clear if you are
> wearing the "kvm maintainer" hat, or the "lkml community member" hat at
> times, so its important to make that distinction. Otherwise, its not
> clear if this is edict as my superior, or input as my peer. ;)
>
When I wear a hat, it is a Red Hat. However I am bareheaded most often.
(that is, look at the contents of my message, not who wrote it or his role).
>> With virtio, the number is 1 (or less if you amortize). Set up the ring
>> entries and kick.
>>
> Again, I am just talking about basic PCI here, not the things we build
> on top.
>
Whatever that means, it isn't interesting. Performance is measure for
the whole stack.
> The point is: the things we build on top have costs associated with
> them, and I aim to minimize it. For instance, to do a "call()" kind of
> interface, you generally need to pre-setup some per-cpu mappings so that
> you can just do a single iowrite32() to kick the call off. Those
> per-cpu mappings have a cost if you want them to be high-performance, so
> my argument is that you ideally want to limit the number of times you
> have to do this. My current design reduces this to "once".
>
Do you mean minimizing the setup cost? Seriously?
>> There's no such thing as raw PCI. Every PCI device has a protocol. The
>> protocol virtio chose is optimized for virtualization.
>>
> And its a question of how that protocol scales, more than how the
> protocol works.
>
> Obviously the general idea of the protocol works, as vbus itself is
> implemented as a PCI-BRIDGE and is therefore limited to the underlying
> characteristics that I can get out of PCI (like PIO latency).
>
I thought we agreed that was insignificant?
>> As I've mentioned before, prioritization is available on x86
>>
> But as Ive mentioned, it doesn't work very well.
>
I guess it isn't that important then. I note that clever prioritization
in a guest is pointless if you can't do the same prioritization in the host.
>> , and coalescing scales badly.
>>
> Depends on what is scaling. Scaling vcpus? Yes, you are right.
> Scaling the number of devices? No, this is where it improves.
>
If you queue pending messages instead of walking the device list, you
may be right. Still, if hard interrupt processing takes 10% of your
time you'll only have coalesced 10% of interrupts on average.
>> irq window exits ought to be pretty rare, so we're only left with
>> injection vmexits. At around 1us/vmexit, even 100,000 interrupts/vcpu
>> (which is excessive) will only cost you 10% cpu time.
>>
> 1us is too much for what I am building, IMHO.
You can't use current hardware then.
>> You're free to demultiplex an MSI to however many consumers you want,
>> there's no need for a new bus for that.
>>
> Hmmm...can you elaborate?
>
Point all those MSIs at one vector. Its handler will have to poll all
the attached devices though.
>> Do you use DNS. We use PCI-SIG. If Novell is a PCI-SIG member you can
>> get a vendor ID and control your own virtio space.
>>
> Yeah, we have our own id. I am more concerned about making this design
> make sense outside of PCI oriented environments.
>
IIRC we reuse the PCI IDs for non-PCI.
>>>> That's a bug, not a feature. It means poor scaling as the number of
>>>> vcpus increases and as the number of devices increases.
>>>>
> vcpu increases, I agree (and am ok with, as I expect low vcpu count
> machines to be typical).
I'm not okay with it. If you wish people to adopt vbus over virtio
you'll have to address all concerns, not just yours.
> nr of devices, I disagree. can you elaborate?
>
With message queueing, I retract my remark.
>> Windows,
>>
> Work in progress.
>
Interesting. Do you plan to open source the code? If not, will the
binaries be freely available?
>
>> large guests
>>
> Can you elaborate? I am not familiar with the term.
>
Many vcpus.
>
>> and multiqueue out of your design.
>>
> AFAICT, multiqueue should work quite nicely with vbus. Can you
> elaborate on where you see the problem?
>
You said you aren't interested in it previously IIRC.
>>>> x86 APIC is priority aware.
>>>>
>>>>
>>> Have you ever tried to use it?
>>>
>>>
>> I haven't, but Windows does.
>>
> Yeah, it doesn't really work well. Its an extremely rigid model that
> (IIRC) only lets you prioritize in 16 groups spaced by IDT (0-15 are one
> level, 16-31 are another, etc). Most of the embedded PICs I have worked
> with supported direct remapping, etc. But in any case, Linux doesn't
> support it so we are hosed no matter how good it is.
>
I agree that it isn't very clever (not that I am a real time expert) but
I disagree about dismissing Linux support so easily. If prioritization
is such a win it should be a win on the host as well and we should make
it work on the host as well. Further I don't see how priorities on the
guest can work if they don't on the host.
>>>
>>>
>> They had to build connectors just like you propose to do.
>>
> More importantly, they had to build back-end busses too, no?
>
They had to write 414 lines in drivers/s390/kvm/kvm_virtio.c and
something similar for lguest.
>> But you still need vbus-connector-lguest and vbus-connector-s390 because
>> they all talk to the host differently. So what's changed? the names?
>>
> The fact that they don't need to redo most of the in-kernel backend
> stuff. Just the connector.
>
So they save 414 lines but have to write a connector which is... how large?
>> Well, venet doesn't complement virtio-net, and virtio-pci doesn't
>> complement vbus-connector.
>>
> Agreed, but virtio complements vbus by virtue of virtio-vbus.
>
I don't see what vbus adds to virtio-net.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
next prev parent reply other threads:[~2009-08-19 7:11 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-14 15:42 [PATCH v3 0/6] AlacrityVM guest drivers Gregory Haskins
2009-08-14 15:42 ` [PATCH v3 1/6] shm-signal: shared-memory signals Gregory Haskins
2009-08-14 15:43 ` [PATCH v3 2/6] ioq: Add basic definitions for a shared-memory, lockless queue Gregory Haskins
2009-08-14 15:43 ` [PATCH v3 3/6] vbus: add a "vbus-proxy" bus model for vbus_driver objects Gregory Haskins
2009-08-15 10:32 ` Ingo Molnar
2009-08-15 19:15 ` Anthony Liguori
2009-08-16 7:16 ` Ingo Molnar
2009-08-17 13:54 ` Anthony Liguori
2009-08-17 14:23 ` Ingo Molnar
2009-08-17 14:14 ` Gregory Haskins
2009-08-17 14:58 ` Avi Kivity
2009-08-17 15:05 ` Ingo Molnar
2009-08-17 17:41 ` Michael S. Tsirkin
2009-08-17 20:17 ` Gregory Haskins
2009-08-18 8:46 ` Michael S. Tsirkin
2009-08-18 15:19 ` Gregory Haskins
2009-08-18 16:25 ` Michael S. Tsirkin
2009-08-18 15:53 ` [Alacrityvm-devel] " Ira W. Snyder
2009-08-18 16:51 ` Avi Kivity
2009-08-18 17:27 ` Ira W. Snyder
2009-08-18 17:47 ` Avi Kivity
2009-08-18 18:27 ` Ira W. Snyder
2009-08-18 18:52 ` Avi Kivity
2009-08-18 20:59 ` Ira W. Snyder
2009-08-18 21:26 ` Avi Kivity
2009-08-18 22:06 ` Avi Kivity
2009-08-19 0:44 ` Ira W. Snyder
2009-08-19 5:26 ` Avi Kivity
2009-08-19 0:38 ` Ira W. Snyder
2009-08-19 5:40 ` Avi Kivity
2009-08-19 15:28 ` Ira W. Snyder
2009-08-19 15:37 ` Avi Kivity
2009-08-19 16:29 ` Ira W. Snyder
2009-08-19 16:38 ` Avi Kivity
2009-08-19 21:05 ` Hollis Blanchard
2009-08-20 9:57 ` Stefan Hajnoczi
2009-08-20 10:08 ` Avi Kivity
2009-08-18 20:35 ` Michael S. Tsirkin
2009-08-18 21:04 ` Arnd Bergmann
2009-08-18 20:39 ` Michael S. Tsirkin
2009-08-18 20:57 ` Michael S. Tsirkin
2009-08-18 23:24 ` Ira W. Snyder
2009-08-18 1:08 ` Anthony Liguori
2009-08-18 7:38 ` Avi Kivity
2009-08-18 8:54 ` Michael S. Tsirkin
2009-08-18 13:16 ` Gregory Haskins
2009-08-18 13:45 ` Avi Kivity
2009-08-18 15:51 ` Gregory Haskins
2009-08-18 16:14 ` Ingo Molnar
2009-08-19 4:27 ` Gregory Haskins
2009-08-19 5:22 ` Avi Kivity
2009-08-19 13:27 ` Gregory Haskins
2009-08-19 14:35 ` Avi Kivity
2009-08-18 16:47 ` Avi Kivity
2009-08-18 16:51 ` Michael S. Tsirkin
2009-08-19 5:36 ` Gregory Haskins
2009-08-19 5:48 ` Avi Kivity
2009-08-19 6:40 ` Gregory Haskins
2009-08-19 7:13 ` Avi Kivity
2009-08-19 11:40 ` Gregory Haskins
2009-08-19 11:49 ` Avi Kivity
2009-08-19 11:52 ` Gregory Haskins
2009-08-19 14:33 ` Michael S. Tsirkin
2009-08-20 12:12 ` Michael S. Tsirkin
2009-08-16 8:30 ` Avi Kivity
2009-08-17 14:16 ` Gregory Haskins
2009-08-17 14:59 ` Avi Kivity
2009-08-17 15:09 ` Gregory Haskins
2009-08-17 15:14 ` Ingo Molnar
2009-08-17 19:35 ` Gregory Haskins
2009-08-17 15:18 ` Avi Kivity
2009-08-17 13:02 ` Gregory Haskins
2009-08-17 14:25 ` Ingo Molnar
2009-08-17 15:05 ` Gregory Haskins
2009-08-17 15:08 ` Ingo Molnar
2009-08-17 19:33 ` Gregory Haskins
2009-08-18 8:33 ` Avi Kivity
2009-08-18 14:46 ` Gregory Haskins
2009-08-18 16:27 ` Avi Kivity
2009-08-19 6:28 ` Gregory Haskins
2009-08-19 7:11 ` Avi Kivity [this message]
2009-08-19 18:23 ` Nicholas A. Bellinger
2009-08-19 18:39 ` Gregory Haskins
2009-08-19 19:19 ` Nicholas A. Bellinger
2009-08-19 19:34 ` Nicholas A. Bellinger
2009-08-19 20:12 ` configfs/sysfs Avi Kivity
2009-08-19 20:48 ` configfs/sysfs Ingo Molnar
2009-08-19 20:53 ` configfs/sysfs Avi Kivity
2009-08-19 21:19 ` configfs/sysfs Nicholas A. Bellinger
2009-08-19 22:15 ` configfs/sysfs Gregory Haskins
2009-08-19 22:16 ` configfs/sysfs Joel Becker
2009-08-19 23:48 ` [Alacrityvm-devel] configfs/sysfs Alex Tsariounov
2009-08-19 23:54 ` configfs/sysfs Nicholas A. Bellinger
2009-08-20 6:09 ` configfs/sysfs Avi Kivity
[not found] ` <4A8CE891.2010502@redhat.com>
2009-08-20 22:48 ` configfs/sysfs Joel Becker
2009-08-21 4:14 ` configfs/sysfs Avi Kivity
2009-08-19 18:26 ` [PATCH v3 3/6] vbus: add a "vbus-proxy" bus model for vbus_driver objects Gregory Haskins
2009-08-19 20:37 ` Avi Kivity
2009-08-19 20:53 ` Ingo Molnar
2009-08-20 17:25 ` Muli Ben-Yehuda
2009-08-20 20:58 ` Caitlin Bestler
2009-08-18 18:20 ` Arnd Bergmann
2009-08-18 19:08 ` Avi Kivity
2009-08-19 5:36 ` Gregory Haskins
2009-08-18 9:53 ` Michael S. Tsirkin
2009-08-18 10:00 ` Avi Kivity
2009-08-18 10:09 ` Michael S. Tsirkin
2009-08-18 10:13 ` Avi Kivity
2009-08-18 10:28 ` Michael S. Tsirkin
2009-08-18 10:45 ` Avi Kivity
2009-08-18 11:07 ` Michael S. Tsirkin
2009-08-18 11:15 ` Avi Kivity
2009-08-18 11:49 ` Michael S. Tsirkin
2009-08-18 11:54 ` Avi Kivity
2009-08-18 15:39 ` Gregory Haskins
2009-08-18 16:39 ` Michael S. Tsirkin
2009-08-17 15:13 ` Avi Kivity
2009-08-14 15:43 ` [PATCH v3 4/6] vbus-proxy: add a pci-to-vbus bridge Gregory Haskins
2009-08-14 15:43 ` [PATCH v3 5/6] ioq: add driver-side vbus helpers Gregory Haskins
2009-08-14 15:43 ` [PATCH v3 6/6] net: Add vbus_enet driver Gregory Haskins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A8BA5AE.3030308@redhat.com \
--to=avi@redhat.com \
--cc=alacrityvm-devel@lists.sourceforge.net \
--cc=gregory.haskins@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).