netdev.vger.kernel.org archive mirror
From: Avi Kivity <avi@redhat.com>
To: Gregory Haskins <gregory.haskins@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	kvm@vger.kernel.org, alacrityvm-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v3 3/6] vbus: add a "vbus-proxy" bus model for vbus_driver objects
Date: Wed, 19 Aug 2009 10:11:42 +0300
Message-ID: <4A8BA5AE.3030308@redhat.com>
In-Reply-To: <4A8B9B79.9050004@gmail.com>

On 08/19/2009 09:28 AM, Gregory Haskins wrote:
> Avi Kivity wrote:
>    
>> On 08/18/2009 05:46 PM, Gregory Haskins wrote:
>>      
>>>        
>>>> Can you explain how vbus achieves RDMA?
>>>>
>>>> I also don't see the connection to real time guests.
>>>>
>>>>          
>>> Both of these are still in development.  Trying to stay true to the
>>> "release early and often" mantra, the core vbus technology is being
>>> pushed now so it can be reviewed.  Stay tuned for these other
>>> developments.
>>>
>>>        
>> Hopefully you can outline how it works.  AFAICT, RDMA and kernel bypass
>> will need device assignment.  If you're bypassing the call into the host
>> kernel, it doesn't really matter how that call is made, does it?
>>      
> This is for things like the setup of queue-pairs, and the transport of
> doorbells, and ib-verbs.  I am not on the team doing that work, so I am
> not an expert in this area.  What I do know is having a flexible and
> low-latency signal-path was deemed a key requirement.
>    

That's not a full bypass, then.  AFAIK kernel bypass has userspace 
talking directly to the device.

Given that both virtio and vbus can use ioeventfds, I don't see how one 
can perform better than the other.
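To make the equivalence concrete: under both designs the guest-to-host kick ultimately lands on the same primitive, an eventfd that the host side waits on. A minimal userspace sketch of that doorbell (names are mine, not from either patch set):

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* The kick is a write of 1 to an eventfd; the host collects the
 * accumulated count with a single read, so back-to-back kicks
 * coalesce into one wakeup. */
int doorbell_create(void)
{
    return eventfd(0, 0);               /* counting semantics */
}

int doorbell_kick(int fd)
{
    uint64_t one = 1;
    return write(fd, &one, sizeof one) == sizeof one ? 0 : -1;
}

uint64_t doorbell_collect(int fd)
{
    uint64_t n = 0;
    if (read(fd, &n, sizeof n) != sizeof n)
        return 0;
    return n;                           /* kicks since the last read */
}
```

Whichever transport registers the eventfd with KVM, the per-kick cost on this path is identical.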

> For real-time, a big part of it is relaying the guest scheduler state to
> the host, but in a smart way.  For instance, the cpu priority for each
> vcpu is in a shared-table.  When the priority is raised, we can simply
> update the table without taking a VMEXIT.  When it is lowered, we need
> to inform the host of the change in case the underlying task needs to
> reschedule.
>    
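A sketch of the lazy-update scheme Gregory describes; the table layout and names are guesses for illustration (the actual patches are not quoted in this thread). The asymmetry is the point: raising priority is a plain store, only lowering it needs to notify the host.

```c
#define MAX_VCPU 64

/* Hypothetical guest/host shared page: one priority slot per vcpu,
 * guest-writable, host-readable. */
struct shared_prio {
    int prio[MAX_VCPU];
};

/*
 * Raising priority cannot unblock anything on the host, so a store
 * into the shared page suffices.  Lowering it may mean a
 * higher-priority host task should now run, so only that direction
 * requires a notification (hypercall or PIO exit).
 */
int shared_prio_set(struct shared_prio *t, int vcpu, int newprio)
{
    int old = t->prio[vcpu];

    t->prio[vcpu] = newprio;
    return newprio < old;       /* nonzero: take the exit, host must see this */
}
```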

This is best done using cr8/tpr so you don't have to exit at all.  See 
also my vtpr support for Windows which does this in software, generally 
avoiding the exit even when lowering priority.

> This is where the really fast call() type mechanism is important.
>
> It's also about having the priority flow end-to-end, and having the vcpu
> interrupt state affect the task priority, etc. (e.g. pending interrupts
> affect the vcpu task prio).
>
> etc, etc.
>
> I can go on and on (as you know ;), but will wait till this work is more
> concrete and proven.
>    

Generally cpu state shouldn't flow through a device but rather through 
MSRs, hypercalls, and cpu registers.

> Basically, what it comes down to is both vbus and vhost need
> configuration/management.  Vbus does it with sysfs/configfs, and vhost
> does it with ioctls.  I ultimately decided to go with sysfs/configfs
> because, at least at the time I looked, it seemed like the "blessed"
> way to do user->kernel interfaces.
>    

I really dislike that trend but that's an unrelated discussion.

>> They need to be connected to the real world somehow.  What about
>> security?  can any user create a container and devices and link them to
>> real interfaces?  If not, do you need to run the VM as root?
>>      
> Today it has to be root as a result of weak mode support in configfs, so
> you have me there.  I am looking for help patching this limitation, though.
>
>    

Well, do you plan to address this before submission for inclusion?

>> I hope everyone agrees that it's an important issue for me and that I
>> have to consider non-Linux guests.  I also hope that you're considering
>> non-Linux guests since they have considerable market share.
>>      
> I didn't mean non-Linux guests are not important.  I was disagreeing
> with your assertion that it only works if its PCI.  There are numerous
> examples of IHV/ISV "bridge" implementations deployed in Windows, no?
>    

I don't know.

> If vbus is exposed as a PCI-BRIDGE, how is this different?
>    

Technically it would work, but given you're not interested in Windows, 
who would write a driver?

>> Given I'm not the gateway to inclusion of vbus/venet, you don't need to
>> ask me anything.  I'm still free to give my opinion.
>>      
> Agreed, and I didn't mean to suggest otherwise.  It's not clear if you are
> wearing the "kvm maintainer" hat, or the "lkml community member" hat at
> times, so it's important to make that distinction.  Otherwise, it's not
> clear whether this is an edict from my superior, or input from my peer. ;)
>    

When I wear a hat, it is a Red Hat.  However I am bareheaded most often.

(that is, look at the contents of my message, not who wrote it or his role).

>> With virtio, the number is 1 (or less if you amortize).  Set up the ring
>> entries and kick.
>>      
> Again, I am just talking about basic PCI here, not the things we build
> on top.
>    

Whatever that means, it isn't interesting.  Performance is measured for 
the whole stack.

> The point is: the things we build on top have costs associated with
> them, and I aim to minimize it.  For instance, to do a "call()" kind of
> interface, you generally need to pre-setup some per-cpu mappings so that
> you can just do a single iowrite32() to kick the call off.  Those
> per-cpu mappings have a cost if you want them to be high-performance, so
> my argument is that you ideally want to limit the number of times you
> have to do this.  My current design reduces this to "once".
>    

Do you mean minimizing the setup cost?  Seriously?
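For context, the "call()" fast path Gregory describes reduces, once the per-cpu window is mapped, to a single 32-bit store. A toy sketch (vbus's actual register layout is not shown in this thread, so everything here is illustrative):

```c
#include <stdint.h>

#define NR_CPUS 4

/* Stands in for the ioremapped per-cpu doorbell window that the
 * setup phase establishes once. */
uint32_t fake_doorbell[NR_CPUS];

/* In the real path this would be iowrite32(op, percpu_window[cpu]);
 * the hypervisor traps the PIO/MMIO write and dispatches `op`.
 * The per-call cost is one store -- all the expense was paid at setup. */
void vbus_call(int cpu, uint32_t op)
{
    fake_doorbell[cpu] = op;
}
```

The argument in the quoted text is about amortization: pay the mapping cost once, then every subsequent call is as cheap as this store.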

>> There's no such thing as raw PCI.  Every PCI device has a protocol.  The
>> protocol virtio chose is optimized for virtualization.
>>      
> And it's a question of how that protocol scales, more than how the
> protocol works.
>
> Obviously the general idea of the protocol works, as vbus itself is
> implemented as a PCI-BRIDGE and is therefore limited to the underlying
> characteristics that I can get out of PCI (like PIO latency).
>    

I thought we agreed that was insignificant?

>> As I've mentioned before, prioritization is available on x86
>>      
> But as I've mentioned, it doesn't work very well.
>    

I guess it isn't that important then.  I note that clever prioritization 
in a guest is pointless if you can't do the same prioritization in the host.

>> , and coalescing scales badly.
>>      
> Depends on what is scaling.  Scaling vcpus?  Yes, you are right.
> Scaling the number of devices?  No, this is where it improves.
>    

If you queue pending messages instead of walking the device list, you 
may be right.  Still, if hard interrupt processing takes 10% of your 
time you'll only have coalesced 10% of interrupts on average.

>> irq window exits ought to be pretty rare, so we're only left with
>> injection vmexits.  At around 1us/vmexit, even 100,000 interrupts/vcpu
>> (which is excessive) will only cost you 10% cpu time.
>>      
> 1us is too much for what I am building, IMHO.

You can't use current hardware then.
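The arithmetic behind the 10% figure quoted above is simply rate times per-exit cost:

```c
/* Fraction of one cpu consumed by injection exits:
 * 100,000 exits/s * 1 us/exit = 0.1 s of exit handling per second = 10%. */
double exit_overhead(double exits_per_sec, double exit_cost_us)
{
    return exits_per_sec * exit_cost_us / 1e6;
}
```

At more typical interrupt rates (say 10,000/s) the same formula gives 1%, which is why the 1 us exit cost is hard to improve on without new hardware.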

>> You're free to demultiplex an MSI to however many consumers you want,
>> there's no need for a new bus for that.
>>      
> Hmmm...can you elaborate?
>    

Point all those MSIs at one vector.  Its handler will have to poll all 
the attached devices though.
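A sketch of that scheme: N devices share one MSI vector, and the single handler behind it polls every registered device for work. Names are made up for illustration; the trade-off is that handler cost grows linearly with the number of attached devices.

```c
#define MAX_SHARED 8

struct shared_dev {
    int pending;        /* set by the device/backend when it has work */
    int serviced;       /* bumped each time the handler runs us */
};

struct shared_dev *shared_devs[MAX_SHARED];
int nr_shared;

int shared_vector_attach(struct shared_dev *d)
{
    if (nr_shared >= MAX_SHARED)
        return -1;
    shared_devs[nr_shared++] = d;
    return 0;
}

/* The one handler behind the shared vector: since the MSI carries no
 * payload telling us who fired, we must walk the whole list. */
void shared_vector_handler(void)
{
    for (int i = 0; i < nr_shared; i++) {
        if (shared_devs[i]->pending) {
            shared_devs[i]->pending = 0;
            shared_devs[i]->serviced++;
        }
    }
}
```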

>> Do you use DNS?  We use PCI-SIG.  If Novell is a PCI-SIG member you can
>> get a vendor ID and control your own virtio space.
>>      
> Yeah, we have our own id.  I am more concerned about making this design
> make sense outside of PCI oriented environments.
>    

IIRC we reuse the PCI IDs for non-PCI.

>>>> That's a bug, not a feature.  It means poor scaling as the number of
>>>> vcpus increases and as the number of devices increases.
>>>>          
> vcpu increases, I agree (and am ok with, as I expect low vcpu count
> machines to be typical).

I'm not okay with it.  If you wish people to adopt vbus over virtio 
you'll have to address all concerns, not just yours.

> nr of devices, I disagree.  can you elaborate?
>    

With message queueing, I retract my remark.

>> Windows,
>>      
> Work in progress.
>    

Interesting.  Do you plan to open source the code?  If not, will the 
binaries be freely available?

>    
>> large guests
>>      
> Can you elaborate?  I am not familiar with the term.
>    

Many vcpus.

>    
>> and multiqueue out of your design.
>>      
> AFAICT, multiqueue should work quite nicely with vbus.  Can you
> elaborate on where you see the problem?
>    

You said you aren't interested in it previously IIRC.

>>>> x86 APIC is priority aware.
>>>>
>>>>          
>>> Have you ever tried to use it?
>>>
>>>        
>> I haven't, but Windows does.
>>      
> Yeah, it doesn't really work well.  It's an extremely rigid model that
> (IIRC) only lets you prioritize in 16 groups spaced by IDT (0-15 are one
> level, 16-31 are another, etc).  Most of the embedded PICs I have worked
> with supported direct remapping, etc.  But in any case, Linux doesn't
> support it so we are hosed no matter how good it is.
>    

I agree that it isn't very clever (not that I am a real-time expert), but 
I disagree about dismissing Linux support so easily.  If prioritization 
is such a win, it should be a win on the host as well, and we should make 
it work there too.  Further, I don't see how priorities in the guest can 
work if they don't on the host.

>>>
>>>        
>> They had to build connectors just like you propose to do.
>>      
> More importantly, they had to build back-end busses too, no?
>    

They had to write 414 lines in drivers/s390/kvm/kvm_virtio.c and 
something similar for lguest.

>> But you still need vbus-connector-lguest and vbus-connector-s390 because
>> they all talk to the host differently.  So what's changed?  the names?
>>      
> The fact that they don't need to redo most of the in-kernel backend
> stuff.  Just the connector.
>    

So they save 414 lines but have to write a connector which is... how large?

>> Well, venet doesn't complement virtio-net, and virtio-pci doesn't
>> complement vbus-connector.
>>      
> Agreed, but virtio complements vbus by virtue of virtio-vbus.
>    

I don't see what vbus adds to virtio-net.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

