* Re: vhost net: performance with ping benchmark
2009-08-25 4:14 ` Avi Kivity
@ 2009-08-25 6:46 ` Michael S. Tsirkin
2009-08-25 13:08 ` Anthony Liguori
2009-08-25 12:34 ` Arnd Bergmann
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-25 6:46 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, virtualization, kvm, Rusty Russell,
Mark McLoughlin
On Tue, Aug 25, 2009 at 07:14:29AM +0300, Avi Kivity wrote:
> On 08/25/2009 05:22 AM, Anthony Liguori wrote:
>>
>> I think 2.6.32 is pushing it.
>
> 2.6.32 is pushing it, but we need to push it.
>
>> I think some time is needed to flush out the userspace interface. In
>> particular, I don't think Mark's comments have been adequately
>> addressed. If a version were merged without GSO support, some
>> mechanism to do feature detection would be needed in the userspace API.
>>
>
> I don't see any point in merging without gso (unless it beats userspace
> with gso, which I don't think will happen). In any case we'll need
> feature negotiation.
>
>> I think this is likely going to be needed regardless. I also think
>> the tap compatibility suggestion would simplify the consumption of
>> this in userspace.
>
> What about veth pairs?
>
>> I'd like some time to look at get_state/set_state ioctl()s along with
>> dirty tracking support. It's a much better model for live migration
>> IMHO.
>
> My preference is ring proxying. Note we'll need ring proxying (or at
> least event proxying) for non-MSI guests.
Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>> I think some more thorough benchmarking would be good too. In
>> particular, netperf/iperf runs would be nice.
>
> Definitely.
>
> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
* Re: vhost net: performance with ping benchmark
2009-08-25 6:46 ` Michael S. Tsirkin
@ 2009-08-25 13:08 ` Anthony Liguori
2009-08-25 13:34 ` Michael S. Tsirkin
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Anthony Liguori @ 2009-08-25 13:08 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Avi Kivity, virtualization, kvm, Rusty Russell, Mark McLoughlin
Michael S. Tsirkin wrote:
>> My preference is ring proxying. Note we'll need ring proxying (or at
>> least event proxying) for non-MSI guests.
>>
>
> Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>
It is if we have a working implementation that demonstrates the
userspace interface is sufficient. Once it goes into the upstream
kernel, we need to have backwards compatibility code in QEMU forever to
support that kernel version.
Regards,
Anthony Liguori
* Re: vhost net: performance with ping benchmark
2009-08-25 13:08 ` Anthony Liguori
@ 2009-08-25 13:34 ` Michael S. Tsirkin
2009-08-25 13:45 ` Michael S. Tsirkin
2009-08-25 15:57 ` Avi Kivity
2 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-25 13:34 UTC (permalink / raw)
To: Anthony Liguori
Cc: Avi Kivity, virtualization, kvm, Rusty Russell, Mark McLoughlin
On Tue, Aug 25, 2009 at 08:08:05AM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>>> My preference is ring proxying. Note we'll need ring proxying (or at
>>> least event proxying) for non-MSI guests.
>>>
>>
>> Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>>
>
> It is if we have a working implementation that demonstrates the
> userspace interface is sufficient.
The idea is trivial enough to be sure the interface is sufficient:
we point the kernel at a used ring at address X in qemu memory,
copy entries from there into the guest's used ring, then signal the guest.
I'll post a code snippet to show how it's done if you like.
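For illustration, a minimal sketch of that copy step could look like the
following (assuming the standard ring layout from linux/virtio_ring.h; the
names shadow/guest/last_seen are purely illustrative, not part of the
proposed API, and dirty page logging is left out):

#include <stdint.h>
#include <linux/virtio_ring.h>

/* Copy used entries the kernel produced into the guest-visible ring.
 * shadow is the qemu-owned used ring the kernel writes to, guest is
 * the ring the guest actually reads, num is the (power of 2) ring
 * size, and last_seen tracks how far we have copied. */
static void proxy_used(struct vring_used *shadow, struct vring_used *guest,
		       unsigned int num, uint16_t *last_seen)
{
	uint16_t idx = shadow->idx;	/* snapshot the kernel's index */
	__sync_synchronize();		/* read entries only after the index */
	while (*last_seen != idx) {
		unsigned int slot = *last_seen % num;
		guest->ring[slot] = shadow->ring[slot];
		(*last_seen)++;
	}
	__sync_synchronize();		/* entries visible before the index */
	guest->idx = idx;		/* publish, then signal the guest */
}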
> Once it goes into the upstream
> kernel, we need to have backwards compatibility code in QEMU forever
> to support that kernel version.
Don't worry: the kernel needs to handle old userspace as well, and neither
I nor Rusty wants a compatibility mess in the kernel.
> Regards,
>
> Anthony Liguori
* Re: vhost net: performance with ping benchmark
2009-08-25 13:08 ` Anthony Liguori
2009-08-25 13:34 ` Michael S. Tsirkin
@ 2009-08-25 13:45 ` Michael S. Tsirkin
2009-08-25 15:57 ` Avi Kivity
2 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-25 13:45 UTC (permalink / raw)
To: Anthony Liguori
Cc: Avi Kivity, virtualization, kvm, Rusty Russell, Mark McLoughlin
On Tue, Aug 25, 2009 at 08:08:05AM -0500, Anthony Liguori wrote:
> Once it goes into the upstream
> kernel, we need to have backwards compatibility code in QEMU forever to
> support that kernel version.
BTW, qemu can keep doing the userspace thing if some capability it needs
is missing. It won't be worse off than if the driver is not upstream at
all :).
--
MST
* Re: vhost net: performance with ping benchmark
2009-08-25 13:08 ` Anthony Liguori
2009-08-25 13:34 ` Michael S. Tsirkin
2009-08-25 13:45 ` Michael S. Tsirkin
@ 2009-08-25 15:57 ` Avi Kivity
2 siblings, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2009-08-25 15:57 UTC (permalink / raw)
To: Anthony Liguori
Cc: Michael S. Tsirkin, virtualization, kvm, Rusty Russell,
Mark McLoughlin
On 08/25/2009 04:08 PM, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>>> My preference is ring proxying. Note we'll need ring proxying (or
>>> at least event proxying) for non-MSI guests.
>>
>> Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?
>
> It is if we have a working implementation that demonstrates the
> userspace interface is sufficient. Once it goes into the upstream
> kernel, we need to have backwards compatibility code in QEMU forever
> to support that kernel version.
Not at all. We still have pure userspace support, so if we don't like
the first two versions of vhost, we can simply not support them. Of
course I'm not advocating merging something known bad or untested, just
pointing out that the cost of an error is not that bad.
--
error compiling committee.c: too many arguments to function
* Re: vhost net: performance with ping benchmark
2009-08-25 4:14 ` Avi Kivity
2009-08-25 6:46 ` Michael S. Tsirkin
@ 2009-08-25 12:34 ` Arnd Bergmann
2009-08-26 7:34 ` Rusty Russell
2009-08-25 13:06 ` Anthony Liguori
2009-08-25 13:24 ` Anthony Liguori
3 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2009-08-25 12:34 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Michael S. Tsirkin, virtualization, kvm,
Rusty Russell, Mark McLoughlin
On Tuesday 25 August 2009, Avi Kivity wrote:
> On 08/25/2009 05:22 AM, Anthony Liguori wrote:
> >
> > I think 2.6.32 is pushing it.
>
> 2.6.32 is pushing it, but we need to push it.
Agreed.
> > I think some time is needed to flush out the userspace interface. In
> > particular, I don't think Mark's comments have been adequately
> > addressed. If a version were merged without GSO support, some
> > mechanism to do feature detection would be needed in the userspace API.
>
> I don't see any point in merging without gso (unless it beats userspace
> with gso, which I don't think will happen). In any case we'll need
> feature negotiation.
The feature negotiation that Michael has put in seems sufficient for this.
If you care more about latency than bandwidth, the current driver is
an enormous improvement over the user space code, which I find is enough
reason to have it now.
> > I think this is likely going to be needed regardless. I also think
> > the tap compatibility suggestion would simplify the consumption of
> > this in userspace.
>
> What about veth pairs?
I think you are talking about different issues. Veth pairs let you connect
vhost to a bridge device like you do in the typical tun/tap setup, which
addresses one problem.
The other issue that came up is that raw sockets require root permissions,
so you have to start qemu as root or with an open file descriptor for the
socket (e.g. through libvirt). Permission handling on tap devices is ugly
as well, but is a solved problem. The solution is to be able to pass
a tap device (or a socket from a tap device) into vhost net. IMHO we
need this eventually, but it's not a show stopper for 2.6.32.
Along the same lines, I also think it should support TCP and UDP sockets
so we can offload 'qemu -net socket,mcast' and 'qemu -net socket,connect'
in addition to 'qemu -net socket,raw'.
Arnd <><
* Re: vhost net: performance with ping benchmark
2009-08-25 12:34 ` Arnd Bergmann
@ 2009-08-26 7:34 ` Rusty Russell
2009-08-26 8:14 ` Michael S. Tsirkin
2009-08-27 16:00 ` Michael S. Tsirkin
0 siblings, 2 replies; 17+ messages in thread
From: Rusty Russell @ 2009-08-26 7:34 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Avi Kivity, Anthony Liguori, Michael S. Tsirkin, virtualization,
kvm, Mark McLoughlin
On Tue, 25 Aug 2009 10:04:41 pm Arnd Bergmann wrote:
> On Tuesday 25 August 2009, Avi Kivity wrote:
> > On 08/25/2009 05:22 AM, Anthony Liguori wrote:
> > >
> > > I think 2.6.32 is pushing it.
> >
> > 2.6.32 is pushing it, but we need to push it.
>
> Agreed.
Get real. It's not happening.
We need migration completely solved and tested. I want to see all the
features supported, including indirect descs and GSO.
If this wasn't a new userspace ABI, I'd be all for throwing it in as
experimental ASAP.
Rusty.
* Re: vhost net: performance with ping benchmark
2009-08-26 7:34 ` Rusty Russell
@ 2009-08-26 8:14 ` Michael S. Tsirkin
2009-08-27 16:00 ` Michael S. Tsirkin
1 sibling, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-26 8:14 UTC (permalink / raw)
To: Rusty Russell
Cc: Arnd Bergmann, Avi Kivity, Anthony Liguori, virtualization, kvm,
Mark McLoughlin
On Wed, Aug 26, 2009 at 05:04:44PM +0930, Rusty Russell wrote:
> On Tue, 25 Aug 2009 10:04:41 pm Arnd Bergmann wrote:
> > On Tuesday 25 August 2009, Avi Kivity wrote:
> > > On 08/25/2009 05:22 AM, Anthony Liguori wrote:
> > > >
> > > > I think 2.6.32 is pushing it.
> > >
> > > 2.6.32 is pushing it, but we need to push it.
> >
> > Agreed.
>
> Get real. It's not happening.
>
> We need migration completely solved and tested. I want to see all the
> features supported, including indirect descs and GSO.
I'm not sure why indirect descs are needed for virtio-net. Comments?
> If this wasn't a new userspace ABI, I'd be all for throwing it in as
> experimental ASAP.
>
> Rusty.
* Re: vhost net: performance with ping benchmark
2009-08-26 7:34 ` Rusty Russell
2009-08-26 8:14 ` Michael S. Tsirkin
@ 2009-08-27 16:00 ` Michael S. Tsirkin
1 sibling, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-27 16:00 UTC (permalink / raw)
To: Rusty Russell
Cc: Arnd Bergmann, Avi Kivity, Anthony Liguori, virtualization, kvm,
Mark McLoughlin
On Wed, Aug 26, 2009 at 05:04:44PM +0930, Rusty Russell wrote:
> On Tue, 25 Aug 2009 10:04:41 pm Arnd Bergmann wrote:
> > On Tuesday 25 August 2009, Avi Kivity wrote:
> > > On 08/25/2009 05:22 AM, Anthony Liguori wrote:
> > > >
> > > > I think 2.6.32 is pushing it.
> > >
> > > 2.6.32 is pushing it, but we need to push it.
> >
> > Agreed.
>
> Get real. It's not happening.
linux-next should be ok though? Can you put patches there without
targeting 2.6.32?
> We need migration completely solved and tested. I want to see all the
> features supported, including indirect descs and GSO.
>
> If this wasn't a new userspace ABI, I'd be all for throwing it in as
> experimental ASAP.
>
> Rusty.
* Re: vhost net: performance with ping benchmark
2009-08-25 4:14 ` Avi Kivity
2009-08-25 6:46 ` Michael S. Tsirkin
2009-08-25 12:34 ` Arnd Bergmann
@ 2009-08-25 13:06 ` Anthony Liguori
2009-08-25 14:02 ` Michael S. Tsirkin
2009-08-25 13:24 ` Anthony Liguori
3 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2009-08-25 13:06 UTC (permalink / raw)
To: Avi Kivity
Cc: Michael S. Tsirkin, virtualization, kvm, Rusty Russell,
Mark McLoughlin
Avi Kivity wrote:
>> I think this is likely going to be needed regardless. I also think
>> the tap compatibility suggestion would simplify the consumption of
>> this in userspace.
>
> What about veth pairs?
Does veth support GSO and checksum offload?
>> I'd like some time to look at get_state/set_state ioctl()s along with
>> dirty tracking support. It's a much better model for live migration
>> IMHO.
>
>> My preference is ring proxying. Note we'll need ring proxying (or at
> least event proxying) for non-MSI guests.
I avoided suggesting ring proxying because I didn't want to imply that
merging should be contingent on it.
Regards,
Anthony Liguori
* Re: vhost net: performance with ping benchmark
2009-08-25 13:06 ` Anthony Liguori
@ 2009-08-25 14:02 ` Michael S. Tsirkin
0 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-25 14:02 UTC (permalink / raw)
To: Anthony Liguori
Cc: Avi Kivity, virtualization, kvm, Rusty Russell, Mark McLoughlin
On Tue, Aug 25, 2009 at 08:06:39AM -0500, Anthony Liguori wrote:
> Avi Kivity wrote:
>>> I think this is likely going to be needed regardless. I also think
>>> the tap compatibility suggestion would simplify the consumption of
>>> this in userspace.
>>
>> What about veth pairs?
>
> Does veth support GSO and checksum offload?
AFAIK, no. But again, improving veth is a separate project :)
>>> I'd like some time to look at get_state/set_state ioctl()s along with
>>> dirty tracking support. It's a much better model for live migration
>>> IMHO.
>>
>> My preference is ring proxying. Note we'll need ring proxying (or at
>> least event proxying) for non-MSI guests.
>
> I avoided suggesting ring proxying because I didn't want to imply that
> merging should be contingent on it.
Happily, the proposed interface supports it.
> Regards,
>
> Anthony Liguori
* Re: vhost net: performance with ping benchmark
2009-08-25 4:14 ` Avi Kivity
` (2 preceding siblings ...)
2009-08-25 13:06 ` Anthony Liguori
@ 2009-08-25 13:24 ` Anthony Liguori
2009-08-25 13:43 ` Michael S. Tsirkin
3 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2009-08-25 13:24 UTC (permalink / raw)
To: Avi Kivity
Cc: Michael S. Tsirkin, virtualization, kvm, Rusty Russell,
Mark McLoughlin
Avi Kivity wrote:
> My preference is ring proxying. Note we'll need ring proxying (or at
> least event proxying) for non-MSI guests.
Thinking about this more...
How does the hand off work? Assuming you normally don't proxy ring
entries and switch to proxying them when you want to migrate, do you
have a set of ioctl()s that changes the semantics of the ring to be host
virtual addresses instead of guest physical? If so, what do you do with
in flight requests? Does qemu have to buffer new requests and wait for
old ones to complete?
Unless you always do ring proxying. If that's the case, we don't need
any of the slot management code in vhost.
Regards,
Anthony Liguori
* Re: vhost net: performance with ping benchmark
2009-08-25 13:24 ` Anthony Liguori
@ 2009-08-25 13:43 ` Michael S. Tsirkin
0 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2009-08-25 13:43 UTC (permalink / raw)
To: Anthony Liguori
Cc: Avi Kivity, virtualization, kvm, Rusty Russell, Mark McLoughlin
On Tue, Aug 25, 2009 at 08:24:07AM -0500, Anthony Liguori wrote:
> Avi Kivity wrote:
>> My preference is ring proxying. Note we'll need ring proxying (or at
>> least event proxying) for non-MSI guests.
>
> Thinking about this more...
>
> How does the hand off work? Assuming you normally don't proxy ring
> entries and switch to proxying them when you want to migrate, do you
> have a set of ioctl()s that changes the semantics of the ring to be host
> virtual addresses instead of guest physical? If so, what do you do with
> in flight requests? Does qemu have to buffer new requests and wait for
> old ones to complete?
>
> Unless you always do ring proxying. If that's the case, we don't need
> any of the slot management code in vhost.
>
> Regards,
>
> Anthony Liguori
Here's how it works. It relies on the fact that in virtio, the guest can
not assume that descriptors have been used unless they have appeared in
the used ring.
When migration starts, we do this:
1. stop kernel (disable socket)
2. call VHOST_SET_VRING_USED: note it gets a virtual address, not guest
physical. We point it at a buffer in qemu memory
3. call VHOST_SET_VRING_CALL, pass an eventfd created by qemu
4. copy over the existing used ring
5. unstop kernel (reenable socket)
Now when migration is in progress, we do this:
A. poll eventfd in 3 above
B. When event is seen, look at used buffer that we gave to kernel
C. Parse descriptors and mark pages that kernel wrote to
as dirty
D. update used buffer that guest looks at
E. signal eventfd for guest
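Put together, the whole sequence might look roughly like this. It's only a
sketch: the ioctl names follow this message (they may not match what is
finally merged), the struct layouts and ioctl encodings are assumptions,
and the descriptor walk and error handling are stubbed out:

#include <stdint.h>
#include <poll.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical argument layouts and ioctl encodings, for illustration
 * only -- the real vhost ABI may differ. */
struct vring_ptr { unsigned int index; uint64_t user_addr; };
struct vring_fd { unsigned int index; int fd; };
#define VHOST_VIRTIO 0xAF
#define VHOST_SET_VRING_USED _IOW(VHOST_VIRTIO, 0x20, struct vring_ptr)
#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vring_fd)

static void proxy_for_migration(int vhost_fd, unsigned int ring,
				void *shadow_used /* in qemu memory */)
{
	struct vring_ptr used = { ring, (uintptr_t)shadow_used };
	struct vring_fd call = { ring, eventfd(0, 0) };

	/* Steps 1-5: repoint the kernel at qemu's shadow used ring. */
	/* stop_vhost(vhost_fd); -- 1: disable socket (hypothetical) */
	ioctl(vhost_fd, VHOST_SET_VRING_USED, &used);	/* 2 */
	ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);	/* 3 */
	/* copy_existing_used_ring(); -- 4 */
	/* start_vhost(vhost_fd); -- 5: reenable socket */

	/* Steps A-E: relay used entries while migration is in
	 * progress (loop termination elided). */
	struct pollfd pfd = { .fd = call.fd, .events = POLLIN };
	for (;;) {
		uint64_t cnt;
		poll(&pfd, 1, -1);		/* A: wait for the kernel */
		read(call.fd, &cnt, sizeof cnt);
		/* B+C: look at new entries in shadow_used and mark the
		 * pages each descriptor chain wrote to as dirty. */
		/* D: copy the entries to the guest-visible used ring. */
		/* E: signal the guest's eventfd/interrupt. */
	}
}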
--
MST