* copyless virtio net thoughts?
@ 2009-02-05 2:07 Chris Wright
2009-02-05 12:37 ` Avi Kivity
2009-02-18 11:38 ` Rusty Russell
0 siblings, 2 replies; 19+ messages in thread
From: Chris Wright @ 2009-02-05 2:07 UTC (permalink / raw)
To: Arnd Bergmann, Herbert Xu, Rusty Russell; +Cc: kvm
There's been a number of different discussions re: getting copyless virtio
net (esp. for KVM). This is just a poke in that general direction to
stir the discussion. I'm interested to hear current thoughts?
thanks
-chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-05 2:07 copyless virtio net thoughts? Chris Wright
@ 2009-02-05 12:37 ` Avi Kivity
2009-02-05 14:25 ` Anthony Liguori
2009-02-06 5:40 ` Herbert Xu
2009-02-18 11:38 ` Rusty Russell
1 sibling, 2 replies; 19+ messages in thread
From: Avi Kivity @ 2009-02-05 12:37 UTC (permalink / raw)
To: Chris Wright; +Cc: Arnd Bergmann, Herbert Xu, Rusty Russell, kvm
Chris Wright wrote:
> There's been a number of different discussions re: getting copyless virtio
> net (esp. for KVM). This is just a poke in that general direction to
> stir the discussion. I'm interested to hear current thoughts
I believe that copyless networking is absolutely essential.
For transmit, copyless is needed to properly support sendfile() type
workloads - http/ftp/nfs serving. These are usually high-bandwidth,
cache-cold workloads where a copy is most expensive.
For receive, the guest will almost always do an additional copy, but it
will most likely do the copy from another cpu. Xen netchannel2
mitigates this somewhat by having the guest request the hypervisor to
perform the copy when the rx interrupt is processed, but this may still
be too early (the packet may be destined to a process that is on another
vcpu), and the extra hypercall is expensive.
In my opinion, it would be ideal to linux-aio enable taps and packet
sockets. io_submit() allows submitting multiple buffers in one syscall
and supports scatter/gather. io_getevents() supports dequeuing multiple
packet completions in one syscall.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-05 12:37 ` Avi Kivity
@ 2009-02-05 14:25 ` Anthony Liguori
2009-02-06 5:40 ` Herbert Xu
1 sibling, 0 replies; 19+ messages in thread
From: Anthony Liguori @ 2009-02-05 14:25 UTC (permalink / raw)
To: Avi Kivity; +Cc: Chris Wright, Arnd Bergmann, Herbert Xu, Rusty Russell, kvm
Avi Kivity wrote:
> Chris Wright wrote:
>> There's been a number of different discussions re: getting copyless
>> virtio
>> net (esp. for KVM). This is just a poke in that general direction to
>> stir the discussion. I'm interested to hear current thoughts
>
> I believe that copyless networking is absolutely essential.
>
> For transmit, copyless is needed to properly support sendfile() type
> workloads - http/ftp/nfs serving. These are usually high-bandwidth,
> cache-cold workloads where a copy is most expensive.
>
> For receive, the guest will almost always do an additional copy, but
> it will most likely do the copy from another cpu. Xen netchannel2
> mitigates this somewhat by having the guest request the hypervisor to
> perform the copy when the rx interrupt is processed, but this may
> still be too early (the packet may be destined to a process that is on
> another vcpu), and the extra hypercall is expensive.
>
> In my opinion, it would be ideal to linux-aio enable taps and packet
> sockets. io_submit() allows submitting multiple buffers in one
> syscall and supports scatter/gather. io_getevents() supports
> dequeuing multiple packet completions in one syscall.
splice() has some nice properties too. It disconnects the notion of
moving packets around from actually copying them. It also fits well
into a more performant model of inter-guest I/O. You can't publish
multiple buffers with splice, though, and practically speaking I don't
think we can do without that today because of mergeable RX buffers.
You would have to extend the linux-aio interface to hand it a bunch of
buffers and have it tell you where the packet boundaries were.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-05 12:37 ` Avi Kivity
2009-02-05 14:25 ` Anthony Liguori
@ 2009-02-06 5:40 ` Herbert Xu
2009-02-06 8:46 ` Avi Kivity
1 sibling, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2009-02-06 5:40 UTC (permalink / raw)
To: Avi Kivity; +Cc: Chris Wright, Arnd Bergmann, Rusty Russell, kvm, netdev
On Thu, Feb 05, 2009 at 02:37:07PM +0200, Avi Kivity wrote:
>
> I believe that copyless networking is absolutely essential.
I used to think it was important, but I'm now of the opinion
that it's quite useless for virtualisation as it stands.
> For transmit, copyless is needed to properly support sendfile() type
> workloads - http/ftp/nfs serving. These are usually high-bandwidth,
> cache-cold workloads where a copy is most expensive.
This is totally true for baremetal, but useless for virtualisation
right now because the block layer is not zero-copy. That is, the
data is going to be cache hot anyway so zero-copy networking doesn't
buy you much at all.
Please also recall that for the time being, block speeds are
way slower than network speeds. So the really interesting case
is actually network-to-network transfers. Again due to the
RX copy this is going to be cache hot.
> For receive, the guest will almost always do an additional copy, but it
> will most likely do the copy from another cpu. Xen netchannel2
That's what we should strive to avoid. The best scenario with
modern 10GbE NICs is to stay on one CPU if at all possible. The
NIC will pick a CPU when it delivers the packet into one of the
RX queues and we should stick with it for as long as possible.
So what I'd like to see next in virtualised networking is virtual
multiqueue support in guest drivers. No I'm not talking about
making one or more of the physical RX/TX queues available to the
guest (aka passthrough), but actually turning something like the
virtio-net interface into a multiqueue interface.
This is the best way to get cache locality and minimise CPU waste.
So I'm certainly not rushing out to do any zero-copy virtual
networking. However, I would like to start working on a virtual
multiqueue NIC interface.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-06 5:40 ` Herbert Xu
@ 2009-02-06 8:46 ` Avi Kivity
2009-02-06 9:19 ` Herbert Xu
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2009-02-06 8:46 UTC (permalink / raw)
To: Herbert Xu; +Cc: Chris Wright, Arnd Bergmann, Rusty Russell, kvm, netdev
Herbert Xu wrote:
> On Thu, Feb 05, 2009 at 02:37:07PM +0200, Avi Kivity wrote:
>
>> I believe that copyless networking is absolutely essential.
>>
>
> I used to think it was important, but I'm now of the opinion
> that it's quite useless for virtualisation as it stands.
>
>
>> For transmit, copyless is needed to properly support sendfile() type
>> workloads - http/ftp/nfs serving. These are usually high-bandwidth,
>> cache-cold workloads where a copy is most expensive.
>>
>
> This is totally true for baremetal, but useless for virtualisation
> right now because the block layer is not zero-copy. That is, the
> data is going to be cache hot anyway so zero-copy networking doesn't
> buy you much at all.
>
The guest's block layer is copyless. The host block layer is -><- this
far from being copyless -- all we need is preadv()/pwritev() or to
replace our thread pool implementation in qemu with linux-aio.
Everything else is copyless.
Since we are actively working on this, expect this limitation to
disappear soon.
(even if it doesn't, the effect of block layer copies is multiplied by
the cache miss percentage, which can be quite low for many workloads; but
again, we're not building on that)
> Please also recall that for the time being, block speeds are
> way slower than network speeds. So the really interesting case
> is actually network-to-network transfers. Again due to the
> RX copy this is going to be cache hot.
>
Block speeds are not way slower. We're at 4Gb/sec for Fibre and 10Gb/s
for networking. With dual channels or a decent cache hit rate they're
evenly matched.
>> For receive, the guest will almost always do an additional copy, but it
>> will most likely do the copy from another cpu. Xen netchannel2
>>
>
> That's what we should strive to avoid. The best scenario with
> modern 10GbE NICs is to stay on one CPU if at all possible. The
> NIC will pick a CPU when it delivers the packet into one of the
> RX queues and we should stick with it for as long as possible.
>
> So what I'd like to see next in virtualised networking is virtual
> multiqueue support in guest drivers. No I'm not talking about
> making one or more of the physical RX/TX queues available to the
> guest (aka passthrough), but actually turning something like the
> virtio-net interface into a multiqueue interface.
>
I support this, but it should be in addition to copylessness, not on its
own.
- many guests will not support multiqueue
- for some threaded workloads, you cannot predict where the final read()
will come from; this renders multiqueue ineffective for keeping cache
locality
- usually you want virtio to transfer large amounts of data; but if you
want your copies to be cache-hot, you need to limit transfers to half
the cache size (a quarter if hyperthreading); this limits virtio
effectiveness
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-06 8:46 ` Avi Kivity
@ 2009-02-06 9:19 ` Herbert Xu
2009-02-06 14:55 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2009-02-06 9:19 UTC (permalink / raw)
To: Avi Kivity; +Cc: Chris Wright, Arnd Bergmann, Rusty Russell, kvm, netdev
On Fri, Feb 06, 2009 at 10:46:37AM +0200, Avi Kivity wrote:
>
> The guest's block layer is copyless. The host block layer is -><- this
> far from being copyless -- all we need is preadv()/pwritev() or to
> replace our thread pool implementation in qemu with linux-aio.
> Everything else is copyless.
>
> Since we are actively working on this, expect this limitation to
> disappear soon.
Great, when that happens I'll promise to revisit zero-copy transmit :)
> I support this, but it should be in addition to copylessness, not on its
> own.
I was talking about it in the context of zero-copy receive, where
you mentioned that the virtio/kvm copy may not occur on the CPU of
the guest's copy.
My point is that using multiqueue you can avoid this change of CPU.
But yeah I think zero-copy receive is much more useful than zero-
copy transmit at the moment. Although I'd prefer to wait for
you guys to finish the block layer work before contemplating
pushing the copy on receive into the guest :)
> - many guests will not support multiqueue
Well, these guests will suck both on baremetal and in virtualisation,
big deal :) Multiqueue at 10GbE speeds and above is simply not an
optional feature.
> - for some threaded workloads, you cannot predict where the final read()
> will come from; this renders multiqueue ineffective for keeping cache
> locality
>
> - usually you want virtio to transfer large amounts of data; but if you
> want your copies to be cache-hot, you need to limit transfers to half
> the cache size (a quarter if hyperthreading); this limits virtio
> effectiveness
Agreed on both counts.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-06 9:19 ` Herbert Xu
@ 2009-02-06 14:55 ` Avi Kivity
2009-02-07 11:56 ` Arnd Bergmann
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2009-02-06 14:55 UTC (permalink / raw)
To: Herbert Xu; +Cc: Chris Wright, Arnd Bergmann, Rusty Russell, kvm, netdev
Herbert Xu wrote:
> On Fri, Feb 06, 2009 at 10:46:37AM +0200, Avi Kivity wrote:
>
>> The guest's block layer is copyless. The host block layer is -><- this
>> far from being copyless -- all we need is preadv()/pwritev() or to
>> replace our thread pool implementation in qemu with linux-aio.
>> Everything else is copyless.
>>
>> Since we are actively working on this, expect this limitation to
>> disappear soon.
>>
>
> Great, when that happens I'll promise to revisit zero-copy transmit :)
>
>
I was hoping to get some concurrency here, but okay.
>> I support this, but it should be in addition to copylessness, not on its
>> own.
>>
>
> I was talking about it in the context of zero-copy receive, where
> you mentioned that the virtio/kvm copy may not occur on the CPU of
> the guest's copy.
>
> My point is that using multiqueue you can avoid this change of CPU.
>
> But yeah I think zero-copy receive is much more useful than zero-
> copy transmit at the moment. Although I'd prefer to wait for
> you guys to finish the block layer work before contemplating
> pushing the copy on receive into the guest :)
>
>
We'll get the block layer done soon, so it won't be a barrier.
>> - many guests will not support multiqueue
>>
>
> Well, these guests will suck both on baremetal and in virtualisation,
> big deal :) Multiqueue at 10GbE speeds and above is simply not an
> optional feature.
>
Each guest may only use a part of the 10Gb/s bandwidth; if you have 10
guests each using 1Gb/s, then we should be able to support this without
multiqueue in the guests.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-06 14:55 ` Avi Kivity
@ 2009-02-07 11:56 ` Arnd Bergmann
2009-02-08 3:01 ` David Miller
0 siblings, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2009-02-07 11:56 UTC (permalink / raw)
To: Avi Kivity; +Cc: Herbert Xu, Chris Wright, Rusty Russell, kvm, netdev
On Friday 06 February 2009, Avi Kivity wrote:
> > Well, these guests will suck both on baremetal and in virtualisation,
> > big deal :) Multiqueue at 10GbE speeds and above is simply not an
> > optional feature.
> >
>
> Each guest may only use a part of the 10Gb/s bandwidth; if you have 10
> guests each using 1Gb/s, then we should be able to support this without
> multiqueue in the guests.
I would expect that even people with 10 simultaneous guests would like
to be able to saturate the link when only one or two of them are doing
much traffic on the interface.
Having the load spread evenly over all guests sounds like a much rarer
use case.
Arnd <><
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-07 11:56 ` Arnd Bergmann
@ 2009-02-08 3:01 ` David Miller
0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2009-02-08 3:01 UTC (permalink / raw)
To: arnd; +Cc: avi, herbert, chrisw, rusty, kvm, netdev
From: Arnd Bergmann <arnd@arndb.de>
Date: Sat, 7 Feb 2009 12:56:06 +0100
> Having the load spread evenly over all guests sounds like a much rarer
> use case.
Totally agreed.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-05 2:07 copyless virtio net thoughts? Chris Wright
2009-02-05 12:37 ` Avi Kivity
@ 2009-02-18 11:38 ` Rusty Russell
2009-02-18 12:17 ` Herbert Xu
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Rusty Russell @ 2009-02-18 11:38 UTC (permalink / raw)
To: Chris Wright; +Cc: Arnd Bergmann, Herbert Xu, kvm
On Thursday 05 February 2009 12:37:32 Chris Wright wrote:
> There's been a number of different discussions re: getting copyless virtio
> net (esp. for KVM). This is just a poke in that general direction to
> stir the discussion. I'm interested to hear current thoughts?
This thread seems to have died out, time for me to weigh in!
There are four promising areas that I see when looking at virtio_net performance. I list them all here because they may interact:
1) Async tap access.
2) Direct NIC attachment.
3) Direct interguest networking.
4) Multiqueue virtio_net.
1) Async tap access
Either via aio, or something like the prototype virtio_ring patches I produced last year. This is potentially copyless networking for xmit (bar header), with one copy on recv.
2) Direct NIC attachment
This is particularly interesting with SR-IOV or other multiqueue nics, but for boutique cases or benchmarks, could be for normal NICs. So far I have some very sketched-out patches: for the attached nic, dev_alloc_skb() gets an skb from the guest (which supplies them via some kind of AIO interface), and a branch in netif_receive_skb() returns it to the guest. This bypasses all firewalling in the host though; we're basically having the guest process drive the NIC directly.
3) Direct interguest networking
Anthony has been thinking here: vmsplice has already been mentioned. The idea of passing directly from one guest to another is an interesting one: using DMA engines might be possible too. Again, the host can't firewall this traffic. Simplest as a dedicated "internal lan" NIC, but we could theoretically do a fast-path for certain MAC addresses on a general guest NIC.
4) Multiple queues
This is Herbert's. Should be fairly simple to add; it was in the back of my mind when we started. Not sure whether the queues should be static or dynamic (imagine direct interguest networking, one queue pair for each other guest), and how xmit queues would be selected by the guest (anything anywhere, or dst mac?).
Anyone else want to make comments?
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-18 11:38 ` Rusty Russell
@ 2009-02-18 12:17 ` Herbert Xu
2009-02-18 16:24 ` Arnd Bergmann
2009-02-18 23:31 ` Simon Horman
2 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2009-02-18 12:17 UTC (permalink / raw)
To: Rusty Russell; +Cc: Chris Wright, Arnd Bergmann, kvm
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
>
> 4) Multiple queues
> This is Herbert's. Should be fairly simple to add; it was in the back of my mind when we started. Not sure whether the queues should be static or dynamic (imagine direct interguest networking, one queue pair for each other guest), and how xmit queues would be selected by the guest (anything anywhere, or dst mac?).
The primary purpose of multiple queues is to maximise CPU utilisation,
so the number of queues is simply dependent on the number of CPUs
allotted to the guest.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-18 11:38 ` Rusty Russell
2009-02-18 12:17 ` Herbert Xu
@ 2009-02-18 16:24 ` Arnd Bergmann
2009-02-19 10:56 ` Rusty Russell
2009-02-18 23:31 ` Simon Horman
2 siblings, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2009-02-18 16:24 UTC (permalink / raw)
To: Rusty Russell; +Cc: Chris Wright, Herbert Xu, kvm
On Wednesday 18 February 2009, Rusty Russell wrote:
> 2) Direct NIC attachment
> This is particularly interesting with SR-IOV or other multiqueue nics,
> but for boutique cases or benchmarks, could be for normal NICs. So
> far I have some very sketched-out patches: for the attached nic
> dev_alloc_skb() gets an skb from the guest (which supplies them via
> some kind of AIO interface), and a branch in netif_receive_skb()
> which returned it to the guest. This bypasses all firewalling in
> the host though; we're basically having the guest process drive
> the NIC directly.
If this is not passing the PCI device directly to the guest, but
uses your concept, wouldn't it still be possible to use the firewalling
in the host? You can always inspect the headers, drop the frame, etc
without copying the whole frame at any point.
When it gets to the point of actually giving the (real pf or sr-iov vf)
to one guest, you really get to the point where you can't do local
firewalling any more.
> 3) Direct interguest networking
> Anthony has been thinking here: vmsplice has already been mentioned.
> The idea of passing directly from one guest to another is an
> interesting one: using dma engines might be possible too. Again,
> host can't firewall this traffic. Simplest as a dedicated "internal
> lan" NIC, but we could theoretically do a fast-path for certain MAC
> addresses on a general guest NIC.
Another option would be to use an SR-IOV adapter from multiple guests,
with a virtual ethernet bridge in the adapter. This moves the overhead
from the CPU to the bus and/or adapter, so it may or may not be a real
benefit depending on the workload.
Arnd <><
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-18 11:38 ` Rusty Russell
2009-02-18 12:17 ` Herbert Xu
2009-02-18 16:24 ` Arnd Bergmann
@ 2009-02-18 23:31 ` Simon Horman
2009-02-19 1:03 ` Dong, Eddie
` (2 more replies)
2 siblings, 3 replies; 19+ messages in thread
From: Simon Horman @ 2009-02-18 23:31 UTC (permalink / raw)
To: Rusty Russell; +Cc: Chris Wright, Arnd Bergmann, Herbert Xu, kvm
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
>
> 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> other multiqueue nics, but for boutique cases or benchmarks, could be for
> normal NICs. So far I have some very sketched-out patches: for the
> attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> them via some kind of AIO interface), and a branch in netif_receive_skb()
> which returned it to the guest. This bypasses all firewalling in the
> host though; we're basically having the guest process drive the NIC
> directly.
Hi Rusty,
Can I clarify that the idea with utilising SR-IOV would be to assign
virtual functions to guests? That is, something conceptually similar to
PCI pass-through in Xen (although I'm not sure that anyone has virtual
function pass-through working yet). If so, wouldn't this also be useful
on machines that have multiple NICs?
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: copyless virtio net thoughts?
2009-02-18 23:31 ` Simon Horman
@ 2009-02-19 1:03 ` Dong, Eddie
2009-02-19 11:36 ` Rusty Russell
2009-02-19 11:37 ` Chris Wright
2 siblings, 0 replies; 19+ messages in thread
From: Dong, Eddie @ 2009-02-19 1:03 UTC (permalink / raw)
To: Simon Horman, Rusty Russell
Cc: Chris Wright, Arnd Bergmann, Herbert Xu, kvm@vger.kernel.org,
Dong, Eddie
Simon Horman wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell
> wrote:
>>
>> 2) Direct NIC attachment This is particularly
>> interesting with SR-IOV or other multiqueue nics, but
>> for boutique cases or benchmarks, could be for normal
>> NICs. So far I have some very sketched-out patches: for
>> the attached nic dev_alloc_skb() gets an skb from the
>> guest (which supplies them via some kind of AIO
>> interface), and a branch in netif_receive_skb() which
>> returned it to the guest. This bypasses all firewalling
>> in the host though; we're basically having the guest
>> process drive the NIC directly.
>
> Hi Rusty,
>
> Can I clarify that the idea with utilising SR-IOV would
> be to assign virtual functions to guests? That is,
> something conceptually similar to PCI pass-through in Xen
> (although I'm not sure that anyone has virtual function
> pass-through working yet). If so, wouldn't this also be
> useful on machines that have multiple NICs?
>
Yes, and we have successfully got this running by assigning a VF to the guest in both Xen & KVM, but we are still working on pushing those patches out, since it needs Linux PCI subsystem and driver support.
Thx, eddie
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-18 16:24 ` Arnd Bergmann
@ 2009-02-19 10:56 ` Rusty Russell
0 siblings, 0 replies; 19+ messages in thread
From: Rusty Russell @ 2009-02-19 10:56 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: Chris Wright, Herbert Xu, kvm
On Thursday 19 February 2009 02:54:06 Arnd Bergmann wrote:
> On Wednesday 18 February 2009, Rusty Russell wrote:
>
> > 2) Direct NIC attachment
> > This is particularly interesting with SR-IOV or other multiqueue nics,
> > but for boutique cases or benchmarks, could be for normal NICs. So
> > far I have some very sketched-out patches: for the attached nic
> > dev_alloc_skb() gets an skb from the guest (which supplies them via
> > some kind of AIO interface), and a branch in netif_receive_skb()
> > which returned it to the guest. This bypasses all firewalling in
> > the host though; we're basically having the guest process drive
> > the NIC directly.
>
> If this is not passing the PCI device directly to the guest, but
> uses your concept, wouldn't it still be possible to use the firewalling
> in the host? You can always inspect the headers, drop the frame, etc
> without copying the whole frame at any point.
It's possible, but you don't want routing or parsing, etc: the NIC
is just "directly" attached to the guest.
You could do it in qemu or whatever, but it would not be the kernel scheme
(netfilter/iptables).
> > 3) Direct interguest networking
> > Anthony has been thinking here: vmsplice has already been mentioned.
> > The idea of passing directly from one guest to another is an
> > interesting one: using dma engines might be possible too. Again,
> > host can't firewall this traffic. Simplest as a dedicated "internal
> > lan" NIC, but we could theoretically do a fast-path for certain MAC
> > addresses on a general guest NIC.
>
> Another option would be to use an SR-IOV adapter from multiple guests,
> with a virtual ethernet bridge in the adapter. This moves the overhead
> from the CPU to the bus and/or adapter, so it may or may not be a real
> benefit depending on the workload.
Yes, I guess this should work. Even different SR-IOV adapters will simply
send to one another. I'm not sure this obviates the desire to have direct
inter-guest networking, which is more generic, though.
Thanks!
Rusty.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-18 23:31 ` Simon Horman
2009-02-19 1:03 ` Dong, Eddie
@ 2009-02-19 11:36 ` Rusty Russell
2009-02-19 14:51 ` Arnd Bergmann
2009-02-19 23:09 ` Simon Horman
2009-02-19 11:37 ` Chris Wright
2 siblings, 2 replies; 19+ messages in thread
From: Rusty Russell @ 2009-02-19 11:36 UTC (permalink / raw)
To: Simon Horman; +Cc: Chris Wright, Arnd Bergmann, Herbert Xu, kvm
On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> >
> > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > normal NICs. So far I have some very sketched-out patches: for the
> > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > which returned it to the guest. This bypasses all firewalling in the
> > host though; we're basically having the guest process drive the NIC
> > directly.
>
> Hi Rusty,
>
> Can I clarify that the idea with utilising SR-IOV would be to assign
> virtual functions to guests? That is, something conceptually similar to
> PCI pass-through in Xen (although I'm not sure that anyone has virtual
> function pass-through working yet).
Not quite: PCI passthrough is IMHO the *wrong* way to do it: it makes migration complicated (if not impossible), and requires emulation or the same NIC on the destination host.
This would be the *host* seeing the virtual functions as multiple NICs, then
the ability to attach a given NIC directly to a process.
This isn't guest-visible: the kvm process is configured to connect directly to a NIC, rather than (say) bridging through the host.
> If so, wouldn't this also be useful
> on machines that have multiple NICs?
Yes, but mainly as a benchmark hack AFAICT :)
Hope that clarifies,
Rusty.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-18 23:31 ` Simon Horman
2009-02-19 1:03 ` Dong, Eddie
2009-02-19 11:36 ` Rusty Russell
@ 2009-02-19 11:37 ` Chris Wright
2 siblings, 0 replies; 19+ messages in thread
From: Chris Wright @ 2009-02-19 11:37 UTC (permalink / raw)
To: Simon Horman; +Cc: Rusty Russell, Chris Wright, Arnd Bergmann, Herbert Xu, kvm
* Simon Horman (horms@verge.net.au) wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > normal NICs. So far I have some very sketched-out patches: for the
> > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > which returned it to the guest. This bypasses all firewalling in the
> > host though; we're basically having the guest process drive the NIC
> > directly.
>
> Can I clarify that the idea with utilising SR-IOV would be to assign
> virtual functions to guests? That is, something conceptually similar to
> PCI pass-through in Xen (although I'm not sure that anyone has virtual
> function pass-through working yet). If so, wouldn't this also be useful
> on machines that have multiple NICs?
This would be the typical usecase for sr-iov. But I think Rusty is
referring to giving a nic "directly" to a guest while the guest still
sees a virtio nic (not pass-through/device-assignment). So there's
no bridge, and it's zero copy since the dma buffers are supplied by the
guest, but the host has the driver for the physical nic or the VF.
thanks,
-chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-19 11:36 ` Rusty Russell
@ 2009-02-19 14:51 ` Arnd Bergmann
2009-02-19 23:09 ` Simon Horman
1 sibling, 0 replies; 19+ messages in thread
From: Arnd Bergmann @ 2009-02-19 14:51 UTC (permalink / raw)
To: Rusty Russell; +Cc: Simon Horman, Chris Wright, Herbert Xu, kvm, Dong, Eddie
On Thursday 19 February 2009, Rusty Russell wrote:
> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it:
> it makes migrate complicated (if not impossible), and requires
> emulation or the same NIC on the destination host.
>
> This would be the *host* seeing the virtual functions as multiple
> NICs, then the ability to attach a given NIC directly to a process.
I guess what you mean then is what Intel calls VMDq, not SR-IOV.
Eddie has some slides about this at
http://docs.huihoo.com/kvm/kvmforum2008/kdf2008_7.pdf .
The latest network cards support both operation modes, and it
appears to me that there is a place for both. VMDq gives you
the best performance without limiting flexibility, while SR-IOV
performance can in theory be even better, at the cost of a lot of
flexibility and potentially of local (guest-to-guest)
performance.
AFAICT, any card that supports SR-IOV should also allow a VMDq
like model, as you describe.
Arnd <><
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: copyless virtio net thoughts?
2009-02-19 11:36 ` Rusty Russell
2009-02-19 14:51 ` Arnd Bergmann
@ 2009-02-19 23:09 ` Simon Horman
1 sibling, 0 replies; 19+ messages in thread
From: Simon Horman @ 2009-02-19 23:09 UTC (permalink / raw)
To: Rusty Russell, Chris Wright; +Cc: Arnd Bergmann, Herbert Xu, kvm
On Thu, Feb 19, 2009 at 10:06:17PM +1030, Rusty Russell wrote:
> On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > >
> > > 2) Direct NIC attachment. This is particularly interesting with SR-IOV or
> > > other multiqueue NICs, but for boutique cases or benchmarks, could be for
> > > normal NICs. So far I have some very sketched-out patches: for the
> > > attached NIC, dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returns it to the guest. This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> >
> > Hi Rusty,
> >
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet).
>
> Not quite: I think PCI passthrough is the *wrong* way to do it: it
> makes migration complicated (if not impossible), and requires emulation or
> the same NIC on the destination host.
>
> This would be the *host* seeing the virtual functions as multiple NICs,
> then the ability to attach a given NIC directly to a process.
>
> This isn't guest-visible: the kvm process is configured to connect
> directly to a NIC, rather than (say) bridging through the host.
Hi Rusty, Hi Chris,
Thanks for the clarification.
I think the approach that Xen recommends for migration is to use
a bonding device that prefers the pass-through device when present
and falls back to a virtual NIC.
The idea that you outline above does sound somewhat cleaner :-)
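[Editor's note: the bonding arrangement Simon mentions can be sketched
roughly as below. This is only an illustrative configuration; the
interface names (`eth0` for the pass-through VF, `eth1` for the virtio
NIC) are placeholders, and the exact steps follow the standard Linux
bonding driver sysfs interface rather than anything from this thread.]

```shell
# Sketch: active-backup bond that prefers the pass-through VF (eth0)
# and falls back to the virtio NIC (eth1) across migration.
# Interface names are hypothetical placeholders.
modprobe bonding mode=active-backup miimon=100
echo eth0 > /sys/class/net/bond0/bonding/primary   # pass-through VF
ifenslave bond0 eth0 eth1                          # eth1 = virtio fallback
ip link set bond0 up
```

Before migration the VF is detached (hot-unplugged), the bond fails over
to the virtio slave, and a VF on the destination host can be re-enslaved
after the move.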
> > If so, wouldn't this also be useful on machines that have multiple
> > NICs?
>
> Yes, but mainly as a benchmark hack AFAICT :)
Ok, I was under the impression that at least in the Xen world it
was something people actually used. But I could easily be mistaken.
> Hope that clarifies, Rusty.
On Thu, Feb 19, 2009 at 03:37:52AM -0800, Chris Wright wrote:
> * Simon Horman (horms@verge.net.au) wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 2) Direct NIC attachment. This is particularly interesting with SR-IOV or
> > > other multiqueue NICs, but for boutique cases or benchmarks, could be for
> > > normal NICs. So far I have some very sketched-out patches: for the
> > > attached NIC, dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returns it to the guest. This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> >
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet). If so, wouldn't this also be useful
> > on machines that have multiple NICs?
>
> This would be the typical use case for SR-IOV. But I think Rusty is
> referring to giving a NIC "directly" to a guest while the guest still
> sees a virtio NIC (not pass-through/device-assignment). So there's
> no bridge, and zero copy, so the DMA buffers are supplied by the guest,
> but the host has the driver for the physical NIC or the VF.
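[Editor's note: the zero-copy receive flow Chris describes — guest
supplies the DMA buffers, host driver fills them in place — can be
modeled with a short user-space toy. This is purely illustrative: the
class and method names (`ZeroCopyNic`, `guest_post_buffer`,
`host_receive`) are invented for this sketch, and the real mechanism
would live in the host kernel behind an AIO-style interface.]

```python
from collections import deque

class ZeroCopyNic:
    """Toy model of guest-supplied rx buffers: the host 'driver'
    writes packet bytes directly into a buffer lent by the guest,
    so no host-side copy into a separate skb pool is needed."""

    def __init__(self):
        self.posted = deque()     # empty buffers lent by the guest
        self.completed = deque()  # (buffer, length) awaiting the guest

    def guest_post_buffer(self, buf):
        # Guest hands an empty buffer to the host
        # (analogous to placing it on the virtio avail ring).
        self.posted.append(buf)

    def host_receive(self, packet):
        # Host NIC driver fills a guest-supplied buffer in place --
        # the "zero copy" step. With no posted buffer, the packet
        # is dropped, mirroring an empty rx ring.
        if not self.posted:
            return False
        buf = self.posted.popleft()
        buf[:len(packet)] = packet
        self.completed.append((buf, len(packet)))
        return True

nic = ZeroCopyNic()
buf = bytearray(2048)
nic.guest_post_buffer(buf)
nic.host_receive(b"hello")
filled, n = nic.completed.popleft()
assert filled is buf          # same object: data landed in the guest's buffer
assert filled[:n] == b"hello"
```

The `filled is buf` check is the point of the exercise: the packet data
ends up in the very buffer the guest posted, with no intermediate copy.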
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
Thread overview: 19+ messages
2009-02-05 2:07 copyless virtio net thoughts? Chris Wright
2009-02-05 12:37 ` Avi Kivity
2009-02-05 14:25 ` Anthony Liguori
2009-02-06 5:40 ` Herbert Xu
2009-02-06 8:46 ` Avi Kivity
2009-02-06 9:19 ` Herbert Xu
2009-02-06 14:55 ` Avi Kivity
2009-02-07 11:56 ` Arnd Bergmann
2009-02-08 3:01 ` David Miller
2009-02-18 11:38 ` Rusty Russell
2009-02-18 12:17 ` Herbert Xu
2009-02-18 16:24 ` Arnd Bergmann
2009-02-19 10:56 ` Rusty Russell
2009-02-18 23:31 ` Simon Horman
2009-02-19 1:03 ` Dong, Eddie
2009-02-19 11:36 ` Rusty Russell
2009-02-19 14:51 ` Arnd Bergmann
2009-02-19 23:09 ` Simon Horman
2009-02-19 11:37 ` Chris Wright