netdev.vger.kernel.org archive mirror
From: Avi Kivity <avi@redhat.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Chris Wright <chrisw@sous-sol.org>, Arnd Bergmann <arnd@arndb.de>,
	Rusty Russell <rusty@rustcorp.com.au>,
	kvm@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: copyless virtio net thoughts?
Date: Fri, 06 Feb 2009 10:46:37 +0200
Message-ID: <498BF8ED.8090208@redhat.com>
In-Reply-To: <20090206054054.GA4824@gondor.apana.org.au>

Herbert Xu wrote:
> On Thu, Feb 05, 2009 at 02:37:07PM +0200, Avi Kivity wrote:
>   
>> I believe that copyless networking is absolutely essential.
>>     
>
> I used to think it was important, but I'm now of the opinion
> that it's quite useless for virtualisation as it stands.
>
>   
>> For transmit, copyless is needed to properly support sendfile() type  
>> workloads - http/ftp/nfs serving.  These are usually high-bandwidth,  
>> cache-cold workloads where a copy is most expensive.
>>     
>
> This is totally true for baremetal, but useless for virtualisation
> right now because the block layer is not zero-copy.  That is, the
> data is going to be cache hot anyway so zero-copy networking doesn't
> buy you much at all.
>   

The guest's block layer is copyless.  The host block layer is -><- this 
far from being copyless -- all we need is preadv()/pwritev() or to 
replace our thread pool implementation in qemu with linux-aio.  
Everything else is copyless.
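
As a rough illustration of what preadv() buys, here is a minimal sketch in
Python rather than qemu's C. It reads one contiguous file range directly
into several separate buffers with a single call, so no intermediate bounce
buffer is needed. The helper name scatter_read and the sample data are
made up for this example; os.preadv() is only available on some platforms,
so a pread()-based fallback is included.

```python
import os
import tempfile


def scatter_read(fd, sizes, offset):
    """Read consecutive file ranges into separate buffers (hypothetical helper)."""
    buffers = [bytearray(n) for n in sizes]
    if hasattr(os, "preadv"):
        # One system call fills every buffer in order, with no
        # intermediate bounce buffer in user space.
        nread = os.preadv(fd, buffers, offset)
    else:
        # Fallback without preadv(): one pread() per buffer, which
        # costs an extra copy into each destination.
        nread = 0
        for buf in buffers:
            chunk = os.pread(fd, len(buf), offset + nread)
            buf[:len(chunk)] = chunk
            nread += len(chunk)
            if len(chunk) < len(buf):
                break  # short read: end of file reached
    return nread, buffers


# Demo on a throwaway file: a 6-byte "header" followed by a 7-byte "payload".
with tempfile.TemporaryFile() as f:
    f.write(b"headerPAYLOADtail")
    f.flush()
    nread, (hdr, body) = scatter_read(f.fileno(), [6, 7], 0)
```

In the qemu case the destination buffers would be guest memory segments
instead of freshly allocated bytearrays; that mapping is not shown here.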

Since we are actively working on this, expect this limitation to 
disappear soon.

(even if it doesn't, the effect of block layer copies is multiplied by 
the cache miss percentage which can be quite low for many workloads; but 
again, we're not building on that)

> Please also recall that for the time being, block speeds are
> way slower than network speeds.  So the really interesting case
> is actually network-to-network transfers.  Again due to the
> RX copy this is going to be cache hot.
>   

Block speeds are not way slower.  We're at 4Gb/s for Fibre Channel and
10Gb/s for networking.  With dual channels or a decent cache hit rate
they're evenly matched.
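
To make the "evenly matched" arithmetic concrete, here is a rough sketch.
It assumes that cache hits cost no storage bandwidth, so only misses are
limited by the storage link; the function names are made up for
illustration.

```python
def effective_block_gbps(link_gbps, hit_rate):
    """Data rate seen by the guest when only cache misses use the storage link."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return link_gbps / (1.0 - hit_rate)


def hit_rate_to_match(link_gbps, net_gbps):
    """Cache hit rate at which storage keeps up with the network link."""
    return max(0.0, 1.0 - link_gbps / net_gbps)


# Figures from the text: 4Gb/s per Fibre Channel link, 10Gb/s networking.
single = hit_rate_to_match(4.0, 10.0)  # one channel
dual = hit_rate_to_match(8.0, 10.0)    # two channels
```

Under this simplified model a single channel needs roughly a 60% hit rate
to keep up with 10Gb/s, and dual channels need roughly 20%.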

>> For receive, the guest will almost always do an additional copy, but it  
>> will most likely do the copy from another cpu.  Xen netchannel2  
>>     
>
> That's what we should strive to avoid.  The best scenario with
> modern 10GbE NICs is to stay on one CPU if at all possible.  The
> NIC will pick a CPU when it delivers the packet into one of the
> RX queues and we should stick with it for as long as possible.
>
> So what I'd like to see next in virtualised networking is virtual
> multiqueue support in guest drivers.  No I'm not talking about
> making one or more of the physical RX/TX queues available to the
> guest (aka passthrough), but actually turning something like the
> virtio-net interface into a multiqueue interface.
>   

I support this, but it should be in addition to copylessness, not on its 
own.

- many guests will not support multiqueue
- for some threaded workloads, you cannot predict where the final read() 
will come from; this renders multiqueue ineffective for keeping cache 
locality
- usually you want virtio to transfer large amounts of data; but if you 
want your copies to be cache-hot, you need to limit transfers to half 
the cache size (a quarter if hyperthreading); this limits virtio 
effectiveness
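
The sizing rule in the last point can be written down as a small helper.
This is only a sketch: the function name and the 6 MiB cache size are made
up, and the reasoning in the comments (source and destination must both
fit, and two hyperthreads share one cache) is one reading of the rule, not
a measured result.

```python
def max_cache_hot_transfer(cache_bytes, hyperthreading=False):
    """Largest transfer that can stay cache-hot across one copy."""
    # Source and destination of the copy both have to fit in the cache,
    # so at most half of it is usable for the data itself.
    divisor = 2
    if hyperthreading:
        # Assumption: two hardware threads share the cache, halving the
        # share available to each.
        divisor *= 2
    return cache_bytes // divisor


MIB = 1024 * 1024
limit_plain = max_cache_hot_transfer(6 * MIB)                   # half the cache
limit_ht = max_cache_hot_transfer(6 * MIB, hyperthreading=True)  # a quarter
```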


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

