From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: copyless virtio net thoughts?
Date: Thu, 05 Feb 2009 08:25:14 -0600
Message-ID: <498AF6CA.50101@codemonkey.ws>
References: <20090205020732.GA27684@sequoia.sous-sol.org> <498ADD73.3060906@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Chris Wright, Arnd Bergmann, Herbert Xu, Rusty Russell, kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from mail-qy0-f11.google.com ([209.85.221.11]:45184 "EHLO mail-qy0-f11.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751120AbZBEOZm (ORCPT ); Thu, 5 Feb 2009 09:25:42 -0500
Received: by qyk4 with SMTP id 4so332969qyk.13 for ; Thu, 05 Feb 2009 06:25:35 -0800 (PST)
In-Reply-To: <498ADD73.3060906@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Avi Kivity wrote:
> Chris Wright wrote:
>> There's been a number of different discussions re: getting copyless
>> virtio net (esp. for KVM). This is just a poke in that general
>> direction to stir the discussion. I'm interested to hear current
>> thoughts
>
> I believe that copyless networking is absolutely essential.
>
> For transmit, copyless is needed to properly support sendfile() type
> workloads - http/ftp/nfs serving. These are usually high-bandwidth,
> cache-cold workloads where a copy is most expensive.
>
> For receive, the guest will almost always do an additional copy, but
> it will most likely do the copy from another cpu. Xen netchannel2
> mitigates this somewhat by having the guest request the hypervisor to
> perform the copy when the rx interrupt is processed, but this may
> still be too early (the packet may be destined to a process that is on
> another vcpu), and the extra hypercall is expensive.
>
> In my opinion, it would be ideal to linux-aio enable taps and packet
> sockets. io_submit() allows submitting multiple buffers in one
> syscall and supports scatter/gather.
> io_getevents() supports dequeuing multiple packet completions in one
> syscall.

splice() has some nice properties too. It disconnects the notion of
moving packets around from actually copying them. It also fits well
into a more performant model of inter-guest I/O.

You can't publish multiple buffers with splice, but I don't think we can
do that today, practically speaking, because of mergeable RX buffers.
You would have to extend the linux-aio interface so that you could hand
it a bunch of buffers and have it tell you where the packet boundaries
were.

Regards,

Anthony Liguori
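
To make the splice() point above concrete, here is a minimal sketch (not from the thread) of moving data between two file descriptors through a pipe without staging it in a user-space buffer. The helper name splice_copy() is hypothetical, and regular files stand in for the tap device or socket a real virtio backend would use; it only illustrates how the "move" is expressed separately from any copy the kernel may or may not perform.

```c
#define _GNU_SOURCE           /* needed for splice() and SPLICE_F_MOVE */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not an existing API):
 * move up to len bytes from in_fd to out_fd via an intermediate pipe.
 * Returns the number of bytes moved, or -1 on error.
 */
static ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Stage 1: source -> pipe. The data stays in kernel buffers. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)           /* end of input */
            break;
        len -= (size_t)n;

        /* Stage 2: pipe -> destination; drain everything just queued. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                goto out;
            }
            left -= m;
            total += m;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

Note that each splice() call here describes a single run of bytes; nothing in this interface hands the kernel a set of buffers up front or reports packet boundaries back, which is the limitation discussed above.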