From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53210) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WKTQe-00067I-MW for Qemu-devel@nongnu.org; Mon, 03 Mar 2014 09:01:19 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WKTQX-0000vC-NV for Qemu-devel@nongnu.org; Mon, 03 Mar 2014 09:01:12 -0500
Received: from alln-iport-3.cisco.com ([173.37.142.90]:30830) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WKTQX-0000uL-Fd for Qemu-devel@nongnu.org; Mon, 03 Mar 2014 09:01:05 -0500
From: "Anton Ivanov (antivano)"
Date: Mon, 3 Mar 2014 14:01:00 +0000
Message-ID: <53148B1A.3070008@cisco.com>
References: <5310489A.4060501@cisco.com> <20140303132746.GE21055@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140303132746.GE21055@stefanha-thinkpad.redhat.com>
Content-Language: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-ID: <845BDFC0432A5046A2A4BF4AC813DA8C@emea.cisco.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Contribution - L2TPv3 transport
To: Stefan Hajnoczi
Cc: "Qemu-devel@nongnu.org"

On 03/03/14 13:27, Stefan Hajnoczi wrote:
> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
>> 3. Qemu to communicate with the local host, remote vms, network devices,
>> etc at speeds which for a number of use cases exceed the speed of the
>> legacy tap driver.
> This surprises me. It's odd that tap performs significantly worse.

Multipacket RX can go a very long way and it does not work on tap's
emulation of a raw socket. At least in 3.2 :)

I could have put multipacket-TX in too, but that means breaking QoS for
my use cases as well as using 2+ thread IO, improving current qemu timer
API, etc. I have looked into it - it is doable. I would not try that in
first instance at this point.

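To make the multipacket RX point concrete, here is a minimal sketch,
assuming Linux with glibc, of draining several queued datagrams with a
single recvmmsg(2) call. It is not the code from the patch. The names
recv_batch, demo_batch, probe_non_socket, BATCH and FRAME_MAX are
hypothetical. An AF_UNIX datagram socketpair stands in for the UDP socket,
and a pipe stands in for "an fd that is not a socket"; it is not a tap
device.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* glibc: exposes recvmmsg() and struct mmsghdr */
#endif
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH     8              /* hypothetical: max packets per syscall */
#define FRAME_MAX 2048           /* hypothetical: per-packet buffer size */

/* Hypothetical helper: fetch up to BATCH already-queued datagrams from fd
 * with one recvmmsg(2) call. lens[i] receives the length of packet i.
 * Returns the number of packets read, or -1 with errno set. */
int recv_batch(int fd, unsigned char bufs[BATCH][FRAME_MAX],
               unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = FRAME_MAX;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: return what is queued right now instead of blocking. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}

/* Queue three datagrams on a socketpair, then read them in one call. */
int demo_batch(unsigned int lens[BATCH])
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);
    assert(rc == 0);

    const char *pkts[] = { "one", "two", "three" };
    for (int i = 0; i < 3; i++)
        send(sv[0], pkts[i], strlen(pkts[i]), 0);

    unsigned char bufs[BATCH][FRAME_MAX];
    int n = recv_batch(sv[1], bufs, lens);

    close(sv[0]);
    close(sv[1]);
    return n;
}

/* Try the same call on a pipe, i.e. a valid fd that is not a socket.
 * Returns the errno reported by recvmmsg(2), or 0 if it did not fail. */
int probe_non_socket(void)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);

    unsigned char bufs[BATCH][FRAME_MAX];
    unsigned int lens[BATCH];
    errno = 0;
    int n = recv_batch(p[0], bufs, lens);
    int err = (n < 0) ? errno : 0;

    close(p[0]);
    close(p[1]);
    return err;
}
```

On a socket the whole backlog comes back for the cost of one syscall. On a
descriptor that is not a socket the kernel rejects the call with ENOTSOCK
before any packet is read, so a read(2)-per-packet loop is the only option
there.
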
Tap at present can beat l2tpv3 on use cases where offloads have a
significant contribution. Tap is slower on anything per-packet and it is
slower end-to-end when combined with a bridge and/or OVS.

>
> I guess you have two options for using kernel L2TPv3 support:
>
> 1. Use a bridge and tap to forward between the L2TPv3 interface and the
> QEMU.

Correct. It ends up being slower on per-packet use cases. Also, in a
system it introduces one more touch point - the bridge. It needs to be
configured and kept up to date. For our key use case (one which we will
ship as a product) we have the following topology:

[Customer LAN] <-> [Physical CPE] <->... network... <-> [VM running a service]

An example of this would be putting a media server or a NAS on a VM and
joining it to a customer network.

We can connect the VM via a switch. Nothing wrong with that and it may
have the same performance end-to-end. However, it will introduce an
extra touch point to deal with in terms of control plane, orchestration
and provisioning.

>
> 2. Use macvtap on top of the L2TPv3 interface (if possible).

Did not try that so cannot really say.

>
> Option #2 should be faster.
>
> Now about the tap userspace ABI, is the performance bottleneck that the
> read(2) system call only receives one packet at a time? The tap file
> descriptor is not a socket so recvmmsg(2) cannot be used on it directly.

If I read the kernel source correctly the tap fd can emulate a socket
for some calls. However, when I try recvmmsg I get ENOTSOCK.

>
> I have wondered in the past whether something like packet mmap would be
> possible on a tap device.

I have done it on a raw socket. I have it approved for submission, mmap
works fine on RX (once again). Packet mmap does not work on TX - you end
up having to filter your own frames, leading to an overall drop in
efficiency. So TX still has to be a write to the socket.

We will be contributing that one shortly after I clean it up to the
required coding standards (in fact parts of the source sneaked into the
original diff file by mistake).

Theoretically, it has very few advantages compared to recvmmsg as
there is a copy involved in both cases. I am happy to rewrite that for
recvmmsg instead of packet mmap so we can reuse the vector IO code
across both drivers.

> At that point userspace just waits for
> notifications and the actual packets don't require any syscalls.

Indeed. That driver will be contributed shortly. We have done that one
too :)

A.

>
>> Our suggestion would be that this over time obsoletes the UDP variety of
>> the "socket" driver.
> Yes, thank you! L2TPv3 looks like a good replacement for the "socket"
> driver.
>
> Will review and respond to your patch in detail.
>
> Stefan