From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50B3A494.3050002@dev-zero.net>
Date: Mon, 26 Nov 2012 10:19:16 -0700
From: Mike Lovell
MIME-Version: 1.0
References: <1340602924-3231-1-git-send-email-mike@dev-zero.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
To: Stefan Hajnoczi
Cc: Anthony Liguori, qemu-devel

On 11/24/2012 08:21 AM, Stefan Hajnoczi wrote:
> On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell wrote:
>> This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
>> first had the idea when I was playing with the udp and mcast socket network
>> backends while exploring how to build a VM infrastructure. I liked the idea of
>> using the sockets backends cause it doesn't require escalated permissions to
>> configure and run as well as the ability to talk over IP networks.
>
> Hi Mike,
> I was just reading the VXLAN spec and Linux code when I realized this
> is similar to your QDES approach:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
>
> If you're still hacking on QDES you may be interested.
>
> VXLAN is a VLAN mechanism that gets around the 12-bit 802.1Q tag size.
> In large deployments it may be necessary to have more than 4096
> VLANs, this is where VXLAN comes in.
>
> It's a tiny header with VXLAN Network ID that encapsulates Ethernet inside UDP:
>
> [Outer Ethernet][IP][UDP] [VXLAN] [Inner Ethernet][...]
>
> UDP is used as follows:
> 1. If the host has already learnt an Inner MAC -> Outer IP mapping,
> then it transmits a unicast UDP packet.
> 2. Otherwise it transmits a multicast UDP packet.
>
> That means all hosts join a multicast group - this enables broadcast
> similar to what you've done in your patches.
>
> Typically traffic from a VM on Host A to another VM on Host B will use
> unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
>
> I'm not sure if it makes sense to implement VXLAN in QEMU because the
> multicast UDP socket uses a well-known port. I guess that means
> multiple QEMUs running on the same host cannot use VXLAN unless they
> bind to unique IP addresses. At that point we lose the advantage of a
> pure userspace implementation and might as well use the kernel
> implementation (or OpenVSwitch) with tap devices.
>
> Anyway, it's still interesting and maybe there's a way to solve this.
>
> Stefan

the VXLAN spec gave me some of the inspiration for the original patch i
submitted. unfortunately i made the silly decision to use my own header
format when i should have used the VXLAN one, but i believe just changing
that would make this compatible with VXLAN. i do still want to do more
work on this, including converting it to the VXLAN header format.
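
For reference, the VXLAN header discussed here is small enough to sketch in a few lines. The following is only an illustration in Python, not code from the QDES patch or from QEMU: the function names vxlan_encap and vxlan_decap are hypothetical, and the layout assumed is the 8-byte header from the VXLAN spec (one flags byte with the "VNI valid" bit set, 24 reserved bits, a 24-bit VNI, 8 reserved bits).

```python
import struct
from typing import Tuple

# Assumed layout, per the VXLAN spec: 8 bytes total.
#   byte 0     flags, with 0x08 meaning "VNI field is valid"
#   bytes 1-3  reserved
#   bytes 4-6  24-bit VXLAN Network Identifier (VNI)
#   byte 7     reserved
VXLAN_FLAG_VNI_VALID = 0x08
VXLAN_HEADER_LEN = 8


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Two big-endian 32-bit words: flags in the top byte, VNI in the top 3 bytes.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes) -> Tuple[int, bytes]:
    """Split a UDP payload into (vni, inner_frame) (hypothetical helper)."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:VXLAN_HEADER_LEN])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, payload[VXLAN_HEADER_LEN:]
```

The result of vxlan_encap would be the payload of the outer UDP datagram; the outer Ethernet, IP and UDP headers are left to the host's socket layer, which is what keeps a userspace implementation unprivileged.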
there have also been a lot of other changes to the network subsystem
that i would need to update the patch for. i've been rather busy the
past few months with a work project and told myself i have to finish
that before i can go back to this. i was also waiting to see if the
churn in the network subsystem would calm down so that i could make all
the changes i need there at once. hopefully around the new year i'll
have time to look at it. since i originally sent the patch to the list,
a few people have asked me about it, so i think there is some interest
in it.

i think it does still make sense to implement it in QEMU. there isn't a
problem with multiple processes using the same multicast address. the
net_socket_mcast_create function in socket.c already sets the
IP_MULTICAST_LOOP option, which makes it so packets get looped back and
also delivered to processes on the same host. that is why there is a
check in qdes_receive to see if the sender is the localAddr and drop the
packet if it is. the big advantage i see to implementing VXLAN inside
QEMU is that it can be done without any escalated privileges and without
reconfiguring the host's network configuration.

mike
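
To make the forwarding rule from the quoted mail and the loopback check concrete, here is a rough Python sketch. It is not the QDES code: the class LearningTable, its methods, and enable_multicast_loop are hypothetical names, and it assumes the "local address" compared against is an (ip, port) pair, so that another QEMU process on the same host is not mistaken for ourselves. The actual comparison in qdes_receive may differ.

```python
import socket
from typing import Dict, Tuple

Addr = Tuple[str, int]  # (ip, port); assumed form of the address being compared


def enable_multicast_loop(sock: socket.socket) -> None:
    """Ask the kernel to loop our multicast sends back to local listeners."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)


class LearningTable:
    """Hypothetical inner-MAC -> outer-address table for one endpoint."""

    def __init__(self, local_addr: Addr, mcast_group: Addr) -> None:
        self.local_addr = local_addr
        self.mcast_group = mcast_group
        self.mac_to_addr: Dict[str, Addr] = {}

    def on_receive(self, sender: Addr, src_mac: str) -> bool:
        """Return False if the packet should be dropped, True otherwise."""
        # With multicast loopback on, we also receive our own sends: drop them.
        if sender == self.local_addr:
            return False
        # Learn where this inner MAC lives so that replies can go unicast.
        self.mac_to_addr[src_mac] = sender
        return True

    def destination_for(self, dst_mac: str) -> Addr:
        """Unicast to a learnt peer, otherwise fall back to the multicast group."""
        return self.mac_to_addr.get(dst_mac, self.mcast_group)
```

A broadcast destination MAC is never learnt as a source, so it always falls through to the multicast group, which is what provides broadcast between all endpoints.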