* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
In-Reply-To: <1340602924-3231-1-git-send-email-mike@dev-zero.net>

From: Stefan Hajnoczi @ 2012-11-24 15:21 UTC
  To: Mike Lovell; +Cc: Anthony Liguori, qemu-devel

On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell <mike@dev-zero.net> wrote:
> This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
> first had the idea when I was playing with the udp and mcast socket network
> backends while exploring how to build a VM infrastructure. I liked the idea of
> using the sockets backends because it doesn't require escalated permissions to
> configure and run as well as the ability to talk over IP networks.

Hi Mike,
I was just reading the VXLAN spec and Linux code when I realized this
is similar to your QDES approach:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02

If you're still hacking on QDES you may be interested.

VXLAN is a VLAN mechanism that gets around the 12-bit VLAN ID limit of
802.1Q tags. In large deployments it may be necessary to have more than
4096 VLANs; this is where VXLAN comes in.

It's a tiny header with a VXLAN Network ID that encapsulates Ethernet inside UDP:

  [Outer Ethernet][IP][UDP][VXLAN][Inner Ethernet][...]

UDP is used as follows:
1. If the host has already learnt an Inner MAC -> Outer IP mapping,
   then it transmits a unicast UDP packet.
2. Otherwise it transmits a multicast UDP packet.

That means all hosts join a multicast group - this enables broadcast
similar to what you've done in your patches.

Typically traffic from a VM on Host A to another VM on Host B will use
unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
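A minimal Python sketch of the encapsulation and the learn-or-flood decision described above, assuming the header layout from the linked draft: an 8-bit flags field whose I bit (0x08) marks the VNI as valid, a 24-bit VNI in bytes 4 to 6, and the remaining bits reserved. The function and class names are hypothetical; they are not taken from the Linux code or from QDES.

```python
import struct

# Header layout assumed from draft-mahalingam-dutt-dcops-vxlan-02 (8 bytes):
#   byte 0     flags; the I flag (0x08) marks the VNI as valid
#   bytes 1-3  reserved
#   bytes 4-6  24-bit VXLAN Network Identifier (VNI)
#   byte 7     reserved
VXLAN_FLAG_I = 0x08
VXLAN_HDR = struct.Struct("!II")  # two big-endian 32-bit words


def vxlan_encap(vni: int, frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame (the UDP payload)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return VXLAN_HDR.pack(VXLAN_FLAG_I << 24, vni << 8) + frame


def vxlan_decap(payload: bytes) -> tuple[int, bytes]:
    """Split a VXLAN UDP payload into (vni, inner Ethernet frame)."""
    if len(payload) < VXLAN_HDR.size:
        raise ValueError("payload shorter than the VXLAN header")
    word0, word1 = VXLAN_HDR.unpack_from(payload)
    if not (word0 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set")
    return word1 >> 8, payload[VXLAN_HDR.size:]


class ForwardingTable:
    """Hypothetical learned table: inner MAC -> outer IP address."""

    def __init__(self, mcast_group: str):
        self.mcast_group = mcast_group
        self.mac_to_ip = {}  # 6-byte MAC -> dotted-quad string

    def learn(self, frame: bytes, outer_src_ip: str) -> None:
        """Record which host the inner source MAC was seen behind."""
        src_mac = frame[6:12]
        if src_mac[0] & 0x01:  # group bit set: not a valid source address
            return
        self.mac_to_ip[src_mac] = outer_src_ip

    def next_hop(self, frame: bytes) -> str:
        """Unicast to the learned host, else fall back to the multicast group.

        Broadcast/multicast destinations are never learned (see learn()),
        so they always go to the group.
        """
        return self.mac_to_ip.get(frame[0:6], self.mcast_group)
```

A real implementation would also age out learned entries and keep a separate table per VNI; both are left out to keep the sketch short.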
I'm not sure if it makes sense to implement VXLAN in QEMU because the
multicast UDP socket uses a well-known port. I guess that means
multiple QEMUs running on the same host cannot use VXLAN unless they
bind to unique IP addresses. At that point we lose the advantage of a
pure userspace implementation and might as well use the kernel
implementation (or Open vSwitch) with tap devices.

Anyway, it's still interesting and maybe there's a way to solve this.

Stefan
From: Mike Lovell @ 2012-11-26 17:19 UTC
  To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel

On 11/24/2012 08:21 AM, Stefan Hajnoczi wrote:
> [...]
> If you're still hacking on QDES you may be interested.
> [...]
> I'm not sure if it makes sense to implement VXLAN in QEMU because the
> multicast UDP socket uses a well-known port. I guess that means
> multiple QEMUs running on the same host cannot use VXLAN unless they
> bind to unique IP addresses. At that point we lose the advantage of a
> pure userspace implementation and might as well use the kernel
> implementation (or Open vSwitch) with tap devices.
>
> Anyway, it's still interesting and maybe there's a way to solve this.
>
> Stefan

The VXLAN spec gave me some inspiration to write the original patch I
submitted. Unfortunately, I made the silly decision of using my own header
format when I should have used the VXLAN one, but I believe just changing that
would make this compatible with VXLAN.

I do still want to do more work on this, such as making it compatible with
VXLAN. There have also been a lot of other changes to the network subsystem
that I would need to update the patch for. I've been rather busy the past few
months with a work project and told myself I have to finish that before I can
go back to this. I was also waiting to see if the churn in the network
subsystem would calm down so I could make all the changes I need there at
once. Hopefully around the new year I'll have time to look at it. Since I
originally sent the patch to the list, a few people have asked me about it,
so I think there is some interest in it.

I think it does still make sense to implement it in QEMU. There isn't a
problem with multiple processes using the same multicast address. The
net_socket_mcast_create function in socket.c already sets the
IP_MULTICAST_LOOP option, which loops outgoing packets back so they are also
delivered to processes on the same host. That is why qdes_receive checks
whether the sender is the localAddr and drops the packet if it is. The big
advantage I see to implementing VXLAN inside QEMU is that it can be done
without any escalated privileges and without changing the host's network
configuration.

Mike
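To illustrate the socket options mentioned above, here is a hedged Python sketch of a multicast UDP socket that several processes on one host can share. It is loosely modeled on what net_socket_mcast_create is described as doing, not a copy of it. It assumes Linux semantics: the SO_REUSEADDR step is this sketch's assumption about what sharing one port requires, and the helper names are hypothetical.

```python
import socket
import struct


def make_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def open_shared_mcast_socket(group: str, port: int) -> socket.socket:
    """Open a UDP socket on a port that other local processes also use.

    On Linux, SO_REUSEADDR lets several sockets bind the same UDP port, and
    a multicast datagram for the group is delivered to each of them.
    IP_MULTICAST_LOOP=1 loops our own sends back to local sockets, this one
    included, so the receive path must recognize and drop its own traffic.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # wildcard bind; the port is shared via SO_REUSEADDR
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    return sock
```

Joining a group requires a usable multicast route on the host, so open_shared_mcast_socket may fail in minimal containers even though the option sequence is valid.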
From: Stefan Hajnoczi @ 2012-11-27 12:42 UTC
  To: Mike Lovell; +Cc: Anthony Liguori, qemu-devel

On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
> I think it does still make sense to implement it in QEMU. There isn't a
> problem with multiple processes using the same multicast address. The
> net_socket_mcast_create function in socket.c already sets the
> IP_MULTICAST_LOOP option, which loops outgoing packets back so they are also
> delivered to processes on the same host. That is why qdes_receive checks
> whether the sender is the localAddr and drops the packet if it is. The big
> advantage I see to implementing VXLAN inside QEMU is that it can be done
> without any escalated privileges and without changing the host's network
> configuration.

The part I'm wondering about with VXLAN multicast is whether all QEMU
processes on the host need to receive on the same well-known UDP port.
Not sure if that's possible with the sockets API.

Stefan
From: Anthony Liguori @ 2012-11-27 14:24 UTC
  To: Stefan Hajnoczi, Mike Lovell; +Cc: qemu-devel

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
>> [...]
>
> The part I'm wondering about with VXLAN multicast is whether all QEMU
> processes on the host need to receive on the same well-known UDP port.
> Not sure if that's possible with the sockets API.

Perhaps this is a dumb question, but wouldn't it be trivial to write a
VXLAN proxy that added a VXLAN tag to Ethernet frames from -net socket?

Obviously, this could also be done with the normal Linux tools at the
tun/tap layer.

I think we should resist adding a bunch of stuff to the networking layer
just because we can. Otherwise we'll end up reinventing the Linux
networking layer in QEMU.

Regards,

Anthony Liguori
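Anthony's proxy idea could look roughly like the following Python sketch. It assumes QEMU's socket backend is used in a UDP mode that carries one Ethernet frame per datagram; the function names and addresses are illustrative, and the proxy simply sends every frame to the multicast group, with no MAC learning and no filtering of its own looped-back packets.

```python
import select
import socket
import struct
from typing import Optional

VXLAN_HDR = struct.Struct("!II")
I_FLAG_WORD = 0x08 << 24  # I flag as it sits in the first 32-bit header word


def to_vxlan(vni: int, frame: bytes) -> bytes:
    """Wrap one Ethernet frame from QEMU in a VXLAN header."""
    return VXLAN_HDR.pack(I_FLAG_WORD, (vni & 0xFFFFFF) << 8) + frame


def from_vxlan(vni: int, payload: bytes) -> Optional[bytes]:
    """Unwrap a VXLAN payload; None if malformed or for another VNI."""
    if len(payload) < VXLAN_HDR.size:
        return None
    word0, word1 = VXLAN_HDR.unpack_from(payload)
    if not word0 & I_FLAG_WORD or word1 >> 8 != vni:
        return None
    return payload[VXLAN_HDR.size:]


def run_proxy(qemu_sock: socket.socket, qemu_addr, vxlan_sock: socket.socket,
              vxlan_dest, vni: int) -> None:
    """Relay frames between QEMU's UDP socket backend and a VXLAN socket.

    qemu_sock:  bound UDP socket that QEMU sends its frames to
    qemu_addr:  (host, port) where QEMU listens for frames from us
    vxlan_sock: UDP socket on the VXLAN port (e.g. a multicast socket)
    vxlan_dest: (group, port) every encapsulated frame is sent to
    """
    while True:
        readable, _, _ = select.select([qemu_sock, vxlan_sock], [], [])
        if qemu_sock in readable:
            frame = qemu_sock.recv(65535)
            vxlan_sock.sendto(to_vxlan(vni, frame), vxlan_dest)
        if vxlan_sock in readable:
            inner = from_vxlan(vni, vxlan_sock.recv(65535))
            if inner is not None:
                qemu_sock.sendto(inner, qemu_addr)
```

Keeping this outside QEMU is exactly the trade-off discussed in the thread: QEMU stays simple, but the user has to start and wire up a second process.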
From: Mike Lovell @ 2012-11-28 7:37 UTC
  To: Anthony Liguori; +Cc: Stefan Hajnoczi, qemu-devel

On 11/27/2012 07:24 AM, Anthony Liguori wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> The part I'm wondering about with VXLAN multicast is whether all QEMU
>> processes on the host need to receive on the same well-known UDP port.
>> Not sure if that's possible with the sockets API.
> Perhaps this is a dumb question, but wouldn't it be trivial to write a
> VXLAN proxy that added a VXLAN tag to Ethernet frames from -net socket?

This is definitely possible. When I was doing my initial prototyping to see
if this would work, I used the socket network backend to connect to a Python
program doing the VXLAN-like processing. It was really ugly code that isn't
worth reviving. I liked the idea of having this in QEMU since it would
simplify configuration and wouldn't require starting two processes and wiring
them together. Some will probably call this crazy, but I still end up using
the CLI a lot and I wanted to make that simpler. This just requires
specifying the multicast address and the network ID to QEMU.

Maybe there is a compromise between using the sockets API and CLI
simplicity: a helper option for the sockets API that starts the other
process, kind of like the bridge-helper, but a process that stays running as
long as the netdev is around. This would allow easy development of whatever
networking methods people want to experiment with. I briefly looked at the
code to see how this could be implemented but haven't started writing any
code.

> Obviously, this could also be done with the normal Linux tools at the
> tun/tap layer.
>
> I think we should resist adding a bunch of stuff to the networking layer
> just because we can. Otherwise we'll end up reinventing the Linux
> networking layer in QEMU.

Definitely a valid point. With the Linux 3.7 kernel getting a VXLAN
implementation, a guest could use a tap device connected to a Linux bridge
that also has a VXLAN interface. This would keep all the processing in the
kernel and doesn't reinvent the wheel. It still requires escalated privileges
to configure the networking on the host, which I'm trying to avoid (stupid
misguided security monkey that is bugging me). So there are trade-offs both
ways, and when I wrote the original patch no one was even talking about a
VXLAN implementation in the Linux kernel.

Mike
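For comparison, the kernel-only setup Mike describes (a tap device on a Linux bridge that also has a kernel VXLAN interface) can be sketched as a list of iproute2 commands. The interface names, VNI, group, underlay device, and user below are placeholders, and every command needs root or CAP_NET_ADMIN on the host, which is the privilege requirement discussed above.

```python
def kernel_vxlan_commands(vni, group, underlay_dev, user,
                          bridge="br0", vxlan_if="vxlan0", tap_if="tap0"):
    """Return iproute2 commands (argv lists) for tap + bridge + kernel VXLAN.

    All names are placeholders; running these requires CAP_NET_ADMIN.
    """
    return [
        # kernel VXLAN device joined to the multicast group on the underlay NIC
        ["ip", "link", "add", vxlan_if, "type", "vxlan", "id", str(vni),
         "group", group, "dev", underlay_dev],
        ["ip", "link", "add", bridge, "type", "bridge"],
        # persistent tap device that an unprivileged user's QEMU may open
        ["ip", "tuntap", "add", "dev", tap_if, "mode", "tap", "user", user],
        ["ip", "link", "set", vxlan_if, "master", bridge],
        ["ip", "link", "set", tap_if, "master", bridge],
        ["ip", "link", "set", vxlan_if, "up"],
        ["ip", "link", "set", tap_if, "up"],
        ["ip", "link", "set", bridge, "up"],
    ]
```

Each entry could be run from a privileged context with subprocess.run(cmd, check=True); QEMU would then attach with something like -netdev tap,id=n0,ifname=tap0,script=no,downscript=no.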
From: Mike Lovell @ 2012-11-28 7:14 UTC
  To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel

On 11/27/2012 05:42 AM, Stefan Hajnoczi wrote:
> On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
>> [...]
> The part I'm wondering about with VXLAN multicast is whether all QEMU
> processes on the host need to receive on the same well-known UDP port.
> Not sure if that's possible with the sockets API.
>
> Stefan

Ah, yes. All QEMU processes using the same well-known UDP port would receive
all multicast packets. Each process would then be responsible for checking
that a received packet belongs to the VNI (VXLAN Network Identifier) it is
configured to use. This adds some processing to every process: a single
integer comparison per packet, plus more packets to handle, since each
process sees the broadcast and multicast traffic for all VNIs on the port.

Also, the code currently allows user-defined ports, so a different UDP port
could be used for each network. I don't know if other implementations would
allow specifying a different port, though.

Mike
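The per-process cost Mike describes can be made concrete with a small Python simulation. It assumes every process bound to the shared port receives every datagram and keeps only those whose VNI matches its own configured VNI; the helper names are hypothetical and no real sockets are involved.

```python
import struct
from collections import Counter
from typing import Dict, List, Optional, Tuple

VXLAN_HDR = struct.Struct("!II")
I_FLAG_WORD = 0x08000000  # I flag as it appears in the first header word


def vni_of(payload: bytes) -> Optional[int]:
    """Return the VNI of a well-formed VXLAN payload, else None."""
    if len(payload) < VXLAN_HDR.size:
        return None
    word0, word1 = VXLAN_HDR.unpack_from(payload)
    return word1 >> 8 if word0 & I_FLAG_WORD else None


def fan_out(datagrams: List[bytes], processes: Dict[str, int]
            ) -> Tuple[Dict[str, List[bytes]], Counter]:
    """Simulate processes (name -> configured VNI) sharing one UDP port.

    Each datagram reaches every process; each process keeps only frames for
    its own VNI. Returns (frames kept per process, datagrams seen per process).
    """
    kept = {name: [] for name in processes}
    seen = Counter()
    for payload in datagrams:
        vni = vni_of(payload)
        for name, my_vni in processes.items():
            seen[name] += 1      # every process is woken for every datagram
            if vni == my_vni:    # the per-packet integer comparison
                kept[name].append(payload[VXLAN_HDR.size:])
    return kept, seen
```

Every process ends up inspecting the whole port's traffic, which is the overhead Mike mentions; using a separate UDP port per network, as the QDES code allows, would avoid that fan-out.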