Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)


* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
       [not found] <1340602924-3231-1-git-send-email-mike@dev-zero.net>
@ 2012-11-24 15:21 ` Stefan Hajnoczi
  2012-11-26 17:19   ` Mike Lovell
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2012-11-24 15:21 UTC
  To: Mike Lovell; +Cc: Anthony Liguori, qemu-devel

On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell <mike@dev-zero.net> wrote:
> This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
> first had the idea when I was playing with the udp and mcast socket network
> backends while exploring how to build a VM infrastructure. I liked the idea of
> using the sockets backends cause it doesn't require escalated permissions to
> configure and run as well as the ability to talk over IP networks.

Hi Mike,
I was just reading the VXLAN spec and Linux code when I realized this
is similar to your QDES approach:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02

If you're still hacking on QDES you may be interested.

VXLAN is a VLAN mechanism that gets around the 12-bit 802.1Q tag size.
In large deployments it may be necessary to have more than 4096
VLANs; this is where VXLAN comes in.

It's a tiny header with VXLAN Network ID that encapsulates Ethernet inside UDP:

[Outer Ethernet][IP][UDP] [VXLAN] [Inner Ethernet][...]
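
For reference, a minimal Python sketch of the VXLAN header itself, assuming
the 8-byte layout from the VXLAN draft (one flags byte with the "VNI valid"
bit, 24 reserved bits, a 24-bit VNI, 8 reserved bits). The function names
are hypothetical and are not taken from the QDES patch or the kernel code.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid ID
VXLAN_HEADER_LEN = 8


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner Ethernet frame with an 8-byte VXLAN header."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes) -> Tuple[int, bytes]:
    """Split a UDP payload into (vni, inner_frame)."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", payload[:VXLAN_HEADER_LEN])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word2 >> 8, payload[VXLAN_HEADER_LEN:]
```

The 24-bit VNI is what lifts the limit from 4096 (2^12) VLANs to about
16 million (2^24) segments.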

UDP is used as follows:
1. If the host has already learnt an Inner MAC -> Outer IP mapping,
then it transmits a unicast UDP packet.
2. Otherwise it transmits a multicast UDP packet.

That means all hosts join a multicast group - this enables broadcast
similar to what you've done in your patches.

Typically traffic from a VM on Host A to another VM on Host B will use
unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
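
The unicast/multicast choice described above can be sketched as a small
learning table. This is only an illustration: the class, the group address
239.1.1.1 and the port 4789 are assumptions made for the example, not values
from the QDES patch.

```python
from typing import Dict, Tuple

# Hypothetical multicast group and UDP port for the example.
MCAST_GROUP = ("239.1.1.1", 4789)


class ForwardingTable:
    """Maps inner source MACs to the outer IP they were last seen behind."""

    def __init__(self, port: int = MCAST_GROUP[1]) -> None:
        self._port = port
        self._mac_to_ip: Dict[bytes, str] = {}

    def learn(self, inner_frame: bytes, outer_src_ip: str) -> None:
        # Bytes 6..11 of an Ethernet frame are the source MAC.
        self._mac_to_ip[inner_frame[6:12]] = outer_src_ip

    def destination(self, inner_frame: bytes) -> Tuple[str, int]:
        dst_mac = inner_frame[0:6]
        # The low bit of the first octet marks broadcast/multicast MACs.
        if dst_mac[0] & 1:
            return MCAST_GROUP
        outer_ip = self._mac_to_ip.get(dst_mac)
        if outer_ip is None:
            return MCAST_GROUP          # unknown MAC: flood via multicast
        return (outer_ip, self._port)   # known MAC: unicast to that host
```

A real implementation would also age out stale entries; that is left out
here to keep the sketch short.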

I'm not sure if it makes sense to implement VXLAN in QEMU because the
multicast UDP socket uses a well-known port.  I guess that means
multiple QEMUs running on the same host cannot use VXLAN unless they
bind to unique IP addresses.  At that point we lose the advantage of a
pure userspace implementation and might as well use the kernel
implementation (or OpenVSwitch) with tap devices.

Anyway, it's still interesting and maybe there's a way to solve this.

Stefan


* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
  2012-11-24 15:21 ` [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES) Stefan Hajnoczi
@ 2012-11-26 17:19   ` Mike Lovell
  2012-11-27 12:42     ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Lovell @ 2012-11-26 17:19 UTC
  To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel

On 11/24/2012 08:21 AM, Stefan Hajnoczi wrote:
> On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell <mike@dev-zero.net> wrote:
>> This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
>> first had the idea when I was playing with the udp and mcast socket network
>> backends while exploring how to build a VM infrastructure. I liked the idea of
>> using the sockets backends cause it doesn't require escalated permissions to
>> configure and run as well as the ability to talk over IP networks.
> Hi Mike,
> I was just reading the VXLAN spec and Linux code when I realized this
> is similar to your QDES approach:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
>
> If you're still hacking on QDES you may be interested.
>
> VXLAN is a VLAN mechanism that gets around the 12-bit 802.1Q tag size.
>   In large deployments it may be necessary to have more than 4096
> VLANs, this is where VXLAN comes in.
>
> It's a tiny header with VXLAN Network ID that encapsulates Ethernet inside UDP:
>
> [Outer Ethernet][IP][UDP] [VXLAN] [Inner Ethernet][...]
>
> UDP is used as follows:
> 1. If the host has already learnt an Inner MAC -> Outer IP mapping,
> then it transmits a unicast UDP packet.
> 2. Otherwise it transmits a multicast UDP packet.
>
> That means all hosts join a multicast group - this enables broadcast
> similar to what you've done in your patches.
>
> Typically traffic from a VM on Host A to another VM on Host B will use
> unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
>
> I'm not sure if it makes sense to implement VXLAN in QEMU because the
> multicast UDP socket uses a well-known port.  I guess that means
> multiple QEMUs running on the same host cannot use VXLAN unless they
> bind to unique IP addresses.  At that point we lose the advantage of a
> pure userspace implementation and might as well use the kernel
> implementation (or OpenVSwitch) with tap devices.
>
> Anyway, it's still interesting and maybe there's a way to solve this.
>
> Stefan

the VXLAN spec gave me some inspiration to write the original patch i 
submitted. unfortunately i made a silly decision of using my own header 
format and should have used the VXLAN one. but i believe just changing 
that would make this compatible with VXLAN.

i do still want to do more work on this, such as converting it to be
compatible with VXLAN. there have also been a lot of other changes to
the network subsystem that i would need to update the patch for. i've
been rather busy the past few months with a work project and told myself
i have to finish that before i can go back to this. i was also waiting
to see if the churn in the network subsystem would calm down so i could
make all the changes i need there at once. hopefully around the new year
i'll have time to look at it. since i originally sent the patch to the
list, a few people have asked me about it, so i think there is some
interest in it.

i think it does still make sense to implement it in QEMU. there isn't a 
problem with multiple processes using the same multicast address. the 
net_socket_mcast_create function in socket.c already sets the 
IP_MULTICAST_LOOP option which makes it so packets get looped back and 
also delivered to processes on the same host. that is why there is a 
check in qdes_receive to see if the sender is the localAddr and drop it 
if it is. the big advantage i see to implementing VXLAN inside QEMU is 
that it can be done without any escalated privileges and without 
reconfiguring the host's network configuration.
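
a rough Python sketch of the socket setup and self-filter described above.
it is not the code from socket.c or the QDES patch; the function names are
made up, and it assumes the process already knows the (ip, port) its own
packets are sent from.

```python
import socket
import struct
from typing import Tuple

Address = Tuple[str, int]


def open_mcast_socket(group: str, port: int) -> socket.socket:
    """Open an unprivileged UDP socket joined to a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Let several processes on one host bind the same group/port pair.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: group address followed by the local interface address.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # Loop transmissions back so other processes on this host see them too.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    return sock


def is_own_packet(sender: Address, local_addr: Address) -> bool:
    """True if a looped-back datagram came from this process itself."""
    return sender == local_addr
```

a receive loop would call sock.recvfrom() and skip any datagram for which
is_own_packet(sender, local_addr) is true. none of this needs root.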

mike


* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
  2012-11-26 17:19   ` Mike Lovell
@ 2012-11-27 12:42     ` Stefan Hajnoczi
  2012-11-27 14:24       ` Anthony Liguori
  2012-11-28  7:14       ` Mike Lovell
  0 siblings, 2 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2012-11-27 12:42 UTC
  To: Mike Lovell; +Cc: Anthony Liguori, qemu-devel

On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
> i think it does still make sense to implement it in QEMU. there isn't a
> problem with multiple processes using the same multicast address. the
> net_socket_mcast_create function in socket.c already sets the
> IP_MULTICAST_LOOP option which makes it so packets get looped back and also
> delivered to processes on the same host. that is why there is a check in
> qdes_receive to see if the sender is the localAddr and drop it if it is. the
> big advantage i see to implementing VXLAN inside QEMU is that it can be done
> without any escalated privileges and without reconfiguring the host's network
> configuration.

The part I'm wondering about with VXLAN multicast is whether all QEMU
processes on the host need to receive on the same well-known UDP port.
Not sure if that's possible with the sockets API.

Stefan


* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
  2012-11-27 12:42     ` Stefan Hajnoczi
@ 2012-11-27 14:24       ` Anthony Liguori
  2012-11-28  7:37         ` Mike Lovell
  2012-11-28  7:14       ` Mike Lovell
  1 sibling, 1 reply; 6+ messages in thread
From: Anthony Liguori @ 2012-11-27 14:24 UTC
  To: Stefan Hajnoczi, Mike Lovell; +Cc: qemu-devel

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
>> i think it does still make sense to implement it in QEMU. there isn't a
>> problem with multiple processes using the same multicast address. the
>> net_socket_mcast_create function in socket.c already sets the
>> IP_MULTICAST_LOOP option which makes it so packets get looped back and also
>> delivered to processes on the same host. that is why there is a check in
>> qdes_receive to see if the sender is the localAddr and drop it if it is. the
>> big advantage i see to implementing VXLAN inside QEMU is that it can be done
>> without any escalated privileges and without reconfiguring the host's network
>> configuration.
>
> The part I'm wondering about with VXLAN multicast is whether all QEMU
> processes on the host need to receive on the same well-known UDP port.
>  Not sure if that's possible with the sockets API.

Perhaps this is a dumb question, but wouldn't it be trivial to write a
VXLAN proxy that added a VXLAN tag to ethernet frames from -net socket?

Obviously, this could also be done with the normal linux tools at the
tun/tap layer too.

I think we should resist adding a bunch of stuff to the networking layer
just because we can.  Otherwise we'll end up reinventing the Linux
networking layer in QEMU.

Regards,

Anthony Liguori

>
> Stefan


* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
  2012-11-27 12:42     ` Stefan Hajnoczi
  2012-11-27 14:24       ` Anthony Liguori
@ 2012-11-28  7:14       ` Mike Lovell
  1 sibling, 0 replies; 6+ messages in thread
From: Mike Lovell @ 2012-11-28  7:14 UTC
  To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel

On 11/27/2012 05:42 AM, Stefan Hajnoczi wrote:
> On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
>> i think it does still make sense to implement it in QEMU. there isn't a
>> problem with multiple processes using the same multicast address. the
>> net_socket_mcast_create function in socket.c already sets the
>> IP_MULTICAST_LOOP option which makes it so packets get looped back and also
>> delivered to processes on the same host. that is why there is a check in
>> qdes_receive to see if the sender is the localAddr and drop it if it is. the
>> big advantage i see to implementing VXLAN inside QEMU is that it can be done
>> without any escalated privileges and without reconfiguring the host's network
>> configuration.
> The part I'm wondering about with VXLAN multicast is whether all QEMU
> processes on the host need to receive on the same well-known UDP port.
>   Not sure if that's possible with the sockets API.
>
> Stefan

ah. yes. all qemu processes using the same well-known UDP port would 
receive all multicast packets. the individual processes would then be 
responsible for checking to make sure the received packets are for the 
same VNI (VXLAN Network Identifier) that the individual process is 
configured to use. this would result in some additional processing for
every process. it should just be a single int comparison on every packet,
plus extra traffic to handle, since each process would also receive the
broadcast and multicast packets for all VNIs on the port.
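
the per-packet check could look roughly like this in Python. it assumes the
8-byte VXLAN header layout from the draft (VNI in the top 24 bits of the
second 32-bit word), not the custom QDES header, and the function name is
made up for the example.

```python
import struct
from typing import Optional

VXLAN_HEADER_LEN = 8


def frame_for_vni(payload: bytes, my_vni: int) -> Optional[bytes]:
    """Return the inner frame if the datagram is for my_vni, else None."""
    if len(payload) < VXLAN_HEADER_LEN:
        return None  # too short to carry a VXLAN header
    # Second 32-bit word: 24-bit VNI followed by 8 reserved bits.
    (word2,) = struct.unpack_from("!I", payload, 4)
    if word2 >> 8 != my_vni:
        return None  # some other network sharing the port: drop
    return payload[VXLAN_HEADER_LEN:]
```

every process bound to the shared port would run this on each datagram it
receives and silently drop the ones that return None.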

also, the code currently allows for using user defined ports and 
different udp ports could be used for each network. i don't know if 
other implementations would allow for specifying a different port though.

mike


* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
  2012-11-27 14:24       ` Anthony Liguori
@ 2012-11-28  7:37         ` Mike Lovell
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Lovell @ 2012-11-28  7:37 UTC
  To: Anthony Liguori; +Cc: Stefan Hajnoczi, qemu-devel

On 11/27/2012 07:24 AM, Anthony Liguori wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> The part I'm wondering about with VXLAN multicast is whether all QEMU
>> processes on the host need to receive on the same well-known UDP port.
>>   Not sure if that's possible with the sockets API.
> Perhaps this is a dumb question, but wouldn't it be trivial to write a
> VXLAN proxy that added a VXLAN tag to ethernet frames from -net socket?

this is definitely possible. when i was doing my initial prototyping to 
see if this would be possible, i used the socket network backend to 
connect to a python program doing the VXLAN-like processing. really ugly 
code that isn't worth reviving.
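
this is not that prototype, just a rough sketch of the shape such a proxy
could take. it assumes the socket backend is used in a datagram mode where
each UDP datagram carries one raw ethernet frame, and that the tag is the
8-byte VXLAN header from the draft. all names here are hypothetical.

```python
import socket
import struct
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]
Transform = Callable[[bytes], Optional[bytes]]


def add_vxlan_tag(vni: int) -> Transform:
    """Build a transform that prefixes frames with a VXLAN header."""
    # Flags byte 0x08 marks the VNI as valid; the VNI sits in bits 8..31
    # of the second word.
    header = struct.pack("!II", 0x08 << 24, vni << 8)
    return lambda frame: header + frame


def relay_once(src: socket.socket, dst: socket.socket, dst_addr: Address,
               transform: Transform, bufsize: int = 65535) -> bool:
    """Move one datagram from src to dst, rewriting it on the way."""
    data, _sender = src.recvfrom(bufsize)
    out = transform(data)
    if out is None:
        return False  # the transform asked for the datagram to be dropped
    dst.sendto(out, dst_addr)
    return True
```

the proxy would call relay_once in a loop for the qemu-to-network
direction, and again with a tag-stripping transform for the other
direction.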

i liked the idea of having this in qemu since it would simplify 
configuration and wouldn't require starting two processes and wiring 
them together. some will probably call this crazy but i still end up 
using the cli a lot and i wanted to make that simpler. this just 
requires specifying the multicast address and the network id to qemu.

maybe there is a compromise between using the sockets api and cli
simplicity: a helper option for the sockets api that starts
the other process. kind of like the bridge-helper but a process that
stays running as long as the netdev is around. this would allow easy 
development of whatever networking methods people would want to 
experiment with. i briefly looked at the code to see how this could 
potentially be implemented but haven't started writing any code.

> Obviously, this could also be done with the normal linux tools at the
> tun/tap layer too.
>
> I think we should resist adding a bunch of stuff to the networking layer
> just because we can.  Otherwise we'll end up reinventing the Linux
> networking layer in QEMU.

definitely a valid point. with the linux 3.7 kernel getting a VXLAN 
implementation, a guest could use a tap device connected to a linux 
bridge which also has a VXLAN interface. this would keep all the 
processing in the kernel and doesn't re-invent the wheel. it still 
requires escalated privileges to configure the networking in the host 
which i'm trying to avoid (stupid misguided security monkey that is 
bugging me). so trade-offs both ways and when i wrote the original patch 
there wasn't anyone even talking about a VXLAN implementation in the 
linux kernel.

mike

