* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
[not found] <1340602924-3231-1-git-send-email-mike@dev-zero.net>
@ 2012-11-24 15:21 ` Stefan Hajnoczi
2012-11-26 17:19 ` Mike Lovell
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2012-11-24 15:21 UTC (permalink / raw)
To: Mike Lovell; +Cc: Anthony Liguori, qemu-devel
On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell <mike@dev-zero.net> wrote:
> This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
> first had the idea when I was playing with the udp and mcast socket network
> backends while exploring how to build a VM infrastructure. I liked the idea of
> using the sockets backends cause it doesn't require escalated permissions to
> configure and run as well as the ability to talk over IP networks.
Hi Mike,
I was just reading the VXLAN spec and Linux code when I realized this
is similar to your QDES approach:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
If you're still hacking on QDES you may be interested.
VXLAN is a VLAN mechanism that gets around the 12-bit 802.1Q tag size.
In large deployments it may be necessary to have more than 4096
VLANs; this is where VXLAN comes in.
It's a tiny header with VXLAN Network ID that encapsulates Ethernet inside UDP:
[Outer Ethernet][IP][UDP] [VXLAN] [Inner Ethernet][...]
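To make the layout above concrete, here is a minimal sketch of the 8-byte VXLAN header as described in the draft: one flags byte (with the "I" bit marking the VNI as valid), 24 reserved bits, the 24-bit VXLAN Network ID, and 8 more reserved bits. The function names are made up for illustration and are not taken from the QDES patch or the kernel code.

```python
import struct

VXLAN_HEADER_LEN = 8
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: VNI field is valid


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, then 24 reserved bits.
    # Word 1: VNI in the top 24 bits, then 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes):
    """Split a UDP payload into (vni, inner_frame) (hypothetical helper)."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:VXLAN_HEADER_LEN])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, payload[VXLAN_HEADER_LEN:]
```

The result of vxlan_encap() is what would be handed to sendto() as the UDP payload; the outer Ethernet, IP and UDP headers are added by the host's network stack.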
UDP is used as follows:
1. If the host has already learnt an Inner MAC -> Outer IP mapping,
then it transmits a unicast UDP packet.
2. Otherwise it transmits a multicast UDP packet.
That means all hosts join a multicast group - this enables broadcast
similar to what you've done in your patches.
Typically traffic from a VM on Host A to another VM on Host B will use
unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
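The learning and forwarding rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the kernel's or QDES's data structure; the class and method names are invented, and aging of stale entries is left out.

```python
class ForwardingTable:
    """Inner MAC -> Outer IP mapping with multicast fallback (illustrative only)."""

    def __init__(self, mcast_group: str):
        self.mcast_group = mcast_group
        self.mac_to_ip = {}

    def learn(self, inner_src_mac: str, outer_src_ip: str) -> None:
        # Called for every received packet: remember which host the
        # inner source MAC was last seen behind.
        self.mac_to_ip[inner_src_mac] = outer_src_ip

    def destination(self, inner_dst_mac: str) -> str:
        # Known MAC: unicast to the learnt host.
        # Unknown MAC (or broadcast, which is never learnt as a source):
        # fall back to the multicast group.
        return self.mac_to_ip.get(inner_dst_mac, self.mcast_group)
```

The first frame from a VM on Host A to a VM on Host B goes to the multicast group; once B's reply has been seen, A has learnt the mapping and later frames go out as unicast UDP.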
I'm not sure if it makes sense to implement VXLAN in QEMU because the
multicast UDP socket uses a well-known port. I guess that means
multiple QEMUs running on the same host cannot use VXLAN unless they
bind to unique IP addresses. At that point we lose the advantage of a
pure userspace implementation and might as well use the kernel
implementation (or OpenVSwitch) with tap devices.
Anyway, it's still interesting and maybe there's a way to solve this.
Stefan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
2012-11-24 15:21 ` [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES) Stefan Hajnoczi
@ 2012-11-26 17:19 ` Mike Lovell
2012-11-27 12:42 ` Stefan Hajnoczi
0 siblings, 1 reply; 6+ messages in thread
From: Mike Lovell @ 2012-11-26 17:19 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel
On 11/24/2012 08:21 AM, Stefan Hajnoczi wrote:
> On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell <mike@dev-zero.net> wrote:
>> This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
>> first had the idea when I was playing with the udp and mcast socket network
>> backends while exploring how to build a VM infrastructure. I liked the idea of
>> using the sockets backends cause it doesn't require escalated permissions to
>> configure and run as well as the ability to talk over IP networks.
> Hi Mike,
> I was just reading the VXLAN spec and Linux code when I realized this
> is similar to your QDES approach:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
>
> If you're still hacking on QDES you may be interested.
>
> VXLAN is a VLAN mechanism that gets around the 12-bit 802.1Q tag size.
> In large deployments it may be necessary to have more than 4096
> VLANs; this is where VXLAN comes in.
>
> It's a tiny header with VXLAN Network ID that encapsulates Ethernet inside UDP:
>
> [Outer Ethernet][IP][UDP] [VXLAN] [Inner Ethernet][...]
>
> UDP is used as follows:
> 1. If the host has already learnt an Inner MAC -> Outer IP mapping,
> then it transmits a unicast UDP packet.
> 2. Otherwise it transmits a multicast UDP packet.
>
> That means all hosts join a multicast group - this enables broadcast
> similar to what you've done in your patches.
>
> Typically traffic from a VM on Host A to another VM on Host B will use
> unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
>
> I'm not sure if it makes sense to implement VXLAN in QEMU because the
> multicast UDP socket uses a well-known port. I guess that means
> multiple QEMUs running on the same host cannot use VXLAN unless they
> bind to unique IP addresses. At that point we lose the advantage of a
> pure userspace implementation and might as well use the kernel
> implementation (or OpenVSwitch) with tap devices.
>
> Anyway, it's still interesting and maybe there's a way to solve this.
>
> Stefan
the VXLAN spec gave me some inspiration to write the original patch i
submitted. unfortunately i made a silly decision of using my own header
format and should have used the VXLAN one. but i believe just changing
that would make this compatible with VXLAN.
i do still want to do more work on this such as converting to make it
compatible with VXLAN. there have also been a lot of other changes to
the network subsystem that i would need to update the patch for. i've
been rather busy the past few months with a work project and told myself
i have to finish that before i can go back to this. i also was waiting
to see if the churn in the network subsystem would calm down and make all
the changes i need there at once. hopefully around the new year i'll
have time to look at it. since i originally sent the patch to the list,
a few people have asked me about it so i think there is some
interest in it.
i think it does still make sense to implement it in QEMU. there isn't a
problem with multiple processes using the same multicast address. the
net_socket_mcast_create function in socket.c already sets the
IP_MULTICAST_LOOP option which makes it so packets get looped back and
also delivered to processes on the same host. that is why there is a
check in qdes_receive to see if the sender is the localAddr and drop it
if it is. the big advantage i see to implementing VXLAN inside QEMU is
that it can be done without any escalated privileges and without
reconfiguring the host's network configuration.
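As a rough sketch of the loopback-and-drop behaviour described above: the sending socket has IP_MULTICAST_LOOP enabled so other processes on the same host also get a copy, and the receive path throws away packets whose sender is this process's own address. The helper names are invented, and local_addr is assumed to be whatever (ip, port) pair this process's packets are seen to come from; this is not the actual qdes_receive code.

```python
import socket


def enable_mcast_loopback(sock: socket.socket) -> None:
    # Multicast packets sent on this socket are also delivered to
    # group members on the sending host.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)


def is_own_packet(sender, local_addr) -> bool:
    # sender is the (ip, port) tuple returned by recvfrom();
    # local_addr is assumed to be the (ip, port) this process sends from.
    return sender == local_addr


def recv_foreign(sock: socket.socket, local_addr):
    """Receive one datagram, returning None if it is our own looped-back copy."""
    data, sender = sock.recvfrom(65535)
    if is_own_packet(sender, local_addr):
        return None
    return data
```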
mike
* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
2012-11-26 17:19 ` Mike Lovell
@ 2012-11-27 12:42 ` Stefan Hajnoczi
2012-11-27 14:24 ` Anthony Liguori
2012-11-28 7:14 ` Mike Lovell
0 siblings, 2 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2012-11-27 12:42 UTC (permalink / raw)
To: Mike Lovell; +Cc: Anthony Liguori, qemu-devel
On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
> i think it does still make sense to implement it in QEMU. there isn't a
> problem with multiple processes using the same multicast address. the
> net_socket_mcast_create function in socket.c already sets the
> IP_MULTICAST_LOOP option which makes it so packets get looped back and also
> delivered to processes on the same host. that is why there is a check in
> qdes_receive to see if the sender is the localAddr and drop it if it is. the
> big advantage i see to implementing VXLAN inside QEMU is that it can be done
> without any escalated privileges and without reconfiguring the host's network
> configuration.
The part I'm wondering about with VXLAN multicast is whether all QEMU
processes on the host need to receive on the same well-known UDP port.
Not sure if that's possible with the sockets API.
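For what it's worth, one way this is commonly done, at least on Linux, is to set SO_REUSEADDR on each UDP socket before bind(). A minimal sketch follows; the function name is made up, and behaviour on other platforms may differ (some need SO_REUSEPORT instead).

```python
import socket


def bind_shared_udp(port: int) -> socket.socket:
    """Bind a UDP socket to a port that other processes may also bind (sketch)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Every socket sharing the port has to set this before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    return sock
```

As far as I know, multicast datagrams are then delivered to every such socket that has joined the group, but a unicast datagram sent to the shared port reaches only one of them, so the unicast case would still need some thought.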
Stefan
* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
2012-11-27 12:42 ` Stefan Hajnoczi
@ 2012-11-27 14:24 ` Anthony Liguori
2012-11-28 7:37 ` Mike Lovell
2012-11-28 7:14 ` Mike Lovell
1 sibling, 1 reply; 6+ messages in thread
From: Anthony Liguori @ 2012-11-27 14:24 UTC (permalink / raw)
To: Stefan Hajnoczi, Mike Lovell; +Cc: qemu-devel
Stefan Hajnoczi <stefanha@gmail.com> writes:
> On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
>> i think it does still make sense to implement it in QEMU. there isn't a
>> problem with multiple processes using the same multicast address. the
>> net_socket_mcast_create function in socket.c already sets the
>> IP_MULTICAST_LOOP option which makes it so packets get looped back and also
>> delivered to processes on the same host. that is why there is a check in
>> qdes_receive to see if the sender is the localAddr and drop it if it is. the
>> big advantage i see to implementing VXLAN inside QEMU is that it can be done
>> without any escalated privileges and without reconfiguring the host's network
>> configuration.
>
> The part I'm wondering about with VXLAN multicast is whether all QEMU
> processes on the host need to receive on the same well-known UDP port.
> Not sure if that's possible with the sockets API.
Perhaps this is a dumb question, but wouldn't it be trivial to write a
VXLAN proxy that added a VXLAN tag to ethernet frames from -net socket?
Obviously, this could also be done with the normal linux tools at the
tun/tap layer too.
I think we should resist adding a bunch of stuff to the networking layer
just because we can. Otherwise we'll end up reinventing the Linux
networking layer in QEMU.
Regards,
Anthony Liguori
>
> Stefan
* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
2012-11-27 12:42 ` Stefan Hajnoczi
2012-11-27 14:24 ` Anthony Liguori
@ 2012-11-28 7:14 ` Mike Lovell
1 sibling, 0 replies; 6+ messages in thread
From: Mike Lovell @ 2012-11-28 7:14 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel
On 11/27/2012 05:42 AM, Stefan Hajnoczi wrote:
> On Mon, Nov 26, 2012 at 6:19 PM, Mike Lovell <mike@dev-zero.net> wrote:
>> i think it does still make sense to implement it in QEMU. there isn't a
>> problem with multiple processes using the same multicast address. the
>> net_socket_mcast_create function in socket.c already sets the
>> IP_MULTICAST_LOOP option which makes it so packets get looped back and also
>> delivered to processes on the same host. that is why there is a check in
>> qdes_receive to see if the sender is the localAddr and drop it if it is. the
>> big advantage i see to implementing VXLAN inside QEMU is that it can be done
>> without any escalated privileges and without reconfiguring the host's network
>> configuration.
> The part I'm wondering about with VXLAN multicast is whether all QEMU
> processes on the host need to receive on the same well-known UDP port.
> Not sure if that's possible with the sockets API.
>
> Stefan
ah. yes. all qemu processes using the same well-known UDP port would
receive all multicast packets. the individual processes would then be
responsible for checking to make sure the received packets are for the
same VNI (VXLAN Network Identifier) that the individual process is
configured to use. this would result in some additional processing for
every process. it should just be a single int comparison on every packet
and an increase of packets for broadcast and multicast packets for all
VNIs on the port.
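The per-packet check would look something like the sketch below, assuming the standard 8-byte VXLAN header with the VNI in the top 24 bits of the second 32-bit word. The function name is invented and this is not code from the patch.

```python
import struct


def accept_for_vni(payload: bytes, local_vni: int):
    """Return the inner frame if the packet is for local_vni, else None (sketch)."""
    if len(payload) < 8:
        return None  # too short to carry a VXLAN header
    _flags_word, vni_word = struct.unpack("!II", payload[:8])
    # The single integer comparison: drop packets for other networks.
    if vni_word >> 8 != local_vni:
        return None
    return payload[8:]
```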
also, the code currently allows for using user defined ports and
different udp ports could be used for each network. i don't know if
other implementations would allow for specifying a different port though.
mike
* Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)
2012-11-27 14:24 ` Anthony Liguori
@ 2012-11-28 7:37 ` Mike Lovell
0 siblings, 0 replies; 6+ messages in thread
From: Mike Lovell @ 2012-11-28 7:37 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Stefan Hajnoczi, qemu-devel
On 11/27/2012 07:24 AM, Anthony Liguori wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> The part I'm wondering about with VXLAN multicast is whether all QEMU
>> processes on the host need to receive on the same well-known UDP port.
>> Not sure if that's possible with the sockets API.
> Perhaps this is a dumb question, but wouldn't it be trivial to write a
> VXLAN proxy that added a VXLAN tag to ethernet frames from -net socket?
this is definitely possible. when i was doing my initial prototyping to
see if this would be possible, i used the socket network backend to
connect to a python program doing the VXLAN-like processing. really ugly
code that isn't worth reviving.
i liked the idea of having this in qemu since it would simplify
configuration and wouldn't require starting two processes and wiring
them together. some will probably call this crazy but i still end up
using the cli a lot and i wanted to make that simpler. this just
requires specifying the multicast address and the network id to qemu.
maybe there is a compromise between using the sockets api and cli
simplicity with having a helper option for the sockets api that starts
the other process. kind of like the bridge-helper but a process that
stays running as long as the netdev is around. this would allow easy
development of whatever networking methods people would want to
experiment with. i briefly looked at the code to see how this could
potentially be implemented but haven't started writing any code.
> Obviously, this could also be done with the normal linux tools at the
> tun/tap layer too.
>
> I think we should resist adding a bunch of stuff to the networking layer
> just because we can. Otherwise we'll end up reinventing the Linux
> networking layer in QEMU.
definitely a valid point. with the linux 3.7 kernel getting a VXLAN
implementation, a guest could use a tap device connected to a linux
bridge which also has a VXLAN interface. this would keep all the
processing in the kernel and doesn't re-invent the wheel. it still
requires escalated privileges to configure the networking in the host
which i'm trying to avoid (stupid misguided security monkey that is
bugging me). so trade-offs both ways and when i wrote the original patch
there wasn't anyone even talking about a VXLAN implementation in the
linux kernel.
mike