From: Jan Kiszka
Date: Wed, 15 Jul 2009 23:06:21 +0200
Subject: Re: [Qemu-devel] [PATCH] net: add raw backend
Message-ID: <4A5E44CD.70209@web.de>
To: Jamie Lokier
Cc: Or Gerlitz, Herbert Xu, qemu-devel@nongnu.org

Jamie Lokier wrote:
> Or Gerlitz wrote:
>> Jamie Lokier wrote:
>>> The problem is simply that what the guest sends goes out on the network
>>> and is not looped back to the host network stack, and vice versa.
>>> So if your host is 192.168.1.1 and is running a DNS server (say), and
>>> the guest is 192.168.1.2, when the guest sends queries to 192.168.1.1
>>> the host won't see those queries. The same applies if you're running an
>>> FTP server on the host and the guest wants to connect to it, etc. It
>>> also means multiple guests can't see each other, for the same reason.
>>> So it's much less useful than bridging, where the guests and host can
>>> all see each other and connect to each other.
>> I wasn't sure whether your example refers to the case where networking
>> uses a bridge or NAT. If it's a bridge, through which bridge interface
>> does the packet arrive at the host stack? Say you have a bridge whose
>> attached interfaces are tap1 (VM1), tap2 (VM2) and eth0 (NIC): in your
>> example, did you mean that the host IP address is assigned to the
>> bridge interface? Or were you referring to a NAT-based scheme?
>
> When using a bridge, you set the IP address on the bridge itself (for
> example, br0). DHCP runs on the bridge itself, and so does the rest of
> the Linux host stack, although you can use raw sockets on the other
> interfaces.
>
> But reading and controlling the hardware is done on the interfaces.
>
> So if you have a program like NetworkManager, which checks whether you
> have a wire plugged into eth0, it has to read eth0 to get the wire
> status, but it has to run DHCP on br0.
>
> Those programs don't generally have that option, which makes bridges
> difficult to use for VMs in a transparent way.
>
> I wasn't referring to NAT, but you can use NAT with a bridge on Linux;
> it's called brouting :-)
>
>>> Unfortunately, bridging is a pain to set up if your host already has
>>> any complicated or automatic network configuration.
>
>> As you said, bridging requires more configuration
>
> A bridge is quite simple to configure.
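To make the "simple to configure" claim and the br0/eth0 split concrete, here is a minimal sketch that only generates the commands for the setup described in this thread (br0 with tap1, tap2 and eth0 attached, host address on br0). It assumes a modern iproute2 (`ip link add ... type bridge`, `ip tuntap`, `ip link set ... master`); 2009-era setups typically used `brctl addbr` / `brctl addif` from bridge-utils instead. The helper name `bridge_setup_commands` and the address 192.168.1.1/24 are illustrative assumptions. Nothing is executed, and the real commands would need root.

```python
import shlex


def bridge_setup_commands(bridge, uplink, taps, host_cidr):
    """Return iproute2 argv lists for a host bridge (hypothetical helper).

    The host's IP address ends up on the bridge, not on the uplink NIC:
    addressing lives on br0 while link status is still read from eth0.
    """
    cmds = [["ip", "link", "add", "name", bridge, "type", "bridge"]]
    for tap in taps:
        # One persistent tap device per guest (VM1, VM2, ...).
        cmds.append(["ip", "tuntap", "add", "dev", tap, "mode", "tap"])
    for member in [uplink, *taps]:
        # Attach each interface to the bridge and bring it up.
        cmds.append(["ip", "link", "set", "dev", member, "master", bridge])
        cmds.append(["ip", "link", "set", "dev", member, "up"])
    # IP configuration moves from the NIC to the bridge device.
    cmds.append(["ip", "addr", "flush", "dev", uplink])
    cmds.append(["ip", "addr", "add", host_cidr, "dev", bridge])
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds


if __name__ == "__main__":
    for argv in bridge_setup_commands("br0", "eth0", ["tap1", "tap2"], "192.168.1.1/24"):
        print(shlex.join(argv))
```

Run over a remote session, the real sequence briefly cuts the host off the network between attaching eth0 to br0 and moving the address, because once eth0 is a bridge port its own IP configuration no longer receives traffic.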
> Unfortunately, because Linux requires all the IP configuration on the
> bridge device but network device control on the network device, bridges
> don't work well with automatic configuration tools.
>
> If you could apply host IP configuration to the network device and
> still have a bridge, that would be perfect. You would just create
> br0, add tap1 (VM1), tap2 (VM2) and eth0 (NIC), and everything would
> work perfectly.
>
>> but, no less important, the performance (packets per second and CPU
>> utilization) one can get with bridge+tap is much lower than what you
>> get with the raw mode approach.
>
> Have you measured it?
>
>> All in all, it's clear that with this approach VM/VM and VM/host
>> communication would have to be switched either at the NIC (e.g.
>> SR-IOV capable NICs supporting a virtual bridge) or at an external
>> switch, making a U-turn.
>
> Unfortunately that's usually impossible. Most switches don't do U-turns,
> and a lot of simple networks don't have any switches except a home
> router.
>
>> There are a bunch of reasons why people would like to do that, among
>> them a performance boost,
>
> No, it makes performance _much_ worse if you have packets leaving the
> host, doing a U-turn and coming back on the same link. Much better to
> use a bridge inside the host. Probably ten times faster, because the
> host's internal networking is much faster than a typical gigabit link :-)
>
>> the ability to shape, manage and monitor VM/VM traffic in external
>> switches, and more.
>
> That could be useful, but I think it's probably quite unusual for
> someone to want to shape traffic between a VM and its own host. Also,
> if you want to do that, you can do it inside the host.
>
> Sometimes it would be useful to send it outside the host and U-turn,
> but not very often; only for diagnostics, I would think.
> And even that can be done with Linux bridges, using VLANs :-)
>
>>> It would be really nice to find a way which has the advantages of both:
>>> either by adding a different bridging mode to Linux, where host
>>> interfaces can be configured for IP and the bridge hangs off the host
>>> interface, or by a modified tap interface, or by an alternative
>>> pcap/packet-like interface which forwards packets in a similar way to
>>> bridging.
>
>> It seems that this will not yield the performance improvement we can
>> get by going directly to the NIC.
>
> If you don't need any host<->VM networking, maybe a raw packet socket
> is faster.
>
> But are you sure it's faster?
> I'd want to see measurements before I believe it.
>
> If you need any host<->VM networking, most of the time the packet
> socket isn't an option at all. Not many switches will 'U-turn'
> packets as you suggest.

FWIW, the fastest local VM<->VM bridge I've happened to measure so far
used qemu's -net socket,listen/connect, i.e. a plain local IP or Unix
domain socket between two qemu instances. No tap devices, no in-kernel
bridges involved. But this picture may change once we have an in-kernel
virtio-net backend.

Jan
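The -net socket transport mentioned above simply carries guest Ethernet frames over an ordinary socket. Because a stream socket does not preserve message boundaries, each frame needs explicit framing. The sketch below assumes a 4-byte big-endian length prefix per frame, which is our reading of qemu's stream-mode socket backend; check net/socket.c in your qemu version before relying on it. `socket.socketpair()` stands in for the listen/connect pair of two qemu instances, and the helper names `send_frame`, `recv_exact` and `recv_frame` are hypothetical.

```python
import socket
import struct


def send_frame(sock, frame: bytes) -> None:
    # Assumed wire format: 32-bit big-endian length, then the raw frame.
    sock.sendall(struct.pack("!I", len(frame)) + frame)


def recv_exact(sock, n: int) -> bytes:
    # A stream socket may return partial data, so loop until n bytes arrive.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # socketpair() stands in for two qemu instances joined by listen/connect.
    vm1, vm2 = socket.socketpair()
    # Minimum-size (60 bytes, FCS excluded) broadcast frame from a 52:54:00
    # MAC with the ARP EtherType; the payload is simply zeroed.
    frame = bytes.fromhex("ffffffffffff" "525400123456" "0806") + bytes(46)
    send_frame(vm1, frame)
    received = recv_frame(vm2)
    assert received == frame
    print(len(received), received[12:14].hex())  # prints: 60 0806
    vm1.close()
    vm2.close()
```

No tap device or kernel bridge sits in this path, which matches the setup described above; whether it stays the fastest option is exactly the open question the message raises.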