Jamie Lokier wrote:
> Or Gerlitz wrote:
>> Jamie Lokier wrote:
>>> The problem is simply what the guest sends goes out on the network and is
>>> not looped back to the host network stack, and vice versa. So if your
>>> host is 192.168.1.1 and is running a DNS server (say), and the guest is
>>> 192.168.1.2, when the guest sends queries to 192.168.1.1 the host won't
>>> see those queries. Same if you're running an FTP server on the host and
>>> the guest wants to connect to it, etc. It also means multiple guests can't
>>> see each other, for the same reason. So it's much less useful than
>>> bridging, where the guests and host can all see each other and connect to
>>> each other.
>> I wasn't sure I followed whether your example refers to the case when
>> networking uses the bridge or NAT. If it's a bridge, then through which
>> bridge interface does the packet arrive at the host stack? Say you have a
>> bridge whose attached interfaces are tap1(VM1), tap2(VM2) and eth0(NIC),
>> in your example did you mean that the host IP address is assigned to the
>> bridge interface? Or were you referring to a NAT based scheme?
>
> When using a bridge, you set the IP address on the bridge itself (for
> example, br0). DHCP runs on the bridge itself, so does the rest of
> the Linux host stack, although you can use raw sockets on the other
> interfaces.
>
> But reading and controlling the hardware is done on the interfaces.
>
> So if you have some program like NetworkManager which checks if you
> have a wire plugged into eth0, it has to read eth0 to get the wire
> status, but it has to run DHCP on br0.
>
> Those programs don't generally have that option, which makes bridges
> difficult to use for VMs in a transparent way.
>
> I wasn't referring to NAT, but you can use NAT with a bridge on Linux;
> it's called brouting :-)
>
>>> Unfortunately, bridging is a pain to set up, if your host has any
>>> complicated or automatic network configuration already.
>
>> As you said bridging requires more configuration
>
> A bridge is quite simple to configure. Unfortunately because Linux
> requires all the IP configuration on the bridge device, but network
> device control on the network device, bridges don't work well with
> automatic configuration tools.
>
> If you could apply host IP configuration to the network device and
> still have a bridge, that would be perfect. You would just create
> br0, add tap1(VM1), tap2(VM2) and eth0(NIC), and everything would work
> perfectly.
>
>> but not less important the performance (packets per second and cpu
>> utilization) one can get with bridge+tap is much lower vs what you
>> get with the raw mode approach.
>
> Have you measured it?
>
>> All in all, it's clear that with this approach VM/VM and VM/Host
>> communication would have to get switched either at the NIC (e.g.
>> SR/IOV capable NICs supporting a virtual bridge) or at an external
>> switch and make a U turn.
>
> Unfortunately that's usually impossible. Most switches don't do U
> turns, and a lot of simple networks don't have any switches except a
> home router.
>
>> There's a bunch of reasons why people would
>> like to do that, among them performance boost,
>
> No, it makes performance _much_ worse if you have packets leaving the
> host, do a U turn and come back on the same link. Much better to use
> a bridge inside the host. Probably ten times faster because the host's
> internal networking is much faster than a typical gigabit link :-)
>
>> the ability to shape,
>> manage and monitor VM/VM traffic in external switches and more.
>
> That could be useful, but I think it's probably quite unusual for
> someone to want to shape traffic between a VM and its own host. Also
> if you want to do that, you can do it inside the host.
>
> Sometimes it would be useful to send it outside the host and U turn,
> but not very often; only for diagnostics I would think.
> And even that can be done with Linux bridges, using VLANs :-)
>
>>> It would be really nice to find a way which has the advantages of both.
>>> Either by adding a different bridging mode to Linux, where host interfaces
>>> can be configured for IP and the bridge hangs off the host interface, or
>>> by a modified tap interface, or by an alternative
>>> pcap/packet-like interface which forwards packets in a similar way to
>>> bridging.
>
>> It seems that this will not yield the performance improvement we can
>> get with going directly to the NIC.
>
> If you don't need any host<->VM networking, maybe a raw packet socket
> is faster.
>
> But are you sure it's faster?
> I'd want to see measurements before I believe it.
>
> If you need any host<->VM networking, most of the time the packet
> socket isn't an option at all. Not many switches will 'U turn'
> packets as you suggest.

FWIW, the fastest local VM<->VM bridge I've happened to measure so far
was using qemu's -net socket,listen/connect, i.e. a plain local IP or
unix domain socket between two qemu instances. No tap devices, no
in-kernel bridges involved.

But this picture may change once we have some in-kernel virtio-net backend.

Jan
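
For reference, a minimal sketch of the -net socket,listen/connect
arrangement mentioned above. It only assembles and prints the two command
lines; it does not start qemu. The binary name, image names, MAC addresses
and port are illustrative assumptions, not taken from the measurement. The
option spelling is the legacy -net syntax; newer qemu versions prefer
-netdev/-device.

```python
import shlex


def socket_pair_commands(image_a, image_b, port=1234,
                         qemu="qemu-system-x86_64"):
    """Build argv lists for two qemu instances joined by a local TCP socket.

    All names and values here are placeholders chosen for illustration.
    The first instance listens, the second connects to it; no tap device
    or in-kernel bridge is involved.
    """
    listener = [
        qemu, image_a,
        "-net", "nic,macaddr=52:54:00:12:34:56",
        "-net", f"socket,listen=127.0.0.1:{port}",
    ]
    connector = [
        qemu, image_b,
        # The two guests need distinct MAC addresses.
        "-net", "nic,macaddr=52:54:00:12:34:57",
        "-net", f"socket,connect=127.0.0.1:{port}",
    ]
    return listener, connector


if __name__ == "__main__":
    # Print the commands instead of running them; start the listener first.
    for argv in socket_pair_commands("vm1.img", "vm2.img"):
        print(shlex.join(argv))
```

The listening instance has to be started before the connecting one. Note
that this links the two guests only; the host stack does not see the
traffic.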