Jamie Lokier wrote:
> Or Gerlitz wrote:
>> Jamie Lokier wrote:
>>> The problem is simply that what the guest sends goes out on the network and is 
>>> not looped back to the host network stack, and vice versa. So if your 
>>> host is 192.168.1.1 and is running a DNS server (say), and the guest is 
>>> 192.168.1.2, when the guest sends queries to 192.168.1.1 the host won't 
>>> see those queries.  Same if you're running an FTP server on the host and 
>>> the guest wants to connect to it, etc. It also means multiple guests can't 
>>> see each other, for the same reason. So it's much less useful than 
>>> bridging, where the guests and host can all see each other and connect to 
>>> each other.
>> I'm not sure whether your example refers to the case where 
>> networking uses a bridge or NAT. If it's a bridge, then through which 
>> bridge interface does the packet arrive at the host stack? Say you have a 
>> bridge whose attached interfaces are tap1(VM1), tap2(VM2) and eth0(NIC); 
>> in your example, did you mean that the host IP address is assigned to the 
>> bridge interface? Or were you referring to a NAT-based scheme?
> 
> When using a bridge, you set the IP address on the bridge itself (for
> example, br0).  DHCP runs on the bridge itself, so does the rest of
> the Linux host stack, although you can use raw sockets on the other
> interfaces.
> 
> But reading and controlling the hardware is done on the interfaces.
> 
> So if you have some program like NetworkManager which checks if you
> have a wire plugged into eth0, it has to read eth0 to get the wire
> status, but it has to run DHCP on br0.
> 
> Those programs don't generally have that option, which makes bridges
> difficult to use for VMs in a transparent way.
> 
> I wasn't referring to NAT, but you can use NAT with a bridge on Linux;
> it's called brouting :-)
> 
>>> Unfortunately, bridging is a pain to set up, if your host has any 
>>> complicated or automatic network configuration already.
> 
>> As you said bridging requires more configuration
> 
> A bridge is quite simple to configure.  Unfortunately because Linux
> requires all the IP configuration on the bridge device, but network
> device control on the network device, bridges don't work well with
> automatic configuration tools.
> 
> If you could apply host IP configuration to the network device and
> still have a bridge, that would be perfect.  You would just create
> br0, add tap1(VM1), tap2(VM2) and eth0(NIC), and everything would work
> perfectly.
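
For reference, the conventional setup described above (host address on
br0 rather than on eth0) can be sketched roughly as below. This is only
an illustration, not anyone's actual tooling: the helper name
bridge_commands is made up, it assumes iproute2's "ip" with bridge
support (brctl would do the same job), and it only builds the command
lines without running them.

```python
def bridge_commands(bridge, ports, host_cidr):
    """Return iproute2 argv lists that build a bridge and put the host IP on it.

    Hypothetical helper for illustration; nothing is executed here.
    """
    cmds = [["ip", "link", "add", "name", bridge, "type", "bridge"]]
    for port in ports:
        # The IP configuration lives on the bridge, so clear it from the port.
        cmds.append(["ip", "addr", "flush", "dev", port])
        # Attach the port (NIC or tap device) to the bridge and bring it up.
        cmds.append(["ip", "link", "set", "dev", port, "master", bridge])
        cmds.append(["ip", "link", "set", "dev", port, "up"])
    # The host address goes on the bridge itself, not on eth0.
    cmds.append(["ip", "addr", "add", host_cidr, "dev", bridge])
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds


if __name__ == "__main__":
    for argv in bridge_commands("br0", ["eth0", "tap1", "tap2"], "192.168.1.1/24"):
        print(" ".join(argv))
```

Note that link state still has to be read from eth0 while the address
sits on br0, which is exactly the split that confuses automatic
configuration tools.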
> 
>> but no less important, the performance (packets per second and CPU
>> utilization) one can get with bridge+tap is much lower than what you
>> get with the raw mode approach.
> 
> Have you measured it?
> 
>> All in all, it's clear that with this approach VM/VM and VM/Host
>> communication would have to get switched either at the NIC (e.g.
>> SR-IOV capable NICs supporting a virtual bridge) or at an external
>> switch and make a U turn.
> 
> Unfortunately that's usually impossible.  Most switches don't do U
> turns, and a lot of simple networks don't have any switches except a
> home router.
> 
>> There's a bunch of reasons why people would 
>> like to do that, among them performance boost,
> 
> No, it makes performance _much_ worse if you have packets leave the
> host, do a U turn and come back on the same link.  Much better to use
> a bridge inside the host.  Probably ten times faster because the host's
> internal networking is much faster than a typical gigabit link :-)
> 
>> the ability to shape, 
>> manage and monitor VM/VM traffic in external switches and more.
> 
> That could be useful, but I think it's probably quite unusual for
> someone to want to shape traffic between a VM and its own host.  Also
> if you want to do that, you can do it inside the host.
> 
> Sometimes it would be useful to send it outside the host and U turn,
> but not very often; only for diagnostics I would think.  And even that
> can be done with Linux bridges, using VLANs :-)
> 
>>> It would be really nice to find a way which has the advantages of both.  
>>> Either by adding a different bridging mode to Linux, where host interfaces 
>>> can be configured for IP and the bridge hangs off the host interface, or 
>>> by a modified tap interface, or by an alternative
>>> pcap/packet-like interface which forwards packets in a similar way to 
>>> bridging.  
> 
>> It seems that this will not yield the performance improvement we can 
>> get by going directly to the NIC.
> 
> If you don't need any host<->VM networking, maybe a raw packet socket
> is faster.
> 
> But are you sure it's faster?
> I'd want to see measurements before I believe it.
> 
> If you need any host<->VM networking, most of the time the packet
> socket isn't an option at all.  Not many switches will 'U turn'
> packets as you suggest.

FWIW, the fastest local VM<->VM bridge I've happened to measure so far
was using qemu's -net socket,listen/connect, i.e. a plain local IP or
unix domain socket between two qemu instances. No tap devices, no
in-kernel bridges involved. But this picture may change once we have
some in-kernel virtio-net backend.
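
A minimal sketch of that kind of setup, for anyone who wants to try it.
It only builds the two command lines and does not start anything. The
binary name "qemu", the image names vm1.img/vm2.img, the MAC addresses
and the port are all placeholders, and only the TCP variant is shown.

```python
def qemu_socket_pair(host="127.0.0.1", port=1234):
    """Build argv lists for two qemu instances joined by -net socket.

    Hypothetical helper; image names and MAC addresses are placeholders.
    """
    # The first instance listens for the peer on a local TCP port.
    listener = [
        "qemu", "vm1.img",
        "-net", "nic,macaddr=52:54:00:12:34:56",
        "-net", f"socket,listen={host}:{port}",
    ]
    # The second instance connects to it; no tap device or bridge is involved.
    connector = [
        "qemu", "vm2.img",
        "-net", "nic,macaddr=52:54:00:12:34:57",
        "-net", f"socket,connect={host}:{port}",
    ]
    return listener, connector


if __name__ == "__main__":
    for argv in qemu_socket_pair():
        print(" ".join(argv))
```

The listening instance has to be started first, and the two guests need
distinct MAC addresses.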

Jan