netdev.vger.kernel.org archive mirror
* bridge not routing packets via source bridgeport
@ 2011-01-03 17:52 Sebastian J. Bronner
  2011-01-03 18:42 ` Eric Dumazet
From: Sebastian J. Bronner @ 2011-01-03 17:52 UTC
  To: netdev; +Cc: Daniel Kraft

[-- Attachment #1: Type: text/plain, Size: 3983 bytes --]

Hi all,

we recently upgraded from 2.6.32.25 to 2.6.35.24 and discovered that our
virtual machines can no longer access their own external IP addresses.
Testing revealed that 2.6.34 was the last version not to have the
problem. 2.6.36 still had it. But on to the details.

Our setup:

We use KVM to virtualise our guests. The physical machines (nodes) act
as One-to-One NAT routers to the virtual machines. The virtual machines
are connected via virtio interfaces in a bridge.

Since the virtual machines only know about their RFC-1918 addresses, any
request they make to their NATed global addresses requires a trip
through the node's netfilter to perform the needed SNAT and DNAT operations.

Take the following setup:

   {internet}
       |
     (eth0)       <- 1.1.1.254, proxy_arp=1
       |
     [node]       <- ip_forward=1, routes*, nat**
       |
    (virbr1)      <- 10.0.0.1
    /      \
(vnet0)     |
   |     (vnet1)
(veth0)     |     <- 10.0.0.2
   |     (veth0)  <- 10.0.0.3
 [vm1]      |
          [vm2]

* The static routes on the node for the vms mentioned above are as follows:
# ip r
1.1.1.2 dev virbr1 scope link
1.1.1.3 dev virbr1 scope link

** The NAT rules are set up as follows (in reality, they're a bit more
complicated - but this suffices to illustrate the problem at hand):
# iptables-save -t nat
-A PREROUTING -d 1.1.1.2 -j DNAT --to-destination 10.0.0.2
-A PREROUTING -d 1.1.1.3 -j DNAT --to-destination 10.0.0.3
-A POSTROUTING -s 10.0.0.2 -j SNAT --to-source 1.1.1.2
-A POSTROUTING -s 10.0.0.3 -j SNAT --to-source 1.1.1.3

This means that 1.1.1.2 maps to 10.0.0.2 (vm1) and
                1.1.1.3 maps to 10.0.0.3 (vm2).
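
To make the address rewriting concrete, here is a small offline model of the
four NAT rules above. It is only a sketch: it applies the DNAT and SNAT maps
in PREROUTING/POSTROUTING order and ignores conntrack, routing and the "more
complicated" real rules. The function names are made up for illustration.

```python
# Address maps taken from the nat table rules above (one-to-one NAT).
DNAT = {"1.1.1.2": "10.0.0.2", "1.1.1.3": "10.0.0.3"}  # PREROUTING, keyed by destination
SNAT = {private: public for public, private in DNAT.items()}  # POSTROUTING, keyed by source


def translate(src, dst):
    """Return (src, dst) as the packet would look after DNAT and then SNAT.

    Simplified model: ignores conntrack, routing and any rules not shown above.
    """
    dst = DNAT.get(dst, dst)  # nat PREROUTING: public destination -> private VM
    src = SNAT.get(src, src)  # nat POSTROUTING: private source -> public address
    return src, dst


def is_self_connection(src, dst):
    """True when a VM's packet is DNATed straight back to the VM that sent it."""
    return DNAT.get(dst, dst) == src
```

For vm1 talking to vm2, translate("10.0.0.2", "1.1.1.3") gives
("1.1.1.2", "10.0.0.3"). For vm1 talking to its own public address,
translate("10.0.0.2", "1.1.1.2") gives ("1.1.1.2", "10.0.0.2"): after DNAT the
destination is vm1 itself, which is the failing case described below.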

Assuming ssh is running on both vms, running 'nc -v 1.1.1.3 22' from vm1
gets me ssh's introductory message.

Assuming no service is running on port 23, running 'nc -v 1.1.1.3 23'
from vm1 gets me 'Connection refused'.

That's all fine and exactly as it should be. The vms are accessible from
the internet as well, and can access the internet.

If, however, I run 'nc -v 1.1.1.2 22' from vm1 (or any other port, for
that matter), I get a timeout!

Running tcpdump on all the involved interfaces showed me that the
packets successfully traverse veth0 and vnet0 and appear to get lost
upon reaching virbr1.

So, then I decided to set up a packet trace with iptables:
[on the node]
# modprobe ipt_LOG
# iptables -t raw -A PREROUTING -p tcp --dport 4577 -j TRACE
# tail -f /var/log/messages | grep TRACE
[on vm1]
# nc -v 1.1.1.2 4577

The results were very interesting, if somewhat dumbfounding. They are
attached for easier perusal. The gist of it is that the packet in
question disappears without a trace after going through the DNAT rule in
the PREROUTING chain of the NAT table. This can be seen happening three
times in vm1-to-1.1.1.2.txt, for the original SYN and its two retries,
which arrive three and six seconds apart.
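
A rough helper for reading such a trace, assuming the log format shown in the
attachments: it groups TRACE lines by IP ID (which changes with every
retransmission while SEQ stays the same), reports the last table:chain:type:rule
each packet reached, and the gap in seconds since the previous packet. The
function names are hypothetical.

```python
import re
from collections import defaultdict

# Kernel uptime stamp plus the "table:chain:type:rulenum" tag of a TRACE line.
TRACE_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+TRACE:\s+"
    r"(?P<table>\w+):(?P<chain>\w+):(?P<kind>\w+):(?P<num>\d+)\s"
)
# \b keeps "UID=" and "GID=" from matching as the IP ID field.
ID_RE = re.compile(r"\bID=(\d+)\b")


def hooks_by_packet(lines):
    """Group TRACE lines by IP ID: {ip_id: [(uptime, tag), ...]} in log order."""
    packets = defaultdict(list)
    for line in lines:
        m = TRACE_RE.search(line)
        ident = ID_RE.search(line)
        if not (m and ident):
            continue  # not a TRACE line, or no IP ID field
        tag = "{table}:{chain}:{kind}:{num}".format(**m.groupdict())
        packets[int(ident.group(1))].append((float(m["uptime"]), tag))
    return dict(packets)


def summarize(lines):
    """Return [(ip_id, last_hook, seconds_since_previous_packet or None)]."""
    rows, prev = [], None
    for ip_id, hooks in hooks_by_packet(lines).items():
        first_seen = hooks[0][0]
        gap = None if prev is None else round(first_seen - prev, 1)
        rows.append((ip_id, hooks[-1][1], gap))
        prev = first_seen
    return rows
```

Fed the lines of vm1-to-1.1.1.2.txt (for example via
summarize(open("vm1-to-1.1.1.2.txt"))), every packet should end at
nat:PREROUTING:rule:1, with gaps of roughly 3.0 and 6.0 seconds.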

For comparison, I have also included a trace of a successful packet
traversal that ends in a 'Connection refused'. It is in vm1-to-1.1.1.3.txt.
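
Putting the two traces side by side can be automated with a short sketch like
the one below. It assumes each argument holds the TRACE lines of one packet
(one IP ID) and returns the last hook both traces share plus the first hook that
only the working trace reaches. The names are illustrative only.

```python
import re

# Captures "table:chain" from a tag such as "TRACE: nat:PREROUTING:rule:1".
TAG_RE = re.compile(r"TRACE:\s+(\w+:\w+):\w+:\d+")


def hook_path(lines):
    """Ordered list of table:chain hooks that one traced packet passed through."""
    return [m.group(1) for m in map(TAG_RE.search, lines) if m]


def divergence(failing, working):
    """Return (last hook both traces share, first hook only the working one reaches)."""
    a, b = hook_path(failing), hook_path(working)
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    last_common = a[i - 1] if i else None
    next_in_working = b[i] if i < len(b) else None
    return last_common, next_in_working
```

With the first packet of each attachment, this should report nat:PREROUTING as
the last shared hook and mangle:FORWARD as the first hook the failing packet
never reaches.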

As a last note, I should add that the problem isn't related to the IP
address. I ruled that out by putting two RFC-1918 IPs on vm1 and mapping
a public IP to each of them, then running nc against one public IP while
the other address was used as the source IP.

The problem appears to be that packets can't be routed out the same
bridgeport that they arrived from.
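
The suspected condition can be stated as a check on a single TRACE line: apply
the DNAT map to DST and compare the bridge port behind the new destination with
PHYSIN. This is a toy model of the hypothesis, not of the kernel code path. The
port assignment follows the diagram and is an assumption; a real bridge keys its
forwarding table by MAC address, and IPs are used here only for readability.

```python
import re

DNAT = {"1.1.1.2": "10.0.0.2", "1.1.1.3": "10.0.0.3"}  # from the PREROUTING rules
# Hypothetical: the virbr1 port each VM sits behind, as drawn in the diagram.
PORT_OF = {"10.0.0.2": "vnet0", "10.0.0.3": "vnet1"}

# Pulls the PHYSIN= and DST= fields out of a TRACE log line.
FIELD_RE = re.compile(r"\b(PHYSIN|DST)=(\S+)")


def leaves_via_ingress_port(trace_line):
    """True if the DNATed packet would have to exit the bridge port it came in on."""
    fields = dict(FIELD_RE.findall(trace_line))
    dst = DNAT.get(fields["DST"], fields["DST"])
    return PORT_OF.get(dst) == fields.get("PHYSIN")
```

For the failing packets (PHYSIN=vnet0, DST=1.1.1.2) this returns True; for a
packet from vm1 to 1.1.1.3 in the diagrammed setup it returns False.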

I hope this all makes sense and that you can reproduce the problem. One
virtual machine will suffice to see the problem at work.

Feel free to contact me if you need more information or have suggestions
for me.

Cheers,
Sebastian Bronner

P.S.: The IP addresses are faked. I used vim to consistently replace all
instances of the real IPs with the fake ones used in this e-mail.
-- 
*Sebastian J. Bronner*
Administrator

D9T GmbH - Magirusstr. 39/1 - D-89077 Ulm
Tel: +49 731 1411 696-0 - Fax: +49 731 3799-220

Geschäftsführer: Daniel Kraft
Sitz und Register: Ulm, HRB 722416
Ust.IdNr: DE 260484638

http://d9t.de - D9T High Performance Hosting
info@d9t.de

[-- Attachment #2: vm1-to-1.1.1.2.txt --]
[-- Type: text/plain, Size: 3243 bytes --]

Jan  3 18:29:35 s14 kernel: [15791.001685] TRACE: raw:PREROUTING:policy:2 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62671 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E78D50000000001030307) UID=0 GID=0 
Jan  3 18:29:35 s14 kernel: [15791.001730] TRACE: mangle:PREROUTING:policy:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62671 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E78D50000000001030307) UID=0 GID=0 
Jan  3 18:29:35 s14 kernel: [15791.001762] TRACE: nat:PREROUTING:rule:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62671 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E78D50000000001030307) UID=0 GID=0 
Jan  3 18:29:38 s14 kernel: [15793.995583] TRACE: raw:PREROUTING:policy:2 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62672 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E7A010000000001030307) UID=0 GID=0 
Jan  3 18:29:38 s14 kernel: [15793.995624] TRACE: mangle:PREROUTING:policy:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62672 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E7A010000000001030307) UID=0 GID=0 
Jan  3 18:29:38 s14 kernel: [15793.995656] TRACE: nat:PREROUTING:rule:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62672 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E7A010000000001030307) UID=0 GID=0 
Jan  3 18:29:44 s14 kernel: [15799.995658] TRACE: raw:PREROUTING:policy:2 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62673 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E7C590000000001030307) UID=0 GID=0 
Jan  3 18:29:44 s14 kernel: [15799.995700] TRACE: mangle:PREROUTING:policy:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62673 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E7C590000000001030307) UID=0 GID=0 
Jan  3 18:29:44 s14 kernel: [15799.995732] TRACE: nat:PREROUTING:rule:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=62673 DF PROTO=TCP SPT=49068 DPT=4577 SEQ=1589611546 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000E7C590000000001030307) UID=0 GID=0 

[-- Attachment #3: vm1-to-1.1.1.3.txt --]
[-- Type: text/plain, Size: 2284 bytes --]

Jan  3 18:32:33 s14 kernel: [15968.856178] TRACE: raw:PREROUTING:policy:2 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) UID=0 GID=0 
Jan  3 18:32:33 s14 kernel: [15968.856211] TRACE: mangle:PREROUTING:policy:1 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) UID=0 GID=0 
Jan  3 18:32:33 s14 kernel: [15968.856233] TRACE: nat:PREROUTING:policy:2 IN=virbr1 OUT= PHYSIN=vnet0 MAC=02:00:00:00:00:16:52:54:00:4a:25:72:08:00 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) UID=0 GID=0 
Jan  3 18:32:33 s14 kernel: [15968.856272] TRACE: mangle:FORWARD:policy:1 IN=virbr1 OUT=eth0 PHYSIN=vnet0 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) 
Jan  3 18:32:33 s14 kernel: [15968.856288] TRACE: filter:FORWARD:policy:1 IN=virbr1 OUT=eth0 PHYSIN=vnet0 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) 
Jan  3 18:32:33 s14 kernel: [15968.856305] TRACE: mangle:POSTROUTING:policy:1 IN= OUT=eth0 PHYSIN=vnet0 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) 
Jan  3 18:32:33 s14 kernel: [15968.856321] TRACE: nat:POSTROUTING:rule:1 IN= OUT=eth0 PHYSIN=vnet0 SRC=10.0.0.2 DST=1.1.1.3 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=36414 DF PROTO=TCP SPT=37569 DPT=4577 SEQ=80883825 ACK=0 WINDOW=5840 RES=0x00 SYN URGP=0 OPT (020405B40402080A000EBE4F0000000001030307) 

[-- Attachment #4: iptables-nat.txt --]
[-- Type: text/plain, Size: 645 bytes --]

Chain PREROUTING (policy ACCEPT 4027 packets, 296K bytes)
 pkts bytes target     prot opt in     out     source               destination         
    8   488 DNAT       all  --  *      *       0.0.0.0/0            1.1.1.2             to:10.0.0.2 

Chain OUTPUT (policy ACCEPT 24412 packets, 1578K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 24430 packets, 1579K bytes)
 pkts bytes target     prot opt in     out     source               destination         
    4   240 SNAT       all  --  *      *       10.0.0.2             0.0.0.0/0           to:1.1.1.2 



Thread overview: 5+ messages
2011-01-03 17:52 bridge not routing packets via source bridgeport Sebastian J. Bronner
2011-01-03 18:42 ` Eric Dumazet
2011-01-04  8:25   ` Sebastian J. Bronner
2011-03-23 10:12     ` Sebastian J. Bronner
2011-03-23 19:27       ` David Miller
