netfilter-devel.vger.kernel.org archive mirror
* Re: conntrack doesn't always work when a bridge is used
       [not found] <9a4a382a0712180648i7fc958edt6f0d9db83f574c77@mail.gmail.com>
@ 2007-12-19 17:00 ` Damien Thébault
  2007-12-19 19:03   ` Patrick McHardy
  2007-12-28 14:39   ` Damien Thébault
  0 siblings, 2 replies; 21+ messages in thread
From: Damien Thébault @ 2007-12-19 17:00 UTC (permalink / raw)
  To: linux-net, netfilter-devel, Patrick McHardy, David S. Miller

Hello,

Yesterday I sent the mail quoted below to linux-net describing my problem.
Today I ran a git bisect and got the following output:

> 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8 is first bad commit
> commit 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Wed Dec 13 16:54:25 2006 -0800
>
>     [NETFILTER]: bridge-netfilter: remove deferred hooks
>
>     Remove the deferred hooks and all related code as scheduled in
>     feature-removal-schedule.
>
>     Signed-off-by: Patrick McHardy <kaber@trash.net>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> :040000 040000 c49ea947455937566b6129991dde5e86f2453aae 6611736ce5c0fcde7627494b66b9ea94e37ea42e M      Documentation
> :040000 040000 d0dd0700fe68f98b52687be3a0c31d73f7b15b81 f8ddf15a0389c5f5b7f2c11d7d0db039a660e1d5 M      include
> :040000 040000 dafccf7ff8657be9adca6b28dbd365cdd6c01ca5 3eeb1cb4b16cc5cb698ab559b47ea6b0991d4d3a M      net

With exactly this version, the behaviour is similar to what I see
with 2.6.23 and 2.6.24-rc5 (see below).


The original mail :

On Dec 18, 2007 3:48 PM, Damien Thébault <damien.thebault@gmail.com> wrote:
> Hello,
>
> I was trying to use a setup with nat and bridge, but conntrack doesn't
> work properly (at first I tried rtsp, but since it's an external patch
> with some modifications, I used ftp for my tests).
> I removed everything but the minimal setup to report this bug :
>
> A) one ftp client, on 192.168.1.0/24 (192.168.1.50 or 192.168.1.5 or
> 192.168.1.150 or 192.168.1.15, see below)
> B) one router doing nat
>   b1) one interface is on 192.168.1.0/24 with the ftp client (192.168.1.70)
>   b2) the other interface is on 192.168.2.0/24 with the ftp server
> (192.168.2.70 or 192.168.2.7, see below)
> C) one ftp server, on 192.168.2.0/24 (192.168.2.50)
>
> To do my tests, I used only one physical interface on the router,
> configured with two ip addresses. This interface is either a real
> interface or a bridge on top of only this same real interface, for
> example:
> ifconfig eth0 192.168.1.70 up
> ifconfig eth0:0 192.168.2.70 up
> or :
> ifconfig eth0:0 down
> brctl addbr br0
> brctl addif br0 eth0
> ifconfig eth0 0.0.0.0 up
> ifconfig br0 192.168.1.70 up
> ifconfig br0:0 192.168.2.70 up
>
> the following netfilter configuration is used :
> iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
>
> I tried this once with three computers, and then today with only two
> computers. For the two-computer setup, I had to add another rule to
> make it work:
> iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT --to-destination 192.168.2.50
> I then connect from the computer that holds both 192.168.1.50 and
> 192.168.2.50 to the ftp server at 192.168.2.250. The route for
> 192.168.2.0/24 goes via the router, so the traffic is natted and comes back.
> The behaviour is the same with three distinct computers as with this
> setup, so the distinction doesn't really matter here.
>
> What I'm seeing is that when the bridge is used, and the outgoing
> ip address of the router is longer or shorter (in characters) than the
> ip address of the client, conntrack doesn't work!
> The correct address is present after nat_ftp (the "wan"-side ip address
> of the router) and everything seems to work well, but the incoming
> connection from port 20 is not forwarded (a RST is sent in reply to the SYN).
> If the bridge is not used, or if the two ip addresses have the same
> length, then it works as expected.
>
> The kernel used on the router is 2.6.24-rc5-git5, and active ftp is used.
> I collected /proc/net/nf_conntrack on each try. If something else is
> needed, I'll provide it (the packet captures with wireshark look ok
> in every case, but maybe someone will need them?).
>
> 1) without a bridge
>  * 192.168.1.50 -> 192.168.1.70 [ nat ] 192.168.2.7  -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.7
> sport=20 dport=51956 packets=5 bytes=334 src=192.168.1.50
> dst=192.168.2.250 sport=51956 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.50
> dst=192.168.2.250 sport=52709 dport=21 packets=11 bytes=635
> src=192.168.2.50 dst=192.168.2.7 sport=21 dport=52709 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
>
>  * 192.168.1.50 -> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.50
> dst=192.168.2.250 sport=52747 dport=21 packets=11 bytes=634
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=52747 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=43608 packets=5 bytes=334 src=192.168.1.50
> dst=192.168.2.250 sport=43608 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
>
>  * 192.168.1.5  -> 192.168.1.70 [ nat ] 192.168.2.7  -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 117 TIME_WAIT src=192.168.2.50 dst=192.168.2.7
> sport=20 dport=41758 packets=5 bytes=334 src=192.168.1.5
> dst=192.168.2.250 sport=41758 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
> ipv4     2 tcp      6 431997 ESTABLISHED src=192.168.1.5
> dst=192.168.2.250 sport=47397 dport=21 packets=11 bytes=633
> src=192.168.2.50 dst=192.168.2.7 sport=21 dport=47397 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
>
>  * 192.168.1.5  -> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.5
> dst=192.168.2.250 sport=53851 dport=21 packets=11 bytes=634
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=53851 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=51600 packets=5 bytes=334 src=192.168.1.5
> dst=192.168.2.250 sport=51600 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
>
>  * 192.168.1.150-> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.150
> dst=192.168.2.250 sport=41108 dport=21 packets=11 bytes=636
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=41108 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=35479 packets=5 bytes=334 src=192.168.1.150
> dst=192.168.2.250 sport=35479 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
>
>  * 192.168.1.15 -> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=46819 packets=5 bytes=334 src=192.168.1.15
> dst=192.168.2.250 sport=46819 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.15
> dst=192.168.2.250 sport=44839 dport=21 packets=11 bytes=635
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=44839 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
>
> 2) with a bridge
>  * 192.168.1.50 -> 192.168.1.70 [ nat ] 192.168.2.7  -> 192.168.2.50
> ftp doesn't work
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.50
> dst=192.168.2.250 sport=56140 dport=21 packets=13 bytes=796
> src=192.168.2.50 dst=192.168.2.7 sport=21 dport=56140 packets=13
> bytes=1109 [ASSURED] mark=0 secmark=0 use=1
>
>  * 192.168.1.50 -> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431997 ESTABLISHED src=192.168.1.50
> dst=192.168.2.250 sport=34334 dport=21 packets=11 bytes=635
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=34334 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=59124 packets=5 bytes=334 src=192.168.1.50
> dst=192.168.2.250 sport=59124 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
>
>  * 192.168.1.5  -> 192.168.1.70 [ nat ] 192.168.2.7  -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 116 TIME_WAIT src=192.168.2.50 dst=192.168.2.7
> sport=20 dport=36148 packets=5 bytes=334 src=192.168.1.5
> dst=192.168.2.250 sport=36148 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
> ipv4     2 tcp      6 431996 ESTABLISHED src=192.168.1.5
> dst=192.168.2.250 sport=46925 dport=21 packets=11 bytes=633
> src=192.168.2.50 dst=192.168.2.7 sport=21 dport=46925 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
>
>  * 192.168.1.5  -> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp doesn't work
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 59 CLOSE_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=60875 packets=4 bytes=282 src=192.168.1.5
> dst=192.168.2.250 sport=60875 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
> ipv4     2 tcp      6 431999 ESTABLISHED src=192.168.1.5
> dst=192.168.2.250 sport=48877 dport=21 packets=12 bytes=687
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=48877 packets=9
> bytes=939 [ASSURED] mark=0 secmark=0 use=2
>
>  * 192.168.1.150-> 192.168.1.70 [ nat ] 192.168.2.70  -> 192.168.2.50
> ftp doesn't work
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431999 ESTABLISHED src=192.168.1.150
> dst=192.168.2.250 sport=33847 dport=21 packets=14 bytes=878
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=33847 packets=14
> bytes=1173 [ASSURED] mark=0 secmark=0 use=1
>
>  * 192.168.1.15 -> 192.168.1.70 [ nat ] 192.168.2.70 -> 192.168.2.50
> ftp works
> /proc/net/nf_conntrack :
> ipv4     2 tcp      6 431998 ESTABLISHED src=192.168.1.15
> dst=192.168.2.250 sport=46792 dport=21 packets=11 bytes=633
> src=192.168.2.50 dst=192.168.2.70 sport=21 dport=46792 packets=8
> bytes=880 [ASSURED] mark=0 secmark=0 use=2
> ipv4     2 tcp      6 118 TIME_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=56576 packets=5 bytes=334 src=192.168.1.15
> dst=192.168.2.250 sport=56576 dport=20 packets=3 bytes=164 [ASSURED]
> mark=0 secmark=0 use=1
>
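
One way to see why the length of the addresses could matter: in active FTP
the client announces its data endpoint in a PORT command whose argument is
the address and port written as comma-separated decimal text (RFC 959), and
the FTP NAT helper rewrites that text to the router's outgoing address. If
the two addresses have a different number of digits, the payload grows or
shrinks, and the TCP sequence numbers have to be adjusted for the rest of
the control connection. The sketch below only computes that size difference.
port_cmd and port_delta are hypothetical helper names, and the sketch assumes
the helper keeps the same port number.

```shell
# Build the text of an FTP PORT command (without the trailing CRLF).
port_cmd() {
    # $1 = dotted-quad IPv4 address, $2 = TCP port
    printf 'PORT %s,%d,%d' "$(printf '%s' "$1" | tr . ,)" $(( $2 / 256 )) $(( $2 % 256 ))
}

# Print the byte difference between the PORT line sent by the client and
# the one that would carry the router's outgoing address instead.
port_delta() {
    # $1 = client address, $2 = router address, $3 = port
    before=$(port_cmd "$1" "$3")
    after=$(port_cmd "$2" "$3")
    echo $(( ${#after} - ${#before} ))
}

# Example:
#   port_delta 192.168.1.5 192.168.2.70 33344    # prints 1
#   port_delta 192.168.1.50 192.168.2.70 33344   # prints 0
```

For the address pairs listed above, a delta of 0 corresponds to the working
cases, and a non-zero delta to the cases that fail when the bridge is in use.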

--
Damien Thébault
-
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: conntrack doesn't always work when a bridge is used
  2007-12-19 17:00 ` conntrack doesn't always work when a bridge is used Damien Thébault
@ 2007-12-19 19:03   ` Patrick McHardy
  2007-12-20  8:30     ` Damien Thébault
  2007-12-28 14:39   ` Damien Thébault
  1 sibling, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2007-12-19 19:03 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> Hello,
> 
> I sent the quoted mail to linux-net with my problem yesterday, but I
> did a git bisect today and I got the following output :
> 
>> 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8 is first bad commit
>> commit 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8
>> Author: Patrick McHardy <kaber@trash.net>
>> Date:   Wed Dec 13 16:54:25 2006 -0800
>>
>>     [NETFILTER]: bridge-netfilter: remove deferred hooks
>>
>>     Remove the deferred hooks and all related code as scheduled in
>>     feature-removal-schedule.
>>
>>     Signed-off-by: Patrick McHardy <kaber@trash.net>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> :040000 040000 c49ea947455937566b6129991dde5e86f2453aae 6611736ce5c0fcde7627494b66b9ea94e37ea42e M      Documentation
>> :040000 040000 d0dd0700fe68f98b52687be3a0c31d73f7b15b81 f8ddf15a0389c5f5b7f2c11d7d0db039a660e1d5 M      include
>> :040000 040000 dafccf7ff8657be9adca6b28dbd365cdd6c01ca5 3eeb1cb4b16cc5cb698ab559b47ea6b0991d4d3a M      net
> 
> With exactly this version, the behaviour is similar with what I see
> with 2.6.23 and 2.6.24-rc5 (see below).


Could you capture the conntrack events of the non-working
case with (run in parallel):

conntrack -E
conntrack -E expect
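
A minimal sketch of running both listeners side by side, assuming a root
shell on the router with the conntrack tool installed. capture_events, the
output prefix and the duration are placeholders, not something from this
thread.

```shell
# Record conntrack events and expectation events at the same time.
capture_events() {
    # $1 = output file prefix, $2 = seconds to record
    conntrack -E > "$1_conntrack_e.txt" 2>&1 &
    ev_pid=$!
    conntrack -E expect > "$1_conntrack_e_expect.txt" 2>&1 &
    exp_pid=$!
    # Reproduce the failing FTP transfer while both listeners are running.
    sleep "$2"
    # Stop both listeners; ignore errors if one has already exited.
    kill "$ev_pid" "$exp_pid" 2>/dev/null || true
    wait "$ev_pid" "$exp_pid" 2>/dev/null || true
}

# Example:
#   capture_events bad 30
```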


* Re: conntrack doesn't always work when a bridge is used
  2007-12-19 19:03   ` Patrick McHardy
@ 2007-12-20  8:30     ` Damien Thébault
  2007-12-20 10:06       ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2007-12-20  8:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

On Dec 19, 2007 8:03 PM, Patrick McHardy <kaber@trash.net> wrote:
>
> Could you capture the conntrack events of the non-working
> case with (run in parallel):
>
> conntrack -E
> conntrack -E expect
>

Sure, here it is :

conntrack -E :

    [NEW] tcp      6 120 SYN_SENT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 [UNREPLIED] src=192.168.2.50 dst=192.168.2.70
sport=21 dport=45090
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.1.5
dst=192.168.2.250 sport=45090 dport=21 src=192.168.2.50
dst=192.168.2.70 sport=21 dport=45090 [ASSURED]
    [NEW] tcp      6 120 SYN_SENT src=127.0.0.1 dst=127.0.0.1
sport=47496 dport=631 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1
sport=631 dport=47496
 [UPDATE] tcp      6 120 CLOSE src=127.0.0.1 dst=127.0.0.1 sport=47496
dport=631 src=127.0.0.1 dst=127.0.0.1 sport=631 dport=47496
[DESTROY] tcp      6 src=127.0.0.1 dst=127.0.0.1 sport=47496 dport=631
packets=1 bytes=60 src=127.0.0.1 dst=127.0.0.1 sport=631 dport=47496
packets=0 bytes=0
    [NEW] tcp      6 120 SYN_SENT src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 [UNREPLIED] src=192.168.1.5 dst=192.168.2.250
sport=33344 dport=20
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.2.50
dst=192.168.2.70 sport=20 dport=33344 src=192.168.1.5
dst=192.168.2.250 sport=33344 dport=20 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20 [ASSURED]
 [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20 [ASSURED]
 [UPDATE] tcp      6 10 CLOSE src=192.168.2.50 dst=192.168.2.70
sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
dport=20 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
 [UPDATE] tcp      6 10 CLOSE src=192.168.1.5 dst=192.168.2.250
sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
dport=45090 [ASSURED]
    [NEW] unknown  2 600 src=192.168.1.1 dst=224.0.0.1 [UNREPLIED]
src=224.0.0.1 dst=192.168.1.1
[DESTROY] tcp      6 src=192.168.2.50 dst=192.168.2.70 sport=20
dport=33344 packets=4 bytes=559 src=192.168.1.5 dst=192.168.2.250
sport=33344 dport=20 packets=4 bytes=216
[DESTROY] tcp      6 src=192.168.1.5 dst=192.168.2.250 sport=45090
dport=21 packets=17 bytes=916 src=192.168.2.50 dst=192.168.2.70
sport=21 dport=45090 packets=12 bytes=1162

conntrack -E expect :

300 proto=6 src=192.168.2.50 dst=192.168.2.70 sport=0 dport=33344

-- 
Damien Thebault


* Re: conntrack doesn't always work when a bridge is used
  2007-12-20  8:30     ` Damien Thébault
@ 2007-12-20 10:06       ` Patrick McHardy
  2007-12-20 11:06         ` Damien Thébault
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2007-12-20 10:06 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> On Dec 19, 2007 8:03 PM, Patrick McHardy <kaber@trash.net> wrote:
>> Could you capture the conntrack events of the non-working
>> case with (run in parallel):
>>
>> conntrack -E
>> conntrack -E expect
>>
> 
> Sure, here it is :

That actually looks like it works properly.

New control connection:

>     [NEW] tcp      6 120 SYN_SENT src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 [UNREPLIED] src=192.168.2.50 dst=192.168.2.70
> sport=21 dport=45090
>  [UPDATE] tcp      6 60 SYN_RECV src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
> dport=45090
>  [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.1.5
> dst=192.168.2.250 sport=45090 dport=21 src=192.168.2.50
> dst=192.168.2.70 sport=21 dport=45090 [ASSURED]

New expectation for data connection:

 > conntrack -E expect :
 >
 > 300 proto=6 src=192.168.2.50 dst=192.168.2.70 sport=0 dport=33344

New data connection matching expectation, both source and
destination properly NATed:

>     [NEW] tcp      6 120 SYN_SENT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=33344 [UNREPLIED] src=192.168.1.5 dst=192.168.2.250
> sport=33344 dport=20
>  [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
> dport=20
>  [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.2.50
> dst=192.168.2.70 sport=20 dport=33344 src=192.168.1.5
> dst=192.168.2.250 sport=33344 dport=20 [ASSURED]
>  [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
> dport=20 [ASSURED]
>  [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
> dport=20 [ASSURED]
>  [UPDATE] tcp      6 10 CLOSE src=192.168.2.50 dst=192.168.2.70
> sport=20 dport=33344 src=192.168.1.5 dst=192.168.2.250 sport=33344
> dport=20 [ASSURED]

Data connection closed

>  [UPDATE] tcp      6 120 FIN_WAIT src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
> dport=45090 [ASSURED]
>  [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
> dport=45090 [ASSURED]
>  [UPDATE] tcp      6 30 LAST_ACK src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
> dport=45090 [ASSURED]
>  [UPDATE] tcp      6 120 TIME_WAIT src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
> dport=45090 [ASSURED]
>  [UPDATE] tcp      6 10 CLOSE src=192.168.1.5 dst=192.168.2.250
> sport=45090 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21
> dport=45090 [ASSURED]

Control connection closed

> [DESTROY] tcp      6 src=192.168.2.50 dst=192.168.2.70 sport=20
> dport=33344 packets=4 bytes=559 src=192.168.1.5 dst=192.168.2.250
> sport=33344 dport=20 packets=4 bytes=216

> [DESTROY] tcp      6 src=192.168.1.5 dst=192.168.2.250 sport=45090
> dport=21 packets=17 bytes=916 src=192.168.2.50 dst=192.168.2.70
> sport=21 dport=45090 packets=12 bytes=1162

Both connections destroyed
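
The walk-through above amounts to checking that the first packet of the data
connection fits the expectation created by the FTP helper. A minimal sketch
of that comparison on the text output follows. It assumes each event is on a
single line (unlike the wrapped mail text) and that sport=0 in the
expectation means "any source port", which is an assumption about the output
format. field and matches_expectation are hypothetical names.

```shell
# Print the first value of a key=value field (e.g. dport) on a conntrack line.
# The first occurrence belongs to the original-direction tuple.
field() {
    # $1 = key, $2 = line
    printf '%s\n' "$2" | tr ' ' '\n' | grep "^$1=" | head -n 1 | cut -d= -f2
}

# Succeed when the original-direction tuple of a [NEW] event matches an
# expectation line on source address, destination address and destination port.
matches_expectation() {
    # $1 = expectation line, $2 = [NEW] event line
    for k in src dst dport; do
        [ "$(field "$k" "$1")" = "$(field "$k" "$2")" ] || return 1
    done
}
```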

* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 10:06       ` Patrick McHardy
@ 2007-12-20 11:06         ` Damien Thébault
  2007-12-20 11:07           ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2007-12-20 11:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

On Dec 20, 2007 11:06 AM, Patrick McHardy <kaber@trash.net> wrote:
> That actually looks like it works properly.
>
> New control connection:
>
> [...]
>
> New expectation for data connection:
>
> [...]
>
> New data connection matching expectation, both source and
> destination properly NATed:
>
> [...]
>
> Data connection closed
>
> [...]
>
> Control connection closed
>
> [...]
>
> Both connections destroyed
>

Yes, when I'm using ip addresses with the same length, the conntrack
-E output is similar, and it's working.
But if I change the router's "wan"-side ip address to be longer or
shorter than the client's ip address, then it's non-working again.

I don't think it's something in the configuration: the results are
the same on two different computers, one being an x86 little endian
debian laptop where I did the bisect, the other being an arm xscale
big endian board with a custom distro (nothing unusual here, just
kernel, drivers, busybox and some utilities).

Well, I'm sorry, I don't want to bother anyone, but those are really
the results I'm seeing.
-- 
Damien Thebault


* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 11:06         ` Damien Thébault
@ 2007-12-20 11:07           ` Patrick McHardy
  2007-12-20 11:20             ` Damien Thébault
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2007-12-20 11:07 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> Yes, when I'm using ip addresses with the same length, the conntrack
> -E output is similar, and it's working.
> But if I change the router's "wan"-side ip address to be longer or
> shorter than the client's ip address, then it's non-working again.
> 
> I don't think it's something in the configuration : the results are
> present on two different computers, one being a x86 little endian
> debian laptop where I did the bisect, the other being an arm xscale
> big endian board with a custom distro (nothing funny here, just
> kernel, drivers, busybox and  some utilities).
> 
> Well, I'm sorry, I don't want to bother anyone, but those are really
> the results I'm seeing.


Don't worry. I was just wondering because I asked for the output
of the *non-working* case :) Please post that and I'll look into it.

* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 11:07           ` Patrick McHardy
@ 2007-12-20 11:20             ` Damien Thébault
  2007-12-20 11:25               ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2007-12-20 11:20 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 431 bytes --]

On Dec 20, 2007 12:07 PM, Patrick McHardy <kaber@trash.net> wrote:
>
> Don't worry. I was just wondering because I asked for the output
> of the *non-working* case :) Please post that and I'll look into it.
>

In fact, that was the output of the non-working case; the two outputs are similar.
I'm attaching the four files I just made, with both the working and
the non-working case.
(I didn't wait for the [DESTROY] events.)

-- 
Damien Thebault

[-- Attachment #2: bad_conntrack_e.txt --]
[-- Type: text/plain, Size: 1989 bytes --]

    [NEW] tcp      6 120 SYN_SENT src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 [UNREPLIED] src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126 [ASSURED]
    [NEW] tcp      6 120 SYN_SENT src=192.168.2.50 dst=192.168.2.70 sport=20 dport=58132 [UNREPLIED] src=192.168.1.5 dst=192.168.2.250 sport=58132 dport=20
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.50 dst=192.168.2.70 sport=20 dport=58132 src=192.168.1.5 dst=192.168.2.250 sport=58132 dport=20
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.2.50 dst=192.168.2.70 sport=20 dport=58132 src=192.168.1.5 dst=192.168.2.250 sport=58132 dport=20 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.50 dst=192.168.2.70 sport=20 dport=58132 src=192.168.1.5 dst=192.168.2.250 sport=58132 dport=20 [ASSURED]
 [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.2.50 dst=192.168.2.70 sport=20 dport=58132 src=192.168.1.5 dst=192.168.2.250 sport=58132 dport=20 [ASSURED]
 [UPDATE] tcp      6 10 CLOSE src=192.168.2.50 dst=192.168.2.70 sport=20 dport=58132 src=192.168.1.5 dst=192.168.2.250 sport=58132 dport=20 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126 [ASSURED]
 [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126 [ASSURED]
 [UPDATE] tcp      6 10 CLOSE src=192.168.1.5 dst=192.168.2.250 sport=55126 dport=21 src=192.168.2.50 dst=192.168.2.70 sport=21 dport=55126 [ASSURED]

[-- Attachment #3: bad_conntrack_e_expect.txt --]
[-- Type: text/plain, Size: 105 bytes --]

300 proto=6 src=192.168.2.50 dst=192.168.2.70 sport=0 dport=58132
Now closing conntrack event dumping...

[-- Attachment #4: good_conntrack_e.txt --]
[-- Type: text/plain, Size: 1984 bytes --]

    [NEW] tcp      6 120 SYN_SENT src=192.168.1.5 dst=192.168.2.250 sport=55137 dport=21 [UNREPLIED] src=192.168.2.50 dst=192.168.2.7 sport=21 dport=55137
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.1.5 dst=192.168.2.250 sport=55137 dport=21 src=192.168.2.50 dst=192.168.2.7 sport=21 dport=55137
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.1.5 dst=192.168.2.250 sport=55137 dport=21 src=192.168.2.50 dst=192.168.2.7 sport=21 dport=55137 [ASSURED]
    [NEW] tcp      6 120 SYN_SENT src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 [UNREPLIED] src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20
 [UPDATE] tcp      6 431999 ESTABLISHED src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20 [ASSURED]
 [UPDATE] tcp      6 60 CLOSE_WAIT src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20 [ASSURED]
 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.2.50 dst=192.168.2.7 sport=20 dport=44357 src=192.168.1.5 dst=192.168.2.250 sport=44357 dport=20 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.1.5 dst=192.168.2.250 sport=55137 dport=21 src=192.168.2.50 dst=192.168.2.7 sport=21 dport=55137 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=192.168.1.5 dst=192.168.2.250 sport=55137 dport=21 src=192.168.2.50 dst=192.168.2.7 sport=21 dport=55137 [ASSURED]
 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.1.5 dst=192.168.2.250 sport=55137 dport=21 src=192.168.2.50 dst=192.168.2.7 sport=21 dport=55137 [ASSURED]

[-- Attachment #5: good_conntrack_e_expect.txt --]
[-- Type: text/plain, Size: 104 bytes --]

300 proto=6 src=192.168.2.50 dst=192.168.2.7 sport=0 dport=44357
Now closing conntrack event dumping...
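
To compare the two captures more easily, the TCP state sequence of a single
flow can be pulled out of an event file. This is a minimal sketch, assuming
one event per line as in the attachments and identifying the flow by the
destination port of its original-direction tuple. states_for is a
hypothetical name.

```shell
# Print the sequence of TCP states seen for one flow in a conntrack -E dump.
states_for() {
    # $1 = event file, $2 = destination port of the original-direction tuple
    awk -v want="dport=$2" '
        {
            # The first dport= field on a line is the original direction.
            first = ""
            for (i = 1; i <= NF; i++) if ($i ~ /^dport=/) { first = $i; break }
            # Field 5 holds the state on [NEW] and [UPDATE] lines; on
            # [DESTROY] lines it is already a key=value pair, so skip those.
            if (first == want && $5 !~ /=/) out = out (out == "" ? "" : " ") $5
        }
        END { print out }' "$1"
}

# Example:
#   states_for bad_conntrack_e.txt 58132
#   states_for good_conntrack_e.txt 44357
```

In the attachments above, the data connection of the bad capture ends with
CLOSE_WAIT followed by CLOSE, while the one in the good capture goes through
CLOSE_WAIT, LAST_ACK and TIME_WAIT.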


* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 11:20             ` Damien Thébault
@ 2007-12-20 11:25               ` Patrick McHardy
  2007-12-20 13:21                 ` Damien Thébault
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2007-12-20 11:25 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> On Dec 20, 2007 12:07 PM, Patrick McHardy <kaber@trash.net> wrote:
>> Don't worry. I was just wondering because I asked for the output
>> of the *non-working* case :) Please post that and I'll look into it.
>>
> 
> The fact is that this was the output of the non working case, they are similar.
> I'm attaching the four files I just made, with both the working and
> the non-working case.


Thanks. Could you also post a tcpdump and enable conntrack logging
by doing "echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid"
and post the output of that, if any (you also need to load ipt_LOG
in case you're not using some other logging backend).
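
Put together, the requested debugging setup could look roughly like the
sketch below, run as root on the router. enable_invalid_logging is a
hypothetical helper, the LOG_INVALID override exists only so the function
can be tried without touching /proc, and the interface and file names in
the example are placeholders.

```shell
# Turn on logging of packets that conntrack considers invalid.
enable_invalid_logging() {
    # ipt_LOG provides the logging backend mentioned above.
    modprobe ipt_LOG
    # 255 asks conntrack to log invalid packets of any protocol.
    echo 255 > "${LOG_INVALID:-/proc/sys/net/netfilter/nf_conntrack_log_invalid}"
}

# Example session (br0 and the file name are placeholders):
#   enable_invalid_logging
#   tcpdump -i br0 -s 0 -w capture_ftp.pcap port 20 or port 21
#   dmesg | tail
```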

* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 11:25               ` Patrick McHardy
@ 2007-12-20 13:21                 ` Damien Thébault
  2007-12-20 16:08                   ` Damien Thébault
  2007-12-22  7:56                   ` Patrick McHardy
  0 siblings, 2 replies; 21+ messages in thread
From: Damien Thébault @ 2007-12-20 13:21 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 744 bytes --]

On Dec 20, 2007 12:25 PM, Patrick McHardy <kaber@trash.net> wrote:
>
> Thanks. Could you also post a tcpdump and enable conntrack logging
> by doing "echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid"
> and post the output of that, if any (you also need to load ipt_LOG
> in case you're not using some other logging backend).
>

I captured three times. The first time ("bad1" files), the reply is
coming back, but the ftp client doesn't seem to handle it. The second
time ("bad2" files), there is a problem with sequence numbers. And
then the last time ("good" files), it's ok.

I had sequence number errors without the previous bridge patch which
got merged into net-2.6. So I'll try again with the net-2.6 kernel.

-- 
Damien Thebault

[-- Attachment #2: capture_ftp_bad1_router.pcap --]
[-- Type: application/cap, Size: 8577 bytes --]

[-- Attachment #3: capture_ftp_bad2_router.pcap --]
[-- Type: application/cap, Size: 9077 bytes --]

[-- Attachment #4: capture_ftp_good_router.pcap --]
[-- Type: application/cap, Size: 8240 bytes --]

[-- Attachment #5: capture_ftp_bad1.pcap --]
[-- Type: application/cap, Size: 8577 bytes --]

[-- Attachment #6: capture_ftp_bad2.pcap --]
[-- Type: application/cap, Size: 8719 bytes --]

[-- Attachment #7: capture_ftp_good.pcap --]
[-- Type: application/cap, Size: 8240 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 13:21                 ` Damien Thébault
@ 2007-12-20 16:08                   ` Damien Thébault
  2007-12-22  7:56                   ` Patrick McHardy
  1 sibling, 0 replies; 21+ messages in thread
From: Damien Thébault @ 2007-12-20 16:08 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

On Dec 20, 2007 2:21 PM, Damien Thébault <damien.thebault@gmail.com> wrote:
>
> I got these sequence number errors without the previous bridge patch,
> which was merged into net-2.6. So I'll try again with the net-2.6 kernel.
>

OK, I tried it and the behaviour is the same.
I also forgot to mention last time that I'm not seeing anything in the
kernel log.
(I checked that the LOG target works by adding an iptables rule on
INPUT and connecting to that port; the packet showed up in the kernel
log.)

-- 
Damien Thebault

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2007-12-20 13:21                 ` Damien Thébault
  2007-12-20 16:08                   ` Damien Thébault
@ 2007-12-22  7:56                   ` Patrick McHardy
  2007-12-26  9:54                     ` Damien Thébault
  1 sibling, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2007-12-22  7:56 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> On Dec 20, 2007 12:25 PM, Patrick McHardy <kaber@trash.net> wrote:
>> Thanks. Could you also post a tcpdump and enable conntrack logging
>> by doing "echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid"
>> and post the output of that, if any (you also need to load ipt_LOG
>> in case you're not using some other logging backend).
>>
> 
> I made three captures. In the first ("bad1" files), the reply comes
> back, but the FTP client doesn't seem to handle it. In the second
> ("bad2" files), there is a problem with sequence numbers. In the last
> ("good" files), everything works.
> 
> I got these sequence number errors without the previous bridge patch,
> which was merged into net-2.6. So I'll try again with the net-2.6 kernel.


Yes, the captures show the effects from the double POSTROUTING
invocation. Could you send me captures from the current net-2.6
tree?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2007-12-22  7:56                   ` Patrick McHardy
@ 2007-12-26  9:54                     ` Damien Thébault
  2007-12-30 17:53                       ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2007-12-26  9:54 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 750 bytes --]

On Dec 22, 2007 8:56 AM, Patrick McHardy <kaber@trash.net> wrote:
> >
> > I made three captures. In the first ("bad1" files), the reply comes
> > back, but the FTP client doesn't seem to handle it. In the second
> > ("bad2" files), there is a problem with sequence numbers. In the last
> > ("good" files), everything works.
> >
> > I got these sequence number errors without the previous bridge patch,
> > which was merged into net-2.6. So I'll try again with the net-2.6 kernel.
>
>
> Yes, the captures show the effects from the double POSTROUTING
> invocation. Could you send me captures from the current net-2.6
> tree?
>

Sure, here they are.
(I used David Miller's net-2.6.25 at 75fa3253609430f28da005da494ce5ad3b5c78a1 )

-- 
Damien Thebault

[-- Attachment #2: capture_net_ftp_bad1.pcap --]
[-- Type: application/cap, Size: 7375 bytes --]

[-- Attachment #3: capture_net_ftp_bad2.pcap --]
[-- Type: application/cap, Size: 8121 bytes --]

[-- Attachment #4: capture_net_ftp_good.pcap --]
[-- Type: application/cap, Size: 6936 bytes --]

[-- Attachment #5: capture_net_ftp_bad1_router.pcap --]
[-- Type: application/cap, Size: 7375 bytes --]

[-- Attachment #6: capture_net_ftp_bad2_router.pcap --]
[-- Type: application/cap, Size: 8121 bytes --]

[-- Attachment #7: capture_net_ftp_good_router.pcap --]
[-- Type: application/cap, Size: 6936 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2007-12-19 17:00 ` conntrack doesn't always work when a bridge is used Damien Thébault
  2007-12-19 19:03   ` Patrick McHardy
@ 2007-12-28 14:39   ` Damien Thébault
  1 sibling, 0 replies; 21+ messages in thread
From: Damien Thébault @ 2007-12-28 14:39 UTC (permalink / raw)
  To: linux-net, netfilter-devel, Patrick McHardy, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]

On Dec 19, 2007 6:00 PM, Damien Thébault <damien.thebault@gmail.com> wrote:
> Hello,
>
> I sent the quoted mail to linux-net with my problem yesterday, but I
> did a git bisect today and I got the following output :
>
> > 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8 is first bad commit
> > commit 2bf540b73ed5b304e84bb4d4c390d49d1cfa0ef8
> > Author: Patrick McHardy <kaber@trash.net>
> > Date:   Wed Dec 13 16:54:25 2006 -0800
> >
> >     [NETFILTER]: bridge-netfilter: remove deferred hooks
> >
> >     Remove the deferred hooks and all related code as scheduled in
> >     feature-removal-schedule.
> >
> >     Signed-off-by: Patrick McHardy <kaber@trash.net>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > :040000 040000 c49ea947455937566b6129991dde5e86f2453aae 6611736ce5c0fcde7627494b66b9ea94e37ea42e M      Documentation
> > :040000 040000 d0dd0700fe68f98b52687be3a0c31d73f7b15b81 f8ddf15a0389c5f5b7f2c11d7d0db039a660e1d5 M      include
> > :040000 040000 dafccf7ff8657be9adca6b28dbd365cdd6c01ca5 3eeb1cb4b16cc5cb698ab559b47ea6b0991d4d3a M      net
>

This morning I reverted that commit and ported the revert to the
net-2.6.25 and 2.6.23 kernels. With it, the behaviour seems to be
correct again (I haven't tested much, so I don't know whether it breaks
anything else).
I don't know yet whether it also solves my RTSP conntrack problem.

(Yes, I know this is not really the right way to handle this, but there
have been a lot of changes in this area since the deferred hooks were
removed before 2.6.20, so I just wanted to see whether it works with
the current kernel.)

I'm attaching the two patches in case anyone needs them.
-- 
Damien Thebault
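
For readers who don't know the deferred hooks, here is a rough user-space sketch of the idea behind the attached patches. ip_sabotage_out() sits early in the IP FORWARD chain and returns NF_STOP, so the remaining hooks are skipped; later, br_nf_local_out() re-enters the same chain with NF_HOOK_THRESH and a threshold one above the sabotage priority, so only the skipped hooks run. This is a simplified model, not kernel code: the struct, the verdict values and run_hooks_thresh() are illustrative, and only the priority numbers come from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Verdicts named after the kernel's, with illustrative values. */
enum verdict { V_ACCEPT, V_DROP, V_STOP };

/* Priorities as used in the patch for the FORWARD chain. */
#define PRI_SABOTAGE_FORWARD (-175)
#define PRI_MANGLE           (-150)
#define PRI_FILTER           0

struct hook {
    int priority;                /* lower values run first */
    enum verdict (*fn)(void);
};

/* Priorities of the hooks that actually ran, in order. */
#define MAX_TRACE 16
int trace[MAX_TRACE];
size_t trace_len;

enum verdict sabotage_fn(void) { return V_STOP; }   /* defer the rest */
enum verdict accept_fn(void)   { return V_ACCEPT; }

/* Model of one chain, already sorted by priority. */
const struct hook forward_hooks[] = {
    { PRI_SABOTAGE_FORWARD, sabotage_fn },
    { PRI_MANGLE,           accept_fn },
    { PRI_FILTER,           accept_fn },
};

/* Run every hook whose priority is >= thresh. V_STOP ends the
 * traversal early but still lets the packet continue. */
enum verdict run_hooks_thresh(const struct hook *hooks, size_t n, int thresh)
{
    for (size_t i = 0; i < n; i++) {
        if (hooks[i].priority < thresh)
            continue;                      /* below threshold: skipped */

        assert(trace_len < MAX_TRACE);
        trace[trace_len++] = hooks[i].priority;

        enum verdict v = hooks[i].fn();
        if (v == V_DROP)
            return V_DROP;
        if (v == V_STOP)
            return V_ACCEPT;               /* stop iterating, keep packet */
    }
    return V_ACCEPT;
}
```

Calling run_hooks_thresh(forward_hooks, 3, INT_MIN) models the first pass in the IP stack, where only the sabotage hook runs. Calling it again with PRI_SABOTAGE_FORWARD + 1 models the second pass from the bridge code, where mangle and filter finally run with the real output port known.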

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: add_deferred_hooks-net-2.6.25.patch --]
[-- Type: text/x-diff; name=add_deferred_hooks-net-2.6.25.patch, Size: 13252 bytes --]

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0ae682b..919f331 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -181,6 +181,22 @@ Who:	Nick Piggin <npiggin@suse.de>
 
 ---------------------------
 
+What:	Bridge netfilter deferred IPv4/IPv6 output hook calling
+When:	January 2007
+Why:	The deferred output hooks are a layering violation causing unusual
+	and broken behaviour on bridge devices. Examples of things they
+	break include QoS classifation using the MARK or CLASSIFY targets,
+	the IPsec policy match and connection tracking with VLANs on a
+	bridge. Their only use is to enable bridge output port filtering
+	within iptables with the physdev match, which can also be done by
+	combining iptables and ebtables using netfilter marks. Until it
+	will get removed the hook deferral is disabled by default and is
+	only enabled when needed.
+
+Who:	Patrick McHardy <kaber@trash.net>
+
+---------------------------
+
 What:	PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
 When:	October 2008
 Why:	The stacking of class devices makes these values misleading and
diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 499aa93..d9487b9 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -83,6 +83,7 @@ struct bridge_skb_cb {
 	} daddr;
 };
 
+extern int brnf_deferred_hooks;
 #else
 #define nf_bridge_maybe_copy_header(skb)	(0)
 #define nf_bridge_pad(skb)			(0)
diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h
index 9a10092..b96952a 100644
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -36,6 +36,7 @@
 #define NFC_IP_DST_PT		0x0400
 /* Something else about the proto */
 #define NFC_IP_PROTO_UNKNOWN	0x2000
+#endif /* ! __KERNEL__ */
 
 /* IP Hooks */
 /* After promisc drops, checksum checks. */
@@ -49,7 +50,6 @@
 /* Packets about to hit the wire. */
 #define NF_IP_POST_ROUTING	4
 #define NF_IP_NUMHOOKS		5
-#endif /* ! __KERNEL__ */
 
 enum nf_ip_hook_priorities {
 	NF_IP_PRI_FIRST = INT_MIN,
@@ -57,8 +57,10 @@ enum nf_ip_hook_priorities {
 	NF_IP_PRI_RAW = -300,
 	NF_IP_PRI_SELINUX_FIRST = -225,
 	NF_IP_PRI_CONNTRACK = -200,
+	NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
 	NF_IP_PRI_MANGLE = -150,
 	NF_IP_PRI_NAT_DST = -100,
+	NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
 	NF_IP_PRI_FILTER = 0,
 	NF_IP_PRI_NAT_SRC = 100,
 	NF_IP_PRI_SELINUX_LAST = 225,
diff --git a/include/linux/netfilter_ipv6.h b/include/linux/netfilter_ipv6.h
index 3475a65..07133c9 100644
--- a/include/linux/netfilter_ipv6.h
+++ b/include/linux/netfilter_ipv6.h
@@ -40,6 +40,7 @@
 #define NFC_IP6_DST_PT           0x0400
 /* Something else about the proto */
 #define NFC_IP6_PROTO_UNKNOWN    0x2000
+#endif /* ! __KERNEL__ */
 
 /* IP6 Hooks */
 /* After promisc drops, checksum checks. */
@@ -53,7 +54,6 @@
 /* Packets about to hit the wire. */
 #define NF_IP6_POST_ROUTING	4
 #define NF_IP6_NUMHOOKS		5
-#endif /* ! __KERNEL__ */
 
 
 enum nf_ip6_hook_priorities {
@@ -61,8 +61,10 @@ enum nf_ip6_hook_priorities {
 	NF_IP6_PRI_CONNTRACK_DEFRAG = -400,
 	NF_IP6_PRI_SELINUX_FIRST = -225,
 	NF_IP6_PRI_CONNTRACK = -200,
+	NF_IP6_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
 	NF_IP6_PRI_MANGLE = -150,
 	NF_IP6_PRI_NAT_DST = -100,
+	NF_IP6_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
 	NF_IP6_PRI_FILTER = 0,
 	NF_IP6_PRI_NAT_SRC = 100,
 	NF_IP6_PRI_SELINUX_LAST = 225,
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 32ac035..86131b4 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -65,6 +65,9 @@ static int brnf_filter_pppoe_tagged __read_mostly = 1;
 #define brnf_filter_pppoe_tagged 1
 #endif
 
+int brnf_deferred_hooks;
+EXPORT_SYMBOL_GPL(brnf_deferred_hooks);
+
 static inline __be16 vlan_proto(const struct sk_buff *skb)
 {
 	return vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
@@ -689,46 +692,109 @@ static unsigned int br_nf_forward_arp(unsigned int hook, struct sk_buff *skb,
 	return NF_STOLEN;
 }
 
-/* PF_BRIDGE/LOCAL_OUT ***********************************************
- *
- * This function sees both locally originated IP packets and forwarded
+/* PF_BRIDGE/LOCAL_OUT ***********************************************/
+static int br_nf_local_out_finish(struct sk_buff *skb)
+{
+	if (skb->protocol == htons(ETH_P_8021Q)) {
+		skb_push(skb, VLAN_HLEN);
+		skb->network_header -= VLAN_HLEN;
+	}
+
+	NF_HOOK_THRESH(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+		       br_forward_finish, NF_BR_PRI_FIRST + 1);
+
+	return 0;
+}
+
+/* This function sees both locally originated IP packets and forwarded
  * IP packets (in both cases the destination device is a bridge
  * device). It also sees bridged-and-DNAT'ed packets.
+ * To be able to filter on the physical bridge devices (with the physdev
+ * module), we steal packets destined to a bridge device away from the
+ * PF_INET/FORWARD and PF_INET/OUTPUT hook functions, and give them back later,
+ * when we have determined the real output device. This is done in here.
  *
  * If (nf_bridge->mask & BRNF_BRIDGED_DNAT) then the packet is bridged
  * and we fake the PF_BRIDGE/FORWARD hook. The function br_nf_forward()
  * will then fake the PF_INET/FORWARD hook. br_nf_local_out() has priority
  * NF_BR_PRI_FIRST, so no relevant PF_BRIDGE/INPUT functions have been nor
  * will be executed.
- */
+ * Otherwise, if nf_bridge->physindev is NULL, the bridge-nf code never touched
+ * this packet before, and so the packet was locally originated. We fake
+ * the PF_INET/LOCAL_OUT hook.
+ * Finally, if nf_bridge->physindev isn't NULL, then the packet was IP routed,
+ * so we fake the PF_INET/FORWARD hook. ip_sabotage_out() makes sure
+ * even routed packets that didn't arrive on a bridge interface have their
+ * nf_bridge->physindev set. */
 static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff *skb,
 				    const struct net_device *in,
 				    const struct net_device *out,
 				    int (*okfn)(struct sk_buff *))
 {
-	struct net_device *realindev;
+	struct net_device *realindev, *realoutdev;
 	struct nf_bridge_info *nf_bridge;
+	int pf;
 
 	if (!skb->nf_bridge)
 		return NF_ACCEPT;
 
+	if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb))
+		pf = PF_INET;
+	else
+		pf = PF_INET6;
+
 	nf_bridge = skb->nf_bridge;
-	if (!(nf_bridge->mask & BRNF_BRIDGED_DNAT))
-		return NF_ACCEPT;
+	nf_bridge->physoutdev = skb->dev;
+	realindev = nf_bridge->physindev;
 
 	/* Bridged, take PF_BRIDGE/FORWARD.
 	 * (see big note in front of br_nf_pre_routing_finish) */
-	nf_bridge->physoutdev = skb->dev;
-	realindev = nf_bridge->physindev;
+	if (nf_bridge->mask & BRNF_BRIDGED_DNAT) {
+		if (nf_bridge->mask & BRNF_PKT_TYPE) {
+			skb->pkt_type = PACKET_OTHERHOST;
+			nf_bridge->mask ^= BRNF_PKT_TYPE;
+		}
+		if (skb->protocol == htons(ETH_P_8021Q)) {
+			skb_push(skb, VLAN_HLEN);
+			skb->network_header -= VLAN_HLEN;
+		}
 
-	if (nf_bridge->mask & BRNF_PKT_TYPE) {
-		skb->pkt_type = PACKET_OTHERHOST;
-		nf_bridge->mask ^= BRNF_PKT_TYPE;
+		NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
+			skb->dev, br_forward_finish);
+		goto out;
 	}
-	nf_bridge_push_encap_header(skb);
+	realoutdev = bridge_parent(skb->dev);
+	if (!realoutdev)
+		return NF_DROP;
+
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
+	/* iptables should match -o br0.x */
+	if (nf_bridge->netoutdev)
+		realoutdev = nf_bridge->netoutdev;
+#endif
+	if (skb->protocol == htons(ETH_P_8021Q)) {
+		skb_pull(skb, VLAN_HLEN);
+		skb->network_header += VLAN_HLEN;
+	}
+	/* IP forwarded traffic has a physindev, locally
+	 * generated traffic hasn't. */
+	if (realindev != NULL) {
+		if (!(nf_bridge->mask & BRNF_DONT_TAKE_PARENT)) {
+			struct net_device *parent = bridge_parent(realindev);
+			if (parent)
+				realindev = parent;
+		}
 
-	NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev, skb->dev,
-		br_forward_finish);
+		NF_HOOK_THRESH(pf, NF_IP_FORWARD, skb, realindev,
+			       realoutdev, br_nf_local_out_finish,
+			       NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD + 1);
+	} else {
+		NF_HOOK_THRESH(pf, NF_IP_LOCAL_OUT, skb, realindev,
+			       realoutdev, br_nf_local_out_finish,
+			       NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT + 1);
+	}
+
+out:
 	return NF_STOLEN;
 }
 
@@ -834,6 +900,67 @@ static unsigned int ip_sabotage_in(unsigned int hook, struct sk_buff *skb,
 	return NF_ACCEPT;
 }
 
+/* Postpone execution of PF_INET(6)/FORWARD, PF_INET(6)/LOCAL_OUT
+ * and PF_INET(6)/POST_ROUTING until we have done the forwarding
+ * decision in the bridge code and have determined nf_bridge->physoutdev. */
+static unsigned int ip_sabotage_out(unsigned int hook, struct sk_buff *skb,
+				    const struct net_device *in,
+				    const struct net_device *out,
+				    int (*okfn)(struct sk_buff *))
+{
+	if ((out->hard_start_xmit == br_dev_xmit &&
+	     okfn != br_nf_forward_finish &&
+	     okfn != br_nf_local_out_finish && okfn != br_nf_dev_queue_xmit)
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
+	    || ((out->priv_flags & IFF_802_1Q_VLAN) &&
+		VLAN_DEV_INFO(out)->real_dev->hard_start_xmit == br_dev_xmit)
+#endif
+	    ) {
+		struct nf_bridge_info *nf_bridge;
+
+		if (!skb->nf_bridge) {
+#ifdef CONFIG_SYSCTL
+			/* This code is executed while in the IP(v6) stack,
+			   the version should be 4 or 6. We can't use
+			   skb->protocol because that isn't set on
+			   PF_INET(6)/LOCAL_OUT. */
+			struct iphdr *ip = ip_hdr(skb);
+
+			if (ip->version == 4 && !brnf_call_iptables)
+				return NF_ACCEPT;
+			else if (ip->version == 6 && !brnf_call_ip6tables)
+				return NF_ACCEPT;
+			else if (!brnf_deferred_hooks)
+				return NF_ACCEPT;
+#endif
+			if (hook == NF_IP_POST_ROUTING)
+				return NF_ACCEPT;
+			if (!nf_bridge_alloc(skb))
+				return NF_DROP;
+		}
+
+		nf_bridge = skb->nf_bridge;
+
+		/* This frame will arrive on PF_BRIDGE/LOCAL_OUT and we
+		 * will need the indev then. For a brouter, the real indev
+		 * can be a bridge port, so we make sure br_nf_local_out()
+		 * doesn't use the bridge parent of the indev by using
+		 * the BRNF_DONT_TAKE_PARENT mask. */
+		if (hook == NF_IP_FORWARD && nf_bridge->physindev == NULL) {
+			nf_bridge->mask |= BRNF_DONT_TAKE_PARENT;
+			nf_bridge->physindev = (struct net_device *)in;
+		}
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
+		/* the iptables outdev is br0.x, not br0 */
+		if (out->priv_flags & IFF_802_1Q_VLAN)
+			nf_bridge->netoutdev = (struct net_device *)out;
+#endif
+		return NF_STOP;
+	}
+
+	return NF_ACCEPT;
+}
+
 /* For br_nf_local_out we need (prio = NF_BR_PRI_FIRST), to insure that innocent
  * PF_BRIDGE/NF_BR_LOCAL_OUT functions don't get bridged traffic as input.
  * For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
@@ -879,6 +1006,36 @@ static struct nf_hook_ops br_nf_ops[] __read_mostly = {
 	  .pf = PF_INET6,
 	  .hooknum = NF_INET_PRE_ROUTING,
 	  .priority = NF_IP6_PRI_FIRST, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET,
+	  .hooknum = NF_IP_FORWARD,
+	  .priority = NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET6,
+	  .hooknum = NF_IP6_FORWARD,
+	  .priority = NF_IP6_PRI_BRIDGE_SABOTAGE_FORWARD, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET,
+	  .hooknum = NF_IP_LOCAL_OUT,
+	  .priority = NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET6,
+	  .hooknum = NF_IP6_LOCAL_OUT,
+	  .priority = NF_IP6_PRI_BRIDGE_SABOTAGE_LOCAL_OUT, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET,
+	  .hooknum = NF_IP_POST_ROUTING,
+	  .priority = NF_IP_PRI_FIRST, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET6,
+	  .hooknum = NF_IP6_POST_ROUTING,
+	  .priority = NF_IP6_PRI_FIRST, },
 };
 
 #ifdef CONFIG_SYSCTL
diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index 678b683..6e7eb9b 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -104,16 +104,20 @@ physdev_mt_check(const char *tablename, const void *ip,
 	if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
 	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
 		return false;
-	if (info->bitmask & XT_PHYSDEV_OP_OUT &&
+	if (brnf_deferred_hooks == 0 &&
+	    info->bitmask & XT_PHYSDEV_OP_OUT &&
 	    (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
 	     info->invert & XT_PHYSDEV_OP_BRIDGED) &&
 	    hook_mask & ((1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_FORWARD) |
 			 (1 << NF_INET_POST_ROUTING))) {
 		printk(KERN_WARNING "physdev match: using --physdev-out in the "
 		       "OUTPUT, FORWARD and POSTROUTING chains for non-bridged "
-		       "traffic is not supported anymore.\n");
-		if (hook_mask & (1 << NF_INET_LOCAL_OUT))
-			return false;
+		       "traffic is deprecated and breaks other things, it will "
+		       "be removed in January 2007. See Documentation/"
+		       "feature-removal-schedule.txt for details. This doesn't "
+		       "affect you in case you're using it for purely bridged "
+		       "traffic.\n");
+		brnf_deferred_hooks = 1;
 	}
 	return true;
 }
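
The comment in front of br_nf_local_out() in the patch above describes a three-way decision. As a reading aid, it can be sketched in plain C as below. This is a simplified model under stated assumptions: classify_local_out(), the enum and the DEMO_BRIDGED_DNAT value are made up for illustration, and the VLAN handling and the actual NF_HOOK calls are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for BRNF_BRIDGED_DNAT; the value is assumed. */
#define DEMO_BRIDGED_DNAT 0x02u

/* Which hook the function fakes for a given packet. */
enum local_out_path {
    PATH_BRIDGE_FORWARD,   /* bridged and DNAT'ed: PF_BRIDGE/FORWARD */
    PATH_IP_FORWARD,       /* IP routed: deferred PF_INET(6)/FORWARD */
    PATH_IP_LOCAL_OUT,     /* locally generated: deferred LOCAL_OUT  */
};

/* mask models nf_bridge->mask; has_physindev models
 * nf_bridge->physindev != NULL. */
enum local_out_path classify_local_out(unsigned int mask, bool has_physindev)
{
    if (mask & DEMO_BRIDGED_DNAT)
        return PATH_BRIDGE_FORWARD;

    /* IP forwarded traffic has a physindev, locally generated
     * traffic does not. */
    if (has_physindev)
        return PATH_IP_FORWARD;

    return PATH_IP_LOCAL_OUT;
}
```

The DNAT check comes first on purpose: a bridged-and-DNAT'ed packet also has a physindev, so testing physindev first would send it down the routed path.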

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: add_deferred_hooks-2.6.23.patch --]
[-- Type: text/x-diff; name=add_deferred_hooks-2.6.23.patch, Size: 12042 bytes --]

Index: Documentation/feature-removal-schedule.txt
===================================================================
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -154,6 +154,22 @@
 
 ---------------------------
 
+What:	Bridge netfilter deferred IPv4/IPv6 output hook calling
+When:	January 2007
+Why:	The deferred output hooks are a layering violation causing unusual
+	and broken behaviour on bridge devices. Examples of things they
+	break include QoS classifation using the MARK or CLASSIFY targets,
+	the IPsec policy match and connection tracking with VLANs on a
+	bridge. Their only use is to enable bridge output port filtering
+	within iptables with the physdev match, which can also be done by
+	combining iptables and ebtables using netfilter marks. Until it
+	will get removed the hook deferral is disabled by default and is
+	only enabled when needed.
+
+Who:	Patrick McHardy <kaber@trash.net>
+
+---------------------------
+
 What:	PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
 When:	October 2008
 Why:	The stacking of class devices makes these values misleading and
Index: include/linux/netfilter_bridge.h
===================================================================
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -82,6 +82,7 @@
 	} daddr;
 };
 
+extern int brnf_deferred_hooks;
 #else
 #define nf_bridge_maybe_copy_header(skb)	(0)
 #define nf_bridge_pad(skb)			(0)
Index: include/linux/netfilter_ipv4.h
===================================================================
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -57,8 +57,10 @@
 	NF_IP_PRI_RAW = -300,
 	NF_IP_PRI_SELINUX_FIRST = -225,
 	NF_IP_PRI_CONNTRACK = -200,
+	NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
 	NF_IP_PRI_MANGLE = -150,
 	NF_IP_PRI_NAT_DST = -100,
+	NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
 	NF_IP_PRI_FILTER = 0,
 	NF_IP_PRI_NAT_SRC = 100,
 	NF_IP_PRI_SELINUX_LAST = 225,
Index: include/linux/netfilter_ipv6.h
===================================================================
--- a/include/linux/netfilter_ipv6.h
+++ b/include/linux/netfilter_ipv6.h
@@ -62,8 +62,10 @@
 	NF_IP6_PRI_CONNTRACK_DEFRAG = -400,
 	NF_IP6_PRI_SELINUX_FIRST = -225,
 	NF_IP6_PRI_CONNTRACK = -200,
+	NF_IP6_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
 	NF_IP6_PRI_MANGLE = -150,
 	NF_IP6_PRI_NAT_DST = -100,
+	NF_IP6_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
 	NF_IP6_PRI_FILTER = 0,
 	NF_IP6_PRI_NAT_SRC = 100,
 	NF_IP6_PRI_SELINUX_LAST = 225,
Index: net/bridge/br_netfilter.c
===================================================================
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -65,6 +65,9 @@
 #define brnf_filter_pppoe_tagged 1
 #endif
 
+int brnf_deferred_hooks;
+EXPORT_SYMBOL_GPL(brnf_deferred_hooks);
+
 static inline __be16 vlan_proto(const struct sk_buff *skb)
 {
 	return vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
@@ -697,47 +700,110 @@
 	return NF_STOLEN;
 }
 
-/* PF_BRIDGE/LOCAL_OUT ***********************************************
- *
- * This function sees both locally originated IP packets and forwarded
+/* PF_BRIDGE/LOCAL_OUT ***********************************************/
+static int br_nf_local_out_finish(struct sk_buff *skb)
+{
+	if (skb->protocol == htons(ETH_P_8021Q)) {
+		skb_push(skb, VLAN_HLEN);
+		skb->network_header -= VLAN_HLEN;
+	}
+
+	NF_HOOK_THRESH(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+		       br_forward_finish, NF_BR_PRI_FIRST + 1);
+
+	return 0;
+}
+
+/* This function sees both locally originated IP packets and forwarded
  * IP packets (in both cases the destination device is a bridge
  * device). It also sees bridged-and-DNAT'ed packets.
+ * To be able to filter on the physical bridge devices (with the physdev
+ * module), we steal packets destined to a bridge device away from the
+ * PF_INET/FORWARD and PF_INET/OUTPUT hook functions, and give them back later,
+ * when we have determined the real output device. This is done in here.
  *
  * If (nf_bridge->mask & BRNF_BRIDGED_DNAT) then the packet is bridged
  * and we fake the PF_BRIDGE/FORWARD hook. The function br_nf_forward()
  * will then fake the PF_INET/FORWARD hook. br_nf_local_out() has priority
  * NF_BR_PRI_FIRST, so no relevant PF_BRIDGE/INPUT functions have been nor
  * will be executed.
- */
+ * Otherwise, if nf_bridge->physindev is NULL, the bridge-nf code never touched
+ * this packet before, and so the packet was locally originated. We fake
+ * the PF_INET/LOCAL_OUT hook.
+ * Finally, if nf_bridge->physindev isn't NULL, then the packet was IP routed,
+ * so we fake the PF_INET/FORWARD hook. ip_sabotage_out() makes sure
+ * even routed packets that didn't arrive on a bridge interface have their
+ * nf_bridge->physindev set. */
 static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff **pskb,
 				    const struct net_device *in,
 				    const struct net_device *out,
 				    int (*okfn)(struct sk_buff *))
 {
-	struct net_device *realindev;
+	struct net_device *realindev, *realoutdev;
 	struct sk_buff *skb = *pskb;
 	struct nf_bridge_info *nf_bridge;
+	int pf;
 
 	if (!skb->nf_bridge)
 		return NF_ACCEPT;
 
+	if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb))
+		pf = PF_INET;
+	else
+		pf = PF_INET6;
+
 	nf_bridge = skb->nf_bridge;
-	if (!(nf_bridge->mask & BRNF_BRIDGED_DNAT))
-		return NF_ACCEPT;
+	nf_bridge->physoutdev = skb->dev;
+	realindev = nf_bridge->physindev;
 
 	/* Bridged, take PF_BRIDGE/FORWARD.
 	 * (see big note in front of br_nf_pre_routing_finish) */
-	nf_bridge->physoutdev = skb->dev;
-	realindev = nf_bridge->physindev;
+	if (nf_bridge->mask & BRNF_BRIDGED_DNAT) {
+		if (nf_bridge->mask & BRNF_PKT_TYPE) {
+			skb->pkt_type = PACKET_OTHERHOST;
+			nf_bridge->mask ^= BRNF_PKT_TYPE;
+		}
+		if (skb->protocol == htons(ETH_P_8021Q)) {
+			skb_push(skb, VLAN_HLEN);
+			skb->network_header -= VLAN_HLEN;
+		}
 
-	if (nf_bridge->mask & BRNF_PKT_TYPE) {
-		skb->pkt_type = PACKET_OTHERHOST;
-		nf_bridge->mask ^= BRNF_PKT_TYPE;
+		NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
+			skb->dev, br_forward_finish);
+		goto out;
 	}
-	nf_bridge_push_encap_header(skb);
+	realoutdev = bridge_parent(skb->dev);
+	if (!realoutdev)
+		return NF_DROP;
+
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
+	/* iptables should match -o br0.x */
+	if (nf_bridge->netoutdev)
+		realoutdev = nf_bridge->netoutdev;
+#endif
+	if (skb->protocol == htons(ETH_P_8021Q)) {
+		skb_pull(skb, VLAN_HLEN);
+		skb->network_header += VLAN_HLEN;
+	}
+	/* IP forwarded traffic has a physindev, locally
+	 * generated traffic hasn't. */
+	if (realindev != NULL) {
+		if (!(nf_bridge->mask & BRNF_DONT_TAKE_PARENT)) {
+			struct net_device *parent = bridge_parent(realindev);
+			if (parent)
+				realindev = parent;
+		}
 
-	NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev, skb->dev,
-		br_forward_finish);
+		NF_HOOK_THRESH(pf, NF_IP_FORWARD, skb, realindev,
+			       realoutdev, br_nf_local_out_finish,
+			       NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD + 1);
+	} else {
+		NF_HOOK_THRESH(pf, NF_IP_LOCAL_OUT, skb, realindev,
+			       realoutdev, br_nf_local_out_finish,
+			       NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT + 1);
+	}
+
+out:
 	return NF_STOLEN;
 }
 
@@ -841,6 +907,69 @@
 	return NF_ACCEPT;
 }
 
+/* Postpone execution of PF_INET(6)/FORWARD, PF_INET(6)/LOCAL_OUT
+ * and PF_INET(6)/POST_ROUTING until we have done the forwarding
+ * decision in the bridge code and have determined nf_bridge->physoutdev. */
+static unsigned int ip_sabotage_out(unsigned int hook, struct sk_buff **pskb,
+				    const struct net_device *in,
+				    const struct net_device *out,
+				    int (*okfn)(struct sk_buff *))
+{
+	struct sk_buff *skb = *pskb;
+
+	if ((out->hard_start_xmit == br_dev_xmit &&
+	     okfn != br_nf_forward_finish &&
+	     okfn != br_nf_local_out_finish && okfn != br_nf_dev_queue_xmit)
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
+	    || ((out->priv_flags & IFF_802_1Q_VLAN) &&
+		VLAN_DEV_INFO(out)->real_dev->hard_start_xmit == br_dev_xmit)
+#endif
+	    ) {
+		struct nf_bridge_info *nf_bridge;
+
+		if (!skb->nf_bridge) {
+#ifdef CONFIG_SYSCTL
+			/* This code is executed while in the IP(v6) stack,
+			   the version should be 4 or 6. We can't use
+			   skb->protocol because that isn't set on
+			   PF_INET(6)/LOCAL_OUT. */
+			struct iphdr *ip = ip_hdr(skb);
+
+			if (ip->version == 4 && !brnf_call_iptables)
+				return NF_ACCEPT;
+			else if (ip->version == 6 && !brnf_call_ip6tables)
+				return NF_ACCEPT;
+			else if (!brnf_deferred_hooks)
+				return NF_ACCEPT;
+#endif
+			if (hook == NF_IP_POST_ROUTING)
+				return NF_ACCEPT;
+			if (!nf_bridge_alloc(skb))
+				return NF_DROP;
+		}
+
+		nf_bridge = skb->nf_bridge;
+
+		/* This frame will arrive on PF_BRIDGE/LOCAL_OUT and we
+		 * will need the indev then. For a brouter, the real indev
+		 * can be a bridge port, so we make sure br_nf_local_out()
+		 * doesn't use the bridge parent of the indev by using
+		 * the BRNF_DONT_TAKE_PARENT mask. */
+		if (hook == NF_IP_FORWARD && nf_bridge->physindev == NULL) {
+			nf_bridge->mask |= BRNF_DONT_TAKE_PARENT;
+			nf_bridge->physindev = (struct net_device *)in;
+		}
+#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
+		/* the iptables outdev is br0.x, not br0 */
+		if (out->priv_flags & IFF_802_1Q_VLAN)
+			nf_bridge->netoutdev = (struct net_device *)out;
+#endif
+		return NF_STOP;
+	}
+
+	return NF_ACCEPT;
+}
+
 /* For br_nf_local_out we need (prio = NF_BR_PRI_FIRST), to insure that innocent
  * PF_BRIDGE/NF_BR_LOCAL_OUT functions don't get bridged traffic as input.
  * For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
@@ -886,6 +1013,36 @@
 	  .pf = PF_INET6,
 	  .hooknum = NF_IP6_PRE_ROUTING,
 	  .priority = NF_IP6_PRI_FIRST, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET,
+	  .hooknum = NF_IP_FORWARD,
+	  .priority = NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET6,
+	  .hooknum = NF_IP6_FORWARD,
+	  .priority = NF_IP6_PRI_BRIDGE_SABOTAGE_FORWARD, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET,
+	  .hooknum = NF_IP_LOCAL_OUT,
+	  .priority = NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET6,
+	  .hooknum = NF_IP6_LOCAL_OUT,
+	  .priority = NF_IP6_PRI_BRIDGE_SABOTAGE_LOCAL_OUT, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET,
+	  .hooknum = NF_IP_POST_ROUTING,
+	  .priority = NF_IP_PRI_FIRST, },
+	{ .hook = ip_sabotage_out,
+	  .owner = THIS_MODULE,
+	  .pf = PF_INET6,
+	  .hooknum = NF_IP6_POST_ROUTING,
+	  .priority = NF_IP6_PRI_FIRST, },
 };
 
 #ifdef CONFIG_SYSCTL
Index: net/netfilter/xt_physdev.c
===================================================================
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -110,16 +110,20 @@
 	if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
 	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
 		return false;
-	if (info->bitmask & XT_PHYSDEV_OP_OUT &&
+	if (brnf_deferred_hooks == 0 &&
+	    info->bitmask & XT_PHYSDEV_OP_OUT &&
 	    (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
 	     info->invert & XT_PHYSDEV_OP_BRIDGED) &&
 	    hook_mask & ((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD) |
 			 (1 << NF_IP_POST_ROUTING))) {
 		printk(KERN_WARNING "physdev match: using --physdev-out in the "
 		       "OUTPUT, FORWARD and POSTROUTING chains for non-bridged "
-		       "traffic is not supported anymore.\n");
-		if (hook_mask & (1 << NF_IP_LOCAL_OUT))
-			return false;
+		       "traffic is deprecated and breaks other things, it will "
+		       "be removed in January 2007. See Documentation/"
+		       "feature-removal-schedule.txt for details. This doesn't "
+		       "affect you in case you're using it for purely bridged "
+		       "traffic.\n");
+		brnf_deferred_hooks = 1;
 	}
 	return true;
 }
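
The condition that triggers the warning above can be tried out in
userspace. The following is only a minimal sketch, not the kernel code:
the constant values are assumptions copied from memory of the 2.6-era
headers, and warns_physdev_out() is a hypothetical name. It returns 1
when --physdev-out is used for traffic not restricted to bridged
packets in OUTPUT, FORWARD or POSTROUTING.

```c
#include <assert.h>

/* Assumed values, mirroring XT_PHYSDEV_OP_* and NF_IP_* of that era. */
enum { OP_IN = 0x01, OP_OUT = 0x02, OP_BRIDGED = 0x04 };
enum { HOOK_PRE_ROUTING = 0, HOOK_LOCAL_IN = 1, HOOK_FORWARD = 2,
       HOOK_LOCAL_OUT = 3, HOOK_POST_ROUTING = 4 };

/*
 * Hypothetical helper: same boolean test as the checkentry hunk,
 * without the brnf_deferred_hooks short-circuit the patch adds.
 */
static int warns_physdev_out(unsigned int bitmask, unsigned int invert,
                             unsigned int hook_mask)
{
    unsigned int out_hooks = (1u << HOOK_LOCAL_OUT) |
                             (1u << HOOK_FORWARD) |
                             (1u << HOOK_POST_ROUTING);

    return (bitmask & OP_OUT) &&
           (!(bitmask & OP_BRIDGED) || (invert & OP_BRIDGED)) &&
           (hook_mask & out_hooks);
}
```

For example, warns_physdev_out(OP_OUT, 0, 1u << HOOK_FORWARD) is true,
while adding OP_BRIDGED to the bitmask (without inverting it) makes the
test false.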

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2007-12-26  9:54                     ` Damien Thébault
@ 2007-12-30 17:53                       ` Patrick McHardy
       [not found]                         ` <9a4a382a0801020118n4166e505l5eb84a9f07f620be@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2007-12-30 17:53 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> On Dec 22, 2007 8:56 AM, Patrick McHardy <kaber@trash.net> wrote:
>> Yes, the captures show the effects from the double POSTROUTING
>> invocation. Could you send me captures from the current net-2.6
>> tree?
>>     
>
> Sure, here they are.
> (I used David Miller's net-2.6.25 at 75fa3253609430f28da005da494ce5ad3b5c78a1 )
>   

Thanks. They still show the double POST_ROUTING effects (the retransmitted
\0a), but I can't figure out why this would be happening. Please add TRACE
rules in both directions for the FTP control traffic and post the output.
This will allow us to verify that we're indeed dealing with double hook
invocations and not some other bug:

modprobe ipt_LOG
iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE

Thanks.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
       [not found]                         ` <9a4a382a0801020118n4166e505l5eb84a9f07f620be@mail.gmail.com>
@ 2008-01-11  8:10                           ` Damien Thébault
  2008-01-11 12:24                             ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2008-01-11  8:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

2008/1/2 Damien Thébault <damien.thebault@gmail.com>:
> On Dec 30, 2007 6:53 PM, Patrick McHardy <kaber@trash.net> wrote:
> >
> > Thanks. They still show the double POST_ROUTING effects (the retransmitted
> > \0a), but I can't figure out why this would be happening. Please add TRACE
> > rules in both directions for the FTP control traffic and post the output.
> > This will allow to verify that we're indeed dealing with double hook
> > invocations and not some other bug:
> >
> > modprobe ipt_LOG
> > iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
> > iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
> > iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
> > iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE
> >
> > Thanks.
> >
>
> I captured those files with "tail -n 0 -f /var/log/messages". The
> first setup (trace1.log) is the "working" one.
>
> Regards.

I tried to use the patch I created earlier (the one adding the hooks
again). I said it worked, but it does not work every time.

By the way, Patrick, what do you think about this bug? Maybe I
shouldn't rely on bridges but it's a useful feature sometimes.

Regards.
-- 
Damien Thebault
-
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2008-01-11  8:10                           ` Damien Thébault
@ 2008-01-11 12:24                             ` Patrick McHardy
  2008-01-11 12:53                               ` Damien Thébault
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2008-01-11 12:24 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> 2008/1/2 Damien Thébault <damien.thebault@gmail.com>:
>   
>> On Dec 30, 2007 6:53 PM, Patrick McHardy <kaber@trash.net> wrote:
>>     
>>> Thanks. They still show the double POST_ROUTING effects (the retransmitted
>>> \0a), but I can't figure out why this would be happening. Please add TRACE
>>> rules in both directions for the FTP control traffic and post the output.
>>> This will allow to verify that we're indeed dealing with double hook
>>> invocations and not some other bug:
>>>
>>> modprobe ipt_LOG
>>> iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
>>> iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
>>> iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
>>> iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE
>>>       
> I tried to use the patch I created earlier (the one adding the hooks
> again). I said it worked but it does not everytime.
>
> By the way, Patrick, what do you think about this bug? Maybe I
> shouldn't rely on bridges but it's a useful feature sometimes.
>   

No, this should work properly. I just tried to reproduce it,
but I only get a single POSTROUTING invocation. I tried with
real bridged traffic, traffic routed between two different
bridge devices and traffic routed between a bridge device
and a normal ethernet device, but everything seems to work
correctly.

Could you send me the commands you're using to configure
your setup and everything (routing, iptables, ...) that
could be related?





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2008-01-11 12:24                             ` Patrick McHardy
@ 2008-01-11 12:53                               ` Damien Thébault
  2008-01-11 12:57                                 ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2008-01-11 12:53 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

On Jan 11, 2008 1:24 PM, Patrick McHardy <kaber@trash.net> wrote:
> Damien Thébault wrote:
> >
> > By the way, Patrick, what do you think about this bug? Maybe I
> > shouldn't rely on bridges but it's a useful feature sometimes.
> >
>
> No, this should work properly. I just tried to reproduce it,
> but I only get a single POSTROUTING invocation. I tried with
> real bridged traffic, traffic routed between two different
> bridge devices and traffic routed between a bridge device
> and a normal ethernet device, but everything seems to work
> correctly.
>
> Could you send me the commands you're using to configure
> your setup and everything (routing, iptables, ...) that
> could be related?
>

On the router, I'm using this script :

ifconfig eth0 0.0.0.0 up
brctl addbr br0
brctl addif br0 eth0
ifconfig br0 192.168.1.70 up
ifconfig br0:0 192.168.2.70 up
iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT --to-destination 192.168.2.50
modprobe nf_nat_ftp
echo 1 > /proc/sys/net/ipv4/ip_forward

And for logging :

modprobe ipt_LOG
iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE

I only have one interface (eth0), that's why I use br0 and br0:0, so
the wireshark captures show each packet twice, input on br0 and output
on br0:0 (or input on br0:0 and output on br0) when capturing on eth0.

On the ftp client/server :

ifconfig eth2 192.168.1.50
ifconfig eth2:0 192.168.2.50
ip route del 192.168.2.0/24
ip route add 192.168.2.0/24 dev eth2 via 192.168.1.70

And then I try to connect to 192.168.2.250; this will use the router
192.168.1.70 on eth2, will be DNATted to 192.168.2.50 and will come
back on eth2:0 on the ftp server.

Like the router captures, we have eth2 and eth2:0 together when
capturing on eth2.

This configuration works fine, but if I run either of these commands
on the router, it no longer works properly:

ifconfig br0:0 192.168.2.7 up

or

ifconfig br0:0 192.168.2.170 up

I don't think I'm using anything else.
-- 
Damien Thebault

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2008-01-11 12:53                               ` Damien Thébault
@ 2008-01-11 12:57                                 ` Patrick McHardy
  2008-01-11 13:25                                   ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2008-01-11 12:57 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
> On Jan 11, 2008 1:24 PM, Patrick McHardy <kaber@trash.net> wrote:
>   
>> No, this should work properly. I just tried to reproduce it,
>> but I only get a single POSTROUTING invocation. I tried with
>> real bridged traffic, traffic routed between two different
>> bridge devices and traffic routed between a bridge device
>> and a normal ethernet device, but everything seems to work
>> correctly.
>>
>> Could you send me the commands you're using to configure
>> your setup and everything (routing, iptables, ...) that
>> could be related?
>>
>>     
>
> On the router, I'm using this script :
>
> ifconfig eth0 0.0.0.0 up
> brctl addbr br0
> brctl addif br0 eth0
> ifconfig br0 192.168.1.70 up
> ifconfig br0:0 192.168.2.70 up
> iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
> iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT
> --to-destination 192.168.2.50
> modprobe nf_nat_ftp
> echo 1 > /proc/sys/net/ipv4/ip_forward
>
> And for logging :
>
> modprobe ipt_LOG
> iptables -t raw -A OUTPUT -p tcp --dport 21 -j TRACE
> iptables -t raw -A OUTPUT -p tcp --sport 21 -j TRACE
> iptables -t raw -A PREROUTING -p tcp --dport 21 -j TRACE
> iptables -t raw -A PREROUTING -p tcp --sport 21 -j TRACE
>
> I only have one interface (eth0), that's why I use br0 and br0:0, so
> the wireshark captures show each packet twice, input on br0 and output
> on br0:0 (or input on br0:0 and output on br0) when capturing on eth0.
>
> On the ftp client/server :
>
> ifconfig eth2 192.168.1.50
> ifconfig eth2:0 192.168.2.50
> ip route del 192.168.2.0/24
> ip route add 192.168.2.0/24 dev eth2 via 192.168.1.70
>
> And then I try to connect to 192.168.2.250, this will use the router
> 192.168.1.70 on eth2, wille be DNATted to 192.168.2.50 and will come
> back on eth2:0 on the ftp server.
>
> Like the router captures, we have eth2 and eth2:0 together when
> capturing on eth2.
>
> This configuration will work fine, but if I run any of this on the
> router, it will not work well anymore :
>
> ifconfig br0:0 192.168.2.7 up
>
> or
>
> ifconfig br0:0 192.168.2.170 up
>
> I don't think I'm using anything else.
>   

Thanks. It's the DNAT rule that's causing this: the bridge netfilter
code calls dst_output directly for bridged DNATed frames, causing these
hook invocations:

                PREROUTING
dst_output()    POSTROUTING
                FORWARD
                POSTROUTING


which is obviously broken. I'll see if I can come up with a fix for this.
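
A hook sequence like the one above can be checked mechanically once
the hook names for a single packet have been extracted from the TRACE
log. The following is a minimal userspace sketch in C under that
assumption; count_hook() and has_double_postrouting() are hypothetical
helper names, not kernel or iptables interfaces, and the log parsing
itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how often `hook` occurs in the ordered list of hooks that one
 * packet traversed. */
static size_t count_hook(const char *const *hooks, size_t n,
                         const char *hook)
{
    size_t hits = 0;

    assert(hooks != NULL && hook != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(hooks[i], hook) == 0)
            hits++;
    return hits;
}

/* Assumption: a packet forwarded once should see POSTROUTING exactly
 * once, so more than one occurrence is flagged. */
static int has_double_postrouting(const char *const *hooks, size_t n)
{
    return count_hook(hooks, n, "POSTROUTING") > 1;
}
```

Fed with { "PREROUTING", "POSTROUTING", "FORWARD", "POSTROUTING" },
has_double_postrouting() reports the double invocation; a plain
PREROUTING, FORWARD, POSTROUTING sequence does not trigger it.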

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2008-01-11 12:57                                 ` Patrick McHardy
@ 2008-01-11 13:25                                   ` Patrick McHardy
  2008-01-11 15:16                                     ` Damien Thébault
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2008-01-11 13:25 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]

Patrick McHardy wrote:
> Damien Thébault wrote:
>> On the router, I'm using this script :
>>
>> ifconfig eth0 0.0.0.0 up
>> brctl addbr br0
>> brctl addif br0 eth0
>> ifconfig br0 192.168.1.70 up
>> ifconfig br0:0 192.168.2.70 up
>> iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
>> iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT
>> --to-destination 192.168.2.50 

>
> Thanks. Its the DNAT rule thats causing this, the bridge netfilter code
> calls dst_output directly for bridged dnated frames, causing these
> hook invocations:
>
>                PREROUTING
> dst_output()    POSTROUTING
>                FORWARD
>                POSTROUTING
>
>
> which is obviously broken. I'll see if I can come up with a fix for this.

It appears this has always been broken. Could you test this patch please?

The bridge code only calls dst_output to get a new destination MAC
address for the DNATed packet when the new destination is reachable
on the same bridge, so this patch simply hands the packet to the
neighbour output function without going through the IP stack.



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 672 bytes --]

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index c1757c7..362fe89 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -285,12 +285,17 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 	skb->nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 
 	skb->dev = bridge_parent(skb->dev);
-	if (!skb->dev)
-		kfree_skb(skb);
-	else {
+	if (skb->dev) {
+		struct dst_entry *dst = skb->dst;
+
 		nf_bridge_pull_encap_header(skb);
-		skb->dst->output(skb);
+
+		if (dst->hh)
+			return neigh_hh_output(dst->hh, skb);
+		else if (dst->neighbour)
+			return dst->neighbour->output(skb);
 	}
+	kfree_skb(skb);
 	return 0;
 }
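
The control flow of the patched function can be summarised outside the
kernel. This is only an illustrative userspace model, assuming the
three outcomes visible in the diff (cached hardware header, neighbour
output function, or freeing the skb); struct model_dst and
choose_xmit_path() are hypothetical names and no kernel structures are
used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two dst_entry fields the patch tests. */
struct model_dst {
    int has_hh;         /* models dst->hh != NULL */
    int has_neighbour;  /* models dst->neighbour != NULL */
};

enum xmit_path {
    XMIT_CACHED_HEADER, /* models neigh_hh_output(dst->hh, skb) */
    XMIT_NEIGH_OUTPUT,  /* models dst->neighbour->output(skb) */
    XMIT_DROP           /* models kfree_skb(skb) */
};

/* Mirrors the order of checks in the patched
 * br_nf_pre_routing_finish_bridge(): no bridge parent device means
 * drop; otherwise prefer the cached header, then the neighbour output
 * function, and drop if neither exists. */
static enum xmit_path choose_xmit_path(int have_parent_dev,
                                       const struct model_dst *dst)
{
    assert(dst != NULL);
    if (!have_parent_dev)
        return XMIT_DROP;
    if (dst->has_hh)
        return XMIT_CACHED_HEADER;
    if (dst->has_neighbour)
        return XMIT_NEIGH_OUTPUT;
    return XMIT_DROP;
}
```

Note that none of the three paths re-enters the IP output path, which
is why the extra POSTROUTING invocation no longer occurs.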
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2008-01-11 13:25                                   ` Patrick McHardy
@ 2008-01-11 15:16                                     ` Damien Thébault
  2008-01-11 17:33                                       ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Thébault @ 2008-01-11 15:16 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: linux-net, netfilter-devel, David S. Miller

On Jan 11, 2008 2:25 PM, Patrick McHardy <kaber@trash.net> wrote:
> Patrick McHardy wrote:
> > Damien Thébault wrote:
> >> On the router, I'm using this script :
> >>
> >> ifconfig eth0 0.0.0.0 up
> >> brctl addbr br0
> >> brctl addif br0 eth0
> >> ifconfig br0 192.168.1.70 up
> >> ifconfig br0:0 192.168.2.70 up
> >> iptables -t nat -A POSTROUTING -d 192.168.2.0/24 -j MASQUERADE
> >> iptables -t nat -A PREROUTING -d 192.168.2.250 -j DNAT
> >> --to-destination 192.168.2.50
>
> >
> > Thanks. Its the DNAT rule thats causing this, the bridge netfilter code
> > calls dst_output directly for bridged dnated frames, causing these
> > hook invocations:
> >
> >                PREROUTING
> > dst_output()    POSTROUTING
> >                FORWARD
> >                POSTROUTING
> >
> >
> > which is obviously broken. I'll see if I can come up with a fix for this.
>
> It appears this has always been broken. Could you test this patch please?
>
> The bridge code only calls dst_output to get a new destination MAC
> address for the DNATed packet when the new destination is reachable
> on the same bridge, so this patch simply hands the packet to the
> neighbour output function without going through the IP stack.
>
>
>
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index c1757c7..362fe89 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -285,12 +285,17 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
>         skb->nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
>
>         skb->dev = bridge_parent(skb->dev);
> -       if (!skb->dev)
> -               kfree_skb(skb);
> -       else {
> +       if (skb->dev) {
> +               struct dst_entry *dst = skb->dst;
> +
>                 nf_bridge_pull_encap_header(skb);
> -               skb->dst->output(skb);
> +
> +               if (dst->hh)
> +                       return neigh_hh_output(dst->hh, skb);
> +               else if (dst->neighbour)
> +                       return dst->neighbour->output(skb);
>         }
> +       kfree_skb(skb);
>         return 0;
>  }
>
>
>

I confirm that this patch solves the problem with this setup, thanks!

Does this mean that without this patch, DNAT doesn't work (correctly)
on a bridge?

-- 
Damien Thebault

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: conntrack doesn't always work when a bridge is used
  2008-01-11 15:16                                     ` Damien Thébault
@ 2008-01-11 17:33                                       ` Patrick McHardy
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2008-01-11 17:33 UTC (permalink / raw)
  To: Damien Thébault; +Cc: linux-net, netfilter-devel, David S. Miller

Damien Thébault wrote:
>> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
>> index c1757c7..362fe89 100644
>> --- a/net/bridge/br_netfilter.c
>> +++ b/net/bridge/br_netfilter.c
>> @@ -285,12 +285,17 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
>>         skb->nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
>>
>>         skb->dev = bridge_parent(skb->dev);
>> -       if (!skb->dev)
>> -               kfree_skb(skb);
>> -       else {
>> +       if (skb->dev) {
>> +               struct dst_entry *dst = skb->dst;
>> +
>>                 nf_bridge_pull_encap_header(skb);
>> -               skb->dst->output(skb);
>> +
>> +               if (dst->hh)
>> +                       return neigh_hh_output(dst->hh, skb);
>> +               else if (dst->neighbour)
>> +                       return dst->neighbour->output(skb);
>>         }
>> +       kfree_skb(skb);
>>         return 0;
>>  }
>>
>>
>>
> 
> I confirm that this patch solves the problem with this setup, thanks!

Thanks a lot for testing and providing all the data.

> Does this mean that without this patch, DNAT doesn't work (correctly)
> on a bridge?

DNAT itself works, but the incorrect POSTROUTING hook invocation
can break other things like packet mangling by NAT helpers.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-01-11 17:33 UTC | newest]

Thread overview: 21+ messages
     [not found] <9a4a382a0712180648i7fc958edt6f0d9db83f574c77@mail.gmail.com>
2007-12-19 17:00 ` conntrack doesn't always work when a bridge is used Damien Thébault
2007-12-19 19:03   ` Patrick McHardy
2007-12-20  8:30     ` Damien Thébault
2007-12-20 10:06       ` Patrick McHardy
2007-12-20 11:06         ` Damien Thébault
2007-12-20 11:07           ` Patrick McHardy
2007-12-20 11:20             ` Damien Thébault
2007-12-20 11:25               ` Patrick McHardy
2007-12-20 13:21                 ` Damien Thébault
2007-12-20 16:08                   ` Damien Thébault
2007-12-22  7:56                   ` Patrick McHardy
2007-12-26  9:54                     ` Damien Thébault
2007-12-30 17:53                       ` Patrick McHardy
     [not found]                         ` <9a4a382a0801020118n4166e505l5eb84a9f07f620be@mail.gmail.com>
2008-01-11  8:10                           ` Damien Thébault
2008-01-11 12:24                             ` Patrick McHardy
2008-01-11 12:53                               ` Damien Thébault
2008-01-11 12:57                                 ` Patrick McHardy
2008-01-11 13:25                                   ` Patrick McHardy
2008-01-11 15:16                                     ` Damien Thébault
2008-01-11 17:33                                       ` Patrick McHardy
2007-12-28 14:39   ` Damien Thébault
