* nftables: masquerade sets wrong source address
@ 2016-12-13 13:28 Tom Hacohen
2016-12-13 14:32 ` /dev/rob0
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Tom Hacohen @ 2016-12-13 13:28 UTC (permalink / raw)
To: netfilter
Hi,
I've recently migrated from iptables (no modules loaded anymore) to
nftables and came across a weird situation that looks like a bug to
me.
When using "masquerade" it always sets the ip address to that of one
of my interfaces, and not per interface as one would expect.
My config:
flush ruleset

table inet filter {
        chain input {
                type filter hook input priority 0; policy accept;

                iifname lo log accept
        }
        chain output {
                type filter hook output priority 0; policy accept;
        }
}

table ip nat {
        chain postrouting {
                type nat hook postrouting priority 100;
                masquerade
        }
}
With this, connections to localhost fail because the masquerade line
sets the source IP to that of the wlp1s0 interface, and not of the lo
interface.
Here is output from the log:
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.18 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=64500 DF PROTO=TCP SPT=36844 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
You can see how the source ip is wrong. This is from running "curl"
trying to connect to a local http server on port 8000.
Removing the masquerade line, or changing it to: "oifname wlp1s0
masquerade" fixes it, but this is just a workaround that will fail in
more complex situations.
I would have loved to provide you with tracing information, but
unfortunately I never got that to work for me.
Tried with kernels: 4.8.12 and 4.4.35 on arch linux. Nft version is 0.6.
Please let me know if there's any other info you'd like me to provide you with.
Thanks,
Tom.
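
For reference, the narrower rule mentioned above as a workaround could be written as the following nat table. This is only a sketch: wlp1s0 is the outgoing interface named in this report, and the rest of the ruleset is assumed to be unchanged.

```
table ip nat {
        chain postrouting {
                type nat hook postrouting priority 100;
                # Masquerade only traffic leaving via wlp1s0 (interface name
                # taken from the report), so packets sent out through lo are
                # not matched by this rule.
                oifname "wlp1s0" masquerade
        }
}
```

With the oifname match in place, loopback connections no longer reach the masquerade statement, which matches the report that this variant makes the local curl test work again.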

* Re: nftables: masquerade sets wrong source address
From: /dev/rob0 @ 2016-12-13 14:32 UTC
To: netfilter

On Tue, Dec 13, 2016 at 01:28:41PM +0000, Tom Hacohen wrote:
> Removing the masquerade line, or changing it to: "oifname wlp1s0
> masquerade" fixes it, but this is just a workaround that will fail
> in more complex situations.

ISTM you'd always want to limit a MASQ/SNAT rule by outgoing
interface. I don't get why that was "ugly" (as you said in IRC) or
likely to fail ... well, certainly if using that ruleset where the
default gateway was on some other interface, but so what? Adjust
your rule to suit the situation.

In more complex situations, such as multiple Internet connections
with policy routing, masquerade is not appropriate. You'd have to
use SNAT.

If you're doing NAT among RFC 1918 networks, YDIW. Fix the routing.
--
http://rob0.nodns4.us/
Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:

* Re: nftables: masquerade sets wrong source address
From: Tom Hacohen @ 2016-12-13 14:53 UTC
To: netfilter

On Tue, Dec 13, 2016 at 2:32 PM, /dev/rob0 <rob0@gmx.co.uk> wrote:
> On Tue, Dec 13, 2016 at 01:28:41PM +0000, Tom Hacohen wrote:
>> Removing the masquerade line, or changing it to: "oifname wlp1s0
>> masquerade" fixes it, but this is just a workaround that will fail
>> in more complex situations.
>
> ISTM you'd always want to limit a MASQ/SNAT rule by outgoing
> interface. I don't get why that was "ugly" (as you said in IRC) or
> likely to fail ... well, certainly if using that ruleset where the
> default gateway was on some other interface, but so what? Adjust
> your rule to suit the situation.
>
> In more complex situations, such as multiple Internet connections
> with policy routing, masquerade is not appropriate. You'd have to
> use SNAT.
>
> If you're doing NAT among RFC 1918 networks, YDIW. Fix the routing.

Perhaps "ugly" wasn't the best choice of words, also, as you said,
maybe more complex situations will require other more complex
configurations so this simple case wouldn't matter anyway.

However, let's leave that aside for a moment and consider the test
case I provided in my original email. If masquerade is turned on for
"lo" it will set the wrong address. Is that not an issue? Or at least
an indication something else may be broken there? At the very least,
I found this behaviour surprising.

--
Tom

* Re: nftables: masquerade sets wrong source address
From: Pablo Neira Ayuso @ 2016-12-14 22:28 UTC
To: Tom Hacohen
Cc: netfilter

Hi Tom,

On Tue, Dec 13, 2016 at 01:28:41PM +0000, Tom Hacohen wrote:
> [...]
> Tried with kernels: 4.8.12 and 4.4.35 on arch linux. Nft version is 0.6.
>
> Please let me know if there's any other info you'd like me to provide you with.

I don't remember if this behaviour has been always the case. Would you
please check what has been the behaviour in old kernels?

nftables shares this masquerade code with iptables, so you can test
this with iptables in older kernels.

Thanks.

* Re: nftables: masquerade sets wrong source address
From: Tom Hacohen @ 2016-12-15 11:34 UTC
To: Pablo Neira Ayuso
Cc: netfilter

On Wed, Dec 14, 2016 at 10:28 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Tom,
>
> [...]
>
> I don't remember if this behaviour has been always the case. Would you
> please check what has been the behaviour in old kernels?
>
> nftables shares this masquerade code with iptables, so you can test
> this with iptables in older kernels.

Hey,

Thanks for your reply.

I'm sorry, but I don't have access to older kernels. Furthermore,
this worked with iptables on the same kernel version using the same
rules as far as I can tell.

I therefore suspect (without knowing the code) that maybe nftables is trying
to masquerade all packets while iptables maybe has a noop when there was no
NAT applied, or if the address is already set correctly? That is the best
explanation that comes to mind given your assertion about them sharing the
masquerade code.

Thanks.

--
Tom.

* Re: nftables: masquerade sets wrong source address
From: Pablo Neira Ayuso @ 2016-12-15 21:29 UTC
To: Tom Hacohen
Cc: netfilter

Hi Tom,

On Thu, Dec 15, 2016 at 11:34:57AM +0000, Tom Hacohen wrote:
> Hey,
>
> Thanks for your reply.
>
> I'm sorry, but I don't have access to older kernels. Furthermore,
> this worked with iptables on the same kernel version using the same
> rules as far as I can tell.

Hm, this is working with the same kernel version in iptables? What
kernel version are you using?

> I therefore suspect (without knowing the code) that maybe nftables is trying
> to masquerade all packets while iptables maybe has a noop when there was no
> NAT applied, or if the address is already set correctly? That is the best
> explanation that comes to mind given your assertion about them sharing the
> masquerade code.

Both nft and iptables share the same codebase for NAT/masquerade, so
if this works with iptables, it should work with nft too in the same
way.

Please, confirm this, it would be good if we get to the core of the
problem. If the behaviour differs, or started to differ from some
kernel version on, then this is a bug.

Thanks.

* Re: nftables: masquerade sets wrong source address
From: Tom Hacohen @ 2016-12-15 22:47 UTC
To: Pablo Neira Ayuso
Cc: netfilter

On Thu, Dec 15, 2016 at 9:29 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hm, this is working with the same kernel version in iptables? What
> kernel version are you using?
>
> [...]
>
> Both nft and iptables share the same codebase for NAT/masquerade, so
> if this works with iptables, it should work with nft too in the same
> way.
>
> Please, confirm this, it would be good if we get to the core of the
> problem. If the behaviour differs, or started to differ from some
> kernel version on, then this is a bug.

Hi,

I can't be sure what I tested on before, because I had a setup that
used to work, and then I switched to nftables. I don't remember if I
updated the kernel since migrating, but even if I had, it was 4.4.x
for sure and probably 4.4.34/35. However, I can say for certain that
it used to work for years until very recently, whatever the reason may
have been.

FWIW, I'm now testing on 4.8.13 and can confirm it's broken with both.

I just tested with iptables, and the same masquerade happens for that,
so I can confirm that the behaviour is the same with this
configuration.

SRC=192.168.86.18 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=46247 DF PROTO=TCP SPT=46026 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
SRC=127.0.0.1 DST=127.0.0.1 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=59070
DF PROTO=TCP SPT=8000 DPT=46026 WINDOW=0 RES=0x00 ACK RST URGP=0

Here's the iptables config I used to get these results, which (to me)
seems identical to the original nftables config:
*nat
:PREROUTING ACCEPT [4:252]
:INPUT ACCEPT [4:252]
:OUTPUT ACCEPT [1:76]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -j MASQUERADE
COMMIT

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j LOG
-A INPUT -i lo -j ACCEPT
COMMIT

Other than the above, there's nothing interesting in my previous
"production" iptables config that used to work. Unfortunately I can't
abuse the production machine any longer by changing the firewall rules
or jumping back and forth between iptables and nftables.

Is there anyone with an older LTS kernel that can check this? If so,
don't forget to clear your modules, because the nftables nat modules
clash with the iptables ones.

Regression-hunting aside for a moment, it still looks like a bug to
me, even if that bug is shared with iptables. There is a simple
workaround, just don't masquerade for lo, but still, this looks like
something that should be fixed.

Please let me know if there's anything else I can do to help.

--
Tom

* Re: nftables: masquerade sets wrong source address
From: Tom Hacohen @ 2016-12-16 0:04 UTC
To: Pablo Neira Ayuso
Cc: netfilter

I'm very confused, just ran the same iptables rules on a freshly
booted different box running 4.4.38 and got:

IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.10 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=17313 DF PROTO=TCP SPT=34548 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=0 DF PROTO=TCP SPT=8000 DPT=34548 WINDOW=43690 RES=0x00 ACK SYN
URGP=0

On the other hand, again, after a fresh boot, this time with nftables,
I'm getting:

IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.10 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=21924 DF PROTO=TCP SPT=48948 DPT=8000 WINDOW=43690 RES=0x00 SYN
URGP=0
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=192.168.86.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64
ID=0 DF PROTO=TCP SPT=8000 DPT=48948 WINDOW=43690 RES=0x00 ACK SYN
URGP=0
IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=192.168.86.10 DST=127.0.0.1 LEN=40 TOS=0x00 PREC=0x00 TTL=64
ID=58958 DF PROTO=TCP SPT=48948 DPT=8000 WINDOW=0 RES=0x00 RST
URGP=0

I have no idea what is going on. In all of the tests in this thread
I've been using "python3 -m http.server" as the test http server and
curl as the client.

Can you make sense of why it replies to 127.0.0.1 with iptables, and
to 192.168.86.10 (my lan ip) with nftables? Also, why doesn't it
masquerade the packets after the first one in the iptables case?

I also made two additional tests: I started my box and ran nftables,
it failed as expected, then I ran iptables, it worked, and then I ran
nftables, and it *worked*. Maybe when iptables initialises
masquerading it does it differently from nftables?

Is there anything else you'd like me to test?

On Thu, Dec 15, 2016 at 10:47 PM, Tom Hacohen <tom@stosb.com> wrote:
> [...]

* Re: nftables: masquerade sets wrong source address
From: Liping Zhang @ 2016-12-17 14:18 UTC
To: Tom Hacohen, Pablo Neira Ayuso
Cc: netfilter, Netfilter Developer Mailing List

Hi Tom,

2016-12-13 21:28 GMT+08:00 Tom Hacohen <tom@stosb.com>:
> [...]
> My config:
>
> flush ruleset
>
> table inet filter {
>         chain input {
>                 type filter hook input priority 0; policy accept;
>
>                 iifname lo log accept
>         }
>         chain output {
>                 type filter hook output priority 0; policy accept;
>         }
> }
>
> table ip nat {
>         chain postrouting {
>                 type nat hook postrouting priority 100;
>                 masquerade
>         }
> }

According to the explanations in the nftables wiki:
https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)

You should add the following nft rules (I agree this is tricky and
unfriendly for the end user):
# nft add chain nat prerouting { type nat hook prerouting priority 0 \; }

But unfortunately, even if you add the above rule, you will still fail to
connect to a local server.

Now add another nft rule listed below, and you can probably make
everything work fine:
# nft add chain nat output { type nat hook output priority 0 \; }

[ cc netfilter-dev group ]

For a loopback connection, the request packets will traverse:
  OUTPUT->POSTROUTING->PREROUTING->INPUT
and the source ip will be modified in the nat POSTROUTING hook.

Meanwhile the reply packets will also traverse:
  OUTPUT->POSTROUTING->PREROUTING->INPUT
and if the nat OUTPUT hook exists, the destination ip will be modified
in it, and re-route will happen. Otherwise, the destination ip will
be modified at the nat PREROUTING hook, and the dst entry will
be dropped. In such a situation (i.e. nat OUTPUT doesn't exist),
we will try to do a routing lookup and packets will be dropped
at ip_route_input_slow->martian_destination.

Furthermore, if ipt_rpfilter is configured, the reply packet may be
dropped there.

In iptables, the nat output chain always exists, so there's no
such problem.

But I think that forcing the user to add a nat output chain
in nftables is not a good idea, so probably we need the following
patch (I only list the ipv4 part):

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index f8aad03..5bc9b22 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -344,8 +344,21 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
 
         ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
         if (ret != NF_DROP && ret != NF_STOLEN &&
-            daddr != ip_hdr(skb)->daddr)
-                skb_dst_drop(skb);
+            daddr != ip_hdr(skb)->daddr) {
+                const struct rtable *rt = skb_rtable(skb);
+                int err;
+
+                if (rt) {
+                        if (rt->rt_flags & RTCF_LOCAL) {
+                                err = ip_route_me_harder(state->net, skb,
+                                                         RTN_UNSPEC);
+                                if (err < 0)
+                                        ret = NF_DROP_ERR(err);
+                        } else {
+                                skb_dst_drop(skb);
+                        }
+                }
+        }
 
         return ret;
 }

* Re: nftables: masquerade sets wrong source address
From: Liping Zhang @ 2016-12-19 2:25 UTC
To: Tom Hacohen, Pablo Neira Ayuso
Cc: netfilter, Netfilter Developer Mailing List

2016-12-17 22:18 GMT+08:00 Liping Zhang <zlpnobody@gmail.com>:
> [...]
> But I think that enforcing the user to add a nat output chain
> in nftables is not a good idea, so probably we need a following
> patch(I only list the ipv4 part):
>
> [...]

Please ignore the above patch, it's incorrect that we use
ip_route_output_key for the incoming packets.

Maybe the below one will be better, but I'm not sure whether this will
break some special use cases or not:

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index f8aad03..d358670 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -344,8 +344,13 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
 
         ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
         if (ret != NF_DROP && ret != NF_STOLEN &&
-            daddr != ip_hdr(skb)->daddr)
+            daddr != ip_hdr(skb)->daddr) {
+                if (state->in->flags & IFF_LOOPBACK ||
+                    skb->pkt_type == PACKET_LOOPBACK)
+                        return ret;
+
                 skb_dst_drop(skb);
+        }

* Re: nftables: masquerade sets wrong source address
From: Tom Hacohen @ 2016-12-20 15:16 UTC
To: Liping Zhang
Cc: Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

On Sat, Dec 17, 2016 at 2:18 PM, Liping Zhang <zlpnobody@gmail.com> wrote:
> According to the explanations in the nftables wiki:
> https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
>
> You should add the following nft rules(I agree this is tricky and
> unfriendly for the end user):
> # nft add chain nat prerouting { type nat hook prerouting priority 0 \; }
>
> But unfortunately, even if you add the above rule, you will still fail to
> connect to a local server.

Correct, doesn't change anything.

> Now add another nft rules listed below, you can probably make everything
> work fine:
> # nft add chain nat output { type nat hook output priority 0 \; }

Haven't tried it. Why would it change things? Have you tried it?

> [ cc netfilter-dev group ]
>
> For loopback connection, the request packets will traverse:
>   OUTPUT->POSTROUTING->PREROUTING->INPUT
> and the source ip will be modified in nat POSTROUTING hook.

The problem is that the IP incorrectly changes to the wrong one.

> Meanwhile the reply packets will also traverse:
>   OUTPUT->POSTROUTING->PREROUTING->INPUT
> and if nat OUTPUT hook exist, the destination ip will be modified
> in it, and re-route will happen. Otherwise, the destination ip will
> be modified at nat PREROUTING hook, and the dst entry will
> be dropped. In such situation(i.e. nat OUTPUT doesn't exist),
> we will try to do routing lookup and packets will be dropped
> at ip_route_input_slow->martian_destination.

I think that all throughout this thread we've been analysing the
behaviour in the broken scenario instead of just fixing it (which I
see your latest patch may actually do, and that's good). It doesn't
matter where the packet goes through after it's been wrongly
rewritten, the problem is that it has.

> Furthermore, if ipt_rpfilter is configured, the reply packet maybe
> dropped at there.

It's off.

> In iptables, nat output chain always exists, so there's no
> such problem.
>
> But I think that enforcing the user to add a nat output chain
> in nftables is not a good idea, so probably we need a following
> patch(I only list the ipv4 part):

Interesting. Maybe that's why it continued to work after iptables has
already been loaded on the box.
As said above though, I believe the problem is the masquerade setting
the wrong ip, and not (only?)
the fact that my setup happens to work with iptables but doesn't with nftables.

Don't you agree?

Thanks,
Tom.

> [...]
* Re: nftables: masquerade sets wrong source address
2016-12-20 15:16 ` Tom Hacohen
@ 2016-12-21 2:39 ` Liping Zhang
2016-12-22 10:26 ` Tom Hacohen
0 siblings, 1 reply; 15+ messages in thread
From: Liping Zhang @ 2016-12-21 2:39 UTC (permalink / raw)
To: Tom Hacohen
Cc: Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

2016-12-20 23:16 GMT+08:00 Tom Hacohen <tom@stosb.com>:
> On Sat, Dec 17, 2016 at 2:18 PM, Liping Zhang <zlpnobody@gmail.com> wrote:
>> According to the explanations in nftables wiki:
>> https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
>>
>> You should add the following nft rules(I agree this is tricky and
>> unfriendly for the end user):
>> # nft add chain nat prerouting { type nat hook prerouting priority 0 \; }
>>
>> But unfortunately, even if you add the above rule, you will still fail to
>> connect to a local server.
>>
>
> Correct, doesn't change anything.
>
>> Now add another nft rules listed below, you can probably make everything
>> work fine:
>> # nft add chain nat output { type nat hook output priority 0 \; }
>>
>
> Haven't tried it. Why would it change things? Have you tried it?

I tried it and it did take effect. But my test scenario may be
different from yours.
So can you try it?

[...]
>> In iptables, nat output chain always exists, so there's no
>> such problem.
>>
>> But I think that enforcing the user to add a nat output chain
>> in nftables is not a good idea, so probably we need a following
>> patch(I only list the ipv4 part):
>
> Interesting. Maybe that's why it continued to work after iptables has
> already been loaded on the box.
> As said above though, I believe the problem is the masquerade setting
> the wrong ip, and not (only?)
> the fact that my setup happens to work with iptables but doesn't with nftables.

As I analyzed, the main difference is that the nat OUTPUT hook always
exists in iptables, so the reply packet's destination ip address will be
modified in the OUTPUT hook. In nftables, without an nft output chain,
the reply packet's destination ip address will be modified in the
PREROUTING hook. Then we try to do a routing lookup, and the packets
will be dropped because the incoming packets' destination ip address is
127.0.0.1.

But I think that forcing the user to add the following nft rule is
not friendly:
# nft add chain nat output { type nat hook output priority 0 \; }

This will become more tricky. Do you agree with this?

So I sent the related patch to try to improve it.

>
> Don't you agree?
>
> Thanks,
> Tom.
^ permalink raw reply [flat|nested] 15+ messages in thread
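For reference, the workaround discussed in this message can also be written as a ruleset file rather than as individual commands. This is only a sketch: it assumes the "ip nat" table and the postrouting chain from the original report, and it combines them with the two chains added by the nft commands quoted above. The prerouting and output chains hold no rules; they exist only so that the corresponding NAT hooks are registered. It does not change which source address masquerade picks.

```nft
# Sketch only: the table and chain names follow the commands quoted in
# this thread; the masquerade rule is the unrestricted one from the
# original report.
table ip nat {
	chain prerouting {
		type nat hook prerouting priority 0;
	}
	chain output {
		# Empty chain; registers the NAT output hook so that the
		# reply's destination is rewritten before routing.
		type nat hook output priority 0;
	}
	chain postrouting {
		type nat hook postrouting priority 100;
		masquerade
	}
}
```

Assuming the fragment is saved to a file, it could be loaded with "nft -f" followed by the file name.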
* Re: nftables: masquerade sets wrong source address
2016-12-21 2:39 ` Liping Zhang
@ 2016-12-22 10:26 ` Tom Hacohen
2016-12-22 10:34 ` Florian Westphal
0 siblings, 1 reply; 15+ messages in thread
From: Tom Hacohen @ 2016-12-22 10:26 UTC (permalink / raw)
To: Liping Zhang
Cc: Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

On Wed, Dec 21, 2016 at 2:39 AM, Liping Zhang <zlpnobody@gmail.com> wrote:
> 2016-12-20 23:16 GMT+08:00 Tom Hacohen <tom@stosb.com>:
>> On Sat, Dec 17, 2016 at 2:18 PM, Liping Zhang <zlpnobody@gmail.com> wrote:
>>> According to the explanations in nftables wiki:
>>> https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
>>>
>>> You should add the following nft rules(I agree this is tricky and
>>> unfriendly for the end user):
>>> # nft add chain nat prerouting { type nat hook prerouting priority 0 \; }
>>>
>>> But unfortunately, even if you add the above rule, you will still fail to
>>> connect to a local server.
>>>
>>
>> Correct, doesn't change anything.
>>
>>> Now add another nft rules listed below, you can probably make everything
>>> work fine:
>>> # nft add chain nat output { type nat hook output priority 0 \; }
>>>
>>
>> Haven't tried it. Why would it change things? Have you tried it?
>
> I tried it and it did take effect. But my test scenario may be
> different from yours.
> So can you try it?

Tried it, and it works.

>
> [...]
>>> In iptables, nat output chain always exists, so there's no
>>> such problem.
>>>
>>> But I think that enforcing the user to add a nat output chain
>>> in nftables is not a good idea, so probably we need a following
>>> patch(I only list the ipv4 part):
>>
>> Interesting. Maybe that's why it continued to work after iptables has
>> already been loaded on the box.
>> As said above though, I believe the problem is the masquerade setting
>> the wrong ip, and not (only?)
>> the fact that my setup happens to work with iptables but doesn't with nftables.
>
> As I analyzed, the main difference is that the nat OUTPUT hook always
> exists in iptables, so the reply packet's destination ip address will be
> modified in the OUTPUT hook. In nftables, without an nft output chain,
> the reply packet's destination ip address will be modified in the
> PREROUTING hook. Then we try to do a routing lookup, and the packets
> will be dropped because the incoming packets' destination ip address is
> 127.0.0.1.
>
> But I think that forcing the user to add the following nft rule is
> not friendly:
> # nft add chain nat output { type nat hook output priority 0 \; }
>
> This will become more tricky. Do you agree with this?
>
> So I sent the related patch to try to improve it.

It's definitely not user friendly to have to add it, especially since I
expected a chain with no rules to be a no-op. I don't know how nftables
works well enough to comment on one design choice or another, so I
can't say whether this needs to be fixed, but it definitely feels
inconsistent and buggy.

I'm sorry for repeating myself, but I'd like to stress again that while
your workaround fixes an inconsistency between iptables and nftables,
the scenario itself is caused by the buggy behaviour of masquerade with
"lo", and that needs to be fixed too. The workaround above, and any
fixes to that issue, will only fix the dropping of the packets; the
wrong rewrite will still be there.

Please let me know if there's anything else you'd like me to test.

--
Tom.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: nftables: masquerade sets wrong source address
2016-12-22 10:26 ` Tom Hacohen
@ 2016-12-22 10:34 ` Florian Westphal
[not found] ` <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>
0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2016-12-22 10:34 UTC (permalink / raw)
To: Tom Hacohen
Cc: Liping Zhang, Pablo Neira Ayuso, netfilter, Netfilter Developer Mailing List

Tom Hacohen <tom@stosb.com> wrote:
> I'm sorry for repeating myself, but I'd like to stress again that while
> your workaround fixes an inconsistency between iptables and nftables,
> the scenario itself is caused by the buggy behaviour of masquerade with
> "lo", and that needs to be fixed too. The workaround above, and any
> fixes to that issue, will only fix the dropping of the packets; the
> wrong rewrite will still be there.

The 'wrong rewrite' also occurs with iptables.

It doesn't cause connectivity issues because in iptables the nat table
always registers the output hook.

(I agree that nft masquerade should not cause these connectivity issues,
but I think proper ruleset fix is to use meta iif to restrict masq to
the correct interface(s)).
^ permalink raw reply [flat|nested] 15+ messages in thread
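As an illustration of restricting masquerade to a specific interface, here is a minimal sketch. The message above mentions "meta iif"; the variant shown here instead matches on the outgoing interface with "oifname", because that is the form the original report says fixes the problem ("oifname wlp1s0 masquerade"). The interface name wlp1s0 comes from that report and would need to be adjusted for any other machine.

```nft
# Sketch only: masquerade applies solely to packets leaving through
# wlp1s0, so connections over "lo" keep their original source address.
table ip nat {
	chain postrouting {
		type nat hook postrouting priority 100;
		oifname "wlp1s0" masquerade
	}
}
```

As noted earlier in the thread, setups with several uplinks and policy routing may be better served by explicit SNAT rules than by masquerade.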
[parent not found: <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>]
* Re: nftables: masquerade sets wrong source address
[not found] ` <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>
@ 2016-12-22 22:40 ` Tom Hacohen
0 siblings, 0 replies; 15+ messages in thread
From: Tom Hacohen @ 2016-12-22 22:40 UTC (permalink / raw)
To: Florian Westphal
Cc: Pablo Neira Ayuso, Liping Zhang, netfilter, Netfilter Developer Mailing List

On Thu, Dec 22, 2016 at 4:56 PM, Tom Hacohen <tom@stosb.com> wrote:
>
>
> On 22 Dec 2016 12:35, "Florian Westphal" <fw@strlen.de> wrote:
>
> Tom Hacohen <tom@stosb.com> wrote:
>> I'm sorry for repeating myself, but I'd like to stress again that while
>> your workaround fixes an inconsistency between iptables and nftables,
>> the scenario itself is caused by the buggy behaviour of masquerade with
>> "lo", and that needs to be fixed too. The workaround above, and any
>> fixes to that issue, will only fix the dropping of the packets; the
>> wrong rewrite will still be there.
>
> The 'wrong rewrite' also occurs with iptables.
>
> It doesn't cause connectivity issues because in iptables the nat table
> always registers the output hook.
>
> (I agree that nft masquerade should not cause these connectivity issues,
> but I think proper ruleset fix is to use meta iif to restrict masq to
> the correct interface(s)).
>
>
> Yes, iptables also misbehaves here. I know you agree about not causing the
> connectivity issues, but don't you agree that the wrong rewrite shouldn't
> happen? For both iptables and nftables?
>
> I already use oif to restrict the masquerade, I'm not trying to solve it for
> myself, because I already have a working workaround. I'm trying to help
> reporting and resolving a bug.
>
> --
> Tom

Resending as plain text.

Yes, iptables also misbehaves here. I know you agree about not causing
the connectivity issues, but don't you agree that the wrong rewrite
shouldn't happen? For both iptables and nftables?

I already use oif to restrict the masquerade. I'm not trying to solve
it for myself, because I already have a working workaround. I'm trying
to help report and resolve a bug.

--
Tom
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-12-22 22:40 UTC | newest]
Thread overview: 15+ messages
2016-12-13 13:28 nftables: masquerade sets wrong source address Tom Hacohen
2016-12-13 14:32 ` /dev/rob0
2016-12-13 14:53 ` Tom Hacohen
2016-12-14 22:28 ` Pablo Neira Ayuso
2016-12-15 11:34 ` Tom Hacohen
2016-12-15 21:29 ` Pablo Neira Ayuso
2016-12-15 22:47 ` Tom Hacohen
2016-12-16 0:04 ` Tom Hacohen
2016-12-17 14:18 ` Liping Zhang
2016-12-19 2:25 ` Liping Zhang
2016-12-20 15:16 ` Tom Hacohen
2016-12-21 2:39 ` Liping Zhang
2016-12-22 10:26 ` Tom Hacohen
2016-12-22 10:34 ` Florian Westphal
[not found] ` <CAEvi_o8wV5GNk8JvSg96kP3WdDsVsf8PubRe=K1ZiD2+nBaYTg@mail.gmail.com>
2016-12-22 22:40 ` Tom Hacohen