* ip_route_output_key returns wrong gateway info with specific ip rules
@ 2014-05-09 13:41 Andreas Herz
2014-05-10 9:56 ` Maciej Żenczykowski
0 siblings, 1 reply; 5+ messages in thread
From: Andreas Herz @ 2014-05-09 13:41 UTC
To: netfilter-devel
Hi,
I found some strange results when I use "ip_route_output_key" in my own
small kernel netfilter module. I narrowed it down to the information in
"rt->gateway", which seems to be wrong.
If I have an ip rule with a specific "from" IP or network, it seems that
the kernel doesn't parse/compare it correctly. As soon as I switch to
"from all" it's fine.
So my guess is that there is a problem when parsing the "from" part of ip
rules.
The setup is the following:
Client --- Server --- Gateway --- WAN
The Client has the IP 10.0.20.2, the Server has the IP 10.0.20.1 on the
side facing the client. On the other side it has the IPs 10.0.12.2 and
10.0.13.2 (alias IP), and the destination is a gateway with the IPs
10.0.12.1 and 10.0.13.1, which is connected to the WAN.
The Server is running kernel 3.14 from kernel.org on a Debian base.
I could reproduce it on Red Hat, too.
I start a "ping 8.8.8.8" on the client; the server receives the packet
and forwards it to the Gateway.
The server has the following routing table:
10.0.12.0/24 dev eth1 proto kernel scope link src 10.0.12.2
10.0.13.0/24 dev eth1 proto kernel scope link src 10.0.13.2
10.0.20.0/24 dev eth2 proto kernel scope link src 10.0.20.1
default via 10.0.13.1 dev eth1
So the default gateway is 10.0.13.1 on eth1.
What I want to achieve is that the packets from this client/net are
sent to 10.0.12.1 with source 10.0.12.2 instead of the default
gateway and IP.
So I created some policy-based routing rules:
0: from all lookup local
16: from all to 10.0.20.0/24 lookup main
16: from all to 10.0.12.0/24 lookup main
16: from all to 10.0.13.0/24 lookup main
2784: from all fwmark 0x10/0xf0 lookup eth1
3296: from 10.0.20.0/24 lookup GW_10.0.12.1_eth1
32766: from all lookup main
32767: from all lookup default
With "ip r list table GW_10.0.12.1_eth1":
default via 10.0.12.1 dev eth1
Now I use the ipt_MASQUERADE module as a base and just added this part
for my test:
"rt = ip_route_output_key(dev_net(skb->dev), &fl);"
plus some debug output and the necessary declaration of the flowi4
fl.
What I get for "rt->gateway" in this case is:
10.0.13.1
When I switch
"from 10.0.20.0/24 lookup GW_10.0.12.1_eth1"
to
"from all lookup GW_10.0.12.1_eth1"
I get "10.0.12.1" correctly.
The only iptables rule is the rule in the nat table that jumps into the
module. If I log the packet I always see the same correct saddr:
"IN=eth2 OUT=eth1 SRC=10.0.20.2 DST=8.8.8.8"
So the ip rule information should be correct.
I have seen this behaviour since 2.6.32, where I also used "rt->rt_src",
which is sadly gone :/
(Does anyone know why it got removed?)
Do you have any hint or suggestion for me?
If not I will try to dig more into it, add more debug output to
net/ipv4/route.c and recompile the kernel.
--
Andreas Herz
* Re: ip_route_output_key returns wrong gateway info with specific ip rules
From: Maciej Żenczykowski @ 2014-05-10 9:56 UTC
To: Andreas Herz; +Cc: Netfilter Development Mailinglist
(guesswork, read it with a grain of salt)
In my experience stuff like this usually ends up being caused by the
order in which src ip selection (for locally generated packets without
an explicit bind(ip) call, but thus also src ip selection for auto
nat), route lookup and nat happen.
Easiest fix has been to use iptables mangle prerouting to mark
packets, thus also forcing a reroute, and then using fwmark ip rules
instead of (or maybe in addition to) from ip rules.
In your particular case I'm guessing the route lookup is happening
post-source-nat src ip substitution (or even with a 0 during nat src
ip selection). While theoretically SNAT happens in POSTROUTING, and
thus after routing, I think this is only truly the case for the first
packet of a flow.
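
To illustrate the guess above, here is a minimal userland sketch of how
the rule list from the first message would be walked for a destination
of 8.8.8.8, depending on which source address the lookup key carries.
It is only a model, not kernel code: the names RULES, DEFAULT_GW and
select_gateway are hypothetical, the local table and the fwmark rule
are left out (the test traffic is neither local nor marked), and each
table is reduced to its default gateway.

```python
import ipaddress

# Rules from the listing in the first message: (priority, from, to, table).
# Rule 0 (local) and rule 2784 (fwmark) are omitted in this model.
RULES = [
    (16, "0.0.0.0/0", "10.0.20.0/24", "main"),
    (16, "0.0.0.0/0", "10.0.12.0/24", "main"),
    (16, "0.0.0.0/0", "10.0.13.0/24", "main"),
    (3296, "10.0.20.0/24", "0.0.0.0/0", "GW_10.0.12.1_eth1"),
    (32766, "0.0.0.0/0", "0.0.0.0/0", "main"),
]

# Default gateway of each table, as given in the first message.
DEFAULT_GW = {"main": "10.0.13.1", "GW_10.0.12.1_eth1": "10.0.12.1"}


def select_gateway(saddr, daddr):
    """Return (table, gateway) of the first rule whose selectors match."""
    src = ipaddress.ip_address(saddr)
    dst = ipaddress.ip_address(daddr)
    for _prio, from_net, to_net, table in sorted(RULES, key=lambda r: r[0]):
        if src in ipaddress.ip_network(from_net) and dst in ipaddress.ip_network(to_net):
            return table, DEFAULT_GW[table]
    return None


# Key carries the client address: the "from 10.0.20.0/24" rule matches.
print(select_gateway("10.0.20.2", "8.8.8.8"))
# Key carries a zero or already rewritten source: the lookup falls through to main.
print(select_gateway("0.0.0.0", "8.8.8.8"))
print(select_gateway("10.0.13.2", "8.8.8.8"))
```

In this model a "from all" rule at priority 3296 would match any of the
three keys, which is at least consistent with the observation that
switching to "from all" yields 10.0.12.1. Whether the key really
carries a different source address in the module is an assumption that
would need to be checked by printing fl.saddr.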
To be fair my info may be long obsolete, since I most recently (1-2
weeks ago) ran into something like this on some 2.4 kernel (wrt54gl
openwrt 8.09.2) while trying to use a different src ip for SNAT to
port 80/443 than for all other ports.
In my case iptables mangle prerouting (ip rule fwmark lookup) was used
to mark SYNs from local client ip to dest port 80/443, and then src ip
routing (ip rule from X) worked for everything else (ie. the rest of
the tcp connection).
Although that was with dst port based SNAT rules and not MASQUERADE.
- Maciej
* Re: ip_route_output_key returns wrong gateway info with specific ip rules
From: Maciej Żenczykowski @ 2014-05-10 10:18 UTC
To: Andreas Herz; +Cc: Netfilter Development Mailinglist
To further clarify on the above.
[I'm drastically simplifying as I copy-paste this, hopefully not
introducing mistakes]
Originally I had simply a very standard setup:
-A POSTROUTING -s 192.168.1.0/24 -o wan -p tcp -j SNAT --to-source A.B.C.201:32768-49151
A.B.C.200/30 dev wan proto kernel scope link src A.B.C.201
192.168.100.0/24 dev wan proto kernel scope link src 192.168.100.3
192.168.1.0/24 dev lan proto kernel scope link src 192.168.1.1
default via A.B.C.202 dev wan
192.168.1.0/24 is lan subnet, .1 being the linux router, .2+ clients
192.168.100.0/24 is wan subnet, .3 is the linux router, .2 is first
cablemodem (and .4 is an experimental second cablemodem, while .1 is
reserved because it seems to be out of box default for tons of
cablemodem hw)
A.B.C.201 is router public ip, A.B.C.202 is the first cablemodem/gw
Now with the above setup everything goes via the first cablemodem.
I wanted to move all http/https traffic to the second cablemodem for
experimental purposes.
In order to do this the packets leaving the router should have src ip
192.168.100.3 and head for 192.168.100.4
First attempt was to simply use:
-A POSTROUTING -s 192.168.1.0/24 -o wan -p tcp -m multiport --dports 80,443 -j SNAT --to-source 192.168.100.3:32768-49151
-A POSTROUTING -s 192.168.1.0/24 -o wan -p tcp -j SNAT --to-source A.B.C.201:32768-49151
ip rule add pref 100 from 192.168.100.3/24 lookup 100
ip route add default via 192.168.100.4 dev wan src 192.168.100.3 table 100
But it turns out that while you get the right src ip, the SYN packets
are still sent out with the first (and not the second) cablemodem's
destination MAC.
Only after receiving a SYN-ACK does the ACK end up being sent to
the second cablemodem (which of course breaks).
My interpretation is that ROUTE lookup happens before SNAT for the SYN
packet, so we don't know the SRCIP will change, and we do a route
lookup with srcip=client ip,
find the normal route (default via A.B.C.202 dev wan src A.B.C.201)
and thus the dst mac of A.B.C.202 (ie. first cablemodem is used).
On further packets in this stream we apparently know we will nat, and
what we will nat to early enough that we get the proper route.
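
The interpretation above can be sketched as a small model. It is not a
description of what the kernel actually does, only of the ordering
described in this mail. The function names are hypothetical, and
203.0.113.201 / 203.0.113.202 are documentation addresses standing in
for the redacted A.B.C.201 / A.B.C.202.

```python
import ipaddress

# Stand-ins for the redacted public addresses (assumption for illustration).
PUBLIC_SRC = "203.0.113.201"    # plays the role of A.B.C.201
FIRST_MODEM = "203.0.113.202"   # plays the role of A.B.C.202
SECOND_MODEM = "192.168.100.4"

# "ip rule add pref 100 from 192.168.100.3/24 lookup 100", prefix normalised.
TABLE_100_FROM = ipaddress.ip_network("192.168.100.0/24")


def snat_source(dport):
    """Source address chosen by the two POSTROUTING SNAT rules."""
    return "192.168.100.3" if dport in (80, 443) else PUBLIC_SRC


def next_hop(lookup_src):
    """Next hop picked by the 'from' rule, else the main default route."""
    if ipaddress.ip_address(lookup_src) in TABLE_100_FROM:
        return SECOND_MODEM
    return FIRST_MODEM


def forward(client_src, dport, nat_known):
    """Return (source on the wire, next hop) for one forwarded packet.

    nat_known=False models the SYN: the route is looked up with the
    client address because the NAT mapping does not exist yet.
    """
    wire_src = snat_source(dport)
    lookup_src = wire_src if nat_known else client_src
    return wire_src, next_hop(lookup_src)


print(forward("192.168.1.2", 443, nat_known=False))  # SYN: right src, first modem
print(forward("192.168.1.2", 443, nat_known=True))   # later packets: second modem
```

The first call reproduces the symptom (correct source address, wrong
next hop), the second the behaviour seen after the SYN-ACK.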
The following:
-A PREROUTING -i lan -p tcp --syn -m multiport --dports 80,443 -j MARK --set-mark 1
-A POSTROUTING -s 192.168.1.0/24 -o wan -p tcp -m multiport --dports 80,443 -j SNAT --to-source 192.168.100.3:32768-49151
-A POSTROUTING -s 192.168.1.0/24 -o wan -p tcp -j SNAT --to-source A.B.C.201:32768-49151
ip rule add pref 100 from 192.168.100.3/24 lookup 100
ip rule add pref 100 fwmark 1 lookup 100
ip route add default via 192.168.100.4 dev wan src 192.168.100.3 table 100
appears to work. We mark the annoying packets with a fwmark 1 before
routing happens, use that to force select the right routing table,
and non-SYN packets work like they did previously.
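
As a rough sketch of why the extra fwmark rule helps, the two pref 100
rules can be modelled as follows. The names prerouting_mark and
select_table are hypothetical, and the syn_only switch is an added
illustration of a variant that marks every packet to these ports
instead of only the SYN.

```python
import ipaddress

# "ip rule add pref 100 from 192.168.100.3/24 lookup 100", prefix normalised.
TABLE_100_FROM = ipaddress.ip_network("192.168.100.0/24")
WEB_PORTS = (80, 443)


def prerouting_mark(is_syn, dport, syn_only=True):
    """Model of the mangle PREROUTING MARK rule: 1 if marked, else 0."""
    if dport not in WEB_PORTS:
        return 0
    if syn_only and not is_syn:
        return 0
    return 1


def select_table(lookup_src, fwmark):
    """Pick table 100 if either pref 100 rule matches, else main."""
    if ipaddress.ip_address(lookup_src) in TABLE_100_FROM:
        return "100"
    if fwmark == 1:
        return "100"
    return "main"


# SYN to port 80, looked up with the client address: the fwmark rule matches.
print(select_table("192.168.1.2", prerouting_mark(True, 80)))
# Later packet, looked up with the NATed address: the "from" rule matches.
print(select_table("192.168.100.3", prerouting_mark(False, 80)))
# Unrelated traffic stays on the main table.
print(select_table("192.168.1.2", prerouting_mark(True, 22)))
```

The mark is set before any route lookup, so table selection for the SYN
no longer depends on which source address the lookup sees.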
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: ip_route_output_key returns wrong gateway info with specific ip rules
From: Maciej Żenczykowski @ 2014-05-10 10:23 UTC
To: Andreas Herz; +Cc: Netfilter Development Mailinglist
Btw. Obviously you could just skip the '--syn' match, MARK all tcp
packets to those ports, and rely on fwmark rule for everything (and
not just the SYN packets) and not even bother with the 'from
192.168.100.3' rule.
But that would be boring and would not show the issue at hand. ;-)
* Re: ip_route_output_key returns wrong gateway info with specific ip rules
From: Andreas Herz @ 2014-05-12 7:55 UTC
To: Maciej Żenczykowski; +Cc: Netfilter Development Mailinglist
Hi Maciej,
thanks for your explanations!
On 10/05/14 at 02:56, Maciej Żenczykowski wrote:
> Easiest fix has been to use iptables mangle prerouting to mark
> packets, thus also forcing a reroute, and then using fwmark ip rules
> instead of (or maybe in addition to) from ip rules.
I can work around this with marked packets etc., sure. But I want to
know/learn why the "ip_route_output_key" function from route.c returns a
different value in my scenario, although the data it gets via the
parameter is the same and only the ip rule is different.
I guess I have to add debug output to route.c and dig more into that; I
just thought maybe some developer already knows what the problem might
be.
Thanks.
--
Andreas Herz