* Unable to flush ICMP redirect routes in kernel 3.0+
@ 2011-11-15 20:23 Ivan Zahariev
2011-11-15 21:09 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Ivan Zahariev @ 2011-11-15 20:23 UTC (permalink / raw)
To: netdev
Hello,
We have changed nothing in our network infrastructure but only upgraded
from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem we are
experiencing:
ICMP redirected routes are cached forever, and they can be cleared only
by a reboot.
Here is an example:
root@machine5:~# ip route get 1.1.1.1
1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
root@machine5:~# ip route list cache match 1.1.1.1
1.1.1.1 tos lowdelay via 9.0.0.1 dev eth0 src 5.5.5.5
cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
...(two more entries, all go via 9.0.0.1)...
1.1.1.1 is the test destination address
5.5.5.5 is the source IP address of "machine5" via dev eth0, the only
interface besides "lo"
9.0.0.1 is the incorrect gateway which we were redirected to; we want to
change the route to 9.0.0.8
I found no way to clear this route. What I tried:
root@machine5:~# ip route flush cache ### CACHE FLUSH ###
root@machine5:~# ip route list cache match 1.1.1.1 # empty
root@machine5:~# ip route flush cache ### CACHE FLUSH ###
root@machine5:~# echo 1 > /proc/sys/net/ipv4/route/flush
root@machine5:~# ip route list cache match 1.1.1.1 # empty
root@machine5:~# ip route get 1.1.1.1 # magically re-inserts the
<redirected> route, tcpdump sees NO ICMP traffic
1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
I also tried to force a scheduled route flush:
root@machine5:~# echo 1 > /proc/sys/net/ipv4/route/gc_timeout
root@machine5:~# echo 1 > /proc/sys/net/ipv4/route/gc_interval
A reboot fixed it all.
This may be related to the "Several major changes to our routing
infrastructure" (https://lkml.org/lkml/2011/3/16/384).
Other users are reporting the same problem:
* https://plus.google.com/u/0/117161704068825702652/posts/1UK1Rp4KA4J
* http://lists.debian.org/debian-kernel/2011/10/msg00633.html
Other similar issues:
* http://www.spinics.net/lists/netdev/msg176966.html
* http://forums.gentoo.org/viewtopic-t-901024-start-0.html
This has been occurring on a few KVM guest machines and also on a
regular Linux machine, so it's not KVM related.
Is this a bug, or it's me who's missing something?
Thanks.
--Ivan
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-15 20:23 Unable to flush ICMP redirect routes in kernel 3.0+ Ivan Zahariev
@ 2011-11-15 21:09 ` Eric Dumazet
2011-11-16 22:32 ` Ivan Zahariev
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-11-15 21:09 UTC (permalink / raw)
To: Ivan Zahariev; +Cc: netdev
Le mardi 15 novembre 2011 à 22:23 +0200, Ivan Zahariev a écrit :
> Hello,
>
> We have changed nothing in our network infrastructure but only upgraded
> from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem we are
> experiencing:
>
> ICMP redirected routes are cached forever, and they can be cleared only
> by a reboot.
>
> Here is an example:
>
> root@machine5:~# ip route get 1.1.1.1
> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
> cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>
> root@machine5:~# ip route list cache match 1.1.1.1
> 1.1.1.1 tos lowdelay via 9.0.0.1 dev eth0 src 5.5.5.5
> cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
> cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
> ...(two more entries, all go via 9.0.0.1)...
>
> 1.1.1.1 is the test destination address
> 5.5.5.5 is the source IP address of "machine5" via dev eth0, the only
> interface besides "lo"
> 9.0.0.1 is the incorrect gateway which we were redirected to; we want to
> change the route to 9.0.0.8
>
> I found no way to clear this route. What I tried:
>
> root@machine5:~# ip route flush cache ### CACHE FLUSH ###
> root@machine5:~# ip route list cache match 1.1.1.1 # empty
>
> root@machine5:~# ip route flush cache ### CACHE FLUSH ###
> root@machine5:~# echo 1 > /proc/sys/net/ipv4/route/flush
> root@machine5:~# ip route list cache match 1.1.1.1 # empty
>
> root@machine5:~# ip route get 1.1.1.1 # magically re-inserts the
> <redirected> route, tcpdump sees NO ICMP traffic
> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
> cache <redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>
> I also tried to force a scheduled route flush:
>
> root@machine5:~# echo 1 > /proc/sys/net/ipv4/route/gc_timeout
> root@machine5:~# echo 1 > /proc/sys/net/ipv4/route/gc_interval
>
> A reboot fixed it all.
>
> This may be related to the "Several major changes to our routing
> infrastructure" (https://lkml.org/lkml/2011/3/16/384).
> Other users are reporting the same problem:
> * https://plus.google.com/u/0/117161704068825702652/posts/1UK1Rp4KA4J
> * http://lists.debian.org/debian-kernel/2011/10/msg00633.html
> Other similar issues:
> * http://www.spinics.net/lists/netdev/msg176966.html
> * http://forums.gentoo.org/viewtopic-t-901024-start-0.html
>
> This has been occurring on a few KVM guest machines and also on a
> regular Linux machine, so it's not KVM related.
>
> Is this a bug, or it's me who's missing something?
>
It is a bug, and as such could you provide needed information for us to
reproduce it ?
What is your network setup ?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-15 21:09 ` Eric Dumazet
@ 2011-11-16 22:32 ` Ivan Zahariev
2011-11-17 0:33 ` Flavio Leitner
0 siblings, 1 reply; 26+ messages in thread
From: Ivan Zahariev @ 2011-11-16 22:32 UTC (permalink / raw)
To: netdev
On 11/15/2011 11:09 PM, Eric Dumazet wrote:
> Le mardi 15 novembre 2011 à 22:23 +0200, Ivan Zahariev a écrit :
>> Hello,
>>
>> We have changed nothing in our network infrastructure but only upgraded
>> from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem we are
>> experiencing:
>>
>> ICMP redirected routes are cached forever, and they can be cleared only
>> by a reboot.
>>
>> Here is an example:
>>
>> root@machine5:~# ip route get 1.1.1.1
>> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
>> cache<redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>>
>> root@machine5:~# ip route list cache match 1.1.1.1
>> 1.1.1.1 tos lowdelay via 9.0.0.1 dev eth0 src 5.5.5.5
>> cache<redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
>> cache<redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>> ...(two more entries, all go via 9.0.0.1)...
>>
>> 1.1.1.1 is the test destination address
>> 5.5.5.5 is the source IP address of "machine5" via dev eth0, the only
>> interface besides "lo"
>> 9.0.0.1 is the incorrect gateway which we were redirected to; we want to
>> change the route to 9.0.0.8
>>
>> I found no way to clear this route. What I tried:
>>
>> root@machine5:~# ip route flush cache ### CACHE FLUSH ###
>> root@machine5:~# ip route list cache match 1.1.1.1 # empty
>>
>> root@machine5:~# ip route flush cache ### CACHE FLUSH ###
>> root@machine5:~# echo 1> /proc/sys/net/ipv4/route/flush
>> root@machine5:~# ip route list cache match 1.1.1.1 # empty
>>
>> root@machine5:~# ip route get 1.1.1.1 # magically re-inserts the
>> <redirected> route, tcpdump sees NO ICMP traffic
>> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
>> cache<redirected> ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>>
>> I also tried to force a scheduled route flush:
>>
>> root@machine5:~# echo 1> /proc/sys/net/ipv4/route/gc_timeout
>> root@machine5:~# echo 1> /proc/sys/net/ipv4/route/gc_interval
>>
>> A reboot fixed it all.
>>
>> This may be related to the "Several major changes to our routing
>> infrastructure" (https://lkml.org/lkml/2011/3/16/384).
>> Other users are reporting the same problem:
>> * https://plus.google.com/u/0/117161704068825702652/posts/1UK1Rp4KA4J
>> * http://lists.debian.org/debian-kernel/2011/10/msg00633.html
>> Other similar issues:
>> * http://www.spinics.net/lists/netdev/msg176966.html
>> * http://forums.gentoo.org/viewtopic-t-901024-start-0.html
>>
>> This has been occurring on a few KVM guest machines and also on a
>> regular Linux machine, so it's not KVM related.
>>
>> Is this a bug, or it's me who's missing something?
>>
> It is a bug, and as such could you provide needed information for us to
> reproduce it ?
>
> What is your network setup ?
Network setup is nothing fancy. We have the following machines on a
single /24 ethernet segment:
* 192.168.0.244 (machine5) -- the machine on which we reproduce the
kernel routing bug; kernel: 3.0.3-grsec
* 192.168.0.8 (router8) -- the default gw for the whole
192.168.0.0/24 network; does SNAT; kernel: 2.6.32-5-686
* 192.168.0.120 -- another host with disabled ip_forwarding; must be up
and reachable
There are two bugs actually:
1. Basically, *any* ICMP redirect is cached forever.
2. The output of "ip route" is not consistent with the kernel's routing
behavior.
Quick fix: Disabling "net.ipv4.conf.*.accept_redirects" on all
interfaces works OK and prevents ICMP redirects from affecting the
internal route cache.
Here is a sample test-case scenario:
### right after a clean machine reboot
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
cache
### make a TCP request; the TCP packets go to the default gw
192.168.0.8; we see this with a tcpdump at 192.168.0.8
root@machine5:~# telnet 8.8.4.4
### route is still OK and as expected
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
cache ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0 src 192.168.0.244
cache ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
cache
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
cache
### change route to a fake host on the same subnet, so that an ICMP
redirect will follow later
### we also disable NAT for 192.168.0.244, so that an ICMP redirect is
sent accordingly
root@router8:~# route add -host 8.8.4.4 gw 192.168.0.120
### first TCP packet goes to the default gw 192.168.0.8; we see this
with a tcpdump at 192.168.0.8
root@machine5:~# telnet 8.8.4.4
### at machine5: we got the ICMP redirect from the default gw, as expected
# tcpdump: IP 192.168.0.8 > 192.168.0.244: ICMP redirect 8.8.4.4 to host
192.168.0.120, length 68
### the TCP packets now start to use the <redirected> route
192.168.0.120; we see this with a tcpdump at 192.168.0.120
root@machine5:~# telnet 8.8.4.4
### (bug #2) what "ip route" returns is inconsistent, because we are
using the <redirected> route 192.168.0.120 in reality
### note that the count of the route lines increased with one
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
cache ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0 src 192.168.0.244
cache ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
cache
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
cache ipid 0x303a
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
cache
### restore the route on the default gw 192.168.0.8, so that it accepts
8.8.4.4 as destination again
### restore NAT for 192.168.0.244
root@router8:~# route del -host 8.8.4.4 gw 192.168.0.120
### (bug #1) even though we flushed the route cache, the <redirected>
route resurrects from somewhere; even without making any TCP requests
### this time what "ip" returns is consistent with the real (incorrect)
routing behavior of machine5
root@machine5:~# ip route flush cache
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
cache <redirected> ipid 0x303a
### the TCP packets STILL use the <redirected> route 192.168.0.120; we
see this with a tcpdump at 192.168.0.120
root@machine5:~# telnet 8.8.4.4
### only a reboot clears the cached <redirected> routes
Cheers.
--Ivan
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-16 22:32 ` Ivan Zahariev
@ 2011-11-17 0:33 ` Flavio Leitner
2011-11-17 8:10 ` Ivan Zahariev
0 siblings, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-17 0:33 UTC (permalink / raw)
To: Ivan Zahariev; +Cc: netdev
On Thu, 17 Nov 2011 00:32:18 +0200
Ivan Zahariev <famzah@icdsoft.com> wrote:
> On 11/15/2011 11:09 PM, Eric Dumazet wrote:
> > Le mardi 15 novembre 2011 à 22:23 +0200, Ivan Zahariev a écrit :
> >> Hello,
> >>
> >> We have changed nothing in our network infrastructure but only
> >> upgraded from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem
> >> we are experiencing:
> >>
> >> ICMP redirected routes are cached forever, and they can be cleared
> >> only by a reboot.
> >>
> ### (bug #1) even though we flushed the route cache, the <redirected>
> route resurrects from somewhere; even without making any TCP requests
> ### this time what "ip" returns is consistent with the real
> (incorrect) routing behavior of machine5
> root@machine5:~# ip route flush cache
> root@machine5:~# ip route list cache match 8.8.4.4
> root@machine5:~# ip route get 8.8.4.4
> 8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
> cache <redirected> ipid 0x303a
>
> ### only a reboot clears the cached <redirected> routes
IIRC, the cache flush doesn't affect the inetpeer where the
redirected gateway is now stored, so even after flushing the
route cache, the inetpeer will restore the old info later.
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 0:33 ` Flavio Leitner
@ 2011-11-17 8:10 ` Ivan Zahariev
2011-11-17 13:11 ` Flavio Leitner
0 siblings, 1 reply; 26+ messages in thread
From: Ivan Zahariev @ 2011-11-17 8:10 UTC (permalink / raw)
To: netdev
On 17.11.2011 г. 02:33 ч., Flavio Leitner wrote:
> On Thu, 17 Nov 2011 00:32:18 +0200
> Ivan Zahariev<famzah@icdsoft.com> wrote:
>
>> On 11/15/2011 11:09 PM, Eric Dumazet wrote:
>>> Le mardi 15 novembre 2011 à 22:23 +0200, Ivan Zahariev a écrit :
>>>> Hello,
>>>>
>>>> We have changed nothing in our network infrastructure but only
>>>> upgraded from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem
>>>> we are experiencing:
>>>>
>>>> ICMP redirected routes are cached forever, and they can be cleared
>>>> only by a reboot.
>>>>
>> ### (bug #1) even though we flushed the route cache, the<redirected>
>> route resurrects from somewhere; even without making any TCP requests
>> ### this time what "ip" returns is consistent with the real
>> (incorrect) routing behavior of machine5
>> root@machine5:~# ip route flush cache
>> root@machine5:~# ip route list cache match 8.8.4.4
>> root@machine5:~# ip route get 8.8.4.4
>> 8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
>> cache<redirected> ipid 0x303a
>>
>> ### only a reboot clears the cached<redirected> routes
> IIRC, the cache flush doesn't affect the inetpeer where the
> redirected gateway is now stored, so even after flushing the
> route cache, the inetpeer will restore the old info later.
>
> fbl
OK, I guess my questions now are:
* How to flush the inetpeer (redirected cache info) without having to
reboot the machine?
* Why "ip route" returns an incorrect route; example:
### (bug #2) what "ip route" returns is inconsistent, because we are
using the <redirected> route 192.168.0.120 in reality
### note that the count of the route lines increased with one
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
cache ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0 src 192.168.0.244
cache ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
cache
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
cache ipid 0x303a
### After "ip route flush cache", the output of "ip route" gets
consistent with the real routing behavior of machine5
root@machine5:~# ip route flush cache
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
cache <redirected> ipid 0x303a
Thanks.
--Ivan
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 8:10 ` Ivan Zahariev
@ 2011-11-17 13:11 ` Flavio Leitner
2011-11-17 13:15 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-17 13:11 UTC (permalink / raw)
To: Ivan Zahariev; +Cc: netdev
On Thu, 17 Nov 2011 10:10:08 +0200
Ivan Zahariev <famzah@icdsoft.com> wrote:
> On 17.11.2011 г. 02:33 ч., Flavio Leitner wrote:
> > On Thu, 17 Nov 2011 00:32:18 +0200
> > Ivan Zahariev<famzah@icdsoft.com> wrote:
> >
> >> On 11/15/2011 11:09 PM, Eric Dumazet wrote:
> >>> Le mardi 15 novembre 2011 à 22:23 +0200, Ivan Zahariev a écrit :
> >>>> Hello,
> >>>>
> >>>> We have changed nothing in our network infrastructure but only
> >>>> upgraded from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem
> >>>> we are experiencing:
> >>>>
> >>>> ICMP redirected routes are cached forever, and they can be
> >>>> cleared only by a reboot.
> >>>>
> >> ### (bug #1) even though we flushed the route cache,
> >> the<redirected> route resurrects from somewhere; even without
> >> making any TCP requests ### this time what "ip" returns is
> >> consistent with the real (incorrect) routing behavior of machine5
> >> root@machine5:~# ip route flush cache
> >> root@machine5:~# ip route list cache match 8.8.4.4
> >> root@machine5:~# ip route get 8.8.4.4
> >> 8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
> >> cache<redirected> ipid 0x303a
> >>
> >> ### only a reboot clears the cached<redirected> routes
> > IIRC, the cache flush doesn't affect the inetpeer where the
> > redirected gateway is now stored, so even after flushing the
> > route cache, the inetpeer will restore the old info later.
> >
> > fbl
> OK, I guess my questions now are:
> * How to flush the inetpeer (redirected cache info) without having to
> reboot the machine?
It will expire after 10min if you don't use that specific host.
> * Why "ip route" returns an incorrect route; example:
I am sorry for not being clear before. It is a bug, indeed.
> ### (bug #2) what "ip route" returns is inconsistent, because we are
> using the <redirected> route 192.168.0.120 in reality
> ### note that the count of the route lines increased with one
> root@machine5:~# ip route list cache match 8.8.4.4
> 8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
> cache ipid 0x303a
> 8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0 src 192.168.0.244
> cache ipid 0x303a
> 8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
> cache
> 8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
> cache ipid 0x303a
>
> ### After "ip route flush cache", the output of "ip route" gets
> consistent with the real routing behavior of machine5
> root@machine5:~# ip route flush cache
> root@machine5:~# ip route list cache match 8.8.4.4
> root@machine5:~# ip route get 8.8.4.4
> 8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
> cache <redirected> ipid 0x303a
>
Now the redirected gateway is stored in inetpeer which represents
an specific peer. In your case, you have one for 8.8.4.4.
When you flush the routing cache everything is flushed, except for
the inetpeer as far as I can tell. Later, when you try to access
the host 8.8.4.4 again, the lookup will create a fresh route but
also find the previous 8.8.4.4 inetpeer, so it will re-use the
previous redirected gateway.
Therefore, the routing is fine, but it is missing a way to
invalidade or expire all related inetpeer entries when the flush
happens.
The inetpeer will expire eventually, so waiting before trying again
would help to work around:
1) flush
2) wait to expire (10min)
3) try again
If you know how to compile a kernel, try to change these thresholds
below to expire faster, then you have to wait less for it to expire
instead of rebooting.
net/ipv4/inetpeer.c:
int inet_peer_minttl __read_mostly = 120 * HZ; /* TTL under high load:
120 sec */ int inet_peer_maxttl __read_mostly = 10 * 60 * HZ; /*
usual time to live: 10 min */
That above is just a workaround, indeed.
I am going to be on vacations in the next couple weeks, so I won't be
able to help fixing this any time soon. However, I am pretty sure
someone else will help though :)
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 13:11 ` Flavio Leitner
@ 2011-11-17 13:15 ` Eric Dumazet
2011-11-17 14:40 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 13:15 UTC (permalink / raw)
To: Flavio Leitner; +Cc: Ivan Zahariev, netdev
Le jeudi 17 novembre 2011 à 11:11 -0200, Flavio Leitner a écrit :
> I am going to be on vacations in the next couple weeks, so I won't be
> able to help fixing this any time soon. However, I am pretty sure
> someone else will help though :)
>
I'll take a look :)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 13:15 ` Eric Dumazet
@ 2011-11-17 14:40 ` Eric Dumazet
2011-11-17 15:37 ` Flavio Leitner
2011-11-17 16:52 ` Vasiliy Kulikov
0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 14:40 UTC (permalink / raw)
To: Flavio Leitner, David Miller; +Cc: Ivan Zahariev, netdev, Vasiliy Kulikov
Le jeudi 17 novembre 2011 à 14:15 +0100, Eric Dumazet a écrit :
> I'll take a look :)
>
While looking at it, I found that ICMP_MIB_INERRORS was incremented in
my (successful) ping tests.
netstat -s
...
Icmp:
...
13 input ICMP message failed.
...
[PATCH] ping: dont increment ICMP_MIB_INERRORS
ping module incorrectly increments ICMP_MIB_INERRORS if feeded with a
frame not belonging to its own sockets.
RFC 2011 states that ICMP_MIB_INERRORS should count "the number of ICMP
messages which the entiry received but determined as having
ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
---
net/ipv4/ping.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index a06f73f..43d4c3b 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -339,7 +339,6 @@ void ping_err(struct sk_buff *skb, u32 info)
sk = ping_v4_lookup(net, iph->daddr, iph->saddr,
ntohs(icmph->un.echo.id), skb->dev->ifindex);
if (sk == NULL) {
- ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
pr_debug("no socket, dropping\n");
return; /* No socket for error */
}
@@ -679,7 +678,6 @@ static int ping_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
pr_debug("ping_queue_rcv_skb(sk=%p,sk->num=%d,skb=%p)\n",
inet_sk(sk), inet_sk(sk)->inet_num, skb);
if (sock_queue_rcv_skb(sk, skb) < 0) {
- ICMP_INC_STATS_BH(sock_net(sk), ICMP_MIB_INERRORS);
kfree_skb(skb);
pr_debug("ping_queue_rcv_skb -> failed\n");
return -1;
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 14:40 ` Eric Dumazet
@ 2011-11-17 15:37 ` Flavio Leitner
2011-11-17 16:31 ` Eric Dumazet
2011-11-17 16:52 ` Vasiliy Kulikov
1 sibling, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-17 15:37 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Thu, 17 Nov 2011 15:40:20 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [PATCH] ping: dont increment ICMP_MIB_INERRORS
>
> ping module incorrectly increments ICMP_MIB_INERRORS if feeded with a
> frame not belonging to its own sockets.
>
> RFC 2011 states that ICMP_MIB_INERRORS should count "the number of
> ICMP messages which the entiry received but determined as having
> ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Vasiliy Kulikov <segoon@openwall.com>
Yeah, they aren't ICMP specific errors and the callers already
checked for checksum, lengths, and etc.. increasing that counter
when necessary.
Acked-by: Flavio Leitner <fbl@redhat.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 15:37 ` Flavio Leitner
@ 2011-11-17 16:31 ` Eric Dumazet
2011-11-17 16:40 ` Flavio Leitner
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 16:31 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Le jeudi 17 novembre 2011 à 13:37 -0200, Flavio Leitner a écrit :
> On Thu, 17 Nov 2011 15:40:20 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > [PATCH] ping: dont increment ICMP_MIB_INERRORS
> >
> > ping module incorrectly increments ICMP_MIB_INERRORS if feeded with a
> > frame not belonging to its own sockets.
> >
> > RFC 2011 states that ICMP_MIB_INERRORS should count "the number of
> > ICMP messages which the entiry received but determined as having
> > ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Vasiliy Kulikov <segoon@openwall.com>
>
> Yeah, they aren't ICMP specific errors and the callers already
> checked for checksum, lengths, and etc.. increasing that counter
> when necessary.
>
> Acked-by: Flavio Leitner <fbl@redhat.com>
Thanks
By the way, redirects dont work at all in net-next
Probably coming from your commit 7cc9150ebe8ec0
(route: fix ICMP redirect validation)
Since calling __ip_route_output_key() will create the route with s = 0,
l = 0 (forcing saddr and dev->ifindex) selectors...
We have to add a 'create' parameter to __ip_route_output_key() so that
ip_rt_redirect() doesnt create a route, only find the existing one in
cache ?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 16:31 ` Eric Dumazet
@ 2011-11-17 16:40 ` Flavio Leitner
2011-11-17 16:45 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-17 16:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Thu, 17 Nov 2011 17:31:50 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 17 novembre 2011 à 13:37 -0200, Flavio Leitner a écrit :
> > On Thu, 17 Nov 2011 15:40:20 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > [PATCH] ping: dont increment ICMP_MIB_INERRORS
> > >
> > > ping module incorrectly increments ICMP_MIB_INERRORS if feeded
> > > with a frame not belonging to its own sockets.
> > >
> > > RFC 2011 states that ICMP_MIB_INERRORS should count "the number of
> > > ICMP messages which the entiry received but determined as having
> > > ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
> > >
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > CC: Vasiliy Kulikov <segoon@openwall.com>
> >
> > Yeah, they aren't ICMP specific errors and the callers already
> > checked for checksum, lengths, and etc.. increasing that counter
> > when necessary.
> >
> > Acked-by: Flavio Leitner <fbl@redhat.com>
>
> Thanks
>
> By the way, redirects dont work at all in net-next
Could you be more specific? It seems to be working here.
> Probably coming from your commit 7cc9150ebe8ec0
> (route: fix ICMP redirect validation)
>
> Since calling __ip_route_output_key() will create the route with s =
> 0, l = 0 (forcing saddr and dev->ifindex) selectors...
>
> We have to add a 'create' parameter to __ip_route_output_key() so that
> ip_rt_redirect() doesnt create a route, only find the existing one in
> cache ?
It should receive redirect after sending a packet, so the route
should be ready at this point.
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 16:40 ` Flavio Leitner
@ 2011-11-17 16:45 ` Eric Dumazet
2011-11-17 16:57 ` Eric Dumazet
2011-11-17 17:01 ` Flavio Leitner
0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 16:45 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Le jeudi 17 novembre 2011 à 14:40 -0200, Flavio Leitner a écrit :
> On Thu, 17 Nov 2011 17:31:50 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > Le jeudi 17 novembre 2011 à 13:37 -0200, Flavio Leitner a écrit :
> > > On Thu, 17 Nov 2011 15:40:20 +0100
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > > [PATCH] ping: dont increment ICMP_MIB_INERRORS
> > > >
> > > > ping module incorrectly increments ICMP_MIB_INERRORS if feeded
> > > > with a frame not belonging to its own sockets.
> > > >
> > > > RFC 2011 states that ICMP_MIB_INERRORS should count "the number of
> > > > ICMP messages which the entiry received but determined as having
> > > > ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
> > > >
> > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > > CC: Vasiliy Kulikov <segoon@openwall.com>
> > >
> > > Yeah, they aren't ICMP specific errors and the callers already
> > > checked for checksum, lengths, and etc.. increasing that counter
> > > when necessary.
> > >
> > > Acked-by: Flavio Leitner <fbl@redhat.com>
> >
> > Thanks
> >
> > By the way, redirects dont work at all in net-next
>
> Could you be more specific? It seems to be working here.
>
> > Probably coming from your commit 7cc9150ebe8ec0
> > (route: fix ICMP redirect validation)
> >
> > Since calling __ip_route_output_key() will create the route with s =
> > 0, l = 0 (forcing saddr and dev->ifindex) selectors...
> >
> > We have to add a 'create' parameter to __ip_route_output_key() so that
> > ip_rt_redirect() doesnt create a route, only find the existing one in
> > cache ?
>
> It should receive redirect after sending a packet, so the route
> should be ready at this point.
I receive the redirect, but the rt->peer is updated on a different route
than the one used by my ping command.
Its updated on the specific route (source address forced, output device
forced), not on the wildcarded route my ping is using.
So next packets are still sent on old gateway...
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 14:40 ` Eric Dumazet
2011-11-17 15:37 ` Flavio Leitner
@ 2011-11-17 16:52 ` Vasiliy Kulikov
1 sibling, 0 replies; 26+ messages in thread
From: Vasiliy Kulikov @ 2011-11-17 16:52 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Flavio Leitner, David Miller, Ivan Zahariev, netdev
On Thu, Nov 17, 2011 at 15:40 +0100, Eric Dumazet wrote:
> [PATCH] ping: dont increment ICMP_MIB_INERRORS
>
> ping module incorrectly increments ICMP_MIB_INERRORS if feeded with a
> frame not belonging to its own sockets.
>
> RFC 2011 states that ICMP_MIB_INERRORS should count "the number of ICMP
> messages which the entiry received but determined as having
> ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Thanks,
> ---
> net/ipv4/ping.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index a06f73f..43d4c3b 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -339,7 +339,6 @@ void ping_err(struct sk_buff *skb, u32 info)
> sk = ping_v4_lookup(net, iph->daddr, iph->saddr,
> ntohs(icmph->un.echo.id), skb->dev->ifindex);
> if (sk == NULL) {
> - ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
> pr_debug("no socket, dropping\n");
> return; /* No socket for error */
> }
> @@ -679,7 +678,6 @@ static int ping_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> pr_debug("ping_queue_rcv_skb(sk=%p,sk->num=%d,skb=%p)\n",
> inet_sk(sk), inet_sk(sk)->inet_num, skb);
> if (sock_queue_rcv_skb(sk, skb) < 0) {
> - ICMP_INC_STATS_BH(sock_net(sk), ICMP_MIB_INERRORS);
> kfree_skb(skb);
> pr_debug("ping_queue_rcv_skb -> failed\n");
> return -1;
>
--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 16:45 ` Eric Dumazet
@ 2011-11-17 16:57 ` Eric Dumazet
2011-11-17 17:01 ` Flavio Leitner
1 sibling, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 16:57 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Some traces :
$ ping -c 3 213.254.248.147
PING 213.254.248.147 (213.254.248.147) 56(84) bytes of data.
>From 192.168.20.108: icmp_seq=1 Redirect Host(New nexthop: 192.168.20.254)
2011/10/17 17:55:56.827 64 bytes from 213.254.248.147: icmp_seq=1 ttl=55 time=10.4 ms
>From 192.168.20.108: icmp_seq=2 Redirect Host(New nexthop: 192.168.20.254)
2011/10/17 17:55:57.828 64 bytes from 213.254.248.147: icmp_seq=2 ttl=55 time=9.87 ms
>From 192.168.20.108: icmp_seq=3 Redirect Host(New nexthop: 192.168.20.254)
2011/10/17 17:55:58.829 64 bytes from 213.254.248.147: icmp_seq=3 ttl=55 time=10.2 ms
--- 213.254.248.147 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 9.871/10.198/10.478/0.275 ms
$ ip ro list cache match 213.254.248.147
213.254.248.147 from 192.168.20.110 via 192.168.20.108 dev vlan.103
cache ipid 0x7187
213.254.248.147 from 192.168.20.110 via 192.168.20.254 dev vlan.103
cache <redirected> ipid 0x7187
213.254.248.147 via 192.168.20.108 dev vlan.103 src 192.168.20.110
cache
$ ip ro get 213.254.248.147
213.254.248.147 via 192.168.20.108 dev vlan.103 src 192.168.20.110
cache
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 16:45 ` Eric Dumazet
2011-11-17 16:57 ` Eric Dumazet
@ 2011-11-17 17:01 ` Flavio Leitner
2011-11-17 17:18 ` Eric Dumazet
1 sibling, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-17 17:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Thu, 17 Nov 2011 17:45:19 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 17 novembre 2011 à 14:40 -0200, Flavio Leitner a écrit :
> > On Thu, 17 Nov 2011 17:31:50 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > Le jeudi 17 novembre 2011 à 13:37 -0200, Flavio Leitner a écrit :
> > > > On Thu, 17 Nov 2011 15:40:20 +0100
> > > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > >
> > > > > [PATCH] ping: dont increment ICMP_MIB_INERRORS
> > > > >
> > > > > ping module incorrectly increments ICMP_MIB_INERRORS if feeded
> > > > > with a frame not belonging to its own sockets.
> > > > >
> > > > > RFC 2011 states that ICMP_MIB_INERRORS should count "the
> > > > > number of ICMP messages which the entiry received but
> > > > > determined as having ICMP-specific errors (bad ICMP
> > > > > checksums, bad length, etc.)."
> > > > >
> > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > > > CC: Vasiliy Kulikov <segoon@openwall.com>
> > > >
> > > > Yeah, they aren't ICMP specific errors and the callers already
> > > > checked for checksum, lengths, and etc.. increasing that counter
> > > > when necessary.
> > > >
> > > > Acked-by: Flavio Leitner <fbl@redhat.com>
> > >
> > > Thanks
> > >
> > > By the way, redirects dont work at all in net-next
> >
> > Could you be more specific? It seems to be working here.
> >
> > > Probably coming from your commit 7cc9150ebe8ec0
> > > (route: fix ICMP redirect validation)
> > >
> > > Since calling __ip_route_output_key() will create the route with
> > > s = 0, l = 0 (forcing saddr and dev->ifindex) selectors...
> > >
> > > We have to add a 'create' parameter to __ip_route_output_key() so
> > > that ip_rt_redirect() doesnt create a route, only find the
> > > existing one in cache ?
> >
> > It should receive redirect after sending a packet, so the route
> > should be ready at this point.
>
> I receive the redirect, but the rt->peer is updated on a different
> route than the one used by my ping command.
>
> Its updated on the specific route (source address forced, output
> device forced), not on the wildcarded route my ping is using.
>
> So next packets are still sent on old gateway...
Right, so the loop trying different oif and saddr isn't working at
all because __ip_route_output_key() will create a route in the first
attempt. Looks like you're right and we need the 'create' parameter
in __ip_route_output_key().
thanks,
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 17:01 ` Flavio Leitner
@ 2011-11-17 17:18 ` Eric Dumazet
2011-11-17 17:33 ` Flavio Leitner
2011-11-17 17:38 ` Eric Dumazet
0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 17:18 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Le jeudi 17 novembre 2011 à 15:01 -0200, Flavio Leitner a écrit :
> Right, so the loop trying different oif and saddr isn't working at
> all because __ip_route_output_key() will create a route in the first
> attempt. Looks like you're right and we need the 'create' parameter
> in __ip_route_output_key().
And we have to loop on all possible keys combinations.
(like code before David commit (f39925dbde77 ipv4: Cache learned
redirect information in inetpeer.) : it had a goto do_next;
I am preparing a patch.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 17:18 ` Eric Dumazet
@ 2011-11-17 17:33 ` Flavio Leitner
2011-11-17 17:38 ` Eric Dumazet
1 sibling, 0 replies; 26+ messages in thread
From: Flavio Leitner @ 2011-11-17 17:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Thu, 17 Nov 2011 18:18:22 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 17 novembre 2011 à 15:01 -0200, Flavio Leitner a écrit :
>
> > Right, so the loop trying different oif and saddr isn't working at
> > all because __ip_route_output_key() will create a route in the first
> > attempt. Looks like you're right and we need the 'create' parameter
> > in __ip_route_output_key().
>
> And we have to loop on all possible keys combinations.
>
> (like code before David commit (f39925dbde77 ipv4: Cache learned
> redirect information in inetpeer.) : it had a goto do_next;
That is correct. I realize that now.
> I am preparing a patch.
thank you.
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 17:18 ` Eric Dumazet
2011-11-17 17:33 ` Flavio Leitner
@ 2011-11-17 17:38 ` Eric Dumazet
2011-11-18 16:02 ` Eric Dumazet
1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-11-17 17:38 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Le jeudi 17 novembre 2011 à 18:18 +0100, Eric Dumazet a écrit :
> Le jeudi 17 novembre 2011 à 15:01 -0200, Flavio Leitner a écrit :
>
> > Right, so the loop trying different oif and saddr isn't working at
> > all because __ip_route_output_key() will create a route in the first
> > attempt. Looks like you're right and we need the 'create' parameter
> > in __ip_route_output_key().
>
> And we have to loop on all possible keys combinations.
>
> (like code before David commit (f39925dbde77 ipv4: Cache learned
> redirect information in inetpeer.) : it had a goto do_next;
>
> I am preparing a patch.
>
Unfortunately its not enough.
For some reason, ipv4_dst_check() is not called later, so we cant call
ipv4_dst_check() and change
rt->rt_gateway = peer->redirect_learned.a4;
dst->obsolete strikes again.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-17 17:38 ` Eric Dumazet
@ 2011-11-18 16:02 ` Eric Dumazet
2011-11-18 16:30 ` Flavio Leitner
2011-11-18 20:26 ` David Miller
0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-11-18 16:02 UTC (permalink / raw)
To: Flavio Leitner, David Miller; +Cc: Ivan Zahariev, netdev, Vasiliy Kulikov
David, unless I missed something, we should revert commit f39925dbde77
ipv4: Cache learned redirect information in inetpeer.)
With following patch, redirects now work for me.
Thanks !
[PATCH net-next] ipv4: fix redirect handling
commit f39925dbde77 (ipv4: Cache learned redirect information in
inetpeer.) introduced a regression in ICMP redirect handling.
It assumed ipv4_dst_check() would be called because all possible routes
were attached to the inetpeer we modify in ip_rt_redirect(), but thats
not true.
commit 7cc9150ebe (route: fix ICMP redirect validation) tried to fix
this but solution was not complete. (It fixed only one route)
So we must lookup existing routes (including different TOS values) and
call check_peer_redir() on them.
Reported-by: Ivan Zahariev <famzah@icdsoft.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Flavio Leitner <fbl@redhat.com>
---
net/ipv4/route.c | 110 ++++++++++++++++++++++++---------------------
1 file changed, 59 insertions(+), 51 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 511f4a7..0c74da8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1304,16 +1304,42 @@ static void rt_del(unsigned hash, struct rtable *rt)
spin_unlock_bh(rt_hash_lock_addr(hash));
}
+static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
+{
+ struct rtable *rt = (struct rtable *) dst;
+ __be32 orig_gw = rt->rt_gateway;
+ struct neighbour *n, *old_n;
+
+ dst_confirm(&rt->dst);
+
+ rt->rt_gateway = peer->redirect_learned.a4;
+
+ n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
+ if (IS_ERR(n))
+ return PTR_ERR(n);
+ old_n = xchg(&rt->dst._neighbour, n);
+ if (old_n)
+ neigh_release(old_n);
+ if (!n || !(n->nud_state & NUD_VALID)) {
+ if (n)
+ neigh_event_send(n, NULL);
+ rt->rt_gateway = orig_gw;
+ return -EAGAIN;
+ } else {
+ rt->rt_flags |= RTCF_REDIRECTED;
+ call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+ }
+ return 0;
+}
+
/* called in rcu_read_lock() section */
void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
__be32 saddr, struct net_device *dev)
{
int s, i;
struct in_device *in_dev = __in_dev_get_rcu(dev);
- struct rtable *rt;
__be32 skeys[2] = { saddr, 0 };
int ikeys[2] = { dev->ifindex, 0 };
- struct flowi4 fl4;
struct inet_peer *peer;
struct net *net;
@@ -1336,33 +1362,42 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
goto reject_redirect;
}
- memset(&fl4, 0, sizeof(fl4));
- fl4.daddr = daddr;
for (s = 0; s < 2; s++) {
for (i = 0; i < 2; i++) {
- fl4.flowi4_oif = ikeys[i];
- fl4.saddr = skeys[s];
- rt = __ip_route_output_key(net, &fl4);
- if (IS_ERR(rt))
- continue;
-
- if (rt->dst.error || rt->dst.dev != dev ||
- rt->rt_gateway != old_gw) {
- ip_rt_put(rt);
- continue;
- }
+ unsigned int hash;
+ struct rtable __rcu **rthp;
+ struct rtable *rt;
+
+ hash = rt_hash(daddr, skeys[s], ikeys[i], rt_genid(net));
+
+ rthp = &rt_hash_table[hash].chain;
+
+ while ((rt = rcu_dereference(*rthp)) != NULL) {
+ rthp = &rt->dst.rt_next;
+
+ if (rt->rt_key_dst != daddr ||
+ rt->rt_key_src != skeys[s] ||
+ rt->rt_oif != ikeys[i] ||
+ rt_is_input_route(rt) ||
+ rt_is_expired(rt) ||
+ !net_eq(dev_net(rt->dst.dev), net) ||
+ rt->dst.error ||
+ rt->dst.dev != dev ||
+ rt->rt_gateway != old_gw)
+ continue;
- if (!rt->peer)
- rt_bind_peer(rt, rt->rt_dst, 1);
+ if (!rt->peer)
+ rt_bind_peer(rt, rt->rt_dst, 1);
- peer = rt->peer;
- if (peer) {
- peer->redirect_learned.a4 = new_gw;
- atomic_inc(&__rt_peer_genid);
+ peer = rt->peer;
+ if (peer) {
+ if (peer->redirect_learned.a4 != new_gw) {
+ peer->redirect_learned.a4 = new_gw;
+ atomic_inc(&__rt_peer_genid);
+ }
+ check_peer_redir(&rt->dst, peer);
+ }
}
-
- ip_rt_put(rt);
- return;
}
}
return;
@@ -1649,33 +1684,6 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
}
}
-static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
-{
- struct rtable *rt = (struct rtable *) dst;
- __be32 orig_gw = rt->rt_gateway;
- struct neighbour *n, *old_n;
-
- dst_confirm(&rt->dst);
-
- rt->rt_gateway = peer->redirect_learned.a4;
-
- n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
- if (IS_ERR(n))
- return PTR_ERR(n);
- old_n = xchg(&rt->dst._neighbour, n);
- if (old_n)
- neigh_release(old_n);
- if (!n || !(n->nud_state & NUD_VALID)) {
- if (n)
- neigh_event_send(n, NULL);
- rt->rt_gateway = orig_gw;
- return -EAGAIN;
- } else {
- rt->rt_flags |= RTCF_REDIRECTED;
- call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
- }
- return 0;
-}
static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
{
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 16:02 ` Eric Dumazet
@ 2011-11-18 16:30 ` Flavio Leitner
2011-11-18 16:34 ` Eric Dumazet
2011-11-18 20:26 ` David Miller
1 sibling, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-18 16:30 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Fri, 18 Nov 2011 17:02:08 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> David, unless I missed something, we should revert commit f39925dbde77
> ipv4: Cache learned redirect information in inetpeer.)
>
> With following patch, redirects now work for me.
>
> Thanks !
>
>
>
> [PATCH net-next] ipv4: fix redirect handling
>
> commit f39925dbde77 (ipv4: Cache learned redirect information in
> inetpeer.) introduced a regression in ICMP redirect handling.
>
> It assumed ipv4_dst_check() would be called because all possible
> routes were attached to the inetpeer we modify in ip_rt_redirect(),
> but thats not true.
>
> commit 7cc9150ebe (route: fix ICMP redirect validation) tried to fix
> this but solution was not complete. (It fixed only one route)
>
> So we must lookup existing routes (including different TOS values) and
> call check_peer_redir() on them.
>
> Reported-by: Ivan Zahariev <famzah@icdsoft.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Flavio Leitner <fbl@redhat.com>
> ---
> net/ipv4/route.c | 110 ++++++++++++++++++++++++---------------------
> 1 file changed, 59 insertions(+), 51 deletions(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 511f4a7..0c74da8 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1304,16 +1304,42 @@ static void rt_del(unsigned hash, struct
> rtable *rt) spin_unlock_bh(rt_hash_lock_addr(hash));
> }
>
> +static int check_peer_redir(struct dst_entry *dst, struct inet_peer
> *peer) +{
> + struct rtable *rt = (struct rtable *) dst;
> + __be32 orig_gw = rt->rt_gateway;
> + struct neighbour *n, *old_n;
> +
> + dst_confirm(&rt->dst);
> +
> + rt->rt_gateway = peer->redirect_learned.a4;
> +
> + n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
> + if (IS_ERR(n))
> + return PTR_ERR(n);
> + old_n = xchg(&rt->dst._neighbour, n);
> + if (old_n)
> + neigh_release(old_n);
> + if (!n || !(n->nud_state & NUD_VALID)) {
> + if (n)
> + neigh_event_send(n, NULL);
> + rt->rt_gateway = orig_gw;
> + return -EAGAIN;
> + } else {
> + rt->rt_flags |= RTCF_REDIRECTED;
> + call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> + }
> + return 0;
> +}
> +
> /* called in rcu_read_lock() section */
> void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
> __be32 saddr, struct net_device *dev)
> {
> int s, i;
> struct in_device *in_dev = __in_dev_get_rcu(dev);
> - struct rtable *rt;
> __be32 skeys[2] = { saddr, 0 };
> int ikeys[2] = { dev->ifindex, 0 };
> - struct flowi4 fl4;
> struct inet_peer *peer;
> struct net *net;
>
> @@ -1336,33 +1362,42 @@ void ip_rt_redirect(__be32 old_gw, __be32
> daddr, __be32 new_gw, goto reject_redirect;
> }
>
> - memset(&fl4, 0, sizeof(fl4));
> - fl4.daddr = daddr;
> for (s = 0; s < 2; s++) {
> for (i = 0; i < 2; i++) {
> - fl4.flowi4_oif = ikeys[i];
> - fl4.saddr = skeys[s];
> - rt = __ip_route_output_key(net, &fl4);
> - if (IS_ERR(rt))
> - continue;
> -
> - if (rt->dst.error || rt->dst.dev != dev ||
> - rt->rt_gateway != old_gw) {
> - ip_rt_put(rt);
> - continue;
> - }
> + unsigned int hash;
> + struct rtable __rcu **rthp;
> + struct rtable *rt;
> +
> + hash = rt_hash(daddr, skeys[s], ikeys[i],
> rt_genid(net)); +
> + rthp = &rt_hash_table[hash].chain;
> +
> + while ((rt = rcu_dereference(*rthp)) !=
> NULL) {
> + rthp = &rt->dst.rt_next;
> +
> + if (rt->rt_key_dst != daddr ||
> + rt->rt_key_src != skeys[s] ||
> + rt->rt_oif != ikeys[i] ||
> + rt_is_input_route(rt) ||
> + rt_is_expired(rt) ||
> + !net_eq(dev_net(rt->dst.dev),
> net) ||
> + rt->dst.error ||
> + rt->dst.dev != dev ||
> + rt->rt_gateway != old_gw)
> + continue;
>
I know we are reverting to get it fixed, but this adds the routing
cache back, so what is the plan? Revert to get it working and then
think on new approach to remove the route cache again later?
I had one previous patch using the routing cache posted to the list,
but it won't fix the route flush problem.
thanks,
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 16:30 ` Flavio Leitner
@ 2011-11-18 16:34 ` Eric Dumazet
2011-11-18 17:05 ` Flavio Leitner
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-11-18 16:34 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Le vendredi 18 novembre 2011 à 14:30 -0200, Flavio Leitner a écrit :
> I know we are reverting to get it fixed, but this adds the routing
> cache back, so what is the plan? Revert to get it working and then
> think on new approach to remove the route cache again later?
>
> I had one previous patch using the routing cache posted to the list,
> but it won't fix the route flush problem.
>
I dont "add the routing cache back".
Note I only fix existing route entries in the cache ;)
A "revert" is probably safe, since we should push a fix for 3.0/3.1/3.2
kernels...
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 16:34 ` Eric Dumazet
@ 2011-11-18 17:05 ` Flavio Leitner
2011-11-18 17:07 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-18 17:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Fri, 18 Nov 2011 17:34:06 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 18 novembre 2011 à 14:30 -0200, Flavio Leitner a écrit :
>
> > I know we are reverting to get it fixed, but this adds the routing
> > cache back, so what is the plan? Revert to get it working and then
> > think on new approach to remove the route cache again later?
> >
> > I had one previous patch using the routing cache posted to the list,
> > but it won't fix the route flush problem.
> >
>
> I dont "add the routing cache back".
Sorry, I meant that we are trying to avoid doing this:
+ hash = rt_hash(daddr, skeys[s], ikeys[i],rt_genid(net));
+
+ rthp = &rt_hash_table[hash].chain;
+
+ while ((rt = rcu_dereference(*rthp)) != NULL) {
+ rthp = &rt->dst.rt_next;
anyway, see below.
> Note I only fix existing route entries in the cache ;)
Exactly.
> A "revert" is probably safe, since we should push a fix for
> 3.0/3.1/3.2 kernels...
I agree that reverting is probably safe.
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 17:05 ` Flavio Leitner
@ 2011-11-18 17:07 ` Eric Dumazet
2011-11-18 17:21 ` Flavio Leitner
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-11-18 17:07 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
Le vendredi 18 novembre 2011 à 15:05 -0200, Flavio Leitner a écrit :
> Sorry, I meant that we are trying to avoid doing this:
> + hash = rt_hash(daddr, skeys[s], ikeys[i],rt_genid(net));
> +
> + rthp = &rt_hash_table[hash].chain;
> +
> + while ((rt = rcu_dereference(*rthp)) != NULL) {
> + rthp = &rt->dst.rt_next;
Sure, but this is still needed right now.
Once route cache is removed, this loop wont exist anymore ;)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 17:07 ` Eric Dumazet
@ 2011-11-18 17:21 ` Flavio Leitner
2011-11-18 18:04 ` David Miller
0 siblings, 1 reply; 26+ messages in thread
From: Flavio Leitner @ 2011-11-18 17:21 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Ivan Zahariev, netdev, Vasiliy Kulikov
On Fri, 18 Nov 2011 18:07:53 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 18 novembre 2011 à 15:05 -0200, Flavio Leitner a écrit :
>
> > Sorry, I meant that we are trying to avoid doing this:
> > + hash = rt_hash(daddr, skeys[s],
> > ikeys[i],rt_genid(net)); +
> > + rthp = &rt_hash_table[hash].chain;
> > +
> > + while ((rt = rcu_dereference(*rthp)) !=
> > NULL) {
> > + rthp = &rt->dst.rt_next;
>
> Sure, but this is still needed right now.
Yes, David will not be happy, unfortunately :)
> Once route cache is removed, this loop wont exist anymore ;)
That's the problem, we need to get rid of it first. :)
fbl
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 17:21 ` Flavio Leitner
@ 2011-11-18 18:04 ` David Miller
0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2011-11-18 18:04 UTC (permalink / raw)
To: fbl; +Cc: eric.dumazet, famzah, netdev, segoon
From: Flavio Leitner <fbl@redhat.com>
Date: Fri, 18 Nov 2011 15:21:42 -0200
> On Fri, 18 Nov 2011 18:07:53 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> Le vendredi 18 novembre 2011 à 15:05 -0200, Flavio Leitner a écrit :
>>
>> > Sorry, I meant that we are trying to avoid doing this:
>> > + hash = rt_hash(daddr, skeys[s],
>> > ikeys[i],rt_genid(net)); +
>> > + rthp = &rt_hash_table[hash].chain;
>> > +
>> > + while ((rt = rcu_dereference(*rthp)) !=
>> > NULL) {
>> > + rthp = &rt->dst.rt_next;
>>
>> Sure, but this is still needed right now.
>
> Yes, David will not be happy, unfortunately :)
He better be happy that someone is fixing all the bugs he added.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Unable to flush ICMP redirect routes in kernel 3.0+
2011-11-18 16:02 ` Eric Dumazet
2011-11-18 16:30 ` Flavio Leitner
@ 2011-11-18 20:26 ` David Miller
1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2011-11-18 20:26 UTC (permalink / raw)
To: eric.dumazet; +Cc: fbl, famzah, netdev, segoon
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 Nov 2011 17:02:08 +0100
> David, unless I missed something, we should revert commit f39925dbde77
> ipv4: Cache learned redirect information in inetpeer.)
>
> With following patch, redirects now work for me.
Yes, it doesn't work very well... sigh.
I've applied your patch and queued it up for stable.
Long term we need a different scheme for redirects.
Thanks!
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2011-11-18 20:27 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-15 20:23 Unable to flush ICMP redirect routes in kernel 3.0+ Ivan Zahariev
2011-11-15 21:09 ` Eric Dumazet
2011-11-16 22:32 ` Ivan Zahariev
2011-11-17 0:33 ` Flavio Leitner
2011-11-17 8:10 ` Ivan Zahariev
2011-11-17 13:11 ` Flavio Leitner
2011-11-17 13:15 ` Eric Dumazet
2011-11-17 14:40 ` Eric Dumazet
2011-11-17 15:37 ` Flavio Leitner
2011-11-17 16:31 ` Eric Dumazet
2011-11-17 16:40 ` Flavio Leitner
2011-11-17 16:45 ` Eric Dumazet
2011-11-17 16:57 ` Eric Dumazet
2011-11-17 17:01 ` Flavio Leitner
2011-11-17 17:18 ` Eric Dumazet
2011-11-17 17:33 ` Flavio Leitner
2011-11-17 17:38 ` Eric Dumazet
2011-11-18 16:02 ` Eric Dumazet
2011-11-18 16:30 ` Flavio Leitner
2011-11-18 16:34 ` Eric Dumazet
2011-11-18 17:05 ` Flavio Leitner
2011-11-18 17:07 ` Eric Dumazet
2011-11-18 17:21 ` Flavio Leitner
2011-11-18 18:04 ` David Miller
2011-11-18 20:26 ` David Miller
2011-11-17 16:52 ` Vasiliy Kulikov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).