From: Ivan Zahariev
Subject: Re: Unable to flush ICMP redirect routes in kernel 3.0+
Date: Thu, 17 Nov 2011 00:32:18 +0200
Message-ID: <4EC439F2.3080809@icdsoft.com>
References: <4EC2CA52.6020104@icdsoft.com> <1321391355.2602.0.camel@edumazet-laptop>
In-Reply-To: <1321391355.2602.0.camel@edumazet-laptop>
To: netdev@vger.kernel.org

On 11/15/2011 11:09 PM, Eric Dumazet wrote:
> On Tuesday, 15 November 2011 at 22:23 +0200, Ivan Zahariev wrote:
>> Hello,
>>
>> We have changed nothing in our network infrastructure but only upgraded
>> from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem we are
>> experiencing:
>>
>> ICMP redirected routes are cached forever, and they can be cleared only
>> by a reboot.
>>
>> Here is an example:
>>
>> root@machine5:~# ip route get 1.1.1.1
>> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
>>     cache ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>>
>> root@machine5:~# ip route list cache match 1.1.1.1
>> 1.1.1.1 tos lowdelay via 9.0.0.1 dev eth0 src 5.5.5.5
>>     cache ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
>>     cache ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>> ...(two more entries, all go via 9.0.0.1)...
>>
>> 1.1.1.1 is the test destination address
>> 5.5.5.5 is the source IP address of "machine5" via dev eth0, the only
>> interface besides "lo"
>> 9.0.0.1 is the incorrect gateway which we were redirected to; we want to
>> change the route to 9.0.0.8
>>
>> I found no way to clear this route.
>> What I tried:
>>
>> root@machine5:~# ip route flush cache ### CACHE FLUSH ###
>> root@machine5:~# ip route list cache match 1.1.1.1 # empty
>>
>> root@machine5:~# ip route flush cache ### CACHE FLUSH ###
>> root@machine5:~# echo 1> /proc/sys/net/ipv4/route/flush
>> root@machine5:~# ip route list cache match 1.1.1.1 # empty
>>
>> root@machine5:~# ip route get 1.1.1.1 # magically re-inserts the
>> route, tcpdump sees NO ICMP traffic
>> 1.1.1.1 via 9.0.0.1 dev eth0 src 5.5.5.5
>>     cache ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>>
>> I also tried to force a scheduled route flush:
>>
>> root@machine5:~# echo 1> /proc/sys/net/ipv4/route/gc_timeout
>> root@machine5:~# echo 1> /proc/sys/net/ipv4/route/gc_interval
>>
>> A reboot fixed it all.
>>
>> This may be related to the "Several major changes to our routing
>> infrastructure" (https://lkml.org/lkml/2011/3/16/384).
>> Other users are reporting the same problem:
>> * https://plus.google.com/u/0/117161704068825702652/posts/1UK1Rp4KA4J
>> * http://lists.debian.org/debian-kernel/2011/10/msg00633.html
>> Other similar issues:
>> * http://www.spinics.net/lists/netdev/msg176966.html
>> * http://forums.gentoo.org/viewtopic-t-901024-start-0.html
>>
>> This has been occurring on a few KVM guest machines and also on a
>> regular Linux machine, so it's not KVM related.
>>
>> Is this a bug, or it's me who's missing something?
>>
> It is a bug, and as such could you provide needed information for us to
> reproduce it ?
>
> What is your network setup ?

Network setup is nothing fancy. We have the following machines on a
single /24 ethernet segment:
* 192.168.0.244 (machine5) -- the machine on which we reproduce the
  kernel routing bug; kernel: 3.0.3-grsec
* 192.168.0.8 (router8) -- the default gw for the whole
  192.168.0.0/24 network; does SNAT; kernel: 2.6.32-5-686
* 192.168.0.120 -- another host with disabled ip_forwarding; must be up
  and reachable

There are two bugs actually:
1. Basically, *any* ICMP redirect is cached forever.
2. The output of "ip route" is not consistent with the kernel's routing
   behavior.

Quick fix: Disabling "net.ipv4.conf.*.accept_redirects" on all
interfaces works OK and prevents ICMP redirects from affecting the
internal route cache.

Here is a sample test-case scenario:

### right after a clean machine reboot
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
    cache

### make a TCP request; the TCP packets go to the default gw 192.168.0.8; we see this with a tcpdump at 192.168.0.8
root@machine5:~# telnet 8.8.4.4

### route is still OK and as expected
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
    cache ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0 src 192.168.0.244
    cache ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
    cache
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
    cache

### change route to a fake host on the same subnet, so that an ICMP redirect will follow later
### we also disable NAT for 192.168.0.244, so that an ICMP redirect is sent accordingly
root@router8:~# route add -host 8.8.4.4 gw 192.168.0.120

### first TCP packet goes to the default gw 192.168.0.8; we see this with a tcpdump at 192.168.0.8
root@machine5:~# telnet 8.8.4.4

### at machine5: we got the ICMP redirect from the default gw, as expected
# tcpdump: IP 192.168.0.8 > 192.168.0.244: ICMP redirect 8.8.4.4 to host 192.168.0.120, length 68

### the TCP packets now start to use the route 192.168.0.120; we see this with a tcpdump at 192.168.0.120
root@machine5:~# telnet 8.8.4.4

### (bug #2) what "ip route" returns is inconsistent, because we are using the route 192.168.0.120 in reality
### note that the count of the route lines increased by one
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
    cache ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0 src 192.168.0.244
    cache ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
    cache
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
    cache ipid 0x303a
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0 src 192.168.0.244
    cache

### restore the route on the default gw 192.168.0.8, so that it accepts 8.8.4.4 as destination again
### restore NAT for 192.168.0.244
root@router8:~# route del -host 8.8.4.4 gw 192.168.0.120

### (bug #1) even though we flushed the route cache, the route resurrects from somewhere; even without making any TCP requests
### this time what "ip" returns is consistent with the real (incorrect) routing behavior of machine5
root@machine5:~# ip route flush cache
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
    cache ipid 0x303a

### the TCP packets STILL use the route 192.168.0.120; we see this with a tcpdump at 192.168.0.120
root@machine5:~# telnet 8.8.4.4

### only a reboot clears the cached routes

Cheers.
--Ivan