From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: 3.0: unexpected route cache entry for wrong segment?
Date: Thu, 09 Feb 2012 21:02:06 +0400
Message-ID: <4F33FC0E.4020701@msgid.tls.msk.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
To: netdev
Return-path:
Received: from isrv.corpit.ru ([86.62.121.231]:53071 "EHLO isrv.corpit.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757212Ab2BIRCK (ORCPT ); Thu, 9 Feb 2012 12:02:10 -0500
Sender: netdev-owner@vger.kernel.org
List-ID:

Hello.

I'm observing a situation where a single IP address from an
entirely different segment gets routed locally, as if it were
on a directly connected network. Here's how.

The short version, to show the idea, first. A host with a single
eth0 interface and a single IP address (not counting the loopback
interface):

  $ ip addr
  8: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
      link/ether 52:54:c0:a8:b1:02 brd ff:ff:ff:ff:ff:ff
      inet 192.168.177.2/26 scope global eth0

  $ ip route
  default via 192.168.177.5 dev eth0
  192.168.177.0/26 dev eth0 proto kernel scope link src 192.168.177.2

  $ ip neigh
  ...
  192.168.177.5 dev eth0 lladdr 00:90:27:30:6d:1c REACHABLE
  192.168.177.33 dev eth0 lladdr 38:60:77:25:3f:95 REACHABLE
  192.168.19.166 dev eth0 FAILED
  192.168.177.21 dev eth0 lladdr 52:54:c0:a8:b1:15 REACHABLE

The address in question is 192.168.19.166 -- it should not be tried
on the locally connected ethernet segment, but instead should go to
the (default) gateway at 192.168.177.5.

This machine is running a 3.0.18 kernel. The gateway (also running
this kernel) can access the IP in question just fine (it is 2 hops
away from the gateway, and is not directly reachable from either the
gw or the machine in question).
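
To make the expected behaviour concrete, here is a minimal sketch of a plain longest-prefix-match lookup over the two routes shown above. It is only an illustration: the names ROUTES and lookup are made up for this example, and it models the routing table alone, not the kernel's route cache or any ICMP redirect handling, which is where the problem presumably lies.

```python
import ipaddress

# The guest's routing table as printed by `ip route` above.
# Each entry is (prefix, gateway); gateway None means "directly connected".
# ROUTES and lookup() are illustrative names, not kernel or iproute2 APIs.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.177.5"),
    (ipaddress.ip_network("192.168.177.0/26"), None),
]


def lookup(dst):
    """Return the next hop for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    # Collect every route whose prefix contains the destination.
    matches = [(net, gw) for net, gw in ROUTES if addr in net]
    # The most specific (longest) prefix wins.
    net, gw = max(matches, key=lambda m: m[0].prefixlen)
    return gw if gw is not None else "on-link"


if __name__ == "__main__":
    for dst in ("192.168.19.166", "192.168.177.33"):
        print(dst, "->", lookup(dst))
```

Under this model 192.168.19.166 matches only the default route, so it should be sent to 192.168.177.5, while 192.168.177.33 falls inside 192.168.177.0/26 and is resolved via ARP on the local segment. The neighbour table above shows the opposite for 192.168.19.166.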

After some searching we found a very similar-looking issue:

  http://lists.openwall.net/netdev/2011/11/15/126
  "Unable to flush ICMP redirect routes in kernel 3.0+"

with a good reproducer:

  http://lists.openwall.net/netdev/2011/11/16/138

The issue, however, is that in our case I can't reproduce this
problem at all using the way described by Ivan Zahariev in the last
message: sending redirects from the gateway for "random" addresses
does not create corresponding "persistent" cache entries. Once the
route on the gw gets removed, that IP address starts working again
from the machine in question.

So now we have only one IP address that behaves like this, and I
can't get other addresses to repeat its behaviour. The problem
appeared suddenly, while the network was in use.

What is also interesting here is that the gateway should never send
a redirect like that, because it has an explicit route for that
network pointing to an entirely different machine.

I can work around the _current_ problem we're facing by moving the
host in question (192.168.19.166) to another IP address. But I'd
love to understand what's going on here.

Also, it appears that the patch that emerged from the mentioned
discussion hasn't been released in any stable kernel so far - is
there some issue with it?

And since I can't reproduce the issue here as described above, I
have one more question: should it be reproducible?

And finally, here are some more details about our setup. It is
actually a "bit" more complex, involving bridges, vlans, veth and
tap devices. The "host" in question is an lxc guest on a veth
interface. Its veth iface is connected to a bridge "tls-br" on the
host. I'm still omitting some details (like other lxc guests, which
have a very similar config, and also kvm guests with tap interfaces).

  host$ ip addr
  2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
      link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
  3: tls-vlan@eth0: mtu 1500 qdisc noqueue master tls-br state UP
      link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
  4: tls-br: mtu 1500 qdisc noqueue state UP
      link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
      inet 192.168.177.15/26 brd 192.168.177.63 scope global tls-br
  9: veth-tsrv: mtu 1500 qdisc pfifo_fast master tls-br state UP qlen 1000
      link/ether 5e:e8:4f:67:80:17 brd ff:ff:ff:ff:ff:ff

tls-br connects tls-vlan@eth0 and veth-tsrv. It has an address from
the same 192.168.177/26 segment as the guest in question.

  host$ ip route
  default via 192.168.177.5 dev tls-br
  192.168.177.0/26 dev tls-br proto kernel scope link src 192.168.177.15

(this is the complete routing table; there are no more routes)

What is also very interesting is that the problem with this single
IP address affects ALL lxc machines on this host at once, and the
host itself:

  host$ ip neigh
  192.168.177.35 dev tls-br lladdr 6c:f0:49:9d:f2:0c STALE
  192.168.19.166 dev tls-br FAILED
  192.168.177.38 dev tls-br lladdr 38:60:77:25:3f:9c STALE
  192.168.177.5 dev tls-br lladdr 00:90:27:30:6d:1c DELAY
  ...

(after trying to ping it).

Each "subdivision" on this host has its own arp table, but every
subdivision (the host itself or any of its lxc guests, which all
have a similar config) always tries to reach this very IP address
directly.

  otherLXCguest$ ip n
  192.168.19.166 dev eth0 INCOMPLETE
  192.168.177.15 dev eth0 lladdr 00:1f:c6:ef:e5:1b STALE
  192.168.177.5 dev eth0 lladdr 00:90:27:30:6d:1c DELAY

So.. it looks like something does not work right across namespaces.

Any clue what's going on?

Thank you!

/mjt