From: "Shaun Kemp"
Subject: Bug ? IF_RUNNING/routing table updates
Date: Mon, 9 Oct 2006 11:55:28 +0100
Message-ID: <018201c6eb91$816764e0$8100a8c0@stealth00025>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Hi,

We appear to have encountered a bug with kernel routing table updates. If an interface (and its associated IP network) loses its IF_RUNNING flag, it is still used for routing. More details below:

Flavour: Debian
Quagga: ii quagga 0.96.5-11 Unoff. successor of the Zebra BGP/OSPF/RIP r
Kernel: 2.4.27-2-386, but the same behaviour was noticed on 2.6.* releases.

0000:02:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)
0000:02:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)

When operating any of the dynamic routing protocols under Quagga (or the older "Zebra"), a niggling problem keeps surfacing which, under certain circumstances, curtails the operation of routing protocols in multihomed server environments. Similar reports of the problem appear to be littered around various lists, but to date I can see no solution and it remains a problem, hence the post.

Whilst my specific topology is rather complex, I can define the problem generally as:

An interface (and its connected IP network) which loses its IF_RUNNING flag (i.e. is unusable for routing) persists in the routing table as a kernel route.
Thus, rather than following a dynamically announced route to this connected network (the connected route being unreachable because the interface is down, but the dynamic route offering an alternate path), the box insists on trying to route traffic out of the broken interface via this "kernel"-sourced route. See the example below:

------------------
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:20:ED:35:D4:C8
          inet addr:192.168.0.143  Bcast:192.168.0.191  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:20:ED:35:D4:C9
          inet addr:192.168.0.207  Bcast:192.168.0.255  Mask:255.255.255.192
          UP BROADCAST MULTICAST  MTU:1500  Metric:1

# ip route show
192.168.0.128/26 dev eth0  proto kernel  scope link  src 192.168.0.143
192.168.0.192/26 dev eth1  proto kernel  scope link  src 192.168.0.207
192.168.0.192/26 via 192.168.0.130 dev eth0  proto zebra  metric 60 equalize

# ping {anything on 192.168.0.192}

The path for 192.168.0.192 is learned via 192.168.0.130 (the current OSPF DR, which is irrelevant here), but it will never be used, presumably (from Cisco experience) because the kernel-sourced, directly connected route is still sitting in the table.

Furthermore, if I then ifdown eth1, everything is fine, but I don't want to do this manually every time there's an interface problem, because that's why we run OSPF! =:D

Not sure whether this is a "driver tells the kernel" or a "kernel checks the driver at {n} intervals" issue. I would suggest the former would be more correct, but it is a problem regardless. Maybe it's just these Intel drivers? :/

Thanks for your time,

Shaun Kemp.
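
As a rough illustration of the symptom described above, here is a minimal Python sketch that takes the ifconfig and ip route output from the example and reports kernel-sourced routes that still point at an interface without the RUNNING flag. The function names (running_interfaces, parse_routes, stale_kernel_routes, preferred_route) are hypothetical helpers invented for this sketch, not part of any tool. Route preference is modelled very simply, as "lowest metric wins for an identical prefix, with a missing metric treated as 0"; that is an assumption that matches the example but is not a full description of the kernel's lookup.

```python
# Sample text copied from the example in the message.
IFCONFIG_SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:20:ED:35:D4:C8
          inet addr:192.168.0.143  Bcast:192.168.0.191  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:20:ED:35:D4:C9
          inet addr:192.168.0.207  Bcast:192.168.0.255  Mask:255.255.255.192
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
"""

ROUTE_SAMPLE = """\
192.168.0.128/26 dev eth0  proto kernel  scope link  src 192.168.0.143
192.168.0.192/26 dev eth1  proto kernel  scope link  src 192.168.0.207
192.168.0.192/26 via 192.168.0.130 dev eth0  proto zebra  metric 60 equalize
"""


def running_interfaces(ifconfig_text):
    """Map each interface name to True if its flags line contains RUNNING."""
    state = {}
    current = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface stanza.
            current = line.split()[0]
            state[current] = False
        elif current and "MTU:" in line:
            # The flags (UP, RUNNING, ...) share a line with MTU in this format.
            state[current] = "RUNNING" in line.split()
    return state


def parse_routes(route_text):
    """Parse 'ip route show' lines into dicts; a missing metric is assumed 0."""
    routes = []
    for line in route_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        route = {"prefix": tokens[0], "metric": 0}
        # Walk adjacent token pairs and pick out the keywords we care about.
        for key, value in zip(tokens[1:], tokens[2:]):
            if key in ("dev", "proto", "via"):
                route[key] = value
            elif key == "metric":
                route[key] = int(value)
        routes.append(route)
    return routes


def stale_kernel_routes(routes, state):
    """Kernel-sourced routes whose outgoing interface is not RUNNING."""
    return [r for r in routes
            if r.get("proto") == "kernel" and not state.get(r.get("dev"), True)]


def preferred_route(routes, prefix):
    """Simplified model: for an identical prefix, the lowest metric wins."""
    candidates = [r for r in routes if r["prefix"] == prefix]
    return min(candidates, key=lambda r: r["metric"])


if __name__ == "__main__":
    state = running_interfaces(IFCONFIG_SAMPLE)
    routes = parse_routes(ROUTE_SAMPLE)
    for r in stale_kernel_routes(routes, state):
        print("stale kernel route:", r["prefix"], "dev", r["dev"])
    print("preferred for 192.168.0.192/26:",
          preferred_route(routes, "192.168.0.192/26"))
```

Under that simplified model, the eth1 kernel route (no metric shown, so treated as 0) is preferred over the zebra route with metric 60, which is consistent with the behaviour reported above.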