From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jay Vosburgh
Subject: Re: [Bonding-devel] quick help with bonding?
Date: Thu, 29 Mar 2007 17:13:51 -0700
Message-ID: <11531.1175213631@death>
References: <460BE5F0.7070606@nortel.com>
	<20070329181617.GA25770@gospo.rdu.redhat.com>
	<460C38EF.1080509@nortel.com>
	<4074.1175207458@death>
	<460C4EFE.6030505@nortel.com>
Cc: Andy Gospodarek, netdev@vger.kernel.org,
	bonding-devel@lists.sourceforge.net
To: "Chris Friesen"
Return-path:
Received: from e4.ny.us.ibm.com ([32.97.182.144]:50032 "EHLO
	e4.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S934369AbXC3ANz (ORCPT );
	Thu, 29 Mar 2007 20:13:55 -0400
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])
	by e4.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l2U0Dsjn011748
	for ; Thu, 29 Mar 2007 20:13:54 -0400
Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217])
	by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.3) with ESMTP id l2U0Ds63282966
	for ; Thu, 29 Mar 2007 20:13:54 -0400
Received: from d01av03.pok.ibm.com (loopback [127.0.0.1])
	by d01av03.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l2U0DrMl005624
	for ; Thu, 29 Mar 2007 20:13:54 -0400
In-reply-to: <460C4EFE.6030505@nortel.com>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Chris Friesen wrote:
>Jay Vosburgh wrote:
>
>> 	2.6.10 is pretty old, and there have been a number of fixes to
>> the bonding ARP monitor since then, so it may be that it is simply
>> misbehaving (presuming that you're running the 2.6.10 bonding driver).
>> Are you in a position to test against a more recent kernel (and/or
>> bonding driver)?  Does the miimon misbehave in a similar fashion?
>
>Testing a more recent kernel is problematic.  A new bonding driver could
>be possible, assuming the code hasn't changed too much.
>
>I just did another experiment.  Normally we boot via eth4 (which then
>becomes part of the bond with eth5 at init time).
>If I boot via eth6
>instead, it appears as though the problem doesn't show up.

	Well, if you're still inclined to investigate, you may want to
inspect the ARP probes generated by bonding in the "bad" situation.  I
don't really have any evidence to back it up, but one guess is that the
IP detection stuff in the ARP monitor is getting messed up.  I'd check
to see if the ARP probes have the correct source IP address (which, in
the 2.6.10 era bonding, is determined only once by inspection of
outbound ARP traffic, and never updated).

	If you're not using active-backup mode (you didn't say, and I
can't tell from your log excerpt), then the ARP monitor may not work at
all (since it will send ARP probes with an IP source of all zeros).

	If bad ARP probe source addresses are your problem, then that is
fixed in a later version of bonding, although the changes would require
some rework to backport to 2.6.10 (if they can be backported).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
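
	For what it's worth, the source address check suggested above
could be scripted roughly as follows.  This is only a sketch, not
anything taken from the bonding driver: it assumes you already have raw,
untagged Ethernet frames in hand (saved from a capture on the slave
interface, say), and the names arp_probe_addresses and classify_probe
are made up for illustration.  It pulls the sender and target IP out of
each ARP request and flags probes whose source is all zeros or differs
from the address you expect the bond to use.

```python
import socket
import struct

ETH_P_ARP = 0x0806      # EtherType for ARP
ARP_REQUEST = 1         # ARP opcode for a request
ETH_ARP_LEN = 42        # 14-byte Ethernet header + 28-byte ARP payload


def arp_probe_addresses(frame: bytes):
    """Return (sender_ip, target_ip) for an IPv4-over-Ethernet ARP request.

    Returns None for anything else. Assumes an untagged (no VLAN) frame.
    """
    if len(frame) < ETH_ARP_LEN:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    if oper != ARP_REQUEST:
        return None
    sender_ip = socket.inet_ntoa(frame[28:32])   # sender protocol address
    target_ip = socket.inet_ntoa(frame[38:42])   # target protocol address
    return sender_ip, target_ip


def classify_probe(frame: bytes, expected_src: str):
    """Classify an ARP request by its source IP.

    Returns "ok", "zero-source", "wrong-source", or None if the frame
    is not an ARP request at all.
    """
    addrs = arp_probe_addresses(frame)
    if addrs is None:
        return None
    sender_ip, _target_ip = addrs
    if sender_ip == "0.0.0.0":
        return "zero-source"
    if sender_ip != expected_src:
        return "wrong-source"
    return "ok"


def make_arp_request(src_mac: bytes, src_ip: str, dst_ip: str) -> bytes:
    """Build a broadcast ARP request frame (for trying out the checks)."""
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", ETH_P_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    arp += src_mac + socket.inet_aton(src_ip)
    arp += b"\x00" * 6 + socket.inet_aton(dst_ip)
    return eth + arp


if __name__ == "__main__":
    mac = bytes.fromhex("001122334455")
    for src in ("10.0.0.5", "0.0.0.0", "192.168.1.9"):
        frame = make_arp_request(mac, src, "10.0.0.1")
        print(src, classify_probe(frame, expected_src="10.0.0.5"))
```

	The addresses in the example are placeholders.  A quick look
with something like "tcpdump -n -e arp" on the slave would show the
same fields without any scripting.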