From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jay Vosburgh
Subject: Re: [Bugme-new] [Bug 29712] New: Bonding Driver(version : 3.5.0) - Problem with ARP monitoring in active backup mode
Date: Fri, 04 Mar 2011 10:18:33 -0800
Message-ID: <19583.1299262713@death>
References: <20110224145129.f366b59e.akpm@linux-foundation.org> <4D672525.5080609@hp.com> <19879D0AB3081A4B883186484ECC6FC05E780ADF@MPBAGVEX02.corp.mphasis.com> <17444.1298660550@death>
Cc: "Brian Haley" , "Andrew Morton" , bugzilla-daemon@bugzilla.kernel.org, bugme-daemon@bugzilla.kernel.org, netdev@vger.kernel.org
To: "Harsha R02"
Return-path:
Received: from e2.ny.us.ibm.com ([32.97.182.142]:45075 "EHLO e2.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753173Ab1CDSSj (ORCPT ); Fri, 4 Mar 2011 13:18:39 -0500
Received: from d01dlp01.pok.ibm.com (d01dlp01.pok.ibm.com [9.56.224.56]) by e2.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p24I0Jb7005153 for ; Fri, 4 Mar 2011 13:00:19 -0500
Received: from d01relay03.pok.ibm.com (d01relay03.pok.ibm.com [9.56.227.235]) by d01dlp01.pok.ibm.com (Postfix) with ESMTP id 4DE4038C803C for ; Fri, 4 Mar 2011 13:18:37 -0500 (EST)
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay03.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p24IIc23318514 for ; Fri, 4 Mar 2011 13:18:38 -0500
Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p24IIaER008371 for ; Fri, 4 Mar 2011 15:18:38 -0300
In-reply-to:
Sender: netdev-owner@vger.kernel.org
List-ID:

Harsha R02 wrote:

>We found that the patch that is presented here has some issues and we
>cannot go with this solution.
>
>In function "bond_ab_arp_probe" in addition to sending arp probes for
>the currently active slave we should also
>be sending arp probes for the primary_slave if the link status of the
>primary slave is up correct ?
>
>I have made changes as below :
>
>static void bond_ab_arp_probe(struct bonding *bond)
>{
>	struct slave *slave;
>	int i;
>
>	read_lock(&bond->curr_slave_lock);
>
>	if (bond->current_arp_slave && bond->curr_active_slave)
>		pr_info(DRV_NAME "PROBE: c_arp %s && cas %s BAD\n",
>			bond->current_arp_slave->dev->name,
>			bond->curr_active_slave->dev->name);
>
>	if (bond->curr_active_slave) {
>+		if((bond->curr_active_slave != bond->primary_slave) &&
>+		   (IS_UP(bond->primary_slave->dev))) {
>+			bond_arp_send_all(bond, bond->primary_slave);
>+		}
>		bond_arp_send_all(bond, bond->curr_active_slave);
>		read_unlock(&bond->curr_slave_lock);

No, we can't do this; if we send ARP probes out from an inactive
slave (which the primary would be at this point) it will confuse
switches that snoop traffic to determine the switch port's MAC
addresses (the switches will believe that the "primary" slave is the
port to use to reach the bond's MAC address).

I think your problem is that your configuration (two systems, back to
back, no switch) is not a configuration the ARP monitor is designed to
work with.

The ARP monitor determines the availability of backup slaves based
on traffic received by the backup slaves. The usual source of this
traffic is the ARP broadcast requests being sent out the active slave
and then forwarded by the switch to all switch ports, including the
backup slave's port.

I'm guessing that your system isn't forwarding these packets like
a switch would, and so the primary slave isn't seeing any incoming
packets at all. If your primary slave (which is an inactive slave at
the moment) is not receiving traffic, bonding will never believe it is
available.

I've never experimented with using the ARP monitor in a
back-to-back configuration; I'm thinking through how the ARP monitor
functions, and I'm not sure it can be reliable when set up like this.

-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com