From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jay Vosburgh
Subject: Re: [PATCH] bonding: If IP route look-up to send an ARP fails, mark in bonding structure as no ARP sent.
Date: Thu, 21 Nov 2013 18:43:44 -0800
Message-ID: <8059.1385088224@death.nxdomain>
References: <528D5980.3040309@oracle.com> <20131121111022.GA30998@redhat.com> <528E6E40.6020201@oracle.com> <17860.1385068379@death.nxdomain> <528EA6A1.5040209@oracle.com>
In-reply-to: <528EA6A1.5040209@oracle.com>
To: rama nichanamatlu
Cc: Veaceslav Falico, netdev@vger.kernel.org
Sender: netdev-owner@vger.kernel.org
List-ID:

rama nichanamatlu wrote:

>On 11/21/2013 1:12 PM, Jay Vosburgh wrote:
>> rama nichanamatlu wrote:
>>
>>> On 11/21/2013 3:10 AM, Veaceslav Falico wrote:
>>>> On Wed, Nov 20, 2013 at 04:53:20PM -0800, rama nichanamatlu wrote:
>>>>> During the creation of VLANs atop bonding, the underlying
>>>>> interfaces are made part of the VLANs, and at the same time the
>>>>> bonding driver becomes aware that VLANs exist above it. It
>>>>> therefore consults IP routing for every ARP to be sent, to
>>>>> determine the route that tells the bonding driver the correct
>>>>> VLAN tag to attach to the outgoing ARP packet. However, during
>>>>> VLAN creation the vlan driver first puts the underlying interface
>>>>> into the default VLAN and then into the actual VLAN; if the
>>>>> bonding driver consults IP for a route in between these two
>>>>> steps, IP fails to provide a correct route, and the bonding
>>>>> driver drops the ARP packet. When the ARP monitor comes around
>>>>> the next time, it sees no ARP response and fails over to the next
>>>>> available slave. The route lookup, ip_route_output(), happens in
>>>>> bond_arp_send_all().
>>>>
>>>> bonding works as expected - nothing to fix here. And even as a
>>>> workaround/hack - I'm not sure we need that to suppress one
>>>> failover *only* when vlan is added on top.
>>>>
>>> Thank you.
>>> Without this change our systems failed system testing: they were
>>> not consistently on the designated primary interface on every
>>> single reboot. With this change the behavior was as expected even
>>> after a few thousand reboots, and system testing could move to the
>>> next level, catching another bug in SR-IOV :). Without it, the
>>> outcome after a reboot was less predictable, and bonding was on a
>>> different slave each time.
>>> -Rama
>>
>> By "designated primary" you mean the bonding primary option,
>> correct?
>Yes, correct. The bonding primary parameter is set,
>e.g. primary=eth1 and primary_reselect=2.
>Hence it is expected to be on the primary on every reboot.

If I set up a basic bonding configuration like:

[ eth3, eth4 ] -> bond0 -> bond0.66, with primary=eth3 primary_reselect=2

and then look at dmesg, I see this sequence: the bond is set up first,
with an arp_ip_target on a VLAN destination.  The slaves are added to
the bond.  The VLAN interface is configured above the bond, and brought
up.  The slaves become link up after autonegotiation, the ARP monitor
commences, and eth3 is made the active slave.
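For reference, a topology like the one described above can be built
with iproute2 commands along these lines. This is a sketch, not the
commands actually used in the test: the interface names and VLAN ID
follow the description above, but the ARP target and address are
invented, and a 2013-era setup would more likely have used bonding
module parameters (mode=1 primary=eth3 primary_reselect=2) than the
netlink syntax shown here.

```shell
# Sketch: eth3/eth4 in an active-backup bond with the ARP monitor,
# primary=eth3, primary_reselect=2 ("failure"), and VLAN 66 on top.
# The ARP target and the 192.168.66.0/24 addresses are illustrative.
ip link add bond0 type bond mode active-backup \
    arp_interval 1000 arp_ip_target 192.168.66.1

# Slaves must be down before they can be enslaved.
ip link set eth3 down
ip link set eth3 master bond0
ip link set eth4 down
ip link set eth4 master bond0

# Set the primary after enslaving, then bring the bond up.
ip link set bond0 type bond primary eth3 primary_reselect failure
ip link set bond0 up

# VLAN 66 above the bond; the ARP target is reachable via this VLAN.
ip link add link bond0 name bond0.66 type vlan id 66
ip addr add 192.168.66.10/24 dev bond0.66
ip link set bond0.66 up
```

With primary_reselect=2 (failure), eth3 should stay or become active
whenever it has link, which is why an early spurious failover is
visible across reboots.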
Even if eth4 is marked "link status up" by the bond first, eth3 becomes
the active slave as soon as it, too, goes "link status up," because it
is the primary.

What network device are you using for the slaves?  Are they virtualized
devices of some kind?  My suspicion is that Ethernet autonegotiation
either does not take place or completes so quickly that the slaves are
carrier up before the VLAN is even added.

Can you check your dmesg output for the sequence of events?  In my
test, I do not see the slaves go "NIC Link is Up 1000 Mbps Full Duplex"
until about 3 seconds after the VLAN interface has been configured.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
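The dmesg check suggested above can be done with a filter along these
lines. The sample log in the here-document is fabricated purely for
illustration (real driver messages and timestamps will differ); on a
live system, pipe `dmesg` into the same grep instead.

```shell
# Reduce kernel log output to the bonding / VLAN / link-up events whose
# relative timestamps show the ordering in question. The log lines
# below are invented for illustration only.
grep -E 'bond0|8021q|Link is Up' <<'EOF'
[    4.80] bond0: Setting ARP monitoring interval to 1000
[    5.10] 8021q: adding VLAN 0 to HW filter on device bond0
[    8.20] eth3: NIC Link is Up 1000 Mbps Full Duplex
[    8.25] bond0: link status definitely up for interface eth3
EOF
```

On a real system: `dmesg | grep -E 'bond0|8021q|Link is Up'`. If the
"NIC Link is Up" lines precede the 8021q/VLAN lines, that would point
at the race window the patch description hypothesizes.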