From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966130AbbJ3S7f (ORCPT ); Fri, 30 Oct 2015 14:59:35 -0400 Received: from mail-bn1bon0132.outbound.protection.outlook.com ([157.56.111.132]:30082 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1760385AbbJ3S7b (ORCPT ); Fri, 30 Oct 2015 14:59:31 -0400 Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=uwe.koziolek@redknee.com; Subject: Re: [PATCH v4] net/bonding: send arp in interval if no active slave To: Jarod Wilson , Nikolay Aleksandrov References: <56094137.9030206@redhat.com> <1444161197-38442-1-git-send-email-jarod@redhat.com> <56150A1B.10405@cumulusnetworks.com> <56151E4A.2000503@redhat.com> CC: , Jay Vosburgh , Andy Gospodarek , Veaceslav Falico , From: Uwe Koziolek Message-ID: <5633BE00.5090509@redknee.com> Date: Fri, 30 Oct 2015 19:59:12 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <56151E4A.2000503@redhat.com> Content-Type: text/plain; charset="iso-8859-15"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [84.132.54.246] X-ClientProxiedBy: DB5PR02CA0038.eurprd02.prod.outlook.com (25.161.237.48) To CO2PR0501MB903.namprd05.prod.outlook.com (10.141.247.18) X-Microsoft-Exchange-Diagnostics: 1;CO2PR0501MB903;2:lSb0DBdTj6aCZhEmlp+OZN+nCheDQzQ2cO7mfCsyjUGJMX0wRRWXbnx+iir4OOpUW6dwe8RQbiWiymc+WTx37EyzsRSU4K1etZ2JNsOY3aMiBKAVmvbS/AZ1npvqk2Udpsf60i6VuKY+vFKmTM3atM/RECjMb+s0Vw6oUDa7wXc=;3:kYJyXx1dA9o/6vjgvCXTFR/ZotWdKO8e4nEhNtcKrVrlZFP+5fjFaF1dgVIoQCKK+t1ivjw9rKN9YJZSN+apOWvY9c2XzzmtcHy1cC5VGZ98CTy9p2FauDaQyqkg8iwkxOxW5DFSSoFPhgKAeHQh1A==;25:dqy5KF/rxNOrXEqGoRZdQr7Noib+pUfvjqGwfo8g2sTlvPvl1ms5JdLATJkBKf8GcnBxadWwsIOqABh81A6C7X/IjHlx+jb8AH/FtbvJwXs76o1JEQt+ssIffTNDg6XBOwRKqZ8GR9torZujLcMRKuX5Y3k9WFrM4V9fxwb1TPkts9rp8cCoIDKw4D3HSb1WmFfoai45o4lHXB2JMlf/QOtK+QrW2iXu4FhSpAM1Ay6e9/ixlyFFNkeiJRhHSpebx+wxHaWgTth/OKe+q1L6+Q== X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:CO2PR0501MB903; X-Microsoft-Exchange-Diagnostics: 1;CO2PR0501MB903;20:O1+83nhPc8JVmWHGfpddLlMLEUCGwmfQkYvZ9N7W1Z/gLYIM0uKB+jYIrXydK9vS90MsGvsrPriGJ5wlZRHoiLgxi3wtMnqyZBv2Fo4PPxm/jr55iwc5M4CTOokwSvlX4S19f0vP+M2ruD3y0ZNjPayPSnHXX5cmmL9iXkjJOSJjYv7ggB+lhydtd40kHpSW8eN8Xwa6s1Qi70Z+RavJ45tyCQyIJbOSOVoGmOXtqbjYTMb4s2oXEsvrmirZrWSlhwhgNBiXc3xyhkvGvEs6KKjBnWmPHILdn+R5/lJzLBDUp5DH8MqlOYy/BdxafuPUUP84kxa9qy+MmJVpJ4WRkXZd9ldsIfL3IR9RjMfF+TwrT9u2qKaSez4gzSe1tYbT+1Q5xqfr1sQzCulRTQYeV7cEHWLmt0HjUXxGUWOFBx6l5BAv76dFs+G9xEP7gbMuW9hNZmcvcFB95ANZ91jh/FuR/B6wy3PZS5gRP5Iukb5hDnQ2ida31rfyX3Ws533w;4:PsV+rQDv16Ez9eMjkvP47UscMrsqIlIonrgjZazEyF21UoBjZI0IpjeE5lY6EEzlPfycdZtfRCIRGlABMC4fHKtVxGm9j0gZtnF1Q5pmH5j7+wCmuHoohApwytqxtfvlgO4bcyQKQXnNvkqE6VdTTJwCdn4zTu7sQ/0eKUtcbNfELoW8h5p4l98K43/M/dtvDOXzwlhsTVY/ABnllLxuDjTSrawaFjTnHmcH3USJDJszStDSEy9QSffvaxzoWXPAsUIbf/yI2cFmfMY/D9b6/8MPw0WFW1OxIc/qGvf1mxmJxlNF5AnyUZJJhk9I8oW25t82u9r+mO9/yr93ZnwTiqSNLOB5RBYKC/gXV2yEZgY6CTrp6SeXBmvbvVpEC4WE8eDRmnAGr4BQ0mZk4nK0Vw== X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(41966723458330); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(520078)(5005006)(3002001)(10201501046)(102215026);SRVR:CO2PR0501MB903;BCL:0;PCL:0;RULEID:;SRVR:CO2PR0501MB903; X-Forefront-PRVS: 07459438AA X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(6009001)(6049001)(199003)(189002)(479174004)(24454002)(54534003)(377454003)(53754006)(122386002)(97736004)(42186005)(64126003)(93886004)(77096005)(189998001)(76176999)(50466002)(59896002)(5001960100002)(19580395003)(33656002)(80316001)(117156001)(5001770100001)(19580405001)(50986999)(65806001)(36756003)(4001350100001)(65816999)(5008740100001)(66066001)(92566002)(83506001)(101416001)(5007970100001)(40100003)(23756003)(86362001)(5004730100002)(87266999)(106356001)(87976001)(105586002)(65956001)(81156007)(2950100001)(54356999)(47776003);DIR:OUT;SFP:1102;SCL:1;SRVR:CO2PR0501MB903;H:[192.168.222.3];FPR:;SPF:None;PTR:InfoNoRecords;A:1;MX:1;LANG:en; X-Microsoft-Exchange-Diagnostics: =?iso-8859-15?Q?1;CO2PR0501MB903;23:TBXNRTiLO9XBO8xqZ3G8obCR+98NjQ8WqduTo?= =?iso-8859-15?Q?g+szYsghBwPOBuCDKgw1O0ivjq9bCzQtE1wV+v/ljO4/cE4xDihFx4GUT?= =?iso-8859-15?Q?JVcAg2RCGb1TFO5vEAaSA7lzAxwxQaZAAhJtm2/tRGv5bhaPCyZi6uTO2?= =?iso-8859-15?Q?OrF05aCNpKLyKGQPg3OBdUV++vfIBPzIwbcI49oPtVe310WLHTNuW+OgB?= =?iso-8859-15?Q?jeVz3UolfZBSzQEU6O2Ri3SQx91NilViSfOfbyhJJmV4FjJ+CGs40U3Ta?= =?iso-8859-15?Q?XRJprLuutgFWHiMLWuhK0YZDOuLnUbzvQxlJQV/luFQiZ4Ehc+rQCEFRf?= =?iso-8859-15?Q?XGtMDFfQwL9FJINxrkbHH2P4mzz54BlRCDLIMFuQaqyA6fH89hdEJQPDO?= =?iso-8859-15?Q?9et4WmUQ4SefnksSTAxh9pjy22x72+Sz+xU03oQU7UGzeKF4V2Ge1PowM?= =?iso-8859-15?Q?Iaz5P+qnRQV8tcDuqn669BwN9FSf7YpNaaJ1Wpz0gKFD4twMd9CUqanLQ?= =?iso-8859-15?Q?PTZVuxtLR1Hr6DWE+JQqJCZt/NVfdMw8pOX/c3+DMepDRqtK7BhQIC8Wh?= =?iso-8859-15?Q?ijR66+Bj6HFKLEjM+JpuCWwO0Eh7kgCZNPW/jqR9l86nZMC/lnLgUWP9T?= =?iso-8859-15?Q?nci8LiZNUEIM/Mzm1rdPJisVt0V0uSIdvHkebY4g/KC03FZ7UnTjVsY6+?= =?iso-8859-15?Q?DqJRxFWIuTblid6BoAS1FM9HZHrCjUpM8JdWTX5nIMhrEwss3hPykX2eL?= =?iso-8859-15?Q?geaZNmgiVk4Zhx5P2VxXOJD57YmwGx/f7CgSCXPQzFaOacyKhN+0dNlfw?= =?iso-8859-15?Q?7QnK4YxyFCBljoODBSIu1/Z/rTUqsr+94PRl4S5wG4CHt+M/5GKrBzKCf?= =?iso-8859-15?Q?dHcTTLp5AqUX37Dfy8siQypFxj5VHKaJgLOBM1z0YlsGP2yLRW3WEkWJf?= =?iso-8859-15?Q?pi3pcVnGpzzvPZme9JpPwVhe1b/k2j1MnUeQ3OKtl11MhxPpoXtEdvRFf?= =?iso-8859-15?Q?Kl2TxI5lz3vRfoOE0LmhQXx86nntlVvs2eUtCtlu6AeHS0kLCKAlc+uj8?= =?iso-8859-15?Q?tkEHQngUDIORaCkc9wXe+howHbf+QkUR7lUsge0b6dX+2KFA3PCbx9j7w?= =?iso-8859-15?Q?kkvPFXJaV/9WE5baZ3uwt1rR6gsWB9i5cC1s9zhHAjjYyPt2sMV2uxo6r?= =?iso-8859-15?Q?+cRTW/rZ8tZ6py9nMFYrveUUWT+bIt5CyBV5IZGJUd8fcYf2MwKmoQhbB?= =?iso-8859-15?Q?LNvHjK0+KwQelAwEjcS7H6ny/tyeCX6tcbb7tmB8yaOr7vw30E0WCRRuu?= =?iso-8859-15?Q?AhphRo9c+YRT9LSM35y0bh1j31mHzGrmjdup2zbbLQSKpMSlg3Emsz8L0?= =?iso-8859-15?Q?hDpPJKcGYKYRs+cnZAT9ZmeV/mY2HMcs=3D?= X-Microsoft-Exchange-Diagnostics: 1;CO2PR0501MB903;5:bZW6Q+k7DJl6jGAl1AqVgPH9Qpfd06ppqXhg3uXebslqc1FCWkTwS3NMqPLFHrCOm5r6e8PaQbBn/jbldU0UdZIKK2kZDH4EG+qGXDH66YPTyZuTlNVPoStJKMnm3/tyK3sxDIhW05cRlHWA/nJtWw==;24:FKFUSXdQqnKy1jB8kXtASlp+hyCtySzCxtDhxMdixSoDZVcU2KR4MVLfN0Yso+ZGkE1VOijwp1DukATQtjVJZ8t1CDlt7vP5WZyk1cic1rQ=;20:2aTvVaToMerLpzfoNRioaxfnX4OkpAEAvYKFI/ovuFeuGSWULx7bu5qnkAFqEDC8AAHeRjLoI0gQIEcpzhcXmw== SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: redknee.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Oct 2015 18:59:26.9910 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: CO2PR0501MB903 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > Nikolay Aleksandrov wrote: >> On 10/06/2015 09:53 PM, Jarod Wilson wrote: >>> From: Uwe Koziolek >>> >>> With some very finicky switch hardware, active backup bonding can get into >>> a situation where we play ping-pong between interfaces, trying to get one >>> to come up as the active slave. There seems to be an issue with the >>> switch's arp replies either taking too long, or simply getting lost, so we >>> wind up unable to get any interface up and active. Sometimes, the issue >>> sorts itself out after a while, sometimes it doesn't. >>> >>> Testing with num_grat_arp has proven fruitless, but sending an additional >>> arp on curr_arp_slave if we're still in the arp_interval timeslice in >>> bond_ab_arp_probe(), has shown to produce 100% reliability in testing with >>> this hardware combination. >>> >>> [jarod: manufacturing of changelog, addition of modparam gating] >>> CC: Jay Vosburgh >>> CC: Andy Gospodarek >>> CC: Veaceslav Falico >>> CC: netdev@vger.kernel.org >>> Signed-off-by: Uwe Koziolek >>> Signed-off-by: Jarod Wilson >>> --- >>> v2: add code comment as to why change is needed >>> v3: fix wrapping of comments >>> v4: [jarod] add module parameter gating of code addition >>> >> Hi all, >> As Andy already stated I'm not a fan of such workarounds either but it's >> necessary sometimes so if this is going to be actually considered then a >> few things need to be fixed. Please make this a proper bonding option >> which can be changed at runtime and not only via a module parameter. > > Okay, I can give that a shot, however... > >> Now, I saw that you've only tested with 500 ms, can't this be fixed by using >> a different interval ? This seems like a very specific problem to have a >> whole new option for. > > ...I'll wait until we've heard confirmation from Uwe that intervals other than 500ms don't fix things. > A test with 5000 ms don't fix the problem. Tested with Cisco C3750, 4 bonds. >> I really want to say fix the switch but I know that's not an option. :-) > > Yeah, unfortunately not! > >> A few minor nits below, >> >>> drivers/net/bonding/bond_main.c | 24 ++++++++++++++++++++++++ >>> include/net/bonding.h | 1 + >>> 2 files changed, 25 insertions(+) >>> >>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >>> index 90f2615..72ab512 100644 >>> --- a/drivers/net/bonding/bond_main.c >>> +++ b/drivers/net/bonding/bond_main.c >>> @@ -95,6 +95,7 @@ static int miimon; >>> static int updelay; >>> static int downdelay; >>> static int use_carrier = 1; >>> +static int arp_slow_switch; >>> static char *mode; >>> static char *primary; >>> static char *primary_reselect; >>> @@ -133,6 +134,10 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link down, " >>> module_param(use_carrier, int, 0); >>> MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; " >>> "0 for off, 1 for on (default)"); >>> +module_param(arp_slow_switch, int, 0); >>> +MODULE_PARM_DESC(arp_slow_switch, "Do extra arp checks for switches with arp " >>> + "caches that are slow to update; " >>> + "0 for off (default), 1 for on"); >>> module_param(mode, charp, 0); >>> MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, " >>> "1 for active-backup, 2 for balance-xor, " >>> @@ -2793,6 +2798,18 @@ static bool bond_ab_arp_probe(struct bonding *bond) >>> return should_notify_rtnl; >>> } >>> >>> + /* Sometimes the forwarding tables of the switches are not update >> ^ s/update/updated/ > > D'oh. Fixed locally. > >>> @@ -4280,6 +4297,12 @@ static int bond_check_params(struct bond_params *params) >>> use_carrier = 1; >>> } >>> >>> + if ((arp_slow_switch != 0) && (arp_slow_switch != 1)) { >> ^^ no need for the extra () > > Copy-pasta from use_carrier checks right above it. Never quite sure if I should stick with the same possibly sub-optimal > formatting conventions already in the file, try to fix them while also fixing bugs, or just mix styles... > > >>> + pr_warn("Warning: arp_slow_switch module parameter (%d), not of valid value (0/1), so it was set to 1\n", >>> + arp_slow_switch); >>> + arp_slow_switch = 1; >> ^^ please default to old behaviour in this case (0) > > Will do. > Uwe Koziolek