From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] net/bonding: send arp in interval if no active slave
To: Jay Vosburgh
References: <1439828583-27325-1-git-send-email-jarod@redhat.com>
 <20150817165500.GA21512@vps.falico.eu>
 <55D215F7.3080905@redhat.com>
 <55D22E64.6020807@redknee.com>
 <2649.1439838866@famine>
 <55D2494F.3020800@redknee.com>
 <20150901154157.GY504@gospo.home.greyhouse.net>
 <55E63065.8070906@redknee.com>
 <31529.1441292743@famine>
CC: Andy Gospodarek, Jarod Wilson, Veaceslav Falico
From: Uwe Koziolek
Message-ID: <55E97AC6.6030106@redknee.com>
Date: Fri, 4 Sep 2015 13:04:38 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <31529.1441292743@famine>
Content-Type: text/plain; charset="iso-8859-15"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 03.09.2015 at 17:05, Jay Vosburgh wrote:
> Uwe Koziolek wrote:
>
>> On Tue, Sep 01, 2015 at 05:41 PM +0200, Andy Gospodarek wrote:
>>> On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:
>>>> On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:
>>>>> Uwe Koziolek wrote:
>>>>>
>>>>>> On 2015-08-17 07:12 PM, Jarod Wilson wrote:
>>>>>>> On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
>>>>>>>> On Mon, Aug
17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
>>>>>>>>> From: Uwe Koziolek
>>>>>>>>>
>>>>>>>>> With some very finicky switch hardware, active backup bonding can
>>>>>>>>> get into a situation where we play ping-pong between interfaces,
>>>>>>>>> trying to get one to come up as the active slave. There seems to
>>>>>>>>> be an issue with the switch's arp replies either taking too long,
>>>>>>>>> or simply getting lost, so we wind up unable to get any interface
>>>>>>>>> up and active. Sometimes, the issue sorts itself out after a
>>>>>>>>> while, sometimes it doesn't.
>>>>>>>>>
>>>>>>>>> Testing with num_grat_arp has proven fruitless, but sending an
>>>>>>>>> additional arp on curr_arp_slave if we're still in the
>>>>>>>>> arp_interval timeslice in bond_ab_arp_probe(), has shown to
>>>>>>>>> produce 100% reliability in testing with this hardware
>>>>>>>>> combination.
>>>>>>>> Sorry, I don't understand the logic of why it works, and what
>>>>>>>> exactly are we fixing here.
>>>>>>>>
>>>>>>>> It also breaks completely the logic for link state management in
>>>>>>>> case of no current active slave for 2*arp_interval.
>>>>>>>>
>>>>>>>> Could you please elaborate what exactly is fixed here, and how it
>>>>>>>> works? :)
>>>>>>> I can either duplicate some information from the bug, or Uwe can,
>>>>>>> to illustrate the exact nature of the problem.
>>>>>>>
>>>>>>>> p.s. num_grat_arp maybe could help?
>>>>>>> That was my thought as well, but as I understand it, that route was
>>>>>>> explored, and it didn't help any. I don't actually have a reproducer
>>>>>>> setup of my own, unfortunately, so I'm kind of caught in the middle
>>>>>>> here...
>>>>>>>
>>>>>>> Uwe, can you perhaps further enlighten us as to what num_grat_arp
>>>>>>> settings were tried that didn't help? I'm still of the mind that if
>>>>>>> num_grat_arp *didn't* help, we probably need to do something keyed
>>>>>>> off num_grat_arp.
>>>>>> The bonding slaves are connected to high-availability switches;
>>>>>> each of the slaves is connected to a different switch. If the bond
>>>>>> is starting, only the selected slave sends one arp request. If a
>>>>>> matching arp response is received, this slave and the bond go into
>>>>>> state up, sending the gratuitous arps...
>>>>>> But if you get no arp reply, the next slave is selected.
>>>>>> With most of the newer switches, not overloaded and without other
>>>>>> software bugs, or with a single-switch configuration, you would get
>>>>>> an arp response on the first arp request.
>>>>>> But in case of a high-availability configuration with non-perfect
>>>>>> switches like HP ProCurve 54xx, also with some Cisco models, you
>>>>>> may not get a response on the first arp request.
>>>>>>
>>>>>> I have seen network snoops where the switches are not responding to
>>>>>> the first arp request on slave 1; the second arp request was sent on
>>>>>> slave 2 but the response was received on slave 1, and all following
>>>>>> arp requests are answered on the wrong slave for a longer time.
>>>>> Could you elaborate on the exact "high availability
>>>>> configuration" here, including the model(s) of switch(es) involved?
>>>>>
>>>>> Is this some kind of race between the switch or switches
>>>>> updating the forwarding tables and the bond flip flopping between the
>>>>> slaves? E.g., source MAC from ARP sent on slave 1 is used to populate
>>>>> the forwarding table, but (for whatever reason) there is no reply. ARP
>>>>> on slave 2 is sent (using the same source MAC, unless you set
>>>>> fail_over_mac), but forwarding tables still send that MAC to slave 1, so
>>>>> reply is sent there.
>>>> High availability:
>>>> 2 managed switches with routing capabilities have an interconnect.
>>>> One slave of a bonding interface is connected to the first switch, the
>>>> second slave is connected to the other switch.
>>>> The switch models are HP ProCurve 5406 and HP ProCurve 5412.
>>>> As far as I remember, HP E 3500 and E 3800 are also affected; for
>>>> the affected Cisco models I can't answer today.
>>>> Affected single-switch configurations were not seen.
>>>>
>>>> Yes, a race condition with delayed updates of the forwarding tables
>>>> is a well-matching explanation for the problem.
>>>>
>>>>>> The proposed change sends up to 3 arp requests on a down bond using
>>>>>> the same slave, delayed by arp_interval.
>>>>>> Using problematic switches I have seen the arp response on the
>>>>>> right slave at latest on the second arp request. So the bond is
>>>>>> going into state up.
>>>>>>
>>>>>> How does it work:
>>>>>> The bonds in up state are handled at the beginning of the
>>>>>> bond_ab_arp_probe procedure; the other part of this procedure
>>>>>> handles the slave change.
>>>>>> The proposed change bypasses the slave change for 2 additional
>>>>>> calls of bond_ab_arp_probe.
>>>>>> Now the retries are available not only for an up bond, they are
>>>>>> also implemented for a down bond.
>>>>> Does this delay failover or bringup on switches that are not
>>>>> "problematic"? I.e., if arp_interval is, say, 1000 (1 second), will
>>>>> this impact failover / recovery times?
>>>>>
>>>>> -J
>>>> It depends.
>>>> Failover times are not impacted; this is handled differently.
>>>> Only the transition from a down bonding interface (bond and all
>>>> slaves are down) to the state up can be increased by up to 2 times
>>>> arp_interval, if the selected interface did not come up. If
>>>> well-working switches are used, and everything else is also ok,
>>>> there is no impact.
>>> So I'm not a huge fan of workarounds like these, but I also understand
>>> from a practical standpoint that this is useful. My only issue with the
>>> patch would be to please include a small comment (1-2 lines) in the code
>>> that describes the behavior.
>>> I know we have the changelog entries for this, but I would feel
>>> better about having an exception like this in the code for those
>>> reading it and wondering:
>>>
>>> "Why would we wait 2 intervals before failing over to the next interface
>>> when there are no active interfaces?"
>>>
>>
>> diff -up a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> --- a/drivers/net/bonding/bond_main.c	2015-08-30 20:34:09.000000000 +0200
>> +++ b/drivers/net/bonding/bond_main.c	2015-09-02 00:39:10.000298202 +0200
>> @@ -2795,6 +2795,16 @@ static bool bond_ab_arp_probe(struct bon
>>  		return should_notify_rtnl;
>>  	}
>>
>> +	/* sometimes the forwarding tables of the switches are not updated fast enough
>> +	 * the first arp response after a slave change is received on the wrong slave.
>> +	 * the arp requests will be retried 2 times on the same slave
>> +	 */
>> +
>> +	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
>> +		bond_arp_send_all(bond, curr_arp_slave);
>> +		return should_notify_rtnl;
>> +	}
>> +
>
> I probably should have asked this in the beginning, but at what
> range of arp_interval values does the problem manifest? If it's a race
> condition with the switch update, I'd expect that only very small
> arp_interval values would be affected.
>
> Also, your proposed comment wraps past 80 columns.
>
> -J
>
Only an arp_interval of 500 msecs was used; no other values were tested.
The wraps in the patch are now removed.

diff -up ./drivers/net/bonding/bond_main.c.orig ./drivers/net/bonding/bond_main.c
--- ./drivers/net/bonding/bond_main.c.orig	2015-08-30 20:34:09.000000000 +0200
+++ ./drivers/net/bonding/bond_main.c	2015-09-04 11:59:05.755897182 +0200
@@ -2795,6 +2795,17 @@ static bool bond_ab_arp_probe(struct bon
 		return should_notify_rtnl;
 	}
 
+	/* sometimes the forwarding tables of the switches are not updated
+	 * fast enough. the first arp response after a slave change is received
+	 * on the wrong slave.
+	 * the arp requests will be retried 2 times on the same slave
+	 */
+
+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
 	bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
 
 	bond_for_each_slave_rcu(bond, slave, iter) {

>
>>  	bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
>>
>>  	bond_for_each_slave_rcu(bond, slave, iter) {
>>
>>>>>> num_grat_arp has no chance of solving the problem. num_grat_arp is
>>>>>> only used if a different slave is going active.
>>>>>> But in our case, the bonding slaves are not going into the active
>>>>>> state for a longer time.
>>>>>>>>> [jarod: manufacturing of changelog]
>>>>>>>>> CC: Jay Vosburgh
>>>>>>>>> CC: Veaceslav Falico
>>>>>>>>> CC: Andy Gospodarek
>>>>>>>>> CC: netdev@vger.kernel.org
>>>>>>>>> Signed-off-by: Uwe Koziolek
>>>>>>>>> Signed-off-by: Jarod Wilson
>>>>>>>>> ---
>>>>>>>>>  drivers/net/bonding/bond_main.c | 5 +++++
>>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>>>>>>> index 0c627b4..60b9483 100644
>>>>>>>>> --- a/drivers/net/bonding/bond_main.c
>>>>>>>>> +++ b/drivers/net/bonding/bond_main.c
>>>>>>>>> @@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
>>>>>>>>>  		return should_notify_rtnl;
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>> +	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
>>>>>>>>> +		bond_arp_send_all(bond, curr_arp_slave);
>>>>>>>>> +		return should_notify_rtnl;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>>  	bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
>>>>>>>>>
>>>>>>>>>  	bond_for_each_slave_rcu(bond, slave, iter) {
>>>>>>>>> --
>>>>>>>>> 1.8.3.1
>
> ---
> -Jay Vosburgh, jay.vosburgh@canonical.com
>