* [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Hangbin Liu @ 2024-11-06 7:39 UTC
To: Jay Vosburgh; +Cc: netdev

Hi Jay,

Our QE reported that, when there is no active slave during
bond_ab_arp_probe(), the slaves send the ARP probe message one by one. This
flaps the switch's MAC table quickly and sometimes even makes the switch stop
learning MAC addresses. So should we consider arp_missed_max during
bond_ab_arp_probe()? I.e., each slave would get more chances to send probe
messages before we switch to another slave. What do you think?

Thanks
Hangbin
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Jiri Pirko @ 2024-11-06 8:34 UTC
To: Hangbin Liu; +Cc: Jay Vosburgh, netdev

Wed, Nov 06, 2024 at 08:39:48AM CET, liuhangbin@gmail.com wrote:
>Hi Jay,
>
>Our QE reported that, when there is no active slave during
>bond_ab_arp_probe(), the slaves send the arp probe message one by one. This
>will flap the switch's mac table quickly, sometimes even make the switch stop
>learning mac address. So should we consider the arp missed max during
>bond_ab_arp_probe()? i.e. each slave has more chances to send probe messages
>before switch to another slave. What do you think?

Out of curiosity, is anyone still using AB mode in real life? And if
yes, any idea why exactly?

>
>Thanks
>Hangbin
>
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Hangbin Liu @ 2024-11-06 9:25 UTC
To: Jiri Pirko; +Cc: Jay Vosburgh, netdev

On Wed, Nov 06, 2024 at 09:34:59AM +0100, Jiri Pirko wrote:
> Wed, Nov 06, 2024 at 08:39:48AM CET, liuhangbin@gmail.com wrote:
> >Hi Jay,
> >
> >Our QE reported that, when there is no active slave during
> >bond_ab_arp_probe(), the slaves send the arp probe message one by one. This
> >will flap the switch's mac table quickly, sometimes even make the switch stop
> >learning mac address. So should we consider the arp missed max during
> >bond_ab_arp_probe()? i.e. each slave has more chances to send probe messages
> >before switch to another slave. What do you think?
>
> Out of curiosity, is anyone still using AB mode in real life? And if

Based on our analysis, in 2024, 53.8% of users use 802.3ad mode, 41.6% use
active-backup mode, and 2.5% use round-robin mode.

> yes, any idea why exactly?

I think they just want to make sure there is a backup for the link.

Thanks
Hangbin
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Jiri Pirko @ 2024-11-06 14:40 UTC
To: Hangbin Liu; +Cc: Jay Vosburgh, netdev

Wed, Nov 06, 2024 at 10:25:30AM CET, liuhangbin@gmail.com wrote:
>On Wed, Nov 06, 2024 at 09:34:59AM +0100, Jiri Pirko wrote:
>> Wed, Nov 06, 2024 at 08:39:48AM CET, liuhangbin@gmail.com wrote:
>> >Hi Jay,
>> >
>> >Our QE reported that, when there is no active slave during
>> >bond_ab_arp_probe(), the slaves send the arp probe message one by one. This
>> >will flap the switch's mac table quickly, sometimes even make the switch stop
>> >learning mac address. So should we consider the arp missed max during
>> >bond_ab_arp_probe()? i.e. each slave has more chances to send probe messages
>> >before switch to another slave. What do you think?
>>
>> Out of curiosity, is anyone still using AB mode in real life? And if
>
>Based on our analyse, in year 2024, there are 53.8% users using 802.3ad mode,
>41.6% users using active-backup mode. 2.5% users using round-robin mode.
>
>> yes, any idea why exactly?
>
>I think they just want to make sure there is a backup for the link.

Why don't they use LACP? You can have backup there as well.

>
>Thanks
>Hangbin
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Hangbin Liu @ 2024-11-07 1:21 UTC
To: Jiri Pirko; +Cc: Jay Vosburgh, netdev

On Wed, Nov 06, 2024 at 03:40:39PM +0100, Jiri Pirko wrote:
> >I think they just want to make sure there is a backup for the link.
>
> Why don't they use LACP? You can have backup there as well.

Some users don't want to configure switches. Specifically, some large-scale
users don't want to, or aren't allowed to, maintain the same number of
switches.

Hangbin
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Jakub Kicinski @ 2024-11-07 1:21 UTC
To: Jiri Pirko; +Cc: Hangbin Liu, Jay Vosburgh, netdev

On Wed, 6 Nov 2024 15:40:39 +0100 Jiri Pirko wrote:
> >> Out of curiosity, is anyone still using AB mode in real life? And if
> >
> >Based on our analyse, in year 2024, there are 53.8% users using 802.3ad mode,
> >41.6% users using active-backup mode. 2.5% users using round-robin mode.
> >
> >> yes, any idea why exactly?
> >
> >I think they just want to make sure there is a backup for the link.
>
> Why don't they use LACP? You can have backup there as well.

FWIW I was asked to help with A/B setups in the past for facility
networks (building sensors etc.). Those guys wanted to do as little
networking as possible, and had very low bandwidth needs but still
wanted the redundancy. Basic active/backup was a good fit.
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Jay Vosburgh @ 2024-11-08 1:32 UTC
To: Hangbin Liu; +Cc: netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>Hi Jay,
>
>Our QE reported that, when there is no active slave during
>bond_ab_arp_probe(), the slaves send the arp probe message one by one. This
>will flap the switch's mac table quickly, sometimes even make the switch stop
>learning mac address. So should we consider the arp missed max during
>bond_ab_arp_probe()? i.e. each slave has more chances to send probe messages
>before switch to another slave. What do you think?

Well, "quickly" here depends entirely on what the value of
arp_interval is. It's been quite a while since I looked into the
details of this particular behavior, but at the time I didn't see the
switches I had issue flap warnings. If memory serves, I usually tested
with arp_interval in the realm of 100ms, with anywhere from 2 to 6
interfaces in the bond.

What settings are you using for the bond, and what model of
switch exhibits the behavior you describe?

That said, the intent of the current implementation is to cycle
through the interfaces in the bond relatively quickly when no interfaces
are up, under the theory that such behavior finds an available interface
in the minimum time.

I'm not necessarily opposed to having each probe "step," so to
speak, perform multiple ARP probe checks. However, I wonder if this is
a complicated workaround for not wanting to change a configuration
setting on a switch, and it would only make things better by chance
(i.e., that the probes just happen to now take long enough to not run
afoul of the switch's time limit for some flap parameter).

-J

---
-Jay Vosburgh, jv@jvosburgh.net
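To make the probe "step" idea discussed above concrete, here is a minimal sketch in Python. It is not kernel code and does not reflect how bond_ab_arp_probe() is actually written; it only models the order in which slaves would send probes when no slave is active. The function name `probe_schedule` and the parameter `probes_per_slave` are hypothetical, and the model assumes exactly one probe per arp_interval tick.

```python
from itertools import islice


def probe_schedule(slaves, probes_per_slave=1):
    """Yield the slave that sends the ARP probe at each arp_interval tick.

    Hypothetical model: with probes_per_slave=1 the sender changes on every
    tick (the behavior described in the thread); with a larger value each
    slave keeps probing for that many ticks before the next one takes over.
    """
    if not slaves or probes_per_slave < 1:
        return
    while True:
        for slave in slaves:
            for _ in range(probes_per_slave):
                yield slave


# First six ticks with two slaves, one probe per step vs. three per step.
print(list(islice(probe_schedule(["eth0", "eth1"]), 6)))
print(list(islice(probe_schedule(["eth0", "eth1"], probes_per_slave=3), 6)))
```

In this model, raising `probes_per_slave` lowers how often the sending port changes, but it also lengthens the worst-case time to reach a working slave, which is the trade-off raised in the message above.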
* Re: [Question]: should we consider arp missed max during bond_ab_arp_probe()?
From: Hangbin Liu @ 2024-11-08 2:51 UTC
To: Jay Vosburgh; +Cc: netdev

On Thu, Nov 07, 2024 at 05:32:29PM -0800, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> >Hi Jay,
> >
> >Our QE reported that, when there is no active slave during
> >bond_ab_arp_probe(), the slaves send the arp probe message one by one. This
> >will flap the switch's mac table quickly, sometimes even make the switch stop
> >learning mac address. So should we consider the arp missed max during
> >bond_ab_arp_probe()? i.e. each slave has more chances to send probe messages
> >before switch to another slave. What do you think?
>
> Well, "quickly" here depends entirely on what the value of
> arp_interval is. It's been quite a while since I looked into the
> details of this particular behavior, but at the time I didn't see the
> switches I had issue flap warnings. If memory serves, I usually tested
> with arp_interval in the realm of 100ms, with anywhere from 2 to 6
> interfaces in the bond.
>
> What settings are you using for the bond, and what model of
> switch exhibits the behavior you describe?

In our network, we have a Cisco 9364 switch, which by default disables MAC
learning for 120 seconds if it sees 6 MAC moves within 30 seconds [1].

>
> That said, the intent of the current implementation is to cycle
> through the interfaces in the bond relatively quickly when no interfaces
> are up, under the theory that such behavior finds an available interface
> in the minimum time.
>
> I'm not necessarily opposed to having each probe "step," so to
> speak, perform multiple ARP probe checks. However, I wonder if this is
> a complicated workaround for not wanting to change a configuration
> setting on a switch, and it would only make things better by chance
> (i.e., that the probes just happen to now take long enough to not run
> afoul of the switch's time limit for some flap parameter).

For Cisco Nexus 9300-X switches, the `mac-move policy` has been supported
since Cisco NX-OS Release 10.3(1)F, which was released on August 19, 2022. So
there is an option to disable or modify the MAC move policy. But switches
that can't be updated to this version will be affected, unless the user
changes arp_interval to a large number.

As there is a workaround (changing either the switch configuration or
arp_interval), I don't have a strong intention to change the bonding
behavior. I will do it or drop it based on your decision.

[1] https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/104x/config-guides/cisco-nexus-9000-series-nx-os-system-management-configuration-guide-release-104x/m-configuring-mac-move.html

Thanks
Hangbin
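As a rough back-of-the-envelope check of the numbers in this thread (6 MAC moves within 30 seconds, arp_interval around 100 ms), the arithmetic can be sketched in Python. This is a simplified model, not a statement of how the switch or the bonding driver counts: it assumes every change of the probing slave is seen by the switch as exactly one MAC move, and the names `mac_moves_in_window`, `min_probes_per_slave`, and `probes_per_slave` are hypothetical.

```python
def mac_moves_in_window(arp_interval_ms, probes_per_slave, window_s):
    """Upper bound on slave changes (assumed MAC moves) within window_s seconds.

    Assumes positive inputs. Each slave is assumed to keep probing for
    arp_interval_ms * probes_per_slave before the next slave takes over.
    """
    dwell_ms = arp_interval_ms * probes_per_slave
    return (window_s * 1000) // dwell_ms


def min_probes_per_slave(arp_interval_ms, window_s, max_moves):
    """Smallest probes_per_slave that keeps the move count below max_moves."""
    k = 1
    while mac_moves_in_window(arp_interval_ms, k, window_s) >= max_moves:
        k += 1
    return k


# arp_interval = 100 ms, one probe per slave, 30 s window.
print(mac_moves_in_window(100, 1, 30))
# Probes per slave needed to stay under 6 moves in 30 s at 100 ms.
print(min_probes_per_slave(100, 30, 6))
```

Under these assumptions, staying below the threshold requires each slave to hold the probing role for a little over 5 seconds, whether that comes from a larger arp_interval or from more probes per slave.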