* Fwd: 802.3ad bonding aggregator reselection
       [not found] <CAKdSkDUb94mR7cDiZbxsc6fgm_8O5wagjUec4g-t5DF7R_GFDw@mail.gmail.com>
@ 2016-06-17 10:40 ` Veli-Matti Lintu
       [not found]   ` <CAD=hENdGOFY5027964=f3xk_qeNmVccHYvr2rvTJtpFmaeFG2w@mail.gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Veli-Matti Lintu @ 2016-06-17 10:40 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek

Hello,

I have been trying to get the bonding driver in mode=802.3ad working with multiple aggregators across two switches so that failing links are handled properly. The goal is to always have the best possible bonded link in use if one or more physical links fail. The bonding documentation says that 802.3ad with ad_select=bandwidth/count should do this, but I wasn't able to get those, or ad_select=stable, working without patching the kernel. As I'm not really familiar with the codebase, I'm not sure whether this is really a kernel problem or a configuration problem.

Documentation/networking/bonding.txt:

ad_select
...
	The bandwidth and count selection policies permit failover of
	802.3ad aggregations when partial failure of the active aggregator
	occurs.  This keeps the aggregator with the highest availability
	(either in bandwidth or in number of ports) active at all times.

	This option was added in bonding version 3.4.0.

The hardware setup consists of two HP 2530-48G switches and servers that have 6 ports in total, connected to both switches using 3x1Gbps links. The port groups are configured as LACP on the switches. The switches are connected to each other, but they do not form a single aggregator, so all 6 links cannot be active at the same time. The NICs use the ixgbe and igb drivers.

Here are the tested steps:

ad_select=stable

1. Enable all links on both switches and boot the server; 3 ports are up.
2. Disable one link on the switch that holds the active aggregator.
   expected: link goes down and the port count in /proc/net/bonding/bond0 goes down
   result:   link goes down and the port count in /proc/net/bonding/bond0 does not change
3. Disable all links on the switch that holds the active aggregator.
   expected: links go down and the bond switches to the aggregator that still has links up
   result:   links go down, the port count in /proc/net/bonding/bond0 does not change, and the connection is lost as there are no links up in the active aggregator
4. Enable a single link on the active aggregator that has all links down.
   expected: ?
   result:   the aggregator with the most links up is activated (in this case the previously non-active switch that had 3 links up all the time)

ad_select=bandwidth/count

1. Enable all links on both switches and boot the server; 3 ports are up.
2. Disable one link on the switch that holds the active aggregator.
   expected: link goes down, aggregator reselection starts, and the non-active aggregator with 3 links up becomes active
   result:   link goes down, the port count in /proc/net/bonding/bond0 does not change, and aggregator reselection does not occur
3. Same as with ad_select=stable.
4. Enable a single link on the active aggregator that has all links down.
   expected: the aggregator with the most links up is activated
   result:   the aggregator with the most links up is activated (in this case the previously non-active switch that had 3 links up all the time)

In all cases miimon does detect the link going down, and if I bring one slaved interface in the non-active aggregator down and back up (ifconfig/ip), aggregator reselection is done. To me it looks like the problem is that when a link goes down, nothing re-checks the remaining state of the bond.
I could get reselection to happen with the following patch, but I'm not sure what side effects it might cause. Most of the examples that googling revealed seemed to refer to Cisco gear, so I'm wondering if there's something hardware specific here.

--- a/drivers/net/bonding/bond_3ad.c	2016-06-17 09:49:56.236636742 +0300
+++ b/drivers/net/bonding/bond_3ad.c	2016-06-17 10:04:34.309353452 +0300
@@ -2458,6 +2458,7 @@
 		/* link has failed */
 		port->is_enabled = false;
 		ad_update_actor_keys(port, true);
+		port->sm_vars &= ~AD_PORT_SELECTED;
 	}
 	netdev_dbg(slave->bond->dev, "Port %d changed link status to %s\n",
 		   port->actor_port_number,

Here's /proc/net/bonding/bond0 on an unmodified 4.7-rc3 after disabling two ports on the switch with the active aggregator. The active aggregator info still shows 3 ports. The results are the same on 4.4.x and 4.6.x kernels. The following options were used:

options bonding mode=4 miimon=100 downdelay=200 updelay=200 xmit_hash_policy=layer3+4 ad_select=1 max_bonds=0 min_links=0

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 2000
Down Delay (ms): 2000

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): bandwidth
System priority: 65535
System MAC address: f2:07:89:4a:7c:9f
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 3
        Actor Key: 9
        Partner Key: 57
        Partner Mac Address: 6c:3b:e5:df:7a:80

Slave Interface: enp5s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f1
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 1, port state: 63
details partner lacp pdu: system priority: 31360, system mac address: 6c:3b:e5:df:7a:80, oper key: 57, port priority: 0, port number: 23, port state: 61

Slave Interface: enp5s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f0
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 2, port state: 63
details partner lacp pdu: system priority: 36992, system mac address: 6c:3b:e5:e0:90:80, oper key: 57, port priority: 0, port number: 23, port state: 61

Slave Interface: ens6f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3c:41
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 0, port priority: 255, port number: 3, port state: 63
details partner lacp pdu: system priority: 31360, system mac address: 6c:3b:e5:df:7a:80, oper key: 57, port priority: 0, port number: 29, port state: 61

Slave Interface: ens6f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:83:3c:40
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 4, port state: 7
details partner lacp pdu: system priority: 36992, system mac address: 6c:3b:e5:e0:90:80, oper key: 57, port priority: 0, port number: 29, port state: 53

Slave Interface: ens5f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3d:1f
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 0, port priority: 255, port number: 5, port state: 143
details partner lacp pdu: system priority: 31360, system mac address: 6c:3b:e5:df:7a:80, oper key: 57, port priority: 0, port number: 28, port state: 55

Slave Interface: ens5f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:83:3d:1e
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 6, port state: 63
details partner lacp pdu: system priority: 36992, system mac address: 6c:3b:e5:e0:90:80, oper key: 57, port priority: 0, port number: 28, port state: 61

The results with the patch, after disabling the links and the aggregator has been reselected:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 2000
Down Delay (ms): 2000

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): bandwidth
System priority: 65535
System MAC address: f2:07:89:4a:7c:9f
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 9
        Partner Key: 57
        Partner Mac Address: 6c:3b:e5:e0:90:80

Slave Interface: enp5s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f1
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 1, port state: 63
details partner lacp pdu: system priority: 31360, system mac address: 6c:3b:e5:df:7a:80, oper key: 57, port priority: 0, port number: 23, port state: 61

Slave Interface: enp5s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f0
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 2, port state: 63
details partner lacp pdu: system priority: 36992, system mac address: 6c:3b:e5:e0:90:80, oper key: 57, port priority: 0, port number: 23, port state: 61

Slave Interface: ens6f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3c:41
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 0, port priority: 255, port number: 3, port state: 7
details partner lacp pdu: system priority: 31360, system mac address: 6c:3b:e5:df:7a:80, oper key: 57, port priority: 0, port number: 29, port state: 61

Slave Interface: ens6f0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3c:40
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 0, port priority: 255, port number: 4, port state: 135
details partner lacp pdu: system priority: 36992, system mac address: 6c:3b:e5:e0:90:80, oper key: 57, port priority: 0, port number: 29, port state: 55

Slave Interface: ens5f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3d:1f
Slave queue ID: 0
Aggregator ID: 5
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 0, port priority: 255, port number: 5, port state: 135
details partner lacp pdu: system priority: 31360, system mac address: 6c:3b:e5:df:7a:80, oper key: 57, port priority: 0, port number: 28, port state: 55

Slave Interface: ens5f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:83:3d:1e
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu: system priority: 65535, system mac address: f2:07:89:4a:7c:9f, port key: 9, port priority: 255, port number: 6, port state: 63
details partner lacp pdu: system priority: 36992, system mac address: 6c:3b:e5:e0:90:80, oper key: 57, port priority: 0, port number: 28, port state: 61

Happy hacking!

Veli-Matti

^ permalink raw reply	[flat|nested] 7+ messages in thread
[parent not found: <CAD=hENdGOFY5027964=f3xk_qeNmVccHYvr2rvTJtpFmaeFG2w@mail.gmail.com>]
* Re: 802.3ad bonding aggregator reselection
       [not found] ` <CAD=hENdGOFY5027964=f3xk_qeNmVccHYvr2rvTJtpFmaeFG2w@mail.gmail.com>
@ 2016-06-21 10:50   ` Veli-Matti Lintu
  2016-06-21 15:46     ` Jay Vosburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Veli-Matti Lintu @ 2016-06-21 10:50 UTC (permalink / raw)
  To: zhuyj; +Cc: netdev, Jay Vosburgh, Andy Gospodarek

2016-06-20 17:11 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
> 5. Switch Configuration
> =======================
>
> For this section, "switch" refers to whatever system the bonded devices are directly connected to (i.e., where the other end of the cable plugs into). This may be an actual dedicated switch device, or it may be another regular system (e.g., another computer running Linux),
>
> The active-backup, balance-tlb and balance-alb modes do not require any specific configuration of the switch.
>
> The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be grouped together in a single etherchannel instance, then that etherchannel is set to mode "lacp" to enable 802.3ad (instead of standard EtherChannel).

The ports are configured in the switch settings (HP Procurve 2530-48G) in the same trunk group (TrkX) and the trunk group type is set to LACP. /proc/net/bonding/bond0 also shows that the three ports belong to the same aggregator, and bandwidth tests also support this. In my understanding Procurve's trunk group is pretty much the same as etherchannel in Cisco's terminology. The bonded link always comes up properly, but handling of links going down is the problem. Are there known differences between vendors here?

Veli-Matti

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 802.3ad bonding aggregator reselection
  2016-06-21 10:50   ` Veli-Matti Lintu
@ 2016-06-21 15:46     ` Jay Vosburgh
  2016-06-21 20:48       ` Veli-Matti Lintu
  0 siblings, 1 reply; 7+ messages in thread
From: Jay Vosburgh @ 2016-06-21 15:46 UTC (permalink / raw)
  To: Veli-Matti Lintu; +Cc: zhuyj, netdev, Andy Gospodarek

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:

>2016-06-20 17:11 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
>> 5. Switch Configuration
>> =======================
>>
>> For this section, "switch" refers to whatever system the bonded devices are directly connected to (i.e., where the other end of the cable plugs into). This may be an actual dedicated switch device, or it may be another regular system (e.g., another computer running Linux),
>>
>> The active-backup, balance-tlb and balance-alb modes do not require any specific configuration of the switch.
>>
>> The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be grouped together in a single etherchannel instance, then that etherchannel is set to mode "lacp" to enable 802.3ad (instead of standard EtherChannel).
>
>The ports are configured in the switch settings (HP Procurve 2530-48G) in the same trunk group (TrkX) and the trunk group type is set to LACP. /proc/net/bonding/bond0 also shows that the three ports belong to the same aggregator, and bandwidth tests also support this. In my understanding Procurve's trunk group is pretty much the same as etherchannel in Cisco's terminology. The bonded link always comes up properly, but handling of links going down is the problem. Are there known differences between vendors here?

	I did the original LACP reselection testing on a Cisco switch, but I have an HP 2530 now; I'll test it later today or tomorrow and see if it behaves properly, and whether your proposed patch is needed.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 802.3ad bonding aggregator reselection
  2016-06-21 15:46     ` Jay Vosburgh
@ 2016-06-21 20:48       ` Veli-Matti Lintu
  2016-06-22  0:49         ` Jay Vosburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Veli-Matti Lintu @ 2016-06-21 20:48 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: zhuyj, netdev, Andy Gospodarek

2016-06-21 18:46 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>
>>2016-06-20 17:11 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
>>> 5. Switch Configuration
>>> =======================
>>>
>>> For this section, "switch" refers to whatever system the bonded devices are directly connected to (i.e., where the other end of the cable plugs into). This may be an actual dedicated switch device, or it may be another regular system (e.g., another computer running Linux),
>>>
>>> The active-backup, balance-tlb and balance-alb modes do not require any specific configuration of the switch.
>>>
>>> The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be grouped together in a single etherchannel instance, then that etherchannel is set to mode "lacp" to enable 802.3ad (instead of standard EtherChannel).
>>
>>The ports are configured in the switch settings (HP Procurve 2530-48G) in the same trunk group (TrkX) and the trunk group type is set to LACP. /proc/net/bonding/bond0 also shows that the three ports belong to the same aggregator, and bandwidth tests also support this. In my understanding Procurve's trunk group is pretty much the same as etherchannel in Cisco's terminology. The bonded link always comes up properly, but handling of links going down is the problem. Are there known differences between vendors here?
>
>	I did the original LACP reselection testing on a Cisco switch, but I have an HP 2530 now; I'll test it later today or tomorrow and see if it behaves properly, and whether your proposed patch is needed.

Thanks for taking a look at this. Here are some more details about the setup, as Zhu Yanjun also requested.

The server in question has two internal 10Gbps ports (using ixgbe) and two Intel I350-T2 dual-1Gbps PCIe cards (using igb). All ports are using 1Gbps connections.

05:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
81:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
82:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
82:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

In the test setup the bonds are set up as:

05:00.0 + 81:00.0 + 82:00.0
05:00.1 + 81:00.1 + 82:00.1

So each bond uses one port using ixgbe and two ports using igb.

When testing, I have disabled the port in the switch configuration, which brings the link down, and miimon also sees the link going down on the server. This should be the same as unplugging the cable, so nothing comes through the wire to the server.

Veli-Matti

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 802.3ad bonding aggregator reselection
  2016-06-21 20:48       ` Veli-Matti Lintu
@ 2016-06-22  0:49         ` Jay Vosburgh
  2016-06-22 17:43           ` Veli-Matti Lintu
  0 siblings, 1 reply; 7+ messages in thread
From: Jay Vosburgh @ 2016-06-22  0:49 UTC (permalink / raw)
  To: Veli-Matti Lintu; +Cc: zhuyj, netdev, Andy Gospodarek, Mahesh Bandewar

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
[...]
>>>The ports are configured in the switch settings (HP Procurve 2530-48G) in the same trunk group (TrkX) and the trunk group type is set to LACP. /proc/net/bonding/bond0 also shows that the three ports belong to the same aggregator, and bandwidth tests also support this. In my understanding Procurve's trunk group is pretty much the same as etherchannel in Cisco's terminology. The bonded link always comes up properly, but handling of links going down is the problem. Are there known differences between vendors here?
>>
>> I did the original LACP reselection testing on a Cisco switch, but I have an HP 2530 now; I'll test it later today or tomorrow and see if it behaves properly, and whether your proposed patch is needed.
>
>Thanks for taking a look at this. Here are some more details about the setup, as Zhu Yanjun also requested.

	Summary (because anything involving a standard tends to get long winded):

	This is not a switch problem. Bonding appears to be following the standard in this case. I've identified when this behavior changed, and I think we should violate the standard in this case for ad_select set to "bandwidth" or "count," neither of which is the default value.

	Long winded version:

	I've reproduced the issue locally, and it does not appear to be anything particular to the switch. It appears to be due to changes from

commit 7bb11dc9f59ddcb33ee317da77b235235aaa582a
Author: Mahesh Bandewar <maheshb@google.com>
Date:   Sat Oct 31 12:45:06 2015 -0700

    bonding: unify all places where actor-oper key needs to be updated.

	Specifically this block:

void bond_3ad_handle_link_change(struct slave *slave, char link)
[...]
-	/* there is no need to reselect a new aggregator, just signal the
-	 * state machines to reinitialize
-	 */
-	port->sm_vars |= AD_PORT_BEGIN;

	Previously, setting BEGIN would cause the port in question to be reinitialized, which in turn would trigger reselection.

	I'm not sure that adding this section back is the correct fix from the point of view of the standard, however, as 802.1AX 5.2.3.1.2 defines BEGIN as:

	A Boolean variable that is set to TRUE when the System is initialized or reinitialized, and is set to FALSE when (re-)initialization has completed.

	and in this case we're not reinitializing the System (i.e., the bond).

	Further, 802.1AX 5.4.12 says:

	If the port becomes inoperable and a BEGIN event has not occurred, the state machine enters the PORT_DISABLED state. Partner_Oper_Port_State.Synchronization is set to FALSE. This state allows the current Selection state to remain undisturbed, so that, in the event that the port is still connected to the same Partner and Partner port when it becomes operable again, there will be no disturbance caused to higher layers by unnecessary re-configuration.

	At the moment, bonding is doing what 5.4.12 specifies, by placing the port into PORT_DISABLED state. bond_3ad_handle_link_change clears port->is_enabled, which causes ad_rx_machine to clear AD_PORT_MATCHED but leave AD_PORT_SELECTED set. This in turn causes the selection logic to skip this port, resulting in the observed behavior (that the port is link down, but stays in the aggregator).
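To make that gating concrete, here is a condensed C sketch; the identifiers follow the bond_3ad.c names discussed in this thread, but the flag value and the surrounding control flow are simplified placeholders rather than the verbatim kernel code:

#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag value -- the kernel defines the real bit in bond_3ad.h. */
#define AD_PORT_SELECTED 0x01

struct port {
	uint16_t sm_vars;   /* state machine flags, including SELECTED */
	bool is_enabled;    /* cleared on link failure by bond_3ad_handle_link_change() */
};

/* Run for every port by the periodic 802.3ad state machine. */
void port_selection_logic(struct port *port)
{
	/* A port still flagged SELECTED is skipped entirely, so a link-down
	 * port stays attached to its aggregator and the aggregator's port
	 * count never drops.
	 */
	if (port->sm_vars & AD_PORT_SELECTED)
		return;

	/* Only when SELECTED has been cleared (for example by the proposed
	 * "port->sm_vars &= ~AD_PORT_SELECTED" on link failure) does the
	 * port get detached here and aggregator selection run again.
	 */
	/* ... detach from the old aggregator, pick and activate a new one ... */
}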
	Bonding will still remove the slave from the bond->slave_arr, so it won't actually try to send on this slave. I'll further note that 802.1AX 5.4.7 defines port_enabled as:

	A variable indicating that the physical layer has indicated that the link has been established and the port is operable.
	Value: Boolean
	TRUE if the physical layer has indicated that the port is operable.
	FALSE otherwise.

	So, it appears that bonding is in conformance with the standard in this case.

	I don't see an issue with the above behavior when ad_select is set to the default value of "stable"; bonding does reselect a new aggregator when all links fail, and it appears to follow the standard.

	I think a reasonable compromise here is to utilize a modified version of your patch that clears SELECTED (to trigger reselection) when a link goes down, but only if ad_select is not "stable", for example:

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b9304a295f86..1ee5a3a5e658 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2458,6 +2458,8 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 		/* link has failed */
 		port->is_enabled = false;
 		ad_update_actor_keys(port, true);
+		if (__get_agg_selection_mode(port) != BOND_AD_STABLE)
+			port->sm_vars &= ~AD_PORT_SELECTED;
 	}
 	netdev_dbg(slave->bond->dev, "Port %d changed link status to %s\n",
 		   port->actor_port_number,

	I'll test this locally and will submit a formal patch with an update to bonding.txt tomorrow (if it works).

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: 802.3ad bonding aggregator reselection
  2016-06-22  0:49         ` Jay Vosburgh
@ 2016-06-22 17:43           ` Veli-Matti Lintu
  2016-06-23  5:58             ` Jay Vosburgh
  0 siblings, 1 reply; 7+ messages in thread
From: Veli-Matti Lintu @ 2016-06-22 17:43 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: zhuyj, netdev, Andy Gospodarek, Mahesh Bandewar

2016-06-22 3:49 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
>
> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
> [...]
>>>>The ports are configured in the switch settings (HP Procurve 2530-48G) in the same trunk group (TrkX) and the trunk group type is set to LACP. /proc/net/bonding/bond0 also shows that the three ports belong to the same aggregator, and bandwidth tests also support this. In my understanding Procurve's trunk group is pretty much the same as etherchannel in Cisco's terminology. The bonded link always comes up properly, but handling of links going down is the problem. Are there known differences between vendors here?
>>>
>>> I did the original LACP reselection testing on a Cisco switch, but I have an HP 2530 now; I'll test it later today or tomorrow and see if it behaves properly, and whether your proposed patch is needed.
>>
>>Thanks for taking a look at this. Here are some more details about the setup, as Zhu Yanjun also requested.
>
> Summary (because anything involving a standard tends to get long winded):
>
> This is not a switch problem. Bonding appears to be following the standard in this case. I've identified when this behavior changed, and I think we should violate the standard in this case for ad_select set to "bandwidth" or "count," neither of which is the default value.
>
> Long winded version:
>
> I've reproduced the issue locally, and it does not appear to be anything particular to the switch. It appears to be due to changes from
>
> commit 7bb11dc9f59ddcb33ee317da77b235235aaa582a
> Author: Mahesh Bandewar <maheshb@google.com>
> Date:   Sat Oct 31 12:45:06 2015 -0700
>
>     bonding: unify all places where actor-oper key needs to be updated.
>
> Specifically this block:
>
> void bond_3ad_handle_link_change(struct slave *slave, char link)
> [...]
> -	/* there is no need to reselect a new aggregator, just signal the
> -	 * state machines to reinitialize
> -	 */
> -	port->sm_vars |= AD_PORT_BEGIN;
>
> Previously, setting BEGIN would cause the port in question to be reinitialized, which in turn would trigger reselection.
>
> I'm not sure that adding this section back is the correct fix from the point of view of the standard, however, as 802.1AX 5.2.3.1.2 defines BEGIN as:
>
> A Boolean variable that is set to TRUE when the System is initialized or reinitialized, and is set to FALSE when (re-)initialization has completed.
>
> and in this case we're not reinitializing the System (i.e., the bond).
>
> Further, 802.1AX 5.4.12 says:
>
> If the port becomes inoperable and a BEGIN event has not occurred, the state machine enters the PORT_DISABLED state. Partner_Oper_Port_State.Synchronization is set to FALSE. This state allows the current Selection state to remain undisturbed, so that, in the event that the port is still connected to the same Partner and Partner port when it becomes operable again, there will be no disturbance caused to higher layers by unnecessary re-configuration.
>
> At the moment, bonding is doing what 5.4.12 specifies, by placing the port into PORT_DISABLED state. bond_3ad_handle_link_change clears port->is_enabled, which causes ad_rx_machine to clear AD_PORT_MATCHED but leave AD_PORT_SELECTED set. This in turn causes the selection logic to skip this port, resulting in the observed behavior (that the port is link down, but stays in the aggregator).
>
> Bonding will still remove the slave from the bond->slave_arr, so it won't actually try to send on this slave. I'll further note that 802.1AX 5.4.7 defines port_enabled as:
>
> A variable indicating that the physical layer has indicated that the link has been established and the port is operable.
> Value: Boolean
> TRUE if the physical layer has indicated that the port is operable.
> FALSE otherwise.
>
> So, it appears that bonding is in conformance with the standard in this case.

I haven't done extensive testing on this, but I haven't noticed anything that would indicate that anything is sent to failed ports. So this part should be working.

> I don't see an issue with the above behavior when ad_select is set to the default value of "stable"; bonding does reselect a new aggregator when all links fail, and it appears to follow the standard.

In my testing ad_select=stable does not reselect a new aggregator when all links have failed. Reselection seems to occur only when a link comes up after the failure. Here's an example of two bonds having three links each. Aggregator ID 3 is active with three ports and ID 2 also has three ports up.

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 0c:c4:7a:34:c7:f1
Active Aggregator Info:
        Aggregator ID: 3
        Number of ports: 3
        Actor Key: 9
        Partner Key: 57
        Partner Mac Address: 6c:3b:e5:df:7a:80

Disable all ports in aggregator id 2 (enp5s0f1, ens5f1 and ens6f1) in the switch configuration at the same time:

[  146.783003] ixgbe 0000:05:00.1 enp5s0f1: NIC Link is Down
[  146.783223] ixgbe 0000:05:00.1 enp5s0f1: speed changed to 0 for port enp5s0f1
[  146.858824] bond0: link status down for interface enp5s0f1, disabling it in 200 ms
[  147.058932] bond0: link status definitely down for interface enp5s0f1, disabling it
[  147.291259] igb 0000:81:00.1 ens5f1: igb: ens5f1 NIC Link is Down
[  147.303303] igb 0000:82:00.1 ens6f1: igb: ens6f1 NIC Link is Down
[  147.358862] bond0: link status down for interface ens6f1, disabling it in 200 ms
[  147.358868] bond0: link status down for interface ens5f1, disabling it in 200 ms
[  147.558929] bond0: link status definitely down for interface ens6f1, disabling it
[  147.558987] bond0: link status definitely down for interface ens5f1, disabling it

At this point there is no connection to the host and the aggregator with all failed links is still active.

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 0c:c4:7a:34:c7:f1
Active Aggregator Info:
        Aggregator ID: 3
        Number of ports: 3
        Actor Key: 9
        Partner Key: 57
        Partner Mac Address: 6c:3b:e5:df:7a:80

If I then bring down an interface that is connected to an active switch port and bring it back up, reselection is done:

# ifconfig ens5f0 down
# ifconfig ens5f0 up

[  190.258900] bond0: link status down for interface ens5f0, disabling it in 200 ms
[  190.458934] bond0: link status definitely down for interface ens5f0, disabling it
[  193.192453] 8021q: adding VLAN 0 to HW filter on device ens5f0
[  196.156105] igb 0000:81:00.0 ens5f0: igb: ens5f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  196.158912] bond0: link status up for interface ens5f0, enabling it in 200 ms
[  196.360471] bond0: link status definitely up for interface ens5f0, 1000 Mbps full duplex

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 0c:c4:7a:34:c7:f1
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 3
        Actor Key: 9
        Partner Key: 57
        Partner Mac Address: 6c:3b:e5:e0:90:80

At this point all connections resume normally. Are you able to reproduce this, or is reselection working as expected?

> I think a reasonable compromise here is to utilize a modified version of your patch that clears SELECTED (to trigger reselection) when a link goes down, but only if ad_select is not "stable", for example:
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index b9304a295f86..1ee5a3a5e658 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -2458,6 +2458,8 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
>  		/* link has failed */
>  		port->is_enabled = false;
>  		ad_update_actor_keys(port, true);
> +		if (__get_agg_selection_mode(port) != BOND_AD_STABLE)
> +			port->sm_vars &= ~AD_PORT_SELECTED;
>  	}
>  	netdev_dbg(slave->bond->dev, "Port %d changed link status to %s\n",
>  		   port->actor_port_number,
>
> 	I'll test this locally and will submit a formal patch with an update to bonding.txt tomorrow (if it works).
>
> 	-J
>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: 802.3ad bonding aggregator reselection
  2016-06-22 17:43           ` Veli-Matti Lintu
@ 2016-06-23  5:58             ` Jay Vosburgh
  0 siblings, 0 replies; 7+ messages in thread
From: Jay Vosburgh @ 2016-06-23  5:58 UTC (permalink / raw)
  To: Veli-Matti Lintu; +Cc: zhuyj, netdev, Andy Gospodarek, Mahesh Bandewar

Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
[...]
>> I don't see an issue with the above behavior when ad_select is set to the default value of "stable"; bonding does reselect a new aggregator when all links fail, and it appears to follow the standard.
>
>In my testing ad_select=stable does not reselect a new aggregator when all links have failed. Reselection seems to occur only when a link comes up after the failure. Here's an example of two bonds having three links each. Aggregator ID 3 is active with three ports and ID 2 also has three ports up.

	Yes, I've since observed that as well.

[...]
>Are you able to reproduce this, or is reselection working as expected?

	Reselection is not working correctly at all. I'm working up a more comprehensive fix; the setting of BEGIN in the older code masked a number of issues in the reselection logic that never came up, because setting BEGIN would do a full reselection from scratch at every slave carrier state change (meaning that no aggregator ever ended up with link-down ports as members).

	My test patch at the moment is below (this is against net); any testing or review would be appreciated. I have not tested the ad_select bandwidth behavior of this yet; I've been testing stable and count first.

	This patch should be conformant to the standard, which requires link-down ports to remain selected, but implementations are free to choose an active aggregator however they wish.

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b9304a295f86..57be940c4c37 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -657,6 +657,20 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
 	}
 }
 
+static int __agg_active_ports(struct aggregator *agg)
+{
+	struct port *port;
+	int active = 0;
+
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (port->is_enabled)
+			active++;
+	}
+
+	return active;
+}
+
 /**
  * __get_agg_bandwidth - get the total bandwidth of an aggregator
  * @aggregator: the aggregator we're looking at
@@ -665,38 +679,39 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
 static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 {
 	u32 bandwidth = 0;
+	int nports = __agg_active_ports(aggregator);
 
-	if (aggregator->num_of_ports) {
+	if (nports) {
 		switch (__get_link_speed(aggregator->lag_ports)) {
 		case AD_LINK_SPEED_1MBPS:
-			bandwidth = aggregator->num_of_ports;
+			bandwidth = nports;
 			break;
 		case AD_LINK_SPEED_10MBPS:
-			bandwidth = aggregator->num_of_ports * 10;
+			bandwidth = nports * 10;
 			break;
 		case AD_LINK_SPEED_100MBPS:
-			bandwidth = aggregator->num_of_ports * 100;
+			bandwidth = nports * 100;
 			break;
 		case AD_LINK_SPEED_1000MBPS:
-			bandwidth = aggregator->num_of_ports * 1000;
+			bandwidth = nports * 1000;
 			break;
 		case AD_LINK_SPEED_2500MBPS:
-			bandwidth = aggregator->num_of_ports * 2500;
+			bandwidth = nports * 2500;
 			break;
 		case AD_LINK_SPEED_10000MBPS:
-			bandwidth = aggregator->num_of_ports * 10000;
+			bandwidth = nports * 10000;
 			break;
 		case AD_LINK_SPEED_20000MBPS:
-			bandwidth = aggregator->num_of_ports * 20000;
+			bandwidth = nports * 20000;
 			break;
 		case AD_LINK_SPEED_40000MBPS:
-			bandwidth = aggregator->num_of_ports * 40000;
+			bandwidth = nports * 40000;
 			break;
 		case AD_LINK_SPEED_56000MBPS:
-			bandwidth = aggregator->num_of_ports * 56000;
+			bandwidth = nports * 56000;
 			break;
 		case AD_LINK_SPEED_100000MBPS:
-			bandwidth = aggregator->num_of_ports * 100000;
+			bandwidth = nports * 100000;
 			break;
 		default:
 			bandwidth = 0; /* to silence the compiler */
@@ -1530,10 +1545,10 @@ static struct aggregator *ad_agg_selection_test(struct aggregator *best,
 
 	switch (__get_agg_selection_mode(curr->lag_ports)) {
 	case BOND_AD_COUNT:
-		if (curr->num_of_ports > best->num_of_ports)
+		if (__agg_active_ports(curr) > __agg_active_ports(best))
 			return curr;
 
-		if (curr->num_of_ports < best->num_of_ports)
+		if (__agg_active_ports(curr) < __agg_active_ports(best))
 			return best;
 
 		/*FALLTHROUGH*/
@@ -1561,8 +1576,14 @@ static int agg_device_up(const struct aggregator *agg)
 	if (!port)
 		return 0;
 
-	return netif_running(port->slave->dev) &&
-	       netif_carrier_ok(port->slave->dev);
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (netif_running(port->slave->dev) &&
+		    netif_carrier_ok(port->slave->dev))
+			return 1;
+	}
+
+	return 0;
 }
 
 /**
@@ -1610,7 +1631,7 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 
 		agg->is_active = 0;
 
-		if (agg->num_of_ports && agg_device_up(agg))
+		if (__agg_active_ports(agg) && agg_device_up(agg))
 			best = ad_agg_selection_test(best, agg);
 	}
 
@@ -1622,7 +1643,7 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 	 * answering partner.
 	 */
 	if (active && active->lag_ports &&
-	    active->lag_ports->is_enabled &&
+	    __agg_active_ports(active) &&
 	    (__agg_has_partner(active) ||
 	     (!__agg_has_partner(active) && !__agg_has_partner(best)))) {
@@ -2432,7 +2453,9 @@ void bond_3ad_adapter_speed_duplex_changed(struct slave *slave)
  */
 void bond_3ad_handle_link_change(struct slave *slave, char link)
 {
+	struct aggregator *agg;
 	struct port *port;
+	bool dummy;
 
 	port = &(SLAVE_AD_INFO(slave)->port);
@@ -2459,6 +2482,9 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 		port->is_enabled = false;
 		ad_update_actor_keys(port, true);
 	}
+	agg = __get_first_agg(port);
+	ad_agg_selection_logic(agg, &dummy);
+
 	netdev_dbg(slave->bond->dev, "Port %d changed link status to %s\n",
 		   port->actor_port_number,
 		   link == BOND_LINK_UP ? "UP" : "DOWN");
@@ -2499,7 +2525,7 @@ int bond_3ad_set_carrier(struct bonding *bond)
 	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
 	if (active) {
 		/* are enough slaves available to consider link up? */
-		if (active->num_of_ports < bond->params.min_links) {
+		if (__agg_active_ports(active) < bond->params.min_links) {
 			if (netif_carrier_ok(bond->dev)) {
 				netif_carrier_off(bond->dev);
 				goto out;

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply related	[flat|nested] 7+ messages in thread
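As a rough user-space illustration of what counting only active ports changes in the patch above (a toy example, not kernel code; the structures and the helper are simplified stand-ins for the patch's __agg_active_ports()), an aggregator that still lists three member ports but has only one of them link-up no longer beats an aggregator with two working links once aggregators are compared by active port count:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of an aggregator's port list -- not the kernel's structures. */
struct port {
	bool is_enabled;                       /* link up? */
	struct port *next_port_in_aggregator;
};

struct aggregator {
	int num_of_ports;                      /* members, including link-down ports */
	struct port *lag_ports;
};

/* Counts only link-up members, mirroring the idea of __agg_active_ports(). */
static int agg_active_ports(const struct aggregator *agg)
{
	int active = 0;

	for (const struct port *p = agg->lag_ports; p;
	     p = p->next_port_in_aggregator)
		if (p->is_enabled)
			active++;
	return active;
}

int main(void)
{
	/* Aggregator A: 3 members, only 1 still has link. */
	struct port a3 = { false, NULL }, a2 = { false, &a3 }, a1 = { true, &a2 };
	struct aggregator agg_a = { 3, &a1 };

	/* Aggregator B: 2 members, both up. */
	struct port b2 = { true, NULL }, b1 = { true, &b2 };
	struct aggregator agg_b = { 2, &b1 };

	/* Comparing raw member counts still prefers A ... */
	printf("by num_of_ports: %s wins\n",
	       agg_a.num_of_ports > agg_b.num_of_ports ? "A" : "B");
	/* ... comparing active (link-up) ports prefers B. */
	printf("by active ports: %s wins\n",
	       agg_active_ports(&agg_a) > agg_active_ports(&agg_b) ? "A" : "B");
	return 0;
}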
Thread overview: 7+ messages
[not found] <CAKdSkDUb94mR7cDiZbxsc6fgm_8O5wagjUec4g-t5DF7R_GFDw@mail.gmail.com>
2016-06-17 10:40 ` Fwd: 802.3ad bonding aggregator reselection Veli-Matti Lintu
[not found] ` <CAD=hENdGOFY5027964=f3xk_qeNmVccHYvr2rvTJtpFmaeFG2w@mail.gmail.com>
2016-06-21 10:50 ` Veli-Matti Lintu
2016-06-21 15:46 ` Jay Vosburgh
2016-06-21 20:48 ` Veli-Matti Lintu
2016-06-22 0:49 ` Jay Vosburgh
2016-06-22 17:43 ` Veli-Matti Lintu
2016-06-23 5:58 ` Jay Vosburgh