* [PATCH net 0/3] bonding: fix 802.3ad churn machine and port state issues
@ 2025-11-24 4:33 Hangbin Liu
2025-11-24 4:33 ` [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Hangbin Liu @ 2025-11-24 4:33 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
Shuah Khan, linux-kselftest, Hangbin Liu
This series fixes two issues in the bonding 802.3ad implementation related
to port state management and churn detection, and extends the selftests to
cover them:
1. When disabling a port, we need to set AD_RX_PORT_DISABLED to ensure
proper state machine transitions, preventing ports from getting stuck
in the AD_RX_CURRENT state.
2. The ad_churn_machine implementation is restructured to follow the IEEE
802.1AX-2014 specification correctly. The current implementation does not
transition to the "none" state immediately when synchronization is achieved,
and can get stuck in the "churned" state in multi-aggregator scenarios.
3. Selftests are enhanced to validate both mux state machine and churn
state logic under aggregator selection and failover scenarios.
These changes ensure proper LACP state machine behavior and fix issues
where ports could remain in incorrect states during aggregator failover.
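For reference, the per-port mux and churn state can be inspected from
userspace with the same interfaces the selftest in patch 3 uses. The
commands below are only illustrative; bond and interface names are examples:
  # actor churn state of a slave, from the bonding proc file
  awk '/Slave Interface: eth0/ {found=1}
       found && /Actor Churn State:/ {print $4; exit}' /proc/net/bonding/bond0
  # mux (collecting/distributing) state bits via iproute2
  ip -d -j link show eth0 | \
      jq '.[].linkinfo.info_slave_data.ad_actor_oper_port_state_str'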
Hangbin Liu (3):
bonding: set AD_RX_PORT_DISABLED when disabling a port
bonding: restructure ad_churn_machine
selftests: bonding: add mux and churn state testing
drivers/net/bonding/bond_3ad.c | 105 ++++++++++++++----
.../selftests/drivers/net/bonding/Makefile | 2 +-
...nd_lacp_prio.sh => bond_lacp_ad_select.sh} | 73 ++++++++++++
3 files changed, 159 insertions(+), 21 deletions(-)
rename tools/testing/selftests/drivers/net/bonding/{bond_lacp_prio.sh => bond_lacp_ad_select.sh} (64%)
--
2.50.1
* [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port
2025-11-24 4:33 [PATCH net 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
@ 2025-11-24 4:33 ` Hangbin Liu
2025-11-27 10:15 ` Paolo Abeni
2025-11-24 4:33 ` [PATCH net 2/3] bonding: restructure ad_churn_machine Hangbin Liu
2025-11-24 4:33 ` [PATCH net 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu
2 siblings, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2025-11-24 4:33 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
Shuah Khan, linux-kselftest, Hangbin Liu
When disabling a port’s collecting and distributing states, updating only
rx_disabled is not sufficient. We also need to set AD_RX_PORT_DISABLED
so that the rx_machine transitions into the AD_RX_EXPIRED state.
One example is in ad_agg_selection_logic(): when a new aggregator is
selected and the old active aggregator is disabled, if AD_RX_PORT_DISABLED is
not set, the disabled port may remain stuck in AD_RX_CURRENT due to
continuing to receive partner LACP messages.
The __disable_port() called by ad_disable_collecting_distributing()
does not have this issue, since its caller also clears the
collecting/distributing bits.
The __disable_port() called by bond_3ad_bind_slave() should also be fine,
as the RX state machine is re-initialized to AD_RX_INITIALIZE.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
drivers/net/bonding/bond_3ad.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 49717b7b82a2..d6bd3615d129 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -205,6 +205,7 @@ static void __enable_collecting_port(struct port *port)
*/
static inline void __disable_port(struct port *port)
{
+ port->sm_rx_state = AD_RX_PORT_DISABLED;
bond_set_slave_inactive_flags(port->slave, BOND_SLAVE_NOTIFY_LATER);
}
--
2.50.1
* [PATCH net 2/3] bonding: restructure ad_churn_machine
2025-11-24 4:33 [PATCH net 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
2025-11-24 4:33 ` [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
@ 2025-11-24 4:33 ` Hangbin Liu
2025-11-27 10:36 ` Paolo Abeni
2025-11-24 4:33 ` [PATCH net 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu
2 siblings, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2025-11-24 4:33 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
Shuah Khan, linux-kselftest, Hangbin Liu, Liang Li
The current ad_churn_machine implementation only transitions the
actor/partner churn state to churned or none after the churn timer expires.
However, IEEE 802.1AX-2014 specifies that a port should enter the none
state immediately once the actor’s port state enters synchronization.
Another issue is that if the churn timer expires while the churn machine is
not in the monitor state (e.g. already in churn), the state may remain
stuck indefinitely with no further transitions. This becomes visible in
multi-aggregator scenarios. For example:
Ports 1 and 2 are in aggregator 1 (active)
Ports 3 and 4 are in aggregator 2 (backup)
Ports 1 and 2 should be in none
Ports 3 and 4 should be in churned
If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
Under the current implementation, the resulting states may look like:
agg 1 (backup): port 1 -> none, port 2 -> churned
agg 2 (active): ports 3 and 4 remain in churned.
The root cause is that ad_churn_machine() only clears the
AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
again, leaving no way to retrigger the timer. Fixing this solely in
ad_rx_machine() is insufficient.
This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
(Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
and timer behavior. With the new implementation, there is no need to set
AD_PORT_CHURNED in ad_rx_machine().
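A rough manual reproduction of the failover scenario above, with interface
names as examples only (the selftest in patch 3 automates the check):
  # eth0/eth1 are in aggregator 1 (active), eth2/eth3 in aggregator 2 (backup)
  ip link set eth1 down
  sleep 2
  ip link set eth1 up
  # aggregator 2 takes over; without this patch its ports can keep reporting
  # "churned" in /proc/net/bonding/bond0 even after they are in sync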
Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
Reported-by: Liang Li <liali@redhat.com>
Tested-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
drivers/net/bonding/bond_3ad.c | 104 ++++++++++++++++++++++++++-------
1 file changed, 84 insertions(+), 20 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d6bd3615d129..98b8d5040148 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1240,7 +1240,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
/* first, check if port was reinitialized */
if (port->sm_vars & AD_PORT_BEGIN) {
port->sm_rx_state = AD_RX_INITIALIZE;
- port->sm_vars |= AD_PORT_CHURNED;
/* check if port is not enabled */
} else if (!(port->sm_vars & AD_PORT_BEGIN) && !port->is_enabled)
port->sm_rx_state = AD_RX_PORT_DISABLED;
@@ -1248,8 +1247,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
else if (lacpdu && ((port->sm_rx_state == AD_RX_EXPIRED) ||
(port->sm_rx_state == AD_RX_DEFAULTED) ||
(port->sm_rx_state == AD_RX_CURRENT))) {
- if (port->sm_rx_state != AD_RX_CURRENT)
- port->sm_vars |= AD_PORT_CHURNED;
port->sm_rx_timer_counter = 0;
port->sm_rx_state = AD_RX_CURRENT;
} else {
@@ -1333,7 +1330,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
port->partner_oper.port_state |= LACP_STATE_LACP_TIMEOUT;
port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(AD_SHORT_TIMEOUT));
port->actor_oper_port_state |= LACP_STATE_EXPIRED;
- port->sm_vars |= AD_PORT_CHURNED;
break;
case AD_RX_DEFAULTED:
__update_default_selected(port);
@@ -1365,39 +1361,107 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
* ad_churn_machine - handle port churn's state machine
* @port: the port we're looking at
*
+ * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
+ *
+ * BEGIN || (! port_enabled)
+ * |
+ * (3) (1) v
+ * +----------------------+ ActorPort.Sync +-------------------------+
+ * | NO_ACTOR_CHURN | <--------------------- | ACTOR_CHURN_MONITOR |
+ * |======================| |=========================|
+ * | actor_churn = FALSE; | ! ActorPort.Sync | actor_churn = FALSE; |
+ * | | ---------------------> | Start actor_churn_timer |
+ * +----------------------+ (4) +-------------------------+
+ * ^ |
+ * | |
+ * | actor_churn_timer expired
+ * | |
+ * ActorPort.Sync | (2)
+ * | +--------------------+ |
+ * (3) | | ACTOR_CHURN | |
+ * | |====================| |
+ * +------------- | actor_churn = True | <-----------+
+ * | |
+ * +--------------------+
+ *
+ * Similar for the Figure 6-24 - Partner Churn Detection machine state diagram
*/
static void ad_churn_machine(struct port *port)
{
- if (port->sm_vars & AD_PORT_CHURNED) {
+ bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
+ bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
+ bool partner_churned = port->sm_vars & AD_PORT_PARTNER_CHURN;
+ bool actor_churned = port->sm_vars & AD_PORT_ACTOR_CHURN;
+
+ /* ---- 1. begin or port not enabled ---- */
+ if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
port->sm_vars &= ~AD_PORT_CHURNED;
+
port->sm_churn_actor_state = AD_CHURN_MONITOR;
port->sm_churn_partner_state = AD_CHURN_MONITOR;
+
port->sm_churn_actor_timer_counter =
__ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
port->sm_churn_partner_timer_counter =
- __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
+ __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
+
return;
}
- if (port->sm_churn_actor_timer_counter &&
- !(--port->sm_churn_actor_timer_counter) &&
- port->sm_churn_actor_state == AD_CHURN_MONITOR) {
- if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
+
+ if (port->sm_churn_actor_timer_counter)
+ port->sm_churn_actor_timer_counter--;
+
+ if (port->sm_churn_partner_timer_counter)
+ port->sm_churn_partner_timer_counter--;
+
+ /* ---- 2. timer expired, enter CHURN ---- */
+ if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
+ !actor_churned && !port->sm_churn_actor_timer_counter) {
+ port->sm_vars |= AD_PORT_ACTOR_CHURN;
+ port->sm_churn_actor_state = AD_CHURN;
+ port->churn_actor_count++;
+ actor_churned = true;
+ }
+
+ if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
+ !partner_churned && !port->sm_churn_partner_timer_counter) {
+ port->sm_vars |= AD_PORT_PARTNER_CHURN;
+ port->sm_churn_partner_state = AD_CHURN;
+ port->churn_partner_count++;
+ partner_churned = true;
+ }
+
+ /* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
+ if ((port->sm_churn_actor_state == AD_CHURN_MONITOR && !actor_churned) ||
+ (port->sm_churn_actor_state == AD_CHURN && actor_churned)) {
+ if (actor_synced) {
+ port->sm_vars &= ~AD_PORT_ACTOR_CHURN;
port->sm_churn_actor_state = AD_NO_CHURN;
- } else {
- port->churn_actor_count++;
- port->sm_churn_actor_state = AD_CHURN;
+ actor_churned = false;
}
}
- if (port->sm_churn_partner_timer_counter &&
- !(--port->sm_churn_partner_timer_counter) &&
- port->sm_churn_partner_state == AD_CHURN_MONITOR) {
- if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
+
+ if ((port->sm_churn_partner_state == AD_CHURN_MONITOR && !partner_churned) ||
+ (port->sm_churn_partner_state == AD_CHURN && partner_churned)) {
+ if (partner_synced) {
+ port->sm_vars &= ~AD_PORT_PARTNER_CHURN;
port->sm_churn_partner_state = AD_NO_CHURN;
- } else {
- port->churn_partner_count++;
- port->sm_churn_partner_state = AD_CHURN;
+ partner_churned = false;
}
}
+
+ /* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
+ if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_churned && !actor_synced) {
+ port->sm_churn_actor_state = AD_CHURN_MONITOR;
+ port->sm_churn_actor_timer_counter =
+ __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
+ }
+
+ if (port->sm_churn_partner_state == AD_NO_CHURN && !partner_churned && !partner_synced) {
+ port->sm_churn_partner_state = AD_CHURN_MONITOR;
+ port->sm_churn_partner_timer_counter =
+ __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
+ }
}
/**
--
2.50.1
* [PATCH net 3/3] selftests: bonding: add mux and churn state testing
2025-11-24 4:33 [PATCH net 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
2025-11-24 4:33 ` [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
2025-11-24 4:33 ` [PATCH net 2/3] bonding: restructure ad_churn_machine Hangbin Liu
@ 2025-11-24 4:33 ` Hangbin Liu
2 siblings, 0 replies; 10+ messages in thread
From: Hangbin Liu @ 2025-11-24 4:33 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
Shuah Khan, linux-kselftest, Hangbin Liu
Rename the current LACP priority test to LACP ad_select testing, and
extend it to include validation of the actor state machine and churn
state logic. The updated tests verify that both the mux state machine and
the churn state machine behave correctly under aggregator selection and
failover scenarios.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
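Notes: the renamed test can be run with the usual kselftest workflow
(illustrative invocation, from the kernel source tree):
  make -C tools/testing/selftests TARGETS="drivers/net/bonding" run_tests
or directly from tools/testing/selftests/drivers/net/bonding:
  ./bond_lacp_ad_select.sh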
.../selftests/drivers/net/bonding/Makefile | 2 +-
...nd_lacp_prio.sh => bond_lacp_ad_select.sh} | 73 +++++++++++++++++++
2 files changed, 74 insertions(+), 1 deletion(-)
rename tools/testing/selftests/drivers/net/bonding/{bond_lacp_prio.sh => bond_lacp_ad_select.sh} (64%)
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6c5c60adb5e8..e7bddfbf0f7a 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -7,7 +7,7 @@ TEST_PROGS := \
bond-eth-type-change.sh \
bond-lladdr-target.sh \
bond_ipsec_offload.sh \
- bond_lacp_prio.sh \
+ bond_lacp_ad_select.sh \
bond_macvlan_ipvlan.sh \
bond_options.sh \
bond_passive_lacp.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh b/tools/testing/selftests/drivers/net/bonding/bond_lacp_ad_select.sh
similarity index 64%
rename from tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
rename to tools/testing/selftests/drivers/net/bonding/bond_lacp_ad_select.sh
index a483d505c6a8..9f0b3de4f55c 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_lacp_ad_select.sh
@@ -89,6 +89,65 @@ test_agg_reselect()
RET=1
}
+is_distributing()
+{
+ ip -j -n "$c_ns" -d link show "$1" \
+ | jq -e '.[].linkinfo.info_slave_data.ad_actor_oper_port_state_str | index("distributing")' > /dev/null
+}
+
+get_churn_state()
+{
+ local slave=$1
+ # shellcheck disable=SC2016
+ ip netns exec "$c_ns" awk -v s="$slave" '
+ $0 ~ "Slave Interface: " s {found=1}
+ found && /Actor Churn State:/ { print $4; exit }
+ ' /proc/net/bonding/bond0
+}
+
+check_slave_state()
+{
+ local state=$1
+ local slave_0=$2
+ local slave_1=$3
+ local churn_state
+ RET=0
+
+ s0_agg_id=$(cmd_jq "ip -n ${c_ns} -d -j link show $slave_0" \
+ ".[].linkinfo.info_slave_data.ad_aggregator_id")
+ s1_agg_id=$(cmd_jq "ip -n ${c_ns} -d -j link show $slave_1" \
+ ".[].linkinfo.info_slave_data.ad_aggregator_id")
+ if [ "${s0_agg_id}" -ne "${s1_agg_id}" ]; then
+ log_info "$state: $slave_0 $slave_1 agg ids are different"
+ RET=1
+ fi
+
+ for s in "$slave_0" "$slave_1"; do
+ churn_state=$(get_churn_state "$s")
+ if [ "$state" = "active" ]; then
+ if ! is_distributing "$s"; then
+ log_info "$state: $s is not in distributing state"
+ RET=1
+ fi
+ if [ "$churn_state" != "none" ]; then
+ log_info "$state: $s churn state $churn_state"
+ RET=1
+ fi
+ else
+ # Backup state, should be in churned and not distributing
+ if is_distributing "$s"; then
+ log_info "$state: $s is in distributing state"
+ RET=1
+ fi
+ if [ "$churn_state" != "churned" ]; then
+ log_info "$state: $s churn state $churn_state"
+ # shellcheck disable=SC2034
+ RET=1
+ fi
+ fi
+ done
+}
+
trap cleanup_all_ns EXIT
setup_ns c_ns s_ns b_ns
setup_links
@@ -98,11 +157,25 @@ log_test "bond 802.3ad" "actor_port_prio setting"
test_agg_reselect eth0
log_test "bond 802.3ad" "actor_port_prio select"
+# sleep for a while to make sure the mux state machine has completed.
+sleep 10
+check_slave_state active eth0 eth1
+log_test "bond 802.3ad" "active state/churn checking"
+# wait for the churn timer to expire; a bit longer is needed as we restart eth1
+sleep 55
+check_slave_state backup eth2 eth3
+log_test "bond 802.3ad" "backup state/churn checking"
# Change the actor port prio and re-test
ip -n "${c_ns}" link set eth0 type bond_slave actor_port_prio 10
ip -n "${c_ns}" link set eth2 type bond_slave actor_port_prio 1000
test_agg_reselect eth2
log_test "bond 802.3ad" "actor_port_prio switch"
+sleep 10
+check_slave_state active eth2 eth3
+log_test "bond 802.3ad" "active state/churn checking"
+sleep 55
+check_slave_state backup eth0 eth1
+log_test "bond 802.3ad" "backup state/churn checking"
exit "${EXIT_STATUS}"
--
2.50.1
* Re: [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port
2025-11-24 4:33 ` [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
@ 2025-11-27 10:15 ` Paolo Abeni
2025-11-27 13:33 ` Hangbin Liu
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-11-27 10:15 UTC (permalink / raw)
To: Hangbin Liu, netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Mahesh Bandewar, Shuah Khan,
linux-kselftest
On 11/24/25 5:33 AM, Hangbin Liu wrote:
> When disabling a port’s collecting and distributing states, updating only
> rx_disabled is not sufficient. We also need to set AD_RX_PORT_DISABLED
> so that the rx_machine transitions into the AD_RX_EXPIRED state.
>
> One example is in ad_agg_selection_logic(): when a new aggregator is
> selected and the old active aggregator is disabled, if AD_RX_PORT_DISABLED is
> not set, the disabled port may remain stuck in AD_RX_CURRENT due to
> continuing to receive partner LACP messages.
>
> The __disable_port() called by ad_disable_collecting_distributing()
> does not have this issue, since its caller also clears the
> collecting/distributing bits.
>
> The __disable_port() called by bond_3ad_bind_slave() should also be fine,
> as the RX state machine is re-initialized to AD_RX_INITIALIZE.
Given the above, why don't you apply the change in
ad_agg_selection_logic() only, to reduce the chances of unintended side
effects?
/P
* Re: [PATCH net 2/3] bonding: restructure ad_churn_machine
2025-11-24 4:33 ` [PATCH net 2/3] bonding: restructure ad_churn_machine Hangbin Liu
@ 2025-11-27 10:36 ` Paolo Abeni
2025-11-27 14:06 ` Hangbin Liu
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-11-27 10:36 UTC (permalink / raw)
To: Hangbin Liu, netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Mahesh Bandewar, Shuah Khan,
linux-kselftest, Liang Li
On 11/24/25 5:33 AM, Hangbin Liu wrote:
> The current ad_churn_machine implementation only transitions the
> actor/partner churn state to churned or none after the churn timer expires.
> However, IEEE 802.1AX-2014 specifies that a port should enter the none
> state immediately once the actor’s port state enters synchronization.
>
> Another issue is that if the churn timer expires while the churn machine is
> not in the monitor state (e.g. already in churn), the state may remain
> stuck indefinitely with no further transitions. This becomes visible in
> multi-aggregator scenarios. For example:
>
> Ports 1 and 2 are in aggregator 1 (active)
> Ports 3 and 4 are in aggregator 2 (backup)
>
> Ports 1 and 2 should be in none
> Ports 3 and 4 should be in churned
>
> If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
> Under the current implementation, the resulting states may look like:
>
> agg 1 (backup): port 1 -> none, port 2 -> churned
> agg 2 (active): ports 3 and 4 remain in churned.
>
> The root cause is that ad_churn_machine() only clears the
> AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
> its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
> again, leaving no way to retrigger the timer. Fixing this solely in
> ad_rx_machine() is insufficient.
>
> This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
> (Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
> and timer behavior. With the new implementation, there is no need to set
> AD_PORT_CHURNED in ad_rx_machine().
I think this change is too invasive at this point of the cycle. I think
it should be moved to the next one or even to net-next.
> Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
> Reported-by: Liang Li <liali@redhat.com>
> Tested-by: Liang Li <liali@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> drivers/net/bonding/bond_3ad.c | 104 ++++++++++++++++++++++++++-------
> 1 file changed, 84 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index d6bd3615d129..98b8d5040148 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -1240,7 +1240,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> /* first, check if port was reinitialized */
> if (port->sm_vars & AD_PORT_BEGIN) {
> port->sm_rx_state = AD_RX_INITIALIZE;
> - port->sm_vars |= AD_PORT_CHURNED;
> /* check if port is not enabled */
> } else if (!(port->sm_vars & AD_PORT_BEGIN) && !port->is_enabled)
> port->sm_rx_state = AD_RX_PORT_DISABLED;
> @@ -1248,8 +1247,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> else if (lacpdu && ((port->sm_rx_state == AD_RX_EXPIRED) ||
> (port->sm_rx_state == AD_RX_DEFAULTED) ||
> (port->sm_rx_state == AD_RX_CURRENT))) {
> - if (port->sm_rx_state != AD_RX_CURRENT)
> - port->sm_vars |= AD_PORT_CHURNED;
> port->sm_rx_timer_counter = 0;
> port->sm_rx_state = AD_RX_CURRENT;
> } else {
> @@ -1333,7 +1330,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> port->partner_oper.port_state |= LACP_STATE_LACP_TIMEOUT;
> port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(AD_SHORT_TIMEOUT));
> port->actor_oper_port_state |= LACP_STATE_EXPIRED;
> - port->sm_vars |= AD_PORT_CHURNED;
> break;
> case AD_RX_DEFAULTED:
> __update_default_selected(port);
> @@ -1365,39 +1361,107 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> * ad_churn_machine - handle port churn's state machine
> * @port: the port we're looking at
> *
> + * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
> + *
> + * BEGIN || (! port_enabled)
> + * |
> + * (3) (1) v
> + * +----------------------+ ActorPort.Sync +-------------------------+
> + * | NO_ACTOR_CHURN | <--------------------- | ACTOR_CHURN_MONITOR |
> + * |======================| |=========================|
> + * | actor_churn = FALSE; | ! ActorPort.Sync | actor_churn = FALSE; |
> + * | | ---------------------> | Start actor_churn_timer |
> + * +----------------------+ (4) +-------------------------+
> + * ^ |
> + * | |
> + * | actor_churn_timer expired
> + * | |
> + * ActorPort.Sync | (2)
> + * | +--------------------+ |
> + * (3) | | ACTOR_CHURN | |
> + * | |====================| |
> + * +------------- | actor_churn = True | <-----------+
> + * | |
> + * +--------------------+
> + *
> + * Similar for the Figure 6-24 - Partner Churn Detection machine state diagram
> */
> static void ad_churn_machine(struct port *port)
> {
> - if (port->sm_vars & AD_PORT_CHURNED) {
> + bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
> + bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
> + bool partner_churned = port->sm_vars & AD_PORT_PARTNER_CHURN;
> + bool actor_churned = port->sm_vars & AD_PORT_ACTOR_CHURN;
> +
> + /* ---- 1. begin or port not enabled ---- */
> + if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
> port->sm_vars &= ~AD_PORT_CHURNED;
> +
> port->sm_churn_actor_state = AD_CHURN_MONITOR;
> port->sm_churn_partner_state = AD_CHURN_MONITOR;
> +
> port->sm_churn_actor_timer_counter =
> __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
> port->sm_churn_partner_timer_counter =
> - __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> + __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> +
Please avoid white-space changes only, or if you are going to target
net-next, move them to a pre-req patch.
> return;
> }
> - if (port->sm_churn_actor_timer_counter &&
> - !(--port->sm_churn_actor_timer_counter) &&
> - port->sm_churn_actor_state == AD_CHURN_MONITOR) {
> - if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
> +
> + if (port->sm_churn_actor_timer_counter)
> + port->sm_churn_actor_timer_counter--;
> +
> + if (port->sm_churn_partner_timer_counter)
> + port->sm_churn_partner_timer_counter--;
> +
> + /* ---- 2. timer expired, enter CHURN ---- */
> + if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
> + !actor_churned && !port->sm_churn_actor_timer_counter) {
> + port->sm_vars |= AD_PORT_ACTOR_CHURN;
> + port->sm_churn_actor_state = AD_CHURN;
> + port->churn_actor_count++;
> + actor_churned = true;
> + }
> +
> + if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
> + !partner_churned && !port->sm_churn_partner_timer_counter) {
> + port->sm_vars |= AD_PORT_PARTNER_CHURN;
> + port->sm_churn_partner_state = AD_CHURN;
> + port->churn_partner_count++;
> + partner_churned = true;
> + }
> +
> + /* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
> + if ((port->sm_churn_actor_state == AD_CHURN_MONITOR && !actor_churned) ||
> + (port->sm_churn_actor_state == AD_CHURN && actor_churned)) {
Is this ^^^^^^^^^^^^^^^^
test needed? I *think* the state machine guarantees `actor_churned == true` when
`sm_churn_actor_state == AD_CHURN`.
> + if (actor_synced) {
> + port->sm_vars &= ~AD_PORT_ACTOR_CHURN;
> port->sm_churn_actor_state = AD_NO_CHURN;
> - } else {
> - port->churn_actor_count++;
> - port->sm_churn_actor_state = AD_CHURN;
> + actor_churned = false;
> }
I think this part is not described by the state diagram above?!?
> }
> - if (port->sm_churn_partner_timer_counter &&
> - !(--port->sm_churn_partner_timer_counter) &&
> - port->sm_churn_partner_state == AD_CHURN_MONITOR) {
> - if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
> +
> + if ((port->sm_churn_partner_state == AD_CHURN_MONITOR && !partner_churned) ||
> + (port->sm_churn_partner_state == AD_CHURN && partner_churned)) {
> + if (partner_synced) {
> + port->sm_vars &= ~AD_PORT_PARTNER_CHURN;
> port->sm_churn_partner_state = AD_NO_CHURN;
> - } else {
> - port->churn_partner_count++;
> - port->sm_churn_partner_state = AD_CHURN;
> + partner_churned = false;
> }
Possibly move this `if` block into a separate helper and reuse it for both
partner and actor.
> }
> +
> + /* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
> + if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_churned && !actor_synced) {
> + port->sm_churn_actor_state = AD_CHURN_MONITOR;
> + port->sm_churn_actor_timer_counter =
> + __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
Should this clear sm_vars & AD_PORT_ACTOR_CHURN, too?
> + }
> +
> + if (port->sm_churn_partner_state == AD_NO_CHURN && !partner_churned && !partner_synced) {
> + port->sm_churn_partner_state = AD_CHURN_MONITOR;
> + port->sm_churn_partner_timer_counter =
> + __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
Same question here.
/P
* Re: [PATCH net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port
2025-11-27 10:15 ` Paolo Abeni
@ 2025-11-27 13:33 ` Hangbin Liu
0 siblings, 0 replies; 10+ messages in thread
From: Hangbin Liu @ 2025-11-27 13:33 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Mahesh Bandewar, Shuah Khan,
linux-kselftest
On Thu, Nov 27, 2025 at 11:15:24AM +0100, Paolo Abeni wrote:
> On 11/24/25 5:33 AM, Hangbin Liu wrote:
> > When disabling a port’s collecting and distributing states, updating only
> > rx_disabled is not sufficient. We also need to set AD_RX_PORT_DISABLED
> > so that the rx_machine transitions into the AD_RX_EXPIRED state.
> >
> > One example is in ad_agg_selection_logic(): when a new aggregator is
> > selected and the old active aggregator is disabled, if AD_RX_PORT_DISABLED is
> > not set, the disabled port may remain stuck in AD_RX_CURRENT due to
> > continuing to receive partner LACP messages.
> >
> > The __disable_port() called by ad_disable_collecting_distributing()
> > does not have this issue, since its caller also clears the
> > collecting/distributing bits.
> >
> > The __disable_port() called by bond_3ad_bind_slave() should also be fine,
> > as the RX state machine is re-initialized to AD_RX_INITIALIZE.
>
> Given the above, why don't you apply the change in
> ad_agg_selection_logic() only, to reduce the chances of unintended side
> effects?
>
> /P
>
I think setting port->sm_rx_state = AD_RX_PORT_DISABLED and
slave->rx_disabled = 1 should be done as one atomic operation. The latter
two functions just do similar things (not the same as the path fixed here)
in other places.
Thanks
Hangbin
* Re: [PATCH net 2/3] bonding: restructure ad_churn_machine
2025-11-27 10:36 ` Paolo Abeni
@ 2025-11-27 14:06 ` Hangbin Liu
2025-11-27 15:29 ` Paolo Abeni
0 siblings, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2025-11-27 14:06 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Mahesh Bandewar, Shuah Khan,
linux-kselftest, Liang Li
On Thu, Nov 27, 2025 at 11:36:43AM +0100, Paolo Abeni wrote:
> On 11/24/25 5:33 AM, Hangbin Liu wrote:
> > The current ad_churn_machine implementation only transitions the
> > actor/partner churn state to churned or none after the churn timer expires.
> > However, IEEE 802.1AX-2014 specifies that a port should enter the none
> > state immediately once the actor’s port state enters synchronization.
> >
> > Another issue is that if the churn timer expires while the churn machine is
> > not in the monitor state (e.g. already in churn), the state may remain
> > stuck indefinitely with no further transitions. This becomes visible in
> > multi-aggregator scenarios. For example:
> >
> > Ports 1 and 2 are in aggregator 1 (active)
> > Ports 3 and 4 are in aggregator 2 (backup)
> >
> > Ports 1 and 2 should be in none
> > Ports 3 and 4 should be in churned
> >
> > If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
> > Under the current implementation, the resulting states may look like:
> >
> > agg 1 (backup): port 1 -> none, port 2 -> churned
> > agg 2 (active): ports 3 and 4 remain in churned.
> >
> > The root cause is that ad_churn_machine() only clears the
> > AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
> > its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
> > again, leaving no way to retrigger the timer. Fixing this solely in
> > ad_rx_machine() is insufficient.
> >
> > This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
> > (Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
> > and timer behavior. With the new implementation, there is no need to set
> > AD_PORT_CHURNED in ad_rx_machine().
>
> I think this change is too invasive at this point of the cycle. I think
> it should be moved to the next one or even to net-next.
Sure, I can move it to net-next
> > @@ -1365,39 +1361,107 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> > * ad_churn_machine - handle port churn's state machine
> > * @port: the port we're looking at
> > *
> > + * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
> > + *
> > + * BEGIN || (! port_enabled)
> > + * |
> > + * (3) (1) v
> > + * +----------------------+ ActorPort.Sync +-------------------------+
> > + * | NO_ACTOR_CHURN | <--------------------- | ACTOR_CHURN_MONITOR |
> > + * |======================| |=========================|
> > + * | actor_churn = FALSE; | ! ActorPort.Sync | actor_churn = FALSE; |
> > + * | | ---------------------> | Start actor_churn_timer |
> > + * +----------------------+ (4) +-------------------------+
> > + * ^ |
> > + * | |
> > + * | actor_churn_timer expired
> > + * | |
> > + * ActorPort.Sync | (2)
> > + * | +--------------------+ |
> > + * (3) | | ACTOR_CHURN | |
> > + * | |====================| |
> > + * +------------- | actor_churn = True | <-----------+
> > + * | |
> > + * +--------------------+
> > + *
> > + * Similar for the Figure 6-24 - Partner Churn Detection machine state diagram
> > */
> > static void ad_churn_machine(struct port *port)
> > {
> > - if (port->sm_vars & AD_PORT_CHURNED) {
> > + bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
> > + bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
> > + bool partner_churned = port->sm_vars & AD_PORT_PARTNER_CHURN;
> > + bool actor_churned = port->sm_vars & AD_PORT_ACTOR_CHURN;
> > +
> > + /* ---- 1. begin or port not enabled ---- */
> > + if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
> > port->sm_vars &= ~AD_PORT_CHURNED;
> > +
> > port->sm_churn_actor_state = AD_CHURN_MONITOR;
> > port->sm_churn_partner_state = AD_CHURN_MONITOR;
> > +
> > port->sm_churn_actor_timer_counter =
> > __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
> > port->sm_churn_partner_timer_counter =
> > - __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> > + __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> > +
>
> Please avoid white-space changes only, or if you are going to target
> net-next, move them to a pre-req patch.
OK, what's a pre-req patch?
>
> > return;
> > }
> > - if (port->sm_churn_actor_timer_counter &&
> > - !(--port->sm_churn_actor_timer_counter) &&
> > - port->sm_churn_actor_state == AD_CHURN_MONITOR) {
> > - if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
> > +
> > + if (port->sm_churn_actor_timer_counter)
> > + port->sm_churn_actor_timer_counter--;
> > +
> > + if (port->sm_churn_partner_timer_counter)
> > + port->sm_churn_partner_timer_counter--;
> > +
> > + /* ---- 2. timer expired, enter CHURN ---- */
> > + if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
> > + !actor_churned && !port->sm_churn_actor_timer_counter) {
> > + port->sm_vars |= AD_PORT_ACTOR_CHURN;
> > + port->sm_churn_actor_state = AD_CHURN;
> > + port->churn_actor_count++;
> > + actor_churned = true;
> > + }
> > +
> > + if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
> > + !partner_churned && !port->sm_churn_partner_timer_counter) {
> > + port->sm_vars |= AD_PORT_PARTNER_CHURN;
> > + port->sm_churn_partner_state = AD_CHURN;
> > + port->churn_partner_count++;
> > + partner_churned = true;
> > + }
> > +
> > + /* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
> > + if ((port->sm_churn_actor_state == AD_CHURN_MONITOR && !actor_churned) ||
> > + (port->sm_churn_actor_state == AD_CHURN && actor_churned)) {
>
> Is this ^^^^^^^^^^^^^^^^
>
> test needed? I *think* the state machine guarantees `actor_churned == true` when
> `sm_churn_actor_state == AD_CHURN`.
Yeah... We don't need this in theory.
>
> > + if (actor_synced) {
> > + port->sm_vars &= ~AD_PORT_ACTOR_CHURN;
> > port->sm_churn_actor_state = AD_NO_CHURN;
> > - } else {
> > - port->churn_actor_count++;
> > - port->sm_churn_actor_state = AD_CHURN;
> > + actor_churned = false;
> > }
>
> I think this part is not described by the state diagram above?!?
This part is about path (3): the port is in monitor or churn, and the actor
is in sync, so it moves to the no_churn state.
Do you mean port->sm_vars &= ~AD_PORT_ACTOR_CHURN is not described?
Hmm, maybe we don't need this after the reorganisation.
>
> > }
> > - if (port->sm_churn_partner_timer_counter &&
> > - !(--port->sm_churn_partner_timer_counter) &&
> > - port->sm_churn_partner_state == AD_CHURN_MONITOR) {
> > - if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
> > +
> > + if ((port->sm_churn_partner_state == AD_CHURN_MONITOR && !partner_churned) ||
> > + (port->sm_churn_partner_state == AD_CHURN && partner_churned)) {
> > + if (partner_synced) {
> > + port->sm_vars &= ~AD_PORT_PARTNER_CHURN;
> > port->sm_churn_partner_state = AD_NO_CHURN;
> > - } else {
> > - port->churn_partner_count++;
> > - port->sm_churn_partner_state = AD_CHURN;
> > + partner_churned = false;
> > }
>
> Possibly move this `if` block into a separate helper and reuse it for both
> partner and actor.
OK, let me try.
>
> > }
> > +
> > + /* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
> > + if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_churned && !actor_synced) {
> > + port->sm_churn_actor_state = AD_CHURN_MONITOR;
> > + port->sm_churn_actor_timer_counter =
> > + __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
>
> Should this clear sm_vars & AD_PORT_ACTOR_CHURN, too?
Yes, or we can just remove AD_PORT_ACTOR_CHURN as I said above.
Thanks
Hangbin
* Re: [PATCH net 2/3] bonding: restructure ad_churn_machine
2025-11-27 14:06 ` Hangbin Liu
@ 2025-11-27 15:29 ` Paolo Abeni
2025-11-28 0:26 ` Hangbin Liu
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-11-27 15:29 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Mahesh Bandewar, Shuah Khan,
linux-kselftest, Liang Li
On 11/27/25 3:06 PM, Hangbin Liu wrote:
> On Thu, Nov 27, 2025 at 11:36:43AM +0100, Paolo Abeni wrote:
>> On 11/24/25 5:33 AM, Hangbin Liu wrote:
>>> static void ad_churn_machine(struct port *port)
>>> {
>>> - if (port->sm_vars & AD_PORT_CHURNED) {
>>> + bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
>>> + bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
>>> + bool partner_churned = port->sm_vars & AD_PORT_PARTNER_CHURN;
>>> + bool actor_churned = port->sm_vars & AD_PORT_ACTOR_CHURN;
>>> +
>>> + /* ---- 1. begin or port not enabled ---- */
>>> + if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
>>> port->sm_vars &= ~AD_PORT_CHURNED;
>>> +
>>> port->sm_churn_actor_state = AD_CHURN_MONITOR;
>>> port->sm_churn_partner_state = AD_CHURN_MONITOR;
>>> +
>>> port->sm_churn_actor_timer_counter =
>>> __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
>>> port->sm_churn_partner_timer_counter =
>>> - __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
>>> + __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
>>> +
>>
>> Please avoid white-space changes only, or if you are going to target
>> net-next, move them to a pre-req patch.
>
> OK, what's a pre-req patch?
I mean: a separate patch, earlier in the series, to keep cosmetic and
functional changes separated and more easily reviewable.
>>> + if (actor_synced) {
>>> + port->sm_vars &= ~AD_PORT_ACTOR_CHURN;
>>> port->sm_churn_actor_state = AD_NO_CHURN;
>>> - } else {
>>> - port->churn_actor_count++;
>>> - port->sm_churn_actor_state = AD_CHURN;
>>> + actor_churned = false;
>>> }
>>
>> I think this part is not described by the state diagram above?!?
>
> This part is about path (3): the port is in monitor or churn, and the actor
> is in sync, so it moves to the no_churn state.
>
> Do you mean port->sm_vars &= ~AD_PORT_ACTOR_CHURN is not described?
> Hmm, maybe we don't need this after the reorganisation.
I mean the state changes in the else branch; I can't map them to the state
machine diagram.
>>> }
>>> +
>>> + /* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
>>> + if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_churned && !actor_synced) {
>>> + port->sm_churn_actor_state = AD_CHURN_MONITOR;
>>> + port->sm_churn_actor_timer_counter =
>>> + __ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
>>
>> Should this clear sm_vars & AD_PORT_ACTOR_CHURN, too?
>
> Yes, or we can just remove AD_PORT_ACTOR_CHURN as I said above.
Generally speaking, less state sounds simpler and better.
/P
* Re: [PATCH net 2/3] bonding: restructure ad_churn_machine
2025-11-27 15:29 ` Paolo Abeni
@ 2025-11-28 0:26 ` Hangbin Liu
0 siblings, 0 replies; 10+ messages in thread
From: Hangbin Liu @ 2025-11-28 0:26 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Mahesh Bandewar, Shuah Khan,
linux-kselftest, Liang Li
On Thu, Nov 27, 2025 at 04:29:47PM +0100, Paolo Abeni wrote:
> >> Please avoid white-space changes only, or if you are going to target
> >> net-next, move them to a pre-req patch.
> >
> > OK, what's a pre-req patch?
>
> I mean: a separate patch, earlier in the series, to keep cosmetic and
> functional changes separated and more easily reviewable.
Sure
> >>> + if (actor_synced) {
> >>> + port->sm_vars &= ~AD_PORT_ACTOR_CHURN;
> >>> port->sm_churn_actor_state = AD_NO_CHURN;
> >>> - } else {
> >>> - port->churn_actor_count++;
> >>> - port->sm_churn_actor_state = AD_CHURN;
> >>> + actor_churned = false;
> >>> }
> >>
> >> I think this part is not described by the state diagram above?!?
> >
> > This part is about path (3): the port is in monitor or churn, and the actor
> > is in sync, so it moves to the no_churn state.
> >
> > Do you mean port->sm_vars &= ~AD_PORT_ACTOR_CHURN is not described?
> > Hmm, maybe we don't need this after the reorganisation.
>
> I mean the state changes in the else branch; I can't map them to the state
> machine diagram.
The "else" line is removed
Thanks
Hangbin