public inbox for netdev@vger.kernel.org
* [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues
@ 2026-01-14  6:49 Hangbin Liu
  2026-01-14  6:49 ` [PATCHv2 net-next 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Hangbin Liu @ 2026-01-14  6:49 UTC (permalink / raw)
  To: netdev
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
	Shuah Khan, linux-kselftest, Hangbin Liu

This series fixes two issues in the bonding 802.3ad implementation
related to port state management and churn detection:

1. When disabling a port, we need to set AD_RX_PORT_DISABLED to ensure
   proper state machine transitions, preventing ports from getting stuck
   in AD_RX_CURRENT state.

2. The ad_churn_machine implementation is restructured to follow IEEE
   802.1AX-2014 specifications correctly. The current implementation has
   several issues: it doesn't transition to "none" state immediately when
   synchronization is achieved, and can get stuck in churned state in
   multi-aggregator scenarios.

3. Selftests are enhanced to validate both mux state machine and churn
   state logic under aggregator selection and failover scenarios.

These changes ensure proper LACP state machine behavior and fix issues
where ports could remain in incorrect states during aggregator failover.


v2:
  * The changes are large and not urgent, so post to net-next as Paolo suggested.
  * Set AD_RX_PORT_DISABLED only in ad_agg_selection_logic() to avoid side effects. (Paolo Abeni)
  * Remove actor_churn, as it can only be true when the state is ACTOR_CHURN. (Paolo Abeni)
  * Remove AD_PORT_CHURNED, since we don't need it anywhere. (Paolo Abeni)
  * I didn't add a new helper for ad_churn_machine(), as it doesn't seem to help much.

v1: https://lore.kernel.org/netdev/20251124043310.34073-1-liuhangbin@gmail.com

Hangbin Liu (3):
  bonding: set AD_RX_PORT_DISABLED when disabling a port
  bonding: restructure ad_churn_machine
  selftests: bonding: add mux and churn state testing

 drivers/net/bonding/bond_3ad.c                | 97 ++++++++++++++-----
 .../selftests/drivers/net/bonding/Makefile    |  2 +-
 ...nd_lacp_prio.sh => bond_lacp_ad_select.sh} | 73 ++++++++++++++
 3 files changed, 146 insertions(+), 26 deletions(-)
 rename tools/testing/selftests/drivers/net/bonding/{bond_lacp_prio.sh => bond_lacp_ad_select.sh} (64%)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 net-next 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port
  2026-01-14  6:49 [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
@ 2026-01-14  6:49 ` Hangbin Liu
  2026-01-14  6:49 ` [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine Hangbin Liu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Hangbin Liu @ 2026-01-14  6:49 UTC (permalink / raw)
  To: netdev
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
	Shuah Khan, linux-kselftest, Hangbin Liu

When disabling a port’s collecting and distributing states, updating only
rx_disabled is not sufficient. We also need to set AD_RX_PORT_DISABLED
so that the rx_machine transitions into the AD_RX_EXPIRED state.

One example is in ad_agg_selection_logic(): when a new aggregator is
selected and the old active aggregator is disabled, if AD_RX_PORT_DISABLED
is not set, the disabled port may remain stuck in AD_RX_CURRENT because it
continues to receive partner LACP messages.

The __disable_port() called by ad_disable_collecting_distributing()
does not have this issue, since its caller also clears the
collecting/distributing bits.

The __disable_port() called by bond_3ad_bind_slave() should also be fine,
as the RX state machine is re-initialized to AD_RX_INITIALIZE.

Let's fix this only in ad_agg_selection_logic() to reduce the chances of
unintended side effects.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_3ad.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 1a8de2bf8655..bcf9833e5436 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1926,6 +1926,7 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 		if (active) {
 			for (port = active->lag_ports; port;
 			     port = port->next_port_in_aggregator) {
+				port->sm_rx_state = AD_RX_PORT_DISABLED;
 				__disable_port(port);
 			}
 		}
-- 
2.50.1


* [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-14  6:49 [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
  2026-01-14  6:49 ` [PATCHv2 net-next 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
@ 2026-01-14  6:49 ` Hangbin Liu
  2026-01-19 20:22   ` Jakub Kicinski
  2026-01-14  6:49 ` [PATCHv2 net-next 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu
  2026-02-02  1:14 ` [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
  3 siblings, 1 reply; 15+ messages in thread
From: Hangbin Liu @ 2026-01-14  6:49 UTC (permalink / raw)
  To: netdev
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
	Shuah Khan, linux-kselftest, Hangbin Liu, Liang Li

The current ad_churn_machine implementation only transitions the
actor/partner churn state to churned or none after the churn timer expires.
However, IEEE 802.1AX-2014 specifies that a port should enter the none
state immediately once the actor’s port state enters synchronization.

Another issue is that if the churn timer expires while the churn machine is
not in the monitor state (e.g. already in churn), the state may remain
stuck indefinitely with no further transitions. This becomes visible in
multi-aggregator scenarios. For example:

Ports 1 and 2 are in aggregator 1 (active)
Ports 3 and 4 are in aggregator 2 (backup)

Ports 1 and 2 should be in none
Ports 3 and 4 should be in churned

If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
Under the current implementation, the resulting states may look like:

agg 1 (backup): port 1 -> none, port 2 -> churned
agg 2 (active): ports 3 and 4 remain churned

The root cause is that ad_churn_machine() only clears the
AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
again, leaving no way to retrigger the timer. Fixing this solely in
ad_rx_machine() is insufficient.

This patch rewrites ad_churn_machine() according to IEEE 802.1AX-2014
(Figures 6-23 and 6-24), ensuring correct churn detection, state
transitions, and timer behavior. With the new implementation, there is
no need to set AD_PORT_CHURNED in ad_rx_machine().

Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
Reported-by: Liang Li <liali@redhat.com>
Tested-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_3ad.c | 96 +++++++++++++++++++++++++---------
 1 file changed, 71 insertions(+), 25 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index bcf9833e5436..154e06e345ad 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -44,7 +44,6 @@
 #define AD_PORT_STANDBY         0x80
 #define AD_PORT_SELECTED        0x100
 #define AD_PORT_MOVED           0x200
-#define AD_PORT_CHURNED         (AD_PORT_ACTOR_CHURN | AD_PORT_PARTNER_CHURN)
 
 /* Port Key definitions
  * key is determined according to the link speed, duplex and
@@ -1248,7 +1247,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
 	/* first, check if port was reinitialized */
 	if (port->sm_vars & AD_PORT_BEGIN) {
 		port->sm_rx_state = AD_RX_INITIALIZE;
-		port->sm_vars |= AD_PORT_CHURNED;
 	/* check if port is not enabled */
 	} else if (!(port->sm_vars & AD_PORT_BEGIN) && !port->is_enabled)
 		port->sm_rx_state = AD_RX_PORT_DISABLED;
@@ -1256,8 +1254,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
 	else if (lacpdu && ((port->sm_rx_state == AD_RX_EXPIRED) ||
 		 (port->sm_rx_state == AD_RX_DEFAULTED) ||
 		 (port->sm_rx_state == AD_RX_CURRENT))) {
-		if (port->sm_rx_state != AD_RX_CURRENT)
-			port->sm_vars |= AD_PORT_CHURNED;
 		port->sm_rx_timer_counter = 0;
 		port->sm_rx_state = AD_RX_CURRENT;
 	} else {
@@ -1341,7 +1337,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
 			port->partner_oper.port_state |= LACP_STATE_LACP_TIMEOUT;
 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(AD_SHORT_TIMEOUT));
 			port->actor_oper_port_state |= LACP_STATE_EXPIRED;
-			port->sm_vars |= AD_PORT_CHURNED;
 			break;
 		case AD_RX_DEFAULTED:
 			__update_default_selected(port);
@@ -1373,11 +1368,41 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
  * ad_churn_machine - handle port churn's state machine
  * @port: the port we're looking at
  *
+ * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
+ *
+ *                                                     BEGIN || (! port_enabled)
+ *                                                               |
+ *                                      (3)                (1)   v
+ *   +----------------------+     ActorPort.Sync     +-------------------------+
+ *   |    NO_ACTOR_CHURN    | <--------------------- |   ACTOR_CHURN_MONITOR   |
+ *   |======================|                        |=========================|
+ *   | actor_churn = FALSE; |    ! ActorPort.Sync    | actor_churn = FALSE;    |
+ *   |                      | ---------------------> | Start actor_churn_timer |
+ *   +----------------------+           (4)          +-------------------------+
+ *             ^                                                 |
+ *             |                                                 |
+ *             |                                      actor_churn_timer expired
+ *             |                                                 |
+ *       ActorPort.Sync                                          |  (2)
+ *             |              +--------------------+             |
+ *        (3)  |              |   ACTOR_CHURN      |             |
+ *             |              |====================|             |
+ *             +------------- | actor_churn = True | <-----------+
+ *                            |                    |
+ *                            +--------------------+
+ *
+ * The Partner Churn Detection machine state diagram (Figure 6-24) is analogous.
+ *
+ * We don’t need to check actor_churn, because it can only be true when the
+ * state is ACTOR_CHURN.
  */
 static void ad_churn_machine(struct port *port)
 {
-	if (port->sm_vars & AD_PORT_CHURNED) {
-		port->sm_vars &= ~AD_PORT_CHURNED;
+	bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
+	bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
+
+	/* ---- 1. begin or port not enabled ---- */
+	if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
 		port->sm_churn_actor_state = AD_CHURN_MONITOR;
 		port->sm_churn_partner_state = AD_CHURN_MONITOR;
 		port->sm_churn_actor_timer_counter =
@@ -1386,25 +1411,46 @@ static void ad_churn_machine(struct port *port)
 			 __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
 		return;
 	}
-	if (port->sm_churn_actor_timer_counter &&
-	    !(--port->sm_churn_actor_timer_counter) &&
-	    port->sm_churn_actor_state == AD_CHURN_MONITOR) {
-		if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
-			port->sm_churn_actor_state = AD_NO_CHURN;
-		} else {
-			port->churn_actor_count++;
-			port->sm_churn_actor_state = AD_CHURN;
-		}
+
+	if (port->sm_churn_actor_timer_counter)
+		port->sm_churn_actor_timer_counter--;
+
+	if (port->sm_churn_partner_timer_counter)
+		port->sm_churn_partner_timer_counter--;
+
+	/* ---- 2. timer expired, enter CHURN ---- */
+	if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
+	    !port->sm_churn_actor_timer_counter) {
+		port->sm_churn_actor_state = AD_CHURN;
+		port->churn_actor_count++;
 	}
-	if (port->sm_churn_partner_timer_counter &&
-	    !(--port->sm_churn_partner_timer_counter) &&
-	    port->sm_churn_partner_state == AD_CHURN_MONITOR) {
-		if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
-			port->sm_churn_partner_state = AD_NO_CHURN;
-		} else {
-			port->churn_partner_count++;
-			port->sm_churn_partner_state = AD_CHURN;
-		}
+
+	if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
+	    !port->sm_churn_partner_timer_counter) {
+		port->sm_churn_partner_state = AD_CHURN;
+		port->churn_partner_count++;
+	}
+
+	/* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
+	if ((port->sm_churn_actor_state == AD_CHURN_MONITOR ||
+	     port->sm_churn_actor_state == AD_CHURN) && actor_synced)
+		port->sm_churn_actor_state = AD_NO_CHURN;
+
+	if ((port->sm_churn_partner_state == AD_CHURN_MONITOR ||
+	     port->sm_churn_partner_state == AD_CHURN) && partner_synced)
+		port->sm_churn_partner_state = AD_NO_CHURN;
+
+	/* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
+	if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_synced) {
+		port->sm_churn_actor_state = AD_CHURN_MONITOR;
+		port->sm_churn_actor_timer_counter =
+			__ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
+	}
+
+	if (port->sm_churn_partner_state == AD_NO_CHURN && !partner_synced) {
+		port->sm_churn_partner_state = AD_CHURN_MONITOR;
+		port->sm_churn_partner_timer_counter =
+			__ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
 	}
 }
 
-- 
2.50.1


* [PATCHv2 net-next 3/3] selftests: bonding: add mux and churn state testing
  2026-01-14  6:49 [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
  2026-01-14  6:49 ` [PATCHv2 net-next 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
  2026-01-14  6:49 ` [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine Hangbin Liu
@ 2026-01-14  6:49 ` Hangbin Liu
  2026-02-02  1:14 ` [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
  3 siblings, 0 replies; 15+ messages in thread
From: Hangbin Liu @ 2026-01-14  6:49 UTC (permalink / raw)
  To: netdev
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Mahesh Bandewar,
	Shuah Khan, linux-kselftest, Hangbin Liu

Rename the current LACP priority test to LACP ad_select testing, and
extend it to include validation of the actor state machine and churn
state logic. The updated tests verify that both the mux state machine and
the churn state machine behave correctly under aggregator selection and
failover scenarios.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 .../selftests/drivers/net/bonding/Makefile    |  2 +-
 ...nd_lacp_prio.sh => bond_lacp_ad_select.sh} | 73 +++++++++++++++++++
 2 files changed, 74 insertions(+), 1 deletion(-)
 rename tools/testing/selftests/drivers/net/bonding/{bond_lacp_prio.sh => bond_lacp_ad_select.sh} (64%)

diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6c5c60adb5e8..e7bddfbf0f7a 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -7,7 +7,7 @@ TEST_PROGS := \
 	bond-eth-type-change.sh \
 	bond-lladdr-target.sh \
 	bond_ipsec_offload.sh \
-	bond_lacp_prio.sh \
+	bond_lacp_ad_select.sh \
 	bond_macvlan_ipvlan.sh \
 	bond_options.sh \
 	bond_passive_lacp.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh b/tools/testing/selftests/drivers/net/bonding/bond_lacp_ad_select.sh
similarity index 64%
rename from tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
rename to tools/testing/selftests/drivers/net/bonding/bond_lacp_ad_select.sh
index a483d505c6a8..9f0b3de4f55c 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_lacp_ad_select.sh
@@ -89,6 +89,65 @@ test_agg_reselect()
 		RET=1
 }
 
+is_distributing()
+{
+	ip -j -n "$c_ns" -d link show "$1" \
+		| jq -e '.[].linkinfo.info_slave_data.ad_actor_oper_port_state_str | index("distributing")' > /dev/null
+}
+
+get_churn_state()
+{
+	local slave=$1
+	# shellcheck disable=SC2016
+	ip netns exec "$c_ns" awk -v s="$slave" '
+	$0 ~ "Slave Interface: " s {found=1}
+	found && /Actor Churn State:/ { print $4; exit }
+	' /proc/net/bonding/bond0
+}
+
+check_slave_state()
+{
+	local state=$1
+	local slave_0=$2
+	local slave_1=$3
+	local churn_state
+	RET=0
+
+	s0_agg_id=$(cmd_jq "ip -n ${c_ns} -d -j link show $slave_0" \
+		".[].linkinfo.info_slave_data.ad_aggregator_id")
+	s1_agg_id=$(cmd_jq "ip -n ${c_ns} -d -j link show $slave_1" \
+		".[].linkinfo.info_slave_data.ad_aggregator_id")
+	if [ "${s0_agg_id}" -ne "${s1_agg_id}" ]; then
+		log_info "$state: $slave_0 $slave_1 agg ids are different"
+		RET=1
+	fi
+
+	for s in "$slave_0" "$slave_1"; do
+		churn_state=$(get_churn_state "$s")
+		if [ "$state" = "active" ]; then
+			if ! is_distributing "$s"; then
+				log_info "$state: $s is not in distributing state"
+				RET=1
+			fi
+			if [ "$churn_state" != "none" ]; then
+				log_info "$state: $s churn state $churn_state"
+				RET=1
+			fi
+		else
+			# Backup state, should be in churned and not distributing
+			if is_distributing "$s"; then
+				log_info "$state: $s is in distributing state"
+				RET=1
+			fi
+			if [ "$churn_state" != "churned" ]; then
+				log_info "$state: $s churn state $churn_state"
+				# shellcheck disable=SC2034
+				RET=1
+			fi
+		fi
+	done
+}
+
 trap cleanup_all_ns EXIT
 setup_ns c_ns s_ns b_ns
 setup_links
@@ -98,11 +157,25 @@ log_test "bond 802.3ad" "actor_port_prio setting"
 
 test_agg_reselect eth0
 log_test "bond 802.3ad" "actor_port_prio select"
+# sleep for a while to make sure the mux state machine has completed.
+sleep 10
+check_slave_state active eth0 eth1
+log_test "bond 802.3ad" "active state/churn checking"
+# wait for the churn timer to expire; need a bit longer as we restart eth1
+sleep 55
+check_slave_state backup eth2 eth3
+log_test "bond 802.3ad" "backup state/churn checking"
 
 # Change the actor port prio and re-test
 ip -n "${c_ns}" link set eth0 type bond_slave actor_port_prio 10
 ip -n "${c_ns}" link set eth2 type bond_slave actor_port_prio 1000
 test_agg_reselect eth2
 log_test "bond 802.3ad" "actor_port_prio switch"
+sleep 10
+check_slave_state active eth2 eth3
+log_test "bond 802.3ad" "active state/churn checking"
+sleep 55
+check_slave_state backup eth0 eth1
+log_test "bond 802.3ad" "backup state/churn checking"
 
 exit "${EXIT_STATUS}"
-- 
2.50.1


* Re: [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-14  6:49 ` [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine Hangbin Liu
@ 2026-01-19 20:22   ` Jakub Kicinski
  2026-01-20  5:51     ` Hangbin Liu
  2026-01-20  8:29     ` Paolo Abeni
  0 siblings, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2026-01-19 20:22 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Mahesh Bandewar, Shuah Khan,
	linux-kselftest, Liang Li

On Wed, 14 Jan 2026 06:49:20 +0000 Hangbin Liu wrote:
> The current ad_churn_machine implementation only transitions the
> actor/partner churn state to churned or none after the churn timer expires.
> However, IEEE 802.1AX-2014 specifies that a port should enter the none
> state immediately once the actor’s port state enters synchronization.

Paolo, how do you feel about this patch with 2+ weeks until final?
The first patch is definitely suitable for net. If this one is not,
it should not have a Fixes tag. I'd lean towards getting them all
into -rc7 if we can.

BTW please make sure to CC Nik (based on the Fixes tag I think).
Always run get_maintainers on the patches.
-- 
pw-bot: cr

* Re: [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-19 20:22   ` Jakub Kicinski
@ 2026-01-20  5:51     ` Hangbin Liu
  2026-01-20  8:29     ` Paolo Abeni
  1 sibling, 0 replies; 15+ messages in thread
From: Hangbin Liu @ 2026-01-20  5:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Mahesh Bandewar, Shuah Khan,
	linux-kselftest, Liang Li

On Mon, Jan 19, 2026 at 12:22:03PM -0800, Jakub Kicinski wrote:
> BTW please make sure to CC Nik (based on the Fixes tag I think).
> Always run get_maintainers on the patches.

Thanks, I will keep this in mind.

Hangbin

* Re: [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-19 20:22   ` Jakub Kicinski
  2026-01-20  5:51     ` Hangbin Liu
@ 2026-01-20  8:29     ` Paolo Abeni
  2026-01-20 23:11       ` Jakub Kicinski
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2026-01-20  8:29 UTC (permalink / raw)
  To: Jakub Kicinski, Hangbin Liu
  Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Simon Horman, Mahesh Bandewar, Shuah Khan, linux-kselftest,
	Liang Li

On 1/19/26 9:22 PM, Jakub Kicinski wrote:
> On Wed, 14 Jan 2026 06:49:20 +0000 Hangbin Liu wrote:
>> The current ad_churn_machine implementation only transitions the
>> actor/partner churn state to churned or none after the churn timer expires.
>> However, IEEE 802.1AX-2014 specifies that a port should enter the none
>> state immediately once the actor’s port state enters synchronization.
> 
> Paolo, how do you feel about his patch with 2+ weeks until final?
> The first patch is definitely suitable for net. If this one is not
> it should not have a Fixes tag. I'd lean towards getting them all
> into -rc7 if we can.

My personal preference would be for 2/3 to land in net-next: the code
looks correct to me, but the refactor IMHO still has too much potential
for regressions to land directly in net, and the blamed commit is quite old.

I suggested targeting net-next while retaining the Fixes tag, as we have
already had complex fixes land in net-next in the past.

/P


* Re: [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-20  8:29     ` Paolo Abeni
@ 2026-01-20 23:11       ` Jakub Kicinski
  2026-01-21  7:58         ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2026-01-20 23:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Hangbin Liu, netdev, Jay Vosburgh, Andrew Lunn, David S. Miller,
	Eric Dumazet, Simon Horman, Mahesh Bandewar, Shuah Khan,
	linux-kselftest, Liang Li

On Tue, 20 Jan 2026 09:29:14 +0100 Paolo Abeni wrote:
> On 1/19/26 9:22 PM, Jakub Kicinski wrote:
> > On Wed, 14 Jan 2026 06:49:20 +0000 Hangbin Liu wrote:  
> >> The current ad_churn_machine implementation only transitions the
> >> actor/partner churn state to churned or none after the churn timer expires.
> >> However, IEEE 802.1AX-2014 specifies that a port should enter the none
> >> state immediately once the actor’s port state enters synchronization.  
> > 
> > Paolo, how do you feel about his patch with 2+ weeks until final?
> > The first patch is definitely suitable for net. If this one is not
> > it should not have a Fixes tag. I'd lean towards getting them all
> > into -rc7 if we can.  
> 
> My personal preference would be for 2/3 landing into net-next: the code
> looks correct to me, but refactor has IMHO still to much potential for
> regressions do land directly into net and the blamed commit is quite old.
> 
> I suggested targeting net-next while retaining the Fixes tag as we
> already had complex fixes landing into net-next in the past.

The appropriate way to delay propagation of the fix to add:

Cc: <stable@vger.kernel.org> # after 4 weeks

not to merge things into -next.

* Re: [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-20 23:11       ` Jakub Kicinski
@ 2026-01-21  7:58         ` Paolo Abeni
  2026-01-22  1:10           ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2026-01-21  7:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Hangbin Liu, netdev, Jay Vosburgh, Andrew Lunn, David S. Miller,
	Eric Dumazet, Simon Horman, Mahesh Bandewar, Shuah Khan,
	linux-kselftest, Liang Li

On 1/21/26 12:11 AM, Jakub Kicinski wrote:
> On Tue, 20 Jan 2026 09:29:14 +0100 Paolo Abeni wrote:
>> On 1/19/26 9:22 PM, Jakub Kicinski wrote:
>>> On Wed, 14 Jan 2026 06:49:20 +0000 Hangbin Liu wrote:  
>>>> The current ad_churn_machine implementation only transitions the
>>>> actor/partner churn state to churned or none after the churn timer expires.
>>>> However, IEEE 802.1AX-2014 specifies that a port should enter the none
>>>> state immediately once the actor’s port state enters synchronization.  
>>>
>>> Paolo, how do you feel about his patch with 2+ weeks until final?
>>> The first patch is definitely suitable for net. If this one is not
>>> it should not have a Fixes tag. I'd lean towards getting them all
>>> into -rc7 if we can.  
>>
>> My personal preference would be for 2/3 landing into net-next: the code
>> looks correct to me, but refactor has IMHO still to much potential for
>> regressions do land directly into net and the blamed commit is quite old.
>>
>> I suggested targeting net-next while retaining the Fixes tag as we
>> already had complex fixes landing into net-next in the past.
> 
> The appropriate way to delay propagation of the fix to add:
> 
> Cc: <stable@vger.kernel.org> # after 4 weeks
> 
> not to merge things into -next.

I went over the code as carefully as I could and I don't see any obvious
problem, so I don't have a strong opinion vs net. But to hopefully
clarify: my thinking is that this is net-next material, because it's an
invasive refactor with a behavior change. My preference would be to let
the code be tested in next/net-next for a while before it lands in mainline.

/P


* Re: [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine
  2026-01-21  7:58         ` Paolo Abeni
@ 2026-01-22  1:10           ` Jakub Kicinski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2026-01-22  1:10 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Hangbin Liu, netdev, Jay Vosburgh, Andrew Lunn, David S. Miller,
	Eric Dumazet, Simon Horman, Mahesh Bandewar, Shuah Khan,
	linux-kselftest, Liang Li

On Wed, 21 Jan 2026 08:58:13 +0100 Paolo Abeni wrote:
> >> My personal preference would be for 2/3 landing into net-next: the code
> >> looks correct to me, but refactor has IMHO still to much potential for
> >> regressions do land directly into net and the blamed commit is quite old.
> >>
> >> I suggested targeting net-next while retaining the Fixes tag as we
> >> already had complex fixes landing into net-next in the past.  
> > 
> > The appropriate way to delay propagation of the fix to add:
> > 
> > Cc: <stable@vger.kernel.org> # after 4 weeks
> > 
> > not to merge things into -next.  
> 
> I went over the code as carefully as I could and I don't see any obvious
> problem, so I don't have a so strong opinion vs net, but to hopefully
> clarify: my thinking is that this is net-next material because it's an
> invasive refactor with behavior change. My preference would be let the
> code be tested in next/net-next for a while before landing into mainline.

From the patch description it looks like a user-triggerable hang of
the FSM. But if this is more of a resiliency improvement than a fix,
then I'm perfectly fine with net-next. But then no Fixes tag and
no stable at all, please.

* Re: [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues
  2026-01-14  6:49 [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
                   ` (2 preceding siblings ...)
  2026-01-14  6:49 ` [PATCHv2 net-next 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu
@ 2026-02-02  1:14 ` Hangbin Liu
  2026-02-02  9:27   ` Paolo Abeni
  3 siblings, 1 reply; 15+ messages in thread
From: Hangbin Liu @ 2026-02-02  1:14 UTC (permalink / raw)
  To: Jakub Kicinski, Paolo Abeni
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, netdev, Eric Dumazet,
	Simon Horman, Mahesh Bandewar, Shuah Khan, linux-kselftest


Hi Jakub, Paolo,

Is there anything I should do for this patchset?

Thanks
Hangbin
On Wed, Jan 14, 2026 at 06:49:18AM +0000, Hangbin Liu wrote:
> This series fixes two issues in the bonding 802.3ad implementation
> related to port state management and churn detection:
> 
> 1. When disabling a port, we need to set AD_RX_PORT_DISABLED to ensure
>    proper state machine transitions, preventing ports from getting stuck
>    in AD_RX_CURRENT state.
> 
> 2. The ad_churn_machine implementation is restructured to follow IEEE
>    802.1AX-2014 specifications correctly. The current implementation has
>    several issues: it doesn't transition to "none" state immediately when
>    synchronization is achieved, and can get stuck in churned state in
>    multi-aggregator scenarios.
> 
> 3. Selftests are enhanced to validate both mux state machine and churn
>    state logic under aggregator selection and failover scenarios.
> 
> These changes ensure proper LACP state machine behavior and fix issues
> where ports could remain in incorrect states during aggregator failover.
> 
> 
> v2:
>   * The changes are large and not urgent. Post to net-next as Paolo suggested
>   * set AD_RX_PORT_DISABLED only in ad_agg_selection_logic to avoid side effect. (Paolo Abeni)
>   * remove actor_churn as it can only be true when the state is ACTOR_CHURN (Paolo Abeni)
>   * remove AD_PORT_CHURNED since we don't need it anywhere (Paolo Abeni)
>   * I didn't add new helper for ad_churn_machine() as it looks not help much.
> 
> v1: https://lore.kernel.org/netdev/20251124043310.34073-1-liuhangbin@gmail.com
> 
> Hangbin Liu (3):
>   bonding: set AD_RX_PORT_DISABLED when disabling a port
>   bonding: restructure ad_churn_machine
>   selftests: bonding: add mux and churn state testing
> 
>  drivers/net/bonding/bond_3ad.c                | 97 ++++++++++++++-----
>  .../selftests/drivers/net/bonding/Makefile    |  2 +-
>  ...nd_lacp_prio.sh => bond_lacp_ad_select.sh} | 73 ++++++++++++++
>  3 files changed, 146 insertions(+), 26 deletions(-)
>  rename tools/testing/selftests/drivers/net/bonding/{bond_lacp_prio.sh => bond_lacp_ad_select.sh} (64%)
> 
> -- 
> 2.50.1
> 

* Re: [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues
  2026-02-02  1:14 ` [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
@ 2026-02-02  9:27   ` Paolo Abeni
  2026-02-23  1:55     ` Hangbin Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2026-02-02  9:27 UTC (permalink / raw)
  To: Hangbin Liu, Jakub Kicinski
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, netdev, Eric Dumazet,
	Simon Horman, Mahesh Bandewar, Shuah Khan, linux-kselftest

On 2/2/26 2:14 AM, Hangbin Liu wrote:
> Hi Jakub, Paolo,
> 
> Is there anything I should do for this patchset?

My take is that, given we are now much closer to the 6.19 final release,
this should go via net-next.

Beware: it looks like Jakub and I are not 110% aligned on this topic,
so please wait a bit more for his feedback, thanks.

Paolo

* Re: [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues
  2026-02-02  9:27   ` Paolo Abeni
@ 2026-02-23  1:55     ` Hangbin Liu
  2026-02-26 11:55       ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Hangbin Liu @ 2026-02-23  1:55 UTC (permalink / raw)
  To: Paolo Abeni, Jakub Kicinski
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, netdev, Eric Dumazet,
	Simon Horman, Mahesh Bandewar, Shuah Khan, linux-kselftest

Hi Paolo, Jakub,

I'm not sure what I should do for this patch set. Should I re-post to net-next or net?

Thanks
Hangbin
On Mon, Feb 02, 2026 at 10:27:51AM +0100, Paolo Abeni wrote:
> On 2/2/26 2:14 AM, Hangbin Liu wrote:
> > Hi Jakub, Paolo,
> > 
> > Is there anything I should do for this patchset?
> 
> My take is that, given we are now much closer to the 6.19 final release,
> this should go via net-next.
> 
> Beware: it looks like Jakub and I are not 110% aligned on this topic,
> so please wait a bit more for his feedback, thanks.
> 
> Paolo

* Re: [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues
  2026-02-23  1:55     ` Hangbin Liu
@ 2026-02-26 11:55       ` Paolo Abeni
  2026-02-26 12:40         ` Hangbin Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2026-02-26 11:55 UTC (permalink / raw)
  To: Hangbin Liu, Jakub Kicinski
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, netdev, Eric Dumazet,
	Simon Horman, Mahesh Bandewar, Shuah Khan, linux-kselftest

On 2/23/26 2:55 AM, Hangbin Liu wrote:
> I'm not sure what I should do for this patch set. Should I re-post to net-next or net?

Jakub and I discussed the topic off-list and the current agreement is to
repost for 'net', with a Fixes tag.

Thanks for your patience.

/P

* Re: [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues
  2026-02-26 11:55       ` Paolo Abeni
@ 2026-02-26 12:40         ` Hangbin Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Hangbin Liu @ 2026-02-26 12:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, Jay Vosburgh, Andrew Lunn, David S. Miller,
	netdev, Eric Dumazet, Simon Horman, Mahesh Bandewar, Shuah Khan,
	linux-kselftest

On Thu, Feb 26, 2026 at 12:55:48PM +0100, Paolo Abeni wrote:
> On 2/23/26 2:55 AM, Hangbin Liu wrote:
> > I'm not sure what I should do for this patch set. Should I re-post to net-next or net?
> 
> Jakub and I discussed the topic off-list and the current agreement is to
> repost for 'net', with a Fixes tag.

Thanks, I will repost to net.

Regards
Hangbin

end of thread, other threads:[~2026-02-26 12:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-14  6:49 [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
2026-01-14  6:49 ` [PATCHv2 net-next 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
2026-01-14  6:49 ` [PATCHv2 net-next 2/3] bonding: restructure ad_churn_machine Hangbin Liu
2026-01-19 20:22   ` Jakub Kicinski
2026-01-20  5:51     ` Hangbin Liu
2026-01-20  8:29     ` Paolo Abeni
2026-01-20 23:11       ` Jakub Kicinski
2026-01-21  7:58         ` Paolo Abeni
2026-01-22  1:10           ` Jakub Kicinski
2026-01-14  6:49 ` [PATCHv2 net-next 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu
2026-02-02  1:14 ` [PATCHv2 net-next 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
2026-02-02  9:27   ` Paolo Abeni
2026-02-23  1:55     ` Hangbin Liu
2026-02-26 11:55       ` Paolo Abeni
2026-02-26 12:40         ` Hangbin Liu
