public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jay Vosburgh <jv@jvosburgh.net>
To: Hangbin Liu <liuhangbin@gmail.com>
Cc: netdev@vger.kernel.org, Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Mahesh Bandewar <maheshb@google.com>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Liang Li <liali@redhat.com>
Subject: Re: [PATCHv3 net 2/3] bonding: restructure ad_churn_machine
Date: Thu, 26 Feb 2026 16:36:46 -0800	[thread overview]
Message-ID: <941081.1772152606@famine> (raw)
In-Reply-To: <20260226125331.28147-3-liuhangbin@gmail.com>

Hangbin Liu <liuhangbin@gmail.com> wrote:

>The current ad_churn_machine implementation only transitions the
>actor/partner churn state to churned or none after the churn timer expires.
>However, IEEE 802.1AX-2014 specifies that a port should enter the none
>state immediately once the actor’s port state enters synchronization.
>
>Another issue is that if the churn timer expires while the churn machine is
>not in the monitor state (e.g. already in churn), the state may remain
>stuck indefinitely with no further transitions. This becomes visible in
>multi-aggregator scenarios. For example:
>
>Ports 1 and 2 are in aggregator 1 (active)
>Ports 3 and 4 are in aggregator 2 (backup)
>
>Ports 1 and 2 should be in none
>Ports 3 and 4 should be in churned
>
>If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
>Under the current implementation, the resulting states may look like:
>
>agg 1 (backup): port 1 -> none, port 2 -> churned
>agg 2 (active): ports 3,4 keep in churned.
>
>The root cause is that ad_churn_machine() only clears the
>AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
>its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
>again, leaving no way to retrigger the timer. Fixing this solely in
>ad_rx_machine() is insufficient.
>
>This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
>(Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
>and timer behavior. With new implementation, there is no need to set
>AD_PORT_CHURNED in ad_rx_machine().
>
>Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
>Reported-by: Liang Li <liali@redhat.com>
>Tested-by: Liang Li <liali@redhat.com>
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

	I missed this last time it was posted, but reading it now I
think the functional change looks good, but I question the usefulness of
including the 25 line ASCII art version of the state diagram.

	The standard is publicly available, so a comment saying that the
state machine logic conforms to IEEE 802.1AX-2014 figures 6-23 and 6-24
should be sufficient.  Anyone seriously checking the code against the
standard will need to read the relevant text, so they'll be looking it
up anyway.

	-J

>---
> drivers/net/bonding/bond_3ad.c | 96 +++++++++++++++++++++++++---------
> 1 file changed, 71 insertions(+), 25 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c47f6a69fd2a..68258d61fd1c 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -44,7 +44,6 @@
> #define AD_PORT_STANDBY         0x80
> #define AD_PORT_SELECTED        0x100
> #define AD_PORT_MOVED           0x200
>-#define AD_PORT_CHURNED         (AD_PORT_ACTOR_CHURN | AD_PORT_PARTNER_CHURN)
> 
> /* Port Key definitions
>  * key is determined according to the link speed, duplex and
>@@ -1254,7 +1253,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 	/* first, check if port was reinitialized */
> 	if (port->sm_vars & AD_PORT_BEGIN) {
> 		port->sm_rx_state = AD_RX_INITIALIZE;
>-		port->sm_vars |= AD_PORT_CHURNED;
> 	/* check if port is not enabled */
> 	} else if (!(port->sm_vars & AD_PORT_BEGIN) && !port->is_enabled)
> 		port->sm_rx_state = AD_RX_PORT_DISABLED;
>@@ -1262,8 +1260,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 	else if (lacpdu && ((port->sm_rx_state == AD_RX_EXPIRED) ||
> 		 (port->sm_rx_state == AD_RX_DEFAULTED) ||
> 		 (port->sm_rx_state == AD_RX_CURRENT))) {
>-		if (port->sm_rx_state != AD_RX_CURRENT)
>-			port->sm_vars |= AD_PORT_CHURNED;
> 		port->sm_rx_timer_counter = 0;
> 		port->sm_rx_state = AD_RX_CURRENT;
> 	} else {
>@@ -1347,7 +1343,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 			port->partner_oper.port_state |= LACP_STATE_LACP_TIMEOUT;
> 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(AD_SHORT_TIMEOUT));
> 			port->actor_oper_port_state |= LACP_STATE_EXPIRED;
>-			port->sm_vars |= AD_PORT_CHURNED;
> 			break;
> 		case AD_RX_DEFAULTED:
> 			__update_default_selected(port);
>@@ -1379,11 +1374,41 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
>  * ad_churn_machine - handle port churn's state machine
>  * @port: the port we're looking at
>  *
>+ * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
>+ *
>+ *                                                     BEGIN || (! port_enabled)
>+ *                                                               |
>+ *                                      (3)                (1)   v
>+ *   +----------------------+     ActorPort.Sync     +-------------------------+
>+ *   |    NO_ACTOR_CHURN    | <--------------------- |   ACTOR_CHURN_MONITOR   |
>+ *   |======================|                        |=========================|
>+ *   | actor_churn = FALSE; |    ! ActorPort.Sync    | actor_churn = FALSE;    |
>+ *   |                      | ---------------------> | Start actor_churn_timer |
>+ *   +----------------------+           (4)          +-------------------------+
>+ *             ^                                                 |
>+ *             |                                                 |
>+ *             |                                      actor_churn_timer expired
>+ *             |                                                 |
>+ *       ActorPort.Sync                                          |  (2)
>+ *             |              +--------------------+             |
>+ *        (3)  |              |   ACTOR_CHURN      |             |
>+ *             |              |====================|             |
>+ *             +------------- | actor_churn = True | <-----------+
>+ *                            |                    |
>+ *                            +--------------------+
>+ *
>+ * Similar for the Figure 6-24 - Partner Churn Detection machine state diagram
>+ *
>+ * We don’t need to check actor_churn, because it can only be true when the
>+ * state is ACTOR_CHURN.
>  */
> static void ad_churn_machine(struct port *port)
> {
>-	if (port->sm_vars & AD_PORT_CHURNED) {
>-		port->sm_vars &= ~AD_PORT_CHURNED;
>+	bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
>+	bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
>+
>+	/* ---- 1. begin or port not enabled ---- */
>+	if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
> 		port->sm_churn_actor_state = AD_CHURN_MONITOR;
> 		port->sm_churn_partner_state = AD_CHURN_MONITOR;
> 		port->sm_churn_actor_timer_counter =
>@@ -1392,25 +1417,46 @@ static void ad_churn_machine(struct port *port)
> 			 __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> 		return;
> 	}
>-	if (port->sm_churn_actor_timer_counter &&
>-	    !(--port->sm_churn_actor_timer_counter) &&
>-	    port->sm_churn_actor_state == AD_CHURN_MONITOR) {
>-		if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
>-			port->sm_churn_actor_state = AD_NO_CHURN;
>-		} else {
>-			port->churn_actor_count++;
>-			port->sm_churn_actor_state = AD_CHURN;
>-		}
>+
>+	if (port->sm_churn_actor_timer_counter)
>+		port->sm_churn_actor_timer_counter--;
>+
>+	if (port->sm_churn_partner_timer_counter)
>+		port->sm_churn_partner_timer_counter--;
>+
>+	/* ---- 2. timer expired, enter CHURN ---- */
>+	if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
>+	    !port->sm_churn_actor_timer_counter) {
>+		port->sm_churn_actor_state = AD_CHURN;
>+		port->churn_actor_count++;
> 	}
>-	if (port->sm_churn_partner_timer_counter &&
>-	    !(--port->sm_churn_partner_timer_counter) &&
>-	    port->sm_churn_partner_state == AD_CHURN_MONITOR) {
>-		if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
>-			port->sm_churn_partner_state = AD_NO_CHURN;
>-		} else {
>-			port->churn_partner_count++;
>-			port->sm_churn_partner_state = AD_CHURN;
>-		}
>+
>+	if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
>+	    !port->sm_churn_partner_timer_counter) {
>+		port->sm_churn_partner_state = AD_CHURN;
>+		port->churn_partner_count++;
>+	}
>+
>+	/* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
>+	if ((port->sm_churn_actor_state == AD_CHURN_MONITOR ||
>+	     port->sm_churn_actor_state == AD_CHURN) && actor_synced)
>+		port->sm_churn_actor_state = AD_NO_CHURN;
>+
>+	if ((port->sm_churn_partner_state == AD_CHURN_MONITOR ||
>+	     port->sm_churn_partner_state == AD_CHURN) && partner_synced)
>+		port->sm_churn_partner_state = AD_NO_CHURN;
>+
>+	/* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
>+	if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_synced) {
>+		port->sm_churn_actor_state = AD_CHURN_MONITOR;
>+		port->sm_churn_actor_timer_counter =
>+			__ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
>+	}
>+
>+	if (port->sm_churn_partner_state == AD_NO_CHURN && !partner_synced) {
>+		port->sm_churn_partner_state = AD_CHURN_MONITOR;
>+		port->sm_churn_partner_timer_counter =
>+			__ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> 	}
> }
> 
>-- 
>2.50.1
>

---
	-Jay Vosburgh, jv@jvosburgh.net

  reply	other threads:[~2026-02-27  0:36 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26 12:53 [PATCHv3 net 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
2026-02-26 12:53 ` [PATCHv3 net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
2026-02-27  1:16   ` Jay Vosburgh
2026-02-27  2:31     ` Hangbin Liu
2026-02-27  4:14       ` Hangbin Liu
2026-02-27  4:42       ` Jay Vosburgh
2026-02-27  6:21         ` Hangbin Liu
2026-03-10  3:01           ` Hangbin Liu
2026-02-26 12:53 ` [PATCHv3 net 2/3] bonding: restructure ad_churn_machine Hangbin Liu
2026-02-27  0:36   ` Jay Vosburgh [this message]
2026-02-27  0:52     ` Hangbin Liu
2026-02-27  1:42       ` Jay Vosburgh
2026-02-27  2:36         ` Hangbin Liu
2026-02-26 12:53 ` [PATCHv3 net 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=941081.1772152606@famine \
    --to=jv@jvosburgh.net \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=liali@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=liuhangbin@gmail.com \
    --cc=maheshb@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=razor@blackwall.org \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox