All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jay Vosburgh <jv@jvosburgh.net>
To: Hangbin Liu <liuhangbin@gmail.com>
Cc: netdev@vger.kernel.org, Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Mahesh Bandewar <maheshb@google.com>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Liang Li <liali@redhat.com>
Subject: Re: [PATCHv3 net 2/3] bonding: restructure ad_churn_machine
Date: Thu, 26 Feb 2026 16:36:46 -0800	[thread overview]
Message-ID: <941081.1772152606@famine> (raw)
In-Reply-To: <20260226125331.28147-3-liuhangbin@gmail.com>

Hangbin Liu <liuhangbin@gmail.com> wrote:

>The current ad_churn_machine implementation only transitions the
>actor/partner churn state to churned or none after the churn timer expires.
>However, IEEE 802.1AX-2014 specifies that a port should enter the none
>state immediately once the actor’s port state enters synchronization.
>
>Another issue is that if the churn timer expires while the churn machine is
>not in the monitor state (e.g. already in churn), the state may remain
>stuck indefinitely with no further transitions. This becomes visible in
>multi-aggregator scenarios. For example:
>
>Ports 1 and 2 are in aggregator 1 (active)
>Ports 3 and 4 are in aggregator 2 (backup)
>
>Ports 1 and 2 should be in none
>Ports 3 and 4 should be in churned
>
>If a failover occurs due to port 2 link down/up, aggregator 2 becomes active.
>Under the current implementation, the resulting states may look like:
>
>agg 1 (backup): port 1 -> none, port 2 -> churned
>agg 2 (active): ports 3,4 keep in churned.
>
>The root cause is that ad_churn_machine() only clears the
>AD_PORT_CHURNED flag and starts a timer. When a churned port becomes active,
>its RX state becomes AD_RX_CURRENT, preventing the churn flag from being set
>again, leaving no way to retrigger the timer. Fixing this solely in
>ad_rx_machine() is insufficient.
>
>This patch rewrites ad_churn_machine according to IEEE 802.1AX-2014
>(Figures 6-23 and 6-24), ensuring correct churn detection, state transitions,
>and timer behavior. With new implementation, there is no need to set
>AD_PORT_CHURNED in ad_rx_machine().
>
>Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).")
>Reported-by: Liang Li <liali@redhat.com>
>Tested-by: Liang Li <liali@redhat.com>
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

	I missed this last time it was posted, but reading it now I
think the functional change looks good, but I question the usefulness of
including the 25 line ASCII art version of the state diagram.

	The standard is publicly available, so a comment saying that the
state machine logic conforms to IEEE 802.1AX-2014 figures 6-23 and 6-24
should be sufficient.  Anyone seriously checking the code against the
standard will need to read the relevant text, so they'll be looking it
up anyway.

	-J

>---
> drivers/net/bonding/bond_3ad.c | 96 +++++++++++++++++++++++++---------
> 1 file changed, 71 insertions(+), 25 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c47f6a69fd2a..68258d61fd1c 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -44,7 +44,6 @@
> #define AD_PORT_STANDBY         0x80
> #define AD_PORT_SELECTED        0x100
> #define AD_PORT_MOVED           0x200
>-#define AD_PORT_CHURNED         (AD_PORT_ACTOR_CHURN | AD_PORT_PARTNER_CHURN)
> 
> /* Port Key definitions
>  * key is determined according to the link speed, duplex and
>@@ -1254,7 +1253,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 	/* first, check if port was reinitialized */
> 	if (port->sm_vars & AD_PORT_BEGIN) {
> 		port->sm_rx_state = AD_RX_INITIALIZE;
>-		port->sm_vars |= AD_PORT_CHURNED;
> 	/* check if port is not enabled */
> 	} else if (!(port->sm_vars & AD_PORT_BEGIN) && !port->is_enabled)
> 		port->sm_rx_state = AD_RX_PORT_DISABLED;
>@@ -1262,8 +1260,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 	else if (lacpdu && ((port->sm_rx_state == AD_RX_EXPIRED) ||
> 		 (port->sm_rx_state == AD_RX_DEFAULTED) ||
> 		 (port->sm_rx_state == AD_RX_CURRENT))) {
>-		if (port->sm_rx_state != AD_RX_CURRENT)
>-			port->sm_vars |= AD_PORT_CHURNED;
> 		port->sm_rx_timer_counter = 0;
> 		port->sm_rx_state = AD_RX_CURRENT;
> 	} else {
>@@ -1347,7 +1343,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 			port->partner_oper.port_state |= LACP_STATE_LACP_TIMEOUT;
> 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(AD_SHORT_TIMEOUT));
> 			port->actor_oper_port_state |= LACP_STATE_EXPIRED;
>-			port->sm_vars |= AD_PORT_CHURNED;
> 			break;
> 		case AD_RX_DEFAULTED:
> 			__update_default_selected(port);
>@@ -1379,11 +1374,41 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
>  * ad_churn_machine - handle port churn's state machine
>  * @port: the port we're looking at
>  *
>+ * IEEE 802.1AX-2014 Figure 6-23 - Actor Churn Detection machine state diagram
>+ *
>+ *                                                     BEGIN || (! port_enabled)
>+ *                                                               |
>+ *                                      (3)                (1)   v
>+ *   +----------------------+     ActorPort.Sync     +-------------------------+
>+ *   |    NO_ACTOR_CHURN    | <--------------------- |   ACTOR_CHURN_MONITOR   |
>+ *   |======================|                        |=========================|
>+ *   | actor_churn = FALSE; |    ! ActorPort.Sync    | actor_churn = FALSE;    |
>+ *   |                      | ---------------------> | Start actor_churn_timer |
>+ *   +----------------------+           (4)          +-------------------------+
>+ *             ^                                                 |
>+ *             |                                                 |
>+ *             |                                      actor_churn_timer expired
>+ *             |                                                 |
>+ *       ActorPort.Sync                                          |  (2)
>+ *             |              +--------------------+             |
>+ *        (3)  |              |   ACTOR_CHURN      |             |
>+ *             |              |====================|             |
>+ *             +------------- | actor_churn = True | <-----------+
>+ *                            |                    |
>+ *                            +--------------------+
>+ *
>+ * Similar for the Figure 6-24 - Partner Churn Detection machine state diagram
>+ *
>+ * We don’t need to check actor_churn, because it can only be true when the
>+ * state is ACTOR_CHURN.
>  */
> static void ad_churn_machine(struct port *port)
> {
>-	if (port->sm_vars & AD_PORT_CHURNED) {
>-		port->sm_vars &= ~AD_PORT_CHURNED;
>+	bool partner_synced = port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION;
>+	bool actor_synced = port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION;
>+
>+	/* ---- 1. begin or port not enabled ---- */
>+	if ((port->sm_vars & AD_PORT_BEGIN) || !port->is_enabled) {
> 		port->sm_churn_actor_state = AD_CHURN_MONITOR;
> 		port->sm_churn_partner_state = AD_CHURN_MONITOR;
> 		port->sm_churn_actor_timer_counter =
>@@ -1392,25 +1417,46 @@ static void ad_churn_machine(struct port *port)
> 			 __ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> 		return;
> 	}
>-	if (port->sm_churn_actor_timer_counter &&
>-	    !(--port->sm_churn_actor_timer_counter) &&
>-	    port->sm_churn_actor_state == AD_CHURN_MONITOR) {
>-		if (port->actor_oper_port_state & LACP_STATE_SYNCHRONIZATION) {
>-			port->sm_churn_actor_state = AD_NO_CHURN;
>-		} else {
>-			port->churn_actor_count++;
>-			port->sm_churn_actor_state = AD_CHURN;
>-		}
>+
>+	if (port->sm_churn_actor_timer_counter)
>+		port->sm_churn_actor_timer_counter--;
>+
>+	if (port->sm_churn_partner_timer_counter)
>+		port->sm_churn_partner_timer_counter--;
>+
>+	/* ---- 2. timer expired, enter CHURN ---- */
>+	if (port->sm_churn_actor_state == AD_CHURN_MONITOR &&
>+	    !port->sm_churn_actor_timer_counter) {
>+		port->sm_churn_actor_state = AD_CHURN;
>+		port->churn_actor_count++;
> 	}
>-	if (port->sm_churn_partner_timer_counter &&
>-	    !(--port->sm_churn_partner_timer_counter) &&
>-	    port->sm_churn_partner_state == AD_CHURN_MONITOR) {
>-		if (port->partner_oper.port_state & LACP_STATE_SYNCHRONIZATION) {
>-			port->sm_churn_partner_state = AD_NO_CHURN;
>-		} else {
>-			port->churn_partner_count++;
>-			port->sm_churn_partner_state = AD_CHURN;
>-		}
>+
>+	if (port->sm_churn_partner_state == AD_CHURN_MONITOR &&
>+	    !port->sm_churn_partner_timer_counter) {
>+		port->sm_churn_partner_state = AD_CHURN;
>+		port->churn_partner_count++;
>+	}
>+
>+	/* ---- 3. CHURN_MONITOR/CHURN + sync -> NO_CHURN ---- */
>+	if ((port->sm_churn_actor_state == AD_CHURN_MONITOR ||
>+	     port->sm_churn_actor_state == AD_CHURN) && actor_synced)
>+		port->sm_churn_actor_state = AD_NO_CHURN;
>+
>+	if ((port->sm_churn_partner_state == AD_CHURN_MONITOR ||
>+	     port->sm_churn_partner_state == AD_CHURN) && partner_synced)
>+		port->sm_churn_partner_state = AD_NO_CHURN;
>+
>+	/* ---- 4. NO_CHURN + !sync -> MONITOR ---- */
>+	if (port->sm_churn_actor_state == AD_NO_CHURN && !actor_synced) {
>+		port->sm_churn_actor_state = AD_CHURN_MONITOR;
>+		port->sm_churn_actor_timer_counter =
>+			__ad_timer_to_ticks(AD_ACTOR_CHURN_TIMER, 0);
>+	}
>+
>+	if (port->sm_churn_partner_state == AD_NO_CHURN && !partner_synced) {
>+		port->sm_churn_partner_state = AD_CHURN_MONITOR;
>+		port->sm_churn_partner_timer_counter =
>+			__ad_timer_to_ticks(AD_PARTNER_CHURN_TIMER, 0);
> 	}
> }
> 
>-- 
>2.50.1
>

---
	-Jay Vosburgh, jv@jvosburgh.net

  reply	other threads:[~2026-02-27  0:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26 12:53 [PATCHv3 net 0/3] bonding: fix 802.3ad churn machine and port state issues Hangbin Liu
2026-02-26 12:53 ` [PATCHv3 net 1/3] bonding: set AD_RX_PORT_DISABLED when disabling a port Hangbin Liu
2026-02-27  1:16   ` Jay Vosburgh
2026-02-27  2:31     ` Hangbin Liu
2026-02-27  4:14       ` Hangbin Liu
2026-02-27  4:42       ` Jay Vosburgh
2026-02-27  6:21         ` Hangbin Liu
2026-03-10  3:01           ` Hangbin Liu
2026-04-01  1:51           ` Hangbin Liu
2026-02-26 12:53 ` [PATCHv3 net 2/3] bonding: restructure ad_churn_machine Hangbin Liu
2026-02-27  0:36   ` Jay Vosburgh [this message]
2026-02-27  0:52     ` Hangbin Liu
2026-02-27  1:42       ` Jay Vosburgh
2026-02-27  2:36         ` Hangbin Liu
2026-02-26 12:53 ` [PATCHv3 net 3/3] selftests: bonding: add mux and churn state testing Hangbin Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=941081.1772152606@famine \
    --to=jv@jvosburgh.net \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=liali@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=liuhangbin@gmail.com \
    --cc=maheshb@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=razor@blackwall.org \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.