From: Jay Vosburgh <jv@jvosburgh.net>
To: Louis Scalbert <louis.scalbert@6wind.com>
Cc: netdev@vger.kernel.org, andrew+netdev@lunn.ch,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
fbl@redhat.com, andy@greyhouse.net, shemminger@vyatta.com,
maheshb@google.com
Subject: Re: [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery
Date: Mon, 13 Apr 2026 11:39:48 -0700 [thread overview]
Message-ID: <710787.1776105588@famine> (raw)
In-Reply-To: <20260408152353.276204-5-louis.scalbert@6wind.com>
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>The previous commit introduced a side effect caused by clearing the
>SELECTED flag on disabled ports. After all ports in an aggregator go
>down, if only a subset of ports comes back up, those ports can no
>longer renegotiate LACP unless all aggregator ports come back up.
>
>1. All aggregator ports go down
> - The SELECTED flag is cleared on all of them.
>2. One port comes back up
> - Its SELECTED flag is set again.
> - It enters the WAITING state and gets its READY_N flag.
> - The remaining ports stay UNSELECTED. Because of that, they cannot
> enter the WAITING state and therefore never get READY_N.
This is the part that I think we may be doing something else
incorrectly. If the port is UNSELECTED, then that means that no
aggregator is currently selected for that port, and therefore it
shouldn't be assigned to an aggregator with other ports (per
802.1AX-2014 6.4.8, "Selected").
I'm not seeing anything in the 6.4.14 Selection Logic that makes
me think a port that is down (port_enabled == FALSE) is disallowed from
being SELECTED.
Looking at the Receive machine state diagram (Figure 6-18), I
tend to think that in this case the port would transition to
PORT_DISABLED state, as we're not asserting a BEGIN (reinitialization of
the LACP protocol entity), so the port variables can remain unchanged.
There's even some language that suggests this is intentional:
"If the Aggregation Port becomes inoperable and the BEGIN
variable is not asserted, the state machine enters the
PORT_DISABLED state. [...] This state allows the current
Selection state to remain undisturbed, so that, in the event
that the Aggregation Port is still connected to the same Partner
and Partner Aggregation Port when it becomes operable again,
there will be no disturbance caused to higher layers by
unnecessary re-configuration.
So, perhaps the actual bug is that these ports are attached to
the aggregator but not SELECTED.
-J
> - __agg_ports_are_ready() returns 0 because it finds a port without
> READY_N.
> - As a result, __set_agg_ports_ready() keeps the READY flag cleared on
> all ports.
> - The port that came back up is therefore not marked READY and cannot
> transition to ATTACHED.
> - LACP negotiation becomes stuck, and the port cannot be used.
>3. All aggregator ports come back up
> - They all regain SELECTED and READY_N.
> - __agg_ports_are_ready() now returns 1.
> - __set_agg_ports_ready() sets READY on all ports.
> - They can then transition to ATTACHED.
> - Negotiation resumes and the aggregator becomes operational again.
>
>Consider only ports currently in the WAITING mux state for READY_N in
>order to avoid __agg_ports_are_ready() to return 0 because of a disabled
>port. That matches 802.3ad, which states: "The Selection Logic asserts
>Ready TRUE when the values of Ready_N for all ports that are waiting to
>attach to a given Aggregator are TRUE.".
>
>Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>---
> drivers/net/bonding/bond_3ad.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 3a94fbcbf721..3f56d892b101 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
> }
>
> /**
>- * __agg_ports_are_ready - check if all ports in an aggregator are ready
>+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
>+ * the WAITING state are ready
> * @aggregator: the aggregator we're looking at
> *
> */
>@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
> for (port = aggregator->lag_ports;
> port;
> port = port->next_port_in_aggregator) {
>+ if (port->sm_mux_state != AD_MUX_WAITING)
>+ continue;
> if (!(port->sm_vars & AD_PORT_READY_N)) {
> retval = 0;
> break;
>--
>2.39.2
>
---
-Jay Vosburgh, jv@jvosburgh.net
next prev parent reply other threads:[~2026-04-13 18:39 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
2026-04-13 16:45 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
2026-04-13 17:01 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-04-13 16:49 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
2026-04-13 18:39 ` Jay Vosburgh [this message]
2026-04-08 15:23 ` [PATCH net v3 5/5] selftests: bonding: add test for fallback mode Louis Scalbert
2026-04-09 3:13 ` [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Jakub Kicinski
2026-04-09 6:53 ` Jonas Gorski
2026-04-09 11:49 ` Louis Scalbert
2026-04-10 2:38 ` Jakub Kicinski
2026-04-10 14:02 ` Jay Vosburgh
2026-04-13 16:45 ` Jay Vosburgh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=710787.1776105588@famine \
--to=jv@jvosburgh.net \
--cc=andrew+netdev@lunn.ch \
--cc=andy@greyhouse.net \
--cc=edumazet@google.com \
--cc=fbl@redhat.com \
--cc=kuba@kernel.org \
--cc=louis.scalbert@6wind.com \
--cc=maheshb@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shemminger@vyatta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox