From: Jay Vosburgh <jv@jvosburgh.net>
To: Louis Scalbert <louis.scalbert@6wind.com>
Cc: netdev@vger.kernel.org, stephen@networkplumber.org,
andrew+netdev@lunn.ch, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, fbl@redhat.com, andy@greyhouse.net,
shemminger@vyatta.com, maheshb@google.com
Subject: Re: [PATCH net v4 3/4] bonding: 3ad: fix mux port state on oper down
Date: Thu, 07 May 2026 08:58:37 +0200
Message-ID: <97981.1778137117@vermin>
In-Reply-To: <CAJ5u_OdCjZJLYr8OhfskTxavuW83dYSFx1_bTto9wezTn60nfA@mail.gmail.com>

Louis Scalbert <louis.scalbert@6wind.com> wrote:
>Hello Jay,
>
>Sorry for the late reply. I’ve been busy with another project these
>past few days.
>
>On Thu, 23 Apr 2026 at 22:00, Jay Vosburgh <jv@jvosburgh.net> wrote:
>>
>> Louis Scalbert <louis.scalbert@6wind.com> wrote:
>>
>> >When the bonding interface has carrier down due to the absence of
>> >usable slaves and a slave transitions from down to up, the bonding
>> >interface briefly goes carrier up, then down again, and finally up
>> >once LACP negotiates collecting and distributing on the port.
>> >
>> >When lacp_strict mode is on, the interface should not transition to
>> >carrier up until LACP negotiation is complete.
>> >
>> >This happens because the actor and partner port states remain in
>> >Collecting_Distributing when the port goes down. When the port
>> >comes back up, it temporarily remains in this state until LACP
>> >renegotiation occurs.
>> >
>> >Previously this was mostly cosmetic, but since the bonding carrier
>> >state may depend on the LACP negotiation state, it causes the
>> >interface to flap.
>> >
>> >Move an operationally down port to the Mux WAITING state and clear the
>> >Synchronization, Collecting, and Distributing states, in accordance with
>> >the 802.1AX Mux state machine diagram.
>> >
>> >Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>> >Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>> >---
>> > drivers/net/bonding/bond_3ad.c | 7 +++++++
>> > 1 file changed, 7 insertions(+)
>> >
>> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> >index 9cf064243d58..bc2964ea11f5 100644
>> >--- a/drivers/net/bonding/bond_3ad.c
>> >+++ b/drivers/net/bonding/bond_3ad.c
>> >@@ -1053,6 +1053,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>> >
>> > 	if (port->sm_vars & AD_PORT_BEGIN) {
>> > 		port->sm_mux_state = AD_MUX_DETACHED;
>> >+	} else if (!port->is_enabled && port->sm_mux_state != AD_MUX_DETACHED) {
>> >+		port->sm_mux_state = AD_MUX_WAITING;
>>
>> Technically, this is not exactly following the state machines.
>>
>> The mux machine should transition to WAITING from DETACHED when
>> Selected == SELECTED or STANDBY, not for !is_enabled ("port_enabled" in
>> the standard).
>
>The MUX machine still transitions from DETACHED to WAITING; this
>happens a few lines later and is unchanged.
>
>The relevant code is:
>
>} else if (!port->is_enabled && port->sm_mux_state != AD_MUX_DETACHED) {
>	port->sm_mux_state = AD_MUX_WAITING;
>} else {
>	switch (port->sm_mux_state) {
>	case AD_MUX_DETACHED:
>		if ((port->sm_vars & AD_PORT_SELECTED)
>		    || (port->sm_vars & AD_PORT_STANDBY))
>			/* if SELECTED or STANDBY */
>			port->sm_mux_state = AD_MUX_WAITING;
>		break;
>
>My change is for NOT is_enabled AND NOT AD_MUX_DETACHED.
>
>> The check for !is_enabled happens in the receive
>> machine, and it would transition to PORT_DISABLED state (which clears
>> Synchronization).
>
>I agree with that statement. However, clearing Synchronization is not
>sufficient: the purpose of the fix is to clear Collecting and
>Distributing.
>
>I noticed that is_enabled is defined in the 802.3ad-2000 standard, while
>Port_Operational is defined in 802.1AX-2020. I am not sure about
>802.1AX-2014, as I do not have access to that version.

	Did you mean 802.1AX-2014 instead of 802.3ad-2000 here?

>The 802.1AX-2020 standard uses a different MUX state diagram. Compared to
>802.3ad, the ATTACHED state has been split into ATTACH and ATTACHED.
>There is no longer a WAITING state; it has been replaced by
>ATTACHED_WTR.
>
>The new standard says that the MUX machine should transition to
>ATTACHED_WTR when Port_Operational is FALSE and the current state is
>not DETACHED.
>
>So, in my opinion, the change is correct, at least with respect to
>802.1AX-2020.

	I'm concerned that mixing versions of the standard will cause
issues, precisely because the state machines vary between the 2014 and
2020 versions.

	As the standards are not explicitly backwards compatible,
bonding should conform to one version, and in my opinion the version
that makes the most sense is the 2014 edition. Bonding is already very
close to the 2014 version, while compliance with the 2020 version would
require significant changes.

	The 2014 edition is freely available from the IEEE, for example at:

https://1.ieee802.org/tsn/802-1ax-2014/

>>
>> I'm not sure if this is actually an issue or not; I need to read
>> the relevant bits again to make sure I understand how it's supposed to
>> work.
>
>Please confirm what you want me to do: should I keep the fix as it is?

	I see that you've reposted the patch series; I'll reply there
later when I've had a chance to study the details.

	-J

>best regards,
>
>Louis Scalbert
>>
>> -J
>>
>> > 	} else {
>> > 		switch (port->sm_mux_state) {
>> > 		case AD_MUX_DETACHED:
>> >@@ -1200,6 +1202,11 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>> > 			break;
>> > 		case AD_MUX_WAITING:
>> > 			port->sm_mux_timer_counter = __ad_timer_to_ticks(AD_WAIT_WHILE_TIMER, 0);
>> >+			port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>> >+			ad_disable_collecting_distributing(port,
>> >+							   update_slave_arr);
>> >+			port->actor_oper_port_state &= ~LACP_STATE_COLLECTING;
>> >+			port->actor_oper_port_state &= ~LACP_STATE_DISTRIBUTING;
>> > 			break;
>> > 		case AD_MUX_ATTACHED:
>> > 			if (port->aggregator->is_active)
>> >--
>> >2.39.2
>> >
>>
>> ---
>> -Jay Vosburgh, jv@jvosburgh.net
---
-Jay Vosburgh, jv@jvosburgh.net
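
For readers following the thread, the behaviour the patch aims for can be
modelled outside the kernel. The following is only a minimal userspace
sketch, not the bond_3ad.c code: struct toy_port, toy_mux_step() and the
MUX_*/ST_* names are hypothetical stand-ins, only the DETACHED -> WAITING
transition of the normal state machine is modelled, and the timer and
ad_disable_collecting_distributing() side effects are left out. The bit
values follow the LACP actor state octet layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum mux_state {
	MUX_DETACHED,
	MUX_WAITING,
	MUX_ATTACHED,
	MUX_COLLECTING_DISTRIBUTING
};

#define ST_SYNCHRONIZATION 0x08u
#define ST_COLLECTING      0x10u
#define ST_DISTRIBUTING    0x20u

struct toy_port {
	enum mux_state mux_state;
	bool begin;		/* models AD_PORT_BEGIN */
	bool is_enabled;	/* link operationally up */
	bool selected;		/* models SELECTED or STANDBY */
	uint8_t actor_oper_state;
};

/* One pass of the simplified mux machine with the proposed override. */
static void toy_mux_step(struct toy_port *p)
{
	enum mux_state last = p->mux_state;

	if (p->begin) {
		p->mux_state = MUX_DETACHED;
	} else if (!p->is_enabled && p->mux_state != MUX_DETACHED) {
		/* The rule under discussion: oper down forces WAITING. */
		p->mux_state = MUX_WAITING;
	} else if (p->mux_state == MUX_DETACHED && p->selected) {
		/* Unchanged DETACHED -> WAITING transition. */
		p->mux_state = MUX_WAITING;
	}

	/* Entry action for WAITING: drop Sync, Collecting, Distributing. */
	if (p->mux_state != last && p->mux_state == MUX_WAITING)
		p->actor_oper_state &= (uint8_t)~(ST_SYNCHRONIZATION |
						  ST_COLLECTING |
						  ST_DISTRIBUTING);
}
```

In this model, a port that was collecting and distributing when its link
went down lands in WAITING with all three bits cleared on the next pass,
so it cannot report a stale Collecting_Distributing state when the link
returns. A port already in DETACHED is left alone by the override.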