* [PATCH net v2 0/3] bonding: 3ad: fix mux port state on oper down
@ 2026-03-25 13:44 Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Louis Scalbert @ 2026-03-25 13:44 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
Louis Scalbert
Hi everyone,
This series addresses a blackholing issue and a subsequent link-flapping
issue in the 802.3ad bonding driver when dealing with inactive slaves
and the `min_links` parameter.
Currently, when an 802.3ad bond has no slaves in the collecting
/ distributing state, the bonding master still reports its carrier as up.
This causes upper-layer protocols to consider the interface operational,
resulting in silently dropped traffic.
Patch 1 fixes the core issue by ensuring the carrier is only asserted when
at least `min_links` slaves are genuinely in a valid state (collecting/
distributing, or collecting only if coupled_control is disabled).
Patch 2 fixes a side effect of the first patch. Tightening the carrier
logic exposes a state persistence bug: when a physical link goes down,
the LACP collecting/distributing flags remain set. When the link returns,
the interface briefly hallucinates that it is ready, bounces the carrier
up, and then drops it again once LACP renegotiation starts. Unsetting the
SELECTED flag when the link goes down forces the state machine through
DETACHED, clearing the stale flags and preventing the flap.
Patch 3 fixes a side effect of the second patch caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.
Changelog:
v1 -> v2
- Patch 1: split a comment line that exceeded 80 characters.
- Move the change from patch 2 in __agg_ports_are_ready() into a third
patch, as it is actually a side effect of the fix introduced in
patch 2.
- Patch 2: Expand the commit message and add a code comment describing
the change in ad_port_selection_logic().
- Patch 3: Check the READY_N flag only on ports in the WAITING state,
rather than on all enabled ports. This more closely matches 802.3ad.
Link: https://lore.kernel.org/netdev/20260316131838.3257889-1-louis.scalbert@6wind.com/
Louis Scalbert (3):
bonding: 3ad: fix carrier when no valid slaves
bonding: 3ad: fix mux port state on oper down
bonding: 3ad: fix stuck negotiation on recovery
drivers/net/bonding/bond_3ad.c | 36 +++++++++++++++++++++++++++++++---
1 file changed, 33 insertions(+), 3 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves
2026-03-25 13:44 [PATCH net v2 0/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
@ 2026-03-25 13:44 ` Louis Scalbert
2026-03-27 16:16 ` Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 2/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 3/3] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
2 siblings, 1 reply; 8+ messages in thread
From: Louis Scalbert @ 2026-03-25 13:44 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
Louis Scalbert
When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master may still report
carrier as up. In this situation, no slave is actually able to transmit
or receive traffic.
As a result, upper-layer daemons consider the interface operational
while traffic is effectively blackholed.
Fix this by asserting carrier only when at least 'min_links' slaves are
in the collecting/distributing state (or collecting only if the
coupled_control default behavior is disabled).
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
drivers/net/bonding/bond_3ad.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index af7f74cfdc08..6d3613755d45 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
}
}
+static int __agg_valid_ports(struct aggregator *agg)
+{
+ struct port *port;
+ int valid = 0;
+
+ for (port = agg->lag_ports; port;
+ port = port->next_port_in_aggregator) {
+ if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
+ (!port->slave->bond->params.coupled_control ||
+ port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
+ valid++;
+ }
+
+ return valid;
+}
+
static int __agg_active_ports(struct aggregator *agg)
{
struct port *port;
@@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
port->actor_port_number,
port->aggregator->aggregator_identifier);
__enable_port(port);
+ bond_3ad_set_carrier(port->slave->bond);
/* Slave array needs update */
*update_slave_arr = true;
/* Should notify peers if possible */
@@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
port->actor_port_number,
port->aggregator->aggregator_identifier);
__disable_port(port);
+ bond_3ad_set_carrier(port->slave->bond);
/* Slave array needs an update */
*update_slave_arr = true;
}
@@ -2819,8 +2837,10 @@ int bond_3ad_set_carrier(struct bonding *bond)
}
active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
if (active) {
- /* are enough slaves available to consider link up? */
- if (__agg_active_ports(active) < bond->params.min_links) {
+ /* are enough slaves in collecting (and distributing) state to consider
+ * link up?
+ */
+ if (__agg_valid_ports(active) < bond->params.min_links) {
if (netif_carrier_ok(bond->dev)) {
netif_carrier_off(bond->dev);
goto out;
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net v2 2/3] bonding: 3ad: fix mux port state on oper down
2026-03-25 13:44 [PATCH net v2 0/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
@ 2026-03-25 13:44 ` Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 3/3] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
2 siblings, 0 replies; 8+ messages in thread
From: Louis Scalbert @ 2026-03-25 13:44 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
Louis Scalbert
When the bonding interface has carrier down due to the absence of
valid slaves and a slave transitions from down to up, the bonding
interface briefly goes carrier up, then down again, and finally up
once LACP negotiates collecting and distributing on the port.
The interface should not transition to carrier up until LACP
negotiation is complete.
This happens because the actor and partner port states remain in
collecting (and distributing) when the port goes down. When the port
comes back up, it temporarily remains in this state until LACP
renegotiation occurs.
Previously this was mostly cosmetic, but since the bonding carrier
state now depends on the LACP negotiation state, it causes the
interface to flap.
Fix this by unsetting the SELECTED flag when a port goes down so that
the mux state machine transitions through ATTACHED and DETACHED,
which clears the actor collecting and distributing flags. Do not
attempt to set the SELECTED flag if the port is still disabled.
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
drivers/net/bonding/bond_3ad.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6d3613755d45..1771e406224c 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
struct slave *slave;
int found = 0;
+ /* Disabled ports cannot be SELECTED.
+ * Do not attempt to set the SELECTED flag if the port is still disabled.
+ */
+ if (!port->is_enabled)
+ return;
+
/* if the port is already Selected, do nothing */
if (port->sm_vars & AD_PORT_SELECTED)
return;
@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
/* link has failed */
port->is_enabled = false;
ad_update_actor_keys(port, true);
+ port->sm_vars &= ~AD_PORT_SELECTED;
}
agg = __get_first_agg(port);
ad_agg_selection_logic(agg, &dummy);
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net v2 3/3] bonding: 3ad: fix stuck negotiation on recovery
2026-03-25 13:44 [PATCH net v2 0/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 2/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
@ 2026-03-25 13:44 ` Louis Scalbert
2 siblings, 0 replies; 8+ messages in thread
From: Louis Scalbert @ 2026-03-25 13:44 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
Louis Scalbert
The previous commit introduced a side effect caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.
1. All aggregator ports go down
- The SELECTED flag is cleared on all of them.
2. One port comes back up
- Its SELECTED flag is set again.
- It enters the WAITING state and gets its READY_N flag.
- The remaining ports stay UNSELECTED. Because of that, they cannot
enter the WAITING state and therefore never get READY_N.
- __agg_ports_are_ready() returns 0 because it finds a port without
READY_N.
- As a result, __set_agg_ports_ready() keeps the READY flag cleared on
all ports.
- The port that came back up is therefore not marked READY and cannot
transition to ATTACHED.
- LACP negotiation becomes stuck, and the port cannot be used.
3. All aggregator ports come back up
- They all regain SELECTED and READY_N.
- __agg_ports_are_ready() now returns 1.
- __set_agg_ports_ready() sets READY on all ports.
- They can then transition to ATTACHED.
- Negotiation resumes and the aggregator becomes operational again.
Consider only ports currently in the WAITING mux state for READY_N in
order to avoid __agg_ports_are_ready() to return 0 because of a disabled
port. That matches 802.3ad, which states: "The Selection Logic asserts
Ready TRUE when the values of Ready_N for all ports that are waiting to
attach to a given Aggregator are TRUE.".
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
drivers/net/bonding/bond_3ad.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 1771e406224c..86657de7143d 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
}
/**
- * __agg_ports_are_ready - check if all ports in an aggregator are ready
+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
+ * the WAITING state are ready
* @aggregator: the aggregator we're looking at
*
*/
@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
for (port = aggregator->lag_ports;
port;
port = port->next_port_in_aggregator) {
+ if (port->sm_mux_state != AD_MUX_WAITING)
+ continue;
if (!(port->sm_vars & AD_PORT_READY_N)) {
retval = 0;
break;
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves
2026-03-25 13:44 ` [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
@ 2026-03-27 16:16 ` Louis Scalbert
2026-03-27 16:37 ` Mahesh Bandewar (महेश बंडेवार)
2026-03-27 16:56 ` Jay Vosburgh
0 siblings, 2 replies; 8+ messages in thread
From: Louis Scalbert @ 2026-03-27 16:16 UTC (permalink / raw)
To: netdev; +Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger
Hi everyone,
Le mer. 25 mars 2026 à 14:44, Louis Scalbert
<louis.scalbert@6wind.com> a écrit :
>
> When an 802.3ad (LACP) bonding interface has no slaves in the
> collecting/distributing state, the bonding master may still report
> carrier as up. In this situation, no slave is actually able to transmit
> or receive traffic.
>
I’ve just realized that the issue description is not entirely accurate.
Bonding links that are not in LACP Collecting/Distributing state are
excluded from traffic
distribution in favor of those that are. However, if no links are in
Collecting/Distributing state,
the kernel considers all links as valid for distributing traffic.
> As a result, upper-layer daemons consider the interface operational
> while traffic is effectively blackholed.
There are situations where traffic is not blackholed.
For example, a Linux machine A is connected to machine B using two
aggregated links.
On A, LACP is configured, but not on B. As a result, LACP will not
reach the Collecting/Distributing
state since B does not send LACP frames. However, the bonding
interface will still be up on both
sides, and traffic can be exchanged.
This behavior is, of course, not compliant with the LACP (802.3ad)
standard, but applying this fix
as-is could introduce regressions in some setups. Under normal
conditions, having no Collecting /
Distributing links means that no interfaces are operational for
handling traffic - that is precisely the
purpose of using LACP.
I will publish a new version with a knob to enforce compliance with
the LACP standard. It will be
disabled by default for backward compatibility.
What do you think of "lacp_strict" or "lacp_enforce" ?
>
> Fix this by asserting carrier only when at least 'min_links' slaves are
> in the collecting/distributing state (or collecting only if the
> coupled_control default behavior is disabled).
>
> Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> ---
> drivers/net/bonding/bond_3ad.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index af7f74cfdc08..6d3613755d45 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
> }
> }
>
> +static int __agg_valid_ports(struct aggregator *agg)
> +{
> + struct port *port;
> + int valid = 0;
> +
> + for (port = agg->lag_ports; port;
> + port = port->next_port_in_aggregator) {
> + if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
> + (!port->slave->bond->params.coupled_control ||
> + port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
> + valid++;
> + }
> +
> + return valid;
> +}
> +
> static int __agg_active_ports(struct aggregator *agg)
> {
> struct port *port;
> @@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> __enable_port(port);
> + bond_3ad_set_carrier(port->slave->bond);
> /* Slave array needs update */
> *update_slave_arr = true;
> /* Should notify peers if possible */
> @@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> __disable_port(port);
> + bond_3ad_set_carrier(port->slave->bond);
> /* Slave array needs an update */
> *update_slave_arr = true;
> }
> @@ -2819,8 +2837,10 @@ int bond_3ad_set_carrier(struct bonding *bond)
> }
> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
> if (active) {
> - /* are enough slaves available to consider link up? */
> - if (__agg_active_ports(active) < bond->params.min_links) {
> + /* are enough slaves in collecting (and distributing) state to consider
> + * link up?
> + */
> + if (__agg_valid_ports(active) < bond->params.min_links) {
> if (netif_carrier_ok(bond->dev)) {
> netif_carrier_off(bond->dev);
> goto out;
> --
> 2.39.2
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves
2026-03-27 16:16 ` Louis Scalbert
@ 2026-03-27 16:37 ` Mahesh Bandewar (महेश बंडेवार)
2026-03-27 16:56 ` Jay Vosburgh
1 sibling, 0 replies; 8+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2026-03-27 16:37 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy,
shemminger
On Fri, Mar 27, 2026 at 9:22 AM Louis Scalbert <louis.scalbert@6wind.com> wrote:
>
> Hi everyone,
>
> Le mer. 25 mars 2026 à 14:44, Louis Scalbert
> <louis.scalbert@6wind.com> a écrit :
> >
> > When an 802.3ad (LACP) bonding interface has no slaves in the
> > collecting/distributing state, the bonding master may still report
> > carrier as up. In this situation, no slave is actually able to transmit
> > or receive traffic.
> >
> I’ve just realized that the issue description is not entirely accurate.
>
> Bonding links that are not in LACP Collecting/Distributing state are
> excluded from traffic
> distribution in favor of those that are. However, if no links are in
> Collecting/Distributing state,
> the kernel considers all links as valid for distributing traffic.
>
> > As a result, upper-layer daemons consider the interface operational
> > while traffic is effectively blackholed.
>
> There are situations where traffic is not blackholed.
>
> For example, a Linux machine A is connected to machine B using two
> aggregated links.
> On A, LACP is configured, but not on B. As a result, LACP will not
> reach the Collecting/Distributing
> state since B does not send LACP frames. However, the bonding
> interface will still be up on both
> sides, and traffic can be exchanged.
>
> This behavior is, of course, not compliant with the LACP (802.3ad)
> standard, but applying this fix
> as-is could introduce regressions in some setups. Under normal
> conditions, having no Collecting /
> Distributing links means that no interfaces are operational for
> handling traffic - that is precisely the
> purpose of using LACP.
>
The legacy behavior is to "maintain connectivity" by selecting a
functioning aggregator, as long as there are no link events. It's
crucial to maintain this legacy behavior.
> I will publish a new version with a knob to enforce compliance with
> the LACP standard. It will be
> disabled by default for backward compatibility.
>
> What do you think of "lacp_strict" or "lacp_enforce" ?
>
I don't have any specific preference for the name, but the default
value must comply with the legacy behavior.
> >
> > Fix this by asserting carrier only when at least 'min_links' slaves are
> > in the collecting/distributing state (or collecting only if the
> > coupled_control default behavior is disabled).
> >
> > Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> > Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> > ---
> > drivers/net/bonding/bond_3ad.c | 24 ++++++++++++++++++++++--
> > 1 file changed, 22 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> > index af7f74cfdc08..6d3613755d45 100644
> > --- a/drivers/net/bonding/bond_3ad.c
> > +++ b/drivers/net/bonding/bond_3ad.c
> > @@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
> > }
> > }
> >
> > +static int __agg_valid_ports(struct aggregator *agg)
> > +{
> > + struct port *port;
> > + int valid = 0;
> > +
> > + for (port = agg->lag_ports; port;
> > + port = port->next_port_in_aggregator) {
> > + if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
> > + (!port->slave->bond->params.coupled_control ||
> > + port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
> > + valid++;
> > + }
> > +
> > + return valid;
> > +}
> > +
> > static int __agg_active_ports(struct aggregator *agg)
> > {
> > struct port *port;
> > @@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
> > port->actor_port_number,
> > port->aggregator->aggregator_identifier);
> > __enable_port(port);
> > + bond_3ad_set_carrier(port->slave->bond);
> > /* Slave array needs update */
> > *update_slave_arr = true;
> > /* Should notify peers if possible */
> > @@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
> > port->actor_port_number,
> > port->aggregator->aggregator_identifier);
> > __disable_port(port);
> > + bond_3ad_set_carrier(port->slave->bond);
> > /* Slave array needs an update */
> > *update_slave_arr = true;
> > }
> > @@ -2819,8 +2837,10 @@ int bond_3ad_set_carrier(struct bonding *bond)
> > }
> > active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
> > if (active) {
> > - /* are enough slaves available to consider link up? */
> > - if (__agg_active_ports(active) < bond->params.min_links) {
> > + /* are enough slaves in collecting (and distributing) state to consider
> > + * link up?
> > + */
> > + if (__agg_valid_ports(active) < bond->params.min_links) {
> > if (netif_carrier_ok(bond->dev)) {
> > netif_carrier_off(bond->dev);
> > goto out;
> > --
> > 2.39.2
> >
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves
2026-03-27 16:16 ` Louis Scalbert
2026-03-27 16:37 ` Mahesh Bandewar (महेश बंडेवार)
@ 2026-03-27 16:56 ` Jay Vosburgh
2026-04-01 13:32 ` Louis Scalbert
1 sibling, 1 reply; 8+ messages in thread
From: Jay Vosburgh @ 2026-03-27 16:56 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>Hi everyone,
>
>Le mer. 25 mars 2026 à 14:44, Louis Scalbert
><louis.scalbert@6wind.com> a écrit :
>>
>> When an 802.3ad (LACP) bonding interface has no slaves in the
>> collecting/distributing state, the bonding master may still report
>> carrier as up. In this situation, no slave is actually able to transmit
>> or receive traffic.
>>
>I’ve just realized that the issue description is not entirely accurate.
>
>Bonding links that are not in LACP Collecting/Distributing state are
>excluded from traffic
>distribution in favor of those that are. However, if no links are in
>Collecting/Distributing state,
>the kernel considers all links as valid for distributing traffic.
>
>> As a result, upper-layer daemons consider the interface operational
>> while traffic is effectively blackholed.
>
>There are situations where traffic is not blackholed.
>
>For example, a Linux machine A is connected to machine B using two
>aggregated links.
>On A, LACP is configured, but not on B. As a result, LACP will not
>reach the Collecting/Distributing
>state since B does not send LACP frames. However, the bonding
>interface will still be up on both
>sides, and traffic can be exchanged.
Bonding does this because, in the absense of a LACP partner, the
interfaces in the bond will become Individual links, each of which will
become its own aggregator, one of which will be chosen to be active.
This is in compliance with the standard, (802.1AX-2014, 6.1.1.j)
and is intended to permit some communication between a system that is
LACP-enabled and a system that is not. The common example I'm familiar
with is a host that PXE boots over a network, connected to a
LACP-enabled switch. During early boot, the PXE environment does not
perform LACP, but still needs to communicate with the switch.
>This behavior is, of course, not compliant with the LACP (802.3ad)
>standard, but applying this fix
>as-is could introduce regressions in some setups. Under normal
>conditions, having no Collecting /
>Distributing links means that no interfaces are operational for
>handling traffic - that is precisely the
>purpose of using LACP.
FWIW, generally speaking we've settled on IEEE 802.1AX-2014 as
the version of the standard to which the bonding implementation should
be conformant. There is a more recent version, -2020, but it is a
significant change from the prior versions.
>I will publish a new version with a knob to enforce compliance with
>the LACP standard. It will be
>disabled by default for backward compatibility.
>
>What do you think of "lacp_strict" or "lacp_enforce" ?
I'm not convinced that we are in violation of the standard with
the present behavior. That said, if such a switch is necessary, I'd
vote for "lacp_strict".
-J
>>
>> Fix this by asserting carrier only when at least 'min_links' slaves are
>> in the collecting/distributing state (or collecting only if the
>> coupled_control default behavior is disabled).
>>
>> Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>> Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>> ---
>> drivers/net/bonding/bond_3ad.c | 24 ++++++++++++++++++++++--
>> 1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index af7f74cfdc08..6d3613755d45 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
>> }
>> }
>>
>> +static int __agg_valid_ports(struct aggregator *agg)
>> +{
>> + struct port *port;
>> + int valid = 0;
>> +
>> + for (port = agg->lag_ports; port;
>> + port = port->next_port_in_aggregator) {
>> + if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
>> + (!port->slave->bond->params.coupled_control ||
>> + port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
>> + valid++;
>> + }
>> +
>> + return valid;
>> +}
>> +
>> static int __agg_active_ports(struct aggregator *agg)
>> {
>> struct port *port;
>> @@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
>> port->actor_port_number,
>> port->aggregator->aggregator_identifier);
>> __enable_port(port);
>> + bond_3ad_set_carrier(port->slave->bond);
>> /* Slave array needs update */
>> *update_slave_arr = true;
>> /* Should notify peers if possible */
>> @@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
>> port->actor_port_number,
>> port->aggregator->aggregator_identifier);
>> __disable_port(port);
>> + bond_3ad_set_carrier(port->slave->bond);
>> /* Slave array needs an update */
>> *update_slave_arr = true;
>> }
>> @@ -2819,8 +2837,10 @@ int bond_3ad_set_carrier(struct bonding *bond)
>> }
>> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
>> if (active) {
>> - /* are enough slaves available to consider link up? */
>> - if (__agg_active_ports(active) < bond->params.min_links) {
>> + /* are enough slaves in collecting (and distributing) state to consider
>> + * link up?
>> + */
>> + if (__agg_valid_ports(active) < bond->params.min_links) {
>> if (netif_carrier_ok(bond->dev)) {
>> netif_carrier_off(bond->dev);
>> goto out;
>> --
>> 2.39.2
>>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves
2026-03-27 16:56 ` Jay Vosburgh
@ 2026-04-01 13:32 ` Louis Scalbert
0 siblings, 0 replies; 8+ messages in thread
From: Louis Scalbert @ 2026-04-01 13:32 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Hello,
Thank you to Mahesh and Jay for your helpful feedback.
I have reviewed the IEEE LACP standard on this specific topic and conducted
tests to provide as accurate a response as possible.
Le ven. 27 mars 2026 à 17:56, Jay Vosburgh <jv@jvosburgh.net> a écrit :
>
> Louis Scalbert <louis.scalbert@6wind.com> wrote:
>
> >Hi everyone,
> >
> >Le mer. 25 mars 2026 à 14:44, Louis Scalbert
> ><louis.scalbert@6wind.com> a écrit :
> >>
> >> When an 802.3ad (LACP) bonding interface has no slaves in the
> >> collecting/distributing state, the bonding master may still report
> >> carrier as up. In this situation, no slave is actually able to transmit
> >> or receive traffic.
> >>
> >I’ve just realized that the issue description is not entirely accurate.
> >
> >Bonding links that are not in LACP Collecting/Distributing state are
> >excluded from traffic
> >distribution in favor of those that are. However, if no links are in
> >Collecting/Distributing state,
> >the kernel considers all links as valid for distributing traffic.
> >
> >> As a result, upper-layer daemons consider the interface operational
> >> while traffic is effectively blackholed.
> >
> >There are situations where traffic is not blackholed.
> >
> >For example, a Linux machine A is connected to machine B using two
> >aggregated links.
> >On A, LACP is configured, but not on B. As a result, LACP will not
> >reach the Collecting/Distributing
> >state since B does not send LACP frames. However, the bonding
> >interface will still be up on both
> >sides, and traffic can be exchanged.
>
> Bonding does this because, in the absense of a LACP partner, the
> interfaces in the bond will become Individual links, each of which will
> become its own aggregator, one of which will be chosen to be active.
Thank you for the detailed description of the current behavior. I
confirm that this matches
what I am observing: only one link is selected for both transmitting
and receiving traffic,
while traffic received on the other enslaved links is discarded.
>
>
> This is in compliance with the standard, (802.1AX-2014, 6.1.1.j)
> and is intended to permit some communication between a system that is
> LACP-enabled and a system that is not. The common example I'm familiar
The exact statement is:
"Backwards compatibility with aggregation-unaware devices - Links that
cannot take part in
Link Aggregation because of their inherent capabilities, management
configuration, or the
capabilities of the devices to which they attach operate as normal,
individual links."
This wording has remained unchanged from 802.3ad-2000 through 802.1AX-2020.
It implies that when the partner does not support aggregation (i.e.,
does not run LACP), all
bond slaves should fall back to operating as normal, individual links.
However, this is not how the kernel behaves today: it selects only a
single link for forwarding,
while traffic received on the other enslaved links is discarded.
As such, the current implementation does not comply with the standard.
>
> with is a host that PXE boots over a network, connected to a
> LACP-enabled switch. During early boot, the PXE environment does not
> perform LACP, but still needs to communicate with the switch.
In this scenario, the switch is the Linux machine. The current
behavior is only effective
if the kernel happens, by chance, to select the port used by PXE.
On Cisco Catalyst switches, when LACP fails to negotiate on all links,
the PortChannel
interface is placed in a suspended state, resulting in carrier down.
To achieve the
IEEE-defined behavior, the user must explicitly configure "no
port-channel standalone-disable".
In that mode, all links operate independently, meaning the switch
forwarding table maintains
MAC entries per link rather than per PortChannel interface. This
allows PXE to function correctly
in such scenarios.
On Cisco routers, however, the only possible state in this situation
is a suspended PortChannel
(carrier down). There is no option to fall back to independent links.
As far as I know, this default behavior (disabling the aggregation
when LACP is not negotiated)
is also the default on other vendors such as Juniper, and is the
behavior I intend to implement
to address the traffic blackholing issue.
Implementing the IEEE-compliant behavior would only be relevant for
bonding interfaces
enslaved to bridge interfaces. My tests in this scenario, without any
LACP negotiation, show
that the bridge only learns forwarding database (FDB) entries behind
the bonding interface,
rather than per individual link. This capability could still be added
in the future if needed.
>
>
> >This behavior is, of course, not compliant with the LACP (802.3ad)
> >standard, but applying this fix
> >as-is could introduce regressions in some setups. Under normal
> >conditions, having no Collecting /
> >Distributing links means that no interfaces are operational for
> >handling traffic - that is precisely the
> >purpose of using LACP.
>
> FWIW, generally speaking we've settled on IEEE 802.1AX-2014 as
> the version of the standard to which the bonding implementation should
> be conformant. There is a more recent version, -2020, but it is a
> significant change from the prior versions.
Good to know - thanks for pointing that out.
>
>
> >I will publish a new version with a knob to enforce compliance with
> >the LACP standard. It will be
> >disabled by default for backward compatibility.
> >
> >What do you think of "lacp_strict" or "lacp_enforce" ?
>
> I'm not convinced that we are in violation of the standard with
> the present behavior. That said, if such a switch is necessary, I'd
> vote for "lacp_strict".
I suggest addressing the traffic blackholing issue by introducing the following
configuration knob. lacp_fallback_mode:
- legacy or 0 (default) value: the current behavior
- strict or 1: set the carrier down
- A value of 2 could be added later to support the IEEE-compliant mode
with bridge
interfaces, if needed.
Best regards,
Louis Scalbert
>
>
> -J
>
> >>
> >> Fix this by asserting carrier only when at least 'min_links' slaves are
> >> in the collecting/distributing state (or collecting only if the
> >> coupled_control default behavior is disabled).
> >>
> >> Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> >> Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> >> ---
> >> drivers/net/bonding/bond_3ad.c | 24 ++++++++++++++++++++++--
> >> 1 file changed, 22 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >> index af7f74cfdc08..6d3613755d45 100644
> >> --- a/drivers/net/bonding/bond_3ad.c
> >> +++ b/drivers/net/bonding/bond_3ad.c
> >> @@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
> >> }
> >> }
> >>
> >> +static int __agg_valid_ports(struct aggregator *agg)
> >> +{
> >> + struct port *port;
> >> + int valid = 0;
> >> +
> >> + for (port = agg->lag_ports; port;
> >> + port = port->next_port_in_aggregator) {
> >> + if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
> >> + (!port->slave->bond->params.coupled_control ||
> >> + port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
> >> + valid++;
> >> + }
> >> +
> >> + return valid;
> >> +}
> >> +
> >> static int __agg_active_ports(struct aggregator *agg)
> >> {
> >> struct port *port;
> >> @@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
> >> port->actor_port_number,
> >> port->aggregator->aggregator_identifier);
> >> __enable_port(port);
> >> + bond_3ad_set_carrier(port->slave->bond);
> >> /* Slave array needs update */
> >> *update_slave_arr = true;
> >> /* Should notify peers if possible */
> >> @@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
> >> port->actor_port_number,
> >> port->aggregator->aggregator_identifier);
> >> __disable_port(port);
> >> + bond_3ad_set_carrier(port->slave->bond);
> >> /* Slave array needs an update */
> >> *update_slave_arr = true;
> >> }
> >> @@ -2819,8 +2837,10 @@ int bond_3ad_set_carrier(struct bonding *bond)
> >> }
> >> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
> >> if (active) {
> >> - /* are enough slaves available to consider link up? */
> >> - if (__agg_active_ports(active) < bond->params.min_links) {
> >> + /* are enough slaves in collecting (and distributing) state to consider
> >> + * link up?
> >> + */
> >> + if (__agg_valid_ports(active) < bond->params.min_links) {
> >> if (netif_carrier_ok(bond->dev)) {
> >> netif_carrier_off(bond->dev);
> >> goto out;
> >> --
> >> 2.39.2
> >>
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-04-01 13:32 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-25 13:44 [PATCH net v2 0/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 1/3] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
2026-03-27 16:16 ` Louis Scalbert
2026-03-27 16:37 ` Mahesh Bandewar (महेश बंडेवार)
2026-03-27 16:56 ` Jay Vosburgh
2026-04-01 13:32 ` Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 2/3] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-03-25 13:44 ` [PATCH net v2 3/3] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.