public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
@ 2026-04-08 15:23 Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

Hi everyone,

This series addresses a blackholing issue and a subsequent link-flapping
issue in the 802.3ad bonding driver when dealing with inactive slaves
and the `min_links` parameter.

When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.

In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.

The current behavior is not compliant with the LACP standard. This
patchset introduces a working behavior that is not strictly
standard-compliant either, but is widely adopted across the industry.
It consists of bringing the bonding master interface down to signal to
upper-layer processes that it is not usable.

This patchset depends on the following iproute2 change:
ip/bond: add lacp_fallback support

Patch 1 introduces the lacp_fallback configuration knob, which is
applied in the subsequent patch. The default (legacy) mode preserves
the existing behavior, while the strict mode is intended to force the
bonding master carrier down in this situation.

Patch 2 addresses the core issue when lacp_fallback is set to strict.
It ensures that carrier is asserted only when at least 'min_links'
slaves are in a valid state (collecting/distributing, or collecting
only when coupled_control is disabled).

Patch 3 fixes a side effect of the first patch. Tightening the carrier 
logic exposes a state persistence bug: when a physical link goes down, 
the LACP collecting/distributing flags remain set. When the link returns, 
the interface briefly hallucinates that it is ready, bounces the carrier 
up, and then drops it again once LACP renegotiation starts. Unsetting the 
SELECTED flag when the link goes down forces the state machine through 
DETACHED, clearing the stale flags and preventing the flap.

Patch 4 fixes a side effect of the second patch caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.

Patch 5 adds a test for the bonding legacy and strict LACP fallback modes.

Louis Scalbert (5):
  bonding: 3ad: add lacp_fallback configuration knob
  bonding: 3ad: fix carrier when no valid slaves
  bonding: 3ad: fix mux port state on oper down
  bonding: 3ad: fix stuck negotiation on recovery
  selftests: bonding: add test for fallback mode

 Documentation/networking/bonding.rst          |  33 ++
 drivers/net/bonding/bond_3ad.c                |  38 ++-
 drivers/net/bonding/bond_main.c               |   1 +
 drivers/net/bonding/bond_netlink.c            |  16 +
 drivers/net/bonding/bond_options.c            |  27 ++
 include/net/bond_options.h                    |   1 +
 include/net/bonding.h                         |   1 +
 include/uapi/linux/if_link.h                  |   1 +
 .../selftests/drivers/net/bonding/Makefile    |   1 +
 .../drivers/net/bonding/bond_lacp_fallback.sh | 299 ++++++++++++++++++
 10 files changed, 415 insertions(+), 3 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh

-- 
2.39.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob
  2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.

In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.

Introduce a configuration knob to control this behavior. It allows
the bonding master to assert carrier only when at least 'min_links'
slaves are in collecting/distributing state (or collecting only
when coupled_control is disabled).

The default mode preserves the current (legacy) behavior. This
patch only introduces the knob; its behavior is implemented in
the subsequent commit.

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 Documentation/networking/bonding.rst | 33 ++++++++++++++++++++++++++++
 drivers/net/bonding/bond_main.c      |  1 +
 drivers/net/bonding/bond_netlink.c   | 16 ++++++++++++++
 drivers/net/bonding/bond_options.c   | 26 ++++++++++++++++++++++
 include/net/bond_options.h           |  1 +
 include/net/bonding.h                |  1 +
 include/uapi/linux/if_link.h         |  1 +
 7 files changed, 79 insertions(+)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index e700bf1d095c..465d06aead27 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -619,6 +619,39 @@ min_links
 	aggregator cannot be active without at least one available link,
 	setting this option to 0 or to 1 has the exact same effect.
 
+lacp_fallback
+
+	Specifies the fallback behavior of a bonding when LACP negotiation fails on
+	all slave links, i.e. when no slave is in the Collecting/Distributing state
+	(or only in Collecting state when coupled_control is disabled), while at
+	least `min_links` link still reports carrier up.
+
+	This option is only applicable to 802.3ad mode (mode 4).
+
+	Valid values are:
+
+	legacy or 0
+		In this situation, the bonding master remains carrier up and
+		randomly selects a single slave to transmit and receive traffic.
+		Traffic received on other slaves is dropped.
+
+		This mode is deprecated, as it may lead to traffic blackholing
+		when the absence of LACP negotiation means the partner is not
+		ready to collect and distribute traffic.
+
+		This is the legacy default behavior.
+
+	strict or 1
+		In this situation, the bonding master reports carrier down, allowing
+		upper-layer processes to detect that the interface is not usable for
+		collecting and distributing traffic.
+
+		The master transitions to carrier up only when at least
+		`min_links` slaves reach the Collecting(/Distributing) state,
+		allowing traffic to flow.
+
+	The default value is 0 (legacy).
+
 mode
 
 	Specifies one of the bonding policies. The default is
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a5484d11553d..02cba0560a39 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -6440,6 +6440,7 @@ static int __init bond_check_params(struct bond_params *params)
 	params->ad_user_port_key = ad_user_port_key;
 	params->coupled_control = 1;
 	params->broadcast_neighbor = 0;
+	params->lacp_fallback = 0;
 	if (packets_per_slave > 0) {
 		params->reciprocal_packets_per_slave =
 			reciprocal_value(packets_per_slave);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 286f11c517f7..1f92ad786b51 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -130,6 +130,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
 	[IFLA_BOND_NS_IP6_TARGET]	= { .type = NLA_NESTED },
 	[IFLA_BOND_COUPLED_CONTROL]	= { .type = NLA_U8 },
 	[IFLA_BOND_BROADCAST_NEIGH]	= { .type = NLA_U8 },
+	[IFLA_BOND_LACP_FALLBACK]	= { .type = NLA_U8 },
 };
 
 static const struct nla_policy bond_slave_policy[IFLA_BOND_SLAVE_MAX + 1] = {
@@ -586,6 +587,16 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[],
 			return err;
 	}
 
+	if (data[IFLA_BOND_LACP_FALLBACK]) {
+		int fallback_mode = nla_get_u8(data[IFLA_BOND_LACP_FALLBACK]);
+
+		bond_opt_initval(&newval, fallback_mode);
+		err = __bond_opt_set(bond, BOND_OPT_LACP_FALLBACK, &newval,
+				     data[IFLA_BOND_LACP_FALLBACK], extack);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
@@ -658,6 +669,7 @@ static size_t bond_get_size(const struct net_device *bond_dev)
 		nla_total_size(sizeof(struct in6_addr)) * BOND_MAX_NS_TARGETS +
 		nla_total_size(sizeof(u8)) +	/* IFLA_BOND_COUPLED_CONTROL */
 		nla_total_size(sizeof(u8)) +	/* IFLA_BOND_BROADCAST_NEIGH */
+		nla_total_size(sizeof(u8)) +	/* IFLA_BOND_LACP_FALLBACK */
 		0;
 }
 
@@ -825,6 +837,10 @@ static int bond_fill_info(struct sk_buff *skb,
 		       bond->params.broadcast_neighbor))
 		goto nla_put_failure;
 
+	if (nla_put_u8(skb, IFLA_BOND_LACP_FALLBACK,
+		       bond->params.lacp_fallback))
+		goto nla_put_failure;
+
 	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
 		struct ad_info info;
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 7380cc4ee75a..b672b8a881bb 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -68,6 +68,8 @@ static int bond_option_lacp_active_set(struct bonding *bond,
 				       const struct bond_opt_value *newval);
 static int bond_option_lacp_rate_set(struct bonding *bond,
 				     const struct bond_opt_value *newval);
+static int bond_option_lacp_fallback_set(struct bonding *bond,
+					 const struct bond_opt_value *newval);
 static int bond_option_ad_select_set(struct bonding *bond,
 				     const struct bond_opt_value *newval);
 static int bond_option_queue_id_set(struct bonding *bond,
@@ -162,6 +164,12 @@ static const struct bond_opt_value bond_lacp_rate_tbl[] = {
 	{ NULL,   -1,           0},
 };
 
+static const struct bond_opt_value bond_lacp_fallback_tbl[] = {
+	{ "legacy", 0, BOND_VALFLAG_DEFAULT},
+	{ "strict",  1, 0},
+	{ NULL, -1, 0 }
+};
+
 static const struct bond_opt_value bond_ad_select_tbl[] = {
 	{ "stable",          BOND_AD_STABLE,    BOND_VALFLAG_DEFAULT},
 	{ "bandwidth",       BOND_AD_BANDWIDTH, 0},
@@ -363,6 +371,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
 		.values = bond_lacp_rate_tbl,
 		.set = bond_option_lacp_rate_set
 	},
+	[BOND_OPT_LACP_FALLBACK] = {
+		.id = BOND_OPT_LACP_FALLBACK,
+		.name = "lacp_fallback",
+		.desc = "Define the LACP fallback mode when no slaves have negotiated",
+		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
+		.values = bond_lacp_fallback_tbl,
+		.set = bond_option_lacp_fallback_set
+	},
 	[BOND_OPT_MINLINKS] = {
 		.id = BOND_OPT_MINLINKS,
 		.name = "min_links",
@@ -1684,6 +1700,16 @@ static int bond_option_lacp_rate_set(struct bonding *bond,
 	return 0;
 }
 
+static int bond_option_lacp_fallback_set(struct bonding *bond,
+					 const struct bond_opt_value *newval)
+{
+	netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
+		   newval->string, newval->value);
+	bond->params.lacp_fallback = newval->value;
+
+	return 0;
+}
+
 static int bond_option_ad_select_set(struct bonding *bond,
 				     const struct bond_opt_value *newval)
 {
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index e6eedf23aea1..5eb64c831f54 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -79,6 +79,7 @@ enum {
 	BOND_OPT_COUPLED_CONTROL,
 	BOND_OPT_BROADCAST_NEIGH,
 	BOND_OPT_ACTOR_PORT_PRIO,
+	BOND_OPT_LACP_FALLBACK,
 	BOND_OPT_LAST
 };
 
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 395c6e281c5f..d8cb02643f8b 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -132,6 +132,7 @@ struct bond_params {
 	int peer_notif_delay;
 	int lacp_active;
 	int lacp_fast;
+	int lacp_fallback;
 	unsigned int min_links;
 	int ad_select;
 	char primary[IFNAMSIZ];
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index e9b5f79e1ee1..7ad3fc600c71 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1539,6 +1539,7 @@ enum {
 	IFLA_BOND_NS_IP6_TARGET,
 	IFLA_BOND_COUPLED_CONTROL,
 	IFLA_BOND_BROADCAST_NEIGH,
+	IFLA_BOND_LACP_FALLBACK,
 	__IFLA_BOND_MAX,
 };
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves
  2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

Apply the "lacp_fallback" configuration from the previous commit.

"lacp_fallback" mode "strict" asserts that the bonding master carrier
only when at least 'min_links' slaves are in the collecting/distributing
state (or collecting only if the coupled_control default behavior is
disabled).

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c     | 26 ++++++++++++++++++++++++--
 drivers/net/bonding/bond_options.c |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index af7f74cfdc08..b79a76296966 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
 	}
 }
 
+static int __agg_valid_ports(struct aggregator *agg)
+{
+	struct port *port;
+	int valid = 0;
+
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
+		    (!port->slave->bond->params.coupled_control ||
+		     port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
+			valid++;
+	}
+
+	return valid;
+}
+
 static int __agg_active_ports(struct aggregator *agg)
 {
 	struct port *port;
@@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
 			  port->actor_port_number,
 			  port->aggregator->aggregator_identifier);
 		__enable_port(port);
+		bond_3ad_set_carrier(port->slave->bond);
 		/* Slave array needs update */
 		*update_slave_arr = true;
 		/* Should notify peers if possible */
@@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
 			  port->actor_port_number,
 			  port->aggregator->aggregator_identifier);
 		__disable_port(port);
+		bond_3ad_set_carrier(port->slave->bond);
 		/* Slave array needs an update */
 		*update_slave_arr = true;
 	}
@@ -2819,8 +2837,12 @@ int bond_3ad_set_carrier(struct bonding *bond)
 	}
 	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
 	if (active) {
-		/* are enough slaves available to consider link up? */
-		if (__agg_active_ports(active) < bond->params.min_links) {
+		/* are enough slaves in collecting (and distributing) state to consider
+		 * link up?
+		 */
+		if ((bond->params.lacp_fallback ? __agg_valid_ports(active)
+					: __agg_active_ports(active)) <
+		    bond->params.min_links) {
 			if (netif_carrier_ok(bond->dev)) {
 				netif_carrier_off(bond->dev);
 				goto out;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index b672b8a881bb..d64a5d2f80b6 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1706,6 +1706,7 @@ static int bond_option_lacp_fallback_set(struct bonding *bond,
 	netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
 		   newval->string, newval->value);
 	bond->params.lacp_fallback = newval->value;
+	bond_3ad_set_carrier(bond);
 
 	return 0;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down
  2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 5/5] selftests: bonding: add test for fallback mode Louis Scalbert
  4 siblings, 0 replies; 6+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

When the bonding interface has carrier down due to the absence of
valid slaves and a slave transitions from down to up, the bonding
interface briefly goes carrier up, then down again, and finally up
once LACP negotiates collecting and distributing on the port.

The interface should not transition to carrier up until LACP
negotiation is complete.

This happens because the actor and partner port states remain in
collecting (and distributing) when the port goes down. When the port
comes back up, it temporarily remains in this state until LACP
renegotiation occurs.

Previously this was mostly cosmetic, but since the bonding carrier
state now depends on the LACP negotiation state, it causes the
interface to flap.

Fix this by unsetting the SELECTED flag when a port goes down so that
the mux state machine transitions through ATTACHED and DETACHED,
which clears the actor collecting and distributing flags. Do not
attempt to set the SELECTED flag if the port is still disabled.

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b79a76296966..3a94fbcbf721 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
 	struct slave *slave;
 	int found = 0;
 
+	/* Disabled ports cannot be SELECTED.
+	 * Do not attempt to set the SELECTED flag if the port is still disabled.
+	 */
+	if (!port->is_enabled)
+		return;
+
 	/* if the port is already Selected, do nothing */
 	if (port->sm_vars & AD_PORT_SELECTED)
 		return;
@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 		/* link has failed */
 		port->is_enabled = false;
 		ad_update_actor_keys(port, true);
+		port->sm_vars &= ~AD_PORT_SELECTED;
 	}
 	agg = __get_first_agg(port);
 	ad_agg_selection_logic(agg, &dummy);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery
  2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
                   ` (2 preceding siblings ...)
  2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
  2026-04-08 15:23 ` [PATCH net v3 5/5] selftests: bonding: add test for fallback mode Louis Scalbert
  4 siblings, 0 replies; 6+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

The previous commit introduced a side effect caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.

1. All aggregator ports go down
  - The SELECTED flag is cleared on all of them.
2. One port comes back up
  - Its SELECTED flag is set again.
  - It enters the WAITING state and gets its READY_N flag.
  - The remaining ports stay UNSELECTED. Because of that, they cannot
  enter the WAITING state and therefore never get READY_N.
  - __agg_ports_are_ready() returns 0 because it finds a port without
  READY_N.
  - As a result, __set_agg_ports_ready() keeps the READY flag cleared on
  all ports.
  - The port that came back up is therefore not marked READY and cannot
  transition to ATTACHED.
  - LACP negotiation becomes stuck, and the port cannot be used.
3. All aggregator ports come back up
  - They all regain SELECTED and READY_N.
  - __agg_ports_are_ready() now returns 1.
  - __set_agg_ports_ready() sets READY on all ports.
  - They can then transition to ATTACHED.
  - Negotiation resumes and the aggregator becomes operational again.

Consider only ports currently in the WAITING mux state for READY_N in
order to avoid __agg_ports_are_ready() to return 0 because of a disabled
port. That matches 802.3ad, which states: "The Selection Logic asserts
Ready TRUE when the values of Ready_N for all ports that are waiting to
attach to a given Aggregator are TRUE.".

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 3a94fbcbf721..3f56d892b101 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
 }
 
 /**
- * __agg_ports_are_ready - check if all ports in an aggregator are ready
+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
+ * the WAITING state are ready
  * @aggregator: the aggregator we're looking at
  *
  */
@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
 		for (port = aggregator->lag_ports;
 		     port;
 		     port = port->next_port_in_aggregator) {
+			if (port->sm_mux_state != AD_MUX_WAITING)
+				continue;
 			if (!(port->sm_vars & AD_PORT_READY_N)) {
 				retval = 0;
 				break;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net v3 5/5] selftests: bonding: add test for fallback mode
  2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
                   ` (3 preceding siblings ...)
  2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
  4 siblings, 0 replies; 6+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, Louis Scalbert

Add a test for the bonding legacy and strict LACP fallback modes.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 .../selftests/drivers/net/bonding/Makefile    |   1 +
 .../drivers/net/bonding/bond_lacp_fallback.sh | 299 ++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh

diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6c5c60adb5e8..a117bf2e483b 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -7,6 +7,7 @@ TEST_PROGS := \
 	bond-eth-type-change.sh \
 	bond-lladdr-target.sh \
 	bond_ipsec_offload.sh \
+	bond_lacp_fallback.sh \
 	bond_lacp_prio.sh \
 	bond_macvlan_ipvlan.sh \
 	bond_options.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh b/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
new file mode 100755
index 000000000000..a983a2c2ea17
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
@@ -0,0 +1,299 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Testing if bond lacp_fallback works
+#
+#          Partner (p_ns)
+#  +-------------------------+
+#  |          bond0          |
+#  |            +            |
+#  |      eth0  |  eth1      |
+#  |        +---+---+        |
+#  |        |       |        |
+#  +-------------------------+
+#           |       |
+#  +--------------------------+
+#  |        |       |         |
+#  |        +---+---+         |
+#  |      eth0  |  eth1       |
+#  |            +             |
+#  |          bond0           |
+#  +--------------------------+
+#         Dut (d_ns)
+
+lib_dir=$(dirname "$0")
+# shellcheck disable=SC1090
+source "$lib_dir"/../../../net/lib.sh
+
+COLLECTING_DISTRIBUTING_MASK=48
+COLLECTING_DISTRIBUTING=48
+FAILED=0
+
+setup_links()
+{
+	# shellcheck disable=SC2154
+	ip -n "${d_ns}" link add eth0 type veth peer name eth0 netns "${p_ns}"
+	ip -n "${d_ns}" link add eth1 type veth peer name eth1 netns "${p_ns}"
+
+	ip -n "${d_ns}" link add bond0 type bond mode 802.3ad miimon 100 \
+		lacp_rate fast min_links 1
+	ip -n "${p_ns}" link add bond0 type bond mode 802.3ad miimon 100 \
+		lacp_rate fast min_links 1
+
+	ip -n "${d_ns}" link set eth0 master bond0
+	ip -n "${d_ns}" link set eth1 master bond0
+	ip -n "${p_ns}" link set eth0 master bond0
+	ip -n "${p_ns}" link set eth1 master bond0
+
+	ip -n "${d_ns}" link set bond0 up
+	ip -n "${p_ns}" link set bond0 up
+}
+
+test_master_carrier() {
+	local expected=$1
+	local mode_name=$2
+	local carrier
+
+	carrier=$(ip netns exec "${d_ns}" cat /sys/class/net/bond0/carrier)
+	[ "$carrier" == "1" ] && carrier="up" || carrier="down"
+
+	[ "$carrier" == "$expected" ] && return
+
+	echo "FAIL: Expected carrier $expected in $mode_name mode, got $carrier"
+
+	RET=1
+
+}
+
+compare_state() {
+	local actual_state=$1
+	local expected_state=$2
+	local iface=$3
+	local last_attempt=$4
+
+    [ $((actual_state & COLLECTING_DISTRIBUTING_MASK)) -eq "$expected_state" ] \
+		&& return 0
+
+	[ "$last_attempt" -ne 1 ] && return 1
+
+	printf "FAIL: Expected LACP %s actor state to " "$iface"
+	if [ "$expected_state" -eq $COLLECTING_DISTRIBUTING ]; then
+		echo "be in Collecting/Distributing state"
+	else
+		echo "have neither Collecting nor Distributing set."
+	fi
+
+	return 1
+}
+
+_test_lacp_port_state() {
+	local interface=$1
+	local expected=$2
+	local last_attempt=$3
+	local eth0_actor_state eth1_actor_state
+	local ret=0
+
+	# shellcheck disable=SC2016
+	while IFS='=' read -r k v; do
+		printf -v "$k" '%s' "$v"
+	done < <(
+		ip netns exec "${d_ns}" awk '
+		/^Slave Interface: / { iface=$3 }
+		/details actor lacp pdu:/ { ctx="actor" }
+		/details partner lacp pdu:/ { ctx="partner" }
+		/^[[:space:]]+port state: / {
+			if (ctx == "actor") {
+				gsub(":", "", iface)
+				printf "%s_%s_state=%s\n", iface, ctx, $3
+			}
+		}
+		' /proc/net/bonding/bond0
+	)
+
+	if [ "$interface" == "eth0" ] || [ "$interface" == "both" ]; then
+		compare_state "$eth0_actor_state" "$expected" eth0 "$last_attempt" || ret=1
+	fi
+
+	if [ "$interface" == "eth1" ] || [ "$interface" == "both" ]; then
+		compare_state "$eth1_actor_state" "$expected" eth1 "$last_attempt" || ret=1
+	fi
+
+	return $ret
+}
+
+test_lacp_port_state() {
+	local interface=$1
+	local expected=$2
+	local retry=$3
+	local last_attempt=0
+	local attempt=1
+	local ret=1
+
+	while [ $attempt -le $((retry + 1)) ]; do
+		[ $attempt -eq $((retry + 1)) ] && last_attempt=1
+		_test_lacp_port_state "$interface" "$expected" "$last_attempt" && return
+		((attempt++))
+		sleep 1
+	done
+
+	RET=1
+}
+
+
+trap cleanup_all_ns EXIT
+setup_ns d_ns p_ns
+setup_links
+
+# Initial state
+RET=0
+mode=legacy
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 3
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 down, eth1 up
+RET=0
+ip -n "${p_ns}" link set eth0 down
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 down"
+
+# partner eth0 and eth1 down
+RET=0
+ip -n "${p_ns}" link set eth1 down
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode # down because of min_links
+log_test "bond LACP" "$mode fallback - eth0 and eth1 down"
+
+# partner eth0 up, eth1 down
+RET=0
+ip -n "${p_ns}" link set eth0 up
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 up, eth1 down"
+
+# partner eth0 and eth1 up
+RET=0
+ip -n "${p_ns}" link set eth1 up
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 stops LACP and eth1 up
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth0 root netem loss 100%
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 stopped sending LACP"
+
+# partner eth0 and eth1 stop LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth1 root netem loss 100%
+test_lacp_port_state both $FAILED 5
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# switch to lacp_fallback strict
+RET=0
+mode=strict
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $FAILED 1
+test_master_carrier down $mode 5
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# switch back to lacp_fallback legacy mode
+RET=0
+mode=legacy
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# eth0 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth0 root
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 recovered and eth1 stopped sending LACP"
+
+# eth1 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth1 root
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 recovered LACP"
+
+# switch to lacp_fallback strict
+RET=0
+mode=strict
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 down, eth1 up
+RET=0
+ip -n "${p_ns}" link set eth0 down
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 down"
+
+# partner eth0 and eth1 down
+RET=0
+ip -n "${p_ns}" link set eth1 down
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode # down because of min_links
+log_test "bond LACP" "$mode fallback - eth0 and eth1 down"
+
+# partner eth0 up, eth1 down
+RET=0
+ip -n "${p_ns}" link set eth0 up
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 up, eth1 down"
+
+# partner eth0 and eth1 up
+RET=0
+ip -n "${p_ns}" link set eth1 up
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 stops LACP and eth1 up
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth0 root netem loss 100%
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 stopped sending LACP"
+
+# partner eth0 and eth1 stop LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth1 root netem loss 100%
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# eth0 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth0 root
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 recovered and eth1 stopped sending LACP"
+
+# eth1 recovers LACP
+# shellcheck disable=SC2034
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth1 root
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 recovered LACP"
+
+exit "${EXIT_STATUS}"
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-08 15:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 5/5] selftests: bonding: add test for fallback mode Louis Scalbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox