* [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
2026-04-13 16:45 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
maheshb, Louis Scalbert
When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.
In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.
Introduce a configuration knob to control this behavior. It allows
the bonding master to assert carrier only when at least 'min_links'
slaves are in collecting/distributing state (or collecting only
when coupled_control is disabled).
The default mode preserves the current (legacy) behavior. This
patch only introduces the knob; its behavior is implemented in
the subsequent commit.
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
Documentation/networking/bonding.rst | 33 ++++++++++++++++++++++++++++
drivers/net/bonding/bond_main.c | 1 +
drivers/net/bonding/bond_netlink.c | 16 ++++++++++++++
drivers/net/bonding/bond_options.c | 26 ++++++++++++++++++++++
include/net/bond_options.h | 1 +
include/net/bonding.h | 1 +
include/uapi/linux/if_link.h | 1 +
7 files changed, 79 insertions(+)
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index e700bf1d095c..465d06aead27 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -619,6 +619,39 @@ min_links
aggregator cannot be active without at least one available link,
setting this option to 0 or to 1 has the exact same effect.
+lacp_fallback
+
+ Specifies the fallback behavior of a bonding when LACP negotiation fails on
+ all slave links, i.e. when no slave is in the Collecting/Distributing state
+ (or only in Collecting state when coupled_control is disabled), while at
+ least `min_links` link still reports carrier up.
+
+ This option is only applicable to 802.3ad mode (mode 4).
+
+ Valid values are:
+
+ legacy or 0
+ In this situation, the bonding master remains carrier up and
+ randomly selects a single slave to transmit and receive traffic.
+ Traffic received on other slaves is dropped.
+
+ This mode is deprecated, as it may lead to traffic blackholing
+ when the absence of LACP negotiation means the partner is not
+ ready to collect and distribute traffic.
+
+ This is the legacy default behavior.
+
+ strict or 1
+ In this situation, the bonding master reports carrier down, allowing
+ upper-layer processes to detect that the interface is not usable for
+ collecting and distributing traffic.
+
+ The master transitions to carrier up only when at least
+ `min_links` slaves reach the Collecting(/Distributing) state,
+ allowing traffic to flow.
+
+ The default value is 0 (legacy).
+
mode
Specifies one of the bonding policies. The default is
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a5484d11553d..02cba0560a39 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -6440,6 +6440,7 @@ static int __init bond_check_params(struct bond_params *params)
params->ad_user_port_key = ad_user_port_key;
params->coupled_control = 1;
params->broadcast_neighbor = 0;
+ params->lacp_fallback = 0;
if (packets_per_slave > 0) {
params->reciprocal_packets_per_slave =
reciprocal_value(packets_per_slave);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 286f11c517f7..1f92ad786b51 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -130,6 +130,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
[IFLA_BOND_NS_IP6_TARGET] = { .type = NLA_NESTED },
[IFLA_BOND_COUPLED_CONTROL] = { .type = NLA_U8 },
[IFLA_BOND_BROADCAST_NEIGH] = { .type = NLA_U8 },
+ [IFLA_BOND_LACP_FALLBACK] = { .type = NLA_U8 },
};
static const struct nla_policy bond_slave_policy[IFLA_BOND_SLAVE_MAX + 1] = {
@@ -586,6 +587,16 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[],
return err;
}
+ if (data[IFLA_BOND_LACP_FALLBACK]) {
+ int fallback_mode = nla_get_u8(data[IFLA_BOND_LACP_FALLBACK]);
+
+ bond_opt_initval(&newval, fallback_mode);
+ err = __bond_opt_set(bond, BOND_OPT_LACP_FALLBACK, &newval,
+ data[IFLA_BOND_LACP_FALLBACK], extack);
+ if (err)
+ return err;
+ }
+
return 0;
}
@@ -658,6 +669,7 @@ static size_t bond_get_size(const struct net_device *bond_dev)
nla_total_size(sizeof(struct in6_addr)) * BOND_MAX_NS_TARGETS +
nla_total_size(sizeof(u8)) + /* IFLA_BOND_COUPLED_CONTROL */
nla_total_size(sizeof(u8)) + /* IFLA_BOND_BROADCAST_NEIGH */
+ nla_total_size(sizeof(u8)) + /* IFLA_BOND_LACP_FALLBACK */
0;
}
@@ -825,6 +837,10 @@ static int bond_fill_info(struct sk_buff *skb,
bond->params.broadcast_neighbor))
goto nla_put_failure;
+ if (nla_put_u8(skb, IFLA_BOND_LACP_FALLBACK,
+ bond->params.lacp_fallback))
+ goto nla_put_failure;
+
if (BOND_MODE(bond) == BOND_MODE_8023AD) {
struct ad_info info;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 7380cc4ee75a..b672b8a881bb 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -68,6 +68,8 @@ static int bond_option_lacp_active_set(struct bonding *bond,
const struct bond_opt_value *newval);
static int bond_option_lacp_rate_set(struct bonding *bond,
const struct bond_opt_value *newval);
+static int bond_option_lacp_fallback_set(struct bonding *bond,
+ const struct bond_opt_value *newval);
static int bond_option_ad_select_set(struct bonding *bond,
const struct bond_opt_value *newval);
static int bond_option_queue_id_set(struct bonding *bond,
@@ -162,6 +164,12 @@ static const struct bond_opt_value bond_lacp_rate_tbl[] = {
{ NULL, -1, 0},
};
+static const struct bond_opt_value bond_lacp_fallback_tbl[] = {
+ { "legacy", 0, BOND_VALFLAG_DEFAULT},
+ { "strict", 1, 0},
+ { NULL, -1, 0 }
+};
+
static const struct bond_opt_value bond_ad_select_tbl[] = {
{ "stable", BOND_AD_STABLE, BOND_VALFLAG_DEFAULT},
{ "bandwidth", BOND_AD_BANDWIDTH, 0},
@@ -363,6 +371,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.values = bond_lacp_rate_tbl,
.set = bond_option_lacp_rate_set
},
+ [BOND_OPT_LACP_FALLBACK] = {
+ .id = BOND_OPT_LACP_FALLBACK,
+ .name = "lacp_fallback",
+ .desc = "Define the LACP fallback mode when no slaves have negotiated",
+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
+ .values = bond_lacp_fallback_tbl,
+ .set = bond_option_lacp_fallback_set
+ },
[BOND_OPT_MINLINKS] = {
.id = BOND_OPT_MINLINKS,
.name = "min_links",
@@ -1684,6 +1700,16 @@ static int bond_option_lacp_rate_set(struct bonding *bond,
return 0;
}
+static int bond_option_lacp_fallback_set(struct bonding *bond,
+ const struct bond_opt_value *newval)
+{
+ netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
+ newval->string, newval->value);
+ bond->params.lacp_fallback = newval->value;
+
+ return 0;
+}
+
static int bond_option_ad_select_set(struct bonding *bond,
const struct bond_opt_value *newval)
{
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index e6eedf23aea1..5eb64c831f54 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -79,6 +79,7 @@ enum {
BOND_OPT_COUPLED_CONTROL,
BOND_OPT_BROADCAST_NEIGH,
BOND_OPT_ACTOR_PORT_PRIO,
+ BOND_OPT_LACP_FALLBACK,
BOND_OPT_LAST
};
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 395c6e281c5f..d8cb02643f8b 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -132,6 +132,7 @@ struct bond_params {
int peer_notif_delay;
int lacp_active;
int lacp_fast;
+ int lacp_fallback;
unsigned int min_links;
int ad_select;
char primary[IFNAMSIZ];
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index e9b5f79e1ee1..7ad3fc600c71 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1539,6 +1539,7 @@ enum {
IFLA_BOND_NS_IP6_TARGET,
IFLA_BOND_COUPLED_CONTROL,
IFLA_BOND_BROADCAST_NEIGH,
+ IFLA_BOND_LACP_FALLBACK,
__IFLA_BOND_MAX,
};
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob
2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
@ 2026-04-13 16:45 ` Jay Vosburgh
0 siblings, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2026-04-13 16:45 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>When an 802.3ad (LACP) bonding interface has no slaves in the
>collecting/distributing state, the bonding master still reports
>carrier as up as long as at least 'min_links' slaves have carrier.
>
>In this situation, only one slave is effectively used for TX/RX,
>while traffic received on other slaves is dropped. Upper-layer
>daemons therefore consider the interface operational, even though
>traffic may be blackholed if the lack of LACP negotiation means
>the partner is not ready to deal with traffic.
>
>Introduce a configuration knob to control this behavior. It allows
>the bonding master to assert carrier only when at least 'min_links'
>slaves are in collecting/distributing state (or collecting only
>when coupled_control is disabled).
>
>The default mode preserves the current (legacy) behavior. This
>patch only introduces the knob; its behavior is implemented in
>the subsequent commit.
>
>Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>---
> Documentation/networking/bonding.rst | 33 ++++++++++++++++++++++++++++
> drivers/net/bonding/bond_main.c | 1 +
> drivers/net/bonding/bond_netlink.c | 16 ++++++++++++++
> drivers/net/bonding/bond_options.c | 26 ++++++++++++++++++++++
> include/net/bond_options.h | 1 +
> include/net/bonding.h | 1 +
> include/uapi/linux/if_link.h | 1 +
> 7 files changed, 79 insertions(+)
>
>diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>index e700bf1d095c..465d06aead27 100644
>--- a/Documentation/networking/bonding.rst
>+++ b/Documentation/networking/bonding.rst
>@@ -619,6 +619,39 @@ min_links
> aggregator cannot be active without at least one available link,
> setting this option to 0 or to 1 has the exact same effect.
>
>+lacp_fallback
>+
>+ Specifies the fallback behavior of a bonding when LACP negotiation fails on
>+ all slave links, i.e. when no slave is in the Collecting/Distributing state
>+ (or only in Collecting state when coupled_control is disabled), while at
>+ least `min_links` link still reports carrier up.
>+
>+ This option is only applicable to 802.3ad mode (mode 4).
>+
>+ Valid values are:
>+
>+ legacy or 0
>+ In this situation, the bonding master remains carrier up and
>+ randomly selects a single slave to transmit and receive traffic.
>+ Traffic received on other slaves is dropped.
>+
>+ This mode is deprecated, as it may lead to traffic blackholing
>+ when the absence of LACP negotiation means the partner is not
>+ ready to collect and distribute traffic.
>+
>+ This is the legacy default behavior.
>+
>+ strict or 1
>+ In this situation, the bonding master reports carrier down, allowing
>+ upper-layer processes to detect that the interface is not usable for
>+ collecting and distributing traffic.
>+
>+ The master transitions to carrier up only when at least
>+ `min_links` slaves reach the Collecting(/Distributing) state,
>+ allowing traffic to flow.
>+
>+ The default value is 0 (legacy).
>+
1- Please wrap text at approximately 75 columns.
2- I don't agree with the nomenclature or language of the above.
The existing behavior is not going to be deprecated or considered to be
legacy, and the option nomenclature should reflect that. I would
suggest naming the option "lacp_strict" and having it's possible
settings be on or off, with the default setting as off.
I think the behavior can be described along the lines of
off or 0
One interface of the bond is selected to be active, in
order to facilitate communication with peer devices that
do not implement LACP.
on or 1
Interfaces are only permitted to be made active if they
have an active LACP partner and have successfully reached
Collecting or Collecting_Distributing state.
The default is value is 0 (off).
-J
> mode
>
> Specifies one of the bonding policies. The default is
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index a5484d11553d..02cba0560a39 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -6440,6 +6440,7 @@ static int __init bond_check_params(struct bond_params *params)
> params->ad_user_port_key = ad_user_port_key;
> params->coupled_control = 1;
> params->broadcast_neighbor = 0;
>+ params->lacp_fallback = 0;
> if (packets_per_slave > 0) {
> params->reciprocal_packets_per_slave =
> reciprocal_value(packets_per_slave);
>diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
>index 286f11c517f7..1f92ad786b51 100644
>--- a/drivers/net/bonding/bond_netlink.c
>+++ b/drivers/net/bonding/bond_netlink.c
>@@ -130,6 +130,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
> [IFLA_BOND_NS_IP6_TARGET] = { .type = NLA_NESTED },
> [IFLA_BOND_COUPLED_CONTROL] = { .type = NLA_U8 },
> [IFLA_BOND_BROADCAST_NEIGH] = { .type = NLA_U8 },
>+ [IFLA_BOND_LACP_FALLBACK] = { .type = NLA_U8 },
> };
>
> static const struct nla_policy bond_slave_policy[IFLA_BOND_SLAVE_MAX + 1] = {
>@@ -586,6 +587,16 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[],
> return err;
> }
>
>+ if (data[IFLA_BOND_LACP_FALLBACK]) {
>+ int fallback_mode = nla_get_u8(data[IFLA_BOND_LACP_FALLBACK]);
>+
>+ bond_opt_initval(&newval, fallback_mode);
>+ err = __bond_opt_set(bond, BOND_OPT_LACP_FALLBACK, &newval,
>+ data[IFLA_BOND_LACP_FALLBACK], extack);
>+ if (err)
>+ return err;
>+ }
>+
> return 0;
> }
>
>@@ -658,6 +669,7 @@ static size_t bond_get_size(const struct net_device *bond_dev)
> nla_total_size(sizeof(struct in6_addr)) * BOND_MAX_NS_TARGETS +
> nla_total_size(sizeof(u8)) + /* IFLA_BOND_COUPLED_CONTROL */
> nla_total_size(sizeof(u8)) + /* IFLA_BOND_BROADCAST_NEIGH */
>+ nla_total_size(sizeof(u8)) + /* IFLA_BOND_LACP_FALLBACK */
> 0;
> }
>
>@@ -825,6 +837,10 @@ static int bond_fill_info(struct sk_buff *skb,
> bond->params.broadcast_neighbor))
> goto nla_put_failure;
>
>+ if (nla_put_u8(skb, IFLA_BOND_LACP_FALLBACK,
>+ bond->params.lacp_fallback))
>+ goto nla_put_failure;
>+
> if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> struct ad_info info;
>
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 7380cc4ee75a..b672b8a881bb 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -68,6 +68,8 @@ static int bond_option_lacp_active_set(struct bonding *bond,
> const struct bond_opt_value *newval);
> static int bond_option_lacp_rate_set(struct bonding *bond,
> const struct bond_opt_value *newval);
>+static int bond_option_lacp_fallback_set(struct bonding *bond,
>+ const struct bond_opt_value *newval);
> static int bond_option_ad_select_set(struct bonding *bond,
> const struct bond_opt_value *newval);
> static int bond_option_queue_id_set(struct bonding *bond,
>@@ -162,6 +164,12 @@ static const struct bond_opt_value bond_lacp_rate_tbl[] = {
> { NULL, -1, 0},
> };
>
>+static const struct bond_opt_value bond_lacp_fallback_tbl[] = {
>+ { "legacy", 0, BOND_VALFLAG_DEFAULT},
>+ { "strict", 1, 0},
>+ { NULL, -1, 0 }
>+};
>+
> static const struct bond_opt_value bond_ad_select_tbl[] = {
> { "stable", BOND_AD_STABLE, BOND_VALFLAG_DEFAULT},
> { "bandwidth", BOND_AD_BANDWIDTH, 0},
>@@ -363,6 +371,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> .values = bond_lacp_rate_tbl,
> .set = bond_option_lacp_rate_set
> },
>+ [BOND_OPT_LACP_FALLBACK] = {
>+ .id = BOND_OPT_LACP_FALLBACK,
>+ .name = "lacp_fallback",
>+ .desc = "Define the LACP fallback mode when no slaves have negotiated",
>+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
>+ .values = bond_lacp_fallback_tbl,
>+ .set = bond_option_lacp_fallback_set
>+ },
> [BOND_OPT_MINLINKS] = {
> .id = BOND_OPT_MINLINKS,
> .name = "min_links",
>@@ -1684,6 +1700,16 @@ static int bond_option_lacp_rate_set(struct bonding *bond,
> return 0;
> }
>
>+static int bond_option_lacp_fallback_set(struct bonding *bond,
>+ const struct bond_opt_value *newval)
>+{
>+ netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
>+ newval->string, newval->value);
>+ bond->params.lacp_fallback = newval->value;
>+
>+ return 0;
>+}
>+
> static int bond_option_ad_select_set(struct bonding *bond,
> const struct bond_opt_value *newval)
> {
>diff --git a/include/net/bond_options.h b/include/net/bond_options.h
>index e6eedf23aea1..5eb64c831f54 100644
>--- a/include/net/bond_options.h
>+++ b/include/net/bond_options.h
>@@ -79,6 +79,7 @@ enum {
> BOND_OPT_COUPLED_CONTROL,
> BOND_OPT_BROADCAST_NEIGH,
> BOND_OPT_ACTOR_PORT_PRIO,
>+ BOND_OPT_LACP_FALLBACK,
> BOND_OPT_LAST
> };
>
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 395c6e281c5f..d8cb02643f8b 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -132,6 +132,7 @@ struct bond_params {
> int peer_notif_delay;
> int lacp_active;
> int lacp_fast;
>+ int lacp_fallback;
> unsigned int min_links;
> int ad_select;
> char primary[IFNAMSIZ];
>diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>index e9b5f79e1ee1..7ad3fc600c71 100644
>--- a/include/uapi/linux/if_link.h
>+++ b/include/uapi/linux/if_link.h
>@@ -1539,6 +1539,7 @@ enum {
> IFLA_BOND_NS_IP6_TARGET,
> IFLA_BOND_COUPLED_CONTROL,
> IFLA_BOND_BROADCAST_NEIGH,
>+ IFLA_BOND_LACP_FALLBACK,
> __IFLA_BOND_MAX,
> };
>
>--
>2.39.2
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
2026-04-13 17:01 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
maheshb, Louis Scalbert
Apply the "lacp_fallback" configuration from the previous commit.
"lacp_fallback" mode "strict" asserts that the bonding master carrier
only when at least 'min_links' slaves are in the collecting/distributing
state (or collecting only if the coupled_control default behavior is
disabled).
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
drivers/net/bonding/bond_3ad.c | 26 ++++++++++++++++++++++++--
drivers/net/bonding/bond_options.c | 1 +
2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index af7f74cfdc08..b79a76296966 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
}
}
+static int __agg_valid_ports(struct aggregator *agg)
+{
+ struct port *port;
+ int valid = 0;
+
+ for (port = agg->lag_ports; port;
+ port = port->next_port_in_aggregator) {
+ if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
+ (!port->slave->bond->params.coupled_control ||
+ port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
+ valid++;
+ }
+
+ return valid;
+}
+
static int __agg_active_ports(struct aggregator *agg)
{
struct port *port;
@@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
port->actor_port_number,
port->aggregator->aggregator_identifier);
__enable_port(port);
+ bond_3ad_set_carrier(port->slave->bond);
/* Slave array needs update */
*update_slave_arr = true;
/* Should notify peers if possible */
@@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
port->actor_port_number,
port->aggregator->aggregator_identifier);
__disable_port(port);
+ bond_3ad_set_carrier(port->slave->bond);
/* Slave array needs an update */
*update_slave_arr = true;
}
@@ -2819,8 +2837,12 @@ int bond_3ad_set_carrier(struct bonding *bond)
}
active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
if (active) {
- /* are enough slaves available to consider link up? */
- if (__agg_active_ports(active) < bond->params.min_links) {
+ /* are enough slaves in collecting (and distributing) state to consider
+ * link up?
+ */
+ if ((bond->params.lacp_fallback ? __agg_valid_ports(active)
+ : __agg_active_ports(active)) <
+ bond->params.min_links) {
if (netif_carrier_ok(bond->dev)) {
netif_carrier_off(bond->dev);
goto out;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index b672b8a881bb..d64a5d2f80b6 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1706,6 +1706,7 @@ static int bond_option_lacp_fallback_set(struct bonding *bond,
netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
newval->string, newval->value);
bond->params.lacp_fallback = newval->value;
+ bond_3ad_set_carrier(bond);
return 0;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves
2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
@ 2026-04-13 17:01 ` Jay Vosburgh
2026-04-15 17:53 ` Louis Scalbert
0 siblings, 1 reply; 18+ messages in thread
From: Jay Vosburgh @ 2026-04-13 17:01 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>Apply the "lacp_fallback" configuration from the previous commit.
>
>"lacp_fallback" mode "strict" asserts that the bonding master carrier
>only when at least 'min_links' slaves are in the collecting/distributing
>state (or collecting only if the coupled_control default behavior is
>disabled).
>
>Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>---
> drivers/net/bonding/bond_3ad.c | 26 ++++++++++++++++++++++++--
> drivers/net/bonding/bond_options.c | 1 +
> 2 files changed, 25 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index af7f74cfdc08..b79a76296966 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
> }
> }
>
>+static int __agg_valid_ports(struct aggregator *agg)
>+{
>+ struct port *port;
>+ int valid = 0;
>+
>+ for (port = agg->lag_ports; port;
>+ port = port->next_port_in_aggregator) {
>+ if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
>+ (!port->slave->bond->params.coupled_control ||
>+ port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
>+ valid++;
Do we need to test coupled_control? I.e., can the test be
if ((port->actor_oper_port_state & LACP_STATE_COLLECTING) &&
(port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
To my reading, ad_mux_machine will set _COLLECTING and
_DISTRIBUTING appropriately regardless of the coupled_control selection.
>+ }
>+
>+ return valid;
>+}
>+
> static int __agg_active_ports(struct aggregator *agg)
> {
> struct port *port;
>@@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> __enable_port(port);
>+ bond_3ad_set_carrier(port->slave->bond);
> /* Slave array needs update */
> *update_slave_arr = true;
> /* Should notify peers if possible */
>@@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> __disable_port(port);
>+ bond_3ad_set_carrier(port->slave->bond);
> /* Slave array needs an update */
> *update_slave_arr = true;
> }
>@@ -2819,8 +2837,12 @@ int bond_3ad_set_carrier(struct bonding *bond)
> }
> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
> if (active) {
>- /* are enough slaves available to consider link up? */
>- if (__agg_active_ports(active) < bond->params.min_links) {
>+ /* are enough slaves in collecting (and distributing) state to consider
>+ * link up?
>+ */
>+ if ((bond->params.lacp_fallback ? __agg_valid_ports(active)
>+ : __agg_active_ports(active)) <
>+ bond->params.min_links) {
I think the original comment is better; if the new option is
off, it doesn't require collecting / distributing state.
-J
> if (netif_carrier_ok(bond->dev)) {
> netif_carrier_off(bond->dev);
> goto out;
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index b672b8a881bb..d64a5d2f80b6 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -1706,6 +1706,7 @@ static int bond_option_lacp_fallback_set(struct bonding *bond,
> netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
> newval->string, newval->value);
> bond->params.lacp_fallback = newval->value;
>+ bond_3ad_set_carrier(bond);
>
> return 0;
> }
>--
>2.39.2
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves
2026-04-13 17:01 ` Jay Vosburgh
@ 2026-04-15 17:53 ` Louis Scalbert
0 siblings, 0 replies; 18+ messages in thread
From: Louis Scalbert @ 2026-04-15 17:53 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Hello Jay,
Thank you very much for this detailed review.
Le lun. 13 avr. 2026 à 19:01, Jay Vosburgh <jv@jvosburgh.net> a écrit :
>
> Louis Scalbert <louis.scalbert@6wind.com> wrote:
>
> >Apply the "lacp_fallback" configuration from the previous commit.
> >
> >"lacp_fallback" mode "strict" asserts that the bonding master carrier
> >only when at least 'min_links' slaves are in the collecting/distributing
> >state (or collecting only if the coupled_control default behavior is
> >disabled).
> >
> >Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> >Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 26 ++++++++++++++++++++++++--
> > drivers/net/bonding/bond_options.c | 1 +
> > 2 files changed, 25 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index af7f74cfdc08..b79a76296966 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -745,6 +745,22 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
> > }
> > }
> >
> >+static int __agg_valid_ports(struct aggregator *agg)
> >+{
> >+ struct port *port;
> >+ int valid = 0;
> >+
> >+ for (port = agg->lag_ports; port;
> >+ port = port->next_port_in_aggregator) {
> >+ if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
> >+ (!port->slave->bond->params.coupled_control ||
> >+ port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
> >+ valid++;
>
> Do we need to test coupled_control? I.e., can the test be
With coupled_control enabled (default), the actor allows traffic from
the partner only when it reaches both the COLLECTING and DISTRIBUTING
states, i.e., in the AD_MUX_COLLECTING_DISTRIBUTING Mux state.
With coupled_control disabled, the actor allows traffic from the
partner as soon as it reaches the COLLECTING state, regardless of the
DISTRIBUTING flag. In this case, COLLECTING is set in the
AD_MUX_COLLECTING state, while DISTRIBUTING is only set later in the
AD_MUX_DISTRIBUTING state.
From the perspective of upper-layer processes, a carrier up state
indicates that the link is fully operational and capable of both
collecting and distributing traffic. These processes are not aware of
the distinction between COLLECTING and COLLECTING & DISTRIBUTING
states.
>
> if ((port->actor_oper_port_state & LACP_STATE_COLLECTING) &&
> (port->actor_oper_port_state & LACP_STATE_DISTRIBUTING))
>
If we want to allow collection to start without waiting for
distribution to be enabled, then the carrier must be asserted as soon
as the COLLECTING state is reached.
In that case, this test would not be valid.
In practice, I can only test for LACP_STATE_COLLECTING, because
LACP_STATE_DISTRIBUTING is always set together with
LACP_STATE_COLLECTING.
> To my reading, ad_mux_machine will set _COLLECTING and
> _DISTRIBUTING appropriately regardless of the coupled_control selection.
I don't agree. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/bonding/bond_3ad.c?id=v7.0#n1090
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/bonding/bond_3ad.c?id=v7.0#n1202
>
> >+ }
> >+
> >+ return valid;
> >+}
> >+
> > static int __agg_active_ports(struct aggregator *agg)
> > {
> > struct port *port;
> >@@ -2120,6 +2136,7 @@ static void ad_enable_collecting_distributing(struct port *port,
> > port->actor_port_number,
> > port->aggregator->aggregator_identifier);
> > __enable_port(port);
> >+ bond_3ad_set_carrier(port->slave->bond);
> > /* Slave array needs update */
> > *update_slave_arr = true;
> > /* Should notify peers if possible */
> >@@ -2141,6 +2158,7 @@ static void ad_disable_collecting_distributing(struct port *port,
> > port->actor_port_number,
> > port->aggregator->aggregator_identifier);
> > __disable_port(port);
> >+ bond_3ad_set_carrier(port->slave->bond);
> > /* Slave array needs an update */
> > *update_slave_arr = true;
> > }
> >@@ -2819,8 +2837,12 @@ int bond_3ad_set_carrier(struct bonding *bond)
> > }
> > active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
> > if (active) {
> >- /* are enough slaves available to consider link up? */
> >- if (__agg_active_ports(active) < bond->params.min_links) {
> >+ /* are enough slaves in collecting (and distributing) state to consider
> >+ * link up?
> >+ */
> >+ if ((bond->params.lacp_fallback ? __agg_valid_ports(active)
> >+ : __agg_active_ports(active)) <
> >+ bond->params.min_links) {
>
> I think the original comment is better; if the new option is
> off, it doesn't require collecting / distributing state.
>
> -J
>
> > if (netif_carrier_ok(bond->dev)) {
> > netif_carrier_off(bond->dev);
> > goto out;
> >diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> >index b672b8a881bb..d64a5d2f80b6 100644
> >--- a/drivers/net/bonding/bond_options.c
> >+++ b/drivers/net/bonding/bond_options.c
> >@@ -1706,6 +1706,7 @@ static int bond_option_lacp_fallback_set(struct bonding *bond,
> > netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
> > newval->string, newval->value);
> > bond->params.lacp_fallback = newval->value;
> >+ bond_3ad_set_carrier(bond);
> >
> > return 0;
> > }
> >--
> >2.39.2
> >
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 1/5] bonding: 3ad: add lacp_fallback configuration knob Louis Scalbert
2026-04-08 15:23 ` [PATCH net v3 2/5] bonding: 3ad: fix carrier when no valid slaves Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
2026-04-13 16:49 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
maheshb, Louis Scalbert
When the bonding interface has carrier down due to the absence of
valid slaves and a slave transitions from down to up, the bonding
interface briefly goes carrier up, then down again, and finally up
once LACP negotiates collecting and distributing on the port.
The interface should not transition to carrier up until LACP
negotiation is complete.
This happens because the actor and partner port states remain in
collecting (and distributing) when the port goes down. When the port
comes back up, it temporarily remains in this state until LACP
renegotiation occurs.
Previously this was mostly cosmetic, but since the bonding carrier
state now depends on the LACP negotiation state, it causes the
interface to flap.
Fix this by unsetting the SELECTED flag when a port goes down so that
the mux state machine transitions through ATTACHED and DETACHED,
which clears the actor collecting and distributing flags. Do not
attempt to set the SELECTED flag if the port is still disabled.
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
drivers/net/bonding/bond_3ad.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b79a76296966..3a94fbcbf721 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
struct slave *slave;
int found = 0;
+ /* Disabled ports cannot be SELECTED.
+ * Do not attempt to set the SELECTED flag if the port is still disabled.
+ */
+ if (!port->is_enabled)
+ return;
+
/* if the port is already Selected, do nothing */
if (port->sm_vars & AD_PORT_SELECTED)
return;
@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
/* link has failed */
port->is_enabled = false;
ad_update_actor_keys(port, true);
+ port->sm_vars &= ~AD_PORT_SELECTED;
}
agg = __get_first_agg(port);
ad_agg_selection_logic(agg, &dummy);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down
2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
@ 2026-04-13 16:49 ` Jay Vosburgh
2026-04-15 18:03 ` Louis Scalbert
0 siblings, 1 reply; 18+ messages in thread
From: Jay Vosburgh @ 2026-04-13 16:49 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>When the bonding interface has carrier down due to the absence of
>valid slaves and a slave transitions from down to up, the bonding
>interface briefly goes carrier up, then down again, and finally up
>once LACP negotiates collecting and distributing on the port.
Instead of "valid," I would suggest "usable."
>The interface should not transition to carrier up until LACP
>negotiation is complete.
If the new option is off, i.e., does not require successful LACP
negotiation, it should wait some time before asserting carrier up. If
negotiation fails, however, how long should it wait?
>This happens because the actor and partner port states remain in
>collecting (and distributing) when the port goes down. When the port
>comes back up, it temporarily remains in this state until LACP
>renegotiation occurs.
>
>Previously this was mostly cosmetic, but since the bonding carrier
>state now depends on the LACP negotiation state, it causes the
>interface to flap.
"now depends" -> "may depend"
>Fix this by unsetting the SELECTED flag when a port goes down so that
>the mux state machine transitions through ATTACHED and DETACHED,
>which clears the actor collecting and distributing flags. Do not
>attempt to set the SELECTED flag if the port is still disabled.
>
>Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>---
> drivers/net/bonding/bond_3ad.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index b79a76296966..3a94fbcbf721 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
> struct slave *slave;
> int found = 0;
>
>+ /* Disabled ports cannot be SELECTED.
>+ * Do not attempt to set the SELECTED flag if the port is still disabled.
>+ */
>+ if (!port->is_enabled)
>+ return;
>+
I think the change is fine, but the comment seems redundant to
me.
-J
> /* if the port is already Selected, do nothing */
> if (port->sm_vars & AD_PORT_SELECTED)
> return;
>@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
> /* link has failed */
> port->is_enabled = false;
> ad_update_actor_keys(port, true);
>+ port->sm_vars &= ~AD_PORT_SELECTED;
> }
> agg = __get_first_agg(port);
> ad_agg_selection_logic(agg, &dummy);
>--
>2.39.2
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down
2026-04-13 16:49 ` Jay Vosburgh
@ 2026-04-15 18:03 ` Louis Scalbert
0 siblings, 0 replies; 18+ messages in thread
From: Louis Scalbert @ 2026-04-15 18:03 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Le lun. 13 avr. 2026 à 18:49, Jay Vosburgh <jv@jvosburgh.net> a écrit :
>
> Louis Scalbert <louis.scalbert@6wind.com> wrote:
>
> >When the bonding interface has carrier down due to the absence of
> >valid slaves and a slave transitions from down to up, the bonding
> >interface briefly goes carrier up, then down again, and finally up
> >once LACP negotiates collecting and distributing on the port.
>
> Instead of "valid," I would suggest "usable."
>
> >The interface should not transition to carrier up until LACP
> >negotiation is complete.
I agree to take lacp_strict as the new parameter name.
I should write instead:
*With lacp_strict enabled,* the interface should not transition...
>
> If the new option is off, i.e., does not require successful LACP
> negotiation, it should wait some time before asserting carrier up. If
> negotiation fails, however, how long should it wait?
>
I am not sure I understand your point.
When lacp_strict is disabled, the existing behavior should remain
unchanged. The purpose of this patch is only to correct the
Collecting/Distributing state when the port is not operational.
This state is not used to determine carrier assertion when lacp_strict
is disabled.
I will double-check that the timing remains unchanged before posting
the next version of the patch.
> >This happens because the actor and partner port states remain in
> >collecting (and distributing) when the port goes down. When the port
> >comes back up, it temporarily remains in this state until LACP
> >renegotiation occurs.
> >
> >Previously this was mostly cosmetic, but since the bonding carrier
> >state now depends on the LACP negotiation state, it causes the
> >interface to flap.
>
> "now depends" -> "may depend"
>
> >Fix this by unsetting the SELECTED flag when a port goes down so that
> >the mux state machine transitions through ATTACHED and DETACHED,
> >which clears the actor collecting and distributing flags. Do not
> >attempt to set the SELECTED flag if the port is still disabled.
> >
> >Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> >Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index b79a76296966..3a94fbcbf721 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
> > struct slave *slave;
> > int found = 0;
> >
> >+ /* Disabled ports cannot be SELECTED.
> >+ * Do not attempt to set the SELECTED flag if the port is still disabled.
> >+ */
> >+ if (!port->is_enabled)
> >+ return;
> >+
>
> I think the change is fine, but the comment seems redundant to
> me.
>
> -J
>
> > /* if the port is already Selected, do nothing */
> > if (port->sm_vars & AD_PORT_SELECTED)
> > return;
> >@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
> > /* link has failed */
> > port->is_enabled = false;
> > ad_update_actor_keys(port, true);
> >+ port->sm_vars &= ~AD_PORT_SELECTED;
> > }
> > agg = __get_first_agg(port);
> > ad_agg_selection_logic(agg, &dummy);
> >--
> >2.39.2
> >
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
` (2 preceding siblings ...)
2026-04-08 15:23 ` [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
2026-04-13 18:39 ` Jay Vosburgh
2026-04-08 15:23 ` [PATCH net v3 5/5] selftests: bonding: add test for fallback mode Louis Scalbert
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
maheshb, Louis Scalbert
The previous commit introduced a side effect caused by clearing the
SELECTED flag on disabled ports. After all ports in an aggregator go
down, if only a subset of ports comes back up, those ports can no
longer renegotiate LACP unless all aggregator ports come back up.
1. All aggregator ports go down
- The SELECTED flag is cleared on all of them.
2. One port comes back up
- Its SELECTED flag is set again.
- It enters the WAITING state and gets its READY_N flag.
- The remaining ports stay UNSELECTED. Because of that, they cannot
enter the WAITING state and therefore never get READY_N.
- __agg_ports_are_ready() returns 0 because it finds a port without
READY_N.
- As a result, __set_agg_ports_ready() keeps the READY flag cleared on
all ports.
- The port that came back up is therefore not marked READY and cannot
transition to ATTACHED.
- LACP negotiation becomes stuck, and the port cannot be used.
3. All aggregator ports come back up
- They all regain SELECTED and READY_N.
- __agg_ports_are_ready() now returns 1.
- __set_agg_ports_ready() sets READY on all ports.
- They can then transition to ATTACHED.
- Negotiation resumes and the aggregator becomes operational again.
Consider only ports currently in the WAITING mux state for READY_N in
order to avoid __agg_ports_are_ready() to return 0 because of a disabled
port. That matches 802.3ad, which states: "The Selection Logic asserts
Ready TRUE when the values of Ready_N for all ports that are waiting to
attach to a given Aggregator are TRUE.".
Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
drivers/net/bonding/bond_3ad.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 3a94fbcbf721..3f56d892b101 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
}
/**
- * __agg_ports_are_ready - check if all ports in an aggregator are ready
+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
+ * the WAITING state are ready
* @aggregator: the aggregator we're looking at
*
*/
@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
for (port = aggregator->lag_ports;
port;
port = port->next_port_in_aggregator) {
+ if (port->sm_mux_state != AD_MUX_WAITING)
+ continue;
if (!(port->sm_vars & AD_PORT_READY_N)) {
retval = 0;
break;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery
2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
@ 2026-04-13 18:39 ` Jay Vosburgh
0 siblings, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2026-04-13 18:39 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>The previous commit introduced a side effect caused by clearing the
>SELECTED flag on disabled ports. After all ports in an aggregator go
>down, if only a subset of ports comes back up, those ports can no
>longer renegotiate LACP unless all aggregator ports come back up.
>
>1. All aggregator ports go down
> - The SELECTED flag is cleared on all of them.
>2. One port comes back up
> - Its SELECTED flag is set again.
> - It enters the WAITING state and gets its READY_N flag.
> - The remaining ports stay UNSELECTED. Because of that, they cannot
> enter the WAITING state and therefore never get READY_N.
This is the part that I think we may be doing something else
incorrectly. If the port is UNSELECTED, then that means that no
aggregator is currently selected for that port, and therefore it
shouldn't be assigned to an aggregator with other ports (per
802.1AX-2014 6.4.8, "Selected").
I'm not seeing anything in the 6.4.14 Selection Logic that makes
me think a port that is down (port_enabled == FALSE) is disallowed from
being SELECTED.
Looking at the Receive machine state diagram (Figure 6-18), I
tend to think that in this case the port would transition to
PORT_DISABLED state, as we're not asserting a BEGIN (reinitialization of
the LACP protocol entity), so the port variables can remain unchanged.
There's even some language that suggests this is intentional:
"If the Aggregation Port becomes inoperable and the BEGIN
variable is not asserted, the state machine enters the
PORT_DISABLED state. [...] This state allows the current
Selection state to remain undisturbed, so that, in the event
that the Aggregation Port is still connected to the same Partner
and Partner Aggregation Port when it becomes operable again,
there will be no disturbance caused to higher layers by
unnecessary re-configuration.
So, perhaps the actual bug is that these ports are attached to
the aggregator but not SELECTED.
-J
> - __agg_ports_are_ready() returns 0 because it finds a port without
> READY_N.
> - As a result, __set_agg_ports_ready() keeps the READY flag cleared on
> all ports.
> - The port that came back up is therefore not marked READY and cannot
> transition to ATTACHED.
> - LACP negotiation becomes stuck, and the port cannot be used.
>3. All aggregator ports come back up
> - They all regain SELECTED and READY_N.
> - __agg_ports_are_ready() now returns 1.
> - __set_agg_ports_ready() sets READY on all ports.
> - They can then transition to ATTACHED.
> - Negotiation resumes and the aggregator becomes operational again.
>
>Consider only ports currently in the WAITING mux state for READY_N in
>order to avoid __agg_ports_are_ready() to return 0 because of a disabled
>port. That matches 802.3ad, which states: "The Selection Logic asserts
>Ready TRUE when the values of Ready_N for all ports that are waiting to
>attach to a given Aggregator are TRUE.".
>
>Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
>Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
>---
> drivers/net/bonding/bond_3ad.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 3a94fbcbf721..3f56d892b101 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
> }
>
> /**
>- * __agg_ports_are_ready - check if all ports in an aggregator are ready
>+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
>+ * the WAITING state are ready
> * @aggregator: the aggregator we're looking at
> *
> */
>@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
> for (port = aggregator->lag_ports;
> port;
> port = port->next_port_in_aggregator) {
>+ if (port->sm_mux_state != AD_MUX_WAITING)
>+ continue;
> if (!(port->sm_vars & AD_PORT_READY_N)) {
> retval = 0;
> break;
>--
>2.39.2
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net v3 5/5] selftests: bonding: add test for fallback mode
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
` (3 preceding siblings ...)
2026-04-08 15:23 ` [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery Louis Scalbert
@ 2026-04-08 15:23 ` Louis Scalbert
2026-04-09 3:13 ` [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Jakub Kicinski
2026-04-13 16:45 ` Jay Vosburgh
6 siblings, 0 replies; 18+ messages in thread
From: Louis Scalbert @ 2026-04-08 15:23 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
maheshb, Louis Scalbert
Add a test for the bonding legacy and strict LACP fallback modes.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
.../selftests/drivers/net/bonding/Makefile | 1 +
.../drivers/net/bonding/bond_lacp_fallback.sh | 299 ++++++++++++++++++
2 files changed, 300 insertions(+)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6c5c60adb5e8..a117bf2e483b 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -7,6 +7,7 @@ TEST_PROGS := \
bond-eth-type-change.sh \
bond-lladdr-target.sh \
bond_ipsec_offload.sh \
+ bond_lacp_fallback.sh \
bond_lacp_prio.sh \
bond_macvlan_ipvlan.sh \
bond_options.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh b/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
new file mode 100755
index 000000000000..a983a2c2ea17
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
@@ -0,0 +1,299 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Testing if bond lacp_fallback works
+#
+# Partner (p_ns)
+# +-------------------------+
+# | bond0 |
+# | + |
+# | eth0 | eth1 |
+# | +---+---+ |
+# | | | |
+# +-------------------------+
+# | |
+# +--------------------------+
+# | | | |
+# | +---+---+ |
+# | eth0 | eth1 |
+# | + |
+# | bond0 |
+# +--------------------------+
+# Dut (d_ns)
+
+lib_dir=$(dirname "$0")
+# shellcheck disable=SC1090
+source "$lib_dir"/../../../net/lib.sh
+
+COLLECTING_DISTRIBUTING_MASK=48
+COLLECTING_DISTRIBUTING=48
+FAILED=0
+
+setup_links()
+{
+ # shellcheck disable=SC2154
+ ip -n "${d_ns}" link add eth0 type veth peer name eth0 netns "${p_ns}"
+ ip -n "${d_ns}" link add eth1 type veth peer name eth1 netns "${p_ns}"
+
+ ip -n "${d_ns}" link add bond0 type bond mode 802.3ad miimon 100 \
+ lacp_rate fast min_links 1
+ ip -n "${p_ns}" link add bond0 type bond mode 802.3ad miimon 100 \
+ lacp_rate fast min_links 1
+
+ ip -n "${d_ns}" link set eth0 master bond0
+ ip -n "${d_ns}" link set eth1 master bond0
+ ip -n "${p_ns}" link set eth0 master bond0
+ ip -n "${p_ns}" link set eth1 master bond0
+
+ ip -n "${d_ns}" link set bond0 up
+ ip -n "${p_ns}" link set bond0 up
+}
+
+test_master_carrier() {
+ local expected=$1
+ local mode_name=$2
+ local carrier
+
+ carrier=$(ip netns exec "${d_ns}" cat /sys/class/net/bond0/carrier)
+ [ "$carrier" == "1" ] && carrier="up" || carrier="down"
+
+ [ "$carrier" == "$expected" ] && return
+
+ echo "FAIL: Expected carrier $expected in $mode_name mode, got $carrier"
+
+ RET=1
+
+}
+
+compare_state() {
+ local actual_state=$1
+ local expected_state=$2
+ local iface=$3
+ local last_attempt=$4
+
+ [ $((actual_state & COLLECTING_DISTRIBUTING_MASK)) -eq "$expected_state" ] \
+ && return 0
+
+ [ "$last_attempt" -ne 1 ] && return 1
+
+ printf "FAIL: Expected LACP %s actor state to " "$iface"
+ if [ "$expected_state" -eq $COLLECTING_DISTRIBUTING ]; then
+ echo "be in Collecting/Distributing state"
+ else
+ echo "have neither Collecting nor Distributing set."
+ fi
+
+ return 1
+}
+
+_test_lacp_port_state() {
+ local interface=$1
+ local expected=$2
+ local last_attempt=$3
+ local eth0_actor_state eth1_actor_state
+ local ret=0
+
+ # shellcheck disable=SC2016
+ while IFS='=' read -r k v; do
+ printf -v "$k" '%s' "$v"
+ done < <(
+ ip netns exec "${d_ns}" awk '
+ /^Slave Interface: / { iface=$3 }
+ /details actor lacp pdu:/ { ctx="actor" }
+ /details partner lacp pdu:/ { ctx="partner" }
+ /^[[:space:]]+port state: / {
+ if (ctx == "actor") {
+ gsub(":", "", iface)
+ printf "%s_%s_state=%s\n", iface, ctx, $3
+ }
+ }
+ ' /proc/net/bonding/bond0
+ )
+
+ if [ "$interface" == "eth0" ] || [ "$interface" == "both" ]; then
+ compare_state "$eth0_actor_state" "$expected" eth0 "$last_attempt" || ret=1
+ fi
+
+ if [ "$interface" == "eth1" ] || [ "$interface" == "both" ]; then
+ compare_state "$eth1_actor_state" "$expected" eth1 "$last_attempt" || ret=1
+ fi
+
+ return $ret
+}
+
+test_lacp_port_state() {
+ local interface=$1
+ local expected=$2
+ local retry=$3
+ local last_attempt=0
+ local attempt=1
+ local ret=1
+
+ while [ $attempt -le $((retry + 1)) ]; do
+ [ $attempt -eq $((retry + 1)) ] && last_attempt=1
+ _test_lacp_port_state "$interface" "$expected" "$last_attempt" && return
+ ((attempt++))
+ sleep 1
+ done
+
+ RET=1
+}
+
+
+trap cleanup_all_ns EXIT
+setup_ns d_ns p_ns
+setup_links
+
+# Initial state
+RET=0
+mode=legacy
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 3
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 down, eth1 up
+RET=0
+ip -n "${p_ns}" link set eth0 down
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 down"
+
+# partner eth0 and eth1 down
+RET=0
+ip -n "${p_ns}" link set eth1 down
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode # down because of min_links
+log_test "bond LACP" "$mode fallback - eth0 and eth1 down"
+
+# partner eth0 up, eth1 down
+RET=0
+ip -n "${p_ns}" link set eth0 up
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 up, eth1 down"
+
+# partner eth0 and eth1 up
+RET=0
+ip -n "${p_ns}" link set eth1 up
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 stops LACP and eth1 up
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth0 root netem loss 100%
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 stopped sending LACP"
+
+# partner eth0 and eth1 stop LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth1 root netem loss 100%
+test_lacp_port_state both $FAILED 5
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# switch to lacp_fallback strict
+RET=0
+mode=strict
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $FAILED 1
+test_master_carrier down $mode 5
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# switch back to lacp_fallback legacy mode
+RET=0
+mode=legacy
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# eth0 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth0 root
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 recovered and eth1 stopped sending LACP"
+
+# eth1 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth1 root
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 recovered LACP"
+
+# switch to lacp_fallback strict
+RET=0
+mode=strict
+ip -n "${d_ns}" link set dev bond0 type bond lacp_fallback $mode
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 down, eth1 up
+RET=0
+ip -n "${p_ns}" link set eth0 down
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 down"
+
+# partner eth0 and eth1 down
+RET=0
+ip -n "${p_ns}" link set eth1 down
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode # down because of min_links
+log_test "bond LACP" "$mode fallback - eth0 and eth1 down"
+
+# partner eth0 up, eth1 down
+RET=0
+ip -n "${p_ns}" link set eth0 up
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 up, eth1 down"
+
+# partner eth0 and eth1 up
+RET=0
+ip -n "${p_ns}" link set eth1 up
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 up"
+
+# partner eth0 stops LACP and eth1 up
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth0 root netem loss 100%
+test_lacp_port_state eth0 $FAILED 5
+test_lacp_port_state eth1 $COLLECTING_DISTRIBUTING 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 stopped sending LACP"
+
+# partner eth0 and eth1 stop LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc add dev eth1 root netem loss 100%
+test_lacp_port_state both $FAILED 5
+test_master_carrier down $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 stopped sending LACP"
+
+# eth0 recovers LACP
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth0 root
+test_lacp_port_state eth0 $COLLECTING_DISTRIBUTING 60
+test_lacp_port_state eth1 $FAILED 1
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 recovered and eth1 stopped sending LACP"
+
+# eth1 recovers LACP
+# shellcheck disable=SC2034
+RET=0
+ip netns exec "${p_ns}" tc qdisc del dev eth1 root
+test_lacp_port_state both $COLLECTING_DISTRIBUTING 60
+test_master_carrier up $mode
+log_test "bond LACP" "$mode fallback - eth0 and eth1 recovered LACP"
+
+exit "${EXIT_STATUS}"
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
` (4 preceding siblings ...)
2026-04-08 15:23 ` [PATCH net v3 5/5] selftests: bonding: add test for fallback mode Louis Scalbert
@ 2026-04-09 3:13 ` Jakub Kicinski
2026-04-09 6:53 ` Jonas Gorski
2026-04-13 16:45 ` Jay Vosburgh
6 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2026-04-09 3:13 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, jv, edumazet, pabeni, fbl, andy,
shemminger, maheshb
On Wed, 8 Apr 2026 17:23:48 +0200 Louis Scalbert wrote:
> The current behavior is not compliant with the LACP standard. This
> patchset introduces a working behavior that is not strictly
> standard-compliant either, but is widely adopted across the industry.
> It consists of bringing the bonding master interface down to signal to
> upper-layer processes that it is not usable.
Is the only problem the compliance? If so I don't think this qualifies
as a fix. Please drop the Fixes tags and repost for net-next. Please
keep in mind the 24h reposting period (also I need some time tomorrow
to queue your patch to the CI so that the selftest passes when v4 is
posted :()
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
2026-04-09 3:13 ` [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Jakub Kicinski
@ 2026-04-09 6:53 ` Jonas Gorski
2026-04-09 11:49 ` Louis Scalbert
0 siblings, 1 reply; 18+ messages in thread
From: Jonas Gorski @ 2026-04-09 6:53 UTC (permalink / raw)
To: Jakub Kicinski, Louis Scalbert
Cc: netdev, andrew+netdev, jv, edumazet, pabeni, fbl, andy,
shemminger, maheshb
On 09/04/2026 05:13, Jakub Kicinski wrote:
> On Wed, 8 Apr 2026 17:23:48 +0200 Louis Scalbert wrote:
>> The current behavior is not compliant with the LACP standard. This
>> patchset introduces a working behavior that is not strictly
>> standard-compliant either, but is widely adopted across the industry.
>> It consists of bringing the bonding master interface down to signal to
>> upper-layer processes that it is not usable.
>
> Is the only problem the compliance? If so I don't think this qualifies
> as a fix. Please drop the Fixes tags and repost for net-next. Please
> keep in mind the 24h reposting period (also I need some time tomorrow
> to queue your patch to the CI so that the selftest passes when v4 is
> posted :()
Signalling link up too early can cause issues for some protocols that
may change behavior in the absence of PDUs from a link partner.
E.g. AFAIU RSTP may decide the bond is an edge port because no RSTP
BPDUs received and put the bond in forwarding, which then temporarily
creates a loop once the bond actually starts forwarding packets (until
it receives the next RSTP BPDU, which may take up to two seconds).
Best regards,
Jonas
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
2026-04-09 6:53 ` Jonas Gorski
@ 2026-04-09 11:49 ` Louis Scalbert
2026-04-10 2:38 ` Jakub Kicinski
0 siblings, 1 reply; 18+ messages in thread
From: Louis Scalbert @ 2026-04-09 11:49 UTC (permalink / raw)
To: Jonas Gorski
Cc: Jakub Kicinski, netdev, andrew+netdev, jv, edumazet, pabeni, fbl,
andy, shemminger, maheshb
Hello,
Le jeu. 9 avr. 2026 à 08:53, Jonas Gorski <jonas.gorski@gmail.com> a écrit :
>
> On 09/04/2026 05:13, Jakub Kicinski wrote:
> > On Wed, 8 Apr 2026 17:23:48 +0200 Louis Scalbert wrote:
> >> The current behavior is not compliant with the LACP standard. This
> >> patchset introduces a working behavior that is not strictly
> >> standard-compliant either, but is widely adopted across the industry.
> >> It consists of bringing the bonding master interface down to signal to
> >> upper-layer processes that it is not usable.
> >
> > Is the only problem the compliance? If so I don't think this qualifies
> > as a fix. Please drop the Fixes tags and repost for net-next. Please
> > keep in mind the 24h reposting period (also I need some time tomorrow
> > to queue your patch to the CI so that the selftest passes when v4 is
> > posted :()
The problem is not only about compliance.
In his review of v2, Jay argued that the current behavior is
standard-compliant and pointed out that it enables some use cases, such
as PXE facing an LACP bond.
I replied that the current implementation is not actually compliant
with the standard, and that the PXE case does not truly work in a
reliable way, since success depends on a random link choice.
My goal is not to make the PXE use case work. Rather, this series fixes
other problematic scenarios. To avoid regressing setups that may
benefit from the current behavior, I added a configuration knob that
allows preserving the legacy behavior. The legacy mode should be
deprecated in my opinion.
>
> Signalling link up too early can cause issues for some protocols that
> may change behavior in the absence of PDUs from a link partner.
I agree with your point. I have observed issues with
keepalived VRRP when it is configured on top of a bonding interface.
When the bond reports carrier as up while no slave is actually able to
receive traffic (due to the partner not being ready, as indicated by the
absence of LACP negotiation), the VRRP process interprets the interface
as operational. At the same time, the absence of received VRRP
advertisements is interpreted as if it were the only router on the
segment. As a result, it transitions to the MASTER state.
In reality, another VRRP router may already be MASTER and actively
sending advertisements, but those packets are not received due to the
bonding state. This leads to a split-brain condition with multiple
masters on the network.
Such a situation breaks the assumptions of
VRRP, where a single MASTER is expected to handle traffic,
and can result in traffic inconsistency or loss when upper-layer
processes rely on this behavior.
>
> E.g. AFAIU RSTP may decide the bond is an edge port because no RSTP
> BPDUs received and put the bond in forwarding, which then temporarily
> creates a loop once the bond actually starts forwarding packets (until
> it receives the next RSTP BPDU, which may take up to two seconds).
>
There are also situations where BGP incorrectly assumes that the link is
usable while it is not.
Please confirm whether it should be a fix or not.
> Best regards,
> Jonas
Best regards,
Louis Scalbert
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
2026-04-09 11:49 ` Louis Scalbert
@ 2026-04-10 2:38 ` Jakub Kicinski
2026-04-10 14:02 ` Jay Vosburgh
0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2026-04-10 2:38 UTC (permalink / raw)
To: Louis Scalbert
Cc: Jonas Gorski, netdev, andrew+netdev, jv, edumazet, pabeni, fbl,
andy, shemminger, maheshb
On Thu, 9 Apr 2026 13:49:06 +0200 Louis Scalbert wrote:
> > Signalling link up too early can cause issues for some protocols that
> > may change behavior in the absence of PDUs from a link partner.
>
> I agree with your point. I have observed issues with
> keepalived VRRP when it is configured on top of a bonding interface.
>
> When the bond reports carrier as up while no slave is actually able to
> receive traffic (due to the partner not being ready, as indicated by the
> absence of LACP negotiation), the VRRP process interprets the interface
> as operational. At the same time, the absence of received VRRP
> advertisements is interpreted as if it were the only router on the
> segment. As a result, it transitions to the MASTER state.
>
> In reality, another VRRP router may already be MASTER and actively
> sending advertisements, but those packets are not received due to the
> bonding state. This leads to a split-brain condition with multiple
> masters on the network.
>
> Such a situation breaks the assumptions of
> VRRP, where a single MASTER is expected to handle traffic,
> and can result in traffic inconsistency or loss when upper-layer
> processes rely on this behavior.
It's been like this for what, 15 years?
We have to draw the line between fix and improvement somewhere.
In Linux we generally draw the line at regressions+crashes/security
bugs. If a use case never worked correctly it's not getting fixed.
It's getting enabled.
That said, if Jay wants it as a fix I'm not going to argue.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
2026-04-10 2:38 ` Jakub Kicinski
@ 2026-04-10 14:02 ` Jay Vosburgh
0 siblings, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2026-04-10 14:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Louis Scalbert, Jonas Gorski, netdev, andrew+netdev, edumazet,
pabeni, fbl, andy, shemminger, maheshb
Jakub Kicinski <kuba@kernel.org> wrote:
>On Thu, 9 Apr 2026 13:49:06 +0200 Louis Scalbert wrote:
>> > Signalling link up too early can cause issues for some protocols that
>> > may change behavior in the absence of PDUs from a link partner.
>>
>> I agree with your point. I have observed issues with
>> keepalived VRRP when it is configured on top of a bonding interface.
>>
>> When the bond reports carrier as up while no slave is actually able to
>> receive traffic (due to the partner not being ready, as indicated by the
>> absence of LACP negotiation), the VRRP process interprets the interface
>> as operational. At the same time, the absence of received VRRP
>> advertisements is interpreted as if it were the only router on the
>> segment. As a result, it transitions to the MASTER state.
>>
>> In reality, another VRRP router may already be MASTER and actively
>> sending advertisements, but those packets are not received due to the
>> bonding state. This leads to a split-brain condition with multiple
>> masters on the network.
>>
>> Such a situation breaks the assumptions of
>> VRRP, where a single MASTER is expected to handle traffic,
>> and can result in traffic inconsistency or loss when upper-layer
>> processes rely on this behavior.
>
>It's been like this for what, 15 years?
>We have to draw the line between fix and improvement somewhere.
>In Linux we generally draw the line at regressions+crashes/security
>bugs. If a use case never worked correctly it's not getting fixed.
>It's getting enabled.
>
>That said, if Jay wants it as a fix I'm not going to argue.
My apologies for not responding sooner, I was under the weather
for a few days and am just now catching up.
My general position is that, first, the current bonding behavior
is compliant to the standard, which gives latitude in how individual
ports (those not partnered with a LACP peer) are managed. The stated
Cisco, et al, behavior of denying use of such ports is also compliant.
One thing we cannot do is run such ports together as a logical link
aggregation, for both standards compliance reasons and well as the
practical issues of creating topology loops or duplicate frames.
Second, the minutiae of the standard is not the real issue at
hand, which is that bonding's behavior of enabling an un-partnered port
and setting the bond to carrier up based on carrier state does cause
communications issues with peers that behave differently and deny use of
un-partnered ports.
As such, in principle I'm not opposed to an option that would
essentially tell bonding to only use LACP-partnered ports for the active
aggregator. I'll have some time over the weekend to review the patch
set in detail and respond to the specifics.
-J
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
2026-04-08 15:23 [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Louis Scalbert
` (5 preceding siblings ...)
2026-04-09 3:13 ` [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves Jakub Kicinski
@ 2026-04-13 16:45 ` Jay Vosburgh
6 siblings, 0 replies; 18+ messages in thread
From: Jay Vosburgh @ 2026-04-13 16:45 UTC (permalink / raw)
To: Louis Scalbert
Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
shemminger, maheshb
Louis Scalbert <louis.scalbert@6wind.com> wrote:
>Hi everyone,
>
>This series addresses a blackholing issue and a subsequent link-flapping
>issue in the 802.3ad bonding driver when dealing with inactive slaves
>and the `min_links` parameter.
>
>When an 802.3ad (LACP) bonding interface has no slaves in the
>collecting/distributing state, the bonding master still reports
>carrier as up as long as at least 'min_links' slaves have carrier.
>
>In this situation, only one slave is effectively used for TX/RX,
>while traffic received on other slaves is dropped. Upper-layer
>daemons therefore consider the interface operational, even though
>traffic may be blackholed if the lack of LACP negotiation means
>the partner is not ready to deal with traffic.
>
>The current behavior is not compliant with the LACP standard. This
>patchset introduces a working behavior that is not strictly
>standard-compliant either, but is widely adopted across the industry.
>It consists of bringing the bonding master interface down to signal to
>upper-layer processes that it is not usable.
As I've said, I believe that the current behavior is compliant
with the standard, as IEEE 802.1AX-2014 6.1.1.j, as we've discussed,
says that (to summarize) links that cannot participate in Link
Aggregation ... operate as normal, individual links. The mechanism by
which that takes place is not defined, and we, in my opinion, are not in
violation of the standard by selecting a bond member and making it
active.
>This patchset depends on the following iproute2 change:
>ip/bond: add lacp_fallback support
>
>Patch 1 introduces the lacp_fallback configuration knob, which is
>applied in the subsequent patch. The default (legacy) mode preserves
>the existing behavior, while the strict mode is intended to force the
>bonding master carrier down in this situation.
The above notwithstanding, I don't object in general to an
option that says, essentially, "require that only LACP-partnered ports
be made active."
Also, after reading through the patch set, I'm comfortable
calling the entire series a "fix," i.e., suitable for net vs net-next.
Most of the changes are genuine code fixes, and the addition of the
option is less a real feature and more of a change to better manage
connectivity in edge cases, so I'm fine with that being called a fix as
well.
-J
>Patch 2 addresses the core issue when lacp_fallback is set to strict.
>It ensures that carrier is asserted only when at least 'min_links'
>slaves are in a valid state (collecting/distributing, or collecting
>only when coupled_control is disabled).
>
>Patch 3 fixes a side effect of the first patch. Tightening the carrier
>logic exposes a state persistence bug: when a physical link goes down,
>the LACP collecting/distributing flags remain set. When the link returns,
>the interface briefly hallucinates that it is ready, bounces the carrier
>up, and then drops it again once LACP renegotiation starts. Unsetting the
>SELECTED flag when the link goes down forces the state machine through
>DETACHED, clearing the stale flags and preventing the flap.
>
>Patch 4 fixes a side effect of the second patch caused by clearing the
>SELECTED flag on disabled ports. After all ports in an aggregator go
>down, if only a subset of ports comes back up, those ports can no
>longer renegotiate LACP unless all aggregator ports come back up.
>
>Patch 5 adds a test for the bonding legacy and strict LACP fallback modes.
>
>Louis Scalbert (5):
> bonding: 3ad: add lacp_fallback configuration knob
> bonding: 3ad: fix carrier when no valid slaves
> bonding: 3ad: fix mux port state on oper down
> bonding: 3ad: fix stuck negotiation on recovery
> selftests: bonding: add test for fallback mode
>
> Documentation/networking/bonding.rst | 33 ++
> drivers/net/bonding/bond_3ad.c | 38 ++-
> drivers/net/bonding/bond_main.c | 1 +
> drivers/net/bonding/bond_netlink.c | 16 +
> drivers/net/bonding/bond_options.c | 27 ++
> include/net/bond_options.h | 1 +
> include/net/bonding.h | 1 +
> include/uapi/linux/if_link.h | 1 +
> .../selftests/drivers/net/bonding/Makefile | 1 +
> .../drivers/net/bonding/bond_lacp_fallback.sh | 299 ++++++++++++++++++
> 10 files changed, 415 insertions(+), 3 deletions(-)
> create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_fallback.sh
>
>--
>2.39.2
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 18+ messages in thread