* [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch
@ 2025-06-10 3:44 Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad Tonghao Zhang
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-10 3:44 UTC (permalink / raw)
To: netdev
Cc: Tonghao Zhang, Jay Vosburgh, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Andrew Lunn, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
For no-stacking networking arch, and enable the bond mode 4(lacp) in
datacenter, the switch require arp/nd packets as session synchronization.
More details please see patch.
v6 change log:
- del unnecessary rcu_read_lock in bond_xmit_broadcast
v5 change log:
- format commit message of all patches
- use skb_header_pointer instead of pskb_may_pull
- send only packets to active slaves instead of all ports, and add more commit log
v4 change log:
- fix dec option in bond_close
v3 change log:
- inc/dec broadcast_neighbor option in bond_open/close and UP state.
- remove explicit inline of bond_should_broadcast_neighbor
- remove sysfs option
- remove EXPORT_SYMBOL_GPL
- reorder option bond_opt_value
- use rcu_xxx in bond_should_notify_peers.
v2 change log:
- add static branch for performance
- add more info about no-stacking arch in commit message
- add broadcast_neighbor info and format doc
- invoke bond_should_broadcast_neighbor only in BOND_MODE_8023AD mode for performance
- explain why we need sending peer notify when failure recovery
- change the doc about num_unsol_na
- refine function name to ad_cond_set_peer_notif
- ad_cond_set_peer_notif invoked in ad_enable_collecting_distributing
- refine bond_should_notify_peers for lacp mode.
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Zengbing Tu <tuzengbing@didiglobal.com>
Tonghao Zhang (4):
net: bonding: add broadcast_neighbor option for 802.3ad
net: bonding: add broadcast_neighbor netlink option
net: bonding: send peer notify when failure recovery
net: bonding: add tracepoint for 802.3ad
Documentation/networking/bonding.rst | 11 +++-
drivers/net/bonding/bond_3ad.c | 19 ++++++
drivers/net/bonding/bond_main.c | 87 ++++++++++++++++++++++++----
drivers/net/bonding/bond_netlink.c | 16 +++++
drivers/net/bonding/bond_options.c | 35 +++++++++++
include/net/bond_options.h | 1 +
include/net/bonding.h | 3 +
include/trace/events/bonding.h | 37 ++++++++++++
include/uapi/linux/if_link.h | 1 +
9 files changed, 197 insertions(+), 13 deletions(-)
create mode 100644 include/trace/events/bonding.h
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad
2025-06-10 3:44 [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
@ 2025-06-10 3:44 ` Tonghao Zhang
2025-06-16 23:04 ` Jakub Kicinski
2025-06-10 3:44 ` [net-next v6 2/4] net: bonding: add broadcast_neighbor netlink option Tonghao Zhang
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-10 3:44 UTC (permalink / raw)
To: netdev
Cc: Tonghao Zhang, Jay Vosburgh, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Andrew Lunn, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
Stacking technology is a type of technology used to expand ports on
Ethernet switches. It is widely used as a common access method in
large-scale Internet data center architectures. Years of practice
have proved that stacking technology has advantages and disadvantages
in high-reliability network architecture scenarios. For instance,
in stacking networking arch, conventional switch system upgrades
require multiple stacked devices to restart at the same time.
Therefore, it is inevitable that the business will be interrupted
for a while. It is for this reason that "no-stacking" in data centers
has become a trend. Additionally, when the stacking link connecting
the switches fails or is abnormal, the stack will split. Although it is
not common, it still happens in actual operation. The problem is that
after the split, it is equivalent to two switches with the same
configuration appearing in the network, causing network configuration
conflicts and ultimately interrupting the services carried by the
stacking system.
To improve network stability, "non-stacking" solutions have been
increasingly adopted, particularly by public cloud providers and
tech companies like Alibaba, Tencent, and Didi. "non-stacking" is
a method of mimicing switch stacking that convinces a LACP peer,
bonding in this case, connected to a set of "non-stacked" switches
that all of its ports are connected to a single switch
(i.e., LACP aggregator), as if those switches were stacked. This
enables the LACP peer's ports to aggregate together, and requires
(a) special switch configuration, described in the linked article,
and (b) modifications to the bonding 802.3ad (LACP) mode to send
all ARP/ND packets across all ports of the active aggregator.
Note that, with multiple aggregators, the current broadcast mode
logic will send only packets to the selected aggregator(s).
+-----------+ +-----------+
| switch1 | | switch2 |
+-----------+ +-----------+
^ ^
| |
+-----------------+
| bond4 lacp |
+-----------------+
| |
| NIC1 | NIC2
+-----------------+
| server |
+-----------------+
- https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
---
Documentation/networking/bonding.rst | 6 +++
drivers/net/bonding/bond_main.c | 66 +++++++++++++++++++++++++---
drivers/net/bonding/bond_options.c | 35 +++++++++++++++
include/net/bond_options.h | 1 +
include/net/bonding.h | 3 ++
5 files changed, 105 insertions(+), 6 deletions(-)
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index a4c1291d2561..14f7593d888d 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -562,6 +562,12 @@ lacp_rate
The default is slow.
+broadcast_neighbor
+
+ Option specifying whether to broadcast ARP/ND packets to all
+ active slaves. This option has no effect in modes other than
+ 802.3ad mode. The default is off (0).
+
max_bonds
Specifies the number of bonding devices to create for this
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c4d53e8e7c15..12046ef51569 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -212,6 +212,8 @@ atomic_t netpoll_block_tx = ATOMIC_INIT(0);
unsigned int bond_net_id __read_mostly;
+DEFINE_STATIC_KEY_FALSE(bond_bcast_neigh_enabled);
+
static const struct flow_dissector_key flow_keys_bonding_keys[] = {
{
.key_id = FLOW_DISSECTOR_KEY_CONTROL,
@@ -4456,6 +4458,9 @@ static int bond_open(struct net_device *bond_dev)
bond_for_each_slave(bond, slave, iter)
dev_mc_add(slave->dev, lacpdu_mcast_addr);
+
+ if (bond->params.broadcast_neighbor)
+ static_branch_inc(&bond_bcast_neigh_enabled);
}
if (bond_mode_can_use_xmit_hash(bond))
@@ -4475,6 +4480,10 @@ static int bond_close(struct net_device *bond_dev)
bond_alb_deinitialize(bond);
bond->recv_probe = NULL;
+ if (BOND_MODE(bond) == BOND_MODE_8023AD &&
+ bond->params.broadcast_neighbor)
+ static_branch_dec(&bond_bcast_neigh_enabled);
+
if (bond_uses_primary(bond)) {
rcu_read_lock();
slave = rcu_dereference(bond->curr_active_slave);
@@ -5310,6 +5319,37 @@ static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
return slaves->arr[hash % count];
}
+static bool bond_should_broadcast_neighbor(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct bonding *bond = netdev_priv(dev);
+ struct {
+ struct ipv6hdr ip6;
+ struct icmp6hdr icmp6;
+ } *combined, _combined;
+
+ if (!static_branch_unlikely(&bond_bcast_neigh_enabled))
+ return false;
+
+ if (!bond->params.broadcast_neighbor)
+ return false;
+
+ if (skb->protocol == htons(ETH_P_ARP))
+ return true;
+
+ if (skb->protocol == htons(ETH_P_IPV6)) {
+ combined = skb_header_pointer(skb, skb_mac_header_len(skb),
+ sizeof(_combined),
+ &_combined);
+ if (combined && combined->ip6.nexthdr == NEXTHDR_ICMP &&
+ (combined->icmp6.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
+ combined->icmp6.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT))
+ return true;
+ }
+
+ return false;
+}
+
/* Use this Xmit function for 3AD as well as XOR modes. The current
* usable slave array is formed in the control path. The xmit function
* just calculates hash and sends the packet out.
@@ -5329,17 +5369,27 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
return bond_tx_drop(dev, skb);
}
-/* in broadcast mode, we send everything to all usable interfaces. */
+/* in broadcast mode, we send everything to all or usable slave interfaces.
+ * under rcu_read_lock when this function is called.
+ */
static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb,
- struct net_device *bond_dev)
+ struct net_device *bond_dev,
+ bool all_slaves)
{
struct bonding *bond = netdev_priv(bond_dev);
- struct slave *slave = NULL;
- struct list_head *iter;
+ struct bond_up_slave *slaves;
bool xmit_suc = false;
bool skb_used = false;
+ int slaves_count, i;
- bond_for_each_slave_rcu(bond, slave, iter) {
+ if (all_slaves)
+ slaves = rcu_dereference(bond->all_slaves);
+ else
+ slaves = rcu_dereference(bond->usable_slaves);
+
+ slaves_count = slaves ? READ_ONCE(slaves->count) : 0;
+ for (i = 0; i < slaves_count; i++) {
+ struct slave *slave = slaves->arr[i];
struct sk_buff *skb2;
if (!(bond_slave_is_up(slave) && slave->link == BOND_LINK_UP))
@@ -5577,10 +5627,13 @@ static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev
case BOND_MODE_ACTIVEBACKUP:
return bond_xmit_activebackup(skb, dev);
case BOND_MODE_8023AD:
+ if (bond_should_broadcast_neighbor(skb, dev))
+ return bond_xmit_broadcast(skb, dev, false);
+ fallthrough;
case BOND_MODE_XOR:
return bond_3ad_xor_xmit(skb, dev);
case BOND_MODE_BROADCAST:
- return bond_xmit_broadcast(skb, dev);
+ return bond_xmit_broadcast(skb, dev, true);
case BOND_MODE_ALB:
return bond_alb_xmit(skb, dev);
case BOND_MODE_TLB:
@@ -6456,6 +6509,7 @@ static int __init bond_check_params(struct bond_params *params)
eth_zero_addr(params->ad_actor_system);
params->ad_user_port_key = ad_user_port_key;
params->coupled_control = 1;
+ params->broadcast_neighbor = 0;
if (packets_per_slave > 0) {
params->reciprocal_packets_per_slave =
reciprocal_value(packets_per_slave);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 91893c29b899..7f0939337231 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -87,6 +87,8 @@ static int bond_option_missed_max_set(struct bonding *bond,
const struct bond_opt_value *newval);
static int bond_option_coupled_control_set(struct bonding *bond,
const struct bond_opt_value *newval);
+static int bond_option_broadcast_neigh_set(struct bonding *bond,
+ const struct bond_opt_value *newval);
static const struct bond_opt_value bond_mode_tbl[] = {
{ "balance-rr", BOND_MODE_ROUNDROBIN, BOND_VALFLAG_DEFAULT},
@@ -240,6 +242,12 @@ static const struct bond_opt_value bond_coupled_control_tbl[] = {
{ NULL, -1, 0},
};
+static const struct bond_opt_value bond_broadcast_neigh_tbl[] = {
+ { "off", 0, BOND_VALFLAG_DEFAULT},
+ { "on", 1, 0},
+ { NULL, -1, 0}
+};
+
static const struct bond_option bond_opts[BOND_OPT_LAST] = {
[BOND_OPT_MODE] = {
.id = BOND_OPT_MODE,
@@ -513,6 +521,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.flags = BOND_OPTFLAG_IFDOWN,
.values = bond_coupled_control_tbl,
.set = bond_option_coupled_control_set,
+ },
+ [BOND_OPT_BROADCAST_NEIGH] = {
+ .id = BOND_OPT_BROADCAST_NEIGH,
+ .name = "broadcast_neighbor",
+ .desc = "Broadcast neighbor packets to all slaves",
+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
+ .values = bond_broadcast_neigh_tbl,
+ .set = bond_option_broadcast_neigh_set,
}
};
@@ -1840,3 +1856,22 @@ static int bond_option_coupled_control_set(struct bonding *bond,
bond->params.coupled_control = newval->value;
return 0;
}
+
+static int bond_option_broadcast_neigh_set(struct bonding *bond,
+ const struct bond_opt_value *newval)
+{
+ if (bond->params.broadcast_neighbor == newval->value)
+ return 0;
+
+ bond->params.broadcast_neighbor = newval->value;
+ if (bond->dev->flags & IFF_UP) {
+ if (bond->params.broadcast_neighbor)
+ static_branch_inc(&bond_bcast_neigh_enabled);
+ else
+ static_branch_dec(&bond_bcast_neigh_enabled);
+ }
+
+ netdev_dbg(bond->dev, "Setting broadcast_neighbor to %s (%llu)\n",
+ newval->string, newval->value);
+ return 0;
+}
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index 18687ccf0638..022b122a9fb6 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -77,6 +77,7 @@ enum {
BOND_OPT_NS_TARGETS,
BOND_OPT_PRIO,
BOND_OPT_COUPLED_CONTROL,
+ BOND_OPT_BROADCAST_NEIGH,
BOND_OPT_LAST
};
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 95f67b308c19..e06f0d63b2c1 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -115,6 +115,8 @@ static inline int is_netpoll_tx_blocked(struct net_device *dev)
#define is_netpoll_tx_blocked(dev) (0)
#endif
+DECLARE_STATIC_KEY_FALSE(bond_bcast_neigh_enabled);
+
struct bond_params {
int mode;
int xmit_policy;
@@ -149,6 +151,7 @@ struct bond_params {
struct in6_addr ns_targets[BOND_MAX_NS_TARGETS];
#endif
int coupled_control;
+ int broadcast_neighbor;
/* 2 bytes of padding : see ether_addr_equal_64bits() */
u8 ad_actor_system[ETH_ALEN + 2];
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [net-next v6 2/4] net: bonding: add broadcast_neighbor netlink option
2025-06-10 3:44 [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad Tonghao Zhang
@ 2025-06-10 3:44 ` Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 3/4] net: bonding: send peer notify when failure recovery Tonghao Zhang
` (2 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-10 3:44 UTC (permalink / raw)
To: netdev
Cc: Tonghao Zhang, Jay Vosburgh, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Andrew Lunn, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
User can config or display the bonding broadcast_neighbor option via
iproute2/netlink.
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
---
drivers/net/bonding/bond_netlink.c | 16 ++++++++++++++++
include/uapi/linux/if_link.h | 1 +
2 files changed, 17 insertions(+)
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index ac5e402c34bc..57fff2421f1b 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -124,6 +124,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = {
[IFLA_BOND_MISSED_MAX] = { .type = NLA_U8 },
[IFLA_BOND_NS_IP6_TARGET] = { .type = NLA_NESTED },
[IFLA_BOND_COUPLED_CONTROL] = { .type = NLA_U8 },
+ [IFLA_BOND_BROADCAST_NEIGH] = { .type = NLA_U8 },
};
static const struct nla_policy bond_slave_policy[IFLA_BOND_SLAVE_MAX + 1] = {
@@ -561,6 +562,16 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[],
return err;
}
+ if (data[IFLA_BOND_BROADCAST_NEIGH]) {
+ int broadcast_neigh = nla_get_u8(data[IFLA_BOND_BROADCAST_NEIGH]);
+
+ bond_opt_initval(&newval, broadcast_neigh);
+ err = __bond_opt_set(bond, BOND_OPT_BROADCAST_NEIGH, &newval,
+ data[IFLA_BOND_BROADCAST_NEIGH], extack);
+ if (err)
+ return err;
+ }
+
return 0;
}
@@ -630,6 +641,7 @@ static size_t bond_get_size(const struct net_device *bond_dev)
nla_total_size(sizeof(struct nlattr)) +
nla_total_size(sizeof(struct in6_addr)) * BOND_MAX_NS_TARGETS +
nla_total_size(sizeof(u8)) + /* IFLA_BOND_COUPLED_CONTROL */
+ nla_total_size(sizeof(u8)) + /* IFLA_BOND_BROADCAST_NEIGH */
0;
}
@@ -793,6 +805,10 @@ static int bond_fill_info(struct sk_buff *skb,
bond->params.coupled_control))
goto nla_put_failure;
+ if (nla_put_u8(skb, IFLA_BOND_BROADCAST_NEIGH,
+ bond->params.broadcast_neighbor))
+ goto nla_put_failure;
+
if (BOND_MODE(bond) == BOND_MODE_8023AD) {
struct ad_info info;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 3ad2d5d98034..53b2f6ebda8b 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1534,6 +1534,7 @@ enum {
IFLA_BOND_MISSED_MAX,
IFLA_BOND_NS_IP6_TARGET,
IFLA_BOND_COUPLED_CONTROL,
+ IFLA_BOND_BROADCAST_NEIGH,
__IFLA_BOND_MAX,
};
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [net-next v6 3/4] net: bonding: send peer notify when failure recovery
2025-06-10 3:44 [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 2/4] net: bonding: add broadcast_neighbor netlink option Tonghao Zhang
@ 2025-06-10 3:44 ` Tonghao Zhang
2025-06-17 0:36 ` Jay Vosburgh
2025-06-10 3:44 ` [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad Tonghao Zhang
2025-06-16 2:07 ` [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
4 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-10 3:44 UTC (permalink / raw)
To: netdev
Cc: Tonghao Zhang, Jay Vosburgh, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Andrew Lunn, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
After LACP protocol recovery, the port can transmit packets.
However, if the bond port doesn't send gratuitous ARP/ND
packets to the switch, the switch won't return packets through
the current interface. This causes traffic imbalance. To resolve
this issue, when LACP protocol recovers, send ARP/ND packets.
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
---
Documentation/networking/bonding.rst | 5 +++--
drivers/net/bonding/bond_3ad.c | 13 +++++++++++++
drivers/net/bonding/bond_main.c | 21 ++++++++++++++++-----
3 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index 14f7593d888d..f8f5766703d4 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -773,8 +773,9 @@ num_unsol_na
greater than 1.
The valid range is 0 - 255; the default value is 1. These options
- affect only the active-backup mode. These options were added for
- bonding versions 3.3.0 and 3.4.0 respectively.
+ affect the active-backup or 802.3ad (broadcast_neighbor enabled) mode.
+ These options were added for bonding versions 3.3.0 and 3.4.0
+ respectively.
From Linux 3.0 and bonding version 3.7.1, these notifications
are generated by the ipv4 and ipv6 code and the numbers of
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c6807e473ab7..d1c2d416ac87 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -982,6 +982,17 @@ static int ad_marker_send(struct port *port, struct bond_marker *marker)
return 0;
}
+static void ad_cond_set_peer_notif(struct port *port)
+{
+ struct bonding *bond = port->slave->bond;
+
+ if (bond->params.broadcast_neighbor && rtnl_trylock()) {
+ bond->send_peer_notif = bond->params.num_peer_notif *
+ max(1, bond->params.peer_notif_delay);
+ rtnl_unlock();
+ }
+}
+
/**
* ad_mux_machine - handle a port's mux state machine
* @port: the port we're looking at
@@ -2061,6 +2072,8 @@ static void ad_enable_collecting_distributing(struct port *port,
__enable_port(port);
/* Slave array needs update */
*update_slave_arr = true;
+ /* Should notify peers if possible */
+ ad_cond_set_peer_notif(port);
}
}
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 12046ef51569..0acece55d9cb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1237,17 +1237,28 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
/* must be called in RCU critical section or with RTNL held */
static bool bond_should_notify_peers(struct bonding *bond)
{
- struct slave *slave = rcu_dereference_rtnl(bond->curr_active_slave);
+ struct bond_up_slave *usable;
+ struct slave *slave = NULL;
- if (!slave || !bond->send_peer_notif ||
+ if (!bond->send_peer_notif ||
bond->send_peer_notif %
max(1, bond->params.peer_notif_delay) != 0 ||
- !netif_carrier_ok(bond->dev) ||
- test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
+ !netif_carrier_ok(bond->dev))
return false;
+ if (BOND_MODE(bond) == BOND_MODE_8023AD) {
+ usable = rcu_dereference_rtnl(bond->usable_slaves);
+ if (!usable || !READ_ONCE(usable->count))
+ return false;
+ } else {
+ slave = rcu_dereference_rtnl(bond->curr_active_slave);
+ if (!slave || test_bit(__LINK_STATE_LINKWATCH_PENDING,
+ &slave->dev->state))
+ return false;
+ }
+
netdev_dbg(bond->dev, "bond_should_notify_peers: slave %s\n",
- slave ? slave->dev->name : "NULL");
+ slave ? slave->dev->name : "all");
return true;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad
2025-06-10 3:44 [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
` (2 preceding siblings ...)
2025-06-10 3:44 ` [net-next v6 3/4] net: bonding: send peer notify when failure recovery Tonghao Zhang
@ 2025-06-10 3:44 ` Tonghao Zhang
2025-06-17 0:28 ` Jay Vosburgh
2025-06-16 2:07 ` [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
4 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-10 3:44 UTC (permalink / raw)
To: netdev
Cc: Tonghao Zhang, Jay Vosburgh, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Andrew Lunn, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
Users can monitor NIC link status changes through netlink. However, LACP
protocol failures may occur despite operational physical links. There is
no way to detect LACP state changes. This patch adds tracepoint at
LACP state transition.
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
---
drivers/net/bonding/bond_3ad.c | 6 ++++++
include/trace/events/bonding.h | 37 ++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 include/trace/events/bonding.h
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d1c2d416ac87..55703230ab29 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -16,6 +16,9 @@
#include <net/bond_3ad.h>
#include <net/netlink.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/bonding.h>
+
/* General definitions */
#define AD_SHORT_TIMEOUT 1
#define AD_LONG_TIMEOUT 0
@@ -1146,6 +1149,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
port->actor_port_number,
last_state,
port->sm_mux_state);
+
+ trace_3ad_mux_state(port->slave->dev, last_state, port->sm_mux_state);
+
switch (port->sm_mux_state) {
case AD_MUX_DETACHED:
port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
diff --git a/include/trace/events/bonding.h b/include/trace/events/bonding.h
new file mode 100644
index 000000000000..1ee4b07d912a
--- /dev/null
+++ b/include/trace/events/bonding.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#if !defined(_TRACE_BONDING_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BONDING_H
+
+#include <linux/netdevice.h>
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bonding
+
+TRACE_EVENT(3ad_mux_state,
+ TP_PROTO(struct net_device *dev, u32 last_state, u32 curr_state),
+ TP_ARGS(dev, last_state, curr_state),
+
+ TP_STRUCT__entry(
+ __field(int, ifindex)
+ __string(dev_name, dev->name)
+ __field(u32, last_state)
+ __field(u32, curr_state)
+ ),
+
+ TP_fast_assign(
+ __entry->ifindex = dev->ifindex;
+ __assign_str(dev_name);
+ __entry->last_state = last_state;
+ __entry->curr_state = curr_state;
+ ),
+
+ TP_printk("ifindex %d dev %s last_state 0x%x curr_state 0x%x",
+ __entry->ifindex, __get_str(dev_name),
+ __entry->last_state, __entry->curr_state)
+);
+
+#endif /* _TRACE_BONDING_H */
+
+#include <trace/define_trace.h>
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch
2025-06-10 3:44 [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
` (3 preceding siblings ...)
2025-06-10 3:44 ` [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad Tonghao Zhang
@ 2025-06-16 2:07 ` Tonghao Zhang
4 siblings, 0 replies; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-16 2:07 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
ping
> 2025年6月10日 11:44,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>
> For no-stacking networking arch, and enable the bond mode 4(lacp) in
> datacenter, the switch require arp/nd packets as session synchronization.
> More details please see patch.
>
> v6 change log:
> - del unnecessary rcu_read_lock in bond_xmit_broadcast
>
> v5 change log:
> - format commit message of all patches
> - use skb_header_pointer instead of pskb_may_pull
> - send only packets to active slaves instead of all ports, and add more commit log
>
> v4 change log:
> - fix dec option in bond_close
>
> v3 change log:
> - inc/dec broadcast_neighbor option in bond_open/close and UP state.
> - remove explicit inline of bond_should_broadcast_neighbor
> - remove sysfs option
> - remove EXPORT_SYMBOL_GPL
> - reorder option bond_opt_value
> - use rcu_xxx in bond_should_notify_peers.
>
> v2 change log:
> - add static branch for performance
> - add more info about no-stacking arch in commit message
> - add broadcast_neighbor info and format doc
> - invoke bond_should_broadcast_neighbor only in BOND_MODE_8023AD mode for performance
> - explain why we need sending peer notify when failure recovery
> - change the doc about num_unsol_na
> - refine function name to ad_cond_set_peer_notif
> - ad_cond_set_peer_notif invoked in ad_enable_collecting_distributing
> - refine bond_should_notify_peers for lacp mode.
>
> Cc: Jay Vosburgh <jv@jvosburgh.net>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Simon Horman <horms@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nikolay Aleksandrov <razor@blackwall.org>
> Cc: Zengbing Tu <tuzengbing@didiglobal.com>
>
> Tonghao Zhang (4):
> net: bonding: add broadcast_neighbor option for 802.3ad
> net: bonding: add broadcast_neighbor netlink option
> net: bonding: send peer notify when failure recovery
> net: bonding: add tracepoint for 802.3ad
>
> Documentation/networking/bonding.rst | 11 +++-
> drivers/net/bonding/bond_3ad.c | 19 ++++++
> drivers/net/bonding/bond_main.c | 87 ++++++++++++++++++++++++----
> drivers/net/bonding/bond_netlink.c | 16 +++++
> drivers/net/bonding/bond_options.c | 35 +++++++++++
> include/net/bond_options.h | 1 +
> include/net/bonding.h | 3 +
> include/trace/events/bonding.h | 37 ++++++++++++
> include/uapi/linux/if_link.h | 1 +
> 9 files changed, 197 insertions(+), 13 deletions(-)
> create mode 100644 include/trace/events/bonding.h
>
> --
> 2.34.1
>
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad
2025-06-10 3:44 ` [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad Tonghao Zhang
@ 2025-06-16 23:04 ` Jakub Kicinski
2025-06-17 10:25 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2025-06-16 23:04 UTC (permalink / raw)
To: Tonghao Zhang
Cc: netdev, Jay Vosburgh, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Jonathan Corbet, Andrew Lunn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Nikolay Aleksandrov,
Zengbing Tu
On Tue, 10 Jun 2025 11:44:42 +0800 Tonghao Zhang wrote:
> + if (bond->params.broadcast_neighbor)
> + static_branch_inc(&bond_bcast_neigh_enabled);
> }
>
> if (bond_mode_can_use_xmit_hash(bond))
> @@ -4475,6 +4480,10 @@ static int bond_close(struct net_device *bond_dev)
> bond_alb_deinitialize(bond);
> bond->recv_probe = NULL;
>
> + if (BOND_MODE(bond) == BOND_MODE_8023AD &&
> + bond->params.broadcast_neighbor)
> + static_branch_dec(&bond_bcast_neigh_enabled);
is .broadcast_neighbor automatically cleared when mode is changed?
I don't see it in the code.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad
2025-06-10 3:44 ` [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad Tonghao Zhang
@ 2025-06-17 0:28 ` Jay Vosburgh
2025-06-17 10:37 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2025-06-17 0:28 UTC (permalink / raw)
To: Tonghao Zhang
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>Users can monitor NIC link status changes through netlink. However, LACP
>protocol failures may occur despite operational physical links. There is
>no way to detect LACP state changes. This patch adds tracepoint at
>LACP state transition.
This patch really has nothing to do with the rest of the series
(it's unrelated to the broadcast_neighbor functionality), and should
really be sent separately.
That said, I recall asking about work that was proposed some
time ago to create netlink events (visible to ip monitor, et al) when
the LACP state changes. That would be a cleaner method to watch the
LACP state machine (as it would integrate with all of the other event
infrastructure). Maybe I missed the response, but what became of that
work?
-J
>Cc: Jay Vosburgh <jv@jvosburgh.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Eric Dumazet <edumazet@google.com>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Paolo Abeni <pabeni@redhat.com>
>Cc: Simon Horman <horms@kernel.org>
>Cc: Jonathan Corbet <corbet@lwn.net>
>Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>Cc: Steven Rostedt <rostedt@goodmis.org>
>Cc: Masami Hiramatsu <mhiramat@kernel.org>
>Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>Cc: Nikolay Aleksandrov <razor@blackwall.org>
>Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>---
> drivers/net/bonding/bond_3ad.c | 6 ++++++
> include/trace/events/bonding.h | 37 ++++++++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
> create mode 100644 include/trace/events/bonding.h
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index d1c2d416ac87..55703230ab29 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -16,6 +16,9 @@
> #include <net/bond_3ad.h>
> #include <net/netlink.h>
>
>+#define CREATE_TRACE_POINTS
>+#include <trace/events/bonding.h>
>+
> /* General definitions */
> #define AD_SHORT_TIMEOUT 1
> #define AD_LONG_TIMEOUT 0
>@@ -1146,6 +1149,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> port->actor_port_number,
> last_state,
> port->sm_mux_state);
>+
>+ trace_3ad_mux_state(port->slave->dev, last_state, port->sm_mux_state);
>+
> switch (port->sm_mux_state) {
> case AD_MUX_DETACHED:
> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>diff --git a/include/trace/events/bonding.h b/include/trace/events/bonding.h
>new file mode 100644
>index 000000000000..1ee4b07d912a
>--- /dev/null
>+++ b/include/trace/events/bonding.h
>@@ -0,0 +1,37 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+
>+#if !defined(_TRACE_BONDING_H) || defined(TRACE_HEADER_MULTI_READ)
>+#define _TRACE_BONDING_H
>+
>+#include <linux/netdevice.h>
>+#include <linux/tracepoint.h>
>+
>+#undef TRACE_SYSTEM
>+#define TRACE_SYSTEM bonding
>+
>+TRACE_EVENT(3ad_mux_state,
>+ TP_PROTO(struct net_device *dev, u32 last_state, u32 curr_state),
>+ TP_ARGS(dev, last_state, curr_state),
>+
>+ TP_STRUCT__entry(
>+ __field(int, ifindex)
>+ __string(dev_name, dev->name)
>+ __field(u32, last_state)
>+ __field(u32, curr_state)
>+ ),
>+
>+ TP_fast_assign(
>+ __entry->ifindex = dev->ifindex;
>+ __assign_str(dev_name);
>+ __entry->last_state = last_state;
>+ __entry->curr_state = curr_state;
>+ ),
>+
>+ TP_printk("ifindex %d dev %s last_state 0x%x curr_state 0x%x",
>+ __entry->ifindex, __get_str(dev_name),
>+ __entry->last_state, __entry->curr_state)
>+);
>+
>+#endif /* _TRACE_BONDING_H */
>+
>+#include <trace/define_trace.h>
>--
>2.34.1
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 3/4] net: bonding: send peer notify when failure recovery
2025-06-10 3:44 ` [net-next v6 3/4] net: bonding: send peer notify when failure recovery Tonghao Zhang
@ 2025-06-17 0:36 ` Jay Vosburgh
2025-06-17 10:47 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2025-06-17 0:36 UTC (permalink / raw)
To: Tonghao Zhang
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>After LACP protocol recovery, the port can transmit packets.
>However, if the bond port doesn't send gratuitous ARP/ND
>packets to the switch, the switch won't return packets through
>the current interface. This causes traffic imbalance. To resolve
>this issue, when LACP protocol recovers, send ARP/ND packets.
I think the description above needs to mention that the
gratuitous ARP/ND only happens if broadcast_neighbor is enabled.
I'll note that the documentation update does include this
caveat.
>Cc: Jay Vosburgh <jv@jvosburgh.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Eric Dumazet <edumazet@google.com>
>Cc: Jakub Kicinski <kuba@kernel.org>
>Cc: Paolo Abeni <pabeni@redhat.com>
>Cc: Simon Horman <horms@kernel.org>
>Cc: Jonathan Corbet <corbet@lwn.net>
>Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>Cc: Steven Rostedt <rostedt@goodmis.org>
>Cc: Masami Hiramatsu <mhiramat@kernel.org>
>Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>Cc: Nikolay Aleksandrov <razor@blackwall.org>
>Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>---
> Documentation/networking/bonding.rst | 5 +++--
> drivers/net/bonding/bond_3ad.c | 13 +++++++++++++
> drivers/net/bonding/bond_main.c | 21 ++++++++++++++++-----
> 3 files changed, 32 insertions(+), 7 deletions(-)
>
>diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>index 14f7593d888d..f8f5766703d4 100644
>--- a/Documentation/networking/bonding.rst
>+++ b/Documentation/networking/bonding.rst
>@@ -773,8 +773,9 @@ num_unsol_na
> greater than 1.
>
> The valid range is 0 - 255; the default value is 1. These options
>- affect only the active-backup mode. These options were added for
>- bonding versions 3.3.0 and 3.4.0 respectively.
>+ affect the active-backup or 802.3ad (broadcast_neighbor enabled) mode.
>+ These options were added for bonding versions 3.3.0 and 3.4.0
>+ respectively.
>
> From Linux 3.0 and bonding version 3.7.1, these notifications
> are generated by the ipv4 and ipv6 code and the numbers of
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c6807e473ab7..d1c2d416ac87 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -982,6 +982,17 @@ static int ad_marker_send(struct port *port, struct bond_marker *marker)
> return 0;
> }
>
>+static void ad_cond_set_peer_notif(struct port *port)
>+{
>+ struct bonding *bond = port->slave->bond;
>+
>+ if (bond->params.broadcast_neighbor && rtnl_trylock()) {
>+ bond->send_peer_notif = bond->params.num_peer_notif *
>+ max(1, bond->params.peer_notif_delay);
>+ rtnl_unlock();
>+ }
>+}
>+
> /**
> * ad_mux_machine - handle a port's mux state machine
> * @port: the port we're looking at
>@@ -2061,6 +2072,8 @@ static void ad_enable_collecting_distributing(struct port *port,
> __enable_port(port);
> /* Slave array needs update */
> *update_slave_arr = true;
>+ /* Should notify peers if possible */
>+ ad_cond_set_peer_notif(port);
> }
> }
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 12046ef51569..0acece55d9cb 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1237,17 +1237,28 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
> /* must be called in RCU critical section or with RTNL held */
> static bool bond_should_notify_peers(struct bonding *bond)
> {
>- struct slave *slave = rcu_dereference_rtnl(bond->curr_active_slave);
>+ struct bond_up_slave *usable;
>+ struct slave *slave = NULL;
>
>- if (!slave || !bond->send_peer_notif ||
>+ if (!bond->send_peer_notif ||
> bond->send_peer_notif %
> max(1, bond->params.peer_notif_delay) != 0 ||
>- !netif_carrier_ok(bond->dev) ||
>- test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
>+ !netif_carrier_ok(bond->dev))
> return false;
>
>+ if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>+ usable = rcu_dereference_rtnl(bond->usable_slaves);
>+ if (!usable || !READ_ONCE(usable->count))
>+ return false;
>+ } else {
>+ slave = rcu_dereference_rtnl(bond->curr_active_slave);
>+ if (!slave || test_bit(__LINK_STATE_LINKWATCH_PENDING,
>+ &slave->dev->state))
>+ return false;
>+ }
>+
> netdev_dbg(bond->dev, "bond_should_notify_peers: slave %s\n",
>- slave ? slave->dev->name : "NULL");
>+ slave ? slave->dev->name : "all");
Is it actually correct that if slave == NULL, the notify peers
logic will send to all ports? I'm not sure why this changed.
-J
>
> return true;
> }
>--
>2.34.1
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad
2025-06-16 23:04 ` Jakub Kicinski
@ 2025-06-17 10:25 ` Tonghao Zhang
0 siblings, 0 replies; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-17 10:25 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Jay Vosburgh, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Jonathan Corbet, Andrew Lunn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Nikolay Aleksandrov,
Zengbing Tu
> 2025年6月17日 07:04,Jakub Kicinski <kuba@kernel.org> 写道:
>
> On Tue, 10 Jun 2025 11:44:42 +0800 Tonghao Zhang wrote:
>> + if (bond->params.broadcast_neighbor)
>> + static_branch_inc(&bond_bcast_neigh_enabled);
>> }
>>
>> if (bond_mode_can_use_xmit_hash(bond))
>> @@ -4475,6 +4480,10 @@ static int bond_close(struct net_device *bond_dev)
>> bond_alb_deinitialize(bond);
>> bond->recv_probe = NULL;
>>
>> + if (BOND_MODE(bond) == BOND_MODE_8023AD &&
>> + bond->params.broadcast_neighbor)
>> + static_branch_dec(&bond_bcast_neigh_enabled);
>
> is .broadcast_neighbor automatically cleared when mode is changed?
> I don't see it in the code.
No clean this value when changing the mode, this option don’t work in other mode. But it is better to clean up it.
> --
> pw-bot: cr
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad
2025-06-17 0:28 ` Jay Vosburgh
@ 2025-06-17 10:37 ` Tonghao Zhang
2025-06-23 2:11 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-17 10:37 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
> 2025年6月17日 08:28,Jay Vosburgh <jv@jvosburgh.net> 写道:
>
> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>
>> Users can monitor NIC link status changes through netlink. However, LACP
>> protocol failures may occur despite operational physical links. There is
>> no way to detect LACP state changes. This patch adds tracepoint at
>> LACP state transition.
>
> This patch really has nothing to do with the rest of the series
> (it's unrelated to the broadcast_neighbor functionality), and should
> really be sent separately.
… monitoring the lacp state is part of “no-stacking” arch solution. So I sent it as series.
if unnecessary, I will set it separately.
> That said, I recall asking about work that was proposed some
Sorry I may miss your commits about this patch.
> time ago to create netlink events (visible to ip monitor, et al) when
> the LACP state changes. That would be a cleaner method to watch the
> LACP state machine (as it would integrate with all of the other event
Why not consider a BPF+tracepoint solution? It provides more flexible LACP data collection with simpler implementation.
> infrastructure). Maybe I missed the response, but what became of that
> work?
>
> -J
>
>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Paolo Abeni <pabeni@redhat.com>
>> Cc: Simon Horman <horms@kernel.org>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>> drivers/net/bonding/bond_3ad.c | 6 ++++++
>> include/trace/events/bonding.h | 37 ++++++++++++++++++++++++++++++++++
>> 2 files changed, 43 insertions(+)
>> create mode 100644 include/trace/events/bonding.h
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index d1c2d416ac87..55703230ab29 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -16,6 +16,9 @@
>> #include <net/bond_3ad.h>
>> #include <net/netlink.h>
>>
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/bonding.h>
>> +
>> /* General definitions */
>> #define AD_SHORT_TIMEOUT 1
>> #define AD_LONG_TIMEOUT 0
>> @@ -1146,6 +1149,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>> port->actor_port_number,
>> last_state,
>> port->sm_mux_state);
>> +
>> + trace_3ad_mux_state(port->slave->dev, last_state, port->sm_mux_state);
>> +
>> switch (port->sm_mux_state) {
>> case AD_MUX_DETACHED:
>> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>> diff --git a/include/trace/events/bonding.h b/include/trace/events/bonding.h
>> new file mode 100644
>> index 000000000000..1ee4b07d912a
>> --- /dev/null
>> +++ b/include/trace/events/bonding.h
>> @@ -0,0 +1,37 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#if !defined(_TRACE_BONDING_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_BONDING_H
>> +
>> +#include <linux/netdevice.h>
>> +#include <linux/tracepoint.h>
>> +
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM bonding
>> +
>> +TRACE_EVENT(3ad_mux_state,
>> + TP_PROTO(struct net_device *dev, u32 last_state, u32 curr_state),
>> + TP_ARGS(dev, last_state, curr_state),
>> +
>> + TP_STRUCT__entry(
>> + __field(int, ifindex)
>> + __string(dev_name, dev->name)
>> + __field(u32, last_state)
>> + __field(u32, curr_state)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->ifindex = dev->ifindex;
>> + __assign_str(dev_name);
>> + __entry->last_state = last_state;
>> + __entry->curr_state = curr_state;
>> + ),
>> +
>> + TP_printk("ifindex %d dev %s last_state 0x%x curr_state 0x%x",
>> + __entry->ifindex, __get_str(dev_name),
>> + __entry->last_state, __entry->curr_state)
>> +);
>> +
>> +#endif /* _TRACE_BONDING_H */
>> +
>> +#include <trace/define_trace.h>
>> --
>> 2.34.1
>>
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 3/4] net: bonding: send peer notify when failure recovery
2025-06-17 0:36 ` Jay Vosburgh
@ 2025-06-17 10:47 ` Tonghao Zhang
2025-06-17 11:39 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-17 10:47 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
> 2025年6月17日 08:36,Jay Vosburgh <jv@jvosburgh.net> 写道:
>
> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>
>> After LACP protocol recovery, the port can transmit packets.
>> However, if the bond port doesn't send gratuitous ARP/ND
>> packets to the switch, the switch won't return packets through
>> the current interface. This causes traffic imbalance. To resolve
>> this issue, when LACP protocol recovers, send ARP/ND packets.
>
> I think the description above needs to mention that the
> gratuitous ARP/ND only happens if broadcast_neighbor is enabled.
Ok, thanks
>
> I'll note that the documentation update does include this
> caveat.
>
>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Paolo Abeni <pabeni@redhat.com>
>> Cc: Simon Horman <horms@kernel.org>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>> Documentation/networking/bonding.rst | 5 +++--
>> drivers/net/bonding/bond_3ad.c | 13 +++++++++++++
>> drivers/net/bonding/bond_main.c | 21 ++++++++++++++++-----
>> 3 files changed, 32 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>> index 14f7593d888d..f8f5766703d4 100644
>> --- a/Documentation/networking/bonding.rst
>> +++ b/Documentation/networking/bonding.rst
>> @@ -773,8 +773,9 @@ num_unsol_na
>> greater than 1.
>>
>> The valid range is 0 - 255; the default value is 1. These options
>> - affect only the active-backup mode. These options were added for
>> - bonding versions 3.3.0 and 3.4.0 respectively.
>> + affect the active-backup or 802.3ad (broadcast_neighbor enabled) mode.
>> + These options were added for bonding versions 3.3.0 and 3.4.0
>> + respectively.
>>
>> From Linux 3.0 and bonding version 3.7.1, these notifications
>> are generated by the ipv4 and ipv6 code and the numbers of
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index c6807e473ab7..d1c2d416ac87 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -982,6 +982,17 @@ static int ad_marker_send(struct port *port, struct bond_marker *marker)
>> return 0;
>> }
>>
>> +static void ad_cond_set_peer_notif(struct port *port)
>> +{
>> + struct bonding *bond = port->slave->bond;
>> +
>> + if (bond->params.broadcast_neighbor && rtnl_trylock()) {
>> + bond->send_peer_notif = bond->params.num_peer_notif *
>> + max(1, bond->params.peer_notif_delay);
>> + rtnl_unlock();
>> + }
>> +}
>> +
>> /**
>> * ad_mux_machine - handle a port's mux state machine
>> * @port: the port we're looking at
>> @@ -2061,6 +2072,8 @@ static void ad_enable_collecting_distributing(struct port *port,
>> __enable_port(port);
>> /* Slave array needs update */
>> *update_slave_arr = true;
>> + /* Should notify peers if possible */
>> + ad_cond_set_peer_notif(port);
>> }
>> }
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 12046ef51569..0acece55d9cb 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1237,17 +1237,28 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
>> /* must be called in RCU critical section or with RTNL held */
>> static bool bond_should_notify_peers(struct bonding *bond)
>> {
>> - struct slave *slave = rcu_dereference_rtnl(bond->curr_active_slave);
>> + struct bond_up_slave *usable;
>> + struct slave *slave = NULL;
>>
>> - if (!slave || !bond->send_peer_notif ||
>> + if (!bond->send_peer_notif ||
>> bond->send_peer_notif %
>> max(1, bond->params.peer_notif_delay) != 0 ||
>> - !netif_carrier_ok(bond->dev) ||
>> - test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
>> + !netif_carrier_ok(bond->dev))
>> return false;
>>
>> + if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>> + usable = rcu_dereference_rtnl(bond->usable_slaves);
>> + if (!usable || !READ_ONCE(usable->count))
>> + return false;
>> + } else {
>> + slave = rcu_dereference_rtnl(bond->curr_active_slave);
>> + if (!slave || test_bit(__LINK_STATE_LINKWATCH_PENDING,
>> + &slave->dev->state))
>> + return false;
>> + }
>> +
>> netdev_dbg(bond->dev, "bond_should_notify_peers: slave %s\n",
>> - slave ? slave->dev->name : "NULL");
>> + slave ? slave->dev->name : "all");
>
> Is it actually correct that if slave == NULL, the notify peers
> logic will send to all ports? I'm not sure why this changed.
when bond is lacp mode, and send_peer_notif > 0, usable_slaves > 0, then slave == NULL, the debug log will print info "bond_should_notify_peers: slave all".
In active-backup mode, slave is not NULL, that means "bond_should_notify_peers: slave xx".
>
> -J
>
>>
>> return true;
>> }
>> --
>> 2.34.1
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 3/4] net: bonding: send peer notify when failure recovery
2025-06-17 10:47 ` Tonghao Zhang
@ 2025-06-17 11:39 ` Tonghao Zhang
2025-06-18 2:51 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-17 11:39 UTC (permalink / raw)
To: jv
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu, Tonghao Zhang
> 2025年6月17日 18:47,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>
>
>
>> 2025年6月17日 08:36,Jay Vosburgh <jv@jvosburgh.net> 写道:
>>
>> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>>
>>> After LACP protocol recovery, the port can transmit packets.
>>> However, if the bond port doesn't send gratuitous ARP/ND
>>> packets to the switch, the switch won't return packets through
>>> the current interface. This causes traffic imbalance. To resolve
>>> this issue, when LACP protocol recovers, send ARP/ND packets.
>>
>> I think the description above needs to mention that the
>> gratuitous ARP/ND only happens if broadcast_neighbor is enabled.
> Ok, thanks
>>
>> I'll note that the documentation update does include this
>> caveat.
>>
>>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Cc: Simon Horman <horms@kernel.org>
>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>>> ---
>>> Documentation/networking/bonding.rst | 5 +++--
>>> drivers/net/bonding/bond_3ad.c | 13 +++++++++++++
>>> drivers/net/bonding/bond_main.c | 21 ++++++++++++++++-----
>>> 3 files changed, 32 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>>> index 14f7593d888d..f8f5766703d4 100644
>>> --- a/Documentation/networking/bonding.rst
>>> +++ b/Documentation/networking/bonding.rst
>>> @@ -773,8 +773,9 @@ num_unsol_na
>>> greater than 1.
>>>
>>> The valid range is 0 - 255; the default value is 1. These options
>>> - affect only the active-backup mode. These options were added for
>>> - bonding versions 3.3.0 and 3.4.0 respectively.
>>> + affect the active-backup or 802.3ad (broadcast_neighbor enabled) mode.
>>> + These options were added for bonding versions 3.3.0 and 3.4.0
>>> + respectively.
>>>
>>> From Linux 3.0 and bonding version 3.7.1, these notifications
>>> are generated by the ipv4 and ipv6 code and the numbers of
>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>> index c6807e473ab7..d1c2d416ac87 100644
>>> --- a/drivers/net/bonding/bond_3ad.c
>>> +++ b/drivers/net/bonding/bond_3ad.c
>>> @@ -982,6 +982,17 @@ static int ad_marker_send(struct port *port, struct bond_marker *marker)
>>> return 0;
>>> }
>>>
>>> +static void ad_cond_set_peer_notif(struct port *port)
>>> +{
>>> + struct bonding *bond = port->slave->bond;
>>> +
>>> + if (bond->params.broadcast_neighbor && rtnl_trylock()) {
>>> + bond->send_peer_notif = bond->params.num_peer_notif *
>>> + max(1, bond->params.peer_notif_delay);
>>> + rtnl_unlock();
>>> + }
>>> +}
>>> +
>>> /**
>>> * ad_mux_machine - handle a port's mux state machine
>>> * @port: the port we're looking at
>>> @@ -2061,6 +2072,8 @@ static void ad_enable_collecting_distributing(struct port *port,
>>> __enable_port(port);
>>> /* Slave array needs update */
>>> *update_slave_arr = true;
>>> + /* Should notify peers if possible */
>>> + ad_cond_set_peer_notif(port);
>>> }
>>> }
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 12046ef51569..0acece55d9cb 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1237,17 +1237,28 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
>>> /* must be called in RCU critical section or with RTNL held */
>>> static bool bond_should_notify_peers(struct bonding *bond)
>>> {
>>> - struct slave *slave = rcu_dereference_rtnl(bond->curr_active_slave);
>>> + struct bond_up_slave *usable;
>>> + struct slave *slave = NULL;
>>>
>>> - if (!slave || !bond->send_peer_notif ||
>>> + if (!bond->send_peer_notif ||
>>> bond->send_peer_notif %
>>> max(1, bond->params.peer_notif_delay) != 0 ||
>>> - !netif_carrier_ok(bond->dev) ||
>>> - test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
>>> + !netif_carrier_ok(bond->dev))
>>> return false;
>>>
>>> + if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>>> + usable = rcu_dereference_rtnl(bond->usable_slaves);
>>> + if (!usable || !READ_ONCE(usable->count))
>>> + return false;
>>> + } else {
>>> + slave = rcu_dereference_rtnl(bond->curr_active_slave);
>>> + if (!slave || test_bit(__LINK_STATE_LINKWATCH_PENDING,
>>> + &slave->dev->state))
>>> + return false;
>>> + }
>>> +
>>> netdev_dbg(bond->dev, "bond_should_notify_peers: slave %s\n",
>>> - slave ? slave->dev->name : "NULL");
>>> + slave ? slave->dev->name : "all");
>>
>> Is it actually correct that if slave == NULL, the notify peers
>> logic will send to all ports? I'm not sure why this changed.
> when bond is lacp mode, and send_peer_notif > 0, usable_slaves > 0, then slave == NULL, the debug log will print info "bond_should_notify_peers: slave all”.
In lacp mode, when broadcast_neighbor enabled, send_peer_notif will be set in ad_cond_set_peer_notif.
>
> In active-backup mode, slave is not NULL, that means "bond_should_notify_peers: slave xx".
>>
>> -J
>>
>>>
>>> return true;
>>> }
>>> --
>>> 2.34.1
>>
>> ---
>> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 3/4] net: bonding: send peer notify when failure recovery
2025-06-17 11:39 ` Tonghao Zhang
@ 2025-06-18 2:51 ` Tonghao Zhang
0 siblings, 0 replies; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-18 2:51 UTC (permalink / raw)
To: jv
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
> 2025年6月17日 19:39,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>
>>
>> 2025年6月17日 18:47,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>>
>>
>>
>>> 2025年6月17日 08:36,Jay Vosburgh <jv@jvosburgh.net> 写道:
>>>
>>> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>>>
>>>> After LACP protocol recovery, the port can transmit packets.
>>>> However, if the bond port doesn't send gratuitous ARP/ND
>>>> packets to the switch, the switch won't return packets through
>>>> the current interface. This causes traffic imbalance. To resolve
>>>> this issue, when LACP protocol recovers, send ARP/ND packets.
>>>
>>> I think the description above needs to mention that the
>>> gratuitous ARP/ND only happens if broadcast_neighbor is enabled.
>> Ok, thanks
>>>
>>> I'll note that the documentation update does include this
>>> caveat.
>>>
>>>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>> Cc: Simon Horman <horms@kernel.org>
>>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>>>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>>>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>>>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>>>> ---
>>>> Documentation/networking/bonding.rst | 5 +++--
>>>> drivers/net/bonding/bond_3ad.c | 13 +++++++++++++
>>>> drivers/net/bonding/bond_main.c | 21 ++++++++++++++++-----
>>>> 3 files changed, 32 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>>>> index 14f7593d888d..f8f5766703d4 100644
>>>> --- a/Documentation/networking/bonding.rst
>>>> +++ b/Documentation/networking/bonding.rst
>>>> @@ -773,8 +773,9 @@ num_unsol_na
>>>> greater than 1.
>>>>
>>>> The valid range is 0 - 255; the default value is 1. These options
>>>> - affect only the active-backup mode. These options were added for
>>>> - bonding versions 3.3.0 and 3.4.0 respectively.
>>>> + affect the active-backup or 802.3ad (broadcast_neighbor enabled) mode.
>>>> + These options were added for bonding versions 3.3.0 and 3.4.0
>>>> + respectively.
>>>>
>>>> From Linux 3.0 and bonding version 3.7.1, these notifications
>>>> are generated by the ipv4 and ipv6 code and the numbers of
>>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>>> index c6807e473ab7..d1c2d416ac87 100644
>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>> @@ -982,6 +982,17 @@ static int ad_marker_send(struct port *port, struct bond_marker *marker)
>>>> return 0;
>>>> }
>>>>
>>>> +static void ad_cond_set_peer_notif(struct port *port)
>>>> +{
>>>> + struct bonding *bond = port->slave->bond;
>>>> +
>>>> + if (bond->params.broadcast_neighbor && rtnl_trylock()) {
>>>> + bond->send_peer_notif = bond->params.num_peer_notif *
>>>> + max(1, bond->params.peer_notif_delay);
>>>> + rtnl_unlock();
>>>> + }
>>>> +}
>>>> +
>>>> /**
>>>> * ad_mux_machine - handle a port's mux state machine
>>>> * @port: the port we're looking at
>>>> @@ -2061,6 +2072,8 @@ static void ad_enable_collecting_distributing(struct port *port,
>>>> __enable_port(port);
>>>> /* Slave array needs update */
>>>> *update_slave_arr = true;
>>>> + /* Should notify peers if possible */
>>>> + ad_cond_set_peer_notif(port);
>>>> }
>>>> }
>>>>
>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>> index 12046ef51569..0acece55d9cb 100644
>>>> --- a/drivers/net/bonding/bond_main.c
>>>> +++ b/drivers/net/bonding/bond_main.c
>>>> @@ -1237,17 +1237,28 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
>>>> /* must be called in RCU critical section or with RTNL held */
>>>> static bool bond_should_notify_peers(struct bonding *bond)
>>>> {
>>>> - struct slave *slave = rcu_dereference_rtnl(bond->curr_active_slave);
>>>> + struct bond_up_slave *usable;
>>>> + struct slave *slave = NULL;
>>>>
>>>> - if (!slave || !bond->send_peer_notif ||
>>>> + if (!bond->send_peer_notif ||
>>>> bond->send_peer_notif %
>>>> max(1, bond->params.peer_notif_delay) != 0 ||
>>>> - !netif_carrier_ok(bond->dev) ||
>>>> - test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
>>>> + !netif_carrier_ok(bond->dev))
>>>> return false;
>>>>
>>>> + if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>>>> + usable = rcu_dereference_rtnl(bond->usable_slaves);
>>>> + if (!usable || !READ_ONCE(usable->count))
>>>> + return false;
>>>> + } else {
>>>> + slave = rcu_dereference_rtnl(bond->curr_active_slave);
>>>> + if (!slave || test_bit(__LINK_STATE_LINKWATCH_PENDING,
>>>> + &slave->dev->state))
>>>> + return false;
>>>> + }
>>>> +
>>>> netdev_dbg(bond->dev, "bond_should_notify_peers: slave %s\n",
>>>> - slave ? slave->dev->name : "NULL");
>>>> + slave ? slave->dev->name : "all");
>>>
>>> Is it actually correct that if slave == NULL, the notify peers
>>> logic will send to all ports? I'm not sure why this changed.
>> when bond is lacp mode, and send_peer_notif > 0, usable_slaves > 0, then slave == NULL, the debug log will print info "bond_should_notify_peers: slave all”.
> In lacp mode, when broadcast_neighbor enabled, send_peer_notif will be set in ad_cond_set_peer_notif.
Hi Jay
This patch is ok? Or delete the debug info if you're confused. I think this is correct and useful to debug notify peers function.
>>
>> In active-backup mode, slave is not NULL, that means "bond_should_notify_peers: slave xx".
>>>
>>> -J
>>>
>>>>
>>>> return true;
>>>> }
>>>> --
>>>> 2.34.1
>>>
>>> ---
>>> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad
2025-06-17 10:37 ` Tonghao Zhang
@ 2025-06-23 2:11 ` Tonghao Zhang
2025-06-24 19:43 ` Jay Vosburgh
0 siblings, 1 reply; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-23 2:11 UTC (permalink / raw)
To: jv
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu, tonghao
> 2025年6月17日 18:37,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>
>
>
>> 2025年6月17日 08:28,Jay Vosburgh <jv@jvosburgh.net> 写道:
>>
>> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>>
>>> Users can monitor NIC link status changes through netlink. However, LACP
>>> protocol failures may occur despite operational physical links. There is
>>> no way to detect LACP state changes. This patch adds tracepoint at
>>> LACP state transition.
>>
>> This patch really has nothing to do with the rest of the series
>> (it's unrelated to the broadcast_neighbor functionality), and should
>> really be sent separately.
> … monitoring the lacp state is part of “no-stacking” arch solution. So I sent it as series.
> if unnecessary, I will set it separately.
>
>> That said, I recall asking about work that was proposed some
> Sorry I may miss your commits about this patch.
>> time ago to create netlink events (visible to ip monitor, et al) when
>> the LACP state changes. That would be a cleaner method to watch the
>> LACP state machine (as it would integrate with all of the other event
> Why not consider a BPF+tracepoint solution? It provides more flexible LACP data collection with simpler implementation.
We developed a component. It collects kernel events via kprobe, ftrace, and tracepoint. Events include:
- Scheduling latency
- Direct memory reclaim
- Network packets drop
- LACP state events
BPF + tracepoint is our optimal approach. I think we should support this method.
>> infrastructure). Maybe I missed the response, but what became of that
>> work?
>>
>> -J
>>
>>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Cc: Simon Horman <horms@kernel.org>
>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>>> ---
>>> drivers/net/bonding/bond_3ad.c | 6 ++++++
>>> include/trace/events/bonding.h | 37 ++++++++++++++++++++++++++++++++++
>>> 2 files changed, 43 insertions(+)
>>> create mode 100644 include/trace/events/bonding.h
>>>
>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>> index d1c2d416ac87..55703230ab29 100644
>>> --- a/drivers/net/bonding/bond_3ad.c
>>> +++ b/drivers/net/bonding/bond_3ad.c
>>> @@ -16,6 +16,9 @@
>>> #include <net/bond_3ad.h>
>>> #include <net/netlink.h>
>>>
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/bonding.h>
>>> +
>>> /* General definitions */
>>> #define AD_SHORT_TIMEOUT 1
>>> #define AD_LONG_TIMEOUT 0
>>> @@ -1146,6 +1149,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>>> port->actor_port_number,
>>> last_state,
>>> port->sm_mux_state);
>>> +
>>> + trace_3ad_mux_state(port->slave->dev, last_state, port->sm_mux_state);
>>> +
>>> switch (port->sm_mux_state) {
>>> case AD_MUX_DETACHED:
>>> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>>> diff --git a/include/trace/events/bonding.h b/include/trace/events/bonding.h
>>> new file mode 100644
>>> index 000000000000..1ee4b07d912a
>>> --- /dev/null
>>> +++ b/include/trace/events/bonding.h
>>> @@ -0,0 +1,37 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +
>>> +#if !defined(_TRACE_BONDING_H) || defined(TRACE_HEADER_MULTI_READ)
>>> +#define _TRACE_BONDING_H
>>> +
>>> +#include <linux/netdevice.h>
>>> +#include <linux/tracepoint.h>
>>> +
>>> +#undef TRACE_SYSTEM
>>> +#define TRACE_SYSTEM bonding
>>> +
>>> +TRACE_EVENT(3ad_mux_state,
>>> + TP_PROTO(struct net_device *dev, u32 last_state, u32 curr_state),
>>> + TP_ARGS(dev, last_state, curr_state),
>>> +
>>> + TP_STRUCT__entry(
>>> + __field(int, ifindex)
>>> + __string(dev_name, dev->name)
>>> + __field(u32, last_state)
>>> + __field(u32, curr_state)
>>> + ),
>>> +
>>> + TP_fast_assign(
>>> + __entry->ifindex = dev->ifindex;
>>> + __assign_str(dev_name);
>>> + __entry->last_state = last_state;
>>> + __entry->curr_state = curr_state;
>>> + ),
>>> +
>>> + TP_printk("ifindex %d dev %s last_state 0x%x curr_state 0x%x",
>>> + __entry->ifindex, __get_str(dev_name),
>>> + __entry->last_state, __entry->curr_state)
>>> +);
>>> +
>>> +#endif /* _TRACE_BONDING_H */
>>> +
>>> +#include <trace/define_trace.h>
>>> --
>>> 2.34.1
>>>
>>
>> ---
>> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad
2025-06-23 2:11 ` Tonghao Zhang
@ 2025-06-24 19:43 ` Jay Vosburgh
2025-06-26 2:57 ` Tonghao Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2025-06-24 19:43 UTC (permalink / raw)
To: Tonghao Zhang
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>> 2025年6月17日 18:37,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>>
>>> 2025年6月17日 08:28,Jay Vosburgh <jv@jvosburgh.net> 写道:
>>>
>>> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>>>
>>>> Users can monitor NIC link status changes through netlink. However, LACP
>>>> protocol failures may occur despite operational physical links. There is
>>>> no way to detect LACP state changes. This patch adds tracepoint at
>>>> LACP state transition.
>>>
>>> This patch really has nothing to do with the rest of the series
>>> (it's unrelated to the broadcast_neighbor functionality), and should
>>> really be sent separately.
>> … monitoring the lacp state is part of “no-stacking” arch solution. So I sent it as series.
>> if unnecessary, I will set it separately.
>>
>>> That said, I recall asking about work that was proposed some
>> Sorry I may miss your commits about this patch.
>>> time ago to create netlink events (visible to ip monitor, et al) when
>>> the LACP state changes. That would be a cleaner method to watch the
>>> LACP state machine (as it would integrate with all of the other event
>> Why not consider a BPF+tracepoint solution? It provides more flexible LACP data collection with simpler implementation.
>We developed a component. It collects kernel events via kprobe, ftrace, and tracepoint. Events include:
>- Scheduling latency
>- Direct memory reclaim
>- Network packets drop
>- LACP state events
>
>BPF + tracepoint is our optimal approach. I think we should support this method.
At present, as far as I know, networking state change events are
exported to user space through netlink. Absent a compelling reason why
the LACP state change cannot be exported via netlink, my view is that it
should be consistent with all other network events.
Also, to be clear, I'm asking for justification because this is
a request to do something in a special bonding-unique way. There are
already a lot of special cases in bonding, in which things are done
differently than the usual practice. Adding an API element, such as a
tracepoint, is forever, and as such adding one that also differs from
the usual practice deserves scrutiny.
-J
>>> infrastructure). Maybe I missed the response, but what became of that
>>> work?
>>>
>>> -J
>>>
>>>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>> Cc: Simon Horman <horms@kernel.org>
>>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>>>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>>>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>>>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>>>> ---
>>>> drivers/net/bonding/bond_3ad.c | 6 ++++++
>>>> include/trace/events/bonding.h | 37 ++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 43 insertions(+)
>>>> create mode 100644 include/trace/events/bonding.h
>>>>
>>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>>> index d1c2d416ac87..55703230ab29 100644
>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>> @@ -16,6 +16,9 @@
>>>> #include <net/bond_3ad.h>
>>>> #include <net/netlink.h>
>>>>
>>>> +#define CREATE_TRACE_POINTS
>>>> +#include <trace/events/bonding.h>
>>>> +
>>>> /* General definitions */
>>>> #define AD_SHORT_TIMEOUT 1
>>>> #define AD_LONG_TIMEOUT 0
>>>> @@ -1146,6 +1149,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>>>> port->actor_port_number,
>>>> last_state,
>>>> port->sm_mux_state);
>>>> +
>>>> + trace_3ad_mux_state(port->slave->dev, last_state, port->sm_mux_state);
>>>> +
>>>> switch (port->sm_mux_state) {
>>>> case AD_MUX_DETACHED:
>>>> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>>>> diff --git a/include/trace/events/bonding.h b/include/trace/events/bonding.h
>>>> new file mode 100644
>>>> index 000000000000..1ee4b07d912a
>>>> --- /dev/null
>>>> +++ b/include/trace/events/bonding.h
>>>> @@ -0,0 +1,37 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +
>>>> +#if !defined(_TRACE_BONDING_H) || defined(TRACE_HEADER_MULTI_READ)
>>>> +#define _TRACE_BONDING_H
>>>> +
>>>> +#include <linux/netdevice.h>
>>>> +#include <linux/tracepoint.h>
>>>> +
>>>> +#undef TRACE_SYSTEM
>>>> +#define TRACE_SYSTEM bonding
>>>> +
>>>> +TRACE_EVENT(3ad_mux_state,
>>>> + TP_PROTO(struct net_device *dev, u32 last_state, u32 curr_state),
>>>> + TP_ARGS(dev, last_state, curr_state),
>>>> +
>>>> + TP_STRUCT__entry(
>>>> + __field(int, ifindex)
>>>> + __string(dev_name, dev->name)
>>>> + __field(u32, last_state)
>>>> + __field(u32, curr_state)
>>>> + ),
>>>> +
>>>> + TP_fast_assign(
>>>> + __entry->ifindex = dev->ifindex;
>>>> + __assign_str(dev_name);
>>>> + __entry->last_state = last_state;
>>>> + __entry->curr_state = curr_state;
>>>> + ),
>>>> +
>>>> + TP_printk("ifindex %d dev %s last_state 0x%x curr_state 0x%x",
>>>> + __entry->ifindex, __get_str(dev_name),
>>>> + __entry->last_state, __entry->curr_state)
>>>> +);
>>>> +
>>>> +#endif /* _TRACE_BONDING_H */
>>>> +
>>>> +#include <trace/define_trace.h>
>>>> --
>>>> 2.34.1
>>>>
>>>
>>> ---
>>> -Jay Vosburgh, jv@jvosburgh.net
>
>
>
---
-Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad
2025-06-24 19:43 ` Jay Vosburgh
@ 2025-06-26 2:57 ` Tonghao Zhang
0 siblings, 0 replies; 17+ messages in thread
From: Tonghao Zhang @ 2025-06-26 2:57 UTC (permalink / raw)
To: Jay Vosburgh
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Nikolay Aleksandrov, Zengbing Tu
> 2025年6月25日 03:43,Jay Vosburgh <jv@jvosburgh.net> 写道:
>
> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>
>>> 2025年6月17日 18:37,Tonghao Zhang <tonghao@bamaicloud.com> 写道:
>>>
>>>> 2025年6月17日 08:28,Jay Vosburgh <jv@jvosburgh.net> 写道:
>>>>
>>>> Tonghao Zhang <tonghao@bamaicloud.com> wrote:
>>>>
>>>>> Users can monitor NIC link status changes through netlink. However, LACP
>>>>> protocol failures may occur despite operational physical links. There is
>>>>> no way to detect LACP state changes. This patch adds tracepoint at
>>>>> LACP state transition.
>>>>
>>>> This patch really has nothing to do with the rest of the series
>>>> (it's unrelated to the broadcast_neighbor functionality), and should
>>>> really be sent separately.
>>> … monitoring the lacp state is part of “no-stacking” arch solution. So I sent it as series.
>>> if unnecessary, I will set it separately.
>>>
>>>> That said, I recall asking about work that was proposed some
>>> Sorry I may miss your commits about this patch.
>>>> time ago to create netlink events (visible to ip monitor, et al) when
>>>> the LACP state changes. That would be a cleaner method to watch the
>>>> LACP state machine (as it would integrate with all of the other event
>>> Why not consider a BPF+tracepoint solution? It provides more flexible LACP data collection with simpler implementation.
>> We developed a component. It collects kernel events via kprobe, ftrace, and tracepoint. Events include:
>> - Scheduling latency
>> - Direct memory reclaim
>> - Network packets drop
>> - LACP state events
>>
>> BPF + tracepoint is our optimal approach. I think we should support this method.
>
> At present, as far as I know, networking state change events are
> exported to user space through netlink. Absent a compelling reason why
> the LACP state change cannot be exported via netlink, my view is that it
> should be consistent with all other network events.
>
> Also, to be clear, I'm asking for justification because this is
> a request to do something in a special bonding-unique way. There are
> already a lot of special cases in bonding, in which things are done
> differently than the usual practice. Adding an API element, such as a
> tracepoint, is forever, and as such adding one that also differs from
> the usual practice deserves scrutiny.
I will research the netlink notification method and post a patch when it's ready.
>
> -J
>
>
>>>> infrastructure). Maybe I missed the response, but what became of that
>>>> work?
>>>>
>>>> -J
>>>>
>>>>> Cc: Jay Vosburgh <jv@jvosburgh.net>
>>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>> Cc: Simon Horman <horms@kernel.org>
>>>>> Cc: Jonathan Corbet <corbet@lwn.net>
>>>>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
>>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>> Cc: Nikolay Aleksandrov <razor@blackwall.org>
>>>>> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
>>>>> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
>>>>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>>>>> ---
>>>>> drivers/net/bonding/bond_3ad.c | 6 ++++++
>>>>> include/trace/events/bonding.h | 37 ++++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 43 insertions(+)
>>>>> create mode 100644 include/trace/events/bonding.h
>>>>>
>>>>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>>>> index d1c2d416ac87..55703230ab29 100644
>>>>> --- a/drivers/net/bonding/bond_3ad.c
>>>>> +++ b/drivers/net/bonding/bond_3ad.c
>>>>> @@ -16,6 +16,9 @@
>>>>> #include <net/bond_3ad.h>
>>>>> #include <net/netlink.h>
>>>>>
>>>>> +#define CREATE_TRACE_POINTS
>>>>> +#include <trace/events/bonding.h>
>>>>> +
>>>>> /* General definitions */
>>>>> #define AD_SHORT_TIMEOUT 1
>>>>> #define AD_LONG_TIMEOUT 0
>>>>> @@ -1146,6 +1149,9 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>>>>> port->actor_port_number,
>>>>> last_state,
>>>>> port->sm_mux_state);
>>>>> +
>>>>> + trace_3ad_mux_state(port->slave->dev, last_state, port->sm_mux_state);
>>>>> +
>>>>> switch (port->sm_mux_state) {
>>>>> case AD_MUX_DETACHED:
>>>>> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>>>>> diff --git a/include/trace/events/bonding.h b/include/trace/events/bonding.h
>>>>> new file mode 100644
>>>>> index 000000000000..1ee4b07d912a
>>>>> --- /dev/null
>>>>> +++ b/include/trace/events/bonding.h
>>>>> @@ -0,0 +1,37 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>>> +
>>>>> +#if !defined(_TRACE_BONDING_H) || defined(TRACE_HEADER_MULTI_READ)
>>>>> +#define _TRACE_BONDING_H
>>>>> +
>>>>> +#include <linux/netdevice.h>
>>>>> +#include <linux/tracepoint.h>
>>>>> +
>>>>> +#undef TRACE_SYSTEM
>>>>> +#define TRACE_SYSTEM bonding
>>>>> +
>>>>> +TRACE_EVENT(3ad_mux_state,
>>>>> + TP_PROTO(struct net_device *dev, u32 last_state, u32 curr_state),
>>>>> + TP_ARGS(dev, last_state, curr_state),
>>>>> +
>>>>> + TP_STRUCT__entry(
>>>>> + __field(int, ifindex)
>>>>> + __string(dev_name, dev->name)
>>>>> + __field(u32, last_state)
>>>>> + __field(u32, curr_state)
>>>>> + ),
>>>>> +
>>>>> + TP_fast_assign(
>>>>> + __entry->ifindex = dev->ifindex;
>>>>> + __assign_str(dev_name);
>>>>> + __entry->last_state = last_state;
>>>>> + __entry->curr_state = curr_state;
>>>>> + ),
>>>>> +
>>>>> + TP_printk("ifindex %d dev %s last_state 0x%x curr_state 0x%x",
>>>>> + __entry->ifindex, __get_str(dev_name),
>>>>> + __entry->last_state, __entry->curr_state)
>>>>> +);
>>>>> +
>>>>> +#endif /* _TRACE_BONDING_H */
>>>>> +
>>>>> +#include <trace/define_trace.h>
>>>>> --
>>>>> 2.34.1
>>>>>
>>>>
>>>> ---
>>>> -Jay Vosburgh, jv@jvosburgh.net
>>
>>
>>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-06-26 2:58 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-10 3:44 [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 1/4] net: bonding: add broadcast_neighbor option for 802.3ad Tonghao Zhang
2025-06-16 23:04 ` Jakub Kicinski
2025-06-17 10:25 ` Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 2/4] net: bonding: add broadcast_neighbor netlink option Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 3/4] net: bonding: send peer notify when failure recovery Tonghao Zhang
2025-06-17 0:36 ` Jay Vosburgh
2025-06-17 10:47 ` Tonghao Zhang
2025-06-17 11:39 ` Tonghao Zhang
2025-06-18 2:51 ` Tonghao Zhang
2025-06-10 3:44 ` [net-next v6 4/4] net: bonding: add tracepoint for 802.3ad Tonghao Zhang
2025-06-17 0:28 ` Jay Vosburgh
2025-06-17 10:37 ` Tonghao Zhang
2025-06-23 2:11 ` Tonghao Zhang
2025-06-24 19:43 ` Jay Vosburgh
2025-06-26 2:57 ` Tonghao Zhang
2025-06-16 2:07 ` [net-next v6 0/4] add broadcast_neighbor for no-stacking networking arch Tonghao Zhang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).