netdev.vger.kernel.org archive mirror
* [PATCHv2 net] bonding: fix multicast MAC address synchronization
From: Hangbin Liu @ 2025-08-05  8:09 UTC
  To: netdev
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
	linux-kernel, Hangbin Liu, Liang Li

There is a corner case where the NS (Neighbor Solicitation) target is set to
an invalid or unreachable address. In such cases, all the slave links are
marked as down and set to *backup*. This causes the bond to add multicast MAC
addresses to all slaves. The ARP monitor then cycles through the slaves to
probe them, temporarily marking each one as *active*.

Later, if the NS target is changed or cleared during this probe cycle, the
*active* slave will fail to remove its NS multicast address because
bond_slave_ns_maddrs_del() only removes addresses from backup slaves.
This leaves stale multicast MACs on the interface.

To fix this, we move the NS multicast MAC address handling into
bond_set_slave_state(), so every slave state transition consistently
adds/removes NS multicast addresses as needed.

We also ensure this logic is only active when arp_interval is configured,
to prevent misconfiguration or accidental behavior in unsupported modes.

Note: Cleanup in __bond_release_one() is retained to remove addresses
when the slave is unbound from the bond.

Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device")
Reported-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v2:
1) Make sure arp_interval is set before setting the slave MAC address.
2) Add a comment explaining why slave->backup must be set between the two
   if blocks.
3) Update the commit description.
---
 drivers/net/bonding/bond_main.c    |  9 ---------
 drivers/net/bonding/bond_options.c |  1 +
 include/net/bonding.h              | 17 +++++++++++++++++
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..283615d8a3fd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1003,8 +1003,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
 
 		if (bond->dev->flags & IFF_UP)
 			bond_hw_addr_flush(bond->dev, old_active->dev);
-
-		bond_slave_ns_maddrs_add(bond, old_active);
 	}
 
 	if (new_active) {
@@ -1021,8 +1019,6 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
 			dev_mc_sync(new_active->dev, bond->dev);
 			netif_addr_unlock_bh(bond->dev);
 		}
-
-		bond_slave_ns_maddrs_del(bond, new_active);
 	}
 }
 
@@ -2373,11 +2369,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 	bond_compute_features(bond);
 	bond_set_carrier(bond);
 
-	/* Needs to be called before bond_select_active_slave(), which will
-	 * remove the maddrs if the slave is selected as active slave.
-	 */
-	bond_slave_ns_maddrs_add(bond, new_slave);
-
 	if (bond_uses_primary(bond)) {
 		block_netpoll_tx();
 		bond_select_active_slave(bond);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 1d639a3be6ba..f54386982198 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1264,6 +1264,7 @@ static int bond_option_arp_ip_targets_set(struct bonding *bond,
 static bool slave_can_set_ns_maddr(const struct bonding *bond, struct slave *slave)
 {
 	return BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
+	       bond->params.arp_interval &&
 	       !bond_is_active_slave(slave) &&
 	       slave->dev->flags & IFF_MULTICAST;
 }
diff --git a/include/net/bonding.h b/include/net/bonding.h
index e06f0d63b2c1..951d752a5301 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -388,7 +388,24 @@ static inline void bond_set_slave_state(struct slave *slave,
 	if (slave->backup == slave_state)
 		return;
 
+	/*
+	 * The slave->backup assignment must occur between the two if blocks.
+	 * This is because bond_slave_ns_maddrs_{add,del} only operate on
+	 * backup slaves:
+	 *
+	 * - If the slave is transitioning to active, we must call
+	 *   bond_slave_ns_maddrs_del() *before* updating the backup flag.
+	 * - If transitioning to backup, we must call
+	 *   bond_slave_ns_maddrs_add() *after* setting the flag.
+	 */
+	if (slave_state == BOND_STATE_ACTIVE)
+		bond_slave_ns_maddrs_del(slave->bond, slave);
+
 	slave->backup = slave_state;
+
+	if (slave_state == BOND_STATE_BACKUP)
+		bond_slave_ns_maddrs_add(slave->bond, slave);
+
 	if (notify) {
 		bond_lower_state_changed(slave);
 		bond_queue_slave_event(slave);
-- 
2.46.0



* Re: [PATCHv2 net] bonding: fix multicast MAC address synchronization
From: Paolo Abeni @ 2025-08-12  8:42 UTC
  To: Hangbin Liu, netdev
  Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Nikolay Aleksandrov, Simon Horman, linux-kernel,
	Liang Li

On 8/5/25 10:09 AM, Hangbin Liu wrote:
> There is a corner case where the NS (Neighbor Solicitation) target is set to
> an invalid or unreachable address. In such cases, all the slave links are
> marked as down and set to *backup*. This causes the bond to add multicast MAC
> addresses to all slaves. The ARP monitor then cycles through the slaves to
> probe them, temporarily marking each one as *active*.
> 
> Later, if the NS target is changed or cleared during this probe cycle, the
> *active* slave will fail to remove its NS multicast address because
> bond_slave_ns_maddrs_del() only removes addresses from backup slaves.
> This leaves stale multicast MACs on the interface.
> 
> To fix this, we move the NS multicast MAC address handling into
> bond_set_slave_state(), so every slave state transition consistently
> adds/removes NS multicast addresses as needed.
> 
> We also ensure this logic is only active when arp_interval is configured,
> to prevent misconfiguration or accidental behavior in unsupported modes.

As noted by Jay in the previous revision, moving the handling into
bond_set_slave_state() could possibly impact a lot of scenarios, and
it's not obvious to me that restricting to arp_interval != 0 would be
sufficient.

I'm wondering if the issue could/should instead be addressed by explicitly
handling the MAC swap for the active slave at NS target change time. WDYT?

Thanks,

Paolo



* Re: [PATCHv2 net] bonding: fix multicast MAC address synchronization
From: Hangbin Liu @ 2025-08-13  4:04 UTC
  To: Paolo Abeni
  Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Nikolay Aleksandrov, Simon Horman, linux-kernel,
	Liang Li

On Tue, Aug 12, 2025 at 10:42:22AM +0200, Paolo Abeni wrote:
> On 8/5/25 10:09 AM, Hangbin Liu wrote:
> > There is a corner case where the NS (Neighbor Solicitation) target is set to
> > an invalid or unreachable address. In such cases, all the slave links are
> > marked as down and set to *backup*. This causes the bond to add multicast MAC
> > addresses to all slaves. The ARP monitor then cycles through the slaves to
> > probe them, temporarily marking each one as *active*.
> > 
> > Later, if the NS target is changed or cleared during this probe cycle, the
> > *active* slave will fail to remove its NS multicast address because
> > bond_slave_ns_maddrs_del() only removes addresses from backup slaves.
> > This leaves stale multicast MACs on the interface.
> > 
> > To fix this, we move the NS multicast MAC address handling into
> > bond_set_slave_state(), so every slave state transition consistently
> > adds/removes NS multicast addresses as needed.
> > 
> > We also ensure this logic is only active when arp_interval is configured,
> > to prevent misconfiguration or accidental behavior in unsupported modes.
> 
> As noted by Jay in the previous revision, moving the handling into
> bond_set_slave_state() could possibly impact a lot of scenarios, and
> it's not obvious to me that restricting to arp_interval != 0 would be
> sufficient.

I understand your concern. The bond_set_slave_state() function is called by:
  - bond_set_slave_inactive_flags
  - bond_set_slave_tx_disabled_flags
  - bond_set_slave_active_flags

These functions are mainly invoked via bond_change_active_slave, bond_enslave,
bond_ab_arp_commit, and bond_miimon_commit.

To avoid misconfiguration, in slave_can_set_ns_maddr() I tried to limit
changes to the backup slave when operating in active-backup mode with
arp_interval enabled. I also ensured that the multicast address is only
modified when the NS target is set.
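
As a rough illustration of that guard, the conditions can be modeled as a
single predicate. This is a simplified user-space sketch, not the kernel
code: the sketch_* types and fields are hypothetical stand-ins, and the NS
target check is folded into the predicate here for brevity, whereas the patch
in this thread only adds the arp_interval test to slave_can_set_ns_maddr().

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum sketch_mode { SKETCH_MODE_ROUNDROBIN, SKETCH_MODE_ACTIVEBACKUP };

struct sketch_bond {
	enum sketch_mode mode;
	int arp_interval;	/* 0 means the ARP monitor is disabled */
	bool ns_target_set;	/* at least one NS target is configured */
};

struct sketch_slave {
	bool is_active;		/* currently the bond's active slave */
	bool multicast;		/* models IFF_MULTICAST on the device */
};

/* True only when every condition listed above holds. */
static bool sketch_can_set_ns_maddr(const struct sketch_bond *bond,
				    const struct sketch_slave *slave)
{
	return bond->mode == SKETCH_MODE_ACTIVEBACKUP &&
	       bond->arp_interval != 0 &&
	       bond->ns_target_set &&
	       !slave->is_active &&
	       slave->multicast;
}
```

In this model, clearing arp_interval or marking the slave active makes the
predicate false, so no multicast address would be touched.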

> 
> I'm wondering if the issue could/should instead be addressed by explicitly
> handling the MAC swap for the active slave at NS target change time. WDYT?

The problem is that bond_hw_addr_swap() is only called from
bond_ab_arp_commit() during ARP monitoring, while the bond sets the
active/inactive flags in bond_ab_arp_probe(). The two run as separate steps:

bond_activebackup_arp_mon
 - bond_ab_arp_commit
   - bond_select_active_slave
     - bond_change_active_slave
       - bond_hw_addr_swap
 - bond_ab_arp_probe
   - bond_set_slave_{active/inactive}_flags

On the other hand, we need to set the multicast address on the *temporary*
active interface to ensure we can receive the replied NA message. The MAC
swap only happens when the *actual* active interface is chosen.

This is why I chose to place the multicast address configuration in
bond_set_slave_state().
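
To make the ordering constraint from the patch comment concrete, here is a
small user-space model. It is only a sketch under stated assumptions: the
toy_* names are hypothetical, has_ns_maddr stands in for the NS multicast MAC
in the device filter, and the helpers mimic only the "act on backup slaves
only" behavior of bond_slave_ns_maddrs_{add,del} described in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the values mirror active = 0, backup = 1. */
enum toy_state { TOY_STATE_ACTIVE = 0, TOY_STATE_BACKUP = 1 };

struct toy_slave {
	int backup;		/* models slave->backup */
	bool has_ns_maddr;	/* models the NS multicast MAC on the device */
};

/* Like the helpers discussed above, these only act on backup slaves. */
static void toy_ns_maddrs_add(struct toy_slave *s)
{
	if (s->backup == TOY_STATE_BACKUP)
		s->has_ns_maddr = true;
}

static void toy_ns_maddrs_del(struct toy_slave *s)
{
	if (s->backup == TOY_STATE_BACKUP)
		s->has_ns_maddr = false;
}

/* Ordering used by the patch: del, then flip the flag, then add. */
static void toy_set_slave_state(struct toy_slave *s, int new_state)
{
	if (s->backup == new_state)
		return;

	if (new_state == TOY_STATE_ACTIVE)
		toy_ns_maddrs_del(s);	/* still backup here, so del works */

	s->backup = new_state;

	if (new_state == TOY_STATE_BACKUP)
		toy_ns_maddrs_add(s);	/* already backup here, so add works */
}

/* Counter-example: flipping the flag first turns the del into a no-op. */
static void toy_set_slave_state_flag_first(struct toy_slave *s, int new_state)
{
	if (s->backup == new_state)
		return;

	s->backup = new_state;

	if (new_state == TOY_STATE_ACTIVE)
		toy_ns_maddrs_del(s);	/* no longer backup: nothing removed */
	else
		toy_ns_maddrs_add(s);
}
```

In this model, moving a backup slave to active with toy_set_slave_state()
clears has_ns_maddr, while the flag-first variant leaves it set, which
corresponds to the stale-address symptom described in the commit message.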

Thanks
Hangbin

