[Discuss] IPv4 packets lost with macvlan over bond alb

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [Discuss] IPv4 packets lost with macvlan over bond alb
@ 2023-06-02  8:08 Hangbin Liu
  2023-06-08  3:43 ` Hangbin Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hangbin Liu @ 2023-06-02  8:08 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

Hi Jay,

It looks there is an regression for commit 14af9963ba1e ("bonding: Support
macvlans on top of tlb/rlb mode bonds"). The author export modified ARP to
remote when there is macvlan over bond, which make remote add neighbor
with macvlan's IP and bond's mac. The author expect RLB will replace all
inner packets to correct mac address if target is macvlan, but RLB only
handle ARP packets. This make all none arp packets macvlan received have
incorrect mac address, and dropped directly.

In short, remote client learned macvlan's ip with bond's mac. So the macvlan
will receive packets with incorrect macs and dropped.

To fix this, one way is to revert the patch and only send learning packets for
both tlb and alb mode for macvlan. This would make all macvlan rx packets go
through bond's active slave.

Another way is to replace the bond's mac address to correct macvlan's address
based on the rx_hashtbl . But this may has impact to the receive performance
since we need to check all the upper devices and deal the mac address for
each packets in bond_handle_frame().

So which way do you prefer?

Reproducer:
```
#!/bin/bash

# Source the topo in bond selftest
source bond_topo_3d1c.sh

trap cleanup EXIT

setup_prepare
bond_reset "mode balance-alb"
ip -n ${s_ns} addr flush dev bond0

ip -n ${s_ns} link add link bond0 name macv0 type macvlan mode bridge
ip -n ${s_ns} link set macv0 up

# I just add macvlan on the server netns, you can also move it to another netns for testing
ip -n ${s_ns} addr add ${s_ip4}/24 dev macv0
ip -n ${s_ns} addr add ${s_ip6}/24 dev macv0
ip netns exec ${c_ns} ping ${s_ip4} -c 4
sleep 5
ip netns exec ${c_ns} ping ${s_ip4} -c 4
```

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-02  8:08 [Discuss] IPv4 packets lost with macvlan over bond alb Hangbin Liu
@ 2023-06-08  3:43 ` Hangbin Liu
  2023-06-08  4:44   ` Jay Vosburgh
  2023-06-17  1:45   ` Jay Vosburgh
  0 siblings, 2 replies; 9+ messages in thread
From: Hangbin Liu @ 2023-06-08  3:43 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

Hi Jay, any thoughts?

Thanks
Hangbin
On Fri, Jun 02, 2023 at 04:09:00PM +0800, Hangbin Liu wrote:
> Hi Jay,
> 
> It looks there is an regression for commit 14af9963ba1e ("bonding: Support
> macvlans on top of tlb/rlb mode bonds"). The author export modified ARP to
> remote when there is macvlan over bond, which make remote add neighbor
> with macvlan's IP and bond's mac. The author expect RLB will replace all
> inner packets to correct mac address if target is macvlan, but RLB only
> handle ARP packets. This make all none arp packets macvlan received have
> incorrect mac address, and dropped directly.
> 
> In short, remote client learned macvlan's ip with bond's mac. So the macvlan
> will receive packets with incorrect macs and dropped.
> 
> To fix this, one way is to revert the patch and only send learning packets for
> both tlb and alb mode for macvlan. This would make all macvlan rx packets go
> through bond's active slave.
> 
> Another way is to replace the bond's mac address to correct macvlan's address
> based on the rx_hashtbl . But this may has impact to the receive performance
> since we need to check all the upper devices and deal the mac address for
> each packets in bond_handle_frame().
> 
> So which way do you prefer?
> 
> Reproducer:
> ```
> #!/bin/bash
> 
> # Source the topo in bond selftest
> source bond_topo_3d1c.sh
> 
> trap cleanup EXIT
> 
> setup_prepare
> bond_reset "mode balance-alb"
> ip -n ${s_ns} addr flush dev bond0
> 
> ip -n ${s_ns} link add link bond0 name macv0 type macvlan mode bridge
> ip -n ${s_ns} link set macv0 up
> 
> # I just add macvlan on the server netns, you can also move it to another netns for testing
> ip -n ${s_ns} addr add ${s_ip4}/24 dev macv0
> ip -n ${s_ns} addr add ${s_ip6}/24 dev macv0
> ip netns exec ${c_ns} ping ${s_ip4} -c 4
> sleep 5
> ip netns exec ${c_ns} ping ${s_ip4} -c 4
> ```
> 
> Thanks
> Hangbin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-08  3:43 ` Hangbin Liu
@ 2023-06-08  4:44   ` Jay Vosburgh
  2023-06-17  1:45   ` Jay Vosburgh
  1 sibling, 0 replies; 9+ messages in thread
From: Jay Vosburgh @ 2023-06-08  4:44 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>Hi Jay, any thoughts?

	Only that the more I look into this the more confused I get.  I
haven't had a chance to set it up to test and poke at it, but did look
through the code and some of the history.

>Thanks
>Hangbin
>On Fri, Jun 02, 2023 at 04:09:00PM +0800, Hangbin Liu wrote:
>> Hi Jay,
>> 
>> It looks there is an regression for commit 14af9963ba1e ("bonding: Support
>> macvlans on top of tlb/rlb mode bonds"). The author export modified ARP to
>> remote when there is macvlan over bond, which make remote add neighbor
>> with macvlan's IP and bond's mac.

	There was a report similar to yours in 2017,

https://lore.kernel.org/netdev/CAJN_NGada81u96VSRz=puy3DOXjqJ6H9vNkMjgGFBea3vgrPPQ@mail.gmail.com/

	with no responses, so I presume this hasn't worked for at least
that long (suggesting that macvlan over balance-alb is not a popular
deployment choice).  That leads me to wonder if the 14af9963ba1e patch
ever worked entirely correctly.  Perhaps whatever Vlad did to test it
missed the condition that you and the above reporter are seeing, or it
works in only a subset of macvlan modes, or maybe it did work correctly
when 14af9963ba1e was applied.

>> [...] The author expect RLB will replace all
>> inner packets to correct mac address if target is macvlan, but RLB only
>> handle ARP packets. This make all none arp packets macvlan received have
>> incorrect mac address, and dropped directly.

	Reading the 14af9963ba1e commit message, I think you're
referring to:

    To make RLB work, all we have to do is accept ARP packets
    for addresses added to the bond dev->uc list.  Since RLB
    mode will take care to update the peers directly with
    correct mac addresses, learning packets for these addresses
    do not have be send to switch.

	I don't think that means he expects RLB to fix up the MACs in
every incoming packet.  I'm reading this as that the macvlan peers will
be issued ARPs that match the IP address to the macvlan MAC address, and
that those packets should be accepted because the macvlan MAC address is
in the bond interface's unicast MAC address list (dev->uc).

>> In short, remote client learned macvlan's ip with bond's mac. So the macvlan
>> will receive packets with incorrect macs and dropped.
>> 
>> To fix this, one way is to revert the patch and only send learning packets for
>> both tlb and alb mode for macvlan. This would make all macvlan rx packets go
>> through bond's active slave.
>> 
>> Another way is to replace the bond's mac address to correct macvlan's address
>> based on the rx_hashtbl . But this may has impact to the receive performance
>> since we need to check all the upper devices and deal the mac address for
>> each packets in bond_handle_frame().
>> 
>> So which way do you prefer?

	I don't yet understand what's going on well enough have an
informed opinion; in particular, if this did work correctly when
14af9963ba1e was originally applied, then it's not clear what broke it.
I should have some time to test this in the next day or two.

	-J


>> Reproducer:
>> ```
>> #!/bin/bash
>> 
>> # Source the topo in bond selftest
>> source bond_topo_3d1c.sh
>> 
>> trap cleanup EXIT
>> 
>> setup_prepare
>> bond_reset "mode balance-alb"
>> ip -n ${s_ns} addr flush dev bond0
>> 
>> ip -n ${s_ns} link add link bond0 name macv0 type macvlan mode bridge
>> ip -n ${s_ns} link set macv0 up
>> 
>> # I just add macvlan on the server netns, you can also move it to another netns for testing
>> ip -n ${s_ns} addr add ${s_ip4}/24 dev macv0
>> ip -n ${s_ns} addr add ${s_ip6}/24 dev macv0
>> ip netns exec ${c_ns} ping ${s_ip4} -c 4
>> sleep 5
>> ip netns exec ${c_ns} ping ${s_ip4} -c 4
>> ```
>> 
>> Thanks
>> Hangbin

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-08  3:43 ` Hangbin Liu
  2023-06-08  4:44   ` Jay Vosburgh
@ 2023-06-17  1:45   ` Jay Vosburgh
  2023-06-17  8:29     ` Hangbin Liu
  1 sibling, 1 reply; 9+ messages in thread
From: Jay Vosburgh @ 2023-06-17  1:45 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>Hi Jay, any thoughts?

	Just an update that I've done some poking with your reproducer;
as described, the ARP reply for the IP address associated with the
macvlan interface is coming back with the bond's MAC, not the MAC of the
macvlan.  If I manually create a neighbour entry on the client with the
corrent IP and MAC for the macvlan, then connectivity works as expected.

	I'll have to look a bit further into the ARP MAC selection logic
here (i.e., where does that MAC come from when the ARP reply is
generated).  It also makes me wonder what's special about macvlan, e.g.,
why doesn't regular VLAN (or other stacked devices) fail in the same way
(or maybe it does and nobody has noticed).

	-J

>Thanks
>Hangbin
>On Fri, Jun 02, 2023 at 04:09:00PM +0800, Hangbin Liu wrote:
>> Hi Jay,
>> 
>> It looks there is an regression for commit 14af9963ba1e ("bonding: Support
>> macvlans on top of tlb/rlb mode bonds"). The author export modified ARP to
>> remote when there is macvlan over bond, which make remote add neighbor
>> with macvlan's IP and bond's mac. The author expect RLB will replace all
>> inner packets to correct mac address if target is macvlan, but RLB only
>> handle ARP packets. This make all none arp packets macvlan received have
>> incorrect mac address, and dropped directly.
>> 
>> In short, remote client learned macvlan's ip with bond's mac. So the macvlan
>> will receive packets with incorrect macs and dropped.
>> 
>> To fix this, one way is to revert the patch and only send learning packets for
>> both tlb and alb mode for macvlan. This would make all macvlan rx packets go
>> through bond's active slave.
>> 
>> Another way is to replace the bond's mac address to correct macvlan's address
>> based on the rx_hashtbl . But this may has impact to the receive performance
>> since we need to check all the upper devices and deal the mac address for
>> each packets in bond_handle_frame().
>> 
>> So which way do you prefer?
>> 
>> Reproducer:
>> ```
>> #!/bin/bash
>> 
>> # Source the topo in bond selftest
>> source bond_topo_3d1c.sh
>> 
>> trap cleanup EXIT
>> 
>> setup_prepare
>> bond_reset "mode balance-alb"
>> ip -n ${s_ns} addr flush dev bond0
>> 
>> ip -n ${s_ns} link add link bond0 name macv0 type macvlan mode bridge
>> ip -n ${s_ns} link set macv0 up
>> 
>> # I just add macvlan on the server netns, you can also move it to another netns for testing
>> ip -n ${s_ns} addr add ${s_ip4}/24 dev macv0
>> ip -n ${s_ns} addr add ${s_ip6}/24 dev macv0
>> ip netns exec ${c_ns} ping ${s_ip4} -c 4
>> sleep 5
>> ip netns exec ${c_ns} ping ${s_ip4} -c 4
>> ```
>> 
>> Thanks
>> Hangbin

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-17  1:45   ` Jay Vosburgh
@ 2023-06-17  8:29     ` Hangbin Liu
  2023-06-22  1:42       ` Jay Vosburgh
  0 siblings, 1 reply; 9+ messages in thread
From: Hangbin Liu @ 2023-06-17  8:29 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

On Sat, Jun 17, 2023 at 9:45 AM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> >Hi Jay, any thoughts?
>
>         Just an update that I've done some poking with your reproducer;
> as described, the ARP reply for the IP address associated with the
> macvlan interface is coming back with the bond's MAC, not the MAC of the
> macvlan.  If I manually create a neighbour entry on the client with the
> corrent IP and MAC for the macvlan, then connectivity works as expected.
>
>         I'll have to look a bit further into the ARP MAC selection logic
> here (i.e., where does that MAC come from when the ARP reply is
> generated).  It also makes me wonder what's special about macvlan, e.g.,
> why doesn't regular VLAN (or other stacked devices) fail in the same way
> (or maybe it does and nobody has noticed).

VLAN or other overlay devices use the same MAC address with Bonding.
So they work with the alb mode. But MACVLAN uses different MAC
addresses with Bonding.

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-17  8:29     ` Hangbin Liu
@ 2023-06-22  1:42       ` Jay Vosburgh
  2023-06-25  7:37         ` Hangbin Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Jay Vosburgh @ 2023-06-22  1:42 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev

Hangbin Liu <liuhangbin@gmail.com> wrote:

>On Sat, Jun 17, 2023 at 9:45 AM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>
>> Hangbin Liu <liuhangbin@gmail.com> wrote:
>>
>> >Hi Jay, any thoughts?
>>
>>         Just an update that I've done some poking with your reproducer;
>> as described, the ARP reply for the IP address associated with the
>> macvlan interface is coming back with the bond's MAC, not the MAC of the
>> macvlan.  If I manually create a neighbour entry on the client with the
>> corrent IP and MAC for the macvlan, then connectivity works as expected.
>>
>>         I'll have to look a bit further into the ARP MAC selection logic
>> here (i.e., where does that MAC come from when the ARP reply is
>> generated).  It also makes me wonder what's special about macvlan, e.g.,
>> why doesn't regular VLAN (or other stacked devices) fail in the same way
>> (or maybe it does and nobody has noticed).
>
>VLAN or other overlay devices use the same MAC address with Bonding.
>So they work with the alb mode. But MACVLAN uses different MAC
>addresses with Bonding.

	By default, yes, VLANs use the same MAC, but may use a different
MAC than the device the VLAN is configured above.  However, changing the
VLAN's MAC address in a similar test case (VLAN above balance-alb bond)
still works, because VLAN packets are delivered by matching the VLAN ID
(via vlan_do_receive() -> vlan_find_dev()), not via the MAC address.

	So, the RLB MAC edits done by rlb_arp_xmit() work in the sense
that traffic flows, even though peers see a MAC address from the bond
for the VLAN IP, not the VLAN's actual MAC address.

	A bridge can also use a MAC address that differs from the bond,
but rlb_arp_xmit() has a special case for bridge, and doesn't alter the
ARP if the relevant IP address is on a bridge (so, no balancing).

	Changing rlb_arp_xmit() to add macvlan to the bridge check makes
the test case pass, e.g.,

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index b9dbad3a8af8..f720c419dfb7 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -668,7 +668,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
 
 	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
 	if (dev) {
-		if (netif_is_bridge_master(dev)) {
+		if (netif_is_bridge_master(dev) || netif_is_macvlan(dev)) {
 			dev_put(dev);
 			return NULL;
 		}

	but I'm not completely sure that's doing the right thing with
regard to the macvlan logic in alb_send_learning_packets() ->
alb_upper_dev_walk().

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-22  1:42       ` Jay Vosburgh
@ 2023-06-25  7:37         ` Hangbin Liu
  2023-07-14  9:18           ` Hangbin Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hangbin Liu @ 2023-06-25  7:37 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

Hi Jay,
On Wed, Jun 21, 2023 at 06:42:10PM -0700, Jay Vosburgh wrote:
> 	By default, yes, VLANs use the same MAC, but may use a different
> MAC than the device the VLAN is configured above.  However, changing the
> VLAN's MAC address in a similar test case (VLAN above balance-alb bond)
> still works, because VLAN packets are delivered by matching the VLAN ID
> (via vlan_do_receive() -> vlan_find_dev()), not via the MAC address.
> 
> 	So, the RLB MAC edits done by rlb_arp_xmit() work in the sense
> that traffic flows, even though peers see a MAC address from the bond
> for the VLAN IP, not the VLAN's actual MAC address.
> 
> 	A bridge can also use a MAC address that differs from the bond,
> but rlb_arp_xmit() has a special case for bridge, and doesn't alter the
> ARP if the relevant IP address is on a bridge (so, no balancing).
> 
> 	Changing rlb_arp_xmit() to add macvlan to the bridge check makes
> the test case pass, e.g.,
> 
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index b9dbad3a8af8..f720c419dfb7 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -668,7 +668,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
>  
>  	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
>  	if (dev) {
> -		if (netif_is_bridge_master(dev)) {
> +		if (netif_is_bridge_master(dev) || netif_is_macvlan(dev)) {

This is not enough. As a common usecase is to attach the macvlan to another
namespace(Apologise that my reproducer only has a comment, but no set the macvlan
to a separate namespace). So ip_dev_find() could not find the macvlan dev.

Using netif_is_macvlan_port() could make sure the bonding is under macvlan
devices. But what if there are middle devices between bond and macvlan? Maybe
we need to check each upper device?

If we want to skip arp xmit for macvlan, it looks like a partial revert of
14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds"), which
means we can keep the TLB part and revert the RLB modification.


So my draft patch is:

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index b9dbad3a8af8..c27f1a78a94b 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -663,7 +663,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
 	/* Don't modify or load balance ARPs that do not originate locally
 	 * (e.g.,arrive via a bridge).
 	 */
-	if (!bond_slave_has_mac_rx(bond, arp->mac_src))
+	if (!bond_slave_has_mac_rcu(bond, arp->mac_src))
 		return NULL;
 
 	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
@@ -960,7 +960,6 @@ static int alb_upper_dev_walk(struct net_device *upper,
 			      struct netdev_nested_priv *priv)
 {
 	struct alb_walk_data *data = (struct alb_walk_data *)priv->data;
-	bool strict_match = data->strict_match;
 	const u8 *mac_addr = data->mac_addr;
 	struct bonding *bond = data->bond;
 	struct slave *slave = data->slave;
@@ -979,10 +978,7 @@ static int alb_upper_dev_walk(struct net_device *upper,
 		}
 	}
 
-	/* If this is a macvlan device, then only send updates
-	 * when strict_match is turned off.
-	 */
-	if (netif_is_macvlan(upper) && !strict_match) {
+	if (netif_is_macvlan(upper)) {
 		tags = bond_verify_device_path(bond->dev, upper, 0);
 		if (IS_ERR_OR_NULL(tags))
 			BUG();
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 59955ac33157..6e4e406d8cd2 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -724,23 +724,14 @@ static inline struct slave *bond_slave_has_mac(struct bonding *bond,
 }
 
 /* Caller must hold rcu_read_lock() for read */
-static inline bool bond_slave_has_mac_rx(struct bonding *bond, const u8 *mac)
+static inline bool bond_slave_has_mac_rcu(struct bonding *bond, const u8 *mac)
 {
 	struct list_head *iter;
 	struct slave *tmp;
-	struct netdev_hw_addr *ha;
 
 	bond_for_each_slave_rcu(bond, tmp, iter)
 		if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr))
 			return true;
-
-	if (netdev_uc_empty(bond->dev))
-		return false;
-
-	netdev_for_each_uc_addr(ha, bond->dev)
-		if (ether_addr_equal_64bits(mac, ha->addr))
-			return true;
-
 	return false;
 }
 

If we want to keep the macvlan support for ALB, as I said, the second way
is restore the macvlan mac address when receive message for macvlan port, e.g.:
I didn't test which way affect the performance more.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index edbaa1444f8e..379d4c139b23 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1585,6 +1585,36 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 				  bond->dev->addr_len);
 	}
 
+	if (BOND_MODE(bond) == BOND_MODE_ALB &&
+	    netif_is_macvlan_port(bond->dev) &&
+	    skb->pkt_type == PACKET_HOST) {
+		struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+		struct rlb_client_info *client_info;
+		u32 hash_index;
+
+		spin_lock_bh(&bond->mode_lock);
+
+		hash_index = bond_info->rx_hashtbl_used_head;
+		for (; hash_index != RLB_NULL_INDEX;
+		     hash_index = client_info->used_next) {
+
+			client_info = &(bond_info->rx_hashtbl[hash_index]);
+
+			if (ip_hdr(skb)->saddr == client_info->ip_dst &&
+			    ip_hdr(skb)->daddr == client_info->ip_src) {
+
+				if (unlikely(skb_cow_head(skb, skb->data - skb_mac_header(skb)))) {
+					kfree_skb(skb);
+					return RX_HANDLER_CONSUMED;
+				}
+				bond_hw_addr_copy(eth_hdr(skb)->h_dest,
+						  client_info->mac_src, ETH_ALEN);
+			}
+		}
+
+		spin_unlock_bh(&bond->mode_lock);
+	}
+
 	return ret;
 }
 

Thanks
Hangbin

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-06-25  7:37         ` Hangbin Liu
@ 2023-07-14  9:18           ` Hangbin Liu
  2023-08-05  8:25             ` Hangbin Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hangbin Liu @ 2023-07-14  9:18 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

Hi Jay,

Any thoughts for my last reply?

Thanks
Hangbin
On Sun, Jun 25, 2023 at 03:37:10PM +0800, Hangbin Liu wrote:
> Hi Jay,
> On Wed, Jun 21, 2023 at 06:42:10PM -0700, Jay Vosburgh wrote:
> > 	By default, yes, VLANs use the same MAC, but may use a different
> > MAC than the device the VLAN is configured above.  However, changing the
> > VLAN's MAC address in a similar test case (VLAN above balance-alb bond)
> > still works, because VLAN packets are delivered by matching the VLAN ID
> > (via vlan_do_receive() -> vlan_find_dev()), not via the MAC address.
> > 
> > 	So, the RLB MAC edits done by rlb_arp_xmit() work in the sense
> > that traffic flows, even though peers see a MAC address from the bond
> > for the VLAN IP, not the VLAN's actual MAC address.
> > 
> > 	A bridge can also use a MAC address that differs from the bond,
> > but rlb_arp_xmit() has a special case for bridge, and doesn't alter the
> > ARP if the relevant IP address is on a bridge (so, no balancing).
> > 
> > 	Changing rlb_arp_xmit() to add macvlan to the bridge check makes
> > the test case pass, e.g.,
> > 
> > diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> > index b9dbad3a8af8..f720c419dfb7 100644
> > --- a/drivers/net/bonding/bond_alb.c
> > +++ b/drivers/net/bonding/bond_alb.c
> > @@ -668,7 +668,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> >  
> >  	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
> >  	if (dev) {
> > -		if (netif_is_bridge_master(dev)) {
> > +		if (netif_is_bridge_master(dev) || netif_is_macvlan(dev)) {
> 
> This is not enough. As a common usecase is to attach the macvlan to another
> namespace(Apologise that my reproducer only has a comment, but no set the macvlan
> to a separate namespace). So ip_dev_find() could not find the macvlan dev.
> 
> Using netif_is_macvlan_port() could make sure the bonding is under macvlan
> devices. But what if there are middle devices between bond and macvlan? Maybe
> we need to check each upper device?
> 
> If we want to skip arp xmit for macvlan, it looks like a partial revert of
> 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds"), which
> means we can keep the TLB part and revert the RLB modification.
> 
> 
> So my draft patch is:
> 
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index b9dbad3a8af8..c27f1a78a94b 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -663,7 +663,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
>  	/* Don't modify or load balance ARPs that do not originate locally
>  	 * (e.g.,arrive via a bridge).
>  	 */
> -	if (!bond_slave_has_mac_rx(bond, arp->mac_src))
> +	if (!bond_slave_has_mac_rcu(bond, arp->mac_src))
>  		return NULL;
>  
>  	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
> @@ -960,7 +960,6 @@ static int alb_upper_dev_walk(struct net_device *upper,
>  			      struct netdev_nested_priv *priv)
>  {
>  	struct alb_walk_data *data = (struct alb_walk_data *)priv->data;
> -	bool strict_match = data->strict_match;
>  	const u8 *mac_addr = data->mac_addr;
>  	struct bonding *bond = data->bond;
>  	struct slave *slave = data->slave;
> @@ -979,10 +978,7 @@ static int alb_upper_dev_walk(struct net_device *upper,
>  		}
>  	}
>  
> -	/* If this is a macvlan device, then only send updates
> -	 * when strict_match is turned off.
> -	 */
> -	if (netif_is_macvlan(upper) && !strict_match) {
> +	if (netif_is_macvlan(upper)) {
>  		tags = bond_verify_device_path(bond->dev, upper, 0);
>  		if (IS_ERR_OR_NULL(tags))
>  			BUG();
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index 59955ac33157..6e4e406d8cd2 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -724,23 +724,14 @@ static inline struct slave *bond_slave_has_mac(struct bonding *bond,
>  }
>  
>  /* Caller must hold rcu_read_lock() for read */
> -static inline bool bond_slave_has_mac_rx(struct bonding *bond, const u8 *mac)
> +static inline bool bond_slave_has_mac_rcu(struct bonding *bond, const u8 *mac)
>  {
>  	struct list_head *iter;
>  	struct slave *tmp;
> -	struct netdev_hw_addr *ha;
>  
>  	bond_for_each_slave_rcu(bond, tmp, iter)
>  		if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr))
>  			return true;
> -
> -	if (netdev_uc_empty(bond->dev))
> -		return false;
> -
> -	netdev_for_each_uc_addr(ha, bond->dev)
> -		if (ether_addr_equal_64bits(mac, ha->addr))
> -			return true;
> -
>  	return false;
>  }
>  
> 
> If we want to keep the macvlan support for ALB, as I said, the second way
> is restore the macvlan mac address when receive message for macvlan port, e.g.:
> I didn't test which way affect the performance more.
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index edbaa1444f8e..379d4c139b23 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1585,6 +1585,36 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
>  				  bond->dev->addr_len);
>  	}
>  
> +	if (BOND_MODE(bond) == BOND_MODE_ALB &&
> +	    netif_is_macvlan_port(bond->dev) &&
> +	    skb->pkt_type == PACKET_HOST) {
> +		struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> +		struct rlb_client_info *client_info;
> +		u32 hash_index;
> +
> +		spin_lock_bh(&bond->mode_lock);
> +
> +		hash_index = bond_info->rx_hashtbl_used_head;
> +		for (; hash_index != RLB_NULL_INDEX;
> +		     hash_index = client_info->used_next) {
> +
> +			client_info = &(bond_info->rx_hashtbl[hash_index]);
> +
> +			if (ip_hdr(skb)->saddr == client_info->ip_dst &&
> +			    ip_hdr(skb)->daddr == client_info->ip_src) {
> +
> +				if (unlikely(skb_cow_head(skb, skb->data - skb_mac_header(skb)))) {
> +					kfree_skb(skb);
> +					return RX_HANDLER_CONSUMED;
> +				}
> +				bond_hw_addr_copy(eth_hdr(skb)->h_dest,
> +						  client_info->mac_src, ETH_ALEN);
> +			}
> +		}
> +
> +		spin_unlock_bh(&bond->mode_lock);
> +	}
> +
>  	return ret;
>  }
>  
> 
> Thanks
> Hangbin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Discuss] IPv4 packets lost with macvlan over bond alb
  2023-07-14  9:18           ` Hangbin Liu
@ 2023-08-05  8:25             ` Hangbin Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Hangbin Liu @ 2023-08-05  8:25 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

On Fri, Jul 14, 2023 at 05:18:48PM +0800, Hangbin Liu wrote:
> Hi Jay,
> 
> Any thoughts for my last reply?

Ping

> 
> Thanks
> Hangbin
> On Sun, Jun 25, 2023 at 03:37:10PM +0800, Hangbin Liu wrote:
> > Hi Jay,
> > On Wed, Jun 21, 2023 at 06:42:10PM -0700, Jay Vosburgh wrote:
> > > 	By default, yes, VLANs use the same MAC, but may use a different
> > > MAC than the device the VLAN is configured above.  However, changing the
> > > VLAN's MAC address in a similar test case (VLAN above balance-alb bond)
> > > still works, because VLAN packets are delivered by matching the VLAN ID
> > > (via vlan_do_receive() -> vlan_find_dev()), not via the MAC address.
> > > 
> > > 	So, the RLB MAC edits done by rlb_arp_xmit() work in the sense
> > > that traffic flows, even though peers see a MAC address from the bond
> > > for the VLAN IP, not the VLAN's actual MAC address.
> > > 
> > > 	A bridge can also use a MAC address that differs from the bond,
> > > but rlb_arp_xmit() has a special case for bridge, and doesn't alter the
> > > ARP if the relevant IP address is on a bridge (so, no balancing).
> > > 
> > > 	Changing rlb_arp_xmit() to add macvlan to the bridge check makes
> > > the test case pass, e.g.,
> > > 
> > > diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> > > index b9dbad3a8af8..f720c419dfb7 100644
> > > --- a/drivers/net/bonding/bond_alb.c
> > > +++ b/drivers/net/bonding/bond_alb.c
> > > @@ -668,7 +668,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> > >  
> > >  	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
> > >  	if (dev) {
> > > -		if (netif_is_bridge_master(dev)) {
> > > +		if (netif_is_bridge_master(dev) || netif_is_macvlan(dev)) {
> > 
> > This is not enough. As a common usecase is to attach the macvlan to another
> > namespace(Apologise that my reproducer only has a comment, but no set the macvlan
> > to a separate namespace). So ip_dev_find() could not find the macvlan dev.
> > 
> > Using netif_is_macvlan_port() could make sure the bonding is under macvlan
> > devices. But what if there are middle devices between bond and macvlan? Maybe
> > we need to check each upper device?
> > 
> > If we want to skip arp xmit for macvlan, it looks like a partial revert of
> > 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds"), which
> > means we can keep the TLB part and revert the RLB modification.
> > 
> > 
> > So my draft patch is:
> > 
> > diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> > index b9dbad3a8af8..c27f1a78a94b 100644
> > --- a/drivers/net/bonding/bond_alb.c
> > +++ b/drivers/net/bonding/bond_alb.c
> > @@ -663,7 +663,7 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> >  	/* Don't modify or load balance ARPs that do not originate locally
> >  	 * (e.g.,arrive via a bridge).
> >  	 */
> > -	if (!bond_slave_has_mac_rx(bond, arp->mac_src))
> > +	if (!bond_slave_has_mac_rcu(bond, arp->mac_src))
> >  		return NULL;
> >  
> >  	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
> > @@ -960,7 +960,6 @@ static int alb_upper_dev_walk(struct net_device *upper,
> >  			      struct netdev_nested_priv *priv)
> >  {
> >  	struct alb_walk_data *data = (struct alb_walk_data *)priv->data;
> > -	bool strict_match = data->strict_match;
> >  	const u8 *mac_addr = data->mac_addr;
> >  	struct bonding *bond = data->bond;
> >  	struct slave *slave = data->slave;
> > @@ -979,10 +978,7 @@ static int alb_upper_dev_walk(struct net_device *upper,
> >  		}
> >  	}
> >  
> > -	/* If this is a macvlan device, then only send updates
> > -	 * when strict_match is turned off.
> > -	 */
> > -	if (netif_is_macvlan(upper) && !strict_match) {
> > +	if (netif_is_macvlan(upper)) {
> >  		tags = bond_verify_device_path(bond->dev, upper, 0);
> >  		if (IS_ERR_OR_NULL(tags))
> >  			BUG();
> > diff --git a/include/net/bonding.h b/include/net/bonding.h
> > index 59955ac33157..6e4e406d8cd2 100644
> > --- a/include/net/bonding.h
> > +++ b/include/net/bonding.h
> > @@ -724,23 +724,14 @@ static inline struct slave *bond_slave_has_mac(struct bonding *bond,
> >  }
> >  
> >  /* Caller must hold rcu_read_lock() for read */
> > -static inline bool bond_slave_has_mac_rx(struct bonding *bond, const u8 *mac)
> > +static inline bool bond_slave_has_mac_rcu(struct bonding *bond, const u8 *mac)
> >  {
> >  	struct list_head *iter;
> >  	struct slave *tmp;
> > -	struct netdev_hw_addr *ha;
> >  
> >  	bond_for_each_slave_rcu(bond, tmp, iter)
> >  		if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr))
> >  			return true;
> > -
> > -	if (netdev_uc_empty(bond->dev))
> > -		return false;
> > -
> > -	netdev_for_each_uc_addr(ha, bond->dev)
> > -		if (ether_addr_equal_64bits(mac, ha->addr))
> > -			return true;
> > -
> >  	return false;
> >  }
> >  
> > 
> > If we want to keep the macvlan support for ALB, as I said, the second way
> > is restore the macvlan mac address when receive message for macvlan port, e.g.:
> > I didn't test which way affect the performance more.
> > 
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index edbaa1444f8e..379d4c139b23 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -1585,6 +1585,36 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
> >  				  bond->dev->addr_len);
> >  	}
> >  
> > +	if (BOND_MODE(bond) == BOND_MODE_ALB &&
> > +	    netif_is_macvlan_port(bond->dev) &&
> > +	    skb->pkt_type == PACKET_HOST) {
> > +		struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> > +		struct rlb_client_info *client_info;
> > +		u32 hash_index;
> > +
> > +		spin_lock_bh(&bond->mode_lock);
> > +
> > +		hash_index = bond_info->rx_hashtbl_used_head;
> > +		for (; hash_index != RLB_NULL_INDEX;
> > +		     hash_index = client_info->used_next) {
> > +
> > +			client_info = &(bond_info->rx_hashtbl[hash_index]);
> > +
> > +			if (ip_hdr(skb)->saddr == client_info->ip_dst &&
> > +			    ip_hdr(skb)->daddr == client_info->ip_src) {
> > +
> > +				if (unlikely(skb_cow_head(skb, skb->data - skb_mac_header(skb)))) {
> > +					kfree_skb(skb);
> > +					return RX_HANDLER_CONSUMED;
> > +				}
> > +				bond_hw_addr_copy(eth_hdr(skb)->h_dest,
> > +						  client_info->mac_src, ETH_ALEN);
> > +			}
> > +		}
> > +
> > +		spin_unlock_bh(&bond->mode_lock);
> > +	}
> > +
> >  	return ret;
> >  }
> >  
> > 
> > Thanks
> > Hangbin

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-08-05  8:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-02  8:08 [Discuss] IPv4 packets lost with macvlan over bond alb Hangbin Liu
2023-06-08  3:43 ` Hangbin Liu
2023-06-08  4:44   ` Jay Vosburgh
2023-06-17  1:45   ` Jay Vosburgh
2023-06-17  8:29     ` Hangbin Liu
2023-06-22  1:42       ` Jay Vosburgh
2023-06-25  7:37         ` Hangbin Liu
2023-07-14  9:18           ` Hangbin Liu
2023-08-05  8:25             ` Hangbin Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).