From: Jay Vosburgh
Subject: Re: bonding device in balance-alb mode shows packet loss in kernel 3.2-rc6
Date: Wed, 28 Dec 2011 12:08:30 -0800
Message-ID: <27384.1325102910@death>
To: Narendra_K@Dell.com
Cc: netdev@vger.kernel.org

Narendra_K@Dell.com wrote:

>Hello,
>
>On kernel version 3.2-rc6, when a bonding device is configured in
>'balance-alb' mode, ping reported packet losses. Looking at the
>protocol trace, it seemed like the lost packets had the destination
>MAC address of the inactive slave.

	In balance-alb mode, there isn't really an "inactive" slave in
the same sense as in active-backup mode.  For this mode, the "inactive"
slave flag is used to suppress duplicates of multicast and broadcast
frames, to prevent multiple copies of those from being received (since
each slave would otherwise get one copy).  Unicast traffic should pass
normally on all slaves.

	Each slave also keeps its own discrete MAC address, and peers
are assigned to particular slaves via tailored ARP messages (so
different peers may see different MAC addresses for the bond's IP
address).

>Scenario:
>
>Host under test:
>
>bond0 IP addr: 10.2.2.1 - balance-alb mode, 2 or more slaves.
>
>Remote Host1: 10.2.2.11
>
>Remote Host2: 10.2.2.2
>
>Ping to Host1's IP. Observe that there is no packet loss.
>
># ping 10.2.2.11
>PING 10.2.2.11 (10.2.2.11) 56(84) bytes of data.
>64 bytes from 10.2.2.11: icmp_seq=1 ttl=64 time=0.156 ms
>64 bytes from 10.2.2.11: icmp_seq=2 ttl=64 time=0.130 ms
>64 bytes from 10.2.2.11: icmp_seq=3 ttl=64 time=0.151 ms
>64 bytes from 10.2.2.11: icmp_seq=4 ttl=64 time=0.137 ms
>64 bytes from 10.2.2.11: icmp_seq=5 ttl=64 time=0.151 ms
>64 bytes from 10.2.2.11: icmp_seq=6 ttl=64 time=0.129 ms
>^C
>--- 10.2.2.11 ping statistics ---
>6 packets transmitted, 6 received, 0% packet loss, time 4997ms
>rtt min/avg/max/mdev = 0.129/0.142/0.156/0.014 ms
>
>Now ping to Host2's IP. Observe that there is packet loss. It is
>reproducible almost always.
>
># ping 10.2.2.2
>PING 10.2.2.2 (10.2.2.2) 56(84) bytes of data.
>64 bytes from 10.2.2.2: icmp_seq=6 ttl=64 time=0.108 ms
>64 bytes from 10.2.2.2: icmp_seq=7 ttl=64 time=0.104 ms
>64 bytes from 10.2.2.2: icmp_seq=8 ttl=64 time=0.119 ms
>64 bytes from 10.2.2.2: icmp_seq=56 ttl=64 time=0.139 ms
>64 bytes from 10.2.2.2: icmp_seq=57 ttl=64 time=0.111 ms
>^C
>--- 10.2.2.2 ping statistics ---
>75 packets transmitted, 5 received, 93% packet loss, time 74037ms
>rtt min/avg/max/mdev = 0.104/0.116/0.139/0.014 ms
>
>More information:
>
>Hardware information:
>Dell PowerEdge R610
>
># lspci | grep -i ether
>01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>
>Kernel version:
>3.2.0-rc6
>
># ethtool -i bond0
>driver: bonding
>version: 3.7.1
>
>By observing the packets on remote HOST2, the sequence is:
>
>1. 'bond0' broadcasts an ARP request with source MAC equal to the
>'bond0' MAC address and receives an ARP response to the same.
>The next few packets are received.

	In this case, it means the peer has been assigned to the "em2"
slave.

>2. After some time, there are 2 ARP replies from 'bond0' to HOST2
>with source MAC equal to the 'inactive slave' MAC address. Now HOST2
>sends ICMP responses with destination MAC equal to the inactive slave
>MAC address, and these packets are dropped.

	This part is not unusual for the balance-alb mode; the traffic
is periodically rebalanced, and in this case the peer HOST2 was likely
assigned to a different slave than it was previously.  I'm not sure why
the packets don't reach their destination, but they shouldn't be
dropped due to the slave being "inactive," as I explained above.

>The wireshark protocol trace is attached to this note.
>
>3. The behavior was independent of the network adapter models.
>
>4. Also, I added a few prints in 'eth_type_trans', and it seemed like
>the 'inactive slave' was not receiving any frames destined to it
>(00:21:9b:9d:a5:74) except ARP broadcasts. Setting the 'inactive
>slave' in 'promisc' mode made bond0 see the responses.

	This seems very strange, since the MAC information shown later
suggests that the slaves are all using their original MAC addresses, so
the packets ought to be delivered.

	I'm out of the office until next week, so I won't have an
opportunity to try to reproduce this myself until then.  I wonder if
something in the rx_handler changes over the last few months has broken
this, although a look at the code suggests that it should be doing the
right things.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
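For reference, the peer-to-slave assignment behavior described in this thread (each peer pinned to one slave's MAC via tailored ARP replies, then occasionally moved to another slave when traffic is rebalanced) can be modeled roughly as below. This is a toy sketch only: the names `Slave` and `AlbBond`, the `00:21:9b:9d:a5:72` address, and the least-loaded selection rule are illustrative assumptions, not the kernel's actual `bond_alb.c` receive-load-balancing logic.

```python
# Toy model of balance-alb receive load balancing: peers are assigned to
# individual slaves, and the MAC advertised to each peer in ARP replies
# is that slave's own hardware address.  Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Slave:
    name: str
    mac: str                        # each slave keeps its own MAC address
    peers: set = field(default_factory=set)

class AlbBond:
    def __init__(self, slaves):
        self.slaves = slaves
        self.assignment = {}        # peer IP -> Slave currently serving it

    def arp_reply_mac(self, peer_ip):
        """Return the source MAC the bond advertises to this peer."""
        slave = self.assignment.get(peer_ip)
        if slave is None:           # new peer: pick the least-loaded slave
            slave = min(self.slaves, key=lambda s: len(s.peers))
            slave.peers.add(peer_ip)
            self.assignment[peer_ip] = slave
        return slave.mac

    def rebalance(self, peer_ip):
        """Move a peer to another slave and re-advertise its MAC via ARP.

        After this, the peer sends to the new slave's MAC -- the shift
        HOST2 saw in the trace.  The frames should still be accepted,
        because the new slave owns that MAC; dropping them, as reported
        above, is the unexpected part.
        """
        old = self.assignment.pop(peer_ip)
        old.peers.discard(peer_ip)
        new = min((s for s in self.slaves if s is not old),
                  key=lambda s: len(s.peers))
        new.peers.add(peer_ip)
        self.assignment[peer_ip] = new
        return new.mac
```

The point the sketch makes is that two peers of the same bond IP can legitimately see two different MAC addresses at the same time, and a rebalance changes which slave's MAC a given peer uses; none of that involves an "inactive" slave dropping unicast frames.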