From: Jay Vosburgh <fubar@us.ibm.com>
To: Narendra_K@Dell.com
Cc: netdev@vger.kernel.org
Subject: Re: bonding device in balance-alb mode shows packet loss in kernel 3.2-rc6
Date: Wed, 28 Dec 2011 12:08:30 -0800
Message-ID: <27384.1325102910@death>
In-Reply-To: <E31FB011129F30488D5861F38390491520C52BCC8B@BLRX7MCDC201.AMER.DELL.COM>
<Narendra_K@Dell.com> wrote:
>Hello,
>
>On kernel version 3.2-rc6, when a bonding device is configured in 'balance-alb' mode,
>ping reported packet loss. By looking at the protocol trace, it seemed like the lost
>packets had the destination MAC address of the inactive slave.
In balance-alb mode, there isn't really an "inactive" slave in
the same sense as in active-backup mode. For this mode, the "inactive"
slave flag is used to suppress duplicate multicast and broadcast
traffic, to prevent multiple copies from being received (if each slave
gets one copy). Unicast traffic should pass normally on all slaves.
Each slave also keeps a distinct MAC address, and peers are assigned to
particular slaves via tailored ARP messages (so different peers may see
different MAC addresses for the bond's IP address).
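As a rough sketch of what I mean (paraphrasing the receive-path
logic in drivers/net/bonding from memory; the helper name below is
mine, and the exact code in your tree may differ):

/* Hypothetical helper sketching the duplicate-suppression check on
 * receive.  In balance-alb, an "inactive" slave only filters broadcast
 * and multicast frames; unicast is delivered normally.
 * (struct bonding, struct slave and bond_is_slave_inactive() are from
 * drivers/net/bonding/bonding.h.)
 */
static bool should_drop_on_inactive(struct bonding *bond,
				    struct slave *slave,
				    struct sk_buff *skb)
{
	if (!bond_is_slave_inactive(slave))
		return false;		/* active slaves take everything */

	if (bond->params.mode == BOND_MODE_ALB &&
	    skb->pkt_type != PACKET_BROADCAST &&
	    skb->pkt_type != PACKET_MULTICAST)
		return false;		/* unicast still passes in alb mode */

	return true;			/* suppress the duplicate copy */
}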
>Scenario:
>
>Host under test:
>
>bond0 IP addr: 10.2.2.1 - balance-alb mode, 2 or more slaves.
>
>Remote Host1: 10.2.2.11
>
>Remote Host2: 10.2.2.2
>
>Ping to Host 1 IP. Observe that there is no packet loss
>
># ping 10.2.2.11
>PING 10.2.2.11 (10.2.2.11) 56(84) bytes of data.
>64 bytes from 10.2.2.11: icmp_seq=1 ttl=64 time=0.156 ms
>64 bytes from 10.2.2.11: icmp_seq=2 ttl=64 time=0.130 ms
>64 bytes from 10.2.2.11: icmp_seq=3 ttl=64 time=0.151 ms
>64 bytes from 10.2.2.11: icmp_seq=4 ttl=64 time=0.137 ms
>64 bytes from 10.2.2.11: icmp_seq=5 ttl=64 time=0.151 ms
>64 bytes from 10.2.2.11: icmp_seq=6 ttl=64 time=0.129 ms
>^C
>--- 10.2.2.11 ping statistics ---
>6 packets transmitted, 6 received, 0% packet loss, time 4997ms
>rtt min/avg/max/mdev = 0.129/0.142/0.156/0.014 ms
>
>Now ping to Host2 IP. Observe that there is packet loss. It is almost
>always reproducible.
>
># ping 10.2.2.2
>PING 10.2.2.2 (10.2.2.2) 56(84) bytes of data.
>64 bytes from 10.2.2.2: icmp_seq=6 ttl=64 time=0.108 ms
>64 bytes from 10.2.2.2: icmp_seq=7 ttl=64 time=0.104 ms
>64 bytes from 10.2.2.2: icmp_seq=8 ttl=64 time=0.119 ms
>64 bytes from 10.2.2.2: icmp_seq=56 ttl=64 time=0.139 ms
>64 bytes from 10.2.2.2: icmp_seq=57 ttl=64 time=0.111 ms
>^C
>--- 10.2.2.2 ping statistics ---
>75 packets transmitted, 5 received, 93% packet loss, time 74037ms
>rtt min/avg/max/mdev = 0.104/0.116/0.139/0.014 ms
>
>More information:
>
>Hardware information:
>Dell PowerEdge R610
>
># lspci | grep -i ether
>01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>
>Kernel version:
>3.2.0-rc6
>
># ethtool -i bond0
>driver: bonding
>version: 3.7.1
>
>By observing the packets on remote HOST2, the sequence is:
>
>1. 'bond0' broadcasts an ARP request with source MAC equal to the
>'bond0' MAC address and receives an ARP response to the same.
>The next few packets are received.
In this case, it means the peer has been assigned to the "em2"
slave.
>2. After some time, there are 2 ARP replies from 'bond0' to HOST2
>with source MAC equal to the 'inactive slave' MAC address. Now HOST2 sends
>ICMP responses with destination MAC equal to the inactive slave's MAC address,
>and these packets are dropped.
This part is not unusual for balance-alb mode; the traffic
is periodically rebalanced, and in this case the peer HOST2 was likely
assigned to a different slave than it was previously. I'm not sure why
the packets don't reach their destination, but they shouldn't be dropped
due to the slave being "inactive," as I explained above.
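To illustrate the mechanism (a hypothetical sketch built on the
generic ARP helpers from net/ipv4/arp.c, not the driver code verbatim;
the function and parameter names are mine):

/* Hypothetical sketch: after a rebalance, tell a peer that the bond's
 * IP address now maps to the new slave's MAC, so the peer's ARP cache
 * (and the switch's forwarding table) learn the new port.
 */
static void repoint_peer(struct slave *new_slave, __be32 bond_ip,
			 __be32 peer_ip, const unsigned char *peer_mac)
{
	struct sk_buff *skb;

	skb = arp_create(ARPOP_REPLY, ETH_P_ARP, peer_ip,
			 new_slave->dev, bond_ip,
			 peer_mac,			/* dest hw addr   */
			 new_slave->dev->dev_addr,	/* source hw addr */
			 peer_mac);			/* target hw addr */
	if (skb)
		arp_xmit(skb);
}

The driver itself keeps a hash table of known peers and walks it when
rebalancing, but the tailored ARP reply is the essential step: once
HOST2 accepts it, HOST2 addresses its replies to the new slave's MAC.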
>The Wireshark protocol trace is attached to this note.
>
>3. The behavior was independent of the network adapter model.
>
>4. Also, I had a few prints in 'eth_type_trans' and it seemed like the 'inactive slave'
>was not receiving any frames destined to it (00:21:9b:9d:a5:74) except ARP broadcasts.
>Setting the 'inactive slave' to 'promisc' mode made bond0 see the responses.
This seems very strange, since the MAC information shown later
suggests that the slaves are all using their original MAC addresses, so
the packets ought to be delivered.
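For reference, the classification your prints would have been
near looks roughly like this (paraphrased from eth_type_trans() in
net/ethernet/eth.c of this era):

/* A unicast frame whose destination does not match dev->dev_addr is
 * marked PACKET_OTHERHOST.  Note this only runs at all if the NIC's
 * hardware address filter passed the frame up, and promiscuous mode
 * disables that filter.
 */
if (is_multicast_ether_addr(eth->h_dest)) {
	if (!compare_ether_addr(eth->h_dest, dev->broadcast))
		skb->pkt_type = PACKET_BROADCAST;
	else
		skb->pkt_type = PACKET_MULTICAST;
} else if (compare_ether_addr(eth->h_dest, dev->dev_addr)) {
	skb->pkt_type = PACKET_OTHERHOST;
}

So if your prints here never fired for those frames, the drop happened
below this point, most likely in the adapter's hardware MAC filter,
which would also explain why promiscuous mode made the frames appear.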
I'm out of the office until next week, so I won't have an
opportunity to try and reproduce this myself until then. I wonder if
something in the rx_handler changes over the last few months has broken
this, although a look at the code suggests that it should be doing the
right things.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com