From: Jay Vosburgh <fubar@us.ibm.com>
To: Patrick Schaaf <netdev@bof.de>
Cc: netdev@vger.kernel.org
Subject: Re: bonding flaps between member interfaces
Date: Tue, 17 May 2011 18:22:22 -0700
Message-ID: <27478.1305681742@death>
In-Reply-To: <1305638854.6044.223.camel@lat1>

Patrick Schaaf <netdev@bof.de> wrote:
>Dear netdev,
>
>I'm experiencing a regression with bonding. Bugzilla and cursory
>searching of the list did not immediately show up anything that seems
>related, so here's the report:
>
>Short summary: bonding flips between members every second

I have reproduced the problem on a 2.6.38-rc5-ish kernel.

The described configuration is enslaving two VLAN interfaces; I
also tried enslaving eth0/eth1 directly and stacking the VLAN atop
bonding. That doesn't work either. I don't get any errors, and bonding
says the slaves are up, but ping through the VLAN fails. Ping over the
non-VLAN (directly on bond0) works ok.

I'll give it some bisect action and report back.

-J

>bonding in active-backup mode with ARP monitoring
>two members in the bond, both being VLAN interfaces on top of two
>separate ethernet interfaces
>bnx2 ethernet driver, but saw the same behaviour with a tigon box
>concrete settings:
>BONDING_MODULE_OPTS="mode=active-backup primary=eth0.24 arp_interval=250
>arp_ip_target=192.168.x.x"
>See below for a /proc/net/bonding/bond24 output reflecting the
>configuration.
>
>This setup I have in production on 2.6.36.2, and it works fine.
>It also works fine, tested today, with 2.6.36.4 and 2.6.37.6
>
>Starting with 2.6.38 (2.6.38.6 tested today), and still happening with
>2.6.39-rc7, I experience problems. While I can still work over the
>interface, it is flipping once per second between the two member
>interfaces. There is no indication of the underlying interface going
>up/down, but bonding seems to think so.
>
>See below an excerpt of the kernel log for two back-and-forth flapping
>cycles.
>
>In /proc/net/bonding/bond24, I see the failure counter of the configured
>primary interface counting up with each flap. The counter of the
>non-primary interface does not move. When I switch the primary interface by
>echoing to /sys, the behaviour of the counters flips: always the
>configured primary has the counter going up.
>
>best regards
> Patrick
>
>Here is /proc/net/bonding/bond24 while running on 2.6.37.6, to show the
>concrete configuration from this POV. Everything looks the same with the
>failing kernels, except for the noted behaviour of the Failure Counts.
>
>Ethernet Channel Bonding Driver: v3.7.0 (June 2, 2010)
>
>Bonding Mode: fault-tolerance (active-backup)
>Primary Slave: eth0.24 (primary_reselect always)
>Currently Active Slave: eth0.24
>MII Status: up
>MII Polling Interval (ms): 0
>Up Delay (ms): 0
>Down Delay (ms): 0
>ARP Polling Interval (ms): 250
>ARP IP target/s (n.n.n.n form): 192.168.x.x
>
>Slave Interface: eth0.24
>MII Status: up
>Speed: 1000 Mbps
>Duplex: full
>Link Failure Count: 0
>Permanent HW addr: d4:85:64:ca:1c:12
>Slave queue ID: 0
>
>Slave Interface: eth1.24
>MII Status: up
>Speed: 1000 Mbps
>Duplex: full
>Link Failure Count: 0
>Permanent HW addr: d4:85:64:ca:1c:14
>Slave queue ID: 0
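
To check which slave's Link Failure Count is moving, the per-slave counters can be pulled out of the /proc/net/bonding text and two snapshots compared. The following is a minimal sketch, assuming the field names shown in the output above; the function names `parse_failure_counts` and `moving_counters` and the `SAMPLE` text are hypothetical and are not part of the bonding driver or any tool.

```python
import re


def parse_failure_counts(text):
    """Return {slave_name: link_failure_count} parsed from bonding status text.

    Assumes the 'Slave Interface:' / 'Link Failure Count:' layout quoted above.
    """
    counts = {}
    current = None
    for raw in text.splitlines():
        # Tolerate email quote markers ('>') and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Link Failure Count:\s*(\d+)", line)
        if m and current is not None:
            counts[current] = int(m.group(1))
    return counts


def moving_counters(before, after):
    """Return the sorted names of slaves whose count rose between snapshots."""
    return sorted(name for name, n in after.items() if n > before.get(name, 0))


# Hypothetical sample, abbreviated from the status output above.
SAMPLE = """\
Slave Interface: eth0.24
MII Status: up
Link Failure Count: 0

Slave Interface: eth1.24
MII Status: up
Link Failure Count: 0
"""

if __name__ == "__main__":
    print(parse_failure_counts(SAMPLE))
```

On a live system one would read /proc/net/bonding/bond24 twice, a few seconds apart, and pass both parsed snapshots to `moving_counters`; per the report, only the configured primary should show up on the failing kernels.
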
>
>Here is kernel log output for two flapping cycles (booted kernel was
>2.6.39-rc7):
>
>May 17 14:58:22 myserver kernel: [ 1016.629155] bonding: bond24: link
>status definitely down for interface eth0.24, disabling it
>May 17 14:58:22 myserver kernel: [ 1016.629159] bonding: bond24: making
>interface eth1.24 the new active one.
>May 17 14:58:22 myserver kernel: [ 1016.629162] device eth0.24 left
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.629164] device eth0 left
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.629191] device eth1.24 entered
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.629193] device eth1 entered
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.878596] bonding: bond24: link
>status definitely up for interface eth0.24.
>May 17 14:58:22 myserver kernel: [ 1016.878600] bonding: bond24: making
>interface eth0.24 the new active one.
>May 17 14:58:22 myserver kernel: [ 1016.878603] device eth1.24 left
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.878605] device eth1 left
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.878631] device eth0.24 entered
>promiscuous mode
>May 17 14:58:22 myserver kernel: [ 1016.878633] device eth0 entered
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.626919] bonding: bond24: link
>status definitely down for interface eth0.24, disabling it
>May 17 14:58:23 myserver kernel: [ 1017.626923] bonding: bond24: making
>interface eth1.24 the new active one.
>May 17 14:58:23 myserver kernel: [ 1017.626926] device eth0.24 left
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.626928] device eth0 left
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.626955] device eth1.24 entered
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.626957] device eth1 entered
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.876359] bonding: bond24: link
>status definitely up for interface eth0.24.
>May 17 14:58:23 myserver kernel: [ 1017.876363] bonding: bond24: making
>interface eth0.24 the new active one.
>May 17 14:58:23 myserver kernel: [ 1017.876366] device eth1.24 left
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.876368] device eth1 left
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.876394] device eth0.24 entered
>promiscuous mode
>May 17 14:58:23 myserver kernel: [ 1017.876396] device eth0 entered
>promiscuous mode
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
Thread overview: 2 messages
2011-05-17 13:27 bonding flaps between member interfaces Patrick Schaaf
2011-05-18  1:22 ` Jay Vosburgh