netdev.vger.kernel.org archive mirror
* Bonding driver fails to enable second interface if updelay is non-zero
@ 2017-07-21  2:07 Benjamin Gilbert
From: Benjamin Gilbert @ 2017-07-21  2:07 UTC
  To: netdev; +Cc: maheshb

[resend]

Hello,

Starting with commit de77ecd4ef02ca783f7762e04e92b3d0964be66b, and
through 4.12.2, the bonding driver in 802.3ad mode fails to enable the
second interface on a bond device if updelay is non-zero.  dmesg says:

[   35.825227] bond0: Setting xmit hash policy to layer3+4 (1)
[   35.825259] bond0: Setting MII monitoring interval to 100
[   35.825303] bond0: Setting down delay to 200
[   35.825328] bond0: Setting up delay to 200
[   35.827414] bond0: Adding slave eth0
[   35.949205] bond0: Enslaving eth0 as a backup interface with a down link
[   35.950812] bond0: Adding slave eth1
[   36.073764] bond0: Enslaving eth1 as a backup interface with a down link
[   36.076808] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[   39.327423] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   39.405580] bond0: link status up for interface eth0, enabling it in 0 ms
[   39.405607] bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
[   39.405608] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[   39.405613] bond0: first active interface up!
[   39.406186] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   39.551391] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   39.613590] bond0: link status up for interface eth1, enabling it in 200 ms
[   39.717575] bond0: link status up for interface eth1, enabling it in 200 ms
[   39.821395] bond0: link status up for interface eth1, enabling it in 200 ms
[   39.925584] bond0: link status up for interface eth1, enabling it in 200 ms
[   40.029288] bond0: link status up for interface eth1, enabling it in 200 ms
[   40.133388] bond0: link status up for interface eth1, enabling it in 200 ms

...and so on every 100 ms.  The bug doesn't trigger 100% reliably, but
can be provoked by removing and re-adding interfaces to the bond via
sysfs.

While the problem is occurring, networking appears to be unreliable.
Setting the updelay to 0 fixes it:

[  345.472559] bond0: link status up for interface eth1, enabling it in 200 ms
[  345.576558] bond0: link status up for interface eth1, enabling it in 200 ms
[  345.607614] bond0: Setting up delay to 0
[  345.680396] bond0: link status definitely up for interface eth1, 1000 Mbps full duplex

I'd be happy to provide further details or to test patches.

--Benjamin Gilbert


* Re: Bonding driver fails to enable second interface if updelay is non-zero
@ 2017-07-24 20:53 ` Cong Wang
From: Cong Wang @ 2017-07-24 20:53 UTC
  To: Benjamin Gilbert; +Cc: Linux Kernel Network Developers, Mahesh Bandewar

On Thu, Jul 20, 2017 at 7:07 PM, Benjamin Gilbert
<benjamin.gilbert@coreos.com> wrote:
> Starting with commit de77ecd4ef02ca783f7762e04e92b3d0964be66b, and
> through 4.12.2, the bonding driver in 802.3ad mode fails to enable the
> second interface on a bond device if updelay is non-zero.  dmesg says:
> [...]
>
> I'd be happy to provide further details or to test patches.

At a quick glance, it seems Mahesh missed the following piece:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 181839d6fbea..9bee6c1c70cc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2050,6 +2050,7 @@ static int bond_miimon_inspect(struct bonding *bond)
                                continue;

                        bond_propose_link_state(slave, BOND_LINK_FAIL);
+                       commit++;
                        slave->delay = bond->params.downdelay;
                        if (slave->delay) {
                                netdev_info(bond->dev, "link status down for %sinterface %s, disabling it in %d ms\n",
@@ -2088,6 +2089,7 @@ static int bond_miimon_inspect(struct bonding *bond)
                                continue;

                        bond_propose_link_state(slave, BOND_LINK_BACK);
+                       commit++;
                        slave->delay = bond->params.updelay;

                        if (slave->delay) {


* Re: Bonding driver fails to enable second interface if updelay is non-zero
@ 2017-07-25  2:19     ` Benjamin Gilbert
From: Benjamin Gilbert @ 2017-07-25  2:19 UTC
  To: Linux Kernel Network Developers
  Cc: Cong Wang,
	Mahesh Bandewar (महेश बंडेवार)

On Mon, Jul 24, 2017 at 2:34 PM, Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
> Having said that, the proposed solution (Cong's patch) should fix it, please
> give it a try so that this fix can be formalized.

The patch fixes the problem for me.

Thanks!
--Benjamin Gilbert

