netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Aleksei Zakharov <zaharov@selectel.ru>
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] bonding/802.3ad: fix slave initialization states race
Date: Wed, 18 Sep 2019 16:34:34 +0200	[thread overview]
Message-ID: <31893.1568817274@nyx> (raw)
In-Reply-To: <20190918130545.GA11133@yandex.ru>

Aleksei Zakharov <zaharov@selectel.ru> wrote:

>Once a while, one of 802.3ad slaves fails to initialize and hangs in
>BOND_LINK_FAIL state. Commit 334031219a84 ("bonding/802.3ad: fix slave
>link initialization transition states") checks slave->last_link_up. But
>link can still hang in weird state.
>After physical link comes up it sends first two LACPDU messages and
>doesn't work properly after that. It doesn't send or receive LACPDU.
>Once it happens, the only message in dmesg is:
>bond1: link status up again after 0 ms for interface eth2

	I believe this message indicates that the slave entered
BOND_LINK_FAIL state, but downdelay was not set.  The _FAIL state is
really for managing the downdelay expiration, and a slave should not be
in that state (outside of a brief transition entirely within
bond_miimon_inspect) if downdelay is 0.

>This behavior can be reproduced (not every time):
>1. Set slave link down
>2. Wait for 1-3 seconds
>3. Set slave link up
>
>The fix is to check slave->link before setting it to BOND_LINK_FAIL or
>BOND_LINK_DOWN state. If got invalid Speed/Dupex values and link is in
>BOND_LINK_UP state, mark it as BOND_LINK_FAIL; otherwise mark it as
>BOND_LINK_DOWN.
>
>Fixes: 334031219a84 ("bonding/802.3ad: fix slave link initialization
>transition states")
>Signed-off-by: Aleksei Zakharov <zakharov.a.g@yandex.ru>
>---
> drivers/net/bonding/bond_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 931d9d935686..a28776d8f33f 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3135,7 +3135,7 @@ static int bond_slave_netdev_event(unsigned long event,
> 		 */
> 		if (bond_update_speed_duplex(slave) &&
> 		    BOND_MODE(bond) == BOND_MODE_8023AD) {
>-			if (slave->last_link_up)
>+			if (slave->link == BOND_LINK_UP)
> 				slave->link = BOND_LINK_FAIL;
> 			else
> 				slave->link = BOND_LINK_DOWN;

	Is the core problem here that slaves are reporting link up, but
returning invalid values for speed and/or duplex?  If so, what network
device are you testing with that is exhibiting this behavior?

	If I'm not mistaken, there have been several iterations of
hackery on this block of code to work around this same problem, and each
time there's some corner case that still doesn't work.

	As Davem asked last time around, is the real problem that device
drivers report carrier up but supply invalid speed and duplex state?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

  reply	other threads:[~2019-09-18 14:34 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-18 13:05 [PATCH] bonding/802.3ad: fix slave initialization states race Aleksei Zakharov
2019-09-18 14:34 ` Jay Vosburgh [this message]
     [not found]   ` <CAJYOGF9KZdouvmTxQcTOQgsi-uBxbvW50K3ufW1=8neeW98QVA@mail.gmail.com>
2019-09-18 18:27     ` Fwd: " Алексей Захаров
2019-09-19  8:00       ` Jay Vosburgh
2019-09-19  9:53         ` Алексей Захаров
2019-09-19 15:27           ` Jay Vosburgh
2019-09-20 13:52             ` Jay Vosburgh
2019-09-20 16:00               ` Алексей Захаров
2019-09-21  7:06                 ` Jay Vosburgh
2019-09-21 11:17                   ` Алексей Захаров
2019-09-25  0:31                     ` Jay Vosburgh
2019-09-25 11:01                       ` Aleksei Zakharov
2019-09-26  4:38                         ` Jay Vosburgh
2019-09-26 14:25                           ` Aleksei Zakharov
2019-09-26 20:01                             ` Jay Vosburgh
2019-09-27 11:14                               ` Aleksei Zakharov
2019-09-27  9:43                         ` zhangsha (A)
2019-10-22 12:05                       ` Aleksei Zakharov
2019-09-24 13:52 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=31893.1568817274@nyx \
    --to=jay.vosburgh@canonical.com \
    --cc=netdev@vger.kernel.org \
    --cc=zaharov@selectel.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).