Re: Bond recovery from BOND_LINK_FAIL state not working

netdev.vger.kernel.org archive mirror

From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Alex Sidorenko <alexandre.sidorenko@hpe.com>
Cc: Jarod Wilson <jarod@redhat.com>,
	netdev@vger.kernel.org, Mahesh Bandewar <maheshb@google.com>
Subject: Re: Bond recovery from BOND_LINK_FAIL state not working
Date: Thu, 02 Nov 2017 18:11:06 -0700
Message-ID: <995.1509671466@famine>
In-Reply-To: <b0b5b1e0-3e62-8526-3628-1ed2afecacc1@hpe.com>

Alex Sidorenko <alexandre.sidorenko@hpe.com> wrote:
>On 11/02/2017 12:51 AM, Jay Vosburgh wrote:
>> Jarod Wilson <jarod@redhat.com> wrote:
>>
>>> On 2017-11-01 8:35 PM, Jay Vosburgh wrote:
>>>> Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>>>
>>>>> Alex Sidorenko <alexandre.sidorenko@hpe.com> wrote:
>>>>>
>>>>>> The problem was found while deploying RHEL7 on the HPE Synergy
>>>>>> platform; it is seen both in the customer's environment and in the HPE test lab.
>>>>>>
>>>>>> There are several bonds configured in TLB mode with miimon=100; all other
>>>>>> options are default. Slaves are connected to VirtualConnect
>>>>>> modules. Rebooting a VC module should bring one bond slave (ens3f0) down
>>>>>> temporarily, but not the other one (ens3f1). But what we see is
>>>>>>
>>>>>> Oct 24 10:37:12 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1
>>>>
>>>> 	In net-next, I don't see a path in the code that will lead to
>>>> this message, as it would apparently require entering
>>>> bond_miimon_inspect in state BOND_LINK_FAIL but with downdelay set to 0.
>>>> If downdelay is 0, the code will transition to BOND_LINK_DOWN and not
>>>> remain in _FAIL state.
>>>
>>> The kernel in question is laden with a fair bit of additional debug spew,
>>> as we were going back and forth, trying to isolate where things were going
>>> wrong.  That was indeed from the BOND_LINK_FAIL state in
>>> bond_miimon_inspect, inside the if (link_state) clause though, so after
>>> commit++, there's a continue, which ... does what now? Doesn't it take us
>>> back to the top of the bond_for_each_slave_rcu() loop, so we bypass the
>>> next few lines of code that would have led to a transition to
>>> BOND_LINK_DOWN?
>>
>> 	Just to confirm: your downdelay is 0, correct?
>
>Correct.
>
>>
>> 	And, do you get any other log messages other than "link status
>> up again after 0 ms"?
>
>Yes, here are some messages (from an early instrumentation):
[...]
>That is, we never see ens3f1 go to BOND_LINK_DOWN; it stays in
>BOND_LINK_NOCHANGE/BOND_LINK_FAIL.
>
>
>>
>> 	To answer your question, yes, the "if (link_state) {" block in
>> the BOND_LINK_FAIL case of bond_miimon_inspect ends in continue, but
>> this path is nominally for the downdelay logic.  If downdelay is active
>> and the link recovers before the delay expires, the link should never
>> have moved to BOND_LINK_DOWN.  The commit++ causes bond_miimon_inspect
>> to return nonzero, causing in turn the bond_propose_link_state change to
>> BOND_LINK_FAIL state to be committed.  This path deliberately does not
>> set slave->new_link, as downdelay is purposely delaying the transition
>> to BOND_LINK_DOWN.
>>
>> 	If downdelay is 0, the slave->link should not persist in
>> BOND_LINK_FAIL state; it should set new_link = BOND_LINK_DOWN which will
>> cause a transition in bond_miimon_commit.  The bond_propose_link_state
>> call to set BOND_LINK_FAIL in the BOND_LINK_UP case will be committed in
>> bond_mii_monitor prior to calling bond_miimon_commit, which will in turn
>> do the transition to _DOWN state.  In this case, the BOND_LINK_FAIL case
>> "if (link_state) {" block should never be entered.
>
>I totally agree with your description of transition logic, and this is why
>we were puzzled by how this can occur until we noticed NetworkManager
>messages around this time and decided to run a test without it.
>Without NM, everything works as expected. After that, adding more
>instrumentation, we have found that we do not propose BOND_LINK_FAIL inside
>bond_miimon_inspect() but elsewhere (NetworkManager?).

	I think I see the flaw in the logic.

	1) bond_miimon_inspect finds link_state = 0, then makes a call
to bond_propose_link_state(BOND_LINK_FAIL), setting link_new_state to
BOND_LINK_FAIL.  _inspect then sets slave->new_link = BOND_LINK_DOWN and
returns non-zero.

	2) In bond_mii_monitor, rtnl_trylock fails, so it reschedules
without committing anything.

	3) bond_mii_monitor runs again, and calls bond_miimon_inspect.

	4) the slave's link has recovered, so link_state != 0.
slave->link is still BOND_LINK_UP.  The slave's link_new_state remains
set to BOND_LINK_FAIL, but new_link is reset to NOCHANGE.
bond_miimon_inspect returns 0, so nothing is committed.

	5) step 4 can repeat indefinitely.

	6) eventually, the other slave does something that causes
commit++, making bond_mii_monitor call bond_commit_link_state and then
bond_miimon_commit.  The slave in question from steps 1-4 still has
link_new_state as BOND_LINK_FAIL, but new_link is NOCHANGE, so it ends
up in BOND_LINK_FAIL state.

	I think step 6 could also occur concurrently with the initial
pass through step 4 to induce the problem.

	It looks like Mahesh mostly fixed this in

commit fb9eb899a6dc663e4a2deed9af2ac28f507d0ffb
Author: Mahesh Bandewar <maheshb@google.com>
Date:   Tue Apr 11 22:36:00 2017 -0700

    bonding: handle link transition from FAIL to UP correctly

	but the window still exists: hitting it requires the slave's link
state to change between the failed rtnl_trylock and the second pass
through _inspect.  The problem is that a proposed state transition is
kept between invocations of _inspect, even though the condition that
induced the transition has changed.

	I haven't tested these, but I suspect the solution is either to
clear link_new_state on entry to the loop in bond_miimon_inspect, or
merge new_link and link_new_state into a single "next state" (which is
cleared on entry to the loop).

	The first of these is a pretty simple patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 18b58e1376f1..6f89f9981a6c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2046,6 +2046,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		slave->new_link = BOND_LINK_NOCHANGE;
+		slave->link_new_state = slave->link;
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
 

	Alex / Jarod, could you check my logic, and would you be able to
test this patch if my analysis appears sound?

	Thanks,

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Thread overview: 13+ messages
2017-11-01 18:09 Bond recovery from BOND_LINK_FAIL state not working Alex Sidorenko
2017-11-01 21:34 ` Jay Vosburgh
2017-11-02  0:35   ` Jay Vosburgh
2017-11-02  2:37     ` Jarod Wilson
2017-11-02  4:51       ` Jay Vosburgh
2017-11-02 12:47         ` Alex Sidorenko
2017-11-03  1:11           ` Jay Vosburgh [this message]
2017-11-03 15:40             ` Alex Sidorenko
2017-11-03 18:26               ` Jay Vosburgh
2017-11-03 19:30                 ` Alex Sidorenko
2017-11-03 21:46                   ` Jarod Wilson
2017-11-06  6:06             ` Jarod Wilson
2017-11-07  2:47               ` Jay Vosburgh
