Re: Bond recovery from BOND_LINK_FAIL state not working

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Alex Sidorenko <alexandre.sidorenko@hpe.com>
Cc: Jarod Wilson <jarod@redhat.com>,
	netdev@vger.kernel.org, Mahesh Bandewar <maheshb@google.com>
Subject: Re: Bond recovery from BOND_LINK_FAIL state not working
Date: Thu, 02 Nov 2017 18:11:06 -0700	[thread overview]
Message-ID: <995.1509671466@famine> (raw)
In-Reply-To: <b0b5b1e0-3e62-8526-3628-1ed2afecacc1@hpe.com>

Alex Sidorenko <alexandre.sidorenko@hpe.com> wrote:
>On 11/02/2017 12:51 AM, Jay Vosburgh wrote:
>> Jarod Wilson <jarod@redhat.com> wrote:
>>
>>> On 2017-11-01 8:35 PM, Jay Vosburgh wrote:
>>>> Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>>>
>>>>> Alex Sidorenko <alexandre.sidorenko@hpe.com> wrote:
>>>>>
>>>>>> The problem has been found while trying to deploy RHEL7 on HPE Synergy
>>>>>> platform, it is seen both in customer's environment and in HPE test lab.
>>>>>>
>>>>>> There are several bonds configured in TLB mode and miimon=100, all other
>>>>>> options are default. Slaves are connected to VirtualConnect
>>>>>> modules. Rebooting a VC module should bring one bond slave (ens3f0) down
>>>>>> temporarily, but not another one (ens3f1). But what we see is
>>>>>>
>>>>>> Oct 24 10:37:12 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1
>>>>
>>>> 	In net-next, I don't see a path in the code that will lead to
>>>> this message, as it would apparently require entering
>>>> bond_miimon_inspect in state BOND_LINK_FAIL but with downdelay set to 0.
>>>> If downdelay is 0, the code will transition to BOND_LINK_DOWN and not
>>>> remain in _FAIL state.
>>>
>>> The kernel in question is laden with a fair bit of additional debug spew,
>>> as we were going back and forth, trying to isolate where things were going
>>> wrong.  That was indeed from the BOND_LINK_FAIL state in
>>> bond_miimon_inspect, inside the if (link_state) clause though, so after
>>> commit++, there's a continue, which ... does what now? Doesn't it take us
>>> back to the top of the bond_for_each_slave_rcu() loop, so we bypass the
>>> next few lines of code that would have led to a transition to
>>> BOND_LINK_DOWN?
>>
>> 	Just to confirm: your downdelay is 0, correct?
>
>Correct.
>
>>
>> 	And, do you get any other log messages other than "link status
>> up again after 0 ms"?
>
>Yes, here are some messages (from an early instrumentation):
[...]
>That is, we never see ens3f1 going to BOND_LINK_DOWN and it continues
>staying in BOND_LINK_NOCHANGE/BOND_LINK_FAIL
>
>
>>
>> 	To answer your question, yes, the "if (link_state) {" block in
>> the BOND_LINK_FAIL case of bond_miimon_inspect ends in continue, but
>> this path is nominally for the downdelay logic.  If downdelay is active
>> and the link recovers before the delay expires, the link should never
>> have moved to BOND_LINK_DOWN.  The commit++ causes bond_miimon_inspect
>> to return nonzero, causing in turn the bond_propose_link_state change to
>> BOND_LINK_FAIL state to be committed.  This path deliberately does not
>> set slave->new_link, as downdelay is purposely delaying the transition
>> to BOND_LINK_DOWN.
>>
>> 	If downdelay is 0, the slave->link should not persist in
>> BOND_LINK_FAIL state; it should set new_link = BOND_LINK_DOWN which will
>> cause a transition in bond_miimon_commit.  The bond_propose_link_state
>> call to set BOND_LINK_FAIL in the BOND_LINK_UP case will be committed in
>> bond_mii_monitor prior to calling bond_miimon_commit, which will in turn
>> do the transition to _DOWN state.  In this case, the BOND_LINK_FAIL case
>> "if (link_state) {" block should never be entered.
>
>I totally agree with your description of transition logic, and this is why
>we were puzzled by how this can occur until we noticed NetworkManager
>messages around this time and decided to run a test without it.
>Without NM, everything works as expected. After that, adding more
>instrumentation, we have found that we do not propose BOND_LINK_FAIL inside
>bond_miimon_inspect() but elsewhere (NetworkManager?).

	I think I see the flaw in the logic.

	1) bond_miimon_inspect finds link_state = 0, then makes a call
to bond_propose_link_state(BOND_LINK_FAIL), setting link_new_state to
BOND_LINK_FAIL.  _inspect then sets slave->new_link = BOND_LINK_DOWN and
returns non-zero.

	2) bond_mii_monitor rtnl_trylock fails, it reschedules.

	3) bond_mii_monitor runs again, and calls bond_miimon_inspect.

	4) the slave's link has recovered, so link_state != 0.
slave->link is still BOND_LINK_UP.  The slave's link_new_state remains
set to BOND_LINK_FAIL, but new_link is reset to NOCHANGE.
bond_miimon_inspect returns 0, so nothing is committed.

	5) step 4 can repeat indefinitely.

	6) eventually, the other slave does something that causes
commit++, making bond_mii_monitor call bond_commit_link_state and then
bond_miimon_commit.  The slave in question from steps 1-4 still has
link_new_state as BOND_LINK_FAIL, but new_link is NOCHANGE, so it ends
up in BOND_LINK_FAIL state.

	I think step 6 could also occur concurrently with the initial
pass through step 4 to induce the problem.

	It looks like Mahesh mostly fixed this in

commit fb9eb899a6dc663e4a2deed9af2ac28f507d0ffb
Author: Mahesh Bandewar <maheshb@google.com>
Date:   Tue Apr 11 22:36:00 2017 -0700

    bonding: handle link transition from FAIL to UP correctly

	but the window still exists, and requires the slave link state
to change between the failed rtnl_trylock and the second pass through
_inspect.  The problem is that a state transition has been kept between
invocations to _inspect, but the condition that induced the transition
has changed.

	I haven't tested these, but I suspect the solution is either to
clear link_new_state on entry to the loop in bond_miimon_inspect, or
merge new_state and link_new_state as a single "next state" (which is
cleared on entry to the loop).

	The first of these is a pretty simple patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 18b58e1376f1..6f89f9981a6c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2046,6 +2046,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		slave->new_link = BOND_LINK_NOCHANGE;
+		slave->link_new_state = slave->link;
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
 

	Alex / Jarod, could you check my logic, and would you be able to
test this patch if my analysis appears sound?

	Thanks,

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

next prev parent reply	other threads:[~2017-11-03  1:11 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-01 18:09 Bond recovery from BOND_LINK_FAIL state not working Alex Sidorenko
2017-11-01 21:34 ` Jay Vosburgh
2017-11-02  0:35   ` Jay Vosburgh
2017-11-02  2:37     ` Jarod Wilson
2017-11-02  4:51       ` Jay Vosburgh
2017-11-02 12:47         ` Alex Sidorenko
2017-11-03  1:11           ` Jay Vosburgh [this message]
2017-11-03 15:40             ` Alex Sidorenko
2017-11-03 18:26               ` Jay Vosburgh
2017-11-03 19:30                 ` Alex Sidorenko
2017-11-03 21:46                   ` Jarod Wilson
2017-11-06  6:06             ` Jarod Wilson
2017-11-07  2:47               ` Jay Vosburgh

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:18b58e1376f dfblob:6f89f9981a6 )
 OR (
bs:"Re: Bond recovery from BOND_LINK_FAIL state not working" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=995.1509671466@famine \
    --to=jay.vosburgh@canonical.com \
    --cc=alexandre.sidorenko@hpe.com \
    --cc=jarod@redhat.com \
    --cc=maheshb@google.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.