From: Jarod Wilson <jarod@redhat.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>,
	Alex Sidorenko <alexandre.sidorenko@hpe.com>
Cc: netdev@vger.kernel.org
Subject: Re: Bond recovery from BOND_LINK_FAIL state not working
Date: Wed, 1 Nov 2017 22:37:26 -0400
Message-ID: <a6b9c9bc-a3e7-60fb-4a44-cc8b641f308c@redhat.com>
In-Reply-To: <10968.1509582913@famine>

On 2017-11-01 8:35 PM, Jay Vosburgh wrote:
> Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
> 
>> Alex Sidorenko <alexandre.sidorenko@hpe.com> wrote:
>>
>>> The problem has been found while trying to deploy RHEL7 on HPE Synergy
>>> platform, it is seen both in customer's environment and in HPE test lab.
>>>
>>> There are several bonds configured in TLB mode and miimon=100, all other
>>> options are default. Slaves are connected to VirtualConnect
>>> modules. Rebooting a VC module should bring one bond slave (ens3f0) down
>>> temporarily, but not another one (ens3f1). But what we see is
>>>
>>> Oct 24 10:37:12 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1
> 
> 	In net-next, I don't see a path in the code that will lead to
> this message, as it would apparently require entering
> bond_miimon_inspect in state BOND_LINK_FAIL but with downdelay set to 0.
> If downdelay is 0, the code will transition to BOND_LINK_DOWN and not
> remain in _FAIL state.

The kernel in question is laden with a fair bit of additional debug 
spew, as we were going back and forth, trying to isolate where things 
were going wrong.  That was indeed from the BOND_LINK_FAIL state in 
bond_miimon_inspect, inside the if (link_state) clause though, so after 
commit++, there's a continue, which ... does what now? Doesn't it take 
us back to the top of the bond_for_each_slave_rcu() loop, so we bypass 
the next few lines of code that would have led to a transition to 
BOND_LINK_DOWN?
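
The control flow in question can be sketched as a small userspace model in C. This is not the bonding driver source: the enum, struct slave_model, and inspect() are hypothetical names invented for illustration, and only the shape of the BOND_LINK_FAIL case (link-up check, commit++, continue, then the delay check) follows the description above. It shows that a continue inside the switch moves on to the next slave in the enclosing loop, so a slave whose link is back up never reaches the lines that would propose a transition to down.

```c
/* Hypothetical model of one link-inspection pass; not kernel code. */
enum link { LINK_UP, LINK_FAIL, LINK_DOWN };

struct slave_model {
    enum link state;     /* state committed by the previous pass */
    enum link proposed;  /* state proposed by this pass */
    int delay;           /* remaining downdelay ticks */
};

/* Walk all slaves once; return how many state changes were proposed. */
int inspect(struct slave_model *s, const int *carrier, int n)
{
    int commit = 0;

    for (int i = 0; i < n; i++) {
        s[i].proposed = s[i].state;

        switch (s[i].state) {
        case LINK_FAIL:
            if (carrier[i]) {
                /* Link came back while in the FAIL state. */
                s[i].proposed = LINK_UP;
                commit++;
                continue;   /* next slave; the delay check is skipped */
            }
            if (s[i].delay <= 0) {
                /* Delay expired (or was 0) with the link still down. */
                s[i].proposed = LINK_DOWN;
                commit++;
                continue;
            }
            s[i].delay--;
            break;
        default:
            break;
        }
    }
    return commit;
}
```

With two slaves both in LINK_FAIL and delay 0, one with carrier and one without, a single pass proposes LINK_UP for the first and LINK_DOWN for the second. The model says nothing about how a slave ends up in the FAIL state with a zero delay in the first place, which is the open question in this thread.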

...
>> 	Your patch does not apply to net-next, so I'm not absolutely
>> sure where this is, but presuming that this is in the BOND_LINK_FAIL
>> case of the switch, it looks like both BOND_LINK_FAIL and BOND_LINK_BACK
>> will have the issue that if the link recovers or fails, respectively,
>> within the delay window (for down/updelay > 0) it won't set a
>> slave->new_link.
>>
>> 	Looks like this got lost somewhere along the line, as originally
>> the transition back to UP (or DOWN) happened immediately, and that has
>> been lost somewhere.
>>
>> 	I'll have to dig out when that broke, but I'll see about a test
>> patch this afternoon.
> 
> 	The case I was concerned with was moved around; the proposed
> state is committed in bond_mii_monitor.  But to commit to _FAIL state,
> the downdelay would have to be > 0.  I'm not seeing any errors in
> net-next; can you reproduce your erroneous behavior on net-next?

I can try to get a net-next-ish kernel into their hands, but the bonding 
driver we're working with here is quite close to current net-next 
already, so I'm fairly confident the same thing will happen.

-- 
Jarod Wilson
jarod@redhat.com
