From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Nithin Sujir <nsujir@tintri.com>
Cc: netdev@vger.kernel.org
Subject: Re: bond link state mismatch, rtnl_trylock() vs rtnl_lock()
Date: Tue, 23 May 2017 13:13:32 -0700
Message-ID: <1638.1495570412@famine>
In-Reply-To: <24c0747b-df09-66da-f3ac-a393a5902a72@tintri.com>

Nithin Sujir <nsujir@tintri.com> wrote:

>Hi,
>We're encountering a problem in 4.4 LTS where, rarely, the bond link state
>is not updated when the slave link changes.
>
>I've traced the issue to the arp monitor unable to get the rtnl lock. The
>sequence resulting in failure is as below.
>
>bond_loadbalance_arp_mon() periodically called, if slave link is _down_,
>it checks if the slave is sending/receiving packets. If it is, it sets
>flags to be processed later down the function for bond link
>update. However, it sets the slave->link right away.
>
>        if (slave->link != BOND_LINK_UP) {
>                if (bond_time_in_interval(bond, trans_start, 1) &&
>                    bond_time_in_interval(bond, slave->last_rx, 1)) {
>
>                        slave->link = BOND_LINK_UP;
>                        slave_state_changed = 1;
>
>
>Later down the function, it tries to get the rtnl_lock. If it doesn't get
>it, it rearms and returns.
>
>        if (do_failover || slave_state_changed) {
>                if (!rtnl_trylock())
>                        goto re_arm;    <-- returns here
>
>                if (slave_state_changed) {
>                        bond_slave_state_change(bond);
>
>This is the problem. The next time this function is called, the
>slave->link is already marked UP. And we will never update the bond link
>state to UP.

	This looks like an ARP monitor version of

commit de77ecd4ef02ca783f7762e04e92b3d0964be66b
Author: Mahesh Bandewar <maheshb@google.com>
Date:   Mon Mar 27 11:37:33 2017 -0700

    bonding: improve link-status update in mii-monitoring

	and probably needs a similar fix (possibly for both the
loadbalance and active-backup ARP monitor cases).

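	A minimal userspace sketch of the failure and of one possible
shape for a fix, assuming the approach is to record the new slave state
as a proposal and commit it only once the lock is held. Every name here
(sketch_slave, sketch_bond, link_new, lock_busy, monitor_pass_*) is
hypothetical and is not the bonding driver's API; lock_busy merely
simulates rtnl_trylock() failing.

```c
#include <assert.h>
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP };

/* Hypothetical stand-ins for the driver's slave and bond structures. */
struct sketch_slave {
	enum link_state link;     /* committed state the bond acts on */
	enum link_state link_new; /* proposed state, not yet committed */
	bool traffic_seen;        /* stands in for the bond_time_in_interval() checks */
};

struct sketch_bond {
	struct sketch_slave slave;
	enum link_state bond_link; /* stands in for the bond's own link state */
	bool lock_busy;            /* true simulates rtnl_trylock() failing */
};

/* Shape of the reported bug: slave.link is written before the lock is taken. */
static void monitor_pass_buggy(struct sketch_bond *b)
{
	bool changed = false;

	if (b->slave.link != LINK_UP && b->slave.traffic_seen) {
		b->slave.link = LINK_UP; /* committed too early */
		changed = true;
	}
	if (changed) {
		if (b->lock_busy)
			return; /* re-arm; the pending bond update is lost */
		b->bond_link = LINK_UP;
	}
}

/* Possible fix shape: only propose here, commit once the lock is held. */
static void monitor_pass_deferred(struct sketch_bond *b)
{
	bool changed = false;

	b->slave.link_new = b->slave.link;
	if (b->slave.link != LINK_UP && b->slave.traffic_seen) {
		b->slave.link_new = LINK_UP; /* proposal only */
		changed = true;
	}
	if (changed) {
		if (b->lock_busy)
			return; /* re-arm; slave.link is untouched, so the next pass retries */
		assert(b->slave.link_new == LINK_UP);
		b->slave.link = b->slave.link_new; /* commit under the lock */
		b->bond_link = LINK_UP;
	}
}
```

	With lock_busy set on the first pass and cleared on the second,
the buggy variant leaves slave.link UP and bond_link DOWN indefinitely,
while the deferred variant detects the change again on the second pass
and brings both up.
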
>Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.
>
>Is this the right way to fix it? If it is, I can submit this formally.

	It's not the right way, unfortunately.

	The reason for the rtnl_trylock is that there's a possible race
against bond_close() -> bond_work_cancel_all() trying to cancel the
arp_work workqueue item while it's running.  bond_close is called with
RTNL held, so if it has RTNL and is waiting for the work function to
complete, an rtnl_lock call here will deadlock.  Some of the trylock
calls in bonding are commented to this effect, but not this one.

	-J

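	The trylock-and-re-arm pattern described above can be sketched
in userspace as follows. This is an illustration under stated
assumptions, not driver code: toy_rtnl is a single C11 atomic flag
standing in for RTNL, and toy_rtnl_trylock(), toy_rtnl_unlock(),
toy_arp_work() and struct toy_monitor are all hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for RTNL: one flag, set while "held". */
static atomic_flag toy_rtnl = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired, false if someone else holds it. */
static bool toy_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&toy_rtnl);
}

static void toy_rtnl_unlock(void)
{
	/* The flag must already be set: unlocking an unheld lock is a bug. */
	assert(atomic_flag_test_and_set(&toy_rtnl));
	atomic_flag_clear(&toy_rtnl);
}

enum work_result { WORK_REARMED, WORK_DONE };

struct toy_monitor {
	int rearms;        /* passes that backed off because the lock was busy */
	int state_updates; /* passes that updated state under the lock */
};

/*
 * Work function: never blocks on the lock. If the lock holder is waiting
 * for this function to finish (as a close path cancelling the work would),
 * backing off lets that holder make progress; blocking here would not.
 */
static enum work_result toy_arp_work(struct toy_monitor *m)
{
	if (!toy_rtnl_trylock()) {
		m->rearms++;
		return WORK_REARMED;
	}
	m->state_updates++;
	toy_rtnl_unlock();
	return WORK_DONE;
}
```

	If the caller takes toy_rtnl first (playing the role of the
close path) and then runs toy_arp_work(), the work function returns
WORK_REARMED instead of waiting; once the caller unlocks, the next run
returns WORK_DONE. The cost of the pattern is that any state recorded
before the failed trylock must still be detectable on the next pass.
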
>What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some
>places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't
>find much via a google search or in Documentation/.
>
>Thanks,
>Nithin.
>
>--------------------
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 5dca77e..1f60503 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
>        rcu_read_unlock();
>
>        if (do_failover || slave_state_changed) {
>-               if (!rtnl_trylock())
>-                       goto re_arm;
>+               rtnl_lock();
>
>                if (slave_state_changed) {
>                        bond_slave_state_change(bond);
>
>
---
-Jay Vosburgh, jay.vosburgh@canonical.com