netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* bond link state mismatch, rtnl_trylock() vs rtnl_lock()
@ 2017-05-23 19:32 Nithin Sujir
  2017-05-23 20:13 ` Jay Vosburgh
  2017-05-23 21:30 ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 2 replies; 5+ messages in thread
From: Nithin Sujir @ 2017-05-23 19:32 UTC (permalink / raw)
  To: netdev

Hi,
We're encountering a problem in 4.4 LTS where, rarely, the bond link 
state is not updated when the slave link changes.

I've traced the issue to the arp monitor unable to get the rtnl lock. 
The sequence resulting in failure is as below.

bond_loadbalance_arp_mon() periodically called, if slave link is _down_, 
it checks if the slave is sending/receiving packets. If it is, it sets 
flags to be processed later down the function for bond link update. 
However, it sets the slave->link right away.

                 if (slave->link != BOND_LINK_UP) {
                         if (bond_time_in_interval(bond, trans_start, 1) &&
                             bond_time_in_interval(bond, slave->last_rx, 
1)) {

                                 slave->link  = BOND_LINK_UP;
                                 slave_state_changed = 1;


Later down the function, it tries to get the rtnl_lock. If it doesn't 
get it, it rearms and returns.

         if (do_failover || slave_state_changed) {
                 if (!rtnl_trylock())
                         goto re_arm; <-- returns here

                 if (slave_state_changed) {
                         bond_slave_state_change(bond);

This is the problem. The next time this function is called, the 
slave->link is already marked UP. And we will never update the bond link 
state to UP.

Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.

Is this the right way to fix it? If it is, I can submit this formally.

What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some 
places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't 
find much via a google search or in Documentation/.

Thanks,
Nithin.

--------------------

diff --git a/drivers/net/bonding/bond_main.c 
b/drivers/net/bonding/bond_main.c
index 5dca77e..1f60503 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct 
work_struct *work)
         rcu_read_unlock();

         if (do_failover || slave_state_changed) {
-               if (!rtnl_trylock())
-                       goto re_arm;
+               rtnl_lock();

                 if (slave_state_changed) {
                         bond_slave_state_change(bond);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-05-23 21:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-23 19:32 bond link state mismatch, rtnl_trylock() vs rtnl_lock() Nithin Sujir
2017-05-23 20:13 ` Jay Vosburgh
2017-05-23 21:10   ` Nithin Sujir
2017-05-23 21:30 ` Mahesh Bandewar (महेश बंडेवार)
2017-05-23 21:35   ` Nithin Sujir

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).