From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jay Vosburgh
Subject: [PATCH net] bonding: fix slave stuck in BOND_LINK_FAIL state
Date: Tue, 07 Nov 2017 19:50:07 +0900
Message-ID: <15116.1510051807@nyx>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
Cc: Alex Sidorenko, Mahesh Bandewar, Jarod Wilson, Veaceslav Falico,
	Andy Gospodarek, "David Miller"
To: netdev@vger.kernel.org
Return-path:
Received: from youngberry.canonical.com ([91.189.89.112]:35738 "EHLO
	youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753560AbdKGKuS (ORCPT );
	Tue, 7 Nov 2017 05:50:18 -0500
Content-ID: <15115.1510051807.1@nyx>
Sender: netdev-owner@vger.kernel.org
List-ID:

The bonding miimon logic has a flaw: a failure of rtnl_trylock can
cause a slave to become permanently stuck in BOND_LINK_FAIL state.

The sequence of events that causes this is as follows:

1) bond_miimon_inspect finds that a slave's link is down, and so calls
bond_propose_link_state, setting slave->link_new_state to
BOND_LINK_FAIL, then sets slave->new_link to BOND_LINK_DOWN and returns
non-zero.

2) In bond_mii_monitor, the rtnl_trylock fails, and the timer is
rescheduled. No change is committed.

3) bond_miimon_inspect is called again, but this time the slave from
step 1 has recovered. slave->new_link is reset to NOCHANGE, and, as
slave->link was never changed, the switch enters the BOND_LINK_UP case
and does nothing. The BOND_LINK_FAIL state proposed in step 1 remains
pending, as link_new_state is not reset.

4) The state from step 3 persists until another slave changes link
state and causes bond_miimon_inspect to return non-zero. At this point,
the BOND_LINK_FAIL state change on the slave from steps 1-3 is
committed, and the slave will remain stuck in BOND_LINK_FAIL state even
though it is actually link up.

The remedy for this is to initialize link_new_state on each entry to
bond_miimon_inspect, as is already done with new_link.

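For illustration only, steps 1-4 can be reproduced in a small userspace
model. This is a sketch, not the driver code: the model_* names, the
MODEL_LINK_* values, and the carrier and lock_ok flags are all
hypothetical stand-ins, and the commit pass is reduced to the two
assignments that matter for this sequence. It runs only up to the point
in step 4 where the stale proposal is committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the BOND_LINK_* values (not the kernel's definitions). */
enum model_link { MODEL_LINK_UP, MODEL_LINK_FAIL, MODEL_LINK_DOWN, MODEL_LINK_NOCHANGE };

struct model_slave {
	enum model_link link;           /* committed state */
	enum model_link link_new_state; /* proposed state awaiting commit */
	enum model_link new_link;       /* pending up/down change, or NOCHANGE */
	bool carrier;                   /* simulated result of the link check */
};

/* Models the inspect pass; returns non-zero when a commit is wanted. */
static int model_inspect(struct model_slave *s, int n, bool with_fix)
{
	int commit = 0;

	for (int i = 0; i < n; i++) {
		s[i].new_link = MODEL_LINK_NOCHANGE;
		if (with_fix)
			s[i].link_new_state = s[i].link; /* the one-line fix */

		/* Only the "was up, now down" transition is modelled. */
		if (s[i].link == MODEL_LINK_UP && !s[i].carrier) {
			s[i].link_new_state = MODEL_LINK_FAIL;
			s[i].new_link = MODEL_LINK_DOWN;
			commit++;
		}
	}
	return commit;
}

/* Models the monitor pass: commit only if inspect asked and the lock was taken. */
static void model_monitor(struct model_slave *s, int n, bool with_fix, bool lock_ok)
{
	assert(s != NULL && n > 0);

	if (!model_inspect(s, n, with_fix) || !lock_ok)
		return; /* nothing to do, or trylock failed: retry later */

	for (int i = 0; i < n; i++) {
		s[i].link = s[i].link_new_state;     /* commit proposed state */
		if (s[i].new_link == MODEL_LINK_DOWN)
			s[i].link = MODEL_LINK_DOWN; /* commit pending down change */
	}
}

/* Runs steps 1-4 and returns the committed state of the first slave. */
static enum model_link model_run(bool with_fix)
{
	struct model_slave s[2] = {
		{ MODEL_LINK_UP, MODEL_LINK_UP, MODEL_LINK_NOCHANGE, false },
		{ MODEL_LINK_UP, MODEL_LINK_UP, MODEL_LINK_NOCHANGE, true },
	};

	model_monitor(s, 2, with_fix, false); /* steps 1-2: link down, trylock fails */
	s[0].carrier = true;
	model_monitor(s, 2, with_fix, true);  /* step 3: slave 0 recovered, no commit */
	s[1].carrier = false;
	model_monitor(s, 2, with_fix, true);  /* step 4: slave 1 triggers a commit */
	return s[0].link;
}
```

In this model, model_run(false) returns MODEL_LINK_FAIL for the first
slave although its carrier is up, while model_run(true) returns
MODEL_LINK_UP, because the stale proposal is discarded on every inspect
pass.
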
Reported-by: Alex Sidorenko
Reviewed-by: Jarod Wilson
Signed-off-by: Jay Vosburgh
Fixes: fb9eb899a6dc ("bonding: handle link transition from FAIL to UP correctly")
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c99dc59d729b..167434e952da 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2042,6 +2042,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		slave->new_link = BOND_LINK_NOCHANGE;
+		slave->link_new_state = slave->link;
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
 
-- 
2.14.1