From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nikolay Aleksandrov Subject: Re: [PATCH net-next 2/2] bonding: Simplify the xmit function for modes that use xmit_hash Date: Fri, 05 Sep 2014 13:49:21 +0200 Message-ID: <5409A341.2010806@redhat.com> References: <1409780851-32081-1-git-send-email-maheshb@google.com> <54086645.3010102@redhat.com> <54099DD3.20109@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jay Vosburgh , Veaceslav Falico , Andy Gospodarek , David Miller , netdev , Eric Dumazet , Maciej Zenczykowski To: Mahesh Bandewar Return-path: Received: from mx1.redhat.com ([209.132.183.28]:62037 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932134AbaIELtc (ORCPT ); Fri, 5 Sep 2014 07:49:32 -0400 In-Reply-To: <54099DD3.20109@redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: On 05/09/14 13:26, Nikolay Aleksandrov wrote: > On 05/09/14 02:10, Mahesh Bandewar wrote: >> On Thu, Sep 4, 2014 at 6:16 AM, Nikolay Aleksandrov wrote: >>> On 03/09/14 23:47, Mahesh Bandewar wrote: >>>> >>>> Earlier change to use usable slave array for TLB mode had an additional >>>> performance advantage. So extending the same logic to all other modes >>>> that use xmit-hash for slave selection (viz 802.3AD, and XOR modes). >>>> Also consolidating this with the earlier TLB change. >>>> >>>> The main idea is to build the usable slaves array in the control path >>>> and use that array for slave selection during xmit operation. >>>> >>>> Measured performance in a setup with a bond of 4x1G NICs with 200 >>>> instances of netperf for the modes involved (3ad, xor, tlb) >>>> cmd: netperf -t TCP_RR -H -l 60 -s 5 >>>> >>>> Mode TPS-Before TPS-After >>>> >>>> 802.3ad : 468,694 493,101 >>>> TLB (lb=0): 392,583 392,965 >>>> XOR : 475,696 484,517 >>>> >>>> Signed-off-by: Mahesh Bandewar >>>> --- >>> > <<<<>>>>> >>>> - bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb) % >>>> bond->slave_cnt); >>>> + old_arr = rcu_dereference_protected(bond->slave_arr, >>>> + lockdep_rtnl_is_held() || >>>> + lockdep_is_held(&bond->lock) >>>> || >>>> + >>>> lockdep_is_held(&bond->curr_slave_lock)); >>> >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >>> This line is the most troublesome for me, which lock is it ? Does this mean >>> that whichever I hold from the three I can update the slave array ? >>> I don't think this is worked out well, you should explicitly specify how and >>> why it is safe to update this under each of the locks and maybe you'll be >>> able to reduce the lock list :-) >>> >> This is primarily because of different code paths it's taking to reach >> here. In all these cases, one of those locks is held. Unfortunately >> there are three such locks that I have identified (for all three >> modes involved) and hence the above line. >> > > True, but I did a little grepping and here's my analysis of the call sites which > I can't guarantee is full or complete, but it shows at least 1 problem. > bond_update_slave_arr() callers: > > 1. 3ad mode > 1.1. bond_3ad_state_machine_handler -> ad_mux_machine -> > ad_(en|dis)able_collecting_distributing > - read_lock(bond->lock), rcu_read_lock, state_machine_lock > 1.2. __bond_release_one -> bond_3ad_unbind_slave > - rtnl, write_lock(bond->lock) > 1.3. bond_change_active_slave -> bond_3ad_handle_link_change > - from 4. rtnl, new_active != NULL -> write_lock(curr_slave_lock) > 1.4. bond_miimon_commit -> bond_3ad_handle_link_change > - rtnl ^^^^^^ missed the state_machine_lock here > > 2. TLB > 2.1. __bond_release_one -> bond_alb_deinit_slave > - rtnl > 2.2. bond_change_active_slave -> bond_alb_handle_link_change > - from 4. rtnl, new_active != NULL -> write_lock(curr_slave_lock) > 2.3. bond_miimon_commit -> bond_alb_handle_link_change > - rtnl > > 3. XOR > 3.1. __bond_release_one > - rtnl > 3.2. bond_miimon_commit > - rtnl > > 4. bond_change_active_slave: > 1. bond_select_active_slave -> bond_change_active_slave > 1.1. bond_enslave -> bond_select_active_slave > - rtnl, write_lock(curr_slave_lock) > 1.2. __bond_release_one -> bond_select_active_slave > - rtnl, write_lock(curr_slave_lock) > 1.3. bond_miimon_commit -> bond_select_active_slave > - rtnl, write_lock(curr_slave_lock) > 1.4. bond_loadbalance_arp_mon -> bond_select_active_slave > - rtnl, write_lock(curr_slave_lock) > 1.5. bond_ab_arp_commit -> bond_select_active_slave > - rtnl, write_lock(curr_slave_lock) > 1.6. bond_slave_netdev_event -> bond_select_active_slave > - rtnl, write_lock(curr_slave_lock) > 1.7. bond_options.c (all callers) > - rtnl, write_lock(curr_slave_lock) > > > Almost all callers of slave_update_arr() currently have rtnl acquired, but > there's 1 troubling caller: bond_3ad_state_machine_handler() which is called > from a workqueue. Now if we're able to execute anything with that workqueue, we > have a race condition, good candidates are all options which don't acquire > write_lock(bond->lock), I think the only one that can call > bond_slave_update_arr() of those is primary_reselect right now. ^^^^^^^^^^^^^^^^ Though even that might not be a problem since the state_machine_lock would save you, so it looks like it's not a problem but the convoluted locking requirements are a problem waiting to happen by themselves. Anyway that is a longstanding problem so I don't mind if you keep the code like this, too. Just wanted to make sure that it doesn't create any new subtle race conditions. > So if you come up with some way to deal with that, you probably can use only > rtnl for syncing the array and simplify this. > Again I might be wrong since this is done only via grepping :-) > > Cheers, > Nik