From mboxrd@z Thu Jan 1 00:00:00 1970
From: Veaceslav Falico
Subject: Re: [PATCH net-next v3 0/10] bonding: rebuild the lock use for bond monitor
Date: Mon, 11 Nov 2013 14:06:17 +0100
Message-ID: <20131111130617.GY19702@redhat.com>
References: <5280CF34.20703@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Cc: Jay Vosburgh, Andy Gospodarek, "David S. Miller", Nikolay Aleksandrov, Netdev
To: Ding Tianhong
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:8417 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753094Ab3KKNIe (ORCPT ); Mon, 11 Nov 2013 08:08:34 -0500
Content-Disposition: inline
In-Reply-To: <5280CF34.20703@huawei.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, Nov 11, 2013 at 08:36:04PM +0800, Ding Tianhong wrote:
>Now the bond slave list is not protected by the bond lock, only by RTNL,
>but the monitor still uses the bond lock to protect the slave list,
>which is useless. According to Veaceslav's opinion, there are
>three ways to fix the protection problem:
>
>1. add bond_master_upper_dev_link() and bond_upper_dev_unlink()
>   in bond->lock, but it is unsafe to call call_netdevice_notifiers()
>   in write lock.
>2. remove the unused bond->lock for the monitor functions and only use the
>   existing rtnl_lock(); it will take a performance loss in the fast path.
>3. use RCU to protect the slave list; of course, performance is better,
>   but in the slow path, it is ignored.
>
>Obviously solution 1 does not fit here, so I will consider solutions 2 and 3.
>My principle is simple: in the fast path, RCU is better; in the slow path,
>both are fine. But according to Jay Vosburgh's opinion, the monitor will
>lose performance if it uses RTNL to protect the whole slave list, so remove
>the bond lock and replace it with RCU.
>
>The second problem is the curr_slave_lock for bond: it is too old and
>unwanted in many places, because the curr_active_slave would only be
>changed in 3 places:
>
>1. enslave slave.
>2. release slave.
>3. change active slave.
>
>All of the above already hold the bond lock, RTNL and curr_slave_lock
>together; it is tedious and there is no need to take so many locks. When
>changing the curr_active_slave, you have to hold RTNL and curr_slave_lock
>together, and when reading the curr_active_slave, holding either RTNL or
>curr_slave_lock is enough.

Boot-testing *with the same parameters as before* gave me the following
trace[1], which is inevitable with mode 1 bonding.

So you've either ignored this warning or didn't actually test mode 1, even
though your last patchset was reverted because of a regression in that same
mode. How was this tested?

And btw - net-next is closed.

[1]:

[ 13.847032] bonding: bond0: link status definitely up for interface eth2.
[ 13.848732] bonding: bond0: making interface eth2 the new active one.
[ 13.850429] device eth2 entered promiscuous mode
[ 13.852168]
[ 13.853833] ===============================
[ 13.855410] [ INFO: suspicious RCU usage. ]
[ 13.857017] 3.12.0-bond+ #314 Tainted: G I
[ 13.858690] -------------------------------
[ 13.860404] drivers/net/bonding/bond_main.c:818 suspicious rcu_dereference_check() usage!
[ 13.862006]
[ 13.862006] other info that might help us debug this:
[ 13.862006]
[ 13.866334]
[ 13.866334] rcu_scheduler_active = 1, debug_locks = 0
[ 13.869296] 4 locks held by kworker/u8:3/57:
[ 13.870841]  #0: (%s#4){.+.+..}, at: [] process_one_work+0x189/0x580
[ 13.872353]  #1: ((&(&bond->arp_work)->work)){+.+...}, at: [] process_one_work+0x189/0x580
[ 13.873967]  #2: (rtnl_mutex){+.+.+.}, at: [] rtnl_trylock+0x15/0x20
[ 13.875569]  #3: (&bond->curr_slave_lock){++.+..}, at: [] bond_ab_arp_commit+0x12e/0x200 [bonding]
[ 13.877167]
[ 13.877167] stack backtrace:
[ 13.880287] CPU: 1 PID: 57 Comm: kworker/u8:3 Tainted: G I 3.12.0-bond+ #314
[ 13.882011] Hardware name: Hewlett-Packard HP xw4600 Workstation/0AA0h, BIOS 786F3 v01.15 08/28/2008
[ 13.883585] Workqueue: bond0 bond_activebackup_arp_mon [bonding]
[ 13.885011]  0000000000000001 ffff880079e89be8 ffffffff817a9df8 0000000000000002
[ 13.886564]  ffff880079e80000 ffff880079e89c18 ffffffff81128d23 ffff8800790d4b40
[ 13.888179]  ffff88007980a400 ffff8800790d4bb8 ffff8800790d4b40 ffff880079e89c38
[ 13.889837] Call Trace:
[ 13.891417]  [] dump_stack+0x59/0x81
[ 13.892881]  [] lockdep_rcu_suspicious+0x103/0x140
[ 13.894290]  [] bond_should_notify_peers+0xb1/0x110 [bonding]
[ 13.895686]  [] bond_change_active_slave+0x299/0x370 [bonding]
[ 13.897118]  [] bond_select_active_slave+0xf7/0x1d0 [bonding]
[ 13.898672]  [] bond_ab_arp_commit+0x136/0x200 [bonding]
[ 13.900165]  [] bond_activebackup_arp_mon+0x10d/0x340 [bonding]
[ 13.901709]  [] ? bond_activebackup_arp_mon+0x53/0x340 [bonding]
[ 13.903125]  [] process_one_work+0x1fa/0x580
[ 13.904554]  [] ? process_one_work+0x189/0x580
[ 13.906023]  [] worker_thread+0x11f/0x3a0
[ 13.907506]  [] ? manage_workers+0x170/0x170
[ 13.908931]  [] kthread+0xee/0x100
[ 13.910327]  [] ? __lock_release+0x13b/0x1b0
[ 13.911677]  [] ? __init_kthread_worker+0x70/0x70
[ 13.913082]  [] ret_from_fork+0x7c/0xb0
[ 13.914478]  [] ? __init_kthread_worker+0x70/0x70
[ 13.915860] bonding: bond0: first active interface up!
[ 13.917294] bridge0: port 1(bond0) entered forwarding state
[ 13.918632] bridge0: port 1(bond0) entered forwarding state
[ 14.017018] bonding: bond0: link status definitely up for interface eth0.

>
>For stability, I did not change the logic of the monitor;
>all changes are clear and simple. I have tested the patch set with lockdep,
>and it works well and is stable.
>
>v2. accept Jay Vosburgh's opinion, remove the RTNL and replace it with RCU,
>    also add some rcu functions for bond use, so the patch set reaches 10.
>
>v3. accept Nikolay Aleksandrov's opinion, remove the unneeded bond_has_slave_rcu(),
>    add protection for several 3ad mode handler functions and current_arp_slave.
>    rebuild bond_first_slave_rcu() to make it clearer.
>
>Best Regards
>Ding Tianhong
>
>Ding Tianhong (10):
>  bonding: remove the no effect lock for bond_select_active_slave()
>  bonding: rebuild the lock use for bond_mii_monitor()
>  bonding: rebuild the lock use for bond_alb_monitor()
>  bonding: rebuild the lock use for bond_loadbalance_arp_mon()
>  bonding: create bond_first_slave_rcu()
>  bonding: rebuild the lock use for bond_activebackup_arp_mon()
>  bonding: rebuild the lock use for bond_3ad_state_machine_handler()
>  bonding: remove unwanted lock for bond_option_active_slave_set()
>  bonding: remove unwanted lock for bond enslave and release
>  bonding: remove unwanted lock for bond_store_primaryxxx()
>
> drivers/net/bonding/bond_3ad.c     |  53 +++++++------
> drivers/net/bonding/bond_alb.c     |  34 +++------
> drivers/net/bonding/bond_main.c    | 147 ++++++++++++++++--------------------
> drivers/net/bonding/bond_options.c |   2 -
> drivers/net/bonding/bond_sysfs.c   |   4 -
> drivers/net/bonding/bonding.h      |   9 +++
> include/linux/netdevice.h          |  16 ++++
> net/core/dev.c                     |  16 ----
> 8 files changed, 132 insertions(+), 149 deletions(-)
>
>--
>1.8.2.1
>
>
>
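
For readers unfamiliar with this class of warning, here is a rough user-space sketch of the kind of check that produces the splat in [1]. It is not the bonding code and not a kernel API: every toy_* name and the struct layout are made up for illustration, and the exact expression at bond_main.c:818 in the patched tree is not shown in this thread. The sketch only assumes what the trace itself reports: the work item holds rtnl_mutex and curr_slave_lock but is not inside an RCU read-side section. In the kernel, rcu_dereference() accepts only a read-side section, while rcu_dereference_rtnl() also accepts RTNL being held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for kernel state; not kernel APIs. */
struct toy_slave {
	const char *name;
};

struct toy_bond {
	struct toy_slave *curr_active_slave; /* RCU-protected in the real driver */
	bool rtnl_held;   /* models "this context holds RTNL" */
	bool in_rcu_read; /* models "inside rcu_read_lock()/rcu_read_unlock()" */
	int warnings;     /* number of toy "suspicious usage" reports */
};

/*
 * Toy version of the check: the access is accepted if we are in a
 * read-side section or if the caller-supplied condition holds.
 * Otherwise a warning is counted (the kernel prints a lockdep splat).
 */
static struct toy_slave *toy_deref_check(struct toy_bond *bond, bool cond)
{
	assert(bond != NULL);
	if (!bond->in_rcu_read && !cond) {
		bond->warnings++;
		fprintf(stderr, "toy: suspicious dereference\n");
	}
	return bond->curr_active_slave;
}

/* Reader that accepts only a read-side section, like a bare rcu_dereference(). */
static struct toy_slave *toy_active_rcu(struct toy_bond *bond)
{
	return toy_deref_check(bond, false);
}

/* Reader that also accepts RTNL, in the spirit of rcu_dereference_rtnl(). */
static struct toy_slave *toy_active_rtnl(struct toy_bond *bond)
{
	return toy_deref_check(bond, bond->rtnl_held);
}

/*
 * Simulates the monitor's commit path as reported in the trace:
 * RTNL held, no read-side section. Returns the number of warnings.
 */
int toy_commit_path_warnings(bool use_rtnl_aware_reader)
{
	struct toy_slave eth2 = { "eth2" };
	struct toy_bond bond = { &eth2, true, false, 0 };

	if (use_rtnl_aware_reader)
		(void)toy_active_rtnl(&bond);
	else
		(void)toy_active_rcu(&bond);
	return bond.warnings;
}
```

Calling toy_commit_path_warnings(false) counts one warning and toy_commit_path_warnings(true) counts none. Whether the right fix in the real series is an RTNL-aware accessor or an explicit read-side section around the call is a design decision for the patch author; the sketch only shows why holding RTNL alone does not satisfy a plain RCU dereference.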