From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jay Vosburgh Subject: Re: 2.6.27.18: bnx2/tg3: BUG: "scheduling while atomic" trying to ifenslave a second interface to my bond Date: Wed, 15 Apr 2009 11:11:53 -0700 Message-ID: <30789.1239819113@death.nxdomain.ibm.com> References: <1239657348.8944.529.camel@psmith-ubeta.netezza.com> <11276.1239757967@death.nxdomain.ibm.com> <1239814602.8944.593.camel@psmith-ubeta.netezza.com> Cc: netdev@vger.kernel.org To: paul@mad-scientist.net Return-path: Received: from e7.ny.us.ibm.com ([32.97.182.137]:36321 "EHLO e7.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752404AbZDOSL4 (ORCPT ); Wed, 15 Apr 2009 14:11:56 -0400 Received: from d01relay07.pok.ibm.com (d01relay07.pok.ibm.com [9.56.227.147]) by e7.ny.us.ibm.com (8.13.1/8.13.1) with ESMTP id n3FI1oQ9022885 for ; Wed, 15 Apr 2009 14:01:50 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d01relay07.pok.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n3FIBpvc3899616 for ; Wed, 15 Apr 2009 14:11:51 -0400 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n3FIBlit021510 for ; Wed, 15 Apr 2009 12:11:47 -0600 In-reply-to: <1239814602.8944.593.camel@psmith-ubeta.netezza.com> Sender: netdev-owner@vger.kernel.org List-ID: Paul Smith wrote: >On Tue, 2009-04-14 at 18:12 -0700, Jay Vosburgh wrote: >> I think I know what's going on. I believe this patch will >> resolve things, but I won't be able to test it until tomorrow. If you >> want to test this, great; if you want to wait, that's fine too. > >Hi Jay; as I mentioned last night this patch is working fine for me so >far. Thanks for the test report. >However, looking at the rest of this function it seems to me that there >are other locking issues, at least based on the documentation in the >header file: > > * Here are the locking policies for the two bonding locks: > * > * 1) Get bond->lock when reading/writing slave list. > * 2) Get bond->curr_slave_lock when reading/writing bond->curr_active_slave. > * (It is unnecessary when the write-lock is put with bond->lock.) > * 3) When we lock with bond->curr_slave_lock, we must lock with bond->lock > * beforehand. > >For example, don't you need to hold bond->curr_slave_lock at least >around the "if (!bond->curr_active_slave)"? What about around the >"bond_for_each_slave" loop? > >Many of the other functions, later, also seem to work with >bond->curr_active_slave and they don't take this lock. > >Unless I'm missing something, I think there are still more problems in >the locking in bond_alb_set_mac_address(). The various MAC manipulating functions are either called under RTNL (as bond_alb_set_mac_address is) or take pains to acquire RTNL before doing anything with the MAC. Also, the slave list and curr_active_slave are mutexed by RTNL, so those inspections should be safe. I'm reasonably sure that the curr_slave_lock is superfluous (which wasn't the case when it was originally introduced), but I haven't had a chance to validate this. The locking has changed from what's documented in the header file; RTNL wasn't used for this when that was written. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com