From mboxrd@z Thu Jan 1 00:00:00 1970
From: HAYASAKA Mitsuo
Subject: Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
Date: Mon, 24 Oct 2011 13:00:46 +0900
Message-ID: <4EA4E2EE.4050201@hitachi.com>
References: <20111019081757.12455.24788.stgit@ltc219.sdl.hitachi.co.jp>
	<27007.1319047262@death>
	<20111019114129.162d895a@nehalam.linuxnetplumber.net>
	<29731.1319051397@death>
	<17144.1319178396@death>
	<14766.1319245142@death>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: netdev@vger.kernel.org, Américo Wang, Stephen Hemminger,
	Andy Gospodarek, linux-kernel@vger.kernel.org,
	yrl.pp-manager.tt@hitachi.com
To: Jay Vosburgh
Return-path:
In-Reply-To: <14766.1319245142@death>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

(2011/10/22 9:59), Jay Vosburgh wrote:
> Jay Vosburgh wrote:
> 
>> Américo Wang wrote:
>>
>>> On Thu, Oct 20, 2011 at 3:09 AM, Jay Vosburgh wrote:
>>>> Stephen Hemminger wrote:
>>>>
>>>>> On Wed, 19 Oct 2011 11:01:02 -0700
>>>>> Jay Vosburgh wrote:
>>>>>
>>>>>> Mitsuo Hayasaka wrote:
>>>>>>
>>>>>>> The bond_close() calls cancel_delayed_work() to cancel delayed work items.
>>>>>>> It, however, cannot cancel work items that were already queued in the
>>>>>>> workqueue.  The bond_open() initializes work->data, and process_one_work()
>>>>>>> refers to get_work_cwq(work)->wq->flags.  The get_work_cwq() returns NULL
>>>>>>> when work->data has been initialized.  Thus, a panic occurs.
>>>>>>>
>>>>>>> This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
>>>>>>> in bond_close().  It cancels the delayed timer and waits for the work to
>>>>>>> finish executing, so it avoids the NULL pointer dereference caused by
>>>>>>> process_one_work() running in parallel with the initialization done by
>>>>>>> bond_open().
>>>>>>
>>>>>> I'm setting up to test this.  I have a dim recollection that we
>>>>>> tried this some years ago, and there was a different deadlock that
>>>>>> manifested through the flush path.  Perhaps changes since then have
>>>>>> removed that problem.
>>>>>>
>>>>>> -J
>>>>>
>>>>> Won't this deadlock on RTNL?  The problem is that:
>>>>>
>>>>>      CPU0                          CPU1
>>>>>   rtnl_lock
>>>>>      bond_close
>>>>>                                    delayed_work
>>>>>                                       mii_work
>>>>>                                          read_lock(bond->lock);
>>>>>                                          read_unlock(bond->lock);
>>>>>                                          rtnl_lock... waiting for CPU0
>>>>>      flush_delayed_work_sync
>>>>>         waiting for delayed_work to finish...
>>>>
>>>> Yah, that was it.  We discussed this a couple of years ago in
>>>> regards to a similar patch:
>>>>
>>>> http://lists.openwall.net/netdev/2009/12/17/3
>>>>
>>>> The short version is that we could rework the rtnl_lock inside
>>>> the monitors to be conditional and retry on failure (where "retry" means
>>>> "reschedule the work and try again later," not "spin retrying on rtnl").
>>>> That should permit the use of flush or cancel to terminate the work
>>>> items.
>>>
>>> Yes?  Even if we use rtnl_trylock(), doesn't flush_delayed_work_sync()
>>> still queue the pending delayed work and wait for it to be finished?
>>
>> Yes, it does.  The original patch wants to use flush instead of
>> cancel to wait for the work to finish, because there's evidently a
>> possibility of getting back into bond_open before the work item
>> executes, and bond_open would reinitialize the work queue and corrupt
>> the queued work item.
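
To make the cancel-vs-flush distinction above concrete, the change being
discussed in bond_close() is roughly the following (a sketch only, showing
just the miimon case, not quoted from the actual patch):

	/* today: cancel_delayed_work() only deletes the pending timer.
	 * It cannot remove an item the timer has already queued on the
	 * workqueue, and that item can then race with a later bond_open()
	 * reinitializing work->data. */
	cancel_delayed_work(&bond->mii_work);

	/* proposed: flush_delayed_work_sync() kills the timer, queues the
	 * work immediately if the timer was still pending, and waits for
	 * it to finish, so nothing is left in flight when bond_open()
	 * runs again. */
	flush_delayed_work_sync(&bond->mii_work);
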
>>
>> The original patch series, and recipe for destruction, is here:
>>
>> http://www.spinics.net/lists/netdev/msg176382.html
>>
>> I've been unable to reproduce the work queue panic locally,
>> although it sounds plausible.
>>
>> Mitsuo: can you provide the precise bonding configuration you're
>> using to induce the problem?  Driver options, number and type of slaves,
>> etc.
>>
>>> Maybe I am too blind, but why do we need rtnl_lock for
>>> cancel_delayed_work() inside bond_close()?
>>
>> We don't need RTNL for cancel/flush.  However, bond_close is an
>> ndo_stop operation, and is called in the dev_close path, which always
>> occurs under RTNL.  The mii / arp monitor work functions separately
>> acquire RTNL if they need to perform various failover related
>> operations.
>>
>> I'm working on a patch that should resolve the mii / arp monitor
>> RTNL problem as I described above (if rtnl_trylock fails, punt and
>> reschedule the work).  I need to rearrange the netdev_bonding_change
>> stuff a bit as well, since it acquires RTNL separately.
>>
>> Once these changes are made to mii / arp monitor, then
>> bond_close can call flush instead of cancel, which should eliminate the
>> original problem described at the top.
> 
> Just an update: there are three functions that may deadlock if
> the cancel work calls are changed to flush_sync.  There are two
> rtnl_lock calls in each of the bond_mii_monitor and
> bond_activebackup_arp_mon functions, and one more in the
> bond_alb_monitor.
> 
> Still testing to make sure I haven't missed anything, and I
> still haven't been able to reproduce Mitsuo's original failure.

The miimon interval was set to 1 to make this bug easier to reproduce,
and the 802.3ad mode was used.  Then, I executed the following commands.

# while true; do ifconfig bond0 down; done &
# while true; do ifconfig bond0 up; done &

This bug occurs only rarely since it is a severe timing problem.
I found that it is easier to reproduce when using a guest OS.
For example, it took one to three days for me to reproduce it on a host OS,
but only a few hours on a guest OS.

Thanks.

> 
> -J
> 
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 
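
P.S.  For anyone following along, the "punt and reschedule" idea Jay
describes would look roughly like this in bond_mii_monitor() (a sketch of
the approach only, not Jay's actual patch; the bond->lock handling and the
netdev_bonding_change rearrangement he mentions are omitted for brevity):

static void bond_mii_monitor(struct work_struct *work)
{
	struct bonding *bond = container_of(work, struct bonding,
					    mii_work.work);

	if (bond_miimon_inspect(bond)) {
		/* A failover needs RTNL, but bond_close() may already
		 * hold RTNL while flushing this work item.  Instead of
		 * blocking (and deadlocking), give up and retry on the
		 * next tick. */
		if (!rtnl_trylock()) {
			queue_delayed_work(bond->wq, &bond->mii_work, 1);
			return;
		}
		bond_miimon_commit(bond);
		rtnl_unlock();
	}

	if (bond->params.miimon)
		queue_delayed_work(bond->wq, &bond->mii_work,
				   msecs_to_jiffies(bond->params.miimon));
}

With the monitor never sleeping on RTNL, bond_close() should then be free
to flush (or flush_sync) the work items without the CPU0/CPU1 deadlock
shown earlier in the thread.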