From: Jay Vosburgh
Subject: Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
Date: Fri, 21 Oct 2011 17:59:02 -0700
Message-ID: <14766.1319245142@death>
In-reply-to: <17144.1319178396@death>
References: <20111019081757.12455.24788.stgit@ltc219.sdl.hitachi.co.jp> <27007.1319047262@death> <20111019114129.162d895a@nehalam.linuxnetplumber.net> <29731.1319051397@death> <17144.1319178396@death>
To: netdev@vger.kernel.org
Cc: Américo Wang, Stephen Hemminger, Mitsuo Hayasaka, Andy Gospodarek,
 linux-kernel@vger.kernel.org, yrl.pp-manager.tt@hitachi.com

Jay Vosburgh wrote:
>Américo Wang wrote:
>
>>On Thu, Oct 20, 2011 at 3:09 AM, Jay Vosburgh wrote:
>>> Stephen Hemminger wrote:
>>>
>>>>On Wed, 19 Oct 2011 11:01:02 -0700
>>>>Jay Vosburgh wrote:
>>>>
>>>>> Mitsuo Hayasaka wrote:
>>>>>
>>>>> >The bond_close() calls cancel_delayed_work() to cancel delayed works.
>>>>> >It, however, cannot cancel works that were already queued in the
>>>>> >workqueue.  The bond_open() initializes work->data, and
>>>>> >process_one_work() refers to get_work_cwq(work)->wq->flags.
>>>>> >get_work_cwq() returns NULL when work->data has been initialized.
>>>>> >Thus, a panic occurs.
>>>>> >
>>>>> >This patch uses flush_delayed_work_sync() instead of
>>>>> >cancel_delayed_work() in bond_close().  It cancels the delayed timer
>>>>> >and waits for the work to finish execution, so it avoids the null
>>>>> >pointer dereference caused by process_one_work() running in parallel
>>>>> >with the initialization done in bond_open().
>>>>>
>>>>>	I'm setting up to test this.  I have a dim recollection that we
>>>>> tried this some years ago, and there was a different deadlock that
>>>>> manifested through the flush path.  Perhaps changes since then have
>>>>> removed that problem.
>>>>>
>>>>>	-J
>>>>
>>>>Won't this deadlock on RTNL?  The problem is that:
>>>>
>>>>  CPU0                          CPU1
>>>>  rtnl_lock
>>>>    bond_close
>>>>                                delayed_work
>>>>                                  mii_work
>>>>                                    read_lock(bond->lock);
>>>>                                    read_unlock(bond->lock);
>>>>                                    rtnl_lock... waiting for CPU0
>>>>    flush_delayed_work_sync
>>>>      waiting for delayed_work to finish...
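
	[ For anyone reading along, the two paths in Stephen's diagram
look roughly like this in the driver.  This is a hand-trimmed sketch
showing only the locking that matters, not the literal bonding source. ]

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>
#include "bonding.h"		/* struct bonding, drivers/net/bonding */

/* CPU0: the dev_close() path; RTNL is already held when this runs. */
static int bond_close(struct net_device *bond_dev)
{
	struct bonding *bond = netdev_priv(bond_dev);

	/*
	 * If this becomes flush_delayed_work_sync(), we sleep here
	 * until bond_mii_monitor() returns -- while still holding RTNL.
	 */
	cancel_delayed_work(&bond->mii_work);
	return 0;
}

/* CPU1: the delayed work item queued by bond_open(). */
static void bond_mii_monitor(struct work_struct *work)
{
	struct bonding *bond = container_of(work, struct bonding,
					    mii_work.work);

	read_lock(&bond->lock);
	/* ... inspect slave link state ... */
	read_unlock(&bond->lock);

	/*
	 * On a failover the monitor takes RTNL itself.  CPU0 already
	 * holds RTNL and is waiting for this work item to finish, so
	 * neither side can make progress: deadlock.
	 */
	rtnl_lock();
	/* ... failover handling ... */
	rtnl_unlock();
}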
>>>
>>>	Yah, that was it.  We discussed this a couple of years ago in
>>> regards to a similar patch:
>>>
>>> http://lists.openwall.net/netdev/2009/12/17/3
>>>
>>>	The short version is that we could rework the rtnl_lock inside
>>> the monitors to be conditional and retry on failure (where "retry" means
>>> "reschedule the work and try again later," not "spin retrying on rtnl").
>>> That should permit the use of flush or cancel to terminate the work
>>> items.
>>
>>Yes?  Even if we use rtnl_trylock(), doesn't flush_delayed_work_sync()
>>still queue the pending delayed work and wait for it to be finished?
>
>	Yes, it does.  The original patch wants to use flush instead of
>cancel to wait for the work to finish, because there's evidently a
>possibility of getting back into bond_open before the work item
>executes, and bond_open would reinitialize the work queue and corrupt
>the queued work item.
>
>	The original patch series, and recipe for destruction, is here:
>
>	http://www.spinics.net/lists/netdev/msg176382.html
>
>	I've been unable to reproduce the work queue panic locally,
>although it sounds plausible.
>
>	Mitsuo: can you provide the precise bonding configuration you're
>using to induce the problem?  Driver options, number and type of slaves,
>etc.
>
>>Maybe I am too blind, but why do we need rtnl_lock for
>>cancel_delayed_work() inside bond_close()?
>
>	We don't need RTNL for cancel/flush.  However, bond_close is an
>ndo_stop operation, and is called in the dev_close path, which always
>occurs under RTNL.  The mii / arp monitor work functions separately
>acquire RTNL if they need to perform various failover related
>operations.
>
>	I'm working on a patch that should resolve the mii / arp monitor
>RTNL problem as I described above (if rtnl_trylock fails, punt and
>reschedule the work).  I need to rearrange the netdev_bonding_change
>stuff a bit as well, since it acquires RTNL separately.
>
>	Once these changes are made to mii / arp monitor, then
>bond_close can call flush instead of cancel, which should eliminate the
>original problem described at the top.

	Just an update: there are three functions that may deadlock if
the cancel work calls are changed to flush_sync.  There are two
rtnl_lock calls in each of the bond_mii_monitor and
bond_activebackup_arp_mon functions, and one more in bond_alb_monitor.

	Still testing to make sure I haven't missed anything, and I
still haven't been able to reproduce Mitsuo's original failure.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
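
P.S.	For the curious, the monitor rework I'm testing is shaped
roughly like the below.  This is only a sketch against the current
bond_mii_monitor() structure (bond->wq and bond->params.miimon as in
today's driver); the actual patch will differ in detail, and the arp
and alb monitors need the same treatment.

/* Same headers as the earlier sketch. */
static void bond_mii_monitor(struct work_struct *work)
{
	struct bonding *bond = container_of(work, struct bonding,
					    mii_work.work);
	bool should_notify = false;	/* set by the elided link-state scan */

	read_lock(&bond->lock);
	/* ... inspect slave link state, decide whether failover is needed ... */
	read_unlock(&bond->lock);

	if (should_notify) {
		/*
		 * Don't block on RTNL: bond_close() may be holding it
		 * while it waits in flush_delayed_work_sync().  Punt
		 * and let the rescheduled work item retry shortly.
		 */
		if (!rtnl_trylock()) {
			queue_delayed_work(bond->wq, &bond->mii_work, 1);
			return;
		}
		/* ... commit the failover, netdev_bonding_change, etc. ... */
		rtnl_unlock();
	}

	if (bond->params.miimon)
		queue_delayed_work(bond->wq, &bond->mii_work,
				   msecs_to_jiffies(bond->params.miimon));
}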