From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Bharadiya,Pankaj"
Subject: Re: Dead lock condition occured ipanic during register_netdevice_notifier call in 4.9.102
Date: Thu, 31 May 2018 14:56:21 +0530
Message-ID: <20180531092621.GA15221@plaxmina-desktop.iind.intel.com>
References: <810586B7581CC8469141DADEBC37191292C67AB8@BGSMSX103.gar.corp.intel.com>
 <818edeb4-036e-cc56-cfd5-03754eda1180@virtuozzo.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Mansoor, Illyas" , "davem@davemloft.net" , "netdev@vger.kernel.org" , "Feng, Fleming" , "Li, Lili" , "Zhang, Baoli" , "Pan, Kris" , "Xia, Hui" , "Mei, Paul"
To: Kirill Tkhai
Return-path:
Received: from mga17.intel.com ([192.55.52.151]:25729 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754092AbeEaJc3 (ORCPT ); Thu, 31 May 2018 05:32:29 -0400
Content-Disposition: inline
In-Reply-To: <818edeb4-036e-cc56-cfd5-03754eda1180@virtuozzo.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, May 31, 2018 at 12:21:31PM +0300, Kirill Tkhai wrote:
> Hi, Illyas,
>
> On 31.05.2018 11:43, Mansoor, Illyas wrote:
> > We are facing mutex dead lock condition that we think might be related to a fix that you have provided in:
> > Merge branch 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations' commit b9a12601541eb55d07e00261a5112a4bc36fe7be
> >
> > We tried to backport the patch series, but got stuck due to dependencies not met in 4.9.102 kernel for these patch series.
> > Could you please provide some pointers, so that we can fix in 4.9.y kernel.
> >
> > Appreciate any help or pointers on this one.
> >
> > Ipanic logs pasted below:
> >
> > <3>[ 6513.681473] INFO: task sensors@1.0-ser:2744 blocked for more than 120 seconds.
> > <3>[ 6513.689723] Tainted: P U W O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> > <3>[ 6513.699108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > <6>[ 6513.707997] sensors@1.0-ser D 0 2744 1 0x00000000
> > <4>[ 6513.708007] ffff880223f38040 ffff88027fc980c0 0000000000000000 ffff880271987000
> > <4>[ 6513.708024] ffff88026f9ae040 ffffc90000d57d40 ffffffff81b363d1 ffffffff81396e0b
> > <4>[ 6513.708032] 00ffc90000d57d20 ffff88027fc980c0 ffffc90000d57d90 ffff88026f9ae040
> > <4>[ 6513.708040] Call Trace:
> > <4>[ 6513.708056] [] ? __schedule+0x221/0x6e0
> > <4>[ 6513.708063] [] ? sidtab_context_to_sid+0x39b/0x410
> > <4>[ 6513.708068] [] schedule+0x36/0x90
> > <4>[ 6513.708072] [] schedule_preempt_disabled+0x18/0x30
> > <4>[ 6513.708078] [] __mutex_lock_slowpath+0x185/0x3f0
> > <4>[ 6513.708083] [] mutex_lock+0x25/0x30
> > <4>[ 6513.708089] [] rtnl_lock+0x15/0x20
> > <4>[ 6513.708095] [] register_netdevice_notifier+0x2d/0x200
> > <4>[ 6513.708107] [] raw_init+0x8b/0x90
> > <4>[ 6513.708118] [] can_create+0xe1/0x1c0
> > <4>[ 6513.708129] [] __sock_create+0x12e/0x210
> > <4>[ 6513.708141] [] SyS_socket+0x55/0xb0
> > <4>[ 6513.708156] [] do_syscall_64+0x6a/0xe0
> > <4>[ 6513.708166] [] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
> > <4>[ 6513.708171] NMI backtrace for cpu 2
> > <4>[ 6513.708178] CPU: 2 PID: 482 Comm: khungtaskd Tainted: P U W O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> > <4>[ 6513.708180] ffffc90000eafdd0 ffffffff813f56bc 0000000000000000 0000000000000000
> > <4>[ 6513.708188] ffffc90000eafe00 ffffffff813f9fe1 0000000000000002 0000000000000000
> > <4>[ 6513.708195] ffffffff81042d80 ffffffff826120f8 ffffc90000eafe30 ffffffff813fa0a3
>
> 1)I'm not sure commit b9a12601541eb55d07e00261a5112a4bc36fe7be will help here, because this
> stack looks for me like just someone does not release the mutex. It's possible firstly
> try to analyze who actually owns it.
>
> 2)Also, note that rtnl_is_locked() is used in wrong way in one driver there
> (see WILC_WFI_deinit_mon_interface()), so it also may introduce an imbalance
> (if you use the driver).
>

Thank you for your quick response.
We will look into your suggestions and get back.

Thanks,
Pankaj

> Kirill
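
For reference, the kind of imbalance mentioned in point 2 above can be modelled in user space. This is only a sketch, not the driver's code: it assumes the problematic pattern is "if the lock is held by anyone, drop it, do the work, then take it again", and it relies on the fact that an is-locked check such as rtnl_is_locked() reports whether somebody holds the mutex, not whether the caller does. All names below (toy_mutex, toy_trylock, toy_unlock, buggy_deinit_begin, buggy_deinit_end, simulate_imbalance) are hypothetical, and the toy is single-threaded so that one interleaving can be replayed deterministically.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-in for a mutex: it records only whether the
 * lock is held and by which (made-up) task id. Not a kernel API. */
struct toy_mutex {
    bool locked;
    int owner;              /* -1 when free */
};

/* Like an is-locked check: true if ANY task holds the lock. */
static bool toy_is_locked(const struct toy_mutex *m)
{
    return m->locked;
}

/* Returns false where a real, blocking lock call would sleep. */
static bool toy_trylock(struct toy_mutex *m, int task)
{
    if (m->locked)
        return false;
    m->locked = true;
    m->owner = task;
    return true;
}

/* Deliberately unchecked, so that the misuse can be replayed. */
static void toy_unlock(struct toy_mutex *m)
{
    m->locked = false;
    m->owner = -1;
}

/* First half of the assumed pattern: drop the lock if anyone holds it. */
static bool buggy_deinit_begin(struct toy_mutex *m)
{
    if (!toy_is_locked(m))
        return false;
    toy_unlock(m);          /* may release a lock the caller never took */
    return true;
}

/* Second half: "restore" the lock if the first half dropped it. */
static void buggy_deinit_end(struct toy_mutex *m, int task, bool rollback)
{
    if (rollback)
        (void)toy_trylock(m, task);
}

/* Replays one interleaving; returns true if a later locker would block. */
bool simulate_imbalance(void)
{
    struct toy_mutex rtnl = { false, -1 };
    bool ok, rollback;

    ok = toy_trylock(&rtnl, 1);             /* task 1 takes the lock */
    assert(ok);
    (void)ok;
    rollback = buggy_deinit_begin(&rtnl);   /* task 2 drops task 1's lock */
    toy_unlock(&rtnl);                      /* task 1 finishes and unlocks */
    buggy_deinit_end(&rtnl, 2, rollback);   /* task 2 takes the lock again */
    /* Task 2's caller never locked, so nothing will unlock it again. */
    return !toy_trylock(&rtnl, 3);          /* task 3 would sleep here */
}
```

In this model simulate_imbalance() returns true: the lock stays held with no task left that will release it, so the next locker waits indefinitely. That matches the shape of the trace above (a task asleep in rtnl_lock()), but whether this is what actually happened on the reported system is not established here.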