From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lachlan McIlroy Subject: Re: possible deadlock shown by CONFIG_PROVE_LOCKING Date: Mon, 04 Feb 2008 16:56:52 +1100 Message-ID: <47A6A924.5010601@sgi.com> References: <18328.37831.359806.359123@fisica.ufpr.br> <479942D9.6000302@sgi.com> <18332.63985.756005.37224@fisica.ufpr.br> Reply-To: lachlan@sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org To: Carlos Carvalho Return-path: Received: from relay1.sgi.com ([192.48.171.29]:60718 "EHLO relay.sgi.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751180AbYBDFwv (ORCPT ); Mon, 4 Feb 2008 00:52:51 -0500 In-Reply-To: <18332.63985.756005.37224@fisica.ufpr.br> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Carlos Carvalho wrote: > Lachlan McIlroy (lachlan@sgi.com) wrote on 25 January 2008 13:00: > >Carlos Carvalho wrote: > >> I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the > >> below at boot. Is it a problem? > >It was a problem - it has since been fixed in 2.6.23. > >Patch is attached in case you're interested. > > [patch removed] > > Thanks. I applied it to 22.16 without rejects but then the machine > doesn't even mount the partition: > > Filesystem "md0": Disabling barriers, not supported by the underlying device > XFS mounting filesystem md0 > Unable to handle kernel paging request at ffff8110e325e8d0 RIP: The patch shouldn't have caused that. Did you keep a stack trace from this? > > Then I ran 23.14, no errors. However the machine locks hard a few > seconds after we start the heavy IO tests... Not good either - again, any stack traces from the stuck threads? > > So we used .24. Now the machine stood some 16h of heavy IO load. Good! > However we got the log below after about 15 minutes of running the test. > It didn't repeat. Is it still a problem? If it didn't cause a deadlock then it's probably not an issue. We probably just need some more lockdep annotations to tell lockdep to ignore this locking scenario. I can't see where we acquire nested qh_lock's in xfs_qm_dqget(). It might actually be caused by acquiring the xfs_qm_freelist lock in xfs_qm_dqget while holding the hash lock because both locks are initialised with the same stack and lockdep will think they are locks for the same object type. > > Jan 26 19:25:43 mumm kernel: Filesystem "md0": Disabling barriers, not supported by the underlying device > Jan 26 19:25:43 mumm kernel: XFS mounting filesystem md0 > Jan 26 19:25:43 mumm kernel: Ending clean XFS mount for filesystem: md0 > > 15min later... > > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: ============================================= > Jan 26 19:41:40 mumm kernel: [ INFO: possible recursive locking detected ] > Jan 26 19:41:40 mumm kernel: 2.6.24 #3 > Jan 26 19:41:40 mumm kernel: --------------------------------------------- > Jan 26 19:41:40 mumm kernel: rm/1589 is trying to acquire lock: > Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [] xfs_qm_dqget+0x297/0x355 > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: but task is already holding lock: > Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [] xfs_qm_dqget+0x1f6/0x355 > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: other info that might help us debug this: > Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589: > Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [] do_unlinkat+0x6b/0x133 > Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [] vfs_unlink+0x42/0x8c > Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [] xfs_ilock+0x63/0x8d > Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [] xfs_qm_dqget+0x1f6/0x355 > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: stack backtrace: > Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3 > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: Call Trace: > Jan 26 19:41:40 mumm kernel: [] __lock_acquire+0x1b7/0xcd0 > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqget+0x297/0x355 > Jan 26 19:41:40 mumm kernel: [] lock_acquire+0x51/0x6c > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqget+0x297/0x355 > Jan 26 19:41:40 mumm kernel: [] debug_mutex_lock_common+0x16/0x23 > Jan 26 19:41:40 mumm kernel: other info that might help us debug this: > Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589: > Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [] do_unlinkat+0x6b/0x133 > Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [] vfs_unlink+0x42/0x8c > Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [] xfs_ilock+0x63/0x8d > Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [] xfs_qm_dqget+0x1f6/0x355 > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: stack backtrace: > Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3 > Jan 26 19:41:40 mumm kernel: > Jan 26 19:41:40 mumm kernel: Call Trace: > Jan 26 19:41:40 mumm kernel: [] __lock_acquire+0x1b7/0xcd0 > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqget+0x297/0x355 > Jan 26 19:41:40 mumm kernel: [] lock_acquire+0x51/0x6c > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqget+0x297/0x355 > Jan 26 19:41:40 mumm kernel: [] debug_mutex_lock_common+0x16/0x23 > Jan 26 19:41:40 mumm kernel: [] mutex_lock_nested+0xec/0x286 > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqget+0x297/0x355 > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqattach_one+0xe2/0x150 > Jan 26 19:41:40 mumm kernel: [] xfs_qm_dqattach+0xb7/0x175 > Jan 26 19:41:40 mumm kernel: [] xfs_remove+0x8f/0x364 > Jan 26 19:41:40 mumm kernel: [] mutex_lock_nested+0x26c/0x286 > Jan 26 19:41:40 mumm kernel: [] trace_hardirqs_on+0x115/0x139 > Jan 26 19:41:40 mumm kernel: [] xfs_vn_unlink+0x17/0x36 > Jan 26 19:41:40 mumm kernel: [] vfs_unlink+0x5c/0x8c > Jan 26 19:41:40 mumm kernel: [] do_unlinkat+0xb4/0x133 > Jan 26 19:41:40 mumm kernel: [] trace_hardirqs_on_thunk+0x35/0x3a > Jan 26 19:41:40 mumm kernel: [] trace_hardirqs_on+0x115/0x139 > Jan 26 19:41:40 mumm kernel: [] trace_hardirqs_on_thunk+0x35/0x3a > Jan 26 19:41:40 mumm kernel: [] system_call+0x7e/0x83 >