* possible deadlock shown by CONFIG_PROVE_LOCKING
@ 2008-01-24 13:33 Carlos Carvalho
2008-01-25 2:00 ` Lachlan McIlroy
0 siblings, 1 reply; 4+ messages in thread
From: Carlos Carvalho @ 2008-01-24 13:33 UTC (permalink / raw)
To: linux-fsdevel
I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
below at boot. Is it a problem?
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES: 8
... MAX_LOCK_DEPTH: 30
... MAX_LOCKDEP_KEYS: 2048
... CLASSHASH_SIZE: 1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS: 16384
... CHAINHASH_SIZE: 8192
memory used by lock dependency info: 1648 kB
per task-struct memory footprint: 1680 bytes
------------------------
| Locking API testsuite:
----------------------------------------------------------------------------
[removed]
-------------------------------------------------------
Good, all 218 testcases passed! |
---------------------------------
Further down
md: running: <sdah1><sdag1>
raid1: raid set md3 active with 2 out of 2 mirrors
md: ... autorun DONE.
Filesystem "md1": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md1
Ending clean XFS mount for filesystem: md1
VFS: Mounted root (xfs filesystem).
Freeing unused kernel memory: 284k freed
Warning: unable to open an initial console.
Filesystem "md1": Disabling barriers, not supported by the underlying device
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.22.16 #1
-------------------------------------------------------
mount/1558 is trying to acquire lock:
(&(&ip->i_lock)->mr_lock/1){--..}, at: [<ffffffff80312805>] xfs_ilock+0x63/0x8d
but task is already holding lock:
(&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80312805>] xfs_ilock+0x63/0x8d
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&ip->i_lock)->mr_lock){----}:
[<ffffffff80249fa6>] __lock_acquire+0xa0f/0xb9f
[<ffffffff8024a50d>] lock_acquire+0x48/0x63
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff8023c909>] down_write_nested+0x38/0x46
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff803132e8>] xfs_iget_core+0x3ef/0x705
[<ffffffff803136a2>] xfs_iget+0xa4/0x14e
[<ffffffff80328364>] xfs_trans_iget+0xb4/0x128
[<ffffffff80316a57>] xfs_ialloc+0x9b/0x4b7
[<ffffffff80249fc9>] __lock_acquire+0xa32/0xb9f
[<ffffffff80328d87>] xfs_dir_ialloc+0x84/0x2cd
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff8023c909>] down_write_nested+0x38/0x46
[<ffffffff8032e307>] xfs_create+0x331/0x65f
[<ffffffff80308163>] xfs_dir2_leaf_lookup+0x1d/0x96
[<ffffffff80338367>] xfs_vn_mknod+0x12f/0x1f2
[<ffffffff8027fb0a>] vfs_create+0x6e/0x9e
[<ffffffff80282af3>] open_namei+0x1f7/0x6a9
[<ffffffff8021843d>] do_page_fault+0x438/0x78f
[<ffffffff8027705a>] do_filp_open+0x1c/0x3d
[<ffffffff8045bf56>] _spin_unlock+0x17/0x20
[<ffffffff80276e3d>] get_unused_fd+0x11c/0x12a
[<ffffffff802770bb>] do_sys_open+0x40/0x7b
[<ffffffff802095be>] system_call+0x7e/0x83
[<ffffffffffffffff>] 0xffffffffffffffff
-> #0 (&(&ip->i_lock)->mr_lock/1){--..}:
[<ffffffff80248896>] print_circular_bug_header+0xcc/0xd3
[<ffffffff80249ea2>] __lock_acquire+0x90b/0xb9f
[<ffffffff8024a50d>] lock_acquire+0x48/0x63
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff8023c909>] down_write_nested+0x38/0x46
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff8032bd30>] xfs_lock_inodes+0x152/0x16d
[<ffffffff8032e807>] xfs_link+0x1d2/0x3f7
[<ffffffff80249f3f>] __lock_acquire+0x9a8/0xb9f
[<ffffffff80337fe5>] xfs_vn_link+0x3c/0x91
[<ffffffff80248f4a>] mark_held_locks+0x58/0x72
[<ffffffff8045a9b7>] __mutex_lock_slowpath+0x250/0x266
[<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
[<ffffffff8045a9c2>] __mutex_lock_slowpath+0x25b/0x266
[<ffffffff8027f88b>] vfs_link+0xe8/0x124
[<ffffffff802822d8>] sys_linkat+0xcd/0x129
[<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
[<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff802095be>] system_call+0x7e/0x83
[<ffffffffffffffff>] 0xffffffffffffffff
other info that might help us debug this:
3 locks held by mount/1558:
#0: (&inode->i_mutex/1){--..}, at: [<ffffffff802800f5>] lookup_create+0x23/0x85
#1: (&inode->i_mutex){--..}, at: [<ffffffff8027f878>] vfs_link+0xd5/0x124
#2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80312805>] xfs_ilock+0x63/0x8d
stack backtrace:
Call Trace:
[<ffffffff80248612>] print_circular_bug_tail+0x69/0x72
[<ffffffff80248896>] print_circular_bug_header+0xcc/0xd3
[<ffffffff80249ea2>] __lock_acquire+0x90b/0xb9f
[<ffffffff8024a50d>] lock_acquire+0x48/0x63
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff8023c909>] down_write_nested+0x38/0x46
[<ffffffff80312805>] xfs_ilock+0x63/0x8d
[<ffffffff8032bd30>] xfs_lock_inodes+0x152/0x16d
[<ffffffff8032e807>] xfs_link+0x1d2/0x3f7
[<ffffffff80249f3f>] __lock_acquire+0x9a8/0xb9f
[<ffffffff80337fe5>] xfs_vn_link+0x3c/0x91
[<ffffffff80248f4a>] mark_held_locks+0x58/0x72
[<ffffffff8045a9b7>] __mutex_lock_slowpath+0x250/0x266
[<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
[<ffffffff8045a9c2>] __mutex_lock_slowpath+0x25b/0x266
[<ffffffff8027f88b>] vfs_link+0xe8/0x124
[<ffffffff802822d8>] sys_linkat+0xcd/0x129
[<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
[<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff802095be>] system_call+0x7e/0x83
* Re: possible deadlock shown by CONFIG_PROVE_LOCKING
2008-01-24 13:33 possible deadlock shown by CONFIG_PROVE_LOCKING Carlos Carvalho
@ 2008-01-25 2:00 ` Lachlan McIlroy
2008-01-27 21:38 ` Carlos Carvalho
0 siblings, 1 reply; 4+ messages in thread
From: Lachlan McIlroy @ 2008-01-25 2:00 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 6236 bytes --]
Carlos Carvalho wrote:
> I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
> below at boot. Is it a problem?
It was a problem - it has since been fixed in 2.6.23.
Patch is attached in case you're interested.
>
> Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> ... MAX_LOCKDEP_SUBCLASSES: 8
> ... MAX_LOCK_DEPTH: 30
> ... MAX_LOCKDEP_KEYS: 2048
> ... CLASSHASH_SIZE: 1024
> ... MAX_LOCKDEP_ENTRIES: 8192
> ... MAX_LOCKDEP_CHAINS: 16384
> ... CHAINHASH_SIZE: 8192
> memory used by lock dependency info: 1648 kB
> per task-struct memory footprint: 1680 bytes
> ------------------------
> | Locking API testsuite:
> ----------------------------------------------------------------------------
> [removed]
> -------------------------------------------------------
> Good, all 218 testcases passed! |
> ---------------------------------
>
> Further down
>
> md: running: <sdah1><sdag1>
> raid1: raid set md3 active with 2 out of 2 mirrors
> md: ... autorun DONE.
> Filesystem "md1": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem md1
> Ending clean XFS mount for filesystem: md1
> VFS: Mounted root (xfs filesystem).
> Freeing unused kernel memory: 284k freed
> Warning: unable to open an initial console.
> Filesystem "md1": Disabling barriers, not supported by the underlying device
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.22.16 #1
> -------------------------------------------------------
> mount/1558 is trying to acquire lock:
> (&(&ip->i_lock)->mr_lock/1){--..}, at: [<ffffffff80312805>] xfs_ilock+0x63/0x8d
>
> but task is already holding lock:
> (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80312805>] xfs_ilock+0x63/0x8d
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&ip->i_lock)->mr_lock){----}:
> [<ffffffff80249fa6>] __lock_acquire+0xa0f/0xb9f
> [<ffffffff8024a50d>] lock_acquire+0x48/0x63
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff8023c909>] down_write_nested+0x38/0x46
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff803132e8>] xfs_iget_core+0x3ef/0x705
> [<ffffffff803136a2>] xfs_iget+0xa4/0x14e
> [<ffffffff80328364>] xfs_trans_iget+0xb4/0x128
> [<ffffffff80316a57>] xfs_ialloc+0x9b/0x4b7
> [<ffffffff80249fc9>] __lock_acquire+0xa32/0xb9f
> [<ffffffff80328d87>] xfs_dir_ialloc+0x84/0x2cd
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff8023c909>] down_write_nested+0x38/0x46
> [<ffffffff8032e307>] xfs_create+0x331/0x65f
> [<ffffffff80308163>] xfs_dir2_leaf_lookup+0x1d/0x96
> [<ffffffff80338367>] xfs_vn_mknod+0x12f/0x1f2
> [<ffffffff8027fb0a>] vfs_create+0x6e/0x9e
> [<ffffffff80282af3>] open_namei+0x1f7/0x6a9
> [<ffffffff8021843d>] do_page_fault+0x438/0x78f
> [<ffffffff8027705a>] do_filp_open+0x1c/0x3d
> [<ffffffff8045bf56>] _spin_unlock+0x17/0x20
> [<ffffffff80276e3d>] get_unused_fd+0x11c/0x12a
> [<ffffffff802770bb>] do_sys_open+0x40/0x7b
> [<ffffffff802095be>] system_call+0x7e/0x83
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #0 (&(&ip->i_lock)->mr_lock/1){--..}:
> [<ffffffff80248896>] print_circular_bug_header+0xcc/0xd3
> [<ffffffff80249ea2>] __lock_acquire+0x90b/0xb9f
> [<ffffffff8024a50d>] lock_acquire+0x48/0x63
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff8023c909>] down_write_nested+0x38/0x46
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff8032bd30>] xfs_lock_inodes+0x152/0x16d
> [<ffffffff8032e807>] xfs_link+0x1d2/0x3f7
> [<ffffffff80249f3f>] __lock_acquire+0x9a8/0xb9f
> [<ffffffff80337fe5>] xfs_vn_link+0x3c/0x91
> [<ffffffff80248f4a>] mark_held_locks+0x58/0x72
> [<ffffffff8045a9b7>] __mutex_lock_slowpath+0x250/0x266
> [<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
> [<ffffffff8045a9c2>] __mutex_lock_slowpath+0x25b/0x266
> [<ffffffff8027f88b>] vfs_link+0xe8/0x124
> [<ffffffff802822d8>] sys_linkat+0xcd/0x129
> [<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
> [<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
> [<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
> [<ffffffff802095be>] system_call+0x7e/0x83
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> other info that might help us debug this:
>
> 3 locks held by mount/1558:
> #0: (&inode->i_mutex/1){--..}, at: [<ffffffff802800f5>] lookup_create+0x23/0x85
> #1: (&inode->i_mutex){--..}, at: [<ffffffff8027f878>] vfs_link+0xd5/0x124
> #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80312805>] xfs_ilock+0x63/0x8d
>
> stack backtrace:
>
> Call Trace:
> [<ffffffff80248612>] print_circular_bug_tail+0x69/0x72
> [<ffffffff80248896>] print_circular_bug_header+0xcc/0xd3
> [<ffffffff80249ea2>] __lock_acquire+0x90b/0xb9f
> [<ffffffff8024a50d>] lock_acquire+0x48/0x63
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff8023c909>] down_write_nested+0x38/0x46
> [<ffffffff80312805>] xfs_ilock+0x63/0x8d
> [<ffffffff8032bd30>] xfs_lock_inodes+0x152/0x16d
> [<ffffffff8032e807>] xfs_link+0x1d2/0x3f7
> [<ffffffff80249f3f>] __lock_acquire+0x9a8/0xb9f
> [<ffffffff80337fe5>] xfs_vn_link+0x3c/0x91
> [<ffffffff80248f4a>] mark_held_locks+0x58/0x72
> [<ffffffff8045a9b7>] __mutex_lock_slowpath+0x250/0x266
> [<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
> [<ffffffff8045a9c2>] __mutex_lock_slowpath+0x25b/0x266
> [<ffffffff8027f88b>] vfs_link+0xe8/0x124
> [<ffffffff802822d8>] sys_linkat+0xcd/0x129
> [<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
> [<ffffffff80249119>] trace_hardirqs_on+0x115/0x139
> [<ffffffff8045baaf>] trace_hardirqs_on_thunk+0x35/0x37
> [<ffffffff802095be>] system_call+0x7e/0x83
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 2429 bytes --]
commit 0f1145cc18e970ebe37da114fc34c297f135e062
Author: David Chinner <dgc@sgi.com>
Date: Fri Jun 29 17:26:09 2007 +1000
[XFS] Fix lockdep annotations for xfs_lock_inodes
SGI-PV: 967035
SGI-Modid: xfs-linux-melb:xfs-kern:29026a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index d418eee..012dfd4 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -415,19 +415,22 @@ xfs_iflags_test(xfs_inode_t *ip, unsigne
* gets a lockdep subclass of 1 and the second lock will have a lockdep
* subclass of 0.
*
- * XFS_I[O]LOCK_INUMORDER - for locking several inodes at the some time
+ * XFS_LOCK_INUMORDER - for locking several inodes at the some time
* with xfs_lock_inodes(). This flag is used as the starting subclass
* and each subsequent lock acquired will increment the subclass by one.
* So the first lock acquired will have a lockdep subclass of 2, the
- * second lock will have a lockdep subclass of 3, and so on.
+ * second lock will have a lockdep subclass of 3, and so on. It is
+ * the responsibility of the class builder to shift this to the correct
+ * portion of the lock_mode lockdep mask.
*/
+#define XFS_LOCK_PARENT 1
+#define XFS_LOCK_INUMORDER 2
+
#define XFS_IOLOCK_SHIFT 16
-#define XFS_IOLOCK_PARENT (1 << XFS_IOLOCK_SHIFT)
-#define XFS_IOLOCK_INUMORDER (2 << XFS_IOLOCK_SHIFT)
+#define XFS_IOLOCK_PARENT (XFS_LOCK_PARENT << XFS_IOLOCK_SHIFT)
#define XFS_ILOCK_SHIFT 24
-#define XFS_ILOCK_PARENT (1 << XFS_ILOCK_SHIFT)
-#define XFS_ILOCK_INUMORDER (2 << XFS_ILOCK_SHIFT)
+#define XFS_ILOCK_PARENT (XFS_LOCK_PARENT << XFS_ILOCK_SHIFT)
#define XFS_IOLOCK_DEP_MASK 0x00ff0000
#define XFS_ILOCK_DEP_MASK 0xff000000
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 60fd0be..79b5227 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2224,9 +2224,9 @@ static inline int
xfs_lock_inumorder(int lock_mode, int subclass)
{
if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))
- lock_mode |= (subclass + XFS_IOLOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_IOLOCK_SHIFT;
if (lock_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))
- lock_mode |= (subclass + XFS_ILOCK_INUMORDER) << XFS_ILOCK_SHIFT;
+ lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
return lock_mode;
}
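
To make the new subclass arithmetic concrete, here is a minimal user-space sketch (not kernel code; only the ILOCK branch is kept, and the XFS_ILOCK_EXCL value is assumed for illustration) that reproduces what the patched xfs_lock_inumorder() computes:

#include <stdio.h>

/* Macro values copied from the patched xfs_inode.h hunk above. */
#define XFS_LOCK_INUMORDER  2
#define XFS_ILOCK_SHIFT     24
#define XFS_ILOCK_DEP_MASK  0xff000000
#define XFS_ILOCK_EXCL      0x004	/* assumed flag value, illustration only */

/* Same arithmetic as the patched xfs_lock_inumorder(), ILOCK branch only. */
static int xfs_lock_inumorder(int lock_mode, int subclass)
{
	if (lock_mode & XFS_ILOCK_EXCL)
		lock_mode |= (subclass + XFS_LOCK_INUMORDER) << XFS_ILOCK_SHIFT;
	return lock_mode;
}

int main(void)
{
	/* xfs_lock_inodes() passes subclass 0, 1, 2, ... for successive inodes. */
	for (int i = 0; i < 3; i++) {
		unsigned int mode = xfs_lock_inumorder(XFS_ILOCK_EXCL, i);
		printf("inode %d -> lockdep subclass %u\n", i,
		       (mode & XFS_ILOCK_DEP_MASK) >> XFS_ILOCK_SHIFT);
	}
	return 0;
}

With subclasses 0, 1 and 2 passed in by xfs_lock_inodes(), this prints lockdep subclasses 2, 3 and 4 in the ILOCK byte of the dependency mask, matching the updated comment in xfs_inode.h.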
* Re: possible deadlock shown by CONFIG_PROVE_LOCKING
2008-01-25 2:00 ` Lachlan McIlroy
@ 2008-01-27 21:38 ` Carlos Carvalho
2008-02-04 5:56 ` Lachlan McIlroy
0 siblings, 1 reply; 4+ messages in thread
From: Carlos Carvalho @ 2008-01-27 21:38 UTC (permalink / raw)
To: lachlan; +Cc: linux-fsdevel
Lachlan McIlroy (lachlan@sgi.com) wrote on 25 January 2008 13:00:
>Carlos Carvalho wrote:
>> I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
>> below at boot. Is it a problem?
>It was a problem - it has since been fixed in 2.6.23.
>Patch is attached in case you're interested.
[patch removed]
Thanks. I applied it to 2.6.22.16 without rejects, but then the machine
doesn't even mount the partition:
Filesystem "md0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md0
Unable to handle kernel paging request at ffff8110e325e8d0 RIP:
Then I ran 2.6.23.14: no errors. However, the machine locks hard a few
seconds after we start the heavy IO tests...
So we used 2.6.24. The machine has now withstood some 16 hours of heavy IO load. Good!
However, we got the log below after about 15 minutes of running the test.
It didn't repeat. Is it still a problem?
Jan 26 19:25:43 mumm kernel: Filesystem "md0": Disabling barriers, not supported by the underlying device
Jan 26 19:25:43 mumm kernel: XFS mounting filesystem md0
Jan 26 19:25:43 mumm kernel: Ending clean XFS mount for filesystem: md0
15min later...
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: =============================================
Jan 26 19:41:40 mumm kernel: [ INFO: possible recursive locking detected ]
Jan 26 19:41:40 mumm kernel: 2.6.24 #3
Jan 26 19:41:40 mumm kernel: ---------------------------------------------
Jan 26 19:41:40 mumm kernel: rm/1589 is trying to acquire lock:
Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: but task is already holding lock:
Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [<ffffffff8027c400>] do_unlinkat+0x6b/0x133
Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [<ffffffff8027a59d>] vfs_unlink+0x42/0x8c
Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff802f4093>] xfs_ilock+0x63/0x8d
Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: stack backtrace:
Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: Call Trace:
Jan 26 19:41:40 mumm kernel: [<ffffffff802446b2>] __lock_acquire+0x1b7/0xcd0
Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel: [<ffffffff802455f4>] lock_acquire+0x51/0x6c
Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel: [<ffffffff80241ef9>] debug_mutex_lock_common+0x16/0x23
Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [<ffffffff8027c400>] do_unlinkat+0x6b/0x133
Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [<ffffffff8027a59d>] vfs_unlink+0x42/0x8c
Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff802f4093>] xfs_ilock+0x63/0x8d
Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: stack backtrace:
Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
Jan 26 19:41:40 mumm kernel:
Jan 26 19:41:40 mumm kernel: Call Trace:
Jan 26 19:41:40 mumm kernel: [<ffffffff802446b2>] __lock_acquire+0x1b7/0xcd0
Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel: [<ffffffff802455f4>] lock_acquire+0x51/0x6c
Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel: [<ffffffff80241ef9>] debug_mutex_lock_common+0x16/0x23
Jan 26 19:41:40 mumm kernel: [<ffffffff803f8c7f>] mutex_lock_nested+0xec/0x286
Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
Jan 26 19:41:40 mumm kernel: [<ffffffff802c8c2c>] xfs_qm_dqattach_one+0xe2/0x150
Jan 26 19:41:40 mumm kernel: [<ffffffff802c8d51>] xfs_qm_dqattach+0xb7/0x175
Jan 26 19:41:40 mumm kernel: [<ffffffff80311ac5>] xfs_remove+0x8f/0x364
Jan 26 19:41:40 mumm kernel: [<ffffffff803f8dff>] mutex_lock_nested+0x26c/0x286
Jan 26 19:41:40 mumm kernel: [<ffffffff80244082>] trace_hardirqs_on+0x115/0x139
Jan 26 19:41:40 mumm kernel: [<ffffffff80318e6f>] xfs_vn_unlink+0x17/0x36
Jan 26 19:41:40 mumm kernel: [<ffffffff8027a5b7>] vfs_unlink+0x5c/0x8c
Jan 26 19:41:40 mumm kernel: [<ffffffff8027c449>] do_unlinkat+0xb4/0x133
Jan 26 19:41:40 mumm kernel: [<ffffffff803f997a>] trace_hardirqs_on_thunk+0x35/0x3a
Jan 26 19:41:40 mumm kernel: [<ffffffff80244082>] trace_hardirqs_on+0x115/0x139
Jan 26 19:41:40 mumm kernel: [<ffffffff803f997a>] trace_hardirqs_on_thunk+0x35/0x3a
Jan 26 19:41:40 mumm kernel: [<ffffffff8020b66e>] system_call+0x7e/0x83
* Re: possible deadlock shown by CONFIG_PROVE_LOCKING
2008-01-27 21:38 ` Carlos Carvalho
@ 2008-02-04 5:56 ` Lachlan McIlroy
0 siblings, 0 replies; 4+ messages in thread
From: Lachlan McIlroy @ 2008-02-04 5:56 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-fsdevel
Carlos Carvalho wrote:
> Lachlan McIlroy (lachlan@sgi.com) wrote on 25 January 2008 13:00:
> >Carlos Carvalho wrote:
> >> I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
> >> below at boot. Is it a problem?
> >It was a problem - it has since been fixed in 2.6.23.
> >Patch is attached in case you're interested.
>
> [patch removed]
>
> Thanks. I applied it to 22.16 without rejects but then the machine
> doesn't even mount the partition:
>
> Filesystem "md0": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem md0
> Unable to handle kernel paging request at ffff8110e325e8d0 RIP:
The patch shouldn't have caused that. Did you keep a stack trace from this?
>
> Then I ran 23.14, no errors. However the machine locks hard a few
> seconds after we start the heavy IO tests...
Not good either - again, any stack traces from the stuck threads?
>
> So we used .24. Now the machine stood some 16h of heavy IO load. Good!
> However we got the log below after about 15 minutes of running the test.
> It didn't repeat. Is it still a problem?
If it didn't cause a deadlock then it's probably not an issue. We probably
just need some more lockdep annotations to tell lockdep to ignore this
locking scenario. I can't see where we acquire nested qh_locks in
xfs_qm_dqget(). It might actually be caused by acquiring the xfs_qm_freelist
lock in xfs_qm_dqget() while holding the hash lock, because both locks are
initialised from the same code and lockdep will therefore treat them as
locks of the same class.
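
As a general illustration of that kind of annotation (hypothetical structure and key names, not the actual XFS quota code): the two usual ways to stop lockdep conflating locks initialised from the same place are to give one of them its own lock class key, or to mark the inner acquisition as an intentional nested one. A rough kernel-style sketch:

#include <linux/mutex.h>
#include <linux/lockdep.h>

static struct lock_class_key freelist_qh_lock_key;	/* hypothetical key name */

struct dq_list {			/* stand-in for the hash/freelist structures */
	struct mutex qh_lock;
};

static void dq_list_init(struct dq_list *l, int is_freelist)
{
	mutex_init(&l->qh_lock);
	/* Option 1: move the freelist's lock into its own lockdep class. */
	if (is_freelist)
		lockdep_set_class(&l->qh_lock, &freelist_qh_lock_key);
}

static void take_both(struct dq_list *hash, struct dq_list *freelist)
{
	mutex_lock(&hash->qh_lock);
	/* Option 2: annotate the inner lock as an intentional nested acquisition. */
	mutex_lock_nested(&freelist->qh_lock, SINGLE_DEPTH_NESTING);
	/* ... work on both lists ... */
	mutex_unlock(&freelist->qh_lock);
	mutex_unlock(&hash->qh_lock);
}

Either annotation would silence the report without changing the locking itself; whether that is the right fix for the quota code here is a separate question.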
>
> Jan 26 19:25:43 mumm kernel: Filesystem "md0": Disabling barriers, not supported by the underlying device
> Jan 26 19:25:43 mumm kernel: XFS mounting filesystem md0
> Jan 26 19:25:43 mumm kernel: Ending clean XFS mount for filesystem: md0
>
> 15min later...
>
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: =============================================
> Jan 26 19:41:40 mumm kernel: [ INFO: possible recursive locking detected ]
> Jan 26 19:41:40 mumm kernel: 2.6.24 #3
> Jan 26 19:41:40 mumm kernel: ---------------------------------------------
> Jan 26 19:41:40 mumm kernel: rm/1589 is trying to acquire lock:
> Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: but task is already holding lock:
> Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
> Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
> Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [<ffffffff8027c400>] do_unlinkat+0x6b/0x133
> Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [<ffffffff8027a59d>] vfs_unlink+0x42/0x8c
> Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff802f4093>] xfs_ilock+0x63/0x8d
> Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: stack backtrace:
> Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: Call Trace:
> Jan 26 19:41:40 mumm kernel: [<ffffffff802446b2>] __lock_acquire+0x1b7/0xcd0
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff802455f4>] lock_acquire+0x51/0x6c
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff80241ef9>] debug_mutex_lock_common+0x16/0x23
> Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
> Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
> Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [<ffffffff8027c400>] do_unlinkat+0x6b/0x133
> Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [<ffffffff8027a59d>] vfs_unlink+0x42/0x8c
> Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff802f4093>] xfs_ilock+0x63/0x8d
> Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: stack backtrace:
> Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: Call Trace:
> Jan 26 19:41:40 mumm kernel: [<ffffffff802446b2>] __lock_acquire+0x1b7/0xcd0
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff802455f4>] lock_acquire+0x51/0x6c
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff80241ef9>] debug_mutex_lock_common+0x16/0x23
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f8c7f>] mutex_lock_nested+0xec/0x286
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c8c2c>] xfs_qm_dqattach_one+0xe2/0x150
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c8d51>] xfs_qm_dqattach+0xb7/0x175
> Jan 26 19:41:40 mumm kernel: [<ffffffff80311ac5>] xfs_remove+0x8f/0x364
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f8dff>] mutex_lock_nested+0x26c/0x286
> Jan 26 19:41:40 mumm kernel: [<ffffffff80244082>] trace_hardirqs_on+0x115/0x139
> Jan 26 19:41:40 mumm kernel: [<ffffffff80318e6f>] xfs_vn_unlink+0x17/0x36
> Jan 26 19:41:40 mumm kernel: [<ffffffff8027a5b7>] vfs_unlink+0x5c/0x8c
> Jan 26 19:41:40 mumm kernel: [<ffffffff8027c449>] do_unlinkat+0xb4/0x133
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f997a>] trace_hardirqs_on_thunk+0x35/0x3a
> Jan 26 19:41:40 mumm kernel: [<ffffffff80244082>] trace_hardirqs_on+0x115/0x139
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f997a>] trace_hardirqs_on_thunk+0x35/0x3a
> Jan 26 19:41:40 mumm kernel: [<ffffffff8020b66e>] system_call+0x7e/0x83
>