The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [BUG -rt] circular locking deadlock
@ 2007-08-16  1:39 john stultz
  2007-08-16  7:39 ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: john stultz @ 2007-08-16  1:39 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner; +Cc: lkml

Hey Ingo, Thomas,

I was playing with the latency tracer on 2.6.23-rc2-rt2 while a "make
-j8" was going on in the background and the box hung with this on the
console:


============================================
[ BUG: circular locking deadlock detected! ]
--------------------------------------------
cc1/7161 is deadlocking current task cc1/7164


1) cc1/7164 is trying to acquire this lock:
 [ffff8101fa375250] {&list->lock}
.. ->owner: ffff8101ef7f7162
.. held by:               cc1: 7161 [ffff8101ef7f7160, 120]

2) cc1/7161 is blocked on this lock:
 [ffff8102118e5900] {&list->lock}
.. ->owner: ffff8100ddb3ace2
.. held by:               cc1: 7164 [ffff8100ddb3ace0, 120]

cc1/7161's [blocked] stackdump:

 ffff8101f9423c68 0000000000000086 0000000100000000 0000000000000002
 ffffffff82205d00 0000000000000202 ffffffff82189280 ffff8101ef7f7160
 ffff8100ddb3ace0 ffff8101f9423c10 0000000000000202 0000000000000000
Call Trace:
 [<ffffffff8100c58c>] mcount+0x2c/0x3a
 [<ffffffff817e6e6b>] schedule+0x3b/0x130
 [<ffffffff817e7feb>] rt_spin_lock_slowlock+0xdb/0x210
 [<ffffffff817e8b4d>] __rt_spin_lock+0x3d/0x50
 [<ffffffff817e8b6e>] rt_spin_lock+0xe/0x10
 [<ffffffff81297a62>] __lock_list+0x42/0x70
 [<ffffffff81297b43>] lock_list_splice_init+0x33/0xb0
 [<ffffffff810bf9d8>] file_move+0xc8/0xd0
 [<ffffffff810bc99e>] __dentry_open+0xae/0x1d0
 [<ffffffff810bcb84>] nameidata_to_filp+0x44/0x60
 [<ffffffff810bcbed>] do_filp_open+0x4d/0x60
 [<ffffffff817e8ace>] __lock_text_start+0xe/0x10
 [<ffffffff810bc8a3>] get_unused_fd_flags+0x103/0x130
 [<ffffffff810bcc61>] do_sys_open+0x61/0x100
 [<ffffffff810bcd40>] sys_open+0x20/0x30
 [<ffffffff8100c9a5>] tracesys+0x151/0x1be

---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<ffffffff817e5e33>] .... __schedule+0x23/0x265
.....[<ffffffff817e6e6b>] ..   ( <= schedule+0x3b/0x130)
.. [<ffffffff817e9041>] .... __spin_lock_irq+0x21/0x40
.....[<ffffffff817e5ec4>] ..   ( <= __schedule+0xb4/0x265)


cc1/7164's [current] stackdump:


Call Trace:
 [<ffffffff8100e433>] show_stack+0x13/0x20
 [<ffffffff8106c643>] debug_rt_mutex_print_deadlock+0x163/0x180
 [<ffffffff817e7fe6>] rt_spin_lock_slowlock+0xd6/0x210
 [<ffffffff817e8b4d>] __rt_spin_lock+0x3d/0x50
 [<ffffffff817e8b6e>] rt_spin_lock+0xe/0x10
 [<ffffffff81297a62>] __lock_list+0x42/0x70
 [<ffffffff81297be6>] lock_list_del_init+0x26/0x70
 [<ffffffff810bf8fa>] file_kill+0x4a/0x60
 [<ffffffff810bfb8e>] __fput+0x12e/0x1d0
 [<ffffffff810bfcc5>] fput+0x25/0x30
 [<ffffffff810bc771>] filp_close+0x61/0x90
 [<ffffffff810bded9>] sys_close+0xa9/0x100
 [<ffffffff8100c9a5>] tracesys+0x151/0x1be

---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------

[ turning off deadlock detection.Please report this trace. ]



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [BUG -rt] circular locking deadlock
  2007-08-16  1:39 [BUG -rt] circular locking deadlock john stultz
@ 2007-08-16  7:39 ` Peter Zijlstra
  2007-08-16 11:33   ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2007-08-16  7:39 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, lkml

On Wed, 2007-08-15 at 18:39 -0700, john stultz wrote:
> Hey Ingo, Thomas,
> 
> I was playing with the latency tracer on 2.6.23-rc2-rt2 while a "make
> -j8" was going on in the background and the box hung with this on the
> console:

Hmm, this would have been me :-/

I'll go play...

> ============================================
> [ BUG: circular locking deadlock detected! ]
> --------------------------------------------
> cc1/7161 is deadlocking current task cc1/7164
> 
> 
> 1) cc1/7164 is trying to acquire this lock:
>  [ffff8101fa375250] {&list->lock}
> ... ->owner: ffff8101ef7f7162
> ... held by:               cc1: 7161 [ffff8101ef7f7160, 120]
> 
> 2) cc1/7161 is blocked on this lock:
>  [ffff8102118e5900] {&list->lock}
> ... ->owner: ffff8100ddb3ace2
> ... held by:               cc1: 7164 [ffff8100ddb3ace0, 120]
> 
> cc1/7161's [blocked] stackdump:
> 
>  ffff8101f9423c68 0000000000000086 0000000100000000 0000000000000002
>  ffffffff82205d00 0000000000000202 ffffffff82189280 ffff8101ef7f7160
>  ffff8100ddb3ace0 ffff8101f9423c10 0000000000000202 0000000000000000
> Call Trace:
>  [<ffffffff8100c58c>] mcount+0x2c/0x3a
>  [<ffffffff817e6e6b>] schedule+0x3b/0x130
>  [<ffffffff817e7feb>] rt_spin_lock_slowlock+0xdb/0x210
>  [<ffffffff817e8b4d>] __rt_spin_lock+0x3d/0x50
>  [<ffffffff817e8b6e>] rt_spin_lock+0xe/0x10
>  [<ffffffff81297a62>] __lock_list+0x42/0x70
>  [<ffffffff81297b43>] lock_list_splice_init+0x33/0xb0
>  [<ffffffff810bf9d8>] file_move+0xc8/0xd0
>  [<ffffffff810bc99e>] __dentry_open+0xae/0x1d0
>  [<ffffffff810bcb84>] nameidata_to_filp+0x44/0x60
>  [<ffffffff810bcbed>] do_filp_open+0x4d/0x60
>  [<ffffffff817e8ace>] __lock_text_start+0xe/0x10
>  [<ffffffff810bc8a3>] get_unused_fd_flags+0x103/0x130
>  [<ffffffff810bcc61>] do_sys_open+0x61/0x100
>  [<ffffffff810bcd40>] sys_open+0x20/0x30
>  [<ffffffff8100c9a5>] tracesys+0x151/0x1be
> 
> ---------------------------
> | preempt count: 00000002 ]
> | 2-level deep critical section nesting:
> ----------------------------------------
> ... [<ffffffff817e5e33>] .... __schedule+0x23/0x265
> ......[<ffffffff817e6e6b>] ..   ( <= schedule+0x3b/0x130)
> ... [<ffffffff817e9041>] .... __spin_lock_irq+0x21/0x40
> ......[<ffffffff817e5ec4>] ..   ( <= __schedule+0xb4/0x265)
> 
> 
> cc1/7164's [current] stackdump:
> 
> 
> Call Trace:
>  [<ffffffff8100e433>] show_stack+0x13/0x20
>  [<ffffffff8106c643>] debug_rt_mutex_print_deadlock+0x163/0x180
>  [<ffffffff817e7fe6>] rt_spin_lock_slowlock+0xd6/0x210
>  [<ffffffff817e8b4d>] __rt_spin_lock+0x3d/0x50
>  [<ffffffff817e8b6e>] rt_spin_lock+0xe/0x10
>  [<ffffffff81297a62>] __lock_list+0x42/0x70
>  [<ffffffff81297be6>] lock_list_del_init+0x26/0x70
>  [<ffffffff810bf8fa>] file_kill+0x4a/0x60
>  [<ffffffff810bfb8e>] __fput+0x12e/0x1d0
>  [<ffffffff810bfcc5>] fput+0x25/0x30
>  [<ffffffff810bc771>] filp_close+0x61/0x90
>  [<ffffffff810bded9>] sys_close+0xa9/0x100
>  [<ffffffff8100c9a5>] tracesys+0x151/0x1be
> 
> ---------------------------
> | preempt count: 00000000 ]
> | 0-level deep critical section nesting:
> ----------------------------------------
> 
> [ turning off deadlock detection.Please report this trace. ]



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [BUG -rt] circular locking deadlock
  2007-08-16  7:39 ` Peter Zijlstra
@ 2007-08-16 11:33   ` Peter Zijlstra
  0 siblings, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2007-08-16 11:33 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, lkml

On Thu, 2007-08-16 at 09:39 +0200, Peter Zijlstra wrote:
> On Wed, 2007-08-15 at 18:39 -0700, john stultz wrote:
> > Hey Ingo, Thomas,
> > 
> > I was playing with the latency tracer on 2.6.23-rc2-rt2 while a "make
> > -j8" was going on in the background and the box hung with this on the
> > console:
> 
> Hmm, this would have been me :-/
> 
> I'll go play...


Could you give this a spin...

(not sure on the added rmbs, but they can't hurt)

---
Fix a deadlock in the fine grain locked list primitives.

Delete and splice use a double lock, which normally locks in the
prev->cur order. For delete this is deadlock free provided one will
never delete the list head - which is exactly what splice attempts.

In order to solve this, use the reverse locking order for splice - which
then assumes that the list passes is indeed the list head (no fancy
dummy item headless lists here).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 lib/lock_list.c |   66 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 6 deletions(-)

Index: linux-2.6/lib/lock_list.c
===================================================================
--- linux-2.6.orig/lib/lock_list.c
+++ linux-2.6/lib/lock_list.c
@@ -11,7 +11,7 @@
  *
  * Passed pointers are assumed to be stable by external means such as
  * refcounts or RCU. The individual list entries are assumed to be RCU
- * freed (requirement of __lock_list_del).
+ * freed (requirement of __lock_list).
  */
 
 #include <linux/lock_list.h>
@@ -19,12 +19,9 @@
 void lock_list_add(struct lock_list_head *new,
 		   struct lock_list_head *list)
 {
-	struct lock_list_head *next;
-
 	spin_lock(&new->lock);
 	spin_lock_nested(&list->lock, LOCK_LIST_NESTING_PREV);
-	next = list->next;
-	__list_add(&new->head, &list->head, &next->head);
+	__list_add(&new->head, &list->head, &list->next->head);
 	spin_unlock(&list->lock);
 	spin_unlock(&new->lock);
 }
@@ -35,6 +32,13 @@ static spinlock_t *__lock_list(struct lo
 	spinlock_t *lock = NULL;
 
 again:
+	/*
+	 * all modifications are done under spinlocks
+	 * but this read is not, the unlock acks as a wmb
+	 * for modifications.
+	 */
+	smp_rmb();
+
 	prev = entry->prev;
 	if (prev == entry)
 		goto one;
@@ -52,6 +56,56 @@ one:
 	return lock;
 }
 
+/*
+ * deadlock galore...
+ *
+ * when using __lock_list to lock the list head we get this:
+ *
+ *     lock H      2               1
+ *     lock 1      a       b
+ *     lock 2              A       B
+ *
+ *     list: ..-> [H] <-> [1] <-> [2] <-..
+ *
+ * obvious dead-lock, to solve this we must use a reverse order
+ * when trying to acquire a double lock on the head:
+ *
+ *     lock H r    1               2
+ *     lock 1      a       b
+ *     lock 2              A       B
+ *
+ *     list: ..-> [H] <-> [1] <-> [2] <-..
+ */
+static spinlock_t *__lock_list_reverse(struct lock_list_head *entry)
+{
+	struct lock_list_head *prev;
+	spinlock_t *lock = NULL;
+
+	spin_lock(&entry->lock);
+again:
+	/*
+	 * all modifications are done under spinlocks
+	 * but this read is not, the unlock acks as a wmb
+	 * for modifications.
+	 */
+	smp_rmb();
+	prev = entry->prev;
+	if (prev == entry)
+		goto done;
+
+	spin_lock_nested(&prev->lock, LOCK_LIST_NESTING_PREV);
+	if (unlikely(entry->prev != prev)) {
+		/*
+		 * we lost
+		 */
+		spin_unlock(&prev->lock);
+		goto again;
+	}
+	lock = &prev->lock;
+done:
+	return lock;
+}
+
 void lock_list_del_init(struct lock_list_head *entry)
 {
 	spinlock_t *lock;
@@ -71,7 +125,7 @@ void lock_list_splice_init(struct lock_l
 	spinlock_t *lock;
 
 	rcu_read_lock();
-	lock = __lock_list(list);
+	lock = __lock_list_reverse(list);
 	if (!list_empty(&list->head)) {
 		spin_lock_nested(&head->lock, LOCK_LIST_NESTING_NEXT);
 		__list_splice(&list->head, &head->head);



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-08-16 11:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-16  1:39 [BUG -rt] circular locking deadlock john stultz
2007-08-16  7:39 ` Peter Zijlstra
2007-08-16 11:33   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox