* Re: 2.6.17-rc5-mm3
@ 2006-06-05 16:30 Martin Bligh
2006-06-05 19:44 ` 2.6.17-rc5-mm3 Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Martin Bligh @ 2006-06-05 16:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andy Whitcroft, LKML, Ingo Molnar
panic on NUMA-Q during LTP. Was fine in -mm2.
BUG: unable to handle kernel paging request at virtual address 22222232
printing eip:
c012cf84
*pde = 25b5a001
*pte = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/resource
Modules linked in:
CPU: 12
EIP: 0060:[<c012cf84>] Not tainted VLI
EFLAGS: 00010002 (2.6.17-rc5-mm3-autokern1 #1)
EIP is at check_deadlock+0x19/0xe1
eax: 00000001 ebx: e4453030 ecx: 00000000 edx: e4008000
esi: 22222222 edi: 00000001 ebp: 22222222 esp: e47ebec0
ds: 007b es: 007b ss: 0068
Process mkdir09 (pid: 18319, threadinfo=e47ea000 task=e5f91ab0)
Stack: e4453030 22222222 00000000 e459231c c012d015 22222222 00000001
e4008000
e459231c e47ea000 e47ebf1c e5f91ab0 c012d1ce e459231c 00000000
e47ea000
e47ebf1c e459231c 00000246 c02f1d74 e459231c e47ebf1c e47ea000
e47ebf1c
Call Trace:
[<c012d015>] check_deadlock+0xaa/0xe1
[<c012d1ce>] debug_mutex_add_waiter+0x4a/0x5c
[<c02f1d74>] __mutex_lock_slowpath+0x9e/0x1cb
[<c01648a9>] do_rmdir+0x67/0xc2
[<c02001da>] __put_user_4+0x12/0x18
[<c016490f>] sys_rmdir+0xb/0xe
[<c02f2f1f>] syscall_call+0x7/0xb
Code: 0c 68 60 07 31 c0 e8 22 c0 fe ff 58 fa 5b 5e 5f 5d c3 55 83 3d cc
11 36 c0 00 57 56 53 8b 6c 24 14 8b 7c 24 18 0f 84 c1 00 00 00 <8b> 55
10 31 c0 85 d2 0f 84 b6 00 00 00 8b 1a 31 f6 8b 83 c4 04
EIP: [<c012cf84>] check_deadlock+0x19/0xe1 SS:ESP 0068:e47ebec0
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.17-rc5-mm3
2006-06-05 16:30 2.6.17-rc5-mm3 Martin Bligh
@ 2006-06-05 19:44 ` Ingo Molnar
2006-06-05 20:00 ` 2.6.17-rc5-mm3 Randy.Dunlap
0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2006-06-05 19:44 UTC (permalink / raw)
To: Martin Bligh; +Cc: Andrew Morton, Andy Whitcroft, LKML
* Martin Bligh <mbligh@google.com> wrote:
> panic on NUMA-Q during LTP. Was fine in -mm2.
>
> BUG: unable to handle kernel paging request at virtual address 22222232
> EIP is at check_deadlock+0x19/0xe1
> eax: 00000001 ebx: e4453030 ecx: 00000000 edx: e4008000
> esi: 22222222 edi: 00000001 ebp: 22222222 esp: e47ebec0
again these 0x22222222 entries on the stack. What on earth does this?
Andy got a similar crash on x86_64, with a 0x2222222222222222 entry ...
nothing of our magic values are 0x22 or 0x222222222.
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.17-rc5-mm3
2006-06-05 19:44 ` 2.6.17-rc5-mm3 Ingo Molnar
@ 2006-06-05 20:00 ` Randy.Dunlap
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Randy.Dunlap @ 2006-06-05 20:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: mbligh, akpm, apw, linux-kernel
On Mon, 5 Jun 2006 21:44:22 +0200 Ingo Molnar wrote:
>
> * Martin Bligh <mbligh@google.com> wrote:
>
> > panic on NUMA-Q during LTP. Was fine in -mm2.
> >
> > BUG: unable to handle kernel paging request at virtual address 22222232
>
> > EIP is at check_deadlock+0x19/0xe1
> > eax: 00000001 ebx: e4453030 ecx: 00000000 edx: e4008000
> > esi: 22222222 edi: 00000001 ebp: 22222222 esp: e47ebec0
>
> again these 0x22222222 entries on the stack. What on earth does this?
> Andy got a similar crash on x86_64, with a 0x2222222222222222 entry ...
>
> nothing of our magic values are 0x22 or 0x222222222.
kernel/mutex-debug.c:
void debug_mutex_free_waiter(struct mutex_waiter *waiter)
{
DEBUG_WARN_ON(!list_empty(&waiter->list));
memset(waiter, 0x22, sizeof(*waiter));
}
---
~Randy
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.17-rc5-mm3
2006-06-05 20:00 ` 2.6.17-rc5-mm3 Randy.Dunlap
@ 2006-06-05 20:05 ` Ingo Molnar
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Dave Jones
2006-06-06 8:56 ` [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code Ingo Molnar
2 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2006-06-05 20:05 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: mbligh, akpm, apw, linux-kernel
* Randy.Dunlap <rdunlap@xenotime.net> wrote:
> > > panic on NUMA-Q during LTP. Was fine in -mm2.
> > >
> > > BUG: unable to handle kernel paging request at virtual address 22222232
> >
> > > EIP is at check_deadlock+0x19/0xe1
> > > eax: 00000001 ebx: e4453030 ecx: 00000000 edx: e4008000
> > > esi: 22222222 edi: 00000001 ebp: 22222222 esp: e47ebec0
> >
> > again these 0x22222222 entries on the stack. What on earth does this?
> > Andy got a similar crash on x86_64, with a 0x2222222222222222 entry ...
> >
> > nothing of our magic values are 0x22 or 0x222222222.
>
> kernel/mutex-debug.c:
> void debug_mutex_free_waiter(struct mutex_waiter *waiter)
> {
> DEBUG_WARN_ON(!list_empty(&waiter->list));
> memset(waiter, 0x22, sizeof(*waiter));
> }
ah!!! that's indeed a hint. Will take a look tomorrow.
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.17-rc5-mm3
2006-06-05 20:00 ` 2.6.17-rc5-mm3 Randy.Dunlap
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Ingo Molnar
@ 2006-06-05 20:05 ` Dave Jones
2006-06-05 20:08 ` 2.6.17-rc5-mm3 Ingo Molnar
2006-06-05 20:14 ` 2.6.17-rc5-mm3 Randy.Dunlap
2006-06-06 8:56 ` [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code Ingo Molnar
2 siblings, 2 replies; 14+ messages in thread
From: Dave Jones @ 2006-06-05 20:05 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: Ingo Molnar, mbligh, akpm, apw, linux-kernel
On Mon, Jun 05, 2006 at 01:00:39PM -0700, Randy.Dunlap wrote:
> On Mon, 5 Jun 2006 21:44:22 +0200 Ingo Molnar wrote:
>
> >
> > * Martin Bligh <mbligh@google.com> wrote:
> >
> > > panic on NUMA-Q during LTP. Was fine in -mm2.
> > >
> > > BUG: unable to handle kernel paging request at virtual address 22222232
> >
> > > EIP is at check_deadlock+0x19/0xe1
> > > eax: 00000001 ebx: e4453030 ecx: 00000000 edx: e4008000
> > > esi: 22222222 edi: 00000001 ebp: 22222222 esp: e47ebec0
> >
> > again these 0x22222222 entries on the stack. What on earth does this?
> > Andy got a similar crash on x86_64, with a 0x2222222222222222 entry ...
> >
> > nothing of our magic values are 0x22 or 0x222222222.
>
> kernel/mutex-debug.c:
> void debug_mutex_free_waiter(struct mutex_waiter *waiter)
> {
> DEBUG_WARN_ON(!list_empty(&waiter->list));
> memset(waiter, 0x22, sizeof(*waiter));
> }
Documentation/magic-number.txt sounds so promising, but we scatter definitions
of numbers all over the place. (No mention of the slab poison values,
or similar numbers there for eg, and various pointers to _other_ lists
of magic numbers).
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.17-rc5-mm3
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Dave Jones
@ 2006-06-05 20:08 ` Ingo Molnar
2006-06-05 20:14 ` 2.6.17-rc5-mm3 Randy.Dunlap
1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2006-06-05 20:08 UTC (permalink / raw)
To: Dave Jones, Randy.Dunlap, mbligh, akpm, apw, linux-kernel
* Dave Jones <davej@redhat.com> wrote:
> > kernel/mutex-debug.c:
> > void debug_mutex_free_waiter(struct mutex_waiter *waiter)
> > {
> > DEBUG_WARN_ON(!list_empty(&waiter->list));
> > memset(waiter, 0x22, sizeof(*waiter));
> > }
>
> Documentation/magic-number.txt sounds so promising, but we scatter
> definitions of numbers all over the place. (No mention of the slab
> poison values, or similar numbers there for eg, and various pointers
> to _other_ lists of magic numbers).
we've also got include/linux/poison.h - i'll move this value there.
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.17-rc5-mm3
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Dave Jones
2006-06-05 20:08 ` 2.6.17-rc5-mm3 Ingo Molnar
@ 2006-06-05 20:14 ` Randy.Dunlap
2006-06-05 20:54 ` [PATCH] poison: add & use more constants Randy.Dunlap
1 sibling, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2006-06-05 20:14 UTC (permalink / raw)
To: Dave Jones; +Cc: mingo, mbligh, akpm, apw, linux-kernel
On Mon, 5 Jun 2006 16:05:54 -0400 Dave Jones wrote:
> On Mon, Jun 05, 2006 at 01:00:39PM -0700, Randy.Dunlap wrote:
> > On Mon, 5 Jun 2006 21:44:22 +0200 Ingo Molnar wrote:
> >
> > >
> > > * Martin Bligh <mbligh@google.com> wrote:
> > >
> > > > panic on NUMA-Q during LTP. Was fine in -mm2.
> > > >
> > > > BUG: unable to handle kernel paging request at virtual address 22222232
> > >
> > > > EIP is at check_deadlock+0x19/0xe1
> > > > eax: 00000001 ebx: e4453030 ecx: 00000000 edx: e4008000
> > > > esi: 22222222 edi: 00000001 ebp: 22222222 esp: e47ebec0
> > >
> > > again these 0x22222222 entries on the stack. What on earth does this?
> > > Andy got a similar crash on x86_64, with a 0x2222222222222222 entry ...
> > >
> > > nothing of our magic values are 0x22 or 0x222222222.
> >
> > kernel/mutex-debug.c:
> > void debug_mutex_free_waiter(struct mutex_waiter *waiter)
> > {
> > DEBUG_WARN_ON(!list_empty(&waiter->list));
> > memset(waiter, 0x22, sizeof(*waiter));
> > }
>
> Documentation/magic-number.txt sounds so promising, but we scatter definitions
> of numbers all over the place. (No mention of the slab poison values,
> or similar numbers there for eg, and various pointers to _other_ lists
> of magic numbers).
I have a few more that I can add to include/linux/poison.h, like this one
above (only in -mm at present).
./include/linux/libata.h:#define ATA_TAG_POISON 0xfafbfcfdU
./arch/ppc/8260_io/fcc_enet.c:1918: memset((char *)(&(immap->im_dprambase[(mem_addr+64)])), 0x88, 32);
./drivers/usb/mon/mon_text.c:429: memset(mem, 0xe5, sizeof(struct mon_event_text));
./kernel/mutex-debug.c:384: memset(waiter, 0x11, sizeof(*waiter));
./kernel/mutex-debug.c:400: memset(waiter, 0x22, sizeof(*waiter));
./security/keys/key.c:985: memset(&key->payload, 0xbd, sizeof(key->payload));
./drivers/char/ftape/lowlevel/ftape-ctl.c:738: memset(ft_buffer[i]->address, 0xAA, FT_BUFF_SIZE);
./drivers/block/sx8.c:/* 0xf is just arbitrary, non-zero noise; this is sorta like poisoning */
---
~Randy
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] poison: add & use more constants
2006-06-05 20:14 ` 2.6.17-rc5-mm3 Randy.Dunlap
@ 2006-06-05 20:54 ` Randy.Dunlap
2006-06-06 13:33 ` Steven Rostedt
0 siblings, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2006-06-05 20:54 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: davej, mingo, mbligh, akpm, apw, linux-kernel
From: Randy Dunlap <rdunlap@xenotime.net>
Add more poison values to include/linux/poison.h.
It's not clear to me whether some others should be added or not,
so I haven't added any of these:
./include/linux/libata.h:#define ATA_TAG_POISON 0xfafbfcfdU
./arch/ppc/8260_io/fcc_enet.c:1918: memset((char *)(&(immap->im_dprambase[(mem_addr+64)])), 0x88, 32);
./drivers/usb/mon/mon_text.c:429: memset(mem, 0xe5, sizeof(struct mon_event_text));
./drivers/char/ftape/lowlevel/ftape-ctl.c:738: memset(ft_buffer[i]->address, 0xAA, FT_BUFF_SIZE);
./drivers/block/sx8.c:/* 0xf is just arbitrary, non-zero noise; this is sorta like poisoning */
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
---
include/linux/poison.h | 7 +++++++
kernel/mutex-debug.c | 5 +++--
security/keys/key.c | 3 ++-
3 files changed, 12 insertions(+), 3 deletions(-)
--- linux-2617-rc5mm3.orig/include/linux/poison.h
+++ linux-2617-rc5mm3/include/linux/poison.h
@@ -45,6 +45,13 @@
/********** drivers/atm/ **********/
#define ATM_POISON_FREE 0x12
+/********** kernel/mutexes **********/
+#define MUTEX_DEBUG_INIT 0x11
+#define MUTEX_DEBUG_FREE 0x22
+
+/********** security/ **********/
+#define KEY_DESTROY 0xbd
+
/********** sound/oss/ **********/
#define OSS_POISON_FREE 0xAB
--- linux-2617-rc5mm3.orig/kernel/mutex-debug.c
+++ linux-2617-rc5mm3/kernel/mutex-debug.c
@@ -16,6 +16,7 @@
#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/module.h>
+#include <linux/poison.h>
#include <linux/spinlock.h>
#include <linux/kallsyms.h>
#include <linux/interrupt.h>
@@ -155,7 +156,7 @@ void debug_mutex_set_owner(struct mutex
void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
- memset(waiter, 0x11, sizeof(*waiter));
+ memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}
@@ -171,7 +172,7 @@ void debug_mutex_wake_waiter(struct mute
void debug_mutex_free_waiter(struct mutex_waiter *waiter)
{
DEBUG_WARN_ON(!list_empty(&waiter->list));
- memset(waiter, 0x22, sizeof(*waiter));
+ memset(waiter, MUTEX_DEBUG_FREE, sizeof(*waiter));
}
void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
--- linux-2617-rc5mm3.orig/security/keys/key.c
+++ linux-2617-rc5mm3/security/keys/key.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/init.h>
+#include <linux/poison.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/security.h>
@@ -986,7 +987,7 @@ void unregister_key_type(struct key_type
if (key->type == ktype) {
if (ktype->destroy)
ktype->destroy(key);
- memset(&key->payload, 0xbd, sizeof(key->payload));
+ memset(&key->payload, KEY_DESTROY, sizeof(key->payload));
}
}
---
^ permalink raw reply [flat|nested] 14+ messages in thread
* [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code
2006-06-05 20:00 ` 2.6.17-rc5-mm3 Randy.Dunlap
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Ingo Molnar
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Dave Jones
@ 2006-06-06 8:56 ` Ingo Molnar
2006-06-06 11:40 ` Andy Whitcroft
2006-06-07 9:17 ` Andy Whitcroft
2 siblings, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2006-06-06 8:56 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: mbligh, akpm, apw, linux-kernel
* Randy.Dunlap <rdunlap@xenotime.net> wrote:
> BUG: unable to handle kernel paging request at virtual address 22222232
ok, this was a big thinko on my part, and it was right before our eyes.
Mutex deadlock checking relied on the 'big mutex debugging lock', but
that one is gone now - so mutex deadlock checking became racy (as your
crashes nicely pinpointed that). The races are more likely with an
increasing number of CPUs.
so the patch below finishes the cleanup i started: it removes deadlock
checking from the mutex code and lets the lock validator do that. This
should also be (much) faster on SMP, because the lock validator is
lockless in the fastpath. (if CONFIG_DEBUG_LOCKDEP is disabled)
Ingo
----------------
Subject: better lock debugging: remove mutex deadlock checking code
From: Ingo Molnar <mingo@elte.hu>
with the lock validator we detect mutex deadlocks (and more), the mutex
deadlock checking code is both redundant and slower. So remove it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/mutex-debug.c | 126 ---------------------------------------------------
lib/Kconfig.debug | 8 ---
2 files changed, 1 insertion(+), 133 deletions(-)
Index: linux/kernel/mutex-debug.c
===================================================================
--- linux.orig/kernel/mutex-debug.c
+++ linux/kernel/mutex-debug.c
@@ -23,128 +23,6 @@
#include "mutex-debug.h"
-static void printk_task(struct task_struct *p)
-{
- if (p)
- printk("%16s:%5d [%p, %3d]", p->comm, p->pid, p, p->prio);
- else
- printk("<none>");
-}
-
-static void printk_ti(struct thread_info *ti)
-{
- if (ti)
- printk_task(ti->task);
- else
- printk("<none>");
-}
-
-static void printk_lock(struct mutex *lock, int print_owner)
-{
-#ifdef CONFIG_PROVE_MUTEX_LOCKING
- printk(" [%p] {%s}\n", lock, lock->dep_map.name);
-#else
- printk(" [%p]\n", lock);
-#endif
-
- if (print_owner && lock->owner) {
- printk(".. held by: ");
- printk_ti(lock->owner);
- printk("\n");
- }
-}
-
-static void report_deadlock(struct task_struct *task, struct mutex *lock,
- struct mutex *lockblk)
-{
- printk("\n%s/%d is trying to acquire this lock:\n",
- current->comm, current->pid);
- printk_lock(lock, 1);
- debug_show_held_locks(current);
-
- if (lockblk) {
- printk("but %s/%d is deadlocking current task %s/%d!\n\n",
- task->comm, task->pid, current->comm, current->pid);
- printk("\n%s/%d is blocked on this lock:\n",
- task->comm, task->pid);
- printk_lock(lockblk, 1);
-
- debug_show_held_locks(task);
-
- printk("\n%s/%d's [blocked] stackdump:\n\n",
- task->comm, task->pid);
- show_stack(task, NULL);
- }
-
- printk("\n%s/%d's [current] stackdump:\n\n",
- current->comm, current->pid);
- dump_stack();
- debug_show_all_locks();
- printk("[ turning off deadlock detection. Please report this. ]\n\n");
- local_irq_disable();
-}
-
-/*
- * Recursively check for mutex deadlocks:
- */
-static int check_deadlock(struct mutex *lock, int depth, struct thread_info *ti)
-{
- struct mutex *lockblk;
- struct task_struct *task;
-
- if (!debug_locks)
- return 0;
-
- ti = lock->owner;
- if (!ti)
- return 0;
-
- task = ti->task;
- /*
- * In the PROVE_MUTEX_LOCKING we are tracking all held
- * locks already, which allows us to optimize this:
- */
-#ifdef CONFIG_PROVE_MUTEX_LOCKING
- if (!task->lockdep_depth)
- return 0;
-#endif
- lockblk = NULL;
- if (task->blocked_on)
- lockblk = task->blocked_on->lock;
-
- /* Self-deadlock: */
- if (current == task) {
- debug_locks_off();
- if (depth)
- return 1;
- printk("\n==========================================\n");
- printk( "[ BUG: lock recursion deadlock detected! |\n");
- printk( "------------------------------------------\n");
- report_deadlock(task, lock, NULL);
- return 0;
- }
-
- /* Ugh, something corrupted the lock data structure? */
- if (depth > 20) {
- debug_locks_off();
- printk("\n===========================================\n");
- printk( "[ BUG: infinite lock dependency detected!? |\n");
- printk( "-------------------------------------------\n");
- report_deadlock(task, lock, lockblk);
- return 0;
- }
-
- /* Recursively check for dependencies: */
- if (lockblk && check_deadlock(lockblk, depth+1, ti)) {
- printk("\n============================================\n");
- printk( "[ BUG: circular locking deadlock detected! ]\n");
- printk( "--------------------------------------------\n");
- report_deadlock(task, lock, lockblk);
- return 0;
- }
- return 0;
-}
-
/*
* Must be called with lock->wait_lock held.
*/
@@ -178,9 +56,7 @@ void debug_mutex_add_waiter(struct mutex
struct thread_info *ti)
{
SMP_DEBUG_WARN_ON(!spin_is_locked(&lock->wait_lock));
-#ifdef CONFIG_DEBUG_MUTEX_DEADLOCKS
- check_deadlock(lock, 0, ti);
-#endif
+
/* Mark the current thread as blocked on the lock: */
ti->task->blocked_on = waiter;
waiter->lock = lock;
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -164,14 +164,6 @@ config DEBUG_MUTEX_ALLOC
(kfree(), kmem_cache_free(), free_pages(), vfree(), etc.),
or whether there is any lock held during task exit.
-config DEBUG_MUTEX_DEADLOCKS
- bool "Detect mutex related deadlocks"
- default y
- depends on DEBUG_MUTEXES
- help
- This feature will automatically detect and report mutex related
- deadlocks, as they happen.
-
config DEBUG_RT_MUTEXES
bool "RT Mutex debugging, deadlock detection"
default y
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code
2006-06-06 8:56 ` [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code Ingo Molnar
@ 2006-06-06 11:40 ` Andy Whitcroft
2006-06-06 17:17 ` Andy Whitcroft
2006-06-07 9:17 ` Andy Whitcroft
1 sibling, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2006-06-06 11:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Randy.Dunlap, mbligh, akpm, linux-kernel
Ingo Molnar wrote:
> * Randy.Dunlap <rdunlap@xenotime.net> wrote:
>
>
>>BUG: unable to handle kernel paging request at virtual address 22222232
>
>
> ok, this was a big thinko on my part, and it was right before our eyes.
> Mutex deadlock checking relied on the 'big mutex debugging lock', but
> that one is gone now - so mutex deadlock checking became racy (as your
> crashes nicely pinpointed that). The races are more likely with an
> increasing number of CPUs.
>
> so the patch below finishes the cleanup i started: it removes deadlock
> checking from the mutex code and lets the lock validator do that. This
> should also be (much) faster on SMP, because the lock validator is
> lockless in the fastpath. (if CONFIG_DEBUG_LOCKDEP is disabled)
>
> Ingo
>
> ----------------
> Subject: better lock debugging: remove mutex deadlock checking code
> From: Ingo Molnar <mingo@elte.hu>
>
> with the lock validator we detect mutex deadlocks (and more), the mutex
> deadlock checking code is both redundant and slower. So remove it.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> kernel/mutex-debug.c | 126 ---------------------------------------------------
> lib/Kconfig.debug | 8 ---
> 2 files changed, 1 insertion(+), 133 deletions(-)
>
> Index: linux/kernel/mutex-debug.c
> ===================================================================
> --- linux.orig/kernel/mutex-debug.c
> +++ linux/kernel/mutex-debug.c
> @@ -23,128 +23,6 @@
>
> #include "mutex-debug.h"
>
> -static void printk_task(struct task_struct *p)
> -{
> - if (p)
> - printk("%16s:%5d [%p, %3d]", p->comm, p->pid, p, p->prio);
> - else
> - printk("<none>");
> -}
> -
> -static void printk_ti(struct thread_info *ti)
> -{
> - if (ti)
> - printk_task(ti->task);
> - else
> - printk("<none>");
> -}
> -
> -static void printk_lock(struct mutex *lock, int print_owner)
> -{
> -#ifdef CONFIG_PROVE_MUTEX_LOCKING
> - printk(" [%p] {%s}\n", lock, lock->dep_map.name);
> -#else
> - printk(" [%p]\n", lock);
> -#endif
> -
> - if (print_owner && lock->owner) {
> - printk(".. held by: ");
> - printk_ti(lock->owner);
> - printk("\n");
> - }
> -}
> -
> -static void report_deadlock(struct task_struct *task, struct mutex *lock,
> - struct mutex *lockblk)
> -{
> - printk("\n%s/%d is trying to acquire this lock:\n",
> - current->comm, current->pid);
> - printk_lock(lock, 1);
> - debug_show_held_locks(current);
> -
> - if (lockblk) {
> - printk("but %s/%d is deadlocking current task %s/%d!\n\n",
> - task->comm, task->pid, current->comm, current->pid);
> - printk("\n%s/%d is blocked on this lock:\n",
> - task->comm, task->pid);
> - printk_lock(lockblk, 1);
> -
> - debug_show_held_locks(task);
> -
> - printk("\n%s/%d's [blocked] stackdump:\n\n",
> - task->comm, task->pid);
> - show_stack(task, NULL);
> - }
> -
> - printk("\n%s/%d's [current] stackdump:\n\n",
> - current->comm, current->pid);
> - dump_stack();
> - debug_show_all_locks();
> - printk("[ turning off deadlock detection. Please report this. ]\n\n");
> - local_irq_disable();
> -}
> -
> -/*
> - * Recursively check for mutex deadlocks:
> - */
> -static int check_deadlock(struct mutex *lock, int depth, struct thread_info *ti)
> -{
> - struct mutex *lockblk;
> - struct task_struct *task;
> -
> - if (!debug_locks)
> - return 0;
> -
> - ti = lock->owner;
> - if (!ti)
> - return 0;
> -
> - task = ti->task;
> - /*
> - * In the PROVE_MUTEX_LOCKING we are tracking all held
> - * locks already, which allows us to optimize this:
> - */
> -#ifdef CONFIG_PROVE_MUTEX_LOCKING
> - if (!task->lockdep_depth)
> - return 0;
> -#endif
> - lockblk = NULL;
> - if (task->blocked_on)
> - lockblk = task->blocked_on->lock;
> -
> - /* Self-deadlock: */
> - if (current == task) {
> - debug_locks_off();
> - if (depth)
> - return 1;
> - printk("\n==========================================\n");
> - printk( "[ BUG: lock recursion deadlock detected! |\n");
> - printk( "------------------------------------------\n");
> - report_deadlock(task, lock, NULL);
> - return 0;
> - }
> -
> - /* Ugh, something corrupted the lock data structure? */
> - if (depth > 20) {
> - debug_locks_off();
> - printk("\n===========================================\n");
> - printk( "[ BUG: infinite lock dependency detected!? |\n");
> - printk( "-------------------------------------------\n");
> - report_deadlock(task, lock, lockblk);
> - return 0;
> - }
> -
> - /* Recursively check for dependencies: */
> - if (lockblk && check_deadlock(lockblk, depth+1, ti)) {
> - printk("\n============================================\n");
> - printk( "[ BUG: circular locking deadlock detected! ]\n");
> - printk( "--------------------------------------------\n");
> - report_deadlock(task, lock, lockblk);
> - return 0;
> - }
> - return 0;
> -}
> -
> /*
> * Must be called with lock->wait_lock held.
> */
> @@ -178,9 +56,7 @@ void debug_mutex_add_waiter(struct mutex
> struct thread_info *ti)
> {
> SMP_DEBUG_WARN_ON(!spin_is_locked(&lock->wait_lock));
> -#ifdef CONFIG_DEBUG_MUTEX_DEADLOCKS
> - check_deadlock(lock, 0, ti);
> -#endif
> +
> /* Mark the current thread as blocked on the lock: */
> ti->task->blocked_on = waiter;
> waiter->lock = lock;
> Index: linux/lib/Kconfig.debug
> ===================================================================
> --- linux.orig/lib/Kconfig.debug
> +++ linux/lib/Kconfig.debug
> @@ -164,14 +164,6 @@ config DEBUG_MUTEX_ALLOC
> (kfree(), kmem_cache_free(), free_pages(), vfree(), etc.),
> or whether there is any lock held during task exit.
>
> -config DEBUG_MUTEX_DEADLOCKS
> - bool "Detect mutex related deadlocks"
> - default y
> - depends on DEBUG_MUTEXES
> - help
> - This feature will automatically detect and report mutex related
> - deadlocks, as they happen.
> -
> config DEBUG_RT_MUTEXES
> bool "RT Mutex debugging, deadlock detection"
> default y
I'll shove this one in for testing too. Results on TKO as I have them.
-apw
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] poison: add & use more constants
2006-06-05 20:54 ` [PATCH] poison: add & use more constants Randy.Dunlap
@ 2006-06-06 13:33 ` Steven Rostedt
0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2006-06-06 13:33 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: davej, mingo, mbligh, akpm, apw, linux-kernel
On Mon, 2006-06-05 at 13:54 -0700, Randy.Dunlap wrote:
> From: Randy Dunlap <rdunlap@xenotime.net>
>
> Add more poison values to include/linux/poison.h.
> It's not clear to me whether some others should be added or not,
> so I haven't added any of these:
>
> ./include/linux/libata.h:#define ATA_TAG_POISON 0xfafbfcfdU
> ./arch/ppc/8260_io/fcc_enet.c:1918: memset((char *)(&(immap->im_dprambase[(mem_addr+64)])), 0x88, 32);
> ./drivers/usb/mon/mon_text.c:429: memset(mem, 0xe5, sizeof(struct mon_event_text));
> ./drivers/char/ftape/lowlevel/ftape-ctl.c:738: memset(ft_buffer[i]->address, 0xAA, FT_BUFF_SIZE);
> ./drivers/block/sx8.c:/* 0xf is just arbitrary, non-zero noise; this is sorta like poisoning */
You don't have my personal favorite? From AIX that would poison pages
with 0xdeadbeef :)
-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code
2006-06-06 11:40 ` Andy Whitcroft
@ 2006-06-06 17:17 ` Andy Whitcroft
2006-06-06 19:29 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2006-06-06 17:17 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: Ingo Molnar, Randy.Dunlap, mbligh, akpm, linux-kernel
Andy Whitcroft wrote:
> Ingo Molnar wrote:
>
>>* Randy.Dunlap <rdunlap@xenotime.net> wrote:
>>
>>
>>
>>>BUG: unable to handle kernel paging request at virtual address 22222232
>>
>>
>>ok, this was a big thinko on my part, and it was right before our eyes.
>>Mutex deadlock checking relied on the 'big mutex debugging lock', but
>>that one is gone now - so mutex deadlock checking became racy (as your
>>crashes nicely pinpointed that). The races are more likely with an
>>increasing number of CPUs.
>>
>>so the patch below finishes the cleanup i started: it removes deadlock
>>checking from the mutex code and lets the lock validator do that. This
>>should also be (much) faster on SMP, because the lock validator is
>>lockless in the fastpath. (if CONFIG_DEBUG_LOCKDEP is disabled)
>>
>> Ingo
>>
>>----------------
>>Subject: better lock debugging: remove mutex deadlock checking code
>>From: Ingo Molnar <mingo@elte.hu>
>>
>>with the lock validator we detect mutex deadlocks (and more), the mutex
>>deadlock checking code is both redundant and slower. So remove it.
>>
>>Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>---
>> kernel/mutex-debug.c | 126 ---------------------------------------------------
>> lib/Kconfig.debug | 8 ---
>> 2 files changed, 1 insertion(+), 133 deletions(-)
>>
>>Index: linux/kernel/mutex-debug.c
>>===================================================================
>>--- linux.orig/kernel/mutex-debug.c
>>+++ linux/kernel/mutex-debug.c
>>@@ -23,128 +23,6 @@
>>
>> #include "mutex-debug.h"
>>
>>-static void printk_task(struct task_struct *p)
>>-{
>>- if (p)
>>- printk("%16s:%5d [%p, %3d]", p->comm, p->pid, p, p->prio);
>>- else
>>- printk("<none>");
>>-}
>>-
>>-static void printk_ti(struct thread_info *ti)
>>-{
>>- if (ti)
>>- printk_task(ti->task);
>>- else
>>- printk("<none>");
>>-}
>>-
>>-static void printk_lock(struct mutex *lock, int print_owner)
>>-{
>>-#ifdef CONFIG_PROVE_MUTEX_LOCKING
>>- printk(" [%p] {%s}\n", lock, lock->dep_map.name);
>>-#else
>>- printk(" [%p]\n", lock);
>>-#endif
>>-
>>- if (print_owner && lock->owner) {
>>- printk(".. held by: ");
>>- printk_ti(lock->owner);
>>- printk("\n");
>>- }
>>-}
>>-
>>-static void report_deadlock(struct task_struct *task, struct mutex *lock,
>>- struct mutex *lockblk)
>>-{
>>- printk("\n%s/%d is trying to acquire this lock:\n",
>>- current->comm, current->pid);
>>- printk_lock(lock, 1);
>>- debug_show_held_locks(current);
>>-
>>- if (lockblk) {
>>- printk("but %s/%d is deadlocking current task %s/%d!\n\n",
>>- task->comm, task->pid, current->comm, current->pid);
>>- printk("\n%s/%d is blocked on this lock:\n",
>>- task->comm, task->pid);
>>- printk_lock(lockblk, 1);
>>-
>>- debug_show_held_locks(task);
>>-
>>- printk("\n%s/%d's [blocked] stackdump:\n\n",
>>- task->comm, task->pid);
>>- show_stack(task, NULL);
>>- }
>>-
>>- printk("\n%s/%d's [current] stackdump:\n\n",
>>- current->comm, current->pid);
>>- dump_stack();
>>- debug_show_all_locks();
>>- printk("[ turning off deadlock detection. Please report this. ]\n\n");
>>- local_irq_disable();
>>-}
>>-
>>-/*
>>- * Recursively check for mutex deadlocks:
>>- */
>>-static int check_deadlock(struct mutex *lock, int depth, struct thread_info *ti)
>>-{
>>- struct mutex *lockblk;
>>- struct task_struct *task;
>>-
>>- if (!debug_locks)
>>- return 0;
>>-
>>- ti = lock->owner;
>>- if (!ti)
>>- return 0;
>>-
>>- task = ti->task;
>>- /*
>>- * In the PROVE_MUTEX_LOCKING we are tracking all held
>>- * locks already, which allows us to optimize this:
>>- */
>>-#ifdef CONFIG_PROVE_MUTEX_LOCKING
>>- if (!task->lockdep_depth)
>>- return 0;
>>-#endif
>>- lockblk = NULL;
>>- if (task->blocked_on)
>>- lockblk = task->blocked_on->lock;
>>-
>>- /* Self-deadlock: */
>>- if (current == task) {
>>- debug_locks_off();
>>- if (depth)
>>- return 1;
>>- printk("\n==========================================\n");
>>- printk( "[ BUG: lock recursion deadlock detected! |\n");
>>- printk( "------------------------------------------\n");
>>- report_deadlock(task, lock, NULL);
>>- return 0;
>>- }
>>-
>>- /* Ugh, something corrupted the lock data structure? */
>>- if (depth > 20) {
>>- debug_locks_off();
>>- printk("\n===========================================\n");
>>- printk( "[ BUG: infinite lock dependency detected!? |\n");
>>- printk( "-------------------------------------------\n");
>>- report_deadlock(task, lock, lockblk);
>>- return 0;
>>- }
>>-
>>- /* Recursively check for dependencies: */
>>- if (lockblk && check_deadlock(lockblk, depth+1, ti)) {
>>- printk("\n============================================\n");
>>- printk( "[ BUG: circular locking deadlock detected! ]\n");
>>- printk( "--------------------------------------------\n");
>>- report_deadlock(task, lock, lockblk);
>>- return 0;
>>- }
>>- return 0;
>>-}
>>-
>> /*
>> * Must be called with lock->wait_lock held.
>> */
>>@@ -178,9 +56,7 @@ void debug_mutex_add_waiter(struct mutex
>> struct thread_info *ti)
>> {
>> SMP_DEBUG_WARN_ON(!spin_is_locked(&lock->wait_lock));
>>-#ifdef CONFIG_DEBUG_MUTEX_DEADLOCKS
>>- check_deadlock(lock, 0, ti);
>>-#endif
>>+
>> /* Mark the current thread as blocked on the lock: */
>> ti->task->blocked_on = waiter;
>> waiter->lock = lock;
>>Index: linux/lib/Kconfig.debug
>>===================================================================
>>--- linux.orig/lib/Kconfig.debug
>>+++ linux/lib/Kconfig.debug
>>@@ -164,14 +164,6 @@ config DEBUG_MUTEX_ALLOC
>> (kfree(), kmem_cache_free(), free_pages(), vfree(), etc.),
>> or whether there is any lock held during task exit.
>>
>>-config DEBUG_MUTEX_DEADLOCKS
>>- bool "Detect mutex related deadlocks"
>>- default y
>>- depends on DEBUG_MUTEXES
>>- help
>>- This feature will automatically detect and report mutex related
>>- deadlocks, as they happen.
>>-
>> config DEBUG_RT_MUTEXES
>> bool "RT Mutex debugging, deadlock detection"
>> default y
>
>
> I'll shove this one in for testing too. Results on TKO as I have them.
>
> -apw
>
This is definatly clearing up a bunch of problems with the current -mm.
-apw
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code
2006-06-06 17:17 ` Andy Whitcroft
@ 2006-06-06 19:29 ` Ingo Molnar
0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2006-06-06 19:29 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: Randy.Dunlap, mbligh, akpm, linux-kernel
* Andy Whitcroft <apw@shadowen.org> wrote:
> > I'll shove this one in for testing too. Results on TKO as I have them.
>
> This is definatly clearing up a bunch of problems with the current
> -mm.
great! Thanks for testing this out, this bug was the scariest pending
one.
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code
2006-06-06 8:56 ` [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code Ingo Molnar
2006-06-06 11:40 ` Andy Whitcroft
@ 2006-06-07 9:17 ` Andy Whitcroft
1 sibling, 0 replies; 14+ messages in thread
From: Andy Whitcroft @ 2006-06-07 9:17 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Randy.Dunlap, mbligh, akpm, linux-kernel
Ingo Molnar wrote:
> * Randy.Dunlap <rdunlap@xenotime.net> wrote:
>
>
>>BUG: unable to handle kernel paging request at virtual address 22222232
>
>
> ok, this was a big thinko on my part, and it was right before our eyes.
> Mutex deadlock checking relied on the 'big mutex debugging lock', but
> that one is gone now - so mutex deadlock checking became racy (as your
> crashes nicely pinpointed that). The races are more likely with an
> increasing number of CPUs.
>
> so the patch below finishes the cleanup i started: it removes deadlock
> checking from the mutex code and lets the lock validator do that. This
> should also be (much) faster on SMP, because the lock validator is
> lockless in the fastpath. (if CONFIG_DEBUG_LOCKDEP is disabled)
>
> Ingo
>
> ----------------
> Subject: better lock debugging: remove mutex deadlock checking code
> From: Ingo Molnar <mingo@elte.hu>
>
> with the lock validator we detect mutex deadlocks (and more), the mutex
> deadlock checking code is both redundant and slower. So remove it.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
Ok, this patch in combination with either fix for the swap max bug are
showing passes across the board.
Acked-by: Andy Whitcroft <apw@shadowen.org>
-apw
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-06-07 9:17 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-05 16:30 2.6.17-rc5-mm3 Martin Bligh
2006-06-05 19:44 ` 2.6.17-rc5-mm3 Ingo Molnar
2006-06-05 20:00 ` 2.6.17-rc5-mm3 Randy.Dunlap
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Ingo Molnar
2006-06-05 20:05 ` 2.6.17-rc5-mm3 Dave Jones
2006-06-05 20:08 ` 2.6.17-rc5-mm3 Ingo Molnar
2006-06-05 20:14 ` 2.6.17-rc5-mm3 Randy.Dunlap
2006-06-05 20:54 ` [PATCH] poison: add & use more constants Randy.Dunlap
2006-06-06 13:33 ` Steven Rostedt
2006-06-06 8:56 ` [patch, -rc5-mm3] better lock debugging: remove mutex deadlock checking code Ingo Molnar
2006-06-06 11:40 ` Andy Whitcroft
2006-06-06 17:17 ` Andy Whitcroft
2006-06-06 19:29 ` Ingo Molnar
2006-06-07 9:17 ` Andy Whitcroft
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.