* [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock
@ 2001-10-16 2:23 Keith Owens
2001-10-16 2:36 ` Linus Torvalds
0 siblings, 1 reply; 5+ messages in thread
From: Keith Owens @ 2001-10-16 2:23 UTC (permalink / raw)
To: linux-kernel; +Cc: torvalds
Any die() routine that uses die_lock to avoid multiple cpu reentrancy
will deadlock on recursive die() errors. This patch only affects the
architectures that use die_lock and only if the code is in Linus's
tree.
I have not looked at the arch CVS trees so other architectures may
still have problems, please check. Several architectures have no
protection against multiple cpu reentrancy in die(), I left those
alone.
mips, mips64 and sh had uninitialized spinlock_t die_lock, naughty, it
breaks spin lock debugging. Any other uninitialized spinlock_t lying
around in those trees?
Most of the patch is white space changes due to new indentation.
Speaking of white space, s390/s390x has a lot of leading spaces instead
of tabs, did somebody code on an 026 card punch ;)? A white space
clean up on s390/s390x would be nice.
Patch against 2.4.13-pre3. Compiled on i386, eyeballed (using IEHIBALL
on s390) but untested on other arch.
Index: 13-pre3.1/arch/arm/kernel/traps.c
--- 13-pre3.1/arch/arm/kernel/traps.c Fri, 12 Oct 2001 11:40:38 +1000 kaos (linux-2.4/w/c/46_traps.c 1.3.1.2.1.1 644)
+++ 13-pre3.1(w)/arch/arm/kernel/traps.c Tue, 16 Oct 2001 12:03:28 +1000 kaos (linux-2.4/w/c/46_traps.c 1.3.1.2.1.1 644)
@@ -159,43 +159,49 @@ void show_trace_task(struct task_struct
}
}
-spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
-
-/*
- * This function is protected against re-entrancy.
- */
NORET_TYPE void die(const char *str, struct pt_regs *regs, int err)
{
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
struct task_struct *tsk = current;
- console_verbose();
- spin_lock_irq(&die_lock);
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ }
- printk("Internal error: %s: %x\n", str, err);
- printk("CPU: %d\n", smp_processor_id());
- show_regs(regs);
- printk("Process %s (pid: %d, stackpage=%08lx)\n",
- current->comm, current->pid, 4096+(unsigned long)tsk);
-
- if (!user_mode(regs) || in_interrupt()) {
- mm_segment_t fs;
-
- /*
- * We need to switch to kernel mode so that we can
- * use __get_user to safely read from kernel space.
- * Note that we now dump the code first, just in case
- * the backtrace kills us.
- */
- fs = get_fs();
- set_fs(KERNEL_DS);
-
- dump_stack(tsk, (unsigned long)(regs + 1));
- dump_backtrace(regs, tsk);
- dump_instr(regs);
+ if (++die_lock_owner_depth < 3) {
+ printk("Internal error: %s: %x\n", str, err);
+ printk("CPU: %d\n", smp_processor_id());
+ show_regs(regs);
+ printk("Process %s (pid: %d, stackpage=%08lx)\n",
+ current->comm, current->pid, 4096+(unsigned long)tsk);
+
+ if (!user_mode(regs) || in_interrupt()) {
+ mm_segment_t fs;
+
+ /*
+ * We need to switch to kernel mode so that we can
+ * use __get_user to safely read from kernel space.
+ * Note that we now dump the code first, just in case
+ * the backtrace kills us.
+ */
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+
+ dump_stack(tsk, (unsigned long)(regs + 1));
+ dump_backtrace(regs, tsk);
+ dump_instr(regs);
- set_fs(fs);
+ set_fs(fs);
+ }
}
+ else
+ printk(KERN_ERR "Recursive die() failure, output suppressed\n");
+ die_lock_owner = -1;
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}
Index: 13-pre3.1/arch/i386/kernel/traps.c
--- 13-pre3.1/arch/i386/kernel/traps.c Tue, 02 Oct 2001 11:04:33 +1000 kaos (linux-2.4/S/c/22_traps.c 1.1.2.1.1.2.1.1.1.6 644)
+++ 13-pre3.1(w)/arch/i386/kernel/traps.c Tue, 16 Oct 2001 11:34:27 +1000 kaos (linux-2.4/S/c/22_traps.c 1.1.2.1.1.2.1.1.1.6 644)
@@ -237,16 +237,28 @@ bad:
printk("\n");
}
-spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
-
void die(const char * str, struct pt_regs * regs, long err)
{
- console_verbose();
- spin_lock_irq(&die_lock);
- bust_spinlocks(1);
- printk("%s: %04lx\n", str, err & 0xffff);
- show_registers(regs);
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
+
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ bust_spinlocks(1);
+ }
+
+ if (++die_lock_owner_depth < 3) {
+ printk("%s: %04lx\n", str, err & 0xffff);
+ show_registers(regs);
+ }
+ else
+ printk(KERN_ERR "Recursive die() failure, output suppressed\n");
+
bust_spinlocks(0);
+ die_lock_owner = -1;
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}
Index: 13-pre3.1/arch/mips64/kernel/traps.c
--- 13-pre3.1/arch/mips64/kernel/traps.c Mon, 10 Sep 2001 15:46:51 +1000 kaos (linux-2.4/p/c/5_traps.c 1.1.1.1.1.1 644)
+++ 13-pre3.1(w)/arch/mips64/kernel/traps.c Tue, 16 Oct 2001 11:42:23 +1000 kaos (linux-2.4/p/c/5_traps.c 1.1.1.1.1.1 644)
@@ -159,23 +159,35 @@ void show_code(unsigned int *pc)
}
}
-spinlock_t die_lock;
-
void die(const char * str, struct pt_regs * regs, unsigned long err)
{
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
+
if (user_mode(regs)) /* Just return if in user mode. */
return;
- console_verbose();
- spin_lock_irq(&die_lock);
- printk("%s: %04lx\n", str, err & 0xffff);
- show_regs(regs);
- printk("Process %s (pid: %d, stackpage=%08lx)\n",
- current->comm, current->pid, (unsigned long) current);
- show_stack((unsigned long *) regs->regs[29]);
- show_trace((unsigned long *) regs->regs[29]);
- show_code((unsigned int *) regs->cp0_epc);
- printk("\n");
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ }
+
+ if (++die_lock_owner_depth < 3) {
+ printk("%s: %04lx\n", str, err & 0xffff);
+ show_regs(regs);
+ printk("Process %s (pid: %d, stackpage=%08lx)\n",
+ current->comm, current->pid, (unsigned long) current);
+ show_stack((unsigned long *) regs->regs[29]);
+ show_trace((unsigned long *) regs->regs[29]);
+ show_code((unsigned int *) regs->cp0_epc);
+ printk("\n");
+ }
+ else
+ printk(KERN_ERR "Recursive die() failure, output suppressed\n");
+
+ die_lock_owner = -1;
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}
Index: 13-pre3.1/arch/mips/kernel/traps.c
--- 13-pre3.1/arch/mips/kernel/traps.c Mon, 10 Sep 2001 15:46:51 +1000 kaos (linux-2.4/M/c/26_traps.c 1.1.2.1.1.1.1.1 644)
+++ 13-pre3.1(w)/arch/mips/kernel/traps.c Tue, 16 Oct 2001 11:38:52 +1000 kaos (linux-2.4/M/c/26_traps.c 1.1.2.1.1.1.1.1 644)
@@ -188,24 +188,36 @@ void show_code(unsigned int *pc)
}
}
-spinlock_t die_lock;
-
extern void __die(const char * str, struct pt_regs * regs, const char *where,
unsigned long line)
{
- console_verbose();
- spin_lock_irq(&die_lock);
- printk("%s", str);
- if (where)
- printk(" in %s, line %ld", where, line);
- printk(":\n");
- show_regs(regs);
- printk("Process %s (pid: %d, stackpage=%08lx)\n",
- current->comm, current->pid, (unsigned long) current);
- show_stack((unsigned int *) regs->regs[29]);
- show_trace((unsigned int *) regs->regs[29]);
- show_code((unsigned int *) regs->cp0_epc);
- printk("\n");
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
+
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ }
+
+ if (++die_lock_owner_depth < 3) {
+ printk("%s", str);
+ if (where)
+ printk(" in %s, line %ld", where, line);
+ printk(":\n");
+ show_regs(regs);
+ printk("Process %s (pid: %d, stackpage=%08lx)\n",
+ current->comm, current->pid, (unsigned long) current);
+ show_stack((unsigned int *) regs->regs[29]);
+ show_trace((unsigned int *) regs->regs[29]);
+ show_code((unsigned int *) regs->cp0_epc);
+ printk("\n");
+ }
+ else
+ printk(KERN_ERR "Recursive __die() failure, output suppressed\n");
+
+ die_lock_owner = -1;
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}
Index: 13-pre3.1/arch/s390/kernel/traps.c
--- 13-pre3.1/arch/s390/kernel/traps.c Fri, 12 Oct 2001 11:40:38 +1000 kaos (linux-2.4/n/c/8_traps.c 1.4.1.1 644)
+++ 13-pre3.1(w)/arch/s390/kernel/traps.c Tue, 16 Oct 2001 11:54:35 +1000 kaos (linux-2.4/n/c/8_traps.c 1.4.1.1 644)
@@ -60,18 +60,30 @@ extern void pfault_fini(void);
extern void pfault_interrupt(struct pt_regs *regs, __u16 error_code);
#endif
-spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
-
void die(const char * str, struct pt_regs * regs, long err)
{
- console_verbose();
- spin_lock_irq(&die_lock);
- bust_spinlocks(1);
- printk("%s: %04lx\n", str, err & 0xffff);
- show_regs(regs);
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
+
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ bust_spinlocks(1);
+ }
+
+ if (++die_lock_owner_depth < 3) {
+ printk("%s: %04lx\n", str, err & 0xffff);
+ show_regs(regs);
+ }
+ else
+ printk(KERN_ERR "Recursive die() failure, output suppressed\n");
+
bust_spinlocks(0);
- spin_unlock_irq(&die_lock);
- do_exit(SIGSEGV);
+ die_lock_owner = -1;
+ spin_unlock_irq(&die_lock);
+ do_exit(SIGSEGV);
}
#define DO_ERROR(signr, str, name) \
Index: 13-pre3.1/arch/s390x/kernel/traps.c
--- 13-pre3.1/arch/s390x/kernel/traps.c Fri, 12 Oct 2001 11:40:38 +1000 kaos (linux-2.4/p/d/19_traps.c 1.2.1.1 644)
+++ 13-pre3.1(w)/arch/s390x/kernel/traps.c Tue, 16 Oct 2001 11:54:45 +1000 kaos (linux-2.4/p/d/19_traps.c 1.2.1.1 644)
@@ -58,18 +58,30 @@ extern void pfault_fini(void);
extern void pfault_interrupt(struct pt_regs *regs, __u16 error_code);
#endif
-spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
-
void die(const char * str, struct pt_regs * regs, long err)
{
- console_verbose();
- spin_lock_irq(&die_lock);
- bust_spinlocks(1);
- printk("%s: %04lx\n", str, err & 0xffff);
- show_regs(regs);
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
+
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ bust_spinlocks(1);
+ }
+
+ if (++die_lock_owner_depth < 3) {
+ printk("%s: %04lx\n", str, err & 0xffff);
+ show_regs(regs);
+ }
+ else
+ printk(KERN_ERR "Recursive die() failure, output suppressed\n");
+
bust_spinlocks(0);
- spin_unlock_irq(&die_lock);
- do_exit(SIGSEGV);
+ die_lock_owner = -1;
+ spin_unlock_irq(&die_lock);
+ do_exit(SIGSEGV);
}
#define DO_ERROR(signr, str, name) \
Index: 13-pre3.1/arch/sh/kernel/traps.c
--- 13-pre3.1/arch/sh/kernel/traps.c Sun, 09 Sep 2001 19:22:07 +1000 kaos (linux-2.4/t/c/26_traps.c 1.4 644)
+++ 13-pre3.1(w)/arch/sh/kernel/traps.c Tue, 16 Oct 2001 11:51:17 +1000 kaos (linux-2.4/t/c/26_traps.c 1.4 644)
@@ -54,14 +54,26 @@ asmlinkage void do_##name(unsigned long
#define VMALLOC_OFFSET (8*1024*1024)
#define MODULE_RANGE (8*1024*1024)
-spinlock_t die_lock;
-
void die(const char * str, struct pt_regs * regs, long err)
{
- console_verbose();
- spin_lock_irq(&die_lock);
- printk("%s: %04lx\n", str, err & 0xffff);
- show_regs(regs);
+ static spinlock_t die_lock = SPIN_LOCK_UNLOCKED;
+ static int die_lock_owner = -1, die_lock_owner_depth;
+
+ if (die_lock_owner != smp_processor_id()) {
+ console_verbose();
+ spin_lock_irq(&die_lock);
+ die_lock_owner = smp_processor_id();
+ die_lock_owner_depth = 0;
+ }
+
+ if (++die_lock_owner_depth < 3) {
+ printk("%s: %04lx\n", str, err & 0xffff);
+ show_regs(regs);
+ }
+ else
+ printk(KERN_ERR "Recursive die() failure, output suppressed\n");
+
+ die_lock_owner = -1;
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock
2001-10-16 2:23 [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock Keith Owens
@ 2001-10-16 2:36 ` Linus Torvalds
2001-10-16 2:58 ` Keith Owens
0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2001-10-16 2:36 UTC (permalink / raw)
To: Keith Owens; +Cc: linux-kernel
On Tue, 16 Oct 2001, Keith Owens wrote:
>
> Any die() routine that uses die_lock to avoid multiple cpu reentrancy
> will deadlock on recursive die() errors.
Well, I have to say that I personally have always considered the "die"
lock to not be about multiple CPU re-entrancy, but _exactly_ to stop
infinite oops reports if an oops itself oopses.
I much prefer a dead machine with a partially visible oops over a oops
where the original oops has scrolled away due to recursive faults.
Quite frankly, I consider the ia64 case to be a ia64 bug, nothing more.
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock
2001-10-16 2:36 ` Linus Torvalds
@ 2001-10-16 2:58 ` Keith Owens
2001-10-16 4:27 ` Keith Owens
0 siblings, 1 reply; 5+ messages in thread
From: Keith Owens @ 2001-10-16 2:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
On Mon, 15 Oct 2001 19:36:02 -0700 (PDT),
Linus Torvalds <torvalds@transmeta.com> wrote:
>On Tue, 16 Oct 2001, Keith Owens wrote:
>>
>> Any die() routine that uses die_lock to avoid multiple cpu reentrancy
>> will deadlock on recursive die() errors.
>
>Well, I have to say that I personally have always considered the "die"
>lock to not be about multiple CPU re-entrancy, but _exactly_ to stop
>infinite oops reports if an oops itself oopses.
>
>I much prefer a dead machine with a partially visible oops over a oops
>where the original oops has scrolled away due to recursive faults.
AFAIK, Andrew Morton was considering concurrent calls to die() when he
added die_lock. There had been bug reports where both cpus were trying
to dump registers at the same time so the output was interleaved and
unreadable, die_lock prevented that.
IMHO it is unrealistic to expect that all code inside die() will never
fail. Any unexpected kernel corruption could cause the register or
backtrace dump to fail. The patch gets the best of both worlds. It
protects against recursive errors and against concurrent calls to
die().
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock
@ 2001-10-16 3:35 Strumila, John
0 siblings, 0 replies; 5+ messages in thread
From: Strumila, John @ 2001-10-16 3:35 UTC (permalink / raw)
To: linux-kernel; +Cc: Keith Owens
gday Keith,
I dont think many know the reference to IEHIBALL...
John
>Patch against 2.4.13-pre3. Compiled on i386, eyeballed (using IEHIBALL
>on s390) but untested on other arch.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock
2001-10-16 2:58 ` Keith Owens
@ 2001-10-16 4:27 ` Keith Owens
0 siblings, 0 replies; 5+ messages in thread
From: Keith Owens @ 2001-10-16 4:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
On Tue, 16 Oct 2001 12:58:21 +1000,
Keith Owens <kaos@ocs.com.au> wrote:
>On Mon, 15 Oct 2001 19:36:02 -0700 (PDT),
>Linus Torvalds <torvalds@transmeta.com> wrote:
>>I much prefer a dead machine with a partially visible oops over a oops
>>where the original oops has scrolled away due to recursive faults.
>
>IMHO it is unrealistic to expect that all code inside die() will never
>fail. Any unexpected kernel corruption could cause the register or
>backtrace dump to fail. The patch gets the best of both worlds. It
>protects against recursive errors and against concurrent calls to
>die().
Previous message sent too soon.
The patch makes two attempts at dumping registers, one for the original
oops and one if die() fails, then it gives up. The second attempt is
useful for diagnosing why die() is failing, without that data it is
difficult to fix die() itself.
I was aiming to improve error handling in the rare case that die()
failed so we could get better diagnostics in the long term, by fixing
the problems that make die() fail. If you think that this would scroll
away useful data then we can compromise.
if (++die_lock_owner_depth < 2+(CONFIG_DIAGNOSE_RECURSIVE_DIE+0)) {
CONFIG_DIAGNOSE_RECURSIVE_DIE
If this variable is selected then the kernel will attempt to provide
extra diagnostics in the rare cases when the kernel die() routine
itself dies. This may cause useful information from the first
failure to be lost. Unless you want to diagnose the die() and
show_regs() code in the kernel, say N here.
Acceptable?
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2001-10-16 4:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-10-16 2:23 [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock Keith Owens
2001-10-16 2:36 ` Linus Torvalds
2001-10-16 2:58 ` Keith Owens
2001-10-16 4:27 ` Keith Owens
-- strict thread matches above, loose matches on Subject: below --
2001-10-16 3:35 Strumila, John
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox