* [PATCH] NMI trigger switch support for debugging
@ 2004-05-24 9:21 AKIYAMA Nobuyuki
2004-05-24 9:34 ` Andrew Morton
2004-05-24 10:01 ` [PATCH] typo in drivers/usb/class/usblp.c Benoît Dejean
0 siblings, 2 replies; 15+ messages in thread
From: AKIYAMA Nobuyuki @ 2004-05-24 9:21 UTC (permalink / raw)
To: linux-kernel
Hi all,
I made a patch for debugging with the help of an NMI trigger switch.
When the kernel hangs severely, keyboard operations (e.g. Ctrl-Alt-Del)
do not work properly. In that case, this patch lets debugging information
be displayed on the console.
I think this feature should be standard functionality.
Please feel free to use this patch and let me know if you have
any comments.
-Background
When trouble occurs in the kernel, we usually begin investigating
with the following information:
- panic >> panic message.
- oops >> CPU registers and stack trace.
- hang >> **NONE**; no standard method is established.
-How it works
Most IA32 servers have an NMI switch that raises an NMI.
An NMI is delivered even when the kernel is in a serious state,
for example a deadlock with interrupts disabled.
When the NMI switch is pressed after this feature is activated,
CPU registers and a stack trace are displayed on the console and then
a panic occurs.
This feature is activated or deactivated with sysctl.
On the IA32 architecture, only the following are defined as reasons
for an NMI:
- memory parity error
- I/O check error
The reason code of the NMI switch is not defined, so this patch assumes
that all undefined NMIs are raised by the NMI switch.
However, oprofile and the NMI watchdog also use undefined NMIs.
Therefore this feature cannot be used at the same time with oprofile
and NMI watchdog. This feature hands NMI interrupt over to oprofile
and NMI watchdog. So, when they have been activated, this feature
doesn't work even if it is activated.
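The reason check described above can be sketched in plain C. This is only an illustration under stated assumptions: the bit values (0x80 for memory parity, 0x40 for I/O check) are the conventional layout of the NMI status byte and are not spelled out in this description, and classify_nmi() with its enum is a hypothetical helper, not code from the patch.

```c
#include <assert.h>

/* Assumed layout of the NMI status byte; not taken from the patch text. */
#define NMI_REASON_MEM_PARITY 0x80	/* memory parity error */
#define NMI_REASON_IO_CHECK   0x40	/* I/O check error */

enum nmi_source {
	NMI_SRC_MEM_PARITY,
	NMI_SRC_IO_CHECK,
	NMI_SRC_UNKNOWN		/* no defined reason bit is set */
};

/* Hypothetical helper: classify a reason byte as the description does. */
static enum nmi_source classify_nmi(unsigned char reason)
{
	if (reason & NMI_REASON_MEM_PARITY)
		return NMI_SRC_MEM_PARITY;
	if (reason & NMI_REASON_IO_CHECK)
		return NMI_SRC_IO_CHECK;
	/* Neither defined bit is set: assumed to come from the NMI switch. */
	return NMI_SRC_UNKNOWN;
}
```

Only the NMI_SRC_UNKNOWN case is the one this feature acts on; the two defined reasons keep their existing handling.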
-Supported architecture
IA32
-Setup
Set up the system control parameter as follows:
# sysctl -w kernel.unknown_nmi_panic=1
kernel.unknown_nmi_panic = 1
If the NMI switch is pressed, CPU registers and a stack trace will
be displayed on the console and then a panic occurs.
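The same setting can also be changed from a program by writing to the file under /proc/sys. The sketch below is a minimal illustration, not part of the patch: write_int_setting() is a hypothetical helper, and it assumes the parameter appears as /proc/sys/kernel/unknown_nmi_panic, which is what the sysctl name above maps to.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an integer followed by a newline to a
 * sysctl-style file. Returns 0 on success, -1 on any failure.
 */
static int write_int_setting(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* fclose() flushes, so a rejected write is reported here at the latest. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A call such as write_int_setting("/proc/sys/kernel/unknown_nmi_panic", 1) would need root privileges and a kernel that carries this patch.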
Regards,
Nobuyuki Akiyama
diff -Nur linux-2.6.6.org/arch/i386/kernel/nmi.c linux-2.6.6/arch/i386/kernel/nmi.c
--- linux-2.6.6.org/arch/i386/kernel/nmi.c 2004-05-10 11:32:37.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/nmi.c 2004-05-20 14:24:03.000000000 +0900
@@ -452,6 +452,8 @@
 		alert_counter[i] = 0;
 }
 
+extern void die_nmi(struct pt_regs *, const char *msg);
+
 void nmi_watchdog_tick (struct pt_regs * regs)
 {
 
@@ -470,21 +472,8 @@
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz) {
-			spin_lock(&nmi_print_lock);
-			/*
-			 * We are in trouble anyway, lets at least try
-			 * to get a message out.
-			 */
-			bust_spinlocks(1);
-			printk("NMI Watchdog detected LOCKUP on CPU%d, eip %08lx, registers:\n", cpu, regs->eip);
-			show_registers(regs);
-			printk("console shuts up ...\n");
-			console_silent();
-			spin_unlock(&nmi_print_lock);
-			bust_spinlocks(0);
-			do_exit(SIGSEGV);
-		}
+		if (alert_counter[cpu] == 5*nmi_hz)
+			die_nmi(regs, "NMI Watchdog detected LOCKUP");
 	} else {
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
diff -Nur linux-2.6.6.org/arch/i386/kernel/traps.c linux-2.6.6/arch/i386/kernel/traps.c
--- linux-2.6.6.org/arch/i386/kernel/traps.c 2004-05-10 11:32:02.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/traps.c 2004-05-20 14:21:53.000000000 +0900
@@ -454,6 +454,29 @@
 	printk("Do you have a strange power saving mode enabled?\n");
 }
 
+int unknown_nmi_panic = 0;
+
+static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
+
+void die_nmi (struct pt_regs *regs, const char *msg)
+{
+	spin_lock(&nmi_print_lock);
+	/*
+	 * We are in trouble anyway, lets at least try
+	 * to get a message out.
+	 */
+	bust_spinlocks(1);
+	printk(msg);
+	printk(" on CPU%d, eip %08lx, registers:\n",
+		smp_processor_id(), regs->eip);
+	show_registers(regs);
+	printk("console shuts up ...\n");
+	console_silent();
+	spin_unlock(&nmi_print_lock);
+	bust_spinlocks(0);
+	do_exit(SIGSEGV);
+}
+
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = get_nmi_reason();
@@ -469,6 +492,11 @@
 			return;
 		}
 #endif
+		if (unknown_nmi_panic) {
+			char buf[64];
+			sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+			die_nmi(regs, buf);
+		}
 		unknown_nmi_error(reason, regs);
 		return;
 	}
diff -Nur linux-2.6.6.org/include/linux/sysctl.h linux-2.6.6/include/linux/sysctl.h
--- linux-2.6.6.org/include/linux/sysctl.h 2004-05-10 11:32:38.000000000 +0900
+++ linux-2.6.6/include/linux/sysctl.h 2004-05-20 14:19:36.000000000 +0900
@@ -133,6 +133,7 @@
 	KERN_NGROUPS_MAX=63,	/* int: NGROUPS_MAX */
 	KERN_SPARC_SCONS_PWROFF=64, /* int: serial console power-off halt */
 	KERN_HZ_TIMER=65,	/* int: hz timer on or off */
+	KERN_UNKNOWN_NMI_PANIC=66, /* int: unknown nmi panic flag */
 };
 
 
diff -Nur linux-2.6.6.org/kernel/sysctl.c linux-2.6.6/kernel/sysctl.c
--- linux-2.6.6.org/kernel/sysctl.c 2004-05-10 11:32:00.000000000 +0900
+++ linux-2.6.6/kernel/sysctl.c 2004-05-20 14:19:36.000000000 +0900
@@ -64,6 +64,7 @@
 extern int min_free_kbytes;
 extern int printk_ratelimit_jiffies;
 extern int printk_ratelimit_burst;
+extern int unknown_nmi_panic;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -636,6 +637,16 @@
 		.mode		= 0444,
 		.proc_handler	= &proc_dointvec,
 	},
+#if defined(__i386__)
+	{
+		.ctl_name	= KERN_UNKNOWN_NMI_PANIC,
+		.procname	= "unknown_nmi_panic",
+		.data		= &unknown_nmi_panic,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 	{ .ctl_name = 0 }
 };
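The message assembly done by die_nmi() above can be modelled in user space. The sketch below is an illustration only and not part of the patch: format_nmi_banner() is a hypothetical helper, and snprintf() stands in for printk(). It routes the caller's message through "%s" rather than using it as the format string, so a '%' inside the message is printed literally instead of being interpreted as a conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space model of the banner printed by die_nmi().
 * The message is an argument to "%s", never the format string itself.
 * Returns the value of snprintf().
 */
static int format_nmi_banner(char *out, size_t len, const char *msg,
			     int cpu, unsigned long eip)
{
	return snprintf(out, len, "%s on CPU%d, eip %08lx, registers:\n",
			msg, cpu, eip);
}
```

With the message "NMI Watchdog detected LOCKUP", CPU 0 and eip 0xc0100000, the helper produces the same banner text as the printk() call that the patch removes from nmi_watchdog_tick().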
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-24 9:21 [PATCH] NMI trigger switch support for debugging AKIYAMA Nobuyuki
@ 2004-05-24 9:34 ` Andrew Morton
2004-05-26 1:36 ` AKIYAMA Nobuyuki
2004-05-24 10:01 ` [PATCH] typo in drivers/usb/class/usblp.c Benoît Dejean
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-05-24 9:34 UTC (permalink / raw)
To: AKIYAMA Nobuyuki; +Cc: linux-kernel

AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>
> Therefore this feature cannot be used at the same time with oprofile
> and NMI watchdog. This feature hands NMI interrupt over to oprofile
> and NMI watchdog. So, when they have been activated, this feature
> doesn't work even if it is activated.

An API was recently added to solve this. See reserve_lapic_nmi() and
release_lapic_nmi().
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-24 9:34 ` Andrew Morton
@ 2004-05-26 1:36 ` AKIYAMA Nobuyuki
2004-05-26 1:41 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: AKIYAMA Nobuyuki @ 2004-05-26 1:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Hi Andrew,

Andrew Morton wrote:
> AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>
>> Therefore this feature cannot be used at the same time with oprofile
>> and NMI watchdog. This feature hands NMI interrupt over to oprofile
>> and NMI watchdog. So, when they have been activated, this feature
>> doesn't work even if it is activated.
>
> An API was recently added to solve this. See reserve_lapic_nmi() and
> release_lapic_nmi().

Thank you for your comment.
I remade my patch referring to your suggestion.

Regards,
Nobuyuki Akiyama

diff -Nur linux-2.6.6.org/arch/i386/kernel/nmi.c linux-2.6.6/arch/i386/kernel/nmi.c
--- linux-2.6.6.org/arch/i386/kernel/nmi.c 2004-05-25 20:02:45.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/nmi.c 2004-05-26 09:48:23.000000000 +0900
@@ -25,16 +25,20 @@
 #include <linux/module.h>
 #include <linux/nmi.h>
 #include <linux/sysdev.h>
+#include <linux/sysctl.h>
 
 #include <asm/smp.h>
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
 
+#include "mach_traps.h"
+
 unsigned int nmi_watchdog = NMI_NONE;
 static unsigned int nmi_hz = HZ;
 unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
 extern void show_registers(struct pt_regs *regs);
+int unknown_nmi_panic = 0;
 
 /*
  * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
@@ -419,8 +423,6 @@
 	nmi_active = 1;
 }
 
-static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
-
 /*
  * the best way to detect whether a CPU has a 'hard lockup' problem
  * is to check it's local APIC timer IRQ counts. If they are not
@@ -452,6 +454,8 @@
 		alert_counter[i] = 0;
 }
 
+extern void die_nmi(struct pt_regs *, const char *msg);
+
 void nmi_watchdog_tick (struct pt_regs * regs)
 {
 
@@ -470,21 +474,8 @@
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz) {
-			spin_lock(&nmi_print_lock);
-			/*
-			 * We are in trouble anyway, lets at least try
-			 * to get a message out.
-			 */
-			bust_spinlocks(1);
-			printk("NMI Watchdog detected LOCKUP on CPU%d, eip %08lx, registers:\n", cpu, regs->eip);
-			show_registers(regs);
-			printk("console shuts up ...\n");
-			console_silent();
-			spin_unlock(&nmi_print_lock);
-			bust_spinlocks(0);
-			do_exit(SIGSEGV);
-		}
+		if (alert_counter[cpu] == 5*nmi_hz)
+			die_nmi(regs, "NMI Watchdog detected LOCKUP");
 	} else {
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
@@ -511,6 +502,45 @@
 	}
 }
 
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu)
+{
+	unsigned char reason = get_nmi_reason();
+	char buf[64];
+
+	if (!(reason & 0xc0)) {
+		sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+		die_nmi(regs, buf);
+	}
+	return 0;
+}
+
+/*
+ * proc handler for /proc/sys/kernel/unknown_nmi_panic
+ */
+int proc_unknown_nmi_panic(ctl_table *table, int write,
+		struct file *file, void __user *buffer, size_t *length)
+{
+	int old_state;
+
+	old_state = unknown_nmi_panic;
+	proc_dointvec(table, write, file, buffer, length);
+	if (!old_state == !unknown_nmi_panic)
+		return 0;
+
+	if (unknown_nmi_panic) {
+		if (reserve_lapic_nmi() < 0) {
+			unknown_nmi_panic = 0;
+			return -EBUSY;
+		} else {
+			set_nmi_callback(unknown_nmi_panic_callback);
+		}
+	} else {
+		release_lapic_nmi();
+		unset_nmi_callback();
+	}
+	return 0;
+}
+
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
 EXPORT_SYMBOL(reserve_lapic_nmi);
diff -Nur linux-2.6.6.org/arch/i386/kernel/traps.c linux-2.6.6/arch/i386/kernel/traps.c
--- linux-2.6.6.org/arch/i386/kernel/traps.c 2004-05-25 20:02:45.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/traps.c 2004-05-25 22:54:03.000000000 +0900
@@ -454,6 +454,27 @@
 	printk("Do you have a strange power saving mode enabled?\n");
 }
 
+static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
+
+void die_nmi (struct pt_regs *regs, const char *msg)
+{
+	spin_lock(&nmi_print_lock);
+	/*
+	 * We are in trouble anyway, lets at least try
+	 * to get a message out.
+	 */
+	bust_spinlocks(1);
+	printk(msg);
+	printk(" on CPU%d, eip %08lx, registers:\n",
+		smp_processor_id(), regs->eip);
+	show_registers(regs);
+	printk("console shuts up ...\n");
+	console_silent();
+	spin_unlock(&nmi_print_lock);
+	bust_spinlocks(0);
+	do_exit(SIGSEGV);
+}
+
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = get_nmi_reason();
diff -Nur linux-2.6.6.org/include/linux/sysctl.h linux-2.6.6/include/linux/sysctl.h
--- linux-2.6.6.org/include/linux/sysctl.h 2004-05-25 20:03:13.000000000 +0900
+++ linux-2.6.6/include/linux/sysctl.h 2004-05-25 22:54:03.000000000 +0900
@@ -133,6 +133,7 @@
 	KERN_NGROUPS_MAX=63,	/* int: NGROUPS_MAX */
 	KERN_SPARC_SCONS_PWROFF=64, /* int: serial console power-off halt */
 	KERN_HZ_TIMER=65,	/* int: hz timer on or off */
+	KERN_UNKNOWN_NMI_PANIC=66, /* int: unknown nmi panic flag */
 };
 
 
diff -Nur linux-2.6.6.org/kernel/sysctl.c linux-2.6.6/kernel/sysctl.c
--- linux-2.6.6.org/kernel/sysctl.c 2004-05-25 20:03:31.000000000 +0900
+++ linux-2.6.6/kernel/sysctl.c 2004-05-25 22:54:03.000000000 +0900
@@ -65,6 +65,12 @@
 extern int printk_ratelimit_jiffies;
 extern int printk_ratelimit_burst;
 
+#if defined(__i386__)
+extern int unknown_nmi_panic;
+extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
+				  void __user *, size_t *);
+#endif
+
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
@@ -636,6 +642,16 @@
 		.mode		= 0444,
 		.proc_handler	= &proc_dointvec,
 	},
+#if defined(__i386__)
+	{
+		.ctl_name	= KERN_UNKNOWN_NMI_PANIC,
+		.procname	= "unknown_nmi_panic",
+		.data		= &unknown_nmi_panic,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_unknown_nmi_panic,
+	},
+#endif
 	{ .ctl_name = 0 }
 };
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 1:36 ` AKIYAMA Nobuyuki
@ 2004-05-26 1:41 ` Andrew Morton
2004-05-26 2:28 ` AKIYAMA Nobuyuki
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-05-26 1:41 UTC (permalink / raw)
To: AKIYAMA Nobuyuki; +Cc: linux-kernel

AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>
> Thank you for your comment.
> I remade my patch referring to your suggestion.

OK. Now I've largely forgotten what your patch did and cannot proceed with
it unless it has a good changelog entry.

Please maintain a full description of your patches and include that
description whenever reissuing the patch.

And it needs to be reissued anyway, because your email client wordwrapped it.
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 1:41 ` Andrew Morton
@ 2004-05-26 2:28 ` AKIYAMA Nobuyuki
2004-05-26 2:37 ` Andrew Morton
2004-05-26 9:43 ` Mikael Pettersson
0 siblings, 2 replies; 15+ messages in thread
From: AKIYAMA Nobuyuki @ 2004-05-26 2:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Hi Andrew,

Andrew Morton wrote:
> AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>
>> Thank you for your comment.
>> I remade my patch referring to your suggestion.
>
> OK. Now I've largely forgotten what your patch did and cannot proceed with
> it unless it has a good changelog entry.
>
> Please maintain a full description of your patches and include that
> description whenever reissuing the patch.
>
> And it needs to be reissued anyway, because your email client wordwrapped it.

Sorry, I resend document and patch.

Regards,
Nobuyuki Akiyama

-----
-Summary
I made a patch for debugging with the help of an NMI trigger switch.
When the kernel hangs severely, keyboard operations (e.g. Ctrl-Alt-Del)
do not work properly. In that case, this patch lets debugging information
be displayed on the console.
I think this feature should be standard functionality.
-Background
When trouble occurs in the kernel, we usually begin investigating
with the following information:
- panic >> panic message.
- oops >> CPU registers and stack trace.
- hang >> **NONE**; no standard method is established.
-How it works
Most IA32 servers have an NMI switch that raises an NMI.
An NMI is delivered even when the kernel is in a serious state,
for example a deadlock with interrupts disabled.
When the NMI switch is pressed after this feature is activated,
CPU registers and a stack trace are displayed on the console and then
a panic occurs.
This feature is activated or deactivated with sysctl.
On the IA32 architecture, only the following are defined as reasons
for an NMI:
- memory parity error
- I/O check error
The reason code of the NMI switch is not defined, so this patch assumes
that all undefined NMIs are raised by the NMI switch.
However, oprofile and the NMI watchdog also use undefined NMIs.
Therefore this feature cannot be used at the same time with oprofile
and NMI watchdog. This feature hands NMI interrupt over to oprofile
and NMI watchdog. So, when they have been activated, this feature
doesn't work even if it is activated.
-Supported architecture
IA32
-Setup
Set up the system control parameter as follows:
# sysctl -w kernel.unknown_nmi_panic=1
kernel.unknown_nmi_panic = 1
If the NMI switch is pressed, CPU registers and a stack trace will
be displayed on the console and then a panic occurs.

diff -Nur linux-2.6.6.org/arch/i386/kernel/nmi.c linux-2.6.6/arch/i386/kernel/nmi.c
--- linux-2.6.6.org/arch/i386/kernel/nmi.c 2004-05-25 20:02:45.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/nmi.c 2004-05-26 09:48:23.000000000 +0900
@@ -25,16 +25,20 @@
 #include <linux/module.h>
 #include <linux/nmi.h>
 #include <linux/sysdev.h>
+#include <linux/sysctl.h>
 
 #include <asm/smp.h>
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
 
+#include "mach_traps.h"
+
 unsigned int nmi_watchdog = NMI_NONE;
 static unsigned int nmi_hz = HZ;
 unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
 extern void show_registers(struct pt_regs *regs);
+int unknown_nmi_panic = 0;
 
 /*
  * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
@@ -419,8 +423,6 @@
 	nmi_active = 1;
 }
 
-static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
-
 /*
  * the best way to detect whether a CPU has a 'hard lockup' problem
  * is to check it's local APIC timer IRQ counts. If they are not
@@ -452,6 +454,8 @@
 		alert_counter[i] = 0;
 }
 
+extern void die_nmi(struct pt_regs *, const char *msg);
+
 void nmi_watchdog_tick (struct pt_regs * regs)
 {
 
@@ -470,21 +474,8 @@
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz) {
-			spin_lock(&nmi_print_lock);
-			/*
-			 * We are in trouble anyway, lets at least try
-			 * to get a message out.
-			 */
-			bust_spinlocks(1);
-			printk("NMI Watchdog detected LOCKUP on CPU%d, eip %08lx, registers:\n", cpu, regs->eip);
-			show_registers(regs);
-			printk("console shuts up ...\n");
-			console_silent();
-			spin_unlock(&nmi_print_lock);
-			bust_spinlocks(0);
-			do_exit(SIGSEGV);
-		}
+		if (alert_counter[cpu] == 5*nmi_hz)
+			die_nmi(regs, "NMI Watchdog detected LOCKUP");
 	} else {
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
@@ -511,6 +502,45 @@
 	}
 }
 
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu)
+{
+	unsigned char reason = get_nmi_reason();
+	char buf[64];
+
+	if (!(reason & 0xc0)) {
+		sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+		die_nmi(regs, buf);
+	}
+	return 0;
+}
+
+/*
+ * proc handler for /proc/sys/kernel/unknown_nmi_panic
+ */
+int proc_unknown_nmi_panic(ctl_table *table, int write,
+		struct file *file, void __user *buffer, size_t *length)
+{
+	int old_state;
+
+	old_state = unknown_nmi_panic;
+	proc_dointvec(table, write, file, buffer, length);
+	if (!old_state == !unknown_nmi_panic)
+		return 0;
+
+	if (unknown_nmi_panic) {
+		if (reserve_lapic_nmi() < 0) {
+			unknown_nmi_panic = 0;
+			return -EBUSY;
+		} else {
+			set_nmi_callback(unknown_nmi_panic_callback);
+		}
+	} else {
+		release_lapic_nmi();
+		unset_nmi_callback();
+	}
+	return 0;
+}
+
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
 EXPORT_SYMBOL(reserve_lapic_nmi);
diff -Nur linux-2.6.6.org/arch/i386/kernel/traps.c linux-2.6.6/arch/i386/kernel/traps.c
--- linux-2.6.6.org/arch/i386/kernel/traps.c 2004-05-25 20:02:45.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/traps.c 2004-05-25 22:54:03.000000000 +0900
@@ -454,6 +454,27 @@
 	printk("Do you have a strange power saving mode enabled?\n");
 }
 
+static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
+
+void die_nmi (struct pt_regs *regs, const char *msg)
+{
+	spin_lock(&nmi_print_lock);
+	/*
+	 * We are in trouble anyway, lets at least try
+	 * to get a message out.
+	 */
+	bust_spinlocks(1);
+	printk(msg);
+	printk(" on CPU%d, eip %08lx, registers:\n",
+		smp_processor_id(), regs->eip);
+	show_registers(regs);
+	printk("console shuts up ...\n");
+	console_silent();
+	spin_unlock(&nmi_print_lock);
+	bust_spinlocks(0);
+	do_exit(SIGSEGV);
+}
+
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = get_nmi_reason();
diff -Nur linux-2.6.6.org/include/linux/sysctl.h linux-2.6.6/include/linux/sysctl.h
--- linux-2.6.6.org/include/linux/sysctl.h 2004-05-25 20:03:13.000000000 +0900
+++ linux-2.6.6/include/linux/sysctl.h 2004-05-25 22:54:03.000000000 +0900
@@ -133,6 +133,7 @@
 	KERN_NGROUPS_MAX=63,	/* int: NGROUPS_MAX */
 	KERN_SPARC_SCONS_PWROFF=64, /* int: serial console power-off halt */
 	KERN_HZ_TIMER=65,	/* int: hz timer on or off */
+	KERN_UNKNOWN_NMI_PANIC=66, /* int: unknown nmi panic flag */
 };
 
 
diff -Nur linux-2.6.6.org/kernel/sysctl.c linux-2.6.6/kernel/sysctl.c
--- linux-2.6.6.org/kernel/sysctl.c 2004-05-25 20:03:31.000000000 +0900
+++ linux-2.6.6/kernel/sysctl.c 2004-05-25 22:54:03.000000000 +0900
@@ -65,6 +65,12 @@
 extern int printk_ratelimit_jiffies;
 extern int printk_ratelimit_burst;
 
+#if defined(__i386__)
+extern int unknown_nmi_panic;
+extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
+				  void __user *, size_t *);
+#endif
+
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
@@ -636,6 +642,16 @@
 		.mode		= 0444,
 		.proc_handler	= &proc_dointvec,
 	},
+#if defined(__i386__)
+	{
+		.ctl_name	= KERN_UNKNOWN_NMI_PANIC,
+		.procname	= "unknown_nmi_panic",
+		.data		= &unknown_nmi_panic,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_unknown_nmi_panic,
+	},
+#endif
 	{ .ctl_name = 0 }
 };
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 2:28 ` AKIYAMA Nobuyuki
@ 2004-05-26 2:37 ` Andrew Morton
2004-05-26 12:17 ` AKIYAMA Nobuyuki
` (2 more replies)
2004-05-26 9:43 ` Mikael Pettersson
1 sibling, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2004-05-26 2:37 UTC (permalink / raw)
To: AKIYAMA Nobuyuki; +Cc: linux-kernel

AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>
> Sorry, I resend document and patch.

Great, thanks. Updates to Documentation/kernel-parameters.txt and
Documentation/filesystems/proc.txt would be nice.

If the machine locks up with interrupts enabled we can use sysrq-T and
sysrq-P. If it locks up with interrupts disabled the NMI watchdog will
automatically produce the same info as your patch. So what advantage does
the patch add?
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 2:37 ` Andrew Morton
@ 2004-05-26 12:17 ` AKIYAMA Nobuyuki
2004-05-26 12:52 ` Maciej W. Rozycki
2004-05-27 8:43 ` AKIYAMA Nobuyuki
2 siblings, 0 replies; 15+ messages in thread
From: AKIYAMA Nobuyuki @ 2004-05-26 12:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Hi Andrew,

Andrew Morton wrote:
> AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>
>> Sorry, I resend document and patch.
>
> Great, thanks. Updates to Documentation/kernel-parameters.txt and
> Documentation/filesystems/proc.txt would be nice.
>
> If the machine locks up with interrupts enabled we can use sysrq-T and
> sysrq-P. If it locks up with interrupts disabled the NMI watchdog will
> automatically produce the same info as your patch. So what advantage does
> the patch add?

People who consider performance very important and want to run only
the programs they need tend not to use the NMI watchdog.
My patch does not affect performance at all; it runs only when
the NMI switch is pressed.
Whenever debugging information is needed, we can always get it.

Regards,
Nobuyuki Akiyama
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 2:37 ` Andrew Morton
2004-05-26 12:17 ` AKIYAMA Nobuyuki
@ 2004-05-26 12:52 ` Maciej W. Rozycki
2004-05-27 8:43 ` AKIYAMA Nobuyuki
2 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2004-05-26 12:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: AKIYAMA Nobuyuki, linux-kernel

On Tue, 25 May 2004, Andrew Morton wrote:
> If the machine locks up with interrupts enabled we can use sysrq-T and
> sysrq-P. If it locks up with interrupts disabled the NMI watchdog will
> automatically produce the same info as your patch. So what advantage does
> the patch add?

A system may have no NMI watchdog available (which requires an APIC), yet
still have an NMI button. Though this is probably the case only for IA32
systems that are several years old now.

Note that EISA systems have a different NMI watchdog, based on a second
8254 PIT, which we've never attempted to make use of.

--
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 2:37 ` Andrew Morton
2004-05-26 12:17 ` AKIYAMA Nobuyuki
2004-05-26 12:52 ` Maciej W. Rozycki
@ 2004-05-27 8:43 ` AKIYAMA Nobuyuki
2 siblings, 0 replies; 15+ messages in thread
From: AKIYAMA Nobuyuki @ 2004-05-27 8:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Hi Andrew,

Andrew Morton wrote:
> Great, thanks. Updates to Documentation/kernel-parameters.txt and
> Documentation/filesystems/proc.txt would be nice.
>

I wrote a document(proc.txt), and remade a patch that merged
Mikael and Andi's feedback.
The patch has been tested well.

Regards,
Nobuyuki Akiyama

--- proc.txt.orig 2004-05-27 14:28:38.000000000 +0900
+++ proc.txt 2004-05-27 14:28:07.000000000 +0900
@@ -1105,6 +1105,20 @@
 The location where the modprobe binary is located. The kernel uses this
 program to load modules on demand.
 
+unknown_nmi_panic
+-----------------
+
+The value in this file affects behavior of handling NMI. When the value is
+non-zero, unknown NMI is trapped and then panic occurs. At that time, kernel
+debugging information is displayed on console.
+
+NMI switch that most IA32 servers have fires unknown NMI up, for example.
+If a system hangs up, try pressing the NMI switch.
+
+[NOTE]
+   This function and oprofile share a NMI callback. Therefore this function
+   cannot be enabled when oprofile is activated.
+
 2.4 /proc/sys/vm - The virtual memory subsystem
 -----------------------------------------------
 
diff -Nur linux-2.6.6.org/arch/i386/kernel/nmi.c linux-2.6.6/arch/i386/kernel/nmi.c
--- linux-2.6.6.org/arch/i386/kernel/nmi.c 2004-05-25 20:02:45.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/nmi.c 2004-05-26 09:48:23.000000000 +0900
@@ -25,16 +25,20 @@
 #include <linux/module.h>
 #include <linux/nmi.h>
 #include <linux/sysdev.h>
+#include <linux/sysctl.h>
 
 #include <asm/smp.h>
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
 
+#include "mach_traps.h"
+
 unsigned int nmi_watchdog = NMI_NONE;
 static unsigned int nmi_hz = HZ;
 unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
 extern void show_registers(struct pt_regs *regs);
+int unknown_nmi_panic;
 
 /*
  * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
@@ -419,8 +423,6 @@
 	nmi_active = 1;
 }
 
-static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
-
 /*
  * the best way to detect whether a CPU has a 'hard lockup' problem
  * is to check it's local APIC timer IRQ counts. If they are not
@@ -452,6 +454,8 @@
 		alert_counter[i] = 0;
 }
 
+extern void die_nmi(struct pt_regs *, const char *msg);
+
 void nmi_watchdog_tick (struct pt_regs * regs)
 {
 
@@ -470,21 +474,8 @@
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz) {
-			spin_lock(&nmi_print_lock);
-			/*
-			 * We are in trouble anyway, lets at least try
-			 * to get a message out.
-			 */
-			bust_spinlocks(1);
-			printk("NMI Watchdog detected LOCKUP on CPU%d, eip %08lx, registers:\n", cpu, regs->eip);
-			show_registers(regs);
-			printk("console shuts up ...\n");
-			console_silent();
-			spin_unlock(&nmi_print_lock);
-			bust_spinlocks(0);
-			do_exit(SIGSEGV);
-		}
+		if (alert_counter[cpu] == 5*nmi_hz)
+			die_nmi(regs, "NMI Watchdog detected LOCKUP");
 	} else {
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
@@ -511,6 +502,45 @@
 	}
 }
 
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu)
+{
+	unsigned char reason = get_nmi_reason();
+	char buf[64];
+
+	if (!(reason & 0xc0)) {
+		sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+		die_nmi(regs, buf);
+	}
+	return 0;
+}
+
+/*
+ * proc handler for /proc/sys/kernel/unknown_nmi_panic
+ */
+int proc_unknown_nmi_panic(ctl_table *table, int write,
+		struct file *file, void __user *buffer, size_t *length)
+{
+	int old_state;
+
+	old_state = unknown_nmi_panic;
+	proc_dointvec(table, write, file, buffer, length);
+	if (!!old_state == !!unknown_nmi_panic)
+		return 0;
+
+	if (unknown_nmi_panic) {
+		if (reserve_lapic_nmi() < 0) {
+			unknown_nmi_panic = 0;
+			return -EBUSY;
+		} else {
+			set_nmi_callback(unknown_nmi_panic_callback);
+		}
+	} else {
+		release_lapic_nmi();
+		unset_nmi_callback();
+	}
+	return 0;
+}
+
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
 EXPORT_SYMBOL(reserve_lapic_nmi);
diff -Nur linux-2.6.6.org/arch/i386/kernel/traps.c linux-2.6.6/arch/i386/kernel/traps.c
--- linux-2.6.6.org/arch/i386/kernel/traps.c 2004-05-25 20:02:45.000000000 +0900
+++ linux-2.6.6/arch/i386/kernel/traps.c 2004-05-25 22:54:03.000000000 +0900
@@ -454,6 +454,27 @@
 	printk("Do you have a strange power saving mode enabled?\n");
 }
 
+static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
+
+void die_nmi (struct pt_regs *regs, const char *msg)
+{
+	spin_lock(&nmi_print_lock);
+	/*
+	 * We are in trouble anyway, lets at least try
+	 * to get a message out.
+	 */
+	bust_spinlocks(1);
+	printk(msg);
+	printk(" on CPU%d, eip %08lx, registers:\n",
+		smp_processor_id(), regs->eip);
+	show_registers(regs);
+	printk("console shuts up ...\n");
+	console_silent();
+	spin_unlock(&nmi_print_lock);
+	bust_spinlocks(0);
+	do_exit(SIGSEGV);
+}
+
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = get_nmi_reason();
diff -Nur linux-2.6.6.org/include/linux/sysctl.h linux-2.6.6/include/linux/sysctl.h
--- linux-2.6.6.org/include/linux/sysctl.h 2004-05-25 20:03:13.000000000 +0900
+++ linux-2.6.6/include/linux/sysctl.h 2004-05-25 22:54:03.000000000 +0900
@@ -133,6 +133,7 @@
 	KERN_NGROUPS_MAX=63,	/* int: NGROUPS_MAX */
 	KERN_SPARC_SCONS_PWROFF=64, /* int: serial console power-off halt */
 	KERN_HZ_TIMER=65,	/* int: hz timer on or off */
+	KERN_UNKNOWN_NMI_PANIC=66, /* int: unknown nmi panic flag */
 };
 
 
diff -Nur linux-2.6.6.org/kernel/sysctl.c linux-2.6.6/kernel/sysctl.c
--- linux-2.6.6.org/kernel/sysctl.c 2004-05-25 20:03:31.000000000 +0900
+++ linux-2.6.6/kernel/sysctl.c 2004-05-25 22:54:03.000000000 +0900
@@ -65,6 +65,12 @@
 extern int printk_ratelimit_jiffies;
 extern int printk_ratelimit_burst;
 
+#if defined(__i386__)
+extern int unknown_nmi_panic;
+extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
+				  void __user *, size_t *);
+#endif
+
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
@@ -636,6 +642,16 @@
 		.mode		= 0444,
 		.proc_handler	= &proc_dointvec,
 	},
+#if defined(__i386__)
+	{
+		.ctl_name	= KERN_UNKNOWN_NMI_PANIC,
+		.procname	= "unknown_nmi_panic",
+		.data		= &unknown_nmi_panic,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_unknown_nmi_panic,
+	},
+#endif
 	{ .ctl_name = 0 }
 };
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 2:28 ` AKIYAMA Nobuyuki
2004-05-26 2:37 ` Andrew Morton
@ 2004-05-26 9:43 ` Mikael Pettersson
2004-05-26 13:42 ` AKIYAMA Nobuyuki
2004-05-26 20:34 ` Paul Jackson
1 sibling, 2 replies; 15+ messages in thread
From: Mikael Pettersson @ 2004-05-26 9:43 UTC (permalink / raw)
To: AKIYAMA Nobuyuki; +Cc: Andrew Morton, linux-kernel

AKIYAMA Nobuyuki writes:
 > +int unknown_nmi_panic = 0;

It's a kernel coding standard to _not_ explicitly initialise
static-extent data to zero.

 > +/*
 > + * proc handler for /proc/sys/kernel/unknown_nmi_panic
 > + */
 > +int proc_unknown_nmi_panic(ctl_table *table, int write,
 > +		struct file *file, void __user *buffer, size_t *length)
 > +{
 > +	int old_state;
 > +
 > +	old_state = unknown_nmi_panic;
 > +	proc_dointvec(table, write, file, buffer, length);
 > +	if (!old_state == !unknown_nmi_panic)
 > +		return 0;

This conditional looks terribly obscure.
Can you simplify it or explain your intention here?

 > +	if (unknown_nmi_panic) {
 > +		if (reserve_lapic_nmi() < 0) {
 > +			unknown_nmi_panic = 0;
 > +			return -EBUSY;
 > +		} else {
 > +			set_nmi_callback(unknown_nmi_panic_callback);
 > +		}
 > +	} else {
 > +		release_lapic_nmi();

You're invoking release_lapic_nmi() in response to user
input, without having verified that _you_ had done a
reserve_lapic_nmi() before.

It looks like the code will do horrible things if the operator
invokes the sysctl incorrectly. Such errors do happen, so code
should include basic sanity checks.

/Mikael
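The sanity check asked for above could take the form of an explicit ownership flag. The sketch below is a user-space model only, under loud assumptions: model_reserve() and model_release() are hypothetical stand-ins for reserve_lapic_nmi() and release_lapic_nmi(), and set_unknown_nmi_panic() is an invented name, not a function from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the lapic NMI reservation calls. */
static int lapic_owner;		/* 0 = free, 1 = reserved by someone */

static int model_reserve(void)
{
	if (lapic_owner)
		return -EBUSY;
	lapic_owner = 1;
	return 0;
}

static void model_release(void)
{
	lapic_owner = 0;
}

/* Set only after our own successful reserve, cleared on our release. */
static int nmi_reserved_by_us;

/* Hypothetical handler body: release only what we reserved ourselves. */
static int set_unknown_nmi_panic(int enable)
{
	enable = !!enable;		/* normalise any non-zero value to 1 */
	if (enable == nmi_reserved_by_us)
		return 0;		/* no state change requested */

	if (enable) {
		int err = model_reserve();

		if (err < 0)
			return err;	/* someone else holds the NMI */
		nmi_reserved_by_us = 1;
	} else {
		model_release();
		nmi_reserved_by_us = 0;
	}
	return 0;
}
```

In this model, a request to disable the feature while another user holds the reservation is a no-op, because the flag records that the reservation was never ours.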
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 9:43 ` Mikael Pettersson
@ 2004-05-26 13:42 ` AKIYAMA Nobuyuki
2004-05-26 20:34 ` Paul Jackson
1 sibling, 0 replies; 15+ messages in thread
From: AKIYAMA Nobuyuki @ 2004-05-26 13:42 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: Andrew Morton, linux-kernel
Hi Mikael,
Thank you for reviewing.
Mikael Pettersson wrote:
>AKIYAMA Nobuyuki writes:
> > +int unknown_nmi_panic = 0;
>
>It's a kernel coding standard to _not_ explicitly initialise
>static-extent data to zero.
>
OK, thanks.
> > +/*
> > + * proc handler for /proc/sys/kernel/unknown_nmi_panic
> > + */
> > +int proc_unknown_nmi_panic(ctl_table *table, int write,
> > +		struct file *file, void __user *buffer, size_t *length)
> > +{
> > +	int old_state;
> > +
> > +	old_state = unknown_nmi_panic;
> > +	proc_dointvec(table, write, file, buffer, length);
> > +	if (!old_state == !unknown_nmi_panic)
> > +		return 0;
>
>This conditional looks terribly obscure.
>Can you simplify it or explain your intention here?
>
This code checks whether unknown_nmi_panic has changed to the other
state. I want to go on to the next step only when the state has changed.
old_state  unknown_nmi_panic  condition
0          0                  : TRUE  (no change, return)
nonzero    nonzero            : TRUE  (no change, return)
0          nonzero            : FALSE (changed, go to next step)
nonzero    0                  : FALSE (changed, go to next step)
> > +	if (unknown_nmi_panic) {
> > +		if (reserve_lapic_nmi() < 0) {
> > +			unknown_nmi_panic = 0;
> > +			return -EBUSY;
> > +		} else {
> > +			set_nmi_callback(unknown_nmi_panic_callback);
> > +		}
> > +	} else {
> > +		release_lapic_nmi();
>
>You're invoking release_lapic_nmi() in response to user
>input, without having verified that _you_ had done a
>reserve_lapic_nmi() before.
>
Only one user can hold the NMI callback at a time.
If unknown_nmi_panic is 0 at this step, it means that I obtained
the NMI callback earlier.
So I think invoking release_lapic_nmi() is not a problem.
Regards, Nobuyuki Akiyama ^ permalink raw reply [flat|nested] 15+ messages in thread
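The four-row table in the reply above can be checked mechanically. The following is a minimal user-space sketch, not code from the patch: `state_unchanged()`, `struct row`, `rows` and `count_mismatches()` are hypothetical names, and the values 5 and 7 merely stand for "nonzero".

```c
#include <assert.h>
#include <stddef.h>

/* Same test as in the patch: 1 when the on/off state did not change. */
static int state_unchanged(int old_state, int new_state)
{
	return !old_state == !new_state;
}

/* One row of the table: two inputs and the expected condition value. */
struct row {
	int old_state;
	int new_state;
	int expect_unchanged;
};

static const struct row rows[] = {
	{ 0, 0, 1 },	/* 0       -> 0       : TRUE,  no change */
	{ 5, 7, 1 },	/* nonzero -> nonzero : TRUE,  no change */
	{ 0, 5, 0 },	/* 0       -> nonzero : FALSE, changed   */
	{ 5, 0, 0 },	/* nonzero -> 0       : FALSE, changed   */
};

/* Returns how many rows disagree with state_unchanged(). */
static size_t count_mismatches(void)
{
	size_t i, bad = 0;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		int got = state_unchanged(rows[i].old_state,
					  rows[i].new_state);
		if (got != rows[i].expect_unchanged)
			bad++;
	}
	return bad;
}
```

The second row is the case that a plain `old_state == unknown_nmi_panic` comparison would get wrong, since two different nonzero values both mean "on".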
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 9:43 ` Mikael Pettersson
2004-05-26 13:42 ` AKIYAMA Nobuyuki
@ 2004-05-26 20:34 ` Paul Jackson
2004-05-26 21:04 ` Mikael Pettersson
1 sibling, 1 reply; 15+ messages in thread
From: Paul Jackson @ 2004-05-26 20:34 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: akiyama.nobuyuk, akpm, linux-kernel
Mikael Pettersson, replying to AKIYAMA Nobuyuki:
> > +	if (!old_state == !unknown_nmi_panic)
> > +		return 0;
>
> This conditional looks terribly obscure.
Would the following variant seem clearer:
	if (!!unknown_nmi_panic == !!old_state)
		return 0;
Odd, I know. For those of us familiar with the '!!' idiom, which
converts any value to its binary logical equivalent 0 (if zero) or
1 (otherwise), this reads as:
	if (the logical value of unknown_nmi_panic is unchanged)
		return 0;
However, I could easily imagine others finding this variant even
more bizarre.
-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] NMI trigger switch support for debugging
2004-05-26 20:34 ` Paul Jackson
@ 2004-05-26 21:04 ` Mikael Pettersson
0 siblings, 0 replies; 15+ messages in thread
From: Mikael Pettersson @ 2004-05-26 21:04 UTC (permalink / raw)
To: Paul Jackson; +Cc: akiyama.nobuyuk, akpm, linux-kernel
Paul Jackson writes:
 > Mikael Pettersson, replying to AKIYAMA Nobuyuki:
 > > > +	if (!old_state == !unknown_nmi_panic)
 > > > +		return 0;
 > >
 > > This conditional looks terribly obscure.
 >
 > Would the following variant seem clearer:
 >
 > 	if (!!unknown_nmi_panic == !!old_state)
 > 		return 0;
 >
 > Odd, I know. For those of us familiar with the '!!' idiom, which
 > converts any value to its binary logical equivalent 0 (if zero) or
 > 1 (otherwise), this reads as:
 >
 > 	if (the logical value of unknown_nmi_panic is unchanged)
 > 		return 0;
The !! idiom has the advantage of making it crystal clear
that the author is comparing boolean-normalised values.
The code I commented on was unusual enough that I couldn't
ignore the possibility of a bug.
In this case, I'd prefer the !! idiom, or moving the test
into the two state-changing code snippets below it.
/Mikael
^ permalink raw reply [flat|nested] 15+ messages in thread
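To make the point about boolean normalisation concrete, here is a small user-space sketch comparing the three ways of writing the test. The function names are hypothetical and exist only for this illustration; none of them appear in the patch.

```c
#include <assert.h>

/* '!!' maps 0 to 0 and every other value to 1. */
static int as_bool(int v)
{
	return !!v;
}

/* The variant suggested in this thread. */
static int unchanged_double_not(int old_state, int new_state)
{
	return !!new_state == !!old_state;
}

/* The form used in the original patch; same result, single negation. */
static int unchanged_single_not(int old_state, int new_state)
{
	return !old_state == !new_state;
}

/* A plain comparison, shown only for contrast: it treats 1 and 2
 * as different states even though both mean "enabled". */
static int unchanged_plain(int old_state, int new_state)
{
	return old_state == new_state;
}
```

For `old_state = 1` and a newly written value of 2, both negation forms report "unchanged" while the plain comparison reports a change. That is the case the normalisation guards against when the sysctl accepts any integer.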
* [PATCH] typo in drivers/usb/class/usblp.c
2004-05-24 9:21 [PATCH] NMI trigger switch support for debugging AKIYAMA Nobuyuki
2004-05-24 9:34 ` Andrew Morton
@ 2004-05-24 10:01 ` Benoît Dejean
1 sibling, 0 replies; 15+ messages in thread
From: Benoît Dejean @ 2004-05-24 10:01 UTC (permalink / raw)
Cc: linux-kernel
i think there's a typo error in usblp.c
patch against 2.6.6
--- linux-2.6.6/drivers/usb/class/usblp.c	2004-04-04 05:36:26.000000000 +0200
+++ linux-2.6.6-modified/drivers/usb/class/usblp.c	2004-05-24 01:15:20.000000000 +0200
@@ -305,7 +305,7 @@
 	if (~status & LP_PERRORP)
 		newerr = 3;
-	if (status & LP_POUTPA)
+	if (~status & LP_POUTPA)
 		newerr = 1;
 	if (~status & LP_PSELECD)
 		newerr = 2;
-- 
Benoît Dejean
JID: TazForEver@jabber.org
http://gdesklets.gnomedesktop.org
http://www.paulla.asso.fr
^ permalink raw reply [flat|nested] 15+ messages in thread
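The one-character change in the patch above flips the polarity of a bit test. The sketch below shows the difference between the two forms using a hypothetical mask `DEMO_BIT`; it deliberately does not use the real `LP_*` constants, because which polarity is right for `LP_POUTPA` depends on how the printer status byte is defined, and the message does not say.

```c
#include <assert.h>

/* Hypothetical status bit, chosen only for this illustration. */
#define DEMO_BIT 0x20

/* True when the bit is set in status (active-high test). */
static int bit_is_set(unsigned char status)
{
	return (status & DEMO_BIT) != 0;
}

/* True when the bit is clear in status (active-low test).
 * ~status inverts every bit, so the mask then selects the
 * inverted value of the bit of interest. */
static int bit_is_clear(unsigned char status)
{
	return (~status & DEMO_BIT) != 0;
}
```

For any status value exactly one of the two tests is true, so changing `status & MASK` into `~status & MASK` inverts when the branch is taken rather than correcting a spelling slip.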
* Re: [PATCH] NMI trigger switch support for debugging
[not found] ` <1ZWEt-6Uw-1@gated-at.bofh.it>
@ 2004-05-26 11:23 ` Andi Kleen
0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2004-05-26 11:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: akiyama.nobuyuk, linux-kernel
Andrew Morton <akpm@osdl.org> writes:
> AKIYAMA Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com> wrote:
>>
>> Sorry, I resend document and patch.
>
> Great, thanks. Updates to Documentation/kernel-parameters.txt and
> Documentation/filesystems/proc.txt would be nice.
>
> If the machine locks up with interrupts enabled we can use sysrq-T and
> sysrq-P. If it locks up with interrupts disabled the NMI watchdog will
> automatically produce the same info as your patch. So what advantage does
> the patch add?
His patch will still work e.g. if the interrupt locks are messed up.
Then the keyboard interrupt will not work anymore, but NMI will.
Arguably a bit obscure, but could happen.
The bigger advantage I see from the patch (and why i like it) is that
distributions often disable sysrq by default for security reasons.
This is not really needed for this NMI oopser, since you can assume
that someone with access to the NMI switch can crash the machine
at will.
-Andi
^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages
2004-05-24 9:21 [PATCH] NMI trigger switch support for debugging AKIYAMA Nobuyuki
2004-05-24 9:34 ` Andrew Morton
2004-05-26 1:36 ` AKIYAMA Nobuyuki
2004-05-26 1:41 ` Andrew Morton
2004-05-26 2:28 ` AKIYAMA Nobuyuki
2004-05-26 2:37 ` Andrew Morton
2004-05-26 12:17 ` AKIYAMA Nobuyuki
2004-05-26 12:52 ` Maciej W. Rozycki
2004-05-27 8:43 ` AKIYAMA Nobuyuki
2004-05-26 9:43 ` Mikael Pettersson
2004-05-26 13:42 ` AKIYAMA Nobuyuki
2004-05-26 20:34 ` Paul Jackson
2004-05-26 21:04 ` Mikael Pettersson
2004-05-24 10:01 ` [PATCH] typo in drivers/usb/class/usblp.c Benoît Dejean
[not found] <1ZjWz-sq-11@gated-at.bofh.it>
[not found] ` <1Zk6a-z2-3@gated-at.bofh.it>
[not found] ` <1ZVyO-63D-25@gated-at.bofh.it>
[not found] ` <1ZVIq-69O-3@gated-at.bofh.it>
[not found] ` <1ZWuM-6Nz-1@gated-at.bofh.it>
[not found] ` <1ZWEt-6Uw-1@gated-at.bofh.it>
2004-05-26 11:23 ` [PATCH] NMI trigger switch support for debugging Andi Kleen