* NMI is broken!
@ 2001-11-30 9:10 george anzinger
2001-12-01 7:11 ` george anzinger
0 siblings, 1 reply; 2+ messages in thread
From: george anzinger @ 2001-11-30 9:10 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, Ingo Molnar
It appears that the NMI generation set up code is NOT correctly setting
up the system to generate NMIs if the interrupt system is off! Kernel
version 2.4.2 NMI worked correctly, 2.4.7 and later do not. I have not
tested kernels 2.4.3 thru 2.4.6. If the interrupt system is on, NMIs
are being correctly generated.
IMHO the check_nmi_watchdog() test is flawed in that it explicitly
enables interrupts prior to the test. Below is a patch to fix this
issue. When this patch is applied to the 2.4.2 kernel, the test works
correctly, however, 2.4.7 and above all report "NMI appears to be
stuck". (Note, the check_nmi_watchdog() code is in io_apic.c in the
2.4.2 kernel and is called "nmi_irq_works()".)
Note that this is not just a testing issue, NMI is no longer pulling the
system out of deadlocks with any of the irq spinlocks.
I am afraid that I don't know enough or have the right documentation to
figure out what changed, but it appears that the change was about the
same time that the nmi.c module was introduced.
Here is the recommended patch for nmi.c:
--- linux-2.4.13-org/arch/i386/kernel/nmi.c Tue Sep 25 00:34:57 2001
+++ linux/arch/i386/kernel/nmi.c Fri Nov 30 00:31:03 2001
@@ -46,28 +54,30 @@
int __init check_nmi_watchdog (void)
{
irq_cpustat_t tmp[NR_CPUS];
- int j, cpu;
+ int j, cpu, err = 0;
printk(KERN_INFO "testing NMI watchdog ... ");
memcpy(tmp, irq_stat, sizeof(tmp));
- sti();
+ cli();
mdelay((10*1000)/nmi_hz); // wait 10 ticks
for (j = 0; j < smp_num_cpus; j++) {
cpu = cpu_logical_map(j);
if (nmi_count(cpu) - tmp[cpu].__nmi_count <= 5) {
printk("CPU#%d: NMI appears to be stuck!\n", cpu);
- return -1;
+ err = -1;
}
}
- printk("OK.\n");
+ if (! err ){
+ printk("OK.\n");
+ }
/* now that we know it works we can reduce NMI frequency to
something more reasonable; makes a difference in some configs */
if (nmi_watchdog == NMI_LOCAL_APIC)
nmi_hz = 1;
-
+ sti();
return 0;
}
--
George george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: NMI is broken!
2001-11-30 9:10 NMI is broken! george anzinger
@ 2001-12-01 7:11 ` george anzinger
0 siblings, 0 replies; 2+ messages in thread
From: george anzinger @ 2001-12-01 7:11 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, Ingo Molnar
george anzinger wrote:
>
> It appears that the NMI generation set up code is NOT correctly setting
> up the system to generate NMIs if the interrupt system is off! Kernel
> version 2.4.2 NMI worked correctly, 2.4.7 and later do not. I have not
> tested kernels 2.4.3 thru 2.4.6. If the interrupt system is on, NMIs
> are being correctly generated.
Oops! I did it! It appears that PIT interrupt requests are somehow
involved in the NMI generation. Since the high-res-timers patch
programs the PIT to interrupt once at the end of each PIT interrupt, it
stops interrupting if the interrupt system is off. At least this
appears to be the problem. Now if someone could explain just how this
is done, or at least point me to the the appropriate documentation...
So, sorry for the noise, but still, I think the patch below is a good
thing(tm).
George
>
> IMHO the check_nmi_watchdog() test is flawed in that it explicitly
> enables interrupts prior to the test. Below is a patch to fix this
> issue. When this patch is applied to the 2.4.2 kernel, the test works
> correctly, however, 2.4.7 and above all report "NMI appears to be
> stuck". (Note, the check_nmi_watchdog() code is in io_apic.c in the
> 2.4.2 kernel and is called "nmi_irq_works()".)
>
> Note that this is not just a testing issue, NMI is no longer pulling the
> system out of deadlocks with any of the irq spinlocks.
>
> I am afraid that I don't know enough or have the right documentation to
> figure out what changed, but it appears that the change was about the
> same time that the nmi.c module was introduced.
>
> Here is the recommended patch for nmi.c:
>
> --- linux-2.4.13-org/arch/i386/kernel/nmi.c Tue Sep 25 00:34:57 2001
> +++ linux/arch/i386/kernel/nmi.c Fri Nov 30 00:31:03 2001
> @@ -46,28 +54,30 @@
> int __init check_nmi_watchdog (void)
> {
> irq_cpustat_t tmp[NR_CPUS];
> - int j, cpu;
> + int j, cpu, err = 0;
>
> printk(KERN_INFO "testing NMI watchdog ... ");
>
> memcpy(tmp, irq_stat, sizeof(tmp));
> - sti();
> + cli();
> mdelay((10*1000)/nmi_hz); // wait 10 ticks
>
> for (j = 0; j < smp_num_cpus; j++) {
> cpu = cpu_logical_map(j);
> if (nmi_count(cpu) - tmp[cpu].__nmi_count <= 5) {
> printk("CPU#%d: NMI appears to be stuck!\n", cpu);
> - return -1;
> + err = -1;
> }
> }
> - printk("OK.\n");
> + if (! err ){
> + printk("OK.\n");
> + }
>
> /* now that we know it works we can reduce NMI frequency to
> something more reasonable; makes a difference in some configs */
> if (nmi_watchdog == NMI_LOCAL_APIC)
> nmi_hz = 1;
> -
> + sti();
> return 0;
> }
>
>
> --
> George george@mvista.com
> High-res-timers: http://sourceforge.net/projects/high-res-timers/
> Real time sched: http://sourceforge.net/projects/rtsched/
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
George george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2001-12-01 7:12 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-11-30 9:10 NMI is broken! george anzinger
2001-12-01 7:11 ` george anzinger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox