* nmi watchdog failure on dual Athlon box
From: Joerg Sommrey @ 2004-09-28 16:33 UTC (permalink / raw)
To: Linux kernel mailing list
Hello,
just tried Ingo's "lockupcli" nmi watchdog test - it fails to unlock the
box.
boot-parm:
...nmi_watchdog=2...
dmesg:
...
testing NMI watchdog ... OK.
...
/proc/interrupts:
...
NMI: 115 103
...
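One common sanity check is to sample the NMI line of /proc/interrupts twice, a few seconds apart, and confirm that every CPU's count has grown. The sketch below assumes the 2.6-era line format shown above: "NMI:" followed by one count per CPU, possibly followed by a description on later kernels. The function names and the MAX_CPUS limit are hypothetical, chosen for illustration. Rising counts only show that NMIs are arriving. As this thread shows, they do not prove that a lockup will actually be caught.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 64 /* hypothetical upper bound for this sketch */

/* Parse a /proc/interrupts line such as "NMI: 115 103" into one count
 * per CPU. Returns the number of counts parsed, or -1 if this is not
 * the NMI line. */
static int parse_nmi_line(const char *line, unsigned long counts[], int max)
{
    const char *p = line;
    int n = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "NMI:", 4) != 0)
        return -1;
    p += 4;

    while (n < max) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);
        if (end == p) /* no digits left: end of line or description text */
            break;
        counts[n++] = v;
        p = end;
    }
    return n;
}

/* Read the NMI line from /proc/interrupts. Returns the number of CPUs,
 * or -1 if the file cannot be read or has no NMI line. */
static int read_nmi_counts(unsigned long counts[], int max)
{
    char line[4096];
    int n = -1;
    FILE *f = fopen("/proc/interrupts", "r");

    if (f == NULL)
        return -1;
    while (n < 0 && fgets(line, sizeof line, f) != NULL)
        n = parse_nmi_line(line, counts, max);
    fclose(f);
    return n;
}

/* True if every CPU's NMI count grew between two samples. */
static bool all_cpus_advanced(const unsigned long before[],
                              const unsigned long after[], int ncpus)
{
    for (int i = 0; i < ncpus; i++)
        if (after[i] <= before[i])
            return false;
    return true;
}
```

To use it, call read_nmi_counts() into a "before" array, sleep(5), call it again into an "after" array, and pass both to all_cpus_advanced().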
So far everything looks fine. But after running Ingo's "lockupcli" the
box is locked (surprise!) and the NMI watchdog does not kill anything.
The system gets rebooted by the w83627hf WDT after 60 s.
System:
Tyan Tiger MPX (S2466)
2 x Athlon MP 2000+
kernel 2.6.8.1
nmi_watchdog=1 has never worked for me (except 2.6.3-mm4).
I'm not really surprised at this test as I had a couple of lockups in
the past that were never resolved by the nmi watchdog.
Any ideas?
-jo
--
-rw-r--r-- 1 jo users 63 2004-09-28 17:44 /home/jo/.signature
* Re: nmi watchdog failure on dual Athlon box
From: Maciej W. Rozycki @ 2004-09-28 17:08 UTC (permalink / raw)
To: Joerg Sommrey; +Cc: Linux kernel mailing list
On Tue, 28 Sep 2004, Joerg Sommrey wrote:
> just tried Ingo's "lockupcli" nmi watchdog test - it fails to unlock the
> box.
>
> boot-parm:
> ...nmi_watchdog=2...
The local APIC NMI watchdog has limited capabilities. It may fail to
trigger for certain lockups because there is no available event that would
happen periodically regardless of the CPU state. I can only suspect what
"lockupcli" does (where is it available from, anyway?), but if it runs
"cli; hlt", then the watchdog *will* fail.
> nmi_watchdog=1 has never worked for me (except 2.6.3-mm4).
Too bad. The I/O APIC watchdog triggers regardless of the CPU state and
works as long as the chipset is operational.
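The difference between the two watchdogs can be illustrated with a toy model. This is not kernel code. It assumes, to our understanding of the 2.6 i386 code, that on Athlon (K7) the local APIC watchdog (nmi_watchdog=2) is clocked by a performance counter counting "cycles processor is running", i.e. unhalted cycles, while the I/O APIC watchdog (nmi_watchdog=1) delivers timer ticks as NMIs. Since cli cannot mask an NMI, only HLT matters to the first source. The enum and function names are hypothetical.

```c
/* Toy model only; names are hypothetical, not from the kernel. */
enum cpu_state {
    CPU_CLI_BUSY_LOOP, /* lockupcli: interrupts off, CPU keeps executing */
    CPU_CLI_HLT        /* interrupts off, CPU halted: only an NMI wakes it */
};

/* nmi_watchdog=2 (local APIC): a performance counter counts only
 * unhalted cycles and raises an NMI every `period` counted cycles.
 * cli is irrelevant, since it does not mask NMIs. */
static unsigned long lapic_watchdog_nmis(enum cpu_state st,
                                         unsigned long cycles,
                                         unsigned long period)
{
    if (st == CPU_CLI_HLT)
        return 0; /* the counter never advances, so no NMI is raised */
    return cycles / period;
}

/* nmi_watchdog=1 (I/O APIC): each timer tick is also delivered as an
 * NMI, whatever the CPU is doing, as long as the chipset works. */
static unsigned long ioapic_watchdog_nmis(enum cpu_state st,
                                          unsigned long timer_ticks)
{
    (void)st; /* CPU state plays no part for this source */
    return timer_ticks;
}
```

In this model a busy loop with interrupts disabled still produces local APIC watchdog NMIs. Only "cli; hlt" silences that source.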
Maciej
* Re: nmi watchdog failure on dual Athlon box
From: Joerg Sommrey @ 2004-09-28 18:31 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Linux kernel mailing list
On Tue, Sep 28, 2004 at 06:08:37PM +0100, Maciej W. Rozycki wrote:
> On Tue, 28 Sep 2004, Joerg Sommrey wrote:
>
> > just tried Ingo's "lockupcli" nmi watchdog test - it fails to unlock the
> > box.
> >
> > boot-parm:
> > ...nmi_watchdog=2...
>
> The local APIC NMI watchdog has limited capabilities. It may fail to
> trigger for certain lockups because there is no available event that would
> happen periodically regardless of the CPU state. I can only suspect what
> "lockupcli" does (where is it available from, anyway?), but if it runs
> "cli; hlt", then the watchdog *will* fail.
Here's the quote from Ingo's mail:
In <2Jo20-7ry-33@gated-at.bofh.it> Ingo Molnar <mingo@elte.hu> writes:
|once the NMI watchdog is up and running it should catch all hard lockups
|and print backtraces to the serial console - even if you are within X
|while the lockup happens. You can test hard lockups by running the
|attached 'lockupcli' userspace code as root - it turns off interrupts
|and goes into an infinite loop => instant lockup. The NMI watchdog
|should notice this condition after a couple of seconds and should abort
|the task, printing a kernel trace as well. Your box should be back in
|working order after that point.
[...]
|--- lockupcli.c
|
|main ()
|{
| iopl(3);
| for (;;) asm("cli");
|}
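The "after a couple of seconds" detection can be sketched as a check run on every watchdog NMI. This assumes the 2.6-era approach of watching whether a CPU's timer-interrupt count still advances between watchdog NMIs. Under lockupcli the count freezes because cli blocks the timer interrupt, while watchdog NMIs keep arriving as long as their source is still running. The struct, field and function names are hypothetical. The kernel's own handler reports the lockup itself rather than returning a flag.

```c
#include <stdbool.h>

/* Hypothetical per-CPU detector state for this sketch. */
struct lockup_detector {
    unsigned long last_timer_irqs; /* timer IRQ count seen at the last NMI */
    unsigned int stuck_nmis;       /* consecutive NMIs with no new timer IRQ */
    unsigned int threshold;        /* e.g. five seconds' worth of NMIs */
};

/* Called on every watchdog NMI with the CPU's current timer IRQ count.
 * Returns true once, when the count has stayed unchanged for
 * `threshold` consecutive NMIs. */
static bool watchdog_nmi_tick(struct lockup_detector *d,
                              unsigned long timer_irqs)
{
    if (timer_irqs != d->last_timer_irqs) {
        /* Timer interrupts are still being serviced: the CPU is alive. */
        d->last_timer_irqs = timer_irqs;
        d->stuck_nmis = 0;
        return false;
    }
    /* No timer interrupt since the last NMI, e.g. because of cli. */
    d->stuck_nmis++;
    return d->stuck_nmis == d->threshold;
}
```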
Does this mean there is a good reason for further investigations on why
the IO-APIC NMI watchdog doesn't work? Until now I thought it would
be ok as long as the local APIC NMI watchdog is set up.
-jo
--
-rw-r--r-- 1 jo users 63 2004-09-28 18:42 /home/jo/.signature
* Re: nmi watchdog failure on dual Athlon box
From: Chris Wedgwood @ 2004-09-28 20:20 UTC (permalink / raw)
To: Joerg Sommrey, Linux kernel mailing list
On Tue, Sep 28, 2004 at 06:33:24PM +0200, Joerg Sommrey wrote:
> nmi_watchdog=1 has never worked for me (except 2.6.3-mm4).
tyan 2466? if so then i've seen this too, i think it's a mainboard
problem
--cw
* Re: nmi watchdog failure on dual Athlon box
From: Maciej W. Rozycki @ 2004-09-28 21:08 UTC (permalink / raw)
To: Joerg Sommrey; +Cc: Linux kernel mailing list
On Tue, 28 Sep 2004, Joerg Sommrey wrote:
> |--- lockupcli.c
> |
> |main ()
> |{
> | iopl(3);
> | for (;;) asm("cli");
> |}
>
> Does this mean there is a good reason for further investigations on why
> the IO-APIC NMI watchdog doesn't work? Until now I thought it would
> be ok as long as the local APIC NMI watchdog is set up.
Since this program does busy looping, the local APIC NMI watchdog should
trigger indeed. It's "cli; hlt" that causes a problem with this watchdog.
Something wrong is happening in your system, indeed.
Maciej
* Re: nmi watchdog failure on dual Athlon box
From: Joerg Sommrey @ 2004-09-29 20:27 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Linux kernel mailing list
On Tue, Sep 28, 2004 at 10:08:21PM +0100, Maciej W. Rozycki wrote:
> On Tue, 28 Sep 2004, Joerg Sommrey wrote:
>
> > |--- lockupcli.c
> > |
> > |main ()
> > |{
> > | iopl(3);
> > | for (;;) asm("cli");
> > |}
> >
> > Does this mean there is a good reason for further investigations on why
> > the IO-APIC NMI watchdog doesn't work? Until now I thought it would
> > be ok as long as the local APIC NMI watchdog is set up.
>
> Since this program does busy looping, the local APIC NMI watchdog should
> trigger indeed. It's "cli; hlt" that causes a problem with this watchdog.
> Something wrong is happening in your system, indeed.
As I stated earlier, there *seemed* to be a working IO-APIC NMI watchdog
with 2.6.3-mm4. I never checked its functionality. Now I rebuilt that
kernel and gave it a try. Though it claims to have a running IO-APIC NMI
watchdog, the lockupcli test failed. Zwane was right when he suspected the
nmi_watchdog=1 test of working erratically in that case. Sad but true: no NMI
watchdog on Tyan S2466. I wonder if it's just impossible on such a board
or if it needs some "special treatment".
-jo
--
-rw-r--r-- 1 jo users 63 2004-09-29 22:10 /home/jo/.signature
Thread overview:
2004-09-28 16:33 nmi watchdog failure on dual Athlon box Joerg Sommrey
2004-09-28 17:08 ` Maciej W. Rozycki
2004-09-28 18:31 ` Joerg Sommrey
2004-09-28 21:08 ` Maciej W. Rozycki
2004-09-29 20:27 ` Joerg Sommrey
2004-09-28 20:20 ` Chris Wedgwood