* Re: [PATCH 2.5] report unknown NMI reasons only once
[not found] ` <20030420052007$742b@gated-at.bofh.it>
@ 2003-04-20 14:09 ` Pascal Schmidt
2003-04-20 14:32 ` John Bradford
0 siblings, 1 reply; 7+ messages in thread
From: Pascal Schmidt @ 2003-04-20 14:09 UTC (permalink / raw)
To: linux-kernel; +Cc: Zwane Mwaikambo
On Sun, 20 Apr 2003 07:20:07 +0200, you wrote:
>> Beats me as to what could be wrong. It's not a memory problem and the
>> CPU does not overheat.
>>
>> I'll go patch the kernel for my personal use then, but I'm not the only
>> one seeing those messages without any system problems.
>
> It's all fun and games until NMIs turn into MCEs...
I have the MCE polling stuff enabled and will keep an eye on it. So far
I suspect flaky motherboard design (Athlon XPs didn't even exist when
this piece of hardware was designed).
It's definitely CPU-related since it never happened with the Duron
processor that I used before.
--
Ciao,
Pascal
* Re: [PATCH 2.5] report unknown NMI reasons only once
2003-04-20 14:09 ` [PATCH 2.5] report unknown NMI reasons only once Pascal Schmidt
@ 2003-04-20 14:32 ` John Bradford
2003-04-20 14:58 ` Pascal Schmidt
0 siblings, 1 reply; 7+ messages in thread
From: John Bradford @ 2003-04-20 14:32 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: linux-kernel, Zwane Mwaikambo
> >> Beats me as to what could be wrong. It's not a memory problem and the
> >> CPU does not overheat.
> >>
> >> I'll go patch the kernel for my personal use then, but I'm not the only
> >> one seeing those messages without any system problems.
> >
> > It's all fun and games until NMIs turn into MCEs...
>
> I have the MCE polling stuff enabled and will keep an eye on it. So far
> I suspect flaky motherboard design (Athlon XPs didn't even exist when
> this piece of hardware was designed).
>
> It's definitely CPU-related since it never happened with the Duron
> processor that I used before.
Are you sure that the CPU voltage is correct?
John.
* Re: [PATCH 2.5] report unknown NMI reasons only once
2003-04-20 14:32 ` John Bradford
@ 2003-04-20 14:58 ` Pascal Schmidt
0 siblings, 0 replies; 7+ messages in thread
From: Pascal Schmidt @ 2003-04-20 14:58 UTC (permalink / raw)
To: John Bradford; +Cc: linux-kernel
On Sun, 20 Apr 2003, John Bradford wrote:
> > It's definitely CPU-related since it never happened with the Duron
> > processor that I used before.
>
> Are you sure that the CPU voltage is correct?
The motherboard is in autodetect mode...
According to AMD's docs, my CPU wants 1.50V.
According to the BIOS, what it gets is 1.48V.
AMD does not specify a tolerance for Vcore, but I can try to manually
set the motherboard for 1.50V and see if the NMIs disappear.
--
Ciao,
Pascal
[parent not found: <20030419183013$0e6c@gated-at.bofh.it>]
* [PATCH 2.5] report unknown NMI reasons only once
@ 2003-04-19 18:21 Pascal Schmidt
2003-04-19 21:59 ` Alan Cox
0 siblings, 1 reply; 7+ messages in thread
From: Pascal Schmidt @ 2003-04-19 18:21 UTC (permalink / raw)
To: linux-kernel; +Cc: Alan Cox
Hi people!
On my machine (Athlon XP on an old ALi Magik1 motherboard), I get tons of
messages like:
Uhhuh. NMI received for unknown reason 2d on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
There's nothing I can do about them (ACPI or APIC support in the kernel or
not does not make a difference, nor do any of the BIOS power management
settings). The machine runs fine, memtest86 shows no problems, and I've not
noticed a problem with the system in the three or four months that I've
been running this configuration.
Those NMIs happen only rarely when the machine is lightly loaded, but
under load, I get several of them per second. This quickly makes
/var/log/messages grow.
I don't think reporting any of those NMIs more than once provides
valuable information, so I've cooked up a patch which only reports each
unknown NMI reason once.
I don't know who the maintainer of arch/i386/kernel/traps.c is supposed
to be, so I'm sending this to the list and CC to Alan since he is not
unlikely to take small patches like this. ;)
Patch against 2.5.67-bk10. Compiles, boots and does what it's supposed to
do on my machine.
Comments, anyone?
--- linux-2.5.67-bk10/arch/i386/kernel/traps.c	Sat Apr 19 20:00:42 2003
+++ work/arch/i386/kernel/traps.c	Sat Apr 19 20:17:40 2003
@@ -5,6 +5,8 @@
  *
  * Pentium III FXSR, SSE support
  * Gareth Hughes <gareth@valinux.com>, May 2000
+ * Limit unknown NMI reporting
+ * Pascal Schmidt <der.eremit@email.de>, April 2003
  */

 /*
@@ -48,6 +50,8 @@
 #include <asm/pgalloc.h>
 #include <asm/arch_hooks.h>

+#include <asm/bitops.h>
+
 #include <linux/irq.h>
 #include <linux/module.h>

@@ -417,6 +421,9 @@ static void io_check_error(unsigned char
 	outb(reason, 0x61);
 }

+/* bit field for already reported unknown NMI reasons */
+static int unknown_nmi_reported[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
 static void unknown_nmi_error(unsigned char reason, struct pt_regs * regs)
 {
 #ifdef CONFIG_MCA
@@ -427,10 +434,13 @@ static void unknown_nmi_error(unsigned c
 		return;
 	}
 #endif
-	printk("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
-		reason, smp_processor_id());
-	printk("Dazed and confused, but trying to continue\n");
-	printk("Do you have a strange power saving mode enabled?\n");
+	if ( !test_bit(reason, (void *)&unknown_nmi_reported) ) {
+		printk("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
+			reason, smp_processor_id());
+		printk("Dazed and confused, but trying to continue\n");
+		printk("Do you have a strange power saving mode enabled?\n");
+		__set_bit(reason, (void *)&unknown_nmi_reported);
+	}
 }

 static void default_do_nmi(struct pt_regs * regs)
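
For readers who want to try the report-once idea outside the kernel, here is a
minimal userspace sketch. It is not the patch itself: printf() stands in for
printk(), the cpu argument stands in for smp_processor_id(), and the names
first_report() and report_unknown_nmi() are made up for illustration. It also
sizes the bitmap in unsigned long words rather than casting an int array, which
is one possible way to avoid the (void *) casts.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define NMI_REASONS   256	/* reason is an unsigned char: 0..255 */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((NMI_REASONS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per possible reason code; a set bit means "already reported". */
static unsigned long reported[BITMAP_WORDS];

/* Hypothetical helper: true the first time a reason is seen, false after. */
static bool first_report(unsigned char reason)
{
	unsigned long mask = 1UL << (reason % BITS_PER_WORD);
	unsigned long *word = &reported[reason / BITS_PER_WORD];

	if (*word & mask)
		return false;
	*word |= mask;
	return true;
}

/*
 * Hypothetical stand-in for unknown_nmi_error(): prints the message only
 * for the first occurrence of each reason code.
 * Returns 1 if a message was printed, 0 if it was suppressed.
 */
static int report_unknown_nmi(unsigned char reason, int cpu)
{
	if (!first_report(reason))
		return 0;

	printf("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n",
	       reason, cpu);
	printf("Dazed and confused, but trying to continue\n");
	printf("Do you have a strange power saving mode enabled?\n");
	return 1;
}
```

Calling report_unknown_nmi(0x2d, 0) twice prints the message only on the first
call; a different reason code is still reported once. Like the patch, the
sketch does no locking and the bitmap is never cleared.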
--
Ciao,
Pascal
* Re: [PATCH 2.5] report unknown NMI reasons only once
2003-04-19 18:21 Pascal Schmidt
@ 2003-04-19 21:59 ` Alan Cox
0 siblings, 0 replies; 7+ messages in thread
From: Alan Cox @ 2003-04-19 21:59 UTC (permalink / raw)
To: Pascal Schmidt; +Cc: Linux Kernel Mailing List
On Sat, 2003-04-19 at 19:21, Pascal Schmidt wrote:
> Those NMIs happen only rarely when the machine is lightly loaded, but
> under load, I get several of them per second. This quickly makes
> /var/log/messages grow.
I guess they are overheat traps then
> I don't think reporting any of those NMIs more than once provides
> valuable information, so I've cooked up a patch which only reports each
> unknown NMI reason once.
It's sitting there saying "Something is wrong" "Something is still
wrong". By all means kill it on your box, but this is not good for
general consumption.
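
The objection above is that a report-once filter hides the "still wrong" signal.
One possible middle ground, which nobody in this thread proposed, is to rate
limit the message and say how many events occurred since the last report. The
userspace sketch below illustrates that idea only. The struct nmi_limiter type,
the nmi_report() function, and the 60-second REPORT_INTERVAL are all assumptions
made up for the example, and the caller passes in the current time so the
behaviour is easy to check.

```c
#include <stdio.h>

#define REPORT_INTERVAL 60	/* seconds between reports; arbitrary choice */

/* Hypothetical per-source limiter state. */
struct nmi_limiter {
	long last_report;		/* time of the last printed report */
	unsigned long suppressed;	/* events swallowed since then */
	int reported_once;		/* 0 until the first report */
};

/*
 * Record one unknown-NMI event at time `now` (in seconds).
 * Prints at most one line per REPORT_INTERVAL, including how many events
 * that line stands for. Returns that count when a line is printed and
 * 0 when the event is only counted.
 */
static unsigned long nmi_report(struct nmi_limiter *l,
				unsigned char reason, long now)
{
	unsigned long n;

	if (l->reported_once && now - l->last_report < REPORT_INTERVAL) {
		l->suppressed++;
		return 0;
	}

	n = l->suppressed + 1;
	printf("NMI received for unknown reason %02x (%lu event(s) since last report)\n",
	       reason, n);
	l->last_report = now;
	l->suppressed = 0;
	l->reported_once = 1;
	return n;
}
```

With a zero-initialised limiter, events at t=100, 101 and 159 produce one line
at t=100, and the next event at t=160 prints a line covering three events. The
log stays small but still shows that the condition persists.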