* 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
@ 2002-02-07 23:18 Alex Riesen
2002-02-07 23:36 ` Dave Jones
0 siblings, 1 reply; 9+ messages in thread
From: Alex Riesen @ 2002-02-07 23:18 UTC
To: linux-kernel
The machine froze while compiling galeon (1.1.2, 778 files in ~50 MB).
xmms was also playing something (alsa-0.5.12, Ensoniq AudioPCI ES1371),
and there was some ssh traffic (slow, NIC: Digital Equipment Corporation DECchip 21142/43)
and NFS traffic (kernel automounter). XFree86 4.2.0, USB devices (a mouse, for example).
Low static electricity.
It looks really bad :(
Ok, continuing...
Alt-SysRq-B rebooted the machine, and the sync also seems to have worked:
Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
Feb 7 23:46:07 steel kernel: <6>SysRq : Emergency Sync
Feb 7 23:46:07 steel kernel: Syncing device 03:02 ... OK
I pressed SysRq-S many times; at those moments the sound played for a second,
two or three times.
No serial console output, sorry; I thought the system had become stable.
I had booted 2.5.4-pre1 before and recovered my home reiserfs (--rebuild-tree)
from the mess it left. Then I rebooted into 2.4.18-pre8-K2 and got the panic.
-alex
P.S. No nasty suspicions about the processor, please. No funds reserved
for a new one :)
PIII-700, ASUS CUV4X (VIA KT133), <512Mb
ver_linux:
Linux steel 2.4.18-pre8-K2 #2 Thu Feb 7 00:02:26 CET 2002 i686 unknown
Gnu C 2.95.3
Gnu make 3.79.1
binutils 2.11.2
util-linux 2.11n
mount 2.11n
modutils 2.4.12
e2fsprogs 1.23
reiserfsprogs 3.x.0j
Linux C Library 2.2.4
Dynamic linker (ldd) 2.2.4
Procps 2.0.7
Console-tools 0.3.3
Sh-utils 2.0
Modules Loaded nfs lockd sunrpc ide-cd cdrom snd-seq-midi snd-seq-midi-event snd-seq snd-card-ens1371 snd-ens1371 snd-pcm snd-timer snd-rawmidi snd-seq-device snd-ac97-codec snd-mixer snd soundcore autofs4 tulip mousedev usbmouse usb-uhci usbcore input reiserfs ext3 jbd nls_iso8859-1 nls_cp437 vfat fat
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-07 23:18 2.4.18-pre8-K2: Kernel panic: CPU context corrupt Alex Riesen
@ 2002-02-07 23:36 ` Dave Jones
2002-02-08 22:13 ` Alex Riesen
2002-02-09 22:23 ` Pavel Machek
0 siblings, 2 replies; 9+ messages in thread
From: Dave Jones @ 2002-02-07 23:36 UTC
To: Alex Riesen; +Cc: linux-kernel
On Fri, Feb 08, 2002 at 12:18:31AM +0100, Alex Riesen wrote:
> Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
Machine checks are indicative of hardware fault.
Overclocking, inadequate cooling and bad memory are the usual causes.
> P.S. No nasty suspicions about the processor, please. No funds reserved
> for a new one :)
The truth hurts 8(
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-07 23:36 ` Dave Jones
@ 2002-02-08 22:13 ` Alex Riesen
2002-02-09 22:23 ` Pavel Machek
1 sibling, 0 replies; 9+ messages in thread
From: Alex Riesen @ 2002-02-08 22:13 UTC
To: linux-kernel
On Fri, Feb 08, 2002 at 12:36:53AM +0100, Dave Jones wrote:
> On Fri, Feb 08, 2002 at 12:18:31AM +0100, Alex Riesen wrote:
>
> > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
>
> Machine checks are indicative of hardware fault.
> Overclocking, inadequate cooling and bad memory are the usual causes.
No overclocking, memtest passed (1 pass, 1 hour), stock Intel cooler.
Space radiation, maybe 8)
> > P.S. No nasty suspicions about the processor, please. No funds reserved
> > for a new one :)
>
> The truth hurts 8(
oh dear...
>
> --
> | Dave Jones. http://www.codemonkey.org.uk
> | SuSE Labs
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-07 23:36 ` Dave Jones
2002-02-08 22:13 ` Alex Riesen
@ 2002-02-09 22:23 ` Pavel Machek
2002-02-10 21:33 ` Dave Jones
2002-02-11 11:59 ` Alex Riesen
1 sibling, 2 replies; 9+ messages in thread
From: Pavel Machek @ 2002-02-09 22:23 UTC
To: Dave Jones, Alex Riesen, linux-kernel
Hi!
> > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
>
> Machine checks are indicative of hardware fault.
> Overclocking, inadequate cooling and bad memory are the usual
> causes.
Maybe you should print something like
Machine Check Exception: .... (hardware problem!)
so that we get less reports like this?
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-09 22:23 ` Pavel Machek
@ 2002-02-10 21:33 ` Dave Jones
2002-02-11 1:03 ` Alan Cox
2002-02-11 11:59 ` Alex Riesen
1 sibling, 1 reply; 9+ messages in thread
From: Dave Jones @ 2002-02-10 21:33 UTC
To: Pavel Machek; +Cc: Alex Riesen, linux-kernel
On Sat, 9 Feb 2002, Pavel Machek wrote:
> > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> > Machine checks are indicative of hardware fault.
> > Overclocking, inadequate cooling and bad memory are the usual
> > causes.
> Maybe you should print something like
> Machine Check Exception: .... (hardware problem!)
> so that we get less reports like this?
When I get around to finishing the diagnosis tool, I'll add
something like "Feed to decodemca for more info".
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-10 21:33 ` Dave Jones
@ 2002-02-11 1:03 ` Alan Cox
0 siblings, 0 replies; 9+ messages in thread
From: Alan Cox @ 2002-02-11 1:03 UTC
To: Dave Jones; +Cc: Pavel Machek, Alex Riesen, linux-kernel
> > Maybe you should print something like
> > Machine Check Exception: .... (hardware problem!)
> > so that we get less reports like this?
>
> When I get around to finishing the diagnosis tool, I'll add
> something like "Feed to decodemca for more info".
For a lot of processors the MCE values are not documented. Strangely for
once Intel are the good guys and AMD seem to be sitting on the docs.
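
For Intel processors, the first logged value ("Machine Check Exception: 0000000000000004") can be read against the documented global machine-check status register. The sketch below assumes that this value is IA32_MCG_STATUS and that only its three low architectural bits matter here; the function and table names are made up for illustration and do not come from any existing tool.

```python
# Sketch only: decode the low bits of IA32_MCG_STATUS as documented by Intel.
# Assumption: the value printed after "Machine Check Exception:" is that register.

MCG_STATUS_BITS = {
    0: "RIPV",  # restart instruction pointer valid
    1: "EIPV",  # error instruction pointer valid
    2: "MCIP",  # machine check in progress
}

def decode_mcg_status(value):
    """Return the names of the architectural flags set in value."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items())
            if (value >> bit) & 1]

flags = decode_mcg_status(0x0000000000000004)
print(flags)
```

Read this way, only MCIP is set. RIPV is clear, which would mean the interrupted program cannot be reliably restarted.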
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-09 22:23 ` Pavel Machek
2002-02-10 21:33 ` Dave Jones
@ 2002-02-11 11:59 ` Alex Riesen
2002-02-11 12:52 ` Pavel Machek
1 sibling, 1 reply; 9+ messages in thread
From: Alex Riesen @ 2002-02-11 11:59 UTC
To: Pavel Machek; +Cc: linux-kernel
I can well understand that it is a hardware problem.
But if nobody seems to be interested in reports like this,
why dump them out? Just save what we can and hang silently,
but no reports, they're boring 8-]
What does the "Bank 4: b200000000040151" mean?
If that is memory, can anyone help to find out which slot it is?
(memtest86 hasn't found anything, btw; I doubt that counts)
-alex
P.S. If someone is going to change the message about machine checks,
could you please avoid lame descriptions like "(hardware problem!)"?
I'm sure the majority are experienced enough to understand what the
words "Machine Check" mean.
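
A rough way to approach the "Bank 4" question is to split the value into the fields of a per-bank status register. The sketch below assumes the value is an IA32_MCi_STATUS word laid out as in Intel's published machine-check architecture, and that the low 16 bits use the "cache hierarchy" compound code form (0000 0001 RRRR TTLL). Whether that layout applies unchanged to this particular CPU is an assumption, and the function names are hypothetical.

```python
# Sketch only: split an IA32_MCi_STATUS value into architectural fields.
# Assumption: bit positions follow Intel's documented machine-check layout.

STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the bank's MISC register holds extra information
    58: "ADDRV",  # the bank's ADDR register holds an address
    57: "PCC",    # processor context corrupt
}

REQUESTS = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TRANSACTIONS = ["I", "D", "G", "reserved"]  # instruction, data, generic
LEVELS = ["L0", "L1", "L2", "LG"]           # cache level, LG = generic

def decode_bank_status(status):
    """Return the set flags and the two 16-bit error code fields."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }

def decode_cache_error(code):
    """Decode a 'cache hierarchy' compound code, or return None if it is not one."""
    if code >> 8 != 0x01:
        return None
    rrrr = (code >> 4) & 0xF
    return {
        "request": REQUESTS[rrrr] if rrrr < len(REQUESTS) else "reserved",
        "transaction": TRANSACTIONS[(code >> 2) & 0x3],
        "level": LEVELS[code & 0x3],
    }

result = decode_bank_status(0xb200000000040151)
print(result["flags"], hex(result["mca_error_code"]))
print(decode_cache_error(result["mca_error_code"]))
```

Under these assumptions the flags are VAL, UC, EN and PCC, and PCC matches the "CPU context corrupt" panic text. ADDRV is clear, so no address was logged and the value by itself cannot name a memory slot. The error code 0x0151 would read as an instruction fetch error in the L1 cache rather than a memory error, but that reading should be checked against the processor's own documentation.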
On Sat, Feb 09, 2002 at 11:23:58PM +0100, Pavel Machek wrote:
> Hi!
>
> > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> >
> > Machine checks are indicative of hardware fault.
> > Overclocking, inadequate cooling and bad memory are the usual
> > causes.
>
> Maybe you should print something like
>
> Machine Check Exception: .... (hardware problem!)
>
> so that we get less reports like this?
> Pavel
> --
> (about SSSCA) "I don't say this lightly. However, I really think that the U.S.
> no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
2002-02-11 11:59 ` Alex Riesen
@ 2002-02-11 12:52 ` Pavel Machek
0 siblings, 0 replies; 9+ messages in thread
From: Pavel Machek @ 2002-02-11 12:52 UTC
To: Alex Riesen; +Cc: linux-kernel
Hi!
> What does the "Bank 4: b200000000040151" mean?
> If that is memory, can anyone help to find out which slot it is?
> (memtest86 hasn't found anything, btw; I doubt that counts)
> -alex
>
> P.S. If someone is going to change the message about machine checks,
> could you please avoid lame descriptions like "(hardware problem!)"?
> I'm sure the majority are experienced enough to understand what the
> words "Machine Check" mean.
Ugh? If you understand that it's a hardware problem, why did you bother
contacting l-k? l-k is certainly not interested in debugging hardware
problems....
...and... It is not exactly easy to see that "Machine Check" means
a hardware problem...
Pavel
> > > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> > >
> > > Machine checks are indicative of hardware fault.
> > > Overclocking, inadequate cooling and bad memory are the usual
> > > causes.
> >
> > Maybe you should print something like
> >
> > Machine Check Exception: .... (hardware problem!)
> >
> > so that we get less reports like this?
--
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.18-pre8-K2: Kernel panic: CPU context corrupt
@ 2002-02-08 22:44 Dieter Nützel
0 siblings, 0 replies; 9+ messages in thread
From: Dieter Nützel @ 2002-02-08 22:44 UTC
To: Alex Riesen; +Cc: Dave Jones, Linux Kernel List
On Friday, February 08, 2002 at 22:18 +0100, Alex Riesen wrote:
> On Fri, Feb 08, 2002 at 12:36:53AM +0100, Dave Jones wrote:
> > On Fri, Feb 08, 2002 at 12:18:31AM +0100, Alex Riesen wrote:
> >
> > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> >
> > Machine checks are indicative of hardware fault.
> > Overclocking, inadequate cooling and bad memory are the usual causes.
>
> No overclocking, memtest passed (1 pass, 1 hour), stock Intel cooler.
> Space radiation, maybe 8)
We run it overnight in our lab, to be sure...
Good luck!
-Dieter
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel@hamburg.de