* Pls help me understand this MCE
@ 2005-08-11 13:49 David King
2005-08-11 14:10 ` Petr Vandrovec
0 siblings, 1 reply; 3+ messages in thread
From: David King @ 2005-08-11 13:49 UTC
To: linux-kernel
I'm new at this so I'm learning my way through it and I'd appreciate any
guidance. My system freezes solid intermittently for no apparent
reason. The serial console shows a kernel panic caused by a machine
check exception. mcelog decodes the MCE as follows:
CPU 0 4 northbridge TSC f03d1e587b
Northbridge Watchdog error
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'generic participation, request timed out
generic error mem transaction
generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
That's all meaningless to me so I'm looking for help understanding what
it means and what parts of my system I should be looking at in order to
try to resolve this MCE.
A Google search found one hit that suggested that "Something tried to
access a physical memory address that was not mapped in the CPU." If
that is indeed the correct interpretation, is there any way to figure
out what that "something" is?
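As a rough aid to reading the mcelog output above, the STATUS word can be split into its bit fields by hand. The Python sketch below assumes the architectural x86 machine-check status layout (flag bits 63 down to 57, error code in bits 15:0) and the bus-error code pattern 0000 1PPT RRRR IILL. Treating bits 19:16 as an extended error code is an assumption about this northbridge bank, and the function and table names are hypothetical, chosen only for illustration.

```python
# Sketch: split an x86 machine-check status word into its fields.
# All names here are illustrative; they do not come from mcelog itself.

STATUS = 0xB200000000070F0F  # value reported by mcelog in this message

# Architectural flag bits at the top of the status register.
FLAG_BITS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (error uncorrected)",
    60: "EN (reporting enabled for this error)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# Sub-fields of a bus error code: 0000 1PPT RRRR IILL.
PARTICIPATION = ["local node origin", "local node response",
                 "local node observed", "generic participation"]
MEM_IO = ["memory", "reserved", "I/O", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "generic"]


def decode_status(status):
    """Return the flag names and error-code fields found in `status`."""
    flags = [name for bit, name in sorted(FLAG_BITS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    fields = {
        "flags": flags,
        "error_code": code,
        # Assumption: bits 19:16 carry the bank-specific extended code.
        "ext_code": (status >> 16) & 0xF,
    }
    if code & 0xF800 == 0x0800:  # matches the bus-error pattern
        fields["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 3],
            "timed_out": bool((code >> 8) & 1),
            "request": (code >> 4) & 0xF,  # RRRR, 0 = generic error
            "mem_io": MEM_IO[(code >> 2) & 3],
            "level": LEVEL[code & 3],
        }
    return fields


if __name__ == "__main__":
    for key, value in decode_status(STATUS).items():
        print(key, value)
```

For b200000000070f0f this gives VAL, UC, EN and PCC set, extended code 7, and a timed-out generic bus request, which agrees with what mcelog printed. ADDRV is clear, so the status word by itself does not record a faulting address.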
The system in question is a no-name system built from parts by a local
computer shop. It has an ASUS A8N-SLI motherboard. I have updated the
BIOS to the latest without affecting this problem. The CPU installed is
an "AMD Athlon(tm) 64 Processor 3500+". I am running Fedora Core 4,
kernel 2.6.12-1.1398_FC4.
Any guidance would be appreciated.
--
David King
dave@daveking.com
* Re: Pls help me understand this MCE
2005-08-11 13:49 Pls help me understand this MCE David King
@ 2005-08-11 14:10 ` Petr Vandrovec
2005-08-11 18:02 ` David King
0 siblings, 1 reply; 3+ messages in thread
From: Petr Vandrovec @ 2005-08-11 14:10 UTC
To: David King; +Cc: linux-kernel
David King wrote:
> CPU 0 4 northbridge TSC f03d1e587b
> Northbridge Watchdog error
> bit57 = processor context corrupt
> bit61 = error uncorrected
> bus error 'generic participation, request timed out
> generic error mem transaction
> generic access, level generic'
> STATUS b200000000070f0f MCGSTATUS 4
>
> That's all meaningless to me so I'm looking for help understanding what
> it means and what parts of my system I should be looking at in order to
> try to resolve this MCE.
>
> A Google search found one hit that suggested that "Something tried to
> access a physical memory address that was not mapped in the CPU." If
> that is indeed the correct interpretation, is there any way to figure
> out what that "something" is?
Try dumping *all* MCE values, as well as a call stack. Even though the
MCE is tagged as processor context corrupt, there is a rather big chance
that the stack trace will point back to the instruction which caused the
MCE (it always did in my case), especially on a single-processor system.
Then you'll at least know which subsystem/driver did that.
Best regards,
Petr Vandrovec
* Re: Pls help me understand this MCE
2005-08-11 14:10 ` Petr Vandrovec
@ 2005-08-11 18:02 ` David King
0 siblings, 0 replies; 3+ messages in thread
From: David King @ 2005-08-11 18:02 UTC
To: Petr Vandrovec; +Cc: linux-kernel
Petr Vandrovec wrote:
> Try dumping *all* MCE values, as well as a call stack. Even though the
> MCE is tagged as processor context corrupt, there is a rather big chance
> that the stack trace will point back to the instruction which caused the
> MCE (it always did in my case), especially on a single-processor system.
> Then you'll at least know which subsystem/driver did that.
Ok, here's everything I got from the serial console when the error
occurred. I don't have a clue how to interpret this stuff so I'd be
eternally grateful if someone out there can help. Or, if I
misunderstood what you were telling me I ought to do, then explaining
the process a bit more would be appreciated too.
CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
TSC 7cba18189a
Kernel panic - not syncing: Machine check
Call Trace: <#MC> <ffffffff8013a4b5>{panic+133}
<ffffffff80116d48>{print_mce+136}
<ffffffff80116e19>{mce_panic+137} <ffffffff801173f2>{do_machine_check+754}
<ffffffff80110147>{machine_check+127}
<ffffffff80113dec>{timer_interrupt+444}
<EOE> <IRQ> <ffffffff80146b50>{process_timeout+0}
<ffffffff801704dc>{handle_IRQ_event+44} <ffffffff801706ed>{__do_IRQ+477}
<ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
<EOI> <ffffffff8010d230>{default_idle+0} <ffffffff8010d252>{default_idle+34}
<ffffffff8010d291>{cpu_idle+49} <ffffffff8057e7e5>{start_kernel+469}
<ffffffff8057e1f4>{_sinittext+500}
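One way to read a trace like this is to split it at the <#MC>, <EOE>, <IRQ> and <EOI> markers and look at the last entry logged on the machine-check stack, which is likely the saved instruction pointer of the interrupted code. The Python sketch below does that for the trace above. The section names ("exception", "irq", "task") and the helper names are hypothetical, and treating the entry just before <EOE> as the interrupted instruction is an assumption about how this kernel prints exception stacks.

```python
import re

# The call trace from the serial console, copied as printed.
TRACE = """Call Trace: <#MC> <ffffffff8013a4b5>{panic+133}
<ffffffff80116d48>{print_mce+136}
<ffffffff80116e19>{mce_panic+137} <ffffffff801173f2>{do_machine_check+754}
<ffffffff80110147>{machine_check+127}
<ffffffff80113dec>{timer_interrupt+444}
<EOE> <IRQ> <ffffffff80146b50>{process_timeout+0}
<ffffffff801704dc>{handle_IRQ_event+44} <ffffffff801706ed>{__do_IRQ+477}
<ffffffff801120b8>{do_IRQ+72} <ffffffff8010f6c3>{ret_from_intr+0}
<EOI> <ffffffff8010d230>{default_idle+0} <ffffffff8010d252>{default_idle+34}
<ffffffff8010d291>{cpu_idle+49} <ffffffff8057e7e5>{start_kernel+469}
<ffffffff8057e1f4>{_sinittext+500}
"""

# Either a stack marker, or an "<address>{symbol+offset}" entry.
TOKEN = re.compile(
    r"<(?P<marker>#MC|EOE|IRQ|EOI)>"
    r"|<(?P<addr>[0-9a-f]{16})>\{(?P<sym>[^}+]+)\+(?P<off>\d+)\}"
)


def parse_trace(text):
    """Return (section, symbol, offset) tuples in the order printed."""
    section = "task"
    frames = []
    for m in TOKEN.finditer(text):
        marker = m.group("marker")
        if marker:
            # <#MC> and <IRQ> open a stack; <EOE> and <EOI> close it.
            section = {"#MC": "exception", "IRQ": "irq"}.get(marker, "task")
        else:
            frames.append((section, m.group("sym"), int(m.group("off"))))
    return frames


def interrupted_frame(frames):
    """Last entry on the exception stack, assumed to be the saved RIP."""
    exception = [f for f in frames if f[0] == "exception"]
    return exception[-1] if exception else None


if __name__ == "__main__":
    frames = parse_trace(TRACE)
    for section, sym, off in frames:
        print(f"{section:9s} {sym}+{off}")
    print("interrupted at:", interrupted_frame(frames))
```

Read this way, the machine check appears to have arrived while timer_interrupt was running on top of the idle loop, so this particular trace does not obviously point at a driver.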
Thanks
--
David King
dave@daveking.com