* hardware error state at cmc
@ 2003-12-08 16:05 Christian Hinkelbein
2003-12-08 18:23 ` Luck, Tony
0 siblings, 1 reply; 2+ messages in thread
From: Christian Hinkelbein @ 2003-12-08 16:05 UTC (permalink / raw)
To: linux-ia64
Hello,
could anyone give me a hint about the meaning of the following
appearing in system.log (i2000, 2.6.0-test4, uptime ~40 days):
kernel: +BEGIN HARDWARE ERROR STATE AT CMC
kernel: +Err Record ID: 37 SAL Rev: 0.02
kernel: +Time: 12/03/2003 18:56:34 Severity 258
kernel: +Processor Device Error Info Section
kernel: Processor Error Map: 0x4000
kernel: Processor State Param: 0x8000000fff611b0
kernel: Processor LID: 0x3000000
kernel: + Cache check info[0]
kernel: + Level: L0, Index: 0, Operation: Unknown,
kernel: CPUID Regs: 0x49656e69756e6547 0x6c65746e 0x0 0x7000804
kernel: +END HARDWARE ERROR STATE AT CMC
thanks
christian
* RE: hardware error state at cmc
2003-12-08 16:05 hardware error state at cmc Christian Hinkelbein
@ 2003-12-08 18:23 ` Luck, Tony
0 siblings, 0 replies; 2+ messages in thread
From: Luck, Tony @ 2003-12-08 18:23 UTC (permalink / raw)
To: linux-ia64
> Hello,
> could anyone give me a hint about the meaning of the following
> appearing in system.log (i2000, 2.6.0-test4, uptime ~40 days):
>
> kernel: +BEGIN HARDWARE ERROR STATE AT CMC
> kernel: +Err Record ID: 37 SAL Rev: 0.02
> kernel: +Time: 12/03/2003 18:56:34 Severity 258
> kernel: +Processor Device Error Info Section
> kernel: Processor Error Map: 0x4000
> kernel: Processor State Param: 0x8000000fff611b0
> kernel: Processor LID: 0x3000000
> kernel: + Cache check info[0]
> kernel: + Level: L0, Index: 0, Operation: Unknown,
> kernel: CPUID Regs: 0x49656e69756e6547 0x6c65746e 0x0 0x7000804
> kernel: +END HARDWARE ERROR STATE AT CMC
One of your processors had a correctable error in its cache. The
CPU fixed it, but interrupted the OS to tell you that it happened.
The "Processor LID" field should tell you which CPU had the error
(it should match the "cr.lid" value of one of your CPUs). This is
probably the 37th error since system reset (Error Record ID is
37). You might want to check your logs to see what kinds of errors
were reported for the previous 36 to see if there is any sort
of pattern (which may indicate real hardware problems). If the
errors are of different types, and reported by different processors,
then you may just be seeing stray neutrons flipping bits as they
pass through.
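As a rough illustration of that kind of pattern check, here is a minimal Python sketch that groups CMC records by reporting processor and by check type. It assumes the log lines look like the excerpt quoted above; the function names are made up for this example, and the split of the LID value into id (bits 31:24) and eid (bits 23:16) is my reading of the cr.lid layout, so verify it against the architecture manual before relying on it.

```python
import re
from collections import Counter

BEGIN = "+BEGIN HARDWARE ERROR STATE AT CMC"
END = "+END HARDWARE ERROR STATE AT CMC"


def decode_lid(lid):
    """Split a LID value into (id, eid).

    Assumption: id is bits 31:24 and eid is bits 23:16 of cr.lid.
    """
    return (lid >> 24) & 0xFF, (lid >> 16) & 0xFF


def parse_cmc_records(lines):
    """Collect one dict per BEGIN/END block found in the log lines."""
    records = []
    current = None
    for line in lines:
        if BEGIN in line:
            current = {}
            continue
        if current is None:
            continue  # not inside a CMC record
        if END in line:
            records.append(current)
            current = None
            continue
        m = re.search(r"Err Record ID:\s*(\d+)", line)
        if m:
            current["record_id"] = int(m.group(1))
        m = re.search(r"Processor LID:\s*(0x[0-9a-fA-F]+)", line)
        if m:
            current["lid"] = int(m.group(1), 16)
        m = re.search(r"\+\s*(\w+) check info", line)
        if m:
            current["check"] = m.group(1)  # e.g. "Cache"
    return records


def summarize(records):
    """Count records per (LID, check type) pair."""
    return Counter((r.get("lid"), r.get("check")) for r in records)
```

Feeding it the lines of system.log and printing summarize(parse_cmc_records(lines)) gives one count per (LID, check type) pair. Many hits on a single pair would suggest a real hardware problem on that CPU, while counts spread across CPUs and types look more like random bit flips.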
You might also want to get 2.6.0-test11 and apply Keith Owens' patch
http://marc.theaimsgroup.com/?l=linux-ia64&m=106974968032730&w=2 to
get easier-to-read logs, together with Keith's "salinfo" package,
which Bjorn hosted at:
ftp://ftp.kernel.org/pub/linux/kernel/people/helgaas/salinfo-0.4.tar.gz
-Tony Luck