* [RFC PATCH 0/14] amd64_edac: marry mcheck to amd64 edac
@ 2009-07-20 16:12 Borislav Petkov
  2009-07-20 16:12 ` [PATCH 01/14] amd64_edac: simplify error type bits extractors Borislav Petkov
                   ` (15 more replies)
  0 siblings, 16 replies; 36+ messages in thread
From: Borislav Petkov @ 2009-07-20 16:12 UTC
  To: mingo, hpa, tglx, norsk5, aris; +Cc: linux-kernel, x86

Hi all,

this is the first version of an attempt to forward MCE information to
the amd64 EDAC module for further decoding. When the MCE handler is
invoked and the EDAC module is loaded, this is what a decoded MCE looks
like:

Disabling lock debugging due to kernel taint

<0>HARDWARE ERROR
CPU 3: Machine Check Exception:                4 Bank 0: b20040001c000175
TSC 714e9b73cf 
PROCESSOR 2:100f22 TIME 1247237579 SOCKET 0 APIC 3
MC0_STATUS: Uncorrected error, report: yes, MiscV: invalid, CPU context corrupt: yes
 Data Cache Error: Data/Tag Evict error.
 Transaction: Evict, Type: Data, Cache Level: L1
This is not a software problem!
<0>Run through mcelog --ascii to decode and contact your hardware vendor
Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Pid: 4817, comm: cc1 Tainted: G   M       2.6.31-rc2-00218-g78848b0-dirty #42
Call Trace:
 <#MC>  [<ffffffff8134a17a>] panic+0xaf/0x178
 [<ffffffff812b5d9e>] ? decode_mce+0x47e/0x540
 [<ffffffff81019210>] ? print_mce+0x90/0x110
 [<ffffffff810193e7>] mce_panic+0x157/0x180
 [<ffffffff81019de7>] do_machine_check+0x757/0x930
 [<ffffffff8134d96d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8134e9cb>] machine_check+0x1b/0x20
 <EOE>

Clearly, the "Run through mcelog..." line is now redundant :) since
userspace decoding is no longer needed. The original EDAC functionality
(polling workqueue) is still preserved. The code currently uses EDAC to
decode DRAM ECC errors, but this could be extended to handle all valid
addresses acquired from the MCi_ADDR registers.
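
To illustrate how the MC0_STATUS value in the log above maps to the
decoded lines, here is a minimal userspace sketch. It is not the code
from this series. It assumes the architectural MCi_STATUS flag layout
(VAL/OVER/UC/EN/MISCV/ADDRV/PCC in bits 63..57) and the memory-hierarchy
error code format 0000 0001 RRRR TTLL in the low 16 bits. The macros are
defined locally for the example, and the helper names (rrrr_name,
tt_name, ll_name, print_mce_status) and message strings are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* MCi_STATUS flag bits, defined locally for this sketch. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC valid */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR valid */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

/* Memory-hierarchy error code: 0000 0001 RRRR TTLL (low 16 bits). */
#define EC_IS_MEM(ec) (((ec) & 0xff00) == 0x0100)
#define EC_RRRR(ec)   (((ec) >> 4) & 0xf)  /* transaction type */
#define EC_TT(ec)     (((ec) >> 2) & 0x3)  /* data vs. instruction */
#define EC_LL(ec)     ((ec) & 0x3)         /* cache level */

/* Compile-time check: the sample status carries a memory error code. */
static_assert(EC_IS_MEM(0xb20040001c000175ULL & 0xffff),
              "sample status should be a memory-hierarchy error");

/* Hypothetical helper: name of the RRRR transaction field. */
static const char *rrrr_name(unsigned int r)
{
    static const char * const names[] = {
        "Generic", "Read", "Write", "Data Read", "Data Write",
        "Instruction Fetch", "Prefetch", "Evict", "Snoop",
    };
    return r < sizeof(names) / sizeof(names[0]) ? names[r] : "Reserved";
}

/* Hypothetical helper: name of the TT transaction-type field. */
static const char *tt_name(unsigned int t)
{
    static const char * const names[] = {
        "Instruction", "Data", "Generic", "Reserved",
    };
    return names[t & 0x3];
}

/* Hypothetical helper: name of the LL cache-level field. */
static const char *ll_name(unsigned int l)
{
    static const char * const names[] = { "L0", "L1", "L2", "Generic" };
    return names[l & 0x3];
}

/* Print a summary of the status flags and, if present, the cache error. */
void print_mce_status(unsigned int bank, uint64_t status)
{
    unsigned int ec = (unsigned int)(status & 0xffff);

    printf("MC%u_STATUS: %s error, report: %s, MiscV: %s, "
           "CPU context corrupt: %s\n", bank,
           (status & MCI_STATUS_UC) ? "Uncorrected" : "Corrected",
           (status & MCI_STATUS_EN) ? "yes" : "no",
           (status & MCI_STATUS_MISCV) ? "valid" : "invalid",
           (status & MCI_STATUS_PCC) ? "yes" : "no");

    if (EC_IS_MEM(ec))
        printf(" Transaction: %s, Type: %s, Cache Level: %s\n",
               rrrr_name(EC_RRRR(ec)), tt_name(EC_TT(ec)),
               ll_name(EC_LL(ec)));
}
```

Calling print_mce_status(0, 0xb20040001c000175ULL) walks the same fields
as the log above: the top byte 0xb2 sets VAL, UC, EN and PCC, and the
low word 0x0175 gives RRRR = 7, TT = 1 and LL = 1.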

Comments and further suggestions are most welcome.

Thanks,
Boris.

 arch/x86/kernel/cpu/mcheck/mce.c    |    7 +
 drivers/edac/amd64_edac.c           |  484 +++++++++++++++++++++--------------
 drivers/edac/amd64_edac.h           |   67 ++---
 drivers/edac/amd64_edac_dbg.c       |    2 +-
 drivers/edac/amd64_edac_err_types.c |  126 +++++-----
 5 files changed, 382 insertions(+), 304 deletions(-)



Thread overview: 36+ messages
2009-07-20 16:12 [RFC PATCH 0/14] amd64_edac: marry mcheck to amd64 edac Borislav Petkov
2009-07-20 16:12 ` [PATCH 01/14] amd64_edac: simplify error type bits extractors Borislav Petkov
2009-07-20 17:56   ` Aristeu Rozanski
2009-07-21  9:40     ` Borislav Petkov
2009-07-20 16:12 ` [PATCH 02/14] amd64_edac: cleanup amd64_process_error_info Borislav Petkov
2009-07-20 16:12 ` [PATCH 03/14] amd64_edac: cleanup/complete NB MCE decoding Borislav Petkov
2009-07-20 16:12 ` [PATCH 04/14] amd64_edac: fixup ExtError decoding Borislav Petkov
2009-07-20 16:12 ` [PATCH 05/14] amd64_edac: remove memory and GART TLB error decoders Borislav Petkov
2009-07-20 16:12 ` [PATCH 06/14] amd64_edac: cleanup amd64_decode_bus_error Borislav Petkov
2009-07-20 16:12 ` [PATCH 07/14] mce3: pass mce info to EDAC for decoding Borislav Petkov
2009-07-20 18:04   ` Andi Kleen
2009-07-20 18:27     ` Doug Thompson
2009-07-20 19:22       ` Andi Kleen
2009-07-20 20:17         ` H. Peter Anvin
2009-07-20 21:02           ` Doug Thompson
2009-07-21  3:41           ` Hidetoshi Seto
2009-07-21  6:51             ` Andi Kleen
2009-07-21 10:49               ` Borislav Petkov
2009-08-04 14:45                 ` Ingo Molnar
2009-07-20 21:00         ` Doug Thompson
2009-07-20 19:44       ` [PATCH 07/14] mce3: pass mce info to EDAC for decoding II Andi Kleen
2009-07-21 10:51         ` Borislav Petkov
2009-07-21 11:07           ` Andi Kleen
2009-07-21 12:52             ` Borislav Petkov
2009-07-21 10:44     ` [PATCH 07/14] mce3: pass mce info to EDAC for decoding Borislav Petkov
2009-07-21 11:04       ` Andi Kleen
2009-07-21 12:56         ` Borislav Petkov
2009-07-20 16:12 ` [PATCH 08/14] amd64_edac: carve out MCi_STATUS decoding Borislav Petkov
2009-07-20 16:13 ` [PATCH 09/14] amd64_edac: carve out decoding of MCi_STATUS ErrorCode Borislav Petkov
2009-07-20 16:13 ` [PATCH 10/14] amd64_edac: decode data cache MCEs Borislav Petkov
2009-07-20 16:13 ` [PATCH 11/14] amd64_edac: decode instruction " Borislav Petkov
2009-07-20 16:13 ` [PATCH 12/14] amd64_edac: decode bus unit MCEs Borislav Petkov
2009-07-20 16:13 ` [PATCH 13/14] amd64_edac: decode load store MCEs Borislav Petkov
2009-07-20 16:13 ` [PATCH 14/14] amd64_edac: decode FR MCEs Borislav Petkov
2009-07-20 17:24 ` [RFC PATCH 0/14] amd64_edac: marry mcheck to amd64 edac Doug Thompson
2009-07-21  3:52 ` Hidetoshi Seto
