public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <borislav.petkov@amd.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Borislav Petkov <petkovbb@googlemail.com>,
	Andi Kleen <andi@firstfloor.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9
Date: Thu, 1 Oct 2009 16:14:32 +0200	[thread overview]
Message-ID: <20091001141432.GA11410@aftab> (raw)
In-Reply-To: <20090930230947.GA9346@elte.hu>

On Thu, Oct 01, 2009 at 01:09:47AM +0200, Ingo Molnar wrote:
> 
> * Borislav Petkov <petkovbb@googlemail.com> wrote:
> 
> > On Wed, Sep 30, 2009 at 11:48:59PM +0200, Ingo Molnar wrote:
> > > I.e. something like the patch below. Completely untested.

Ok, here it is, tested on two Fam10 machines here with injecting MCEs.
The decoding code is now built-in by default (early_initcall requires
!MODULE).

I dunno, it looks like we could just as well move the decoding stuff to
<arch/x86/kernel/cpu/mcheck/> since its now built-in and doesn't have
any connection to EDAC except the NB decoder which is a function ptr
registration thing.

Please take a look and let me know if there are any objections.

Thanks.

---
>From 2c494ebc4ddbdb386a3ffa1a30339d3cec4072e5 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 1 Oct 2009 11:43:48 +0200
Subject: [PATCH] x86, mce: Fix MCE decoding callback logic

Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.

While looking at the interaction of decode_mce() with the other MCE code
i also noticed a few other things and made the following cleanups/fixes:

 - Fixed the mce_decode() weak alias - a weak alias is really not good
   here, it should be a proper callback. A weak alias will be overriden
   if a piece of code is built into the kernel - not good, obviously.

 - The patch initializes the callback on AMD family 10h and 11h - a
   quick glance suggests that decoding of earlier models isnt supported?

 - Added the more correct fallback printk of:

	No support for human readable MCE decoding on this CPU type.
	Transcribe the message and run it through 'mcelog --ascii' to decode.

   On CPUs that dont have a decoder.

 - Made the surrounding code more readable.

Note that the callback allows us to have a default fallback - without
having to check the CPU versions during the printout itself. When an
EDAC module registers itself, it can install the decode-print function.

(there's no unregister needed as this is core code.)

Borislav:

 - add K8 to the set of supported CPUs

 - always build in edac_mce_amd since we use an early_initcall now

 - fixup checkpatch warnings

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/mce.h       |    2 +
 arch/x86/kernel/cpu/mcheck/mce.c |   49 ++++++++++++++++++++++++--------------
 drivers/edac/Makefile            |    4 +--
 drivers/edac/edac_mce_amd.c      |   15 +++++++++++-
 4 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index b608a64..f1363b7 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -133,6 +133,8 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
 static inline void enable_p5_mce(void) {}
 #endif
 
+extern void (*x86_mce_decode_callback)(struct mce *m);
+
 void mce_setup(struct mce *m);
 void mce_log(struct mce *m);
 DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 183c345..614c849 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -85,6 +85,18 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
 static DEFINE_PER_CPU(struct mce, mces_seen);
 static int			cpu_missing;
 
+static void default_decode_mce(struct mce *m)
+{
+	pr_emerg("No human readable MCE decoding support on this CPU type.\n");
+	pr_emerg("Run the message through 'mcelog --ascii' to decode.\n");
+}
+
+/*
+ * CPU/chipset specific EDAC code can register a callback here to print
+ * MCE errors in a human-readable form:
+ */
+void (*x86_mce_decode_callback)(struct mce *m) = default_decode_mce;
+EXPORT_SYMBOL(x86_mce_decode_callback);
 
 /* MCA banks polled by the period polling timer for corrected events */
 DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
@@ -165,46 +177,47 @@ void mce_log(struct mce *mce)
 	set_bit(0, &mce_need_notify);
 }
 
-void __weak decode_mce(struct mce *m)
-{
-	return;
-}
-
 static void print_mce(struct mce *m)
 {
-	printk(KERN_EMERG
-	       "CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
+	pr_emerg("CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
 	       m->extcpu, m->mcgstatus, m->bank, m->status);
+
 	if (m->ip) {
-		printk(KERN_EMERG "RIP%s %02x:<%016Lx> ",
+		pr_emerg("RIP%s %02x:<%016Lx> ",
 		       !(m->mcgstatus & MCG_STATUS_EIPV) ? " !INEXACT!" : "",
 		       m->cs, m->ip);
+
 		if (m->cs == __KERNEL_CS)
 			print_symbol("{%s}", m->ip);
-		printk(KERN_CONT "\n");
+		pr_cont("\n");
 	}
-	printk(KERN_EMERG "TSC %llx ", m->tsc);
+
+	pr_emerg("TSC %llx ", m->tsc);
 	if (m->addr)
-		printk(KERN_CONT "ADDR %llx ", m->addr);
+		pr_cont("ADDR %llx ", m->addr);
 	if (m->misc)
-		printk(KERN_CONT "MISC %llx ", m->misc);
-	printk(KERN_CONT "\n");
-	printk(KERN_EMERG "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
+		pr_cont("MISC %llx ", m->misc);
+
+	pr_cont("\n");
+	pr_emerg("PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
 			m->cpuvendor, m->cpuid, m->time, m->socketid,
 			m->apicid);
 
-	decode_mce(m);
+	/*
+	 * Print out human-readable details about the MCE error,
+	 * (if the CPU has an implementation for that):
+	 */
+	x86_mce_decode_callback(m);
 }
 
 static void print_mce_head(void)
 {
-	printk(KERN_EMERG "\nHARDWARE ERROR\n");
+	pr_emerg("\nHARDWARE ERROR\n");
 }
 
 static void print_mce_tail(void)
 {
-	printk(KERN_EMERG "This is not a software problem!\n"
-	       "Run through mcelog --ascii to decode and contact your hardware vendor\n");
+	pr_emerg("This is not a software problem!\n");
 }
 
 #define PANIC_TIMEOUT 5 /* 5 seconds */
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 7a473bb..74fc527 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -17,9 +17,7 @@ ifdef CONFIG_PCI
 edac_core-objs	+= edac_pci.o edac_pci_sysfs.o
 endif
 
-ifdef CONFIG_CPU_SUP_AMD
-edac_core-objs  += edac_mce_amd.o
-endif
+obj-$(CONFIG_CPU_SUP_AMD)               += edac_mce_amd.o
 
 obj-$(CONFIG_EDAC_AMD76X)		+= amd76x_edac.o
 obj-$(CONFIG_EDAC_CPC925)		+= cpc925_edac.o
diff --git a/drivers/edac/edac_mce_amd.c b/drivers/edac/edac_mce_amd.c
index 0c21c37..83a01a1 100644
--- a/drivers/edac/edac_mce_amd.c
+++ b/drivers/edac/edac_mce_amd.c
@@ -362,7 +362,7 @@ static inline void amd_decode_err_code(unsigned int ec)
 		pr_warning("Huh? Unknown MCE error 0x%x\n", ec);
 }
 
-void decode_mce(struct mce *m)
+static void amd_decode_mce(struct mce *m)
 {
 	struct err_regs regs;
 	int node, ecc;
@@ -420,3 +420,16 @@ void decode_mce(struct mce *m)
 
 	amd_decode_err_code(m->status & 0xffff);
 }
+
+static int __init mce_amd_init(void)
+{
+	/*
+	 * We can decode MCEs for Opteron and later CPUs:
+	 */
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+	    (boot_cpu_data.x86 >= 0xf))
+		x86_mce_decode_callback = amd_decode_mce;
+
+	return 0;
+}
+early_initcall(mce_amd_init);
-- 
1.6.4.3

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632


  reply	other threads:[~2009-10-01 14:14 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-30 14:09 x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9 Andi Kleen
2009-09-30 19:40 ` Borislav Petkov
2009-09-30 20:46   ` Ingo Molnar
2009-09-30 21:48     ` Ingo Molnar
2009-09-30 22:39       ` Borislav Petkov
2009-09-30 23:09         ` Ingo Molnar
2009-10-01 14:14           ` Borislav Petkov [this message]
2009-10-01 14:34             ` Linus Torvalds
2009-10-01 14:46               ` Borislav Petkov
2009-10-01 15:00                 ` Ingo Molnar
2009-10-01 15:21                   ` Borislav Petkov
2009-10-01 15:32                     ` Ingo Molnar
2009-10-02 13:21                       ` Borislav Petkov
2009-10-02 13:22                         ` [PATCH 1/3] x86, mce, edac: Fix MCE decoding callback logic Borislav Petkov
2009-10-02 13:23                         ` [PATCH 2/3] initcalls: add early_initcall for modules Borislav Petkov
2009-10-02 14:01                           ` [tip:x86/urgent] initcalls: Add early_initcall() " tip-bot for Borislav Petkov
2009-10-02 13:26                         ` x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9 Ingo Molnar
2009-10-02 13:31                         ` [PATCH 3/3] EDAC: carve out AMD MCE decoding logic Borislav Petkov
2009-10-02 13:39                           ` Ingo Molnar
2009-10-02 18:26                             ` Borislav Petkov
2009-10-02 18:47                               ` Ingo Molnar
2009-10-03  6:57                                 ` Borislav Petkov
2009-10-03  7:18                                   ` Ingo Molnar
2009-10-05 15:15                                     ` Borislav Petkov
2009-10-16 12:55                                   ` [tip:perf/mce] mce, edac: Use an atomic notifier for MCEs decoding tip-bot for Borislav Petkov
2009-10-02 14:01                           ` [tip:x86/urgent] x86: EDAC: carve out AMD MCE decoding logic tip-bot for Borislav Petkov
2009-10-01 14:55               ` x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9 Ingo Molnar
2009-10-01 15:26               ` Andi Kleen
2009-10-02 14:01             ` [tip:x86/urgent] x86: EDAC: MCE: Fix MCE decoding callback logic tip-bot for Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091001141432.GA11410@aftab \
    --to=borislav.petkov@amd.com \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=petkovbb@googlemail.com \
    --cc=torvalds@osdl.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox