All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <borislav.petkov@amd.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Borislav Petkov <petkovbb@googlemail.com>,
	Andi Kleen <andi@firstfloor.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9
Date: Thu, 1 Oct 2009 16:14:32 +0200	[thread overview]
Message-ID: <20091001141432.GA11410@aftab> (raw)
In-Reply-To: <20090930230947.GA9346@elte.hu>

On Thu, Oct 01, 2009 at 01:09:47AM +0200, Ingo Molnar wrote:
> 
> * Borislav Petkov <petkovbb@googlemail.com> wrote:
> 
> > On Wed, Sep 30, 2009 at 11:48:59PM +0200, Ingo Molnar wrote:
> > > I.e. something like the patch below. Completely untested.

Ok, here it is, tested on two Fam10 machines here with injecting MCEs.
The decoding code is now built-in by default (early_initcall requires
!MODULE).

I dunno, it looks like we could just as well move the decoding stuff to
<arch/x86/kernel/cpu/mcheck/> since its now built-in and doesn't have
any connection to EDAC except the NB decoder which is a function ptr
registration thing.

Please take a look and let me know if there are any objections.

Thanks.

---
>From 2c494ebc4ddbdb386a3ffa1a30339d3cec4072e5 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 1 Oct 2009 11:43:48 +0200
Subject: [PATCH] x86, mce: Fix MCE decoding callback logic

Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.

While looking at the interaction of decode_mce() with the other MCE code
i also noticed a few other things and made the following cleanups/fixes:

 - Fixed the mce_decode() weak alias - a weak alias is really not good
   here, it should be a proper callback. A weak alias will be overriden
   if a piece of code is built into the kernel - not good, obviously.

 - The patch initializes the callback on AMD family 10h and 11h - a
   quick glance suggests that decoding of earlier models isnt supported?

 - Added the more correct fallback printk of:

	No support for human readable MCE decoding on this CPU type.
	Transcribe the message and run it through 'mcelog --ascii' to decode.

   On CPUs that dont have a decoder.

 - Made the surrounding code more readable.

Note that the callback allows us to have a default fallback - without
having to check the CPU versions during the printout itself. When an
EDAC module registers itself, it can install the decode-print function.

(there's no unregister needed as this is core code.)

Borislav:

 - add K8 to the set of supported CPUs

 - always build in edac_mce_amd since we use an early_initcall now

 - fixup checkpatch warnings

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/mce.h       |    2 +
 arch/x86/kernel/cpu/mcheck/mce.c |   49 ++++++++++++++++++++++++--------------
 drivers/edac/Makefile            |    4 +--
 drivers/edac/edac_mce_amd.c      |   15 +++++++++++-
 4 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index b608a64..f1363b7 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -133,6 +133,8 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
 static inline void enable_p5_mce(void) {}
 #endif
 
+extern void (*x86_mce_decode_callback)(struct mce *m);
+
 void mce_setup(struct mce *m);
 void mce_log(struct mce *m);
 DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 183c345..614c849 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -85,6 +85,18 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
 static DEFINE_PER_CPU(struct mce, mces_seen);
 static int			cpu_missing;
 
+static void default_decode_mce(struct mce *m)
+{
+	pr_emerg("No human readable MCE decoding support on this CPU type.\n");
+	pr_emerg("Run the message through 'mcelog --ascii' to decode.\n");
+}
+
+/*
+ * CPU/chipset specific EDAC code can register a callback here to print
+ * MCE errors in a human-readable form:
+ */
+void (*x86_mce_decode_callback)(struct mce *m) = default_decode_mce;
+EXPORT_SYMBOL(x86_mce_decode_callback);
 
 /* MCA banks polled by the period polling timer for corrected events */
 DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
@@ -165,46 +177,47 @@ void mce_log(struct mce *mce)
 	set_bit(0, &mce_need_notify);
 }
 
-void __weak decode_mce(struct mce *m)
-{
-	return;
-}
-
 static void print_mce(struct mce *m)
 {
-	printk(KERN_EMERG
-	       "CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
+	pr_emerg("CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
 	       m->extcpu, m->mcgstatus, m->bank, m->status);
+
 	if (m->ip) {
-		printk(KERN_EMERG "RIP%s %02x:<%016Lx> ",
+		pr_emerg("RIP%s %02x:<%016Lx> ",
 		       !(m->mcgstatus & MCG_STATUS_EIPV) ? " !INEXACT!" : "",
 		       m->cs, m->ip);
+
 		if (m->cs == __KERNEL_CS)
 			print_symbol("{%s}", m->ip);
-		printk(KERN_CONT "\n");
+		pr_cont("\n");
 	}
-	printk(KERN_EMERG "TSC %llx ", m->tsc);
+
+	pr_emerg("TSC %llx ", m->tsc);
 	if (m->addr)
-		printk(KERN_CONT "ADDR %llx ", m->addr);
+		pr_cont("ADDR %llx ", m->addr);
 	if (m->misc)
-		printk(KERN_CONT "MISC %llx ", m->misc);
-	printk(KERN_CONT "\n");
-	printk(KERN_EMERG "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
+		pr_cont("MISC %llx ", m->misc);
+
+	pr_cont("\n");
+	pr_emerg("PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
 			m->cpuvendor, m->cpuid, m->time, m->socketid,
 			m->apicid);
 
-	decode_mce(m);
+	/*
+	 * Print out human-readable details about the MCE error,
+	 * (if the CPU has an implementation for that):
+	 */
+	x86_mce_decode_callback(m);
 }
 
 static void print_mce_head(void)
 {
-	printk(KERN_EMERG "\nHARDWARE ERROR\n");
+	pr_emerg("\nHARDWARE ERROR\n");
 }
 
 static void print_mce_tail(void)
 {
-	printk(KERN_EMERG "This is not a software problem!\n"
-	       "Run through mcelog --ascii to decode and contact your hardware vendor\n");
+	pr_emerg("This is not a software problem!\n");
 }
 
 #define PANIC_TIMEOUT 5 /* 5 seconds */
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 7a473bb..74fc527 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -17,9 +17,7 @@ ifdef CONFIG_PCI
 edac_core-objs	+= edac_pci.o edac_pci_sysfs.o
 endif
 
-ifdef CONFIG_CPU_SUP_AMD
-edac_core-objs  += edac_mce_amd.o
-endif
+obj-$(CONFIG_CPU_SUP_AMD)               += edac_mce_amd.o
 
 obj-$(CONFIG_EDAC_AMD76X)		+= amd76x_edac.o
 obj-$(CONFIG_EDAC_CPC925)		+= cpc925_edac.o
diff --git a/drivers/edac/edac_mce_amd.c b/drivers/edac/edac_mce_amd.c
index 0c21c37..83a01a1 100644
--- a/drivers/edac/edac_mce_amd.c
+++ b/drivers/edac/edac_mce_amd.c
@@ -362,7 +362,7 @@ static inline void amd_decode_err_code(unsigned int ec)
 		pr_warning("Huh? Unknown MCE error 0x%x\n", ec);
 }
 
-void decode_mce(struct mce *m)
+static void amd_decode_mce(struct mce *m)
 {
 	struct err_regs regs;
 	int node, ecc;
@@ -420,3 +420,16 @@ void decode_mce(struct mce *m)
 
 	amd_decode_err_code(m->status & 0xffff);
 }
+
+static int __init mce_amd_init(void)
+{
+	/*
+	 * We can decode MCEs for Opteron and later CPUs:
+	 */
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+	    (boot_cpu_data.x86 >= 0xf))
+		x86_mce_decode_callback = amd_decode_mce;
+
+	return 0;
+}
+early_initcall(mce_amd_init);
-- 
1.6.4.3

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632


  reply	other threads:[~2009-10-01 14:14 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-30 14:09 x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9 Andi Kleen
2009-09-30 19:40 ` Borislav Petkov
2009-09-30 20:46   ` Ingo Molnar
2009-09-30 21:48     ` Ingo Molnar
2009-09-30 22:39       ` Borislav Petkov
2009-09-30 23:09         ` Ingo Molnar
2009-10-01 14:14           ` Borislav Petkov [this message]
2009-10-01 14:34             ` Linus Torvalds
2009-10-01 14:46               ` Borislav Petkov
2009-10-01 15:00                 ` Ingo Molnar
2009-10-01 15:21                   ` Borislav Petkov
2009-10-01 15:32                     ` Ingo Molnar
2009-10-02 13:21                       ` Borislav Petkov
2009-10-02 13:22                         ` [PATCH 1/3] x86, mce, edac: Fix MCE decoding callback logic Borislav Petkov
2009-10-02 13:23                         ` [PATCH 2/3] initcalls: add early_initcall for modules Borislav Petkov
2009-10-02 14:01                           ` [tip:x86/urgent] initcalls: Add early_initcall() " tip-bot for Borislav Petkov
2009-10-02 13:26                         ` x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9 Ingo Molnar
2009-10-02 13:31                         ` [PATCH 3/3] EDAC: carve out AMD MCE decoding logic Borislav Petkov
2009-10-02 13:39                           ` Ingo Molnar
2009-10-02 18:26                             ` Borislav Petkov
2009-10-02 18:47                               ` Ingo Molnar
2009-10-03  6:57                                 ` Borislav Petkov
2009-10-03  7:18                                   ` Ingo Molnar
2009-10-05 15:15                                     ` Borislav Petkov
2009-10-16 12:55                                   ` [tip:perf/mce] mce, edac: Use an atomic notifier for MCEs decoding tip-bot for Borislav Petkov
2009-10-02 14:01                           ` [tip:x86/urgent] x86: EDAC: carve out AMD MCE decoding logic tip-bot for Borislav Petkov
2009-10-01 14:55               ` x86: mce: Please revert 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9 Ingo Molnar
2009-10-01 15:26               ` Andi Kleen
2009-10-02 14:01             ` [tip:x86/urgent] x86: EDAC: MCE: Fix MCE decoding callback logic tip-bot for Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091001141432.GA11410@aftab \
    --to=borislav.petkov@amd.com \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=petkovbb@googlemail.com \
    --cc=torvalds@osdl.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.