public inbox for linux-kernel@vger.kernel.org
From: Borislav Petkov <borislav.petkov@amd.com>
To: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Cc: Borislav Petkov <petkovbb@googlemail.com>,
	linux-kernel@vger.kernel.org,
	osrc-patches <osrc-patches@elbe.amd.com>
Subject: Re: K8 ECC error with linux-2.6.32
Date: Tue, 15 Dec 2009 16:30:26 +0100
Message-ID: <20091215153026.GD20880@aftab>
In-Reply-To: <200912150808.04814.johannes.hirte@fem.tu-ilmenau.de>

On Tue, Dec 15, 2009 at 08:08:04AM +0100, Johannes Hirte wrote:
>  Northbridge Error, node 0, core: -1
> amd_decode_nb_mce: NBSL: 0x0005001b, NBSL: 0xa4000000
> K8 ECC error.

Yep, this is a benign GART TLB error. It is not being reported, but it
is still being logged, so the amd64_edac module you're using sees it and
trips over it. There are two fixes:

1. If your BIOS has an option worded something like

"Gart Table Walk Error MC reporting: Disabled/Enabled"

set it to Disabled; that should turn the reporting off.

2. If there is no such BIOS option, the patch below should fix it. Can
you please test it (against v2.6.32)?
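
For illustration only, here is a rough userspace sketch of the check the
patch below moves to the top of amd_decode_nb_mce(). It is not the
kernel code: the sketch_* names are made up for this example, and the
mask/value pair and the "error code is the low 16 bits of NBSL" rule are
my assumptions about what TLB_ERROR() and friends do in
drivers/edac/edac_mce_amd.h, so check the header before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: this mask/value pair mirrors the TLB_ERROR() macro in
 * drivers/edac/edac_mce_amd.h. Verify against the header.
 */
#define SKETCH_TLB_ERROR(ec)	(((ec) & 0xFFF0) == 0x0010)

/* Hypothetical stand-in for the module's report_gart_errors knob. */
bool sketch_report_gart_errors;

/* Assumption: the error code lives in the low 16 bits of NBSL. */
uint16_t sketch_error_code(uint32_t nbsl)
{
	return (uint16_t)(nbsl & 0xffff);
}

/* Returns true if the decoder should stay quiet for this NBSL value. */
bool sketch_should_skip(uint32_t nbsl)
{
	uint16_t ec = sketch_error_code(nbsl);

	return SKETCH_TLB_ERROR(ec) && !sketch_report_gart_errors;
}
```

With the NBSL value from your log (0x0005001b), the low 16 bits are
0x001b, which matches the TLB pattern under the assumptions above, so
the sketch skips the error unless the knob is set.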

Thanks.

---
diff --git a/drivers/edac/edac_mce_amd.c b/drivers/edac/edac_mce_amd.c
index 713ed7d..026f0cb 100644
--- a/drivers/edac/edac_mce_amd.c
+++ b/drivers/edac/edac_mce_amd.c
@@ -300,6 +300,12 @@ void amd_decode_nb_mce(int node_id, struct err_regs *regs, int handle_errors)
 	if (!handle_errors)
 		return;
 
+	/*
+	 * GART TLB error reporting is disabled by default. Bail out early.
+	 */
+	if (TLB_ERROR(ec) && !report_gart_errors)
+		return;
+
 	pr_emerg(" Northbridge Error, node %d", node_id);
 
 	/*
@@ -311,10 +317,9 @@ void amd_decode_nb_mce(int node_id, struct err_regs *regs, int handle_errors)
 		if (regs->nbsh & K8_NBSH_ERR_CPU_VAL)
 			pr_cont(", core: %u\n", (u8)(regs->nbsh & 0xf));
 	} else {
-		pr_cont(", core: %d\n", ilog2((regs->nbsh & 0xf)));
+		pr_cont(", core: %d\n", fls((regs->nbsh & 0xf) - 1));
 	}
 
-
 	pr_emerg("%s.\n", EXT_ERR_MSG(xec));
 
 	if (BUS_ERROR(ec) && nb_bus_decoder)
@@ -334,21 +339,6 @@ static void amd_decode_fr_mce(u64 mc5_status)
 static inline void amd_decode_err_code(unsigned int ec)
 {
 	if (TLB_ERROR(ec)) {
-		/*
-		 * GART errors are intended to help graphics driver developers
-		 * to detect bad GART PTEs. It is recommended by AMD to disable
-		 * GART table walk error reporting by default[1] (currently
-		 * being disabled in mce_cpu_quirks()) and according to the
-		 * comment in mce_cpu_quirks(), such GART errors can be
-		 * incorrectly triggered. We may see these errors anyway and
-		 * unless requested by the user, they won't be reported.
-		 *
-		 * [1] section 13.10.1 on BIOS and Kernel Developers Guide for
-		 *     AMD NPT family 0Fh processors
-		 */
-		if (!report_gart_errors)
-			return;
-
 		pr_emerg(" Transaction: %s, Cache Level %s\n",
 			 TT_MSG(ec), LL_MSG(ec));
 	} else if (MEM_ERROR(ec)) {

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632

