public inbox for linux-acpi@vger.kernel.org
From: Huang Ying <ying.huang@intel.com>
To: "Nath, Arindam" <Arindam.Nath@amd.com>
Cc: "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"Su, Henry" <Henry.Su@amd.com>
Subject: Re: Linux ACPI BERT implementation
Date: Fri, 12 Aug 2011 10:54:31 +0800
Message-ID: <4E4495E7.3050305@intel.com>
In-Reply-To: <6C03668EAF45B747AF947A1603D1B3000197F3F448@SAUSEXMBP01.amd.com>

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

Hi, Arindam,

The BERT patch is attached to this mail.

Testing steps:

1. Build the kernel with APEI enabled (CONFIG_ACPI_APEI=y).
2. Trigger a hardware error so that the firmware fills the BERT.
3. Reboot the system and check whether any kernel messages (dmesg)
begin with

[Hardware Error]: Error record from previous boot:

We don't know how to do step 2, and it is the key to this test.

Best Regards,
Huang Ying

On 08/10/2011 02:36 PM, Nath, Arindam wrote:
> Hi Huang,
> 
>> -----Original Message-----
>> From: Huang Ying [mailto:ying.huang@intel.com]
>> Sent: Friday, July 15, 2011 12:38 PM
>> To: Nath, Arindam
>> Cc: linux-acpi@vger.kernel.org; Su, Henry
>> Subject: Re: Linux ACPI BERT implementation
>>
>> On 07/15/2011 02:45 PM, Nath, Arindam wrote:
>>> Hi Huang,
>>>
>>>
>>>
>>> Since you have already added some support for APEI into the ACPI
>>> subsystem, are you planning to add support for BERT too?
>>
>> Yes. I have the code, but I have not found a way to test it on my
>> testing machine.  Do you have any machine to test it?
> 
> Sorry for the late response. We do have a platform with ACPI BERT support, but BERT has not been tested on it. So if you provide me the patch, it would help if you could also provide the steps to test it.
> 
> Thanks,
> Arindam
> 
>>
>> Best Regards,
>> Huang Ying
> 
> 


[-- Attachment #2: 0005-ACPI-APEI-Boot-Error-Record-Table-BERT-support.patch --]
[-- Type: text/x-patch, Size: 6708 bytes --]

Subject: [PATCH] ACPI, APEI, Boot Error Record Table (BERT) support

Under normal circumstances, when a hardware error occurs, the kernel
is notified via NMI, MCE or some other method; it then processes the
error condition, reports it, and recovers from it if possible.  But
sometimes the situation is so bad that firmware may choose to reset
the system directly without notifying the Linux kernel.

The Linux kernel can use the Boot Error Record Table (BERT) to get the
un-notified hardware errors that occurred in a previous boot.  In this
patch, the error information is reported via printk.

For more information about BERT, please refer to ACPI Specification
version 4.0, section 17.3.1

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 Documentation/kernel-parameters.txt |    3 
 drivers/acpi/apei/Makefile          |    2 
 drivers/acpi/apei/bert.c            |  169 ++++++++++++++++++++++++++++++++++++
 include/acpi/apei.h                 |    1 
 4 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/apei/bert.c

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -402,6 +402,9 @@ bytes respectively. Such letter suffixes
 
 	bootmem_debug	[KNL] Enable bootmem allocator debug messages.
 
+	bert_disable	[ACPI]
+			Disable Boot Error Record Table (BERT) support.
+
 	bttv.card=	[HW,V4L] bttv (bt848 + bt878 based grabber cards)
 	bttv.radio=	Most important insmod options are available as
 			kernel args too.
--- a/drivers/acpi/apei/Makefile
+++ b/drivers/acpi/apei/Makefile
@@ -3,4 +3,4 @@ obj-$(CONFIG_ACPI_APEI_GHES)	+= ghes.o
 obj-$(CONFIG_ACPI_APEI_EINJ)	+= einj.o
 obj-$(CONFIG_ACPI_APEI_ERST_DEBUG) += erst-dbg.o
 
-apei-y := apei-base.o hest.o cper.o erst.o
+apei-y := apei-base.o hest.o cper.o erst.o bert.o
--- /dev/null
+++ b/drivers/acpi/apei/bert.c
@@ -0,0 +1,169 @@
+/*
+ * APEI Boot Error Record Table (BERT) support
+ *
+ * Copyright 2011 Intel Corp.
+ *   Author: Huang Ying <ying.huang@intel.com>
+ *
+ * Under normal circumstances, when a hardware error occurs, the kernel
+ * is notified via NMI, MCE or some other method; it then processes
+ * the error condition, reports it, and recovers from it if possible.
+ * But sometimes the situation is so bad that firmware may choose to
+ * reset the system directly without notifying the Linux kernel.
+ *
+ * The Linux kernel can use the Boot Error Record Table (BERT) to get
+ * the un-notified hardware errors that occurred in a previous boot.
+ *
+ * For more information about BERT, please refer to ACPI Specification
+ * version 4.0, section 17.3.1
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/acpi.h>
+#include <linux/io.h>
+
+#include "apei-internal.h"
+
+#define BERT_PFX "BERT: "
+
+int bert_disable;
+EXPORT_SYMBOL_GPL(bert_disable);
+
+static void __init bert_print_all(struct acpi_hest_generic_status *region,
+				  unsigned int region_len)
+{
+	int remain, first = 1;
+	u32 estatus_len;
+	struct acpi_hest_generic_status *estatus;
+
+	remain = region_len;
+	estatus = region;
+	while (remain > sizeof(struct acpi_hest_generic_status)) {
+		/* No more error records */
+		if (!estatus->block_status)
+			break;
+
+		estatus_len = apei_estatus_len(estatus);
+		if (estatus_len < sizeof(struct acpi_hest_generic_status) ||
+		    remain < estatus_len) {
+			pr_err(FW_BUG BERT_PFX "Invalid error status block with length %u\n",
+			       estatus_len);
+			return;
+		}
+
+		if (apei_estatus_check(estatus)) {
+			pr_err(FW_BUG BERT_PFX "Invalid error status block\n");
+			goto next;
+		}
+
+		if (first) {
+			pr_info(HW_ERR "Error record from previous boot:\n");
+			first = 0;
+		}
+		apei_estatus_print(KERN_INFO HW_ERR, estatus);
+next:
+		estatus = (void *)estatus + estatus_len;
+		remain -= estatus_len;
+	}
+}
+
+static int __init setup_bert_disable(char *str)
+{
+	bert_disable = 1;
+	return 1;	/* option consumed; do not pass it on to init */
+}
+__setup("bert_disable", setup_bert_disable);
+
+static int __init bert_check_table(struct acpi_table_bert *bert_tab)
+{
+	if (bert_tab->header.length < sizeof(struct acpi_table_bert))
+		return -EINVAL;
+	if (bert_tab->region_length != 0 &&
+	    bert_tab->region_length < sizeof(struct acpi_bert_region))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int __init bert_init(void)
+{
+	acpi_status status;
+	struct acpi_table_bert *bert_tab;
+	struct resource *r;
+	struct acpi_hest_generic_status *bert_region;
+	unsigned int region_len;
+	int rc = -EINVAL;
+
+	if (acpi_disabled)
+		goto out;
+
+	if (bert_disable) {
+		pr_info(BERT_PFX "Boot Error Record Table (BERT) support is disabled.\n");
+		goto out;
+	}
+
+	status = acpi_get_table(ACPI_SIG_BERT, 0,
+				(struct acpi_table_header **)&bert_tab);
+	if (status == AE_NOT_FOUND) {
+		pr_err(BERT_PFX "Table is not found!\n");
+		goto out;
+	} else if (ACPI_FAILURE(status)) {
+		const char *msg = acpi_format_exception(status);
+		pr_err(BERT_PFX "Failed to get table, %s\n", msg);
+		goto out;
+	}
+
+	rc = bert_check_table(bert_tab);
+	if (rc) {
+		pr_err(FW_BUG BERT_PFX "BERT table is invalid\n");
+		goto out;
+	}
+
+	region_len = bert_tab->region_length;
+	if (!region_len) {
+		rc = 0;
+		goto out;
+	}
+
+	r = request_mem_region(bert_tab->address, region_len, "APEI BERT");
+	if (!r) {
+		pr_err(BERT_PFX "Cannot request iomem region <%016llx-%016llx> for BERT.\n",
+		       (unsigned long long)bert_tab->address,
+		       (unsigned long long)bert_tab->address + region_len - 1);
+		rc = -EIO;
+		goto out;
+	}
+
+	bert_region = ioremap_cache(bert_tab->address, region_len);
+	if (!bert_region) {
+		rc = -ENOMEM;
+		goto out_release;
+	}
+
+	bert_print_all(bert_region, region_len);
+
+	iounmap(bert_region);
+
+out_release:
+	release_mem_region(bert_tab->address, region_len);
+out:
+	if (rc)
+		bert_disable = 1;
+
+	return rc;
+}
+late_initcall(bert_init);
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -23,6 +23,7 @@ extern int ghes_disable;
 #else
 #define ghes_disable 1
 #endif
+extern int bert_disable;
 
 #ifdef CONFIG_ACPI_APEI
 void __init acpi_hest_init(void);
