* [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk
@ 2010-12-07 2:22 Huang Ying
2010-12-07 2:22 ` [PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition Huang Ying
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Huang Ying @ 2010-12-07 2:22 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
The recent consensus is that hardware error should be reported via printk.
This patch adds printk support to APEI GHES.
v4:
- Remove pr_pfx, use printk directly
v3:
- Add document for output format per Andrew's comments
- Move pr_pfx into printk.h, hope it is useful for others too
- Fixes some issues according to comments
v2:
- Some minor adjustment of PCIe error section definition and printk format
[PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition
[PATCH -v4 2/3] ACPI, APEI, Add APEI generic error status printing support
[PATCH -v4 3/3] ACPI, APEI, Report GHES error information via printk
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition
2010-12-07 2:22 [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk Huang Ying
@ 2010-12-07 2:22 ` Huang Ying
2010-12-07 2:22 ` [PATCH -v4 2/3] ACPI, APEI, Add APEI generic error status printing support Huang Ying
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Huang Ying @ 2010-12-07 2:22 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
On some machine, PCIe error is reported via APEI (ACPI Platform Error
Interface). The error data is passed from firmware to Linux via CPER
PCIe error section structure.
This patch adds CPER PCIe error section structure and constants
definition.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
include/linux/cper.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 82 insertions(+), 4 deletions(-)
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -39,10 +39,12 @@
* Severity difinition for error_severity in struct cper_record_header
* and section_severity in struct cper_section_descriptor
*/
-#define CPER_SEV_RECOVERABLE 0x0
-#define CPER_SEV_FATAL 0x1
-#define CPER_SEV_CORRECTED 0x2
-#define CPER_SEV_INFORMATIONAL 0x3
+enum {
+ CPER_SEV_RECOVERABLE,
+ CPER_SEV_FATAL,
+ CPER_SEV_CORRECTED,
+ CPER_SEV_INFORMATIONAL,
+};
/*
* Validation bits difinition for validation_bits in struct
@@ -201,6 +203,47 @@
UUID_LE(0x036F84E1, 0x7F37, 0x428c, 0xA7, 0x9E, 0x57, 0x5F, \
0xDF, 0xAA, 0x84, 0xEC)
+#define CPER_PROC_VALID_TYPE 0x0001
+#define CPER_PROC_VALID_ISA 0x0002
+#define CPER_PROC_VALID_ERROR_TYPE 0x0004
+#define CPER_PROC_VALID_OPERATION 0x0008
+#define CPER_PROC_VALID_FLAGS 0x0010
+#define CPER_PROC_VALID_LEVEL 0x0020
+#define CPER_PROC_VALID_VERSION 0x0040
+#define CPER_PROC_VALID_BRAND_INFO 0x0080
+#define CPER_PROC_VALID_ID 0x0100
+#define CPER_PROC_VALID_TARGET_ADDRESS 0x0200
+#define CPER_PROC_VALID_REQUESTOR_ID 0x0400
+#define CPER_PROC_VALID_RESPONDER_ID 0x0800
+#define CPER_PROC_VALID_IP 0x1000
+
+#define CPER_MEM_VALID_ERROR_STATUS 0x0001
+#define CPER_MEM_VALID_PHYSICAL_ADDRESS 0x0002
+#define CPER_MEM_VALID_PHYSICAL_ADDRESS_MASK 0x0004
+#define CPER_MEM_VALID_NODE 0x0008
+#define CPER_MEM_VALID_CARD 0x0010
+#define CPER_MEM_VALID_MODULE 0x0020
+#define CPER_MEM_VALID_BANK 0x0040
+#define CPER_MEM_VALID_DEVICE 0x0080
+#define CPER_MEM_VALID_ROW 0x0100
+#define CPER_MEM_VALID_COLUMN 0x0200
+#define CPER_MEM_VALID_BIT_POSITION 0x0400
+#define CPER_MEM_VALID_REQUESTOR_ID 0x0800
+#define CPER_MEM_VALID_RESPONDER_ID 0x1000
+#define CPER_MEM_VALID_TARGET_ID 0x2000
+#define CPER_MEM_VALID_ERROR_TYPE 0x4000
+
+#define CPER_PCIE_VALID_PORT_TYPE 0x0001
+#define CPER_PCIE_VALID_VERSION 0x0002
+#define CPER_PCIE_VALID_COMMAND_STATUS 0x0004
+#define CPER_PCIE_VALID_DEVICE_ID 0x0008
+#define CPER_PCIE_VALID_SERIAL_NUMBER 0x0010
+#define CPER_PCIE_VALID_BRIDGE_CONTROL_STATUS 0x0020
+#define CPER_PCIE_VALID_CAPABILITY 0x0040
+#define CPER_PCIE_VALID_AER_INFO 0x0080
+
+#define CPER_PCIE_SLOT_SHIFT 3
+
/*
* All tables and structs must be byte-packed to match CPER
* specification, since the tables are provided by the system BIOS
@@ -306,6 +349,41 @@ struct cper_sec_mem_err {
__u8 error_type;
};
+struct cper_sec_pcie {
+ __u64 validation_bits;
+ __u32 port_type;
+ struct {
+ __u8 minor;
+ __u8 major;
+ __u8 reserved[2];
+ } version;
+ __u16 command;
+ __u16 status;
+ __u32 reserved;
+ struct {
+ __u16 vendor_id;
+ __u16 device_id;
+ __u8 class_code[3];
+ __u8 function;
+ __u8 device;
+ __u16 segment;
+ __u8 bus;
+ __u8 secondary_bus;
+ __u16 slot;
+ __u8 reserved;
+ } device_id;
+ struct {
+ __u32 lower;
+ __u32 upper;
+ } serial_number;
+ struct {
+ __u16 secondary_status;
+ __u16 control;
+ } bridge;
+ __u8 capability[60];
+ __u8 aer_info[96];
+};
+
/* Reset to default packing */
#pragma pack()
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH -v4 2/3] ACPI, APEI, Add APEI generic error status printing support
2010-12-07 2:22 [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-07 2:22 ` [PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition Huang Ying
@ 2010-12-07 2:22 ` Huang Ying
2010-12-07 2:22 ` [PATCH -v4 3/3] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-14 4:43 ` [PATCH -v4 0/3] " Len Brown
3 siblings, 0 replies; 5+ messages in thread
From: Huang Ying @ 2010-12-07 2:22 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
In APEI, Hardware error information reported by firmware to Linux
kernel is in the data structure of APEI generic error status (struct
acpi_hes_generic_status). While now printk is used by Linux kernel to
report hardware error information to user space.
So, this patch adds printing support for the data structure, so that
the corresponding hardware error information can be reported to user
space via printk.
PCIe AER information printing is not implemented yet. Will refactor the
original PCIe AER information printing code to avoid code duplicating.
The output format is as follow:
<error record> :=
APEI generic hardware error status
severity: <integer>, <severity string>
section: <integer>, severity: <integer>, <severity string>
flags: <integer>
<section flags strings>
fru_id: <uuid string>
fru_text: <string>
section_type: <section type string>
<section data>
<severity string>* := recoverable | fatal | corrected | info
<section flags strings># :=
[primary][, containment warning][, reset][, threshold exceeded]\
[, resource not accessible][, latent error]
<section type string> := generic processor error | memory error | \
PCIe error | unknown, <uuid string>
<section data> :=
<generic processor section data> | <memory section data> | \
<pcie section data> | <null>
<generic processor section data> :=
[processor_type: <integer>, <proc type string>]
[processor_isa: <integer>, <proc isa string>]
[error_type: <integer>
<proc error type strings>]
[operation: <integer>, <proc operation string>]
[flags: <integer>
<proc flags strings>]
[level: <integer>]
[version_info: <integer>]
[processor_id: <integer>]
[target_address: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[IP: <integer>]
<proc type string>* := IA32/X64 | IA64
<proc isa string>* := IA32 | IA64 | X64
<processor error type strings># :=
[cache error][, TLB error][, bus error][, micro-architectural error]
<proc operation string>* := unknown or generic | data read | data write | \
instruction execution
<proc flags strings># :=
[restartable][, precise IP][, overflow][, corrected]
<memory section data> :=
[error_status: <integer>]
[physical_address: <integer>]
[physical_address_mask: <integer>]
[node: <integer>]
[card: <integer>]
[module: <integer>]
[bank: <integer>]
[device: <integer>]
[row: <integer>]
[column: <integer>]
[bit_position: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[target_id: <integer>]
[error_type: <integer>, <mem error type string>]
<mem error type string>* :=
unknown | no error | single-bit ECC | multi-bit ECC | \
single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
target abort | parity error | watchdog timeout | invalid address | \
mirror Broken | memory sparing | scrub corrected error | \
scrub uncorrected error
<pcie section data> :=
[port_type: <integer>, <pcie port type string>]
[version: <integer>.<integer>]
[command: <integer>, status: <integer>]
[device_id: <integer>:<integer>:<integer>.<integer>
slot: <integer>
secondary_bus: <integer>
vendor_id: <integer>, device_id: <integer>
class_code: <integer>]
[serial number: <integer>, <integer>]
[bridge: secondary_status: <integer>, control: <integer>]
<pcie port type string>* := PCIe end point | legacy PCI end point | \
unknown | unknown | root port | upstream switch port | \
downstream switch port | PCIe to PCI/PCI-X bridge | \
PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
root complex event collector
Where, [] designate corresponding content is optional
All <field string> description with * has the following format:
field: <integer>, <field string>
Where value of <integer> should be the position of "string" in <field
string> description. Otherwise, <field string> will be "unknown".
All <field strings> description with # has the following format:
field: <integer>
<field strings>
Where each string in <fields strings> corresponding to one set bit of
<integer>. The bit position is the position of "string" in <field
strings> description.
For more detailed explanation of every field, please refer to UEFI
specification version 2.3 or later, section Appendix N: Common
Platform Error Record.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
Documentation/acpi/apei/output_format.txt | 122 +++++++++++
drivers/acpi/apei/apei-internal.h | 2
drivers/acpi/apei/cper.c | 311 ++++++++++++++++++++++++++++++
3 files changed, 435 insertions(+)
create mode 100644 Documentation/acpi/apei/output_format.txt
--- /dev/null
+++ b/Documentation/acpi/apei/output_format.txt
@@ -0,0 +1,122 @@
+ APEI output format
+ ~~~~~~~~~~~~~~~~~~
+
+APEI uses printk as hardware error reporting interface, the output
+format is as follow.
+
+<error record> :=
+APEI generic hardware error status
+severity: <integer>, <severity string>
+section: <integer>, severity: <integer>, <severity string>
+flags: <integer>
+<section flags strings>
+fru_id: <uuid string>
+fru_text: <string>
+section_type: <section type string>
+<section data>
+
+<severity string>* := recoverable | fatal | corrected | info
+
+<section flags strings># :=
+[primary][, containment warning][, reset][, threshold exceeded]\
+[, resource not accessible][, latent error]
+
+<section type string> := generic processor error | memory error | \
+PCIe error | unknown, <uuid string>
+
+<section data> :=
+<generic processor section data> | <memory section data> | \
+<pcie section data> | <null>
+
+<generic processor section data> :=
+[processor_type: <integer>, <proc type string>]
+[processor_isa: <integer>, <proc isa string>]
+[error_type: <integer>
+<proc error type strings>]
+[operation: <integer>, <proc operation string>]
+[flags: <integer>
+<proc flags strings>]
+[level: <integer>]
+[version_info: <integer>]
+[processor_id: <integer>]
+[target_address: <integer>]
+[requestor_id: <integer>]
+[responder_id: <integer>]
+[IP: <integer>]
+
+<proc type string>* := IA32/X64 | IA64
+
+<proc isa string>* := IA32 | IA64 | X64
+
+<processor error type strings># :=
+[cache error][, TLB error][, bus error][, micro-architectural error]
+
+<proc operation string>* := unknown or generic | data read | data write | \
+instruction execution
+
+<proc flags strings># :=
+[restartable][, precise IP][, overflow][, corrected]
+
+<memory section data> :=
+[error_status: <integer>]
+[physical_address: <integer>]
+[physical_address_mask: <integer>]
+[node: <integer>]
+[card: <integer>]
+[module: <integer>]
+[bank: <integer>]
+[device: <integer>]
+[row: <integer>]
+[column: <integer>]
+[bit_position: <integer>]
+[requestor_id: <integer>]
+[responder_id: <integer>]
+[target_id: <integer>]
+[error_type: <integer>, <mem error type string>]
+
+<mem error type string>* :=
+unknown | no error | single-bit ECC | multi-bit ECC | \
+single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
+target abort | parity error | watchdog timeout | invalid address | \
+mirror Broken | memory sparing | scrub corrected error | \
+scrub uncorrected error
+
+<pcie section data> :=
+[port_type: <integer>, <pcie port type string>]
+[version: <integer>.<integer>]
+[command: <integer>, status: <integer>]
+[device_id: <integer>:<integer>:<integer>.<integer>
+slot: <integer>
+secondary_bus: <integer>
+vendor_id: <integer>, device_id: <integer>
+class_code: <integer>]
+[serial number: <integer>, <integer>]
+[bridge: secondary_status: <integer>, control: <integer>]
+
+<pcie port type string>* := PCIe end point | legacy PCI end point | \
+unknown | unknown | root port | upstream switch port | \
+downstream switch port | PCIe to PCI/PCI-X bridge | \
+PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
+root complex event collector
+
+Where, [] designate corresponding content is optional
+
+All <field string> description with * has the following format:
+
+field: <integer>, <field string>
+
+Where value of <integer> should be the position of "string" in <field
+string> description. Otherwise, <field string> will be "unknown".
+
+All <field strings> description with # has the following format:
+
+field: <integer>
+<field strings>
+
+Where each string in <fields strings> corresponding to one set bit of
+<integer>. The bit position is the position of "string" in <field
+strings> description.
+
+For more detailed explanation of every field, please refer to UEFI
+specification version 2.3 or later, section Appendix N: Common
+Platform Error Record.
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -109,6 +109,8 @@ static inline u32 apei_estatus_len(struc
return sizeof(*estatus) + estatus->data_length;
}
+void apei_estatus_print(const char *pfx,
+ const struct acpi_hest_generic_status *estatus);
int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus);
int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
#endif
--- a/drivers/acpi/apei/cper.c
+++ b/drivers/acpi/apei/cper.c
@@ -46,6 +46,317 @@ u64 cper_next_record_id(void)
}
EXPORT_SYMBOL_GPL(cper_next_record_id);
+static const char *cper_severity_strs[] = {
+ "recoverable",
+ "fatal",
+ "corrected",
+ "info",
+};
+
+static const char *cper_severity_str(unsigned int severity)
+{
+ return severity < ARRAY_SIZE(cper_severity_strs) ?
+ cper_severity_strs[severity] : "unknown";
+}
+
+/*
+ * cper_print_bits - print strings for set bits
+ * @pfx: prefix for each line, including log level and prefix string
+ * @bits: bit mask
+ * @strs: string array, indexed by bit position
+ * @strs_size: size of the string array: @strs
+ *
+ * For each set bit in @bits, print the corresponding string in @strs.
+ * If the output length is longer than 80, multiple line will be
+ * printed, with @pfx is printed at the beginning of each line.
+ */
+static void cper_print_bits(const char *pfx, unsigned int bits,
+ const char *strs[], unsigned int strs_size)
+{
+ int i, len = 0;
+ const char *str;
+ char buf[84];
+
+ for (i = 0; i < strs_size; i++) {
+ if (!(bits & (1U << i)))
+ continue;
+ str = strs[i];
+ if (len && len + strlen(str) + 2 > 80) {
+ printk("%s\n", buf);
+ len = 0;
+ }
+ if (!len)
+ len = snprintf(buf, sizeof(buf), "%s%s", pfx, str);
+ else
+ len += snprintf(buf+len, sizeof(buf)-len, ", %s", str);
+ }
+ if (len)
+ printk("%s\n", buf);
+}
+
+static const char *cper_proc_type_strs[] = {
+ "IA32/X64",
+ "IA64",
+};
+
+static const char *cper_proc_isa_strs[] = {
+ "IA32",
+ "IA64",
+ "X64",
+};
+
+static const char *cper_proc_error_type_strs[] = {
+ "cache error",
+ "TLB error",
+ "bus error",
+ "micro-architectural error",
+};
+
+static const char *cper_proc_op_strs[] = {
+ "unknown or generic",
+ "data read",
+ "data write",
+ "instruction execution",
+};
+
+static const char *cper_proc_flag_strs[] = {
+ "restartable",
+ "precise IP",
+ "overflow",
+ "corrected",
+};
+
+static void cper_print_proc_generic(const char *pfx,
+ const struct cper_sec_proc_generic *proc)
+{
+ if (proc->validation_bits & CPER_PROC_VALID_TYPE)
+ printk("%s""processor_type: %d, %s\n", pfx, proc->proc_type,
+ proc->proc_type < ARRAY_SIZE(cper_proc_type_strs) ?
+ cper_proc_type_strs[proc->proc_type] : "unknown");
+ if (proc->validation_bits & CPER_PROC_VALID_ISA)
+ printk("%s""processor_isa: %d, %s\n", pfx, proc->proc_isa,
+ proc->proc_isa < ARRAY_SIZE(cper_proc_isa_strs) ?
+ cper_proc_isa_strs[proc->proc_isa] : "unknown");
+ if (proc->validation_bits & CPER_PROC_VALID_ERROR_TYPE) {
+ printk("%s""error_type: 0x%02x\n", pfx, proc->proc_error_type);
+ cper_print_bits(pfx, proc->proc_error_type,
+ cper_proc_error_type_strs,
+ ARRAY_SIZE(cper_proc_error_type_strs));
+ }
+ if (proc->validation_bits & CPER_PROC_VALID_OPERATION)
+ printk("%s""operation: %d, %s\n", pfx, proc->operation,
+ proc->operation < ARRAY_SIZE(cper_proc_op_strs) ?
+ cper_proc_op_strs[proc->operation] : "unknown");
+ if (proc->validation_bits & CPER_PROC_VALID_FLAGS) {
+ printk("%s""flags: 0x%02x\n", pfx, proc->flags);
+ cper_print_bits(pfx, proc->flags, cper_proc_flag_strs,
+ ARRAY_SIZE(cper_proc_flag_strs));
+ }
+ if (proc->validation_bits & CPER_PROC_VALID_LEVEL)
+ printk("%s""level: %d\n", pfx, proc->level);
+ if (proc->validation_bits & CPER_PROC_VALID_VERSION)
+ printk("%s""version_info: 0x%016llx\n", pfx, proc->cpu_version);
+ if (proc->validation_bits & CPER_PROC_VALID_ID)
+ printk("%s""processor_id: 0x%016llx\n", pfx, proc->proc_id);
+ if (proc->validation_bits & CPER_PROC_VALID_TARGET_ADDRESS)
+ printk("%s""target_address: 0x%016llx\n",
+ pfx, proc->target_addr);
+ if (proc->validation_bits & CPER_PROC_VALID_REQUESTOR_ID)
+ printk("%s""requestor_id: 0x%016llx\n",
+ pfx, proc->requestor_id);
+ if (proc->validation_bits & CPER_PROC_VALID_RESPONDER_ID)
+ printk("%s""responder_id: 0x%016llx\n",
+ pfx, proc->responder_id);
+ if (proc->validation_bits & CPER_PROC_VALID_IP)
+ printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
+}
+
+static const char *cper_mem_err_type_strs[] = {
+ "unknown",
+ "no error",
+ "single-bit ECC",
+ "multi-bit ECC",
+ "single-symbol chipkill ECC",
+ "multi-symbol chipkill ECC",
+ "master abort",
+ "target abort",
+ "parity error",
+ "watchdog timeout",
+ "invalid address",
+ "mirror Broken",
+ "memory sparing",
+ "scrub corrected error",
+ "scrub uncorrected error",
+};
+
+static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+{
+ if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
+ printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
+ if (mem->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)
+ printk("%s""physical_address: 0x%016llx\n",
+ pfx, mem->physical_addr);
+ if (mem->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS_MASK)
+ printk("%s""physical_address_mask: 0x%016llx\n",
+ pfx, mem->physical_addr_mask);
+ if (mem->validation_bits & CPER_MEM_VALID_NODE)
+ printk("%s""node: %d\n", pfx, mem->node);
+ if (mem->validation_bits & CPER_MEM_VALID_CARD)
+ printk("%s""card: %d\n", pfx, mem->card);
+ if (mem->validation_bits & CPER_MEM_VALID_MODULE)
+ printk("%s""module: %d\n", pfx, mem->module);
+ if (mem->validation_bits & CPER_MEM_VALID_BANK)
+ printk("%s""bank: %d\n", pfx, mem->bank);
+ if (mem->validation_bits & CPER_MEM_VALID_DEVICE)
+ printk("%s""device: %d\n", pfx, mem->device);
+ if (mem->validation_bits & CPER_MEM_VALID_ROW)
+ printk("%s""row: %d\n", pfx, mem->row);
+ if (mem->validation_bits & CPER_MEM_VALID_COLUMN)
+ printk("%s""column: %d\n", pfx, mem->column);
+ if (mem->validation_bits & CPER_MEM_VALID_BIT_POSITION)
+ printk("%s""bit_position: %d\n", pfx, mem->bit_pos);
+ if (mem->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
+ printk("%s""requestor_id: 0x%016llx\n", pfx, mem->requestor_id);
+ if (mem->validation_bits & CPER_MEM_VALID_RESPONDER_ID)
+ printk("%s""responder_id: 0x%016llx\n", pfx, mem->responder_id);
+ if (mem->validation_bits & CPER_MEM_VALID_TARGET_ID)
+ printk("%s""target_id: 0x%016llx\n", pfx, mem->target_id);
+ if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
+ u8 etype = mem->error_type;
+ printk("%s""error_type: %d, %s\n", pfx, etype,
+ etype < ARRAY_SIZE(cper_mem_err_type_strs) ?
+ cper_mem_err_type_strs[etype] : "unknown");
+ }
+}
+
+static const char *cper_pcie_port_type_strs[] = {
+ "PCIe end point",
+ "legacy PCI end point",
+ "unknown",
+ "unknown",
+ "root port",
+ "upstream switch port",
+ "downstream switch port",
+ "PCIe to PCI/PCI-X bridge",
+ "PCI/PCI-X to PCIe bridge",
+ "root complex integrated endpoint device",
+ "root complex event collector",
+};
+
+static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie)
+{
+ if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
+ printk("%s""port_type: %d, %s\n", pfx, pcie->port_type,
+ pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
+ cper_pcie_port_type_strs[pcie->port_type] : "unknown");
+ if (pcie->validation_bits & CPER_PCIE_VALID_VERSION)
+ printk("%s""version: %d.%d\n", pfx,
+ pcie->version.major, pcie->version.minor);
+ if (pcie->validation_bits & CPER_PCIE_VALID_COMMAND_STATUS)
+ printk("%s""command: 0x%04x, status: 0x%04x\n", pfx,
+ pcie->command, pcie->status);
+ if (pcie->validation_bits & CPER_PCIE_VALID_DEVICE_ID) {
+ const __u8 *p;
+ printk("%s""device_id: %04x:%02x:%02x.%x\n", pfx,
+ pcie->device_id.segment, pcie->device_id.bus,
+ pcie->device_id.device, pcie->device_id.function);
+ printk("%s""slot: %d\n", pfx,
+ pcie->device_id.slot >> CPER_PCIE_SLOT_SHIFT);
+ printk("%s""secondary_bus: 0x%02x\n", pfx,
+ pcie->device_id.secondary_bus);
+ printk("%s""vendor_id: 0x%04x, device_id: 0x%04x\n", pfx,
+ pcie->device_id.vendor_id, pcie->device_id.device_id);
+ p = pcie->device_id.class_code;
+ printk("%s""class_code: %02x%02x%02x\n", pfx, p[0], p[1], p[2]);
+ }
+ if (pcie->validation_bits & CPER_PCIE_VALID_SERIAL_NUMBER)
+ printk("%s""serial number: 0x%04x, 0x%04x\n", pfx,
+ pcie->serial_number.lower, pcie->serial_number.upper);
+ if (pcie->validation_bits & CPER_PCIE_VALID_BRIDGE_CONTROL_STATUS)
+ printk(
+ "%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n",
+ pfx, pcie->bridge.secondary_status, pcie->bridge.control);
+}
+
+static const char *apei_estatus_section_flag_strs[] = {
+ "primary",
+ "containment warning",
+ "reset",
+ "threshold exceeded",
+ "resource not accessible",
+ "latent error",
+};
+
+static void apei_estatus_print_section(
+ const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
+{
+ uuid_le *sec_type = (uuid_le *)gdata->section_type;
+ __u16 severity;
+
+ severity = gdata->error_severity;
+ printk("%s""section: %d, severity: %d, %s\n", pfx, sec_no, severity,
+ cper_severity_str(severity));
+ printk("%s""flags: 0x%02x\n", pfx, gdata->flags);
+ cper_print_bits(pfx, gdata->flags, apei_estatus_section_flag_strs,
+ ARRAY_SIZE(apei_estatus_section_flag_strs));
+ if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+ printk("%s""fru_id: %pUl\n", pfx, (uuid_le *)gdata->fru_id);
+ if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+ printk("%s""fru_text: %.20s\n", pfx, gdata->fru_text);
+
+ if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
+ struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
+ printk("%s""section_type: general processor error\n", pfx);
+ if (gdata->error_data_length >= sizeof(*proc_err))
+ cper_print_proc_generic(pfx, proc_err);
+ else
+ goto err_section_too_small;
+ } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
+ struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+ printk("%s""section_type: memory error\n", pfx);
+ if (gdata->error_data_length >= sizeof(*mem_err))
+ cper_print_mem(pfx, mem_err);
+ else
+ goto err_section_too_small;
+ } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
+ struct cper_sec_pcie *pcie = (void *)(gdata + 1);
+ printk("%s""section_type: PCIe error\n", pfx);
+ if (gdata->error_data_length >= sizeof(*pcie))
+ cper_print_pcie(pfx, pcie);
+ else
+ goto err_section_too_small;
+ } else
+ printk("%s""section type: unknown, %pUl\n", pfx, sec_type);
+
+ return;
+
+err_section_too_small:
+ pr_err(FW_WARN "error section length is too small\n");
+}
+
+void apei_estatus_print(const char *pfx,
+ const struct acpi_hest_generic_status *estatus)
+{
+ struct acpi_hest_generic_data *gdata;
+ unsigned int data_len, gedata_len;
+ int sec_no = 0;
+ __u16 severity;
+
+ printk("%s""APEI generic hardware error status\n", pfx);
+ severity = estatus->error_severity;
+ printk("%s""severity: %d, %s\n", pfx, severity,
+ cper_severity_str(severity));
+ data_len = estatus->data_length;
+ gdata = (struct acpi_hest_generic_data *)(estatus + 1);
+ while (data_len > sizeof(*gdata)) {
+ gedata_len = gdata->error_data_length;
+ apei_estatus_print_section(pfx, gdata, sec_no);
+ data_len -= gedata_len + sizeof(*gdata);
+ sec_no++;
+ }
+}
+EXPORT_SYMBOL_GPL(apei_estatus_print);
+
int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus)
{
if (estatus->data_length &&
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH -v4 3/3] ACPI, APEI, Report GHES error information via printk
2010-12-07 2:22 [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-07 2:22 ` [PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition Huang Ying
2010-12-07 2:22 ` [PATCH -v4 2/3] ACPI, APEI, Add APEI generic error status printing support Huang Ying
@ 2010-12-07 2:22 ` Huang Ying
2010-12-14 4:43 ` [PATCH -v4 0/3] " Len Brown
3 siblings, 0 replies; 5+ messages in thread
From: Huang Ying @ 2010-12-07 2:22 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
printk is one of the methods to report hardware errors to user space.
This patch implements hardware error reporting for GHES via printk.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
drivers/acpi/apei/ghes.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -43,6 +43,7 @@
#include <linux/kdebug.h>
#include <linux/platform_device.h>
#include <linux/mutex.h>
+#include <linux/ratelimit.h>
#include <acpi/apei.h>
#include <acpi/atomicio.h>
#include <acpi/hed.h>
@@ -255,11 +256,26 @@ static void ghes_do_proc(struct ghes *gh
}
#endif
}
+}
+
+static void ghes_print_estatus(const char *pfx, struct ghes *ghes)
+{
+ /* Not more than 2 messages every 5 seconds */
+ static DEFINE_RATELIMIT_STATE(ratelimit, 5*HZ, 2);
- if (!processed && printk_ratelimit())
- pr_warning(GHES_PFX
- "Unknown error record from generic hardware error source: %d\n",
- ghes->generic->header.source_id);
+ if (pfx == NULL) {
+ if (ghes_severity(ghes->estatus->error_severity) <=
+ GHES_SEV_CORRECTED)
+ pfx = KERN_WARNING HW_ERR;
+ else
+ pfx = KERN_ERR HW_ERR;
+ }
+ if (__ratelimit(&ratelimit)) {
+ printk(
+ "%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
+ pfx, ghes->generic->header.source_id);
+ apei_estatus_print(pfx, ghes->estatus);
+ }
}
static int ghes_proc(struct ghes *ghes)
@@ -269,6 +285,7 @@ static int ghes_proc(struct ghes *ghes)
rc = ghes_read_estatus(ghes, 0);
if (rc)
goto out;
+ ghes_print_estatus(NULL, ghes);
ghes_do_proc(ghes);
out:
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk
2010-12-07 2:22 [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk Huang Ying
` (2 preceding siblings ...)
2010-12-07 2:22 ` [PATCH -v4 3/3] ACPI, APEI, Report GHES error information via printk Huang Ying
@ 2010-12-14 4:43 ` Len Brown
3 siblings, 0 replies; 5+ messages in thread
From: Len Brown @ 2010-12-14 4:43 UTC (permalink / raw)
To: Huang Ying
Cc: linux-kernel, Andi Kleen, Tony Luck, linux-acpi, Peter Zijlstra,
Andrew Morton, Linus Torvalds, Ingo Molnar
On Tue, 7 Dec 2010, Huang Ying wrote:
> The recent consensus is that hardware error should be reported via printk.
> This patch adds printk support to APEI GHES.
>
> v4:
>
> - Remove pr_pfx, use printk directly
v4 applied to acpi-test
thanks,
Len Brown, Intel Open Source Technology Center
>
> v3:
>
> - Add document for output format per Andrew's comments
> - Move pr_pfx into printk.h, hope it is useful for others too
> - Fixes some issues according to comments
>
> v2:
>
> - Some minor adjustment of PCIe error section definition and printk format
>
> [PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition
> [PATCH -v4 2/3] ACPI, APEI, Add APEI generic error status printing support
> [PATCH -v4 3/3] ACPI, APEI, Report GHES error information via printk
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-12-14 4:43 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-07 2:22 [PATCH -v4 0/3] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-07 2:22 ` [PATCH -v4 1/3] Add CPER PCIe error section structure and constants definition Huang Ying
2010-12-07 2:22 ` [PATCH -v4 2/3] ACPI, APEI, Add APEI generic error status printing support Huang Ying
2010-12-07 2:22 ` [PATCH -v4 3/3] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-14 4:43 ` [PATCH -v4 0/3] " Len Brown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox