* [PATCH -v3 0/4] ACPI, APEI, Report GHES error information via printk
@ 2010-12-02 8:05 Huang Ying
2010-12-02 8:05 ` [PATCH -v3 1/4] printk, Add pr_pfx for library functions Huang Ying
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-02 8:05 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
The recent consensus is that hardware errors should be reported via printk.
This patch adds printk support to APEI GHES.
v3:
- Add documentation for the output format per Andrew's comments
- Move pr_pfx into printk.h, in the hope that it is useful to others too
- Fix some issues according to comments
v2:
- Some minor adjustments to the PCIe error section definition and printk format
[PATCH -v3 1/4] printk, Add pr_pfx for library functions
[PATCH -v3 2/4] Add CPER PCIe error section structure and constants definition
[PATCH -v3 3/4] ACPI, APEI, Add APEI generic error status printing support
[PATCH -v3 4/4] ACPI, APEI, Report GHES error information via printk
* [PATCH -v3 1/4] printk, Add pr_pfx for library functions
2010-12-02 8:05 [PATCH -v3 0/4] ACPI, APEI, Report GHES error information via printk Huang Ying
@ 2010-12-02 8:05 ` Huang Ying
2010-12-02 8:37 ` Joe Perches
2010-12-02 8:05 ` [PATCH -v3 2/4] Add CPER PCIe error section structure and constants definition Huang Ying
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Huang Ying @ 2010-12-02 8:05 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
For library functions doing printk, the log level and line prefix
usually need to be specified by the caller. So this patch adds
"pr_pfx" to make the life of these library functions easier.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
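For illustration only, a minimal userspace sketch of the idea behind
pr_pfx: the caller supplies the log level and line prefix, and the
library helper simply passes it through. snprintf stands in for printk
here, and demo_fmt_pfx, demo_format_node and the prefix string used in
the note below are hypothetical names chosen for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for the proposed macro (hypothetical name).
 * Like pr_pfx, it relies on string-literal concatenation, so fmt
 * must be a string literal; ##__VA_ARGS__ is a GNU C extension.
 */
#define demo_fmt_pfx(buf, size, pfx, fmt, ...) \
	snprintf(buf, size, "%s" fmt, pfx, ##__VA_ARGS__)

/*
 * A "library" helper: it does not decide the level or prefix itself,
 * it only forwards whatever the caller passed in.
 */
static int demo_format_node(char *buf, size_t size, const char *pfx, int node)
{
	return demo_fmt_pfx(buf, size, pfx, "node: %d\n", node);
}
```

Calling demo_format_node(buf, sizeof(buf), "<4>[Hardware Error]: ", 3)
leaves "<4>[Hardware Error]: node: 3\n" in buf. The prefix is an
ordinary run-time string, which is why it is passed through "%s"
rather than pasted into the format.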
include/linux/printk.h | 7 +++++++
1 file changed, 7 insertions(+)
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -202,6 +202,13 @@ extern void print_hex_dump_bytes(const c
#endif
/*
+ * Used by library functions, where log level and line prefix need to
+ * be specified by the caller.
+ */
+#define pr_pfx(pfx, fmt, ...) \
+ printk("%s" fmt, pfx, ##__VA_ARGS__)
+
+/*
* ratelimited messages with local ratelimit_state,
* no local ratelimit_state used in the !PRINTK case
*/
* [PATCH -v3 2/4] Add CPER PCIe error section structure and constants definition
2010-12-02 8:05 [PATCH -v3 0/4] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-02 8:05 ` [PATCH -v3 1/4] printk, Add pr_pfx for library functions Huang Ying
@ 2010-12-02 8:05 ` Huang Ying
2010-12-02 8:05 ` [PATCH -v3 3/4] ACPI, APEI, Add APEI generic error status printing support Huang Ying
2010-12-02 8:05 ` [PATCH -v3 4/4] ACPI, APEI, Report GHES error information via printk Huang Ying
3 siblings, 0 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-02 8:05 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
On some machines, PCIe errors are reported via APEI (ACPI Platform Error
Interface). The error data is passed from firmware to Linux via the CPER
PCIe error section structure.
This patch adds the CPER PCIe error section structure and the related
constant definitions.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
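For illustration only, a small sketch of how a consumer might combine
the validation bits with the slot field. The DEMO_* constants mirror
CPER_PCIE_VALID_DEVICE_ID and CPER_PCIE_SLOT_SHIFT from this patch;
demo_pcie_slot and its -1 return convention are assumptions made for
the example.

```c
#include <stdint.h>

#define DEMO_PCIE_VALID_DEVICE_ID	0x0008	/* mirrors CPER_PCIE_VALID_DEVICE_ID */
#define DEMO_PCIE_SLOT_SHIFT		3	/* mirrors CPER_PCIE_SLOT_SHIFT */

/*
 * Return the slot number taken from the raw 16-bit slot field, or -1
 * when the device_id group is not marked valid in validation_bits.
 */
static int demo_pcie_slot(uint64_t validation_bits, uint16_t raw_slot)
{
	if (!(validation_bits & DEMO_PCIE_VALID_DEVICE_ID))
		return -1;
	return raw_slot >> DEMO_PCIE_SLOT_SHIFT;
}
```

With the device_id bit set, a raw slot value of 0x0028 gives slot 5.
Without that bit the field is not trusted at all and the sketch
returns -1.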
include/linux/cper.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 82 insertions(+), 4 deletions(-)
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -39,10 +39,12 @@
* Severity difinition for error_severity in struct cper_record_header
* and section_severity in struct cper_section_descriptor
*/
-#define CPER_SEV_RECOVERABLE 0x0
-#define CPER_SEV_FATAL 0x1
-#define CPER_SEV_CORRECTED 0x2
-#define CPER_SEV_INFORMATIONAL 0x3
+enum {
+ CPER_SEV_RECOVERABLE,
+ CPER_SEV_FATAL,
+ CPER_SEV_CORRECTED,
+ CPER_SEV_INFORMATIONAL,
+};
/*
* Validation bits difinition for validation_bits in struct
@@ -201,6 +203,47 @@
UUID_LE(0x036F84E1, 0x7F37, 0x428c, 0xA7, 0x9E, 0x57, 0x5F, \
0xDF, 0xAA, 0x84, 0xEC)
+#define CPER_PROC_VALID_TYPE 0x0001
+#define CPER_PROC_VALID_ISA 0x0002
+#define CPER_PROC_VALID_ERROR_TYPE 0x0004
+#define CPER_PROC_VALID_OPERATION 0x0008
+#define CPER_PROC_VALID_FLAGS 0x0010
+#define CPER_PROC_VALID_LEVEL 0x0020
+#define CPER_PROC_VALID_VERSION 0x0040
+#define CPER_PROC_VALID_BRAND_INFO 0x0080
+#define CPER_PROC_VALID_ID 0x0100
+#define CPER_PROC_VALID_TARGET_ADDRESS 0x0200
+#define CPER_PROC_VALID_REQUESTOR_ID 0x0400
+#define CPER_PROC_VALID_RESPONDER_ID 0x0800
+#define CPER_PROC_VALID_IP 0x1000
+
+#define CPER_MEM_VALID_ERROR_STATUS 0x0001
+#define CPER_MEM_VALID_PHYSICAL_ADDRESS 0x0002
+#define CPER_MEM_VALID_PHYSICAL_ADDRESS_MASK 0x0004
+#define CPER_MEM_VALID_NODE 0x0008
+#define CPER_MEM_VALID_CARD 0x0010
+#define CPER_MEM_VALID_MODULE 0x0020
+#define CPER_MEM_VALID_BANK 0x0040
+#define CPER_MEM_VALID_DEVICE 0x0080
+#define CPER_MEM_VALID_ROW 0x0100
+#define CPER_MEM_VALID_COLUMN 0x0200
+#define CPER_MEM_VALID_BIT_POSITION 0x0400
+#define CPER_MEM_VALID_REQUESTOR_ID 0x0800
+#define CPER_MEM_VALID_RESPONDER_ID 0x1000
+#define CPER_MEM_VALID_TARGET_ID 0x2000
+#define CPER_MEM_VALID_ERROR_TYPE 0x4000
+
+#define CPER_PCIE_VALID_PORT_TYPE 0x0001
+#define CPER_PCIE_VALID_VERSION 0x0002
+#define CPER_PCIE_VALID_COMMAND_STATUS 0x0004
+#define CPER_PCIE_VALID_DEVICE_ID 0x0008
+#define CPER_PCIE_VALID_SERIAL_NUMBER 0x0010
+#define CPER_PCIE_VALID_BRIDGE_CONTROL_STATUS 0x0020
+#define CPER_PCIE_VALID_CAPABILITY 0x0040
+#define CPER_PCIE_VALID_AER_INFO 0x0080
+
+#define CPER_PCIE_SLOT_SHIFT 3
+
/*
* All tables and structs must be byte-packed to match CPER
* specification, since the tables are provided by the system BIOS
@@ -306,6 +349,41 @@ struct cper_sec_mem_err {
__u8 error_type;
};
+struct cper_sec_pcie {
+ __u64 validation_bits;
+ __u32 port_type;
+ struct {
+ __u8 minor;
+ __u8 major;
+ __u8 reserved[2];
+ } version;
+ __u16 command;
+ __u16 status;
+ __u32 reserved;
+ struct {
+ __u16 vendor_id;
+ __u16 device_id;
+ __u8 class_code[3];
+ __u8 function;
+ __u8 device;
+ __u16 segment;
+ __u8 bus;
+ __u8 secondary_bus;
+ __u16 slot;
+ __u8 reserved;
+ } device_id;
+ struct {
+ __u32 lower;
+ __u32 upper;
+ } serial_number;
+ struct {
+ __u16 secondary_status;
+ __u16 control;
+ } bridge;
+ __u8 capability[60];
+ __u8 aer_info[96];
+};
+
/* Reset to default packing */
#pragma pack()
* [PATCH -v3 3/4] ACPI, APEI, Add APEI generic error status printing support
2010-12-02 8:05 [PATCH -v3 0/4] ACPI, APEI, Report GHES error information via printk Huang Ying
2010-12-02 8:05 ` [PATCH -v3 1/4] printk, Add pr_pfx for library functions Huang Ying
2010-12-02 8:05 ` [PATCH -v3 2/4] Add CPER PCIe error section structure and constants definition Huang Ying
@ 2010-12-02 8:05 ` Huang Ying
2010-12-02 8:05 ` [PATCH -v3 4/4] ACPI, APEI, Report GHES error information via printk Huang Ying
3 siblings, 0 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-02 8:05 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
In APEI, hardware error information reported by firmware to the Linux
kernel is carried in the APEI generic error status data structure
(struct acpi_hest_generic_status), and the Linux kernel now uses printk
to report hardware error information to user space.
This patch therefore adds printing support for that data structure, so
that the corresponding hardware error information can be reported to
user space via printk.
PCIe AER information printing is not implemented yet. The original PCIe
AER information printing code will be refactored to avoid code duplication.
The output format is as follows:
<error record> :=
APEI generic hardware error status
severity: <integer>, <severity string>
section: <integer>, severity: <integer>, <severity string>
flags: <integer>
<section flags strings>
fru_id: <uuid string>
fru_text: <string>
section_type: <section type string>
<section data>
<severity string>* := recoverable | fatal | corrected | info
<section flags strings># :=
[primary][, containment warning][, reset][, threshold exceeded]\
[, resource not accessible][, latent error]
<section type string> := generic processor error | memory error | \
PCIe error | unknown, <uuid string>
<section data> :=
<generic processor section data> | <memory section data> | \
<pcie section data> | <null>
<generic processor section data> :=
[processor_type: <integer>, <proc type string>]
[processor_isa: <integer>, <proc isa string>]
[error_type: <integer>
<proc error type strings>]
[operation: <integer>, <proc operation string>]
[flags: <integer>
<proc flags strings>]
[level: <integer>]
[version_info: <integer>]
[processor_id: <integer>]
[target_address: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[IP: <integer>]
<proc type string>* := IA32/X64 | IA64
<proc isa string>* := IA32 | IA64 | X64
<proc error type strings># :=
[cache error][, TLB error][, bus error][, micro-architectural error]
<proc operation string>* := unknown or generic | data read | data write | \
instruction execution
<proc flags strings># :=
[restartable][, precise IP][, overflow][, corrected]
<memory section data> :=
[error_status: <integer>]
[physical_address: <integer>]
[physical_address_mask: <integer>]
[node: <integer>]
[card: <integer>]
[module: <integer>]
[bank: <integer>]
[device: <integer>]
[row: <integer>]
[column: <integer>]
[bit_position: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[target_id: <integer>]
[error_type: <integer>, <mem error type string>]
<mem error type string>* :=
unknown | no error | single-bit ECC | multi-bit ECC | \
single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
target abort | parity error | watchdog timeout | invalid address | \
mirror Broken | memory sparing | scrub corrected error | \
scrub uncorrected error
<pcie section data> :=
[port_type: <integer>, <pcie port type string>]
[version: <integer>.<integer>]
[command: <integer>, status: <integer>]
[device_id: <integer>:<integer>:<integer>.<integer>
slot: <integer>
secondary_bus: <integer>
vendor_id: <integer>, device_id: <integer>
class_code: <integer>]
[serial number: <integer>, <integer>]
[bridge: secondary_status: <integer>, control: <integer>]
<pcie port type string>* := PCIe end point | legacy PCI end point | \
unknown | unknown | root port | upstream switch port | \
downstream switch port | PCIe to PCI/PCI-X bridge | \
PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
root complex event collector
where [] designates that the corresponding content is optional.
Every <field string> description marked with * has the following format:
field: <integer>, <field string>
where the value of <integer> is the position of "string" in the <field
string> description. Otherwise, <field string> will be "unknown".
Every <field strings> description marked with # has the following format:
field: <integer>
<field strings>
where each string in <field strings> corresponds to one set bit of
<integer>. The bit position is the position of "string" in the <field
strings> description.
For a more detailed explanation of every field, please refer to the UEFI
specification, version 2.3 or later, Appendix N: Common Platform Error
Record.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
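For illustration only, a userspace sketch of the two formatting rules
described above: the "*" rule (an integer used as an index into a
string table, with "unknown" as the fallback) and the "#" rule (one
string per set bit). The string tables are copied from the format
description; the demo_* names and the comma-joined single-line output
are assumptions made for this example and do not reproduce the line
wrapping done by cper_print_bits.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char *const demo_severity_strs[] = {
	"recoverable", "fatal", "corrected", "info",
};

static const char *const demo_section_flag_strs[] = {
	"primary", "containment warning", "reset", "threshold exceeded",
	"resource not accessible", "latent error",
};

/* "*" rule: value is a position in strs[], anything else is "unknown". */
static const char *demo_index_str(const char *const strs[], size_t n,
				  unsigned int value)
{
	return value < n ? strs[value] : "unknown";
}

/* "#" rule: append the string for every set bit, separated by ", ". */
static void demo_bits_str(char *buf, size_t size, const char *const strs[],
			  size_t n, unsigned int bits)
{
	size_t i, len = 0;

	if (size == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < n && i < 32; i++) {
		int w;

		if (!(bits & (1U << i)))
			continue;
		w = snprintf(buf + len, size - len, "%s%s",
			     len ? ", " : "", strs[i]);
		if (w < 0 || (size_t)w >= size - len)
			break;	/* output was truncated, stop here */
		len += (size_t)w;
	}
}
```

With these tables, severity 1 maps to "fatal", an out-of-range
severity maps to "unknown", and section flags 0x05 produce
"primary, reset".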
Documentation/acpi/apei/output_format.txt | 122 +++++++++++
drivers/acpi/apei/apei-internal.h | 2
drivers/acpi/apei/cper.c | 309 ++++++++++++++++++++++++++++++
3 files changed, 433 insertions(+)
create mode 100644 Documentation/acpi/apei/output_format.txt
--- /dev/null
+++ b/Documentation/acpi/apei/output_format.txt
@@ -0,0 +1,122 @@
+ APEI output format
+ ~~~~~~~~~~~~~~~~~~
+
+APEI uses printk as its hardware error reporting interface. The output
+format is as follows.
+
+<error record> :=
+APEI generic hardware error status
+severity: <integer>, <severity string>
+section: <integer>, severity: <integer>, <severity string>
+flags: <integer>
+<section flags strings>
+fru_id: <uuid string>
+fru_text: <string>
+section_type: <section type string>
+<section data>
+
+<severity string>* := recoverable | fatal | corrected | info
+
+<section flags strings># :=
+[primary][, containment warning][, reset][, threshold exceeded]\
+[, resource not accessible][, latent error]
+
+<section type string> := generic processor error | memory error | \
+PCIe error | unknown, <uuid string>
+
+<section data> :=
+<generic processor section data> | <memory section data> | \
+<pcie section data> | <null>
+
+<generic processor section data> :=
+[processor_type: <integer>, <proc type string>]
+[processor_isa: <integer>, <proc isa string>]
+[error_type: <integer>
+<proc error type strings>]
+[operation: <integer>, <proc operation string>]
+[flags: <integer>
+<proc flags strings>]
+[level: <integer>]
+[version_info: <integer>]
+[processor_id: <integer>]
+[target_address: <integer>]
+[requestor_id: <integer>]
+[responder_id: <integer>]
+[IP: <integer>]
+
+<proc type string>* := IA32/X64 | IA64
+
+<proc isa string>* := IA32 | IA64 | X64
+
+<proc error type strings># :=
+[cache error][, TLB error][, bus error][, micro-architectural error]
+
+<proc operation string>* := unknown or generic | data read | data write | \
+instruction execution
+
+<proc flags strings># :=
+[restartable][, precise IP][, overflow][, corrected]
+
+<memory section data> :=
+[error_status: <integer>]
+[physical_address: <integer>]
+[physical_address_mask: <integer>]
+[node: <integer>]
+[card: <integer>]
+[module: <integer>]
+[bank: <integer>]
+[device: <integer>]
+[row: <integer>]
+[column: <integer>]
+[bit_position: <integer>]
+[requestor_id: <integer>]
+[responder_id: <integer>]
+[target_id: <integer>]
+[error_type: <integer>, <mem error type string>]
+
+<mem error type string>* :=
+unknown | no error | single-bit ECC | multi-bit ECC | \
+single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
+target abort | parity error | watchdog timeout | invalid address | \
+mirror Broken | memory sparing | scrub corrected error | \
+scrub uncorrected error
+
+<pcie section data> :=
+[port_type: <integer>, <pcie port type string>]
+[version: <integer>.<integer>]
+[command: <integer>, status: <integer>]
+[device_id: <integer>:<integer>:<integer>.<integer>
+slot: <integer>
+secondary_bus: <integer>
+vendor_id: <integer>, device_id: <integer>
+class_code: <integer>]
+[serial number: <integer>, <integer>]
+[bridge: secondary_status: <integer>, control: <integer>]
+
+<pcie port type string>* := PCIe end point | legacy PCI end point | \
+unknown | unknown | root port | upstream switch port | \
+downstream switch port | PCIe to PCI/PCI-X bridge | \
+PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
+root complex event collector
+
+where [] designates that the corresponding content is optional.
+
+Every <field string> description marked with * has the following format:
+
+field: <integer>, <field string>
+
+where the value of <integer> is the position of "string" in the <field
+string> description. Otherwise, <field string> will be "unknown".
+
+Every <field strings> description marked with # has the following format:
+
+field: <integer>
+<field strings>
+
+where each string in <field strings> corresponds to one set bit of
+<integer>. The bit position is the position of "string" in the <field
+strings> description.
+
+For a more detailed explanation of every field, please refer to the UEFI
+specification, version 2.3 or later, Appendix N: Common
+Platform Error Record.
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -109,6 +109,8 @@ static inline u32 apei_estatus_len(struc
return sizeof(*estatus) + estatus->data_length;
}
+void apei_estatus_print(const char *pfx,
+ const struct acpi_hest_generic_status *estatus);
int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus);
int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
#endif
--- a/drivers/acpi/apei/cper.c
+++ b/drivers/acpi/apei/cper.c
@@ -46,6 +46,315 @@ u64 cper_next_record_id(void)
}
EXPORT_SYMBOL_GPL(cper_next_record_id);
+static const char *cper_severity_strs[] = {
+ "recoverable",
+ "fatal",
+ "corrected",
+ "info",
+};
+
+static const char *cper_severity_str(unsigned int severity)
+{
+ return severity < ARRAY_SIZE(cper_severity_strs) ?
+ cper_severity_strs[severity] : "unknown";
+}
+
+/*
+ * cper_print_bits - print strings for set bits
+ * @pfx: prefix for each line, including log level and prefix string
+ * @bits: bit mask
+ * @strs: string array, indexed by bit position
+ * @strs_size: size of the string array: @strs
+ *
+ * For each set bit in @bits, print the corresponding string in @strs.
+ * If the output length is longer than 80, multiple lines will be
+ * printed, with @pfx printed at the beginning of each line.
+ */
+static void cper_print_bits(const char *pfx, unsigned int bits,
+ const char *strs[], unsigned int strs_size)
+{
+ int i, len = 0;
+ const char *str;
+ char buf[84];
+
+ for (i = 0; i < strs_size; i++) {
+ if (!(bits & (1U << i)))
+ continue;
+ str = strs[i];
+ if (len && len + strlen(str) + 2 > 80) {
+ printk("%s\n", buf);
+ len = 0;
+ }
+ if (!len)
+ len = snprintf(buf, sizeof(buf), "%s%s", pfx, str);
+ else
+ len += snprintf(buf+len, sizeof(buf)-len, ", %s", str);
+ }
+ if (len)
+ printk("%s\n", buf);
+}
+
+static const char *cper_proc_type_strs[] = {
+ "IA32/X64",
+ "IA64",
+};
+
+static const char *cper_proc_isa_strs[] = {
+ "IA32",
+ "IA64",
+ "X64",
+};
+
+static const char *cper_proc_error_type_strs[] = {
+ "cache error",
+ "TLB error",
+ "bus error",
+ "micro-architectural error",
+};
+
+static const char *cper_proc_op_strs[] = {
+ "unknown or generic",
+ "data read",
+ "data write",
+ "instruction execution",
+};
+
+static const char *cper_proc_flag_strs[] = {
+ "restartable",
+ "precise IP",
+ "overflow",
+ "corrected",
+};
+
+static void cper_print_proc_generic(const char *pfx,
+ const struct cper_sec_proc_generic *proc)
+{
+ if (proc->validation_bits & CPER_PROC_VALID_TYPE)
+ pr_pfx(pfx, "processor_type: %d, %s\n", proc->proc_type,
+ proc->proc_type < ARRAY_SIZE(cper_proc_type_strs) ?
+ cper_proc_type_strs[proc->proc_type] : "unknown");
+ if (proc->validation_bits & CPER_PROC_VALID_ISA)
+ pr_pfx(pfx, "processor_isa: %d, %s\n", proc->proc_isa,
+ proc->proc_isa < ARRAY_SIZE(cper_proc_isa_strs) ?
+ cper_proc_isa_strs[proc->proc_isa] : "unknown");
+ if (proc->validation_bits & CPER_PROC_VALID_ERROR_TYPE) {
+ pr_pfx(pfx, "error_type: 0x%02x\n", proc->proc_error_type);
+ cper_print_bits(pfx, proc->proc_error_type,
+ cper_proc_error_type_strs,
+ ARRAY_SIZE(cper_proc_error_type_strs));
+ }
+ if (proc->validation_bits & CPER_PROC_VALID_OPERATION)
+ pr_pfx(pfx, "operation: %d, %s\n", proc->operation,
+ proc->operation < ARRAY_SIZE(cper_proc_op_strs) ?
+ cper_proc_op_strs[proc->operation] : "unknown");
+ if (proc->validation_bits & CPER_PROC_VALID_FLAGS) {
+ pr_pfx(pfx, "flags: 0x%02x\n", proc->flags);
+ cper_print_bits(pfx, proc->flags, cper_proc_flag_strs,
+ ARRAY_SIZE(cper_proc_flag_strs));
+ }
+ if (proc->validation_bits & CPER_PROC_VALID_LEVEL)
+ pr_pfx(pfx, "level: %d\n", proc->level);
+ if (proc->validation_bits & CPER_PROC_VALID_VERSION)
+ pr_pfx(pfx, "version_info: 0x%016llx\n", proc->cpu_version);
+ if (proc->validation_bits & CPER_PROC_VALID_ID)
+ pr_pfx(pfx, "processor_id: 0x%016llx\n", proc->proc_id);
+ if (proc->validation_bits & CPER_PROC_VALID_TARGET_ADDRESS)
+ pr_pfx(pfx, "target_address: 0x%016llx\n",
+ proc->target_addr);
+ if (proc->validation_bits & CPER_PROC_VALID_REQUESTOR_ID)
+ pr_pfx(pfx, "requestor_id: 0x%016llx\n", proc->requestor_id);
+ if (proc->validation_bits & CPER_PROC_VALID_RESPONDER_ID)
+ pr_pfx(pfx, "responder_id: 0x%016llx\n", proc->responder_id);
+ if (proc->validation_bits & CPER_PROC_VALID_IP)
+ pr_pfx(pfx, "IP: 0x%016llx\n", proc->ip);
+}
+
+static const char *cper_mem_err_type_strs[] = {
+ "unknown",
+ "no error",
+ "single-bit ECC",
+ "multi-bit ECC",
+ "single-symbol chipkill ECC",
+ "multi-symbol chipkill ECC",
+ "master abort",
+ "target abort",
+ "parity error",
+ "watchdog timeout",
+ "invalid address",
+ "mirror Broken",
+ "memory sparing",
+ "scrub corrected error",
+ "scrub uncorrected error",
+};
+
+static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+{
+ if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
+ pr_pfx(pfx, "error_status: 0x%016llx\n", mem->error_status);
+ if (mem->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)
+ pr_pfx(pfx, "physical_address: 0x%016llx\n",
+ mem->physical_addr);
+ if (mem->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS_MASK)
+ pr_pfx(pfx, "physical_address_mask: 0x%016llx\n",
+ mem->physical_addr_mask);
+ if (mem->validation_bits & CPER_MEM_VALID_NODE)
+ pr_pfx(pfx, "node: %d\n", mem->node);
+ if (mem->validation_bits & CPER_MEM_VALID_CARD)
+ pr_pfx(pfx, "card: %d\n", mem->card);
+ if (mem->validation_bits & CPER_MEM_VALID_MODULE)
+ pr_pfx(pfx, "module: %d\n", mem->module);
+ if (mem->validation_bits & CPER_MEM_VALID_BANK)
+ pr_pfx(pfx, "bank: %d\n", mem->bank);
+ if (mem->validation_bits & CPER_MEM_VALID_DEVICE)
+ pr_pfx(pfx, "device: %d\n", mem->device);
+ if (mem->validation_bits & CPER_MEM_VALID_ROW)
+ pr_pfx(pfx, "row: %d\n", mem->row);
+ if (mem->validation_bits & CPER_MEM_VALID_COLUMN)
+ pr_pfx(pfx, "column: %d\n", mem->column);
+ if (mem->validation_bits & CPER_MEM_VALID_BIT_POSITION)
+ pr_pfx(pfx, "bit_position: %d\n", mem->bit_pos);
+ if (mem->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
+ pr_pfx(pfx, "requestor_id: 0x%016llx\n", mem->requestor_id);
+ if (mem->validation_bits & CPER_MEM_VALID_RESPONDER_ID)
+ pr_pfx(pfx, "responder_id: 0x%016llx\n", mem->responder_id);
+ if (mem->validation_bits & CPER_MEM_VALID_TARGET_ID)
+ pr_pfx(pfx, "target_id: 0x%016llx\n", mem->target_id);
+ if (mem->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
+ u8 etype = mem->error_type;
+ pr_pfx(pfx, "error_type: %d, %s\n", etype,
+ etype < ARRAY_SIZE(cper_mem_err_type_strs) ?
+ cper_mem_err_type_strs[etype] : "unknown");
+ }
+}
+
+static const char *cper_pcie_port_type_strs[] = {
+ "PCIe end point",
+ "legacy PCI end point",
+ "unknown",
+ "unknown",
+ "root port",
+ "upstream switch port",
+ "downstream switch port",
+ "PCIe to PCI/PCI-X bridge",
+ "PCI/PCI-X to PCIe bridge",
+ "root complex integrated endpoint device",
+ "root complex event collector",
+};
+
+static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie)
+{
+ if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
+ pr_pfx(pfx, "port_type: %d, %s\n", pcie->port_type,
+ pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
+ cper_pcie_port_type_strs[pcie->port_type] : "unknown");
+ if (pcie->validation_bits & CPER_PCIE_VALID_VERSION)
+ pr_pfx(pfx, "version: %d.%d\n",
+ pcie->version.major, pcie->version.minor);
+ if (pcie->validation_bits & CPER_PCIE_VALID_COMMAND_STATUS)
+ pr_pfx(pfx, "command: 0x%04x, status: 0x%04x\n",
+ pcie->command, pcie->status);
+ if (pcie->validation_bits & CPER_PCIE_VALID_DEVICE_ID) {
+ const __u8 *p;
+ pr_pfx(pfx, "device_id: %04x:%02x:%02x.%x\n",
+ pcie->device_id.segment, pcie->device_id.bus,
+ pcie->device_id.device, pcie->device_id.function);
+ pr_pfx(pfx, "slot: %d\n",
+ pcie->device_id.slot >> CPER_PCIE_SLOT_SHIFT);
+ pr_pfx(pfx, "secondary_bus: 0x%02x\n",
+ pcie->device_id.secondary_bus);
+ pr_pfx(pfx, "vendor_id: 0x%04x, device_id: 0x%04x\n",
+ pcie->device_id.vendor_id, pcie->device_id.device_id);
+ p = pcie->device_id.class_code;
+ pr_pfx(pfx, "class_code: %02x%02x%02x\n", p[0], p[1], p[2]);
+ }
+ if (pcie->validation_bits & CPER_PCIE_VALID_SERIAL_NUMBER)
+ pr_pfx(pfx, "serial number: 0x%04x, 0x%04x\n",
+ pcie->serial_number.lower, pcie->serial_number.upper);
+ if (pcie->validation_bits & CPER_PCIE_VALID_BRIDGE_CONTROL_STATUS)
+ pr_pfx(pfx,
+ "bridge: secondary_status: 0x%04x, control: 0x%04x\n",
+ pcie->bridge.secondary_status, pcie->bridge.control);
+}
+
+static const char *apei_estatus_section_flag_strs[] = {
+ "primary",
+ "containment warning",
+ "reset",
+ "threshold exceeded",
+ "resource not accessible",
+ "latent error",
+};
+
+static void apei_estatus_print_section(
+ const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
+{
+ uuid_le *sec_type = (uuid_le *)gdata->section_type;
+ __u16 severity;
+
+ severity = gdata->error_severity;
+ pr_pfx(pfx, "section: %d, severity: %d, %s\n", sec_no, severity,
+ cper_severity_str(severity));
+ pr_pfx(pfx, "flags: 0x%02x\n", gdata->flags);
+ cper_print_bits(pfx, gdata->flags, apei_estatus_section_flag_strs,
+ ARRAY_SIZE(apei_estatus_section_flag_strs));
+ if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+ pr_pfx(pfx, "fru_id: %pUl\n", (uuid_le *)gdata->fru_id);
+ if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+ pr_pfx(pfx, "fru_text: %.20s\n", gdata->fru_text);
+
+ if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
+ struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
+ pr_pfx(pfx, "section_type: general processor error\n");
+ if (gdata->error_data_length >= sizeof(*proc_err))
+ cper_print_proc_generic(pfx, proc_err);
+ else
+ goto err_section_too_small;
+ } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
+ struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+ pr_pfx(pfx, "section_type: memory error\n");
+ if (gdata->error_data_length >= sizeof(*mem_err))
+ cper_print_mem(pfx, mem_err);
+ else
+ goto err_section_too_small;
+ } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
+ struct cper_sec_pcie *pcie = (void *)(gdata + 1);
+ pr_pfx(pfx, "section_type: PCIe error\n");
+ if (gdata->error_data_length >= sizeof(*pcie))
+ cper_print_pcie(pfx, pcie);
+ else
+ goto err_section_too_small;
+ } else
+ pr_pfx(pfx, "section type: unknown, %pUl\n", sec_type);
+
+ return;
+
+err_section_too_small:
+ pr_err(FW_WARN "error section length is too small\n");
+}
+
+void apei_estatus_print(const char *pfx,
+ const struct acpi_hest_generic_status *estatus)
+{
+ struct acpi_hest_generic_data *gdata;
+ unsigned int data_len, gedata_len;
+ int sec_no = 0;
+ __u16 severity;
+
+ pr_pfx(pfx, "APEI generic hardware error status\n");
+ severity = estatus->error_severity;
+ pr_pfx(pfx, "severity: %d, %s\n", severity,
+ cper_severity_str(severity));
+ data_len = estatus->data_length;
+ gdata = (struct acpi_hest_generic_data *)(estatus + 1);
+ while (data_len > sizeof(*gdata)) {
+ gedata_len = gdata->error_data_length;
+ apei_estatus_print_section(pfx, gdata, sec_no++);
+ data_len -= gedata_len + sizeof(*gdata);
+ gdata = (void *)(gdata + 1) + gedata_len;
+ }
+}
+EXPORT_SYMBOL_GPL(apei_estatus_print);
+
int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus)
{
if (estatus->data_length &&
* [PATCH -v3 4/4] ACPI, APEI, Report GHES error information via printk
2010-12-02 8:05 [PATCH -v3 0/4] ACPI, APEI, Report GHES error information via printk Huang Ying
` (2 preceding siblings ...)
2010-12-02 8:05 ` [PATCH -v3 3/4] ACPI, APEI, Add APEI generic error status printing support Huang Ying
@ 2010-12-02 8:05 ` Huang Ying
3 siblings, 0 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-02 8:05 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
printk is one of the methods to report hardware errors to user space.
This patch implements hardware error reporting for GHES via printk.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
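For illustration only, a userspace sketch of the "not more than 2
messages every 5 seconds" policy used in this patch. It is a simplified
fixed-window limiter written for this note, not the kernel's
__ratelimit implementation; struct demo_ratelimit, demo_ratelimit_ok
and the use of plain seconds for time are all assumptions.

```c
#include <stdbool.h>

/* Hypothetical fixed-window rate limiter state. */
struct demo_ratelimit {
	long interval;	/* window length, in the caller's time unit */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages already allowed in this window */
	bool started;	/* false until the first call */
};

/* Return true if a message may be emitted at time now. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rl, long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rl->started || now - rl->begin >= rl->interval) {
		rl->started = true;
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}
```

With interval 5 and burst 2, calls at times 0 and 1 are allowed, calls
at 2 and 4 are suppressed, and a call at 5 starts a new window and is
allowed again.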
drivers/acpi/apei/ghes.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -43,6 +43,7 @@
#include <linux/kdebug.h>
#include <linux/platform_device.h>
#include <linux/mutex.h>
+#include <linux/ratelimit.h>
#include <acpi/apei.h>
#include <acpi/atomicio.h>
#include <acpi/hed.h>
@@ -255,11 +256,26 @@ static void ghes_do_proc(struct ghes *gh
}
#endif
}
+}
+
+static void ghes_print_estatus(const char *pfx, struct ghes *ghes)
+{
+ /* Not more than 2 messages every 5 seconds */
+ static DEFINE_RATELIMIT_STATE(ratelimit, 5*HZ, 2);
- if (!processed && printk_ratelimit())
- pr_warning(GHES_PFX
- "Unknown error record from generic hardware error source: %d\n",
- ghes->generic->header.source_id);
+ if (pfx == NULL) {
+ if (ghes_severity(ghes->estatus->error_severity) <=
+ GHES_SEV_CORRECTED)
+ pfx = KERN_WARNING HW_ERR;
+ else
+ pfx = KERN_ERR HW_ERR;
+ }
+ if (__ratelimit(&ratelimit)) {
+ printk(
+ "%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
+ pfx, ghes->generic->header.source_id);
+ apei_estatus_print(pfx, ghes->estatus);
+ }
}
static int ghes_proc(struct ghes *ghes)
@@ -269,6 +285,7 @@ static int ghes_proc(struct ghes *ghes)
rc = ghes_read_estatus(ghes, 0);
if (rc)
goto out;
+ ghes_print_estatus(NULL, ghes);
ghes_do_proc(ghes);
out:
* Re: [PATCH -v3 1/4] printk, Add pr_pfx for library functions
2010-12-02 8:05 ` [PATCH -v3 1/4] printk, Add pr_pfx for library functions Huang Ying
@ 2010-12-02 8:37 ` Joe Perches
2010-12-02 8:54 ` Huang Ying
0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2010-12-02 8:37 UTC (permalink / raw)
To: Huang Ying
Cc: Len Brown, linux-kernel, Andi Kleen, Tony Luck, linux-acpi,
Peter Zijlstra, Andrew Morton, Linus Torvalds, Ingo Molnar
On Thu, 2010-12-02 at 16:05 +0800, Huang Ying wrote:
> For library functions doing printk, the log level and line prefix
> usually need to be specified by the caller. So this patch adds
> "pr_pfx" to make the life of these library functions easier.
[]
> +#define pr_pfx(pfx, fmt, ...) \
> + printk("%s" fmt, pfx, ##__VA_ARGS__)
I think this would be a very error prone style.
pr_<foo> uses are log levels.
For a casual reader, is pfx a new log level or
is it a prefix or is it both? I do understand
what you intend.
Mutability of the KERN_<level> is pretty unusual.
Most all of the uses are ?: style
Here are the first uses in each file that has them:
There aren't many.
$ grep -rP -m1 --include=*.[ch] "KERN_[A-Z]+\s*:\s*KERN" *
arch/avr32/mm/fault.c: is_global_init(tsk) ? KERN_EMERG : KERN_INFO,
arch/sparc/kernel/traps_64.c: (recoverable ? KERN_WARNING : KERN_CRIT), smp_processor_id(),
arch/sparc/mm/fault_64.c: task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
arch/sparc/mm/fault_32.c: task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
arch/x86/kernel/signal.c: task_pid_nr(current) > 1 ? KERN_INFO : KERN_EMERG,
arch/x86/mm/fault.c: task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
arch/parisc/kernel/traps.c: level = user ? KERN_DEBUG : KERN_CRIT;
drivers/pci/pcie/aer/aerdrv_errprint.c: KERN_WARNING : KERN_ERR, dev_driver_string(&pdev->dev), \
include/linux/arcdevice.h: : x < D_DURING ? KERN_INFO : KERN_DEBUG, \
mm/internal.h: printk(level <= MMINIT_WARNING ? KERN_WARNING : KERN_DEBUG); \
sound/core/misc.c: const char *pfx = level ? KERN_DEBUG : KERN_DEFAULT;
I don't see why you can't use something like that
with a simpler flag for level.
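For illustration only, a userspace sketch of the ?: style mentioned
above, where the helper takes a simple flag and picks the level itself
instead of receiving a prefix string. DEMO_LVL_WARNING and DEMO_LVL_ERR
are stand-ins for KERN_WARNING and KERN_ERR, snprintf stands in for
printk, and demo_format_status and its message text are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel log level strings (assumed values). */
#define DEMO_LVL_WARNING	"<4>"
#define DEMO_LVL_ERR		"<3>"

/* The helper chooses the level from a flag rather than taking a prefix. */
static int demo_format_status(char *buf, size_t size, bool corrected,
			      int source_id)
{
	return snprintf(buf, size, "%sHardware error from source: %d\n",
			corrected ? DEMO_LVL_WARNING : DEMO_LVL_ERR,
			source_id);
}
```

The trade-off is that the set of levels is fixed inside the helper,
while a caller-supplied prefix can carry both a level and extra text.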
You also miss a chance to use pr_pfx in ghes_print_estatus:
+ if (__ratelimit(&ratelimit)) {
+ printk(
+ "%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
+ pfx, ghes->generic->header.source_id);
+ apei_estatus_print(pfx, ghes->estatus);
+
ghes_print_estatus is only called with a NULL first argument.
so the code use of pr_pfx seems dubious in any case.
* Re: [PATCH -v3 1/4] printk, Add pr_pfx for library functions
2010-12-02 8:37 ` Joe Perches
@ 2010-12-02 8:54 ` Huang Ying
0 siblings, 0 replies; 7+ messages in thread
From: Huang Ying @ 2010-12-02 8:54 UTC (permalink / raw)
To: Joe Perches
Cc: Len Brown, linux-kernel@vger.kernel.org, Andi Kleen, Luck, Tony,
linux-acpi@vger.kernel.org, Peter Zijlstra, Andrew Morton,
Linus Torvalds, Ingo Molnar
Hi, Joe,
On Thu, 2010-12-02 at 16:37 +0800, Joe Perches wrote:
> On Thu, 2010-12-02 at 16:05 +0800, Huang Ying wrote:
> > For library functions doing printk, the log level and line prefix
> > usually need to be specified by the caller. So this patch adds
> > "pr_pfx" to make the life of these library functions easier.
> []
> > +#define pr_pfx(pfx, fmt, ...) \
> > + printk("%s" fmt, pfx, ##__VA_ARGS__)
>
> I think this would be a very error prone style.
>
> pr_<foo> uses are log levels.
>
> For a casual reader, is pfx a new log level or
> is it a prefix or is it both? I do understand
> what you intend.
Maybe we can rename it to avoid confusion. Any suggestions? Or just
pfx_pr?
> Mutability of the KERN_<level> is pretty unusual.
> Most all of the uses are ?: style
If printk is used in library functions, the log level and some prefix
need to be specified by the caller. That is the intended use case, as
in 2 of the patch series.
> Here are the first uses in each file that has them:
> There aren't many.
>
> $ grep -rP -m1 --include=*.[ch] "KERN_[A-Z]+\s*:\s*KERN" *
> arch/avr32/mm/fault.c: is_global_init(tsk) ? KERN_EMERG : KERN_INFO,
> arch/sparc/kernel/traps_64.c: (recoverable ? KERN_WARNING : KERN_CRIT), smp_processor_id(),
> arch/sparc/mm/fault_64.c: task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
> arch/sparc/mm/fault_32.c: task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
> arch/x86/kernel/signal.c: task_pid_nr(current) > 1 ? KERN_INFO : KERN_EMERG,
> arch/x86/mm/fault.c: task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
> arch/parisc/kernel/traps.c: level = user ? KERN_DEBUG : KERN_CRIT;
> drivers/pci/pcie/aer/aerdrv_errprint.c: KERN_WARNING : KERN_ERR, dev_driver_string(&pdev->dev), \
> include/linux/arcdevice.h: : x < D_DURING ? KERN_INFO : KERN_DEBUG, \
> mm/internal.h: printk(level <= MMINIT_WARNING ? KERN_WARNING : KERN_DEBUG); \
> sound/core/misc.c: const char *pfx = level ? KERN_DEBUG : KERN_DEFAULT;
>
> I don't see why you can't use something like that
> with a simpler flag for level.
That is doable. I just think letting the caller specify the log level
and line prefix is more flexible and uses less code.
> You also miss a chance to use pr_pfx in ghes_print_estatus:
Sorry, I forgot that.
> + if (__ratelimit(&ratelimit)) {
> + printk(
> + "%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
> + pfx, ghes->generic->header.source_id);
> + apei_estatus_print(pfx, ghes->estatus);
> +
>
> ghes_print_estatus is only called with a NULL first argument.
> so the code use of pr_pfx seems dubious in any case.
In some follow-up patches (not sent out yet), ghes_print_estatus may
be called with a non-NULL first argument.
Best Regards,
Huang Ying