* [PATCH 0/3] Add trace event for ghes memory error
@ 2013-08-08 18:27 Naveen N. Rao
  2013-08-08 18:27 ` [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally Naveen N. Rao
                   ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-08 18:27 UTC (permalink / raw)
  To: tony.luck, bp, bhelgaas, rostedt, rjw, lance.ortiz, m.chehab
  Cc: linux-pci, linux-acpi, linux-kernel, Naveen N. Rao
This patch series adds a new trace event for memory errors reported via APEI
generic hardware error source.
- Naveen
Naveen N. Rao (3):
  mce: acpi/apei: trace: Include PCIe AER trace event conditionally
  mce: acpi/apei: trace: Add trace event for ghes memory error
  mce: acpi/apei: trace: Enable ghes memory error trace event
 drivers/acpi/apei/cper.c               |  21 +++--
 drivers/pci/pcie/aer/aerdrv_errprint.c |   1 +
 include/trace/events/ras.h             | 159 ++++++++++++++++++++++++++++++++-
 3 files changed, 175 insertions(+), 6 deletions(-)
-- 
1.8.3.4
^ permalink raw reply	[flat|nested] 65+ messages in thread
* [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally
  2013-08-08 18:27 [PATCH 0/3] Add trace event for ghes memory error Naveen N. Rao
@ 2013-08-08 18:27 ` Naveen N. Rao
  2013-08-08 19:23   ` Steven Rostedt
  2013-08-08 18:27 ` [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error Naveen N. Rao
  2013-08-08 18:27 ` [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event Naveen N. Rao
  2 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-08 18:27 UTC (permalink / raw)
  To: tony.luck, bp, bhelgaas, rostedt, rjw, lance.ortiz, m.chehab
  Cc: linux-pci, linux-acpi, linux-kernel, Naveen N. Rao
Since we'll be adding multiple trace events to ras.h, we need to protect
each block appropriately so that they only get included in the right
places. Update PCIe AER trace event for this purpose.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 drivers/pci/pcie/aer/aerdrv_errprint.c | 1 +
 include/trace/events/ras.h             | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 2c7c9f5..4d06859 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -24,6 +24,7 @@
 #include "aerdrv.h"
 
 #define CREATE_TRACE_POINTS
+#define TRACE_EVENT_PCIE_AER
 #include <trace/events/ras.h>
 
 #define AER_AGENT_RECEIVER		0
diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
index 88b8783..4a66142 100644
--- a/include/trace/events/ras.h
+++ b/include/trace/events/ras.h
@@ -1,7 +1,7 @@
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM ras
 
-#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
+#if (!defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)) && defined(TRACE_EVENT_PCIE_AER)
 #define _TRACE_AER_H
 
 #include <linux/tracepoint.h>
-- 
1.8.3.4
^ permalink raw reply related	[flat|nested] 65+ messages in thread
* [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error
  2013-08-08 18:27 [PATCH 0/3] Add trace event for ghes memory error Naveen N. Rao
  2013-08-08 18:27 ` [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally Naveen N. Rao
@ 2013-08-08 18:27 ` Naveen N. Rao
  2013-08-08 19:17   ` Borislav Petkov
  2013-08-08 18:27 ` [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event Naveen N. Rao
  2 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-08 18:27 UTC (permalink / raw)
  To: tony.luck, bp, bhelgaas, rostedt, rjw, lance.ortiz, m.chehab
  Cc: linux-pci, linux-acpi, linux-kernel, Naveen N. Rao
Add a trace event for memory error event from generic hardware error
source. We expose all members from the generic error status block, the
generic error data and the cper memory error record.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 include/trace/events/ras.h | 157 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)
diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
index 4a66142..1d8d404 100644
--- a/include/trace/events/ras.h
+++ b/include/trace/events/ras.h
@@ -73,5 +73,162 @@ TRACE_EVENT(aer_event,
 
 #endif /* _TRACE_AER_H */
 
+#if (!defined(_TRACE_GHES_H) || defined(TRACE_HEADER_MULTI_READ)) && defined(TRACE_EVENT_GHES)
+#define _TRACE_GHES_H
+
+#include <linux/tracepoint.h>
+
+/* Values for generic error status block_status */
+#define estatus_block_status_strs			\
+	{BIT(0),	"uncorrected error"},		\
+	{BIT(1),	"corrected error"},		\
+	{BIT(2),	"multiple uncorrected errors"},	\
+	{BIT(3),	"multiple corrected errors"}
+
+/* Values for error_severity */
+#define error_severity_strs				\
+	{BIT(0),	"recoverable"},			\
+	{BIT(1),	"fatal"},			\
+	{BIT(2),	"corrected"},			\
+	{BIT(3),	"info"}
+
+/* Values for generic error data flags */
+#define gdata_flags_strs				\
+	{BIT(0),	"primary"},			\
+	{BIT(1),	"containment warning"},		\
+	{BIT(2),	"reset"},			\
+	{BIT(3),	"error threshold exceeded"},	\
+	{BIT(4),	"resource not accessible"},	\
+	{BIT(5),	"latent error"}
+
+/* Values for memory error validation bits */
+#define mem_validation_bits_strs			\
+	{BIT(0),	"ERROR_STATUS"},		\
+	{BIT(1),	"PHYSICAL_ADDRESS"},		\
+	{BIT(2),	"PHYSICAL_ADDRESS_MASK"},	\
+	{BIT(3),	"NODE"},			\
+	{BIT(4),	"CARD"},			\
+	{BIT(5),	"MODULE"},			\
+	{BIT(6),	"BANK"},			\
+	{BIT(7),	"DEVICE"},			\
+	{BIT(8),	"ROW"},				\
+	{BIT(9),	"COLUMN"},			\
+	{BIT(10),	"BIT_POSITION"},		\
+	{BIT(11),	"REQUESTOR_ID"},		\
+	{BIT(12),	"RESPONDER_ID"},		\
+	{BIT(13),	"TARGET_ID"},			\
+	{BIT(14),	"ERROR_TYPE"}
+
+/* Values for memory error type */
+#define __show_mem_error_type(type)			\
+	__print_symbolic(type,				\
+		{0,	"unknown"},			\
+		{1,	"no error"},			\
+		{2,	"single-bit ECC"},		\
+		{3,	"multi-bit ECC"},		\
+		{4,	"single-symbol chipkill ECC"},	\
+		{5,	"multi-symbol chipkill ECC"},	\
+		{6,	"master abort"},		\
+		{7,	"target abort"},		\
+		{8,	"parity error"},		\
+		{9,	"watchdog timeout"},		\
+		{10,	"invalid address"},		\
+		{11,	"mirror broken"},		\
+		{12,	"memory sparing"},		\
+		{13,	"scrub corrected error"},	\
+		{14,	"scrub uncorrected error"})
+
+
+TRACE_EVENT(ghes_platform_memory_event,
+	TP_PROTO(const struct acpi_hest_generic_status *estatus,
+		 const struct acpi_hest_generic_data *gdata,
+		 const struct cper_sec_mem_err *mem),
+
+	TP_ARGS(estatus, gdata, mem),
+
+	TP_STRUCT__entry(
+		__field(	u32,	estatus_block_status		)
+		__field(	u32,	estatus_raw_data_offset		)
+		__field(	u32,	estatus_raw_data_length		)
+		__field(	u32,	estatus_data_length		)
+		__field(	u32,	estatus_error_severity		)
+		__array(	u8,	gdata_section_type,	16	)
+		__field(	u32,	gdata_error_severity		)
+		__field(	u16,	gdata_revision			)
+		__field(	u8,	gdata_validation_bits		)
+		__field(	u8,	gdata_flags			)
+		__field(	u32,	gdata_error_data_length		)
+		__array(	u8,	gdata_fru_id,		16	)
+		__array(	u8,	gdata_fru_text,		20	)
+		__field(	u64,	mem_validation_bits		)
+		__field(	u64,	mem_error_status		)
+		__field(	u64,	mem_physical_addr		)
+		__field(	u64,	mem_physical_addr_mask		)
+		__field(	u16,	mem_node			)
+		__field(	u16,	mem_card			)
+		__field(	u16,	mem_module			)
+		__field(	u16,	mem_bank			)
+		__field(	u16,	mem_device			)
+		__field(	u16,	mem_row				)
+		__field(	u16,	mem_column			)
+		__field(	u16,	mem_bit_pos			)
+		__field(	u64,	mem_requestor_id		)
+		__field(	u64,	mem_responder_id		)
+		__field(	u64,	mem_target_id			)
+		__field(	u8,	mem_error_type			)
+	),
+
+	TP_fast_assign(
+		__entry->estatus_block_status		= estatus->block_status;
+		__entry->estatus_raw_data_offset	= estatus->raw_data_offset;
+		__entry->estatus_raw_data_length	= estatus->raw_data_length;
+		__entry->estatus_data_length		= estatus->data_length;
+		__entry->estatus_error_severity		= estatus->error_severity;
+		memcpy(&__entry->gdata_section_type, &gdata->section_type, 16);
+		__entry->gdata_error_severity		= gdata->error_severity;
+		__entry->gdata_revision			= gdata->revision;
+		__entry->gdata_validation_bits		= gdata->validation_bits;
+		__entry->gdata_flags			= gdata->flags;
+		__entry->gdata_error_data_length	= gdata->error_data_length;
+		memcpy(&__entry->gdata_fru_id, &gdata->fru_id, 16);
+		memcpy(&__entry->gdata_fru_text, &gdata->fru_text, 20);
+		__entry->mem_validation_bits		= mem->validation_bits;
+		__entry->mem_error_status		= mem->error_status;
+		__entry->mem_physical_addr		= mem->physical_addr;
+		__entry->mem_physical_addr_mask		= mem->physical_addr_mask;
+		__entry->mem_node			= mem->node;
+		__entry->mem_card			= mem->card;
+		__entry->mem_module			= mem->module;
+		__entry->mem_bank			= mem->bank;
+		__entry->mem_device			= mem->device;
+		__entry->mem_row			= mem->row;
+		__entry->mem_column			= mem->column;
+		__entry->mem_bit_pos			= mem->bit_pos;
+		__entry->mem_requestor_id		= mem->requestor_id;
+		__entry->mem_responder_id		= mem->responder_id;
+		__entry->mem_target_id			= mem->target_id;
+		__entry->mem_error_type			= mem->error_type;
+	),
+
+	TP_printk("%s, event status: %s; generic data entry severity: %s, flags: %s, fru: %.20s, memory error section: validation bits: %s, error status: 0x%016llx, physical addr: 0x%016llx, physical addr mask: 0x%016llx, node: %d, card: %d, module: %d, bank: %d, device: %d, row: %d, column: %d, bit position: %d, requestor id: 0x%016llx, responder id: 0x%016llx, target id: 0x%016llx, error type: %s",
+		__print_flags(__entry->estatus_error_severity, "|", error_severity_strs),
+		__print_flags(__entry->estatus_block_status & 0x0f, "|", estatus_block_status_strs),
+		__print_flags(__entry->gdata_error_severity, "|", error_severity_strs),
+		__entry->gdata_flags ?
+		__print_flags(__entry->gdata_flags, "|", gdata_flags_strs) : "(null)",
+		(__entry->gdata_validation_bits & CPER_SEC_VALID_FRU_TEXT) ?
+		(char *)__entry->gdata_fru_text : "(null)",
+		__entry->mem_validation_bits ?
+		__print_flags(__entry->mem_validation_bits, "|", mem_validation_bits_strs) : "(null)",
+		__entry->mem_error_status, __entry->mem_physical_addr, __entry->mem_physical_addr_mask,
+		__entry->mem_node, __entry->mem_card, __entry->mem_module, __entry->mem_bank,
+		__entry->mem_device, __entry->mem_row, __entry->mem_column, __entry->mem_bit_pos,
+		__entry->mem_requestor_id, __entry->mem_responder_id, __entry->mem_target_id,
+		__show_mem_error_type(__entry->mem_error_type)
+	)
+);
+
+#endif /* _TRACE_GHES_H */
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
-- 
1.8.3.4
^ permalink raw reply related	[flat|nested] 65+ messages in thread
* [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-08 18:27 [PATCH 0/3] Add trace event for ghes memory error Naveen N. Rao
  2013-08-08 18:27 ` [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally Naveen N. Rao
  2013-08-08 18:27 ` [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error Naveen N. Rao
@ 2013-08-08 18:27 ` Naveen N. Rao
  2013-08-08 19:38   ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-08 18:27 UTC (permalink / raw)
  To: tony.luck, bp, bhelgaas, rostedt, rjw, lance.ortiz, m.chehab
  Cc: linux-pci, linux-acpi, linux-kernel, Naveen N. Rao
Enable memory error trace event in cper.c
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 drivers/acpi/apei/cper.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
index 33dc6a0..19a9c0b 100644
--- a/drivers/acpi/apei/cper.c
+++ b/drivers/acpi/apei/cper.c
@@ -32,6 +32,10 @@
 #include <linux/pci.h>
 #include <linux/aer.h>
 
+#define CREATE_TRACE_POINTS
+#define TRACE_EVENT_GHES
+#include <trace/events/ras.h>
+
 /*
  * CPER record ID need to be unique even after reboot, because record
  * ID is used as index for ERST storage, while CPER records from
@@ -193,8 +197,13 @@ static const char *cper_mem_err_type_strs[] = {
 	"scrub uncorrected error",
 };
 
-static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
+static void cper_print_mem(const char *pfx,
+		const struct acpi_hest_generic_status *estatus,
+		const struct acpi_hest_generic_data *gdata,
+		const struct cper_sec_mem_err *mem)
 {
+	trace_ghes_platform_memory_event(estatus, gdata, mem);
+
 	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
 		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
 	if (mem->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)
@@ -292,8 +301,10 @@ static const char *apei_estatus_section_flag_strs[] = {
 	"latent error",
 };
 
-static void apei_estatus_print_section(
-	const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
+static void apei_estatus_print_section(const char *pfx,
+		const struct acpi_hest_generic_status *estatus,
+		const struct acpi_hest_generic_data *gdata,
+		int sec_no)
 {
 	uuid_le *sec_type = (uuid_le *)gdata->section_type;
 	__u16 severity;
@@ -320,7 +331,7 @@ static void apei_estatus_print_section(
 		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
 		printk("%s""section_type: memory error\n", pfx);
 		if (gdata->error_data_length >= sizeof(*mem_err))
-			cper_print_mem(pfx, mem_err);
+			cper_print_mem(pfx, estatus, gdata, mem_err);
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
@@ -355,7 +366,7 @@ void apei_estatus_print(const char *pfx,
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
 	while (data_len > sizeof(*gdata)) {
 		gedata_len = gdata->error_data_length;
-		apei_estatus_print_section(pfx, gdata, sec_no);
+		apei_estatus_print_section(pfx, estatus, gdata, sec_no);
 		data_len -= gedata_len + sizeof(*gdata);
 		gdata = (void *)(gdata + 1) + gedata_len;
 		sec_no++;
-- 
1.8.3.4
^ permalink raw reply related	[flat|nested] 65+ messages in thread
* Re: [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error
  2013-08-08 18:27 ` [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error Naveen N. Rao
@ 2013-08-08 19:17   ` Borislav Petkov
  2013-08-12 11:28     ` Naveen N. Rao
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-08 19:17 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, bhelgaas, rostedt, rjw, lance.ortiz, m.chehab,
	linux-pci, linux-acpi, linux-kernel
On Thu, Aug 08, 2013 at 11:57:50PM +0530, Naveen N. Rao wrote:
> +TRACE_EVENT(ghes_platform_memory_event,
> +	TP_PROTO(const struct acpi_hest_generic_status *estatus,
> +		 const struct acpi_hest_generic_data *gdata,
> +		 const struct cper_sec_mem_err *mem),
> +
> +	TP_ARGS(estatus, gdata, mem),
> +
> +	TP_STRUCT__entry(
> +		__field(	u32,	estatus_block_status		)
> +		__field(	u32,	estatus_raw_data_offset		)
> +		__field(	u32,	estatus_raw_data_length		)
> +		__field(	u32,	estatus_data_length		)
> +		__field(	u32,	estatus_error_severity		)
> +		__array(	u8,	gdata_section_type,	16	)
> +		__field(	u32,	gdata_error_severity		)
> +		__field(	u16,	gdata_revision			)
> +		__field(	u8,	gdata_validation_bits		)
> +		__field(	u8,	gdata_flags			)
> +		__field(	u32,	gdata_error_data_length		)
> +		__array(	u8,	gdata_fru_id,		16	)
> +		__array(	u8,	gdata_fru_text,		20	)
> +		__field(	u64,	mem_validation_bits		)
> +		__field(	u64,	mem_error_status		)
> +		__field(	u64,	mem_physical_addr		)
> +		__field(	u64,	mem_physical_addr_mask		)
> +		__field(	u16,	mem_node			)
> +		__field(	u16,	mem_card			)
> +		__field(	u16,	mem_module			)
> +		__field(	u16,	mem_bank			)
> +		__field(	u16,	mem_device			)
> +		__field(	u16,	mem_row				)
> +		__field(	u16,	mem_column			)
> +		__field(	u16,	mem_bit_pos			)
> +		__field(	u64,	mem_requestor_id		)
> +		__field(	u64,	mem_responder_id		)
> +		__field(	u64,	mem_target_id			)
> +		__field(	u8,	mem_error_type			)
> +	),
Without looking at the rest, a trace record from this tracepoint is
going to be 160 bytes IINM, which looks kinda fat to me. And during an
error storm we're probably not going to be able to log them all, maybe?
Yes, no, maybe I'm off base...
In any case, are we sure we want all those fields above? Can we make
them smaller, drop some of them from the tracepoint, etc, etc? Can we
compute some of them in userspace with information we already have?
Hmmm.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally
  2013-08-08 18:27 ` [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally Naveen N. Rao
@ 2013-08-08 19:23   ` Steven Rostedt
  2013-08-12 11:37     ` Naveen N. Rao
  0 siblings, 1 reply; 65+ messages in thread
From: Steven Rostedt @ 2013-08-08 19:23 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, bp, bhelgaas, rjw, lance.ortiz, m.chehab, linux-pci,
	linux-acpi, linux-kernel
[ attempting to try out claws-mail, hopefully this messages isn't
scrambled ;-) ]
On Thu,  8 Aug 2013 23:57:49 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> Since we'll be adding multiple trace events to ras.h, we need to protect
> each block appropriately so that they only get included in the right
> places. Update PCIe AER trace event for this purpose.
Why not make a separate file for each? You will have to define
TRACE_EVENT_PCIE_AER for the users as well. That is, the places that
include ras.h and use the trace_aer_*() tracepoints.
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  drivers/pci/pcie/aer/aerdrv_errprint.c | 1 +
>  include/trace/events/ras.h             | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 2c7c9f5..4d06859 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -24,6 +24,7 @@
>  #include "aerdrv.h"
>  
>  #define CREATE_TRACE_POINTS
> +#define TRACE_EVENT_PCIE_AER
>  #include <trace/events/ras.h>
>  
>  #define AER_AGENT_RECEIVER		0
> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
> index 88b8783..4a66142 100644
> --- a/include/trace/events/ras.h
> +++ b/include/trace/events/ras.h
> @@ -1,7 +1,7 @@
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM ras
>  
> -#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#if (!defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)) && defined()
I think it would look cleaner to encapsulate the one define with the
other:
#ifdef TRACE_EVENT_PCIE_AER
#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
-- Steve
>  #define _TRACE_AER_H
>  
>  #include <linux/tracepoint.h>
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-08 18:27 ` [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event Naveen N. Rao
@ 2013-08-08 19:38   ` Mauro Carvalho Chehab
  2013-08-10 18:03     ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-08 19:38 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, bp, bhelgaas, rostedt, rjw, lance.ortiz, linux-pci,
	linux-acpi, linux-kernel
Em Thu, 08 Aug 2013 23:57:51 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> Enable memory error trace event in cper.c
Why do we need to do that? Memory error events are already handled
via edac_ghes module, in a standard way that allows the same
tracing to be used by either BIOS or by direct hardware error
report mechanism.
Adding an extra tracing for the same thing here will just make
userspace more complex, and will create duplicated error events
for the very same error.
Ok, if the interface there is poor, let's improve it, but adding
yet another interface to report the same error doesn't sound
reasonable on my eyes.
Regards,
Mauro
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  drivers/acpi/apei/cper.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
> index 33dc6a0..19a9c0b 100644
> --- a/drivers/acpi/apei/cper.c
> +++ b/drivers/acpi/apei/cper.c
> @@ -32,6 +32,10 @@
>  #include <linux/pci.h>
>  #include <linux/aer.h>
>  
> +#define CREATE_TRACE_POINTS
> +#define TRACE_EVENT_GHES
> +#include <trace/events/ras.h>
> +
>  /*
>   * CPER record ID need to be unique even after reboot, because record
>   * ID is used as index for ERST storage, while CPER records from
> @@ -193,8 +197,13 @@ static const char *cper_mem_err_type_strs[] = {
>  	"scrub uncorrected error",
>  };
>  
> -static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem)
> +static void cper_print_mem(const char *pfx,
> +		const struct acpi_hest_generic_status *estatus,
> +		const struct acpi_hest_generic_data *gdata,
> +		const struct cper_sec_mem_err *mem)
>  {
> +	trace_ghes_platform_memory_event(estatus, gdata, mem);
> +
>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
>  		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
>  	if (mem->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)
> @@ -292,8 +301,10 @@ static const char *apei_estatus_section_flag_strs[] = {
>  	"latent error",
>  };
>  
> -static void apei_estatus_print_section(
> -	const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
> +static void apei_estatus_print_section(const char *pfx,
> +		const struct acpi_hest_generic_status *estatus,
> +		const struct acpi_hest_generic_data *gdata,
> +		int sec_no)
>  {
>  	uuid_le *sec_type = (uuid_le *)gdata->section_type;
>  	__u16 severity;
> @@ -320,7 +331,7 @@ static void apei_estatus_print_section(
>  		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
>  		printk("%s""section_type: memory error\n", pfx);
>  		if (gdata->error_data_length >= sizeof(*mem_err))
> -			cper_print_mem(pfx, mem_err);
> +			cper_print_mem(pfx, estatus, gdata, mem_err);
>  		else
>  			goto err_section_too_small;
>  	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
> @@ -355,7 +366,7 @@ void apei_estatus_print(const char *pfx,
>  	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>  	while (data_len > sizeof(*gdata)) {
>  		gedata_len = gdata->error_data_length;
> -		apei_estatus_print_section(pfx, gdata, sec_no);
> +		apei_estatus_print_section(pfx, estatus, gdata, sec_no);
>  		data_len -= gedata_len + sizeof(*gdata);
>  		gdata = (void *)(gdata + 1) + gedata_len;
>  		sec_no++;
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-08 19:38   ` Mauro Carvalho Chehab
@ 2013-08-10 18:03     ` Borislav Petkov
  2013-08-12 11:33       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-10 18:03 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
> Em Thu, 08 Aug 2013 23:57:51 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> 
> > Enable memory error trace event in cper.c
> 
> Why do we need to do that? Memory error events are already handled
> via edac_ghes module,
If APEI gives me all the required information in order to deal with the
hardware error - and it looks like it does - then the additional layer
of ghes_edac is not needed.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error
  2013-08-08 19:17   ` Borislav Petkov
@ 2013-08-12 11:28     ` Naveen N. Rao
  0 siblings, 0 replies; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-12 11:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, bhelgaas, rostedt, rjw, lance.ortiz, m.chehab,
	linux-pci, linux-acpi, linux-kernel
On 08/09/2013 12:47 AM, Borislav Petkov wrote:
> On Thu, Aug 08, 2013 at 11:57:50PM +0530, Naveen N. Rao wrote:
>> +TRACE_EVENT(ghes_platform_memory_event,
>> +	TP_PROTO(const struct acpi_hest_generic_status *estatus,
>> +		 const struct acpi_hest_generic_data *gdata,
>> +		 const struct cper_sec_mem_err *mem),
>> +
>> +	TP_ARGS(estatus, gdata, mem),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	u32,	estatus_block_status		)
>> +		__field(	u32,	estatus_raw_data_offset		)
>> +		__field(	u32,	estatus_raw_data_length		)
>> +		__field(	u32,	estatus_data_length		)
>> +		__field(	u32,	estatus_error_severity		)
>> +		__array(	u8,	gdata_section_type,	16	)
>> +		__field(	u32,	gdata_error_severity		)
>> +		__field(	u16,	gdata_revision			)
>> +		__field(	u8,	gdata_validation_bits		)
>> +		__field(	u8,	gdata_flags			)
>> +		__field(	u32,	gdata_error_data_length		)
>> +		__array(	u8,	gdata_fru_id,		16	)
>> +		__array(	u8,	gdata_fru_text,		20	)
>> +		__field(	u64,	mem_validation_bits		)
>> +		__field(	u64,	mem_error_status		)
>> +		__field(	u64,	mem_physical_addr		)
>> +		__field(	u64,	mem_physical_addr_mask		)
>> +		__field(	u16,	mem_node			)
>> +		__field(	u16,	mem_card			)
>> +		__field(	u16,	mem_module			)
>> +		__field(	u16,	mem_bank			)
>> +		__field(	u16,	mem_device			)
>> +		__field(	u16,	mem_row				)
>> +		__field(	u16,	mem_column			)
>> +		__field(	u16,	mem_bit_pos			)
>> +		__field(	u64,	mem_requestor_id		)
>> +		__field(	u64,	mem_responder_id		)
>> +		__field(	u64,	mem_target_id			)
>> +		__field(	u8,	mem_error_type			)
>> +	),
>
> Without looking at the rest, a trace record from this tracepoint is
> going to be 160 bytes IINM, which looks kinda fat to me. And during an
> error storm we're probably not going to be able to log them all, maybe?
> Yes, no, maybe I'm off base...
>
> In any case, are we sure we want all those fields above? Can we make
> them smaller, drop some of them from the tracepoint, etc, etc? Can we
> compute some of them in userspace with information we already have?
Good idea - I hadn't thought from that perspective. I think we can drop 
a few fields there, especially the length/offset fields and perhaps the 
section_type since we know this is a memory error. Will get back with a 
new revision.
Thanks,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-10 18:03     ` Borislav Petkov
@ 2013-08-12 11:33       ` Mauro Carvalho Chehab
  2013-08-12 12:38         ` Borislav Petkov
  2013-08-12 12:41         ` Naveen N. Rao
  0 siblings, 2 replies; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-12 11:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Sat, 10 Aug 2013 20:03:22 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
> > Em Thu, 08 Aug 2013 23:57:51 +0530
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> > 
> > > Enable memory error trace event in cper.c
> > 
> > Why do we need to do that? Memory error events are already handled
> > via edac_ghes module,
> 
> If APEI gives me all the required information in order to deal with the
> hardware error - and it looks like it does - then the additional layer
> of ghes_edac is not needed.
APEI is just the mechanism that collects the data, not the mechanism
that reports to userspace.
The current implementation is that APEI already reports those errors
via ghes_edac driver. It also reports the very same error via MCE
(although the APEI interface to MCE is currently broken for everything
that it is not Nehalem-EX - as it basically emulates the MCE log for
that specific architecture).
I really don't see any sense on adding yet-another-way to report the
very same error.
Regards,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally
  2013-08-08 19:23   ` Steven Rostedt
@ 2013-08-12 11:37     ` Naveen N. Rao
  2013-08-12 13:13       ` Steven Rostedt
  0 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-12 11:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: tony.luck, bp, bhelgaas, rjw, lance.ortiz, m.chehab, linux-pci,
	linux-acpi, linux-kernel
On 08/09/2013 12:53 AM, Steven Rostedt wrote:
> [ attempting to try out claws-mail, hopefully this messages isn't
> scrambled ;-) ]
Works just fine :)
>
> On Thu,  8 Aug 2013 23:57:49 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>
>> Since we'll be adding multiple trace events to ras.h, we need to protect
>> each block appropriately so that they only get included in the right
>> places. Update PCIe AER trace event for this purpose.
>
> Why not make a separate file for each? You will have to define
The idea was to have all RAS-related trace points in a single place. 
This was discussed back when the PCIe AER trace event was added and so I 
chose to add the new event here as well.
> TRACE_EVENT_PCIE_AER for the users as well. That is, the places that
> include ras.h and use the trace_aer_*() tracepoints.
Do you mean the change I've done to aerdrv-errprint.c below? This trace 
point is currently only used there, so I guess we are ok?
Thanks,
Naveen
>
>
>>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>>   drivers/pci/pcie/aer/aerdrv_errprint.c | 1 +
>>   include/trace/events/ras.h             | 2 +-
>>   2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> index 2c7c9f5..4d06859 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> @@ -24,6 +24,7 @@
>>   #include "aerdrv.h"
>>
>>   #define CREATE_TRACE_POINTS
>> +#define TRACE_EVENT_PCIE_AER
>>   #include <trace/events/ras.h>
>>
>>   #define AER_AGENT_RECEIVER		0
>> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
>> index 88b8783..4a66142 100644
>> --- a/include/trace/events/ras.h
>> +++ b/include/trace/events/ras.h
>> @@ -1,7 +1,7 @@
>>   #undef TRACE_SYSTEM
>>   #define TRACE_SYSTEM ras
>>
>> -#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#if (!defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)) && defined()
>
> I think it would look cleaner to encapsulate the one define with the
> other:
>
> #ifdef TRACE_EVENT_PCIE_AER
> #if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
>
>
> -- Steve
>
>>   #define _TRACE_AER_H
>>
>>   #include <linux/tracepoint.h>
>
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 11:33       ` Mauro Carvalho Chehab
@ 2013-08-12 12:38         ` Borislav Petkov
  2013-08-12 14:49           ` Mauro Carvalho Chehab
  2013-08-12 12:41         ` Naveen N. Rao
  1 sibling, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-12 12:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Mon, Aug 12, 2013 at 08:33:55AM -0300, Mauro Carvalho Chehab wrote:
> APEI is just the mechanism that collects the data, not the mechanism
> that reports to userspace.
Both methods add a tracepoint - no difference.
> I really don't see any sense on adding yet-another-way to report the
> very same error.
Well, your suggested way through the layers is this:
Hardware->APEI->EDAC.
His is
Hardware->APEI.
If I can lose the EDAC layer, then this is a clear win.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 11:33       ` Mauro Carvalho Chehab
  2013-08-12 12:38         ` Borislav Petkov
@ 2013-08-12 12:41         ` Naveen N. Rao
  2013-08-12 12:53           ` Borislav Petkov
  2013-08-12 14:44           ` Mauro Carvalho Chehab
  1 sibling, 2 replies; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-12 12:41 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On 08/12/2013 05:03 PM, Mauro Carvalho Chehab wrote:
> Em Sat, 10 Aug 2013 20:03:22 +0200
> Borislav Petkov <bp@alien8.de> escreveu:
>
>> On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
>>> Em Thu, 08 Aug 2013 23:57:51 +0530
>>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
>>>
>>>> Enable memory error trace event in cper.c
>>>
>>> Why do we need to do that? Memory error events are already handled
>>> via edac_ghes module,
>>
>> If APEI gives me all the required information in order to deal with the
>> hardware error - and it looks like it does - then the additional layer
>> of ghes_edac is not needed.
>
> APEI is just the mechanism that collects the data, not the mechanism
> that reports to userspace.
I think what Boris is saying is that ghes_edac isn't adding anything 
more here given what we get from APEI structures. So, there doesn't seem 
to be a need to add dependency on edac for this purpose.
Further, ghes_edac seems to require EDAC_MM_EDAC to be compiled into the 
kernel (not a module). So, more dependencies.
>
> The current implementation is that APEI already reports those errors
> via ghes_edac driver. It also reports the very same error via MCE
> (although the APEI interface to MCE is currently broken for everything
> that it is not Nehalem-EX - as it basically emulates the MCE log for
> that specific architecture).
So, I looked at ghes_edac and it basically seems to boil down to 
trace_mc_event. But, this only seems to expose the APEI data as a string 
and doesn't look to really make all the fields available to user-space 
in a raw manner. Not sure how well this can be utilised by a user-space 
tool. Do you have suggestions on how we can do this?
Thanks,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 12:41         ` Naveen N. Rao
@ 2013-08-12 12:53           ` Borislav Petkov
  2013-08-13 11:21             ` Naveen N. Rao
  2013-08-12 14:44           ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-12 12:53 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On Mon, Aug 12, 2013 at 06:11:49PM +0530, Naveen N. Rao wrote:
> So, I looked at ghes_edac and it basically seems to boil down to
> trace_mc_event. But, this only seems to expose the APEI data as a
> string and doesn't look to really make all the fields available to
> user-space in a raw manner. Not sure how well this can be utilised
> by a user-space tool.
Well, your tracepoint dumps the decoded memory error too. So all the
information we need is already there, without edac. Or am I missing some
bits?
Thus why I'm saying is that we don't need the additional edac layer.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally
  2013-08-12 11:37     ` Naveen N. Rao
@ 2013-08-12 13:13       ` Steven Rostedt
  2013-08-12 13:26         ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Steven Rostedt @ 2013-08-12 13:13 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, bp, bhelgaas, rjw, lance.ortiz, m.chehab, linux-pci,
	linux-acpi, linux-kernel
On Mon, 12 Aug 2013 17:07:01 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> > TRACE_EVENT_PCIE_AER for the users as well. That is, the places that
> > include ras.h and use the trace_aer_*() tracepoints.
> 
> Do you mean the change I've done to aerdrv-errprint.c below? This trace 
> point is currently only used there, so I guess we are ok?
> 
Yeah, if it's only used on one place, then it's fine. But if it gets
used in other files, than those other files with have to define the
macro as well.
-- Steve
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally
  2013-08-12 13:13       ` Steven Rostedt
@ 2013-08-12 13:26         ` Borislav Petkov
  0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-12 13:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rjw, lance.ortiz, m.chehab,
	linux-pci, linux-acpi, linux-kernel
On Mon, Aug 12, 2013 at 09:13:37AM -0400, Steven Rostedt wrote:
> Yeah, if it's only used on one place, then it's fine. But if it gets
> used in other files, than those other files with have to define the
> macro as well.
Yeah, we need to be careful about this. When a ras tracepoint gets used
in multiple places, I'm guessing it would be cleaner/easier to fork it
out in its own ras-<a-lot-used-tracepoint>.h header, huh?
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 12:41         ` Naveen N. Rao
  2013-08-12 12:53           ` Borislav Petkov
@ 2013-08-12 14:44           ` Mauro Carvalho Chehab
  2013-08-13 11:41             ` Naveen N. Rao
  1 sibling, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-12 14:44 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
Em Mon, 12 Aug 2013 18:11:49 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/12/2013 05:03 PM, Mauro Carvalho Chehab wrote:
> > Em Sat, 10 Aug 2013 20:03:22 +0200
> > Borislav Petkov <bp@alien8.de> escreveu:
> >
> >> On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
> >>> Em Thu, 08 Aug 2013 23:57:51 +0530
> >>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> >>>
> >>>> Enable memory error trace event in cper.c
> >>>
> >>> Why do we need to do that? Memory error events are already handled
> >>> via edac_ghes module,
> >>
> >> If APEI gives me all the required information in order to deal with the
> >> hardware error - and it looks like it does - then the additional layer
> >> of ghes_edac is not needed.
> >
> > APEI is just the mechanism that collects the data, not the mechanism
> > that reports to userspace.
> 
> I think what Boris is saying is that ghes_edac isn't adding anything 
> more here given what we get from APEI structures. So, there doesn't seem 
> to be a need to add dependency on edac for this purpose.
> 
> Further, ghes_edac seems to require EDAC_MM_EDAC to be compiled into the 
> kernel (not a module). So, more dependencies.
> 
> >
> > The current implementation is that APEI already reports those errors
> > via ghes_edac driver. It also reports the very same error via MCE
> > (although the APEI interface to MCE is currently broken for everything
> > that it is not Nehalem-EX - as it basically emulates the MCE log for
> > that specific architecture).
> 
> So, I looked at ghes_edac and it basically seems to boil down to 
> trace_mc_event. 
Yes. It also provides the sysfs nodes that describe the memory.
> But, this only seems to expose the APEI data as a string 
> and doesn't look to really make all the fields available to user-space 
> in a raw manner. Not sure how well this can be utilised by a user-space 
> tool. Do you have suggestions on how we can do this?
There's already an userspace tool that handes it:
	https://git.fedorahosted.org/cgit/rasdaemon.git/
What is missing there on the current version is the bits that would allow
to translate from APEI way to report an error (memory node, card, module,
bank, device) into a DIMM label[1]. 
At the end, what really matters is to be able to point to the DIMM(s)
in a way that the user can replace them (e. g. using the silk screen
labels on the system motherboard).
[1] It does such translation for the other EDAC drivers, via a
configuration file that does such per-system mapping. Extending it to
also handle APEI errors shouldn't be hard.
> 
> Thanks,
> Naveen
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 12:38         ` Borislav Petkov
@ 2013-08-12 14:49           ` Mauro Carvalho Chehab
  2013-08-12 15:04             ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-12 14:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Mon, 12 Aug 2013 14:38:13 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Mon, Aug 12, 2013 at 08:33:55AM -0300, Mauro Carvalho Chehab wrote:
> > APEI is just the mechanism that collects the data, not the mechanism
> > that reports to userspace.
> 
> Both methods add a tracepoint - no difference.
> 
> > I really don't see any sense on adding yet-another-way to report the
> > very same error.
> 
> Well, your suggested way through the layers is this:
> 
> Hardware->APEI->EDAC.
> 
> His is
> 
> Hardware->APEI.
> 
> If I can lose the EDAC layer, then this is a clear win.
Clear win from what PoV? Userspace will need to decode a different type
of tracing, and implement a different logic for APEI. So, it will be
reinventing the wheel, with a different trace format, and will require
userspace to implement another tracing event for the same thing that
EDAC already provides.
Also, if both ghes_edac and this new tracing is enabled, userspace will
receive twice the same event, as two traces will be received for the
same thing.
Worse than that, how userspace is supposed to merge those two events
into one?
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 14:49           ` Mauro Carvalho Chehab
@ 2013-08-12 15:04             ` Borislav Petkov
  2013-08-12 17:25               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-12 15:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Mon, Aug 12, 2013 at 11:49:32AM -0300, Mauro Carvalho Chehab wrote:
> Clear win from what PoV? Userspace will need to decode a different type
> of tracing, and implement a different logic for APEI.
There's no different type of tracing - it is the same info as in both
cases it comes from APEI. And if it can be done in the APEI layer, then
we don't need the next layer.
> Also, if both ghes_edac and this new tracing is enabled, userspace
> will receive twice the same event, as two traces will be received for
> the same thing.
We are, of course, going to have only one tracepoint which reports
memory errors, not two.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 15:04             ` Borislav Petkov
@ 2013-08-12 17:25               ` Mauro Carvalho Chehab
  2013-08-12 17:54                 ` Luck, Tony
  2013-08-12 17:56                 ` Borislav Petkov
  0 siblings, 2 replies; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-12 17:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Mon, 12 Aug 2013 17:04:24 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Mon, Aug 12, 2013 at 11:49:32AM -0300, Mauro Carvalho Chehab wrote:
> > Clear win from what PoV? Userspace will need to decode a different type
> > of tracing, and implement a different logic for APEI.
> 
> There's no different type of tracing - it is the same info as in both
> cases it comes from APEI.
Well, patch 2/3 is defining a different type of tracing for memory errors,
instead of re-using the existing one.
> And if it can be done in the APEI layer, then
> we don't need the next layer.
Userspace still needs the EDAC sysfs, in order to identify how the memory
is organized, and do the proper memory labels association.
What edac_ghes does is to fill those sysfs nodes, and to call the
existing tracing to report errors.
> > Also, if both ghes_edac and this new tracing is enabled, userspace
> > will receive twice the same event, as two traces will be received for
> > the same thing.
> 
> We are, of course, going to have only one tracepoint which reports
> memory errors, not two.
Yes, that's my point.
Regards,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 17:25               ` Mauro Carvalho Chehab
@ 2013-08-12 17:54                 ` Luck, Tony
  2013-08-12 17:56                 ` Borislav Petkov
  1 sibling, 0 replies; 65+ messages in thread
From: Luck, Tony @ 2013-08-12 17:54 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: Naveen N. Rao, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
>> We are, of course, going to have only one tracepoint which reports
>> memory errors, not two.
>
> Yes, that's my point.
Is life that simple?
We have systems that have no EDAC driver (in some cases because the
architecture precludes one from ever being written, in other because
we either don't have the documentation, or because no driver has been
written yet).	
Systems also have varying degrees of APEI support (from none at all, to
full support ... possibly we may have some with apparent support, but BIOS
provides bogus information ... assuming BIOS folks live up/down to our
usual expectations).
-Tony
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 17:25               ` Mauro Carvalho Chehab
  2013-08-12 17:54                 ` Luck, Tony
@ 2013-08-12 17:56                 ` Borislav Petkov
  2013-08-13 11:36                   ` Naveen N. Rao
  1 sibling, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-12 17:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Naveen N. Rao
  Cc: tony.luck, bhelgaas, rostedt, rjw, lance.ortiz, linux-pci,
	linux-acpi, linux-kernel
On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
> Userspace still needs the EDAC sysfs, in order to identify how the
> memory is organized, and do the proper memory labels association.
>
> What edac_ghes does is to fill those sysfs nodes, and to call the
> existing tracing to report errors.
This is the only reason which justifies EDAC's existence. Naveen, can
your BIOS directly report the silkscreen label of the DIMM in error?
Generally, can any BIOS do that?
More specifically, what are those gdata_fru_id and gdata_fru_text
things?
Because if it can, then having the memory error tracepoint come direct
from APEI should be enough. The ghes_edac functionality could be then
fallback for BIOSes which cannot report the silkscreen label and in such
case I can imagine keeping both tracepoints, but disabling one of the
two...
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 12:53           ` Borislav Petkov
@ 2013-08-13 11:21             ` Naveen N. Rao
  2013-08-13 12:42               ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-13 11:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On 08/12/2013 06:23 PM, Borislav Petkov wrote:
> On Mon, Aug 12, 2013 at 06:11:49PM +0530, Naveen N. Rao wrote:
>> So, I looked at ghes_edac and it basically seems to boil down to
>> trace_mc_event. But, this only seems to expose the APEI data as a
>> string and doesn't look to really make all the fields available to
>> user-space in a raw manner. Not sure how well this can be utilised
>> by a user-space tool.
>
> Well, your tracepoint dumps the decoded memory error too. So all the
> information we need is already there, without edac. Or am I missing some
> bits?
>
> Thus why I'm saying is that we don't need the additional edac layer.
>
You're right - my trace point makes all the data provided by apei as-is 
to userspace. However, ghes_edac seems to squash some of this data into 
a string when reporting through mc_event.
Regards,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 17:56                 ` Borislav Petkov
@ 2013-08-13 11:36                   ` Naveen N. Rao
  2013-08-13 12:21                     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-13 11:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On 08/12/2013 11:26 PM, Borislav Petkov wrote:
> On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
>> Userspace still needs the EDAC sysfs, in order to identify how the
>> memory is organized, and do the proper memory labels association.
>>
>> What edac_ghes does is to fill those sysfs nodes, and to call the
>> existing tracing to report errors.
I suppose you're referring to the entries under /sys/devices/system/edac/mc?
I'm not sure I understand how this helps. ghes_edac seems to just be 
populating this based on dmi, which if I'm not mistaken, can be obtained 
in userspace (mcelog as an example).
Also, on my system, all DIMMs are being reported under mc0. I doubt if 
the labels there are accurate.
>
> This is the only reason which justifies EDAC's existence. Naveen, can
> your BIOS directly report the silkscreen label of the DIMM in error?
> Generally, can any BIOS do that?
>
> More specifically, what are those gdata_fru_id and gdata_fru_text
> things?
My understanding was that this provides the DIMM serial number, but I'm 
double checking just to be sure.
Thanks,
Naveen
>
> Because if it can, then having the memory error tracepoint come direct
> from APEI should be enough. The ghes_edac functionality could be then
> fallback for BIOSes which cannot report the silkscreen label and in such
> case I can imagine keeping both tracepoints, but disabling one of the
> two...
>
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-12 14:44           ` Mauro Carvalho Chehab
@ 2013-08-13 11:41             ` Naveen N. Rao
  2013-08-13 12:41               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-13 11:41 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
>> But, this only seems to expose the APEI data as a string
>> and doesn't look to really make all the fields available to user-space
>> in a raw manner. Not sure how well this can be utilised by a user-space
>> tool. Do you have suggestions on how we can do this?
>
> There's already an userspace tool that handes it:
> 	https://git.fedorahosted.org/cgit/rasdaemon.git/
>
> What is missing there on the current version is the bits that would allow
> to translate from APEI way to report an error (memory node, card, module,
> bank, device) into a DIMM label[1].
If I'm reading this right, all APEI data seems to be squashed into a 
string in mc_event. Also, the fru id/text don't seem to be passed to 
user-space.
- Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 11:36                   ` Naveen N. Rao
@ 2013-08-13 12:21                     ` Mauro Carvalho Chehab
  2013-08-13 12:33                       ` Borislav Petkov
  2013-08-13 16:55                       ` Naveen N. Rao
  0 siblings, 2 replies; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-13 12:21 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
Em Tue, 13 Aug 2013 17:06:14 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/12/2013 11:26 PM, Borislav Petkov wrote:
> > On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
> >> Userspace still needs the EDAC sysfs, in order to identify how the
> >> memory is organized, and do the proper memory labels association.
> >>
> >> What edac_ghes does is to fill those sysfs nodes, and to call the
> >> existing tracing to report errors.
> 
> I suppose you're referring to the entries under /sys/devices/system/edac/mc?
Yes.
> 
> I'm not sure I understand how this helps. ghes_edac seems to just be 
> populating this based on dmi, which if I'm not mistaken, can be obtained 
> in userspace (mcelog as an example).
> 
> Also, on my system, all DIMMs are being reported under mc0. I doubt if 
> the labels there are accurate.
Yes, this is the current status of ghes_edac, where BIOS doesn't provide any
reliable way to associate a given APEI report to a physical DIMM slot label.
The plan is to add more logic there as BIOSes start to provide some reliable
way to do such association. I discussed this subject with a few vendors
while I was working at Red Hat.
> >
> > This is the only reason which justifies EDAC's existence. Naveen, can
> > your BIOS directly report the silkscreen label of the DIMM in error?
> > Generally, can any BIOS do that?
> >
> > More specifically, what are those gdata_fru_id and gdata_fru_text
> > things?
> 
> My understanding was that this provides the DIMM serial number, but I'm 
> double checking just to be sure.
If it provides the DIMM serial number, then it is possible to improve the
ghes_edac driver to associate them. One option could be to write an I2C
driver and dig those information directly from the memories, although 
doing that could be risky, as BIOS could also try to access the same I2C
buses.
> 
> Thanks,
> Naveen
> 
> >
> > Because if it can, then having the memory error tracepoint come direct
> > from APEI should be enough. The ghes_edac functionality could be then
> > fallback for BIOSes which cannot report the silkscreen label and in such
> > case I can imagine keeping both tracepoints, but disabling one of the
> > two...
> >
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 12:21                     ` Mauro Carvalho Chehab
@ 2013-08-13 12:33                       ` Borislav Petkov
  2013-08-13 16:55                       ` Naveen N. Rao
  1 sibling, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-13 12:33 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel,
	Aristeu Rozanski Filho
On Tue, Aug 13, 2013 at 09:21:54AM -0300, Mauro Carvalho Chehab wrote:
> > > More specifically, what are those gdata_fru_id and gdata_fru_text
> > > things?
> > 
> > My understanding was that this provides the DIMM serial number, but I'm 
> > double checking just to be sure.
Hm, ok, then.
If this info is exported to userspace over i2c or DMI (I've seen
it sometimes in dmidecode output too) then the information in the
tracepoint can be used in conjunction with dmidecode output to map back
to the DIMM or so...
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 11:41             ` Naveen N. Rao
@ 2013-08-13 12:41               ` Mauro Carvalho Chehab
  2013-08-13 17:17                 ` Naveen N. Rao
  0 siblings, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-13 12:41 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
Em Tue, 13 Aug 2013 17:11:18 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
> >> But, this only seems to expose the APEI data as a string
> >> and doesn't look to really make all the fields available to user-space
> >> in a raw manner. Not sure how well this can be utilised by a user-space
> >> tool. Do you have suggestions on how we can do this?
> >
> > There's already an userspace tool that handes it:
> > 	https://git.fedorahosted.org/cgit/rasdaemon.git/
> >
> > What is missing there on the current version is the bits that would allow
> > to translate from APEI way to report an error (memory node, card, module,
> > bank, device) into a DIMM label[1].
> 
> If I'm reading this right, all APEI data seems to be squashed into a 
> string in mc_event.
Yes. We had lots of discussion about how to map memory errors over the
last couple years. Basically, it was decided that the information that
could be decoded into a DIMM to be mapped as integers, and all other
driver-specific data to be added as strings.
On the tests I did, different machines/vendors fill the APEI data on
a different way, with makes harder to associate them to a DIMM.
> Also, the fru id/text don't seem to be passed to user-space.
That's likely because on the systems I tested, those fields were not
filled (or maybe they appeared on a latter ACPI version). We should add
them also the same string as the other fields there at ghes_edac.
Regards,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 11:21             ` Naveen N. Rao
@ 2013-08-13 12:42               ` Borislav Petkov
  2013-08-13 17:32                 ` Naveen N. Rao
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-13 12:42 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
> You're right - my trace point makes all the data provided by apei
> as-is to userspace. However, ghes_edac seems to squash some of this
> data into a string when reporting through mc_event.
Right, for systems which don't need EDAC to decode to the DIMM or for
which there are no EDAC drivers written, they could use a tracepoint
which carries APEI info as-is. Others, which need EDAC, should probably
use trace_mc_event and disable the APEI tracepoint.
I think this should address Tony's concerns...
Btw, you could call your TP something simpler like
trace_ghes_memory_event or so.
Btw 2, if GHES can report other types of errors (I'm pretty sure it can)
maybe we can use a single tracepoint called trace_ghes_event for any
types of errors coming out of it...
Oh, and while at it, we probably need to start thinking of a mechanism
to disable all the error printing, i.e. cper_print_mem() and such,
if a userspace agent is listening in on the tracepoint and the error
information is carried through it to userspace.
Thanks.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 12:21                     ` Mauro Carvalho Chehab
  2013-08-13 12:33                       ` Borislav Petkov
@ 2013-08-13 16:55                       ` Naveen N. Rao
  2013-08-14 23:54                         ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-13 16:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote:
> Em Tue, 13 Aug 2013 17:06:14 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
>
>> On 08/12/2013 11:26 PM, Borislav Petkov wrote:
>>> On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
>>>> Userspace still needs the EDAC sysfs, in order to identify how the
>>>> memory is organized, and do the proper memory labels association.
>>>>
>>>> What edac_ghes does is to fill those sysfs nodes, and to call the
>>>> existing tracing to report errors.
>>
>> I suppose you're referring to the entries under /sys/devices/system/edac/mc?
>
> Yes.
>
>>
>> I'm not sure I understand how this helps. ghes_edac seems to just be
>> populating this based on dmi, which if I'm not mistaken, can be obtained
>> in userspace (mcelog as an example).
>>
>> Also, on my system, all DIMMs are being reported under mc0. I doubt if
>> the labels there are accurate.
>
> Yes, this is the current status of ghes_edac, where BIOS doesn't provide any
> reliable way to associate a given APEI report to a physical DIMM slot label.
>
> The plan is to add more logic there as BIOSes start to provide some reliable
> way to do such association. I discussed this subject with a few vendors
> while I was working at Red Hat.
Hmm... is there anything specific in the APEI report that could help? 
More importantly, is there a need to do this in-kernel rather than in 
user-space?
Thanks,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 12:41               ` Mauro Carvalho Chehab
@ 2013-08-13 17:17                 ` Naveen N. Rao
  2013-08-13 17:39                   ` Luck, Tony
  2013-08-14 23:56                   ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-13 17:17 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
> Em Tue, 13 Aug 2013 17:11:18 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
>
>> On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
>>>> But, this only seems to expose the APEI data as a string
>>>> and doesn't look to really make all the fields available to user-space
>>>> in a raw manner. Not sure how well this can be utilised by a user-space
>>>> tool. Do you have suggestions on how we can do this?
>>>
>>> There's already an userspace tool that handes it:
>>> 	https://git.fedorahosted.org/cgit/rasdaemon.git/
>>>
>>> What is missing there on the current version is the bits that would allow
>>> to translate from APEI way to report an error (memory node, card, module,
>>> bank, device) into a DIMM label[1].
>>
>> If I'm reading this right, all APEI data seems to be squashed into a
>> string in mc_event.
>
> Yes. We had lots of discussion about how to map memory errors over the
> last couple years. Basically, it was decided that the information that
> could be decoded into a DIMM to be mapped as integers, and all other
> driver-specific data to be added as strings.
>
> On the tests I did, different machines/vendors fill the APEI data on
> a different way, with makes harder to associate them to a DIMM.
Ok, so it looks like ghes_edac isn't quite useful yet.
In the meantime, like Boris suggests, I think we can have a different 
trace event for raw APEI reports - userspace can use it as it pleases.
Once ghes_edac gets better, users can decide whether they want raw APEI 
reports or the EDAC-processed version and choose one or the other trace 
event.
Regards,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 12:42               ` Borislav Petkov
@ 2013-08-13 17:32                 ` Naveen N. Rao
  2013-08-13 17:58                   ` Borislav Petkov
  2013-08-15  0:00                   ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-13 17:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On 08/13/2013 06:12 PM, Borislav Petkov wrote:
> On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
>> You're right - my trace point makes all the data provided by apei
>> as-is to userspace. However, ghes_edac seems to squash some of this
>> data into a string when reporting through mc_event.
>
> Right, for systems which don't need EDAC to decode to the DIMM or for
> which there are no EDAC drivers written, they could use a tracepoint
> which carries APEI info as-is. Others, which need EDAC, should probably
> use trace_mc_event and disable the APEI tracepoint.
If I'm not mistaken, even for systems that have EDAC drivers, it looks 
to me like EDAC can't really decode to the DIMM given what is provided 
by the bios in the APEI report currently. If and when ghes_edac gains 
this capability, users will have a choice between raw APEI reports vs. 
edac processed ones.
>
> I think this should address Tony's concerns...
>
> Btw, you could call your TP something simpler like
> trace_ghes_memory_event or so.
I started out with a simpler name, but eventually decided to use the 
name from the CPER record so it is clear what this event carries. I 
think this will be better when adding further ghes events for say, 
processor generic, PCIe and others.
>
> Btw 2, if GHES can report other types of errors (I'm pretty sure it can)
> maybe we can use a single tracepoint called trace_ghes_event for any
> types of errors coming out of it...
Two problems with this:
- One, the record size will be really big since the cper records for 
each type of error is large.
- Two, it may be better to filter events based on the type of error 
(memory error, processor, pcie, ...) rather than subscribing for all 
ghes error reports.
>
> Oh, and while at it, we probably need to start thinking of a mechanism
> to disable all the error printing, i.e. cper_print_mem() and such,
> if a userspace agent is listening in on the tracepoint and the error
> information is carried through it to userspace.
Do you mean conditionally print the cper records based on whether the 
tracepoint is enabled or not? Wouldn't that be confusing if someone is 
monitoring dmesg as well?
Thanks,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:17                 ` Naveen N. Rao
@ 2013-08-13 17:39                   ` Luck, Tony
  2013-08-14 10:47                     ` Naveen N. Rao
  2013-08-14 23:56                   ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-13 17:39 UTC (permalink / raw)
  To: Naveen N. Rao, Mauro Carvalho Chehab
  Cc: Borislav Petkov, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Aristeu Rozanski Filho
> In the meantime, like Boris suggests, I think we can have a different 
> trace event for raw APEI reports - userspace can use it as it pleases.
>
> Once ghes_edac gets better, users can decide whether they want raw APEI 
> reports or the EDAC-processed version and choose one or the other trace 
> event.
It's cheap to add as many tracepoints as we like - but may be costly to maintain.
Especially if we have to tinker with them later to adjust which things are logged,
that puts a burden on user-space tools to be updated to adapt to the changing
API.
Mauro has written his user-space tool to process the ghes-edac events:
  git://git.fedorahosted.org/rasdaemon.git
Who is writing the user space tools to process the new apei tracepoints
you want to add?
I'm not opposed to these patches - just wondering who is taking the next step
to make them useful.
-Tony
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:32                 ` Naveen N. Rao
@ 2013-08-13 17:58                   ` Borislav Petkov
  2013-08-13 18:05                     ` Luck, Tony
  2013-08-14 10:57                     ` Naveen N. Rao
  2013-08-15  0:00                   ` Mauro Carvalho Chehab
  1 sibling, 2 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-13 17:58 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote:
> If I'm not mistaken, even for systems that have EDAC drivers, it looks
> to me like EDAC can't really decode to the DIMM given what is provided
> by the bios in the APEI report currently. If and when ghes_edac gains
> this capability, users will have a choice between raw APEI reports vs.
> edac processed ones.
Which kinda makes that APEI tracepoint not really useful and we can call
the one we have already - trace_mc_event - from APEI...
> I started out with a simpler name, but eventually decided to use the
> name from the CPER record so it is clear what this event carries. I
> think this will be better when adding further ghes events for say,
> processor generic, PCIe and others.
This is exactly my fear: having to add a tracepoint per error type
instead of having a single trace_hw_error or so...
> >Btw 2, if GHES can report other types of errors (I'm pretty sure it can)
> >maybe we can use a single tracepoint called trace_ghes_event for any
> >types of errors coming out of it...
> 
> Two problems with this:
> - One, the record size will be really big since the cper records for
> each type of error is large.
I better go look at that CPER crap....
> - Two, it may be better to filter events based on the type of error
> (memory error, processor, pcie, ...) rather than subscribing for all
> ghes error reports.
You can filter that in userspace too.
> Do you mean conditionally print the cper records based on whether the
> tracepoint is enabled or not? Wouldn't that be confusing if someone is
> monitoring dmesg as well?
Why would you need dmesg if you get your hw errors over the tracepoint?
Thanks.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:58                   ` Borislav Petkov
@ 2013-08-13 18:05                     ` Luck, Tony
  2013-08-13 18:10                       ` Borislav Petkov
  2013-08-14 10:57                     ` Naveen N. Rao
  1 sibling, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-13 18:05 UTC (permalink / raw)
  To: Borislav Petkov, Naveen N. Rao
  Cc: Mauro Carvalho Chehab, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
PiBXaHkgd291bGQgeW91IG5lZWQgZG1lc2cgaWYgeW91IGdldCB5b3VyIGh3IGVycm9ycyBvdmVy
IHRoZSB0cmFjZXBvaW50Pw0KDQpSZWR1bmRhbmN5IGlzIGEgZ29vZCB0aGluZyB3aGVuIHRhbGtp
bmcgYWJvdXQgbWlzc2lvbiBjcml0aWNhbCBzeXN0ZW1zLiAgZG1lc2cNCm1heSBiZSBmZWVkaW5n
IHRvIGEgc2VyaWFsIGNvbnNvbGUgdG8gYmUgbG9nZ2VkIGFuZCBhbmFseXNlZCBvbiBhbm90aGVy
IHN5c3RlbS4NCg0KVGhlIHRyYWNlcG9pbnQgZGF0YSBnb2VzIHRvIGEgcHJvY2VzcyBvbiB0aGUg
c3lzdGVtIGV4cGVyaWVuY2luZyB0aGUgZXJyb3JzLiBJZiB0aGUNCmVycm9ycyBhcmUgc2VyaW91
cyAob3IgYSBwcmVjdXJzb3IgdG8gc29tZXRoaW5nIHNlcmlvdXMpIHRoYXQgcHJvY2VzcyBtYXkg
bmV2ZXIgZ2V0DQp0aGUgY2hhbmNlIHRvIHNhdmUgdGhlIGxvZy4NCg0KLVRvbnkNCg==
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 18:05                     ` Luck, Tony
@ 2013-08-13 18:10                       ` Borislav Petkov
  2013-08-13 20:13                         ` Luck, Tony
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-13 18:10 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
On Tue, Aug 13, 2013 at 06:05:02PM +0000, Luck, Tony wrote:
> If the errors are serious (or a precursor to something serious) that
> process may never get the chance to save the log.
What about sending tracepoint data over serial and/or network? I agree
that dmesg over serial would be helpful but we need a similar sure-fire
way for carrying error info out.
Which reminds me, pstore could also be a good thing to use, in addition.
Only put error info there as it is limited anyway.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 18:10                       ` Borislav Petkov
@ 2013-08-13 20:13                         ` Luck, Tony
  2013-08-14  5:43                           ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-13 20:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
PiBXaGF0IGFib3V0IHNlbmRpbmcgdHJhY2Vwb2ludCBkYXRhIG92ZXIgc2VyaWFsIGFuZC9vciBu
ZXR3b3JrPyBJIGFncmVlDQo+IHRoYXQgZG1lc2cgb3ZlciBzZXJpYWwgd291bGQgYmUgaGVscGZ1
bCBidXQgd2UgbmVlZCBhIHNpbWlsYXIgc3VyZS1maXJlDQo+IHdheSBmb3IgY2FycnlpbmcgZXJy
b3IgaW5mbyBvdXQuDQoNCkdlbmVyaWMgdHJhY2Vwb2ludHMgYXJlIGFyY2hpdGVjdGVkIHRvIGJl
IGFibGUgdG8gZmlyZSBhdCB2ZXJ5IGhpZ2ggcmF0ZXMgYW5kDQpsb2cgaHVnZSBhbW91bnRzIG9m
IGluZm9ybWF0aW9uLiAgU28gd2UnZCBuZWVkIHNvbWV0aGluZyBzcGVjaWFsIHRvIHNheQ0KanVz
dCBsb2cgdGhlc2Ugc3BlY2lhbCB0cmFjZXBvaW50cyB0byBuZXR3b3JrL3NlcmlhbC4NCg0KPiBX
aGljaCByZW1pbmRzIG1lLCBwc3RvcmUgY291bGQgYWxzbyBiZSBhIGdvb2QgdGhpbmcgdG8gdXNl
LCBpbiBhZGRpdGlvbi4NCj4gT25seSBwdXQgZXJyb3IgaW5mbyB0aGVyZSBhcyBpdCBpcyBsaW1p
dGVkIGFueXdheS4NCg0KWWVzIC0gc3BhY2UgaXMgdmVyeSBsaW1pdGVkLiAgSSBkb24ndCBrbm93
IGhvdyB0byBhc3NpZ24gcHJpb3JpdHkgZm9yIGxvZ2dpbmcNCnRoZSBkbWVzZyBkYXRhIHZzLiBz
b21lIGVycm9yIGxvZ3MuDQoNCg0KSWYgd2UganVzdCAicHJpbnRrKCkiIHRoZSBtb3N0IGltcG9y
dGFudCBwYXJ0cyAtIHRoZW4gdGhhdCBkYXRhIHdpbGwgYXV0b21hdGljYWxseQ0KZmxvdyB0byB0
aGUgc2VyaWFsIGNvbnNvbGUgYW5kIHRvIHBzdG9yZS4gIFRoZW4gd2UgaGF2ZSBtdWx0aXBsZSBw
YXRocyBmb3IgdGhlDQpjcml0aWNhbCBiaXRzIG9mIHRoZSBlcnJvciBsb2cgLSBhbmQgdGhlIHRy
YWNlcG9pbnRzIGdpdmUgdXMgbW9yZSBkZXRhaWxzIGZvciB0aGUNCmNhc2VzIHdoZXJlIHRoZSBt
YWNoaW5lIGRvZXNuJ3Qgc3BvbnRhbmVvdXNseSBleHBsb2RlLg0KDQotVG9ueQ0K
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 20:13                         ` Luck, Tony
@ 2013-08-14  5:43                           ` Borislav Petkov
  2013-08-14 18:38                             ` Luck, Tony
  2013-08-15  0:05                             ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-14  5:43 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
On Tue, Aug 13, 2013 at 08:13:56PM +0000, Luck, Tony wrote:
> Generic tracepoints are architected to be able to fire at very high
> rates and log huge amounts of information. So we'd need something
> special to say just log these special tracepoints to network/serial.
>
> > Which reminds me, pstore could also be a good thing to use, in addition.
> > Only put error info there as it is limited anyway.
> 
> Yes - space is very limited.  I don't know how to assign priority for logging
> the dmesg data vs. some error logs.
Didn't we say at some point, "log only the panic messsage which kills
the machine"?
However, we probably could use more the messages before that
catastrophic event because they could give us hints about what lead to
the panic but in that case maybe a limited pstore is the wrong logging
medium.
Actually, I can imagine the full serial/network logs of "special"
tracepoints + dmesg to be the optimal thing.
> If we just "printk()" the most important parts - then that data will
> automatically flow to the serial console and to pstore.
Actually, does the pstore act like a circular buffer? Because if it
contains the last N relevant messages (for an arbitrary definition of
relevant) before the system dies, then that could more helpful than only
the error messages.
And with the advent of UEFI, pretty much every system has a pstore. Too
bad that we have to limit it to 50% of size so that the boxes don't
brick. :-P
> Then we have multiple paths for the critical bits of the error log
> - and the tracepoints give us more details for the cases where the
> machine doesn't spontaneously explode.
Ok, let's sort:
* First we have the not-so-critical hw error messages. We want to carry
those out-of-band, i.e. not in dmesg so that people don't have to parse
and collect dmesg but have a specialized solution which gives them
structured logs and tools can analyze, collect and ... those errors.
* When a critical error happens, the above usage is not necessarily
advantageous anymore in the sense that, in order to debug what caused
the machine to crash, we don't simply necessarily want only the crash
message but also the whole system activity that lead to it.
In which case, we probably actually want to turn off/ignore the error
logging tracepoints and write *only* to dmesg which goes out over serial
and to pstore. Right?
Because in such cases I want to have *all* *relevant* messages that lead
to the explosion + the explosion message itself.
Makes sense? Yes, no? Aspects I've missed?
Thanks.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:39                   ` Luck, Tony
@ 2013-08-14 10:47                     ` Naveen N. Rao
  2013-08-14 12:18                       ` Borislav Petkov
  2013-08-15  0:15                       ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-14 10:47 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Mauro Carvalho Chehab, Borislav Petkov, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
On 08/13/2013 11:09 PM, Luck, Tony wrote:
>> In the meantime, like Boris suggests, I think we can have a different
>> trace event for raw APEI reports - userspace can use it as it pleases.
>>
>> Once ghes_edac gets better, users can decide whether they want raw APEI
>> reports or the EDAC-processed version and choose one or the other trace
>> event.
>
> It's cheap to add as many tracepoints as we like - but may be costly to maintain.
> Especially if we have to tinker with them later to adjust which things are logged,
> that puts a burden on user-space tools to be updated to adapt to the changing
> API.
Agree. And this is the reason I have been considering mc_event. But, the 
below issues with ghes_edac made me unsure:
- One, the logging format for APEI data is a bit verbose and hard to 
parse. But, I suppose we could work with this if we make a few changes. 
Is it ok to change how the APEI data is made available through 
mc_event->driver_detail?
- Two, if ghes_edac is enabled, it prevents other edac drivers from 
being loaded. It looks like the assumption here is that if ghes/firmware 
first is enabled, then *all* memory errors are reported through ghes 
which is not true. We could have (a subset of) corrected errors reported 
through ghes, some through CMCI and uncorrected errors through MCE. So, 
if I'm not mistaken, if ghes_edac is enabled, we will only receive ghes 
error events through mc_event and not the others. Mauro, is this accurate?
>
> Mauro has written his user-space tool to process the ghes-edac events:
>    git://git.fedorahosted.org/rasdaemon.git
>
> Who is writing the user space tools to process the new apei tracepoints
> you want to add?
Enabling rasdaemon itself for the new tracepoint is an option, as long 
as Mauro doesn't object to it ;)
>
> I'm not opposed to these patches - just wondering who is taking the next step
> to make them useful.
Sure.
Regards,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:58                   ` Borislav Petkov
  2013-08-13 18:05                     ` Luck, Tony
@ 2013-08-14 10:57                     ` Naveen N. Rao
  2013-08-15  0:22                       ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 65+ messages in thread
From: Naveen N. Rao @ 2013-08-14 10:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mauro Carvalho Chehab, tony.luck, bhelgaas, rostedt, rjw,
	lance.ortiz, linux-pci, linux-acpi, linux-kernel
On 08/13/2013 11:28 PM, Borislav Petkov wrote:
> On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote:
>> If I'm not mistaken, even for systems that have EDAC drivers, it looks
>> to me like EDAC can't really decode to the DIMM given what is provided
>> by the bios in the APEI report currently. If and when ghes_edac gains
>> this capability, users will have a choice between raw APEI reports vs.
>> edac processed ones.
>
> Which kinda makes that APEI tracepoint not really useful and we can call
> the one we have already - trace_mc_event - from APEI...
This looks like a nice option. Mauro, what do you think?
Thanks,
Naveen
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14 10:47                     ` Naveen N. Rao
@ 2013-08-14 12:18                       ` Borislav Petkov
  2013-08-15  0:15                       ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-14 12:18 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Luck, Tony, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
On Wed, Aug 14, 2013 at 04:17:26PM +0530, Naveen N. Rao wrote:
> - One, the logging format for APEI data is a bit verbose and hard
> to parse. But, I suppose we could work with this if we make a few
> changes. Is it ok to change how the APEI data is made available
> through mc_event->driver_detail?
How?
> - Two, if ghes_edac is enabled, it prevents other edac drivers
> from being loaded. It looks like the assumption here is that if
> ghes/firmware first is enabled, then *all* memory errors are reported
> through ghes which is not true. We could have (a subset of) corrected
> errors reported through ghes, some through CMCI and uncorrected errors
> through MCE. So, if I'm not mistaken, if ghes_edac is enabled, we will
> only receive ghes error events through mc_event and not the others.
The idea is to have a single tracepoint reporting memory errors. Where
you call it from shouldn't matter. Now, we have to make sure that we
don't report the same event more than once over different paths.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14  5:43                           ` Borislav Petkov
@ 2013-08-14 18:38                             ` Luck, Tony
  2013-08-15 10:14                               ` Borislav Petkov
  2013-08-15  0:05                             ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-14 18:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
PiBEaWRuJ3Qgd2Ugc2F5IGF0IHNvbWUgcG9pbnQsICJsb2cgb25seSB0aGUgcGFuaWMgbWVzc3Nh
Z2Ugd2hpY2gga2lsbHMNCj4gdGhlIG1hY2hpbmUiPw0KDQpXZSd2ZSB3YW5kZXJlZCBhcm91bmQg
ZGlmZmVyZW50IHN0cmF0ZWdpZXMgaGVyZS4gV2UgZGVmaW5pdGVseSB3YW50DQp0aGUgcGFuaWMg
bG9nLiAgU29tZSBwZW9wbGUgd2FudCBhbGwgb3RoZXIgImtlcm5lbCBleGl0IiBsb2dzIChzaHV0
ZG93biwNCnJlYm9vdCwga2V4ZWMpLiAgV2hlbiB0aGVyZSBpcyBlbm91Z2ggc3BhY2UgaW4gdGhl
IHBzdG9yZSBiYWNrZW5kIHdlDQptaWdodCBhbHNvIHdhbnQgdGhlICJvb3BzIiB0aGF0IHByZWNl
ZWRlZCB0aGUgcGFuaWMuICAoT2YgY291cnNlIHdoZW4NCnRoZSBvb3BzIGhhcHBlbnMgd2UgZG9u
J3Qga25vdyB0aGUgZnV0dXJlLCBzbyBoYXZlIHRvIHNhdmUgaXQganVzdCBpbg0KY2FzZSAuLi4g
dGhlbiBpZiBtb3JlICJvb3BzIiBoYXBwZW4gd2UgaGF2ZSB0byBkZWNpZGUgd2hldGhlciB0byBr
ZWVwDQp0aGUgb2xkIG9vcHMgbG9nLCBvciBzYXZlIHRoZSBuZXdlciBvbmUpLg0KDQo+IEhvd2V2
ZXIsIHdlIHByb2JhYmx5IGNvdWxkIHVzZSBtb3JlIHRoZSBtZXNzYWdlcyBiZWZvcmUgdGhhdA0K
PiBjYXRhc3Ryb3BoaWMgZXZlbnQgYmVjYXVzZSB0aGV5IGNvdWxkIGdpdmUgdXMgaGludHMgYWJv
dXQgd2hhdCBsZWFkIHRvDQo+IHRoZSBwYW5pYyBidXQgaW4gdGhhdCBjYXNlIG1heWJlIGEgbGlt
aXRlZCBwc3RvcmUgaXMgdGhlIHdyb25nIGxvZ2dpbmcNCj4gbWVkaXVtLg0KWWVzIC0gbG9uZ2Vy
IGxvZ3MgYXJlIGJldHRlci4gIFNhZCB0aGF0IHRoZSBwc3RvcmUgYmFja2VuZCBkZXZpY2VzIGFy
ZQ0KbWVhc3VyZWQgaW4ga2lsb2J5dGVzIDotKQ0KDQo+IEFjdHVhbGx5LCBJIGNhbiBpbWFnaW5l
IHRoZSBmdWxsIHNlcmlhbC9uZXR3b3JrIGxvZ3Mgb2YgInNwZWNpYWwiDQo+IHRyYWNlcG9pbnRz
ICsgZG1lc2cgdG8gYmUgdGhlIG9wdGltYWwgdGhpbmcuDQpJZiB5b3UgZ3Vlc3MgdGhlIHJpZ2h0
ICJzcGVjaWFsIiB0cmFjZXBvaW50cyB0byBsb2cgLSB0aGVuIHllcy4NCg0KPiBBY3R1YWxseSwg
ZG9lcyB0aGUgcHN0b3JlIGFjdCBsaWtlIGEgY2lyY3VsYXIgYnVmZmVyPyBCZWNhdXNlIGlmIGl0
DQo+IGNvbnRhaW5zIHRoZSBsYXN0IE4gcmVsZXZhbnQgbWVzc2FnZXMgKGZvciBhbiBhcmJpdHJh
cnkgZGVmaW5pdGlvbiBvZg0KPiByZWxldmFudCkgYmVmb3JlIHRoZSBzeXN0ZW0gZGllcywgdGhl
biB0aGF0IGNvdWxkIG1vcmUgaGVscGZ1bCB0aGFuIG9ubHkNCj4gdGhlIGVycm9yIG1lc3NhZ2Vz
Lg0KTm8gLSB3cml0ZSBzcGVlZCBmb3IgdGhlIHBlcnNpc3RlbnQgc3RvcmFnZSBiYWNraW5nIHBz
dG9yZSAoZmxhc2gpIG1lYW5zIHdlDQpkb24ndCBsb2cgYXMgd2UgZ28uIFdlIHdhaXQgZm9yIGEg
cGFuaWMgYW5kIHRoZW4gb3VyIHJlZ2lzdGVyZWQgZnVuY3Rpb24NCmdldHMgY2FsbGVkIHNvIHdl
IGNhbiBzbmFwc2hvdCB3aGF0IGlzIGluIHRoZSBjb25zb2xlIGxvZyBhdCB0aGF0IHBvaW50Lg0K
V2UgYWxzbyBkb24ndCB3YW50IHRvIHdlYXIgb3V0IHRoZSBmbGFzaCB3aGljaCBtYXkgYmUgc29s
ZGVyZWQgdG8gdGhlDQptb3RoZXJib2FyZC4NCg0KDQo+IE9rLCBsZXQncyBzb3J0Og0KDQo+ICog
Rmlyc3Qgd2UgaGF2ZSB0aGUgbm90LXNvLWNyaXRpY2FsIGh3IGVycm9yIG1lc3NhZ2VzLiBXZSB3
YW50IHRvIGNhcnJ5DQo+IHRob3NlIG91dC1vZi1iYW5kLCBpLmUuIG5vdCBpbiBkbWVzZyBzbyB0
aGF0IHBlb3BsZSBkb24ndCBoYXZlIHRvIHBhcnNlDQo+IGFuZCBjb2xsZWN0IGRtZXNnIGJ1dCBo
YXZlIGEgc3BlY2lhbGl6ZWQgc29sdXRpb24gd2hpY2ggZ2l2ZXMgdGhlbQ0KPiBzdHJ1Y3R1cmVk
IGxvZ3MgYW5kIHRvb2xzIGNhbiBhbmFseXplLCBjb2xsZWN0IGFuZCAuLi4gdGhvc2UgZXJyb3Jz
Lg0KDQpBZ3JlZWQgLSB3ZSBzaG91bGRuJ3QgY2x1dHRlciBsb2dzIHdpdGggZGV0YWlscyBvZiBj
b3JyZWN0ZWQgZXJyb3JzLg0KQXQgbW9zdCB3ZSBzaG91bGQgaGF2ZSBhIHJhdGUtbGltaXRlZCBs
b2cgc2hvd2luZyB0aGUgY291bnQgb2YgY29ycmVjdGVkIGVycm9ycw0Kc28gdGhhdCBzb21lb25l
IHdobyBqdXN0IHdhdGNoZXMgZG1lc2cga25vd3MgdGhleSBzaG91bGQgZ28gZGlnIGRlZXBlcg0K
aWYgdGhleSBzZWUgc29tZSBiaWcgbnVtYmVyIG9mIGNvcnJlY3RlZCBlcnJvcnMuDQoNCj4gKiBX
aGVuIGEgY3JpdGljYWwgZXJyb3IgaGFwcGVucywgdGhlIGFib3ZlIHVzYWdlIGlzIG5vdCBuZWNl
c3NhcmlseQ0KPiBhZHZhbnRhZ2VvdXMgYW55bW9yZSBpbiB0aGUgc2Vuc2UgdGhhdCwgaW4gb3Jk
ZXIgdG8gZGVidWcgd2hhdCBjYXVzZWQNCj4gdGhlIG1hY2hpbmUgdG8gY3Jhc2gsIHdlIGRvbid0
IHNpbXBseSBuZWNlc3NhcmlseSB3YW50IG9ubHkgdGhlIGNyYXNoDQo+IG1lc3NhZ2UgYnV0IGFs
c28gdGhlIHdob2xlIHN5c3RlbSBhY3Rpdml0eSB0aGF0IGxlYWQgdG8gaXQuDQoNClllcy4gIFRo
ZXJlIGFyZSBwZW9wbGUgbG9va2luZyBhdCB2YXJpb3VzICJmbGlnaHQgcmVjb3JkZXIiIG1vZGVz
IGZvciB0cmFjaW5nDQp0aGF0IGtlZXAgbG9ncyBvZiBub3JtYWwgZXZlbnRzIGluIGEgY2lyY3Vs
YXIgYnVmZmVyIGluIFJBTSAuLi4gaWYgdGhlc2UgZXhpc3QNCnRoZXkgc2hvdWxkIGJlIHNhdmVk
IGF0IGNyYXNoIHRpbWUgKGFuZCB0aGV5IGFyZSBpbiB0aGUga2V4ZWMva2R1bXAgcGF0aCwNCmJ1
dCBJIGRvbuKAmXQga25vdyBpZiBhbnlvbmUgZG9lcyBhbnl0aGluZyBpbiB0aGUgbm9uLWtkdW1w
IGNhc2UpLg0KDQo+IEluIHdoaWNoIGNhc2UsIHdlIHByb2JhYmx5IGFjdHVhbGx5IHdhbnQgdG8g
dHVybiBvZmYvaWdub3JlIHRoZSBlcnJvcg0KPiBsb2dnaW5nIHRyYWNlcG9pbnRzIGFuZCB3cml0
ZSAqb25seSogdG8gZG1lc2cgd2hpY2ggZ29lcyBvdXQgb3ZlciBzZXJpYWwNCj4gYW5kIHRvIHBz
dG9yZS4gUmlnaHQ/DQpUcmFjZXBvaW50cyBmb3IgZXJyb3JzIHRoYXQgYXJlIGdvaW5nIHRvIGxl
YWQgdG8gc3lzdGVtIGNyYXNoIHdvdWxkIG9ubHkgYmUNCnVzZWZ1bCB0b2dldGhlciB3aXRoIGEg
ZmxpZ2h0IHJlY29yZGVyIHRvIG1ha2Ugc3VyZSB0aGV5IGdldCBzYXZlZC4gSSB0aGluaw0KdHJh
Y2Vwb2ludHMgZm9yIGNvcnJlY3RlZCBlcnJvcnMgYXJlIGJldHRlciB0aGFuIGRtZXNnIGxvZ3Mu
DQoNCj4gQmVjYXVzZSBpbiBzdWNoIGNhc2VzIEkgd2FudCB0byBoYXZlICphbGwqICpyZWxldmFu
dCogbWVzc2FnZXMgdGhhdCBsZWFkDQo+IHRvIHRoZSBleHBsb3Npb24gKyB0aGUgZXhwbG9zaW9u
IG1lc3NhZ2UgaXRzZWxmLg0KDQpJbiBhIHBlcmZlY3Qgd29ybGQgeWVzIC0gSSBkb24ndCBrbm93
IHRoYXQgd2UgY2FuIGFjaGlldmUgcGVyZmVjdGlvbiAtIGJ1dCB3ZQ0KY2FuIGl0ZXJhdGUgdGhy
b3VnaCBnb29kLCBiZXR0ZXIsIGV2ZW4gYmV0dGVyLiAgVGhlIHJlYWxseSBoYXJkIHBhcnQgb2Yg
dGhpcyBpcw0KZmlndXJpbmcgb3V0IHdoYXQgaXMgKnJlbGV2YW50KiB0byBzYXZlIGJlZm9yZSBh
IHBhcnRpY3VsYXIgY3Jhc2ggaGFwcGVucy4NCg0KLVRvbnkNCg==
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 16:55                       ` Naveen N. Rao
@ 2013-08-14 23:54                         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-14 23:54 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
Em Tue, 13 Aug 2013 22:25:58 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
(sorry for a late answer, I had to do a small travel yesterday)
> On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote:
> > Em Tue, 13 Aug 2013 17:06:14 +0530
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> >
> >> On 08/12/2013 11:26 PM, Borislav Petkov wrote:
> >>> On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
> >>>> Userspace still needs the EDAC sysfs, in order to identify how the
> >>>> memory is organized, and do the proper memory labels association.
> >>>>
> >>>> What edac_ghes does is to fill those sysfs nodes, and to call the
> >>>> existing tracing to report errors.
> >>
> >> I suppose you're referring to the entries under /sys/devices/system/edac/mc?
> >
> > Yes.
> >
> >>
> >> I'm not sure I understand how this helps. ghes_edac seems to just be
> >> populating this based on dmi, which if I'm not mistaken, can be obtained
> >> in userspace (mcelog as an example).
> >>
> >> Also, on my system, all DIMMs are being reported under mc0. I doubt if
> >> the labels there are accurate.
> >
> > Yes, this is the current status of ghes_edac, where BIOS doesn't provide any
> > reliable way to associate a given APEI report to a physical DIMM slot label.
> >
> > The plan is to add more logic there as BIOSes start to provide some reliable
> > way to do such association. I discussed this subject with a few vendors
> > while I was working at Red Hat.
> 
> Hmm... is there anything specific in the APEI report that could help? 
I didn't see anything at APEI spec that would allow to describe how the
memory is organized. So, it is hard for the ghes_edac driver to discover
how many memory controllers, channels and slots are available. This data
is needed, in order to allow userspace to pass the labels for each DIMM,
or for the Kernel to auto-discover.
> More importantly, is there a need to do this in-kernel rather than in 
> user-space?
Yes, due to 2 aspects:
On a critical error, the machine will die. The EDAC core will print the
error at dmesg, but no other record to be latter parsed will be available;
With hot pluggable memories, dynamic channel rerouting, memory poisoning
and other funny things, it could not be possible to point to a DIMM, 
if the parsing is done on a latter time.
Regards,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:17                 ` Naveen N. Rao
  2013-08-13 17:39                   ` Luck, Tony
@ 2013-08-14 23:56                   ` Mauro Carvalho Chehab
  2013-08-15 10:02                     ` Borislav Petkov
  1 sibling, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-14 23:56 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
Em Tue, 13 Aug 2013 22:47:36 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
> > Em Tue, 13 Aug 2013 17:11:18 +0530
> > "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> >
> >> On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
> >>>> But, this only seems to expose the APEI data as a string
> >>>> and doesn't look to really make all the fields available to user-space
> >>>> in a raw manner. Not sure how well this can be utilised by a user-space
> >>>> tool. Do you have suggestions on how we can do this?
> >>>
> >>> There's already an userspace tool that handes it:
> >>> 	https://git.fedorahosted.org/cgit/rasdaemon.git/
> >>>
> >>> What is missing there on the current version is the bits that would allow
> >>> to translate from APEI way to report an error (memory node, card, module,
> >>> bank, device) into a DIMM label[1].
> >>
> >> If I'm reading this right, all APEI data seems to be squashed into a
> >> string in mc_event.
> >
> > Yes. We had lots of discussion about how to map memory errors over the
> > last couple years. Basically, it was decided that the information that
> > could be decoded into a DIMM to be mapped as integers, and all other
> > driver-specific data to be added as strings.
> >
> > On the tests I did, different machines/vendors fill the APEI data on
> > a different way, with makes harder to associate them to a DIMM.
> 
> Ok, so it looks like ghes_edac isn't quite useful yet.
> 
> In the meantime, like Boris suggests, I think we can have a different 
> trace event for raw APEI reports - userspace can use it as it pleases.
"In the meantime" is something that worries me the most. Kernel APIs should
be stable. We cannot randomly change it on each new kernel version.
Better to spend a little more time discussing than implementing a new trace
that will be removed on a near future.
> 
> Once ghes_edac gets better, users can decide whether they want raw APEI 
> reports or the EDAC-processed version and choose one or the other trace 
> event.
> 
> Regards,
> Naveen
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-13 17:32                 ` Naveen N. Rao
  2013-08-13 17:58                   ` Borislav Petkov
@ 2013-08-15  0:00                   ` Mauro Carvalho Chehab
  2013-08-15  9:43                     ` Borislav Petkov
  1 sibling, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15  0:00 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Tue, 13 Aug 2013 23:02:08 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/13/2013 06:12 PM, Borislav Petkov wrote:
> > On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
> >> You're right - my trace point makes all the data provided by apei
> >> as-is to userspace. However, ghes_edac seems to squash some of this
> >> data into a string when reporting through mc_event.
> >
> > Right, for systems which don't need EDAC to decode to the DIMM or for
> > which there are no EDAC drivers written, they could use a tracepoint
> > which carries APEI info as-is. Others, which need EDAC, should probably
> > use trace_mc_event and disable the APEI tracepoint.
> 
> If I'm not mistaken, even for systems that have EDAC drivers, it looks 
> to me like EDAC can't really decode to the DIMM given what is provided 
> by the bios in the APEI report currently.
Yes, the current APEI events, reported via EDAC, can't be decoded currently.
> If and when ghes_edac gains 
> this capability, users will have a choice between raw APEI reports vs. 
> edac processed ones.
An APEI-specific tracing won't fix it, as, AFAIKT, we don't have any way
to map it, even on userspace.
> 
> >
> > I think this should address Tony's concerns...
> >
> > Btw, you could call your TP something simpler like
> > trace_ghes_memory_event or so.
> 
> I started out with a simpler name, but eventually decided to use the 
> name from the CPER record so it is clear what this event carries. I 
> think this will be better when adding further ghes events for say, 
> processor generic, PCIe and others.
> 
> >
> > Btw 2, if GHES can report other types of errors (I'm pretty sure it can)
> > maybe we can use a single tracepoint called trace_ghes_event for any
> > types of errors coming out of it...
> 
> Two problems with this:
> - One, the record size will be really big since the cper records for 
> each type of error is large.
> - Two, it may be better to filter events based on the type of error 
> (memory error, processor, pcie, ...) rather than subscribing for all 
> ghes error reports.
I agree: per-type of error events is better than a big generic one.
> 
> >
> > Oh, and while at it, we probably need to start thinking of a mechanism
> > to disable all the error printing, i.e. cper_print_mem() and such,
> > if a userspace agent is listening in on the tracepoint and the error
> > information is carried through it to userspace.
> 
> Do you mean conditionally print the cper records based on whether the 
> tracepoint is enabled or not? Wouldn't that be confusing if someone is 
> monitoring dmesg as well?
> 
> 
> Thanks,
> Naveen
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14  5:43                           ` Borislav Petkov
  2013-08-14 18:38                             ` Luck, Tony
@ 2013-08-15  0:05                             ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15  0:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Naveen N. Rao, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
Em Wed, 14 Aug 2013 07:43:22 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Tue, Aug 13, 2013 at 08:13:56PM +0000, Luck, Tony wrote:
> > Generic tracepoints are architected to be able to fire at very high
> > rates and log huge amounts of information. So we'd need something
> > special to say just log these special tracepoints to network/serial.
> >
> > > Which reminds me, pstore could also be a good thing to use, in addition.
> > > Only put error info there as it is limited anyway.
> > 
> > Yes - space is very limited.  I don't know how to assign priority for logging
> > the dmesg data vs. some error logs.
> 
> Didn't we say at some point, "log only the panic messsage which kills
> the machine"?
EDAC core allows those kind of things, and even panic when errors arrive:
$ modinfo edac_core
filename:       /lib/modules/3.10.5-201.fc19.x86_64/kernel/drivers/edac/edac_core.ko
...
parm:           edac_pci_panic_on_pe:Panic on PCI Bus Parity error: 0=off 1=on (int)
parm:           edac_mc_panic_on_ue:Panic on uncorrected error: 0=off 1=on (int)
parm:           edac_mc_log_ue:Log uncorrectable error to console: 0=off 1=on (int)
parm:           edac_mc_log_ce:Log correctable error to console: 0=off 1=on (int)
Those have 644 permission, so they can be changed at runtime.
Of course, there are space for improvements.
> However, we probably could use more the messages before that
> catastrophic event because they could give us hints about what lead to
> the panic but in that case maybe a limited pstore is the wrong logging
> medium.
> 
> Actually, I can imagine the full serial/network logs of "special"
> tracepoints + dmesg to be the optimal thing.
> 
> > If we just "printk()" the most important parts - then that data will
> > automatically flow to the serial console and to pstore.
> 
> Actually, does the pstore act like a circular buffer? Because if it
> contains the last N relevant messages (for an arbitrary definition of
> relevant) before the system dies, then that could more helpful than only
> the error messages.
> 
> And with the advent of UEFI, pretty much every system has a pstore. Too
> bad that we have to limit it to 50% of size so that the boxes don't
> brick. :-P
> 
> > Then we have multiple paths for the critical bits of the error log
> > - and the tracepoints give us more details for the cases where the
> > machine doesn't spontaneously explode.
> 
> Ok, let's sort:
> 
> * First we have the not-so-critical hw error messages. We want to carry
> those out-of-band, i.e. not in dmesg so that people don't have to parse
> and collect dmesg but have a specialized solution which gives them
> structured logs and tools can analyze, collect and ... those errors.
> 
> * When a critical error happens, the above usage is not necessarily
> advantageous anymore in the sense that, in order to debug what caused
> the machine to crash, we don't simply necessarily want only the crash
> message but also the whole system activity that lead to it.
> 
> In which case, we probably actually want to turn off/ignore the error
> logging tracepoints and write *only* to dmesg which goes out over serial
> and to pstore. Right?
> 
> Because in such cases I want to have *all* *relevant* messages that lead
> to the explosion + the explosion message itself.
> 
> Makes sense? Yes, no? Aspects I've missed?
Makes sense to me.
> 
> Thanks.
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14 10:47                     ` Naveen N. Rao
  2013-08-14 12:18                       ` Borislav Petkov
@ 2013-08-15  0:15                       ` Mauro Carvalho Chehab
  2013-08-15 10:01                         ` Borislav Petkov
  1 sibling, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15  0:15 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Luck, Tony, Borislav Petkov, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
Em Wed, 14 Aug 2013 16:17:26 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/13/2013 11:09 PM, Luck, Tony wrote:
> >> In the meantime, like Boris suggests, I think we can have a different
> >> trace event for raw APEI reports - userspace can use it as it pleases.
> >>
> >> Once ghes_edac gets better, users can decide whether they want raw APEI
> >> reports or the EDAC-processed version and choose one or the other trace
> >> event.
> >
> > It's cheap to add as many tracepoints as we like - but may be costly to maintain.
> > Especially if we have to tinker with them later to adjust which things are logged,
> > that puts a burden on user-space tools to be updated to adapt to the changing
> > API.
> 
> Agree. And this is the reason I have been considering mc_event. But, the 
> below issues with ghes_edac made me unsure:
> - One, the logging format for APEI data is a bit verbose and hard to 
> parse. But, I suppose we could work with this if we make a few changes. 
> Is it ok to change how the APEI data is made available through 
> mc_event->driver_detail?
Well, as userspace currently only stores it, doing a few changes at
driver_detail is likely safe, but we need to know what do you intend to do.
> - Two, if ghes_edac is enabled, it prevents other edac drivers from 
> being loaded. It looks like the assumption here is that if ghes/firmware 
> first is enabled, then *all* memory errors are reported through ghes 
> which is not true. We could have (a subset of) corrected errors reported 
> through ghes, some through CMCI and uncorrected errors through MCE. So, 
> if I'm not mistaken, if ghes_edac is enabled, we will only receive ghes 
> error events through mc_event and not the others. Mauro, is this accurate?
Yes, that's the current assumption. It prevents to have both BIOS and a
direct-hardware-access-EDAC-driver to race, as this is known to have
serious issues.
Btw, that's basically the reason why EDAC core should be compiled builtin,
as we need to reserve resources for APEI/GHES before having a chance to
register another EDAC driver.
The current logic doesn't affect error reports via MCE, although I think
we should also try to mask it at kernel, as it is easier to avoid event
duplication in Kernelspace than in userspace (at least for some cases).
We may try to implement a fine graining type of resource locking. Feel free
to propose patches for it.
> 
> >
> > Mauro has written his user-space tool to process the ghes-edac events:
> >    git://git.fedorahosted.org/rasdaemon.git
> >
> > Who is writing the user space tools to process the new apei tracepoints
> > you want to add?
> 
> Enabling rasdaemon itself for the new tracepoint is an option, as long 
> as Mauro doesn't object to it ;)
I don't object to add new tracepoint events there, but I want to prevent
duplicate reports for the very same error. One thing is to have a single
memory corrected error. The other thing is to have a burst of errors at the
same DIMM. If the very same error starts to appear 2, 3, 4 times, then
userspace may take the wrong decision of replacing a good memory just
because of a single random error there.
> 
> >
> > I'm not opposed to these patches - just wondering who is taking the next step
> > to make them useful.
> 
> Sure.
> 
> 
> Regards,
> Naveen
> 
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14 10:57                     ` Naveen N. Rao
@ 2013-08-15  0:22                       ` Mauro Carvalho Chehab
  2013-08-15  9:38                         ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15  0:22 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Wed, 14 Aug 2013 16:27:06 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> On 08/13/2013 11:28 PM, Borislav Petkov wrote:
> > On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote:
> >> If I'm not mistaken, even for systems that have EDAC drivers, it looks
> >> to me like EDAC can't really decode to the DIMM given what is provided
> >> by the bios in the APEI report currently. If and when ghes_edac gains
> >> this capability, users will have a choice between raw APEI reports vs.
> >> edac processed ones.
> >
> > Which kinda makes that APEI tracepoint not really useful and we can call
> > the one we have already - trace_mc_event - from APEI...
> 
> This looks like a nice option. Mauro, what do you think?
I considered calling trace_mc_event directly in APEI, before writing ghes_edac.
I decided to implement it as a separate driver due to some reasons:
	1) EDAC core needs to know that it should reject "hardware first"
drivers. So, any solution would need to talk to EDAC core anyway (or we 
would need to write some other kind of resource allocation somewhere);
	2) EDAC userspace would need to detect the type of memory. Even
being crappy, the current way the memory is reported as a single contiguous
block at sysfs. So, EDAC userspace is aware that it can't decode the DIMM;
	3) If BIOS vendors add later some solution to enumerate the DIMMS
per memory controller, channel, socket with APEI, the addition to the
existing driver would be trivial.
So, while it would work to just call the tracing at APEI, on the current way,
I believe that having this part on a separate code is better.
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15  0:22                       ` Mauro Carvalho Chehab
@ 2013-08-15  9:38                         ` Borislav Petkov
  2013-08-15 13:26                           ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15  9:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Wed, Aug 14, 2013 at 09:22:11PM -0300, Mauro Carvalho Chehab wrote:
> 	1) EDAC core needs to know that it should reject "hardware first"
> drivers.
-ENOPARSE. What do you mean?
>  3) If BIOS vendors add later some solution to enumerate the DIMMS
> per memory controller, channel, socket with APEI, the addition to the
> existing driver would be trivial.
Actually, with BIOS vendors wanting to do firmware-first strategy with
DRAM errors, they must have a pretty good and intimate picture of the
memory topology at hand. So it is only a logical consequence for them,
when reporting a memory error to the OS to tell us the silkscreen label
too, while at it.
And if they do that, we don't need the additional layer - just a
tracepoint from APEI and a userspace script.
It's a whole another question if they don't do that.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15  0:00                   ` Mauro Carvalho Chehab
@ 2013-08-15  9:43                     ` Borislav Petkov
  0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15  9:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Wed, Aug 14, 2013 at 09:00:35PM -0300, Mauro Carvalho Chehab wrote:
> I agree: per-type of error events is better than a big generic one.
There are many types of hardware errors and having a single TP for each
is not a good design. Especially if the error formats are comparable
to a high degree. We probably would end up with a nice sensible
classification of having only a handful of TPs covering *all* hardware
events.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15  0:15                       ` Mauro Carvalho Chehab
@ 2013-08-15 10:01                         ` Borislav Petkov
  2013-08-15 13:34                           ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 10:01 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, Luck, Tony, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
On Wed, Aug 14, 2013 at 09:15:04PM -0300, Mauro Carvalho Chehab wrote:
> > - Two, if ghes_edac is enabled, it prevents other edac drivers
> > from being loaded. It looks like the assumption here is that if
> > ghes/firmware first is enabled, then *all* memory errors are
> > reported through ghes which is not true. We could have (a subset
> > of) corrected errors reported through ghes, some through CMCI
> > and uncorrected errors through MCE. So, if I'm not mistaken, if
> > ghes_edac is enabled, we will only receive ghes error events through
> > mc_event and not the others. Mauro, is this accurate?
> 
> Yes, that's the current assumption. It prevents to have both BIOS and a
> direct-hardware-access-EDAC-driver to race, as this is known to have
> serious issues.
Ok, this is getting confusing so let's shed some more light.
* First of all, Naveen is asking whether other *edac* drivers can
be loaded. And no, they cannot once the ghes thing is loaded which
potentially is a problem.
For example, if the chipset-specific driver has additional functionality
from ghes_edac, then that functionality is gone when ghes_edac loads
first. This might be a problem in some cases, we probably need to think
about this more in depth.
* Then, there's the trace_mce_record() TP which comes straight
from mce.c This one is always enabled unless the mce_disable_bank
functionality kicks in which you, Naveen, added :-).
* The CMCI stuff should be synchronized with the MCE TP so that should
be ok.
I think those are our current error reporting paths...
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14 23:56                   ` Mauro Carvalho Chehab
@ 2013-08-15 10:02                     ` Borislav Petkov
  0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 10:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel, Aristeu Rozanski Filho
On Wed, Aug 14, 2013 at 08:56:38PM -0300, Mauro Carvalho Chehab wrote:
> Better to spend a little more time discussing than implementing a new trace
> that will be removed on a near future.
Right, "in the meantime" we established that this new TP doesn't bring
us anything new so we might just as well use trace_mc_event and call it
either straight from APEI or in ghes_edac.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-14 18:38                             ` Luck, Tony
@ 2013-08-15 10:14                               ` Borislav Petkov
  2013-08-15 19:14                                 ` Luck, Tony
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 10:14 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
On Wed, Aug 14, 2013 at 06:38:09PM +0000, Luck, Tony wrote:
> We've wandered around different strategies here. We definitely
> want the panic log. Some people want all other "kernel exit" logs
> (shutdown, reboot, kexec). When there is enough space in the pstore
> backend we might also want the "oops" that preceeded the panic. (Of
> course when the oops happens we don't know the future, so have to
> save it just in case ... then if more "oops" happen we have to decide
> whether to keep the old oops log, or save the newer one).
Ok, dmesg over serial and *only* oops+panic in pstore. Right.
> Yes - longer logs are better. Sad that the pstore backend devices are
> measured in kilobytes :-)
Right, so good ole serial again to the rescue! There's no room for full
dmesg in nvram because it needs space for the UEFI GUI and some other
crap :-)
> No - write speed for the persistent storage backing pstore (flash)
> means we don't log as we go. We wait for a panic and then our
> registered function gets called so we can snapshot what is in the
> console log at that point. We also don't want to wear out the flash
> which may be soldered to the motherboard.
I suspected as much. So we can forget about using *only* pstore for hw
errors logging. It would be cool to do so but the technology simply
doesn't give it.
> Agreed - we shouldn't clutter logs with details of corrected errors.
> At most we should have a rate-limited log showing the count of
> corrected errors so that someone who just watches dmesg knows they
> should go dig deeper if they see some big number of corrected errors.
/me nods.
> Yes. There are people looking at various "flight recorder" modes for
> tracing that keep logs of normal events in a circular buffer in RAM
> ... if these exist they should be saved at crash time (and they are in
> the kexec/kdump path, but I don’t know if anyone does anything in
> the non-kdump case).
Right, the cheapest solution is serial. Simply log everything to serial
because we can. But this is the key thing I wanted to emphasize:
For severe hardware errors we don't want to use any tracepoint -
actually it is even a bad thing to do so because they would get lost in
some side channels which, during a critical situation, might not get
written to anything/survive the crash, etc.
So what I'm saying is, we basically want severe hardware errors to land
to good old dmesg and to all consoles. No fancy TP stuff for them.
> Tracepoints for errors that are going to lead to system crash would
> only be useful together with a flight recorder to make sure they get
> saved. I think tracepoints for corrected errors are better than dmesg
> logs.
Yes, exactly.
> In a perfect world yes - I don't know that we can achieve perfection
> - but we can iterate through good, better, even better. The really
> hard part of this is figuring out what is *relevant* to save before a
> particular crash happens.
Well, if I have serial connected to the box, it will contain basically
everything the machine said, no?
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15  9:38                         ` Borislav Petkov
@ 2013-08-15 13:26                           ` Mauro Carvalho Chehab
  2013-08-15 13:44                             ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15 13:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Thu, 15 Aug 2013 11:38:31 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Wed, Aug 14, 2013 at 09:22:11PM -0300, Mauro Carvalho Chehab wrote:
> > 	1) EDAC core needs to know that it should reject "hardware first"
> > drivers.
> 
> -ENOPARSE. What do you mean?
I mean that the edac core needs to know that, on a given system, the
BIOS is accessing the hardware registers and sending the data via ghes_edac.
On such case, it should reject the driver that reads such data directly
from the hardware, as having both active cause inconsistent error reports
(I got a few BZ reports about that).
> >  3) If BIOS vendors add later some solution to enumerate the DIMMS
> > per memory controller, channel, socket with APEI, the addition to the
> > existing driver would be trivial.
> 
> Actually, with BIOS vendors wanting to do firmware-first strategy with
> DRAM errors, they must have a pretty good and intimate picture of the
> memory topology at hand. So it is only a logical consequence for them,
> when reporting a memory error to the OS to tell us the silkscreen label
> too, while at it.
> 
> And if they do that, we don't need the additional layer - just a
> tracepoint from APEI and a userspace script.
No. As we want that fatal errors to also be properly reported, the
kernel will still need to know the memory layout.
Ok, such information can come via userspace, just like we do with the
other EDAC drivers, but we'll need to allow to dynamically create the
memory layout via sysfs (or to use some other interface for loading that
data).
> It's a whole another question if they don't do that.
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 10:01                         ` Borislav Petkov
@ 2013-08-15 13:34                           ` Mauro Carvalho Chehab
  2013-08-15 13:51                             ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15 13:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, Luck, Tony, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
Em Thu, 15 Aug 2013 12:01:32 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Wed, Aug 14, 2013 at 09:15:04PM -0300, Mauro Carvalho Chehab wrote:
> > > - Two, if ghes_edac is enabled, it prevents other edac drivers
> > > from being loaded. It looks like the assumption here is that if
> > > ghes/firmware first is enabled, then *all* memory errors are
> > > reported through ghes which is not true. We could have (a subset
> > > of) corrected errors reported through ghes, some through CMCI
> > > and uncorrected errors through MCE. So, if I'm not mistaken, if
> > > ghes_edac is enabled, we will only receive ghes error events through
> > > mc_event and not the others. Mauro, is this accurate?
> > 
> > Yes, that's the current assumption. It prevents to have both BIOS and a
> > direct-hardware-access-EDAC-driver to race, as this is known to have
> > serious issues.
> 
> Ok, this is getting confusing so let's shed some more light.
> 
> * First of all, Naveen is asking whether other *edac* drivers can
> be loaded. And no, they cannot once the ghes thing is loaded which
> potentially is a problem.
> 
> For example, if the chipset-specific driver has additional functionality
> from ghes_edac, then that functionality is gone when ghes_edac loads
> first. This might be a problem in some cases, we probably need to think
> about this more in depth.
Yes, but the thing is that it is not safe to use the hardware driver if
the BIOS is also reading the hardware error registers directly, as, on
several hardware, a read cause the error data to be cleaned on such 
register.
So, either APEI should be extended to allow some fine-grained that would
provide ways to check/control what resources would be reserved for BIOS 
only, or the user needs to tell BIOS/Kernel if they want BIOS
or OS to access the hardware.
Regards,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 13:26                           ` Mauro Carvalho Chehab
@ 2013-08-15 13:44                             ` Borislav Petkov
  2013-08-15 14:14                               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 13:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Thu, Aug 15, 2013 at 10:26:07AM -0300, Mauro Carvalho Chehab wrote:
> I mean that the edac core needs to know that, on a given system, the
> BIOS is accessing the hardware registers and sending the data via
> ghes_edac.
Right, that's the firmware-first thing which Naveen did - see
mce_disable_bank.
> No. As we want that fatal errors to also be properly reported, the
> kernel will still need to know the memory layout.
Read what I said: if you have the silkscreen label you don't need the
memory layout - you *already* *know* which DIMM is affected.
Also, fatal errors are a whole different beast where we run in NMI
context or we even don't get to run the #MC handler on some systems.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 13:34                           ` Mauro Carvalho Chehab
@ 2013-08-15 13:51                             ` Borislav Petkov
  2013-08-15 18:16                               ` Luck, Tony
  0 siblings, 1 reply; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 13:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, Luck, Tony, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
On Thu, Aug 15, 2013 at 10:34:21AM -0300, Mauro Carvalho Chehab wrote:
> Yes, but the thing is that it is not safe to use the hardware driver
> if the BIOS is also reading the hardware error registers directly, as,
> on several hardware, a read cause the error data to be cleaned on such
> register.
Here's the deal:
* We parse some APEI table and disable those MCA banks which the BIOS
wants to handle first.
* When the BIOS decides to report an error from that handling, it does
so over another BIOS table.
* Now you have two possibilities:
** On systems without an edac driver or where it doesn't make sense to
have the ghes_edac driver, we call trace_mc_event() straight from APEI
code (this is what we're currently discussung).
** On other systems, where we need ghes_edac, we *don't* use the
trace_mc_event() tracepoint in the APEI code but let it come from
ghes_edac with additional information collected by edac.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 13:44                             ` Borislav Petkov
@ 2013-08-15 14:14                               ` Mauro Carvalho Chehab
  2013-08-15 16:11                                 ` Borislav Petkov
  2013-08-15 19:20                                 ` Luck, Tony
  0 siblings, 2 replies; 65+ messages in thread
From: Mauro Carvalho Chehab @ 2013-08-15 14:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
Em Thu, 15 Aug 2013 15:44:54 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Thu, Aug 15, 2013 at 10:26:07AM -0300, Mauro Carvalho Chehab wrote:
> > I mean that the edac core needs to know that, on a given system, the
> > BIOS is accessing the hardware registers and sending the data via
> > ghes_edac.
> 
> Right, that's the firmware-first thing which Naveen did - see
> mce_disable_bank.
> 
> > No. As we want that fatal errors to also be properly reported, the
> > kernel will still need to know the memory layout.
> 
> Read what I said: if you have the silkscreen label you don't need the
> memory layout - you *already* *know* which DIMM is affected.
AFAIKT, APEI doesn't provide the silkscreen label. Some code (or some
datasheet) is needed to translate between what APEI provides into the
silkscreen label.
Naveen, please correct me if I'm wrong.
> Also, fatal errors are a whole different beast where we run in NMI
> context or we even don't get to run the #MC handler on some systems.
I see.
Yes, APEI currently prints only a raw event on high severity errors
at ghes_notify_nmi(), and doesn't call ghes_edac. Changing it would
require to parse the error at __ghes_print_estatus(). Not sure how
easy would be to change that.
Em Thu, 15 Aug 2013 15:51:06 +0200
Borislav Petkov <bp@alien8.de> escreveu:
> On Thu, Aug 15, 2013 at 10:34:21AM -0300, Mauro Carvalho Chehab wrote:
> > Yes, but the thing is that it is not safe to use the hardware driver
> > if the BIOS is also reading the hardware error registers directly, as,
> > on several hardware, a read cause the error data to be cleaned on such
> > register.
> 
> Here's the deal:
> 
> * We parse some APEI table and disable those MCA banks which the BIOS
> wants to handle first.
> 
> * When the BIOS decides to report an error from that handling, it does
> so over another BIOS table.
OK.
> 
> * Now you have two possibilities:
> 
> ** On systems without an edac driver or where it doesn't make sense to
> have the ghes_edac driver, we call trace_mc_event() straight from APEI
> code (this is what we're currently discussung).
> 
> ** On other systems, where we need ghes_edac, we *don't* use the
> trace_mc_event() tracepoint in the APEI code but let it come from
> ghes_edac with additional information collected by edac.
I don't see why should we have those two alternatives, as, at worse
case (e. g. if ghes_edac can't enrich the APEI data with labels),
they'll basically provide the very same data to userspace, and the
EDAC extra overhead is small, on its error report logic.
The risk of doing the very same thing on two different places is that
the logic to encapsulate APEI data into trace_mc_event() would be
on two separate places. It risks that someone would change one of the
drivers and forget to apply the very same change on the other, causing
parse errors on userspace, depending on the source.
-- 
Cheers,
Mauro
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 14:14                               ` Mauro Carvalho Chehab
@ 2013-08-15 16:11                                 ` Borislav Petkov
  2013-08-15 19:20                                 ` Luck, Tony
  1 sibling, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 16:11 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Naveen N. Rao, tony.luck, bhelgaas, rostedt, rjw, lance.ortiz,
	linux-pci, linux-acpi, linux-kernel
On Thu, Aug 15, 2013 at 11:14:07AM -0300, Mauro Carvalho Chehab wrote:
> I don't see why should we have those two alternatives, as, at worse
> case (e. g. if ghes_edac can't enrich the APEI data with labels),
> they'll basically provide the very same data to userspace, and the
> EDAC extra overhead is small, on its error report logic.
Well, a couple of reasons.
The first and foremost one is having another layer which needs
registration, etc. because ghes_edac pulls the whole edac core stuff
with it. The thinner we are, the less overhead we cause. And this is
a must for RAS.
Actually, this is a must for all kernel code - the faster we can get
done and the thinner we are, the better. We absolutely definitely don't
want to have a useless layer in the error reporting path just because it
is easier.
This short path will pay out later in error storms and other
resources-constrained situations.
Furthermore, dealing with another edac driver is not trivial for
distros, like going around and telling people that all of a sudden
they need to enable ghes_edac. This is tangential to the issue which
Naveen raised that on some machines, you want not only ghes_edac but the
platform-specific one. Which doesn't work currently and we don't have a
clear solution on how to get it working yet.
Finally, userspace doesn't care where it gets its TP data from as long
as it is there.
> The risk of doing the very same thing on two different places is that
> the logic to encapsulate APEI data into trace_mc_event() would be on
> two separate places. It risks that someone would change one of the
> drivers and forget to apply the very same change on the other, causing
> parse errors on userspace, depending on the source.
We'll make sure that doesn't happen.
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 13:51                             ` Borislav Petkov
@ 2013-08-15 18:16                               ` Luck, Tony
  2013-08-15 18:41                                 ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-15 18:16 UTC (permalink / raw)
  To: Borislav Petkov, Mauro Carvalho Chehab
  Cc: Naveen N. Rao, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Aristeu Rozanski Filho
PiAqIFdlIHBhcnNlIHNvbWUgQVBFSSB0YWJsZSBhbmQgZGlzYWJsZSB0aG9zZSBNQ0EgYmFua3Mg
d2hpY2ggdGhlIEJJT1MNCj4gd2FudHMgdG8gaGFuZGxlIGZpcnN0Lg0KDQpXZSBoYXZlIG5vIGlk
ZWEgd2hpY2ggZXJyb3JzIHRoZSBCSU9TIGhhcyBjaG9zZW4gZm9yIGl0c2VsZi4gIFdlIGp1c3Qg
a25vdyB3aGljaA0KYmFuayBudW1iZXJzIC4uLiBhbmQgSW50ZWwgcHJvY2Vzc29ycyBjaGFuZ2Ug
bWFwcGluZ3Mgb2Ygd2hpY2ggZXJyb3JzIGFyZSBsb2dnZWQNCmluIHdoaWNoIGJhbmtzIGluIGV2
ZXJ5IG5ldyBwcm9jZXNzb3IgdG9jayAoYW5kIHNvbWV0aW1lcyB0aWNrKS4gIFNvbWUgYmFua3MN
CmFyZSBkb2N1bWVudGVkIGluIHByb2Nlc3NvciBkYXRhc2hlZXQuIG1vc3QgYXJlIG5vdC4gTW9z
dCBjb21tb24gY2FzZSBtaWdodA0Kd2VsbCBiZSBtZW1vcnkgLi4uIGJ1dCBpdCBjb3VsZCBiZSBj
YWNoZSwgb3IgSS9PLCBvciAuLi4NCg0KU28gdGhpcyBkb2Vzbid0IGhlbHAgTWF1cm8gZmlndXJl
IG91dCB3aGV0aGVyIHRvIGFsbG93IGxvYWRpbmcgb2YgYW4NCkVEQUMgZHJpdmVyIHRoYXQgd2ls
bCBwZWVrIGFuZCBwb2tlIGF0IGNoaXBzZXQgc3BlY2lmaWMgcmVnaXN0ZXJzIGluDQpwb3NzaWJs
eSByYWN5IHdheXMgd2l0aCBCSU9TIGNvZGUgZG9pbmcgdGhlIHNhbWUgdGhpbmcuDQoNCi1Ub255
DQo=
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 18:16                               ` Luck, Tony
@ 2013-08-15 18:41                                 ` Borislav Petkov
  0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 18:41 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Mauro Carvalho Chehab, Naveen N. Rao, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Aristeu Rozanski Filho
On Thu, Aug 15, 2013 at 06:16:48PM +0000, Luck, Tony wrote:
> > * We parse some APEI table and disable those MCA banks which the BIOS
> > wants to handle first.
>
> We have no idea which errors the BIOS has chosen for itself. We just
> know which bank numbers ...
Well, those which BIOS hasn't chosen for itself get simply handled up
through, HEST it is, I think. So it all goes out in APEI anyway...
> and Intel processors change mappings of which errors are logged in
> which banks in every new processor tock (and sometimes tick). Some
> banks are documented in processor datasheet. most are not. Most common
> case might well be memory ... but it could be cache, or I/O, or ...
>
> So this doesn't help Mauro figure out whether to allow loading of an
> EDAC driver that will peek and poke at chipset specific registers in
> possibly racy ways with BIOS code doing the same thing.
That doesn't matter - the only thing that matters is if an EDAC driver
has anything additional to bring to the table. If it does, then it gets
to see the errors before they're dumped to userspace. If not, then APEI
should report them directly.
Mind you, if we've disabled an MCA bank for the kernel then no EDAC
driver gets to see errors from it either because APEI has taken
responsibility. Unless said driver is poking around MCA registers -
which it shouldn't.
So I'd guess the decision to load an EDAC driver should be a platform
one. A platform which gives *sufficient* information in APEI tables for
an error doesn't need an EDAC driver. Older platforms or platforms which
cannot supply sufficient information for, say, properly pinpointing the
DIMM, should use the additional help of an EDAC driver for that, if
possible.
Which begs the most important question: do we even have a platform that
can give us sufficient information without the need for an EDAC driver?
Because if not, we should stop wasting energy pointlessly and simply
drop this discussion: we basically load an EDAC driver and do not do the
APEI tracepoint because it simply doesn't make any sense and there's no
actual platform giving us that info.
So, which is it?
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 10:14                               ` Borislav Petkov
@ 2013-08-15 19:14                                 ` Luck, Tony
  2013-08-15 19:43                                   ` Borislav Petkov
  0 siblings, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-15 19:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
PiBXZWxsLCBpZiBJIGhhdmUgc2VyaWFsIGNvbm5lY3RlZCB0byB0aGUgYm94LCBpdCB3aWxsIGNv
bnRhaW4gYmFzaWNhbGx5DQo+IGV2ZXJ5dGhpbmcgdGhlIG1hY2hpbmUgc2FpZCwgbm8/DQoNClll
cyAtIGJ1dCB0aGUgc2VyaWFsIHBvcnQgaXMgdG9vIHNsb3cgdG8gbG9nIGV2ZXJ5dGhpbmcgdGhh
dCB5b3UgbWlnaHQNCmNvbmNlaXZhYmx5IG5lZWQgdG8gZGVidWcgeW91ciBwcm9ibGVtLiAgSW1h
Z2luZSB0cnlpbmcgdG8gbG9nIGV2ZXJ5DQppbnRlcnJ1cHQgYW5kIGV2ZXJ5IHBhZ2VmYXVsdCBv
biBldmVyeSBwcm9jZXNzb3IgZG93biBhIHNpbmdsZSAxMTUyMDANCmJhdWQgY29ubmVjdGlvbi4g
IFRodXMgbXkgZWFybGllciBjb21tZW50cyBhYm91dCBndWVzc2luZyB3aGF0IHRvDQpsb2cuDQoN
Ci1Ub255DQo=
^ permalink raw reply	[flat|nested] 65+ messages in thread
* RE: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 14:14                               ` Mauro Carvalho Chehab
  2013-08-15 16:11                                 ` Borislav Petkov
@ 2013-08-15 19:20                                 ` Luck, Tony
  2013-08-15 19:41                                   ` Borislav Petkov
  1 sibling, 1 reply; 65+ messages in thread
From: Luck, Tony @ 2013-08-15 19:20 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Borislav Petkov
  Cc: Naveen N. Rao, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
> AFAIKT, APEI doesn't provide the silkscreen label. Some code (or some
> datasheet) is needed to translate between what APEI provides into the
> silkscreen label.
In theory it could. The ACPI generic error structure used to report includes
a 20-byte free format field which a BIOS could use to describe the location
of the error.  Haven't seen anyone do this yet - and our internal BIOS people
looked at me like I was crazy to suggest such a thing.  But I still entertain
hopes that some OEM will do the right thing and raise the bar of usefulness.
-Tony
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 19:20                                 ` Luck, Tony
@ 2013-08-15 19:41                                   ` Borislav Petkov
  0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 19:41 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Mauro Carvalho Chehab, Naveen N. Rao, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
On Thu, Aug 15, 2013 at 07:20:29PM +0000, Luck, Tony wrote:
> In theory it could. The ACPI generic error structure used to report
> includes a 20-byte free format field which a BIOS could use to
> describe the location of the error. Haven't seen anyone do this yet -
> and our internal BIOS people looked at me like I was crazy to suggest
> such a thing. But I still entertain hopes that some OEM will do the
> right thing and raise the bar of usefulness.
Sadly, I can report similar experiences.
I did try to get the RAS hw people persuaded that spitting out the DIMM
mapping is best done by the BIOS because it has that info already -
it is a matter of carrying it out. Alas, more important events took
place... :)
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
-- 
^ permalink raw reply	[flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
  2013-08-15 19:14                                 ` Luck, Tony
@ 2013-08-15 19:43                                   ` Borislav Petkov
  0 siblings, 0 replies; 65+ messages in thread
From: Borislav Petkov @ 2013-08-15 19:43 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Naveen N. Rao, Mauro Carvalho Chehab, bhelgaas@google.com,
	rostedt@goodmis.org, rjw@sisk.pl, lance.ortiz@hp.com,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org
On Thu, Aug 15, 2013 at 07:14:34PM +0000, Luck, Tony wrote:
> Yes - but the serial port is too slow to log everything that you might
> conceivably need to debug your problem. Imagine trying to log every
> interrupt and every pagefault on every processor down a single 115200
> baud connection. Thus my earlier comments about guessing what to log.
Right, but a normally sized dmesg usually gets through ok. Selecting
what to log could be a very tedious and error-prone process...
-- 
Regards/Gruss,
    Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply	[flat|nested] 65+ messages in thread
end of thread, other threads:[~2013-08-15 19:43 UTC | newest]
Thread overview: 65+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-08 18:27 [PATCH 0/3] Add trace event for ghes memory error Naveen N. Rao
2013-08-08 18:27 ` [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally Naveen N. Rao
2013-08-08 19:23   ` Steven Rostedt
2013-08-12 11:37     ` Naveen N. Rao
2013-08-12 13:13       ` Steven Rostedt
2013-08-12 13:26         ` Borislav Petkov
2013-08-08 18:27 ` [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error Naveen N. Rao
2013-08-08 19:17   ` Borislav Petkov
2013-08-12 11:28     ` Naveen N. Rao
2013-08-08 18:27 ` [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event Naveen N. Rao
2013-08-08 19:38   ` Mauro Carvalho Chehab
2013-08-10 18:03     ` Borislav Petkov
2013-08-12 11:33       ` Mauro Carvalho Chehab
2013-08-12 12:38         ` Borislav Petkov
2013-08-12 14:49           ` Mauro Carvalho Chehab
2013-08-12 15:04             ` Borislav Petkov
2013-08-12 17:25               ` Mauro Carvalho Chehab
2013-08-12 17:54                 ` Luck, Tony
2013-08-12 17:56                 ` Borislav Petkov
2013-08-13 11:36                   ` Naveen N. Rao
2013-08-13 12:21                     ` Mauro Carvalho Chehab
2013-08-13 12:33                       ` Borislav Petkov
2013-08-13 16:55                       ` Naveen N. Rao
2013-08-14 23:54                         ` Mauro Carvalho Chehab
2013-08-12 12:41         ` Naveen N. Rao
2013-08-12 12:53           ` Borislav Petkov
2013-08-13 11:21             ` Naveen N. Rao
2013-08-13 12:42               ` Borislav Petkov
2013-08-13 17:32                 ` Naveen N. Rao
2013-08-13 17:58                   ` Borislav Petkov
2013-08-13 18:05                     ` Luck, Tony
2013-08-13 18:10                       ` Borislav Petkov
2013-08-13 20:13                         ` Luck, Tony
2013-08-14  5:43                           ` Borislav Petkov
2013-08-14 18:38                             ` Luck, Tony
2013-08-15 10:14                               ` Borislav Petkov
2013-08-15 19:14                                 ` Luck, Tony
2013-08-15 19:43                                   ` Borislav Petkov
2013-08-15  0:05                             ` Mauro Carvalho Chehab
2013-08-14 10:57                     ` Naveen N. Rao
2013-08-15  0:22                       ` Mauro Carvalho Chehab
2013-08-15  9:38                         ` Borislav Petkov
2013-08-15 13:26                           ` Mauro Carvalho Chehab
2013-08-15 13:44                             ` Borislav Petkov
2013-08-15 14:14                               ` Mauro Carvalho Chehab
2013-08-15 16:11                                 ` Borislav Petkov
2013-08-15 19:20                                 ` Luck, Tony
2013-08-15 19:41                                   ` Borislav Petkov
2013-08-15  0:00                   ` Mauro Carvalho Chehab
2013-08-15  9:43                     ` Borislav Petkov
2013-08-12 14:44           ` Mauro Carvalho Chehab
2013-08-13 11:41             ` Naveen N. Rao
2013-08-13 12:41               ` Mauro Carvalho Chehab
2013-08-13 17:17                 ` Naveen N. Rao
2013-08-13 17:39                   ` Luck, Tony
2013-08-14 10:47                     ` Naveen N. Rao
2013-08-14 12:18                       ` Borislav Petkov
2013-08-15  0:15                       ` Mauro Carvalho Chehab
2013-08-15 10:01                         ` Borislav Petkov
2013-08-15 13:34                           ` Mauro Carvalho Chehab
2013-08-15 13:51                             ` Borislav Petkov
2013-08-15 18:16                               ` Luck, Tony
2013-08-15 18:41                                 ` Borislav Petkov
2013-08-14 23:56                   ` Mauro Carvalho Chehab
2013-08-15 10:02                     ` Borislav Petkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).