linux-acpi.vger.kernel.org archive mirror
* [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
@ 2013-11-25  7:15 Chen, Gong
  2013-11-25  7:15 ` [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes " Chen, Gong
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Chen, Gong @ 2013-11-25  7:15 UTC (permalink / raw)
  To: tony.luck, bp, naveen.n.rao; +Cc: linux-acpi, Chen, Gong

SCI is usually employed to handle corrected errors, memory corrected
errors in particular. In fact, SCI can also be used to handle any
error, such as a memory uncorrected error or even a fatal error, if
the BIOS enables it. Errors reported this way should be logged, too.

v1 -> v2: make the event record more precise

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce-apei.c | 10 +++++++---
 drivers/acpi/apei/ghes.c              |  3 +--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index de8b60a..d137ab8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -33,6 +33,7 @@
 #include <linux/acpi.h>
 #include <linux/cper.h>
 #include <acpi/apei.h>
+#include <acpi/ghes.h>
 #include <asm/mce.h>
 
 #include "mce-internal.h"
@@ -41,14 +42,17 @@ void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
 {
 	struct mce m;
 
-	/* Only corrected MC is reported */
-	if (!corrected || !(mem_err->validation_bits & CPER_MEM_VALID_PA))
+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
 
 	mce_setup(&m);
 	m.bank = 1;
-	/* Fake a memory read corrected error with unknown channel */
+	/* Fake a memory read error with unknown channel */
 	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
+	if (corrected >= GHES_SEV_RECOVERABLE)
+		m.status |= MCI_STATUS_UC;
+	if (corrected >= GHES_SEV_PANIC)
+		m.status |= MCI_STATUS_PCC;
 	m.addr = mem_err->physical_addr;
 	mce_log(&m);
 	mce_notify_irq();
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index a30bc31..ce3683d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -453,8 +453,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 #ifdef CONFIG_X86_MCE
-			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
-						  mem_err);
+			apei_mce_report_mem_error(sev, mem_err);
 #endif
 			ghes_handle_memory_failure(gdata, sev);
 		}
-- 
1.8.4.3
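
The status word built by the patch above can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: fake_mem_status() is a hypothetical helper, and the constants are copied by hand from include/acpi/ghes.h and arch/x86/include/asm/mce.h rather than included from them. The low bits 0x9f are the "memory read error with unknown channel" code that the patch keeps for every severity.

```c
#include <assert.h>
#include <stdint.h>

/* Severity values mirrored from include/acpi/ghes.h. */
enum {
	GHES_SEV_NO = 0,
	GHES_SEV_CORRECTED = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC = 3,
};

/* MCi_STATUS bits mirrored from arch/x86/include/asm/mce.h. */
#define MCI_STATUS_VAL   (1ULL << 63)	/* register contents are valid */
#define MCI_STATUS_UC    (1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)	/* error reporting enabled */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* address field is valid */
#define MCI_STATUS_PCC   (1ULL << 57)	/* processor context corrupt */

/*
 * Hypothetical helper: returns the faked status word that the patch
 * stores in m.status for a given GHES severity.
 */
uint64_t fake_mem_status(int severity)
{
	uint64_t status;

	assert(severity >= GHES_SEV_NO && severity <= GHES_SEV_PANIC);

	/* Same base value for every severity. */
	status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;

	/* Recoverable and fatal errors are both uncorrected. */
	if (severity >= GHES_SEV_RECOVERABLE)
		status |= MCI_STATUS_UC;
	/* Only fatal errors additionally mark the context as corrupt. */
	if (severity >= GHES_SEV_PANIC)
		status |= MCI_STATUS_PCC;

	return status;
}
```

Because the GHES severities are ordered, the two ">=" comparisons are enough: a fatal error gets both UC and PCC, a recoverable one gets only UC, and a corrected one gets neither.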



* [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-25  7:15 [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling Chen, Gong
@ 2013-11-25  7:15 ` Chen, Gong
  2013-11-26  6:54   ` Chen, Gong
  2013-11-26  9:04   ` Naveen N. Rao
  2013-11-25 17:13 ` [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check " Borislav Petkov
  2013-11-26  9:02 ` Naveen N. Rao
  2 siblings, 2 replies; 14+ messages in thread
From: Chen, Gong @ 2013-11-25  7:15 UTC (permalink / raw)
  To: tony.luck, bp, naveen.n.rao; +Cc: linux-acpi, Chen, Gong

Clean up the logic of ghes_handle_memory_failure() to make it
simpler and cleaner.

v1 -> v2: fix a compile error and make some minor changes.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
 drivers/acpi/apei/ghes.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ce3683d..46766ef 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -413,27 +413,31 @@ static void ghes_handle_memory_failure(struct acpi_generic_data *gdata, int sev)
 {
 #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
 	unsigned long pfn;
+	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
 	struct cper_sec_mem_err *mem_err;
 	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
 
-	if (sec_sev == GHES_SEV_CORRECTED &&
-	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
-	    (mem_err->validation_bits & CPER_MEM_VALID_PA)) {
-		pfn = mem_err->physical_addr >> PAGE_SHIFT;
-		if (pfn_valid(pfn))
-			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
-		else if (printk_ratelimit())
-			pr_warn(FW_WARN GHES_PFX
-			"Invalid address in generic error data: %#llx\n",
-			mem_err->physical_addr);
-	}
-	if (sev == GHES_SEV_RECOVERABLE &&
-	    sec_sev == GHES_SEV_RECOVERABLE &&
-	    mem_err->validation_bits & CPER_MEM_VALID_PA) {
-		pfn = mem_err->physical_addr >> PAGE_SHIFT;
-		memory_failure_queue(pfn, 0, 0);
+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
+		return;
+
+	pfn = mem_err->physical_addr >> PAGE_SHIFT;
+	if (!pfn_valid(pfn)) {
+		pr_warn_ratelimited(FW_WARN GHES_PFX
+		"Invalid address in generic error data: %#llx\n",
+		mem_err->physical_addr);
+		return;
 	}
+
+	/* iff following two events can be handled properly by now */
+	if (sec_sev == GHES_SEV_CORRECTED &&
+	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
+		flags = MF_SOFT_OFFLINE;
+	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
+		flags = 0;
+
+	if (flags != -1)
+		memory_failure_queue(pfn, 0, flags);
 #endif
 }
 
-- 
1.8.4.3
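
The decision made by the reworked ghes_handle_memory_failure() can be modelled as a pure function, which makes the cases easier to enumerate. This is a user-space sketch, not kernel code: pick_memory_failure_flags() and MF_NOTHING_TO_QUEUE are hypothetical names, the two boolean parameters stand in for the CPER_MEM_VALID_PA test and for pfn_valid(), and the numeric values of CPER_SEC_ERROR_THRESHOLD_EXCEEDED and MF_SOFT_OFFLINE are assumed to match the kernel headers of that time.

```c
#include <assert.h>
#include <stdbool.h>

/* Severity values mirrored from include/acpi/ghes.h. */
enum {
	GHES_SEV_NO = 0,
	GHES_SEV_CORRECTED = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC = 3,
};

/* Assumed values; only their distinctness matters for this sketch. */
#define CPER_SEC_ERROR_THRESHOLD_EXCEEDED 0x0008
#define MF_SOFT_OFFLINE (1 << 3)

/* Hypothetical name for the patch's "flags = -1" sentinel. */
#define MF_NOTHING_TO_QUEUE (-1)

/*
 * Returns the flags that would be passed to memory_failure_queue(),
 * or MF_NOTHING_TO_QUEUE when the page must not be queued at all.
 */
int pick_memory_failure_flags(int sev, int sec_sev,
			      unsigned int section_flags,
			      bool pa_valid, bool pfn_is_valid)
{
	int flags = MF_NOTHING_TO_QUEUE;

	assert(sev >= GHES_SEV_NO && sev <= GHES_SEV_PANIC);

	/* Both early returns of the patch: no address, or a bogus PFN. */
	if (!pa_valid || !pfn_is_valid)
		return MF_NOTHING_TO_QUEUE;

	/* Corrected error over threshold: soft-offline the page. */
	if (sec_sev == GHES_SEV_CORRECTED &&
	    (section_flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
		flags = MF_SOFT_OFFLINE;

	/* Recoverable record and recoverable section: plain memory failure. */
	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
		flags = 0;

	return flags;
}
```

The sentinel is needed because 0 is itself a valid flags value for the recoverable case, so "nothing to do" cannot be encoded as 0.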



* Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
  2013-11-25  7:15 [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling Chen, Gong
  2013-11-25  7:15 ` [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes " Chen, Gong
@ 2013-11-25 17:13 ` Borislav Petkov
  2013-11-26  9:02 ` Naveen N. Rao
  2 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2013-11-25 17:13 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, naveen.n.rao, linux-acpi

On Mon, Nov 25, 2013 at 02:15:00AM -0500, Chen, Gong wrote:
> Usually SCI is employed to handle corrected error, especially
> for memory corrected error but in fact SCI still can be used
> to handle any error like memory uncorrected error even fatal
> error if BIOS enable it. For this kind of situation, it
> should be logged, too.
> 
> v2 -> v1: make the event record more precisely
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>

Looks ok to me.

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-25  7:15 ` [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes " Chen, Gong
@ 2013-11-26  6:54   ` Chen, Gong
  2013-11-26  7:23     ` Borislav Petkov
  2013-11-26  9:04   ` Naveen N. Rao
  1 sibling, 1 reply; 14+ messages in thread
From: Chen, Gong @ 2013-11-26  6:54 UTC (permalink / raw)
  To: tony.luck, bp, naveen.n.rao; +Cc: linux-acpi

On Mon, Nov 25, 2013 at 02:15:01AM -0500, Chen, Gong wrote:
> Date: Mon, 25 Nov 2013 02:15:01 -0500
> From: "Chen, Gong" <gong.chen@linux.intel.com>
> To: tony.luck@intel.com, bp@alien8.de, naveen.n.rao@linux.vnet.ibm.com
> Cc: linux-acpi@vger.kernel.org, "Chen, Gong" <gong.chen@linux.intel.com>
> Subject: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory
>  error handling
> X-Mailer: git-send-email 1.8.4.3
> 
> Cleanup the logic for function ghes_handle_memory_failure. Just
> make it simpler and cleaner.
> 
> v2 -> v1: fix a compile error & some minor changes.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>  drivers/acpi/apei/ghes.c | 36 ++++++++++++++++++++----------------
>  1 file changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index ce3683d..46766ef 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -413,27 +413,31 @@ static void ghes_handle_memory_failure(struct acpi_generic_data *gdata, int sev)
>  {
>  #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>  	unsigned long pfn;
> +	int flags = -1;
>  	int sec_sev = ghes_severity(gdata->error_severity);
>  	struct cper_sec_mem_err *mem_err;
>  	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
>  
> -	if (sec_sev == GHES_SEV_CORRECTED &&
> -	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
> -	    (mem_err->validation_bits & CPER_MEM_VALID_PA)) {
> -		pfn = mem_err->physical_addr >> PAGE_SHIFT;
> -		if (pfn_valid(pfn))
> -			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
> -		else if (printk_ratelimit())
> -			pr_warn(FW_WARN GHES_PFX
> -			"Invalid address in generic error data: %#llx\n",
> -			mem_err->physical_addr);
> -	}
> -	if (sev == GHES_SEV_RECOVERABLE &&
> -	    sec_sev == GHES_SEV_RECOVERABLE &&
> -	    mem_err->validation_bits & CPER_MEM_VALID_PA) {
> -		pfn = mem_err->physical_addr >> PAGE_SHIFT;
> -		memory_failure_queue(pfn, 0, 0);
> +	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> +		return;
> +
> +	pfn = mem_err->physical_addr >> PAGE_SHIFT;
> +	if (!pfn_valid(pfn)) {
> +		pr_warn_ratelimited(FW_WARN GHES_PFX
> +		"Invalid address in generic error data: %#llx\n",
> +		mem_err->physical_addr);
> +		return;
>  	}
> +
> +	/* iff following two events can be handled properly by now */
> +	if (sec_sev == GHES_SEV_CORRECTED &&
> +	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
> +		flags = MF_SOFT_OFFLINE;
> +	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> +		flags = 0;
> +
> +	if (flags != -1)
> +		memory_failure_queue(pfn, 0, flags);
>  #endif
>  }
>  

Hi, Boris

The so-called cleanup in this patch also adds an implicit PFN validity
check for UC errors, which is missing from the current code.
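
The behavioural difference for the recoverable (UC) path can be reduced to two small predicates. This is a sketch under assumptions, not the kernel code: both function names are hypothetical, both assume that sev and sec_sev are already GHES_SEV_RECOVERABLE, and the booleans stand in for the CPER_MEM_VALID_PA test and for pfn_valid().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Before the patch: the recoverable path queued the page as soon as
 * the record carried a physical address; pfn_valid() was never asked.
 */
bool uc_page_queued_before(bool pa_valid, bool pfn_is_valid)
{
	(void)pfn_is_valid;	/* deliberately ignored, as in the old code */
	return pa_valid;
}

/*
 * After the patch: the shared early return rejects an invalid PFN
 * before either the corrected or the recoverable case is considered.
 */
bool uc_page_queued_after(bool pa_valid, bool pfn_is_valid)
{
	return pa_valid && pfn_is_valid;
}
```

The two predicates differ only for a record whose physical address is flagged valid but does not map to a valid PFN, which is the firmware-bug case that the warning message reports.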


* Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-26  6:54   ` Chen, Gong
@ 2013-11-26  7:23     ` Borislav Petkov
  2013-11-27  2:15       ` Chen, Gong
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2013-11-26  7:23 UTC (permalink / raw)
  To: Chen, Gong; +Cc: tony.luck, naveen.n.rao, linux-acpi

On Tue, Nov 26, 2013 at 01:54:57AM -0500, Chen, Gong wrote:
> In this patch so-called cleanup includes an implied PFN check for UC
> error but missed in current codes.

Right, I was about to look at it. You probably should add this to the
commit message so that it is clear.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
  2013-11-25  7:15 [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling Chen, Gong
  2013-11-25  7:15 ` [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes " Chen, Gong
  2013-11-25 17:13 ` [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check " Borislav Petkov
@ 2013-11-26  9:02 ` Naveen N. Rao
  2013-11-26  9:31   ` Chen, Gong
  2 siblings, 1 reply; 14+ messages in thread
From: Naveen N. Rao @ 2013-11-26  9:02 UTC (permalink / raw)
  To: Chen, Gong, tony.luck, bp; +Cc: linux-acpi

On 11/25/2013 12:45 PM, Chen, Gong wrote:
> Usually SCI is employed to handle corrected error, especially
> for memory corrected error but in fact SCI still can be used
> to handle any error like memory uncorrected error even fatal
> error if BIOS enable it. For this kind of situation, it
> should be logged, too.
>
> v2 -> v1: make the event record more precisely
>
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
>   arch/x86/kernel/cpu/mcheck/mce-apei.c | 10 +++++++---
>   drivers/acpi/apei/ghes.c              |  3 +--
>   2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> index de8b60a..d137ab8 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> @@ -33,6 +33,7 @@
>   #include <linux/acpi.h>
>   #include <linux/cper.h>
>   #include <acpi/apei.h>
> +#include <acpi/ghes.h>
>   #include <asm/mce.h>
>
>   #include "mce-internal.h"
> @@ -41,14 +42,17 @@ void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
>   {
>   	struct mce m;
>
> -	/* Only corrected MC is reported */
> -	if (!corrected || !(mem_err->validation_bits & CPER_MEM_VALID_PA))
> +	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>   		return;
>
>   	mce_setup(&m);
>   	m.bank = 1;
> -	/* Fake a memory read corrected error with unknown channel */
> +	/* Fake a memory read error with unknown channel */
>   	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
> +	if (corrected >= GHES_SEV_RECOVERABLE)
> +		m.status |= MCI_STATUS_UC;
> +	if (corrected >= GHES_SEV_PANIC)
> +		m.status |= MCI_STATUS_PCC;

Hmm... so you only fill up the most basic information from the cper 
record. In the absence of 'S', 'AR' bits, I am not sure how useful this 
is - except for logging the error through /dev/mcelog for legacy users. 
If that is the intent, you have my

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>


- Naveen

>   	m.addr = mem_err->physical_addr;
>   	mce_log(&m);
>   	mce_notify_irq();
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index a30bc31..ce3683d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -453,8 +453,7 @@ static void ghes_do_proc(struct ghes *ghes,
>   			ghes_edac_report_mem_error(ghes, sev, mem_err);
>
>   #ifdef CONFIG_X86_MCE
> -			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
> -						  mem_err);
> +			apei_mce_report_mem_error(sev, mem_err);
>   #endif
>   			ghes_handle_memory_failure(gdata, sev);
>   		}
>



* Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-25  7:15 ` [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes " Chen, Gong
  2013-11-26  6:54   ` Chen, Gong
@ 2013-11-26  9:04   ` Naveen N. Rao
  2013-12-21 12:41     ` Borislav Petkov
  1 sibling, 1 reply; 14+ messages in thread
From: Naveen N. Rao @ 2013-11-26  9:04 UTC (permalink / raw)
  To: Chen, Gong, tony.luck, bp; +Cc: linux-acpi

On 11/25/2013 12:45 PM, Chen, Gong wrote:
> Cleanup the logic for function ghes_handle_memory_failure. Just
> make it simpler and cleaner.
>
> v2 -> v1: fix a compile error & some minor changes.
>
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

> ---
>   drivers/acpi/apei/ghes.c | 36 ++++++++++++++++++++----------------
>   1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index ce3683d..46766ef 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -413,27 +413,31 @@ static void ghes_handle_memory_failure(struct acpi_generic_data *gdata, int sev)
>   {
>   #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>   	unsigned long pfn;
> +	int flags = -1;
>   	int sec_sev = ghes_severity(gdata->error_severity);
>   	struct cper_sec_mem_err *mem_err;
>   	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
>
> -	if (sec_sev == GHES_SEV_CORRECTED &&
> -	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
> -	    (mem_err->validation_bits & CPER_MEM_VALID_PA)) {
> -		pfn = mem_err->physical_addr >> PAGE_SHIFT;
> -		if (pfn_valid(pfn))
> -			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
> -		else if (printk_ratelimit())
> -			pr_warn(FW_WARN GHES_PFX
> -			"Invalid address in generic error data: %#llx\n",
> -			mem_err->physical_addr);
> -	}
> -	if (sev == GHES_SEV_RECOVERABLE &&
> -	    sec_sev == GHES_SEV_RECOVERABLE &&
> -	    mem_err->validation_bits & CPER_MEM_VALID_PA) {
> -		pfn = mem_err->physical_addr >> PAGE_SHIFT;
> -		memory_failure_queue(pfn, 0, 0);
> +	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> +		return;
> +
> +	pfn = mem_err->physical_addr >> PAGE_SHIFT;
> +	if (!pfn_valid(pfn)) {
> +		pr_warn_ratelimited(FW_WARN GHES_PFX
> +		"Invalid address in generic error data: %#llx\n",
> +		mem_err->physical_addr);
> +		return;
>   	}
> +
> +	/* iff following two events can be handled properly by now */
> +	if (sec_sev == GHES_SEV_CORRECTED &&
> +	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
> +		flags = MF_SOFT_OFFLINE;
> +	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> +		flags = 0;
> +
> +	if (flags != -1)
> +		memory_failure_queue(pfn, 0, flags);
>   #endif
>   }
>



* Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
  2013-11-26  9:02 ` Naveen N. Rao
@ 2013-11-26  9:31   ` Chen, Gong
  2013-12-14 13:42     ` Chen, Gong
  0 siblings, 1 reply; 14+ messages in thread
From: Chen, Gong @ 2013-11-26  9:31 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: tony.luck, bp, linux-acpi

On Tue, Nov 26, 2013 at 02:32:53PM +0530, Naveen N. Rao wrote:
> Date: Tue, 26 Nov 2013 14:32:53 +0530
> From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
> To: "Chen, Gong" <gong.chen@linux.intel.com>, tony.luck@intel.com,
>  bp@alien8.de
> CC: linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for
>  memory error handling
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
>  Thunderbird/24.1.0
> 
> On 11/25/2013 12:45 PM, Chen, Gong wrote:
> >Usually SCI is employed to handle corrected error, especially
> >for memory corrected error but in fact SCI still can be used
> >to handle any error like memory uncorrected error even fatal
> >error if BIOS enable it. For this kind of situation, it
> >should be logged, too.
> >
> >v2 -> v1: make the event record more precisely
> >
> >Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> >---
> >  arch/x86/kernel/cpu/mcheck/mce-apei.c | 10 +++++++---
> >  drivers/acpi/apei/ghes.c              |  3 +--
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> >
> >diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> >index de8b60a..d137ab8 100644
> >--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
> >+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> >@@ -33,6 +33,7 @@
> >  #include <linux/acpi.h>
> >  #include <linux/cper.h>
> >  #include <acpi/apei.h>
> >+#include <acpi/ghes.h>
> >  #include <asm/mce.h>
> >
> >  #include "mce-internal.h"
> >@@ -41,14 +42,17 @@ void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
> >  {
> >  	struct mce m;
> >
> >-	/* Only corrected MC is reported */
> >-	if (!corrected || !(mem_err->validation_bits & CPER_MEM_VALID_PA))
> >+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> >  		return;
> >
> >  	mce_setup(&m);
> >  	m.bank = 1;
> >-	/* Fake a memory read corrected error with unknown channel */
> >+	/* Fake a memory read error with unknown channel */
> >  	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
> >+	if (corrected >= GHES_SEV_RECOVERABLE)
> >+		m.status |= MCI_STATUS_UC;
> >+	if (corrected >= GHES_SEV_PANIC)
> >+		m.status |= MCI_STATUS_PCC;
> 
> Hmm... so you only fill up the most basic information from the cper
> record. In the absence of 'S', 'AR' bits, I am not sure how useful
> this is - except for logging the error through /dev/mcelog for
> legacy users. If that is the intent, you have my
> 
> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> 
> 
> - Naveen
> 

Thanks for your ACK. We would like to record more information, but
UEFI/CPER is essentially unrelated to MCE, so we cannot derive all
the information needed to construct a full MCE record. IOW, we can
only fill in the most valuable information, such as the physical
address, and fake the other fields. From this point of view, this
method of reporting H/W error events is still not perfect.
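
To make the "basic information only" point concrete, the following sketch decodes a status word into flag names. It is an illustration only: describe_status() is a hypothetical helper, and the bit positions are copied by hand from arch/x86/include/asm/mce.h. For the words the patch can produce, 'S' and 'AR' never appear in the output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* MCi_STATUS bits mirrored from arch/x86/include/asm/mce.h. */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_PCC   (1ULL << 57)
#define MCI_STATUS_S     (1ULL << 56)	/* signaled machine check */
#define MCI_STATUS_AR    (1ULL << 55)	/* action required */

/*
 * Hypothetical helper: writes the names of the set flags into buf,
 * separated by single spaces. buf always ends up NUL-terminated.
 */
void describe_status(uint64_t status, char *buf, size_t len)
{
	static const struct {
		uint64_t bit;
		const char *name;
	} bits[] = {
		{ MCI_STATUS_VAL, "VAL" },   { MCI_STATUS_UC, "UC" },
		{ MCI_STATUS_EN, "EN" },     { MCI_STATUS_ADDRV, "ADDRV" },
		{ MCI_STATUS_PCC, "PCC" },   { MCI_STATUS_S, "S" },
		{ MCI_STATUS_AR, "AR" },
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(status & bits[i].bit))
			continue;
		/* strncat() appends at most n bytes plus the terminator. */
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
}
```

A consumer that relies on S or AR to pick a recovery action therefore gets nothing from these records beyond UC/PCC and the address, which matches the "logging for legacy users" reading above.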


* Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-26  7:23     ` Borislav Petkov
@ 2013-11-27  2:15       ` Chen, Gong
  2013-12-14 13:42         ` Chen, Gong
  0 siblings, 1 reply; 14+ messages in thread
From: Chen, Gong @ 2013-11-27  2:15 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, naveen.n.rao, linux-acpi

On Tue, Nov 26, 2013 at 08:23:35AM +0100, Borislav Petkov wrote:
> Date: Tue, 26 Nov 2013 08:23:35 +0100
> From: Borislav Petkov <bp@alien8.de>
> To: "Chen, Gong" <gong.chen@linux.intel.com>
> Cc: tony.luck@intel.com, naveen.n.rao@linux.vnet.ibm.com,
>  linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory
>  error handling
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Tue, Nov 26, 2013 at 01:54:57AM -0500, Chen, Gong wrote:
> > In this patch so-called cleanup includes an implied PFN check for UC
> > error but missed in current codes.
> 
> Right, I was about to look at it. You probably should add this to the
> commit message so that it is clear.
> 

How about this:

Add a proper PFN validity check for UC errors and clean up the code
logic to make it simpler and cleaner.

If that is OK and reasonable for this patch, would you mind updating
the patch description before merging it?


* Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-27  2:15       ` Chen, Gong
@ 2013-12-14 13:42         ` Chen, Gong
  0 siblings, 0 replies; 14+ messages in thread
From: Chen, Gong @ 2013-12-14 13:42 UTC (permalink / raw)
  To: Borislav Petkov, tony.luck, naveen.n.rao, linux-acpi

On Tue, Nov 26, 2013 at 09:15:42PM -0500, Chen, Gong wrote:
> Date:	Tue, 26 Nov 2013 21:15:42 -0500
> From: "Chen, Gong" <gong.chen@linux.intel.com>
> To: Borislav Petkov <bp@alien8.de>
> Cc: tony.luck@intel.com, naveen.n.rao@linux.vnet.ibm.com,
>  linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory
>  error handling
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Tue, Nov 26, 2013 at 08:23:35AM +0100, Borislav Petkov wrote:
> > Date: Tue, 26 Nov 2013 08:23:35 +0100
> > From: Borislav Petkov <bp@alien8.de>
> > To: "Chen, Gong" <gong.chen@linux.intel.com>
> > Cc: tony.luck@intel.com, naveen.n.rao@linux.vnet.ibm.com,
> >  linux-acpi@vger.kernel.org
> > Subject: Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory
> >  error handling
> > User-Agent: Mutt/1.5.21 (2010-09-15)
> > 
> > On Tue, Nov 26, 2013 at 01:54:57AM -0500, Chen, Gong wrote:
> > > In this patch so-called cleanup includes an implied PFN check for UC
> > > error but missed in current codes.
> > 
> > Right, I was about to look at it. You probably should add this to the
> > commit message so that it is clear.
> > 
> 
> How about this:
> 
> Add proper PFN validity check for UC error and cleanup the code logic
> to make it simpler and cleaner.
> 
> If OK and reasonable for this patch, would you mind helping to update the
> introduction in the patch before merging it?

Hi, Boris

Will you pick up this patch in your RAS pull request?


* Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
  2013-11-26  9:31   ` Chen, Gong
@ 2013-12-14 13:42     ` Chen, Gong
  2013-12-16 14:51       ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Chen, Gong @ 2013-12-14 13:42 UTC (permalink / raw)
  To: Naveen N. Rao, tony.luck, bp, linux-acpi

On Tue, Nov 26, 2013 at 04:31:36AM -0500, Chen, Gong wrote:
> Date:	Tue, 26 Nov 2013 04:31:36 -0500
> From: "Chen, Gong" <gong.chen@linux.intel.com>
> To: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
> Cc: tony.luck@intel.com, bp@alien8.de, linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for
>  memory error handling
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Tue, Nov 26, 2013 at 02:32:53PM +0530, Naveen N. Rao wrote:
> > Date: Tue, 26 Nov 2013 14:32:53 +0530
> > From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
> > To: "Chen, Gong" <gong.chen@linux.intel.com>, tony.luck@intel.com,
> >  bp@alien8.de
> > CC: linux-acpi@vger.kernel.org
> > Subject: Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for
> >  memory error handling
> > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
> >  Thunderbird/24.1.0
> > 
> > On 11/25/2013 12:45 PM, Chen, Gong wrote:
> > >Usually SCI is employed to handle corrected error, especially
> > >for memory corrected error but in fact SCI still can be used
> > >to handle any error like memory uncorrected error even fatal
> > >error if BIOS enable it. For this kind of situation, it
> > >should be logged, too.
> > >
> > >v2 -> v1: make the event record more precisely
> > >
> > >Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> > >---
> > >  arch/x86/kernel/cpu/mcheck/mce-apei.c | 10 +++++++---
> > >  drivers/acpi/apei/ghes.c              |  3 +--
> > >  2 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > >diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> > >index de8b60a..d137ab8 100644
> > >--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
> > >+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> > >@@ -33,6 +33,7 @@
> > >  #include <linux/acpi.h>
> > >  #include <linux/cper.h>
> > >  #include <acpi/apei.h>
> > >+#include <acpi/ghes.h>
> > >  #include <asm/mce.h>
> > >
> > >  #include "mce-internal.h"
> > >@@ -41,14 +42,17 @@ void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
> > >  {
> > >  	struct mce m;
> > >
> > >-	/* Only corrected MC is reported */
> > >-	if (!corrected || !(mem_err->validation_bits & CPER_MEM_VALID_PA))
> > >+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> > >  		return;
> > >
> > >  	mce_setup(&m);
> > >  	m.bank = 1;
> > >-	/* Fake a memory read corrected error with unknown channel */
> > >+	/* Fake a memory read error with unknown channel */
> > >  	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
> > >+	if (corrected >= GHES_SEV_RECOVERABLE)
> > >+		m.status |= MCI_STATUS_UC;
> > >+	if (corrected >= GHES_SEV_PANIC)
> > >+		m.status |= MCI_STATUS_PCC;
> > 
> > Hmm... so you only fill up the most basic information from the cper
> > record. In the absence of 'S', 'AR' bits, I am not sure how useful
> > this is - except for logging the error through /dev/mcelog for
> > legacy users. If that is the intent, you have my
> > 
> > Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> > 
> > 
> > - Naveen
> > 
> 
> Thanks for your ACK. We want to record more information but you know
> UEFI/CPER is not related to MCE in essentially. So we can't figure
> out all necessary information to construct MCE record. IOW, we can
> just apply the most valuable information like physical address and
> fake other fields. From this point of view, this kind of H/W error
> event report method is still not perfect.

Hi, Boris

Will you pick up this patch in your RAS pull request?


* Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
  2013-12-16 14:51       ` Borislav Petkov
@ 2013-12-16 14:40         ` Chen, Gong
  0 siblings, 0 replies; 14+ messages in thread
From: Chen, Gong @ 2013-12-16 14:40 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Naveen N. Rao, tony.luck, linux-acpi

On Mon, Dec 16, 2013 at 03:51:29PM +0100, Borislav Petkov wrote:
> Date: Mon, 16 Dec 2013 15:51:29 +0100
> From: Borislav Petkov <bp@alien8.de>
> To: "Chen, Gong" <gong.chen@linux.intel.com>
> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>, tony.luck@intel.com,
>  linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for
>  memory error handling
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Sat, Dec 14, 2013 at 08:42:56AM -0500, Chen, Gong wrote:
> > Will you pick up this patch in your RAS request pull?
> 
> Applied, with commit message massaging and s/corrected/severity/
> automatic variable change, version below:
> 
Thanks very much for your effort.


* Re: [PATCH v2 1/2] ACPI, APEI, GHES: Remove strict check for memory error handling
  2013-12-14 13:42     ` Chen, Gong
@ 2013-12-16 14:51       ` Borislav Petkov
  2013-12-16 14:40         ` Chen, Gong
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2013-12-16 14:51 UTC (permalink / raw)
  To: Chen, Gong; +Cc: Naveen N. Rao, tony.luck, linux-acpi

On Sat, Dec 14, 2013 at 08:42:56AM -0500, Chen, Gong wrote:
> Will you pick up this patch in your RAS request pull?

Applied, with commit message massaging and s/corrected/severity/
automatic variable change, version below:

--
From: "Chen, Gong" <gong.chen@linux.intel.com>
Subject: [PATCH] ACPI, APEI, GHES: Do not report only correctable errors with SCI

Currently SCI is employed to handle corrected errors, and memory
corrected errors, more specifically but in fact SCI still can be used to
handle any errors, e.g. uncorrected or even fatal ones if enabled by the
BIOS. Enable logging for those kinds of errors too.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message, rename automatic variable. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce-apei.c | 14 ++++++++++----
 drivers/acpi/apei/ghes.c              |  3 +--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index de8b60a53f69..a1aef9533154 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -33,22 +33,28 @@
 #include <linux/acpi.h>
 #include <linux/cper.h>
 #include <acpi/apei.h>
+#include <acpi/ghes.h>
 #include <asm/mce.h>
 
 #include "mce-internal.h"
 
-void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
+void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 {
 	struct mce m;
 
-	/* Only corrected MC is reported */
-	if (!corrected || !(mem_err->validation_bits & CPER_MEM_VALID_PA))
+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
 
 	mce_setup(&m);
 	m.bank = 1;
-	/* Fake a memory read corrected error with unknown channel */
+	/* Fake a memory read error with unknown channel */
 	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
+
+	if (severity >= GHES_SEV_RECOVERABLE)
+		m.status |= MCI_STATUS_UC;
+	if (severity >= GHES_SEV_PANIC)
+		m.status |= MCI_STATUS_PCC;
+
 	m.addr = mem_err->physical_addr;
 	mce_log(&m);
 	mce_notify_irq();
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index a30bc313787b..ce3683d93a13 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -453,8 +453,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 #ifdef CONFIG_X86_MCE
-			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
-						  mem_err);
+			apei_mce_report_mem_error(sev, mem_err);
 #endif
 			ghes_handle_memory_failure(gdata, sev);
 		}
-- 
1.8.4

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v2 2/2] ACPI, APEI, GHES: Cleanup ghes codes for memory error handling
  2013-11-26  9:04   ` Naveen N. Rao
@ 2013-12-21 12:41     ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2013-12-21 12:41 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: Chen, Gong, tony.luck, linux-acpi

On Tue, Nov 26, 2013 at 02:34:51PM +0530, Naveen N. Rao wrote:
> On 11/25/2013 12:45 PM, Chen, Gong wrote:
> >Cleanup the logic for function ghes_handle_memory_failure. Just
> >make it simpler and cleaner.
> >
> >v2 -> v1: fix a compile error & some minor changes.
> >
> >Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> 
> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
