linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
@ 2010-03-29  7:16 Huang Ying
  2010-03-29  8:26 ` Hidetoshi Seto
  0 siblings, 1 reply; 8+ messages in thread
From: Huang Ying @ 2010-03-29  7:16 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Andi Kleen, Hidetoshi Seto
  Cc: linux-kernel@vger.kernel.org

It is reported that CMCI is not raised when number of corrected error
reaches preset threshold. After inspection, it is found that
MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
fixed it.

Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/mce.h             |    3 +++
 arch/x86/kernel/cpu/mcheck/mce_intel.c |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c3fdd6..355f298 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -38,6 +38,9 @@
 #define MCM_ADDR_MEM	 3	/* memory address */
 #define MCM_ADDR_GENERIC 7	/* generic */
 
+/* CTL2 register defines */
+#define MCI_CTL2_THRESHOLD_MASK	0x7fff
+
 #define MCJ_CTX_MASK		3
 #define MCJ_CTX(flags)		((flags) & MCJ_CTX_MASK)
 #define MCJ_CTX_RANDOM		0    /* inject context: random */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index d15df6e..ffe730d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -101,6 +101,7 @@ static void cmci_discover(int banks, int boot)
 			continue;
 		}
 
+		val &= ~MCI_CTL2_THRESHOLD_MASK;
 		val |= CMCI_EN | CMCI_THRESHOLD;
 		wrmsrl(MSR_IA32_MCx_CTL2(i), val);
 		rdmsrl(MSR_IA32_MCx_CTL2(i), val);



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  2010-03-29  7:16 Huang Ying
@ 2010-03-29  8:26 ` Hidetoshi Seto
  2010-03-29  8:40   ` Huang Ying
  2010-03-29 10:50   ` Andi Kleen
  0 siblings, 2 replies; 8+ messages in thread
From: Hidetoshi Seto @ 2010-03-29  8:26 UTC (permalink / raw)
  To: Huang Ying
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen,
	linux-kernel@vger.kernel.org

(2010/03/29 16:16), Huang Ying wrote:
> It is reported that CMCI is not raised when number of corrected error
> reaches preset threshold. After inspection, it is found that
> MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> fixed it.
> 
> Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/include/asm/mce.h             |    3 +++
>  arch/x86/kernel/cpu/mcheck/mce_intel.c |    1 +
>  2 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 6c3fdd6..355f298 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -38,6 +38,9 @@
>  #define MCM_ADDR_MEM	 3	/* memory address */
>  #define MCM_ADDR_GENERIC 7	/* generic */
>  
> +/* CTL2 register defines */
> +#define MCI_CTL2_THRESHOLD_MASK	0x7fff
> +
>  #define MCJ_CTX_MASK		3
>  #define MCJ_CTX(flags)		((flags) & MCJ_CTX_MASK)
>  #define MCJ_CTX_RANDOM		0    /* inject context: random */
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index d15df6e..ffe730d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -101,6 +101,7 @@ static void cmci_discover(int banks, int boot)
>  			continue;
>  		}
>  
> +		val &= ~MCI_CTL2_THRESHOLD_MASK;
>  		val |= CMCI_EN | CMCI_THRESHOLD;
>  		wrmsrl(MSR_IA32_MCx_CTL2(i), val);
>  		rdmsrl(MSR_IA32_MCx_CTL2(i), val);

Hum, it seems that the CTL2 of reported environment had
be initialized to have more than CMCI_THRESHOLD(=1) by
BIOS etc.  Could you explain more about your inspection?

Maybe we could handle this environment if kernel supports
APEI's firmware-first for corrected MCE and if this is in
that case.

Patch looks good. How about cc-ing stable?

Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>


Thanks,
H.Seto


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  2010-03-29  8:26 ` Hidetoshi Seto
@ 2010-03-29  8:40   ` Huang Ying
  2010-03-29 10:50   ` Andi Kleen
  1 sibling, 0 replies; 8+ messages in thread
From: Huang Ying @ 2010-03-29  8:40 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen,
	linux-kernel@vger.kernel.org

Hi, Seto san,

Thanks for review.

On Mon, 2010-03-29 at 16:26 +0800, Hidetoshi Seto wrote:
> (2010/03/29 16:16), Huang Ying wrote:
> > It is reported that CMCI is not raised when number of corrected error
> > reaches preset threshold. After inspection, it is found that
> > MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> > fixed it.
> > 
> > Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
> > Signed-off-by: Huang Ying <ying.huang@intel.com>
> > Acked-by: Andi Kleen <ak@linux.intel.com>
> > ---
> >  arch/x86/include/asm/mce.h             |    3 +++
> >  arch/x86/kernel/cpu/mcheck/mce_intel.c |    1 +
> >  2 files changed, 4 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 6c3fdd6..355f298 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -38,6 +38,9 @@
> >  #define MCM_ADDR_MEM	 3	/* memory address */
> >  #define MCM_ADDR_GENERIC 7	/* generic */
> >  
> > +/* CTL2 register defines */
> > +#define MCI_CTL2_THRESHOLD_MASK	0x7fff
> > +
> >  #define MCJ_CTX_MASK		3
> >  #define MCJ_CTX(flags)		((flags) & MCJ_CTX_MASK)
> >  #define MCJ_CTX_RANDOM		0    /* inject context: random */
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > index d15df6e..ffe730d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > @@ -101,6 +101,7 @@ static void cmci_discover(int banks, int boot)
> >  			continue;
> >  		}
> >  
> > +		val &= ~MCI_CTL2_THRESHOLD_MASK;
> >  		val |= CMCI_EN | CMCI_THRESHOLD;
> >  		wrmsrl(MSR_IA32_MCx_CTL2(i), val);
> >  		rdmsrl(MSR_IA32_MCx_CTL2(i), val);
> 
> Hum, it seems that the CTL2 of reported environment had
> be initialized to have more than CMCI_THRESHOLD(=1) by
> BIOS etc.  Could you explain more about your inspection?

In CTL2, threshold is initialized to 10 by BIOS.

> Maybe we could handle this environment if kernel supports
> APEI's firmware-first for corrected MCE and if this is in
> that case.

I still don't know much about the details of the interaction between
BIOS and OS here.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  2010-03-29  8:26 ` Hidetoshi Seto
  2010-03-29  8:40   ` Huang Ying
@ 2010-03-29 10:50   ` Andi Kleen
  1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2010-03-29 10:50 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Huang Ying, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	linux-kernel@vger.kernel.org

Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> writes:
>
> Maybe we could handle this environment if kernel supports
> APEI's firmware-first for corrected MCE and if this is in
> that case.

The problem at hand has nothing to do with firmware first
or APEI.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
@ 2010-05-17  8:08 Huang Ying
  2010-05-17 17:59 ` H. Peter Anvin
  0 siblings, 1 reply; 8+ messages in thread
From: Huang Ying @ 2010-05-17  8:08 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Andi Kleen, Hidetoshi Seto
  Cc: linux-kernel@vger.kernel.org

It is reported that CMCI is not raised when number of corrected error
reaches preset threshold. After inspection, it is found that
MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
fixed it.


Changelog:

v2:

- Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
  MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.

Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/include/asm/mce.h             |    4 ++++
 arch/x86/include/asm/msr-index.h       |    3 ---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |    9 +++++----
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c3fdd6..6202db7 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -38,6 +38,10 @@
 #define MCM_ADDR_MEM	 3	/* memory address */
 #define MCM_ADDR_GENERIC 7	/* generic */
 
+/* CTL2 register defines */
+#define MCI_CTL2_CMCI_EN		(1ULL << 30)
+#define MCI_CTL2_CMCI_THRESHOLD_MASK	0x7fffULL
+
 #define MCJ_CTX_MASK		3
 #define MCJ_CTX(flags)		((flags) & MCJ_CTX_MASK)
 #define MCJ_CTX_RANDOM		0    /* inject context: random */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4604e6a..baa6370 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -91,9 +91,6 @@
 #define MSR_IA32_MC0_CTL2		0x00000280
 #define MSR_IA32_MCx_CTL2(x)		(MSR_IA32_MC0_CTL2 + (x))
 
-#define CMCI_EN			(1ULL << 30)
-#define CMCI_THRESHOLD_MASK		0xffffULL
-
 #define MSR_P6_PERFCTR0			0x000000c1
 #define MSR_P6_PERFCTR1			0x000000c2
 #define MSR_P6_EVNTSEL0			0x00000186
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 62b48e4..6fcd093 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -95,19 +95,20 @@ static void cmci_discover(int banks, int boot)
 		rdmsrl(MSR_IA32_MCx_CTL2(i), val);
 
 		/* Already owned by someone else? */
-		if (val & CMCI_EN) {
+		if (val & MCI_CTL2_CMCI_EN) {
 			if (test_and_clear_bit(i, owned) && !boot)
 				print_update("SHD", &hdr, i);
 			__clear_bit(i, __get_cpu_var(mce_poll_banks));
 			continue;
 		}
 
-		val |= CMCI_EN | CMCI_THRESHOLD;
+		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+		val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
 		wrmsrl(MSR_IA32_MCx_CTL2(i), val);
 		rdmsrl(MSR_IA32_MCx_CTL2(i), val);
 
 		/* Did the enable bit stick? -- the bank supports CMCI */
-		if (val & CMCI_EN) {
+		if (val & MCI_CTL2_CMCI_EN) {
 			if (!test_and_set_bit(i, owned) && !boot)
 				print_update("CMCI", &hdr, i);
 			__clear_bit(i, __get_cpu_var(mce_poll_banks));
@@ -155,7 +156,7 @@ void cmci_clear(void)
 			continue;
 		/* Disable CMCI */
 		rdmsrl(MSR_IA32_MCx_CTL2(i), val);
-		val &= ~(CMCI_EN|CMCI_THRESHOLD_MASK);
+		val &= ~(MCI_CTL2_CMCI_EN|MCI_CTL2_CMCI_THRESHOLD_MASK);
 		wrmsrl(MSR_IA32_MCx_CTL2(i), val);
 		__clear_bit(i, __get_cpu_var(mce_banks_owned));
 	}



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  2010-05-17  8:08 [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup Huang Ying
@ 2010-05-17 17:59 ` H. Peter Anvin
  2010-05-18  0:51   ` Huang Ying
  0 siblings, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2010-05-17 17:59 UTC (permalink / raw)
  To: Huang Ying
  Cc: Ingo Molnar, Andi Kleen, Hidetoshi Seto,
	linux-kernel@vger.kernel.org

On 05/17/2010 01:08 AM, Huang Ying wrote:
> It is reported that CMCI is not raised when number of corrected error
> reaches preset threshold. After inspection, it is found that
> MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> fixed it.
> 
> 
> Changelog:
> 
> v2:
> 
> - Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
>   MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
> 

This looks like a mix of a renaming patch and new functionality.  Please
submit the renaming as a one patch and then the new functionality as a
second patch on top, otherwise it gets hard to see what is actually
going on.

If I'm not mistaken, there are at least two functionality changes:

+#define MCI_CTL2_CMCI_THRESHOLD_MASK	0x7fffULL
-#define CMCI_THRESHOLD_MASK		0xffffULL

... change of mask, and:

-		val |= CMCI_EN | CMCI_THRESHOLD;
+		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+		val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;

bit being cleared which wasn't before.

	-hpa

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  2010-05-17 17:59 ` H. Peter Anvin
@ 2010-05-18  0:51   ` Huang Ying
  2010-05-18  1:37     ` H. Peter Anvin
  0 siblings, 1 reply; 8+ messages in thread
From: Huang Ying @ 2010-05-18  0:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Andi Kleen, Hidetoshi Seto,
	linux-kernel@vger.kernel.org

Hi, Peter,

Thanks for review.

On Tue, 2010-05-18 at 01:59 +0800, H. Peter Anvin wrote:
> On 05/17/2010 01:08 AM, Huang Ying wrote:
> > It is reported that CMCI is not raised when number of corrected error
> > reaches preset threshold. After inspection, it is found that
> > MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> > fixed it.
> > 
> > 
> > Changelog:
> > 
> > v2:
> > 
> > - Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
> >   MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
> > 
> 
> This looks like a mix of a renaming patch and new functionality.  Please
> submit the renaming as a one patch and then the new functionality as a
> second patch on top, otherwise it gets hard to see what is actually
> going on.
> 
> If I'm not mistaken, there are at least two functionality changes:
> 
> +#define MCI_CTL2_CMCI_THRESHOLD_MASK	0x7fffULL
> -#define CMCI_THRESHOLD_MASK		0xffffULL
> 
> ... change of mask, and:
> 
> -		val |= CMCI_EN | CMCI_THRESHOLD;
> +		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> +		val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
> 
> bit being cleared which wasn't before.

This means we need 3 patches?

Best Regards,
Huang Ying



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  2010-05-18  0:51   ` Huang Ying
@ 2010-05-18  1:37     ` H. Peter Anvin
  0 siblings, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2010-05-18  1:37 UTC (permalink / raw)
  To: Huang Ying
  Cc: Ingo Molnar, Andi Kleen, Hidetoshi Seto,
	linux-kernel@vger.kernel.org

On 05/17/2010 05:51 PM, Huang Ying wrote:
> Hi, Peter,
> 
> Thanks for review.
> 
> On Tue, 2010-05-18 at 01:59 +0800, H. Peter Anvin wrote:
>> On 05/17/2010 01:08 AM, Huang Ying wrote:
>>> It is reported that CMCI is not raised when number of corrected error
>>> reaches preset threshold. After inspection, it is found that
>>> MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
>>> fixed it.
>>>
>>>
>>> Changelog:
>>>
>>> v2:
>>>
>>> - Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
>>>   MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
>>>
>>
>> This looks like a mix of a renaming patch and new functionality.  Please
>> submit the renaming as a one patch and then the new functionality as a
>> second patch on top, otherwise it gets hard to see what is actually
>> going on.
>>
>> If I'm not mistaken, there are at least two functionality changes:
>>
>> +#define MCI_CTL2_CMCI_THRESHOLD_MASK	0x7fffULL
>> -#define CMCI_THRESHOLD_MASK		0xffffULL
>>
>> ... change of mask, and:
>>
>> -		val |= CMCI_EN | CMCI_THRESHOLD;
>> +		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
>> +		val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
>>
>> bit being cleared which wasn't before.
> 
> This means we need 3 patches?
> 

No, two patches is fine -- one for the rename (no binary changes) and
one for the functionality changes.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-05-18  1:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-17  8:08 [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup Huang Ying
2010-05-17 17:59 ` H. Peter Anvin
2010-05-18  0:51   ` Huang Ying
2010-05-18  1:37     ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2010-03-29  7:16 Huang Ying
2010-03-29  8:26 ` Hidetoshi Seto
2010-03-29  8:40   ` Huang Ying
2010-03-29 10:50   ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).