* [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
@ 2010-03-29 7:16 Huang Ying
2010-03-29 8:26 ` Hidetoshi Seto
0 siblings, 1 reply; 8+ messages in thread
From: Huang Ying @ 2010-03-29 7:16 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Andi Kleen, Hidetoshi Seto
Cc: linux-kernel@vger.kernel.org
It is reported that CMCI is not raised when number of corrected error
reaches preset threshold. After inspection, it is found that
MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
fixed it.
Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/mce.h | 3 +++
arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 +
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c3fdd6..355f298 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -38,6 +38,9 @@
#define MCM_ADDR_MEM 3 /* memory address */
#define MCM_ADDR_GENERIC 7 /* generic */
+/* CTL2 register defines */
+#define MCI_CTL2_THRESHOLD_MASK 0x7fff
+
#define MCJ_CTX_MASK 3
#define MCJ_CTX(flags) ((flags) & MCJ_CTX_MASK)
#define MCJ_CTX_RANDOM 0 /* inject context: random */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index d15df6e..ffe730d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -101,6 +101,7 @@ static void cmci_discover(int banks, int boot)
continue;
}
+ val &= ~MCI_CTL2_THRESHOLD_MASK;
val |= CMCI_EN | CMCI_THRESHOLD;
wrmsrl(MSR_IA32_MCx_CTL2(i), val);
rdmsrl(MSR_IA32_MCx_CTL2(i), val);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
2010-03-29 7:16 Huang Ying
@ 2010-03-29 8:26 ` Hidetoshi Seto
2010-03-29 8:40 ` Huang Ying
2010-03-29 10:50 ` Andi Kleen
0 siblings, 2 replies; 8+ messages in thread
From: Hidetoshi Seto @ 2010-03-29 8:26 UTC (permalink / raw)
To: Huang Ying
Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen,
linux-kernel@vger.kernel.org
(2010/03/29 16:16), Huang Ying wrote:
> It is reported that CMCI is not raised when number of corrected error
> reaches preset threshold. After inspection, it is found that
> MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> fixed it.
>
> Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Acked-by: Andi Kleen <ak@linux.intel.com>
> ---
> arch/x86/include/asm/mce.h | 3 +++
> arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 +
> 2 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 6c3fdd6..355f298 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -38,6 +38,9 @@
> #define MCM_ADDR_MEM 3 /* memory address */
> #define MCM_ADDR_GENERIC 7 /* generic */
>
> +/* CTL2 register defines */
> +#define MCI_CTL2_THRESHOLD_MASK 0x7fff
> +
> #define MCJ_CTX_MASK 3
> #define MCJ_CTX(flags) ((flags) & MCJ_CTX_MASK)
> #define MCJ_CTX_RANDOM 0 /* inject context: random */
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index d15df6e..ffe730d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -101,6 +101,7 @@ static void cmci_discover(int banks, int boot)
> continue;
> }
>
> + val &= ~MCI_CTL2_THRESHOLD_MASK;
> val |= CMCI_EN | CMCI_THRESHOLD;
> wrmsrl(MSR_IA32_MCx_CTL2(i), val);
> rdmsrl(MSR_IA32_MCx_CTL2(i), val);
Hum, it seems that the CTL2 of reported environment had
be initialized to have more than CMCI_THRESHOLD(=1) by
BIOS etc. Could you explain more about your inspection?
Maybe we could handle this environment if kernel supports
APEI's firmware-first for corrected MCE and if this is in
that case.
Patch looks good. How about cc-ing stable?
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
2010-03-29 8:26 ` Hidetoshi Seto
@ 2010-03-29 8:40 ` Huang Ying
2010-03-29 10:50 ` Andi Kleen
1 sibling, 0 replies; 8+ messages in thread
From: Huang Ying @ 2010-03-29 8:40 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen,
linux-kernel@vger.kernel.org
Hi, Seto san,
Thanks for review.
On Mon, 2010-03-29 at 16:26 +0800, Hidetoshi Seto wrote:
> (2010/03/29 16:16), Huang Ying wrote:
> > It is reported that CMCI is not raised when number of corrected error
> > reaches preset threshold. After inspection, it is found that
> > MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> > fixed it.
> >
> > Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
> > Signed-off-by: Huang Ying <ying.huang@intel.com>
> > Acked-by: Andi Kleen <ak@linux.intel.com>
> > ---
> > arch/x86/include/asm/mce.h | 3 +++
> > arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 +
> > 2 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 6c3fdd6..355f298 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -38,6 +38,9 @@
> > #define MCM_ADDR_MEM 3 /* memory address */
> > #define MCM_ADDR_GENERIC 7 /* generic */
> >
> > +/* CTL2 register defines */
> > +#define MCI_CTL2_THRESHOLD_MASK 0x7fff
> > +
> > #define MCJ_CTX_MASK 3
> > #define MCJ_CTX(flags) ((flags) & MCJ_CTX_MASK)
> > #define MCJ_CTX_RANDOM 0 /* inject context: random */
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > index d15df6e..ffe730d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > @@ -101,6 +101,7 @@ static void cmci_discover(int banks, int boot)
> > continue;
> > }
> >
> > + val &= ~MCI_CTL2_THRESHOLD_MASK;
> > val |= CMCI_EN | CMCI_THRESHOLD;
> > wrmsrl(MSR_IA32_MCx_CTL2(i), val);
> > rdmsrl(MSR_IA32_MCx_CTL2(i), val);
>
> Hum, it seems that the CTL2 of reported environment had
> be initialized to have more than CMCI_THRESHOLD(=1) by
> BIOS etc. Could you explain more about your inspection?
In CTL2, threshold is initialized to 10 by BIOS.
> Maybe we could handle this environment if kernel supports
> APEI's firmware-first for corrected MCE and if this is in
> that case.
I still don't know much about the details of the interaction between
BIOS and OS here.
Best Regards,
Huang Ying
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
2010-03-29 8:26 ` Hidetoshi Seto
2010-03-29 8:40 ` Huang Ying
@ 2010-03-29 10:50 ` Andi Kleen
1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2010-03-29 10:50 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Huang Ying, Ingo Molnar, H. Peter Anvin, Andi Kleen,
linux-kernel@vger.kernel.org
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> writes:
>
> Maybe we could handle this environment if kernel supports
> APEI's firmware-first for corrected MCE and if this is in
> that case.
The problem at hand has nothing to do with firmware first
or APEI.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
@ 2010-05-17 8:08 Huang Ying
2010-05-17 17:59 ` H. Peter Anvin
0 siblings, 1 reply; 8+ messages in thread
From: Huang Ying @ 2010-05-17 8:08 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Andi Kleen, Hidetoshi Seto
Cc: linux-kernel@vger.kernel.org
It is reported that CMCI is not raised when number of corrected error
reaches preset threshold. After inspection, it is found that
MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
fixed it.
Changelog:
v2:
- Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
arch/x86/include/asm/mce.h | 4 ++++
arch/x86/include/asm/msr-index.h | 3 ---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 9 +++++----
3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c3fdd6..6202db7 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -38,6 +38,10 @@
#define MCM_ADDR_MEM 3 /* memory address */
#define MCM_ADDR_GENERIC 7 /* generic */
+/* CTL2 register defines */
+#define MCI_CTL2_CMCI_EN (1ULL << 30)
+#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL
+
#define MCJ_CTX_MASK 3
#define MCJ_CTX(flags) ((flags) & MCJ_CTX_MASK)
#define MCJ_CTX_RANDOM 0 /* inject context: random */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4604e6a..baa6370 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -91,9 +91,6 @@
#define MSR_IA32_MC0_CTL2 0x00000280
#define MSR_IA32_MCx_CTL2(x) (MSR_IA32_MC0_CTL2 + (x))
-#define CMCI_EN (1ULL << 30)
-#define CMCI_THRESHOLD_MASK 0xffffULL
-
#define MSR_P6_PERFCTR0 0x000000c1
#define MSR_P6_PERFCTR1 0x000000c2
#define MSR_P6_EVNTSEL0 0x00000186
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 62b48e4..6fcd093 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -95,19 +95,20 @@ static void cmci_discover(int banks, int boot)
rdmsrl(MSR_IA32_MCx_CTL2(i), val);
/* Already owned by someone else? */
- if (val & CMCI_EN) {
+ if (val & MCI_CTL2_CMCI_EN) {
if (test_and_clear_bit(i, owned) && !boot)
print_update("SHD", &hdr, i);
__clear_bit(i, __get_cpu_var(mce_poll_banks));
continue;
}
- val |= CMCI_EN | CMCI_THRESHOLD;
+ val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+ val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
wrmsrl(MSR_IA32_MCx_CTL2(i), val);
rdmsrl(MSR_IA32_MCx_CTL2(i), val);
/* Did the enable bit stick? -- the bank supports CMCI */
- if (val & CMCI_EN) {
+ if (val & MCI_CTL2_CMCI_EN) {
if (!test_and_set_bit(i, owned) && !boot)
print_update("CMCI", &hdr, i);
__clear_bit(i, __get_cpu_var(mce_poll_banks));
@@ -155,7 +156,7 @@ void cmci_clear(void)
continue;
/* Disable CMCI */
rdmsrl(MSR_IA32_MCx_CTL2(i), val);
- val &= ~(CMCI_EN|CMCI_THRESHOLD_MASK);
+ val &= ~(MCI_CTL2_CMCI_EN|MCI_CTL2_CMCI_THRESHOLD_MASK);
wrmsrl(MSR_IA32_MCx_CTL2(i), val);
__clear_bit(i, __get_cpu_var(mce_banks_owned));
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
2010-05-17 8:08 [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup Huang Ying
@ 2010-05-17 17:59 ` H. Peter Anvin
2010-05-18 0:51 ` Huang Ying
0 siblings, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2010-05-17 17:59 UTC (permalink / raw)
To: Huang Ying
Cc: Ingo Molnar, Andi Kleen, Hidetoshi Seto,
linux-kernel@vger.kernel.org
On 05/17/2010 01:08 AM, Huang Ying wrote:
> It is reported that CMCI is not raised when number of corrected error
> reaches preset threshold. After inspection, it is found that
> MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> fixed it.
>
>
> Changelog:
>
> v2:
>
> - Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
> MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
>
This looks like a mix of a renaming patch and new functionality. Please
submit the renaming as a one patch and then the new functionality as a
second patch on top, otherwise it gets hard to see what is actually
going on.
If I'm not mistaken, there are at least two functionality changes:
+#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL
-#define CMCI_THRESHOLD_MASK 0xffffULL
... change of mask, and:
- val |= CMCI_EN | CMCI_THRESHOLD;
+ val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+ val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
bit being cleared which wasn't before.
-hpa
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
2010-05-17 17:59 ` H. Peter Anvin
@ 2010-05-18 0:51 ` Huang Ying
2010-05-18 1:37 ` H. Peter Anvin
0 siblings, 1 reply; 8+ messages in thread
From: Huang Ying @ 2010-05-18 0:51 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Andi Kleen, Hidetoshi Seto,
linux-kernel@vger.kernel.org
Hi, Peter,
Thanks for review.
On Tue, 2010-05-18 at 01:59 +0800, H. Peter Anvin wrote:
> On 05/17/2010 01:08 AM, Huang Ying wrote:
> > It is reported that CMCI is not raised when number of corrected error
> > reaches preset threshold. After inspection, it is found that
> > MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
> > fixed it.
> >
> >
> > Changelog:
> >
> > v2:
> >
> > - Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
> > MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
> >
>
> This looks like a mix of a renaming patch and new functionality. Please
> submit the renaming as a one patch and then the new functionality as a
> second patch on top, otherwise it gets hard to see what is actually
> going on.
>
> If I'm not mistaken, there are at least two functionality changes:
>
> +#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL
> -#define CMCI_THRESHOLD_MASK 0xffffULL
>
> ... change of mask, and:
>
> - val |= CMCI_EN | CMCI_THRESHOLD;
> + val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> + val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
>
> bit being cleared which wasn't before.
This means we need 3 patches?
Best Regards,
Huang Ying
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup
2010-05-18 0:51 ` Huang Ying
@ 2010-05-18 1:37 ` H. Peter Anvin
0 siblings, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2010-05-18 1:37 UTC (permalink / raw)
To: Huang Ying
Cc: Ingo Molnar, Andi Kleen, Hidetoshi Seto,
linux-kernel@vger.kernel.org
On 05/17/2010 05:51 PM, Huang Ying wrote:
> Hi, Peter,
>
> Thanks for review.
>
> On Tue, 2010-05-18 at 01:59 +0800, H. Peter Anvin wrote:
>> On 05/17/2010 01:08 AM, Huang Ying wrote:
>>> It is reported that CMCI is not raised when number of corrected error
>>> reaches preset threshold. After inspection, it is found that
>>> MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
>>> fixed it.
>>>
>>>
>>> Changelog:
>>>
>>> v2:
>>>
>>> - Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
>>> MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
>>>
>>
>> This looks like a mix of a renaming patch and new functionality. Please
>> submit the renaming as a one patch and then the new functionality as a
>> second patch on top, otherwise it gets hard to see what is actually
>> going on.
>>
>> If I'm not mistaken, there are at least two functionality changes:
>>
>> +#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL
>> -#define CMCI_THRESHOLD_MASK 0xffffULL
>>
>> ... change of mask, and:
>>
>> - val |= CMCI_EN | CMCI_THRESHOLD;
>> + val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
>> + val |= MCI_CTL2_CMCI_EN | CMCI_THRESHOLD;
>>
>> bit being cleared which wasn't before.
>
> This means we need 3 patches?
>
No, two patches is fine -- one for the rename (no binary changes) and
one for the functionality changes.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-05-18 1:38 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-17 8:08 [PATCH] x86, MCE, fix MSR_IA32_MCI_CTL2 CMCI threshold setup Huang Ying
2010-05-17 17:59 ` H. Peter Anvin
2010-05-18 0:51 ` Huang Ying
2010-05-18 1:37 ` H. Peter Anvin
-- strict thread matches above, loose matches on Subject: below --
2010-03-29 7:16 Huang Ying
2010-03-29 8:26 ` Hidetoshi Seto
2010-03-29 8:40 ` Huang Ying
2010-03-29 10:50 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).