* [PATCH] MCHECK, AMD: Fix code to allow calls to vmce_amd_rdmsr and vmce_amd_wrmsr
@ 2013-11-01 20:02 Aravind Gopalakrishnan
2013-11-04 15:52 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Aravind Gopalakrishnan @ 2013-11-01 20:02 UTC (permalink / raw)
To: chegger, jinsong.liu
Cc: boris.ostrovsky, jacob.w.shin, Aravind Gopalakrishnan,
suravee.suthikulpanit, xen-devel
The existing switch statement: switch ( msr & (MSR_IA32_MC0_CTL | 0x3))
causes MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3 to be wrongly routed.
The old values were:
Value After applying msr & (MSR_IA32_MC0_CTL | 0x3)
-------------------------------- --------------------------------------------
MSR_F10_MC4_MISC1 = 0xc0000408 0x400 : Falls to case MC0_CTL
MSR_F10_MC4_MISC2 = 0xc0000409 0x401 : Falls to case MC0_STATUS
MSR_F10_MC4_MISC3 = 0xc000040A 0x402 : Falls to case MC0_ADDR
The patch corrects the switch statement to allow vmce_amd_* functions to handle
guest's rdmsr and wrmsr calls for MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3
While at it, correct the above mentioned MSR values in msr-index.h
The values should be -
MSR_F10_MC4_MISC1 (DRAM error type) = 0x00000413
MSR_F10_MC4_MISC2 (Link error type) = 0xc0000408
MSR_F10_MC4_MISC3 (L3 cache error) = 0xc0000409
Refer F10 BKDG F3x1[78, 70, 68, 60]. Also, MSRC0000040A does not exist from
Fam15 onwards. So let's use 0x413 for DRAM errors.
Tested on AMD Fam15 processor and works fine.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
---
xen/arch/x86/cpu/mcheck/vmce.c | 4 ++--
xen/arch/x86/hvm/svm/svm.c | 10 ++++++----
xen/include/asm-x86/msr-index.h | 6 +++---
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index af3b491..448b2d7 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -107,7 +107,7 @@ static int bank_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
*val = 0;
- switch ( msr & (MSR_IA32_MC0_CTL | 3) )
+ switch ( msr )
{
case MSR_IA32_MC0_CTL:
/* stick all 1's to MCi_CTL */
@@ -210,7 +210,7 @@ static int bank_mce_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
int ret = 1;
unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
- switch ( msr & (MSR_IA32_MC0_CTL | 3) )
+ switch ( msr )
{
case MSR_IA32_MC0_CTL:
/*
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 22a63a7..25bb792 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1460,8 +1460,9 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
*msr_content = v->arch.hvm_svm.guest_sysenter_eip;
break;
- case MSR_IA32_MCx_MISC(4): /* Threshold register */
- case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
+ case MSR_F10_MC4_MISC1:
+ case MSR_F10_MC4_MISC2:
+ case MSR_F10_MC4_MISC3:
/*
* MCA/MCE: We report that the threshold register is unavailable
* for OS use (locked by the BIOS).
@@ -1659,8 +1660,9 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
vpmu_do_wrmsr(msr, msr_content);
break;
- case MSR_IA32_MCx_MISC(4): /* Threshold register */
- case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
+ case MSR_F10_MC4_MISC1:
+ case MSR_F10_MC4_MISC2:
+ case MSR_F10_MC4_MISC3:
/*
* MCA/MCE: Threshold register is reported to be locked, so we ignore
* all write accesses. This behaviour matches real HW, so guests should
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index e597a28..9904d41 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -218,9 +218,9 @@
#define AMD64_NB_CFG_CF8_EXT_ENABLE_BIT 46
/* AMD Family10h machine check MSRs */
-#define MSR_F10_MC4_MISC1 0xc0000408
-#define MSR_F10_MC4_MISC2 0xc0000409
-#define MSR_F10_MC4_MISC3 0xc000040A
+#define MSR_F10_MC4_MISC1 0x00000413
+#define MSR_F10_MC4_MISC2 0xc0000408
+#define MSR_F10_MC4_MISC3 0xc0000409
/* AMD Family10h Bus Unit MSRs */
#define MSR_F10_BU_CFG 0xc0011023
--
1.7.9.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] MCHECK, AMD: Fix code to allow calls to vmce_amd_rdmsr and vmce_amd_wrmsr
2013-11-01 20:02 [PATCH] MCHECK, AMD: Fix code to allow calls to vmce_amd_rdmsr and vmce_amd_wrmsr Aravind Gopalakrishnan
@ 2013-11-04 15:52 ` Jan Beulich
2013-11-05 15:39 ` Aravind Gopalakrioshnan
0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2013-11-04 15:52 UTC (permalink / raw)
To: Aravind Gopalakrishnan
Cc: jinsong.liu, chegger, jacob.w.shin, suravee.suthikulpanit,
xen-devel, boris.ostrovsky
>>> On 01.11.13 at 21:02, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
> The existing switch statement: switch ( msr & (MSR_IA32_MC0_CTL | 0x3))
> causes MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3 to be wrongly routed.
> The old values were:
> Value After applying msr & (MSR_IA32_MC0_CTL | 0x3)
> -------------------------------- --------------------------------------------
> MSR_F10_MC4_MISC1 = 0xc0000408 0x400 : Falls to case MC0_CTL
> MSR_F10_MC4_MISC2 = 0xc0000409 0x401 : Falls to case MC0_STATUS
> MSR_F10_MC4_MISC3 = 0xc000040A 0x402 : Falls to case MC0_ADDR
>
> The patch corrects the switch statement to allow vmce_amd_* functions to handle
> guest's rdmsr and wrmsr calls for MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3
But it should do so properly.
> While at it, correct the above mentioned MSR values in msr-index.h
> The values should be -
> MSR_F10_MC4_MISC1 (DRAM error type) = 0x00000413
> MSR_F10_MC4_MISC2 (Link error type) = 0xc0000408
> MSR_F10_MC4_MISC3 (L3 cache error) = 0xc0000409
>
> Refer F10 BKDG F3x1[78, 70, 68, 60]. Also, MSRC0000040A does not exist from
> Fam15 onwards. So let's use 0x413 for DRAM errors.
I don't think I follow what you're trying to say here.
> --- a/xen/arch/x86/cpu/mcheck/vmce.c
> +++ b/xen/arch/x86/cpu/mcheck/vmce.c
> @@ -107,7 +107,7 @@ static int bank_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>
> *val = 0;
>
> - switch ( msr & (MSR_IA32_MC0_CTL | 3) )
> + switch ( msr )
You can't drop the masking altogether here - that way you're
preventing banks other than bank 0 to be handled here.
> @@ -210,7 +210,7 @@ static int bank_mce_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
> int ret = 1;
> unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
>
> - switch ( msr & (MSR_IA32_MC0_CTL | 3) )
> + switch ( msr )
Same here.
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1460,8 +1460,9 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
> *msr_content = v->arch.hvm_svm.guest_sysenter_eip;
> break;
>
> - case MSR_IA32_MCx_MISC(4): /* Threshold register */
This deletion isn't being explained at all in the description.
> @@ -1659,8 +1660,9 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
> vpmu_do_wrmsr(msr, msr_content);
> break;
>
> - case MSR_IA32_MCx_MISC(4): /* Threshold register */
Again, same thing here.
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -218,9 +218,9 @@
> #define AMD64_NB_CFG_CF8_EXT_ENABLE_BIT 46
>
> /* AMD Family10h machine check MSRs */
> -#define MSR_F10_MC4_MISC1 0xc0000408
> -#define MSR_F10_MC4_MISC2 0xc0000409
> -#define MSR_F10_MC4_MISC3 0xc000040A
> +#define MSR_F10_MC4_MISC1 0x00000413
> +#define MSR_F10_MC4_MISC2 0xc0000408
> +#define MSR_F10_MC4_MISC3 0xc0000409
Fam10 BIOS and Kernel Developer's Guide disagrees with this.
Fam15 model 0x also doesn't list C0000413 (but indeed marks
C000040A [as well as subsequent ones] as reserved). Fam15
model 1x even lists everything from C0000409 onwards as
reserved.
So I'm afraid you need to start over.
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] MCHECK, AMD: Fix code to allow calls to vmce_amd_rdmsr and vmce_amd_wrmsr
2013-11-04 15:52 ` Jan Beulich
@ 2013-11-05 15:39 ` Aravind Gopalakrioshnan
2013-11-05 16:02 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Aravind Gopalakrioshnan @ 2013-11-05 15:39 UTC (permalink / raw)
To: Jan Beulich
Cc: jinsong.liu, chegger, jacob.w.shin, suravee.suthikulpanit,
xen-devel, boris.ostrovsky
- switch ( msr & (MSR_IA32_MC0_CTL | 3) )
+ switch ( msr )
> You can't drop the masking altogether here - that way you're
> preventing banks other than bank 0 to be handled here.
When I drop the mask, I am allowing all MCA MSR's here.. It is when I
apply the mask
that I am allowing only MC0 bank MSR's alone. Here is an example: (All
values in hexadecimal form)
(MSR_IA32_MC0_CTL | 3) = 0x403; Therefore -
Msr val = 400
val & 0x403 = 400
Msr val = 401
val & 0x403 = 401
Msr val = 402
val & 0x403 = 402
Msr val = 403
val & 0x403 = 403
Msr val = 404
val & 0x403 = 400
Msr val = 405
val & 0x403 = 401
Msr val = 406
val & 0x403 = 402
Msr val = 407
val & 0x403 = 403
Msr val = 413
val & 0x403 = 403
Msr val = c0000408
val & 0x403 = 400
Msr val = c0000409
val & 0x403 = 401
We need to route the MC4 MSR's (0x413 for DRAM errors, 0xc0000408 for
Link errors, 0xc0000409 for L3 errors)
to be handled by vmce_amd_wrmsr and vmce_amd_rdmsr. Since by removing
the mask altogether, I am also
allowing MSR's 0x404, 0x405, 0x406 and 0x407 (which is harmless still as
they fall to 'default' in vmce_amd_rdmsr or
vmce_amd_wrmsr functions);
>> @@ -210,7 +210,7 @@ static int bank_mce_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>> int ret = 1;
>> unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
>>
>> - switch ( msr & (MSR_IA32_MC0_CTL | 3) )
>> + switch ( msr )
> Same here.
>
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1460,8 +1460,9 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>> *msr_content = v->arch.hvm_svm.guest_sysenter_eip;
>> break;
>>
>> - case MSR_IA32_MCx_MISC(4): /* Threshold register */
> This deletion isn't being explained at all in the description.
>
>> @@ -1659,8 +1660,9 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>> vpmu_do_wrmsr(msr, msr_content);
>> break;
>>
>> - case MSR_IA32_MCx_MISC(4): /* Threshold register */
> Again, same thing here.
I will explain this better in a V2 patch...
>> --- a/xen/include/asm-x86/msr-index.h
>> +++ b/xen/include/asm-x86/msr-index.h
>> @@ -218,9 +218,9 @@
>> #define AMD64_NB_CFG_CF8_EXT_ENABLE_BIT 46
>>
>> /* AMD Family10h machine check MSRs */
>> -#define MSR_F10_MC4_MISC1 0xc0000408
>> -#define MSR_F10_MC4_MISC2 0xc0000409
>> -#define MSR_F10_MC4_MISC3 0xc000040A
>> +#define MSR_F10_MC4_MISC1 0x00000413
>> +#define MSR_F10_MC4_MISC2 0xc0000408
>> +#define MSR_F10_MC4_MISC3 0xc0000409
> Fam10 BIOS and Kernel Developer's Guide disagrees with this.
> Fam15 model 0x also doesn't list C0000413 (but indeed marks
> C000040A [as well as subsequent ones] as reserved). Fam15
> model 1x even lists everything from C0000409 onwards as
> reserved.
It is MSR 0x413 and not 0xc0000413. MSR 0x413 *is* listed as DRAM
thresholding register.
And you are right about Link Thresholding (MSRC0000408) and
L3 Thresholding(MSRC0000409). One way to resolve this can be to add
quirks in 'mce_amd_quirks' structure and check for it before we emulate
Link/L3 thresholding MSR's to the guest.
Thoughts?
Thanks,
-Aravind.
> So I'm afraid you need to start over.
>
> Jan
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] MCHECK, AMD: Fix code to allow calls to vmce_amd_rdmsr and vmce_amd_wrmsr
2013-11-05 15:39 ` Aravind Gopalakrioshnan
@ 2013-11-05 16:02 ` Jan Beulich
0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2013-11-05 16:02 UTC (permalink / raw)
To: Aravind Gopalakrioshnan
Cc: jinsong.liu, chegger, jacob.w.shin, suravee.suthikulpanit,
xen-devel, boris.ostrovsky
>>> On 05.11.13 at 16:39, Aravind Gopalakrioshnan <aravind.gopalakrishnan@amd.com> wrote:
>
> - switch ( msr & (MSR_IA32_MC0_CTL | 3) )
> + switch ( msr )
>
>> You can't drop the masking altogether here - that way you're
>> preventing banks other than bank 0 to be handled here.
>
> When I drop the mask, I am allowing all MCA MSR's here.. It is when I
> apply the mask
> that I am allowing only MC0 bank MSR's alone. Here is an example: (All
> values in hexadecimal form)
>
> (MSR_IA32_MC0_CTL | 3) = 0x403; Therefore -
>
>
> Msr val = 400
> val & 0x403 = 400
>
> Msr val = 401
> val & 0x403 = 401
>
> Msr val = 402
> val & 0x403 = 402
>
> Msr val = 403
> val & 0x403 = 403
>
> Msr val = 404
> val & 0x403 = 400
So you contradict yourself here: MSR 404 is being allowed in, and
is being handled just like MSR 400, just that "bank" is 1 in that
case.
> We need to route the MC4 MSR's (0x413 for DRAM errors,
MSR 413 is really MSR_IA32_MCx_MISC(4) if I'm not mistaken,
i.e. visible or invisible depending on the number of MCE MSR
banks a guest gets to see.
> 0xc0000408 for
> Link errors, 0xc0000409 for L3 errors)
> to be handled by vmce_amd_wrmsr and vmce_amd_rdmsr. Since by removing
> the mask altogether, I am also
> allowing MSR's 0x404, 0x405, 0x406 and 0x407 (which is harmless still as
> they fall to 'default' in vmce_amd_rdmsr or
> vmce_amd_wrmsr functions);
No, they all end up at the default case all of the sudden, while
they're supposed to be (and are currently) handled by the
respective non-default cases.
The only thing I agree to is that for the C00004xxx range
we're wrongly masking off the top two bits, which was an
oversight when support for these got added.
>>> --- a/xen/include/asm-x86/msr-index.h
>>> +++ b/xen/include/asm-x86/msr-index.h
>>> @@ -218,9 +218,9 @@
>>> #define AMD64_NB_CFG_CF8_EXT_ENABLE_BIT 46
>>>
>>> /* AMD Family10h machine check MSRs */
>>> -#define MSR_F10_MC4_MISC1 0xc0000408
>>> -#define MSR_F10_MC4_MISC2 0xc0000409
>>> -#define MSR_F10_MC4_MISC3 0xc000040A
>>> +#define MSR_F10_MC4_MISC1 0x00000413
>>> +#define MSR_F10_MC4_MISC2 0xc0000408
>>> +#define MSR_F10_MC4_MISC3 0xc0000409
>> Fam10 BIOS and Kernel Developer's Guide disagrees with this.
>> Fam15 model 0x also doesn't list C0000413 (but indeed marks
>> C000040A [as well as subsequent ones] as reserved). Fam15
>> model 1x even lists everything from C0000409 onwards as
>> reserved.
> It is MSR 0x413 and not 0xc0000413. MSR 0x413 *is* listed as DRAM
> thresholding register.
So it doesn't belong into this group in the first place then. As
above - as a "standard" or "architectural" MCE MSR, it shouldn't
get its own #define, but be referenced through
MSR_IA32_MCx_MISC().
> And you are right about Link Thresholding (MSRC0000408) and
> L3 Thresholding(MSRC0000409). One way to resolve this can be to add
> quirks in 'mce_amd_quirks' structure and check for it before we emulate
> Link/L3 thresholding MSR's to the guest.
>
> Thoughts?
That's a possibility. But you first of all need to explain how a
bank 4 MSR would be seen by a guest at all, when we only
emulate 1 or 2 banks these days.
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-11-05 16:02 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-01 20:02 [PATCH] MCHECK, AMD: Fix code to allow calls to vmce_amd_rdmsr and vmce_amd_wrmsr Aravind Gopalakrishnan
2013-11-04 15:52 ` Jan Beulich
2013-11-05 15:39 ` Aravind Gopalakrioshnan
2013-11-05 16:02 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.