[PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs

xen-devel.lists.xenproject.org archive mirror

* [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
@ 2014-02-07  0:32 Aravind Gopalakrishnan
  2014-02-07 11:05 ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2014-02-07  0:32 UTC
  To: chegger, jinsong.liu, suravee.suthikulpanit, boris.ostrovsky,
	xen-devel, JBeulich
  Cc: Aravind Gopalakrishnan

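The vmce_amd_[rd|wr]msr functions can handle accesses to the AMD
thresholding registers. However, because of this statement in
bank_mce_[rd|wr]msr():

    switch ( msr & (MSR_IA32_MC0_CTL | 3) )

we wrongly mask off the top two bits of the MSR index (along with every
other bit except bit 10 and bits 1:0). The AMD MC4 MISC registers at
0xc0000408..0xc000040a are therefore matched by the ordinary MCi_* bank
cases, and their accesses never make it to the vmce_amd_* functions.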

This patch corrects the mask so that the AMD thresholding registers fall
through to the 'default' case, which in turn lets the vmce_amd_*
functions handle accesses to them.

Also, the extended block of AMD MC4 MISC registers does not always
exist. This patch therefore reworks vmce_amd_[wr|rd]msr to inject #GP
into the guest if the register does not exist in hardware. If it does
exist, writes are ignored and reads return zero, as we do not emulate
these registers.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Christoph Egger <chegger@amazon.de>
---
 xen/arch/x86/cpu/mcheck/amd_f10.c |   54 +++++++++++++++----------------------
 xen/arch/x86/cpu/mcheck/mce_amd.h |    3 +++
 xen/arch/x86/cpu/mcheck/vmce.c    |    4 +--
 3 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/amd_f10.c b/xen/arch/x86/cpu/mcheck/amd_f10.c
index 61319dc..605f277 100644
--- a/xen/arch/x86/cpu/mcheck/amd_f10.c
+++ b/xen/arch/x86/cpu/mcheck/amd_f10.c
@@ -102,46 +102,34 @@ enum mcheck_type amd_f10_mcheck_init(struct cpuinfo_x86 *c)
 	return mcheck_amd_famXX;
 }
 
+/* check for AMD MC4 extended MISC register presence */
+static inline int amd_thresholding_reg_present(uint32_t msr)
+{
+    uint64_t val;
+    rdmsr_safe(msr, val);
+    if ( val & (AMD_MC4_MISC_VAL_MASK | AMD_MC4_MISC_CNTP_MASK) )
+        return 1;
+
+    return 0;
+}
+
 /* amd specific MCA MSR */
 int vmce_amd_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 {
-	switch (msr) {
-	case MSR_F10_MC4_MISC1: /* DRAM error type */
-		v->arch.vmce.bank[1].mci_misc = val; 
-		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
-		break;
-	case MSR_F10_MC4_MISC2: /* Link error type */
-	case MSR_F10_MC4_MISC3: /* L3 cache error type */
-		/* ignore write: we do not emulate link and l3 cache errors
-		 * to the guest.
-		 */
-		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
-		break;
-	default:
-		return 0;
-	}
+    /* If not present, #GP fault, else do nothing as we don't emulate */
+    if ( !amd_thresholding_reg_present(msr) )
+        return -1;
 
-	return 1;
+    mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
+    return 1;
 }
 
 int vmce_amd_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
 {
-	switch (msr) {
-	case MSR_F10_MC4_MISC1: /* DRAM error type */
-		*val = v->arch.vmce.bank[1].mci_misc;
-		mce_printk(MCE_VERBOSE, "MCE: rd msr %#"PRIx64"\n", *val);
-		break;
-	case MSR_F10_MC4_MISC2: /* Link error type */
-	case MSR_F10_MC4_MISC3: /* L3 cache error type */
-		/* we do not emulate link and l3 cache
-		 * errors to the guest.
-		 */
-		*val = 0;
-		mce_printk(MCE_VERBOSE, "MCE: rd msr %#"PRIx64"\n", *val);
-		break;
-	default:
-		return 0;
-	}
+    /* If not present, #GP fault, else assign '0' as we don't emulate */
+    if ( !amd_thresholding_reg_present(msr) )
+        return -1;
 
-	return 1;
+    *val = 0;
+    return 1;
 }
diff --git a/xen/arch/x86/cpu/mcheck/mce_amd.h b/xen/arch/x86/cpu/mcheck/mce_amd.h
index 5d047e7..a6024fb 100644
--- a/xen/arch/x86/cpu/mcheck/mce_amd.h
+++ b/xen/arch/x86/cpu/mcheck/mce_amd.h
@@ -1,6 +1,9 @@
 #ifndef _MCHECK_AMD_H
 #define _MCHECK_AMD_H
 
+#define AMD_MC4_MISC_VAL_MASK           (1ULL << 63)
+#define AMD_MC4_MISC_CNTP_MASK          (1ULL << 62)
+
 enum mcheck_type amd_k8_mcheck_init(struct cpuinfo_x86 *c);
 enum mcheck_type amd_f10_mcheck_init(struct cpuinfo_x86 *c);
 
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index f6c35db..be9bb5e 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -107,7 +107,7 @@ static int bank_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
 
     *val = 0;
 
-    switch ( msr & (MSR_IA32_MC0_CTL | 3) )
+    switch ( msr & (-MSR_IA32_MC0_CTL | 3) )
     {
     case MSR_IA32_MC0_CTL:
         /* stick all 1's to MCi_CTL */
@@ -210,7 +210,7 @@ static int bank_mce_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
     int ret = 1;
     unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
 
-    switch ( msr & (MSR_IA32_MC0_CTL | 3) )
+    switch ( msr & (-MSR_IA32_MC0_CTL | 3) )
     {
     case MSR_IA32_MC0_CTL:
         /*
-- 
1.7.9.5

* Re: [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
  2014-02-07  0:32 [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs Aravind Gopalakrishnan
@ 2014-02-07 11:05 ` Jan Beulich
  2014-02-07 21:27   ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2014-02-07 11:05 UTC
  To: Aravind Gopalakrishnan
  Cc: jinsong.liu, boris.ostrovsky, chegger, suravee.suthikulpanit,
	xen-devel

>>> On 07.02.14 at 01:32, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
> +/* check for AMD MC4 extended MISC register presence */
> +static inline int amd_thresholding_reg_present(uint32_t msr)
> +{
> +    uint64_t val;
> +    rdmsr_safe(msr, val);

You ought to check the result of this operation, even if at present
it clears "val" on error.

I also wonder what good it does to repeatedly trigger #GP here
if we already once learned that there's no such register. IOW,
please store the fact that the register is absent in a static
variable (and no, this shouldn't be a per-CPU one - if the register
is missing on any pCPU, we must not try to access it anywhere, as
vCPU-s could end up running once here and once there; in the end
we assume consistency across the CPUs in a system anyway).

> +    if ( val & (AMD_MC4_MISC_VAL_MASK | AMD_MC4_MISC_CNTP_MASK) )
> +        return 1;
> +
> +    return 0;
> +}
> +
>  /* amd specific MCA MSR */
>  int vmce_amd_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>  {
> -	switch (msr) {
> -	case MSR_F10_MC4_MISC1: /* DRAM error type */
> -		v->arch.vmce.bank[1].mci_misc = val; 
> -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
> -		break;
> -	case MSR_F10_MC4_MISC2: /* Link error type */
> -	case MSR_F10_MC4_MISC3: /* L3 cache error type */
> -		/* ignore write: we do not emulate link and l3 cache errors
> -		 * to the guest.
> -		 */
> -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
> -		break;
> -	default:
> -		return 0;
> -	}
> +    /* If not present, #GP fault, else do nothing as we don't emulate */
> +    if ( !amd_thresholding_reg_present(msr) )
> +        return -1;

The one thing I'm concerned about making this #GP in the guest is
migration: With it being _newer_ CPUs implementing fewer of these
MSRs, it would be impossible to migrate a guest from an older system
to a newer one - a direction that (as long as the newer system
provides all the hardware capabilities the older one has) is generally
assumed to work. Bottom line - we're probably better off always
dropping writes, and always returning zero for reads. Which will
eliminate the need for amd_thresholding_reg_present().

Jan

* Re: [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
  2014-02-07 11:05 ` Jan Beulich
@ 2014-02-07 21:27   ` Aravind Gopalakrishnan
  2014-02-10  7:41     ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2014-02-07 21:27 UTC
  To: Jan Beulich
  Cc: jinsong.liu, boris.ostrovsky, chegger, suravee.suthikulpanit,
	xen-devel

On Fri, Feb 07, 2014 at 11:05:17AM +0000, Jan Beulich wrote:
> >>> On 07.02.14 at 01:32, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
> > -	case MSR_F10_MC4_MISC1: /* DRAM error type */
> > -		v->arch.vmce.bank[1].mci_misc = val; 
> > -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
> > -		break;
> > -	case MSR_F10_MC4_MISC2: /* Link error type */
> > -	case MSR_F10_MC4_MISC3: /* L3 cache error type */
> > -		/* ignore write: we do not emulate link and l3 cache errors
> > -		 * to the guest.
> > -		 */
> > -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
> > -		break;
> > -	default:
> > -		return 0;
> > -	}
> > +    /* If not present, #GP fault, else do nothing as we don't emulate */
> > +    if ( !amd_thresholding_reg_present(msr) )
> > +        return -1;
> 
> The one thing I'm concerned about making this #GP in the guest is
> migration: With it being _newer_ CPUs implementing fewer of these
> MSRs, it would be impossible to migrate a guest from an older system
> to a newer one - a direction that (as long as the newer system
> provides all the hardware capabilities the older one has) is generally
> assumed to work. Bottom line - we're probably better off always
> dropping writes, and always returning zero for reads. Which will
> eliminate the need for amd_thresholding_reg_present().
> 

Before I go ahead and remove the function, a few questions:

Assuming there is a tool in the guest that accesses these MSRs,
wouldn't it be fair to expect that the tool keep in mind these MSRs
exist only in certain families?

For example:
if there's a guest running on F10 that accesses 0xc000040a, that would
be fine. But once we migrate to a newer family, then the guest should
not even generate accesses to the MSR.

Also, returning #GP to guests would mean keeping it consistent with HW
behavior. If we return zero for reads, (IMHO) it's not necessarily
correct information, as the register does not even exist.

Bare-metal cases will face the same problems too, but if a register doesn't
exist, then shouldn't the OS/hypervisor just say so and let whoever
generated the access deal with it?

Thanks,
-Aravind.

* Re: [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
  2014-02-07 21:27   ` Aravind Gopalakrishnan
@ 2014-02-10  7:41     ` Jan Beulich
  2014-02-10 16:54       ` Aravind Gopalakrishnan
  2014-02-12  9:58       ` Egger, Christoph
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2014-02-10  7:41 UTC
  To: Aravind Gopalakrishnan
  Cc: jinsong.liu, boris.ostrovsky, chegger, suravee.suthikulpanit,
	xen-devel

>>> On 07.02.14 at 22:27, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
> On Fri, Feb 07, 2014 at 11:05:17AM +0000, Jan Beulich wrote:
>> >>> On 07.02.14 at 01:32, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
>> > -	case MSR_F10_MC4_MISC1: /* DRAM error type */
>> > -		v->arch.vmce.bank[1].mci_misc = val; 
>> > -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
>> > -		break;
>> > -	case MSR_F10_MC4_MISC2: /* Link error type */
>> > -	case MSR_F10_MC4_MISC3: /* L3 cache error type */
>> > -		/* ignore write: we do not emulate link and l3 cache errors
>> > -		 * to the guest.
>> > -		 */
>> > -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
>> > -		break;
>> > -	default:
>> > -		return 0;
>> > -	}
>> > +    /* If not present, #GP fault, else do nothing as we don't emulate */
>> > +    if ( !amd_thresholding_reg_present(msr) )
>> > +        return -1;
>> 
>> The one thing I'm concerned about making this #GP in the guest is
>> migration: With it being _newer_ CPUs implementing fewer of these
>> MSRs, it would be impossible to migrate a guest from an older system
>> to a newer one - a direction that (as long as the newer system
>> provides all the hardware capabilities the older one has) is generally
>> assumed to work. Bottom line - we're probably better off always
>> dropping writes, and always returning zero for reads. Which will
>> eliminate the need for amd_thresholding_reg_present().
>> 
> 
> Before I go ahead and remove the function, a few questions:
> 
> Assuming there is a tool in the guest that accesses these MSRs,
> wouldn't it be fair to expect that the tool keep in mind these MSRs
> exist only in certain families?
> 
> For example:
> if there's a guest running on F10 that accesses 0xc000040a, that would
> be fine. But once we migrate to a newer family, then the guest should
> not even generate accesses to the MSR.

All correct, provided the family check and the MSR access aren't
separated by a migration.

> Also, returning #GP to guests would mean keeping it consistent with HW
> behavior. If we return zero for reads, (IMHO) it's not necessarily
> correct information, as the register does not even exist.
>
> Bare-metal cases will face the same problems too, but if a register doesn't
> exist, then shouldn't the OS/hypervisor just say so and let whoever
> generated the access deal with it?

That's all valid argumentation as long as you leave migration out
of the picture.

Jan

* Re: [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
  2014-02-10  7:41     ` Jan Beulich
@ 2014-02-10 16:54       ` Aravind Gopalakrishnan
  2014-02-12  9:58       ` Egger, Christoph
  1 sibling, 0 replies; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2014-02-10 16:54 UTC
  To: Jan Beulich
  Cc: jinsong.liu, boris.ostrovsky, chegger, suravee.suthikulpanit,
	xen-devel

On 2/10/2014 1:41 AM, Jan Beulich wrote:
> That's all valid argumentation as long as you leave migration out
> of the picture.
>
> Jan
>
>
Hmm. All right; I am sending a revised version of the patch with the
'amd_thresholding_reg_present' function removed.

Thanks,
-Aravind.

* Re: [PATCH] mcheck, vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
  2014-02-10  7:41     ` Jan Beulich
  2014-02-10 16:54       ` Aravind Gopalakrishnan
@ 2014-02-12  9:58       ` Egger, Christoph
  1 sibling, 0 replies; 6+ messages in thread
From: Egger, Christoph @ 2014-02-12  9:58 UTC
  To: Jan Beulich, Aravind Gopalakrishnan
  Cc: jinsong.liu, boris.ostrovsky, suravee.suthikulpanit, xen-devel

On 10.02.14 08:41, Jan Beulich wrote:
>>>> On 07.02.14 at 22:27, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
>> On Fri, Feb 07, 2014 at 11:05:17AM +0000, Jan Beulich wrote:
>>>>>> On 07.02.14 at 01:32, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> wrote:
>>>> -	case MSR_F10_MC4_MISC1: /* DRAM error type */
>>>> -		v->arch.vmce.bank[1].mci_misc = val; 
>>>> -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
>>>> -		break;
>>>> -	case MSR_F10_MC4_MISC2: /* Link error type */
>>>> -	case MSR_F10_MC4_MISC3: /* L3 cache error type */
>>>> -		/* ignore write: we do not emulate link and l3 cache errors
>>>> -		 * to the guest.
>>>> -		 */
>>>> -		mce_printk(MCE_VERBOSE, "MCE: wr msr %#"PRIx64"\n", val);
>>>> -		break;
>>>> -	default:
>>>> -		return 0;
>>>> -	}
>>>> +    /* If not present, #GP fault, else do nothing as we don't emulate */
>>>> +    if ( !amd_thresholding_reg_present(msr) )
>>>> +        return -1;
>>>
>>> The one thing I'm concerned about making this #GP in the guest is
>>> migration: With it being _newer_ CPUs implementing fewer of these
>>> MSRs, it would be impossible to migrate a guest from an older system
>>> to a newer one - a direction that (as long as the newer system
>>> provides all the hardware capabilities the older one has) is generally
>>> assumed to work. Bottom line - we're probably better off always
>>> dropping writes, and always returning zero for reads. Which will
>>> eliminate the need for amd_thresholding_reg_present().
>>>
>>
>> Before I go ahead and remove the function, a few questions:
>>
>> Assuming there is a tool in the guest that accesses these MSRs,
>> wouldn't it be fair to expect that the tool keep in mind these MSRs
>> exist only in certain families?
>>
>> For example:
>> if there's a guest running on F10 that accesses 0xc000040a, that would
>> be fine. But once we migrate to a newer family, then the guest should
>> not even generate accesses to the MSR.
> 
> All correct, provided the family check and the MSR access aren't
> separated by a migration.
> 
>> Also, returning #GP to guests would mean keeping it consistent with HW
>> behavior. If we return zero for reads, (IMHO) it's not necessarily
>> correct information, as the register does not even exist.
>>
>> Bare-metal cases will face the same problems too, but if a register doesn't
>> exist, then shouldn't the OS/hypervisor just say so and let whoever
>> generated the access deal with it?
> 
> That's all valid argumentation as long as you leave migration out
> of the picture.

I agree with Jan. All the arguments are valid from a hardware perspective.

Apart from migration, there is another perspective you are missing completely:
the vmce_amd_* functions (and also the corresponding Intel functions)
deal with *virtual* MSRs, i.e. with what should happen to the guest
when the guest accesses them.

This has absolutely nothing to do with what the hardware does or does not
provide. The point is, the guest knows (or rather, assumes) which MSRs
exist from the CPU family/model information it gets via CPUID. The question is
what should happen when the guest accesses these MSRs.

To get this right, the questions are:
What should the hypervisor do for recovery?
Does it make sense to make the guest aware of it?

Christoph
