From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: [PATCH] x86/AMD: Add support for AMD's OSVW feature in guests Date: Tue, 17 Jan 2012 12:54:14 -0500 Message-ID: <4F15B5C6.6070808@amd.com> References: <4F153EBA020000780006D248@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4F153EBA020000780006D248@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Jan Beulich Cc: "xen-devel@lists.xensource.com" List-Id: xen-devel@lists.xenproject.org Responses inline. On 01/17/12 03:26, Jan Beulich wrote: >>>> On 16.01.12 at 22:11, Boris Ostrovsky wrote: >> --- a/tools/libxc/xc_cpuid_x86.c Mon Jan 09 16:01:44 2012 +0100 >> +++ b/tools/libxc/xc_cpuid_x86.c Mon Jan 16 22:08:09 2012 +0100 >> @@ -108,6 +108,7 @@ static void amd_xc_cpuid_policy( >> bitmaskof(X86_FEATURE_SSE4A) | >> bitmaskof(X86_FEATURE_MISALIGNSSE) | >> bitmaskof(X86_FEATURE_3DNOWPREFETCH) | >> + bitmaskof(X86_FEATURE_OSVW) | > > Indentation. > > Also, is this really meaningful to PV guests? Does amd_xc_cpuid_policy() get called for PV guests? I thought this is HVM path only. xc_cpuid_pv_policy() is where clear_bit(X86_FEATURE_OSVW, regs[2]) was removed to handle PV. I believe this (i.e. OSVW changes) is meaningful for PV. Taking erratum 400 as an example -- we don't need a Linux PV guest reading an MSR before going to idle (in amd_e400_idle()). > And valid for pre-Fam10? No, it indeed shouldn't be set in those cases. I could set the bit conditionally, based on host family but the trouble is I don't necessarily know what family the guest will be. Or is this known at this point? > >> bitmaskof(X86_FEATURE_XOP) | >> bitmaskof(X86_FEATURE_FMA4) | >> bitmaskof(X86_FEATURE_TBM) | >> --- a/xen/arch/x86/cpu/amd.c Mon Jan 09 16:01:44 2012 +0100 >> +++ b/xen/arch/x86/cpu/amd.c Mon Jan 16 22:08:09 2012 +0100 >> @@ -32,6 +32,13 @@ >> static char opt_famrev[14]; >> string_param("cpuid_mask_cpu", opt_famrev); >> >> +/* >> + * Set osvw_len to higher number when updated Revision Guides >> + * are published and we know what the new status bits are >> + */ > > This is ugly, as it requires permanent attention. The value to start > with should really be 64 (beyond which other code adjustments are > going to be necessary anyway). I went back and forth on this. The reason I settled on 4 is because we obviously don't know what errata the higher bits will cover and because it is (at least theoretically) possible that a guest shouldn't see the same status bits as the host. But maybe code maintenance is more burden than it's worth? > >> +static uint64_t osvw_length = 4, osvw_status; >> +static DEFINE_SPINLOCK(amd_lock); >> + >> static inline void wrmsr_amd(unsigned int index, unsigned int lo, >> unsigned int hi) >> { >> @@ -182,6 +189,35 @@ static void __devinit set_cpuidmask(cons >> } >> } >> >> +static void amd_guest_osvw_init(struct vcpu *vcpu) >> +{ >> + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) >> + return; > > Shouldn't we bail here for pre-Fam10 CPUs too? As you noted below the vendor check should be removed anyway since this is being called from under such a check already. As for family --- I think I need to solve how X86_FEATURE_OSVW is set and that will take care of whether anything that gets initialized in this routine is ever used. > > Further, as asked above already - is all this really meaningful to PV > guests? Otherwise the code would better go somewhere under > xen/arch/x86/hvm/svm/ ... > > Also, throughout this file indentation needs to be changed to Linux > style (tabs instead of spaces), unless you want to contribute a patch > to convert the whole file to Xen style (in which case you'd still need > to make adjustments here). > >> + >> + /* >> + * Guests should see errata 400 and 415 as fixed (assuming that >> + * HLT and IO instructions are intercepted). >> + */ >> + vcpu->arch.amd.osvw.length = (osvw_length>= 3) ? (osvw_length) : 3; > > An expression consisting of just an identifier for sure doesn't need > parentheses. > >> + vcpu->arch.amd.osvw.status = osvw_status& ~(6ULL); >> + >> + /* >> + * By increasing VCPU's osvw.length to 3 we are telling the guest that >> + * all osvw.status bits inside that length, including bit 0 (which is >> + * reserved for erratum 298), are valid. However, if host processor's >> + * osvw_len is 0 then osvw_status[0] carries no information. We need to >> + * be conservative here and therefore we tell the guest that erratum 298 >> + * is present (because we really don't know). >> + */ >> + if (osvw_length == 0&& boot_cpu_data.x86 == 0x10) > > Why do you check the family here? If osvw_length can only ever be > zero on Fam10, then the first check is sufficient. If osvw_length can > be zero on other than Fam10 (no matter whether we bail early older > CPUs), then the check is actually wrong. 10h is the only family affected by this erratum. > >> + vcpu->arch.amd.osvw.status |= 1; >> +} >> + >> +void amd_vcpu_initialise(struct vcpu *vcpu) >> +{ >> + amd_guest_osvw_init(vcpu); >> +} >> + >> /* >> * Check for the presence of an AMD erratum. Arguments are defined in amd.h >> >> * for each known erratum. Return 1 if erratum is found. >> @@ -512,6 +548,30 @@ static void __devinit init_amd(struct cp >> set_cpuidmask(c); >> >> check_syscfg_dram_mod_en(); >> + >> + /* >> + * Get OSVW bits. If bits are not the same on different processors then >> + * choose the worst case (i.e. if erratum is present on one processor and >> + * not on another assume that the erratum is present everywhere). >> + */ >> + if (test_bit(X86_FEATURE_OSVW,&boot_cpu_data.x86_capability)) { >> + uint64_t len, status; >> + >> + if (rdmsr_safe(MSR_AMD_OSVW_ID_LENGTH, len) || >> + rdmsr_safe(MSR_AMD_OSVW_STATUS, status)) >> + len = status = 0; >> + >> + spin_lock(&amd_lock); > > What is the locking here supposed to protect against? The AP call tree > is smp_callin() -> identify_cpu() -> init_amd(), which cannot race with > anything (as the initiating processor is waiting for cpu_state to change > to CPU_STATE_CALLIN). > >> + >> + if (len< osvw_length) >> + osvw_length = len; >> + >> + osvw_status |= status; >> + osvw_status&= (1ULL<< osvw_length) - 1; >> + >> + spin_unlock(&amd_lock); >> + } else >> + osvw_length = osvw_status = 0; >> } >> >> static struct cpu_dev amd_cpu_dev __cpuinitdata = { >> --- a/xen/arch/x86/domain.c Mon Jan 09 16:01:44 2012 +0100 >> +++ b/xen/arch/x86/domain.c Mon Jan 16 22:08:09 2012 +0100 >> @@ -422,6 +422,10 @@ int vcpu_initialise(struct vcpu *v) >> if ( (rc = vcpu_init_fpu(v)) != 0 ) >> return rc; >> >> + /* Vendor-specific initialization */ >> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) > > If you check the vendor here, you don't need to anywhere in the > descendant functions. > >> + amd_vcpu_initialise(v); >> + >> if ( is_hvm_domain(d) ) >> { >> rc = hvm_vcpu_initialise(v); >> --- a/xen/arch/x86/hvm/svm/svm.c Mon Jan 09 16:01:44 2012 +0100 >> +++ b/xen/arch/x86/hvm/svm/svm.c Mon Jan 16 22:08:09 2012 +0100 >> @@ -1044,6 +1044,27 @@ static void svm_init_erratum_383(struct >> } >> } >> >> +static int svm_handle_osvw(struct vcpu *v, uint32_t msr, uint64_t *val, uint read) >> +{ >> + uint eax, ebx, ecx, edx; >> + >> + /* Guest OSVW support */ >> + hvm_cpuid(0x80000001,&eax,&ebx,&ecx,&edx); >> + if (!test_bit((X86_FEATURE_OSVW& 31),&ecx)) > > Formatting of changes to this file should be changed to Xen style. > >> + return -1; >> + >> + if (read) { >> + if (msr == MSR_AMD_OSVW_ID_LENGTH) >> + *val = v->arch.amd.osvw.length; >> + else >> + *val = v->arch.amd.osvw.status; >> + } >> + /* Writes are ignored */ >> + >> + return 0; >> +} >> + >> + >> static int svm_cpu_up(void) >> { >> uint64_t msr_content; >> --- a/xen/arch/x86/traps.c Mon Jan 09 16:01:44 2012 +0100 >> +++ b/xen/arch/x86/traps.c Mon Jan 16 22:08:09 2012 +0100 >> @@ -2542,6 +2542,15 @@ static int emulate_privileged_op(struct >> if ( wrmsr_safe(regs->ecx, msr_content) != 0 ) >> goto fail; >> break; >> + case MSR_AMD_OSVW_ID_LENGTH: >> + case MSR_AMD_OSVW_STATUS: >> + if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) { >> + if (!boot_cpu_has(X86_FEATURE_OSVW)) >> + goto fail; >> + else > > Bogus "else" after a "goto". And I wonder, provided there is some > point in doing all this for PV guests in the first place, whether this > shouldn't be kept getting handled by the default case. If we go into default case we will emit "Domain attempted WRMSR ..." message in the log. I was trying to avoid that since writing to these registers is not really an error, it just gets ignored. I'll fix the rest. Thanks for review. -boris > >> + break; /* Writes are ignored */ >> + } >> + /* Fall through to default case */ >> default: >> if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) ) >> break; >> @@ -2573,6 +2582,7 @@ static int emulate_privileged_op(struct >> break; >> >> case 0x32: /* RDMSR */ >> + > > Stray addition of a newline. > >> switch ( (u32)regs->ecx ) >> { >> #ifdef CONFIG_X86_64 >> @@ -2632,6 +2642,23 @@ static int emulate_privileged_op(struct >> regs->eax = (uint32_t)msr_content; >> regs->edx = (uint32_t)(msr_content>> 32); >> break; >> + case MSR_AMD_OSVW_ID_LENGTH: >> + case MSR_AMD_OSVW_STATUS: >> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) { >> + if (!boot_cpu_has(X86_FEATURE_OSVW)) >> + goto fail; >> + else { > > Bogus "else" after a "goto". > > Jan > >> + if ((u32)regs->ecx == MSR_AMD_OSVW_ID_LENGTH) >> + msr_content = v->arch.amd.osvw.length; >> + else >> + msr_content = v->arch.amd.osvw.status; >> + >> + regs->eax = (uint32_t)msr_content; >> + regs->edx = (uint32_t)(msr_content>> 32); >> + } >> + } else >> + goto rdmsr_normal; >> + break; >> default: >> if ( rdmsr_hypervisor_regs(regs->ecx,&val) ) >> { > >