From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky <boris.ostrovsky@amd.com>
Subject: Re: [PATCH] x86/AMD: Add support for AMD's OSVW feature
 in guests
Date: Tue, 17 Jan 2012 12:54:14 -0500
Message-ID: <4F15B5C6.6070808@amd.com>
References: <f157d40df95aa3b8becb.1326748277@wotan.osrc.amd.com>
	<4F153EBA020000780006D248@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <4F153EBA020000780006D248@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich <JBeulich@suse.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
List-Id: xen-devel@lists.xenproject.org

Responses inline.

On 01/17/12 03:26, Jan Beulich wrote:
>>>> On 16.01.12 at 22:11, Boris Ostrovsky<boris.ostrovsky@amd.com>  wrote:
>> --- a/tools/libxc/xc_cpuid_x86.c	Mon Jan 09 16:01:44 2012 +0100
>> +++ b/tools/libxc/xc_cpuid_x86.c	Mon Jan 16 22:08:09 2012 +0100
>> @@ -108,6 +108,7 @@ static void amd_xc_cpuid_policy(
>>                       bitmaskof(X86_FEATURE_SSE4A) |
>>                       bitmaskof(X86_FEATURE_MISALIGNSSE) |
>>                       bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
>> +		    bitmaskof(X86_FEATURE_OSVW) |
>
> Indentation.
>
> Also, is this really meaningful to PV guests?

Does amd_xc_cpuid_policy() get called for PV guests? I thought this is 
HVM path only.

xc_cpuid_pv_policy() is where clear_bit(X86_FEATURE_OSVW, regs[2]) was 
removed to handle PV.

I believe this (i.e. OSVW changes) is meaningful for PV. Taking erratum 
400 as an example -- we don't need a Linux PV guest reading an MSR 
before going to idle (in amd_e400_idle()).

> And valid for pre-Fam10?

No, it indeed shouldn't be set in those cases.

I could set the bit conditionally, based on host family but the trouble 
is I don't necessarily know what family the guest will be. Or is this 
known at this point?

>
>>                       bitmaskof(X86_FEATURE_XOP) |
>>                       bitmaskof(X86_FEATURE_FMA4) |
>>                       bitmaskof(X86_FEATURE_TBM) |
>> --- a/xen/arch/x86/cpu/amd.c	Mon Jan 09 16:01:44 2012 +0100
>> +++ b/xen/arch/x86/cpu/amd.c	Mon Jan 16 22:08:09 2012 +0100
>> @@ -32,6 +32,13 @@
>>   static char opt_famrev[14];
>>   string_param("cpuid_mask_cpu", opt_famrev);
>>
>> +/*
>> + * Set osvw_len to higher number when updated Revision Guides
>> + * are published and we know what the new status bits are
>> + */
>
> This is ugly, as it requires permanent attention. The value to start
> with should really be 64 (beyond which other code adjustments are
> going to be necessary anyway).

I went back and forth on this. The reason I settled on 4 is because we 
obviously don't know what errata the higher bits will cover and because 
it is (at least theoretically) possible that a guest shouldn't see the 
same status bits as the host.

But maybe code maintenance is more burden than it's worth?

>
>> +static uint64_t osvw_length = 4, osvw_status;
>> +static DEFINE_SPINLOCK(amd_lock);
>> +
>>   static inline void wrmsr_amd(unsigned int index, unsigned int lo,
>>   		unsigned int hi)
>>   {
>> @@ -182,6 +189,35 @@ static void __devinit set_cpuidmask(cons
>>   	}
>>   }
>>
>> +static void amd_guest_osvw_init(struct vcpu *vcpu)
>> +{
>> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +	return;
>
> Shouldn't we bail here for pre-Fam10 CPUs too?

As you noted below the vendor check should be removed anyway since this 
is being called from under such a check already.

As for family --- I think I need to solve how X86_FEATURE_OSVW is set 
and that will take care of whether anything that gets initialized in 
this routine is ever used.

>
> Further, as asked above already - is all this really meaningful to PV
> guests? Otherwise the code would better go somewhere under
> xen/arch/x86/hvm/svm/ ...
>
> Also, throughout this file indentation needs to be changed to Linux
> style (tabs instead of spaces), unless you want to contribute a patch
> to convert the whole file to Xen style (in which case you'd still need
> to make adjustments here).
>
>> +
>> +    /*
>> +     * Guests should see errata 400 and 415 as fixed (assuming that
>> +     * HLT and IO instructions are intercepted).
>> +     */
>> +    vcpu->arch.amd.osvw.length = (osvw_length>= 3) ? (osvw_length) : 3;
>
> An expression consisting of just an identifier for sure doesn't need
> parentheses.
>
>> +    vcpu->arch.amd.osvw.status = osvw_status&  ~(6ULL);
>> +
>> +    /*
>> +     * By increasing VCPU's osvw.length to 3 we are telling the guest that
>> +     * all osvw.status bits inside that length, including bit 0 (which is
>> +     * reserved for erratum 298), are valid. However, if host processor's
>> +     * osvw_len is 0 then osvw_status[0] carries no information. We need to
>> +     * be conservative here and therefore we tell the guest that erratum 298
>> +     * is present (because we really don't know).
>> +     */
>> +    if (osvw_length == 0&&  boot_cpu_data.x86 == 0x10)
>
> Why do you check the family here? If osvw_length can only ever be
> zero on Fam10, then the first check is sufficient. If osvw_length can
> be zero on other than Fam10 (no matter whether we bail early older
> CPUs), then the check is actually wrong.

10h is the only family affected by this erratum.

>
>> +	vcpu->arch.amd.osvw.status |= 1;
>> +}
>> +
>> +void amd_vcpu_initialise(struct vcpu *vcpu)
>> +{
>> +    amd_guest_osvw_init(vcpu);
>> +}
>> +
>>   /*
>>    * Check for the presence of an AMD erratum. Arguments are defined in amd.h
>>
>>    * for each known erratum. Return 1 if erratum is found.
>> @@ -512,6 +548,30 @@ static void __devinit init_amd(struct cp
>>   	set_cpuidmask(c);
>>
>>   	check_syscfg_dram_mod_en();
>> +
>> +    /*
>> +     * Get OSVW bits. If bits are not the same on different processors then
>> +     * choose the worst case (i.e. if erratum is present on one processor and
>> +     * not on another assume that the erratum is present everywhere).
>> +     */
>> +     if (test_bit(X86_FEATURE_OSVW,&boot_cpu_data.x86_capability)) {
>> +         uint64_t len, status;
>> +
>> +        if (rdmsr_safe(MSR_AMD_OSVW_ID_LENGTH, len) ||
>> +            rdmsr_safe(MSR_AMD_OSVW_STATUS, status))
>> +            len = status = 0;
>> +
>> +        spin_lock(&amd_lock);
>
> What is the locking here supposed to protect against? The AP call tree
> is smp_callin() ->  identify_cpu() ->  init_amd(), which cannot race with
> anything (as the initiating processor is waiting for cpu_state to change
> to CPU_STATE_CALLIN).
>
>> +
>> +        if (len<  osvw_length)
>> +            osvw_length = len;
>> +
>> +        osvw_status |= status;
>> +        osvw_status&= (1ULL<<  osvw_length) - 1;
>> +
>> +        spin_unlock(&amd_lock);
>> +    } else
>> +	 osvw_length = osvw_status = 0;
>>   }
>>
>>   static struct cpu_dev amd_cpu_dev __cpuinitdata = {
>> --- a/xen/arch/x86/domain.c	Mon Jan 09 16:01:44 2012 +0100
>> +++ b/xen/arch/x86/domain.c	Mon Jan 16 22:08:09 2012 +0100
>> @@ -422,6 +422,10 @@ int vcpu_initialise(struct vcpu *v)
>>       if ( (rc = vcpu_init_fpu(v)) != 0 )
>>           return rc;
>>
>> +    /* Vendor-specific initialization */
>> +    if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>
> If you check the vendor here, you don't need to anywhere in the
> descendant functions.
>
>> +        amd_vcpu_initialise(v);
>> +
>>       if ( is_hvm_domain(d) )
>>       {
>>           rc = hvm_vcpu_initialise(v);
>> --- a/xen/arch/x86/hvm/svm/svm.c	Mon Jan 09 16:01:44 2012 +0100
>> +++ b/xen/arch/x86/hvm/svm/svm.c	Mon Jan 16 22:08:09 2012 +0100
>> @@ -1044,6 +1044,27 @@ static void svm_init_erratum_383(struct
>>       }
>>   }
>>
>> +static int svm_handle_osvw(struct vcpu *v, uint32_t msr, uint64_t *val, uint read)
>> +{
>> +    uint eax, ebx, ecx, edx;
>> +
>> +    /* Guest OSVW support */
>> +    hvm_cpuid(0x80000001,&eax,&ebx,&ecx,&edx);
>> +    if (!test_bit((X86_FEATURE_OSVW&  31),&ecx))
>
> Formatting of changes to this file should be changed to Xen style.
>
>> +        return -1;
>> +
>> +    if (read) {
>> +        if (msr == MSR_AMD_OSVW_ID_LENGTH)
>> +            *val = v->arch.amd.osvw.length;
>> +        else
>> +            *val = v->arch.amd.osvw.status;
>> +    }
>> +    /* Writes are ignored */
>> +
>> +    return 0;
>> +}
>> +
>> +
>>   static int svm_cpu_up(void)
>>   {
>>       uint64_t msr_content;
>> --- a/xen/arch/x86/traps.c	Mon Jan 09 16:01:44 2012 +0100
>> +++ b/xen/arch/x86/traps.c	Mon Jan 16 22:08:09 2012 +0100
>> @@ -2542,6 +2542,15 @@ static int emulate_privileged_op(struct
>>               if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
>>                   goto fail;
>>               break;
>> +        case MSR_AMD_OSVW_ID_LENGTH:
>> +        case MSR_AMD_OSVW_STATUS:
>> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) {
>> +                if (!boot_cpu_has(X86_FEATURE_OSVW))
>> +                    goto fail;
>> +                else
>
> Bogus "else" after a "goto". And I wonder, provided there is some
> point in doing all this for PV guests in the first place, whether this
> shouldn't be kept getting handled by the default case.


If we go into default case we will emit "Domain attempted WRMSR ..." 
message in the log. I was trying to avoid that since writing to these 
registers is not really an error, it just gets ignored.

I'll fix the rest.

Thanks for review.
-boris

>
>> +                    break; /* Writes are ignored */
>> +            }
>> +            /* Fall through to default case */
>>           default:
>>               if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) )
>>                   break;
>> @@ -2573,6 +2582,7 @@ static int emulate_privileged_op(struct
>>           break;
>>
>>       case 0x32: /* RDMSR */
>> +
>
> Stray addition of a newline.
>
>>           switch ( (u32)regs->ecx )
>>           {
>>   #ifdef CONFIG_X86_64
>> @@ -2632,6 +2642,23 @@ static int emulate_privileged_op(struct
>>               regs->eax = (uint32_t)msr_content;
>>               regs->edx = (uint32_t)(msr_content>>  32);
>>               break;
>> +        case MSR_AMD_OSVW_ID_LENGTH:
>> +        case MSR_AMD_OSVW_STATUS:
>> +            if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
>> +                if (!boot_cpu_has(X86_FEATURE_OSVW))
>> +                    goto fail;
>> +                else {
>
> Bogus "else" after a "goto".
>
> Jan
>
>> +                    if ((u32)regs->ecx == MSR_AMD_OSVW_ID_LENGTH)
>> +                        msr_content = v->arch.amd.osvw.length;
>> +                    else
>> +                        msr_content = v->arch.amd.osvw.status;
>> +
>> +                    regs->eax = (uint32_t)msr_content;
>> +                    regs->edx = (uint32_t)(msr_content>>  32);
>> +                }
>> +            } else
>> +                goto rdmsr_normal;
>> +            break;
>>           default:
>>               if ( rdmsr_hypervisor_regs(regs->ecx,&val) )
>>               {
>
>