From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH V2] x86,
	amd_ucode: Skip microcode updates for final levels
Date: Mon, 03 Aug 2015 14:18:23 -0400
Message-ID: <55BFB06F.7070101@oracle.com>
References: <1438619749-1625-1-git-send-email-aravind.gopalakrishnan@amd.com>
	<55BFA354.4040009@oracle.com> <55BFAA6A.2040108@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <55BFAA6A.2040108@amd.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>, jbeulich@suse.com, andrew.cooper3@citrix.com
Cc: keir@xen.org, Suravee.Suthikulpanit@amd.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 08/03/2015 01:52 PM, Aravind Gopalakrishnan wrote:
> On 8/3/2015 12:22 PM, Boris Ostrovsky wrote:
>> On 08/03/2015 12:35 PM, Aravind Gopalakrishnan wrote:
>>> Some of older[Fam10h] systems require that certain number of
>>> applied microcode patch levels should not be overwritten by
>>> the microcode loader. Otherwise, system hangs are known to occur.
>>>
>>> The 'final_levels' of patch ids have been obtained empirically.
>>> Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
>>> for details of the issue.
>>>
>>> The short version is that people have predominantly noticed
>>> system hang issues when trying to update microcode levels
>>> beyond the patch IDs below.
>>> [0x01000098, 0x0100009f, 0x010000af]
>>>
>>>
>>>
>>>   +/*
>>> + * The 'final_levels' of patch ids have been obtained empirically.
>>> + * Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
>>> + * for details of the issue. The short version is that people
>>> + * have predominantly noticed system hang issues when trying to
>>> + * update microcode levels beyond the patch IDs below.
>>> + * From internal discussions, we gathered that OS/hypervisor
>>> + * cannot reliably perform microcode updates beyond these levels
>>> + * due to hardware issues. Therefore, we need to abort microcode
>>> + * update process if we hit any of these levels.
>>> + */
>>> +static unsigned int final_levels[] = {
>>> +    0x01000098,
>>> +    0x0100009f,
>>> +    0x010000af
>>> +};
>>> +
>>> +static bool_t check_final_patch_levels(int cpu)
>>> +{
>>> +    /*
>>> +     * Check the current patch levels on the cpu. If they are equal to
>>> +     * any of the 'final_levels', then we should not update the 
>>> microcode
>>> +     * patch on the cpu as system will hang otherwise.
>>> +     */
>>> +    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>>> +    unsigned int i;
>>> +
>>> +    for ( i = 0; i < ARRAY_SIZE(final_levels); i++ )
>>> +        if ( uci->cpu_sig.rev == final_levels[i] )
>>> +            return 1;
>>
>> (I should have asked this when you posted v1, sorry)
>>
>> Are these final_levels[] fam10h-specific? Are we guaranteed that 
>> other families won't have them? I think family check would be good 
>> here. Or at least a comment stating that these levels can only be 
>> observed on family 10h (the only mention of family right now is in 
>> the commit message)
>>
>>
>
>
> Yep. These are Fam10h specific microcode patches. And yeah, the HW 
> problem seems to exist only on these Fam10h specific processor 
> variants. So, it's pretty isolated.
>
> And why need a family check? The microcode patches are tied to a 
> specific family/model anyway right?

Patches --- yes. I wasn't sure about levels (I hope we will never see 
the same level for different families but AFAIK it's not strictly 
prohibited since level is just a cookie).

> It's only ever going to apply successfully if the equivalent id's match..
>
> And sure, I can massage the comment like so-
> " ... The short version is that people who are using certain Fam10h 
> systems are noticing system hang issues when tryingupdate microcode..."
>
> Would that be ok?

Better than nothing (although TBH I'd rather see both the comment and 
the check).

-boris

>
> Thanks,
> -Aravind.
>