From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: [PATCH V2] x86, amd_ucode: Skip microcode updates for final levels Date: Mon, 03 Aug 2015 14:18:23 -0400 Message-ID: <55BFB06F.7070101@oracle.com> References: <1438619749-1625-1-git-send-email-aravind.gopalakrishnan@amd.com> <55BFA354.4040009@oracle.com> <55BFAA6A.2040108@amd.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <55BFAA6A.2040108@amd.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Aravind Gopalakrishnan , jbeulich@suse.com, andrew.cooper3@citrix.com Cc: keir@xen.org, Suravee.Suthikulpanit@amd.com, xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 08/03/2015 01:52 PM, Aravind Gopalakrishnan wrote: > On 8/3/2015 12:22 PM, Boris Ostrovsky wrote: >> On 08/03/2015 12:35 PM, Aravind Gopalakrishnan wrote: >>> Some of older[Fam10h] systems require that certain number of >>> applied microcode patch levels should not be overwritten by >>> the microcode loader. Otherwise, system hangs are known to occur. >>> >>> The 'final_levels' of patch ids have been obtained empirically. >>> Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996 >>> for details of the issue. >>> >>> The short version is that people have predominantly noticed >>> system hang issues when trying to update microcode levels >>> beyond the patch IDs below. >>> [0x01000098, 0x0100009f, 0x010000af] >>> >>> >>> >>> +/* >>> + * The 'final_levels' of patch ids have been obtained empirically. >>> + * Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996 >>> + * for details of the issue. The short version is that people >>> + * have predominantly noticed system hang issues when trying to >>> + * update microcode levels beyond the patch IDs below. >>> + * From internal discussions, we gathered that OS/hypervisor >>> + * cannot reliably perform microcode updates beyond these levels >>> + * due to hardware issues. Therefore, we need to abort microcode >>> + * update process if we hit any of these levels. >>> + */ >>> +static unsigned int final_levels[] = { >>> + 0x01000098, >>> + 0x0100009f, >>> + 0x010000af >>> +}; >>> + >>> +static bool_t check_final_patch_levels(int cpu) >>> +{ >>> + /* >>> + * Check the current patch levels on the cpu. If they are equal to >>> + * any of the 'final_levels', then we should not update the >>> microcode >>> + * patch on the cpu as system will hang otherwise. >>> + */ >>> + struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu); >>> + unsigned int i; >>> + >>> + for ( i = 0; i < ARRAY_SIZE(final_levels); i++ ) >>> + if ( uci->cpu_sig.rev == final_levels[i] ) >>> + return 1; >> >> (I should have asked this when you posted v1, sorry) >> >> Are these final_levels[] fam10h-specific? Are we guaranteed that >> other families won't have them? I think family check would be good >> here. Or at least a comment stating that these levels can only be >> observed on family 10h (the only mention of family right now is in >> the commit message) >> >> > > > Yep. These are Fam10h specific microcode patches. And yeah, the HW > problem seems to exist only on these Fam10h specific processor > variants. So, it's pretty isolated. > > And why need a family check? The microcode patches are tied to a > specific family/model anyway right? Patches --- yes. I wasn't sure about levels (I hope we will never see the same level for different families but AFAIK it's not strictly prohibited since level is just a cookie). > It's only ever going to apply successfully if the equivalent id's match.. > > And sure, I can massage the comment like so- > " ... The short version is that people who are using certain Fam10h > systems are noticing system hang issues when tryingupdate microcode..." > > Would that be ok? Better than nothing (although TBH I'd rather see both the comment and the check). -boris > > Thanks, > -Aravind. >