From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH V2] x86, amd_ucode: Safeguard against #GP
Date: Wed, 28 May 2014 00:47:17 +0100
Message-ID: <53852405.9010704@citrix.com>
References: <1401215048-17154-1-git-send-email-aravind.gopalakrishnan@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1401215048-17154-1-git-send-email-aravind.gopalakrishnan@amd.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Aravind Gopalakrishnan, JBeulich@suse.com, xen-devel@lists.xen.org
Cc: boris.ostrovsky@oracle.com, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

On 27/05/2014 19:24, Aravind Gopalakrishnan wrote:
> When HW tries to load a corrupted patch, it generates #GP
> and hangs the system. Use wrmsr_safe instead so that we
> fail to load microcode gracefully.
>
> Also, massage error handling around apply_microcode to keep
> in tune with error handling style of other parts of the code.
>
> Example on a Fam15h system-
> (XEN) microcode: CPU0 collect_cpu_info: patch_id=0x6000626
> (XEN) microcode: CPU0 size 7870, block size 2586 offset 76 equivID
> 0x6012 rev 0x6000637
> (XEN) microcode: CPU0 found a matching microcode update with version
> 0x6000637 (current=0x6000626)
> (XEN) traps.c:3073: GPF (0000): ffff82d08016f682 -> ffff82d08022d9f8
> (XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626 failed
>                                            ^^^^^^^^^^^^^^^^^^^^^^
> As shown, the log message above has the two revisions reversed. Fix this
>
> Changes in V2:
> - Do not ignore return value from wrmsr_safe
> - Flip revision numbers as shown above
>
> Signed-off-by: Aravind Gopalakrishnan
> Reviewed-by: Boris Ostrovsky

I thought we had identified that the hangs were to do with your use of
'noreboot' on the Xen command line.

~Andrew

> ---
>  xen/arch/x86/microcode_amd.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
> index e83f4b6..1db8a0d 100644
> --- a/xen/arch/x86/microcode_amd.c
> +++ b/xen/arch/x86/microcode_amd.c
> @@ -178,32 +178,39 @@ static int apply_microcode(int cpu)
>      uint32_t rev;
>      struct microcode_amd *mc_amd = uci->mc.mc_amd;
>      struct microcode_header_amd *hdr;
> +    int error = -EINVAL;
>
>      /* We should bind the task to the CPU */
>      BUG_ON(raw_smp_processor_id() != cpu);
>
>      if ( mc_amd == NULL )
> -        return -EINVAL;
> +        goto apply_err1;
>
>      hdr = mc_amd->mpb;
>      if ( hdr == NULL )
> -        return -EINVAL;
> +        goto apply_err1;
>
>      spin_lock_irqsave(&microcode_update_lock, flags);
>
> -    wrmsrl(MSR_AMD_PATCHLOADER, (unsigned long)hdr);
> +    error = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)hdr);
>
>      /* get patch id after patching */
>      rdmsrl(MSR_AMD_PATCHLEVEL, rev);
>
>      spin_unlock_irqrestore(&microcode_update_lock, flags);
>
> +    /* Catch HW patch application failure */
> +    if ( error ) {
> +        printk(KERN_ERR "microcode: CPU%d ucode patch application failed HW tests. "
> +               "HW returned #GP\n", cpu);
> +        goto apply_err2;

This...

> +    }
> +
>      /* check current patch id and patch's id for match */
>      if ( rev != hdr->patch_id )
>      {
> -        printk(KERN_ERR "microcode: CPU%d update from revision "
> -               "%#x to %#x failed\n", cpu, hdr->patch_id, rev);
> -        return -EIO;
> +        error = -EIO;
> +        goto apply_err2;
>      }
>
>      printk(KERN_WARNING "microcode: CPU%d updated from revision %#x to %#x\n",
> @@ -212,6 +219,12 @@ static int apply_microcode(int cpu)
>      uci->cpu_sig.rev = rev;
>
>      return 0;
> +
> +apply_err2:
> +    printk(KERN_ERR "microcode: CPU%d update from revision "
> +           "%#x to %#x failed\n", cpu, rev, hdr->patch_id);

... combined with this will result in two error messages being printed.
This seems overkill for the circumstance.

~Andrew.

> +apply_err1:
> +    return error;
>  }
>
>  static int get_ucode_from_buffer_amd(
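
For illustration only, here is a minimal standalone sketch of an error path
that prints a single message per failure, whether the loader write faults or
the patch level simply does not change. It is not Xen code and is untested
against the real tree: struct fake_cpu, enum load_mode, fake_wrmsr_safe(),
apply_sketch() and errors_logged are all hypothetical names, and the -EFAULT
value returned by the stub is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the hardware; none of this is Xen code. */
enum load_mode { LOAD_OK, LOAD_FAULT, LOAD_IGNORED };

struct fake_cpu {
    uint32_t patch_level;   /* value a patch level read would return */
    enum load_mode mode;    /* how the simulated patch loader behaves */
};

static int errors_logged;   /* number of error lines printed so far */

/*
 * Simulated patch loader write. A non-zero return models a trapped #GP;
 * the -EFAULT value is an assumption of this sketch.
 */
static int fake_wrmsr_safe(struct fake_cpu *c, uint32_t new_rev)
{
    if (c->mode == LOAD_FAULT)
        return -EFAULT;
    if (c->mode == LOAD_OK)
        c->patch_level = new_rev;
    return 0;               /* LOAD_IGNORED: no fault, level unchanged */
}

/* Apply a patch and report any failure with exactly one message. */
static int apply_sketch(struct fake_cpu *c, int cpu, uint32_t new_rev)
{
    uint32_t old_rev = c->patch_level;
    int error = fake_wrmsr_safe(c, new_rev);
    uint32_t rev = c->patch_level;      /* re-read after the attempt */

    /* No fault, but the hardware did not take the patch. */
    if (!error && rev != new_rev)
        error = -EIO;

    if (!error) {
        printf("microcode: CPU%d updated from revision %#x to %#x\n",
               cpu, (unsigned)old_rev, (unsigned)rev);
        return 0;
    }

    /* Single failure message; the fault case only adds a suffix. */
    printf("microcode: CPU%d update from revision %#x to %#x failed%s\n",
           cpu, (unsigned)rev, (unsigned)new_rev,
           error == -EFAULT ? " (#GP during patch load)" : "");
    errors_logged++;
    return error;
}
```

Calling apply_sketch() on a struct fake_cpu with mode set to LOAD_FAULT
prints one line and returns the stub's error code; with LOAD_IGNORED it
prints one line and returns -EIO.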