From mboxrd@z Thu Jan 1 00:00:00 1970 From: Aravind Gopalakrishnan Subject: Re: [PATCH V2] x86, amd_ucode: Safeguard against #GP Date: Fri, 30 May 2014 11:01:46 -0500 Message-ID: <5388AB6A.7010803@amd.com> References: <1401215048-17154-1-git-send-email-aravind.gopalakrishnan@amd.com> <53852405.9010704@citrix.com> <5385FDD3.8020307@amd.com> <53862348.1060400@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <53862348.1060400@oracle.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: boris ostrovsky Cc: Andrew Cooper , keir@xen.org, JBeulich@suse.com, xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 5/28/2014 12:56 PM, boris ostrovsky wrote: > > On 5/28/2014 11:16 AM, Aravind Gopalakrishnan wrote: >> On 5/27/2014 6:47 PM, Andrew Cooper wrote: >>> On 27/05/2014 19:24, Aravind Gopalakrishnan wrote: >>>> When HW tries to load a corrupted patch, it generates #GP >>>> and hangs the system. Use wrmsr_safe instead so that we >>>> fail to load microcode gracefully. >>>> >>>> Also, massage error handling around apply_microcode to keep >>>> in tune with error handling style of other parts of the code. >>>> >>>> Example on a Fam15h system- >>>> (XEN) microcode: CPU0 collect_cpu_info: patch_id=0x6000626 >>>> (XEN) microcode: CPU0 size 7870, block size 2586 offset 76 equivID >>>> 0x6012 rev 0x6000637 >>>> (XEN) microcode: CPU0 found a matching microcode update with version >>>> 0x6000637 (current=0x6000626) >>>> (XEN) traps.c:3073: GPF (0000): ffff82d08016f682 -> ffff82d08022d9f8 >>>> (XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626 >>>> failed >>>> ^^^^^^^^^^^^^^^^^^^^^^ >>>> As shown, the log message above has the two revisions reversed. Fix >>>> this >>>> >>>> Changes in V2: >>>> - Do not ignore return value from wrmsr_safe >>>> - Flip revision numbers as shown above >>>> >>>> Signed-off-by: Aravind Gopalakrishnan >>>> Reviewed-by: Boris Ostrovsky >>> I thought we had identified that the hangs were to do with your use of >>> 'noreboot' on the Xen command line. >>> >> >> Hmm. Yeah.. I figured using wrmsr_safe allows user to just boot into >> dom0 without >> having to run through reboot loops. (lazy alternative I guess) >> >> Nevermind then. Thanks for the comments (Jan and Andrew). Will keep >> in mind for the future. > > I don't understand --- the fact that you had noreboot option meant > that your system wouldn't reboot (duh!) when a patch is corrupted (aka > "it will hang"). But I'd think we don't want a reboot neither --- we > want to safely skip the patch (and possibly backlist it). > > So, allowing the reboot to happen in turn allows entry to grub. Where we can simply remove 'ucode=' option and this is same as 'skipping' the patch using wrmsr_safe right? Except this is now explicitly done by the sysadmin while wrmsr_safe just works without anyone doing some extra work; and printing a log message informs user that update went wrong for whatever reason. My understanding from earlier comments is that Andrew (and Jan) would rather not see a change from wrmsrl to wrmsr_safe if it is needless because there is already a way someone can circumvent the corrupt patch: just don't provide it on the grub menu. Thanks, -Aravind.