From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
boris ostrovsky <boris.ostrovsky@oracle.com>
Cc: keir@xen.org, JBeulich@suse.com, xen-devel@lists.xen.org
Subject: Re: [PATCH V2] x86, amd_ucode: Safeguard against #GP
Date: Fri, 30 May 2014 11:46:01 -0500
Message-ID: <5388B5C9.20100@amd.com>
In-Reply-To: <5388B005.2070108@citrix.com>
On 5/30/2014 11:21 AM, Andrew Cooper wrote:
>
> On 30/05/2014 17:01, Aravind Gopalakrishnan wrote:
>> On 5/28/2014 12:56 PM, boris ostrovsky wrote:
>>>
>>> On 5/28/2014 11:16 AM, Aravind Gopalakrishnan wrote:
>>>> On 5/27/2014 6:47 PM, Andrew Cooper wrote:
>>>>> On 27/05/2014 19:24, Aravind Gopalakrishnan wrote:
>>>>>> When HW tries to load a corrupted patch, it generates #GP
>>>>>> and hangs the system. Use wrmsr_safe instead so that we
>>>>>> fail to load microcode gracefully.
>>>>>>
>>>>>> Also, massage error handling around apply_microcode to keep
>>>>>> in tune with error handling style of other parts of the code.
>>>>>>
>>>>>> Example on a Fam15h system-
>>>>>> (XEN) microcode: CPU0 collect_cpu_info: patch_id=0x6000626
>>>>>> (XEN) microcode: CPU0 size 7870, block size 2586 offset 76 equivID
>>>>>> 0x6012 rev 0x6000637
>>>>>> (XEN) microcode: CPU0 found a matching microcode update with version
>>>>>> 0x6000637 (current=0x6000626)
>>>>>> (XEN) traps.c:3073: GPF (0000): ffff82d08016f682 -> ffff82d08022d9f8
>>>>>> (XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626
>>>>>> failed
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^
>>>>>> As shown, the log message above has the two revisions reversed.
>>>>>> Fix this.
>>>>>>
>>>>>> Changes in V2:
>>>>>> - Do not ignore return value from wrmsr_safe
>>>>>> - Flip revision numbers as shown above
>>>>>>
>>>>>> Signed-off-by: Aravind Gopalakrishnan
>>>>>> <aravind.gopalakrishnan@amd.com>
>>>>>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>> I thought we had identified that the hangs were to do with your
>>>>> use of
>>>>> 'noreboot' on the Xen command line.
>>>>>
>>>>
>>>> Hmm. Yeah.. I figured using wrmsr_safe allows user to just boot
>>>> into dom0 without
>>>> having to run through reboot loops. (lazy alternative I guess)
>>>>
>>>> Never mind then. Thanks for the comments (Jan and Andrew). Will keep
>>>> them in mind for the future.
>>>
>>> I don't understand --- the fact that you had the noreboot option meant
>>> that your system wouldn't reboot (duh!) when a patch is corrupted
>>> (aka "it will hang"). But I'd think we don't want a reboot either
>>> --- we want to safely skip the patch (and possibly blacklist it).
>>>
>>>
>>
>> So, allowing the reboot to happen in turn allows entry to grub,
>> where we can simply remove the 'ucode=' option. This is the same as
>> 'skipping' the patch using wrmsr_safe, right?
>>
>> Except this is now explicitly done by the sysadmin
>> while wrmsr_safe just works without anyone doing some extra work;
>> and printing a log message informs user that update went wrong
>> for whatever reason.
>>
>> My understanding from earlier comments is that Andrew (and Jan)
>> would rather not see a change from wrmsrl to wrmsr_safe if it
>> is needless because there is already a way someone can circumvent the
>> corrupt patch: just don't provide it on the grub menu.
>>
>> Thanks,
>> -Aravind.
>
> Sorry - I think my comment confused the issue. Let me retry.
>
> Originally, the bug was described approximately as "A corrupt ucode
> will cause a GP fault, causing the server to hang".
>
> The unhandled #GP fault certainly should be wrapped with wrmsr_safe(),
> and an error/warning presented to the user. In the case that a bad
> ucode is discovered, it should be discarded and the server allowed to
> boot. It is substantially more useful for the server to come up and
> say "I couldn't load that bit of microcode you wanted me to", than to
> sit in a reboot loop because you made a typo in the bootloader config,
> and have to get someone in the datacenter to poke the physical server.
>
>
> My objection was to the wording of the comment alone. Unhandled #GP
> faults do not "hang" Xen unless you ask for them to behave in that
> way, given "noreboot" on the command line.
>
Ah. Okay, let me look into this again then and fix issues with V2.
-Aravind.
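
For reference, the behaviour agreed on above (catch the fault, report the
failure with the revisions in the right order, and let boot continue) can be
sketched roughly as below. This is a user-space simulation under stated
assumptions, not the Xen code: fake_wrmsr_safe(), struct fake_patch,
FAKE_PATCH_LOADER_MSR and apply_patch() are all hypothetical names, and the
"corrupt" flag simply stands in for hardware raising #GP on the MSR write.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch-loader MSR; not a real register number. */
#define FAKE_PATCH_LOADER_MSR 0x1234u

/* Simulated CPU state: the microcode revision currently loaded. */
static uint32_t current_rev = 0x6000626;

struct fake_patch {
    uint32_t rev;   /* revision the patch would install */
    int corrupt;    /* nonzero simulates a patch the hardware rejects */
};

/*
 * Hypothetical fault-tolerant MSR write: returns 0 on success and
 * -EFAULT where real hardware would have raised #GP.
 */
static int fake_wrmsr_safe(uint32_t msr, const struct fake_patch *p)
{
    if (msr != FAKE_PATCH_LOADER_MSR || p->corrupt)
        return -EFAULT;
    current_rev = p->rev;
    return 0;
}

/*
 * Try to apply a patch. On failure, log "from <old> to <new>" (the order
 * the V2 changelog says it fixes) and return an error instead of faulting.
 */
static int apply_patch(const struct fake_patch *p)
{
    uint32_t old_rev = current_rev;

    /* Do not ignore the return value of the safe write. */
    if (fake_wrmsr_safe(FAKE_PATCH_LOADER_MSR, p) != 0 ||
        current_rev != p->rev) {
        printf("microcode: update from revision %#x to %#x failed\n",
               (unsigned)old_rev, (unsigned)p->rev);
        return -EIO;
    }

    printf("microcode: updated from revision %#x to %#x\n",
           (unsigned)old_rev, (unsigned)p->rev);
    return 0;
}
```

Calling apply_patch() with a patch marked corrupt returns -EIO and leaves
current_rev untouched, so the caller can discard the patch and carry on;
calling it with a good patch returns 0 and updates current_rev.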