Re: [PATCH v6 00/12] improve late microcode loading

From: Chao Gao <chao.gao@intel.com>
To: "Woods, Brian" <Brian.Woods@amd.com>
Cc: "Sergey Dyasli" <sergey.dyasli@citrix.com>,
	"Wei Liu" <wei.liu2@citrix.com>,
	"Ashok Raj" <ashok.raj@intel.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Jan Beulich" <jbeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@amd.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v6 00/12] improve late microcode loading
Date: Wed, 20 Mar 2019 16:58:40 +0800
Message-ID: <20190320085838.GA19696@gao-cwp>
In-Reply-To: <b66fbb87-2b7f-32da-5e78-659aae7314f5@amd.com>

On Tue, Mar 19, 2019 at 09:39:59PM +0000, Woods, Brian wrote:
>On 3/19/19 3:22 PM, Brian Woods wrote:
>> On 3/11/19 2:57 AM, Chao Gao wrote:
>>> Major changes in version 6:
>>>   - run wbinvd before updating microcode (patch 10)
>>>   - add a userspace tool for late microcode update (patch 1)
>>>   - scale time to wait by the number of remaining CPUs to respond
>>>   - remove 'cpu' parameters from some related callbacks and functions
>>>   - save an ucode patch only if its supported CPU is allowed to mix with
>>>     current cpu.
>>>
>>> Changes in version 5:
>>>   - support parallel microcode updates for all cores (see patch 8)
>>>   - Address Roger's comments on the last version.
>>>
>>> The intention of this series is to make the late microcode loading
>>> more reliable by rendezvousing all cpus in stop_machine context.
>>> This idea comes from Ashok. I am porting his linux patch to Xen
>>> (see patch 10 and 11 for more details).
>>>
>>> This series makes five changes:
>>>   1. Patch 1: a userspace tool for late microcode update
>>>   2. Patch 2-9: introduce a global microcode cache and some cleanup
>>>   3. Patch 10: writeback and invalidate cache before updating microcode
>>>   4. Patch 11: synchronize late microcode loading
>>>   5. Patch 12: support parallel microcode updates on different cores
>>>
>>> Currently, late microcode loading does a lot of things including
>>> parsing microcode blob, checking the signature/revision and performing
>>> update. Putting all of them into stop_machine context is a bad idea
>>> because of complexity (One issue I observed is memory allocation
>>> triggered one assertion in stop_machine context). In order to simplify
>>> the load process, I move parsing microcode out of the load process.
>>> The microcode blob is parsed and a global microcode cache is built on
>>> a single CPU before rendezvousing all cpus to update microcode. Other
>>> CPUs just get and load a suitable microcode from the global cache.
>>> With this global cache, it is safe to put simplified load process to
>>> stop_machine context.
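
To make the split described above concrete, here is a rough standalone
sketch of the two phases: build the cache once on a single CPU, then let
every CPU do a lookup only. It is a simplified model and not the code
from this series. All names (ucode_patch, cache_build, cache_find,
CACHE_SLOTS) are hypothetical, the "blob" is assumed to be already split
into patches, and the signature checks are left out.

```c
/* Simplified standalone model; all names are hypothetical, not Xen's. */
#include <stddef.h>
#include <stdint.h>

struct ucode_patch {
    uint32_t cpu_sig;   /* signature of the CPUs this patch applies to */
    uint32_t rev;       /* microcode revision carried by the patch */
};

/* Global cache: filled once on a single CPU, only read afterwards. */
#define CACHE_SLOTS 8
static struct ucode_patch cache[CACHE_SLOTS];
static size_t cache_len;

/*
 * Phase 1 (outside stop_machine context): walk the patches and keep the
 * newest one per CPU signature. Real blob parsing and signature checks
 * are elided. Returns 0 on success, -1 if the cache is full.
 */
int cache_build(const struct ucode_patch *blob, size_t n)
{
    cache_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;

        for (j = 0; j < cache_len; j++)
            if (cache[j].cpu_sig == blob[i].cpu_sig)
                break;

        if (j == cache_len) {
            if (cache_len == CACHE_SLOTS)
                return -1;
            cache[cache_len++] = blob[i];
        } else if (blob[i].rev > cache[j].rev) {
            cache[j] = blob[i];
        }
    }
    return 0;
}

/*
 * Phase 2 (inside the rendezvous): each CPU only looks up a patch.
 * No allocation and no parsing happen here. Returns NULL when the
 * cache holds nothing newer than cur_rev for this signature.
 */
const struct ucode_patch *cache_find(uint32_t cpu_sig, uint32_t cur_rev)
{
    for (size_t i = 0; i < cache_len; i++)
        if (cache[i].cpu_sig == cpu_sig && cache[i].rev > cur_rev)
            return &cache[i];
    return NULL;
}
```

The point of the split is that cache_find() cannot fail on allocation,
which is what made the full load path awkward in stop_machine context.
The plain unsigned revision compare is an assumption of this sketch.
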
>>>
>>> Regarding changes to AMD side, I didn't do any test for them due to
>>> lack of hardware. Could you help to test this series on an AMD machine?
>>> At least, two basic tests are needed:
>>> * do a microcode update after system bootup
>>> * don't bring all pCPUs up at bootup by specifying maxcpus option in xen
>>>    command line and then do a microcode update and online all offlined
>>>    CPUs via 'xen-hptool'.
>>>
>>> Chao Gao (12):
>>>    misc/xenmicrocode: Upload a microcode blob to the hypervisor
>>>    microcode/intel: use union to get fields without shifting and masking
>>>    microcode/intel: extend microcode_update_match()
>>>    microcode: introduce a global cache of ucode patch
>>>    microcode: only save compatible ucode patches
>>>    microcode: remove struct ucode_cpu_info
>>>    microcode: remove pointless 'cpu' parameter
>>>    microcode: split out apply_microcode() from cpu_request_microcode()
>>>    microcode: remove struct microcode_info
>>>    microcode/intel: Writeback and invalidate caches before updating
>>>      microcode
>>>    x86/microcode: Synchronize late microcode loading
>>>    microcode: update microcode on cores in parallel
>>>
>>>   tools/libxc/include/xenctrl.h   |   1 +
>>>   tools/libxc/xc_misc.c           |  20 +++
>>>   tools/misc/Makefile             |   4 +
>>>   tools/misc/xenmicrocode.c       |  89 ++++++++++
>>>   xen/arch/x86/acpi/power.c       |   2 +-
>>>   xen/arch/x86/apic.c             |   2 +-
>>>   xen/arch/x86/microcode.c        | 380 +++++++++++++++++++++++++++-------------
>>>   xen/arch/x86/microcode_amd.c    | 236 ++++++++++++-------------
>>>   xen/arch/x86/microcode_intel.c  | 206 +++++++++++++---------
>>>   xen/arch/x86/smpboot.c          |   5 +-
>>>   xen/arch/x86/spec_ctrl.c        |   2 +-
>>>   xen/include/asm-x86/microcode.h |  40 +++--
>>>   xen/include/asm-x86/processor.h |   3 +-
>>>   13 files changed, 639 insertions(+), 351 deletions(-)
>>>   create mode 100644 tools/misc/xenmicrocode.c
>>>
>> 
>> Sorry for the delay.  These patches fail on F17h.  I'm looking into 
>> where it fails now.
>
>Bisecting it says it's commit "microcode: introduce a global cache of 
>ucode patch."
>
>The failing commit fails with:
>(XEN) [00000085227df312] microcode: CPU0 update from revision 0x8001207 to 0xffff8304 failed
>(XEN) [00000085240578ec] traps.c:1574: GPF (0000): ffff82d080426c88 [probe_cpuid_faulting+0xe/0xa2] -> ffff82d0803818b2
>
>That microcode revision is WAY off.  It should be 0x8001227 and not 
>0xffff8304.  I don't think I'll be able to do much on it before the end 
>of today, but let me know what information you need or if there's anything I 
>should be looking at in particular.

Thanks for your testing.

Sergey tested it on some AMD machines and pointed out an error in
patch 4 [1]. I think the failure you observed was caused by that error.
I will fix it in the next version.

I am really sorry for this. I should have copied you on each patch.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2019-03/msg00901.html

Thanks
Chao
