Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading

From: Chao Gao <chao.gao@intel.com>
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Jan Beulich <jbeulich@suse.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@suse.de>
Subject: Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
Date: Tue, 13 Nov 2018 17:08:26 +0800	[thread overview]
Message-ID: <20181113090825.GA23682@gao-cwp> (raw)
In-Reply-To: <1525816893-36669-2-git-send-email-chao.gao@intel.com>

On Wed, May 09, 2018 at 06:01:33AM +0800, Gao, Chao wrote:
>This patch ports microcode improvement patches from linux kernel.
>
>Before you read any further: the early loading method is still the
>preferred one and you should always do that. The following patch is
>improving the late loading mechanism for long running jobs and cloud use
>cases.
>
>Gather all cores and serialize the microcode update on them by doing it
>one-by-one to make the late update process as reliable as possible and
>avoid potential issues caused by the microcode update.
>
>This patch is also in accord with Andrew's suggestion,
>"Rendezvous all online cpus in an IPI to apply the patch, and keep the
>processors in until all have completed the patch.", in [1].
>
>[1]:https://wiki.xenproject.org/wiki/XenParavirtOps/microcode_update#Run_time_microcode_updates
>
>Signed-off-by: Chao Gao <chao.gao@intel.com>
>Tested-by: Chao Gao <chao.gao@intel.com>
>[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>Cc: Kevin Tian <kevin.tian@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Cc: Borislav Petkov <bp@suse.de>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>Cc: Jan Beulich <jbeulich@suse.com>
>---
>+static int do_microcode_update(void *_info)
>+{
>+    struct microcode_info *info = _info;
>+    unsigned int cpu = smp_processor_id();
>+    int ret;
>+
>+    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>+    if ( ret )
>+        return ret;
>+
>+    /*
>+     * Logical threads which set the first bit in cpu_sibling_mask can do
>+     * the update. Other sibling threads just await the completion of
>+     * microcode update.
>+     */
>+    if ( !cpumask_test_and_set_cpu(
>+                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>+        ret = microcode_update_cpu(info->buffer, info->buffer_size);

HI

A critical issue I realized is that microcode_update_cpu() here contains
much things, for instance, parsing the microcode binary and loading it. It is
a bad idea to put so much things in stop_machine context, especially memory
allocation (I did observe one assertion triggered sometimes when doing this).
AFAIK, we have two solutions:
1. use a malloc variation which can be used in stop_machine context.
2. like linux kernel, we can separate the microcode parsing from loading,
and in stop_machine context only do microcode loading. When parsing microcode
binary, all valid ucodes are put into a list. At loading stage, going through
the list is definitely safe.

What's your opinion on these two solutions?

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel