Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode

From: Chao Gao <chao.gao@intel.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Sergey Dyasli" <sergey.dyasli@citrix.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Ashok Raj" <ashok.raj@intel.com>, "Wei Liu" <wl@xen.org>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"George Dunlap" <George.Dunlap@eu.citrix.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Ian Jackson" <ian.jackson@eu.citrix.com>,
	"Tim Deegan" <tim@xen.org>, "Julien Grall" <julien.grall@arm.com>,
	xen-devel@lists.xenproject.org,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI handler and load ucode
Date: Fri, 27 Sep 2019 21:53:32 +0800
Message-ID: <20190927135330.GA5570@gao-cwp>
In-Reply-To: <c9b53200-3b90-7f93-21f8-dd715965546a@suse.com>

On Fri, Sep 27, 2019 at 12:19:22PM +0200, Jan Beulich wrote:
>On 26.09.2019 15:53, Chao Gao wrote:
>> @@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx)
>>  }
>>  
>>  /*
>> - * The format is '[<integer>|scan]'. Both options are optional.
>> + * The format is '[<integer>|scan, nmi=<bool>]'. Both options are optional.
>>   * If the EFI has forced which of the multiboot payloads is to be used,
>> - * no parsing will be attempted.
>> + * only nmi=<bool> is parsed.
>>   */
>>  static int __init parse_ucode(const char *s)
>>  {
>> -    const char *q = NULL;
>> +    const char *ss;
>> +    int val, rc = 0;
>>  
>> -    if ( ucode_mod_forced ) /* Forced by EFI */
>> -       return 0;
>> +    do {
>> +        ss = strchr(s, ',');
>> +        if ( !ss )
>> +            ss = strchr(s, '\0');
>>  
>> -    if ( !strncmp(s, "scan", 4) )
>> -        ucode_scan = 1;
>> -    else
>> -        ucode_mod_idx = simple_strtol(s, &q, 0);
>> +        if ( (val = parse_boolean("nmi", s, ss)) >= 0 )
>> +            ucode_in_nmi = val;
>> +        else if ( !ucode_mod_forced ) /* Not forced by EFI */
>> +        {
>> +            const char *q = NULL;
>> +
>> +            if ( !strncmp(s, "scan", 4) )
>> +            {
>> +                ucode_scan = true;
>
>I guess it would have resulted in more consistent code if you had
>used parse_boolean() here, too.
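
A minimal sketch of what that could look like, assuming a simplified stand-in for parse_boolean(). The helper sketch_parse_boolean(), the struct sketch_ucode_opts and the -1 error value are hypothetical and only illustrate the shape; the real parser also handles the integer module index and the EFI-forced case, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for Xen's parse_boolean(): returns 1 or 0
 * when [s, e) is "name", "no-name" or "name=<0|1>", and -1 otherwise.
 */
static int sketch_parse_boolean(const char *name, const char *s, const char *e)
{
    size_t nlen = strlen(name), slen;
    int negate = 0;

    assert(s != NULL && e != NULL && e >= s);
    slen = (size_t)(e - s);

    if ( slen >= 3 && !strncmp(s, "no-", 3) )
    {
        negate = 1;
        s += 3;
        slen -= 3;
    }

    if ( slen < nlen || strncmp(s, name, nlen) )
        return -1;
    s += nlen;
    slen -= nlen;

    if ( slen == 0 )
        return !negate;
    if ( slen == 2 && s[0] == '=' && s[1] == '1' )
        return !negate;
    if ( slen == 2 && s[0] == '=' && s[1] == '0' )
        return negate;

    return -1;
}

/* Hypothetical container for the two flags discussed above. */
struct sketch_ucode_opts {
    int scan;
    int nmi;
};

/* Both "nmi" and "scan" go through the same boolean helper. */
static int sketch_parse_ucode(const char *s, struct sketch_ucode_opts *o)
{
    int rc = 0;

    for ( ; ; )
    {
        const char *ss = strchr(s, ',');
        int val;

        if ( !ss )
            ss = s + strlen(s);

        if ( (val = sketch_parse_boolean("nmi", s, ss)) >= 0 )
            o->nmi = val;
        else if ( (val = sketch_parse_boolean("scan", s, ss)) >= 0 )
            o->scan = val;
        else
            rc = -1; /* unknown token; the real code would return -EINVAL */

        if ( !*ss )
            break;
        s = ss + 1;
    }

    return rc;
}
```

With this shape "scan", "scan=1" and "no-scan" would all be accepted, the same way the nmi option is.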
>
>> @@ -222,6 +246,8 @@ const struct microcode_ops *microcode_ops;
>>  static DEFINE_SPINLOCK(microcode_mutex);
>>  
>>  DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
>> +/* Store error code of the work done in NMI handler */
>> +DEFINE_PER_CPU(int, loading_err);
>
>static
>
>> @@ -356,42 +383,88 @@ static void set_state(unsigned int state)
>>      smp_wmb();
>>  }
>>  
>> -static int secondary_thread_fn(void)
>> +static int secondary_nmi_work(void)
>>  {
>> -    unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
>> +    cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>>  
>> -    if ( !wait_for_state(LOADING_CALLIN) )
>> -        return -EBUSY;
>> +    return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
>> +}
>> +
>> +static int primary_thread_work(const struct microcode_patch *patch)
>> +{
>> +    int ret;
>>  
>>      cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>>  
>> -    if ( !wait_for_state(LOADING_EXIT) )
>> +    if ( !wait_for_state(LOADING_ENTER) )
>>          return -EBUSY;
>>  
>> -    /* Copy update revision from the primary thread. */
>> -    this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev;
>> +    ret = microcode_ops->apply_microcode(patch);
>> +    if ( !ret )
>> +        atomic_inc(&cpu_updated);
>> +    atomic_inc(&cpu_out);
>>  
>> -    return 0;
>> +    return ret;
>>  }
>>  
>> -static int primary_thread_fn(const struct microcode_patch *patch)
>> +static int primary_nmi_work(const struct microcode_patch *patch)
>> +{
>> +    return primary_thread_work(patch);
>> +}
>
>Why this wrapper? The function signatures are identical. I guess
>you want to emphasize the environment the function is to be used
>in, so perhaps fine despite the redundancy. At least there's no
>address taken of this function, so the compiler can eliminate it.
>
>> +static int secondary_thread_fn(void)
>> +{
>>      if ( !wait_for_state(LOADING_CALLIN) )
>>          return -EBUSY;
>>  
>> -    cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>> +    self_nmi();
>>  
>> -    if ( !wait_for_state(LOADING_ENTER) )
>> +    /* Copy update revision from the primary thread. */
>> +    this_cpu(cpu_sig).rev =
>> +        per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev;
>
>_alternative_instructions() takes specific care to avoid relying on
>the NMI potentially not arriving synchronously (in which case you'd
>potentially copy a not-yet-updated CPU signature above). I think the
>same care wants applying here, which I guess would be another
>
>    wait_for_state(LOADING_EXIT);
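
To make the concern concrete, here is a toy single-CPU model, not the Xen code. Every toy_* name is hypothetical. The model assumes the NMI is delivered only on a later spin of the wait loop, and it folds the control thread's LOADING_EXIT transition into the toy handler to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EBUSY 16

enum toy_state { TOY_CALLIN, TOY_ENTER, TOY_EXIT };

static enum toy_state toy_state_now;
static bool toy_nmi_pending;
static int toy_loading_err;

static void toy_reset(void)
{
    toy_state_now = TOY_CALLIN;
    toy_nmi_pending = false;
    toy_loading_err = -1; /* stale marker until the handler has run */
}

/* Toy NMI handler: records the result, then signals completion. */
static void toy_nmi_handler(void)
{
    toy_loading_err = 0;
    toy_state_now = TOY_EXIT;
}

/* Raising the NMI only marks it pending; delivery happens later. */
static void toy_self_nmi(void)
{
    toy_nmi_pending = true;
}

/* In this model a pending NMI is delivered while the CPU spins. */
static void toy_cpu_relax(void)
{
    if ( toy_nmi_pending )
    {
        toy_nmi_pending = false;
        toy_nmi_handler();
    }
}

/* Bounded spin until the shared state reaches the wanted value. */
static bool toy_wait_for_state(enum toy_state want, unsigned int max_spins)
{
    assert(max_spins > 0);

    while ( toy_state_now != want )
    {
        if ( !max_spins-- )
            return false;
        toy_cpu_relax();
    }

    return true;
}

/* Reads the per-CPU result right after self_nmi(): may be stale. */
static int toy_read_without_wait(void)
{
    toy_self_nmi();
    return toy_loading_err;
}

/* Waits for the exit state first, so the handler must have run. */
static int toy_read_after_wait(void)
{
    toy_self_nmi();
    if ( !toy_wait_for_state(TOY_EXIT, 1000) )
        return -TOY_EBUSY;
    return toy_loading_err;
}
```

In the model, toy_read_without_wait() returns the stale marker, while toy_read_after_wait() only returns once the handler has stored its result.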
>
>> +    return this_cpu(loading_err);
>> +}
>> +
>> +static int primary_thread_fn(const struct microcode_patch *patch)
>> +{
>> +    if ( !wait_for_state(LOADING_CALLIN) )
>>          return -EBUSY;
>>  
>> -    ret = microcode_ops->apply_microcode(patch);
>> -    if ( !ret )
>> -        atomic_inc(&cpu_updated);
>> -    atomic_inc(&cpu_out);
>> +    if ( ucode_in_nmi )
>> +    {
>> +        self_nmi();
>> +        return this_cpu(loading_err);
>
>Same here then, to protect against returning a not-yet-updated error
>indicator.
>
>> @@ -420,14 +498,23 @@ static int control_thread_fn(const struct microcode_patch *patch)
>>          return ret;
>>      }
>>  
>> -    /* Let primary threads load the given ucode update */
>> -    set_state(LOADING_ENTER);
>> -
>> +    /* Control thread loads ucode first while others are in NMI handler. */
>>      ret = microcode_ops->apply_microcode(patch);
>>      if ( !ret )
>>          atomic_inc(&cpu_updated);
>>      atomic_inc(&cpu_out);
>>  
>> +    if ( ret == -EIO )
>> +    {
>> +        printk(XENLOG_ERR
>> +               "Late loading aborted: CPU%u failed to update ucode\n", cpu);
>> +        set_state(LOADING_EXIT);
>> +        return ret;
>> +    }
>> +
>> +    /* Let primary threads load the given ucode update */
>> +    set_state(LOADING_ENTER);
>
>While the description goes to some lengths to explain this ordering of
>updates, I still don't really see the point: How is it better for the
>control CPU to have updated its ucode early and then hit an NMI before
>the other CPUs have even started updating, than the other way around
>in the opposite case?

We want to be conservative here. If a ucode update changes something
shared by a whole socket, then in the latter case the control thread may
be accessing things that are being updated by the ucode loading on
other cores. That is not safe, just as a sibling thread isn't expected
to access features exposed by the old ucode while the primary thread is
loading ucode.
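
The ordering I have in mind can be sketched in isolation as below. This is a toy model under stated assumptions, not the patch itself: struct toy_ctx, the toy_* functions and the stubbed apply callbacks are hypothetical, and plain counters stand in for the atomics.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_state { TOY_CALLIN, TOY_ENTER, TOY_EXIT };

/* Hypothetical bundle of the shared rendezvous state. */
struct toy_ctx {
    enum toy_state state;
    unsigned int cpu_updated;
    unsigned int cpu_out;
};

/* Stubbed loaders standing in for microcode_ops->apply_microcode(). */
static int toy_apply_ok(void)
{
    return 0;
}

static int toy_apply_eio(void)
{
    return -EIO;
}

static int toy_control_thread(struct toy_ctx *ctx, int (*apply)(void))
{
    int ret;

    assert(ctx != NULL && apply != NULL);

    /* The control thread loads first, while the others are parked. */
    ret = apply();
    if ( !ret )
        ctx->cpu_updated++;
    ctx->cpu_out++;

    /* On a hard failure, release everyone without letting them load. */
    if ( ret == -EIO )
    {
        ctx->state = TOY_EXIT;
        return ret;
    }

    /* Only now are the primary threads allowed to start loading. */
    ctx->state = TOY_ENTER;

    return ret;
}
```

The point is that the control thread is never running ordinary code while another core is in the middle of loading.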

Do you think that makes sense? If so, I would like to post a new
version of this patch later today to catch Xen 4.13.

Other comments make sense to me.

Thanks
Chao
