Re: [Xen-devel] [PATCH v8 14/16] x86/microcode: Synchronize late microcode loading

From: Chao Gao <chao.gao@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "Kevin Tian" <kevin.tian@intel.com>,
	"Ashok Raj" <ashok.raj@intel.com>, "Wei Liu" <wl@xen.org>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Jun Nakajima" <jun.nakajima@intel.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Borislav Petkov" <bp@suse.de>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH v8 14/16] x86/microcode: Synchronize late microcode loading
Date: Mon, 5 Aug 2019 21:16:24 +0800	[thread overview]
Message-ID: <20190805131623.GC1685@gao-cwp> (raw)
In-Reply-To: <14b8b6ba-4b3c-dada-dfb2-65b815ca621f@suse.com>

On Mon, Aug 05, 2019 at 10:43:16AM +0000, Jan Beulich wrote:
>On 01.08.2019 12:22, Chao Gao wrote:
>> @@ -234,6 +267,35 @@ bool microcode_update_cache(struct microcode_patch *patch)
>>   }
>>   
>>   /*
>> + * Wait for a condition to be met with a timeout (us).
>> + */
>> +static int wait_for_condition(int (*func)(void *data), void *data,
>> +                         unsigned int timeout)
>> +{
>> +    while ( !func(data) )
>> +    {
>> +        if ( !timeout-- )
>> +        {
>> +            printk("CPU%u: Timeout in %s\n", smp_processor_id(), __func__);
>
>I don't think __func__ is helpful here for problem analysis. Instead
>I think you would want to log either func or __builtin_return_address(0).

Will do.

>
>> @@ -283,37 +345,105 @@ static int microcode_update_cpu(const struct microcode_patch *patch)
>>       return err;
>>   }
>>   
>> -static long do_microcode_update(void *patch)
>> +static int do_microcode_update(void *patch)
>>   {
>> -    unsigned int cpu;
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int cpu_nr = num_online_cpus();
>> +    int ret;
>>   
>> -    /* store the patch after a successful loading */
>> -    if ( !microcode_update_cpu(patch) && patch )
>> +    /* Mark loading an ucode is in progress */
>> +    cmpxchg(&loading_state, LOADING_EXITED, LOADING_ENTERED);
>
>Why cmpxchg(), especially when you don't check whether the store
>has actually happened? (Same further down then.)

loading_state is set to "LOADING_EXITED" by the caller. So the first CPU
coming here would store "LOADING_ENTERED" to it. And we don't need
special handling for the CPU that sets the state, so the return value
isn't checked.

Here are my considerations about how to set the state:
1) We cannot set the state in the caller. Because we would use this
state to block #NMI handling. Setting the state in the caller may
break stop_machine_run mechanism with the patch 16.

2) The first CPU reaching here should set the state (it means we cannot
choose one CPU, e.g. BSP, to do it). With this, #NMI handling is
disabled system-wise before any CPU calls in. Otherwise, if there is
an exception for a CPU, it may be still in #NMI handling, when its
sibling thread starts loading an ucode.

3) A simple assignment on every CPU is problematic in some cases.
For example, if one CPU comes in after other CPUs timed out and left,
it might set the state to "LOADING_ENTERED" and be blocked in #NMI
handling forever with patch 16.

That's why I choose to use a cmpxhg().

Regarding the cmpxchg() in error-handling below, it can be replaced by
a simple assignment. But I am not sure whether it is better to use
cmpxchg() considering cache line bouncing.

>
>> +    cpumask_set_cpu(cpu, &cpu_callin_map);
>> +    ret = wait_for_condition(wait_cpu_callin, (void *)(unsigned long)cpu_nr,
>> +                             MICROCODE_CALLIN_TIMEOUT_US);
>> +    if ( ret )
>>       {
>> -        spin_lock(&microcode_mutex);
>> -        microcode_update_cache(patch);
>> -        spin_unlock(&microcode_mutex);
>> -        patch = NULL;
>> +        cmpxchg(&loading_state, LOADING_ENTERED, LOADING_ABORTED);
>> +        return ret;
>>       }
>>   
>> -    if ( microcode_ops->end_update )
>> -        microcode_ops->end_update();
>> +    /*
>> +     * Load microcode update on only one logical processor per core, or in
>> +     * AMD's term, one core per compute unit. The one with the lowest thread
>> +     * id among all siblings is chosen to perform the loading.
>> +     */
>> +    if ( (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) )
>> +    {
>> +        static unsigned int panicked = 0;
>> +        bool monitor;
>> +        unsigned int done;
>> +        unsigned long tick = 0;
>>   
>> -    cpu = cpumask_next(smp_processor_id(), &cpu_online_map);
>> -    if ( cpu < nr_cpu_ids )
>> -        return continue_hypercall_on_cpu(cpu, do_microcode_update, patch);
>> +        ret = microcode_ops->apply_microcode(patch);
>> +        if ( !ret )
>> +        {
>> +            unsigned int cpu2;
>>   
>> -    /* Free the patch if no CPU has loaded it successfully. */
>> -    if ( patch )
>> -        microcode_free_patch(patch);
>> +            atomic_inc(&cpu_updated);
>> +            /* Propagate revision number to all siblings */
>> +            for_each_cpu(cpu2, per_cpu(cpu_sibling_mask, cpu))
>> +                per_cpu(cpu_sig, cpu2).rev = this_cpu(cpu_sig).rev;
>> +        }
>>   
>> -    return 0;
>> +        /*
>> +         * The first CPU reaching here will monitor the progress and emit
>> +         * warning message if the duration is too long (e.g. >1 second).
>> +         */
>> +        monitor = !atomic_inc_return(&cpu_out);
>> +        if ( monitor )
>> +            tick = rdtsc_ordered();
>> +
>> +        /* Waiting for all cores or computing units finishing update */
>> +        done = atomic_read(&cpu_out);
>> +        while ( panicked && done != nr_cores )
>
>Don't you mean "!panicked" here, or || instead of && ? Otherwise I don't
>see how the loop would ever be entered.

Sorry. It should be !panicked.

Other comments are reasonable and I will follow your suggestions.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel