* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail
[not found] <20230531175119.10830-1-alejandro.vallejo@cloud.com>
@ 2023-06-01 10:54 ` Andrew Cooper
2023-06-02 13:19 ` Alejandro Vallejo
2023-06-02 20:35 ` Andrew Cooper
0 siblings, 2 replies; 5+ messages in thread
From: Andrew Cooper @ 2023-06-01 10:54 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné, Wei Liu
On 31/05/2023 6:51 pm, Alejandro Vallejo wrote:
> diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
> index cd456c476f..e507945932 100644
> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -697,6 +697,17 @@ static long cf_check microcode_update_helper(void *data)
>      return ret;
>  }
>
> +static bool this_cpu_can_install_update(void)
> +{
> +    uint64_t mcu_ctrl;
> +
> +    if ( !cpu_has_mcu_ctrl )
> +        return true;
> +
> +    rdmsrl(MSR_MCU_CONTROL, mcu_ctrl);
> +    return !(mcu_ctrl & MCU_CONTROL_DIS_MCU_LOAD);
> +}
> +
>  int microcode_update(XEN_GUEST_HANDLE(const_void) buf, unsigned long len)
>  {
>      int ret;
> @@ -708,6 +719,22 @@ int microcode_update(XEN_GUEST_HANDLE(const_void) buf, unsigned long len)
>      if ( !ucode_ops.apply_microcode )
>          return -EINVAL;
>
> +    if ( !this_cpu_can_install_update() )
> +    {
> +        /*
> +         * This CPU can't install microcode, so it makes no sense to try to
> +         * go on. We're implicitly trusting firmware sanity in that all
> +         * CPUs are expected to have a homogeneous setting. If, for some
> +         * reason, another CPU happens to be locked down when this one
> +         * isn't then unpleasantness will follow. In particular, some CPUs
> +         * will be updated while others will not. A very stern message will
> +         * be displayed in xl-dmesg that case, strongly advising to reboot the
> +         * machine.
> +         */
> +        printk("WARNING: microcode not installed due to DIS_MCU_LOAD=1");
> +        return -EACCES;
> +    }
I had something else in mind here. Right now, this will read
MSR_MCU_CONTROL and emit a printk() on every microcode load, which will
be every AP, and every time the user uses the xen-ucode tool.
Instead, I recommend the following:
1) One patch moving the early-cpuid/msr read from tsx_init() into
early_microcode_init(), adjusting the comment as it goes. No point
duplicating that logic, and we need it earlier on boot now.
2) This patch, adjusting early_microcode_init() only. Have a printk()
saying "microcode loading disabled by firmware" and avoid filling in
ucode_ops. Every other part of ucode handling understands "loading not
available".
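As a rough illustration of point 2 (a sketch only, not Xen code): the
firmware setting is examined once at boot, the message is printed once,
and later load attempts just see an empty hook. The struct, the stub and
the `_sketch` names are hypothetical, the MSR value is passed in by the
caller rather than read with rdmsrl(), and the bit position of
DIS_MCU_LOAD is an assumption to be checked against the real headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit position for this sketch; check the real MSR definition. */
#define MCU_CONTROL_DIS_MCU_LOAD (UINT64_C(1) << 1)

/* Simplified stand-in for ucode_ops; only the hook discussed here. */
struct ucode_ops_sketch {
    int (*apply_microcode)(const void *patch);
};

static struct ucode_ops_sketch ucode_ops; /* hooks start out NULL */

/* Hypothetical loader hook that always succeeds. */
static int apply_stub(const void *patch)
{
    (void)patch;
    return 0;
}

/*
 * Runs once at boot. The caller passes in the feature bit and the MSR
 * value, so the sketch needs no privileged instructions.
 */
static void early_microcode_init_sketch(bool has_mcu_ctrl, uint64_t mcu_ctrl)
{
    if ( has_mcu_ctrl && (mcu_ctrl & MCU_CONTROL_DIS_MCU_LOAD) )
    {
        printf("microcode loading disabled by firmware\n");
        ucode_ops.apply_microcode = NULL;
        return;
    }

    ucode_ops.apply_microcode = apply_stub;
}

/* Later load attempts only look at the hook: no MSR read, no new message. */
static int microcode_update_sketch(const void *buf)
{
    assert(buf != NULL);

    if ( !ucode_ops.apply_microcode )
        return -EINVAL;

    return ucode_ops.apply_microcode(buf);
}
```

With the hook left empty, the existing "loading not available" path is
taken on every later request without touching the MSR again.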
In terms of the commit message, you should call out the usecase
explicitly. This feature is intended for baremetal clouds where the
platform owner doesn't trust the tenant to choose the microcode version
in use.
Also, I'm really not sure what your 3rd paragraph is trying to say.
~Andrew
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail
2023-06-01 10:54 ` [PATCH] x86/microcode: Prevent attempting updates known to fail Andrew Cooper
@ 2023-06-02 13:19 ` Alejandro Vallejo
2023-06-02 16:44 ` Andrew Cooper
2023-06-02 20:35 ` Andrew Cooper
1 sibling, 1 reply; 5+ messages in thread
From: Alejandro Vallejo @ 2023-06-02 13:19 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu
On Thu, Jun 01, 2023 at 11:54:31AM +0100, Andrew Cooper wrote:
> I had something else in mind here. Right now, this will read
> MSR_MCU_CONTROL and emit a printk() on every microcode load, which will
> be every AP, and every time the user uses the xen-ucode tool.
Not every AP. The hypercall would return with an error before the APs are
brought in. It is true that the error on dmesg would appear on every
microcode load attempt though.
>
> Instead, I recommend the following:
>
> 1) One patch moving the early-cpuid/msr read from tsx_init() into
> early_microcode_init(), adjusting the comment as it goes. No point
> duplicating that logic, and we need it earlier on boot now.
> 2) This patch, adjusting early_microcode_init() only. Have a printk()
> saying "microcode loading disabled by firmware" and avoid filling in
> ucode_ops. Every other part of ucode handling understands "loading not
> available".
Sure. Going on a tangent though, I do wonder why tsx_init() is preceding
identify_cpu(). It's reading cpuid leaf 7d0 simply because it hasn't been
read yet, but it's not obvious why there is a rush to invoke tsx_init(). I
can't see any obvious marker that affects the following identify_cpu() call,
and swapping them gets rid of the cpuid read.
>
> In terms of the commit message, you should call out the usecase
> explicitly. This feature is intended for baremetal clouds where the
> platform owner doesn't trust the tenant to choose the microcode version
> in use.
>
Sure.
> Also, I'm really not sure what your 3rd paragraph is trying to say.
That the case where CPU_i.DIS_MCU_LOAD != CPU_j.DIS_MCU_LOAD where i != j
is not specifically handled on the grounds that sane firmware ensures that
condition doesn't happen, and we already notify when the system reaches a
nonsensical state with different CPUs having different microcode versions.
I'll rewrite it better.
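To make the condition concrete, here is a small sketch of what checking
for it could look like. This is not something the patch does, and the
function name, the array-of-MSR-values input and the bit position of
DIS_MCU_LOAD are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for this sketch; check the real MSR definition. */
#define MCU_CONTROL_DIS_MCU_LOAD (UINT64_C(1) << 1)

/*
 * Returns true when every CPU reports the same DIS_MCU_LOAD setting.
 * mcu_ctrl[i] is a hypothetical snapshot of MSR_MCU_CONTROL on CPU i.
 */
static bool dis_mcu_load_is_homogeneous(const uint64_t *mcu_ctrl,
                                        size_t nr_cpus)
{
    assert(mcu_ctrl != NULL && nr_cpus > 0);

    bool first = !!(mcu_ctrl[0] & MCU_CONTROL_DIS_MCU_LOAD);

    for ( size_t i = 1; i < nr_cpus; i++ )
    {
        bool cur = !!(mcu_ctrl[i] & MCU_CONTROL_DIS_MCU_LOAD);

        /* CPU_i.DIS_MCU_LOAD != CPU_0.DIS_MCU_LOAD: firmware is not sane. */
        if ( cur != first )
            return false;
    }

    return true;
}
```

Only the DIS_MCU_LOAD bit is compared, so CPUs that differ in other bits
of the register still count as homogeneous here.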
Alejandro
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail
2023-06-02 13:19 ` Alejandro Vallejo
@ 2023-06-02 16:44 ` Andrew Cooper
0 siblings, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2023-06-02 16:44 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu
On 02/06/2023 2:19 pm, Alejandro Vallejo wrote:
> On Thu, Jun 01, 2023 at 11:54:31AM +0100, Andrew Cooper wrote:
>> I had something else in mind here. Right now, this will read
>> MSR_MCU_CONTROL and emit a printk() on every microcode load, which will
>> be every AP, and every time the user uses the xen-ucode tool.
> Not every AP. The hypercall would return with an error before the APs are
> brought in. It is true that the error on dmesg would appear on every
> microcode load attempt though.
I meant every AP on boot, where Xen initiates the ucode load.
>
>> Instead, I recommend the following:
>>
>> 1) One patch moving the early-cpuid/msr read from tsx_init() into
>> early_microcode_init(), adjusting the comment as it goes. No point
>> duplicating that logic, and we need it earlier on boot now.
>> 2) This patch, adjusting early_microcode_init() only. Have a printk()
>> saying "microcode loading disabled by firmware" and avoid filling in
>> ucode_ops. Every other part of ucode handling understands "loading not
>> available".
> Sure. Going on a tangent though, I do wonder why tsx_init() is preceding
> identify_cpu(). It's reading cpuid leaf 7d0 simply because it hasn't been
> read yet, but it's not obvious why there is a rush to invoke tsx_init(). I
> can't see any obvious marker that affects the following identify_cpu() call,
> and swapping them gets rid of the cpuid read.
In __start_xen(),
tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */
If you were to test such a patch, the test-tsx ought to fail on SKL/KBL
amongst others.
One of the things that tsx_init() does is select TSX_CTRL_CPUID_CLEAR
and/or TSX_CPUID_CLEAR, which hides the HLE and RTM bits in regular
CPUID, so wants to run before the general CPUID scan. This matters for
guest performance - if TSX is actually always aborting, but reported to
the guest, then any library using RTM will be less performant than using
the non-transactional path.
Conversely if the user wants to explicitly re-activate TSX despite the
firmware defaults, those bits need clearing before the CPUID scan for
anything to work.
~Andrew
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail
2023-06-01 10:54 ` [PATCH] x86/microcode: Prevent attempting updates known to fail Andrew Cooper
2023-06-02 13:19 ` Alejandro Vallejo
@ 2023-06-02 20:35 ` Andrew Cooper
2023-06-05 10:31 ` Alejandro Vallejo
1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2023-06-02 20:35 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné, Wei Liu
On 01/06/2023 11:54 am, Andrew Cooper wrote:
> Instead, I recommend the following:
>
> 1) One patch moving the early-cpuid/msr read from tsx_init() into
> early_microcode_init(), adjusting the comment as it goes. No point
> duplicating that logic, and we need it earlier on boot now.
> 2) This patch, adjusting early_microcode_init() only. Have a printk()
> saying "microcode loading disabled by firmware" and avoid filling in
> ucode_ops. Every other part of ucode handling understands "loading not
> available".
So, having fallen over "x86/ucode: Exit early from early_update_cache()
if loading not available" for other reasons, I've realised that this
isn't a completely sensible suggestion.
By not filling in ucode_ops, nothing ever calls collect_info(), meaning
that external components which peek at this_cpu(cpu_sig).rev get 0's
back in place of the actual microcode revision. That's probably the
best we can do for genuinely no ucode facilities available.
But there's another case we ought to cope with. Some hypervisors
deliberately report a microcode revision of ~0, and we should take that to
mean "no microcode loading available" too.
For this MCU_CONTROL_DIS_MCU_LOAD case, we don't want to be trying to
load new microcode because that's a waste of time, but we absolutely
should query the current microcode revision. It is frequently relevant
for security reasons.
So I think we want to fine-grain things a little, and separate the
concepts of "ucode info available" and "ucode loading available". Per
the current mechanism, that would involve supporting a case where
ucode_ops.collect_cpu_info() is available but
ucode_ops.apply_microcode() is not.
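Roughly, and only as a sketch: the types, stubs and `fake_hw_rev` below
are hypothetical, and the real hook signatures in Xen may well differ.
The point is just that the info hook is always installed, while the
load hook is dropped when firmware disabled loading or when the
reported revision is ~0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real types. */
struct cpu_sig_sketch {
    uint32_t rev;
};

struct ucode_ops_sketch {
    void (*collect_cpu_info)(struct cpu_sig_sketch *sig);
    int (*apply_microcode)(const void *patch);
};

/* Stand-in for reading the current revision from hardware. */
static uint32_t fake_hw_rev = 0x2c;

static void collect_stub(struct cpu_sig_sketch *sig)
{
    assert(sig != NULL);
    sig->rev = fake_hw_rev;
}

static int apply_stub(const void *patch)
{
    (void)patch;
    return 0;
}

/*
 * "ucode info available" is unconditional in this sketch, while
 * "ucode loading available" depends on firmware and on the revision.
 */
static struct ucode_ops_sketch init_ucode_ops_sketch(bool dis_mcu_load)
{
    struct ucode_ops_sketch ops = {
        .collect_cpu_info = collect_stub,
        .apply_microcode = apply_stub,
    };
    struct cpu_sig_sketch sig;

    ops.collect_cpu_info(&sig);

    /* Firmware lock, or a hypervisor reporting ~0: info only. */
    if ( dis_mcu_load || sig.rev == UINT32_MAX )
        ops.apply_microcode = NULL;

    return ops;
}
```

Callers that only want the revision keep working in every case, and the
existing NULL check on the load hook covers the rest.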
~Andrew
P.S. also in our copious free time, we need to start supporting the
Intel min_rev field, which is more complicated than it sounds.
min_rev is vaguely defined as being relevant to block updates "after
you've evaluated CPUID and made decisions based on it", but here in Xen
we do also do livepatching and late loading to explicitly make use of
newly enumerated features.
So we need a way of xen-ucode saying "please really do load this,
because I as the admin think it will be fine in combination with the
livepatch I'm about to apply".
My best idea for this is to have a `--force` option to pass to Xen to
skip the revision checks, which will require either a new hypercall, or
perhaps borrowing a high bit from the size field in the current hypercall.
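A sketch of the "borrow a high bit" variant, with every name invented
for illustration. It assumes a 64-bit size field whose top bit is never
needed for a real length; neither that layout nor the flag value is
taken from the existing hypercall ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: top bit of an assumed 64-bit size field. */
#define UCODE_LEN_FORCE_SKETCH (UINT64_C(1) << 63)

struct ucode_request_sketch {
    uint64_t len;  /* real blob length with the flag stripped */
    bool force;    /* admin asked to skip the revision checks */
};

/* Tool side: fold the flag into the size field. */
static uint64_t encode_len_sketch(uint64_t len, bool force)
{
    assert(!(len & UCODE_LEN_FORCE_SKETCH));
    return force ? (len | UCODE_LEN_FORCE_SKETCH) : len;
}

/* Hypervisor side: split the field back into length and flag. */
static struct ucode_request_sketch decode_len_sketch(uint64_t raw)
{
    struct ucode_request_sketch req = {
        .len = raw & ~UCODE_LEN_FORCE_SKETCH,
        .force = (raw & UCODE_LEN_FORCE_SKETCH) != 0,
    };

    return req;
}

/* Simplified revision check: newer only, unless forced. */
static bool revision_check_sketch(uint32_t cur_rev, uint32_t new_rev,
                                  bool force)
{
    return force || new_rev > cur_rev;
}
```

A new hypercall would avoid overloading the size field at the cost of
more ABI surface; the sketch only shows that the cheaper variant is
mechanically simple.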
With a force option in place, the boot time ucode=allow-same can go
away. It has become distinctly less useful now that we were forced to do
this unilaterally on AMD CPUs, and separating "allow same because of HW
bugs" from "the Admin promised they knew what they were doing" would be
better for testing.
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail
2023-06-02 20:35 ` Andrew Cooper
@ 2023-06-05 10:31 ` Alejandro Vallejo
0 siblings, 0 replies; 5+ messages in thread
From: Alejandro Vallejo @ 2023-06-05 10:31 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu
On Fri, Jun 02, 2023 at 09:35:56PM +0100, Andrew Cooper wrote:
> For this MCU_CONTROL_DIS_MCU_LOAD case, we don't want to be trying to
> load new microcode because that's a waste of time, but we absolutely
> should query the current microcode revision. It is frequently relevant
> for security reasons.
>
> So I think we want to fine-grain things a little, and separate the
> concepts of "ucode info available" and "ucode loading available". Per
> the current mechanism, that would involve supporting a case where
> ucode_ops.collect_cpu_info() is available but
> ucode_ops.apply_microcode() is not.
I was going after something to that effect, yes.
>
> ~Andrew
>
> P.S. also in our copious free time, we need to start supporting the
> Intel min_rev field, which is more complicated than it sounds.
>
> min_rev is vaguely defined as being relevant to block updates "after
> you've evaluated CPUID and made decisions based on it", but here in Xen
> we do also do livepatching and late loading to explicitly make use of
> newly enumerated features.
>
> So we need a way of xen-ucode saying "please really do load this,
> because I as the admin think it will be fine in combination with the
> livepatch I'm about to apply".
>
> My best idea for this is to have a `--force` option to pass to Xen to
> skip the revision checks, which will require either a new hypercall, or
> perhaps borrowing a high bit from the size field in the current hypercall.
>
> With a force option in place, the boot time ucode=allow-same can go
> away. It has become distinctly less useful now that we were forced to do
> this unilaterally on AMD CPUs, and separating "allow same because of HW
> bugs" from "the Admin promised they knew what they were doing" would be
> better for testing.
I've created a GitLab issue to keep track of that:
https://gitlab.com/xen-project/xen/-/issues/164
There's also the case of downgrades. We probably want to at least avoid
going back to a microcode revision with a different min_rev field.
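One possible reading of that rule, as a sketch. The header struct and
function names are hypothetical, and the exact min_rev semantics are
still to be worked out, so the only check shown is the one mentioned
here: a downgrade must not change min_rev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a microcode header. */
struct ucode_hdr_sketch {
    uint32_t rev;
    uint32_t min_rev;
};

/* A candidate with a lower revision than the current one is a downgrade. */
static bool is_downgrade_sketch(const struct ucode_hdr_sketch *cur,
                                const struct ucode_hdr_sketch *cand)
{
    assert(cur != NULL && cand != NULL);
    return cand->rev < cur->rev;
}

/* Permit a downgrade only if it keeps the same min_rev. */
static bool downgrade_allowed_sketch(const struct ucode_hdr_sketch *cur,
                                     const struct ucode_hdr_sketch *cand)
{
    return is_downgrade_sketch(cur, cand) && cand->min_rev == cur->min_rev;
}
```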
Alejandro