* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail [not found] <20230531175119.10830-1-alejandro.vallejo@cloud.com> @ 2023-06-01 10:54 ` Andrew Cooper 2023-06-02 13:19 ` Alejandro Vallejo 2023-06-02 20:35 ` Andrew Cooper 0 siblings, 2 replies; 5+ messages in thread From: Andrew Cooper @ 2023-06-01 10:54 UTC (permalink / raw) To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné, Wei Liu On 31/05/2023 6:51 pm, Alejandro Vallejo wrote: > diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c > index cd456c476f..e507945932 100644 > --- a/xen/arch/x86/cpu/microcode/core.c > +++ b/xen/arch/x86/cpu/microcode/core.c > @@ -697,6 +697,17 @@ static long cf_check microcode_update_helper(void *data) > return ret; > } > > +static bool this_cpu_can_install_update(void) > +{ > + uint64_t mcu_ctrl; > + > + if ( !cpu_has_mcu_ctrl ) > + return true; > + > + rdmsrl(MSR_MCU_CONTROL, mcu_ctrl); > + return !(mcu_ctrl & MCU_CONTROL_DIS_MCU_LOAD); > +} > + > int microcode_update(XEN_GUEST_HANDLE(const_void) buf, unsigned long len) > { > int ret; > @@ -708,6 +719,22 @@ int microcode_update(XEN_GUEST_HANDLE(const_void) buf, unsigned long len) > if ( !ucode_ops.apply_microcode ) > return -EINVAL; > > + if ( !this_cpu_can_install_update() ) > + { > + /* > + * This CPU can't install microcode, so it makes no sense to try to > + * go on. We're implicitly trusting firmware sanity in that all > + * CPUs are expected to have a homogeneous setting. If, for some > + * reason, another CPU happens to be locked down when this one > + * isn't then unpleasantness will follow. In particular, some CPUs > + * will be updated while others will not. A very stern message will > + * be displayed in xl-dmesg that case, strongly advising to reboot the > + * machine. > + */ > + printk("WARNING: microcode not installed due to DIS_MCU_LOAD=1"); > + return -EACCES; > + } I had something else in mind here. Right now, this will read MSR_MCU_CONTROL and emit a printk() on every microcode load, which will be every AP, and every time the user uses the xen-ucode tool. Instead, I recommend the following: 1) One patch moving the early-cpuid/msr read from tsx_init() into early_microcode_init(), adjusting the comment as it goes. No point duplicating that logic, and we need it earlier on boot now. 2) This patch, adjusting early_microcode_init() only. Have a printk() saying "microcode loading disabled by firmware" and avoid filling in ucode_ops. Every other part of ucode handling understands "loading not available". In terms of the commit message, you should call out the usecase explicitly. This feature is intended for baremetal clouds where the platform owner doesn't trust the tenant to choose the microcode version in use. Also, I'm really not sure what your 3rd paragraph is trying to say. ~Andrew ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail 2023-06-01 10:54 ` [PATCH] x86/microcode: Prevent attempting updates known to fail Andrew Cooper @ 2023-06-02 13:19 ` Alejandro Vallejo 2023-06-02 16:44 ` Andrew Cooper 2023-06-02 20:35 ` Andrew Cooper 1 sibling, 1 reply; 5+ messages in thread From: Alejandro Vallejo @ 2023-06-02 13:19 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu On Thu, Jun 01, 2023 at 11:54:31AM +0100, Andrew Cooper wrote: > I had something else in mind here. Right now, this will read > MSR_MCU_CONTROL and emit a printk() on every microcode load, which will > be every AP, and every time the user uses the xen-ucode tool. Not every AP. The hypercall would return with an error before the APs are brought in. It is true that the error on dmesg would appear on every microcode load attempt though. > > Instead, I recommend the following: > > 1) One patch moving the early-cpuid/msr read from tsx_init() into > early_microcode_init(), adjusting the comment as it goes. No point > duplicating that logic, and we need it earlier on boot now. > 2) This patch, adjusting early_microcode_init() only. Have a printk() > saying "microcode loading disabled by firmware" and avoid filling in > ucode_ops. Every other part of ucode handling understands "loading not > available". Sure. Going on a tangent though, I do wonder why tsx_init() is preceding identify_cpu(). It's reading cpuid leaf 7d0 simply because it hasn't been read yet, but it's not obvious why this rush in invoking tsx_init(). I can't see any obvious marker that affect the following identify_cpu() call, and swapping them gets rid of the cpuid read. > > In terms of the commit message, you should call out the usecase > explicitly. This feature is intended for baremetal clouds where the > platform owner doesn't trust the tenant to choose the microcode version > in use. > Sure. > Also, I'm really not sure what your 3rd paragraph is trying to say. That the case where CPU_i.DIS_MCU_LOAD != CPU_j.DIS_MCU_LOAD where i != j is not specifically handled on the grounds that sane firmware ensures that condition doesn't happen, and we already notify when the system reached a nonsensical state with different CPUs having different microcode versions. I'll rewrite it better. Alejandro ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail 2023-06-02 13:19 ` Alejandro Vallejo @ 2023-06-02 16:44 ` Andrew Cooper 0 siblings, 0 replies; 5+ messages in thread From: Andrew Cooper @ 2023-06-02 16:44 UTC (permalink / raw) To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu On 02/06/2023 2:19 pm, Alejandro Vallejo wrote: > On Thu, Jun 01, 2023 at 11:54:31AM +0100, Andrew Cooper wrote: >> I had something else in mind here. Right now, this will read >> MSR_MCU_CONTROL and emit a printk() on every microcode load, which will >> be every AP, and every time the user uses the xen-ucode tool. > Not every AP. The hypercall would return with an error before the APs are > brought in. It is true that the error on dmesg would appear on every > microcode load attempt though. I meant every AP on boot, where Xen initiates the ucode load. > >> Instead, I recommend the following: >> >> 1) One patch moving the early-cpuid/msr read from tsx_init() into >> early_microcode_init(), adjusting the comment as it goes. No point >> duplicating that logic, and we need it earlier on boot now. >> 2) This patch, adjusting early_microcode_init() only. Have a printk() >> saying "microcode loading disabled by firmware" and avoid filling in >> ucode_ops. Every other part of ucode handling understands "loading not >> available". > Sure. Going on a tangent though, I do wonder why tsx_init() is preceding > identify_cpu(). It's reading cpuid leaf 7d0 simply because it hasn't been > read yet, but it's not obvious why this rush in invoking tsx_init(). I > can't see any obvious marker that affect the following identify_cpu() call, > and swapping them gets rid of the cpuid read. In __start_xen(), tsx_init(); /* Needs microcode. May change HLE/RTM feature bits. */ If you were to test such a patch, the test-tsx ought to fail on SKL/KBL amongst others. One of the things that tsx_init() does is select TSX_CTRL_CPUID_CLEAR and/or TSX_CPUID_CLEAR, which hides the HLE and RTM bits in regular CPUID, so wants to run before the general CPUID scan. This matters for guest performance - if TSX is actually always aborting, but reported to the guest, then any library using RTM will be less performant than using the non-transactional path. Conversely if the user wants to explicitly re-activate TSX despite the firmware defaults, those bits need clearing before the CPUID scan for anything to work. ~Andrew ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail 2023-06-01 10:54 ` [PATCH] x86/microcode: Prevent attempting updates known to fail Andrew Cooper 2023-06-02 13:19 ` Alejandro Vallejo @ 2023-06-02 20:35 ` Andrew Cooper 2023-06-05 10:31 ` Alejandro Vallejo 1 sibling, 1 reply; 5+ messages in thread From: Andrew Cooper @ 2023-06-02 20:35 UTC (permalink / raw) To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné, Wei Liu On 01/06/2023 11:54 am, Andrew Cooper wrote: > Instead, I recommend the following: > > 1) One patch moving the early-cpuid/msr read from tsx_init() into > early_microcode_init(), adjusting the comment as it goes. No point > duplicating that logic, and we need it earlier on boot now. > 2) This patch, adjusting early_microcode_init() only. Have a printk() > saying "microcode loading disabled by firmware" and avoid filling in > ucode_ops. Every other part of ucode handling understands "loading not > available". So, having fallen over "x86/ucode: Exit early from early_update_cache() if loading not available" for other reasons, I've realised that this isn't a completely sensible suggestion. By not filling in ucode_ops, nothing ever calls collect_info(), meaning that external components which peek at this_cpu(cpu_sig).rev get 0's back in place of the actual microcode revision. That's probably the best we can do for genuinely no ucode facilities available. But there's another case we ought to cope with. Some hypervisors deliberately report a microcode revision of ~0, and we should take to mean "no microcode loading available" too. For this MCU_CONTROL_DIS_MCU_LOAD case, we don't want to be trying to load new microcode because that's a waste of time, but we absolutely should query the current microcode revision. It is frequently relevant for security reasons. So I think we want to fine-grain things a little, and separate the concepts of "ucode info available" and "ucode loading available". Per the current mechanism, that would involve supporting a case where ucode_ops.collect_cpu_info() is available but ucode_ops.apply_microcode() is not. ~Andrew P.S. also in our copious free time, we need to start supporting the Intel min_rev field, which is more complicated than it sounds. min_rev is vaguely defined as being relevant to block updates "after you've evaluated CPUID and made decisions based on it", but here in Xen we do also do livepatching and late loading to explicitly make use of newly enumerated features. So we need a way of xen-ucode saying "please really do load this, because I as the admin think it will be fine in combination with the livepatch I'm about to apply". My best idea for this is to have a `--force` option to pass to Xen to skip the revision checks, which will require either a new hypercall, or perhaps borrowing a high bit from the size field in the current hypercall. With a force option in place, the boot time ucode=allow-same can go away. It has become distinctly less useful now that we were forced do this unilaterally on AMD CPUs, and separating "allow same because of HW bugs" from "the Admin promised they knew what they were doing" would be better for testing. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] x86/microcode: Prevent attempting updates known to fail 2023-06-02 20:35 ` Andrew Cooper @ 2023-06-05 10:31 ` Alejandro Vallejo 0 siblings, 0 replies; 5+ messages in thread From: Alejandro Vallejo @ 2023-06-05 10:31 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu On Fri, Jun 02, 2023 at 09:35:56PM +0100, Andrew Cooper wrote: > For this MCU_CONTROL_DIS_MCU_LOAD case, we don't want to be trying to > load new microcode because that's a waste of time, but we absolutely > should query the current microcode revision. It is frequently relevant > for security reasons. > > So I think we want to fine-grain things a little, and separate the > concepts of "ucode info available" and "ucode loading available". Per > the current mechanism, that would involve supporting a case where > ucode_ops.collect_cpu_info() is available but > ucode_ops.apply_microcode() is not. I was going after something to that effect, yes. > > ~Andrew > > P.S. also in our copious free time, we need to start supporting the > Intel min_rev field, which is more complicated than it sounds. > > min_rev is vaguely defined as being relevant to block updates "after > you've evaluated CPUID and made decisions based on it", but here in Xen > we do also do livepatching and late loading to explicitly make use of > newly enumerated features. > > So we need a way of xen-ucode saying "please really do load this, > because I as the admin think it will be fine in combination with the > livepatch I'm about to apply". > > My best idea for this is to have a `--force` option to pass to Xen to > skip the revision checks, which will require either a new hypercall, or > perhaps borrowing a high bit from the size field in the current hypercall. > > With a force option in place, the boot time ucode=allow-same can go > away. It has become distinctly less useful now that we were forced do > this unilaterally on AMD CPUs, and separating "allow same because of HW > bugs" from "the Admin promised they knew what they were doing" would be > better for testing. I've created a GitLab issue to keep track of that: https://gitlab.com/xen-project/xen/-/issues/164 There's also the case of downgrades. We probably want to at least avoid going back to a microcode revision with different min_rev field. Alejandro ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-06-05 10:32 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20230531175119.10830-1-alejandro.vallejo@cloud.com>
2023-06-01 10:54 ` [PATCH] x86/microcode: Prevent attempting updates known to fail Andrew Cooper
2023-06-02 13:19 ` Alejandro Vallejo
2023-06-02 16:44 ` Andrew Cooper
2023-06-02 20:35 ` Andrew Cooper
2023-06-05 10:31 ` Alejandro Vallejo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.