* [PATCH v2 0/2] KVM: x86: MCE fixes
@ 2026-06-09 10:55 Carlos López
2026-06-09 10:55 ` [PATCH v2 1/2] KVM: x86: Fix array_index_nospec() protection in kvm_vcpu_ioctl_x86_set_mce() Carlos López
2026-06-09 10:55 ` [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL Carlos López
0 siblings, 2 replies; 5+ messages in thread
From: Carlos López @ 2026-06-09 10:55 UTC (permalink / raw)
To: kvm, seanjc, pbonzini
Cc: linux-kernel, x86, tglx, mingo, dave.hansen, hpa,
Carlos López
These two patches are somewhat unrelated, but patch 2 came out of
Sean's suggestions from reviewing the first patch (see [1]). Patch 1 is
unchanged.
Note that the SDM is ambiguous regarding the logging behavior for
IA32_MCG_CTL, so additional correctness fixes may be warranted on top.
[1] https://lore.kernel.org/all/20260516163412.601908-1-clopez@suse.de/
Carlos López (2):
KVM: x86: Fix array_index_nospec() protection in kvm_vcpu_ioctl_x86_set_mce()
KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL
arch/x86/kvm/x86.c | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)
base-commit: de3a35be92d2391ece4bf3143ef2887192625fd0
--
2.51.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 1/2] KVM: x86: Fix array_index_nospec() protection in kvm_vcpu_ioctl_x86_set_mce()
2026-06-09 10:55 [PATCH v2 0/2] KVM: x86: MCE fixes Carlos López
@ 2026-06-09 10:55 ` Carlos López
2026-06-09 10:55 ` [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL Carlos López
1 sibling, 0 replies; 5+ messages in thread
From: Carlos López @ 2026-06-09 10:55 UTC (permalink / raw)
To: kvm, seanjc, pbonzini
Cc: linux-kernel, x86, tglx, mingo, dave.hansen, hpa,
Carlos López, Borislav Petkov, Jue Wang
Commit aebc3ca19063 ("KVM: x86: Enable CMCI capability by default and
handle injected UCNA errors") introduced kvm_vcpu_x86_set_ucna(), which
accesses @vcpu->arch.mci_ctl2_banks[] using @mce->bank as the index. The
@mce struct is user-controlled, provided via the KVM_X86_SET_MCE ioctl.
The caller of this function, kvm_vcpu_ioctl_x86_set_mce(), bounds-checks
@mce->bank and applies array_index_nospec() to advance the @banks
pointer, but @mce->bank itself is passed through unclamped. On a
speculative path that bypasses the bounds check, the raw @mce->bank
value can index mci_ctl2_banks[] out-of-bounds.
In practice this is a very weak gadget, and would at most allow leaking
a single bit in a 64-bit integer, but prevent potential future issues by
clamping @mce->bank in place with array_index_nospec(), before passing
the struct to kvm_vcpu_x86_set_ucna().
Fixes: aebc3ca19063 ("KVM: x86: Enable CMCI capability by default and handle injected UCNA errors")
Signed-off-by: Carlos López <clopez@suse.de>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf122b8c3210..77a780177c4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5472,7 +5472,8 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
if (mce->bank >= bank_num || !(mce->status & MCI_STATUS_VAL))
return -EINVAL;
- banks += array_index_nospec(4 * mce->bank, 4 * bank_num);
+ mce->bank = array_index_nospec(mce->bank, bank_num);
+ banks += 4 * mce->bank;
if (is_ucna(mce))
return kvm_vcpu_x86_set_ucna(vcpu, mce, banks);
--
2.51.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL
2026-06-09 10:55 [PATCH v2 0/2] KVM: x86: MCE fixes Carlos López
2026-06-09 10:55 ` [PATCH v2 1/2] KVM: x86: Fix array_index_nospec() protection in kvm_vcpu_ioctl_x86_set_mce() Carlos López
@ 2026-06-09 10:55 ` Carlos López
2026-06-09 11:10 ` sashiko-bot
1 sibling, 1 reply; 5+ messages in thread
From: Carlos López @ 2026-06-09 10:55 UTC (permalink / raw)
To: kvm, seanjc, pbonzini
Cc: linux-kernel, x86, tglx, mingo, dave.hansen, hpa,
Carlos López, Borislav Petkov, Ying Huang, Avi Kivity
When userspace issues KVM_X86_SET_MCE, kvm_vcpu_ioctl_x86_set_mce()
decides whether to log an uncorrectable MCE by looking at the
corresponding IA32_MCi_CTL MSR. This is not the behavior specified in
the Intel SDM (17.3.2.1 IA32_MCi_CTL MSRs):
Setting an EEj flag enables signaling #MC of the associated error and
clearing it disables signaling of the error. Error logging happens
regardless of the setting of these bits. The processor drops writes to
bits that are not implemented.
Hoist the logging code, which writes to the corresponding error banks,
above the MCi_CTL check. Keep the #MC delivery after the check, as this
is correct behavior.
Fixes: 890ca9aefa78 ("KVM: Add MCE support")
Signed-off-by: Carlos López <clopez@suse.de>
---
arch/x86/kvm/x86.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77a780177c4e..722bead405b5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5485,24 +5485,31 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
vcpu->arch.mcg_ctl != ~(u64)0)
return 0;
- /*
- * if IA32_MCi_CTL is not all 1s, the uncorrected error
- * reporting is disabled for the bank
- */
- if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0)
- return 0;
+
if (mce->status & MCI_STATUS_UC) {
+ /*
+ * Per the SDM, error logging always happens independently of
+ * IA32_MCi_CTL
+ */
+ if (banks[1] & MCI_STATUS_VAL)
+ mce->status |= MCI_STATUS_OVER;
+ banks[2] = mce->addr;
+ banks[3] = mce->misc;
+ banks[1] = mce->status;
+
+ /*
+ * if IA32_MCi_CTL is not all 1s, the uncorrected error
+ * reporting is disabled for the bank
+ */
+ if (banks[0] != ~(u64)0)
+ return 0;
+
if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
!kvm_is_cr4_bit_set(vcpu, X86_CR4_MCE)) {
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
return 0;
}
- if (banks[1] & MCI_STATUS_VAL)
- mce->status |= MCI_STATUS_OVER;
- banks[2] = mce->addr;
- banks[3] = mce->misc;
vcpu->arch.mcg_status = mce->mcg_status;
- banks[1] = mce->status;
kvm_queue_exception(vcpu, MC_VECTOR);
} else if (!(banks[1] & MCI_STATUS_VAL)
|| !(banks[1] & MCI_STATUS_UC)) {
--
2.51.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL
2026-06-09 10:55 ` [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL Carlos López
@ 2026-06-09 11:10 ` sashiko-bot
2026-06-09 12:56 ` Carlos López
0 siblings, 1 reply; 5+ messages in thread
From: sashiko-bot @ 2026-06-09 11:10 UTC (permalink / raw)
To: Carlos López; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] The patch incorrectly overwrites the MCE banks when a double/triple fault occurs (MCIP flag already set).
Pre-existing issues:
- [High] The patch fails to apply the MCE logging fix to the global control register IA32_MCG_CTL, leaving MCE emulation architecturally broken.
--
commit 146247f45f3a8d3f982b364bba19a51f76828ce0
Author: Carlos López <clopez@suse.de>
KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL
This patch hoists the MCE logging logic above the IA32_MCi_CTL check.
This ensures that KVM logs uncorrectable machine check errors even if
reporting is disabled for that specific bank, conforming to the Intel SDM.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5486,24 +5486,31 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
> if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
> vcpu->arch.mcg_ctl != ~(u64)0)
> return 0;
[Severity: High]
This isn't a bug introduced by this patch, but does this global control
register check also need to be adjusted?
Just like IA32_MCi_CTL, the Intel SDM states that error logging happens
regardless of the setting of the IA32_MCG_CTL bits. By returning early
here, KVM skips logging the uncorrectable error to the bank if the global
reporting is disabled.
Additionally, KVM currently suppresses generation and logging for all
banks if any bit in IA32_MCG_CTL is 0, rather than checking the specific
bit for the bank. Should the logging logic be hoisted above this check
as well?
> - /*
> - * if IA32_MCi_CTL is not all 1s, the uncorrected error
> - * reporting is disabled for the bank
> - */
> - if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0)
> - return 0;
> +
> if (mce->status & MCI_STATUS_UC) {
> + /*
> + * Per the SDM, error logging always happens independently of
> + * IA32_MCi_CTL
> + */
> + if (banks[1] & MCI_STATUS_VAL)
> + mce->status |= MCI_STATUS_OVER;
> + banks[2] = mce->addr;
> + banks[3] = mce->misc;
> + banks[1] = mce->status;
> +
> + /*
> + * if IA32_MCi_CTL is not all 1s, the uncorrected error
> + * reporting is disabled for the bank
> + */
> + if (banks[0] != ~(u64)0)
> + return 0;
> +
> if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
> !kvm_is_cr4_bit_set(vcpu, X86_CR4_MCE)) {
> kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> return 0;
> }
[Severity: High]
By hoisting the error logging block above the IA32_MCi_CTL check, does
this now incorrectly overwrite the MCE banks when a double or triple fault
occurs?
According to the SDM, if the processor encounters another machine-check
error while the MCIP flag is already set, it causes a triple fault and
does not update the status registers for the second error.
Since the logging block in kvm_vcpu_ioctl_x86_set_mce() now executes
before checking for MCG_STATUS_MCIP, it looks like it will overwrite the
original addr, misc, and status of the first error before requesting the
triple fault.
> - if (banks[1] & MCI_STATUS_VAL)
> - mce->status |= MCI_STATUS_OVER;
> - banks[2] = mce->addr;
> - banks[3] = mce->misc;
> vcpu->arch.mcg_status = mce->mcg_status;
> - banks[1] = mce->status;
> kvm_queue_exception(vcpu, MC_VECTOR);
> } else if (!(banks[1] & MCI_STATUS_VAL)
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609105525.2545638-2-clopez@suse.de?part=2
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL
2026-06-09 11:10 ` sashiko-bot
@ 2026-06-09 12:56 ` Carlos López
0 siblings, 0 replies; 5+ messages in thread
From: Carlos López @ 2026-06-09 12:56 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On 6/9/26 1:10 PM, sashiko-bot@kernel.org wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5486,24 +5486,31 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
>> if ((mce->status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
>> vcpu->arch.mcg_ctl != ~(u64)0)
>> return 0;
>
> [Severity: High]
> This isn't a bug introduced by this patch, but does this global control
> register check also need to be adjusted?
>
> Just like IA32_MCi_CTL, the Intel SDM states that error logging happens
> regardless of the setting of the IA32_MCG_CTL bits. By returning early
> here, KVM skips logging the uncorrectable error to the bank if the global
> reporting is disabled.
Does the SDM specify so, though? As Sean noted, and as echoed in the
cover letter, this is pretty ambiguous.
> Additionally, KVM currently suppresses generation and logging for all
> banks if any bit in IA32_MCG_CTL is 0, rather than checking the specific
> bit for the bank. Should the logging logic be hoisted above this check
> as well?
>
>> - /*
>> - * if IA32_MCi_CTL is not all 1s, the uncorrected error
>> - * reporting is disabled for the bank
>> - */
>> - if ((mce->status & MCI_STATUS_UC) && banks[0] != ~(u64)0)
>> - return 0;
>> +
>> if (mce->status & MCI_STATUS_UC) {
>> + /*
>> + * Per the SDM, error logging always happens independently of
>> + * IA32_MCi_CTL
>> + */
>> + if (banks[1] & MCI_STATUS_VAL)
>> + mce->status |= MCI_STATUS_OVER;
>> + banks[2] = mce->addr;
>> + banks[3] = mce->misc;
>> + banks[1] = mce->status;
>> +
>> + /*
>> + * if IA32_MCi_CTL is not all 1s, the uncorrected error
>> + * reporting is disabled for the bank
>> + */
>> + if (banks[0] != ~(u64)0)
>> + return 0;
>> +
>> if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
>> !kvm_is_cr4_bit_set(vcpu, X86_CR4_MCE)) {
>> kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>> return 0;
>> }
>
> [Severity: High]
> By hoisting the error logging block above the IA32_MCi_CTL check, does
> this now incorrectly overwrite the MCE banks when a double or triple fault
> occurs?
>
> According to the SDM, if the processor encounters another machine-check
> error while the MCIP flag is already set, it causes a triple fault and
> does not update the status registers for the second error.
17.3.1.2 "IA32_MCG_STATUS MSR" does not say anything about updating or
not addr/misc/status depending on MCIP, it just says:
The occurrence of a second Machine-Check Event while MCIP is set will
cause the processor to enter a shutdown state.
Claude helped me find a good counterexample to what Sashiko says
regarding MCIP. Table 18-40 "Machine Check Error Codes for
IA32_MC4_STATUS" says this about bits 19:16:
Model specific error code bits 19:16.
If MACOD = 40CH, MSCOD encoding should be interpreted as:
01H: MCE when CR4.MCE is clear.
02H: MCE when MCIP bit is set.
MSCOD=02H is precisely a case of an update to an MCi_STATUS MSR while
MCIP is set.
> Since the logging block in kvm_vcpu_ioctl_x86_set_mce() now executes
> before checking for MCG_STATUS_MCIP, it looks like it will overwrite the
> original addr, misc, and status of the first error before requesting the
> triple fault.
17.3.2.2 "IA32_MCi_STATUS MSRS" does say that a second UC *to the same
bank* should not overwrite the first:
OVER (machine check overflow) flag, bit 62 - (...). Uncorrected errors
are not written over previous valid uncorrected errors. (...)
But this has nothing to do with MCIP. So, IIUC, the the new behavior is
still not completely correct because a second UC to the *same* bank
would overwrite the first. I can prepare a v3 with this.
>
>> - if (banks[1] & MCI_STATUS_VAL)
>> - mce->status |= MCI_STATUS_OVER;
>> - banks[2] = mce->addr;
>> - banks[3] = mce->misc;
>> vcpu->arch.mcg_status = mce->mcg_status;
>> - banks[1] = mce->status;
>> kvm_queue_exception(vcpu, MC_VECTOR);
>> } else if (!(banks[1] & MCI_STATUS_VAL)
> [ ... ]
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-09 12:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09 10:55 [PATCH v2 0/2] KVM: x86: MCE fixes Carlos López
2026-06-09 10:55 ` [PATCH v2 1/2] KVM: x86: Fix array_index_nospec() protection in kvm_vcpu_ioctl_x86_set_mce() Carlos López
2026-06-09 10:55 ` [PATCH v2 2/2] KVM: x86: Do not gate MCE logging based on IA32_MCi_CTL Carlos López
2026-06-09 11:10 ` sashiko-bot
2026-06-09 12:56 ` Carlos López
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox