[PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors


* [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors
@ 2022-05-21  8:15 Lev Kujawski
  2022-05-23 17:08 ` Sean Christopherson
  2022-05-23 19:11 ` Paolo Bonzini
  0 siblings, 2 replies; 5+ messages in thread
From: Lev Kujawski @ 2022-05-21  8:15 UTC
  To: kvm; +Cc: Lev Kujawski

Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
MC1_CTL to ignore single-bit ECC data errors.  Single-bit ECC data
errors are always correctable and thus are safe to ignore because they
are informational in nature rather than signaling a loss of data
integrity.

Prior to this patch, these guests would crash upon writing MC1_CTL,
with resultant error messages like the following:

error: kvm run failed Operation not permitted
EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
GDT=     ffff5020 000002cf
IDT=     ffff52f0 000007ff
CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
74 26 00 0f 31 c3

Signed-off-by: Lev Kujawski <lkujaw@member.fsf.org>
---
 arch/x86/kvm/x86.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4790f0d7d40b..128dca4e7bb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3215,10 +3215,13 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			/* only 0 or all 1s can be written to IA32_MCi_CTL
 			 * some Linux kernels though clear bit 10 in bank 4 to
 			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
-			 * this to avoid an uncatched #GP in the guest
+			 * this to avoid an uncatched #GP in the guest.
+			 *
+			 * UNIXWARE clears bit 0 of MC1_CTL to ignore
+			 * correctable, single-bit ECC data errors.
 			 */
 			if ((offset & 0x3) == 0 &&
-			    data != 0 && (data | (1 << 10)) != ~(u64)0)
+			    data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
 				return -1;
 
 			/* MCi_STATUS */
-- 
2.34.1
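
For readers following the diff, here is a minimal sketch of the acceptance test before and after the change, pulled out into standalone predicates. The function names are hypothetical and do not exist in KVM; only the bit arithmetic is taken from the hunk above. Each predicate is the negation of the condition that makes set_msr_mce() return -1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not KVM functions. They restate the condition
 * in the hunk above as "is this IA32_MCi_CTL write accepted?".
 */

/* Before the patch: 0, all 1s, or all 1s with bit 10 clear. */
static bool mci_ctl_write_ok_old(uint64_t data)
{
	return data == 0 || (data | (1ULL << 10)) == ~(uint64_t)0;
}

/* After the patch: bit 0 may additionally be clear. */
static bool mci_ctl_write_ok_new(uint64_t data)
{
	return data == 0 || (data | (1ULL << 10) | 1ULL) == ~(uint64_t)0;
}
```

With the value from the register dump (EDX:EAX = 0xfffffffffffffffe), the old predicate rejects the write and the new one accepts it. Note that, as written in the diff, the relaxed mask is applied to the CTL register of every bank, not only to MC1_CTL.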



* Re: [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors
  2022-05-21  8:15 [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors Lev Kujawski
@ 2022-05-23 17:08 ` Sean Christopherson
  2022-05-23 19:20   ` Paolo Bonzini
  2022-05-23 21:48   ` Lev Kujawski
  2022-05-23 19:11 ` Paolo Bonzini
  1 sibling, 2 replies; 5+ messages in thread
From: Sean Christopherson @ 2022-05-23 17:08 UTC
  To: Lev Kujawski; +Cc: kvm

Please use "KVM: x86:" for the shortlog scope.

On Sat, May 21, 2022, Lev Kujawski wrote:
> Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
> MC1_CTL to ignore single-bit ECC data errors.

Not that it really matters, but is this behavior documented anywhere?  I've searched
a variety of SDMs, APMs, and PPRs, and can't find anything that documents this exact
behavior.  I totally believe that some CPUs behave this way, but it'd be nice to
document exactly which generations of whose CPUs allow clearing bit zero.

> Single-bit ECC data errors are always correctable and thus are safe to ignore
> because they are informational in nature rather than signaling a loss of data
> integrity.
> 
> Prior to this patch, these guests would crash upon writing MC1_CTL,
> with resultant error messages like the following:
> 
> error: kvm run failed Operation not permitted
> EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
> ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
> EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0000 00000000 ffffffff 00c00000
> GS =0000 00000000 ffffffff 00c00000
> LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
> TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
> GDT=     ffff5020 000002cf
> IDT=     ffff52f0 000007ff
> CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
> DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
> DR6=ffff0ff0 DR7=00000400
> EFER=0000000000000000
> Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
> 30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
> 74 26 00 0f 31 c3
> 
> Signed-off-by: Lev Kujawski <lkujaw@member.fsf.org>
> ---
>  arch/x86/kvm/x86.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4790f0d7d40b..128dca4e7bb7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3215,10 +3215,13 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			/* only 0 or all 1s can be written to IA32_MCi_CTL
>  			 * some Linux kernels though clear bit 10 in bank 4 to
>  			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
> -			 * this to avoid an uncatched #GP in the guest
> +			 * this to avoid an uncatched #GP in the guest.
> +			 *
> +			 * UNIXWARE clears bit 0 of MC1_CTL to ignore
> +			 * correctable, single-bit ECC data errors.
>  			 */
>  			if ((offset & 0x3) == 0 &&
> -			    data != 0 && (data | (1 << 10)) != ~(u64)0)
> +			    data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
>  				return -1;

If KVM injects a #GP like it's supposed to[*], will UNIXWARE eat the #GP and continue
on, or will it explode?  If it continues on, I'd prefer to avoid more special casing in
KVM.

If it explodes, I think my preference would be to just drop the MCi_CTL checks
entirely.  AFAICT, P4-based and P5-based Intel CPUs, and all? AMD CPUs allow
setting/clearing arbitrary bits.  The checks really aren't buying us anything,
and it seems like Intel retroactively defined the "architectural" behavior of
only 0s/1s.

[*] https://lore.kernel.org/all/20220512222716.4112548-2-seanjc@google.com


* Re: [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors
  2022-05-21  8:15 [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors Lev Kujawski
  2022-05-23 17:08 ` Sean Christopherson
@ 2022-05-23 19:11 ` Paolo Bonzini
  1 sibling, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2022-05-23 19:11 UTC
  To: Lev Kujawski, kvm

On 5/21/22 10:15, Lev Kujawski wrote:
> Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
> MC1_CTL to ignore single-bit ECC data errors.  Single-bit ECC data
> errors are always correctable and thus are safe to ignore because they
> are informational in nature rather than signaling a loss of data
> integrity.
> 
> Prior to this patch, these guests would crash upon writing MC1_CTL,
> with resultant error messages like the following:
> 
> error: kvm run failed Operation not permitted
> EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
> ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
> EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0000 00000000 ffffffff 00c00000
> GS =0000 00000000 ffffffff 00c00000
> LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
> TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
> GDT=     ffff5020 000002cf
> IDT=     ffff52f0 000007ff
> CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
> DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
> DR6=ffff0ff0 DR7=00000400
> EFER=0000000000000000
> Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
> 30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
> 74 26 00 0f 31 c3
> 
> Signed-off-by: Lev Kujawski <lkujaw@member.fsf.org>
> ---
>   arch/x86/kvm/x86.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4790f0d7d40b..128dca4e7bb7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3215,10 +3215,13 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   			/* only 0 or all 1s can be written to IA32_MCi_CTL
>   			 * some Linux kernels though clear bit 10 in bank 4 to
>   			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
> -			 * this to avoid an uncatched #GP in the guest
> +			 * this to avoid an uncatched #GP in the guest.
> +			 *
> +			 * UNIXWARE clears bit 0 of MC1_CTL to ignore
> +			 * correctable, single-bit ECC data errors.
>   			 */
>   			if ((offset & 0x3) == 0 &&
> -			    data != 0 && (data | (1 << 10)) != ~(u64)0)
> +			    data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
>   				return -1;
>   
>   			/* MCi_STATUS */

Queued, thanks.

Paolo


* Re: [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors
  2022-05-23 17:08 ` Sean Christopherson
@ 2022-05-23 19:20   ` Paolo Bonzini
  2022-05-23 21:48   ` Lev Kujawski
  1 sibling, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2022-05-23 19:20 UTC
  To: Sean Christopherson, Lev Kujawski; +Cc: kvm

On 5/23/22 19:08, Sean Christopherson wrote:
> If KVM injects a #GP like it's supposed to[*], will UNIXWARE eat the #GP and continue
> on, or will it explode?  If it continues on, I'd prefer to avoid more special casing in
> KVM.
> 
> If it explodes, I think my preference would be to just drop the MCi_CTL checks
> entirely.  AFAICT, P4-based and P5-based Intel CPUs, and all? AMD CPUs allow
> setting/clearing arbitrary bits.  The checks really aren't buying us anything,
> and it seems like Intel retroactively defined the "architectural" behavior of
> only 0s/1s.

I'm always a bit worried about removing #GP behavior (just like adding 
it of course) because sometimes it is used by OSes to detect specific 
non-architectural processor behavior.

Paolo


* Re: [PATCH] KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors
  2022-05-23 17:08 ` Sean Christopherson
  2022-05-23 19:20   ` Paolo Bonzini
@ 2022-05-23 21:48   ` Lev Kujawski
  1 sibling, 0 replies; 5+ messages in thread
From: Lev Kujawski @ 2022-05-23 21:48 UTC
  To: Sean Christopherson; +Cc: kvm, lkujaw


Sean Christopherson writes:

> "KVM: x86:" for the shortlog scope.
>
> On Sat, May 21, 2022, Lev Kujawski wrote:
>> Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
>> MC1_CTL to ignore single-bit ECC data errors.
>
> Not that it really matters, but is this behavior documented anywhere?  I've searched
> a variety of SDMs, APMs, and PPRs, and can't find anything that documents this exact
> behavior.  I totally believe that some CPUs behave this way, but it'd be nice to
> document exactly which generations of whose CPUs allow clearing bit zero.

Intel's coverage of IA32_MC1_CTL appears to be proprietary (perhaps
Appendix H material), but AMD helpfully documented it on page 204 of
their BIOS and Kernel Developer's Guide:

https://www.amd.com/system/files/TechDocs/26094.PDF

I experimentally determined that UNIXWARE writes MC1_CTL on QEMU models
"pentium2" or newer, but my guess is that this functionality was
actually introduced with the Pentium Pro.
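
As an aside, a minimal sketch of how the register dump quoted below corresponds to an MC1_CTL write. The faulting instruction bytes marked <0f> 30 are WRMSR, which writes EDX:EAX to the MSR selected by ECX. The helper names here are illustrative only; the layout assumed is the architectural one, with IA32_MC0_CTL at 0x400 and four MSRs (CTL, STATUS, ADDR, MISC) per bank.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base of the machine-check bank MSRs; each bank occupies four MSRs. */
#define MSR_IA32_MC0_CTL 0x400u

/* WRMSR takes the low half of the value from EAX and the high half from EDX. */
static uint64_t wrmsr_value(uint32_t eax, uint32_t edx)
{
	return ((uint64_t)edx << 32) | eax;
}

/* Bank index of a machine-check bank MSR (illustrative helper). */
static unsigned int mc_bank(uint32_t msr)
{
	return (msr - MSR_IA32_MC0_CTL) >> 2;
}

/* True when the MSR is a bank's CTL register, i.e. (offset & 0x3) == 0. */
static bool mc_is_ctl(uint32_t msr)
{
	return ((msr - MSR_IA32_MC0_CTL) & 0x3) == 0;
}
```

For the dump, ECX=00000404 selects bank 1's CTL register, and EAX=fffffffe with EDX=ffffffff gives all 1s except bit 0.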

>> Single-bit ECC data errors are always correctable and thus are safe to ignore
>> because they are informational in nature rather than signaling a loss of data
>> integrity.
>> 
>> Prior to this patch, these guests would crash upon writing MC1_CTL,
>> with resultant error messages like the following:
>> 
>> error: kvm run failed Operation not permitted
>> EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
>> ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
>> EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
>> SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
>> FS =0000 00000000 ffffffff 00c00000
>> GS =0000 00000000 ffffffff 00c00000
>> LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
>> TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
>> GDT=     ffff5020 000002cf
>> IDT=     ffff52f0 000007ff
>> CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
>> DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
>> DR6=ffff0ff0 DR7=00000400
>> EFER=0000000000000000
>> Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
>> 30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
>> 74 26 00 0f 31 c3
>> 
>> Signed-off-by: Lev Kujawski <lkujaw@member.fsf.org>
>> ---
>>  arch/x86/kvm/x86.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 4790f0d7d40b..128dca4e7bb7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3215,10 +3215,13 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  			/* only 0 or all 1s can be written to IA32_MCi_CTL
>>  			 * some Linux kernels though clear bit 10 in bank 4 to
>>  			 * workaround a BIOS/GART TBL issue on AMD K8s, ignore
>> -			 * this to avoid an uncatched #GP in the guest
>> +			 * this to avoid an uncatched #GP in the guest.
>> +			 *
>> +			 * UNIXWARE clears bit 0 of MC1_CTL to ignore
>> +			 * correctable, single-bit ECC data errors.
>>  			 */
>>  			if ((offset & 0x3) == 0 &&
>> -			    data != 0 && (data | (1 << 10)) != ~(u64)0)
>> +			    data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
>>  				return -1;
>
> If KVM injects a #GP like it's supposed to[*], will UNIXWARE eat the #GP and continue
> on, or will it explode?  If it continues on, I'd prefer to avoid more special casing in
> KVM.
>
> If it explodes, I think my preference would be to just drop the MCi_CTL checks
> entirely.  AFAICT, P4-based and P5-based Intel CPUs, and all? AMD CPUs allow
> setting/clearing arbitrary bits.  The checks really aren't buying us anything,
> and it seems like Intel retroactively defined the "architectural" behavior of
> only 0s/1s.
>
> [*] https://lore.kernel.org/all/20220512222716.4112548-2-seanjc@google.com

Unfortunately, I cannot say whether the UNIXWARE kernel would panic,
because QEMU enters a STOP state from which attempts to continue are
met with "Error: Resetting the Virtual Machine is required."

Thanks for the feedback, Lev


