* [Bug] AMD nested: commit broke VMware @ 2023-05-29 12:58 jwarren 2023-05-29 13:43 ` Maxim Levitsky 0 siblings, 1 reply; 9+ messages in thread From: jwarren @ 2023-05-29 12:58 UTC (permalink / raw) To: Kvm Hello, Since kernel 5.16 users can't start VMware VMs when it is nested under KVM on AMD CPUs. User reports are here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583 https://forums.unraid.net/topic/128868-vmware-7x-will-not-start-any-vms-under-unraid-6110/ I've pinpointed it to commit 174a921b6975ef959dd82ee9e8844067a62e3ec1 (appeared in 5.16rc1) "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB" I've confirmed that VMware errors out when it checks for TLB_CONTROL_FLUSH_ASID support and gets a 'false' answer. First revisions of the patch in question had some support for TLB_CONTROL_FLUSH_ASID, but it was removed: https://lore.kernel.org/kvm/f7c2d5f5-3560-8666-90be-3605220cb93c@redhat.com/ I don't know what would be the best case here, maybe put a quirk there, so it doesn't break "userspace". Committer's email is dead, so I'm writing here. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-05-29 12:58 [Bug] AMD nested: commit broke VMware jwarren @ 2023-05-29 13:43 ` Maxim Levitsky 2023-05-30 20:10 ` Jim Mattson 0 siblings, 1 reply; 9+ messages in thread From: Maxim Levitsky @ 2023-05-29 13:43 UTC (permalink / raw) To: jwarren, Kvm У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > Hello, > Since kernel 5.16 users can't start VMware VMs when it is nested under KVM on AMD CPUs. > > User reports are here: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583 > https://forums.unraid.net/topic/128868-vmware-7x-will-not-start-any-vms-under-unraid-6110/ > > I've pinpointed it to commit 174a921b6975ef959dd82ee9e8844067a62e3ec1 (appeared in 5.16rc1) > "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB" > > I've confirmed that VMware errors out when it checks for TLB_CONTROL_FLUSH_ASID support and gets a 'false' answer. > > First revisions of the patch in question had some support for TLB_CONTROL_FLUSH_ASID, but it was removed: > https://lore.kernel.org/kvm/f7c2d5f5-3560-8666-90be-3605220cb93c@redhat.com/ > > I don't know what would be the best case here, maybe put a quirk there, so it doesn't break "userspace". > Committer's email is dead, so I'm writing here. > I have to say that I know about this for long time, because some time ago I used to play with VMware player in a VM on AMD on my spare time, on weekends (just doing various crazy things with double nesting, running win98 nested, vfio stuff, etc, etc). I didn't report it because its a bug in VMWARE - they set a bit in the tlb_control without checking CPUID's FLUSHBYASID which states that KVM doesn't support setting this bit. Supporting FLUSHBYASID would fix this, and make nesting faster too, but it is far from a trivial job. I hope that I will find time to do this soon. Best regards, Maxim Levitsky ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-05-29 13:43 ` Maxim Levitsky @ 2023-05-30 20:10 ` Jim Mattson 2023-05-30 20:34 ` Jim Mattson 0 siblings, 1 reply; 9+ messages in thread From: Jim Mattson @ 2023-05-30 20:10 UTC (permalink / raw) To: Maxim Levitsky; +Cc: jwarren, Kvm On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > > Hello, > > Since kernel 5.16 users can't start VMware VMs when it is nested under KVM on AMD CPUs. > > > > User reports are here: > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583 > > https://forums.unraid.net/topic/128868-vmware-7x-will-not-start-any-vms-under-unraid-6110/ > > > > I've pinpointed it to commit 174a921b6975ef959dd82ee9e8844067a62e3ec1 (appeared in 5.16rc1) > > "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB" > > > > I've confirmed that VMware errors out when it checks for TLB_CONTROL_FLUSH_ASID support and gets a 'false' answer. > > > > First revisions of the patch in question had some support for TLB_CONTROL_FLUSH_ASID, but it was removed: > > https://lore.kernel.org/kvm/f7c2d5f5-3560-8666-90be-3605220cb93c@redhat.com/ > > > > I don't know what would be the best case here, maybe put a quirk there, so it doesn't break "userspace". > > Committer's email is dead, so I'm writing here. > > > > I have to say that I know about this for long time, because some time ago I used to play with VMware player in a > VM on AMD on my spare time, on weekends > (just doing various crazy things with double nesting, running win98 nested, vfio stuff, etc, etc). > > I didn't report it because its a bug in VMWARE - they set a bit in the tlb_control without checking CPUID's FLUSHBYASID > which states that KVM doesn't support setting this bit. I am pretty sure that bit 1 is supposed to be ignored on hardware without FlushByASID, but I'll have to see if I can dig up an old APM to verify that. > Supporting FLUSHBYASID would fix this, and make nesting faster too, > but it is far from a trivial job. > > I hope that I will find time to do this soon. > > Best regards, > Maxim Levitsky > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-05-30 20:10 ` Jim Mattson @ 2023-05-30 20:34 ` Jim Mattson 2023-05-31 10:35 ` Maxim Levitsky 0 siblings, 1 reply; 9+ messages in thread From: Jim Mattson @ 2023-05-30 20:34 UTC (permalink / raw) To: Maxim Levitsky; +Cc: jwarren, Kvm On Tue, May 30, 2023 at 1:10 PM Jim Mattson <jmattson@google.com> wrote: > > On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: > > > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > > > Hello, > > > Since kernel 5.16 users can't start VMware VMs when it is nested under KVM on AMD CPUs. > > > > > > User reports are here: > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583 > > > https://forums.unraid.net/topic/128868-vmware-7x-will-not-start-any-vms-under-unraid-6110/ > > > > > > I've pinpointed it to commit 174a921b6975ef959dd82ee9e8844067a62e3ec1 (appeared in 5.16rc1) > > > "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB" > > > > > > I've confirmed that VMware errors out when it checks for TLB_CONTROL_FLUSH_ASID support and gets a 'false' answer. > > > > > > First revisions of the patch in question had some support for TLB_CONTROL_FLUSH_ASID, but it was removed: > > > https://lore.kernel.org/kvm/f7c2d5f5-3560-8666-90be-3605220cb93c@redhat.com/ > > > > > > I don't know what would be the best case here, maybe put a quirk there, so it doesn't break "userspace". > > > Committer's email is dead, so I'm writing here. > > > > > > > I have to say that I know about this for long time, because some time ago I used to play with VMware player in a > > VM on AMD on my spare time, on weekends > > (just doing various crazy things with double nesting, running win98 nested, vfio stuff, etc, etc). > > > > I didn't report it because its a bug in VMWARE - they set a bit in the tlb_control without checking CPUID's FLUSHBYASID > > which states that KVM doesn't support setting this bit. > > I am pretty sure that bit 1 is supposed to be ignored on hardware > without FlushByASID, but I'll have to see if I can dig up an old APM > to verify that. I couldn't find an APM that old, but even today's APM does not specify that any checks are performed on the TLB_CONTROL field by VMRUN. While Intel likes to fail VM-entry for illegal VMCS state, AMD prefers to massage the VMCB to render any illegal VMCB state legal. For example, rather than fail VM-entry for a non-canonical address, AMD is inclined to drop the high bits and sign-extend the low bits, so that the address is canonical. I'm willing to bet that modern CPUs continue to ignore the TLB_CONTROL bits that were noted "reserved" in version 3.22 of the manual, and that Krish simply manufactured the checks in commit 174a921b6975 ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"), without cause. > > Supporting FLUSHBYASID would fix this, and make nesting faster too, > > but it is far from a trivial job. > > > > I hope that I will find time to do this soon. > > > > Best regards, > > Maxim Levitsky > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-05-30 20:34 ` Jim Mattson @ 2023-05-31 10:35 ` Maxim Levitsky 2023-05-31 14:41 ` Sean Christopherson 0 siblings, 1 reply; 9+ messages in thread From: Maxim Levitsky @ 2023-05-31 10:35 UTC (permalink / raw) To: Jim Mattson; +Cc: jwarren, Kvm У вт, 2023-05-30 у 13:34 -0700, Jim Mattson пише: > On Tue, May 30, 2023 at 1:10 PM Jim Mattson <jmattson@google.com> wrote: > > On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: > > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > > > > Hello, > > > > Since kernel 5.16 users can't start VMware VMs when it is nested under KVM on AMD CPUs. > > > > > > > > User reports are here: > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583 > > > > https://forums.unraid.net/topic/128868-vmware-7x-will-not-start-any-vms-under-unraid-6110/ > > > > > > > > I've pinpointed it to commit 174a921b6975ef959dd82ee9e8844067a62e3ec1 (appeared in 5.16rc1) > > > > "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB" > > > > > > > > I've confirmed that VMware errors out when it checks for TLB_CONTROL_FLUSH_ASID support and gets a 'false' answer. > > > > > > > > First revisions of the patch in question had some support for TLB_CONTROL_FLUSH_ASID, but it was removed: > > > > https://lore.kernel.org/kvm/f7c2d5f5-3560-8666-90be-3605220cb93c@redhat.com/ > > > > > > > > I don't know what would be the best case here, maybe put a quirk there, so it doesn't break "userspace". > > > > Committer's email is dead, so I'm writing here. > > > > > > > > > > I have to say that I know about this for long time, because some time ago I used to play with VMware player in a > > > VM on AMD on my spare time, on weekends > > > (just doing various crazy things with double nesting, running win98 nested, vfio stuff, etc, etc). > > > > > > I didn't report it because its a bug in VMWARE - they set a bit in the tlb_control without checking CPUID's FLUSHBYASID > > > which states that KVM doesn't support setting this bit. > > > > I am pretty sure that bit 1 is supposed to be ignored on hardware > > without FlushByASID, but I'll have to see if I can dig up an old APM > > to verify that. > > I couldn't find an APM that old, but even today's APM does not specify > that any checks are performed on the TLB_CONTROL field by VMRUN. > > While Intel likes to fail VM-entry for illegal VMCS state, AMD prefers > to massage the VMCB to render any illegal VMCB state legal. For > example, rather than fail VM-entry for a non-canonical address, AMD is > inclined to drop the high bits and sign-extend the low bits, so that > the address is canonical. > > I'm willing to bet that modern CPUs continue to ignore the TLB_CONTROL > bits that were noted "reserved" in version 3.22 of the manual, and > that Krish simply manufactured the checks in commit 174a921b6975 > ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"), > without cause. > > > > Supporting FLUSHBYASID would fix this, and make nesting faster too, > > > but it is far from a trivial job. > > > > > > I hope that I will find time to do this soon. > > > > > > Best regards, > > > Maxim Levitsky > > > > > > Yup... After applying this horrible hack to KVM, the VM still boots just fine on bare metal. diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index e7c7379d6ac7b0..2e45c1b747104a 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -170,10 +170,10 @@ struct __attribute__ ((__packed__)) vmcb_control_area { }; -#define TLB_CONTROL_DO_NOTHING 0 -#define TLB_CONTROL_FLUSH_ALL_ASID 1 -#define TLB_CONTROL_FLUSH_ASID 3 -#define TLB_CONTROL_FLUSH_ASID_LOCAL 7 +#define TLB_CONTROL_DO_NOTHING 0xF0 +#define TLB_CONTROL_FLUSH_ALL_ASID 0xF1 +#define TLB_CONTROL_FLUSH_ASID 0xF3 +#define TLB_CONTROL_FLUSH_ASID_LOCAL 0xF7 #define V_TPR_MASK 0x0f Shall we revert the offending patch then? Best regards, Maxim Levitsky ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-05-31 10:35 ` Maxim Levitsky @ 2023-05-31 14:41 ` Sean Christopherson 2023-05-31 15:34 ` Jim Mattson [not found] ` <CALMp9eTti7gSNKgR=h__SsoKynaR1tR2nHhuk_6tse-3FHJ7mw@mail.gmail.com-NWmRa0I----9> 0 siblings, 2 replies; 9+ messages in thread From: Sean Christopherson @ 2023-05-31 14:41 UTC (permalink / raw) To: Maxim Levitsky; +Cc: Jim Mattson, jwarren, Kvm On Wed, May 31, 2023, Maxim Levitsky wrote: > У вт, 2023-05-30 у 13:34 -0700, Jim Mattson пише: > > On Tue, May 30, 2023 at 1:10 PM Jim Mattson <jmattson@google.com> wrote: > > > On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: > > > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > > > > > I don't know what would be the best case here, maybe put a quirk > > > > > there, so it doesn't break "userspace". Committer's email is dead, > > > > > so I'm writing here. > > > > > > > > > > > > > I have to say that I know about this for long time, because some time > > > > ago I used to play with VMware player in a VM on AMD on my spare time, > > > > on weekends (just doing various crazy things with double nesting, > > > > running win98 nested, vfio stuff, etc, etc). > > > > > > > > I didn't report it because its a bug in VMWARE - they set a bit in the > > > > tlb_control without checking CPUID's FLUSHBYASID which states that KVM > > > > doesn't support setting this bit. > > > > > > I am pretty sure that bit 1 is supposed to be ignored on hardware > > > without FlushByASID, but I'll have to see if I can dig up an old APM > > > to verify that. > > > > I couldn't find an APM that old, but even today's APM does not specify > > that any checks are performed on the TLB_CONTROL field by VMRUN. > > > > While Intel likes to fail VM-entry for illegal VMCS state, AMD prefers > > to massage the VMCB to render any illegal VMCB state legal. For > > example, rather than fail VM-entry for a non-canonical address, AMD is > > inclined to drop the high bits and sign-extend the low bits, so that > > the address is canonical. > > > > I'm willing to bet that modern CPUs continue to ignore the TLB_CONTROL > > bits that were noted "reserved" in version 3.22 of the manual, and > > that Krish simply manufactured the checks in commit 174a921b6975 > > ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"), > > without cause. Ya. The APM even provides a definition of "reserved" that explicitly calls out the different reserved qualifiers. The only fields/values that KVM can actively enforce are things tagged MBZ. reserved Fields marked as reserved may be used at some future time. To preserve compatibility with future processors, reserved fields require special handling when read or written by software. Software must not depend on the state of a reserved field (unless qualified as RAZ), nor upon the ability of such fields to return a previously written state. If a field is marked reserved without qualification, software must not change the state of that field; it must reload that field with the same value returned from a prior read. Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions). > > > > Supporting FLUSHBYASID would fix this, and make nesting faster too, > > > > but it is far from a trivial job. > > > > > > > > I hope that I will find time to do this soon. ... > Shall we revert the offending patch then? Yes please. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-05-31 14:41 ` Sean Christopherson @ 2023-05-31 15:34 ` Jim Mattson [not found] ` <CALMp9eTti7gSNKgR=h__SsoKynaR1tR2nHhuk_6tse-3FHJ7mw@mail.gmail.com-NWmRa0I----9> 1 sibling, 0 replies; 9+ messages in thread From: Jim Mattson @ 2023-05-31 15:34 UTC (permalink / raw) To: Sean Christopherson; +Cc: Maxim Levitsky, jwarren, Kvm On Wed, May 31, 2023 at 7:41 AM Sean Christopherson <seanjc@google.com> wrote: > > On Wed, May 31, 2023, Maxim Levitsky wrote: > > У вт, 2023-05-30 у 13:34 -0700, Jim Mattson пише: > > > On Tue, May 30, 2023 at 1:10 PM Jim Mattson <jmattson@google.com> wrote: > > > > On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: > > > > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > > > > > > I don't know what would be the best case here, maybe put a quirk > > > > > > there, so it doesn't break "userspace". Committer's email is dead, > > > > > > so I'm writing here. > > > > > > > > > > > > > > > > I have to say that I know about this for long time, because some time > > > > > ago I used to play with VMware player in a VM on AMD on my spare time, > > > > > on weekends (just doing various crazy things with double nesting, > > > > > running win98 nested, vfio stuff, etc, etc). > > > > > > > > > > I didn't report it because its a bug in VMWARE - they set a bit in the > > > > > tlb_control without checking CPUID's FLUSHBYASID which states that KVM > > > > > doesn't support setting this bit. > > > > > > > > I am pretty sure that bit 1 is supposed to be ignored on hardware > > > > without FlushByASID, but I'll have to see if I can dig up an old APM > > > > to verify that. > > > > > > I couldn't find an APM that old, but even today's APM does not specify > > > that any checks are performed on the TLB_CONTROL field by VMRUN. > > > > > > While Intel likes to fail VM-entry for illegal VMCS state, AMD prefers > > > to massage the VMCB to render any illegal VMCB state legal. For > > > example, rather than fail VM-entry for a non-canonical address, AMD is > > > inclined to drop the high bits and sign-extend the low bits, so that > > > the address is canonical. > > > > > > I'm willing to bet that modern CPUs continue to ignore the TLB_CONTROL > > > bits that were noted "reserved" in version 3.22 of the manual, and > > > that Krish simply manufactured the checks in commit 174a921b6975 > > > ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"), > > > without cause. > > Ya. The APM even provides a definition of "reserved" that explicitly calls out > the different reserved qualifiers. The only fields/values that KVM can actively > enforce are things tagged MBZ. > > reserved > Fields marked as reserved may be used at some future time. > To preserve compatibility with future processors, reserved fields require special handling when > read or written by software. Software must not depend on the state of a reserved field (unless > qualified as RAZ), nor upon the ability of such fields to return a previously written state. > If a field is marked reserved without qualification, software must not change the state of that field; > it must reload that field with the same value returned from a prior read. > Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions). > > > > > > Supporting FLUSHBYASID would fix this, and make nesting faster too, > > > > > but it is far from a trivial job. > > > > > > > > > > I hope that I will find time to do this soon. > > ... > > > Shall we revert the offending patch then? > > Yes please. It's not quite that simple. The vmcb12 TLB_CONTROL field needs to be sanitized on its way into the vmcb02 (perhaps in nested_copy_vmcb_control_to_cache()?). Bits 63:2 should be cleared. Also, if the guest CPUID does not advertise support for FlushByASID, then bit 1 should be cleared. Note that the vmcb12 TLB_CONTROL field itself must not be modified, since the APM specifically states, "The VMRUN instruction reads, but does not change, the value of the TLB_CONTROL field." ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CALMp9eTti7gSNKgR=h__SsoKynaR1tR2nHhuk_6tse-3FHJ7mw@mail.gmail.com-NWmRa0I----9>]
* Re: [Bug] AMD nested: commit broke VMware [not found] ` <CALMp9eTti7gSNKgR=h__SsoKynaR1tR2nHhuk_6tse-3FHJ7mw@mail.gmail.com-NWmRa0I----9> @ 2023-08-01 8:49 ` jwarren 2023-08-01 10:04 ` Maxim Levitsky 0 siblings, 1 reply; 9+ messages in thread From: jwarren @ 2023-08-01 8:49 UTC (permalink / raw) To: Jim Mattson; +Cc: Sean Christopherson, Maxim Levitsky, Kvm Hi Are there any thoughts on this? PS As I see it, that commit didn't fix any real problem (up to 5.15 nothing was broken that required it), but did break nested VMWare Workstation/ESXi for people that use it (yes, it's vmware's code bug, but doubt they will fix it). May 31, 2023, 15:34 by jmattson@google.com: > On Wed, May 31, 2023 at 7:41 AM Sean Christopherson <seanjc@google.com> wrote: > >> >> On Wed, May 31, 2023, Maxim Levitsky wrote: >> > У вт, 2023-05-30 у 13:34 -0700, Jim Mattson пише: >> > > On Tue, May 30, 2023 at 1:10 PM Jim Mattson <jmattson@google.com> wrote: >> > > > On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: >> > > > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: >> > > > > > I don't know what would be the best case here, maybe put a quirk >> > > > > > there, so it doesn't break "userspace". Committer's email is dead, >> > > > > > so I'm writing here. >> > > > > > >> > > > > >> > > > > I have to say that I know about this for long time, because some time >> > > > > ago I used to play with VMware player in a VM on AMD on my spare time, >> > > > > on weekends (just doing various crazy things with double nesting, >> > > > > running win98 nested, vfio stuff, etc, etc). >> > > > > >> > > > > I didn't report it because its a bug in VMWARE - they set a bit in the >> > > > > tlb_control without checking CPUID's FLUSHBYASID which states that KVM >> > > > > doesn't support setting this bit. >> > > > >> > > > I am pretty sure that bit 1 is supposed to be ignored on hardware >> > > > without FlushByASID, but I'll have to see if I can dig up an old APM >> > > > to verify that. >> > > >> > > I couldn't find an APM that old, but even today's APM does not specify >> > > that any checks are performed on the TLB_CONTROL field by VMRUN. >> > > >> > > While Intel likes to fail VM-entry for illegal VMCS state, AMD prefers >> > > to massage the VMCB to render any illegal VMCB state legal. For >> > > example, rather than fail VM-entry for a non-canonical address, AMD is >> > > inclined to drop the high bits and sign-extend the low bits, so that >> > > the address is canonical. >> > > >> > > I'm willing to bet that modern CPUs continue to ignore the TLB_CONTROL >> > > bits that were noted "reserved" in version 3.22 of the manual, and >> > > that Krish simply manufactured the checks in commit 174a921b6975 >> > > ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"), >> > > without cause. >> >> Ya. The APM even provides a definition of "reserved" that explicitly calls out >> the different reserved qualifiers. The only fields/values that KVM can actively >> enforce are things tagged MBZ. >> >> reserved >> Fields marked as reserved may be used at some future time. >> To preserve compatibility with future processors, reserved fields require special handling when >> read or written by software. Software must not depend on the state of a reserved field (unless >> qualified as RAZ), nor upon the ability of such fields to return a previously written state. >> If a field is marked reserved without qualification, software must not change the state of that field; >> it must reload that field with the same value returned from a prior read. >> Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions). >> >> > > > > Supporting FLUSHBYASID would fix this, and make nesting faster too, >> > > > > but it is far from a trivial job. >> > > > > >> > > > > I hope that I will find time to do this soon. >> >> ... >> >> > Shall we revert the offending patch then? >> >> Yes please. >> > > It's not quite that simple. > > The vmcb12 TLB_CONTROL field needs to be sanitized on its way into the > vmcb02 (perhaps in nested_copy_vmcb_control_to_cache()?). Bits 63:2 > should be cleared. Also, if the guest CPUID does not advertise support > for FlushByASID, then bit 1 should be cleared. Note that the vmcb12 > TLB_CONTROL field itself must not be modified, since the APM > specifically states, "The VMRUN instruction reads, but does not > change, the value of the TLB_CONTROL field." > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug] AMD nested: commit broke VMware 2023-08-01 8:49 ` jwarren @ 2023-08-01 10:04 ` Maxim Levitsky 0 siblings, 0 replies; 9+ messages in thread From: Maxim Levitsky @ 2023-08-01 10:04 UTC (permalink / raw) To: jwarren, Jim Mattson; +Cc: Sean Christopherson, Kvm У вт, 2023-08-01 у 10:49 +0200, jwarren@tutanota.com пише: > Hi > Are there any thoughts on this? > > PS As I see it, that commit didn't fix any real problem (up to 5.15 nothing was broken that required it), but did break nested VMWare Workstation/ESXi for people that use it (yes, it's vmware's code bug, but doubt they will fix it). > > > May 31, 2023, 15:34 by jmattson@google.com: > > > On Wed, May 31, 2023 at 7:41 AM Sean Christopherson <seanjc@google.com> wrote: > > > > > On Wed, May 31, 2023, Maxim Levitsky wrote: > > > > У вт, 2023-05-30 у 13:34 -0700, Jim Mattson пише: > > > > > On Tue, May 30, 2023 at 1:10 PM Jim Mattson <jmattson@google.com> wrote: > > > > > > On Mon, May 29, 2023 at 6:44 AM Maxim Levitsky <mlevitsk@redhat.com> wrote: > > > > > > > У пн, 2023-05-29 у 14:58 +0200, jwarren@tutanota.com пише: > > > > > > > > I don't know what would be the best case here, maybe put a quirk > > > > > > > > there, so it doesn't break "userspace". Committer's email is dead, > > > > > > > > so I'm writing here. > > > > > > > > > > > > > > > > > > > > > > I have to say that I know about this for long time, because some time > > > > > > > ago I used to play with VMware player in a VM on AMD on my spare time, > > > > > > > on weekends (just doing various crazy things with double nesting, > > > > > > > running win98 nested, vfio stuff, etc, etc). > > > > > > > > > > > > > > I didn't report it because its a bug in VMWARE - they set a bit in the > > > > > > > tlb_control without checking CPUID's FLUSHBYASID which states that KVM > > > > > > > doesn't support setting this bit. > > > > > > > > > > > > I am pretty sure that bit 1 is supposed to be ignored on hardware > > > > > > without FlushByASID, but I'll have to see if I can dig up an old APM > > > > > > to verify that. > > > > > > > > > > I couldn't find an APM that old, but even today's APM does not specify > > > > > that any checks are performed on the TLB_CONTROL field by VMRUN. > > > > > > > > > > While Intel likes to fail VM-entry for illegal VMCS state, AMD prefers > > > > > to massage the VMCB to render any illegal VMCB state legal. For > > > > > example, rather than fail VM-entry for a non-canonical address, AMD is > > > > > inclined to drop the high bits and sign-extend the low bits, so that > > > > > the address is canonical. > > > > > > > > > > I'm willing to bet that modern CPUs continue to ignore the TLB_CONTROL > > > > > bits that were noted "reserved" in version 3.22 of the manual, and > > > > > that Krish simply manufactured the checks in commit 174a921b6975 > > > > > ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"), > > > > > without cause. > > > > > > Ya. The APM even provides a definition of "reserved" that explicitly calls out > > > the different reserved qualifiers. The only fields/values that KVM can actively > > > enforce are things tagged MBZ. > > > > > > reserved > > > Fields marked as reserved may be used at some future time. > > > To preserve compatibility with future processors, reserved fields require special handling when > > > read or written by software. Software must not depend on the state of a reserved field (unless > > > qualified as RAZ), nor upon the ability of such fields to return a previously written state. > > > If a field is marked reserved without qualification, software must not change the state of that field; > > > it must reload that field with the same value returned from a prior read. > > > Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions). > > > > > > > > > > Supporting FLUSHBYASID would fix this, and make nesting faster too, > > > > > > > but it is far from a trivial job. > > > > > > > > > > > > > > I hope that I will find time to do this soon. > > > > > > ... > > > > > > > Shall we revert the offending patch then? > > > > > > Yes please. > > > > > > > It's not quite that simple. > > > > The vmcb12 TLB_CONTROL field needs to be sanitized on its way into the > > vmcb02 (perhaps in nested_copy_vmcb_control_to_cache()?). Bits 63:2 > > should be cleared. Also, if the guest CPUID does not advertise support > > for FlushByASID, then bit 1 should be cleared. Note that the vmcb12 > > TLB_CONTROL field itself must not be modified, since the APM > > specifically states, "The VMRUN instruction reads, but does not > > change, the value of the TLB_CONTROL field." > > I will prepare a patch for this hopefully this week. Jim mentioned that a revert is not full solution for this. Best regards, Maxim Levitsky ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-08-01 10:05 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-29 12:58 [Bug] AMD nested: commit broke VMware jwarren
2023-05-29 13:43 ` Maxim Levitsky
2023-05-30 20:10 ` Jim Mattson
2023-05-30 20:34 ` Jim Mattson
2023-05-31 10:35 ` Maxim Levitsky
2023-05-31 14:41 ` Sean Christopherson
2023-05-31 15:34 ` Jim Mattson
[not found] ` <CALMp9eTti7gSNKgR=h__SsoKynaR1tR2nHhuk_6tse-3FHJ7mw@mail.gmail.com-NWmRa0I----9>
2023-08-01 8:49 ` jwarren
2023-08-01 10:04 ` Maxim Levitsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox