* [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode
@ 2013-02-28 9:44 Jan Kiszka
2013-03-04 13:22 ` Gleb Natapov
0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2013-02-28 9:44 UTC (permalink / raw)
To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm, Nadav Har'El, Nakajima, Jun
The logic for calculating the value with which we call kvm_set_cr0/4 was
broken (will definitely be visible with nested unrestricted guest mode
support). Also, we performed the check regarding CR0_ALWAYSON too early
when in guest mode.
What really needs to be done on both CR0 and CR4 is to mask out L1-owned
bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
are not suited as input.
For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
refuse the update if it fails. To be fully consistent, we implement this
check now also for CR4.
Finally, we have to set the shadow to the value L2 wanted to write
originally.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
Found while making unrestricted guest mode working. Not sure what impact
the bugs had on current feature level, if any.
For interested folks, I've pushed my nEPT environment here:
git://git.kiszka.org/linux-kvm.git nept-hacking
arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++-------------------
1 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7cc566b..d1dac08 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
/* called to set cr0 as appropriate for a mov-to-cr0 exit. */
static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
{
- if (to_vmx(vcpu)->nested.vmxon &&
- ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
- return 1;
-
if (is_guest_mode(vcpu)) {
- /*
- * We get here when L2 changed cr0 in a way that did not change
- * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
- * but did change L0 shadowed bits. This can currently happen
- * with the TS bit: L0 may want to leave TS on (for lazy fpu
- * loading) while pretending to allow the guest to change it.
- */
- if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
- (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ unsigned long orig_val = val;
+
+ val = (val & ~vmcs12->cr0_guest_host_mask) |
+ (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);
+ if ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)
+ return 1;
+
+ if (kvm_set_cr0(vcpu, val))
return 1;
- vmcs_writel(CR0_READ_SHADOW, val);
+ vmcs_writel(CR0_READ_SHADOW, orig_val);
return 0;
- } else
+ } else {
+ if (to_vmx(vcpu)->nested.vmxon &&
+ ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
+ return 1;
return kvm_set_cr0(vcpu, val);
+ }
}
static int handle_set_cr4(struct kvm_vcpu *vcpu, unsigned long val)
{
if (is_guest_mode(vcpu)) {
- if (kvm_set_cr4(vcpu, (val & vcpu->arch.cr4_guest_owned_bits) |
- (vcpu->arch.cr4 & ~vcpu->arch.cr4_guest_owned_bits)))
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ unsigned long orig_val = val;
+
+ val = (val & ~vmcs12->cr4_guest_host_mask) |
+ (vmcs_readl(GUEST_CR4) & vmcs12->cr4_guest_host_mask);
+ if ((val & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)
+ return 1;
+
+ if (kvm_set_cr4(vcpu, val))
return 1;
- vmcs_writel(CR4_READ_SHADOW, val);
+ vmcs_writel(CR4_READ_SHADOW, orig_val);
return 0;
- } else
+ } else {
+ if (to_vmx(vcpu)->nested.vmxon &&
+ ((val & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON))
+ return 1;
return kvm_set_cr4(vcpu, val);
+ }
}
/* called to set cr0 as approriate for clts instruction exit. */
--
1.7.3.4
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-02-28 9:44 [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode Jan Kiszka @ 2013-03-04 13:22 ` Gleb Natapov 2013-03-04 14:09 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 13:22 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > The logic for calculating the value with which we call kvm_set_cr0/4 was > broken (will definitely be visible with nested unrestricted guest mode > support). Also, we performed the check regarding CR0_ALWAYSON too early > when in guest mode. > > What really needs to be done on both CR0 and CR4 is to mask out L1-owned > bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > are not suited as input. > > For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > refuse the update if it fails. To be fully consistent, we implement this > check now also for CR4. > > Finally, we have to set the shadow to the value L2 wanted to write > originally. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > --- > > Found while making unrestricted guest mode working. Not sure what impact > the bugs had on current feature level, if any. > > For interested folks, I've pushed my nEPT environment here: > > git://git.kiszka.org/linux-kvm.git nept-hacking > > arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > 1 files changed, 30 insertions(+), 19 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 7cc566b..d1dac08 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > { > - if (to_vmx(vcpu)->nested.vmxon && > - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > - return 1; > - > if (is_guest_mode(vcpu)) { > - /* > - * We get here when L2 changed cr0 in a way that did not change > - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > - * but did change L0 shadowed bits. This can currently happen > - * with the TS bit: L0 may want to leave TS on (for lazy fpu > - * loading) while pretending to allow the guest to change it. > - */ Can't say I understand this patch yet, but it looks like the comment is still valid. Why have you removed it? > - if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) | > - (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits))) > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > + unsigned long orig_val = val; > + > + val = (val & ~vmcs12->cr0_guest_host_mask) | > + (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask); > + if ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) > + return 1; > + > + if (kvm_set_cr0(vcpu, val)) > return 1; > - vmcs_writel(CR0_READ_SHADOW, val); > + vmcs_writel(CR0_READ_SHADOW, orig_val); > return 0; > - } else > + } else { > + if (to_vmx(vcpu)->nested.vmxon && > + ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > + return 1; > return kvm_set_cr0(vcpu, val); > + } > } > > static int handle_set_cr4(struct kvm_vcpu *vcpu, unsigned long val) > { > if (is_guest_mode(vcpu)) { > - if (kvm_set_cr4(vcpu, (val & vcpu->arch.cr4_guest_owned_bits) | > - (vcpu->arch.cr4 & ~vcpu->arch.cr4_guest_owned_bits))) > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > + unsigned long orig_val = val; > + > + val = (val & ~vmcs12->cr4_guest_host_mask) | > + (vmcs_readl(GUEST_CR4) & vmcs12->cr4_guest_host_mask); > + if ((val & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON) > + return 1; > + > + if (kvm_set_cr4(vcpu, val)) > return 1; > - vmcs_writel(CR4_READ_SHADOW, val); > + vmcs_writel(CR4_READ_SHADOW, orig_val); > return 0; > - } else > + } else { > + if (to_vmx(vcpu)->nested.vmxon && > + ((val & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) > + return 1; > return kvm_set_cr4(vcpu, val); > + } > } > > /* called to set cr0 as approriate for clts instruction exit. */ > -- > 1.7.3.4 -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 13:22 ` Gleb Natapov @ 2013-03-04 14:09 ` Jan Kiszka 2013-03-04 14:15 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 14:09 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On 2013-03-04 14:22, Gleb Natapov wrote: > On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: >> The logic for calculating the value with which we call kvm_set_cr0/4 was >> broken (will definitely be visible with nested unrestricted guest mode >> support). Also, we performed the check regarding CR0_ALWAYSON too early >> when in guest mode. >> >> What really needs to be done on both CR0 and CR4 is to mask out L1-owned >> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and >> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, >> are not suited as input. >> >> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and >> refuse the update if it fails. To be fully consistent, we implement this >> check now also for CR4. >> >> Finally, we have to set the shadow to the value L2 wanted to write >> originally. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >> --- >> >> Found while making unrestricted guest mode working. Not sure what impact >> the bugs had on current feature level, if any. >> >> For interested folks, I've pushed my nEPT environment here: >> >> git://git.kiszka.org/linux-kvm.git nept-hacking >> >> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- >> 1 files changed, 30 insertions(+), 19 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 7cc566b..d1dac08 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) >> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ >> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) >> { >> - if (to_vmx(vcpu)->nested.vmxon && >> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) >> - return 1; >> - >> if (is_guest_mode(vcpu)) { >> - /* >> - * We get here when L2 changed cr0 in a way that did not change >> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >> - * but did change L0 shadowed bits. This can currently happen >> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >> - * loading) while pretending to allow the guest to change it. >> - */ > Can't say I understand this patch yet, but it looks like the comment is > still valid. Why have you removed it? L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think the comment was always misleading. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 14:09 ` Jan Kiszka @ 2013-03-04 14:15 ` Gleb Natapov 2013-03-04 14:25 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 14:15 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: > On 2013-03-04 14:22, Gleb Natapov wrote: > > On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > >> The logic for calculating the value with which we call kvm_set_cr0/4 was > >> broken (will definitely be visible with nested unrestricted guest mode > >> support). Also, we performed the check regarding CR0_ALWAYSON too early > >> when in guest mode. > >> > >> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >> are not suited as input. > >> > >> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >> refuse the update if it fails. To be fully consistent, we implement this > >> check now also for CR4. > >> > >> Finally, we have to set the shadow to the value L2 wanted to write > >> originally. > >> > >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >> --- > >> > >> Found while making unrestricted guest mode working. Not sure what impact > >> the bugs had on current feature level, if any. > >> > >> For interested folks, I've pushed my nEPT environment here: > >> > >> git://git.kiszka.org/linux-kvm.git nept-hacking > >> > >> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > >> 1 files changed, 30 insertions(+), 19 deletions(-) > >> > >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >> index 7cc566b..d1dac08 100644 > >> --- a/arch/x86/kvm/vmx.c > >> +++ b/arch/x86/kvm/vmx.c > >> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >> { > >> - if (to_vmx(vcpu)->nested.vmxon && > >> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >> - return 1; > >> - > >> if (is_guest_mode(vcpu)) { > >> - /* > >> - * We get here when L2 changed cr0 in a way that did not change > >> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >> - * but did change L0 shadowed bits. This can currently happen > >> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >> - * loading) while pretending to allow the guest to change it. > >> - */ > > Can't say I understand this patch yet, but it looks like the comment is > > still valid. Why have you removed it? > > L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > the comment was always misleading. > I do not see how it is misleading. For everything but TS we will not get here (if L1 is kvm). For TS we will get here if L1 allows L2 to change it, but L0 does not. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 14:15 ` Gleb Natapov @ 2013-03-04 14:25 ` Jan Kiszka 2013-03-04 15:30 ` Nadav Har'El 2013-03-04 17:56 ` Gleb Natapov 0 siblings, 2 replies; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 14:25 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On 2013-03-04 15:15, Gleb Natapov wrote: > On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: >> On 2013-03-04 14:22, Gleb Natapov wrote: >>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: >>>> The logic for calculating the value with which we call kvm_set_cr0/4 was >>>> broken (will definitely be visible with nested unrestricted guest mode >>>> support). Also, we performed the check regarding CR0_ALWAYSON too early >>>> when in guest mode. >>>> >>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned >>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and >>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, >>>> are not suited as input. >>>> >>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and >>>> refuse the update if it fails. To be fully consistent, we implement this >>>> check now also for CR4. >>>> >>>> Finally, we have to set the shadow to the value L2 wanted to write >>>> originally. >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>> --- >>>> >>>> Found while making unrestricted guest mode working. Not sure what impact >>>> the bugs had on current feature level, if any. >>>> >>>> For interested folks, I've pushed my nEPT environment here: >>>> >>>> git://git.kiszka.org/linux-kvm.git nept-hacking >>>> >>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- >>>> 1 files changed, 30 insertions(+), 19 deletions(-) >>>> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>> index 7cc566b..d1dac08 100644 >>>> --- a/arch/x86/kvm/vmx.c >>>> +++ b/arch/x86/kvm/vmx.c >>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) >>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ >>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) >>>> { >>>> - if (to_vmx(vcpu)->nested.vmxon && >>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) >>>> - return 1; >>>> - >>>> if (is_guest_mode(vcpu)) { >>>> - /* >>>> - * We get here when L2 changed cr0 in a way that did not change >>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >>>> - * but did change L0 shadowed bits. This can currently happen >>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >>>> - * loading) while pretending to allow the guest to change it. >>>> - */ >>> Can't say I understand this patch yet, but it looks like the comment is >>> still valid. Why have you removed it? >> >> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think >> the comment was always misleading. >> > I do not see how it is misleading. For everything but TS we will not get > here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > it, but L0 does not. For everything *but guest-owned* we will get here, thus for most CR0 accesses (bit-wise, not regarding frequency). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 14:25 ` Jan Kiszka @ 2013-03-04 15:30 ` Nadav Har'El 2013-03-04 16:01 ` Jan Kiszka 2013-03-04 17:56 ` Gleb Natapov 1 sibling, 1 reply; 20+ messages in thread From: Nadav Har'El @ 2013-03-04 15:30 UTC (permalink / raw) To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm, Nakajima, Jun On Mon, Mar 04, 2013, Jan Kiszka wrote about "Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode": > >>>> if (is_guest_mode(vcpu)) { > >>>> - /* > >>>> - * We get here when L2 changed cr0 in a way that did not change > >>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>> - * but did change L0 shadowed bits. This can currently happen > >>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>> - * loading) while pretending to allow the guest to change it. > >>>> - */ > >>> Can't say I understand this patch yet, but it looks like the comment is > >>> still valid. Why have you removed it? > >> > >> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > >> the comment was always misleading. > >> > > I do not see how it is misleading. For everything but TS we will not get > > here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > > it, but L0 does not. > > For everything *but guest-owned* we will get here, thus for most CR0 > accesses (bit-wise, not regarding frequency). For most CR0 bits, L1 (at least, a KVM one) will shadow (trap) them, so we won't get to this point you modified at all... Instead, nested_vmx_exit_handled_cr() would notice that a shadowed-by-L1 bit was modified so an exit to L1 is required. We only get to that code you changed if a bit was modified that L1 did *not* want to trap, but L0 did. This is definitely not the bit-wise majority of the cases - unless you have an L1 that does not trap most of the CR0 bits. But I'm more worried about the actual code change :-) I didn't understand if there's a situation where the existing code did something wrong, or why it was wrong. Did you check the lazy-FPU-loading (TS bit) aspect of your new code? To effectively check this, what I had to do is to run on all of L0, L1, and L2, long runs of parallel "make" (make -j3) - concurrently. Even code which doesn't do floating-point calculations uses the FPU sometimes for its wide registers, so all these processes, guests and guest's guests, compete for the FPU, exercising very well this code path. If the TS bit is handled wrongly, some of these make processes will die, when one of the compilations dies of SIGSEGV (forgetting to set the FPU registers leads to some uninitialized pointers being used), so it's quite easy to exercise this. -- Nadav Har'El | Monday, Mar 4 2013, 22 Adar 5773 nyh@math.technion.ac.il |----------------------------------------- Phone +972-523-790466, ICQ 13349191 |"A witty saying proves nothing." -- http://nadav.harel.org.il |Voltaire ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 15:30 ` Nadav Har'El @ 2013-03-04 16:01 ` Jan Kiszka 0 siblings, 0 replies; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 16:01 UTC (permalink / raw) To: Nadav Har'El; +Cc: Gleb Natapov, Marcelo Tosatti, kvm, Nakajima, Jun On 2013-03-04 16:30, Nadav Har'El wrote: > On Mon, Mar 04, 2013, Jan Kiszka wrote about "Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode": >>>>>> if (is_guest_mode(vcpu)) { >>>>>> - /* >>>>>> - * We get here when L2 changed cr0 in a way that did not change >>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >>>>>> - * but did change L0 shadowed bits. This can currently happen >>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >>>>>> - * loading) while pretending to allow the guest to change it. >>>>>> - */ >>>>> Can't say I understand this patch yet, but it looks like the comment is >>>>> still valid. Why have you removed it? >>>> >>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think >>>> the comment was always misleading. >>>> >>> I do not see how it is misleading. For everything but TS we will not get >>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change >>> it, but L0 does not. >> >> For everything *but guest-owned* we will get here, thus for most CR0 >> accesses (bit-wise, not regarding frequency). > > For most CR0 bits, L1 (at least, a KVM one) will shadow (trap) them, so > we won't get to this point you modified at all... Instead, > nested_vmx_exit_handled_cr() would notice that a shadowed-by-L1 bit > was modified so an exit to L1 is required. We only get to that code > you changed if a bit was modified that L1 did *not* want to trap, but L0 did. > This is definitely not the bit-wise majority of the cases - unless you > have an L1 that does not trap most of the CR0 bits. > > But I'm more worried about the actual code change :-) I didn't > understand if there's a situation where the existing code did something > wrong, or why it was wrong. Did you check the lazy-FPU-loading (TS bit) > aspect of your new code? To effectively check this, what I had to do > is to run on all of L0, L1, and L2, long runs of parallel "make" (make -j3) - > concurrently. Even code which doesn't do floating-point calculations uses > the FPU sometimes for its wide registers, so all these processes, guests > and guest's guests, compete for the FPU, exercising very well this code > path. If the TS bit is handled wrongly, some of these make processes > will die, when one of the compilations dies of SIGSEGV (forgetting to > set the FPU registers leads to some uninitialized pointers being used), > so it's quite easy to exercise this. I'm not focusing on hosting KVM but arbitrary guests. So I looked at it generically, defining what bits should L1 effectively hand over to L0, like real hardware would do when setting CR0/4. And that value was wrongly calculated, breaking in practice when you allow unrestricted guest mode, ie. when L2 starts playing with PE and PG - just to name one prominent scenario. By focusing on TS for KVM-on-KVM without unrestricted guest mode, you weren't able to trigger this. But if you can tell me, where I may pass wrong values to kvm_set_cr0/4, I'm all ears. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 14:25 ` Jan Kiszka 2013-03-04 15:30 ` Nadav Har'El @ 2013-03-04 17:56 ` Gleb Natapov 2013-03-04 18:08 ` Jan Kiszka 1 sibling, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 17:56 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: > On 2013-03-04 15:15, Gleb Natapov wrote: > > On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: > >> On 2013-03-04 14:22, Gleb Natapov wrote: > >>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > >>>> The logic for calculating the value with which we call kvm_set_cr0/4 was > >>>> broken (will definitely be visible with nested unrestricted guest mode > >>>> support). Also, we performed the check regarding CR0_ALWAYSON too early > >>>> when in guest mode. > >>>> > >>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >>>> are not suited as input. > >>>> > >>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >>>> refuse the update if it fails. To be fully consistent, we implement this > >>>> check now also for CR4. > >>>> > >>>> Finally, we have to set the shadow to the value L2 wanted to write > >>>> originally. > >>>> > >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>> --- > >>>> > >>>> Found while making unrestricted guest mode working. Not sure what impact > >>>> the bugs had on current feature level, if any. > >>>> > >>>> For interested folks, I've pushed my nEPT environment here: > >>>> > >>>> git://git.kiszka.org/linux-kvm.git nept-hacking > >>>> > >>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > >>>> 1 files changed, 30 insertions(+), 19 deletions(-) > >>>> > >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>> index 7cc566b..d1dac08 100644 > >>>> --- a/arch/x86/kvm/vmx.c > >>>> +++ b/arch/x86/kvm/vmx.c > >>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >>>> { > >>>> - if (to_vmx(vcpu)->nested.vmxon && > >>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >>>> - return 1; > >>>> - > >>>> if (is_guest_mode(vcpu)) { > >>>> - /* > >>>> - * We get here when L2 changed cr0 in a way that did not change > >>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>> - * but did change L0 shadowed bits. This can currently happen > >>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>> - * loading) while pretending to allow the guest to change it. > >>>> - */ > >>> Can't say I understand this patch yet, but it looks like the comment is > >>> still valid. Why have you removed it? > >> > >> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > >> the comment was always misleading. > >> > > I do not see how it is misleading. For everything but TS we will not get > > here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > > it, but L0 does not. > > For everything *but guest-owned* we will get here, thus for most CR0 > accesses (bit-wise, not regarding frequency). > I do not see how. If bit is trapped by L1 we will not get here. We will do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. I am not arguing about you code (didn't grok it yet), but the comment still make sense to me. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 17:56 ` Gleb Natapov @ 2013-03-04 18:08 ` Jan Kiszka 2013-03-04 18:39 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 18:08 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On 2013-03-04 18:56, Gleb Natapov wrote: > On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: >> On 2013-03-04 15:15, Gleb Natapov wrote: >>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: >>>> On 2013-03-04 14:22, Gleb Natapov wrote: >>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: >>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was >>>>>> broken (will definitely be visible with nested unrestricted guest mode >>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early >>>>>> when in guest mode. >>>>>> >>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned >>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and >>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, >>>>>> are not suited as input. >>>>>> >>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and >>>>>> refuse the update if it fails. To be fully consistent, we implement this >>>>>> check now also for CR4. >>>>>> >>>>>> Finally, we have to set the shadow to the value L2 wanted to write >>>>>> originally. >>>>>> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>> --- >>>>>> >>>>>> Found while making unrestricted guest mode working. Not sure what impact >>>>>> the bugs had on current feature level, if any. >>>>>> >>>>>> For interested folks, I've pushed my nEPT environment here: >>>>>> >>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking >>>>>> >>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- >>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) >>>>>> >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>>> index 7cc566b..d1dac08 100644 >>>>>> --- a/arch/x86/kvm/vmx.c >>>>>> +++ b/arch/x86/kvm/vmx.c >>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) >>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ >>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) >>>>>> { >>>>>> - if (to_vmx(vcpu)->nested.vmxon && >>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) >>>>>> - return 1; >>>>>> - >>>>>> if (is_guest_mode(vcpu)) { >>>>>> - /* >>>>>> - * We get here when L2 changed cr0 in a way that did not change >>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >>>>>> - * but did change L0 shadowed bits. This can currently happen >>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >>>>>> - * loading) while pretending to allow the guest to change it. >>>>>> - */ >>>>> Can't say I understand this patch yet, but it looks like the comment is >>>>> still valid. Why have you removed it? >>>> >>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think >>>> the comment was always misleading. >>>> >>> I do not see how it is misleading. For everything but TS we will not get >>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change >>> it, but L0 does not. >> >> For everything *but guest-owned* we will get here, thus for most CR0 >> accesses (bit-wise, not regarding frequency). >> > I do not see how. If bit is trapped by L1 we will not get here. We will > do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. > I am not arguing about you code (didn't grok it yet), but the comment > still make sense to me. "We get here when L2 changed cr0 in a way that did not change any of L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 shadowed bits." That I can sign. But the rest about TS is just misleading as we trap _every_ change in L0 - except for TS under certain conditions. The old code was tested against TS only, that's what the comment witness. If you prefer, I'll leave part one in. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 18:08 ` Jan Kiszka @ 2013-03-04 18:39 ` Gleb Natapov 2013-03-04 19:23 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 18:39 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: > On 2013-03-04 18:56, Gleb Natapov wrote: > > On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: > >> On 2013-03-04 15:15, Gleb Natapov wrote: > >>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: > >>>> On 2013-03-04 14:22, Gleb Natapov wrote: > >>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > >>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was > >>>>>> broken (will definitely be visible with nested unrestricted guest mode > >>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early > >>>>>> when in guest mode. > >>>>>> > >>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >>>>>> are not suited as input. > >>>>>> > >>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >>>>>> refuse the update if it fails. To be fully consistent, we implement this > >>>>>> check now also for CR4. > >>>>>> > >>>>>> Finally, we have to set the shadow to the value L2 wanted to write > >>>>>> originally. > >>>>>> > >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>>>> --- > >>>>>> > >>>>>> Found while making unrestricted guest mode working. Not sure what impact > >>>>>> the bugs had on current feature level, if any. > >>>>>> > >>>>>> For interested folks, I've pushed my nEPT environment here: > >>>>>> > >>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking > >>>>>> > >>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > >>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) > >>>>>> > >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>>>> index 7cc566b..d1dac08 100644 > >>>>>> --- a/arch/x86/kvm/vmx.c > >>>>>> +++ b/arch/x86/kvm/vmx.c > >>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >>>>>> { > >>>>>> - if (to_vmx(vcpu)->nested.vmxon && > >>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >>>>>> - return 1; > >>>>>> - > >>>>>> if (is_guest_mode(vcpu)) { > >>>>>> - /* > >>>>>> - * We get here when L2 changed cr0 in a way that did not change > >>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>>>> - * but did change L0 shadowed bits. This can currently happen > >>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>>>> - * loading) while pretending to allow the guest to change it. > >>>>>> - */ > >>>>> Can't say I understand this patch yet, but it looks like the comment is > >>>>> still valid. Why have you removed it? > >>>> > >>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > >>>> the comment was always misleading. > >>>> > >>> I do not see how it is misleading. For everything but TS we will not get > >>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > >>> it, but L0 does not. > >> > >> For everything *but guest-owned* we will get here, thus for most CR0 > >> accesses (bit-wise, not regarding frequency). > >> > > I do not see how. If bit is trapped by L1 we will not get here. We will > > do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. > > I am not arguing about you code (didn't grok it yet), but the comment > > still make sense to me. > > "We get here when L2 changed cr0 in a way that did not change any of > L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 > shadowed bits." That I can sign. But the rest about TS is just > misleading as we trap _every_ change in L0 - except for TS under certain > conditions. The old code was tested against TS only, that's what the > comment witness. > TS is just an example of how we can get here with KVM on KVM. Obviously other hypervisors may have different configuration. L2 may allow full guest access to CR0 and then each CR0 write by L2 will be handled here. Under what other condition "we trap _every_ change in L0 - except for TS" here? > If you prefer, I'll leave part one in. > Please do so. Without the comment it is not obvious why exit condition is not checked here. Still do not see why you object to TS part. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 18:39 ` Gleb Natapov @ 2013-03-04 19:23 ` Jan Kiszka 2013-03-04 19:33 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 19:23 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun [-- Attachment #1: Type: text/plain, Size: 4906 bytes --] On 2013-03-04 19:39, Gleb Natapov wrote: > On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: >> On 2013-03-04 18:56, Gleb Natapov wrote: >>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: >>>> On 2013-03-04 15:15, Gleb Natapov wrote: >>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: >>>>>> On 2013-03-04 14:22, Gleb Natapov wrote: >>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: >>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was >>>>>>>> broken (will definitely be visible with nested unrestricted guest mode >>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early >>>>>>>> when in guest mode. >>>>>>>> >>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned >>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and >>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, >>>>>>>> are not suited as input. >>>>>>>> >>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and >>>>>>>> refuse the update if it fails. To be fully consistent, we implement this >>>>>>>> check now also for CR4. >>>>>>>> >>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write >>>>>>>> originally. >>>>>>>> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>> --- >>>>>>>> >>>>>>>> Found while making unrestricted guest mode working. Not sure what impact >>>>>>>> the bugs had on current feature level, if any. >>>>>>>> >>>>>>>> For interested folks, I've pushed my nEPT environment here: >>>>>>>> >>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking >>>>>>>> >>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- >>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) >>>>>>>> >>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>>>>> index 7cc566b..d1dac08 100644 >>>>>>>> --- a/arch/x86/kvm/vmx.c >>>>>>>> +++ b/arch/x86/kvm/vmx.c >>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) >>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ >>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) >>>>>>>> { >>>>>>>> - if (to_vmx(vcpu)->nested.vmxon && >>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) >>>>>>>> - return 1; >>>>>>>> - >>>>>>>> if (is_guest_mode(vcpu)) { >>>>>>>> - /* >>>>>>>> - * We get here when L2 changed cr0 in a way that did not change >>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >>>>>>>> - * but did change L0 shadowed bits. This can currently happen >>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >>>>>>>> - * loading) while pretending to allow the guest to change it. >>>>>>>> - */ >>>>>>> Can't say I understand this patch yet, but it looks like the comment is >>>>>>> still valid. Why have you removed it? >>>>>> >>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think >>>>>> the comment was always misleading. >>>>>> >>>>> I do not see how it is misleading. For everything but TS we will not get >>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change >>>>> it, but L0 does not. >>>> >>>> For everything *but guest-owned* we will get here, thus for most CR0 >>>> accesses (bit-wise, not regarding frequency). >>>> >>> I do not see how. If bit is trapped by L1 we will not get here. We will >>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. >>> I am not arguing about you code (didn't grok it yet), but the comment >>> still make sense to me. >> >> "We get here when L2 changed cr0 in a way that did not change any of >> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 >> shadowed bits." That I can sign. But the rest about TS is just >> misleading as we trap _every_ change in L0 - except for TS under certain >> conditions. The old code was tested against TS only, that's what the >> comment witness. >> > TS is just an example of how we can get here with KVM on KVM. Obviously > other hypervisors may have different configuration. L2 may allow full > guest access to CR0 and then each CR0 write by L2 will be handled here. > Under what other condition "we trap _every_ change in L0 - except for > TS" here? On FPU activation: cr0_guest_owned_bits = X86_CR0_TS; And on FPU deactivation: cr0_guest_owned_bits = 0; > >> If you prefer, I'll leave part one in. >> > Please do so. Without the comment it is not obvious why exit condition > is not checked here. Still do not see why you object to TS part. It describes a corner case in a way that suggests this is the only reason why we get here. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 19:23 ` Jan Kiszka @ 2013-03-04 19:33 ` Gleb Natapov 2013-03-04 19:37 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 19:33 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote: > On 2013-03-04 19:39, Gleb Natapov wrote: > > On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: > >> On 2013-03-04 18:56, Gleb Natapov wrote: > >>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: > >>>> On 2013-03-04 15:15, Gleb Natapov wrote: > >>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: > >>>>>> On 2013-03-04 14:22, Gleb Natapov wrote: > >>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > >>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was > >>>>>>>> broken (will definitely be visible with nested unrestricted guest mode > >>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early > >>>>>>>> when in guest mode. > >>>>>>>> > >>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >>>>>>>> are not suited as input. > >>>>>>>> > >>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >>>>>>>> refuse the update if it fails. To be fully consistent, we implement this > >>>>>>>> check now also for CR4. > >>>>>>>> > >>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write > >>>>>>>> originally. > >>>>>>>> > >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>>>>>> --- > >>>>>>>> > >>>>>>>> Found while making unrestricted guest mode working. Not sure what impact > >>>>>>>> the bugs had on current feature level, if any. > >>>>>>>> > >>>>>>>> For interested folks, I've pushed my nEPT environment here: > >>>>>>>> > >>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking > >>>>>>>> > >>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > >>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>>>>>> index 7cc566b..d1dac08 100644 > >>>>>>>> --- a/arch/x86/kvm/vmx.c > >>>>>>>> +++ b/arch/x86/kvm/vmx.c > >>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >>>>>>>> { > >>>>>>>> - if (to_vmx(vcpu)->nested.vmxon && > >>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >>>>>>>> - return 1; > >>>>>>>> - > >>>>>>>> if (is_guest_mode(vcpu)) { > >>>>>>>> - /* > >>>>>>>> - * We get here when L2 changed cr0 in a way that did not change > >>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>>>>>> - * but did change L0 shadowed bits. This can currently happen > >>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>>>>>> - * loading) while pretending to allow the guest to change it. > >>>>>>>> - */ > >>>>>>> Can't say I understand this patch yet, but it looks like the comment is > >>>>>>> still valid. Why have you removed it? > >>>>>> > >>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > >>>>>> the comment was always misleading. > >>>>>> > >>>>> I do not see how it is misleading. For everything but TS we will not get > >>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > >>>>> it, but L0 does not. > >>>> > >>>> For everything *but guest-owned* we will get here, thus for most CR0 > >>>> accesses (bit-wise, not regarding frequency). > >>>> > >>> I do not see how. If bit is trapped by L1 we will not get here. We will > >>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. > >>> I am not arguing about you code (didn't grok it yet), but the comment > >>> still make sense to me. > >> > >> "We get here when L2 changed cr0 in a way that did not change any of > >> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 > >> shadowed bits." That I can sign. But the rest about TS is just > >> misleading as we trap _every_ change in L0 - except for TS under certain > >> conditions. The old code was tested against TS only, that's what the > >> comment witness. > >> > > TS is just an example of how we can get here with KVM on KVM. Obviously > > other hypervisors may have different configuration. L2 may allow full > > guest access to CR0 and then each CR0 write by L2 will be handled here. > > Under what other condition "we trap _every_ change in L0 - except for > > TS" here? > > On FPU activation: > > cr0_guest_owned_bits = X86_CR0_TS; > > And on FPU deactivation: > > cr0_guest_owned_bits = 0; > That's exactly TS case that comment explains. Note that CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits. > > > >> If you prefer, I'll leave part one in. > >> > > Please do so. Without the comment it is not obvious why exit condition > > is not checked here. Still do not see why you object to TS part. > > It describes a corner case in a way that suggests this is the only > reason why we get here. > For KVM on KVM it is. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 19:33 ` Gleb Natapov @ 2013-03-04 19:37 ` Jan Kiszka 2013-03-04 20:00 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 19:37 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun [-- Attachment #1: Type: text/plain, Size: 5622 bytes --] On 2013-03-04 20:33, Gleb Natapov wrote: > On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote: >> On 2013-03-04 19:39, Gleb Natapov wrote: >>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: >>>> On 2013-03-04 18:56, Gleb Natapov wrote: >>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: >>>>>> On 2013-03-04 15:15, Gleb Natapov wrote: >>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: >>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote: >>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: >>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was >>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode >>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early >>>>>>>>>> when in guest mode. >>>>>>>>>> >>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned >>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and >>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, >>>>>>>>>> are not suited as input. >>>>>>>>>> >>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and >>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this >>>>>>>>>> check now also for CR4. >>>>>>>>>> >>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write >>>>>>>>>> originally. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> Found while making unrestricted guest mode working. Not sure what impact >>>>>>>>>> the bugs had on current feature level, if any. >>>>>>>>>> >>>>>>>>>> For interested folks, I've pushed my nEPT environment here: >>>>>>>>>> >>>>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking >>>>>>>>>> >>>>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- >>>>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>>>>>>> index 7cc566b..d1dac08 100644 >>>>>>>>>> --- a/arch/x86/kvm/vmx.c >>>>>>>>>> +++ b/arch/x86/kvm/vmx.c >>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) >>>>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ >>>>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) >>>>>>>>>> { >>>>>>>>>> - if (to_vmx(vcpu)->nested.vmxon && >>>>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) >>>>>>>>>> - return 1; >>>>>>>>>> - >>>>>>>>>> if (is_guest_mode(vcpu)) { >>>>>>>>>> - /* >>>>>>>>>> - * We get here when L2 changed cr0 in a way that did not change >>>>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >>>>>>>>>> - * but did change L0 shadowed bits. This can currently happen >>>>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >>>>>>>>>> - * loading) while pretending to allow the guest to change it. >>>>>>>>>> - */ >>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is >>>>>>>>> still valid. Why have you removed it? >>>>>>>> >>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think >>>>>>>> the comment was always misleading. >>>>>>>> >>>>>>> I do not see how it is misleading. For everything but TS we will not get >>>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change >>>>>>> it, but L0 does not. >>>>>> >>>>>> For everything *but guest-owned* we will get here, thus for most CR0 >>>>>> accesses (bit-wise, not regarding frequency). >>>>>> >>>>> I do not see how. If bit is trapped by L1 we will not get here. We will >>>>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. >>>>> I am not arguing about you code (didn't grok it yet), but the comment >>>>> still make sense to me. >>>> >>>> "We get here when L2 changed cr0 in a way that did not change any of >>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 >>>> shadowed bits." That I can sign. But the rest about TS is just >>>> misleading as we trap _every_ change in L0 - except for TS under certain >>>> conditions. The old code was tested against TS only, that's what the >>>> comment witness. >>>> >>> TS is just an example of how we can get here with KVM on KVM. Obviously >>> other hypervisors may have different configuration. L2 may allow full >>> guest access to CR0 and then each CR0 write by L2 will be handled here. >>> Under what other condition "we trap _every_ change in L0 - except for >>> TS" here? >> >> On FPU activation: >> >> cr0_guest_owned_bits = X86_CR0_TS; >> >> And on FPU deactivation: >> >> cr0_guest_owned_bits = 0; >> > That's exactly TS case that comment explains. Note that > CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits. Again, it's the inverse of what the comment suggest: we enter handle_set_cr0 for every change on CR0 that doesn't match the shadow - except TS was given to the guest by both L1 and L0 (or TS isn't changed as well). > >>> >>>> If you prefer, I'll leave part one in. >>>> >>> Please do so. Without the comment it is not obvious why exit condition >>> is not checked here. Still do not see why you object to TS part. >> >> It describes a corner case in a way that suggests this is the only >> reason why we get here. >> > For KVM on KVM it is. Which is, sorry, irrelevant. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 19:37 ` Jan Kiszka @ 2013-03-04 20:00 ` Gleb Natapov 2013-03-04 20:12 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 20:00 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 08:37:38PM +0100, Jan Kiszka wrote: > On 2013-03-04 20:33, Gleb Natapov wrote: > > On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote: > >> On 2013-03-04 19:39, Gleb Natapov wrote: > >>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: > >>>> On 2013-03-04 18:56, Gleb Natapov wrote: > >>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: > >>>>>> On 2013-03-04 15:15, Gleb Natapov wrote: > >>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: > >>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote: > >>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > >>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was > >>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode > >>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early > >>>>>>>>>> when in guest mode. > >>>>>>>>>> > >>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >>>>>>>>>> are not suited as input. > >>>>>>>>>> > >>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this > >>>>>>>>>> check now also for CR4. > >>>>>>>>>> > >>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write > >>>>>>>>>> originally. > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>>>>>>>> --- > >>>>>>>>>> > >>>>>>>>>> Found while making unrestricted guest mode working. Not sure what impact > >>>>>>>>>> the bugs had on current feature level, if any. > >>>>>>>>>> > >>>>>>>>>> For interested folks, I've pushed my nEPT environment here: > >>>>>>>>>> > >>>>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking > >>>>>>>>>> > >>>>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > >>>>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>>>>>>>> index 7cc566b..d1dac08 100644 > >>>>>>>>>> --- a/arch/x86/kvm/vmx.c > >>>>>>>>>> +++ b/arch/x86/kvm/vmx.c > >>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >>>>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >>>>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >>>>>>>>>> { > >>>>>>>>>> - if (to_vmx(vcpu)->nested.vmxon && > >>>>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >>>>>>>>>> - return 1; > >>>>>>>>>> - > >>>>>>>>>> if (is_guest_mode(vcpu)) { > >>>>>>>>>> - /* > >>>>>>>>>> - * We get here when L2 changed cr0 in a way that did not change > >>>>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>>>>>>>> - * but did change L0 shadowed bits. This can currently happen > >>>>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>>>>>>>> - * loading) while pretending to allow the guest to change it. > >>>>>>>>>> - */ > >>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is > >>>>>>>>> still valid. Why have you removed it? > >>>>>>>> > >>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > >>>>>>>> the comment was always misleading. > >>>>>>>> > >>>>>>> I do not see how it is misleading. For everything but TS we will not get > >>>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > >>>>>>> it, but L0 does not. > >>>>>> > >>>>>> For everything *but guest-owned* we will get here, thus for most CR0 > >>>>>> accesses (bit-wise, not regarding frequency). > >>>>>> > >>>>> I do not see how. If bit is trapped by L1 we will not get here. We will > >>>>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. > >>>>> I am not arguing about you code (didn't grok it yet), but the comment > >>>>> still make sense to me. > >>>> > >>>> "We get here when L2 changed cr0 in a way that did not change any of > >>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 > >>>> shadowed bits." That I can sign. But the rest about TS is just > >>>> misleading as we trap _every_ change in L0 - except for TS under certain > >>>> conditions. The old code was tested against TS only, that's what the > >>>> comment witness. > >>>> > >>> TS is just an example of how we can get here with KVM on KVM. Obviously > >>> other hypervisors may have different configuration. L2 may allow full > >>> guest access to CR0 and then each CR0 write by L2 will be handled here. > >>> Under what other condition "we trap _every_ change in L0 - except for > >>> TS" here? > >> > >> On FPU activation: > >> > >> cr0_guest_owned_bits = X86_CR0_TS; > >> > >> And on FPU deactivation: > >> > >> cr0_guest_owned_bits = 0; > >> > > That's exactly TS case that comment explains. Note that > > CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits. > > Again, it's the inverse of what the comment suggest: we enter > handle_set_cr0 for every change on CR0 that doesn't match the shadow - > except TS was given to the guest by both L1 and L0 (or TS isn't changed > as well). That doesn't make sense to me. I do not even sure what you are saying since you do not specify what shadow is matched. From the code I see that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits that L1 claims to belong to it and do #vmexit to L1 if it is: if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) return 1; We never reach handle_set_cr0() in that case. Can you provide an example with actual values for L2/L1/L0 of what you are trying to say? > > > > >>> > >>>> If you prefer, I'll leave part one in. > >>>> > >>> Please do so. Without the comment it is not obvious why exit condition > >>> is not checked here. Still do not see why you object to TS part. > >> > >> It describes a corner case in a way that suggests this is the only > >> reason why we get here. > >> > > For KVM on KVM it is. > > Which is, sorry, irrelevant. > As an example that helps developers to understand the code it pretty much is. I agree that "This can currently happen..." should be replaced with something like "With KVM as L1 this can currently happen...". -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 20:00 ` Gleb Natapov @ 2013-03-04 20:12 ` Jan Kiszka 2013-03-04 20:24 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 20:12 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun [-- Attachment #1: Type: text/plain, Size: 6372 bytes --] On 2013-03-04 21:00, Gleb Natapov wrote: > On Mon, Mar 04, 2013 at 08:37:38PM +0100, Jan Kiszka wrote: >> On 2013-03-04 20:33, Gleb Natapov wrote: >>> On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote: >>>> On 2013-03-04 19:39, Gleb Natapov wrote: >>>>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: >>>>>> On 2013-03-04 18:56, Gleb Natapov wrote: >>>>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: >>>>>>>> On 2013-03-04 15:15, Gleb Natapov wrote: >>>>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: >>>>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote: >>>>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: >>>>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was >>>>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode >>>>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early >>>>>>>>>>>> when in guest mode. >>>>>>>>>>>> >>>>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned >>>>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and >>>>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, >>>>>>>>>>>> are not suited as input. >>>>>>>>>>>> >>>>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and >>>>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this >>>>>>>>>>>> check now also for CR4. >>>>>>>>>>>> >>>>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write >>>>>>>>>>>> originally. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> >>>>>>>>>>>> --- >>>>>>>>>>>> >>>>>>>>>>>> Found while making unrestricted guest mode working. Not sure what impact >>>>>>>>>>>> the bugs had on current feature level, if any. >>>>>>>>>>>> >>>>>>>>>>>> For interested folks, I've pushed my nEPT environment here: >>>>>>>>>>>> >>>>>>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking >>>>>>>>>>>> >>>>>>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- >>>>>>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>>>>>>>>> index 7cc566b..d1dac08 100644 >>>>>>>>>>>> --- a/arch/x86/kvm/vmx.c >>>>>>>>>>>> +++ b/arch/x86/kvm/vmx.c >>>>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) >>>>>>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ >>>>>>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) >>>>>>>>>>>> { >>>>>>>>>>>> - if (to_vmx(vcpu)->nested.vmxon && >>>>>>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) >>>>>>>>>>>> - return 1; >>>>>>>>>>>> - >>>>>>>>>>>> if (is_guest_mode(vcpu)) { >>>>>>>>>>>> - /* >>>>>>>>>>>> - * We get here when L2 changed cr0 in a way that did not change >>>>>>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >>>>>>>>>>>> - * but did change L0 shadowed bits. This can currently happen >>>>>>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu >>>>>>>>>>>> - * loading) while pretending to allow the guest to change it. >>>>>>>>>>>> - */ >>>>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is >>>>>>>>>>> still valid. Why have you removed it? >>>>>>>>>> >>>>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think >>>>>>>>>> the comment was always misleading. >>>>>>>>>> >>>>>>>>> I do not see how it is misleading. For everything but TS we will not get >>>>>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change >>>>>>>>> it, but L0 does not. >>>>>>>> >>>>>>>> For everything *but guest-owned* we will get here, thus for most CR0 >>>>>>>> accesses (bit-wise, not regarding frequency). >>>>>>>> >>>>>>> I do not see how. If bit is trapped by L1 we will not get here. We will >>>>>>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. >>>>>>> I am not arguing about you code (didn't grok it yet), but the comment >>>>>>> still make sense to me. >>>>>> >>>>>> "We get here when L2 changed cr0 in a way that did not change any of >>>>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 >>>>>> shadowed bits." That I can sign. But the rest about TS is just >>>>>> misleading as we trap _every_ change in L0 - except for TS under certain >>>>>> conditions. The old code was tested against TS only, that's what the >>>>>> comment witness. >>>>>> >>>>> TS is just an example of how we can get here with KVM on KVM. Obviously >>>>> other hypervisors may have different configuration. L2 may allow full >>>>> guest access to CR0 and then each CR0 write by L2 will be handled here. >>>>> Under what other condition "we trap _every_ change in L0 - except for >>>>> TS" here? >>>> >>>> On FPU activation: >>>> >>>> cr0_guest_owned_bits = X86_CR0_TS; >>>> >>>> And on FPU deactivation: >>>> >>>> cr0_guest_owned_bits = 0; >>>> >>> That's exactly TS case that comment explains. Note that >>> CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits. >> >> Again, it's the inverse of what the comment suggest: we enter >> handle_set_cr0 for every change on CR0 that doesn't match the shadow - >> except TS was given to the guest by both L1 and L0 (or TS isn't changed >> as well). > That doesn't make sense to me. I do not even sure what you are saying > since you do not specify what shadow is matched. From the code I see > that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits > that L1 claims to belong to it and do #vmexit to L1 if it is: > > if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) > return 1; > > We never reach handle_set_cr0() in that case. > > Can you provide an example with actual values for L2/L1/L0 of what you > are trying to say? I already provided a concrete one: L1 clears PE/PG from its guest_host_mask (assuming we support unrestricted guest mode for L1), L2 switches from real to protected mode, thus sets PE=1 while the shadow (set by L0) holds 0 => we end up in handle_set_cr0. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 20:12 ` Jan Kiszka @ 2013-03-04 20:24 ` Gleb Natapov 2013-03-04 20:37 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 20:24 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 09:12:25PM +0100, Jan Kiszka wrote: > On 2013-03-04 21:00, Gleb Natapov wrote: > > On Mon, Mar 04, 2013 at 08:37:38PM +0100, Jan Kiszka wrote: > >> On 2013-03-04 20:33, Gleb Natapov wrote: > >>> On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote: > >>>> On 2013-03-04 19:39, Gleb Natapov wrote: > >>>>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote: > >>>>>> On 2013-03-04 18:56, Gleb Natapov wrote: > >>>>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote: > >>>>>>>> On 2013-03-04 15:15, Gleb Natapov wrote: > >>>>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote: > >>>>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote: > >>>>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote: > >>>>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was > >>>>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode > >>>>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early > >>>>>>>>>>>> when in guest mode. > >>>>>>>>>>>> > >>>>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned > >>>>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and > >>>>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus, > >>>>>>>>>>>> are not suited as input. > >>>>>>>>>>>> > >>>>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and > >>>>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this > >>>>>>>>>>>> check now also for CR4. > >>>>>>>>>>>> > >>>>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write > >>>>>>>>>>>> originally. > >>>>>>>>>>>> > >>>>>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > >>>>>>>>>>>> --- > >>>>>>>>>>>> > >>>>>>>>>>>> Found while making unrestricted guest mode working. Not sure what impact > >>>>>>>>>>>> the bugs had on current feature level, if any. > >>>>>>>>>>>> > >>>>>>>>>>>> For interested folks, I've pushed my nEPT environment here: > >>>>>>>>>>>> > >>>>>>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking > >>>>>>>>>>>> > >>>>>>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++------------------- > >>>>>>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>>>>>>>>>> index 7cc566b..d1dac08 100644 > >>>>>>>>>>>> --- a/arch/x86/kvm/vmx.c > >>>>>>>>>>>> +++ b/arch/x86/kvm/vmx.c > >>>>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) > >>>>>>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ > >>>>>>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) > >>>>>>>>>>>> { > >>>>>>>>>>>> - if (to_vmx(vcpu)->nested.vmxon && > >>>>>>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)) > >>>>>>>>>>>> - return 1; > >>>>>>>>>>>> - > >>>>>>>>>>>> if (is_guest_mode(vcpu)) { > >>>>>>>>>>>> - /* > >>>>>>>>>>>> - * We get here when L2 changed cr0 in a way that did not change > >>>>>>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >>>>>>>>>>>> - * but did change L0 shadowed bits. This can currently happen > >>>>>>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu > >>>>>>>>>>>> - * loading) while pretending to allow the guest to change it. > >>>>>>>>>>>> - */ > >>>>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is > >>>>>>>>>>> still valid. Why have you removed it? > >>>>>>>>>> > >>>>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think > >>>>>>>>>> the comment was always misleading. > >>>>>>>>>> > >>>>>>>>> I do not see how it is misleading. For everything but TS we will not get > >>>>>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change > >>>>>>>>> it, but L0 does not. > >>>>>>>> > >>>>>>>> For everything *but guest-owned* we will get here, thus for most CR0 > >>>>>>>> accesses (bit-wise, not regarding frequency). > >>>>>>>> > >>>>>>> I do not see how. If bit is trapped by L1 we will not get here. We will > >>>>>>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition. > >>>>>>> I am not arguing about you code (didn't grok it yet), but the comment > >>>>>>> still make sense to me. > >>>>>> > >>>>>> "We get here when L2 changed cr0 in a way that did not change any of > >>>>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 > >>>>>> shadowed bits." That I can sign. But the rest about TS is just > >>>>>> misleading as we trap _every_ change in L0 - except for TS under certain > >>>>>> conditions. The old code was tested against TS only, that's what the > >>>>>> comment witness. > >>>>>> > >>>>> TS is just an example of how we can get here with KVM on KVM. Obviously > >>>>> other hypervisors may have different configuration. L2 may allow full > >>>>> guest access to CR0 and then each CR0 write by L2 will be handled here. > >>>>> Under what other condition "we trap _every_ change in L0 - except for > >>>>> TS" here? > >>>> > >>>> On FPU activation: > >>>> > >>>> cr0_guest_owned_bits = X86_CR0_TS; > >>>> > >>>> And on FPU deactivation: > >>>> > >>>> cr0_guest_owned_bits = 0; > >>>> > >>> That's exactly TS case that comment explains. Note that > >>> CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits. > >> > >> Again, it's the inverse of what the comment suggest: we enter > >> handle_set_cr0 for every change on CR0 that doesn't match the shadow - > >> except TS was given to the guest by both L1 and L0 (or TS isn't changed > >> as well). > > That doesn't make sense to me. I do not even sure what you are saying > > since you do not specify what shadow is matched. From the code I see > > that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits > > that L1 claims to belong to it and do #vmexit to L1 if it is: > > > > if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) > > return 1; > > > > We never reach handle_set_cr0() in that case. > > > > Can you provide an example with actual values for L2/L1/L0 of what you > > are trying to say? > > I already provided a concrete one: L1 clears PE/PG from its > guest_host_mask (assuming we support unrestricted guest mode for L1), L2 > switches from real to protected mode, thus sets PE=1 while the shadow > (set by L0) holds 0 => we end up in handle_set_cr0. > So how is this "inverse of what the comment suggest"? I do not understand your grudge against the comment. Just clarify that TS is the one example of how we can get here if you think that it is not clear enough. The TS part was useful to me. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 20:24 ` Gleb Natapov @ 2013-03-04 20:37 ` Jan Kiszka 2013-03-04 21:00 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 20:37 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun [-- Attachment #1: Type: text/plain, Size: 1651 bytes --] On 2013-03-04 21:24, Gleb Natapov wrote: >>> That doesn't make sense to me. I do not even sure what you are saying >>> since you do not specify what shadow is matched. From the code I see >>> that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits >>> that L1 claims to belong to it and do #vmexit to L1 if it is: >>> >>> if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) >>> return 1; >>> >>> We never reach handle_set_cr0() in that case. >>> >>> Can you provide an example with actual values for L2/L1/L0 of what you >>> are trying to say? >> >> I already provided a concrete one: L1 clears PE/PG from its >> guest_host_mask (assuming we support unrestricted guest mode for L1), L2 >> switches from real to protected mode, thus sets PE=1 while the shadow >> (set by L0) holds 0 => we end up in handle_set_cr0. >> > So how is this "inverse of what the comment suggest"? I do not > understand your grudge against the comment. Just clarify that TS is the > one example of how we can get here if you think that it is not clear enough. > The TS part was useful to me. Hmm, if the comment was helpful, why did you have to ask me what was wrong? :) If you insist on clarification, let's try it again: "We get here when L2 changed cr0 in a way that did not change any of L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 shadowed bits. KVM only hands over TS to L1, and that only if the FPU is enabled. So we will be called for every change that L1 allows L2 to perform natively." For me, ignoring TS made things clearer and simpler. HTH, Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 20:37 ` Jan Kiszka @ 2013-03-04 21:00 ` Gleb Natapov 2013-03-04 21:09 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gleb Natapov @ 2013-03-04 21:00 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 09:37:17PM +0100, Jan Kiszka wrote: > On 2013-03-04 21:24, Gleb Natapov wrote: > >>> That doesn't make sense to me. I do not even sure what you are saying > >>> since you do not specify what shadow is matched. From the code I see > >>> that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits > >>> that L1 claims to belong to it and do #vmexit to L1 if it is: > >>> > >>> if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) > >>> return 1; > >>> > >>> We never reach handle_set_cr0() in that case. > >>> > >>> Can you provide an example with actual values for L2/L1/L0 of what you > >>> are trying to say? > >> > >> I already provided a concrete one: L1 clears PE/PG from its > >> guest_host_mask (assuming we support unrestricted guest mode for L1), L2 > >> switches from real to protected mode, thus sets PE=1 while the shadow > >> (set by L0) holds 0 => we end up in handle_set_cr0. > >> > > So how is this "inverse of what the comment suggest"? I do not > > understand your grudge against the comment. Just clarify that TS is the > > one example of how we can get here if you think that it is not clear enough. > > The TS part was useful to me. > > Hmm, if the comment was helpful, why did you have to ask me what was > wrong? :) > Comment helped me understand what's the case that should be handled here. I didn't asked you what was wrong, Nadav did, but that's because I haven't wrapped my mind around this code yet. I will ask it later :) > If you insist on clarification, let's try it again: > > "We get here when L2 changed cr0 in a way that did not change > any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > but did change L0 shadowed bits. KVM only hands over TS to L1, > and that only if the FPU is enabled. So we will be called for > every change that L1 allows L2 to perform natively." I find this much more confusing that original comment. Actually re-reading the original one I do not think it's about KVM as L2 either. What bout: We get here when L2 changed cr0 in a way that did not change any of L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0 shadowed bits. This may happen with the TS bit: L0 may want to leave TS on (for lazy fpu loading) while pretending to allow the guest to change it. So in your case it may happen to PE, but TS is only an example anyway. > > For me, ignoring TS made things clearer and simpler. > Because you had other example to work with. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 21:00 ` Gleb Natapov @ 2013-03-04 21:09 ` Jan Kiszka 2013-03-05 6:25 ` Gleb Natapov 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2013-03-04 21:09 UTC (permalink / raw) To: Gleb Natapov; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun [-- Attachment #1: Type: text/plain, Size: 2748 bytes --] On 2013-03-04 22:00, Gleb Natapov wrote: > On Mon, Mar 04, 2013 at 09:37:17PM +0100, Jan Kiszka wrote: >> On 2013-03-04 21:24, Gleb Natapov wrote: >>>>> That doesn't make sense to me. I do not even sure what you are saying >>>>> since you do not specify what shadow is matched. From the code I see >>>>> that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits >>>>> that L1 claims to belong to it and do #vmexit to L1 if it is: >>>>> >>>>> if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) >>>>> return 1; >>>>> >>>>> We never reach handle_set_cr0() in that case. >>>>> >>>>> Can you provide an example with actual values for L2/L1/L0 of what you >>>>> are trying to say? >>>> >>>> I already provided a concrete one: L1 clears PE/PG from its >>>> guest_host_mask (assuming we support unrestricted guest mode for L1), L2 >>>> switches from real to protected mode, thus sets PE=1 while the shadow >>>> (set by L0) holds 0 => we end up in handle_set_cr0. >>>> >>> So how is this "inverse of what the comment suggest"? I do not >>> understand your grudge against the comment. Just clarify that TS is the >>> one example of how we can get here if you think that it is not clear enough. >>> The TS part was useful to me. >> >> Hmm, if the comment was helpful, why did you have to ask me what was >> wrong? :) >> > Comment helped me understand what's the case that should be handled > here. I didn't asked you what was wrong, Nadav did, but that's because > I haven't wrapped my mind around this code yet. I will ask it later :) > >> If you insist on clarification, let's try it again: >> >> "We get here when L2 changed cr0 in a way that did not change >> any of L1's shadowed bits (see nested_vmx_exit_handled_cr), >> but did change L0 shadowed bits. KVM only hands over TS to L1, >> and that only if the FPU is enabled. So we will be called for >> every change that L1 allows L2 to perform natively." > > I find this much more confusing that original comment. Actually > re-reading the original one I do not think it's about KVM as L2 either. > What bout: > > We get here when L2 changed cr0 in a way that did not change > any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > but did change L0 shadowed bits. This may happen with > the TS bit: L0 may want to leave TS on (for lazy fpu loading) > while pretending to allow the guest to change it. Which guest? > > So in your case it may happen to PE, but TS is only an example anyway. Not thinking about TS avoids the additional complication it brings because L0 may change its mind about guest ownership. Other bits are unconditionally L0-owned. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode 2013-03-04 21:09 ` Jan Kiszka @ 2013-03-05 6:25 ` Gleb Natapov 0 siblings, 0 replies; 20+ messages in thread From: Gleb Natapov @ 2013-03-05 6:25 UTC (permalink / raw) To: Jan Kiszka; +Cc: Marcelo Tosatti, kvm, Nadav Har'El, Nakajima, Jun On Mon, Mar 04, 2013 at 10:09:44PM +0100, Jan Kiszka wrote: > On 2013-03-04 22:00, Gleb Natapov wrote: > > On Mon, Mar 04, 2013 at 09:37:17PM +0100, Jan Kiszka wrote: > >> On 2013-03-04 21:24, Gleb Natapov wrote: > >>>>> That doesn't make sense to me. I do not even sure what you are saying > >>>>> since you do not specify what shadow is matched. From the code I see > >>>>> that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits > >>>>> that L1 claims to belong to it and do #vmexit to L1 if it is: > >>>>> > >>>>> if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow)) > >>>>> return 1; > >>>>> > >>>>> We never reach handle_set_cr0() in that case. > >>>>> > >>>>> Can you provide an example with actual values for L2/L1/L0 of what you > >>>>> are trying to say? > >>>> > >>>> I already provided a concrete one: L1 clears PE/PG from its > >>>> guest_host_mask (assuming we support unrestricted guest mode for L1), L2 > >>>> switches from real to protected mode, thus sets PE=1 while the shadow > >>>> (set by L0) holds 0 => we end up in handle_set_cr0. > >>>> > >>> So how is this "inverse of what the comment suggest"? I do not > >>> understand your grudge against the comment. Just clarify that TS is the > >>> one example of how we can get here if you think that it is not clear enough. > >>> The TS part was useful to me. > >> > >> Hmm, if the comment was helpful, why did you have to ask me what was > >> wrong? :) > >> > > Comment helped me understand what's the case that should be handled > > here. I didn't asked you what was wrong, Nadav did, but that's because > > I haven't wrapped my mind around this code yet. I will ask it later :) > > > >> If you insist on clarification, let's try it again: > >> > >> "We get here when L2 changed cr0 in a way that did not change > >> any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > >> but did change L0 shadowed bits. KVM only hands over TS to L1, > >> and that only if the FPU is enabled. So we will be called for > >> every change that L1 allows L2 to perform natively." > > > > I find this much more confusing that original comment. Actually > > re-reading the original one I do not think it's about KVM as L2 either. > > What bout: > > > > We get here when L2 changed cr0 in a way that did not change > > any of L1's shadowed bits (see nested_vmx_exit_handled_cr), > > but did change L0 shadowed bits. This may happen with > > the TS bit: L0 may want to leave TS on (for lazy fpu loading) > > while pretending to allow the guest to change it. > > Which guest? Its guest, L1. L1 in its turn may allow L2 to change it, but this will cause L0 handling the actual cr0 write. > > > > > So in your case it may happen to PE, but TS is only an example anyway. > > Not thinking about TS avoids the additional complication it brings > because L0 may change its mind about guest ownership. Other bits are > unconditionally L0-owned. > The fact that KVM can change its mind about TS ownership is precisely why mentioning it in the comment is useful for KVM developers. When it is mentioned along with "lazy fpu" I immediately pictured the situation where TS will be shadowed by L0, but not L1 and this is what good comment is about. -- Gleb. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2013-03-05 6:25 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-02-28 9:44 [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode Jan Kiszka 2013-03-04 13:22 ` Gleb Natapov 2013-03-04 14:09 ` Jan Kiszka 2013-03-04 14:15 ` Gleb Natapov 2013-03-04 14:25 ` Jan Kiszka 2013-03-04 15:30 ` Nadav Har'El 2013-03-04 16:01 ` Jan Kiszka 2013-03-04 17:56 ` Gleb Natapov 2013-03-04 18:08 ` Jan Kiszka 2013-03-04 18:39 ` Gleb Natapov 2013-03-04 19:23 ` Jan Kiszka 2013-03-04 19:33 ` Gleb Natapov 2013-03-04 19:37 ` Jan Kiszka 2013-03-04 20:00 ` Gleb Natapov 2013-03-04 20:12 ` Jan Kiszka 2013-03-04 20:24 ` Gleb Natapov 2013-03-04 20:37 ` Jan Kiszka 2013-03-04 21:00 ` Gleb Natapov 2013-03-04 21:09 ` Jan Kiszka 2013-03-05 6:25 ` Gleb Natapov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox