From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kiszka
Subject: Re: [PATCH] KVM: nVMX: Fix setting of CR0 and CR4 in guest mode
Date: Mon, 04 Mar 2013 21:12:25 +0100
Message-ID: <51350029.6080908@web.de>
References: <20130304132234.GP23616@redhat.com> <5134AB2F.20807@siemens.com>
 <20130304141515.GQ23616@redhat.com> <5134AEEB.9020600@siemens.com>
 <20130304175632.GD14220@redhat.com> <5134E308.3000202@siemens.com>
 <20130304183956.GF14220@redhat.com> <5134F4C8.9010807@web.de>
 <20130304193331.GG14220@redhat.com> <5134F802.2020200@web.de>
 <20130304200033.GH14220@redhat.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="----enig2CKEVPQXHXOFJDIIMCCHR"
Cc: Marcelo Tosatti , kvm , Nadav Har'El , "Nakajima, Jun"
To: Gleb Natapov
Return-path:
Received: from mout.web.de ([212.227.15.3]:56596 "EHLO mout.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932201Ab3CDUMi (ORCPT ); Mon, 4 Mar 2013 15:12:38 -0500
In-Reply-To: <20130304200033.GH14220@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
------enig2CKEVPQXHXOFJDIIMCCHR
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 2013-03-04 21:00, Gleb Natapov wrote:
> On Mon, Mar 04, 2013 at 08:37:38PM +0100, Jan Kiszka wrote:
>> On 2013-03-04 20:33, Gleb Natapov wrote:
>>> On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote:
>>>> On 2013-03-04 19:39, Gleb Natapov wrote:
>>>>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote:
>>>>>> On 2013-03-04 18:56, Gleb Natapov wrote:
>>>>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote:
>>>>>>>> On 2013-03-04 15:15, Gleb Natapov wrote:
>>>>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote:
>>>>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote:
>>>>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote:
>>>>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
>>>>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode
>>>>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
>>>>>>>>>>>> when in guest mode.
>>>>>>>>>>>>
>>>>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
>>>>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
>>>>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
>>>>>>>>>>>> are not suited as input.
>>>>>>>>>>>>
>>>>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
>>>>>>>>>>>> refuse the update if it fails. To be fully consistent, we implement this
>>>>>>>>>>>> check now also for CR4.
>>>>>>>>>>>>
>>>>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
>>>>>>>>>>>> originally.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jan Kiszka
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Found while making unrestricted guest mode working. Not sure what impact
>>>>>>>>>>>> the bugs had on current feature level, if any.
>>>>>>>>>>>>
>>>>>>>>>>>> For interested folks, I've pushed my nEPT environment here:
>>>>>>>>>>>>
>>>>>>>>>>>>     git://git.kiszka.org/linux-kvm.git nept-hacking
>>>>>>>>>>>>
>>>>>>>>>>>>  arch/x86/kvm/vmx.c |   49 ++++++++++++++++++++++++++++++-------------------
>>>>>>>>>>>>  1 files changed, 30 insertions(+), 19 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>>>>>>> index 7cc566b..d1dac08 100644
>>>>>>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
>>>>>>>>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
>>>>>>>>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -	if (to_vmx(vcpu)->nested.vmxon &&
>>>>>>>>>>>> -	    ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
>>>>>>>>>>>> -		return 1;
>>>>>>>>>>>> -
>>>>>>>>>>>>  	if (is_guest_mode(vcpu)) {
>>>>>>>>>>>> -		/*
>>>>>>>>>>>> -		 * We get here when L2 changed cr0 in a way that did not change
>>>>>>>>>>>> -		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>>>>>>>>>>> -		 * but did change L0 shadowed bits. This can currently happen
>>>>>>>>>>>> -		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>>>>>>>>>>> -		 * loading) while pretending to allow the guest to change it.
>>>>>>>>>>>> -		 */
>>>>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is
>>>>>>>>>>> still valid. Why have you removed it?
>>>>>>>>>>
>>>>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think
>>>>>>>>>> the comment was always misleading.
>>>>>>>>>>
>>>>>>>>> I do not see how it is misleading. For everything but TS we will not get
>>>>>>>>> here (if L1 is kvm). For TS we will get here if L1 allows L2 to change
>>>>>>>>> it, but L0 does not.
>>>>>>>>
>>>>>>>> For everything *but guest-owned* we will get here, thus for most CR0
>>>>>>>> accesses (bit-wise, not regarding frequency).
>>>>>>>>
>>>>>>> I do not see how. If bit is trapped by L1 we will not get here. We will
>>>>>>> do vmexit to L1 instead. nested_vmx_exit_handled_cr() check this condition.
>>>>>>> I am not arguing about you code (didn't grok it yet), but the comment
>>>>>>> still make sense to me.
>>>>>>
>>>>>> "We get here when L2 changed cr0 in a way that did not change any of
>>>>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0
>>>>>> shadowed bits." That I can sign. But the rest about TS is just
>>>>>> misleading as we trap _every_ change in L0 - except for TS under certain
>>>>>> conditions.
>>>>>> The old code was tested against TS only, that's what the
>>>>>> comment witness.
>>>>>>
>>>>> TS is just an example of how we can get here with KVM on KVM. Obviously
>>>>> other hypervisors may have different configuration. L2 may allow full
>>>>> guest access to CR0 and then each CR0 write by L2 will be handled here.
>>>>> Under what other condition "we trap _every_ change in L0 - except for
>>>>> TS" here?
>>>>
>>>> On FPU activation:
>>>>
>>>>     cr0_guest_owned_bits = X86_CR0_TS;
>>>>
>>>> And on FPU deactivation:
>>>>
>>>>     cr0_guest_owned_bits = 0;
>>>>
>>> That's exactly TS case that comment explains. Note that
>>> CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits.
>>
>> Again, it's the inverse of what the comment suggest: we enter
>> handle_set_cr0 for every change on CR0 that doesn't match the shadow -
>> except TS was given to the guest by both L1 and L0 (or TS isn't changed
>> as well).
> That doesn't make sense to me. I do not even sure what you are saying
> since you do not specify what shadow is matched. From the code I see
> that on CR0 exit to L0 from L2 we check if L2 tries to change CR0 bits
> that L1 claims to belong to it and do #vmexit to L1 if it is:
>
> if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow))
> 	return 1;
>
> We never reach handle_set_cr0() in that case.
>
> Can you provide an example with actual values for L2/L1/L0 of what you
> are trying to say?

I already provided a concrete one: L1 clears PE/PG from its
guest_host_mask (assuming we support unrestricted guest mode for L1), L2
switches from real to protected mode, thus sets PE=1 while the shadow
(set by L0) holds 0 => we end up in handle_set_cr0.
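
To put actual values on that example, here is a minimal stand-alone C
sketch of the routing decision. It is a simplified model, not the kernel
code: struct cr0_ctl, route_l2_cr0_write() and example_l2_sets_pe() are
hypothetical names, "vmcs02" stands for the mask/shadow pair L0 programs
while L2 runs, and L0 is assumed to own every CR0 bit
(cr0_guest_owned_bits = 0).

```c
#include <stdbool.h>

#define X86_CR0_PE (1UL << 0)
#define X86_CR0_TS (1UL << 3)
#define X86_CR0_PG (1UL << 31)

/* Hypothetical stand-in for one level's CR0 intercept state. */
struct cr0_ctl {
	unsigned long guest_host_mask;	/* bits owned by that hypervisor */
	unsigned long read_shadow;	/* value the guest sees for owned bits */
};

enum cr0_exit { CR0_NO_EXIT, CR0_EXIT_TO_L1, CR0_HANDLE_IN_L0 };

/* A MOV to CR0 traps if it changes an owned bit relative to the shadow. */
static bool cr0_write_traps(const struct cr0_ctl *c, unsigned long val)
{
	return (c->guest_host_mask & (val ^ c->read_shadow)) != 0;
}

/*
 * Model of where a CR0 write by L2 ends up:
 *  - no exit at all if the effective (vmcs02) mask does not catch it,
 *  - reflected to L1 if it touches a bit L1 claims (vmcs12 check),
 *  - otherwise handled in L0, i.e. the handle_set_cr0 path.
 */
static enum cr0_exit route_l2_cr0_write(const struct cr0_ctl *vmcs02,
					const struct cr0_ctl *vmcs12,
					unsigned long val)
{
	if (!cr0_write_traps(vmcs02, val))
		return CR0_NO_EXIT;
	if (cr0_write_traps(vmcs12, val))
		return CR0_EXIT_TO_L1;
	return CR0_HANDLE_IN_L0;
}

/* The example from the mail: L1 gives PE/PG to L2, L2 sets PE=1. */
static enum cr0_exit example_l2_sets_pe(void)
{
	/* Assumption: L1 owns everything except PE and PG. */
	struct cr0_ctl vmcs12 = {
		.guest_host_mask = ~(X86_CR0_PE | X86_CR0_PG),
		.read_shadow = 0,
	};
	/* Assumption: L0 owns all bits, shadow still has PE=0. */
	struct cr0_ctl vmcs02 = {
		.guest_host_mask = ~0UL,
		.read_shadow = 0,
	};

	return route_l2_cr0_write(&vmcs02, &vmcs12, X86_CR0_PE);
}
```

With these values the write hits L0's mask but none of L1's bits, so it
is not reflected to L1 and lands in the L0 handler. If L1 kept PE in its
own mask, the same write would go to L1 instead.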

Jan


------enig2CKEVPQXHXOFJDIIMCCHR
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlE1ACkACgkQitSsb3rl5xSwNACeJTvOFWXNqElARWACgl2L7iF9
Q9EAoOHUdMu/mGYwifUylCVSabCWCSbx
=VNco
-----END PGP SIGNATURE-----

------enig2CKEVPQXHXOFJDIIMCCHR--