From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kiszka
Subject: Re: [PATCH v2 5/8] KVM: nVMX: Fix guest CR3 read-back on VM-exit
Date: Wed, 07 Aug 2013 14:46:31 +0200
Message-ID: <520241A7.5060301@siemens.com>
References: <0816baee846f9c8f4d54c6738b2582a95f9c56a3.1375778397.git.jan.kiszka@web.de>
 <20130806101236.GN8218@redhat.com>
 <20130806140248.GB8218@redhat.com>
 <20130806144117.GD8218@redhat.com>
 <52011AE6.2010006@siemens.com>
 <20130806155344.GE8218@redhat.com>
 <52011CCE.4000109@siemens.com>
 <20130807123958.GA30470@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: "Zhang, Yang Z", Paolo Bonzini, kvm, Xiao Guangrong, "Nakajima, Jun", Arthur Chunqi Li
To: Gleb Natapov
Return-path:
Received: from david.siemens.de ([192.35.17.14]:34572 "EHLO david.siemens.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932833Ab3HGMqm (ORCPT );
 Wed, 7 Aug 2013 08:46:42 -0400
In-Reply-To: <20130807123958.GA30470@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 2013-08-07 14:39, Gleb Natapov wrote:
> On Tue, Aug 06, 2013 at 05:57:02PM +0200, Jan Kiszka wrote:
>> On 2013-08-06 17:53, Gleb Natapov wrote:
>>> On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
>>>> On 2013-08-06 17:04, Zhang, Yang Z wrote:
>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>>>>>> From: Jan Kiszka
>>>>>>>>>>>
>>>>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>>>>>
>>>>>>>>>> I have a headache from trying to think about it already, but
>>>>>>>>>> shouldn't
>>>>>>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>>>>>>>>> see
>>>>>>>>> Here is my understanding:
>>>>>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
>>>>>>>>> vcpu is enabling
>>>>>>>> paging. When L2 running in real mode, then target vcpu is not
>>>>>>>> enabling paging and it will use L0's identity map for L2. If you
>>>>>>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
>>>>>>>> not
>>>>>> L1's.
>>>>>>>>>
>>>>>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>>>>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>>>>>>> L0 and L1 use the same identity map address. When I changed identity
>>>>>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>>>>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>>>>>>> If I zero L1 identity map page L2 still works.
>>>>>>>>
>>>>>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>>>>>>> if L2 is in real mode.
>>>>>>>
>>>>>> That's not the point. It may work accidentally for kvm on kvm, but what
>>>>>> if other hypervisor plays different tricks and builds different ident map for its guest?
>>>>> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
>>>>> Anyway, current logic is definitely wrong. It should use L1's identity map instead of L0's.
>>>>
>>>> So something like this is rather needed?
>>>>
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index 44494ed..60a3644 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>>>  	if (enable_ept) {
>>>>  		eptp = construct_eptp(cr3);
>>>>  		vmcs_write64(EPT_POINTER, eptp);
>>>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>> -			vcpu->kvm->arch.ept_identity_map_addr;
>>>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>> +			guest_cr3 = kvm_read_cr3(vcpu);
>>>> +		else
>>>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>  		ept_load_pdptrs(vcpu);
>>>>  	}
>>>>
>>> That's what I am thinking, will think about it some more tomorrow.
>>
>> OK, and I'll feed it into a local test.
>>
> Thought about it some more. So without nested unrestricted guest (nUG)
> is_paging() will always be true (since without nUG guest entry is not
> possible otherwise) and guest's cr3 will be used, but with nUG identity
> map is not used (that is why L2 still works even though wrong identity
> map pointer is assigned to cr3), so the code here just corrupts nested
> guest's cr3 for no reason and that is why you had to use kvm_read_cr3()
> in prepare_vmcs12() to get correct cr3 value. The patch above should be
> used instead of original one IMO. How is testing going?

Yes, testing worked fine. I've queued the patch above and will send it
out with the next round.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
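
For reference, the selection logic discussed in this thread can be modelled outside the kernel. The following is only a sketch under stated assumptions: struct vcpu_model and both helper functions are hypothetical stand-ins invented for illustration, not kernel code, and the fields merely mirror is_paging(vcpu), is_guest_mode(vcpu), kvm_read_cr3(vcpu) and ept_identity_map_addr from the hunk quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the state vmx_set_cr3() consults when EPT is
 * enabled. This is NOT the kernel's struct kvm_vcpu.
 */
struct vcpu_model {
	bool paging;        /* models is_paging(vcpu) */
	bool guest_mode;    /* models is_guest_mode(vcpu), i.e. L2 is running */
	uint64_t cr3;       /* models kvm_read_cr3(vcpu) */
	uint64_t ident_map; /* models vcpu->kvm->arch.ept_identity_map_addr */
};

/* Selection as in the removed lines: L0's identity map whenever paging is off. */
static uint64_t guest_cr3_old(const struct vcpu_model *v)
{
	return v->paging ? v->cr3 : v->ident_map;
}

/*
 * Selection as in the added lines: while L2 runs, the architectural CR3
 * is always used, so L0's identity map never overwrites the nested
 * guest's CR3.
 */
static uint64_t guest_cr3_new(const struct vcpu_model *v)
{
	if (v->paging || v->guest_mode)
		return v->cr3;
	return v->ident_map;
}
```

With paging off and guest_mode set, guest_cr3_old() returns the identity map address while guest_cr3_new() returns the tracked CR3; for a non-nested vCPU with paging off, both return the identity map address, so only the nested case changes.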