From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Bonzini
Subject: Re: [PATCH v2 5/8] KVM: nVMX: Fix guest CR3 read-back on VM-exit
Date: Wed, 07 Aug 2013 15:32:37 +0200
Message-ID: <52024C75.9000304@redhat.com>
References: <0816baee846f9c8f4d54c6738b2582a95f9c56a3.1375778397.git.jan.kiszka@web.de>
 <20130806101236.GN8218@redhat.com>
 <20130806140248.GB8218@redhat.com>
 <20130806144117.GD8218@redhat.com>
 <52011AE6.2010006@siemens.com>
 <20130806155344.GE8218@redhat.com>
 <52011CCE.4000109@siemens.com>
 <20130807123958.GA30470@redhat.com>
 <520241A7.5060301@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Gleb Natapov, "Zhang, Yang Z", kvm, Xiao Guangrong, "Nakajima, Jun", Arthur Chunqi Li
To: Jan Kiszka
Received: from mail-wi0-f179.google.com ([209.85.212.179]:44014 "EHLO mail-wi0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933057Ab3HGNco (ORCPT ); Wed, 7 Aug 2013 09:32:44 -0400
Received: by mail-wi0-f179.google.com with SMTP id hr7so1691633wib.0 for ; Wed, 07 Aug 2013 06:32:43 -0700 (PDT)
In-Reply-To: <520241A7.5060301@siemens.com>
Sender: kvm-owner@vger.kernel.org

On 08/07/2013 02:46 PM, Jan Kiszka wrote:
> On 2013-08-07 14:39, Gleb Natapov wrote:
>> On Tue, Aug 06, 2013 at 05:57:02PM +0200, Jan Kiszka wrote:
>>> On 2013-08-06 17:53, Gleb Natapov wrote:
>>>> On Tue, Aug 06, 2013 at 05:48:54PM +0200, Jan Kiszka wrote:
>>>>> On 2013-08-06 17:04, Zhang, Yang Z wrote:
>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>> On Tue, Aug 06, 2013 at 02:12:51PM +0000, Zhang, Yang Z wrote:
>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>> On Tue, Aug 06, 2013 at 11:44:41AM +0000, Zhang, Yang Z wrote:
>>>>>>>>>> Gleb Natapov wrote on 2013-08-06:
>>>>>>>>>>> On Tue, Aug 06, 2013 at 10:39:59AM +0200, Jan Kiszka wrote:
>>>>>>>>>>>> From: Jan Kiszka
>>>>>>>>>>>>
>>>>>>>>>>>> If nested EPT is enabled, the L2 guest may change CR3 without any
>>>>>>>>>>>> exits. We therefore have to read the current value from the VMCS
>>>>>>>>>>>> when switching to L1. However, if paging wasn't enabled, L0 tracks
>>>>>>>>>>>> L2's CR3, and GUEST_CR3 rather contains the real-mode identity map.
>>>>>>>>>>>> So we need to retrieve CR3 from the architectural state after
>>>>>>>>>>>> conditionally updating it - and this is what kvm_read_cr3 does.
>>>>>>>>>>>>
>>>>>>>>>>> I have a headache from trying to think about it already, but
>>>>>>>>>>> shouldn't
>>>>>>>>>>> L1 be the one who setups identity map for L2? I traced what
>>>>>>>>>>> vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) return here and do not
>>>>>>>>>>> see
>>>>>>>>>> Here is my understanding:
>>>>>>>>>> In vmx_set_cr3(), if enabled ept, it will check whether target
>>>>>>>>>> vcpu is enabling
>>>>>>>>> paging. When L2 running in real mode, then target vcpu is not
>>>>>>>>> enabling paging and it will use L0's identity map for L2. If you
>>>>>>>>> read GUEST_CR3 from VMCS, then you may get the L2's identity map
>>>>>>>>> not
>>>>>>> L1's.
>>>>>>>>>>
>>>>>>>>> Yes, but why it makes sense to use L0 identity map for L2? I didn't
>>>>>>>>> see different vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) values because
>>>>>>>>> L0 and L1 use the same identity map address. When I changed identity
>>>>>>>>> address L1 configures vmcs_read64(GUEST_CR3)/kvm_read_cr3(vcpu) are
>>>>>>>>> indeed different, but the real CR3 L2 uses points to L0 identity map.
>>>>>>>>> If I zero L1 identity map page L2 still works.
>>>>>>>>>
>>>>>>>> If L2 in real mode, then L2PA == L1PA. So L0's identity map also works
>>>>>>>> if L2 is in real mode.
>>>>>>>>
>>>>>>> That's not the point. It may work accidentally for kvm on kvm, but what
>>>>>>> if other hypervisor plays different tricks and builds different ident map for its guest?
>>>>>> Yes, if other hypervisor doesn't build the 1:1 mapping for its guest, it will fail to work. But I cannot imagine what kind of hypervisor will do this and what the purpose is.
>>>>>> Anyway, current logic is definitely wrong. It should use L1's identity map instead of L0's.
>>>>>
>>>>> So something like this is rather needed?
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 44494ed..60a3644 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -3375,8 +3375,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>>>>  	if (enable_ept) {
>>>>>  		eptp = construct_eptp(cr3);
>>>>>  		vmcs_write64(EPT_POINTER, eptp);
>>>>> -		guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
>>>>> -			vcpu->kvm->arch.ept_identity_map_addr;
>>>>> +		if (is_paging(vcpu) || is_guest_mode(vcpu))
>>>>> +			guest_cr3 = kvm_read_cr3(vcpu);
>>>>> +		else
>>>>> +			guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
>>>>>  		ept_load_pdptrs(vcpu);
>>>>>  	}
>>>>>
>>>> That's what I am thinking, will think about it some more tomorrow.
>>>
>>> OK, and I'll feed it into a local test.
>>>
>> Thought about it some more. So without nested unrestricted guest (nUG)
>> is_paging() will always be true (since without nUG guest entry is not
>> possible otherwise) and guest's cr3 will be used, but with nUG identity
>> map is not used (that is why L2 still works even though wrong identity
>> map pointer is assigned to cr3), so the code here just corrupts nested
>> guest's cr3 for no reason and that is why you had to use kvm_read_cr3()
>> in prepare_vmcs12() to get correct cr3 value. The patch above should be
>> used instead of original one IMO. How is testing going?
>
> Yes, testing worked fine. I've queued above patch and will send it out
> within the next round.

Just reply here with the commit message you desire and Signed-off-by, so
I can queue it for people who wish to play with nEPT.

Paolo
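For anyone reading the thread later, the effect of the quoted vmx_set_cr3() change can be modeled outside the kernel. What follows is only a sketch of the selection logic for the enable_ept case, not kernel code: struct model_vcpu, guest_cr3_old() and guest_cr3_new() are hypothetical names invented for illustration, and the fields merely stand in for is_paging(), is_guest_mode(), kvm_read_cr3() and ept_identity_map_addr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vCPU state that matters here. */
struct model_vcpu {
	bool paging_enabled;        /* stands in for is_paging(vcpu) */
	bool guest_mode;            /* stands in for is_guest_mode(vcpu), i.e. L2 is running */
	uint64_t arch_cr3;          /* stands in for kvm_read_cr3(vcpu) */
	uint64_t identity_map_addr; /* stands in for kvm->arch.ept_identity_map_addr */
};

/* Selection before the quoted patch: only the paging state is considered. */
static uint64_t guest_cr3_old(const struct model_vcpu *v)
{
	return v->paging_enabled ? v->arch_cr3 : v->identity_map_addr;
}

/*
 * Selection after the quoted patch: while a nested guest is running,
 * the architectural CR3 is kept even if paging is off, so L0's
 * identity map address is never substituted for L2's CR3.
 */
static uint64_t guest_cr3_new(const struct model_vcpu *v)
{
	if (v->paging_enabled || v->guest_mode)
		return v->arch_cr3;
	return v->identity_map_addr;
}
```

The two functions differ only for a non-paging vCPU in guest mode, which is the nUG case Gleb describes: the old selection replaces L2's CR3 with L0's identity map address, the new one leaves it alone.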