From: Radim Krčmář
Subject: Re: [PATCH] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
Date: Tue, 29 Nov 2016 17:15:44 +0100
Message-ID: <20161129161543.GB1682@potion>
References: <1479994178-27703-1-git-send-email-lprosek@redhat.com> <20161124182534.GB17619@potion> <20161125141543.GA5878@potion>
To: Ladi Prosek
Cc: KVM list, Paolo Bonzini, Bandan Das

2016-11-28 19:12+0100, Ladi Prosek:
> On Fri, Nov 25, 2016 at 3:15 PM, Radim Krčmář wrote:
>> 2016-11-25 09:44+0100, Ladi Prosek:
>>> What kvm_set_cr3 does:
>>>
>>> * conditional kvm_mmu_sync_roots + kvm_make_request(KVM_REQ_TLB_FLUSH)
>>> ** kvm_mmu_sync_roots will run anyway: kvm_mmu_load <- kvm_mmu_reload
>>> <- vcpu_enter_guest
>>> ** tlb flush will be done anyway: vmx_flush_tlb <- vmx_set_cr3 <-
>>> kvm_mmu_load <- kvm_mmu_reload <- vcpu_enter_guest
>>>
>>> * in long mode, it fails if (cr3 & CR3_L_MODE_RESERVED_BITS)
>>> ** nobody checks the return value
>>> ** Intel manual says "Reserved bits in CR0 and CR3 remain clear after
>>> any load of those registers; attempts to set them have no impact."
>>> Should we just clear the bits and carry on then? This is in conflict
>>> with "#GP(0) .. If an attempt is made to write a 1 to any reserved bit
>>> in CR3." Hmm.
>>
>> The spec is quite clear on this. 26.3.1.1 Checks on Guest Control
>> Registers, Debug Registers, and MSRs:
>>
>> The following checks are performed on processors that support Intel 64
>> architecture: The CR3 field must be such that bits 63:52 and bits in
>> the range 51:32 beyond the processor’s physical-address width are 0.
>>
>> To verify, I tried these two options on top of vmx_vcpu_run
>>
>> vmcs_writel(GUEST_CR3, vmcs_readl(GUEST_CR3) | 1UL << boot_cpu_data.x86_phys_bits);
>> vmcs_writel(GUEST_CR3, vmcs_readl(GUEST_CR3) | CR3_PCID_INVD);
>>
>> and both failed VM entry. We should fail the nested VM entry as well
>> and use cpuid_maxphyaddr() to determine when.
>
> Thanks, I hadn't realized that the rules for VM entry are different
> from regular CR3 loads. One more reason for not using kvm_set_cr3
> here.
>
>> (And I have a bad feeling that guest's physical address width is not
>> being limited by hardware's ...)
>
> Can you elaborate? In which MMU modes would it be causing problems?

It's a corner case on hardware that has fewer physical bits than the
guest is configured for. If L1 then sets bits between its maximum and
the hardware maximum, then VM entry in L0 will fail and that will kill
L1 (report hardware error to userspace). L1 did nothing wrong and the
bug is in L0, so killing L1 if we hit the corner case is the best way ...
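
To make the 26.3.1.1 rule concrete, here is a minimal userspace sketch of
the CR3 check in plain C. It is not the KVM implementation:
cr3_rsvd_mask() and cr3_valid_for_vmentry() are hypothetical names, and
the maxphyaddr argument is assumed to be the width that
cpuid_maxphyaddr() would report for the vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not an existing KVM function.
 * Builds the mask of CR3 bits that must be zero on VM entry:
 * every bit from maxphyaddr up to bit 63.
 */
static inline uint64_t cr3_rsvd_mask(unsigned int maxphyaddr)
{
	/* The quoted SDM text covers widths from 32 to 52 bits. */
	assert(maxphyaddr >= 32 && maxphyaddr <= 52);
	return ~((UINT64_C(1) << maxphyaddr) - 1);
}

/*
 * Hypothetical helper: true if cr3 has no bits set at or above
 * the given physical-address width.
 */
static inline bool cr3_valid_for_vmentry(uint64_t cr3, unsigned int maxphyaddr)
{
	return (cr3 & cr3_rsvd_mask(maxphyaddr)) == 0;
}
```

With a 36-bit width, setting bit 36 in CR3 makes the check fail, and
bit 63 (the position of CR3_PCID_INVD) fails for any width. The corner
case shows up when the same value is checked twice: a CR3 with bit 39
set passes against a guest width of 40 but fails against a hardware
width of 36.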