From: Avi Kivity
Subject: Re: 2 CPU Conformance Issue in KVM/x86
Date: Mon, 09 Mar 2015 21:19:29 +0200
Message-ID: <54FDF241.8080002@gmail.com>
References: <54F58471.7020906@redhat.com> <54FDD39C.9060908@gmail.com> <6073FF8F-E261-4DC3-817A-9F4A46B5C0DB@gmail.com> <54FDE50B.8040408@gmail.com> <13DCF857-5591-4499-9B0D-4165268E9CE8@gmail.com>
In-Reply-To: <13DCF857-5591-4499-9B0D-4165268E9CE8@gmail.com>
To: Nadav Amit
Cc: Paolo Bonzini, kvm list, Radim Krčmář

On 03/09/2015 09:07 PM, Nadav Amit wrote:
> Avi Kivity wrote:
>
>> On 03/09/2015 07:51 PM, Nadav Amit wrote:
>>> Avi Kivity wrote:
>>>
>>>> On 03/03/2015 11:52 AM, Paolo Bonzini wrote:
>>>>>> In this
>>>>>> case, the VM might expect exceptions when PTE bits which are higher than the
>>>>>> maximum (reported) address width are set, and it would not get such
>>>>>> exceptions. This problem can easily be experienced by small change to the
>>>>>> existing KVM unit-tests.
>>>>>>
>>>>>> There are many variants to this problem, and the only solution which I
>>>>>> consider complete is to report to the VM the maximum (52) physical address
>>>>>> width to the VM, configure the VM to exit on #PF with reserved-bit
>>>>>> error-codes, and then emulate these faulting instructions.
>>>>> Not even that would be a definitive solution. If the guest tries to map
>>>>> RAM (e.g. a PCI BAR that is backed by RAM) above the host MAXPHYADDR,
>>>>> you would get EPT misconfiguration vmexits.
>>>>>
>>>>> I think there is no way to emulate physical address width correctly,
>>>>> except by disabling EPT.
>>>> Is the issue emulating a higher MAXPHYADDR on the guest than is available
>>>> on the host? I don't think there's any need to support that.
>>>>
>>>> Emulating a lower setting on the guest than is available on the host is, I
>>>> think, desirable. Whether it would work depends on the relative priority
>>>> of EPT misconfiguration exits vs. page table permission faults.
>>> Thanks for the feedback.
>>>
>>> Guest page-table permissions faults got priority over EPT misconfiguration.
>>> KVM can even be set to trap page-table permission faults, at least in VT-x.
>>> Anyhow, I don’t think it is enough.
>> Why is it not enough? If you trap a permission fault, you can inject any exception error code you like.
> Because there is no real permission fault. In the following example, the VM
> expects one (VM’s MAXPHYADDR=40), but there isn’t (Host’s MAXPHYADDR=46), so
> the hypervisor cannot trap it. It can only trap all #PF, which is obviously
> too intrusive.

There are three cases:

1) The guest has marked the page as not present. In this case, reserved
bits are not checked and the guest should receive its #PF.

2) The page is present and the permissions are sufficient. In this
case, you will get an EPT misconfiguration and can proceed to inject a
#PF with the reserved bit flag set.

3) The page is present but permissions are not sufficient. In this case
you can trap the fault via the PFEC_MASK register and inject a #PF to
the guest.

So you can emulate it and only trap permission faults. It's still too
expensive though.

>>> Here is an example
>>>
>>> My machine has MAXPHYADDR of 46. I modified kvm-unit-tests access test to
>>> set pte.45 instead of pte.51, which from the VM point-of-view should cause
>>> the #PF error-code indicate the reserved bits are set (just as pte.51 does).
>>> Here is one error from the log:
>>>
>>> test pte.p pte.45 pde.p user: FAIL: error code 5 expected d
>>> Dump mapping: address: 123400000000
>>> ------L4: 304b007
>>> ------L3: 304c007
>>> ------L2: 304d001
>>> ------L1: 200002000001
>> This is with an ept misconfig programmed into that address, yes?
> A reserved bit in the PTE is set - from the VM point-of-view. If there
> wasn’t another cause for #PF, it would lead to EPT violation/misconfig.
>
>>> As you can see, the #PF should have had two reasons: reserved bits, and user
>>> access to supervisor only page. The error-code however does not indicate the
>>> reserved-bits are set.
>>>
>>> Note that KVM did not trap any exit on that faulting instruction, as
>>> otherwise it would try to emulate the instruction and assuming it is
>>> supported (and that the #PF was not on an instruction fetch), should be able
>>> to emulate the #PF correctly.
>>> [ The test actually crashes soon after this error due to these reasons. ]
>>>
>>> Anyhow, that is the reason for me to assume that having the maximum
>>> MAXPHYADDR is better.
>> Well, that doesn't work for the reasons Paolo noted. The guest can have a ivshmem device attached, and map it above a host-supported virtual address, and suddenly it goes slow.
> I fully understand. That’s the reason I don’t have a reasonable solution.

I can't think of one with reasonable performance either. Perhaps the
maintainers could raise the issue with Intel. It looks academic but it
can happen in real life -- KVM for example used to rely on reserved bits
faults (it set all bits in the PTE so it wouldn't have been caught by
this).
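
For anyone following along, the three cases and the "error code 5
expected d" failure can be modelled with a small sketch. This is only an
illustration, not KVM code: the function names (rsvd_mask, pfec,
avi_case) are made up for this mail, the model looks at the leaf PTE
only, and it assumes CR0.WP=1 with no NX/SMEP/SMAP and a guest
MAXPHYADDR no larger than the host's. The error-code bit values are the
architectural ones.

```python
# Architectural x86 #PF error-code bits.
PFEC_P, PFEC_W, PFEC_U, PFEC_RSVD = 1 << 0, 1 << 1, 1 << 2, 1 << 3
# Paging-entry flag bits used by this sketch.
PTE_P, PTE_W, PTE_U = 1 << 0, 1 << 1, 1 << 2


def rsvd_mask(maxphyaddr):
    """Physical-address bits maxphyaddr..51 of an entry are reserved."""
    return ((1 << 52) - 1) & ~((1 << maxphyaddr) - 1)


def pfec(pte, maxphyaddr, user, write):
    """Error code a CPU with this MAXPHYADDR reports, or None if no #PF.

    Simplified (hypothetical) model: leaf PTE only, CR0.WP=1,
    no NX/SMEP/SMAP.
    """
    access = (PFEC_U if user else 0) | (PFEC_W if write else 0)
    if not pte & PTE_P:
        return access  # not present: P=0, reserved bits are not checked
    if pte & rsvd_mask(maxphyaddr):
        return access | PFEC_P | PFEC_RSVD
    denied = (user and not pte & PTE_U) or (write and not pte & PTE_W)
    return access | PFEC_P if denied else None


def avi_case(pte, guest_maxphyaddr, host_maxphyaddr, user, write):
    """Classify an access into the three cases from this mail.

    Assumes guest_maxphyaddr <= host_maxphyaddr.
    """
    want = pfec(pte, guest_maxphyaddr, user, write)  # what the guest expects
    got = pfec(pte, host_maxphyaddr, user, write)    # what hardware delivers
    if want == got:
        return "native: hardware already does what the guest expects"
    if got is None:
        return "case 2: EPT misconfig exit, inject #PF with RSVD set"
    return "case 3: trap the permission fault, inject #PF with RSVD set"


# The L1 entry from the dump: present, supervisor-only, bit 45 set.
pte = 0x200002000001
print(hex(pfec(pte, 46, user=True, write=False)))  # host view (MAXPHYADDR=46)
print(hex(pfec(pte, 40, user=True, write=False)))  # guest view (MAXPHYADDR=40)
print(avi_case(pte, 40, 46, user=True, write=False))
```

With the entry from the dump, the host view yields 0x5 and the guest
view (MAXPHYADDR=40, as in the earlier example) yields 0xd, which is the
mismatch the test reports. The access falls into case 3, so the
hypervisor would have to trap the permission fault to fix up the error
code.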