From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi.kivity@gmail.com>
Subject: Re: 2 CPU Conformance Issue in KVM/x86
Date: Mon, 09 Mar 2015 20:23:07 +0200
Message-ID: <54FDE50B.8040408@gmail.com>
References: <A6F671BC-983C-4005-87E9-FCC68DEF0D30@gmail.com> <54F58471.7020906@redhat.com> <54FDD39C.9060908@gmail.com> <6073FF8F-E261-4DC3-817A-9F4A46B5C0DB@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm list <kvm@vger.kernel.org>,
	=?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= <rkrcmar@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-we0-f177.google.com ([74.125.82.177]:44998 "EHLO
	mail-we0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750741AbbCISXM (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 9 Mar 2015 14:23:12 -0400
Received: by wesp10 with SMTP id p10so8579329wes.11
        for <kvm@vger.kernel.org>; Mon, 09 Mar 2015 11:23:10 -0700 (PDT)
In-Reply-To: <6073FF8F-E261-4DC3-817A-9F4A46B5C0DB@gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 03/09/2015 07:51 PM, Nadav Amit wrote:
> Avi Kivity <avi.kivity@gmail.com> wrote:
>
>> On 03/03/2015 11:52 AM, Paolo Bonzini wrote:
>>>> In this
>>>> case, the VM might expect exceptions when PTE bits which are higher than the
>>>> maximum (reported) address width are set, and it would not get such
>>>> exceptions. This problem can easily be experienced by small change to the
>>>> existing KVM unit-tests.
>>>>
>>>> There are many variants to this problem, and the only solution which I
>>>> consider complete is to report to the VM the maximum (52) physical address
>>>> width to the VM, configure the VM to exit on #PF with reserved-bit
>>>> error-codes, and then emulate these faulting instructions.
>>> Not even that would be a definitive solution.  If the guest tries to map
>>> RAM (e.g. a PCI BAR that is backed by RAM) above the host MAXPHYADDR,
>>> you would get EPT misconfiguration vmexits.
>>>
>>> I think there is no way to emulate physical address width correctly,
>>> except by disabling EPT.
>> Is the issue emulating a higher MAXPHYADDR on the guest than is available
>> on the host? I don't think there's any need to support that.
>>
>> Emulating a lower setting on the guest than is available on the host is, I
>> think, desirable. Whether it would work depends on the relative priority
>> of EPT misconfiguration exits vs. page table permission faults.
> Thanks for the feedback.
>
> Guest page-table permissions faults got priority over EPT misconfiguration.
> KVM can even be set to trap page-table permission faults, at least in VT-x.
> Anyhow, I don’t think it is enough.

Why is it not enough? If you trap a permission fault, you can inject any
exception error code you like.
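
A rough sketch of that fix-up, in Python purely for illustration. None of
these names are KVM functions, and the guest MAXPHYADDR of 40 is a made-up
value. It assumes the usual 64-bit paging-entry layout, where bits
51:MAXPHYADDR are reserved, and that the hypervisor has already trapped the
#PF and read the guest PTE that caused it:

```python
# #PF error-code bits (architectural values).
PFERR_PRESENT = 1 << 0
PFERR_WRITE = 1 << 1
PFERR_USER = 1 << 2
PFERR_RSVD = 1 << 3


def guest_rsvd_mask(guest_maxphyaddr):
    """Bits [51:guest_maxphyaddr] of a paging-structure entry."""
    return ((1 << 52) - 1) & ~((1 << guest_maxphyaddr) - 1)


def fixup_error_code(pte, hw_error_code, guest_maxphyaddr):
    """Return the error code to inject instead of the hardware one.

    Reserved bits are only checked on present entries, so RSVD is added
    only when the PTE has its present bit (bit 0) set.
    """
    if (pte & 1) and (pte & guest_rsvd_mask(guest_maxphyaddr)):
        return hw_error_code | PFERR_RSVD
    return hw_error_code


# Hypothetical case: guest told MAXPHYADDR = 40, PTE has bit 45 set.
pte = (1 << 45) | 0x2000000 | 1
print(hex(fixup_error_code(pte, PFERR_PRESENT | PFERR_USER, 40)))  # 0xd
```

The hardware, which only knows the host width, reports 5; the guest-visible
width says bit 45 is reserved, so the injected code becomes d.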

>   Here is an example
>
> My machine has MAXPHYADDR of 46. I modified kvm-unit-tests access test to
> set pte.45 instead of pte.51, which from the VM point-of-view should cause
> the #PF error-code indicate the reserved bits are set (just as pte.51 does).
> Here is one error from the log:
>
> test pte.p pte.45 pde.p user: FAIL: error code 5 expected d
> Dump mapping: address: 123400000000
> ------L4: 304b007
> ------L3: 304c007
> ------L2: 304d001
> ------L1: 200002000001

This is with an EPT misconfig programmed into that address, yes?
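
For reference, the numbers in the quoted log line can be decoded with a few
lines of Python. This is only a reading aid, using the architectural #PF
error-code bit positions; the helper names are made up:

```python
# Architectural #PF error-code bits: P, W/R, U/S, RSVD, I/D.
PF_BITS = {0: "P", 1: "W", 2: "U", 3: "RSVD", 4: "I"}


def decode_pf(error_code):
    """Names of the error-code bits that are set."""
    return [name for bit, name in PF_BITS.items() if (error_code >> bit) & 1]


def set_bits(value):
    """Positions of the bits set in a page-table entry."""
    return [b for b in range(value.bit_length()) if (value >> b) & 1]


print(decode_pf(0x5))            # ['P', 'U']
print(decode_pf(0xd))            # ['P', 'U', 'RSVD']
print(set_bits(0x200002000001))  # [0, 25, 45]
```

So the L1 entry is present (bit 0), has U/S (bit 2) clear, points at frame
0x2000000, and has bit 45 set. The reported code 5 is the user-mode
protection fault alone; the expected code d is the same plus RSVD.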

> As you can see, the #PF should have had two reasons: reserved bits, and user
> access to supervisor only page. The error-code however does not indicate the
> reserved-bits are set.
>
> Note that KVM did not trap any exit on that faulting instruction, as
> otherwise it would try to emulate the instruction and assuming it is
> supported (and that the #PF was not on an instruction fetch), should be able
> to emulate the #PF correctly.
> [ The test actually crashes soon after this error due to these reasons. ]
>
> Anyhow, that is the reason for me to assume that having the maximum
> MAXPHYADDR is better.
>

Well, that doesn't work for the reasons Paolo noted.  The guest can have
an ivshmem device attached, and map it above a host-supported physical
address, and suddenly it goes slow.
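
The failing case is just a range check against the host MAXPHYADDR. A small
Python sketch with made-up BAR addresses and sizes (the function name is
hypothetical, not anything in KVM):

```python
def fits_host(gpa, size, host_maxphyaddr):
    """True if [gpa, gpa + size) lies entirely below 2**host_maxphyaddr."""
    last = gpa + size - 1
    return (last >> host_maxphyaddr) == 0


HOST_MAXPHYADDR = 46  # the host from the quoted test report
BAR_SIZE = 1 << 20    # assumed 1 MiB RAM-backed BAR

# A guest told MAXPHYADDR = 52 may legitimately pick either address.
print(fits_host(1 << 45, BAR_SIZE, HOST_MAXPHYADDR))  # True
print(fits_host(1 << 46, BAR_SIZE, HOST_MAXPHYADDR))  # False
```

The second mapping cannot be entered into the host's EPT as an ordinary
translation, so every access to it ends up being handled by exits.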