From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
Date: Sat, 17 Apr 2010 21:12:41 +0300
Message-ID: <4BC9FA19.2070602@redhat.com>
References: <1902387910.2078.435.camel@ymzhang.sh.intel.com> <20100415104051.GC12697@8bytes.org> <4BC6EDFF.3000702@redhat.com> <201004152208.08409.sheng@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Joerg Roedel , "Zhang, Yanmin" , Ingo Molnar , Peter Zijlstra , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Marcelo Tosatti , Jes Sorensen , Gleb Natapov , Zachary Amsden , zhiteng.huang@intel.com, tim.c.chen@intel.com, Arnaldo Carvalho de Melo
To: Sheng Yang
Return-path:
In-Reply-To: <201004152208.08409.sheng@linux.intel.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 04/15/2010 05:08 PM, Sheng Yang wrote:
> On Thursday 15 April 2010 18:44:15 Avi Kivity wrote:
>> On 04/15/2010 01:40 PM, Joerg Roedel wrote:
>>>> That means an NMI that happens outside guest code (for example, in the
>>>> mmu, or during the exit itself) would be counted as if in guest code.
>>>
>>> Hmm, true. The same is true for an NMI that happens between VMSAVE and
>>> STGI, but that window is smaller. Anyway, I think we don't need the
>>> busy-wait loop. The NMI should be executed at a well defined point and
>>> we set the cpu_var back to NULL after that point.
>>
>> The point is not well defined. Considering there are already at least
>> two implementations of svm, I don't want to rely on implementation details.
>
> After more investigation, I realized that I had interpreted the SDM wrong.
> Sorry.
>
> There is *no* risk with the original method of calling "int $2".
> According to the SDM 24.1:
>
>> The following bullets detail when architectural state is and is not updated
>> in response to VM exits:
>> [...]
>> - An NMI causes subsequent NMIs to be blocked, but only after the VM exit
>> completes.
>
> So the truth is, after an NMI directly causes a VM exit, the following NMIs would be
> blocked until the next "iret" is encountered. So executing "int $2" is safe in
> vmx_complete_interrupts(), with no risk of causing a nested NMI. And it would unblock
> the following NMIs as well, due to the "iret" it executes.
>
> So it is unnecessary to make changes to avoid the "potential nested NMI".

Let's look at the surrounding text...

> The following bullets detail when architectural state is and is not
> updated in response to VM exits:
> • If an event causes a VM exit directly, it does not update
>   architectural state as it would have if it had not caused the VM exit:
>   — A debug exception does not update DR6, DR7.GD, or IA32_DEBUGCTL.LBR.
>     (Information about the nature of the debug exception is saved
>     in the exit qualification field.)
>   — A page fault does not update CR2. (The linear address causing
>     the page fault is saved in the exit-qualification field.)
>   — An NMI causes subsequent NMIs to be blocked, but only after the
>     VM exit completes.
>   — An external interrupt does not acknowledge the interrupt
>     controller and the interrupt remains pending, unless the
>     "acknowledge interrupt on exit" VM-exit control is 1. In such a case,
>     the interrupt controller is acknowledged and the interrupt is no
>     longer pending.

Everywhere it says state is _not_ updated, so I think what is meant is
that NMIs are blocked, but only _until_ the VM exit completes.

I think you were right the first time around. Can you check with your
architecture team?
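For reference, the exit path being argued over can be sketched roughly as follows. This is pseudocode, not runnable code: the shape follows the vmx exit path and the per-cpu guest marker from the patch under discussion, but the names and ordering here are illustrative, not the actual implementation.

```
/* Pseudocode sketch (illustrative only) of the window under discussion.
 * Assumption: the patch marks the current vcpu in a per-cpu variable so
 * that a perf NMI can attribute the sample to guest or host context.
 */
vcpu_enter_guest(vcpu):
    percpu(current_vcpu) = vcpu    /* perf NMI here onward -> "guest" */
    VMLAUNCH / VMRESUME            /* guest runs; an NMI in guest mode
                                      causes a direct VM exit, and
                                      further NMIs stay blocked */
    vmx_complete_interrupts(vcpu):
        if exit was caused directly by an NMI:
            int $2                 /* run the host NMI handler now; its
                                      "iret" unblocks subsequent NMIs
                                      (Sheng's reading of SDM 24.1) */
    percpu(current_vcpu) = NULL    /* Avi's concern: an NMI landing
                                      between the VM exit and this line
                                      (e.g. in the mmu) is still
                                      attributed to the guest */
```

The two readings of the SDM differ only in when the NMI-blocking window closes: if blocking persists until the next "iret", the `int $2` call is safe against nested NMIs; if blocking ends when the VM exit completes, a second NMI could arrive before the marker is cleared.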
-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.