From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
Date: Sat, 17 Apr 2010 21:12:41 +0300
Message-ID: <4BC9FA19.2070602@redhat.com>
References: <1902387910.2078.435.camel@ymzhang.sh.intel.com> <20100415104051.GC12697@8bytes.org> <4BC6EDFF.3000702@redhat.com> <201004152208.08409.sheng@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Joerg Roedel , "Zhang, Yanmin" , Ingo Molnar , Peter Zijlstra , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Marcelo Tosatti , Jes Sorensen , Gleb Natapov , Zachary Amsden , zhiteng.huang@intel.com, tim.c.chen@intel.com, Arnaldo Carvalho de Melo
To: Sheng Yang
Return-path:
In-Reply-To: <201004152208.08409.sheng@linux.intel.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 04/15/2010 05:08 PM, Sheng Yang wrote:
> On Thursday 15 April 2010 18:44:15 Avi Kivity wrote:
>> On 04/15/2010 01:40 PM, Joerg Roedel wrote:
>>>> That means an NMI that happens outside guest code (for example, in the
>>>> mmu, or during the exit itself) would be counted as if in guest code.
>>>
>>> Hmm, true. The same is true for an NMI that happens between VMSAVE and
>>> STGI, but that window is smaller. Anyway, I think we don't need the
>>> busy-wait loop. The NMI should be executed at a well defined point and
>>> we set the cpu_var back to NULL after that point.
>>
>> The point is not well defined. Considering there are already at least
>> two implementations of svm, I don't want to rely on implementation details.
>
> After more investigation, I realized that I had interpreted the SDM wrong.
> Sorry.
>
> There is *no* risk with the original method of calling "int $2".
> According to the SDM 24.1:
>
>> The following bullets detail when architectural state is and is not updated
>> in response to VM exits:
>> [...]
>> - An NMI causes subsequent NMIs to be blocked, but only after the VM exit
>> completes.
>
> So the truth is, after an NMI directly causes a VM exit, the following NMIs would be
> blocked until the next "iret" is encountered. So executing "int $2" is safe in
> vmx_complete_interrupts(), with no risk of causing a nested NMI. And it would unblock
> the following NMIs as well, due to the "iret" it executes.
>
> So it is unnecessary to make changes to avoid the "potential nested NMI".

Let's look at the surrounding text...

> The following bullets detail when architectural state is and is not
> updated in response to VM exits:
> • If an event causes a VM exit directly, it does not update
>   architectural state as it would have if it had not caused the VM exit:
>   — A debug exception does not update DR6, DR7.GD, or IA32_DEBUGCTL.LBR.
>     (Information about the nature of the debug exception is saved
>     in the exit qualification field.)
>   — A page fault does not update CR2. (The linear address causing
>     the page fault is saved in the exit-qualification field.)
>   — An NMI causes subsequent NMIs to be blocked, but only after the
>     VM exit completes.
>   — An external interrupt does not acknowledge the interrupt
>     controller and the interrupt remains pending, unless the
>     "acknowledge interrupt on exit" VM-exit control is 1. In such a case,
>     the interrupt controller is acknowledged and the interrupt is no
>     longer pending.

Everywhere it says state is _not_ updated, so I think what is meant is
that NMIs are blocked, but only _until_ the VM exit completes.

I think you were right the first time around. Can you check with your
architecture team?
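For reference, the exit path being argued over can be sketched roughly as follows. This is pseudocode, not runnable code: the shape follows the vmx exit path and the per-cpu guest marker from the patch under discussion, but the names and ordering here are illustrative, not the actual implementation.

```
/* Pseudocode sketch (illustrative only) of the window under discussion.
 * Assumption: the patch marks the current vcpu in a per-cpu variable so
 * that a perf NMI can attribute the sample to guest or host context.
 */
vcpu_enter_guest(vcpu):
    percpu(current_vcpu) = vcpu    /* perf NMI here onward -> "guest" */
    VMLAUNCH / VMRESUME            /* guest runs; an NMI in guest mode
                                      causes a direct VM exit, and
                                      further NMIs stay blocked */
    vmx_complete_interrupts(vcpu):
        if exit was caused directly by an NMI:
            int $2                 /* run the host NMI handler now; its
                                      "iret" unblocks subsequent NMIs
                                      (Sheng's reading of SDM 24.1) */
    percpu(current_vcpu) = NULL    /* Avi's concern: an NMI landing
                                      between the VM exit and this line
                                      (e.g. in the mmu) is still
                                      attributed to the guest */
```

The two readings of the SDM differ only in when the NMI-blocking window closes: if blocking persists until the next "iret", the `int $2` call is safe against nested NMIs; if blocking ends when the VM exit completes, a second NMI could arrive before the marker is cleared.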
-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.