From: Avi Kivity
Subject: Re: Enhance perf to support KVM
Date: Fri, 26 Feb 2010 11:53:51 +0200
Message-ID: <4B879A2F.50203@redhat.com>
In-Reply-To: <20100226090147.GH15885@elte.hu>
References: <1267068445.1726.25.camel@localhost> <1267089644.12790.74.camel@laptop> <1267152599.1726.76.camel@localhost> <20100226090147.GH15885@elte.hu>
To: Ingo Molnar
Cc: "Zhang, Yanmin", Peter Zijlstra, ming.m.lin@intel.com,
    sheng.yang@intel.com, Jes Sorensen, KVM General, Zachary Amsden,
    Gleb Natapov, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
    Thomas Gleixner, "H. Peter Anvin", Arjan van de Ven

On 02/26/2010 11:01 AM, Ingo Molnar wrote:
> * Zhang, Yanmin wrote:
>
>> 2) We couldn't get guest os kernel/user stack data in an easy way, so we
>> might not support callchain feature of tool perf. A work around is KVM
>> copies kernel stack data out, so we could at least support guest os
>> kernel callchain.
>
> If the guest is Linux, KVM can get all the info we need.
>
> While the PMU event itself might trigger in an NMI (where we cannot
> access most of KVM's data structures safely), for this specific case of
> KVM instrumentation we can delay the processing to a more appropriate
> time - in fact we can do it in the KVM thread itself.

The NMI will be a synchronous event: it happens in guest context, and we
program the hardware to intercept NMIs, so we just get an exit telling us
that an NMI has happened.
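In rough C terms, the deferral Ingo describes could look something like the
sketch below (purely illustrative - the struct and function names are made
up, not actual KVM symbols): the exit/NMI-time path only records minimal
sample state, and the expensive guest-stack walk runs later in the vcpu
thread, where KVM's data structures can be touched safely.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only; these are not real KVM structures or
 * functions.  The point is the split between cheap capture and
 * deferred processing.
 */
struct pending_sample {
	bool	 valid;
	uint64_t guest_rip;	/* captured at VM exit time */
};

/* Exit/NMI-time path: just stash the minimal state, nothing heavy. */
static void record_pmi(struct pending_sample *s, uint64_t guest_rip)
{
	s->guest_rip = guest_rip;
	s->valid = true;
}

/*
 * Later, in the vcpu thread: the VM state is frozen for this vcpu, so
 * it is safe to walk guest page tables, fill MMU caches, build the
 * callchain, etc.  Returns the recorded RIP (0 if nothing pending) and
 * clears the sample.
 */
static uint64_t process_pending(struct pending_sample *s)
{
	if (!s->valid)
		return 0;
	s->valid = false;
	/* ... guest stack walk / callchain construction would go here ... */
	return s->guest_rip;
}
```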
(It would also be interesting to allow the guest to process the NMI
directly in some scenarios, though that would require that there be no NMI
sources on the host.)

> We can do that because we just triggered a VM exit, so the VM state is
> for all purposes frozen (as far as this virtual CPU goes).

Yes.

> Which gives us plenty of time and opportunity to piggy back to the KVM
> thread, look up the guest stack, process/fill the MMU cache as we walk
> the guest page tables, etc. etc.
>
> It would need some minimal callback facility towards KVM, triggered by a
> perf event PMI.

Since the event is synchronous and KVM is aware of it, we don't need a
callback; KVM can call directly into perf with all the information.

> One additional step needed is to get symbol information from the guest,
> and to integrate it into the symbol cache on the host side in ~/.debug.
> We already support cross-arch symbols and 'perf archive', so the basic
> facilities are there for that. So you can profile on 32-bit PA-RISC and
> type 'perf report' on 64-bit x86 and get all the right info.
>
> For this to work across a guest, a gateway is needed towards the guest.
> There's several ways to achieve this. The most practical would be two
> steps:
>
>  - a user-space facility to access guest images/libraries. (say via ssh,
>    or just a plain TCP port) This would be useful for general 'remote
>    profiling' sessions as well, so it's not KVM specific - it would be
>    useful for remote debugging.
>
>  - The guest /proc/kallsyms (and vmlinux) could be accessed via that
>    channel as well.
>
> (Note that this is purely for guest symbol space access - all the
> profiling data itself comes via the host kernel.)
>
> In theory we could build some sort of 'symbol server' facility into the
> kernel, which could be enabled in guest kernels too - but i suspect
> existing, user-space transports go most of the way already.

There is also vmchannel aka virtio-serial, a guest-to-host communication
channel.
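To make the guest-symbol side concrete, consuming a /proc/kallsyms-style
stream on the host end could look roughly like this (simplified,
illustrative C - not perf's actual symbol code; the line format assumed is
the usual "address type name"):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One parsed kallsyms entry (illustrative, not a perf structure). */
struct ksym {
	uint64_t addr;
	char	 name[64];
};

/* Parse one "ffffffff81000000 T _text"-style line; returns 1 on success. */
static int parse_ksym_line(const char *line, struct ksym *sym)
{
	char type;

	return sscanf(line, "%" SCNx64 " %c %63s",
		      &sym->addr, &type, sym->name) == 3;
}

/*
 * Resolve an address to the nearest preceding symbol.  Assumes the
 * entries are sorted ascending by address, as /proc/kallsyms emits
 * them.  Returns NULL if addr precedes all symbols.
 */
static const char *resolve(const struct ksym *syms, int n, uint64_t addr)
{
	const char *best = NULL;

	for (int i = 0; i < n; i++) {
		if (syms[i].addr <= addr)
			best = syms[i].name;
		else
			break;
	}
	return best;
}
```

Whatever transport carries the data (ssh, a TCP port, or virtio-serial),
this is the shape of what would land in the host-side symbol cache.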
-- 
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.