From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ingo Molnar <mingo@elte.hu>
Subject: Re: KVM PMU virtualization
Date: Fri, 26 Feb 2010 11:56:21 +0100
Message-ID: <20100226105621.GD7463@elte.hu>
References: <4B86917C.4070102@redhat.com>
 <20100225173423.GB4246@8bytes.org>
 <1267152917.1726.82.camel@localhost>
 <20100226085105.GC4246@8bytes.org>
 <20100226091732.GI15885@elte.hu>
 <20100226104218.GE4246@8bytes.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
	Jes Sorensen <Jes.Sorensen@redhat.com>,
	KVM General <kvm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Avi Kivity <avi@redhat.com>,
	Zachary Amsden <zamsden@redhat.com>,
	Gleb Natapov <gleb@redhat.com>, ming.m.lin@intel.com
To: Joerg Roedel <joro@8bytes.org>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx2.mail.elte.hu ([157.181.151.9]:39869 "EHLO mx2.mail.elte.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933370Ab0BZK4d (ORCPT <rfc822;kvm@vger.kernel.org>);
	Fri, 26 Feb 2010 05:56:33 -0500
Content-Disposition: inline
In-Reply-To: <20100226104218.GE4246@8bytes.org>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


* Joerg Roedel <joro@8bytes.org> wrote:

> On Fri, Feb 26, 2010 at 10:17:32AM +0100, Ingo Molnar wrote:
> > My suggestion, as always, would be to start very simple and very minimal:
> > 
> > Enable 'perf kvm top' to show guest overhead. Use the exact same kernel image 
> > both as a host and as guest (for testing), to not have to deal with the symbol 
> > space transport problem initially. Enable 'perf kvm record' to only record 
> > guest events by default. Etc.
> > 
> > This alone will be a quite useful result already - and gives a basis for 
> > further work. No need to spend months to do the big grand design straight 
> > away, all of this can be done gradually and in the order of usefulness - and 
> > you'll always have something that actually works (and helps your other KVM 
> > projects) along the way.
> > 
> > [ And, as so often, once you walk that path, that grand scheme you are 
> >   thinking about right now might easily become last year's really bad idea ;-) ]
> > 
> > So please start walking the path and experience the challenges first-hand.
> 
> That sounds like a good approach for the 'measure-guest-from-host'
> problem. It is also not very hard to implement. Where does perf fetch
> the rip of the nmi from, stack only or is this configurable?

On the host, perf takes the sample IP and the stack from the regs of the 
interrupted context. With call-graph recording (perf record -g) it also walks 
down the exception stack, irq stack, kernel stack, and user-space stack, up to 
the point where the pages are present: it stops on a non-present page. An app 
that is being profiled has its stack present, so that's not an issue in practice.
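
As a rough user-space sketch of that stop-on-non-present-page rule (this is 
not the kernel's walker; the sketch_* names and the 'present' flag are made 
up for illustration, and real code would check the page tables instead):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_FRAME ((size_t)-1)    /* hypothetical end-of-chain marker */

/* Hypothetical stand-in for one stack frame in simulated memory. */
struct sketch_frame {
        size_t   next;      /* index of the caller's frame, or SKETCH_NO_FRAME */
        uint64_t ret_addr;  /* return address saved in this frame */
        bool     present;   /* stand-in for "the page holding this frame is mapped" */
};

/*
 * Walk the frame chain starting at 'start' and copy return addresses to
 * 'out'. The walk ends at the end of the chain, at the first frame whose
 * page is not present, or when 'out' is full (which also bounds loops).
 * Returns the number of entries stored.
 */
static size_t sketch_walk_frames(const struct sketch_frame *frames,
                                 size_t nframes, size_t start,
                                 uint64_t *out, size_t out_cap)
{
        size_t cur = start;
        size_t n = 0;

        while (cur != SKETCH_NO_FRAME && cur < nframes && n < out_cap) {
                if (!frames[cur].present)
                        break;          /* non-present page: stop here */
                out[n++] = frames[cur].ret_addr;
                cur = frames[cur].next;
        }
        return n;
}
```

A chain of three frames where the second one is not present yields just the 
first return address; the sample is truncated, not dropped.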

I'd suggest leaving out call-graph sampling initially, and just getting 'perf 
kvm top' to work with guest RIPs, simply sampled from the VM exit state.

See arch/x86/kernel/cpu/perf_event.c:

static void
perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
{
        callchain_store(entry, PERF_CONTEXT_KERNEL);
        callchain_store(entry, regs->ip);

        dump_trace(NULL, regs, NULL, regs->bp, &backtrace_ops, entry);
}

If you have easy access to the VM state from NMI context right there, then just 
hack in the guest RIP and you should have a prototype that samples the guest 
(assuming you use the same kernel image for both the host and the guest).
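
Roughly what I mean, as a stand-alone mock and not a patch: everything named 
sketch_* is hypothetical, including the 'in_guest' / 'guest_rip' fields, which 
stand in for whatever VM exit state KVM can expose in NMI context. The marker 
value is a placeholder for PERF_CONTEXT_KERNEL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEPTH       8               /* small, for the sketch only */
#define SKETCH_CONTEXT_KERNEL  ((uint64_t)-128) /* placeholder context marker */

struct sketch_regs {                    /* stand-in for struct pt_regs */
        uint64_t ip;
        uint64_t bp;
};

struct sketch_vcpu_state {              /* hypothetical view of VM exit state */
        bool     in_guest;              /* did the NMI interrupt guest mode? */
        uint64_t guest_rip;             /* guest RIP saved at VM exit */
};

struct sketch_callchain {               /* stand-in for perf_callchain_entry */
        size_t   nr;
        uint64_t ip[SKETCH_MAX_DEPTH];
};

static void sketch_callchain_store(struct sketch_callchain *entry, uint64_t ip)
{
        assert(entry != NULL);
        if (entry->nr < SKETCH_MAX_DEPTH)       /* silently drop when full */
                entry->ip[entry->nr++] = ip;
}

static void sketch_callchain_kernel(const struct sketch_regs *regs,
                                    const struct sketch_vcpu_state *vcpu,
                                    struct sketch_callchain *entry)
{
        sketch_callchain_store(entry, SKETCH_CONTEXT_KERNEL);

        if (vcpu && vcpu->in_guest) {
                /* Hack: report the guest RIP, and skip the stack walk. */
                sketch_callchain_store(entry, vcpu->guest_rip);
                return;
        }

        sketch_callchain_store(entry, regs->ip);
        /* The real function would walk the host stack from regs->bp here. */
}
```

With the same kernel image on both sides, the guest RIPs recorded this way 
resolve against the host's symbol table without any extra symbol transport.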

This would be the easiest way to prototype it all.

	Ingo