Performance monitoring units and KVM

public inbox for kvm@vger.kernel.org

* Performance monitoring units and KVM
@ 2008-01-30 17:06 Markus Armbruster
       [not found] ` <87wsprxmyb.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2008-01-30 17:06 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Before I talk about performance monitoring units (PMUs) and KVM, let
me sketch PMUs and the software we have to put them to use.  You may
wish to skip to the next occurrence of "KVM".

Modern processors sport PMUs in various forms and shapes.  The
simplest form is a couple of performance counters, each of which can
be programmed to count occurrences of a certain event and to raise an
interrupt after a certain number of them.  Supported events depend on
the PMU, and typically include cycles spent executing instructions,
cache misses, stall cycles and such.  Precision varies; there's
undercounting, and the interrupt can occur pretty far from the
instruction that triggered the event.  Smarter hardware exists that
can record samples in a buffer and interrupt only when that buffer
fills up.
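To make the simple "count, then interrupt" mode concrete, here is a small C model of one counter. It is a sketch, not code from OProfile or any real driver: it assumes the common convention of preloading the counter with the negated sampling period so that wrapping to zero marks the sample point, and struct pmu_counter and its fields are hypothetical names. Real counter widths are model specific; the 40 bits in the usage note is only an example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of one PMU counter. */
struct pmu_counter {
    uint64_t value;     /* current count */
    uint64_t mask;      /* counter width, e.g. (1ULL << 40) - 1 */
    uint64_t period;    /* events per sample */
    unsigned samples;   /* number of overflow "interrupts" taken */
};

/* Preload with -period so the counter wraps after 'period' events. */
static void counter_arm(struct pmu_counter *c)
{
    assert(c->period > 0 && c->period <= c->mask);
    c->value = (0 - c->period) & c->mask;
}

/* Count one event; on wrap-around, take a "sample" and re-arm. */
static void counter_event(struct pmu_counter *c)
{
    c->value = (c->value + 1) & c->mask;
    if (c->value == 0) {
        c->samples++;   /* real hardware would interrupt (often an NMI) here */
        counter_arm(c);
    }
}
```

Usage: with .mask = (1ULL << 40) - 1 and .period = 1000, calling counter_arm() once and then counter_event() 2500 times leaves samples == 2.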

Our existing tool for putting the PMUs to use is OProfile.  OProfile
is a low-overhead, transparent (no instrumentation), system-wide
profiler.

OProfile consists of a kernel part that messes with the hardware and
delivers samples to user space, a daemon that records these samples,
and utilities for control and analysis.

OProfile is a very useful tool, but it has its limitations.  It uses
PMUs only in their simple form.  Users also want to monitor single
threads instead of the whole system, write applications that monitor
selected parts of themselves and more.  Perfmon2 attempts to provide a
generic interface to all the various PMUs that can support all that.
But it's a big, complex hairball out of tree, and merging it will take
time and hard work.

So, what does all this have to do with virtualization in general and
KVM in particular?

As I explained above, use of the PMU beyond what OProfile can do is
quite a hairball.  Adding virtualization to it can only make it
hairier.  I feel that hairball needs to be untangled elsewhere before
we touch it.  That leaves system-wide profiling.

System-wide profiling comes with two competing definitions of system:
virtual and real.  Both are useful.  And both need work.

System-wide profiling of the *virtual* machine is related to profiling
just a process.  That's hard.  I guess building on Perfmon2 would make
sense there, but as long as it's out of tree...  Can we wait for it?
If not, what then?

System-wide profiling of the *real* machine we already have: OProfile.
The fact that we're running guests doesn't disturb it.
However, the presence of guests makes it harder to interpret samples: we
need to map code addresses to programs/DSOs.  The information necessary
for that lives in the guest kernel.
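Mapping an address to a program or DSO is a lookup in the per-process memory map; the difficulty described here is only that, for guest code, that map lives in the guest kernel. A minimal C sketch of the lookup itself, assuming a hypothetical array of non-overlapping mappings sorted by start address (struct mapping and resolve_addr are illustrative names, loosely modeled on /proc/<pid>/maps):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mapped region of a process: [start, end) backed by 'image'. */
struct mapping {
    uint64_t start, end;
    const char *image;     /* program or DSO path */
};

/* Binary search over mappings sorted by start and non-overlapping.
 * Returns the image containing addr, or NULL if addr is unmapped. */
static const char *resolve_addr(const struct mapping *m, size_t n,
                                uint64_t addr)
{
    assert(m != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < m[mid].start)
            hi = mid;
        else if (addr >= m[mid].end)
            lo = mid + 1;
        else
            return m[mid].image;
    }
    return NULL;
}
```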

An obvious way to join the sample with the information is delegating
the recording of samples to the guest.  Note, however, that you then
need to set up the recording of samples in each guest in addition to
the host, which is inconvenient.

Such a real-system-wide profiler already exists for Xen: Xenoprof,
which is a patch to OProfile.  Here's how it works.

Xenoprof splits OProfile's kernel part: the hardware part moves into
the hypervisor, while the deliver-to-user-space part remains in the
kernel.  Kernel and hypervisor talk through hypercalls, shared memory
and virtual interrupts.  Instead of the driver for the real PMU, the
kernel uses a special Xenoprof driver that talks to the hypervisor.
The hypervisor accepts commands controlling the PMU only from the
privileged guest (dom0).

Xen guests (domains in Xen parlance) running Xenoprof are called
active: they receive their samples from the hypervisor and record
them.  Domains not running Xenoprof are called passive, and the
hypervisor routes their samples to dom0 for recording.  Dom0 can make
sense of passive domains' Linux kernel-space samples, if given
suitable kernel symbols, but can't make sense of their user-space samples.

Active Xenoprof is useful because it gets you the most data.  It's
also quite a rain dance to use: starting and stopping profiling takes
several steps spread over all active domains, and if you misstep,
things tend to fail in the most confusing way imaginable.  More robust
error handling and better automation should help there.

Passive Xenoprof is useful because it works even when a domain can't
cooperate (no Xenoprof), or you can't be bothered to start OProfile
there.

The same ideas should work for KVM.  The whole hypervisor headache
just evaporates, of course.  What remains is the host kernel routing
samples to active guests (over virtio, I guess), and guest kernels
receiving samples from there instead of the hardware PMU.  In other
words, the sample channel from the host becomes our virtual PMU for
the guest.  Which needs a driver for it.  It's a weird PMU, because
you can't program its performance counters.  That's left to the host.
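The routing described above might look roughly like the following C sketch. Everything here is hypothetical: the per-guest ring stands in for whatever channel (virtio or otherwise) would actually carry samples, and the passive branch simply logs on the host, as dom0 does for passive Xen domains. Full rings drop samples rather than block, on the assumption that the producer runs in interrupt context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8  /* arbitrary size for the sketch */

struct sample {
    uint64_t ip;     /* guest instruction pointer at the sample */
    unsigned event;  /* which counter fired */
};

/* Hypothetical per-guest sample channel (the "virtual PMU"). */
struct sample_ring {
    struct sample buf[RING_SIZE];
    unsigned head, tail;   /* producer and consumer indices */
    unsigned dropped;      /* samples lost because the ring was full */
};

struct guest {
    bool active;              /* guest runs a profiler driver */
    struct sample_ring ring;
};

static bool ring_put(struct sample_ring *r, struct sample s)
{
    if (r->head - r->tail == RING_SIZE) {
        r->dropped++;
        return false;
    }
    r->buf[r->head % RING_SIZE] = s;
    r->head++;
    return true;
}

/* Active guests receive their own samples; samples from passive
 * guests are recorded on the host instead. */
static void route_sample(struct guest *g, struct sample s,
                         struct sample_ring *host_log)
{
    assert(g != NULL && host_log != NULL);
    if (g->active)
        ring_put(&g->ring, s);
    else
        ring_put(host_log, s);
}
```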

How much of Xenoprof's kernel code we could use I don't know.  A
common user space should be quite feasible.

So, what do you think?  Is this worthwhile?  Other ideas?

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <87wsprxmyb.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found] ` <87wsprxmyb.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
@ 2008-01-30 17:41   ` Avi Kivity
       [not found]     ` <47A0B6DF.40208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2008-01-30 17:41 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Markus Armbruster wrote:
>
> System-wide profiling of the *virtual* machine is related to profiling
> just a process.  That's hard.  I guess building on Perfmon2 would make
> sense there, but as long as it's out of tree...  Can we wait for it?
> If not, what then?
>
>   

Give the guest access to the real PMU.  Save its state on every exit 
(switching profiling off), and restore it on every entry (switching 
profiling on).  The only problem with this is that it is very CPU-model 
dependent, losing the hardware independence that virtual machines have.  
If you are satisfied with the architectural performance counters, then 
we even have hardware independence.

>
> The same ideas should work for KVM.  The whole hypervisor headache
> just evaporates, of course.  What remains is the host kernel routing
> samples to active guests (over virtio, I guess), and guests kernels
> receiving samples from there instead of the hardware PMU.  In other
> words, the sample channel from the host becomes our virtual PMU for
> the guest.  Which needs a driver for it.  It's a weird PMU, because
> you can't program its performance counters.  That's left to the host.
>   

Is there really a requirement to profile several userspace programs, on 
several guests, simultaneously?  If not, passing through the PMU will 
work best, with the additional advantage that guests will not need 
modification (so you can run Windows with VTune, for example).

If this three-tier profiling is actually needed, perhaps we can do all 
recording on the host, but have an interface to let the guest translate 
rip samples to something more meaningful.  This might work in this way:

- oprofile on the host receives the pmu nmi
- oprofile calls a hook (placed there by kvm) when it sees that the task 
is actually a virtual machine, instead of the usual translation process
- kvm injects an interrupt into the guest
- the guest converts the pmu rip value into a meaningful string and 
writes it into memory
- (later) kvm picks this up and passes it back to oprofile

The advantage here is that besides a fairly simple driver that needs to 
be loaded into the guest (and can be loaded automatically), everything 
is controlled from the host.  All the information is available on the 
host, so that sorting by counter occurrences, for example, works.
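The five steps above could be modeled as in this C sketch. It is purely illustrative: host_hook_queue, inject_and_translate, collect and demo_guest_xlate are hypothetical names, the "interrupt injection" is a direct function call, and the shared area is an ordinary struct rather than guest memory. The guest translator just formats the address; a real guest driver would resolve it against its own processes and symbols.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 16
#define NAME_LEN 64

/* A sample from a task that is a VM: only the guest can translate the RIP. */
struct pending_sample {
    uint64_t rip;
    char name[NAME_LEN];  /* filled in by the guest */
    int done;
};

/* Memory shared between host and guest in this sketch. */
struct xlate_area {
    struct pending_sample s[MAX_PENDING];
    unsigned n;
};

typedef void (*guest_xlate_fn)(struct pending_sample *);

/* Hook: queue the RIP instead of translating it on the host. */
static int host_hook_queue(struct xlate_area *a, uint64_t rip)
{
    assert(a != NULL);
    if (a->n == MAX_PENDING)
        return -1;
    a->s[a->n].rip = rip;
    a->s[a->n].done = 0;
    a->n++;
    return 0;
}

/* Hypothetical guest driver; here it only formats the address. */
static void demo_guest_xlate(struct pending_sample *p)
{
    snprintf(p->name, NAME_LEN, "guest:%#llx", (unsigned long long)p->rip);
}

/* "Inject an interrupt": run the guest driver on every pending sample. */
static void inject_and_translate(struct xlate_area *a, guest_xlate_fn fn)
{
    for (unsigned i = 0; i < a->n; i++)
        if (!a->s[i].done) {
            fn(&a->s[i]);
            a->s[i].done = 1;
        }
}

/* Later: hand translated samples back to the profiler and keep the
 * untranslated ones queued.  Returns the number collected. */
static unsigned collect(struct xlate_area *a, char out[][NAME_LEN], unsigned max)
{
    unsigned got = 0, keep = 0;
    for (unsigned i = 0; i < a->n; i++) {
        if (a->s[i].done && got < max)
            memcpy(out[got++], a->s[i].name, NAME_LEN);
        else
            a->s[keep++] = a->s[i];  /* still waiting for the guest */
    }
    a->n = keep;
    return got;
}
```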

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <47A0B6DF.40208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found]     ` <47A0B6DF.40208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-30 18:05       ` Andi Kleen
  2008-01-30 18:23       ` Balaji Rao
  2008-01-30 19:55       ` Markus Armbruster
  2 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2008-01-30 18:05 UTC (permalink / raw)
  To: avi-atKUWr5tajBWk0Htik3J/w
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Markus.Armbruster-5C7GfCeVMHo


> Is there really a requirement to profile several userspace programs, on 
> several guests, simultaneously? 

Since guests affect each other's performance (e.g. one guest can push
the data of another guest out of cache) profiling across guests makes
a lot of sense. Otherwise you cannot easily diagnose any situation
where a guest affects another guest's performance negatively.

> If not, passing through the PMU will 
> work best, with the additional advantage that guests will not need 
> modification (so you can run Windows with VTune, for example).

Ideally you would support both.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Performance monitoring units and KVM
       [not found]     ` <47A0B6DF.40208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2008-01-30 18:05       ` Andi Kleen
@ 2008-01-30 18:23       ` Balaji Rao
       [not found]         ` <200801302353.19872.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2008-01-30 19:55       ` Markus Armbruster
  2 siblings, 1 reply; 15+ messages in thread
From: Balaji Rao @ 2008-01-30 18:23 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f; +Cc: Markus Armbruster, Avi Kivity

On Wednesday 30 January 2008 11:11:51 pm Avi Kivity wrote:
> Markus Armbruster wrote:
> > System-wide profiling of the *virtual* machine is related to profiling
> > just a process.  That's hard.  I guess building on Perfmon2 would make
> > sense there, but as long as it's out of tree...  Can we wait for it?
> > If not, what then?
>
> Give the guest access to the real PMU.  Save its state on every exit
> (switching profiling off), and restore it on every entry (switching
> profiling on).  The only problem with this is that it is very CPU-model
> dependent, losing the hardware independence that virtual machines have.
> If you are satisfied with the architectural performance counters, then
> we even have hardware independence.
But don't the architectural performance counters vary between Intel and AMD 
CPUs? AFAIK, they do. And this would pose problems during migration between 
Intel and AMD hosts.

I am not sure how important it is to support migration between Intel and AMD 
hosts. If it were not that important, then IMO we could go ahead with exposing 
the real PMU. Maybe we could warn users against running profilers in the guest 
if they intend it to be Intel<->AMD migratable?

regards,
balaji rao

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <200801302353.19872.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found]         ` <200801302353.19872.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-01-30 18:14           ` Avi Kivity
       [not found]             ` <47A0BE8F.4090508-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2008-01-30 18:14 UTC (permalink / raw)
  To: Balaji Rao; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Markus Armbruster

Balaji Rao wrote:
> On Wednesday 30 January 2008 11:11:51 pm Avi Kivity wrote:
>   
>> Markus Armbruster wrote:
>>     
>>> System-wide profiling of the *virtual* machine is related to profiling
>>> just a process.  That's hard.  I guess building on Perfmon2 would make
>>> sense there, but as long as it's out of tree...  Can we wait for it?
>>> If not, what then?
>>>       
>> Give the guest access to the real PMU.  Save its state on every exit
>> (switching profiling off), and restore it on every entry (switching
>> profiling on).  The only problem with this is that it is very CPU-model
>> dependent, losing the hardware independence that virtual machines have.
>> If you are satisfied with the architectural performance counters, then
>> we even have hardware independence.
>>     
> But don't the architectural performance counters vary between Intel and AMD 
> CPUs? AFAIK, they do. And this would pose problems during migration between 
> Intel and AMD hosts.
>
>   

They also vary between Intel hosts of different models, and likely 
different AMD hosts as well.  The PMU is not architectural (or, in other 
words, model specific).  So migration and PMU pass-through are mutually 
exclusive unless you have a homogeneous server farm.

> I am not sure how important it is to support migration between Intel and AMD 
> hosts. If it were not that important, then IMO we could go ahead with exposing 
> the real PMU. Maybe we could warn users against running profilers in the guest 
> if they intend it to be Intel<->AMD migratable?
>   

We can give the user the option to expose only the architectural PMU 
(which is quite limited) and have cross-model migration, or to expose 
the full PMU and lose hardware independence.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <47A0BE8F.4090508-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found]             ` <47A0BE8F.4090508-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-30 18:26               ` Andi Kleen
       [not found]                 ` <p73ir1b5fvy.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org>
  2008-01-30 18:55               ` Balaji Rao
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-01-30 18:26 UTC (permalink / raw)
  To: avi-atKUWr5tajBWk0Htik3J/w,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	markus.armbruster-5C7GfCeVMHo


> Balaji Rao wrote:
>> On Wednesday 30 January 2008 11:11:51 pm Avi Kivity wrote:
>>   
>>> Markus Armbruster wrote:
>>>     
>>>> System-wide profiling of the *virtual* machine is related to profiling
>>>> just a process.  That's hard.  I guess building on Perfmon2 would make
>>>> sense there, but as long as it's out of tree...  Can we wait for it?
>>>> If not, what then?
>>>>       
>>> Give the guest access to the real PMU.  Save its state on every exit
>>> (switching profiling off), and restore it on every entry (switching
>>> profiling on).  The only problem with this is that it is very CPU-model
>>> dependent, losing the hardware independence that virtual machines have.
>>> If you are satisfied with the architectural performance counters, then
>>> we even have hardware independence.
>>>     
>> But don't the architectural performance counters vary between Intel and AMD 
>> CPUs? AFAIK, they do. And this would pose problems during migration between 
>> Intel and AMD hosts.
>>
>>   
>
> They also vary between Intel hosts of different models, and likely 
> different AMD hosts as well.  The PMU is not architectural (or, in other 
> words, model specific). 

Intel has an architectural PMU now, but it only works
on relatively new CPUs and not on AMD.

>> I am not sure how important it is to support migration between Intel and AMD 
>> hosts. If it were not that important, then IMO we could go ahead with exposing 
>> the real PMU. Maybe we could warn users against running profilers in the guest 
>> if they intend it to be Intel<->AMD migratable?
>>   
>
> We can give the user the option to expose only the architectural PMU 
> (which is quite limited) and have cross-model migration, or to expose 
> the full PMU and lose hardware independence.

There isn't really an architectural PMU if you consider
boxes other than relatively new Intel CPUs (which have one).

But on the other hand, in my experience most PMU users use a few
relatively simple counters (e.g. 90+% likely the local variant of
CPU_CYCLES_NONHALTED), so it would in theory be possible to trap the
MSR writes in the monitor and translate them from a different CPU's
format into the local MSR.

The only trouble is that there is no architectural way to tell
the guest "I support only counters X, Y, Z" and also no
nice way to reject a particular counter except for just
not ticking.

And trapping these MSR writes might be too slow for some
applications.

Still, I suspect just emulating non-halted cycles would
be a decent 95+% solution.
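The translation idea could be sketched in C as a function the monitor calls when it traps a guest write to an event-select MSR. This assumes the guest programs Intel architectural event encodings and the host is an AMD K8-class CPU; the event codes in the table (0x3C/0x00 unhalted core cycles and 0xC0/0x00 instructions retired on the Intel side, 0x76 and 0xC0 on the AMD side) are assumptions to check against the vendor manuals. Unknown events are loaded with the enable bit cleared, i.e. the "just not ticking" rejection mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Low-order event-select layout shared by Intel IA32_PERFEVTSELx and AMD
 * PerfEvtSel: bits 7:0 event, 15:8 unit mask, 16 USR, 17 OS, 22 enable. */
#define EVTSEL_EVENT(v)   ((uint8_t)((v) & 0xff))
#define EVTSEL_UMASK(v)   ((uint8_t)(((v) >> 8) & 0xff))
#define EVTSEL_EN         (1ULL << 22)
#define EVTSEL_FLAGS_MASK (~0xffffULL)  /* everything except event + umask */

struct event_xlate {
    uint8_t guest_event, guest_umask;   /* as the guest programmed it */
    uint8_t host_event, host_umask;     /* equivalent on the host CPU */
};

/* Assumed table: Intel architectural events -> AMD K8-style events. */
static const struct event_xlate intel_to_amd[] = {
    { 0x3c, 0x00, 0x76, 0x00 },  /* unhalted core cycles -> CPU clocks not halted */
    { 0xc0, 0x00, 0xc0, 0x00 },  /* instructions retired -> retired instructions */
};

/* Returns the value to load into the host's event-select MSR for a
 * trapped guest write of guest_val. */
static uint64_t translate_evtsel(uint64_t guest_val)
{
    uint8_t ev = EVTSEL_EVENT(guest_val), um = EVTSEL_UMASK(guest_val);
    for (unsigned i = 0; i < sizeof intel_to_amd / sizeof intel_to_amd[0]; i++) {
        const struct event_xlate *x = &intel_to_amd[i];
        if (x->guest_event == ev && x->guest_umask == um)
            return (guest_val & EVTSEL_FLAGS_MASK) |
                   ((uint64_t)x->host_umask << 8) | x->host_event;
    }
    return guest_val & ~EVTSEL_EN;  /* unsupported: the counter never ticks */
}
```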

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <p73ir1b5fvy.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found]                 ` <p73ir1b5fvy.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org>
@ 2008-01-30 19:14                   ` Balaji Rao
       [not found]                     ` <200801310044.11055.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Balaji Rao @ 2008-01-30 19:14 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: markus.armbruster-5C7GfCeVMHo, Andi Kleen,
	avi-atKUWr5tajBWk0Htik3J/w

On Wednesday 30 January 2008 11:56:25 pm Andi Kleen wrote:
> There isn't really an architectural PMU if you consider
> boxes other than relatively new Intel CPUs (which have one).
>
But since kvm runs only on such CPUs, migrating at least between various
Intel models should not really be a problem.
> But on the other hand, in my experience most PMU users use a few
> relatively simple counters (e.g. 90+% likely the local variant of
> CPU_CYCLES_NONHALTED), so it would in theory be possible to trap the
> MSR writes in the monitor and translate them from a different CPU's
> format into the local MSR.
>
> The only trouble is that there is no architectural way to tell
> the guest "I support only counters X, Y, Z" and also no
> nice way to reject a particular counter except for just
> not ticking.
Can't this be exported through CPUID?
>

regards,
balaji rao

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <200801310044.11055.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found]                     ` <200801310044.11055.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-01-31  3:12                       ` Andi Kleen
       [not found]                         ` <20080131031232.GB27115-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-01-31  3:12 UTC (permalink / raw)
  To: Balaji Rao
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	markus.armbruster-5C7GfCeVMHo, Andi Kleen,
	avi-atKUWr5tajBWk0Htik3J/w

On Thu, Jan 31, 2008 at 12:44:10AM +0530, Balaji Rao wrote:
> On Wednesday 30 January 2008 11:56:25 pm Andi Kleen wrote:
> > There isn't really an architectural PMU if you consider
> > boxes other than relatively new Intel CPUs (which have one).
> >
> But since kvm runs only on such CPUs, migrating at least between various
> Intel models should not really be a problem.

I'm not 100% sure, but I think there are P4-based (Family 15) CPUs which have
VT but not ArchPerfMon. AFAIK ArchPerfMon is only in Family 6 CPUs.
Family 15 has a completely different PerfMon interface.

> > But on the other hand, in my experience most PMU users use a few
> > relatively simple counters (e.g. 90+% likely the local variant of
> > CPU_CYCLES_NONHALTED), so it would in theory be possible to trap the
> > MSR writes in the monitor and translate them from a different CPU's
> > format into the local MSR.
> >
> > The only trouble is that there is no architectural way to tell
> > the guest "I support only counters X, Y, Z" and also no
> > nice way to reject a particular counter except for just
> > not ticking.
> Can't this be exported through CPUID?

Sure it could, but that would be a new interface. If you were
free to define a new interface you could also just go completely
hypercall based.

-Andi


^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <20080131031232.GB27115-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>]

* Re: Performance monitoring units and KVM II
       [not found]                         ` <20080131031232.GB27115-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
@ 2008-01-31  3:44                           ` Andi Kleen
  2008-01-31  7:12                           ` Performance monitoring units and KVM Balaji Rao
  1 sibling, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2008-01-31  3:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Balaji Rao,
	markus.armbruster-5C7GfCeVMHo, avi-atKUWr5tajBWk0Htik3J/w

> Sure it could, but that would be a new interface. If you were
> free to define a new interface you could also just go completely
> hypercall based.

Actually thinking about it more it would be probably possible for 
KVM to emulate ArchPerfMon on AMD and Family 15 Intel based on 
the local PMU. ArchPerfMon has only a few defined counters which
can likely be all emulated with other counters.

Then both migration and the PMU would work.

The only issue is that some clients might check the CPU vendor/model/family
in addition to ArchPerfMon. So to do successful migration you
would always need to fake an ArchPerfMon-capable CPU in CPUID initially.
So e.g. even if you ran on AMD SVM you would need to fake a Yonah or Core2.

Not sure if that would have other bad side effects. Perhaps it could be made
optional.
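Faking ArchPerfMon would mean the hypervisor composes CPUID leaf 0xA itself. A sketch in C, assuming the SDM layout: EAX carries the version, counter count, counter width and the length of the EBX vector, and a set bit in EBX marks an architectural event as unavailable. fake_leaf_0a and the enum names are hypothetical, and whether guest profilers actually honor the EBX vector is an assumption. The vendor/family/model faking mentioned above is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural event indices in CPUID.0AH:EBX
 * (a set bit means the event is NOT available). */
enum arch_event {
    AE_CORE_CYCLES = 0,
    AE_INSTR_RETIRED,
    AE_REF_CYCLES,
    AE_LLC_REF,
    AE_LLC_MISS,
    AE_BRANCH_RETIRED,
    AE_BRANCH_MISS,
    AE_NUM_EVENTS
};

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Leaf 0xA as a hypervisor might present it: ArchPerfMon 'version' with
 * 'num_gp' counters of 'width' bits, exposing only the events whose bit
 * is set in 'supported' (bit i = event i). */
static struct cpuid_regs fake_leaf_0a(unsigned version, unsigned num_gp,
                                      unsigned width, uint32_t supported)
{
    assert(version <= 0xff && num_gp <= 0xff && width <= 0xff);
    uint32_t all = (1u << AE_NUM_EVENTS) - 1;
    struct cpuid_regs r = {0};
    r.eax = version | (num_gp << 8) | (width << 16) |
            ((uint32_t)AE_NUM_EVENTS << 24);
    r.ebx = ~supported & all;  /* inverted: set bits mark missing events */
    return r;
}
```

For example, advertising only non-halted core cycles (the "95+% solution") would mark the other six events as missing.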

-Andi  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Performance monitoring units and KVM
       [not found]                         ` <20080131031232.GB27115-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
  2008-01-31  3:44                           ` Performance monitoring units and KVM II Andi Kleen
@ 2008-01-31  7:12                           ` Balaji Rao
  1 sibling, 0 replies; 15+ messages in thread
From: Balaji Rao @ 2008-01-31  7:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Thursday 31 January 2008 08:42:32 am Andi Kleen wrote:
> On Thu, Jan 31, 2008 at 12:44:10AM +0530, Balaji Rao wrote:
> > On Wednesday 30 January 2008 11:56:25 pm Andi Kleen wrote:
> > > There isn't really an architectural PMU if you consider
> > > boxes other than relatively new Intel CPUs (which have one).
> >
> > But since kvm runs only on such CPUs, migrating at least between
> > various Intel models should not really be a problem.
>
> I'm not 100% sure, but I think there are P4-based (Family 15) CPUs which
> have VT but not ArchPerfMon. AFAIK ArchPerfMon is only in Family 6 CPUs.
> Family 15 has a completely different PerfMon interface.
>
Yea, you are right. To quote from the manual (2.1.7), 'Intel® Virtualization 
Technology (Intel® VT) was introduced in the Intel Pentium 4 processor 672 and 
662.' And they don't have ArchPerfMon, as it was introduced only with the
Core Solo and Core Duo.

regards,
balaji rao

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Performance monitoring units and KVM
       [not found]             ` <47A0BE8F.4090508-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2008-01-30 18:26               ` Andi Kleen
@ 2008-01-30 18:55               ` Balaji Rao
  1 sibling, 0 replies; 15+ messages in thread
From: Balaji Rao @ 2008-01-30 18:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Markus Armbruster

On Wednesday 30 January 2008 11:44:39 pm Avi Kivity wrote:
> Balaji Rao wrote:
> > But don't the architectural performance counters vary between Intel and
> > AMD CPUs? AFAIK, they do. And this would pose problems during migration
> > between Intel and AMD hosts.
>
> They also vary between Intel hosts of different models, and likely
> different AMD hosts as well.  The PMU is not architectural (or, in other
> words, model specific).  So migration and PMU pass-through are mutually
> exclusive unless you have a homogeneous server farm.
>
Right. I had mistakenly thought that architectural performance
monitoring was consistent across all processors right from the P6. In fact, it
was introduced starting with the Core Solo and Core Duo.
> > I am not sure how important it is to support migration between Intel and
> > AMD hosts. If it were not that important, then IMO we could go ahead with
> > exposing the real PMU. Maybe we could warn users against running
> > profilers in the guest if they intend it to be Intel<->AMD migratable?
>
> We can give the user the option to expose only the architectural PMU
> (which is quite limited) and have cross-model migration, or to expose
> the full PMU and lose hardware independence.

Yes. This looks like the right thing to do.

regards,
balaji rao

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Performance monitoring units and KVM
       [not found]     ` <47A0B6DF.40208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2008-01-30 18:05       ` Andi Kleen
  2008-01-30 18:23       ` Balaji Rao
@ 2008-01-30 19:55       ` Markus Armbruster
       [not found]         ` <87fxwfw0jn.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
  2 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2008-01-30 19:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes:

> Markus Armbruster wrote:
>>
>> System-wide profiling of the *virtual* machine is related to profiling
>> just a process.  That's hard.  I guess building on Perfmon2 would make
>> sense there, but as long as it's out of tree...  Can we wait for it?
>> If not, what then?
>>
>>   
>
> Give the guest access to the real PMU.  Save its state on every exit
> (switching profiling off), and restore it on every entry (switching
> profiling on).  The only problem with this is that it is very
> CPU-model dependent, losing the hardware independence that virtual
> machines have.  If you are satisfied with the architectural
> performance counters, then we even have hardware independence.

Saving and restoring the PMU is *slow* on most machines.  Especially
bad on machines where reading / writing a PMU register involves
serializing instructions.

Want to try anyway?

I hope hardware vendors will eventually make PMUs friendlier to
virtualization.

>> The same ideas should work for KVM.  The whole hypervisor headache
>> just evaporates, of course.  What remains is the host kernel routing
>> samples to active guests (over virtio, I guess), and guest kernels
>> receiving samples from there instead of the hardware PMU.  In other
>> words, the sample channel from the host becomes our virtual PMU for
>> the guest.  Which needs a driver for it.  It's a weird PMU, because
>> you can't program its performance counters.  That's left to the host.
>>   
>
> Is there really a requirement to profile several userspace programs,
> on several guests, simultaneously?  If not, passing through the PMU
> will work best, with the additional advantage that guests will not
> need modification (so you can run Windows with VTune, for example).

There are uses for both kinds of system-wide profiling.

> If this three-tier profiling is actually needed, perhaps we can do all
> recording on the host, but have an interface to let the guest
> translate rip samples to something more meaningful.  This might work
> in this way:
>
> - oprofile on the host receives the pmu nmi
> - oprofile calls a hook (placed there by kvm) when it sees that the
> task is actually a virtual machine, instead of the usual translation
> process
> - kvm injects an interrupt into the guest
> - the guest converts the pmu rip value into a meaningful string and
> writes it into memory
> - (later) kvm picks this up and passes it back to oprofile
>
> The advantage here is that besides a fairly simple driver that needs
> to be loaded into the guest (and can be loaded automatically),
> everything is controlled from the host.  All the information is
> available on the host, so that sorting by counter occurences, for
> example, works.

Yep.

Problems include:

* OProfile user space receives dcookies from the kernel, which it
  passes to lookup_dcookie().  We'd have to delegate that to the
  appropriate guest.

* OProfile user space needs to be taught where to find each guest's
  debug info.

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <87fxwfw0jn.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>]

* Re: Performance monitoring units and KVM
       [not found]         ` <87fxwfw0jn.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
@ 2008-01-31  7:03           ` Avi Kivity
       [not found]             ` <47A172D4.6040505-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2008-01-31  7:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Markus Armbruster wrote:
> Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes:
>
>   
>> Markus Armbruster wrote:
>>     
>>> System-wide profiling of the *virtual* machine is related to profiling
>>> just a process.  That's hard.  I guess building on Perfmon2 would make
>>> sense there, but as long as it's out of tree...  Can we wait for it?
>>> If not, what then?
>>>
>>>   
>>>       
>> Give the guest access to the real PMU.  Save them on every exit
>> (switching profiling off), and restore them on every entry (switching
>> profiling on).  The only problem with this is that it is very cpu
>> model dependent, losing the hardware independence that virtual
>> machines have.  If you are satisfied with the architectural
>> performance counters, then we even have hardware independence.
>>     
>
> Saving and restoring the PMU is *slow* on most machines.  Especially
> bad on machines where reading / writing a PMU register involves
> serializing instructions.
>
>   

That is true. It will increase vmexit latencies by several microseconds.

> Want to try anyway?
>   

If we want to support unmodified oprofile/VTune, we have to. I can't 
judge how important it would be to users.

>
>   
>>> The same ideas should work for KVM.  The whole hypervisor headache
>>> just evaporates, of course.  What remains is the host kernel routing
>>> samples to active guests (over virtio, I guess), and guests' kernels
>>> receiving samples from there instead of the hardware PMU.  In other
>>> words, the sample channel from the host becomes our virtual PMU for
>>> the guest.  Which needs a driver for it.  It's a weird PMU, because
>>> you can't program its performance counters.  That's left to the host.
>>>   
>>>       
>> Is there really a requirement to profile several userspace programs,
>> on several guests, simultaneously?  If not, passing through the PMU
>> will work best, with the additional advantage that guests will not
>> need modification (so you can run Windows with VTune, for example).
>>     
>
> There are uses for both kinds of system-wide profiling.
>
>   

Okay; we can do both. Pass-through should be quite simple.

>> If this three-tier profiling is actually needed, perhaps we can do all
>> recording on the host, but have an interface to let the guest
>> translate rip samples to something more meaningful.  This might work
>> in this way:
>>
>> - oprofile on the host receives the pmu nmi
>> - oprofile calls a hook (placed there by kvm) when it sees that the
>> task is actually a virtual machine, instead of the usual translation
>> process
>> - kvm injects an interrupt into the guest
>> - the guest converts the pmu rip value into a meaningful string and
>> writes it into memory
>> - (later) kvm picks this up and passes it back to oprofile
>>
>> The advantage here is that besides a fairly simple driver that needs
>> to be loaded into the guest (and can be loaded automatically),
>> everything is controlled from the host.  All the information is
>> available on the host, so that sorting by counter occurrences, for
>> example, works.
>>     
>
> Yep.
>
> Problems include:
>
> * OProfile user space receives dcookies from the kernel, which it
>   passes to lookup_dcookie().  We'd have to delegate that to the
>   appropriate guest.
>   

This part looks doable.

> * OProfile user space needs to be taught where to find each guest's
>   debug info.
>   

This one seems too horrible to contemplate. NFS exports on each guest 
and mounts on the host? With fuse sshfs?

Collecting and analyzing all the data on the host looks much better than 
distributing it to guests, however, if we can manage to transfer the 
debug information.

[Wild idea: rewrite lookup_dcookie in a systemtap-like language, and 
execute it on the host instead of on the guest. Basically the host would 
use the guest's vmlinux debug info to decode the information from raw 
kernel memory]

-- 
error compiling committee.c: too many arguments to function


* Re: Performance monitoring units and KVM
       [not found]             ` <47A172D4.6040505-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-31 15:42               ` Markus Armbruster
       [not found]                 ` <87odb2gfx5.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2008-01-31 15:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, William Cohen

[Note cc: Will, who knows much more about OProfile than I do]

Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes:

> Markus Armbruster wrote:
>> Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> writes:
>>
>>   
>>> Markus Armbruster wrote:
>>>     
>>>> System-wide profiling of the *virtual* machine is related to profiling
>>>> just a process.  That's hard.  I guess building on Perfmon2 would make
>>>> sense there, but as long as it's out of tree...  Can we wait for it?
>>>> If not, what then?
>>>>
>>>>         
>>> Give the guest access to the real PMU.  Save them on every exit
>>> (switching profiling off), and restore them on every entry (switching
>>> profiling on).  The only problem with this is that it is very cpu
>>> model dependent, losing the hardware independence that virtual
>>> machines have.  If you are satisfied with the architectural
>>> performance counters, then we even have hardware independence.
>>>     
>>
>> Saving and restoring the PMU is *slow* on most machines.  Especially
>> bad on machines where reading / writing a PMU register involves
>> serializing instructions.
>>
>>   
>
> That is true. It will increase vmexit latencies by several microseconds.
>
>> Want to try anyway?
>>   
>
> If we want to support unmodified oprofile/VTune, we have to. I can't
> judge how important it would be to users.

Neither can I, at this time.

We'd have to make sure that we pay only as we go: save / restore the
PMU only when and as far as it is in use.
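To illustrate "pay only as we go": a sketch, assuming a per-vCPU bitmask of the counters the guest has actually programmed, so that an idle guest costs no PMU register accesses at all on exit or entry. The `hw_read()`/`hw_write()` helpers simulate the counter registers with an array (on real hardware these would be the slow, possibly serializing MSR accesses); `struct vcpu_pmu` and every function here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS 4  /* assumption: 4 general-purpose counters */

/* Simulated counter registers plus a count of register accesses, so the
 * cost of save/restore is visible. */
static uint64_t hw_counter[NUM_COUNTERS];
static unsigned hw_accesses;

static uint64_t hw_read(unsigned i)
{
    assert(i < NUM_COUNTERS);
    hw_accesses++;
    return hw_counter[i];
}

static void hw_write(unsigned i, uint64_t v)
{
    assert(i < NUM_COUNTERS);
    hw_accesses++;
    hw_counter[i] = v;
}

/* Hypothetical per-vCPU PMU state. */
struct vcpu_pmu {
    uint32_t in_use;                /* bit i set: guest programmed counter i */
    uint64_t saved[NUM_COUNTERS];
};

/* On vmexit: save only the counters the guest uses, then clear them
 * (a stand-in for switching guest profiling off). */
static void pmu_save_on_exit(struct vcpu_pmu *p)
{
    for (unsigned i = 0; i < NUM_COUNTERS; i++)
        if (p->in_use & (1u << i)) {
            p->saved[i] = hw_read(i);
            hw_write(i, 0);
        }
}

/* On vmentry: restore the same subset. */
static void pmu_restore_on_entry(const struct vcpu_pmu *p)
{
    for (unsigned i = 0; i < NUM_COUNTERS; i++)
        if (p->in_use & (1u << i))
            hw_write(i, p->saved[i]);
}
```

With `in_use == 0` both functions touch no registers, so guests that never profile pay nothing; a guest using two counters pays four accesses on exit and two on entry.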

Migration was discussed elsewhere in this thread.  Andi suggested
providing a virtual architectural PMU instead of virtualizing the real
PMU.  That way we could migrate between dissimilar hardware.  But it
would involve faking a CPU with an architectural PMU.  Hmm.

Isn't it asking a bit much to want migration between hardware with
different real PMUs, *and* virtual performance monitoring at the same
time?

As far as OProfile is concerned: we can make it work with whatever
kind of virtual PMU we want, without a complete CPU fake.  It just
needs to be able to detect our virtual PMU.

>>>> The same ideas should work for KVM.  The whole hypervisor headache
>>>> just evaporates, of course.  What remains is the host kernel routing
>>>> samples to active guests (over virtio, I guess), and guests kernels
>>>> receiving samples from there instead of the hardware PMU.  In other
>>>> words, the sample channel from the host becomes our virtual PMU for
>>>> the guest.  Which needs a driver for it.  It's a weird PMU, because
>>>> you can't program its performance counters.  That's left to the host.
>>>>         
>>> Is there really a requirement to profile several userspace programs,
>>> on several guests, simultaneously?  If not, passing through the PMU
>>> will work best, with the additional advantage that guests will not
>>> need modification (so you can run Windows with VTune, for example).
>>>     
>>
>> There are uses for both kinds of system-wide profiling.
>>
>>   
>
> Okay; we can do both. Pass-through should be quite simple.
>
>>> If this three-tier profiling is actually needed, perhaps we can do all
>>> recording on the host, but have an interface to let the guest
>>> translate rip samples to something more meaningful.  This might work
>>> in this way:
>>>
>>> - oprofile on the host receives the pmu nmi
>>> - oprofile calls a hook (placed there by kvm) when it sees that the
>>> task is actually a virtual machine, instead of the usual translation
>>> process
>>> - kvm injects an interrupt into the guest
>>> - the guest converts the pmu rip value into a meaningful string and
>>> writes it into memory
>>> - (later) kvm picks this up and passes it back to oprofile
>>>
>>> The advantage here is that besides a fairly simple driver that needs
>>> to be loaded into the guest (and can be loaded automatically),
>>> everything is controlled from the host.  All the information is
>>> available on the host, so that sorting by counter occurrences, for
>>> example, works.
>>>     
>>
>> Yep.
>>
>> Problems include:
>>
>> * OProfile user space receives dcookies from the kernel, which it
>>   passes to lookup_dcookie().  We'd have to delegate that to the
>>   appropriate guest.
>>   
>
> This part looks doable.
>
>> * OProfile user space needs to be taught where to find each guest's
>>   debug info.
>>   
>
> This one seems too horrible to contemplate. NFS exports on each guest
> and mounts on the host? With fuse sshfs?

OProfile searches for debug info in a couple of places in the
filesystem.  Perhaps we could teach it to take a guest root directory
parameter, and search a guest's debuginfo below that.  How the
debuginfo gets there is then the user's problem (NFS mount, fetch &
unpack rpms, ...).
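A sketch of the path mapping such a guest-root parameter implies, assuming the common separate-debuginfo layout `/usr/lib/debug/<path>.debug`. `guest_debuginfo_path()` is hypothetical; it is not an existing OProfile option or function, and a real search would try several candidate locations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a binary path as seen inside a guest to where
 * its separate debuginfo would live under a guest root directory on the
 * host (e.g. an NFS or sshfs mount point).  An empty guest_root yields
 * the host's own path.  Returns 0 on success, -1 if buf is too small. */
static int guest_debuginfo_path(char *buf, size_t len,
                                const char *guest_root, const char *image)
{
    assert(buf && guest_root && image);
    size_t rl = strlen(guest_root);
    /* Avoid a doubled slash when guest_root already ends in '/'. */
    const char *sep = (rl > 0 && guest_root[rl - 1] == '/') ? "" : "/";
    if (image[0] == '/')
        image++;
    int n = snprintf(buf, len, "%s%susr/lib/debug/%s.debug",
                     guest_root, sep, image);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, with guest root `/srv/guests/vm1` the guest's `/bin/ls` maps to `/srv/guests/vm1/usr/lib/debug/bin/ls.debug`; how that tree gets populated stays the user's problem, as above.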

> Collecting and analyzing all the data on the host looks much better
> than distributing it to guests, however, if we can manage to transfer
> the debug information.

Yes, it liberates the user from a whole lot of hassle.

However, there are uses for collecting a guest's data on the guest as
well.  Say you run some very low-overhead, system-wide sampling
continuously on the host, and let guests subscribe to it (no root on
host required).  Kind of like a very limited virtual PMU that doesn't
give you many choices on how to sample.

If we need this capability anyway, we can just as well start with it,
because it seems easier than collecting everything on the host.
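A sketch of what routing continuous host samples to subscribed guests could look like, assuming one fixed-size ring per guest that the guest's driver drains (the transport, e.g. virtio, is left out). Samples for unsubscribed guests are discarded, and a full ring drops samples rather than stalling the host. `struct guest_channel`, `route_sample()` and `guest_read_sample()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_GUESTS 8
#define RING_SIZE  4   /* power of two, so free-running indices wrap safely */

struct sample { uint64_t rip; };

/* Hypothetical per-guest channel: a ring the guest's driver drains. */
struct guest_channel {
    bool subscribed;
    unsigned head, tail;   /* head: next write, tail: next read */
    unsigned dropped;      /* samples lost because the ring was full */
    struct sample ring[RING_SIZE];
};

static struct guest_channel channels[MAX_GUESTS];

/* Host side: called for each sample taken while guest g was running. */
static void route_sample(unsigned g, uint64_t rip)
{
    assert(g < MAX_GUESTS);
    struct guest_channel *c = &channels[g];
    if (!c->subscribed)
        return;                        /* nobody listening: discard */
    if (c->head - c->tail == RING_SIZE) {
        c->dropped++;                  /* full: drop, never block the host */
        return;
    }
    c->ring[c->head++ % RING_SIZE].rip = rip;
}

/* Guest side: pop one sample; returns false if the ring is empty. */
static bool guest_read_sample(unsigned g, struct sample *out)
{
    assert(g < MAX_GUESTS);
    struct guest_channel *c = &channels[g];
    if (c->head == c->tail)
        return false;
    *out = c->ring[c->tail++ % RING_SIZE];
    return true;
}
```

The guest never programs anything here; what and how often to sample is fixed by the host, which matches the "very limited virtual PMU" described above.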

> [Wild idea: rewrite lookup_dcookie in a systemtap-like language, and
> execute it on the host instead of on the guest. Basically the host
> would use the guest's vmlinux debug info to decode the information
> from raw kernel memory]

Urgs.  A bit too wild for my taste, I have to admit :)

* Re: Performance monitoring units and KVM
       [not found]                 ` <87odb2gfx5.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
@ 2008-01-31 16:37                   ` Avi Kivity
  0 siblings, 0 replies; 15+ messages in thread
From: Avi Kivity @ 2008-01-31 16:37 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, William Cohen

Markus Armbruster wrote:
> As far as OProfile is concerned: we can make it work with whatever
> kind of virtual PMU we want, without a complete CPU fake.  It just
> needs to be able to detect our virtual PMU.
>
>   

Well, if the virtual PMU happens to match exactly the physical 
architectural PMU, that saves us the bother of specifying and 
documenting it, porting oprofile, and explaining to users that VTune 
doesn't work.

>>> * OProfile user space needs to be taught where to find each guest's
>>>   debug info.
>>>
>> This one seems too horrible to contemplate. NFS exports on each guest
>> and mounts on the host? With fuse sshfs?
>>>       
>
> OProfile searches for debug info in a couple of places in the
> filesystem.  Perhaps we could teach it to take a guest root directory
> parameter, and search a guest's debuginfo below that.  How the
> debuginfo gets there is then the user's problem (NFS mount, fetch &
> unpack rpms, ...).
>
>   

unionfs can help here.

>
> However, there are uses for collecting a guest's data on the guest as
> well.  Say you run some very low-overhead, system-wide sampling
> continuously on the host, and let guests subscribe to it (no root on
> host required).  Kind of like a very limited virtual PMU that doesn't
> give you many choices on how to sample.
>
>   

You're describing the architectural PMU again: guest-wide profiling 
without access to the rest of the system.


> If we need this capability anyway, we can just as well start with it,
> because it seems easier than collecting everything on the host.
>
>   

I believe that PMU passthrough is the easiest, followed by emulating the 
architectural PMU where it doesn't exist.
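For reference, a guest discovers the architectural PMU through CPUID leaf 0xA, so an emulated one would have to report plausible values there. A sketch of decoding that leaf, following the field layout in Intel's Software Developer's Manual: a version ID of 0 means no architectural PMU, and a set bit in EBX means the corresponding architectural event is *not* available. The struct and function names are illustrative, and the EAX value used in the test is made up rather than taken from a real CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of CPUID leaf 0xA, EAX. */
struct arch_pmu_info {
    unsigned version;    /* EAX[7:0]   architectural PMU version ID */
    unsigned num_gp;     /* EAX[15:8]  general-purpose counters per logical CPU */
    unsigned gp_width;   /* EAX[23:16] bit width of those counters */
    unsigned ebx_len;    /* EAX[31:24] length of the EBX event bit vector */
};

static struct arch_pmu_info decode_cpuid_0a_eax(uint32_t eax)
{
    struct arch_pmu_info i = {
        .version  = eax & 0xff,
        .num_gp   = (eax >> 8) & 0xff,
        .gp_width = (eax >> 16) & 0xff,
        .ebx_len  = (eax >> 24) & 0xff,
    };
    return i;
}

/* In EBX a set bit means the event is NOT available (bit 0 is core
 * cycles).  Bits at or beyond ebx_len are not enumerated at all. */
static int arch_event_available(const struct arch_pmu_info *i,
                                uint32_t ebx, unsigned bit)
{
    assert(bit < 32);
    return bit < i->ebx_len && !(ebx & (1u << bit));
}
```

On real hardware the raw EAX/EBX values would come from the `cpuid` instruction; an emulated architectural PMU would instead synthesize them in the hypervisor's CPUID handling.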



-- 
error compiling committee.c: too many arguments to function


end of thread (newest message: 2008-01-31 16:37 UTC)

Thread overview: 15+ messages
2008-01-30 17:06 Performance monitoring units and KVM Markus Armbruster
     [not found] ` <87wsprxmyb.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2008-01-30 17:41   ` Avi Kivity
     [not found]     ` <47A0B6DF.40208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-30 18:05       ` Andi Kleen
2008-01-30 18:23       ` Balaji Rao
     [not found]         ` <200801302353.19872.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2008-01-30 18:14           ` Avi Kivity
     [not found]             ` <47A0BE8F.4090508-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-30 18:26               ` Andi Kleen
     [not found]                 ` <p73ir1b5fvy.fsf-KvMlXPVkKihbpigZmTR7Iw@public.gmane.org>
2008-01-30 19:14                   ` Balaji Rao
     [not found]                     ` <200801310044.11055.balajirrao-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2008-01-31  3:12                       ` Andi Kleen
     [not found]                         ` <20080131031232.GB27115-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2008-01-31  3:44                           ` Performance monitoring units and KVM II Andi Kleen
2008-01-31  7:12                           ` Performance monitoring units and KVM Balaji Rao
2008-01-30 18:55               ` Balaji Rao
2008-01-30 19:55       ` Markus Armbruster
     [not found]         ` <87fxwfw0jn.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2008-01-31  7:03           ` Avi Kivity
     [not found]             ` <47A172D4.6040505-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-31 15:42               ` Markus Armbruster
     [not found]                 ` <87odb2gfx5.fsf-A7mx1g9ivIOttUaS3K59qNi2O/JbrIOy@public.gmane.org>
2008-01-31 16:37                   ` Avi Kivity
