From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jes Sorensen
Subject: Re: KVM PMU virtualization
Date: Fri, 26 Feb 2010 14:51:04 +0100
Message-ID: <4B87D1C8.5090901@redhat.com>
References: <4B86917C.4070102@redhat.com> <20100225173423.GB4246@8bytes.org> <20100226084241.GF15885@elte.hu> <4B87987A.2020302@redhat.com> <20100226104437.GB7463@elte.hu> <4B87AF44.9090702@redhat.com> <20100226114217.GI7463@elte.hu> <4B87B5DE.30503@redhat.com> <1267190907.22519.601.camel@laptop>
To: Peter Zijlstra
Cc: Avi Kivity, Ingo Molnar, Joerg Roedel, KVM General, Zachary Amsden, Gleb Natapov, ming.m.lin@intel.com, "Zhang, Yanmin", Thomas Gleixner, "H. Peter Anvin", Arjan van de Ven, Frédéric Weisbecker, Arnaldo Carvalho de Melo
In-Reply-To: <1267190907.22519.601.camel@laptop>

On 02/26/10 14:28, Peter Zijlstra wrote:
> On Fri, 2010-02-26 at 13:51 +0200, Avi Kivity wrote:
>
>> It would be the other way round - the host would steal the pmu from the
>> guest. Later we can try to time-slice and extrapolate, though that's
>> not going to be easy.
>
> Right, so perf already does the time slicing and interpolating thing, so
> a soft-pmu gets that for free.

What I don't like here is that without rewriting the guest OS, there
will be two layers of time-slicing and extrapolation. That is going to
make the reported numbers close to useless.

> Anyway, this discussion seems somewhat in a stale-mate position.
>
> The KVM folks basically demand a full PMU MSR shadow with PMI
> passthrough so that their $legacy shit works without modification.
> My question with that is how $legacy muck can ever know how the current
> PMU works, you can't even properly emulate a core2 pmu on a nehalem
> because intel keeps messing with the event codes for every new model.
>
> So basically for this to work means the guest can't run legacy stuff
> anyway, but needs to run very up-to-date software, so we might as well
> create a soft-pmu/paravirt interface now and have all up-to-date
> software support that for the next generation.

That is the problem. Today there is a large installed base of Core2
users who wish to measure their stuff on the hardware they have. The
same will be true for Nehalem based systems once whatever replaces
Nehalem comes out and breaks compatibility again. Since we are unable
to emulate Core2 on Nehalem, and almost certainly will be unable to
emulate Nehalem on its successor, we are stuck with this.

A para-virt interface is a nice idea, but since we cannot emulate an
old CPU properly, it still means there isn't much we can do; we are
stuck with the same limitations. I simply don't see the value of
introducing a para-virt interface for this.

> Furthermore, when KVM doesn't virtualize the physical system topology,
> some PMU features cannot even be sanely used from a vcpu.

That is definitely an issue, and there is nothing we can really do
about it. Having two guests running in parallel under KVM means they
are going to see more cache misses than they would if they ran bare
metal on the hardware.

However, even with all of this, we have to keep in mind who is going
to use performance monitoring in a guest. It is going to be
application writers, mostly people writing analytical/scientific
applications. They rarely have control over the OS they are running
on, but are given systems and told to work with what they are given.
Driver upgrades and things like that don't come quickly.
However, they also tend to understand limitations like these and will
still be able to benefit from perf on a system like that.

> So while currently a root user can already tie up all of the pmu using
> perf, simply using that to hand the full pmu off to the guest still
> leaves lots of issues.

Well, isn't that the case with the current setup anyway? If enough
user apps start requesting PMU resources, the hardware is going to run
out of counters very quickly. The real issue here, IMHO, is whether or
not it is possible to use the PMU to count events on a different CPU.
If that is really possible, sharing the PMU is not an option :(

All that said, what we really want is for Intel and AMD to come up
with proper hardware PMU virtualization support that makes it easy to
rotate the full PMU in and out for a guest. Then this whole discussion
becomes a non-issue.

Cheers,
Jes