Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event


From: Sean Christopherson <seanjc@google.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Dapeng Mi <dapeng1.mi@linux.intel.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	 Kan Liang <kan.liang@linux.intel.com>,
	Like Xu <likexu@tencent.com>,
	 Mark Rutland <mark.rutland@arm.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,  Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	 Adrian Hunter <adrian.hunter@intel.com>,
	kvm@vger.kernel.org,  linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	 Zhenyu Wang <zhenyuw@linux.intel.com>,
	Zhang Xiong <xiong.y.zhang@intel.com>,
	 Lv Zhiyuan <zhiyuan.lv@intel.com>,
	Yang Weijiang <weijiang.yang@intel.com>,
	 Dapeng Mi <dapeng1.mi@intel.com>,
	Jim Mattson <jmattson@google.com>,
	 David Dunn <daviddunn@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	 Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
Date: Mon, 2 Oct 2023 08:56:50 -0700
Message-ID: <ZRroQg6flyGBtZTG@google.com>
In-Reply-To: <ZRrF38RGllA04R8o@gmail.com>

On Mon, Oct 02, 2023, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Sep 29, 2023 at 03:46:55PM +0000, Sean Christopherson wrote:
> > 
> > > > I will firmly reject anything that takes the PMU away from the host
> > > > entirely though.
> > > 
> > > Why?  What is so wrong with supporting use cases where the platform owner *wants*
> > > to give up host PMU and NMI watchdog functionality?  If disabling host PMU usage
> > > were complex, highly invasive, and/or difficult to maintain, then I can understand
> > > the pushback.  
> > 
> > Because it sucks.
> >
> > You're forcing people to choose between no host PMU or a slow guest PMU.

Nowhere did I say that we wouldn't take patches to improve the existing vPMU
support.  But that's largely a moot point because I don't think it's possible to
improve the current approach to the point where it would provide a performant,
functional guest PMU.

> > And that's simply not a sane choice for most people --

It's better than the status quo, which is that no one gets to choose, everyone
gets a slow guest PMU.

> > worse it's not a choice based in technical reality.

The technical reality is that context switching the PMU between host and guest
requires reading and writing far too many MSRs for KVM to be able to context
switch at every VM-Enter and every VM-Exit.  And PMIs skidding past VM-Exit adds
another layer of complexity to deal with.
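
To put rough shape on the MSR cost, here is a toy model of what swapping at
every transition implies.  The counter counts, the "one event-select MSR plus
one count MSR per general-purpose counter" layout, and the one-read plus
one-write accounting per MSR are all illustrative assumptions, not measurements
of any real CPU and not KVM's actual code.

```python
# Toy cost model; every number here is an illustrative assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class PmuLayout:
    gp_counters: int     # general-purpose counters (hypothetical count)
    fixed_counters: int  # fixed-function counters (hypothetical count)
    global_msrs: int     # global control/status style MSRs (hypothetical count)

    def state_msrs(self) -> int:
        # Each GP counter is modelled as one event-select MSR plus one count
        # MSR; each fixed counter as a single count MSR.
        return 2 * self.gp_counters + self.fixed_counters + self.global_msrs


def msr_accesses_per_swap(layout: PmuLayout) -> int:
    # One swap saves the outgoing state (a read per MSR) and loads the
    # incoming state (a write per MSR).
    return 2 * layout.state_msrs()


def msr_accesses_per_round_trip(layout: PmuLayout) -> int:
    # Swapping at every VM-Enter and every VM-Exit means two swaps per guest
    # round trip: host->guest on entry, guest->host on exit.
    return 2 * msr_accesses_per_swap(layout)


if __name__ == "__main__":
    layout = PmuLayout(gp_counters=8, fixed_counters=4, global_msrs=3)
    print(layout.state_msrs())                  # 23
    print(msr_accesses_per_round_trip(layout))  # 92
```

The absolute numbers are made up; the point is only that the cost scales with
the amount of PMU state and is paid on every single guest round trip.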

> > It's a choice out of laziness, disabling host PMU is not a requirement
> > for pass-through.

The requirement isn't passthrough access, the requirements are that the guest's
PMU has accuracy that is on par with bare metal, and that exposing a PMU to the
guest doesn't have a meaningful impact on guest performance.

> Not just a choice of laziness, but it will clearly be forced upon users
> by external entities:
> 
>    "Pass ownership of the PMU to the guest and have no host PMU, or you
>     won't have sane guest PMU support at all. If you disagree, please open
>     a support ticket, which we'll ignore."

We don't have sane guest PMU support today.  In the 12+ years since commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests"), KVM has
never provided anything remotely close to a sane vPMU.  It *mostly* works if host
perf is quiesced, but that "good enough" approach doesn't suffice for any form of
PMU usage that requires a high level of accuracy and precision.

> The host OS shouldn't offer facilities that severely limit its own capabilities,
> when there's a better solution. We don't give the FPU to apps exclusively either,
> it would be insanely stupid for a platform to do that.

The FPU can be efficiently context switched, guest state remains resident in
hardware so long as the vCPU task is scheduled in (ignoring infrequent FPU usage
from IRQ context), and guest usage of the FPU doesn't require trap-and-emulate
behavior in KVM.

As David said, ceding the hardware PMU for all of kvm_arch_vcpu_ioctl_run()
(modulo the vCPU task being scheduled out) is likely a viable alternative.

 : But it does mean that when entering the KVM run loop, the host perf system 
 : needs to context switch away the host PMU state and allow KVM to load the guest
 : PMU state.  And much like the FPU situation, the portion of the host kernel
 : that runs between the context switch to the KVM thread and VMENTER to the guest
 : cannot use the PMU.
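
As a rough illustration of why the scope of the handoff matters, the sketch
below counts PMU state swaps for a single KVM_RUN call under the two policies.
The event names are hypothetical labels, not kernel identifiers, and the trace
only models how often each event happens, not a realistic ordering.

```python
# Toy model: count PMU state swaps for one KVM_RUN call under two policies.
# Event names are hypothetical labels, not kernel identifiers.

def swaps_per_transition(events):
    # Policy A: swap PMU state at every VM-Enter and every VM-Exit.
    return sum(1 for e in events if e in ("vmenter", "vmexit"))


def swaps_per_run_loop(events):
    # Policy B: the guest owns the PMU from run-loop entry to run-loop exit,
    # except while the vCPU task is scheduled out.
    return sum(1 for e in events
               if e in ("run_enter", "run_exit", "sched_out", "sched_in"))


def make_trace(round_trips, preemptions):
    # One KVM_RUN call: enter the loop, bounce in and out of the guest
    # `round_trips` times, get preempted `preemptions` times, then leave.
    # Only the event counts matter to the two functions above.
    trace = ["run_enter"]
    trace += ["vmenter", "vmexit"] * round_trips
    trace += ["sched_out", "sched_in"] * preemptions
    trace.append("run_exit")
    return trace


if __name__ == "__main__":
    trace = make_trace(round_trips=1000, preemptions=2)
    print(swaps_per_transition(trace))  # 2000
    print(swaps_per_run_loop(trace))    # 6
```

Under these assumptions the per-run-loop policy pays for a swap only at the
loop boundaries and around preemption, at the price of the host being unable
to use the PMU for the host code that runs inside the loop.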

If y'all are willing to let KVM redefine exclude_guest to be KVM's outer run
loop, then I'm all for exploring that option.  But that idea got shot down over
a year ago[*].  Or at least, that was my reading of things.  Maybe it was just a
misunderstanding because we didn't do a good job of defining the behavior.

I am completely ok with either approach, but I am not ok with being nak'd on both.
Because unless there's a magical third option lurking, those two options are the
only ways for KVM to provide a vPMU that meets the requirements for slice-of-hardware
use cases.

[*] https://lore.kernel.org/all/YgPCm1WIt9dHuoEo@hirez.programming.kicks-ass.net


Thread overview: 52+ messages
2023-09-27  3:31 [Patch v4 00/13] Enable fixed counter 3 and topdown perf metrics for vPMU Dapeng Mi
2023-09-27  3:31 ` [Patch v4 01/13] KVM: x86/pmu: Add Intel CPUID-hinted TopDown slots event Dapeng Mi
2023-09-27  3:31 ` [Patch v4 02/13] KVM: x86/pmu: Support PMU fixed counter 3 Dapeng Mi
2023-09-27  3:31 ` [Patch v4 03/13] perf/core: Add function perf_event_group_leader_check() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 04/13] perf/core: Add function perf_event_move_group() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 05/13] perf/core: Add *group_leader for perf_event_create_group_kernel_counters() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 06/13] perf/x86: Fix typos and inconsistent indents in perf_event header Dapeng Mi
2023-09-27  3:31 ` [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event Dapeng Mi
2023-09-27 11:33   ` Peter Zijlstra
2023-09-27 17:27     ` Sean Christopherson
2023-09-28  9:24       ` Mi, Dapeng
2023-09-29 11:53       ` Peter Zijlstra
2023-09-29 15:20         ` Ravi Bangoria
2023-10-02 12:29           ` Peter Zijlstra
2023-10-03  6:36             ` Ravi Bangoria
2023-09-29 15:46         ` Sean Christopherson
2023-09-30  3:29           ` Jim Mattson
2023-10-01  0:31             ` Namhyung Kim
2023-10-02 11:57           ` Peter Zijlstra
2023-10-02 13:30             ` Ingo Molnar
2023-10-02 15:23               ` David Dunn
2023-10-02 19:02                 ` Mingwei Zhang
2023-10-02 15:56               ` Sean Christopherson [this message]
2023-10-02 19:50                 ` Liang, Kan
2023-10-02 20:52                   ` Peter Zijlstra
2023-10-02 20:40                 ` Peter Zijlstra
2023-10-03  0:56                   ` Sean Christopherson
2023-10-03  8:16                     ` Peter Zijlstra
2023-10-03 15:23                       ` Sean Christopherson
2023-10-03 18:21                         ` Jim Mattson
2023-10-04 11:32                           ` Peter Zijlstra
2023-10-04 11:21                         ` Peter Zijlstra
2023-10-04 19:51                           ` Mingwei Zhang
2023-10-04 21:50                             ` Sean Christopherson
2023-10-04 22:05                               ` Sean Christopherson
2023-10-08 10:04                                 ` Like Xu
2023-10-09 17:03                                   ` Manali Shukla
2023-10-11 14:15                                     ` Peter Zijlstra
2023-10-17 10:24                                       ` Manali Shukla
2023-10-17 16:58                                         ` Mingwei Zhang
2023-10-18  0:01                                           ` Mingwei Zhang
2023-10-11 14:20                               ` Peter Zijlstra
2023-10-13 17:02                                 ` Sean Christopherson
2023-10-03 17:31                       ` Manali Shukla
2023-10-03 22:02                     ` Mingwei Zhang
2023-10-04 20:43                       ` Sean Christopherson
2023-09-27  3:31 ` [Patch v4 08/13] perf/core: Add new function perf_event_topdown_metrics() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 09/13] perf/x86/intel: Handle KVM virtual metrics event in perf system Dapeng Mi
2023-09-27  3:31 ` [Patch v4 10/13] KVM: x86/pmu: Extend pmc_reprogram_counter() to create group events Dapeng Mi
2023-09-27  3:31 ` [Patch v4 11/13] KVM: x86/pmu: Support topdown perf metrics feature Dapeng Mi
2023-09-27  3:31 ` [Patch v4 12/13] KVM: x86/pmu: Handle PERF_METRICS overflow Dapeng Mi
2023-09-27  3:31 ` [Patch v4 13/13] KVM: x86/pmu: Expose Topdown in MSR_IA32_PERF_CAPABILITIES Dapeng Mi
