linux-perf-users.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Kan Liang <kan.liang@linux.intel.com>,
	 Like Xu <likexu@tencent.com>,
	Mark Rutland <mark.rutland@arm.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,  Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	 Adrian Hunter <adrian.hunter@intel.com>,
	kvm@vger.kernel.org,  linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	 Zhenyu Wang <zhenyuw@linux.intel.com>,
	Zhang Xiong <xiong.y.zhang@intel.com>,
	 Lv Zhiyuan <zhiyuan.lv@intel.com>,
	Yang Weijiang <weijiang.yang@intel.com>,
	 Dapeng Mi <dapeng1.mi@intel.com>,
	Jim Mattson <jmattson@google.com>,
	 David Dunn <daviddunn@google.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
Date: Wed, 27 Sep 2023 10:27:07 -0700
Message-ID: <ZRRl6y1GL-7RM63x@google.com>
In-Reply-To: <20230927113312.GD21810@noisy.programming.kicks-ass.net>

+Jim, David, and Mingwei

On Wed, Sep 27, 2023, Peter Zijlstra wrote:
> On Wed, Sep 27, 2023 at 11:31:18AM +0800, Dapeng Mi wrote:
> > When a guest wants to use the PERF_METRICS MSR, a virtual metrics event needs
> > to be created in the perf subsystem so that the guest can have exclusive
> > ownership of the PERF_METRICS MSR.
> 
> Urgh, can someone please remind me how all that is supposed to work
> again? The guest is just a task that wants the event. If the
> host creates a CPU event, then that gets scheduled with higher priority
> and the task loses out, no joy.
> 
> So you cannot guarantee the guest gets anything.
> 
> That is, I remember we've had this exact problem before, but I keep
> forgetting how this all is supposed to work. I don't use this virt stuff
> (and every time I try qemu arguments defeat me and I give up in
> disgust).

I don't think it does work, at least not without a very, very carefully crafted
setup and a host userspace that knows it must not use certain aspects of perf.
E.g. for PEBS, if the guest virtual counters don't map 1:1 to the "real" counters
in hardware, KVM+perf simply disables the counter.

And for top-down slots, getting anything remotely accurate requires pinning vCPUs
1:1 with pCPUs and enumerating an accurate topology to the guest:

  The count is distributed among unhalted logical processors (hyper-threads) who
  share the same physical core, in processors that support Intel Hyper-Threading
  Technology.

Jumping the gun a bit (we're in the *super* early stages of scraping together a
rough PoC), but I think we should effectively put KVM's current vPMU support into
maintenance-only mode, i.e. stop adding new features unless they are *very* simple
to enable, and instead pursue an implementation that (a) lets userspace (and/or
the kernel builder) completely disable host perf (or possibly just host perf usage
of the hardware PMU) and (b) lets KVM pass through the entire hardware PMU when it
has been turned off in the host.

I.e. keep KVM's existing best-effort vPMU support, e.g. for setups where the
platform owner is also the VM user (running a Windows VM on a Linux box, hosting
a Linux VM in ChromeOS, etc...).  But for anything advanced and for hard guarantees,
e.g. cloud providers that want to expose fully featured vPMU to customers, force
the platform owner to choose between using perf (or again, perf with hardware PMU)
in the host, and exposing the hardware PMU to the guest.

Hardware vendors are pushing us in this direction whether we like it or not, e.g.
SNP and TDX want to disallow profiling the guest from the host, ARM has an
upcoming PMU model where (IIUC) it can't be virtualized without a passthrough
approach, Intel's hybrid CPUs are a complete trainwreck unless vCPUs are pinned,
and virtualizing things like top-down slots, PEBS, and LBRs in the shared model
requires an absurd amount of complexity throughout the kernel and userspace.

Note, a similar idea was floated and rejected in the past[*], but that failed
proposal tried to retain host perf+PMU functionality by making the behavior dynamic,
which I agree would create an awful ABI for the host.  If we make the "knob" a
Kconfig or kernel param, i.e. require the platform owner to opt-out of using perf
no later than at boot time, then I think we can provide a sane ABI, keep the
implementation simple, all without breaking existing users that utilize perf in
the host to profile guests.

[*] https://lore.kernel.org/all/CALMp9eRBOmwz=mspp0m5Q093K3rMUeAsF3vEL39MGV5Br9wEQQ@mail.gmail.com
