public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Dapeng Mi <dapeng1.mi@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Kan Liang <kan.liang@linux.intel.com>,
	Like Xu <likexu@tencent.com>, Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	Zhang Xiong <xiong.y.zhang@intel.com>,
	Lv Zhiyuan <zhiyuan.lv@intel.com>,
	Yang Weijiang <weijiang.yang@intel.com>,
	Dapeng Mi <dapeng1.mi@intel.com>,
	Jim Mattson <jmattson@google.com>,
	David Dunn <daviddunn@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
Date: Tue, 3 Oct 2023 10:16:16 +0200	[thread overview]
Message-ID: <20231003081616.GE27267@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <ZRtmvLJFGfjcusQW@google.com>

On Mon, Oct 02, 2023 at 05:56:28PM -0700, Sean Christopherson wrote:
> On Mon, Oct 02, 2023, Peter Zijlstra wrote:

> > I'm not sure what you're suggesting here. It will have to save/restore
> > all those MSRs anyway. Suppose it switches between vCPUs.
> 
> The "when" is what's important.   If KVM took a literal interpretation of
> "exclude guest" for pass-through MSRs, then KVM would context switch all those
> MSRs twice for every VM-Exit=>VM-Enter roundtrip, even when the VM-Exit isn't a
> reschedule IRQ to schedule in a different task (or vCPU).  The overhead to save
> all the host/guest MSRs and load all of the guest/host MSRs *twice* for every
> VM-Exit would be a non-starter.  E.g. simple VM-Exits are completely handled in
> <1500 cycles, and "fastpath" exits are something like half that.  Switching all
> the MSRs is likely 1000+ cycles, if not double that.

See, you're the virt-nerd and I'm sure you know what you're talking
about, but I have no clue :-) I didn't know there were different levels
of vm-exit.

> FWIW, the primary use case we care about is for slice-of-hardware VMs, where each
> vCPU is pinned 1:1 with a host pCPU.

I've been given to understand that vm-exit is a bad word in this
scenario, any exit is a fail. They get MWAIT and all the other crap and
more or less pretend to be real hardware.

So why do you care about those MSRs so much? That should 'never' happen
in this scenario.

> > > Or at least, that was my reading of things.  Maybe it was just a
> > > misunderstanding because we didn't do a good job of defining the behavior.
> > 
> > This might be the case. I don't particularly care where the guest
> > boundary lies -- somewhere in the vCPU thread. Once the thread is gone,
> > PMU is usable again etc..
> 
> Well drat, that there would have saved a wee bit of frustration.  Better late
> than never though, that's for sure.
> 
> Just to double confirm: keeping guest PMU state loaded until the vCPU is scheduled
> out or KVM exits to userspace, would mean that host perf events won't be active
> for potentially large swaths of non-KVM code.  Any function calls or event/exception
> handlers that occur within the context of ioctl(KVM_RUN) would run with host
> perf events disabled.

Hurmph, that sounds sub-optimal, earlier you said <1500 cycles, this all
sounds like a ton more.

/me frobs around the kvm code some...

Are we talking about exit_fastpath loop in vcpu_enter_guest() ? That
seems to run with IRQs disabled, so at most you can trigger a #PF or
something, which will then trip an exception fixup because you can't run
#PF with IRQs disabled etc..

That seems fine. That is, a theoretical kvm_x86_handle_enter_irqoff()
coupled with the existing kvm_x86_handle_exit_irqoff() seems like
reasonable solution from where I'm sitting. That also more or less
matches the FPU state save/restore AFAICT.

Or are you talking about the whole of vcpu_run() ? That seems like a
massive amount of code, and doesn't look like anything I'd call a
fast-path. Also, much of that loop has preemption enabled...

> Are you ok with that approach?  Assuming we don't completely botch things, the
> interfaces are sane, we can come up with a clean solution for handling NMIs, etc.

Since you steal the whole PMU, can't you re-route the PMI to something
that's virt friendly too?

> > It also means ::exclude_guest should actually work -- it often does not
> > today -- the IBS thing for example totally ignores it.
> 
> Is that already an in-tree, or are you talking about Manali's proposed series to
> support virtualizing IBS?

The IBS code as is, it totally ignores ::exclude_guest. Manali was going
to add some of it. But I'm not at all sure about the state of the other
PMU drivers we have.

Just for giggles, P4 has VMX support... /me runs like crazy

  reply	other threads:[~2023-10-03  8:30 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-27  3:31 [Patch v4 00/13] Enable fixed counter 3 and topdown perf metrics for vPMU Dapeng Mi
2023-09-27  3:31 ` [Patch v4 01/13] KVM: x86/pmu: Add Intel CPUID-hinted TopDown slots event Dapeng Mi
2023-09-27  3:31 ` [Patch v4 02/13] KVM: x86/pmu: Support PMU fixed counter 3 Dapeng Mi
2023-09-27  3:31 ` [Patch v4 03/13] perf/core: Add function perf_event_group_leader_check() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 04/13] perf/core: Add function perf_event_move_group() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 05/13] perf/core: Add *group_leader for perf_event_create_group_kernel_counters() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 06/13] perf/x86: Fix typos and inconsistent indents in perf_event header Dapeng Mi
2023-09-27  3:31 ` [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event Dapeng Mi
2023-09-27 11:33   ` Peter Zijlstra
2023-09-27 17:27     ` Sean Christopherson
2023-09-28  9:24       ` Mi, Dapeng
2023-09-29 11:53       ` Peter Zijlstra
2023-09-29 15:20         ` Ravi Bangoria
2023-10-02 12:29           ` Peter Zijlstra
2023-10-03  6:36             ` Ravi Bangoria
2023-09-29 15:46         ` Sean Christopherson
2023-09-30  3:29           ` Jim Mattson
2023-10-01  0:31             ` Namhyung Kim
2023-10-02 11:57           ` Peter Zijlstra
2023-10-02 13:30             ` Ingo Molnar
2023-10-02 15:23               ` David Dunn
2023-10-02 19:02                 ` Mingwei Zhang
2023-10-02 15:56               ` Sean Christopherson
2023-10-02 19:50                 ` Liang, Kan
2023-10-02 20:52                   ` Peter Zijlstra
2023-10-02 20:40                 ` Peter Zijlstra
2023-10-03  0:56                   ` Sean Christopherson
2023-10-03  8:16                     ` Peter Zijlstra [this message]
2023-10-03 15:23                       ` Sean Christopherson
2023-10-03 18:21                         ` Jim Mattson
2023-10-04 11:32                           ` Peter Zijlstra
2023-10-04 11:21                         ` Peter Zijlstra
2023-10-04 19:51                           ` Mingwei Zhang
2023-10-04 21:50                             ` Sean Christopherson
2023-10-04 22:05                               ` Sean Christopherson
2023-10-08 10:04                                 ` Like Xu
2023-10-09 17:03                                   ` Manali Shukla
2023-10-11 14:15                                     ` Peter Zijlstra
2023-10-17 10:24                                       ` Manali Shukla
2023-10-17 16:58                                         ` Mingwei Zhang
2023-10-18  0:01                                           ` Mingwei Zhang
2023-10-11 14:20                               ` Peter Zijlstra
2023-10-13 17:02                                 ` Sean Christopherson
2023-10-03 17:31                       ` Manali Shukla
2023-10-03 22:02                     ` Mingwei Zhang
2023-10-04 20:43                       ` Sean Christopherson
2023-09-27  3:31 ` [Patch v4 08/13] perf/core: Add new function perf_event_topdown_metrics() Dapeng Mi
2023-09-27  3:31 ` [Patch v4 09/13] perf/x86/intel: Handle KVM virtual metrics event in perf system Dapeng Mi
2023-09-27  3:31 ` [Patch v4 10/13] KVM: x86/pmu: Extend pmc_reprogram_counter() to create group events Dapeng Mi
2023-09-27  3:31 ` [Patch v4 11/13] KVM: x86/pmu: Support topdown perf metrics feature Dapeng Mi
2023-09-27  3:31 ` [Patch v4 12/13] KVM: x86/pmu: Handle PERF_METRICS overflow Dapeng Mi
2023-09-27  3:31 ` [Patch v4 13/13] KVM: x86/pmu: Expose Topdown in MSR_IA32_PERF_CAPABILITIES Dapeng Mi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231003081616.GE27267@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=dapeng1.mi@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=daviddunn@google.com \
    --cc=irogers@google.com \
    --cc=jmattson@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=mizhang@google.com \
    --cc=namhyung@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=weijiang.yang@intel.com \
    --cc=xiong.y.zhang@intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox