From: Sean Christopherson <seanjc@google.com>
To: Mingwei Zhang <mizhang@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Dapeng Mi <dapeng1.mi@linux.intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Kan Liang <kan.liang@linux.intel.com>,
Like Xu <likexu@tencent.com>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org,
Zhenyu Wang <zhenyuw@linux.intel.com>,
Zhang Xiong <xiong.y.zhang@intel.com>,
Lv Zhiyuan <zhiyuan.lv@intel.com>,
Yang Weijiang <weijiang.yang@intel.com>,
Dapeng Mi <dapeng1.mi@intel.com>,
Jim Mattson <jmattson@google.com>,
David Dunn <daviddunn@google.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event
Date: Wed, 4 Oct 2023 13:43:50 -0700
Message-ID: <ZR3Ohk50rSofAnSL@google.com>
In-Reply-To: <CAL715WLbAnnGUiTdHPO0L7v2FHGa5qmTnWJDi8k9qVkGry5GGQ@mail.gmail.com>
On Tue, Oct 03, 2023, Mingwei Zhang wrote:
> On Mon, Oct 2, 2023 at 5:56 PM Sean Christopherson <seanjc@google.com> wrote:
> > The "when" is what's important. If KVM took a literal interpretation of
> > "exclude guest" for pass-through MSRs, then KVM would context switch all those
> > MSRs twice for every VM-Exit=>VM-Enter roundtrip, even when the VM-Exit isn't a
> > reschedule IRQ to schedule in a different task (or vCPU). The overhead to save
> > all the host/guest MSRs and load all of the guest/host MSRs *twice* for every
> > VM-Exit would be a non-starter. E.g. simple VM-Exits are completely handled in
> > <1500 cycles, and "fastpath" exits are something like half that. Switching all
> > the MSRs is likely 1000+ cycles, if not double that.
>
> Hi Sean,
>
> Sorry, I have no intention to interrupt the conversation, but this is
> slightly confusing to me.
>
> I remember that when we did AMX, we added a gigantic 8KB buffer to the
> FPU context switch state. That works well in Linux today. Why can't we
> do the same for the PMU, i.e. context switch all the counters, event
> selectors and global state there?
That's what we (Google folks) are proposing. However, there are significant
side effects if KVM context switches PMU outside of vcpu_run(), whereas the FPU
doesn't suffer the same problems.
Keeping the guest FPU resident for the duration of vcpu_run() is, in terms of
functionality, completely transparent to the rest of the kernel. From the kernel's
perspective, the guest FPU is just a variation of a userspace FPU, and the kernel
is already designed to save/restore userspace/guest FPU state when the kernel wants
to use the FPU for whatever reason. And crucially, kernel FPU usage is explicit
and contained, e.g. see kernel_fpu_{begin,end}(), and comes with mechanisms for
KVM to detect when the guest FPU needs to be reloaded (see TIF_NEED_FPU_LOAD).
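
Purely as illustration, here's a minimal sketch of what "explicit and contained"
kernel FPU usage looks like, and of the kind of check a VM-Enter path can do to
see whether the guest FPU was clobbered. kernel_fpu_begin()/kernel_fpu_end() and
test_thread_flag(TIF_NEED_FPU_LOAD) are the real x86 APIs; everything prefixed
with my_ is a made-up name for illustration, not KVM's actual code.

  #include <asm/fpu/api.h>        /* kernel_fpu_begin()/kernel_fpu_end() */
  #include <linux/thread_info.h>  /* test_thread_flag(); TIF_NEED_FPU_LOAD is x86-specific */

  struct my_vcpu;                                          /* hypothetical vCPU type */
  extern void my_restore_guest_fpu(struct my_vcpu *vcpu);  /* stand-in for KVM's actual reload */

  /* "Explicit and contained": all kernel FPU usage is bracketed like this. */
  static void my_kernel_fpu_user(void)
  {
          kernel_fpu_begin();     /* saves the current (user/guest) FPU regs, disables preemption */
          /* ... use SSE/AVX/etc. here ... */
          kernel_fpu_end();       /* done with the FPU, re-enables preemption */
  }

  /* Hypothetical VM-Enter path: reload the guest FPU only if someone used the FPU. */
  static void my_vcpu_enter_guest(struct my_vcpu *vcpu)
  {
          if (test_thread_flag(TIF_NEED_FPU_LOAD))
                  my_restore_guest_fpu(vcpu);
          /* ... VM-Enter ... */
  }
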
The PMU is a completely different story. PMU usage, a.k.a. perf, by design is
"always running". KVM can't transparently stop host usage of the PMU, as disabling
host PMU usage stops perf events from counting/profiling whatever it is they're
supposed to profile.
Today, KVM minimizes the "downtime" of host PMU usage by context switching PMU
state at VM-Enter and VM-Exit, or at least as close to those transitions as
possible, e.g. for LBRs and Intel PT.
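
For reference, "context switching PMU state" at the world switch boils down to
MSR accesses along these lines. This is only a rough sketch of the kind of MSR
traffic involved, not what KVM actually does (KVM relies on the VMCS atomic
MSR-switch lists and perf callbacks where it can). struct my_pmu_state and
MY_NR_GP_COUNTERS are made-up names, the counter count really comes from
CPUID.0xA, and the sketch assumes full-width writable counters (the
MSR_IA32_PMC0 aliases).

  #include <linux/types.h>
  #include <asm/msr.h>            /* rdmsrl()/wrmsrl() */
  #include <asm/msr-index.h>      /* MSR_CORE_PERF_GLOBAL_CTRL, MSR_ARCH_PERFMON_*, MSR_IA32_PMC0 */

  #define MY_NR_GP_COUNTERS 4     /* hypothetical; real code reads this from CPUID.0xA */

  struct my_pmu_state {           /* hypothetical container for the switched state */
          u64 global_ctrl;
          u64 eventsel[MY_NR_GP_COUNTERS];
          u64 counter[MY_NR_GP_COUNTERS];
  };

  static void my_pmu_save(struct my_pmu_state *s)
  {
          int i;

          rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, s->global_ctrl);
          wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);   /* stop counting before swapping state */
          for (i = 0; i < MY_NR_GP_COUNTERS; i++) {
                  rdmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, s->eventsel[i]);
                  rdmsrl(MSR_IA32_PMC0 + i, s->counter[i]);
          }
  }

  static void my_pmu_restore(const struct my_pmu_state *s)
  {
          int i;

          for (i = 0; i < MY_NR_GP_COUNTERS; i++) {
                  wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, s->eventsel[i]);
                  wrmsrl(MSR_IA32_PMC0 + i, s->counter[i]);
          }
          wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, s->global_ctrl);      /* resume counting last */
  }

Doing something like that twice per exit=>entry roundtrip is where the 1000+
cycle estimate quoted above comes from; each MSR access costs a nontrivial
number of cycles, and that's before fixed counters, global status, PEBS, LBRs,
etc. are accounted for.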
What we are proposing would *significantly* increase the downtime, to the point
where it would be almost unbounded in some paths, e.g. if KVM faults in a page,
gup() could end up swapping memory in from disk, installing PTEs, and so on and
so forth. If the host is trying to profile something related to swap or memory
management, it's out of luck.
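
To make the downtime window concrete, here's a rough sketch of the two switching
points being compared. Every my_* name is hypothetical and the real run loop is
far more involved; this is only meant to show where host perf loses visibility.

  #include <linux/types.h>

  struct my_vcpu;                 /* hypothetical; the my_*() helpers below are too */
  extern void my_pmu_load_guest(struct my_vcpu *vcpu);
  extern void my_pmu_put_guest(struct my_vcpu *vcpu);
  extern void my_vmenter(struct my_vcpu *vcpu);
  extern bool my_exit_is_page_fault(struct my_vcpu *vcpu);
  extern void my_handle_page_fault(struct my_vcpu *vcpu);  /* may call gup(), swap pages in, ... */
  extern bool my_should_return_to_userspace(struct my_vcpu *vcpu);

  static int my_vcpu_run(struct my_vcpu *vcpu)
  {
          my_pmu_load_guest(vcpu);        /* proposal: host perf loses the PMU here ... */

          for (;;) {
                  /* today: guest PMU state is loaded/saved (roughly) here instead */
                  my_vmenter(vcpu);

                  if (my_exit_is_page_fault(vcpu))
                          my_handle_page_fault(vcpu);     /* host perf is blind for all of this */

                  if (my_should_return_to_userspace(vcpu))
                          break;
          }

          my_pmu_put_guest(vcpu);         /* ... and only gets it back here */
          return 0;
  }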