From: Sean Christopherson <seanjc@google.com>
To: Marc Zyngier <maz@kernel.org>, Oliver Upton <oupton@kernel.org>,
Tianrui Zhao <zhaotianrui@loongson.cn>,
Bibo Mao <maobibo@loongson.cn>,
Huacai Chen <chenhuacai@kernel.org>,
Anup Patel <anup@brainfault.org>, Paul Walmsley <pjw@kernel.org>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>, Xin Li <xin@zytor.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org, loongarch@lists.linux.dev,
kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Mingwei Zhang <mizhang@google.com>,
Xudong Hao <xudong.hao@intel.com>,
Sandipan Das <sandipan.das@amd.com>,
Dapeng Mi <dapeng1.mi@linux.intel.com>,
Xiong Zhang <xiong.y.zhang@linux.intel.com>,
Manali Shukla <manali.shukla@amd.com>,
Jim Mattson <jmattson@google.com>
Subject: [PATCH v6 04/44] perf: Add APIs to create/release mediated guest vPMUs
Date: Fri, 5 Dec 2025 16:16:40 -0800 [thread overview]
Message-ID: <20251206001720.468579-5-seanjc@google.com> (raw)
In-Reply-To: <20251206001720.468579-1-seanjc@google.com>
From: Kan Liang <kan.liang@linux.intel.com>
Currently, exposing PMU capabilities to a KVM guest is done by emulating
guest PMCs via host perf events, i.e. by having KVM be "just" another user
of perf. As a result, the guest and host are effectively competing for
resources, and emulating guest accesses to vPMU resources requires
expensive actions (expensive relative to the native instruction). The
overhead and resource competition results in degraded guest performance
and ultimately very poor vPMU accuracy.
To address the issues with the perf-emulated vPMU, introduce a "mediated
vPMU", where the data plane (PMCs and enable/disable knobs) is exposed
directly to the guest, but the control plane (event selectors and access
to fixed counters) is managed by KVM (via MSR interceptions). To allow
host perf usage of the PMU to (partially) co-exist with KVM/guest usage
of the PMU, KVM and perf will coordinate to a world switch between host
perf context and guest vPMU context near VM-Enter/VM-Exit.
Add two exported APIs, perf_{create,release}_mediated_pmu(), to allow KVM
to create and release a mediated PMU instance (per VM). Because host perf
context will be deactivated while the guest is running, mediated PMU usage
will be mutually exclusive with perf analysis of the guest, i.e. perf
events that do NOT exclude the guest will not behave as expected.
To avoid silent failure of !exclude_guest perf events, disallow creating a
mediated PMU if there are active !exclude_guest events, and on the perf
side, disallowing creating new !exclude_guest perf events while there is
at least one active mediated PMU.
Exempt PMU resources that do not support mediated PMU usage, i.e. that are
outside the scope/view of KVM's vPMU and will not be swapped out while the
guest is running.
Guard mediated PMU with a new kconfig to help readers identify code paths
that are unique to mediated PMU support, and to allow for adding arch-
specific hooks without stubs. KVM x86 is expected to be the only KVM
architecture to support a mediated PMU in the near future (e.g. arm64 is
trending toward a partitioned PMU implementation), and KVM x86 will select
PERF_GUEST_MEDIATED_PMU unconditionally, i.e. won't need stubs.
Immediately select PERF_GUEST_MEDIATED_PMU when KVM x86 is enabled so that
all paths are compile tested. Full KVM support is on its way...
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: add kconfig and WARNing, rewrite changelog, swizzle patch ordering]
Tested-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Kconfig | 1 +
include/linux/perf_event.h | 6 +++
init/Kconfig | 4 ++
kernel/events/core.c | 82 ++++++++++++++++++++++++++++++++++++++
4 files changed, 93 insertions(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 278f08194ec8..d916bd766c94 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -37,6 +37,7 @@ config KVM_X86
select SCHED_INFO
select PERF_EVENTS
select GUEST_PERF_EVENTS
+ select PERF_GUEST_MEDIATED_PMU
select HAVE_KVM_MSI
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_NO_POLL
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fd1d91017b99..94f679634ef6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -305,6 +305,7 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
#define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400
+#define PERF_PMU_CAP_MEDIATED_VPMU 0x0800
/**
* pmu::scope
@@ -1914,6 +1915,11 @@ extern int perf_event_account_interrupt(struct perf_event *event);
extern int perf_event_period(struct perf_event *event, u64 value);
extern u64 perf_event_pause(struct perf_event *event, bool reset);
+#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
+int perf_create_mediated_pmu(void);
+void perf_release_mediated_pmu(void);
+#endif
+
#else /* !CONFIG_PERF_EVENTS: */
static inline void *
diff --git a/init/Kconfig b/init/Kconfig
index cab3ad28ca49..45b9ac626829 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2010,6 +2010,10 @@ config GUEST_PERF_EVENTS
bool
depends on HAVE_PERF_EVENTS
+config PERF_GUEST_MEDIATED_PMU
+ bool
+ depends on GUEST_PERF_EVENTS
+
config PERF_USE_VMALLOC
bool
help
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e34112df8b31..cfeea7d330f9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5657,6 +5657,8 @@ static void __free_event(struct perf_event *event)
call_rcu(&event->rcu_head, free_event_rcu);
}
+static void mediated_pmu_unaccount_event(struct perf_event *event);
+
DEFINE_FREE(__free_event, struct perf_event *, if (_T) __free_event(_T))
/* vs perf_event_alloc() success */
@@ -5666,6 +5668,7 @@ static void _free_event(struct perf_event *event)
irq_work_sync(&event->pending_disable_irq);
unaccount_event(event);
+ mediated_pmu_unaccount_event(event);
if (event->rb) {
/*
@@ -6188,6 +6191,81 @@ u64 perf_event_pause(struct perf_event *event, bool reset)
}
EXPORT_SYMBOL_GPL(perf_event_pause);
+#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
+static atomic_t nr_include_guest_events __read_mostly;
+
+static atomic_t nr_mediated_pmu_vms __read_mostly;
+static DEFINE_MUTEX(perf_mediated_pmu_mutex);
+
+/* !exclude_guest event of PMU with PERF_PMU_CAP_MEDIATED_VPMU */
+static inline bool is_include_guest_event(struct perf_event *event)
+{
+ if ((event->pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU) &&
+ !event->attr.exclude_guest)
+ return true;
+
+ return false;
+}
+
+static int mediated_pmu_account_event(struct perf_event *event)
+{
+ if (!is_include_guest_event(event))
+ return 0;
+
+ guard(mutex)(&perf_mediated_pmu_mutex);
+
+ if (atomic_read(&nr_mediated_pmu_vms))
+ return -EOPNOTSUPP;
+
+ atomic_inc(&nr_include_guest_events);
+ return 0;
+}
+
+static void mediated_pmu_unaccount_event(struct perf_event *event)
+{
+ if (!is_include_guest_event(event))
+ return;
+
+ atomic_dec(&nr_include_guest_events);
+}
+
+/*
+ * Currently invoked at VM creation to
+ * - Check whether there are existing !exclude_guest events of PMU with
+ * PERF_PMU_CAP_MEDIATED_VPMU
+ * - Set nr_mediated_pmu_vms to prevent !exclude_guest event creation on
+ * PMUs with PERF_PMU_CAP_MEDIATED_VPMU
+ *
+ * No impact for the PMU without PERF_PMU_CAP_MEDIATED_VPMU. The perf
+ * still owns all the PMU resources.
+ */
+int perf_create_mediated_pmu(void)
+{
+ guard(mutex)(&perf_mediated_pmu_mutex);
+ if (atomic_inc_not_zero(&nr_mediated_pmu_vms))
+ return 0;
+
+ if (atomic_read(&nr_include_guest_events))
+ return -EBUSY;
+
+ atomic_inc(&nr_mediated_pmu_vms);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_create_mediated_pmu);
+
+void perf_release_mediated_pmu(void)
+{
+ if (WARN_ON_ONCE(!atomic_read(&nr_mediated_pmu_vms)))
+ return;
+
+ atomic_dec(&nr_mediated_pmu_vms);
+}
+EXPORT_SYMBOL_GPL(perf_release_mediated_pmu);
+#else
+static int mediated_pmu_account_event(struct perf_event *event) { return 0; }
+static void mediated_pmu_unaccount_event(struct perf_event *event) {}
+#endif
+
/*
* Holding the top-level event's child_mutex means that any
* descendant process that has inherited this event will block
@@ -13078,6 +13156,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (err)
return ERR_PTR(err);
+ err = mediated_pmu_account_event(event);
+ if (err)
+ return ERR_PTR(err);
+
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
--
2.52.0.223.gf5cc29aaa4-goog
next prev parent reply other threads:[~2025-12-06 0:17 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 0:16 [PATCH v6 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 01/44] perf: Skip pmu_ctx based on event_type Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 02/44] perf: Add generic exclude_guest support Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 03/44] perf: Move security_perf_event_free() call to __free_event() Sean Christopherson
2025-12-06 0:16 ` Sean Christopherson [this message]
2025-12-08 11:51 ` [PATCH v6 04/44] perf: Add APIs to create/release mediated guest vPMUs Peter Zijlstra
2025-12-08 18:07 ` Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 05/44] perf: Clean up perf ctx time Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 06/44] perf: Add a EVENT_GUEST flag Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 07/44] perf: Add APIs to load/put guest mediated PMU context Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 08/44] perf/x86/core: Register a new vector for handling mediated guest PMIs Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 09/44] perf/x86/core: Add APIs to switch to/from mediated PMI vector (for KVM) Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 10/44] perf/x86/core: Do not set bit width for unavailable counters Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 11/44] perf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 12/44] perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 13/44] perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 14/44] KVM: Add a simplified wrapper for registering perf callbacks Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 15/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 16/44] KVM: x86/pmu: Start stubbing in mediated PMU support Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 17/44] KVM: x86/pmu: Implement Intel mediated PMU requirements and constraints Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 18/44] KVM: x86/pmu: Implement AMD mediated PMU requirements Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 19/44] KVM: x86/pmu: Register PMI handler for mediated vPMU Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 20/44] KVM: x86/pmu: Disable RDPMC interception for compatible " Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 21/44] KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 22/44] KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs Sean Christopherson
2025-12-06 0:16 ` [PATCH v6 23/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 24/44] KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 25/44] KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 26/44] KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 27/44] KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 28/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 29/44] KVM: x86/pmu: Handle emulated instruction for mediated vPMU Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 30/44] KVM: nVMX: Add macros to simplify nested MSR interception setting Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 31/44] KVM: nVMX: Disable PMU MSR interception as appropriate while running L2 Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 32/44] KVM: nSVM: " Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 33/44] KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 34/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 35/44] KVM: VMX: Drop intermediate "guest" field from msr_autostore Sean Christopherson
2025-12-08 9:14 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 36/44] KVM: nVMX: Don't update msr_autostore count when saving TSC for vmcs12 Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 37/44] KVM: VMX: Dedup code for removing MSR from VMCS's auto-load list Sean Christopherson
2025-12-08 9:29 ` Mi, Dapeng
2025-12-09 17:37 ` Sean Christopherson
2025-12-10 1:08 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 38/44] KVM: VMX: Drop unused @entry_only param from add_atomic_switch_msr() Sean Christopherson
2025-12-08 9:32 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 39/44] KVM: VMX: Bug the VM if either MSR auto-load list is full Sean Christopherson
2025-12-08 9:32 ` Mi, Dapeng
2025-12-08 9:34 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 40/44] KVM: VMX: Set MSR index auto-load entry if and only if entry is "new" Sean Christopherson
2025-12-08 9:35 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 41/44] KVM: VMX: Compartmentalize adding MSRs to host vs. guest auto-load list Sean Christopherson
2025-12-08 9:36 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 42/44] KVM: VMX: Dedup code for adding MSR to VMCS's auto list Sean Christopherson
2025-12-08 9:37 ` Mi, Dapeng
2025-12-06 0:17 ` [PATCH v6 43/44] KVM: VMX: Initialize vmcs01.VM_EXIT_MSR_STORE_ADDR with list address Sean Christopherson
2025-12-06 0:17 ` [PATCH v6 44/44] KVM: VMX: Add mediated PMU support for CPUs without "save perf global ctrl" Sean Christopherson
2025-12-08 9:39 ` Mi, Dapeng
2025-12-09 6:31 ` Mi, Dapeng
2025-12-08 15:37 ` [PATCH v6 00/44] KVM: x86: Add support for mediated vPMUs Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251206001720.468579-5-seanjc@google.com \
--to=seanjc@google.com \
--cc=acme@kernel.org \
--cc=anup@brainfault.org \
--cc=aou@eecs.berkeley.edu \
--cc=chenhuacai@kernel.org \
--cc=dapeng1.mi@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jmattson@google.com \
--cc=kvm-riscv@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=loongarch@lists.linux.dev \
--cc=luto@kernel.org \
--cc=manali.shukla@amd.com \
--cc=maobibo@loongson.cn \
--cc=maz@kernel.org \
--cc=mingo@redhat.com \
--cc=mizhang@google.com \
--cc=namhyung@kernel.org \
--cc=oupton@kernel.org \
--cc=palmer@dabbelt.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pjw@kernel.org \
--cc=sandipan.das@amd.com \
--cc=xin@zytor.com \
--cc=xiong.y.zhang@linux.intel.com \
--cc=xudong.hao@intel.com \
--cc=zhaotianrui@loongson.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).