linux-riscv.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs
@ 2025-08-06 19:56 Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 01/44] perf: Skip pmu_ctx based on event_type Sean Christopherson
                   ` (46 more replies)
  0 siblings, 47 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

This series is based on the fastpath+PMU cleanups series[*] (which is based on
kvm/queue), but the non-KVM changes apply cleanly on v6.16 or Linus' tree.
I.e. if you only care about the perf changes, I would just apply on whatever
branch is convenient and stop when you hit the KVM changes.

My hope/plan is that the perf changes will go through the tip tree with a
stable tag/branch, and the KVM changes will go the kvm-x86 tree.

Non-x86 KVM folks, y'all are getting Cc'd due to minor changes in "KVM: Add a
simplified wrapper for registering perf callbacks".

The full set is also available at:

  https://github.com/sean-jc/linux.git tags/mediated-vpmu-v5

Add support for mediated vPMUs in KVM x86, where "mediated" aligns with the
standard definition of intercepting control operations (e.g. event selectors),
while allowing the guest to perform data operations (e.g. read PMCs, toggle
counters on/off) without KVM getting involed.

For an in-depth description of the what and why, please see the cover letter
from the original RFC:

  https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com

All KVM tests pass (or fail the same before and after), and I've manually
verified MSR/PMC are passed through as expected, but I haven't done much at all
to actually utilize the PMU in a guest.  I'll be amazed if I didn't make at
least one major goof.

Similarly, I tried to address all feedback, but there are many, many changes
relative to v4.  If I missed something, I apologize in advance.

In other words, please thoroughly review and test.

[*] https://lore.kernel.org/all/20250805190526.1453366-1-seanjc@google.com

v5:
 - Add a patch to call security_perf_event_free() from __free_event()
   instead of _free_event() (necessitated by the __cleanup() changes).
 - Add CONFIG_PERF_GUEST_MEDIATED_PMU to guard the new perf functionality.
 - Ensure the PMU is fully disabled in perf_{load,put}_guest_context() when
   when switching between guest and host context. [Kan, Namhyung]
 - Route the new system IRQ, PERF_GUEST_MEDIATED_PMI_VECTOR, through perf,
   not KVM, and play nice with FRED.
 - Rename and combine perf_{guest,host}_{enter,exit}() to a single set of
   APIs, perf_{load,put}_guest_context().
 - Rename perf_{get,put}_mediated_pmu() to perf_{create,release}_mediated_pmu()
   to (hopefully) better differentiate them from perf_{load,put}_guest_context().
 - Change the param to the load/put APIs from "u32 guest_lvtpc" to
   "unsigned long data" to decouple arch code as much as possible.  E.g. if
   a non-x86 arch were to ever support a mediated vPMU, @data could be used
   to pass a pointer to a struct.
 - Use pmu->version to detect if a vCPU has a mediated PMU.
 - Use a kvm_x86_ops hook to check for mediated PMU support.
 - Cull "passthrough" from as many places as I could find.
 - Improve the changelog/documentation related to RDPMC interception.
 - Check harware capabilities, not KVM capabilities, when calculating
   MSR and RDPMC intercepts.
 - Rework intercept (re)calculation to use a request and the existing (well,
   will be existing as of 6.17-rc1) vendor hooks for recalculating intercepts.
 - Always read PERF_GLOBAL_CTRL on VM-Exit if writes weren't intercepted while
   running the vCPU.
 - Call setup_vmcs_config() before kvm_x86_vendor_init() so that the golden
   VMCS configuration is known before kvm_init_pmu_capability() is called.
 - Keep as much refresh/init code in common x86 as possible.
 - Context switch PMCs and event selectors in common x86, not vendor code.
 - Bail from the VM-Exit fastpath if the guest is counting instructions
   retired and the mediated PMU is enabled (because guest state hasn't yet
   been synchronized with hardware).
 - Don't require an userspace to opt-in via KVM_CAP_PMU_CAPABILITY, and instead
   automatically "create" a mediated PMU on the first KVM_CREATE_VCPU call if
   the VM has an in-kernel local APIC.
 - Add entries in kernel-parameters.txt for the PMU params.
 - Add a patch to elide PMC writes when possible.
 - Many more fixups and tweaks...

v4:
 - https://lore.kernel.org/all/20250324173121.1275209-1-mizhang@google.com
 - Rebase whole patchset on 6.14-rc3 base.
 - Address Peter's comments on Perf part.
 - Address Sean's comments on KVM part.
   * Change key word "passthrough" to "mediated" in all patches
   * Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
   * Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
     save/retore list support for GLOBAL_CTRL, thus the support of mediated
     vPMU is constrained to SapphireRapids and later CPUs on Intel side.
   * Merge some small changes into a single patch.
 - Address Sandipan's comment on invalid pmu pointer.
 - Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
   manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.

v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com
v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com
v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com

Dapeng Mi (15):
  KVM: x86/pmu: Start stubbing in mediated PMU support
  KVM: x86/pmu: Implement Intel mediated PMU requirements and
    constraints
  KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
  KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
  KVM: VMX: Add helpers to toggle/change a bit in VMCS execution
    controls
  KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
  KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated
    PMU
  KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents
  KVM: x86/pmu: Disable interception of select PMU MSRs for mediated
    vPMUs
  KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter
    accesses
  KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter
    updates
  KVM: x86/pmu: Load/put mediated PMU context when entering/exiting
    guest
  KVM: x86/pmu: Handle emulated instruction for mediated vPMU
  KVM: nVMX: Add macros to simplify nested MSR interception setting
  KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space

Kan Liang (7):
  perf: Skip pmu_ctx based on event_type
  perf: Add generic exclude_guest support
  perf: Add APIs to create/release mediated guest vPMUs
  perf: Clean up perf ctx time
  perf: Add a EVENT_GUEST flag
  perf: Add APIs to load/put guest mediated PMU context
  perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU

Mingwei Zhang (3):
  perf/x86/core: Plumb mediated PMU capability from x86_pmu to
    x86_pmu_cap
  KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
  KVM: nVMX: Disable PMU MSR interception as appropriate while running
    L2

Sandipan Das (3):
  perf/x86/core: Do not set bit width for unavailable counters
  perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
  KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on
    AMD

Sean Christopherson (15):
  perf: Move security_perf_event_free() call to __free_event()
  perf: core/x86: Register a new vector for handling mediated guest PMIs
  perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put
    context
  KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init()
  KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
  KVM: Add a simplified wrapper for registering perf callbacks
  KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
  KVM: x86/pmu: Implement AMD mediated PMU requirements
  KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic
    RECALC_INTERCEPTS
  KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates
  KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86
  KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to
    PMU v2+
  KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are
    active
  KVM: nSVM: Disable PMU MSR interception as appropriate while running
    L2
  KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already
    match

Xiong Zhang (1):
  KVM: x86/pmu: Register PMI handler for mediated vPMU

 .../admin-guide/kernel-parameters.txt         |  49 ++
 arch/arm64/kvm/arm.c                          |   2 +-
 arch/loongarch/kvm/main.c                     |   2 +-
 arch/riscv/kvm/main.c                         |   2 +-
 arch/x86/entry/entry_fred.c                   |   1 +
 arch/x86/events/amd/core.c                    |   2 +
 arch/x86/events/core.c                        |  32 +-
 arch/x86/events/intel/core.c                  |   5 +
 arch/x86/include/asm/hardirq.h                |   3 +
 arch/x86/include/asm/idtentry.h               |   6 +
 arch/x86/include/asm/irq_vectors.h            |   4 +-
 arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
 arch/x86/include/asm/kvm-x86-pmu-ops.h        |   4 +
 arch/x86/include/asm/kvm_host.h               |   7 +-
 arch/x86/include/asm/msr-index.h              |  17 +-
 arch/x86/include/asm/perf_event.h             |   1 +
 arch/x86/include/asm/vmx.h                    |   1 +
 arch/x86/kernel/idt.c                         |   3 +
 arch/x86/kernel/irq.c                         |  19 +
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/cpuid.c                          |   2 +
 arch/x86/kvm/pmu.c                            | 272 ++++++++-
 arch/x86/kvm/pmu.h                            |  37 +-
 arch/x86/kvm/svm/nested.c                     |  18 +-
 arch/x86/kvm/svm/pmu.c                        |  51 +-
 arch/x86/kvm/svm/svm.c                        |  54 +-
 arch/x86/kvm/vmx/capabilities.h               |  11 +-
 arch/x86/kvm/vmx/main.c                       |  14 +-
 arch/x86/kvm/vmx/nested.c                     |  65 ++-
 arch/x86/kvm/vmx/pmu_intel.c                  | 169 ++++--
 arch/x86/kvm/vmx/pmu_intel.h                  |  15 +
 arch/x86/kvm/vmx/vmx.c                        | 143 +++--
 arch/x86/kvm/vmx/vmx.h                        |  11 +-
 arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
 arch/x86/kvm/x86.c                            |  69 ++-
 arch/x86/kvm/x86.h                            |   1 +
 include/linux/kvm_host.h                      |  11 +-
 include/linux/perf_event.h                    |  38 +-
 init/Kconfig                                  |   4 +
 kernel/events/core.c                          | 521 ++++++++++++++----
 .../beauty/arch/x86/include/asm/irq_vectors.h |   3 +-
 virt/kvm/kvm_main.c                           |   6 +-
 42 files changed, 1385 insertions(+), 295 deletions(-)


base-commit: 53d61a43a7973f812caa08fa922b607574befef4
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v5 01/44] perf: Skip pmu_ctx based on event_type
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 02/44] perf: Add generic exclude_guest support Sean Christopherson
                   ` (45 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

To optimize the cgroup context switch, the perf_event_pmu_context
iteration skips the PMUs without cgroup events. A bool cgroup was
introduced to indicate the case. It can work, but this way is hard to
extend for other cases, e.g. skipping non-mediated PMUs. It doesn't
make sense to keep adding bool variables.

Pass the event_type instead of the specific bool variable. Check both
the event_type and related pmu_ctx variables to decide whether skipping
a PMU.

Event flags, e.g., EVENT_CGROUP, should be cleard in the ctx->is_active.
Add EVENT_FLAGS to indicate such event flags.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/events/core.c | 74 ++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 22fdf0c187cd..d4528554528d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -164,7 +164,7 @@ enum event_type_t {
 	/* see ctx_resched() for details */
 	EVENT_CPU	= 0x10,
 	EVENT_CGROUP	= 0x20,
-
+	EVENT_FLAGS	= EVENT_CGROUP,
 	/* compound helpers */
 	EVENT_ALL         = EVENT_FLEXIBLE | EVENT_PINNED,
 	EVENT_TIME_FROZEN = EVENT_TIME | EVENT_FROZEN,
@@ -778,27 +778,37 @@ do {									\
 	___p;								\
 })
 
-#define for_each_epc(_epc, _ctx, _pmu, _cgroup)				\
+static bool perf_skip_pmu_ctx(struct perf_event_pmu_context *pmu_ctx,
+			      enum event_type_t event_type)
+{
+	if ((event_type & EVENT_CGROUP) && !pmu_ctx->nr_cgroups)
+		return true;
+	return false;
+}
+
+#define for_each_epc(_epc, _ctx, _pmu, _event_type)			\
 	list_for_each_entry(_epc, &((_ctx)->pmu_ctx_list), pmu_ctx_entry) \
-		if (_cgroup && !_epc->nr_cgroups)			\
+		if (perf_skip_pmu_ctx(_epc, _event_type))		\
 			continue;					\
 		else if (_pmu && _epc->pmu != _pmu)			\
 			continue;					\
 		else
 
-static void perf_ctx_disable(struct perf_event_context *ctx, bool cgroup)
+static void perf_ctx_disable(struct perf_event_context *ctx,
+			     enum event_type_t event_type)
 {
 	struct perf_event_pmu_context *pmu_ctx;
 
-	for_each_epc(pmu_ctx, ctx, NULL, cgroup)
+	for_each_epc(pmu_ctx, ctx, NULL, event_type)
 		perf_pmu_disable(pmu_ctx->pmu);
 }
 
-static void perf_ctx_enable(struct perf_event_context *ctx, bool cgroup)
+static void perf_ctx_enable(struct perf_event_context *ctx,
+			    enum event_type_t event_type)
 {
 	struct perf_event_pmu_context *pmu_ctx;
 
-	for_each_epc(pmu_ctx, ctx, NULL, cgroup)
+	for_each_epc(pmu_ctx, ctx, NULL, event_type)
 		perf_pmu_enable(pmu_ctx->pmu);
 }
 
@@ -963,8 +973,7 @@ static void perf_cgroup_switch(struct task_struct *task)
 		return;
 
 	WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);
-
-	perf_ctx_disable(&cpuctx->ctx, true);
+	perf_ctx_disable(&cpuctx->ctx, EVENT_CGROUP);
 
 	ctx_sched_out(&cpuctx->ctx, NULL, EVENT_ALL|EVENT_CGROUP);
 	/*
@@ -980,7 +989,7 @@ static void perf_cgroup_switch(struct task_struct *task)
 	 */
 	ctx_sched_in(&cpuctx->ctx, NULL, EVENT_ALL|EVENT_CGROUP);
 
-	perf_ctx_enable(&cpuctx->ctx, true);
+	perf_ctx_enable(&cpuctx->ctx, EVENT_CGROUP);
 }
 
 static int perf_cgroup_ensure_storage(struct perf_event *event,
@@ -2898,11 +2907,11 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
 
 	event_type &= EVENT_ALL;
 
-	for_each_epc(epc, &cpuctx->ctx, pmu, false)
+	for_each_epc(epc, &cpuctx->ctx, pmu, 0)
 		perf_pmu_disable(epc->pmu);
 
 	if (task_ctx) {
-		for_each_epc(epc, task_ctx, pmu, false)
+		for_each_epc(epc, task_ctx, pmu, 0)
 			perf_pmu_disable(epc->pmu);
 
 		task_ctx_sched_out(task_ctx, pmu, event_type);
@@ -2922,11 +2931,11 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
 
 	perf_event_sched_in(cpuctx, task_ctx, pmu);
 
-	for_each_epc(epc, &cpuctx->ctx, pmu, false)
+	for_each_epc(epc, &cpuctx->ctx, pmu, 0)
 		perf_pmu_enable(epc->pmu);
 
 	if (task_ctx) {
-		for_each_epc(epc, task_ctx, pmu, false)
+		for_each_epc(epc, task_ctx, pmu, 0)
 			perf_pmu_enable(epc->pmu);
 	}
 }
@@ -3475,11 +3484,10 @@ static void
 ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t event_type)
 {
 	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+	enum event_type_t active_type = event_type & ~EVENT_FLAGS;
 	struct perf_event_pmu_context *pmu_ctx;
 	int is_active = ctx->is_active;
-	bool cgroup = event_type & EVENT_CGROUP;
 
-	event_type &= ~EVENT_CGROUP;
 
 	lockdep_assert_held(&ctx->lock);
 
@@ -3510,7 +3518,7 @@ ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 	 * see __load_acquire() in perf_event_time_now()
 	 */
 	barrier();
-	ctx->is_active &= ~event_type;
+	ctx->is_active &= ~active_type;
 
 	if (!(ctx->is_active & EVENT_ALL)) {
 		/*
@@ -3531,7 +3539,7 @@ ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 
 	is_active ^= ctx->is_active; /* changed bits */
 
-	for_each_epc(pmu_ctx, ctx, pmu, cgroup)
+	for_each_epc(pmu_ctx, ctx, pmu, event_type)
 		__pmu_ctx_sched_out(pmu_ctx, is_active);
 }
 
@@ -3687,7 +3695,7 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
 		raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING);
 		if (context_equiv(ctx, next_ctx)) {
 
-			perf_ctx_disable(ctx, false);
+			perf_ctx_disable(ctx, 0);
 
 			/* PMIs are disabled; ctx->nr_no_switch_fast is stable. */
 			if (local_read(&ctx->nr_no_switch_fast) ||
@@ -3711,7 +3719,7 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
 
 			perf_ctx_sched_task_cb(ctx, task, false);
 
-			perf_ctx_enable(ctx, false);
+			perf_ctx_enable(ctx, 0);
 
 			/*
 			 * RCU_INIT_POINTER here is safe because we've not
@@ -3735,13 +3743,13 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
 
 	if (do_switch) {
 		raw_spin_lock(&ctx->lock);
-		perf_ctx_disable(ctx, false);
+		perf_ctx_disable(ctx, 0);
 
 inside_switch:
 		perf_ctx_sched_task_cb(ctx, task, false);
 		task_ctx_sched_out(ctx, NULL, EVENT_ALL);
 
-		perf_ctx_enable(ctx, false);
+		perf_ctx_enable(ctx, 0);
 		raw_spin_unlock(&ctx->lock);
 	}
 }
@@ -4050,11 +4058,9 @@ static void
 ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t event_type)
 {
 	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+	enum event_type_t active_type = event_type & ~EVENT_FLAGS;
 	struct perf_event_pmu_context *pmu_ctx;
 	int is_active = ctx->is_active;
-	bool cgroup = event_type & EVENT_CGROUP;
-
-	event_type &= ~EVENT_CGROUP;
 
 	lockdep_assert_held(&ctx->lock);
 
@@ -4072,7 +4078,7 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 		barrier();
 	}
 
-	ctx->is_active |= (event_type | EVENT_TIME);
+	ctx->is_active |= active_type | EVENT_TIME;
 	if (ctx->task) {
 		if (!(is_active & EVENT_ALL))
 			cpuctx->task_ctx = ctx;
@@ -4087,13 +4093,13 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 	 * in order to give them the best chance of going on.
 	 */
 	if (is_active & EVENT_PINNED) {
-		for_each_epc(pmu_ctx, ctx, pmu, cgroup)
+		for_each_epc(pmu_ctx, ctx, pmu, event_type)
 			__pmu_ctx_sched_in(pmu_ctx, EVENT_PINNED);
 	}
 
 	/* Then walk through the lower prio flexible groups */
 	if (is_active & EVENT_FLEXIBLE) {
-		for_each_epc(pmu_ctx, ctx, pmu, cgroup)
+		for_each_epc(pmu_ctx, ctx, pmu, event_type)
 			__pmu_ctx_sched_in(pmu_ctx, EVENT_FLEXIBLE);
 	}
 }
@@ -4110,11 +4116,11 @@ static void perf_event_context_sched_in(struct task_struct *task)
 
 	if (cpuctx->task_ctx == ctx) {
 		perf_ctx_lock(cpuctx, ctx);
-		perf_ctx_disable(ctx, false);
+		perf_ctx_disable(ctx, 0);
 
 		perf_ctx_sched_task_cb(ctx, task, true);
 
-		perf_ctx_enable(ctx, false);
+		perf_ctx_enable(ctx, 0);
 		perf_ctx_unlock(cpuctx, ctx);
 		goto rcu_unlock;
 	}
@@ -4127,7 +4133,7 @@ static void perf_event_context_sched_in(struct task_struct *task)
 	if (!ctx->nr_events)
 		goto unlock;
 
-	perf_ctx_disable(ctx, false);
+	perf_ctx_disable(ctx, 0);
 	/*
 	 * We want to keep the following priority order:
 	 * cpu pinned (that don't need to move), task pinned,
@@ -4137,7 +4143,7 @@ static void perf_event_context_sched_in(struct task_struct *task)
 	 * events, no need to flip the cpuctx's events around.
 	 */
 	if (!RB_EMPTY_ROOT(&ctx->pinned_groups.tree)) {
-		perf_ctx_disable(&cpuctx->ctx, false);
+		perf_ctx_disable(&cpuctx->ctx, 0);
 		ctx_sched_out(&cpuctx->ctx, NULL, EVENT_FLEXIBLE);
 	}
 
@@ -4146,9 +4152,9 @@ static void perf_event_context_sched_in(struct task_struct *task)
 	perf_ctx_sched_task_cb(cpuctx->task_ctx, task, true);
 
 	if (!RB_EMPTY_ROOT(&ctx->pinned_groups.tree))
-		perf_ctx_enable(&cpuctx->ctx, false);
+		perf_ctx_enable(&cpuctx->ctx, 0);
 
-	perf_ctx_enable(ctx, false);
+	perf_ctx_enable(ctx, 0);
 
 unlock:
 	perf_ctx_unlock(cpuctx, ctx);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 02/44] perf: Add generic exclude_guest support
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 01/44] perf: Skip pmu_ctx based on event_type Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 03/44] perf: Move security_perf_event_free() call to __free_event() Sean Christopherson
                   ` (44 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

Only KVM knows the exact time when a guest is entering/exiting. Expose
two interfaces to KVM to switch the ownership of the PMU resources.

All the pinned events must be scheduled in first. Extend the
perf_event_sched_in() helper to support extra flag, e.g., EVENT_GUEST.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/events/core.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d4528554528d..3a98e11d8efc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2866,14 +2866,15 @@ static void task_ctx_sched_out(struct perf_event_context *ctx,
 
 static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
 				struct perf_event_context *ctx,
-				struct pmu *pmu)
+				struct pmu *pmu,
+				enum event_type_t event_type)
 {
-	ctx_sched_in(&cpuctx->ctx, pmu, EVENT_PINNED);
+	ctx_sched_in(&cpuctx->ctx, pmu, EVENT_PINNED | event_type);
 	if (ctx)
-		 ctx_sched_in(ctx, pmu, EVENT_PINNED);
-	ctx_sched_in(&cpuctx->ctx, pmu, EVENT_FLEXIBLE);
+		ctx_sched_in(ctx, pmu, EVENT_PINNED | event_type);
+	ctx_sched_in(&cpuctx->ctx, pmu, EVENT_FLEXIBLE | event_type);
 	if (ctx)
-		 ctx_sched_in(ctx, pmu, EVENT_FLEXIBLE);
+		ctx_sched_in(ctx, pmu, EVENT_FLEXIBLE | event_type);
 }
 
 /*
@@ -2929,7 +2930,7 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
 	else if (event_type & EVENT_PINNED)
 		ctx_sched_out(&cpuctx->ctx, pmu, EVENT_FLEXIBLE);
 
-	perf_event_sched_in(cpuctx, task_ctx, pmu);
+	perf_event_sched_in(cpuctx, task_ctx, pmu, 0);
 
 	for_each_epc(epc, &cpuctx->ctx, pmu, 0)
 		perf_pmu_enable(epc->pmu);
@@ -4147,7 +4148,7 @@ static void perf_event_context_sched_in(struct task_struct *task)
 		ctx_sched_out(&cpuctx->ctx, NULL, EVENT_FLEXIBLE);
 	}
 
-	perf_event_sched_in(cpuctx, ctx, NULL);
+	perf_event_sched_in(cpuctx, ctx, NULL, 0);
 
 	perf_ctx_sched_task_cb(cpuctx->task_ctx, task, true);
 
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 03/44] perf: Move security_perf_event_free() call to __free_event()
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 01/44] perf: Skip pmu_ctx based on event_type Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 02/44] perf: Add generic exclude_guest support Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 04/44] perf: Add APIs to create/release mediated guest vPMUs Sean Christopherson
                   ` (43 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Move the freeing of any security state associated with a perf event from
_free_event() to __free_event(), i.e. invoke security_perf_event_free() in
the error paths for perf_event_alloc().  This will allow adding potential
error paths in perf_event_alloc() that can occur after allocating security
state.

Note, kfree() and thus security_perf_event_free() is a nop if
event->security is NULL, i.e. calling security_perf_event_free() even if
security_perf_event_alloc() fails or is never reached is functionality ok.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3a98e11d8efc..1753a97638a3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5596,6 +5596,8 @@ static void __free_event(struct perf_event *event)
 {
 	struct pmu *pmu = event->pmu;
 
+	security_perf_event_free(event);
+
 	if (event->attach_state & PERF_ATTACH_CALLCHAIN)
 		put_callchain_buffers();
 
@@ -5659,8 +5661,6 @@ static void _free_event(struct perf_event *event)
 
 	unaccount_event(event);
 
-	security_perf_event_free(event);
-
 	if (event->rb) {
 		/*
 		 * Can happen when we close an event with re-directed output.
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 04/44] perf: Add APIs to create/release mediated guest vPMUs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 03/44] perf: Move security_perf_event_free() call to __free_event() Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 05/44] perf: Clean up perf ctx time Sean Christopherson
                   ` (42 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

Currently, exposing PMU capabilities to a KVM guest is done by emulating
guest PMCs via host perf events, i.e. by having KVM be "just" another user
of perf.  As a result, the guest and host are effectively competing for
resources, and emulating guest accesses to vPMU resources requires
expensive actions (expensive relative to the native instruction).  The
overhead and resource competition results in degraded guest performance
and ultimately very poor vPMU accuracy.

To address the issues with the perf-emulated vPMU, introduce a "mediated
vPMU", where the data plane (PMCs and enable/disable knobs) is exposed
directly to the guest, but the control plane (event selectors and access
to fixed counters) is managed by KVM (via MSR interceptions).  To allow
host perf usage of the PMU to (partially) co-exist with KVM/guest usage
of the PMU, KVM and perf will coordinate to a world switch between host
perf context and guest vPMU context near VM-Enter/VM-Exit.

Add two exported APIs, perf_{create,release}_mediated_pmu(), to allow KVM
to create and release a mediated PMU instance (per VM).  Because host perf
context will be deactivated while the guest is running, mediated PMU usage
will be mutually exclusive with perf analysis of the guest, i.e. perf
events that do NOT exclude the guest will not behave as expected.

To avoid silent failure of !exclude_guest perf events, disallow creating a
mediated PMU if there are active !exclude_guest events, and on the perf
side, disallowing creating new !exclude_guest perf events while there is
at least one active mediated PMU.

Exempt PMU resources that do not support mediated PMU usage, i.e. that are
outside the scope/view of KVM's vPMU and will not be swapped out while the
guest is running.

Guard mediated PMU with a new kconfig to help readers identify code paths
that are unique to mediated PMU support, and to allow for adding arch-
specific hooks without stubs.  KVM x86 is expected to be the only KVM
architecture to support a mediated PMU in the near future (e.g. arm64 is
trending toward a partitioned PMU implementation), and KVM x86 will select
PERF_GUEST_MEDIATED_PMU unconditionally, i.e. won't need stubs.

Immediately select PERF_GUEST_MEDIATED_PMU when KVM x86 is enabled so that
all paths are compile tested.  Full KVM support is on its way...

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: add kconfig and WARNing, rewrite changelog, swizzle patch ordering]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/Kconfig       |  1 +
 include/linux/perf_event.h |  6 +++
 init/Kconfig               |  4 ++
 kernel/events/core.c       | 82 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2c86673155c9..ee67357b5e36 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -37,6 +37,7 @@ config KVM_X86
 	select SCHED_INFO
 	select PERF_EVENTS
 	select GUEST_PERF_EVENTS
+	select PERF_GUEST_MEDIATED_PMU
 	select HAVE_KVM_MSI
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_NO_POLL
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ec9d96025683..63097beb5f02 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -305,6 +305,7 @@ struct perf_event_pmu_context;
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE	0x0100
 #define PERF_PMU_CAP_AUX_PAUSE		0x0200
 #define PERF_PMU_CAP_AUX_PREFER_LARGE	0x0400
+#define PERF_PMU_CAP_MEDIATED_VPMU	0x0800
 
 /**
  * pmu::scope
@@ -1914,6 +1915,11 @@ extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
 
+#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
+int perf_create_mediated_pmu(void);
+void perf_release_mediated_pmu(void);
+#endif
+
 #else /* !CONFIG_PERF_EVENTS: */
 
 static inline void *
diff --git a/init/Kconfig b/init/Kconfig
index 666783eb50ab..1e3c90c3f24f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1955,6 +1955,10 @@ config GUEST_PERF_EVENTS
 	bool
 	depends on HAVE_PERF_EVENTS
 
+config PERF_GUEST_MEDIATED_PMU
+	bool
+	depends on GUEST_PERF_EVENTS
+
 config PERF_USE_VMALLOC
 	bool
 	help
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1753a97638a3..bf0347231bd9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5651,6 +5651,8 @@ static void __free_event(struct perf_event *event)
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
 
+static void mediated_pmu_unaccount_event(struct perf_event *event);
+
 DEFINE_FREE(__free_event, struct perf_event *, if (_T) __free_event(_T))
 
 /* vs perf_event_alloc() success */
@@ -5660,6 +5662,7 @@ static void _free_event(struct perf_event *event)
 	irq_work_sync(&event->pending_disable_irq);
 
 	unaccount_event(event);
+	mediated_pmu_unaccount_event(event);
 
 	if (event->rb) {
 		/*
@@ -6182,6 +6185,81 @@ u64 perf_event_pause(struct perf_event *event, bool reset)
 }
 EXPORT_SYMBOL_GPL(perf_event_pause);
 
+#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
+static atomic_t nr_include_guest_events __read_mostly;
+
+static atomic_t nr_mediated_pmu_vms __read_mostly;
+static DEFINE_MUTEX(perf_mediated_pmu_mutex);
+
+/* !exclude_guest event of PMU with PERF_PMU_CAP_MEDIATED_VPMU */
+static inline bool is_include_guest_event(struct perf_event *event)
+{
+	if ((event->pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU) &&
+	    !event->attr.exclude_guest)
+		return true;
+
+	return false;
+}
+
+static int mediated_pmu_account_event(struct perf_event *event)
+{
+	if (!is_include_guest_event(event))
+		return 0;
+
+	guard(mutex)(&perf_mediated_pmu_mutex);
+
+	if (atomic_read(&nr_mediated_pmu_vms))
+		return -EOPNOTSUPP;
+
+	atomic_inc(&nr_include_guest_events);
+	return 0;
+}
+
+static void mediated_pmu_unaccount_event(struct perf_event *event)
+{
+	if (!is_include_guest_event(event))
+		return;
+
+	atomic_dec(&nr_include_guest_events);
+}
+
+/*
+ * Currently invoked at VM creation to
+ * - Check whether there are existing !exclude_guest events of PMU with
+ *   PERF_PMU_CAP_MEDIATED_VPMU
+ * - Set nr_mediated_pmu_vms to prevent !exclude_guest event creation on
+ *   PMUs with PERF_PMU_CAP_MEDIATED_VPMU
+ *
+ * No impact for the PMU without PERF_PMU_CAP_MEDIATED_VPMU. The perf
+ * still owns all the PMU resources.
+ */
+int perf_create_mediated_pmu(void)
+{
+	guard(mutex)(&perf_mediated_pmu_mutex);
+	if (atomic_inc_not_zero(&nr_mediated_pmu_vms))
+		return 0;
+
+	if (atomic_read(&nr_include_guest_events))
+		return -EBUSY;
+
+	atomic_inc(&nr_mediated_pmu_vms);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_create_mediated_pmu);
+
+void perf_release_mediated_pmu(void)
+{
+	if (WARN_ON_ONCE(!atomic_read(&nr_mediated_pmu_vms)))
+		return;
+
+	atomic_dec(&nr_mediated_pmu_vms);
+}
+EXPORT_SYMBOL_GPL(perf_release_mediated_pmu);
+#else
+static int mediated_pmu_account_event(struct perf_event *event) { return 0; }
+static void mediated_pmu_unaccount_event(struct perf_event *event) {}
+#endif
+
 /*
  * Holding the top-level event's child_mutex means that any
  * descendant process that has inherited this event will block
@@ -13024,6 +13102,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	if (err)
 		return ERR_PTR(err);
 
+	err = mediated_pmu_account_event(event);
+	if (err)
+		return ERR_PTR(err);
+
 	/* symmetric to unaccount_event() in _free_event() */
 	account_event(event);
 
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 05/44] perf: Clean up perf ctx time
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 04/44] perf: Add APIs to create/release mediated guest vPMUs Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 06/44] perf: Add a EVENT_GUEST flag Sean Christopherson
                   ` (41 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

The current perf tracks two timestamps for the normal ctx and cgroup.
The same type of variables and similar codes are used to track the
timestamps. In the following patch, the third timestamp to track the
guest time will be introduced.
To avoid the code duplication, add a new struct perf_time_ctx and factor
out a generic function update_perf_time_ctx().

No functional change.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/perf_event.h | 13 +++----
 kernel/events/core.c       | 70 +++++++++++++++++---------------------
 2 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 63097beb5f02..186a026fb7fb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -999,6 +999,11 @@ struct perf_event_groups {
 	u64				index;
 };
 
+struct perf_time_ctx {
+	u64		time;
+	u64		stamp;
+	u64		offset;
+};
 
 /**
  * struct perf_event_context - event context structure
@@ -1037,9 +1042,7 @@ struct perf_event_context {
 	/*
 	 * Context clock, runs when context enabled.
 	 */
-	u64				time;
-	u64				timestamp;
-	u64				timeoffset;
+	struct perf_time_ctx		time;
 
 	/*
 	 * These fields let us detect when two contexts have both
@@ -1172,9 +1175,7 @@ struct bpf_perf_event_data_kern {
  * This is a per-cpu dynamically allocated data structure.
  */
 struct perf_cgroup_info {
-	u64				time;
-	u64				timestamp;
-	u64				timeoffset;
+	struct perf_time_ctx		time;
 	int				active;
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bf0347231bd9..2cf988bdabf0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -815,6 +815,24 @@ static void perf_ctx_enable(struct perf_event_context *ctx,
 static void ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t event_type);
 static void ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t event_type);
 
+static inline void update_perf_time_ctx(struct perf_time_ctx *time, u64 now, bool adv)
+{
+	if (adv)
+		time->time += now - time->stamp;
+	time->stamp = now;
+
+	/*
+	 * The above: time' = time + (now - timestamp), can be re-arranged
+	 * into: time` = now + (time - timestamp), which gives a single value
+	 * offset to compute future time without locks on.
+	 *
+	 * See perf_event_time_now(), which can be used from NMI context where
+	 * it's (obviously) not possible to acquire ctx->lock in order to read
+	 * both the above values in a consistent manner.
+	 */
+	WRITE_ONCE(time->offset, time->time - time->stamp);
+}
+
 #ifdef CONFIG_CGROUP_PERF
 
 static inline bool
@@ -856,7 +874,7 @@ static inline u64 perf_cgroup_event_time(struct perf_event *event)
 	struct perf_cgroup_info *t;
 
 	t = per_cpu_ptr(event->cgrp->info, event->cpu);
-	return t->time;
+	return t->time.time;
 }
 
 static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now)
@@ -865,22 +883,11 @@ static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now)
 
 	t = per_cpu_ptr(event->cgrp->info, event->cpu);
 	if (!__load_acquire(&t->active))
-		return t->time;
-	now += READ_ONCE(t->timeoffset);
+		return t->time.time;
+	now += READ_ONCE(t->time.offset);
 	return now;
 }
 
-static inline void __update_cgrp_time(struct perf_cgroup_info *info, u64 now, bool adv)
-{
-	if (adv)
-		info->time += now - info->timestamp;
-	info->timestamp = now;
-	/*
-	 * see update_context_time()
-	 */
-	WRITE_ONCE(info->timeoffset, info->time - info->timestamp);
-}
-
 static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx, bool final)
 {
 	struct perf_cgroup *cgrp = cpuctx->cgrp;
@@ -894,7 +901,7 @@ static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx,
 			cgrp = container_of(css, struct perf_cgroup, css);
 			info = this_cpu_ptr(cgrp->info);
 
-			__update_cgrp_time(info, now, true);
+			update_perf_time_ctx(&info->time, now, true);
 			if (final)
 				__store_release(&info->active, 0);
 		}
@@ -917,7 +924,7 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
 	 * Do not update time when cgroup is not active
 	 */
 	if (info->active)
-		__update_cgrp_time(info, perf_clock(), true);
+		update_perf_time_ctx(&info->time, perf_clock(), true);
 }
 
 static inline void
@@ -941,7 +948,7 @@ perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
 	for (css = &cgrp->css; css; css = css->parent) {
 		cgrp = container_of(css, struct perf_cgroup, css);
 		info = this_cpu_ptr(cgrp->info);
-		__update_cgrp_time(info, ctx->timestamp, false);
+		update_perf_time_ctx(&info->time, ctx->time.stamp, false);
 		__store_release(&info->active, 1);
 	}
 }
@@ -1562,20 +1569,7 @@ static void __update_context_time(struct perf_event_context *ctx, bool adv)
 
 	lockdep_assert_held(&ctx->lock);
 
-	if (adv)
-		ctx->time += now - ctx->timestamp;
-	ctx->timestamp = now;
-
-	/*
-	 * The above: time' = time + (now - timestamp), can be re-arranged
-	 * into: time` = now + (time - timestamp), which gives a single value
-	 * offset to compute future time without locks on.
-	 *
-	 * See perf_event_time_now(), which can be used from NMI context where
-	 * it's (obviously) not possible to acquire ctx->lock in order to read
-	 * both the above values in a consistent manner.
-	 */
-	WRITE_ONCE(ctx->timeoffset, ctx->time - ctx->timestamp);
+	update_perf_time_ctx(&ctx->time, now, adv);
 }
 
 static void update_context_time(struct perf_event_context *ctx)
@@ -1593,7 +1587,7 @@ static u64 perf_event_time(struct perf_event *event)
 	if (is_cgroup_event(event))
 		return perf_cgroup_event_time(event);
 
-	return ctx->time;
+	return ctx->time.time;
 }
 
 static u64 perf_event_time_now(struct perf_event *event, u64 now)
@@ -1607,9 +1601,9 @@ static u64 perf_event_time_now(struct perf_event *event, u64 now)
 		return perf_cgroup_event_time_now(event, now);
 
 	if (!(__load_acquire(&ctx->is_active) & EVENT_TIME))
-		return ctx->time;
+		return ctx->time.time;
 
-	now += READ_ONCE(ctx->timeoffset);
+	now += READ_ONCE(ctx->time.offset);
 	return now;
 }
 
@@ -11991,7 +11985,7 @@ static void task_clock_event_update(struct perf_event *event, u64 now)
 
 static void task_clock_event_start(struct perf_event *event, int flags)
 {
-	local64_set(&event->hw.prev_count, event->ctx->time);
+	local64_set(&event->hw.prev_count, event->ctx->time.time);
 	perf_swevent_start_hrtimer(event);
 }
 
@@ -11999,7 +11993,7 @@ static void task_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
-		task_clock_event_update(event, event->ctx->time);
+		task_clock_event_update(event, event->ctx->time.time);
 }
 
 static int task_clock_event_add(struct perf_event *event, int flags)
@@ -12019,8 +12013,8 @@ static void task_clock_event_del(struct perf_event *event, int flags)
 static void task_clock_event_read(struct perf_event *event)
 {
 	u64 now = perf_clock();
-	u64 delta = now - event->ctx->timestamp;
-	u64 time = event->ctx->time + delta;
+	u64 delta = now - event->ctx->time.stamp;
+	u64 time = event->ctx->time.time + delta;
 
 	task_clock_event_update(event, time);
 }
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 06/44] perf: Add a EVENT_GUEST flag
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 05/44] perf: Clean up perf ctx time Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 07/44] perf: Add APIs to load/put guest mediated PMU context Sean Christopherson
                   ` (40 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

Current perf doesn't explicitly schedule out all exclude_guest events
while the guest is running. There is no problem with the current
emulated vPMU. Because perf owns all the PMU counters. It can mask the
counter which is assigned to an exclude_guest event when a guest is
running (Intel way), or set the corresponding HOSTONLY bit in evsentsel
(AMD way). The counter doesn't count when a guest is running.

However, either way doesn't work with the introduced mediated vPMU.
A guest owns all the PMU counters when it's running. The host should not
mask any counters. The counter may be used by the guest. The evsentsel
may be overwritten.

Perf should explicitly schedule out all exclude_guest events to release
the PMU resources when entering a guest, and resume the counting when
exiting the guest.

It's possible that an exclude_guest event is created when a guest is
running. The new event should not be scheduled in as well.

The ctx time is shared among different PMUs. The time cannot be stopped
when a guest is running. It is required to calculate the time for events
from other PMUs, e.g., uncore events. Add timeguest to track the guest
run time. For an exclude_guest event, the elapsed time equals
the ctx time - guest time.
Cgroup has dedicated times. Use the same method to deduct the guest time
from the cgroup time as well.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: massage comments]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/perf_event.h |   6 +
 kernel/events/core.c       | 232 ++++++++++++++++++++++++++++---------
 2 files changed, 186 insertions(+), 52 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 186a026fb7fb..0958b6d0a61c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1044,6 +1044,11 @@ struct perf_event_context {
 	 */
 	struct perf_time_ctx		time;
 
+	/*
+	 * Context clock, runs when in the guest mode.
+	 */
+	struct perf_time_ctx		timeguest;
+
 	/*
 	 * These fields let us detect when two contexts have both
 	 * been cloned (inherited) from a common ancestor.
@@ -1176,6 +1181,7 @@ struct bpf_perf_event_data_kern {
  */
 struct perf_cgroup_info {
 	struct perf_time_ctx		time;
+	struct perf_time_ctx		timeguest;
 	int				active;
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2cf988bdabf0..6875b56ddd6b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -164,7 +164,19 @@ enum event_type_t {
 	/* see ctx_resched() for details */
 	EVENT_CPU	= 0x10,
 	EVENT_CGROUP	= 0x20,
-	EVENT_FLAGS	= EVENT_CGROUP,
+
+	/*
+	 * EVENT_GUEST is set when scheduling in/out events between the host
+	 * and a guest with a mediated vPMU.  Among other things, EVENT_GUEST
+	 * is used:
+	 *
+	 * - In for_each_epc() to skip PMUs that don't support events in a
+	 *   MEDIATED_VPMU guest, i.e. don't need to be context switched.
+	 * - To indicate the start/end point of the events in a guest.  Guest
+	 *   running time is deducted for host-only (exclude_guest) events.
+	 */
+	EVENT_GUEST	= 0x40,
+	EVENT_FLAGS	= EVENT_CGROUP | EVENT_GUEST,
 	/* compound helpers */
 	EVENT_ALL         = EVENT_FLEXIBLE | EVENT_PINNED,
 	EVENT_TIME_FROZEN = EVENT_TIME | EVENT_FROZEN,
@@ -457,6 +469,11 @@ static cpumask_var_t perf_online_pkg_mask;
 static cpumask_var_t perf_online_sys_mask;
 static struct kmem_cache *perf_event_cache;
 
+static __always_inline bool is_guest_mediated_pmu_loaded(void)
+{
+	return false;
+}
+
 /*
  * perf event paranoia level:
  *  -1 - not paranoid at all
@@ -783,6 +800,9 @@ static bool perf_skip_pmu_ctx(struct perf_event_pmu_context *pmu_ctx,
 {
 	if ((event_type & EVENT_CGROUP) && !pmu_ctx->nr_cgroups)
 		return true;
+	if ((event_type & EVENT_GUEST) &&
+	    !(pmu_ctx->pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU))
+		return true;
 	return false;
 }
 
@@ -833,6 +853,39 @@ static inline void update_perf_time_ctx(struct perf_time_ctx *time, u64 now, boo
 	WRITE_ONCE(time->offset, time->time - time->stamp);
 }
 
+static_assert(offsetof(struct perf_event_context, timeguest) -
+	      offsetof(struct perf_event_context, time) ==
+	      sizeof(struct perf_time_ctx));
+
+#define T_TOTAL		0
+#define T_GUEST		1
+
+static inline u64 __perf_event_time_ctx(struct perf_event *event,
+					struct perf_time_ctx *times)
+{
+	u64 time = times[T_TOTAL].time;
+
+	if (event->attr.exclude_guest)
+		time -= times[T_GUEST].time;
+
+	return time;
+}
+
+static inline u64 __perf_event_time_ctx_now(struct perf_event *event,
+					    struct perf_time_ctx *times,
+					    u64 now)
+{
+	if (is_guest_mediated_pmu_loaded() && event->attr.exclude_guest) {
+		/*
+		 * (now + times[total].offset) - (now + times[guest].offset) :=
+		 * times[total].offset - times[guest].offset
+		 */
+		return READ_ONCE(times[T_TOTAL].offset) - READ_ONCE(times[T_GUEST].offset);
+	}
+
+	return now + READ_ONCE(times[T_TOTAL].offset);
+}
+
 #ifdef CONFIG_CGROUP_PERF
 
 static inline bool
@@ -869,12 +922,16 @@ static inline int is_cgroup_event(struct perf_event *event)
 	return event->cgrp != NULL;
 }
 
+static_assert(offsetof(struct perf_cgroup_info, timeguest) -
+	      offsetof(struct perf_cgroup_info, time) ==
+	      sizeof(struct perf_time_ctx));
+
 static inline u64 perf_cgroup_event_time(struct perf_event *event)
 {
 	struct perf_cgroup_info *t;
 
 	t = per_cpu_ptr(event->cgrp->info, event->cpu);
-	return t->time.time;
+	return __perf_event_time_ctx(event, &t->time);
 }
 
 static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now)
@@ -883,9 +940,21 @@ static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now)
 
 	t = per_cpu_ptr(event->cgrp->info, event->cpu);
 	if (!__load_acquire(&t->active))
-		return t->time.time;
-	now += READ_ONCE(t->time.offset);
-	return now;
+		return __perf_event_time_ctx(event, &t->time);
+
+	return __perf_event_time_ctx_now(event, &t->time, now);
+}
+
+static inline void __update_cgrp_guest_time(struct perf_cgroup_info *info, u64 now, bool adv)
+{
+	update_perf_time_ctx(&info->timeguest, now, adv);
+}
+
+static inline void update_cgrp_time(struct perf_cgroup_info *info, u64 now)
+{
+	update_perf_time_ctx(&info->time, now, true);
+	if (is_guest_mediated_pmu_loaded())
+		__update_cgrp_guest_time(info, now, true);
 }
 
 static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx, bool final)
@@ -901,7 +970,7 @@ static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx,
 			cgrp = container_of(css, struct perf_cgroup, css);
 			info = this_cpu_ptr(cgrp->info);
 
-			update_perf_time_ctx(&info->time, now, true);
+			update_cgrp_time(info, now);
 			if (final)
 				__store_release(&info->active, 0);
 		}
@@ -924,11 +993,11 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
 	 * Do not update time when cgroup is not active
 	 */
 	if (info->active)
-		update_perf_time_ctx(&info->time, perf_clock(), true);
+		update_cgrp_time(info, perf_clock());
 }
 
 static inline void
-perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
+perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx, bool guest)
 {
 	struct perf_event_context *ctx = &cpuctx->ctx;
 	struct perf_cgroup *cgrp = cpuctx->cgrp;
@@ -948,8 +1017,12 @@ perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
 	for (css = &cgrp->css; css; css = css->parent) {
 		cgrp = container_of(css, struct perf_cgroup, css);
 		info = this_cpu_ptr(cgrp->info);
-		update_perf_time_ctx(&info->time, ctx->time.stamp, false);
-		__store_release(&info->active, 1);
+		if (guest) {
+			__update_cgrp_guest_time(info, ctx->time.stamp, false);
+		} else {
+			update_perf_time_ctx(&info->time, ctx->time.stamp, false);
+			__store_release(&info->active, 1);
+		}
 	}
 }
 
@@ -1153,7 +1226,7 @@ static inline int perf_cgroup_connect(pid_t pid, struct perf_event *event,
 }
 
 static inline void
-perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
+perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx, bool guest)
 {
 }
 
@@ -1565,16 +1638,24 @@ static void perf_unpin_context(struct perf_event_context *ctx)
  */
 static void __update_context_time(struct perf_event_context *ctx, bool adv)
 {
-	u64 now = perf_clock();
+	lockdep_assert_held(&ctx->lock);
+
+	update_perf_time_ctx(&ctx->time, perf_clock(), adv);
+}
 
+static void __update_context_guest_time(struct perf_event_context *ctx, bool adv)
+{
 	lockdep_assert_held(&ctx->lock);
 
-	update_perf_time_ctx(&ctx->time, now, adv);
+	/* must be called after __update_context_time(); */
+	update_perf_time_ctx(&ctx->timeguest, ctx->time.stamp, adv);
 }
 
 static void update_context_time(struct perf_event_context *ctx)
 {
 	__update_context_time(ctx, true);
+	if (is_guest_mediated_pmu_loaded())
+		__update_context_guest_time(ctx, true);
 }
 
 static u64 perf_event_time(struct perf_event *event)
@@ -1587,7 +1668,7 @@ static u64 perf_event_time(struct perf_event *event)
 	if (is_cgroup_event(event))
 		return perf_cgroup_event_time(event);
 
-	return ctx->time.time;
+	return __perf_event_time_ctx(event, &ctx->time);
 }
 
 static u64 perf_event_time_now(struct perf_event *event, u64 now)
@@ -1601,10 +1682,9 @@ static u64 perf_event_time_now(struct perf_event *event, u64 now)
 		return perf_cgroup_event_time_now(event, now);
 
 	if (!(__load_acquire(&ctx->is_active) & EVENT_TIME))
-		return ctx->time.time;
+		return __perf_event_time_ctx(event, &ctx->time);
 
-	now += READ_ONCE(ctx->time.offset);
-	return now;
+	return __perf_event_time_ctx_now(event, &ctx->time, now);
 }
 
 static enum event_type_t get_event_type(struct perf_event *event)
@@ -2427,20 +2507,23 @@ group_sched_out(struct perf_event *group_event, struct perf_event_context *ctx)
 }
 
 static inline void
-__ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx, bool final)
+__ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx,
+		  bool final, enum event_type_t event_type)
 {
 	if (ctx->is_active & EVENT_TIME) {
 		if (ctx->is_active & EVENT_FROZEN)
 			return;
+
 		update_context_time(ctx);
-		update_cgrp_time_from_cpuctx(cpuctx, final);
+		/* vPMU should not stop time */
+		update_cgrp_time_from_cpuctx(cpuctx, !(event_type & EVENT_GUEST) && final);
 	}
 }
 
 static inline void
 ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx)
 {
-	__ctx_time_update(cpuctx, ctx, false);
+	__ctx_time_update(cpuctx, ctx, false, 0);
 }
 
 /*
@@ -3506,7 +3589,7 @@ ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 	 *
 	 * would only update time for the pinned events.
 	 */
-	__ctx_time_update(cpuctx, ctx, ctx == &cpuctx->ctx);
+	__ctx_time_update(cpuctx, ctx, ctx == &cpuctx->ctx, event_type);
 
 	/*
 	 * CPU-release for the below ->is_active store,
@@ -3532,7 +3615,18 @@ ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 			cpuctx->task_ctx = NULL;
 	}
 
-	is_active ^= ctx->is_active; /* changed bits */
+	if (event_type & EVENT_GUEST) {
+		/*
+		 * Schedule out all exclude_guest events of PMU
+		 * with PERF_PMU_CAP_MEDIATED_VPMU.
+		 */
+		is_active = EVENT_ALL;
+		__update_context_guest_time(ctx, false);
+		perf_cgroup_set_timestamp(cpuctx, true);
+		barrier();
+	} else {
+		is_active ^= ctx->is_active; /* changed bits */
+	}
 
 	for_each_epc(pmu_ctx, ctx, pmu, event_type)
 		__pmu_ctx_sched_out(pmu_ctx, is_active);
@@ -3991,10 +4085,15 @@ static inline void group_update_userpage(struct perf_event *group_event)
 		event_update_userpage(event);
 }
 
+struct merge_sched_data {
+	int can_add_hw;
+	enum event_type_t event_type;
+};
+
 static int merge_sched_in(struct perf_event *event, void *data)
 {
 	struct perf_event_context *ctx = event->ctx;
-	int *can_add_hw = data;
+	struct merge_sched_data *msd = data;
 
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
@@ -4002,13 +4101,22 @@ static int merge_sched_in(struct perf_event *event, void *data)
 	if (!event_filter_match(event))
 		return 0;
 
-	if (group_can_go_on(event, *can_add_hw)) {
+	/*
+	 * Don't schedule in any host events from PMU with
+	 * PERF_PMU_CAP_MEDIATED_VPMU, while a guest is running.
+	 */
+	if (is_guest_mediated_pmu_loaded() &&
+	    event->pmu_ctx->pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU &&
+	    !(msd->event_type & EVENT_GUEST))
+		return 0;
+
+	if (group_can_go_on(event, msd->can_add_hw)) {
 		if (!group_sched_in(event, ctx))
 			list_add_tail(&event->active_list, get_event_list(event));
 	}
 
 	if (event->state == PERF_EVENT_STATE_INACTIVE) {
-		*can_add_hw = 0;
+		msd->can_add_hw = 0;
 		if (event->attr.pinned) {
 			perf_cgroup_event_disable(event, ctx);
 			perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
@@ -4031,11 +4139,15 @@ static int merge_sched_in(struct perf_event *event, void *data)
 
 static void pmu_groups_sched_in(struct perf_event_context *ctx,
 				struct perf_event_groups *groups,
-				struct pmu *pmu)
+				struct pmu *pmu,
+				enum event_type_t event_type)
 {
-	int can_add_hw = 1;
+	struct merge_sched_data msd = {
+		.can_add_hw = 1,
+		.event_type = event_type,
+	};
 	visit_groups_merge(ctx, groups, smp_processor_id(), pmu,
-			   merge_sched_in, &can_add_hw);
+			   merge_sched_in, &msd);
 }
 
 static void __pmu_ctx_sched_in(struct perf_event_pmu_context *pmu_ctx,
@@ -4044,9 +4156,9 @@ static void __pmu_ctx_sched_in(struct perf_event_pmu_context *pmu_ctx,
 	struct perf_event_context *ctx = pmu_ctx->ctx;
 
 	if (event_type & EVENT_PINNED)
-		pmu_groups_sched_in(ctx, &ctx->pinned_groups, pmu_ctx->pmu);
+		pmu_groups_sched_in(ctx, &ctx->pinned_groups, pmu_ctx->pmu, event_type);
 	if (event_type & EVENT_FLEXIBLE)
-		pmu_groups_sched_in(ctx, &ctx->flexible_groups, pmu_ctx->pmu);
+		pmu_groups_sched_in(ctx, &ctx->flexible_groups, pmu_ctx->pmu, event_type);
 }
 
 static void
@@ -4063,9 +4175,11 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 		return;
 
 	if (!(is_active & EVENT_TIME)) {
+		/* EVENT_TIME should be active while the guest runs */
+		WARN_ON_ONCE(event_type & EVENT_GUEST);
 		/* start ctx time */
 		__update_context_time(ctx, false);
-		perf_cgroup_set_timestamp(cpuctx);
+		perf_cgroup_set_timestamp(cpuctx, false);
 		/*
 		 * CPU-release for the below ->is_active store,
 		 * see __load_acquire() in perf_event_time_now()
@@ -4081,7 +4195,23 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
 	}
 
-	is_active ^= ctx->is_active; /* changed bits */
+	if (event_type & EVENT_GUEST) {
+		/*
+		 * Schedule in the required exclude_guest events of PMU
+		 * with PERF_PMU_CAP_MEDIATED_VPMU.
+		 */
+		is_active = event_type & EVENT_ALL;
+
+		/*
+		 * Update ctx time to set the new start time for
+		 * the exclude_guest events.
+		 */
+		update_context_time(ctx);
+		update_cgrp_time_from_cpuctx(cpuctx, false);
+		barrier();
+	} else {
+		is_active ^= ctx->is_active; /* changed bits */
+	}
 
 	/*
 	 * First go through the list and put on any pinned groups
@@ -4089,13 +4219,13 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
 	 */
 	if (is_active & EVENT_PINNED) {
 		for_each_epc(pmu_ctx, ctx, pmu, event_type)
-			__pmu_ctx_sched_in(pmu_ctx, EVENT_PINNED);
+			__pmu_ctx_sched_in(pmu_ctx, EVENT_PINNED | (event_type & EVENT_GUEST));
 	}
 
 	/* Then walk through the lower prio flexible groups */
 	if (is_active & EVENT_FLEXIBLE) {
 		for_each_epc(pmu_ctx, ctx, pmu, event_type)
-			__pmu_ctx_sched_in(pmu_ctx, EVENT_FLEXIBLE);
+			__pmu_ctx_sched_in(pmu_ctx, EVENT_FLEXIBLE | (event_type & EVENT_GUEST));
 	}
 }
 
@@ -6621,23 +6751,23 @@ void perf_event_update_userpage(struct perf_event *event)
 	if (!rb)
 		goto unlock;
 
-	/*
-	 * compute total_time_enabled, total_time_running
-	 * based on snapshot values taken when the event
-	 * was last scheduled in.
-	 *
-	 * we cannot simply called update_context_time()
-	 * because of locking issue as we can be called in
-	 * NMI context
-	 */
-	calc_timer_values(event, &now, &enabled, &running);
-
-	userpg = rb->user_page;
 	/*
 	 * Disable preemption to guarantee consistent time stamps are stored to
 	 * the user page.
 	 */
 	preempt_disable();
+
+	/*
+	 * Compute total_time_enabled, total_time_running based on snapshot
+	 * values taken when the event was last scheduled in.
+	 *
+	 * We cannot simply call update_context_time() because doing so would
+	 * lead to deadlock when called from NMI context.
+	 */
+	calc_timer_values(event, &now, &enabled, &running);
+
+	userpg = rb->user_page;
+
 	++userpg->lock;
 	barrier();
 	userpg->index = perf_event_index(event);
@@ -7902,13 +8032,11 @@ static void perf_output_read(struct perf_output_handle *handle,
 	u64 read_format = event->attr.read_format;
 
 	/*
-	 * compute total_time_enabled, total_time_running
-	 * based on snapshot values taken when the event
-	 * was last scheduled in.
+	 * Compute total_time_enabled, total_time_running based on snapshot
+	 * values taken when the event was last scheduled in.
 	 *
-	 * we cannot simply called update_context_time()
-	 * because of locking issue as we are called in
-	 * NMI context
+	 * We cannot simply call update_context_time() because doing so would
+	 * lead to deadlock when called from NMI context.
 	 */
 	if (read_format & PERF_FORMAT_TOTAL_TIMES)
 		calc_timer_values(event, &now, &enabled, &running);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 07/44] perf: Add APIs to load/put guest mediated PMU context
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 06/44] perf: Add a EVENT_GUEST flag Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-08  7:30   ` Mi, Dapeng
  2025-08-06 19:56 ` [PATCH v5 08/44] perf: core/x86: Register a new vector for handling mediated guest PMIs Sean Christopherson
                   ` (39 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

Add exported APIs to load/put a guest mediated PMU context.  KVM will
load the guest PMU shortly before VM-Enter, and put the guest PMU shortly
after VM-Exit.

On the perf side of things, schedule out all exclude_guest events when the
guest context is loaded, and schedule them back in when the guest context
is put.  I.e. yield the hardware PMU resources to the guest, by way of KVM.

Note, perf is only responsible for managing host context.  KVM is
responsible for loading/storing guest state to/from hardware.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: shuffle patches around, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/perf_event.h |  2 ++
 kernel/events/core.c       | 61 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0958b6d0a61c..42d019d70b42 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1925,6 +1925,8 @@ extern u64 perf_event_pause(struct perf_event *event, bool reset);
 #ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
 int perf_create_mediated_pmu(void);
 void perf_release_mediated_pmu(void);
+void perf_load_guest_context(unsigned long data);
+void perf_put_guest_context(void);
 #endif
 
 #else /* !CONFIG_PERF_EVENTS: */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6875b56ddd6b..77398b1ad4c5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -469,10 +469,19 @@ static cpumask_var_t perf_online_pkg_mask;
 static cpumask_var_t perf_online_sys_mask;
 static struct kmem_cache *perf_event_cache;
 
+#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
+static DEFINE_PER_CPU(bool, guest_ctx_loaded);
+
+static __always_inline bool is_guest_mediated_pmu_loaded(void)
+{
+	return __this_cpu_read(guest_ctx_loaded);
+}
+#else
 static __always_inline bool is_guest_mediated_pmu_loaded(void)
 {
 	return false;
 }
+#endif
 
 /*
  * perf event paranoia level:
@@ -6379,6 +6388,58 @@ void perf_release_mediated_pmu(void)
 	atomic_dec(&nr_mediated_pmu_vms);
 }
 EXPORT_SYMBOL_GPL(perf_release_mediated_pmu);
+
+/* When loading a guest's mediated PMU, schedule out all exclude_guest events. */
+void perf_load_guest_context(unsigned long data)
+{
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+
+	lockdep_assert_irqs_disabled();
+
+	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);
+
+	if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded)))
+		return;
+
+	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
+	ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST);
+	if (cpuctx->task_ctx) {
+		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
+		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
+	}
+
+	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
+	if (cpuctx->task_ctx)
+		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
+
+	__this_cpu_write(guest_ctx_loaded, true);
+}
+EXPORT_SYMBOL_GPL(perf_load_guest_context);
+
+void perf_put_guest_context(void)
+{
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+
+	lockdep_assert_irqs_disabled();
+
+	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);
+
+	if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded)))
+		return;
+
+	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
+	if (cpuctx->task_ctx)
+		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
+
+	perf_event_sched_in(cpuctx, cpuctx->task_ctx, NULL, EVENT_GUEST);
+
+	if (cpuctx->task_ctx)
+		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
+	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
+
+	__this_cpu_write(guest_ctx_loaded, false);
+}
+EXPORT_SYMBOL_GPL(perf_put_guest_context);
 #else
 static int mediated_pmu_account_event(struct perf_event *event) { return 0; }
 static void mediated_pmu_unaccount_event(struct perf_event *event) {}
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 08/44] perf: core/x86: Register a new vector for handling mediated guest PMIs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 07/44] perf: Add APIs to load/put guest mediated PMU context Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context Sean Christopherson
                   ` (38 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Wire up system vector 0xf5 for handling PMIs (i.e. interrupts delivered
through the LVTPC) while running KVM guests with a mediated PMU.  Perf
currently delivers all PMIs as NMIs, e.g. so that events that trigger while
IRQs are disabled aren't delayed and generate useless records, but due to
the multiplexing of NMIs throughout the system, correctly identifying NMIs
for a mediated PMU is practically infeasible.

To (greatly) simplify identifying guest mediated PMU PMIs, perf will
switch the CPU's LVTPC between PERF_GUEST_MEDIATED_PMI_VECTOR and NMI when
guest PMU context is loaded/put.  I.e. PMIs that are generated by the CPU
while the guest is active will be identified purely based on the IRQ
vector.

Route the vector through perf, e.g. as opposed to letting KVM attach a
handler directly a la posted interrupt notification vectors, as perf owns
the LVTPC and thus is the rightful owner of PERF_GUEST_MEDIATED_PMI_VECTOR.
Functionally, having KVM directly own the vector would be fine (both KVM
and perf will be completely aware of when a mediated PMU is active), but
would lead to an undesirable split in ownership: perf would be responsible
for installing the vector, but not handling the resulting IRQs.

Add a new perf_guest_info_callbacks hook (and static call) to allow KVM to
register its handler with perf when running guests with mediated PMUs.

Note, because KVM always runs guests with host IRQs enabled, there is no
danger of a PMI being delayed from the guest's perspective due to using a
regular IRQ instead of an NMI.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/entry/entry_fred.c                   |  1 +
 arch/x86/include/asm/hardirq.h                |  3 +++
 arch/x86/include/asm/idtentry.h               |  6 ++++++
 arch/x86/include/asm/irq_vectors.h            |  4 +++-
 arch/x86/kernel/idt.c                         |  3 +++
 arch/x86/kernel/irq.c                         | 19 +++++++++++++++++++
 include/linux/perf_event.h                    |  8 ++++++++
 kernel/events/core.c                          |  9 +++++++--
 .../beauty/arch/x86/include/asm/irq_vectors.h |  3 ++-
 virt/kvm/kvm_main.c                           |  3 +++
 10 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index f004a4dc74c2..d80861a4cd00 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -114,6 +114,7 @@ static idtentry_t sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init = {
 
 	SYSVEC(IRQ_WORK_VECTOR,			irq_work),
 
+	SYSVEC(PERF_GUEST_MEDIATED_PMI_VECTOR,	perf_guest_mediated_pmi_handler),
 	SYSVEC(POSTED_INTR_VECTOR,		kvm_posted_intr_ipi),
 	SYSVEC(POSTED_INTR_WAKEUP_VECTOR,	kvm_posted_intr_wakeup_ipi),
 	SYSVEC(POSTED_INTR_NESTED_VECTOR,	kvm_posted_intr_nested_ipi),
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index f00c09ffe6a9..f221d001e4ed 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -18,6 +18,9 @@ typedef struct {
 	unsigned int kvm_posted_intr_ipis;
 	unsigned int kvm_posted_intr_wakeup_ipis;
 	unsigned int kvm_posted_intr_nested_ipis;
+#endif
+#ifdef CONFIG_GUEST_PERF_EVENTS
+	unsigned int perf_guest_mediated_pmis;
 #endif
 	unsigned int x86_platform_ipis;	/* arch dependent */
 	unsigned int apic_perf_irqs;
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index a4ec27c67988..5109c24b3c1c 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -751,6 +751,12 @@ DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR,	sysvec_kvm_posted_intr_nested
 # define fred_sysvec_kvm_posted_intr_nested_ipi		NULL
 #endif
 
+# ifdef CONFIG_GUEST_PERF_EVENTS
+DECLARE_IDTENTRY_SYSVEC(PERF_GUEST_MEDIATED_PMI_VECTOR,	sysvec_perf_guest_mediated_pmi_handler);
+#else
+# define fred_sysvec_perf_guest_mediated_pmi_handler	NULL
+#endif
+
 # ifdef CONFIG_X86_POSTED_MSI
 DECLARE_IDTENTRY_SYSVEC(POSTED_MSI_NOTIFICATION_VECTOR,	sysvec_posted_msi_notification);
 #else
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 47051871b436..85253fc8e384 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -77,7 +77,9 @@
  */
 #define IRQ_WORK_VECTOR			0xf6
 
-/* 0xf5 - unused, was UV_BAU_MESSAGE */
+/* IRQ vector for PMIs when running a guest with a mediated PMU. */
+#define PERF_GUEST_MEDIATED_PMI_VECTOR	0xf5
+
 #define DEFERRED_ERROR_VECTOR		0xf4
 
 /* Vector on which hypervisor callbacks will be delivered */
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index f445bec516a0..260456588756 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -158,6 +158,9 @@ static const __initconst struct idt_data apic_idts[] = {
 	INTG(POSTED_INTR_WAKEUP_VECTOR,		asm_sysvec_kvm_posted_intr_wakeup_ipi),
 	INTG(POSTED_INTR_NESTED_VECTOR,		asm_sysvec_kvm_posted_intr_nested_ipi),
 # endif
+#ifdef CONFIG_GUEST_PERF_EVENTS
+	INTG(PERF_GUEST_MEDIATED_PMI_VECTOR,	asm_sysvec_perf_guest_mediated_pmi_handler),
+#endif
 # ifdef CONFIG_IRQ_WORK
 	INTG(IRQ_WORK_VECTOR,			asm_sysvec_irq_work),
 # endif
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 9ed29ff10e59..70ab066aa4cb 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -191,6 +191,13 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 			   irq_stats(j)->kvm_posted_intr_wakeup_ipis);
 	seq_puts(p, "  Posted-interrupt wakeup event\n");
 #endif
+#ifdef CONFIG_GUEST_PERF_EVENTS
+	seq_printf(p, "%*s: ", prec, "VPMI");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ",
+			   irq_stats(j)->perf_guest_mediated_pmis);
+	seq_puts(p, " Perf Guest Mediated PMI\n");
+#endif
 #ifdef CONFIG_X86_POSTED_MSI
 	seq_printf(p, "%*s: ", prec, "PMN");
 	for_each_online_cpu(j)
@@ -315,6 +322,18 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_x86_platform_ipi)
 }
 #endif
 
+#ifdef CONFIG_GUEST_PERF_EVENTS
+/*
+ * Handler for PERF_GUEST_MEDIATED_PMI_VECTOR.
+ */
+DEFINE_IDTENTRY_SYSVEC(sysvec_perf_guest_mediated_pmi_handler)
+{
+	 apic_eoi();
+	 inc_irq_stat(perf_guest_mediated_pmis);
+	 perf_guest_handle_mediated_pmi();
+}
+#endif
+
 #if IS_ENABLED(CONFIG_KVM)
 static void dummy_handler(void) {}
 static void (*kvm_posted_intr_wakeup_handler)(void) = dummy_handler;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 42d019d70b42..0c529fbd97e6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1677,6 +1677,8 @@ struct perf_guest_info_callbacks {
 	unsigned int			(*state)(void);
 	unsigned long			(*get_ip)(void);
 	unsigned int			(*handle_intel_pt_intr)(void);
+
+	void				(*handle_mediated_pmi)(void);
 };
 
 #ifdef CONFIG_GUEST_PERF_EVENTS
@@ -1686,6 +1688,7 @@ extern struct perf_guest_info_callbacks __rcu *perf_guest_cbs;
 DECLARE_STATIC_CALL(__perf_guest_state, *perf_guest_cbs->state);
 DECLARE_STATIC_CALL(__perf_guest_get_ip, *perf_guest_cbs->get_ip);
 DECLARE_STATIC_CALL(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr);
+DECLARE_STATIC_CALL(__perf_guest_handle_mediated_pmi, *perf_guest_cbs->handle_mediated_pmi);
 
 static inline unsigned int perf_guest_state(void)
 {
@@ -1702,6 +1705,11 @@ static inline unsigned int perf_guest_handle_intel_pt_intr(void)
 	return static_call(__perf_guest_handle_intel_pt_intr)();
 }
 
+static inline void perf_guest_handle_mediated_pmi(void)
+{
+	static_call(__perf_guest_handle_mediated_pmi)();
+}
+
 extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs);
 extern void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 77398b1ad4c5..e1df3c3bfc0d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7607,6 +7607,7 @@ struct perf_guest_info_callbacks __rcu *perf_guest_cbs;
 DEFINE_STATIC_CALL_RET0(__perf_guest_state, *perf_guest_cbs->state);
 DEFINE_STATIC_CALL_RET0(__perf_guest_get_ip, *perf_guest_cbs->get_ip);
 DEFINE_STATIC_CALL_RET0(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr);
+DEFINE_STATIC_CALL_RET0(__perf_guest_handle_mediated_pmi, *perf_guest_cbs->handle_mediated_pmi);
 
 void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
@@ -7621,6 +7622,10 @@ void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 	if (cbs->handle_intel_pt_intr)
 		static_call_update(__perf_guest_handle_intel_pt_intr,
 				   cbs->handle_intel_pt_intr);
+
+	if (cbs->handle_mediated_pmi)
+		static_call_update(__perf_guest_handle_mediated_pmi,
+				   cbs->handle_mediated_pmi);
 }
 EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
 
@@ -7632,8 +7637,8 @@ void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 	rcu_assign_pointer(perf_guest_cbs, NULL);
 	static_call_update(__perf_guest_state, (void *)&__static_call_return0);
 	static_call_update(__perf_guest_get_ip, (void *)&__static_call_return0);
-	static_call_update(__perf_guest_handle_intel_pt_intr,
-			   (void *)&__static_call_return0);
+	static_call_update(__perf_guest_handle_intel_pt_intr, (void *)&__static_call_return0);
+	static_call_update(__perf_guest_handle_mediated_pmi, (void *)&__static_call_return0);
 	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
diff --git a/tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h b/tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h
index 47051871b436..6e1d5b955aae 100644
--- a/tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h
+++ b/tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h
@@ -77,7 +77,8 @@
  */
 #define IRQ_WORK_VECTOR			0xf6
 
-/* 0xf5 - unused, was UV_BAU_MESSAGE */
+#define PERF_GUEST_MEDIATED_PMI_VECTOR	0xf5
+
 #define DEFERRED_ERROR_VECTOR		0xf4
 
 /* Vector on which hypervisor callbacks will be delivered */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c07dd423458..ecafab2e17d9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6426,11 +6426,14 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.state			= kvm_guest_state,
 	.get_ip			= kvm_guest_get_ip,
 	.handle_intel_pt_intr	= NULL,
+	.handle_mediated_pmi	= NULL,
 };
 
 void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void))
 {
 	kvm_guest_cbs.handle_intel_pt_intr = pt_intr_handler;
+	kvm_guest_cbs.handle_mediated_pmi = NULL;
+
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);
 }
 void kvm_unregister_perf_callbacks(void)
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 08/44] perf: core/x86: Register a new vector for handling mediated guest PMIs Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-15 11:39   ` Peter Zijlstra
  2025-08-15 13:04   ` Peter Zijlstra
  2025-08-06 19:56 ` [PATCH v5 10/44] perf/x86/core: Do not set bit width for unavailable counters Sean Christopherson
                   ` (37 subsequent siblings)
  46 siblings, 2 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Add arch hooks to the mediated vPMU load/put APIs, and use the hooks to
switch PMIs to the dedicated mediated PMU IRQ vector on load, and back to
perf's standard NMI when the guest context is put.  I.e. route PMIs to
PERF_GUEST_MEDIATED_PMI_VECTOR when the guest context is active, and to
NMIs while the host context is active.

While running with guest context loaded, ignore all NMIs (in perf).  Any
NMI that arrives while the LVTPC points at the mediated PMU IRQ vector
can't possibly be due to a host perf event.

Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: use arch hook instead of per-PMU callback]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/core.c     | 27 +++++++++++++++++++++++++++
 include/linux/perf_event.h |  3 +++
 kernel/events/core.c       |  4 ++++
 3 files changed, 34 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7610f26dfbd9..9b0525b252f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -55,6 +55,8 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	.pmu = &pmu,
 };
 
+static DEFINE_PER_CPU(bool, x86_guest_ctx_loaded);
+
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
 DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
@@ -1756,6 +1758,16 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 	u64 finish_clock;
 	int ret;
 
+	/*
+	 * Ignore all NMIs when a guest's mediated PMU context is loaded.  Any
+	 * such NMI can't be due to a PMI as the CPU's LVTPC is switched to/from
+	 * the dedicated mediated PMI IRQ vector while host events are quiesced.
+	 * Attempting to handle a PMI while the guest's context is loaded will
+	 * generate false positives and clobber guest state.
+	 */
+	if (this_cpu_read(x86_guest_ctx_loaded))
+		return NMI_DONE;
+
 	/*
 	 * All PMUs/events that share this PMI handler should make sure to
 	 * increment active_events for their events.
@@ -2727,6 +2739,21 @@ static struct pmu pmu = {
 	.filter			= x86_pmu_filter,
 };
 
+void arch_perf_load_guest_context(unsigned long data)
+{
+	u32 masked = data & APIC_LVT_MASKED;
+
+	apic_write(APIC_LVTPC,
+		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
+	this_cpu_write(x86_guest_ctx_loaded, true);
+}
+
+void arch_perf_put_guest_context(void)
+{
+	this_cpu_write(x86_guest_ctx_loaded, false);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
+
 void arch_perf_update_userpage(struct perf_event *event,
 			       struct perf_event_mmap_page *userpg, u64 now)
 {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0c529fbd97e6..3a9bd9c4c90e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1846,6 +1846,9 @@ static inline unsigned long perf_arch_guest_misc_flags(struct pt_regs *regs)
 # define perf_arch_guest_misc_flags(regs)	perf_arch_guest_misc_flags(regs)
 #endif
 
+extern void arch_perf_load_guest_context(unsigned long data);
+extern void arch_perf_put_guest_context(void);
+
 static inline bool needs_branch_stack(struct perf_event *event)
 {
 	return event->attr.branch_sample_type != 0;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e1df3c3bfc0d..ad22b182762e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
 		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
 	}
 
+	arch_perf_load_guest_context(data);
+
 	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
 	if (cpuctx->task_ctx)
 		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
@@ -6433,6 +6435,8 @@ void perf_put_guest_context(void)
 
 	perf_event_sched_in(cpuctx, cpuctx->task_ctx, NULL, EVENT_GUEST);
 
+	arch_perf_put_guest_context();
+
 	if (cpuctx->task_ctx)
 		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
 	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 10/44] perf/x86/core: Do not set bit width for unavailable counters
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 11/44] perf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap Sean Christopherson
                   ` (36 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Sandipan Das <sandipan.das@amd.com>

Not all x86 processors have fixed counters. It may also be the case that
a processor has only fixed counters and no general-purpose counters. Set
the bit widths corresponding to each counter type only if such counters
are available.

Fixes: b3d9468a8bd2 ("perf, x86: Expose perf capability to other modules")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 9b0525b252f1..b8583a6962f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -3125,8 +3125,8 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->version		= x86_pmu.version;
 	cap->num_counters_gp	= x86_pmu_num_counters(NULL);
 	cap->num_counters_fixed	= x86_pmu_num_counters_fixed(NULL);
-	cap->bit_width_gp	= x86_pmu.cntval_bits;
-	cap->bit_width_fixed	= x86_pmu.cntval_bits;
+	cap->bit_width_gp	= cap->num_counters_gp ? x86_pmu.cntval_bits : 0;
+	cap->bit_width_fixed	= cap->num_counters_fixed ? x86_pmu.cntval_bits : 0;
 	cap->events_mask	= (unsigned int)x86_pmu.events_maskl;
 	cap->events_mask_len	= x86_pmu.events_mask_len;
 	cap->pebs_ept		= x86_pmu.pebs_ept;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 11/44] perf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (9 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 10/44] perf/x86/core: Do not set bit width for unavailable counters Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 12/44] perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU Sean Christopherson
                   ` (35 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Mingwei Zhang <mizhang@google.com>

Plumb mediated PMU capability to x86_pmu_cap in order to let any kernel
entity such as KVM know that host PMU support mediated PMU mode and has
the implementation.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/core.c            | 1 +
 arch/x86/include/asm/perf_event.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b8583a6962f1..d58aa316b65a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -3130,6 +3130,7 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->events_mask	= (unsigned int)x86_pmu.events_maskl;
 	cap->events_mask_len	= x86_pmu.events_mask_len;
 	cap->pebs_ept		= x86_pmu.pebs_ept;
+	cap->mediated		= !!(pmu.capabilities & PERF_PMU_CAP_MEDIATED_VPMU);
 }
 EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
 
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 70d1d94aca7e..74db361a53d3 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -292,6 +292,7 @@ struct x86_pmu_capability {
 	unsigned int	events_mask;
 	int		events_mask_len;
 	unsigned int	pebs_ept	:1;
+	unsigned int	mediated	:1;
 };
 
 /*
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 12/44] perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (10 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 11/44] perf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 13/44] perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host Sean Christopherson
                   ` (34 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Kan Liang <kan.liang@linux.intel.com>

Apply the PERF_PMU_CAP_MEDIATED_VPMU for Intel core PMU. It only indicates
that the perf side of core PMU is ready to support the mediated vPMU.
Besides the capability, the hypervisor, a.k.a. KVM, still needs to check
the PMU version and other PMU features/capabilities to decide whether to
enable support mediated vPMUs.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/intel/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c2fb729c270e..3d93fcf8b650 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5322,6 +5322,8 @@ static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu)
 	else
 		pmu->intel_ctrl &= ~(1ULL << GLOBAL_CTRL_EN_PERF_METRICS);
 
+	pmu->pmu.capabilities |= PERF_PMU_CAP_MEDIATED_VPMU;
+
 	intel_pmu_check_event_constraints(pmu->event_constraints,
 					  pmu->cntr_mask64,
 					  pmu->fixed_cntr_mask64,
@@ -6939,6 +6941,9 @@ __init int intel_pmu_init(void)
 			pr_cont(" AnyThread deprecated, ");
 	}
 
+	/* The perf side of core PMU is ready to support the mediated vPMU. */
+	x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_MEDIATED_VPMU;
+
 	/*
 	 * Many features on and after V6 require dynamic constraint,
 	 * e.g., Arch PEBS, ACR.
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 13/44] perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (11 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 12/44] perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 14/44] KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init() Sean Christopherson
                   ` (33 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Sandipan Das <sandipan.das@amd.com>

Apply the PERF_PMU_CAP_MEDIATED_VPMU flag for version 2 and later
implementations of the core PMU. Aside from having Global Control and
Status registers, virtualizing the PMU using the mediated model requires
an interface to set or clear the overflow bits in the Global Status MSRs
while restoring or saving the PMU context of a vCPU.

PerfMonV2-capable hardware has additional MSRs for this purpose, namely
PerfCntrGlobalStatusSet and PerfCntrGlobalStatusClr, thereby making it
suitable for use with mediated vPMU.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/events/amd/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index b20661b8621d..8179fb5f1ee3 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -1433,6 +1433,8 @@ static int __init amd_core_pmu_init(void)
 
 		amd_pmu_global_cntr_mask = x86_pmu.cntr_mask64;
 
+		x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_MEDIATED_VPMU;
+
 		/* Update PMC handling functions */
 		x86_pmu.enable_all = amd_pmu_v2_enable_all;
 		x86_pmu.disable_all = amd_pmu_v2_disable_all;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 14/44] KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init()
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (12 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 13/44] perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 15/44] KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs Sean Christopherson
                   ` (32 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Setup the golden VMCS config during vmx_init(), before the call to
kvm_x86_vendor_init(), instead of waiting until the callback to do
hardware setup.  setup_vmcs_config() only touches VMX state, i.e. doesn't
poke anything in kvm.ko, and has no runtime dependencies beyond
hv_init_evmcs().

Setting the VMCS config early on will allow referencing VMCS and VMX
capabilities at any point during setup, e.g. to check for PERF_GLOBAL_CTRL
save/load support during mediated PMU initialization.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 95765db52992..ed10013dac95 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8335,8 +8335,6 @@ __init int vmx_hardware_setup(void)
 
 	vmx_setup_user_return_msrs();
 
-	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
-		return -EIO;
 
 	if (boot_cpu_has(X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
@@ -8560,11 +8558,18 @@ int __init vmx_init(void)
 		return -EOPNOTSUPP;
 
 	/*
-	 * Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing
-	 * to unwind if a later step fails.
+	 * Note, VMCS and eVMCS configuration only touch VMX knobs/variables,
+	 * i.e. there's nothing to unwind if a later step fails.
 	 */
 	hv_init_evmcs();
 
+	/*
+	 * Parse the VMCS config and VMX capabilities before anything else, so
+	 * that the information is available to all setup flows.
+	 */
+	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
+		return -EIO;
+
 	r = kvm_x86_vendor_init(&vt_init_ops);
 	if (r)
 		return r;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 15/44] KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (13 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 14/44] KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init() Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-13  9:58   ` Sandipan Das
  2025-08-06 19:56 ` [PATCH v5 16/44] KVM: Add a simplified wrapper for registering perf callbacks Sean Christopherson
                   ` (31 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Gate access to PMC MSRs based on pmu->version, not on kvm->arch.enable_pmu,
to more accurately reflect KVM's behavior.  This is a glorified nop, as
pmu->version and pmu->nr_arch_gp_counters can only be non-zero if
amd_pmu_refresh() is reached, kvm_pmu_refresh() invokes amd_pmu_refresh()
if and only if kvm->arch.enable_pmu is true, and amd_pmu_refresh() forces
pmu->version to be 1 or 2.

I.e. the following holds true:

  !pmu->nr_arch_gp_counters || kvm->arch.enable_pmu == (pmu->version > 0)

and so the only way for amd_pmu_get_pmc() to return a non-NULL value is if
both kvm->arch.enable_pmu and pmu->version evaluate to true.

No real functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 288f7f2a46f2..7b8577f3c57a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -41,7 +41,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
 	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
 	unsigned int idx;
 
-	if (!vcpu->kvm->arch.enable_pmu)
+	if (!pmu->version)
 		return NULL;
 
 	switch (msr) {
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 16/44] KVM: Add a simplified wrapper for registering perf callbacks
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (14 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 15/44] KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-22 10:32   ` Anup Patel
  2025-08-06 19:56 ` [PATCH v5 17/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Sean Christopherson
                   ` (30 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Add a parameter-less API for registering perf callbacks in anticipation of
introducing another x86-only parameter for handling mediated PMU PMIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/arm.c      |  2 +-
 arch/loongarch/kvm/main.c |  2 +-
 arch/riscv/kvm/main.c     |  2 +-
 arch/x86/kvm/x86.c        |  2 +-
 include/linux/kvm_host.h  | 11 +++++++++--
 virt/kvm/kvm_main.c       |  5 +++--
 6 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 888f7c7abf54..6c604b5214f2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2328,7 +2328,7 @@ static int __init init_subsystems(void)
 	if (err)
 		goto out;
 
-	kvm_register_perf_callbacks(NULL);
+	kvm_register_perf_callbacks();
 
 out:
 	if (err)
diff --git a/arch/loongarch/kvm/main.c b/arch/loongarch/kvm/main.c
index 80ea63d465b8..f62326fe29fa 100644
--- a/arch/loongarch/kvm/main.c
+++ b/arch/loongarch/kvm/main.c
@@ -394,7 +394,7 @@ static int kvm_loongarch_env_init(void)
 	}
 
 	kvm_init_gcsr_flag();
-	kvm_register_perf_callbacks(NULL);
+	kvm_register_perf_callbacks();
 
 	/* Register LoongArch IPI interrupt controller interface. */
 	ret = kvm_loongarch_register_ipi_device();
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 67c876de74ef..cbe842c2f615 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -159,7 +159,7 @@ static int __init riscv_kvm_init(void)
 		kvm_info("AIA available with %d guest external interrupts\n",
 			 kvm_riscv_aia_nr_hgei);
 
-	kvm_register_perf_callbacks(NULL);
+	kvm_register_perf_callbacks();
 
 	rc = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
 	if (rc) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5af2c5aed0f2..d80bbd5e0859 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9689,7 +9689,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
 #endif
 
-	kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
+	__kvm_register_perf_callbacks(ops->handle_intel_pt_intr, NULL);
 
 	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled)
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 15656b7fba6c..20c50eaa0089 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1731,10 +1731,17 @@ static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
 #ifdef CONFIG_GUEST_PERF_EVENTS
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu);
 
-void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void));
+void __kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void),
+				   void (*mediated_pmi_handler)(void));
+
+static inline void kvm_register_perf_callbacks(void)
+{
+	__kvm_register_perf_callbacks(NULL, NULL);
+}
+
 void kvm_unregister_perf_callbacks(void);
 #else
-static inline void kvm_register_perf_callbacks(void *ign) {}
+static inline void kvm_register_perf_callbacks(void) {}
 static inline void kvm_unregister_perf_callbacks(void) {}
 #endif /* CONFIG_GUEST_PERF_EVENTS */
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ecafab2e17d9..d477a7fda0ae 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6429,10 +6429,11 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.handle_mediated_pmi	= NULL,
 };
 
-void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void))
+void __kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void),
+				   void (*mediated_pmi_handler)(void))
 {
 	kvm_guest_cbs.handle_intel_pt_intr = pt_intr_handler;
-	kvm_guest_cbs.handle_mediated_pmi = NULL;
+	kvm_guest_cbs.handle_mediated_pmi = mediated_pmi_handler;
 
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);
 }
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 17/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (15 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 16/44] KVM: Add a simplified wrapper for registering perf callbacks Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-13  9:56   ` Sandipan Das
  2025-08-06 19:56 ` [PATCH v5 18/44] KVM: x86/pmu: Start stubbing in mediated PMU support Sean Christopherson
                   ` (29 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Take a snapshot of the unadulterated PMU capabilities provided by perf so
that KVM can compare guest vPMU capabilities against hardware capabilities
when determining whether or not to intercept PMU MSRs (and RDPMC).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3206412a35a1..0f3e011824ed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -26,6 +26,10 @@
 /* This is enough to filter the vast majority of currently defined events. */
 #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
 
+/* Unadultered PMU capabilities of the host, i.e. of hardware. */
+static struct x86_pmu_capability __read_mostly kvm_host_pmu;
+
+/* KVM's PMU capabilities, i.e. the intersection of KVM and hardware support. */
 struct x86_pmu_capability __read_mostly kvm_pmu_cap;
 EXPORT_SYMBOL_GPL(kvm_pmu_cap);
 
@@ -104,6 +108,8 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
 
+	perf_get_x86_pmu_capability(&kvm_host_pmu);
+
 	/*
 	 * Hybrid PMUs don't play nice with virtualization without careful
 	 * configuration by userspace, and KVM's APIs for reporting supported
@@ -114,18 +120,16 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 		enable_pmu = false;
 
 	if (enable_pmu) {
-		perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
 		/*
 		 * WARN if perf did NOT disable hardware PMU if the number of
 		 * architecturally required GP counters aren't present, i.e. if
 		 * there are a non-zero number of counters, but fewer than what
 		 * is architecturally required.
 		 */
-		if (!kvm_pmu_cap.num_counters_gp ||
-		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs))
+		if (!kvm_host_pmu.num_counters_gp ||
+		    WARN_ON_ONCE(kvm_host_pmu.num_counters_gp < min_nr_gp_ctrs))
 			enable_pmu = false;
-		else if (is_intel && !kvm_pmu_cap.version)
+		else if (is_intel && !kvm_host_pmu.version)
 			enable_pmu = false;
 	}
 
@@ -134,6 +138,7 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 		return;
 	}
 
+	memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
 	kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
 	kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
 					  pmu_ops->MAX_NR_GP_COUNTERS);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 18/44] KVM: x86/pmu: Start stubbing in mediated PMU support
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (16 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 17/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 19/44] KVM: x86/pmu: Implement Intel mediated PMU requirements and constraints Sean Christopherson
                   ` (28 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Introduce enable_mediated_pmu as a global variable, with the intent of
exposing it to userspace a vendor module parameter, to control and reflect
mediated vPMU support.  Wire up the perf plumbing to create+release a
mediated PMU, but defer exposing the parameter to userspace until KVM
support for a mediated PMUs is fully landed.

To (a) minimize compatibility issues, (b) to give userspace a chance to
opt out of the restrictive side-effects of perf_create_mediated_pmu(),
and (c) to avoid adding new dependencies between enabling an in-kernel
irqchip and a mediated vPMU, defer "creating" a mediated PMU in perf
until the first vCPU is created.

Regarding userspace compatibility, an alternative solution would be to
make the mediated PMU fully opt-in, e.g. to avoid unexpected failure due
to perf_create_mediated_pmu() failing.  Ironically, that approach creates
an even bigger compatibility issue, as turning on enable_mediated_pmu
would silently break VMMs that don't utilize KVM_CAP_PMU_CAPABILITY (well,
silently until the guest tried to access PMU assets).

Regarding an in-kernel irqchip, create a mediated PMU if and only if the
VM has an in-kernel local APIC, as the mediated PMU will take a hard
dependency on forwarding PMIs to the guest without bouncing through host
userspace.  Silently "drop" the PMU instead of rejecting KVM_CREATE_VCPU,
as KVM's existing vPMU support doesn't function correctly if the local
APIC is emulated by userspace, e.g. PMIs will never be delivered.  I.e.
it's far, far more likely that rejecting KVM_CREATE_VCPU would cause
problems, e.g. for tests or userspace daemons that just want to probe
basic KVM functionality.

Note!  Deliberately make mediated PMU creation "sticky", i.e. don't unwind
it on failure to create a vCPU.  Practically speaking, there's no harm to
having a VM with a mediated PMU and no vCPUs.  To avoid an "impossible" VM
setup, reject KVM_CAP_PMU_CAPABILITY if a mediated PMU has been created,
i.e. don't let userspace disable PMU support after failed vCPU creation
(with PMU support enabled).

Defer vendor specific requirements and constraints to the future.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c              |  4 ++++
 arch/x86/kvm/pmu.h              |  7 +++++++
 arch/x86/kvm/x86.c              | 37 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.h              |  1 +
 5 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7680612ba1e..ff0d753e2b07 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1471,6 +1471,7 @@ struct kvm_arch {
 
 	bool bus_lock_detection_enabled;
 	bool enable_pmu;
+	bool created_mediated_pmu;
 
 	u32 notify_window;
 	u32 notify_vmexit_flags;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0f3e011824ed..4d4bb9b17412 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -133,6 +133,10 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 			enable_pmu = false;
 	}
 
+	if (!enable_pmu || !enable_mediated_pmu || !kvm_host_pmu.mediated ||
+	    !pmu_ops->is_mediated_pmu_supported(&kvm_host_pmu))
+		enable_mediated_pmu = false;
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 08ae644db00e..f5b6181b772c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -37,6 +37,8 @@ struct kvm_pmu_ops {
 	void (*deliver_pmi)(struct kvm_vcpu *vcpu);
 	void (*cleanup)(struct kvm_vcpu *vcpu);
 
+	bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
+
 	const u64 EVENTSEL_EVENT;
 	const int MAX_NR_GP_COUNTERS;
 	const int MIN_NR_GP_COUNTERS;
@@ -58,6 +60,11 @@ static inline bool kvm_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
 	return pmu->version > 1;
 }
 
+static inline bool kvm_vcpu_has_mediated_pmu(struct kvm_vcpu *vcpu)
+{
+	return enable_mediated_pmu && vcpu_to_pmu(vcpu)->version;
+}
+
 /*
  * KVM tracks all counters in 64-bit bitmaps, with general purpose counters
  * mapped to bits 31:0 and fixed counters mapped to 63:32, e.g. fixed counter 0
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d80bbd5e0859..396d1aa81732 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -187,6 +187,10 @@ bool __read_mostly enable_pmu = true;
 EXPORT_SYMBOL_GPL(enable_pmu);
 module_param(enable_pmu, bool, 0444);
 
+/* Enable/disabled mediated PMU virtualization. */
+bool __read_mostly enable_mediated_pmu;
+EXPORT_SYMBOL_GPL(enable_mediated_pmu);
+
 bool __read_mostly eager_page_split = true;
 module_param(eager_page_split, bool, 0644);
 
@@ -6542,7 +6546,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			break;
 
 		mutex_lock(&kvm->lock);
-		if (!kvm->created_vcpus) {
+		if (!kvm->created_vcpus && !kvm->arch.created_mediated_pmu) {
 			kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
 			r = 0;
 		}
@@ -12174,8 +12178,13 @@ static int sync_regs(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+#define PERF_MEDIATED_PMU_MSG \
+	"Failed to enable mediated vPMU, try disabling system wide perf events and nmi_watchdog.\n"
+
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 {
+	int r;
+
 	if (kvm_check_tsc_unstable() && kvm->created_vcpus)
 		pr_warn_once("SMP vm created on host with unstable TSC; "
 			     "guest TSC will not be reliable\n");
@@ -12186,7 +12195,29 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	if (id >= kvm->arch.max_vcpu_ids)
 		return -EINVAL;
 
-	return kvm_x86_call(vcpu_precreate)(kvm);
+	/*
+	 * Note, any actions done by .vcpu_create() must be idempotent with
+	 * respect to creating multiple vCPUs, and therefore are not undone if
+	 * creating a vCPU fails (including failure during pre-create).
+	 */
+	r = kvm_x86_call(vcpu_precreate)(kvm);
+	if (r)
+		return r;
+
+	if (enable_mediated_pmu && kvm->arch.enable_pmu &&
+	    !kvm->arch.created_mediated_pmu) {
+		if (irqchip_in_kernel(kvm)) {
+			r = perf_create_mediated_pmu();
+			if (r) {
+				pr_warn_ratelimited(PERF_MEDIATED_PMU_MSG);
+				return r;
+			}
+			kvm->arch.created_mediated_pmu = true;
+		} else {
+			kvm->arch.enable_pmu = false;
+		}
+	}
+	return 0;
 }
 
 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
@@ -12818,6 +12849,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 		__x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0);
 		mutex_unlock(&kvm->slots_lock);
 	}
+	if (kvm->arch.created_mediated_pmu)
+		perf_release_mediated_pmu();
 	kvm_destroy_vcpus(kvm);
 	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
 #ifdef CONFIG_KVM_IOAPIC
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 46220b04cdf2..bd1149768acc 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -445,6 +445,7 @@ extern struct kvm_caps kvm_caps;
 extern struct kvm_host_values kvm_host;
 
 extern bool enable_pmu;
+extern bool enable_mediated_pmu;
 
 /*
  * Get a filtered version of KVM's supported XCR0 that strips out dynamic
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 19/44] KVM: x86/pmu: Implement Intel mediated PMU requirements and constraints
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (17 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 18/44] KVM: x86/pmu: Start stubbing in mediated PMU support Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 20/44] KVM: x86/pmu: Implement AMD mediated PMU requirements Sean Christopherson
                   ` (27 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Implement Intel PMU requirements and constraints for mediated PMU support.
Require host PMU version 4+ so that PERF_GLOBAL_STATUS_SET can be used to
precisely load the guest's status value into hardware, and require full-
width writes so that KVM can precisely load guest counter values.

Disable PEBS and LBRs if mediated PMU support is enabled, as they won't be
supported in the initial implementation.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: split to separate patch, add full-width writes dependency]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/capabilities.h |  3 ++-
 arch/x86/kvm/vmx/pmu_intel.c    | 17 +++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 5316c27f6099..854e54c352f8 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -389,7 +389,8 @@ static inline bool vmx_pt_mode_is_host_guest(void)
 
 static inline bool vmx_pebs_supported(void)
 {
-	return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
+	return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept &&
+	       !enable_mediated_pmu;
 }
 
 static inline bool cpu_has_notify_vmexit(void)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 07baff96300f..8df8d7b4f212 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -776,6 +776,20 @@ void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
 	}
 }
 
+static bool intel_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_pmu)
+{
+	u64 host_perf_cap = 0;
+
+	if (boot_cpu_has(X86_FEATURE_PDCM))
+		rdmsrq(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
+
+	/*
+	 * Require v4+ for MSR_CORE_PERF_GLOBAL_STATUS_SET, and full-width
+	 * writes so that KVM can precisely load guest counter values.
+	 */
+	return host_pmu->version >= 4 && host_perf_cap & PMU_CAP_FW_WRITES;
+}
+
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
 	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
@@ -787,6 +801,9 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.reset = intel_pmu_reset,
 	.deliver_pmi = intel_pmu_deliver_pmi,
 	.cleanup = intel_pmu_cleanup,
+
+	.is_mediated_pmu_supported = intel_pmu_is_mediated_pmu_supported,
+
 	.EVENTSEL_EVENT = ARCH_PERFMON_EVENTSEL_EVENT,
 	.MAX_NR_GP_COUNTERS = KVM_MAX_NR_INTEL_GP_COUNTERS,
 	.MIN_NR_GP_COUNTERS = 1,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ed10013dac95..8c6343494e62 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7795,7 +7795,8 @@ static __init u64 vmx_get_perf_capabilities(void)
 	if (boot_cpu_has(X86_FEATURE_PDCM))
 		rdmsrq(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
 
-	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) {
+	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR) &&
+	    !enable_mediated_pmu) {
 		x86_perf_get_lbr(&vmx_lbr_caps);
 
 		/*
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 20/44] KVM: x86/pmu: Implement AMD mediated PMU requirements
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (18 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 19/44] KVM: x86/pmu: Implement Intel mediated PMU requirements and constraints Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-13  9:49   ` Sandipan Das
  2025-08-06 19:56 ` [PATCH v5 21/44] KVM: x86/pmu: Register PMI handler for mediated vPMU Sean Christopherson
                   ` (26 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Require host PMU version 2+ for AMD mediated PMU support, as
PERF_GLOBAL_CTRL and friends are hard requirements for the mediated PMU.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: extract to separate patch, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/pmu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 7b8577f3c57a..96be2c3e0d65 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -227,6 +227,11 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
 	}
 }
 
+static bool amd_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_pmu)
+{
+	return host_pmu->version >= 2;
+}
+
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
 	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
@@ -236,6 +241,9 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.set_msr = amd_pmu_set_msr,
 	.refresh = amd_pmu_refresh,
 	.init = amd_pmu_init,
+
+	.is_mediated_pmu_supported = amd_pmu_is_mediated_pmu_supported,
+
 	.EVENTSEL_EVENT = AMD64_EVENTSEL_EVENT,
 	.MAX_NR_GP_COUNTERS = KVM_MAX_NR_AMD_GP_COUNTERS,
 	.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 21/44] KVM: x86/pmu: Register PMI handler for mediated vPMU
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (19 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 20/44] KVM: x86/pmu: Implement AMD mediated PMU requirements Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 22/44] KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers Sean Christopherson
                   ` (25 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Xiong Zhang <xiong.y.zhang@linux.intel.com>

Register a dedicated PMI handler with perf's callback when mediated PMU
support is enabled.  Perf routes PMIs that arrive while guest context is
loaded to the provided callback, by modifying the CPU's LVTPC to point at
a dedicated mediated PMI IRQ vector.

WARN upon receipt of a mediated PMI if there is no active vCPU, or if the
vCPU doesn't have a mediated PMU.  Even if a PMI manages to skid past
VM-Exit, it should never be delayed all the way beyond unloading the vCPU.
And while running vCPUs without a mediated PMU, the LVTPC should never be
wired up to the mediated PMI IRQ vector, i.e. should always be routed
through perf's NMI handler.

Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 10 ++++++++++
 arch/x86/kvm/pmu.h |  2 ++
 arch/x86/kvm/x86.c |  3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4d4bb9b17412..680523e9d11e 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -155,6 +155,16 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 		perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
 }
 
+void kvm_handle_guest_mediated_pmi(void)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+	if (WARN_ON_ONCE(!vcpu || !kvm_vcpu_has_mediated_pmu(vcpu)))
+		return;
+
+	kvm_make_request(KVM_REQ_PMI, vcpu);
+}
+
 static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index f5b6181b772c..e038bce76b9e 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -46,6 +46,8 @@ struct kvm_pmu_ops {
 
 void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
 
+void kvm_handle_guest_mediated_pmi(void);
+
 static inline bool kvm_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
 {
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 396d1aa81732..2c34dd3f0222 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9693,7 +9693,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
 #endif
 
-	__kvm_register_perf_callbacks(ops->handle_intel_pt_intr, NULL);
+	__kvm_register_perf_callbacks(ops->handle_intel_pt_intr,
+				      enable_mediated_pmu ? kvm_handle_guest_mediated_pmi : NULL);
 
 	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled)
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 22/44] KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (20 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 21/44] KVM: x86/pmu: Register PMI handler for mediated vPMU Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 23/44] KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header Sean Christopherson
                   ` (24 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Rename the two helpers vmx_vmentry/vmexit_ctrl() to
vmx_get_initial_vmentry/vmexit_ctrl() to represent their real meaning.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8c6343494e62..7b0b51809f0e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4304,7 +4304,7 @@ static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
 	return pin_based_exec_ctrl;
 }
 
-static u32 vmx_vmentry_ctrl(void)
+static u32 vmx_get_initial_vmentry_ctrl(void)
 {
 	u32 vmentry_ctrl = vmcs_config.vmentry_ctrl;
 
@@ -4321,7 +4321,7 @@ static u32 vmx_vmentry_ctrl(void)
 	return vmentry_ctrl;
 }
 
-static u32 vmx_vmexit_ctrl(void)
+static u32 vmx_get_initial_vmexit_ctrl(void)
 {
 	u32 vmexit_ctrl = vmcs_config.vmexit_ctrl;
 
@@ -4686,10 +4686,10 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 	if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
 		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
 
-	vm_exit_controls_set(vmx, vmx_vmexit_ctrl());
+	vm_exit_controls_set(vmx, vmx_get_initial_vmexit_ctrl());
 
 	/* 22.2.1, 20.8.1 */
-	vm_entry_controls_set(vmx, vmx_vmentry_ctrl());
+	vm_entry_controls_set(vmx, vmx_get_initial_vmentry_ctrl());
 
 	vmx->vcpu.arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits();
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 23/44] KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (21 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 22/44] KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 24/44] KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS Sean Christopherson
                   ` (23 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h and rename them with
PERF_CAP prefix to keep consistent with other perf capabilities macros.

No functional change intended.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/msr-index.h | 15 +++++++++------
 arch/x86/kvm/vmx/capabilities.h  |  3 ---
 arch/x86/kvm/vmx/pmu_intel.c     |  6 +++---
 arch/x86/kvm/vmx/vmx.c           | 12 ++++++------
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index c29127ac626a..f19d1ee9a396 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -315,12 +315,15 @@
 #define PERF_CAP_PT_IDX			16
 
 #define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
-#define PERF_CAP_PEBS_TRAP             BIT_ULL(6)
-#define PERF_CAP_ARCH_REG              BIT_ULL(7)
-#define PERF_CAP_PEBS_FORMAT           0xf00
-#define PERF_CAP_PEBS_BASELINE         BIT_ULL(14)
-#define PERF_CAP_PEBS_MASK	(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
-				 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
+
+#define PERF_CAP_LBR_FMT		0x3f
+#define PERF_CAP_PEBS_TRAP		BIT_ULL(6)
+#define PERF_CAP_ARCH_REG		BIT_ULL(7)
+#define PERF_CAP_PEBS_FORMAT		0xf00
+#define PERF_CAP_FW_WRITES		BIT_ULL(13)
+#define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
+#define PERF_CAP_PEBS_MASK		(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
+					 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
 
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 854e54c352f8..26ff606ff139 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -20,9 +20,6 @@ extern int __read_mostly pt_mode;
 #define PT_MODE_SYSTEM		0
 #define PT_MODE_HOST_GUEST	1
 
-#define PMU_CAP_FW_WRITES	(1ULL << 13)
-#define PMU_CAP_LBR_FMT		0x3f
-
 struct nested_vmx_msrs {
 	/*
 	 * We only store the "true" versions of the VMX capability MSRs. We
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8df8d7b4f212..7ab35ef4a3b1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -138,7 +138,7 @@ static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
 
 static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_FW_WRITES) != 0;
+	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
 }
 
 static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
@@ -588,7 +588,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 
 	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
 	if (intel_pmu_lbr_is_compatible(vcpu) &&
-	    (perf_capabilities & PMU_CAP_LBR_FMT))
+	    (perf_capabilities & PERF_CAP_LBR_FMT))
 		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
 	else
 		lbr_desc->records.nr = 0;
@@ -787,7 +787,7 @@ static bool intel_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_
 	 * Require v4+ for MSR_CORE_PERF_GLOBAL_STATUS_SET, and full-width
 	 * writes so that KVM can precisely load guest counter values.
 	 */
-	return host_pmu->version >= 4 && host_perf_cap & PMU_CAP_FW_WRITES;
+	return host_pmu->version >= 4 && host_perf_cap & PERF_CAP_FW_WRITES;
 }
 
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7b0b51809f0e..93b87f9e6dfd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2127,7 +2127,7 @@ u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
 	    (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
 		debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
 
-	if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) &&
+	if ((kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT) &&
 	    (host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
 		debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
 
@@ -2412,9 +2412,9 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
-		if (data & PMU_CAP_LBR_FMT) {
-			if ((data & PMU_CAP_LBR_FMT) !=
-			    (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT))
+		if (data & PERF_CAP_LBR_FMT) {
+			if ((data & PERF_CAP_LBR_FMT) !=
+			    (kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT))
 				return 1;
 			if (!cpuid_model_is_consistent(vcpu))
 				return 1;
@@ -7786,7 +7786,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 static __init u64 vmx_get_perf_capabilities(void)
 {
-	u64 perf_cap = PMU_CAP_FW_WRITES;
+	u64 perf_cap = PERF_CAP_FW_WRITES;
 	u64 host_perf_cap = 0;
 
 	if (!enable_pmu)
@@ -7807,7 +7807,7 @@ static __init u64 vmx_get_perf_capabilities(void)
 		if (!vmx_lbr_caps.has_callstack)
 			memset(&vmx_lbr_caps, 0, sizeof(vmx_lbr_caps));
 		else if (vmx_lbr_caps.nr)
-			perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+			perf_cap |= host_perf_cap & PERF_CAP_LBR_FMT;
 	}
 
 	if (vmx_pebs_supported()) {
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 24/44] KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (22 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 23/44] KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 25/44] KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates Sean Christopherson
                   ` (22 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Rework the MSR_FILTER_CHANGED request into a more generic RECALC_INTERCEPTS
request, and expand the responsibilities of vendor code to recalculate all
intercepts that vary based on userspace input, e.g. instruction intercepts
that are tied to guest CPUID.

Providing a generic recalc request will allow the upcoming mediated PMU
support to trigger a recalc when PMU features, e.g. PERF_CAPABILITIES, are
set by userspace, without having to make multiple calls to/from PMU code.
As a bonus, using a request will effectively coalesce recalcs, e.g. will
reduce the number of recalcs for normal usage from 3+ to 1 (vCPU create,
set CPUID, set PERF_CAPABILITIES (Intel only), set filter).

The downside is that MSR filter changes that are done in isolation will do
a small amount of unnecessary work, but that's already a relatively slow
path, and the cost of recalculating instruction intercepts is negligible.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  4 ++--
 arch/x86/kvm/svm/svm.c             |  8 ++++----
 arch/x86/kvm/vmx/main.c            | 14 +++++++-------
 arch/x86/kvm/vmx/vmx.c             |  9 +++++++--
 arch/x86/kvm/vmx/x86_ops.h         |  2 +-
 arch/x86/kvm/x86.c                 | 15 +++++++--------
 7 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 18a5c3119e1a..7c240e23bd52 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -138,7 +138,7 @@ KVM_X86_OP(check_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
 KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
 KVM_X86_OP_OPTIONAL(migrate_timers)
-KVM_X86_OP(recalc_msr_intercepts)
+KVM_X86_OP(recalc_intercepts)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff0d753e2b07..b891bd92fc83 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -120,7 +120,7 @@
 #define KVM_REQ_TLB_FLUSH_GUEST \
 	KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_APF_READY		KVM_ARCH_REQ(28)
-#define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
+#define KVM_REQ_RECALC_INTERCEPTS	KVM_ARCH_REQ(29)
 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
@@ -1912,7 +1912,7 @@ struct kvm_x86_ops {
 	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
-	void (*recalc_msr_intercepts)(struct kvm_vcpu *vcpu);
+	void (*recalc_intercepts)(struct kvm_vcpu *vcpu);
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f7e1e665a826..3d9dcc66a407 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1077,7 +1077,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
+static void svm_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
 	svm_recalc_instruction_intercepts(vcpu);
 	svm_recalc_msr_intercepts(vcpu);
@@ -1225,7 +1225,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
 	svm_hv_init_vmcb(vmcb);
 
-	svm_recalc_intercepts_after_set_cpuid(vcpu);
+	svm_recalc_intercepts(vcpu);
 
 	vmcb_mark_all_dirty(vmcb);
 
@@ -4479,7 +4479,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
-	svm_recalc_intercepts_after_set_cpuid(vcpu);
+	svm_recalc_intercepts(vcpu);
 }
 
 static bool svm_has_wbinvd_exit(void)
@@ -5181,7 +5181,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
 
-	.recalc_msr_intercepts = svm_recalc_msr_intercepts,
+	.recalc_intercepts = svm_recalc_intercepts,
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index dbab1c15b0cd..68dcafd177a8 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -188,18 +188,18 @@ static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return vmx_get_msr(vcpu, msr_info);
 }
 
-static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+static void vt_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * TDX doesn't allow VMM to configure interception of MSR accesses.
-	 * TDX guest requests MSR accesses by calling TDVMCALL.  The MSR
-	 * filters will be applied when handling the TDVMCALL for RDMSR/WRMSR
-	 * if the userspace has set any.
+	 * TDX doesn't allow VMM to configure interception of instructions or
+	 * MSR accesses.  TDX guest requests MSR accesses by calling TDVMCALL.
+	 * The MSR filters will be applied when handling the TDVMCALL for
+	 * RDMSR/WRMSR if the userspace has set any.
 	 */
 	if (is_td_vcpu(vcpu))
 		return;
 
-	vmx_recalc_msr_intercepts(vcpu);
+	vmx_recalc_intercepts(vcpu);
 }
 
 static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
@@ -995,7 +995,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
 	.migrate_timers = vmx_migrate_timers,
 
-	.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
+	.recalc_intercepts = vt_op(recalc_intercepts),
 	.complete_emulated_msr = vt_op(complete_emulated_msr),
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 93b87f9e6dfd..2244ca074e9d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4068,7 +4068,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
-void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	if (!cpu_has_vmx_msr_bitmap())
 		return;
@@ -4121,6 +4121,11 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
+void vmx_recalc_intercepts(struct kvm_vcpu *vcpu)
+{
+	vmx_recalc_msr_intercepts(vcpu);
+}
+
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 						int vector)
 {
@@ -7778,7 +7783,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			~FEAT_CTL_SGX_LC_ENABLED;
 
 	/* Recalc MSR interception to account for feature changes. */
-	vmx_recalc_msr_intercepts(vcpu);
+	vmx_recalc_intercepts(vcpu);
 
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 2b3424f638db..2c590ff44ced 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -52,7 +52,7 @@ void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector);
 void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
-void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
+void vmx_recalc_intercepts(struct kvm_vcpu *vcpu);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 int vmx_get_feature_msr(u32 msr, u64 *data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c34dd3f0222..69f5d9deb75f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6742,7 +6742,11 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm,
 
 	kvm_free_msr_filter(old_filter);
 
-	kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
+	/*
+	 * Recalc MSR intercepts as userspace may want to intercept accesses to
+	 * MSRs that KVM would otherwise pass through to the guest.
+	 */
+	kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS);
 
 	return 0;
 }
@@ -10765,13 +10769,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
 			kvm_check_async_pf_completion(vcpu);
 
-		/*
-		 * Recalc MSR intercepts as userspace may want to intercept
-		 * accesses to MSRs that KVM would otherwise pass through to
-		 * the guest.
-		 */
-		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
-			kvm_x86_call(recalc_msr_intercepts)(vcpu);
+		if (kvm_check_request(KVM_REQ_RECALC_INTERCEPTS, vcpu))
+			kvm_x86_call(recalc_intercepts)(vcpu);
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			kvm_x86_call(update_cpu_dirty_logging)(vcpu);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 25/44] KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (23 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 24/44] KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 26/44] KVM: VMX: Add helpers to toggle/change a bit in VMCS execution controls Sean Christopherson
                   ` (21 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Defer recalculating MSR and instruction intercepts after a CPUID update
via RECALC_INTERCEPTS to converge on RECALC_INTERCEPTS as the "official"
mechanism for triggering recalcs.  As a bonus, because KVM does a "recalc"
during vCPU creation, and every functional VMM sets CPUID at least once,
for all intents and purposes this saves at least one recalc.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c   | 2 ++
 arch/x86/kvm/svm/svm.c | 4 +---
 arch/x86/kvm/vmx/vmx.c | 3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e2836a255b16..cc16e28bfab2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -448,6 +448,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * adjustments to the reserved GPA bits.
 	 */
 	kvm_mmu_after_set_cpuid(vcpu);
+
+	kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
 }
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3d9dcc66a407..ef7dffc54dca 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1225,7 +1225,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
 	svm_hv_init_vmcb(vmcb);
 
-	svm_recalc_intercepts(vcpu);
+	kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
 
 	vmcb_mark_all_dirty(vmcb);
 
@@ -4478,8 +4478,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
-
-	svm_recalc_intercepts(vcpu);
 }
 
 static bool svm_has_wbinvd_exit(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2244ca074e9d..6094de4855d6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7782,9 +7782,6 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;
 
-	/* Recalc MSR interception to account for feature changes. */
-	vmx_recalc_intercepts(vcpu);
-
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 26/44] KVM: VMX: Add helpers to toggle/change a bit in VMCS execution controls
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (24 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 25/44] KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 27/44] KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU Sean Christopherson
                   ` (20 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Expand the VMCS controls builder macros to generate helpers to change a
bit to the desired value, and use the new helpers when toggling APICv
related controls.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 20 +++++++-------------
 arch/x86/kvm/vmx/vmx.h |  8 ++++++++
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6094de4855d6..baea4a9cf74f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4356,19 +4356,13 @@ void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 
 	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
 
-	if (kvm_vcpu_apicv_active(vcpu)) {
-		secondary_exec_controls_setbit(vmx,
-					       SECONDARY_EXEC_APIC_REGISTER_VIRT |
-					       SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-		if (enable_ipiv)
-			tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT);
-	} else {
-		secondary_exec_controls_clearbit(vmx,
-						 SECONDARY_EXEC_APIC_REGISTER_VIRT |
-						 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-		if (enable_ipiv)
-			tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT);
-	}
+	secondary_exec_controls_changebit(vmx,
+					  SECONDARY_EXEC_APIC_REGISTER_VIRT |
+					  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY,
+					  kvm_vcpu_apicv_active(vcpu));
+	if (enable_ipiv)
+		tertiary_exec_controls_changebit(vmx, TERTIARY_EXEC_IPI_VIRT,
+						 kvm_vcpu_apicv_active(vcpu));
 
 	vmx_update_msr_bitmap_x2apic(vcpu);
 }
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d3389baf3ab3..a4e5bcd1d023 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -608,6 +608,14 @@ static __always_inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##b
 {												\
 	BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname)));		\
 	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);				\
+}												\
+static __always_inline void lname##_controls_changebit(struct vcpu_vmx *vmx, u##bits val,	\
+						       bool set)				\
+{												\
+	if (set)										\
+		lname##_controls_setbit(vmx, val);						\
+	else											\
+		lname##_controls_clearbit(vmx, val);						\
 }
 BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
 BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 27/44] KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (25 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 26/44] KVM: VMX: Add helpers to toggle/change a bit in VMCS execution controls Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 28/44] KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU Sean Christopherson
                   ` (19 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Disable RDPMC interception for vCPUs with a mediated vPMU that is
compatible with the host PMU, i.e. that doesn't require KVM emulation of
RDPMC to honor the guest's vCPU model.  With a mediated vPMU, all guest
state accessible via RDPMC is loaded into hardware while the guest is
running.

Adust RDPMC interception only for non-TDX guests, as the TDX module is
responsible for managing RDPMC intercepts based on the TD configuration.

Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c     | 26 ++++++++++++++++++++++++++
 arch/x86/kvm/pmu.h     |  1 +
 arch/x86/kvm/svm/svm.c |  5 +++++
 arch/x86/kvm/vmx/vmx.c |  7 +++++++
 arch/x86/kvm/x86.c     |  1 +
 5 files changed, 40 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 680523e9d11e..674f42d083a9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -712,6 +712,32 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	return 0;
 }
 
+bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	if (!kvm_vcpu_has_mediated_pmu(vcpu))
+		return true;
+
+	/*
+	 * VMware allows access to these Pseduo-PMCs even when read via RDPMC
+	 * in Ring3 when CR4.PCE=0.
+	 */
+	if (enable_vmware_backdoor)
+		return true;
+
+	/*
+	 * Note!  Check *host* PMU capabilities, not KVM's PMU capabilities, as
+	 * KVM's capabilities are constrained based on KVM support, i.e. KVM's
+	 * capabilities themselves may be a subset of hardware capabilities.
+	 */
+	return pmu->nr_arch_gp_counters != kvm_host_pmu.num_counters_gp ||
+	       pmu->nr_arch_fixed_counters != kvm_host_pmu.num_counters_fixed ||
+	       pmu->counter_bitmask[KVM_PMC_GP] != (BIT_ULL(kvm_host_pmu.bit_width_gp) - 1) ||
+	       pmu->counter_bitmask[KVM_PMC_FIXED] != (BIT_ULL(kvm_host_pmu.bit_width_fixed) - 1);
+}
+EXPORT_SYMBOL_GPL(kvm_need_rdpmc_intercept);
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
 {
 	if (lapic_in_kernel(vcpu)) {
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index e038bce76b9e..6b95e81c078c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -238,6 +238,7 @@ void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu);
 void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
+bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu);
 
 extern struct kvm_pmu_ops intel_pmu_ops;
 extern struct kvm_pmu_ops amd_pmu_ops;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ef7dffc54dca..2d42962b47aa 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1075,6 +1075,11 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
 	}
+
+	if (kvm_need_rdpmc_intercept(vcpu))
+		svm_set_intercept(svm, INTERCEPT_RDPMC);
+	else
+		svm_clr_intercept(svm, INTERCEPT_RDPMC);
 }
 
 static void svm_recalc_intercepts(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index baea4a9cf74f..2f7db32710e3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4121,8 +4121,15 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	 */
 }
 
+static void vmx_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
+{
+	exec_controls_changebit(to_vmx(vcpu), CPU_BASED_RDPMC_EXITING,
+				kvm_need_rdpmc_intercept(vcpu));
+}
+
 void vmx_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
+	vmx_recalc_instruction_intercepts(vcpu);
 	vmx_recalc_msr_intercepts(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 69f5d9deb75f..b8014435c988 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3793,6 +3793,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		vcpu->arch.perf_capabilities = data;
 		kvm_pmu_refresh(vcpu);
+		kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
 		break;
 	case MSR_IA32_PRED_CMD: {
 		u64 reserved_bits = ~(PRED_CMD_IBPB | PRED_CMD_SBPB);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 28/44] KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (26 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 27/44] KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 29/44] KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents Sean Christopherson
                   ` (18 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

When running a guest with a mediated PMU, context switch PERF_GLOBAL_CTRL
via the dedicated VMCS fields for both host and guest.  For the host,
always zero GLOBAL_CTRL on exit as the guest's state will still be loaded
in hardware (KVM will context switch the bulk of PMU state outside of the
inner run loop).  For the guest, use the dedicated fields to atomically
load and save PERF_GLOBAL_CTRL on all entry/exits.

Note, VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL was introduced by Sapphire
Rapids, and is expected to be supported on all CPUs with PMU v4+.  WARN if
that expectation is not met.  Alternatively, KVM could manually save
PERF_GLOBAL_CTRL via the MSR save list, but the associated complexity and
runtime overhead is unjustified given that the feature should always be
available on relevant CPUs.

To minimize VM-Entry latency, propagate IA32_PERF_GLOBAL_CTRL to the VMCS
on-demand.  But to minimize complexity, read IA32_PERF_GLOBAL_CTRL out of
the VMCS on all non-failing VM-Exits.  I.e. partially cache the MSR.
KVM could track GLOBAL_CTRL as an EXREG and defer all reads, but writes
are rare, i.e. the dirty tracking for an EXREG is unnecessary, and it's
not obvious that shaving ~15-20 cycles per exit is meaningful given the
total overhead associated with mediated PMU context switches.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-pmu-ops.h |  2 ++
 arch/x86/include/asm/vmx.h             |  1 +
 arch/x86/kvm/pmu.c                     | 13 +++++++++--
 arch/x86/kvm/pmu.h                     |  3 ++-
 arch/x86/kvm/vmx/capabilities.h        |  5 +++++
 arch/x86/kvm/vmx/pmu_intel.c           | 19 +++++++++++++++-
 arch/x86/kvm/vmx/vmx.c                 | 31 +++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h                 |  3 ++-
 8 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index 9159bf1a4730..ad2cc82abf79 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -23,5 +23,7 @@ KVM_X86_PMU_OP_OPTIONAL(reset)
 KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
 KVM_X86_PMU_OP_OPTIONAL(cleanup)
 
+KVM_X86_PMU_OP_OPTIONAL(write_global_ctrl)
+
 #undef KVM_X86_PMU_OP
 #undef KVM_X86_PMU_OP_OPTIONAL
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cca7d6641287..af71666c3a37 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -106,6 +106,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL	0x40000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 674f42d083a9..a4fe0e76df79 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -103,7 +103,7 @@ void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
 #undef __KVM_X86_PMU_OP
 }
 
-void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
+void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
 {
 	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
@@ -137,6 +137,9 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 	    !pmu_ops->is_mediated_pmu_supported(&kvm_host_pmu))
 		enable_mediated_pmu = false;
 
+	if (!enable_mediated_pmu)
+		pmu_ops->write_global_ctrl = NULL;
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;
@@ -831,6 +834,9 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			diff = pmu->global_ctrl ^ data;
 			pmu->global_ctrl = data;
 			reprogram_counters(pmu, diff);
+
+			if (kvm_vcpu_has_mediated_pmu(vcpu))
+				kvm_pmu_call(write_global_ctrl)(data);
 		}
 		break;
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
@@ -921,8 +927,11 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 	 * in the global controls).  Emulate that behavior when refreshing the
 	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
 	 */
-	if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+	if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters) {
 		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
+		if (kvm_vcpu_has_mediated_pmu(vcpu))
+			kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
+	}
 }
 
 void kvm_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 6b95e81c078c..dcf4e2253875 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -38,6 +38,7 @@ struct kvm_pmu_ops {
 	void (*cleanup)(struct kvm_vcpu *vcpu);
 
 	bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
+	void (*write_global_ctrl)(u64 global_ctrl);
 
 	const u64 EVENTSEL_EVENT;
 	const int MAX_NR_GP_COUNTERS;
@@ -183,7 +184,7 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
 
 extern struct x86_pmu_capability kvm_pmu_cap;
 
-void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops);
+void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
 
 void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc);
 
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 26ff606ff139..874c6dd34665 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -100,6 +100,11 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_save_perf_global_ctrl(void)
+{
+	return vmcs_config.vmexit_ctrl & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL;
+}
+
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 7ab35ef4a3b1..98f7b45ea391 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -787,7 +787,23 @@ static bool intel_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_
 	 * Require v4+ for MSR_CORE_PERF_GLOBAL_STATUS_SET, and full-width
 	 * writes so that KVM can precisely load guest counter values.
 	 */
-	return host_pmu->version >= 4 && host_perf_cap & PERF_CAP_FW_WRITES;
+	if (host_pmu->version < 4 || !(host_perf_cap & PERF_CAP_FW_WRITES))
+		return false;
+
+	/*
+	 * All CPUs that support a mediated PMU are expected to support loading
+	 * and saving PERF_GLOBAL_CTRL via dedicated VMCS fields.
+	 */
+	if (WARN_ON_ONCE(!cpu_has_load_perf_global_ctrl() ||
+			 !cpu_has_save_perf_global_ctrl()))
+		return false;
+
+	return true;
+}
+
+static void intel_pmu_write_global_ctrl(u64 global_ctrl)
+{
+	vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, global_ctrl);
 }
 
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
@@ -803,6 +819,7 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.cleanup = intel_pmu_cleanup,
 
 	.is_mediated_pmu_supported = intel_pmu_is_mediated_pmu_supported,
+	.write_global_ctrl = intel_pmu_write_global_ctrl,
 
 	.EVENTSEL_EVENT = ARCH_PERFMON_EVENTSEL_EVENT,
 	.MAX_NR_GP_COUNTERS = KVM_MAX_NR_INTEL_GP_COUNTERS,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f7db32710e3..1233a0afb31e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4115,6 +4115,18 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
 					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
+	if (enable_mediated_pmu) {
+		bool is_mediated_pmu = kvm_vcpu_has_mediated_pmu(vcpu);
+		struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+		vm_entry_controls_changebit(vmx,
+					    VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, is_mediated_pmu);
+
+		vm_exit_controls_changebit(vmx,
+					   VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+					   VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, is_mediated_pmu);
+	}
+
 	/*
 	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
 	 * filtered by userspace.
@@ -4282,6 +4294,16 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 	if (cpu_has_load_ia32_efer())
 		vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
+
+	/*
+	 * When running a guest with a mediated PMU, guest state is resident in
+	 * hardware after VM-Exit.  Zero PERF_GLOBAL_CTRL on exit so that host
+	 * activity doesn't bleed into the guest counters.  When running with
+	 * an emulated PMU, PERF_GLOBAL_CTRL is dynamically computed on every
+	 * entry/exit to merge guest and host PMU usage.
+	 */
+	if (enable_mediated_pmu)
+		vmcs_write64(HOST_IA32_PERF_GLOBAL_CTRL, 0);
 }
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
@@ -4349,7 +4371,8 @@ static u32 vmx_get_initial_vmexit_ctrl(void)
 				 VM_EXIT_CLEAR_IA32_RTIT_CTL);
 	/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
 	return vmexit_ctrl &
-		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
+		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER |
+		  VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL);
 }
 
 void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
@@ -7087,6 +7110,9 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 	struct perf_guest_switch_msr *msrs;
 	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
 
+	if (kvm_vcpu_has_mediated_pmu(&vmx->vcpu))
+		return;
+
 	pmu->host_cross_mapped_mask = 0;
 	if (pmu->pebs_enable & pmu->global_ctrl)
 		intel_pmu_cross_mapped_check(pmu);
@@ -7407,6 +7433,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	vmx->loaded_vmcs->launched = 1;
 
+	if (!msr_write_intercepted(vmx, MSR_CORE_PERF_GLOBAL_CTRL))
+		vcpu_to_pmu(vcpu)->global_ctrl = vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL);
+
 	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a4e5bcd1d023..7eb57f5cb975 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -506,7 +506,8 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 29/44] KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (27 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 28/44] KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 30/44] KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86 Sean Christopherson
                   ` (17 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Replace a variety of "1ull << N" and "(u64)1 << N" snippets with BIT_ULL()
in the PMU code.

No functional change intended.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/pmu.c       |  4 ++--
 arch/x86/kvm/vmx/pmu_intel.c | 15 ++++++---------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 96be2c3e0d65..b777c3743304 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -199,11 +199,11 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 					 kvm_pmu_cap.num_counters_gp);
 
 	if (pmu->version > 1) {
-		pmu->global_ctrl_rsvd = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+		pmu->global_ctrl_rsvd = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
 		pmu->global_status_rsvd = pmu->global_ctrl_rsvd;
 	}
 
-	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
+	pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(48) - 1;
 	pmu->reserved_bits = 0xfffffff000280000ull;
 	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
 	/* not applicable to AMD; but clean them to prevent any fall out */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 98f7b45ea391..6352d029298c 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -536,11 +536,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 					 kvm_pmu_cap.num_counters_gp);
 	eax.split.bit_width = min_t(int, eax.split.bit_width,
 				    kvm_pmu_cap.bit_width_gp);
-	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
+	pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(eax.split.bit_width) - 1;
 	eax.split.mask_length = min_t(int, eax.split.mask_length,
 				      kvm_pmu_cap.events_mask_len);
-	pmu->available_event_types = ~entry->ebx &
-					((1ull << eax.split.mask_length) - 1);
+	pmu->available_event_types = ~entry->ebx & (BIT_ULL(eax.split.mask_length) - 1);
 
 	if (pmu->version == 1) {
 		pmu->nr_arch_fixed_counters = 0;
@@ -549,16 +548,15 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 						    kvm_pmu_cap.num_counters_fixed);
 		edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
 						  kvm_pmu_cap.bit_width_fixed);
-		pmu->counter_bitmask[KVM_PMC_FIXED] =
-			((u64)1 << edx.split.bit_width_fixed) - 1;
+		pmu->counter_bitmask[KVM_PMC_FIXED] = BIT_ULL(edx.split.bit_width_fixed) - 1;
 	}
 
 	intel_pmu_enable_fixed_counter_bits(pmu, INTEL_FIXED_0_KERNEL |
 						 INTEL_FIXED_0_USER |
 						 INTEL_FIXED_0_ENABLE_PMI);
 
-	counter_rsvd = ~(((1ull << pmu->nr_arch_gp_counters) - 1) |
-		(((1ull << pmu->nr_arch_fixed_counters) - 1) << KVM_FIXED_PMC_BASE_IDX));
+	counter_rsvd = ~((BIT_ULL(pmu->nr_arch_gp_counters) - 1) |
+			 ((BIT_ULL(pmu->nr_arch_fixed_counters) - 1) << KVM_FIXED_PMC_BASE_IDX));
 	pmu->global_ctrl_rsvd = counter_rsvd;
 
 	/*
@@ -603,8 +601,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 			pmu->pebs_data_cfg_rsvd = ~0xff00000full;
 			intel_pmu_enable_fixed_counter_bits(pmu, ICL_FIXED_0_ADAPTIVE);
 		} else {
-			pmu->pebs_enable_rsvd =
-				~((1ull << pmu->nr_arch_gp_counters) - 1);
+			pmu->pebs_enable_rsvd = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
 		}
 	}
 }
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 30/44] KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (28 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 29/44] KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 31/44] KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to PMU v2+ Sean Christopherson
                   ` (16 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Move all initialization of all_valid_pmc_idx to common code, as the logic
is 100% common to Intel and AMD, and KVM heavily relies on Intel and AMD
having the same semantics.  E.g. the fact that AMD doesn't support fixed
counters doesn't allow KVM to use all_valid_pmc_idx[63:32] for other
purposes.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c           | 4 ++++
 arch/x86/kvm/svm/pmu.c       | 1 -
 arch/x86/kvm/vmx/pmu_intel.c | 5 -----
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a4fe0e76df79..4246e1d2cfcc 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -932,6 +932,10 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_has_mediated_pmu(vcpu))
 			kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
 	}
+
+	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
+	bitmap_set(pmu->all_valid_pmc_idx, KVM_FIXED_PMC_BASE_IDX,
+		   pmu->nr_arch_fixed_counters);
 }
 
 void kvm_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b777c3743304..9ffd44a5d474 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -209,7 +209,6 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	/* not applicable to AMD; but clean them to prevent any fall out */
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
-	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }
 
 static void amd_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 6352d029298c..1de94a39ca18 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -579,11 +579,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
 	}
 
-	bitmap_set(pmu->all_valid_pmc_idx,
-		0, pmu->nr_arch_gp_counters);
-	bitmap_set(pmu->all_valid_pmc_idx,
-		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
-
 	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
 	if (intel_pmu_lbr_is_compatible(vcpu) &&
 	    (perf_capabilities & PERF_CAP_LBR_FMT))
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 31/44] KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to PMU v2+
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (29 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 30/44] KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86 Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 32/44] KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs Sean Christopherson
                   ` (15 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Restrict support for GLOBAL_CTRL, GLOBAL_STATUS, fixed PMCs, and PEBS to
v2 or later vPMUs.  The SDM explicitly states that GLOBAL_{CTRL,STATUS} and
fixed counters were introduced with PMU v2, and PEBS has hard dependencies
on fixed counters and the bitmap MSR layouts established by PMU v2.

Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 51 ++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 1de94a39ca18..779b4e64acac 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -541,16 +541,33 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 				      kvm_pmu_cap.events_mask_len);
 	pmu->available_event_types = ~entry->ebx & (BIT_ULL(eax.split.mask_length) - 1);
 
-	if (pmu->version == 1) {
-		pmu->nr_arch_fixed_counters = 0;
-	} else {
-		pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
-						    kvm_pmu_cap.num_counters_fixed);
-		edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
-						  kvm_pmu_cap.bit_width_fixed);
-		pmu->counter_bitmask[KVM_PMC_FIXED] = BIT_ULL(edx.split.bit_width_fixed) - 1;
+	entry = kvm_find_cpuid_entry_index(vcpu, 7, 0);
+	if (entry &&
+	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
+	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) {
+		pmu->reserved_bits ^= HSW_IN_TX;
+		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
 	}
 
+	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
+	if (intel_pmu_lbr_is_compatible(vcpu) &&
+	    (perf_capabilities & PERF_CAP_LBR_FMT))
+		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
+	else
+		lbr_desc->records.nr = 0;
+
+	if (lbr_desc->records.nr)
+		bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);
+
+	if (pmu->version == 1)
+		return;
+
+	pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
+					    kvm_pmu_cap.num_counters_fixed);
+	edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
+					  kvm_pmu_cap.bit_width_fixed);
+	pmu->counter_bitmask[KVM_PMC_FIXED] = BIT_ULL(edx.split.bit_width_fixed) - 1;
+
 	intel_pmu_enable_fixed_counter_bits(pmu, INTEL_FIXED_0_KERNEL |
 						 INTEL_FIXED_0_USER |
 						 INTEL_FIXED_0_ENABLE_PMI);
@@ -571,24 +588,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		pmu->global_status_rsvd &=
 				~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
 
-	entry = kvm_find_cpuid_entry_index(vcpu, 7, 0);
-	if (entry &&
-	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
-	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) {
-		pmu->reserved_bits ^= HSW_IN_TX;
-		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
-	}
-
-	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
-	if (intel_pmu_lbr_is_compatible(vcpu) &&
-	    (perf_capabilities & PERF_CAP_LBR_FMT))
-		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
-	else
-		lbr_desc->records.nr = 0;
-
-	if (lbr_desc->records.nr)
-		bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);
-
 	if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
 		if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
 			pmu->pebs_enable_rsvd = counter_rsvd;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 32/44] KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (30 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 31/44] KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to PMU v2+ Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 33/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses Sean Christopherson
                   ` (14 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

For vCPUs with a mediated vPMU, disable interception of counter MSRs for
PMCs that are exposed to the guest, and for GLOBAL_CTRL and related MSRs
if they are fully supported according to the vCPU model, i.e. if the MSRs
and all bits supported by hardware exist from the guest's point of view.

Do NOT passthrough event selector or fixed counter control MSRs, so that
KVM can enforce userspace-defined event filters, e.g. to prevent use of
AnyThread events (which is unfortunately a setting in the fixed counter
control MSR).

Defer support for nested passthrough of mediated PMU MSRs to the future,
as the logic for nested MSR interception is unfortunately vendor specific.

Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[sean: squash patches, massage changelog, refresh VMX MSRs on filter change]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/kvm/pmu.c               | 34 ++++++++++++------
 arch/x86/kvm/pmu.h               |  1 +
 arch/x86/kvm/svm/svm.c           | 36 +++++++++++++++++++
 arch/x86/kvm/vmx/pmu_intel.c     | 13 -------
 arch/x86/kvm/vmx/pmu_intel.h     | 15 ++++++++
 arch/x86/kvm/vmx/vmx.c           | 59 ++++++++++++++++++++++++++------
 7 files changed, 124 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f19d1ee9a396..06bd7aaae931 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -736,6 +736,7 @@
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
 #define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET	0xc0000303
 
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT			0xc000010e
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4246e1d2cfcc..817ef852bdf9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -715,18 +715,14 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	return 0;
 }
 
-bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
+bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 
 	if (!kvm_vcpu_has_mediated_pmu(vcpu))
 		return true;
 
-	/*
-	 * VMware allows access to these Pseduo-PMCs even when read via RDPMC
-	 * in Ring3 when CR4.PCE=0.
-	 */
-	if (enable_vmware_backdoor)
+	if (!kvm_pmu_has_perf_global_ctrl(pmu))
 		return true;
 
 	/*
@@ -735,7 +731,22 @@ bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
 	 * capabilities themselves may be a subset of hardware capabilities.
 	 */
 	return pmu->nr_arch_gp_counters != kvm_host_pmu.num_counters_gp ||
-	       pmu->nr_arch_fixed_counters != kvm_host_pmu.num_counters_fixed ||
+	       pmu->nr_arch_fixed_counters != kvm_host_pmu.num_counters_fixed;
+}
+EXPORT_SYMBOL_GPL(kvm_need_perf_global_ctrl_intercept);
+
+bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	/*
+	 * VMware allows access to these Pseduo-PMCs even when read via RDPMC
+	 * in Ring3 when CR4.PCE=0.
+	 */
+	if (enable_vmware_backdoor)
+		return true;
+
+	return kvm_need_perf_global_ctrl_intercept(vcpu) ||
 	       pmu->counter_bitmask[KVM_PMC_GP] != (BIT_ULL(kvm_host_pmu.bit_width_gp) - 1) ||
 	       pmu->counter_bitmask[KVM_PMC_FIXED] != (BIT_ULL(kvm_host_pmu.bit_width_fixed) - 1);
 }
@@ -927,11 +938,12 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 	 * in the global controls).  Emulate that behavior when refreshing the
 	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
 	 */
-	if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters) {
+	if (pmu->nr_arch_gp_counters &&
+	    (kvm_pmu_has_perf_global_ctrl(pmu) || kvm_vcpu_has_mediated_pmu(vcpu)))
 		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
-		if (kvm_vcpu_has_mediated_pmu(vcpu))
-			kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
-	}
+
+	if (kvm_vcpu_has_mediated_pmu(vcpu))
+		kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
 
 	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 	bitmap_set(pmu->all_valid_pmc_idx, KVM_FIXED_PMC_BASE_IDX,
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index dcf4e2253875..51963a3a167a 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -239,6 +239,7 @@ void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu);
 void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
+bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu);
 bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu);
 
 extern struct kvm_pmu_ops intel_pmu_ops;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2d42962b47aa..add50b64256c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -790,6 +790,40 @@ void svm_vcpu_free_msrpm(void *msrpm)
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
 }
 
+static void svm_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	bool intercept = !kvm_vcpu_has_mediated_pmu(vcpu);
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	int i;
+
+	if (!enable_mediated_pmu)
+		return;
+
+	/* Legacy counters are always available for AMD CPUs with a PMU. */
+	for (i = 0; i < min(pmu->nr_arch_gp_counters, AMD64_NUM_COUNTERS); i++)
+		svm_set_intercept_for_msr(vcpu, MSR_K7_PERFCTR0 + i,
+					  MSR_TYPE_RW, intercept);
+
+	intercept |= !guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
+		svm_set_intercept_for_msr(vcpu, MSR_F15H_PERF_CTR + 2 * i,
+					  MSR_TYPE_RW, intercept);
+
+	for ( ; i < kvm_pmu_cap.num_counters_gp; i++)
+		svm_enable_intercept_for_msr(vcpu, MSR_F15H_PERF_CTR + 2 * i,
+					     MSR_TYPE_RW);
+
+	intercept = kvm_need_perf_global_ctrl_intercept(vcpu);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+				  MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+				  MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+				  MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
+				  MSR_TYPE_RW, intercept);
+}
+
 static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -847,6 +881,8 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	if (sev_es_guest(vcpu->kvm))
 		sev_es_recalc_msr_intercepts(vcpu);
 
+	svm_recalc_pmu_msr_intercepts(vcpu);
+
 	/*
 	 * x2APIC intercepts are modified on-demand and cannot be filtered by
 	 * userspace.
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 779b4e64acac..2bdddb95816e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -128,19 +128,6 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	return &counters[array_index_nospec(idx, num_counters)];
 }
 
-static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
-{
-	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
-		return 0;
-
-	return vcpu->arch.perf_capabilities;
-}
-
-static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
-{
-	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
-}
-
 static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
 {
 	if (!fw_writes_is_enabled(pmu_to_vcpu(pmu)))
diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
index 5620d0882cdc..5d9357640aa1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.h
+++ b/arch/x86/kvm/vmx/pmu_intel.h
@@ -4,6 +4,21 @@
 
 #include <linux/kvm_host.h>
 
+#include "cpuid.h"
+
+static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
+{
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+		return 0;
+
+	return vcpu->arch.perf_capabilities;
+}
+
+static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
+}
+
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1233a0afb31e..85bd82d41f94 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4068,6 +4068,53 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	bool has_mediated_pmu = kvm_vcpu_has_mediated_pmu(vcpu);
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool intercept = !has_mediated_pmu;
+	int i;
+
+	if (!enable_mediated_pmu)
+		return;
+
+	vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
+				    has_mediated_pmu);
+
+	vm_exit_controls_changebit(vmx, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+					VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL,
+				   has_mediated_pmu);
+
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PERFCTR0 + i,
+					  MSR_TYPE_RW, intercept);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PMC0 + i, MSR_TYPE_RW,
+					  intercept || !fw_writes_is_enabled(vcpu));
+	}
+	for ( ; i < kvm_pmu_cap.num_counters_gp; i++) {
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PERFCTR0 + i,
+					  MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PMC0 + i,
+					  MSR_TYPE_RW, true);
+	}
+
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+		vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_FIXED_CTR0 + i,
+					  MSR_TYPE_RW, intercept);
+	for ( ; i < kvm_pmu_cap.num_counters_fixed; i++)
+		vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_FIXED_CTR0 + i,
+					  MSR_TYPE_RW, true);
+
+	intercept = kvm_need_perf_global_ctrl_intercept(vcpu);
+	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_STATUS,
+				  MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
+				  MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
+				  MSR_TYPE_RW, intercept);
+}
+
 static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	if (!cpu_has_vmx_msr_bitmap())
@@ -4115,17 +4162,7 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
 					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
-	if (enable_mediated_pmu) {
-		bool is_mediated_pmu = kvm_vcpu_has_mediated_pmu(vcpu);
-		struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-		vm_entry_controls_changebit(vmx,
-					    VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, is_mediated_pmu);
-
-		vm_exit_controls_changebit(vmx,
-					   VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
-					   VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, is_mediated_pmu);
-	}
+	vmx_recalc_pmu_msr_intercepts(vcpu);
 
 	/*
 	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 33/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (31 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 32/44] KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-13 10:01   ` Sandipan Das
  2025-08-06 19:56 ` [PATCH v5 34/44] KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering Sean Christopherson
                   ` (13 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

When emulating a PMC counter read or write for a mediated PMU, bypass the
perf checks and emulated_counter logic as the counters aren't proxied
through perf, i.e. pmc->counter always holds the guest's up-to-date value,
and thus there's no need to defer emulated overflow checks.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: split from event filtering change, write shortlog+changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 5 +++++
 arch/x86/kvm/pmu.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 817ef852bdf9..082d2905882b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -377,6 +377,11 @@ static void pmc_update_sample_period(struct kvm_pmc *pmc)
 
 void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
 {
+	if (kvm_vcpu_has_mediated_pmu(pmc->vcpu)) {
+		pmc->counter = val & pmc_bitmask(pmc);
+		return;
+	}
+
 	/*
 	 * Drop any unconsumed accumulated counts, the WRMSR is a write, not a
 	 * read-modify-write.  Adjust the counter value so that its value is
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 51963a3a167a..1c9d26d60a60 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -111,6 +111,9 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
 {
 	u64 counter, enabled, running;
 
+	if (kvm_vcpu_has_mediated_pmu(pmc->vcpu))
+		return pmc->counter & pmc_bitmask(pmc);
+
 	counter = pmc->counter + pmc->emulated_counter;
 
 	if (pmc->perf_event && !pmc->is_paused)
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 34/44] KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (32 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 33/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 35/44] KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates Sean Christopherson
                   ` (12 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Mingwei Zhang <mizhang@google.com>

Introduce eventsel_hw and fixed_ctr_ctrl_hw to store the actual HW value in
PMU event selector MSRs. In mediated PMU checks events before allowing the
event values written to the PMU MSRs. However, to match the HW behavior,
when PMU event checks fails, KVM should allow guest to read the value back.

This essentially requires an extra variable to separate the guest requested
value from actual PMU MSR value. Note this only applies to event selectors.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/pmu.c              | 7 +++++--
 arch/x86/kvm/svm/pmu.c          | 1 +
 arch/x86/kvm/vmx/pmu_intel.c    | 2 ++
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b891bd92fc83..5512e33db14a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -528,6 +528,7 @@ struct kvm_pmc {
 	 */
 	u64 emulated_counter;
 	u64 eventsel;
+	u64 eventsel_hw;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
@@ -556,6 +557,7 @@ struct kvm_pmu {
 	unsigned nr_arch_fixed_counters;
 	unsigned available_event_types;
 	u64 fixed_ctr_ctrl;
+	u64 fixed_ctr_ctrl_hw;
 	u64 fixed_ctr_ctrl_rsvd;
 	u64 global_ctrl;
 	u64 global_status;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 082d2905882b..e39ae37f0280 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -890,11 +890,14 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
 		pmc->counter = 0;
 		pmc->emulated_counter = 0;
 
-		if (pmc_is_gp(pmc))
+		if (pmc_is_gp(pmc)) {
 			pmc->eventsel = 0;
+			pmc->eventsel_hw = 0;
+		}
 	}
 
-	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
+	pmu->fixed_ctr_ctrl = pmu->fixed_ctr_ctrl_hw = 0;
+	pmu->global_ctrl = pmu->global_status = 0;
 
 	kvm_pmu_call(reset)(vcpu);
 }
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 9ffd44a5d474..9641ef5d0dd7 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -165,6 +165,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
+			pmc->eventsel_hw = data;
 			kvm_pmu_request_counter_reprogram(pmc);
 		}
 		return 0;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 2bdddb95816e..6874c522577e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -61,6 +61,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 	int i;
 
 	pmu->fixed_ctr_ctrl = data;
+	pmu->fixed_ctr_ctrl_hw = data;
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		u8 new_ctrl = fixed_ctrl_field(data, i);
 		u8 old_ctrl = fixed_ctrl_field(old_fixed_ctr_ctrl, i);
@@ -430,6 +431,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 			if (data != pmc->eventsel) {
 				pmc->eventsel = data;
+				pmc->eventsel_hw = data;
 				kvm_pmu_request_counter_reprogram(pmc);
 			}
 			break;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 35/44] KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (33 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 34/44] KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 36/44] KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD Sean Christopherson
                   ` (11 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Refresh the event selectors that are programmed into hardware when a PMC
is "reprogrammed" for a mediated PMU, i.e. if userspace changes the PMU
event filters

Note, KVM doesn't utilize the reprogramming infrastructure to handle
counter overflow for mediated PMUs, as there's no need to reprogram a
non-existent perf event.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: add a helper to document behavior, split patch and rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index e39ae37f0280..b4c6a7704a01 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -518,6 +518,25 @@ static bool pmc_is_event_allowed(struct kvm_pmc *pmc)
 	return is_fixed_event_allowed(filter, pmc->idx);
 }
 
+static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
+{
+	bool allowed = pmc_is_event_allowed(pmc);
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+	if (pmc_is_gp(pmc)) {
+		pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+		if (allowed)
+			pmc->eventsel_hw |= pmc->eventsel &
+					    ARCH_PERFMON_EVENTSEL_ENABLE;
+	} else {
+		u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);
+
+		pmu->fixed_ctr_ctrl_hw &= ~mask;
+		if (allowed)
+			pmu->fixed_ctr_ctrl_hw |= pmu->fixed_ctr_ctrl & mask;
+	}
+}
+
 static int reprogram_counter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
@@ -526,6 +545,11 @@ static int reprogram_counter(struct kvm_pmc *pmc)
 	bool emulate_overflow;
 	u8 fixed_ctr_ctrl;
 
+	if (kvm_vcpu_has_mediated_pmu(pmu_to_vcpu(pmu))) {
+		kvm_mediated_pmu_refresh_event_filter(pmc);
+		return 0;
+	}
+
 	emulate_overflow = pmc_pause_counter(pmc);
 
 	if (!pmc_is_globally_enabled(pmc) || !pmc_is_locally_enabled(pmc) ||
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 36/44] KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (34 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 35/44] KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:56 ` [PATCH v5 37/44] KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest Sean Christopherson
                   ` (10 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Sandipan Das <sandipan.das@amd.com>

On AMD platforms, there is no way to restore PerfCntrGlobalCtl at
VM-Entry or clear it at VM-Exit. Since the register states will be
restored before entering and saved after exiting guest context, the
counters can keep ticking and even overflow leading to chaos while
still in host context.

To avoid this, intecept event selectors, which is already done by mediated
PMU. In addition, always set the GuestOnly bit and clear the HostOnly bit
for PMU selectors on AMD. Doing so allows the counters run only in guest
context even if their enable bits are still set after VM exit and before
host/guest PMU context switch.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
[sean: massage shortlog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/pmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 9641ef5d0dd7..a5e70a4e7647 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -165,7 +165,8 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			pmc->eventsel_hw = data;
+			pmc->eventsel_hw = (data & ~AMD64_EVENTSEL_HOSTONLY) |
+					   AMD64_EVENTSEL_GUESTONLY;
 			kvm_pmu_request_counter_reprogram(pmc);
 		}
 		return 0;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 37/44] KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (35 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 36/44] KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD Sean Christopherson
@ 2025-08-06 19:56 ` Sean Christopherson
  2025-08-06 19:57 ` [PATCH v5 38/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active Sean Christopherson
                   ` (9 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:56 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Implement the PMU "world switch" between host perf and guest mediated PMU.
When loading guest state, call into perf to switch from host to guest, and
then load guest state into hardware, and then reverse those actions when
putting guest state.

On the KVM side, when loading guest state, zero PERF_GLOBAL_CTRL to ensure
all counters are disabled, then load selectors and counters, and finally
call into vendor code to load control/status information.  While VMX and
SVM use different mechanisms to avoid counting host activity while guest
controls are loaded, both implementations require PERF_GLOBAL_CTRL to be
zeroed when the event selectors are in flux.

When putting guest state, reverse the order, and save and zero controls
and status prior to saving+zeroing selectors and counters.  Defer clearing
PERF_GLOBAL_CTRL to vendor code, as only SVM needs to manually clear the
MSR; VMX configures PERF_GLOBAL_CTRL to be atomically cleared by the CPU
on VM-Exit.

Handle the difference in MSR layouts between Intel and AMD by communicating
the bases and stride via kvm_pmu_ops.  Because KVM requires Intel v4 (and
full-width writes) and AMD v2, the MSRs to load/save are constant for a
given vendor, i.e. do not vary based on the guest PMU, and do not vary
based on host PMU (because KVM will simply disable mediated PMU support if
the necessary MSRs are unsupported).

Except for retrieving the guest's PERF_GLOBAL_CTRL, which needs to be read
before invoking any fastpath handler (spoiler alert), perform the context
switch around KVM's inner run loop.  State only needs to be synchronized
from hardware before KVM can access the software "caches".

Note, VMX already grabs the guest's PERF_GLOBAL_CTRL immediately after
VM-Exit, as hardware saves value into the VMCS.

Co-developed-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-pmu-ops.h |   2 +
 arch/x86/include/asm/msr-index.h       |   1 +
 arch/x86/kvm/pmu.c                     | 111 +++++++++++++++++++++++++
 arch/x86/kvm/pmu.h                     |  10 +++
 arch/x86/kvm/svm/pmu.c                 |  34 ++++++++
 arch/x86/kvm/svm/svm.c                 |   3 +
 arch/x86/kvm/vmx/pmu_intel.c           |  45 ++++++++++
 arch/x86/kvm/x86.c                     |   4 +
 8 files changed, 210 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index ad2cc82abf79..f0aa6996811f 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -24,6 +24,8 @@ KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
 KVM_X86_PMU_OP_OPTIONAL(cleanup)
 
 KVM_X86_PMU_OP_OPTIONAL(write_global_ctrl)
+KVM_X86_PMU_OP(mediated_load)
+KVM_X86_PMU_OP(mediated_put)
 
 #undef KVM_X86_PMU_OP
 #undef KVM_X86_PMU_OP_OPTIONAL
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 06bd7aaae931..0f50a74797a4 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1171,6 +1171,7 @@
 #define MSR_CORE_PERF_GLOBAL_STATUS	0x0000038e
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL	0x00000390
+#define MSR_CORE_PERF_GLOBAL_STATUS_SET	0x00000391
 
 #define MSR_PERF_METRICS		0x00000329
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b4c6a7704a01..77042cad3155 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1234,3 +1234,114 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	kfree(filter);
 	return r;
 }
+
+static __always_inline u32 fixed_counter_msr(u32 idx)
+{
+	return kvm_pmu_ops.FIXED_COUNTER_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
+}
+
+static __always_inline u32 gp_counter_msr(u32 idx)
+{
+	return kvm_pmu_ops.GP_COUNTER_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
+}
+
+static __always_inline u32 gp_eventsel_msr(u32 idx)
+{
+	return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
+}
+
+static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	u32 i;
+
+	/*
+	 * No need to zero out unexposed GP/fixed counters/selectors since RDPMC
+	 * is intercepted if hardware has counters that aren't visible to the
+	 * guest (KVM will inject #GP as appropriate).
+	 */
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+		pmc = &pmu->gp_counters[i];
+
+		wrmsrl(gp_counter_msr(i), pmc->counter);
+		wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
+	}
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+		pmc = &pmu->fixed_counters[i];
+
+		wrmsrl(fixed_counter_msr(i), pmc->counter);
+	}
+}
+
+void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_mediated_pmu(vcpu) ||
+	    KVM_BUG_ON(!lapic_in_kernel(vcpu), vcpu->kvm))
+		return;
+
+	lockdep_assert_irqs_disabled();
+
+	perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
+
+	/*
+	 * Disable all counters before loading event selectors and PMCs so that
+	 * KVM doesn't enable or load guest counters while host events are
+	 * active.  VMX will enable/disabled counters at VM-Enter/VM-Exit by
+	 * atomically loading PERF_GLOBAL_CONTROL.  SVM effectively performs
+	 * the switch by configuring all events to be GUEST_ONLY.
+	 */
+	wrmsrl(kvm_pmu_ops.PERF_GLOBAL_CTRL, 0);
+
+	kvm_pmu_load_guest_pmcs(vcpu);
+
+	kvm_pmu_call(mediated_load)(vcpu);
+}
+
+static void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+	u32 i;
+
+	/*
+	 * Clear selectors and counters to ensure hardware doesn't count using
+	 * guest controls when the host (perf) restores its state.
+	 */
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+		pmc = &pmu->gp_counters[i];
+
+		pmc->counter = rdpmc(i);
+		if (pmc->counter)
+			wrmsrl(gp_counter_msr(i), 0);
+		if (pmc->eventsel_hw)
+			wrmsrl(gp_eventsel_msr(i), 0);
+	}
+
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+		pmc = &pmu->fixed_counters[i];
+
+		pmc->counter = rdpmc(INTEL_PMC_FIXED_RDPMC_BASE | i);
+		if (pmc->counter)
+			wrmsrl(fixed_counter_msr(i), 0);
+	}
+}
+
+void kvm_mediated_pmu_put(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_mediated_pmu(vcpu) ||
+	    KVM_BUG_ON(!lapic_in_kernel(vcpu), vcpu->kvm))
+		return;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Defer handling of PERF_GLOBAL_CTRL to vendor code.  On Intel, it's
+	 * atomically cleared on VM-Exit, i.e. doesn't need to be clear here.
+	 */
+	kvm_pmu_call(mediated_put)(vcpu);
+
+	kvm_pmu_put_guest_pmcs(vcpu);
+
+	perf_put_guest_context();
+}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 1c9d26d60a60..e2e2d8476a3f 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -38,11 +38,19 @@ struct kvm_pmu_ops {
 	void (*cleanup)(struct kvm_vcpu *vcpu);
 
 	bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
+	void (*mediated_load)(struct kvm_vcpu *vcpu);
+	void (*mediated_put)(struct kvm_vcpu *vcpu);
 	void (*write_global_ctrl)(u64 global_ctrl);
 
 	const u64 EVENTSEL_EVENT;
 	const int MAX_NR_GP_COUNTERS;
 	const int MIN_NR_GP_COUNTERS;
+
+	const u32 PERF_GLOBAL_CTRL;
+	const u32 GP_EVENTSEL_BASE;
+	const u32 GP_COUNTER_BASE;
+	const u32 FIXED_COUNTER_BASE;
+	const u32 MSR_STRIDE;
 };
 
 void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
@@ -240,6 +248,8 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
 void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu);
 void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu);
+void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu);
+void kvm_mediated_pmu_put(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index a5e70a4e7647..c03720b30785 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -233,6 +233,32 @@ static bool amd_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_pm
 	return host_pmu->version >= 2;
 }
 
+static void amd_mediated_pmu_load(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	u64 global_status;
+
+	rdmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, global_status);
+	/* Clear host global_status MSR if non-zero. */
+	if (global_status)
+		wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, global_status);
+
+	wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET, pmu->global_status);
+	wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, pmu->global_ctrl);
+}
+
+static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+	rdmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, pmu->global_status);
+
+	/* Clear global status bits if non-zero */
+	if (pmu->global_status)
+		wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
+}
+
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
 	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
@@ -244,8 +270,16 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
 	.init = amd_pmu_init,
 
 	.is_mediated_pmu_supported = amd_pmu_is_mediated_pmu_supported,
+	.mediated_load = amd_mediated_pmu_load,
+	.mediated_put = amd_mediated_pmu_put,
 
 	.EVENTSEL_EVENT = AMD64_EVENTSEL_EVENT,
 	.MAX_NR_GP_COUNTERS = KVM_MAX_NR_AMD_GP_COUNTERS,
 	.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,
+
+	.PERF_GLOBAL_CTRL = MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+	.GP_EVENTSEL_BASE = MSR_F15H_PERF_CTL0,
+	.GP_COUNTER_BASE = MSR_F15H_PERF_CTR0,
+	.FIXED_COUNTER_BASE = 0,
+	.MSR_STRIDE = 2,
 };
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index add50b64256c..ca6f453cc160 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4416,6 +4416,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	vcpu->arch.regs_avail &= ~SVM_REGS_LAZY_LOAD_SET;
 
+	if (!msr_write_intercepted(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL))
+		rdmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, vcpu_to_pmu(vcpu)->global_ctrl);
+
 	/*
 	 * We need to handle MC intercepts here before the vcpu has a chance to
 	 * change the physical cpu
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 6874c522577e..41a845de789e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -786,6 +786,43 @@ static void intel_pmu_write_global_ctrl(u64 global_ctrl)
 	vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, global_ctrl);
 }
 
+
+static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	u64 global_status, toggle;
+
+	rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
+	toggle = pmu->global_status ^ global_status;
+	if (global_status & toggle)
+		wrmsrq(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status & toggle);
+	if (pmu->global_status & toggle)
+		wrmsrq(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status & toggle);
+
+	wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl_hw);
+}
+
+static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	/* MSR_CORE_PERF_GLOBAL_CTRL is already saved at VM-exit. */
+	rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, pmu->global_status);
+
+	/* Clear hardware MSR_CORE_PERF_GLOBAL_STATUS MSR, if non-zero. */
+	if (pmu->global_status)
+		wrmsrq(MSR_CORE_PERF_GLOBAL_OVF_CTRL, pmu->global_status);
+
+	/*
+	 * Clear hardware FIXED_CTR_CTRL MSR to avoid information leakage and
+	 * also to avoid accidentally enabling fixed counters (based on guest
+	 * state) while running in the host, e.g. when setting global ctrl.
+	 */
+	if (pmu->fixed_ctr_ctrl_hw)
+		wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+}
+
+
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
 	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
@@ -799,9 +836,17 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.cleanup = intel_pmu_cleanup,
 
 	.is_mediated_pmu_supported = intel_pmu_is_mediated_pmu_supported,
+	.mediated_load = intel_mediated_pmu_load,
+	.mediated_put = intel_mediated_pmu_put,
 	.write_global_ctrl = intel_pmu_write_global_ctrl,
 
 	.EVENTSEL_EVENT = ARCH_PERFMON_EVENTSEL_EVENT,
 	.MAX_NR_GP_COUNTERS = KVM_MAX_NR_INTEL_GP_COUNTERS,
 	.MIN_NR_GP_COUNTERS = 1,
+
+	.PERF_GLOBAL_CTRL = MSR_CORE_PERF_GLOBAL_CTRL,
+	.GP_EVENTSEL_BASE = MSR_P6_EVNTSEL0,
+	.GP_COUNTER_BASE = MSR_IA32_PMC0,
+	.FIXED_COUNTER_BASE = MSR_CORE_PERF_FIXED_CTR0,
+	.MSR_STRIDE = 1,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8014435c988..7fb94ef64e18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10906,6 +10906,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		run_flags |= KVM_RUN_LOAD_DEBUGCTL;
 	vcpu->arch.host_debugctl = debug_ctl;
 
+	kvm_mediated_pmu_load(vcpu);
+
 	guest_timing_enter_irqoff();
 
 	for (;;) {
@@ -10936,6 +10938,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		++vcpu->stat.exits;
 	}
 
+	kvm_mediated_pmu_put(vcpu);
+
 	/*
 	 * Do this here before restoring debug registers on the host.  And
 	 * since we do this before handling the vmexit, a DR access vmexit
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 38/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (36 preceding siblings ...)
  2025-08-06 19:56 ` [PATCH v5 37/44] KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-13  9:53   ` Sandipan Das
  2025-08-06 19:57 ` [PATCH v5 39/44] KVM: x86/pmu: Handle emulated instruction for mediated vPMU Sean Christopherson
                   ` (8 subsequent siblings)
  46 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Don't handle exits in the fastpath if emulation is required, i.e. if an
instruction needs to be skipped, the mediated PMU is enabled, and one or
more PMCs is counting instructions.  With the mediated PMU, KVM's cache of
PMU state is inconsistent with respect to hardware until KVM exits the
inner run loop (when the mediated PMU is "put").

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.h | 10 ++++++++++
 arch/x86/kvm/x86.c |  9 +++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index e2e2d8476a3f..a0cd42cbea9d 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -234,6 +234,16 @@ static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
 	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
 }
 
+static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	return !kvm_vcpu_has_mediated_pmu(vcpu) ||
+	       !bitmap_intersects(pmu->pmc_counting_instructions,
+				  (unsigned long *)&pmu->global_ctrl,
+				  X86_PMC_IDX_MAX);
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fb94ef64e18..6bdf7ef0b535 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2092,6 +2092,9 @@ EXPORT_SYMBOL_GPL(kvm_emulate_invd);
 
 fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu)
 {
+	if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
+		return EXIT_FASTPATH_NONE;
+
 	if (!kvm_emulate_invd(vcpu))
 		return EXIT_FASTPATH_EXIT_USERSPACE;
 
@@ -2151,6 +2154,9 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	u64 data = kvm_read_edx_eax(vcpu);
 	u32 msr = kvm_rcx_read(vcpu);
 
+	if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
+		return EXIT_FASTPATH_NONE;
+
 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
 		if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
@@ -11267,6 +11273,9 @@ EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu)
 {
+	if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
+		return EXIT_FASTPATH_NONE;
+
 	if (!kvm_emulate_halt(vcpu))
 		return EXIT_FASTPATH_EXIT_USERSPACE;
 
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 39/44] KVM: x86/pmu: Handle emulated instruction for mediated vPMU
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (37 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 38/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-06 19:57 ` [PATCH v5 40/44] KVM: nVMX: Add macros to simplify nested MSR interception setting Sean Christopherson
                   ` (7 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Mediated vPMU needs to accumulate the emulated instructions into counter
and load the counter into HW at vm-entry.

Moreover, if the accumulation leads to counter overflow, KVM needs to
update GLOBAL_STATUS and inject PMI into guest as well.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 77042cad3155..ddab1630a978 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1018,10 +1018,45 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 	kvm_pmu_reset(vcpu);
 }
 
+static bool pmc_is_pmi_enabled(struct kvm_pmc *pmc)
+{
+	u8 fixed_ctr_ctrl;
+
+	if (pmc_is_gp(pmc))
+		return pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT;
+
+	fixed_ctr_ctrl = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl,
+					  pmc->idx - KVM_FIXED_PMC_BASE_IDX);
+	return fixed_ctr_ctrl & INTEL_FIXED_0_ENABLE_PMI;
+}
+
 static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
 {
-	pmc->emulated_counter++;
-	kvm_pmu_request_counter_reprogram(pmc);
+	struct kvm_vcpu *vcpu = pmc->vcpu;
+
+	/*
+	 * For perf-based PMUs, accumulate software-emulated events separately
+	 * from pmc->counter, as pmc->counter is offset by the count of the
+	 * associated perf event. Request reprogramming, which will consult
+	 * both emulated and hardware-generated events to detect overflow.
+	 */
+	if (!kvm_vcpu_has_mediated_pmu(vcpu)) {
+		pmc->emulated_counter++;
+		kvm_pmu_request_counter_reprogram(pmc);
+		return;
+	}
+
+	/*
+	 * For mediated PMUs, pmc->counter is updated when the vCPU's PMU is
+	 * put, and will be loaded into hardware when the PMU is loaded. Simply
+	 * increment the counter and signal overflow if it wraps to zero.
+	 */
+	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
+	if (!pmc->counter) {
+		pmc_to_pmu(pmc)->global_status |= BIT_ULL(pmc->idx);
+		if (pmc_is_pmi_enabled(pmc))
+			kvm_make_request(KVM_REQ_PMI, vcpu);
+	}
 }
 
 static inline bool cpl_is_matched(struct kvm_pmc *pmc)
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 40/44] KVM: nVMX: Add macros to simplify nested MSR interception setting
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (38 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 39/44] KVM: x86/pmu: Handle emulated instruction for mediated vPMU Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-06 19:57 ` [PATCH v5 41/44] KVM: nVMX: Disable PMU MSR interception as appropriate while running L2 Sean Christopherson
                   ` (6 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Add macros nested_vmx_merge_msr_bitmaps_xxx() to simplify nested MSR
interception setting. No function change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index db2fd4eedc90..47f1f0c7d3a7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -614,6 +614,19 @@ static inline void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx,
 						   msr_bitmap_l0, msr);
 }
 
+#define nested_vmx_merge_msr_bitmaps(msr, type)	\
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1,	\
+					 msr_bitmap_l0, msr, type)
+
+#define nested_vmx_merge_msr_bitmaps_read(msr) \
+	nested_vmx_merge_msr_bitmaps(msr, MSR_TYPE_R)
+
+#define nested_vmx_merge_msr_bitmaps_write(msr) \
+	nested_vmx_merge_msr_bitmaps(msr, MSR_TYPE_W)
+
+#define nested_vmx_merge_msr_bitmaps_rw(msr) \
+	nested_vmx_merge_msr_bitmaps(msr, MSR_TYPE_RW)
+
 /*
  * Merge L0's and L1's MSR bitmap, return false to indicate that
  * we do not use the hardware.
@@ -697,23 +710,13 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	 * other runtime changes to vmcs01's bitmap, e.g. dynamic pass-through.
 	 */
 #ifdef CONFIG_X86_64
-	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
-					 MSR_FS_BASE, MSR_TYPE_RW);
-
-	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
-					 MSR_GS_BASE, MSR_TYPE_RW);
-
-	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
-					 MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	nested_vmx_merge_msr_bitmaps_rw(MSR_FS_BASE);
+	nested_vmx_merge_msr_bitmaps_rw(MSR_GS_BASE);
+	nested_vmx_merge_msr_bitmaps_rw(MSR_KERNEL_GS_BASE);
 #endif
-	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
-					 MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
-
-	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
-					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
-
-	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
-					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
+	nested_vmx_merge_msr_bitmaps_rw(MSR_IA32_SPEC_CTRL);
+	nested_vmx_merge_msr_bitmaps_write(MSR_IA32_PRED_CMD);
+	nested_vmx_merge_msr_bitmaps_write(MSR_IA32_FLUSH_CMD);
 
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_APERF, MSR_TYPE_R);
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 41/44] KVM: nVMX: Disable PMU MSR interception as appropriate while running L2
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (39 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 40/44] KVM: nVMX: Add macros to simplify nested MSR interception setting Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-06 19:57 ` [PATCH v5 42/44] KVM: nSVM: " Sean Christopherson
                   ` (5 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Mingwei Zhang <mizhang@google.com>

Merge KVM's PMU MSR interception bitmaps with those of L1, i.e. merge the
bitmaps of vmcs01 and vmcs12, e.g. so that KVM doesn't interpose on MSR
accesses unnecessarily if L1 exposes a mediated PMU (or equivalent) to L2.

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
[sean: rewrite changelog and comment, omit MSRs that are always intercepted]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 47f1f0c7d3a7..b986a6fb684c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -627,6 +627,34 @@ static inline void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx,
 #define nested_vmx_merge_msr_bitmaps_rw(msr) \
 	nested_vmx_merge_msr_bitmaps(msr, MSR_TYPE_RW)
 
+static void nested_vmx_merge_pmu_msr_bitmaps(struct kvm_vcpu *vcpu,
+					     unsigned long *msr_bitmap_l1,
+					     unsigned long *msr_bitmap_l0)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int i;
+
+	/*
+	 * Skip the merges if the vCPU doesn't have a mediated PMU MSR, i.e. if
+	 * none of the MSRs can possibly be passed through to L1.
+	 */
+	if (!kvm_vcpu_has_mediated_pmu(vcpu))
+		return;
+
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+		nested_vmx_merge_msr_bitmaps_rw(MSR_IA32_PERFCTR0 + i);
+		nested_vmx_merge_msr_bitmaps_rw(MSR_IA32_PMC0 + i);
+	}
+
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+		nested_vmx_merge_msr_bitmaps_rw(MSR_CORE_PERF_FIXED_CTR0 + i);
+
+	nested_vmx_merge_msr_bitmaps_rw(MSR_CORE_PERF_GLOBAL_CTRL);
+	nested_vmx_merge_msr_bitmaps_read(MSR_CORE_PERF_GLOBAL_STATUS);
+	nested_vmx_merge_msr_bitmaps_write(MSR_CORE_PERF_GLOBAL_OVF_CTRL);
+}
+
 /*
  * Merge L0's and L1's MSR bitmap, return false to indicate that
  * we do not use the hardware.
@@ -724,6 +752,8 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_MPERF, MSR_TYPE_R);
 
+	nested_vmx_merge_pmu_msr_bitmaps(vcpu, msr_bitmap_l1, msr_bitmap_l0);
+
 	kvm_vcpu_unmap(vcpu, &map);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 42/44] KVM: nSVM: Disable PMU MSR interception as appropriate while running L2
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (40 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 41/44] KVM: nVMX: Disable PMU MSR interception as appropriate while running L2 Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-06 19:57 ` [PATCH v5 43/44] KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space Sean Christopherson
                   ` (4 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

Add MSRs that might be passed through to L1 when running with a mediated
PMU to the nested SVM's set of to-be-merged MSR indices, i.e. disable
interception of PMU MSRs when running L2 if both KVM (L0) and L1 disable
interception.  There is no need for KVM to interpose on such MSR accesses,
e.g. if L1 exposes a mediated PMU (or equivalent) to L2.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b7fd2e869998..de2b9db2d0ba 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -194,7 +194,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
  * Hardcode the capacity of the array based on the maximum number of _offsets_.
  * MSRs are batched together, so there are fewer offsets than MSRs.
  */
-static int nested_svm_msrpm_merge_offsets[7] __ro_after_init;
+static int nested_svm_msrpm_merge_offsets[10] __ro_after_init;
 static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
 typedef unsigned long nsvm_msrpm_merge_t;
 
@@ -222,6 +222,22 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
 		MSR_IA32_LASTBRANCHTOIP,
 		MSR_IA32_LASTINTFROMIP,
 		MSR_IA32_LASTINTTOIP,
+
+		MSR_K7_PERFCTR0,
+		MSR_K7_PERFCTR1,
+		MSR_K7_PERFCTR2,
+		MSR_K7_PERFCTR3,
+		MSR_F15H_PERF_CTR0,
+		MSR_F15H_PERF_CTR1,
+		MSR_F15H_PERF_CTR2,
+		MSR_F15H_PERF_CTR3,
+		MSR_F15H_PERF_CTR4,
+		MSR_F15H_PERF_CTR5,
+
+		MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+		MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+		MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+		MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
 	};
 	int i, j;
 
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 43/44] KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (41 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 42/44] KVM: nSVM: " Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-06 19:57 ` [PATCH v5 44/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match Sean Christopherson
                   ` (3 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Expose enable_mediated_pmu parameter to user space, i.e. allow userspace
to enable/disable mediated vPMU support.

Document the mediated versus perf-based behavior as part of the
kernel-parameters.txt entry, and opportunistically add an entry for the
core enable_pmu param as well.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../admin-guide/kernel-parameters.txt         | 49 +++++++++++++++++++
 arch/x86/kvm/svm/svm.c                        |  2 +
 arch/x86/kvm/vmx/vmx.c                        |  2 +
 3 files changed, 53 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 07e22ba5bfe3..12a96493de9a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2840,6 +2840,26 @@
 
 			Default is Y (on).
 
+	kvm.enable_pmu=[KVM,X86]
+			If enabled, KVM will virtualize PMU functionality based
+			on the virtual CPU model defined by userspace.  This
+			can be overridden on a per-VM basis via
+			KVM_CAP_PMU_CAPABILITY.
+
+			If disabled, KVM will not virtualize PMU functionality,
+			e.g. MSRs, PMCs, PMIs, etc., even if userspace defines
+			a virtual CPU model that contains PMU assets.
+
+			Note, KVM's vPMU support implicitly requires running
+			with an in-kernel local APIC, e.g. to deliver PMIs to
+			the guest.  Running without an in-kernel local APIC is
+			not supported, though KVM will allow such a combination
+			(with severely degraded functionality).
+
+			See also enable_mediated_pmu.
+
+			Default is Y (on).
+
 	kvm.enable_virt_at_load=[KVM,ARM64,LOONGARCH,MIPS,RISCV,X86]
 			If enabled, KVM will enable virtualization in hardware
 			when KVM is loaded, and disable virtualization when KVM
@@ -2886,6 +2906,35 @@
 			If the value is 0 (the default), KVM will pick a period based
 			on the ratio, such that a page is zapped after 1 hour on average.
 
+	kvm-{amd,intel}.enable_mediated_pmu=[KVM,AMD,INTEL]
+			If enabled, KVM will provide a mediated virtual PMU,
+			instead of the default perf-based virtual PMU (if
+			kvm.enable_pmu is true and PMU is enumerated via the
+			virtual CPU model).
+
+			With a perf-based vPMU, KVM operates as a user of perf,
+			i.e. emulates guest PMU counters using perf events.
+			KVM-created perf events are managed by perf as regular
+			(guest-only) events, e.g. are scheduled in/out, contend
+			for hardware resources, etc.  Using a perf-based vPMU
+			allows guest and host usage of the PMU to co-exist, but
+			incurs non-trivial overhead and can result in silently
+			dropped guest events (due to resource contention).
+
+			With a mediated vPMU, hardware PMU state is context
+			switched around the world switch to/from the guest.
+			KVM mediates which events the guest can utilize, but
+			gives the guest direct access to all other PMU assets
+			when possible (KVM may intercept some accesses if the
+			virtual CPU model provides a subset of hardware PMU
+			functionality).  Using a mediated vPMU significantly
+			reduces PMU virtualization overhead and eliminates lost
+			guest events, but is mutually exclusive with using perf
+			to profile KVM guests and adds latency to most VM-Exits
+			(to context switch PMU state).
+
+			Default is N (off).
+
 	kvm-amd.nested=	[KVM,AMD] Control nested virtualization feature in
 			KVM/SVM. Default is 1 (enabled).
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca6f453cc160..2797c3ab7854 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -178,6 +178,8 @@ module_param(intercept_smi, bool, 0444);
 bool vnmi = true;
 module_param(vnmi, bool, 0444);
 
+module_param(enable_mediated_pmu, bool, 0444);
+
 static bool svm_gp_erratum_intercept = true;
 
 static u8 rsm_ins_bytes[] = "\x0f\xaa";
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 85bd82d41f94..4a4691beba55 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -151,6 +151,8 @@ module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO);
 extern bool __read_mostly allow_smaller_maxphyaddr;
 module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
 
+module_param(enable_mediated_pmu, bool, 0444);
+
 #define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD)
 #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR0_NE
 #define KVM_VM_CR0_ALWAYS_ON				\
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 44/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (42 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 43/44] KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space Sean Christopherson
@ 2025-08-06 19:57 ` Sean Christopherson
  2025-08-08  8:28 ` [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Mi, Dapeng
                   ` (2 subsequent siblings)
  46 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-06 19:57 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

When loading a mediated PMU state, elide the WRMSRs to load PMCs with the
guest's value if the value in hardware already matches the guest's value.
For the relatively common case where neither the guest nor the host is
actively using the PMU, i.e. when all/many counters are '0', eliding the
WRMSRs reduces the latency of handling VM-Exit by a measurable amount
(WRMSR is significantly more expensive than RDPMC).

As measured by KVM-Unit-Tests' CPUID VM-Exit testcase, this provides a
a ~25% reduction in latency (4k => 3k cycles) on Intel Emerald Rapids,
and a ~13% reduction (6.2k => 5.3k cycles) on AMD Turing.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index ddab1630a978..0e5048ae86fa 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1299,13 +1299,15 @@ static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
 		pmc = &pmu->gp_counters[i];
 
-		wrmsrl(gp_counter_msr(i), pmc->counter);
+		if (pmc->counter != rdpmc(i))
+			wrmsrl(gp_counter_msr(i), pmc->counter);
 		wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
 	}
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		pmc = &pmu->fixed_counters[i];
 
-		wrmsrl(fixed_counter_msr(i), pmc->counter);
+		if (pmc->counter != rdpmc(INTEL_PMC_FIXED_RDPMC_BASE | i))
+			wrmsrl(fixed_counter_msr(i), pmc->counter);
 	}
 }
 
-- 
2.50.1.565.gc32cd1483b-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 07/44] perf: Add APIs to load/put guest mediated PMU context
  2025-08-06 19:56 ` [PATCH v5 07/44] perf: Add APIs to load/put guest mediated PMU context Sean Christopherson
@ 2025-08-08  7:30   ` Mi, Dapeng
  0 siblings, 0 replies; 65+ messages in thread
From: Mi, Dapeng @ 2025-08-08  7:30 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das


On 8/7/2025 3:56 AM, Sean Christopherson wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Add exported APIs to load/put a guest mediated PMU context.  KVM will
> load the guest PMU shortly before VM-Enter, and put the guest PMU shortly
> after VM-Exit.
>
> On the perf side of things, schedule out all exclude_guest events when the
> guest context is loaded, and schedule them back in when the guest context
> is put.  I.e. yield the hardware PMU resources to the guest, by way of KVM.
>
> Note, perf is only responsible for managing host context.  KVM is
> responsible for loading/storing guest state to/from hardware.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> [sean: shuffle patches around, write changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  include/linux/perf_event.h |  2 ++
>  kernel/events/core.c       | 61 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0958b6d0a61c..42d019d70b42 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1925,6 +1925,8 @@ extern u64 perf_event_pause(struct perf_event *event, bool reset);
>  #ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
>  int perf_create_mediated_pmu(void);
>  void perf_release_mediated_pmu(void);
> +void perf_load_guest_context(unsigned long data);
> +void perf_put_guest_context(void);
>  #endif
>  
>  #else /* !CONFIG_PERF_EVENTS: */
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 6875b56ddd6b..77398b1ad4c5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -469,10 +469,19 @@ static cpumask_var_t perf_online_pkg_mask;
>  static cpumask_var_t perf_online_sys_mask;
>  static struct kmem_cache *perf_event_cache;
>  
> +#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
> +static DEFINE_PER_CPU(bool, guest_ctx_loaded);
> +
> +static __always_inline bool is_guest_mediated_pmu_loaded(void)
> +{
> +	return __this_cpu_read(guest_ctx_loaded);
> +}
> +#else
>  static __always_inline bool is_guest_mediated_pmu_loaded(void)
>  {
>  	return false;
>  }
> +#endif
>  
>  /*
>   * perf event paranoia level:
> @@ -6379,6 +6388,58 @@ void perf_release_mediated_pmu(void)
>  	atomic_dec(&nr_mediated_pmu_vms);
>  }
>  EXPORT_SYMBOL_GPL(perf_release_mediated_pmu);
> +
> +/* When loading a guest's mediated PMU, schedule out all exclude_guest events. */
> +void perf_load_guest_context(unsigned long data)

nit: the "data" argument is not used in this patch, we may defer to
introduce it in patch 09/44.


> +{
> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> +
> +	lockdep_assert_irqs_disabled();
> +
> +	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);
> +
> +	if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded)))
> +		return;
> +
> +	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
> +	ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST);
> +	if (cpuctx->task_ctx) {
> +		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
> +		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
> +	}
> +
> +	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
> +	if (cpuctx->task_ctx)
> +		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
> +
> +	__this_cpu_write(guest_ctx_loaded, true);
> +}
> +EXPORT_SYMBOL_GPL(perf_load_guest_context);
> +
> +void perf_put_guest_context(void)
> +{
> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> +
> +	lockdep_assert_irqs_disabled();
> +
> +	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);
> +
> +	if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded)))
> +		return;
> +
> +	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
> +	if (cpuctx->task_ctx)
> +		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
> +
> +	perf_event_sched_in(cpuctx, cpuctx->task_ctx, NULL, EVENT_GUEST);
> +
> +	if (cpuctx->task_ctx)
> +		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
> +	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
> +
> +	__this_cpu_write(guest_ctx_loaded, false);
> +}
> +EXPORT_SYMBOL_GPL(perf_put_guest_context);
>  #else
>  static int mediated_pmu_account_event(struct perf_event *event) { return 0; }
>  static void mediated_pmu_unaccount_event(struct perf_event *event) {}

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (43 preceding siblings ...)
  2025-08-06 19:57 ` [PATCH v5 44/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match Sean Christopherson
@ 2025-08-08  8:28 ` Mi, Dapeng
  2025-08-08  8:35   ` Mi, Dapeng
  2025-08-13  9:45 ` Sandipan Das
  2025-08-22  8:12 ` Hao, Xudong
  46 siblings, 1 reply; 65+ messages in thread
From: Mi, Dapeng @ 2025-08-08  8:28 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das


On 8/7/2025 3:56 AM, Sean Christopherson wrote:
> This series is based on the fastpath+PMU cleanups series[*] (which is based on
> kvm/queue), but the non-KVM changes apply cleanly on v6.16 or Linus' tree.
> I.e. if you only care about the perf changes, I would just apply on whatever
> branch is convenient and stop when you hit the KVM changes.
>
> My hope/plan is that the perf changes will go through the tip tree with a
> stable tag/branch, and the KVM changes will go the kvm-x86 tree.
>
> Non-x86 KVM folks, y'all are getting Cc'd due to minor changes in "KVM: Add a
> simplified wrapper for registering perf callbacks".
>
> The full set is also available at:
>
>   https://github.com/sean-jc/linux.git tags/mediated-vpmu-v5
>
> Add support for mediated vPMUs in KVM x86, where "mediated" aligns with the
> standard definition of intercepting control operations (e.g. event selectors),
> while allowing the guest to perform data operations (e.g. read PMCs, toggle
> counters on/off) without KVM getting involed.
>
> For an in-depth description of the what and why, please see the cover letter
> from the original RFC:
>
>   https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
>
> All KVM tests pass (or fail the same before and after), and I've manually
> verified MSR/PMC are passed through as expected, but I haven't done much at all
> to actually utilize the PMU in a guest.  I'll be amazed if I didn't make at
> least one major goof.
>
> Similarly, I tried to address all feedback, but there are many, many changes
> relative to v4.  If I missed something, I apologize in advance.
>
> In other words, please thoroughly review and test.

Went through the whole patchset, it looks good to me.

Run all PMU related kselftests and KUT tests on Intel Sapphire Rapids, no
issue is found. We would run broader tests on more Intel platforms. Thanks.


>
> [*] https://lore.kernel.org/all/20250805190526.1453366-1-seanjc@google.com
>
> v5:
>  - Add a patch to call security_perf_event_free() from __free_event()
>    instead of _free_event() (necessitated by the __cleanup() changes).
>  - Add CONFIG_PERF_GUEST_MEDIATED_PMU to guard the new perf functionality.
>  - Ensure the PMU is fully disabled in perf_{load,put}_guest_context() when
>    when switching between guest and host context. [Kan, Namhyung]
>  - Route the new system IRQ, PERF_GUEST_MEDIATED_PMI_VECTOR, through perf,
>    not KVM, and play nice with FRED.
>  - Rename and combine perf_{guest,host}_{enter,exit}() to a single set of
>    APIs, perf_{load,put}_guest_context().
>  - Rename perf_{get,put}_mediated_pmu() to perf_{create,release}_mediated_pmu()
>    to (hopefully) better differentiate them from perf_{load,put}_guest_context().
>  - Change the param to the load/put APIs from "u32 guest_lvtpc" to
>    "unsigned long data" to decouple arch code as much as possible.  E.g. if
>    a non-x86 arch were to ever support a mediated vPMU, @data could be used
>    to pass a pointer to a struct.
>  - Use pmu->version to detect if a vCPU has a mediated PMU.
>  - Use a kvm_x86_ops hook to check for mediated PMU support.
>  - Cull "passthrough" from as many places as I could find.
>  - Improve the changelog/documentation related to RDPMC interception.
>  - Check harware capabilities, not KVM capabilities, when calculating
>    MSR and RDPMC intercepts.
>  - Rework intercept (re)calculation to use a request and the existing (well,
>    will be existing as of 6.17-rc1) vendor hooks for recalculating intercepts.
>  - Always read PERF_GLOBAL_CTRL on VM-Exit if writes weren't intercepted while
>    running the vCPU.
>  - Call setup_vmcs_config() before kvm_x86_vendor_init() so that the golden
>    VMCS configuration is known before kvm_init_pmu_capability() is called.
>  - Keep as much refresh/init code in common x86 as possible.
>  - Context switch PMCs and event selectors in common x86, not vendor code.
>  - Bail from the VM-Exit fastpath if the guest is counting instructions
>    retired and the mediated PMU is enabled (because guest state hasn't yet
>    been synchronized with hardware).
>  - Don't require an userspace to opt-in via KVM_CAP_PMU_CAPABILITY, and instead
>    automatically "create" a mediated PMU on the first KVM_CREATE_VCPU call if
>    the VM has an in-kernel local APIC.
>  - Add entries in kernel-parameters.txt for the PMU params.
>  - Add a patch to elide PMC writes when possible.
>  - Many more fixups and tweaks...
>
> v4:
>  - https://lore.kernel.org/all/20250324173121.1275209-1-mizhang@google.com
>  - Rebase whole patchset on 6.14-rc3 base.
>  - Address Peter's comments on Perf part.
>  - Address Sean's comments on KVM part.
>    * Change key word "passthrough" to "mediated" in all patches
>    * Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
>    * Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
>      save/retore list support for GLOBAL_CTRL, thus the support of mediated
>      vPMU is constrained to SapphireRapids and later CPUs on Intel side.
>    * Merge some small changes into a single patch.
>  - Address Sandipan's comment on invalid pmu pointer.
>  - Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
>    manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
>
> v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com
> v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com
> v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
>
> Dapeng Mi (15):
>   KVM: x86/pmu: Start stubbing in mediated PMU support
>   KVM: x86/pmu: Implement Intel mediated PMU requirements and
>     constraints
>   KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
>   KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
>   KVM: VMX: Add helpers to toggle/change a bit in VMCS execution
>     controls
>   KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
>   KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated
>     PMU
>   KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents
>   KVM: x86/pmu: Disable interception of select PMU MSRs for mediated
>     vPMUs
>   KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter
>     accesses
>   KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter
>     updates
>   KVM: x86/pmu: Load/put mediated PMU context when entering/exiting
>     guest
>   KVM: x86/pmu: Handle emulated instruction for mediated vPMU
>   KVM: nVMX: Add macros to simplify nested MSR interception setting
>   KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
>
> Kan Liang (7):
>   perf: Skip pmu_ctx based on event_type
>   perf: Add generic exclude_guest support
>   perf: Add APIs to create/release mediated guest vPMUs
>   perf: Clean up perf ctx time
>   perf: Add a EVENT_GUEST flag
>   perf: Add APIs to load/put guest mediated PMU context
>   perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
>
> Mingwei Zhang (3):
>   perf/x86/core: Plumb mediated PMU capability from x86_pmu to
>     x86_pmu_cap
>   KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
>   KVM: nVMX: Disable PMU MSR interception as appropriate while running
>     L2
>
> Sandipan Das (3):
>   perf/x86/core: Do not set bit width for unavailable counters
>   perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
>   KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on
>     AMD
>
> Sean Christopherson (15):
>   perf: Move security_perf_event_free() call to __free_event()
>   perf: core/x86: Register a new vector for handling mediated guest PMIs
>   perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put
>     context
>   KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init()
>   KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
>   KVM: Add a simplified wrapper for registering perf callbacks
>   KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
>   KVM: x86/pmu: Implement AMD mediated PMU requirements
>   KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic
>     RECALC_INTERCEPTS
>   KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates
>   KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86
>   KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to
>     PMU v2+
>   KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are
>     active
>   KVM: nSVM: Disable PMU MSR interception as appropriate while running
>     L2
>   KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already
>     match
>
> Xiong Zhang (1):
>   KVM: x86/pmu: Register PMI handler for mediated vPMU
>
>  .../admin-guide/kernel-parameters.txt         |  49 ++
>  arch/arm64/kvm/arm.c                          |   2 +-
>  arch/loongarch/kvm/main.c                     |   2 +-
>  arch/riscv/kvm/main.c                         |   2 +-
>  arch/x86/entry/entry_fred.c                   |   1 +
>  arch/x86/events/amd/core.c                    |   2 +
>  arch/x86/events/core.c                        |  32 +-
>  arch/x86/events/intel/core.c                  |   5 +
>  arch/x86/include/asm/hardirq.h                |   3 +
>  arch/x86/include/asm/idtentry.h               |   6 +
>  arch/x86/include/asm/irq_vectors.h            |   4 +-
>  arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
>  arch/x86/include/asm/kvm-x86-pmu-ops.h        |   4 +
>  arch/x86/include/asm/kvm_host.h               |   7 +-
>  arch/x86/include/asm/msr-index.h              |  17 +-
>  arch/x86/include/asm/perf_event.h             |   1 +
>  arch/x86/include/asm/vmx.h                    |   1 +
>  arch/x86/kernel/idt.c                         |   3 +
>  arch/x86/kernel/irq.c                         |  19 +
>  arch/x86/kvm/Kconfig                          |   1 +
>  arch/x86/kvm/cpuid.c                          |   2 +
>  arch/x86/kvm/pmu.c                            | 272 ++++++++-
>  arch/x86/kvm/pmu.h                            |  37 +-
>  arch/x86/kvm/svm/nested.c                     |  18 +-
>  arch/x86/kvm/svm/pmu.c                        |  51 +-
>  arch/x86/kvm/svm/svm.c                        |  54 +-
>  arch/x86/kvm/vmx/capabilities.h               |  11 +-
>  arch/x86/kvm/vmx/main.c                       |  14 +-
>  arch/x86/kvm/vmx/nested.c                     |  65 ++-
>  arch/x86/kvm/vmx/pmu_intel.c                  | 169 ++++--
>  arch/x86/kvm/vmx/pmu_intel.h                  |  15 +
>  arch/x86/kvm/vmx/vmx.c                        | 143 +++--
>  arch/x86/kvm/vmx/vmx.h                        |  11 +-
>  arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
>  arch/x86/kvm/x86.c                            |  69 ++-
>  arch/x86/kvm/x86.h                            |   1 +
>  include/linux/kvm_host.h                      |  11 +-
>  include/linux/perf_event.h                    |  38 +-
>  init/Kconfig                                  |   4 +
>  kernel/events/core.c                          | 521 ++++++++++++++----
>  .../beauty/arch/x86/include/asm/irq_vectors.h |   3 +-
>  virt/kvm/kvm_main.c                           |   6 +-
>  42 files changed, 1385 insertions(+), 295 deletions(-)
>
>
> base-commit: 53d61a43a7973f812caa08fa922b607574befef4

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs
  2025-08-08  8:28 ` [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Mi, Dapeng
@ 2025-08-08  8:35   ` Mi, Dapeng
  0 siblings, 0 replies; 65+ messages in thread
From: Mi, Dapeng @ 2025-08-08  8:35 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das


On 8/8/2025 4:28 PM, Mi, Dapeng wrote:
> On 8/7/2025 3:56 AM, Sean Christopherson wrote:
>> This series is based on the fastpath+PMU cleanups series[*] (which is based on
>> kvm/queue), but the non-KVM changes apply cleanly on v6.16 or Linus' tree.
>> I.e. if you only care about the perf changes, I would just apply on whatever
>> branch is convenient and stop when you hit the KVM changes.
>>
>> My hope/plan is that the perf changes will go through the tip tree with a
>> stable tag/branch, and the KVM changes will go the kvm-x86 tree.
>>
>> Non-x86 KVM folks, y'all are getting Cc'd due to minor changes in "KVM: Add a
>> simplified wrapper for registering perf callbacks".
>>
>> The full set is also available at:
>>
>>   https://github.com/sean-jc/linux.git tags/mediated-vpmu-v5
>>
>> Add support for mediated vPMUs in KVM x86, where "mediated" aligns with the
>> standard definition of intercepting control operations (e.g. event selectors),
>> while allowing the guest to perform data operations (e.g. read PMCs, toggle
>> counters on/off) without KVM getting involed.
>>
>> For an in-depth description of the what and why, please see the cover letter
>> from the original RFC:
>>
>>   https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
>>
>> All KVM tests pass (or fail the same before and after), and I've manually
>> verified MSR/PMC are passed through as expected, but I haven't done much at all
>> to actually utilize the PMU in a guest.  I'll be amazed if I didn't make at
>> least one major goof.
>>
>> Similarly, I tried to address all feedback, but there are many, many changes
>> relative to v4.  If I missed something, I apologize in advance.
>>
>> In other words, please thoroughly review and test.
> Went through the whole patchset, it looks good to me.
>
> Run all PMU related kselftests and KUT tests on Intel Sapphire Rapids, no
> issue is found. We would run broader tests on more Intel platforms. Thanks.

Forgot to say "all tests are run for both mediated vPMU and the legacy
perf-based vPMU, no issue is found". Thanks.


>
>
>> [*] https://lore.kernel.org/all/20250805190526.1453366-1-seanjc@google.com
>>
>> v5:
>>  - Add a patch to call security_perf_event_free() from __free_event()
>>    instead of _free_event() (necessitated by the __cleanup() changes).
>>  - Add CONFIG_PERF_GUEST_MEDIATED_PMU to guard the new perf functionality.
>>  - Ensure the PMU is fully disabled in perf_{load,put}_guest_context() when
>>    when switching between guest and host context. [Kan, Namhyung]
>>  - Route the new system IRQ, PERF_GUEST_MEDIATED_PMI_VECTOR, through perf,
>>    not KVM, and play nice with FRED.
>>  - Rename and combine perf_{guest,host}_{enter,exit}() to a single set of
>>    APIs, perf_{load,put}_guest_context().
>>  - Rename perf_{get,put}_mediated_pmu() to perf_{create,release}_mediated_pmu()
>>    to (hopefully) better differentiate them from perf_{load,put}_guest_context().
>>  - Change the param to the load/put APIs from "u32 guest_lvtpc" to
>>    "unsigned long data" to decouple arch code as much as possible.  E.g. if
>>    a non-x86 arch were to ever support a mediated vPMU, @data could be used
>>    to pass a pointer to a struct.
>>  - Use pmu->version to detect if a vCPU has a mediated PMU.
>>  - Use a kvm_x86_ops hook to check for mediated PMU support.
>>  - Cull "passthrough" from as many places as I could find.
>>  - Improve the changelog/documentation related to RDPMC interception.
>>  - Check harware capabilities, not KVM capabilities, when calculating
>>    MSR and RDPMC intercepts.
>>  - Rework intercept (re)calculation to use a request and the existing (well,
>>    will be existing as of 6.17-rc1) vendor hooks for recalculating intercepts.
>>  - Always read PERF_GLOBAL_CTRL on VM-Exit if writes weren't intercepted while
>>    running the vCPU.
>>  - Call setup_vmcs_config() before kvm_x86_vendor_init() so that the golden
>>    VMCS configuration is known before kvm_init_pmu_capability() is called.
>>  - Keep as much refresh/init code in common x86 as possible.
>>  - Context switch PMCs and event selectors in common x86, not vendor code.
>>  - Bail from the VM-Exit fastpath if the guest is counting instructions
>>    retired and the mediated PMU is enabled (because guest state hasn't yet
>>    been synchronized with hardware).
>>  - Don't require an userspace to opt-in via KVM_CAP_PMU_CAPABILITY, and instead
>>    automatically "create" a mediated PMU on the first KVM_CREATE_VCPU call if
>>    the VM has an in-kernel local APIC.
>>  - Add entries in kernel-parameters.txt for the PMU params.
>>  - Add a patch to elide PMC writes when possible.
>>  - Many more fixups and tweaks...
>>
>> v4:
>>  - https://lore.kernel.org/all/20250324173121.1275209-1-mizhang@google.com
>>  - Rebase whole patchset on 6.14-rc3 base.
>>  - Address Peter's comments on Perf part.
>>  - Address Sean's comments on KVM part.
>>    * Change key word "passthrough" to "mediated" in all patches
>>    * Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
>>    * Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
>>      save/retore list support for GLOBAL_CTRL, thus the support of mediated
>>      vPMU is constrained to SapphireRapids and later CPUs on Intel side.
>>    * Merge some small changes into a single patch.
>>  - Address Sandipan's comment on invalid pmu pointer.
>>  - Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
>>    manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
>>
>> v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com
>> v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com
>> v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
>>
>> Dapeng Mi (15):
>>   KVM: x86/pmu: Start stubbing in mediated PMU support
>>   KVM: x86/pmu: Implement Intel mediated PMU requirements and
>>     constraints
>>   KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
>>   KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
>>   KVM: VMX: Add helpers to toggle/change a bit in VMCS execution
>>     controls
>>   KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
>>   KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated
>>     PMU
>>   KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents
>>   KVM: x86/pmu: Disable interception of select PMU MSRs for mediated
>>     vPMUs
>>   KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter
>>     accesses
>>   KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter
>>     updates
>>   KVM: x86/pmu: Load/put mediated PMU context when entering/exiting
>>     guest
>>   KVM: x86/pmu: Handle emulated instruction for mediated vPMU
>>   KVM: nVMX: Add macros to simplify nested MSR interception setting
>>   KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
>>
>> Kan Liang (7):
>>   perf: Skip pmu_ctx based on event_type
>>   perf: Add generic exclude_guest support
>>   perf: Add APIs to create/release mediated guest vPMUs
>>   perf: Clean up perf ctx time
>>   perf: Add a EVENT_GUEST flag
>>   perf: Add APIs to load/put guest mediated PMU context
>>   perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
>>
>> Mingwei Zhang (3):
>>   perf/x86/core: Plumb mediated PMU capability from x86_pmu to
>>     x86_pmu_cap
>>   KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
>>   KVM: nVMX: Disable PMU MSR interception as appropriate while running
>>     L2
>>
>> Sandipan Das (3):
>>   perf/x86/core: Do not set bit width for unavailable counters
>>   perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
>>   KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on
>>     AMD
>>
>> Sean Christopherson (15):
>>   perf: Move security_perf_event_free() call to __free_event()
>>   perf: core/x86: Register a new vector for handling mediated guest PMIs
>>   perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put
>>     context
>>   KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init()
>>   KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
>>   KVM: Add a simplified wrapper for registering perf callbacks
>>   KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
>>   KVM: x86/pmu: Implement AMD mediated PMU requirements
>>   KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic
>>     RECALC_INTERCEPTS
>>   KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates
>>   KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86
>>   KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to
>>     PMU v2+
>>   KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are
>>     active
>>   KVM: nSVM: Disable PMU MSR interception as appropriate while running
>>     L2
>>   KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already
>>     match
>>
>> Xiong Zhang (1):
>>   KVM: x86/pmu: Register PMI handler for mediated vPMU
>>
>>  .../admin-guide/kernel-parameters.txt         |  49 ++
>>  arch/arm64/kvm/arm.c                          |   2 +-
>>  arch/loongarch/kvm/main.c                     |   2 +-
>>  arch/riscv/kvm/main.c                         |   2 +-
>>  arch/x86/entry/entry_fred.c                   |   1 +
>>  arch/x86/events/amd/core.c                    |   2 +
>>  arch/x86/events/core.c                        |  32 +-
>>  arch/x86/events/intel/core.c                  |   5 +
>>  arch/x86/include/asm/hardirq.h                |   3 +
>>  arch/x86/include/asm/idtentry.h               |   6 +
>>  arch/x86/include/asm/irq_vectors.h            |   4 +-
>>  arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
>>  arch/x86/include/asm/kvm-x86-pmu-ops.h        |   4 +
>>  arch/x86/include/asm/kvm_host.h               |   7 +-
>>  arch/x86/include/asm/msr-index.h              |  17 +-
>>  arch/x86/include/asm/perf_event.h             |   1 +
>>  arch/x86/include/asm/vmx.h                    |   1 +
>>  arch/x86/kernel/idt.c                         |   3 +
>>  arch/x86/kernel/irq.c                         |  19 +
>>  arch/x86/kvm/Kconfig                          |   1 +
>>  arch/x86/kvm/cpuid.c                          |   2 +
>>  arch/x86/kvm/pmu.c                            | 272 ++++++++-
>>  arch/x86/kvm/pmu.h                            |  37 +-
>>  arch/x86/kvm/svm/nested.c                     |  18 +-
>>  arch/x86/kvm/svm/pmu.c                        |  51 +-
>>  arch/x86/kvm/svm/svm.c                        |  54 +-
>>  arch/x86/kvm/vmx/capabilities.h               |  11 +-
>>  arch/x86/kvm/vmx/main.c                       |  14 +-
>>  arch/x86/kvm/vmx/nested.c                     |  65 ++-
>>  arch/x86/kvm/vmx/pmu_intel.c                  | 169 ++++--
>>  arch/x86/kvm/vmx/pmu_intel.h                  |  15 +
>>  arch/x86/kvm/vmx/vmx.c                        | 143 +++--
>>  arch/x86/kvm/vmx/vmx.h                        |  11 +-
>>  arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
>>  arch/x86/kvm/x86.c                            |  69 ++-
>>  arch/x86/kvm/x86.h                            |   1 +
>>  include/linux/kvm_host.h                      |  11 +-
>>  include/linux/perf_event.h                    |  38 +-
>>  init/Kconfig                                  |   4 +
>>  kernel/events/core.c                          | 521 ++++++++++++++----
>>  .../beauty/arch/x86/include/asm/irq_vectors.h |   3 +-
>>  virt/kvm/kvm_main.c                           |   6 +-
>>  42 files changed, 1385 insertions(+), 295 deletions(-)
>>
>>
>> base-commit: 53d61a43a7973f812caa08fa922b607574befef4

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (44 preceding siblings ...)
  2025-08-08  8:28 ` [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Mi, Dapeng
@ 2025-08-13  9:45 ` Sandipan Das
  2025-08-22  8:12 ` Hao, Xudong
  46 siblings, 0 replies; 65+ messages in thread
From: Sandipan Das @ 2025-08-13  9:45 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Dapeng Mi

On 07-08-2025 01:26, Sean Christopherson wrote:
> This series is based on the fastpath+PMU cleanups series[*] (which is based on
> kvm/queue), but the non-KVM changes apply cleanly on v6.16 or Linus' tree.
> I.e. if you only care about the perf changes, I would just apply on whatever
> branch is convenient and stop when you hit the KVM changes.
> 
> My hope/plan is that the perf changes will go through the tip tree with a
> stable tag/branch, and the KVM changes will go the kvm-x86 tree.
> 
> Non-x86 KVM folks, y'all are getting Cc'd due to minor changes in "KVM: Add a
> simplified wrapper for registering perf callbacks".
> 
> The full set is also available at:
> 
>   https://github.com/sean-jc/linux.git tags/mediated-vpmu-v5
> 
> Add support for mediated vPMUs in KVM x86, where "mediated" aligns with the
> standard definition of intercepting control operations (e.g. event selectors),
> while allowing the guest to perform data operations (e.g. read PMCs, toggle
> counters on/off) without KVM getting involed.
> 
> For an in-depth description of the what and why, please see the cover letter
> from the original RFC:
> 
>   https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
> 
> All KVM tests pass (or fail the same before and after), and I've manually
> verified MSR/PMC are passed through as expected, but I haven't done much at all
> to actually utilize the PMU in a guest.  I'll be amazed if I didn't make at
> least one major goof.
> 
> Similarly, I tried to address all feedback, but there are many, many changes
> relative to v4.  If I missed something, I apologize in advance.
> 
> In other words, please thoroughly review and test.
> 
> [*] https://lore.kernel.org/all/20250805190526.1453366-1-seanjc@google.com
> 
> v5:
>  - Add a patch to call security_perf_event_free() from __free_event()
>    instead of _free_event() (necessitated by the __cleanup() changes).
>  - Add CONFIG_PERF_GUEST_MEDIATED_PMU to guard the new perf functionality.
>  - Ensure the PMU is fully disabled in perf_{load,put}_guest_context() when
>    when switching between guest and host context. [Kan, Namhyung]
>  - Route the new system IRQ, PERF_GUEST_MEDIATED_PMI_VECTOR, through perf,
>    not KVM, and play nice with FRED.
>  - Rename and combine perf_{guest,host}_{enter,exit}() to a single set of
>    APIs, perf_{load,put}_guest_context().
>  - Rename perf_{get,put}_mediated_pmu() to perf_{create,release}_mediated_pmu()
>    to (hopefully) better differentiate them from perf_{load,put}_guest_context().
>  - Change the param to the load/put APIs from "u32 guest_lvtpc" to
>    "unsigned long data" to decouple arch code as much as possible.  E.g. if
>    a non-x86 arch were to ever support a mediated vPMU, @data could be used
>    to pass a pointer to a struct.
>  - Use pmu->version to detect if a vCPU has a mediated PMU.
>  - Use a kvm_x86_ops hook to check for mediated PMU support.
>  - Cull "passthrough" from as many places as I could find.
>  - Improve the changelog/documentation related to RDPMC interception.
>  - Check harware capabilities, not KVM capabilities, when calculating
>    MSR and RDPMC intercepts.
>  - Rework intercept (re)calculation to use a request and the existing (well,
>    will be existing as of 6.17-rc1) vendor hooks for recalculating intercepts.
>  - Always read PERF_GLOBAL_CTRL on VM-Exit if writes weren't intercepted while
>    running the vCPU.
>  - Call setup_vmcs_config() before kvm_x86_vendor_init() so that the golden
>    VMCS configuration is known before kvm_init_pmu_capability() is called.
>  - Keep as much refresh/init code in common x86 as possible.
>  - Context switch PMCs and event selectors in common x86, not vendor code.
>  - Bail from the VM-Exit fastpath if the guest is counting instructions
>    retired and the mediated PMU is enabled (because guest state hasn't yet
>    been synchronized with hardware).
>  - Don't require an userspace to opt-in via KVM_CAP_PMU_CAPABILITY, and instead
>    automatically "create" a mediated PMU on the first KVM_CREATE_VCPU call if
>    the VM has an in-kernel local APIC.
>  - Add entries in kernel-parameters.txt for the PMU params.
>  - Add a patch to elide PMC writes when possible.
>  - Many more fixups and tweaks...
> 
> v4:
>  - https://lore.kernel.org/all/20250324173121.1275209-1-mizhang@google.com
>  - Rebase whole patchset on 6.14-rc3 base.
>  - Address Peter's comments on Perf part.
>  - Address Sean's comments on KVM part.
>    * Change key word "passthrough" to "mediated" in all patches
>    * Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
>    * Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
>      save/retore list support for GLOBAL_CTRL, thus the support of mediated
>      vPMU is constrained to SapphireRapids and later CPUs on Intel side.
>    * Merge some small changes into a single patch.
>  - Address Sandipan's comment on invalid pmu pointer.
>  - Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
>    manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
> 
> v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com
> v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com
> v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
> 
> Dapeng Mi (15):
>   KVM: x86/pmu: Start stubbing in mediated PMU support
>   KVM: x86/pmu: Implement Intel mediated PMU requirements and
>     constraints
>   KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
>   KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
>   KVM: VMX: Add helpers to toggle/change a bit in VMCS execution
>     controls
>   KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
>   KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated
>     PMU
>   KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents
>   KVM: x86/pmu: Disable interception of select PMU MSRs for mediated
>     vPMUs
>   KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter
>     accesses
>   KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter
>     updates
>   KVM: x86/pmu: Load/put mediated PMU context when entering/exiting
>     guest
>   KVM: x86/pmu: Handle emulated instruction for mediated vPMU
>   KVM: nVMX: Add macros to simplify nested MSR interception setting
>   KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
> 
> Kan Liang (7):
>   perf: Skip pmu_ctx based on event_type
>   perf: Add generic exclude_guest support
>   perf: Add APIs to create/release mediated guest vPMUs
>   perf: Clean up perf ctx time
>   perf: Add a EVENT_GUEST flag
>   perf: Add APIs to load/put guest mediated PMU context
>   perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
> 
> Mingwei Zhang (3):
>   perf/x86/core: Plumb mediated PMU capability from x86_pmu to
>     x86_pmu_cap
>   KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
>   KVM: nVMX: Disable PMU MSR interception as appropriate while running
>     L2
> 
> Sandipan Das (3):
>   perf/x86/core: Do not set bit width for unavailable counters
>   perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
>   KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on
>     AMD
> 
> Sean Christopherson (15):
>   perf: Move security_perf_event_free() call to __free_event()
>   perf: core/x86: Register a new vector for handling mediated guest PMIs
>   perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put
>     context
>   KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init()
>   KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
>   KVM: Add a simplified wrapper for registering perf callbacks
>   KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
>   KVM: x86/pmu: Implement AMD mediated PMU requirements
>   KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic
>     RECALC_INTERCEPTS
>   KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates
>   KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86
>   KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to
>     PMU v2+
>   KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are
>     active
>   KVM: nSVM: Disable PMU MSR interception as appropriate while running
>     L2
>   KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already
>     match
> 
> Xiong Zhang (1):
>   KVM: x86/pmu: Register PMI handler for mediated vPMU
> 
>  .../admin-guide/kernel-parameters.txt         |  49 ++
>  arch/arm64/kvm/arm.c                          |   2 +-
>  arch/loongarch/kvm/main.c                     |   2 +-
>  arch/riscv/kvm/main.c                         |   2 +-
>  arch/x86/entry/entry_fred.c                   |   1 +
>  arch/x86/events/amd/core.c                    |   2 +
>  arch/x86/events/core.c                        |  32 +-
>  arch/x86/events/intel/core.c                  |   5 +
>  arch/x86/include/asm/hardirq.h                |   3 +
>  arch/x86/include/asm/idtentry.h               |   6 +
>  arch/x86/include/asm/irq_vectors.h            |   4 +-
>  arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
>  arch/x86/include/asm/kvm-x86-pmu-ops.h        |   4 +
>  arch/x86/include/asm/kvm_host.h               |   7 +-
>  arch/x86/include/asm/msr-index.h              |  17 +-
>  arch/x86/include/asm/perf_event.h             |   1 +
>  arch/x86/include/asm/vmx.h                    |   1 +
>  arch/x86/kernel/idt.c                         |   3 +
>  arch/x86/kernel/irq.c                         |  19 +
>  arch/x86/kvm/Kconfig                          |   1 +
>  arch/x86/kvm/cpuid.c                          |   2 +
>  arch/x86/kvm/pmu.c                            | 272 ++++++++-
>  arch/x86/kvm/pmu.h                            |  37 +-
>  arch/x86/kvm/svm/nested.c                     |  18 +-
>  arch/x86/kvm/svm/pmu.c                        |  51 +-
>  arch/x86/kvm/svm/svm.c                        |  54 +-
>  arch/x86/kvm/vmx/capabilities.h               |  11 +-
>  arch/x86/kvm/vmx/main.c                       |  14 +-
>  arch/x86/kvm/vmx/nested.c                     |  65 ++-
>  arch/x86/kvm/vmx/pmu_intel.c                  | 169 ++++--
>  arch/x86/kvm/vmx/pmu_intel.h                  |  15 +
>  arch/x86/kvm/vmx/vmx.c                        | 143 +++--
>  arch/x86/kvm/vmx/vmx.h                        |  11 +-
>  arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
>  arch/x86/kvm/x86.c                            |  69 ++-
>  arch/x86/kvm/x86.h                            |   1 +
>  include/linux/kvm_host.h                      |  11 +-
>  include/linux/perf_event.h                    |  38 +-
>  init/Kconfig                                  |   4 +
>  kernel/events/core.c                          | 521 ++++++++++++++----
>  .../beauty/arch/x86/include/asm/irq_vectors.h |   3 +-
>  virt/kvm/kvm_main.c                           |   6 +-
>  42 files changed, 1385 insertions(+), 295 deletions(-)
> 
> 
> base-commit: 53d61a43a7973f812caa08fa922b607574befef4

No issues seen with KUT and KVM kselftest runs on the following types of
AMD host systems.
- Milan (does not have PerfMonV2, cannot use Mediated PMU)
- Genoa and Turin (have PerfMonV2)

Tested with all combinations of kvm.force_emulation_prefix and
kvm_amd.enable_mediated_pmu. The issue seen previously where RDPMC gets
intercepted on secondary vCPUs has also been addressed.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 20/44] KVM: x86/pmu: Implement AMD mediated PMU requirements
  2025-08-06 19:56 ` [PATCH v5 20/44] KVM: x86/pmu: Implement AMD mediated PMU requirements Sean Christopherson
@ 2025-08-13  9:49   ` Sandipan Das
  0 siblings, 0 replies; 65+ messages in thread
From: Sandipan Das @ 2025-08-13  9:49 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Dapeng Mi

On 07-08-2025 01:26, Sean Christopherson wrote:
> Require host PMU version 2+ for AMD mediated PMU support, as
> PERF_GLOBAL_CTRL and friends are hard requirements for the mediated PMU.
> 
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Co-developed-by: Mingwei Zhang <mizhang@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> [sean: extract to separate patch, write changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/pmu.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 7b8577f3c57a..96be2c3e0d65 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -227,6 +227,11 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +static bool amd_pmu_is_mediated_pmu_supported(struct x86_pmu_capability *host_pmu)
> +{
> +	return host_pmu->version >= 2;
> +}
> +
>  struct kvm_pmu_ops amd_pmu_ops __initdata = {
>  	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
>  	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
> @@ -236,6 +241,9 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
>  	.set_msr = amd_pmu_set_msr,
>  	.refresh = amd_pmu_refresh,
>  	.init = amd_pmu_init,
> +
> +	.is_mediated_pmu_supported = amd_pmu_is_mediated_pmu_supported,
> +
>  	.EVENTSEL_EVENT = AMD64_EVENTSEL_EVENT,
>  	.MAX_NR_GP_COUNTERS = KVM_MAX_NR_AMD_GP_COUNTERS,
>  	.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 38/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active
  2025-08-06 19:57 ` [PATCH v5 38/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active Sean Christopherson
@ 2025-08-13  9:53   ` Sandipan Das
  0 siblings, 0 replies; 65+ messages in thread
From: Sandipan Das @ 2025-08-13  9:53 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Dapeng Mi

On 07-08-2025 01:27, Sean Christopherson wrote:
> Don't handle exits in the fastpath if emulation is required, i.e. if an
> instruction needs to be skipped, the mediated PMU is enabled, and one or
> more PMCs is counting instructions.  With the mediated PMU, KVM's cache of
> PMU state is inconsistent with respect to hardware until KVM exits the
> inner run loop (when the mediated PMU is "put").
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/pmu.h | 10 ++++++++++
>  arch/x86/kvm/x86.c |  9 +++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index e2e2d8476a3f..a0cd42cbea9d 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -234,6 +234,16 @@ static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
>  	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
>  }
>  
> +static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> +	return !kvm_vcpu_has_mediated_pmu(vcpu) ||
> +	       !bitmap_intersects(pmu->pmc_counting_instructions,
> +				  (unsigned long *)&pmu->global_ctrl,
> +				  X86_PMC_IDX_MAX);
> +}
> +
>  void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
>  void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
>  int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7fb94ef64e18..6bdf7ef0b535 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2092,6 +2092,9 @@ EXPORT_SYMBOL_GPL(kvm_emulate_invd);
>  
>  fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu)
>  {
> +	if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
> +		return EXIT_FASTPATH_NONE;
> +
>  	if (!kvm_emulate_invd(vcpu))
>  		return EXIT_FASTPATH_EXIT_USERSPACE;
>  
> @@ -2151,6 +2154,9 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
>  	u64 data = kvm_read_edx_eax(vcpu);
>  	u32 msr = kvm_rcx_read(vcpu);
>  
> +	if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
> +		return EXIT_FASTPATH_NONE;
> +
>  	switch (msr) {
>  	case APIC_BASE_MSR + (APIC_ICR >> 4):
>  		if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
> @@ -11267,6 +11273,9 @@ EXPORT_SYMBOL_GPL(kvm_emulate_halt);
>  
>  fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu)
>  {
> +	if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
> +		return EXIT_FASTPATH_NONE;
> +
>  	if (!kvm_emulate_halt(vcpu))
>  		return EXIT_FASTPATH_EXIT_USERSPACE;
>  

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 17/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
  2025-08-06 19:56 ` [PATCH v5 17/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Sean Christopherson
@ 2025-08-13  9:56   ` Sandipan Das
  0 siblings, 0 replies; 65+ messages in thread
From: Sandipan Das @ 2025-08-13  9:56 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Dapeng Mi

On 07-08-2025 01:26, Sean Christopherson wrote:
> Take a snapshot of the unadulterated PMU capabilities provided by perf so
> that KVM can compare guest vPMU capabilities against hardware capabilities
> when determining whether or not to intercept PMU MSRs (and RDPMC).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/pmu.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 3206412a35a1..0f3e011824ed 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -26,6 +26,10 @@
>  /* This is enough to filter the vast majority of currently defined events. */
>  #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
>  
> +/* Unadultered PMU capabilities of the host, i.e. of hardware. */
> +static struct x86_pmu_capability __read_mostly kvm_host_pmu;
> +
> +/* KVM's PMU capabilities, i.e. the intersection of KVM and hardware support. */
>  struct x86_pmu_capability __read_mostly kvm_pmu_cap;
>  EXPORT_SYMBOL_GPL(kvm_pmu_cap);
>  
> @@ -104,6 +108,8 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>  	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
>  	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
>  
> +	perf_get_x86_pmu_capability(&kvm_host_pmu);
> +
>  	/*
>  	 * Hybrid PMUs don't play nice with virtualization without careful
>  	 * configuration by userspace, and KVM's APIs for reporting supported
> @@ -114,18 +120,16 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>  		enable_pmu = false;
>  
>  	if (enable_pmu) {
> -		perf_get_x86_pmu_capability(&kvm_pmu_cap);
> -
>  		/*
>  		 * WARN if perf did NOT disable hardware PMU if the number of
>  		 * architecturally required GP counters aren't present, i.e. if
>  		 * there are a non-zero number of counters, but fewer than what
>  		 * is architecturally required.
>  		 */
> -		if (!kvm_pmu_cap.num_counters_gp ||
> -		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs))
> +		if (!kvm_host_pmu.num_counters_gp ||
> +		    WARN_ON_ONCE(kvm_host_pmu.num_counters_gp < min_nr_gp_ctrs))
>  			enable_pmu = false;
> -		else if (is_intel && !kvm_pmu_cap.version)
> +		else if (is_intel && !kvm_host_pmu.version)
>  			enable_pmu = false;
>  	}
>  
> @@ -134,6 +138,7 @@ void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>  		return;
>  	}
>  
> +	memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
>  	kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
>  	kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
>  					  pmu_ops->MAX_NR_GP_COUNTERS);

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 15/44] KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
  2025-08-06 19:56 ` [PATCH v5 15/44] KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs Sean Christopherson
@ 2025-08-13  9:58   ` Sandipan Das
  0 siblings, 0 replies; 65+ messages in thread
From: Sandipan Das @ 2025-08-13  9:58 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Dapeng Mi

On 07-08-2025 01:26, Sean Christopherson wrote:
> Gate access to PMC MSRs based on pmu->version, not on kvm->arch.enable_pmu,
> to more accurately reflect KVM's behavior.  This is a glorified nop, as
> pmu->version and pmu->nr_arch_gp_counters can only be non-zero if
> amd_pmu_refresh() is reached, kvm_pmu_refresh() invokes amd_pmu_refresh()
> if and only if kvm->arch.enable_pmu is true, and amd_pmu_refresh() forces
> pmu->version to be 1 or 2.
> 
> I.e. the following holds true:
> 
>   !pmu->nr_arch_gp_counters || kvm->arch.enable_pmu == (pmu->version > 0)
> 
> and so the only way for amd_pmu_get_pmc() to return a non-NULL value is if
> both kvm->arch.enable_pmu and pmu->version evaluate to true.
> 
> No real functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/pmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 288f7f2a46f2..7b8577f3c57a 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -41,7 +41,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>  	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
>  	unsigned int idx;
>  
> -	if (!vcpu->kvm->arch.enable_pmu)
> +	if (!pmu->version)
>  		return NULL;
>  
>  	switch (msr) {

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 33/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses
  2025-08-06 19:56 ` [PATCH v5 33/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses Sean Christopherson
@ 2025-08-13 10:01   ` Sandipan Das
  0 siblings, 0 replies; 65+ messages in thread
From: Sandipan Das @ 2025-08-13 10:01 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Dapeng Mi

On 07-08-2025 01:26, Sean Christopherson wrote:
> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
> 
> When emulating a PMC counter read or write for a mediated PMU, bypass the
> perf checks and emulated_counter logic as the counters aren't proxied
> through perf, i.e. pmc->counter always holds the guest's up-to-date value,
> and thus there's no need to defer emulated overflow checks.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Co-developed-by: Mingwei Zhang <mizhang@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> [sean: split from event filtering change, write shortlog+changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/pmu.c | 5 +++++
>  arch/x86/kvm/pmu.h | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 817ef852bdf9..082d2905882b 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -377,6 +377,11 @@ static void pmc_update_sample_period(struct kvm_pmc *pmc)
>  
>  void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
>  {
> +	if (kvm_vcpu_has_mediated_pmu(pmc->vcpu)) {
> +		pmc->counter = val & pmc_bitmask(pmc);
> +		return;
> +	}
> +
>  	/*
>  	 * Drop any unconsumed accumulated counts, the WRMSR is a write, not a
>  	 * read-modify-write.  Adjust the counter value so that its value is
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 51963a3a167a..1c9d26d60a60 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -111,6 +111,9 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
>  {
>  	u64 counter, enabled, running;
>  
> +	if (kvm_vcpu_has_mediated_pmu(pmc->vcpu))
> +		return pmc->counter & pmc_bitmask(pmc);
> +
>  	counter = pmc->counter + pmc->emulated_counter;
>  
>  	if (pmc->perf_event && !pmc->is_paused)

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-06 19:56 ` [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context Sean Christopherson
@ 2025-08-15 11:39   ` Peter Zijlstra
  2025-08-15 15:41     ` Sean Christopherson
  2025-08-15 13:04   ` Peter Zijlstra
  1 sibling, 1 reply; 65+ messages in thread
From: Peter Zijlstra @ 2025-08-15 11:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Wed, Aug 06, 2025 at 12:56:31PM -0700, Sean Christopherson wrote:
> Add arch hooks to the mediated vPMU load/put APIs, and use the hooks to
> switch PMIs to the dedicated mediated PMU IRQ vector on load, and back to
> perf's standard NMI when the guest context is put.  I.e. route PMIs to
> PERF_GUEST_MEDIATED_PMI_VECTOR when the guest context is active, and to
> NMIs while the host context is active.
> 
> While running with guest context loaded, ignore all NMIs (in perf).  Any
> NMI that arrives while the LVTPC points at the mediated PMU IRQ vector
> can't possibly be due to a host perf event.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> [sean: use arch hook instead of per-PMU callback]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/events/core.c     | 27 +++++++++++++++++++++++++++
>  include/linux/perf_event.h |  3 +++
>  kernel/events/core.c       |  4 ++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 7610f26dfbd9..9b0525b252f1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -55,6 +55,8 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
>  	.pmu = &pmu,
>  };
>  
> +static DEFINE_PER_CPU(bool, x86_guest_ctx_loaded);
> +
>  DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
>  DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
>  DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
> @@ -1756,6 +1758,16 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>  	u64 finish_clock;
>  	int ret;
>  
> +	/*
> +	 * Ignore all NMIs when a guest's mediated PMU context is loaded.  Any
> +	 * such NMI can't be due to a PMI as the CPU's LVTPC is switched to/from
> +	 * the dedicated mediated PMI IRQ vector while host events are quiesced.
> +	 * Attempting to handle a PMI while the guest's context is loaded will
> +	 * generate false positives and clobber guest state.
> +	 */
> +	if (this_cpu_read(x86_guest_ctx_loaded))
> +		return NMI_DONE;
> +
>  	/*
>  	 * All PMUs/events that share this PMI handler should make sure to
>  	 * increment active_events for their events.
> @@ -2727,6 +2739,21 @@ static struct pmu pmu = {
>  	.filter			= x86_pmu_filter,
>  };
>  
> +void arch_perf_load_guest_context(unsigned long data)
> +{
> +	u32 masked = data & APIC_LVT_MASKED;
> +
> +	apic_write(APIC_LVTPC,
> +		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
> +	this_cpu_write(x86_guest_ctx_loaded, true);
> +}
> +
> +void arch_perf_put_guest_context(void)
> +{
> +	this_cpu_write(x86_guest_ctx_loaded, false);
> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +}
> +
>  void arch_perf_update_userpage(struct perf_event *event,
>  			       struct perf_event_mmap_page *userpg, u64 now)
>  {
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0c529fbd97e6..3a9bd9c4c90e 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1846,6 +1846,9 @@ static inline unsigned long perf_arch_guest_misc_flags(struct pt_regs *regs)
>  # define perf_arch_guest_misc_flags(regs)	perf_arch_guest_misc_flags(regs)
>  #endif
>  
> +extern void arch_perf_load_guest_context(unsigned long data);
> +extern void arch_perf_put_guest_context(void);
> +
>  static inline bool needs_branch_stack(struct perf_event *event)
>  {
>  	return event->attr.branch_sample_type != 0;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e1df3c3bfc0d..ad22b182762e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
>  		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
>  	}
>  
> +	arch_perf_load_guest_context(data);

So I still don't understand why this ever needs to reach the generic
code. x86 pmu driver and x86 kvm can surely sort this out inside of x86,
no?


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-06 19:56 ` [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context Sean Christopherson
  2025-08-15 11:39   ` Peter Zijlstra
@ 2025-08-15 13:04   ` Peter Zijlstra
  2025-08-15 15:51     ` Sean Christopherson
  1 sibling, 1 reply; 65+ messages in thread
From: Peter Zijlstra @ 2025-08-15 13:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Wed, Aug 06, 2025 at 12:56:31PM -0700, Sean Christopherson wrote:

> @@ -2727,6 +2739,21 @@ static struct pmu pmu = {
>  	.filter			= x86_pmu_filter,
>  };
>  
> +void arch_perf_load_guest_context(unsigned long data)
> +{
> +	u32 masked = data & APIC_LVT_MASKED;
> +
> +	apic_write(APIC_LVTPC,
> +		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
> +	this_cpu_write(x86_guest_ctx_loaded, true);
> +}

I'm further confused, why would this ever be masked?

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-15 11:39   ` Peter Zijlstra
@ 2025-08-15 15:41     ` Sean Christopherson
  2025-08-15 15:55       ` Sean Christopherson
  0 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-15 15:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Fri, Aug 15, 2025, Peter Zijlstra wrote:
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index e1df3c3bfc0d..ad22b182762e 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
> >  		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
> >  	}
> >  
> > +	arch_perf_load_guest_context(data);
> 
> So I still don't understand why this ever needs to reach the generic
> code. x86 pmu driver and x86 kvm can surely sort this out inside of x86,
> no?

It's definitely possible to handle this entirely within x86, I just don't love
switching the LVTPC without the protection of perf_ctx_lock and perf_ctx_disable().
It's not a sticking point for me if you strongly prefer something like this: 

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0e5048ae86fa..86b81c217b97 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1319,7 +1319,9 @@ void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
 
        lockdep_assert_irqs_disabled();
 
-       perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
+       perf_load_guest_context();
+
+       perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
 
        /*
         * Disable all counters before loading event selectors and PMCs so that
@@ -1380,5 +1382,7 @@ void kvm_mediated_pmu_put(struct kvm_vcpu *vcpu)
 
        kvm_pmu_put_guest_pmcs(vcpu);
 
+       perf_put_guest_lvtpc();
+
        perf_put_guest_context();
 }


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-15 13:04   ` Peter Zijlstra
@ 2025-08-15 15:51     ` Sean Christopherson
  0 siblings, 0 replies; 65+ messages in thread
From: Sean Christopherson @ 2025-08-15 15:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Fri, Aug 15, 2025, Peter Zijlstra wrote:
> On Wed, Aug 06, 2025 at 12:56:31PM -0700, Sean Christopherson wrote:
> 
> > @@ -2727,6 +2739,21 @@ static struct pmu pmu = {
> >  	.filter			= x86_pmu_filter,
> >  };
> >  
> > +void arch_perf_load_guest_context(unsigned long data)
> > +{
> > +	u32 masked = data & APIC_LVT_MASKED;
> > +
> > +	apic_write(APIC_LVTPC,
> > +		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
> > +	this_cpu_write(x86_guest_ctx_loaded, true);
> > +}
> 
> I'm further confused, why would this ever be masked?

The idea is to match the guest's LVTPC state so that KVM doesn't trigger IRQ
VM-Exits on counter overflow when the guest's LVTPC is masked.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-15 15:41     ` Sean Christopherson
@ 2025-08-15 15:55       ` Sean Christopherson
  2025-08-18 14:32         ` Peter Zijlstra
  0 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-15 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Fri, Aug 15, 2025, Sean Christopherson wrote:
> On Fri, Aug 15, 2025, Peter Zijlstra wrote:
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index e1df3c3bfc0d..ad22b182762e 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
> > >  		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
> > >  	}
> > >  
> > > +	arch_perf_load_guest_context(data);
> > 
> > So I still don't understand why this ever needs to reach the generic
> > code. x86 pmu driver and x86 kvm can surely sort this out inside of x86,
> > no?
> 
> It's definitely possible to handle this entirely within x86, I just don't love
> switching the LVTPC without the protection of perf_ctx_lock and perf_ctx_disable().
> It's not a sticking point for me if you strongly prefer something like this: 
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 0e5048ae86fa..86b81c217b97 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -1319,7 +1319,9 @@ void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
>  
>         lockdep_assert_irqs_disabled();
>  
> -       perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
> +       perf_load_guest_context();
> +
> +       perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));

Hmm, an argument for providing a dedicated perf_load_guest_lvtpc() APIs is that
it would allow KVM to handle LVTPC writes in KVM's VM-Exit fastpath, i.e. without
having to do a full put+reload of the guest context.

So if we're confident that switching the host LVTPC outside of
perf_{load,put}_guest_context() is functionally safe, I'm a-ok with it.

>         /*
>          * Disable all counters before loading event selectors and PMCs so that
> @@ -1380,5 +1382,7 @@ void kvm_mediated_pmu_put(struct kvm_vcpu *vcpu)
>  
>         kvm_pmu_put_guest_pmcs(vcpu);
>  
> +       perf_put_guest_lvtpc();
> +
>         perf_put_guest_context();
>  }
> 

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-15 15:55       ` Sean Christopherson
@ 2025-08-18 14:32         ` Peter Zijlstra
  2025-08-18 15:25           ` Sean Christopherson
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Zijlstra @ 2025-08-18 14:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Fri, Aug 15, 2025 at 08:55:25AM -0700, Sean Christopherson wrote:
> On Fri, Aug 15, 2025, Sean Christopherson wrote:
> > On Fri, Aug 15, 2025, Peter Zijlstra wrote:
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index e1df3c3bfc0d..ad22b182762e 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
> > > >  		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
> > > >  	}
> > > >  
> > > > +	arch_perf_load_guest_context(data);
> > > 
> > > So I still don't understand why this ever needs to reach the generic
> > > code. x86 pmu driver and x86 kvm can surely sort this out inside of x86,
> > > no?
> > 
> > It's definitely possible to handle this entirely within x86, I just don't love
> > switching the LVTPC without the protection of perf_ctx_lock and perf_ctx_disable().
> > It's not a sticking point for me if you strongly prefer something like this: 
> > 
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index 0e5048ae86fa..86b81c217b97 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -1319,7 +1319,9 @@ void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
> >  
> >         lockdep_assert_irqs_disabled();
> >  
> > -       perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
> > +       perf_load_guest_context();
> > +
> > +       perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
> 
> Hmm, an argument for providing a dedicated perf_load_guest_lvtpc() APIs is that
> it would allow KVM to handle LVTPC writes in KVM's VM-Exit fastpath, i.e. without
> having to do a full put+reload of the guest context.
> 
> So if we're confident that switching the host LVTPC outside of
> perf_{load,put}_guest_context() is functionally safe, I'm a-ok with it.

Let me see. So the hardware sets Masked when it raises the interrupt.

The interrupt handler clears it from software -- depending on uarch in 3
different places:
 1) right at the start of the PMI
 2) in the middle, right before enabling the PMU (writing global control)
 3) at the end of the PMI

the various changelogs adding that code mention spurious PMIs and
malformed PEBS records.

So the fun all happens when the guest is doing PMI and gets a VM-exit
while still Masked.

At that point, we can come in and completely rewrite the PMU state,
reroute the PMI and enable things again. Then later, we 'restore' the
PMU state, re-set LVTPC masked to the guest interrupt and 'resume'.

What could possibly go wrong :/ Kan, I'm assuming, but not knowing, that
writing all the PMU MSRs is somehow serializing state sufficient to not
cause the above mentioned fails? Specifically, clearing PEBS_ENABLE
should inhibit those malformed PEBS records or something? What if the
host also has PEBS and we don't actually clear the bit?

The current order ensures we rewrite LVTPC when global control is unset;
I think we want to keep that.

While staring at this, I note that perf_load_guest_context() will clear
global ctrl, clear all the counter programming, and re-enable an empty
pmu. Now, an empty PMU should result in global control being zero --
there is nothing run after all.

But then kvm_mediated_pmu_load() writes an explicit 0 again. Perhaps
replace this with asserting it is 0 instead?

Anyway, this means that moving the LVTPC writing into
kvm_mediated_pmu_load() as you suggest is identical.
perf_load_guest_context() results in global control being 0, we then
assert it is 0, and write LVTPC while it is still 0.
kvm_pmu_load_guest_pmcs() will then frob the MSRs.

OK, so *IF* doing the VM-exit during PMI is sound, this is something
that needs a comment somewhere. Then going back again, is the easy part,
since on the host side, we can never transition into KVM during a PMI.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-18 14:32         ` Peter Zijlstra
@ 2025-08-18 15:25           ` Sean Christopherson
  2025-08-18 16:12             ` Peter Zijlstra
  0 siblings, 1 reply; 65+ messages in thread
From: Sean Christopherson @ 2025-08-18 15:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Mon, Aug 18, 2025, Peter Zijlstra wrote:
> On Fri, Aug 15, 2025 at 08:55:25AM -0700, Sean Christopherson wrote:
> > On Fri, Aug 15, 2025, Sean Christopherson wrote:
> > > On Fri, Aug 15, 2025, Peter Zijlstra wrote:
> > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > index e1df3c3bfc0d..ad22b182762e 100644
> > > > > --- a/kernel/events/core.c
> > > > > +++ b/kernel/events/core.c
> > > > > @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
> > > > >  		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
> > > > >  	}
> > > > >  
> > > > > +	arch_perf_load_guest_context(data);
> > > > 
> > > > So I still don't understand why this ever needs to reach the generic
> > > > code. x86 pmu driver and x86 kvm can surely sort this out inside of x86,
> > > > no?
> > > 
> > > It's definitely possible to handle this entirely within x86, I just don't love
> > > switching the LVTPC without the protection of perf_ctx_lock and perf_ctx_disable().
> > > It's not a sticking point for me if you strongly prefer something like this: 
> > > 
> > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > index 0e5048ae86fa..86b81c217b97 100644
> > > --- a/arch/x86/kvm/pmu.c
> > > +++ b/arch/x86/kvm/pmu.c
> > > @@ -1319,7 +1319,9 @@ void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
> > >  
> > >         lockdep_assert_irqs_disabled();
> > >  
> > > -       perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
> > > +       perf_load_guest_context();
> > > +
> > > +       perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
> > 
> > Hmm, an argument for providing a dedicated perf_load_guest_lvtpc() APIs is that
> > it would allow KVM to handle LVTPC writes in KVM's VM-Exit fastpath, i.e. without
> > having to do a full put+reload of the guest context.
> > 
> > So if we're confident that switching the host LVTPC outside of
> > perf_{load,put}_guest_context() is functionally safe, I'm a-ok with it.
> 
> Let me see. So the hardware sets Masked when it raises the interrupt.
> 
> The interrupt handler clears it from software -- depending on uarch in 3
> different places:
>  1) right at the start of the PMI
>  2) in the middle, right before enabling the PMU (writing global control)
>  3) at the end of the PMI
> 
> the various changelogs adding that code mention spurious PMIs and
> malformed PEBS records.
> 
> So the fun all happens when the guest is doing PMI and gets a VM-exit
> while still Masked.
> 
> At that point, we can come in and completely rewrite the PMU state,
> reroute the PMI and enable things again. Then later, we 'restore' the
> PMU state, re-set LVTPC masked to the guest interrupt and 'resume'.
> 
> What could possibly go wrong :/ Kan, I'm assuming, but not knowing, that
> writing all the PMU MSRs is somehow serializing state sufficient to not
> cause the above mentioned fails? Specifically, clearing PEBS_ENABLE
> should inhibit those malformed PEBS records or something? What if the
> host also has PEBS and we don't actually clear the bit?
> 
> The current order ensures we rewrite LVTPC when global control is unset;
> I think we want to keep that.

Yes, for sure.

> While staring at this, I note that perf_load_guest_context() will clear
> global ctrl, clear all the counter programming, and re-enable an empty
> pmu. Now, an empty PMU should result in global control being zero --
> there is nothing run after all.
> 
> But then kvm_mediated_pmu_load() writes an explicit 0 again. Perhaps
> replace this with asserting it is 0 instead?

Yeah, I like that idea, a lot.  This?

	perf_load_guest_context();

	/*
	 * Sanity check that "loading" guest context disabled all counters, as
	 * modifying the LVTPC while host perf is active will cause explosions,
	 * as will loading event selectors and PMCs with guest values.
	 *
	 * VMX will enable/disable counters at VM-Enter/VM-Exit by atomically
	 * loading PERF_GLOBAL_CONTROL.  SVM effectively performs the switch by
	 * configuring all events to be GUEST_ONLY.
	 */
	WARN_ON_ONCE(rdmsrq(kvm_pmu_ops.PERF_GLOBAL_CTRL));

	perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));

> Anyway, this means that moving the LVTPC writing into
> kvm_mediated_pmu_load() as you suggest is identical.
> perf_load_guest_context() results in global control being 0, we then
> assert it is 0, and write LVTPC while it is still 0.
> kvm_pmu_load_guest_pmcs() will then frob the MSRs.
> 
> OK, so *IF* doing the VM-exit during PMI is sound, this is something
> that needs a comment somewhere. 

I'm a bit lost here.  Are you essentially asking if it's ok to take a VM-Exit
while the guest is handling a PMI?  If so, that _has_ to work, because there are
myriad things that can/will trigger a VM-Exit at any point while the guest is
active.

> Then going back again, is the easy part, since on the host side, we can never
> transition into KVM during a PMI.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-18 15:25           ` Sean Christopherson
@ 2025-08-18 16:12             ` Peter Zijlstra
  2025-08-18 20:07               ` Liang, Kan
  0 siblings, 1 reply; 65+ messages in thread
From: Peter Zijlstra @ 2025-08-18 16:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Mon, Aug 18, 2025 at 08:25:34AM -0700, Sean Christopherson wrote:

> > OK, so *IF* doing the VM-exit during PMI is sound, this is something
> > that needs a comment somewhere. 
> 
> I'm a bit lost here.  Are you essentially asking if it's ok to take a VM-Exit
> while the guest is handling a PMI?  If so, that _has_ to work, because there are
> myriad things that can/will trigger a VM-Exit at any point while the guest is
> active.

Yes, that's what I'm asking. Why is this VM-exit during PMI nonsense not
subject to the same failures that mandates the mid/late PMI ACK.

And yes, I realize this needs to work. But so far I'm not sure I
understand why that is a safe thing to do.

Like I wrote, I suspect writing all the PMU MSRs serializes things
sufficiently, but if that is the case, that needs to be explicitly
mentioned. Because that also doesn't explain why we needs mid-ack
instead of late-ack on ADL e-cores for instance.

Could it perhaps be that we don't let the guests do PEBS because DS
doesn't virtualize? And thus we don't have the malformed PEBS record?

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context
  2025-08-18 16:12             ` Peter Zijlstra
@ 2025-08-18 20:07               ` Liang, Kan
  0 siblings, 0 replies; 65+ messages in thread
From: Liang, Kan @ 2025-08-18 20:07 UTC (permalink / raw)
  To: Peter Zijlstra, Sean Christopherson
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	H. Peter Anvin, Andy Lutomirski, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Yongwei Ma, Mingwei Zhang,
	Xiong Zhang, Sandipan Das, Dapeng Mi



On 2025-08-18 9:12 a.m., Peter Zijlstra wrote:
> On Mon, Aug 18, 2025 at 08:25:34AM -0700, Sean Christopherson wrote:
> 
>>> OK, so *IF* doing the VM-exit during PMI is sound, this is something
>>> that needs a comment somewhere. 
>>
>> I'm a bit lost here.  Are you essentially asking if it's ok to take a VM-Exit
>> while the guest is handling a PMI?  If so, that _has_ to work, because there are
>> myriad things that can/will trigger a VM-Exit at any point while the guest is
>> active.
> 
> Yes, that's what I'm asking. Why is this VM-exit during PMI nonsense not
> subject to the same failures that mandates the mid/late PMI ACK.
> 
> And yes, I realize this needs to work. But so far I'm not sure I
> understand why that is a safe thing to do.
> 
> Like I wrote, I suspect writing all the PMU MSRs serializes things
> sufficiently, but if that is the case, that needs to be explicitly
> mentioned. Because that also doesn't explain why we needs mid-ack
> instead of late-ack on ADL e-cores for instance.

The mid-ack and late-ack only require under some corner cases, e.g., two
PMIs are triggered simultaneously with PEBS.
Because the ucode of p-core and e-core handle the pending PEBS records
and PMIs differently.
For p-core, the ACK should be as close to EOM. Otherwise, the pending
PMI will trigger a spurious PMI warning.
For e-core, the uncode handles the pending PMI well. There is no
spurious PMI. However, it impacts the update of the PEBS_DATA_CFG. The
PEBS_DATA_CFG is global. If the ACK cannot be done before re-enabling
counters, the stale PEBS_DATA_CFG will somehow be written into the next
PEBS record of the pending PMI. It triggers the malformed PEBS record.

For the upcoming arch PEBS, the data cfg is per-counter.
The mid-ack workaround should not be required.
> 
> Could it perhaps be that we don't let the guests do PEBS because DS
> doesn't virtualize? And thus we don't have the malformed PEBS record?
>

Yes, I don't think it can impact the mediated PMU. The legacy PEBS for
vPMU is not supported.
Since the configuration is per-counter with the arch PEBS, the malformed
PEBS record should not be triggered either.


Thanks,
Kan

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs
  2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
                   ` (45 preceding siblings ...)
  2025-08-13  9:45 ` Sandipan Das
@ 2025-08-22  8:12 ` Hao, Xudong
  46 siblings, 0 replies; 65+ messages in thread
From: Hao, Xudong @ 2025-08-22  8:12 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier, Oliver Upton, Tianrui Zhao,
	Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Xin Li, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Paolo Bonzini
  Cc: linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi



On 8/7/2025 3:56 AM, Sean Christopherson wrote:
> This series is based on the fastpath+PMU cleanups series[*] (which is based on
> kvm/queue), but the non-KVM changes apply cleanly on v6.16 or Linus' tree.
> I.e. if you only care about the perf changes, I would just apply on whatever
> branch is convenient and stop when you hit the KVM changes.
> 
> My hope/plan is that the perf changes will go through the tip tree with a
> stable tag/branch, and the KVM changes will go the kvm-x86 tree.
> 
> Non-x86 KVM folks, y'all are getting Cc'd due to minor changes in "KVM: Add a
> simplified wrapper for registering perf callbacks".
> 
> The full set is also available at:
> 
>    https://github.com/sean-jc/linux.git tags/mediated-vpmu-v5
> 
> Add support for mediated vPMUs in KVM x86, where "mediated" aligns with the
> standard definition of intercepting control operations (e.g. event selectors),
> while allowing the guest to perform data operations (e.g. read PMCs, toggle
> counters on/off) without KVM getting involed.
> 
> For an in-depth description of the what and why, please see the cover letter
> from the original RFC:
> 
>    https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
> 
> All KVM tests pass (or fail the same before and after), and I've manually
> verified MSR/PMC are passed through as expected, but I haven't done much at all
> to actually utilize the PMU in a guest.  I'll be amazed if I didn't make at
> least one major goof.
> 
> Similarly, I tried to address all feedback, but there are many, many changes
> relative to v4.  If I missed something, I apologize in advance.
> 
> In other words, please thoroughly review and test.
> 
> [*] https://lore.kernel.org/all/20250805190526.1453366-1-seanjc@google.com
> 
> v5:
>   - Add a patch to call security_perf_event_free() from __free_event()
>     instead of _free_event() (necessitated by the __cleanup() changes).
>   - Add CONFIG_PERF_GUEST_MEDIATED_PMU to guard the new perf functionality.
>   - Ensure the PMU is fully disabled in perf_{load,put}_guest_context() when
>     when switching between guest and host context. [Kan, Namhyung]
>   - Route the new system IRQ, PERF_GUEST_MEDIATED_PMI_VECTOR, through perf,
>     not KVM, and play nice with FRED.
>   - Rename and combine perf_{guest,host}_{enter,exit}() to a single set of
>     APIs, perf_{load,put}_guest_context().
>   - Rename perf_{get,put}_mediated_pmu() to perf_{create,release}_mediated_pmu()
>     to (hopefully) better differentiate them from perf_{load,put}_guest_context().
>   - Change the param to the load/put APIs from "u32 guest_lvtpc" to
>     "unsigned long data" to decouple arch code as much as possible.  E.g. if
>     a non-x86 arch were to ever support a mediated vPMU, @data could be used
>     to pass a pointer to a struct.
>   - Use pmu->version to detect if a vCPU has a mediated PMU.
>   - Use a kvm_x86_ops hook to check for mediated PMU support.
>   - Cull "passthrough" from as many places as I could find.
>   - Improve the changelog/documentation related to RDPMC interception.
>   - Check harware capabilities, not KVM capabilities, when calculating
>     MSR and RDPMC intercepts.
>   - Rework intercept (re)calculation to use a request and the existing (well,
>     will be existing as of 6.17-rc1) vendor hooks for recalculating intercepts.
>   - Always read PERF_GLOBAL_CTRL on VM-Exit if writes weren't intercepted while
>     running the vCPU.
>   - Call setup_vmcs_config() before kvm_x86_vendor_init() so that the golden
>     VMCS configuration is known before kvm_init_pmu_capability() is called.
>   - Keep as much refresh/init code in common x86 as possible.
>   - Context switch PMCs and event selectors in common x86, not vendor code.
>   - Bail from the VM-Exit fastpath if the guest is counting instructions
>     retired and the mediated PMU is enabled (because guest state hasn't yet
>     been synchronized with hardware).
>   - Don't require an userspace to opt-in via KVM_CAP_PMU_CAPABILITY, and instead
>     automatically "create" a mediated PMU on the first KVM_CREATE_VCPU call if
>     the VM has an in-kernel local APIC.
>   - Add entries in kernel-parameters.txt for the PMU params.
>   - Add a patch to elide PMC writes when possible.
>   - Many more fixups and tweaks...
> 
> v4:
>   - https://lore.kernel.org/all/20250324173121.1275209-1-mizhang@google.com
>   - Rebase whole patchset on 6.14-rc3 base.
>   - Address Peter's comments on Perf part.
>   - Address Sean's comments on KVM part.
>     * Change key word "passthrough" to "mediated" in all patches
>     * Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
>     * Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
>       save/retore list support for GLOBAL_CTRL, thus the support of mediated
>       vPMU is constrained to SapphireRapids and later CPUs on Intel side.
>     * Merge some small changes into a single patch.
>   - Address Sandipan's comment on invalid pmu pointer.
>   - Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
>     manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
> 
> v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com
> v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com
> v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.intel.com
> 
> Dapeng Mi (15):
>    KVM: x86/pmu: Start stubbing in mediated PMU support
>    KVM: x86/pmu: Implement Intel mediated PMU requirements and
>      constraints
>    KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
>    KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
>    KVM: VMX: Add helpers to toggle/change a bit in VMCS execution
>      controls
>    KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU
>    KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated
>      PMU
>    KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents
>    KVM: x86/pmu: Disable interception of select PMU MSRs for mediated
>      vPMUs
>    KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter
>      accesses
>    KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter
>      updates
>    KVM: x86/pmu: Load/put mediated PMU context when entering/exiting
>      guest
>    KVM: x86/pmu: Handle emulated instruction for mediated vPMU
>    KVM: nVMX: Add macros to simplify nested MSR interception setting
>    KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
> 
> Kan Liang (7):
>    perf: Skip pmu_ctx based on event_type
>    perf: Add generic exclude_guest support
>    perf: Add APIs to create/release mediated guest vPMUs
>    perf: Clean up perf ctx time
>    perf: Add a EVENT_GUEST flag
>    perf: Add APIs to load/put guest mediated PMU context
>    perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
> 
> Mingwei Zhang (3):
>    perf/x86/core: Plumb mediated PMU capability from x86_pmu to
>      x86_pmu_cap
>    KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering
>    KVM: nVMX: Disable PMU MSR interception as appropriate while running
>      L2
> 
> Sandipan Das (3):
>    perf/x86/core: Do not set bit width for unavailable counters
>    perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
>    KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on
>      AMD
> 
> Sean Christopherson (15):
>    perf: Move security_perf_event_free() call to __free_event()
>    perf: core/x86: Register a new vector for handling mediated guest PMIs
>    perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put
>      context
>    KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init()
>    KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs
>    KVM: Add a simplified wrapper for registering perf callbacks
>    KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities
>    KVM: x86/pmu: Implement AMD mediated PMU requirements
>    KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic
>      RECALC_INTERCEPTS
>    KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates
>    KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86
>    KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to
>      PMU v2+
>    KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are
>      active
>    KVM: nSVM: Disable PMU MSR interception as appropriate while running
>      L2
>    KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already
>      match
> 
> Xiong Zhang (1):
>    KVM: x86/pmu: Register PMI handler for mediated vPMU
> 
>   .../admin-guide/kernel-parameters.txt         |  49 ++
>   arch/arm64/kvm/arm.c                          |   2 +-
>   arch/loongarch/kvm/main.c                     |   2 +-
>   arch/riscv/kvm/main.c                         |   2 +-
>   arch/x86/entry/entry_fred.c                   |   1 +
>   arch/x86/events/amd/core.c                    |   2 +
>   arch/x86/events/core.c                        |  32 +-
>   arch/x86/events/intel/core.c                  |   5 +
>   arch/x86/include/asm/hardirq.h                |   3 +
>   arch/x86/include/asm/idtentry.h               |   6 +
>   arch/x86/include/asm/irq_vectors.h            |   4 +-
>   arch/x86/include/asm/kvm-x86-ops.h            |   2 +-
>   arch/x86/include/asm/kvm-x86-pmu-ops.h        |   4 +
>   arch/x86/include/asm/kvm_host.h               |   7 +-
>   arch/x86/include/asm/msr-index.h              |  17 +-
>   arch/x86/include/asm/perf_event.h             |   1 +
>   arch/x86/include/asm/vmx.h                    |   1 +
>   arch/x86/kernel/idt.c                         |   3 +
>   arch/x86/kernel/irq.c                         |  19 +
>   arch/x86/kvm/Kconfig                          |   1 +
>   arch/x86/kvm/cpuid.c                          |   2 +
>   arch/x86/kvm/pmu.c                            | 272 ++++++++-
>   arch/x86/kvm/pmu.h                            |  37 +-
>   arch/x86/kvm/svm/nested.c                     |  18 +-
>   arch/x86/kvm/svm/pmu.c                        |  51 +-
>   arch/x86/kvm/svm/svm.c                        |  54 +-
>   arch/x86/kvm/vmx/capabilities.h               |  11 +-
>   arch/x86/kvm/vmx/main.c                       |  14 +-
>   arch/x86/kvm/vmx/nested.c                     |  65 ++-
>   arch/x86/kvm/vmx/pmu_intel.c                  | 169 ++++--
>   arch/x86/kvm/vmx/pmu_intel.h                  |  15 +
>   arch/x86/kvm/vmx/vmx.c                        | 143 +++--
>   arch/x86/kvm/vmx/vmx.h                        |  11 +-
>   arch/x86/kvm/vmx/x86_ops.h                    |   2 +-
>   arch/x86/kvm/x86.c                            |  69 ++-
>   arch/x86/kvm/x86.h                            |   1 +
>   include/linux/kvm_host.h                      |  11 +-
>   include/linux/perf_event.h                    |  38 +-
>   init/Kconfig                                  |   4 +
>   kernel/events/core.c                          | 521 ++++++++++++++----
>   .../beauty/arch/x86/include/asm/irq_vectors.h |   3 +-
>   virt/kvm/kvm_main.c                           |   6 +-
>   42 files changed, 1385 insertions(+), 295 deletions(-)
> 
> 
> base-commit: 53d61a43a7973f812caa08fa922b607574befef4

Tested KUT/kselftest/Perf fuzzer/internal perf test suite on 4 Intel 
platforms: Sapphire Rapids (SPR), Granite Rapids (GNR), Sierra Forest 
(SRF), Clearwater Forest (CWF), no issue is found with both mediated 
vPMU and legacy perf-based vPMU.

All tests can be classified into 3 types: Bare Metal perf test, KVM 
guest perf test, and Host-Guest concurrent perf test.

* Bare Metal perf test
1. Perf Fuzzer run pass on 4 Intel platforms, SPR, GNR, SRF, CWF.
2. Internal perf test suite run pass on 4 Intel platforms, SPR, GNR, 
SRF, CWF.

* KVM guest perf test
1. KUT and KVM kselftest passed except for unspported features with 
result "Skip" [1][2].
                         | SPR      | GNR      | SRF      | CWF      |
legacy perf-based vPMU  |          |          |          |          |
KUT                     |          |          |          |          |
   pmu                   | Pass     | Pass     | Fail[3]  | Fail[3]  |
   pmu_lbr               | Skip[1]  | Skip[1]  | Skip[1]  | Skip[1]  |
   pmu_pebs              | Pass     | Fail[4]  | Fail[4]  | Skip[2]  |
KVM kselftest           |          |          |          |          |
   pmu_event_filter_test | Pass     | Pass     | Pass     | Pass     |
   vmx_pmu_caps_test     | Pass     | Pass     | Pass     | Pass     |
   pmu_counters_test     | Pass     | Pass     | Fail[5]  | Fail[5]  |
                         |          |          |          |          |
Mediated vPMU           |          |          |          |          |
KUT                     |          |          |          |          |
   pmu                   | Pass     | Pass     | Fail[3]  | Pass     |
   pmu_lbr               | Skip[1]  | Skip[1]  | Skip[1]  | Skip[1]  |
   pmu_pebs              | Skip[2]  | Skip[2]  | Skip[2]  | Skip[2]  |
KVM kselftest           |          |          |          |          |
   pmu_event_filter_test | Pass     | Pass     | Pass     | Pass     |
   vmx_pmu_caps_test     | Pass     | Pass     | Pass     | Pass     |
   pmu_counters_test     | Pass     | Pass     | Pass     | Pass     |

All failures above are known issues, which run pass with 3 patchset 
[3][4][5].

[1] Mediated vPMU based arch-LBR support is not upstreamed yet.
[2] Mediated vPMU based arch-PEBS support is not upstreamed yet.
[3] kvm-unit-tests: Fix pmu test errors on GNR/SRF/CWF 
https://lore.kernel.org/all/20250718013915.227452-1-dapeng1.mi@linux.intel.com/
[4] perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag 
https://lore.kernel.org/all/20250811090034.51249-5-dapeng1.mi@linux.intel.com/
[5] Fix PMU kselftests errors on GNR/SRF/CWF 
https://lore.kernel.org/all/20250718001905.196989-1-dapeng1.mi@linux.intel.com/

2. Guest run perf countering/sampling pass with both mediated vPMU and 
legacy perf-based vPMU on 4 Intel platforms, SPR, GNR, SRF, CWF.

3. Nested, L2 run perf pass with below combinations: (Mediated vPMU 1, 
legacy perf-based vPMU 0)
L0   | L1   | L2   |
0    | 0    | Pass |
1    | 0    | Pass |

Mediated vPMU can not be enabled on L1 guest since KVM still doesn't 
support vmx VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL yet and mediated vPMU 
depends on it.

* Host-Guest concurrent perf test
Host and guest run perf stress (record and counting) in parallel pass.

Tested-by: Xudong Hao <xudong.hao@intel.com>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v5 16/44] KVM: Add a simplified wrapper for registering perf callbacks
  2025-08-06 19:56 ` [PATCH v5 16/44] KVM: Add a simplified wrapper for registering perf callbacks Sean Christopherson
@ 2025-08-22 10:32   ` Anup Patel
  0 siblings, 0 replies; 65+ messages in thread
From: Anup Patel @ 2025-08-22 10:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li, H. Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Paolo Bonzini,
	linux-arm-kernel, kvmarm, kvm, loongarch, kvm-riscv, linux-riscv,
	linux-kernel, linux-perf-users, Kan Liang, Yongwei Ma,
	Mingwei Zhang, Xiong Zhang, Sandipan Das, Dapeng Mi

On Thu, Aug 7, 2025 at 1:27 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Add a parameter-less API for registering perf callbacks in anticipation of
> introducing another x86-only parameter for handling mediated PMU PMIs.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

For KVM RISC-V:
Acked-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  arch/arm64/kvm/arm.c      |  2 +-
>  arch/loongarch/kvm/main.c |  2 +-
>  arch/riscv/kvm/main.c     |  2 +-
>  arch/x86/kvm/x86.c        |  2 +-
>  include/linux/kvm_host.h  | 11 +++++++++--
>  virt/kvm/kvm_main.c       |  5 +++--
>  6 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 888f7c7abf54..6c604b5214f2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2328,7 +2328,7 @@ static int __init init_subsystems(void)
>         if (err)
>                 goto out;
>
> -       kvm_register_perf_callbacks(NULL);
> +       kvm_register_perf_callbacks();
>
>  out:
>         if (err)
> diff --git a/arch/loongarch/kvm/main.c b/arch/loongarch/kvm/main.c
> index 80ea63d465b8..f62326fe29fa 100644
> --- a/arch/loongarch/kvm/main.c
> +++ b/arch/loongarch/kvm/main.c
> @@ -394,7 +394,7 @@ static int kvm_loongarch_env_init(void)
>         }
>
>         kvm_init_gcsr_flag();
> -       kvm_register_perf_callbacks(NULL);
> +       kvm_register_perf_callbacks();
>
>         /* Register LoongArch IPI interrupt controller interface. */
>         ret = kvm_loongarch_register_ipi_device();
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 67c876de74ef..cbe842c2f615 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -159,7 +159,7 @@ static int __init riscv_kvm_init(void)
>                 kvm_info("AIA available with %d guest external interrupts\n",
>                          kvm_riscv_aia_nr_hgei);
>
> -       kvm_register_perf_callbacks(NULL);
> +       kvm_register_perf_callbacks();
>
>         rc = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
>         if (rc) {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5af2c5aed0f2..d80bbd5e0859 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9689,7 +9689,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>                 set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
>  #endif
>
> -       kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
> +       __kvm_register_perf_callbacks(ops->handle_intel_pt_intr, NULL);
>
>         if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled)
>                 kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 15656b7fba6c..20c50eaa0089 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1731,10 +1731,17 @@ static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
>  #ifdef CONFIG_GUEST_PERF_EVENTS
>  unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu);
>
> -void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void));
> +void __kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void),
> +                                  void (*mediated_pmi_handler)(void));
> +
> +static inline void kvm_register_perf_callbacks(void)
> +{
> +       __kvm_register_perf_callbacks(NULL, NULL);
> +}
> +
>  void kvm_unregister_perf_callbacks(void);
>  #else
> -static inline void kvm_register_perf_callbacks(void *ign) {}
> +static inline void kvm_register_perf_callbacks(void) {}
>  static inline void kvm_unregister_perf_callbacks(void) {}
>  #endif /* CONFIG_GUEST_PERF_EVENTS */
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ecafab2e17d9..d477a7fda0ae 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -6429,10 +6429,11 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>         .handle_mediated_pmi    = NULL,
>  };
>
> -void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void))
> +void __kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void),
> +                                  void (*mediated_pmi_handler)(void))
>  {
>         kvm_guest_cbs.handle_intel_pt_intr = pt_intr_handler;
> -       kvm_guest_cbs.handle_mediated_pmi = NULL;
> +       kvm_guest_cbs.handle_mediated_pmi = mediated_pmi_handler;
>
>         perf_register_guest_info_callbacks(&kvm_guest_cbs);
>  }
> --
> 2.50.1.565.gc32cd1483b-goog
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2025-08-22 15:29 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-06 19:56 [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 01/44] perf: Skip pmu_ctx based on event_type Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 02/44] perf: Add generic exclude_guest support Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 03/44] perf: Move security_perf_event_free() call to __free_event() Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 04/44] perf: Add APIs to create/release mediated guest vPMUs Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 05/44] perf: Clean up perf ctx time Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 06/44] perf: Add a EVENT_GUEST flag Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 07/44] perf: Add APIs to load/put guest mediated PMU context Sean Christopherson
2025-08-08  7:30   ` Mi, Dapeng
2025-08-06 19:56 ` [PATCH v5 08/44] perf: core/x86: Register a new vector for handling mediated guest PMIs Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context Sean Christopherson
2025-08-15 11:39   ` Peter Zijlstra
2025-08-15 15:41     ` Sean Christopherson
2025-08-15 15:55       ` Sean Christopherson
2025-08-18 14:32         ` Peter Zijlstra
2025-08-18 15:25           ` Sean Christopherson
2025-08-18 16:12             ` Peter Zijlstra
2025-08-18 20:07               ` Liang, Kan
2025-08-15 13:04   ` Peter Zijlstra
2025-08-15 15:51     ` Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 10/44] perf/x86/core: Do not set bit width for unavailable counters Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 11/44] perf/x86/core: Plumb mediated PMU capability from x86_pmu to x86_pmu_cap Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 12/44] perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 13/44] perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 14/44] KVM: VMX: Setup canonical VMCS config prior to kvm_x86_vendor_init() Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 15/44] KVM: SVM: Check pmu->version, not enable_pmu, when getting PMC MSRs Sean Christopherson
2025-08-13  9:58   ` Sandipan Das
2025-08-06 19:56 ` [PATCH v5 16/44] KVM: Add a simplified wrapper for registering perf callbacks Sean Christopherson
2025-08-22 10:32   ` Anup Patel
2025-08-06 19:56 ` [PATCH v5 17/44] KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities Sean Christopherson
2025-08-13  9:56   ` Sandipan Das
2025-08-06 19:56 ` [PATCH v5 18/44] KVM: x86/pmu: Start stubbing in mediated PMU support Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 19/44] KVM: x86/pmu: Implement Intel mediated PMU requirements and constraints Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 20/44] KVM: x86/pmu: Implement AMD mediated PMU requirements Sean Christopherson
2025-08-13  9:49   ` Sandipan Das
2025-08-06 19:56 ` [PATCH v5 21/44] KVM: x86/pmu: Register PMI handler for mediated vPMU Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 22/44] KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 23/44] KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 24/44] KVM: x86: Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 25/44] KVM: x86: Use KVM_REQ_RECALC_INTERCEPTS to react to CPUID updates Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 26/44] KVM: VMX: Add helpers to toggle/change a bit in VMCS execution controls Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 27/44] KVM: x86/pmu: Disable RDPMC interception for compatible mediated vPMU Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 28/44] KVM: x86/pmu: Load/save GLOBAL_CTRL via entry/exit fields for mediated PMU Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 29/44] KVM: x86/pmu: Use BIT_ULL() instead of open coded equivalents Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 30/44] KVM: x86/pmu: Move initialization of valid PMCs bitmask to common x86 Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 31/44] KVM: x86/pmu: Restrict GLOBAL_{CTRL,STATUS}, fixed PMCs, and PEBS to PMU v2+ Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 32/44] KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 33/44] KVM: x86/pmu: Bypass perf checks when emulating mediated PMU counter accesses Sean Christopherson
2025-08-13 10:01   ` Sandipan Das
2025-08-06 19:56 ` [PATCH v5 34/44] KVM: x86/pmu: Introduce eventsel_hw to prepare for pmu event filtering Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 35/44] KVM: x86/pmu: Reprogram mediated PMU event selectors on event filter updates Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 36/44] KVM: x86/pmu: Always stuff GuestOnly=1,HostOnly=0 for mediated PMCs on AMD Sean Christopherson
2025-08-06 19:56 ` [PATCH v5 37/44] KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest Sean Christopherson
2025-08-06 19:57 ` [PATCH v5 38/44] KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active Sean Christopherson
2025-08-13  9:53   ` Sandipan Das
2025-08-06 19:57 ` [PATCH v5 39/44] KVM: x86/pmu: Handle emulated instruction for mediated vPMU Sean Christopherson
2025-08-06 19:57 ` [PATCH v5 40/44] KVM: nVMX: Add macros to simplify nested MSR interception setting Sean Christopherson
2025-08-06 19:57 ` [PATCH v5 41/44] KVM: nVMX: Disable PMU MSR interception as appropriate while running L2 Sean Christopherson
2025-08-06 19:57 ` [PATCH v5 42/44] KVM: nSVM: " Sean Christopherson
2025-08-06 19:57 ` [PATCH v5 43/44] KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space Sean Christopherson
2025-08-06 19:57 ` [PATCH v5 44/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match Sean Christopherson
2025-08-08  8:28 ` [PATCH v5 00/44] KVM: x86: Add support for mediated vPMUs Mi, Dapeng
2025-08-08  8:35   ` Mi, Dapeng
2025-08-13  9:45 ` Sandipan Das
2025-08-22  8:12 ` Hao, Xudong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).