Kernel KVM virtualization development
* [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support
@ 2026-06-29 23:19 Zide Chen
  2026-06-29 23:19 ` [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

The Top-Down Microarchitecture Analysis (TMA) method is a structured
approach for identifying performance bottlenecks in out-of-order
processors.

Currently, guests support the TMA method by collecting Topdown events
using GP counters, which may trigger multiplexing.  To free up scarce
GP counters, eliminate multiplexing-induced skew, and obtain coherent
Topdown metric ratios, it is desirable to expose fixed counter 3 and
the IA32_PERF_METRICS MSR to guests.

Several attempts have been made to virtualize this under the legacy
vPMU model [1][2][3], but they were unsuccessful.  With the new mediated
vPMU, enabling TMA support in guests becomes much simpler.  It avoids
invasive changes to the perf core, eliminates CPU pinning and
fixed-counter affinity issues, and reduces the large overhead of
trapping and emulating MSR accesses.

[1] https://lore.kernel.org/kvm/20231031090613.2872700-1-dapeng1.mi@linux.intel.com/
[2] https://lore.kernel.org/all/20230927033124.1226509-1-dapeng1.mi@linux.intel.com/T/
[3] https://lwn.net/ml/linux-kernel/20221212125844.41157-1-likexu@tencent.com/

Tested on a Sapphire Rapids system. Without this series, only raw topdown.*_slots
events work in the guest, and metric events (e.g. cpu/topdown-bad-spec/) are
not available.

With this series, metric events are visible in the guest.  Run this
command on both host and guest:

$ perf stat --topdown --no-metric-only -- taskset -c 2 perf bench sched messaging

Host results:

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 1.500 [sec]

 Performance counter stats for 'taskset -c 2 perf bench sched messaging':

     4,266,060,558      TOPDOWN.SLOTS:u              #     32.0 %  tma_frontend_bound
                                                     #      5.2 %  tma_bad_speculation
       588,397,905      topdown-retiring:u           #     13.8 %  tma_retiring
                                                     #     49.0 %  tma_backend_bound
     1,376,283,990      topdown-fe-bound:u
     2,096,827,304      topdown-be-bound:u
       217,425,841      topdown-bad-spec:u
         5,050,520      INT_MISC.UOP_DROPPING:u

Rebased onto kvm-x86/next: 50406d35f563

v6 changes:
- patch 6/8: New patch to refactor rdpmc emulation code.
- patch 7/8: Stricter handling of the RDPMC ECX argument.
- patch 8/8: Move perf metrics out of test_arch_events().
- patch 2/8: Minor fix of comments.
v5 changes:
- patch 3,5,6/7: new patches to handle RDPMC on metrics.
- patch 6/7: remove host_initiated check.
v4 changes:
- patch 3/4: Remove WARN_ON_ONCE() and simply reject the guest accesses
  by checking host_initiated. (Sashiko)
- patch 3/4: Passthru MSR_PERF_METRICS only if has_mediated_pmu is
  true. (Sashiko)
v3 changes:
- patch 2/4: Move the non-contiguous counter filter code to pmu.c (Dapeng)
- patch 3/4: Replace WARN_ON() with WARN_ON_ONCE(). (Dapeng)
- patch 4/4: Replace abs() with explicit bounds (sum >= 0xfd && sum <= 0x102).
- Minor comment cleanups.

v2 changes:
- As suggested by Dapeng, implement a new selftest patch.
- Don't advertise fixed counter 3 if the host doesn't support it.
- Minor change in patch 1 to remove a magic number.

v5:
https://lore.kernel.org/kvm/20260625034555.141453-1-zide.chen@intel.com/
v4:
https://lore.kernel.org/kvm/20260623041927.178256-1-zide.chen@intel.com/
QEMU:
https://lore.kernel.org/qemu-devel/20260604025546.19378-7-zide.chen@intel.com/

Dapeng Mi (2):
  KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
  KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU

Mingwei Zhang (1):
  KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host

Zide Chen (5):
  KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
  KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h
  KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks
  KVM: x86/pmu: Emulate RDPMC on performance metrics
  KVM: selftests: Add PERF_METRICS and fixed counter 3 tests

 arch/x86/include/asm/kvm-x86-pmu-ops.h        |  2 +-
 arch/x86/include/asm/kvm_host.h               |  4 +-
 arch/x86/include/asm/msr-index.h              |  1 +
 arch/x86/include/asm/perf_event.h             |  1 +
 arch/x86/kvm/msrs.c                           | 10 +-
 arch/x86/kvm/pmu.c                            | 37 +++++--
 arch/x86/kvm/pmu.h                            | 16 ++-
 arch/x86/kvm/svm/pmu.c                        | 13 ++-
 arch/x86/kvm/vmx/pmu_intel.c                  | 99 +++++++++++++------
 arch/x86/kvm/vmx/pmu_intel.h                  | 10 +-
 arch/x86/kvm/vmx/vmx.c                        | 15 +--
 arch/x86/kvm/x86.c                            |  4 +
 tools/arch/x86/include/asm/msr-index.h        |  1 +
 tools/testing/selftests/kvm/include/x86/pmu.h |  3 +
 .../selftests/kvm/x86/pmu_counters_test.c     | 72 +++++++++++++-
 15 files changed, 220 insertions(+), 68 deletions(-)

-- 
2.54.0


* [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:13   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 2/8] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

Only fixed counters 0..2 have matching generic cross-platform
hardware perf events (INSTRUCTIONS, CPU_CYCLES, REF_CPU_CYCLES).
Therefore, perf_get_hw_event_config() is only applicable to these
counters.

KVM does not intend to emulate fixed counters >= 3 on legacy
(non-mediated) vPMU, while for mediated vPMU, KVM does not care what
the fixed counter event mappings are.  Therefore, return 0 for their
eventsel.
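
For illustration only, a minimal userspace model of the early-return
logic described above. The MODEL_HW_* enum is a hypothetical stand-in
for the relevant enum perf_hw_id values, and "no generic event" is
modeled as 0, matching the eventsel of 0 that the patch returns.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for enum perf_hw_id; 0 means "no generic event". */
enum model_hw_id {
	MODEL_HW_NONE = 0,
	MODEL_HW_INSTRUCTIONS,
	MODEL_HW_CPU_CYCLES,
	MODEL_HW_REF_CPU_CYCLES,
};

static_assert(MODEL_HW_NONE == 0, "unmapped counters must report 0");

/*
 * Only fixed counters 0..2 map to a generic event. Any higher index
 * (e.g. fixed counter 3, TOPDOWN.SLOTS) takes the early return instead
 * of tripping a build-time assertion.
 */
static enum model_hw_id model_fixed_pmc_generic_id(unsigned int index)
{
	static const enum model_hw_id fixed_pmc_ids[] = {
		[0] = MODEL_HW_INSTRUCTIONS,
		[1] = MODEL_HW_CPU_CYCLES,
		[2] = MODEL_HW_REF_CPU_CYCLES,
	};

	if (index >= ARRAY_SIZE(fixed_pmc_ids))
		return MODEL_HW_NONE;

	return fixed_pmc_ids[index];
}
```

Since the bound is a runtime check against ARRAY_SIZE(), the function no
longer needs to be forcibly inlined for @index to be known at build time.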

The two BUILD_BUG_ON() checks are no longer needed, so drop them along
with __always_inline.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v6:
- Rearrange the code to return early, for clarity.
v2:
- Replace 3 in "if (index < 3)" with ARRAY_SIZE(fixed_pmc_perf_ids).
---
 arch/x86/kvm/vmx/pmu_intel.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index a73a9515d96c..f15af497d27f 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -464,11 +464,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  * different perf_event is already utilizing the requested counter, but the end
  * result is the same (ignoring the fact that using a general purpose counter
  * will likely exacerbate counter contention).
- *
- * Forcibly inlined to allow asserting on @index at build time, and there should
- * never be more than one user.
  */
-static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
+static u64 intel_get_fixed_pmc_eventsel(unsigned int index)
 {
 	const enum perf_hw_id fixed_pmc_perf_ids[] = {
 		[0] = PERF_COUNT_HW_INSTRUCTIONS,
@@ -477,8 +474,13 @@ static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
 	};
 	u64 eventsel;
 
-	BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
-	BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
+	/*
+	 * Fixed counters 3 and above don't have a corresponding generic
+	 * hardware perf event, and KVM does not intend to emulate them on
+	 * non-mediated vPMU.
+	 */
+	if (index >= ARRAY_SIZE(fixed_pmc_perf_ids))
+		return 0;
 
 	/*
 	 * Yell if perf reports support for a fixed counter but perf doesn't
-- 
2.54.0


* [PATCH v6 2/8] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
  2026-06-29 23:19 ` [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:16   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 3/8] KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h Zide Chen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Starting with Ice Lake, Intel introduced fixed counter 3, which counts
TOPDOWN.SLOTS - the number of available slots for an unhalted logical
processor.  It serves as the denominator for top-level metrics in the
Top-down Microarchitecture Analysis method.
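
As a worked illustration of the denominator role, the sketch below turns
a TOPDOWN.SLOTS count and a raw IA32_PERF_METRICS value into per-category
slot counts. It assumes the level-1 byte layout from the Intel SDM (bits
7:0 Retiring, 15:8 Bad Speculation, 23:16 Frontend Bound, 31:24 Backend
Bound, each a fraction of 0xff); the helper names are hypothetical and
this is not KVM or perf code.

```c
#include <assert.h>
#include <stdint.h>

/* Level-1 fields of IA32_PERF_METRICS (assumed SDM layout, one byte each). */
enum tma_l1_field {
	TMA_RETIRING = 0,
	TMA_BAD_SPEC = 1,
	TMA_FE_BOUND = 2,
	TMA_BE_BOUND = 3,
};

/* Extract the 8-bit fraction (out of 0xff) for one level-1 category. */
static unsigned int tma_l1_fraction(uint64_t perf_metrics, enum tma_l1_field f)
{
	assert(f <= TMA_BE_BOUND);
	return (unsigned int)((perf_metrics >> (8 * f)) & 0xff);
}

/*
 * Scale a fraction by the fixed counter 3 (TOPDOWN.SLOTS) value.
 * Multiply before dividing to keep precision; this overflows only if
 * slots exceeds UINT64_MAX / 0xff.
 */
static uint64_t tma_l1_slots(uint64_t slots, uint64_t perf_metrics,
			     enum tma_l1_field f)
{
	return slots * tma_l1_fraction(perf_metrics, f) / 0xff;
}
```

For example, with 255000 slots and a PERF_METRICS value of 0x7d520d23,
Backend Bound is 0x7d/0xff of the slots, i.e. 125000. Because each field
is rounded independently by hardware, the four fractions are expected to
sum only approximately to 0xff.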

Emulating this counter on legacy vPMU would require introducing a new
generic perf encoding for the Intel-specific TOPDOWN.SLOTS event in
order to call perf_get_hw_event_config().  This is undesirable as it
would pollute the generic perf event encoding.

Moreover, KVM does not intend to emulate IA32_PERF_METRICS in the
legacy vPMU model, and without IA32_PERF_METRICS, emulating this
counter has little practical value.  Therefore, expose fixed counter
3 to guests only when mediated vPMU is enabled.
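
A userspace model of how the advertised fixed-counter count ends up
clamped by this patch's pmu.c hunk. It assumes CPUID.0AH:EDX[4:0]
reports the number of contiguous fixed counters; the function, its
parameters, and the MODEL_* constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_NR_INTEL_FIXED_COUNTERS	4	/* new KVM limit */
#define MODEL_NR_LEGACY_FIXED_COUNTERS		3	/* fixed counters 0..2 */

static_assert(MODEL_NR_LEGACY_FIXED_COUNTERS < MODEL_MAX_NR_INTEL_FIXED_COUNTERS,
	      "legacy vPMU must expose fewer fixed counters than the maximum");

/*
 * Model of the clamping in kvm_init_pmu_capability():
 *  1. never exceed KVM's compile-time maximum;
 *  2. on PMU v5+, keep only the contiguous counters (CPUID.0AH:EDX[4:0]);
 *  3. without mediated vPMU, hide fixed counter 3 and above.
 */
static unsigned int model_nr_fixed_counters(unsigned int host_nr_fixed,
					    unsigned int pmu_version,
					    uint32_t cpuid_0a_edx,
					    bool mediated_pmu)
{
	unsigned int nr = host_nr_fixed;
	unsigned int contiguous = cpuid_0a_edx & 0x1f;	/* EDX[4:0] */

	if (nr > MODEL_MAX_NR_INTEL_FIXED_COUNTERS)
		nr = MODEL_MAX_NR_INTEL_FIXED_COUNTERS;

	if (pmu_version >= 5 && nr > contiguous)
		nr = contiguous;

	if (!mediated_pmu && nr > MODEL_NR_LEGACY_FIXED_COUNTERS)
		nr = MODEL_NR_LEGACY_FIXED_COUNTERS;

	return nr;
}
```

With four contiguous fixed counters, a mediated vPMU keeps all four,
while the legacy vPMU still sees only three.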

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v6:
- Update comments to replace 2 with KVM_MAX_NR_INTEL_FIXED_COUNTERS - 1.
v3:
- Move the non-contiguous counter filter code to pmu.c
v2:
- Don't advertise fixed counter 3 to userspace if the host doesn't
  support it.
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/msrs.c             |  4 ++--
 arch/x86/kvm/pmu.c              | 18 +++++++++++++++++-
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d8700eb848b4..dc9e4e8bfc07 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -609,7 +609,7 @@ struct kvm_pmc {
 #define KVM_MAX_NR_GP_COUNTERS		KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
 						KVM_MAX_NR_AMD_GP_COUNTERS)
 
-#define KVM_MAX_NR_INTEL_FIXED_COUNTERS	3
+#define KVM_MAX_NR_INTEL_FIXED_COUNTERS	4
 #define KVM_MAX_NR_AMD_FIXED_COUNTERS	0
 #define KVM_MAX_NR_FIXED_COUNTERS	KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
 						KVM_MAX_NR_AMD_FIXED_COUNTERS)
diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
index c230b18d87e3..3bf42d90ad14 100644
--- a/arch/x86/kvm/msrs.c
+++ b/arch/x86/kvm/msrs.c
@@ -228,7 +228,7 @@ static const u32 msrs_to_save_base[] = {
 
 static const u32 msrs_to_save_pmu[] = {
 	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
-	MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+	MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
 	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
 	MSR_CORE_PERF_GLOBAL_CTRL,
 	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
@@ -2688,7 +2688,7 @@ void kvm_init_msr_lists(void)
 {
 	unsigned i;
 
-	BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
+	BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 4,
 			 "Please update the fixed PMCs in msrs_to_save_pmu[]");
 
 	num_msrs_to_save = 0;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 62d0ed99ebe9..f82ba63767d0 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -99,7 +99,8 @@ static const struct x86_cpu_id vmx_pebs_pdist_cpu[] = {
  *        all perf counters (both gp and fixed). The mapping relationship
  *        between pmc and perf counters is as the following:
  *        * Intel: [0 .. KVM_MAX_NR_INTEL_GP_COUNTERS-1] <=> gp counters
- *                 [KVM_FIXED_PMC_BASE_IDX .. KVM_FIXED_PMC_BASE_IDX + 2] <=> fixed
+ *                 [KVM_FIXED_PMC_BASE_IDX .. KVM_FIXED_PMC_BASE_IDX +
+ *                  KVM_MAX_NR_INTEL_FIXED_COUNTERS - 1] <=> fixed
  *        * AMD:   [0 .. AMD64_NUM_COUNTERS-1] and, for families 15H
  *          and later, [0 .. AMD64_NUM_COUNTERS_CORE-1] <=> gp counters
  */
@@ -134,6 +135,8 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
 {
 	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
+	union cpuid10_edx edx;
+	u32 eax, ebx, ecx;
 
 	/*
 	 * Hybrid PMUs don't play nice with virtualization without careful
@@ -181,6 +184,19 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
 	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
 					     KVM_MAX_NR_FIXED_COUNTERS);
 
+	/*
+	 * Currently, KVM doesn't support non-contiguous fixed counters; make
+	 * sure only contiguous ones are retained in kvm_pmu_cap.
+	 */
+	if (kvm_host_pmu.version >= 5) {
+		cpuid(0xa, &eax, &ebx, &ecx, &edx.full);
+		if (kvm_pmu_cap.num_counters_fixed > edx.split.num_counters_fixed)
+			kvm_pmu_cap.num_counters_fixed = edx.split.num_counters_fixed;
+	}
+
+	if (!enable_mediated_pmu && kvm_pmu_cap.num_counters_fixed > 3)
+		kvm_pmu_cap.num_counters_fixed = 3;
+
 	kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
 		perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
 	kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
-- 
2.54.0


* [PATCH v6 3/8] KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
  2026-06-29 23:19 ` [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
  2026-06-29 23:19 ` [PATCH v6 2/8] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:18   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 4/8] KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host Zide Chen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

Move vcpu_get_perf_capabilities() to pmu.h in preparation for calling it
from common x86 code, for example kvm_need_rdpmc_intercept(), to check
the guest's PERF_METRICS capability.

Rename it to kvm_vcpu_get_perf_caps() to indicate that it's part of
the common API, and shorten _capabilities to _caps.

No functional change intended.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v5: new patch.
---
 arch/x86/kvm/pmu.h           |  8 ++++++++
 arch/x86/kvm/vmx/pmu_intel.c |  6 +++---
 arch/x86/kvm/vmx/pmu_intel.h | 10 +---------
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index a5821d7c87f9..1b2f66a2e915 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -271,6 +271,14 @@ static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
 				  X86_PMC_IDX_MAX);
 }
 
+static inline u64 kvm_vcpu_get_perf_caps(struct kvm_vcpu *vcpu)
+{
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+		return 0;
+
+	return vcpu->arch.perf_capabilities;
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f15af497d27f..e426ddc8add4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -189,13 +189,13 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 		return kvm_pmu_has_perf_global_ctrl(pmu);
 	case MSR_IA32_PEBS_ENABLE:
-		ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
+		ret = kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_PEBS_FORMAT;
 		break;
 	case MSR_IA32_DS_AREA:
 		ret = guest_cpu_cap_has(vcpu, X86_FEATURE_DS);
 		break;
 	case MSR_PEBS_DATA_CFG:
-		perf_capabilities = vcpu_get_perf_capabilities(vcpu);
+		perf_capabilities = kvm_vcpu_get_perf_caps(vcpu);
 		ret = (perf_capabilities & PERF_CAP_PEBS_BASELINE) &&
 			((perf_capabilities & PERF_CAP_PEBS_FORMAT) > 3);
 		break;
@@ -550,7 +550,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
 	}
 
-	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
+	perf_capabilities = kvm_vcpu_get_perf_caps(vcpu);
 	if (intel_pmu_lbr_is_compatible(vcpu) &&
 	    (perf_capabilities & PERF_CAP_LBR_FMT))
 		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
index 5d9357640aa1..afdbbc9991d6 100644
--- a/arch/x86/kvm/vmx/pmu_intel.h
+++ b/arch/x86/kvm/vmx/pmu_intel.h
@@ -6,17 +6,9 @@
 
 #include "cpuid.h"
 
-static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
-{
-	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
-		return 0;
-
-	return vcpu->arch.perf_capabilities;
-}
-
 static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
+	return (kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_FW_WRITES) != 0;
 }
 
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
-- 
2.54.0


* [PATCH v6 4/8] KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
                   ` (2 preceding siblings ...)
  2026-06-29 23:19 ` [PATCH v6 3/8] KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:19   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 5/8] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU Zide Chen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

From: Mingwei Zhang <mizhang@google.com>

Cache an unmodified snapshot of the host's IA32_PERF_CAPABILITIES so that
KVM can compare guest vPMU capabilities against raw hardware capabilities.

For example, if the host supports PERF_METRICS but it is not configured
for the guest, KVM can use the snapshot to determine that RDPMC accesses must be
intercepted.
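
A minimal sketch of the comparison this snapshot enables. It assumes,
as the paragraph above suggests, that RDPMC must be intercepted whenever
the host has PERF_METRICS but the guest was not given it; the helper
name is hypothetical, and the real kvm_need_rdpmc_intercept() likely
considers other conditions as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERF_CAP_PERF_METRICS	(1ULL << 15)	/* IA32_PERF_CAPABILITIES[15] */

static_assert(PERF_CAP_PERF_METRICS == 0x8000ULL, "bit 15");

/*
 * If hardware exposes PERF_METRICS but the guest's capabilities do not,
 * a non-intercepted RDPMC could presumably still read the metrics, so
 * KVM would need to trap and emulate it.
 */
static bool model_metrics_need_rdpmc_intercept(uint64_t host_perf_caps,
						uint64_t guest_perf_caps)
{
	return (host_perf_caps & PERF_CAP_PERF_METRICS) &&
	       !(guest_perf_caps & PERF_CAP_PERF_METRICS);
}
```

The guest value alone cannot answer this question, which is why the raw
host value has to be cached before any vendor code trims it.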

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v5: new patch.
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx/vmx.c          | 8 ++------
 arch/x86/kvm/x86.c              | 4 ++++
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc9e4e8bfc07..80f638588bf7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -347,6 +347,7 @@ struct kvm_host_values {
 	u64 xss;
 	u64 s_cet;
 	u64 arch_capabilities;
+	u64 perf_capabilities;
 };
 extern struct kvm_host_values kvm_host;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index aded7039bd3e..b736b9ff965b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8050,14 +8050,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 static __init u64 vmx_get_perf_capabilities(void)
 {
 	u64 perf_cap = PERF_CAP_FW_WRITES;
-	u64 host_perf_cap = 0;
 
 	if (!enable_pmu)
 		return 0;
 
-	if (boot_cpu_has(X86_FEATURE_PDCM))
-		rdmsrq(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
-
 	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR) &&
 	    !enable_mediated_pmu) {
 		x86_perf_get_lbr(&vmx_lbr_caps);
@@ -8070,11 +8066,11 @@ static __init u64 vmx_get_perf_capabilities(void)
 		if (!vmx_lbr_caps.has_callstack)
 			memset(&vmx_lbr_caps, 0, sizeof(vmx_lbr_caps));
 		else if (vmx_lbr_caps.nr)
-			perf_cap |= host_perf_cap & PERF_CAP_LBR_FMT;
+			perf_cap |= kvm_host.perf_capabilities & PERF_CAP_LBR_FMT;
 	}
 
 	if (vmx_pebs_supported()) {
-		perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
+		perf_cap |= kvm_host.perf_capabilities & PERF_CAP_PEBS_MASK;
 
 		/*
 		 * Disallow adaptive PEBS as it is functionally broken, can be
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8dbc0fa302a8..8e775855f9be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7032,6 +7032,10 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
 		rdmsrq(MSR_IA32_ARCH_CAPABILITIES, kvm_host.arch_capabilities);
 
+	if (boot_cpu_has(X86_FEATURE_PDCM))
+		rdmsrq_safe(MSR_IA32_PERF_CAPABILITIES,
+			    &kvm_host.perf_capabilities);
+
 	WARN_ON_ONCE(kvm_nr_uret_msrs);
 
 	r = ops->hardware_setup();
-- 
2.54.0


* [PATCH v6 5/8] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
                   ` (3 preceding siblings ...)
  2026-06-29 23:19 ` [PATCH v6 4/8] KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:20   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 6/8] KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks Zide Chen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

Bit 15 in IA32_PERF_CAPABILITIES indicates that the CPU provides
built-in support for Topdown Microarchitecture Analysis (TMA) L1
metrics via the IA32_PERF_METRICS MSR.

Expose this capability only when mediated vPMU is enabled, as emulating
IA32_PERF_METRICS in the legacy vPMU model is impractical.

Pass IA32_PERF_METRICS through to the guest only when mediated vPMU is
enabled and bit 15 is set in guest IA32_PERF_CAPABILITIES.  Allow
kvm_pmu_{get,set}_msr() to handle this MSR for host accesses.
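
The pass-through condition can be modeled as a small predicate. This is
a sketch of the logic in vmx_recalc_pmu_msr_intercepts() and
kvm_vcpu_has_perf_metrics() from the diff below, with hypothetical
names; like kvm_vcpu_get_perf_caps(), it treats the guest's
PERF_CAPABILITIES as 0 when CPUID.PDCM is not exposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PERF_CAP_PERF_METRICS	(1ULL << 15)

/* Guest-visible PERF_CAPABILITIES; reads as 0 without CPUID.PDCM. */
static uint64_t model_guest_perf_caps(bool guest_has_pdcm, uint64_t perf_caps)
{
	return guest_has_pdcm ? perf_caps : 0;
}

/* Returns true if guest RDMSR/WRMSR of IA32_PERF_METRICS must be trapped. */
static bool model_intercept_perf_metrics(bool has_mediated_pmu,
					 bool guest_has_pdcm,
					 uint64_t guest_perf_caps)
{
	uint64_t caps = model_guest_perf_caps(guest_has_pdcm, guest_perf_caps);

	return !has_mediated_pmu || !(caps & MODEL_PERF_CAP_PERF_METRICS);
}
```

Only the combination of mediated vPMU, PDCM, and bit 15 disables the
intercept; every other case keeps the MSR trapped.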

Save and restore this MSR on host/guest PMU context switches so that
host PMU activity does not clobber the guest value, and guest state
is not leaked into the host.
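
A userspace model of the save/restore ordering in
intel_mediated_pmu_load() and intel_mediated_pmu_put(). The structs are
hypothetical stand-ins for the physical register and for
pmu->perf_metrics; the kernel reads the live value with
rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS), modeled here as a plain load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the physical IA32_PERF_METRICS register. */
struct model_cpu {
	uint64_t perf_metrics;
};

/* Hypothetical stand-in for the per-vCPU saved state. */
struct model_vcpu_pmu {
	bool has_perf_metrics;
	uint64_t perf_metrics;
};

/* Guest entry: install the guest's saved metrics in hardware. */
static void model_pmu_load(struct model_cpu *cpu,
			   const struct model_vcpu_pmu *pmu)
{
	if (pmu->has_perf_metrics)
		cpu->perf_metrics = pmu->perf_metrics;
}

/*
 * Guest exit: capture the guest's metrics, then clear the register so the
 * host never observes guest state. The clear is skipped when the value is
 * already zero, as the patch does, presumably to avoid a needless WRMSR.
 */
static void model_pmu_put(struct model_cpu *cpu, struct model_vcpu_pmu *pmu)
{
	if (!pmu->has_perf_metrics)
		return;

	pmu->perf_metrics = cpu->perf_metrics;	/* kernel uses RDPMC here */
	if (pmu->perf_metrics)
		cpu->perf_metrics = 0;
}
```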

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v5:
- Remove host_initiated check in set/get MSR handlers.
v4:
- Remove WARN_ON_ONCE() and simply reject the guest accesses by checking
  host_initiated. (Sashiko)
- Passthru MSR_PERF_METRICS only if has_mediated_pmu is true. (Sashiko)
- Remove the redundant !! in vcpu_has_perf_metrics().
v3:
- Replace WARN_ON() with WARN_ON_ONCE(). (Dapeng)
- Add comments to explain why we don't validate writes on PERF_METRICS.
---
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/include/asm/msr-index.h  |  1 +
 arch/x86/include/asm/perf_event.h |  1 +
 arch/x86/kvm/msrs.c               |  6 +++++-
 arch/x86/kvm/pmu.h                |  5 +++++
 arch/x86/kvm/vmx/pmu_intel.c      | 31 +++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c            |  7 +++++++
 7 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 80f638588bf7..96376d8a5199 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -630,6 +630,7 @@ struct kvm_pmu {
 	u64 global_status_rsvd;
 	u64 reserved_bits;
 	u64 raw_event_mask;
+	u64 perf_metrics;
 	struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS];
 	struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS];
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18c4be75e927..fdcaeb6c8352 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -331,6 +331,7 @@
 #define PERF_CAP_PEBS_FORMAT		0xf00
 #define PERF_CAP_FW_WRITES		BIT_ULL(13)
 #define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
+#define PERF_CAP_PERF_METRICS		BIT_ULL(15)
 #define PERF_CAP_PEBS_TIMING_INFO	BIT_ULL(17)
 #define PERF_CAP_PEBS_MASK		(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
 					 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 1eb13673e889..bc2e1cbcd9b9 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -447,6 +447,7 @@ static inline bool is_topdown_idx(int idx)
 #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT	54
 #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD	BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
 #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
+#define GLOBAL_STATUS_PERF_METRICS_OVF		BIT_ULL(GLOBAL_STATUS_PERF_METRICS_OVF_BIT)
 
 #define GLOBAL_CTRL_EN_PERF_METRICS		BIT_ULL(48)
 /*
diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
index 3bf42d90ad14..c751a8dbd45d 100644
--- a/arch/x86/kvm/msrs.c
+++ b/arch/x86/kvm/msrs.c
@@ -230,7 +230,7 @@ static const u32 msrs_to_save_pmu[] = {
 	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
 	MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
 	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
-	MSR_CORE_PERF_GLOBAL_CTRL,
+	MSR_CORE_PERF_GLOBAL_CTRL, MSR_PERF_METRICS,
 	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
 
 	/* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
@@ -2625,6 +2625,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		     intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
 			return;
 		break;
+	case MSR_PERF_METRICS:
+		if (!(kvm_caps.supported_perf_cap & PERF_CAP_PERF_METRICS))
+			return;
+		break;
 	case MSR_ARCH_PERFMON_PERFCTR0 ...
 	     MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
 		if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 1b2f66a2e915..3066cade5790 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -279,6 +279,11 @@ static inline u64 kvm_vcpu_get_perf_caps(struct kvm_vcpu *vcpu)
 	return vcpu->arch.perf_capabilities;
 }
 
+static inline bool kvm_vcpu_has_perf_metrics(struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_PERF_METRICS;
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e426ddc8add4..225afd3937c3 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -188,6 +188,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	switch (msr) {
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 		return kvm_pmu_has_perf_global_ctrl(pmu);
+	case MSR_PERF_METRICS:
+		return kvm_vcpu_has_perf_metrics(vcpu);
 	case MSR_IA32_PEBS_ENABLE:
 		ret = kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_PEBS_FORMAT;
 		break;
@@ -345,6 +347,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 		msr_info->data = pmu->fixed_ctr_ctrl;
 		break;
+	case MSR_PERF_METRICS:
+		msr_info->data = pmu->perf_metrics;
+		break;
 	case MSR_IA32_PEBS_ENABLE:
 		msr_info->data = pmu->pebs_enable;
 		break;
@@ -394,6 +399,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (pmu->fixed_ctr_ctrl != data)
 			reprogram_fixed_counters(pmu, data);
 		break;
+	case MSR_PERF_METRICS:
+		/*
+		 * On platforms that support only hardware level-1, bits [63:32]
+		 * are reserved and ignored by hardware. If hardware level-2 is also
+		 * supported, they may contain valid metric data.
+		 * Either way, guest writes are passed through verbatim.
+		 */
+		pmu->perf_metrics = data;
+		break;
 	case MSR_IA32_PEBS_ENABLE:
 		if (data & pmu->pebs_enable_rsvd)
 			return 1;
@@ -589,6 +603,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		pmu->global_status_rsvd &=
 				~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
 
+	if (perf_capabilities & PERF_CAP_PERF_METRICS) {
+		pmu->global_ctrl_rsvd &= ~GLOBAL_CTRL_EN_PERF_METRICS;
+		pmu->global_status_rsvd &= ~GLOBAL_STATUS_PERF_METRICS_OVF;
+	}
+
 	if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
 		if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
 			pmu->pebs_enable_rsvd = counter_rsvd;
@@ -632,6 +651,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 
 static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 {
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	pmu->perf_metrics = 0;
 	intel_pmu_release_guest_lbr_event(vcpu);
 }
 
@@ -803,6 +825,9 @@ static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	u64 global_status, toggle;
 
+	if (kvm_vcpu_has_perf_metrics(vcpu))
+		wrmsrq(MSR_PERF_METRICS, pmu->perf_metrics);
+
 	rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
 	toggle = pmu->global_status ^ global_status;
 	if (global_status & toggle)
@@ -831,6 +856,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
 	 */
 	if (pmu->fixed_ctr_ctrl_hw)
 		wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+
+	if (kvm_vcpu_has_perf_metrics(vcpu)) {
+		pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
+		if (pmu->perf_metrics)
+			wrmsrq(MSR_PERF_METRICS, 0);
+	}
 }
 
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b736b9ff965b..21eb4b339fa6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4273,6 +4273,10 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
 				  MSR_TYPE_RW, intercept);
 	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
 				  MSR_TYPE_RW, intercept);
+
+	intercept = !has_mediated_pmu || !kvm_vcpu_has_perf_metrics(vcpu);
+	vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS,
+				  MSR_TYPE_RW, intercept);
 }
 
 static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
@@ -8095,6 +8099,9 @@ static __init u64 vmx_get_perf_capabilities(void)
 		perf_cap &= ~PERF_CAP_PEBS_BASELINE;
 	}
 
+	if (enable_mediated_pmu)
+		perf_cap |= kvm_host.perf_capabilities & PERF_CAP_PERF_METRICS;
+
 	return perf_cap;
 }
 
-- 
2.54.0


* [PATCH v6 6/8] KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
                   ` (4 preceding siblings ...)
  2026-06-29 23:19 ` [PATCH v6 5/8] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:23   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 7/8] KVM: x86/pmu: Emulate RDPMC on performance metrics Zide Chen
  2026-06-29 23:19 ` [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests Zide Chen
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

The current RDPMC emulation splits responsibility: rdpmc_ecx_to_pmc()
in each vendor returns a kvm_pmc, then common code calls
pmc_read_counter().

This design cannot support RDPMC reads that don't map to a counter,
such as PERF_METRICS on Intel platforms.

Replace rdpmc_ecx_to_pmc() with emulate_rdpmc(), which takes full
ownership of the emulation and writes the result directly into @data.

Also drop the now-redundant counter bitmask that intel_rdpmc_ecx_to_pmc()
returned through @mask, since pmc_read_counter() already applies the
counter's bit-width mask.

No functional change intended.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v6: new patch.
---
 arch/x86/include/asm/kvm-x86-pmu-ops.h |  2 +-
 arch/x86/kvm/pmu.c                     |  9 +--------
 arch/x86/kvm/pmu.h                     |  4 ++--
 arch/x86/kvm/svm/pmu.c                 | 13 +++++++++----
 arch/x86/kvm/vmx/pmu_intel.c           | 25 ++++++++++++-------------
 5 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index 4a223c2793e3..4b50ed058aed 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -13,7 +13,7 @@
  * KVM_X86_PMU_OP_OPTIONAL() can be used for those functions that can have
  * a NULL definition.
  */
-KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
+KVM_X86_PMU_OP(emulate_rdpmc)
 KVM_X86_PMU_OP(msr_idx_to_pmc)
 KVM_X86_PMU_OP_OPTIONAL(check_rdpmc_early)
 KVM_X86_PMU_OP(is_valid_msr)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index f82ba63767d0..8ef2d4761790 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -768,8 +768,6 @@ static int kvm_pmu_rdpmc_vmware(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	struct kvm_pmc *pmc;
-	u64 mask = ~0ull;
 
 	if (!pmu->version)
 		return 1;
@@ -777,17 +775,12 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	if (is_vmware_backdoor_pmc(idx))
 		return kvm_pmu_rdpmc_vmware(vcpu, idx, data);
 
-	pmc = kvm_pmu_call(rdpmc_ecx_to_pmc)(vcpu, idx, &mask);
-	if (!pmc)
-		return 1;
-
 	if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_PCE) &&
 	    (kvm_x86_call(get_cpl)(vcpu) != 0) &&
 	    kvm_is_cr0_bit_set(vcpu, X86_CR0_PE))
 		return 1;
 
-	*data = pmc_read_counter(pmc) & mask;
-	return 0;
+	return kvm_pmu_call(emulate_rdpmc)(vcpu, idx, data);
 }
 
 static bool kvm_need_any_pmc_intercept(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 3066cade5790..cdbefda844b9 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -24,8 +24,8 @@
 #define KVM_FIXED_PMC_BASE_IDX INTEL_PMC_IDX_FIXED
 
 struct kvm_pmu_ops {
-	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
-		unsigned int idx, u64 *mask);
+	int (*emulate_rdpmc)(struct kvm_vcpu *vcpu, unsigned int idx,
+			     u64 *data);
 	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*check_rdpmc_early)(struct kvm_vcpu *vcpu, unsigned int idx);
 	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index c18286545a7a..0517fd4bbcd7 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -84,10 +84,15 @@ static int amd_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
 }
 
 /* idx is the ECX register of RDPMC instruction */
-static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
-	unsigned int idx, u64 *mask)
+static int amd_emulate_rdpmc(struct kvm_vcpu *vcpu, unsigned int idx, u64 *data)
 {
-	return amd_pmu_get_pmc(vcpu_to_pmu(vcpu), idx);
+	struct kvm_pmc *pmc = amd_pmu_get_pmc(vcpu_to_pmu(vcpu), idx);
+
+	if (!pmc)
+		return 1;
+
+	*data = pmc_read_counter(pmc);
+	return 0;
 }
 
 static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
@@ -302,7 +307,7 @@ static bool amd_pmc_is_disabled_in_current_mode(struct kvm_pmc *pmc)
 }
 
 struct kvm_pmu_ops amd_pmu_ops __initdata = {
-	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
+	.emulate_rdpmc = amd_emulate_rdpmc,
 	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
 	.check_rdpmc_early = amd_check_rdpmc_early,
 	.is_valid_msr = amd_is_valid_msr,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 225afd3937c3..080677372c9b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -84,14 +84,13 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 	}
 }
 
-static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
-					    unsigned int idx, u64 *mask)
+static int intel_emulate_rdpmc(struct kvm_vcpu *vcpu, unsigned int idx,
+			       u64 *data)
 {
 	unsigned int type = idx & INTEL_RDPMC_TYPE_MASK;
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	struct kvm_pmc *counters;
+	struct kvm_pmc *counters, *pmc;
 	unsigned int num_counters;
-	u64 bitmask;
 
 	/*
 	 * The encoding of ECX for RDPMC is different for architectural versus
@@ -104,7 +103,9 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	 * as KVM doesn't support such PMUs.
 	 */
 	if (WARN_ON_ONCE(!pmu->version))
-		return NULL;
+		return 1;
+
+	idx &= INTEL_RDPMC_INDEX_MASK;
 
 	/*
 	 * General Purpose (GP) PMCs are supported on all PMUs, and fixed PMCs
@@ -118,23 +119,21 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	case INTEL_RDPMC_FIXED:
 		counters = pmu->fixed_counters;
 		num_counters = pmu->nr_arch_fixed_counters;
-		bitmask = pmu->counter_bitmask[KVM_PMC_FIXED];
 		break;
 	case INTEL_RDPMC_GP:
 		counters = pmu->gp_counters;
 		num_counters = pmu->nr_arch_gp_counters;
-		bitmask = pmu->counter_bitmask[KVM_PMC_GP];
 		break;
 	default:
-		return NULL;
+		return 1;
 	}
 
-	idx &= INTEL_RDPMC_INDEX_MASK;
 	if (idx >= num_counters)
-		return NULL;
+		return 1;
 
-	*mask &= bitmask;
-	return &counters[array_index_nospec(idx, num_counters)];
+	pmc = &counters[array_index_nospec(idx, num_counters)];
+	*data = pmc_read_counter(pmc);
+	return 0;
 }
 
 static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
@@ -865,7 +864,7 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
 }
 
 struct kvm_pmu_ops intel_pmu_ops __initdata = {
-	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
+	.emulate_rdpmc = intel_emulate_rdpmc,
 	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
 	.is_valid_msr = intel_is_valid_msr,
 	.get_msr = intel_pmu_get_msr,
-- 
2.54.0
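Editor's note: below is a minimal user-space sketch, not the kernel's API, of the interface change in patch 6. The vendor hook now owns the whole read and writes the result through @data. This lets it return a value that is not backed by any counter, which a hook returning a struct kvm_pmc pointer cannot express. The sketch_* types, the rdpmc_allowed flag (standing in for the CR4.PCE/CPL/CR0.PE checks), and SKETCH_IDX_NOT_A_COUNTER are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical index for a value that is not backed by any counter. */
#define SKETCH_IDX_NOT_A_COUNTER 0x100u

struct sketch_pmc {
	uint64_t counter;
	uint64_t bitmask; /* counter width mask, e.g. 48 bits */
};

struct sketch_vcpu {
	bool rdpmc_allowed;   /* stands in for the CR4.PCE/CPL/CR0.PE checks */
	unsigned int nr_gp;   /* assumed <= 8 */
	struct sketch_pmc gp[8];
	uint64_t extra_value; /* e.g. a metrics-style register */
};

/* Like pmc_read_counter(): the width mask is applied at read time. */
static uint64_t sketch_pmc_read_counter(const struct sketch_pmc *pmc)
{
	return pmc->counter & pmc->bitmask;
}

/* New-style vendor hook: owns the entire read; 0 = success, 1 = inject #GP. */
struct sketch_pmu_ops {
	int (*emulate_rdpmc)(struct sketch_vcpu *vcpu, unsigned int idx,
			     uint64_t *data);
};

static int sketch_vendor_emulate_rdpmc(struct sketch_vcpu *vcpu,
				       unsigned int idx, uint64_t *data)
{
	/* Not expressible with a "return a struct kvm_pmc *" interface. */
	if (idx == SKETCH_IDX_NOT_A_COUNTER) {
		*data = vcpu->extra_value;
		return 0;
	}
	if (idx >= vcpu->nr_gp)
		return 1;
	*data = sketch_pmc_read_counter(&vcpu->gp[idx]);
	return 0;
}

/* Common code: permission checks first, then delegate to the vendor. */
static int sketch_kvm_pmu_rdpmc(const struct sketch_pmu_ops *ops,
				struct sketch_vcpu *vcpu, unsigned int idx,
				uint64_t *data)
{
	if (!vcpu->rdpmc_allowed)
		return 1;
	return ops->emulate_rdpmc(vcpu, idx, data);
}
```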



* [PATCH v6 7/8] KVM: x86/pmu: Emulate RDPMC on performance metrics
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
                   ` (5 preceding siblings ...)
  2026-06-29 23:19 ` [PATCH v6 6/8] KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-30  2:23   ` Mi, Dapeng
  2026-06-29 23:19 ` [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests Zide Chen
  7 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

If the host has the PERF_METRICS capability but it is not exposed to
the guest, RDPMC interception must be enabled and KVM should inject a
#GP when the guest attempts to read PERF_METRICS via RDPMC.

If the guest has PERF_METRICS but RDPMC interception is enabled for
other reasons, KVM needs to emulate RDPMC with ECX[31:16] = 2000H, i.e.
the PERF_METRICS type.

For simplicity, Metrics Clear Mode is not supported.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v6:
- Merge kvm_pmu_rdpmc_metrics() into intel_emulate_rdpmc().
- Reject non-zero index.
v5:
- new patch.
---
 arch/x86/kvm/pmu.c           |  7 +++++++
 arch/x86/kvm/vmx/pmu_intel.c | 14 ++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 8ef2d4761790..04b9c840f218 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -806,6 +806,12 @@ bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_need_perf_global_ctrl_intercept);
 
+static bool kvm_need_perf_metrics_intercept(struct kvm_vcpu *vcpu)
+{
+	return (kvm_host.perf_capabilities & PERF_CAP_PERF_METRICS) &&
+		!kvm_vcpu_has_perf_metrics(vcpu);
+}
+
 bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -818,6 +824,7 @@ bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
 		return true;
 
 	return kvm_need_any_pmc_intercept(vcpu) ||
+	       kvm_need_perf_metrics_intercept(vcpu) ||
 	       pmu->counter_bitmask[KVM_PMC_GP] != (BIT_ULL(kvm_host_pmu.bit_width_gp) - 1) ||
 	       pmu->counter_bitmask[KVM_PMC_FIXED] != (BIT_ULL(kvm_host_pmu.bit_width_fixed) - 1);
 }
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 080677372c9b..93b5a8360377 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -30,6 +30,7 @@
  */
 #define INTEL_RDPMC_GP		0
 #define INTEL_RDPMC_FIXED	INTEL_PMC_FIXED_RDPMC_BASE
+#define INTEL_RDPMC_METRICS	INTEL_PMC_FIXED_RDPMC_METRICS
 
 #define INTEL_RDPMC_TYPE_MASK	GENMASK(31, 16)
 #define INTEL_RDPMC_INDEX_MASK	GENMASK(15, 0)
@@ -124,6 +125,19 @@ static int intel_emulate_rdpmc(struct kvm_vcpu *vcpu, unsigned int idx,
 		counters = pmu->gp_counters;
 		num_counters = pmu->nr_arch_gp_counters;
 		break;
+	case INTEL_RDPMC_METRICS:
+		if (!kvm_vcpu_has_perf_metrics(vcpu))
+			return 1;
+
+		/*
+		 * The index in ECX[15:0] is implementation specific, but no
+		 * platform currently supports a non-zero index.
+		 */
+		if (idx)
+			return 1;
+
+		*data = pmu->perf_metrics;
+		return 0;
 	default:
 		return 1;
 	}
-- 
2.54.0
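Editor's note: below is a minimal user-space sketch of how the RDPMC ECX value is decoded once patch 7 is applied. ECX[31:16] selects the type (GP, fixed, or PERF_METRICS), and ECX[15:0] is the index. The constants are believed to mirror INTEL_PMC_FIXED_RDPMC_BASE (1 << 30) and INTEL_PMC_FIXED_RDPMC_METRICS (1 << 29) in arch/x86/include/asm/perf_event.h; treat them as assumptions. struct sketch_pmu and its fields are hypothetical, and perf_metrics models the snapshot the patch returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTEL_RDPMC_GP          0u
#define INTEL_RDPMC_FIXED       (1u << 30)  /* INTEL_PMC_FIXED_RDPMC_BASE */
#define INTEL_RDPMC_METRICS     (1u << 29)  /* ECX[31:16] = 2000H */
#define INTEL_RDPMC_TYPE_MASK   0xFFFF0000u /* GENMASK(31, 16) */
#define INTEL_RDPMC_INDEX_MASK  0x0000FFFFu /* GENMASK(15, 0) */

struct sketch_pmu {
	unsigned int nr_gp, nr_fixed;  /* assumed <= 8 and <= 4 */
	uint64_t gp[8], fixed[4];
	bool has_perf_metrics;
	uint64_t perf_metrics;         /* modeled saved guest value */
};

/* Returns 0 and fills *data on success, 1 if KVM should inject #GP. */
static int sketch_intel_emulate_rdpmc(const struct sketch_pmu *pmu,
				      uint32_t ecx, uint64_t *data)
{
	uint32_t type = ecx & INTEL_RDPMC_TYPE_MASK;
	uint32_t idx = ecx & INTEL_RDPMC_INDEX_MASK;

	switch (type) {
	case INTEL_RDPMC_FIXED:
		if (idx >= pmu->nr_fixed)
			return 1;
		*data = pmu->fixed[idx];
		return 0;
	case INTEL_RDPMC_GP:
		if (idx >= pmu->nr_gp)
			return 1;
		*data = pmu->gp[idx];
		return 0;
	case INTEL_RDPMC_METRICS:
		/* Not enumerated for the guest, or a non-zero index: #GP. */
		if (!pmu->has_perf_metrics || idx)
			return 1;
		*data = pmu->perf_metrics;
		return 0;
	default:
		return 1;
	}
}
```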



* [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests
  2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
                   ` (6 preceding siblings ...)
  2026-06-29 23:19 ` [PATCH v6 7/8] KVM: x86/pmu: Emulate RDPMC on performance metrics Zide Chen
@ 2026-06-29 23:19 ` Zide Chen
  2026-06-29 23:45   ` sashiko-bot
  2026-06-30  2:36   ` Mi, Dapeng
  7 siblings, 2 replies; 18+ messages in thread
From: Zide Chen @ 2026-06-29 23:19 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
	Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao

Add a test case to exercise IA32_PERF_METRICS, i.e. architectural
support for Topdown (TMA) Level 1 metrics, enumerated by
IA32_PERF_CAPABILITIES[15].

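Only sanity-check the results (the metrics, and Retiring in particular,
are non-zero, and the four Level 1 fractions sum to roughly 255), as the
metrics are derived and depend on the workload, CPU model, and host
scheduling, making precise expectations fragile.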

Extend the PMU selftest to cover Intel fixed counter 3 by bumping
MAX_NR_FIXED_COUNTERS to 4 and validating basic functionality.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v6:
- Move perf metrics test out of test_arch_events(); it doesn't belong
  there, and this also avoids redundant runs of perf metrics test.
- Correct the +/-3 error margin.
v3:
- Slightly reword comment to explain the sum of topdown metrics
  is close to 100%.
- Change abs() with explicit bounds (sum >= 0xfd && sum <= 0x102)
  for better readability.
v2:
- New patch.
---
 tools/arch/x86/include/asm/msr-index.h        |  1 +
 tools/testing/selftests/kvm/include/x86/pmu.h |  3 +
 .../selftests/kvm/x86/pmu_counters_test.c     | 94 ++++++++++++++++++-
 3 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index eff29645719b..e7745e2cd543 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -331,6 +331,7 @@
 #define PERF_CAP_PEBS_FORMAT		0xf00
 #define PERF_CAP_FW_WRITES		BIT_ULL(13)
 #define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
+#define PERF_CAP_PERF_METRICS		BIT_ULL(15)
 #define PERF_CAP_PEBS_TIMING_INFO	BIT_ULL(17)
 #define PERF_CAP_PEBS_MASK		(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
 					 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
diff --git a/tools/testing/selftests/kvm/include/x86/pmu.h b/tools/testing/selftests/kvm/include/x86/pmu.h
index 608ed83d7c6a..6c19503e0bb7 100644
--- a/tools/testing/selftests/kvm/include/x86/pmu.h
+++ b/tools/testing/selftests/kvm/include/x86/pmu.h
@@ -52,6 +52,9 @@
 /* Fixed PMC controls, Intel only. */
 #define FIXED_PMC_GLOBAL_CTRL_ENABLE(_idx)	BIT_ULL((32 + (_idx)))
 
+/* PERF_METRICS enable, Intel only. */
+#define PERF_METRICS_GLOBAL_CTRL_ENABLE		BIT_ULL(48)
+
 #define FIXED_PMC_KERNEL			BIT_ULL(0)
 #define FIXED_PMC_USER				BIT_ULL(1)
 #define FIXED_PMC_ANYTHREAD			BIT_ULL(2)
diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
index dc6afac3aa91..38057754e024 100644
--- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -6,6 +6,7 @@
 
 #include "pmu.h"
 #include "processor.h"
+#include <linux/bitfield.h>
 
 /* Number of iterations of the loop for the guest measurement payload. */
 #define NUM_LOOPS			10
@@ -241,17 +242,20 @@ do {										\
 	);									\
 } while (0)
 
-#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP)		\
+#define GUEST_RUN_PAYLOAD(_ctrl_msr, _value, FEP)				\
 do {										\
-	wrmsr(_pmc_msr, 0);							\
-										\
 	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))				\
 		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt %[m]", FEP);	\
 	else if (this_cpu_has(X86_FEATURE_CLFLUSH))				\
 		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush  %[m]", FEP);	\
 	else									\
 		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);		\
-										\
+} while (0)
+
+#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP)		\
+do {										\
+	wrmsr(_pmc_msr, 0);							\
+	GUEST_RUN_PAYLOAD(_ctrl_msr, _value, FEP);				\
 	guest_assert_event_count(_idx, _pmc, _pmc_msr);				\
 } while (0)
 
@@ -318,6 +322,75 @@ static void guest_test_arch_event(u8 idx)
 				FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
 }
 
+static void __guest_test_perf_metrics(void)
+{
+	int retiring, bad_spec, fe_bound, be_bound, sum;
+	u64 global_ctrl, metrics;
+
+	if ((guest_get_pmu_version() < 2) ||	/* Does guest have GLOBAL_CTRL? */
+	    !this_cpu_has(X86_FEATURE_PDCM) ||
+	    !(rdmsr(MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PERF_METRICS))
+		return;
+
+	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+	wrmsr(MSR_CORE_PERF_FIXED_CTR3, 0);
+	wrmsr(MSR_PERF_METRICS, 0);
+
+	/* Enable fixed ctr3 (TOPDOWN.SLOTS) and PERF_METRICS. */
+	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(3, FIXED_PMC_KERNEL));
+	global_ctrl = FIXED_PMC_GLOBAL_CTRL_ENABLE(3) |
+		      PERF_METRICS_GLOBAL_CTRL_ENABLE;
+
+	GUEST_RUN_PAYLOAD(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl, "");
+
+	/* Check test results. */
+	metrics = rdmsr(MSR_PERF_METRICS);
+	retiring = FIELD_GET(GENMASK_ULL(7, 0), metrics);
+	bad_spec = FIELD_GET(GENMASK_ULL(15, 8), metrics);
+	fe_bound = FIELD_GET(GENMASK_ULL(23, 16), metrics);
+	be_bound = FIELD_GET(GENMASK_ULL(31, 24), metrics);
+
+	/*
+	 * Be conservative: the measured payload definitely retires work, so
+	 * Retiring should be non-zero.
+	 */
+	GUEST_ASSERT_NE(metrics, 0ULL);
+	GUEST_ASSERT_NE(retiring, 0ULL);
+
+	/*
+	 * Each level-1 Topdown metric is an integer fraction of 255.
+	 * A +/-3 error margin is chosen for a loose sanity check.
+	 */
+	sum = retiring + bad_spec + fe_bound + be_bound;
+	GUEST_ASSERT(sum >= 0xfc && sum <= 0x102);
+
+	/* Sanity check after PERF_METRICS is disabled. */
+	__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
+	GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), metrics);
+	wrmsr(MSR_PERF_METRICS, 0xdeaddead);
+
+	GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), 0xdeaddead);
+}
+
+static void guest_test_perf_metrics(void)
+{
+	__guest_test_perf_metrics();
+	GUEST_DONE();
+}
+
+static void test_perf_metrics(u8 pmu_version, u64 perf_capabilities)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_perf_metrics,
+					 pmu_version, perf_capabilities);
+
+	run_vcpu(vcpu);
+
+	kvm_vm_free(vm);
+}
+
 static void guest_test_arch_events(void)
 {
 	u8 i;
@@ -361,7 +434,7 @@ static void test_arch_events(u8 pmu_version, u64 perf_capabilities,
  * other than PMCs in the future.
  */
 #define MAX_NR_GP_COUNTERS	8
-#define MAX_NR_FIXED_COUNTERS	3
+#define MAX_NR_FIXED_COUNTERS	4
 
 #define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector)		\
 __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector,			\
@@ -585,6 +658,7 @@ static void test_intel_counters(void)
 	u8 nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
 	u8 nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
 	u8 pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+	u64 advertised_perf_caps = kvm_get_feature_msr(MSR_IA32_PERF_CAPABILITIES);
 	unsigned int i;
 	u8 v, j;
 	u32 k;
@@ -592,6 +666,7 @@ static void test_intel_counters(void)
 	const u64 perf_caps[] = {
 		0,
 		PMU_CAP_FW_WRITES,
+		PERF_CAP_PERF_METRICS,
 	};
 
 	/*
@@ -649,6 +724,10 @@ static void test_intel_counters(void)
 			if (!kvm_has_perf_caps && perf_caps[i])
 				continue;
 
+			/* Ignore unsupported features. */
+			if (perf_caps[i] & ~advertised_perf_caps)
+				continue;
+
 			pr_info("Testing arch events, PMU version %u, perf_caps = %lx\n",
 				v, perf_caps[i]);
 
@@ -675,6 +754,11 @@ static void test_intel_counters(void)
 				for (k = 0; k <= (BIT(nr_fixed_counters) - 1); k++)
 					test_fixed_counters(v, perf_caps[i], j, k);
 			}
+
+			pr_info("Testing Perf Metrics, PMU version %u, perf_caps = %lx\n",
+				v, perf_caps[i]);
+
+			test_perf_metrics(v, perf_caps[i]);
 		}
 	}
 }
-- 
2.54.0
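Editor's note: below is a minimal user-space sketch of the checks patch 8 performs on IA32_PERF_METRICS. It decodes the four 8-bit Level 1 fields (Retiring in bits 7:0, Bad Speculation in 15:8, Frontend Bound in 23:16, Backend Bound in 31:24, as in the selftest's GENMASK_ULL() usage), then applies the 255 +/- 3 sum check. The global-control bits match the selftest's FIXED_PMC_GLOBAL_CTRL_ENABLE() and PERF_METRICS_GLOBAL_CTRL_ENABLE definitions. The sketch_* and topdown_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the selftest's pmu.h definitions. */
#define SKETCH_FIXED_PMC_GLOBAL_CTRL_ENABLE(i)	(1ULL << (32 + (i)))
#define SKETCH_PERF_METRICS_GLOBAL_CTRL_ENABLE	(1ULL << 48)

struct topdown_l1 {
	unsigned int retiring, bad_spec, fe_bound, be_bound;
};

static struct topdown_l1 decode_perf_metrics(uint64_t metrics)
{
	struct topdown_l1 t = {
		.retiring = metrics & 0xff,
		.bad_spec = (metrics >> 8) & 0xff,
		.fe_bound = (metrics >> 16) & 0xff,
		.be_bound = (metrics >> 24) & 0xff,
	};
	return t;
}

/* Each field is a fraction of 0xff, so the sum should be 255 +/- 3. */
static bool topdown_sum_plausible(struct topdown_l1 t)
{
	unsigned int sum = t.retiring + t.bad_spec + t.fe_bound + t.be_bound;

	return sum >= 0xff - 3 && sum <= 0xff + 3;
}
```

With 0x3F301080, for example, the fields are 128, 16, 48, and 63 (sum 255), which passes the check.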



* Re: [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests
  2026-06-29 23:19 ` [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests Zide Chen
@ 2026-06-29 23:45   ` sashiko-bot
  2026-06-30  2:36   ` Mi, Dapeng
  1 sibling, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-06-29 23:45 UTC (permalink / raw)
  To: Zide Chen; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The newly added test for PERF_METRICS writes a non-zero value (0xdeaddead) to MSR_PERF_METRICS and expects to read it back verbatim, which violates Intel's architectural rules and will cause a #GP exception on hardware.
--

commit 35ae6d43235ee910691fc703d1aee89a9dbe653a
Author: Zide Chen <zide.chen@intel.com>

KVM: selftests: Add PERF_METRICS and fixed counter 3 tests

This patch adds a test case to exercise IA32_PERF_METRICS, verifying
architectural support for Topdown Level 1 metrics. It also extends the
PMU selftest to cover Intel fixed counter 3 by bumping the max fixed
counters and validating basic functionality.

> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
[ ... ]
> @@ -318,6 +322,75 @@ static void guest_test_arch_event(u8 idx)
>  				FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
>  }
>  
> +static void __guest_test_perf_metrics(void)
> +{
[ ... ]
> +	/* Sanity check after PERF_METRICS disabled. */
> +	__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
> +	GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), metrics);
> +	wrmsr(MSR_PERF_METRICS, 0xdeaddead);
> +
> +	GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), 0xdeaddead);
> +}

[Severity: Medium]
Will this write to MSR_PERF_METRICS cause a #GP exception on hardware?

According to the Intel SDM, software should only write 0 to clear the metrics,
and writing a non-zero value like 0xdeaddead may trigger a general protection
fault. 

If this occurs on a host where MSR_PERF_METRICS writes are passed through
to the physical CPU, could it cause the guest to crash with an unhandled
#GP(0) exception and fail the selftest?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629231938.15129-1-zide.chen@intel.com?part=8


* Re: [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
  2026-06-29 23:19 ` [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
@ 2026-06-30  2:13   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:13 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao

LGTM. Thanks.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

On 6/30/2026 7:19 AM, Zide Chen wrote:
> Only fixed counters 0..2 have matching generic cross-platform
> hardware perf events (INSTRUCTIONS, CPU_CYCLES, REF_CPU_CYCLES).
> Therefore, perf_get_hw_event_config() is only applicable to these
> counters.
>
> KVM does not intend to emulate fixed counters >= 3 on legacy
> (non-mediated) vPMU, while for mediated vPMU, KVM does not care what
> the fixed counter event mappings are.  Therefore, return 0 for their
> eventsel.
>
> The two BUILD_BUG_ON() checks are no longer needed, so drop them along
> with __always_inline.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v6:
> - Re-arrange the code for early return. Clearer.
> v2:
> - Replace 3 in "if (index < 3)" with ARRAY_SIZE(fixed_pmc_perf_ids).
> ---
>  arch/x86/kvm/vmx/pmu_intel.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index a73a9515d96c..f15af497d27f 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -464,11 +464,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   * different perf_event is already utilizing the requested counter, but the end
>   * result is the same (ignoring the fact that using a general purpose counter
>   * will likely exacerbate counter contention).
> - *
> - * Forcibly inlined to allow asserting on @index at build time, and there should
> - * never be more than one user.
>   */
> -static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
> +static u64 intel_get_fixed_pmc_eventsel(unsigned int index)
>  {
>  	const enum perf_hw_id fixed_pmc_perf_ids[] = {
>  		[0] = PERF_COUNT_HW_INSTRUCTIONS,
> @@ -477,8 +474,13 @@ static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
>  	};
>  	u64 eventsel;
>  
> -	BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
> -	BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
> +	/*
> +	 * Fixed counters 3 and above don't have a corresponding generic
> +	 * hardware perf event, and KVM does not intend to emulate them on
> +	 * non-mediated vPMU.
> +	 */
> +	if (index >= ARRAY_SIZE(fixed_pmc_perf_ids))
> +		return 0;
>  
>  	/*
>  	 * Yell if perf reports support for a fixed counter but perf doesn't
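Editor's note: below is a minimal user-space sketch of the control flow patch 1 gives intel_get_fixed_pmc_eventsel(). Fixed counters 0..2 map to generic events. Any higher index now returns 0 through a runtime bounds check instead of tripping a BUILD_BUG_ON(). The sketch_* enum and sketch_hw_event_config() are hypothetical stand-ins for enum perf_hw_id and perf_get_hw_event_config(). The encodings they return are illustrative values, assumed to resemble Intel's event map.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for enum perf_hw_id values. */
enum sketch_hw_id {
	SKETCH_HW_CPU_CYCLES,
	SKETCH_HW_INSTRUCTIONS,
	SKETCH_HW_REF_CPU_CYCLES,
};

/* Stand-in for perf_get_hw_event_config(); illustrative encodings only. */
static uint64_t sketch_hw_event_config(enum sketch_hw_id id)
{
	switch (id) {
	case SKETCH_HW_CPU_CYCLES:
		return 0x003c;
	case SKETCH_HW_INSTRUCTIONS:
		return 0x00c0;
	case SKETCH_HW_REF_CPU_CYCLES:
		return 0x0300;
	}
	return 0;
}

static uint64_t sketch_get_fixed_pmc_eventsel(unsigned int index)
{
	/* Only fixed counters 0..2 have generic perf event equivalents. */
	static const enum sketch_hw_id fixed_ids[] = {
		[0] = SKETCH_HW_INSTRUCTIONS,
		[1] = SKETCH_HW_CPU_CYCLES,
		[2] = SKETCH_HW_REF_CPU_CYCLES,
	};

	/* Runtime bounds check in place of the old BUILD_BUG_ON()s. */
	if (index >= SKETCH_ARRAY_SIZE(fixed_ids))
		return 0;
	return sketch_hw_event_config(fixed_ids[index]);
}
```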


* Re: [PATCH v6 2/8] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
  2026-06-29 23:19 ` [PATCH v6 2/8] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
@ 2026-06-30  2:16   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:16 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

On 6/30/2026 7:19 AM, Zide Chen wrote:
> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> Starting with Ice Lake, Intel introduced fixed counter 3, which counts
> TOPDOWN.SLOTS - the number of available slots for an unhalted logical
> processor.  It serves as the denominator for top-level metrics in the
> Top-down Microarchitecture Analysis method.
>
> Emulating this counter on legacy vPMU would require introducing a new
> generic perf encoding for the Intel-specific TOPDOWN.SLOTS event in
> order to call perf_get_hw_event_config().  This is undesirable as it
> would pollute the generic perf event encoding.
>
> Moreover, KVM does not intend to emulate IA32_PERF_METRICS in the
> legacy vPMU model, and without IA32_PERF_METRICS, emulating this
> counter has little practical value.  Therefore, expose fixed counter
> 3 to guests only when mediated vPMU is enabled.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Co-developed-by: Zide Chen <zide.chen@intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v6:
> - Update comments to replace 2 with KVM_MAX_NR_INTEL_FIXED_COUNTERS - 1.
> v3:
> - Move the non-contiguous counter filter code to pmu.c
> v2:
> - Don't advertise fixed counter 3 to userspace if the host doesn't
>   support it.
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/msrs.c             |  4 ++--
>  arch/x86/kvm/pmu.c              | 18 +++++++++++++++++-
>  3 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d8700eb848b4..dc9e4e8bfc07 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -609,7 +609,7 @@ struct kvm_pmc {
>  #define KVM_MAX_NR_GP_COUNTERS		KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
>  						KVM_MAX_NR_AMD_GP_COUNTERS)
>  
> -#define KVM_MAX_NR_INTEL_FIXED_COUNTERS	3
> +#define KVM_MAX_NR_INTEL_FIXED_COUNTERS	4
>  #define KVM_MAX_NR_AMD_FIXED_COUNTERS	0
>  #define KVM_MAX_NR_FIXED_COUNTERS	KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
>  						KVM_MAX_NR_AMD_FIXED_COUNTERS)
> diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
> index c230b18d87e3..3bf42d90ad14 100644
> --- a/arch/x86/kvm/msrs.c
> +++ b/arch/x86/kvm/msrs.c
> @@ -228,7 +228,7 @@ static const u32 msrs_to_save_base[] = {
>  
>  static const u32 msrs_to_save_pmu[] = {
>  	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
> -	MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
> +	MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
>  	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
>  	MSR_CORE_PERF_GLOBAL_CTRL,
>  	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
> @@ -2688,7 +2688,7 @@ void kvm_init_msr_lists(void)
>  {
>  	unsigned i;
>  
> -	BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
> +	BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 4,
>  			 "Please update the fixed PMCs in msrs_to_save_pmu[]");
>  
>  	num_msrs_to_save = 0;
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 62d0ed99ebe9..f82ba63767d0 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -99,7 +99,8 @@ static const struct x86_cpu_id vmx_pebs_pdist_cpu[] = {
>   *        all perf counters (both gp and fixed). The mapping relationship
>   *        between pmc and perf counters is as the following:
>   *        * Intel: [0 .. KVM_MAX_NR_INTEL_GP_COUNTERS-1] <=> gp counters
> - *                 [KVM_FIXED_PMC_BASE_IDX .. KVM_FIXED_PMC_BASE_IDX + 2] <=> fixed
> + *                 [KVM_FIXED_PMC_BASE_IDX .. KVM_FIXED_PMC_BASE_IDX +
> + *                  KVM_MAX_NR_INTEL_FIXED_COUNTERS - 1] <=> fixed
>   *        * AMD:   [0 .. AMD64_NUM_COUNTERS-1] and, for families 15H
>   *          and later, [0 .. AMD64_NUM_COUNTERS_CORE-1] <=> gp counters
>   */
> @@ -134,6 +135,8 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
>  {
>  	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
>  	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
> +	union cpuid10_edx edx;
> +	u32 eax, ebx, ecx;
>  
>  	/*
>  	 * Hybrid PMUs don't play nice with virtualization without careful
> @@ -181,6 +184,19 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
>  	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
>  					     KVM_MAX_NR_FIXED_COUNTERS);
>  
> +	/*
> +	 * Currently, KVM doesn't support non-contiguous fixed counters; make
> +	 * sure only contiguous ones are retained in kvm_pmu_cap.
> +	 */
> +	if (kvm_host_pmu.version >= 5) {
> +		cpuid(0xa, &eax, &ebx, &ecx, &edx.full);
> +		if (kvm_pmu_cap.num_counters_fixed > edx.split.num_counters_fixed)
> +			kvm_pmu_cap.num_counters_fixed = edx.split.num_counters_fixed;
> +	}
> +
> +	if (!enable_mediated_pmu && kvm_pmu_cap.num_counters_fixed > 3)
> +		kvm_pmu_cap.num_counters_fixed = 3;
> +
>  	kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
>  		perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
>  	kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
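Editor's note: below is a minimal user-space sketch of how patch 2 clamps the number of fixed counters KVM exposes. The count is first capped at the KVM maximum. On PMU version 5 and later it is then limited to the contiguous count in CPUID.0AH:EDX[4:0]. Finally it is capped at 3 unless the mediated vPMU is enabled. The function name and its flattened parameters are hypothetical and stand in for kvm_pmu_cap, kvm_host_pmu.version, and the cpuid() call.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int sketch_clamp_fixed_counters(unsigned int perf_nr_fixed,
						unsigned int kvm_max_fixed,
						unsigned int pmu_version,
						unsigned int cpuid_edx_nr_fixed,
						bool enable_mediated_pmu)
{
	unsigned int n = perf_nr_fixed < kvm_max_fixed ? perf_nr_fixed
						       : kvm_max_fixed;

	/* v5+: keep only counters that are contiguous from counter 0. */
	if (pmu_version >= 5 && n > cpuid_edx_nr_fixed)
		n = cpuid_edx_nr_fixed;

	/* Fixed counter 3 (TOPDOWN.SLOTS) is exposed only when mediated. */
	if (!enable_mediated_pmu && n > 3)
		n = 3;
	return n;
}
```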


* Re: [PATCH v6 3/8] KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h
  2026-06-29 23:19 ` [PATCH v6 3/8] KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h Zide Chen
@ 2026-06-30  2:18   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:18 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

On 6/30/2026 7:19 AM, Zide Chen wrote:
> This is in preparation for it to be called from common x86 code, for
> example kvm_need_rdpmc_intercept(), to check the guest's PERF_METRICS
> capability.
>
> Rename it to kvm_vcpu_get_perf_caps() to indicate that it's part of
> the common API, and shorten _capabilities to _caps.
>
> No functional change intended.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v5: new patch.
> ---
>  arch/x86/kvm/pmu.h           |  8 ++++++++
>  arch/x86/kvm/vmx/pmu_intel.c |  6 +++---
>  arch/x86/kvm/vmx/pmu_intel.h | 10 +---------
>  3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index a5821d7c87f9..1b2f66a2e915 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -271,6 +271,14 @@ static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
>  				  X86_PMC_IDX_MAX);
>  }
>  
> +static inline u64 kvm_vcpu_get_perf_caps(struct kvm_vcpu *vcpu)
> +{
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
> +		return 0;
> +
> +	return vcpu->arch.perf_capabilities;
> +}
> +
>  void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
>  int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
>  int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index f15af497d27f..e426ddc8add4 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -189,13 +189,13 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>  	case MSR_CORE_PERF_FIXED_CTR_CTRL:
>  		return kvm_pmu_has_perf_global_ctrl(pmu);
>  	case MSR_IA32_PEBS_ENABLE:
> -		ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
> +		ret = kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_PEBS_FORMAT;
>  		break;
>  	case MSR_IA32_DS_AREA:
>  		ret = guest_cpu_cap_has(vcpu, X86_FEATURE_DS);
>  		break;
>  	case MSR_PEBS_DATA_CFG:
> -		perf_capabilities = vcpu_get_perf_capabilities(vcpu);
> +		perf_capabilities = kvm_vcpu_get_perf_caps(vcpu);
>  		ret = (perf_capabilities & PERF_CAP_PEBS_BASELINE) &&
>  			((perf_capabilities & PERF_CAP_PEBS_FORMAT) > 3);
>  		break;
> @@ -550,7 +550,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
>  	}
>  
> -	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
> +	perf_capabilities = kvm_vcpu_get_perf_caps(vcpu);
>  	if (intel_pmu_lbr_is_compatible(vcpu) &&
>  	    (perf_capabilities & PERF_CAP_LBR_FMT))
>  		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
> diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
> index 5d9357640aa1..afdbbc9991d6 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.h
> +++ b/arch/x86/kvm/vmx/pmu_intel.h
> @@ -6,17 +6,9 @@
>  
>  #include "cpuid.h"
>  
> -static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
> -{
> -	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
> -		return 0;
> -
> -	return vcpu->arch.perf_capabilities;
> -}
> -
>  static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
>  {
> -	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
> +	return (kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_FW_WRITES) != 0;
>  }
>  
>  bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

* Re: [PATCH v6 4/8] KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host
  2026-06-29 23:19 ` [PATCH v6 4/8] KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host Zide Chen
@ 2026-06-30  2:19   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:19 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>


On 6/30/2026 7:19 AM, Zide Chen wrote:
> From: Mingwei Zhang <mizhang@google.com>
>
> Cache the unadulterated snapshot of perf_capabilities so that KVM can
> compare guest vPMU capabilities against raw hardware capabilities.
>
> For example, if the host supports PERF_METRICS but it is not configured
> for the guest, KVM can use the snapshot to determine that RDPMC accesses
> must be intercepted.
>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v5: new patch.
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/vmx/vmx.c          | 8 ++------
>  arch/x86/kvm/x86.c              | 4 ++++
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index dc9e4e8bfc07..80f638588bf7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -347,6 +347,7 @@ struct kvm_host_values {
>  	u64 xss;
>  	u64 s_cet;
>  	u64 arch_capabilities;
> +	u64 perf_capabilities;
>  };
>  extern struct kvm_host_values kvm_host;
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index aded7039bd3e..b736b9ff965b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8050,14 +8050,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  static __init u64 vmx_get_perf_capabilities(void)
>  {
>  	u64 perf_cap = PERF_CAP_FW_WRITES;
> -	u64 host_perf_cap = 0;
>  
>  	if (!enable_pmu)
>  		return 0;
>  
> -	if (boot_cpu_has(X86_FEATURE_PDCM))
> -		rdmsrq(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
> -
>  	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR) &&
>  	    !enable_mediated_pmu) {
>  		x86_perf_get_lbr(&vmx_lbr_caps);
> @@ -8070,11 +8066,11 @@ static __init u64 vmx_get_perf_capabilities(void)
>  		if (!vmx_lbr_caps.has_callstack)
>  			memset(&vmx_lbr_caps, 0, sizeof(vmx_lbr_caps));
>  		else if (vmx_lbr_caps.nr)
> -			perf_cap |= host_perf_cap & PERF_CAP_LBR_FMT;
> +			perf_cap |= kvm_host.perf_capabilities & PERF_CAP_LBR_FMT;
>  	}
>  
>  	if (vmx_pebs_supported()) {
> -		perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
> +		perf_cap |= kvm_host.perf_capabilities & PERF_CAP_PEBS_MASK;
>  
>  		/*
>  		 * Disallow adaptive PEBS as it is functionally broken, can be
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8dbc0fa302a8..8e775855f9be 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7032,6 +7032,10 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
>  		rdmsrq(MSR_IA32_ARCH_CAPABILITIES, kvm_host.arch_capabilities);
>  
> +	if (boot_cpu_has(X86_FEATURE_PDCM))
> +		rdmsrq_safe(MSR_IA32_PERF_CAPABILITIES,
> +			    &kvm_host.perf_capabilities);
> +
>  	WARN_ON_ONCE(kvm_nr_uret_msrs);
>  
>  	r = ops->hardware_setup();
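
Below is a minimal standalone sketch of the comparison that the commit message describes. It assumes only the bit definition from this series (PERF_CAP_PERF_METRICS is bit 15). The struct and function names are hypothetical. The predicate follows kvm_need_perf_metrics_intercept() from patch 7/8, and the PDCM gating follows kvm_vcpu_get_perf_caps(). The comparison only works because the raw host value is kept separately from the guest-visible one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of IA32_PERF_CAPABILITIES, as defined in this series. */
#define PERF_CAP_PERF_METRICS (1ULL << 15)

/* Hypothetical stand-in for the kvm_host.perf_capabilities snapshot. */
struct host_snapshot {
	uint64_t perf_capabilities;	/* raw value read once at init */
};

/*
 * Hypothetical model of kvm_vcpu_get_perf_caps(): the guest sees no
 * perf capabilities at all unless its CPUID advertises PDCM.
 */
static uint64_t guest_perf_caps(bool guest_has_pdcm, uint64_t configured)
{
	return guest_has_pdcm ? configured : 0;
}

/*
 * Model of kvm_need_perf_metrics_intercept(): RDPMC must be intercepted
 * when the hardware implements PERF_METRICS but the guest was not given it,
 * otherwise a guest RDPMC of type 2000H would read real hardware state.
 */
static bool need_perf_metrics_intercept(const struct host_snapshot *host,
					uint64_t guest_caps)
{
	return (host->perf_capabilities & PERF_CAP_PERF_METRICS) &&
	       !(guest_caps & PERF_CAP_PERF_METRICS);
}
```

For example, a guest that has PERF_METRICS configured but lacks PDCM still ends up with RDPMC intercepted, because its effective capabilities collapse to zero.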

* Re: [PATCH v6 5/8] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
  2026-06-29 23:19 ` [PATCH v6 5/8] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU Zide Chen
@ 2026-06-30  2:20   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:20 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>


On 6/30/2026 7:19 AM, Zide Chen wrote:
> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> Bit 15 in IA32_PERF_CAPABILITIES indicates that the CPU provides
> built-in support for Topdown Microarchitecture Analysis (TMA) L1
> metrics via the IA32_PERF_METRICS MSR.
>
> Expose this capability only when mediated vPMU is enabled, as emulating
> IA32_PERF_METRICS in the legacy vPMU model is impractical.
>
> Pass IA32_PERF_METRICS through to the guest only when mediated vPMU is
> enabled and bit 15 is set in guest IA32_PERF_CAPABILITIES.  Allow
> kvm_pmu_{get,set}_msr() to handle this MSR for host accesses.
>
> Save and restore this MSR on host/guest PMU context switches so that
> host PMU activity does not clobber the guest value, and guest state
> is not leaked into the host.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v5:
> - Remove host_initiated check in set/get MSR handlers.
> v4:
> - Remove WARN_ON_ONCE() and simply reject the guest accesses by checking
>   host_initiated. (Sashiko)
> - Passthru MSR_PERF_METRICS only if has_mediated_pmu is true. (Sashiko)
> - Remove the redundant !! in vcpu_has_perf_metrics().
> v3:
> - Replace WARN_ON() with WARN_ON_ONCE(). (Dapeng)
> - Add comments to explain why we don't validate writes on PERF_METRICS.
> ---
>  arch/x86/include/asm/kvm_host.h   |  1 +
>  arch/x86/include/asm/msr-index.h  |  1 +
>  arch/x86/include/asm/perf_event.h |  1 +
>  arch/x86/kvm/msrs.c               |  6 +++++-
>  arch/x86/kvm/pmu.h                |  5 +++++
>  arch/x86/kvm/vmx/pmu_intel.c      | 31 +++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c            |  7 +++++++
>  7 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 80f638588bf7..96376d8a5199 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -630,6 +630,7 @@ struct kvm_pmu {
>  	u64 global_status_rsvd;
>  	u64 reserved_bits;
>  	u64 raw_event_mask;
> +	u64 perf_metrics;
>  	struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS];
>  	struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS];
>  
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 18c4be75e927..fdcaeb6c8352 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -331,6 +331,7 @@
>  #define PERF_CAP_PEBS_FORMAT		0xf00
>  #define PERF_CAP_FW_WRITES		BIT_ULL(13)
>  #define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
> +#define PERF_CAP_PERF_METRICS		BIT_ULL(15)
>  #define PERF_CAP_PEBS_TIMING_INFO	BIT_ULL(17)
>  #define PERF_CAP_PEBS_MASK		(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
>  					 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 1eb13673e889..bc2e1cbcd9b9 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -447,6 +447,7 @@ static inline bool is_topdown_idx(int idx)
>  #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT	54
>  #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD	BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
>  #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
> +#define GLOBAL_STATUS_PERF_METRICS_OVF		BIT_ULL(GLOBAL_STATUS_PERF_METRICS_OVF_BIT)
>  
>  #define GLOBAL_CTRL_EN_PERF_METRICS		BIT_ULL(48)
>  /*
> diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
> index 3bf42d90ad14..c751a8dbd45d 100644
> --- a/arch/x86/kvm/msrs.c
> +++ b/arch/x86/kvm/msrs.c
> @@ -230,7 +230,7 @@ static const u32 msrs_to_save_pmu[] = {
>  	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
>  	MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
>  	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
> -	MSR_CORE_PERF_GLOBAL_CTRL,
> +	MSR_CORE_PERF_GLOBAL_CTRL, MSR_PERF_METRICS,
>  	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
>  
>  	/* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
> @@ -2625,6 +2625,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>  		     intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
>  			return;
>  		break;
> +	case MSR_PERF_METRICS:
> +		if (!(kvm_caps.supported_perf_cap & PERF_CAP_PERF_METRICS))
> +			return;
> +		break;
>  	case MSR_ARCH_PERFMON_PERFCTR0 ...
>  	     MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
>  		if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 1b2f66a2e915..3066cade5790 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -279,6 +279,11 @@ static inline u64 kvm_vcpu_get_perf_caps(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.perf_capabilities;
>  }
>  
> +static inline bool kvm_vcpu_has_perf_metrics(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_PERF_METRICS;
> +}
> +
>  void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
>  int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
>  int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index e426ddc8add4..225afd3937c3 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -188,6 +188,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>  	switch (msr) {
>  	case MSR_CORE_PERF_FIXED_CTR_CTRL:
>  		return kvm_pmu_has_perf_global_ctrl(pmu);
> +	case MSR_PERF_METRICS:
> +		return kvm_vcpu_has_perf_metrics(vcpu);
>  	case MSR_IA32_PEBS_ENABLE:
>  		ret = kvm_vcpu_get_perf_caps(vcpu) & PERF_CAP_PEBS_FORMAT;
>  		break;
> @@ -345,6 +347,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_CORE_PERF_FIXED_CTR_CTRL:
>  		msr_info->data = pmu->fixed_ctr_ctrl;
>  		break;
> +	case MSR_PERF_METRICS:
> +		msr_info->data = pmu->perf_metrics;
> +		break;
>  	case MSR_IA32_PEBS_ENABLE:
>  		msr_info->data = pmu->pebs_enable;
>  		break;
> @@ -394,6 +399,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (pmu->fixed_ctr_ctrl != data)
>  			reprogram_fixed_counters(pmu, data);
>  		break;
> +	case MSR_PERF_METRICS:
> +		/*
> +		 * On platforms that support only hardware level-1, bits [63:32]
> +		 * are reserved and ignored by hardware. If hardware level-2 is also
> +		 * supported, they may contain valid metric data.
> +		 * Either way, guest writes are passed through verbatim.
> +		 */
> +		pmu->perf_metrics = data;
> +		break;
>  	case MSR_IA32_PEBS_ENABLE:
>  		if (data & pmu->pebs_enable_rsvd)
>  			return 1;
> @@ -589,6 +603,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  		pmu->global_status_rsvd &=
>  				~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
>  
> +	if (perf_capabilities & PERF_CAP_PERF_METRICS) {
> +		pmu->global_ctrl_rsvd &= ~GLOBAL_CTRL_EN_PERF_METRICS;
> +		pmu->global_status_rsvd &= ~GLOBAL_STATUS_PERF_METRICS_OVF;
> +	}
> +
>  	if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
>  		if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
>  			pmu->pebs_enable_rsvd = counter_rsvd;
> @@ -632,6 +651,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>  
>  static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> +	pmu->perf_metrics = 0;
>  	intel_pmu_release_guest_lbr_event(vcpu);
>  }
>  
> @@ -803,6 +825,9 @@ static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>  	u64 global_status, toggle;
>  
> +	if (kvm_vcpu_has_perf_metrics(vcpu))
> +		wrmsrq(MSR_PERF_METRICS, pmu->perf_metrics);
> +
>  	rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
>  	toggle = pmu->global_status ^ global_status;
>  	if (global_status & toggle)
> @@ -831,6 +856,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
>  	 */
>  	if (pmu->fixed_ctr_ctrl_hw)
>  		wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> +
> +	if (kvm_vcpu_has_perf_metrics(vcpu)) {
> +		pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
> +		if (pmu->perf_metrics)
> +			wrmsrq(MSR_PERF_METRICS, 0);
> +	}
>  }
>  
>  struct kvm_pmu_ops intel_pmu_ops __initdata = {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b736b9ff965b..21eb4b339fa6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4273,6 +4273,10 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
>  				  MSR_TYPE_RW, intercept);
>  	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
>  				  MSR_TYPE_RW, intercept);
> +
> +	intercept = !has_mediated_pmu || !kvm_vcpu_has_perf_metrics(vcpu);
> +	vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS,
> +				  MSR_TYPE_RW, intercept);
>  }
>  
>  static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
> @@ -8095,6 +8099,9 @@ static __init u64 vmx_get_perf_capabilities(void)
>  		perf_cap &= ~PERF_CAP_PEBS_BASELINE;
>  	}
>  
> +	if (enable_mediated_pmu)
> +		perf_cap |= kvm_host.perf_capabilities & PERF_CAP_PERF_METRICS;
> +
>  	return perf_cap;
>  }
>  
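
The save/restore ordering added to intel_mediated_pmu_load() and intel_mediated_pmu_put() can be modeled in plain C. This is a hypothetical userspace model, not KVM code. A global variable stands in for the physical IA32_PERF_METRICS MSR, and a plain read stands in for rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS). The struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the physical IA32_PERF_METRICS register. */
static uint64_t hw_perf_metrics;

/* Hypothetical subset of struct kvm_pmu relevant to this sketch. */
struct vpmu {
	bool has_perf_metrics;	/* guest IA32_PERF_CAPABILITIES[15] */
	uint64_t perf_metrics;	/* saved guest value (pmu->perf_metrics) */
};

/* Models the new hunk in intel_mediated_pmu_load(): restore the guest value. */
static void vpmu_load(const struct vpmu *pmu)
{
	if (pmu->has_perf_metrics)
		hw_perf_metrics = pmu->perf_metrics;
}

/* Models the new hunk in intel_mediated_pmu_put(): save, then scrub. */
static void vpmu_put(struct vpmu *pmu)
{
	if (!pmu->has_perf_metrics)
		return;

	/* Stands in for rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS). */
	pmu->perf_metrics = hw_perf_metrics;

	/* Clear only if needed, so guest state is not leaked to the host. */
	if (pmu->perf_metrics)
		hw_perf_metrics = 0;
}
```

Skipping the clear when the saved value is already zero mirrors the patch, presumably to avoid a needless WRMSR on the put path. When the guest lacks the capability nothing is swapped. That case relies on MSR_PERF_METRICS staying intercepted, per vmx_recalc_pmu_msr_intercepts() above.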

* Re: [PATCH v6 6/8] KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks
  2026-06-29 23:19 ` [PATCH v6 6/8] KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks Zide Chen
@ 2026-06-30  2:23   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:23 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao


On 6/30/2026 7:19 AM, Zide Chen wrote:
> The current RDPMC emulation splits responsibility: rdpmc_ecx_to_pmc()
> in each vendor returns a kvm_pmc, then common code calls
> pmc_read_counter().
>
> This design cannot support RDPMC reads that don't map to a counter,
> such as PERF_METRICS on Intel platforms.
>
> Replace rdpmc_ecx_to_pmc() with emulate_rdpmc(), which takes full
> ownership of the emulation and writes the result directly into @data.
>
> Also drop the redundant bitmask in intel_emulate_rdpmc() since
> pmc_read_counter() already applies the counter's bit-width mask.

Nice cleanup.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>


>
> No functional change intended.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v6: new patch.
> ---
>  arch/x86/include/asm/kvm-x86-pmu-ops.h |  2 +-
>  arch/x86/kvm/pmu.c                     |  9 +--------
>  arch/x86/kvm/pmu.h                     |  4 ++--
>  arch/x86/kvm/svm/pmu.c                 | 13 +++++++++----
>  arch/x86/kvm/vmx/pmu_intel.c           | 25 ++++++++++++-------------
>  5 files changed, 25 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
> index 4a223c2793e3..4b50ed058aed 100644
> --- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
> @@ -13,7 +13,7 @@
>   * KVM_X86_PMU_OP_OPTIONAL() can be used for those functions that can have
>   * a NULL definition.
>   */
> -KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
> +KVM_X86_PMU_OP(emulate_rdpmc)
>  KVM_X86_PMU_OP(msr_idx_to_pmc)
>  KVM_X86_PMU_OP_OPTIONAL(check_rdpmc_early)
>  KVM_X86_PMU_OP(is_valid_msr)
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index f82ba63767d0..8ef2d4761790 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -768,8 +768,6 @@ static int kvm_pmu_rdpmc_vmware(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
>  int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> -	struct kvm_pmc *pmc;
> -	u64 mask = ~0ull;
>  
>  	if (!pmu->version)
>  		return 1;
> @@ -777,17 +775,12 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
>  	if (is_vmware_backdoor_pmc(idx))
>  		return kvm_pmu_rdpmc_vmware(vcpu, idx, data);
>  
> -	pmc = kvm_pmu_call(rdpmc_ecx_to_pmc)(vcpu, idx, &mask);
> -	if (!pmc)
> -		return 1;
> -
>  	if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_PCE) &&
>  	    (kvm_x86_call(get_cpl)(vcpu) != 0) &&
>  	    kvm_is_cr0_bit_set(vcpu, X86_CR0_PE))
>  		return 1;
>  
> -	*data = pmc_read_counter(pmc) & mask;
> -	return 0;
> +	return kvm_pmu_call(emulate_rdpmc)(vcpu, idx, data);
>  }
>  
>  static bool kvm_need_any_pmc_intercept(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 3066cade5790..cdbefda844b9 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -24,8 +24,8 @@
>  #define KVM_FIXED_PMC_BASE_IDX INTEL_PMC_IDX_FIXED
>  
>  struct kvm_pmu_ops {
> -	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
> -		unsigned int idx, u64 *mask);
> +	int (*emulate_rdpmc)(struct kvm_vcpu *vcpu, unsigned int idx,
> +			     u64 *data);
>  	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
>  	int (*check_rdpmc_early)(struct kvm_vcpu *vcpu, unsigned int idx);
>  	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index c18286545a7a..0517fd4bbcd7 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -84,10 +84,15 @@ static int amd_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
>  }
>  
>  /* idx is the ECX register of RDPMC instruction */
> -static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> -	unsigned int idx, u64 *mask)
> +static int amd_emulate_rdpmc(struct kvm_vcpu *vcpu, unsigned int idx, u64 *data)
>  {
> -	return amd_pmu_get_pmc(vcpu_to_pmu(vcpu), idx);
> +	struct kvm_pmc *pmc = amd_pmu_get_pmc(vcpu_to_pmu(vcpu), idx);
> +
> +	if (!pmc)
> +		return 1;
> +
> +	*data = pmc_read_counter(pmc);
> +	return 0;
>  }
>  
>  static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
> @@ -302,7 +307,7 @@ static bool amd_pmc_is_disabled_in_current_mode(struct kvm_pmc *pmc)
>  }
>  
>  struct kvm_pmu_ops amd_pmu_ops __initdata = {
> -	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
> +	.emulate_rdpmc = amd_emulate_rdpmc,
>  	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
>  	.check_rdpmc_early = amd_check_rdpmc_early,
>  	.is_valid_msr = amd_is_valid_msr,
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 225afd3937c3..080677372c9b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -84,14 +84,13 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
>  	}
>  }
>  
> -static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> -					    unsigned int idx, u64 *mask)
> +static int intel_emulate_rdpmc(struct kvm_vcpu *vcpu, unsigned int idx,
> +			       u64 *data)
>  {
>  	unsigned int type = idx & INTEL_RDPMC_TYPE_MASK;
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> -	struct kvm_pmc *counters;
> +	struct kvm_pmc *counters, *pmc;
>  	unsigned int num_counters;
> -	u64 bitmask;
>  
>  	/*
>  	 * The encoding of ECX for RDPMC is different for architectural versus
> @@ -104,7 +103,9 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>  	 * as KVM doesn't support such PMUs.
>  	 */
>  	if (WARN_ON_ONCE(!pmu->version))
> -		return NULL;
> +		return 1;
> +
> +	idx &= INTEL_RDPMC_INDEX_MASK;
>  
>  	/*
>  	 * General Purpose (GP) PMCs are supported on all PMUs, and fixed PMCs
> @@ -118,23 +119,21 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>  	case INTEL_RDPMC_FIXED:
>  		counters = pmu->fixed_counters;
>  		num_counters = pmu->nr_arch_fixed_counters;
> -		bitmask = pmu->counter_bitmask[KVM_PMC_FIXED];
>  		break;
>  	case INTEL_RDPMC_GP:
>  		counters = pmu->gp_counters;
>  		num_counters = pmu->nr_arch_gp_counters;
> -		bitmask = pmu->counter_bitmask[KVM_PMC_GP];
>  		break;
>  	default:
> -		return NULL;
> +		return 1;
>  	}
>  
> -	idx &= INTEL_RDPMC_INDEX_MASK;
>  	if (idx >= num_counters)
> -		return NULL;
> +		return 1;
>  
> -	*mask &= bitmask;
> -	return &counters[array_index_nospec(idx, num_counters)];
> +	pmc = &counters[array_index_nospec(idx, num_counters)];
> +	*data = pmc_read_counter(pmc);
> +	return 0;
>  }
>  
>  static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
> @@ -865,7 +864,7 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
>  }
>  
>  struct kvm_pmu_ops intel_pmu_ops __initdata = {
> -	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
> +	.emulate_rdpmc = intel_emulate_rdpmc,
>  	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
>  	.is_valid_msr = intel_is_valid_msr,
>  	.get_msr = intel_pmu_get_msr,
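
The following reduced sketch shows why the extra mask in intel_emulate_rdpmc() was redundant. If the read helper already applies the counter width, the vendor callback only has to validate the index and copy the result. The names and the two-field counter struct are hypothetical simplifications, and the speculation hardening done by array_index_nospec() is omitted. A non-zero return tells the caller to inject #GP, matching how the kvm_pmu_rdpmc() result is consumed.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced counter model. */
struct pmc {
	uint64_t value;		/* raw accumulated count */
	uint64_t bitmask;	/* e.g. (1ULL << 48) - 1 for a 48-bit counter */
};

/* Stand-in for pmc_read_counter(): the width mask is applied here... */
static uint64_t pmc_read_counter(const struct pmc *pmc)
{
	return pmc->value & pmc->bitmask;
}

/*
 * ...so the per-vendor callback needs no mask of its own.  It owns the
 * whole emulation: validate, read, and store into *data.
 */
static int emulate_rdpmc(const struct pmc *counters, unsigned int nr,
			 unsigned int idx, uint64_t *data)
{
	if (idx >= nr)
		return 1;	/* out of range: caller injects #GP */

	*data = pmc_read_counter(&counters[idx]);
	return 0;
}
```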

* Re: [PATCH v6 7/8] KVM: x86/pmu: Emulate RDPMC on performance metrics
  2026-06-29 23:19 ` [PATCH v6 7/8] KVM: x86/pmu: Emulate RDPMC on performance metrics Zide Chen
@ 2026-06-30  2:23   ` Mi, Dapeng
  0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:23 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>


On 6/30/2026 7:19 AM, Zide Chen wrote:
> If the host has the PERF_METRICS capability but it is not exposed to
> the guest, RDPMC interception must be enabled and KVM should inject
> a #GP when the guest attempts a PERF_METRICS RDPMC.
>
> If the guest has PERF_METRICS but RDPMC interception is enabled for
> other reasons, KVM needs to emulate RDPMC with type 2000H.
>
> For simplicity, Metrics Clear Mode is not supported.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v6:
> - Merge kvm_pmu_rdpmc_metrics() into intel_emulate_rdpmc().
> - Reject non-zero index.
> v5:
> - new patch.
> ---
>  arch/x86/kvm/pmu.c           |  7 +++++++
>  arch/x86/kvm/vmx/pmu_intel.c | 14 ++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 8ef2d4761790..04b9c840f218 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -806,6 +806,12 @@ bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_need_perf_global_ctrl_intercept);
>  
> +static bool kvm_need_perf_metrics_intercept(struct kvm_vcpu *vcpu)
> +{
> +	return (kvm_host.perf_capabilities & PERF_CAP_PERF_METRICS) &&
> +		!kvm_vcpu_has_perf_metrics(vcpu);
> +}
> +
>  bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -818,6 +824,7 @@ bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
>  		return true;
>  
>  	return kvm_need_any_pmc_intercept(vcpu) ||
> +	       kvm_need_perf_metrics_intercept(vcpu) ||
>  	       pmu->counter_bitmask[KVM_PMC_GP] != (BIT_ULL(kvm_host_pmu.bit_width_gp) - 1) ||
>  	       pmu->counter_bitmask[KVM_PMC_FIXED] != (BIT_ULL(kvm_host_pmu.bit_width_fixed) - 1);
>  }
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 080677372c9b..93b5a8360377 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -30,6 +30,7 @@
>   */
>  #define INTEL_RDPMC_GP		0
>  #define INTEL_RDPMC_FIXED	INTEL_PMC_FIXED_RDPMC_BASE
> +#define INTEL_RDPMC_METRICS	INTEL_PMC_FIXED_RDPMC_METRICS
>  
>  #define INTEL_RDPMC_TYPE_MASK	GENMASK(31, 16)
>  #define INTEL_RDPMC_INDEX_MASK	GENMASK(15, 0)
> @@ -124,6 +125,19 @@ static int intel_emulate_rdpmc(struct kvm_vcpu *vcpu, unsigned int idx,
>  		counters = pmu->gp_counters;
>  		num_counters = pmu->nr_arch_gp_counters;
>  		break;
> +	case INTEL_RDPMC_METRICS:
> +		if (!kvm_vcpu_has_perf_metrics(vcpu))
> +			return 1;
> +
> +		/*
> +		 * The index in ECX[15:0] is implementation specific, but no
> +		 * platform currently supports a non-zero index.
> +		 */
> +		if (idx)
> +			return 1;
> +
> +		*data = pmu->perf_metrics;
> +		return 0;
>  	default:
>  		return 1;
>  	}
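
Here is a self-contained model of the Intel ECX dispatch after patches 6 and 7, which makes the decoding concrete. It assumes INTEL_PMC_FIXED_RDPMC_BASE is (1 << 30) and INTEL_PMC_FIXED_RDPMC_METRICS is (1 << 29). Those values are consistent with the "type 2000H" wording in the commit message. The struct and function names are hypothetical, and counter-width masking is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECX layout for Intel RDPMC as used in this series. */
#define RDPMC_TYPE_MASK		0xffff0000u	/* GENMASK(31, 16) */
#define RDPMC_INDEX_MASK	0x0000ffffu	/* GENMASK(15, 0) */
#define RDPMC_GP		0x00000000u
#define RDPMC_FIXED		(1u << 30)	/* assumed INTEL_PMC_FIXED_RDPMC_BASE */
#define RDPMC_METRICS		(1u << 29)	/* assumed ..._RDPMC_METRICS, type 2000H */

/* Hypothetical guest PMU state; nr_gp <= 8 and nr_fixed <= 4 assumed. */
struct rdpmc_model {
	unsigned int nr_gp, nr_fixed;
	bool has_perf_metrics;
	uint64_t gp[8], fixed[4], perf_metrics;
};

/* Returns 0 and fills *data on success, 1 if the guest should get #GP. */
static int model_emulate_rdpmc(const struct rdpmc_model *m, uint32_t ecx,
			       uint64_t *data)
{
	uint32_t type = ecx & RDPMC_TYPE_MASK;
	uint32_t idx = ecx & RDPMC_INDEX_MASK;

	switch (type) {
	case RDPMC_GP:
		if (idx >= m->nr_gp)
			return 1;
		*data = m->gp[idx];
		return 0;
	case RDPMC_FIXED:
		if (idx >= m->nr_fixed)
			return 1;
		*data = m->fixed[idx];
		return 0;
	case RDPMC_METRICS:
		/* Only index 0 is accepted, as in the patch. */
		if (!m->has_perf_metrics || idx)
			return 1;
		*data = m->perf_metrics;
		return 0;
	default:
		return 1;
	}
}
```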

* Re: [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests
  2026-06-29 23:19 ` [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests Zide Chen
  2026-06-29 23:45   ` sashiko-bot
@ 2026-06-30  2:36   ` Mi, Dapeng
  1 sibling, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-06-30  2:36 UTC (permalink / raw)
  To: Zide Chen, Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
	Shukla Manali, Falcon Thomas, Xudong Hao


On 6/30/2026 7:19 AM, Zide Chen wrote:
> Add a test case to exercise IA32_PERF_METRICS, i.e. architectural
> support for Topdown (TMA) Level 1 metrics, enumerated by
> IA32_PERF_CAPABILITIES[15].
>
> Only check for non-zero metrics, as they are derived and depend on
> the workload, CPU model, and host scheduling, making precise
> expectations fragile.
>
> Extend the PMU selftest to cover Intel fixed counter 3 by bumping
> MAX_NR_FIXED_COUNTERS to 4 and validating basic functionality.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> v6:
> - Move perf metrics test out of test_arch_events(); it doesn't belong
>   there, and this also avoids redundant runs of perf metrics test.
> - Correct the +/-3 error margin.
> v3:
> - Slightly reword comment to explain the sum of topdown metrics
>   is close to 100%.
> - Replace abs() with explicit bounds (sum >= 0xfd && sum <= 0x102)
>   for better readability.
> v2:
> - New patch.
> ---
>  tools/arch/x86/include/asm/msr-index.h        |  1 +
>  tools/testing/selftests/kvm/include/x86/pmu.h |  3 +
>  .../selftests/kvm/x86/pmu_counters_test.c     | 94 ++++++++++++++++++-
>  3 files changed, 93 insertions(+), 5 deletions(-)
>
> diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
> index eff29645719b..e7745e2cd543 100644
> --- a/tools/arch/x86/include/asm/msr-index.h
> +++ b/tools/arch/x86/include/asm/msr-index.h
> @@ -331,6 +331,7 @@
>  #define PERF_CAP_PEBS_FORMAT		0xf00
>  #define PERF_CAP_FW_WRITES		BIT_ULL(13)
>  #define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
> +#define PERF_CAP_PERF_METRICS		BIT_ULL(15)
>  #define PERF_CAP_PEBS_TIMING_INFO	BIT_ULL(17)
>  #define PERF_CAP_PEBS_MASK		(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
>  					 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
> diff --git a/tools/testing/selftests/kvm/include/x86/pmu.h b/tools/testing/selftests/kvm/include/x86/pmu.h
> index 608ed83d7c6a..6c19503e0bb7 100644
> --- a/tools/testing/selftests/kvm/include/x86/pmu.h
> +++ b/tools/testing/selftests/kvm/include/x86/pmu.h
> @@ -52,6 +52,9 @@
>  /* Fixed PMC controls, Intel only. */
>  #define FIXED_PMC_GLOBAL_CTRL_ENABLE(_idx)	BIT_ULL((32 + (_idx)))
>  
> +/* PERF_METRICS enable, Intel only. */
> +#define PERF_METRICS_GLOBAL_CTRL_ENABLE		BIT_ULL(48)
> +
>  #define FIXED_PMC_KERNEL			BIT_ULL(0)
>  #define FIXED_PMC_USER				BIT_ULL(1)
>  #define FIXED_PMC_ANYTHREAD			BIT_ULL(2)
> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> index dc6afac3aa91..38057754e024 100644
> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> @@ -6,6 +6,7 @@
>  
>  #include "pmu.h"
>  #include "processor.h"
> +#include <linux/bitfield.h>
>  
>  /* Number of iterations of the loop for the guest measurement payload. */
>  #define NUM_LOOPS			10
> @@ -241,17 +242,20 @@ do {										\
>  	);									\
>  } while (0)
>  
> -#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP)		\
> +#define GUEST_RUN_PAYLOAD(_ctrl_msr, _value, FEP)				\
>  do {										\
> -	wrmsr(_pmc_msr, 0);							\
> -										\
>  	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))				\
>  		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt %[m]", FEP);	\
>  	else if (this_cpu_has(X86_FEATURE_CLFLUSH))				\
>  		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush  %[m]", FEP);	\
>  	else									\
>  		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);		\
> -										\
> +} while (0)
> +
> +#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP)		\
> +do {										\
> +	wrmsr(_pmc_msr, 0);							\
> +	GUEST_RUN_PAYLOAD(_ctrl_msr, _value, FEP);				\
>  	guest_assert_event_count(_idx, _pmc, _pmc_msr);				\
>  } while (0)
>  
> @@ -318,6 +322,75 @@ static void guest_test_arch_event(u8 idx)
>  				FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
>  }
>  
> +static void __guest_test_perf_metrics(void)
> +{
> +	int retiring, bad_spec, fe_bound, be_bound, sum;
> +	u64 global_ctrl, metrics;
> +
> +	if ((guest_get_pmu_version() < 2) ||	/* Does guest have GLOBAL_CTRL? */
> +	    !this_cpu_has(X86_FEATURE_PDCM) ||
> +	    !(rdmsr(MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PERF_METRICS))
> +		return;
> +
> +	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +	wrmsr(MSR_CORE_PERF_FIXED_CTR3, 0);
> +	wrmsr(MSR_PERF_METRICS, 0);
> +
> +	/* Enable fixed ctr3 (TOPDOWN.SLOTS) and PERF_METRICS. */
> +	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(3, FIXED_PMC_KERNEL));
> +	global_ctrl = FIXED_PMC_GLOBAL_CTRL_ENABLE(3) |
> +		      PERF_METRICS_GLOBAL_CTRL_ENABLE;
> +
> +	GUEST_RUN_PAYLOAD(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl, "");
> +
> +	/* Check test results. */
> +	metrics = rdmsr(MSR_PERF_METRICS);

Could we use rdpmc instead of rdmsr here? rdpmc is the preferred way to read
counter values.


> +	retiring = FIELD_GET(GENMASK_ULL(7, 0), metrics);
> +	bad_spec = FIELD_GET(GENMASK_ULL(15, 8), metrics);
> +	fe_bound = FIELD_GET(GENMASK_ULL(23, 16), metrics);
> +	be_bound = FIELD_GET(GENMASK_ULL(31, 24), metrics);
> +
> +	/*
> +	 * Be conservative: the measured payload definitely retires work, so
> +	 * Retiring should be non-zero.
> +	 */
> +	GUEST_ASSERT_NE(metrics, 0ULL);
> +	GUEST_ASSERT_NE(retiring, 0ULL);
> +
> +	/*
> +	 * Each level-1 topdown metric is an integer fraction of 255.
> +	 * A +/-3 error margin is chosen for a loose sanity check.
> +	 */
> +	sum = retiring + bad_spec + fe_bound + be_bound;
> +	GUEST_ASSERT(sum >= 0xfc && sum <= 0x102);
> +
> +	/* Sanity check after PERF_METRICS disabled. */
> +	__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
> +	GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), metrics);

Better to use rdpmc here as well.

> +	wrmsr(MSR_PERF_METRICS, 0xdeaddead);
> +
> +	GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), 0xdeaddead);

We can still use rdmsr here, so that rdpmc as well as wrmsr/rdmsr for
PERF_METRICS are all validated.

Thanks.


> +}
> +
> +static void guest_test_perf_metrics(void)
> +{
> +	__guest_test_perf_metrics();
> +	GUEST_DONE();
> +}
> +
> +static void test_perf_metrics(u8 pmu_version, u64 perf_capabilities)
> +{
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +
> +	vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_perf_metrics,
> +					 pmu_version, perf_capabilities);
> +
> +	run_vcpu(vcpu);
> +
> +	kvm_vm_free(vm);
> +}
> +
>  static void guest_test_arch_events(void)
>  {
>  	u8 i;
> @@ -361,7 +434,7 @@ static void test_arch_events(u8 pmu_version, u64 perf_capabilities,
>   * other than PMCs in the future.
>   */
>  #define MAX_NR_GP_COUNTERS	8
> -#define MAX_NR_FIXED_COUNTERS	3
> +#define MAX_NR_FIXED_COUNTERS	4
>  
>  #define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector)		\
>  __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector,			\
> @@ -585,6 +658,7 @@ static void test_intel_counters(void)
>  	u8 nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
>  	u8 nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
>  	u8 pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
> +	u64 advertised_perf_caps = kvm_get_feature_msr(MSR_IA32_PERF_CAPABILITIES);
>  	unsigned int i;
>  	u8 v, j;
>  	u32 k;
> @@ -592,6 +666,7 @@ static void test_intel_counters(void)
>  	const u64 perf_caps[] = {
>  		0,
>  		PMU_CAP_FW_WRITES,
> +		PERF_CAP_PERF_METRICS,
>  	};
>  
>  	/*
> @@ -649,6 +724,10 @@ static void test_intel_counters(void)
>  			if (!kvm_has_perf_caps && perf_caps[i])
>  				continue;
>  
> +			/* Ignore unsupported features. */
> +			if (perf_caps[i] & ~advertised_perf_caps)
> +				continue;
> +
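The new skip condition treats perf_caps[i] as a set of bits, all of which must appear in the value KVM advertises via the feature MSR. Here is the same predicate in isolation. The two bit definitions are given locally as an assumption, matching bit 13 (full-width writes) and bit 15 (PERF_METRICS available) of IA32_PERF_CAPABILITIES. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_PERF_CAPABILITIES bits used by the test (positions per the Intel SDM). */
#define PMU_CAP_FW_WRITES	(1ULL << 13)	/* full-width counter writes */
#define PERF_CAP_PERF_METRICS	(1ULL << 15)	/* PERF_METRICS MSR available */

/* Same condition as the skip: every requested bit must be advertised. */
static bool perf_caps_supported(uint64_t requested, uint64_t advertised)
{
	return !(requested & ~advertised);
}
```

The `0` entry always passes, so the existing configurations keep running. The new PERF_CAP_PERF_METRICS entry is skipped on hosts that do not advertise it.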
>  			pr_info("Testing arch events, PMU version %u, perf_caps = %lx\n",
>  				v, perf_caps[i]);
>  
> @@ -675,6 +754,11 @@ static void test_intel_counters(void)
>  				for (k = 0; k <= (BIT(nr_fixed_counters) - 1); k++)
>  					test_fixed_counters(v, perf_caps[i], j, k);
>  			}
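Assume k is the supported fixed-counter bitmask passed to test_fixed_counters(). Then raising MAX_NR_FIXED_COUNTERS to 4 doubles the k loop from 8 to 16 iterations on hosts that expose four fixed counters. Half of those bitmasks enable fixed counter 3 (TOPDOWN.SLOTS). Below is a small sketch of that counting. The helper names and FIXED_BIT() are hypothetical stand-ins.

```c
#include <stdint.h>

#define FIXED_BIT(n)	(1u << (n))	/* local stand-in for BIT() */

/* Number of fixed-counter bitmasks the k loop visits for @nr_fixed counters. */
static uint32_t nr_fixed_bitmaps(uint8_t nr_fixed)
{
	return FIXED_BIT(nr_fixed);
}

/* How many of those bitmasks enable fixed counter @idx. */
static uint32_t nr_bitmaps_with_counter(uint8_t nr_fixed, uint8_t idx)
{
	uint32_t k, n = 0;

	/* Same bounds as the selftest loop: k = 0 .. BIT(nr_fixed) - 1. */
	for (k = 0; k <= FIXED_BIT(nr_fixed) - 1; k++)
		if (k & FIXED_BIT(idx))
			n++;
	return n;
}
```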
> +
> +			pr_info("Testing Perf Metrics, PMU version %u, perf_caps = %lx\n",
> +				v, perf_caps[i]);
> +
> +			test_perf_metrics(v, perf_caps[i]);
>  		}
>  	}
>  }


Thread overview: 18+ messages
2026-06-29 23:19 [PATCH V6 0/8] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-06-29 23:19 ` [PATCH v6 1/8] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
2026-06-30  2:13   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 2/8] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
2026-06-30  2:16   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 3/8] KVM: x86/pmu: Rename and move vcpu_get_perf_capabilities() to pmu.h Zide Chen
2026-06-30  2:18   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 4/8] KVM: x86/pmu: Snapshot host IA32_PERF_CAPABILITIES in kvm_host Zide Chen
2026-06-30  2:19   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 5/8] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU Zide Chen
2026-06-30  2:20   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 6/8] KVM: x86/pmu: Move RDPMC emulation into per-vendor callbacks Zide Chen
2026-06-30  2:23   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 7/8] KVM: x86/pmu: Emulate RDPMC on performance metrics Zide Chen
2026-06-30  2:23   ` Mi, Dapeng
2026-06-29 23:19 ` [PATCH v6 8/8] KVM: selftests: Add PERF_METRICS and fixed counter 3 tests Zide Chen
2026-06-29 23:45   ` sashiko-bot
2026-06-30  2:36   ` Mi, Dapeng
