* [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support
@ 2026-02-26 23:06 Zide Chen
2026-02-26 23:06 ` [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Zide Chen @ 2026-02-26 23:06 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
The Top-Down Microarchitecture Analysis (TMA) method is a structured
approach for identifying performance bottlenecks in out-of-order
processors.
Currently, guests support the TMA method by collecting Topdown events
using GP counters, which may trigger multiplexing. To free up scarce
GP counters, eliminate multiplexing-induced skew, and obtain coherent
Topdown metric ratios, it is desirable to expose fixed counter 3 and
the IA32_PERF_METRICS MSR to guests.
Several failed attempts have been made to virtualize this under the
legacy vPMU model: [1], [2], [3]. With the new mediated vPMU, enabling
TMA support in guests becomes much simpler. It avoids invasive changes
to the perf core, eliminates CPU pinning and fixed-counter affinity
issues, and reduces the overhead of trapping and emulating MSR accesses.
[1] https://lore.kernel.org/kvm/20231031090613.2872700-1-dapeng1.mi@linux.intel.com/
[2] https://lore.kernel.org/all/20230927033124.1226509-1-dapeng1.mi@linux.intel.com/T/
[3] https://lwn.net/ml/linux-kernel/20221212125844.41157-1-likexu@tencent.com/
Tested on a Sapphire Rapids (SPR) system. Without this series, only raw
topdown.*_slots events
work in the guest, and metric events (e.g. cpu/topdown-bad-spec/) are
not available.
With this series, metric events are visible in the guest. Run this
command on both host and guest:
$ perf stat --topdown --no-metric-only -- taskset -c 2 perf bench sched messaging
Host results:
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 1.500 [sec]
Performance counter stats for 'taskset -c 2 perf bench sched messaging':
4,266,060,558 TOPDOWN.SLOTS:u # 32.0 % tma_frontend_bound
# 5.2 % tma_bad_speculation
588,397,905 topdown-retiring:u # 13.8 % tma_retiring
# 49.0 % tma_backend_bound
1,376,283,990 topdown-fe-bound:u
2,096,827,304 topdown-be-bound:u
217,425,841 topdown-bad-spec:u
5,050,520 INT_MISC.UOP_DROPPING:u
1.755503765 seconds time elapsed
0.235965000 seconds user
1.500508000 seconds sys
Guest results:
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 1.558 [sec]
Performance counter stats for 'taskset -c 2 perf bench sched messaging':
5,148,818,712 TOPDOWN.SLOTS:u # 34.0 % tma_frontend_bound
# 4.6 % tma_bad_speculation
602,862,499 topdown-retiring:u # 11.7 % tma_retiring
# 49.7 % tma_backend_bound
1,759,698,259 topdown-fe-bound:u
2,565,571,672 topdown-be-bound:u
230,277,308 topdown-bad-spec:u
4,966,279 INT_MISC.UOP_DROPPING:u
1.783366587 seconds time elapsed
0.313692000 seconds user
1.446377000 seconds sys
Dapeng Mi (2):
KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
Zide Chen (1):
KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/perf_event.h | 1 +
arch/x86/kvm/pmu.c | 4 +++
arch/x86/kvm/vmx/pmu_intel.c | 57 ++++++++++++++++++++++++-------
arch/x86/kvm/vmx/pmu_intel.h | 5 +++
arch/x86/kvm/vmx/vmx.c | 6 ++++
arch/x86/kvm/x86.c | 10 ++++--
8 files changed, 71 insertions(+), 16 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
2026-02-26 23:06 [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
@ 2026-02-26 23:06 ` Zide Chen
2026-03-24 5:48 ` Mi, Dapeng
2026-02-26 23:06 ` [PATCH 2/3] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Zide Chen @ 2026-02-26 23:06 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
Only fixed counters 0..2 have matching generic cross-platform
hardware perf events (INSTRUCTIONS, CPU_CYCLES, REF_CPU_CYCLES).
Therefore, perf_get_hw_event_config() is only applicable to these
counters.
KVM does not intend to emulate fixed counters >= 3 on legacy
(non-mediated) vPMU, while for mediated vPMU, KVM does not care what
the fixed counter event mappings are. Therefore, return 0 for their
eventsel.
Also remove __always_inline as BUILD_BUG_ON() is no longer needed.
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/kvm/vmx/pmu_intel.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 27eb76e6b6a0..4bfd16a9e6c7 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -454,28 +454,30 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* different perf_event is already utilizing the requested counter, but the end
* result is the same (ignoring the fact that using a general purpose counter
* will likely exacerbate counter contention).
- *
- * Forcibly inlined to allow asserting on @index at build time, and there should
- * never be more than one user.
*/
-static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
+static u64 intel_get_fixed_pmc_eventsel(unsigned int index)
{
const enum perf_hw_id fixed_pmc_perf_ids[] = {
[0] = PERF_COUNT_HW_INSTRUCTIONS,
[1] = PERF_COUNT_HW_CPU_CYCLES,
[2] = PERF_COUNT_HW_REF_CPU_CYCLES,
};
- u64 eventsel;
-
- BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
- BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
+ u64 eventsel = 0;
/*
- * Yell if perf reports support for a fixed counter but perf doesn't
- * have a known encoding for the associated general purpose event.
+ * Fixed counters 3 and above don't have corresponding generic hardware
+	 * perf events, and KVM does not intend to emulate them on non-mediated
+ * vPMU.
*/
- eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
- WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
+ if (index < 3) {
+ /*
+ * Yell if perf reports support for a fixed counter but perf
+ * doesn't have a known encoding for the associated general
+ * purpose event.
+ */
+ eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
+ WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
+ }
return eventsel;
}
--
2.53.0
* [PATCH 2/3] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
2026-02-26 23:06 [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-02-26 23:06 ` [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
@ 2026-02-26 23:06 ` Zide Chen
2026-02-26 23:06 ` [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
2026-04-09 6:25 ` [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Mi, Dapeng
3 siblings, 0 replies; 9+ messages in thread
From: Zide Chen @ 2026-02-26 23:06 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
Starting with Ice Lake, Intel introduced fixed counter 3, which counts
TOPDOWN.SLOTS - the number of available slots for an unhalted logical
processor. It serves as the denominator for top-level metrics in the
Top-down Microarchitecture Analysis method.
Emulating this counter on legacy vPMU would require introducing a new
generic perf encoding for the Intel-specific TOPDOWN.SLOTS event in
order to call perf_get_hw_event_config(). This is undesirable as it
would pollute the generic perf event encoding.
Moreover, KVM does not intend to emulate IA32_PERF_METRICS in the
legacy vPMU model, and without IA32_PERF_METRICS, emulating this
counter has little practical value. Therefore, expose fixed counter
3 to guests only when mediated vPMU is enabled.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/pmu.c | 4 ++++
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..4666b2c7988f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -555,7 +555,7 @@ struct kvm_pmc {
#define KVM_MAX_NR_GP_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
KVM_MAX_NR_AMD_GP_COUNTERS)
-#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 3
+#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 4
#define KVM_MAX_NR_AMD_FIXED_COUNTERS 0
#define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
KVM_MAX_NR_AMD_FIXED_COUNTERS)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index bd6b785cf261..ee49395bfb82 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -148,6 +148,10 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
}
memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
+
+ if (!enable_mediated_pmu && kvm_pmu_cap.num_counters_fixed > 3)
+ kvm_pmu_cap.num_counters_fixed = 3;
+
kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
pmu_ops->MAX_NR_GP_COUNTERS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3fb64905d190..2ab7a4958620 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -355,7 +355,7 @@ static const u32 msrs_to_save_base[] = {
static const u32 msrs_to_save_pmu[] = {
MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
- MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+ MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
MSR_CORE_PERF_GLOBAL_CTRL,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
@@ -7738,7 +7738,7 @@ static void kvm_init_msr_lists(void)
{
unsigned i;
- BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
+ BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 4,
"Please update the fixed PMCs in msrs_to_save_pmu[]");
num_msrs_to_save = 0;
--
2.53.0
* [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-02-26 23:06 [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-02-26 23:06 ` [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
2026-02-26 23:06 ` [PATCH 2/3] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
@ 2026-02-26 23:06 ` Zide Chen
2026-03-24 5:54 ` Mi, Dapeng
2026-04-09 6:25 ` [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Mi, Dapeng
3 siblings, 1 reply; 9+ messages in thread
From: Zide Chen @ 2026-02-26 23:06 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
Bit 15 in IA32_PERF_CAPABILITIES indicates that the CPU provides
built-in support for Topdown Microarchitecture Analysis (TMA) L1
metrics via the IA32_PERF_METRICS MSR.
Expose this capability only when mediated vPMU is enabled, as emulating
IA32_PERF_METRICS in the legacy vPMU model is impractical.
Pass IA32_PERF_METRICS through to the guest only when mediated vPMU is
enabled and bit 15 is set in the guest's IA32_PERF_CAPABILITIES. Allow
kvm_pmu_{get,set}_msr() to handle this MSR for host accesses.
Save and restore this MSR on host/guest PMU context switches so that
host PMU activity does not clobber the guest value, and guest state
is not leaked into the host.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/perf_event.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 31 +++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/pmu_intel.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 6 +++++-
7 files changed, 50 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4666b2c7988f..bf817c613451 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -575,6 +575,7 @@ struct kvm_pmu {
u64 global_status_rsvd;
u64 reserved_bits;
u64 raw_event_mask;
+ u64 perf_metrics;
struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS];
struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS];
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index da5275d8eda6..337667a7ad1b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -331,6 +331,7 @@
#define PERF_CAP_PEBS_FORMAT 0xf00
#define PERF_CAP_FW_WRITES BIT_ULL(13)
#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
+#define PERF_CAP_PERF_METRICS BIT_ULL(15)
#define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ff5acb8b199b..dfead3a34b74 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -445,6 +445,7 @@ static inline bool is_topdown_idx(int idx)
#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT 54
#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT 48
+#define GLOBAL_STATUS_PERF_METRICS_OVF BIT_ULL(GLOBAL_STATUS_PERF_METRICS_OVF_BIT)
#define GLOBAL_CTRL_EN_PERF_METRICS BIT_ULL(48)
/*
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9da47cf2af63..61bb2086f94a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -180,6 +180,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
return kvm_pmu_has_perf_global_ctrl(pmu);
+ case MSR_PERF_METRICS:
+ return vcpu_has_perf_metrics(vcpu);
case MSR_IA32_PEBS_ENABLE:
ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
break;
@@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_FIXED_CTR_CTRL:
msr_info->data = pmu->fixed_ctr_ctrl;
break;
+ case MSR_PERF_METRICS:
+ WARN_ON(!msr_info->host_initiated);
+ msr_info->data = pmu->perf_metrics;
+ break;
case MSR_IA32_PEBS_ENABLE:
msr_info->data = pmu->pebs_enable;
break;
@@ -384,6 +390,10 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->fixed_ctr_ctrl != data)
reprogram_fixed_counters(pmu, data);
break;
+ case MSR_PERF_METRICS:
+ WARN_ON(!msr_info->host_initiated);
+ pmu->perf_metrics = data;
+ break;
case MSR_IA32_PEBS_ENABLE:
if (data & pmu->pebs_enable_rsvd)
return 1;
@@ -579,6 +589,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->global_status_rsvd &=
~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
+ if (perf_capabilities & PERF_CAP_PERF_METRICS) {
+ pmu->global_ctrl_rsvd &= ~GLOBAL_CTRL_EN_PERF_METRICS;
+ pmu->global_status_rsvd &= ~GLOBAL_STATUS_PERF_METRICS_OVF;
+ }
+
if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
pmu->pebs_enable_rsvd = counter_rsvd;
@@ -622,6 +637,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
static void intel_pmu_reset(struct kvm_vcpu *vcpu)
{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+ pmu->perf_metrics = 0;
intel_pmu_release_guest_lbr_event(vcpu);
}
@@ -793,6 +811,13 @@ static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
u64 global_status, toggle;
+ /*
+ * PERF_METRICS MSR must be restored closely after fixed counter 3
+ * (kvm_pmu_load_guest_pmcs()).
+ */
+ if (vcpu_has_perf_metrics(vcpu))
+ wrmsrq(MSR_PERF_METRICS, pmu->perf_metrics);
+
rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
toggle = pmu->global_status ^ global_status;
if (global_status & toggle)
@@ -821,6 +846,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
*/
if (pmu->fixed_ctr_ctrl_hw)
wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+
+ if (vcpu_has_perf_metrics(vcpu)) {
+ pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
+ if (pmu->perf_metrics)
+ wrmsrq(MSR_PERF_METRICS, 0);
+ }
}
struct kvm_pmu_ops intel_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
index 5d9357640aa1..2ec547223b09 100644
--- a/arch/x86/kvm/vmx/pmu_intel.h
+++ b/arch/x86/kvm/vmx/pmu_intel.h
@@ -40,4 +40,9 @@ struct lbr_desc {
extern struct x86_pmu_lbr vmx_lbr_caps;
+static inline bool vcpu_has_perf_metrics(struct kvm_vcpu *vcpu)
+{
+ return !!(vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PERF_METRICS);
+}
+
#endif /* __KVM_X86_VMX_PMU_INTEL_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 967b58a8ab9d..4ade1394460a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4338,6 +4338,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
MSR_TYPE_RW, intercept);
vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
MSR_TYPE_RW, intercept);
+
+ vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
+ !vcpu_has_perf_metrics(vcpu));
}
static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
@@ -8183,6 +8186,9 @@ static __init u64 vmx_get_perf_capabilities(void)
perf_cap &= ~PERF_CAP_PEBS_BASELINE;
}
+ if (enable_mediated_pmu)
+ perf_cap |= host_perf_cap & PERF_CAP_PERF_METRICS;
+
return perf_cap;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ab7a4958620..4d0e38303aa5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -357,7 +357,7 @@ static const u32 msrs_to_save_pmu[] = {
MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
- MSR_CORE_PERF_GLOBAL_CTRL,
+ MSR_CORE_PERF_GLOBAL_CTRL, MSR_PERF_METRICS,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
/* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
@@ -7675,6 +7675,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
return;
break;
+ case MSR_PERF_METRICS:
+ if (!(kvm_caps.supported_perf_cap & PERF_CAP_PERF_METRICS))
+ return;
+ break;
case MSR_ARCH_PERFMON_PERFCTR0 ...
MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
--
2.53.0
* Re: [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
2026-02-26 23:06 ` [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
@ 2026-03-24 5:48 ` Mi, Dapeng
2026-03-27 20:53 ` Chen, Zide
0 siblings, 1 reply; 9+ messages in thread
From: Mi, Dapeng @ 2026-03-24 5:48 UTC (permalink / raw)
To: Zide Chen, Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
Shukla Manali, Falcon Thomas, Xudong Hao
On 2/27/2026 7:06 AM, Zide Chen wrote:
> Only fixed counters 0..2 have matching generic cross-platform
> hardware perf events (INSTRUCTIONS, CPU_CYCLES, REF_CPU_CYCLES).
> Therefore, perf_get_hw_event_config() is only applicable to these
> counters.
>
> KVM does not intend to emulate fixed counters >= 3 on legacy
> (non-mediated) vPMU, while for mediated vPMU, KVM does not care what
> the fixed counter event mappings are. Therefore, return 0 for their
> eventsel.
>
> Also remove __always_inline as BUILD_BUG_ON() is no longer needed.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 27eb76e6b6a0..4bfd16a9e6c7 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -454,28 +454,30 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> * different perf_event is already utilizing the requested counter, but the end
> * result is the same (ignoring the fact that using a general purpose counter
> * will likely exacerbate counter contention).
> - *
> - * Forcibly inlined to allow asserting on @index at build time, and there should
> - * never be more than one user.
> */
> -static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
> +static u64 intel_get_fixed_pmc_eventsel(unsigned int index)
> {
> const enum perf_hw_id fixed_pmc_perf_ids[] = {
> [0] = PERF_COUNT_HW_INSTRUCTIONS,
> [1] = PERF_COUNT_HW_CPU_CYCLES,
> [2] = PERF_COUNT_HW_REF_CPU_CYCLES,
> };
> - u64 eventsel;
> -
> - BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
> - BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
> + u64 eventsel = 0;
I'm not sure if we can directly add "slots" event support to the
perf_hw_id list. It looks more straightforward, but it would require perf
core changes and impact all architectures; not sure it's worth doing.
The current way is fine for me if we decide not to support fixed counter 3
on the legacy emulated vPMU.
One minor optimization: we could return early here and avoid the extra
indentation, e.g.,

	if (index >= 3)
		return eventsel;
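To make the suggestion concrete, here is a standalone model of that
early-return shape (kernel helpers stubbed out for illustration; the event
encodings below are placeholders, not values taken from perf):

```c
#include <stdint.h>

/* Stand-ins for kernel infrastructure, for illustration only. */
enum perf_hw_id {
	PERF_COUNT_HW_CPU_CYCLES,
	PERF_COUNT_HW_INSTRUCTIONS,
	PERF_COUNT_HW_REF_CPU_CYCLES,
};

static uint64_t perf_get_hw_event_config(enum perf_hw_id id)
{
	/* Placeholder encodings; the real ones come from the perf core. */
	static const uint64_t config[] = { 0x3c, 0xc0, 0x300 };
	return config[id];
}

/* The suggested early-return shape for intel_get_fixed_pmc_eventsel(). */
static uint64_t get_fixed_pmc_eventsel(unsigned int index)
{
	const enum perf_hw_id fixed_pmc_perf_ids[] = {
		[0] = PERF_COUNT_HW_INSTRUCTIONS,
		[1] = PERF_COUNT_HW_CPU_CYCLES,
		[2] = PERF_COUNT_HW_REF_CPU_CYCLES,
	};

	/* Fixed counters 3+ have no generic perf event mapping. */
	if (index >= 3)
		return 0;

	return perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
}
```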
>
> /*
> - * Yell if perf reports support for a fixed counter but perf doesn't
> - * have a known encoding for the associated general purpose event.
> + * Fixed counters 3 and above don't have corresponding generic hardware
> +	 * perf events, and KVM does not intend to emulate them on non-mediated
> + * vPMU.
> */
> - eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
> - WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
> + if (index < 3) {
> + /*
> + * Yell if perf reports support for a fixed counter but perf
> + * doesn't have a known encoding for the associated general
> + * purpose event.
> + */
> + eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
> + WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
> + }
> return eventsel;
> }
>
* Re: [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-02-26 23:06 ` [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
@ 2026-03-24 5:54 ` Mi, Dapeng
2026-03-27 20:23 ` Chen, Zide
0 siblings, 1 reply; 9+ messages in thread
From: Mi, Dapeng @ 2026-03-24 5:54 UTC (permalink / raw)
To: Zide Chen, Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
Shukla Manali, Falcon Thomas, Xudong Hao
On 2/27/2026 7:06 AM, Zide Chen wrote:
> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> Bit 15 in IA32_PERF_CAPABILITIES indicates that the CPU provides
> built-in support for Topdown Microarchitecture Analysis (TMA) L1
> metrics via the IA32_PERF_METRICS MSR.
>
> Expose this capability only when mediated vPMU is enabled, as emulating
> IA32_PERF_METRICS in the legacy vPMU model is impractical.
>
> Pass IA32_PERF_METRICS through to the guest only when mediated vPMU is
> enabled and bit 15 is set in the guest's IA32_PERF_CAPABILITIES. Allow
> kvm_pmu_{get,set}_msr() to handle this MSR for host accesses.
>
> Save and restore this MSR on host/guest PMU context switches so that
> host PMU activity does not clobber the guest value, and guest state
> is not leaked into the host.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/include/asm/perf_event.h | 1 +
> arch/x86/kvm/vmx/pmu_intel.c | 31 +++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/pmu_intel.h | 5 +++++
> arch/x86/kvm/vmx/vmx.c | 6 ++++++
> arch/x86/kvm/x86.c | 6 +++++-
> 7 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4666b2c7988f..bf817c613451 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -575,6 +575,7 @@ struct kvm_pmu {
> u64 global_status_rsvd;
> u64 reserved_bits;
> u64 raw_event_mask;
> + u64 perf_metrics;
> struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS];
> struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS];
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index da5275d8eda6..337667a7ad1b 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -331,6 +331,7 @@
> #define PERF_CAP_PEBS_FORMAT 0xf00
> #define PERF_CAP_FW_WRITES BIT_ULL(13)
> #define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
> +#define PERF_CAP_PERF_METRICS BIT_ULL(15)
> #define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
> #define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
> PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index ff5acb8b199b..dfead3a34b74 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -445,6 +445,7 @@ static inline bool is_topdown_idx(int idx)
> #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT 54
> #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
> #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT 48
> +#define GLOBAL_STATUS_PERF_METRICS_OVF BIT_ULL(GLOBAL_STATUS_PERF_METRICS_OVF_BIT)
>
> #define GLOBAL_CTRL_EN_PERF_METRICS BIT_ULL(48)
> /*
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 9da47cf2af63..61bb2086f94a 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -180,6 +180,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
> switch (msr) {
> case MSR_CORE_PERF_FIXED_CTR_CTRL:
> return kvm_pmu_has_perf_global_ctrl(pmu);
> + case MSR_PERF_METRICS:
> + return vcpu_has_perf_metrics(vcpu);
> case MSR_IA32_PEBS_ENABLE:
> ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
> break;
> @@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_CORE_PERF_FIXED_CTR_CTRL:
> msr_info->data = pmu->fixed_ctr_ctrl;
> break;
> + case MSR_PERF_METRICS:
> + WARN_ON(!msr_info->host_initiated);
WARN_ON_ONCE() should be good enough.
> + msr_info->data = pmu->perf_metrics;
> + break;
> case MSR_IA32_PEBS_ENABLE:
> msr_info->data = pmu->pebs_enable;
> break;
> @@ -384,6 +390,10 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (pmu->fixed_ctr_ctrl != data)
> reprogram_fixed_counters(pmu, data);
> break;
> + case MSR_PERF_METRICS:
> + WARN_ON(!msr_info->host_initiated);
ditto.
> + pmu->perf_metrics = data;
> + break;
> case MSR_IA32_PEBS_ENABLE:
> if (data & pmu->pebs_enable_rsvd)
> return 1;
> @@ -579,6 +589,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> pmu->global_status_rsvd &=
> ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
>
> + if (perf_capabilities & PERF_CAP_PERF_METRICS) {
> + pmu->global_ctrl_rsvd &= ~GLOBAL_CTRL_EN_PERF_METRICS;
> + pmu->global_status_rsvd &= ~GLOBAL_STATUS_PERF_METRICS_OVF;
> + }
> +
> if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
> if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
> pmu->pebs_enable_rsvd = counter_rsvd;
> @@ -622,6 +637,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>
> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
> {
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> + pmu->perf_metrics = 0;
> intel_pmu_release_guest_lbr_event(vcpu);
> }
>
> @@ -793,6 +811,13 @@ static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> u64 global_status, toggle;
>
> + /*
> + * PERF_METRICS MSR must be restored closely after fixed counter 3
> + * (kvm_pmu_load_guest_pmcs()).
> + */
> + if (vcpu_has_perf_metrics(vcpu))
> + wrmsrq(MSR_PERF_METRICS, pmu->perf_metrics);
> +
> rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
> toggle = pmu->global_status ^ global_status;
> if (global_status & toggle)
> @@ -821,6 +846,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
> */
> if (pmu->fixed_ctr_ctrl_hw)
> wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> +
> + if (vcpu_has_perf_metrics(vcpu)) {
> + pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
> + if (pmu->perf_metrics)
> + wrmsrq(MSR_PERF_METRICS, 0);
> + }
> }
>
> struct kvm_pmu_ops intel_pmu_ops __initdata = {
> diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
> index 5d9357640aa1..2ec547223b09 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.h
> +++ b/arch/x86/kvm/vmx/pmu_intel.h
> @@ -40,4 +40,9 @@ struct lbr_desc {
>
> extern struct x86_pmu_lbr vmx_lbr_caps;
>
> +static inline bool vcpu_has_perf_metrics(struct kvm_vcpu *vcpu)
> +{
> + return !!(vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PERF_METRICS);
> +}
> +
> #endif /* __KVM_X86_VMX_PMU_INTEL_H */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 967b58a8ab9d..4ade1394460a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4338,6 +4338,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
> MSR_TYPE_RW, intercept);
> vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
> MSR_TYPE_RW, intercept);
> +
> + vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
> + !vcpu_has_perf_metrics(vcpu));
> }
>
> static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
> @@ -8183,6 +8186,9 @@ static __init u64 vmx_get_perf_capabilities(void)
> perf_cap &= ~PERF_CAP_PEBS_BASELINE;
> }
>
> + if (enable_mediated_pmu)
> + perf_cap |= host_perf_cap & PERF_CAP_PERF_METRICS;
> +
> return perf_cap;
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2ab7a4958620..4d0e38303aa5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -357,7 +357,7 @@ static const u32 msrs_to_save_pmu[] = {
> MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
> MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
> MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
> - MSR_CORE_PERF_GLOBAL_CTRL,
> + MSR_CORE_PERF_GLOBAL_CTRL, MSR_PERF_METRICS,
> MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
>
> /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
> @@ -7675,6 +7675,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
> intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
> return;
> break;
> + case MSR_PERF_METRICS:
> + if (!(kvm_caps.supported_perf_cap & PERF_CAP_PERF_METRICS))
> + return;
> + break;
> case MSR_ARCH_PERFMON_PERFCTR0 ...
> MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
> if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
* Re: [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-03-24 5:54 ` Mi, Dapeng
@ 2026-03-27 20:23 ` Chen, Zide
0 siblings, 0 replies; 9+ messages in thread
From: Chen, Zide @ 2026-03-27 20:23 UTC (permalink / raw)
To: Mi, Dapeng, Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
Shukla Manali, Falcon Thomas, Xudong Hao
On 3/23/2026 10:54 PM, Mi, Dapeng wrote:
>
> On 2/27/2026 7:06 AM, Zide Chen wrote:
>> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>
>> Bit 15 in IA32_PERF_CAPABILITIES indicates that the CPU provides
>> built-in support for Topdown Microarchitecture Analysis (TMA) L1
>> metrics via the IA32_PERF_METRICS MSR.
>>
>> Expose this capability only when mediated vPMU is enabled, as emulating
>> IA32_PERF_METRICS in the legacy vPMU model is impractical.
>>
>> Pass IA32_PERF_METRICS through to the guest only when mediated vPMU is
>> enabled and bit 15 is set in the guest's IA32_PERF_CAPABILITIES. Allow
>> kvm_pmu_{get,set}_msr() to handle this MSR for host accesses.
>>
>> Save and restore this MSR on host/guest PMU context switches so that
>> host PMU activity does not clobber the guest value, and guest state
>> is not leaked into the host.
>>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/include/asm/msr-index.h | 1 +
>> arch/x86/include/asm/perf_event.h | 1 +
>> arch/x86/kvm/vmx/pmu_intel.c | 31 +++++++++++++++++++++++++++++++
>> arch/x86/kvm/vmx/pmu_intel.h | 5 +++++
>> arch/x86/kvm/vmx/vmx.c | 6 ++++++
>> arch/x86/kvm/x86.c | 6 +++++-
>> 7 files changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 4666b2c7988f..bf817c613451 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -575,6 +575,7 @@ struct kvm_pmu {
>> u64 global_status_rsvd;
>> u64 reserved_bits;
>> u64 raw_event_mask;
>> + u64 perf_metrics;
>> struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS];
>> struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS];
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index da5275d8eda6..337667a7ad1b 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -331,6 +331,7 @@
>> #define PERF_CAP_PEBS_FORMAT 0xf00
>> #define PERF_CAP_FW_WRITES BIT_ULL(13)
>> #define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
>> +#define PERF_CAP_PERF_METRICS BIT_ULL(15)
>> #define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
>> #define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
>> PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index ff5acb8b199b..dfead3a34b74 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -445,6 +445,7 @@ static inline bool is_topdown_idx(int idx)
>> #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT 54
>> #define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
>> #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT 48
>> +#define GLOBAL_STATUS_PERF_METRICS_OVF BIT_ULL(GLOBAL_STATUS_PERF_METRICS_OVF_BIT)
>>
>> #define GLOBAL_CTRL_EN_PERF_METRICS BIT_ULL(48)
>> /*
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 9da47cf2af63..61bb2086f94a 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -180,6 +180,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>> switch (msr) {
>> case MSR_CORE_PERF_FIXED_CTR_CTRL:
>> return kvm_pmu_has_perf_global_ctrl(pmu);
>> + case MSR_PERF_METRICS:
>> + return vcpu_has_perf_metrics(vcpu);
>> case MSR_IA32_PEBS_ENABLE:
>> ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
>> break;
>> @@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> case MSR_CORE_PERF_FIXED_CTR_CTRL:
>> msr_info->data = pmu->fixed_ctr_ctrl;
>> break;
>> + case MSR_PERF_METRICS:
>> + WARN_ON(!msr_info->host_initiated);
>
> WARN_ON_ONCE() should be good enough.
Sure, and it seems more reasonable to move the check to
intel_is_valid_msr().
>
>
>> + msr_info->data = pmu->perf_metrics;
>> + break;
>> case MSR_IA32_PEBS_ENABLE:
>> msr_info->data = pmu->pebs_enable;
>> break;
>> @@ -384,6 +390,10 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> if (pmu->fixed_ctr_ctrl != data)
>> reprogram_fixed_counters(pmu, data);
>> break;
>> + case MSR_PERF_METRICS:
>> + WARN_ON(!msr_info->host_initiated);
>
> ditto.
>
>
>> + pmu->perf_metrics = data;
>> + break;
>> case MSR_IA32_PEBS_ENABLE:
>> if (data & pmu->pebs_enable_rsvd)
>> return 1;
>> @@ -579,6 +589,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>> pmu->global_status_rsvd &=
>> ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
>>
>> + if (perf_capabilities & PERF_CAP_PERF_METRICS) {
>> + pmu->global_ctrl_rsvd &= ~GLOBAL_CTRL_EN_PERF_METRICS;
>> + pmu->global_status_rsvd &= ~GLOBAL_STATUS_PERF_METRICS_OVF;
>> + }
>> +
>> if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
>> if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
>> pmu->pebs_enable_rsvd = counter_rsvd;
>> @@ -622,6 +637,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>
>> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>> {
>> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +
>> + pmu->perf_metrics = 0;
>> intel_pmu_release_guest_lbr_event(vcpu);
>> }
>>
>> @@ -793,6 +811,13 @@ static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> u64 global_status, toggle;
>>
>> + /*
>> + * PERF_METRICS MSR must be restored closely after fixed counter 3
>> + * (kvm_pmu_load_guest_pmcs()).
>> + */
>> + if (vcpu_has_perf_metrics(vcpu))
>> + wrmsrq(MSR_PERF_METRICS, pmu->perf_metrics);
>> +
>> rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
>> toggle = pmu->global_status ^ global_status;
>> if (global_status & toggle)
>> @@ -821,6 +846,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
>> */
>> if (pmu->fixed_ctr_ctrl_hw)
>> wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>> +
>> + if (vcpu_has_perf_metrics(vcpu)) {
>> + pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
>> + if (pmu->perf_metrics)
>> + wrmsrq(MSR_PERF_METRICS, 0);
>> + }
>> }
>>
>> struct kvm_pmu_ops intel_pmu_ops __initdata = {
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
>> index 5d9357640aa1..2ec547223b09 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.h
>> +++ b/arch/x86/kvm/vmx/pmu_intel.h
>> @@ -40,4 +40,9 @@ struct lbr_desc {
>>
>> extern struct x86_pmu_lbr vmx_lbr_caps;
>>
>> +static inline bool vcpu_has_perf_metrics(struct kvm_vcpu *vcpu)
>> +{
>> + return !!(vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PERF_METRICS);
>> +}
>> +
>> #endif /* __KVM_X86_VMX_PMU_INTEL_H */
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 967b58a8ab9d..4ade1394460a 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4338,6 +4338,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
>> MSR_TYPE_RW, intercept);
>> vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
>> MSR_TYPE_RW, intercept);
>> +
>> + vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
>> + !vcpu_has_perf_metrics(vcpu));
>> }
>>
>> static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>> @@ -8183,6 +8186,9 @@ static __init u64 vmx_get_perf_capabilities(void)
>> perf_cap &= ~PERF_CAP_PEBS_BASELINE;
>> }
>>
>> + if (enable_mediated_pmu)
>> + perf_cap |= host_perf_cap & PERF_CAP_PERF_METRICS;
>> +
>> return perf_cap;
>> }
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2ab7a4958620..4d0e38303aa5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -357,7 +357,7 @@ static const u32 msrs_to_save_pmu[] = {
>> MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
>> MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
>> MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
>> - MSR_CORE_PERF_GLOBAL_CTRL,
>> + MSR_CORE_PERF_GLOBAL_CTRL, MSR_PERF_METRICS,
>> MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
>>
>> /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
>> @@ -7675,6 +7675,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>> intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
>> return;
>> break;
>> + case MSR_PERF_METRICS:
>> + if (!(kvm_caps.supported_perf_cap & PERF_CAP_PERF_METRICS))
>> + return;
>> + break;
>> case MSR_ARCH_PERFMON_PERFCTR0 ...
>> MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
>> if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
* Re: [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
2026-03-24 5:48 ` Mi, Dapeng
@ 2026-03-27 20:53 ` Chen, Zide
0 siblings, 0 replies; 9+ messages in thread
From: Chen, Zide @ 2026-03-27 20:53 UTC (permalink / raw)
To: Mi, Dapeng, Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
Shukla Manali, Falcon Thomas, Xudong Hao
On 3/23/2026 10:48 PM, Mi, Dapeng wrote:
>
> On 2/27/2026 7:06 AM, Zide Chen wrote:
>> Only fixed counters 0..2 have matching generic cross-platform
>> hardware perf events (INSTRUCTIONS, CPU_CYCLES, REF_CPU_CYCLES).
>> Therefore, perf_get_hw_event_config() is only applicable to these
>> counters.
>>
>> KVM does not intend to emulate fixed counters >= 3 on legacy
>> (non-mediated) vPMU, while for mediated vPMU, KVM does not care what
>> the fixed counter event mappings are. Therefore, return 0 for their
>> eventsel.
>>
>> Also remove __always_inline as BUILD_BUG_ON() is no longer needed.
>>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> arch/x86/kvm/vmx/pmu_intel.c | 26 ++++++++++++++------------
>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 27eb76e6b6a0..4bfd16a9e6c7 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -454,28 +454,30 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> * different perf_event is already utilizing the requested counter, but the end
>> * result is the same (ignoring the fact that using a general purpose counter
>> * will likely exacerbate counter contention).
>> - *
>> - * Forcibly inlined to allow asserting on @index at build time, and there should
>> - * never be more than one user.
>> */
>> -static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
>> +static u64 intel_get_fixed_pmc_eventsel(unsigned int index)
>> {
>> const enum perf_hw_id fixed_pmc_perf_ids[] = {
>> [0] = PERF_COUNT_HW_INSTRUCTIONS,
>> [1] = PERF_COUNT_HW_CPU_CYCLES,
>> [2] = PERF_COUNT_HW_REF_CPU_CYCLES,
>> };
>> - u64 eventsel;
>> -
>> - BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
>> - BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
>> + u64 eventsel = 0;
>
> I'm not sure if we can directly add the "slots" event support in the
> perf_hw_id list. It looks more straightforward, but it would need perf
> core changes and would impact all architectures. Not sure it's worth
> doing. The current way is fine for me if we decide not to support fixed
> counter 3 for the legacy emulated vPMU.
If the emulated vPMU does not intend to support fixed counter 3 and
above, which is currently the case, I do not see any benefit in
extending enum perf_hw_id, which is generic across platforms.
> One minor optimization: we could return directly here and avoid the
> extra indentation, e.g.,
>
> if (index >= 3)
> 	return eventsel;
Yes, agreed. The code is cleaner and more logical.
>
>
>>
>> /*
>> - * Yell if perf reports support for a fixed counter but perf doesn't
>> - * have a known encoding for the associated general purpose event.
>> +	 * Fixed counters 3 and above don't have corresponding generic hardware
>> +	 * perf events, and KVM does not intend to emulate them on non-mediated
>> + * vPMU.
>> */
>> - eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
>> - WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
>> + if (index < 3) {
>> + /*
>> + * Yell if perf reports support for a fixed counter but perf
>> + * doesn't have a known encoding for the associated general
>> + * purpose event.
>> + */
>> + eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
>> + WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
>> + }
>> return eventsel;
>> }
>>
* Re: [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support
2026-02-26 23:06 [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
` (2 preceding siblings ...)
2026-02-26 23:06 ` [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
@ 2026-04-09 6:25 ` Mi, Dapeng
3 siblings, 0 replies; 9+ messages in thread
From: Mi, Dapeng @ 2026-04-09 6:25 UTC (permalink / raw)
To: Zide Chen, Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Das Sandipan,
Shukla Manali, Falcon Thomas, Xudong Hao
Zide, it seems there are currently no KVM selftests covering fixed
counter 3 and the topdown metrics support. We could enhance the current
PMU selftest or add a new one to cover this case. Thanks.
On 2/27/2026 7:06 AM, Zide Chen wrote:
> The Top-Down Microarchitecture Analysis (TMA) method is a structured
> approach for identifying performance bottlenecks in out-of-order
> processors.
>
> Currently, guests support the TMA method by collecting Topdown events
> using GP counters, which may trigger multiplexing. To free up scarce
> GP counters, eliminate multiplexing-induced skew, and obtain coherent
> Topdown metric ratios, it is desirable to expose fixed counter 3 and
> the IA32_PERF_METRICS MSR to guests.
>
> Several failed attempts have been made to virtualize this under the
> legacy vPMU model: [1], [2], [3]. With the new mediated vPMU, enabling
> TMA support in guests becomes much simpler. It avoids invasive changes
> to the perf core, eliminates CPU pinning and fixed-counter affinity
> issues, and reduces the overhead of trapping and emulating MSR accesses.
>
> [1] https://lore.kernel.org/kvm/20231031090613.2872700-1-dapeng1.mi@linux.intel.com/
> [2] https://lore.kernel.org/all/20230927033124.1226509-1-dapeng1.mi@linux.intel.com/T/
> [3] https://lwn.net/ml/linux-kernel/20221212125844.41157-1-likexu@tencent.com/
>
> Tested on an SPR. Without this series, only raw topdown.*_slots events
> work in the guest, and metric events (e.g. cpu/topdown-bad-spec/) are
> not available.
>
> With this series, metric events are visible in the guest. Run this
> command on both host and guest:
>
> $ perf stat --topdown --no-metric-only -- taskset -c 2 perf bench sched messaging
>
> Host results:
>
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 1.500 [sec]
>
> Performance counter stats for 'taskset -c 2 perf bench sched messaging':
>
> 4,266,060,558 TOPDOWN.SLOTS:u # 32.0 % tma_frontend_bound
> # 5.2 % tma_bad_speculation
> 588,397,905 topdown-retiring:u # 13.8 % tma_retiring
> # 49.0 % tma_backend_bound
> 1,376,283,990 topdown-fe-bound:u
> 2,096,827,304 topdown-be-bound:u
> 217,425,841 topdown-bad-spec:u
> 5,050,520 INT_MISC.UOP_DROPPING:u
>
> 1.755503765 seconds time elapsed
>
> 0.235965000 seconds user
> 1.500508000 seconds sys
>
> Guest results:
>
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 1.558 [sec]
>
> Performance counter stats for 'taskset -c 2 perf bench sched messaging':
>
> 5,148,818,712 TOPDOWN.SLOTS:u # 34.0 % tma_frontend_bound
> # 4.6 % tma_bad_speculation
> 602,862,499 topdown-retiring:u # 11.7 % tma_retiring
> # 49.7 % tma_backend_bound
> 1,759,698,259 topdown-fe-bound:u
> 2,565,571,672 topdown-be-bound:u
> 230,277,308 topdown-bad-spec:u
> 4,966,279 INT_MISC.UOP_DROPPING:u
>
> 1.783366587 seconds time elapsed
>
> 0.313692000 seconds user
> 1.446377000 seconds sys
>
> Dapeng Mi (2):
> KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
> KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
>
> Zide Chen (1):
> KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
>
> arch/x86/include/asm/kvm_host.h | 3 +-
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/include/asm/perf_event.h | 1 +
> arch/x86/kvm/pmu.c | 4 +++
> arch/x86/kvm/vmx/pmu_intel.c | 57 ++++++++++++++++++++++++-------
> arch/x86/kvm/vmx/pmu_intel.h | 5 +++
> arch/x86/kvm/vmx/vmx.c | 6 ++++
> arch/x86/kvm/x86.c | 10 ++++--
> 8 files changed, 71 insertions(+), 16 deletions(-)
>
end of thread, other threads:[~2026-04-09 6:25 UTC | newest]
Thread overview: 9+ messages
2026-02-26 23:06 [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-02-26 23:06 ` [PATCH 1/3] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
2026-03-24 5:48 ` Mi, Dapeng
2026-03-27 20:53 ` Chen, Zide
2026-02-26 23:06 ` [PATCH 2/3] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
2026-02-26 23:06 ` [PATCH 3/3] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
2026-03-24 5:54 ` Mi, Dapeng
2026-03-27 20:23 ` Chen, Zide
2026-04-09 6:25 ` [PATCH 0/3] KVM: x86/pmu: Add hardware Topdown metrics support Mi, Dapeng