* [PATCH V3 1/4] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events
2026-06-15 23:01 [PATCH V3 0/4] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
@ 2026-06-15 23:01 ` Zide Chen
2026-06-15 23:01 ` [PATCH V3 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Zide Chen @ 2026-06-15 23:01 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
Only fixed counters 0..2 have matching generic cross-platform
hardware perf events (INSTRUCTIONS, CPU_CYCLES, REF_CPU_CYCLES).
Therefore, perf_get_hw_event_config() is only applicable to these
counters.
KVM does not intend to emulate fixed counters >= 3 on legacy
(non-mediated) vPMU, while for mediated vPMU, KVM does not care what
the fixed counter event mappings are. Therefore, return 0 for their
eventsel.
Also remove __always_inline as BUILD_BUG_ON() is no longer needed.
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v2:
- Replace 3 in "if (index < 3)" with ARRAY_SIZE(fixed_pmc_perf_ids).
---
arch/x86/kvm/vmx/pmu_intel.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index a73a9515d96c..59b7a90c79e1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -464,28 +464,30 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* different perf_event is already utilizing the requested counter, but the end
* result is the same (ignoring the fact that using a general purpose counter
* will likely exacerbate counter contention).
- *
- * Forcibly inlined to allow asserting on @index at build time, and there should
- * never be more than one user.
*/
-static __always_inline u64 intel_get_fixed_pmc_eventsel(unsigned int index)
+static u64 intel_get_fixed_pmc_eventsel(unsigned int index)
{
const enum perf_hw_id fixed_pmc_perf_ids[] = {
[0] = PERF_COUNT_HW_INSTRUCTIONS,
[1] = PERF_COUNT_HW_CPU_CYCLES,
[2] = PERF_COUNT_HW_REF_CPU_CYCLES,
};
- u64 eventsel;
-
- BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
- BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
+ u64 eventsel = 0;
/*
- * Yell if perf reports support for a fixed counter but perf doesn't
- * have a known encoding for the associated general purpose event.
+ * Fixed counters 3 and above don't have a corresponding generic hardware
+ * perf event, and KVM does not intend to emulate them on non-mediated
+ * vPMU.
*/
- eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
- WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
+ if (index < ARRAY_SIZE(fixed_pmc_perf_ids)) {
+ /*
+ * Yell if perf reports support for a fixed counter but perf
+ * doesn't have a known encoding for the associated general
+ * purpose event.
+ */
+ eventsel = perf_get_hw_event_config(fixed_pmc_perf_ids[index]);
+ WARN_ON_ONCE(!eventsel && index < kvm_pmu_cap.num_counters_fixed);
+ }
return eventsel;
}
--
2.54.0
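
The bounds check introduced by this patch can be illustrated with a minimal user-space sketch. It is not the kernel code: mock_hw_event_config() is a hypothetical stand-in for perf_get_hw_event_config(), the enum is a local stand-in for the perf_hw_id values, and the encodings it returns are arbitrary placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Local stand-ins for the three generic perf hardware event IDs. */
enum model_hw_id { MODEL_INSTRUCTIONS, MODEL_CPU_CYCLES, MODEL_REF_CPU_CYCLES };

/* Hypothetical mock of perf_get_hw_event_config(); values are placeholders. */
static uint64_t mock_hw_event_config(enum model_hw_id id)
{
	static const uint64_t cfg[] = { 0x11, 0x22, 0x33 };

	return cfg[id];
}

/* Only indexes covered by the table are looked up; all others yield 0. */
static uint64_t model_fixed_pmc_eventsel(unsigned int index)
{
	static const enum model_hw_id ids[] = {
		MODEL_INSTRUCTIONS, MODEL_CPU_CYCLES, MODEL_REF_CPU_CYCLES,
	};

	if (index >= ARRAY_SIZE(ids))
		return 0;

	return mock_hw_event_config(ids[index]);
}
```

With this shape, an index of 3 or above never reaches the table lookup, so the function no longer needs a build-time assertion on the index.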
* [PATCH V3 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
2026-06-15 23:01 [PATCH V3 0/4] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-06-15 23:01 ` [PATCH V3 1/4] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
@ 2026-06-15 23:01 ` Zide Chen
2026-06-15 23:01 ` [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
2026-06-15 23:01 ` [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests Zide Chen
3 siblings, 0 replies; 10+ messages in thread
From: Zide Chen @ 2026-06-15 23:01 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
Starting with Ice Lake, Intel introduced fixed counter 3, which counts
TOPDOWN.SLOTS - the number of available slots for an unhalted logical
processor. It serves as the denominator for top-level metrics in the
Top-down Microarchitecture Analysis method.
Emulating this counter on legacy vPMU would require introducing a new
generic perf encoding for the Intel-specific TOPDOWN.SLOTS event in
order to call perf_get_hw_event_config(). This is undesirable as it
would pollute the generic perf event encoding.
Moreover, KVM does not intend to emulate IA32_PERF_METRICS in the
legacy vPMU model, and without IA32_PERF_METRICS, emulating this
counter has little practical value. Therefore, expose fixed counter
3 to guests only when mediated vPMU is enabled.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Co-developed-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v3:
- Move the non-contiguous counter filter code to pmu.c
v2:
- Don't advertise fixed counter 3 to userspace if the host doesn't
support it.
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/pmu.c | 18 ++++++++++++++++++
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3886b536c8a5..754103e7ab4d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -577,7 +577,7 @@ struct kvm_pmc {
#define KVM_MAX_NR_GP_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
KVM_MAX_NR_AMD_GP_COUNTERS)
-#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 3
+#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 4
#define KVM_MAX_NR_AMD_FIXED_COUNTERS 0
#define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
KVM_MAX_NR_AMD_FIXED_COUNTERS)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b92dd2e58335..0faf580782d5 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -122,6 +122,8 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
{
bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
+ union cpuid10_edx edx;
+ u32 eax, ebx, ecx;
/*
* Hybrid PMUs don't play nice with virtualization without careful
@@ -169,6 +171,22 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
KVM_MAX_NR_FIXED_COUNTERS);
+ /*
+ * Intel platforms may support non-contiguous fixed counters, e.g., some
+ * E-core based server processors don't implement fixed counter 3.
+ *
+ * Before KVM supports non-contiguous fixed counters, make sure only
+ * contiguous ones are retained in kvm_pmu_cap.
+ */
+ if (kvm_host_pmu.version >= 5) {
+ cpuid(10, &eax, &ebx, &ecx, &edx.full);
+ if (kvm_pmu_cap.num_counters_fixed > edx.split.num_counters_fixed)
+ kvm_pmu_cap.num_counters_fixed = edx.split.num_counters_fixed;
+ }
+
+ if (!enable_mediated_pmu && kvm_pmu_cap.num_counters_fixed > 3)
+ kvm_pmu_cap.num_counters_fixed = 3;
+
kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf122b8c3210..b9cca855bc10 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -350,7 +350,7 @@ static const u32 msrs_to_save_base[] = {
static const u32 msrs_to_save_pmu[] = {
MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
- MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+ MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
MSR_CORE_PERF_GLOBAL_CTRL,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
@@ -7742,7 +7742,7 @@ static void kvm_init_msr_lists(void)
{
unsigned i;
- BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
+ BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 4,
"Please update the fixed PMCs in msrs_to_save_pmu[]");
num_msrs_to_save = 0;
--
2.54.0
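
The order of the clamps added to kvm_init_pmu_capability() can be sketched as a pure function. This is a simplified model under stated assumptions: the function name, its parameters, and MODEL_MAX_FIXED_COUNTERS are hypothetical, and the CPUID read is replaced by a plain argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors KVM_MAX_NR_INTEL_FIXED_COUNTERS after this patch (assumed 4). */
#define MODEL_MAX_FIXED_COUNTERS 4u

/*
 * perf_reported:   fixed counter count reported by the host perf layer
 * pmu_version:     host architectural PMU version
 * cpuid_edx_fixed: contiguous fixed counter count from CPUID.0xA:EDX
 * mediated:        whether mediated vPMU is enabled
 */
static unsigned int model_clamp_fixed_counters(unsigned int perf_reported,
					       unsigned int pmu_version,
					       unsigned int cpuid_edx_fixed,
					       bool mediated)
{
	unsigned int n = perf_reported;

	/* Never exceed what KVM can track. */
	if (n > MODEL_MAX_FIXED_COUNTERS)
		n = MODEL_MAX_FIXED_COUNTERS;

	/* On version 5+, keep only the contiguous range reported by CPUID. */
	if (pmu_version >= 5 && n > cpuid_edx_fixed)
		n = cpuid_edx_fixed;

	/* Legacy vPMU does not expose fixed counter 3. */
	if (!mediated && n > 3)
		n = 3;

	return n;
}
```

For example, a host that reports four fixed counters exposes four to the guest only when mediated vPMU is enabled; with the legacy vPMU the count drops to three.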
* [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-06-15 23:01 [PATCH V3 0/4] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-06-15 23:01 ` [PATCH V3 1/4] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
2026-06-15 23:01 ` [PATCH V3 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
@ 2026-06-15 23:01 ` Zide Chen
2026-06-15 23:26 ` sashiko-bot
2026-06-15 23:01 ` [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests Zide Chen
3 siblings, 1 reply; 10+ messages in thread
From: Zide Chen @ 2026-06-15 23:01 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
Bit 15 in IA32_PERF_CAPABILITIES indicates that the CPU provides
built-in support for Topdown Microarchitecture Analysis (TMA) L1
metrics via the IA32_PERF_METRICS MSR.
Expose this capability only when mediated vPMU is enabled, as emulating
IA32_PERF_METRICS in the legacy vPMU model is impractical.
Pass IA32_PERF_METRICS through to the guest only when mediated vPMU is
enabled and bit 15 is set in guest IA32_PERF_CAPABILITIES. Allow
kvm_pmu_{get,set}_msr() to handle this MSR for host accesses.
Save and restore this MSR on host/guest PMU context switches so that
host PMU activity does not clobber the guest value, and guest state
is not leaked into the host.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v3:
- Replace WARN_ON() with WARN_ON_ONCE(). (Dapeng)
- Add a comment explaining why writes to PERF_METRICS are not validated.
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/perf_event.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 36 +++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/pmu_intel.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 6 +++++-
7 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 754103e7ab4d..9b03475843f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -597,6 +597,7 @@ struct kvm_pmu {
u64 global_status_rsvd;
u64 reserved_bits;
u64 raw_event_mask;
+ u64 perf_metrics;
struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS];
struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS];
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 18c4be75e927..fdcaeb6c8352 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -331,6 +331,7 @@
#define PERF_CAP_PEBS_FORMAT 0xf00
#define PERF_CAP_FW_WRITES BIT_ULL(13)
#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
+#define PERF_CAP_PERF_METRICS BIT_ULL(15)
#define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 1eb13673e889..bc2e1cbcd9b9 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -447,6 +447,7 @@ static inline bool is_topdown_idx(int idx)
#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT 54
#define GLOBAL_STATUS_ARCH_PEBS_THRESHOLD BIT_ULL(GLOBAL_STATUS_ARCH_PEBS_THRESHOLD_BIT)
#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT 48
+#define GLOBAL_STATUS_PERF_METRICS_OVF BIT_ULL(GLOBAL_STATUS_PERF_METRICS_OVF_BIT)
#define GLOBAL_CTRL_EN_PERF_METRICS BIT_ULL(48)
/*
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 59b7a90c79e1..19daee29b731 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -188,6 +188,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
return kvm_pmu_has_perf_global_ctrl(pmu);
+ case MSR_PERF_METRICS:
+ return vcpu_has_perf_metrics(vcpu);
case MSR_IA32_PEBS_ENABLE:
ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
break;
@@ -345,6 +347,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_FIXED_CTR_CTRL:
msr_info->data = pmu->fixed_ctr_ctrl;
break;
+ case MSR_PERF_METRICS:
+ WARN_ON_ONCE(!msr_info->host_initiated);
+ msr_info->data = pmu->perf_metrics;
+ break;
case MSR_IA32_PEBS_ENABLE:
msr_info->data = pmu->pebs_enable;
break;
@@ -394,6 +400,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->fixed_ctr_ctrl != data)
reprogram_fixed_counters(pmu, data);
break;
+ case MSR_PERF_METRICS:
+ WARN_ON_ONCE(!msr_info->host_initiated);
+
+ /*
+ * If TMA level 2 is not supported, bits [63:32] are reserved
+ * and ignored on write, so no validation is needed here.
+ */
+ pmu->perf_metrics = data;
+ break;
case MSR_IA32_PEBS_ENABLE:
if (data & pmu->pebs_enable_rsvd)
return 1;
@@ -589,6 +604,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->global_status_rsvd &=
~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
+ if (perf_capabilities & PERF_CAP_PERF_METRICS) {
+ pmu->global_ctrl_rsvd &= ~GLOBAL_CTRL_EN_PERF_METRICS;
+ pmu->global_status_rsvd &= ~GLOBAL_STATUS_PERF_METRICS_OVF;
+ }
+
if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
pmu->pebs_enable_rsvd = counter_rsvd;
@@ -632,6 +652,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
static void intel_pmu_reset(struct kvm_vcpu *vcpu)
{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+ pmu->perf_metrics = 0;
intel_pmu_release_guest_lbr_event(vcpu);
}
@@ -803,6 +826,13 @@ static void intel_mediated_pmu_load(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
u64 global_status, toggle;
+ /*
+ * PERF_METRICS MSR must be restored closely after fixed counter 3
+ * (kvm_pmu_load_guest_pmcs()).
+ */
+ if (vcpu_has_perf_metrics(vcpu))
+ wrmsrq(MSR_PERF_METRICS, pmu->perf_metrics);
+
rdmsrq(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
toggle = pmu->global_status ^ global_status;
if (global_status & toggle)
@@ -831,6 +861,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
*/
if (pmu->fixed_ctr_ctrl_hw)
wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+
+ if (vcpu_has_perf_metrics(vcpu)) {
+ pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
+ if (pmu->perf_metrics)
+ wrmsrq(MSR_PERF_METRICS, 0);
+ }
}
struct kvm_pmu_ops intel_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
index 5d9357640aa1..2ec547223b09 100644
--- a/arch/x86/kvm/vmx/pmu_intel.h
+++ b/arch/x86/kvm/vmx/pmu_intel.h
@@ -40,4 +40,9 @@ struct lbr_desc {
extern struct x86_pmu_lbr vmx_lbr_caps;
+static inline bool vcpu_has_perf_metrics(struct kvm_vcpu *vcpu)
+{
+ return !!(vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PERF_METRICS);
+}
+
#endif /* __KVM_X86_VMX_PMU_INTEL_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c548f22375ad..7cffe1619a1f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4264,6 +4264,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
MSR_TYPE_RW, intercept);
vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
MSR_TYPE_RW, intercept);
+
+ vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
+ !vcpu_has_perf_metrics(vcpu));
}
static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
@@ -8088,6 +8091,9 @@ static __init u64 vmx_get_perf_capabilities(void)
perf_cap &= ~PERF_CAP_PEBS_BASELINE;
}
+ if (enable_mediated_pmu)
+ perf_cap |= host_perf_cap & PERF_CAP_PERF_METRICS;
+
return perf_cap;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9cca855bc10..2ac297ba3598 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -352,7 +352,7 @@ static const u32 msrs_to_save_pmu[] = {
MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
- MSR_CORE_PERF_GLOBAL_CTRL,
+ MSR_CORE_PERF_GLOBAL_CTRL, MSR_PERF_METRICS,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
/* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
@@ -7679,6 +7679,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
return;
break;
+ case MSR_PERF_METRICS:
+ if (!(kvm_caps.supported_perf_cap & PERF_CAP_PERF_METRICS))
+ return;
+ break;
case MSR_ARCH_PERFMON_PERFCTR0 ...
MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
--
2.54.0
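
The save/restore behavior described in the commit message can be modeled in user space. This is only a sketch: the hardware MSR is represented by a plain struct field, and every name below (model_hw, model_vpmu, model_load, model_put, model_cycle) is hypothetical rather than part of KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the physical PERF_METRICS MSR. */
struct model_hw {
	uint64_t perf_metrics;
};

/* Stand-in for the per-vCPU software state. */
struct model_vpmu {
	uint64_t perf_metrics;
	bool has_perf_metrics;
};

/* Guest entry side: restore the guest value into "hardware". */
static void model_load(struct model_hw *hw, const struct model_vpmu *pmu)
{
	if (pmu->has_perf_metrics)
		hw->perf_metrics = pmu->perf_metrics;
}

/* Guest exit side: save the guest value, then clear "hardware" if nonzero. */
static void model_put(struct model_hw *hw, struct model_vpmu *pmu)
{
	if (!pmu->has_perf_metrics)
		return;

	pmu->perf_metrics = hw->perf_metrics;
	if (pmu->perf_metrics)
		hw->perf_metrics = 0;
}

/* One load, one passthrough guest write, one put; returns the saved value. */
static uint64_t model_cycle(struct model_hw *hw, struct model_vpmu *pmu,
			    uint64_t guest_write)
{
	model_load(hw, pmu);
	hw->perf_metrics = guest_write;
	model_put(hw, pmu);

	return pmu->perf_metrics;
}
```

After a cycle, the guest value lives only in the software state and the modeled hardware register reads zero, which is the isolation property the commit message describes.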
* Re: [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-06-15 23:01 ` [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
@ 2026-06-15 23:26 ` sashiko-bot
2026-06-16 16:29 ` Chen, Zide
0 siblings, 1 reply; 10+ messages in thread
From: sashiko-bot @ 2026-06-15 23:26 UTC (permalink / raw)
To: Zide Chen; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Guest-reachable WARN_ON_ONCE assertions in MSR_PERF_METRICS accessors allow unprivileged guests to trigger host-side denial of service.
- [High] Software updates to MSR_PERF_METRICS via KVM ioctls or VCPU reset are silently clobbered by stale hardware MSR values during vcpu_put().
- [High] MSR_PERF_METRICS is passed through to the guest without context switching if the guest has a PMU version of 0, bypassing host isolation.
--
commit 65a5a5edf8809e596749d317875bbc842f4eb7fc
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
This commit exposes the IA32_PERF_METRICS MSR to guests when mediated
vPMU is enabled. It allows host access via kvm_pmu_{get,set}_msr() and
handles save/restore during PMU context switches.
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 05a59f4acfdd3..61102aa5e2f9e 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
[ ... ]
> @@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_CORE_PERF_FIXED_CTR_CTRL:
> msr_info->data = pmu->fixed_ctr_ctrl;
> break;
> + case MSR_PERF_METRICS:
> + WARN_ON_ONCE(!msr_info->host_initiated);
[Severity: High]
Could this WARN_ON_ONCE() be triggered by an unprivileged guest?
If a guest forces KVM's instruction emulator to handle the MSR access, for
example by executing RDMSR over an MMIO page, the emulator might process the
instruction and invoke KVM's MSR handlers with host_initiated = false.
This would unconditionally trigger the warning and allow the guest to spam
the host kernel log or cause a host panic if panic_on_warn is enabled.
> + msr_info->data = pmu->perf_metrics;
> + break;
> case MSR_IA32_PEBS_ENABLE:
> msr_info->data = pmu->pebs_enable;
> break;
[ ... ]
> @@ -384,6 +390,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (pmu->fixed_ctr_ctrl != data)
> reprogram_fixed_counters(pmu, data);
> break;
> + case MSR_PERF_METRICS:
> + WARN_ON_ONCE(!msr_info->host_initiated);
[Severity: High]
Can a malicious guest also trigger this WARN_ON_ONCE() by executing WRMSR
over an MMIO page to force instruction emulation?
> +
> + /*
> + * If TMA level 2 is not supported, bits [63:32] are reserved
> + * and ignored on write, so no validation is needed here.
> + */
> + pmu->perf_metrics = data;
> + break;
> case MSR_IA32_PEBS_ENABLE:
[ ... ]
> @@ -622,6 +642,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>
> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
> {
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> + pmu->perf_metrics = 0;
> intel_pmu_release_guest_lbr_event(vcpu);
> }
[ ... ]
> @@ -821,6 +851,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
> */
> if (pmu->fixed_ctr_ctrl_hw)
> wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> +
> + if (vcpu_has_perf_metrics(vcpu)) {
> + pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
[Severity: High]
Does this rdpmc() call silently clobber software updates made to
pmu->perf_metrics?
When the VCPU is loaded during a KVM_SET_MSRS ioctl or VCPU reset, updates
to pmu->perf_metrics in intel_pmu_set_msr() or intel_pmu_reset() modify the
software struct but do not synchronously write to the active hardware MSR.
Later, when the VCPU is descheduled, intel_mediated_pmu_put() reads the
physical MSR (which still holds the old pre-update value) and overwrites the
software state.
Will this permanently discard the new software state and break live
migration or VCPU resets?
> + if (pmu->perf_metrics)
> + wrmsrq(MSR_PERF_METRICS, 0);
> + }
> }
[ ... ]
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9324d60839419..475e72c733cee 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4247,6 +4247,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
> MSR_TYPE_RW, intercept);
> vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
> MSR_TYPE_RW, intercept);
> +
> + vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
> + !vcpu_has_perf_metrics(vcpu));
[Severity: High]
Could clearing the intercept here allow the guest direct, unintercepted
access to the host's physical MSR_PERF_METRICS register without a context
switch?
If enable_mediated_pmu is true, KVM exposes PERF_CAP_PERF_METRICS. A VMM
could configure a VCPU with MSR_IA32_PERF_CAPABILITIES.PERF_METRICS=1 but
leave the PMU version at 0.
In this scenario, vcpu_has_perf_metrics() would return true, clearing the
intercept. However, because the PMU version is 0,
kvm_vcpu_has_mediated_pmu() returns false.
This causes the context-switch handlers kvm_mediated_pmu_load() and
kvm_mediated_pmu_put() to exit early.
Does this bypass host isolation and allow the guest to leak or clobber host
PMU state?
> }
>
> static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615230118.50718-1-zide.chen@intel.com?part=3
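
The first concern in this review can be illustrated with a small model in which a guest-initiated access is rejected with a nonzero return value instead of tripping an assertion. This is a sketch under stated assumptions, not the KVM handler: model_msr_access and model_get_perf_metrics() are hypothetical names, and only the host_initiated flag and the data field are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the fields of struct msr_data used here. */
struct model_msr_access {
	bool host_initiated;
	uint64_t data;
};

/*
 * Returns 0 and fills acc->data for host-initiated reads.
 * Returns 1 and leaves acc->data untouched for guest-initiated reads,
 * so an untrusted guest cannot reach any warning path.
 */
static int model_get_perf_metrics(uint64_t saved_value,
				  struct model_msr_access *acc)
{
	if (!acc->host_initiated)
		return 1;

	acc->data = saved_value;
	return 0;
}
```

The design point is that input reachable from a guest is handled by an ordinary error return, while assertions stay reserved for conditions the host itself controls.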
* Re: [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-06-15 23:26 ` sashiko-bot
@ 2026-06-16 16:29 ` Chen, Zide
2026-06-18 2:21 ` Mi, Dapeng
0 siblings, 1 reply; 10+ messages in thread
From: Chen, Zide @ 2026-06-16 16:29 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On 6/15/2026 6:26 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] Guest-reachable WARN_ON_ONCE assertions in MSR_PERF_METRICS accessors allow unprivileged guests to trigger host-side denial of service.
> - [High] Software updates to MSR_PERF_METRICS via KVM ioctls or VCPU reset are silently clobbered by stale hardware MSR values during vcpu_put().
> - [High] MSR_PERF_METRICS is passed through to the guest without context switching if the guest has a PMU version of 0, bypassing host isolation.
> --
>
> commit 65a5a5edf8809e596749d317875bbc842f4eb7fc
> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
>
> This commit exposes the IA32_PERF_METRICS MSR to guests when mediated
> vPMU is enabled. It allows host access via kvm_pmu_{get,set}_msr() and
> handles save/restore during PMU context switches.
>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 05a59f4acfdd3..61102aa5e2f9e 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>
> [ ... ]
>
>> @@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> case MSR_CORE_PERF_FIXED_CTR_CTRL:
>> msr_info->data = pmu->fixed_ctr_ctrl;
>> break;
>> + case MSR_PERF_METRICS:
>> + WARN_ON_ONCE(!msr_info->host_initiated);
>
> [Severity: High]
> Could this WARN_ON_ONCE() be triggered by an unprivileged guest?
>
> If a guest forces KVM's instruction emulator to handle the MSR access, for
> example by executing RDMSR over an MMIO page, the emulator might process the
> instruction and invoke KVM's MSR handlers with host_initiated = false.
Yes, in theory, it's possible for the RDMSR/WRMSR to be handled in the
emulator path and WARN_ON_ONCE() could be unexpectedly triggered.
Since WARN_ON() is not intended to handle untrusted input, it's better
to use a defensive check and reject non-host-initiated accesses:
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 19daee29b731..6073227bdd20 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -348,7 +348,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = pmu->fixed_ctr_ctrl;
break;
case MSR_PERF_METRICS:
- WARN_ON_ONCE(!msr_info->host_initiated);
+ if (!msr_info->host_initiated)
+ return 1;
msr_info->data = pmu->perf_metrics;
break;
case MSR_IA32_PEBS_ENABLE:
@@ -401,7 +402,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reprogram_fixed_counters(pmu, data);
break;
case MSR_PERF_METRICS:
- WARN_ON_ONCE(!msr_info->host_initiated);
+ if (!msr_info->host_initiated)
+ return 1;
> This would unconditionally trigger the warning and allow the guest to spam
> the host kernel log or cause a host panic if panic_on_warn is enabled.
>
>> + msr_info->data = pmu->perf_metrics;
>> + break;
>> case MSR_IA32_PEBS_ENABLE:
>> msr_info->data = pmu->pebs_enable;
>> break;
>
> [ ... ]
>
>> @@ -384,6 +390,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> if (pmu->fixed_ctr_ctrl != data)
>> reprogram_fixed_counters(pmu, data);
>> break;
>> + case MSR_PERF_METRICS:
>> + WARN_ON_ONCE(!msr_info->host_initiated);
>
> [Severity: High]
> Can a malicious guest also trigger this WARN_ON_ONCE() by executing WRMSR
> over an MMIO page to force instruction emulation?
Ditto.
>> +
>> + /*
>> + * If TMA level 2 is not supported, bits [63:32] are reserved
>> + * and ignored on write, so no validation is needed here.
>> + */
>> + pmu->perf_metrics = data;
>> + break;
>> case MSR_IA32_PEBS_ENABLE:
>
> [ ... ]
>
>> @@ -622,6 +642,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>
>> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>> {
>> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +
>> + pmu->perf_metrics = 0;
>> intel_pmu_release_guest_lbr_event(vcpu);
>> }
>
> [ ... ]
>
>> @@ -821,6 +851,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
>> */
>> if (pmu->fixed_ctr_ctrl_hw)
>> wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>> +
>> + if (vcpu_has_perf_metrics(vcpu)) {
>> + pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
>
> [Severity: High]
> Does this rdpmc() call silently clobber software updates made to
> pmu->perf_metrics?
>
> When the VCPU is loaded during a KVM_SET_MSRS ioctl or VCPU reset, updates
> to pmu->perf_metrics in intel_pmu_set_msr() or intel_pmu_reset() modify the
> software struct but do not synchronously write to the active hardware MSR.
This is safe.
After intel_pmu_set_msr() or intel_pmu_reset(),
intel_mediated_pmu_load() is called to sync the state to hardware before
the next intel_mediated_pmu_put().
> Later, when the VCPU is descheduled, intel_mediated_pmu_put() reads the
> physical MSR (which still holds the old pre-update value) and overwrites the
> software state.
>
> Will this permanently discard the new software state and break live
> migration or VCPU resets?
>
>> + if (pmu->perf_metrics)
>> + wrmsrq(MSR_PERF_METRICS, 0);
>> + }
>> }
>
> [ ... ]
>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 9324d60839419..475e72c733cee 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4247,6 +4247,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
>> MSR_TYPE_RW, intercept);
>> vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
>> MSR_TYPE_RW, intercept);
>> +
>> + vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
>> + !vcpu_has_perf_metrics(vcpu));
>
> [Severity: High]
> Could clearing the intercept here allow the guest direct, unintercepted
> access to the host's physical MSR_PERF_METRICS register without a context
> switch?
>
> If enable_mediated_pmu is true, KVM exposes PERF_CAP_PERF_METRICS. A VMM
> could configure a VCPU with MSR_IA32_PERF_CAPABILITIES.PERF_METRICS=1 but
> leave the PMU version at 0.
>
> In this scenario, vcpu_has_perf_metrics() would return true, clearing the
> intercept. However, because the PMU version is 0,
> kvm_vcpu_has_mediated_pmu() returns false.
>
> This causes the context-switch handlers kvm_mediated_pmu_load() and
> kvm_mediated_pmu_put() to exit early.
>
> Does this bypass host isolation and allow the guest to leak or clobber host
> PMU state?
Yes, you are right.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7cffe1619a1f..28027d302d90 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4266,7 +4266,7 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
MSR_TYPE_RW, intercept);
vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
- !vcpu_has_perf_metrics(vcpu));
+ !has_mediated_pmu || !vcpu_has_perf_metrics(vcpu));
}
static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>> }
>>
>> static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>
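
The intercept condition proposed in this reply reduces to a two-input predicate, sketched here as a standalone function. The function name is hypothetical; the two booleans stand for has_mediated_pmu and the result of vcpu_has_perf_metrics() in the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when accesses to PERF_METRICS should be intercepted.
 * The MSR is passed through only if the vCPU both has a mediated PMU
 * (so the load/put context switch runs) and advertises PERF_METRICS.
 */
static bool model_intercept_perf_metrics(bool has_mediated_pmu,
					 bool has_perf_metrics)
{
	return !has_mediated_pmu || !has_perf_metrics;
}
```

In the case raised by the review, where the capability bit is set but the vCPU has no mediated PMU, this predicate keeps the intercept in place.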
* Re: [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
2026-06-16 16:29 ` Chen, Zide
@ 2026-06-18 2:21 ` Mi, Dapeng
0 siblings, 0 replies; 10+ messages in thread
From: Mi, Dapeng @ 2026-06-18 2:21 UTC (permalink / raw)
To: Chen, Zide, sashiko-reviews; +Cc: kvm
On 6/17/2026 12:29 AM, Chen, Zide wrote:
>
> On 6/15/2026 6:26 PM, sashiko-bot@kernel.org wrote:
>> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>> - [High] Guest-reachable WARN_ON_ONCE assertions in MSR_PERF_METRICS accessors allow unprivileged guests to trigger host-side denial of service.
>> - [High] Software updates to MSR_PERF_METRICS via KVM ioctls or VCPU reset are silently clobbered by stale hardware MSR values during vcpu_put().
>> - [High] MSR_PERF_METRICS is passed through to the guest without context switching if the guest has a PMU version of 0, bypassing host isolation.
>> --
>>
>> commit 65a5a5edf8809e596749d317875bbc842f4eb7fc
>> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>
>> KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
>>
>> This commit exposes the IA32_PERF_METRICS MSR to guests when mediated
>> vPMU is enabled. It allows host access via kvm_pmu_{get,set}_msr() and
>> handles save/restore during PMU context switches.
>>
>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>> index 05a59f4acfdd3..61102aa5e2f9e 100644
>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> [ ... ]
>>
>>> @@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> case MSR_CORE_PERF_FIXED_CTR_CTRL:
>>> msr_info->data = pmu->fixed_ctr_ctrl;
>>> break;
>>> + case MSR_PERF_METRICS:
>>> + WARN_ON_ONCE(!msr_info->host_initiated);
>> [Severity: High]
>> Could this WARN_ON_ONCE() be triggered by an unprivileged guest?
>>
>> If a guest forces KVM's instruction emulator to handle the MSR access, for
>> example by executing RDMSR over an MMIO page, the emulator might process the
>> instruction and invoke KVM's MSR handlers with host_initiated = false.
> Yes, in theory, it's possible for the RDMSR/WRMSR to be handled in the
> emulator path and WARN_ON_ONCE() could be unexpectedly triggered.
>
> Since WARN_ON() is not intended to handle untrusted input, it's better
> to use a defensive check and reject non-host-initiated accesses:
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 19daee29b731..6073227bdd20 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -348,7 +348,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data = pmu->fixed_ctr_ctrl;
> break;
> case MSR_PERF_METRICS:
> - WARN_ON_ONCE(!msr_info->host_initiated);
> + if (!msr_info->host_initiated)
> + return 1;
> msr_info->data = pmu->perf_metrics;
> break;
> case MSR_IA32_PEBS_ENABLE:
> @@ -401,7 +402,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> reprogram_fixed_counters(pmu, data);
> break;
> case MSR_PERF_METRICS:
> - WARN_ON_ONCE(!msr_info->host_initiated);
> + if (!msr_info->host_initiated)
> + return 1;
As I understand it, WARN_ON_ONCE() prints the warning only once, so it
should not spam the host log much, but I'm fine with removing them. Thanks.
>
>
>
>> This would unconditionally trigger the warning and allow the guest to spam
>> the host kernel log or cause a host panic if panic_on_warn is enabled.
>>
>>> + msr_info->data = pmu->perf_metrics;
>>> + break;
>>> case MSR_IA32_PEBS_ENABLE:
>>> msr_info->data = pmu->pebs_enable;
>>> break;
>> [ ... ]
>>
>>> @@ -384,6 +390,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> if (pmu->fixed_ctr_ctrl != data)
>>> reprogram_fixed_counters(pmu, data);
>>> break;
>>> + case MSR_PERF_METRICS:
>>> + WARN_ON_ONCE(!msr_info->host_initiated);
>> [Severity: High]
>> Can a malicious guest also trigger this WARN_ON_ONCE() by executing WRMSR
>> over an MMIO page to force instruction emulation?
> Ditto.
>>> +
>>> + /*
>>> + * If TMA level 2 is not supported, bits [63:32] are reserved
>>> + * and ignored on write, so no validation is needed here.
>>> + */
>>> + pmu->perf_metrics = data;
>>> + break;
>>> case MSR_IA32_PEBS_ENABLE:
>> [ ... ]
>>
>>> @@ -622,6 +642,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>>
>>> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>>> {
>>> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>>> +
>>> + pmu->perf_metrics = 0;
>>> intel_pmu_release_guest_lbr_event(vcpu);
>>> }
>> [ ... ]
>>
>>> @@ -821,6 +851,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
>>> */
>>> if (pmu->fixed_ctr_ctrl_hw)
>>> wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>>> +
>>> + if (vcpu_has_perf_metrics(vcpu)) {
>>> + pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
>> [Severity: High]
>> Does this rdpmc() call silently clobber software updates made to
>> pmu->perf_metrics?
>>
>> When the VCPU is loaded during a KVM_SET_MSRS ioctl or VCPU reset, updates
>> to pmu->perf_metrics in intel_pmu_set_msr() or intel_pmu_reset() modify the
>> software struct but do not synchronously write to the active hardware MSR.
> This is safe.
>
> After intel_pmu_set_msr() or intel_pmu_reset(),
> intel_mediated_pmu_load() is called to sync the state to hardware before
> the next intel_mediated_pmu_put().
>
>
>> Later, when the VCPU is descheduled, intel_mediated_pmu_put() reads the
>> physical MSR (which still holds the old pre-update value) and overwrites the
>> software state.
>>
>> Will this permanently discard the new software state and break live
>> migration or VCPU resets?
>>
>>> + if (pmu->perf_metrics)
>>> + wrmsrq(MSR_PERF_METRICS, 0);
>>> + }
>>> }
>> [ ... ]
>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 9324d60839419..475e72c733cee 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -4247,6 +4247,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
>>> MSR_TYPE_RW, intercept);
>>> vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
>>> MSR_TYPE_RW, intercept);
>>> +
>>> + vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
>>> + !vcpu_has_perf_metrics(vcpu));
>> [Severity: High]
>> Could clearing the intercept here allow the guest direct, unintercepted
>> access to the host's physical MSR_PERF_METRICS register without a context
>> switch?
>>
>> If enable_mediated_pmu is true, KVM exposes PERF_CAP_PERF_METRICS. A VMM
>> could configure a VCPU with MSR_IA32_PERF_CAPABILITIES.PERF_METRICS=1 but
>> leave the PMU version at 0.
>>
>> In this scenario, vcpu_has_perf_metrics() would return true, clearing the
>> intercept. However, because the PMU version is 0,
>> kvm_vcpu_has_mediated_pmu() returns false.
>>
>> This causes the context-switch handlers kvm_mediated_pmu_load() and
>> kvm_mediated_pmu_put() to exit early.
>>
>> Does this bypass host isolation and allow the guest to leak or clobber host
>> PMU state?
> Yes, you are right.
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 7cffe1619a1f..28027d302d90 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4266,7 +4266,7 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
> MSR_TYPE_RW, intercept);
>
> vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
> - !vcpu_has_perf_metrics(vcpu));
> + !has_mediated_pmu || !vcpu_has_perf_metrics(vcpu));
> }
>
> static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>
>>> }
>>>
>>> static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>
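The intercept fix discussed above can be sketched as a small standalone predicate. This is only an illustrative model, not KVM code: struct vcpu_model and perf_metrics_intercept() are hypothetical names, and the two flags are assumed to stand in for kvm_vcpu_has_mediated_pmu() and vcpu_has_perf_metrics(). The point it shows is that MSR_PERF_METRICS is passed through only when both conditions hold, so a vCPU with PERF_CAP_PERF_METRICS set but PMU version 0 stays intercepted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vCPU state the decision depends on. */
struct vcpu_model {
	bool has_mediated_pmu;	/* assumed to model kvm_vcpu_has_mediated_pmu() */
	bool has_perf_metrics;	/* assumed to model vcpu_has_perf_metrics() */
};

/*
 * Returns true when guest accesses to MSR_PERF_METRICS must be intercepted.
 * Pass-through is allowed only if the mediated PMU is active (so the MSR is
 * context switched) and the guest is allowed to see PERF_METRICS.
 */
static bool perf_metrics_intercept(const struct vcpu_model *v)
{
	assert(v);
	return !v->has_mediated_pmu || !v->has_perf_metrics;
}
```

With this shape, the problematic configuration from the review (capability bit set, no mediated PMU) yields an intercept rather than direct hardware access.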
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests
2026-06-15 23:01 [PATCH V3 0/4] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
` (2 preceding siblings ...)
2026-06-15 23:01 ` [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
@ 2026-06-15 23:01 ` Zide Chen
2026-06-15 23:26 ` sashiko-bot
3 siblings, 1 reply; 10+ messages in thread
From: Zide Chen @ 2026-06-15 23:01 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Jim Mattson, Mingwei Zhang, Zide Chen,
Das Sandipan, Shukla Manali, Dapeng Mi, Falcon Thomas, Xudong Hao
Add a selftest to exercise IA32_PERF_METRICS, i.e. architectural
support for Topdown (TMA) Level 1 metrics, enumerated by
IA32_PERF_CAPABILITIES[15].
Only check for non-zero metrics, as they are derived and depend on
the workload, CPU model, and host scheduling, making precise
expectations fragile.
Extend the PMU selftest to cover Intel fixed counter 3 by bumping
MAX_NR_FIXED_COUNTERS to 4 and validating basic functionality.
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
v3:
- Slightly reword comment to explain the sum of topdown metrics
is close to 100%.
- Replace abs() with explicit bounds (sum >= 0xfd && sum <= 0x102)
for better readability.
v2:
- New patch.
---
tools/arch/x86/include/asm/msr-index.h | 1 +
tools/testing/selftests/kvm/include/x86/pmu.h | 3 +
.../selftests/kvm/x86/pmu_counters_test.c | 72 +++++++++++++++++--
3 files changed, 71 insertions(+), 5 deletions(-)
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index eff29645719b..e7745e2cd543 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -331,6 +331,7 @@
#define PERF_CAP_PEBS_FORMAT 0xf00
#define PERF_CAP_FW_WRITES BIT_ULL(13)
#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
+#define PERF_CAP_PERF_METRICS BIT_ULL(15)
#define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE | \
diff --git a/tools/testing/selftests/kvm/include/x86/pmu.h b/tools/testing/selftests/kvm/include/x86/pmu.h
index 608ed83d7c6a..6c19503e0bb7 100644
--- a/tools/testing/selftests/kvm/include/x86/pmu.h
+++ b/tools/testing/selftests/kvm/include/x86/pmu.h
@@ -52,6 +52,9 @@
/* Fixed PMC controls, Intel only. */
#define FIXED_PMC_GLOBAL_CTRL_ENABLE(_idx) BIT_ULL((32 + (_idx)))
+/* PERF_METRICS enable, Intel only. */
+#define PERF_METRICS_GLOBAL_CTRL_ENABLE BIT_ULL(48)
+
#define FIXED_PMC_KERNEL BIT_ULL(0)
#define FIXED_PMC_USER BIT_ULL(1)
#define FIXED_PMC_ANYTHREAD BIT_ULL(2)
diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
index dc6afac3aa91..d4688a7d1e55 100644
--- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -6,6 +6,7 @@
#include "pmu.h"
#include "processor.h"
+#include <linux/bitfield.h>
/* Number of iterations of the loop for the guest measurement payload. */
#define NUM_LOOPS 10
@@ -241,17 +242,20 @@ do { \
); \
} while (0)
-#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
+#define GUEST_RUN_PAYLOAD(_ctrl_msr, _value, FEP) \
do { \
- wrmsr(_pmc_msr, 0); \
- \
if (this_cpu_has(X86_FEATURE_CLFLUSHOPT)) \
GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt %[m]", FEP); \
else if (this_cpu_has(X86_FEATURE_CLFLUSH)) \
GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush %[m]", FEP); \
else \
GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP); \
- \
+} while (0)
+
+#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
+do { \
+ wrmsr(_pmc_msr, 0); \
+ GUEST_RUN_PAYLOAD(_ctrl_msr, _value, FEP); \
guest_assert_event_count(_idx, _pmc, _pmc_msr); \
} while (0)
@@ -318,6 +322,56 @@ static void guest_test_arch_event(u8 idx)
FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
}
+static void guest_test_perf_metrics(void)
+{
+ int retiring, bad_spec, fe_bound, be_bound, sum;
+ u64 global_ctrl, metrics;
+
+ if ((guest_get_pmu_version() < 2) || /* Does guest have GLOBAL_CTRL? */
+ !this_cpu_has(X86_FEATURE_PDCM) ||
+ !(rdmsr(MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PERF_METRICS))
+ return;
+
+ wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+ wrmsr(MSR_CORE_PERF_FIXED_CTR3, 0);
+ wrmsr(MSR_PERF_METRICS, 0);
+
+ /* Enable fixed ctr3 (TOPDOWN.SLOTS) and PERF_METRICS. */
+ wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(3, FIXED_PMC_KERNEL));
+ global_ctrl = FIXED_PMC_GLOBAL_CTRL_ENABLE(3) |
+ PERF_METRICS_GLOBAL_CTRL_ENABLE;
+
+ GUEST_RUN_PAYLOAD(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl, "");
+
+ /* Check test results. */
+ metrics = rdmsr(MSR_PERF_METRICS);
+ retiring = FIELD_GET(GENMASK_ULL(7, 0), metrics);
+ bad_spec = FIELD_GET(GENMASK_ULL(15, 8), metrics);
+ fe_bound = FIELD_GET(GENMASK_ULL(23, 16), metrics);
+ be_bound = FIELD_GET(GENMASK_ULL(31, 24), metrics);
+
+ /*
+ * Be conservative: the measured payload definitely retires work, so
+ * Retiring should be non-zero.
+ */
+ GUEST_ASSERT_NE(metrics, 0ULL);
+ GUEST_ASSERT_NE(retiring, 0ULL);
+
+ /*
+ * The sum of the 4 level-1 topdown metrics should be close to 100%.
+ * 3 is chosen as a loose sanity check.
+ */
+ sum = retiring + bad_spec + fe_bound + be_bound;
+ GUEST_ASSERT(sum >= 0xfd && sum <= 0x102);
+
+ /* Sanity check after PERF_METRICS disabled. */
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
+ GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), metrics);
+ wrmsr(MSR_PERF_METRICS, 0xdeaddead);
+
+ GUEST_ASSERT_EQ(rdmsr(MSR_PERF_METRICS), 0xdeaddead);
+}
+
static void guest_test_arch_events(void)
{
u8 i;
@@ -325,6 +379,8 @@ static void guest_test_arch_events(void)
for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++)
guest_test_arch_event(i);
+ guest_test_perf_metrics();
+
GUEST_DONE();
}
@@ -361,7 +417,7 @@ static void test_arch_events(u8 pmu_version, u64 perf_capabilities,
* other than PMCs in the future.
*/
#define MAX_NR_GP_COUNTERS 8
-#define MAX_NR_FIXED_COUNTERS 3
+#define MAX_NR_FIXED_COUNTERS 4
#define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector) \
__GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
@@ -585,6 +641,7 @@ static void test_intel_counters(void)
u8 nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
u8 nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
u8 pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+ u64 advertised_perf_caps = kvm_get_feature_msr(MSR_IA32_PERF_CAPABILITIES);
unsigned int i;
u8 v, j;
u32 k;
@@ -592,6 +649,7 @@ static void test_intel_counters(void)
const u64 perf_caps[] = {
0,
PMU_CAP_FW_WRITES,
+ PERF_CAP_PERF_METRICS,
};
/*
@@ -649,6 +707,10 @@ static void test_intel_counters(void)
if (!kvm_has_perf_caps && perf_caps[i])
continue;
+ /* Ignore unsupported features. */
+ if (perf_caps[i] & ~advertised_perf_caps)
+ continue;
+
pr_info("Testing arch events, PMU version %u, perf_caps = %lx\n",
v, perf_caps[i]);
--
2.54.0
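For readers who want to see the field decoding in guest_test_perf_metrics() in isolation, here is a minimal userspace sketch. It assumes, as the selftest does, that the four level-1 metrics occupy bits 7:0, 15:8, 23:16 and 31:24 of the PERF_METRICS value and that 0xff corresponds to 100%. struct topdown_l1, decode_perf_metrics() and topdown_l1_sum_plausible() are hypothetical names introduced for illustration; the bounds are copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four level-1 topdown fields. */
struct topdown_l1 {
	unsigned int retiring;
	unsigned int bad_spec;
	unsigned int fe_bound;
	unsigned int be_bound;
};

/* Split the low 32 bits of a PERF_METRICS value into its 8-bit fields. */
static struct topdown_l1 decode_perf_metrics(uint64_t metrics)
{
	struct topdown_l1 t = {
		.retiring = (metrics >> 0) & 0xff,
		.bad_spec = (metrics >> 8) & 0xff,
		.fe_bound = (metrics >> 16) & 0xff,
		.be_bound = (metrics >> 24) & 0xff,
	};

	return t;
}

/* Same loose bounds as the selftest: the sum should be close to 0xff. */
static bool topdown_l1_sum_plausible(const struct topdown_l1 *t)
{
	unsigned int sum;

	assert(t);
	sum = t->retiring + t->bad_spec + t->fe_bound + t->be_bound;
	return sum >= 0xfd && sum <= 0x102;
}
```

For example, a value of 0x40404040 decodes to four fields of 0x40 each, which sum to 0x100 and pass the check, while an all-zero value fails it.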
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests
2026-06-15 23:01 ` [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests Zide Chen
@ 2026-06-15 23:26 ` sashiko-bot
2026-06-16 16:32 ` Chen, Zide
0 siblings, 1 reply; 10+ messages in thread
From: sashiko-bot @ 2026-06-15 23:26 UTC (permalink / raw)
To: Zide Chen; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] A malicious or misconfigured guest can trigger a host-side WARN_ON_ONCE by forcing KVM to emulate an RDMSR or WRMSR to MSR_PERF_METRICS, leading to log spam or a host kernel panic.
--
commit 2e12d71150f3902b3e1f245bbcb48d6fe2235b4a
Author: Zide Chen <zide.chen@intel.com>
KVM: selftests: Add perf_metrics and fixed counter 3 tests
This commit adds a selftest to exercise IA32_PERF_METRICS support for Topdown
(TMA) Level 1 metrics. It checks for non-zero metrics derived from the
workload and extends the PMU selftest to cover Intel fixed counter 3 by
bumping MAX_NR_FIXED_COUNTERS to 4.
> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> index dc6afac3aa919..d4688a7d1e556 100644
> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
[ ... ]
> @@ -318,6 +322,56 @@ static void guest_test_arch_event(u8 idx)
> FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
> }
>
> +static void guest_test_perf_metrics(void)
> +{
> + int retiring, bad_spec, fe_bound, be_bound, sum;
> + u64 global_ctrl, metrics;
> +
> + if ((guest_get_pmu_version() < 2) || /* Does guest have GLOBAL_CTRL? */
> + !this_cpu_has(X86_FEATURE_PDCM) ||
> + !(rdmsr(MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PERF_METRICS))
> + return;
> +
> + wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> + wrmsr(MSR_CORE_PERF_FIXED_CTR3, 0);
> + wrmsr(MSR_PERF_METRICS, 0);
[Severity: High]
This is a pre-existing issue, but since this test now exercises
MSR_PERF_METRICS, is it possible for a guest to trigger a host-side
WARN_ON_ONCE by forcing KVM to emulate RDMSR or WRMSR to MSR_PERF_METRICS?
Looking at KVM's emulation path in arch/x86/kvm/vmx/pmu_intel.c,
intel_pmu_get_msr() and intel_pmu_set_msr() both contain a WARN_ON_ONCE:
intel_pmu_get_msr() {
...
case MSR_PERF_METRICS:
WARN_ON_ONCE(!msr_info->host_initiated);
msr_info->data = pmu->perf_metrics;
break;
}
If a malicious guest forces instruction emulation for RDMSR or WRMSR targeting
this MSR (for example, by using the KVM_FEP forced emulation prefix, or by
executing the instruction from a page that triggers an EPT violation on
fetch), the instruction is processed by KVM's software emulator.
The emulator calls kvm_emulate_msr_read() or kvm_emulate_msr_write(), which
explicitly set msr_info.host_initiated = false.
Could this trigger the WARN_ON_ONCE in intel_pmu_get_msr() or
intel_pmu_set_msr() and allow an unprivileged guest to spam the host kernel
log or panic the host if panic_on_warn is enabled?
> +
> + /* Enable fixed ctr3 (TOPDOWN.SLOTS) and PERF_METRICS. */
> + wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(3, FIXED_PMC_KERNEL));
> + global_ctrl = FIXED_PMC_GLOBAL_CTRL_ENABLE(3) |
> + PERF_METRICS_GLOBAL_CTRL_ENABLE;
> +
> + GUEST_RUN_PAYLOAD(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl, "");
> +
> + /* Check test results. */
> + metrics = rdmsr(MSR_PERF_METRICS);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615230118.50718-1-zide.chen@intel.com?part=4
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests
2026-06-15 23:26 ` sashiko-bot
@ 2026-06-16 16:32 ` Chen, Zide
0 siblings, 0 replies; 10+ messages in thread
From: Chen, Zide @ 2026-06-16 16:32 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On 6/15/2026 6:26 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] A malicious or misconfigured guest can trigger a host-side WARN_ON_ONCE by forcing KVM to emulate an RDMSR or WRMSR to MSR_PERF_METRICS, leading to log spam or a host kernel panic.
> --
>
> commit 2e12d71150f3902b3e1f245bbcb48d6fe2235b4a
> Author: Zide Chen <zide.chen@intel.com>
>
> KVM: selftests: Add perf_metrics and fixed counter 3 tests
>
> This commit adds a selftest to exercise IA32_PERF_METRICS support for Topdown
> (TMA) Level 1 metrics. It checks for non-zero metrics derived from the
> workload and extends the PMU selftest to cover Intel fixed counter 3 by
> bumping MAX_NR_FIXED_COUNTERS to 4.
>
>> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
>> index dc6afac3aa919..d4688a7d1e556 100644
>> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
>> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> [ ... ]
>> @@ -318,6 +322,56 @@ static void guest_test_arch_event(u8 idx)
>> FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
>> }
>>
>> +static void guest_test_perf_metrics(void)
>> +{
>> + int retiring, bad_spec, fe_bound, be_bound, sum;
>> + u64 global_ctrl, metrics;
>> +
>> + if ((guest_get_pmu_version() < 2) || /* Does guest have GLOBAL_CTRL? */
>> + !this_cpu_has(X86_FEATURE_PDCM) ||
>> + !(rdmsr(MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PERF_METRICS))
>> + return;
>> +
>> + wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>> + wrmsr(MSR_CORE_PERF_FIXED_CTR3, 0);
>> + wrmsr(MSR_PERF_METRICS, 0);
>
> [Severity: High]
> This is a pre-existing issue, but since this test now exercises
> MSR_PERF_METRICS, is it possible for a guest to trigger a host-side
> WARN_ON_ONCE by forcing KVM to emulate RDMSR or WRMSR to MSR_PERF_METRICS?
>
> Looking at KVM's emulation path in arch/x86/kvm/vmx/pmu_intel.c,
> intel_pmu_get_msr() and intel_pmu_set_msr() both contain a WARN_ON_ONCE:
>
> intel_pmu_get_msr() {
> ...
> case MSR_PERF_METRICS:
> WARN_ON_ONCE(!msr_info->host_initiated);
> msr_info->data = pmu->perf_metrics;
> break;
> }
>
> If a malicious guest forces instruction emulation for RDMSR or WRMSR targeting
> this MSR (for example, by using the KVM_FEP forced emulation prefix, or by
> executing the instruction from a page that triggers an EPT violation on
> fetch), the instruction is processed by KVM's software emulator.
>
> The emulator calls kvm_emulate_msr_read() or kvm_emulate_msr_write(), which
> explicitly set msr_info.host_initiated = false.
>
> Could this trigger the WARN_ON_ONCE in intel_pmu_get_msr() or
> intel_pmu_set_msr() and allow an unprivileged guest to spam the host kernel
> log or panic the host if panic_on_warn is enabled?
Yes, this should be changed in patch 3/4 to avoid the WARN_ON_ONCE():
case MSR_PERF_METRICS:
- WARN_ON_ONCE(!msr_info->host_initiated);
+ if (!msr_info->host_initiated)
+ return 1;
>
>> +
>> + /* Enable fixed ctr3 (TOPDOWN.SLOTS) and PERF_METRICS. */
>> + wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(3, FIXED_PMC_KERNEL));
>> + global_ctrl = FIXED_PMC_GLOBAL_CTRL_ENABLE(3) |
>> + PERF_METRICS_GLOBAL_CTRL_ENABLE;
>> +
>> + GUEST_RUN_PAYLOAD(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl, "");
>> +
>> + /* Check test results. */
>> + metrics = rdmsr(MSR_PERF_METRICS);
>
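The agreed change (reject guest-initiated accesses instead of warning) can be modeled outside the kernel as follows. This is a sketch under stated assumptions, not the actual KVM handlers: struct msr_access and struct pmu_model are hypothetical simplifications of struct msr_data and struct kvm_pmu, and the return value 1 mirrors the "return 1" in the proposed diff, which the caller treats as a failed MSR access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of struct msr_data. */
struct msr_access {
	bool host_initiated;
	uint64_t data;
};

/* Hypothetical, trimmed-down model of struct kvm_pmu. */
struct pmu_model {
	uint64_t perf_metrics;
};

/* Read path: only host-initiated reads see the software copy. */
static int get_perf_metrics(const struct pmu_model *pmu, struct msr_access *info)
{
	assert(pmu && info);
	if (!info->host_initiated)
		return 1;	/* reject rather than WARN on guest input */

	info->data = pmu->perf_metrics;
	return 0;
}

/* Write path: only host-initiated writes update the software copy. */
static int set_perf_metrics(struct pmu_model *pmu, const struct msr_access *info)
{
	assert(pmu && info);
	if (!info->host_initiated)
		return 1;	/* reject rather than WARN on guest input */

	pmu->perf_metrics = info->data;
	return 0;
}
```

In this model an emulated guest access leaves the stored value untouched and reports failure, so nothing reachable from the guest ends up in a warning path.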
^ permalink raw reply [flat|nested] 10+ messages in thread