* [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
@ 2026-01-09 7:53 Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
[PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
This patchset addresses two bugs related to AMD PMU virtualization.
1. The first issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:
[ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
instead of:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
2. The second issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.
The AMD PMU registers are not reset during QEMU system_reset.
(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
(4) After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
Changed since v1:
- Use feature_dependencies for CPUID_EXT3_PERFCORE and
CPUID_8000_0022_EAX_PERFMON_V2.
- Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
- Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
- Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
- Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
- Some changes to PMU register limit calculation.
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
Zhao's suggestion.
- Use object_property_get_int() to get CPU family.
- Add support for Zhaoxin.
Changed since v3:
- Re-base on top of Zhao's queued patch.
- Use host_cpu_vendor_fms() from Zhao's patch.
- Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
- Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
suggestion.
- Check AMD directly, which makes the "compat" rule clear.
- Some changes on commit message and comment.
- Bring back global static variable 'kvm_pmu_disabled' read from
/sys/module/kvm/parameters/enable_pmu.
Changed since v4:
- Re-base on top of most recent mainline QEMU.
- Add more Reviewed-by.
- All patches are reviewed.
Changed since v5:
- Re-base on top of most recent mainline QEMU.
- Remove patch "kvm: Introduce kvm_arch_pre_create_vcpu()" as it is
already merged.
- To resolve conflicts in new [PATCH v6 3/9] , move the PMU related code
before the call site of is_tdx_vm().
Changed since v6:
- Re-base on top of most recent mainline QEMU (staging branch).
- Add more Reviewed-by from Dapeng and Sandipan.
Changed since v7:
https://lore.kernel.org/qemu-devel/20251111061532.36702-1-dongli.zhang@oracle.com/
- Re-base on top of most recent mainline QEMU (staging branch).
- Remove PATCH 1 & 2 from the v6 patchset. Zhao may work on them in
another patchset.
Changed since v8:
https://lore.kernel.org/qemu-devel/20251230074354.88958-1-dongli.zhang@oracle.com/
- Remove "PATCH v8 4/7" which introduces 'kvm_pmu_disabled' based on
"/sys/module/kvm/parameters/enable_pmu", as suggested by Zide.
- Remove the usage of 'kvm_pmu_disabled' ("PATCH v9 4/5").
- Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi from
"PATCH v9 4/5", because that patch changed to drop the usage of
'kvm_pmu_disabled'.
- Remove "PATCH v8 7/7" as suggested by Zide. Leave it as TODO.
Dongli Zhang (5):
target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
target/i386/kvm: rename architectural PMU variables
target/i386/kvm: reset AMD PMU registers during VM reset
target/i386/kvm: support perfmon-v2 for reset
target/i386/cpu.h | 16 +++
target/i386/kvm/kvm.c | 314 +++++++++++++++++++++++++++++++++++++++------
2 files changed, 291 insertions(+), 39 deletions(-)
branch: remotes/origin/staging
base-commit: 146dcea03e276a47404c2cc03ea753fd681c9567
Thank you very much!
Dongli Zhang
* [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-15 1:07 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.
As a result, the following message still appears in the VM dmesg:
[ 0.263615] Performance Events: AMD PMU driver.
However, the expected output should be:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
This occurs because AMD does not use any CPUID bit to indicate PMU
availability.
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
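The condition under which the VM-level capability gets set can be sketched in isolation. This is only an illustration, not the QEMU code: the helper name and the SKETCH_ constant are hypothetical, and the constant is assumed to mirror the value of KVM_PMU_CAP_DISABLE in the Linux UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match KVM_PMU_CAP_DISABLE from <linux/kvm.h>. */
#define SKETCH_KVM_PMU_CAP_DISABLE (1u << 0)

/*
 * Hypothetical helper. pmu_cap is the bitmask reported for
 * KVM_CAP_PMU_CAPABILITY (0 if the host kernel lacks it); enable_pmu
 * mirrors the per-CPU "pmu" property. The capability is only set when
 * the kernel offers it and the user asked for "-pmu".
 */
static inline bool sketch_should_disable_vm_pmu(uint32_t pmu_cap,
                                                bool enable_pmu)
{
    return (pmu_cap & SKETCH_KVM_PMU_CAP_DISABLE) && !enable_pmu;
}
```

On a host kernel without KVM_CAP_PMU_CAPABILITY the helper returns false, which corresponds to the case where the AMD PMU cannot be disabled at all.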
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
Changed since v1:
- Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
- Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
Intel platform because current "pmu" property works as expected."
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
statement.
- Add Reviewed-by from Xiaoyao and Zhao as the change is minor.
Changed since v5:
- Re-base on top of most recent mainline QEMU.
- To resolve conflicts, move the PMU related code before the
call site of is_tdx_vm().
Changed since v6:
- Add Reviewed-by from Dapeng Mi.
target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7b9b740a8e..c98832f423 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -179,6 +179,8 @@ static int has_triple_fault_event;
static bool has_msr_mcg_ext_ctl;
+static int pmu_cap;
+
static struct kvm_cpuid2 *cpuid_cache;
static struct kvm_cpuid2 *hv_cpuid_cache;
static struct kvm_msr_list *kvm_feature_msrs;
@@ -2080,6 +2082,33 @@ full:
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
{
+ static bool first = true;
+ int ret;
+
+ if (first) {
+ first = false;
+
+ /*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. In order to accommodate both, have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+ if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
+ }
+ }
+ }
+
if (is_tdx_vm()) {
return tdx_pre_create_vcpu(cpu, errp);
}
@@ -3391,6 +3420,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
}
+ pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
return 0;
}
--
2.39.3
* [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-15 1:08 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.
Extract them out of kvm_x86_build_cpuid().
In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
CPUID has already been filled at this stage.
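For reference, the decoding of CPUID leaf 0xa that is being moved can be sketched as a standalone function. The struct, the function name and the SKETCH_ caps are hypothetical; the caps are illustrative stand-ins for MAX_GP_COUNTERS and MAX_FIXED_COUNTERS.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative caps standing in for QEMU's MAX_*_COUNTERS. */
#define SKETCH_MAX_GP_COUNTERS    18u
#define SKETCH_MAX_FIXED_COUNTERS 3u

struct sketch_pmu_info {
    uint32_t version;   /* CPUID.0xA EAX[7:0]  */
    uint32_t num_gp;    /* CPUID.0xA EAX[15:8] */
    uint32_t num_fixed; /* CPUID.0xA EDX[4:0], only for version > 1 */
};

/* Decode the EAX/EDX values of an already-filled CPUID 0xa entry. */
static inline struct sketch_pmu_info sketch_decode_cpuid_0a(uint32_t eax,
                                                            uint32_t edx)
{
    struct sketch_pmu_info info = { 0, 0, 0 };

    info.version = eax & 0xff;
    if (info.version == 0) {
        return info;
    }

    info.num_gp = (eax & 0xff00) >> 8;
    if (info.num_gp > SKETCH_MAX_GP_COUNTERS) {
        info.num_gp = SKETCH_MAX_GP_COUNTERS;
    }

    if (info.version > 1) {
        info.num_fixed = edx & 0x1f;
        if (info.num_fixed > SKETCH_MAX_FIXED_COUNTERS) {
            info.num_fixed = SKETCH_MAX_FIXED_COUNTERS;
        }
    }

    return info;
}
```

Because the function only needs the register values, it works equally well on a cached CPUID entry, which is the point of switching to cpuid_find_entry().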
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
Changed since v1:
- Still extract the code, but call them for all CPUs.
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Didn't add Reviewed-by from Dapeng as the change isn't minor.
Changed since v6:
- Add Reviewed-by from Dapeng Mi.
target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
1 file changed, 35 insertions(+), 27 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c98832f423..08d80ff677 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1986,33 +1986,6 @@ uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
}
}
- if (limit >= 0x0a) {
- uint32_t eax, edx;
-
- cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
- has_architectural_pmu_version = eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
- /* Shouldn't be more than 32, since that's the number of bits
- * available in EBX to tell us _which_ counters are available.
- * Play it safe.
- */
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
- }
-
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = edx & 0x1f;
-
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
- }
- }
- }
- }
-
cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
for (i = 0x80000000; i <= limit; i++) {
@@ -2116,6 +2089,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+ struct kvm_cpuid_entry2 *c;
+
+ c = cpuid_find_entry(cpuid, 0xa, 0);
+
+ if (!c) {
+ return;
+ }
+
+ has_architectural_pmu_version = c->eax & 0xff;
+ if (has_architectural_pmu_version > 0) {
+ num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+ /*
+ * Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+ if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+
+ if (has_architectural_pmu_version > 1) {
+ num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+ if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ }
+ }
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2306,6 +2312,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
+ kvm_init_pmu_info(&cpuid_data.cpuid);
+
if (x86_cpu_family(env->cpuid_version) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
(CPUID_MCE | CPUID_MCA)) {
--
2.39.3
* [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-15 1:09 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
AMD does not have what is commonly referred to as an architectural PMU.
Therefore, we need to rename the following variables to be applicable for
both Intel and AMD:
- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters
For Intel processors, the meaning of pmu_version remains unchanged.
For AMD processors:
pmu_version == 1 corresponds to versions before AMD PerfMonV2.
pmu_version == 2 corresponds to AMD PerfMonV2.
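As a rough illustration of the AMD meaning of pmu_version, the sketch below derives the version and the number of general-purpose counters from two CPUID feature bits, following the flow used later in this series. All names are hypothetical; the bit positions are assumed to be PerfCtrExtCore (CPUID 0x80000001 ECX bit 23) and PerfMonV2 (CPUID 0x80000022 EAX bit 0).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PERFCORE_BIT    (1u << 23) /* CPUID 0x80000001 ECX */
#define SKETCH_PERFMON_V2_BIT  (1u << 0)  /* CPUID 0x80000022 EAX */
#define SKETCH_NUM_COUNTERS      4u       /* legacy K7-style counters */
#define SKETCH_NUM_COUNTERS_CORE 6u       /* with PERFCORE */

struct sketch_amd_pmu {
    uint32_t version; /* 1: before PerfMonV2, 2: PerfMonV2 */
    uint32_t num_gp;
};

static inline struct sketch_amd_pmu
sketch_amd_pmu_info(uint32_t ecx_80000001, uint32_t eax_80000022,
                    uint32_t ebx_80000022)
{
    struct sketch_amd_pmu pmu = { 1, SKETCH_NUM_COUNTERS };

    /* Without PERFCORE only the four legacy counters exist. */
    if (!(ecx_80000001 & SKETCH_PERFCORE_BIT)) {
        return pmu;
    }
    pmu.num_gp = SKETCH_NUM_COUNTERS_CORE;

    /* PerfMonV2 reports the counter count in EBX[3:0]. */
    if (eax_80000022 & SKETCH_PERFMON_V2_BIT) {
        pmu.version = 2;
        pmu.num_gp = ebx_80000022 & 0xf;
    }

    return pmu;
}
```

The sketch omits the family check and the MAX_GP_COUNTERS cap that the real code applies.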
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v2:
- Change has_pmu_version to pmu_version.
- Add Reviewed-by since the change is minor.
- As a reminder, there are some contextual changes due to PATCH 05,
i.e., c->edx vs. edx.
Changed since v6:
- Add Reviewed-by from Sandipan.
target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 08d80ff677..3b803c662d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -167,9 +167,16 @@ static bool has_msr_perf_capabs;
static bool has_msr_pkrs;
static bool has_msr_hwcr;
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
static int has_xsave2;
static int has_xcrs;
@@ -2099,24 +2106,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
return;
}
- has_architectural_pmu_version = c->eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+ pmu_version = c->eax & 0xff;
+ if (pmu_version > 0) {
+ num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
/*
* Shouldn't be more than 32, since that's the number of bits
* available in EBX to tell us _which_ counters are available.
* Play it safe.
*/
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
}
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+ if (pmu_version > 1) {
+ num_pmu_fixed_counters = c->edx & 0x1f;
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
}
}
}
@@ -4087,25 +4094,25 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
}
/* Set the counter values. */
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
env->msr_fixed_counters[i]);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
env->msr_gp_counters[i]);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
env->msr_gp_evtsel[i]);
}
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
env->msr_global_status);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4622,17 +4629,17 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
}
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
}
--
2.39.3
* [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (2 preceding siblings ...)
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
` (2 more replies)
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2026-02-07 13:46 ` [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Paolo Bonzini
5 siblings, 3 replies; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.
To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:
1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
4. After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
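The AMD register layout that the get/put paths have to walk can be sketched as follows. The helper names are hypothetical; the MSR addresses are the ones this patch adds to cpu.h. Legacy counters use two separate banks with a stride of one, while the PERFCORE bank interleaves selector and counter registers, so counter slot i lives at offset 2 * i (selector) and 2 * i + 1 (counter).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_K7_EVNTSEL0    0xc0010000u
#define SKETCH_MSR_K7_PERFCTR0    0xc0010004u
#define SKETCH_MSR_F15H_PERF_CTL0 0xc0010200u
#define SKETCH_MSR_F15H_PERF_CTR0 0xc0010201u
#define SKETCH_NUM_COUNTERS_CORE  6u

/* MSR address of the event selector for counter slot i. */
static inline uint32_t sketch_amd_evtsel_msr(bool perfcore, uint32_t i)
{
    return perfcore ? SKETCH_MSR_F15H_PERF_CTL0 + 2 * i
                    : SKETCH_MSR_K7_EVNTSEL0 + i;
}

/* MSR address of the counter register for counter slot i. */
static inline uint32_t sketch_amd_counter_msr(bool perfcore, uint32_t i)
{
    return perfcore ? SKETCH_MSR_F15H_PERF_CTR0 + 2 * i
                    : SKETCH_MSR_K7_PERFCTR0 + i;
}

/* Inverse mapping for the PERFCORE bank: odd offsets are counters. */
static inline bool sketch_f15h_is_counter(uint32_t msr)
{
    return (msr - SKETCH_MSR_F15H_PERF_CTL0) & 0x1;
}

/* Inverse mapping for the PERFCORE bank: slot is offset / 2. */
static inline uint32_t sketch_f15h_slot(uint32_t msr)
{
    uint32_t off = msr - SKETCH_MSR_F15H_PERF_CTL0;

    assert(off < SKETCH_NUM_COUNTERS_CORE * 2);
    return off >> 1;
}
```

The inverse helpers matter on the read side: an MSR offset in the PERFCORE bank has to be halved before it is used as an index into a per-counter array.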
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
AMD64_NUM_COUNTERS (suggested by Sandipan Das).
- Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
(suggested by Sandipan Das).
- Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
- Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
- Remove 'static' from host_cpuid_vendorX.
- Change has_pmu_version to pmu_version.
- Use object_property_get_int() to get CPU family.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Send error log when host and guest are from different vendors.
- Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
comments to remind developers.
- Add support for Zhaoxin. Change is_same_vendor() to
is_host_compat_vendor().
- Didn't add Reviewed-by from Sandipan because the change isn't minor.
Changed since v3:
- Use host_cpu_vendor_fms() from Zhao's patch.
- Check AMD directly, which makes the "compat" rule clear.
- Add comment to MAX_GP_COUNTERS.
- Skip PMU info initialization if !kvm_pmu_disabled.
Changed since v4:
- Add Reviewed-by from Zhao and Sandipan.
Changed since v6:
- Add Reviewed-by from Dapeng Mi.
Changed since v8:
- Remove the usage of 'kvm_pmu_disabled' as suggested by Zide Chen.
- Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi, as the
usage of 'kvm_pmu_disabled' is removed.
target/i386/cpu.h | 12 +++
target/i386/kvm/kvm.c | 168 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 176 insertions(+), 4 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 2bbc977d90..0960b98960 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -506,6 +506,14 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_K7_EVNTSEL0 0xc0010000
+#define MSR_K7_PERFCTR0 0xc0010004
+#define MSR_F15H_PERF_CTL0 0xc0010200
+#define MSR_F15H_PERF_CTR0 0xc0010201
+
+#define AMD64_NUM_COUNTERS 4
+#define AMD64_NUM_COUNTERS_CORE 6
+
#define MSR_MC0_CTL 0x400
#define MSR_MC0_STATUS 0x401
#define MSR_MC0_ADDR 0x402
@@ -1737,6 +1745,10 @@ typedef struct {
#endif
#define MAX_FIXED_COUNTERS 3
+/*
+ * This formula is based on Intel's MSR. The current size also meets AMD's
+ * needs.
+ */
#define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
#define NB_OPMASK_REGS 8
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3b803c662d..fb7b672a9d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2096,7 +2096,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
{
struct kvm_cpuid_entry2 *c;
@@ -2129,6 +2129,89 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
}
}
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ struct kvm_cpuid_entry2 *c;
+ int64_t family;
+
+ family = object_property_get_int(OBJECT(cpu), "family", NULL);
+ if (family < 0) {
+ return;
+ }
+
+ if (family < 6) {
+ error_report("AMD performance-monitoring is supported from "
+ "K7 and later");
+ return;
+ }
+
+ pmu_version = 1;
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+ c = cpuid_find_entry(cpuid, 0x80000001, 0);
+ if (!c) {
+ return;
+ }
+
+ if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+ return;
+ }
+
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+ char host_vendor[CPUID_VENDOR_SZ + 1];
+
+ host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
+
+ /*
+ * Intel and Zhaoxin are compatible.
+ */
+ if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+ (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+ return true;
+ }
+
+ return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
+ IS_AMD_CPU(env);
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ CPUX86State *env = &cpu->env;
+
+ /*
+ * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+ * disable the AMD PMU virtualization.
+ *
+ * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
+ * registers are not going to be reset, even though they are still
+ * available to the guest VM.
+ */
+ if (!cpu->enable_pmu) {
+ return;
+ }
+
+ /*
+ * It is not supported to virtualize AMD PMU registers on Intel
+ * processors, nor to virtualize Intel PMU registers on AMD processors.
+ */
+ if (!is_host_compat_vendor(env)) {
+ error_report("host doesn't support requested feature: vPMU");
+ return;
+ }
+
+ if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+ kvm_init_pmu_info_intel(cpuid);
+ } else if (IS_AMD_CPU(env)) {
+ kvm_init_pmu_info_amd(cpuid, cpu);
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2319,7 +2402,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
- kvm_init_pmu_info(&cpuid_data.cpuid);
+ kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
if (x86_cpu_family(env->cpuid_version) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4094,7 +4177,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (pmu_version > 0) {
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4125,6 +4208,38 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
env->msr_global_ctrl);
}
}
+
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step,
+ env->msr_gp_counters[i]);
+ kvm_msr_entry_add(cpu, sel_base + i * step,
+ env->msr_gp_evtsel[i]);
+ }
+ }
+
/*
* Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
* only sync them to KVM on the first cpu
@@ -4629,7 +4744,8 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (pmu_version > 0) {
+
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4645,6 +4761,35 @@ static int kvm_get_msrs(X86CPU *cpu)
}
}
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+ kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+ }
+ }
+
if (env->mcg_cap) {
kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4975,6 +5120,21 @@ static int kvm_get_msrs(X86CPU *cpu)
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
break;
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+ break;
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+ break;
+ case MSR_F15H_PERF_CTL0 ...
+ MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+ index = index - MSR_F15H_PERF_CTL0;
+ if (index & 0x1) {
+ env->msr_gp_counters[index >> 1] = msrs[i].data;
+ } else {
+ env->msr_gp_evtsel[index >> 1] = msrs[i].data;
+ }
+ break;
case HV_X64_MSR_HYPERCALL:
env->msr_hv_hypercall = msrs[i].data;
break;
--
2.39.3
* [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (3 preceding siblings ...)
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2026-01-09 7:54 ` Dongli Zhang
2026-01-15 1:09 ` Chen, Zide
2026-02-07 13:46 ` [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Paolo Bonzini
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:54 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
Since perfmon-v2, the AMD PMU supports additional registers. This update
includes get/put functionality for these extra registers.
Similar to the implementation in KVM:
- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.
No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().
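
The aliasing listed above can be sketched on its own. This is a minimal standalone illustration, not the QEMU code: struct pmu_state and store_global_msr() are hypothetical names. The AMD MSR numbers and 0x38f/0x390 are the ones in target/i386/cpu.h; 0x38e for MSR_CORE_PERF_GLOBAL_STATUS is the standard Intel value and is assumed here because the hunk below does not show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_CORE_PERF_GLOBAL_STATUS            0x38e
#define MSR_CORE_PERF_GLOBAL_CTRL              0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL          0x390
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS      0xc0000300
#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL         0xc0000301
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR  0xc0000302

/* Hypothetical stand-in for the three shared fields in CPUX86State. */
struct pmu_state {
    uint64_t global_status;
    uint64_t global_ctrl;
    uint64_t global_ovf_ctrl;
};

/* Store one MSR value read back from the hypervisor. The Intel and AMD
 * global MSRs share a field, so each pair shares a case body.
 * Returns false if the index is not a global PMU MSR. */
static bool store_global_msr(struct pmu_state *s, uint32_t index,
                             uint64_t data)
{
    assert(s);

    switch (index) {
    case MSR_CORE_PERF_GLOBAL_CTRL:
    case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
        s->global_ctrl = data;
        return true;
    case MSR_CORE_PERF_GLOBAL_STATUS:
    case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
        s->global_status = data;
        return true;
    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
    case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
        s->global_ovf_ctrl = data;
        return true;
    default:
        return false;
    }
}
```

Sharing the fields is safe only because a given guest exposes one vendor's PMU, so the two MSRs of a pair are never both live.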
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
- Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Change has_pmu_version to pmu_version.
- Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
Changed since v4:
- Add Reviewed-by from Sandipan.
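
The detection and capping mentioned in the changelog can be sketched as follows. This is a standalone illustration under stated assumptions, not the QEMU code: struct pmu_info and detect_perfmon_v2() are hypothetical names, the PerfMonV2 flag is taken to be bit 0 of CPUID 0x80000022 EAX, and MAX_GP_COUNTERS is written out as 18 (0x198 - 0x186, the values of MSR_IA32_PERF_STATUS and MSR_P6_EVNTSEL0).

```c
#include <assert.h>
#include <stdint.h>

#define PERFMON_V2_BIT           (1u << 0)  /* CPUID 0x80000022 EAX[0] */
#define AMD64_NUM_COUNTERS_CORE  6
#define MAX_GP_COUNTERS          18         /* 0x198 - 0x186 */

/* Hypothetical stand-in for the pmu_version / num_pmu_gp_counters
 * globals used by the patch. */
struct pmu_info {
    uint32_t version;
    uint32_t num_gp_counters;
};

/* Upgrade info to PerfMonV2 when the CPUID leaf advertises it.
 * eax and ebx are the raw register values of leaf 0x80000022. */
static void detect_perfmon_v2(uint32_t eax, uint32_t ebx,
                              struct pmu_info *info)
{
    assert(info);

    if (!(eax & PERFMON_V2_BIT)) {
        return;                 /* keep the pre-PerfMonV2 values */
    }

    info->version = 2;
    info->num_gp_counters = ebx & 0xf;

    /* Defensive cap: a 4-bit field cannot exceed 15, but the arrays
     * indexed by this count are sized by MAX_GP_COUNTERS. */
    if (info->num_gp_counters > MAX_GP_COUNTERS) {
        info->num_gp_counters = MAX_GP_COUNTERS;
    }
}
```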
target/i386/cpu.h | 4 ++++
target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
2 files changed, 43 insertions(+), 9 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0960b98960..6887ae6a33 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -506,6 +506,10 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+
#define MSR_K7_EVNTSEL0 0xc0010000
#define MSR_K7_PERFCTR0 0xc0010004
#define MSR_F15H_PERF_CTL0 0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index fb7b672a9d..67adfafa0c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2158,6 +2158,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
}
num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+ c = cpuid_find_entry(cpuid, 0x80000022, 0);
+ if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+ pmu_version = 2;
+ num_pmu_gp_counters = c->ebx & 0xf;
+
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+ }
}
static bool is_host_compat_vendor(CPUX86State *env)
@@ -4220,13 +4230,14 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
- * Additionally, the address of the next selector or counter
- * register is determined by incrementing the address of the
- * current register by two.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+ * separate set of addresses for the selector and counter
+ * registers. Additionally, the address of the next selector or
+ * counter register is determined by incrementing the address
+ * of the current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4238,6 +4249,15 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
kvm_msr_entry_add(cpu, sel_base + i * step,
env->msr_gp_evtsel[i]);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ env->msr_global_status);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+ env->msr_global_ovf_ctrl);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ env->msr_global_ctrl);
+ }
}
/*
@@ -4772,13 +4792,14 @@ static int kvm_get_msrs(X86CPU *cpu)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+ * set of addresses for the selector and counter registers.
* Additionally, the address of the next selector or counter
* register is determined by incrementing the address of the
* current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4788,6 +4809,12 @@ static int kvm_get_msrs(X86CPU *cpu)
kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
kvm_msr_entry_add(cpu, sel_base + i * step, 0);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+ }
}
if (env->mcg_cap) {
@@ -5103,12 +5130,15 @@ static int kvm_get_msrs(X86CPU *cpu)
env->msr_fixed_ctr_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
env->msr_global_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
env->msr_global_status = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
env->msr_global_ovf_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
--
2.39.3
* Re: [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2026-01-15 1:07 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:07 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:53 PM, Dongli Zhang wrote:
> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
> there is no way to fully disable KVM AMD PMU virtualization. Neither
> "-cpu host,-pmu" nor "-cpu EPYC" achieves this.
>
> As a result, the following message still appears in the VM dmesg:
>
> [ 0.263615] Performance Events: AMD PMU driver.
>
> However, the expected output should be:
>
> [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>
> This occurs because AMD does not use any CPUID bit to indicate PMU
> availability.
>
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
* Re: [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2026-01-15 1:08 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:08 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:53 PM, Dongli Zhang wrote:
> The initialization of 'has_architectural_pmu_version',
> 'num_architectural_pmu_gp_counters', and
> 'num_architectural_pmu_fixed_counters' is unrelated to the process of
> building the CPUID.
>
> Extract them out of kvm_x86_build_cpuid().
>
> In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
> CPUID has already been filled at this stage.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
* Re: [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2026-01-15 1:09 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:09 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:54 PM, Dongli Zhang wrote:
> Since perfmon-v2, the AMD PMU supports additional registers. This update
> includes get/put functionality for these extra registers.
>
> Similar to the implementation in KVM:
>
> - MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
> use env->msr_global_status.
> - MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
> env->msr_global_ctrl.
> - MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
> both use env->msr_global_ovf_ctrl.
>
> No changes are needed for vmstate_msr_architectural_pmu or
> pmu_enable_needed().
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Sandipan Das <sandipan.das@amd.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
* Re: [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2026-01-15 1:09 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:09 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:53 PM, Dongli Zhang wrote:
> AMD does not have what is commonly referred to as an architectural PMU.
> Therefore, we need to rename the following variables to be applicable for
> both Intel and AMD:
>
> - has_architectural_pmu_version
> - num_architectural_pmu_gp_counters
> - num_architectural_pmu_fixed_counters
>
> For Intel processors, the meaning of pmu_version remains unchanged.
>
> For AMD processors:
>
> pmu_version == 1 corresponds to versions before AMD PerfMonV2.
> pmu_version == 2 corresponds to AMD PerfMonV2.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Sandipan Das <sandipan.das@amd.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
* Re: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2026-01-16 23:08 ` Dongli Zhang
2026-01-19 1:24 ` Mi, Dapeng
2026-01-19 5:33 ` Zhao Liu
2 siblings, 0 replies; 14+ messages in thread
From: Dongli Zhang @ 2026-01-16 23:08 UTC (permalink / raw)
To: qemu-devel, kvm, zhao1.liu, sandipan.das, dapeng1.mi
Cc: pbonzini, mtosatti, babu.moger, likexu, like.xu.linux, groug,
khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li, joe.jin,
ewanhai-oc, ewanhai, zide.chen
Hi Zhao, Sandipan and Dapeng,
FYI: I have dropped your Reviewed-by tags in this version, because I have
removed the following code since the previous version, as suggested by Zide.
+ /*
+ * The PMU virtualization is disabled by kvm.enable_pmu=N.
+ */
+ if (kvm_pmu_disabled) {
+ return;
+ }
Thank you very much!
Dongli Zhang
On 1/8/26 11:53 PM, Dongli Zhang wrote:
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
>
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
>
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
>
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
>
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
>
> 4. After a reboot, the VM kernel may report the following error:
>
> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
> - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
> AMD64_NUM_COUNTERS (suggested by Sandipan Das).
> - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
> (suggested by Sandipan Das).
> - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
> - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
> - Remove 'static' from host_cpuid_vendorX.
> - Change has_pmu_version to pmu_version.
> - Use object_property_get_int() to get CPU family.
> - Use cpuid_find_entry() instead of cpu_x86_cpuid().
> - Send error log when host and guest are from different vendors.
> - Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
> comments to remind developers.
> - Add support for Zhaoxin. Change is_same_vendor() to
> is_host_compat_vendor().
> - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
> - Use host_cpu_vendor_fms() from Zhao's patch.
> - Check AMD directly to make the "compat" rule clear.
> - Add comment to MAX_GP_COUNTERS.
> - Skip PMU info initialization if !kvm_pmu_disabled.
> Changed since v4:
> - Add Reviewed-by from Zhao and Sandipan.
> Changed since v6:
> - Add Reviewed-by from Dapeng Mi.
> Changed since v8:
> - Remove the usage of 'kvm_pmu_disabled' as suggested by Zide Chen.
> - Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi, as the
> usage of 'kvm_pmu_disabled' is removed.
>
> target/i386/cpu.h | 12 +++
> target/i386/kvm/kvm.c | 168 +++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 176 insertions(+), 4 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 2bbc977d90..0960b98960 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -506,6 +506,14 @@ typedef enum X86Seg {
> #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
>
> +#define MSR_K7_EVNTSEL0 0xc0010000
> +#define MSR_K7_PERFCTR0 0xc0010004
> +#define MSR_F15H_PERF_CTL0 0xc0010200
> +#define MSR_F15H_PERF_CTR0 0xc0010201
> +
> +#define AMD64_NUM_COUNTERS 4
> +#define AMD64_NUM_COUNTERS_CORE 6
> +
> #define MSR_MC0_CTL 0x400
> #define MSR_MC0_STATUS 0x401
> #define MSR_MC0_ADDR 0x402
> @@ -1737,6 +1745,10 @@ typedef struct {
> #endif
>
> #define MAX_FIXED_COUNTERS 3
> +/*
> + * This formula is based on Intel's MSR. The current size also meets AMD's
> + * needs.
> + */
> #define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
>
> #define NB_OPMASK_REGS 8
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3b803c662d..fb7b672a9d 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2096,7 +2096,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> return 0;
> }
>
> -static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
> +static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
> {
> struct kvm_cpuid_entry2 *c;
>
> @@ -2129,6 +2129,89 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
> }
> }
>
> +static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
> +{
> + struct kvm_cpuid_entry2 *c;
> + int64_t family;
> +
> + family = object_property_get_int(OBJECT(cpu), "family", NULL);
> + if (family < 0) {
> + return;
> + }
> +
> + if (family < 6) {
> + error_report("AMD performance-monitoring is supported from "
> + "K7 and later");
> + return;
> + }
> +
> + pmu_version = 1;
> + num_pmu_gp_counters = AMD64_NUM_COUNTERS;
> +
> + c = cpuid_find_entry(cpuid, 0x80000001, 0);
> + if (!c) {
> + return;
> + }
> +
> + if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
> + return;
> + }
> +
> + num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
> +}
> +
> +static bool is_host_compat_vendor(CPUX86State *env)
> +{
> + char host_vendor[CPUID_VENDOR_SZ + 1];
> +
> + host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
> +
> + /*
> + * Intel and Zhaoxin are compatible.
> + */
> + if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
> + (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
> + return true;
> + }
> +
> + return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
> + IS_AMD_CPU(env);
> +}
> +
> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
> +{
> + CPUX86State *env = &cpu->env;
> +
> + /*
> + * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
> + * disable the AMD PMU virtualization.
> + *
> + * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
> + * registers are not going to reset, even they are still available to
> + * guest VM.
> + */
> + if (!cpu->enable_pmu) {
> + return;
> + }
> +
> + /*
> + * It is not supported to virtualize AMD PMU registers on Intel
> + * processors, nor to virtualize Intel PMU registers on AMD processors.
> + */
> + if (!is_host_compat_vendor(env)) {
> + error_report("host doesn't support requested feature: vPMU");
> + return;
> + }
> +
> + if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
> + kvm_init_pmu_info_intel(cpuid);
> + } else if (IS_AMD_CPU(env)) {
> + kvm_init_pmu_info_amd(cpuid, cpu);
> + }
> +}
> +
> int kvm_arch_init_vcpu(CPUState *cs)
> {
> struct {
> @@ -2319,7 +2402,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
> cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
> cpuid_data.cpuid.nent = cpuid_i;
>
> - kvm_init_pmu_info(&cpuid_data.cpuid);
> + kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
>
> if (x86_cpu_family(env->cpuid_version) >= 6
> && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> @@ -4094,7 +4177,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
> }
>
> - if (pmu_version > 0) {
> + if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
> if (pmu_version > 1) {
> /* Stop the counter. */
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> @@ -4125,6 +4208,38 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
> env->msr_global_ctrl);
> }
> }
> +
> + if (IS_AMD_CPU(env) && pmu_version > 0) {
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + /*
> + * The address of the next selector or counter register is
> + * obtained by incrementing the address of the current selector
> + * or counter register by one.
> + */
> + uint32_t step = 1;
> +
> + /*
> + * When PERFCORE is enabled, AMD PMU uses a separate set of
> + * addresses for the selector and counter registers.
> + * Additionally, the address of the next selector or counter
> + * register is determined by incrementing the address of the
> + * current register by two.
> + */
> + if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
> + }
> +
> + for (i = 0; i < num_pmu_gp_counters; i++) {
> + kvm_msr_entry_add(cpu, ctr_base + i * step,
> + env->msr_gp_counters[i]);
> + kvm_msr_entry_add(cpu, sel_base + i * step,
> + env->msr_gp_evtsel[i]);
> + }
> + }
> +
> /*
> * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
> * only sync them to KVM on the first cpu
> @@ -4629,7 +4744,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
> }
> - if (pmu_version > 0) {
> +
> + if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
> if (pmu_version > 1) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> @@ -4645,6 +4761,35 @@ static int kvm_get_msrs(X86CPU *cpu)
> }
> }
>
> + if (IS_AMD_CPU(env) && pmu_version > 0) {
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + /*
> + * The address of the next selector or counter register is
> + * obtained by incrementing the address of the current selector
> + * or counter register by one.
> + */
> + uint32_t step = 1;
> +
> + /*
> + * When PERFCORE is enabled, AMD PMU uses a separate set of
> + * addresses for the selector and counter registers.
> + * Additionally, the address of the next selector or counter
> + * register is determined by incrementing the address of the
> + * current register by two.
> + */
> + if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
> + }
> +
> + for (i = 0; i < num_pmu_gp_counters; i++) {
> + kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
> + kvm_msr_entry_add(cpu, sel_base + i * step, 0);
> + }
> + }
> +
> if (env->mcg_cap) {
> kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
> kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
> @@ -4975,6 +5120,21 @@ static int kvm_get_msrs(X86CPU *cpu)
> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
> env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
> break;
> + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
> + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
> + break;
> + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
> + env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
> + break;
> + case MSR_F15H_PERF_CTL0 ...
> + MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
> + index = index - MSR_F15H_PERF_CTL0;
> + if (index & 0x1) {
> + env->msr_gp_counters[index] = msrs[i].data;
> + } else {
> + env->msr_gp_evtsel[index] = msrs[i].data;
> + }
> + break;
> case HV_X64_MSR_HYPERCALL:
> env->msr_hv_hypercall = msrs[i].data;
> break;
* Re: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
@ 2026-01-19 1:24 ` Mi, Dapeng
2026-01-19 5:33 ` Zhao Liu
2 siblings, 0 replies; 14+ messages in thread
From: Mi, Dapeng @ 2026-01-19 1:24 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai, zide.chen
On 1/9/2026 3:53 PM, Dongli Zhang wrote:
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
>
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
>
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
>
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
>
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
>
> 4. After a reboot, the VM kernel may report the following error:
>
> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
> - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
> AMD64_NUM_COUNTERS (suggested by Sandipan Das).
> - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
> (suggested by Sandipan Das).
> - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
> - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
> - Remove 'static' from host_cpuid_vendorX.
> - Change has_pmu_version to pmu_version.
> - Use object_property_get_int() to get CPU family.
> - Use cpuid_find_entry() instead of cpu_x86_cpuid().
> - Send error log when host and guest are from different vendors.
> - Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
> comments to remind developers.
> - Add support for Zhaoxin. Change is_same_vendor() to
> is_host_compat_vendor().
> - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
> - Use host_cpu_vendor_fms() from Zhao's patch.
> - Check AMD directly to make the "compat" rule clear.
> - Add comment to MAX_GP_COUNTERS.
> - Skip PMU info initialization if !kvm_pmu_disabled.
> Changed since v4:
> - Add Reviewed-by from Zhao and Sandipan.
> Changed since v6:
> - Add Reviewed-by from Dapeng Mi.
> Changed since v8:
> - Remove the usage of 'kvm_pmu_disabled' as suggested by Zide Chen.
> - Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi, as the
> usage of 'kvm_pmu_disabled' is removed.
>
> target/i386/cpu.h | 12 +++
> target/i386/kvm/kvm.c | 168 +++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 176 insertions(+), 4 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 2bbc977d90..0960b98960 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -506,6 +506,14 @@ typedef enum X86Seg {
> #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
>
> +#define MSR_K7_EVNTSEL0 0xc0010000
> +#define MSR_K7_PERFCTR0 0xc0010004
> +#define MSR_F15H_PERF_CTL0 0xc0010200
> +#define MSR_F15H_PERF_CTR0 0xc0010201
> +
> +#define AMD64_NUM_COUNTERS 4
> +#define AMD64_NUM_COUNTERS_CORE 6
> +
> #define MSR_MC0_CTL 0x400
> #define MSR_MC0_STATUS 0x401
> #define MSR_MC0_ADDR 0x402
> @@ -1737,6 +1745,10 @@ typedef struct {
> #endif
>
> #define MAX_FIXED_COUNTERS 3
> +/*
> + * This formula is based on Intel's MSR. The current size also meets AMD's
> + * needs.
> + */
> #define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
>
> #define NB_OPMASK_REGS 8
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3b803c662d..fb7b672a9d 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2096,7 +2096,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> return 0;
> }
>
> -static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
> +static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
> {
> struct kvm_cpuid_entry2 *c;
>
> @@ -2129,6 +2129,89 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
> }
> }
>
> +static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
> +{
> + struct kvm_cpuid_entry2 *c;
> + int64_t family;
> +
> + family = object_property_get_int(OBJECT(cpu), "family", NULL);
> + if (family < 0) {
> + return;
> + }
> +
> + if (family < 6) {
> + error_report("AMD performance-monitoring is supported from "
> + "K7 and later");
> + return;
> + }
> +
> + pmu_version = 1;
> + num_pmu_gp_counters = AMD64_NUM_COUNTERS;
> +
> + c = cpuid_find_entry(cpuid, 0x80000001, 0);
> + if (!c) {
> + return;
> + }
> +
> + if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
> + return;
> + }
> +
> + num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
> +}
> +
> +static bool is_host_compat_vendor(CPUX86State *env)
> +{
> + char host_vendor[CPUID_VENDOR_SZ + 1];
> +
> + host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
> +
> + /*
> + * Intel and Zhaoxin are compatible.
> + */
> + if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
> + (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
> + return true;
> + }
> +
> + return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
> + IS_AMD_CPU(env);
> +}
> +
> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
> +{
> + CPUX86State *env = &cpu->env;
> +
> + /*
> + * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
> + * disable the AMD PMU virtualization.
> + *
> + * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
> + * registers are not going to reset, even they are still available to
> + * guest VM.
> + */
> + if (!cpu->enable_pmu) {
> + return;
> + }
> +
> + /*
> + * It is not supported to virtualize AMD PMU registers on Intel
> + * processors, nor to virtualize Intel PMU registers on AMD processors.
> + */
> + if (!is_host_compat_vendor(env)) {
> + error_report("host doesn't support requested feature: vPMU");
> + return;
> + }
> +
> + if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
> + kvm_init_pmu_info_intel(cpuid);
> + } else if (IS_AMD_CPU(env)) {
> + kvm_init_pmu_info_amd(cpuid, cpu);
> + }
> +}
> +
> int kvm_arch_init_vcpu(CPUState *cs)
> {
> struct {
> @@ -2319,7 +2402,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
> cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
> cpuid_data.cpuid.nent = cpuid_i;
>
> - kvm_init_pmu_info(&cpuid_data.cpuid);
> + kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
>
> if (x86_cpu_family(env->cpuid_version) >= 6
> && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> @@ -4094,7 +4177,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
> }
>
> - if (pmu_version > 0) {
> + if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
> if (pmu_version > 1) {
> /* Stop the counter. */
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> @@ -4125,6 +4208,38 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
> env->msr_global_ctrl);
> }
> }
> +
> + if (IS_AMD_CPU(env) && pmu_version > 0) {
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + /*
> + * The address of the next selector or counter register is
> + * obtained by incrementing the address of the current selector
> + * or counter register by one.
> + */
> + uint32_t step = 1;
> +
> + /*
> + * When PERFCORE is enabled, AMD PMU uses a separate set of
> + * addresses for the selector and counter registers.
> + * Additionally, the address of the next selector or counter
> + * register is determined by incrementing the address of the
> + * current register by two.
> + */
> + if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
> + }
> +
> + for (i = 0; i < num_pmu_gp_counters; i++) {
> + kvm_msr_entry_add(cpu, ctr_base + i * step,
> + env->msr_gp_counters[i]);
> + kvm_msr_entry_add(cpu, sel_base + i * step,
> + env->msr_gp_evtsel[i]);
> + }
> + }
> +
> /*
> * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
> * only sync them to KVM on the first cpu
> @@ -4629,7 +4744,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
> }
> - if (pmu_version > 0) {
> +
> + if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
> if (pmu_version > 1) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> @@ -4645,6 +4761,35 @@ static int kvm_get_msrs(X86CPU *cpu)
> }
> }
>
> + if (IS_AMD_CPU(env) && pmu_version > 0) {
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + /*
> + * The address of the next selector or counter register is
> + * obtained by incrementing the address of the current selector
> + * or counter register by one.
> + */
> + uint32_t step = 1;
> +
> + /*
> + * When PERFCORE is enabled, AMD PMU uses a separate set of
> + * addresses for the selector and counter registers.
> + * Additionally, the address of the next selector or counter
> + * register is determined by incrementing the address of the
> + * current register by two.
> + */
> + if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
> + }
> +
> + for (i = 0; i < num_pmu_gp_counters; i++) {
> + kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
> + kvm_msr_entry_add(cpu, sel_base + i * step, 0);
> + }
> + }
> +
> if (env->mcg_cap) {
> kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
> kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
> @@ -4975,6 +5120,21 @@ static int kvm_get_msrs(X86CPU *cpu)
> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
> env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
> break;
> + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
> + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
> + break;
> + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
> + env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
> + break;
> + case MSR_F15H_PERF_CTL0 ...
> + MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
> + index = index - MSR_F15H_PERF_CTL0;
> + if (index & 0x1) {
> + env->msr_gp_counters[index] = msrs[i].data;
> + } else {
> + env->msr_gp_evtsel[index] = msrs[i].data;
> + }
> + break;
> case HV_X64_MSR_HYPERCALL:
> env->msr_hv_hypercall = msrs[i].data;
> break;
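As a reading aid for the AMD hunks quoted above, here is a minimal sketch of the two MSR layouts: the legacy K7 layout (four counters, consecutive addresses, step 1) and the PERFCORE layout (six counters, selector and counter interleaved, step 2). The helper names amd_evtsel_addr(), amd_counter_addr() and amd_f15h_slot() are hypothetical and do not appear in the patch. Mapping an offset in the F15H range back to a counter slot with "offset / 2" is an assumption made for illustration, not a description of how the patch indexes env->msr_gp_counters[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR addresses and counter counts as defined in the Linux/QEMU headers. */
#define MSR_K7_EVNTSEL0          0xc0010000u
#define MSR_K7_PERFCTR0          0xc0010004u
#define MSR_F15H_PERF_CTL0       0xc0010200u
#define MSR_F15H_PERF_CTR0       0xc0010201u
#define AMD64_NUM_COUNTERS       4
#define AMD64_NUM_COUNTERS_CORE  6

/* Hypothetical helper: address of the i-th event selector register. */
static uint32_t amd_evtsel_addr(bool perfcore, unsigned int i)
{
    /* PERFCORE interleaves CTL/CTR, so consecutive selectors are 2 apart. */
    return perfcore ? MSR_F15H_PERF_CTL0 + 2 * i : MSR_K7_EVNTSEL0 + i;
}

/* Hypothetical helper: address of the i-th counter register. */
static uint32_t amd_counter_addr(bool perfcore, unsigned int i)
{
    return perfcore ? MSR_F15H_PERF_CTR0 + 2 * i : MSR_K7_PERFCTR0 + i;
}

/*
 * Hypothetical helper: decode an MSR in the F15H range into a counter
 * slot. Even offsets are selectors (CTL), odd offsets are counters (CTR).
 * The slot is assumed to be offset / 2 for this illustration.
 */
static unsigned int amd_f15h_slot(uint32_t msr, bool *is_counter)
{
    uint32_t off = msr - MSR_F15H_PERF_CTL0;

    assert(off < AMD64_NUM_COUNTERS_CORE * 2);
    *is_counter = (off & 0x1) != 0;
    return off >> 1;
}
```

With these definitions, the PERFCORE selector for slot 0 is 0xc0010200, which is the MSR named in the "[Firmware Bug]" message quoted in the commit log.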
The Intel-related code looks good to me. I'll leave the AMD part to Sandipan. Thanks.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
2026-01-19 1:24 ` Mi, Dapeng
@ 2026-01-19 5:33 ` Zhao Liu
2 siblings, 0 replies; 14+ messages in thread
From: Zhao Liu @ 2026-01-19 5:33 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, pbonzini, mtosatti, sandipan.das, babu.moger,
likexu, like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
On Thu, Jan 08, 2026 at 11:53:59PM -0800, Dongli Zhang wrote:
> Date: Thu, 8 Jan 2026 23:53:59 -0800
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM
> reset
> X-Mailer: git-send-email 2.43.5
>
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
>
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
>
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
>
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
>
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
>
> 4. After a reboot, the VM kernel may report the following error:
>
> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
> - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
> AMD64_NUM_COUNTERS (suggested by Sandipan Das).
> - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
> (suggested by Sandipan Das).
> - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
> - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
> - Remove 'static' from host_cpuid_vendorX.
> - Change has_pmu_version to pmu_version.
> - Use object_property_get_int() to get CPU family.
> - Use cpuid_find_entry() instead of cpu_x86_cpuid().
> - Send error log when host and guest are from different vendors.
> - Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
> comments to remind developers.
> - Add support for Zhaoxin. Change is_same_vendor() to
> is_host_compat_vendor().
> - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
> - Use host_cpu_vendor_fms() from Zhao's patch.
> - Check AMD directly to make the "compat" rule clear.
> - Add comment to MAX_GP_COUNTERS.
> - Skip PMU info initialization if !kvm_pmu_disabled.
> Changed since v4:
> - Add Reviewed-by from Zhao and Sandipan.
> Changed since v6:
> - Add Reviewed-by from Dapeng Mi.
> Changed since v8:
> - Remove the usage of 'kvm_pmu_disabled' as suggested by Zide Chen.
> - Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi, as the
> usage of 'kvm_pmu_disabled' is removed.
>
> target/i386/cpu.h | 12 +++
> target/i386/kvm/kvm.c | 168 +++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 176 insertions(+), 4 deletions(-)
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
* Re: [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (4 preceding siblings ...)
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2026-02-07 13:46 ` Paolo Bonzini
5 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2026-02-07 13:46 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, pbonzini, zhao1.liu, mtosatti, sandipan.das,
babu.moger, likexu, like.xu.linux, groug, khorenko,
alexander.ivanov, den, davydov-max, xiaoyao.li, dapeng1.mi,
joe.jin, ewanhai-oc, ewanhai, zide.chen
Queued, thanks.
Paolo
Thread overview: 14+ messages
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2026-01-15 1:07 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2026-01-15 1:08 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2026-01-15 1:09 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
2026-01-19 1:24 ` Mi, Dapeng
2026-01-19 5:33 ` Zhao Liu
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2026-01-15 1:09 ` Chen, Zide
2026-02-07 13:46 ` [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Paolo Bonzini