* [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-15 1:07 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.
As a result, the following message still appears in the VM dmesg:
[ 0.263615] Performance Events: AMD PMU driver.
However, the expected output should be:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
This occurs because AMD does not use any CPUID bit to indicate PMU
availability.
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
Changed since v1:
- Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
- Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
Intel platform because current "pmu" property works as expected."
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
statement.
- Add Reviewed-by from Xiaoyao and Zhao as the change is minor.
Changed since v5:
- Re-base on top of most recent mainline QEMU.
- To resolve conflicts, move the PMU related code before the
call site of is_tdx_vm().
Changed since v6:
- Add Reviewed-by from Dapeng Mi.
target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7b9b740a8e..c98832f423 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -179,6 +179,8 @@ static int has_triple_fault_event;
static bool has_msr_mcg_ext_ctl;
+static int pmu_cap;
+
static struct kvm_cpuid2 *cpuid_cache;
static struct kvm_cpuid2 *hv_cpuid_cache;
static struct kvm_msr_list *kvm_feature_msrs;
@@ -2080,6 +2082,33 @@ full:
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
{
+ static bool first = true;
+ int ret;
+
+ if (first) {
+ first = false;
+
+ /*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. To accommodate both, we have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+ if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
+ }
+ }
+ }
+
if (is_tdx_vm()) {
return tdx_pre_create_vcpu(cpu, errp);
}
@@ -3391,6 +3420,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
}
+ pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
return 0;
}
--
2.39.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2026-01-15 1:07 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:07 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:53 PM, Dongli Zhang wrote:
> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
> there is no way to fully disable KVM AMD PMU virtualization. Neither
> "-cpu host,-pmu" nor "-cpu EPYC" achieves this.
>
> As a result, the following message still appears in the VM dmesg:
>
> [ 0.263615] Performance Events: AMD PMU driver.
>
> However, the expected output should be:
>
> [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>
> This occurs because AMD does not use any CPUID bit to indicate PMU
> availability.
>
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-15 1:08 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.
Extract them out of kvm_x86_build_cpuid().
In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
CPUID has already been filled at this stage.
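As a rough illustration of the lookup such a helper performs (QEMU's actual cpuid_find_entry() in target/i386/kvm/kvm.c may differ in detail), a scan over the already-built kvm_cpuid2 array could look like this, assuming <linux/kvm.h> for the structure definitions:

static struct kvm_cpuid_entry2 *find_cpuid_entry(struct kvm_cpuid2 *cpuid,
                                                 uint32_t function,
                                                 uint32_t index)
{
    for (uint32_t i = 0; i < cpuid->nent; i++) {
        struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

        if (e->function == function && e->index == index) {
            return e;
        }
    }

    return NULL; /* the leaf is not exposed to the guest */
}

Scanning the array that will be handed to KVM, rather than re-running cpu_x86_cpuid(), guarantees the PMU information is derived from exactly what the guest will see.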
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
Changed since v1:
- Still extract the code, but call them for all CPUs.
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Didn't add Reviewed-by from Dapeng as the change isn't minor.
Changed since v6:
- Add Reviewed-by from Dapeng Mi.
target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
1 file changed, 35 insertions(+), 27 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c98832f423..08d80ff677 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1986,33 +1986,6 @@ uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
}
}
- if (limit >= 0x0a) {
- uint32_t eax, edx;
-
- cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
- has_architectural_pmu_version = eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
- /* Shouldn't be more than 32, since that's the number of bits
- * available in EBX to tell us _which_ counters are available.
- * Play it safe.
- */
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
- }
-
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = edx & 0x1f;
-
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
- }
- }
- }
- }
-
cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
for (i = 0x80000000; i <= limit; i++) {
@@ -2116,6 +2089,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+ struct kvm_cpuid_entry2 *c;
+
+ c = cpuid_find_entry(cpuid, 0xa, 0);
+
+ if (!c) {
+ return;
+ }
+
+ has_architectural_pmu_version = c->eax & 0xff;
+ if (has_architectural_pmu_version > 0) {
+ num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+ /*
+ * Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+ if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+
+ if (has_architectural_pmu_version > 1) {
+ num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+ if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ }
+ }
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2306,6 +2312,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
+ kvm_init_pmu_info(&cpuid_data.cpuid);
+
if (x86_cpu_family(env->cpuid_version) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
(CPUID_MCE | CPUID_MCA)) {
--
2.39.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2026-01-15 1:08 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:08 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:53 PM, Dongli Zhang wrote:
> The initialization of 'has_architectural_pmu_version',
> 'num_architectural_pmu_gp_counters', and
> 'num_architectural_pmu_fixed_counters' is unrelated to the process of
> building the CPUID.
>
> Extract them out of kvm_x86_build_cpuid().
>
> In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
> CPUID has already been filled at this stage.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 1/5] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2026-01-09 7:53 ` [PATCH v9 2/5] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-15 1:09 ` Chen, Zide
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
AMD does not have what is commonly referred to as an architectural PMU.
Therefore, we need to rename the following variables to be applicable for
both Intel and AMD:
- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters
For Intel processors, the meaning of pmu_version remains unchanged.
For AMD processors:
pmu_version == 1 corresponds to versions before AMD PerfMonV2.
pmu_version == 2 corresponds to AMD PerfMonV2.
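As a worked example of the Intel-side decoding these variables hold (the EAX/EDX values are made up purely for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t eax = 0x08300805;   /* hypothetical CPUID.0xA output in EAX */
    uint32_t edx = 0x00000603;   /* hypothetical CPUID.0xA output in EDX */

    uint32_t pmu_version    = eax & 0xff;          /* -> 5 */
    uint32_t gp_counters    = (eax & 0xff00) >> 8; /* -> 8 */
    uint32_t fixed_counters = edx & 0x1f;          /* -> 3 */

    printf("version %u, %u GP counters, %u fixed counters\n",
           pmu_version, gp_counters, fixed_counters);
    return 0;
}

On AMD there is no leaf 0xA, which is why pmu_version is instead assigned 1 or 2 depending on PerfMonV2 support, as described above.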
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v2:
- Change has_pmu_version to pmu_version.
- Add Reviewed-by since the change is minor.
- As a reminder, there are some contextual change due to PATCH 05,
i.e., c->edx vs. edx.
Changed since v6:
- Add Reviewed-by from Sandipan.
target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 08d80ff677..3b803c662d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -167,9 +167,16 @@ static bool has_msr_perf_capabs;
static bool has_msr_pkrs;
static bool has_msr_hwcr;
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
static int has_xsave2;
static int has_xcrs;
@@ -2099,24 +2106,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
return;
}
- has_architectural_pmu_version = c->eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+ pmu_version = c->eax & 0xff;
+ if (pmu_version > 0) {
+ num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
/*
* Shouldn't be more than 32, since that's the number of bits
* available in EBX to tell us _which_ counters are available.
* Play it safe.
*/
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
}
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+ if (pmu_version > 1) {
+ num_pmu_fixed_counters = c->edx & 0x1f;
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
}
}
}
@@ -4087,25 +4094,25 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
}
/* Set the counter values. */
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
env->msr_fixed_counters[i]);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
env->msr_gp_counters[i]);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
env->msr_gp_evtsel[i]);
}
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
env->msr_global_status);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4622,17 +4629,17 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
}
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
}
--
2.39.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2026-01-15 1:09 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:09 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:53 PM, Dongli Zhang wrote:
> AMD does not have what is commonly referred to as an architectural PMU.
> Therefore, we need to rename the following variables to be applicable for
> both Intel and AMD:
>
> - has_architectural_pmu_version
> - num_architectural_pmu_gp_counters
> - num_architectural_pmu_fixed_counters
>
> For Intel processors, the meaning of pmu_version remains unchanged.
>
> For AMD processors:
>
> pmu_version == 1 corresponds to versions before AMD PerfMonV2.
> pmu_version == 2 corresponds to AMD PerfMonV2.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Sandipan Das <sandipan.das@amd.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (2 preceding siblings ...)
2026-01-09 7:53 ` [PATCH v9 3/5] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2026-01-09 7:53 ` Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
` (2 more replies)
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2026-02-07 13:46 ` [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Paolo Bonzini
5 siblings, 3 replies; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:53 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.
To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:
1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
4. After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
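To make the MSR layout in the hunks easier to follow, here is a small standalone sketch of the two AMD addressing schemes; the addresses are the ones this patch adds to target/i386/cpu.h, and the program only prints which MSRs the reset code will touch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int perfcore  = 1;                       /* pretend CPUID_EXT3_PERFCORE is set */
    uint32_t nr   = perfcore ? 6 : 4;        /* AMD64_NUM_COUNTERS_CORE : AMD64_NUM_COUNTERS */
    uint32_t sel  = perfcore ? 0xc0010200 : 0xc0010000; /* MSR_F15H_PERF_CTL0 : MSR_K7_EVNTSEL0 */
    uint32_t ctr  = perfcore ? 0xc0010201 : 0xc0010004; /* MSR_F15H_PERF_CTR0 : MSR_K7_PERFCTR0 */
    uint32_t step = perfcore ? 2 : 1;        /* F15H CTL/CTR pairs are interleaved */

    for (uint32_t i = 0; i < nr; i++) {
        printf("counter %u: select MSR 0x%x, count MSR 0x%x\n",
               i, sel + i * step, ctr + i * step);
    }
    return 0;
}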
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
AMD64_NUM_COUNTERS (suggested by Sandipan Das).
- Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
(suggested by Sandipan Das).
- Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
- Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
- Remove 'static' from host_cpuid_vendorX.
- Change has_pmu_version to pmu_version.
- Use object_property_get_int() to get CPU family.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Send error log when host and guest are from different vendors.
- Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
reminder developers.
- Add support to Zhaoxin. Change is_same_vendor() to
is_host_compat_vendor().
- Didn't add Reviewed-by from Sandipan because the change isn't minor.
Changed since v3:
- Use host_cpu_vendor_fms() from Zhao's patch.
- Check AMD directly makes the "compat" rule clear.
- Add comment to MAX_GP_COUNTERS.
- Skip PMU info initialization if !kvm_pmu_disabled.
Changed since v4:
- Add Reviewed-by from Zhao and Sandipan.
Changed since v6:
- Add Reviewed-by from Dapeng Mi.
Changed since v8:
- Remove the usage of 'kvm_pmu_disabled' as suggested by Zide Chen.
- Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi, as the
usage of 'kvm_pmu_disabled' is removed.
target/i386/cpu.h | 12 +++
target/i386/kvm/kvm.c | 168 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 176 insertions(+), 4 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 2bbc977d90..0960b98960 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -506,6 +506,14 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_K7_EVNTSEL0 0xc0010000
+#define MSR_K7_PERFCTR0 0xc0010004
+#define MSR_F15H_PERF_CTL0 0xc0010200
+#define MSR_F15H_PERF_CTR0 0xc0010201
+
+#define AMD64_NUM_COUNTERS 4
+#define AMD64_NUM_COUNTERS_CORE 6
+
#define MSR_MC0_CTL 0x400
#define MSR_MC0_STATUS 0x401
#define MSR_MC0_ADDR 0x402
@@ -1737,6 +1745,10 @@ typedef struct {
#endif
#define MAX_FIXED_COUNTERS 3
+/*
+ * This formula is based on Intel's MSR. The current size also meets AMD's
+ * needs.
+ */
#define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
#define NB_OPMASK_REGS 8
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3b803c662d..fb7b672a9d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2096,7 +2096,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
{
struct kvm_cpuid_entry2 *c;
@@ -2129,6 +2129,89 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
}
}
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ struct kvm_cpuid_entry2 *c;
+ int64_t family;
+
+ family = object_property_get_int(OBJECT(cpu), "family", NULL);
+ if (family < 0) {
+ return;
+ }
+
+ if (family < 6) {
+ error_report("AMD performance-monitoring is supported from "
+ "K7 and later");
+ return;
+ }
+
+ pmu_version = 1;
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+ c = cpuid_find_entry(cpuid, 0x80000001, 0);
+ if (!c) {
+ return;
+ }
+
+ if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+ return;
+ }
+
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+ char host_vendor[CPUID_VENDOR_SZ + 1];
+
+ host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
+
+ /*
+ * Intel and Zhaoxin are compatible.
+ */
+ if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+ (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+ return true;
+ }
+
+ return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
+ IS_AMD_CPU(env);
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ CPUX86State *env = &cpu->env;
+
+ /*
+ * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+ * disable the AMD PMU virtualization.
+ *
+ * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
+ * registers are not going to be reset, even though they are still
+ * available to the guest VM.
+ */
+ if (!cpu->enable_pmu) {
+ return;
+ }
+
+ /*
+ * It is not supported to virtualize AMD PMU registers on Intel
+ * processors, nor to virtualize Intel PMU registers on AMD processors.
+ */
+ if (!is_host_compat_vendor(env)) {
+ error_report("host doesn't support requested feature: vPMU");
+ return;
+ }
+
+ if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+ kvm_init_pmu_info_intel(cpuid);
+ } else if (IS_AMD_CPU(env)) {
+ kvm_init_pmu_info_amd(cpuid, cpu);
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2319,7 +2402,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
- kvm_init_pmu_info(&cpuid_data.cpuid);
+ kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
if (x86_cpu_family(env->cpuid_version) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4094,7 +4177,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (pmu_version > 0) {
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4125,6 +4208,38 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
env->msr_global_ctrl);
}
}
+
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step,
+ env->msr_gp_counters[i]);
+ kvm_msr_entry_add(cpu, sel_base + i * step,
+ env->msr_gp_evtsel[i]);
+ }
+ }
+
/*
* Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
* only sync them to KVM on the first cpu
@@ -4629,7 +4744,8 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (pmu_version > 0) {
+
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4645,6 +4761,35 @@ static int kvm_get_msrs(X86CPU *cpu)
}
}
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+ kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+ }
+ }
+
if (env->mcg_cap) {
kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4975,6 +5120,21 @@ static int kvm_get_msrs(X86CPU *cpu)
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
break;
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+ break;
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+ break;
+ case MSR_F15H_PERF_CTL0 ...
+ MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+ index = index - MSR_F15H_PERF_CTL0;
+ if (index & 0x1) {
+ env->msr_gp_counters[index] = msrs[i].data;
+ } else {
+ env->msr_gp_evtsel[index] = msrs[i].data;
+ }
+ break;
case HV_X64_MSR_HYPERCALL:
env->msr_hv_hypercall = msrs[i].data;
break;
--
2.39.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2026-01-16 23:08 ` Dongli Zhang
2026-01-19 1:24 ` Mi, Dapeng
2026-01-19 5:33 ` Zhao Liu
2 siblings, 0 replies; 14+ messages in thread
From: Dongli Zhang @ 2026-01-16 23:08 UTC (permalink / raw)
To: qemu-devel, kvm, zhao1.liu, sandipan.das, dapeng1.mi
Cc: pbonzini, mtosatti, babu.moger, likexu, like.xu.linux, groug,
khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li, joe.jin,
ewanhai-oc, ewanhai, zide.chen
Hi Zhao, Sandipan and Dapeng,
FYI: I have removed your Reviewed-by since the previous version.
I have removed the following code since the previous version as suggested by Zide.
+ /*
+ * The PMU virtualization is disabled by kvm.enable_pmu=N.
+ */
+ if (kvm_pmu_disabled) {
+ return;
+ }
Thank you very much!
Dongli Zhang
On 1/8/26 11:53 PM, Dongli Zhang wrote:
> [...]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
@ 2026-01-19 1:24 ` Mi, Dapeng
2026-01-19 5:33 ` Zhao Liu
2 siblings, 0 replies; 14+ messages in thread
From: Mi, Dapeng @ 2026-01-19 1:24 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai, zide.chen
On 1/9/2026 3:53 PM, Dongli Zhang wrote:
> [...]
The Intel-related code looks good to me. I'll leave the AMD part to Sandipan. Thanks.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2026-01-16 23:08 ` Dongli Zhang
2026-01-19 1:24 ` Mi, Dapeng
@ 2026-01-19 5:33 ` Zhao Liu
2 siblings, 0 replies; 14+ messages in thread
From: Zhao Liu @ 2026-01-19 5:33 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, pbonzini, mtosatti, sandipan.das, babu.moger,
likexu, like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
On Thu, Jan 08, 2026 at 11:53:59PM -0800, Dongli Zhang wrote:
> Date: Thu, 8 Jan 2026 23:53:59 -0800
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM
> reset
> X-Mailer: git-send-email 2.43.5
>
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
>
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
>
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
>
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
>
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
>
> 4. After a reboot, the VM kernel may report the following error:
>
> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
> - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
> AMD64_NUM_COUNTERS (suggested by Sandipan Das).
> - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
> (suggested by Sandipan Das).
> - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
> - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
> - Remove 'static' from host_cpuid_vendorX.
> - Change has_pmu_version to pmu_version.
> - Use object_property_get_int() to get CPU family.
> - Use cpuid_find_entry() instead of cpu_x86_cpuid().
> - Send error log when host and guest are from different vendors.
> - Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
> reminder developers.
> - Add support to Zhaoxin. Change is_same_vendor() to
> is_host_compat_vendor().
> - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
> - Use host_cpu_vendor_fms() from Zhao's patch.
> - Checking AMD directly makes the "compat" rule clear.
> - Add comment to MAX_GP_COUNTERS.
> - Skip PMU info initialization if !kvm_pmu_disabled.
> Changed since v4:
> - Add Reviewed-by from Zhao and Sandipan.
> Changed since v6:
> - Add Reviewed-by from Dapeng Mi.
> Changed since v8:
> - Remove the usage of 'kvm_pmu_disabled' as suggested by Zide Chen.
> - Remove Reviewed-by from Zhao Liu, Sandipan Das and Dapeng Mi, as the
> usage of 'kvm_pmu_disabled' is removed.
>
> target/i386/cpu.h | 12 +++
> target/i386/kvm/kvm.c | 168 +++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 176 insertions(+), 4 deletions(-)
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
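As an aside, point 4 in the commit message above can be checked from inside
the guest after a kexec/reboot: if the reset path left an event enabled, the
first PERF_CTL MSR still has its enable bit (bit 22) set, which is what
triggers the "Broken BIOS detected" warning. A minimal probe sketch
(illustrative only; assumes an AMD guest with the msr driver loaded and
root privileges):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read one MSR on CPU 0 via the Linux msr driver (needs root and msr.ko). */
static int rdmsr_cpu0(uint32_t reg, uint64_t *val)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    ssize_t n;

    if (fd < 0) {
        return -1;
    }
    n = pread(fd, val, sizeof(*val), reg);
    close(fd);
    return n == sizeof(*val) ? 0 : -1;
}

int main(void)
{
    uint64_t ctl;

    /* 0xc0010200 = PERF_CTL0; bit 22 is the counter enable bit. */
    if (rdmsr_cpu0(0xc0010200, &ctl) == 0 && (ctl & (1ULL << 22))) {
        printf("PERF_CTL0 still enabled after reset: %#llx\n",
               (unsigned long long)ctl);
    }
    return 0;
}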
* [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (3 preceding siblings ...)
2026-01-09 7:53 ` [PATCH v9 4/5] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2026-01-09 7:54 ` Dongli Zhang
2026-01-15 1:09 ` Chen, Zide
2026-02-07 13:46 ` [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Paolo Bonzini
5 siblings, 1 reply; 14+ messages in thread
From: Dongli Zhang @ 2026-01-09 7:54 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai,
zide.chen
Since perfmon-v2, the AMD PMU supports additional registers. This update
includes get/put functionality for these extra registers.
Similar to the implementation in KVM:
- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.
No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
- Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Change has_pmu_version to pmu_version.
- Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
Changed since v4:
- Add Reviewed-by from Sandipan.
target/i386/cpu.h | 4 ++++
target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
2 files changed, 43 insertions(+), 9 deletions(-)
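Before the diff itself: the kvm_init_pmu_info_amd() hunk below keys
PerfMonV2 detection off CPUID leaf 0x80000022 (EAX bit 0 = PerfMonV2,
EBX[3:0] = number of core counters). For reference, a minimal user-space
probe of the same leaf (illustrative only; assumes GCC/Clang <cpuid.h> on
an AMD host):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx)) {
        printf("PerfMonV2: %s, core counters: %u\n",
               (eax & 1) ? "yes" : "no", ebx & 0xf);
    } else {
        printf("CPUID leaf 0x80000022 not reported by this CPU\n");
    }
    return 0;
}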
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0960b98960..6887ae6a33 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -506,6 +506,10 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+
#define MSR_K7_EVNTSEL0 0xc0010000
#define MSR_K7_PERFCTR0 0xc0010004
#define MSR_F15H_PERF_CTL0 0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index fb7b672a9d..67adfafa0c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2158,6 +2158,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
}
num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+ c = cpuid_find_entry(cpuid, 0x80000022, 0);
+ if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+ pmu_version = 2;
+ num_pmu_gp_counters = c->ebx & 0xf;
+
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+ }
}
static bool is_host_compat_vendor(CPUX86State *env)
@@ -4220,13 +4230,14 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
- * Additionally, the address of the next selector or counter
- * register is determined by incrementing the address of the
- * current register by two.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+ * separate set of addresses for the selector and counter
+ * registers. Additionally, the address of the next selector or
+ * counter register is determined by incrementing the address
+ * of the current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4238,6 +4249,15 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
kvm_msr_entry_add(cpu, sel_base + i * step,
env->msr_gp_evtsel[i]);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ env->msr_global_status);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+ env->msr_global_ovf_ctrl);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ env->msr_global_ctrl);
+ }
}
/*
@@ -4772,13 +4792,14 @@ static int kvm_get_msrs(X86CPU *cpu)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+ * set of addresses for the selector and counter registers.
* Additionally, the address of the next selector or counter
* register is determined by incrementing the address of the
* current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4788,6 +4809,12 @@ static int kvm_get_msrs(X86CPU *cpu)
kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
kvm_msr_entry_add(cpu, sel_base + i * step, 0);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+ }
}
if (env->mcg_cap) {
@@ -5103,12 +5130,15 @@ static int kvm_get_msrs(X86CPU *cpu)
env->msr_fixed_ctr_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
env->msr_global_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
env->msr_global_status = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
env->msr_global_ovf_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
--
2.39.3
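As an aside on the get/put hunks above: each kvm_msr_entry_add() call adds
one entry to a struct kvm_msrs batch that QEMU later hands to the
KVM_GET_MSRS/KVM_SET_MSRS vCPU ioctls. A minimal sketch of the read side
for a single MSR (illustrative only; assumes an already-created and
configured vCPU fd, and a host KVM new enough to know the PerfMonV2 global
MSRs):

#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301u

/*
 * Read a single MSR from an already-configured vCPU fd via KVM_GET_MSRS.
 * This is roughly what one kvm_msr_entry_add() + kvm_get_msrs() pair boils
 * down to per entry; KVM_GET_MSRS returns the number of MSRs actually read.
 */
static int get_one_msr(int vcpu_fd, uint32_t index, uint64_t *data)
{
    struct kvm_msrs *req;
    int ret = -1;

    req = calloc(1, sizeof(*req) + sizeof(struct kvm_msr_entry));
    if (!req) {
        return -1;
    }
    req->nmsrs = 1;
    req->entries[0].index = index;

    if (ioctl(vcpu_fd, KVM_GET_MSRS, req) == 1) {
        *data = req->entries[0].data;
        ret = 0;
    }
    free(req);
    return ret;
}

/* Usage (vcpu_fd obtained from KVM_CREATE_VCPU):
 *   uint64_t ctl;
 *   get_one_msr(vcpu_fd, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, &ctl);
 */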
* Re: [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2026-01-15 1:09 ` Chen, Zide
0 siblings, 0 replies; 14+ messages in thread
From: Chen, Zide @ 2026-01-15 1:09 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 1/8/2026 11:54 PM, Dongli Zhang wrote:
> Since perfmon-v2, the AMD PMU supports additional registers. This update
> includes get/put functionality for these extra registers.
>
> Similar to the implementation in KVM:
>
> - MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
> use env->msr_global_status.
> - MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
> env->msr_global_ctrl.
> - MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
> both use env->msr_global_ovf_ctrl.
>
> No changes are needed for vmstate_msr_architectural_pmu or
> pmu_enable_needed().
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Sandipan Das <sandipan.das@amd.com>
> ---
LGTM.
Reviewed-by: Zide Chen <zide.chen@intel.com>
* Re: [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
2026-01-09 7:53 [PATCH v9 0/5] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (4 preceding siblings ...)
2026-01-09 7:54 ` [PATCH v9 5/5] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2026-02-07 13:46 ` Paolo Bonzini
5 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2026-02-07 13:46 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, pbonzini, zhao1.liu, mtosatti, sandipan.das,
babu.moger, likexu, like.xu.linux, groug, khorenko,
alexander.ivanov, den, davydov-max, xiaoyao.li, dapeng1.mi,
joe.jin, ewanhai-oc, ewanhai, zide.chen
Queued, thanks.
Paolo