qemu-devel.nongnu.org archive mirror
* [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
@ 2025-04-16 21:52 Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

This patchset addresses four bugs related to AMD PMU virtualization.

1. PerfMonV2 is still advertised in the guest CPUID even when PERFCORE is
disabled via "-cpu host,-perfctr-core".

2. Running 'cpuid' in the VM still reports PERFCORE even though "-pmu" is
configured.

3. Using "-cpu host,-pmu" does not disable AMD PMU virtualization. With
either "-cpu EPYC" or "-cpu host,-pmu", AMD PMU virtualization remains
enabled, and on the VM's Linux side you might still see:

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

instead of:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured (a minimal sketch follows the issue list below).

4. Unreclaimed performance events left in KVM after a QEMU system_reset
may cause random, unwanted, or unknown NMIs to be injected into the VM.

The AMD PMU registers are not reset during QEMU system_reset.

(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

(4) After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
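
For reference, the VM-level switch used in PATCH 05 can also be exercised
with raw KVM ioctls. Below is a minimal, self-contained sketch (not QEMU
code; it assumes a Linux v5.18+ host whose linux/kvm.h defines
KVM_CAP_PMU_CAPABILITY and KVM_PMU_CAP_DISABLE):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm;
    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_PMU_CAPABILITY,
        .args[0] = KVM_PMU_CAP_DISABLE,
    };

    if (kvm < 0) {
        perror("/dev/kvm");
        return 1;
    }

    /* KVM_CHECK_EXTENSION returns the supported KVM_PMU_CAP_* bits. */
    if (!(ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PMU_CAPABILITY) &
          KVM_PMU_CAP_DISABLE)) {
        fprintf(stderr, "KVM_PMU_CAP_DISABLE not supported\n");
        return 1;
    }

    vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0) {
        perror("KVM_CREATE_VM");
        return 1;
    }

    /*
     * Must happen before any vCPU is created, which is why the series
     * hooks this into kvm_arch_pre_create_vcpu().
     */
    if (ioctl(vm, KVM_ENABLE_CAP, &cap) < 0) {
        perror("KVM_ENABLE_CAP(KVM_CAP_PMU_CAPABILITY)");
        return 1;
    }

    printf("vPMU disabled for this VM\n");
    return 0;
}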


Changed since v1:
  - Use feature_dependencies for CPUID_EXT3_PERFCORE and
    CPUID_8000_0022_EAX_PERFMON_V2.
  - Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
  - Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
  - Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
  - Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
  - Some changes to PMU register limit calculation.
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
    Zhao's suggestion.
  - Use object_property_get_int() to get CPU family.
  - Add support to Zhaoxin.
Changed since v3:
  - Re-base on top of Zhao's queued patch.
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Checking AMD directly makes the "compat" rule clear.
  - Some changes on commit message and comment.
  - Bring back global static variable 'kvm_pmu_disabled' read from
    /sys/module/kvm/parameters/enable_pmu.


Zhao Liu (1):
  i386/cpu: Consolidate the helper to get Host's vendor [Don't merge]

Xiaoyao Li (1):
  kvm: Introduce kvm_arch_pre_create_vcpu()

Dongli Zhang (9):
  target/i386: disable PerfMonV2 when PERFCORE unavailable
  target/i386: disable PERFCORE when "-pmu" is configured
  target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  target/i386/kvm: rename architectural PMU variables
  target/i386/kvm: query kvm.enable_pmu parameter
  target/i386/kvm: reset AMD PMU registers during VM reset
  target/i386/kvm: support perfmon-v2 for reset
  target/i386/kvm: don't stop Intel PMU counters

 accel/kvm/kvm-all.c           |   5 +
 include/system/kvm.h          |   1 +
 target/arm/kvm.c              |   5 +
 target/i386/cpu.c             |   8 +
 target/i386/cpu.h             |  16 ++
 target/i386/host-cpu.c        |  10 +-
 target/i386/kvm/kvm.c         | 360 ++++++++++++++++++++++++++++++++-----
 target/i386/kvm/vmsr_energy.c |   3 +-
 target/loongarch/kvm/kvm.c    |   4 +
 target/mips/kvm.c             |   5 +
 target/ppc/kvm.c              |   5 +
 target/riscv/kvm/kvm-cpu.c    |   5 +
 target/s390x/kvm/kvm.c        |   5 +
 13 files changed, 379 insertions(+), 53 deletions(-)

base-commit: a9cd5bc6399a80fcf233ed0fffe6067b731227d8

Thank you very much!

Dongli Zhang




* [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25  8:28   ` Zhao Liu
  2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

From: Zhao Liu <zhao1.liu@intel.com>

Extend host_cpu_vendor_fms() so that more callers can use it to obtain the
host's vendor information.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
This patch is already queued by Paolo.
https://lore.kernel.org/all/20250410075619.145792-1-zhao1.liu@intel.com/
I don't need to add my Signed-off-by.

 target/i386/host-cpu.c        | 10 ++++++----
 target/i386/kvm/vmsr_energy.c |  3 +--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 3e4e85e729..072731a4dd 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -109,9 +109,13 @@ void host_cpu_vendor_fms(char *vendor, int *family, int *model, int *stepping)
 {
     uint32_t eax, ebx, ecx, edx;
 
-    host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
+    host_cpuid(0x0, 0, NULL, &ebx, &ecx, &edx);
     x86_cpu_vendor_words2str(vendor, ebx, edx, ecx);
 
+    if (!family && !model && !stepping) {
+        return;
+    }
+
     host_cpuid(0x1, 0, &eax, &ebx, &ecx, &edx);
     if (family) {
         *family = ((eax >> 8) & 0x0F) + ((eax >> 20) & 0xFF);
@@ -129,11 +133,9 @@ void host_cpu_instance_init(X86CPU *cpu)
     X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
 
     if (xcc->model) {
-        uint32_t ebx = 0, ecx = 0, edx = 0;
         char vendor[CPUID_VENDOR_SZ + 1];
 
-        host_cpuid(0, 0, NULL, &ebx, &ecx, &edx);
-        x86_cpu_vendor_words2str(vendor, ebx, edx, ecx);
+        host_cpu_vendor_fms(vendor, NULL, NULL, NULL);
         object_property_set_str(OBJECT(cpu), "vendor", vendor, &error_abort);
     }
 }
diff --git a/target/i386/kvm/vmsr_energy.c b/target/i386/kvm/vmsr_energy.c
index 31508d4e77..f499ec6e8b 100644
--- a/target/i386/kvm/vmsr_energy.c
+++ b/target/i386/kvm/vmsr_energy.c
@@ -29,10 +29,9 @@ char *vmsr_compute_default_paths(void)
 
 bool is_host_cpu_intel(void)
 {
-    int family, model, stepping;
     char vendor[CPUID_VENDOR_SZ + 1];
 
-    host_cpu_vendor_fms(vendor, &family, &model, &stepping);
+    host_cpu_vendor_fms(vendor, NULL, NULL, NULL);
 
     return g_str_equal(vendor, CPUID_VENDOR_INTEL);
 }
-- 
2.39.3




* [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

When PERFCORE is disabled with "-cpu host,-perfctr-core", this is
reflected in the guest dmesg.

[    0.285136] Performance Events: AMD PMU driver.

However, the guest CPUID indicates that PerfMonV2 is still available.

CPU:
   Extended Performance Monitoring and Debugging (0x80000022):
      AMD performance monitoring V2         = true
      AMD LBR V2                            = false
      AMD LBR stack & PMC freezing          = false
      number of core perf ctrs              = 0x6 (6)
      number of LBR stack entries           = 0x0 (0)
      number of avail Northbridge perf ctrs = 0x0 (0)
      number of available UMC PMCs          = 0x0 (0)
      active UMCs bitmask                   = 0x0

Disable PerfMonV2 in CPUID when PERFCORE is disabled.

Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Fixes: 209b0ac12074 ("target/i386: Add PerfMonV2 feature bit")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
  - Use feature_dependencies (suggested by Zhao Liu).
Changed since v2:
  - Nothing. Zhao and Xiaoyao may move it to x86_cpu_expand_features()
    later.

 target/i386/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1b64ceaaba..2b87331be5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1808,6 +1808,10 @@ static FeatureDep feature_dependencies[] = {
         .from = { FEAT_7_1_EDX,             CPUID_7_1_EDX_AVX10 },
         .to = { FEAT_24_0_EBX,              ~0ull },
     },
+    {
+        .from = { FEAT_8000_0001_ECX,       CPUID_EXT3_PERFCORE },
+        .to = { FEAT_8000_0022_EAX,         CPUID_8000_0022_EAX_PERFMON_V2 },
+    },
 };
 
 typedef struct X86RegisterInfo32 {
-- 
2.39.3




* [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25 10:11   ` Sandipan Das
  2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Currently, AMD PMU support is not determined based on CPUID; as a result,
the "-pmu" option does not fully disable KVM AMD PMU virtualization.

To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.

Completely disabling AMD PMU virtualization will be implemented via
KVM_CAP_PMU_CAPABILITY in upcoming patches.

As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
is configured. In future patches, developers should query whether these
features are supported via cpu_x86_cpuid() rather than relying on
env->features[], as sketched below.
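
The helper name x86_guest_has_perfcore() below is made up for illustration
and is not added by the series; it only shows the intended query pattern:

static bool x86_guest_has_perfcore(CPUX86State *env)
{
    uint32_t eax, ebx, ecx, edx;

    /*
     * cpu_x86_cpuid() applies the "-pmu" filtering above, while reading
     * env->features[] directly does not.
     */
    cpu_x86_cpuid(env, 0x80000001, 0, &eax, &ebx, &ecx, &edx);

    return !!(ecx & CPUID_EXT3_PERFCORE);
}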

Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
  - No need to check "kvm_enabled() && IS_AMD_CPU(env)".

 target/i386/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2b87331be5..acbd627f7e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7242,6 +7242,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             !(env->hflags & HF_LMA_MASK)) {
             *edx &= ~CPUID_EXT2_SYSCALL;
         }
+
+        if (!cpu->enable_pmu) {
+            *ecx &= ~CPUID_EXT3_PERFCORE;
+        }
         break;
     case 0x80000002:
     case 0x80000003:
-- 
2.39.3




* [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu()
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (2 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent work prior
to creating any vCPU. This is needed for i386 TDX, which must call
TDX_INIT_VM before creating any vCPU.

The i386-specific implementation will be added in a future patch.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
  - Add my Signed-off-by.
Changed since v3:
  - Pick new reviewed version from:
https://lore.kernel.org/all/20250401130205.2198253-8-xiaoyao.li@intel.com/
    I have fixed the typo as suggested by Daniel.
  - Keep Zhao's review.

 accel/kvm/kvm-all.c        | 5 +++++
 include/system/kvm.h       | 1 +
 target/arm/kvm.c           | 5 +++++
 target/i386/kvm/kvm.c      | 5 +++++
 target/loongarch/kvm/kvm.c | 4 ++++
 target/mips/kvm.c          | 5 +++++
 target/ppc/kvm.c           | 5 +++++
 target/riscv/kvm/kvm-cpu.c | 5 +++++
 target/s390x/kvm/kvm.c     | 5 +++++
 9 files changed, 40 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f89568bfa3..df9840e53a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -540,6 +540,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    ret = kvm_arch_pre_create_vcpu(cpu, errp);
+    if (ret < 0) {
+        goto err;
+    }
+
     ret = kvm_create_vcpu(cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
diff --git a/include/system/kvm.h b/include/system/kvm.h
index ab17c09a55..d7dfa25493 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb23..93f1a7245b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1874,6 +1874,11 @@ static int kvm_arm_sve_set_vls(ARMCPU *cpu)
 
 #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     int ret;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c749d4ee8..f41e190fb8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2051,6 +2051,11 @@ full:
     abort();
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index f0e3cfef03..64c9672976 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -1071,7 +1071,11 @@ static int kvm_cpu_check_pv_features(CPUState *cs, Error **errp)
             env->pv_features |= BIT(KVM_FEATURE_VIRT_EXTIOI);
         }
     }
+    return 0;
+}
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
     return 0;
 }
 
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index d67b7c1a8e..ec53acb51a 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -61,6 +61,11 @@ int kvm_arch_irqchip_create(KVMState *s)
     return 0;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     CPUMIPSState *env = cpu_env(cs);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 992356cb75..20fabccecd 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -479,6 +479,11 @@ static void kvmppc_hw_debug_points_init(CPUPPCState *cenv)
     }
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 0f4997a918..6f15f727de 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1383,6 +1383,11 @@ static int kvm_vcpu_enable_sbi_dbcn(RISCVCPU *cpu, CPUState *cs)
     return kvm_set_one_reg(cs, kvm_sbi_dbcn.kvm_reg_id, &reg);
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     int ret = 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 4d56e653dd..1f592733f4 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -404,6 +404,11 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
     return cpu->cpu_index;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
-- 
2.39.3




* [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (3 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.

As a result, the following message still appears in the VM dmesg:

[    0.263615] Performance Events: AMD PMU driver.

However, the expected output should be:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This occurs because AMD does not use any CPUID bit to indicate PMU
availability.

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
  - Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
  - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
    Intel platform because current "pmu" property works as expected."
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
    statement.
  - Add Reviewed-by from Xiaoyao and Zhao as the change is minor.

 target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f41e190fb8..579c0f7e0b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -176,6 +176,8 @@ static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
+static int pmu_cap;
+
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
@@ -2053,6 +2055,33 @@ full:
 
 int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 {
+    static bool first = true;
+    int ret;
+
+    if (first) {
+        first = false;
+
+        /*
+         * Since Linux v5.18, KVM provides a VM-level capability to easily
+         * disable PMUs; however, QEMU has been providing PMU property per
+         * CPU since v1.6. In order to accommodate both, have to configure
+         * the VM-level capability here.
+         *
+         * KVM_PMU_CAP_DISABLE doesn't change the PMU
+         * behavior on Intel platform because current "pmu" property works
+         * as expected.
+         */
+        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                    KVM_PMU_CAP_DISABLE);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret,
+                                 "Failed to set KVM_PMU_CAP_DISABLE");
+                return ret;
+            }
+        }
+    }
+
     return 0;
 }
 
@@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
     return 0;
 }
 
-- 
2.39.3




* [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (4 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.

Extract them out of kvm_x86_build_cpuid().

In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because the
CPUID entries have already been filled in at this stage.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
  - Still extract the code, but call them for all CPUs.
Changed since v2:
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Didn't add Reviewed-by from Dapeng as the change isn't minor.

 target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 579c0f7e0b..4d86c08c6c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1959,33 +1959,6 @@ static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
         }
     }
 
-    if (limit >= 0x0a) {
-        uint32_t eax, edx;
-
-        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
-        has_architectural_pmu_version = eax & 0xff;
-        if (has_architectural_pmu_version > 0) {
-            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
-            /* Shouldn't be more than 32, since that's the number of bits
-             * available in EBX to tell us _which_ counters are available.
-             * Play it safe.
-             */
-            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
-            }
-
-            if (has_architectural_pmu_version > 1) {
-                num_architectural_pmu_fixed_counters = edx & 0x1f;
-
-                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
-                }
-            }
-        }
-    }
-
     cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
 
     for (i = 0x80000000; i <= limit; i++) {
@@ -2085,6 +2058,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+    struct kvm_cpuid_entry2 *c;
+
+    c = cpuid_find_entry(cpuid, 0xa, 0);
+
+    if (!c) {
+        return;
+    }
+
+    has_architectural_pmu_version = c->eax & 0xff;
+    if (has_architectural_pmu_version > 0) {
+        num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+        /*
+         * Shouldn't be more than 32, since that's the number of bits
+         * available in EBX to tell us _which_ counters are available.
+         * Play it safe.
+         */
+        if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+        }
+
+        if (has_architectural_pmu_version > 1) {
+            num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+            if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+            }
+        }
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2267,6 +2273,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
+    kvm_init_pmu_info(&cpuid_data.cpuid);
+
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
            (CPUID_MCE | CPUID_MCA)) {
-- 
2.39.3




* [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (5 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

AMD does not have what is commonly referred to as an architectural PMU.
Therefore, rename the following variables so that they apply to both Intel
and AMD:

- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters

For Intel processors, the meaning of pmu_version remains unchanged.

For AMD processors:

pmu_version == 1 corresponds to versions before AMD PerfMonV2.
pmu_version == 2 corresponds to AMD PerfMonV2.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
  - Change has_pmu_version to pmu_version.
  - Add Reviewed-by since the change is minor.
  - As a reminder, there are some contextual changes due to PATCH 05,
    i.e., c->edx vs. edx.

 target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4d86c08c6c..6b49549f1b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -164,9 +164,16 @@ static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
 static bool has_msr_hwcr;
 
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
 
 static int has_xsave2;
 static int has_xcrs;
@@ -2068,24 +2075,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
         return;
     }
 
-    has_architectural_pmu_version = c->eax & 0xff;
-    if (has_architectural_pmu_version > 0) {
-        num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+    pmu_version = c->eax & 0xff;
+    if (pmu_version > 0) {
+        num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
 
         /*
          * Shouldn't be more than 32, since that's the number of bits
          * available in EBX to tell us _which_ counters are available.
          * Play it safe.
          */
-        if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-            num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+        if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_pmu_gp_counters = MAX_GP_COUNTERS;
         }
 
-        if (has_architectural_pmu_version > 1) {
-            num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+        if (pmu_version > 1) {
+            num_pmu_fixed_counters = c->edx & 0x1f;
 
-            if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+            if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
             }
         }
     }
@@ -4037,25 +4044,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
         }
 
-        if (has_architectural_pmu_version > 0) {
-            if (has_architectural_pmu_version > 1) {
+        if (pmu_version > 0) {
+            if (pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             }
 
             /* Set the counter values.  */
-            for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+            for (i = 0; i < num_pmu_fixed_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
                                   env->msr_fixed_counters[i]);
             }
-            for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+            for (i = 0; i < num_pmu_gp_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
                                   env->msr_gp_counters[i]);
                 kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
                                   env->msr_gp_evtsel[i]);
             }
-            if (has_architectural_pmu_version > 1) {
+            if (pmu_version > 1) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
                                   env->msr_global_status);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4515,17 +4522,17 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (has_architectural_pmu_version > 0) {
-        if (has_architectural_pmu_version > 1) {
+    if (pmu_version > 0) {
+        if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
         }
-        for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+        for (i = 0; i < num_pmu_fixed_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
         }
-        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+        for (i = 0; i < num_pmu_gp_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
             kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
         }
-- 
2.39.3




* [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (6 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25  8:56   ` Zhao Liu
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

When the PMU is enabled in QEMU, PMU virtualization may still be
completely disabled by the KVM module parameter kvm.enable_pmu=N.

The kvm.enable_pmu parameter has been available since Linux v5.17. Its
permission is 0444, and its value does not change until the KVM module is
reloaded.

Read the kvm.enable_pmu value from the module's sysfs entry so that QEMU
can provide more information about vPMU enablement.
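
As a standalone illustration (not part of this patch), the parameter can be
checked from userspace as follows, assuming the kvm module is loaded on a
v5.17+ kernel:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/module/kvm/parameters/enable_pmu", "r");
    int c;

    if (!f) {
        perror("kvm.enable_pmu");   /* not exposed before Linux v5.17 */
        return 1;
    }

    c = fgetc(f);
    fclose(f);

    printf("kvm.enable_pmu=%c => vPMU %s by KVM\n",
           c, c == 'N' ? "disabled" : "enabled");
    return 0;
}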

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v2:
  - Rework the code flow following Zhao's suggestion.
  - Return error when:
    (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
Changed since v3:
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Rework the commit messages.
  - Bring back global static variable 'kvm_pmu_disabled' from v2.

 target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6b49549f1b..38cc1a5f43 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -184,6 +184,10 @@ static int has_triple_fault_event;
 static bool has_msr_mcg_ext_ctl;
 
 static int pmu_cap;
+/*
+ * Read from /sys/module/kvm/parameters/enable_pmu.
+ */
+static bool kvm_pmu_disabled;
 
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
@@ -2041,23 +2045,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     if (first) {
         first = false;
 
-        /*
-         * Since Linux v5.18, KVM provides a VM-level capability to easily
-         * disable PMUs; however, QEMU has been providing PMU property per
-         * CPU since v1.6. In order to accommodate both, have to configure
-         * the VM-level capability here.
-         *
-         * KVM_PMU_CAP_DISABLE doesn't change the PMU
-         * behavior on Intel platform because current "pmu" property works
-         * as expected.
-         */
-        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
-            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
-                                    KVM_PMU_CAP_DISABLE);
-            if (ret < 0) {
-                error_setg_errno(errp, -ret,
-                                 "Failed to set KVM_PMU_CAP_DISABLE");
-                return ret;
+        if (X86_CPU(cpu)->enable_pmu) {
+            if (kvm_pmu_disabled) {
+                warn_report("Failed to enable PMU since "
+                            "KVM's enable_pmu parameter is disabled");
+            }
+        } else {
+            /*
+             * Since Linux v5.18, KVM provides a VM-level capability to easily
+             * disable PMUs; however, QEMU has been providing PMU property per
+             * CPU since v1.6. In order to accommodate both, have to configure
+             * the VM-level capability here.
+             *
+             * KVM_PMU_CAP_DISABLE doesn't change the PMU
+             * behavior on Intel platform because current "pmu" property works
+             * as expected.
+             */
+            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
+                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                        KVM_PMU_CAP_DISABLE);
+                if (ret < 0) {
+                    error_setg_errno(errp, -ret,
+                                     "Failed to set KVM_PMU_CAP_DISABLE");
+                    return ret;
+                }
             }
         }
     }
@@ -3252,6 +3263,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     int ret;
     struct utsname utsname;
     Error *local_err = NULL;
+    g_autofree char *kvm_enable_pmu;
 
     /*
      * Initialize SEV context, if required
@@ -3397,6 +3409,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
 
+    /*
+     * The enable_pmu parameter is introduced since Linux v5.17,
+     * give a chance to provide more information about vPMU
+     * enablement.
+     *
+     * The kvm.enable_pmu's permission is 0444. It does not change
+     * until a reload of the KVM module.
+     */
+    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+                            &kvm_enable_pmu, NULL, NULL)) {
+        if (*kvm_enable_pmu == 'N') {
+            kvm_pmu_disabled = true;
+        }
+    }
+
     return 0;
 }
 
-- 
2.39.3




* [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (7 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25  9:18   ` Zhao Liu
  2025-04-25 10:14   ` Sandipan Das
  2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
  10 siblings, 2 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.
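
The detection rules implemented by this patch and the next one can be
compressed into the following sketch (amd_num_gp_counters() is an
illustrative helper, not a function added by the series; capping against
MAX_GP_COUNTERS is omitted for brevity):

static uint32_t amd_num_gp_counters(struct kvm_cpuid2 *cpuid)
{
    struct kvm_cpuid_entry2 *c;

    /* PerfMonV2: CPUID 0x80000022 EBX[3:0] reports the counter count. */
    c = cpuid_find_entry(cpuid, 0x80000022, 0);
    if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
        return c->ebx & 0xf;
    }

    /* PERFCORE: six counters in the MSR_F15H_PERF_CTL0/CTR0 range. */
    c = cpuid_find_entry(cpuid, 0x80000001, 0);
    if (c && (c->ecx & CPUID_EXT3_PERFCORE)) {
        return AMD64_NUM_COUNTERS_CORE;    /* 6 */
    }

    /* Legacy K7-style PMU: four counters at MSR_K7_EVNTSEL0/PERFCTR0. */
    return AMD64_NUM_COUNTERS;             /* 4 */
}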

To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:

1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

4. After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
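
For orientation, the MSR layout that the new get/put code below walks,
summarized from the defines added in this patch:

/*
 * Legacy (K7) layout, 4 counters, stride 1:
 *   MSR_K7_EVNTSEL0..3   0xc0010000 - 0xc0010003
 *   MSR_K7_PERFCTR0..3   0xc0010004 - 0xc0010007
 *
 * PERFCORE layout, 6 counters, CTL/CTR interleaved, stride 2:
 *   MSR_F15H_PERF_CTL0   0xc0010200    MSR_F15H_PERF_CTR0   0xc0010201
 *   MSR_F15H_PERF_CTL1   0xc0010202    MSR_F15H_PERF_CTR1   0xc0010203
 *   ...                                up to CTL5/CTR5 at 0xc001020a/b
 */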

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
  - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
    AMD64_NUM_COUNTERS (suggested by Sandipan Das).
  - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
    (suggested by Sandipan Das).
  - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
  - Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
  - Remove 'static' from host_cpuid_vendorX.
  - Change has_pmu_version to pmu_version.
  - Use object_property_get_int() to get CPU family.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Send error log when host and guest are from different vendors.
  - Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
    reminder developers.
  - Add support to Zhaoxin. Change is_same_vendor() to
    is_host_compat_vendor().
  - Didn't add Reviewed-by from Sandipan because the change isn't minor.
Changed since v3:
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Checking AMD directly makes the "compat" rule clear.
  - Add comment to MAX_GP_COUNTERS.
  - Skip PMU info initialization if kvm_pmu_disabled.

 target/i386/cpu.h     |  12 +++
 target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 183 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 76f24446a5..5d5266f89e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,14 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0                 0xc0010000
+#define MSR_K7_PERFCTR0                 0xc0010004
+#define MSR_F15H_PERF_CTL0              0xc0010200
+#define MSR_F15H_PERF_CTR0              0xc0010201
+
+#define AMD64_NUM_COUNTERS              4
+#define AMD64_NUM_COUNTERS_CORE         6
+
 #define MSR_MC0_CTL                     0x400
 #define MSR_MC0_STATUS                  0x401
 #define MSR_MC0_ADDR                    0x402
@@ -1608,6 +1616,10 @@ typedef struct {
 #endif
 
 #define MAX_FIXED_COUNTERS 3
+/*
+ * This formula is based on Intel's MSR. The current size also meets AMD's
+ * needs.
+ */
 #define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
 
 #define TARGET_INSN_START_EXTRA_WORDS 1
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 38cc1a5f43..b8926bd4cb 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2076,7 +2076,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
 {
     struct kvm_cpuid_entry2 *c;
 
@@ -2109,6 +2109,96 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
     }
 }
 
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    struct kvm_cpuid_entry2 *c;
+    int64_t family;
+
+    family = object_property_get_int(OBJECT(cpu), "family", NULL);
+    if (family < 0) {
+        return;
+    }
+
+    if (family < 6) {
+        error_report("AMD performance-monitoring is supported from "
+                     "K7 and later");
+        return;
+    }
+
+    pmu_version = 1;
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+    c = cpuid_find_entry(cpuid, 0x80000001, 0);
+    if (!c) {
+        return;
+    }
+
+    if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+        return;
+    }
+
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+    char host_vendor[CPUID_VENDOR_SZ + 1];
+
+    host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
+
+    /*
+     * Intel and Zhaoxin are compatible.
+     */
+    if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+        (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+        return true;
+    }
+
+    return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
+           IS_AMD_CPU(env);
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * The PMU virtualization is disabled by kvm.enable_pmu=N.
+     */
+    if (kvm_pmu_disabled) {
+        return;
+    }
+
+    /*
+     * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+     * disable the AMD PMU virtualization.
+     *
+     * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
+     * registers are not going to reset, even they are still available to
+     * guest VM.
+     */
+    if (!cpu->enable_pmu) {
+        return;
+    }
+
+    /*
+     * It is not supported to virtualize AMD PMU registers on Intel
+     * processors, nor to virtualize Intel PMU registers on AMD processors.
+     */
+    if (!is_host_compat_vendor(env)) {
+        error_report("host doesn't support requested feature: vPMU");
+        return;
+    }
+
+    if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+        kvm_init_pmu_info_intel(cpuid);
+    } else if (IS_AMD_CPU(env)) {
+        kvm_init_pmu_info_amd(cpuid, cpu);
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2291,7 +2381,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
-    kvm_init_pmu_info(&cpuid_data.cpuid);
+    kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4071,7 +4161,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
         }
 
-        if (pmu_version > 0) {
+        if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
             if (pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4102,6 +4192,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                                   env->msr_global_ctrl);
             }
         }
+
+        if (IS_AMD_CPU(env) && pmu_version > 0) {
+            uint32_t sel_base = MSR_K7_EVNTSEL0;
+            uint32_t ctr_base = MSR_K7_PERFCTR0;
+            /*
+             * The address of the next selector or counter register is
+             * obtained by incrementing the address of the current selector
+             * or counter register by one.
+             */
+            uint32_t step = 1;
+
+            /*
+             * When PERFCORE is enabled, AMD PMU uses a separate set of
+             * addresses for the selector and counter registers.
+             * Additionally, the address of the next selector or counter
+             * register is determined by incrementing the address of the
+             * current register by two.
+             */
+            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+                sel_base = MSR_F15H_PERF_CTL0;
+                ctr_base = MSR_F15H_PERF_CTR0;
+                step = 2;
+            }
+
+            for (i = 0; i < num_pmu_gp_counters; i++) {
+                kvm_msr_entry_add(cpu, ctr_base + i * step,
+                                  env->msr_gp_counters[i]);
+                kvm_msr_entry_add(cpu, sel_base + i * step,
+                                  env->msr_gp_evtsel[i]);
+            }
+        }
+
         /*
          * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
          * only sync them to KVM on the first cpu
@@ -4549,7 +4671,8 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (pmu_version > 0) {
+
+    if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
         if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4565,6 +4688,35 @@ static int kvm_get_msrs(X86CPU *cpu)
         }
     }
 
+    if (IS_AMD_CPU(env) && pmu_version > 0) {
+        uint32_t sel_base = MSR_K7_EVNTSEL0;
+        uint32_t ctr_base = MSR_K7_PERFCTR0;
+        /*
+         * The address of the next selector or counter register is
+         * obtained by incrementing the address of the current selector
+         * or counter register by one.
+         */
+        uint32_t step = 1;
+
+        /*
+         * When PERFCORE is enabled, AMD PMU uses a separate set of
+         * addresses for the selector and counter registers.
+         * Additionally, the address of the next selector or counter
+         * register is determined by incrementing the address of the
+         * current register by two.
+         */
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            sel_base = MSR_F15H_PERF_CTL0;
+            ctr_base = MSR_F15H_PERF_CTR0;
+            step = 2;
+        }
+
+        for (i = 0; i < num_pmu_gp_counters; i++) {
+            kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+            kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+        }
+    }
+
     if (env->mcg_cap) {
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4876,6 +5028,21 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
+        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+            break;
+        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+            break;
+        case MSR_F15H_PERF_CTL0 ...
+             MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+            index = index - MSR_F15H_PERF_CTL0;
+            if (index & 0x1) {
+                env->msr_gp_counters[index] = msrs[i].data;
+            } else {
+                env->msr_gp_evtsel[index] = msrs[i].data;
+            }
+            break;
         case HV_X64_MSR_HYPERCALL:
             env->msr_hv_hypercall = msrs[i].data;
             break;
-- 
2.39.3




* [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (8 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25 10:12   ` Sandipan Das
  2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Since perfmon-v2, the AMD PMU supports additional registers. This update
includes get/put functionality for these extra registers.

Similar to the implementation in KVM:

- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.

No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().
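
The shared backing fields above can be pictured with a small standalone
sketch. The struct and decode function below are illustrative stand-ins
rather than QEMU's actual code; the MSR numbers mirror the architectural
values used in this patch.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MSR_CORE_PERF_GLOBAL_STATUS            0x38e
#define MSR_CORE_PERF_GLOBAL_CTRL              0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL          0x390
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS      0xc0000300
#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL         0xc0000301
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR  0xc0000302

/* Stand-in for the env->msr_global_* backing fields. */
struct pmu_globals {
    uint64_t global_ctrl;
    uint64_t global_status;
    uint64_t global_ovf_ctrl;
};

/* Either vendor's global MSR lands in the same backing field. */
static void decode_global_msr(struct pmu_globals *g, uint32_t index,
                              uint64_t data)
{
    switch (index) {
    case MSR_CORE_PERF_GLOBAL_CTRL:
    case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
        g->global_ctrl = data;
        break;
    case MSR_CORE_PERF_GLOBAL_STATUS:
    case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
        g->global_status = data;
        break;
    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
    case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
        g->global_ovf_ctrl = data;
        break;
    }
}

int main(void)
{
    struct pmu_globals g = { 0 };

    decode_global_msr(&g, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0x3f);
    printf("global_ctrl = 0x%" PRIx64 "\n", g.global_ctrl);
    return 0;
}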

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
  - Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Change has_pmu_version to pmu_version.
  - Cap num_pmu_gp_counters with MAX_GP_COUNTERS.

 target/i386/cpu.h     |  4 ++++
 target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5d5266f89e..210a80950e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,10 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS       0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL          0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR   0xc0000302
+
 #define MSR_K7_EVNTSEL0                 0xc0010000
 #define MSR_K7_PERFCTR0                 0xc0010004
 #define MSR_F15H_PERF_CTL0              0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b8926bd4cb..13ad7de690 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2138,6 +2138,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
     }
 
     num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+    c = cpuid_find_entry(cpuid, 0x80000022, 0);
+    if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+        pmu_version = 2;
+        num_pmu_gp_counters = c->ebx & 0xf;
+
+        if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_pmu_gp_counters = MAX_GP_COUNTERS;
+        }
+    }
 }
 
 static bool is_host_compat_vendor(CPUX86State *env)
@@ -4204,13 +4214,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             uint32_t step = 1;
 
             /*
-             * When PERFCORE is enabled, AMD PMU uses a separate set of
-             * addresses for the selector and counter registers.
-             * Additionally, the address of the next selector or counter
-             * register is determined by incrementing the address of the
-             * current register by two.
+             * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+             * separate set of addresses for the selector and counter
+             * registers. Additionally, the address of the next selector or
+             * counter register is determined by incrementing the address
+             * of the current register by two.
              */
-            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+                pmu_version > 1) {
                 sel_base = MSR_F15H_PERF_CTL0;
                 ctr_base = MSR_F15H_PERF_CTR0;
                 step = 2;
@@ -4222,6 +4233,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                 kvm_msr_entry_add(cpu, sel_base + i * step,
                                   env->msr_gp_evtsel[i]);
             }
+
+            if (pmu_version > 1) {
+                kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+                                  env->msr_global_status);
+                kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+                                  env->msr_global_ovf_ctrl);
+                kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+                                  env->msr_global_ctrl);
+            }
         }
 
         /*
@@ -4699,13 +4719,14 @@ static int kvm_get_msrs(X86CPU *cpu)
         uint32_t step = 1;
 
         /*
-         * When PERFCORE is enabled, AMD PMU uses a separate set of
-         * addresses for the selector and counter registers.
+         * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+         * set of addresses for the selector and counter registers.
          * Additionally, the address of the next selector or counter
          * register is determined by incrementing the address of the
          * current register by two.
          */
-        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+            pmu_version > 1) {
             sel_base = MSR_F15H_PERF_CTL0;
             ctr_base = MSR_F15H_PERF_CTR0;
             step = 2;
@@ -4715,6 +4736,12 @@ static int kvm_get_msrs(X86CPU *cpu)
             kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
             kvm_msr_entry_add(cpu, sel_base + i * step, 0);
         }
+
+        if (pmu_version > 1) {
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+        }
     }
 
     if (env->mcg_cap) {
@@ -5011,12 +5038,15 @@ static int kvm_get_msrs(X86CPU *cpu)
             env->msr_fixed_ctr_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
             env->msr_global_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
             env->msr_global_status = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
             env->msr_global_ovf_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (9 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

PMU MSRs are set by QEMU only at levels >= KVM_PUT_RESET_STATE,
excluding runtime. Therefore, updating these MSRs without stopping events
should be acceptable.
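
To make this gating concrete, here is a minimal standalone sketch; the
level names and values below are illustrative stand-ins, not QEMU's exact
definitions.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for QEMU's kvm_put_*() levels (values assumed). */
enum put_level {
    PUT_RUNTIME_STATE = 1,   /* periodic runtime sync            */
    PUT_RESET_STATE   = 2,   /* system/VM reset                  */
    PUT_FULL_STATE    = 3,   /* cold boot or incoming migration  */
};

/* PMU MSRs are only pushed for reset/full state, never at runtime. */
static bool pmu_msrs_written(enum put_level level)
{
    return level >= PUT_RESET_STATE;
}

int main(void)
{
    printf("runtime: %d, reset: %d, full: %d\n",
           pmu_msrs_written(PUT_RUNTIME_STATE),
           pmu_msrs_written(PUT_RESET_STATE),
           pmu_msrs_written(PUT_FULL_STATE));
    return 0;
}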

In addition, KVM creates kernel perf events with host mode excluded
(exclude_host = 1). Even while those events remain active, they don't
increment the counters while the vCPU thread runs in QEMU userspace.

Finally, kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM code
processes these MSRs one by one in a loop, only saving the config and
raising a KVM_REQ_PMU request; it does not immediately stop the event
before updating the PMC. This has been the behavior since Linux kernel
commit 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to
kvm_pmu_handle_event"), i.e. since v6.2.

No Fixes tag will be added for commit 0d89436786b0 ("kvm: migrate vPMU
state"), because this isn't a bugfix.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v3:
  - Re-order reasons in commit messages.
  - Mention KVM's commit 68fb4757e867 (v6.2).
  - Keep Zhao's review as there is no code change.

 target/i386/kvm/kvm.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 13ad7de690..6396f65558 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4172,13 +4172,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
         }
 
         if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
-            if (pmu_version > 1) {
-                /* Stop the counter.  */
-                kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
-                kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
-            }
-
-            /* Set the counter values.  */
             for (i = 0; i < num_pmu_fixed_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
                                   env->msr_fixed_counters[i]);
@@ -4194,8 +4187,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                                   env->msr_global_status);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
                                   env->msr_global_ovf_ctrl);
-
-                /* Now start the PMU.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
                                   env->msr_fixed_ctr_ctrl);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
@ 2025-04-25  8:28   ` Zhao Liu
  2025-04-25 15:45     ` Dongli Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Zhao Liu @ 2025-04-25  8:28 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

On Wed, Apr 16, 2025 at 02:52:26PM -0700, Dongli Zhang wrote:
> Date: Wed, 16 Apr 2025 14:52:26 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper
>  to get Host's vendor
> X-Mailer: git-send-email 2.43.5
> 
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> Extend host_cpu_vendor_fms() to help more cases to get Host's vendor
> information.
> 
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> This patch is already queued by Paolo.
> https://lore.kernel.org/all/20250410075619.145792-1-zhao1.liu@intel.com/
> I don't need to add my Signed-off-by.
> 
>  target/i386/host-cpu.c        | 10 ++++++----
>  target/i386/kvm/vmsr_energy.c |  3 +--
>  2 files changed, 7 insertions(+), 6 deletions(-)

Thanks. It has been merged as commit ae39acef49e2916 now.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter
  2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-04-25  8:56   ` Zhao Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zhao Liu @ 2025-04-25  8:56 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

On Wed, Apr 16, 2025 at 02:52:33PM -0700, Dongli Zhang wrote:
> Date: Wed, 16 Apr 2025 14:52:33 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter
> X-Mailer: git-send-email 2.43.5
> 
> When PMU is enabled in QEMU, there is a chance that PMU virtualization is
> completely disabled by the KVM module parameter kvm.enable_pmu=N.
> 
> The kvm.enable_pmu parameter was introduced in Linux v5.17.
> Its permission is 0444. It does not change until a reload of the KVM
> module.
> 
> Read the kvm.enable_pmu value from the module sysfs to give a chance to
> provide more information about vPMU enablement.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v2:
>   - Rework the code flow following Zhao's suggestion.
>   - Return error when:
>     (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
> Changed since v3:
>   - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
>     suggestion.
>   - Rework the commit messages.
>   - Bring back global static variable 'kvm_pmu_disabled' from v2.
> 
>  target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 17 deletions(-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-04-25  9:18   ` Zhao Liu
  2025-04-25 10:14   ` Sandipan Das
  1 sibling, 0 replies; 19+ messages in thread
From: Zhao Liu @ 2025-04-25  9:18 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

On Wed, Apr 16, 2025 at 02:52:34PM -0700, Dongli Zhang wrote:
> Date: Wed, 16 Apr 2025 14:52:34 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during
>  VM reset
> X-Mailer: git-send-email 2.43.5
> 
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
> 
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
> 
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
> 
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
> 
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
> 
> 4. After a reboot, the VM kernel may report the following error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
>   - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
>     AMD64_NUM_COUNTERS (suggested by Sandipan Das).
>   - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
>     (suggested by Sandipan Das).
>   - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
>   - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
>   - Remove 'static' from host_cpuid_vendorX.
>   - Change has_pmu_version to pmu_version.
>   - Use object_property_get_int() to get CPU family.
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Send error log when host and guest are from different vendors.
>   - Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
>     comments to remind developers.
>   - Add support to Zhaoxin. Change is_same_vendor() to
>     is_host_compat_vendor().
>   - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
>   - Use host_cpu_vendor_fms() from Zhao's patch.
>   - Checking AMD directly makes the "compat" rule clear.
>   - Add comment to MAX_GP_COUNTERS.
>   - Skip PMU info initialization if !kvm_pmu_disabled.
> 
>  target/i386/cpu.h     |  12 +++
>  target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 183 insertions(+), 4 deletions(-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured
  2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
@ 2025-04-25 10:11   ` Sandipan Das
  0 siblings, 0 replies; 19+ messages in thread
From: Sandipan Das @ 2025-04-25 10:11 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
	groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
	dapeng1.mi, joe.jin, peter.maydell, gaosong, chenhuacai, philmd,
	aurelien, jiaxun.yang, arikalo, npiggin, danielhb413, palmer,
	alistair.francis, liwei1518, zhiwei_liu, pasic, borntraeger,
	richard.henderson, david, iii, thuth, flavra, ewanhai-oc, ewanhai,
	cobechen, louisqi, liamni, frankzhu, silviazhao, kraxel, berrange

On 4/17/2025 3:22 AM, Dongli Zhang wrote:
> Currently, AMD PMU support isn't determined based on CPUID, that is, the
> "-pmu" option does not fully disable KVM AMD PMU virtualization.
> 
> To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.
> 
> Completely disabling AMD PMU virtualization will be implemented via
> KVM_CAP_PMU_CAPABILITY in upcoming patches.
> 
> As a reminder, neither CPUID_EXT3_PERFCORE nor
> CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
> is configured. Developers should query whether they are supported via
> cpu_x86_cpuid() rather than relying on env->features[] in future patches.
> 
> Suggested-by: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v2:
>   - No need to check "kvm_enabled() && IS_AMD_CPU(env)".
> 
>  target/i386/cpu.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset
  2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-04-25 10:12   ` Sandipan Das
  0 siblings, 0 replies; 19+ messages in thread
From: Sandipan Das @ 2025-04-25 10:12 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
	groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
	dapeng1.mi, joe.jin, peter.maydell, gaosong, chenhuacai, philmd,
	aurelien, jiaxun.yang, arikalo, npiggin, danielhb413, palmer,
	alistair.francis, liwei1518, zhiwei_liu, pasic, borntraeger,
	richard.henderson, david, iii, thuth, flavra, ewanhai-oc, ewanhai,
	cobechen, louisqi, liamni, frankzhu, silviazhao, kraxel, berrange

On 4/17/2025 3:22 AM, Dongli Zhang wrote:
> Since perfmon-v2, the AMD PMU supports additional registers. This update
> includes get/put functionality for these extra registers.
> 
> Similar to the implementation in KVM:
> 
> - MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
> use env->msr_global_status.
> - MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
> env->msr_global_ctrl.
> - MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
> both use env->msr_global_ovf_ctrl.
> 
> No changes are needed for vmstate_msr_architectural_pmu or
> pmu_enable_needed().
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v1:
>   - Use "has_pmu_version > 1", not "has_pmu_version == 2".
> Changed since v2:
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Change has_pmu_version to pmu_version.
>   - Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
> 
>  target/i386/cpu.h     |  4 ++++
>  target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
>  2 files changed, 43 insertions(+), 9 deletions(-)
> 

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
  2025-04-25  9:18   ` Zhao Liu
@ 2025-04-25 10:14   ` Sandipan Das
  1 sibling, 0 replies; 19+ messages in thread
From: Sandipan Das @ 2025-04-25 10:14 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
	groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
	dapeng1.mi, joe.jin, peter.maydell, gaosong, chenhuacai, philmd,
	aurelien, jiaxun.yang, arikalo, npiggin, danielhb413, palmer,
	alistair.francis, liwei1518, zhiwei_liu, pasic, borntraeger,
	richard.henderson, david, iii, thuth, flavra, ewanhai-oc, ewanhai,
	cobechen, louisqi, liamni, frankzhu, silviazhao, kraxel, berrange

On 4/17/2025 3:22 AM, Dongli Zhang wrote:
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
> 
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
> 
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
> 
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
> 
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
> 
> 4. After a reboot, the VM kernel may report the following error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
>   - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
>     AMD64_NUM_COUNTERS (suggested by Sandipan Das).
>   - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
>     (suggested by Sandipan Das).
>   - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
>   - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
>   - Remove 'static' from host_cpuid_vendorX.
>   - Change has_pmu_version to pmu_version.
>   - Use object_property_get_int() to get CPU family.
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Send error log when host and guest are from different vendors.
>   - Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
>     comments to remind developers.
>   - Add support to Zhaoxin. Change is_same_vendor() to
>     is_host_compat_vendor().
>   - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
>   - Use host_cpu_vendor_fms() from Zhao's patch.
>   - Checking AMD directly makes the "compat" rule clear.
>   - Add comment to MAX_GP_COUNTERS.
>   - Skip PMU info initialization if !kvm_pmu_disabled.
> 
>  target/i386/cpu.h     |  12 +++
>  target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 183 insertions(+), 4 deletions(-)
> 

Reviewed-by: Sandipan Das <sandipan.das@amd.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor
  2025-04-25  8:28   ` Zhao Liu
@ 2025-04-25 15:45     ` Dongli Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-25 15:45 UTC (permalink / raw)
  To: Zhao Liu
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Hi Zhao,

On 4/25/25 1:28 AM, Zhao Liu wrote:
> On Wed, Apr 16, 2025 at 02:52:26PM -0700, Dongli Zhang wrote:
>> Date: Wed, 16 Apr 2025 14:52:26 -0700
>> From: Dongli Zhang <dongli.zhang@oracle.com>
>> Subject: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper
>>  to get Host's vendor
>> X-Mailer: git-send-email 2.43.5
>>
>> From: Zhao Liu <zhao1.liu@intel.com>
>>
>> Extend host_cpu_vendor_fms() to help more cases to get Host's vendor
>> information.
>>
>> Cc: Dongli Zhang <dongli.zhang@oracle.com>
>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>> ---
>> This patch is already queued by Paolo.
>> https://lore.kernel.org/all/20250410075619.145792-1-zhao1.liu@intel.com/
>> I don't need to add my Signed-off-by.
>>
>>  target/i386/host-cpu.c        | 10 ++++++----
>>  target/i386/kvm/vmsr_energy.c |  3 +--
>>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Thanks. It has been merged as commit ae39acef49e2916 now.
> 

Since all patches are reviewed, I am going to re-send on top of the most recent
mainline QEMU with all Reviewed-by.

Thank you very much!

Dongli Zhang


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-04-25 15:47 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
2025-04-25  8:28   ` Zhao Liu
2025-04-25 15:45     ` Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
2025-04-25 10:11   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
2025-04-25  8:56   ` Zhao Liu
2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2025-04-25  9:18   ` Zhao Liu
2025-04-25 10:14   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-04-25 10:12   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).