All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
@ 2025-04-16 21:52 Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

This patchset addresses four bugs related to AMD PMU virtualization.

1. The PerfMonV2 is still available if PERCORE if disabled via
"-cpu host,-perfctr-core".

2. The VM 'cpuid' command still returns PERFCORE although "-pmu" is
configured.

3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

instead of:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.

The AMD PMU registers are not reset during QEMU system_reset.

(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

(4) After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process


Changed since v1:
  - Use feature_dependencies for CPUID_EXT3_PERFCORE and
    CPUID_8000_0022_EAX_PERFMON_V2.
  - Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
  - Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
  - Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
  - Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
  - Some changes to PMU register limit calculation.
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
    Zhao's suggestion.
  - Use object_property_get_int() to get CPU family.
  - Add support to Zhaoxin.
Changed since v3:
  - Re-base on top of Zhao's queued patch.
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Check AMD directly makes the "compat" rule clear.
  - Some changes on commit message and comment.
  - Bring back global static variable 'kvm_pmu_disabled' read from
    /sys/module/kvm/parameters/enable_pmu.


Zhao Liu (1):
  i386/cpu: Consolidate the helper to get Host's vendor [Don't merge]

Xiaoyao Li (1):
  kvm: Introduce kvm_arch_pre_create_vcpu()

Dongli Zhang (9):
  target/i386: disable PerfMonV2 when PERFCORE unavailable
  target/i386: disable PERFCORE when "-pmu" is configured
  target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  target/i386/kvm: rename architectural PMU variables
  target/i386/kvm: query kvm.enable_pmu parameter
  target/i386/kvm: reset AMD PMU registers during VM reset
  target/i386/kvm: support perfmon-v2 for reset
  target/i386/kvm: don't stop Intel PMU counters

 accel/kvm/kvm-all.c           |   5 +
 include/system/kvm.h          |   1 +
 target/arm/kvm.c              |   5 +
 target/i386/cpu.c             |   8 +
 target/i386/cpu.h             |  16 ++
 target/i386/host-cpu.c        |  10 +-
 target/i386/kvm/kvm.c         | 360 ++++++++++++++++++++++++++++++++-----
 target/i386/kvm/vmsr_energy.c |   3 +-
 target/loongarch/kvm/kvm.c    |   4 +
 target/mips/kvm.c             |   5 +
 target/ppc/kvm.c              |   5 +
 target/riscv/kvm/kvm-cpu.c    |   5 +
 target/s390x/kvm/kvm.c        |   5 +
 13 files changed, 379 insertions(+), 53 deletions(-)

base-commit: a9cd5bc6399a80fcf233ed0fffe6067b731227d8

Thank you very much!

Dongli Zhang


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25  8:28   ` Zhao Liu
  2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

From: Zhao Liu <zhao1.liu@intel.com>

Extend host_cpu_vendor_fms() to help more cases to get Host's vendor
information.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
This patch is already queued by Paolo.
https://lore.kernel.org/all/20250410075619.145792-1-zhao1.liu@intel.com/
I don't need to add my Signed-off-by.

 target/i386/host-cpu.c        | 10 ++++++----
 target/i386/kvm/vmsr_energy.c |  3 +--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 3e4e85e729..072731a4dd 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -109,9 +109,13 @@ void host_cpu_vendor_fms(char *vendor, int *family, int *model, int *stepping)
 {
     uint32_t eax, ebx, ecx, edx;
 
-    host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
+    host_cpuid(0x0, 0, NULL, &ebx, &ecx, &edx);
     x86_cpu_vendor_words2str(vendor, ebx, edx, ecx);
 
+    if (!family && !model && !stepping) {
+        return;
+    }
+
     host_cpuid(0x1, 0, &eax, &ebx, &ecx, &edx);
     if (family) {
         *family = ((eax >> 8) & 0x0F) + ((eax >> 20) & 0xFF);
@@ -129,11 +133,9 @@ void host_cpu_instance_init(X86CPU *cpu)
     X86CPUClass *xcc = X86_CPU_GET_CLASS(cpu);
 
     if (xcc->model) {
-        uint32_t ebx = 0, ecx = 0, edx = 0;
         char vendor[CPUID_VENDOR_SZ + 1];
 
-        host_cpuid(0, 0, NULL, &ebx, &ecx, &edx);
-        x86_cpu_vendor_words2str(vendor, ebx, edx, ecx);
+        host_cpu_vendor_fms(vendor, NULL, NULL, NULL);
         object_property_set_str(OBJECT(cpu), "vendor", vendor, &error_abort);
     }
 }
diff --git a/target/i386/kvm/vmsr_energy.c b/target/i386/kvm/vmsr_energy.c
index 31508d4e77..f499ec6e8b 100644
--- a/target/i386/kvm/vmsr_energy.c
+++ b/target/i386/kvm/vmsr_energy.c
@@ -29,10 +29,9 @@ char *vmsr_compute_default_paths(void)
 
 bool is_host_cpu_intel(void)
 {
-    int family, model, stepping;
     char vendor[CPUID_VENDOR_SZ + 1];
 
-    host_cpu_vendor_fms(vendor, &family, &model, &stepping);
+    host_cpu_vendor_fms(vendor, NULL, NULL, NULL);
 
     return g_str_equal(vendor, CPUID_VENDOR_INTEL);
 }
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

When the PERFCORE is disabled with "-cpu host,-perfctr-core", it is
reflected in in guest dmesg.

[    0.285136] Performance Events: AMD PMU driver.

However, the guest CPUID indicates the PerfMonV2 is still available.

CPU:
   Extended Performance Monitoring and Debugging (0x80000022):
      AMD performance monitoring V2         = true
      AMD LBR V2                            = false
      AMD LBR stack & PMC freezing          = false
      number of core perf ctrs              = 0x6 (6)
      number of LBR stack entries           = 0x0 (0)
      number of avail Northbridge perf ctrs = 0x0 (0)
      number of available UMC PMCs          = 0x0 (0)
      active UMCs bitmask                   = 0x0

Disable PerfMonV2 in CPUID when PERFCORE is disabled.

Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Fixes: 209b0ac12074 ("target/i386: Add PerfMonV2 feature bit")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
  - Use feature_dependencies (suggested by Zhao Liu).
Changed since v2:
  - Nothing. Zhao and Xiaoyao may move it to x86_cpu_expand_features()
    later.

 target/i386/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1b64ceaaba..2b87331be5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1808,6 +1808,10 @@ static FeatureDep feature_dependencies[] = {
         .from = { FEAT_7_1_EDX,             CPUID_7_1_EDX_AVX10 },
         .to = { FEAT_24_0_EBX,              ~0ull },
     },
+    {
+        .from = { FEAT_8000_0001_ECX,       CPUID_EXT3_PERFCORE },
+        .to = { FEAT_8000_0022_EAX,         CPUID_8000_0022_EAX_PERFMON_V2 },
+    },
 };
 
 typedef struct X86RegisterInfo32 {
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25 10:11   ` Sandipan Das
  2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Currently, AMD PMU support isn't determined based on CPUID, that is, the
"-pmu" option does not fully disable KVM AMD PMU virtualization.

To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.

To completely disable AMD PMU virtualization will be implemented via
KVM_CAP_PMU_CAPABILITY in upcoming patches.

As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
is configured. Developers should query whether they are supported via
cpu_x86_cpuid() rather than relying on env->features[] in future patches.

Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
  - No need to check "kvm_enabled() && IS_AMD_CPU(env)".

 target/i386/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2b87331be5..acbd627f7e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7242,6 +7242,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             !(env->hflags & HF_LMA_MASK)) {
             *edx &= ~CPUID_EXT2_SYSCALL;
         }
+
+        if (!cpu->enable_pmu) {
+            *ecx &= ~CPUID_EXT3_PERFCORE;
+        }
         break;
     case 0x80000002:
     case 0x80000003:
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu()
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (2 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.

The specific implementation of i386 will be added in the future patch.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
  - Add my Signed-off-by.
Changed since v3:
  - Pick new reviewed version from:
https://lore.kernel.org/all/20250401130205.2198253-8-xiaoyao.li@intel.com/
    I have fixed the typo as suggested by Daniel.
  - Keep Zhao's review.

 accel/kvm/kvm-all.c        | 5 +++++
 include/system/kvm.h       | 1 +
 target/arm/kvm.c           | 5 +++++
 target/i386/kvm/kvm.c      | 5 +++++
 target/loongarch/kvm/kvm.c | 4 ++++
 target/mips/kvm.c          | 5 +++++
 target/ppc/kvm.c           | 5 +++++
 target/riscv/kvm/kvm-cpu.c | 5 +++++
 target/s390x/kvm/kvm.c     | 5 +++++
 9 files changed, 40 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f89568bfa3..df9840e53a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -540,6 +540,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    ret = kvm_arch_pre_create_vcpu(cpu, errp);
+    if (ret < 0) {
+        goto err;
+    }
+
     ret = kvm_create_vcpu(cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
diff --git a/include/system/kvm.h b/include/system/kvm.h
index ab17c09a55..d7dfa25493 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb23..93f1a7245b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1874,6 +1874,11 @@ static int kvm_arm_sve_set_vls(ARMCPU *cpu)
 
 #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     int ret;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c749d4ee8..f41e190fb8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2051,6 +2051,11 @@ full:
     abort();
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index f0e3cfef03..64c9672976 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -1071,7 +1071,11 @@ static int kvm_cpu_check_pv_features(CPUState *cs, Error **errp)
             env->pv_features |= BIT(KVM_FEATURE_VIRT_EXTIOI);
         }
     }
+    return 0;
+}
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
     return 0;
 }
 
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index d67b7c1a8e..ec53acb51a 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -61,6 +61,11 @@ int kvm_arch_irqchip_create(KVMState *s)
     return 0;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     CPUMIPSState *env = cpu_env(cs);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 992356cb75..20fabccecd 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -479,6 +479,11 @@ static void kvmppc_hw_debug_points_init(CPUPPCState *cenv)
     }
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 0f4997a918..6f15f727de 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1383,6 +1383,11 @@ static int kvm_vcpu_enable_sbi_dbcn(RISCVCPU *cpu, CPUState *cs)
     return kvm_set_one_reg(cs, kvm_sbi_dbcn.kvm_reg_id, &reg);
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     int ret = 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 4d56e653dd..1f592733f4 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -404,6 +404,11 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
     return cpu->cpu_index;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+    return 0;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (3 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.

As a result, the following message still appears in the VM dmesg:

[    0.263615] Performance Events: AMD PMU driver.

However, the expected output should be:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This occurs because AMD does not use any CPUID bit to indicate PMU
availability.

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
  - Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
  - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
    Intel platform because current "pmu" property works as expected."
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
    statement.
  - Add Reviewed-by from Xiaoyao and Zhao as the change is minor.

 target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f41e190fb8..579c0f7e0b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -176,6 +176,8 @@ static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
+static int pmu_cap;
+
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
@@ -2053,6 +2055,33 @@ full:
 
 int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 {
+    static bool first = true;
+    int ret;
+
+    if (first) {
+        first = false;
+
+        /*
+         * Since Linux v5.18, KVM provides a VM-level capability to easily
+         * disable PMUs; however, QEMU has been providing PMU property per
+         * CPU since v1.6. In order to accommodate both, have to configure
+         * the VM-level capability here.
+         *
+         * KVM_PMU_CAP_DISABLE doesn't change the PMU
+         * behavior on Intel platform because current "pmu" property works
+         * as expected.
+         */
+        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                    KVM_PMU_CAP_DISABLE);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret,
+                                 "Failed to set KVM_PMU_CAP_DISABLE");
+                return ret;
+            }
+        }
+    }
+
     return 0;
 }
 
@@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
     return 0;
 }
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (4 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.

Extract them out of kvm_x86_build_cpuid().

In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
CPUID has already been filled at this stage.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
  - Still extract the code, but call them for all CPUs.
Changed since v2:
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Didn't add Reviewed-by from Dapeng as the change isn't minor.

 target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 579c0f7e0b..4d86c08c6c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1959,33 +1959,6 @@ static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
         }
     }
 
-    if (limit >= 0x0a) {
-        uint32_t eax, edx;
-
-        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
-        has_architectural_pmu_version = eax & 0xff;
-        if (has_architectural_pmu_version > 0) {
-            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
-            /* Shouldn't be more than 32, since that's the number of bits
-             * available in EBX to tell us _which_ counters are available.
-             * Play it safe.
-             */
-            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
-            }
-
-            if (has_architectural_pmu_version > 1) {
-                num_architectural_pmu_fixed_counters = edx & 0x1f;
-
-                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
-                }
-            }
-        }
-    }
-
     cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
 
     for (i = 0x80000000; i <= limit; i++) {
@@ -2085,6 +2058,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+    struct kvm_cpuid_entry2 *c;
+
+    c = cpuid_find_entry(cpuid, 0xa, 0);
+
+    if (!c) {
+        return;
+    }
+
+    has_architectural_pmu_version = c->eax & 0xff;
+    if (has_architectural_pmu_version > 0) {
+        num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+        /*
+         * Shouldn't be more than 32, since that's the number of bits
+         * available in EBX to tell us _which_ counters are available.
+         * Play it safe.
+         */
+        if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+        }
+
+        if (has_architectural_pmu_version > 1) {
+            num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+            if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+            }
+        }
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2267,6 +2273,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
+    kvm_init_pmu_info(&cpuid_data.cpuid);
+
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
            (CPUID_MCE | CPUID_MCA)) {
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (5 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

AMD does not have what is commonly referred to as an architectural PMU.
Therefore, we need to rename the following variables to be applicable for
both Intel and AMD:

- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters

For Intel processors, the meaning of pmu_version remains unchanged.

For AMD processors:

pmu_version == 1 corresponds to versions before AMD PerfMonV2.
pmu_version == 2 corresponds to AMD PerfMonV2.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
  - Change has_pmu_version to pmu_version.
  - Add Reviewed-by since the change is minor.
  - As a reminder, there are some contextual change due to PATCH 05,
    i.e., c->edx vs. edx.

 target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4d86c08c6c..6b49549f1b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -164,9 +164,16 @@ static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
 static bool has_msr_hwcr;
 
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
 
 static int has_xsave2;
 static int has_xcrs;
@@ -2068,24 +2075,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
         return;
     }
 
-    has_architectural_pmu_version = c->eax & 0xff;
-    if (has_architectural_pmu_version > 0) {
-        num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+    pmu_version = c->eax & 0xff;
+    if (pmu_version > 0) {
+        num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
 
         /*
          * Shouldn't be more than 32, since that's the number of bits
          * available in EBX to tell us _which_ counters are available.
          * Play it safe.
          */
-        if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-            num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+        if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_pmu_gp_counters = MAX_GP_COUNTERS;
         }
 
-        if (has_architectural_pmu_version > 1) {
-            num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+        if (pmu_version > 1) {
+            num_pmu_fixed_counters = c->edx & 0x1f;
 
-            if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+            if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
             }
         }
     }
@@ -4037,25 +4044,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
         }
 
-        if (has_architectural_pmu_version > 0) {
-            if (has_architectural_pmu_version > 1) {
+        if (pmu_version > 0) {
+            if (pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             }
 
             /* Set the counter values.  */
-            for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+            for (i = 0; i < num_pmu_fixed_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
                                   env->msr_fixed_counters[i]);
             }
-            for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+            for (i = 0; i < num_pmu_gp_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
                                   env->msr_gp_counters[i]);
                 kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
                                   env->msr_gp_evtsel[i]);
             }
-            if (has_architectural_pmu_version > 1) {
+            if (pmu_version > 1) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
                                   env->msr_global_status);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4515,17 +4522,17 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (has_architectural_pmu_version > 0) {
-        if (has_architectural_pmu_version > 1) {
+    if (pmu_version > 0) {
+        if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
         }
-        for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+        for (i = 0; i < num_pmu_fixed_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
         }
-        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+        for (i = 0; i < num_pmu_gp_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
             kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
         }
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (6 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25  8:56   ` Zhao Liu
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

When PMU is enabled in QEMU, there is a chance that PMU virtualization is
completely disabled by the KVM module parameter kvm.enable_pmu=N.

The kvm.enable_pmu parameter is introduced since Linux v5.17.
Its permission is 0444. It does not change until a reload of the KVM
module.

Read the kvm.enable_pmu value from the module sysfs to give a chance to
provide more information about vPMU enablement.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v2:
  - Rework the code flow following Zhao's suggestion.
  - Return error when:
    (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
Changed since v3:
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Rework the commit messages.
  - Bring back global static variable 'kvm_pmu_disabled' from v2.

 target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6b49549f1b..38cc1a5f43 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -184,6 +184,10 @@ static int has_triple_fault_event;
 static bool has_msr_mcg_ext_ctl;
 
 static int pmu_cap;
+/*
+ * Read from /sys/module/kvm/parameters/enable_pmu.
+ */
+static bool kvm_pmu_disabled;
 
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
@@ -2041,23 +2045,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     if (first) {
         first = false;
 
-        /*
-         * Since Linux v5.18, KVM provides a VM-level capability to easily
-         * disable PMUs; however, QEMU has been providing PMU property per
-         * CPU since v1.6. In order to accommodate both, have to configure
-         * the VM-level capability here.
-         *
-         * KVM_PMU_CAP_DISABLE doesn't change the PMU
-         * behavior on Intel platform because current "pmu" property works
-         * as expected.
-         */
-        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
-            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
-                                    KVM_PMU_CAP_DISABLE);
-            if (ret < 0) {
-                error_setg_errno(errp, -ret,
-                                 "Failed to set KVM_PMU_CAP_DISABLE");
-                return ret;
+        if (X86_CPU(cpu)->enable_pmu) {
+            if (kvm_pmu_disabled) {
+                warn_report("Failed to enable PMU since "
+                            "KVM's enable_pmu parameter is disabled");
+            }
+        } else {
+            /*
+             * Since Linux v5.18, KVM provides a VM-level capability to easily
+             * disable PMUs; however, QEMU has been providing PMU property per
+             * CPU since v1.6. In order to accommodate both, have to configure
+             * the VM-level capability here.
+             *
+             * KVM_PMU_CAP_DISABLE doesn't change the PMU
+             * behavior on Intel platform because current "pmu" property works
+             * as expected.
+             */
+            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
+                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                        KVM_PMU_CAP_DISABLE);
+                if (ret < 0) {
+                    error_setg_errno(errp, -ret,
+                                     "Failed to set KVM_PMU_CAP_DISABLE");
+                    return ret;
+                }
             }
         }
     }
@@ -3252,6 +3263,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     int ret;
     struct utsname utsname;
     Error *local_err = NULL;
+    g_autofree char *kvm_enable_pmu;
 
     /*
      * Initialize SEV context, if required
@@ -3397,6 +3409,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
 
+    /*
+     * The enable_pmu parameter is introduced since Linux v5.17,
+     * give a chance to provide more information about vPMU
+     * enablement.
+     *
+     * The kvm.enable_pmu's permission is 0444. It does not change
+     * until a reload of the KVM module.
+     */
+    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+                            &kvm_enable_pmu, NULL, NULL)) {
+        if (*kvm_enable_pmu == 'N') {
+            kvm_pmu_disabled = true;
+        }
+    }
+
     return 0;
 }
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (7 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25  9:18   ` Zhao Liu
  2025-04-25 10:14   ` Sandipan Das
  2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
  2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
  10 siblings, 2 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.

To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:

1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

4. After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
  - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
    AMD64_NUM_COUNTERS (suggested by Sandipan Das).
  - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
    (suggested by Sandipan Das).
  - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
  - Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
  - Remove 'static' from host_cpuid_vendorX.
  - Change has_pmu_version to pmu_version.
  - Use object_property_get_int() to get CPU family.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Send error log when host and guest are from different vendors.
  - Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
    reminder developers.
  - Add support to Zhaoxin. Change is_same_vendor() to
    is_host_compat_vendor().
  - Didn't add Reviewed-by from Sandipan because the change isn't minor.
Changed since v3:
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Check AMD directly makes the "compat" rule clear.
  - Add comment to MAX_GP_COUNTERS.
  - Skip PMU info initialization if !kvm_pmu_disabled.

 target/i386/cpu.h     |  12 +++
 target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 183 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 76f24446a5..5d5266f89e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,14 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0                 0xc0010000
+#define MSR_K7_PERFCTR0                 0xc0010004
+#define MSR_F15H_PERF_CTL0              0xc0010200
+#define MSR_F15H_PERF_CTR0              0xc0010201
+
+#define AMD64_NUM_COUNTERS              4
+#define AMD64_NUM_COUNTERS_CORE         6
+
 #define MSR_MC0_CTL                     0x400
 #define MSR_MC0_STATUS                  0x401
 #define MSR_MC0_ADDR                    0x402
@@ -1608,6 +1616,10 @@ typedef struct {
 #endif
 
 #define MAX_FIXED_COUNTERS 3
+/*
+ * This formula is based on Intel's MSR. The current size also meets AMD's
+ * needs.
+ */
 #define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
 
 #define TARGET_INSN_START_EXTRA_WORDS 1
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 38cc1a5f43..b8926bd4cb 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2076,7 +2076,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
 {
     struct kvm_cpuid_entry2 *c;
 
@@ -2109,6 +2109,96 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
     }
 }
 
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    struct kvm_cpuid_entry2 *c;
+    int64_t family;
+
+    family = object_property_get_int(OBJECT(cpu), "family", NULL);
+    if (family < 0) {
+        return;
+    }
+
+    if (family < 6) {
+        error_report("AMD performance-monitoring is supported from "
+                     "K7 and later");
+        return;
+    }
+
+    pmu_version = 1;
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+    c = cpuid_find_entry(cpuid, 0x80000001, 0);
+    if (!c) {
+        return;
+    }
+
+    if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+        return;
+    }
+
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+    char host_vendor[CPUID_VENDOR_SZ + 1];
+
+    host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
+
+    /*
+     * Intel and Zhaoxin are compatible.
+     */
+    if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+        (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+        return true;
+    }
+
+    return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
+           IS_AMD_CPU(env);
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * The PMU virtualization is disabled by kvm.enable_pmu=N.
+     */
+    if (kvm_pmu_disabled) {
+        return;
+    }
+
+    /*
+     * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+     * disable the AMD PMU virtualization.
+     *
+     * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
+     * registers are not going to reset, even they are still available to
+     * guest VM.
+     */
+    if (!cpu->enable_pmu) {
+        return;
+    }
+
+    /*
+     * It is not supported to virtualize AMD PMU registers on Intel
+     * processors, nor to virtualize Intel PMU registers on AMD processors.
+     */
+    if (!is_host_compat_vendor(env)) {
+        error_report("host doesn't support requested feature: vPMU");
+        return;
+    }
+
+    if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+        kvm_init_pmu_info_intel(cpuid);
+    } else if (IS_AMD_CPU(env)) {
+        kvm_init_pmu_info_amd(cpuid, cpu);
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2291,7 +2381,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
-    kvm_init_pmu_info(&cpuid_data.cpuid);
+    kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4071,7 +4161,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
         }
 
-        if (pmu_version > 0) {
+        if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
             if (pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4102,6 +4192,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                                   env->msr_global_ctrl);
             }
         }
+
+        if (IS_AMD_CPU(env) && pmu_version > 0) {
+            uint32_t sel_base = MSR_K7_EVNTSEL0;
+            uint32_t ctr_base = MSR_K7_PERFCTR0;
+            /*
+             * The address of the next selector or counter register is
+             * obtained by incrementing the address of the current selector
+             * or counter register by one.
+             */
+            uint32_t step = 1;
+
+            /*
+             * When PERFCORE is enabled, AMD PMU uses a separate set of
+             * addresses for the selector and counter registers.
+             * Additionally, the address of the next selector or counter
+             * register is determined by incrementing the address of the
+             * current register by two.
+             */
+            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+                sel_base = MSR_F15H_PERF_CTL0;
+                ctr_base = MSR_F15H_PERF_CTR0;
+                step = 2;
+            }
+
+            for (i = 0; i < num_pmu_gp_counters; i++) {
+                kvm_msr_entry_add(cpu, ctr_base + i * step,
+                                  env->msr_gp_counters[i]);
+                kvm_msr_entry_add(cpu, sel_base + i * step,
+                                  env->msr_gp_evtsel[i]);
+            }
+        }
+
         /*
          * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
          * only sync them to KVM on the first cpu
@@ -4549,7 +4671,8 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (pmu_version > 0) {
+
+    if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
         if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4565,6 +4688,35 @@ static int kvm_get_msrs(X86CPU *cpu)
         }
     }
 
+    if (IS_AMD_CPU(env) && pmu_version > 0) {
+        uint32_t sel_base = MSR_K7_EVNTSEL0;
+        uint32_t ctr_base = MSR_K7_PERFCTR0;
+        /*
+         * The address of the next selector or counter register is
+         * obtained by incrementing the address of the current selector
+         * or counter register by one.
+         */
+        uint32_t step = 1;
+
+        /*
+         * When PERFCORE is enabled, AMD PMU uses a separate set of
+         * addresses for the selector and counter registers.
+         * Additionally, the address of the next selector or counter
+         * register is determined by incrementing the address of the
+         * current register by two.
+         */
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            sel_base = MSR_F15H_PERF_CTL0;
+            ctr_base = MSR_F15H_PERF_CTR0;
+            step = 2;
+        }
+
+        for (i = 0; i < num_pmu_gp_counters; i++) {
+            kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+            kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+        }
+    }
+
     if (env->mcg_cap) {
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4876,6 +5028,21 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
+        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+            break;
+        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+            break;
+        case MSR_F15H_PERF_CTL0 ...
+             MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+            index = index - MSR_F15H_PERF_CTL0;
+            if (index & 0x1) {
+                env->msr_gp_counters[index] = msrs[i].data;
+            } else {
+                env->msr_gp_evtsel[index] = msrs[i].data;
+            }
+            break;
         case HV_X64_MSR_HYPERCALL:
             env->msr_hv_hypercall = msrs[i].data;
             break;
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (8 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  2025-04-25 10:12   ` Sandipan Das
  2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
  10 siblings, 1 reply; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Since perfmon-v2, the AMD PMU supports additional registers. This update
includes get/put functionality for these extra registers.

Similar to the implementation in KVM:

- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.

No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
  - Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Change has_pmu_version to pmu_version.
  - Cap num_pmu_gp_counters with MAX_GP_COUNTERS.

 target/i386/cpu.h     |  4 ++++
 target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5d5266f89e..210a80950e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,10 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS       0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL          0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR   0xc0000302
+
 #define MSR_K7_EVNTSEL0                 0xc0010000
 #define MSR_K7_PERFCTR0                 0xc0010004
 #define MSR_F15H_PERF_CTL0              0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b8926bd4cb..13ad7de690 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2138,6 +2138,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
     }
 
     num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+    c = cpuid_find_entry(cpuid, 0x80000022, 0);
+    if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+        pmu_version = 2;
+        num_pmu_gp_counters = c->ebx & 0xf;
+
+        if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_pmu_gp_counters = MAX_GP_COUNTERS;
+        }
+    }
 }
 
 static bool is_host_compat_vendor(CPUX86State *env)
@@ -4204,13 +4214,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             uint32_t step = 1;
 
             /*
-             * When PERFCORE is enabled, AMD PMU uses a separate set of
-             * addresses for the selector and counter registers.
-             * Additionally, the address of the next selector or counter
-             * register is determined by incrementing the address of the
-             * current register by two.
+             * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+             * separate set of addresses for the selector and counter
+             * registers. Additionally, the address of the next selector or
+             * counter register is determined by incrementing the address
+             * of the current register by two.
              */
-            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+                pmu_version > 1) {
                 sel_base = MSR_F15H_PERF_CTL0;
                 ctr_base = MSR_F15H_PERF_CTR0;
                 step = 2;
@@ -4222,6 +4233,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                 kvm_msr_entry_add(cpu, sel_base + i * step,
                                   env->msr_gp_evtsel[i]);
             }
+
+            if (pmu_version > 1) {
+                kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+                                  env->msr_global_status);
+                kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+                                  env->msr_global_ovf_ctrl);
+                kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+                                  env->msr_global_ctrl);
+            }
         }
 
         /*
@@ -4699,13 +4719,14 @@ static int kvm_get_msrs(X86CPU *cpu)
         uint32_t step = 1;
 
         /*
-         * When PERFCORE is enabled, AMD PMU uses a separate set of
-         * addresses for the selector and counter registers.
+         * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+         * set of addresses for the selector and counter registers.
          * Additionally, the address of the next selector or counter
          * register is determined by incrementing the address of the
          * current register by two.
          */
-        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+            pmu_version > 1) {
             sel_base = MSR_F15H_PERF_CTL0;
             ctr_base = MSR_F15H_PERF_CTR0;
             step = 2;
@@ -4715,6 +4736,12 @@ static int kvm_get_msrs(X86CPU *cpu)
             kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
             kvm_msr_entry_add(cpu, sel_base + i * step, 0);
         }
+
+        if (pmu_version > 1) {
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+        }
     }
 
     if (env->mcg_cap) {
@@ -5011,12 +5038,15 @@ static int kvm_get_msrs(X86CPU *cpu)
             env->msr_fixed_ctr_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
             env->msr_global_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
             env->msr_global_status = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
             env->msr_global_ovf_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters
  2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
                   ` (9 preceding siblings ...)
  2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-04-16 21:52 ` Dongli Zhang
  10 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-16 21:52 UTC (permalink / raw)
  To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

PMU MSRs are set by QEMU only at levels >= KVM_PUT_RESET_STATE,
excluding runtime. Therefore, updating these MSRs without stopping events
should be acceptable.

In addition, KVM creates kernel perf events with host mode excluded
(exclude_host = 1). While the events remain active, they don't increment
the counter during QEMU vCPU userspace mode.

Finally, The kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM
processes these MSRs one by one in a loop, only saving the config and
triggering the KVM_REQ_PMU request. This approach does not immediately stop
the event before updating PMC. This approach is true since Linux kernel
commit 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to
kvm_pmu_handle_event"), that is, v6.2.

No Fixed tag is going to be added for the commit 0d89436786b0 ("kvm:
migrate vPMU state"), because this isn't a bugfix.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v3:
  - Re-order reasons in commit messages.
  - Mention KVM's commit 68fb4757e867 (v6.2).
  - Keep Zhao's review as there isn't code change.

 target/i386/kvm/kvm.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 13ad7de690..6396f65558 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4172,13 +4172,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
         }
 
         if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
-            if (pmu_version > 1) {
-                /* Stop the counter.  */
-                kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
-                kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
-            }
-
-            /* Set the counter values.  */
             for (i = 0; i < num_pmu_fixed_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
                                   env->msr_fixed_counters[i]);
@@ -4194,8 +4187,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                                   env->msr_global_status);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
                                   env->msr_global_ovf_ctrl);
-
-                /* Now start the PMU.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
                                   env->msr_fixed_ctr_ctrl);
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor
  2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
@ 2025-04-25  8:28   ` Zhao Liu
  2025-04-25 15:45     ` Dongli Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Zhao Liu @ 2025-04-25  8:28 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

On Wed, Apr 16, 2025 at 02:52:26PM -0700, Dongli Zhang wrote:
> Date: Wed, 16 Apr 2025 14:52:26 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper
>  to get Host's vendor
> X-Mailer: git-send-email 2.43.5
> 
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> Extend host_cpu_vendor_fms() to help more cases to get Host's vendor
> information.
> 
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> This patch is already queued by Paolo.
> https://lore.kernel.org/all/20250410075619.145792-1-zhao1.liu@intel.com/
> I don't need to add my Signed-off-by.
> 
>  target/i386/host-cpu.c        | 10 ++++++----
>  target/i386/kvm/vmsr_energy.c |  3 +--
>  2 files changed, 7 insertions(+), 6 deletions(-)

Thanks. It has been merged as commit ae39acef49e2916 now.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter
  2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-04-25  8:56   ` Zhao Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zhao Liu @ 2025-04-25  8:56 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

On Wed, Apr 16, 2025 at 02:52:33PM -0700, Dongli Zhang wrote:
> Date: Wed, 16 Apr 2025 14:52:33 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter
> X-Mailer: git-send-email 2.43.5
> 
> When PMU is enabled in QEMU, there is a chance that PMU virtualization is
> completely disabled by the KVM module parameter kvm.enable_pmu=N.
> 
> The kvm.enable_pmu parameter is introduced since Linux v5.17.
> Its permission is 0444. It does not change until a reload of the KVM
> module.
> 
> Read the kvm.enable_pmu value from the module sysfs to give a chance to
> provide more information about vPMU enablement.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v2:
>   - Rework the code flow following Zhao's suggestion.
>   - Return error when:
>     (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
> Changed since v3:
>   - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
>     suggestion.
>   - Rework the commit messages.
>   - Bring back global static variable 'kvm_pmu_disabled' from v2.
> 
>  target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 17 deletions(-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-04-25  9:18   ` Zhao Liu
  2025-04-25 10:14   ` Sandipan Das
  1 sibling, 0 replies; 19+ messages in thread
From: Zhao Liu @ 2025-04-25  9:18 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

On Wed, Apr 16, 2025 at 02:52:34PM -0700, Dongli Zhang wrote:
> Date: Wed, 16 Apr 2025 14:52:34 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during
>  VM reset
> X-Mailer: git-send-email 2.43.5
> 
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
> 
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
> 
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
> 
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
> 
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
> 
> 4. After a reboot, the VM kernel may report the following error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
>   - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
>     AMD64_NUM_COUNTERS (suggested by Sandipan Das).
>   - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
>     (suggested by Sandipan Das).
>   - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
>   - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
>   - Remove 'static' from host_cpuid_vendorX.
>   - Change has_pmu_version to pmu_version.
>   - Use object_property_get_int() to get CPU family.
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Send error log when host and guest are from different vendors.
>   - Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
>     reminder developers.
>   - Add support to Zhaoxin. Change is_same_vendor() to
>     is_host_compat_vendor().
>   - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
>   - Use host_cpu_vendor_fms() from Zhao's patch.
>   - Check AMD directly makes the "compat" rule clear.
>   - Add comment to MAX_GP_COUNTERS.
>   - Skip PMU info initialization if !kvm_pmu_disabled.
> 
>  target/i386/cpu.h     |  12 +++
>  target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 183 insertions(+), 4 deletions(-)

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured
  2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
@ 2025-04-25 10:11   ` Sandipan Das
  0 siblings, 0 replies; 19+ messages in thread
From: Sandipan Das @ 2025-04-25 10:11 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
	groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
	dapeng1.mi, joe.jin, peter.maydell, gaosong, chenhuacai, philmd,
	aurelien, jiaxun.yang, arikalo, npiggin, danielhb413, palmer,
	alistair.francis, liwei1518, zhiwei_liu, pasic, borntraeger,
	richard.henderson, david, iii, thuth, flavra, ewanhai-oc, ewanhai,
	cobechen, louisqi, liamni, frankzhu, silviazhao, kraxel, berrange

On 4/17/2025 3:22 AM, Dongli Zhang wrote:
> Currently, AMD PMU support isn't determined based on CPUID, that is, the
> "-pmu" option does not fully disable KVM AMD PMU virtualization.
> 
> To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.
> 
> To completely disable AMD PMU virtualization will be implemented via
> KVM_CAP_PMU_CAPABILITY in upcoming patches.
> 
> As a reminder, neither CPUID_EXT3_PERFCORE nor
> CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
> is configured. Developers should query whether they are supported via
> cpu_x86_cpuid() rather than relying on env->features[] in future patches.
> 
> Suggested-by: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v2:
>   - No need to check "kvm_enabled() && IS_AMD_CPU(env)".
> 
>  target/i386/cpu.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 

Reviewed-by: Sandipan Das <sandipan.das@amd.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset
  2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-04-25 10:12   ` Sandipan Das
  0 siblings, 0 replies; 19+ messages in thread
From: Sandipan Das @ 2025-04-25 10:12 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
	groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
	dapeng1.mi, joe.jin, peter.maydell, gaosong, chenhuacai, philmd,
	aurelien, jiaxun.yang, arikalo, npiggin, danielhb413, palmer,
	alistair.francis, liwei1518, zhiwei_liu, pasic, borntraeger,
	richard.henderson, david, iii, thuth, flavra, ewanhai-oc, ewanhai,
	cobechen, louisqi, liamni, frankzhu, silviazhao, kraxel, berrange

On 4/17/2025 3:22 AM, Dongli Zhang wrote:
> Since perfmon-v2, the AMD PMU supports additional registers. This update
> includes get/put functionality for these extra registers.
> 
> Similar to the implementation in KVM:
> 
> - MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
> use env->msr_global_status.
> - MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
> env->msr_global_ctrl.
> - MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
> both use env->msr_global_ovf_ctrl.
> 
> No changes are needed for vmstate_msr_architectural_pmu or
> pmu_enable_needed().
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v1:
>   - Use "has_pmu_version > 1", not "has_pmu_version == 2".
> Changed since v2:
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Change has_pmu_version to pmu_version.
>   - Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
> 
>  target/i386/cpu.h     |  4 ++++
>  target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
>  2 files changed, 43 insertions(+), 9 deletions(-)
> 

Reviewed-by: Sandipan Das <sandipan.das@amd.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset
  2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
  2025-04-25  9:18   ` Zhao Liu
@ 2025-04-25 10:14   ` Sandipan Das
  1 sibling, 0 replies; 19+ messages in thread
From: Sandipan Das @ 2025-04-25 10:14 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
	groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
	dapeng1.mi, joe.jin, peter.maydell, gaosong, chenhuacai, philmd,
	aurelien, jiaxun.yang, arikalo, npiggin, danielhb413, palmer,
	alistair.francis, liwei1518, zhiwei_liu, pasic, borntraeger,
	richard.henderson, david, iii, thuth, flavra, ewanhai-oc, ewanhai,
	cobechen, louisqi, liamni, frankzhu, silviazhao, kraxel, berrange

On 4/17/2025 3:22 AM, Dongli Zhang wrote:
> QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
> and kvm_put_msrs() to restore them to KVM. However, there is no support for
> AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
> initialized based on cpuid(0xa), which does not apply to AMD processors.
> For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
> is determined based on the CPU version.
> 
> To address this issue, we need to add support for AMD PMU registers.
> Without this support, the following problems can arise:
> 
> 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
> 
> 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
> 
> 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
> 
> 4. After a reboot, the VM kernel may report the following error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> 5. In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
>   - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
>     AMD64_NUM_COUNTERS (suggested by Sandipan Das).
>   - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
>     (suggested by Sandipan Das).
>   - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
>   - Don't initialize PMU info if kvm.enable_pmu=N.
> Changed since v2:
>   - Remove 'static' from host_cpuid_vendorX.
>   - Change has_pmu_version to pmu_version.
>   - Use object_property_get_int() to get CPU family.
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Send error log when host and guest are from different vendors.
>   - Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
>     reminder developers.
>   - Add support to Zhaoxin. Change is_same_vendor() to
>     is_host_compat_vendor().
>   - Didn't add Reviewed-by from Sandipan because the change isn't minor.
> Changed since v3:
>   - Use host_cpu_vendor_fms() from Zhao's patch.
>   - Check AMD directly makes the "compat" rule clear.
>   - Add comment to MAX_GP_COUNTERS.
>   - Skip PMU info initialization if !kvm_pmu_disabled.
> 
>  target/i386/cpu.h     |  12 +++
>  target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 183 insertions(+), 4 deletions(-)
> 

Reviewed-by: Sandipan Das <sandipan.das@amd.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor
  2025-04-25  8:28   ` Zhao Liu
@ 2025-04-25 15:45     ` Dongli Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Dongli Zhang @ 2025-04-25 15:45 UTC (permalink / raw)
  To: Zhao Liu
  Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
	like.xu.linux, groug, khorenko, alexander.ivanov, den,
	davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
	gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
	npiggin, danielhb413, palmer, alistair.francis, liwei1518,
	zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
	thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
	frankzhu, silviazhao, kraxel, berrange

Hi Zhao,

On 4/25/25 1:28 AM, Zhao Liu wrote:
> On Wed, Apr 16, 2025 at 02:52:26PM -0700, Dongli Zhang wrote:
>> Date: Wed, 16 Apr 2025 14:52:26 -0700
>> From: Dongli Zhang <dongli.zhang@oracle.com>
>> Subject: [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper
>>  to get Host's vendor
>> X-Mailer: git-send-email 2.43.5
>>
>> From: Zhao Liu <zhao1.liu@intel.com>
>>
>> Extend host_cpu_vendor_fms() to help more cases to get Host's vendor
>> information.
>>
>> Cc: Dongli Zhang <dongli.zhang@oracle.com>
>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>> ---
>> This patch is already queued by Paolo.
>> https://urldefense.com/v3/__https://lore.kernel.org/all/20250410075619.145792-1-zhao1.liu@intel.com/__;!!ACWV5N9M2RV99hQ!L2uxw6itl1xu4V_vdRWxQMeVR4PWVX0zvXndOqPHqmnCvnpPkyNamRGVSAil03m_ojnjPCMgUMEG0jBDtLNl$ 
>> I don't need to add my Signed-off-by.
>>
>>  target/i386/host-cpu.c        | 10 ++++++----
>>  target/i386/kvm/vmsr_energy.c |  3 +--
>>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Thanks. It has been merged as commit ae39acef49e2916 now.
> 

Since all patches are reviewed, I am going to re-send on top of the most recent
mainline QEMU with all Reviewed-by.

Thank you very much!

Dongli Zhang

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-04-25 15:46 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-16 21:52 [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
2025-04-25  8:28   ` Zhao Liu
2025-04-25 15:45     ` Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
2025-04-25 10:11   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
2025-04-25  8:56   ` Zhao Liu
2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2025-04-25  9:18   ` Zhao Liu
2025-04-25 10:14   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-04-25 10:12   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.