[PATCH 0/3] kvm: fix two svm pmu virtualization bugs

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] kvm: fix two svm pmu virtualization bugs
@ 2022-11-19 12:28 Dongli Zhang
  2022-11-19 12:28 ` [PATCH 1/3] kvm: introduce a helper before creating the 1st vcpu Dongli Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Dongli Zhang @ 2022-11-19 12:28 UTC (permalink / raw)
  To: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu

This patchset is to fix two svm pmu virtualization bugs.

1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.

To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu
virtualization. There is still below at the VM linux side ...

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

... although we expect something like below.

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE
if the per-vcpu "pmu" property is disabled.

I considered 'KVM_X86_SET_MSR_FILTER' initially.
Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I
finally used the latter because it is easier to use.

2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
at the KVM side may inject random unwanted/unknown NMIs to the VM.

The svm pmu registers are not reset during QEMU system_reset.

(1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
is running "perf top". The pmu registers are not disabled gracefully.

(2). Although the x86_cpu_reset() resets many registers to zero, the
kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
some pmu events are still enabled at the KVM side.

(3). The KVM pmc_speculative_in_use() always returns true so that the events
will not be reclaimed. The kvm_pmc->perf_event is still active.

(4). After the reboot, the VM kernel reports below error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5). In a worse case, the active kvm_pmc->perf_event is still able to
inject unknown NMIs randomly to the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

The patch 3 is to fix the issue by resetting AMD pmu registers as well as
Intel registers.

This patchset does cover does not cover PerfMonV2, until the below patchset
is merged into the KVM side.

[PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
https://lore.kernel.org/all/20221111102645.82001-1-likexu@tencent.com/

Dongli Zhang (3):
      kvm: introduce a helper before creating the 1st vcpu
      i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
      target/i386/kvm: get and put AMD pmu registers

 accel/kvm/kvm-all.c    |   7 ++-
 include/sysemu/kvm.h   |   2 +
 target/arm/kvm64.c     |   4 ++
 target/i386/cpu.h      |   5 +++
 target/i386/kvm/kvm.c  | 104 +++++++++++++++++++++++++++++++++++++++++++-
 target/mips/kvm.c      |   4 ++
 target/ppc/kvm.c       |   4 ++
 target/riscv/kvm.c     |   4 ++
 target/s390x/kvm/kvm.c |   4 ++
 9 files changed, 134 insertions(+), 4 deletions(-)

Thank you very much!

Dongli Zhang

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] kvm: introduce a helper before creating the 1st vcpu
  2022-11-19 12:28 [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Dongli Zhang
@ 2022-11-19 12:28 ` Dongli Zhang
  2022-11-19 12:29 ` [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled Dongli Zhang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Dongli Zhang @ 2022-11-19 12:28 UTC (permalink / raw)
  To: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu

Some per-VM kvm caps (e.g., KVM_CAP_PMU_CAPABILITY) can only be
enabled/disabled before creating the 1st vcpu, that is, when
(!kvm->created_vcpus) at the KVM side.

Unfortunately, some properties are still not set during kvm_arch_init().
The values of those properties are obtained during the init of each vcpu.

This is to add a new helper to provide the last chance before creating the
1st vcpu, in order for the QEMU to set kvm caps based on the per-vcpu
properties (e.g., "pmu").

In the future patch, we may disable KVM_CAP_PMU_CAPABILITY in the helper
if the "-pmu" is set for the vcpu.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 accel/kvm/kvm-all.c    | 7 +++++--
 include/sysemu/kvm.h   | 2 ++
 target/arm/kvm64.c     | 4 ++++
 target/i386/kvm/kvm.c  | 4 ++++
 target/mips/kvm.c      | 4 ++++
 target/ppc/kvm.c       | 4 ++++
 target/riscv/kvm.c     | 4 ++++
 target/s390x/kvm/kvm.c | 4 ++++
 8 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f99b0becd8..335ff6ce4d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -367,8 +367,9 @@ void kvm_destroy_vcpu(CPUState *cpu)
     }
 }
 
-static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
+static int kvm_get_vcpu(KVMState *s, CPUState *cs)
 {
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
     struct KVMParkedVcpu *cpu;
 
     QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
@@ -382,6 +383,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
         }
     }
 
+    kvm_arch_pre_create_vcpu(cs);
+
     return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
 }
 
@@ -393,7 +396,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
-    ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
+    ret = kvm_get_vcpu(s, cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
                          kvm_arch_vcpu_id(cpu));
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index e9a97eda8c..9a2e2ba012 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -371,6 +371,8 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+void kvm_arch_pre_create_vcpu(CPUState *cs);
+
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..da4317ad06 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -834,6 +834,10 @@ static int kvm_arm_sve_set_vls(CPUState *cs)
     return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
 }
 
+void kvm_arch_pre_create_vcpu(CPUState *cs)
+{
+}
+
 #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
 
 int kvm_arch_init_vcpu(CPUState *cs)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a213209379..8fec0bc5b5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1723,6 +1723,10 @@ static void kvm_init_nested_state(CPUX86State *env)
     }
 }
 
+void kvm_arch_pre_create_vcpu(CPUState *cs)
+{
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index bcb8e06b2c..1be1695b6b 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -61,6 +61,10 @@ int kvm_arch_irqchip_create(KVMState *s)
     return 0;
 }
 
+void kvm_arch_pre_create_vcpu(CPUState *cs)
+{
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     MIPSCPU *cpu = MIPS_CPU(cs);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 7c25348b7b..9049c6eb5e 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -462,6 +462,10 @@ static void kvmppc_hw_debug_points_init(CPUPPCState *cenv)
     }
 }
 
+void kvm_arch_pre_create_vcpu(CPUState *cs)
+{
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 30f21453d6..811f65d4f6 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -394,6 +394,10 @@ void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
 
+void kvm_arch_pre_create_vcpu(CPUState *cs)
+{
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     int ret = 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 3ac7ec9acf..65f701894e 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -404,6 +404,10 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
     return cpu->cpu_index;
 }
 
+void kvm_arch_pre_create_vcpu(CPUState *cs)
+{
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
  2022-11-19 12:28 [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Dongli Zhang
  2022-11-19 12:28 ` [PATCH 1/3] kvm: introduce a helper before creating the 1st vcpu Dongli Zhang
@ 2022-11-19 12:29 ` Dongli Zhang
  2022-11-21 11:03   ` Greg Kurz
  2023-11-13 16:39   ` Denis V. Lunev
  2022-11-19 12:29 ` [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers Dongli Zhang
  2022-11-21  6:42 ` [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Like Xu
  3 siblings, 2 replies; 12+ messages in thread
From: Dongli Zhang @ 2022-11-19 12:29 UTC (permalink / raw)
  To: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu

The "perf stat" at the VM side still works even we set "-cpu host,-pmu" in
the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC"
could disable the pmu virtualization in an AMD environment.

We still see below at VM kernel side ...

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

... although we expect something like below.

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This is because the AMD pmu (v1) does not rely on cpuid to decide if the
pmu virtualization is supported.

We disable KVM_CAP_PMU_CAPABILITY if the 'pmu' is disabled in the vcpu
properties.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 target/i386/kvm/kvm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8fec0bc5b5..0b1226ff7f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -137,6 +137,8 @@ static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
+static int has_pmu_cap;
+
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
@@ -1725,6 +1727,19 @@ static void kvm_init_nested_state(CPUX86State *env)
 
 void kvm_arch_pre_create_vcpu(CPUState *cs)
 {
+    X86CPU *cpu = X86_CPU(cs);
+    int ret;
+
+    if (has_pmu_cap && !cpu->enable_pmu) {
+        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                KVM_PMU_CAP_DISABLE);
+        if (ret < 0) {
+            error_report("kvm: Failed to disable pmu cap: %s",
+                         strerror(-ret));
+        }
+
+        has_pmu_cap = 0;
+    }
 }
 
 int kvm_arch_init_vcpu(CPUState *cs)
@@ -2517,6 +2532,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
     ret = kvm_get_supported_msrs(s);
     if (ret < 0) {
         return ret;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers
  2022-11-19 12:28 [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Dongli Zhang
  2022-11-19 12:28 ` [PATCH 1/3] kvm: introduce a helper before creating the 1st vcpu Dongli Zhang
  2022-11-19 12:29 ` [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled Dongli Zhang
@ 2022-11-19 12:29 ` Dongli Zhang
  2022-11-21 14:28   ` Liang Yan
  2022-11-21  6:42 ` [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Like Xu
  3 siblings, 1 reply; 12+ messages in thread
From: Dongli Zhang @ 2022-11-19 12:29 UTC (permalink / raw)
  To: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu

The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
the KVM side.

However, only the Intel gp/fixed/global pmu registers are involved. There
is not any implementation for AMD pmu registers. The
'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
the CPU version.

This patch is to add the support for AMD version=1 pmu, to get and put AMD
pmu registers. Otherwise, there will be a bug:

1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
is running "perf top". The pmu registers are not disabled gracefully.

2. Although the x86_cpu_reset() resets many registers to zero, the
kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
some pmu events are still enabled at the KVM side.

3. The KVM pmc_speculative_in_use() always returns true so that the events
will not be reclaimed. The kvm_pmc->perf_event is still active.

4. After the reboot, the VM kernel reports below error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

5. In a worse case, the active kvm_pmc->perf_event is still able to
inject unknown NMIs randomly to the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

The patch is to fix the issue by resetting AMD pmu registers during the
reset.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 target/i386/cpu.h     |  5 +++
 target/i386/kvm/kvm.c | 83 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4bc19577a..4cf0b98817 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -468,6 +468,11 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0                 0xc0010000
+#define MSR_K7_PERFCTR0                 0xc0010004
+#define MSR_F15H_PERF_CTL0              0xc0010200
+#define MSR_F15H_PERF_CTR0              0xc0010201
+
 #define MSR_MC0_CTL                     0x400
 #define MSR_MC0_STATUS                  0x401
 #define MSR_MC0_ADDR                    0x402
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 0b1226ff7f..023fcbce48 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2005,6 +2005,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
+    if (IS_AMD_CPU(env)) {
+        int64_t family;
+
+        family = (env->cpuid_version >> 8) & 0xf;
+        if (family == 0xf) {
+            family += (env->cpuid_version >> 20) & 0xff;
+        }
+
+        /*
+         * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+         * disable the AMD pmu virtualization.
+         *
+         * If KVM_CAP_PMU_CAPABILITY is supported, "!has_pmu_cap" indicates
+         * the KVM side has already disabled the pmu virtualization.
+         */
+        if (family >= 6 && (!has_pmu_cap || cpu->enable_pmu)) {
+            has_architectural_pmu_version = 1;
+
+            if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
+                num_architectural_pmu_gp_counters = 6;
+            } else {
+                num_architectural_pmu_gp_counters = 4;
+            }
+        }
+    }
+
     cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
 
     for (i = 0x80000000; i <= limit; i++) {
@@ -3326,7 +3352,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
         }
 
-        if (has_architectural_pmu_version > 0) {
+        if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
             if (has_architectural_pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -3357,6 +3383,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                                   env->msr_global_ctrl);
             }
         }
+
+        if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
+            uint32_t sel_base = MSR_K7_EVNTSEL0;
+            uint32_t ctr_base = MSR_K7_PERFCTR0;
+            uint32_t step = 1;
+
+            if (num_architectural_pmu_gp_counters == 6) {
+                sel_base = MSR_F15H_PERF_CTL0;
+                ctr_base = MSR_F15H_PERF_CTR0;
+                step = 2;
+            }
+
+            for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+                kvm_msr_entry_add(cpu, ctr_base + i * step,
+                                  env->msr_gp_counters[i]);
+                kvm_msr_entry_add(cpu, sel_base + i * step,
+                                  env->msr_gp_evtsel[i]);
+            }
+        }
+
         /*
          * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
          * only sync them to KVM on the first cpu
@@ -3817,7 +3863,7 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (has_architectural_pmu_version > 0) {
+    if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
         if (has_architectural_pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -3833,6 +3879,25 @@ static int kvm_get_msrs(X86CPU *cpu)
         }
     }
 
+    if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
+        uint32_t sel_base = MSR_K7_EVNTSEL0;
+        uint32_t ctr_base = MSR_K7_PERFCTR0;
+        uint32_t step = 1;
+
+        if (num_architectural_pmu_gp_counters == 6) {
+            sel_base = MSR_F15H_PERF_CTL0;
+            ctr_base = MSR_F15H_PERF_CTR0;
+            step = 2;
+        }
+
+        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+            kvm_msr_entry_add(cpu, ctr_base + i * step,
+                              env->msr_gp_counters[i]);
+            kvm_msr_entry_add(cpu, sel_base + i * step,
+                              env->msr_gp_evtsel[i]);
+        }
+    }
+
     if (env->mcg_cap) {
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4118,6 +4183,20 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
+        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3:
+            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+            break;
+        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3:
+            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+            break;
+        case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb:
+            index = index - MSR_F15H_PERF_CTL0;
+            if (index & 0x1) {
+                env->msr_gp_counters[index] = msrs[i].data;
+            } else {
+                env->msr_gp_evtsel[index] = msrs[i].data;
+            }
+            break;
         case HV_X64_MSR_HYPERCALL:
             env->msr_hv_hypercall = msrs[i].data;
             break;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] kvm: fix two svm pmu virtualization bugs
  2022-11-19 12:28 [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Dongli Zhang
                   ` (2 preceding siblings ...)
  2022-11-19 12:29 ` [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers Dongli Zhang
@ 2022-11-21  6:42 ` Like Xu
  2022-11-21  7:38   ` Dongli Zhang
  3 siblings, 1 reply; 12+ messages in thread
From: Like Xu @ 2022-11-21  6:42 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, qemu-s390x,
	qemu-riscv, qemu-ppc, qemu-arm, kvm list, qemu-devel

On 19/11/2022 8:28 pm, Dongli Zhang wrote:
> This patchset is to fix two svm pmu virtualization bugs.
> 
> 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
> 
> To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu
> virtualization. There is still below at the VM linux side ...

Many QEMU vendor forks already have similar fixes, and
thanks for bringing this issue back to the mainline.

> 
> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> 
> ... although we expect something like below.
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE
> if the per-vcpu "pmu" property is disabled.
> 
> I considered 'KVM_X86_SET_MSR_FILTER' initially.
> Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I
> finally used the latter because it is easier to use.
> 
> 
> 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
> at the KVM side may inject random unwanted/unknown NMIs to the VM.
> 
> The svm pmu registers are not reset during QEMU system_reset.
> 
> (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
> is running "perf top". The pmu registers are not disabled gracefully.
> 
> (2). Although the x86_cpu_reset() resets many registers to zero, the
> kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
> some pmu events are still enabled at the KVM side.
> 
> (3). The KVM pmc_speculative_in_use() always returns true so that the events
> will not be reclaimed. The kvm_pmc->perf_event is still active.

I'm not sure if you're saying KVM doing something wrong, I don't think so
because KVM doesn't sense the system_reset defined by QEME or other user space,
AMD's vPMC will continue to be enabled (if it was enabled before), generating pmi
injection into the guest, and the newly started guest doesn't realize the 
counter is still
enabled and blowing up the error log.

> 
> (4). After the reboot, the VM kernel reports below error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> (5). In a worse case, the active kvm_pmc->perf_event is still able to
> inject unknown NMIs randomly to the VM kernel.
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> The patch 3 is to fix the issue by resetting AMD pmu registers as well as
> Intel registers.

This fix idea looks good, it does require syncing the new changed device state 
of QEMU to KVM.

> 
> 
> This patchset does cover does not cover PerfMonV2, until the below patchset
> is merged into the KVM side.
> 
> [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
> https://lore.kernel.org/all/20221111102645.82001-1-likexu@tencent.com/
> 
> 
> Dongli Zhang (3):
>        kvm: introduce a helper before creating the 1st vcpu
>        i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
>        target/i386/kvm: get and put AMD pmu registers
> 
>   accel/kvm/kvm-all.c    |   7 ++-
>   include/sysemu/kvm.h   |   2 +
>   target/arm/kvm64.c     |   4 ++
>   target/i386/cpu.h      |   5 +++
>   target/i386/kvm/kvm.c  | 104 +++++++++++++++++++++++++++++++++++++++++++-
>   target/mips/kvm.c      |   4 ++
>   target/ppc/kvm.c       |   4 ++
>   target/riscv/kvm.c     |   4 ++
>   target/s390x/kvm/kvm.c |   4 ++
>   9 files changed, 134 insertions(+), 4 deletions(-)
> 
> Thank you very much!
> 
> Dongli Zhang
> 
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] kvm: fix two svm pmu virtualization bugs
  2022-11-21  6:42 ` [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Like Xu
@ 2022-11-21  7:38   ` Dongli Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Dongli Zhang @ 2022-11-21  7:38 UTC (permalink / raw)
  To: Like Xu
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, qemu-s390x,
	qemu-riscv, qemu-ppc, qemu-arm, kvm list, qemu-devel

Hi Like,

On 11/20/22 22:42, Like Xu wrote:
> On 19/11/2022 8:28 pm, Dongli Zhang wrote:
>> This patchset is to fix two svm pmu virtualization bugs.
>>
>> 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
>>
>> To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu
>> virtualization. There is still below at the VM linux side ...
> 
> Many QEMU vendor forks already have similar fixes, and
> thanks for bringing this issue back to the mainline.

Would you mind helping point to if there used to be any prior patchset for
mainline to resolve the issue?

> 
>>
>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>
>> ... although we expect something like below.
>>
>> [    0.596381] Performance Events: PMU not available due to virtualization,
>> using software events only.
>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>
>> The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE
>> if the per-vcpu "pmu" property is disabled.
>>
>> I considered 'KVM_X86_SET_MSR_FILTER' initially.
>> Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I
>> finally used the latter because it is easier to use.
>>
>>
>> 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
>> at the KVM side may inject random unwanted/unknown NMIs to the VM.
>>
>> The svm pmu registers are not reset during QEMU system_reset.
>>
>> (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>> is running "perf top". The pmu registers are not disabled gracefully.
>>
>> (2). Although the x86_cpu_reset() resets many registers to zero, the
>> kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
>> some pmu events are still enabled at the KVM side.
>>
>> (3). The KVM pmc_speculative_in_use() always returns true so that the events
>> will not be reclaimed. The kvm_pmc->perf_event is still active.
> 
> I'm not sure if you're saying KVM doing something wrong, I don't think so
> because KVM doesn't sense the system_reset defined by QEME or other user space,
> AMD's vPMC will continue to be enabled (if it was enabled before), generating pmi
> injection into the guest, and the newly started guest doesn't realize the
> counter is still
> enabled and blowing up the error log.

I were not saying KVM was buggy.

I was trying to explain how the issue impacts KVM side and VM side.

> 
>>
>> (4). After the reboot, the VM kernel reports below error:
>>
>> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected,
>> complain to your hardware vendor.
>> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
>> c0010200 is 530076)
>>
>> (5). In a worse case, the active kvm_pmc->perf_event is still able to
>> inject unknown NMIs randomly to the VM kernel.
>>
>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>>
>> The patch 3 is to fix the issue by resetting AMD pmu registers as well as
>> Intel registers.
> 
> This fix idea looks good, it does require syncing the new changed device state
> of QEMU to KVM.

Thank you very much!

Dongli Zhang

> 
>>
>>
>> This patchset does cover does not cover PerfMonV2, until the below patchset
>> is merged into the KVM side.
>>
>> [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
>> https://lore.kernel.org/all/20221111102645.82001-1-likexu@tencent.com/
>>
>>
>> Dongli Zhang (3):
>>        kvm: introduce a helper before creating the 1st vcpu
>>        i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
>>        target/i386/kvm: get and put AMD pmu registers
>>
>>   accel/kvm/kvm-all.c    |   7 ++-
>>   include/sysemu/kvm.h   |   2 +
>>   target/arm/kvm64.c     |   4 ++
>>   target/i386/cpu.h      |   5 +++
>>   target/i386/kvm/kvm.c  | 104 +++++++++++++++++++++++++++++++++++++++++++-
>>   target/mips/kvm.c      |   4 ++
>>   target/ppc/kvm.c       |   4 ++
>>   target/riscv/kvm.c     |   4 ++
>>   target/s390x/kvm/kvm.c |   4 ++
>>   9 files changed, 134 insertions(+), 4 deletions(-)
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
  2022-11-19 12:29 ` [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled Dongli Zhang
@ 2022-11-21 11:03   ` Greg Kurz
  2022-11-21 14:23     ` Liang Yan
  2023-11-13 16:39   ` Denis V. Lunev
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Kurz @ 2022-11-21 11:03 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, palmer,
	alistair.francis, bin.meng, pasic, borntraeger, richard.henderson,
	david, iii, thuth, joe.jin, likexu

On Sat, 19 Nov 2022 04:29:00 -0800
Dongli Zhang <dongli.zhang@oracle.com> wrote:

> The "perf stat" at the VM side still works even we set "-cpu host,-pmu" in
> the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC"
> could disable the pmu virtualization in an AMD environment.
> 
> We still see below at VM kernel side ...
> 
> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> 
> ... although we expect something like below.
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> This is because the AMD pmu (v1) does not rely on cpuid to decide if the
> pmu virtualization is supported.
> 
> We disable KVM_CAP_PMU_CAPABILITY if the 'pmu' is disabled in the vcpu
> properties.
> 
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>  target/i386/kvm/kvm.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8fec0bc5b5..0b1226ff7f 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -137,6 +137,8 @@ static int has_triple_fault_event;
>  
>  static bool has_msr_mcg_ext_ctl;
>  
> +static int has_pmu_cap;
> +
>  static struct kvm_cpuid2 *cpuid_cache;
>  static struct kvm_cpuid2 *hv_cpuid_cache;
>  static struct kvm_msr_list *kvm_feature_msrs;
> @@ -1725,6 +1727,19 @@ static void kvm_init_nested_state(CPUX86State *env)
>  
>  void kvm_arch_pre_create_vcpu(CPUState *cs)
>  {
> +    X86CPU *cpu = X86_CPU(cs);
> +    int ret;
> +
> +    if (has_pmu_cap && !cpu->enable_pmu) {
> +        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> +                                KVM_PMU_CAP_DISABLE);

It doesn't seem conceptually correct to configure VM level stuff out of
a vCPU property, which could theoretically be different for each vCPU,
even if this isn't the case with the current code base.

Maybe consider controlling PMU with a machine property and this
could be done in kvm_arch_init() like other VM level stuff ?

> +        if (ret < 0) {
> +            error_report("kvm: Failed to disable pmu cap: %s",
> +                         strerror(-ret));
> +        }
> +
> +        has_pmu_cap = 0;
> +    }
>  }
>  
>  int kvm_arch_init_vcpu(CPUState *cs)
> @@ -2517,6 +2532,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>          }
>      }
>  
> +    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
> +
>      ret = kvm_get_supported_msrs(s);
>      if (ret < 0) {
>          return ret;



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
  2022-11-21 11:03   ` Greg Kurz
@ 2022-11-21 14:23     ` Liang Yan
  2022-11-21 21:11       ` Dongli Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Liang Yan @ 2022-11-21 14:23 UTC (permalink / raw)
  To: Greg Kurz, Dongli Zhang
  Cc: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, palmer,
	alistair.francis, bin.meng, pasic, borntraeger, richard.henderson,
	david, iii, thuth, joe.jin, likexu


On 11/21/22 06:03, Greg Kurz wrote:
> On Sat, 19 Nov 2022 04:29:00 -0800
> Dongli Zhang <dongli.zhang@oracle.com> wrote:
>
>> The "perf stat" at the VM side still works even we set "-cpu host,-pmu" in
>> the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC"
>> could disable the pmu virtualization in an AMD environment.
>>
>> We still see below at VM kernel side ...
>>
>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>
>> ... although we expect something like below.
>>
>> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>
>> This is because the AMD pmu (v1) does not rely on cpuid to decide if the
>> pmu virtualization is supported.
>>
>> We disable KVM_CAP_PMU_CAPABILITY if the 'pmu' is disabled in the vcpu
>> properties.
>>
>> Cc: Joe Jin <joe.jin@oracle.com>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>>   target/i386/kvm/kvm.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 8fec0bc5b5..0b1226ff7f 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -137,6 +137,8 @@ static int has_triple_fault_event;
>>   
>>   static bool has_msr_mcg_ext_ctl;
>>   
>> +static int has_pmu_cap;
>> +
>>   static struct kvm_cpuid2 *cpuid_cache;
>>   static struct kvm_cpuid2 *hv_cpuid_cache;
>>   static struct kvm_msr_list *kvm_feature_msrs;
>> @@ -1725,6 +1727,19 @@ static void kvm_init_nested_state(CPUX86State *env)
>>   
>>   void kvm_arch_pre_create_vcpu(CPUState *cs)
>>   {
>> +    X86CPU *cpu = X86_CPU(cs);
>> +    int ret;
>> +
>> +    if (has_pmu_cap && !cpu->enable_pmu) {
>> +        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> +                                KVM_PMU_CAP_DISABLE);
> It doesn't seem conceptually correct to configure VM level stuff out of
> a vCPU property, which could theoretically be different for each vCPU,
> even if this isn't the case with the current code base.
>
> Maybe consider controlling PMU with a machine property and this
> could be done in kvm_arch_init() like other VM level stuff ?
>

There is already a 'pmu' property for x86_cpu with variable 'enable_pmu' 
as we see the above code. It is mainly used by Intel CPU and set to off 
by default since qemu 1.5.

And, this property is spread to AMD CPU too.

I think you may need setup a machine property to disable it from current 
machine model. Otherwise, it will break the Live Migration scenario.


>> +        if (ret < 0) {
>> +            error_report("kvm: Failed to disable pmu cap: %s",
>> +                         strerror(-ret));
>> +        }
>> +
>> +        has_pmu_cap = 0;
>> +    }
>>   }
>>   
>>   int kvm_arch_init_vcpu(CPUState *cs)
>> @@ -2517,6 +2532,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>           }
>>       }
>>   
>> +    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>> +
>>       ret = kvm_get_supported_msrs(s);
>>       if (ret < 0) {
>>           return ret;
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers
  2022-11-19 12:29 ` [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers Dongli Zhang
@ 2022-11-21 14:28   ` Liang Yan
  2022-11-21 21:33     ` Dongli Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Liang Yan @ 2022-11-21 14:28 UTC (permalink / raw)
  To: Dongli Zhang, kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu

A little bit more information from kernel perspective.

https://lkml.org/lkml/2022/10/31/476


I was kindly thinking of the same idea, but not sure if it is expected  
from a bare-metal perspective, since the four legacy MSRs

are always there. Also not sure if they are used by other applications.


~Liang


On 11/19/22 07:29, Dongli Zhang wrote:
> The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
> side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
> the KVM side.
>
> However, only the Intel gp/fixed/global pmu registers are involved. There
> is not any implementation for AMD pmu registers. The
> 'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
> calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
> AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
> the CPU version.
>
> This patch is to add the support for AMD version=1 pmu, to get and put AMD
> pmu registers. Otherwise, there will be a bug:
>
> 1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
> is running "perf top". The pmu registers are not disabled gracefully.
>
> 2. Although the x86_cpu_reset() resets many registers to zero, the
> kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
> some pmu events are still enabled at the KVM side.
>
> 3. The KVM pmc_speculative_in_use() always returns true so that the events
> will not be reclaimed. The kvm_pmc->perf_event is still active.
>
> 4. After the reboot, the VM kernel reports below error:
>
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In a worse case, the active kvm_pmc->perf_event is still able to
> inject unknown NMIs randomly to the VM kernel.
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> The patch is to fix the issue by resetting AMD pmu registers during the
> reset.
>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>   target/i386/cpu.h     |  5 +++
>   target/i386/kvm/kvm.c | 83 +++++++++++++++++++++++++++++++++++++++++--
>   2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index d4bc19577a..4cf0b98817 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -468,6 +468,11 @@ typedef enum X86Seg {
>   #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
>   #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
>   
> +#define MSR_K7_EVNTSEL0                 0xc0010000
> +#define MSR_K7_PERFCTR0                 0xc0010004
> +#define MSR_F15H_PERF_CTL0              0xc0010200
> +#define MSR_F15H_PERF_CTR0              0xc0010201
> +
>   #define MSR_MC0_CTL                     0x400
>   #define MSR_MC0_STATUS                  0x401
>   #define MSR_MC0_ADDR                    0x402
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 0b1226ff7f..023fcbce48 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2005,6 +2005,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
>           }
>       }
>   
> +    if (IS_AMD_CPU(env)) {
> +        int64_t family;
> +
> +        family = (env->cpuid_version >> 8) & 0xf;
> +        if (family == 0xf) {
> +            family += (env->cpuid_version >> 20) & 0xff;
> +        }
> +
> +        /*
> +         * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
> +         * disable the AMD pmu virtualization.
> +         *
> +         * If KVM_CAP_PMU_CAPABILITY is supported, "!has_pmu_cap" indicates
> +         * the KVM side has already disabled the pmu virtualization.
> +         */
> +        if (family >= 6 && (!has_pmu_cap || cpu->enable_pmu)) {
> +            has_architectural_pmu_version = 1;
> +
> +            if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
> +                num_architectural_pmu_gp_counters = 6;
> +            } else {
> +                num_architectural_pmu_gp_counters = 4;
> +            }
> +        }
> +    }
> +
>       cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
>   
>       for (i = 0x80000000; i <= limit; i++) {
> @@ -3326,7 +3352,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>               kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
>           }
>   
> -        if (has_architectural_pmu_version > 0) {
> +        if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>               if (has_architectural_pmu_version > 1) {
>                   /* Stop the counter.  */
>                   kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> @@ -3357,6 +3383,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>                                     env->msr_global_ctrl);
>               }
>           }
> +
> +        if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
> +            uint32_t sel_base = MSR_K7_EVNTSEL0;
> +            uint32_t ctr_base = MSR_K7_PERFCTR0;
> +            uint32_t step = 1;
> +
> +            if (num_architectural_pmu_gp_counters == 6) {
> +                sel_base = MSR_F15H_PERF_CTL0;
> +                ctr_base = MSR_F15H_PERF_CTR0;
> +                step = 2;
> +            }
> +
> +            for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> +                kvm_msr_entry_add(cpu, ctr_base + i * step,
> +                                  env->msr_gp_counters[i]);
> +                kvm_msr_entry_add(cpu, sel_base + i * step,
> +                                  env->msr_gp_evtsel[i]);
> +            }
> +        }
> +
>           /*
>            * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
>            * only sync them to KVM on the first cpu
> @@ -3817,7 +3863,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>       if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) {
>           kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
>       }
> -    if (has_architectural_pmu_version > 0) {
> +    if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>           if (has_architectural_pmu_version > 1) {
>               kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>               kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> @@ -3833,6 +3879,25 @@ static int kvm_get_msrs(X86CPU *cpu)
>           }
>       }
>   
> +    if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
> +        uint32_t sel_base = MSR_K7_EVNTSEL0;
> +        uint32_t ctr_base = MSR_K7_PERFCTR0;
> +        uint32_t step = 1;
> +
> +        if (num_architectural_pmu_gp_counters == 6) {
> +            sel_base = MSR_F15H_PERF_CTL0;
> +            ctr_base = MSR_F15H_PERF_CTR0;
> +            step = 2;
> +        }
> +
> +        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> +            kvm_msr_entry_add(cpu, ctr_base + i * step,
> +                              env->msr_gp_counters[i]);
> +            kvm_msr_entry_add(cpu, sel_base + i * step,
> +                              env->msr_gp_evtsel[i]);
> +        }
> +    }
> +
>       if (env->mcg_cap) {
>           kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
>           kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
> @@ -4118,6 +4183,20 @@ static int kvm_get_msrs(X86CPU *cpu)
>           case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
>               env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
>               break;
> +        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3:
> +            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
> +            break;
> +        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3:
> +            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
> +            break;
> +        case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb:
> +            index = index - MSR_F15H_PERF_CTL0;
> +            if (index & 0x1) {
> +                env->msr_gp_counters[index] = msrs[i].data;
> +            } else {
> +                env->msr_gp_evtsel[index] = msrs[i].data;
> +            }
> +            break;
>           case HV_X64_MSR_HYPERCALL:
>               env->msr_hv_hypercall = msrs[i].data;
>               break;


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
  2022-11-21 14:23     ` Liang Yan
@ 2022-11-21 21:11       ` Dongli Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Dongli Zhang @ 2022-11-21 21:11 UTC (permalink / raw)
  To: Liang Yan, Greg Kurz
  Cc: kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
	pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, palmer,
	alistair.francis, bin.meng, pasic, borntraeger, richard.henderson,
	david, iii, thuth, joe.jin, likexu

Hi Greg and Liang,

On 11/21/22 6:23 AM, Liang Yan wrote:
> 
> On 11/21/22 06:03, Greg Kurz wrote:
>> On Sat, 19 Nov 2022 04:29:00 -0800
>> Dongli Zhang <dongli.zhang@oracle.com> wrote:
>>
>>> The "perf stat" at the VM side still works even we set "-cpu host,-pmu" in
>>> the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC"
>>> could disable the pmu virtualization in an AMD environment.
>>>
>>> We still see below at VM kernel side ...
>>>
>>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>>
>>> ... although we expect something like below.
>>>
>>> [    0.596381] Performance Events: PMU not available due to virtualization,
>>> using software events only.
>>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>>
>>> This is because the AMD pmu (v1) does not rely on cpuid to decide if the
>>> pmu virtualization is supported.
>>>
>>> We disable KVM_CAP_PMU_CAPABILITY if the 'pmu' is disabled in the vcpu
>>> properties.
>>>
>>> Cc: Joe Jin <joe.jin@oracle.com>
>>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>>> ---
>>>   target/i386/kvm/kvm.c | 17 +++++++++++++++++
>>>   1 file changed, 17 insertions(+)
>>>
>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>> index 8fec0bc5b5..0b1226ff7f 100644
>>> --- a/target/i386/kvm/kvm.c
>>> +++ b/target/i386/kvm/kvm.c
>>> @@ -137,6 +137,8 @@ static int has_triple_fault_event;
>>>     static bool has_msr_mcg_ext_ctl;
>>>   +static int has_pmu_cap;
>>> +
>>>   static struct kvm_cpuid2 *cpuid_cache;
>>>   static struct kvm_cpuid2 *hv_cpuid_cache;
>>>   static struct kvm_msr_list *kvm_feature_msrs;
>>> @@ -1725,6 +1727,19 @@ static void kvm_init_nested_state(CPUX86State *env)
>>>     void kvm_arch_pre_create_vcpu(CPUState *cs)
>>>   {
>>> +    X86CPU *cpu = X86_CPU(cs);
>>> +    int ret;
>>> +
>>> +    if (has_pmu_cap && !cpu->enable_pmu) {
>>> +        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>> +                                KVM_PMU_CAP_DISABLE);
>> It doesn't seem conceptually correct to configure VM level stuff out of
>> a vCPU property, which could theoretically be different for each vCPU,
>> even if this isn't the case with the current code base.
>>
>> Maybe consider controlling PMU with a machine property and this
>> could be done in kvm_arch_init() like other VM level stuff ?
>>
> 
> There is already a 'pmu' property for x86_cpu with variable 'enable_pmu' as we
> see the above code. It is mainly used by Intel CPU and set to off by default
> since qemu 1.5.
> 
> And, this property is spread to AMD CPU too.
> 
> I think you may need setup a machine property to disable it from current machine
> model. Otherwise, it will break the Live Migration scenario.

Initially, I was thinking about to replace the CPU level property 'pmu' to a VM
level property but with a concern that reviewers/maintainers might not be happy
with it. That may break the way other Apps use the QEMU.

I will add an extra VM level property to the kvm_accel_class_init() for
"-machine accel=kvm" in addition to the existing per CPU property. The VM (kvm)
level
'pmu' will be more privileged than the CPU level 'pmu'.

Thank you very much!

Dongli Zhang

> 
> 
>>> +        if (ret < 0) {
>>> +            error_report("kvm: Failed to disable pmu cap: %s",
>>> +                         strerror(-ret));
>>> +        }
>>> +
>>> +        has_pmu_cap = 0;
>>> +    }
>>>   }
>>>     int kvm_arch_init_vcpu(CPUState *cs)
>>> @@ -2517,6 +2532,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>           }
>>>       }
>>>   +    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>> +
>>>       ret = kvm_get_supported_msrs(s);
>>>       if (ret < 0) {
>>>           return ret;
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers
  2022-11-21 14:28   ` Liang Yan
@ 2022-11-21 21:33     ` Dongli Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Dongli Zhang @ 2022-11-21 21:33 UTC (permalink / raw)
  To: Liang Yan
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu, kvm,
	qemu-devel, qemu-ppc, qemu-s390x, qemu-arm, qemu-riscv

Hi Liang,

On 11/21/22 6:28 AM, Liang Yan wrote:
> A little bit more information from kernel perspective.
> 
> https://urldefense.com/v3/__https://lkml.org/lkml/2022/10/31/476__;!!ACWV5N9M2RV99hQ!NHxyuDAt7ZD4hlsoxPCIUSRsPzaii0kDx2DrS7umBMoKVD8Z6BH7IKvPu8p0EhBBTEqQkCMfTk1xoj-XzT0$
> 
> 
> I was kindly thinking of the same idea, but not sure if it is expected  from a
> bare-metal perspective, since the four legacy MSRs
> 
> are always there. Also not sure if they are used by other applications.
> 
> 
> ~Liang

Can I assume you were replying to the the patch ...

[PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled

... but not this one which is about to save/restore PMU registers?

[PATCH 3/3] target/i386/kvm: get and put AMD pmu registers


About the legacy registers, here is my understanding just based on the linux amd
source code.

The line 1450 indicates the PMU initialization is failed only when the family is
< 6. The amd_core_pmu_init() at line 1455 will decide whether to use PERFCTR_CORE.

1445 __init int amd_pmu_init(void)
1446 {
1447         int ret;
1448
1449         /* Performance-monitoring supported from K7 and later: */
1450         if (boot_cpu_data.x86 < 6)
1451                 return -ENODEV;
1452
1453         x86_pmu = amd_pmu;
1454
1455         ret = amd_core_pmu_init();
1456         if (ret)
1457                 return ret


By default (when family < 6), the below 4 legacy registers are used.

1270 static __initconst const struct x86_pmu amd_pmu = {
1271         .name                   = "AMD",
... ...
1279         .eventsel               = MSR_K7_EVNTSEL0,
1280         .perfctr                = MSR_K7_PERFCTR0,

#define MSR_K7_EVNTSEL0                 0xc0010000
#define MSR_K7_PERFCTR0                 0xc0010004
#define MSR_K7_EVNTSEL1                 0xc0010001
#define MSR_K7_PERFCTR1                 0xc0010005
#define MSR_K7_EVNTSEL2                 0xc0010002
#define MSR_K7_PERFCTR2                 0xc0010006
#define MSR_K7_EVNTSEL3                 0xc0010003
#define MSR_K7_PERFCTR3                 0xc0010007


If PERFCTR_CORE is supported, counters like below are used, as line 1368 and 1369.

1351 static int __init amd_core_pmu_init(void)
1352 {
1353         union cpuid_0x80000022_ebx ebx;
1354         u64 even_ctr_mask = 0ULL;
1355         int i;
1356
1357         if (!boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
1358                 return 0;
1359
1360         /* Avoid calculating the value each time in the NMI handler */
1361         perf_nmi_window = msecs_to_jiffies(100);
1362
1363         /*
1364          * If core performance counter extensions exists, we must use
1365          * MSR_F15H_PERF_CTL/MSR_F15H_PERF_CTR msrs. See also
1366          * amd_pmu_addr_offset().
1367          */
1368         x86_pmu.eventsel        = MSR_F15H_PERF_CTL;
1369         x86_pmu.perfctr         = MSR_F15H_PERF_CTR;
1370         x86_pmu.num_counters    = AMD64_NUM_COUNTERS_CORE;

#define MSR_F15H_PERF_CTL               0xc0010200
#define MSR_F15H_PERF_CTL0              MSR_F15H_PERF_CTL
#define MSR_F15H_PERF_CTL1              (MSR_F15H_PERF_CTL + 2)
#define MSR_F15H_PERF_CTL2              (MSR_F15H_PERF_CTL + 4)
#define MSR_F15H_PERF_CTL3              (MSR_F15H_PERF_CTL + 6)
#define MSR_F15H_PERF_CTL4              (MSR_F15H_PERF_CTL + 8)
#define MSR_F15H_PERF_CTL5              (MSR_F15H_PERF_CTL + 10)

#define MSR_F15H_PERF_CTR               0xc0010201
#define MSR_F15H_PERF_CTR0              MSR_F15H_PERF_CTR
#define MSR_F15H_PERF_CTR1              (MSR_F15H_PERF_CTR + 2)
#define MSR_F15H_PERF_CTR2              (MSR_F15H_PERF_CTR + 4)
#define MSR_F15H_PERF_CTR3              (MSR_F15H_PERF_CTR + 6)
#define MSR_F15H_PERF_CTR4              (MSR_F15H_PERF_CTR + 8)
#define MSR_F15H_PERF_CTR5              (MSR_F15H_PERF_CTR + 10)

Unfortunately, I do not have access to machine with family < 6. It might be
interesting to use QEMU to emulate a machine with family < 6.

Thank you very much for suggestion!

Dongli Zhang


> 
> 
> On 11/19/22 07:29, Dongli Zhang wrote:
>> The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
>> side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
>> the KVM side.
>>
>> However, only the Intel gp/fixed/global pmu registers are involved. There
>> is not any implementation for AMD pmu registers. The
>> 'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
>> calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
>> AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
>> the CPU version.
>>
>> This patch is to add the support for AMD version=1 pmu, to get and put AMD
>> pmu registers. Otherwise, there will be a bug:
>>
>> 1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>> is running "perf top". The pmu registers are not disabled gracefully.
>>
>> 2. Although the x86_cpu_reset() resets many registers to zero, the
>> kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
>> some pmu events are still enabled at the KVM side.
>>
>> 3. The KVM pmc_speculative_in_use() always returns true so that the events
>> will not be reclaimed. The kvm_pmc->perf_event is still active.
>>
>> 4. After the reboot, the VM kernel reports below error:
>>
>> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected,
>> complain to your hardware vendor.
>> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
>> c0010200 is 530076)
>>
>> 5. In a worse case, the active kvm_pmc->perf_event is still able to
>> inject unknown NMIs randomly to the VM kernel.
>>
>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>>
>> The patch is to fix the issue by resetting AMD pmu registers during the
>> reset.
>>
>> Cc: Joe Jin <joe.jin@oracle.com>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>>   target/i386/cpu.h     |  5 +++
>>   target/i386/kvm/kvm.c | 83 +++++++++++++++++++++++++++++++++++++++++--
>>   2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index d4bc19577a..4cf0b98817 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -468,6 +468,11 @@ typedef enum X86Seg {
>>   #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
>>   #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
>>   +#define MSR_K7_EVNTSEL0                 0xc0010000
>> +#define MSR_K7_PERFCTR0                 0xc0010004
>> +#define MSR_F15H_PERF_CTL0              0xc0010200
>> +#define MSR_F15H_PERF_CTR0              0xc0010201
>> +
>>   #define MSR_MC0_CTL                     0x400
>>   #define MSR_MC0_STATUS                  0x401
>>   #define MSR_MC0_ADDR                    0x402
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 0b1226ff7f..023fcbce48 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -2005,6 +2005,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>           }
>>       }
>>   +    if (IS_AMD_CPU(env)) {
>> +        int64_t family;
>> +
>> +        family = (env->cpuid_version >> 8) & 0xf;
>> +        if (family == 0xf) {
>> +            family += (env->cpuid_version >> 20) & 0xff;
>> +        }
>> +
>> +        /*
>> +         * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
>> +         * disable the AMD pmu virtualization.
>> +         *
>> +         * If KVM_CAP_PMU_CAPABILITY is supported, "!has_pmu_cap" indicates
>> +         * the KVM side has already disabled the pmu virtualization.
>> +         */
>> +        if (family >= 6 && (!has_pmu_cap || cpu->enable_pmu)) {
>> +            has_architectural_pmu_version = 1;
>> +
>> +            if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
>> +                num_architectural_pmu_gp_counters = 6;
>> +            } else {
>> +                num_architectural_pmu_gp_counters = 4;
>> +            }
>> +        }
>> +    }
>> +
>>       cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
>>         for (i = 0x80000000; i <= limit; i++) {
>> @@ -3326,7 +3352,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>>               kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL,
>> env->poll_control_msr);
>>           }
>>   -        if (has_architectural_pmu_version > 0) {
>> +        if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>>               if (has_architectural_pmu_version > 1) {
>>                   /* Stop the counter.  */
>>                   kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>> @@ -3357,6 +3383,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>>                                     env->msr_global_ctrl);
>>               }
>>           }
>> +
>> +        if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
>> +            uint32_t sel_base = MSR_K7_EVNTSEL0;
>> +            uint32_t ctr_base = MSR_K7_PERFCTR0;
>> +            uint32_t step = 1;
>> +
>> +            if (num_architectural_pmu_gp_counters == 6) {
>> +                sel_base = MSR_F15H_PERF_CTL0;
>> +                ctr_base = MSR_F15H_PERF_CTR0;
>> +                step = 2;
>> +            }
>> +
>> +            for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
>> +                kvm_msr_entry_add(cpu, ctr_base + i * step,
>> +                                  env->msr_gp_counters[i]);
>> +                kvm_msr_entry_add(cpu, sel_base + i * step,
>> +                                  env->msr_gp_evtsel[i]);
>> +            }
>> +        }
>> +
>>           /*
>>            * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
>>            * only sync them to KVM on the first cpu
>> @@ -3817,7 +3863,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>>       if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) {
>>           kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
>>       }
>> -    if (has_architectural_pmu_version > 0) {
>> +    if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>>           if (has_architectural_pmu_version > 1) {
>>               kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>>               kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
>> @@ -3833,6 +3879,25 @@ static int kvm_get_msrs(X86CPU *cpu)
>>           }
>>       }
>>   +    if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
>> +        uint32_t sel_base = MSR_K7_EVNTSEL0;
>> +        uint32_t ctr_base = MSR_K7_PERFCTR0;
>> +        uint32_t step = 1;
>> +
>> +        if (num_architectural_pmu_gp_counters == 6) {
>> +            sel_base = MSR_F15H_PERF_CTL0;
>> +            ctr_base = MSR_F15H_PERF_CTR0;
>> +            step = 2;
>> +        }
>> +
>> +        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
>> +            kvm_msr_entry_add(cpu, ctr_base + i * step,
>> +                              env->msr_gp_counters[i]);
>> +            kvm_msr_entry_add(cpu, sel_base + i * step,
>> +                              env->msr_gp_evtsel[i]);
>> +        }
>> +    }
>> +
>>       if (env->mcg_cap) {
>>           kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
>>           kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
>> @@ -4118,6 +4183,20 @@ static int kvm_get_msrs(X86CPU *cpu)
>>           case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
>>               env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
>>               break;
>> +        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3:
>> +            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
>> +            break;
>> +        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3:
>> +            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
>> +            break;
>> +        case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb:
>> +            index = index - MSR_F15H_PERF_CTL0;
>> +            if (index & 0x1) {
>> +                env->msr_gp_counters[index] = msrs[i].data;
>> +            } else {
>> +                env->msr_gp_evtsel[index] = msrs[i].data;
>> +            }
>> +            break;
>>           case HV_X64_MSR_HYPERCALL:
>>               env->msr_hv_hypercall = msrs[i].data;
>>               break;


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
  2022-11-19 12:29 ` [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled Dongli Zhang
  2022-11-21 11:03   ` Greg Kurz
@ 2023-11-13 16:39   ` Denis V. Lunev
  1 sibling, 0 replies; 12+ messages in thread
From: Denis V. Lunev @ 2023-11-13 16:39 UTC (permalink / raw)
  To: Dongli Zhang, kvm, qemu-devel, qemu-arm, qemu-ppc, qemu-riscv,
	qemu-s390x
  Cc: pbonzini, peter.maydell, mtosatti, chenhuacai, philmd, aurelien,
	jiaxun.yang, aleksandar.rikalo, danielhb413, clg, david, groug,
	palmer, alistair.francis, bin.meng, pasic, borntraeger,
	richard.henderson, david, iii, thuth, joe.jin, likexu,
	Alexander Ivanov, Konstantin Khorenko

On 11/19/22 13:29, Dongli Zhang wrote:
> The "perf stat" at the VM side still works even we set "-cpu host,-pmu" in
> the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC"
> could disable the pmu virtualization in an AMD environment.
>
> We still see below at VM kernel side ...
>
> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>
> ... although we expect something like below.
>
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>
> This is because the AMD pmu (v1) does not rely on cpuid to decide if the
> pmu virtualization is supported.
>
> We disable KVM_CAP_PMU_CAPABILITY if the 'pmu' is disabled in the vcpu
> properties.
>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>   target/i386/kvm/kvm.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8fec0bc5b5..0b1226ff7f 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -137,6 +137,8 @@ static int has_triple_fault_event;
>   
>   static bool has_msr_mcg_ext_ctl;
>   
> +static int has_pmu_cap;
> +
>   static struct kvm_cpuid2 *cpuid_cache;
>   static struct kvm_cpuid2 *hv_cpuid_cache;
>   static struct kvm_msr_list *kvm_feature_msrs;
> @@ -1725,6 +1727,19 @@ static void kvm_init_nested_state(CPUX86State *env)
>   
>   void kvm_arch_pre_create_vcpu(CPUState *cs)
>   {
> +    X86CPU *cpu = X86_CPU(cs);
> +    int ret;
> +
> +    if (has_pmu_cap && !cpu->enable_pmu) {
> +        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> +                                KVM_PMU_CAP_DISABLE);
> +        if (ret < 0) {
> +            error_report("kvm: Failed to disable pmu cap: %s",
> +                         strerror(-ret));
> +        }
> +
> +        has_pmu_cap = 0;
> +    }
>   }
>   
>   int kvm_arch_init_vcpu(CPUState *cs)
> @@ -2517,6 +2532,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>           }
>       }
>   
> +    has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
> +
>       ret = kvm_get_supported_msrs(s);
>       if (ret < 0) {
>           return ret;
This patch is very important in particular.
It boosts performance of any single VMexit
is 13% for AMD. Intel is being measured.

At my opinion v1 of the patch is better that
version 2. We should not introduce any
new capability but disable PMU if we can
while it is disabled according to the configuration.

The discussion about performance improvement
is here
https://lore.kernel.org/lkml/ZU2D3f6kc0MDzNR5@google.com/T/

Den


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-11-13 16:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-19 12:28 [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Dongli Zhang
2022-11-19 12:28 ` [PATCH 1/3] kvm: introduce a helper before creating the 1st vcpu Dongli Zhang
2022-11-19 12:29 ` [PATCH 2/3] i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled Dongli Zhang
2022-11-21 11:03   ` Greg Kurz
2022-11-21 14:23     ` Liang Yan
2022-11-21 21:11       ` Dongli Zhang
2023-11-13 16:39   ` Denis V. Lunev
2022-11-19 12:29 ` [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers Dongli Zhang
2022-11-21 14:28   ` Liang Yan
2022-11-21 21:33     ` Dongli Zhang
2022-11-21  6:42 ` [PATCH 0/3] kvm: fix two svm pmu virtualization bugs Like Xu
2022-11-21  7:38   ` Dongli Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).