qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: qemu-devel@nongnu.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, zhao1.liu@intel.com, mtosatti@redhat.com,
	sandipan.das@amd.com, babu.moger@amd.com, likexu@tencent.com,
	like.xu.linux@gmail.com, zhenyuw@linux.intel.com, groug@kaod.org,
	khorenko@virtuozzo.com, alexander.ivanov@virtuozzo.com,
	den@virtuozzo.com, davydov-max@yandex-team.ru,
	xiaoyao.li@intel.com, dapeng1.mi@linux.intel.com,
	joe.jin@oracle.com
Subject: [PATCH v2 08/10] target/i386/kvm: reset AMD PMU registers during VM reset
Date: Sun,  2 Mar 2025 14:00:16 -0800	[thread overview]
Message-ID: <20250302220112.17653-9-dongli.zhang@oracle.com> (raw)
In-Reply-To: <20250302220112.17653-1-dongli.zhang@oracle.com>

QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, has_pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.

To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:

1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

4. After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
  - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
    AMD64_NUM_COUNTERS (suggested by Sandipan Das).
  - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
    (suggested by Sandipan Das).
  - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
  - Don't initialize PMU info if kvm.enable_pmu=N.

 target/i386/cpu.h     |   8 ++
 target/i386/kvm/kvm.c | 173 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 177 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c67b42d34f..319600672b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,14 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0                 0xc0010000
+#define MSR_K7_PERFCTR0                 0xc0010004
+#define MSR_F15H_PERF_CTL0              0xc0010200
+#define MSR_F15H_PERF_CTR0              0xc0010201
+
+#define AMD64_NUM_COUNTERS              4
+#define AMD64_NUM_COUNTERS_CORE         6
+
 #define MSR_MC0_CTL                     0x400
 #define MSR_MC0_STATUS                  0x401
 #define MSR_MC0_ADDR                    0x402
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index efba3ae7a4..d4be8a0d2e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2069,7 +2069,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
-static void kvm_init_pmu_info(CPUX86State *env)
+static void kvm_init_pmu_info_intel(CPUX86State *env)
 {
     uint32_t eax, edx;
     uint32_t unused;
@@ -2106,6 +2106,94 @@ static void kvm_init_pmu_info(CPUX86State *env)
     }
 }
 
+static void kvm_init_pmu_info_amd(CPUX86State *env)
+{
+    uint32_t unused;
+    int64_t family;
+    uint32_t ecx;
+
+    has_pmu_version = 0;
+
+    /*
+     * To determine the CPU family, the following code is derived from
+     * x86_cpuid_version_get_family().
+     */
+    family = (env->cpuid_version >> 8) & 0xf;
+    if (family == 0xf) {
+        family += (env->cpuid_version >> 20) & 0xff;
+    }
+
+    /*
+     * Performance-monitoring supported from K7 and later.
+     */
+    if (family < 6) {
+        return;
+    }
+
+    has_pmu_version = 1;
+
+    cpu_x86_cpuid(env, 0x80000001, 0, &unused, &unused, &ecx, &unused);
+
+    if (!(ecx & CPUID_EXT3_PERFCORE)) {
+        num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+        return;
+    }
+
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_same_vendor(CPUX86State *env)
+{
+    static uint32_t host_cpuid_vendor1;
+    static uint32_t host_cpuid_vendor2;
+    static uint32_t host_cpuid_vendor3;
+
+    host_cpuid(0x0, 0, NULL, &host_cpuid_vendor1, &host_cpuid_vendor3,
+               &host_cpuid_vendor2);
+
+    return env->cpuid_vendor1 == host_cpuid_vendor1 &&
+           env->cpuid_vendor2 == host_cpuid_vendor2 &&
+           env->cpuid_vendor3 == host_cpuid_vendor3;
+}
+
+static void kvm_init_pmu_info(CPUState *cs)
+{
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * The PMU virtualization is disabled by kvm.enable_pmu=N.
+     */
+    if (kvm_pmu_disabled) {
+        return;
+    }
+
+    /*
+     * It is not supported to virtualize AMD PMU registers on Intel
+     * processors, nor to virtualize Intel PMU registers on AMD processors.
+     */
+    if (!is_same_vendor(env)) {
+        return;
+    }
+
+    /*
+     * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+     * disable the AMD pmu virtualization.
+     *
+     * If KVM_CAP_PMU_CAPABILITY is supported !cpu->enable_pmu
+     * indicates the KVM has already disabled the PMU virtualization.
+     */
+    if (has_pmu_cap && !cpu->enable_pmu) {
+        return;
+    }
+
+    if (IS_INTEL_CPU(env)) {
+        kvm_init_pmu_info_intel(env);
+    } else if (IS_AMD_CPU(env)) {
+        kvm_init_pmu_info_amd(env);
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2288,7 +2376,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
-    kvm_init_pmu_info(env);
+    kvm_init_pmu_info(cs);
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4064,7 +4152,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
             kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
         }
 
-        if (has_pmu_version > 0) {
+        if (IS_INTEL_CPU(env) && has_pmu_version > 0) {
             if (has_pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4095,6 +4183,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
                                   env->msr_global_ctrl);
             }
         }
+
+        if (IS_AMD_CPU(env) && has_pmu_version > 0) {
+            uint32_t sel_base = MSR_K7_EVNTSEL0;
+            uint32_t ctr_base = MSR_K7_PERFCTR0;
+            /*
+             * The address of the next selector or counter register is
+             * obtained by incrementing the address of the current selector
+             * or counter register by one.
+             */
+            uint32_t step = 1;
+
+            /*
+             * When PERFCORE is enabled, AMD PMU uses a separate set of
+             * addresses for the selector and counter registers.
+             * Additionally, the address of the next selector or counter
+             * register is determined by incrementing the address of the
+             * current register by two.
+             */
+            if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+                sel_base = MSR_F15H_PERF_CTL0;
+                ctr_base = MSR_F15H_PERF_CTR0;
+                step = 2;
+            }
+
+            for (i = 0; i < num_pmu_gp_counters; i++) {
+                kvm_msr_entry_add(cpu, ctr_base + i * step,
+                                  env->msr_gp_counters[i]);
+                kvm_msr_entry_add(cpu, sel_base + i * step,
+                                  env->msr_gp_evtsel[i]);
+            }
+        }
+
         /*
          * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
          * only sync them to KVM on the first cpu
@@ -4542,7 +4662,8 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (has_pmu_version > 0) {
+
+    if (IS_INTEL_CPU(env) && has_pmu_version > 0) {
         if (has_pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4558,6 +4679,35 @@ static int kvm_get_msrs(X86CPU *cpu)
         }
     }
 
+    if (IS_AMD_CPU(env) && has_pmu_version > 0) {
+        uint32_t sel_base = MSR_K7_EVNTSEL0;
+        uint32_t ctr_base = MSR_K7_PERFCTR0;
+        /*
+         * The address of the next selector or counter register is
+         * obtained by incrementing the address of the current selector
+         * or counter register by one.
+         */
+        uint32_t step = 1;
+
+        /*
+         * When PERFCORE is enabled, AMD PMU uses a separate set of
+         * addresses for the selector and counter registers.
+         * Additionally, the address of the next selector or counter
+         * register is determined by incrementing the address of the
+         * current register by two.
+         */
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            sel_base = MSR_F15H_PERF_CTL0;
+            ctr_base = MSR_F15H_PERF_CTR0;
+            step = 2;
+        }
+
+        for (i = 0; i < num_pmu_gp_counters; i++) {
+            kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+            kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+        }
+    }
+
     if (env->mcg_cap) {
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4869,6 +5019,21 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
+        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+            break;
+        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+            break;
+        case MSR_F15H_PERF_CTL0 ...
+             MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+            index = index - MSR_F15H_PERF_CTL0;
+            if (index & 0x1) {
+                env->msr_gp_counters[index] = msrs[i].data;
+            } else {
+                env->msr_gp_evtsel[index] = msrs[i].data;
+            }
+            break;
         case HV_X64_MSR_HYPERCALL:
             env->msr_hv_hypercall = msrs[i].data;
             break;
-- 
2.43.5



  parent reply	other threads:[~2025-03-02 22:04 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-02 22:00 [PATCH v2 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-03-02 22:00 ` [PATCH v2 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-03-04 14:40   ` Xiaoyao Li
2025-03-04 22:53     ` dongli.zhang
2025-03-05  1:38       ` Xiaoyao Li
2025-03-05 14:20   ` Zhao Liu
2025-03-07  7:24   ` Sandipan Das
2025-03-02 22:00 ` [PATCH v2 02/10] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
2025-03-03  1:59   ` Xiaoyao Li
2025-03-03 18:45     ` dongli.zhang
2025-03-04  6:11       ` Xiaoyao Li
2025-03-06 16:50   ` Zhao Liu
2025-03-06 17:47     ` dongli.zhang
2025-03-07  7:41       ` Zhao Liu
2025-03-02 22:00 ` [PATCH v2 03/10] [DO NOT MERGE] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
2025-03-05 14:46   ` Zhao Liu
2025-03-05 21:53     ` dongli.zhang
2025-03-07  7:52       ` Zhao Liu
2025-03-07  8:40         ` Xiaoyao Li
2025-03-02 22:00 ` [PATCH v2 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2025-03-04  7:59   ` Xiaoyao Li
2025-03-05  1:22     ` Sean Christopherson
2025-03-05  1:35       ` Xiaoyao Li
2025-03-05 14:41     ` Zhao Liu
2025-03-05 20:13       ` dongli.zhang
2025-03-05 14:44   ` Zhao Liu
2025-03-02 22:00 ` [PATCH v2 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2025-03-05  7:03   ` Mi, Dapeng
2025-03-07  9:15   ` Zhao Liu
2025-03-07 22:47     ` Dongli Zhang
2025-03-10  3:55       ` Zhao Liu
2025-03-02 22:00 ` [PATCH v2 06/10] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2025-03-05  7:07   ` Mi, Dapeng
2025-03-07  9:19   ` Zhao Liu
2025-03-07 22:49     ` Dongli Zhang
2025-03-02 22:00 ` [PATCH v2 07/10] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
2025-03-10  6:14   ` Zhao Liu
2025-03-10 15:41     ` Dongli Zhang
2025-03-10 16:49     ` Dongli Zhang
2025-03-02 22:00 ` Dongli Zhang [this message]
2025-03-05  7:33   ` [PATCH v2 08/10] target/i386/kvm: reset AMD PMU registers during VM reset Mi, Dapeng
2025-03-05 11:41   ` Francesco Lavra
2025-03-05 19:05     ` dongli.zhang
2025-03-07  7:38   ` Sandipan Das
2025-03-10  7:47   ` Zhao Liu
2025-03-10 16:39     ` Dongli Zhang
2025-03-11 13:51       ` Zhao Liu
2025-03-11 19:52         ` Dongli Zhang
2025-03-12  8:30           ` Zhao Liu
2025-03-12 22:17             ` Dongli Zhang
2025-03-28  6:29   ` ewanhai
2025-03-28 16:42     ` Dongli Zhang
2025-03-31  3:55       ` ewanhai
2025-03-31 19:16         ` Dongli Zhang
2025-04-01  3:35           ` Ewan Hai
2025-04-07  8:51             ` Zhao Liu
2025-04-07  9:33               ` Ewan Hai
2025-04-16  8:17                 ` Mi, Dapeng
2025-03-02 22:00 ` [PATCH v2 09/10] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-03-02 22:00 ` [PATCH v2 10/10] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
2025-03-05  7:35   ` Mi, Dapeng
2025-03-05 19:00     ` dongli.zhang
2025-03-06  1:38       ` Mi, Dapeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250302220112.17653-9-dongli.zhang@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=alexander.ivanov@virtuozzo.com \
    --cc=babu.moger@amd.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=davydov-max@yandex-team.ru \
    --cc=den@virtuozzo.com \
    --cc=groug@kaod.org \
    --cc=joe.jin@oracle.com \
    --cc=khorenko@virtuozzo.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=likexu@tencent.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sandipan.das@amd.com \
    --cc=xiaoyao.li@intel.com \
    --cc=zhao1.liu@intel.com \
    --cc=zhenyuw@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).