From: Liang Yan <lyan@digitalocean.com>
To: Dongli Zhang <dongli.zhang@oracle.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	qemu-ppc@nongnu.org, qemu-riscv@nongnu.org,
	qemu-s390x@nongnu.org
Cc: pbonzini@redhat.com, peter.maydell@linaro.org,
	mtosatti@redhat.com, chenhuacai@kernel.org, philmd@linaro.org,
	aurelien@aurel32.net, jiaxun.yang@flygoat.com,
	aleksandar.rikalo@syrmia.com, danielhb413@gmail.com,
	clg@kaod.org, david@gibson.dropbear.id.au, groug@kaod.org,
	palmer@dabbelt.com, alistair.francis@wdc.com,
	bin.meng@windriver.com, pasic@linux.ibm.com,
	borntraeger@linux.ibm.com, richard.henderson@linaro.org,
	david@redhat.com, iii@linux.ibm.com, thuth@redhat.com,
	joe.jin@oracle.com, likexu@tencent.com
Subject: Re: [PATCH 3/3] target/i386/kvm: get and put AMD pmu registers
Date: Mon, 21 Nov 2022 09:28:29 -0500
Message-ID: <8b197d19-a43a-3b29-3a05-c92a09e28d5f@digitalocean.com>
In-Reply-To: <20221119122901.2469-4-dongli.zhang@oracle.com>

Here is a little more information from the kernel perspective:

https://lkml.org/lkml/2022/10/31/476

I was kind of thinking of the same idea, but I am not sure whether it is
expected from a bare-metal perspective, since the four legacy MSRs are
always there. I am also not sure whether they are used by other
applications.
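
For reference, here is a minimal sketch of the two MSR layouts under discussion, using the base addresses defined in the patch quoted below. The helper names amd_evtsel_msr() and amd_counter_msr() are hypothetical and are not part of QEMU or KVM. The counts (4 legacy, 6 with CPUID_EXT3_PERFCORE) follow the patch's own branches.

```c
#include <assert.h>
#include <stdint.h>

/* MSR bases as defined by the quoted patch in target/i386/cpu.h. */
#define MSR_K7_EVNTSEL0     0xc0010000u
#define MSR_K7_PERFCTR0     0xc0010004u
#define MSR_F15H_PERF_CTL0  0xc0010200u
#define MSR_F15H_PERF_CTR0  0xc0010201u

#define LEGACY_COUNTERS 4u  /* legacy K7 layout */
#define CORE_COUNTERS   6u  /* layout used when CPUID_EXT3_PERFCORE is set */

/* Hypothetical helper: event-select MSR address for counter i. */
static uint32_t amd_evtsel_msr(unsigned i, int perfctr_core)
{
    if (perfctr_core) {
        assert(i < CORE_COUNTERS);
        /* CTL and CTR registers interleave, so the stride is 2. */
        return MSR_F15H_PERF_CTL0 + 2u * i;
    }
    assert(i < LEGACY_COUNTERS);
    /* Legacy event selects form one contiguous block. */
    return MSR_K7_EVNTSEL0 + i;
}

/* Hypothetical helper: counter MSR address for counter i. */
static uint32_t amd_counter_msr(unsigned i, int perfctr_core)
{
    if (perfctr_core) {
        assert(i < CORE_COUNTERS);
        return MSR_F15H_PERF_CTR0 + 2u * i;
    }
    assert(i < LEGACY_COUNTERS);
    return MSR_K7_PERFCTR0 + i;
}
```

Under these assumptions the legacy layout spans 0xc0010000 to 0xc0010007 and the interleaved layout spans 0xc0010200 to 0xc001020b, which matches the case ranges in the kvm_get_msrs() hunk.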


~Liang


On 11/19/22 07:29, Dongli Zhang wrote:
> The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
> side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
> the KVM side.
>
> However, only the Intel gp/fixed/global pmu registers are involved. There
> is not any implementation for AMD pmu registers. The
> 'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
> calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
> AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
> the CPU version.
>
> This patch is to add the support for AMD version=1 pmu, to get and put AMD
> pmu registers. Otherwise, there will be a bug:
>
> 1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
> is running "perf top". The pmu registers are not disabled gracefully.
>
> 2. Although x86_cpu_reset() resets many registers to zero,
> kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a
> result, some pmu events are still enabled at the KVM side.
>
> 3. The KVM pmc_speculative_in_use() always returns true so that the events
> will not be reclaimed. The kvm_pmc->perf_event is still active.
>
> 4. After the reboot, the VM kernel reports the error below:
>
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In a worse case, the active kvm_pmc->perf_event is still able to
> inject unknown NMIs randomly to the VM kernel.
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> The patch is to fix the issue by resetting AMD pmu registers during the
> reset.
>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>   target/i386/cpu.h     |  5 +++
>   target/i386/kvm/kvm.c | 83 +++++++++++++++++++++++++++++++++++++++++--
>   2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index d4bc19577a..4cf0b98817 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -468,6 +468,11 @@ typedef enum X86Seg {
>   #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
>   #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
>   
> +#define MSR_K7_EVNTSEL0                 0xc0010000
> +#define MSR_K7_PERFCTR0                 0xc0010004
> +#define MSR_F15H_PERF_CTL0              0xc0010200
> +#define MSR_F15H_PERF_CTR0              0xc0010201
> +
>   #define MSR_MC0_CTL                     0x400
>   #define MSR_MC0_STATUS                  0x401
>   #define MSR_MC0_ADDR                    0x402
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 0b1226ff7f..023fcbce48 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2005,6 +2005,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
>           }
>       }
>   
> +    if (IS_AMD_CPU(env)) {
> +        int64_t family;
> +
> +        family = (env->cpuid_version >> 8) & 0xf;
> +        if (family == 0xf) {
> +            family += (env->cpuid_version >> 20) & 0xff;
> +        }
> +
> +        /*
> +         * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
> +         * disable the AMD pmu virtualization.
> +         *
> +         * If KVM_CAP_PMU_CAPABILITY is supported, "!has_pmu_cap" indicates
> +         * the KVM side has already disabled the pmu virtualization.
> +         */
> +        if (family >= 6 && (!has_pmu_cap || cpu->enable_pmu)) {
> +            has_architectural_pmu_version = 1;
> +
> +            if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
> +                num_architectural_pmu_gp_counters = 6;
> +            } else {
> +                num_architectural_pmu_gp_counters = 4;
> +            }
> +        }
> +    }
> +
>       cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
>   
>       for (i = 0x80000000; i <= limit; i++) {
> @@ -3326,7 +3352,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>               kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
>           }
>   
> -        if (has_architectural_pmu_version > 0) {
> +        if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>               if (has_architectural_pmu_version > 1) {
>                   /* Stop the counter.  */
>                   kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> @@ -3357,6 +3383,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>                                     env->msr_global_ctrl);
>               }
>           }
> +
> +        if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
> +            uint32_t sel_base = MSR_K7_EVNTSEL0;
> +            uint32_t ctr_base = MSR_K7_PERFCTR0;
> +            uint32_t step = 1;
> +
> +            if (num_architectural_pmu_gp_counters == 6) {
> +                sel_base = MSR_F15H_PERF_CTL0;
> +                ctr_base = MSR_F15H_PERF_CTR0;
> +                step = 2;
> +            }
> +
> +            for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> +                kvm_msr_entry_add(cpu, ctr_base + i * step,
> +                                  env->msr_gp_counters[i]);
> +                kvm_msr_entry_add(cpu, sel_base + i * step,
> +                                  env->msr_gp_evtsel[i]);
> +            }
> +        }
> +
>           /*
>            * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
>            * only sync them to KVM on the first cpu
> @@ -3817,7 +3863,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>       if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) {
>           kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
>       }
> -    if (has_architectural_pmu_version > 0) {
> +    if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>           if (has_architectural_pmu_version > 1) {
>               kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>               kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> @@ -3833,6 +3879,25 @@ static int kvm_get_msrs(X86CPU *cpu)
>           }
>       }
>   
> +    if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
> +        uint32_t sel_base = MSR_K7_EVNTSEL0;
> +        uint32_t ctr_base = MSR_K7_PERFCTR0;
> +        uint32_t step = 1;
> +
> +        if (num_architectural_pmu_gp_counters == 6) {
> +            sel_base = MSR_F15H_PERF_CTL0;
> +            ctr_base = MSR_F15H_PERF_CTR0;
> +            step = 2;
> +        }
> +
> +        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> +            kvm_msr_entry_add(cpu, ctr_base + i * step,
> +                              env->msr_gp_counters[i]);
> +            kvm_msr_entry_add(cpu, sel_base + i * step,
> +                              env->msr_gp_evtsel[i]);
> +        }
> +    }
> +
>       if (env->mcg_cap) {
>           kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
>           kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
> @@ -4118,6 +4183,20 @@ static int kvm_get_msrs(X86CPU *cpu)
>           case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
>               env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
>               break;
> +        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3:
> +            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
> +            break;
> +        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3:
> +            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
> +            break;
> +        case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb:
> +            index = index - MSR_F15H_PERF_CTL0;
> +            if (index & 0x1) {
> +                env->msr_gp_counters[index / 2] = msrs[i].data;
> +            } else {
> +                env->msr_gp_evtsel[index / 2] = msrs[i].data;
> +            }
> +            break;
>           case HV_X64_MSR_HYPERCALL:
>               env->msr_hv_hypercall = msrs[i].data;
>               break;
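
The put and get loops above address counter i at base + i * step, with step = 2 for the MSR_F15H_* range. Below is a minimal sketch of the inverse mapping, from an MSR address in that range back to a counter slot, assuming the six-counter layout from the patch's PERFCORE branch. The names f15h_decode() and struct pmu_slot are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_F15H_PERF_CTL0  0xc0010200u  /* base address from the patch */
#define F15H_COUNTERS       6u           /* assumed from the PERFCORE branch */

/* Hypothetical result type: which counter and which register of the pair. */
struct pmu_slot {
    unsigned counter;  /* counter number, 0..F15H_COUNTERS-1 */
    int is_counter;    /* 1 for PERF_CTRn, 0 for PERF_CTLn */
};

/* Hypothetical helper: decode an MSR in the interleaved CTL/CTR range. */
static struct pmu_slot f15h_decode(uint32_t msr)
{
    /* Unsigned subtraction wraps, so addresses below the base fail too. */
    uint32_t off = msr - MSR_F15H_PERF_CTL0;
    struct pmu_slot s;

    assert(off < 2u * F15H_COUNTERS);
    s.counter = off / 2u;            /* each counter owns two MSRs */
    s.is_counter = (int)(off & 1u);  /* odd offsets are the CTR half */
    return s;
}
```

With this mapping, MSR 0xc0010205 resolves to the PERF_CTR of counter 2, which is the same slot, msr_gp_counters[2], that the put loop writes to that address.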


