From: Like Xu <like.xu.linux@gmail.com>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mtosatti@redhat.com,
joe.jin@oracle.com, zhenyuw@linux.intel.com, groug@kaod.org,
lyan@digitalocean.com
Subject: Re: [PATCH RESEND v2 2/2] target/i386/kvm: get and put AMD pmu registers
Date: Sun, 2 Jul 2023 22:15:53 +0800 [thread overview]
Message-ID: <CAA3+yLdC0Vo-cMuPh++U1-u=eROOG-YdaPrfCMUF+TgErtwj4w@mail.gmail.com> (raw)
In-Reply-To: <20230621013821.6874-3-dongli.zhang@oracle.com>
On Wed, Jun 21, 2023 at 9:39 AM Dongli Zhang <dongli.zhang@oracle.com> wrote:
>
> The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
> side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
> the KVM side.
>
> However, only the Intel gp/fixed/global pmu registers are involved. There
> is no implementation for the AMD pmu registers. The
> 'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
> calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
> AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
> the CPU version.
Updating the relevant documentation to clarify this deficiency
would be a good first step.
>
> This patch is to add the support for AMD version=1 pmu, to get and put AMD
> pmu registers. Otherwise, there will be a bug:
AMD version=1 ?
AMD has no PMU version 1; its versioning starts directly at 2 (PerfMonV2),
perhaps for x86 compatibility. AMD also does not define a so-called
architectural PMU, so 'has_architectural_pmu_version' may need to be
renamed for the AMD case. It might be more helpful to add similar
support for AMD PerfMonV2.
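For reference, a rough sketch of what PerfMonV2 detection might look
like (untested; based on my reading of CPUID leaf 0x80000022, where
EAX bit 0 enumerates PerfMonV2 and EBX[3:0] gives the number of core
counters):

    uint32_t eax, ebx, ecx, edx;

    cpu_x86_cpuid(env, 0x80000022, 0, &eax, &ebx, &ecx, &edx);
    if (eax & 0x1) {
        /* PerfMonV2: the number of gp counters comes from CPUID */
        num_architectural_pmu_gp_counters = ebx & 0xf;
    }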
>
> 1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
> is running "perf top". The pmu registers are not disabled gracefully.
>
> 2. Although the x86_cpu_reset() resets many registers to zero, the
> kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a result,
> some pmu events are still enabled at the KVM side.
I agree that we should have done that, especially if the guest PMU is
enabled on AMD platforms.
>
> 3. The KVM pmc_speculative_in_use() always returns true so that the events
> will not be reclaimed. The kvm_pmc->perf_event is still active.
>
> 4. After the reboot, the VM kernel reports below error:
>
> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>
> 5. In a worse case, the active kvm_pmc->perf_event is still able to
> inject unknown NMIs randomly into the VM kernel.
>
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>
> The patch is to fix the issue by resetting AMD pmu registers during the
> reset.
I'm not sure that a QEMU system_reset or VM kexec will necessarily
trigger kvm::amd_pmu_reset().
>
> Cc: Joe Jin <joe.jin@oracle.com>
> Cc: Like Xu <likexu@tencent.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> target/i386/cpu.h | 5 +++
> target/i386/kvm/kvm.c | 83 +++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index cd047e0410..b8ba72e87a 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -471,6 +471,11 @@ typedef enum X86Seg {
> #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
>
> +#define MSR_K7_EVNTSEL0 0xc0010000
> +#define MSR_K7_PERFCTR0 0xc0010004
> +#define MSR_F15H_PERF_CTL0 0xc0010200
> +#define MSR_F15H_PERF_CTR0 0xc0010201
> +
> #define MSR_MC0_CTL 0x400
> #define MSR_MC0_STATUS 0x401
> #define MSR_MC0_ADDR 0x402
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index bf4136fa1b..a0f7273dad 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2084,6 +2084,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
> }
> }
>
> + /*
> + * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
> + * disable the AMD pmu virtualization.
> + *
> + * If KVM_CAP_PMU_CAPABILITY is supported, kvm_state->pmu_cap_disabled
> + * indicates the KVM side has already disabled the pmu virtualization.
> + */
> + if (IS_AMD_CPU(env) && !cs->kvm_state->pmu_cap_disabled) {
> + int64_t family;
> +
> + family = (env->cpuid_version >> 8) & 0xf;
> + if (family == 0xf) {
> + family += (env->cpuid_version >> 20) & 0xff;
> + }
> +
> + if (family >= 6) {
> + has_architectural_pmu_version = 1;
> +
> + if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
> + num_architectural_pmu_gp_counters = 6;
Please make the code a little more readable with some macro
definitions, e.g.:
#define AMD64_NUM_COUNTERS 4
#define AMD64_NUM_COUNTERS_CORE 6
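The assignments could then read something like (untested):

    if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
        num_architectural_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
    } else {
        num_architectural_pmu_gp_counters = AMD64_NUM_COUNTERS;
    }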
> + } else {
> + num_architectural_pmu_gp_counters = 4;
> + }
> + }
> + }
> +
> cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
>
> for (i = 0x80000000; i <= limit; i++) {
> @@ -3438,7 +3464,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
> }
>
> - if (has_architectural_pmu_version > 0) {
> + if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
This seems to say that it was previously possible to perform
kvm_msr_entry_add() for the Intel PMU MSRs on AMD platforms, so the
reverse should be feasible as well. Why is the vendor distinction
needed?
> if (has_architectural_pmu_version > 1) {
> /* Stop the counter. */
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> @@ -3469,6 +3495,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> env->msr_global_ctrl);
> }
> }
> +
> + if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + uint32_t step = 1;
> +
> + if (num_architectural_pmu_gp_counters == 6) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
> + }
> +
> + for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> + kvm_msr_entry_add(cpu, ctr_base + i * step,
> + env->msr_gp_counters[i]);
> + kvm_msr_entry_add(cpu, sel_base + i * step,
> + env->msr_gp_evtsel[i]);
> + }
> + }
> +
> /*
> * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
> * only sync them to KVM on the first cpu
> @@ -3929,7 +3975,7 @@ static int kvm_get_msrs(X86CPU *cpu)
> if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) {
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
> }
> - if (has_architectural_pmu_version > 0) {
> + if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
> if (has_architectural_pmu_version > 1) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> @@ -3945,6 +3991,25 @@ static int kvm_get_msrs(X86CPU *cpu)
> }
> }
>
> + if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + uint32_t step = 1;
> +
> + if (num_architectural_pmu_gp_counters == 6) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
Perhaps it would be helpful to add a short comment on why a step of 2
is needed here.
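Something along these lines, perhaps (wording is only a suggestion):

    /*
     * PERF_CTLn and PERF_CTRn are interleaved: CTL0 at 0xc0010200,
     * CTR0 at 0xc0010201, CTL1 at 0xc0010202, and so on, so each
     * counter is two MSRs apart. The legacy K7 EVNTSEL0..3 and
     * PERFCTR0..3 ranges are each contiguous, hence step = 1 there.
     */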
> + }
> +
> + for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> + kvm_msr_entry_add(cpu, ctr_base + i * step,
> + env->msr_gp_counters[i]);
> + kvm_msr_entry_add(cpu, sel_base + i * step,
> + env->msr_gp_evtsel[i]);
> + }
> + }
> +
> if (env->mcg_cap) {
> kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
> kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
> @@ -4230,6 +4295,20 @@ static int kvm_get_msrs(X86CPU *cpu)
> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
> env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
> break;
> + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3:
> + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
> + break;
> + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3:
> + env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
> + break;
> + case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb:
> + index = index - MSR_F15H_PERF_CTL0;
> + if (index & 0x1) {
> + env->msr_gp_counters[index] = msrs[i].data;
> + } else {
> + env->msr_gp_evtsel[index] = msrs[i].data;
> + }
> + break;
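A similar comment on the even/odd decoding here wouldn't hurt either,
e.g. (a suggestion only):

    /*
     * Offsets within 0xc0010200..0xc001020b alternate between
     * PERF_CTLn (even, event select) and PERF_CTRn (odd, counter).
     */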
> case HV_X64_MSR_HYPERCALL:
> env->msr_hv_hypercall = msrs[i].data;
> break;
> --
> 2.34.1
>