From: Dongli Zhang <dongli.zhang@oracle.com>
To: Like Xu <like.xu.linux@gmail.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mtosatti@redhat.com,
joe.jin@oracle.com, zhenyuw@linux.intel.com, groug@kaod.org,
lyan@digitalocean.com
Subject: Re: [PATCH RESEND v2 2/2] target/i386/kvm: get and put AMD pmu registers
Date: Mon, 3 Jul 2023 16:20:18 -0700
Message-ID: <25ac4ae4-cdb9-166e-88b1-df9af247c6c9@oracle.com>
In-Reply-To: <CAA3+yLdC0Vo-cMuPh++U1-u=eROOG-YdaPrfCMUF+TgErtwj4w@mail.gmail.com>
Hi Like,
On 7/2/23 07:15, Like Xu wrote:
> On Wed, Jun 21, 2023 at 9:39 AM Dongli Zhang <dongli.zhang@oracle.com> wrote:
>>
>> The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
>> side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
>> the KVM side.
>>
>> However, only the Intel gp/fixed/global pmu registers are involved. There
>> is no implementation for the AMD pmu registers. The
>> 'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
>> calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
>> AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
>> the CPU version.
>
> Updating the relevant documentation to clarify this part of the deficiency
> would be a good first step.
Would you mind suggesting where to document this TODO/deficiency?
The only place I can find is to add a new TODO under docs/system/i386, but I am
not sure it is worth it, since this bugfix is not complex.
>
>>
>> This patch adds support for the AMD version=1 pmu, to get and put the AMD
>> pmu registers. Otherwise, there will be a bug:
>
> AMD version=1 ?
>
> AMD does not have a version 1; it starts directly at 2, perhaps because of
> x86 compatibility. AMD also does not have the so-called architectural pmu.
> Maybe 'has_architectural_pmu_version' needs to be renamed for AMD.
>
Thank you very much for the explanation. I will use version 2.
> It might be more helpful to add similar support for AMD PerfMonV2.
Yes, I will do that. When I sent this patch, the AMD PerfMonV2 KVM patchset
(from you) was still in progress.
I see the pull request from Paolo today. I will add PerfMonV2 support in v3;
a rough sketch of the detection is below.
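For reference, here is a rough sketch of what the v3 detection might look
like. This is only my illustration, assuming CPUID leaf 0x80000022 reports
PerfMonV2 in EAX bit 0 and the number of core counters in EBX[3:0] (my
reading of the APM, not the final patch):

    /*
     * Hypothetical v3 sketch, not merged code: detect AMD PerfMonV2 via
     * CPUID leaf 0x80000022 (assumed layout: EAX bit 0 = PerfMonV2,
     * EBX[3:0] = number of core performance counters).
     */
    uint32_t eax, ebx, ecx, edx;

    cpu_x86_cpuid(env, 0x80000022, 0, &eax, &ebx, &ecx, &edx);
    if (eax & 0x1) {
        has_architectural_pmu_version = 2;
        num_architectural_pmu_gp_counters = ebx & 0xf;
    }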
>
>>
>> 1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>> is running "perf top". The pmu registers are not disabled gracefully.
>>
>> 2. Although x86_cpu_reset() resets many registers to zero, kvm_put_msrs()
>> does not put the AMD pmu registers to the KVM side. As a result, some pmu
>> events are still enabled at the KVM side.
>
> I agree that we should have done that, especially if guest pmu is enabled
> on the AMD platforms.
>
>>
>> 3. The KVM pmc_speculative_in_use() always returns true so that the events
>> will not be reclaimed. The kvm_pmc->perf_event is still active.
>>
>> 4. After the reboot, the VM kernel reports below error:
>>
>> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
>> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
>>
>> 5. In a worse case, the active kvm_pmc->perf_event is still able to
>> inject unknown NMIs randomly into the VM kernel.
>>
>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>>
>> The patch fixes the issue by resetting the AMD pmu registers during the
>> reset.
>
> I'm not sure if the qemu_reset or VM kexec will necessarily trigger
> kvm::amd_pmu_reset().
According to the mainline linux kernel:

kvm_vcpu_reset()
-> kvm_pmu_reset()
   -> amd_pmu_reset()

The PMU is not reset when init_event==true, that is, when an INIT is being
processed (see the check at line 12049 below):
11975 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
11976 {
... ...
12049         if (!init_event) {
12050                 kvm_pmu_reset(vcpu);
12051                 vcpu->arch.smbase = 0x30000;
12052
12053                 vcpu->arch.msr_misc_features_enables = 0;
12054                 vcpu->arch.ia32_misc_enable_msr = MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
12055                                                   MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
12056
12057                 __kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
12058                 __kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);
12059         }
According to the commit below ...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d28bc9dd25ce023270d2e039e7c98d38ecbf7758
... "INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP."
That is why I initially did not send a KVM patch to remove the 'init_event' check.
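For context, below is a simplified sketch of what amd_pmu_reset() does once
it is reached, based on my reading of arch/x86/kvm/svm/pmu.c around that
time (details and field names may differ): each gp counter's perf_event is
released and its state cleared.

    static void amd_pmu_reset(struct kvm_vcpu *vcpu)
    {
            struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
            int i;

            for (i = 0; i < KVM_AMD_PMC_MAX_GENERIC; i++) {
                    struct kvm_pmc *pmc = &pmu->gp_counters[i];

                    /* release the active kvm_pmc->perf_event */
                    pmc_stop_counter(pmc);
                    pmc->counter = pmc->eventsel = 0;
            }
    }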
>
>>
>> Cc: Joe Jin <joe.jin@oracle.com>
>> Cc: Like Xu <likexu@tencent.com>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>> target/i386/cpu.h | 5 +++
>> target/i386/kvm/kvm.c | 83 +++++++++++++++++++++++++++++++++++++++++--
>> 2 files changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index cd047e0410..b8ba72e87a 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -471,6 +471,11 @@ typedef enum X86Seg {
>> #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
>> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
>>
>> +#define MSR_K7_EVNTSEL0 0xc0010000
>> +#define MSR_K7_PERFCTR0 0xc0010004
>> +#define MSR_F15H_PERF_CTL0 0xc0010200
>> +#define MSR_F15H_PERF_CTR0 0xc0010201
>> +
>> #define MSR_MC0_CTL 0x400
>> #define MSR_MC0_STATUS 0x401
>> #define MSR_MC0_ADDR 0x402
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index bf4136fa1b..a0f7273dad 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -2084,6 +2084,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
>> }
>> }
>>
>> + /*
>> + * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
>> + * disable the AMD pmu virtualization.
>> + *
>> + * If KVM_CAP_PMU_CAPABILITY is supported, kvm_state->pmu_cap_disabled
>> + * indicates the KVM side has already disabled the pmu virtualization.
>> + */
>> + if (IS_AMD_CPU(env) && !cs->kvm_state->pmu_cap_disabled) {
>> + int64_t family;
>> +
>> + family = (env->cpuid_version >> 8) & 0xf;
>> + if (family == 0xf) {
>> + family += (env->cpuid_version >> 20) & 0xff;
>> + }
>> +
>> + if (family >= 6) {
>> + has_architectural_pmu_version = 1;
>> +
>> + if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
>> + num_architectural_pmu_gp_counters = 6;
>
> Please make the code a little more readable with some macro definitions.
>
> #define AMD64_NUM_COUNTERS 4
> #define AMD64_NUM_COUNTERS_CORE 6
I will do that. Thank you very much!
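As a sketch, the hunk above would then read (macro values taken straight
from your suggestion, mirroring KVM's own definitions):

    #define AMD64_NUM_COUNTERS      4
    #define AMD64_NUM_COUNTERS_CORE 6

    if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
        num_architectural_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
    } else {
        num_architectural_pmu_gp_counters = AMD64_NUM_COUNTERS;
    }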
>
>> + } else {
>> + num_architectural_pmu_gp_counters = 4;
>> + }
>> + }
>> + }
>> +
>> cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
>>
>> for (i = 0x80000000; i <= limit; i++) {
>> @@ -3438,7 +3464,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
>> }
>>
>> - if (has_architectural_pmu_version > 0) {
>> + if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>
> It seems to be saying here that it is possible to perform kvm_msr_entry_add()
> for the Intel-related pmu on AMD platforms. So the reverse is expected to be
> feasible as well. Why is the distinction needed?
My fault. I always used something like the below when emulating AMD on an
Intel host:

-cpu EPYC,vendor="AuthenticAMD"

I should consider the AMD-on-Intel and Intel-on-AMD cases more carefully.
I will address that in v3.
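As a side note, here is a small standalone demo (my illustration, not QEMU
code; QEMU's IS_AMD_CPU()/IS_INTEL_CPU() macros compare the cached cpuid
vendor dwords directly) of why vendor="AuthenticAMD" makes the guest look
like AMD regardless of the host:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX order. */
    static int is_amd_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx)
    {
        char vendor[13];

        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';

        return strcmp(vendor, "AuthenticAMD") == 0;
    }

    int main(void)
    {
        /* the dwords a guest started with vendor="AuthenticAMD" reports */
        printf("%d\n", is_amd_vendor(0x68747541, 0x444d4163, 0x69746e65));
        return 0;
    }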
>
>> if (has_architectural_pmu_version > 1) {
>> /* Stop the counter. */
>> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>> @@ -3469,6 +3495,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>> env->msr_global_ctrl);
>> }
>> }
>> +
>> + if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
>> + uint32_t sel_base = MSR_K7_EVNTSEL0;
>> + uint32_t ctr_base = MSR_K7_PERFCTR0;
>> + uint32_t step = 1;
>> +
>> + if (num_architectural_pmu_gp_counters == 6) {
>> + sel_base = MSR_F15H_PERF_CTL0;
>> + ctr_base = MSR_F15H_PERF_CTR0;
>> + step = 2;
>> + }
>> +
>> + for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
>> + kvm_msr_entry_add(cpu, ctr_base + i * step,
>> + env->msr_gp_counters[i]);
>> + kvm_msr_entry_add(cpu, sel_base + i * step,
>> + env->msr_gp_evtsel[i]);
>> + }
>> + }
>> +
>> /*
>> * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
>> * only sync them to KVM on the first cpu
>> @@ -3929,7 +3975,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>> if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) {
>> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
>> }
>> - if (has_architectural_pmu_version > 0) {
>> + if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
>> if (has_architectural_pmu_version > 1) {
>> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
>> @@ -3945,6 +3991,25 @@ static int kvm_get_msrs(X86CPU *cpu)
>> }
>> }
>>
>> + if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
>> + uint32_t sel_base = MSR_K7_EVNTSEL0;
>> + uint32_t ctr_base = MSR_K7_PERFCTR0;
>> + uint32_t step = 1;
>> +
>> + if (num_architectural_pmu_gp_counters == 6) {
>> + sel_base = MSR_F15H_PERF_CTL0;
>> + ctr_base = MSR_F15H_PERF_CTR0;
>> + step = 2;
>
> Perhaps it would be more helpful to add a little commentary on why
> step 2 is needed.
Sure, I will do that. The step of "2" is because the F15h CTL and CTR MSRs
are interleaved: MSR_F15H_PERF_CTL1 is MSR_F15H_PERF_CTL0 + 2.

#define MSR_F15H_PERF_CTL0 0xc0010200
#define MSR_F15H_PERF_CTR0 0xc0010201
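Below is a minimal standalone demo of that interleaved layout (my
illustration only): counter i's select/counter pair sits at base + 2 * i.

    #include <stdio.h>

    #define MSR_F15H_PERF_CTL0 0xc0010200
    #define MSR_F15H_PERF_CTR0 0xc0010201

    int main(void)
    {
        int i;

        /* the CTL and CTR MSRs alternate, hence the step of 2 */
        for (i = 0; i < 6; i++) {
            printf("counter %d: CTL=0x%x CTR=0x%x\n",
                   i, MSR_F15H_PERF_CTL0 + i * 2, MSR_F15H_PERF_CTR0 + i * 2);
        }

        return 0;
    }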
Thank you very much for your feedback. I will address those in v3.
Dongli Zhang
>
>> + }
>> +
>> + for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
>> + kvm_msr_entry_add(cpu, ctr_base + i * step,
>> + env->msr_gp_counters[i]);
>> + kvm_msr_entry_add(cpu, sel_base + i * step,
>> + env->msr_gp_evtsel[i]);
>> + }
>> + }
>> +
>> if (env->mcg_cap) {
>> kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
>> kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
>> @@ -4230,6 +4295,20 @@ static int kvm_get_msrs(X86CPU *cpu)
>> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
>> env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
>> break;
>> + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3:
>> + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
>> + break;
>> + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3:
>> + env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
>> + break;
>> + case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb:
>> + index = index - MSR_F15H_PERF_CTL0;
>> + if (index & 0x1) {
>> + env->msr_gp_counters[index] = msrs[i].data;
>> + } else {
>> + env->msr_gp_evtsel[index] = msrs[i].data;
>> + }
>> + break;
>> case HV_X64_MSR_HYPERCALL:
>> env->msr_hv_hypercall = msrs[i].data;
>> break;
>> --
>> 2.34.1
>>