qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: dongli.zhang@oracle.com
To: Maksim Davydov <davydov-max@yandex-team.ru>
Cc: pbonzini@redhat.com, mtosatti@redhat.com, sandipan.das@amd.com,
	babu.moger@amd.com, zhao1.liu@intel.com, likexu@tencent.com,
	like.xu.linux@gmail.com, zhenyuw@linux.intel.com, groug@kaod.org,
	lyan@digitalocean.com, khorenko@virtuozzo.com,
	alexander.ivanov@virtuozzo.com, den@virtuozzo.com,
	joe.jin@oracle.com, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [PATCH 5/7] target/i386/kvm: reset AMD PMU registers during VM reset
Date: Fri, 8 Nov 2024 10:04:41 -0800	[thread overview]
Message-ID: <659a895c-3c0b-47c7-81fa-6fb73ba1d2b6@oracle.com> (raw)
In-Reply-To: <94e089fb-4fc3-4320-897e-e8146a226109@yandex-team.ru>

Hi Maksim,

On 11/8/24 6:07 AM, Maksim Davydov wrote:
> 
> 

[snip]

>>>> +
>>>> +    num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
>>>> +}
>>>
>>> It seems that AMD implementation has one issue.
>>> KVM has parameter `enable_pmu`. So vPMU can be disabled in another way, not only
>>> via KVM_PMU_CAP_DISABLE. For Intel it's not a problem, because the vPMU
>>> initialization uses info from KVM_GET_SUPPORTED_CPUID. The enable_pmu state is
>>> reflected in KVM_GET_SUPPORTED_CPUID.  Thus no PMU MSRs in kvm_put_msrs/
>>> kvm_get_msrs will be used.
>>>
>>> But on AMD we don't use information from KVM_GET_SUPPORTED_CPUID to set an
>>> appropriate number of PMU registers. So, if vPMU is disabled by KVM parameter
>>> `enable_pmu` and pmu-cap-disable=false, then has_pmu_version will be 1 after
>>> kvm_init_pmu_info_amd execution. It means that in kvm_put_msrs/kvm_get_msrs 4
>>> PMU counters will be processed, but the correct behavior in that situation is to
>>> skip all PMU registers.
>>> I think we should get info from KVM to fix that.
>>>
>>> I tested this series on Zen2 and found that PMU MSRs were still processed during
>>> initialization even with enable_pmu=N. But it doesn't lead to any errors in QEMU
>>
>> Thank you very much for the feedback and helping catch the bug!
>>
>> When enable_pmu=N, the QEMU (with this patchset) cannot tell if vPMU is
>> supported via KVM_CAP_PMU_CAPABILITY.
>>
>> As it cannot disable the PMU, it falls to the legacy 4 counters.
>>
>> It falls to 4 counters because KVM disableds PERFCORE on enable_pmu=Y, i.e.,
>>
>> 5220         if (enable_pmu) {
>> 5221                 /*
>> 5222                  * Enumerate support for PERFCTR_CORE if and only if KVM has
>> 5223                  * access to enough counters to virtualize "core" support,
>> 5224                  * otherwise limit vPMU support to the legacy number of
>> counters.
>> 5225                  */
>> 5226                 if (kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS_CORE)
>> 5227                         kvm_pmu_cap.num_counters_gp =
>> min(AMD64_NUM_COUNTERS,
>> 5228
>> kvm_pmu_cap.num_counters_gp);
>> 5229                 else
>> 5230                         kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE);
>> 5231
>> 5232                 if (kvm_pmu_cap.version != 2 ||
>> 5233                     !kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
>> 5234                         kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2);
>> 5235         }
>>
>>
>> During the bootup and reset, the QEMU (with this patchset) erroneously resets
>> MSRs for the 4 PMCs, via line 3827.
>>
>> 3825 static int kvm_buf_set_msrs(X86CPU *cpu)
>> 3826 {
>> 3827     int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
>> 3828     if (ret < 0) {
>> 3829         return ret;
>> 3830     }
>> 3831
>> 3832     if (ret < cpu->kvm_msr_buf->nmsrs) {
>> 3833         struct kvm_msr_entry *e = &cpu->kvm_msr_buf->entries[ret];
>> 3834         error_report("error: failed to set MSR 0x%" PRIx32 " to 0x%" PRIx64,
>> 3835                      (uint32_t)e->index, (uint64_t)e->data);
>> 3836     }
>> 3837
>> 3838     assert(ret == cpu->kvm_msr_buf->nmsrs);
>> 3839     return 0;
>> 3840 }
>>
>> Because enable_pmu=N, the KVM doesn't support those registers. However, it
>> returns 0 (not 1), because the KVM does nothing in the implicit else (i.e., line
>> 4144).
>>
>> 3847 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 3848 {
>> ... ...
>> 4138         case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
>> 4139         case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
>> 4140         case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
>> 4141         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
>> 4142                 if (kvm_pmu_is_valid_msr(vcpu, msr))
>> 4143                         return kvm_pmu_set_msr(vcpu, msr_info);
>> 4144
>> 4145                 if (data)
>> 4146                         kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>> 4147                 break;
>> ... ...
>> 4224         default:
>> 4225                 if (kvm_pmu_is_valid_msr(vcpu, msr))
>> 4226                         return kvm_pmu_set_msr(vcpu, msr_info);
>> 4227
>> 4228                 /*
>> 4229                  * Userspace is allowed to write '0' to MSRs that KVM
>> reports
>> 4230                  * as to-be-saved, even if an MSRs isn't fully supported.
>> 4231                  */
>> 4232                 if (msr_info->host_initiated && !data &&
>> 4233                     kvm_is_msr_to_save(msr))
>> 4234                         break;
>> 4235
>> 4236                 return KVM_MSR_RET_INVALID;
>> 4237         }
>> 4238         return 0;
>> 4239 }
>> 4240 EXPORT_SYMBOL_GPL(kvm_set_msr_common);
>>
>> Fortunately, it returns 0 at line 4238. No error is detected by QEMU.
>>
>> Perhaps I may need to send message with a small patch to return 1 in the
>> implicit 'else' to kvm mailing list to confirm if that is expected.
>>
>> However, the answer is very likely 'expected', because line 4229 to line 4230
>> already explain it.
>>
> 
> Sorry for confusing you. My fault.
> I tested the previous series on Intel with the old kernel and QEMU failed with
> `error: failed to set MSR 0x38d to 0x0`. So I expected the same error.
> But as I can see, AMD PMU registers are processed differently than the Intel
> ones. Also the default MSR behavior in KVM has been changed since 2de154f541fc

I think the AMD PMU registers are treated equally with Intel ones as by the
commit 2de154f541fc. Both Intel and AMD PMU registers are in msrs_to_save_pmu[].

The objective was "to avoid spurious unsupported accesses".

> 
> I think that the current implementation with additional parameter pmu-cap-
> disabled does what we expect. The guest will see disabled PMU in the same two
> configurations:
> * pmu-cap-disabled=true and enabled_pmu=N
> * pmu-cap-disabled=true and enabled_pmu=Y
> But in QEMU these two configurations will have different states (has_pmu_version
> 1 and 0 respectively). I think it should be taken into account in the

Although the unsupported MSR write doesn't trigger any issue (thanks to
msrs_to_save_pmu[]), I agree this is the bug that I will address in v2.

Thanks to the reminder, indeed I have noticed another issue to be addressed in
v2: something unexpected may happen if we migrate from old KVM to new KVM
(assuming same QEMU versions).

Suppose one user never notice "-pmu" doesn't work on old AMD KVM, but still add
"-pmu" to QEMU command line.

old AMD KVM: "-pmu" doesn't take effect, due to the lack of KVM_CAP_PMU_CAPABILITY.

new AMD KVM: "-pmu" takes effect.

After the migration, the vPMU won't work any longer from guest's perspective.

> implementation without pmu-cap-disabled (which was suggested before) to save
> guest-visible state during migration.

Yes, I am going to revert back to my previous solution with "-pmu".

Thanks everyone's suggestion on "-pmu" vs. "pmu-cap-disabled". To finalize the
decision helps move forward.


Would you mind clarify "without pmu-cap-disabled (which was suggested before) to
save guest-visible state during migration."?

Would you mean the compatibility issue between old QEMU version (without
"pmu-cap-disabled") and new QEMU version (with "pmu-cap-disabled")?

Thank you very much!

Dongli Zhang


  reply	other threads:[~2024-11-08 18:05 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-04  9:40 [PATCH 0/7] target/i386/kvm/pmu: Enhancement, Bugfix and Cleanup Dongli Zhang
2024-11-04  9:40 ` [PATCH 1/7] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2024-11-06  3:54   ` Zhao Liu
2024-11-07  0:29     ` dongli.zhang
2024-11-07  7:57       ` Zhao Liu
2024-11-04  9:40 ` [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE Dongli Zhang
2024-11-07  7:52   ` Zhao Liu
2024-11-07 23:44     ` dongli.zhang
2024-11-08  2:32       ` Zhao Liu
2024-11-08 12:52       ` Sandipan Das
2024-11-13 17:15       ` Zhao Liu
2024-11-14  0:13         ` dongli.zhang
2024-11-21 10:06       ` Mi, Dapeng
2025-02-07  9:52         ` Mi, Dapeng
2025-02-09 20:12           ` dongli.zhang
2025-02-10  8:04             ` Mi, Dapeng
2024-11-04  9:40 ` [PATCH 3/7] target/i386/kvm: init PMU information only once Dongli Zhang
2024-11-10 15:29   ` Zhao Liu
2024-11-13  1:50     ` dongli.zhang
2024-11-13 16:48       ` Zhao Liu
2024-11-04  9:40 ` [PATCH 4/7] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2024-11-04  9:40 ` [PATCH 5/7] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2024-11-06  9:58   ` Sandipan Das
2024-11-07  0:33     ` dongli.zhang
2024-11-07 21:00   ` Maksim Davydov
2024-11-08  1:19     ` dongli.zhang
2024-11-08 14:07       ` Maksim Davydov
2024-11-08 18:04         ` dongli.zhang [this message]
2024-11-04  9:40 ` [PATCH 6/7] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2024-11-08 13:09   ` Sandipan Das
2024-11-08 16:55     ` dongli.zhang
2024-11-04  9:40 ` [PATCH 7/7] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=659a895c-3c0b-47c7-81fa-6fb73ba1d2b6@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=alexander.ivanov@virtuozzo.com \
    --cc=babu.moger@amd.com \
    --cc=davydov-max@yandex-team.ru \
    --cc=den@virtuozzo.com \
    --cc=groug@kaod.org \
    --cc=joe.jin@oracle.com \
    --cc=khorenko@virtuozzo.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=likexu@tencent.com \
    --cc=lyan@digitalocean.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sandipan.das@amd.com \
    --cc=zhao1.liu@intel.com \
    --cc=zhenyuw@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).