From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: "Chen, Zide" <zide.chen@intel.com>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Jim Mattson <jmattson@google.com>,
Mingwei Zhang <mizhang@google.com>,
Das Sandipan <Sandipan.Das@amd.com>,
Shukla Manali <Manali.Shukla@amd.com>,
Falcon Thomas <thomas.falcon@intel.com>,
Xudong Hao <xudong.hao@intel.com>
Subject: Re: [PATCH V2 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
Date: Wed, 6 May 2026 09:36:10 +0800
Message-ID: <3af0b9d4-f708-4225-9480-cb7080406ca0@linux.intel.com>
In-Reply-To: <7d5ca8ac-2118-4d70-b70f-9188cf36f40a@intel.com>
On 5/1/2026 1:54 AM, Chen, Zide wrote:
>
> On 4/29/2026 7:19 PM, Mi, Dapeng wrote:
>> On 4/24/2026 1:46 AM, Zide Chen wrote:
>>> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>>
>>> Starting with Ice Lake, Intel introduces fixed counter 3, which counts
>>> TOPDOWN.SLOTS - the number of available slots for an unhalted logical
>>> processor. It serves as the denominator for top-level metrics in the
>>> Top-down Microarchitecture Analysis method.
>>>
>>> Emulating this counter on legacy vPMU would require introducing a new
>>> generic perf encoding for the Intel-specific TOPDOWN.SLOTS event in
>>> order to call perf_get_hw_event_config(). This is undesirable as it
>>> would pollute the generic perf event encoding.
>>>
>>> Moreover, KVM does not intend to emulate IA32_PERF_METRICS in the
>>> legacy vPMU model, and without IA32_PERF_METRICS, emulating this
>>> counter has little practical value. Therefore, expose fixed counter
>>> 3 to guests only when mediated vPMU is enabled.
>>>
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> Co-developed-by: Zide Chen <zide.chen@intel.com>
>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>> ---
>>> V2:
>>> - Don't advertise fixed counter 3 to userspace if the host doesn't
>>> support it.
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 2 +-
>>> arch/x86/kvm/cpuid.c | 9 +++++++--
>>> arch/x86/kvm/pmu.c | 4 ++++
>>> arch/x86/kvm/x86.c | 4 ++--
>>> 4 files changed, 14 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index c470e40a00aa..cb736a4c72ea 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -556,7 +556,7 @@ struct kvm_pmc {
>>> #define KVM_MAX_NR_GP_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
>>> KVM_MAX_NR_AMD_GP_COUNTERS)
>>>
>>> -#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 3
>>> +#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 4
>>> #define KVM_MAX_NR_AMD_FIXED_COUNTERS 0
>>> #define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
>>> KVM_MAX_NR_AMD_FIXED_COUNTERS)
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index e69156b54cff..d87a26f740e5 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -1505,7 +1505,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>>> break;
>>> case 0xa: { /* Architectural Performance Monitoring */
>>> union cpuid10_eax eax = { };
>>> - union cpuid10_edx edx = { };
>>> + union cpuid10_edx edx = { }, host_edx;
>>>
>>> if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
>>> entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
>>> @@ -1516,9 +1516,14 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>>> eax.split.num_counters = kvm_pmu_cap.num_counters_gp;
>>> eax.split.bit_width = kvm_pmu_cap.bit_width_gp;
>>> eax.split.mask_length = kvm_pmu_cap.events_mask_len;
>>> - edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed;
>>> edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed;
>>>
>>> + /* Guest does not support non-contiguous fixed counters. */
>>> + host_edx = (union cpuid10_edx)entry->edx;
>>> + edx.split.num_counters_fixed =
>>> + min_t(int, kvm_pmu_cap.num_counters_fixed,
>>> + host_edx.split.num_counters_fixed);
>> kvm_pmu_cap is derived from kvm_host_pmu, which already reflects the host's
>> fixed counter count. Why is the host's fixed counter count checked again here?
> This stems from KVM not supporting non-contiguous fixed counters on the
> guest.
>
> On CWF, the fixed counter mask is 0x77 and the number of contiguous
> fixed counters is 3. kvm_host_pmu.num_counters_fixed is 6 from the host,
> and in kvm_pmu_cap it's capped to KVM_MAX_NR_INTEL_FIXED_COUNTERS
> without accounting for non-contiguity:
>
> memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
> kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
> KVM_MAX_NR_FIXED_COUNTERS);
>
> It would be more natural to check against the host's contiguous fixed
> counter count in kvm_init_pmu_capability(), but I placed it in cpuid.c
> to leverage do_host_cpuid().
>
> A more complete fix would be to pull in some PerfmonExt patches to add
> fixed/GP counter mask support in kvm_host_pmu, and filter out
> non-contiguous counters in kvm_init_pmu_capability(). But that would
> require too much "temporary" code to translate between
> nr_of_xxx_counters and xxx_counter_mask.
I see. Pulling the PerfmonExt patches into this patchset is probably not a
good choice given their size. It would be better to move this code into
kvm_init_pmu_capability(), which is the more natural place for it, and to
add a comment explaining why the check is needed. Thanks.
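
For illustration only, a minimal sketch of the contiguity check being
discussed, assuming a fixed counter bitmap were available at that point
(it is not in kvm_host_pmu today). The names fixed_mask,
nr_contiguous_fixed() and cap_fixed_counters() are hypothetical and are
not existing KVM code:

```c
#include <stdint.h>

/*
 * Hypothetical helper: count the fixed counters that are contiguous
 * starting from fixed counter 0, i.e. the number of trailing set bits
 * in the fixed counter bitmap.
 */
static unsigned int nr_contiguous_fixed(uint64_t fixed_mask)
{
	unsigned int n = 0;

	while (fixed_mask & 1) {
		n++;
		fixed_mask >>= 1;
	}
	return n;
}

/* Hypothetical: cap the advertised count by KVM's limit and by contiguity. */
static unsigned int cap_fixed_counters(uint64_t fixed_mask, unsigned int kvm_max)
{
	unsigned int n = nr_contiguous_fixed(fixed_mask);

	return n < kvm_max ? n : kvm_max;
}
```

With the CWF mask 0x77 and a limit of 4, this yields 3, so fixed counter 3
would not be advertised; with a mask of 0xf it yields 4.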
>
>
>> Besides, we can't rely only on the number of fixed counters to check whether
>> fixed counter 3 is supported on the host, e.g., CWF supports fixed counters 4, 5
>> and 6 but doesn't support fixed counter 3. Until KVM supports the PerfmonExt
>> (0x23) CPUID leaves, we need to read CPUID.0xa.ecx to get the real fixed
>> counter bitmap and then check whether fixed counter 3 is supported.
> This is a theoretical concern even without fixed counter 3 support.
> Before this patch, KVM supports up to 3 fixed counters and assumes they
> are contiguous, which holds true in practice.
>
> CPUID.0xa.ecx is only meaningful starting from PMU v4, so it can't be
> used unconditionally. However, CPUID.0xa.edx[4:0] always represents the
> number of contiguous fixed counters, so checking against it is
> sufficient to filter out non-contiguous ones.
>
>> Thanks.
>>
>>
>>> +
>>> if (kvm_pmu_cap.version)
>>> edx.split.anythread_deprecated = 1;
>>>
>>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>>> index e218352e3423..9ff4a6a9cd0b 100644
>>> --- a/arch/x86/kvm/pmu.c
>>> +++ b/arch/x86/kvm/pmu.c
>>> @@ -148,12 +148,16 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
>>> }
>>>
>>> memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
>>> +
>>> kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
>>> kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
>>> pmu_ops->MAX_NR_GP_COUNTERS);
>>> kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
>>> KVM_MAX_NR_FIXED_COUNTERS);
>>>
>>> + if (!enable_mediated_pmu && kvm_pmu_cap.num_counters_fixed > 3)
>>> + kvm_pmu_cap.num_counters_fixed = 3;
>>> +
>>> kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
>>> perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
>>> kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 0a1b63c63d1a..604072d9354f 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -360,7 +360,7 @@ static const u32 msrs_to_save_base[] = {
>>>
>>> static const u32 msrs_to_save_pmu[] = {
>>> MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
>>> - MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
>>> + MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
>>> MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
>>> MSR_CORE_PERF_GLOBAL_CTRL,
>>> MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
>>> @@ -7756,7 +7756,7 @@ static void kvm_init_msr_lists(void)
>>> {
>>> unsigned i;
>>>
>>> - BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
>>> + BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 4,
>>> "Please update the fixed PMCs in msrs_to_save_pmu[]");
>>>
>>> num_msrs_to_save = 0;