Re: [PATCH V2 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU

From: "Chen, Zide" <zide.chen@intel.com>
To: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jim Mattson <jmattson@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	Das Sandipan <Sandipan.Das@amd.com>,
	Shukla Manali <Manali.Shukla@amd.com>,
	Falcon Thomas <thomas.falcon@intel.com>,
	Xudong Hao <xudong.hao@intel.com>
Subject: Re: [PATCH V2 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU
Date: Thu, 30 Apr 2026 10:54:57 -0700
Message-ID: <7d5ca8ac-2118-4d70-b70f-9188cf36f40a@intel.com>
In-Reply-To: <6d472e6e-ad75-4d0f-8475-469875806cc4@linux.intel.com>



On 4/29/2026 7:19 PM, Mi, Dapeng wrote:
> 
> On 4/24/2026 1:46 AM, Zide Chen wrote:
>> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>
>> Starting with Ice Lake, Intel introduces fixed counter 3, which counts
>> TOPDOWN.SLOTS - the number of available slots for an unhalted logical
>> processor.  It serves as the denominator for top-level metrics in the
>> Top-down Microarchitecture Analysis method.
>>
>> Emulating this counter on legacy vPMU would require introducing a new
>> generic perf encoding for the Intel-specific TOPDOWN.SLOTS event in
>> order to call perf_get_hw_event_config().  This is undesirable as it
>> would pollute the generic perf event encoding.
>>
>> Moreover, KVM does not intend to emulate IA32_PERF_METRICS in the
>> legacy vPMU model, and without IA32_PERF_METRICS, emulating this
>> counter has little practical value.  Therefore, expose fixed counter
>> 3 to guests only when mediated vPMU is enabled.
>>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Co-developed-by: Zide Chen <zide.chen@intel.com>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> V2:
>> - Don't advertise fixed counter 3 to userspace if the host doesn't
>>   support it.
>> ---
>>  arch/x86/include/asm/kvm_host.h | 2 +-
>>  arch/x86/kvm/cpuid.c            | 9 +++++++--
>>  arch/x86/kvm/pmu.c              | 4 ++++
>>  arch/x86/kvm/x86.c              | 4 ++--
>>  4 files changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index c470e40a00aa..cb736a4c72ea 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -556,7 +556,7 @@ struct kvm_pmc {
>>  #define KVM_MAX_NR_GP_COUNTERS		KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
>>  						KVM_MAX_NR_AMD_GP_COUNTERS)
>>  
>> -#define KVM_MAX_NR_INTEL_FIXED_COUNTERS	3
>> +#define KVM_MAX_NR_INTEL_FIXED_COUNTERS	4
>>  #define KVM_MAX_NR_AMD_FIXED_COUNTERS	0
>>  #define KVM_MAX_NR_FIXED_COUNTERS	KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
>>  						KVM_MAX_NR_AMD_FIXED_COUNTERS)
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index e69156b54cff..d87a26f740e5 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -1505,7 +1505,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>>  		break;
>>  	case 0xa: { /* Architectural Performance Monitoring */
>>  		union cpuid10_eax eax = { };
>> -		union cpuid10_edx edx = { };
>> +		union cpuid10_edx edx = { }, host_edx;
>>  
>>  		if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
>>  			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
>> @@ -1516,9 +1516,14 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>>  		eax.split.num_counters = kvm_pmu_cap.num_counters_gp;
>>  		eax.split.bit_width = kvm_pmu_cap.bit_width_gp;
>>  		eax.split.mask_length = kvm_pmu_cap.events_mask_len;
>> -		edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed;
>>  		edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed;
>>  
>> +		/* Guest does not support non-contiguous fixed counters. */
>> +		host_edx = (union cpuid10_edx)entry->edx;
>> +		edx.split.num_counters_fixed =
>> +			 min_t(int, kvm_pmu_cap.num_counters_fixed,
>> +			       host_edx.split.num_counters_fixed);
> 
> kvm_pmu_cap is derived from kvm_host_pmu, which already reflects the
> number of host fixed counters. Why is the host count checked again here?

This is because KVM does not support non-contiguous fixed counters in
the guest.

On CWF, the fixed counter mask is 0x77 (counters 0-2 and 4-6), so only
3 fixed counters are contiguous. The host reports
kvm_host_pmu.num_counters_fixed as 6, and kvm_pmu_cap caps that value to
KVM_MAX_NR_FIXED_COUNTERS (4 with this patch) without accounting for the
gap at counter 3:

	memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
					     KVM_MAX_NR_FIXED_COUNTERS);
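
To make the numbers concrete, here is a minimal standalone sketch (not
kernel code) of why count-only capping goes wrong for a mask like 0x77.
The helper names and MAX_NR_FIXED_COUNTERS are hypothetical, and the
sketch assumes the host count equals the number of bits set in the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for KVM_MAX_NR_FIXED_COUNTERS after this patch. */
#define MAX_NR_FIXED_COUNTERS 4

/* Total number of fixed counters present in the bitmap (population count). */
static unsigned int nr_fixed_total(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Number of fixed counters that are contiguous starting from counter 0. */
static unsigned int nr_fixed_contiguous(uint32_t mask)
{
	unsigned int n = 0;

	while (mask & 1) {
		n++;
		mask >>= 1;
	}
	return n;
}

/* Count-only capping, analogous to the memcpy() + min() sequence above. */
static unsigned int cap_by_count(uint32_t mask)
{
	unsigned int n = nr_fixed_total(mask);

	return n < MAX_NR_FIXED_COUNTERS ? n : MAX_NR_FIXED_COUNTERS;
}

/* Usage example with the CWF mask quoted above. */
static void cwf_example(void)
{
	assert(nr_fixed_total(0x77) == 6);      /* counters 0-2 and 4-6 */
	assert(nr_fixed_contiguous(0x77) == 3); /* gap at counter 3 */
	assert(cap_by_count(0x77) == 4);        /* would imply counter 3 exists */
}
```

With count-only capping the result is 4, which would advertise fixed
counter 3 even though the mask has no bit 3.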

It would be more natural to check against the host's contiguous fixed
counter count in kvm_init_pmu_capability(), but I placed the check in
cpuid.c to reuse the do_host_cpuid() result that is already available
there.

A more complete fix would be to pull in some PerfmonExt patches to add
fixed/GP counter mask support in kvm_host_pmu, and filter out
non-contiguous counters in kvm_init_pmu_capability(). However, that
approach could require too much "temporary" code to translate between
nr_of_xxx_counters and xxx_counter_mask.


> Besides, we can't depend only on the number of fixed counters to check
> whether fixed counter 3 is supported on the host, e.g., CWF supports fixed
> counters 4, 5 and 6 but not fixed counter 3. Until KVM supports the
> PerfmonExt (0x23) CPUID leaves, we need to check CPUID.0xa.ecx to get the
> real fixed counter bitmap and then check whether fixed counter 3 is supported.

This is a theoretical concern even without fixed counter 3 support.
Before this patch, KVM supported up to 3 fixed counters and assumed they
were contiguous, which holds true in practice.

CPUID.0xa.ecx is only meaningful starting from PMU v4, so it can't be
used unconditionally. However, CPUID.0xa.edx[4:0] always represents the
number of contiguous fixed counters, so checking against it is
sufficient to filter out non-contiguous ones.
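
As a rough standalone illustration (not kernel code) of the check
described above, the advertised count can be taken as the smaller of the
capped count and the contiguous count in CPUID.0xa.edx[4:0]. The function
names here are hypothetical; only the EDX[4:0] field layout is taken
from the discussion.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.0xA:EDX[4:0]: number of contiguous fixed counters starting from 0. */
static unsigned int edx_nr_fixed(uint32_t edx)
{
	return edx & 0x1f;
}

/*
 * Hypothetical helper: clamp the already-capped fixed counter count to the
 * host's contiguous count, so counters beyond a gap are never advertised.
 */
static unsigned int guest_nr_fixed(unsigned int capped_nr_fixed,
				   uint32_t host_edx)
{
	unsigned int host_contig = edx_nr_fixed(host_edx);

	return capped_nr_fixed < host_contig ? capped_nr_fixed : host_contig;
}

/* Usage example; the EDX values are made up for illustration. */
static void clamp_example(void)
{
	/* Host with a gap at counter 3: EDX[4:0] = 3, capped count = 4. */
	assert(guest_nr_fixed(4, (48u << 5) | 3) == 3);
	/* Host with counters 0-3 contiguous: EDX[4:0] = 4, capped count = 4. */
	assert(guest_nr_fixed(4, (48u << 5) | 4) == 4);
}
```

The clamp only ever lowers the count, so it does not change anything on
hosts whose fixed counters are already contiguous.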

> Thanks.
> 
> 
>> +
>>  		if (kvm_pmu_cap.version)
>>  			edx.split.anythread_deprecated = 1;
>>  
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index e218352e3423..9ff4a6a9cd0b 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -148,12 +148,16 @@ void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops)
>>  	}
>>  
>>  	memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
>> +
>>  	kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
>>  	kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
>>  					  pmu_ops->MAX_NR_GP_COUNTERS);
>>  	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
>>  					     KVM_MAX_NR_FIXED_COUNTERS);
>>  
>> +	if (!enable_mediated_pmu && kvm_pmu_cap.num_counters_fixed > 3)
>> +		kvm_pmu_cap.num_counters_fixed = 3;
>> +
>>  	kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
>>  		perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
>>  	kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 0a1b63c63d1a..604072d9354f 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -360,7 +360,7 @@ static const u32 msrs_to_save_base[] = {
>>  
>>  static const u32 msrs_to_save_pmu[] = {
>>  	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
>> -	MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
>> +	MSR_ARCH_PERFMON_FIXED_CTR2, MSR_ARCH_PERFMON_FIXED_CTR3,
>>  	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
>>  	MSR_CORE_PERF_GLOBAL_CTRL,
>>  	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
>> @@ -7756,7 +7756,7 @@ static void kvm_init_msr_lists(void)
>>  {
>>  	unsigned i;
>>  
>> -	BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
>> +	BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 4,
>>  			 "Please update the fixed PMCs in msrs_to_save_pmu[]");
>>  
>>  	num_msrs_to_save = 0;

Thread overview: 12+ messages
2026-04-23 17:46 [PATCH V2 0/4] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-04-23 17:46 ` [PATCH V2 1/4] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
2026-04-30  1:55   ` Mi, Dapeng
2026-04-23 17:46 ` [PATCH V2 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
2026-04-30  2:19   ` Mi, Dapeng
2026-04-30 17:54     ` Chen, Zide [this message]
2026-05-06  1:36       ` Mi, Dapeng
2026-04-23 17:46 ` [PATCH V2 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
2026-04-30  2:22   ` Mi, Dapeng
2026-04-23 17:46 ` [PATCH V2 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests Zide Chen
2026-04-30  2:26   ` Mi, Dapeng
2026-04-30 18:13     ` Chen, Zide