Message-ID: <9919b426-4ce6-4651-b1f2-367c4f79474d@linux.intel.com>
Date: Fri, 14 Nov 2025 14:28:55 +0800
Subject: Re: [RFC PATCH 4/7] KVM: x86/pmu: Add support for hardware virtualized PMUs
To: Sandipan Das, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org
Cc: Sean Christopherson, Paolo Bonzini, Peter Zijlstra, Kan
 Liang, Mingwei Zhang, "Nikunj A. Dadhania", Manali Shukla, Ravi Bangoria, Ananth Narayan
References: 
From: "Mi, Dapeng"
In-Reply-To: 

On 11/13/2025 2:18 PM, Sandipan Das wrote:
> Extend the Mediated PMU framework to support hardware virtualized PMUs.
> The key differences with Mediated PMU are listed below.
> * Hardware saves and restores the guest PMU state on world switches.
> * The guest PMU state is saved in vendor-specific structures (such as
>   VMCB or VMCS) instead of struct kvm_pmu.
> * Hardware relies on interrupt virtualization (such as VNMI or AVIC)
>   to notify guests about counter overflows instead of receiving
>   interrupts in host context after switching the delivery mode in
>   LVTPC and then injecting them back in to the guest (KVM_REQ_PMI).
>
> Parts of the original PMU load and put functionality are reused as the
> active host events still need to be scheduled in and out in preparation
> for world switches.
>
> Event filtering and instruction emulation require the ability to change
> the guest PMU state in software. Since struct kvm_pmu no longer has the
> correct state, make use of host-initiated MSR accesses for accessing
> MSR states directly from vendor-specific structures.
>
> RDPMC is intercepted for legacy guests which do not have access to all
> counters. Host-initiated MSR accesses are also used in such cases to
> read the latest counter value from vendor-specific structures.
>
> Signed-off-by: Sandipan Das
> ---
>  arch/x86/kvm/pmu.c           | 94 +++++++++++++++++++++++++++++-------
>  arch/x86/kvm/pmu.h           |  6 +++
>  arch/x86/kvm/svm/pmu.c       |  1 +
>  arch/x86/kvm/vmx/pmu_intel.c |  1 +
>  arch/x86/kvm/x86.c           |  4 ++
>  arch/x86/kvm/x86.h           |  1 +
>  6 files changed, 89 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 0e5048ae86fa..1453fb3a60a2 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -168,6 +168,43 @@ void kvm_handle_guest_mediated_pmi(void)
>  	kvm_make_request(KVM_REQ_PMI, vcpu);
>  }
>
> +static __always_inline u32 fixed_counter_msr(u32 idx)
> +{
> +	return kvm_pmu_ops.FIXED_COUNTER_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
> +}
> +
> +static __always_inline u32 gp_counter_msr(u32 idx)
> +{
> +	return kvm_pmu_ops.GP_COUNTER_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
> +}
> +
> +static __always_inline u32 gp_eventsel_msr(u32 idx)
> +{
> +	return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
> +}
> +
> +static void kvm_pmu_get_msr_state(struct kvm_vcpu *vcpu, u32 index, u64 *data)
> +{
> +	struct msr_data msr_info;
> +
> +	msr_info.index = index;
> +	msr_info.host_initiated = true;
> +
> +	KVM_BUG_ON(kvm_pmu_call(get_msr)(vcpu, &msr_info), vcpu->kvm);
> +	*data = msr_info.data;
> +}
> +
> +static void kvm_pmu_set_msr_state(struct kvm_vcpu *vcpu, u32 index, u64 data)
> +{
> +	struct msr_data msr_info;
> +
> +	msr_info.data = data;
> +	msr_info.index = index;
> +	msr_info.host_initiated = true;
> +
> +	KVM_BUG_ON(kvm_pmu_call(set_msr)(vcpu, &msr_info), vcpu->kvm);
> +}

With these two new helpers, I suppose the write_global_ctrl()/read_global_ctrl()
helpers could be retired.
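E.g., in kvm_pmu_refresh(), a single host-initiated write might then serve both
the mediated and the virtualized flavor. Just a rough sketch (untested), and it
assumes the vendor set_msr() callback also reaches the VMCS/VMCB-backed
PERF_GLOBAL_CTRL state for mediated guests:

	/* One common path instead of kvm_pmu_call(write_global_ctrl)() */
	if (kvm_vcpu_has_mediated_pmu(vcpu) || kvm_vcpu_has_virtualized_pmu(vcpu))
		kvm_pmu_set_msr_state(vcpu, kvm_pmu_ops.PERF_GLOBAL_CTRL,
				      pmu->global_ctrl);

That would also keep the refresh path symmetric between the two modes.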
> +
>  static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
>  {
>  	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> @@ -520,19 +557,22 @@ static bool pmc_is_event_allowed(struct kvm_pmc *pmc)
>
>  static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
>  {
> -	bool allowed = pmc_is_event_allowed(pmc);
>  	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> +	struct kvm_vcpu *vcpu = pmc->vcpu;
>
>  	if (pmc_is_gp(pmc)) {
>  		pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
> -		if (allowed)
> +		if (pmc_is_event_allowed(pmc))
>  			pmc->eventsel_hw |= pmc->eventsel &
>  					    ARCH_PERFMON_EVENTSEL_ENABLE;
> +
> +		if (kvm_vcpu_has_virtualized_pmu(vcpu))
> +			kvm_pmu_set_msr_state(vcpu, gp_eventsel_msr(pmc->idx), pmc->eventsel_hw);
>  	} else {
>  		u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);
>
>  		pmu->fixed_ctr_ctrl_hw &= ~mask;
> -		if (allowed)
> +		if (pmc_is_event_allowed(pmc))
>  			pmu->fixed_ctr_ctrl_hw |= pmu->fixed_ctr_ctrl & mask;
>  	}
>  }
> @@ -740,6 +780,9 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
>  	    kvm_is_cr0_bit_set(vcpu, X86_CR0_PE))
>  		return 1;
>
> +	if (kvm_vcpu_has_virtualized_pmu(pmc->vcpu))
> +		kvm_pmu_get_msr_state(pmc->vcpu, gp_counter_msr(pmc->idx), &pmc->counter);
> +
>  	*data = pmc_read_counter(pmc) & mask;
>  	return 0;
>  }
> @@ -974,6 +1017,9 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
>  	    (kvm_pmu_has_perf_global_ctrl(pmu) || kvm_vcpu_has_mediated_pmu(vcpu)))
>  		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
>
> +	if (kvm_vcpu_has_virtualized_pmu(vcpu))
> +		kvm_pmu_set_msr_state(vcpu, kvm_pmu_ops.PERF_GLOBAL_CTRL, pmu->global_ctrl);
> +
>  	if (kvm_vcpu_has_mediated_pmu(vcpu))
>  		kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
>
> @@ -1099,6 +1145,11 @@ static void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu,
>  	if (bitmap_empty(event_pmcs, X86_PMC_IDX_MAX))
>  		return;
>
> +	if (kvm_vcpu_has_virtualized_pmu(vcpu)) {
> +		kvm_pmu_get_msr_state(vcpu, kvm_pmu_ops.PERF_GLOBAL_CTRL, &pmu->global_ctrl);
> +		kvm_pmu_get_msr_state(vcpu, kvm_pmu_ops.PERF_GLOBAL_STATUS, &pmu->global_status);
> +	}
> +
>  	if (!kvm_pmu_has_perf_global_ctrl(pmu))
>  		bitmap_copy(bitmap, event_pmcs, X86_PMC_IDX_MAX);
>  	else if (!bitmap_and(bitmap, event_pmcs,
> @@ -1107,11 +1158,21 @@ static void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu,
>
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  	kvm_for_each_pmc(pmu, pmc, i, bitmap) {
> +		if (kvm_vcpu_has_virtualized_pmu(vcpu))
> +			kvm_pmu_get_msr_state(vcpu, gp_counter_msr(pmc->idx), &pmc->counter);
> +
>  		if (!pmc_is_event_allowed(pmc) || !cpl_is_matched(pmc))
>  			continue;
>
>  		kvm_pmu_incr_counter(pmc);
> +
> +		if (kvm_vcpu_has_virtualized_pmu(vcpu))
> +			kvm_pmu_set_msr_state(vcpu, gp_counter_msr(pmc->idx), pmc->counter);
>  	}
> +
> +	if (kvm_vcpu_has_virtualized_pmu(vcpu))
> +		kvm_pmu_set_msr_state(vcpu, kvm_pmu_ops.PERF_GLOBAL_STATUS, pmu->global_status);
> +
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  }
>
> @@ -1270,21 +1331,6 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>  	return r;
>  }
>
> -static __always_inline u32 fixed_counter_msr(u32 idx)
> -{
> -	return kvm_pmu_ops.FIXED_COUNTER_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
> -}
> -
> -static __always_inline u32 gp_counter_msr(u32 idx)
> -{
> -	return kvm_pmu_ops.GP_COUNTER_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
> -}
> -
> -static __always_inline u32 gp_eventsel_msr(u32 idx)
> -{
> -	return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
> -}
> -
>  static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -1319,6 +1365,12 @@ void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
>
>  	lockdep_assert_irqs_disabled();
>
> +	/* Guest PMU state is restored by hardware at VM-Entry */
> +	if (kvm_vcpu_has_virtualized_pmu(vcpu)) {
> +		perf_load_guest_context(0);
> +		return;
> +	}
> +
>  	perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
>
>  	/*
> @@ -1372,6 +1424,12 @@ void kvm_mediated_pmu_put(struct kvm_vcpu *vcpu)
>
>  	lockdep_assert_irqs_disabled();
>
> +	/* Guest PMU state is saved by hardware at VM-Exit */
> +	if (kvm_vcpu_has_virtualized_pmu(vcpu)) {
> +		perf_put_guest_context();
> +		return;
> +	}
> +
>  	/*
>  	 * Defer handling of PERF_GLOBAL_CTRL to vendor code. On Intel, it's
>  	 * atomically cleared on VM-Exit, i.e. doesn't need to be clear here.
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index a0cd42cbea9d..55f0679b522d 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -47,6 +47,7 @@ struct kvm_pmu_ops {
>  	const int MIN_NR_GP_COUNTERS;
>
>  	const u32 PERF_GLOBAL_CTRL;
> +	const u32 PERF_GLOBAL_STATUS;
>  	const u32 GP_EVENTSEL_BASE;
>  	const u32 GP_COUNTER_BASE;
>  	const u32 FIXED_COUNTER_BASE;
> @@ -76,6 +77,11 @@ static inline bool kvm_vcpu_has_mediated_pmu(struct kvm_vcpu *vcpu)
>  	return enable_mediated_pmu && vcpu_to_pmu(vcpu)->version;
>  }
>
> +static inline bool kvm_vcpu_has_virtualized_pmu(struct kvm_vcpu *vcpu)
> +{
> +	return enable_virtualized_pmu && vcpu_to_pmu(vcpu)->version;
> +}
> +
>  /*
>   * KVM tracks all counters in 64-bit bitmaps, with general purpose counters
>   * mapped to bits 31:0 and fixed counters mapped to 63:32, e.g. fixed counter 0
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index c03720b30785..8a32e1a9c07d 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -278,6 +278,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
>  	.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,
>
>  	.PERF_GLOBAL_CTRL = MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
> +	.PERF_GLOBAL_STATUS = MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
>  	.GP_EVENTSEL_BASE = MSR_F15H_PERF_CTL0,
>  	.GP_COUNTER_BASE = MSR_F15H_PERF_CTR0,
>  	.FIXED_COUNTER_BASE = 0,
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 41a845de789e..9685af27c15c 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -845,6 +845,7 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = {
>  	.MIN_NR_GP_COUNTERS = 1,
>
>  	.PERF_GLOBAL_CTRL = MSR_CORE_PERF_GLOBAL_CTRL,
> +	.PERF_GLOBAL_STATUS = MSR_CORE_PERF_GLOBAL_STATUS,
>  	.GP_EVENTSEL_BASE = MSR_P6_EVNTSEL0,
>  	.GP_COUNTER_BASE = MSR_IA32_PMC0,
>  	.FIXED_COUNTER_BASE = MSR_CORE_PERF_FIXED_CTR0,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6bdf7ef0b535..750535a53a30 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -191,6 +191,10 @@ module_param(enable_pmu, bool, 0444);
>  bool __read_mostly enable_mediated_pmu;
>  EXPORT_SYMBOL_GPL(enable_mediated_pmu);
>
> +/* Enable/disable hardware PMU virtualization. */
> +bool __read_mostly enable_virtualized_pmu;
> +EXPORT_SYMBOL_GPL(enable_virtualized_pmu);
> +
>  bool __read_mostly eager_page_split = true;
>  module_param(eager_page_split, bool, 0644);
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index bd1149768acc..8cca48d1eed7 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -446,6 +446,7 @@ extern struct kvm_host_values kvm_host;
>
>  extern bool enable_pmu;
>  extern bool enable_mediated_pmu;
> +extern bool enable_virtualized_pmu;
>
>  /*
>   * Get a filtered version of KVM's supported XCR0 that strips out dynamic
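One more question on the x86.c hunk above: enable_virtualized_pmu is defined
and exported, but I don't see where it gets set. Is it toggled by a vendor
module parameter later in the series, similar to enable_mediated_pmu? If not,
just guessing, something along the lines of the below would be needed so the
knob is actually reachable from userspace:

	module_param(enable_virtualized_pmu, bool, 0444);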