linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	 Like Xu <likexu@tencent.com>,
	kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
	 linux-kernel@vger.kernel.org,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	 Zhang Xiong <xiong.y.zhang@intel.com>,
	Lv Zhiyuan <zhiyuan.lv@intel.com>,
	 Dapeng Mi <dapeng1.mi@intel.com>,
	Mingwei Zhang <mizhang@google.com>
Subject: Re: [Patch v3] KVM: x86/pmu: Manipulate FIXED_CTR_CTRL MSR with macros
Date: Tue, 5 Mar 2024 15:22:02 -0800	[thread overview]
Message-ID: <ZeepGjHCeSfadANM@google.com> (raw)
In-Reply-To: <20230824020546.1108516-1-dapeng1.mi@linux.intel.com>

+Mingwei

On Thu, Aug 24, 2023, Dapeng Mi wrote:
 diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 7d9ba301c090..ffda2ecc3a22 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -12,7 +12,8 @@
>  					  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
>  
>  /* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
> -#define fixed_ctrl_field(ctrl_reg, idx) (((ctrl_reg) >> ((idx)*4)) & 0xf)
> +#define fixed_ctrl_field(ctrl_reg, idx) \
> +	(((ctrl_reg) >> ((idx) * INTEL_FIXED_BITS_STRIDE)) & INTEL_FIXED_BITS_MASK)
>  
>  #define VMWARE_BACKDOOR_PMC_HOST_TSC		0x10000
>  #define VMWARE_BACKDOOR_PMC_REAL_TIME		0x10001
> @@ -165,7 +166,8 @@ static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
>  
>  	if (pmc_is_fixed(pmc))
>  		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
> -					pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
> +					pmc->idx - INTEL_PMC_IDX_FIXED) &
> +					(INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER);
>  
>  	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
>  }
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index f2efa0bf7ae8..b0ac55891cb7 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -548,8 +548,13 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  		setup_fixed_pmc_eventsel(pmu);
>  	}
>  
> -	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
> -		pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
> +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> +		pmu->fixed_ctr_ctrl_mask &=
> +			 ~intel_fixed_bits_by_idx(i,
> +						  INTEL_FIXED_0_KERNEL |
> +						  INTEL_FIXED_0_USER |
> +						  INTEL_FIXED_0_ENABLE_PMI);
> +	}
>  	counter_mask = ~(((1ull << pmu->nr_arch_gp_counters) - 1) |
>  		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED));
>  	pmu->global_ctrl_mask = counter_mask;
> @@ -595,7 +600,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  			pmu->reserved_bits &= ~ICL_EVENTSEL_ADAPTIVE;
>  			for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
>  				pmu->fixed_ctr_ctrl_mask &=
> -					~(1ULL << (INTEL_PMC_IDX_FIXED + i * 4));

OMG, this might just win the award for most obfuscated PMU code in KVM, which is
saying something.  The fact that INTEL_PMC_IDX_FIXED happens to be 32, the same
bit number as ICL_FIXED_0_ADAPTIVE, is 100% coincidence.  Good riddance.

Argh, and this goofy code helped introduce a real bug.  reprogram_fixed_counters()
doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL.

Wait, WTF?  Nothing in KVM accounts for the upper bits.  This can't possibly work.

IIUC, because KVM _always_ sets precise_ip to a non-zero bit for PEBS events,
perf will _always_ generate an adaptive record, even if the guest requested a
basic record.  Ugh, and KVM will always generate adaptive records even if the
guest doesn't support them.  This is all completely broken.  It probably kinda
sorta works because the Basic info is always stored in the record, and generating
more info requires a non-zero MSR_PEBS_DATA_CFG, but ugh.

Oh great, and it gets worse.  intel_pmu_disable_fixed() doesn't clear the upper
bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set.  Unless I'm misreading the code,
intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either,
as it only modifies the bit when it wants to set ICL_FIXED_0_ADAPTIVE.

*sigh*

I'm _very_ tempted to disable KVM PEBS support for the current PMU, and make it
available only when the so-called passthrough PMU is available[*].  Because I
don't see how this is can possibly be functionally correct, nor do I see a way
to make it functionally correct without a rather large and invasive series.

Ouch.  And after chatting with Mingwei, who asked the very good question of
"can this leak host state?", I am pretty sure that yes, this can leak host state.

When PERF_CAP_PEBS_BASELINE is enabled for the guest, i.e. when the guest has
access to adaptive records, KVM gives the guest full access to MSR_PEBS_DATA_CFG

	pmu->pebs_data_cfg_mask = ~0xff00000full;

which makes sense in a vacuum, because AFAICT the architecture doesn't allow
exposing a subset of the four adaptive controls.

GPRs and XMMs are always context switched and thus benign, but IIUC, Memory Info
provides data that might now otherwise be available to the guest, e.g. if host
userspace has disallowed equivalent events via KVM_SET_PMU_EVENT_FILTER.

And unless I'm missing something, LBRs are a full leak of host state.  Nothing
in the SDM suggests that PEBS records honor MSR intercepts, so unless KVM is
also passing through LBRs, i.e. is context switching all LBR MSRs, the guest can
use PEBS to read host LBRs at will.

Unless someone chimes in to point out how PEBS virtualization isn't a broken mess,
I will post a patch to effectively disable PEBS virtualization.

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..a2f827fa0ca1 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -392,7 +392,7 @@ static inline bool vmx_pt_mode_is_host_guest(void)
 
 static inline bool vmx_pebs_supported(void)
 {
-       return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
+       return false;
 }
 
 static inline bool cpu_has_notify_vmexit(void)

  reply	other threads:[~2024-03-05 23:22 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-24  2:05 [Patch v3] KVM: x86/pmu: Manipulate FIXED_CTR_CTRL MSR with macros Dapeng Mi
2024-03-05 23:22 ` Sean Christopherson [this message]
2024-03-06  7:55   ` Mi, Dapeng
2024-03-07 11:54     ` Mi, Dapeng
2024-03-06  9:03   ` Like Xu
2024-03-06 15:09     ` Jim Mattson
2024-03-07  3:27       ` Like Xu
2024-03-06 20:17     ` Sean Christopherson
2024-03-07 12:02       ` Like Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZeepGjHCeSfadANM@google.com \
    --to=seanjc@google.com \
    --cc=dapeng1.mi@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=kan.liang@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=xiong.y.zhang@intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).