Re: [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU

Kernel KVM virtualization development
 help / color / mirror / Atom feed

From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: "Chen, Zide" <zide.chen@intel.com>, sashiko-reviews@lists.linux.dev
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
Date: Thu, 18 Jun 2026 10:21:31 +0800	[thread overview]
Message-ID: <ba9eead0-371a-4720-ae9d-e84993c7253c@linux.intel.com> (raw)
In-Reply-To: <5e03d30e-b33b-476e-b7e6-8ff4979ff8db@intel.com>


On 6/17/2026 12:29 AM, Chen, Zide wrote:
>
> On 6/15/2026 6:26 PM, sashiko-bot@kernel.org wrote:
>> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
>> - [High] Guest-reachable WARN_ON_ONCE assertions in MSR_PERF_METRICS accessors allow unprivileged guests to trigger host-side denial of service.
>> - [High] Software updates to MSR_PERF_METRICS via KVM ioctls or VCPU reset are silently clobbered by stale hardware MSR values during vcpu_put().
>> - [High] MSR_PERF_METRICS is passed through to the guest without context switching if the guest has a PMU version of 0, bypassing host isolation.
>> --
>>
>> commit 65a5a5edf8809e596749d317875bbc842f4eb7fc
>> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>
>> KVM: x86/pmu: Support PERF_METRICS MSR in mediated vPMU
>>
>> This commit exposes the IA32_PERF_METRICS MSR to guests when mediated
>> vPMU is enabled. It allows host access via kvm_pmu_{get,set}_msr() and
>> handles save/restore during PMU context switches.
>>
>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>> index 05a59f4acfdd3..61102aa5e2f9e 100644
>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> [ ... ]
>>
>>> @@ -335,6 +337,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>  	case MSR_CORE_PERF_FIXED_CTR_CTRL:
>>>  		msr_info->data = pmu->fixed_ctr_ctrl;
>>>  		break;
>>> +	case MSR_PERF_METRICS:
>>> +		WARN_ON_ONCE(!msr_info->host_initiated);
>> [Severity: High]
>> Could this WARN_ON_ONCE() be triggered by an unprivileged guest?
>>
>> If a guest forces KVM's instruction emulator to handle the MSR access, for
>> example by executing RDMSR over an MMIO page, the emulator might process the
>> instruction and invoke KVM's MSR handlers with host_initiated = false.
> Yes, in theory, it's possible for the RDMSR/WRMSR to be handled in the
> emulator path and WARN_ON_ONCE() could be unexpectedly triggered.
>
> Since WARN_ON() is not intended to handle untrusted input, it's better
> to use a defensive check and reject non-host-initiated accesses:
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 19daee29b731..6073227bdd20 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -348,7 +348,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu,
> struct msr_data *msr_info)
>                 msr_info->data = pmu->fixed_ctr_ctrl;
>                 break;
>         case MSR_PERF_METRICS:
> -               WARN_ON_ONCE(!msr_info->host_initiated);
> +               if (!msr_info->host_initiated)
> +                       return 1;
>                 msr_info->data = pmu->perf_metrics;
>                 break;
>         case MSR_IA32_PEBS_ENABLE:
> @@ -401,7 +402,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu,
> struct msr_data *msr_info)
>                         reprogram_fixed_counters(pmu, data);
>                 break;
>         case MSR_PERF_METRICS:
> -               WARN_ON_ONCE(!msr_info->host_initiated);
> +               if (!msr_info->host_initiated)
> +                       return 1;

Per my understanding, WARN_ON_ONCE() only prints the warning message once
and it should not spam Host log too much, but it's fine for me to remove
them. Thanks.


>
>
>
>> This would unconditionally trigger the warning and allow the guest to spam
>> the host kernel log or cause a host panic if panic_on_warn is enabled.
>>
>>> +		msr_info->data = pmu->perf_metrics;
>>> +		break;
>>>  	case MSR_IA32_PEBS_ENABLE:
>>>  		msr_info->data = pmu->pebs_enable;
>>>  		break;
>> [ ... ]
>>
>>> @@ -384,6 +390,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>  		if (pmu->fixed_ctr_ctrl != data)
>>>  			reprogram_fixed_counters(pmu, data);
>>>  		break;
>>> +	case MSR_PERF_METRICS:
>>> +		WARN_ON_ONCE(!msr_info->host_initiated);
>> [Severity: High]
>> Can a malicious guest also trigger this WARN_ON_ONCE() by executing WRMSR
>> over an MMIO page to force instruction emulation?
> Ditto.
>>> +
>>> +		/*
>>> +		 * If TMA level 2 is not supported, bits [63:32] are reserved
>>> +		 * and ignored on write, so no validation is needed here.
>>> +		 */
>>> +		pmu->perf_metrics = data;
>>> +		break;
>>>  	case MSR_IA32_PEBS_ENABLE:
>> [ ... ]
>>
>>> @@ -622,6 +642,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>>  
>>>  static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>>>  {
>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>>> +
>>> +	pmu->perf_metrics = 0;
>>>  	intel_pmu_release_guest_lbr_event(vcpu);
>>>  }
>> [ ... ]
>>
>>> @@ -821,6 +851,12 @@ static void intel_mediated_pmu_put(struct kvm_vcpu *vcpu)
>>>  	 */
>>>  	if (pmu->fixed_ctr_ctrl_hw)
>>>  		wrmsrq(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>>> +
>>> +	if (vcpu_has_perf_metrics(vcpu)) {
>>> +		pmu->perf_metrics = rdpmc(INTEL_PMC_FIXED_RDPMC_METRICS);
>> [Severity: High]
>> Does this rdpmc() call silently clobber software updates made to
>> pmu->perf_metrics?
>>
>> When the VCPU is loaded during a KVM_SET_MSRS ioctl or VCPU reset, updates
>> to pmu->perf_metrics in intel_pmu_set_msr() or intel_pmu_reset() modify the
>> software struct but do not synchronously write to the active hardware MSR.
> This is safe.
>
> After intel_pmu_set_msr() or intel_pmu_reset(),
> intel_mediated_pmu_load() is called to sync the state to hardware before
> the next intel_mediated_pmu_put().
>
>
>> Later, when the VCPU is descheduled, intel_mediated_pmu_put() reads the
>> physical MSR (which still holds the old pre-update value) and overwrites the
>> software state.
>>
>> Will this permanently discard the new software state and break live
>> migration or VCPU resets?
>>
>>> +		if (pmu->perf_metrics)
>>> +			wrmsrq(MSR_PERF_METRICS, 0);
>>> +	}
>>>  }
>> [ ... ]
>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 9324d60839419..475e72c733cee 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -4247,6 +4247,9 @@ static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
>>>  				  MSR_TYPE_RW, intercept);
>>>  	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
>>>  				  MSR_TYPE_RW, intercept);
>>> +
>>> +	vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
>>> +				  !vcpu_has_perf_metrics(vcpu));
>> [Severity: High]
>> Could clearing the intercept here allow the guest direct, unintercepted
>> access to the host's physical MSR_PERF_METRICS register without a context
>> switch?
>>
>> If enable_mediated_pmu is true, KVM exposes PERF_CAP_PERF_METRICS. A VMM
>> could configure a VCPU with MSR_IA32_PERF_CAPABILITIES.PERF_METRICS=1 but
>> leave the PMU version at 0.
>>
>> In this scenario, vcpu_has_perf_metrics() would return true, clearing the
>> intercept. However, because the PMU version is 0,
>> kvm_vcpu_has_mediated_pmu() returns false.
>>
>> This causes the context-switch handlers kvm_mediated_pmu_load() and
>> kvm_mediated_pmu_put() to exit early.
>>
>> Does this bypass host isolation and allow the guest to leak or clobber host
>> PMU state?
> Yes, you are right.
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 7cffe1619a1f..28027d302d90 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4266,7 +4266,7 @@ static void vmx_recalc_pmu_msr_intercepts(struct
> kvm_vcpu *vcpu)
>                                   MSR_TYPE_RW, intercept);
>
>         vmx_set_intercept_for_msr(vcpu, MSR_PERF_METRICS, MSR_TYPE_RW,
> -                                 !vcpu_has_perf_metrics(vcpu));
> +                                 !has_mediated_pmu ||
> !vcpu_has_perf_metrics(vcpu));
>  }
>
>  static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>
>>>  }
>>>  
>>>  static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
>

next prev parent reply	other threads:[~2026-06-18  2:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-15 23:01 [PATCH V3 0/4] KVM: x86/pmu: Add hardware Topdown metrics support Zide Chen
2026-06-15 23:01 ` [PATCH V3 1/4] KVM: x86/pmu: Do not map fixed counters >= 3 to generic perf events Zide Chen
2026-06-15 23:01 ` [PATCH V3 2/4] KVM: x86/pmu: Support Intel fixed counter 3 on mediated vPMU Zide Chen
2026-06-15 23:01 ` [PATCH V3 3/4] KVM: x86/pmu: Support PERF_METRICS MSR in " Zide Chen
2026-06-15 23:26   ` sashiko-bot
2026-06-16 16:29     ` Chen, Zide
2026-06-18  2:21       ` Mi, Dapeng [this message]
2026-06-15 23:01 ` [PATCH V3 4/4] KVM: selftests: Add perf_metrics and fixed counter 3 tests Zide Chen
2026-06-15 23:26   ` sashiko-bot
2026-06-16 16:32     ` Chen, Zide

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ba9eead0-371a-4720-ae9d-e84993c7253c@linux.intel.com \
    --to=dapeng1.mi@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox