From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
Date: Tue, 24 Jun 2025 14:35:27 -0700 [thread overview]
Message-ID: <aFsaH97Qxn7nUA86@google.com> (raw)
In-Reply-To: <20250530185239.2335185-3-jmattson@google.com>
On Fri, May 30, 2025, Jim Mattson wrote:
> Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
> without interception.
>
> The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not
> handled at all. The MSR values are not zeroed on vCPU creation, saved
> on suspend, or restored on resume. No accommodation is made for
> processor migration or for sharing a logical processor with other
> tasks. No adjustments are made for non-unit TSC multipliers. The MSRs
> do not account for time the same way as the comparable PMU events,
> whether the PMU is virtualized by the traditional emulation method or
> the new mediated pass-through approach.
>
> Nonetheless, in a properly constrained environment, this capability
> can be combined with a guest CPUID table that advertises support for
> CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the
> effective physical CPU frequency in /proc/cpuinfo. Moreover, there is
> no performance cost for this capability.
>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
> Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 7 +++++++
> arch/x86/kvm/svm/svm.h | 2 +-
> arch/x86/kvm/vmx/vmx.c | 6 ++++++
> arch/x86/kvm/vmx/vmx.h | 2 +-
> arch/x86/kvm/x86.c | 8 +++++++-
> arch/x86/kvm/x86.h | 5 +++++
> include/uapi/linux/kvm.h | 1 +
> tools/include/uapi/linux/kvm.h | 1 +
> 9 files changed, 52 insertions(+), 3 deletions(-)
This needs to be rebased on top of the MSR interception rework, which I've now
pushed to kvm-x86 next. Luckily, it's quite painless. Compile tested only at
this point (about to throw it onto metal).
I'd be happy to post a v5 on your behalf (pending your thoughts on my feedback
to patch 1), unless you want the honors. The fixup is a wee bit more than I'm
comfortable doing on-the-fly.
---
Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 5 +++++
arch/x86/kvm/vmx/vmx.c | 4 ++++
arch/x86/kvm/x86.c | 6 +++++-
arch/x86/kvm/x86.h | 5 +++++
include/uapi/linux/kvm.h | 1 +
tools/include/uapi/linux/kvm.h | 1 +
7 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0d961436d0f..13a752b1200f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7844,6 +7844,7 @@ Valid bits in args[0] are::
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+ #define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
@@ -7854,6 +7855,28 @@ all such vmexits.
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
+Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
+than just disabling APERF/MPERF exits. While both Intel and AMD
+document strict usage conditions for these MSRs--emphasizing that only
+the ratio of their deltas over a time interval (T0 to T1) is
+architecturally defined--simply passing through the MSRs can still
+produce an incorrect ratio.
+
+This erroneous ratio can occur if, between T0 and T1:
+
+1. The vCPU thread migrates between logical processors.
+2. Live migration or suspend/resume operations take place.
+3. Another task shares the vCPU's logical processor.
+4. C-states lower thean C0 are emulated (e.g., via HLT interception).
+5. The guest TSC frequency doesn't match the host TSC frequency.
+
+Due to these complexities, KVM does not automatically associate this
+passthrough capability with the guest CPUID bit,
+``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
+mechanism adequate for virtualizing the ``IA32_APERF`` and
+``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
+
+
7.14 KVM_CAP_S390_HPAGE_1M
--------------------------
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ce85f4d6f686..079c0a0b0eaa 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -854,6 +854,11 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
svm_set_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW,
guest_cpuid_is_intel_compatible(vcpu));
+ if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
+ svm_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
+ svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+ }
+
if (sev_es_guest(vcpu->kvm))
sev_es_recalc_msr_intercepts(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8c16a3aff896..723a22be2514 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4099,6 +4099,10 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
}
+ if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+ }
/* PT MSRs can be passed through iff PT is exposed to the guest. */
if (vmx_pt_mode_is_host_guest())
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56569ac2e9a4..75c0f52d3c44 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4580,6 +4580,9 @@ static u64 kvm_get_allowed_disable_exits(void)
{
u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
+ if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+ r |= KVM_X86_DISABLE_EXITS_APERFMPERF;
+
if (!mitigate_smt_rsb) {
r |= KVM_X86_DISABLE_EXITS_HLT |
KVM_X86_DISABLE_EXITS_CSTATE;
@@ -6478,7 +6481,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
cpu_smt_possible() &&
- (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
+ (cap->args[0] & ~(KVM_X86_DISABLE_EXITS_PAUSE |
+ KVM_X86_DISABLE_EXITS_APERFMPERF)))
pr_warn_once(SMT_RSB_MSG);
kvm_disable_exits(kvm, cap->args[0]);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 31ae58b765f3..bcfd9b719ada 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -546,6 +546,11 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm)
return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_CSTATE;
}
+static inline bool kvm_aperfmperf_in_guest(struct kvm *kvm)
+{
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_APERFMPERF;
+}
+
static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm)
{
return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7a4c35ff03fe..aeb2ca10b190 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -644,6 +644,7 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d00b85cb168c..7415a3863891 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
--
next prev parent reply other threads:[~2025-06-24 21:35 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-30 18:52 [PATCH v4 0/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
2025-05-30 18:52 ` [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64 Jim Mattson
2025-06-24 21:25 ` Sean Christopherson
2025-06-24 22:34 ` Jim Mattson
2025-05-30 18:52 ` [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
2025-06-24 21:35 ` Sean Christopherson [this message]
2025-06-24 22:37 ` Jim Mattson
2025-06-24 23:31 ` Sean Christopherson
2025-06-25 0:11 ` Jim Mattson
2025-05-30 18:52 ` [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF Jim Mattson
2025-06-10 8:42 ` Mi, Dapeng
2025-06-10 16:59 ` Jim Mattson
2025-06-11 1:47 ` Mi, Dapeng
2025-06-24 22:24 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aFsaH97Qxn7nUA86@google.com \
--to=seanjc@google.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.