* [PATCH v4 0/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
@ 2025-05-30 18:52 Jim Mattson
2025-05-30 18:52 ` [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64 Jim Mattson
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Jim Mattson @ 2025-05-30 18:52 UTC
To: linux-kernel, kvm, Sean Christopherson, Paolo Bonzini; +Cc: Jim Mattson
Allow a guest to read IA32_APERF and IA32_MPERF, so that it can
determine the effective frequency multiplier for the physical LPU.
Commit b51700632e0e ("KVM: X86: Provide a capability to disable cstate
msr read intercepts") allowed the userspace VMM to grant a guest read
access to four core C-state residency MSRs. Do the same for IA32_APERF
and IA32_MPERF.
While this isn't sufficient to claim support for
CPUID.6:ECX.APERFMPERF[bit 0], it may be adequate in a suitably
restricted environment (i.e. vCPUs pinned to LPUs, no TSC multiplier,
and no suspend/resume).
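For background, both vendors define only the ratio of APERF/MPERF
deltas over an interval as meaningful. A guest that has sampled both
MSRs at T0 and T1 (via rdmsr) computes the effective frequency roughly
as in this sketch; the code is illustrative, not part of this series,
and base_khz is assumed to be the guest's known base (P0) frequency:

#include <stdint.h>

/*
 * Effective frequency over [T0, T1], per the architectural guidance
 * that only the ratio of the APERF and MPERF deltas is defined.
 * Illustrative only; a real implementation would guard against
 * multiplication overflow on long intervals.
 */
static uint64_t effective_khz(uint64_t base_khz,
			      uint64_t aperf0, uint64_t mperf0,
			      uint64_t aperf1, uint64_t mperf1)
{
	uint64_t daperf = aperf1 - aperf0;
	uint64_t dmperf = mperf1 - mperf0;

	return dmperf ? base_khz * daperf / dmperf : 0;
}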
v1 -> v2: Add {IA32_APERF,IA32_MPERF} to vmx_possible_passthrough_msrs[]
v2 -> v3: Add a selftest
v3 -> v4: Collect all disabled_exit flags in a u64 [Sean]
Improve documentation [Sean]
Add pin_task_to_one_cpu() to kvm selftests library [Sean]
Jim Mattson (3):
KVM: x86: Replace growing set of *_in_guest bools with a u64
KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
Documentation/virt/kvm/api.rst | 23 +++
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/kvm/svm/svm.c | 9 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 8 +-
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 16 ++-
arch/x86/kvm/x86.h | 18 ++-
include/uapi/linux/kvm.h | 1 +
tools/include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 17 +++
.../selftests/kvm/x86/aperfmperf_test.c | 132 ++++++++++++++++++
14 files changed, 220 insertions(+), 17 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/aperfmperf_test.c
--
2.49.0.1204.g71687c7c1d-goog
* [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64
2025-05-30 18:52 [PATCH v4 0/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
@ 2025-05-30 18:52 ` Jim Mattson
2025-06-24 21:25 ` Sean Christopherson
2025-05-30 18:52 ` [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
2025-05-30 18:52 ` [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF Jim Mattson
2 siblings, 1 reply; 14+ messages in thread
From: Jim Mattson @ 2025-05-30 18:52 UTC
To: linux-kernel, kvm, Sean Christopherson, Paolo Bonzini; +Cc: Jim Mattson
Store each "disabled exit" boolean in a single bit rather than a byte.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/include/asm/kvm_host.h | 5 +----
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 8 ++++----
arch/x86/kvm/x86.h | 13 +++++++++----
5 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 67b464651c8d..fa912b2e7591 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1390,10 +1390,7 @@ struct kvm_arch {
gpa_t wall_clock;
- bool mwait_in_guest;
- bool hlt_in_guest;
- bool pause_in_guest;
- bool cstate_in_guest;
+ u64 disabled_exits;
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ffb34dadff1c..6d2d97fd967a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5102,7 +5102,7 @@ static int svm_vm_init(struct kvm *kvm)
}
if (!pause_filter_count || !pause_filter_thresh)
- kvm->arch.pause_in_guest = true;
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_PAUSE);
if (enable_apicv) {
int ret = avic_vm_init(kvm);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b12414108cbf..136be14e6db0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7619,7 +7619,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
int vmx_vm_init(struct kvm *kvm)
{
if (!ple_gap)
- kvm->arch.pause_in_guest = true;
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_PAUSE);
if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) {
switch (l1tf_mitigation) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 570e7f8cbf64..8c20afda4398 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6605,13 +6605,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
pr_warn_once(SMT_RSB_MSG);
if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
- kvm->arch.pause_in_guest = true;
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_PAUSE);
if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
- kvm->arch.mwait_in_guest = true;
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_MWAIT);
if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
- kvm->arch.hlt_in_guest = true;
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_HLT);
if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
- kvm->arch.cstate_in_guest = true;
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_CSTATE);
r = 0;
disable_exits_unlock:
mutex_unlock(&kvm->lock);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 88a9475899c8..0ad36851df4c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -481,24 +481,29 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
__rem; \
})
+static inline void kvm_disable_exits(struct kvm *kvm, u64 mask)
+{
+ kvm->arch.disabled_exits |= mask;
+}
+
static inline bool kvm_mwait_in_guest(struct kvm *kvm)
{
- return kvm->arch.mwait_in_guest;
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_MWAIT;
}
static inline bool kvm_hlt_in_guest(struct kvm *kvm)
{
- return kvm->arch.hlt_in_guest;
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_HLT;
}
static inline bool kvm_pause_in_guest(struct kvm *kvm)
{
- return kvm->arch.pause_in_guest;
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_PAUSE;
}
static inline bool kvm_cstate_in_guest(struct kvm *kvm)
{
- return kvm->arch.cstate_in_guest;
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_CSTATE;
}
static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm)
--
2.49.0.1204.g71687c7c1d-goog
* [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
2025-05-30 18:52 [PATCH v4 0/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
2025-05-30 18:52 ` [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64 Jim Mattson
@ 2025-05-30 18:52 ` Jim Mattson
2025-06-24 21:35 ` Sean Christopherson
2025-06-24 23:31 ` Sean Christopherson
2025-05-30 18:52 ` [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF Jim Mattson
2 siblings, 2 replies; 14+ messages in thread
From: Jim Mattson @ 2025-05-30 18:52 UTC
To: linux-kernel, kvm, Sean Christopherson, Paolo Bonzini; +Cc: Jim Mattson
Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
without interception.
The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not
handled at all. The MSR values are not zeroed on vCPU creation, saved
on suspend, or restored on resume. No accommodation is made for
processor migration or for sharing a logical processor with other
tasks. No adjustments are made for non-unit TSC multipliers. The MSRs
do not account for time the same way as the comparable PMU events,
whether the PMU is virtualized by the traditional emulation method or
the new mediated pass-through approach.
Nonetheless, in a properly constrained environment, this capability
can be combined with a guest CPUID table that advertises support for
CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the
effective physical CPU frequency in /proc/cpuinfo. Moreover, there is
no performance cost for this capability.
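For illustration, the intended userspace flow is a minimal sketch like
the one below, assuming vm_fd is an open KVM VM file descriptor; the
capability must be enabled before any vCPU is created:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static void enable_aperfmperf_passthrough(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = KVM_X86_DISABLE_EXITS_APERFMPERF,
	};

	/* Must precede the first KVM_CREATE_VCPU on this VM. */
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

If the environment qualifies, userspace then sets CPUID.6:ECX[0] in
the table it passes to KVM_SET_CPUID2; KVM never sets it on its own.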
Signed-off-by: Jim Mattson <jmattson@google.com>
---
Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 7 +++++++
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 8 +++++++-
arch/x86/kvm/x86.h | 5 +++++
include/uapi/linux/kvm.h | 1 +
tools/include/uapi/linux/kvm.h | 1 +
9 files changed, 52 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6fb1870f0999..5849a14a6712 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7780,6 +7780,7 @@ Valid bits in args[0] are::
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+ #define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
@@ -7790,6 +7791,28 @@ all such vmexits.
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
+Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
+than just disabling APERF/MPERF exits. While both Intel and AMD
+document strict usage conditions for these MSRs--emphasizing that only
+the ratio of their deltas over a time interval (T0 to T1) is
+architecturally defined--simply passing through the MSRs can still
+produce an incorrect ratio.
+
+This erroneous ratio can occur if, between T0 and T1:
+
+1. The vCPU thread migrates between logical processors.
+2. Live migration or suspend/resume operations take place.
+3. Another task shares the vCPU's logical processor.
+4. C-states lower than C0 are emulated (e.g., via HLT interception).
+5. The guest TSC frequency doesn't match the host TSC frequency.
+
+Due to these complexities, KVM does not automatically associate this
+passthrough capability with the guest CPUID bit,
+``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
+mechanism adequate for virtualizing the ``IA32_APERF`` and
+``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
+
+
7.14 KVM_CAP_S390_HPAGE_1M
--------------------------
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6d2d97fd967a..12468d228bb8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -112,6 +112,8 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_IA32_CR_PAT, .always = false },
{ .index = MSR_AMD64_SEV_ES_GHCB, .always = true },
{ .index = MSR_TSC_AUX, .always = false },
+ { .index = MSR_IA32_APERF, .always = false },
+ { .index = MSR_IA32_MPERF, .always = false },
{ .index = X2APIC_MSR(APIC_ID), .always = false },
{ .index = X2APIC_MSR(APIC_LVR), .always = false },
{ .index = X2APIC_MSR(APIC_TASKPRI), .always = false },
@@ -1357,6 +1359,11 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+ if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_APERF, 1, 0);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_MPERF, 1, 0);
+ }
+
if (kvm_vcpu_apicv_active(vcpu))
avic_init_vmcb(svm, vmcb);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f16b068c4228..ef10122ef590 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -44,7 +44,7 @@ static inline struct page *__sme_pa_to_page(unsigned long pa)
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2
-#define MAX_DIRECT_ACCESS_MSRS 48
+#define MAX_DIRECT_ACCESS_MSRS 50
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 136be14e6db0..e8eeafd813e5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -188,6 +188,8 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_CORE_C3_RESIDENCY,
MSR_CORE_C6_RESIDENCY,
MSR_CORE_C7_RESIDENCY,
+ MSR_IA32_APERF,
+ MSR_IA32_MPERF,
};
/*
@@ -7569,6 +7571,10 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
}
+ if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+ }
vmx->loaded_vmcs = &vmx->vmcs01;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 6d1e40ecc024..24c0bd2ff5e9 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -297,7 +297,7 @@ struct vcpu_vmx {
struct lbr_desc lbr_desc;
/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS 16
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS 18
struct {
DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8c20afda4398..4e53e555f6cf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4574,6 +4574,9 @@ static u64 kvm_get_allowed_disable_exits(void)
{
u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
+ if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+ r |= KVM_X86_DISABLE_EXITS_APERFMPERF;
+
if (!mitigate_smt_rsb) {
r |= KVM_X86_DISABLE_EXITS_HLT |
KVM_X86_DISABLE_EXITS_CSTATE;
@@ -6601,7 +6604,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
cpu_smt_possible() &&
- (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
+ (cap->args[0] & ~(KVM_X86_DISABLE_EXITS_PAUSE |
+ KVM_X86_DISABLE_EXITS_APERFMPERF)))
pr_warn_once(SMT_RSB_MSG);
if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
@@ -6612,6 +6616,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_HLT);
if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_CSTATE);
+ if (cap->args[0] & KVM_X86_DISABLE_EXITS_APERFMPERF)
+ kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_APERFMPERF);
r = 0;
disable_exits_unlock:
mutex_unlock(&kvm->lock);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 0ad36851df4c..f6334201014a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -506,6 +506,11 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm)
return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_CSTATE;
}
+static inline bool kvm_aperfmperf_in_guest(struct kvm *kvm)
+{
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_APERFMPERF;
+}
+
static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm)
{
return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d00b85cb168c..7415a3863891 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index b6ae8ad8934b..eef57c117140 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -617,6 +617,7 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
--
2.49.0.1204.g71687c7c1d-goog
* [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
2025-05-30 18:52 [PATCH v4 0/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
2025-05-30 18:52 ` [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64 Jim Mattson
2025-05-30 18:52 ` [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
@ 2025-05-30 18:52 ` Jim Mattson
2025-06-10 8:42 ` Mi, Dapeng
2025-06-24 22:24 ` Sean Christopherson
2 siblings, 2 replies; 14+ messages in thread
From: Jim Mattson @ 2025-05-30 18:52 UTC
To: linux-kernel, kvm, Sean Christopherson, Paolo Bonzini; +Cc: Jim Mattson
For a VCPU thread pinned to a single LPU, verify that interleaved host
and guest reads of IA32_[AM]PERF return strictly increasing values when
APERFMPERF exiting is disabled.
Signed-off-by: Jim Mattson <jmattson@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 17 +++
.../selftests/kvm/x86/aperfmperf_test.c | 132 ++++++++++++++++++
4 files changed, 152 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/aperfmperf_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 3e786080473d..8d42a3bd0dd8 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -131,6 +131,7 @@ TEST_GEN_PROGS_x86 += x86/amx_test
TEST_GEN_PROGS_x86 += x86/max_vcpuid_cap_test
TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
+TEST_GEN_PROGS_x86 += x86/aperfmperf_test
TEST_GEN_PROGS_x86 += access_tracking_perf_test
TEST_GEN_PROGS_x86 += coalesced_io_test
TEST_GEN_PROGS_x86 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 93013564428b..43a1bef10ec0 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -1158,4 +1158,6 @@ bool vm_is_gpa_protected(struct kvm_vm *vm, vm_paddr_t paddr);
uint32_t guest_get_vcpuid(void);
+int pin_task_to_one_cpu(void);
+
#endif /* SELFTEST_KVM_UTIL_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 5649cf2f40e8..b6c707ab92d7 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -10,6 +10,7 @@
#include "ucall_common.h"
#include <assert.h>
+#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>
@@ -2321,3 +2322,19 @@ bool vm_is_gpa_protected(struct kvm_vm *vm, vm_paddr_t paddr)
pg = paddr >> vm->page_shift;
return sparsebit_is_set(region->protected_phy_pages, pg);
}
+
+int pin_task_to_one_cpu(void)
+{
+ int cpu = sched_getcpu();
+ cpu_set_t cpuset;
+ int rc;
+
+ CPU_ZERO(&cpuset);
+ CPU_SET(cpu, &cpuset);
+
+ rc = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+ TEST_ASSERT(rc == 0, "%s: Can't set thread affinity", __func__);
+
+ return cpu;
+}
+
diff --git a/tools/testing/selftests/kvm/x86/aperfmperf_test.c b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
new file mode 100644
index 000000000000..64d976156693
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test for KVM_X86_DISABLE_EXITS_APERFMPERF
+ *
+ * Copyright (C) 2025, Google LLC.
+ *
+ * Test the ability to disable VM-exits for rdmsr of IA32_APERF and
+ * IA32_MPERF. When these VM-exits are disabled, reads of these MSRs
+ * return the host's values.
+ *
+ * Note: Requires read access to /dev/cpu/<lpu>/msr to read host MSRs.
+ */
+
+#include <fcntl.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <asm/msr-index.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define NUM_ITERATIONS 100
+
+static int open_dev_msr(int cpu)
+{
+ char path[PATH_MAX];
+
+ snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
+ return open_path_or_exit(path, O_RDONLY);
+}
+
+static uint64_t read_dev_msr(int msr_fd, uint32_t msr)
+{
+ uint64_t data;
+ ssize_t rc;
+
+ rc = pread(msr_fd, &data, sizeof(data), msr);
+ TEST_ASSERT(rc == sizeof(data), "Read of MSR 0x%x failed", msr);
+
+ return data;
+}
+
+static void guest_code(void)
+{
+ int i;
+
+ for (i = 0; i < NUM_ITERATIONS; i++)
+ GUEST_SYNC2(rdmsr(MSR_IA32_APERF), rdmsr(MSR_IA32_MPERF));
+
+ GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+ uint64_t host_aperf_before, host_mperf_before;
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ int msr_fd;
+ int cpu;
+ int i;
+
+ cpu = pin_task_to_one_cpu();
+
+ msr_fd = open_dev_msr(cpu);
+
+ /*
+ * This test requires a non-standard VM initialization, because
+ * KVM_ENABLE_CAP cannot be used on a VM file descriptor after
+ * a VCPU has been created.
+ */
+ vm = vm_create(1);
+
+ TEST_REQUIRE(vm_check_cap(vm, KVM_CAP_X86_DISABLE_EXITS) &
+ KVM_X86_DISABLE_EXITS_APERFMPERF);
+
+ vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS,
+ KVM_X86_DISABLE_EXITS_APERFMPERF);
+
+ vcpu = vm_vcpu_add(vm, 0, guest_code);
+
+ host_aperf_before = read_dev_msr(msr_fd, MSR_IA32_APERF);
+ host_mperf_before = read_dev_msr(msr_fd, MSR_IA32_MPERF);
+
+ for (i = 0; i < NUM_ITERATIONS; i++) {
+ uint64_t host_aperf_after, host_mperf_after;
+ uint64_t guest_aperf, guest_mperf;
+ struct ucall uc;
+
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_DONE:
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ case UCALL_SYNC:
+ guest_aperf = uc.args[0];
+ guest_mperf = uc.args[1];
+
+ host_aperf_after = read_dev_msr(msr_fd, MSR_IA32_APERF);
+ host_mperf_after = read_dev_msr(msr_fd, MSR_IA32_MPERF);
+
+ TEST_ASSERT(host_aperf_before < guest_aperf,
+ "APERF: host_before (0x%" PRIx64 ") >= guest (0x%" PRIx64 ")",
+ host_aperf_before, guest_aperf);
+ TEST_ASSERT(guest_aperf < host_aperf_after,
+ "APERF: guest (0x%" PRIx64 ") >= host_after (0x%" PRIx64 ")",
+ guest_aperf, host_aperf_after);
+ TEST_ASSERT(host_mperf_before < guest_mperf,
+ "MPERF: host_before (0x%" PRIx64 ") >= guest (0x%" PRIx64 ")",
+ host_mperf_before, guest_mperf);
+ TEST_ASSERT(guest_mperf < host_mperf_after,
+ "MPERF: guest (0x%" PRIx64 ") >= host_after (0x%" PRIx64 ")",
+ guest_mperf, host_mperf_after);
+
+ host_aperf_before = host_aperf_after;
+ host_mperf_before = host_mperf_after;
+
+ break;
+ }
+ }
+
+ kvm_vm_free(vm);
+ close(msr_fd);
+
+ return 0;
+}
--
2.49.0.1204.g71687c7c1d-goog
* Re: [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
2025-05-30 18:52 ` [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF Jim Mattson
@ 2025-06-10 8:42 ` Mi, Dapeng
2025-06-10 16:59 ` Jim Mattson
2025-06-24 22:24 ` Sean Christopherson
1 sibling, 1 reply; 14+ messages in thread
From: Mi, Dapeng @ 2025-06-10 8:42 UTC
To: Jim Mattson, linux-kernel, kvm, Sean Christopherson,
Paolo Bonzini
On 5/31/2025 2:52 AM, Jim Mattson wrote:
> For a VCPU thread pinned to a single LPU, verify that interleaved host
> and guest reads of IA32_[AM]PERF return strictly increasing values when
> APERFMPERF exiting is disabled.
>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../testing/selftests/kvm/include/kvm_util.h | 2 +
> tools/testing/selftests/kvm/lib/kvm_util.c | 17 +++
> .../selftests/kvm/x86/aperfmperf_test.c | 132 ++++++++++++++++++
> 4 files changed, 152 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86/aperfmperf_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index 3e786080473d..8d42a3bd0dd8 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -131,6 +131,7 @@ TEST_GEN_PROGS_x86 += x86/amx_test
> TEST_GEN_PROGS_x86 += x86/max_vcpuid_cap_test
> TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
> TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
> +TEST_GEN_PROGS_x86 += x86/aperfmperf_test
> TEST_GEN_PROGS_x86 += access_tracking_perf_test
> TEST_GEN_PROGS_x86 += coalesced_io_test
> TEST_GEN_PROGS_x86 += dirty_log_perf_test
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 93013564428b..43a1bef10ec0 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -1158,4 +1158,6 @@ bool vm_is_gpa_protected(struct kvm_vm *vm, vm_paddr_t paddr);
>
> uint32_t guest_get_vcpuid(void);
>
> +int pin_task_to_one_cpu(void);
> +
> #endif /* SELFTEST_KVM_UTIL_H */
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 5649cf2f40e8..b6c707ab92d7 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -10,6 +10,7 @@
> #include "ucall_common.h"
>
> #include <assert.h>
> +#include <pthread.h>
> #include <sched.h>
> #include <sys/mman.h>
> #include <sys/resource.h>
> @@ -2321,3 +2322,19 @@ bool vm_is_gpa_protected(struct kvm_vm *vm, vm_paddr_t paddr)
> pg = paddr >> vm->page_shift;
> return sparsebit_is_set(region->protected_phy_pages, pg);
> }
> +
> +int pin_task_to_one_cpu(void)
> +{
> + int cpu = sched_getcpu();
> + cpu_set_t cpuset;
> + int rc;
> +
> + CPU_ZERO(&cpuset);
> + CPU_SET(cpu, &cpuset);
> +
> + rc = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
> + TEST_ASSERT(rc == 0, "%s: Can't set thread affinity", __func__);
> +
> + return cpu;
> +}
> +
> diff --git a/tools/testing/selftests/kvm/x86/aperfmperf_test.c b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
> new file mode 100644
> index 000000000000..64d976156693
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Test for KVM_X86_DISABLE_EXITS_APERFMPERF
> + *
> + * Copyright (C) 2025, Google LLC.
> + *
> + * Test the ability to disable VM-exits for rdmsr of IA32_APERF and
> + * IA32_MPERF. When these VM-exits are disabled, reads of these MSRs
> + * return the host's values.
> + *
> + * Note: Requires read access to /dev/cpu/<lpu>/msr to read host MSRs.
> + */
> +
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <unistd.h>
> +#include <asm/msr-index.h>
> +
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "test_util.h"
> +
> +#define NUM_ITERATIONS 100
> +
> +static int open_dev_msr(int cpu)
> +{
> + char path[PATH_MAX];
> +
> + snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
> + return open_path_or_exit(path, O_RDONLY);
> +}
> +
> +static uint64_t read_dev_msr(int msr_fd, uint32_t msr)
> +{
> + uint64_t data;
> + ssize_t rc;
> +
> + rc = pread(msr_fd, &data, sizeof(data), msr);
> + TEST_ASSERT(rc == sizeof(data), "Read of MSR 0x%x failed", msr);
> +
> + return data;
> +}
> +
> +static void guest_code(void)
> +{
> + int i;
> +
> + for (i = 0; i < NUM_ITERATIONS; i++)
> + GUEST_SYNC2(rdmsr(MSR_IA32_APERF), rdmsr(MSR_IA32_MPERF));
> +
> + GUEST_DONE();
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + uint64_t host_aperf_before, host_mperf_before;
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> + int msr_fd;
> + int cpu;
> + int i;
> +
> + cpu = pin_task_to_one_cpu();
> +
> + msr_fd = open_dev_msr(cpu);
> +
> + /*
> + * This test requires a non-standard VM initialization, because
> + * KVM_ENABLE_CAP cannot be used on a VM file descriptor after
> + * a VCPU has been created.
> + */
> + vm = vm_create(1);
> +
> + TEST_REQUIRE(vm_check_cap(vm, KVM_CAP_X86_DISABLE_EXITS) &
> + KVM_X86_DISABLE_EXITS_APERFMPERF);
> +
> + vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS,
> + KVM_X86_DISABLE_EXITS_APERFMPERF);
> +
> + vcpu = vm_vcpu_add(vm, 0, guest_code);
> +
> + host_aperf_before = read_dev_msr(msr_fd, MSR_IA32_APERF);
> + host_mperf_before = read_dev_msr(msr_fd, MSR_IA32_MPERF);
> +
> + for (i = 0; i < NUM_ITERATIONS; i++) {
> + uint64_t host_aperf_after, host_mperf_after;
> + uint64_t guest_aperf, guest_mperf;
> + struct ucall uc;
> +
> + vcpu_run(vcpu);
> + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_DONE:
> + break;
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + case UCALL_SYNC:
> + guest_aperf = uc.args[0];
> + guest_mperf = uc.args[1];
> +
> + host_aperf_after = read_dev_msr(msr_fd, MSR_IA32_APERF);
> + host_mperf_after = read_dev_msr(msr_fd, MSR_IA32_MPERF);
> +
> + TEST_ASSERT(host_aperf_before < guest_aperf,
> + "APERF: host_before (0x%" PRIx64 ") >= guest (0x%" PRIx64 ")",
> + host_aperf_before, guest_aperf);
> + TEST_ASSERT(guest_aperf < host_aperf_after,
> + "APERF: guest (0x%" PRIx64 ") >= host_after (0x%" PRIx64 ")",
> + guest_aperf, host_aperf_after);
> + TEST_ASSERT(host_mperf_before < guest_mperf,
> + "MPERF: host_before (0x%" PRIx64 ") >= guest (0x%" PRIx64 ")",
> + host_mperf_before, guest_mperf);
> + TEST_ASSERT(guest_mperf < host_mperf_after,
> + "MPERF: guest (0x%" PRIx64 ") >= host_after (0x%" PRIx64 ")",
> + guest_mperf, host_mperf_after);
Should we consider the possibility of these 2 MSRs overflowing, although
it could be extremely rare? Thanks.
> +
> + host_aperf_before = host_aperf_after;
> + host_mperf_before = host_mperf_after;
> +
> + break;
> + }
> + }
> +
> + kvm_vm_free(vm);
> + close(msr_fd);
> +
> + return 0;
> +}
* Re: [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
2025-06-10 8:42 ` Mi, Dapeng
@ 2025-06-10 16:59 ` Jim Mattson
2025-06-11 1:47 ` Mi, Dapeng
0 siblings, 1 reply; 14+ messages in thread
From: Jim Mattson @ 2025-06-10 16:59 UTC
To: Mi, Dapeng; +Cc: linux-kernel, kvm, Sean Christopherson, Paolo Bonzini
On Tue, Jun 10, 2025 at 1:42 AM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
>
> On 5/31/2025 2:52 AM, Jim Mattson wrote:
> > For a VCPU thread pinned to a single LPU, verify that interleaved host
> > and guest reads of IA32_[AM]PERF return strictly increasing values when
> > APERFMPERF exiting is disabled.
> Should we consider the possibility of these 2 MSRs overflowing, although
> it could be extremely rare? Thanks.
Unless someone moves the MSRs forward, at current frequencies, the
machine will have to be up for more than 100 years. I'll be long dead
by then.
Note that frequency invariant scheduling doesn't accommodate overflow
either. If the MSRs overflow, frequency invariant scheduling is
disabled.
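(Back of the envelope: at ~4 GHz, a 64-bit counter wraps after
2^64 / 4e9 ~= 4.6e9 seconds, roughly 146 years.)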
* Re: [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
2025-06-10 16:59 ` Jim Mattson
@ 2025-06-11 1:47 ` Mi, Dapeng
0 siblings, 0 replies; 14+ messages in thread
From: Mi, Dapeng @ 2025-06-11 1:47 UTC
To: Jim Mattson; +Cc: linux-kernel, kvm, Sean Christopherson, Paolo Bonzini
On 6/11/2025 12:59 AM, Jim Mattson wrote:
> On Tue, Jun 10, 2025 at 1:42 AM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>
>> On 5/31/2025 2:52 AM, Jim Mattson wrote:
>>> For a VCPU thread pinned to a single LPU, verify that interleaved host
>>> and guest reads of IA32_[AM]PERF return strictly increasing values when
>>> APERFMPERF exiting is disabled.
>> Should we consider the possibility of these 2 MSRs overflowing, although
>> it could be extremely rare? Thanks.
> Unless someone moves the MSRs forward, at current frequencies, the
> machine will have to be up for more than 100 years. I'll be long dead
> by then.
😂
>
> Note that frequency invariant scheduling doesn't accommodate overflow
> either. If the MSRs overflow, frequency invariant scheduling is
> disabled.
Agree.
* Re: [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64
2025-05-30 18:52 ` [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64 Jim Mattson
@ 2025-06-24 21:25 ` Sean Christopherson
2025-06-24 22:34 ` Jim Mattson
0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2025-06-24 21:25 UTC
To: Jim Mattson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Fri, May 30, 2025, Jim Mattson wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 570e7f8cbf64..8c20afda4398 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6605,13 +6605,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> pr_warn_once(SMT_RSB_MSG);
>
> if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> - kvm->arch.pause_in_guest = true;
> + kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_PAUSE);
> if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
> - kvm->arch.mwait_in_guest = true;
> + kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_MWAIT);
> if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> - kvm->arch.hlt_in_guest = true;
> + kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_HLT);
> if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> - kvm->arch.cstate_in_guest = true;
> + kvm_disable_exits(kvm, KVM_X86_DISABLE_EXITS_CSTATE);
> r = 0;
> disable_exits_unlock:
> mutex_unlock(&kvm->lock);
Can't this simply be the following? The set of capabilities to disable has
already been vetted, so I don't see any reason to manually process each flag.
mutex_lock(&kvm->lock);
if (kvm->created_vcpus)
goto disable_exits_unlock;
#define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
"KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."
if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
cpu_smt_possible() &&
(cap->args[0] & ~(KVM_X86_DISABLE_EXITS_PAUSE |
KVM_X86_DISABLE_EXITS_APERFMPERF)))
pr_warn_once(SMT_RSB_MSG);
kvm_disable_exits(kvm, cap->args[0]);
r = 0;
disable_exits_unlock:
mutex_unlock(&kvm->lock);
break;
* Re: [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
2025-05-30 18:52 ` [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
@ 2025-06-24 21:35 ` Sean Christopherson
2025-06-24 22:37 ` Jim Mattson
2025-06-24 23:31 ` Sean Christopherson
1 sibling, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2025-06-24 21:35 UTC
To: Jim Mattson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Fri, May 30, 2025, Jim Mattson wrote:
> Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
> without interception.
>
> The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not
> handled at all. The MSR values are not zeroed on vCPU creation, saved
> on suspend, or restored on resume. No accommodation is made for
> processor migration or for sharing a logical processor with other
> tasks. No adjustments are made for non-unit TSC multipliers. The MSRs
> do not account for time the same way as the comparable PMU events,
> whether the PMU is virtualized by the traditional emulation method or
> the new mediated pass-through approach.
>
> Nonetheless, in a properly constrained environment, this capability
> can be combined with a guest CPUID table that advertises support for
> CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the
> effective physical CPU frequency in /proc/cpuinfo. Moreover, there is
> no performance cost for this capability.
>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
> Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
> arch/x86/kvm/svm/svm.c | 7 +++++++
> arch/x86/kvm/svm/svm.h | 2 +-
> arch/x86/kvm/vmx/vmx.c | 6 ++++++
> arch/x86/kvm/vmx/vmx.h | 2 +-
> arch/x86/kvm/x86.c | 8 +++++++-
> arch/x86/kvm/x86.h | 5 +++++
> include/uapi/linux/kvm.h | 1 +
> tools/include/uapi/linux/kvm.h | 1 +
> 9 files changed, 52 insertions(+), 3 deletions(-)
This needs to be rebased on top of the MSR interception rework, which I've now
pushed to kvm-x86 next. Luckily, it's quite painless. Compile tested only at
this point (about to throw it onto metal).
I'd be happy to post a v5 on your behalf (pending your thoughts on my feedback
to patch 1), unless you want the honors. The fixup is a wee bit more than I'm
comfortable doing on-the-fly.
---
Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 5 +++++
arch/x86/kvm/vmx/vmx.c | 4 ++++
arch/x86/kvm/x86.c | 6 +++++-
arch/x86/kvm/x86.h | 5 +++++
include/uapi/linux/kvm.h | 1 +
tools/include/uapi/linux/kvm.h | 1 +
7 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0d961436d0f..13a752b1200f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7844,6 +7844,7 @@ Valid bits in args[0] are::
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+ #define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
@@ -7854,6 +7855,28 @@ all such vmexits.
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
+Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
+than just disabling APERF/MPERF exits. While both Intel and AMD
+document strict usage conditions for these MSRs--emphasizing that only
+the ratio of their deltas over a time interval (T0 to T1) is
+architecturally defined--simply passing through the MSRs can still
+produce an incorrect ratio.
+
+This erroneous ratio can occur if, between T0 and T1:
+
+1. The vCPU thread migrates between logical processors.
+2. Live migration or suspend/resume operations take place.
+3. Another task shares the vCPU's logical processor.
+4. C-states lower than C0 are emulated (e.g., via HLT interception).
+5. The guest TSC frequency doesn't match the host TSC frequency.
+
+Due to these complexities, KVM does not automatically associate this
+passthrough capability with the guest CPUID bit,
+``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
+mechanism adequate for virtualizing the ``IA32_APERF`` and
+``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
+
+
7.14 KVM_CAP_S390_HPAGE_1M
--------------------------
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ce85f4d6f686..079c0a0b0eaa 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -854,6 +854,11 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
svm_set_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW,
guest_cpuid_is_intel_compatible(vcpu));
+ if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
+ svm_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
+ svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+ }
+
if (sev_es_guest(vcpu->kvm))
sev_es_recalc_msr_intercepts(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8c16a3aff896..723a22be2514 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4099,6 +4099,10 @@ void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
}
+ if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
+ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+ }
/* PT MSRs can be passed through iff PT is exposed to the guest. */
if (vmx_pt_mode_is_host_guest())
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56569ac2e9a4..75c0f52d3c44 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4580,6 +4580,9 @@ static u64 kvm_get_allowed_disable_exits(void)
{
u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
+ if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+ r |= KVM_X86_DISABLE_EXITS_APERFMPERF;
+
if (!mitigate_smt_rsb) {
r |= KVM_X86_DISABLE_EXITS_HLT |
KVM_X86_DISABLE_EXITS_CSTATE;
@@ -6478,7 +6481,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
cpu_smt_possible() &&
- (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
+ (cap->args[0] & ~(KVM_X86_DISABLE_EXITS_PAUSE |
+ KVM_X86_DISABLE_EXITS_APERFMPERF)))
pr_warn_once(SMT_RSB_MSG);
kvm_disable_exits(kvm, cap->args[0]);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 31ae58b765f3..bcfd9b719ada 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -546,6 +546,11 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm)
return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_CSTATE;
}
+static inline bool kvm_aperfmperf_in_guest(struct kvm *kvm)
+{
+ return kvm->arch.disabled_exits & KVM_X86_DISABLE_EXITS_APERFMPERF;
+}
+
static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm)
{
return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7a4c35ff03fe..aeb2ca10b190 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -644,6 +644,7 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d00b85cb168c..7415a3863891 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF (1 << 4)
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
--
* Re: [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
2025-05-30 18:52 ` [PATCH v4 3/3] KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF Jim Mattson
2025-06-10 8:42 ` Mi, Dapeng
@ 2025-06-24 22:24 ` Sean Christopherson
1 sibling, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2025-06-24 22:24 UTC
To: Jim Mattson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Fri, May 30, 2025, Jim Mattson wrote:
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 5649cf2f40e8..b6c707ab92d7 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -10,6 +10,7 @@
> #include "ucall_common.h"
>
> #include <assert.h>
> +#include <pthread.h>
> #include <sched.h>
> #include <sys/mman.h>
> #include <sys/resource.h>
> @@ -2321,3 +2322,19 @@ bool vm_is_gpa_protected(struct kvm_vm *vm, vm_paddr_t paddr)
> pg = paddr >> vm->page_shift;
> return sparsebit_is_set(region->protected_phy_pages, pg);
> }
> +
> +int pin_task_to_one_cpu(void)
> +{
> + int cpu = sched_getcpu();
> + cpu_set_t cpuset;
> + int rc;
> +
> + CPU_ZERO(&cpuset);
> + CPU_SET(cpu, &cpuset);
> +
> + rc = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
> + TEST_ASSERT(rc == 0, "%s: Can't set thread affinity", __func__);
> +
> + return cpu;
> +}
There's already kvm_pin_this_task_to_pcpu(), which is *almost* what is needed
here. I'll slot in a patch in v5 to expand that into a set of APIs, along with
a patch to convert the low hanging fruit (arch_timer tests).
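For context, a self-pinning wrapper over that existing helper could be
as small as the sketch below (the name is illustrative, not necessarily
what v5 will use):

static int pin_self_to_current_pcpu(void)
{
	int cpu = sched_getcpu();

	TEST_ASSERT(cpu >= 0, "sched_getcpu() failed");
	kvm_pin_this_task_to_pcpu(cpu);

	return cpu;
}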
* Re: [PATCH v4 1/3] KVM: x86: Replace growing set of *_in_guest bools with a u64
2025-06-24 21:25 ` Sean Christopherson
@ 2025-06-24 22:34 ` Jim Mattson
0 siblings, 0 replies; 14+ messages in thread
From: Jim Mattson @ 2025-06-24 22:34 UTC
To: Sean Christopherson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Tue, Jun 24, 2025 at 2:25 PM Sean Christopherson <seanjc@google.com> wrote:
> Can't this simply be the following? The set of capabilities to disable has
> already been vetted, so I don't see any reason to manually process each flag.
I love it! Thank you.
* Re: [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
2025-06-24 21:35 ` Sean Christopherson
@ 2025-06-24 22:37 ` Jim Mattson
0 siblings, 0 replies; 14+ messages in thread
From: Jim Mattson @ 2025-06-24 22:37 UTC
To: Sean Christopherson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Tue, Jun 24, 2025 at 2:35 PM Sean Christopherson <seanjc@google.com> wrote:
>
> This needs to be rebased on top of the MSR interception rework, which I've now
> pushed to kvm-x86 next. Luckily, it's quite painless. Compile tested only at
> this point (about to throw it onto metal).
>
> I'd be happy to post a v5 on your behalf (pending your thoughts on my feedback
> to patch 1), unless you want the honors. The fixup is a wee bit more than I'm
> comfortable doing on-the-fly.
>
Please do.
* Re: [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
2025-05-30 18:52 ` [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts Jim Mattson
2025-06-24 21:35 ` Sean Christopherson
@ 2025-06-24 23:31 ` Sean Christopherson
2025-06-25 0:11 ` Jim Mattson
1 sibling, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2025-06-24 23:31 UTC
To: Jim Mattson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Fri, May 30, 2025, Jim Mattson wrote:
> @@ -7790,6 +7791,28 @@ all such vmexits.
>
> Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
>
> +Virtualizing the ``IA32_APERF`` and ``IA32_MPERF`` MSRs requires more
> +than just disabling APERF/MPERF exits. While both Intel and AMD
> +document strict usage conditions for these MSRs--emphasizing that only
> +the ratio of their deltas over a time interval (T0 to T1) is
> +architecturally defined--simply passing through the MSRs can still
> +produce an incorrect ratio.
> +
> +This erroneous ratio can occur if, between T0 and T1:
> +
> +1. The vCPU thread migrates between logical processors.
> +2. Live migration or suspend/resume operations take place.
> +3. Another task shares the vCPU's logical processor.
> +4. C-states lower than C0 are emulated (e.g., via HLT interception).
> +5. The guest TSC frequency doesn't match the host TSC frequency.
> +
> +Due to these complexities, KVM does not automatically associate this
> +passthrough capability with the guest CPUID bit,
> +``CPUID.6:ECX.APERFMPERF[bit 0]``. Userspace VMMs that deem this
> +mechanism adequate for virtualizing the ``IA32_APERF`` and
> +``IA32_MPERF`` MSRs must set the guest CPUID bit explicitly.
Question: what do we want to do about nested? Due to differences between SVM
and VMX at the time you posted your patches, this series _as posted_ will do
nested passthrough for SVM, but not VMX (before the MSR rework, SVM auto-merged
bitmaps for all MSRs in svm_direct_access_msrs).
As I've got it locally applied, neither SVM nor VMX will do passthrough to L2.
I'm leaning toward allowing full passthrough, because (a) it's easy, (b) I can't
think of any reason not to, and (c) SVM's semi-auto-merging logic means we could
*unintentionally* do full passthrough in the future, in the unlikely event that
KVM added passthrough support for an MSR in the same chunk as APERF and MPERF.
This would be the extent of the changes (I think, haven't tested yet).
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 749f7b866ac8..b7fd2e869998 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -194,7 +194,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
* Hardcode the capacity of the array based on the maximum number of _offsets_.
* MSRs are batched together, so there are fewer offsets than MSRs.
*/
-static int nested_svm_msrpm_merge_offsets[6] __ro_after_init;
+static int nested_svm_msrpm_merge_offsets[7] __ro_after_init;
static int nested_svm_nr_msrpm_merge_offsets __ro_after_init;
typedef unsigned long nsvm_msrpm_merge_t;
@@ -216,6 +216,8 @@ int __init nested_svm_init_msrpm_merge_offsets(void)
MSR_IA32_SPEC_CTRL,
MSR_IA32_PRED_CMD,
MSR_IA32_FLUSH_CMD,
+ MSR_IA32_APERF,
+ MSR_IA32_MPERF,
MSR_IA32_LASTBRANCHFROMIP,
MSR_IA32_LASTBRANCHTOIP,
MSR_IA32_LASTINTFROMIP,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c69df3aba8d1..b8ea1969113d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -715,6 +715,12 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_APERF, MSR_TYPE_R);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_MPERF, MSR_TYPE_R);
+
kvm_vcpu_unmap(vcpu, &map);
vmx->nested.force_msr_bitmap_recalc = false;
* Re: [PATCH v4 2/3] KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
2025-06-24 23:31 ` Sean Christopherson
@ 2025-06-25 0:11 ` Jim Mattson
0 siblings, 0 replies; 14+ messages in thread
From: Jim Mattson @ 2025-06-25 0:11 UTC
To: Sean Christopherson; +Cc: linux-kernel, kvm, Paolo Bonzini
On Tue, Jun 24, 2025 at 4:31 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Question: what do we want to do about nested? Due to differences between SVM
> and VMX at the time you posted your patches, this series _as posted_ will do
> nested passthrough for SVM, but not VMX (before the MSR rework, SVM auto-merged
> bitmaps for all MSRs in svm_direct_access_msrs).
>
> As I've got it locally applied, neither SVM nor VMX will do passthrough to L2.
> I'm leaning toward allowing full passthrough, because (a) it's easy, (b) I can't
> think of any reason not to, and (c) SVM's semi-auto-merging logic means we could
> *unintentinally* do full passthrough in the future, in the unlikely event that
> KVM added passthrough support for an MSR in the same chunk as APERF and MPERF.
I think full passthrough makes sense.