* [PATCH v5 01/13] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 02/13] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails Yosry Ahmed
` (12 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
According to the APM, TF on VMRUN causes a #DB after VMRUN completes on
the _host_ side. However, KVM injects a #DB in L2 context instead (or
exits to userspace if KVM_GUESTDBG_SINGLESTEP is set) in
kvm_skip_emulated_instruction().
Introduce __kvm_skip_emulated_instruction(), pull single-step handling
into the wrapper, and use __kvm_skip_emulated_instruction() for VMRUN.
This ignores TF on VMRUN instead of injecting a spurious exception into
L2. Document this virtualization hole with a FIXME.
Note that a failed VMRUN would have been correctly single-stepped, but
now TF is always ignored for consistency and simplicity. VMX
does not support TF on VMLAUNCH/VMRESUME, so it's unlikely that
single-stepping VMRUN properly is important, especially if it's only for
failed VMRUNs.
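For reference, the resulting split looks roughly like this (simplified
sketch; the hunks below are authoritative). Single-step handling stays in
the kvm_skip_emulated_instruction() wrapper, while the new inner helper
only advances RIP and retires the instruction for the PMU:

	int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
	{
		int r = kvm_x86_call(skip_emulated_instruction)(vcpu);

		if (unlikely(!r))
			return 0;

		kvm_pmu_instruction_retired(vcpu);
		return r;
	}

	int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
	{
		unsigned long rflags = kvm_x86_call(get_rflags)(vcpu);
		int r = __kvm_skip_emulated_instruction(vcpu);

		if (unlikely(!r))
			return 0;

		/* TF (or KVM_GUESTDBG_SINGLESTEP) handling stays here. */
		if (unlikely(rflags & X86_EFLAGS_TF))
			r = kvm_vcpu_do_singlestep(vcpu);
		return r;
	}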
Fixes: c8e16b78c614 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()")
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm/nested.c | 11 ++++++++---
arch/x86/kvm/x86.c | 15 +++++++++++++--
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa4..b191967c9c1e4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2475,7 +2475,9 @@ void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
+
int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 961804df5f451..5dfcbaf7743b0 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,11 +1125,16 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
/* Advance RIP past VMRUN as part of the nested #VMEXIT. */
- return kvm_skip_emulated_instruction(vcpu);
+ return __kvm_skip_emulated_instruction(vcpu);
}
- /* At this point, VMRUN is guaranteed to not fault; advance RIP. */
- ret = kvm_skip_emulated_instruction(vcpu);
+ /*
+ * At this point, VMRUN is guaranteed to not fault; advance RIP.
+ *
+ * FIXME: If TF is set on VMRUN, KVM should inject a #DB (or handle
+ * guest debugging) right after #VMEXIT; right now it's just ignored.
+ */
+ ret = __kvm_skip_emulated_instruction(vcpu);
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a9..31dc48a8111e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9272,9 +9272,8 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
return 1;
}
-int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
- unsigned long rflags = kvm_x86_call(get_rflags)(vcpu);
int r;
r = kvm_x86_call(skip_emulated_instruction)(vcpu);
@@ -9282,6 +9281,18 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
return 0;
kvm_pmu_instruction_retired(vcpu);
+ return r;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_skip_emulated_instruction);
+
+int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+{
+ unsigned long rflags = kvm_x86_call(get_rflags)(vcpu);
+ int r;
+
+ r = __kvm_skip_emulated_instruction(vcpu);
+ if (unlikely(!r))
+ return 0;
/*
* rflags is the old, "raw" value of the flags. The new value has
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 02/13] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 01/13] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2 Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 03/13] KVM: nSVM: Move VMRUN instruction retirement after entering guest mode Yosry Ahmed
` (11 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
If __kvm_skip_emulated_instruction() fails, then RIP could not be
advanced correctly (e.g. decode failure when NextRIP is not available).
KVM will exit to userspace to handle the emulation failure, but only
after stuffing the wrong RIP into vmcb01 and entering guest mode.
Bail early and exit to userspace before committing any side-effects of
emulating the VMRUN. Unify both calls to
__kvm_skip_emulated_instruction() into a single one, but return
immediately afterwards if copying and caching vmcb12 failed. As a side
effect, the FIXME comment now sits above the only remaining call site.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5dfcbaf7743b0..0f6ea490d707b 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1120,21 +1120,22 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
}
ret = nested_svm_copy_vmcb12_to_cache(vcpu, vmcb12_gpa);
- if (ret) {
- if (ret == -EFAULT)
- return kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
-
- /* Advance RIP past VMRUN as part of the nested #VMEXIT. */
- return __kvm_skip_emulated_instruction(vcpu);
- }
+ if (ret == -EFAULT)
+ return kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
/*
- * At this point, VMRUN is guaranteed to not fault; advance RIP.
+ * At this point, VMRUN is guaranteed to not fault; advance RIP. If
+ * caching vmcb12 failed for other reasons, return immediately afterward
+ * as a nested #VMEXIT was already set up.
*
* FIXME: If TF is set on VMRUN, KVM should inject a #DB (or handle
* guest debugging) right after #VMEXIT; right now it's just ignored.
*/
- ret = __kvm_skip_emulated_instruction(vcpu);
+ if (!__kvm_skip_emulated_instruction(vcpu))
+ return 0;
+
+ if (ret)
+ return 1;
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
@@ -1164,7 +1165,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_svm_vmexit(svm);
}
- return ret;
+ return 1;
}
/* Copy state save area fields which are handled by VMRUN */
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 03/13] KVM: nSVM: Move VMRUN instruction retirement after entering guest mode
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 01/13] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2 Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 02/13] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 04/13] KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c Yosry Ahmed
` (10 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
A successful VMRUN retires in guest mode and should be counted by the
PMU as a guest instruction. However, __kvm_skip_emulated_instruction()
is called before entering guest mode to advance L1's RIP to the
instruction following VMRUN. This is needed as the RIP is saved in
vmcb01 to be restored on VM-Exit.
Since VMRUN emulation is the only caller of
__kvm_skip_emulated_instruction(), move retiring instructions for PMU
purposes to its wrapper, leaving __kvm_skip_emulated_instruction() as a
transparent wrapper around the vendor-specific calls.
Note that this is currently a no-op because KVM does not virtualize
Host-Only/Guest-Only PMC controls yet, so all instructions are counted
regardless of the vCPU's host/guest state. But this change is needed for
the upcoming support for Host-Only/Guest-Only controls to count VMRUN
correctly.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 9 ++++++++-
arch/x86/kvm/x86.c | 11 +++--------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 0f6ea490d707b..58c78c889a812 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -30,6 +30,7 @@
#include "lapic.h"
#include "svm.h"
#include "hyperv.h"
+#include "pmu.h"
#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
@@ -1135,7 +1136,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return 0;
if (ret)
- return 1;
+ goto insn_retired;
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
@@ -1165,6 +1166,12 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_svm_vmexit(svm);
}
+insn_retired:
+ /*
+ * A successful VMRUN is counted by the PMU in guest mode, so only
+ * retire the instruction after potentially entering guest mode.
+ */
+ kvm_pmu_instruction_retired(vcpu);
return 1;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 31dc48a8111e5..08be0a63b93bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9274,14 +9274,7 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
- int r;
-
- r = kvm_x86_call(skip_emulated_instruction)(vcpu);
- if (unlikely(!r))
- return 0;
-
- kvm_pmu_instruction_retired(vcpu);
- return r;
+ return kvm_x86_call(skip_emulated_instruction)(vcpu);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_skip_emulated_instruction);
@@ -9294,6 +9287,8 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
if (unlikely(!r))
return 0;
+ kvm_pmu_instruction_retired(vcpu);
+
/*
* rflags is the old, "raw" value of the flags. The new value has
* not been saved yet.
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 04/13] KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (2 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 03/13] KVM: nSVM: Move VMRUN instruction retirement after entering guest mode Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 05/13] KVM: x86/pmu: Rename reprogram_counters() to clarify usage Yosry Ahmed
` (9 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
The declaration and definition of enable_pmu/enable_mediated_pmu
semantically belong in pmu.h and pmu.c, and more importantly, pmu.h
uses enable_mediated_pmu and relies on the caller including x86.h.
There is already precedent for other module params defined outside of
x86.c, so move enable_pmu/enable_mediated_pmu to pmu.c.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 10 ++++++++++
arch/x86/kvm/pmu.h | 3 +++
arch/x86/kvm/x86.c | 9 ---------
arch/x86/kvm/x86.h | 3 ---
4 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index e218352e34231..d6ac3c55fce55 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -16,6 +16,7 @@
#include <linux/perf_event.h>
#include <linux/bsearch.h>
#include <linux/sort.h>
+#include <linux/moduleparam.h>
#include <asm/perf_event.h>
#include <asm/cpu_device_id.h>
#include "x86.h"
@@ -33,6 +34,15 @@ static struct x86_pmu_capability __read_mostly kvm_host_pmu;
struct x86_pmu_capability __read_mostly kvm_pmu_cap;
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_cap);
+/* Enable/disable PMU virtualization */
+bool __read_mostly enable_pmu = true;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_pmu);
+module_param(enable_pmu, bool, 0444);
+
+/* Enable/disabled mediated PMU virtualization. */
+bool __read_mostly enable_mediated_pmu;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mediated_pmu);
+
struct kvm_pmu_emulated_event_selectors {
u64 INSTRUCTIONS_RETIRED;
u64 BRANCH_INSTRUCTIONS_RETIRED;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0925246731cb1..b1f2418e960ac 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -53,6 +53,9 @@ struct kvm_pmu_ops {
const u32 MSR_STRIDE;
};
+extern bool enable_pmu;
+extern bool enable_mediated_pmu;
+
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
void kvm_handle_guest_mediated_pmi(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08be0a63b93bd..0b421ea29977b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -182,15 +182,6 @@ module_param(force_emulation_prefix, int, 0644);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, 0644);
-/* Enable/disable PMU virtualization */
-bool __read_mostly enable_pmu = true;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_pmu);
-module_param(enable_pmu, bool, 0444);
-
-/* Enable/disabled mediated PMU virtualization. */
-bool __read_mostly enable_mediated_pmu;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mediated_pmu);
-
bool __read_mostly eager_page_split = true;
module_param(eager_page_split, bool, 0644);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 38a905fa86de2..30a69effc81e2 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -490,9 +490,6 @@ fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
extern struct kvm_caps kvm_caps;
extern struct kvm_host_values kvm_host;
-extern bool enable_pmu;
-extern bool enable_mediated_pmu;
-
void kvm_setup_xss_caps(void);
/*
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 05/13] KVM: x86/pmu: Rename reprogram_counters() to clarify usage
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (3 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 04/13] KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 06/13] KVM: x86/pmu: Do a single atomic OR when reprogramming counters Yosry Ahmed
` (8 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
Rename reprogram_counters() to kvm_pmu_request_counters_reprogram() to
clarify that it is more similar to kvm_pmu_request_counter_reprogram()
than to reprogram_counter(). The kvm_pmu_* prefix is also appropriate as the
function is exposed in the header.
Opportunistically rename the argument from 'diff' to 'counters'.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 2 +-
arch/x86/kvm/pmu.h | 7 ++++---
arch/x86/kvm/vmx/pmu_intel.c | 2 +-
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d6ac3c55fce55..afbc731e72174 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -889,7 +889,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->global_ctrl != data) {
diff = pmu->global_ctrl ^ data;
pmu->global_ctrl = data;
- reprogram_counters(pmu, diff);
+ kvm_pmu_request_counters_reprogram(pmu, diff);
}
/*
* Unconditionally forward writes to vendor code, i.e. to the
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b1f2418e960ac..f8286067722b0 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -210,14 +210,15 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
-static inline void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
+static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
+ u64 counters)
{
int bit;
- if (!diff)
+ if (!counters)
return;
- for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
+ for_each_set_bit(bit, (unsigned long *)&counters, X86_PMC_IDX_MAX)
set_bit(bit, pmu->reprogram_pmi);
kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 27eb76e6b6a03..9bd77843d8da2 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -391,7 +391,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->pebs_enable != data) {
diff = pmu->pebs_enable ^ data;
pmu->pebs_enable = data;
- reprogram_counters(pmu, diff);
+ kvm_pmu_request_counters_reprogram(pmu, diff);
}
break;
case MSR_IA32_DS_AREA:
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 06/13] KVM: x86/pmu: Do a single atomic OR when reprogramming counters
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (4 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 05/13] KVM: x86/pmu: Rename reprogram_counters() to clarify usage Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM Yosry Ahmed
` (7 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
Do a single atomic OR using the atomic overlay of the reprogram_pmi bitmask,
instead of one atomic set_bit() call per counter.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index f8286067722b0..0e99022168a85 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -213,13 +213,10 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
u64 counters)
{
- int bit;
-
if (!counters)
return;
- for_each_set_bit(bit, (unsigned long *)&counters, X86_PMC_IDX_MAX)
- set_bit(bit, pmu->reprogram_pmi);
+ atomic64_or(counters, &pmu->__reprogram_pmi);
kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
}
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (5 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 06/13] KVM: x86/pmu: Do a single atomic OR when reprogramming counters Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 23:24 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 08/13] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions Yosry Ahmed
` (6 subsequent siblings)
13 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
Introduce a per-vendor PMU callback for reprogramming counters, and
register a callback on AMD to disable a counter based on the vCPU's
setting of Host-Only or Guest-Only EVENT_SELECT bits.
If EFER.SVME is set, all events are counted if both bits are set or
both are cleared. If only one bit is set, the counter is disabled if the
vCPU context does not match the set bit.
If EFER.SVME is cleared, the counter is disabled if either bit is set;
otherwise all events are counted. Note that a Linux guest correctly
handles this and clears Host-Only when EFER.SVME is cleared, see commit
1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM
disabled").
The reprogram_counters() callback is invoked after the reprogram_counter()
loop, as it depends on kvm_mediated_pmu_refresh_event_filter() setting
ARCH_PERFMON_EVENTSEL_ENABLE for any enabled counters first.
kvm_mediated_pmu_load() writes the updated value of eventsel_hw to the
appropriate MSR before the vCPU is run.
Host-Only and Guest-Only bits are currently reserved, so this change is
a no-op, but the bits will be allowed with the mediated PMU in a later
change once fully supported.
Originally-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 +
arch/x86/include/asm/perf_event.h | 2 ++
arch/x86/kvm/pmu.c | 6 +++-
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/svm/pmu.c | 43 ++++++++++++++++++++++++++
5 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index d5452b3433b7d..5402efd26282b 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -23,6 +23,7 @@ KVM_X86_PMU_OP(init)
KVM_X86_PMU_OP_OPTIONAL(reset)
KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
KVM_X86_PMU_OP_OPTIONAL(cleanup)
+KVM_X86_PMU_OP_OPTIONAL(reprogram_counters)
KVM_X86_PMU_OP_OPTIONAL(write_global_ctrl)
KVM_X86_PMU_OP(mediated_load)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ff5acb8b199b0..5961c002b28eb 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -60,6 +60,8 @@
#define AMD64_EVENTSEL_INT_CORE_ENABLE (1ULL << 36)
#define AMD64_EVENTSEL_GUESTONLY (1ULL << 40)
#define AMD64_EVENTSEL_HOSTONLY (1ULL << 41)
+#define AMD64_EVENTSEL_HOST_GUEST_MASK \
+ (AMD64_EVENTSEL_HOSTONLY | AMD64_EVENTSEL_GUESTONLY)
#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT 37
#define AMD64_EVENTSEL_INT_CORE_SEL_MASK \
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index afbc731e72174..5e3a10e0a54ff 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -646,9 +646,11 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
+ u64 counters;
int bit;
bitmap_copy(bitmap, pmu->reprogram_pmi, X86_PMC_IDX_MAX);
+ counters = *(u64 *)bitmap;
/*
* The reprogramming bitmap can be written asynchronously by something
@@ -656,7 +658,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
* the bits that will actually processed.
*/
BUILD_BUG_ON(sizeof(bitmap) != sizeof(atomic64_t));
- atomic64_andnot(*(s64 *)bitmap, &pmu->__reprogram_pmi);
+ atomic64_andnot(counters, &pmu->__reprogram_pmi);
kvm_for_each_pmc(pmu, pmc, bit, bitmap) {
/*
@@ -669,6 +671,8 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
set_bit(pmc->idx, pmu->reprogram_pmi);
}
+ kvm_pmu_call(reprogram_counters)(vcpu, counters);
+
/*
* Release unused perf_events if the corresponding guest MSRs weren't
* accessed during the last vCPU time slice (need_cleanup is set when
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0e99022168a85..0c372b9f8ed34 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -36,6 +36,7 @@ struct kvm_pmu_ops {
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
void (*cleanup)(struct kvm_vcpu *vcpu);
+ void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
void (*mediated_load)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 7aa298eeb0721..fe6f2bb79ab83 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
}
+static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
+ struct kvm_pmc *pmc)
+{
+ u64 host_guest_bits;
+
+ if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
+ return;
+
+ /* Count all events if both bits are cleared */
+ host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
+ if (!host_guest_bits)
+ return;
+
+ /*
+ * If EFER.SVME is set, the counter is disabledd if only one of the bits
+ * is set and it doesn't match the vCPU context. If EFER.SVME is
+ * cleared, the counter is disable if any of the bits is set.
+ */
+ if (vcpu->arch.efer & EFER_SVME) {
+ if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
+ return;
+
+ if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
+ return;
+ }
+
+ pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+}
+
+static void amd_pmu_reprogram_counters(struct kvm_vcpu *vcpu, u64 counters)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+ int bit;
+
+ if (!kvm_vcpu_has_mediated_pmu(vcpu))
+ return;
+
+ kvm_for_each_pmc(pmu, pmc, bit, (unsigned long *)&counters)
+ amd_mediated_pmu_handle_host_guest_bits(vcpu, pmc);
+}
+
struct kvm_pmu_ops amd_pmu_ops __initdata = {
.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = amd_msr_idx_to_pmc,
@@ -269,6 +311,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
.set_msr = amd_pmu_set_msr,
.refresh = amd_pmu_refresh,
.init = amd_pmu_init,
+ .reprogram_counters = amd_pmu_reprogram_counters,
.is_mediated_pmu_supported = amd_pmu_is_mediated_pmu_supported,
.mediated_load = amd_mediated_pmu_load,
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
2026-04-30 20:27 ` [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM Yosry Ahmed
@ 2026-04-30 23:24 ` Yosry Ahmed
2026-05-01 3:34 ` Yosry Ahmed
0 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 23:24 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 0e99022168a85..0c372b9f8ed34 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> void (*reset)(struct kvm_vcpu *vcpu);
> void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> void (*cleanup)(struct kvm_vcpu *vcpu);
> + void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
>
> bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> void (*mediated_load)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 7aa298eeb0721..fe6f2bb79ab83 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> }
>
> +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> + struct kvm_pmc *pmc)
> +{
> + u64 host_guest_bits;
> +
> + if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> + return;
> +
> + /* Count all events if both bits are cleared */
> + host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> + if (!host_guest_bits)
> + return;
> +
> + /*
> + * If EFER.SVME is set, the counter is disabledd if only one of the bits
> + * is set and it doesn't match the vCPU context. If EFER.SVME is
> + * cleared, the counter is disable if any of the bits is set.
> + */
> + if (vcpu->arch.efer & EFER_SVME) {
> + if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> + return;
> +
> + if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> + return;
> + }
> +
> + pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
Sashiko raised a good point here. In the following patch, we reprogram
the counters synchronously on nested transitions to update the
counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
However, this updates pmc->eventsel_hw while
kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to check if the
counter is enabled.
I think either pmc_is_locally_enabled() needs to check
pmc->eventsel_hw or we need to update pmc->eventsel here.
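To make the mismatch concrete (simplified sketch; pmc_is_locally_enabled()
as it is in pmu.h today):

	/* What the reprogram path above clears when the context doesn't match: */
	pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;

	/* What kvm_pmu_recalc_pmc_emulation() consults via pmc_is_locally_enabled(): */
	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;

so a counter disabled because of the Host-Only/Guest-Only bits still looks
enabled to the emulation recalculation.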
AFAICT, pmc->eventsel has the value written to the MSR, so I think we
want to keep that as-is.
On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
about intentionally ignoring event filters.
We can also separately track enablement for nested purposes, but that
would add one more thing we need to check aside from general counter
enablement and event filtering.
None of these options are ideal, perhaps directly clearing the bit in
pmc->eventsel would do the least damage as (pmc->eventsel &
ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
pmc_is_locally_enabled().
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
2026-04-30 23:24 ` Yosry Ahmed
@ 2026-05-01 3:34 ` Yosry Ahmed
2026-05-01 17:50 ` Yosry Ahmed
0 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-01 3:34 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel
On Thu, Apr 30, 2026 at 4:24 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > index 0e99022168a85..0c372b9f8ed34 100644
> > --- a/arch/x86/kvm/pmu.h
> > +++ b/arch/x86/kvm/pmu.h
> > @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> > void (*reset)(struct kvm_vcpu *vcpu);
> > void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > void (*cleanup)(struct kvm_vcpu *vcpu);
> > + void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
> >
> > bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> > void (*mediated_load)(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > index 7aa298eeb0721..fe6f2bb79ab83 100644
> > --- a/arch/x86/kvm/svm/pmu.c
> > +++ b/arch/x86/kvm/svm/pmu.c
> > @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> > wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> > }
> >
> > +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> > + struct kvm_pmc *pmc)
> > +{
> > + u64 host_guest_bits;
> > +
> > + if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> > + return;
> > +
> > + /* Count all events if both bits are cleared */
> > + host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > + if (!host_guest_bits)
> > + return;
> > +
> > + /*
> > + * If EFER.SVME is set, the counter is disabledd if only one of the bits
> > + * is set and it doesn't match the vCPU context. If EFER.SVME is
> > + * cleared, the counter is disable if any of the bits is set.
> > + */
> > + if (vcpu->arch.efer & EFER_SVME) {
> > + if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > + return;
> > +
> > + if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> > + return;
> > + }
> > +
> > + pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>
> Sashiko raised a good point here. In the following patch, we reprogram
> the counters synchronously on nested transitions to update the
> counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
> However, this updates pmc->eventsel_hw while
> kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to check if the
> counter is enabled.
>
> I think either pmc_is_locally_enabled() needs to check
> pmc->eventsel_hw or we need to update pmc->eventsel here.
>
> AFAICT, pmc->eventsel has the value written to the MSR, so I think we
> want to keep that as-is.
>
> On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
> pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
> event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
> about intentionally ignoring event filters.
>
> We can also separately track enablement for nested purposes, but that
> would add one more thing we need to check aside from general counter
> enablement and event filtering.
>
> None of these options are ideal, perhaps directly clearing the bit in
> pmc->eventsel would do the least damage as (pmc->eventsel &
> ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
> pmc_is_locally_enabled().
No, we can't really clear the bit in pmc->eventsel as that's what the
guest reads with RDMSR.
The more I think about this, the more I think we should drop
pmc->eventsel_hw. It serves two purposes AFAICT:
1. On AMD, we use it to clear HOSTONLY and set GUESTONLY in actual HW.
2. For event filtering, we use it to clear EVENTSEL_ENABLE.
But maybe it's easier to explicitly track the changes we need to apply
to eventsel rather than a HW version?
(1) is trivial, we can just clear HOSTONLY and set GUESTONLY before
doing the actual write as that's constant.
For (2), instead of eventsel_hw, maybe add a boolean called 'filtered'
to track if that PMC should be filtered out or not. Then, we can add
another boolean to track if the counter is disabled due to
nested/HG-bits (e.g. 'nested_disabled').
With that, pmc_is_locally_enabled() would check:
(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) && !pmc->nested_disabled
, and then before writing to HW we check pmc->filtered,
pmc->nested_disabled, as well as do the HOSTONLY/GUESTONLY changes for
AMD.
Actually, instead of a boolean, maybe add 'disabled_reasons' flags,
with possible flags like PMC_DISABLED_FILTER and PMC_DISABLED_NESTED.
This might all be unclear, I will draft some diff on top of the series
tomorrow and send it in case it makes things clearer.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
2026-05-01 3:34 ` Yosry Ahmed
@ 2026-05-01 17:50 ` Yosry Ahmed
0 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-01 17:50 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel
On Thu, Apr 30, 2026 at 08:34:59PM -0700, Yosry Ahmed wrote:
> On Thu, Apr 30, 2026 at 4:24 PM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > > index 0e99022168a85..0c372b9f8ed34 100644
> > > --- a/arch/x86/kvm/pmu.h
> > > +++ b/arch/x86/kvm/pmu.h
> > > @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> > > void (*reset)(struct kvm_vcpu *vcpu);
> > > void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > > void (*cleanup)(struct kvm_vcpu *vcpu);
> > > + void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
> > >
> > > bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> > > void (*mediated_load)(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > > index 7aa298eeb0721..fe6f2bb79ab83 100644
> > > --- a/arch/x86/kvm/svm/pmu.c
> > > +++ b/arch/x86/kvm/svm/pmu.c
> > > @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> > > wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> > > }
> > >
> > > +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> > > + struct kvm_pmc *pmc)
> > > +{
> > > + u64 host_guest_bits;
> > > +
> > > + if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> > > + return;
> > > +
> > > + /* Count all events if both bits are cleared */
> > > + host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > > + if (!host_guest_bits)
> > > + return;
> > > +
> > > + /*
> > > + * If EFER.SVME is set, the counter is disabledd if only one of the bits
> > > + * is set and it doesn't match the vCPU context. If EFER.SVME is
> > > + * cleared, the counter is disable if any of the bits is set.
> > > + */
> > > + if (vcpu->arch.efer & EFER_SVME) {
> > > + if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > > + return;
> > > +
> > > + if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> > > + return;
> > > + }
> > > +
> > > + pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
> >
> > Sashiko raised a good point here. In the following patch, we reprogram
> > the counters synchronously on nested transitions to update the
> > counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
> > However, this updates pmc->eventsel_hw while
> > kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to check if the
> > counter is enabled.
> >
> > I think either pmc_is_locally_enabled() needs to check
> > pmc->eventsel_hw or we need to update pmc->eventsel here.
> >
> > AFAICT, pmc->eventsel has the value written to the MSR, so I think we
> > want to keep that as-is.
> >
> > On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
> > pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
> > event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
> > about intentionally ignoring event filters.
> >
> > We can also separately track enablement for nested purposes, but that
> > would add one more thing we need to check aside from general counter
> > enablement and event filtering.
> >
> > None of these options are ideal, perhaps directly clearing the bit in
> > pmc->eventsel would do the least damage as (pmc->eventsel &
> > ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
> > pmc_is_locally_enabled().
>
> No, we can't really clear the bit in pmc->eventsel as that's what the
> guest reads with RDMSR.
>
> The more I think about this, the more I think we should drop
> pmc->eventsel_hw. It serves two purposes AFAICT:
> 1. On AMD, we use it to clear HOSTONLY and set GUESTONLY in actual HW.
> 2. For event filtering, we use it to clear EVENTSEL_ENABLE.
>
> But maybe it's easier to explicitly track the changes we need to apply
> to eventsel rather than a HW version?
>
> (1) is trivial, we can just clear HOSTONLY and set GUESTONLY before
> doing the actual write as that's constant.
>
> For (2), instead of eventsel_hw, maybe add a boolean called 'filtered'
> to track if that PMC should be filtered out or not. Then, we can add
> another boolean to track if the counter is disabled due to
> nested/HG-bits (e.g. 'nested_disabled').
>
> With that, pmc_is_locally_enabled() would check:
> (pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) && !pmc->nested_disabled
>
> , and then before writing to HW we check pmc->filtered,
> pmc->nested_disabled, as well as do the HOSTONLY/GUESTONLY changes for
> AMD.
>
> Actually, instead of a boolean, maybe add 'disabled_reasons' flags,
> with possible flags like PMC_DISABLED_FILTER and PMC_DISABLED_NESTED.
>
> This might all be unclear, I will draft some diff on top of the series
> tomorrow and send it in case it makes things clearer.
This is what I had in mind essentially:
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b191967c9c1e4..9c38c47581e75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -515,11 +515,21 @@ enum pmc_type {
KVM_PMC_FIXED,
};
+#define KVM_PMC_DISABLE_FILTERED BIT(0)
+#define KVM_PMC_DISABLE_NESTED BIT(1)
+
struct kvm_pmc {
enum pmc_type type;
u8 idx;
bool is_paused;
bool intr;
+
+ /*
+ * Reasons why the PMC should be disabled in eventsel when written to HW
+ * with the mediated vPMU.
+ */
+ u8 eventsel_hw_disable_reasons;
+
/*
* Base value of the PMC counter, relative to the *consumed* count in
* the associated perf_event. This value includes counter updates from
@@ -538,7 +548,6 @@ struct kvm_pmc {
*/
u64 emulated_counter;
u64 eventsel;
- u64 eventsel_hw;
struct perf_event *perf_event;
struct kvm_vcpu *vcpu;
/*
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b2b4ce6bdad9..5128858610d83 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -536,10 +536,9 @@ static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
if (pmc_is_gp(pmc)) {
- pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
- if (allowed)
- pmc->eventsel_hw |= pmc->eventsel &
- ARCH_PERFMON_EVENTSEL_ENABLE;
+ pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_FILTERED;
+ if (!allowed)
+ pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_FILTERED;
} else {
u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);
@@ -630,7 +629,7 @@ void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc)
* omitting a PMC from a bitmap could result in a missed event if the
* filter is changed to allow counting the event.
*/
- if (!pmc_is_locally_enabled(pmc))
+ if (!pmc_is_locally_enabled(pmc) || pmc_is_nested_disabled(pmc))
return;
if (pmc_is_event_match(pmc, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED))
@@ -944,7 +943,7 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
if (pmc_is_gp(pmc)) {
pmc->eventsel = 0;
- pmc->eventsel_hw = 0;
+ pmc->eventsel_hw_disable_reasons = 0;
}
}
@@ -1313,6 +1312,19 @@ static __always_inline u32 gp_eventsel_msr(u32 idx)
return kvm_pmu_ops.GP_EVENTSEL_BASE + idx * kvm_pmu_ops.MSR_STRIDE;
}
+static __always_inline u64 gp_calc_eventsel_hw(struct kvm_pmc *pmc)
+{
+ u64 eventsel = pmc->eventsel;
+
+ if (pmc->eventsel_hw_disable_reasons)
+ eventsel &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+
+ eventsel |= kvm_pmu_ops.GP_EVENTSEL_ALWAYS_SET;
+ eventsel &= ~kvm_pmu_ops.GP_EVENTSEL_ALWAYS_CLR;
+
+ return eventsel;
+}
+
static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -1329,7 +1341,7 @@ static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
if (pmc->counter != rdpmc(i))
wrmsrl(gp_counter_msr(i), pmc->counter);
- wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
+ wrmsrl(gp_eventsel_msr(i), gp_calc_eventsel_hw(pmc));
}
for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
pmc = &pmu->fixed_counters[i];
@@ -1387,7 +1399,7 @@ static void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu)
pmc->counter = rdpmc(i);
if (pmc->counter)
wrmsrq(gp_counter_msr(i), 0);
- if (pmc->eventsel_hw)
+ if (gp_calc_eventsel_hw(pmc))
wrmsrq(gp_eventsel_msr(i), 0);
}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4a9148cf779df..fcfce6a213aef 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -52,6 +52,9 @@ struct kvm_pmu_ops {
const u32 GP_COUNTER_BASE;
const u32 FIXED_COUNTER_BASE;
const u32 MSR_STRIDE;
+
+ const u64 GP_EVENTSEL_ALWAYS_SET;
+ const u64 GP_EVENTSEL_ALWAYS_CLR;
};
extern bool enable_pmu;
@@ -197,6 +200,11 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
}
+static inline bool pmc_is_nested_disabled(struct kvm_pmc *pmc)
+{
+ return pmc->eventsel_hw_disable_reasons & KVM_PMC_DISABLE_NESTED;
+}
+
extern struct x86_pmu_capability kvm_pmu_cap;
void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b12c35b4fccbf..2ac00e729d04b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -166,8 +166,6 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
data &= ~pmu->reserved_bits;
if (data != pmc->eventsel) {
pmc->eventsel = data;
- pmc->eventsel_hw = (data & ~AMD64_EVENTSEL_HOSTONLY) |
- AMD64_EVENTSEL_GUESTONLY;
kvm_pmu_request_counter_reprogram(pmc);
}
return 0;
@@ -270,6 +268,8 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
struct vcpu_svm *svm = to_svm(vcpu);
u64 host_guest_bits;
+ pmc->eventsel_hw_disable_reasons &= ~KVM_PMC_DISABLE_NESTED;
+
if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
return;
@@ -296,7 +296,7 @@ static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
return;
}
- pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
+ pmc->eventsel_hw_disable_reasons |= KVM_PMC_DISABLE_NESTED;
}
static void amd_pmu_reprogram_counters(struct kvm_vcpu *vcpu, u64 counters)
@@ -336,4 +336,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
.GP_COUNTER_BASE = MSR_F15H_PERF_CTR0,
.FIXED_COUNTER_BASE = 0,
.MSR_STRIDE = 2,
+
+ .GP_EVENTSEL_ALWAYS_SET = AMD64_EVENTSEL_GUESTONLY,
+ .GP_EVENTSEL_ALWAYS_CLR = AMD64_EVENTSEL_HOSTONLY,
};
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9bd77843d8da2..05b68f40f189e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -431,7 +431,6 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data != pmc->eventsel) {
pmc->eventsel = data;
- pmc->eventsel_hw = data;
kvm_pmu_request_counter_reprogram(pmc);
}
break;
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 08/13] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (6 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 09/13] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU Yosry Ahmed
` (5 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
Reprogram PMU counters on nested transitions for the mediated PMU, to
re-evaluate Host-Only and Guest-Only bits and enable/disable the PMU
counters accordingly. For example, if Host-Only is set and Guest-Only is
cleared, a counter should be disabled when entering guest mode and
enabled when exiting guest mode.
According to the APM, when EFER.SVME is cleared, setting Host-Only or
Guest-Only disables the counter, so also trigger counter reprogramming
when EFER.SVME is toggled.
Use a bitmap to track counters that have either Host-Only or Guest-Only
set, as those are the ones that require reprogramming on nested
transitions. Track such counters even if EFER.SVME is cleared, since
counters with Host-Only or Guest-Only set also need to be reprogrammed
when EFER.SVME is toggled.
Reprogram the counters synchronously on nested VMRUN/#VMEXIT and
EFER.SVME toggling. This is necessary as these instructions are counted
based on the new CPU state (after the instruction is retired in
hardware). Hence, the PMU needs to be updated before instruction
emulation is completed and kvm_pmu_instruction_retired() is called.
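For the emulated VMRUN case, the resulting ordering is roughly as follows
(sketch with intermediate calls elided; the hunks below and patch 3 are
authoritative):

	enter_guest_mode(vcpu);
	svm_pmu_handle_nested_transition(svm);	/* re-evaluate Host/Guest-Only synchronously */
	...
	kvm_pmu_instruction_retired(vcpu);	/* VMRUN is counted against the new (L2) state */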
Defer reprogramming the counters when force leaving guest mode through
svm_leave_nested() to avoid potentially reading stale state (e.g.
incorrect EFER). All flows that force-leave nested mode are non-architectural,
so precision is not a priority.
Refactor a helper out of kvm_pmu_request_counters_reprogram() that
accepts a boolean allowing synchronous vs. deferred reprogramming, and
use that from SVM code to support both scenarios.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/pmu.h | 18 ++++++++++++++----
arch/x86/kvm/svm/nested.c | 12 ++++++++++++
arch/x86/kvm/svm/pmu.c | 12 +++++++++---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/svm/svm.h | 33 +++++++++++++++++++++++++++++++++
6 files changed, 71 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5e3a10e0a54ff..7b2b4ce6bdad9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -684,6 +684,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
kvm_for_each_pmc(pmu, pmc, bit, bitmap)
kvm_pmu_recalc_pmc_emulation(pmu, pmc);
}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_handle_event);
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
{
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0c372b9f8ed34..4a9148cf779df 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -202,6 +202,7 @@ extern struct x86_pmu_capability kvm_pmu_cap;
void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc);
+void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
{
@@ -211,14 +212,24 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
-static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
- u64 counters)
+static inline void __kvm_pmu_reprogram_counters(struct kvm_pmu *pmu,
+ u64 counters,
+ bool defer)
{
if (!counters)
return;
atomic64_or(counters, &pmu->__reprogram_pmi);
- kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+ if (defer)
+ kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+ else
+ kvm_pmu_handle_event(pmu_to_vcpu(pmu));
+}
+
+static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
+ u64 counters)
+{
+ __kvm_pmu_reprogram_counters(pmu, counters, true);
}
/*
@@ -247,7 +258,6 @@ static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
}
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
-void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 58c78c889a812..bb3362c043395 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -826,6 +826,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
/* Enter Guest-Mode */
enter_guest_mode(vcpu);
+ svm_pmu_handle_nested_transition(svm);
/*
* Filled at exit: exit_code, exit_info_1, exit_info_2, exit_int_info,
@@ -1302,6 +1303,8 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
+ svm_pmu_handle_nested_transition(svm);
+
svm->nested.vmcb12_gpa = 0;
kvm_warn_on_nested_run_pending(vcpu);
@@ -1519,6 +1522,15 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
leave_guest_mode(vcpu);
+ /*
+ * Force leaving nested is a non-architectural flow so precision
+ * is not a priority. Defer updating the PMU until the next vCPU
+ * run, potentially tolerating some imprecision to avoid poking
+ * into PMU state from arbitrary contexts (e.g. KVM may end up
+ * using stale state).
+ */
+ __svm_pmu_handle_nested_transition(svm, true);
+
svm_switch_vmcb(svm, &svm->vmcb01);
nested_svm_uninit_mmu_context(vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index fe6f2bb79ab83..902d7eb4a461b 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -263,20 +263,26 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
struct kvm_pmc *pmc)
{
+ struct vcpu_svm *svm = to_svm(vcpu);
u64 host_guest_bits;
if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
return;
- /* Count all events if both bits are cleared */
+ /*
+ * If both bits are cleared, always keep the counter enabled. Otherwise,
+ * counter enablement needs to be re-evaluated on every nested
+ * transition (and EFER.SVME change).
+ */
host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
if (!host_guest_bits)
return;
+ __set_bit(pmc->idx, svm->nested.reprogram_pmcs_on_nested_transitions);
/*
- * If EFER.SVME is set, the counter is disabledd if only one of the bits
+ * If EFER.SVME is set, the counter is disabled if only one of the bits
* is set and it doesn't match the vCPU context. If EFER.SVME is
- * cleared, the counter is disable if any of the bits is set.
+ * cleared, the counter is disabled if any of the bits is set.
*/
if (vcpu->arch.efer & EFER_SVME) {
if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280d..7ffa3c9033d0f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -261,6 +261,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
set_exception_intercept(svm, GP_VECTOR);
}
+ svm_pmu_handle_nested_transition(svm);
kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
}
@@ -1214,6 +1215,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
svm->nested.vmcb12_gpa = INVALID_GPA;
svm->nested.last_vmcb12_gpa = INVALID_GPA;
+ bitmap_zero(svm->nested.reprogram_pmcs_on_nested_transitions, X86_PMC_IDX_MAX);
if (!kvm_pause_in_guest(vcpu->kvm)) {
control->pause_filter_count = pause_filter_count;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a10668d17a16a..8709e87621d21 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -24,6 +24,7 @@
#include "cpuid.h"
#include "kvm_cache_regs.h"
+#include "pmu.h"
/*
* Helpers to convert to/from physical addresses for pages whose address is
@@ -238,6 +239,13 @@ struct svm_nested_state {
* on its side.
*/
bool force_msr_bitmap_recalc;
+
+ /*
+ * PMU counters where Host-Only or Guest-Only bits are used need to be
+ * reprogrammed on nested transitions and EFER.SVME changes to correctly
+ * enable/disable the counters based on the vCPU state.
+ */
+ DECLARE_BITMAP(reprogram_pmcs_on_nested_transitions, X86_PMC_IDX_MAX);
};
struct vcpu_sev_es_state {
@@ -877,6 +885,31 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
+
+static inline void __svm_pmu_handle_nested_transition(struct vcpu_svm *svm, bool defer)
+{
+ u64 counters = *(u64 *)svm->nested.reprogram_pmcs_on_nested_transitions;
+
+ if (!counters)
+ return;
+
+ /* Reprogramming sets the bit again for PMCs that still need tracking */
+ bitmap_zero(svm->nested.reprogram_pmcs_on_nested_transitions, X86_PMC_IDX_MAX);
+ __kvm_pmu_reprogram_counters(vcpu_to_pmu(&svm->vcpu), counters, defer);
+}
+
+static inline void svm_pmu_handle_nested_transition(struct vcpu_svm *svm)
+{
+ /*
+ * Do NOT defer reprogramming the counters by default. Instructions
+ * causing a state change are counted based on the _new_ CPU state
+ * (e.g. a successful VMRUN is counted in guest mode). Hence, the
+ * counters should be reprogrammed with the new state _before_ the
+ * instruction is potentially counted upon emulation completion.
+ */
+ __svm_pmu_handle_nested_transition(svm, false);
+}
+
extern struct kvm_x86_nested_ops svm_nested_ops;
/* avic.c */
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v5 09/13] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (7 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 08/13] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 10/13] KVM: selftests: Refactor allocating guest stack into a helper Yosry Ahmed
` (4 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
From: Jim Mattson <jmattson@google.com>
Now that KVM correctly handles Host-Only and Guest-Only bits in the
event selector MSRs, allow the guest to set them if the vCPU advertises
SVM and uses the mediated PMU.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/pmu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 902d7eb4a461b..b12c35b4fccbf 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -207,7 +207,11 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
}
pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(48) - 1;
+
pmu->reserved_bits = 0xfffffff000280000ull;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SVM) && kvm_vcpu_has_mediated_pmu(vcpu))
+ pmu->reserved_bits &= ~AMD64_EVENTSEL_HOST_GUEST_MASK;
+
pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
/* not applicable to AMD; but clean them to prevent any fall out */
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread

* [PATCH v5 10/13] KVM: selftests: Refactor allocating guest stack into a helper
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (8 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 09/13] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 11/13] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack Yosry Ahmed
` (3 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
In preparation for reusing the logic to allocate stacks for nested
guests, refactor allocating a guest stack and aligning RSP into a
helper.
No functional change intended.
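As a quick, standalone sanity check of the alignment math in the new
helper (illustrative only; the addresses are hypothetical and this is not
part of the patch):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t size = 1 * 4096;		/* one 4 KiB stack page */
	uint64_t stack_gva = 0xc0000;		/* hypothetical page-aligned allocation */

	stack_gva += size;			/* start at the top of the stack */
	stack_gva -= 8;				/* as vm_alloc_stack() does */

	/* SysV AMD64 ABI: (%rsp + 8) must be a multiple of 16 at function entry. */
	assert((stack_gva + 8) % 16 == 0);
	return 0;
}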
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../testing/selftests/kvm/lib/x86/processor.c | 45 ++++++++++---------
1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b51467d70f6e7..94a1cadb2b26b 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -778,6 +778,30 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
REPORT_GUEST_ASSERT(uc);
}
+static gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages)
+{
+ int size = nr_pages * getpagesize();
+ gva_t stack_gva;
+
+ stack_gva = __vm_alloc(vm, size, DEFAULT_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA);
+ stack_gva += size;
+
+ /*
+ * Align stack to match calling sequence requirements in section "The
+ * Stack Frame" of the System V ABI AMD64 Architecture Processor
+ * Supplement, which requires the value (%rsp + 8) to be a multiple of
+ * 16 when control is transferred to the function entry point.
+ *
+ * If this code is ever used to launch a vCPU with 32-bit entry point it
+ * may need to subtract 4 bytes instead of 8 bytes.
+ */
+ TEST_ASSERT(IS_ALIGNED(stack_gva, PAGE_SIZE),
+ "__vm_alloc() did not provide a page-aligned address");
+ stack_gva -= 8;
+
+ return stack_gva;
+}
+
void kvm_arch_vm_post_create(struct kvm_vm *vm, unsigned int nr_vcpus)
{
int r;
@@ -820,27 +844,8 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, u32 vcpu_id)
{
struct kvm_mp_state mp_state;
struct kvm_regs regs;
- gva_t stack_gva;
struct kvm_vcpu *vcpu;
- stack_gva = __vm_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
- DEFAULT_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA);
-
- stack_gva += DEFAULT_STACK_PGS * getpagesize();
-
- /*
- * Align stack to match calling sequence requirements in section "The
- * Stack Frame" of the System V ABI AMD64 Architecture Processor
- * Supplement, which requires the value (%rsp + 8) to be a multiple of
- * 16 when control is transferred to the function entry point.
- *
- * If this code is ever used to launch a vCPU with 32-bit entry point it
- * may need to subtract 4 bytes instead of 8 bytes.
- */
- TEST_ASSERT(IS_ALIGNED(stack_gva, PAGE_SIZE),
- "__vm_alloc() did not provide a page-aligned address");
- stack_gva -= 8;
-
vcpu = __vm_vcpu_add(vm, vcpu_id);
vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
vcpu_init_sregs(vm, vcpu);
@@ -849,7 +854,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, u32 vcpu_id)
/* Setup guest general purpose registers */
vcpu_regs_get(vcpu, &regs);
regs.rflags = regs.rflags | 0x2;
- regs.rsp = stack_gva;
+ regs.rsp = vm_alloc_stack(vm, DEFAULT_STACK_PGS);
vcpu_regs_set(vcpu, &regs);
/* Setup the MP state */
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread

* [PATCH v5 11/13] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (9 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 10/13] KVM: selftests: Refactor allocating guest stack into a helper Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 12/13] KVM: selftests: Drop L1-provided stacks for L2 guests on x86 Yosry Ahmed
` (2 subsequent siblings)
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
Instead of relying on the L1-provided stack for L2, which is usually an
array on L1's own stack, allocate a dedicated page of VM memory for the
L2 stack in vcpu_alloc_{vmx/svm}() and use that as L2's RSP in the
VMCS/VMCB instead of the L1-provided value.
Most L1 guest code does not do anything with the L2 stack other than
stuff it in RSP, so this change is transparent and the L1-provided stack
is silently ignored. The only exception is memstress nested L1 code
which puts the vCPU index on L2's stack, so update this code to use the
newly allocated stack.
L1-provided stacks will be dropped and cleaned up separately.
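To make the memstress change easier to follow (an observation about the
diff below, not new code): the new ->stack field holds the top-of-stack
address returned by vm_alloc_stack(), and that same address is programmed
as L2's RSP, so writing the vCPU index through ->stack places it exactly
where L2 expects to find it on its stack. Condensed, the SVM flow is:

	svm->stack = (void *)vm_alloc_stack(vm, 1);	/* vcpu_alloc_svm() */
	*(u64 *)svm->stack = vcpu_id;			/* l1_svm_code(), top of L2's stack */
	vmcb->save.rsp = (u64)svm->stack;		/* generic_svm_setup() */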
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 ++
tools/testing/selftests/kvm/include/x86/svm_util.h | 3 +++
tools/testing/selftests/kvm/include/x86/vmx.h | 2 ++
tools/testing/selftests/kvm/lib/x86/memstress.c | 5 ++---
tools/testing/selftests/kvm/lib/x86/processor.c | 2 +-
tools/testing/selftests/kvm/lib/x86/svm.c | 4 +++-
tools/testing/selftests/kvm/lib/x86/vmx.c | 4 +++-
7 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 77f576ee7789d..36df2cadbc4f6 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1208,6 +1208,8 @@ struct idt_entry {
void vm_install_exception_handler(struct kvm_vm *vm, int vector,
void (*handler)(struct ex_regs *));
+gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages);
+
/*
* Exception fixup morphs #DE to an arbitrary magic vector so that '0' can be
* used to signal "no expcetion".
diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
index 6c013eb838beb..3b1cc484fba1c 100644
--- a/tools/testing/selftests/kvm/include/x86/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
@@ -28,6 +28,9 @@ struct svm_test_data {
void *msr_hva;
u64 msr_gpa;
+ /* Stack */
+ void *stack; /* gva */
+
/* NPT */
u64 ncr3_gpa;
};
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 90fffaf915958..1dcb9b86d33d3 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -524,6 +524,8 @@ struct vmx_pages {
u64 apic_access_gpa;
void *apic_access;
+ void *stack;
+
u64 eptp_gpa;
};
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 61cf952cd2dc2..fa07ef037cad1 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -43,7 +43,7 @@ static void l1_vmx_code(struct vmx_pages *vmx, u64 vcpu_id)
GUEST_ASSERT(ept_1g_pages_supported());
rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
- *rsp = vcpu_id;
+ *(u64 *)vmx->stack = vcpu_id;
prepare_vmcs(vmx, memstress_l2_guest_entry, rsp);
GUEST_ASSERT(!vmlaunch());
@@ -56,9 +56,8 @@ static void l1_svm_code(struct svm_test_data *svm, u64 vcpu_id)
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
unsigned long *rsp;
-
rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
- *rsp = vcpu_id;
+ *(u64 *)svm->stack = vcpu_id;
generic_svm_setup(svm, memstress_l2_guest_entry, rsp);
run_guest(svm->vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 94a1cadb2b26b..cf59ffed45b74 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -778,7 +778,7 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
REPORT_GUEST_ASSERT(uc);
}
-static gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages)
+gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages)
{
int size = nr_pages * getpagesize();
gva_t stack_gva;
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 3b01605ab016c..4e9c37f8d1a61 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -46,6 +46,8 @@ vcpu_alloc_svm(struct kvm_vm *vm, gva_t *p_svm_gva)
svm->msr_gpa = addr_gva2gpa(vm, (uintptr_t)svm->msr);
memset(svm->msr_hva, 0, getpagesize());
+ svm->stack = (void *)vm_alloc_stack(vm, 1);
+
if (vm->stage2_mmu.pgd_created)
svm->ncr3_gpa = vm->stage2_mmu.pgd;
@@ -122,7 +124,7 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
ctrl->msrpm_base_pa = svm->msr_gpa;
vmcb->save.rip = (u64)guest_rip;
- vmcb->save.rsp = (u64)guest_rsp;
+ vmcb->save.rsp = (u64)svm->stack;
guest_regs.rdi = (u64)svm;
if (svm->ncr3_gpa) {
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 67642759e4a05..81fe85cf22e8f 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -116,6 +116,8 @@ vcpu_alloc_vmx(struct kvm_vm *vm, gva_t *p_vmx_gva)
vmx->vmwrite_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmwrite);
memset(vmx->vmwrite_hva, 0, getpagesize());
+ vmx->stack = (void *)vm_alloc_stack(vm, 1);
+
if (vm->stage2_mmu.pgd_created)
vmx->eptp_gpa = vm->stage2_mmu.pgd;
@@ -370,7 +372,7 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
{
init_vmcs_control_fields(vmx);
init_vmcs_host_state();
- init_vmcs_guest_state(guest_rip, guest_rsp);
+ init_vmcs_guest_state(guest_rip, vmx->stack);
}
bool kvm_cpu_has_ept(void)
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread

* [PATCH v5 12/13] KVM: selftests: Drop L1-provided stacks for L2 guests on x86
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (10 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 11/13] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:27 ` [PATCH v5 13/13] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits Yosry Ahmed
2026-04-30 20:38 ` [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD " Yosry Ahmed
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
Now that a dedicated page is allocated for L2's stack and stuffed in
RSP, the L1-provided stack is unused. Drop the stacks allocated by L1
guest code for L2 in all x86 tests.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/include/x86/svm_util.h | 2 +-
tools/testing/selftests/kvm/include/x86/vmx.h | 2 +-
tools/testing/selftests/kvm/lib/x86/memstress.c | 14 ++------------
tools/testing/selftests/kvm/lib/x86/svm.c | 2 +-
tools/testing/selftests/kvm/lib/x86/vmx.c | 2 +-
tools/testing/selftests/kvm/x86/aperfmperf_test.c | 9 ++-------
.../selftests/kvm/x86/evmcs_smm_controls_test.c | 5 +----
tools/testing/selftests/kvm/x86/hyperv_evmcs.c | 6 +-----
tools/testing/selftests/kvm/x86/hyperv_svm_test.c | 6 +-----
tools/testing/selftests/kvm/x86/kvm_buslock_test.c | 9 ++-------
.../selftests/kvm/x86/nested_close_kvm_test.c | 12 ++----------
.../selftests/kvm/x86/nested_dirty_log_test.c | 8 ++------
.../selftests/kvm/x86/nested_emulation_test.c | 4 ++--
.../selftests/kvm/x86/nested_exceptions_test.c | 9 ++-------
.../selftests/kvm/x86/nested_invalid_cr3_test.c | 10 ++--------
.../selftests/kvm/x86/nested_tsc_adjust_test.c | 10 ++--------
.../selftests/kvm/x86/nested_tsc_scaling_test.c | 10 ++--------
.../selftests/kvm/x86/nested_vmsave_vmload_test.c | 6 +-----
tools/testing/selftests/kvm/x86/smm_test.c | 8 ++------
tools/testing/selftests/kvm/x86/state_test.c | 11 ++---------
tools/testing/selftests/kvm/x86/svm_int_ctl_test.c | 5 +----
.../selftests/kvm/x86/svm_lbr_nested_state.c | 6 +-----
.../selftests/kvm/x86/svm_nested_clear_efer_svme.c | 7 +------
.../selftests/kvm/x86/svm_nested_shutdown_test.c | 5 +----
.../kvm/x86/svm_nested_soft_inject_test.c | 6 +-----
.../selftests/kvm/x86/svm_nested_vmcb12_gpa.c | 13 ++++---------
tools/testing/selftests/kvm/x86/svm_vmcall_test.c | 5 +----
.../selftests/kvm/x86/triple_fault_event_test.c | 9 ++-------
.../selftests/kvm/x86/vmx_apic_access_test.c | 5 +----
.../selftests/kvm/x86/vmx_apicv_updates_test.c | 4 +---
.../kvm/x86/vmx_invalid_nested_guest_state.c | 6 +-----
.../selftests/kvm/x86/vmx_nested_la57_state_test.c | 5 +----
.../selftests/kvm/x86/vmx_preemption_timer_test.c | 5 +----
33 files changed, 49 insertions(+), 177 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
index 3b1cc484fba1c..c201c30485e72 100644
--- a/tools/testing/selftests/kvm/include/x86/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
@@ -60,7 +60,7 @@ static inline void vmmcall(void)
)
struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, gva_t *p_svm_gva);
-void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp);
+void generic_svm_setup(struct svm_test_data *svm, void *guest_rip);
void run_guest(struct vmcb *vmcb, u64 vmcb_gpa);
static inline bool kvm_cpu_has_npt(void)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 1dcb9b86d33d3..4bcfd60e3aecb 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -554,7 +554,7 @@ union vmx_ctrl_msr {
struct vmx_pages *vcpu_alloc_vmx(struct kvm_vm *vm, gva_t *p_vmx_gva);
bool prepare_for_vmx_operation(struct vmx_pages *vmx);
-void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp);
+void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip);
bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index fa07ef037cad1..e19e8b5a09c5a 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -30,21 +30,15 @@ __asm__(
" ud2;"
);
-#define L2_GUEST_STACK_SIZE 64
-
static void l1_vmx_code(struct vmx_pages *vmx, u64 vcpu_id)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- unsigned long *rsp;
-
GUEST_ASSERT(vmx->vmcs_gpa);
GUEST_ASSERT(prepare_for_vmx_operation(vmx));
GUEST_ASSERT(load_vmcs(vmx));
GUEST_ASSERT(ept_1g_pages_supported());
- rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
*(u64 *)vmx->stack = vcpu_id;
- prepare_vmcs(vmx, memstress_l2_guest_entry, rsp);
+ prepare_vmcs(vmx, memstress_l2_guest_entry);
GUEST_ASSERT(!vmlaunch());
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
@@ -53,12 +47,8 @@ static void l1_vmx_code(struct vmx_pages *vmx, u64 vcpu_id)
static void l1_svm_code(struct svm_test_data *svm, u64 vcpu_id)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- unsigned long *rsp;
-
- rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
*(u64 *)svm->stack = vcpu_id;
- generic_svm_setup(svm, memstress_l2_guest_entry, rsp);
+ generic_svm_setup(svm, memstress_l2_guest_entry);
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 4e9c37f8d1a61..1445b890986fd 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -83,7 +83,7 @@ void vm_enable_npt(struct kvm_vm *vm)
tdp_mmu_init(vm, vm->mmu.pgtable_levels, &pte_masks);
}
-void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
+void generic_svm_setup(struct svm_test_data *svm, void *guest_rip)
{
struct vmcb *vmcb = svm->vmcb;
u64 vmcb_gpa = svm->vmcb_gpa;
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 81fe85cf22e8f..33c477ce4a58b 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -368,7 +368,7 @@ static inline void init_vmcs_guest_state(void *rip, void *rsp)
vmwrite(GUEST_SYSENTER_EIP, vmreadz(HOST_IA32_SYSENTER_EIP));
}
-void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
+void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip)
{
init_vmcs_control_fields(vmx);
init_vmcs_host_state();
diff --git a/tools/testing/selftests/kvm/x86/aperfmperf_test.c b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
index c91660103137b..845cb685f1743 100644
--- a/tools/testing/selftests/kvm/x86/aperfmperf_test.c
+++ b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
@@ -54,8 +54,6 @@ static void guest_read_aperf_mperf(void)
GUEST_SYNC2(rdmsr(MSR_IA32_APERF), rdmsr(MSR_IA32_MPERF));
}
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
guest_read_aperf_mperf();
@@ -64,21 +62,18 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(vmcb, svm->vmcb_gpa);
}
static void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
- prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, NULL);
/*
* Enable MSR bitmaps (the bitmap itself is allocated, zeroed, and set
diff --git a/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c b/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
index 5b3aef109cfc5..77ce87c41a868 100644
--- a/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
+++ b/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
@@ -52,8 +52,6 @@ static void l2_guest_code(void)
static void guest_code(struct vmx_pages *vmx_pages,
struct hyperv_test_pages *hv_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
/* Set up Hyper-V enlightenments and eVMCS */
wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
@@ -62,8 +60,7 @@ static void guest_code(struct vmx_pages *vmx_pages,
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_evmcs(hv_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
GUEST_ASSERT(!vmlaunch());
diff --git a/tools/testing/selftests/kvm/x86/hyperv_evmcs.c b/tools/testing/selftests/kvm/x86/hyperv_evmcs.c
index c7fa114aee20f..1bda2cd3f7396 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_evmcs.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_evmcs.c
@@ -78,9 +78,6 @@ void l2_guest_code(void)
void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages,
gpa_t hv_hcall_page_gpa)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
wrmsr(HV_X64_MSR_HYPERCALL, hv_hcall_page_gpa);
@@ -100,8 +97,7 @@ void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages,
GUEST_SYNC(4);
GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
GUEST_SYNC(5);
GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);
diff --git a/tools/testing/selftests/kvm/x86/hyperv_svm_test.c b/tools/testing/selftests/kvm/x86/hyperv_svm_test.c
index 7a62f6a9d606d..1f74b0fa9b835 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_svm_test.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_svm_test.c
@@ -18,8 +18,6 @@
#include "svm_util.h"
#include "hyperv.h"
-#define L2_GUEST_STACK_SIZE 256
-
/* Exit to L1 from L2 with RDMSR instruction */
static inline void rdmsr_from_l2(u32 msr)
{
@@ -69,7 +67,6 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
struct hyperv_test_pages *hv_pages,
gpa_t pgs_gpa)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;
@@ -81,8 +78,7 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
GUEST_ASSERT(svm->vmcb_gpa);
/* Prepare for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* L2 TLB flush setup */
hve->partition_assist_page = hv_pages->partition_assist_gpa;
diff --git a/tools/testing/selftests/kvm/x86/kvm_buslock_test.c b/tools/testing/selftests/kvm/x86/kvm_buslock_test.c
index 52014a3210c88..25a182be00a97 100644
--- a/tools/testing/selftests/kvm/x86/kvm_buslock_test.c
+++ b/tools/testing/selftests/kvm/x86/kvm_buslock_test.c
@@ -26,8 +26,6 @@ static void guest_generate_buslocks(void)
atomic_inc(val);
}
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
guest_generate_buslocks();
@@ -36,21 +34,18 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(vmcb, svm->vmcb_gpa);
}
static void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
- prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, NULL);
GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_guest_code));
GUEST_ASSERT(!vmlaunch());
diff --git a/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c b/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c
index 761fec2934080..b974cfb347d6e 100644
--- a/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c
@@ -21,8 +21,6 @@ enum {
PORT_L0_EXIT = 0x2000,
};
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
/* Exit to L0 */
@@ -32,14 +30,11 @@ static void l2_guest_code(void)
static void l1_vmx_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
GUEST_ASSERT(!vmlaunch());
GUEST_ASSERT(0);
@@ -47,11 +42,8 @@ static void l1_vmx_code(struct vmx_pages *vmx_pages)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* Prepare the VMCB for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT(0);
diff --git a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
index 0e67cce835701..26b474bf13535 100644
--- a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
@@ -40,8 +40,6 @@
#define TEST_HVA(vm, idx) addr_gpa2hva(vm, TEST_GPA(idx))
-#define L2_GUEST_STACK_SIZE 64
-
/* Use the page offset bits to communicate the access+fault type. */
#define TEST_SYNC_READ_FAULT BIT(0)
#define TEST_SYNC_WRITE_FAULT BIT(1)
@@ -92,7 +90,6 @@ static void l2_guest_code_tdp_disabled(void)
void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
void *l2_rip;
GUEST_ASSERT(vmx->vmcs_gpa);
@@ -104,7 +101,7 @@ void l1_vmx_code(struct vmx_pages *vmx)
else
l2_rip = l2_guest_code_tdp_disabled;
- prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, l2_rip);
GUEST_SYNC(TEST_SYNC_NO_FAULT);
GUEST_ASSERT(!vmlaunch());
@@ -115,7 +112,6 @@ void l1_vmx_code(struct vmx_pages *vmx)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
void *l2_rip;
if (svm->ncr3_gpa)
@@ -123,7 +119,7 @@ static void l1_svm_code(struct svm_test_data *svm)
else
l2_rip = l2_guest_code_tdp_disabled;
- generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_rip);
GUEST_SYNC(TEST_SYNC_NO_FAULT);
run_guest(svm->vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/x86/nested_emulation_test.c b/tools/testing/selftests/kvm/x86/nested_emulation_test.c
index fb7dcbe53ac73..e08c6b0697e50 100644
--- a/tools/testing/selftests/kvm/x86/nested_emulation_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_emulation_test.c
@@ -57,7 +57,7 @@ static void guest_code(void *test_data)
struct svm_test_data *svm = test_data;
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, NULL, NULL);
+ generic_svm_setup(svm, NULL);
vmcb->save.idtr.limit = 0;
vmcb->save.rip = (u64)l2_guest_code;
@@ -69,7 +69,7 @@ static void guest_code(void *test_data)
GUEST_ASSERT(prepare_for_vmx_operation(test_data));
GUEST_ASSERT(load_vmcs(test_data));
- prepare_vmcs(test_data, NULL, NULL);
+ prepare_vmcs(test_data, NULL);
GUEST_ASSERT(!vmwrite(GUEST_IDTR_LIMIT, 0));
GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_guest_code));
GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, 0));
diff --git a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
index 186e980aa8eee..aeec3121c8e83 100644
--- a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
@@ -5,8 +5,6 @@
#include "vmx.h"
#include "svm_util.h"
-#define L2_GUEST_STACK_SIZE 256
-
/*
* Arbitrary, never shoved into KVM/hardware, just need to avoid conflict with
* the "real" exceptions used, #SS/#GP/#DF (12/13/8).
@@ -91,9 +89,8 @@ static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
static void l1_svm_code(struct svm_test_data *svm)
{
struct vmcb_control_area *ctrl = &svm->vmcb->control;
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, NULL);
svm->vmcb->save.idtr.limit = 0;
ctrl->intercept |= BIT_ULL(INTERCEPT_SHUTDOWN);
@@ -128,13 +125,11 @@ static void vmx_run_l2(void *l2_code, int vector, u32 error_code)
static void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
- prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, NULL);
GUEST_ASSERT_EQ(vmwrite(GUEST_IDTR_LIMIT, 0), 0);
/*
diff --git a/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c b/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c
index 11fd2467d8233..8c2ba9674558e 100644
--- a/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c
@@ -11,8 +11,6 @@
#include "kselftest.h"
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
vmcall();
@@ -20,11 +18,9 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
uintptr_t save_cr3;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* Try to run L2 with invalid CR3 and make sure it fails */
save_cr3 = svm->vmcb->save.cr3;
@@ -42,14 +38,12 @@ static void l1_svm_code(struct svm_test_data *svm)
static void l1_vmx_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
uintptr_t save_cr3;
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/* Try to run L2 with invalid CR3 and make sure it fails */
save_cr3 = vmreadz(GUEST_CR3);
diff --git a/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c
index f0e4adac47510..cb79d7b9619c2 100644
--- a/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c
@@ -34,8 +34,6 @@
#define TSC_ADJUST_VALUE (1ll << 32)
#define TSC_OFFSET_VALUE -(1ll << 48)
-#define L2_GUEST_STACK_SIZE 64
-
enum {
PORT_ABORT = 0x1000,
PORT_REPORT,
@@ -75,8 +73,6 @@ static void l2_guest_code(void)
static void l1_guest_code(void *data)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* Set TSC from L1 and make sure TSC_ADJUST is updated correctly */
GUEST_ASSERT(rdtsc() < TSC_ADJUST_VALUE);
wrmsr(MSR_IA32_TSC, rdtsc() - TSC_ADJUST_VALUE);
@@ -93,8 +89,7 @@ static void l1_guest_code(void *data)
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
@@ -105,8 +100,7 @@ static void l1_guest_code(void *data)
} else {
struct svm_test_data *svm = data;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
svm->vmcb->control.tsc_offset = TSC_OFFSET_VALUE;
run_guest(svm->vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c
index 190e93af20a14..18f765835bf4c 100644
--- a/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c
@@ -22,8 +22,6 @@
#define TSC_OFFSET_L2 ((u64)-33125236320908)
#define TSC_MULTIPLIER_L2 (L2_SCALE_FACTOR << 48)
-#define L2_GUEST_STACK_SIZE 64
-
enum { USLEEP, UCHECK_L1, UCHECK_L2 };
#define GUEST_SLEEP(sec) ucall(UCALL_SYNC, 2, USLEEP, sec)
#define GUEST_CHECK(level, freq) ucall(UCALL_SYNC, 2, level, freq)
@@ -82,13 +80,10 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* check that L1's frequency looks alright before launching L2 */
check_tsc_freq(UCHECK_L1);
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* enable TSC scaling for L2 */
wrmsr(MSR_AMD64_TSC_RATIO, L2_SCALE_FACTOR << 32);
@@ -105,7 +100,6 @@ static void l1_svm_code(struct svm_test_data *svm)
static void l1_vmx_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u32 control;
/* check that L1's frequency looks alright before launching L2 */
@@ -115,7 +109,7 @@ static void l1_vmx_code(struct vmx_pages *vmx_pages)
GUEST_ASSERT(load_vmcs(vmx_pages));
/* prepare the VMCS for L2 execution */
- prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/* enable TSC offsetting and TSC scaling for L2 */
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
diff --git a/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c b/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
index 85d3f4cc76f39..a130759f39a19 100644
--- a/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
@@ -28,8 +28,6 @@
#define TEST_VMCB_L2_GPA TEST_VMCB_L1_GPA(0)
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code_vmsave(void)
{
asm volatile("vmsave %0" : : "a"(TEST_VMCB_L2_GPA) : "memory");
@@ -70,10 +68,8 @@ static void l2_guest_code_vmcb1(void)
static void l1_guest_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* Each test case initializes the guest RIP below */
- generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, NULL);
/* Set VMSAVE/VMLOAD intercepts and make sure they work with.. */
svm->vmcb->control.intercept |= (BIT_ULL(INTERCEPT_VMSAVE) |
diff --git a/tools/testing/selftests/kvm/x86/smm_test.c b/tools/testing/selftests/kvm/x86/smm_test.c
index 740051167dbd4..e2542f4ced605 100644
--- a/tools/testing/selftests/kvm/x86/smm_test.c
+++ b/tools/testing/selftests/kvm/x86/smm_test.c
@@ -63,8 +63,6 @@ static void l2_guest_code(void)
static void guest_code(void *arg)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u64 apicbase = rdmsr(MSR_IA32_APICBASE);
struct svm_test_data *svm = arg;
struct vmx_pages *vmx_pages = arg;
@@ -81,13 +79,11 @@ static void guest_code(void *arg)
if (arg) {
if (this_cpu_has(X86_FEATURE_SVM)) {
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
} else {
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
}
sync_with_host(5);
diff --git a/tools/testing/selftests/kvm/x86/state_test.c b/tools/testing/selftests/kvm/x86/state_test.c
index 409c6cc9f9214..4a1056a6cb8dc 100644
--- a/tools/testing/selftests/kvm/x86/state_test.c
+++ b/tools/testing/selftests/kvm/x86/state_test.c
@@ -19,8 +19,6 @@
#include "vmx.h"
#include "svm_util.h"
-#define L2_GUEST_STACK_SIZE 256
-
void svm_l2_guest_code(void)
{
GUEST_SYNC(4);
@@ -35,13 +33,11 @@ void svm_l2_guest_code(void)
static void svm_l1_guest_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
GUEST_ASSERT(svm->vmcb_gpa);
/* Prepare for L2 execution. */
- generic_svm_setup(svm, svm_l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, svm_l2_guest_code);
vmcb->control.int_ctl |= (V_GIF_ENABLE_MASK | V_GIF_MASK);
@@ -78,8 +74,6 @@ void vmx_l2_guest_code(void)
static void vmx_l1_guest_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT(vmx_pages->vmcs_gpa);
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_SYNC(3);
@@ -89,8 +83,7 @@ static void vmx_l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_SYNC(4);
GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
- prepare_vmcs(vmx_pages, vmx_l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, vmx_l2_guest_code);
GUEST_SYNC(5);
GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
diff --git a/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c b/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c
index d3cc5e4f78831..7b1f4a4818bdd 100644
--- a/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c
@@ -54,15 +54,12 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
x2apic_enable();
/* Prepare for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* No virtual interrupt masking */
vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
diff --git a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
index 7fbfaa054c952..77c6ce9f45078 100644
--- a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
+++ b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
@@ -9,8 +9,6 @@
#include "svm_util.h"
-#define L2_GUEST_STACK_SIZE 64
-
#define DO_BRANCH() do { asm volatile("jmp 1f\n 1: nop"); } while (0)
struct lbr_branch {
@@ -55,7 +53,6 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm, bool nested_lbrv)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
struct lbr_branch l1_branch;
@@ -65,8 +62,7 @@ static void l1_guest_code(struct svm_test_data *svm, bool nested_lbrv)
CHECK_BRANCH_MSRS(&l1_branch);
/* Run L2, which will also do the same */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
if (nested_lbrv)
vmcb->control.misc_ctl2 = SVM_MISC2_ENABLE_V_LBR;
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c b/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
index 6a89eaffc6578..6bc301207cbcb 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
@@ -8,8 +8,6 @@
#include "kselftest.h"
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
unsigned long efer = rdmsr(MSR_EFER);
@@ -24,10 +22,7 @@ static void l2_guest_code(void)
static void l1_guest_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(svm->vmcb, svm->vmcb_gpa);
/* Unreachable, L1 should be shutdown */
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c b/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c
index c6ea3d609a629..2a4a216954bb3 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c
@@ -19,12 +19,9 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm, struct idt_entry *idt)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
vmcb->control.intercept &= ~(BIT(INTERCEPT_SHUTDOWN));
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c b/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
index f72f11d4c4f83..0b640d09d1943 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
@@ -78,17 +78,13 @@ static void l2_guest_code_nmi(void)
static void l1_guest_code(struct svm_test_data *svm, u64 is_nmi, u64 idt_alt)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
if (is_nmi)
x2apic_enable();
/* Prepare for L2 execution. */
- generic_svm_setup(svm,
- is_nmi ? l2_guest_code_nmi : l2_guest_code_int,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, is_nmi ? l2_guest_code_nmi : l2_guest_code_int);
vmcb->control.intercept_exceptions |= BIT(PF_VECTOR) | BIT(UD_VECTOR);
vmcb->control.intercept |= BIT(INTERCEPT_NMI) | BIT(INTERCEPT_HLT);
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c b/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c
index a4935ce2fb998..b3f45035745ff 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c
@@ -9,14 +9,9 @@
#include "kvm_test_harness.h"
#include "test_util.h"
-
-#define L2_GUEST_STACK_SIZE 64
-
#define SYNC_GP 101
#define SYNC_L2_STARTED 102
-static unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
static void guest_gp_handler(struct ex_regs *regs)
{
GUEST_SYNC(SYNC_GP);
@@ -30,28 +25,28 @@ static void l2_code(void)
static void l1_vmrun(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
asm volatile ("vmrun %[gpa]" : : [gpa] "a" (gpa) : "memory");
}
static void l1_vmload(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
asm volatile ("vmload %[gpa]" : : [gpa] "a" (gpa) : "memory");
}
static void l1_vmsave(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
asm volatile ("vmsave %[gpa]" : : [gpa] "a" (gpa) : "memory");
}
static void l1_vmexit(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
diff --git a/tools/testing/selftests/kvm/x86/svm_vmcall_test.c b/tools/testing/selftests/kvm/x86/svm_vmcall_test.c
index b1887242f3b8e..7c57fb7e64221 100644
--- a/tools/testing/selftests/kvm/x86/svm_vmcall_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_vmcall_test.c
@@ -19,13 +19,10 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
/* Prepare for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/x86/triple_fault_event_test.c b/tools/testing/selftests/kvm/x86/triple_fault_event_test.c
index f1c488e0d4975..0d83516f4bd08 100644
--- a/tools/testing/selftests/kvm/x86/triple_fault_event_test.c
+++ b/tools/testing/selftests/kvm/x86/triple_fault_event_test.c
@@ -21,9 +21,6 @@ static void l2_guest_code(void)
: : [port] "d" (ARBITRARY_IO_PORT) : "rax");
}
-#define L2_GUEST_STACK_SIZE 64
-unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
void l1_guest_code_vmx(struct vmx_pages *vmx)
{
@@ -31,8 +28,7 @@ void l1_guest_code_vmx(struct vmx_pages *vmx)
GUEST_ASSERT(prepare_for_vmx_operation(vmx));
GUEST_ASSERT(load_vmcs(vmx));
- prepare_vmcs(vmx, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, l2_guest_code);
GUEST_ASSERT(!vmlaunch());
/* L2 should triple fault after a triple fault event injected. */
@@ -44,8 +40,7 @@ void l1_guest_code_svm(struct svm_test_data *svm)
{
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* don't intercept shutdown to test the case of SVM allowing to do so */
vmcb->control.intercept &= ~(BIT(INTERCEPT_SHUTDOWN));
diff --git a/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c b/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c
index 1720113eae799..463f73aa9159a 100644
--- a/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c
@@ -36,16 +36,13 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages, unsigned long high_gpa)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u32 control;
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
diff --git a/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c b/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c
index 80a4fd1e5bbbe..f9b88a6f6113d 100644
--- a/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c
@@ -31,15 +31,13 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u32 control;
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_USE_MSR_BITMAPS;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
diff --git a/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c b/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c
index a2eaceed9ad52..6d88c54f69faa 100644
--- a/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c
+++ b/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c
@@ -25,15 +25,11 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/*
* L2 must be run without unrestricted guest, verify that the selftests
diff --git a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
index f13dee3173837..75073efa926da 100644
--- a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
@@ -27,8 +27,6 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u64 guest_cr4;
gpa_t pml5_pa, pml4_pa;
u64 *pml5;
@@ -42,8 +40,7 @@ static void l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/*
* Set up L2 with a 4-level page table by pointing its CR3 to
diff --git a/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c b/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c
index 1b7b6ba23de76..eb8021c33cd43 100644
--- a/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c
@@ -66,8 +66,6 @@ void l2_guest_code(void)
void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u64 l1_vmx_pt_start;
u64 l1_vmx_pt_finish;
u64 l1_tsc_deadline, l2_tsc_deadline;
@@ -77,8 +75,7 @@ void l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_ASSERT(load_vmcs(vmx_pages));
GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/*
* Check for Preemption timer support
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread

* [PATCH v5 13/13] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (11 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 12/13] KVM: selftests: Drop L1-provided stacks for L2 guests on x86 Yosry Ahmed
@ 2026-04-30 20:27 ` Yosry Ahmed
2026-04-30 20:38 ` [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD " Yosry Ahmed
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel, Yosry Ahmed
From: Jim Mattson <jmattson@google.com>
Add a selftest to verify KVM correctly virtualizes the AMD PMU Host-Only
(bit 41) and Guest-Only (bit 40) event selector bits across all relevant
SVM state transitions.
The test programs 4 PMCs simultaneously with all combinations of the
Host-Only and Guest-Only bits, then verifies correct counting behavior
with EFER.SVME clear and set, as well as in host mode and guest mode.
The test also verifies that updating Host-Only / Guest-Only bits for a
PMC works as intended, and that event filtering is still respected.
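The behavior the test asserts can be summarized by a small predicate (an
illustrative helper written for this description, not part of the test):

#include <stdbool.h>

/* Should a PMC with the given eventsel bits count in the given CPU state? */
static bool pmc_should_count(bool host_only, bool guest_only,
			     bool svme, bool in_guest_mode)
{
	if (!host_only && !guest_only)
		return true;			/* plain counters always count */
	if (!svme)
		return false;			/* the bits only take effect with EFER.SVME=1 */
	if (host_only && guest_only)
		return true;			/* both bits set: counts in host and guest mode */
	return host_only ? !in_guest_mode : in_guest_mode;
}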
Signed-off-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/include/x86/pmu.h | 6 +
.../kvm/x86/svm_pmu_host_guest_test.c | 216 ++++++++++++++++++
3 files changed, 223 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 9118a5a51b89f..df52e938891e3 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -118,6 +118,7 @@ TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
TEST_GEN_PROGS_x86 += x86/svm_nested_vmcb12_gpa
TEST_GEN_PROGS_x86 += x86/svm_lbr_nested_state
+TEST_GEN_PROGS_x86 += x86/svm_pmu_host_guest_test
TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
TEST_GEN_PROGS_x86 += x86/sync_regs_test
TEST_GEN_PROGS_x86 += x86/ucna_injection_test
diff --git a/tools/testing/selftests/kvm/include/x86/pmu.h b/tools/testing/selftests/kvm/include/x86/pmu.h
index 98537cc8840d1..608ed83d7c6a6 100644
--- a/tools/testing/selftests/kvm/include/x86/pmu.h
+++ b/tools/testing/selftests/kvm/include/x86/pmu.h
@@ -38,6 +38,12 @@
#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
+/*
+ * These are AMD-specific bits.
+ */
+#define AMD64_EVENTSEL_GUESTONLY BIT_ULL(40)
+#define AMD64_EVENTSEL_HOSTONLY BIT_ULL(41)
+
/* RDPMC control flags, Intel only. */
#define INTEL_RDPMC_METRICS BIT_ULL(29)
#define INTEL_RDPMC_FIXED BIT_ULL(30)
diff --git a/tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c b/tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c
new file mode 100644
index 0000000000000..ee4633ab79aa7
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM nested SVM PMU Host-Only/Guest-Only test
+ *
+ * Copyright (C) 2026, Google LLC.
+ *
+ * Test that KVM correctly virtualizes the AMD PMU Host-Only (bit 41) and
+ * Guest-Only (bit 40) event selector bits across all SVM state
+ * transitions.
+ *
+ * Programs 4 PMCs simultaneously with all combinations of Host-Only and
+ * Guest-Only bits, then verifies correct counting behavior with different
+ * combinations of EFER.SVME and host/guest mode -- as well as event filtering.
+ */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "pmu.h"
+
+#define EVENTSEL_RETIRED_INSNS (ARCH_PERFMON_EVENTSEL_OS | \
+ ARCH_PERFMON_EVENTSEL_USR | \
+ ARCH_PERFMON_EVENTSEL_ENABLE | \
+ AMD_ZEN_INSTRUCTIONS_RETIRED)
+
+/* PMC configurations: index corresponds to Host-Only | Guest-Only bits */
+#define PMC_NONE 0 /* Neither bit set */
+#define PMC_G 1 /* Guest-Only bit set */
+#define PMC_H 2 /* Host-Only bit set */
+#define PMC_HG 3 /* Both bits set */
+#define NR_PMCS 4
+
+#define LOOP_INSNS 1000
+
+static __always_inline void run_instruction_loop(void)
+{
+ unsigned int i;
+
+ for (i = 0; i < LOOP_INSNS; i++)
+ __asm__ __volatile__("nop");
+}
+
+static __always_inline void read_counters(uint64_t *counts)
+{
+ int i;
+
+ for (i = 0; i < NR_PMCS; i++)
+ counts[i] = rdmsr(MSR_F15H_PERF_CTR + 2 * i);
+}
+
+static __always_inline void run_and_measure(uint64_t *deltas)
+{
+ uint64_t before[NR_PMCS], after[NR_PMCS];
+ int i;
+
+ read_counters(before);
+ run_instruction_loop();
+ read_counters(after);
+
+ for (i = 0; i < NR_PMCS; i++)
+ deltas[i] = after[i] - before[i];
+}
+
+static void assert_pmc_counts(uint64_t *deltas, unsigned int expected_counting)
+{
+ int i;
+
+ for (i = 0; i < NR_PMCS; i++) {
+ if (expected_counting & BIT(i))
+ GUEST_ASSERT_NE(deltas[i], 0);
+ else
+ GUEST_ASSERT_EQ(deltas[i], 0);
+ }
+}
+
+static uint64_t l2_deltas[NR_PMCS];
+
+static void l2_guest_code(void)
+{
+ run_and_measure(l2_deltas);
+ vmmcall();
+}
+
+static void l1_guest_code(struct svm_test_data *svm)
+{
+ struct vmcb *vmcb = svm->vmcb;
+ uint64_t deltas[NR_PMCS];
+ uint64_t eventsel;
+ int i;
+
+ /* Program 4 PMCs with all combinations of Host-Only/Guest-Only bits */
+ for (i = 0; i < NR_PMCS; i++) {
+ eventsel = EVENTSEL_RETIRED_INSNS;
+ if (i & PMC_G)
+ eventsel |= AMD64_EVENTSEL_GUESTONLY;
+ if (i & PMC_H)
+ eventsel |= AMD64_EVENTSEL_HOSTONLY;
+ wrmsr(MSR_F15H_PERF_CTL + 2 * i, eventsel);
+ wrmsr(MSR_F15H_PERF_CTR + 2 * i, 0);
+ }
+
+ /* Step 1: SVME=0 - Only the counter with neither bits set counts */
+ wrmsr(MSR_EFER, rdmsr(MSR_EFER) & ~EFER_SVME);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE));
+
+ /* Step 2: Set SVME=1 - In L1 "host mode"; Guest-Only stops */
+ wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_SVME);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 3: VMRUN to L2 - In "guest mode"; Host-Only stops */
+ generic_svm_setup(svm, l2_guest_code);
+ vmcb->control.intercept &= ~(1ULL << INTERCEPT_MSR_PROT);
+
+ run_guest(vmcb, svm->vmcb_gpa);
+
+ GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+ assert_pmc_counts(l2_deltas, BIT(PMC_NONE) | BIT(PMC_G) | BIT(PMC_HG));
+
+ /* Step 4: After VMEXIT to L1 - Back in "host mode"; Guest-Only stops */
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 5: Set KVM_PMU_EVENT_DENY - all counters stop */
+ GUEST_SYNC(KVM_PMU_EVENT_DENY);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, 0);
+
+ /* Step 6: Set KVM_PMU_EVENT_ALLOW - back to all except Guest-only */
+ GUEST_SYNC(KVM_PMU_EVENT_ALLOW);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 7: Clear Host-Only for PMC_HG - counter stops in "host mode" */
+ eventsel = rdmsr(MSR_F15H_PERF_CTL + 2 * PMC_HG);
+ wrmsr(MSR_F15H_PERF_CTL + 2 * PMC_HG, eventsel & ~AMD64_EVENTSEL_HOSTONLY);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H));
+
+ /* Step 8: Restore Host-Only for PMC_HG - counter counts again */
+ wrmsr(MSR_F15H_PERF_CTL + 2 * PMC_HG, eventsel);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 9: Clear SVME - Only the counter with neither bits set counts */
+ wrmsr(MSR_EFER, rdmsr(MSR_EFER) & ~EFER_SVME);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE));
+
+ GUEST_DONE();
+}
+
+static struct kvm_pmu_event_filter *alloc_event_filter(u64 event)
+{
+ struct kvm_pmu_event_filter *filter;
+
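+ /* kvm_pmu_event_filter ends in a flexible array of events; make room for one entry. */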
+ filter = malloc(sizeof(*filter) + sizeof(event));
+ TEST_ASSERT(filter != NULL, "Filter allocation failed");
+
+ memset(filter, 0, sizeof(*filter));
+ memcpy(filter->events, &event, sizeof(event));
+ filter->nevents = 1;
+ filter->action = KVM_PMU_EVENT_ALLOW;
+
+ return filter;
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_pmu_event_filter *filter;
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct ucall uc;
+ gva_t svm_gva;
+
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
+ TEST_REQUIRE(get_kvm_amd_param_bool("enable_mediated_pmu"));
+ TEST_REQUIRE(host_cpu_is_amd && kvm_cpu_family() >= 0x17);
+
+ vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+
+ vcpu_alloc_svm(vm, &svm_gva);
+ vcpu_args_set(vcpu, 1, svm_gva);
+
+ filter = alloc_event_filter(AMD_ZEN_INSTRUCTIONS_RETIRED);
+
+ for (;;) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ goto done;
+ case UCALL_DONE:
+ goto done;
+ case UCALL_SYNC:
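+ /* The guest passes the desired filter action (ALLOW or DENY) as the sync arg. */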
+ filter->action = uc.args[1];
+ vm_ioctl(vm, KVM_SET_PMU_EVENT_FILTER, filter);
+ break;
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ goto done;
+ }
+ }
+done:
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits
2026-04-30 20:27 [PATCH v5 00/13] KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits Yosry Ahmed
` (12 preceding siblings ...)
2026-04-30 20:27 ` [PATCH v5 13/13] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits Yosry Ahmed
@ 2026-04-30 20:38 ` Yosry Ahmed
13 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-04-30 20:38 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, kvm, linux-kernel
On Thu, Apr 30, 2026 at 1:27 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> This is v5 of Jim's and my series adding support for AMD's Host-Only and
> Guest-Only performance counter eventsel bits in KVM's mediated PMU
> passthrough implementation.
>
> These bits allow an nSVM-enabled guest to configure performance counters
> that count only during L1 execution (Host-Only) or only during L2 execution
> (Guest-Only).
>
> KVM updates the hardware event selector ENABLE bit at nested transitions
> and EFER.SVME changes such that counters only count in the appropriate
> mode.
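
To make the gating concrete, here is a rough sketch of the counting rule that
svm_pmu_host_guest_test (patch 13) asserts. This is illustrative only and not
the KVM implementation: counter_should_count() is a made-up helper, and only
the AMD64_EVENTSEL_HOSTONLY/AMD64_EVENTSEL_GUESTONLY definitions are real.

static bool counter_should_count(uint64_t eventsel, bool svme, bool in_guest_mode)
{
	bool host_only  = eventsel & AMD64_EVENTSEL_HOSTONLY;
	bool guest_only = eventsel & AMD64_EVENTSEL_GUESTONLY;

	/* A counter with neither bit set counts unconditionally. */
	if (!host_only && !guest_only)
		return true;

	/* With EFER.SVME=0, Host-Only/Guest-Only counters stop. */
	if (!svme)
		return false;

	/*
	 * With SVME=1, the Guest-Only bit enables counting in L2 and the
	 * Host-Only bit enables counting in L1; a counter with both bits
	 * set counts in both modes.
	 */
	return in_guest_mode ? guest_only : host_only;
}

The selftest exercises exactly these four Host-Only/Guest-Only combinations,
one per PMC, across the SVME and nested-transition steps.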
>
> The series has grown significantly since v4, as it now includes semi-related
> nSVM fixups and selftest cleanups that the series depends on. I think parts
> of this series can land independently (patches 1-6 and patches 10-12),
> but the remaining patches would then depend on both groups.
>
> v4 -> v5:
> - Dropped moving leave_guest_mode() and enter_guest_mode() definitions,
> since the calls to update the vPMU no longer happen within these
> functions.
> - Add PMU helper refactoring to facilitate SVM usage.
> - Added nested SVM fixups to count VMRUN correctly in guest mode when
> Host-Only/Guest-Only support is enabled [Jim/Sean].
> - Update the vPMU synchronously on nested VM-Enter/VM-Exit and EFER.SVME
> changes, such that counter enablement is reevaluated before the
> instructions are counted, as the vPMU counts based on the vCPU state
> at instruction retirement (e.g. using new EFER value when EFER.SVME
> changes) [Jim/Sean].
> - Keep deferring vPMU updates using KVM_REQ_PMU in the
> svm_leave_nested() path to avoid KVM potentially consuming stale
> state [Sean].
> - Use a single PMU callback for reprogramming counters instead of a
> per-counter callback [Sean].
> - Move the bitmap tracking counters into SVM code. The generic vPMU code
> now only exposes an API to reprogram counters, and an SVM wrapper uses
> it on nested transitions [Sean].
> - Drop the manual stack-alignment fixes in the vPMU selftest; instead,
> rework L2 stack setup in all nested selftests to reuse the allocation
> and alignment logic used by L1, and completely drop L1-provided stacks
> for L2 [Sean].
Forgot to mention, I also added a couple of test cases for:
- Changing HG bits on an existing counter after it's enabled.
- Event filtering.