* [PATCH v6 01/16] KVM: nSVM: Stop leaking single-stepping on VMRUN into L2
2026-05-06 1:57 [PATCH v6 00/16] Yosry Ahmed
@ 2026-05-06 1:57 ` Yosry Ahmed
2026-05-06 1:57 ` [PATCH v6 02/16] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails Yosry Ahmed
` (15 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
According to the APM, TF on VMRUN causes a #DB after VMRUN completes on
the _host_ side. However, KVM injects a #DB in L2 context instead (or
exits to userspace if KVM_GUESTDBG_SINGLESTEP is set) in
kvm_skip_emulated_instruction().
Introduce __kvm_skip_emulated_instruction(), pull single-step handling
into the wrapper, and use __kvm_skip_emulated_instruction() for VMRUN.
This ignores TF on VMRUN instead of injecting a spurious exception into
L2. Document this virtualization hole with a FIXME.
Note that a failed VMRUN would have been correctly single-stepped, but
now TF is always ignored for consistency and simplicity. VMX
does not support TF on VMLAUNCH/VMRESUME, so it's unlikely that
single-stepping VMRUN properly is important, especially if it's only for
failed VMRUNs.
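As a toy user-space sketch of the resulting split (hypothetical names, not the actual KVM helpers): the inner helper only advances RIP, while the wrapper layers single-step handling on top, so VMRUN emulation can opt out of the single-step path by calling the inner helper directly.

```c
#include <stdbool.h>

/* Inner helper: only advances RIP past the emulated instruction. */
static int skip_insn_no_singlestep(unsigned long *rip, unsigned long insn_len)
{
	*rip += insn_len;
	return 1;
}

/* Wrapper: also turns RFLAGS.TF into a pending #DB (single-step trap). */
static int skip_insn(unsigned long *rip, unsigned long insn_len,
		     bool tf, bool *db_pending)
{
	if (!skip_insn_no_singlestep(rip, insn_len))
		return 0;
	if (tf)
		*db_pending = true;
	return 1;
}
```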
Fixes: c8e16b78c614 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()")
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm/nested.c | 11 ++++++++---
arch/x86/kvm/x86.c | 15 +++++++++++++--
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa4..b191967c9c1e4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2475,7 +2475,9 @@ void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
+
int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 961804df5f451..5dfcbaf7743b0 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1125,11 +1125,16 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
/* Advance RIP past VMRUN as part of the nested #VMEXIT. */
- return kvm_skip_emulated_instruction(vcpu);
+ return __kvm_skip_emulated_instruction(vcpu);
}
- /* At this point, VMRUN is guaranteed to not fault; advance RIP. */
- ret = kvm_skip_emulated_instruction(vcpu);
+ /*
+ * At this point, VMRUN is guaranteed to not fault; advance RIP.
+ *
+	 * FIXME: If TF is set on VMRUN, KVM should inject a #DB (or handle
+	 * guest debugging) right after #VMEXIT; right now it's just ignored.
+ */
+ ret = __kvm_skip_emulated_instruction(vcpu);
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a9..31dc48a8111e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9272,9 +9272,8 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
return 1;
}
-int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
- unsigned long rflags = kvm_x86_call(get_rflags)(vcpu);
int r;
r = kvm_x86_call(skip_emulated_instruction)(vcpu);
@@ -9282,6 +9281,18 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
return 0;
kvm_pmu_instruction_retired(vcpu);
+ return r;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_skip_emulated_instruction);
+
+int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+{
+ unsigned long rflags = kvm_x86_call(get_rflags)(vcpu);
+ int r;
+
+ r = __kvm_skip_emulated_instruction(vcpu);
+ if (unlikely(!r))
+ return 0;
/*
* rflags is the old, "raw" value of the flags. The new value has
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 02/16] KVM: nSVM: Bail early out of VMRUN emulation if advancing RIP fails
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
If kvm_skip_emulated_instruction() fails, then RIP could not be
advanced correctly (e.g. decode failure when NextRIP is not available).
KVM will exit to userspace to handle the emulation failure, but only
after stuffing the wrong RIP into vmcb01 and entering guest mode.
Bail early and exit to userspace before committing any side effects of
emulating the VMRUN. Unify both calls to
__kvm_skip_emulated_instruction() into a single one, but return
immediately afterward if copying and caching vmcb12 failed. A side
effect of this is that the FIXME comment now sits above the only caller.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5dfcbaf7743b0..0f6ea490d707b 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1120,21 +1120,22 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
}
ret = nested_svm_copy_vmcb12_to_cache(vcpu, vmcb12_gpa);
- if (ret) {
- if (ret == -EFAULT)
- return kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
-
- /* Advance RIP past VMRUN as part of the nested #VMEXIT. */
- return __kvm_skip_emulated_instruction(vcpu);
- }
+ if (ret == -EFAULT)
+ return kvm_handle_memory_failure(vcpu, X86EMUL_IO_NEEDED, NULL);
/*
- * At this point, VMRUN is guaranteed to not fault; advance RIP.
+ * At this point, VMRUN is guaranteed to not fault; advance RIP. If
+ * caching vmcb12 failed for other reasons, return immediately afterward
+ * as a nested #VMEXIT was already set up.
*
* FIXME: If TF is set on VMRUN, KVM should inject a #DB (or handle guest
* debugging) right after #VMEXIT; right now it's just ignored.
*/
- ret = __kvm_skip_emulated_instruction(vcpu);
+ if (!__kvm_skip_emulated_instruction(vcpu))
+ return 0;
+
+ if (ret)
+ return 1;
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
@@ -1164,7 +1165,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_svm_vmexit(svm);
}
- return ret;
+ return 1;
}
/* Copy state save area fields which are handled by VMRUN */
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 03/16] KVM: nSVM: Move VMRUN instruction retirement after entering guest mode
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
A successful VMRUN retires in guest mode and should be counted by the
PMU as a guest instruction. However, __kvm_skip_emulated_instruction()
is called before entering guest mode to advance L1's RIP to the
instruction following VMRUN. This is needed as the RIP is saved in
vmcb01 to be restored on VM-Exit.
Since VMRUN emulation is the only caller of
__kvm_skip_emulated_instruction(), move retiring instructions for PMU
purposes to its wrapper, leaving __kvm_skip_emulated_instruction() as a
transparent wrapper around the vendor-specific calls.
Note that this is currently a no-op because KVM does not yet virtualize
Host-Only/Guest-Only PMC controls, so all instructions are counted
regardless of the vCPU's host/guest state. However, this change is
needed for the upcoming Host-Only/Guest-Only support to count VMRUN
correctly.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/nested.c | 9 ++++++++-
arch/x86/kvm/x86.c | 11 +++--------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 0f6ea490d707b..58c78c889a812 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -30,6 +30,7 @@
#include "lapic.h"
#include "svm.h"
#include "hyperv.h"
+#include "pmu.h"
#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
@@ -1135,7 +1136,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return 0;
if (ret)
- return 1;
+ goto insn_retired;
/*
* Since vmcb01 is not in use, we can use it to store some of the L1
@@ -1165,6 +1166,12 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_svm_vmexit(svm);
}
+insn_retired:
+ /*
+ * A successful VMRUN is counted by the PMU in guest mode, so only
+ * retire the instruction after potentially entering guest mode.
+ */
+ kvm_pmu_instruction_retired(vcpu);
return 1;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 31dc48a8111e5..08be0a63b93bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9274,14 +9274,7 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
int __kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
- int r;
-
- r = kvm_x86_call(skip_emulated_instruction)(vcpu);
- if (unlikely(!r))
- return 0;
-
- kvm_pmu_instruction_retired(vcpu);
- return r;
+ return kvm_x86_call(skip_emulated_instruction)(vcpu);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_skip_emulated_instruction);
@@ -9294,6 +9287,8 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
if (unlikely(!r))
return 0;
+ kvm_pmu_instruction_retired(vcpu);
+
/*
* rflags is the old, "raw" value of the flags. The new value has
* not been saved yet.
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 04/16] KVM: x86: Move enable_pmu/enable_mediated_pmu to pmu.h and pmu.c
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
The declaration and definition of enable_pmu/enable_mediated_pmu
semantically belong in pmu.h and pmu.c, and more importantly, pmu.h
uses enable_mediated_pmu and relies on the caller including x86.h.
There is already precedent for other module params defined outside of
x86.c, so move enable_pmu/enable_mediated_pmu to pmu.c.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 10 ++++++++++
arch/x86/kvm/pmu.h | 3 +++
arch/x86/kvm/x86.c | 9 ---------
arch/x86/kvm/x86.h | 3 ---
4 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index e218352e34231..d6ac3c55fce55 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -16,6 +16,7 @@
#include <linux/perf_event.h>
#include <linux/bsearch.h>
#include <linux/sort.h>
+#include <linux/moduleparam.h>
#include <asm/perf_event.h>
#include <asm/cpu_device_id.h>
#include "x86.h"
@@ -33,6 +34,15 @@ static struct x86_pmu_capability __read_mostly kvm_host_pmu;
struct x86_pmu_capability __read_mostly kvm_pmu_cap;
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_cap);
+/* Enable/disable PMU virtualization */
+bool __read_mostly enable_pmu = true;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_pmu);
+module_param(enable_pmu, bool, 0444);
+
+/* Enable/disable mediated PMU virtualization. */
+bool __read_mostly enable_mediated_pmu;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mediated_pmu);
+
struct kvm_pmu_emulated_event_selectors {
u64 INSTRUCTIONS_RETIRED;
u64 BRANCH_INSTRUCTIONS_RETIRED;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0925246731cb1..b1f2418e960ac 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -53,6 +53,9 @@ struct kvm_pmu_ops {
const u32 MSR_STRIDE;
};
+extern bool enable_pmu;
+extern bool enable_mediated_pmu;
+
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
void kvm_handle_guest_mediated_pmi(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08be0a63b93bd..0b421ea29977b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -182,15 +182,6 @@ module_param(force_emulation_prefix, int, 0644);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, 0644);
-/* Enable/disable PMU virtualization */
-bool __read_mostly enable_pmu = true;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_pmu);
-module_param(enable_pmu, bool, 0444);
-
-/* Enable/disabled mediated PMU virtualization. */
-bool __read_mostly enable_mediated_pmu;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mediated_pmu);
-
bool __read_mostly eager_page_split = true;
module_param(eager_page_split, bool, 0644);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 38a905fa86de2..30a69effc81e2 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -490,9 +490,6 @@ fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
extern struct kvm_caps kvm_caps;
extern struct kvm_host_values kvm_host;
-extern bool enable_pmu;
-extern bool enable_mediated_pmu;
-
void kvm_setup_xss_caps(void);
/*
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 05/16] KVM: x86/pmu: Rename reprogram_counters() to clarify usage
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Rename reprogram_counters() to kvm_pmu_request_counters_reprogram() to
clarify that it is more similar to kvm_pmu_request_counter_reprogram()
than to reprogram_counter(). The kvm_pmu_* prefix is also appropriate
as the function is exposed in the header.
Opportunistically rename the argument from 'diff' to 'counters'.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 2 +-
arch/x86/kvm/pmu.h | 7 ++++---
arch/x86/kvm/vmx/pmu_intel.c | 2 +-
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d6ac3c55fce55..afbc731e72174 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -889,7 +889,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->global_ctrl != data) {
diff = pmu->global_ctrl ^ data;
pmu->global_ctrl = data;
- reprogram_counters(pmu, diff);
+ kvm_pmu_request_counters_reprogram(pmu, diff);
}
/*
* Unconditionally forward writes to vendor code, i.e. to the
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b1f2418e960ac..f8286067722b0 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -210,14 +210,15 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
-static inline void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
+static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
+ u64 counters)
{
int bit;
- if (!diff)
+ if (!counters)
return;
- for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
+ for_each_set_bit(bit, (unsigned long *)&counters, X86_PMC_IDX_MAX)
set_bit(bit, pmu->reprogram_pmi);
kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 27eb76e6b6a03..9bd77843d8da2 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -391,7 +391,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->pebs_enable != data) {
diff = pmu->pebs_enable ^ data;
pmu->pebs_enable = data;
- reprogram_counters(pmu, diff);
+ kvm_pmu_request_counters_reprogram(pmu, diff);
}
break;
case MSR_IA32_DS_AREA:
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 06/16] KVM: x86/pmu: Do a single atomic OR when reprogramming counters
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Do a single atomic OR on the atomic overlay of the reprogram_pmi
bitmap, instead of one atomic set_bit() call per counter.
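The before/after can be modeled in user space with C11 atomics (a sketch only; the kernel code uses set_bit() and atomic64_or() on the reprogram_pmi/__reprogram_pmi overlay):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Old approach: one atomic read-modify-write per set bit. */
static void reprogram_bit_by_bit(_Atomic uint64_t *pmi, uint64_t counters)
{
	for (int bit = 0; bit < 64; bit++)
		if (counters & (1ULL << bit))
			atomic_fetch_or(pmi, 1ULL << bit);
}

/* New approach: a single atomic OR of the whole mask. */
static void reprogram_at_once(_Atomic uint64_t *pmi, uint64_t counters)
{
	atomic_fetch_or(pmi, counters);
}
```

Both leave the bitmap in the same state; the second issues one atomic operation instead of one per counter.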
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index f8286067722b0..0e99022168a85 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -213,13 +213,10 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
u64 counters)
{
- int bit;
-
if (!counters)
return;
- for_each_set_bit(bit, (unsigned long *)&counters, X86_PMC_IDX_MAX)
- set_bit(bit, pmu->reprogram_pmi);
+ atomic64_or(counters, &pmu->__reprogram_pmi);
kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
}
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 07/16] KVM: x86/pmu: Check mediated PMU counter enablement before event filters
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
If the guest disables the counter (by clearing
ARCH_PERFMON_EVENTSEL_ENABLE), KVM still performs the PMU filter lookup,
even though it doesn't end up changing eventsel_hw. Check if the
counter is enabled by the guest before doing the potentially expensive
PMU filter lookup.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index afbc731e72174..67dbbd4c73036 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -532,7 +532,7 @@ static bool pmc_is_event_allowed(struct kvm_pmc *pmc)
static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
{
- bool allowed = pmc_is_event_allowed(pmc);
+ bool allowed = pmc_is_locally_enabled(pmc) && pmc_is_event_allowed(pmc);
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
if (pmc_is_gp(pmc)) {
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 08/16] KVM: x86/pmu: Add support for KVM_X86_PMU_OP_OPTIONAL_RET0
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Add definitions for KVM_X86_PMU_OP_OPTIONAL_RET0() to resolve to
__static_call_return0, similar to KVM_X86_OP_OPTIONAL_RET0(). Move the
definition of kvm_pmu_call() to pmu.h, and add declarations for the
static PMU calls in the header to allow making callbacks from the header
in following changes.
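The effect is analogous to the following user-space pattern with plain function pointers (a sketch with made-up names; kernel static calls patch the call site rather than indirecting through a pointer):

```c
#include <stddef.h>
#include <stdbool.h>

struct pmu_ops {
	/* Optional callback: NULL means "not implemented". */
	bool (*pmc_is_disabled_in_current_mode)(int pmc_idx);
};

/* Default stub for an optional-RET0 op: always returns 0/false. */
static bool ret0_stub(int pmc_idx)
{
	(void)pmc_idx;
	return false;
}

/*
 * "Update" step: a missing optional-RET0 op resolves to the return-0
 * stub, so callers can invoke the op unconditionally with no NULL check.
 */
static bool (*resolve_pmc_is_disabled(const struct pmu_ops *ops))(int)
{
	return ops->pmc_is_disabled_in_current_mode ?
	       ops->pmc_is_disabled_in_current_mode : ret0_stub;
}
```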
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 4 +++-
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/pmu.c | 4 ++++
arch/x86/kvm/pmu.h | 8 ++++++++
4 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index d5452b3433b7d..03ed2c917bb56 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#if !defined(KVM_X86_PMU_OP) || \
- !defined(KVM_X86_PMU_OP_OPTIONAL)
+ !defined(KVM_X86_PMU_OP_OPTIONAL) || \
+ !defined(KVM_X86_PMU_OP_OPTIONAL_RET0)
#error Missing one or more KVM_X86_PMU_OP #defines
#else
@@ -31,3 +32,4 @@ KVM_X86_PMU_OP(mediated_put)
#undef KVM_X86_PMU_OP
#undef KVM_X86_PMU_OP_OPTIONAL
+#undef KVM_X86_PMU_OP_OPTIONAL_RET0
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b191967c9c1e4..943adf62839fc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2041,7 +2041,6 @@ extern bool __read_mostly enable_device_posted_irqs;
extern struct kvm_x86_ops kvm_x86_ops;
#define kvm_x86_call(func) static_call(kvm_x86_##func)
-#define kvm_pmu_call(func) static_call(kvm_x86_pmu_##func)
#define KVM_X86_OP(func) \
DECLARE_STATIC_CALL(kvm_x86_##func, *(((struct kvm_x86_ops *)0)->func));
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 67dbbd4c73036..9b7e39610be22 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -98,6 +98,7 @@ static struct kvm_pmu_ops kvm_pmu_ops __read_mostly;
DEFINE_STATIC_CALL_NULL(kvm_x86_pmu_##func, \
*(((struct kvm_pmu_ops *)0)->func));
#define KVM_X86_PMU_OP_OPTIONAL KVM_X86_PMU_OP
+#define KVM_X86_PMU_OP_OPTIONAL_RET0 KVM_X86_PMU_OP
#include <asm/kvm-x86-pmu-ops.h>
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
@@ -109,6 +110,9 @@ void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
#define KVM_X86_PMU_OP(func) \
WARN_ON(!kvm_pmu_ops.func); __KVM_X86_PMU_OP(func)
#define KVM_X86_PMU_OP_OPTIONAL __KVM_X86_PMU_OP
+#define KVM_X86_PMU_OP_OPTIONAL_RET0(func) \
+ static_call_update(kvm_x86_pmu_##func, (void *)kvm_pmu_ops.func ? : \
+ (void *)__static_call_return0);
#include <asm/kvm-x86-pmu-ops.h>
#undef __KVM_X86_PMU_OP
}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 0e99022168a85..a062f0bc3dbb1 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -53,6 +53,14 @@ struct kvm_pmu_ops {
const u32 MSR_STRIDE;
};
+#define kvm_pmu_call(func) static_call(kvm_x86_pmu_##func)
+
+#define KVM_X86_PMU_OP(func) \
+ DECLARE_STATIC_CALL(kvm_x86_pmu_##func, *(((struct kvm_pmu_ops *)0)->func));
+#define KVM_X86_PMU_OP_OPTIONAL KVM_X86_PMU_OP
+#define KVM_X86_PMU_OP_OPTIONAL_RET0 KVM_X86_PMU_OP
+#include <asm/kvm-x86-pmu-ops.h>
+
extern bool enable_pmu;
extern bool enable_mediated_pmu;
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 09/16] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Introduce an optional per-vendor PMU callback for checking if a counter
is disabled in the current mode, and register a callback on AMD to
disable a counter based on the vCPU's setting of Host-Only or Guest-Only
EVENT_SELECT bits with the mediated PMU.
If EFER.SVME is set, all events are counted if both bits are set or
both are cleared. If only one bit is set, the counter is disabled when
the vCPU context does not match the set bit.
If EFER.SVME is cleared, the counter is disabled if any of the bits is
set, otherwise all events are counted. Note that a Linux guest correctly
handles this and clears Host-Only when EFER.SVME is cleared, see commit
1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM
disabled").
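The decision table above can be captured in a small standalone C function (a sketch with stand-in constants mirroring AMD64_EVENTSEL_{GUESTONLY,HOSTONLY}, not the kernel helper itself):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for AMD64_EVENTSEL_GUESTONLY / AMD64_EVENTSEL_HOSTONLY. */
#define EVTSEL_GUESTONLY (1ULL << 40)
#define EVTSEL_HOSTONLY  (1ULL << 41)

/*
 * EFER.SVME clear: any set bit disables the counter.
 * EFER.SVME set: both bits set or both clear -> always counts;
 * exactly one bit set -> disabled when it doesn't match the vCPU mode.
 */
static bool pmc_disabled_in_mode(uint64_t eventsel, bool svme, bool guest_mode)
{
	uint64_t bits = eventsel & (EVTSEL_GUESTONLY | EVTSEL_HOSTONLY);

	if (!bits)
		return false;		/* neither bit: always counts */
	if (!svme)
		return true;		/* SVME=0: any bit disables */
	if (bits == (EVTSEL_GUESTONLY | EVTSEL_HOSTONLY))
		return false;		/* both bits: always counts */
	/* one bit set: disabled when it doesn't match the current mode */
	return !!(bits & EVTSEL_GUESTONLY) != guest_mode;
}
```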
The callback is made from pmc_is_locally_enabled(), which is used for
the mediated PMU when updating eventsel_hw in
kvm_mediated_pmu_refresh_eventsel_hw(), as well as when checking what
PMCs count instructions/branches for emulation in
kvm_pmu_recalc_pmc_emulation().
Host-Only and Guest-Only bits are currently reserved, so this change is
a no-op, but the bits will be allowed with the mediated PMU in a
following change once fully supported.
Originally-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 +
arch/x86/include/asm/perf_event.h | 2 ++
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/pmu.h | 4 +++-
arch/x86/kvm/svm/pmu.c | 32 ++++++++++++++++++++++++++
5 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index 03ed2c917bb56..55442fc355eea 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -24,6 +24,7 @@ KVM_X86_PMU_OP(init)
KVM_X86_PMU_OP_OPTIONAL(reset)
KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
KVM_X86_PMU_OP_OPTIONAL(cleanup)
+KVM_X86_PMU_OP_OPTIONAL_RET0(pmc_is_disabled_in_current_mode)
KVM_X86_PMU_OP_OPTIONAL(write_global_ctrl)
KVM_X86_PMU_OP(mediated_load)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ff5acb8b199b0..5961c002b28eb 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -60,6 +60,8 @@
#define AMD64_EVENTSEL_INT_CORE_ENABLE (1ULL << 36)
#define AMD64_EVENTSEL_GUESTONLY (1ULL << 40)
#define AMD64_EVENTSEL_HOSTONLY (1ULL << 41)
+#define AMD64_EVENTSEL_HOST_GUEST_MASK \
+ (AMD64_EVENTSEL_HOSTONLY | AMD64_EVENTSEL_GUESTONLY)
#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT 37
#define AMD64_EVENTSEL_INT_CORE_SEL_MASK \
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9b7e39610be22..8159b07e9bc20 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -100,6 +100,7 @@ static struct kvm_pmu_ops kvm_pmu_ops __read_mostly;
#define KVM_X86_PMU_OP_OPTIONAL KVM_X86_PMU_OP
#define KVM_X86_PMU_OP_OPTIONAL_RET0 KVM_X86_PMU_OP
#include <asm/kvm-x86-pmu-ops.h>
+EXPORT_STATIC_CALL_GPL(kvm_x86_pmu_pmc_is_disabled_in_current_mode);
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
{
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index a062f0bc3dbb1..cc7f55d4a78b4 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -36,6 +36,7 @@ struct kvm_pmu_ops {
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
void (*cleanup)(struct kvm_vcpu *vcpu);
+ bool (*pmc_is_disabled_in_current_mode)(struct kvm_pmc *pmc);
bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
void (*mediated_load)(struct kvm_vcpu *vcpu);
@@ -201,7 +202,8 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
pmc->idx - KVM_FIXED_PMC_BASE_IDX) &
(INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER);
- return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
+ return (pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) &&
+ !kvm_pmu_call(pmc_is_disabled_in_current_mode)(pmc);
}
extern struct x86_pmu_capability kvm_pmu_cap;
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 7aa298eeb0721..41ee6532290e9 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -260,6 +260,37 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
}
+static bool amd_pmc_is_disabled_in_current_mode(struct kvm_pmc *pmc)
+{
+ struct kvm_vcpu *vcpu = pmc->vcpu;
+ u64 host_guest_bits;
+
+ if (!kvm_vcpu_has_mediated_pmu(vcpu))
+ return false;
+
+ /* Common code is supposed to check the common enable bit */
+ if (WARN_ON_ONCE(!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE)))
+ return false;
+
+ /* If both bits are cleared, the counter is always enabled */
+ host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
+ if (!host_guest_bits)
+ return false;
+
+ /* If EFER.SVME=0 and either bit is set, the counter is disabled */
+ if (!(vcpu->arch.efer & EFER_SVME))
+ return true;
+
+ /*
+ * If EFER.SVME=1, the counter is disabled iff only one of the bits is
+ * set AND the set bit doesn't match the vCPU mode.
+ */
+ if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
+ return false;
+
+ return !!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) != is_guest_mode(vcpu);
+}
+
struct kvm_pmu_ops amd_pmu_ops __initdata = {
.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
.msr_idx_to_pmc = amd_msr_idx_to_pmc,
@@ -269,6 +300,7 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
.set_msr = amd_pmu_set_msr,
.refresh = amd_pmu_refresh,
.init = amd_pmu_init,
+ .pmc_is_disabled_in_current_mode = amd_pmc_is_disabled_in_current_mode,
.is_mediated_pmu_supported = amd_pmu_is_mediated_pmu_supported,
.mediated_load = amd_mediated_pmu_load,
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 10/16] KVM: x86/pmu: Track mediated PMU counters with mode-specific enables
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Instead of always checking whether a counter needs to be disabled for
mode-specific reasons (e.g. the Host-Only/Guest-Only bits on SVM), add a
bitmap to track such counters. On SVM, set the bit for counters whose
EVENTSEL sets either the Host-Only or the Guest-Only bit.
This bitmap will also be reused in subsequent changes to selectively
apply updates to such counters.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/pmu.h | 9 +++++++--
arch/x86/kvm/svm/pmu.c | 6 ++++++
4 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 943adf62839fc..ad5a795b1ffad 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -594,6 +594,8 @@ struct kvm_pmu {
DECLARE_BITMAP(pmc_counting_instructions, X86_PMC_IDX_MAX);
DECLARE_BITMAP(pmc_counting_branches, X86_PMC_IDX_MAX);
+ DECLARE_BITMAP(pmc_has_mode_specific_enables, X86_PMC_IDX_MAX);
+
u64 ds_area;
u64 pebs_enable;
u64 pebs_enable_rsvd;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 8159b07e9bc20..84c834ad2cd47 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -936,6 +936,7 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
pmu->need_cleanup = false;
bitmap_zero(pmu->reprogram_pmi, X86_PMC_IDX_MAX);
+ bitmap_zero(pmu->pmc_has_mode_specific_enables, X86_PMC_IDX_MAX);
kvm_for_each_pmc(pmu, pmc, i, pmu->all_valid_pmc_idx) {
pmc_stop_counter(pmc);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cc7f55d4a78b4..34c3c6913ef62 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -202,8 +202,13 @@ static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
pmc->idx - KVM_FIXED_PMC_BASE_IDX) &
(INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER);
- return (pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) &&
- !kvm_pmu_call(pmc_is_disabled_in_current_mode)(pmc);
+ if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
+ return false;
+
+ if (!test_bit(pmc->idx, pmu->pmc_has_mode_specific_enables))
+ return true;
+
+ return !kvm_pmu_call(pmc_is_disabled_in_current_mode)(pmc);
}
extern struct x86_pmu_capability kvm_pmu_cap;
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 41ee6532290e9..b892a25ea4ca9 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -168,6 +168,12 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
pmc->eventsel = data;
pmc->eventsel_hw = (data & ~AMD64_EVENTSEL_HOSTONLY) |
AMD64_EVENTSEL_GUESTONLY;
+
+ if (data & AMD64_EVENTSEL_HOST_GUEST_MASK)
+ __set_bit(pmc->idx, pmu->pmc_has_mode_specific_enables);
+ else
+ __clear_bit(pmc->idx, pmu->pmc_has_mode_specific_enables);
+
kvm_pmu_request_counter_reprogram(pmc);
}
return 0;
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v6 11/16] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions
2026-05-06 1:57 [PATCH v6 00/16] Yosry Ahmed
` (9 preceding siblings ...)
2026-05-06 1:57 ` [PATCH v6 10/16] KVM: x86/pmu: Track mediated PMU counters with mode-specific enables Yosry Ahmed
@ 2026-05-06 1:57 ` Yosry Ahmed
2026-05-06 1:57 ` [PATCH v6 12/16] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU Yosry Ahmed
` (5 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Reprogram PMU counters on nested transitions for the mediated PMU, to
re-evaluate Host-Only and Guest-Only bits and enable/disable the PMU
counters accordingly. For example, if Host-Only is set and Guest-Only is
cleared, a counter should be disabled when entering guest mode and
enabled when exiting guest mode.
According to the APM, when EFER.SVME is cleared, setting Host-Only or
Guest-Only disables the counter, so also trigger counter reprogramming
when EFER.SVME is toggled.
Counters that set either of the Host-Only and Guest-Only bits are
already tracked in pmc_has_mode_specific_enables; use that bitmap to
reprogram only these counters.
Reprogram the counters synchronously on nested VMRUN/#VMEXIT and
EFER.SVME toggling. This is necessary as these instructions are counted
based on the new CPU state (after the instruction is retired in
hardware). Hence, the PMU needs to be updated before instruction
emulation is completed and kvm_pmu_instruction_retired() is called.
Defer reprogramming the counters when force leaving guest mode through
svm_leave_nested() to avoid potentially reading stale state (e.g. an
incorrect EFER). All flows that force leaving nested mode are
non-architectural, so precision is not a priority there.
Refactor a helper out of kvm_pmu_request_reprogram_counters() that
accepts a boolean allowing synchronous vs deferred reprogramming, and
use that from SVM code to support both scenarios.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/pmu.h | 18 ++++++++++++++----
arch/x86/kvm/svm/nested.c | 12 ++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 22 ++++++++++++++++++++++
5 files changed, 50 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 84c834ad2cd47..b92dd2e583356 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -685,6 +685,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
kvm_for_each_pmc(pmu, pmc, bit, bitmap)
kvm_pmu_recalc_pmc_emulation(pmu, pmc);
}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_handle_event);
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
{
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 34c3c6913ef62..a5821d7c87f93 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -216,6 +216,7 @@ extern struct x86_pmu_capability kvm_pmu_cap;
void kvm_init_pmu_capability(struct kvm_pmu_ops *pmu_ops);
void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc);
+void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
{
@@ -225,14 +226,24 @@ static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
}
-static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
- u64 counters)
+static inline void __kvm_pmu_reprogram_counters(struct kvm_pmu *pmu,
+ u64 counters,
+ bool defer)
{
if (!counters)
return;
atomic64_or(counters, &pmu->__reprogram_pmi);
- kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+ if (defer)
+ kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
+ else
+ kvm_pmu_handle_event(pmu_to_vcpu(pmu));
+}
+
+static inline void kvm_pmu_request_counters_reprogram(struct kvm_pmu *pmu,
+ u64 counters)
+{
+ __kvm_pmu_reprogram_counters(pmu, counters, true);
}
/*
@@ -261,7 +272,6 @@ static inline bool kvm_pmu_is_fastpath_emulation_allowed(struct kvm_vcpu *vcpu)
}
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
-void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx);
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 58c78c889a812..bb3362c043395 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -826,6 +826,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
/* Enter Guest-Mode */
enter_guest_mode(vcpu);
+ svm_pmu_handle_nested_transition(svm);
/*
* Filled at exit: exit_code, exit_info_1, exit_info_2, exit_int_info,
@@ -1302,6 +1303,8 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
+ svm_pmu_handle_nested_transition(svm);
+
svm->nested.vmcb12_gpa = 0;
kvm_warn_on_nested_run_pending(vcpu);
@@ -1519,6 +1522,15 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
leave_guest_mode(vcpu);
+ /*
+ * Force leaving nested is a non-architectural flow so precision
+ * is not a priority. Defer updating the PMU until the next vCPU
+ * run, potentially tolerating some imprecision to avoid poking
+ * into PMU state from arbitrary contexts (e.g. KVM may end up
+ * using stale state).
+ */
+ __svm_pmu_handle_nested_transition(svm, true);
+
svm_switch_vmcb(svm, &svm->vmcb01);
nested_svm_uninit_mmu_context(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280d..7d3a142e63ff8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -261,6 +261,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
set_exception_intercept(svm, GP_VECTOR);
}
+ svm_pmu_handle_nested_transition(svm);
kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a10668d17a16a..71a49af941f4e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -24,6 +24,7 @@
#include "cpuid.h"
#include "kvm_cache_regs.h"
+#include "pmu.h"
/*
* Helpers to convert to/from physical addresses for pages whose address is
@@ -877,6 +878,27 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
+
+static inline void __svm_pmu_handle_nested_transition(struct vcpu_svm *svm, bool defer)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(&svm->vcpu);
+ u64 counters = *(u64 *)pmu->pmc_has_mode_specific_enables;
+
+ __kvm_pmu_reprogram_counters(pmu, counters, defer);
+}
+
+static inline void svm_pmu_handle_nested_transition(struct vcpu_svm *svm)
+{
+ /*
+ * Do NOT defer reprogramming the counters by default. Instructions
+ * causing a state change are counted based on the _new_ CPU state
+ * (e.g. a successful VMRUN is counted in guest mode). Hence, the
+ * counters should be reprogrammed with the new state _before_ the
+ * instruction is potentially counted upon emulation completion.
+ */
+ __svm_pmu_handle_nested_transition(svm, false);
+}
+
extern struct kvm_x86_nested_ops svm_nested_ops;
/* avic.c */
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 12/16] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU
2026-05-06 1:57 [PATCH v6 00/16] Yosry Ahmed
` (10 preceding siblings ...)
2026-05-06 1:57 ` [PATCH v6 11/16] KVM: x86/pmu: Reprogram Host/Guest-Only counters on nested transitions Yosry Ahmed
@ 2026-05-06 1:57 ` Yosry Ahmed
2026-05-06 1:57 ` [PATCH v6 13/16] KVM: selftests: Refactor allocating guest stack into a helper Yosry Ahmed
` (4 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
From: Jim Mattson <jmattson@google.com>
Now that KVM correctly handles Host-Only and Guest-Only bits in the
event selector MSRs, allow the guest to set them if the vCPU advertises
SVM and uses the mediated PMU.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/svm/pmu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b892a25ea4ca9..c18286545a7ac 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -213,7 +213,11 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
}
pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(48) - 1;
+
pmu->reserved_bits = 0xfffffff000280000ull;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SVM) && kvm_vcpu_has_mediated_pmu(vcpu))
+ pmu->reserved_bits &= ~AMD64_EVENTSEL_HOST_GUEST_MASK;
+
pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
/* not applicable to AMD; but clean them to prevent any fall out */
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 13/16] KVM: selftests: Refactor allocating guest stack into a helper
2026-05-06 1:57 [PATCH v6 00/16] Yosry Ahmed
` (11 preceding siblings ...)
2026-05-06 1:57 ` [PATCH v6 12/16] KVM: x86/pmu: Allow Host-Only/Guest-Only bits with nSVM and mediated PMU Yosry Ahmed
@ 2026-05-06 1:57 ` Yosry Ahmed
2026-05-06 1:57 ` [PATCH v6 14/16] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack Yosry Ahmed
` (3 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
In preparation for reusing the logic to allocate stacks for nested
guests, refactor the allocation of a guest stack and the alignment of
RSP into a helper.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../testing/selftests/kvm/lib/x86/processor.c | 45 ++++++++++---------
1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b51467d70f6e7..94a1cadb2b26b 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -778,6 +778,30 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
REPORT_GUEST_ASSERT(uc);
}
+static gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages)
+{
+ int size = nr_pages * getpagesize();
+ gva_t stack_gva;
+
+ stack_gva = __vm_alloc(vm, size, DEFAULT_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA);
+ stack_gva += size;
+
+ /*
+ * Align stack to match calling sequence requirements in section "The
+ * Stack Frame" of the System V ABI AMD64 Architecture Processor
+ * Supplement, which requires the value (%rsp + 8) to be a multiple of
+ * 16 when control is transferred to the function entry point.
+ *
+ * If this code is ever used to launch a vCPU with 32-bit entry point it
+ * may need to subtract 4 bytes instead of 8 bytes.
+ */
+ TEST_ASSERT(IS_ALIGNED(stack_gva, PAGE_SIZE),
+ "__vm_alloc() did not provide a page-aligned address");
+ stack_gva -= 8;
+
+ return stack_gva;
+}
+
void kvm_arch_vm_post_create(struct kvm_vm *vm, unsigned int nr_vcpus)
{
int r;
@@ -820,27 +844,8 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, u32 vcpu_id)
{
struct kvm_mp_state mp_state;
struct kvm_regs regs;
- gva_t stack_gva;
struct kvm_vcpu *vcpu;
- stack_gva = __vm_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
- DEFAULT_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA);
-
- stack_gva += DEFAULT_STACK_PGS * getpagesize();
-
- /*
- * Align stack to match calling sequence requirements in section "The
- * Stack Frame" of the System V ABI AMD64 Architecture Processor
- * Supplement, which requires the value (%rsp + 8) to be a multiple of
- * 16 when control is transferred to the function entry point.
- *
- * If this code is ever used to launch a vCPU with 32-bit entry point it
- * may need to subtract 4 bytes instead of 8 bytes.
- */
- TEST_ASSERT(IS_ALIGNED(stack_gva, PAGE_SIZE),
- "__vm_alloc() did not provide a page-aligned address");
- stack_gva -= 8;
-
vcpu = __vm_vcpu_add(vm, vcpu_id);
vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
vcpu_init_sregs(vm, vcpu);
@@ -849,7 +854,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, u32 vcpu_id)
/* Setup guest general purpose registers */
vcpu_regs_get(vcpu, ®s);
regs.rflags = regs.rflags | 0x2;
- regs.rsp = stack_gva;
+ regs.rsp = vm_alloc_stack(vm, DEFAULT_STACK_PGS);
vcpu_regs_set(vcpu, ®s);
/* Setup the MP state */
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 14/16] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack
2026-05-06 1:57 [PATCH v6 00/16] Yosry Ahmed
` (12 preceding siblings ...)
2026-05-06 1:57 ` [PATCH v6 13/16] KVM: selftests: Refactor allocating guest stack into a helper Yosry Ahmed
@ 2026-05-06 1:57 ` Yosry Ahmed
2026-05-06 1:57 ` [PATCH v6 15/16] KVM: selftests: Drop L1-provided stacks for L2 guests on x86 Yosry Ahmed
` (2 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Instead of relying on the L1-provided stack for L2, which is usually an
array on L1's own stack, allocate a dedicated page of VM memory for the
L2 stack in vcpu_alloc_{vmx/svm}() and use that as L2's RSP in the
VMCS/VMCB instead of the L1-provided value.
Most L1 guest code does not do anything with the L2 stack other than
stuff it in RSP, so this change is transparent and the L1-provided stack
is silently ignored. The only exception is memstress nested L1 code
which puts the vCPU index on L2's stack, so update this code to use the
newly allocated stack.
L1-provided stacks will be dropped and cleaned up separately.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 ++
tools/testing/selftests/kvm/include/x86/svm_util.h | 3 +++
tools/testing/selftests/kvm/include/x86/vmx.h | 2 ++
tools/testing/selftests/kvm/lib/x86/memstress.c | 5 ++---
tools/testing/selftests/kvm/lib/x86/processor.c | 2 +-
tools/testing/selftests/kvm/lib/x86/svm.c | 4 +++-
tools/testing/selftests/kvm/lib/x86/vmx.c | 4 +++-
7 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 77f576ee7789d..36df2cadbc4f6 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1208,6 +1208,8 @@ struct idt_entry {
void vm_install_exception_handler(struct kvm_vm *vm, int vector,
void (*handler)(struct ex_regs *));
+gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages);
+
/*
* Exception fixup morphs #DE to an arbitrary magic vector so that '0' can be
* used to signal "no expcetion".
diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
index 6c013eb838beb..3b1cc484fba1c 100644
--- a/tools/testing/selftests/kvm/include/x86/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
@@ -28,6 +28,9 @@ struct svm_test_data {
void *msr_hva;
u64 msr_gpa;
+ /* Stack */
+ void *stack; /* gva */
+
/* NPT */
u64 ncr3_gpa;
};
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 90fffaf915958..1dcb9b86d33d3 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -524,6 +524,8 @@ struct vmx_pages {
u64 apic_access_gpa;
void *apic_access;
+ void *stack;
+
u64 eptp_gpa;
};
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 61cf952cd2dc2..fa07ef037cad1 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -43,7 +43,7 @@ static void l1_vmx_code(struct vmx_pages *vmx, u64 vcpu_id)
GUEST_ASSERT(ept_1g_pages_supported());
rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
- *rsp = vcpu_id;
+ *(u64 *)vmx->stack = vcpu_id;
prepare_vmcs(vmx, memstress_l2_guest_entry, rsp);
GUEST_ASSERT(!vmlaunch());
@@ -56,9 +56,8 @@ static void l1_svm_code(struct svm_test_data *svm, u64 vcpu_id)
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
unsigned long *rsp;
-
rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
- *rsp = vcpu_id;
+ *(u64 *)svm->stack = vcpu_id;
generic_svm_setup(svm, memstress_l2_guest_entry, rsp);
run_guest(svm->vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 94a1cadb2b26b..cf59ffed45b74 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -778,7 +778,7 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
REPORT_GUEST_ASSERT(uc);
}
-static gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages)
+gva_t vm_alloc_stack(struct kvm_vm *vm, int nr_pages)
{
int size = nr_pages * getpagesize();
gva_t stack_gva;
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 3b01605ab016c..4e9c37f8d1a61 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -46,6 +46,8 @@ vcpu_alloc_svm(struct kvm_vm *vm, gva_t *p_svm_gva)
svm->msr_gpa = addr_gva2gpa(vm, (uintptr_t)svm->msr);
memset(svm->msr_hva, 0, getpagesize());
+ svm->stack = (void *)vm_alloc_stack(vm, 1);
+
if (vm->stage2_mmu.pgd_created)
svm->ncr3_gpa = vm->stage2_mmu.pgd;
@@ -122,7 +124,7 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
ctrl->msrpm_base_pa = svm->msr_gpa;
vmcb->save.rip = (u64)guest_rip;
- vmcb->save.rsp = (u64)guest_rsp;
+ vmcb->save.rsp = (u64)svm->stack;
guest_regs.rdi = (u64)svm;
if (svm->ncr3_gpa) {
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 67642759e4a05..81fe85cf22e8f 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -116,6 +116,8 @@ vcpu_alloc_vmx(struct kvm_vm *vm, gva_t *p_vmx_gva)
vmx->vmwrite_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmwrite);
memset(vmx->vmwrite_hva, 0, getpagesize());
+ vmx->stack = (void *)vm_alloc_stack(vm, 1);
+
if (vm->stage2_mmu.pgd_created)
vmx->eptp_gpa = vm->stage2_mmu.pgd;
@@ -370,7 +372,7 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
{
init_vmcs_control_fields(vmx);
init_vmcs_host_state();
- init_vmcs_guest_state(guest_rip, guest_rsp);
+ init_vmcs_guest_state(guest_rip, vmx->stack);
}
bool kvm_cpu_has_ept(void)
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 15/16] KVM: selftests: Drop L1-provided stacks for L2 guests on x86
2026-05-06 1:57 [PATCH v6 00/16] Yosry Ahmed
` (13 preceding siblings ...)
2026-05-06 1:57 ` [PATCH v6 14/16] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack Yosry Ahmed
@ 2026-05-06 1:57 ` Yosry Ahmed
2026-05-06 1:57 ` [PATCH v6 16/16] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits Yosry Ahmed
2026-05-06 2:00 ` [PATCH v6 00/16] Yosry Ahmed
16 siblings, 0 replies; 18+ messages in thread
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
Now that a dedicated page is allocated for L2's stack and stuffed in
RSP, the L1-provided stack is unused. Drop the stacks allocated by L1
guest code for L2 in all x86 tests.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/include/x86/svm_util.h | 2 +-
tools/testing/selftests/kvm/include/x86/vmx.h | 2 +-
tools/testing/selftests/kvm/lib/x86/memstress.c | 14 ++------------
tools/testing/selftests/kvm/lib/x86/svm.c | 2 +-
tools/testing/selftests/kvm/lib/x86/vmx.c | 2 +-
tools/testing/selftests/kvm/x86/aperfmperf_test.c | 9 ++-------
.../selftests/kvm/x86/evmcs_smm_controls_test.c | 5 +----
tools/testing/selftests/kvm/x86/hyperv_evmcs.c | 6 +-----
tools/testing/selftests/kvm/x86/hyperv_svm_test.c | 6 +-----
tools/testing/selftests/kvm/x86/kvm_buslock_test.c | 9 ++-------
.../selftests/kvm/x86/nested_close_kvm_test.c | 12 ++----------
.../selftests/kvm/x86/nested_dirty_log_test.c | 8 ++------
.../selftests/kvm/x86/nested_emulation_test.c | 4 ++--
.../selftests/kvm/x86/nested_exceptions_test.c | 9 ++-------
.../selftests/kvm/x86/nested_invalid_cr3_test.c | 10 ++--------
.../selftests/kvm/x86/nested_tsc_adjust_test.c | 10 ++--------
.../selftests/kvm/x86/nested_tsc_scaling_test.c | 10 ++--------
.../selftests/kvm/x86/nested_vmsave_vmload_test.c | 6 +-----
tools/testing/selftests/kvm/x86/smm_test.c | 8 ++------
tools/testing/selftests/kvm/x86/state_test.c | 11 ++---------
tools/testing/selftests/kvm/x86/svm_int_ctl_test.c | 5 +----
.../selftests/kvm/x86/svm_lbr_nested_state.c | 6 +-----
.../selftests/kvm/x86/svm_nested_clear_efer_svme.c | 7 +------
.../selftests/kvm/x86/svm_nested_shutdown_test.c | 5 +----
.../kvm/x86/svm_nested_soft_inject_test.c | 6 +-----
.../selftests/kvm/x86/svm_nested_vmcb12_gpa.c | 13 ++++---------
tools/testing/selftests/kvm/x86/svm_vmcall_test.c | 5 +----
.../selftests/kvm/x86/triple_fault_event_test.c | 9 ++-------
.../selftests/kvm/x86/vmx_apic_access_test.c | 5 +----
.../selftests/kvm/x86/vmx_apicv_updates_test.c | 4 +---
.../kvm/x86/vmx_invalid_nested_guest_state.c | 6 +-----
.../selftests/kvm/x86/vmx_nested_la57_state_test.c | 5 +----
.../selftests/kvm/x86/vmx_preemption_timer_test.c | 5 +----
33 files changed, 49 insertions(+), 177 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
index 3b1cc484fba1c..c201c30485e72 100644
--- a/tools/testing/selftests/kvm/include/x86/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
@@ -60,7 +60,7 @@ static inline void vmmcall(void)
)
struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, gva_t *p_svm_gva);
-void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp);
+void generic_svm_setup(struct svm_test_data *svm, void *guest_rip);
void run_guest(struct vmcb *vmcb, u64 vmcb_gpa);
static inline bool kvm_cpu_has_npt(void)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 1dcb9b86d33d3..4bcfd60e3aecb 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -554,7 +554,7 @@ union vmx_ctrl_msr {
struct vmx_pages *vcpu_alloc_vmx(struct kvm_vm *vm, gva_t *p_vmx_gva);
bool prepare_for_vmx_operation(struct vmx_pages *vmx);
-void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp);
+void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip);
bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index fa07ef037cad1..e19e8b5a09c5a 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -30,21 +30,15 @@ __asm__(
" ud2;"
);
-#define L2_GUEST_STACK_SIZE 64
-
static void l1_vmx_code(struct vmx_pages *vmx, u64 vcpu_id)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- unsigned long *rsp;
-
GUEST_ASSERT(vmx->vmcs_gpa);
GUEST_ASSERT(prepare_for_vmx_operation(vmx));
GUEST_ASSERT(load_vmcs(vmx));
GUEST_ASSERT(ept_1g_pages_supported());
- rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
*(u64 *)vmx->stack = vcpu_id;
- prepare_vmcs(vmx, memstress_l2_guest_entry, rsp);
+ prepare_vmcs(vmx, memstress_l2_guest_entry);
GUEST_ASSERT(!vmlaunch());
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
@@ -53,12 +47,8 @@ static void l1_vmx_code(struct vmx_pages *vmx, u64 vcpu_id)
static void l1_svm_code(struct svm_test_data *svm, u64 vcpu_id)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- unsigned long *rsp;
-
- rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
*(u64 *)svm->stack = vcpu_id;
- generic_svm_setup(svm, memstress_l2_guest_entry, rsp);
+ generic_svm_setup(svm, memstress_l2_guest_entry);
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 4e9c37f8d1a61..1445b890986fd 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -83,7 +83,7 @@ void vm_enable_npt(struct kvm_vm *vm)
tdp_mmu_init(vm, vm->mmu.pgtable_levels, &pte_masks);
}
-void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
+void generic_svm_setup(struct svm_test_data *svm, void *guest_rip)
{
struct vmcb *vmcb = svm->vmcb;
u64 vmcb_gpa = svm->vmcb_gpa;
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 81fe85cf22e8f..33c477ce4a58b 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -368,7 +368,7 @@ static inline void init_vmcs_guest_state(void *rip, void *rsp)
vmwrite(GUEST_SYSENTER_EIP, vmreadz(HOST_IA32_SYSENTER_EIP));
}
-void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
+void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip)
{
init_vmcs_control_fields(vmx);
init_vmcs_host_state();
diff --git a/tools/testing/selftests/kvm/x86/aperfmperf_test.c b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
index c91660103137b..845cb685f1743 100644
--- a/tools/testing/selftests/kvm/x86/aperfmperf_test.c
+++ b/tools/testing/selftests/kvm/x86/aperfmperf_test.c
@@ -54,8 +54,6 @@ static void guest_read_aperf_mperf(void)
GUEST_SYNC2(rdmsr(MSR_IA32_APERF), rdmsr(MSR_IA32_MPERF));
}
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
guest_read_aperf_mperf();
@@ -64,21 +62,18 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(vmcb, svm->vmcb_gpa);
}
static void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
- prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, NULL);
/*
* Enable MSR bitmaps (the bitmap itself is allocated, zeroed, and set
diff --git a/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c b/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
index 5b3aef109cfc5..77ce87c41a868 100644
--- a/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
+++ b/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
@@ -52,8 +52,6 @@ static void l2_guest_code(void)
static void guest_code(struct vmx_pages *vmx_pages,
struct hyperv_test_pages *hv_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
/* Set up Hyper-V enlightenments and eVMCS */
wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
@@ -62,8 +60,7 @@ static void guest_code(struct vmx_pages *vmx_pages,
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_evmcs(hv_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
GUEST_ASSERT(!vmlaunch());
diff --git a/tools/testing/selftests/kvm/x86/hyperv_evmcs.c b/tools/testing/selftests/kvm/x86/hyperv_evmcs.c
index c7fa114aee20f..1bda2cd3f7396 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_evmcs.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_evmcs.c
@@ -78,9 +78,6 @@ void l2_guest_code(void)
void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages,
gpa_t hv_hcall_page_gpa)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
wrmsr(HV_X64_MSR_HYPERCALL, hv_hcall_page_gpa);
@@ -100,8 +97,7 @@ void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages,
GUEST_SYNC(4);
GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
GUEST_SYNC(5);
GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);
diff --git a/tools/testing/selftests/kvm/x86/hyperv_svm_test.c b/tools/testing/selftests/kvm/x86/hyperv_svm_test.c
index 7a62f6a9d606d..1f74b0fa9b835 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_svm_test.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_svm_test.c
@@ -18,8 +18,6 @@
#include "svm_util.h"
#include "hyperv.h"
-#define L2_GUEST_STACK_SIZE 256
-
/* Exit to L1 from L2 with RDMSR instruction */
static inline void rdmsr_from_l2(u32 msr)
{
@@ -69,7 +67,6 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
struct hyperv_test_pages *hv_pages,
gpa_t pgs_gpa)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;
@@ -81,8 +78,7 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
GUEST_ASSERT(svm->vmcb_gpa);
/* Prepare for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* L2 TLB flush setup */
hve->partition_assist_page = hv_pages->partition_assist_gpa;
diff --git a/tools/testing/selftests/kvm/x86/kvm_buslock_test.c b/tools/testing/selftests/kvm/x86/kvm_buslock_test.c
index 52014a3210c88..25a182be00a97 100644
--- a/tools/testing/selftests/kvm/x86/kvm_buslock_test.c
+++ b/tools/testing/selftests/kvm/x86/kvm_buslock_test.c
@@ -26,8 +26,6 @@ static void guest_generate_buslocks(void)
atomic_inc(val);
}
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
guest_generate_buslocks();
@@ -36,21 +34,18 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(vmcb, svm->vmcb_gpa);
}
static void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
- prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, NULL);
GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_guest_code));
GUEST_ASSERT(!vmlaunch());
diff --git a/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c b/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c
index 761fec2934080..b974cfb347d6e 100644
--- a/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_close_kvm_test.c
@@ -21,8 +21,6 @@ enum {
PORT_L0_EXIT = 0x2000,
};
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
/* Exit to L0 */
@@ -32,14 +30,11 @@ static void l2_guest_code(void)
static void l1_vmx_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
GUEST_ASSERT(!vmlaunch());
GUEST_ASSERT(0);
@@ -47,11 +42,8 @@ static void l1_vmx_code(struct vmx_pages *vmx_pages)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* Prepare the VMCB for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT(0);
diff --git a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
index 0e67cce835701..26b474bf13535 100644
--- a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
@@ -40,8 +40,6 @@
#define TEST_HVA(vm, idx) addr_gpa2hva(vm, TEST_GPA(idx))
-#define L2_GUEST_STACK_SIZE 64
-
/* Use the page offset bits to communicate the access+fault type. */
#define TEST_SYNC_READ_FAULT BIT(0)
#define TEST_SYNC_WRITE_FAULT BIT(1)
@@ -92,7 +90,6 @@ static void l2_guest_code_tdp_disabled(void)
void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
void *l2_rip;
GUEST_ASSERT(vmx->vmcs_gpa);
@@ -104,7 +101,7 @@ void l1_vmx_code(struct vmx_pages *vmx)
else
l2_rip = l2_guest_code_tdp_disabled;
- prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, l2_rip);
GUEST_SYNC(TEST_SYNC_NO_FAULT);
GUEST_ASSERT(!vmlaunch());
@@ -115,7 +112,6 @@ void l1_vmx_code(struct vmx_pages *vmx)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
void *l2_rip;
if (svm->ncr3_gpa)
@@ -123,7 +119,7 @@ static void l1_svm_code(struct svm_test_data *svm)
else
l2_rip = l2_guest_code_tdp_disabled;
- generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_rip);
GUEST_SYNC(TEST_SYNC_NO_FAULT);
run_guest(svm->vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/x86/nested_emulation_test.c b/tools/testing/selftests/kvm/x86/nested_emulation_test.c
index fb7dcbe53ac73..e08c6b0697e50 100644
--- a/tools/testing/selftests/kvm/x86/nested_emulation_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_emulation_test.c
@@ -57,7 +57,7 @@ static void guest_code(void *test_data)
struct svm_test_data *svm = test_data;
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, NULL, NULL);
+ generic_svm_setup(svm, NULL);
vmcb->save.idtr.limit = 0;
vmcb->save.rip = (u64)l2_guest_code;
@@ -69,7 +69,7 @@ static void guest_code(void *test_data)
GUEST_ASSERT(prepare_for_vmx_operation(test_data));
GUEST_ASSERT(load_vmcs(test_data));
- prepare_vmcs(test_data, NULL, NULL);
+ prepare_vmcs(test_data, NULL);
GUEST_ASSERT(!vmwrite(GUEST_IDTR_LIMIT, 0));
GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_guest_code));
GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, 0));
diff --git a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
index 186e980aa8eee..aeec3121c8e83 100644
--- a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
@@ -5,8 +5,6 @@
#include "vmx.h"
#include "svm_util.h"
-#define L2_GUEST_STACK_SIZE 256
-
/*
* Arbitrary, never shoved into KVM/hardware, just need to avoid conflict with
* the "real" exceptions used, #SS/#GP/#DF (12/13/8).
@@ -91,9 +89,8 @@ static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
static void l1_svm_code(struct svm_test_data *svm)
{
struct vmcb_control_area *ctrl = &svm->vmcb->control;
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
- generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, NULL);
svm->vmcb->save.idtr.limit = 0;
ctrl->intercept |= BIT_ULL(INTERCEPT_SHUTDOWN);
@@ -128,13 +125,11 @@ static void vmx_run_l2(void *l2_code, int vector, u32 error_code)
static void l1_vmx_code(struct vmx_pages *vmx)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
GUEST_ASSERT_EQ(load_vmcs(vmx), true);
- prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, NULL);
GUEST_ASSERT_EQ(vmwrite(GUEST_IDTR_LIMIT, 0), 0);
/*
diff --git a/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c b/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c
index 11fd2467d8233..8c2ba9674558e 100644
--- a/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_invalid_cr3_test.c
@@ -11,8 +11,6 @@
#include "kselftest.h"
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
vmcall();
@@ -20,11 +18,9 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
uintptr_t save_cr3;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* Try to run L2 with invalid CR3 and make sure it fails */
save_cr3 = svm->vmcb->save.cr3;
@@ -42,14 +38,12 @@ static void l1_svm_code(struct svm_test_data *svm)
static void l1_vmx_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
uintptr_t save_cr3;
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/* Try to run L2 with invalid CR3 and make sure it fails */
save_cr3 = vmreadz(GUEST_CR3);
diff --git a/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c
index f0e4adac47510..cb79d7b9619c2 100644
--- a/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_tsc_adjust_test.c
@@ -34,8 +34,6 @@
#define TSC_ADJUST_VALUE (1ll << 32)
#define TSC_OFFSET_VALUE -(1ll << 48)
-#define L2_GUEST_STACK_SIZE 64
-
enum {
PORT_ABORT = 0x1000,
PORT_REPORT,
@@ -75,8 +73,6 @@ static void l2_guest_code(void)
static void l1_guest_code(void *data)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* Set TSC from L1 and make sure TSC_ADJUST is updated correctly */
GUEST_ASSERT(rdtsc() < TSC_ADJUST_VALUE);
wrmsr(MSR_IA32_TSC, rdtsc() - TSC_ADJUST_VALUE);
@@ -93,8 +89,7 @@ static void l1_guest_code(void *data)
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
@@ -105,8 +100,7 @@ static void l1_guest_code(void *data)
} else {
struct svm_test_data *svm = data;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
svm->vmcb->control.tsc_offset = TSC_OFFSET_VALUE;
run_guest(svm->vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c
index 190e93af20a14..18f765835bf4c 100644
--- a/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_tsc_scaling_test.c
@@ -22,8 +22,6 @@
#define TSC_OFFSET_L2 ((u64)-33125236320908)
#define TSC_MULTIPLIER_L2 (L2_SCALE_FACTOR << 48)
-#define L2_GUEST_STACK_SIZE 64
-
enum { USLEEP, UCHECK_L1, UCHECK_L2 };
#define GUEST_SLEEP(sec) ucall(UCALL_SYNC, 2, USLEEP, sec)
#define GUEST_CHECK(level, freq) ucall(UCALL_SYNC, 2, level, freq)
@@ -82,13 +80,10 @@ static void l2_guest_code(void)
static void l1_svm_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* check that L1's frequency looks alright before launching L2 */
check_tsc_freq(UCHECK_L1);
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* enable TSC scaling for L2 */
wrmsr(MSR_AMD64_TSC_RATIO, L2_SCALE_FACTOR << 32);
@@ -105,7 +100,6 @@ static void l1_svm_code(struct svm_test_data *svm)
static void l1_vmx_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u32 control;
/* check that L1's frequency looks alright before launching L2 */
@@ -115,7 +109,7 @@ static void l1_vmx_code(struct vmx_pages *vmx_pages)
GUEST_ASSERT(load_vmcs(vmx_pages));
/* prepare the VMCS for L2 execution */
- prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/* enable TSC offsetting and TSC scaling for L2 */
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
diff --git a/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c b/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
index 85d3f4cc76f39..a130759f39a19 100644
--- a/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_vmsave_vmload_test.c
@@ -28,8 +28,6 @@
#define TEST_VMCB_L2_GPA TEST_VMCB_L1_GPA(0)
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code_vmsave(void)
{
asm volatile("vmsave %0" : : "a"(TEST_VMCB_L2_GPA) : "memory");
@@ -70,10 +68,8 @@ static void l2_guest_code_vmcb1(void)
static void l1_guest_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
/* Each test case initializes the guest RIP below */
- generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, NULL);
/* Set VMSAVE/VMLOAD intercepts and make sure they work with.. */
svm->vmcb->control.intercept |= (BIT_ULL(INTERCEPT_VMSAVE) |
diff --git a/tools/testing/selftests/kvm/x86/smm_test.c b/tools/testing/selftests/kvm/x86/smm_test.c
index 740051167dbd4..e2542f4ced605 100644
--- a/tools/testing/selftests/kvm/x86/smm_test.c
+++ b/tools/testing/selftests/kvm/x86/smm_test.c
@@ -63,8 +63,6 @@ static void l2_guest_code(void)
static void guest_code(void *arg)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u64 apicbase = rdmsr(MSR_IA32_APICBASE);
struct svm_test_data *svm = arg;
struct vmx_pages *vmx_pages = arg;
@@ -81,13 +79,11 @@ static void guest_code(void *arg)
if (arg) {
if (this_cpu_has(X86_FEATURE_SVM)) {
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
} else {
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
}
sync_with_host(5);
diff --git a/tools/testing/selftests/kvm/x86/state_test.c b/tools/testing/selftests/kvm/x86/state_test.c
index 409c6cc9f9214..4a1056a6cb8dc 100644
--- a/tools/testing/selftests/kvm/x86/state_test.c
+++ b/tools/testing/selftests/kvm/x86/state_test.c
@@ -19,8 +19,6 @@
#include "vmx.h"
#include "svm_util.h"
-#define L2_GUEST_STACK_SIZE 256
-
void svm_l2_guest_code(void)
{
GUEST_SYNC(4);
@@ -35,13 +33,11 @@ void svm_l2_guest_code(void)
static void svm_l1_guest_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
GUEST_ASSERT(svm->vmcb_gpa);
/* Prepare for L2 execution. */
- generic_svm_setup(svm, svm_l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, svm_l2_guest_code);
vmcb->control.int_ctl |= (V_GIF_ENABLE_MASK | V_GIF_MASK);
@@ -78,8 +74,6 @@ void vmx_l2_guest_code(void)
static void vmx_l1_guest_code(struct vmx_pages *vmx_pages)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT(vmx_pages->vmcs_gpa);
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_SYNC(3);
@@ -89,8 +83,7 @@ static void vmx_l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_SYNC(4);
GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
- prepare_vmcs(vmx_pages, vmx_l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, vmx_l2_guest_code);
GUEST_SYNC(5);
GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
diff --git a/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c b/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c
index d3cc5e4f78831..7b1f4a4818bdd 100644
--- a/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_int_ctl_test.c
@@ -54,15 +54,12 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
x2apic_enable();
/* Prepare for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* No virtual interrupt masking */
vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
diff --git a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
index 7fbfaa054c952..77c6ce9f45078 100644
--- a/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
+++ b/tools/testing/selftests/kvm/x86/svm_lbr_nested_state.c
@@ -9,8 +9,6 @@
#include "svm_util.h"
-#define L2_GUEST_STACK_SIZE 64
-
#define DO_BRANCH() do { asm volatile("jmp 1f\n 1: nop"); } while (0)
struct lbr_branch {
@@ -55,7 +53,6 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm, bool nested_lbrv)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
struct lbr_branch l1_branch;
@@ -65,8 +62,7 @@ static void l1_guest_code(struct svm_test_data *svm, bool nested_lbrv)
CHECK_BRANCH_MSRS(&l1_branch);
/* Run L2, which will also do the same */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
if (nested_lbrv)
vmcb->control.misc_ctl2 = SVM_MISC2_ENABLE_V_LBR;
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c b/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
index 6a89eaffc6578..6bc301207cbcb 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_clear_efer_svme.c
@@ -8,8 +8,6 @@
#include "kselftest.h"
-#define L2_GUEST_STACK_SIZE 64
-
static void l2_guest_code(void)
{
unsigned long efer = rdmsr(MSR_EFER);
@@ -24,10 +22,7 @@ static void l2_guest_code(void)
static void l1_guest_code(struct svm_test_data *svm)
{
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(svm->vmcb, svm->vmcb_gpa);
/* Unreachable, L1 should be shutdown */
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c b/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c
index c6ea3d609a629..2a4a216954bb3 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_shutdown_test.c
@@ -19,12 +19,9 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm, struct idt_entry *idt)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
vmcb->control.intercept &= ~(BIT(INTERCEPT_SHUTDOWN));
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c b/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
index f72f11d4c4f83..0b640d09d1943 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
@@ -78,17 +78,13 @@ static void l2_guest_code_nmi(void)
static void l1_guest_code(struct svm_test_data *svm, u64 is_nmi, u64 idt_alt)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
if (is_nmi)
x2apic_enable();
/* Prepare for L2 execution. */
- generic_svm_setup(svm,
- is_nmi ? l2_guest_code_nmi : l2_guest_code_int,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, is_nmi ? l2_guest_code_nmi : l2_guest_code_int);
vmcb->control.intercept_exceptions |= BIT(PF_VECTOR) | BIT(UD_VECTOR);
vmcb->control.intercept |= BIT(INTERCEPT_NMI) | BIT(INTERCEPT_HLT);
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c b/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c
index a4935ce2fb998..b3f45035745ff 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_vmcb12_gpa.c
@@ -9,14 +9,9 @@
#include "kvm_test_harness.h"
#include "test_util.h"
-
-#define L2_GUEST_STACK_SIZE 64
-
#define SYNC_GP 101
#define SYNC_L2_STARTED 102
-static unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
static void guest_gp_handler(struct ex_regs *regs)
{
GUEST_SYNC(SYNC_GP);
@@ -30,28 +25,28 @@ static void l2_code(void)
static void l1_vmrun(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
asm volatile ("vmrun %[gpa]" : : [gpa] "a" (gpa) : "memory");
}
static void l1_vmload(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
asm volatile ("vmload %[gpa]" : : [gpa] "a" (gpa) : "memory");
}
static void l1_vmsave(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
asm volatile ("vmsave %[gpa]" : : [gpa] "a" (gpa) : "memory");
}
static void l1_vmexit(struct svm_test_data *svm, gpa_t gpa)
{
- generic_svm_setup(svm, l2_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_code);
run_guest(svm->vmcb, svm->vmcb_gpa);
GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
diff --git a/tools/testing/selftests/kvm/x86/svm_vmcall_test.c b/tools/testing/selftests/kvm/x86/svm_vmcall_test.c
index b1887242f3b8e..7c57fb7e64221 100644
--- a/tools/testing/selftests/kvm/x86/svm_vmcall_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_vmcall_test.c
@@ -19,13 +19,10 @@ static void l2_guest_code(struct svm_test_data *svm)
static void l1_guest_code(struct svm_test_data *svm)
{
- #define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
/* Prepare for L2 execution. */
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
run_guest(vmcb, svm->vmcb_gpa);
diff --git a/tools/testing/selftests/kvm/x86/triple_fault_event_test.c b/tools/testing/selftests/kvm/x86/triple_fault_event_test.c
index f1c488e0d4975..0d83516f4bd08 100644
--- a/tools/testing/selftests/kvm/x86/triple_fault_event_test.c
+++ b/tools/testing/selftests/kvm/x86/triple_fault_event_test.c
@@ -21,9 +21,6 @@ static void l2_guest_code(void)
: : [port] "d" (ARBITRARY_IO_PORT) : "rax");
}
-#define L2_GUEST_STACK_SIZE 64
-unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
void l1_guest_code_vmx(struct vmx_pages *vmx)
{
@@ -31,8 +28,7 @@ void l1_guest_code_vmx(struct vmx_pages *vmx)
GUEST_ASSERT(prepare_for_vmx_operation(vmx));
GUEST_ASSERT(load_vmcs(vmx));
- prepare_vmcs(vmx, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx, l2_guest_code);
GUEST_ASSERT(!vmlaunch());
/* L2 should triple fault after a triple fault event injected. */
@@ -44,8 +40,7 @@ void l1_guest_code_svm(struct svm_test_data *svm)
{
struct vmcb *vmcb = svm->vmcb;
- generic_svm_setup(svm, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ generic_svm_setup(svm, l2_guest_code);
/* don't intercept shutdown to test the case of SVM allowing to do so */
vmcb->control.intercept &= ~(BIT(INTERCEPT_SHUTDOWN));
diff --git a/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c b/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c
index 1720113eae799..463f73aa9159a 100644
--- a/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_apic_access_test.c
@@ -36,16 +36,13 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages, unsigned long high_gpa)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u32 control;
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
diff --git a/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c b/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c
index 80a4fd1e5bbbe..f9b88a6f6113d 100644
--- a/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_apicv_updates_test.c
@@ -31,15 +31,13 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u32 control;
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_USE_MSR_BITMAPS;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
diff --git a/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c b/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c
index a2eaceed9ad52..6d88c54f69faa 100644
--- a/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c
+++ b/tools/testing/selftests/kvm/x86/vmx_invalid_nested_guest_state.c
@@ -25,15 +25,11 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
/* Prepare the VMCS for L2 execution. */
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/*
* L2 must be run without unrestricted guest, verify that the selftests
diff --git a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
index f13dee3173837..75073efa926da 100644
--- a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
@@ -27,8 +27,6 @@ static void l2_guest_code(void)
static void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u64 guest_cr4;
gpa_t pml5_pa, pml4_pa;
u64 *pml5;
@@ -42,8 +40,7 @@ static void l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_ASSERT(load_vmcs(vmx_pages));
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/*
* Set up L2 with a 4-level page table by pointing its CR3 to
diff --git a/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c b/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c
index 1b7b6ba23de76..eb8021c33cd43 100644
--- a/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_preemption_timer_test.c
@@ -66,8 +66,6 @@ void l2_guest_code(void)
void l1_guest_code(struct vmx_pages *vmx_pages)
{
-#define L2_GUEST_STACK_SIZE 64
- unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
u64 l1_vmx_pt_start;
u64 l1_vmx_pt_finish;
u64 l1_tsc_deadline, l2_tsc_deadline;
@@ -77,8 +75,7 @@ void l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_ASSERT(load_vmcs(vmx_pages));
GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
- prepare_vmcs(vmx_pages, l2_guest_code,
- &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ prepare_vmcs(vmx_pages, l2_guest_code);
/*
* Check for Preemption timer support
--
2.54.0.545.g6539524ca2-goog
* [PATCH v6 16/16] KVM: selftests: Add svm_pmu_host_guest_test for Host-Only/Guest-Only bits
From: Yosry Ahmed @ 2026-05-06 1:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel,
Yosry Ahmed
From: Jim Mattson <jmattson@google.com>
Add a selftest to verify KVM correctly virtualizes the AMD PMU Host-Only
(bit 41) and Guest-Only (bit 40) event selector bits across all relevant
SVM state transitions.
The test programs 4 PMCs simultaneously with all combinations of the
Host-Only and Guest-Only bits, then verifies correct counting behavior
with EFER.SVME clear and set, as well as in host mode and guest mode.
The test also verifies that updating Host-Only / Guest-Only bits for a
PMC works as intended, and that event filtering is still respected.
Signed-off-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/include/x86/pmu.h | 6 +
.../kvm/x86/svm_pmu_host_guest_test.c | 216 ++++++++++++++++++
3 files changed, 223 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 9118a5a51b89f..df52e938891e3 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -118,6 +118,7 @@ TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
TEST_GEN_PROGS_x86 += x86/svm_nested_vmcb12_gpa
TEST_GEN_PROGS_x86 += x86/svm_lbr_nested_state
+TEST_GEN_PROGS_x86 += x86/svm_pmu_host_guest_test
TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
TEST_GEN_PROGS_x86 += x86/sync_regs_test
TEST_GEN_PROGS_x86 += x86/ucna_injection_test
diff --git a/tools/testing/selftests/kvm/include/x86/pmu.h b/tools/testing/selftests/kvm/include/x86/pmu.h
index 98537cc8840d1..608ed83d7c6a6 100644
--- a/tools/testing/selftests/kvm/include/x86/pmu.h
+++ b/tools/testing/selftests/kvm/include/x86/pmu.h
@@ -38,6 +38,12 @@
#define ARCH_PERFMON_EVENTSEL_INV BIT_ULL(23)
#define ARCH_PERFMON_EVENTSEL_CMASK GENMASK_ULL(31, 24)
+/*
+ * These are AMD-specific bits.
+ */
+#define AMD64_EVENTSEL_GUESTONLY BIT_ULL(40)
+#define AMD64_EVENTSEL_HOSTONLY BIT_ULL(41)
+
/* RDPMC control flags, Intel only. */
#define INTEL_RDPMC_METRICS BIT_ULL(29)
#define INTEL_RDPMC_FIXED BIT_ULL(30)
diff --git a/tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c b/tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c
new file mode 100644
index 0000000000000..ee4633ab79aa7
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_pmu_host_guest_test.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM nested SVM PMU Host-Only/Guest-Only test
+ *
+ * Copyright (C) 2026, Google LLC.
+ *
+ * Test that KVM correctly virtualizes the AMD PMU Host-Only (bit 41) and
+ * Guest-Only (bit 40) event selector bits across all SVM state
+ * transitions.
+ *
+ * Programs 4 PMCs simultaneously with all combinations of Host-Only and
+ * Guest-Only bits, then verifies correct counting behavior with different
+ * combinations of EFER.SVME and host/guest mode -- as well as event filtering.
+ */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "pmu.h"
+
+#define EVENTSEL_RETIRED_INSNS (ARCH_PERFMON_EVENTSEL_OS | \
+ ARCH_PERFMON_EVENTSEL_USR | \
+ ARCH_PERFMON_EVENTSEL_ENABLE | \
+ AMD_ZEN_INSTRUCTIONS_RETIRED)
+
+/* PMC configurations: index corresponds to Host-Only | Guest-Only bits */
+#define PMC_NONE 0 /* Neither bit set */
+#define PMC_G 1 /* Guest-Only bit set */
+#define PMC_H 2 /* Host-Only bit set */
+#define PMC_HG 3 /* Both bits set */
+#define NR_PMCS 4
+
+#define LOOP_INSNS 1000
+
+static __always_inline void run_instruction_loop(void)
+{
+ unsigned int i;
+
+ for (i = 0; i < LOOP_INSNS; i++)
+ __asm__ __volatile__("nop");
+}
+
+static __always_inline void read_counters(uint64_t *counts)
+{
+ int i;
+
+ for (i = 0; i < NR_PMCS; i++)
+ counts[i] = rdmsr(MSR_F15H_PERF_CTR + 2 * i);
+}
+
+static __always_inline void run_and_measure(uint64_t *deltas)
+{
+ uint64_t before[NR_PMCS], after[NR_PMCS];
+ int i;
+
+ read_counters(before);
+ run_instruction_loop();
+ read_counters(after);
+
+ for (i = 0; i < NR_PMCS; i++)
+ deltas[i] = after[i] - before[i];
+}
+
+static void assert_pmc_counts(uint64_t *deltas, unsigned int expected_counting)
+{
+ int i;
+
+ for (i = 0; i < NR_PMCS; i++) {
+ if (expected_counting & BIT(i))
+ GUEST_ASSERT_NE(deltas[i], 0);
+ else
+ GUEST_ASSERT_EQ(deltas[i], 0);
+ }
+}
+
+static uint64_t l2_deltas[NR_PMCS];
+
+static void l2_guest_code(void)
+{
+ run_and_measure(l2_deltas);
+ vmmcall();
+}
+
+static void l1_guest_code(struct svm_test_data *svm)
+{
+ struct vmcb *vmcb = svm->vmcb;
+ uint64_t deltas[NR_PMCS];
+ uint64_t eventsel;
+ int i;
+
+ /* Program 4 PMCs with all combinations of Host-Only/Guest-Only bits */
+ for (i = 0; i < NR_PMCS; i++) {
+ eventsel = EVENTSEL_RETIRED_INSNS;
+ if (i & PMC_G)
+ eventsel |= AMD64_EVENTSEL_GUESTONLY;
+ if (i & PMC_H)
+ eventsel |= AMD64_EVENTSEL_HOSTONLY;
+ wrmsr(MSR_F15H_PERF_CTL + 2 * i, eventsel);
+ wrmsr(MSR_F15H_PERF_CTR + 2 * i, 0);
+ }
+
+ /* Step 1: SVME=0 - Only the counter with neither bit set counts */
+ wrmsr(MSR_EFER, rdmsr(MSR_EFER) & ~EFER_SVME);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE));
+
+ /* Step 2: Set SVME=1 - In L1 "host mode"; Guest-Only stops */
+ wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_SVME);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 3: VMRUN to L2 - In "guest mode"; Host-Only stops */
+ generic_svm_setup(svm, l2_guest_code);
+ vmcb->control.intercept &= ~(1ULL << INTERCEPT_MSR_PROT);
+
+ run_guest(vmcb, svm->vmcb_gpa);
+
+ GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+ assert_pmc_counts(l2_deltas, BIT(PMC_NONE) | BIT(PMC_G) | BIT(PMC_HG));
+
+ /* Step 4: After VMEXIT to L1 - Back in "host mode"; Guest-Only stops */
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 5: Set KVM_PMU_EVENT_DENY - all counters stop */
+ GUEST_SYNC(KVM_PMU_EVENT_DENY);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, 0);
+
+ /* Step 6: Set KVM_PMU_EVENT_ALLOW - back to all except Guest-Only */
+ GUEST_SYNC(KVM_PMU_EVENT_ALLOW);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 7: Clear Host-Only for PMC_HG - counter stops in "host mode" */
+ eventsel = rdmsr(MSR_F15H_PERF_CTL + 2 * PMC_HG);
+ wrmsr(MSR_F15H_PERF_CTL + 2 * PMC_HG, eventsel & ~AMD64_EVENTSEL_HOSTONLY);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H));
+
+ /* Step 8: Restore Host-Only for PMC_HG - counter counts again */
+ wrmsr(MSR_F15H_PERF_CTL + 2 * PMC_HG, eventsel);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE) | BIT(PMC_H) | BIT(PMC_HG));
+
+ /* Step 9: Clear SVME - Only the counter with neither bit set counts */
+ wrmsr(MSR_EFER, rdmsr(MSR_EFER) & ~EFER_SVME);
+ run_and_measure(deltas);
+ assert_pmc_counts(deltas, BIT(PMC_NONE));
+
+ GUEST_DONE();
+}
+
+static struct kvm_pmu_event_filter *alloc_event_filter(u64 event)
+{
+ struct kvm_pmu_event_filter *filter;
+
+ filter = malloc(sizeof(*filter) + sizeof(event));
+ TEST_ASSERT(filter != NULL, "Filter allocation failed");
+
+ memset(filter, 0, sizeof(*filter));
+ memcpy(filter->events, &event, sizeof(event));
+ filter->nevents = 1;
+ filter->action = KVM_PMU_EVENT_ALLOW;
+
+ return filter;
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_pmu_event_filter *filter;
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct ucall uc;
+ gva_t svm_gva;
+
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+ TEST_REQUIRE(kvm_is_pmu_enabled());
+ TEST_REQUIRE(get_kvm_amd_param_bool("enable_mediated_pmu"));
+ TEST_REQUIRE(host_cpu_is_amd && kvm_cpu_family() >= 0x17);
+
+ vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+
+ vcpu_alloc_svm(vm, &svm_gva);
+ vcpu_args_set(vcpu, 1, svm_gva);
+
+ filter = alloc_event_filter(AMD_ZEN_INSTRUCTIONS_RETIRED);
+
+ for (;;) {
+ vcpu_run(vcpu);
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ goto done;
+ case UCALL_DONE:
+ goto done;
+ case UCALL_SYNC:
+ filter->action = uc.args[1];
+ vm_ioctl(vm, KVM_SET_PMU_EVENT_FILTER, filter);
+ break;
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ goto done;
+ }
+ }
+done:
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH v6 00/16]
From: Yosry Ahmed @ 2026-05-06 2:00 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Jim Mattson, Dapeng Mi, Sandipan Das,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, kvm, linux-kernel
... and of course I sent the cover letter without a proper subject. This
was supposed to be:
"KVM: x86/pmu: Add support for AMD Host-Only/Guest-Only bits"