* [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested)
@ 2026-05-18 20:25 Yosry Ahmed
2026-05-18 20:25 ` [PATCH 1/8] KVM: selftests: Fix offsets in GPR switching for nSVM Yosry Ahmed
` (9 more replies)
0 siblings, 10 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Add a stress test for save+restore while the guest is triggering and
handling #PFs, in both L1 and L2. The goal was to create a generic
selftest that would catch bugs like the one fixed by 5c247d08bc81 ("KVM:
nSVM: Use vcpu->arch.cr2 when updating vmcb12 on nested #VMEXIT"),
instead of relying on high-level testing (e.g. building GCC in L2) to
catch it.
The test tries to be as generic as possible by triggering #PFs in a
guest and installing a proper #PF handler, while the host is
continuously doing save+restore cycles. Exiting to userspace is randomly
triggered by a second thread that constantly signals the vCPU thread.
Patches (1-4) are prep patches, fixing GPR switching for nSVM and
generalizing it to cover nVMX, which is needed for the test to run
properly with nVMX. Patch 4 removes HORRIFIC_L2_UCALL_CLOBBER_HACK, as
it is no longer needed. While this series does not have the "complete"
fix added by commit 6783ca4105a7 ("KVM: selftests: Add a shameful hack
to preserve/clobber GPRs across ucall"), it's a good step in the right
direction.
Patches (5-8) add the actual test. The test is first introduced as a
simple (read: dummy) stress test that just explicitly syncs to userspace
after each #PF handling to do save+restore, then gradually evolves to
add the random signaling and nested support. After the last patch, the
test reliably reproduces the CR2 bug.
This series conflicts with reworking L2 stack allocation in [1], but the
conflict should be trivial to fix regardless of which series lands
first.
[1] https://lore.kernel.org/kvm/20260506015733.1671124-1-yosry@kernel.org/
Yosry Ahmed (8):
KVM: selftests: Fix offsets in GPR switching for nSVM
KVM: selftests: Move GPR load/save definitions outside of nSVM code
KVM: selftests: Reuse GPR switching logic for nVMX
KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK
KVM: selftests: Add basic stress test for save+restore and #PF
handling
KVM: selftests: Trigger save+restore randomly in the #PF stress test
KVM: selftests: Support running stress save+restore and #PF test in L2
KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 65 +++-
tools/testing/selftests/kvm/include/x86/vmx.h | 46 +--
.../testing/selftests/kvm/lib/x86/processor.c | 13 +
tools/testing/selftests/kvm/lib/x86/svm.c | 29 +-
tools/testing/selftests/kvm/lib/x86/ucall.c | 32 +-
.../kvm/x86/stress_save_restore_pf_test.c | 320 ++++++++++++++++++
7 files changed, 414 insertions(+), 92 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
base-commit: a9512a611bd030088f13477258d1f8103cceaa40
--
2.54.0.563.g4f69b47b94-goog
* [PATCH 1/8] KVM: selftests: Fix offsets in GPR switching for nSVM
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 2/8] KVM: selftests: Move GPR load/save definitions outside of nSVM code Yosry Ahmed
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The GPR switching code uses hardcoded offsets and gets them wrong. For
example, it uses 0x20 for RBX, whose offset in the struct is actually
0x18. It also uses 0x80 for R15, which is outside the struct.
Instead of hardcoded offsets, define ASM variables that hold the member
offsets using offsetof(), and define a macro that uses those offsets to
switch one GPR. Opportunistically drop the separate SAVE/LOAD
macros in favor of a single SWITCH macro, and rename it to reflect that
it doesn't switch RAX.
Also, re-order the registers in the struct alphabetically since the
ordering does not matter anymore, and drop RSP as it's not used anyway.
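For illustration only, the mismatch can be checked with a small standalone sketch. It assumes the pre-patch field order visible in the diff below (rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8-r15). The struct name and the helpers are hypothetical and are not part of the selftests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the pre-patch layout, for illustration only. */
struct old_gpr64_regs {
	uint64_t rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi;
	uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
};

/* The compiler derives the offset, so it always tracks the layout. */
static_assert(offsetof(struct old_gpr64_regs, rbx) == 0x18,
	      "RBX lives at 0x18 in the old layout, not 0x20");

size_t old_rbx_offset(void)
{
	return offsetof(struct old_gpr64_regs, rbx);
}

size_t old_r15_offset(void)
{
	return offsetof(struct old_gpr64_regs, r15);
}

/* Returns 1 if an 8-byte access at 'off' would run past the struct. */
int offset_is_out_of_bounds(size_t off)
{
	return off + sizeof(uint64_t) > sizeof(struct old_gpr64_regs);
}
```

With this layout the struct is 0x80 bytes, so a hardcoded offset of 0x80 for the last register points one slot past the end, while offsetof() yields 0x78.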
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../selftests/kvm/include/x86/processor.h | 3 +-
tools/testing/selftests/kvm/lib/x86/svm.c | 58 ++++++++++++-------
2 files changed, 39 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 77f576ee7789d..1482f2b53a9c5 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -381,10 +381,9 @@ static inline unsigned int x86_model(unsigned int eax)
/* General Registers in 64-Bit Mode */
struct gpr64_regs {
u64 rax;
+ u64 rbx;
u64 rcx;
u64 rdx;
- u64 rbx;
- u64 rsp;
u64 rbp;
u64 rsi;
u64 rdi;
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 3b01605ab016c..6a6926b3b9d7c 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -131,31 +131,49 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
}
}
+#define DEFINE_ASM_GPR64_OFFSET(reg) \
+ asm(".equ GPR64_OFF_" #reg ", %c0" : : "i"(offsetof(struct gpr64_regs, reg)))
+
+DEFINE_ASM_GPR64_OFFSET(rbx);
+DEFINE_ASM_GPR64_OFFSET(rcx);
+DEFINE_ASM_GPR64_OFFSET(rdx);
+DEFINE_ASM_GPR64_OFFSET(rbp);
+DEFINE_ASM_GPR64_OFFSET(rsi);
+DEFINE_ASM_GPR64_OFFSET(rdi);
+DEFINE_ASM_GPR64_OFFSET(r8);
+DEFINE_ASM_GPR64_OFFSET(r9);
+DEFINE_ASM_GPR64_OFFSET(r10);
+DEFINE_ASM_GPR64_OFFSET(r11);
+DEFINE_ASM_GPR64_OFFSET(r12);
+DEFINE_ASM_GPR64_OFFSET(r13);
+DEFINE_ASM_GPR64_OFFSET(r14);
+DEFINE_ASM_GPR64_OFFSET(r15);
+
+#define GUEST_SWITCH_GPR_ASM(reg) \
+ "xchg %%" #reg ", guest_regs + GPR64_OFF_" #reg "\n\t"
/*
* save/restore 64-bit general registers except rax, rip, rsp
* which are directly handed through the VMCB guest processor state
*/
-#define SAVE_GPR_C \
- "xchg %%rbx, guest_regs+0x20\n\t" \
- "xchg %%rcx, guest_regs+0x10\n\t" \
- "xchg %%rdx, guest_regs+0x18\n\t" \
- "xchg %%rbp, guest_regs+0x30\n\t" \
- "xchg %%rsi, guest_regs+0x38\n\t" \
- "xchg %%rdi, guest_regs+0x40\n\t" \
- "xchg %%r8, guest_regs+0x48\n\t" \
- "xchg %%r9, guest_regs+0x50\n\t" \
- "xchg %%r10, guest_regs+0x58\n\t" \
- "xchg %%r11, guest_regs+0x60\n\t" \
- "xchg %%r12, guest_regs+0x68\n\t" \
- "xchg %%r13, guest_regs+0x70\n\t" \
- "xchg %%r14, guest_regs+0x78\n\t" \
- "xchg %%r15, guest_regs+0x80\n\t"
-
-#define LOAD_GPR_C SAVE_GPR_C
+#define GUEST_SWITCH_GPRS_NORAX_ASM \
+ GUEST_SWITCH_GPR_ASM(rbx) \
+ GUEST_SWITCH_GPR_ASM(rcx) \
+ GUEST_SWITCH_GPR_ASM(rdx) \
+ GUEST_SWITCH_GPR_ASM(rbp) \
+ GUEST_SWITCH_GPR_ASM(rsi) \
+ GUEST_SWITCH_GPR_ASM(rdi) \
+ GUEST_SWITCH_GPR_ASM(r8) \
+ GUEST_SWITCH_GPR_ASM(r9) \
+ GUEST_SWITCH_GPR_ASM(r10) \
+ GUEST_SWITCH_GPR_ASM(r11) \
+ GUEST_SWITCH_GPR_ASM(r12) \
+ GUEST_SWITCH_GPR_ASM(r13) \
+ GUEST_SWITCH_GPR_ASM(r14) \
+ GUEST_SWITCH_GPR_ASM(r15)
/*
* selftests do not use interrupts so we dropped clgi/sti/cli/stgi
- * for now. registers involved in LOAD/SAVE_GPR_C are eventually
+ * for now. registers involved in GPRs switching are eventually
* unmodified so they do not need to be in the clobber list.
*/
void run_guest(struct vmcb *vmcb, u64 vmcb_gpa)
@@ -166,9 +184,9 @@ void run_guest(struct vmcb *vmcb, u64 vmcb_gpa)
"mov %%r15, 0x170(%[vmcb])\n\t"
"mov guest_regs, %%r15\n\t" // rax
"mov %%r15, 0x1f8(%[vmcb])\n\t"
- LOAD_GPR_C
+ GUEST_SWITCH_GPRS_NORAX_ASM
"vmrun %[vmcb_gpa]\n\t"
- SAVE_GPR_C
+ GUEST_SWITCH_GPRS_NORAX_ASM
"mov 0x170(%[vmcb]), %%r15\n\t" // rflags
"mov %%r15, rflags\n\t"
"mov 0x1f8(%[vmcb]), %%r15\n\t" // rax
--
2.54.0.563.g4f69b47b94-goog
* [PATCH 2/8] KVM: selftests: Move GPR load/save definitions outside of nSVM code
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-05-18 20:25 ` [PATCH 1/8] KVM: selftests: Fix offsets in GPR switching for nSVM Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 3/8] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
In preparation for reusing the code for nVMX tests, move the definitions
for GPR switching to processor.h.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../selftests/kvm/include/x86/processor.h | 42 +++++++++++++++++++
.../testing/selftests/kvm/lib/x86/processor.c | 2 +
tools/testing/selftests/kvm/lib/x86/svm.c | 41 ------------------
3 files changed, 44 insertions(+), 41 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 1482f2b53a9c5..8e4eab84b91bc 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -397,6 +397,48 @@ struct gpr64_regs {
u64 r15;
};
+extern struct gpr64_regs guest_regs;
+
+#define DEFINE_ASM_GPR64_OFFSET(reg) \
+ asm(".equ GPR64_OFF_" #reg ", %c0" : : "i"(offsetof(struct gpr64_regs, reg)))
+
+DEFINE_ASM_GPR64_OFFSET(rbx);
+DEFINE_ASM_GPR64_OFFSET(rcx);
+DEFINE_ASM_GPR64_OFFSET(rdx);
+DEFINE_ASM_GPR64_OFFSET(rbp);
+DEFINE_ASM_GPR64_OFFSET(rsi);
+DEFINE_ASM_GPR64_OFFSET(rdi);
+DEFINE_ASM_GPR64_OFFSET(r8);
+DEFINE_ASM_GPR64_OFFSET(r9);
+DEFINE_ASM_GPR64_OFFSET(r10);
+DEFINE_ASM_GPR64_OFFSET(r11);
+DEFINE_ASM_GPR64_OFFSET(r12);
+DEFINE_ASM_GPR64_OFFSET(r13);
+DEFINE_ASM_GPR64_OFFSET(r14);
+DEFINE_ASM_GPR64_OFFSET(r15);
+
+#define GUEST_SWITCH_GPR_ASM(reg) \
+ "xchg %%" #reg ", guest_regs + GPR64_OFF_" #reg "\n\t"
+/*
+ * save/restore 64-bit general registers except rax, rip, rsp
+ * which are directly handed through the VMCB guest processor state
+ */
+#define GUEST_SWITCH_GPRS_NORAX_ASM \
+ GUEST_SWITCH_GPR_ASM(rbx) \
+ GUEST_SWITCH_GPR_ASM(rcx) \
+ GUEST_SWITCH_GPR_ASM(rdx) \
+ GUEST_SWITCH_GPR_ASM(rbp) \
+ GUEST_SWITCH_GPR_ASM(rsi) \
+ GUEST_SWITCH_GPR_ASM(rdi) \
+ GUEST_SWITCH_GPR_ASM(r8) \
+ GUEST_SWITCH_GPR_ASM(r9) \
+ GUEST_SWITCH_GPR_ASM(r10) \
+ GUEST_SWITCH_GPR_ASM(r11) \
+ GUEST_SWITCH_GPR_ASM(r12) \
+ GUEST_SWITCH_GPR_ASM(r13) \
+ GUEST_SWITCH_GPR_ASM(r14) \
+ GUEST_SWITCH_GPR_ASM(r15)
+
struct desc64 {
u16 limit0;
u16 base0;
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b51467d70f6e7..caefcd12df8d2 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -29,6 +29,8 @@ bool host_cpu_is_amd_compatible;
bool is_forced_emulation_enabled;
u64 guest_tsc_khz;
+struct gpr64_regs guest_regs;
+
const char *ex_str(int vector)
{
switch (vector) {
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 6a6926b3b9d7c..b4d1a00dbe27f 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -13,7 +13,6 @@
#define SEV_DEV_PATH "/dev/sev"
-struct gpr64_regs guest_regs;
u64 rflags;
/* Allocate memory regions for nested SVM tests.
@@ -131,46 +130,6 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
}
}
-#define DEFINE_ASM_GPR64_OFFSET(reg) \
- asm(".equ GPR64_OFF_" #reg ", %c0" : : "i"(offsetof(struct gpr64_regs, reg)))
-
-DEFINE_ASM_GPR64_OFFSET(rbx);
-DEFINE_ASM_GPR64_OFFSET(rcx);
-DEFINE_ASM_GPR64_OFFSET(rdx);
-DEFINE_ASM_GPR64_OFFSET(rbp);
-DEFINE_ASM_GPR64_OFFSET(rsi);
-DEFINE_ASM_GPR64_OFFSET(rdi);
-DEFINE_ASM_GPR64_OFFSET(r8);
-DEFINE_ASM_GPR64_OFFSET(r9);
-DEFINE_ASM_GPR64_OFFSET(r10);
-DEFINE_ASM_GPR64_OFFSET(r11);
-DEFINE_ASM_GPR64_OFFSET(r12);
-DEFINE_ASM_GPR64_OFFSET(r13);
-DEFINE_ASM_GPR64_OFFSET(r14);
-DEFINE_ASM_GPR64_OFFSET(r15);
-
-#define GUEST_SWITCH_GPR_ASM(reg) \
- "xchg %%" #reg ", guest_regs + GPR64_OFF_" #reg "\n\t"
-/*
- * save/restore 64-bit general registers except rax, rip, rsp
- * which are directly handed through the VMCB guest processor state
- */
-#define GUEST_SWITCH_GPRS_NORAX_ASM \
- GUEST_SWITCH_GPR_ASM(rbx) \
- GUEST_SWITCH_GPR_ASM(rcx) \
- GUEST_SWITCH_GPR_ASM(rdx) \
- GUEST_SWITCH_GPR_ASM(rbp) \
- GUEST_SWITCH_GPR_ASM(rsi) \
- GUEST_SWITCH_GPR_ASM(rdi) \
- GUEST_SWITCH_GPR_ASM(r8) \
- GUEST_SWITCH_GPR_ASM(r9) \
- GUEST_SWITCH_GPR_ASM(r10) \
- GUEST_SWITCH_GPR_ASM(r11) \
- GUEST_SWITCH_GPR_ASM(r12) \
- GUEST_SWITCH_GPR_ASM(r13) \
- GUEST_SWITCH_GPR_ASM(r14) \
- GUEST_SWITCH_GPR_ASM(r15)
-
/*
* selftests do not use interrupts so we dropped clgi/sti/cli/stgi
* for now. registers involved in GPRs switching are eventually
--
2.54.0.563.g4f69b47b94-goog
* [PATCH 3/8] KVM: selftests: Reuse GPR switching logic for nVMX
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-05-18 20:25 ` [PATCH 1/8] KVM: selftests: Fix offsets in GPR switching for nSVM Yosry Ahmed
2026-05-18 20:25 ` [PATCH 2/8] KVM: selftests: Move GPR load/save definitions outside of nSVM code Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 4/8] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK Yosry Ahmed
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
Reuse the GPR switching logic for nVMX by adding a wrapper that also
switches RAX, replacing the push/pop of a subset of the registers.
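As a rough model of why one exchange-based sequence can serve for both entry and exit, and how the RAX wrapper layers on top of the NORAX variant, consider the plain C sketch below. It is not the selftest code: the two-register struct and all function names are hypothetical, and the real macros emit xchg instructions against guest_regs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-register model; the real struct covers RAX..R15. */
struct regs_model {
	uint64_t rax;
	uint64_t rbx;
};

/* Exchange one live value with its save slot, like a single xchg. */
static void switch_one(uint64_t *live, uint64_t *slot)
{
	uint64_t tmp = *live;

	*live = *slot;
	*slot = tmp;
}

/* Counterpart of the NORAX variant: everything except RAX. */
void switch_gprs_norax(struct regs_model *live, struct regs_model *slot)
{
	switch_one(&live->rbx, &slot->rbx);
}

/* Wrapper that also switches RAX, then reuses the NORAX sequence. */
void switch_gprs(struct regs_model *live, struct regs_model *slot)
{
	switch_one(&live->rax, &slot->rax);
	switch_gprs_norax(live, slot);
}

/* Returns 1 if two switches around a "guest run" restore host values. */
int host_regs_survive_round_trip(void)
{
	struct regs_model live = { .rax = 1, .rbx = 2 };	/* host values */
	struct regs_model slot = { .rax = 10, .rbx = 20 };	/* guest values */

	switch_gprs(&live, &slot);	/* entry: guest values become live */
	assert(live.rax == 10 && live.rbx == 20);
	live.rbx = 21;			/* the guest modifies a register */
	switch_gprs(&live, &slot);	/* exit: host values are live again */

	return live.rax == 1 && live.rbx == 2 && slot.rbx == 21;
}
```

Because an exchange is its own inverse, the second switch both restores the host values and captures whatever the guest left behind, which is the property the push/pop sequence did not provide for the guest side.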
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../selftests/kvm/include/x86/processor.h | 5 ++
tools/testing/selftests/kvm/include/x86/vmx.h | 46 +++++--------------
2 files changed, 17 insertions(+), 34 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 8e4eab84b91bc..ca2ec92490f7c 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -402,6 +402,7 @@ extern struct gpr64_regs guest_regs;
#define DEFINE_ASM_GPR64_OFFSET(reg) \
asm(".equ GPR64_OFF_" #reg ", %c0" : : "i"(offsetof(struct gpr64_regs, reg)))
+DEFINE_ASM_GPR64_OFFSET(rax);
DEFINE_ASM_GPR64_OFFSET(rbx);
DEFINE_ASM_GPR64_OFFSET(rcx);
DEFINE_ASM_GPR64_OFFSET(rdx);
@@ -439,6 +440,10 @@ DEFINE_ASM_GPR64_OFFSET(r15);
GUEST_SWITCH_GPR_ASM(r14) \
GUEST_SWITCH_GPR_ASM(r15)
+#define GUEST_SWITCH_GPRS_ASM \
+ GUEST_SWITCH_GPR_ASM(rax) \
+ GUEST_SWITCH_GPRS_NORAX_ASM
+
struct desc64 {
u16 limit0;
u16 base0;
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 90fffaf915958..a7f6f0c9b6b9d 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -363,9 +363,6 @@ static inline u64 vmptrstz(void)
return value;
}
-/*
- * No guest state (e.g. GPRs) is established by this vmlaunch.
- */
static inline int vmlaunch(void)
{
int ret;
@@ -373,34 +370,23 @@ static inline int vmlaunch(void)
if (enable_evmcs)
return evmcs_vmlaunch();
- __asm__ __volatile__("push %%rbp;"
- "push %%rcx;"
- "push %%rdx;"
- "push %%rsi;"
- "push %%rdi;"
- "push $0;"
+ __asm__ __volatile__("push $0;"
"vmwrite %%rsp, %[host_rsp];"
"lea 1f(%%rip), %%rax;"
"vmwrite %%rax, %[host_rip];"
+ GUEST_SWITCH_GPRS_ASM
"vmlaunch;"
"incq (%%rsp);"
- "1: pop %%rax;"
- "pop %%rdi;"
- "pop %%rsi;"
- "pop %%rdx;"
- "pop %%rcx;"
- "pop %%rbp;"
+ "1: ;"
+ GUEST_SWITCH_GPRS_ASM
+ "pop %%rax;"
: [ret]"=&a"(ret)
: [host_rsp]"r"((u64)HOST_RSP),
[host_rip]"r"((u64)HOST_RIP)
- : "memory", "cc", "rbx", "r8", "r9", "r10",
- "r11", "r12", "r13", "r14", "r15");
+ : "memory", "cc");
return ret;
}
-/*
- * No guest state (e.g. GPRs) is established by this vmresume.
- */
static inline int vmresume(void)
{
int ret;
@@ -408,28 +394,20 @@ static inline int vmresume(void)
if (enable_evmcs)
return evmcs_vmresume();
- __asm__ __volatile__("push %%rbp;"
- "push %%rcx;"
- "push %%rdx;"
- "push %%rsi;"
- "push %%rdi;"
- "push $0;"
+ __asm__ __volatile__("push $0;"
"vmwrite %%rsp, %[host_rsp];"
"lea 1f(%%rip), %%rax;"
"vmwrite %%rax, %[host_rip];"
+ GUEST_SWITCH_GPRS_ASM
"vmresume;"
"incq (%%rsp);"
- "1: pop %%rax;"
- "pop %%rdi;"
- "pop %%rsi;"
- "pop %%rdx;"
- "pop %%rcx;"
- "pop %%rbp;"
+ "1: ;"
+ GUEST_SWITCH_GPRS_ASM
+ "pop %%rax;"
: [ret]"=&a"(ret)
: [host_rsp]"r"((u64)HOST_RSP),
[host_rip]"r"((u64)HOST_RIP)
- : "memory", "cc", "rbx", "r8", "r9", "r10",
- "r11", "r12", "r13", "r14", "r15");
+ : "memory", "cc");
return ret;
}
--
2.54.0.563.g4f69b47b94-goog
* [PATCH 4/8] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (2 preceding siblings ...)
2026-05-18 20:25 ` [PATCH 3/8] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 5/8] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
Now that the nVMX test code preserves GPRs across nested VM-Exits
(specifically RBP, RDX, and RDI among others), drop the ucall-specific
hack to avoid clobbering these registers.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/lib/x86/ucall.c | 32 ++-------------------
1 file changed, 2 insertions(+), 30 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/ucall.c b/tools/testing/selftests/kvm/lib/x86/ucall.c
index e7dd5791959ba..38050c60a0670 100644
--- a/tools/testing/selftests/kvm/lib/x86/ucall.c
+++ b/tools/testing/selftests/kvm/lib/x86/ucall.c
@@ -10,36 +10,8 @@
void ucall_arch_do_ucall(gva_t uc)
{
- /*
- * FIXME: Revert this hack (the entire commit that added it) once nVMX
- * preserves L2 GPRs across a nested VM-Exit. If a ucall from L2, e.g.
- * to do a GUEST_SYNC(), lands the vCPU in L1, any and all GPRs can be
- * clobbered by L1. Save and restore non-volatile GPRs (clobbering RBP
- * in particular is problematic) along with RDX and RDI (which are
- * inputs), and clobber volatile GPRs. *sigh*
- */
-#define HORRIFIC_L2_UCALL_CLOBBER_HACK \
- "rcx", "rsi", "r8", "r9", "r10", "r11"
-
- asm volatile("push %%rbp\n\t"
- "push %%r15\n\t"
- "push %%r14\n\t"
- "push %%r13\n\t"
- "push %%r12\n\t"
- "push %%rbx\n\t"
- "push %%rdx\n\t"
- "push %%rdi\n\t"
- "in %[port], %%al\n\t"
- "pop %%rdi\n\t"
- "pop %%rdx\n\t"
- "pop %%rbx\n\t"
- "pop %%r12\n\t"
- "pop %%r13\n\t"
- "pop %%r14\n\t"
- "pop %%r15\n\t"
- "pop %%rbp\n\t"
- : : [port] "d" (UCALL_PIO_PORT), "D" (uc) : "rax", "memory",
- HORRIFIC_L2_UCALL_CLOBBER_HACK);
+ asm volatile("in %[port], %%al"
+ : : [port] "d" (UCALL_PIO_PORT), "D" (uc) : "rax", "memory");
}
void *ucall_arch_get_ucall(struct kvm_vcpu *vcpu)
--
2.54.0.563.g4f69b47b94-goog
* [PATCH 5/8] KVM: selftests: Add basic stress test for save+restore and #PF handling
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (3 preceding siblings ...)
2026-05-18 20:25 ` [PATCH 4/8] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-28 22:12 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 6/8] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
` (4 subsequent siblings)
9 siblings, 1 reply; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
Add a basic stress test for handling #PFs in a guest while the host is
doing save+restore cycles. The guest periodically accesses non-present
memory causing a #PF, and the #PF handler walks the page tables and
updates the PTE to be present, like a proper #PF handler.
After every access (and #PF), the guest triggers a sync and the test
performs save+restore of the VM. This is not very meaningful, as
save+restore is performed only after the access and #PF handling
complete, but subsequent patches will change that.
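For readers less familiar with the page table walk done by the handler, here is a minimal standalone sketch of the index arithmetic and the present-bit toggle. It assumes standard x86-64 paging (4K pages, 9 index bits per level, present in bit 0). The names are hypothetical, and the actual test takes the present mask from the VM's MMU description rather than hardcoding it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PTRS_PER_TABLE	512
#define SKETCH_PRESENT_BIT	(1ULL << 0)	/* assumption: bit 0 = present */

/* Level 1 is the 4K level; each level above consumes 9 more address bits. */
unsigned int table_index(uint64_t vaddr, int level)
{
	unsigned int shift;

	assert(level >= 1 && level <= 5);
	shift = SKETCH_PAGE_SHIFT + 9 * (level - 1);
	return (vaddr >> shift) & (SKETCH_PTRS_PER_TABLE - 1);
}

/* What the #PF handler does to the faulting PTE. */
uint64_t pte_mark_present(uint64_t pte)
{
	return pte | SKETCH_PRESENT_BIT;
}

/* What the guest loop does afterwards so the next access faults again. */
uint64_t pte_mark_not_present(uint64_t pte)
{
	return pte & ~SKETCH_PRESENT_BIT;
}
```

For example, the base address 0xc0000000 selects entry 3 at the 1G level and entry 0 at the 2M and 4K levels, and each following test page bumps the 4K index by one.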
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 15 ++
.../testing/selftests/kvm/lib/x86/processor.c | 11 ++
.../kvm/x86/stress_save_restore_pf_test.c | 171 ++++++++++++++++++
4 files changed, 198 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 82fa943b95038..9327284dd5bbf 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -110,6 +110,7 @@ TEST_GEN_PROGS_x86 += x86/set_sregs_test
TEST_GEN_PROGS_x86 += x86/smaller_maxphyaddr_emulation_test
TEST_GEN_PROGS_x86 += x86/smm_test
TEST_GEN_PROGS_x86 += x86/state_test
+TEST_GEN_PROGS_x86 += x86/stress_save_restore_pf_test
TEST_GEN_PROGS_x86 += x86/vmx_preemption_timer_test
TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index ca2ec92490f7c..41bffe031eb88 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -610,6 +610,15 @@ static inline void set_cr0(u64 val)
__asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory");
}
+static inline u64 get_cr2(void)
+{
+ u64 cr2;
+
+ __asm__ __volatile__("mov %%cr2, %[cr2]"
+ : /* output */ [cr2]"=r"(cr2));
+ return cr2;
+}
+
static inline u64 get_cr3(void)
{
u64 cr3;
@@ -905,6 +914,11 @@ static inline void write_sse_reg(int reg, const sse128_t *data)
}
}
+static inline void invlpg(u64 addr)
+{
+ __asm__ __volatile__("invlpg (%0)" : : "r"(addr) : "memory");
+}
+
static inline void cpu_relax(void)
{
asm volatile("rep; nop" ::: "memory");
@@ -1557,6 +1571,7 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
gpa_t gpa, int level);
void virt_map_level(struct kvm_vm *vm, gva_t gva, gpa_t gpa,
u64 nr_bytes, int level);
+void virt_map_page_tables(struct kvm_vm *vm);
void vm_enable_tdp(struct kvm_vm *vm);
bool kvm_cpu_has_tdp(void);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index caefcd12df8d2..6708fa8b6a304 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -343,6 +343,17 @@ void virt_map_level(struct kvm_vm *vm, gva_t gva, gpa_t gpa,
}
}
+void virt_map_page_tables(struct kvm_vm *vm)
+{
+ gpa_t gpa = KVM_GUEST_PAGE_TABLE_MIN_PADDR;
+ struct userspace_mem_region *region;
+ u64 pt_size;
+
+ region = memslot2region(vm, vm->memslots[MEM_REGION_PT]);
+ pt_size = region->region.guest_phys_addr + region->region.memory_size - gpa;
+ virt_map(vm, gpa, gpa, pt_size / getpagesize());
+}
+
static bool vm_is_target_pte(struct kvm_mmu *mmu, u64 *pte,
int *level, int current_level)
{
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
new file mode 100644
index 0000000000000..12da74b4f725c
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define NR_ITERATIONS 500
+
+#define CURSOR_UP "\033[A"
+#define PRINT_ITER(s, x) \
+({ \
+ printf("%s\r%s%d\n", (x ? CURSOR_UP : ""), s, x); \
+ fflush(stdout); \
+})
+
+#define TEST_MEM_BASE 0xc0000000ULL
+#define NR_TEST_ADDRS 512
+#define PATTERN 0xabcdefabcdefabcdULL
+
+#define PTRS_PER_PTE 512
+#define PXD_INDEX(vaddr, level) (((vaddr) >> PG_LEVEL_SHIFT(level)) & (PTRS_PER_PTE - 1))
+
+static u64 pte_present_mask;
+static u64 pte_huge_mask;
+
+static u64 expected_vaddr;
+static u64 guest_accesses;
+
+static u64 *guest_get_pte(u64 vaddr)
+{
+ u64 *pgd, *p4d, *pud, *pmd, *pte;
+ u64 pgde, p4de, pude, pmde;
+ bool la57;
+
+ la57 = !!(get_cr4() & X86_CR4_LA57);
+ pgd = (u64 *)(get_cr3() & PHYSICAL_PAGE_MASK);
+
+ if (la57) {
+ pgde = pgd[PXD_INDEX(vaddr, PG_LEVEL_256T)];
+ GUEST_ASSERT(pgde & pte_present_mask);
+ p4d = (u64 *)PTE_GET_PA(pgde);
+ p4de = p4d[PXD_INDEX(vaddr, PG_LEVEL_512G)];
+ } else {
+ pgde = pgd[PXD_INDEX(vaddr, PG_LEVEL_512G)];
+ p4de = pgde;
+ }
+
+ GUEST_ASSERT(p4de & pte_present_mask);
+ pud = (u64 *)PTE_GET_PA(p4de);
+
+ pude = pud[PXD_INDEX(vaddr, PG_LEVEL_1G)];
+ GUEST_ASSERT(pude & pte_present_mask);
+ GUEST_ASSERT(!(pude & pte_huge_mask));
+ pmd = (u64 *)PTE_GET_PA(pude);
+
+ pmde = pmd[PXD_INDEX(vaddr, PG_LEVEL_2M)];
+ GUEST_ASSERT(pmde & pte_present_mask);
+ GUEST_ASSERT(!(pmde & pte_huge_mask));
+ pte = (u64 *)PTE_GET_PA(pmde);
+
+ return &pte[PXD_INDEX(vaddr, PG_LEVEL_4K)];
+}
+
+static void guest_pf_handler(struct ex_regs *regs)
+{
+ u64 fault_addr;
+ u64 *ptep;
+
+ fault_addr = get_cr2();
+ GUEST_ASSERT_EQ(fault_addr, READ_ONCE(expected_vaddr));
+
+ ptep = guest_get_pte(fault_addr);
+ GUEST_ASSERT(ptep);
+ GUEST_ASSERT(!(*ptep & pte_present_mask));
+
+ *ptep |= pte_present_mask;
+ invlpg(fault_addr);
+}
+
+static void guest_access_memory(void *arg)
+{
+ u64 vaddr, val;
+
+ for (;; guest_accesses++) {
+ vaddr = TEST_MEM_BASE + (guest_accesses % NR_TEST_ADDRS) * PAGE_SIZE;
+ WRITE_ONCE(expected_vaddr, vaddr);
+
+ /* Read to trigger #PF */
+ val = READ_ONCE(*(u64 *)vaddr);
+ GUEST_ASSERT_EQ(val, PATTERN);
+
+ /* Clear the present bit again so it faults next time */
+ *guest_get_pte(vaddr) &= ~pte_present_mask;
+ invlpg(vaddr);
+
+ GUEST_SYNC(guest_accesses);
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_x86_state *state;
+ struct kvm_vcpu *vcpu;
+ int r, i, count = 0;
+ struct kvm_vm *vm;
+ struct ucall uc;
+ gva_t gva;
+ gpa_t gpa;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_access_memory);
+ vm_install_exception_handler(vm, PF_VECTOR, guest_pf_handler);
+
+ pte_present_mask = PTE_PRESENT_MASK(&vm->mmu);
+ pte_huge_mask = PTE_HUGE_MASK(&vm->mmu);
+ sync_global_to_guest(vm, pte_present_mask);
+ sync_global_to_guest(vm, pte_huge_mask);
+
+ /* Allocate a page and write the pattern to it */
+ gva = vm_alloc_page(vm);
+ *(u64 *)addr_gva2hva(vm, gva) = PATTERN;
+ gpa = addr_gva2gpa(vm, gva);
+
+ /*
+ * Map all virtual addresses to the pattern page and clear the present
+ * bit such that guest accesses will cause a #PF.
+ */
+ for (i = 0; i < NR_TEST_ADDRS; i++) {
+ gva = TEST_MEM_BASE + i * getpagesize();
+ virt_pg_map(vm, gva, gpa);
+ *vm_get_pte(vm, gva) &= ~pte_present_mask;
+ }
+
+ /* Map the page tables so that the guest #PF handler can walk them */
+ virt_map_page_tables(vm);
+
+ while (count++ < NR_ITERATIONS) {
+ r = __vcpu_run(vcpu);
+ TEST_ASSERT(!r, "vcpu_run failed");
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ get_ucall(vcpu, &uc);
+ if (uc.cmd == UCALL_ABORT) {
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ }
+ TEST_ASSERT_EQ(uc.cmd, UCALL_SYNC);
+ TEST_ASSERT_EQ(uc.args[1], count - 1);
+
+ state = vcpu_save_state(vcpu);
+
+ kvm_vm_release(vm);
+ vcpu = vm_recreate_with_one_vcpu(vm);
+ vcpu_load_state(vcpu, state);
+ kvm_x86_state_cleanup(state);
+
+ PRINT_ITER("Save+restore iterations: ", count);
+ }
+
+ sync_global_from_guest(vm, guest_accesses);
+ pr_info("Guest page accesses: %lu\n", guest_accesses);
+
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.54.0.563.g4f69b47b94-goog
* [PATCH 6/8] KVM: selftests: Trigger save+restore randomly in the #PF stress test
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (4 preceding siblings ...)
2026-05-18 20:25 ` [PATCH 5/8] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 7/8] KVM: selftests: Support running stress save+restore and #PF test in L2 Yosry Ahmed
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
Instead of an explicit GUEST_SYNC() after each access+#PF, run another
thread that keeps sending SIGUSR1 to the vCPU thread, essentially
triggering exits to userspace and save+restore at random points in guest
execution. This makes the test a lot more meaningful as it opens the
door to exercising race conditions between #PF handling in the guest
and save+restore in the host.
SIGUSR1 is ignored using SIG_IGN outside of __vcpu_run() to avoid
interrupting other ioctls/syscalls performed by the test.
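The listen/ignore toggle can be sketched in isolation as follows. This is a simplified, single-threaded illustration: it uses raise() in place of the second thread and pthread_kill(), it counts deliveries instead of interrupting KVM_RUN, and all helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t nr_handled;

/* Stand-in for the empty handler; counts deliveries for demonstration. */
static void count_signal(int signo)
{
	(void)signo;
	nr_handled++;
}

/* Install either a real handler or SIG_IGN for SIGUSR1. */
static void sigusr1_set_disposition(void (*handler)(int))
{
	struct sigaction sa;
	int r;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = handler;
	r = sigaction(SIGUSR1, &sa, NULL);
	assert(r == 0);
	(void)r;
}

/* Signal sent while a handler is installed: it is delivered. */
int handled_while_listening(void)
{
	nr_handled = 0;
	sigusr1_set_disposition(count_signal);
	raise(SIGUSR1);
	sigusr1_set_disposition(SIG_IGN);
	return nr_handled;
}

/* Signal sent while ignored: it is discarded and nothing runs. */
int handled_while_ignoring(void)
{
	nr_handled = 0;
	sigusr1_set_disposition(SIG_IGN);
	raise(SIGUSR1);
	return nr_handled;
}
```

The point of having a handler at all, even an empty one, is that a delivered signal can make a blocking call return with EINTR, whereas an ignored signal is dropped without interrupting anything.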
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../kvm/x86/stress_save_restore_pf_test.c | 56 ++++++++++++++++---
1 file changed, 48 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
index 12da74b4f725c..60a013e0f14fb 100644
--- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -5,6 +5,8 @@
#include <errno.h>
#include <sys/types.h>
#include <time.h>
+#include <pthread.h>
+#include <signal.h>
#include <unistd.h>
#include "test_util.h"
@@ -99,14 +101,40 @@ static void guest_access_memory(void *arg)
/* Clear the present bit again so it faults next time */
*guest_get_pte(vaddr) &= ~pte_present_mask;
invlpg(vaddr);
+ }
+}
+
+static void *sigusr_thread_fn(void *arg)
+{
+ pthread_t vcpu_thread = (pthread_t)arg;
- GUEST_SYNC(guest_accesses);
+ for (;;) {
+ pthread_testcancel();
+ pthread_kill(vcpu_thread, SIGUSR1);
+ usleep(100);
}
+ return NULL;
+}
+
+static void dummy_signal_handler(int signo) {}
+static struct sigaction sa;
+
+static void vcpu_sigusr_listen(void)
+{
+ sa.sa_handler = dummy_signal_handler;
+ sigaction(SIGUSR1, &sa, NULL);
+}
+
+static void vcpu_sigusr_ignore(void)
+{
+ sa.sa_handler = SIG_IGN;
+ sigaction(SIGUSR1, &sa, NULL);
}
int main(int argc, char *argv[])
{
struct kvm_x86_state *state;
+ pthread_t sigusr_thread;
struct kvm_vcpu *vcpu;
int r, i, count = 0;
struct kvm_vm *vm;
@@ -140,18 +168,28 @@ int main(int argc, char *argv[])
/* Map the page tables so that the guest #PF handler can walk them */
virt_map_page_tables(vm);
+ /* Initialize the thread sending SIGUSR and install the handler */
+ pthread_create(&sigusr_thread, NULL, sigusr_thread_fn,
+ (void *)pthread_self());
+
while (count++ < NR_ITERATIONS) {
+ /*
+ * Only handle SIGUSR while the vCPU is running, otherwise
+ * ignore it to avoid interrupting other ioctls/syscalls.
+ */
+ vcpu_sigusr_listen();
r = __vcpu_run(vcpu);
- TEST_ASSERT(!r, "vcpu_run failed");
- TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
-
- get_ucall(vcpu, &uc);
- if (uc.cmd == UCALL_ABORT) {
+ if (r == -1)
+ TEST_ASSERT_EQ(errno, EINTR);
+ vcpu_sigusr_ignore();
+
+ /* The guest only exits due to a signal or failed assertion */
+ if (!r) {
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_ABORT);
REPORT_GUEST_ASSERT(uc);
break;
}
- TEST_ASSERT_EQ(uc.cmd, UCALL_SYNC);
- TEST_ASSERT_EQ(uc.args[1], count - 1);
state = vcpu_save_state(vcpu);
@@ -166,6 +204,8 @@ int main(int argc, char *argv[])
sync_global_from_guest(vm, guest_accesses);
pr_info("Guest page accesses: %lu\n", guest_accesses);
+ pthread_cancel(sigusr_thread);
+ pthread_join(sigusr_thread, NULL);
kvm_vm_free(vm);
return 0;
}
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 7/8] KVM: selftests: Support running stress save+restore and #PF test in L2
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (5 preceding siblings ...)
2026-05-18 20:25 ` [PATCH 6/8] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 8/8] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test Yosry Ahmed
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
Extend the stress test to allow running the access+#PF code in L2
instead of L1 by adding proper L1 guest code to bootstrap L2. The test
runs in nested mode if a '-n' flag is added.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../kvm/x86/stress_save_restore_pf_test.c | 75 ++++++++++++++++++-
1 file changed, 73 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
index 60a013e0f14fb..ec3d36d6e4846 100644
--- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -8,10 +8,13 @@
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
+#include <getopt.h>
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
+#include "svm_util.h"
+#include "vmx.h"
#define NR_ITERATIONS 500
@@ -26,6 +29,8 @@
#define NR_TEST_ADDRS 512
#define PATTERN 0xabcdefabcdefabcdULL
+#define L2_GUEST_STACK_SIZE 64
+
#define PTRS_PER_PTE 512
#define PXD_INDEX(vaddr, level) (((vaddr) >> PG_LEVEL_SHIFT(level)) & (PTRS_PER_PTE - 1))
@@ -104,6 +109,41 @@ static void guest_access_memory(void *arg)
}
}
+static void l1_svm_code(struct svm_test_data *svm)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+ generic_svm_setup(svm, guest_access_memory, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT(false);
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+ GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+ GUEST_ASSERT(load_vmcs(vmx));
+ prepare_vmcs(vmx, guest_access_memory, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ /* Ignore any #PF */
+ GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, BIT(PF_VECTOR)));
+ GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0));
+ GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, -1));
+
+ GUEST_ASSERT(!vmlaunch());
+ GUEST_ASSERT(false);
+}
+
+static void l1_guest_code(void *test_data)
+{
+ if (this_cpu_has(X86_FEATURE_SVM))
+ l1_svm_code(test_data);
+ else
+ l1_vmx_code(test_data);
+}
+
static void *sigusr_thread_fn(void *arg)
{
pthread_t vcpu_thread = (pthread_t)arg;
@@ -131,6 +171,25 @@ static void vcpu_sigusr_ignore(void)
sigaction(SIGUSR1, &sa, NULL);
}
+static bool parse_args_nested(int argc, char *argv[])
+{
+ bool nested = false;
+ int opt;
+
+ while ((opt = getopt(argc, argv, "n")) != -1) {
+ switch (opt) {
+ case 'n':
+ nested = true;
+ break;
+ default:
+ printf("Usage: %s [-n]\n", argv[0]);
+ exit(1);
+ }
+ }
+
+ return nested;
+}
+
int main(int argc, char *argv[])
{
struct kvm_x86_state *state;
@@ -139,12 +198,24 @@ int main(int argc, char *argv[])
int r, i, count = 0;
struct kvm_vm *vm;
struct ucall uc;
+ bool nested;
gva_t gva;
gpa_t gpa;
- vm = vm_create_with_one_vcpu(&vcpu, guest_access_memory);
+ nested = parse_args_nested(argc, argv);
+
+ vm = vm_create_with_one_vcpu(&vcpu, nested ? l1_guest_code : guest_access_memory);
vm_install_exception_handler(vm, PF_VECTOR, guest_pf_handler);
+ if (nested) {
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
+ if (kvm_cpu_has(X86_FEATURE_SVM))
+ vcpu_alloc_svm(vm, &gva);
+ else
+ vcpu_alloc_vmx(vm, &gva);
+ vcpu_args_set(vcpu, 1, gva);
+ }
+
pte_present_mask = PTE_PRESENT_MASK(&vm->mmu);
pte_huge_mask = PTE_HUGE_MASK(&vm->mmu);
sync_global_to_guest(vm, pte_present_mask);
@@ -202,7 +273,7 @@ int main(int argc, char *argv[])
}
sync_global_from_guest(vm, guest_accesses);
- pr_info("Guest page accesses: %lu\n", guest_accesses);
+ pr_info("Guest page accesses%s: %lu\n", nested ? " (from L2)" : "", guest_accesses);
pthread_cancel(sigusr_thread);
pthread_join(sigusr_thread, NULL);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 8/8] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (6 preceding siblings ...)
2026-05-18 20:25 ` [PATCH 7/8] KVM: selftests: Support running stress save+restore and #PF test in L2 Yosry Ahmed
@ 2026-05-18 20:25 ` Yosry Ahmed
2026-05-18 20:40 ` [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-05-28 19:26 ` Yosry Ahmed
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Yosry Ahmed
From: Yosry Ahmed <yosryahmed@google.com>
Extend the testing coverage in L2 by injecting a #UD into the vCPU every
other iteration during restore, and intercepting #UD from L1,
essentially forcing an L2 -> L1 VM-Exit directly after save+restore.
With this change, the test reliably reproduces the CR2 bug fixed by
commit 5c247d08bc81 ("KVM: nSVM: Use vcpu->arch.cr2 when updating vmcb12
on nested #VMEXIT") -- at least on Milan, Genoa, and Turin CPUs.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../kvm/x86/stress_save_restore_pf_test.c | 48 +++++++++++++++++--
1 file changed, 43 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
index ec3d36d6e4846..e977d9786e392 100644
--- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -115,8 +115,12 @@ static void l1_svm_code(struct svm_test_data *svm)
generic_svm_setup(svm, guest_access_memory, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
- run_guest(svm->vmcb, svm->vmcb_gpa);
- GUEST_ASSERT(false);
+ svm->vmcb->control.intercept_exceptions |= BIT(UD_VECTOR);
+
+ while (1) {
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, (SVM_EXIT_EXCP_BASE + UD_VECTOR));
+ }
}
static void l1_vmx_code(struct vmx_pages *vmx)
@@ -127,13 +131,17 @@ static void l1_vmx_code(struct vmx_pages *vmx)
GUEST_ASSERT(load_vmcs(vmx));
prepare_vmcs(vmx, guest_access_memory, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
- /* Ignore any #PF */
- GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, BIT(PF_VECTOR)));
+ /* Intercept UD, ignore any #PF */
+ GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, BIT(UD_VECTOR) | BIT(PF_VECTOR)));
GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0));
GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, -1));
GUEST_ASSERT(!vmlaunch());
- GUEST_ASSERT(false);
+ while (1) {
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_INFO) & 0xff, UD_VECTOR);
+ GUEST_ASSERT(!vmresume());
+ }
}
static void l1_guest_code(void *test_data)
@@ -171,6 +179,23 @@ static void vcpu_sigusr_ignore(void)
sigaction(SIGUSR1, &sa, NULL);
}
+static bool vcpu_state_is_guest_mode(struct kvm_x86_state *state)
+{
+ return !!(state->nested.flags & KVM_STATE_NESTED_GUEST_MODE);
+}
+
+static void vcpu_state_inject_ud(struct kvm_x86_state *state)
+{
+ if (state->events.exception.pending || state->events.exception.injected)
+ return;
+
+ state->events.flags |= KVM_VCPUEVENT_VALID_PAYLOAD;
+ state->events.exception.pending = true;
+ state->events.exception.injected = false;
+ state->events.exception.nr = UD_VECTOR;
+ state->events.exception.has_error_code = false;
+}
+
static bool parse_args_nested(int argc, char *argv[])
{
bool nested = false;
@@ -202,10 +227,13 @@ int main(int argc, char *argv[])
gva_t gva;
gpa_t gpa;
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXCEPTION_PAYLOAD));
+
nested = parse_args_nested(argc, argv);
vm = vm_create_with_one_vcpu(&vcpu, nested ? l1_guest_code : guest_access_memory);
vm_install_exception_handler(vm, PF_VECTOR, guest_pf_handler);
+ vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
if (nested) {
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
@@ -264,8 +292,18 @@ int main(int argc, char *argv[])
state = vcpu_save_state(vcpu);
+ /*
+ * If the vCPU is in guest mode, inject a #UD to trigger an
+ * L2->L1 VM-Exit every other iteration.
+ */
+ if (vcpu_state_is_guest_mode(state) && count % 2 == 0) {
+ TEST_ASSERT(nested, "Unexpected guest mode");
+ vcpu_state_inject_ud(state);
+ }
+
kvm_vm_release(vm);
vcpu = vm_recreate_with_one_vcpu(vm);
+ vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
vcpu_load_state(vcpu, state);
kvm_x86_state_cleanup(state);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested)
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (7 preceding siblings ...)
2026-05-18 20:25 ` [PATCH 8/8] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test Yosry Ahmed
@ 2026-05-18 20:40 ` Yosry Ahmed
2026-05-28 19:26 ` Yosry Ahmed
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-18 20:40 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Mon, May 18, 2026 at 08:25:06PM +0000, Yosry Ahmed wrote:
> Add a stress test for save+restore while the guest is triggering and
> handling #PFs, in both L1 and L2. The goal was to create a generic
> selftest that would catch bugs like the one fixed by 5c247d08bc81 ("KVM:
> nSVM: Use vcpu->arch.cr2 when updating vmcb12 on nested #VMEXIT"),
> instead of relying on high-level testing (e.g. building GCC in L2) to
> catch it.
>
> The test tries to be as generic as possible by triggering #PFs in a
> guest and installing a proper #PF handler, while the host is
> continuously doing save+restore cycles. Exiting to userspace is randomly
> triggered by a second thread that constantly signals the vCPU thread.
>
> Patches (1-4) are prep patches, fixing GPR switching for nSVM and
> generalizing it to cover nVMX, which is needed for the test to run
> properly with nVMX. Patch 4 removes HORRIFIC_L2_UCALL_CLOBBER_HACK, as
> it is no longer needed. While this series does not have the "complete"
> fix added by commit 6783ca4105a7 ("KVM: selftests: Add a shameful hack
> to preserve/clobber GPRs across ucall"), it's a good step in the right
> direction.
>
> Patches (5-8) add the actual test. The test is first introduced as a
> simple (read: dummy) stress test that just explicitly syncs to userspace
> after each #PF handling to do save+restore, then gradually evolves to
> add the random signaling and nested support. After the last patch, the
> test reliably reproduces the CR2 bug.
>
> This series conflicts with reworking L2 stack allocation in [1], but the
> conflict should be trivial to fix regardless of which series lands
> first.
>
> [1]https://lore.kernel.org/kvm/20260506015733.1671124-1-yosry@kernel.org/
Oh I forgot, all the patches should have:
Assisted-by: gemini/gemini-3.1-pro
>
> Yosry Ahmed (8):
> KVM: selftests: Fix offsets in GPR switching for nSVM
> KVM: selftests: Move GPR load/save definitions outside of nSVM code
> KVM: selftests: Reuse GPR switching logic for nVMX
> KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK
> KVM: selftests: Add basic stress test for save+restore and #PF
> handling
> KVM: selftests: Trigger save+restore randomly in the #PF stress test
> KVM: selftests: Support running stress save+restore and #PF test in L2
> KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test
>
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../selftests/kvm/include/x86/processor.h | 65 +++-
> tools/testing/selftests/kvm/include/x86/vmx.h | 46 +--
> .../testing/selftests/kvm/lib/x86/processor.c | 13 +
> tools/testing/selftests/kvm/lib/x86/svm.c | 29 +-
> tools/testing/selftests/kvm/lib/x86/ucall.c | 32 +-
> .../kvm/x86/stress_save_restore_pf_test.c | 320 ++++++++++++++++++
> 7 files changed, 414 insertions(+), 92 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
>
>
> base-commit: a9512a611bd030088f13477258d1f8103cceaa40
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested)
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (8 preceding siblings ...)
2026-05-18 20:40 ` [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
@ 2026-05-28 19:26 ` Yosry Ahmed
9 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-28 19:26 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Mon, May 18, 2026 at 08:25:06PM +0000, Yosry Ahmed wrote:
> Add a stress test for save+restore while the guest is triggering and
> handling #PFs, in both L1 and L2. The goal was to create a generic
> selftest that would catch bugs like the one fixed by 5c247d08bc81 ("KVM:
> nSVM: Use vcpu->arch.cr2 when updating vmcb12 on nested #VMEXIT"),
> instead of relying on high-level testing (e.g. building GCC in L2) to
> catch it.
>
> The test tries to be as generic as possible by triggering #PFs in a
> guest and installing a proper #PF handler, while the host is
> continuously doing save+restore cycles. Exiting to userspace is randomly
> triggered by a second thread that constantly signals the vCPU thread.
>
> Patches (1-4) are prep patches, fixing GPR switching for nSVM and
> generalizing it to cover nVMX, which is needed for the test to run
> properly with nVMX. Patch 4 removes HORRIFIC_L2_UCALL_CLOBBER_HACK, as
> it is no longer needed. While this series does not have the "complete"
> fix added by commit 6783ca4105a7 ("KVM: selftests: Add a shameful hack
> to preserve/clobber GPRs across ucall"), it's a good step in the right
> direction.
>
> Patches (5-8) add the actual test. The test is first introduced as a
> simple (read: dummy) stress test that just explicitly syncs to userspace
> after each #PF handling to do save+restore, then gradually evolves to
> add the random signaling and nested support. After the last patch, the
> test reliably reproduces the CR2 bug.
>
> This series conflicts with reworking L2 stack allocation in [1], but the
> conflict should be trivial to fix regardless of which series lands
> first.
>
> [1]https://lore.kernel.org/kvm/20260506015733.1671124-1-yosry@kernel.org/
>
> Yosry Ahmed (8):
> KVM: selftests: Fix offsets in GPR switching for nSVM
This series will need a new version with a reworked patch 1, potentially
after some TDX selftests infrastructure lands. The .equ approach to
defining offsets doesn't always work. See details here:
https://lore.kernel.org/kvm/ahiVwxm2kD0TVqxj@google.com/
> KVM: selftests: Move GPR load/save definitions outside of nSVM code
> KVM: selftests: Reuse GPR switching logic for nVMX
> KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK
> KVM: selftests: Add basic stress test for save+restore and #PF
> handling
> KVM: selftests: Trigger save+restore randomly in the #PF stress test
> KVM: selftests: Support running stress save+restore and #PF test in L2
> KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test
>
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../selftests/kvm/include/x86/processor.h | 65 +++-
> tools/testing/selftests/kvm/include/x86/vmx.h | 46 +--
> .../testing/selftests/kvm/lib/x86/processor.c | 13 +
> tools/testing/selftests/kvm/lib/x86/svm.c | 29 +-
> tools/testing/selftests/kvm/lib/x86/ucall.c | 32 +-
> .../kvm/x86/stress_save_restore_pf_test.c | 320 ++++++++++++++++++
> 7 files changed, 414 insertions(+), 92 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
>
>
> base-commit: a9512a611bd030088f13477258d1f8103cceaa40
> --
> 2.54.0.563.g4f69b47b94-goog
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 5/8] KVM: selftests: Add basic stress test for save+restore and #PF handling
2026-05-18 20:25 ` [PATCH 5/8] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
@ 2026-05-28 22:12 ` Yosry Ahmed
0 siblings, 0 replies; 12+ messages in thread
From: Yosry Ahmed @ 2026-05-28 22:12 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index caefcd12df8d2..6708fa8b6a304 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -343,6 +343,17 @@ void virt_map_level(struct kvm_vm *vm, gva_t gva, gpa_t gpa,
> }
> }
>
> +void virt_map_page_tables(struct kvm_vm *vm)
> +{
> + gpa_t gpa = KVM_GUEST_PAGE_TABLE_MIN_PADDR;
> + struct userspace_mem_region *region;
> + u64 pt_size;
> +
> + region = memslot2region(vm, vm->memslots[MEM_REGION_PT]);
> + pt_size = region->region.guest_phys_addr + region->region.memory_size - gpa;
> + virt_map(vm, gpa, gpa, pt_size / getpagesize());
> +}
This is wrong. It tries to map the entire memslot for MEM_REGION_PT,
starting at KVM_GUEST_PAGE_TABLE_MIN_PADDR. The problem is, the
memslot is shared with other things like ELF. So this tries to map
everything from KVM_GUEST_PAGE_TABLE_MIN_PADDR (0x180000) to the end
of the memslot (0x291000 in this case) using identity GVA->GPA
mappings.
When using clang's LLD, ELF starts at GVA 0x200000, so this tries to
remap the ELF GVAs to different GPAs, and an assertion fires in
__virt_pg_map().
Without LLD (when I did my initial testing), ELF starts at GVA
0x400000, so it's outside the range mapped by virt_map_page_tables().
We can fix this by creating a separate memslot for page tables like
other archs do, but I think for this case, it's probably simpler if
the test just walks the page tables and maps them. It should be one
page at each level.
^ permalink raw reply [flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2026-05-18 20:25 [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-05-18 20:25 ` [PATCH 1/8] KVM: selftests: Fix offsets in GPR switching for nSVM Yosry Ahmed
2026-05-18 20:25 ` [PATCH 2/8] KVM: selftests: Move GPR load/save definitions outside of nSVM code Yosry Ahmed
2026-05-18 20:25 ` [PATCH 3/8] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
2026-05-18 20:25 ` [PATCH 4/8] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK Yosry Ahmed
2026-05-18 20:25 ` [PATCH 5/8] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
2026-05-28 22:12 ` Yosry Ahmed
2026-05-18 20:25 ` [PATCH 6/8] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
2026-05-18 20:25 ` [PATCH 7/8] KVM: selftests: Support running stress save+restore and #PF test in L2 Yosry Ahmed
2026-05-18 20:25 ` [PATCH 8/8] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test Yosry Ahmed
2026-05-18 20:40 ` [PATCH 0/8] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-05-28 19:26 ` Yosry Ahmed