* [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested)
@ 2026-06-29 18:37 Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 01/10] KVM: selftests: Move STR() and XSTR() definitions to test_util.h Yosry Ahmed
` (9 more replies)
0 siblings, 10 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Add a stress test for save+restore while the guest is triggering and
handling #PFs, in both L1 and L2. The goal was to create a generic
selftest that would catch bugs like the one fixed by commit 5c247d08bc81
("KVM: nSVM: Use vcpu->arch.cr2 when updating vmcb12 on nested
#VMEXIT"), instead of relying on high-level testing (e.g. building GCC
in L2) to catch it.
The test tries to be as generic as possible by triggering #PFs in a
guest and installing a proper #PF handler, while the host is
continuously doing save+restore cycles. Exiting to userspace is randomly
triggered by a second thread that constantly signals the vCPU thread.
Patches (1-6) are prep patches, fixing GPR switching for nSVM and
generalizing it to cover nVMX, which is needed for the test to run
properly with nVMX. Patch 6 removes HORRIFIC_L2_UCALL_CLOBBER_HACK, as
it is no longer needed. While this series does not have the "complete"
fix added by commit 6783ca4105a7 ("KVM: selftests: Add a shameful hack
to preserve/clobber GPRs across ucall"), it's a good step in the right
direction.
Patches (7-10) add the actual test. The test is first introduced as a
simple (read: dummy) stress test that just explicitly syncs to userspace
after each #PF handling to do save+restore, then gradually evolves to
add the random signaling and nested support. After the last patch, the
test reliably reproduces the CR2 bug.
v2 -> v3:
- Rebased on top of L2 stack rework in selftests.
- Fix GPR array size (i.e. NR_GUEST_REGS) [Sashiko].
- Handle evmcs_vmlaunch() and evmcs_vmresume() [Sashiko].
- Fix off-by-one assertion error in intermediate patches.
- Increase inter-signal delay from 100us to 1msec.
v1 -> v2:
- Switch guest_regs to an array, which simplifies the offsets
calculation and forgoes the dependency on using OFFSET() or defining
the struct offsets for assembly otherwise.
- Move page table mapping to the test (instead of a generic helper), as
the helper mistakenly tried to map the entire memslot, not just page
tables.
- Do not use identity mappings for page tables, as they collide with
GVAs used for ELF in some cases.
- Simplify page table walking by using loops.
- Make sure the signals are ignored before creating the signaling
thread [Sashiko]
- Assert that the guest actually ran and had page faults [Sashiko]
- Add a patch to fix RAX and RFLAGS offsets in run_guest() [Sashiko]
- Initialize exception_has_payload when injecting a #UD [Sashiko]
- Only check KVM_STATE_NESTED_GUEST_MODE when running in nested mode
[Sashiko]
v1: https://lore.kernel.org/all/20260518202514.2037078-1-yosry@kernel.org/
v2: https://lore.kernel.org/kvm/20260604203546.365658-1-yosry@kernel.org/
Yosry Ahmed (10):
KVM: selftests: Move STR() and XSTR() definitions to test_util.h
KVM: selftests: Fix RAX and RFLAGS VMCB offsets when running L2
KVM: selftests: Use an array for guest_regs (and fix offsets)
KVM: selftests: Move GPR load/save definitions outside of nSVM code
KVM: selftests: Reuse GPR switching logic for nVMX
KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK
KVM: selftests: Add basic stress test for save+restore and #PF
handling
KVM: selftests: Trigger save+restore randomly in the #PF stress test
KVM: selftests: Support running stress save+restore and #PF test in L2
KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/include/test_util.h | 3 +
.../testing/selftests/kvm/include/x86/evmcs.h | 40 +--
.../selftests/kvm/include/x86/processor.h | 55 ++-
tools/testing/selftests/kvm/include/x86/vmx.h | 63 ++--
.../testing/selftests/kvm/lib/x86/processor.c | 2 +
tools/testing/selftests/kvm/lib/x86/svm.c | 60 ++--
tools/testing/selftests/kvm/lib/x86/ucall.c | 32 +-
.../kvm/x86/evmcs_smm_controls_test.c | 3 -
tools/testing/selftests/kvm/x86/smm_test.c | 3 -
.../kvm/x86/stress_save_restore_pf_test.c | 326 ++++++++++++++++++
11 files changed, 441 insertions(+), 147 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
base-commit: 50406d35f5635e1cc523e61409d57e851b5f5df8
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 01/10] KVM: selftests: Move STR() and XSTR() definitions to test_util.h
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 02/10] KVM: selftests: Fix RAX and RFLAGS VMCB offsets when running L2 Yosry Ahmed
` (8 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The macros are defined in two tests, and future changes will use them
elsewhere. Move their definition into test_util.h to deduplicate them.
No functional change intended.
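For readers unfamiliar with the two-level idiom, the following is a minimal
sketch (not part of the patch) of how STR() and XSTR() differ when the
argument is itself a macro. EXAMPLE_PORT and the two helper functions are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical constant, used only for this illustration. */
#define EXAMPLE_PORT 0xe

/* STR() stringifies its argument as written, without expanding it. */
static const char *str_of_port(void)
{
	return STR(EXAMPLE_PORT);	/* "EXAMPLE_PORT" */
}

/* XSTR() expands the argument first, then stringifies the result. */
static const char *xstr_of_port(void)
{
	return XSTR(EXAMPLE_PORT);	/* "0xe" */
}
```

This is why XSTR(), rather than STR(), is the one to use when pasting a
numeric #define into an asm template string.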
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/include/test_util.h | 3 +++
tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c | 3 ---
tools/testing/selftests/kvm/x86/smm_test.c | 3 ---
3 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index a56271c237ae9..c55148ebfe934 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -240,4 +240,7 @@ char *strdup_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2), n
char *sys_get_cur_clocksource(void);
+#define STR(x) #x
+#define XSTR(x) STR(x)
+
#endif /* SELFTEST_KVM_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c b/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
index 77ce87c41a868..cf53bf4697e1e 100644
--- a/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
+++ b/tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
@@ -22,9 +22,6 @@
#define SYNC_PORT 0xe
-#define STR(x) #x
-#define XSTR(s) STR(s)
-
/*
* SMI handler: runs in real-address mode.
* Reports SMRAM_STAGE via port IO, then does RSM.
diff --git a/tools/testing/selftests/kvm/x86/smm_test.c b/tools/testing/selftests/kvm/x86/smm_test.c
index e2542f4ced605..21619e7582718 100644
--- a/tools/testing/selftests/kvm/x86/smm_test.c
+++ b/tools/testing/selftests/kvm/x86/smm_test.c
@@ -22,9 +22,6 @@
#define SMRAM_GPA 0x1000000
#define SMRAM_STAGE 0xfe
-#define STR(x) #x
-#define XSTR(s) STR(s)
-
#define SYNC_PORT 0xe
#define DONE 0xff
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 02/10] KVM: selftests: Fix RAX and RFLAGS VMCB offsets when running L2
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 01/10] KVM: selftests: Move STR() and XSTR() definitions to test_util.h Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 03/10] KVM: selftests: Use an array for guest_regs (and fix offsets) Yosry Ahmed
` (7 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed, Sashiko
The offsets used (0x170 and 0x1f8) are offsets within vmcb_save_area,
not vmcb. The correct offsets should include the base of vmcb_save_area
within vmcb (which is 0x400 -- so 0x570 and 0x5f8).
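The arithmetic can be checked with a small sketch. The structs below are a
mock layout built only from the offsets quoted above (they are an assumption
for illustration, not the real struct vmcb or vmcb_save_area), and the helper
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock save area: only the two offsets from the text are modeled. */
struct mock_save_area {
	uint8_t pad0[0x170];
	uint64_t rflags;			/* 0x170 within the save area */
	uint8_t pad1[0x1f8 - 0x170 - 8];
	uint64_t rax;				/* 0x1f8 within the save area */
};

/* Mock VMCB: the save area starts 0x400 bytes into the structure. */
struct mock_vmcb {
	uint8_t control[0x400];
	struct mock_save_area save;
};

/* Offset of a save-area field from the start of the VMCB: base + field. */
static size_t vmcb_rflags_offset(void)
{
	return offsetof(struct mock_vmcb, save) +
	       offsetof(struct mock_save_area, rflags);
}

static size_t vmcb_rax_offset(void)
{
	return offsetof(struct mock_vmcb, save) +
	       offsetof(struct mock_save_area, rax);
}
```

With this layout the two helpers return 0x570 and 0x5f8, which are the
displacements the old asm would have needed relative to the vmcb pointer.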
Instead of just correcting the offsets, use vmcb->save.rax and
vmcb->save.rflags as parameters to the asm block and avoid hardcoding
offsets completely. While at it, also use guest_regs.rax directly
instead of assuming it's at offset 0 of guest_regs.
Note: "+m" must be used for vmcb_rax and vmcb_rflags. Caching those
fields in registers would be wrong, because the underlying KVM updates
them in memory.
The same problem was recently fixed (differently) for kvm-unit-tests
[1].
[1] https://lore.kernel.org/all/20260521092311.86030-1-pbonzini@redhat.com/
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260518202514.2037078-1-yosry%40kernel.org?part=1
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/lib/x86/svm.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 1445b890986fd..766d15f1d534a 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -164,19 +164,22 @@ void run_guest(struct vmcb *vmcb, u64 vmcb_gpa)
{
asm volatile (
"vmload %[vmcb_gpa]\n\t"
- "mov rflags, %%r15\n\t" // rflags
- "mov %%r15, 0x170(%[vmcb])\n\t"
- "mov guest_regs, %%r15\n\t" // rax
- "mov %%r15, 0x1f8(%[vmcb])\n\t"
+ "mov rflags, %%r15\n\t"
+ "mov %%r15, %[vmcb_rflags]\n\t"
+ "mov %[guest_regs_rax], %%r15\n\t"
+ "mov %%r15, %[vmcb_rax]\n\t"
LOAD_GPR_C
"vmrun %[vmcb_gpa]\n\t"
SAVE_GPR_C
- "mov 0x170(%[vmcb]), %%r15\n\t" // rflags
+ "mov %[vmcb_rflags], %%r15\n\t"
"mov %%r15, rflags\n\t"
- "mov 0x1f8(%[vmcb]), %%r15\n\t" // rax
- "mov %%r15, guest_regs\n\t"
+ "mov %[vmcb_rax], %%r15\n\t" // rax
+ "mov %%r15, %[guest_regs_rax]\n\t"
"vmsave %[vmcb_gpa]\n\t"
- : : [vmcb] "r" (vmcb), [vmcb_gpa] "a" (vmcb_gpa)
+ : [vmcb_rflags] "+m" (vmcb->save.rflags),
+ [vmcb_rax] "+m" (vmcb->save.rax),
+ [guest_regs_rax] "+rm" (guest_regs.rax)
+ : [vmcb_gpa] "a" (vmcb_gpa)
: "r15", "memory");
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 03/10] KVM: selftests: Use an array for guest_regs (and fix offsets)
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 01/10] KVM: selftests: Move STR() and XSTR() definitions to test_util.h Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 02/10] KVM: selftests: Fix RAX and RFLAGS VMCB offsets when running L2 Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 04/10] KVM: selftests: Move GPR load/save definitions outside of nSVM code Yosry Ahmed
` (6 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The assembly code defined by SAVE_GPR_C uses the wrong offsets for some
registers in guest_regs. For example, the offset of RCX should be 0x08,
not 0x10. Also, the last offset in the struct (R15) is 0x78, not 0x80, so
the code actually saves and restores beyond the end of gpr64_regs.
Eliminate hardcoded offsets by using an array instead of a struct
(similar to KVM's per-vCPU regs), and use the array index to generate
the offset. While at it, replace SAVE_GPR_C and LOAD_GPR_C with a single
macro, SVM_SWITCH_GPRS_ASM.
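As a sanity check of the offsets, here is a minimal sketch of how an array
index turns into a byte offset and into the asm operand string. The macro
and the index values are copied from this patch; guest_reg_offset() and
switch_rcx_asm() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)

/* Subset of the indices introduced by the patch. */
#define GUEST_REGS_RCX 1
#define GUEST_REGS_R15 15
#define NR_GUEST_REGS (GUEST_REGS_R15 + 1)

#define GUEST_SWITCH_GPR_ASM(reg, idx) \
	"xchg %%" #reg ", guest_regs + 8 *" XSTR(idx) "\n\t"

/* Byte offset of a register slot in an array of 64-bit values. */
static size_t guest_reg_offset(int idx)
{
	return idx * sizeof(uint64_t);
}

/* The template string the macro generates for RCX. */
static const char *switch_rcx_asm(void)
{
	return GUEST_SWITCH_GPR_ASM(rcx, GUEST_REGS_RCX);
}
```

RCX lands at 0x08 and R15 at 0x78, and the whole array is 0x80 bytes, so an
access at guest_regs+0x80 (as in the old SAVE_GPR_C) is past the end.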
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../selftests/kvm/include/x86/processor.h | 36 +++++++-------
tools/testing/selftests/kvm/lib/x86/svm.c | 47 ++++++++++---------
2 files changed, 41 insertions(+), 42 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 7d3a27bc0d842..535f26e077570 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -396,25 +396,23 @@ static inline unsigned int x86_model(unsigned int eax)
#define PTE_GET_PA(pte) ((pte) & PHYSICAL_PAGE_MASK)
#define PTE_GET_PFN(pte) (PTE_GET_PA(pte) >> PAGE_SHIFT)
-/* General Registers in 64-Bit Mode */
-struct gpr64_regs {
- u64 rax;
- u64 rcx;
- u64 rdx;
- u64 rbx;
- u64 rsp;
- u64 rbp;
- u64 rsi;
- u64 rdi;
- u64 r8;
- u64 r9;
- u64 r10;
- u64 r11;
- u64 r12;
- u64 r13;
- u64 r14;
- u64 r15;
-};
+#define GUEST_REGS_RAX 0
+#define GUEST_REGS_RCX 1
+#define GUEST_REGS_RDX 2
+#define GUEST_REGS_RBX 3
+#define GUEST_REGS_RSP 4
+#define GUEST_REGS_RBP 5
+#define GUEST_REGS_RSI 6
+#define GUEST_REGS_RDI 7
+#define GUEST_REGS_R8 8
+#define GUEST_REGS_R9 9
+#define GUEST_REGS_R10 10
+#define GUEST_REGS_R11 11
+#define GUEST_REGS_R12 12
+#define GUEST_REGS_R13 13
+#define GUEST_REGS_R14 14
+#define GUEST_REGS_R15 15
+#define NR_GUEST_REGS (GUEST_REGS_R15 + 1)
struct desc64 {
u16 limit0;
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 766d15f1d534a..8e392e0451123 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -13,7 +13,7 @@
#define SEV_DEV_PATH "/dev/sev"
-struct gpr64_regs guest_regs;
+u64 guest_regs[NR_GUEST_REGS];
u64 rflags;
/* Allocate memory regions for nested SVM tests.
@@ -125,7 +125,7 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip)
vmcb->save.rip = (u64)guest_rip;
vmcb->save.rsp = (u64)svm->stack;
- guest_regs.rdi = (u64)svm;
+ guest_regs[GUEST_REGS_RDI] = (u64)svm;
if (svm->ncr3_gpa) {
ctrl->misc_ctl |= SVM_MISC_ENABLE_NP;
@@ -133,31 +133,32 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip)
}
}
+#define GUEST_SWITCH_GPR_ASM(reg, idx) \
+ "xchg %%" #reg ", guest_regs + 8 *" XSTR(idx) "\n\t"
+
/*
* save/restore 64-bit general registers except rax, rip, rsp
* which are directly handed through the VMCB guest processor state
*/
-#define SAVE_GPR_C \
- "xchg %%rbx, guest_regs+0x20\n\t" \
- "xchg %%rcx, guest_regs+0x10\n\t" \
- "xchg %%rdx, guest_regs+0x18\n\t" \
- "xchg %%rbp, guest_regs+0x30\n\t" \
- "xchg %%rsi, guest_regs+0x38\n\t" \
- "xchg %%rdi, guest_regs+0x40\n\t" \
- "xchg %%r8, guest_regs+0x48\n\t" \
- "xchg %%r9, guest_regs+0x50\n\t" \
- "xchg %%r10, guest_regs+0x58\n\t" \
- "xchg %%r11, guest_regs+0x60\n\t" \
- "xchg %%r12, guest_regs+0x68\n\t" \
- "xchg %%r13, guest_regs+0x70\n\t" \
- "xchg %%r14, guest_regs+0x78\n\t" \
- "xchg %%r15, guest_regs+0x80\n\t"
-
-#define LOAD_GPR_C SAVE_GPR_C
+#define SVM_SWITCH_GPRS_ASM \
+ GUEST_SWITCH_GPR_ASM(rbx, GUEST_REGS_RBX) \
+ GUEST_SWITCH_GPR_ASM(rcx, GUEST_REGS_RCX) \
+ GUEST_SWITCH_GPR_ASM(rdx, GUEST_REGS_RDX) \
+ GUEST_SWITCH_GPR_ASM(rbp, GUEST_REGS_RBP) \
+ GUEST_SWITCH_GPR_ASM(rsi, GUEST_REGS_RSI) \
+ GUEST_SWITCH_GPR_ASM(rdi, GUEST_REGS_RDI) \
+ GUEST_SWITCH_GPR_ASM(r8, GUEST_REGS_R8) \
+ GUEST_SWITCH_GPR_ASM(r9, GUEST_REGS_R9) \
+ GUEST_SWITCH_GPR_ASM(r10, GUEST_REGS_R10) \
+ GUEST_SWITCH_GPR_ASM(r11, GUEST_REGS_R11) \
+ GUEST_SWITCH_GPR_ASM(r12, GUEST_REGS_R12) \
+ GUEST_SWITCH_GPR_ASM(r13, GUEST_REGS_R13) \
+ GUEST_SWITCH_GPR_ASM(r14, GUEST_REGS_R14) \
+ GUEST_SWITCH_GPR_ASM(r15, GUEST_REGS_R15)
/*
* selftests do not use interrupts so we dropped clgi/sti/cli/stgi
- * for now. registers involved in LOAD/SAVE_GPR_C are eventually
+ * for now. Registers involved in SVM_SWITCH_GPRS_ASM are eventually
* unmodified so they do not need to be in the clobber list.
*/
void run_guest(struct vmcb *vmcb, u64 vmcb_gpa)
@@ -168,9 +169,9 @@ void run_guest(struct vmcb *vmcb, u64 vmcb_gpa)
"mov %%r15, %[vmcb_rflags]\n\t"
"mov %[guest_regs_rax], %%r15\n\t"
"mov %%r15, %[vmcb_rax]\n\t"
- LOAD_GPR_C
+ SVM_SWITCH_GPRS_ASM
"vmrun %[vmcb_gpa]\n\t"
- SAVE_GPR_C
+ SVM_SWITCH_GPRS_ASM
"mov %[vmcb_rflags], %%r15\n\t"
"mov %%r15, rflags\n\t"
"mov %[vmcb_rax], %%r15\n\t" // rax
@@ -178,7 +179,7 @@ void run_guest(struct vmcb *vmcb, u64 vmcb_gpa)
"vmsave %[vmcb_gpa]\n\t"
: [vmcb_rflags] "+m" (vmcb->save.rflags),
[vmcb_rax] "+m" (vmcb->save.rax),
- [guest_regs_rax] "+rm" (guest_regs.rax)
+ [guest_regs_rax] "+rm" (guest_regs[GUEST_REGS_RAX])
: [vmcb_gpa] "a" (vmcb_gpa)
: "r15", "memory");
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 04/10] KVM: selftests: Move GPR load/save definitions outside of nSVM code
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (2 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 03/10] KVM: selftests: Use an array for guest_regs (and fix offsets) Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
` (5 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
In preparation for reusing the code for nVMX tests, move the guest_regs
array declaration as well as GPR switching macro to processor.h.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/include/x86/processor.h | 5 +++++
tools/testing/selftests/kvm/lib/x86/processor.c | 2 ++
tools/testing/selftests/kvm/lib/x86/svm.c | 4 ----
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 535f26e077570..28edeab74e0e6 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -414,6 +414,11 @@ static inline unsigned int x86_model(unsigned int eax)
#define GUEST_REGS_R15 15
#define NR_GUEST_REGS (GUEST_REGS_R15 + 1)
+extern u64 guest_regs[NR_GUEST_REGS];
+
+#define GUEST_SWITCH_GPR_ASM(reg, idx) \
+ "xchg %%" #reg ", guest_regs + 8 *" XSTR(idx) "\n\t"
+
struct desc64 {
u16 limit0;
u16 base0;
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index ef56dcefe0119..0d66aff04939b 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -29,6 +29,8 @@ bool host_cpu_is_amd_compatible;
bool is_forced_emulation_enabled;
u64 guest_tsc_khz;
+u64 guest_regs[NR_GUEST_REGS];
+
const char *ex_str(int vector)
{
switch (vector) {
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 8e392e0451123..d07f10b5dc963 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -13,7 +13,6 @@
#define SEV_DEV_PATH "/dev/sev"
-u64 guest_regs[NR_GUEST_REGS];
u64 rflags;
/* Allocate memory regions for nested SVM tests.
@@ -133,9 +132,6 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip)
}
}
-#define GUEST_SWITCH_GPR_ASM(reg, idx) \
- "xchg %%" #reg ", guest_regs + 8 *" XSTR(idx) "\n\t"
-
/*
* save/restore 64-bit general registers except rax, rip, rsp
* which are directly handed through the VMCB guest processor state
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (3 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 04/10] KVM: selftests: Move GPR load/save definitions outside of nSVM code Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:49 ` sashiko-bot
2026-06-29 18:37 ` [PATCH v3 06/10] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK Yosry Ahmed
` (4 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Reuse the GPR switching logic for nVMX by defining VMX_SWITCH_GPRS_ASM,
which is essentially the same as SVM_SWITCH_GPRS_ASM but also switches
RAX, replacing the push/pop of a subset of the registers.
The long clobber list of registers is no longer needed as registers are
saved and restored appropriately (and not clobbered by L2).
Define VMX_SWITCH_GPRS_ASM before including evmcs.h, such that it can be
used by evmcs_vmlaunch() and evmcs_vmresume().
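To illustrate why running the same exchange sequence before entry and after
exit leaves the host registers intact, here is a plain C simulation. It is
only a sketch under simplifying assumptions: registers are modeled as array
slots, all 16 slots are swapped (the real macro does not touch RSP), and
every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_REGS 16

/* Swap each "live" register with its saved slot, like one xchg per GPR. */
static void switch_gprs(uint64_t live[NR_REGS], uint64_t saved[NR_REGS])
{
	for (int i = 0; i < NR_REGS; i++) {
		uint64_t tmp = live[i];

		live[i] = saved[i];
		saved[i] = tmp;
	}
}

/* Returns 1 if host values survive a simulated guest run, 0 otherwise. */
static int host_regs_survive_guest_run(void)
{
	uint64_t live[NR_REGS], saved[NR_REGS];

	for (int i = 0; i < NR_REGS; i++) {
		live[i] = 0x1000 + i;	/* host values */
		saved[i] = 0x2000 + i;	/* guest values */
	}

	switch_gprs(live, saved);	/* before entry: guest values go live */
	for (int i = 0; i < NR_REGS; i++)
		live[i] += 1;		/* the "guest" modifies its registers */
	switch_gprs(live, saved);	/* after exit: host values come back */

	for (int i = 0; i < NR_REGS; i++) {
		if (live[i] != 0x1000 + (uint64_t)i)
			return 0;
		if (saved[i] != 0x2000 + (uint64_t)i + 1)
			return 0;
	}
	return 1;
}
```

After the second switch the host values are back in place and the guest's
updated values sit in the save array, which is the property the shortened
clobber list relies on.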
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../testing/selftests/kvm/include/x86/evmcs.h | 40 ++++--------
tools/testing/selftests/kvm/include/x86/vmx.h | 63 +++++++++----------
2 files changed, 41 insertions(+), 62 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/evmcs.h b/tools/testing/selftests/kvm/include/x86/evmcs.h
index be79bda024bf1..98268be1aa5b2 100644
--- a/tools/testing/selftests/kvm/include/x86/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86/evmcs.h
@@ -1207,30 +1207,22 @@ static inline int evmcs_vmlaunch(void)
current_evmcs->hv_clean_fields = 0;
- __asm__ __volatile__("push %%rbp;"
- "push %%rcx;"
- "push %%rdx;"
- "push %%rsi;"
- "push %%rdi;"
- "push $0;"
+ __asm__ __volatile__("push $0;"
"mov %%rsp, (%[host_rsp]);"
"lea 1f(%%rip), %%rax;"
"mov %%rax, (%[host_rip]);"
+ VMX_SWITCH_GPRS_ASM
"vmlaunch;"
"incq (%%rsp);"
- "1: pop %%rax;"
- "pop %%rdi;"
- "pop %%rsi;"
- "pop %%rdx;"
- "pop %%rcx;"
- "pop %%rbp;"
+ "1: ;"
+ VMX_SWITCH_GPRS_ASM
+ "pop %%rax;"
: [ret]"=&a"(ret)
: [host_rsp]"r"
((u64)&current_evmcs->host_rsp),
[host_rip]"r"
((u64)&current_evmcs->host_rip)
- : "memory", "cc", "rbx", "r8", "r9", "r10",
- "r11", "r12", "r13", "r14", "r15");
+ : "memory", "cc");
return ret;
}
@@ -1246,30 +1238,22 @@ static inline int evmcs_vmresume(void)
/* HOST_RSP */
current_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_HOST_POINTER;
- __asm__ __volatile__("push %%rbp;"
- "push %%rcx;"
- "push %%rdx;"
- "push %%rsi;"
- "push %%rdi;"
- "push $0;"
+ __asm__ __volatile__("push $0;"
"mov %%rsp, (%[host_rsp]);"
"lea 1f(%%rip), %%rax;"
"mov %%rax, (%[host_rip]);"
+ VMX_SWITCH_GPRS_ASM
"vmresume;"
"incq (%%rsp);"
- "1: pop %%rax;"
- "pop %%rdi;"
- "pop %%rsi;"
- "pop %%rdx;"
- "pop %%rcx;"
- "pop %%rbp;"
+ "1: ;"
+ VMX_SWITCH_GPRS_ASM
+ "pop %%rax;"
: [ret]"=&a"(ret)
: [host_rsp]"r"
((u64)&current_evmcs->host_rsp),
[host_rip]"r"
((u64)&current_evmcs->host_rip)
- : "memory", "cc", "rbx", "r8", "r9", "r10",
- "r11", "r12", "r13", "r14", "r15");
+ : "memory", "cc");
return ret;
}
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 4bcfd60e3aecb..a808dc21c9f21 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -290,6 +290,23 @@ struct vmx_msr_entry {
u64 value;
} __attribute__ ((aligned(16)));
+#define VMX_SWITCH_GPRS_ASM \
+ GUEST_SWITCH_GPR_ASM(rax, GUEST_REGS_RAX) \
+ GUEST_SWITCH_GPR_ASM(rbx, GUEST_REGS_RBX) \
+ GUEST_SWITCH_GPR_ASM(rcx, GUEST_REGS_RCX) \
+ GUEST_SWITCH_GPR_ASM(rdx, GUEST_REGS_RDX) \
+ GUEST_SWITCH_GPR_ASM(rbp, GUEST_REGS_RBP) \
+ GUEST_SWITCH_GPR_ASM(rsi, GUEST_REGS_RSI) \
+ GUEST_SWITCH_GPR_ASM(rdi, GUEST_REGS_RDI) \
+ GUEST_SWITCH_GPR_ASM(r8, GUEST_REGS_R8) \
+ GUEST_SWITCH_GPR_ASM(r9, GUEST_REGS_R9) \
+ GUEST_SWITCH_GPR_ASM(r10, GUEST_REGS_R10) \
+ GUEST_SWITCH_GPR_ASM(r11, GUEST_REGS_R11) \
+ GUEST_SWITCH_GPR_ASM(r12, GUEST_REGS_R12) \
+ GUEST_SWITCH_GPR_ASM(r13, GUEST_REGS_R13) \
+ GUEST_SWITCH_GPR_ASM(r14, GUEST_REGS_R14) \
+ GUEST_SWITCH_GPR_ASM(r15, GUEST_REGS_R15)
+
#include "evmcs.h"
static inline int vmxon(u64 phys)
@@ -363,9 +380,6 @@ static inline u64 vmptrstz(void)
return value;
}
-/*
- * No guest state (e.g. GPRs) is established by this vmlaunch.
- */
static inline int vmlaunch(void)
{
int ret;
@@ -373,34 +387,23 @@ static inline int vmlaunch(void)
if (enable_evmcs)
return evmcs_vmlaunch();
- __asm__ __volatile__("push %%rbp;"
- "push %%rcx;"
- "push %%rdx;"
- "push %%rsi;"
- "push %%rdi;"
- "push $0;"
+ __asm__ __volatile__("push $0;"
"vmwrite %%rsp, %[host_rsp];"
"lea 1f(%%rip), %%rax;"
"vmwrite %%rax, %[host_rip];"
+ VMX_SWITCH_GPRS_ASM
"vmlaunch;"
"incq (%%rsp);"
- "1: pop %%rax;"
- "pop %%rdi;"
- "pop %%rsi;"
- "pop %%rdx;"
- "pop %%rcx;"
- "pop %%rbp;"
+ "1: ;"
+ VMX_SWITCH_GPRS_ASM
+ "pop %%rax;"
: [ret]"=&a"(ret)
: [host_rsp]"r"((u64)HOST_RSP),
[host_rip]"r"((u64)HOST_RIP)
- : "memory", "cc", "rbx", "r8", "r9", "r10",
- "r11", "r12", "r13", "r14", "r15");
+ : "memory", "cc");
return ret;
}
-/*
- * No guest state (e.g. GPRs) is established by this vmresume.
- */
static inline int vmresume(void)
{
int ret;
@@ -408,28 +411,20 @@ static inline int vmresume(void)
if (enable_evmcs)
return evmcs_vmresume();
- __asm__ __volatile__("push %%rbp;"
- "push %%rcx;"
- "push %%rdx;"
- "push %%rsi;"
- "push %%rdi;"
- "push $0;"
+ __asm__ __volatile__("push $0;"
"vmwrite %%rsp, %[host_rsp];"
"lea 1f(%%rip), %%rax;"
"vmwrite %%rax, %[host_rip];"
+ VMX_SWITCH_GPRS_ASM
"vmresume;"
"incq (%%rsp);"
- "1: pop %%rax;"
- "pop %%rdi;"
- "pop %%rsi;"
- "pop %%rdx;"
- "pop %%rcx;"
- "pop %%rbp;"
+ "1: ;"
+ VMX_SWITCH_GPRS_ASM
+ "pop %%rax;"
: [ret]"=&a"(ret)
: [host_rsp]"r"((u64)HOST_RSP),
[host_rip]"r"((u64)HOST_RIP)
- : "memory", "cc", "rbx", "r8", "r9", "r10",
- "r11", "r12", "r13", "r14", "r15");
+ : "memory", "cc");
return ret;
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 06/10] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (4 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 07/10] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
` (3 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Now that the nVMX test code preserves GPRs across nested VM-Exits
(specifically RBP, RDX, and RDI among others), drop the ucall-specific
hack to avoid clobbering these registers.
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/lib/x86/ucall.c | 32 ++-------------------
1 file changed, 2 insertions(+), 30 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/ucall.c b/tools/testing/selftests/kvm/lib/x86/ucall.c
index e7dd5791959ba..38050c60a0670 100644
--- a/tools/testing/selftests/kvm/lib/x86/ucall.c
+++ b/tools/testing/selftests/kvm/lib/x86/ucall.c
@@ -10,36 +10,8 @@
void ucall_arch_do_ucall(gva_t uc)
{
- /*
- * FIXME: Revert this hack (the entire commit that added it) once nVMX
- * preserves L2 GPRs across a nested VM-Exit. If a ucall from L2, e.g.
- * to do a GUEST_SYNC(), lands the vCPU in L1, any and all GPRs can be
- * clobbered by L1. Save and restore non-volatile GPRs (clobbering RBP
- * in particular is problematic) along with RDX and RDI (which are
- * inputs), and clobber volatile GPRs. *sigh*
- */
-#define HORRIFIC_L2_UCALL_CLOBBER_HACK \
- "rcx", "rsi", "r8", "r9", "r10", "r11"
-
- asm volatile("push %%rbp\n\t"
- "push %%r15\n\t"
- "push %%r14\n\t"
- "push %%r13\n\t"
- "push %%r12\n\t"
- "push %%rbx\n\t"
- "push %%rdx\n\t"
- "push %%rdi\n\t"
- "in %[port], %%al\n\t"
- "pop %%rdi\n\t"
- "pop %%rdx\n\t"
- "pop %%rbx\n\t"
- "pop %%r12\n\t"
- "pop %%r13\n\t"
- "pop %%r14\n\t"
- "pop %%r15\n\t"
- "pop %%rbp\n\t"
- : : [port] "d" (UCALL_PIO_PORT), "D" (uc) : "rax", "memory",
- HORRIFIC_L2_UCALL_CLOBBER_HACK);
+ asm volatile("in %[port], %%al"
+ : : [port] "d" (UCALL_PIO_PORT), "D" (uc) : "rax", "memory");
}
void *ucall_arch_get_ucall(struct kvm_vcpu *vcpu)
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 07/10] KVM: selftests: Add basic stress test for save+restore and #PF handling
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (5 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 06/10] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
` (2 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Add a basic stress test for handling #PFs in a guest while the host is
doing save+restore cycles. The guest periodically accesses non-present
memory causing a #PF, and the #PF handler walks the page tables and
updates the PTE to be present, like a proper #PF handler.
After every access (and #PF), the guest triggers a sync and the test
performs save+restore of the VM. This is not very meaningful as
save+restore are performed after the access and #PF handling complete,
but following changes will change that.
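The index computation used by the page table walk can be sketched as
follows. This assumes 4KiB pages with 9 index bits per level and level 1
being the 4K level, which is my reading of PG_LEVEL_SHIFT() in the
selftests headers; level_shift() and pxd_index() are hypothetical stand-ins
for the macros in the test.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PTE 512

/* Assumed layout: 12 offset bits, then 9 index bits per paging level. */
static unsigned int level_shift(int level)
{
	return 12 + 9 * (level - 1);
}

/* Index into the page table at the given level for a virtual address. */
static unsigned int pxd_index(uint64_t vaddr, int level)
{
	return (vaddr >> level_shift(level)) & (PTRS_PER_PTE - 1);
}
```

For the test's base address 0xc0000000, this gives index 3 at level 3 and
index 0 at the other levels, and each subsequent 4KiB page bumps the level 1
index by one.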
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 14 ++
.../kvm/x86/stress_save_restore_pf_test.c | 182 ++++++++++++++++++
3 files changed, 197 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 4ace12606e937..c61d51a0c112f 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -111,6 +111,7 @@ TEST_GEN_PROGS_x86 += x86/set_sregs_test
TEST_GEN_PROGS_x86 += x86/smaller_maxphyaddr_emulation_test
TEST_GEN_PROGS_x86 += x86/smm_test
TEST_GEN_PROGS_x86 += x86/state_test
+TEST_GEN_PROGS_x86 += x86/stress_save_restore_pf_test
TEST_GEN_PROGS_x86 += x86/vmx_preemption_timer_test
TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 28edeab74e0e6..ec4dfd8176369 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -585,6 +585,15 @@ static inline void set_cr0(u64 val)
__asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory");
}
+static inline u64 get_cr2(void)
+{
+ u64 cr2;
+
+ __asm__ __volatile__("mov %%cr2, %[cr2]"
+ : /* output */ [cr2]"=r"(cr2));
+ return cr2;
+}
+
static inline u64 get_cr3(void)
{
u64 cr3;
@@ -880,6 +889,11 @@ static inline void write_sse_reg(int reg, const sse128_t *data)
}
}
+static inline void invlpg(u64 addr)
+{
+ __asm__ __volatile__("invlpg (%0)" : : "r"(addr) : "memory");
+}
+
static inline void cpu_relax(void)
{
asm volatile("rep; nop" ::: "memory");
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
new file mode 100644
index 0000000000000..1b6f64bbcf937
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define NR_ITERATIONS 500
+
+#define GOTO_PREV_LINE "\033[A\r"
+#define PRINT_ITER(s, x) \
+do { \
+ if (x == 1) \
+ printf(s "%d\n", x); \
+ else \
+ printf(GOTO_PREV_LINE s "%d\n", x); \
+ fflush(stdout); \
+} while (0)
+
+#define PTRS_PER_PTE 512
+#define PXD_INDEX(vaddr, level) (((vaddr) >> PG_LEVEL_SHIFT(level)) & (PTRS_PER_PTE - 1))
+
+#define TEST_MEM_BASE_GVA 0xc0000000ULL
+#define TEST_PGTABLE_GVA_OFFSET 0xd0000000ULL
+#define NR_TEST_ADDRS PTRS_PER_PTE
+#define PATTERN 0xabcdefabcdefabcdULL
+
+static u64 pte_present_mask;
+static u64 pte_huge_mask;
+
+static u64 expected_vaddr;
+static u64 guest_faults;
+
+static u64 *guest_get_pte(u64 vaddr)
+{
+ u64 pgtable_pa, pte;
+ u64 *pgtable;
+ int level;
+ bool la57;
+
+ la57 = !!(get_cr4() & X86_CR4_LA57);
+ level = la57 ? PG_LEVEL_256T : PG_LEVEL_512G;
+
+ pgtable_pa = get_cr3() & PHYSICAL_PAGE_MASK;
+ for (; level > PG_LEVEL_4K; level--) {
+ pgtable = (u64 *)(pgtable_pa + TEST_PGTABLE_GVA_OFFSET);
+ pte = pgtable[PXD_INDEX(vaddr, level)];
+ GUEST_ASSERT(pte & pte_present_mask);
+ GUEST_ASSERT(!(pte & pte_huge_mask));
+ pgtable_pa = PTE_GET_PA(pte);
+ }
+
+ pgtable = (u64 *)(pgtable_pa + TEST_PGTABLE_GVA_OFFSET);
+ return &pgtable[PXD_INDEX(vaddr, PG_LEVEL_4K)];
+}
+
+static void guest_pf_handler(struct ex_regs *regs)
+{
+ u64 fault_addr;
+ u64 *ptep;
+
+ fault_addr = get_cr2();
+ GUEST_ASSERT_EQ(fault_addr, READ_ONCE(expected_vaddr));
+
+ ptep = guest_get_pte(fault_addr);
+ GUEST_ASSERT(ptep);
+ GUEST_ASSERT(!(*ptep & pte_present_mask));
+
+ *ptep |= pte_present_mask;
+ invlpg(fault_addr);
+
+ guest_faults++;
+}
+
+static void guest_access_memory(void *arg)
+{
+ u64 vaddr, val;
+ int i = 0;
+
+ for (;; i++) {
+ vaddr = TEST_MEM_BASE_GVA + (i % NR_TEST_ADDRS) * PAGE_SIZE;
+ WRITE_ONCE(expected_vaddr, vaddr);
+
+ /* Read to trigger #PF */
+ val = READ_ONCE(*(u64 *)vaddr);
+ GUEST_ASSERT_EQ(val, PATTERN);
+
+ /* Clear the present bit again so it faults next time */
+ *guest_get_pte(vaddr) &= ~pte_present_mask;
+ invlpg(vaddr);
+
+ GUEST_SYNC(guest_faults);
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_x86_state *state;
+ int r, i, level, count = 0;
+ gpa_t gpa, pgtable_gpa;
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ struct ucall uc;
+ u64 *pgtable;
+ gva_t gva;
+ u64 pte;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_access_memory);
+ vm_install_exception_handler(vm, PF_VECTOR, guest_pf_handler);
+
+ pte_present_mask = PTE_PRESENT_MASK(&vm->mmu);
+ pte_huge_mask = PTE_HUGE_MASK(&vm->mmu);
+ sync_global_to_guest(vm, pte_present_mask);
+ sync_global_to_guest(vm, pte_huge_mask);
+
+ /* Allocate a page and write the pattern to it */
+ gva = vm_alloc_page(vm);
+ *(u64 *)addr_gva2hva(vm, gva) = PATTERN;
+ gpa = addr_gva2gpa(vm, gva);
+
+ /*
+ * Map all virtual addresses to the pattern page and clear the present
+ * bit such that guest accesses will cause a #PF.
+ */
+ for (i = 0; i < NR_TEST_ADDRS; i++) {
+ gva = TEST_MEM_BASE_GVA + i * getpagesize();
+ virt_pg_map(vm, gva, gpa);
+ *vm_get_pte(vm, gva) &= ~pte_present_mask;
+ }
+
+ /*
+ * Now create mappings for the page tables created above so that the
+ * guest #PF handler can walk them. All PTEs for test virtual addresses
+ * should lie on the same PTE page, so one page is mapped for each page
+ * table level.
+ *
+ * Use an offset for the GVA instead of creating identity mappings to
+ * avoid collision with existing mappings at low GVAs (e.g. ELF).
+ */
+ pgtable_gpa = vm->mmu.pgd;
+ for (level = vm->mmu.pgtable_levels; level >= PG_LEVEL_4K; level--) {
+ virt_map(vm, pgtable_gpa + TEST_PGTABLE_GVA_OFFSET, pgtable_gpa, 1);
+ pgtable = addr_gpa2hva(vm, pgtable_gpa);
+ pte = pgtable[PXD_INDEX(TEST_MEM_BASE_GVA, level)];
+ pgtable_gpa = PTE_GET_PA(pte);
+ }
+
+ while (count++ < NR_ITERATIONS) {
+ r = __vcpu_run(vcpu);
+ TEST_ASSERT(!r, "vcpu_run failed");
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+ get_ucall(vcpu, &uc);
+ if (uc.cmd == UCALL_ABORT) {
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ }
+ TEST_ASSERT_EQ(uc.cmd, UCALL_SYNC);
+ TEST_ASSERT_EQ(uc.args[1], count);
+
+ state = vcpu_save_state(vcpu);
+
+ kvm_vm_release(vm);
+ vcpu = vm_recreate_with_one_vcpu(vm);
+ vcpu_load_state(vcpu, state);
+ kvm_x86_state_cleanup(state);
+
+ PRINT_ITER("Save+restore iterations: ", count);
+ }
+
+ sync_global_from_guest(vm, guest_faults);
+ pr_info("Guest page faults: %lu\n", guest_faults);
+
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.55.0.rc0.799.gd6f94ed593-goog
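The setup comment in this patch relies on all PTEs for the test addresses living on a single PTE page, which is why only one page per page-table level is mapped for the guest. A minimal sketch of that arithmetic follows. It assumes 4K leaf pages and 9 index bits per level (level 1 being the leaf); `level_shift()`, `pxd_index()` and `test_addrs_share_pte_page()` are illustrative helpers, not selftest APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K		4096ULL
#define PTRS_PER_PTE		512
#define NR_TEST_ADDRS		PTRS_PER_PTE
#define TEST_MEM_BASE_GVA	0xc0000000ULL

/* Assumed layout: level 1 is the 4K leaf level, each level adds 9 bits. */
static unsigned int level_shift(int level)
{
	assert(level >= 1);
	return (level - 1) * 9 + 12;
}

/* Index of the entry that translates 'vaddr' in the table at 'level'. */
static uint64_t pxd_index(uint64_t vaddr, int level)
{
	return (vaddr >> level_shift(level)) & (PTRS_PER_PTE - 1);
}

/*
 * Return true if every test address selects the same entry at each
 * non-leaf level (2..top_level), i.e. all leaf PTEs sit on one PTE page.
 */
static bool test_addrs_share_pte_page(int top_level)
{
	for (int i = 0; i < NR_TEST_ADDRS; i++) {
		uint64_t vaddr = TEST_MEM_BASE_GVA + i * PAGE_SIZE_4K;

		for (int level = 2; level <= top_level; level++) {
			if (pxd_index(vaddr, level) !=
			    pxd_index(TEST_MEM_BASE_GVA, level))
				return false;
		}
	}
	return true;
}
```

Since the base address is 2M-aligned and the test uses exactly 512 pages, the check holds for both 4-level and 5-level paging (top_level of 4 or 5) under these assumptions.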
* [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (6 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 07/10] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:48 ` sashiko-bot
2026-06-29 18:37 ` [PATCH v3 09/10] KVM: selftests: Support running stress save+restore and #PF test in L2 Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 10/10] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test Yosry Ahmed
9 siblings, 1 reply; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Instead of an explicit GUEST_SYNC() after each access+#PF, run another
thread that keeps sending SIGUSR1 to the vCPU thread, essentially
triggering exits to userspace and save+restore at random points in guest
execution. This makes the test a lot more meaningful as it opens the
door to exercising race conditions between #PF handling in the guest
and save+restore in the host.
The signals are ignored using SIG_IGN outside of __vcpu_run() to avoid
interrupting other ioctls/syscalls performed by the test.
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../kvm/x86/stress_save_restore_pf_test.c | 59 ++++++++++++++++---
1 file changed, 51 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
index 1b6f64bbcf937..bbbb5bb2a2ee1 100644
--- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -5,6 +5,8 @@
#include <errno.h>
#include <sys/types.h>
#include <time.h>
+#include <pthread.h>
+#include <signal.h>
#include <unistd.h>
#include "test_util.h"
@@ -94,15 +96,41 @@ static void guest_access_memory(void *arg)
/* Clear the present bit again so it faults next time */
*guest_get_pte(vaddr) &= ~pte_present_mask;
invlpg(vaddr);
+ }
+}
+
+static void *sigusr_thread_fn(void *arg)
+{
+ pthread_t vcpu_thread = (pthread_t)arg;
- GUEST_SYNC(guest_faults);
+ for (;;) {
+ pthread_testcancel();
+ pthread_kill(vcpu_thread, SIGUSR1);
+ usleep(msecs_to_usecs(1));
}
+ return NULL;
+}
+
+static void dummy_signal_handler(int signo) {}
+static struct sigaction sa;
+
+static void vcpu_sigusr_listen(void)
+{
+ sa.sa_handler = dummy_signal_handler;
+ sigaction(SIGUSR1, &sa, NULL);
+}
+
+static void vcpu_sigusr_ignore(void)
+{
+ sa.sa_handler = SIG_IGN;
+ sigaction(SIGUSR1, &sa, NULL);
}
int main(int argc, char *argv[])
{
struct kvm_x86_state *state;
int r, i, level, count = 0;
+ pthread_t sigusr_thread;
gpa_t gpa, pgtable_gpa;
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
@@ -151,18 +179,30 @@ int main(int argc, char *argv[])
pgtable_gpa = PTE_GET_PA(pte);
}
+ /* Initialize the thread sending SIGUSR and install the handler */
+ vcpu_sigusr_ignore();
+ r = pthread_create(&sigusr_thread, NULL, sigusr_thread_fn,
+ (void *)pthread_self());
+ TEST_ASSERT(!r, "pthread_create() failed: %d", r);
+
while (count++ < NR_ITERATIONS) {
+ /*
+ * Only handle SIGUSR while the vCPU is running, otherwise
+ * ignore it to avoid interrupting other ioctls/syscalls.
+ */
+ vcpu_sigusr_listen();
r = __vcpu_run(vcpu);
- TEST_ASSERT(!r, "vcpu_run failed");
- TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
-
- get_ucall(vcpu, &uc);
- if (uc.cmd == UCALL_ABORT) {
+ if (r == -1)
+ TEST_ASSERT_EQ(errno, EINTR);
+ vcpu_sigusr_ignore();
+
+ /* The guest only exits due to a signal or failed assertion */
+ if (!r) {
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+ TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_ABORT);
REPORT_GUEST_ASSERT(uc);
break;
}
- TEST_ASSERT_EQ(uc.cmd, UCALL_SYNC);
- TEST_ASSERT_EQ(uc.args[1], count);
state = vcpu_save_state(vcpu);
@@ -175,8 +215,11 @@ int main(int argc, char *argv[])
}
sync_global_from_guest(vm, guest_faults);
+ TEST_ASSERT(guest_faults > 0, "No guest page faults triggered");
pr_info("Guest page faults: %lu\n", guest_faults);
+ pthread_cancel(sigusr_thread);
+ pthread_join(sigusr_thread, NULL);
kvm_vm_free(vm);
return 0;
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 09/10] KVM: selftests: Support running stress save+restore and #PF test in L2
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (7 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 10/10] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test Yosry Ahmed
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Extend the stress test to allow running the access+#PF code in L2
instead of L1 by adding proper L1 guest code to bootstrap L2. The test
runs in nested mode if the '-n' flag is passed.
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../kvm/x86/stress_save_restore_pf_test.c | 68 ++++++++++++++++++-
1 file changed, 66 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
index bbbb5bb2a2ee1..9ab52d27a61d9 100644
--- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -8,10 +8,13 @@
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
+#include <getopt.h>
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
+#include "svm_util.h"
+#include "vmx.h"
#define NR_ITERATIONS 500
@@ -99,6 +102,36 @@ static void guest_access_memory(void *arg)
}
}
+static void l1_svm_code(struct svm_test_data *svm)
+{
+ generic_svm_setup(svm, guest_access_memory);
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT(false);
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx)
+{
+ GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+ GUEST_ASSERT(load_vmcs(vmx));
+ prepare_vmcs(vmx, guest_access_memory);
+
+ /* Ignore any #PF */
+ GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, BIT(PF_VECTOR)));
+ GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0));
+ GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, -1));
+
+ GUEST_ASSERT(!vmlaunch());
+ GUEST_ASSERT(false);
+}
+
+static void l1_guest_code(void *test_data)
+{
+ if (this_cpu_has(X86_FEATURE_SVM))
+ l1_svm_code(test_data);
+ else
+ l1_vmx_code(test_data);
+}
+
static void *sigusr_thread_fn(void *arg)
{
pthread_t vcpu_thread = (pthread_t)arg;
@@ -126,6 +159,25 @@ static void vcpu_sigusr_ignore(void)
sigaction(SIGUSR1, &sa, NULL);
}
+static bool parse_args_nested(int argc, char *argv[])
+{
+ bool nested = false;
+ int opt;
+
+ while ((opt = getopt(argc, argv, "n")) != -1) {
+ switch (opt) {
+ case 'n':
+ nested = true;
+ break;
+ default:
+ printf("Usage: %s [-n]\n", argv[0]);
+ exit(1);
+ }
+ }
+
+ return nested;
+}
+
int main(int argc, char *argv[])
{
struct kvm_x86_state *state;
@@ -136,12 +188,24 @@ int main(int argc, char *argv[])
struct kvm_vm *vm;
struct ucall uc;
u64 *pgtable;
+ bool nested;
gva_t gva;
u64 pte;
- vm = vm_create_with_one_vcpu(&vcpu, guest_access_memory);
+ nested = parse_args_nested(argc, argv);
+
+ vm = vm_create_with_one_vcpu(&vcpu, nested ? l1_guest_code : guest_access_memory);
vm_install_exception_handler(vm, PF_VECTOR, guest_pf_handler);
+ if (nested) {
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
+ if (kvm_cpu_has(X86_FEATURE_SVM))
+ vcpu_alloc_svm(vm, &gva);
+ else
+ vcpu_alloc_vmx(vm, &gva);
+ vcpu_args_set(vcpu, 1, gva);
+ }
+
pte_present_mask = PTE_PRESENT_MASK(&vm->mmu);
pte_huge_mask = PTE_HUGE_MASK(&vm->mmu);
sync_global_to_guest(vm, pte_present_mask);
@@ -216,7 +280,7 @@ int main(int argc, char *argv[])
sync_global_from_guest(vm, guest_faults);
TEST_ASSERT(guest_faults > 0, "No guest page faults triggered");
- pr_info("Guest page faults: %lu\n", guest_faults);
+ pr_info("Guest page faults%s: %lu\n", nested ? " (in L2)" : "", guest_faults);
pthread_cancel(sigusr_thread);
pthread_join(sigusr_thread, NULL);
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 10/10] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
` (8 preceding siblings ...)
2026-06-29 18:37 ` [PATCH v3 09/10] KVM: selftests: Support running stress save+restore and #PF test in L2 Yosry Ahmed
@ 2026-06-29 18:37 ` Yosry Ahmed
9 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 18:37 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Extend the testing coverage in L2 by injecting a #UD into the vCPU every
other iteration during restore and having L1 intercept #UD, essentially
forcing an L2 -> L1 VM-Exit directly after save+restore.
With this change, the test reliably reproduces the CR2 bug fixed by
commit 5c247d08bc81 ("KVM: nSVM: Use vcpu->arch.cr2 when updating vmcb12
on nested #VMEXIT") -- at least on Milan, Genoa, and Turin CPUs.
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
.../kvm/x86/stress_save_restore_pf_test.c | 47 +++++++++++++++++--
1 file changed, 42 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
index 9ab52d27a61d9..2b76e56f744e7 100644
--- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
+++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
@@ -105,8 +105,12 @@ static void guest_access_memory(void *arg)
static void l1_svm_code(struct svm_test_data *svm)
{
generic_svm_setup(svm, guest_access_memory);
- run_guest(svm->vmcb, svm->vmcb_gpa);
- GUEST_ASSERT(false);
+ svm->vmcb->control.intercept_exceptions |= BIT(UD_VECTOR);
+
+ while (1) {
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, (SVM_EXIT_EXCP_BASE + UD_VECTOR));
+ }
}
static void l1_vmx_code(struct vmx_pages *vmx)
@@ -115,13 +119,17 @@ static void l1_vmx_code(struct vmx_pages *vmx)
GUEST_ASSERT(load_vmcs(vmx));
prepare_vmcs(vmx, guest_access_memory);
- /* Ignore any #PF */
- GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, BIT(PF_VECTOR)));
+ /* Intercept UD, ignore any #PF */
+ GUEST_ASSERT(!vmwrite(EXCEPTION_BITMAP, BIT(UD_VECTOR) | BIT(PF_VECTOR)));
GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0));
GUEST_ASSERT(!vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, -1));
GUEST_ASSERT(!vmlaunch());
- GUEST_ASSERT(false);
+ while (1) {
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_INFO) & 0xff, UD_VECTOR);
+ GUEST_ASSERT(!vmresume());
+ }
}
static void l1_guest_code(void *test_data)
@@ -159,6 +167,24 @@ static void vcpu_sigusr_ignore(void)
sigaction(SIGUSR1, &sa, NULL);
}
+static bool vcpu_state_is_guest_mode(struct kvm_x86_state *state)
+{
+ return !!(state->nested.flags & KVM_STATE_NESTED_GUEST_MODE);
+}
+
+static void vcpu_state_inject_ud(struct kvm_x86_state *state)
+{
+ if (state->events.exception.pending || state->events.exception.injected)
+ return;
+
+ state->events.flags |= KVM_VCPUEVENT_VALID_PAYLOAD;
+ state->events.exception.pending = true;
+ state->events.exception.injected = false;
+ state->events.exception.nr = UD_VECTOR;
+ state->events.exception.has_error_code = false;
+ state->events.exception_has_payload = false;
+}
+
static bool parse_args_nested(int argc, char *argv[])
{
bool nested = false;
@@ -192,10 +218,13 @@ int main(int argc, char *argv[])
gva_t gva;
u64 pte;
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXCEPTION_PAYLOAD));
+
nested = parse_args_nested(argc, argv);
vm = vm_create_with_one_vcpu(&vcpu, nested ? l1_guest_code : guest_access_memory);
vm_install_exception_handler(vm, PF_VECTOR, guest_pf_handler);
+ vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
if (nested) {
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
@@ -270,8 +299,16 @@ int main(int argc, char *argv[])
state = vcpu_save_state(vcpu);
+ /*
+ * If the vCPU is in guest mode, inject a #UD to trigger an
+ * L2->L1 VM-Exit every other iteration.
+ */
+ if (nested && vcpu_state_is_guest_mode(state) && count % 2 == 0)
+ vcpu_state_inject_ud(state);
+
kvm_vm_release(vm);
vcpu = vm_recreate_with_one_vcpu(vm);
+ vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
vcpu_load_state(vcpu, state);
kvm_x86_state_cleanup(state);
--
2.55.0.rc0.799.gd6f94ed593-goog
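The injection policy in this patch (queue a #UD only when the saved state is in guest mode, only on even iterations, and only if no exception is already pending or injected) can be modeled outside of KVM. The sketch below uses hypothetical stand-in structs (`saved_state`, `saved_exception`) rather than the real `struct kvm_x86_state` or `struct kvm_vcpu_events`, and leaves out the payload flag handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UD_VECTOR 6

/* Hypothetical, trimmed-down stand-in for the saved exception fields. */
struct saved_exception {
	bool pending;
	bool injected;
	uint8_t nr;
	bool has_error_code;
};

/* Hypothetical stand-in for the saved vCPU state. */
struct saved_state {
	bool guest_mode;	/* vCPU was running L2 when it was saved */
	struct saved_exception exception;
};

/* Queue a #UD unless another exception is already in flight. */
static bool queue_ud(struct saved_state *state)
{
	assert(state);

	if (state->exception.pending || state->exception.injected)
		return false;

	state->exception.pending = true;
	state->exception.injected = false;
	state->exception.nr = UD_VECTOR;
	state->exception.has_error_code = false;
	return true;
}

/* Apply the "every other iteration, only while in L2" policy. */
static bool maybe_inject_ud(struct saved_state *state, bool nested,
			    int iteration)
{
	if (!nested || !state->guest_mode || iteration % 2 != 0)
		return false;

	return queue_ud(state);
}
```

Skipping the injection when an exception is already pending or injected keeps the restored state from silently dropping an event the guest was about to receive.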
* Re: [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test
2026-06-29 18:37 ` [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
@ 2026-06-29 18:48 ` sashiko-bot
2026-06-29 20:29 ` Yosry Ahmed
0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-29 18:48 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The aggressive 1ms signal interval can starve the guest and cause the test to deterministically fail on slower or nested environments.
--
commit 4750524a9d34a97e2957bdf96a22b57c5404cb65
Author: Yosry Ahmed <yosry@kernel.org>
KVM: selftests: Trigger save+restore randomly in the #PF stress test
This patch introduces a background thread in the page fault stress test that
sends SIGUSR1 to the vCPU thread to trigger exits to userspace. This allows
exercising race conditions between page fault handling in the guest and
save/restore paths in the host.
> diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
> index 1b6f64bbcf937..bbbb5bb2a2ee1 100644
> --- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
> +++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
[ ... ]
> @@ -94,15 +96,41 @@ static void guest_access_memory(void *arg)
> +static void *sigusr_thread_fn(void *arg)
> +{
> + pthread_t vcpu_thread = (pthread_t)arg;
> +
> + for (;;) {
> + pthread_testcancel();
> + pthread_kill(vcpu_thread, SIGUSR1);
> + usleep(msecs_to_usecs(1));
> + }
> + return NULL;
> +}
[ ... ]
> @@ -151,18 +179,30 @@ int main(int argc, char *argv[])
> while (count++ < NR_ITERATIONS) {
> + /*
> + * Only handle SIGUSR while the vCPU is running, otherwise
> + * ignore it to avoid interrupting other ioctls/syscalls.
> + */
> + vcpu_sigusr_listen();
> r = __vcpu_run(vcpu);
> - TEST_ASSERT(!r, "vcpu_run failed");
> - TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> -
> - get_ucall(vcpu, &uc);
> - if (uc.cmd == UCALL_ABORT) {
> + if (r == -1)
> + TEST_ASSERT_EQ(errno, EINTR);
> + vcpu_sigusr_ignore();
[Severity: Medium]
Can the aggressive 1ms signal interval from sigusr_thread_fn starve the guest
and cause deterministic test failures in slower environments?
The first time __vcpu_run executes for a newly created VM, it performs
expensive kernel-side initializations like setting up the shadow MMU. If this
setup takes longer than 1ms (which is probable in nested or heavily loaded CI
environments), the background thread will send SIGUSR1 before the guest
actually executes.
When KVM detects the pending signal, it will abort the entry and return
-EINTR. Because main catches -EINTR and immediately proceeds to save the
unchanged state and recreate the VM for the next iteration, the test can get
stuck in a cycle where the guest never makes forward progress.
After exhausting all iterations doing nothing but VM recreations, the test
would then fail the guest_faults > 0 assertion at the end. Could we verify
that the guest actually ran and made forward progress before destroying the
VM, or perhaps increase the signal interval?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260629183746.699840-1-yosry@kernel.org?part=8
* Re: [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX
2026-06-29 18:37 ` [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
@ 2026-06-29 18:49 ` sashiko-bot
2026-06-29 20:26 ` Yosry Ahmed
0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-29 18:49 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The newly introduced VMX_SWITCH_GPRS_ASM logic makes vmlaunch and vmresume thread-unsafe, corrupting L1 host and L2 guest register state in multi-vCPU nested tests.
--
commit f53344a36faa652dd7b5c7d1e749a4bffbc5242b
Author: Yosry Ahmed <yosry@kernel.org>
KVM: selftests: Reuse GPR switching logic for nVMX
This commit introduces VMX_SWITCH_GPRS_ASM to reuse the GPR switching logic
from SVM for nVMX. It replaces the push/pop of registers with
VMX_SWITCH_GPRS_ASM during vmlaunch and vmresume operations for both standard
and enlightened VMCS.
> diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
> index 4bcfd60e3aecb..a808dc21c9f21 100644
> --- a/tools/testing/selftests/kvm/include/x86/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86/vmx.h
[ ... ]
> @@ -373,34 +387,23 @@ static inline int vmlaunch(void)
> if (enable_evmcs)
> return evmcs_vmlaunch();
>
> - __asm__ __volatile__("push %%rbp;"
> - "push %%rcx;"
> - "push %%rdx;"
> - "push %%rsi;"
> - "push %%rdi;"
> - "push $0;"
> + __asm__ __volatile__("push $0;"
> "vmwrite %%rsp, %[host_rsp];"
> "lea 1f(%%rip), %%rax;"
> "vmwrite %%rax, %[host_rip];"
> + VMX_SWITCH_GPRS_ASM
> "vmlaunch;"
[Severity: Medium]
Will this replacement make vmlaunch() thread-unsafe in multi-vCPU nested
tests?
Since VMX_SWITCH_GPRS_ASM uses GUEST_SWITCH_GPR_ASM which operates on a
single shared guest_regs array in L1 memory, concurrent executions by
multiple vCPUs (e.g., via memstress_l2_guest_entry) could clobber the saved
host state and L2 initial state for all but one vCPU.
The previous push/pop sequence was thread-safe since it used the stack.
Does this same thread-safety issue apply to vmresume() and the evmcs variants
that also adopted this macro?
> "incq (%%rsp);"
> - "1: pop %%rax;"
> - "pop %%rdi;"
> - "pop %%rsi;"
> - "pop %%rdx;"
> - "pop %%rcx;"
> - "pop %%rbp;"
> + "1: ;"
> + VMX_SWITCH_GPRS_ASM
> + "pop %%rax;"
> : [ret]"=&a"(ret)
> : [host_rsp]"r"((u64)HOST_RSP),
> [host_rip]"r"((u64)HOST_RIP)
> - : "memory", "cc", "rbx", "r8", "r9", "r10",
> - "r11", "r12", "r13", "r14", "r15");
> + : "memory", "cc");
> return ret;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260629183746.699840-1-yosry@kernel.org?part=5
* Re: [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX
2026-06-29 18:49 ` sashiko-bot
@ 2026-06-29 20:26 ` Yosry Ahmed
0 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 20:26 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On Mon, Jun 29, 2026 at 06:49:31PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] The newly introduced VMX_SWITCH_GPRS_ASM logic makes vmlaunch and vmresume thread-unsafe, corrupting L1 host and L2 guest register state in multi-vCPU nested tests.
> --
>
> commit f53344a36faa652dd7b5c7d1e749a4bffbc5242b
> Author: Yosry Ahmed <yosry@kernel.org>
>
> KVM: selftests: Reuse GPR switching logic for nVMX
>
> This commit introduces VMX_SWITCH_GPRS_ASM to reuse the GPR switching logic
> from SVM for nVMX. It replaces the push/pop of registers with
> VMX_SWITCH_GPRS_ASM during vmlaunch and vmresume operations for both standard
> and enlightened VMCS.
>
> > diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
> > index 4bcfd60e3aecb..a808dc21c9f21 100644
> > --- a/tools/testing/selftests/kvm/include/x86/vmx.h
> > +++ b/tools/testing/selftests/kvm/include/x86/vmx.h
> [ ... ]
> > @@ -373,34 +387,23 @@ static inline int vmlaunch(void)
> > if (enable_evmcs)
> > return evmcs_vmlaunch();
> >
> > - __asm__ __volatile__("push %%rbp;"
> > - "push %%rcx;"
> > - "push %%rdx;"
> > - "push %%rsi;"
> > - "push %%rdi;"
> > - "push $0;"
> > + __asm__ __volatile__("push $0;"
> > "vmwrite %%rsp, %[host_rsp];"
> > "lea 1f(%%rip), %%rax;"
> > "vmwrite %%rax, %[host_rip];"
> > + VMX_SWITCH_GPRS_ASM
> > "vmlaunch;"
>
> [Severity: Medium]
> Will this replacement make vmlaunch() thread-unsafe in multi-vCPU nested
> tests?
>
> Since VMX_SWITCH_GPRS_ASM uses GUEST_SWITCH_GPR_ASM which operates on a
> single shared guest_regs array in L1 memory, concurrent executions by
> multiple vCPUs (e.g., via memstress_l2_guest_entry) could clobber the saved
> host state and L2 initial state for all but one vCPU.
>
> The previous push/pop sequence was thread-safe since it used the stack.
> Does this same thread-safety issue apply to vmresume() and the evmcs variants
> that also adopted this macro?
Yes, I should have explicitly called this out. This replaces the
apparently thread-safe push/pop sequence with the GPR switching logic
used by SVM, which isn't thread-safe at all (it uses a global array).
However, this is still an improvement imo because:
- The logic here is half-baked and prompts the UCALL clobber hack as it
doesn't properly save/restore everything. Reusing the GPR switching
logic used by SVM allows us to drop that hack.
- It's more likely (imo) that we hit a problem due to the half-baked
GPR save/restore logic than due to the lack of thread-safety. This is
evidenced by the fact that the stress test added in this series fails
on VMX with the existing push/pop sequence. OTOH, no tests are failing
on SVM due to lack of thread-safety. There is only one test that uses
more than one vCPU with nested anyway (the memstress test).
I think the logical next step is to move the GPR array to be per-vCPU,
making it thread-safe for both VMX and SVM in a proper way, but this is
outside the scope of this series.
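As an illustration only (not part of this series), the difference between one global GPR array and a per-vCPU array can be sketched as below. `MAX_VCPUS`, `vcpu_regs()`, `save_gprs()` and the value of `NR_GUEST_REGS` are hypothetical, and how guest code would learn its own vCPU index is left open.

```c
#include <assert.h>
#include <stdint.h>

#define NR_GUEST_REGS	16	/* assumed value, for illustration */
#define MAX_VCPUS	8	/* hypothetical limit */

/* One shared slot: the last vCPU to save wins. */
static uint64_t shared_regs[NR_GUEST_REGS];

/* One slot per vCPU: saves from different vCPUs never overlap. */
static uint64_t per_vcpu_regs[MAX_VCPUS][NR_GUEST_REGS];

/* Return the GPR save area that belongs to the given vCPU. */
static uint64_t *vcpu_regs(unsigned int vcpu_id)
{
	assert(vcpu_id < MAX_VCPUS);
	return per_vcpu_regs[vcpu_id];
}

/* Model of "stash the live GPRs before switching to the other context". */
static void save_gprs(uint64_t *slot, const uint64_t *live)
{
	for (int i = 0; i < NR_GUEST_REGS; i++)
		slot[i] = live[i];
}
```

With the shared array, two vCPUs saving back to back leave only the second vCPU's values behind; with the per-vCPU array, each vCPU reads back what it saved.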
>
> > "incq (%%rsp);"
> > - "1: pop %%rax;"
> > - "pop %%rdi;"
> > - "pop %%rsi;"
> > - "pop %%rdx;"
> > - "pop %%rcx;"
> > - "pop %%rbp;"
> > + "1: ;"
> > + VMX_SWITCH_GPRS_ASM
> > + "pop %%rax;"
> > : [ret]"=&a"(ret)
> > : [host_rsp]"r"((u64)HOST_RSP),
> > [host_rip]"r"((u64)HOST_RIP)
> > - : "memory", "cc", "rbx", "r8", "r9", "r10",
> > - "r11", "r12", "r13", "r14", "r15");
> > + : "memory", "cc");
> > return ret;
> > }
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260629183746.699840-1-yosry@kernel.org?part=5
* Re: [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test
2026-06-29 18:48 ` sashiko-bot
@ 2026-06-29 20:29 ` Yosry Ahmed
0 siblings, 0 replies; 15+ messages in thread
From: Yosry Ahmed @ 2026-06-29 20:29 UTC (permalink / raw)
To: sashiko-reviews; +Cc: kvm
On Mon, Jun 29, 2026 at 06:48:37PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] The aggressive 1ms signal interval can starve the guest and cause the test to deterministically fail on slower or nested environments.
> --
>
> commit 4750524a9d34a97e2957bdf96a22b57c5404cb65
> Author: Yosry Ahmed <yosry@kernel.org>
>
> KVM: selftests: Trigger save+restore randomly in the #PF stress test
>
> This patch introduces a background thread in the page fault stress test that
> sends SIGUSR1 to the vCPU thread to trigger exits to userspace. This allows
> exercising race conditions between page fault handling in the guest and
> save/restore paths in the host.
>
> > diff --git a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
> > index 1b6f64bbcf937..bbbb5bb2a2ee1 100644
> > --- a/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
> > +++ b/tools/testing/selftests/kvm/x86/stress_save_restore_pf_test.c
>
> [ ... ]
>
> > @@ -94,15 +96,41 @@ static void guest_access_memory(void *arg)
> > +static void *sigusr_thread_fn(void *arg)
> > +{
> > + pthread_t vcpu_thread = (pthread_t)arg;
> > +
> > + for (;;) {
> > + pthread_testcancel();
> > + pthread_kill(vcpu_thread, SIGUSR1);
> > + usleep(msecs_to_usecs(1));
> > + }
> > + return NULL;
> > +}
>
> [ ... ]
>
> > @@ -151,18 +179,30 @@ int main(int argc, char *argv[])
> > while (count++ < NR_ITERATIONS) {
> > + /*
> > + * Only handle SIGUSR while the vCPU is running, otherwise
> > + * ignore it to avoid interrupting other ioctls/syscalls.
> > + */
> > + vcpu_sigusr_listen();
> > r = __vcpu_run(vcpu);
> > - TEST_ASSERT(!r, "vcpu_run failed");
> > - TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> > -
> > - get_ucall(vcpu, &uc);
> > - if (uc.cmd == UCALL_ABORT) {
> > + if (r == -1)
> > + TEST_ASSERT_EQ(errno, EINTR);
> > + vcpu_sigusr_ignore();
>
> [Severity: Medium]
> Can the aggressive 1ms signal interval from sigusr_thread_fn starve the guest
> and cause deterministic test failures in slower environments?
>
> The first time __vcpu_run executes for a newly created VM, it performs
> expensive kernel-side initializations like setting up the shadow MMU. If this
> setup takes longer than 1ms (which is probable in nested or heavily loaded CI
> environments), the background thread will send SIGUSR1 before the guest
> actually executes.
>
> When KVM detects the pending signal, it will abort the entry and return
> -EINTR. Because main catches -EINTR and immediately proceeds to save the
> unchanged state and recreate the VM for the next iteration, the test can get
> stuck in a cycle where the guest never makes forward progress.
>
> After exhausting all iterations doing nothing but VM recreations, the test
> would then fail the guest_faults > 0 assertion at the end. Could we verify
> that the guest actually ran and made forward progress before destroying the
> VM, or perhaps increase the signal interval?
The signal interval was initially 100us and that did cause problems on
slower kernels (with DEBUG options or ASAN). With 1 msec, we don't
observe any problems internally with DEBUG or ASAN builds on a variety
of platforms. So I think we are good with 1 msec.
Increasing the interval too much will probably make the test less
effective, as the signal is less likely to hit in interesting race
windows.
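The signalling scheme discussed in this thread (a helper thread sending SIGUSR1 every 1 msec, with the signal only handled around the interruptible call and ignored otherwise) can be sketched outside of KVM. In this sketch nanosleep() stands in for the KVM_RUN ioctl, and `blocking_call()`, `sigusr1_set_handler()` and `count_interrupted_runs()` are hypothetical helpers, not selftest APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Empty handler: its only job is to make blocking calls fail with EINTR. */
static void noop_handler(int signo)
{
	(void)signo;
}

static void sigusr1_set_handler(void (*handler)(int))
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;	/* no SA_RESTART, so EINTR is visible */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);
}

/* Stand-in for KVM_RUN: returns -1 with errno == EINTR when signalled. */
static int blocking_call(time_t seconds)
{
	struct timespec ts = { .tv_sec = seconds, .tv_nsec = 0 };

	return nanosleep(&ts, NULL);
}

static void *signal_thread_fn(void *arg)
{
	pthread_t target = *(pthread_t *)arg;
	struct timespec interval = { .tv_sec = 0, .tv_nsec = 1000000L };

	for (;;) {
		pthread_testcancel();
		pthread_kill(target, SIGUSR1);
		nanosleep(&interval, NULL);	/* 1 msec between signals */
	}
	return NULL;
}

/* Run the loop and count how many blocking calls were cut short. */
static int count_interrupted_runs(int nr_iterations)
{
	pthread_t self = pthread_self();
	pthread_t signal_thread;
	int interrupted = 0;

	assert(nr_iterations > 0);

	sigusr1_set_handler(SIG_IGN);
	if (pthread_create(&signal_thread, NULL, signal_thread_fn, &self))
		return -1;

	for (int i = 0; i < nr_iterations; i++) {
		int r, was_eintr;

		/* Only let the signal interrupt the blocking call itself. */
		sigusr1_set_handler(noop_handler);
		r = blocking_call(5);
		was_eintr = (r == -1 && errno == EINTR);
		sigusr1_set_handler(SIG_IGN);

		if (was_eintr)
			interrupted++;
		/* A save+restore cycle would go here, signal ignored. */
	}

	pthread_cancel(signal_thread);
	pthread_join(signal_thread, NULL);
	return interrupted;
}
```

A shorter interval interrupts the call at more varied points but leaves less time for the interrupted work to make progress, which is the trade-off described above.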
Thread overview: 15+ messages
2026-06-29 18:37 [PATCH v3 00/10] KVM: selftests: Stress save+restore and #PF (ft. nested) Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 01/10] KVM: selftests: Move STR() and XSTR() definitions to test_util.h Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 02/10] KVM: selftests: Fix RAX and RFLAGS VMCB offsets when running L2 Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 03/10] KVM: selftests: Use an array for guest_regs (and fix offsets) Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 04/10] KVM: selftests: Move GPR load/save definitions outside of nSVM code Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 05/10] KVM: selftests: Reuse GPR switching logic for nVMX Yosry Ahmed
2026-06-29 18:49 ` sashiko-bot
2026-06-29 20:26 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 06/10] KVM: selftests: Drop HORRIFIC_L2_UCALL_CLOBBER_HACK Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 07/10] KVM: selftests: Add basic stress test for save+restore and #PF handling Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 08/10] KVM: selftests: Trigger save+restore randomly in the #PF stress test Yosry Ahmed
2026-06-29 18:48 ` sashiko-bot
2026-06-29 20:29 ` Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 09/10] KVM: selftests: Support running stress save+restore and #PF test in L2 Yosry Ahmed
2026-06-29 18:37 ` [PATCH v3 10/10] KVM: selftests: Trigger L2->L1 exits stress save+restore and #PF test Yosry Ahmed