Kernel KVM virtualization development
* [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions
@ 2026-05-14 21:04 Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 01/20] x86/vmx: Drop unused SYSENTER "support" in nested VMX infrastructure Sean Christopherson
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Finally re-posting Mathias' work to provide better backtraces on x86, which
was blocked by a fatal bug in the nVMX SIPI test that was exposed by the
stack frame side effects.  This has been sitting on my local system(s) for
something like 5 months, and I all but forgot I hadn't posted it.

Most of this series is cleaning up the nVMX and nSVM infrastructure to play
nice with multi-CPU tests.  Then, to fix the race in the problematic SIPI
test, abuse the VM-Entry MSR load list to atomically detect VM-Enter (I'm
still proud of that hack).

For gory details:

https://lore.kernel.org/all/3bac29b9-4c49-4e5d-997e-9e4019a2fceb@grsecurity.net

Mathias Krause (3):
  x86/vmx: Initialize test stage in SIPI test *before* launching AP
    thread
  x86: Better backtraces for leaf functions
  x86: Prevent realmode test code instrumentation with nop-mcount

Sean Christopherson (17):
  x86/vmx: Drop unused SYSENTER "support" in nested VMX infrastructure
  x86/vmx: Drop unused guest_regs "support" in nested VMX infrastructure
  x86/svm: Sort (and swap) GPRs by their index, not alphabetically
  x86: Dedup guest/host context switch of registers across SVM and VMX
  x86/virt: Use macro shenanigans to get reg offsets when swapping
    guest/host regs
  x86/virt: Track "guest regs" using per-CPU variable
  x86/svm: Don't VMLOAD/VMSAVE "guest" state around VMRUN
  x86/vmx: Use separate VMCSes for BSP vs. AP in INIT test
  x86/vmx: Swap GPRs after checking "launched" status
  x86/vmx: Track VMCS "launched" state per-CPU
  x86/vmx: Track "is this CPU in guest mode" per-CPU
  x86/vmx: Communicate hypercalls via RAX, not a global field
  x86/kvmclock: Replace spaces with tabs
  x86/kvmclock: Skip kvmclock test when not running on KVM with
    CLOCKSOURCE2
  x86/vmx: Tag "struct vmx_msr_entry" as needing to be 16-byte aligned
  x86/smp: Align the stack to a 16-byte boundary when invoking SMP
    function calls
  x86/vmx: Write to KVM's WALL_CLOCK MSR via VM-Entry load list sync in
    SIPI test

 lib/x86/processor.h |  15 +++
 lib/x86/smp.c       |  21 ++++-
 lib/x86/smp.h       |  32 +++++++
 lib/x86/virt.h      |  61 ++++++++++++
 x86/Makefile.common |  14 +++
 x86/kvmclock.c      |  42 ++++-----
 x86/kvmclock.h      |   2 +
 x86/kvmclock_test.c | 225 +++++++++++++++++++++++---------------------
 x86/realmode.c      |   3 +
 x86/svm.c           |  19 ++--
 x86/svm.h           |  61 ++----------
 x86/svm_tests.c     |   5 +-
 x86/vmx.c           | 121 +++++++++++-------------
 x86/vmx.h           |  72 +-------------
 x86/vmx_tests.c     | 134 ++++++++++++++++----------
 15 files changed, 443 insertions(+), 384 deletions(-)
 create mode 100644 lib/x86/virt.h


base-commit: 4d60e2429d63dc0c24990114a8afc89e86c187cc
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 01/20] x86/vmx: Drop unused SYSENTER "support" in nested VMX infrastructure
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 02/20] x86/vmx: Drop unused guest_regs " Sean Christopherson
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Drop the unused SYSENTER "support" from the nested VMX infrastructure, in
quotes because the code is half-baked (and that's being generous) and has
likely never been used.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx.c       | 32 +++++---------------------------
 x86/vmx.h       | 20 --------------------
 x86/vmx_tests.c | 45 +++++++++++++++++++++------------------------
 3 files changed, 26 insertions(+), 71 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index eb2965d8..2b85ef0b 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -42,7 +42,7 @@
 u64 *bsp_vmxon_region;
 struct vmcs *vmcs_root;
 u32 vpid_cnt;
-u64 guest_stack_top, guest_syscall_stack_top;
+u64 guest_stack_top;
 u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
 struct regs regs;
 
@@ -76,7 +76,6 @@ union vmx_ept_vpid  ept_vpid;
 extern struct descriptor_table_ptr gdt_descr;
 extern struct descriptor_table_ptr idt_descr;
 extern void *vmx_return;
-extern void *entry_sysenter;
 extern void *guest_entry;
 
 static volatile u32 stage;
@@ -561,25 +560,6 @@ void vmx_inc_test_stage(void)
 	barrier();
 }
 
-/* entry_sysenter */
-asm(
-	".align	4, 0x90\n\t"
-	".globl	entry_sysenter\n\t"
-	"entry_sysenter:\n\t"
-	SAVE_GPR
-	"	and	$0xf, %rax\n\t"
-	"	mov	%rax, %rdi\n\t"
-	"	call	syscall_handler\n\t"
-	LOAD_GPR
-	"	vmresume\n\t"
-);
-
-static void __attribute__((__used__)) syscall_handler(u64 syscall_no)
-{
-	if (current->syscall_handler)
-		current->syscall_handler(syscall_no);
-}
-
 static const char * const exit_reason_descriptions[] = {
 	[VMX_EXC_NMI]		= "VMX_EXC_NMI",
 	[VMX_EXTINT]		= "VMX_EXTINT",
@@ -1123,7 +1103,7 @@ static void init_vmcs_host(void)
 	vmcs_write(HOST_CR0, read_cr0());
 	vmcs_write(HOST_CR3, read_cr3());
 	vmcs_write(HOST_CR4, read_cr4());
-	vmcs_write(HOST_SYSENTER_EIP, (u64)(&entry_sysenter));
+	vmcs_write(HOST_SYSENTER_EIP, rdmsr(MSR_IA32_SYSENTER_EIP));
 	vmcs_write(HOST_SYSENTER_CS,  KERNEL_CS);
 	if (ctrl_exit_rev.clr & EXI_LOAD_PAT)
 		vmcs_write(HOST_PAT, rdmsr(MSR_IA32_CR_PAT));
@@ -1172,8 +1152,8 @@ static void init_vmcs_guest(void)
 	vmcs_write(GUEST_CR3, guest_cr3);
 	vmcs_write(GUEST_CR4, guest_cr4);
 	vmcs_write(GUEST_SYSENTER_CS,  KERNEL_CS);
-	vmcs_write(GUEST_SYSENTER_ESP, guest_syscall_stack_top);
-	vmcs_write(GUEST_SYSENTER_EIP, (u64)(&entry_sysenter));
+	vmcs_write(GUEST_SYSENTER_ESP, rdmsr(MSR_IA32_SYSENTER_ESP));
+	vmcs_write(GUEST_SYSENTER_EIP, rdmsr(MSR_IA32_SYSENTER_EIP));
 	vmcs_write(GUEST_DR7, 0);
 	vmcs_write(GUEST_EFER, rdmsr(MSR_EFER));
 
@@ -1319,7 +1299,6 @@ static void alloc_bsp_vmx_pages(void)
 {
 	bsp_vmxon_region = alloc_page();
 	guest_stack_top = (uintptr_t)alloc_page() + PAGE_SIZE;
-	guest_syscall_stack_top = (uintptr_t)alloc_page() + PAGE_SIZE;
 	vmcs_root = alloc_page();
 }
 
@@ -1840,8 +1819,7 @@ static int test_run(struct vmx_test *test)
 	/* Validate V2 interface. */
 	if (test->v2) {
 		int ret = 0;
-		if (test->init || test->guest_main || test->exit_handler ||
-		    test->syscall_handler) {
+		if (test->init || test->guest_main || test->exit_handler) {
 			report_fail("V2 test cannot specify V1 callbacks.");
 			ret = 1;
 		}
diff --git a/x86/vmx.h b/x86/vmx.h
index 7ad7672a..f4ed5339 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -121,7 +121,6 @@ struct vmx_test {
 	int (*init)(struct vmcs *vmcs);
 	void (*guest_main)(void);
 	int (*exit_handler)(union exit_reason exit_reason);
-	void (*syscall_handler)(u64 syscall_no);
 	struct regs guest_regs;
 	int (*entry_failure_handler)(struct vmentry_result *result);
 	struct vmcs *vmcs;
@@ -589,25 +588,6 @@ enum vm_entry_failure_code {
 	ENTRY_FAIL_VMCS_LINK_PTR	= 4,
 };
 
-#define SAVE_GPR				\
-	"xchg %rax, regs\n\t"			\
-	"xchg %rcx, regs+0x8\n\t"		\
-	"xchg %rdx, regs+0x10\n\t"		\
-	"xchg %rbx, regs+0x18\n\t"		\
-	"xchg %rbp, regs+0x28\n\t"		\
-	"xchg %rsi, regs+0x30\n\t"		\
-	"xchg %rdi, regs+0x38\n\t"		\
-	"xchg %r8, regs+0x40\n\t"		\
-	"xchg %r9, regs+0x48\n\t"		\
-	"xchg %r10, regs+0x50\n\t"		\
-	"xchg %r11, regs+0x58\n\t"		\
-	"xchg %r12, regs+0x60\n\t"		\
-	"xchg %r13, regs+0x68\n\t"		\
-	"xchg %r14, regs+0x70\n\t"		\
-	"xchg %r15, regs+0x78\n\t"
-
-#define LOAD_GPR	SAVE_GPR
-
 #define SAVE_GPR_C				\
 	"xchg %%rax, regs\n\t"			\
 	"xchg %%rcx, regs+0x8\n\t"		\
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index ff387ded..83d88480 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -11757,40 +11757,37 @@ static void vmx_cet_test(void)
 
 #define TEST(name) { #name, .v2 = name }
 
-/* name/init/guest_main/exit_handler/syscall_handler/guest_regs */
+/* name/init/guest_main/exit_handler/guest_regs */
 struct vmx_test vmx_tests[] = {
-	{ "null", NULL, basic_guest_main, basic_exit_handler, NULL, {0} },
-	{ "vmenter", NULL, vmenter_main, vmenter_exit_handler, NULL, {0} },
+	{ "null", NULL, basic_guest_main, basic_exit_handler, {0} },
+	{ "vmenter", NULL, vmenter_main, vmenter_exit_handler, {0} },
 	{ "preemption timer", preemption_timer_init, preemption_timer_main,
-		preemption_timer_exit_handler, NULL, {0} },
+		preemption_timer_exit_handler, {0} },
 	{ "control field PAT", test_ctrl_pat_init, test_ctrl_pat_main,
-		test_ctrl_pat_exit_handler, NULL, {0} },
+		test_ctrl_pat_exit_handler, {0} },
 	{ "control field EFER", test_ctrl_efer_init, test_ctrl_efer_main,
-		test_ctrl_efer_exit_handler, NULL, {0} },
+		test_ctrl_efer_exit_handler, {0} },
 	{ "CR shadowing", NULL, cr_shadowing_main,
-		cr_shadowing_exit_handler, NULL, {0} },
+		cr_shadowing_exit_handler, {0} },
 	{ "I/O bitmap", iobmp_init, iobmp_main, iobmp_exit_handler,
-		NULL, {0} },
+		{0} },
 	{ "instruction intercept", insn_intercept_init, insn_intercept_main,
-		insn_intercept_exit_handler, NULL, {0} },
-	{ "EPT A/D disabled", ept_init, ept_main, ept_exit_handler, NULL, {0} },
-	{ "EPT A/D enabled", eptad_init, eptad_main, eptad_exit_handler, NULL, {0} },
-	{ "PML", pml_init, pml_main, pml_exit_handler, NULL, {0} },
-	{ "interrupt", interrupt_init, interrupt_main,
-		interrupt_exit_handler, NULL, {0} },
-	{ "nmi_hlt", nmi_hlt_init, nmi_hlt_main,
-		nmi_hlt_exit_handler, NULL, {0} },
-	{ "debug controls", dbgctls_init, dbgctls_main, dbgctls_exit_handler,
-		NULL, {0} },
+		insn_intercept_exit_handler, {0} },
+	{ "EPT A/D disabled", ept_init, ept_main, ept_exit_handler, {0} },
+	{ "EPT A/D enabled", eptad_init, eptad_main, eptad_exit_handler, {0} },
+	{ "PML", pml_init, pml_main, pml_exit_handler, {0} },
+	{ "interrupt", interrupt_init, interrupt_main, interrupt_exit_handler, {0} },
+	{ "nmi_hlt", nmi_hlt_init, nmi_hlt_main, nmi_hlt_exit_handler, {0} },
+	{ "debug controls", dbgctls_init, dbgctls_main, dbgctls_exit_handler, {0} },
 	{ "MSR switch", msr_switch_init, msr_switch_main,
-		msr_switch_exit_handler, NULL, {0}, msr_switch_entry_failure },
-	{ "vmmcall", vmmcall_init, vmmcall_main, vmmcall_exit_handler, NULL, {0} },
+		msr_switch_exit_handler, {0}, msr_switch_entry_failure },
+	{ "vmmcall", vmmcall_init, vmmcall_main, vmmcall_exit_handler, {0} },
 	{ "disable RDTSCP", disable_rdtscp_init, disable_rdtscp_main,
-		disable_rdtscp_exit_handler, NULL, {0} },
+		disable_rdtscp_exit_handler, {0} },
 	{ "exit_monitor_from_l2_test", NULL, exit_monitor_from_l2_main,
-		exit_monitor_from_l2_handler, NULL, {0} },
+		exit_monitor_from_l2_handler, {0} },
 	{ "invalid_msr", invalid_msr_init, invalid_msr_main,
-		invalid_msr_exit_handler, NULL, {0}, invalid_msr_entry_failure},
+		invalid_msr_exit_handler, {0}, invalid_msr_entry_failure},
 	/* Basic V2 tests. */
 	TEST(v2_null_test),
 	TEST(v2_multiple_entries_test),
@@ -11876,5 +11873,5 @@ struct vmx_test vmx_tests[] = {
 	TEST(vmx_canonical_test),
 	/* "Load CET" VM-entry/exit controls tests. */
 	TEST(vmx_cet_test),
-	{ NULL, NULL, NULL, NULL, NULL, {0} },
+	{ NULL, NULL, NULL, NULL, {0} },
 };
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 02/20] x86/vmx: Drop unused guest_regs "support" in nested VMX infrastructure
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 01/20] x86/vmx: Drop unused SYSENTER "support" in nested VMX infrastructure Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 03/20] x86/svm: Sort (and swap) GPRs by their index, not alphabetically Sean Christopherson
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Drop vmx_tests.guest_regs as no tests use the functionality, the field is
very misleading (it's only used to set the initial regs, i.e. it's not an
up-to-date snapshot), and per-test register state doesn't play nice with
tests that create and run multiple vCPUs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx.c       |  3 +--
 x86/vmx.h       |  1 -
 x86/vmx_tests.c | 44 +++++++++++++++++++++-----------------------
 3 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index 2b85ef0b..44ee3697 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -1834,6 +1834,7 @@ static int test_run(struct vmx_test *test)
 		return 1;
 	}
 
+	memset(&regs, 0, sizeof(regs));
 	init_vmcs(&(test->vmcs));
 	/* Directly call test->init is ok here, init_vmcs has done
 	   vmcs init, vmclear and vmptrld*/
@@ -1843,8 +1844,6 @@ static int test_run(struct vmx_test *test)
 	v2_guest_main = NULL;
 	test->exits = 0;
 	current = test;
-	regs = test->guest_regs;
-	vmcs_write(GUEST_RFLAGS, regs.rflags | X86_EFLAGS_FIXED);
 	launched = 0;
 	guest_finished = 0;
 	printf("\nTest suite: %s\n", test->name);
diff --git a/x86/vmx.h b/x86/vmx.h
index f4ed5339..425b1c43 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -121,7 +121,6 @@ struct vmx_test {
 	int (*init)(struct vmcs *vmcs);
 	void (*guest_main)(void);
 	int (*exit_handler)(union exit_reason exit_reason);
-	struct regs guest_regs;
 	int (*entry_failure_handler)(struct vmentry_result *result);
 	struct vmcs *vmcs;
 	int exits;
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 83d88480..e0d5e390 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -11757,37 +11757,35 @@ static void vmx_cet_test(void)
 
 #define TEST(name) { #name, .v2 = name }
 
-/* name/init/guest_main/exit_handler/guest_regs */
+/* name/init/guest_main/exit_handler/vmfail_handler */
 struct vmx_test vmx_tests[] = {
-	{ "null", NULL, basic_guest_main, basic_exit_handler, {0} },
-	{ "vmenter", NULL, vmenter_main, vmenter_exit_handler, {0} },
+	{ "null", NULL, basic_guest_main, basic_exit_handler, },
+	{ "vmenter", NULL, vmenter_main, vmenter_exit_handler, },
 	{ "preemption timer", preemption_timer_init, preemption_timer_main,
-		preemption_timer_exit_handler, {0} },
+		preemption_timer_exit_handler },
 	{ "control field PAT", test_ctrl_pat_init, test_ctrl_pat_main,
-		test_ctrl_pat_exit_handler, {0} },
+		test_ctrl_pat_exit_handler },
 	{ "control field EFER", test_ctrl_efer_init, test_ctrl_efer_main,
-		test_ctrl_efer_exit_handler, {0} },
-	{ "CR shadowing", NULL, cr_shadowing_main,
-		cr_shadowing_exit_handler, {0} },
-	{ "I/O bitmap", iobmp_init, iobmp_main, iobmp_exit_handler,
-		{0} },
+		test_ctrl_efer_exit_handler },
+	{ "CR shadowing", NULL, cr_shadowing_main, cr_shadowing_exit_handler },
+	{ "I/O bitmap", iobmp_init, iobmp_main, iobmp_exit_handler },
 	{ "instruction intercept", insn_intercept_init, insn_intercept_main,
-		insn_intercept_exit_handler, {0} },
-	{ "EPT A/D disabled", ept_init, ept_main, ept_exit_handler, {0} },
-	{ "EPT A/D enabled", eptad_init, eptad_main, eptad_exit_handler, {0} },
-	{ "PML", pml_init, pml_main, pml_exit_handler, {0} },
-	{ "interrupt", interrupt_init, interrupt_main, interrupt_exit_handler, {0} },
-	{ "nmi_hlt", nmi_hlt_init, nmi_hlt_main, nmi_hlt_exit_handler, {0} },
-	{ "debug controls", dbgctls_init, dbgctls_main, dbgctls_exit_handler, {0} },
+		insn_intercept_exit_handler },
+	{ "EPT A/D disabled", ept_init, ept_main, ept_exit_handler },
+	{ "EPT A/D enabled", eptad_init, eptad_main, eptad_exit_handler },
+	{ "PML", pml_init, pml_main, pml_exit_handler },
+	{ "interrupt", interrupt_init, interrupt_main, interrupt_exit_handler },
+	{ "nmi_hlt", nmi_hlt_init, nmi_hlt_main, nmi_hlt_exit_handler },
+	{ "debug controls", dbgctls_init, dbgctls_main, dbgctls_exit_handler },
 	{ "MSR switch", msr_switch_init, msr_switch_main,
-		msr_switch_exit_handler, {0}, msr_switch_entry_failure },
-	{ "vmmcall", vmmcall_init, vmmcall_main, vmmcall_exit_handler, {0} },
+		msr_switch_exit_handler, msr_switch_entry_failure },
+	{ "vmmcall", vmmcall_init, vmmcall_main, vmmcall_exit_handler },
 	{ "disable RDTSCP", disable_rdtscp_init, disable_rdtscp_main,
-		disable_rdtscp_exit_handler, {0} },
+		disable_rdtscp_exit_handler },
 	{ "exit_monitor_from_l2_test", NULL, exit_monitor_from_l2_main,
-		exit_monitor_from_l2_handler, {0} },
+		exit_monitor_from_l2_handler },
 	{ "invalid_msr", invalid_msr_init, invalid_msr_main,
-		invalid_msr_exit_handler, {0}, invalid_msr_entry_failure},
+		invalid_msr_exit_handler, invalid_msr_entry_failure},
 	/* Basic V2 tests. */
 	TEST(v2_null_test),
 	TEST(v2_multiple_entries_test),
@@ -11873,5 +11871,5 @@ struct vmx_test vmx_tests[] = {
 	TEST(vmx_canonical_test),
 	/* "Load CET" VM-entry/exit controls tests. */
 	TEST(vmx_cet_test),
-	{ NULL, NULL, NULL, NULL, {0} },
+	{ NULL, NULL, NULL, NULL },
 };
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 03/20] x86/svm: Sort (and swap) GPRs by their index, not alphabetically
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 01/20] x86/vmx: Drop unused SYSENTER "support" in nested VMX infrastructure Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 02/20] x86/vmx: Drop unused guest_regs " Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 04/20] x86: Dedup guest/host context switch of registers across SVM and VMX Sean Christopherson
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Sort all GPRs in the nested SVM infrastructure by their index, not
alphabetically.  This will allow sharing code between SVM and VMX for
context switching GPRs between guest and host.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/svm.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/x86/svm.h b/x86/svm.h
index 385c1289..d9f7c731 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -392,9 +392,9 @@ struct svm_test {
 
 struct regs {
 	u64 rax;
-	u64 rbx;
 	u64 rcx;
 	u64 rdx;
+	u64 rbx;
 	u64 cr2;
 	u64 rbp;
 	u64 rsi;
@@ -457,9 +457,9 @@ static inline void clgi(void)
 
 
 #define SAVE_GPR_C                              \
-        "xchg %%rbx, regs+0x8\n\t"              \
-        "xchg %%rcx, regs+0x10\n\t"             \
-        "xchg %%rdx, regs+0x18\n\t"             \
+        "xchg %%rcx, regs+0x8\n\t"              \
+        "xchg %%rdx, regs+0x10\n\t"             \
+        "xchg %%rbx, regs+0x18\n\t"             \
         "xchg %%rbp, regs+0x28\n\t"             \
         "xchg %%rsi, regs+0x30\n\t"             \
         "xchg %%rdi, regs+0x38\n\t"             \
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 04/20] x86: Dedup guest/host context switch of registers across SVM and VMX
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (2 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 03/20] x86/svm: Sort (and swap) GPRs by their index, not alphabetically Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 05/20] x86/virt: Use macro shenanigans to get reg offsets when swapping guest/host regs Sean Christopherson
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Deduplicate the context switching of registers across VM-Enter<=>VM-Exit
between SVM and VMX.  The required functionality and implementations are
practically identical; literally the only difference is that SVM doesn't
need (or want) to manually swap RAX, as the CPU automatically swaps RAX
with the VMCB at VMRUN and #VMEXIT.

Opportunistically rename the structure to "guest_regs" to clarify its
purpose, and to avoid conflicts, e.g. with realmode's "struct regs".

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/virt.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
 x86/svm.c      |  4 ++--
 x86/svm.h      | 47 ++++-----------------------------------------
 x86/vmx.c      |  8 ++++----
 x86/vmx.h      | 42 +---------------------------------------
 5 files changed, 63 insertions(+), 90 deletions(-)
 create mode 100644 lib/x86/virt.h

diff --git a/lib/x86/virt.h b/lib/x86/virt.h
new file mode 100644
index 00000000..ccc90c25
--- /dev/null
+++ b/lib/x86/virt.h
@@ -0,0 +1,52 @@
+#ifndef _x86_VIRT_H_
+#define _x86_VIRT_H_
+
+#include "libcflat.h"
+
+struct guest_regs {
+	u64 rax;
+	u64 rcx;
+	u64 rdx;
+	u64 rbx;
+	/*
+	 * Use RSP's index to hold CR3, as RSP isn't manually context switched
+	 * by software in any relevant flows.
+	 */
+	u64 cr2;
+	u64 rbp;
+	u64 rsi;
+	u64 rdi;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+	u64 rflags;
+};
+
+extern struct guest_regs regs;
+
+#define __SWAP_GPRS			\
+	"xchg %%rcx, regs+0x8\n\t"	\
+	"xchg %%rdx, regs+0x10\n\t"	\
+	"xchg %%rbx, regs+0x18\n\t"	\
+	"xchg %%rbp, regs+0x28\n\t"	\
+	"xchg %%rsi, regs+0x30\n\t"	\
+	"xchg %%rdi, regs+0x38\n\t"	\
+	"xchg %%r8, regs+0x40\n\t"	\
+	"xchg %%r9, regs+0x48\n\t"	\
+	"xchg %%r10, regs+0x50\n\t"	\
+	"xchg %%r11, regs+0x58\n\t"	\
+	"xchg %%r12, regs+0x60\n\t"	\
+	"xchg %%r13, regs+0x68\n\t"	\
+	"xchg %%r14, regs+0x70\n\t"	\
+	"xchg %%r15, regs+0x78\n\t"
+
+#define SWAP_GPRS			\
+	"xchg %%rax, regs+0x0\n\t"	\
+	__SWAP_GPRS
+
+#endif /* _x86_VIRT_H_ */
\ No newline at end of file
diff --git a/x86/svm.c b/x86/svm.c
index 941e0784..893b3f49 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -223,9 +223,9 @@ void vmcb_ident(struct vmcb *vmcb)
 	}
 }
 
-struct regs regs;
+struct guest_regs regs;
 
-struct regs get_regs(void)
+struct guest_regs get_regs(void)
 {
 	return regs;
 }
diff --git a/x86/svm.h b/x86/svm.h
index d9f7c731..a9e15f67 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -2,6 +2,7 @@
 #define X86_SVM_H
 
 #include "libcflat.h"
+#include "virt.h"
 
 enum {
 	INTERCEPT_INTR,
@@ -390,26 +391,6 @@ struct svm_test {
 	bool on_vcpu_done;
 };
 
-struct regs {
-	u64 rax;
-	u64 rcx;
-	u64 rdx;
-	u64 rbx;
-	u64 cr2;
-	u64 rbp;
-	u64 rsi;
-	u64 rdi;
-	u64 r8;
-	u64 r9;
-	u64 r10;
-	u64 r11;
-	u64 r12;
-	u64 r13;
-	u64 r14;
-	u64 r15;
-	u64 rflags;
-};
-
 typedef void (*test_guest_func)(struct svm_test *);
 
 int run_svm_tests(int ac, char **av, struct svm_test *svm_tests);
@@ -435,7 +416,7 @@ int get_test_stage(struct svm_test *test);
 void set_test_stage(struct svm_test *test, int s);
 void inc_test_stage(struct svm_test *test);
 void vmcb_ident(struct vmcb *vmcb);
-struct regs get_regs(void);
+struct guest_regs get_regs(void);
 void vmmcall(void);
 void svm_setup_vmrun(u64 rip);
 u64 __svm_vmrun(u64 rip);
@@ -454,36 +435,16 @@ static inline void clgi(void)
     asm volatile ("clgi");
 }
 
-
-
-#define SAVE_GPR_C                              \
-        "xchg %%rcx, regs+0x8\n\t"              \
-        "xchg %%rdx, regs+0x10\n\t"             \
-        "xchg %%rbx, regs+0x18\n\t"             \
-        "xchg %%rbp, regs+0x28\n\t"             \
-        "xchg %%rsi, regs+0x30\n\t"             \
-        "xchg %%rdi, regs+0x38\n\t"             \
-        "xchg %%r8, regs+0x40\n\t"              \
-        "xchg %%r9, regs+0x48\n\t"              \
-        "xchg %%r10, regs+0x50\n\t"             \
-        "xchg %%r11, regs+0x58\n\t"             \
-        "xchg %%r12, regs+0x60\n\t"             \
-        "xchg %%r13, regs+0x68\n\t"             \
-        "xchg %%r14, regs+0x70\n\t"             \
-        "xchg %%r15, regs+0x78\n\t"
-
-#define LOAD_GPR_C      SAVE_GPR_C
-
 #define ASM_PRE_VMRUN_CMD                       \
                 "vmload %%rax\n\t"              \
                 "mov regs+0x80, %%r15\n\t"      \
                 "mov %%r15, 0x170(%%rax)\n\t"   \
                 "mov regs, %%r15\n\t"           \
                 "mov %%r15, 0x1f8(%%rax)\n\t"   \
-                LOAD_GPR_C                      \
+                __SWAP_GPRS                     \
 
 #define ASM_POST_VMRUN_CMD                      \
-                SAVE_GPR_C                      \
+                __SWAP_GPRS                     \
                 "mov 0x170(%%rax), %%r15\n\t"   \
                 "mov %%r15, regs+0x80\n\t"      \
                 "mov 0x1f8(%%rax), %%r15\n\t"   \
diff --git a/x86/vmx.c b/x86/vmx.c
index 44ee3697..603730c2 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -44,7 +44,7 @@ struct vmcs *vmcs_root;
 u32 vpid_cnt;
 u64 guest_stack_top;
 u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
-struct regs regs;
+struct guest_regs regs;
 
 struct vmx_test *current;
 
@@ -1732,7 +1732,7 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 	asm volatile (
 		"mov %[HOST_RSP], %%rdi\n\t"
 		"vmwrite %%rsp, %%rdi\n\t"
-		LOAD_GPR_C
+		SWAP_GPRS
 		"cmpb $0, %[launched]\n\t"
 		"jne 1f\n\t"
 		"vmlaunch\n\t"
@@ -1740,14 +1740,14 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 		"1: "
 		"vmresume\n\t"
 		"2: "
-		SAVE_GPR_C
+		SWAP_GPRS
 		"pushf\n\t"
 		"pop %%rdi\n\t"
 		"mov %%rdi, %[vm_fail_flags]\n\t"
 		"movl $1, %[vm_fail]\n\t"
 		"jmp 3f\n\t"
 		"vmx_return:\n\t"
-		SAVE_GPR_C
+		SWAP_GPRS
 		"3: \n\t"
 		: [vm_fail]"+m"(result->vm_fail),
 		  [vm_fail_flags]"=m"(result->flags)
diff --git a/x86/vmx.h b/x86/vmx.h
index 425b1c43..56f37633 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -5,6 +5,7 @@
 #include "processor.h"
 #include "bitops.h"
 #include "util.h"
+#include "virt.h"
 #include "asm/page.h"
 #include "asm/io.h"
 
@@ -58,26 +59,6 @@ struct invvpid_operand {
 	u64 gla;
 };
 
-struct regs {
-	u64 rax;
-	u64 rcx;
-	u64 rdx;
-	u64 rbx;
-	u64 cr2;
-	u64 rbp;
-	u64 rsi;
-	u64 rdi;
-	u64 r8;
-	u64 r9;
-	u64 r10;
-	u64 r11;
-	u64 r12;
-	u64 r13;
-	u64 r14;
-	u64 r15;
-	u64 rflags;
-};
-
 union exit_reason {
 	struct {
 		u32	basic			: 16;
@@ -587,25 +568,6 @@ enum vm_entry_failure_code {
 	ENTRY_FAIL_VMCS_LINK_PTR	= 4,
 };
 
-#define SAVE_GPR_C				\
-	"xchg %%rax, regs\n\t"			\
-	"xchg %%rcx, regs+0x8\n\t"		\
-	"xchg %%rdx, regs+0x10\n\t"		\
-	"xchg %%rbx, regs+0x18\n\t"		\
-	"xchg %%rbp, regs+0x28\n\t"		\
-	"xchg %%rsi, regs+0x30\n\t"		\
-	"xchg %%rdi, regs+0x38\n\t"		\
-	"xchg %%r8, regs+0x40\n\t"		\
-	"xchg %%r9, regs+0x48\n\t"		\
-	"xchg %%r10, regs+0x50\n\t"		\
-	"xchg %%r11, regs+0x58\n\t"		\
-	"xchg %%r12, regs+0x60\n\t"		\
-	"xchg %%r13, regs+0x68\n\t"		\
-	"xchg %%r14, regs+0x70\n\t"		\
-	"xchg %%r15, regs+0x78\n\t"
-
-#define LOAD_GPR_C	SAVE_GPR_C
-
 #define VMX_IO_SIZE_MASK	0x7
 #define _VMX_IO_BYTE		0
 #define _VMX_IO_WORD		1
@@ -739,8 +701,6 @@ enum vm_entry_failure_code {
 #define VMCS_FIELD_RESERVED_SHIFT	(15)
 #define VMCS_FIELD_BIT_SIZE		(BITS_PER_LONG)
 
-extern struct regs regs;
-
 extern union vmx_basic_msr basic_msr;
 extern union vmx_ctrl_msr ctrl_pin_rev;
 extern union vmx_ctrl_msr ctrl_cpu_rev[2];
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 05/20] x86/virt: Use macro shenanigans to get reg offsets when swapping guest/host regs
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (3 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 04/20] x86: Dedup guest/host context switch of registers across SVM and VMX Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 06/20] x86/virt: Track "guest regs" using per-CPU variable Sean Christopherson
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Replace the hand-coded literal offsets in the guest/host assembly code
with programmatically generated offsets to make the code more readable and
easier to maintain.  E.g. this will make it much, much easier to make
the guest register structure per-CPU.

To work around offsetof() being resolved at compile-time, i.e. not by the
preprocessor, provide a macro to define the immediate constraints for
inline assembly.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/virt.h | 62 ++++++++++++++++++++++++++++++++++++--------------
 x86/svm.c      |  5 ++--
 x86/svm.h      | 11 +++++----
 x86/vmx.c      |  3 ++-
 4 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/lib/x86/virt.h b/lib/x86/virt.h
index ccc90c25..1066390d 100644
--- a/lib/x86/virt.h
+++ b/lib/x86/virt.h
@@ -29,24 +29,52 @@ struct guest_regs {
 
 extern struct guest_regs regs;
 
-#define __SWAP_GPRS			\
-	"xchg %%rcx, regs+0x8\n\t"	\
-	"xchg %%rdx, regs+0x10\n\t"	\
-	"xchg %%rbx, regs+0x18\n\t"	\
-	"xchg %%rbp, regs+0x28\n\t"	\
-	"xchg %%rsi, regs+0x30\n\t"	\
-	"xchg %%rdi, regs+0x38\n\t"	\
-	"xchg %%r8, regs+0x40\n\t"	\
-	"xchg %%r9, regs+0x48\n\t"	\
-	"xchg %%r10, regs+0x50\n\t"	\
-	"xchg %%r11, regs+0x58\n\t"	\
-	"xchg %%r12, regs+0x60\n\t"	\
-	"xchg %%r13, regs+0x68\n\t"	\
-	"xchg %%r14, regs+0x70\n\t"	\
-	"xchg %%r15, regs+0x78\n\t"
+#define GUEST_REG_OFFSET(name) \
+	[off_##name] "i" (offsetof(struct guest_regs, name))
 
-#define SWAP_GPRS			\
-	"xchg %%rax, regs+0x0\n\t"	\
+#define GUEST_REGS_OFFSETS	\
+	GUEST_REG_OFFSET(rax),	\
+	GUEST_REG_OFFSET(rcx),	\
+	GUEST_REG_OFFSET(rdx),	\
+	GUEST_REG_OFFSET(rbx),	\
+	GUEST_REG_OFFSET(cr2),	\
+	GUEST_REG_OFFSET(rbp),	\
+	GUEST_REG_OFFSET(rsi),	\
+	GUEST_REG_OFFSET(rdi),	\
+	GUEST_REG_OFFSET(r8),	\
+	GUEST_REG_OFFSET(r9),	\
+	GUEST_REG_OFFSET(r10),	\
+	GUEST_REG_OFFSET(r11),	\
+	GUEST_REG_OFFSET(r12),	\
+	GUEST_REG_OFFSET(r13),	\
+	GUEST_REG_OFFSET(r14),	\
+	GUEST_REG_OFFSET(r15),	\
+	GUEST_REG_OFFSET(rflags)
+
+#define GUEST_REG(name) \
+	xxstr(regs+%c[off_##name])
+
+#define SWAP_REG(name) \
+	"xchg %%" xxstr(name) "," GUEST_REG(name) "\n\t"
+
+#define __SWAP_GPRS		\
+	SWAP_REG(rcx)		\
+	SWAP_REG(rdx)		\
+	SWAP_REG(rbx)		\
+	SWAP_REG(rbp)		\
+	SWAP_REG(rsi)		\
+	SWAP_REG(rdi)		\
+	SWAP_REG(r8)		\
+	SWAP_REG(r9)		\
+	SWAP_REG(r10)		\
+	SWAP_REG(r11)		\
+	SWAP_REG(r12)		\
+	SWAP_REG(r13)		\
+	SWAP_REG(r14)		\
+	SWAP_REG(r15)
+
+#define SWAP_GPRS		\
+	SWAP_REG(rax)		\
 	__SWAP_GPRS
 
 #endif /* _x86_VIRT_H_ */
\ No newline at end of file
diff --git a/x86/svm.c b/x86/svm.c
index 893b3f49..1762cadb 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -254,7 +254,7 @@ u64 __svm_vmrun(u64 rip)
 		      "vmrun %%rax\n\t"               \
 		      ASM_POST_VMRUN_CMD
 		      :
-		      : "a" (virt_to_phys(vmcb))
+		      : GUEST_REGS_OFFSETS, "a" (virt_to_phys(vmcb))
 		      : "memory", "r15");
 
 	return (vmcb->control.exit_code);
@@ -296,7 +296,8 @@ static noinline void test_run(struct svm_test *test)
 			      : // inputs clobbered by the guest:
 				"=D" (the_test),            // first argument register
 				"=b" (the_vmcb)             // callee save register!
-			      : [test] "0" (the_test),
+			      : GUEST_REGS_OFFSETS,
+			        [test] "0" (the_test),
 				[vmcb_phys] "1"(the_vmcb),
 				[PREPARE_GIF_CLEAR] "i" (offsetof(struct svm_test, prepare_gif_clear))
 			      : "rax", "rcx", "rdx", "rsi",
diff --git a/x86/svm.h b/x86/svm.h
index a9e15f67..67a1cddd 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -437,18 +437,18 @@ static inline void clgi(void)
 
 #define ASM_PRE_VMRUN_CMD                       \
                 "vmload %%rax\n\t"              \
-                "mov regs+0x80, %%r15\n\t"      \
+                "mov " GUEST_REG(rflags) ", %%r15\n\t" \
                 "mov %%r15, 0x170(%%rax)\n\t"   \
-                "mov regs, %%r15\n\t"           \
+                "mov " GUEST_REG(rax) ", %%r15\n\t" \
                 "mov %%r15, 0x1f8(%%rax)\n\t"   \
                 __SWAP_GPRS                     \
 
 #define ASM_POST_VMRUN_CMD                      \
                 __SWAP_GPRS                     \
                 "mov 0x170(%%rax), %%r15\n\t"   \
-                "mov %%r15, regs+0x80\n\t"      \
+                "mov %%r15, " GUEST_REG(rflags) "\n\t" \
                 "mov 0x1f8(%%rax), %%r15\n\t"   \
-                "mov %%r15, regs\n\t"           \
+                "mov %%r15, " GUEST_REG(rax)"\n\t" \
                 "vmsave %%rax\n\t"              \
 
 
@@ -459,7 +459,8 @@ static inline void clgi(void)
                 "vmrun %%rax\n\t"               \
 		ASM_POST_VMRUN_CMD \
 		: \
-		: "a" (virt_to_phys(vmcb)) \
+		: GUEST_REGS_OFFSETS, \
+		  "a" (virt_to_phys(vmcb)) \
 		: "memory", "r15") \
 
 #endif
diff --git a/x86/vmx.c b/x86/vmx.c
index 603730c2..8a38ae8a 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -1751,7 +1751,8 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 		"3: \n\t"
 		: [vm_fail]"+m"(result->vm_fail),
 		  [vm_fail_flags]"=m"(result->flags)
-		: [launched]"m"(launched), [HOST_RSP]"i"(HOST_RSP)
+		: [launched]"m"(launched), [HOST_RSP]"i"(HOST_RSP),
+		  GUEST_REGS_OFFSETS
 		: "rdi", "memory", "cc"
 	);
 	in_guest = 0;
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 06/20] x86/virt: Track "guest regs" using per-CPU variable
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (4 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 05/20] x86/virt: Use macro shenanigans to get reg offsets when swapping guest/host regs Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 07/20] x86/svm: Don't VMLOAD/VMSAVE "guest" state around VMRUN Sean Christopherson
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Make the guest_regs structure used to context switch registers between
host and guest per-CPU to fix a bug where VMX tests that run multiple
vCPUs can fail due to register corruption: if two CPUs enter the guest
in quick succession, only one CPU's registers will be preserved across
VM-Enter => VM-Exit.
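
The corruption mode can be illustrated with a minimal user-space sketch
(hypothetical names, not actual kvm-unit-tests code): with a single shared
save area, the second CPU to save its host state clobbers the first CPU's
copy, whereas a per-CPU area keeps each copy intact.

```c
#include <assert.h>

#define NR_CPUS 2

struct guest_regs {
	unsigned long rbx;
};

/* Broken: one save area shared by all CPUs. */
static struct guest_regs shared_regs;

/* Fixed: one save area per CPU, analogous to KUT's per-CPU page. */
static struct guest_regs percpu_regs[NR_CPUS];

static void save_host_regs(int cpu, unsigned long rbx, int per_cpu)
{
	struct guest_regs *r = per_cpu ? &percpu_regs[cpu] : &shared_regs;

	r->rbx = rbx;
}

static unsigned long restore_host_regs(int cpu, int per_cpu)
{
	struct guest_regs *r = per_cpu ? &percpu_regs[cpu] : &shared_regs;

	return r->rbx;
}
```

With the shared area, CPU0's restore observes whatever the last CPU saved;
with the per-CPU area, each CPU gets its own value back.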

Reported-by: Mathias Krause <minipli@grsecurity.net>
Closes: https://lore.kernel.org/all/3bac29b9-4c49-4e5d-997e-9e4019a2fceb@grsecurity.net
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/smp.h   | 25 +++++++++++++++++++++++++
 lib/x86/virt.h  | 35 ++++++++---------------------------
 x86/svm.c       | 14 +++++---------
 x86/svm.h       |  1 -
 x86/svm_tests.c |  5 +++--
 x86/vmx.c       | 19 +++++++++++--------
 x86/vmx_tests.c | 39 +++++++++++++++++++++++++--------------
 7 files changed, 77 insertions(+), 61 deletions(-)

diff --git a/lib/x86/smp.h b/lib/x86/smp.h
index 272aa5ee..e4dc0395 100644
--- a/lib/x86/smp.h
+++ b/lib/x86/smp.h
@@ -20,6 +20,30 @@
 #include "atomic.h"
 #include "apic-defs.h"
 
+struct guest_regs {
+	u64 rax;
+	u64 rcx;
+	u64 rdx;
+	u64 rbx;
+	/*
+	 * Use RSP's index to hold CR2, as RSP isn't manually context switched
+	 * by software in any relevant flows.
+	 */
+	u64 cr2;
+	u64 rbp;
+	u64 rsi;
+	u64 rdi;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+	u64 rflags;
+};
+
 /* Offsets into the per-cpu page. */
 struct percpu_data {
 	uint32_t  smp_id;
@@ -32,6 +56,7 @@ struct percpu_data {
 		uint32_t exception_data;
 	};
 	void *apic_ops;
+	struct guest_regs guest_regs;
 };
 
 #define typeof_percpu(name) typeof(((struct percpu_data *)0)->name)
diff --git a/lib/x86/virt.h b/lib/x86/virt.h
index 1066390d..d05d4fc6 100644
--- a/lib/x86/virt.h
+++ b/lib/x86/virt.h
@@ -2,35 +2,16 @@
 #define _x86_VIRT_H_
 
 #include "libcflat.h"
+#include "processor.h"
+#include "smp.h"
 
-struct guest_regs {
-	u64 rax;
-	u64 rcx;
-	u64 rdx;
-	u64 rbx;
-	/*
-	 * Use RSP's index to hold CR3, as RSP isn't manually context switched
-	 * by software in any relevant flows.
-	 */
-	u64 cr2;
-	u64 rbp;
-	u64 rsi;
-	u64 rdi;
-	u64 r8;
-	u64 r9;
-	u64 r10;
-	u64 r11;
-	u64 r12;
-	u64 r13;
-	u64 r14;
-	u64 r15;
-	u64 rflags;
-};
-
-extern struct guest_regs regs;
+static inline struct guest_regs *this_cpu_guest_regs(void)
+{
+	return (void *)rdmsr(MSR_GS_BASE) + offsetof_percpu(guest_regs);
+}
 
 #define GUEST_REG_OFFSET(name) \
-	[off_##name] "i" (offsetof(struct guest_regs, name))
+	[off_##name] "i" (offsetof_percpu(guest_regs) + offsetof(struct guest_regs, name))
 
 #define GUEST_REGS_OFFSETS	\
 	GUEST_REG_OFFSET(rax),	\
@@ -52,7 +33,7 @@ extern struct guest_regs regs;
 	GUEST_REG_OFFSET(rflags)
 
 #define GUEST_REG(name) \
-	xxstr(regs+%c[off_##name])
+	xxstr(%%gs:%c[off_##name])
 
 #define SWAP_REG(name) \
 	"xchg %%" xxstr(name) "," GUEST_REG(name) "\n\t"
diff --git a/x86/svm.c b/x86/svm.c
index 1762cadb..beb57f33 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -223,13 +223,6 @@ void vmcb_ident(struct vmcb *vmcb)
 	}
 }
 
-struct guest_regs regs;
-
-struct guest_regs get_regs(void)
-{
-	return regs;
-}
-
 // rax handled specially below
 
 
@@ -246,8 +239,10 @@ void svm_setup_vmrun(u64 rip)
 
 u64 __svm_vmrun(u64 rip)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
+
 	svm_setup_vmrun(rip);
-	regs.rdi = (ulong)v2_test;
+	regs->rdi = (ulong)v2_test;
 
 	asm volatile (
 		      ASM_PRE_VMRUN_CMD
@@ -269,6 +264,7 @@ extern u8 vmrun_rip;
 
 static noinline void test_run(struct svm_test *test)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	u64 vmcb_phys = virt_to_phys(vmcb);
 
 	cli();
@@ -278,7 +274,7 @@ static noinline void test_run(struct svm_test *test)
 	guest_main = test->guest_func;
 	vmcb->save.rip = (ulong)test_thunk;
 	vmcb->save.rsp = (ulong)(guest_stack + ARRAY_SIZE(guest_stack));
-	regs.rdi = (ulong)test;
+	regs->rdi = (ulong)test;
 	do {
 		struct svm_test *the_test = test;
 		u64 the_vmcb = vmcb_phys;
diff --git a/x86/svm.h b/x86/svm.h
index 67a1cddd..4e7e9e7a 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -416,7 +416,6 @@ int get_test_stage(struct svm_test *test);
 void set_test_stage(struct svm_test *test, int s);
 void inc_test_stage(struct svm_test *test);
 void vmcb_ident(struct vmcb *vmcb);
-struct guest_regs get_regs(void);
 void vmmcall(void);
 void svm_setup_vmrun(u64 rip);
 u64 __svm_vmrun(u64 rip);
diff --git a/x86/svm_tests.c b/x86/svm_tests.c
index 8ce3cc2e..8547e729 100644
--- a/x86/svm_tests.c
+++ b/x86/svm_tests.c
@@ -577,6 +577,7 @@ static void restore_msrpm_bit(int bit_nr, bool set)
 
 static bool msr_intercept_finished(struct svm_test *test)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	u32 exit_code = vmcb->control.exit_code;
 	bool all_set = false;
 	int bit_nr;
@@ -649,9 +650,9 @@ static bool msr_intercept_finished(struct svm_test *test)
 	 *      while RAX hold its lower 32 bits.
 	 */
 	if (vmcb->control.exit_info_1)
-		test->scratch = ((get_regs().rdx << 32) | (vmcb->save.rax & 0xffffffff));
+		test->scratch = ((regs->rdx << 32) | (vmcb->save.rax & 0xffffffff));
 	else
-		test->scratch = get_regs().rcx;
+		test->scratch = regs->rcx;
 
 	return false;
 }
diff --git a/x86/vmx.c b/x86/vmx.c
index 8a38ae8a..4cb8d66c 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -44,7 +44,6 @@ struct vmcs *vmcs_root;
 u32 vpid_cnt;
 u64 guest_stack_top;
 u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
-struct guest_regs regs;
 
 struct vmx_test *current;
 
@@ -632,6 +631,8 @@ const char *exit_reason_description(u64 reason)
 
 void print_vmexit_info(union exit_reason exit_reason)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
+
 	u64 guest_rip, guest_rsp;
 	ulong exit_qual = vmcs_read(EXI_QUALIFICATION);
 	guest_rip = vmcs_read(GUEST_RIP);
@@ -642,13 +643,13 @@ void print_vmexit_info(union exit_reason exit_reason)
 	printf("\texit qualification = %#lx\n", exit_qual);
 	printf("\tguest_rip = %#lx\n", guest_rip);
 	printf("\tRAX=%#lx    RBX=%#lx    RCX=%#lx    RDX=%#lx\n",
-		regs.rax, regs.rbx, regs.rcx, regs.rdx);
+		regs->rax, regs->rbx, regs->rcx, regs->rdx);
 	printf("\tRSP=%#lx    RBP=%#lx    RSI=%#lx    RDI=%#lx\n",
-		guest_rsp, regs.rbp, regs.rsi, regs.rdi);
+		guest_rsp, regs->rbp, regs->rsi, regs->rdi);
 	printf("\tR8 =%#lx    R9 =%#lx    R10=%#lx    R11=%#lx\n",
-		regs.r8, regs.r9, regs.r10, regs.r11);
+		regs->r8, regs->r9, regs->r10, regs->r11);
 	printf("\tR12=%#lx    R13=%#lx    R14=%#lx    R15=%#lx\n",
-		regs.r12, regs.r13, regs.r14, regs.r15);
+		regs->r12, regs->r13, regs->r14, regs->r15);
 }
 
 void print_vmentry_failure_info(struct vmentry_result *result)
@@ -1707,15 +1708,16 @@ void test_skip(const char *msg)
 
 static int exit_handler(union exit_reason exit_reason)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	int ret;
 
 	current->exits++;
-	regs.rflags = vmcs_read(GUEST_RFLAGS);
+	regs->rflags = vmcs_read(GUEST_RFLAGS);
 	if (is_hypercall(exit_reason))
 		ret = handle_hypercall();
 	else
 		ret = current->exit_handler(exit_reason);
-	vmcs_write(GUEST_RFLAGS, regs.rflags);
+	vmcs_write(GUEST_RFLAGS, regs->rflags);
 
 	return ret;
 }
@@ -1815,6 +1817,7 @@ static void run_teardown_step(struct test_teardown_step *step)
 
 static int test_run(struct vmx_test *test)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	int r;
 
 	/* Validate V2 interface. */
@@ -1835,7 +1838,7 @@ static int test_run(struct vmx_test *test)
 		return 1;
 	}
 
-	memset(&regs, 0, sizeof(regs));
+	memset(regs, 0, sizeof(*regs));
 	init_vmcs(&(test->vmcs));
 	/* Directly call test->init is ok here, init_vmcs has done
 	   vmcs init, vmclear and vmptrld*/
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index e0d5e390..e2bf06ac 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -102,15 +102,16 @@ static void vmenter_main(void)
 
 static int vmenter_exit_handler(union exit_reason exit_reason)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	u64 guest_rip = vmcs_read(GUEST_RIP);
 
 	switch (exit_reason.basic) {
 	case VMX_VMCALL:
-		if (regs.rax != 0xABCD) {
+		if (regs->rax != 0xABCD) {
 			report_fail("test vmresume");
 			return VMX_TEST_VMEXIT;
 		}
-		regs.rax = 0xFFFF;
+		regs->rax = 0xFFFF;
 		vmcs_write(GUEST_RIP, guest_rip + 3);
 		return VMX_TEST_RESUME;
 	default:
@@ -10196,6 +10197,7 @@ static void vmx_sipi_test_guest(void)
 
 static void sipi_test_ap_thread(void *data)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	struct vmcs *ap_vmcs;
 	u64 *ap_vmxon_region;
 	void *ap_stack, *ap_syscall_stack;
@@ -10210,6 +10212,8 @@ static void sipi_test_ap_thread(void *data)
 	init_vmcs(&ap_vmcs);
 	make_vmcs_current(ap_vmcs);
 
+	memset(regs, 0, sizeof(*regs));
+
 	/* Set stack for AP */
 	ap_stack = alloc_page();
 	ap_syscall_stack = alloc_page();
@@ -10652,10 +10656,11 @@ static unsigned long long host_time_to_guest_time(unsigned long long t)
 static unsigned long long rdtsc_vmexit_diff_test_iteration(void)
 {
 	unsigned long long guest_tsc, host_to_guest_tsc;
+	struct guest_regs *regs = this_cpu_guest_regs();
 
 	enter_guest();
 	skip_exit_vmcall();
-	guest_tsc = (u32) regs.rax + (regs.rdx << 32);
+	guest_tsc = (u32) regs->rax + (regs->rdx << 32);
 	host_to_guest_tsc = host_time_to_guest_time(exit_msr_store[0].value);
 
 	return host_to_guest_tsc - guest_tsc;
@@ -10881,6 +10886,7 @@ typedef void (*pf_exception_test_guest_t)(void);
 static void __vmx_pf_exception_test(invalidate_tlb_t inv_fn, void *data,
 				    pf_exception_test_guest_t guest_fn)
 {
+	struct guest_regs *regs = this_cpu_guest_regs();
 	u64 efer;
 	struct cpuid cpuid;
 
@@ -10897,23 +10903,23 @@ static void __vmx_pf_exception_test(invalidate_tlb_t inv_fn, void *data,
 	while (vmcs_read(EXI_REASON) != VMX_VMCALL) {
 		switch (vmcs_read(EXI_REASON)) {
 		case VMX_RDMSR:
-			assert(regs.rcx == MSR_EFER);
+			assert(regs->rcx == MSR_EFER);
 			efer = vmcs_read(GUEST_EFER);
-			regs.rdx = efer >> 32;
-			regs.rax = efer & 0xffffffff;
+			regs->rdx = efer >> 32;
+			regs->rax = efer & 0xffffffff;
 			break;
 		case VMX_WRMSR:
-			assert(regs.rcx == MSR_EFER);
-			efer = regs.rdx << 32 | (regs.rax & 0xffffffff);
+			assert(regs->rcx == MSR_EFER);
+			efer = regs->rdx << 32 | (regs->rax & 0xffffffff);
 			vmcs_write(GUEST_EFER, efer);
 			break;
 		case VMX_CPUID:
 			cpuid = (struct cpuid) {0, 0, 0, 0};
-			cpuid = raw_cpuid(regs.rax, regs.rcx);
-			regs.rax = cpuid.a;
-			regs.rbx = cpuid.b;
-			regs.rcx = cpuid.c;
-			regs.rdx = cpuid.d;
+			cpuid = raw_cpuid(regs->rax, regs->rcx);
+			regs->rax = cpuid.a;
+			regs->rbx = cpuid.b;
+			regs->rcx = cpuid.c;
+			regs->rdx = cpuid.d;
 			break;
 		case VMX_INVLPG:
 			inv_fn(data);
@@ -11250,7 +11256,12 @@ static void do_vmx_canonical_test_one_field(const char *field_name, u64 field)
 	field_org_value = vmcs_read(field);
 
 	test_host_value_direct(field_name, field);
-	test_host_value_vmcs(field_name, field);
+	/*
+	 * Skip the GS.base VMCS test, as the VMX infrastructure accesses
+	 * per-CPU variables (referenced via GS) immediately after VM-Exit.
+	 */
+	if (field != HOST_BASE_GS)
+		test_host_value_vmcs(field_name, field);
 
 	/* Restore original values */
 	vmcs_write(field, field_org_value);
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 07/20] x86/svm: Don't VMLOAD/VMSAVE "guest" state around VMRUN
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (5 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 06/20] x86/virt: Track "guest regs" using per-CPU variable Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 08/20] x86/vmx: Use separate VMCSes for BSP vs. AP in INIT test Sean Christopherson
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Drop the completely asinine and *extremely* confusing VMLOAD and VMSAVE
usage around VMRUN, as loading and saving _just_ guest state is both
unnecessary and dangerous.  E.g. GS.base, which KUT uses for per-CPU data,
is handled by VMLOAD/VMSAVE, and so loading guest state before VMRUN
without loading host state after #VMEXIT is wildly broken.  The only
reason the code "works" is because all relevant host state is copied
verbatim into the guest's save area, i.e. the host and guest use the same
state.  Double down on sharing state between host and guest, as a proper
fix is much more involved and delicate, e.g. would require ensuring
GS.base is loaded with the host's value prior to swapping GPRs (which are
per-CPU).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/svm.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/x86/svm.h b/x86/svm.h
index 4e7e9e7a..21b3ac99 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -435,7 +435,6 @@ static inline void clgi(void)
 }
 
 #define ASM_PRE_VMRUN_CMD                       \
-                "vmload %%rax\n\t"              \
                 "mov " GUEST_REG(rflags) ", %%r15\n\t" \
                 "mov %%r15, 0x170(%%rax)\n\t"   \
                 "mov " GUEST_REG(rax) ", %%r15\n\t" \
@@ -448,9 +447,6 @@ static inline void clgi(void)
                 "mov %%r15, " GUEST_REG(rflags) "\n\t" \
                 "mov 0x1f8(%%rax), %%r15\n\t"   \
                 "mov %%r15, " GUEST_REG(rax)"\n\t" \
-                "vmsave %%rax\n\t"              \
-
-
 
 #define SVM_BARE_VMRUN \
 	asm volatile ( \
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 08/20] x86/vmx: Use separate VMCSes for BSP vs. AP in INIT test
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (6 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 07/20] x86/svm: Don't VMLOAD/VMSAVE "guest" state around VMRUN Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 09/20] x86/vmx: Swap GPRs after checking "launched" status Sean Christopherson
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Allocate and use a separate VMCS for the AP CPU in VMX's INIT test, and
simply mark the guest as finished on the BSP.  Sharing a VMCS between CPUs
requires _more_ code than allocating a dedicated VMCS, and completely
falls apart if things like "launched" are made per-CPU (spoiler alert).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx_tests.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index e2bf06ac..effd0c59 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9976,7 +9976,8 @@ static bool init_signal_test_thread_continued;
 
 static void init_signal_test_thread(void *data)
 {
-	struct vmcs *test_vmcs = data;
+	struct guest_regs *regs = this_cpu_guest_regs();
+	struct vmcs *ap_vmcs;
 
 	/* Enter VMX operation (i.e. exec VMXON) */
 	u64 *ap_vmxon_region = alloc_page();
@@ -9984,6 +9985,11 @@ static void init_signal_test_thread(void *data)
 	init_vmx(ap_vmxon_region);
 	TEST_ASSERT(!__vmxon_safe(ap_vmxon_region));
 
+	init_vmcs(&ap_vmcs);
+	make_vmcs_current(ap_vmcs);
+
+	memset(regs, 0, sizeof(*regs));
+
 	/* Signal CPU have entered VMX operation */
 	vmx_set_test_stage(1);
 
@@ -10003,13 +10009,10 @@ static void init_signal_test_thread(void *data)
 
 	/* Enter VMX non-root mode */
 	test_set_guest(v2_null_test_guest);
-	make_vmcs_current(test_vmcs);
 	enter_guest();
 	/* Save exit reason for BSP CPU to compare to expected result */
 	init_signal_test_exit_reason = vmcs_read(EXI_REASON);
-	/* VMCLEAR test-vmcs so it could be loaded by BSP CPU */
-	vmcs_clear(test_vmcs);
-	launched = false;
+
 	/* Signal that CPU exited to VMX root mode */
 	vmx_set_test_stage(5);
 
@@ -10110,9 +10113,8 @@ static void vmx_init_signal_test(void)
 			exit_reason_description(init_signal_test_exit_reason),
 			init_signal_test_exit_reason);
 
-	/* Run guest to completion */
-	make_vmcs_current(test_vmcs);
-	enter_guest();
+	/* Mark the guest as being done. */
+	test_set_guest_finished();
 
 	/* Signal other CPU to exit VMX operation */
 	init_signal_test_thread_continued = false;
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 09/20] x86/vmx: Swap GPRs after checking "launched" status
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (7 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 08/20] x86/vmx: Use separate VMCSes for BSP vs. AP in INIT test Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 10/20] x86/vmx: Track VMCS "launched" state per-CPU Sean Christopherson
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

When context switching GPRs before VM-Enter, check if the VMCS has been
launched before loading guest GPRs to ensure the assembly sequence doesn't
consume a guest GPR.  The code currently works because "launched" is a
global variable and can be read with RIP-relative addressing, but that
won't hold true when launched is tracked per-CPU (or per-VMCS).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index 4cb8d66c..7da549d1 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -1734,8 +1734,8 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 	asm volatile (
 		"mov %[HOST_RSP], %%rdi\n\t"
 		"vmwrite %%rsp, %%rdi\n\t"
-		SWAP_GPRS
 		"cmpb $0, %[launched]\n\t"
+		SWAP_GPRS
 		"jne 1f\n\t"
 		"vmlaunch\n\t"
 		"jmp 2f\n\t"
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 10/20] x86/vmx: Track VMCS "launched" state per-CPU
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (8 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 09/20] x86/vmx: Swap GPRs after checking "launched" status Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 11/20] x86/vmx: Track "is this CPU in guest mode" per-CPU Sean Christopherson
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Track whether or not a VMCS has been "launched" per-CPU to play nice with
VMX tests that use multiple VMCSes.  Arguably, it would be better to add a
structure to track the current VMCS and its launched state, but there's no
immediate benefit to doing so, and practically speaking the only thing that
would be tracked (without support for e.g. eVMCS) would be the "launched"
state.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/smp.h |  4 ++++
 x86/vmx.c     | 10 +++++-----
 x86/vmx.h     |  3 ++-
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/lib/x86/smp.h b/lib/x86/smp.h
index e4dc0395..7f973227 100644
--- a/lib/x86/smp.h
+++ b/lib/x86/smp.h
@@ -56,7 +56,10 @@ struct percpu_data {
 		uint32_t exception_data;
 	};
 	void *apic_ops;
+
 	struct guest_regs guest_regs;
+	/* Track whether or not the current CPU's VMCS has been "launched". */
+	bool launched;
 };
 
 #define typeof_percpu(name) typeof(((struct percpu_data *)0)->name)
@@ -109,6 +112,7 @@ BUILD_PERCPU_OP(exception_vector);
 BUILD_PERCPU_OP(exception_rflags_rf);
 BUILD_PERCPU_OP(exception_error_code);
 BUILD_PERCPU_OP(apic_ops);
+BUILD_PERCPU_OP(launched);
 
 void smp_init(void);
 
diff --git a/x86/vmx.c b/x86/vmx.c
index 7da549d1..85772cae 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -60,7 +60,6 @@ static struct test_teardown_step teardown_steps[MAX_TEST_TEARDOWN_STEPS];
 static test_guest_func v2_guest_main;
 
 u64 hypercall_field;
-bool launched;
 static int matched;
 static int guest_finished;
 static int in_guest;
@@ -1728,6 +1727,8 @@ static int exit_handler(union exit_reason exit_reason)
  */
 static noinline void vmx_enter_guest(struct vmentry_result *result)
 {
+	bool launched = this_cpu_read_launched();
+
 	memset(result, 0, sizeof(*result));
 
 	in_guest = 1;
@@ -1779,7 +1780,7 @@ static int vmx_run(void)
 			 * VMCS isn't in "launched" state if there's been any
 			 * entry failure (early or otherwise).
 			 */
-			launched = 1;
+			this_cpu_write_launched(true);
 			ret = exit_handler(result.exit_reason);
 		} else if (current->entry_failure_handler) {
 			ret = current->entry_failure_handler(&result);
@@ -1848,7 +1849,6 @@ static int test_run(struct vmx_test *test)
 	v2_guest_main = NULL;
 	test->exits = 0;
 	current = test;
-	launched = 0;
 	guest_finished = 0;
 	printf("\nTest suite: %s\n", test->name);
 
@@ -1867,7 +1867,7 @@ static int test_run(struct vmx_test *test)
 	while (teardown_count > 0)
 		run_teardown_step(&teardown_steps[--teardown_count]);
 
-	if (launched && !guest_finished)
+	if (this_cpu_read_launched() && !guest_finished)
 		report_fail("Guest didn't run to completion.");
 
 out:
@@ -1975,7 +1975,7 @@ void __enter_guest(u8 abort_flag, struct vmentry_result *result)
 		return;
 	}
 
-	launched = 1;
+	this_cpu_write_launched(true);
 	check_for_guest_termination(result->exit_reason);
 	return;
 
diff --git a/x86/vmx.h b/x86/vmx.h
index 56f37633..c1c6eba4 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -794,7 +794,6 @@ static inline bool is_mbec_supported(void)
 }
 
 extern u64 *bsp_vmxon_region;
-extern bool launched;
 
 void vmx_set_test_stage(u32 s);
 u32 vmx_get_test_stage(void);
@@ -856,6 +855,8 @@ static inline int vmcs_clear(struct vmcs *vmcs)
 	bool ret;
 	u64 rflags = read_rflags() | X86_EFLAGS_CF | X86_EFLAGS_ZF;
 
+	this_cpu_write_launched(false);
+
 	asm volatile ("push %1; popf; vmclear %2; setbe %0"
 		      : "=q" (ret) : "q" (rflags), "m" (vmcs) : "cc");
 	return ret;
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 11/20] x86/vmx: Track "is this CPU in guest mode" per-CPU
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (9 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 10/20] x86/vmx: Track VMCS "launched" state per-CPU Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 12/20] x86/vmx: Communicate hypercalls via RAX, not a global field Sean Christopherson
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Track whether or not a CPU is in a SVM/VMX guest on a per-CPU basis to
play nice with SVM/VMX tests that run multiple vCPUs.  How the VMX tests
in particular managed to survive this long with shared state is nothing
short of amazing.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/smp.h |  3 +++
 x86/vmx.c     | 18 +++++++++++-------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/x86/smp.h b/lib/x86/smp.h
index 7f973227..683240d8 100644
--- a/lib/x86/smp.h
+++ b/lib/x86/smp.h
@@ -60,6 +60,8 @@ struct percpu_data {
 	struct guest_regs guest_regs;
 	/* Track whether or not the current CPU's VMCS has been "launched". */
 	bool launched;
+	/* Track if this CPU is running in an SVM or VMX guest. */
+	bool in_guest;
 };
 
 #define typeof_percpu(name) typeof(((struct percpu_data *)0)->name)
@@ -113,6 +115,7 @@ BUILD_PERCPU_OP(exception_rflags_rf);
 BUILD_PERCPU_OP(exception_error_code);
 BUILD_PERCPU_OP(apic_ops);
 BUILD_PERCPU_OP(launched);
+BUILD_PERCPU_OP(in_guest);
 
 void smp_init(void);
 
diff --git a/x86/vmx.c b/x86/vmx.c
index 85772cae..12e5d449 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -62,7 +62,6 @@ static test_guest_func v2_guest_main;
 u64 hypercall_field;
 static int matched;
 static int guest_finished;
-static int in_guest;
 
 union vmx_basic_msr basic_msr;
 union vmx_ctrl_msr ctrl_pin_rev;
@@ -1672,7 +1671,7 @@ static int handle_hypercall(void)
 
 static void continue_abort(void)
 {
-	assert(!in_guest);
+	assert(!this_cpu_read_in_guest());
 	printf("Host was here when guest aborted:\n");
 	dump_stack();
 	longjmp(abort_target, 1);
@@ -1681,7 +1680,7 @@ static void continue_abort(void)
 
 void __abort_test(void)
 {
-	if (in_guest)
+	if (this_cpu_read_in_guest())
 		hypercall(HYPERCALL_VMABORT);
 	else
 		longjmp(abort_target, 1);
@@ -1690,14 +1689,17 @@ void __abort_test(void)
 
 static void continue_skip(void)
 {
-	assert(!in_guest);
+	assert(!this_cpu_read_in_guest());
 	longjmp(abort_target, 1);
 	abort();
 }
 
 void test_skip(const char *msg)
 {
+	bool in_guest = this_cpu_read_in_guest();
+
 	printf("%s skipping test: %s\n", in_guest ? "Guest" : "Host", msg);
+
 	if (in_guest)
 		hypercall(HYPERCALL_VMABORT);
 	else
@@ -1731,7 +1733,8 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 
 	memset(result, 0, sizeof(*result));
 
-	in_guest = 1;
+	this_cpu_write_in_guest(true);
+
 	asm volatile (
 		"mov %[HOST_RSP], %%rdi\n\t"
 		"vmwrite %%rsp, %%rdi\n\t"
@@ -1758,7 +1761,8 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 		  GUEST_REGS_OFFSETS
 		: "rdi", "memory", "cc"
 	);
-	in_guest = 0;
+
+	this_cpu_write_in_guest(false);
 
 	result->vmlaunch = !launched;
 	result->instr = launched ? "vmresume" : "vmlaunch";
@@ -1854,7 +1858,7 @@ static int test_run(struct vmx_test *test)
 
 	r = setjmp(abort_target);
 	if (r) {
-		assert(!in_guest);
+		assert(!this_cpu_read_in_guest());
 		goto out;
 	}
 
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 12/20] x86/vmx: Communicate hypercalls via RAX, not a global field
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (10 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 11/20] x86/vmx: Track "is this CPU in guest mode" per-CPU Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 13/20] x86/vmx: Initialize test stage in SIPI test *before* launching AP thread Sean Christopherson
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Communicate hypercall requests via RAX instead of a global field.  To
avoid false positives, use a larger magic value that is extremely unlikely
to be resident in RAX at the time of VMCALL (avoiding false positives is
presumably why a global variable was used).

Using RAX instead of a shared variable ensures multi-vCPU tests won't
clobber each other's hypercalls (no such tests currently exist).

Use the "abracadabra" magic number first introduced by KVM selftests,
because if a dad joke is funny one time, it's funny every time.
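
The RAX encode/decode used here can be modeled as a standalone sketch
(mirroring the constants this patch adds; helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define VMX_HYPERCALL_MAGIC	0xabacadabaULL
#define HYPERCALL_MASK		0xFFF
#define HYPERCALL_VMEXIT	0x1

/* Guest side: pack the magic and hypercall number into RAX. */
static uint64_t hypercall_encode(uint32_t hypercall_no)
{
	return (VMX_HYPERCALL_MAGIC << 12) | hypercall_no;
}

/* Host side: a VMCALL is a hypercall only if the magic matches. */
static int rax_is_hypercall(uint64_t rax)
{
	return (rax >> 12) == VMX_HYPERCALL_MAGIC;
}

static uint32_t rax_hypercall_no(uint64_t rax)
{
	return rax & HYPERCALL_MASK;
}
```

The 36-bit magic shifted left by 12 still fits comfortably in 64 bits, and
any "ordinary" RAX value fails the magic check.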

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx.c | 32 ++++++++++++++++++++------------
 x86/vmx.h |  6 ------
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index 12e5d449..af7c4c20 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -59,7 +59,6 @@ static struct test_teardown_step teardown_steps[MAX_TEST_TEARDOWN_STEPS];
 
 static test_guest_func v2_guest_main;
 
-u64 hypercall_field;
 static int matched;
 static int guest_finished;
 
@@ -1635,28 +1634,37 @@ static void test_vmx_caps(void)
 	       "MSR_IA32_VMX_EPT_VPID_CAP");
 }
 
+#define VMX_HYPERCALL_MAGIC 0xabacadabaULL
+
+#define HYPERCALL_MASK		0xFFF
+#define HYPERCALL_VMEXIT	0x1
+#define HYPERCALL_VMABORT	0x2
+#define HYPERCALL_VMSKIP	0x3
+
 /* This function can only be called in guest */
 void __attribute__((__used__)) hypercall(u32 hypercall_no)
 {
-	u64 val = 0;
-	val = (hypercall_no & HYPERCALL_MASK) | HYPERCALL_BIT;
-	hypercall_field = val;
-	asm volatile("vmcall\n\t");
+	u64 val = (VMX_HYPERCALL_MAGIC << 12) | hypercall_no;
+
+	asm volatile("vmcall\n\t" : "+a"(val));
 }
 
 static bool is_hypercall(union exit_reason exit_reason)
 {
+	u64 hypercall_field = this_cpu_guest_regs()->rax;
+
 	return exit_reason.basic == VMX_VMCALL &&
-	       (hypercall_field & HYPERCALL_BIT);
+	       (hypercall_field >> 12) == VMX_HYPERCALL_MAGIC;
 }
 
 static int handle_hypercall(void)
 {
-	ulong hypercall_no;
+	struct guest_regs *regs = this_cpu_guest_regs();
+	u64 hypercall_field = regs->rax;
 
-	hypercall_no = hypercall_field & HYPERCALL_MASK;
-	hypercall_field = 0;
-	switch (hypercall_no) {
+	regs->rax = 0;
+
+	switch (hypercall_field & HYPERCALL_MASK) {
 	case HYPERCALL_VMEXIT:
 		return VMX_TEST_VMEXIT;
 	case HYPERCALL_VMABORT:
@@ -1664,7 +1672,8 @@ static int handle_hypercall(void)
 	case HYPERCALL_VMSKIP:
 		return VMX_TEST_VMSKIP;
 	default:
-		printf("ERROR : Invalid hypercall number : %ld\n", hypercall_no);
+		printf("ERROR : Invalid hypercall number : %ld\n",
+		       hypercall_field & HYPERCALL_MASK);
 	}
 	return VMX_TEST_EXIT;
 }
@@ -2060,7 +2069,6 @@ int main(int argc, const char *argv[])
 	int i = 0;
 
 	setup_vm();
-	hypercall_field = 0;
 
 	/* We want xAPIC mode to test MMIO passthrough from L1 (us) to L2.  */
 	smp_reset_apic();
diff --git a/x86/vmx.h b/x86/vmx.h
index c1c6eba4..098a5ef4 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -588,12 +588,6 @@ enum vm_entry_failure_code {
 #define VMX_TEST_VMABORT	4
 #define VMX_TEST_VMSKIP		5
 
-#define HYPERCALL_BIT		(1ul << 12)
-#define HYPERCALL_MASK		0xFFF
-#define HYPERCALL_VMEXIT	0x1
-#define HYPERCALL_VMABORT	0x2
-#define HYPERCALL_VMSKIP	0x3
-
 #define EPTP_PG_WALK_LEN_SHIFT	3ul
 #define EPTP_PG_WALK_LEN_MASK	0x38ul
 #define EPTP_RESERV_BITS_MASK	0x1ful
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [kvm-unit-tests PATCH v3 13/20] x86/vmx: Initialize test stage in SIPI test *before* launching AP thread
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (11 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 12/20] x86/vmx: Communicate hypercalls via RAX, not a global field Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 14/20] x86/kvmclock: Replace spaces with tabs Sean Christopherson
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

From: Mathias Krause <minipli@grsecurity.net>

Initialize the VMX test stage in the SIPI test *before* spawning the AP
thread, as setting the stage after waking the AP can result in the BSP's
write of '0' clobbering the AP's write of '1', ultimately causing the test
to hang because the BSP thinks the AP hasn't yet entered guest mode.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
[sean: write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx_tests.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index effd0c59..6161f451 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -10285,11 +10285,11 @@ static void vmx_sipi_signal_test(void)
 	/* update CR3 on AP */
 	on_cpu(1, update_cr3, (void *)read_cr3());
 
+	vmx_set_test_stage(0);
+
 	/* start AP */
 	on_cpu_async(1, sipi_test_ap_thread, NULL);
 
-	vmx_set_test_stage(0);
-
 	/* BSP enter guest */
 	enter_guest();
 }
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 14/20] x86/kvmclock: Replace spaces with tabs
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (12 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 13/20] x86/vmx: Initialize test stage in SIPI test *before* launching AP thread Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 15/20] x86/kvmclock: Skip kvmclock test when not running on KVM with CLOCKSOURCE2 Sean Christopherson
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Replace (8) spaces with tabs in the kvmclock code so that upcoming changes
don't propagate the antiquated formatting.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/kvmclock.c      |  42 ++++-----
 x86/kvmclock_test.c | 218 ++++++++++++++++++++++----------------------
 2 files changed, 130 insertions(+), 130 deletions(-)

diff --git a/x86/kvmclock.c b/x86/kvmclock.c
index f9f21032..ea9c0e7e 100644
--- a/x86/kvmclock.c
+++ b/x86/kvmclock.c
@@ -56,7 +56,7 @@ static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift)
 # define do_div(n,base) ({					\
 	u32 __base = (base);    				\
 	u32 __rem;						\
-	__rem = ((u64)(n)) % __base;                            \
+	__rem = ((u64)(n)) % __base;			    \
 	(n) = ((u64)(n)) / __base;				\
 	__rem;							\
  })
@@ -194,9 +194,9 @@ static cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 	} while (pvclock_read_retry(src, version));
 
 	if ((valid_flags & PVCLOCK_RAW_CYCLE_BIT) ||
-            ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
-             (flags & PVCLOCK_TSC_STABLE_BIT)))
-                return ret;
+	    ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
+	     (flags & PVCLOCK_TSC_STABLE_BIT)))
+		return ret;
 
 	/*
 	 * Assumption here is that last_value, a global accumulator, always goes
@@ -224,27 +224,27 @@ static cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 
 cycle_t kvm_clock_read(void)
 {
-        struct pvclock_vcpu_time_info *src;
-        cycle_t ret;
-        int index = smp_id();
+	struct pvclock_vcpu_time_info *src;
+	cycle_t ret;
+	int index = smp_id();
 
-        src = &hv_clock[index];
-        ret = pvclock_clocksource_read(src);
-        return ret;
+	src = &hv_clock[index];
+	ret = pvclock_clocksource_read(src);
+	return ret;
 }
 
 void kvm_clock_init(void *data)
 {
-        int index = smp_id();
-        struct pvclock_vcpu_time_info *hvc = &hv_clock[index];
+	int index = smp_id();
+	struct pvclock_vcpu_time_info *hvc = &hv_clock[index];
 
-        printf("kvm-clock: cpu %d, msr %p\n", index, hvc);
-        wrmsr(MSR_KVM_SYSTEM_TIME_NEW, (unsigned long)hvc | 1);
+	printf("kvm-clock: cpu %d, msr %p\n", index, hvc);
+	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, (unsigned long)hvc | 1);
 }
 
 void kvm_clock_clear(void *data)
 {
-        wrmsr(MSR_KVM_SYSTEM_TIME_NEW, 0LL);
+	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, 0LL);
 }
 
 static void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
@@ -275,15 +275,15 @@ static void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 
 void kvm_get_wallclock(struct timespec *ts)
 {
-        struct pvclock_vcpu_time_info *vcpu_time;
-        int index = smp_id();
+	struct pvclock_vcpu_time_info *vcpu_time;
+	int index = smp_id();
 
-        wrmsr(MSR_KVM_WALL_CLOCK_NEW, (unsigned long)&wall_clock);
-        vcpu_time = &hv_clock[index];
-        pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
+	wrmsr(MSR_KVM_WALL_CLOCK_NEW, (unsigned long)&wall_clock);
+	vcpu_time = &hv_clock[index];
+	pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
 }
 
 void pvclock_set_flags(unsigned char flags)
 {
-        valid_flags = flags;
+	valid_flags = flags;
 }
diff --git a/x86/kvmclock_test.c b/x86/kvmclock_test.c
index de4b5e13..d21c6c72 100644
--- a/x86/kvmclock_test.c
+++ b/x86/kvmclock_test.c
@@ -12,144 +12,144 @@ long sec = 0;
 long threshold = DEFAULT_THRESHOLD;
 
 struct test_info {
-        struct spinlock lock;
-        u64 warps;                /* warp count */
-        u64 stalls;               /* stall count */
-        long long worst;          /* worst warp */
-        volatile cycle_t last;    /* last cycle seen by test */
-        int check;                /* check cycle ? */
+	struct spinlock lock;
+	u64 warps;		/* warp count */
+	u64 stalls;		/* stall count */
+	long long worst;	/* worst warp */
+	volatile cycle_t last;	/* last cycle seen by test */
+	int check;		/* check cycle ? */
 };
 
 struct test_info ti[4];
 
 static void wallclock_test(void *data)
 {
-        int *p_err = data;
-        long ksec, offset;
-        struct timespec ts;
+	int *p_err = data;
+	long ksec, offset;
+	struct timespec ts;
 
-        kvm_get_wallclock(&ts);
-        ksec = ts.tv_sec;
+	kvm_get_wallclock(&ts);
+	ksec = ts.tv_sec;
 
-        offset = ksec - sec;
-        printf("Raw nanoseconds value from kvmclock: %" PRIu64 " (cpu %d)\n", kvm_clock_read(), smp_id());
-        printf("Seconds get from kvmclock: %ld (cpu %d, offset: %ld)\n", ksec, smp_id(), offset);
+	offset = ksec - sec;
+	printf("Raw nanoseconds value from kvmclock: %" PRIu64 " (cpu %d)\n", kvm_clock_read(), smp_id());
+	printf("Seconds get from kvmclock: %ld (cpu %d, offset: %ld)\n", ksec, smp_id(), offset);
 
-        if (offset > threshold || offset < -threshold) {
-                printf("offset too large!\n");
-                (*p_err)++;
-        }
+	if (offset > threshold || offset < -threshold) {
+		printf("offset too large!\n");
+		(*p_err)++;
+	}
 }
 
 static void kvm_clock_test(void *data)
 {
-        struct test_info *hv_test_info = (struct test_info *)data;
-        long i, check = hv_test_info->check;
+	struct test_info *hv_test_info = (struct test_info *)data;
+	long i, check = hv_test_info->check;
 
-        for (i = 0; i < loops; i++){
-                cycle_t t0, t1;
-                long long delta;
+	for (i = 0; i < loops; i++){
+		cycle_t t0, t1;
+		long long delta;
 
-                if (check == 0) {
-                        kvm_clock_read();
-                        continue;
-                }
+		if (check == 0) {
+			kvm_clock_read();
+			continue;
+		}
 
-                spin_lock(&hv_test_info->lock);
-                t1 = kvm_clock_read();
-                t0 = hv_test_info->last;
-                hv_test_info->last = kvm_clock_read();
-                spin_unlock(&hv_test_info->lock);
+		spin_lock(&hv_test_info->lock);
+		t1 = kvm_clock_read();
+		t0 = hv_test_info->last;
+		hv_test_info->last = kvm_clock_read();
+		spin_unlock(&hv_test_info->lock);
 
-                delta = t1 - t0;
-                if (delta < 0) {
-                        spin_lock(&hv_test_info->lock);
-                        ++hv_test_info->warps;
-                        if (delta < hv_test_info->worst){
-                                hv_test_info->worst = delta;
-                                printf("Worst warp %lld\n", hv_test_info->worst);
-                        }
-                        spin_unlock(&hv_test_info->lock);
-                }
-                if (delta == 0)
-                        ++hv_test_info->stalls;
+		delta = t1 - t0;
+		if (delta < 0) {
+			spin_lock(&hv_test_info->lock);
+			++hv_test_info->warps;
+			if (delta < hv_test_info->worst){
+				hv_test_info->worst = delta;
+				printf("Worst warp %lld\n", hv_test_info->worst);
+			}
+			spin_unlock(&hv_test_info->lock);
+		}
+		if (delta == 0)
+			++hv_test_info->stalls;
 
-                if (!((unsigned long)i & 31))
-                        asm volatile("rep; nop");
-        }
+		if (!((unsigned long)i & 31))
+			asm volatile("rep; nop");
+	}
 }
 
 static int cycle_test(int check, struct test_info *ti)
 {
-        unsigned long long begin, end;
+	unsigned long long begin, end;
 
-        begin = rdtsc();
+	begin = rdtsc();
 
-        ti->check = check;
-        on_cpus(kvm_clock_test, ti);
+	ti->check = check;
+	on_cpus(kvm_clock_test, ti);
 
-        end = rdtsc();
+	end = rdtsc();
 
-        printf("Total vcpus: %d\n", cpu_count());
-        printf("Test  loops: %ld\n", loops);
-        if (check == 1) {
-                printf("Total warps:  %" PRId64 "\n", ti->warps);
-                printf("Total stalls: %" PRId64 "\n", ti->stalls);
-                printf("Worst warp:   %lld\n", ti->worst);
-        } else
-                printf("TSC cycles:  %lld\n", end - begin);
+	printf("Total vcpus: %d\n", cpu_count());
+	printf("Test  loops: %ld\n", loops);
+	if (check == 1) {
+		printf("Total warps:  %" PRId64 "\n", ti->warps);
+		printf("Total stalls: %" PRId64 "\n", ti->stalls);
+		printf("Worst warp:   %lld\n", ti->worst);
+	} else
+		printf("TSC cycles:  %lld\n", end - begin);
 
-        return ti->warps ? 1 : 0;
+	return ti->warps ? 1 : 0;
 }
 
 int main(int ac, char **av)
 {
-        int nerr = 0;
-        int ncpus;
-        int i;
-
-        if (ac > 1)
-                loops = atol(av[1]);
-        if (ac > 2)
-                sec = atol(av[2]);
-        if (ac > 3)
-                threshold = atol(av[3]);
-
-        ncpus = cpu_count();
-        if (ncpus > MAX_CPU)
-                report_abort("number cpus exceeds %d", MAX_CPU);
-
-        on_cpus(kvm_clock_init, NULL);
-
-        if (ac > 2) {
-                printf("Wallclock test, threshold %ld\n", threshold);
-                printf("Seconds get from host:     %ld\n", sec);
-                for (i = 0; i < ncpus; ++i)
-                        on_cpu(i, wallclock_test, &nerr);
-        }
-
-        printf("Check the stability of raw cycle ...\n");
-        pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT
-                          | PVCLOCK_RAW_CYCLE_BIT);
-        if (cycle_test(1, &ti[0]))
-                printf("Raw cycle is not stable\n");
-        else
-                printf("Raw cycle is stable\n");
-
-        pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
-        printf("Monotonic cycle test:\n");
-        nerr += cycle_test(1, &ti[1]);
-
-        printf("Measure the performance of raw cycle ...\n");
-        pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT
-                          | PVCLOCK_RAW_CYCLE_BIT);
-        cycle_test(0, &ti[2]);
-
-        printf("Measure the performance of adjusted cycle ...\n");
-        pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
-        cycle_test(0, &ti[3]);
-
-        on_cpus(kvm_clock_clear, NULL);
-
-        return nerr > 0 ? 1 : 0;
+	int nerr = 0;
+	int ncpus;
+	int i;
+
+	if (ac > 1)
+		loops = atol(av[1]);
+	if (ac > 2)
+		sec = atol(av[2]);
+	if (ac > 3)
+		threshold = atol(av[3]);
+
+	ncpus = cpu_count();
+	if (ncpus > MAX_CPU)
+		report_abort("number cpus exceeds %d", MAX_CPU);
+
+	on_cpus(kvm_clock_init, NULL);
+
+	if (ac > 2) {
+		printf("Wallclock test, threshold %ld\n", threshold);
+		printf("Seconds get from host:     %ld\n", sec);
+		for (i = 0; i < ncpus; ++i)
+			on_cpu(i, wallclock_test, &nerr);
+	}
+
+	printf("Check the stability of raw cycle ...\n");
+	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT
+			  | PVCLOCK_RAW_CYCLE_BIT);
+	if (cycle_test(1, &ti[0]))
+		printf("Raw cycle is not stable\n");
+	else
+		printf("Raw cycle is stable\n");
+
+	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+	printf("Monotonic cycle test:\n");
+	nerr += cycle_test(1, &ti[1]);
+
+	printf("Measure the performance of raw cycle ...\n");
+	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT
+			  | PVCLOCK_RAW_CYCLE_BIT);
+	cycle_test(0, &ti[2]);
+
+	printf("Measure the performance of adjusted cycle ...\n");
+	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+	cycle_test(0, &ti[3]);
+
+	on_cpus(kvm_clock_clear, NULL);
+
+	return nerr > 0 ? 1 : 0;
 }
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 15/20] x86/kvmclock: Skip kvmclock test when not running on KVM with CLOCKSOURCE2
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (13 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 14/20] x86/kvmclock: Replace spaces with tabs Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 16/20] x86/vmx: Tag "struct vmx_msr_entry" as needing to be 16-byte aligned Sean Christopherson
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Skip the kvmclock test if the (virtual) CPU isn't running on KVM and/or
doesn't have CLOCKSOURCE2.  Presumably non-KVM environments simply don't
run the test, but checking for kvmclock support is easy enough.  E.g. with

  -cpu host,-kvmclock,kvm-pv-enforce-cpuid

the test will die on WRMSR #GPs without the checks, but generate

  SKIP: CPU not running on KVM with CLOCKSOURCE2
  SUMMARY: 1 tests, 1 skipped

with the appropriate checks.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/processor.h | 15 +++++++++++++++
 x86/kvmclock.h      |  2 ++
 x86/kvmclock_test.c |  9 ++++++++-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index 32ce08e2..ba7065f7 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -317,6 +317,7 @@ struct x86_cpu_feature {
 #define X86_FEATURE_XSAVE		X86_CPU_FEATURE(0x1, 0, ECX, 26)
 #define X86_FEATURE_OSXSAVE		X86_CPU_FEATURE(0x1, 0, ECX, 27)
 #define X86_FEATURE_RDRAND		X86_CPU_FEATURE(0x1, 0, ECX, 30)
+#define X86_FEATURE_HYPERVISOR		X86_CPU_FEATURE(0x1, 0, ECX, 31)
 #define X86_FEATURE_MCE			X86_CPU_FEATURE(0x1, 0, EDX, 7)
 #define X86_FEATURE_APIC		X86_CPU_FEATURE(0x1, 0, EDX, 9)
 #define X86_FEATURE_CLFLUSH		X86_CPU_FEATURE(0x1, 0, EDX, 19)
@@ -351,6 +352,7 @@ struct x86_cpu_feature {
 /*
  * KVM defined leafs
  */
+#define KVM_FEATURE_CLOCKSOURCE2	X86_CPU_FEATURE(0x40000001, 0, EAX, 3)
 #define KVM_FEATURE_ASYNC_PF		X86_CPU_FEATURE(0x40000001, 0, EAX, 4)
 #define KVM_FEATURE_ASYNC_PF_INT	X86_CPU_FEATURE(0x40000001, 0, EAX, 14)
 
@@ -449,6 +451,7 @@ struct x86_cpu_property {
 #define X86_PROPERTY_AMX_NR_TILE_REGS		X86_CPU_PROPERTY(0x1d, 1, EBX, 16, 31)
 #define X86_PROPERTY_AMX_MAX_ROWS		X86_CPU_PROPERTY(0x1d, 1, ECX, 0,  15)
 
+#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"
 #define X86_PROPERTY_MAX_KVM_LEAF		X86_CPU_PROPERTY(0x40000000, 0, EAX, 0, 31)
 
 #define X86_PROPERTY_MAX_EXT_LEAF		X86_CPU_PROPERTY(0x80000000, 0, EAX, 0, 31)
@@ -506,6 +509,18 @@ static __always_inline bool this_cpu_has_p(struct x86_cpu_property property)
 	return max_leaf >= property.function;
 }
 
+static inline bool this_cpu_has_kvm(void)
+{
+	struct cpuid signature;
+
+	if (!this_cpu_has(X86_FEATURE_HYPERVISOR) ||
+	    !this_cpu_has_p(X86_PROPERTY_MAX_KVM_LEAF))
+		return false;
+
+	signature = cpuid(X86_PROPERTY_MAX_KVM_LEAF.function);
+	return !memcmp(KVM_SIGNATURE, &signature.b, 12);
+}
+
 static inline u8 cpuid_maxphyaddr(void)
 {
 	if (!this_cpu_has_p(X86_PROPERTY_MAX_PHY_ADDR))
diff --git a/x86/kvmclock.h b/x86/kvmclock.h
index 1a40a7c0..bde9a21f 100644
--- a/x86/kvmclock.h
+++ b/x86/kvmclock.h
@@ -1,6 +1,8 @@
 #ifndef X86_KVMCLOCK_H
 #define X86_KVMCLOCK_H
 
+#include "libcflat.h"
+
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 
diff --git a/x86/kvmclock_test.c b/x86/kvmclock_test.c
index d21c6c72..659be870 100644
--- a/x86/kvmclock_test.c
+++ b/x86/kvmclock_test.c
@@ -108,6 +108,11 @@ int main(int ac, char **av)
 	int ncpus;
 	int i;
 
+	if (!this_cpu_has_kvm() || !this_cpu_has(KVM_FEATURE_CLOCKSOURCE2)) {
+		report_skip("CPU not running on KVM with CLOCKSOURCE2");
+		goto out;
+	}
+
 	if (ac > 1)
 		loops = atol(av[1]);
 	if (ac > 2)
@@ -151,5 +156,7 @@ int main(int ac, char **av)
 
 	on_cpus(kvm_clock_clear, NULL);
 
-	return nerr > 0 ? 1 : 0;
+	report(!nerr, "%u time warps detected", nerr);
+out:
+	return report_summary();
 }
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 16/20] x86/vmx: Tag "struct vmx_msr_entry" as needing to be 16-byte aligned
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (14 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 15/20] x86/kvmclock: Skip kvmclock test when not running on KVM with CLOCKSOURCE2 Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 17/20] x86/smp: Align the stack to a 16-byte boundary when invoking SMP function calls Sean Christopherson
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Mark "struct vmx_msr_entry" as 16-byte aligned so that it can be used in
static definitions without generating random VM-Entry failures due to an
unaligned MSR load/store list.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx_tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 6161f451..31c7672c 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1973,7 +1973,7 @@ struct vmx_msr_entry {
 	u32 index;
 	u32 reserved;
 	u64 value;
-} __attribute__((packed));
+} __attribute__((packed)) __attribute__((aligned(16)));
 
 #define MSR_MAGIC 0x31415926
 struct vmx_msr_entry *exit_msr_store, *entry_msr_load, *exit_msr_load;
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 17/20] x86/smp: Align the stack to a 16-byte boundary when invoking SMP function calls
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (15 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 16/20] x86/vmx: Tag "struct vmx_msr_entry" as needing to be 16-byte aligned Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 18/20] x86/vmx: Write to KVM's WALL_CLOCK MSR via VM-Entry load list sync in SIPI test Sean Christopherson
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

Align RSP to a 16-byte boundary in the IPI handler for SMP function calls
before calling into C code, as required by the x86-64 ABI.  Failure to
ensure the stack is properly aligned leads to obscure failures if a struct
(or any other object) tagged with __attribute__((aligned(16))) (or any
alignment greater than 16) is placed on the stack.  E.g. VM-Enter will fail
on VMX if a vmx_msr_entry structure is placed on the stack.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 lib/x86/smp.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index 366e184c..0cd44cdc 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -58,12 +58,23 @@ static __attribute__((used)) void ipi(void)
 }
 
 asm (
-	 "ipi_entry: \n"
-	 "   call ipi \n"
-#ifndef __x86_64__
-	 "   iret"
+	"ipi_entry: \n"
+#ifdef __x86_64__
+	/*
+	 * Align the stack on a 16-byte boundary (as per x86_64 ABI) before
+	 * calling into C code.  Make sure not to clobber any regs!
+	 */
+	"	push %rbp\n"
+	"	mov %rsp, %rbp\n"
+	"	and $-0x10, %rsp\n"
+#endif
+	"	call ipi\n"
+#ifdef __x86_64__
+	"	mov %rbp, %rsp\n"
+	"	pop %rbp\n"
+	"	iretq"
 #else
-	 "   iretq"
+	"	iret"
 #endif
 	 );
 
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 18/20] x86/vmx: Write to KVM's WALL_CLOCK MSR via VM-Entry load list sync in SIPI test
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (16 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 17/20] x86/smp: Align the stack to a 16-byte boundary when invoking SMP function calls Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 19/20] x86: Better backtraces for leaf functions Sean Christopherson
  2026-05-14 21:05 ` [kvm-unit-tests PATCH v3 20/20] x86: Prevent realmode test code instrumentation with nop-mcount Sean Christopherson
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

In the VMX Wait-for-SIPI => SIPI VM-Exit test, signal that the AP has
entered the guest by writing to MSR_KVM_WALL_CLOCK_NEW (when supported)
via the VM-Entry MSR load list instead of writing to memory from the AP
_before_ actually doing VM-Enter.  Abusing the MSR load list ensures that
the AP's "ready" signal to the BSP happens atomically with respect to
VM-Enter, and thus fixes a race where the BSP can see "ready" and send the
SIPI before the AP has executed VM-Enter.  E.g. with a delay inserted on
the AP, and no delay on the BSP, the test will hang 100% of the time.

Use MSR_KVM_WALL_CLOCK_NEW as it is pretty much the only MSR that KVM
emulates as a per-VM MSR, and that has a high likelihood of being
available.

Keep the BSP's delay before sending the SIPI so that the test continues to
work if MSR_KVM_WALL_CLOCK_NEW isn't available; in bare metal (and most
KVM) setups, hitting the race is practically impossible.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/vmx_tests.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 31c7672c..ac0250b7 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -6,6 +6,7 @@
 
 #include <asm/debugreg.h>
 
+#include "kvmclock.h"
 #include "vmx.h"
 #include "msr.h"
 #include "processor.h"
@@ -10155,6 +10156,8 @@ static void vmx_init_signal_test(void)
 	 */
 }
 
+static bool use_kvm_wall_clock;
+
 #define SIPI_SIGNAL_TEST_DELAY	100000000ULL
 
 static void vmx_sipi_test_guest(void)
@@ -10199,6 +10202,11 @@ static void vmx_sipi_test_guest(void)
 
 static void sipi_test_ap_thread(void *data)
 {
+	const struct vmx_msr_entry msr_load_wall_clock = {
+		.index = MSR_KVM_WALL_CLOCK_NEW,
+		.reserved = 0,
+		.value = 1,
+	};
 	struct guest_regs *regs = this_cpu_guest_regs();
 	struct vmcs *ap_vmcs;
 	u64 *ap_vmxon_region;
@@ -10231,7 +10239,13 @@ static void sipi_test_ap_thread(void *data)
 	/* Set guest activity state to wait-for-SIPI state */
 	vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
-	vmx_set_test_stage(1);
+	if (use_kvm_wall_clock) {
+		wrmsr(MSR_KVM_WALL_CLOCK_NEW, 0);
+		vmcs_write(ENT_MSR_LD_CNT, 1);
+		vmcs_write(ENTER_MSR_LD_ADDR, virt_to_phys(&msr_load_wall_clock));
+	} else {
+		vmx_set_test_stage(1);
+	}
 
 	/* AP enter guest */
 	enter_guest();
@@ -10274,6 +10288,9 @@ static void vmx_sipi_signal_test(void)
 	u64 cpu_ctrl_0 = CPU_SECONDARY;
 	u64 cpu_ctrl_1 = 0;
 
+	use_kvm_wall_clock = this_cpu_has_kvm() &&
+			     this_cpu_has(KVM_FEATURE_CLOCKSOURCE2);
+
 	/* passthrough lapic to L2 */
 	disable_intercept_for_x2apic_msrs();
 	vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
@@ -10290,6 +10307,13 @@ static void vmx_sipi_signal_test(void)
 	/* start AP */
 	on_cpu_async(1, sipi_test_ap_thread, NULL);
 
+	if (use_kvm_wall_clock) {
+		while (rdmsr(MSR_KVM_WALL_CLOCK_NEW) != 1)
+			cpu_relax();
+
+		vmx_set_test_stage(1);
+	}
+
 	/* BSP enter guest */
 	enter_guest();
 }
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 19/20] x86: Better backtraces for leaf functions
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (17 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 18/20] x86/vmx: Write to KVM's WALL_CLOCK MSR via VM-Entry load list sync in SIPI test Sean Christopherson
@ 2026-05-14 21:04 ` Sean Christopherson
  2026-05-14 21:05 ` [kvm-unit-tests PATCH v3 20/20] x86: Prevent realmode test code instrumentation with nop-mcount Sean Christopherson
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

From: Mathias Krause <minipli@grsecurity.net>

Leaf functions are problematic for backtraces as they lack the frame
pointer setup prologue. If such a function causes a fault, the original
caller won't be part of the backtrace. That's problematic if, for
example, memcpy() is failing because it got passed a bad pointer. The
generated backtrace will look like this, providing no clue what the
issue may be:

	STACK: @401b31 4001ad
  0x0000000000401b31: memcpy at lib/string.c:136 (discriminator 3)
        	for (i = 0; i < n; ++i)
      > 		a[i] = b[i];

  0x00000000004001ac: gdt32_end at x86/cstart64.S:127
        	lea __environ(%rip), %rdx
      > 	call main
        	mov %eax, %edi

By abusing profiling, we can force the compiler to emit a frame pointer
setup prologue even for leaf functions, making the above backtrace
change like this:

	STACK: @401c21 400512 4001ad
  0x0000000000401c21: memcpy at lib/string.c:136 (discriminator 3)
        	for (i = 0; i < n; ++i)
      > 		a[i] = b[i];

  0x0000000000400511: main at x86/hypercall.c:91 (discriminator 24)

      > 	memcpy((void *)~0xbadc0de, (void *)0xdeadbeef, 42);

  0x00000000004001ac: gdt32_end at x86/cstart64.S:127
        	lea __environ(%rip), %rdx
      > 	call main
        	mov %eax, %edi

The above backtrace includes the failing memcpy() call, making it much
easier to spot the bug.

Enable "fake profiling" if supported by the compiler to get better
backtraces. The runtime overhead should be negligible for the gained
debuggability, as the profiling call is actually a NOP.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Andrew Jones <andrew.jones@linux.dev>
Fixes: f01ea38a385a ("x86: Better backtraces for leaf functions")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/Makefile.common | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/x86/Makefile.common b/x86/Makefile.common
index ef0e09a6..f7e3ba78 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -43,6 +43,17 @@ COMMON_CFLAGS += -O1
 # stack.o relies on frame pointers.
 KEEP_FRAME_POINTER := y
 
+ifneq ($(KEEP_FRAME_POINTER),)
+# Fake profiling to force the compiler to emit a frame pointer setup also in
+# leaf function (-mno-omit-leaf-frame-pointer doesn't work, unfortunately).
+#
+# Note:
+# We need to defer the cc-option test until -fno-pic or -no-pie have been
+# added to CFLAGS as -mnop-mcount needs it. The lazy evaluation of CFLAGS
+# during compilation makes this do "The Right Thing."
+LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
+endif
+
 FLATLIBS = lib/libcflat.a
 
 ifeq ($(CONFIG_EFI),y)
-- 
2.54.0.563.g4f69b47b94-goog



* [kvm-unit-tests PATCH v3 20/20] x86: Prevent realmode test code instrumentation with nop-mcount
  2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
                   ` (18 preceding siblings ...)
  2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 19/20] x86: Better backtraces for leaf functions Sean Christopherson
@ 2026-05-14 21:05 ` Sean Christopherson
  19 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2026-05-14 21:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Sean Christopherson, Mathias Krause, Andrew Jones

From: Mathias Krause <minipli@grsecurity.net>

Commit f01ea38a385a ("x86: Better backtraces for leaf functions") made
use of '-pg -mnop-mcount' to provide a lightweight way to force leaf
functions to emit a proper prologue for the backtracing code. However,
-mnop-mcount doesn't play well with 16-bit code generation for C code.
gcc happily emits a 5-byte NOP that, when executed in real mode, decodes
as a 4-byte NOP followed by a stray zero byte, wrecking all code that
follows.

Fix that by selectively disabling '-mnop-mcount' for realmode.c, making
it call mcount(), which is provided as a stub function.

Note, a fix for the bad gcc behavior has been queued for gcc-16, i.e.
this workaround can be dropped when gcc-16 is the minimum supported
version for KUT (so in about 30 years).
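To see the decode hazard concretely (an editorial aside, not part of the
patch; assumes binutils' objdump): the canonical 5-byte NOP 0f 1f 44 00 00
disassembles cleanly as 32-bit code, but in 16-bit mode only the first four
bytes form a NOP and the trailing zero byte merges into the next
instruction:

```shell
# 5-byte NOP followed by a 1-byte 'ret' (0xc3)
printf '\x0f\x1f\x44\x00\x00\xc3' > nop5.bin
# 32-bit decode: nopl 0x0(%eax,%eax,1) then ret, as intended
objdump -D -b binary -m i386 nop5.bin
# 16-bit decode: nopw 0x0(%si), then the stray 0x00 pairs with 0xc3
# to form 'add %al,%bl' -- the ret is swallowed
objdump -D -b binary -m i8086 nop5.bin
```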

Link: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=114a19fae9bd [1]
Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: f01ea38a385a ("x86: Better backtraces for leaf functions")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
[sean: add note regarding gcc bug]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 x86/Makefile.common | 5 ++++-
 x86/realmode.c      | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/x86/Makefile.common b/x86/Makefile.common
index f7e3ba78..f5cbd9cf 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -51,7 +51,9 @@ ifneq ($(KEEP_FRAME_POINTER),)
 # We need to defer the cc-option test until -fno-pic or -no-pie have been
 # added to CFLAGS as -mnop-mcount needs it. The lazy evaluation of CFLAGS
 # during compilation makes this do "The Right Thing."
-LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
+NOP_PGFLAGS := -pg -mnop-mcount
+LATE_CFLAGS += $(call cc-option, $(NOP_PGFLAGS), "")
+NO_NOP_MCOUNT = $(if $(filter $(NOP_PGFLAGS),$(LATE_CFLAGS)),-mno-nop-mcount)
 endif
 
 FLATLIBS = lib/libcflat.a
@@ -123,6 +125,7 @@ $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o $(SRCDIR)/$(TEST_DIR)/realmode.
 	      -T $(SRCDIR)/$(TEST_DIR)/realmode.lds $(filter %.o, $^)
 
 $(TEST_DIR)/realmode.o: bits = $(realmode_bits)
+$(TEST_DIR)/realmode.o: CFLAGS += $(NO_NOP_MCOUNT)
 
 $(TEST_DIR)/access_test.$(bin): $(TEST_DIR)/access.o
 
diff --git a/x86/realmode.c b/x86/realmode.c
index 7a4423ec..0a7104d4 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -23,6 +23,9 @@ void test_function(void);
 asm(
 	"test_function: \n\t"
 	"mov $0x1234, %eax \n\t"
+	"ret\n\t"
+	/* mcount() stub */
+	"mcount:\n\t"
 	"ret"
    );
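An editorial aside on the Makefile hunk above (not part of the patch):
NO_NOP_MCOUNT expands to -mno-nop-mcount only when the probed flags
actually landed in LATE_CFLAGS. A standalone GNU make sketch, with
LATE_CFLAGS hard-coded as if cc-option had accepted the flags:

```shell
cat > filter-demo.mk <<'EOF'
# Mirror of the patch's logic; LATE_CFLAGS stands in for a successful
# cc-option probe
NOP_PGFLAGS := -pg -mnop-mcount
LATE_CFLAGS := -O1 -pg -mnop-mcount
NO_NOP_MCOUNT = $(if $(filter $(NOP_PGFLAGS),$(LATE_CFLAGS)),-mno-nop-mcount)
$(info NO_NOP_MCOUNT=$(NO_NOP_MCOUNT))
all: ;
EOF
make -f filter-demo.mk
```

If -pg -mnop-mcount were dropped from LATE_CFLAGS, $(filter) would return
nothing and NO_NOP_MCOUNT would expand empty.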
 
-- 
2.54.0.563.g4f69b47b94-goog



end of thread, other threads:[~2026-05-14 21:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2026-05-14 21:04 [kvm-unit-tests PATCH v3 00/20] x86: Better backtraces for leaf functions Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 01/20] x86/vmx: Drop unused SYSENTER "support" in nested VMX infrastructure Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 02/20] x86/vmx: Drop unused guest_regs " Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 03/20] x86/svm: Sort (and swap) GPRs by their index, not alphabetically Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 04/20] x86: Dedup guest/host context switch of registers across SVM and VMX Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 05/20] x86/virt: Use macro shenanigans to get reg offsets when swapping guest/host regs Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 06/20] x86/virt: Track "guest regs" using per-CPU variable Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 07/20] x86/svm: Don't VMLOAD/VMSAVE "guest" state around VMRUN Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 08/20] x86/vmx: Use separate VMCSes for BSP vs. AP in INIT test Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 09/20] x86/vmx: Swap GPRs after checking "launched" status Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 10/20] x86/vmx: Track VMCS "launched" state per-CPU Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 11/20] x86/vmx: Track "is this CPU in guest mode" per-CPU Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 12/20] x86/vmx: Communicate hypercalls via RAX, not a global field Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 13/20] x86/vmx: Initialize test stage in SIPI test *before* launching AP thread Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 14/20] x86/kvmclock: Replace spaces with tabs Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 15/20] x86/kvmclock: Skip kvmclock test when not running on KVM with CLOCKSOURCE2 Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 16/20] x86/vmx: Tag "struct vmx_msr_entry" as needing to be 16-byte aligned Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 17/20] x86/smp: Align the stack to a 16-byte boundary when invoking SMP function calls Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 18/20] x86/vmx: Write to KVM's WALL_CLOCK MSR via VM-Entry load list sync in SIPI test Sean Christopherson
2026-05-14 21:04 ` [kvm-unit-tests PATCH v3 19/20] x86: Better backtraces for leaf functions Sean Christopherson
2026-05-14 21:05 ` [kvm-unit-tests PATCH v3 20/20] x86: Prevent realmode test code instrumentation with nop-mcount Sean Christopherson
