public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
@ 2025-09-15 21:54 Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Mathias Krause @ 2025-09-15 21:54 UTC (permalink / raw)
  To: Andrew Jones, Alexandru Elisei, Eric Auger, Thomas Huth,
	Paolo Bonzini
  Cc: kvm, kvmarm, Mathias Krause

This is v2 of [1], trying to enhance backtraces involving leaf
functions.

This version fixes backtraces on ARM and ARM64 as well, as ARM currently
fails hard for leaf functions lacking a proper stack frame setup, making
it dereference invalid pointers. ARM64 just skips frames, much like x86
does.

v2 fixes this by introducing the concept of "late CFLAGS" that get
evaluated in the top-level Makefile once all other optional flags have
been added to $(CFLAGS), which is needed for x86's version at least.

Please apply!

Thanks,
Mathias

[1] https://lore.kernel.org/kvm/20250724181759.1974692-1-minipli@grsecurity.net/

Mathias Krause (4):
  Makefile: Provide a concept of late CFLAGS
  x86: Better backtraces for leaf functions
  arm64: Better backtraces for leaf functions
  arm: Fix backtraces involving leaf functions

 Makefile            |  4 ++++
 arm/Makefile.arm    |  8 ++++++++
 arm/Makefile.arm64  |  6 ++++++
 x86/Makefile.common | 11 +++++++++++
 lib/arm/stack.c     | 18 ++++++++++++++++--
 5 files changed, 45 insertions(+), 2 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
@ 2025-09-15 21:54 ` Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mathias Krause @ 2025-09-15 21:54 UTC (permalink / raw)
  To: Andrew Jones, Alexandru Elisei, Eric Auger, Thomas Huth,
	Paolo Bonzini
  Cc: kvm, kvmarm, Mathias Krause

Allow architectures to provide CFLAGS that should be added only after
all other optional CFLAGS have been evaluated.

This will be useful for flags that depend on other, generic ones.

To allow 'LATE_CFLAGS' to make use of the $(cc-option ...) helper,
assume it'll be a lazily evaluated variable. To further ensure the
$(cc-option ...) compiler invocation overhead won't be per-use of
$(CFLAGS), enforce its evaluation prior to extending CFLAGS.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Makefile b/Makefile
index 9dc5d2234e2a..0ce0813bf124 100644
--- a/Makefile
+++ b/Makefile
@@ -95,6 +95,10 @@ CFLAGS += $(wmissing_parameter_type)
 CFLAGS += $(wold_style_declaration)
 CFLAGS += -Woverride-init -Wmissing-prototypes -Wstrict-prototypes
 
+# Evaluate and add late cflags last -- they may depend on previous flags
+LATE_CFLAGS := $(LATE_CFLAGS)
+CFLAGS += $(LATE_CFLAGS)
+
 autodepend-flags = -MMD -MP -MF $(dir $*).$(notdir $*).d
 
 LDFLAGS += -nostdlib $(no_pie) -z noexecstack
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
@ 2025-09-15 21:54 ` Mathias Krause
  2025-10-10  6:03   ` Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 3/4] arm64: " Mathias Krause
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-09-15 21:54 UTC (permalink / raw)
  To: Andrew Jones, Alexandru Elisei, Eric Auger, Thomas Huth,
	Paolo Bonzini
  Cc: kvm, kvmarm, Mathias Krause

Leaf functions are problematic for backtraces as they lack the frame
pointer setup epilogue. If such a function causes a fault, the original
caller won't be part of the backtrace. That's problematic if, for
example, memcpy() is failing because it got passed a bad pointer. The
generated backtrace will look like this, providing no clue what the
issue may be:

	STACK: @401b31 4001ad
  0x0000000000401b31: memcpy at lib/string.c:136 (discriminator 3)
        	for (i = 0; i < n; ++i)
      > 		a[i] = b[i];

  0x00000000004001ac: gdt32_end at x86/cstart64.S:127
        	lea __environ(%rip), %rdx
      > 	call main
        	mov %eax, %edi

By abusing profiling, we can force the compiler to emit a frame pointer
setup epilogue even for leaf functions, making the above backtrace
change like this:

	STACK: @401c21 400512 4001ad
  0x0000000000401c21: memcpy at lib/string.c:136 (discriminator 3)
        	for (i = 0; i < n; ++i)
      > 		a[i] = b[i];

  0x0000000000400511: main at x86/hypercall.c:91 (discriminator 24)

      > 	memcpy((void *)~0xbadc0de, (void *)0xdeadbeef, 42);

  0x00000000004001ac: gdt32_end at x86/cstart64.S:127
        	lea __environ(%rip), %rdx
      > 	call main
        	mov %eax, %edi

Above backtrace includes the failing memcpy() call, making it much
easier to spot the bug.

Enable "fake profiling" if supported by the compiler to get better
backtraces. The runtime overhead should be negligible for the gained
debugability as the profiling call is actually a NOP.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
One may argure that the "ifneq ($(KEEP_FRAME_POINTER),) ... endif"
wrapping isn't needed, and that's true. However, it simplifies toggling
that variable, if there'll ever be a need for it.

 x86/Makefile.common | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/x86/Makefile.common b/x86/Makefile.common
index 5663a65d3df4..be18a77a779e 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -43,6 +43,17 @@ COMMON_CFLAGS += -O1
 # stack.o relies on frame pointers.
 KEEP_FRAME_POINTER := y
 
+ifneq ($(KEEP_FRAME_POINTER),)
+# Fake profiling to force the compiler to emit a frame pointer setup also in
+# leaf function (-mno-omit-leaf-frame-pointer doesn't work, unfortunately).
+#
+# Note:
+# We need to defer the cc-option test until -fno-pic or -no-pie have been
+# added to CFLAGS as -mnop-mcount needs it. The lazy evaluation of CFLAGS
+# during compilation makes this do "The Right Thing."
+LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
+endif
+
 FLATLIBS = lib/libcflat.a
 
 ifeq ($(CONFIG_EFI),y)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [kvm-unit-tests PATCH v2 3/4] arm64: Better backtraces for leaf functions
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
@ 2025-09-15 21:54 ` Mathias Krause
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving " Mathias Krause
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mathias Krause @ 2025-09-15 21:54 UTC (permalink / raw)
  To: Andrew Jones, Alexandru Elisei, Eric Auger, Thomas Huth,
	Paolo Bonzini
  Cc: kvm, kvmarm, Mathias Krause

Similar to x86 before, ARM64 skips the stack frame setup for leaf
functions, making backtraces miss frames.

Fortunately, gcc supports forcing generating stack frames for these via
-mno-omit-leaf-frame-pointer.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 arm/Makefile.arm64 | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index bf7ea2a36d3a..a40c830df20f 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -67,5 +67,11 @@ tests += $(TEST_DIR)/mte.$(exe)
 
 include $(SRCDIR)/$(TEST_DIR)/Makefile.common
 
+ifneq ($(KEEP_FRAME_POINTER),)
+# Force the generation of a regular stack frame even for leaf functions to make
+# stack walking reliable.
+LATE_CFLAGS += $(call cc-option, -mno-omit-leaf-frame-pointer, "")
+endif
+
 arch_clean: arm_clean
 	$(RM) lib/arm64/.*.d
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving leaf functions
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
                   ` (2 preceding siblings ...)
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 3/4] arm64: " Mathias Krause
@ 2025-09-15 21:54 ` Mathias Krause
  2025-09-16 13:04 ` [kvm-unit-tests PATCH v2 0/4] Better backtraces for " Andrew Jones
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Mathias Krause @ 2025-09-15 21:54 UTC (permalink / raw)
  To: Andrew Jones, Alexandru Elisei, Eric Auger, Thomas Huth,
	Paolo Bonzini
  Cc: kvm, kvmarm, Mathias Krause

When backtracing starts in a leaf function, it'll cause the code to go
off the rials as the stack frame for leaf functions is incomplete -- it
lacks the return address, likely just causing recursive exception
handling by trying to follow an invalid pointer chain.

Unfortunately, -mno-omit-leaf-frame-pointer isn't supported for the ARM
target as an easy fix. Make use of -mapcs-frame instead to force the
generation of an APCS stack frame layout[1] that can be traversed
reliably.

As Clang doesn't support -mapcs-frame, make the stack walking code
handle this by using the (old) more compact standard format as a
fall-back.

Link: https://developer.arm.com/documentation/dui0041/c/ARM-Procedure-Call-Standard/APCS-definition/The-stack-backtrace-data-structure [1]
Signed-off-by: Mathias Krause <minipli@grsecurity.net>

---
I failed to build KUT with Clang for ARM for various reasons, the code
is clearly lacking Clang support for ARM, so I doubt this fall-back will
be needed / used anytime soon.

 arm/Makefile.arm |  8 ++++++++
 lib/arm/stack.c  | 18 ++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arm/Makefile.arm b/arm/Makefile.arm
index d6250b7fb686..7734e17fe583 100644
--- a/arm/Makefile.arm
+++ b/arm/Makefile.arm
@@ -39,4 +39,12 @@ tests =
 
 include $(SRCDIR)/$(TEST_DIR)/Makefile.common
 
+ifneq ($(KEEP_FRAME_POINTER),)
+# Force the generation of an APCS stack frame layout to be able to reliably
+# walk the stack. Otherwise the compiler may omit saving LR on the stack for
+# leaf functions and, unfortunately, -mno-omit-leaf-frame-pointer isn't
+# supported on ARM :(
+LATE_CFLAGS += $(call cc-option, -mapcs-frame -DAPCS_FRAMES, "")
+endif
+
 arch_clean: arm_clean
diff --git a/lib/arm/stack.c b/lib/arm/stack.c
index 66d18b47ea53..b2384d8eb4c1 100644
--- a/lib/arm/stack.c
+++ b/lib/arm/stack.c
@@ -8,6 +8,20 @@
 #include <libcflat.h>
 #include <stack.h>
 
+/*
+ * APCS stack frames are generated by code like this:
+ * | mov  ip, sp
+ * | push {..., fp, ip, lr, pc}
+ * | sub  fp, ip, #4
+ */
+#ifdef APCS_FRAMES
+# define FP_IDX -3
+# define LR_IDX -1
+#else
+# define FP_IDX -1
+# define LR_IDX 0
+#endif
+
 int arch_backtrace_frame(const void *frame, const void **return_addrs,
 			 int max_depth, bool current_frame)
 {
@@ -27,10 +41,10 @@ int arch_backtrace_frame(const void *frame, const void **return_addrs,
 	for (depth = 0; depth < max_depth; depth++) {
 		if (!fp)
 			break;
-		return_addrs[depth] = (void *)fp[0];
+		return_addrs[depth] = (void *)fp[LR_IDX];
 		if (return_addrs[depth] == 0)
 			break;
-		fp = (unsigned long *)fp[-1];
+		fp = (unsigned long *)fp[FP_IDX];
 	}
 
 	walking = 0;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
                   ` (3 preceding siblings ...)
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving " Mathias Krause
@ 2025-09-16 13:04 ` Andrew Jones
  2025-11-14 15:58 ` Mathias Krause
  2025-11-14 18:25 ` Sean Christopherson
  6 siblings, 0 replies; 22+ messages in thread
From: Andrew Jones @ 2025-09-16 13:04 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Alexandru Elisei, Eric Auger, Thomas Huth, Paolo Bonzini, kvm,
	kvmarm

On Mon, Sep 15, 2025 at 11:54:28PM +0200, Mathias Krause wrote:
> This is v2 of [1], trying to enhance backtraces involving leaf
> functions.
> 
> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
> fails hard for leaf functions lacking a proper stack frame setup, making
> it dereference invalid pointers. ARM64 just skips frames, much like x86
> does.

Hi Mathias,

Thank you for the arm/arm64 fixes!

For the arm/arm64 patches

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Andrew Jones <andrew.jones@linux.dev>

Thanks,
drew

> 
> v2 fixes this by introducing the concept of "late CFLAGS" that get
> evaluated in the top-level Makefile once all other optional flags have
> been added to $(CFLAGS), which is needed for x86's version at least.
> 
> Please apply!
> 
> Thanks,
> Mathias
> 
> [1] https://lore.kernel.org/kvm/20250724181759.1974692-1-minipli@grsecurity.net/
> 
> Mathias Krause (4):
>   Makefile: Provide a concept of late CFLAGS
>   x86: Better backtraces for leaf functions
>   arm64: Better backtraces for leaf functions
>   arm: Fix backtraces involving leaf functions
> 
>  Makefile            |  4 ++++
>  arm/Makefile.arm    |  8 ++++++++
>  arm/Makefile.arm64  |  6 ++++++
>  x86/Makefile.common | 11 +++++++++++
>  lib/arm/stack.c     | 18 ++++++++++++++++--
>  5 files changed, 45 insertions(+), 2 deletions(-)
> 
> -- 
> 2.47.3
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions
  2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
@ 2025-10-10  6:03   ` Mathias Krause
  0 siblings, 0 replies; 22+ messages in thread
From: Mathias Krause @ 2025-10-10  6:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Jones, Alexandru Elisei, Eric Auger, Thomas Huth, kvm,
	kvmarm

On 9/15/25 23:54, Mathias Krause wrote:
> Leaf functions are problematic for backtraces as they lack the frame
> pointer setup epilogue. If such a function causes a fault, the original
> caller won't be part of the backtrace. That's problematic if, for
> example, memcpy() is failing because it got passed a bad pointer. The
> generated backtrace will look like this, providing no clue what the
> issue may be:
> 
> 	STACK: @401b31 4001ad
>   0x0000000000401b31: memcpy at lib/string.c:136 (discriminator 3)
>         	for (i = 0; i < n; ++i)
>       > 		a[i] = b[i];
> 
>   0x00000000004001ac: gdt32_end at x86/cstart64.S:127
>         	lea __environ(%rip), %rdx
>       > 	call main
>         	mov %eax, %edi
> 
> By abusing profiling, we can force the compiler to emit a frame pointer
> setup epilogue even for leaf functions, making the above backtrace
> change like this:
> 
> 	STACK: @401c21 400512 4001ad
>   0x0000000000401c21: memcpy at lib/string.c:136 (discriminator 3)
>         	for (i = 0; i < n; ++i)
>       > 		a[i] = b[i];
> 
>   0x0000000000400511: main at x86/hypercall.c:91 (discriminator 24)
> 
>       > 	memcpy((void *)~0xbadc0de, (void *)0xdeadbeef, 42);
> 
>   0x00000000004001ac: gdt32_end at x86/cstart64.S:127
>         	lea __environ(%rip), %rdx
>       > 	call main
>         	mov %eax, %edi
> 
> Above backtrace includes the failing memcpy() call, making it much
> easier to spot the bug.
> 
> Enable "fake profiling" if supported by the compiler to get better
> backtraces. The runtime overhead should be negligible for the gained
> debugability as the profiling call is actually a NOP.
> 
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> ---
> One may argure that the "ifneq ($(KEEP_FRAME_POINTER),) ... endif"
> wrapping isn't needed, and that's true. However, it simplifies toggling
> that variable, if there'll ever be a need for it.
> 
>  x86/Makefile.common | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/x86/Makefile.common b/x86/Makefile.common
> index 5663a65d3df4..be18a77a779e 100644
> --- a/x86/Makefile.common
> +++ b/x86/Makefile.common
> @@ -43,6 +43,17 @@ COMMON_CFLAGS += -O1
>  # stack.o relies on frame pointers.
>  KEEP_FRAME_POINTER := y
>  
> +ifneq ($(KEEP_FRAME_POINTER),)
> +# Fake profiling to force the compiler to emit a frame pointer setup also in
> +# leaf function (-mno-omit-leaf-frame-pointer doesn't work, unfortunately).
> +#
> +# Note:
> +# We need to defer the cc-option test until -fno-pic or -no-pie have been
> +# added to CFLAGS as -mnop-mcount needs it. The lazy evaluation of CFLAGS
> +# during compilation makes this do "The Right Thing."
> +LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
> +endif
> +
>  FLATLIBS = lib/libcflat.a
>  
>  ifeq ($(CONFIG_EFI),y)

Paolo, can you please comment on this one, so the fixes for ARM and
AArch64 are no longer blocked and, ideally, this series can be merged?

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
                   ` (4 preceding siblings ...)
  2025-09-16 13:04 ` [kvm-unit-tests PATCH v2 0/4] Better backtraces for " Andrew Jones
@ 2025-11-14 15:58 ` Mathias Krause
  2025-11-14 16:39   ` Sean Christopherson
  2025-11-14 18:25 ` Sean Christopherson
  6 siblings, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-11-14 15:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Paolo Bonzini, Andrew Jones, Eric Auger,
	Alexandru Elisei, Thomas Huth

+Sean

Sean, while you're currently in KUT maintainer mode, can you please take
a look at this? The ARM bits have already been ack'ed by Andrew and
Paolo, apparently, has little time for KUT as well. But as you currently
seem to be looking through the pending KUT patches, maybe this one can
get some attention as well?

Thanks,
Mathias

On 15.09.25 23:54, Mathias Krause wrote:
> This is v2 of [1], trying to enhance backtraces involving leaf
> functions.
> 
> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
> fails hard for leaf functions lacking a proper stack frame setup, making
> it dereference invalid pointers. ARM64 just skips frames, much like x86
> does.
> 
> v2 fixes this by introducing the concept of "late CFLAGS" that get
> evaluated in the top-level Makefile once all other optional flags have
> been added to $(CFLAGS), which is needed for x86's version at least.
> 
> Please apply!
> 
> Thanks,
> Mathias
> 
> [1] https://lore.kernel.org/kvm/20250724181759.1974692-1-minipli@grsecurity.net/
> 
> Mathias Krause (4):
>   Makefile: Provide a concept of late CFLAGS
>   x86: Better backtraces for leaf functions
>   arm64: Better backtraces for leaf functions
>   arm: Fix backtraces involving leaf functions
> 
>  Makefile            |  4 ++++
>  arm/Makefile.arm    |  8 ++++++++
>  arm/Makefile.arm64  |  6 ++++++
>  x86/Makefile.common | 11 +++++++++++
>  lib/arm/stack.c     | 18 ++++++++++++++++--
>  5 files changed, 45 insertions(+), 2 deletions(-)
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-14 15:58 ` Mathias Krause
@ 2025-11-14 16:39   ` Sean Christopherson
  0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-11-14 16:39 UTC (permalink / raw)
  To: Mathias Krause
  Cc: kvm, kvmarm, Paolo Bonzini, Andrew Jones, Eric Auger,
	Alexandru Elisei, Thomas Huth

On Fri, Nov 14, 2025, Mathias Krause wrote:
> +Sean
> 
> Sean, while you're currently in KUT maintainer mode, can you please take
> a look at this? The ARM bits have already been ack'ed by Andrew and
> Paolo, apparently, has little time for KUT as well. But as you currently
> seem to be looking through the pending KUT patches, maybe this one can
> get some attention as well?

Got 'em.  Thanks for the heads up, this wasn't on my radar (I saw that Drew had
responded to v1 and assumed he had taken them).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
                   ` (5 preceding siblings ...)
  2025-11-14 15:58 ` Mathias Krause
@ 2025-11-14 18:25 ` Sean Christopherson
  2025-11-15  4:56   ` Mathias Krause
  6 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-11-14 18:25 UTC (permalink / raw)
  To: Sean Christopherson, Andrew Jones, Alexandru Elisei, Eric Auger,
	Thomas Huth, Paolo Bonzini, Mathias Krause
  Cc: kvm, kvmarm

On Mon, 15 Sep 2025 23:54:28 +0200, Mathias Krause wrote:
> This is v2 of [1], trying to enhance backtraces involving leaf
> functions.
> 
> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
> fails hard for leaf functions lacking a proper stack frame setup, making
> it dereference invalid pointers. ARM64 just skips frames, much like x86
> does.
> 
> [...]

Applied to kvm-x86 next, thanks!

P.S. This also prompted me to get pretty_print_stacks.py working in my
     environment, so double thanks!

[1/4] Makefile: Provide a concept of late CFLAGS
      https://github.com/kvm-x86/kvm-unit-tests/commit/816fe2d45aed
[2/4] x86: Better backtraces for leaf functions
      https://github.com/kvm-x86/kvm-unit-tests/commit/f01ea38a385a
[3/4] arm64: Better backtraces for leaf functions
      https://github.com/kvm-x86/kvm-unit-tests/commit/da1804215c8e
[4/4] arm: Fix backtraces involving leaf functions
      https://github.com/kvm-x86/kvm-unit-tests/commit/c885c94f523e

--
https://github.com/kvm-x86/kvm-unit-tests/tree/next

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-14 18:25 ` Sean Christopherson
@ 2025-11-15  4:56   ` Mathias Krause
  2025-11-17 22:19     ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-11-15  4:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On 14.11.25 19:25, Sean Christopherson wrote:
> On Mon, 15 Sep 2025 23:54:28 +0200, Mathias Krause wrote:
>> This is v2 of [1], trying to enhance backtraces involving leaf
>> functions.
>>
>> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
>> fails hard for leaf functions lacking a proper stack frame setup, making
>> it dereference invalid pointers. ARM64 just skips frames, much like x86
>> does.
>>
>> [...]
> 
> Applied to kvm-x86 next, thanks!

Thanks a lot, Sean!

> P.S. This also prompted me to get pretty_print_stacks.py working in my
>      environment, so double thanks!

Haha, you're welcome! :D

> 
> [1/4] Makefile: Provide a concept of late CFLAGS
>       https://github.com/kvm-x86/kvm-unit-tests/commit/816fe2d45aed
> [2/4] x86: Better backtraces for leaf functions
>       https://github.com/kvm-x86/kvm-unit-tests/commit/f01ea38a385a
> [3/4] arm64: Better backtraces for leaf functions
>       https://github.com/kvm-x86/kvm-unit-tests/commit/da1804215c8e
> [4/4] arm: Fix backtraces involving leaf functions
>       https://github.com/kvm-x86/kvm-unit-tests/commit/c885c94f523e
> 
> --
> https://github.com/kvm-x86/kvm-unit-tests/tree/next


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-15  4:56   ` Mathias Krause
@ 2025-11-17 22:19     ` Sean Christopherson
  2025-11-18  1:33       ` Mathias Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-11-17 22:19 UTC (permalink / raw)
  To: Mathias Krause
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On Sat, Nov 15, 2025, Mathias Krause wrote:
> On 14.11.25 19:25, Sean Christopherson wrote:
> > On Mon, 15 Sep 2025 23:54:28 +0200, Mathias Krause wrote:
> >> This is v2 of [1], trying to enhance backtraces involving leaf
> >> functions.
> >>
> >> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
> >> fails hard for leaf functions lacking a proper stack frame setup, making
> >> it dereference invalid pointers. ARM64 just skips frames, much like x86
> >> does.
> >>
> >> [...]
> > 
> > Applied to kvm-x86 next, thanks!
> 
> Thanks a lot, Sean!
> 
> > P.S. This also prompted me to get pretty_print_stacks.py working in my
> >      environment, so double thanks!
> 
> Haha, you're welcome! :D
> 
> > 
> > [1/4] Makefile: Provide a concept of late CFLAGS
> >       https://github.com/kvm-x86/kvm-unit-tests/commit/816fe2d45aed
> > [2/4] x86: Better backtraces for leaf functions
> >       https://github.com/kvm-x86/kvm-unit-tests/commit/f01ea38a385a

Spoke too soon :-(

The x86 change breaks the realmode test.  I didn't try hard to debug, as that
test is brittle, e.g. see https://lore.kernel.org/all/20240604143507.1041901-1-pbonzini@redhat.com.

I can't for the life of me figure out how to get Makefile variables to do what I
want, so for now I'm going to drop the x86 change so as not to block the rest of
the stuff I've got applied.

I'll keep the rest applied, so just resubmit the x86 patch against kvm-x86/next.

FWIW, conceptually I think we want something like this:

diff --git a/x86/Makefile.common b/x86/Makefile.common
index be18a77a..65e41578 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -44,6 +44,7 @@ COMMON_CFLAGS += -O1
 KEEP_FRAME_POINTER := y
 
 ifneq ($(KEEP_FRAME_POINTER),)
+ifneq ($(no_profiling),y)
 # Fake profiling to force the compiler to emit a frame pointer setup also in
 # leaf function (-mno-omit-leaf-frame-pointer doesn't work, unfortunately).
 #
@@ -53,6 +54,7 @@ ifneq ($(KEEP_FRAME_POINTER),)
 # during compilation makes this do "The Right Thing."
 LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
 endif
+endif
 
 FLATLIBS = lib/libcflat.a
 
@@ -120,6 +122,7 @@ $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o $(SRCDIR)/$(TEST_DIR)/realmode.
        $(LD) -m elf_i386 -nostdlib -o $@ \
              -T $(SRCDIR)/$(TEST_DIR)/realmode.lds $(filter %.o, $^)
 
+$(TEST_DIR)/realmode.o: no_profiling = y
 $(TEST_DIR)/realmode.o: bits = $(realmode_bits)
 
 $(TEST_DIR)/access_test.$(bin): $(TEST_DIR)/access.o

> > [3/4] arm64: Better backtraces for leaf functions
> >       https://github.com/kvm-x86/kvm-unit-tests/commit/da1804215c8e
> > [4/4] arm: Fix backtraces involving leaf functions
> >       https://github.com/kvm-x86/kvm-unit-tests/commit/c885c94f523e
> > 
> > --
> > https://github.com/kvm-x86/kvm-unit-tests/tree/next
> 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-17 22:19     ` Sean Christopherson
@ 2025-11-18  1:33       ` Mathias Krause
  2025-11-18  1:47         ` Mathias Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-11-18  1:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 4248 bytes --]

On 17.11.25 23:19, Sean Christopherson wrote:
> On Sat, Nov 15, 2025, Mathias Krause wrote:
>> On 14.11.25 19:25, Sean Christopherson wrote:
>>> On Mon, 15 Sep 2025 23:54:28 +0200, Mathias Krause wrote:
>>>> This is v2 of [1], trying to enhance backtraces involving leaf
>>>> functions.
>>>>
>>>> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
>>>> fails hard for leaf functions lacking a proper stack frame setup, making
>>>> it dereference invalid pointers. ARM64 just skips frames, much like x86
>>>> does.
>>>>
>>>> [...]
>>>
>>> Applied to kvm-x86 next, thanks!
>>
>> Thanks a lot, Sean!
>>
>>> P.S. This also prompted me to get pretty_print_stacks.py working in my
>>>      environment, so double thanks!
>>
>> Haha, you're welcome! :D
>>
>>>
>>> [1/4] Makefile: Provide a concept of late CFLAGS
>>>       https://github.com/kvm-x86/kvm-unit-tests/commit/816fe2d45aed
>>> [2/4] x86: Better backtraces for leaf functions
>>>       https://github.com/kvm-x86/kvm-unit-tests/commit/f01ea38a385a
> 
> Spoke too soon :-(
> 
> The x86 change breaks the realmode test.  I didn't try hard to debug, as that
> test is brittle, e.g. see https://lore.kernel.org/all/20240604143507.1041901-1-pbonzini@redhat.com.

Gee! 16bit code at its finest.

So I looked at it and this seems to be a bug in gcc because it emits a
5-byte NOP to make room for a 3-byte call. Actually, it doesn't even
emit a NOP, it just spits out some bytes:

minipli@bell:~$ diff -u pg.s nop_mcount.s
--- pg.s	2025-11-18 01:34:34.884472417 +0100
+++ nop_mcount.s	2025-11-18 01:34:05.716096290 +0100
@@ -11,7 +11,7 @@
 	.cfi_offset 5, -8
 	movl	%esp, %ebp
 	.cfi_def_cfa_register 5
-1:	call	mcount
+1:	.byte	0x0f, 0x1f, 0x44, 0x00, 0x00
 	movl	8(%ebp), %eax
 	addl	%eax, %eax
 	popl	%ebp

The issue is, while that is a perfectly fine NOPL for 32-bit and 64-bit,
it's not for 16-bit. The 16-bit version gets treated as a 4-byte NOPW
with a left-over zero byte which messes with all the following opcodes
and causes issues like you observed.

However, gcc is probably not fully to blame, as it cannot know that it's
generating 16-bit code. That piece is hidden in a asm(".code16gcc")
statement.

> 
> I can't for the life of me figure out how to get Makefile variables to do what I
> want, so for now I'm going to drop the x86 change so as not to block the rest of
> the stuff I've got applied.
> 
> I'll keep the rest applied, so just resubmit the x86 patch against kvm-x86/next.
> 


> FWIW, conceptually I think we want something like this:
> 
> diff --git a/x86/Makefile.common b/x86/Makefile.common
> index be18a77a..65e41578 100644
> --- a/x86/Makefile.common
> +++ b/x86/Makefile.common
> @@ -44,6 +44,7 @@ COMMON_CFLAGS += -O1
>  KEEP_FRAME_POINTER := y
>  
>  ifneq ($(KEEP_FRAME_POINTER),)
> +ifneq ($(no_profiling),y)
>  # Fake profiling to force the compiler to emit a frame pointer setup also in
>  # leaf function (-mno-omit-leaf-frame-pointer doesn't work, unfortunately).
>  #
> @@ -53,6 +54,7 @@ ifneq ($(KEEP_FRAME_POINTER),)
>  # during compilation makes this do "The Right Thing."
>  LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
>  endif
> +endif
>  
>  FLATLIBS = lib/libcflat.a
>  
> @@ -120,6 +122,7 @@ $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o $(SRCDIR)/$(TEST_DIR)/realmode.
>         $(LD) -m elf_i386 -nostdlib -o $@ \
>               -T $(SRCDIR)/$(TEST_DIR)/realmode.lds $(filter %.o, $^)
>  
> +$(TEST_DIR)/realmode.o: no_profiling = y
>  $(TEST_DIR)/realmode.o: bits = $(realmode_bits)

Kind of, yes. Just with the $(no_profiling) part embedded into
LATE_CFLAGS. However, as that gets forcibly evaluated early (on
purpose!), that can't work.

I tried to play with $(filter-out ...,$(CFLAGS)) but that didn't work
out either. What does work, however, is to disable -mnop-mcount and make
it a call again. This just needs a stub mcount() and the realmode test
is working again.

I see that kvm-x86/next as of now (e403d30a8210 ("x86/emulator: Treat
DR6_BUS_LOCK as writable if CPU has BUS_LOCK_DETECT")) still has
f01ea38a385a ("x86: Better backtraces for leaf functions"), so I just
created a fixup commit (attached). Feel free to squash it into
f01ea38a385a or yell at if you want something else.

Thanks,
Mathias

[-- Attachment #2: 0001-x86-Prevent-realmode-test-code-instrumentation-with-.patch --]
[-- Type: text/x-patch, Size: 2392 bytes --]

From d2ac15881b794a4e3932c49a97278d0426f0aa67 Mon Sep 17 00:00:00 2001
From: Mathias Krause <minipli@grsecurity.net>
Date: Tue, 18 Nov 2025 02:14:27 +0100
Subject: [kvm-unit-tests PATCH] x86: Prevent realmode test code
 instrumentation with nop-mcount

Commit f01ea38a385a ("x86: Better backtraces for leaf functions") made
use of '-pg -mnop-mcount' to provide a lightweight way to force leaf
functions to emit a proper prologue for the backtracing code. However,
-mnop-mcount doesn't play well with 16-bit code generation for C code.
gcc happily emits a 5-byte NOP that transmutes to a 4-byte NOP followed
by a zero byte when executed in real mode, wrecking all code that
follows.

Fix that by selectively disabling '-mnop-mcount' for realmode.c, making
it call mcount(), which is provided as a stub function.

Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: f01ea38a385a ("x86: Better backtraces for leaf functions")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 x86/Makefile.common | 5 ++++-
 x86/realmode.c      | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/x86/Makefile.common b/x86/Makefile.common
index be18a77a779e..6b59f26a3d2c 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -51,7 +51,9 @@ ifneq ($(KEEP_FRAME_POINTER),)
 # We need to defer the cc-option test until -fno-pic or -no-pie have been
 # added to CFLAGS as -mnop-mcount needs it. The lazy evaluation of CFLAGS
 # during compilation makes this do "The Right Thing."
-LATE_CFLAGS += $(call cc-option, -pg -mnop-mcount, "")
+NOP_PGFLAGS := -pg -mnop-mcount
+LATE_CFLAGS += $(call cc-option, $(NOP_PGFLAGS), "")
+NO_NOP_MCOUNT = $(if $(filter $(NOP_PGFLAGS),$(LATE_CFLAGS)),-mno-nop-mcount)
 endif
 
 FLATLIBS = lib/libcflat.a
@@ -121,6 +123,7 @@ $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o $(SRCDIR)/$(TEST_DIR)/realmode.
 	      -T $(SRCDIR)/$(TEST_DIR)/realmode.lds $(filter %.o, $^)
 
 $(TEST_DIR)/realmode.o: bits = $(realmode_bits)
+$(TEST_DIR)/realmode.o: CFLAGS += $(NO_NOP_MCOUNT)
 
 $(TEST_DIR)/access_test.$(bin): $(TEST_DIR)/access.o
 
diff --git a/x86/realmode.c b/x86/realmode.c
index 7a4423ec6098..0a7104d4cff4 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -23,6 +23,9 @@ void test_function(void);
 asm(
 	"test_function: \n\t"
 	"mov $0x1234, %eax \n\t"
+	"ret\n\t"
+	/* mcount() stub */
+	"mcount:\n\t"
 	"ret"
    );
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-18  1:33       ` Mathias Krause
@ 2025-11-18  1:47         ` Mathias Krause
  2025-11-18  4:04           ` Mathias Krause
  2025-11-21 16:44           ` Mathias Krause
  0 siblings, 2 replies; 22+ messages in thread
From: Mathias Krause @ 2025-11-18  1:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On 18.11.25 02:33, Mathias Krause wrote:
> On 17.11.25 23:19, Sean Christopherson wrote:
>> On Sat, Nov 15, 2025, Mathias Krause wrote:
>>> On 14.11.25 19:25, Sean Christopherson wrote:
>>>> On Mon, 15 Sep 2025 23:54:28 +0200, Mathias Krause wrote:
>>>>> This is v2 of [1], trying to enhance backtraces involving leaf
>>>>> functions.
>>>>>
>>>>> This version fixes backtraces on ARM and ARM64 as well, as ARM currently
>>>>> fails hard for leaf functions lacking a proper stack frame setup, making
>>>>> it dereference invalid pointers. ARM64 just skips frames, much like x86
>>>>> does.
>>>>>
>>>>> [...]
>>>>
>>>> Applied to kvm-x86 next, thanks!
>>>
>>> Thanks a lot, Sean!
>>>
>>>> P.S. This also prompted me to get pretty_print_stacks.py working in my
>>>>      environment, so double thanks!
>>>
>>> Haha, you're welcome! :D
>>>
>>>>
>>>> [1/4] Makefile: Provide a concept of late CFLAGS
>>>>       https://github.com/kvm-x86/kvm-unit-tests/commit/816fe2d45aed
>>>> [2/4] x86: Better backtraces for leaf functions
>>>>       https://github.com/kvm-x86/kvm-unit-tests/commit/f01ea38a385a
>>
>> Spoke too soon :-(
>>
>> The x86 change breaks the realmode test.  I didn't try hard to debug, as that
>> test is brittle, e.g. see https://lore.kernel.org/all/20240604143507.1041901-1-pbonzini@redhat.com.

Bleh, I just noticed, f01ea38a385a ("x86: Better backtraces for leaf
functions") broke vmx_sipi_signal_test too :(

Looking into it!

Mathias

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-18  1:47         ` Mathias Krause
@ 2025-11-18  4:04           ` Mathias Krause
  2025-11-18 11:56             ` Mathias Krause
  2025-11-21 16:44           ` Mathias Krause
  1 sibling, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-11-18  4:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On 18.11.25 02:47, Mathias Krause wrote:
> On 18.11.25 02:33, Mathias Krause wrote:
>> On 17.11.25 23:19, Sean Christopherson wrote:
>>> Spoke too soon :-(
>>>
>>> The x86 change breaks the realmode test.  I didn't try hard to debug, as that
>>> test is brittle, e.g. see https://lore.kernel.org/all/20240604143507.1041901-1-pbonzini@redhat.com.
> 
> Bleh, I just noticed, f01ea38a385a ("x86: Better backtraces for leaf
> functions") broke vmx_sipi_signal_test too :(
> 
> Looking into it!

Can't see it. The issue is, the call to hypercall(1) does return in
x86/vmx.c:guest_entry() with f01ea38a385a applied but not w/o it. But
that makes no sense to me whatsoever. :/

Below debug diff shows the issue:

diff --git a/x86/vmx.c b/x86/vmx.c
index c803eaa67ac6..e49668f5ff95 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -806,4 +806,5 @@ asm(
        "       mov $1, %edi\n\t"
        "       call hypercall\n\t"
+       "       ud2"
 );

It also explains why the test fails: it's executing random code that
follows, in my case, __this_cpu_has().

Catching some sleep!

Mathias

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-18  4:04           ` Mathias Krause
@ 2025-11-18 11:56             ` Mathias Krause
  2025-11-18 12:10               ` Mathias Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-11-18 11:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On 18.11.25 05:04, Mathias Krause wrote:
> On 18.11.25 02:47, Mathias Krause wrote:
>> On 18.11.25 02:33, Mathias Krause wrote:
>>> On 17.11.25 23:19, Sean Christopherson wrote:
>>>> Spoke too soon :-(
>>>>
>>>> The x86 change breaks the realmode test.  I didn't try hard to debug, as that
>>>> test is brittle, e.g. see https://lore.kernel.org/all/20240604143507.1041901-1-pbonzini@redhat.com.
>>
>> Bleh, I just noticed, f01ea38a385a ("x86: Better backtraces for leaf
>> functions") broke vmx_sipi_signal_test too :(
>>
>> Looking into it!
> 
> Can't see it. The issue is, the call to hypercall(1) does return in
> x86/vmx.c:guest_entry() with f01ea38a385a applied but not w/o it. But
> that makes no sense to me whatsoever. :/

After some sleep it makes lot more sense. Well, not really, but it's a
race that f01ea38a385a, apparently, makes easier to hit.

> 
> Below debug diff shows the issue:
> 
> diff --git a/x86/vmx.c b/x86/vmx.c
> index c803eaa67ac6..e49668f5ff95 100644
> --- a/x86/vmx.c
> +++ b/x86/vmx.c
> @@ -806,4 +806,5 @@ asm(
>         "       mov $1, %edi\n\t"
>         "       call hypercall\n\t"
> +       "       ud2"
>  );
That hypercall call shouldn't return, as it's HYPERCALL_VMEXIT which
makes vmx_run() return and not re-enter the guest. But, apparently, it
does so and it does so because the guest/VMM shared variable
'hypercall_field' to communicate to the VMM what to do can also be
modified concurrently by the other CPU/guest #2 after it was written by
CPU/guest #1 but before read by the VMM. Bummer!

I'll send a fix in a few.

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-18 11:56             ` Mathias Krause
@ 2025-11-18 12:10               ` Mathias Krause
  0 siblings, 0 replies; 22+ messages in thread
From: Mathias Krause @ 2025-11-18 12:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On 18.11.25 12:56, Mathias Krause wrote:
> On 18.11.25 05:04, Mathias Krause wrote:
> [...]> After some sleep it makes lot more sense. Well, not really, but
it's a
> race that f01ea38a385a, apparently, makes easier to hit.
> 
>>
>> Below debug diff shows the issue:
>>
>> diff --git a/x86/vmx.c b/x86/vmx.c
>> index c803eaa67ac6..e49668f5ff95 100644
>> --- a/x86/vmx.c
>> +++ b/x86/vmx.c
>> @@ -806,4 +806,5 @@ asm(
>>         "       mov $1, %edi\n\t"
>>         "       call hypercall\n\t"
>> +       "       ud2"
>>  );
> That hypercall call shouldn't return, as it's HYPERCALL_VMEXIT which
> makes vmx_run() return and not re-enter the guest. But, apparently, it
> does so and it does so because the guest/VMM shared variable
> 'hypercall_field' to communicate to the VMM what to do can also be
> modified concurrently by the other CPU/guest #2 after it was written by
> CPU/guest #1 but before read by the VMM. Bummer!

Bleh, I messed up my test setup and it's not it. Back to square one.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-18  1:47         ` Mathias Krause
  2025-11-18  4:04           ` Mathias Krause
@ 2025-11-21 16:44           ` Mathias Krause
  2025-12-18  1:44             ` Sean Christopherson
  1 sibling, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-11-21 16:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]

On 18.11.25 02:47, Mathias Krause wrote:
> On 18.11.25 02:33, Mathias Krause wrote:
> [...]
> Bleh, I just noticed, f01ea38a385a ("x86: Better backtraces for leaf
> functions") broke vmx_sipi_signal_test too :(
> 
> Looking into it!

Finally found it. It's register corruption within both host and guest.

It's not related to f01ea38a385a at all but, apparently, it actually
exposes it, likely because of the enforced stack frame setup, making the
code rely on (a corrupted) RBP instead of the properly restored (because
VMCS managed) RSP.

The core issue is, 'regs' being a singleton, used by multiple CPUs, so
all SMP VMX tests concurrently making use of vmx_enter_guest() are
potentially affected.

When the first vCPU calls vmx_enter_guest() to launch a guest, it'll use
'regs' to load the guest registers but also store its host register
state. Now, if while that vCPU is running, another vCPU gets launched
via vmx_enter_guest(), it'll load the previous vCPU's host register
values as guest register state and store its host registers in 'regs'.

Depending on which vCPU returns first, it'll either load the other
vCPU's host registers effectively "switching threads" or, if it's the
vCPU that called vmx_enter_guest() last, it'll resume just fine. Either
way, the next vCPU returning will run with the guest register values.

The latter is what happens with vmx_sipi_signal_test, causing the crash.

I read a lot of vmx.c and vmx_test.c in the last few days and it's
really not meant to be used concurrently by multiple guests. vmx_test.c
has quite some hacks to work around obvious limitations (allocating
dedicated stacks for APs) but state variables like 'launched',
'in_guest', 'guest_finished', 'hypercall_field' and 'regs' are shared
but really meant to be used only by a single thread.

I hacked up something to verify my theory and made 'regs' "per-cpu". It
needs quite some code churn and I'm not all that happy with it. IMHO,
'regs' and lots of the other VMX management state should be part of some
vcpu struct or something. In fact, struct vmx_test already has a
'guest_regs' but using it won't work, as we need offsetable absolute
memory references for the inline ASM in vmx_enter_guest() to work as it
cannot make use of register-based memory references at all. (My hack
uses a global 'scratch_regs' with mutual exclusion on its usage.)

To see the register corruption, one could start the vmx_sipi_signal_test
test with -s -S, attach gdb to it and add a watch for regs.rax. Stepping
through the test will clearly show how 'regs' get overwritten wrongly.

In one shell:
$ ./x86-run x86/vmx.flat -append vmx_sipi_signal_test \
	-cpu max,+vmx -smp 2 -s -S

In another:
$ gdb -q -ex 'target remote :1234' x86/vmx.elf
Reading symbols from x86/vmx.elf...
Remote debugging using :1234
0x000000000000fff0 in ?? ()
(gdb) watch regs.rax
Hardware watchpoint 1: regs.rax
(gdb) c
Continuing.

Thread 1 hit Hardware watchpoint 1: regs.rax

Old value = 0
New value = 7589640
0x00000000004006b5 in vmx_enter_guest (result=result@entry=0x73cf08) at
x86/vmx.c:1754
1754		asm volatile (
(gdb)
Continuing.
[Switching to Thread 1.2]

Thread 2 hit Hardware watchpoint 1: regs.rax

Old value = 7589640
New value = 7577392
0x00000000004006b5 in vmx_enter_guest (result=result@entry=0x739f30) at
x86/vmx.c:1754
1754		asm volatile (
(gdb)
Continuing.

Interrupting the test like this even makes it hang for me but above
already shows that vCPU 1 makes use of 'regs' to load and store register
state, followed by vCPU 2, doing the same, destroying vCPU 1's stored
host register state. Bummer!

Now, the question is, how to properly fix that. (No, not my hack! :P)

Thanks,
Mathias

[-- Attachment #2: 0001-x86-hacky-attempt-to-make-regs-per-cpu.patch --]
[-- Type: text/x-patch, Size: 26010 bytes --]

From 3fb07081769ae6680ac732168aab23cf45eaedc6 Mon Sep 17 00:00:00 2001
From: Mathias Krause <minipli@grsecurity.net>
Date: Fri, 21 Nov 2025 17:34:19 +0100
Subject: [kvm-unit-tests PATCH] x86: hacky attempt to make 'regs' per-cpu

The current concurrent use of globals like 'regs' causes register
corruption when multiple CPUs are executing vmx_enter_guest() in
parallel.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 x86/vmx.h       | 121 +++++++++++++++++++++++++++++++++--------------
 x86/vmx.c       |  70 ++++++++++++++++++---------
 x86/vmx_tests.c | 122 ++++++++++++++++++++++++------------------------
 3 files changed, 196 insertions(+), 117 deletions(-)

diff --git a/x86/vmx.h b/x86/vmx.h
index 33373bd1a2a9..3dfd5d0c2e71 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -120,7 +120,7 @@ struct vmx_test {
 	const char *name;
 	int (*init)(struct vmcs *vmcs);
 	void (*guest_main)(void);
-	int (*exit_handler)(union exit_reason exit_reason);
+	int (*exit_handler)(struct regs *regs, union exit_reason exit_reason);
 	void (*syscall_handler)(u64 syscall_no);
 	struct regs guest_regs;
 	int (*entry_failure_handler)(struct vmentry_result *result);
@@ -588,43 +588,92 @@ enum vm_entry_failure_code {
 	ENTRY_FAIL_VMCS_LINK_PTR	= 4,
 };
 
-#define SAVE_GPR				\
-	"xchg %rax, regs\n\t"			\
-	"xchg %rcx, regs+0x8\n\t"		\
-	"xchg %rdx, regs+0x10\n\t"		\
-	"xchg %rbx, regs+0x18\n\t"		\
-	"xchg %rbp, regs+0x28\n\t"		\
-	"xchg %rsi, regs+0x30\n\t"		\
-	"xchg %rdi, regs+0x38\n\t"		\
-	"xchg %r8, regs+0x40\n\t"		\
-	"xchg %r9, regs+0x48\n\t"		\
-	"xchg %r10, regs+0x50\n\t"		\
-	"xchg %r11, regs+0x58\n\t"		\
-	"xchg %r12, regs+0x60\n\t"		\
-	"xchg %r13, regs+0x68\n\t"		\
-	"xchg %r14, regs+0x70\n\t"		\
-	"xchg %r15, regs+0x78\n\t"
+#define SAVE_GPR(regs)				\
+	"xchg %rax, " regs "\n\t"		\
+	"xchg %rcx, " regs "+0x8\n\t"		\
+	"xchg %rdx, " regs "+0x10\n\t"		\
+	"xchg %rbx, " regs "+0x18\n\t"		\
+	"xchg %rbp, " regs "+0x28\n\t"		\
+	"xchg %rsi, " regs "+0x30\n\t"		\
+	"xchg %rdi, " regs "+0x38\n\t"		\
+	"xchg %r8,  " regs "+0x40\n\t"		\
+	"xchg %r9,  " regs "+0x48\n\t"		\
+	"xchg %r10, " regs "+0x50\n\t"		\
+	"xchg %r11, " regs "+0x58\n\t"		\
+	"xchg %r12, " regs "+0x60\n\t"		\
+	"xchg %r13, " regs "+0x68\n\t"		\
+	"xchg %r14, " regs "+0x70\n\t"		\
+	"xchg %r15, " regs "+0x78\n\t"
 
 #define LOAD_GPR	SAVE_GPR
 
-#define SAVE_GPR_C				\
-	"xchg %%rax, regs\n\t"			\
-	"xchg %%rcx, regs+0x8\n\t"		\
-	"xchg %%rdx, regs+0x10\n\t"		\
-	"xchg %%rbx, regs+0x18\n\t"		\
-	"xchg %%rbp, regs+0x28\n\t"		\
-	"xchg %%rsi, regs+0x30\n\t"		\
-	"xchg %%rdi, regs+0x38\n\t"		\
-	"xchg %%r8, regs+0x40\n\t"		\
-	"xchg %%r9, regs+0x48\n\t"		\
-	"xchg %%r10, regs+0x50\n\t"		\
-	"xchg %%r11, regs+0x58\n\t"		\
-	"xchg %%r12, regs+0x60\n\t"		\
-	"xchg %%r13, regs+0x68\n\t"		\
-	"xchg %%r14, regs+0x70\n\t"		\
-	"xchg %%r15, regs+0x78\n\t"
+#define PUSH_GPR_C				\
+	"push %%rax\n\t"			\
+	"push %%rcx\n\t"			\
+	"push %%rdx\n\t"			\
+	"push %%rbx\n\t"			\
+	"push %%rbp\n\t"			\
+	"push %%rsi\n\t"			\
+	"push %%rdi\n\t"			\
+	"push %%r8\n\t"				\
+	"push %%r9\n\t"				\
+	"push %%r10\n\t"			\
+	"push %%r11\n\t"			\
+	"push %%r12\n\t"			\
+	"push %%r13\n\t"			\
+	"push %%r14\n\t"			\
+	"push %%r15\n\t"
 
-#define LOAD_GPR_C	SAVE_GPR_C
+#define LOAD_GPR_C(regs)			\
+	"mov  " regs ", %%rax\n\t"		\
+	"mov  " regs "+0x8, %%rcx\n\t"		\
+	"mov  " regs "+0x10, %%rdx\n\t"		\
+	"mov  " regs "+0x18, %%rbx\n\t"		\
+	"mov  " regs "+0x28, %%rbp\n\t"		\
+	"mov  " regs "+0x30, %%rsi\n\t"		\
+	"mov  " regs "+0x38, %%rdi\n\t"		\
+	"mov  " regs "+0x40, %%r8\n\t"		\
+	"mov  " regs "+0x48, %%r9\n\t"		\
+	"mov  " regs "+0x50, %%r10\n\t"		\
+	"mov  " regs "+0x58, %%r11\n\t"		\
+	"mov  " regs "+0x60, %%r12\n\t"		\
+	"mov  " regs "+0x68, %%r13\n\t"		\
+	"mov  " regs "+0x70, %%r14\n\t"		\
+	"mov  " regs "+0x78, %%r15\n\t"
+
+#define SAFE_GPR_C(regs)			\
+	"mov  %%rax, " regs "\n\t"		\
+	"mov  %%rcx, " regs "+0x8\n\t"		\
+	"mov  %%rdx, " regs "+0x10\n\t"		\
+	"mov  %%rbx, " regs "+0x18\n\t"		\
+	"mov  %%rbp, " regs "+0x28\n\t"		\
+	"mov  %%rsi, " regs "+0x30\n\t"		\
+	"mov  %%rdi, " regs "+0x38\n\t"		\
+	"mov  %%r8,  " regs "+0x40\n\t"		\
+	"mov  %%r9,  " regs "+0x48\n\t"		\
+	"mov  %%r10, " regs "+0x50\n\t"		\
+	"mov  %%r11, " regs "+0x58\n\t"		\
+	"mov  %%r12, " regs "+0x60\n\t"		\
+	"mov  %%r13, " regs "+0x68\n\t"		\
+	"mov  %%r14, " regs "+0x70\n\t"		\
+	"mov  %%r15, " regs "+0x78\n\t"
+
+#define POP_GPR_C				\
+	"pop  %%r15\n\t"			\
+	"pop  %%r14\n\t"			\
+	"pop  %%r13\n\t"			\
+	"pop  %%r12\n\t"			\
+	"pop  %%r11\n\t"			\
+	"pop  %%r10\n\t"			\
+	"pop  %%r9\n\t"				\
+	"pop  %%r8\n\t"				\
+	"pop  %%rdi\n\t"			\
+	"pop  %%rsi\n\t"			\
+	"pop  %%rbp\n\t"			\
+	"pop  %%rbx\n\t"			\
+	"pop  %%rdx\n\t"			\
+	"pop  %%rcx\n\t"			\
+	"pop  %%rax\n\t"
 
 #define VMX_IO_SIZE_MASK	0x7
 #define _VMX_IO_BYTE		0
@@ -755,7 +804,7 @@ enum vm_entry_failure_code {
 #define VMCS_FIELD_RESERVED_SHIFT	(15)
 #define VMCS_FIELD_BIT_SIZE		(BITS_PER_LONG)
 
-extern struct regs regs;
+struct regs *guest_regs(void);
 
 extern union vmx_basic_msr basic_msr;
 extern union vmx_ctrl_msr ctrl_pin_rev;
@@ -1017,7 +1066,7 @@ void init_vmx(u64 *vmxon_region);
 int init_vmcs(struct vmcs **vmcs);
 
 const char *exit_reason_description(u64 reason);
-void print_vmexit_info(union exit_reason exit_reason);
+void print_vmexit_info(struct regs *regs, union exit_reason exit_reason);
 void print_vmentry_failure_info(struct vmentry_result *result);
 void install_ept_entry(unsigned long *pml4, int pte_level,
 		unsigned long guest_addr, unsigned long pte,
diff --git a/x86/vmx.c b/x86/vmx.c
index c803eaa67ac6..2a3d79a4991c 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -44,7 +44,14 @@ struct vmcs *vmcs_root;
 u32 vpid_cnt;
 u64 guest_stack_top, guest_syscall_stack_top;
 u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
-struct regs regs;
+static struct regs regs[MAX_TEST_CPUS];
+struct regs sysenter_regs, scratch_regs;
+static volatile bool regs_in_use;
+
+struct regs *guest_regs(void)
+{
+	return &regs[smp_id()];
+}
 
 struct vmx_test *current;
 
@@ -566,11 +573,11 @@ asm(
 	".align	4, 0x90\n\t"
 	".globl	entry_sysenter\n\t"
 	"entry_sysenter:\n\t"
-	SAVE_GPR
+	SAVE_GPR("sysenter_regs")
 	"	and	$0xf, %rax\n\t"
 	"	mov	%rax, %rdi\n\t"
 	"	call	syscall_handler\n\t"
-	LOAD_GPR
+	LOAD_GPR("sysenter_regs")
 	"	vmresume\n\t"
 );
 
@@ -650,7 +657,7 @@ const char *exit_reason_description(u64 reason)
 	return exit_reason_descriptions[reason] ? : "(unused)";
 }
 
-void print_vmexit_info(union exit_reason exit_reason)
+void print_vmexit_info(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip, guest_rsp;
 	ulong exit_qual = vmcs_read(EXI_QUALIFICATION);
@@ -662,13 +669,13 @@ void print_vmexit_info(union exit_reason exit_reason)
 	printf("\texit qualification = %#lx\n", exit_qual);
 	printf("\tguest_rip = %#lx\n", guest_rip);
 	printf("\tRAX=%#lx    RBX=%#lx    RCX=%#lx    RDX=%#lx\n",
-		regs.rax, regs.rbx, regs.rcx, regs.rdx);
+		regs->rax, regs->rbx, regs->rcx, regs->rdx);
 	printf("\tRSP=%#lx    RBP=%#lx    RSI=%#lx    RDI=%#lx\n",
-		guest_rsp, regs.rbp, regs.rsi, regs.rdi);
+		guest_rsp, regs->rbp, regs->rsi, regs->rdi);
 	printf("\tR8 =%#lx    R9 =%#lx    R10=%#lx    R11=%#lx\n",
-		regs.r8, regs.r9, regs.r10, regs.r11);
+		regs->r8, regs->r9, regs->r10, regs->r11);
 	printf("\tR12=%#lx    R13=%#lx    R14=%#lx    R15=%#lx\n",
-		regs.r12, regs.r13, regs.r14, regs.r15);
+		regs->r12, regs->r13, regs->r14, regs->r15);
 }
 
 void print_vmentry_failure_info(struct vmentry_result *result)
@@ -1727,17 +1734,17 @@ void test_skip(const char *msg)
 	abort();
 }
 
-static int exit_handler(union exit_reason exit_reason)
+static int exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	int ret;
 
 	current->exits++;
-	regs.rflags = vmcs_read(GUEST_RFLAGS);
+	regs->rflags = vmcs_read(GUEST_RFLAGS);
 	if (is_hypercall(exit_reason))
 		ret = handle_hypercall();
 	else
-		ret = current->exit_handler(exit_reason);
-	vmcs_write(GUEST_RFLAGS, regs.rflags);
+		ret = current->exit_handler(regs, exit_reason);
+	vmcs_write(GUEST_RFLAGS, regs->rflags);
 
 	return ret;
 }
@@ -1750,11 +1757,22 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 {
 	memset(result, 0, sizeof(*result));
 
+	// XXX: atomic compare-and-set!
+	while (regs_in_use)
+		cpu_relax();
+
+	regs_in_use = 1;
+	barrier();
+
+	scratch_regs = *guest_regs();
 	in_guest = 1;
 	asm volatile (
+		PUSH_GPR_C
 		"mov %[HOST_RSP], %%rdi\n\t"
 		"vmwrite %%rsp, %%rdi\n\t"
-		LOAD_GPR_C
+		LOAD_GPR_C("scratch_regs")
+		"movb $0, regs_in_use\n\t"
+
 		"cmpb $0, %[launched]\n\t"
 		"jne 1f\n\t"
 		"vmlaunch\n\t"
@@ -1762,21 +1780,31 @@ static noinline void vmx_enter_guest(struct vmentry_result *result)
 		"1: "
 		"vmresume\n\t"
 		"2: "
-		SAVE_GPR_C
+
+		// XXX: loop until it's 0!
+		"movb $1, regs_in_use\n\t"
+		SAFE_GPR_C("scratch_regs")
 		"pushf\n\t"
 		"pop %%rdi\n\t"
 		"mov %%rdi, %[vm_fail_flags]\n\t"
 		"movl $1, %[vm_fail]\n\t"
 		"jmp 3f\n\t"
+
 		"vmx_return:\n\t"
-		SAVE_GPR_C
+		// XXX: loop until it's 0!
+		"movb $1, regs_in_use\n\t"
+		SAFE_GPR_C("scratch_regs")
 		"3: \n\t"
+		POP_GPR_C
 		: [vm_fail]"+m"(result->vm_fail),
 		  [vm_fail_flags]"=m"(result->flags)
 		: [launched]"m"(launched), [HOST_RSP]"i"(HOST_RSP)
-		: "rdi", "memory", "cc"
+		: "memory", "cc"
 	);
 	in_guest = 0;
+	regs[smp_id()] = scratch_regs;
+	barrier();
+	regs_in_use = 0;
 
 	result->vmlaunch = !launched;
 	result->instr = launched ? "vmresume" : "vmlaunch";
@@ -1799,7 +1827,7 @@ static int vmx_run(void)
 			 * entry failure (early or otherwise).
 			 */
 			launched = 1;
-			ret = exit_handler(result.exit_reason);
+			ret = exit_handler(guest_regs(), result.exit_reason);
 		} else if (current->entry_failure_handler) {
 			ret = current->entry_failure_handler(&result);
 		} else {
@@ -1822,7 +1850,7 @@ static int vmx_run(void)
 		}
 
 		if (result.entered)
-			print_vmexit_info(result.exit_reason);
+			print_vmexit_info(regs, result.exit_reason);
 		else
 			print_vmentry_failure_info(&result);
 		abort();
@@ -1836,6 +1864,7 @@ static void run_teardown_step(struct test_teardown_step *step)
 
 static int test_run(struct vmx_test *test)
 {
+	struct regs *vm_regs = guest_regs();
 	int r;
 
 	/* Validate V2 interface. */
@@ -1866,8 +1895,8 @@ static int test_run(struct vmx_test *test)
 	v2_guest_main = NULL;
 	test->exits = 0;
 	current = test;
-	regs = test->guest_regs;
-	vmcs_write(GUEST_RFLAGS, regs.rflags | X86_EFLAGS_FIXED);
+	*vm_regs = test->guest_regs;
+	vmcs_write(GUEST_RFLAGS, vm_regs->rflags | X86_EFLAGS_FIXED);
 	launched = 0;
 	guest_finished = 0;
 	printf("\nTest suite: %s\n", test->name);
@@ -1878,7 +1907,6 @@ static int test_run(struct vmx_test *test)
 		goto out;
 	}
 
-
 	if (test->v2)
 		test->v2();
 	else
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 5ffb80a3d866..490dd52fbf80 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -75,10 +75,10 @@ static void basic_guest_main(void)
 	report_pass("Basic VMX test");
 }
 
-static int basic_exit_handler(union exit_reason exit_reason)
+static int basic_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	report_fail("Basic VMX test");
-	print_vmexit_info(exit_reason);
+	print_vmexit_info(regs, exit_reason);
 	return VMX_TEST_EXIT;
 }
 
@@ -100,22 +100,22 @@ static void vmenter_main(void)
 	report((rax == 0xFFFF) && (rsp == resume_rsp), "test vmresume");
 }
 
-static int vmenter_exit_handler(union exit_reason exit_reason)
+static int vmenter_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip = vmcs_read(GUEST_RIP);
 
 	switch (exit_reason.basic) {
 	case VMX_VMCALL:
-		if (regs.rax != 0xABCD) {
+		if (regs->rax != 0xABCD) {
 			report_fail("test vmresume");
 			return VMX_TEST_VMEXIT;
 		}
-		regs.rax = 0xFFFF;
+		regs->rax = 0xFFFF;
 		vmcs_write(GUEST_RIP, guest_rip + 3);
 		return VMX_TEST_RESUME;
 	default:
 		report_fail("test vmresume");
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	return VMX_TEST_VMEXIT;
 }
@@ -166,7 +166,7 @@ static void preemption_timer_main(void)
 	vmcall();
 }
 
-static int preemption_timer_exit_handler(union exit_reason exit_reason)
+static int preemption_timer_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	bool guest_halted;
 	u64 guest_rip;
@@ -204,7 +204,7 @@ static int preemption_timer_exit_handler(union exit_reason exit_reason)
 			break;
 		default:
 			report_fail("Invalid stage.");
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			break;
 		}
 		break;
@@ -246,13 +246,13 @@ static int preemption_timer_exit_handler(union exit_reason exit_reason)
 			// Should not reach here
 			report_fail("unexpected stage, %d",
 				    vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		break;
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_PREEMPT);
 	return VMX_TEST_VMEXIT;
@@ -346,7 +346,7 @@ static void test_ctrl_pat_main(void)
 		report(guest_ia32_pat == ia32_pat, "Entry load PAT");
 }
 
-static int test_ctrl_pat_exit_handler(union exit_reason exit_reason)
+static int test_ctrl_pat_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip;
 	u64 guest_pat;
@@ -412,7 +412,7 @@ static void test_ctrl_efer_main(void)
 		report(guest_ia32_efer == ia32_efer, "Entry load EFER");
 }
 
-static int test_ctrl_efer_exit_handler(union exit_reason exit_reason)
+static int test_ctrl_efer_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip;
 	u64 guest_efer;
@@ -536,7 +536,7 @@ static void cr_shadowing_main(void)
 	       "Write shadowing different X86_CR4_DE");
 }
 
-static int cr_shadowing_exit_handler(union exit_reason exit_reason)
+static int cr_shadowing_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip;
 	u32 insn_len;
@@ -584,7 +584,7 @@ static int cr_shadowing_exit_handler(union exit_reason exit_reason)
 			// Should not reach here
 			report_fail("unexpected stage, %d",
 				    vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		vmcs_write(GUEST_RIP, guest_rip + insn_len);
@@ -623,14 +623,14 @@ static int cr_shadowing_exit_handler(union exit_reason exit_reason)
 			// Should not reach here
 			report_fail("unexpected stage, %d",
 				    vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		vmcs_write(GUEST_RIP, guest_rip + insn_len);
 		return VMX_TEST_RESUME;
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	return VMX_TEST_VMEXIT;
 }
@@ -699,7 +699,7 @@ static void iobmp_main(void)
 	       "I/O bitmap - unconditional exiting");
 }
 
-static int iobmp_exit_handler(union exit_reason exit_reason)
+static int iobmp_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip;
 	ulong exit_qual;
@@ -760,7 +760,7 @@ static int iobmp_exit_handler(union exit_reason exit_reason)
 			// Should not reach here
 			report_fail("unexpected stage, %d",
 				    vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		vmcs_write(GUEST_RIP, guest_rip + insn_len);
@@ -781,7 +781,7 @@ static int iobmp_exit_handler(union exit_reason exit_reason)
 			// Should not reach here
 			report_fail("unexpected stage, %d",
 				    vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		vmcs_write(GUEST_RIP, guest_rip + insn_len);
@@ -985,7 +985,7 @@ static void insn_intercept_main(void)
 	}
 }
 
-static int insn_intercept_exit_handler(union exit_reason exit_reason)
+static int insn_intercept_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip;
 	ulong exit_qual;
@@ -1280,7 +1280,7 @@ static bool invept_test(int type, u64 eptp)
 	return true;
 }
 
-static int pml_exit_handler(union exit_reason exit_reason)
+static int pml_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u16 index, count;
 	u64 *pmlbuf = pml_log;
@@ -1309,7 +1309,7 @@ static int pml_exit_handler(union exit_reason exit_reason)
 		default:
 			report_fail("unexpected stage, %d.",
 			       vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		vmcs_write(GUEST_RIP, guest_rip + insn_len);
@@ -1320,12 +1320,12 @@ static int pml_exit_handler(union exit_reason exit_reason)
 		return VMX_TEST_RESUME;
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	return VMX_TEST_VMEXIT;
 }
 
-static int ept_exit_handler_common(union exit_reason exit_reason, bool have_ad)
+static int ept_exit_handler_common(struct regs *regs, union exit_reason exit_reason, bool have_ad)
 {
 	u64 guest_rip;
 	u64 guest_cr3;
@@ -1406,7 +1406,7 @@ static int ept_exit_handler_common(union exit_reason exit_reason, bool have_ad)
 		default:
 			report_fail("ERROR - unexpected stage, %d.",
 			       vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		vmcs_write(GUEST_RIP, guest_rip + insn_len);
@@ -1425,7 +1425,7 @@ static int ept_exit_handler_common(union exit_reason exit_reason, bool have_ad)
 		default:
 			report_fail("ERROR - unexpected stage, %d.",
 			       vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		return VMX_TEST_RESUME;
@@ -1481,20 +1481,20 @@ static int ept_exit_handler_common(union exit_reason exit_reason, bool have_ad)
 			// Should not reach here
 			report_fail("ERROR : unexpected stage, %d",
 			       vmx_get_test_stage());
-			print_vmexit_info(exit_reason);
+			print_vmexit_info(regs, exit_reason);
 			return VMX_TEST_VMEXIT;
 		}
 		return VMX_TEST_RESUME;
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	return VMX_TEST_VMEXIT;
 }
 
-static int ept_exit_handler(union exit_reason exit_reason)
+static int ept_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
-	return ept_exit_handler_common(exit_reason, false);
+	return ept_exit_handler_common(regs, exit_reason, false);
 }
 
 static int eptad_init(struct vmcs *vmcs)
@@ -1559,9 +1559,9 @@ static void eptad_main(void)
 	ept_common();
 }
 
-static int eptad_exit_handler(union exit_reason exit_reason)
+static int eptad_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
-	return ept_exit_handler_common(exit_reason, true);
+	return ept_exit_handler_common(regs, exit_reason, true);
 }
 
 #define TIMER_VECTOR	222
@@ -1674,7 +1674,7 @@ static void interrupt_main(void)
 	report(timer_fired, "Inject an event to a halted guest");
 }
 
-static int interrupt_exit_handler(union exit_reason exit_reason)
+static int interrupt_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u64 guest_rip = vmcs_read(GUEST_RIP);
 	u32 insn_len = vmcs_read(EXI_INST_LEN);
@@ -1726,7 +1726,7 @@ static int interrupt_exit_handler(union exit_reason exit_reason)
 		return VMX_TEST_RESUME;
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 
 	return VMX_TEST_VMEXIT;
@@ -1804,7 +1804,7 @@ static void nmi_hlt_main(void)
     vmx_set_test_stage(3);
 }
 
-static int nmi_hlt_exit_handler(union exit_reason exit_reason)
+static int nmi_hlt_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
     u64 guest_rip = vmcs_read(GUEST_RIP);
     u32 insn_len = vmcs_read(EXI_INST_LEN);
@@ -1814,7 +1814,7 @@ static int nmi_hlt_exit_handler(union exit_reason exit_reason)
         if (exit_reason.basic != VMX_VMCALL) {
             report_fail("VMEXIT not due to vmcall. Exit reason 0x%x",
                         exit_reason.full);
-            print_vmexit_info(exit_reason);
+            print_vmexit_info(regs, exit_reason);
             return VMX_TEST_VMEXIT;
         }
 
@@ -1829,7 +1829,7 @@ static int nmi_hlt_exit_handler(union exit_reason exit_reason)
         if (exit_reason.basic != VMX_EXC_NMI) {
             report_fail("VMEXIT not due to NMI intercept. Exit reason 0x%x",
                         exit_reason.full);
-            print_vmexit_info(exit_reason);
+            print_vmexit_info(regs, exit_reason);
             return VMX_TEST_VMEXIT;
         }
         report_pass("NMI intercept while running guest");
@@ -1914,7 +1914,7 @@ static void dbgctls_main(void)
 	report(vmx_get_test_stage() == 4, "Don't save debug controls");
 }
 
-static int dbgctls_exit_handler(union exit_reason exit_reason)
+static int dbgctls_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	u32 insn_len = vmcs_read(EXI_INST_LEN);
 	u64 guest_rip = vmcs_read(GUEST_RIP);
@@ -1957,7 +1957,7 @@ static int dbgctls_exit_handler(union exit_reason exit_reason)
 		return VMX_TEST_RESUME;
 	default:
 		report_fail("Unknown exit reason, %d", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	return VMX_TEST_VMEXIT;
 }
@@ -2004,7 +2004,7 @@ static void msr_switch_main(void)
 	vmcall();
 }
 
-static int msr_switch_exit_handler(union exit_reason exit_reason)
+static int msr_switch_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	if (exit_reason.basic == VMX_VMCALL && vmx_get_test_stage() == 2) {
 		report(exit_msr_store[0].value == MSR_MAGIC + 1,
@@ -2055,7 +2055,7 @@ static void vmmcall_main(void)
 	report_fail("VMMCALL");
 }
 
-static int vmmcall_exit_handler(union exit_reason exit_reason)
+static int vmmcall_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	switch (exit_reason.basic) {
 	case VMX_VMCALL:
@@ -2068,7 +2068,7 @@ static int vmmcall_exit_handler(union exit_reason exit_reason)
 		break;
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 
 	return VMX_TEST_VMEXIT;
@@ -2119,7 +2119,7 @@ static void disable_rdtscp_main(void)
 	vmcall();
 }
 
-static int disable_rdtscp_exit_handler(union exit_reason exit_reason)
+static int disable_rdtscp_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	switch (exit_reason.basic) {
 	case VMX_VMCALL:
@@ -2140,7 +2140,7 @@ static int disable_rdtscp_exit_handler(union exit_reason exit_reason)
 
 	default:
 		report_fail("Unknown exit reason, 0x%x", exit_reason.full);
-		print_vmexit_info(exit_reason);
+		print_vmexit_info(regs, exit_reason);
 	}
 	return VMX_TEST_VMEXIT;
 }
@@ -2151,7 +2151,7 @@ static void exit_monitor_from_l2_main(void)
 	exit(0);
 }
 
-static int exit_monitor_from_l2_handler(union exit_reason exit_reason)
+static int exit_monitor_from_l2_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	report_fail("The guest should have killed the VMM");
 	return VMX_TEST_EXIT;
@@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)
 	/* update CR3 on AP */
 	on_cpu(1, update_cr3, (void *)read_cr3());
 
+	vmx_set_test_stage(0);
+
 	/* start AP */
 	on_cpu_async(1, sipi_test_ap_thread, NULL);
 
-	vmx_set_test_stage(0);
-
 	/* BSP enter guest */
 	enter_guest();
 }
@@ -10358,10 +10358,11 @@ static unsigned long long host_time_to_guest_time(unsigned long long t)
 static unsigned long long rdtsc_vmexit_diff_test_iteration(void)
 {
 	unsigned long long guest_tsc, host_to_guest_tsc;
+	struct regs *regs = guest_regs();
 
 	enter_guest();
 	skip_exit_vmcall();
-	guest_tsc = (u32) regs.rax + (regs.rdx << 32);
+	guest_tsc = (u32) regs->rax + (regs->rdx << 32);
 	host_to_guest_tsc = host_time_to_guest_time(exit_msr_store[0].value);
 
 	return host_to_guest_tsc - guest_tsc;
@@ -10431,10 +10432,10 @@ static void invalid_msr_main(void)
 	report_fail("Invalid MSR load");
 }
 
-static int invalid_msr_exit_handler(union exit_reason exit_reason)
+static int invalid_msr_exit_handler(struct regs *regs, union exit_reason exit_reason)
 {
 	report_fail("Invalid MSR load");
-	print_vmexit_info(exit_reason);
+	print_vmexit_info(regs, exit_reason);
 	return VMX_TEST_EXIT;
 }
 
@@ -10589,6 +10590,7 @@ static void __vmx_pf_exception_test(invalidate_tlb_t inv_fn, void *data,
 {
 	u64 efer;
 	struct cpuid cpuid;
+	struct regs *regs = guest_regs();
 
 	test_set_guest(guest_fn);
 
@@ -10603,23 +10605,23 @@ static void __vmx_pf_exception_test(invalidate_tlb_t inv_fn, void *data,
 	while (vmcs_read(EXI_REASON) != VMX_VMCALL) {
 		switch (vmcs_read(EXI_REASON)) {
 		case VMX_RDMSR:
-			assert(regs.rcx == MSR_EFER);
+			assert(regs->rcx == MSR_EFER);
 			efer = vmcs_read(GUEST_EFER);
-			regs.rdx = efer >> 32;
-			regs.rax = efer & 0xffffffff;
+			regs->rdx = efer >> 32;
+			regs->rax = efer & 0xffffffff;
 			break;
 		case VMX_WRMSR:
-			assert(regs.rcx == MSR_EFER);
-			efer = regs.rdx << 32 | (regs.rax & 0xffffffff);
+			assert(regs->rcx == MSR_EFER);
+			efer = regs->rdx << 32 | (regs->rax & 0xffffffff);
 			vmcs_write(GUEST_EFER, efer);
 			break;
 		case VMX_CPUID:
 			cpuid = (struct cpuid) {0, 0, 0, 0};
-			cpuid = raw_cpuid(regs.rax, regs.rcx);
-			regs.rax = cpuid.a;
-			regs.rbx = cpuid.b;
-			regs.rcx = cpuid.c;
-			regs.rdx = cpuid.d;
+			cpuid = raw_cpuid(regs->rax, regs->rcx);
+			regs->rax = cpuid.a;
+			regs->rbx = cpuid.b;
+			regs->rcx = cpuid.c;
+			regs->rdx = cpuid.d;
 			break;
 		case VMX_INVLPG:
 			inv_fn(data);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-11-21 16:44           ` Mathias Krause
@ 2025-12-18  1:44             ` Sean Christopherson
  2025-12-18 10:07               ` Mathias Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-12-18  1:44 UTC (permalink / raw)
  To: Mathias Krause
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On Fri, Nov 21, 2025, Mathias Krause wrote:
> On 18.11.25 02:47, Mathias Krause wrote:
> > On 18.11.25 02:33, Mathias Krause wrote:
> > [...]
> > Bleh, I just noticed, f01ea38a385a ("x86: Better backtraces for leaf
> > functions") broke vmx_sipi_signal_test too :(
> > 
> > Looking into it!
> 
> Finally found it. It's register corruption within both host and guest.
> 
> It's not related to f01ea38a385a at all but, apparently, it actually
> exposes it, likely because of the enforced stack frame setup, making the
> code rely on (a corrupted) RBP instead of the properly restored (because
> VMCS managed) RSP.
> 
> The core issue is, 'regs' being a singleton, used by multiple CPUs, so
> all SMP VMX tests concurrently making use of vmx_enter_guest() are
> potentially affected.

*sigh*

I'm not going to type out the first dozen words that escaped my mouth when
reading this.  I happen to like my job :-)

> When the first vCPU calls vmx_enter_guest() to launch a guest, it'll use
> 'regs' to load the guest registers but also store its host register
> state. Now, if while that vCPU is running, another vCPU gets launched
> via vmx_enter_guest(), it'll load the previous vCPU's host register
> values as guest register state and store its host registers in 'regs'.
> 
> Depending on which vCPU returns first, it'll either load the other
> vCPU's host registers effectively "switching threads" or, if it's the
> vCPU that called vmx_enter_guest() last, it'll resume just fine. Either
> way, the next vCPU returning will run with the guest register values.
> 
> The latter is what happens with vmx_sipi_signal_test, causing the crash.
> 
> I read a lot of vmx.c and vmx_test.c in the last few days and it's
> really not meant to be used concurrently by multiple guests. vmx_test.c
> has quite some hacks to work around obvious limitations (allocating
> dedicated stacks for APs) but state variables like 'launched',
> 'in_guest', 'guest_finished', 'hypercall_field' and 'regs' are shared
> but really meant to be used only by a single thread.

You're much more generous than me in your description.  

> I hacked up something to verify my theory and made 'regs' "per-cpu". It
> needs quite some code churn and I'm not all that happy with it. IMHO,
> 'regs' and lots of the other VMX management state should be part of some
> vcpu struct or something. In fact, struct vmx_test already has a
> 'guest_regs' but using it won't work, as we need offsetable absolute
> memory references for the inline ASM in vmx_enter_guest() to work as it
> cannot make use of register-based memory references at all. (My hack
> uses a global 'scratch_regs' with mutual exclusion on its usage.)

So, much to my chagrin, I started coding away before reading your entire mail
(I got through the paragraph about 'regs' getting clobbered, and then came back
to the rest later; that was a mistake).

I had the exact same idea about making regs per-CPU, but without the quotes.
Of course, because I didn't read your entire mail, converting only regs to be
per-CPU didn't help.  It actually made things worse (TIL the INIT test shares
a VMCS between CPUs... WTF).

After making launch and in_guest per-CPU, and using RAX to communicate hypercalls,
I've got everything passing (module one unsolvable SIPI wart; see below).  I need
to write changelogs and squash a few fixups, but overall it ended up being a nice
cleanuped (de-duplicating the VMX vs. SVM GPR handling drops a lot of code).  I'm
tempted to delete a blank lines just to get to net negative LoC :-D

 lib/x86/smp.h       |  32 ++++++++++++++++++++++++++++++++
 lib/x86/virt.h      |  61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 x86/Makefile.common |  14 ++++++++++++++
 x86/realmode.c      |   3 +++
 x86/svm.c           |  19 ++++++++-----------
 x86/svm.h           |  61 +++++++++----------------------------------------------------
 x86/svm_tests.c     |   5 +++--
 x86/vmx.c           | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------
 x86/vmx.h           |  72 +++---------------------------------------------------------------------
 x86/vmx_tests.c     | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
 10 files changed, 247 insertions(+), 246 deletions(-)

> To see the register corruption, one could start the vmx_sipi_signal_test
> test with -s -S, attach gdb to it and add a watch for regs.rax. Stepping
> through the test will clearly show how 'regs' get overwritten wrongly.

The test itself is also flawed.  On top of this race that you spotted:

 @@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)                
       /* update CR3 on AP */                                                  
       on_cpu(1, update_cr3, (void *)read_cr3());                              
                                                                               
 +     vmx_set_test_stage(0);                                                  
 +                                                                             
       /* start AP */                                                          
       on_cpu_async(1, sipi_test_ap_thread, NULL);                             
                                                                               
 -     vmx_set_test_stage(0);                                                  
 -                                                                             
       /* BSP enter guest */                                                   
       enter_guest();                                                          
  }                                

This snippet is also broken:

	vmx_set_test_stage(1);

	/* AP enter guest */
	enter_guest();

because the BSP can think the AP has entered WFS before it has even attempted
VMLAUNCH.  It's "fine" so long as there are host CPUs available, but the test
fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
Ivybridge server CPU, it fails pretty consistently.  No idea why, maybe a slower
nested VM-Enter path?  E.g. the test passes on EMR even if I run with j<2x CPUs>.

Unfortunately, I can't think of any way to fix that problem.  To recognize the
SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
to think of a way to atomically write software-visible state at the time of VM-Enter.
E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
would require reverse engineering the VMCS layout for every CPU.  If KVM emulated
any VM-wide MSRs, maybe we could throw one in the MSR load list?  But all the ones
I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.

Anywho, now that I know what to look for, I can ignore false failures easily enough.
If someone cares enough to come up with a clever fix, then yay.  Otherwise, I'll
just deal with the intermittent failures (or maybe nuke and pave).

As for this patch, I'll write changelogs and get a series posted with the backtrace
and realmode patch at the end.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-12-18  1:44             ` Sean Christopherson
@ 2025-12-18 10:07               ` Mathias Krause
  2025-12-18 18:26                 ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Mathias Krause @ 2025-12-18 10:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini


[-- Attachment #1.1: Type: text/plain, Size: 8430 bytes --]

Hi Sean,

On 18.12.25 02:44, Sean Christopherson wrote:
> On Fri, Nov 21, 2025, Mathias Krause wrote:
>> On 18.11.25 02:47, Mathias Krause wrote:
>>
>> Finally found it. It's register corruption within both host and guest.
>> [...]
> 
> *sigh*
> 
> I'm not going to type out the first dozen words that escaped my mouth when
> reading this.  I happen to like my job :-)

Me too, that's why I had to switch tasks to gain back some sanity ;)

> [...]
>> I hacked up something to verify my theory and made 'regs' "per-cpu". It
>> needs quite some code churn and I'm not all that happy with it. IMHO,
>> 'regs' and lots of the other VMX management state should be part of some
>> vcpu struct or something. In fact, struct vmx_test already has a
>> 'guest_regs' but using it won't work, as we need offsetable absolute
>> memory references for the inline ASM in vmx_enter_guest() to work as it
>> cannot make use of register-based memory references at all. (My hack
>> uses a global 'scratch_regs' with mutual exclusion on its usage.)
> 
> So, much to my chagrin, I started coding away before reading your entire mail
> (I got through the paragraph about 'regs' getting clobbered, and then came back
> to the rest later; that was a mistake).

A common mistake of mine: writing lengthy explanations that don't get
read. Maybe I should start included some tl;dr section for such cases ;)

> I had the exact same idea about making regs per-CPU, but without the quotes.

Actually, I wanted something quick to test, so I went the easy route and
used a global array. However, my plan for a proper fix was to do the
real per-cpu thing, as that would still fit the "offsetable absolute
memory reference" needs for inline asm, as GS/FS-prefixed addresses
would be just fine.

> Of course, because I didn't read your entire mail, converting only regs to be
> per-CPU didn't help.  It actually made things worse (TIL the INIT test shares
> a VMCS between CPUs... WTF).

Hah! This rabbit hole has way to many branches!

> 
> After making launch and in_guest per-CPU, and using RAX to communicate hypercalls,

Nice! The use of a global 'hypercall_field' for communicating the
intended hypercall was just puzzling to me. Using registers is such a
more natural way of doing it.

> I've got everything passing (module one unsolvable SIPI wart; see below).  I need
> to write changelogs and squash a few fixups, but overall it ended up being a nice
> cleanuped (de-duplicating the VMX vs. SVM GPR handling drops a lot of code).  I'm
> tempted to delete a blank lines just to get to net negative LoC :-D
> 
>  lib/x86/smp.h       |  32 ++++++++++++++++++++++++++++++++
>  lib/x86/virt.h      |  61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  x86/Makefile.common |  14 ++++++++++++++
>  x86/realmode.c      |   3 +++
>  x86/svm.c           |  19 ++++++++-----------
>  x86/svm.h           |  61 +++++++++----------------------------------------------------
>  x86/svm_tests.c     |   5 +++--
>  x86/vmx.c           | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------
>  x86/vmx.h           |  72 +++---------------------------------------------------------------------
>  x86/vmx_tests.c     | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
>  10 files changed, 247 insertions(+), 246 deletions(-)

Not too bad!

> 
>> To see the register corruption, one could start the vmx_sipi_signal_test
>> test with -s -S, attach gdb to it and add a watch for regs.rax. Stepping
>> through the test will clearly show how 'regs' get overwritten wrongly.
> 
> The test itself is also flawed.  On top of this race that you spotted:
> 
>  @@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)                
>        /* update CR3 on AP */                                                  
>        on_cpu(1, update_cr3, (void *)read_cr3());                              
>                                                                                
>  +     vmx_set_test_stage(0);                                                  
>  +                                                                             
>        /* start AP */                                                          
>        on_cpu_async(1, sipi_test_ap_thread, NULL);                             
>                                                                                
>  -     vmx_set_test_stage(0);                                                  
>  -                                                                             
>        /* BSP enter guest */                                                   
>        enter_guest();                                                          
>   }                                
> 
> This snippet is also broken:
> 
> 	vmx_set_test_stage(1);
> 
> 	/* AP enter guest */
> 	enter_guest();
> 
> because the BSP can think the AP has entered WFS before it has even attempted
> VMLAUNCH.  It's "fine" so long as there are host CPUs available, but the test
> fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
> Ivybridge server CPU, it fails pretty consistently.  No idea why, maybe a slower
> nested VM-Enter path?  E.g. the test passes on EMR even if I run with j<2x CPUs>.

Yeah, I noticed the failing synchronization between the vCPUs too. I
even tried to fix them by introducing a few more states (simply more
steps). However, then I ended up in deadlocks, as the AP re-enters
vmx_sipi_test_guest() multiple times (well, twice). It was likely
related to the register corruption, I only noticed later. So I believed
those magic sleeps (delay(SIPI_SIGNAL_TEST_DELAY)) are for the lack of
synchronization and should account for the time span of the AP setting a
new step value but not yet having switched to the guest.
It might be worth revisiting my attempts of explicitly adding new step
states, now with the register corrupting being handled...

> 
> Unfortunately, I can't think of any way to fix that problem.  To recognize the
> SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
> to think of a way to atomically write software-visible state at the time of VM-Enter.
> E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
> would require reverse engineering the VMCS layout for every CPU.  If KVM emulated
> any VM-wide MSRs, maybe we could throw one in the MSR load list?  But all the ones
> I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.

Ahh, that explains the deadlock I observed. So adding more states won't
help. Yeah, it needs some trickery... or those magic sleeps :D

> 
> Anywho, now that I know what to look for, I can ignore false failures easily enough.
> If someone cares enough to come up with a clever fix, then yay.  Otherwise, I'll
> just deal with the intermittent failures (or maybe nuke and pave).

> 
> As for this patch, I'll write changelogs and get a series posted with the backtrace
> and realmode patch at the end.

Thanks a lot for picking up the work! For me, KUT is far from my daily
work, so spending time on it cuts on the time budget what I'm actually
supposed to do. After having spent nearly a week in chasing that bug
made me realize, the work to fix it properly would be, at least for me,
yet another week and I didn't had the time for it back then.

Of course, if changes of mine cause regressions, I feel strongly
obligated to fix these -- even if it isn't my code's fault after all.
However, in this case, I just didn't found the time to do so yet. So
thanks again for working on this!

I'll be off for the season from tomorrow on, so may not provide instant
feedback for the series, in case you post it in the next couple of days.
But I'm looking forward reviewing it! :)

One more thing. I was able to land the gcc bug fix[1], so the realmode
patch could be reworded to mention that only "pre gcc-16" is broken but
the patch is still needed as gcc 16 isn't even released yet and far from
being the minimal supported version for KUT.

Thanks,
Mathias

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=114a19fae9bd

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 665 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-12-18 10:07               ` Mathias Krause
@ 2025-12-18 18:26                 ` Sean Christopherson
  2025-12-19 13:19                   ` Mathias Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-12-18 18:26 UTC (permalink / raw)
  To: Mathias Krause
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

VICTORY IS MINE!!!!!!!

On Thu, Dec 18, 2025, Mathias Krause wrote:
> On 18.12.25 02:44, Sean Christopherson wrote:
> > On Fri, Nov 21, 2025, Mathias Krause wrote:
> >> On 18.11.25 02:47, Mathias Krause wrote:
> >>
> >> Finally found it. It's register corruption within both host and guest.
> >> [...]
> > 
> > *sigh*
> > 
> > I'm not going to type out the first dozen words that escaped my mouth when
> > reading this.  I happen to like my job :-)
> 
> Me too, that's why I had to switch tasks to gain back some sanity ;)
> 
> > [...]
> >> I hacked up something to verify my theory and made 'regs' "per-cpu". It
> >> needs quite some code churn and I'm not all that happy with it. IMHO,
> >> 'regs' and lots of the other VMX management state should be part of some
> >> vcpu struct or something. In fact, struct vmx_test already has a
> >> 'guest_regs' but using it won't work, as we need offsetable absolute
> >> memory references for the inline ASM in vmx_enter_guest() to work as it
> >> cannot make use of register-based memory references at all. (My hack
> >> uses a global 'scratch_regs' with mutual exclusion on its usage.)
> > 
> > So, much to my chagrin, I started coding away before reading your entire mail
> > (I got through the paragraph about 'regs' getting clobbered, and then came back
> > to the rest later; that was a mistake).
> 
> A common mistake of mine: writing lengthy explanations that don't get
> read. Maybe I should start included some tl;dr section for such cases ;)

FWIW, I like the lengthy, detailed explanations.  I just got excited about having
a clever idea for addressing the regs.

> > The test itself is also flawed.  On top of this race that you spotted:
> > 
> >  @@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)                
> >        /* update CR3 on AP */                                                  
> >        on_cpu(1, update_cr3, (void *)read_cr3());                              
> >                                                                                
> >  +     vmx_set_test_stage(0);                                                  
> >  +                                                                             
> >        /* start AP */                                                          
> >        on_cpu_async(1, sipi_test_ap_thread, NULL);                             
> >                                                                                
> >  -     vmx_set_test_stage(0);                                                  
> >  -                                                                             
> >        /* BSP enter guest */                                                   
> >        enter_guest();                                                          
> >   }                                
> > 
> > This snippet is also broken:
> > 
> > 	vmx_set_test_stage(1);
> > 
> > 	/* AP enter guest */
> > 	enter_guest();
> > 
> > because the BSP can think the AP has entered WFS before it has even attempted
> > VMLAUNCH.  It's "fine" so long as there are host CPUs available, but the test
> > fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
> > Ivybridge server CPU, it fails pretty consistently.  No idea why, maybe a slower
> > nested VM-Enter path?  E.g. the test passes on EMR even if I run with j<2x CPUs>.
> 
> Yeah, I noticed the failing synchronization between the vCPUs too. I
> even tried to fix them by introducing a few more states (simply more
> steps). However, then I ended up in deadlocks, as the AP re-enters
> vmx_sipi_test_guest() multiple times (well, twice). It was likely
> related to the register corruption, I only noticed later. So I believed
> those magic sleeps (delay(SIPI_SIGNAL_TEST_DELAY)) are for the lack of
> synchronization and should account for the time span of the AP setting a
> new step value but not yet having switched to the guest.
> It might be worth revisiting my attempts of explicitly adding new step
> states, now with the register corrupting being handled...
> 
> > 
> > Unfortunately, I can't think of any way to fix that problem.  To recognize the
> > SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
> > to think of a way to atomically write software-visible state at the time of VM-Enter.
> > E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
> > would require reverse engineering the VMCS layout for every CPU.  If KVM emulated
> > any VM-wide MSRs, maybe we could throw one in the MSR load list?  But all the ones
> > I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.
> 
> Ahh, that explains the deadlock I observed. So adding more states won't
> help. Yeah, it needs some trickery... or those magic sleeps :D

Trickery obtained!  MSR_KVM_WALL_CLOCK_NEW to the rescue.  Even if with a delay
on the AP but not the BSP:

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 6bfd6280..047eb1f5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9869,7 +9869,7 @@ static void vmx_sipi_test_guest(void)
                /* wait AP enter guest with activity=WAIT_SIPI */
                while (vmx_get_test_stage() != 1)
                        ;
-               delay(SIPI_SIGNAL_TEST_DELAY);
+               // delay(SIPI_SIGNAL_TEST_DELAY);
 
                /* First SIPI signal */
                apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9938,6 +9938,7 @@ static void sipi_test_ap_thread(void *data)
        vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
        vmx_set_test_stage(1);
+       delay(SIPI_SIGNAL_TEST_DELAY);
 
        /* AP enter guest */
        enter_guest();

Writing MSR_KVM_WALL_CLOCK_NEW via the AP's VM-Entry load list passes with.

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 510454a6..2d140ee5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -6,6 +6,7 @@
 
 #include <asm/debugreg.h>
 
+#include "kvmclock.h"
 #include "vmx.h"
 #include "msr.h"
 #include "processor.h"
@@ -1967,7 +1968,7 @@ struct vmx_msr_entry {
        u32 index;
        u32 reserved;
        u64 value;
-} __attribute__((packed));
+} __attribute__((packed)) __attribute__((aligned(16)));
 
 #define MSR_MAGIC 0x31415926
 struct vmx_msr_entry *exit_msr_store, *entry_msr_load, *exit_msr_load;
@@ -9861,6 +9862,8 @@ static void vmx_init_signal_test(void)
         */
 }
 
+static bool use_kvm_wall_clock;
+
 #define SIPI_SIGNAL_TEST_DELAY 100000000ULL
 
 static void vmx_sipi_test_guest(void)
@@ -9869,7 +9872,7 @@ static void vmx_sipi_test_guest(void)
                /* wait AP enter guest with activity=WAIT_SIPI */
                while (vmx_get_test_stage() != 1)
                        ;
-               delay(SIPI_SIGNAL_TEST_DELAY);
+               // delay(SIPI_SIGNAL_TEST_DELAY);
 
                /* First SIPI signal */
                apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9903,6 +9906,12 @@ static void vmx_sipi_test_guest(void)
        }
 }
 
+static const struct vmx_msr_entry msr_load_wall_clock = {
+       .index = MSR_KVM_WALL_CLOCK_NEW,
+       .reserved = 0,
+       .value = 1,
+};
+
 static void sipi_test_ap_thread(void *data)
 {
        struct guest_regs *regs = this_cpu_guest_regs();
@@ -9937,7 +9946,15 @@ static void sipi_test_ap_thread(void *data)
        /* Set guest activity state to wait-for-SIPI state */
        vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
-       vmx_set_test_stage(1);
+       if (use_kvm_wall_clock) {
+               wrmsr(MSR_KVM_WALL_CLOCK_NEW, 0);
+               vmcs_write(ENT_MSR_LD_CNT, 1);
+               vmcs_write(ENTER_MSR_LD_ADDR, virt_to_phys(&msr_load_wall_clock));
+       } else {
+               vmx_set_test_stage(1);
+       }
+
+       delay(SIPI_SIGNAL_TEST_DELAY);
 
        /* AP enter guest */
        enter_guest();
@@ -9980,6 +9997,8 @@ static void vmx_sipi_signal_test(void)
        u64 cpu_ctrl_0 = CPU_SECONDARY;
        u64 cpu_ctrl_1 = 0;
 
+       use_kvm_wall_clock = this_cpu_has_kvm() && this_cpu_has(KVM_FEATURE_CLOCKSOURCE2);
+
        /* passthrough lapic to L2 */
        disable_intercept_for_x2apic_msrs();
        vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
@@ -9996,6 +10015,13 @@ static void vmx_sipi_signal_test(void)
        /* start AP */
        on_cpu_async(1, sipi_test_ap_thread, NULL);
 
+       if (use_kvm_wall_clock) {
+               while (rdmsr(MSR_KVM_WALL_CLOCK_NEW) != 1)
+                       cpu_relax();
+
+               vmx_set_test_stage(1);
+       }
+
        /* BSP enter guest */
        enter_guest();
 }
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
  2025-12-18 18:26                 ` Sean Christopherson
@ 2025-12-19 13:19                   ` Mathias Krause
  0 siblings, 0 replies; 22+ messages in thread
From: Mathias Krause @ 2025-12-19 13:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Alexandru Elisei, Andrew Jones, Eric Auger,
	Thomas Huth, Paolo Bonzini

On 18.12.25 19:26, Sean Christopherson wrote:
> VICTORY IS MINE!!!!!!!

🏆

> [...]
> Writing MSR_KVM_WALL_CLOCK_NEW via the AP's VM-Entry load list passes with.
> 
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 510454a6..2d140ee5 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -6,6 +6,7 @@
>  
>  #include <asm/debugreg.h>
>  
> +#include "kvmclock.h"
>  #include "vmx.h"
>  #include "msr.h"
>  #include "processor.h"
> @@ -1967,7 +1968,7 @@ struct vmx_msr_entry {
>         u32 index;
>         u32 reserved;
>         u64 value;
> -} __attribute__((packed));
> +} __attribute__((packed)) __attribute__((aligned(16)));
>  
>  #define MSR_MAGIC 0x31415926
>  struct vmx_msr_entry *exit_msr_store, *entry_msr_load, *exit_msr_load;
> @@ -9861,6 +9862,8 @@ static void vmx_init_signal_test(void)
>          */
>  }
>  
> +static bool use_kvm_wall_clock;
> +
>  #define SIPI_SIGNAL_TEST_DELAY 100000000ULL
>  
>  static void vmx_sipi_test_guest(void)
> @@ -9869,7 +9872,7 @@ static void vmx_sipi_test_guest(void)
>                 /* wait AP enter guest with activity=WAIT_SIPI */
>                 while (vmx_get_test_stage() != 1)
>                         ;
> -               delay(SIPI_SIGNAL_TEST_DELAY);
> +               // delay(SIPI_SIGNAL_TEST_DELAY);
>  
>                 /* First SIPI signal */
>                 apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
> @@ -9903,6 +9906,12 @@ static void vmx_sipi_test_guest(void)
>         }
>  }
>  
> +static const struct vmx_msr_entry msr_load_wall_clock = {
> +       .index = MSR_KVM_WALL_CLOCK_NEW,
> +       .reserved = 0,
> +       .value = 1,
> +};
> +
>  static void sipi_test_ap_thread(void *data)
>  {
>         struct guest_regs *regs = this_cpu_guest_regs();
> @@ -9937,7 +9946,15 @@ static void sipi_test_ap_thread(void *data)
>         /* Set guest activity state to wait-for-SIPI state */
>         vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
>  
> -       vmx_set_test_stage(1);
> +       if (use_kvm_wall_clock) {
> +               wrmsr(MSR_KVM_WALL_CLOCK_NEW, 0);
> +               vmcs_write(ENT_MSR_LD_CNT, 1);
> +               vmcs_write(ENTER_MSR_LD_ADDR, virt_to_phys(&msr_load_wall_clock));
> +       } else {
> +               vmx_set_test_stage(1);
> +       }
> +
> +       delay(SIPI_SIGNAL_TEST_DELAY);
>  
>         /* AP enter guest */
>         enter_guest();
> @@ -9980,6 +9997,8 @@ static void vmx_sipi_signal_test(void)
>         u64 cpu_ctrl_0 = CPU_SECONDARY;
>         u64 cpu_ctrl_1 = 0;
>  
> +       use_kvm_wall_clock = this_cpu_has_kvm() && this_cpu_has(KVM_FEATURE_CLOCKSOURCE2);
> +
>         /* passthrough lapic to L2 */
>         disable_intercept_for_x2apic_msrs();
>         vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
> @@ -9996,6 +10015,13 @@ static void vmx_sipi_signal_test(void)
>         /* start AP */
>         on_cpu_async(1, sipi_test_ap_thread, NULL);
>  
> +       if (use_kvm_wall_clock) {
> +               while (rdmsr(MSR_KVM_WALL_CLOCK_NEW) != 1)
> +                       cpu_relax();
> +
> +               vmx_set_test_stage(1);
> +       }
> +
>         /* BSP enter guest */
>         enter_guest();
>  }
>  

What a hack! I like it :D

Happy holidays!

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-12-19 13:19 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
2025-10-10  6:03   ` Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 3/4] arm64: " Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving " Mathias Krause
2025-09-16 13:04 ` [kvm-unit-tests PATCH v2 0/4] Better backtraces for " Andrew Jones
2025-11-14 15:58 ` Mathias Krause
2025-11-14 16:39   ` Sean Christopherson
2025-11-14 18:25 ` Sean Christopherson
2025-11-15  4:56   ` Mathias Krause
2025-11-17 22:19     ` Sean Christopherson
2025-11-18  1:33       ` Mathias Krause
2025-11-18  1:47         ` Mathias Krause
2025-11-18  4:04           ` Mathias Krause
2025-11-18 11:56             ` Mathias Krause
2025-11-18 12:10               ` Mathias Krause
2025-11-21 16:44           ` Mathias Krause
2025-12-18  1:44             ` Sean Christopherson
2025-12-18 10:07               ` Mathias Krause
2025-12-18 18:26                 ` Sean Christopherson
2025-12-19 13:19                   ` Mathias Krause

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox