From: Sean Christopherson <seanjc@google.com>
To: Mathias Krause <minipli@grsecurity.net>
Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev,
Alexandru Elisei <alexandru.elisei@arm.com>,
Andrew Jones <andrew.jones@linux.dev>,
Eric Auger <eric.auger@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
Date: Wed, 17 Dec 2025 17:44:43 -0800
Message-ID: <aUNci6Oy1EXXoQuY@google.com>
In-Reply-To: <3bac29b9-4c49-4e5d-997e-9e4019a2fceb@grsecurity.net>
On Fri, Nov 21, 2025, Mathias Krause wrote:
> On 18.11.25 02:47, Mathias Krause wrote:
> > On 18.11.25 02:33, Mathias Krause wrote:
> > [...]
> > Bleh, I just noticed, f01ea38a385a ("x86: Better backtraces for leaf
> > functions") broke vmx_sipi_signal_test too :(
> >
> > Looking into it!
>
> Finally found it. It's register corruption within both host and guest.
>
> It's not related to f01ea38a385a at all but, apparently, it actually
> exposes it, likely because of the enforced stack frame setup, making the
> code rely on (a corrupted) RBP instead of the properly restored (because
> VMCS managed) RSP.
>
> The core issue is that 'regs' is a singleton shared by multiple CPUs, so
> all SMP VMX tests that concurrently use vmx_enter_guest() are
> potentially affected.
*sigh*
I'm not going to type out the first dozen words that escaped my mouth when
reading this. I happen to like my job :-)
> When the first vCPU calls vmx_enter_guest() to launch a guest, it'll use
> 'regs' to load the guest registers but also store its host register
> state. Now, if while that vCPU is running, another vCPU gets launched
> via vmx_enter_guest(), it'll load the previous vCPU's host register
> values as guest register state and store its host registers in 'regs'.
>
> Depending on which vCPU returns first, it'll either load the other
> vCPU's host registers effectively "switching threads" or, if it's the
> vCPU that called vmx_enter_guest() last, it'll resume just fine. Either
> way, the next vCPU returning will run with the guest register values.
>
> The latter is what happens with vmx_sipi_signal_test, causing the crash.
>
> I read a lot of vmx.c and vmx_test.c in the last few days, and they're
> really not meant to be used concurrently by multiple guests. vmx_test.c
> has quite a few hacks to work around obvious limitations (allocating
> dedicated stacks for APs), but state variables like 'launched',
> 'in_guest', 'guest_finished', 'hypercall_field' and 'regs' are shared
> and really meant to be used only by a single thread.
You're much more generous than me in your description.
> I hacked up something to verify my theory and made 'regs' "per-cpu". It
> needs quite a bit of code churn and I'm not all that happy with it. IMHO,
> 'regs' and lots of the other VMX management state should be part of some
> vcpu struct or something. In fact, struct vmx_test already has a
> 'guest_regs', but using it won't work, as we need offsettable absolute
> memory references for the inline ASM in vmx_enter_guest() to work, as it
> cannot make use of register-based memory references at all. (My hack
> uses a global 'scratch_regs' with mutual exclusion on its usage.)
So, much to my chagrin, I started coding away before reading your entire mail
(I got through the paragraph about 'regs' getting clobbered, and then came back
to the rest later; that was a mistake).
I had the exact same idea about making regs per-CPU, but without the quotes.
Of course, because I didn't read your entire mail, converting only regs to be
per-CPU didn't help. It actually made things worse (TIL the INIT test shares
a VMCS between CPUs... WTF).
After making 'launched' and 'in_guest' per-CPU, and using RAX to communicate hypercalls,
I've got everything passing (modulo one unsolvable SIPI wart; see below). I need
to write changelogs and squash a few fixups, but overall it ended up being a nice
cleanup (de-duplicating the VMX vs. SVM GPR handling drops a lot of code). I'm
tempted to delete a few blank lines just to get to net negative LoC :-D
lib/x86/smp.h | 32 ++++++++++++++++++++++++++++++++
lib/x86/virt.h | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x86/Makefile.common | 14 ++++++++++++++
x86/realmode.c | 3 +++
x86/svm.c | 19 ++++++++-----------
x86/svm.h | 61 +++++++++----------------------------------------------------
x86/svm_tests.c | 5 +++--
x86/vmx.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------
x86/vmx.h | 72 +++---------------------------------------------------------------------
x86/vmx_tests.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
10 files changed, 247 insertions(+), 246 deletions(-)
> To see the register corruption, one could start vmx_sipi_signal_test
> with -s -S, attach gdb to it, and add a watch for regs.rax. Stepping
> through the test will clearly show how 'regs' gets wrongly overwritten.
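For anyone following along, that debug recipe is something like the following
(the exact run invocation and which ELF to feed gdb depend on your setup; this
just assumes 'regs' is visible as a symbol):

```
# QEMU: -s opens a gdb stub on tcp::1234, -S halts the VM at startup
$ gdb <the test's .elf image>
(gdb) target remote :1234
(gdb) watch regs.rax
(gdb) continue
```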
The test itself is also flawed. On top of this race that you spotted:
@@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)
/* update CR3 on AP */
on_cpu(1, update_cr3, (void *)read_cr3());
+ vmx_set_test_stage(0);
+
/* start AP */
on_cpu_async(1, sipi_test_ap_thread, NULL);
- vmx_set_test_stage(0);
-
/* BSP enter guest */
enter_guest();
}
This snippet is also broken:
vmx_set_test_stage(1);
/* AP enter guest */
enter_guest();
because the BSP can think the AP has entered WFS before it has even attempted
VMLAUNCH. It's "fine" so long as there are host CPUs available, but the test
fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
Ivybridge server CPU, it fails pretty consistently. No idea why, maybe a slower
nested VM-Enter path? E.g. the test passes on EMR even if I run with j<2x CPUs>.
Unfortunately, I can't think of any way to fix that problem. To recognize the
SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
to think of a way to atomically write software-visible state at the time of VM-Enter.
E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
would require reverse engineering the VMCS layout for every CPU. If KVM emulated
any VM-wide MSRs, maybe we could throw one in the MSR load list? But all the ones
I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.
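To spell out the window, a rough ordering sketch (my reading of the test, not
actual code):

```
  BSP                                  AP
  ---                                  ---
  vmx_set_test_stage(1)
  enter_guest()                        ...still in host code, pre-VMLAUNCH...
    guest sees stage == 1,
    fires the SIPI              --->   SIPI arrives *before* VM-Enter
                                       VMLAUNCH w/ activity state = WFS
                                       (the SIPI fired before VM-Enter, so
                                        it isn't recognized as a WFS wakeup)
```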
Anywho, now that I know what to look for, I can ignore false failures easily enough.
If someone cares enough to come up with a clever fix, then yay. Otherwise, I'll
just deal with the intermittent failures (or maybe nuke and pave).
As for this patch, I'll write changelogs and get a series posted with the backtrace
and realmode patch at the end.