From: Sean Christopherson <seanjc@google.com>
To: Mathias Krause <minipli@grsecurity.net>
Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev,
Alexandru Elisei <alexandru.elisei@arm.com>,
Andrew Jones <andrew.jones@linux.dev>,
Eric Auger <eric.auger@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
Date: Thu, 18 Dec 2025 10:26:37 -0800 [thread overview]
Message-ID: <aURHXUX3tC8ElwFa@google.com> (raw)
In-Reply-To: <769da4d7-0e8c-447a-be6b-1e3ad9a0ae36@grsecurity.net>
VICTORY IS MINE!!!!!!!
On Thu, Dec 18, 2025, Mathias Krause wrote:
> On 18.12.25 02:44, Sean Christopherson wrote:
> > On Fri, Nov 21, 2025, Mathias Krause wrote:
> >> On 18.11.25 02:47, Mathias Krause wrote:
> >>
> >> Finally found it. It's register corruption within both host and guest.
> >> [...]
> >
> > *sigh*
> >
> > I'm not going to type out the first dozen words that escaped my mouth when
> > reading this. I happen to like my job :-)
>
> Me too, that's why I had to switch tasks to gain back some sanity ;)
>
> > [...]
> >> I hacked up something to verify my theory and made 'regs' "per-cpu". It
> >> needs quite some code churn and I'm not all that happy with it. IMHO,
> >> 'regs' and lots of the other VMX management state should be part of some
> >> vcpu struct or something. In fact, struct vmx_test already has a
> >> 'guest_regs' but using it won't work, as we need offsetable absolute
> >> memory references for the inline ASM in vmx_enter_guest() to work as it
> >> cannot make use of register-based memory references at all. (My hack
> >> uses a global 'scratch_regs' with mutual exclusion on its usage.)
> >
> > So, much to my chagrin, I started coding away before reading your entire mail
> > (I got through the paragraph about 'regs' getting clobbered, and then came back
> > to the rest later; that was a mistake).
>
> A common mistake of mine: writing lengthy explanations that don't get
> read. Maybe I should start included some tl;dr section for such cases ;)
FWIW, I like the lengthy, detailed explanations. I just got excited about having
a clever idea for addressing the regs.
> > The test itself is also flawed. On top of this race that you spotted:
> >
> > @@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)
> > /* update CR3 on AP */
> > on_cpu(1, update_cr3, (void *)read_cr3());
> >
> > + vmx_set_test_stage(0);
> > +
> > /* start AP */
> > on_cpu_async(1, sipi_test_ap_thread, NULL);
> >
> > - vmx_set_test_stage(0);
> > -
> > /* BSP enter guest */
> > enter_guest();
> > }
> >
> > This snippet is also broken:
> >
> > vmx_set_test_stage(1);
> >
> > /* AP enter guest */
> > enter_guest();
> >
> > because the BSP can think the AP has entered WFS before it has even attempted
> > VMLAUNCH. It's "fine" so long as there are host CPUs available, but the test
> > fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
> > Ivybridge server CPU, it fails pretty consistently. No idea why, maybe a slower
> > nested VM-Enter path? E.g. the test passes on EMR even if I run with j<2x CPUs>.
>
> Yeah, I noticed the failing synchronization between the vCPUs too. I
> even tried to fix them by introducing a few more states (simply more
> steps). However, then I ended up in deadlocks, as the AP re-enters
> vmx_sipi_test_guest() multiple times (well, twice). It was likely
> related to the register corruption, I only noticed later. So I believed
> those magic sleeps (delay(SIPI_SIGNAL_TEST_DELAY)) are for the lack of
> synchronization and should account for the time span of the AP setting a
> new step value but not yet having switched to the guest.
> It might be worth revisiting my attempts of explicitly adding new step
> states, now with the register corrupting being handled...
>
> >
> > Unfortunately, I can't think of any way to fix that problem. To recognize the
> > SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
> > to think of a way to atomically write software-visible state at the time of VM-Enter.
> > E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
> > would require reverse engineering the VMCS layout for every CPU. If KVM emulated
> > any VM-wide MSRs, maybe we could throw one in the MSR load list? But all the ones
> > I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.
>
> Ahh, that explains the deadlock I observed. So adding more states won't
> help. Yeah, it needs some trickery... or those magic sleeps :D
Trickery obtained! MSR_KVM_WALL_CLOCK_NEW to the rescue. Even if with a delay
on the AP but not the BSP:
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 6bfd6280..047eb1f5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9869,7 +9869,7 @@ static void vmx_sipi_test_guest(void)
/* wait AP enter guest with activity=WAIT_SIPI */
while (vmx_get_test_stage() != 1)
;
- delay(SIPI_SIGNAL_TEST_DELAY);
+ // delay(SIPI_SIGNAL_TEST_DELAY);
/* First SIPI signal */
apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9938,6 +9938,7 @@ static void sipi_test_ap_thread(void *data)
vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
vmx_set_test_stage(1);
+ delay(SIPI_SIGNAL_TEST_DELAY);
/* AP enter guest */
enter_guest();
Writing MSR_KVM_WALL_CLOCK_NEW via the AP's VM-Entry load list passes with.
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 510454a6..2d140ee5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -6,6 +6,7 @@
#include <asm/debugreg.h>
+#include "kvmclock.h"
#include "vmx.h"
#include "msr.h"
#include "processor.h"
@@ -1967,7 +1968,7 @@ struct vmx_msr_entry {
u32 index;
u32 reserved;
u64 value;
-} __attribute__((packed));
+} __attribute__((packed)) __attribute__((aligned(16)));
#define MSR_MAGIC 0x31415926
struct vmx_msr_entry *exit_msr_store, *entry_msr_load, *exit_msr_load;
@@ -9861,6 +9862,8 @@ static void vmx_init_signal_test(void)
*/
}
+static bool use_kvm_wall_clock;
+
#define SIPI_SIGNAL_TEST_DELAY 100000000ULL
static void vmx_sipi_test_guest(void)
@@ -9869,7 +9872,7 @@ static void vmx_sipi_test_guest(void)
/* wait AP enter guest with activity=WAIT_SIPI */
while (vmx_get_test_stage() != 1)
;
- delay(SIPI_SIGNAL_TEST_DELAY);
+ // delay(SIPI_SIGNAL_TEST_DELAY);
/* First SIPI signal */
apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9903,6 +9906,12 @@ static void vmx_sipi_test_guest(void)
}
}
+static const struct vmx_msr_entry msr_load_wall_clock = {
+ .index = MSR_KVM_WALL_CLOCK_NEW,
+ .reserved = 0,
+ .value = 1,
+};
+
static void sipi_test_ap_thread(void *data)
{
struct guest_regs *regs = this_cpu_guest_regs();
@@ -9937,7 +9946,15 @@ static void sipi_test_ap_thread(void *data)
/* Set guest activity state to wait-for-SIPI state */
vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
- vmx_set_test_stage(1);
+ if (use_kvm_wall_clock) {
+ wrmsr(MSR_KVM_WALL_CLOCK_NEW, 0);
+ vmcs_write(ENT_MSR_LD_CNT, 1);
+ vmcs_write(ENTER_MSR_LD_ADDR, virt_to_phys(&msr_load_wall_clock));
+ } else {
+ vmx_set_test_stage(1);
+ }
+
+ delay(SIPI_SIGNAL_TEST_DELAY);
/* AP enter guest */
enter_guest();
@@ -9980,6 +9997,8 @@ static void vmx_sipi_signal_test(void)
u64 cpu_ctrl_0 = CPU_SECONDARY;
u64 cpu_ctrl_1 = 0;
+ use_kvm_wall_clock = this_cpu_has_kvm() && this_cpu_has(KVM_FEATURE_CLOCKSOURCE2);
+
/* passthrough lapic to L2 */
disable_intercept_for_x2apic_msrs();
vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
@@ -9996,6 +10015,13 @@ static void vmx_sipi_signal_test(void)
/* start AP */
on_cpu_async(1, sipi_test_ap_thread, NULL);
+ if (use_kvm_wall_clock) {
+ while (rdmsr(MSR_KVM_WALL_CLOCK_NEW) != 1)
+ cpu_relax();
+
+ vmx_set_test_stage(1);
+ }
+
/* BSP enter guest */
enter_guest();
}
next prev parent reply other threads:[~2025-12-18 18:26 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
2025-10-10 6:03 ` Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 3/4] arm64: " Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving " Mathias Krause
2025-09-16 13:04 ` [kvm-unit-tests PATCH v2 0/4] Better backtraces for " Andrew Jones
2025-11-14 15:58 ` Mathias Krause
2025-11-14 16:39 ` Sean Christopherson
2025-11-14 18:25 ` Sean Christopherson
2025-11-15 4:56 ` Mathias Krause
2025-11-17 22:19 ` Sean Christopherson
2025-11-18 1:33 ` Mathias Krause
2025-11-18 1:47 ` Mathias Krause
2025-11-18 4:04 ` Mathias Krause
2025-11-18 11:56 ` Mathias Krause
2025-11-18 12:10 ` Mathias Krause
2025-11-21 16:44 ` Mathias Krause
2025-12-18 1:44 ` Sean Christopherson
2025-12-18 10:07 ` Mathias Krause
2025-12-18 18:26 ` Sean Christopherson [this message]
2025-12-19 13:19 ` Mathias Krause
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aURHXUX3tC8ElwFa@google.com \
--to=seanjc@google.com \
--cc=alexandru.elisei@arm.com \
--cc=andrew.jones@linux.dev \
--cc=eric.auger@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=minipli@grsecurity.net \
--cc=pbonzini@redhat.com \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox