public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Mathias Krause <minipli@grsecurity.net>
Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	 Alexandru Elisei <alexandru.elisei@arm.com>,
	Andrew Jones <andrew.jones@linux.dev>,
	 Eric Auger <eric.auger@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	 Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
Date: Thu, 18 Dec 2025 10:26:37 -0800	[thread overview]
Message-ID: <aURHXUX3tC8ElwFa@google.com> (raw)
In-Reply-To: <769da4d7-0e8c-447a-be6b-1e3ad9a0ae36@grsecurity.net>

VICTORY IS MINE!!!!!!!

On Thu, Dec 18, 2025, Mathias Krause wrote:
> On 18.12.25 02:44, Sean Christopherson wrote:
> > On Fri, Nov 21, 2025, Mathias Krause wrote:
> >> On 18.11.25 02:47, Mathias Krause wrote:
> >>
> >> Finally found it. It's register corruption within both host and guest.
> >> [...]
> > 
> > *sigh*
> > 
> > I'm not going to type out the first dozen words that escaped my mouth when
> > reading this.  I happen to like my job :-)
> 
> Me too, that's why I had to switch tasks to gain back some sanity ;)
> 
> > [...]
> >> I hacked up something to verify my theory and made 'regs' "per-cpu". It
> >> needs quite some code churn and I'm not all that happy with it. IMHO,
> >> 'regs' and lots of the other VMX management state should be part of some
> >> vcpu struct or something. In fact, struct vmx_test already has a
> >> 'guest_regs' but using it won't work, as we need offsetable absolute
> >> memory references for the inline ASM in vmx_enter_guest() to work as it
> >> cannot make use of register-based memory references at all. (My hack
> >> uses a global 'scratch_regs' with mutual exclusion on its usage.)
> > 
> > So, much to my chagrin, I started coding away before reading your entire mail
> > (I got through the paragraph about 'regs' getting clobbered, and then came back
> > to the rest later; that was a mistake).
> 
> A common mistake of mine: writing lengthy explanations that don't get
> read. Maybe I should start included some tl;dr section for such cases ;)

FWIW, I like the lengthy, detailed explanations.  I just got excited about having
a clever idea for addressing the regs.

> > The test itself is also flawed.  On top of this race that you spotted:
> > 
> >  @@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)                
> >        /* update CR3 on AP */                                                  
> >        on_cpu(1, update_cr3, (void *)read_cr3());                              
> >                                                                                
> >  +     vmx_set_test_stage(0);                                                  
> >  +                                                                             
> >        /* start AP */                                                          
> >        on_cpu_async(1, sipi_test_ap_thread, NULL);                             
> >                                                                                
> >  -     vmx_set_test_stage(0);                                                  
> >  -                                                                             
> >        /* BSP enter guest */                                                   
> >        enter_guest();                                                          
> >   }                                
> > 
> > This snippet is also broken:
> > 
> > 	vmx_set_test_stage(1);
> > 
> > 	/* AP enter guest */
> > 	enter_guest();
> > 
> > because the BSP can think the AP has entered WFS before it has even attempted
> > VMLAUNCH.  It's "fine" so long as there are host CPUs available, but the test
> > fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
> > Ivybridge server CPU, it fails pretty consistently.  No idea why, maybe a slower
> > nested VM-Enter path?  E.g. the test passes on EMR even if I run with j<2x CPUs>.
> 
> Yeah, I noticed the failing synchronization between the vCPUs too. I
> even tried to fix them by introducing a few more states (simply more
> steps). However, then I ended up in deadlocks, as the AP re-enters
> vmx_sipi_test_guest() multiple times (well, twice). It was likely
> related to the register corruption, I only noticed later. So I believed
> those magic sleeps (delay(SIPI_SIGNAL_TEST_DELAY)) are for the lack of
> synchronization and should account for the time span of the AP setting a
> new step value but not yet having switched to the guest.
> It might be worth revisiting my attempts of explicitly adding new step
> states, now with the register corrupting being handled...
> 
> > 
> > Unfortunately, I can't think of any way to fix that problem.  To recognize the
> > SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
> > to think of a way to atomically write software-visible state at the time of VM-Enter.
> > E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
> > would require reverse engineering the VMCS layout for every CPU.  If KVM emulated
> > any VM-wide MSRs, maybe we could throw one in the MSR load list?  But all the ones
> > I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.
> 
> Ahh, that explains the deadlock I observed. So adding more states won't
> help. Yeah, it needs some trickery... or those magic sleeps :D

Trickery obtained!  MSR_KVM_WALL_CLOCK_NEW to the rescue.  Even if with a delay
on the AP but not the BSP:

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 6bfd6280..047eb1f5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9869,7 +9869,7 @@ static void vmx_sipi_test_guest(void)
                /* wait AP enter guest with activity=WAIT_SIPI */
                while (vmx_get_test_stage() != 1)
                        ;
-               delay(SIPI_SIGNAL_TEST_DELAY);
+               // delay(SIPI_SIGNAL_TEST_DELAY);
 
                /* First SIPI signal */
                apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9938,6 +9938,7 @@ static void sipi_test_ap_thread(void *data)
        vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
        vmx_set_test_stage(1);
+       delay(SIPI_SIGNAL_TEST_DELAY);
 
        /* AP enter guest */
        enter_guest();

Writing MSR_KVM_WALL_CLOCK_NEW via the AP's VM-Entry load list passes with.

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 510454a6..2d140ee5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -6,6 +6,7 @@
 
 #include <asm/debugreg.h>
 
+#include "kvmclock.h"
 #include "vmx.h"
 #include "msr.h"
 #include "processor.h"
@@ -1967,7 +1968,7 @@ struct vmx_msr_entry {
        u32 index;
        u32 reserved;
        u64 value;
-} __attribute__((packed));
+} __attribute__((packed)) __attribute__((aligned(16)));
 
 #define MSR_MAGIC 0x31415926
 struct vmx_msr_entry *exit_msr_store, *entry_msr_load, *exit_msr_load;
@@ -9861,6 +9862,8 @@ static void vmx_init_signal_test(void)
         */
 }
 
+static bool use_kvm_wall_clock;
+
 #define SIPI_SIGNAL_TEST_DELAY 100000000ULL
 
 static void vmx_sipi_test_guest(void)
@@ -9869,7 +9872,7 @@ static void vmx_sipi_test_guest(void)
                /* wait AP enter guest with activity=WAIT_SIPI */
                while (vmx_get_test_stage() != 1)
                        ;
-               delay(SIPI_SIGNAL_TEST_DELAY);
+               // delay(SIPI_SIGNAL_TEST_DELAY);
 
                /* First SIPI signal */
                apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9903,6 +9906,12 @@ static void vmx_sipi_test_guest(void)
        }
 }
 
+static const struct vmx_msr_entry msr_load_wall_clock = {
+       .index = MSR_KVM_WALL_CLOCK_NEW,
+       .reserved = 0,
+       .value = 1,
+};
+
 static void sipi_test_ap_thread(void *data)
 {
        struct guest_regs *regs = this_cpu_guest_regs();
@@ -9937,7 +9946,15 @@ static void sipi_test_ap_thread(void *data)
        /* Set guest activity state to wait-for-SIPI state */
        vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
-       vmx_set_test_stage(1);
+       if (use_kvm_wall_clock) {
+               wrmsr(MSR_KVM_WALL_CLOCK_NEW, 0);
+               vmcs_write(ENT_MSR_LD_CNT, 1);
+               vmcs_write(ENTER_MSR_LD_ADDR, virt_to_phys(&msr_load_wall_clock));
+       } else {
+               vmx_set_test_stage(1);
+       }
+
+       delay(SIPI_SIGNAL_TEST_DELAY);
 
        /* AP enter guest */
        enter_guest();
@@ -9980,6 +9997,8 @@ static void vmx_sipi_signal_test(void)
        u64 cpu_ctrl_0 = CPU_SECONDARY;
        u64 cpu_ctrl_1 = 0;
 
+       use_kvm_wall_clock = this_cpu_has_kvm() && this_cpu_has(KVM_FEATURE_CLOCKSOURCE2);
+
        /* passthrough lapic to L2 */
        disable_intercept_for_x2apic_msrs();
        vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
@@ -9996,6 +10015,13 @@ static void vmx_sipi_signal_test(void)
        /* start AP */
        on_cpu_async(1, sipi_test_ap_thread, NULL);
 
+       if (use_kvm_wall_clock) {
+               while (rdmsr(MSR_KVM_WALL_CLOCK_NEW) != 1)
+                       cpu_relax();
+
+               vmx_set_test_stage(1);
+       }
+
        /* BSP enter guest */
        enter_guest();
 }
 

  reply	other threads:[~2025-12-18 18:26 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
2025-10-10  6:03   ` Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 3/4] arm64: " Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving " Mathias Krause
2025-09-16 13:04 ` [kvm-unit-tests PATCH v2 0/4] Better backtraces for " Andrew Jones
2025-11-14 15:58 ` Mathias Krause
2025-11-14 16:39   ` Sean Christopherson
2025-11-14 18:25 ` Sean Christopherson
2025-11-15  4:56   ` Mathias Krause
2025-11-17 22:19     ` Sean Christopherson
2025-11-18  1:33       ` Mathias Krause
2025-11-18  1:47         ` Mathias Krause
2025-11-18  4:04           ` Mathias Krause
2025-11-18 11:56             ` Mathias Krause
2025-11-18 12:10               ` Mathias Krause
2025-11-21 16:44           ` Mathias Krause
2025-12-18  1:44             ` Sean Christopherson
2025-12-18 10:07               ` Mathias Krause
2025-12-18 18:26                 ` Sean Christopherson [this message]
2025-12-19 13:19                   ` Mathias Krause

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aURHXUX3tC8ElwFa@google.com \
    --to=seanjc@google.com \
    --cc=alexandru.elisei@arm.com \
    --cc=andrew.jones@linux.dev \
    --cc=eric.auger@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=minipli@grsecurity.net \
    --cc=pbonzini@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox