All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Mathias Krause <minipli@grsecurity.net>
Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	 Alexandru Elisei <alexandru.elisei@arm.com>,
	Andrew Jones <andrew.jones@linux.dev>,
	 Eric Auger <eric.auger@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	 Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions
Date: Thu, 18 Dec 2025 10:26:37 -0800	[thread overview]
Message-ID: <aURHXUX3tC8ElwFa@google.com> (raw)
In-Reply-To: <769da4d7-0e8c-447a-be6b-1e3ad9a0ae36@grsecurity.net>

VICTORY IS MINE!!!!!!!

On Thu, Dec 18, 2025, Mathias Krause wrote:
> On 18.12.25 02:44, Sean Christopherson wrote:
> > On Fri, Nov 21, 2025, Mathias Krause wrote:
> >> On 18.11.25 02:47, Mathias Krause wrote:
> >>
> >> Finally found it. It's register corruption within both host and guest.
> >> [...]
> > 
> > *sigh*
> > 
> > I'm not going to type out the first dozen words that escaped my mouth when
> > reading this.  I happen to like my job :-)
> 
> Me too, that's why I had to switch tasks to gain back some sanity ;)
> 
> > [...]
> >> I hacked up something to verify my theory and made 'regs' "per-cpu". It
> >> needs quite some code churn and I'm not all that happy with it. IMHO,
> >> 'regs' and lots of the other VMX management state should be part of some
> >> vcpu struct or something. In fact, struct vmx_test already has a
> >> 'guest_regs' but using it won't work, as we need offsetable absolute
> >> memory references for the inline ASM in vmx_enter_guest() to work as it
> >> cannot make use of register-based memory references at all. (My hack
> >> uses a global 'scratch_regs' with mutual exclusion on its usage.)
> > 
> > So, much to my chagrin, I started coding away before reading your entire mail
> > (I got through the paragraph about 'regs' getting clobbered, and then came back
> > to the rest later; that was a mistake).
> 
> A common mistake of mine: writing lengthy explanations that don't get
> read. Maybe I should start included some tl;dr section for such cases ;)

FWIW, I like the lengthy, detailed explanations.  I just got excited about having
a clever idea for addressing the regs.

> > The test itself is also flawed.  On top of this race that you spotted:
> > 
> >  @@ -9985,11 +9985,11 @@ static void vmx_sipi_signal_test(void)                
> >        /* update CR3 on AP */                                                  
> >        on_cpu(1, update_cr3, (void *)read_cr3());                              
> >                                                                                
> >  +     vmx_set_test_stage(0);                                                  
> >  +                                                                             
> >        /* start AP */                                                          
> >        on_cpu_async(1, sipi_test_ap_thread, NULL);                             
> >                                                                                
> >  -     vmx_set_test_stage(0);                                                  
> >  -                                                                             
> >        /* BSP enter guest */                                                   
> >        enter_guest();                                                          
> >   }                                
> > 
> > This snippet is also broken:
> > 
> > 	vmx_set_test_stage(1);
> > 
> > 	/* AP enter guest */
> > 	enter_guest();
> > 
> > because the BSP can think the AP has entered WFS before it has even attempted
> > VMLAUNCH.  It's "fine" so long as there are host CPUs available, but the test
> > fails 100% for me if I run KUT in a VM with e.g. j<number of CPUs>, and on an
> > Ivybridge server CPU, it fails pretty consistently.  No idea why, maybe a slower
> > nested VM-Enter path?  E.g. the test passes on EMR even if I run with j<2x CPUs>.
> 
> Yeah, I noticed the failing synchronization between the vCPUs too. I
> even tried to fix them by introducing a few more states (simply more
> steps). However, then I ended up in deadlocks, as the AP re-enters
> vmx_sipi_test_guest() multiple times (well, twice). It was likely
> related to the register corruption, I only noticed later. So I believed
> those magic sleeps (delay(SIPI_SIGNAL_TEST_DELAY)) are for the lack of
> synchronization and should account for the time span of the AP setting a
> new step value but not yet having switched to the guest.
> It might be worth revisiting my attempts of explicitly adding new step
> states, now with the register corrupting being handled...
> 
> > 
> > Unfortunately, I can't think of any way to fix that problem.  To recognize the
> > SIPI, the AP needs to do VM-Enter with a WFS activity state, and I'm struggling
> > to think of a way to atomically write software-visible state at the time of VM-Enter.
> > E.g. in theory, the test could peek at the LAUNCHED field in the VMCS, but that
> > would require reverse engineering the VMCS layout for every CPU.  If KVM emulated
> > any VM-wide MSRs, maybe we could throw one in the MSR load list?  But all the ones
> > I can think of, e.g. Hyper-V's partition-wide MSRs, aren't guaranteed to be available.
> 
> Ahh, that explains the deadlock I observed. So adding more states won't
> help. Yeah, it needs some trickery... or those magic sleeps :D

Trickery obtained!  MSR_KVM_WALL_CLOCK_NEW to the rescue.  Even if with a delay
on the AP but not the BSP:

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 6bfd6280..047eb1f5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9869,7 +9869,7 @@ static void vmx_sipi_test_guest(void)
                /* wait AP enter guest with activity=WAIT_SIPI */
                while (vmx_get_test_stage() != 1)
                        ;
-               delay(SIPI_SIGNAL_TEST_DELAY);
+               // delay(SIPI_SIGNAL_TEST_DELAY);
 
                /* First SIPI signal */
                apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9938,6 +9938,7 @@ static void sipi_test_ap_thread(void *data)
        vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
        vmx_set_test_stage(1);
+       delay(SIPI_SIGNAL_TEST_DELAY);
 
        /* AP enter guest */
        enter_guest();

Writing MSR_KVM_WALL_CLOCK_NEW via the AP's VM-Entry load list passes with.

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 510454a6..2d140ee5 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -6,6 +6,7 @@
 
 #include <asm/debugreg.h>
 
+#include "kvmclock.h"
 #include "vmx.h"
 #include "msr.h"
 #include "processor.h"
@@ -1967,7 +1968,7 @@ struct vmx_msr_entry {
        u32 index;
        u32 reserved;
        u64 value;
-} __attribute__((packed));
+} __attribute__((packed)) __attribute__((aligned(16)));
 
 #define MSR_MAGIC 0x31415926
 struct vmx_msr_entry *exit_msr_store, *entry_msr_load, *exit_msr_load;
@@ -9861,6 +9862,8 @@ static void vmx_init_signal_test(void)
         */
 }
 
+static bool use_kvm_wall_clock;
+
 #define SIPI_SIGNAL_TEST_DELAY 100000000ULL
 
 static void vmx_sipi_test_guest(void)
@@ -9869,7 +9872,7 @@ static void vmx_sipi_test_guest(void)
                /* wait AP enter guest with activity=WAIT_SIPI */
                while (vmx_get_test_stage() != 1)
                        ;
-               delay(SIPI_SIGNAL_TEST_DELAY);
+               // delay(SIPI_SIGNAL_TEST_DELAY);
 
                /* First SIPI signal */
                apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_STARTUP | APIC_INT_ASSERT, id_map[1]);
@@ -9903,6 +9906,12 @@ static void vmx_sipi_test_guest(void)
        }
 }
 
+static const struct vmx_msr_entry msr_load_wall_clock = {
+       .index = MSR_KVM_WALL_CLOCK_NEW,
+       .reserved = 0,
+       .value = 1,
+};
+
 static void sipi_test_ap_thread(void *data)
 {
        struct guest_regs *regs = this_cpu_guest_regs();
@@ -9937,7 +9946,15 @@ static void sipi_test_ap_thread(void *data)
        /* Set guest activity state to wait-for-SIPI state */
        vmcs_write(GUEST_ACTV_STATE, ACTV_WAIT_SIPI);
 
-       vmx_set_test_stage(1);
+       if (use_kvm_wall_clock) {
+               wrmsr(MSR_KVM_WALL_CLOCK_NEW, 0);
+               vmcs_write(ENT_MSR_LD_CNT, 1);
+               vmcs_write(ENTER_MSR_LD_ADDR, virt_to_phys(&msr_load_wall_clock));
+       } else {
+               vmx_set_test_stage(1);
+       }
+
+       delay(SIPI_SIGNAL_TEST_DELAY);
 
        /* AP enter guest */
        enter_guest();
@@ -9980,6 +9997,8 @@ static void vmx_sipi_signal_test(void)
        u64 cpu_ctrl_0 = CPU_SECONDARY;
        u64 cpu_ctrl_1 = 0;
 
+       use_kvm_wall_clock = this_cpu_has_kvm() && this_cpu_has(KVM_FEATURE_CLOCKSOURCE2);
+
        /* passthrough lapic to L2 */
        disable_intercept_for_x2apic_msrs();
        vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_EXTINT);
@@ -9996,6 +10015,13 @@ static void vmx_sipi_signal_test(void)
        /* start AP */
        on_cpu_async(1, sipi_test_ap_thread, NULL);
 
+       if (use_kvm_wall_clock) {
+               while (rdmsr(MSR_KVM_WALL_CLOCK_NEW) != 1)
+                       cpu_relax();
+
+               vmx_set_test_stage(1);
+       }
+
        /* BSP enter guest */
        enter_guest();
 }
 

  reply	other threads:[~2025-12-18 18:26 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-15 21:54 [kvm-unit-tests PATCH v2 0/4] Better backtraces for leaf functions Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 1/4] Makefile: Provide a concept of late CFLAGS Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 2/4] x86: Better backtraces for leaf functions Mathias Krause
2025-10-10  6:03   ` Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 3/4] arm64: " Mathias Krause
2025-09-15 21:54 ` [kvm-unit-tests PATCH v2 4/4] arm: Fix backtraces involving " Mathias Krause
2025-09-16 13:04 ` [kvm-unit-tests PATCH v2 0/4] Better backtraces for " Andrew Jones
2025-11-14 15:58 ` Mathias Krause
2025-11-14 16:39   ` Sean Christopherson
2025-11-14 18:25 ` Sean Christopherson
2025-11-15  4:56   ` Mathias Krause
2025-11-17 22:19     ` Sean Christopherson
2025-11-18  1:33       ` Mathias Krause
2025-11-18  1:47         ` Mathias Krause
2025-11-18  4:04           ` Mathias Krause
2025-11-18 11:56             ` Mathias Krause
2025-11-18 12:10               ` Mathias Krause
2025-11-21 16:44           ` Mathias Krause
2025-12-18  1:44             ` Sean Christopherson
2025-12-18 10:07               ` Mathias Krause
2025-12-18 18:26                 ` Sean Christopherson [this message]
2025-12-19 13:19                   ` Mathias Krause

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aURHXUX3tC8ElwFa@google.com \
    --to=seanjc@google.com \
    --cc=alexandru.elisei@arm.com \
    --cc=andrew.jones@linux.dev \
    --cc=eric.auger@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=minipli@grsecurity.net \
    --cc=pbonzini@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.