All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: pbonzini@redhat.com, rick.p.edgecombe@intel.com,
	 linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress_test
Date: Wed, 26 Feb 2025 11:30:15 -0800	[thread overview]
Message-ID: <Z79rx0H1aByewj5X@google.com> (raw)
In-Reply-To: <Z76FxYfZlhDG/J3s@yzhao56-desk.sh.intel.com>

On Wed, Feb 26, 2025, Yan Zhao wrote:
> On Tue, Feb 25, 2025 at 05:48:39PM -0800, Sean Christopherson wrote:
> > On Sat, Feb 08, 2025, Yan Zhao wrote:
> > > In the read-only mprotect() phase of mmu_stress_test, ensure that
> > > mprotect(PROT_READ) has completed before the guest starts writing to the
> > > read-only mprotect() memory.
> > > 
> > > Without waiting for mprotect_ro_done before the guest starts writing in
> > > stage 3 (the stage for read-only mprotect()), the host's assertion of stage
> > > 3 could fail if mprotect_ro_done is set to true in the window between the
> > > guest finishing writes to all GPAs and executing GUEST_SYNC(3).
> > > 
> > > This scenario is easy to occur especially when there are hundred of vCPUs.
> > > 
> > > CPU 0                  CPU 1 guest     CPU 1 host
> > >                                        enter stage 3's 1st loop
> > >                        //in stage 3
> > >                        write all GPAs
> > >                        @rip 0x4025f0
> > > 
> > > mprotect(PROT_READ)
> > > mprotect_ro_done=true
> > >                        GUEST_SYNC(3)
> > >                                        r=0, continue stage 3's 1st loop
> > > 
> > >                        //in stage 4
> > >                        write GPA
> > >                        @rip 0x402635
> > > 
> > >                                        -EFAULT, jump out stage 3's 1st loop
> > >                                        enter stage 3's 2nd loop
> > >                        write GPA
> > >                        @rip 0x402635
> > >                                        -EFAULT, continue stage 3's 2nd loop
> > >                                        guest rip += 3
> > > 
> > > The test then fails and reports "Unhandled exception '0xe' at guest RIP
> > > '0x402638'", since the next valid guest rip address is 0x402639, i.e. the
> > > "(mem) = val" in vcpu_arch_put_guest() is compiled into a mov instruction
> > > of length 4.
> > 
> > This shouldn't happen.  On x86, stage 3 is a hand-coded "mov %rax, (%rax)", not
> > vcpu_arch_put_guest().  Either something else is going on, or __x86_64__ isn't
> > defined?
> stage 3 is hand-coded "mov %rax, (%rax)", but stage 4 is with
> vcpu_arch_put_guest().
> 
> The original code expects that "mov %rax, (%rax)" in stage 3 can produce
> -EFAULT, so that in the host thread can jump out of stage 3's 1st vcpu_run()
> loop.

Ugh, I forgot that there are two loops in stage-3.  I tried to prevent this race,
but violated my own rule of not using arbitrary delays to avoid races.

Completely untested, but I think this should address the problem (I'll test
later today; you already did the hard work of debugging).  The only thing I'm
not positive is correct is making the first _vcpu_run() a one-off instead of a
loop.

diff --git a/tools/testing/selftests/kvm/mmu_stress_test.c b/tools/testing/selftests/kvm/mmu_stress_test.c
index d9c76b4c0d88..9ac1800bb770 100644
--- a/tools/testing/selftests/kvm/mmu_stress_test.c
+++ b/tools/testing/selftests/kvm/mmu_stress_test.c
@@ -18,6 +18,7 @@
 #include "ucall_common.h"
 
 static bool mprotect_ro_done;
+static bool vcpu_hit_ro_fault;
 
 static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 {
@@ -36,9 +37,9 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 
        /*
         * Write to the region while mprotect(PROT_READ) is underway.  Keep
-        * looping until the memory is guaranteed to be read-only, otherwise
-        * vCPUs may complete their writes and advance to the next stage
-        * prematurely.
+        * looping until the memory is guaranteed to be read-only and a fault
+        * has occured, otherwise vCPUs may complete their writes and advance
+        * to the next stage prematurely.
         *
         * For architectures that support skipping the faulting instruction,
         * generate the store via inline assembly to ensure the exact length
@@ -56,7 +57,7 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 #else
                        vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
 #endif
-       } while (!READ_ONCE(mprotect_ro_done));
+       } while (!READ_ONCE(mprotect_ro_done) && !READ_ONCE(vcpu_hit_ro_fault));
 
        /*
         * Only architectures that write the entire range can explicitly sync,
@@ -148,12 +149,13 @@ static void *vcpu_worker(void *data)
         * be stuck on the faulting instruction for other architectures.  Go to
         * stage 3 without a rendezvous
         */
-       do {
-               r = _vcpu_run(vcpu);
-       } while (!r);
+       r = _vcpu_run(vcpu);
        TEST_ASSERT(r == -1 && errno == EFAULT,
                    "Expected EFAULT on write to RO memory, got r = %d, errno = %d", r, errno);
 
+       /* Tell the vCPU it hit a RO fault. */
+       WRITE_ONCE(vcpu_hit_ro_fault, true);
+
 #if defined(__x86_64__) || defined(__aarch64__)
        /*
         * Verify *all* writes from the guest hit EFAULT due to the VMA now
@@ -378,7 +380,6 @@ int main(int argc, char *argv[])
        rendezvous_with_vcpus(&time_run2, "run 2");
 
        mprotect(mem, slot_size, PROT_READ);
-       usleep(10);
        mprotect_ro_done = true;
        sync_global_to_guest(vm, mprotect_ro_done);


  reply	other threads:[~2025-02-26 19:30 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-08 10:53 [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress_test Yan Zhao
2025-02-11  1:03 ` Edgecombe, Rick P
2025-02-11  1:42   ` Yan Zhao
2025-02-11 18:29     ` Edgecombe, Rick P
2025-02-12  6:59       ` Yan Zhao
2025-02-26  1:53       ` Sean Christopherson
2025-02-26  2:07         ` Yan Zhao
2025-02-26  2:20           ` Sean Christopherson
2025-02-26  3:09             ` Yan Zhao
2025-02-26  1:48 ` Sean Christopherson
2025-02-26  3:08   ` Yan Zhao
2025-02-26 19:30     ` Sean Christopherson [this message]
2025-02-27  5:45       ` Yan Zhao
2025-02-27 22:18         ` Sean Christopherson
2025-02-28 10:55           ` Yan Zhao
2025-02-28 14:00             ` Sean Christopherson
2025-03-03  3:14               ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z79rx0H1aByewj5X@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.