All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: pbonzini@redhat.com, rick.p.edgecombe@intel.com,
	 linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress_test
Date: Tue, 25 Feb 2025 17:48:39 -0800	[thread overview]
Message-ID: <Z75y90KM_fE6H1cJ@google.com> (raw)
In-Reply-To: <20250208105318.16861-1-yan.y.zhao@intel.com>

On Sat, Feb 08, 2025, Yan Zhao wrote:
> In the read-only mprotect() phase of mmu_stress_test, ensure that
> mprotect(PROT_READ) has completed before the guest starts writing to the
> read-only mprotect() memory.
> 
> Without waiting for mprotect_ro_done before the guest starts writing in
> stage 3 (the stage for read-only mprotect()), the host's assertion of stage
> 3 could fail if mprotect_ro_done is set to true in the window between the
> guest finishing writes to all GPAs and executing GUEST_SYNC(3).
> 
> This scenario is easy to occur especially when there are hundred of vCPUs.
> 
> CPU 0                  CPU 1 guest     CPU 1 host
>                                        enter stage 3's 1st loop
>                        //in stage 3
>                        write all GPAs
>                        @rip 0x4025f0
> 
> mprotect(PROT_READ)
> mprotect_ro_done=true
>                        GUEST_SYNC(3)
>                                        r=0, continue stage 3's 1st loop
> 
>                        //in stage 4
>                        write GPA
>                        @rip 0x402635
> 
>                                        -EFAULT, jump out stage 3's 1st loop
>                                        enter stage 3's 2nd loop
>                        write GPA
>                        @rip 0x402635
>                                        -EFAULT, continue stage 3's 2nd loop
>                                        guest rip += 3
> 
> The test then fails and reports "Unhandled exception '0xe' at guest RIP
> '0x402638'", since the next valid guest rip address is 0x402639, i.e. the
> "(mem) = val" in vcpu_arch_put_guest() is compiled into a mov instruction
> of length 4.

This shouldn't happen.  On x86, stage 3 is a hand-coded "mov %rax, (%rax)", not
vcpu_arch_put_guest().  Either something else is going on, or __x86_64__ isn't
defined?

	do {
		for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
#ifdef __x86_64__
			asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
#elif defined(__aarch64__)
			asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
#else
			vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
#endif
	} while (!READ_ONCE(mprotect_ro_done));

	/*
	 * Only architectures that write the entire range can explicitly sync,
	 * as other architectures will be stuck on the write fault.
	 */
#if defined(__x86_64__) || defined(__aarch64__)
	GUEST_SYNC(3);
#endif

	for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
		vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
	GUEST_SYNC(4);



> Even if it could be compiled into a mov instruction of length 3, the
> following execution of GUEST_SYNC(4) in guest could still cause the host
> failure of the assertion of stage 3.

Sorry, I don't follow.  The guest doesn't get "released" from GUEST_SYNC(3) until
the host runs the vCPU again, and that happens after asserting on stage 3 and
synchronizing with the main thread.

	assert_sync_stage(vcpu, 3);
#endif /* __x86_64__ || __aarch64__ */
	rendezvous_with_boss();

	/*
	 * Stage 4.  Run to completion, waiting for mprotect(PROT_WRITE) to
	 * make the memory writable again.
	 */
	do {
		r = _vcpu_run(vcpu);
	} while (r && errno == EFAULT);

  parent reply	other threads:[~2025-02-26  1:48 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-08 10:53 [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress_test Yan Zhao
2025-02-11  1:03 ` Edgecombe, Rick P
2025-02-11  1:42   ` Yan Zhao
2025-02-11 18:29     ` Edgecombe, Rick P
2025-02-12  6:59       ` Yan Zhao
2025-02-26  1:53       ` Sean Christopherson
2025-02-26  2:07         ` Yan Zhao
2025-02-26  2:20           ` Sean Christopherson
2025-02-26  3:09             ` Yan Zhao
2025-02-26  1:48 ` Sean Christopherson [this message]
2025-02-26  3:08   ` Yan Zhao
2025-02-26 19:30     ` Sean Christopherson
2025-02-27  5:45       ` Yan Zhao
2025-02-27 22:18         ` Sean Christopherson
2025-02-28 10:55           ` Yan Zhao
2025-02-28 14:00             ` Sean Christopherson
2025-03-03  3:14               ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z75y90KM_fE6H1cJ@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.