* [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics
@ 2024-02-15 1:00 Sean Christopherson
2024-02-15 1:00 ` [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-02-15 1:00 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, David Matlack, Pasha Tatashin, Michael Krebs
Fix a bug in KVM's emulator where the target page of an atomic write isn't
marked dirty, and enhance the dirty_log_test selftest to serve as
a regression test by conditionally doing forced emulation of guest writes.
Note, the selftest depends on several patches that are sitting in
`kvm-x86 pmu`, so I'll likely take the selftest through that branch (eww).
Sean Christopherson (2):
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
KVM: selftests: Test forced instruction emulation in dirty log test
(x86 only)
arch/x86/kvm/x86.c | 10 ++++++
tools/testing/selftests/kvm/dirty_log_test.c | 36 ++++++++++++++++++--
2 files changed, 43 insertions(+), 3 deletions(-)
base-commit: 7455665a3521aa7b56245c0a2810f748adc5fdd4
--
2.43.0.687.g38aa6559b0-goog
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-02-15 1:00 [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics Sean Christopherson
@ 2024-02-15 1:00 ` Sean Christopherson
2024-02-15 17:13 ` Jim Mattson
2024-02-15 17:57 ` David Matlack
2024-02-15 1:00 ` [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only) Sean Christopherson
2024-02-17 1:02 ` [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics Sean Christopherson
2 siblings, 2 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-02-15 1:00 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, David Matlack, Pasha Tatashin, Michael Krebs
When emulating an atomic access on behalf of the guest, mark the target
gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
fixes a bug where KVM effectively corrupts guest memory during live
migration by writing to guest memory without informing userspace that the
page is dirty.
Marking the page dirty got unintentionally dropped when KVM's emulated
CMPXCHG was converted to do a user access. Before that, KVM explicitly
mapped the guest page into kernel memory, and marked the page dirty during
the unmap phase.
Mark the page dirty even if the CMPXCHG fails, as the old data is written
back on failure, i.e. the page is still written. The value written is
guaranteed to be the same because the operation is atomic, but KVM's ABI
is that all writes are dirty logged regardless of the value written. And
more importantly, that's what KVM did before the buggy commit.
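The dirty-logging contract being restored can be sketched in plain C. The
snippet below is a hypothetical user-space stand-in, using a GCC atomic
builtin in place of KVM's __try_cmpxchg_user() and a bool in place of
kvm_vcpu_mark_page_dirty(); the point it illustrates is that any
non-faulting attempt marks the page dirty, success or failure:

```c
#include <stdbool.h>
#include <stdint.h>

static bool page_dirty;

/*
 * Hypothetical stand-in for emulator_cmpxchg_emulated(): attempt the
 * CMPXCHG and mark the page dirty whenever the access itself doesn't
 * fault, even if the compare fails, because the hardware instruction
 * writes the old value back on failure.
 */
static int emulate_cmpxchg(uint64_t *mem, uint64_t *old, uint64_t new)
{
	bool ok = __atomic_compare_exchange_n(mem, old, new, false,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_SEQ_CST);

	/* The access completed: log the gfn as dirty unconditionally. */
	page_dirty = true;

	return ok ? 0 : 1;	/* 1 ~= X86EMUL_CMPXCHG_FAILED */
}
```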
Huge kudos to the folks on the Cc list (and many others), who did all the
actual work of triaging and debugging.
Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Pasha Tatashin <tatashin@google.com>
Cc: Michael Krebs <mkrebs@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b66c45e7f6f8..3ec9781d6122 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8009,6 +8009,16 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
if (r < 0)
return X86EMUL_UNHANDLEABLE;
+
+ /*
+ * Mark the page dirty _before_ checking whether or not the CMPXCHG was
+ * successful, as the old value is written back on failure. Note, for
+ * live migration, this is unnecessarily conservative as CMPXCHG writes
+ * back the original value and the access is atomic, but KVM's ABI is
+ * that all writes are dirty logged, regardless of the value written.
+ */
+ kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
+
if (r)
return X86EMUL_CMPXCHG_FAILED;
--
2.43.0.687.g38aa6559b0-goog
* [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 1:00 [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics Sean Christopherson
2024-02-15 1:00 ` [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty Sean Christopherson
@ 2024-02-15 1:00 ` Sean Christopherson
2024-02-15 8:21 ` Oliver Upton
2024-02-17 1:02 ` [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics Sean Christopherson
2 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2024-02-15 1:00 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, David Matlack, Pasha Tatashin, Michael Krebs
Add forced emulation of MOV and LOCK CMPXCHG instructions in the dirty log
test's guest code to verify that KVM's emulator marks pages dirty as
expected (and obviously to verify the emulator works at all). In the long
term, the guest code would ideally hammer more of KVM's emulator, but for
starters, cover the two major paths: writes and atomics.
To minimize #ifdeffery, wrap only the related code that is x86-specific;
unnecessarily synchronizing an extra boolean to the guest is far from the
end of the world.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 36 ++++++++++++++++++--
1 file changed, 33 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index eaad5b20854c..ff1d1c7f05d8 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -92,6 +92,29 @@ static uint64_t guest_test_phys_mem;
*/
static uint64_t guest_test_virt_mem = DEFAULT_GUEST_TEST_MEM;
+static bool is_forced_emulation_enabled;
+
+static void guest_write_memory(uint64_t *mem, uint64_t val, uint64_t rand)
+{
+#ifdef __x86_64__
+ if (is_forced_emulation_enabled && (rand & 1)) {
+ if (rand & 2) {
+ __asm__ __volatile__(KVM_FEP "movq %1, %0"
+ : "+m" (*mem)
+ : "r" (val) : "memory");
+ } else {
+ uint64_t __old = READ_ONCE(*mem);
+
+ __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]"
+ : [ptr] "+m" (*mem), [old] "+a" (__old)
+ : [new]"r" (val) : "memory", "cc");
+ }
+ } else
+#endif
+
+ *mem = val;
+}
+
/*
* Continuously write to the first 8 bytes of a random page within
* the testing memory region.
@@ -114,11 +137,13 @@ static void guest_code(void)
while (true) {
for (i = 0; i < TEST_PAGES_PER_LOOP; i++) {
+ uint64_t rand = READ_ONCE(random_array[i]);
+
addr = guest_test_virt_mem;
- addr += (READ_ONCE(random_array[i]) % guest_num_pages)
- * guest_page_size;
+ addr += (rand % guest_num_pages) * guest_page_size;
addr = align_down(addr, host_page_size);
- *(uint64_t *)addr = READ_ONCE(iteration);
+
+ guest_write_memory((void *)addr, READ_ONCE(iteration), rand);
}
/* Tell the host that we need more random numbers */
@@ -772,6 +797,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
sync_global_to_guest(vm, guest_page_size);
sync_global_to_guest(vm, guest_test_virt_mem);
sync_global_to_guest(vm, guest_num_pages);
+ sync_global_to_guest(vm, is_forced_emulation_enabled);
/* Start the iterations */
iteration = 1;
@@ -875,6 +901,10 @@ int main(int argc, char *argv[])
int opt, i;
sigset_t sigset;
+#ifdef __x86_64__
+ is_forced_emulation_enabled = kvm_is_forced_emulation_enabled();
+#endif
+
sem_init(&sem_vcpu_stop, 0, 0);
sem_init(&sem_vcpu_cont, 0, 0);
--
2.43.0.687.g38aa6559b0-goog
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 1:00 ` [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only) Sean Christopherson
@ 2024-02-15 8:21 ` Oliver Upton
2024-02-15 18:50 ` Sean Christopherson
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2024-02-15 8:21 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Wed, Feb 14, 2024 at 05:00:04PM -0800, Sean Christopherson wrote:
> Add forced emulation of MOV and LOCK CMPXCHG instructions in the dirty log
> test's guest code to verify that KVM's emulator marks pages dirty as
> expected (and obviously to verify the emulator works at all). In the long
> term, the guest code would ideally hammer more of KVM's emulator, but for
> starters, cover the two major paths: writes and atomics.
>
> To minimize #ifdeffery, wrap only the related code that is x86-specific;
> unnecessarily synchronizing an extra boolean to the guest is far from the
> end of the world.
Meh, I wouldn't say the end result in guest_write_memory() is that
pretty. Just ifdef the whole function and provide a generic implementation
for the other architectures.
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/dirty_log_test.c | 36 ++++++++++++++++++--
> 1 file changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index eaad5b20854c..ff1d1c7f05d8 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -92,6 +92,29 @@ static uint64_t guest_test_phys_mem;
> */
> static uint64_t guest_test_virt_mem = DEFAULT_GUEST_TEST_MEM;
>
> +static bool is_forced_emulation_enabled;
> +
> +static void guest_write_memory(uint64_t *mem, uint64_t val, uint64_t rand)
> +{
> +#ifdef __x86_64__
> + if (is_forced_emulation_enabled && (rand & 1)) {
> + if (rand & 2) {
Can't you invert the logic and drop a level of indentation?
if (!(is_forced_emulation_enabled && (rand & 1))) {
*mem = val;
} else if (rand & 2) {
movq
} else {
cmpxchg8b
}
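Fleshed out as a compilable sketch, the inverted flow looks like the
following; the two emulated branches are plain C stand-ins for the
KVM_FEP "movq" and "lock cmpxchgq" inline asm, which only exists in the
x86 build of the test:

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_forced_emulation_enabled;

/* Sketch of the suggested flow with the common store handled first. */
static void guest_write_memory(uint64_t *mem, uint64_t val, uint64_t rand)
{
	if (!(is_forced_emulation_enabled && (rand & 1))) {
		*mem = val;		/* common, non-emulated store */
	} else if (rand & 2) {
		*mem = val;		/* stand-in for KVM_FEP movq */
	} else {
		/* stand-in for KVM_FEP lock cmpxchgq */
		uint64_t old = *mem;

		__atomic_compare_exchange_n(mem, &old, val, false,
					    __ATOMIC_SEQ_CST,
					    __ATOMIC_SEQ_CST);
	}
}
```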
> + __asm__ __volatile__(KVM_FEP "movq %1, %0"
> + : "+m" (*mem)
> + : "r" (val) : "memory");
> + } else {
> + uint64_t __old = READ_ONCE(*mem);
> +
> + __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]"
> + : [ptr] "+m" (*mem), [old] "+a" (__old)
> + : [new]"r" (val) : "memory", "cc");
> + }
> + } else
> +#endif
> +
> + *mem = val;
> +}
> +
--
Thanks,
Oliver
* Re: [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-02-15 1:00 ` [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty Sean Christopherson
@ 2024-02-15 17:13 ` Jim Mattson
2024-02-15 17:57 ` David Matlack
1 sibling, 0 replies; 17+ messages in thread
From: Jim Mattson @ 2024-02-15 17:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When emulating an atomic access on behalf of the guest, mark the target
> gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> fixes a bug where KVM effectively corrupts guest memory during live
> migration by writing to guest memory without informing userspace that the
> page is dirty.
>
> Marking the page dirty got unintentionally dropped when KVM's emulated
> CMPXCHG was converted to do a user access. Before that, KVM explicitly
> mapped the guest page into kernel memory, and marked the page dirty during
> the unmap phase.
>
> Mark the page dirty even if the CMPXCHG fails, as the old data is written
> back on failure, i.e. the page is still written. The value written is
> guaranteed to be the same because the operation is atomic, but KVM's ABI
> is that all writes are dirty logged regardless of the value written. And
> more importantly, that's what KVM did before the buggy commit.
>
> Huge kudos to the folks on the Cc list (and many others), who did all the
> actual work of triaging and debugging.
>
> Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
> Cc: stable@vger.kernel.org
> Cc: David Matlack <dmatlack@google.com>
> Cc: Pasha Tatashin <tatashin@google.com>
> Cc: Michael Krebs <mkrebs@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Let's try this again, gmail...
Reviewed-by: Jim Mattson <jmattson@google.com>
* Re: [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-02-15 1:00 ` [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty Sean Christopherson
2024-02-15 17:13 ` Jim Mattson
@ 2024-02-15 17:57 ` David Matlack
2024-02-15 18:45 ` Sean Christopherson
1 sibling, 1 reply; 17+ messages in thread
From: David Matlack @ 2024-02-15 17:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Pasha Tatashin, Michael Krebs
On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When emulating an atomic access on behalf of the guest, mark the target
> gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> fixes a bug where KVM effectively corrupts guest memory during live
> migration by writing to guest memory without informing userspace that the
> page is dirty.
>
> Marking the page dirty got unintentionally dropped when KVM's emulated
> CMPXCHG was converted to do a user access. Before that, KVM explicitly
> mapped the guest page into kernel memory, and marked the page dirty during
> the unmap phase.
>
> Mark the page dirty even if the CMPXCHG fails, as the old data is written
> back on failure, i.e. the page is still written. The value written is
> guaranteed to be the same because the operation is atomic, but KVM's ABI
> is that all writes are dirty logged regardless of the value written. And
> more importantly, that's what KVM did before the buggy commit.
>
> Huge kudos to the folks on the Cc list (and many others), who did all the
> actual work of triaging and debugging.
>
> Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
I'm only half serious but... Should we just revert this commit?
This commit claims that kvm_vcpu_map() is unsafe because it can race
with mremap(). But there are many other places where KVM uses
kvm_vcpu_map() (e.g. nested VMX). It seems like KVM is just not
compatible with mremap() until we address all the users of
kvm_vcpu_map(). Patching _just_ emulator_cmpxchg_emulated() seems
silly but maybe I'm missing some context on what led to commit
1c2361f667f3 being written.
kvm_vcpu_map/unmap() might not be the best interface, but it serves as
a common choke-point for mapping guest memory to access in KVM. This
is helpful for avoiding missed dirty logging updates (obviously) and
will be even more helpful if we add support for freezing guest memory
and "KVM Userfault" (as discussed in the 1/3 PUCK). I think we all
agree we should do more of this (common choke points), not less. If
there's a usecase for mremap()ing guest memory, we should make
kvm_vcpu_map() play nice with mmu_notifiers.
* Re: [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-02-15 17:57 ` David Matlack
@ 2024-02-15 18:45 ` Sean Christopherson
2024-02-16 17:10 ` Sean Christopherson
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2024-02-15 18:45 UTC (permalink / raw)
To: David Matlack
Cc: Paolo Bonzini, kvm, linux-kernel, Pasha Tatashin, Michael Krebs
On Thu, Feb 15, 2024, David Matlack wrote:
> On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > When emulating an atomic access on behalf of the guest, mark the target
> > gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> > fixes a bug where KVM effectively corrupts guest memory during live
> > migration by writing to guest memory without informing userspace that the
> > page is dirty.
> >
> > Marking the page dirty got unintentionally dropped when KVM's emulated
> > CMPXCHG was converted to do a user access. Before that, KVM explicitly
> > mapped the guest page into kernel memory, and marked the page dirty during
> > the unmap phase.
> >
> > Mark the page dirty even if the CMPXCHG fails, as the old data is written
> > back on failure, i.e. the page is still written. The value written is
> > guaranteed to be the same because the operation is atomic, but KVM's ABI
> > is that all writes are dirty logged regardless of the value written. And
> > more importantly, that's what KVM did before the buggy commit.
> >
> > Huge kudos to the folks on the Cc list (and many others), who did all the
> > actual work of triaging and debugging.
> >
> > Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
>
> I'm only half serious but... Should we just revert this commit?
No.
> This commit claims that kvm_vcpu_map() is unsafe because it can race
> with mremap(). But there are many other places where KVM uses
> kvm_vcpu_map() (e.g. nested VMX). It seems like KVM is just not
> compatible with mremap() until we address all the users of
> kvm_vcpu_map(). Patching _just_ emulator_cmpxchg_emulated() seems
> silly but maybe I'm missing some context on what led to commit
> 1c2361f667f3 being written.
The short version is that it's a rather trivial vector for userspace to trigger
UAF. E.g. map non-refcounted memory into a guest and then unmap the memory.
We tried to fix the nVMX usage, but that proved to be impractical[1]. We haven't
forced the issue because it's not obvious that there's meaningful exposure in
practice, e.g. unless userspace is hiding regular memory from the kernel *and*
oversubscribing VMs, a benign userspace won't be affected. But at the same time,
we don't have high confidence that the unsafe behavior can't be exploited in
practice.
What I am pushing for now is an off-by-default module param to let userspace
opt-in to unsafe mappings such as these[2]. Because if KVM starts allowing
non-refcounted struct page memory, the ability to exploit these flaws skyrockets.
(Though this reminds me that I need to take another look at whether or not allowing
non-refcounted struct page memory is actually necessary).
[1] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@google.com
[2] https://lore.kernel.org/all/20230911021637.1941096-4-stevensd@google.com
> kvm_vcpu_map/unmap() might not be the best interface, but it serves as
> a common choke-point for mapping guest memory to access in KVM. This
> is helpful for avoiding missed dirty logging updates (obviously) and
> will be even more helpful if we add support for freezing guest memory
> and "KVM Userfault" (as discussed in the 1/3 PUCK). I think we all
> agree we should do more of this (common choke points), not less. If
> there's a usecase for mremap()ing guest memory, we should make
> kvm_vcpu_map() play nice with mmu_notifiers.
I agree, but KVM needs to use __try_cmpxchg_user() anyways, when updating guest
A/D bits in FNAME(update_accessed_dirty_bits)(). And that one we *definitely*
don't want to revert; see commit 2a8859f373b0 ("KVM: x86/mmu: do compare-and-exchange
of gPTE via the user address") for details on how broken the previous code was.
In other words, reverting to kvm_vcpu_{un,}map() *probably* isn't wildly unsafe,
but it also doesn't really buy us anything, and long term we have line of sight
to closing the holes for good. And unlike the nVMX code, where it's reasonable
for KVM to disallow using non-refcounted memory for VMCS pages, disallowing such
memory for emulated atomic accesses is less reasonable.
Rather than revert, to make this more robust in the longer term, we can add a
wrapper in KVM to mark the gfn dirty. I didn't do it here because I was hustling
to get this minimal fix posted.
E.g.
--
Subject: [PATCH] KVM: x86: Provide a wrapper for __try_cmpxchg_user() to mark
the gfn dirty
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 4 ++--
arch/x86/kvm/x86.c | 25 +++++++++----------------
arch/x86/kvm/x86.h | 19 +++++++++++++++++++
3 files changed, 30 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d4e98fe4f35..a8123406fe99 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -246,11 +246,11 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
if (unlikely(!walker->pte_writable[level - 1]))
continue;
- ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault);
+ ret = kvm_try_cmpxchg_user(ptep_user, &orig_pte, pte, fault,
+ vcpu, table_gfn);
if (ret)
return ret;
- kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
walker->ptes[level - 1] = pte;
}
return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ec9781d6122..bedb51fbbad3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7946,8 +7946,9 @@ static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
exception, &write_emultor);
}
-#define emulator_try_cmpxchg_user(t, ptr, old, new) \
- (__try_cmpxchg_user((t __user *)(ptr), (t *)(old), *(t *)(new), efault ## t))
+#define emulator_try_cmpxchg_user(t, ptr, old, new, vcpu, gfn) \
+ (kvm_try_cmpxchg_user((t __user *)(ptr), (t *)(old), *(t *)(new), \
+ efault ## t, vcpu, gfn))
static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
unsigned long addr,
@@ -7960,6 +7961,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
u64 page_line_mask;
unsigned long hva;
gpa_t gpa;
+ gfn_t gfn;
int r;
/* guests cmpxchg8b have to be emulated atomically */
@@ -7990,18 +7992,19 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
hva += offset_in_page(gpa);
+ gfn = gpa_to_gfn(gpa);
switch (bytes) {
case 1:
- r = emulator_try_cmpxchg_user(u8, hva, old, new);
+ r = emulator_try_cmpxchg_user(u8, hva, old, new, vcpu, gfn);
break;
case 2:
- r = emulator_try_cmpxchg_user(u16, hva, old, new);
+ r = emulator_try_cmpxchg_user(u16, hva, old, new, vcpu, gfn);
break;
case 4:
- r = emulator_try_cmpxchg_user(u32, hva, old, new);
+ r = emulator_try_cmpxchg_user(u32, hva, old, new, vcpu, gfn);
break;
case 8:
- r = emulator_try_cmpxchg_user(u64, hva, old, new);
+ r = emulator_try_cmpxchg_user(u64, hva, old, new, vcpu, gfn);
break;
default:
BUG();
@@ -8009,16 +8012,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
if (r < 0)
return X86EMUL_UNHANDLEABLE;
-
- /*
- * Mark the page dirty _before_ checking whether or not the CMPXCHG was
- * successful, as the old value is written back on failure. Note, for
- * live migration, this is unnecessarily conservative as CMPXCHG writes
- * back the original value and the access is atomic, but KVM's ABI is
- * that all writes are dirty logged, regardless of the value written.
- */
- kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
-
if (r)
return X86EMUL_CMPXCHG_FAILED;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..2fabc7cd7e39 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -290,6 +290,25 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
return !(kvm->arch.disabled_quirks & quirk);
}
+
+
+/*
+ * Mark the page dirty even if the CMPXCHG fails (but didn't fault), as the
+ * old value is written back on failure. Note, for live migration, this is
+ * unnecessarily conservative as CMPXCHG writes back the original value and the
+ * access is atomic, but KVM's ABI is that all writes are dirty logged,
+ * regardless of the value written.
+ */
+#define kvm_try_cmpxchg_user(ptr, oldp, nval, label, vcpu, gfn) \
+({ \
+ int ret; \
+ \
+ ret = __try_cmpxchg_user(ptr, oldp, nval, label); \
+ if (ret >= 0) \
+ kvm_vcpu_mark_page_dirty(vcpu, gfn); \
+ ret; \
+})
+
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
u64 get_kvmclock_ns(struct kvm *kvm);
base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
--
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 8:21 ` Oliver Upton
@ 2024-02-15 18:50 ` Sean Christopherson
2024-02-15 20:13 ` Oliver Upton
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2024-02-15 18:50 UTC (permalink / raw)
To: Oliver Upton
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Thu, Feb 15, 2024, Oliver Upton wrote:
> On Wed, Feb 14, 2024 at 05:00:04PM -0800, Sean Christopherson wrote:
> > Add forced emulation of MOV and LOCK CMPXCHG instructions in the dirty log
> > test's guest code to verify that KVM's emulator marks pages dirty as
> > expected (and obviously to verify the emulator works at all). In the long
> > term, the guest code would ideally hammer more of KVM's emulator, but for
> > starters, cover the two major paths: writes and atomics.
> >
> > To minimize #ifdeffery, wrap only the related code that is x86-specific;
> > unnecessarily synchronizing an extra boolean to the guest is far from the
> > end of the world.
>
> Meh, I wouldn't say the end result in guest_write_memory() is that
> pretty. Just ifdef the whole function and provide a generic implementation
> for the other architectures.
>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > tools/testing/selftests/kvm/dirty_log_test.c | 36 ++++++++++++++++++--
> > 1 file changed, 33 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > index eaad5b20854c..ff1d1c7f05d8 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > @@ -92,6 +92,29 @@ static uint64_t guest_test_phys_mem;
> > */
> > static uint64_t guest_test_virt_mem = DEFAULT_GUEST_TEST_MEM;
> >
> > +static bool is_forced_emulation_enabled;
> > +
> > +static void guest_write_memory(uint64_t *mem, uint64_t val, uint64_t rand)
> > +{
> > +#ifdef __x86_64__
> > + if (is_forced_emulation_enabled && (rand & 1)) {
> > + if (rand & 2) {
>
> Can't you invert the logic and drop a level of indentation?
>
> if (!(is_forced_emulation_enabled && (rand & 1))) {
> *mem = val;
> } else if (rand & 2) {
> movq
> } else {
> cmpxchg8b
> }
Yeah, the funky flow I concocted was done purely to have the "no emulation" path
fall through to the common "*mem = val". I don't have a strong preference, I
mentally flipped a coin on doing that versus what you suggested, and apparently
chose poorly :-)
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 18:50 ` Sean Christopherson
@ 2024-02-15 20:13 ` Oliver Upton
2024-02-15 21:33 ` Sean Christopherson
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2024-02-15 20:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Thu, Feb 15, 2024 at 10:50:20AM -0800, Sean Christopherson wrote:
> Yeah, the funky flow I concocted was done purely to have the "no emulation" path
> fall through to the common "*mem = val". I don't have a strong preference, I
> mentally flipped a coin on doing that versus what you suggested, and apparently
> chose poorly :-)
Oh, I could definitely tell this was intentional :) But really if folks
are going to add more flavors of emulated instructions to the x86
implementation (which they should) then it might make sense to just have
an x86-specific function.
But again, it's selftests, who cares! /s
--
Thanks,
Oliver
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 20:13 ` Oliver Upton
@ 2024-02-15 21:33 ` Sean Christopherson
2024-02-15 23:27 ` Oliver Upton
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2024-02-15 21:33 UTC (permalink / raw)
To: Oliver Upton
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Thu, Feb 15, 2024, Oliver Upton wrote:
> On Thu, Feb 15, 2024 at 10:50:20AM -0800, Sean Christopherson wrote:
> > Yeah, the funky flow I concocted was done purely to have the "no emulation" path
> > fall through to the common "*mem = val". I don't have a strong preference, I
> > mentally flipped a coin on doing that versus what you suggested, and apparently
> > chose poorly :-)
>
> Oh, I could definitely tell this was intentional :) But really if folks
> are going to add more flavors of emulated instructions to the x86
> implementation (which they should) then it might make sense to just have
> an x86-specific function.
Yeah, best prepare for the onslaught. And if I base this on the SEV selftests
series that adds kvm_util_arch.h, it's easy to shove the x86 sequence into a
common location outside of dirty_log_test.c. Then there are no #ifdefs or x86
code in dirty_log_test.c, and other tests can use the helper at will.
It'll require some macro hell to support all four sizes, but that's not hard,
just annoying.
And it's a good excuse to do what I should have done in the first place, and
make is_forced_emulation_enabled be available to all guest code without needing
to manually check it in each test.
Over 2-3 patches...
---
tools/testing/selftests/kvm/dirty_log_test.c | 9 ++++++---
.../selftests/kvm/include/kvm_util_base.h | 3 +++
.../kvm/include/x86_64/kvm_util_arch.h | 20 +++++++++++++++++++
.../selftests/kvm/lib/x86_64/processor.c | 3 +++
.../selftests/kvm/x86_64/pmu_counters_test.c | 3 ---
.../kvm/x86_64/userspace_msr_exit_test.c | 9 ++-------
6 files changed, 34 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index babea97b31a4..93c3a51a6d9b 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -114,11 +114,14 @@ static void guest_code(void)
while (true) {
for (i = 0; i < TEST_PAGES_PER_LOOP; i++) {
+ uint64_t rand = READ_ONCE(random_array[i]);
+ uint64_t val = READ_ONCE(iteration);
+
addr = guest_test_virt_mem;
- addr += (READ_ONCE(random_array[i]) % guest_num_pages)
- * guest_page_size;
+ addr += (rand % guest_num_pages) * guest_page_size;
addr = align_down(addr, host_page_size);
- *(uint64_t *)addr = READ_ONCE(iteration);
+
+ vcpu_arch_put_guest((u64 *)addr, val, rand);
}
/* Tell the host that we need more random numbers */
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 4b266dc0c9bd..4b7285f073df 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -610,6 +610,9 @@ void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva);
vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva);
void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
+#ifndef vcpu_arch_put_guest
+#define vcpu_arch_put_guest(mem, val, rand) do { *mem = val; } while (0)
+#endif
static inline vm_paddr_t vm_untag_gpa(struct kvm_vm *vm, vm_paddr_t gpa)
{
diff --git a/tools/testing/selftests/kvm/include/x86_64/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86_64/kvm_util_arch.h
index 205ed788aeb8..3f9a44fd4bcb 100644
--- a/tools/testing/selftests/kvm/include/x86_64/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86_64/kvm_util_arch.h
@@ -5,6 +5,8 @@
#include <stdbool.h>
#include <stdint.h>
+extern bool is_forced_emulation_enabled;
+
struct kvm_vm_arch {
uint64_t c_bit;
uint64_t s_bit;
@@ -20,4 +22,22 @@ static inline bool __vm_arch_has_protected_memory(struct kvm_vm_arch *arch)
#define vm_arch_has_protected_memory(vm) \
__vm_arch_has_protected_memory(&(vm)->arch)
+/* TODO: Expand this madness to also support u8, u16, and u32 operands. */
+#define vcpu_arch_put_guest(mem, val, rand) \
+do { \
+ if (!is_forced_emulation_enabled || !(rand & 1)) { \
+ *mem = val; \
+ } else if (rand & 2) { \
+ __asm__ __volatile__(KVM_FEP "movq %1, %0" \
+ : "+m" (*mem) \
+ : "r" (val) : "memory"); \
+ } else { \
+ uint64_t __old = READ_ONCE(*mem); \
+ \
+ __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]" \
+ : [ptr] "+m" (*mem), [old] "+a" (__old) \
+ : [new]"r" (val) : "memory", "cc"); \
+ } \
+} while (0)
+
#endif // _TOOLS_LINUX_ASM_X86_KVM_HOST_H
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index aa92220bf5da..d0a97d5e1ff9 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -23,6 +23,7 @@
vm_vaddr_t exception_handlers;
bool host_cpu_is_amd;
bool host_cpu_is_intel;
+bool is_forced_emulation_enabled;
static void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent)
{
@@ -577,6 +578,7 @@ void kvm_arch_vm_post_create(struct kvm_vm *vm)
vm_create_irqchip(vm);
sync_global_to_guest(vm, host_cpu_is_intel);
sync_global_to_guest(vm, host_cpu_is_amd);
+ sync_global_to_guest(vm, is_forced_emulation_enabled);
if (vm->subtype == VM_SUBTYPE_SEV)
sev_vm_init(vm);
@@ -1337,6 +1339,7 @@ void kvm_selftest_arch_init(void)
{
host_cpu_is_intel = this_cpu_is_intel();
host_cpu_is_amd = this_cpu_is_amd();
+ is_forced_emulation_enabled = kvm_is_forced_emulation_enabled();
}
bool sys_clocksource_is_based_on_tsc(void)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index ae5f6042f1e8..6b2c1fd551b5 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -21,7 +21,6 @@
static uint8_t kvm_pmu_version;
static bool kvm_has_perf_caps;
-static bool is_forced_emulation_enabled;
static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
void *guest_code,
@@ -35,7 +34,6 @@ static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
vcpu_init_descriptor_tables(*vcpu);
sync_global_to_guest(vm, kvm_pmu_version);
- sync_global_to_guest(vm, is_forced_emulation_enabled);
/*
* Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
@@ -609,7 +607,6 @@ int main(int argc, char *argv[])
kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
- is_forced_emulation_enabled = kvm_is_forced_emulation_enabled();
test_intel_counters();
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
index ab3a8c4f0b86..a409b796bb18 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
@@ -12,8 +12,6 @@
#include "kvm_util.h"
#include "vmx.h"
-static bool fep_available;
-
#define MSR_NON_EXISTENT 0x474f4f00
static u64 deny_bits = 0;
@@ -257,7 +255,7 @@ static void guest_code_filter_allow(void)
GUEST_ASSERT(data == 2);
GUEST_ASSERT(guest_exception_count == 0);
- if (fep_available) {
+ if (is_forced_emulation_enabled) {
/* Let userspace know we aren't done. */
GUEST_SYNC(0);
@@ -519,7 +517,6 @@ static void test_msr_filter_allow(void)
int rc;
vm = vm_create_with_one_vcpu(&vcpu, guest_code_filter_allow);
- sync_global_to_guest(vm, fep_available);
rc = kvm_check_cap(KVM_CAP_X86_USER_SPACE_MSR);
TEST_ASSERT(rc, "KVM_CAP_X86_USER_SPACE_MSR is available");
@@ -550,7 +547,7 @@ static void test_msr_filter_allow(void)
vcpu_run(vcpu);
cmd = process_ucall(vcpu);
- if (fep_available) {
+ if (is_forced_emulation_enabled) {
TEST_ASSERT_EQ(cmd, UCALL_SYNC);
vm_install_exception_handler(vm, GP_VECTOR, guest_fep_gp_handler);
@@ -791,8 +788,6 @@ static void test_user_exit_msr_flags(void)
int main(int argc, char *argv[])
{
- fep_available = kvm_is_forced_emulation_enabled();
-
test_msr_filter_allow();
test_msr_filter_deny();
base-commit: e072aa6dbd1db64323a407b3eca82dc5107ea0b1
--
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 21:33 ` Sean Christopherson
@ 2024-02-15 23:27 ` Oliver Upton
2024-02-16 0:26 ` Sean Christopherson
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2024-02-15 23:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Thu, Feb 15, 2024 at 01:33:48PM -0800, Sean Christopherson wrote:
[...]
> +/* TODO: Expand this madness to also support u8, u16, and u32 operands. */
> +#define vcpu_arch_put_guest(mem, val, rand) \
> +do { \
> + if (!is_forced_emulation_enabled || !(rand & 1)) { \
> + *mem = val; \
> + } else if (rand & 2) { \
> + __asm__ __volatile__(KVM_FEP "movq %1, %0" \
> + : "+m" (*mem) \
> + : "r" (val) : "memory"); \
> + } else { \
> + uint64_t __old = READ_ONCE(*mem); \
> + \
> + __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]" \
> + : [ptr] "+m" (*mem), [old] "+a" (__old) \
> + : [new]"r" (val) : "memory", "cc"); \
> + } \
> +} while (0)
> +
Last bit of bikeshedding then I'll go... Can you just use a C function
and #define it so you can still do ifdeffery to slam in a default
implementation?
I hate macros :)
--
Thanks,
Oliver
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-15 23:27 ` Oliver Upton
@ 2024-02-16 0:26 ` Sean Christopherson
2024-02-16 15:55 ` Oliver Upton
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2024-02-16 0:26 UTC (permalink / raw)
To: Oliver Upton
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Thu, Feb 15, 2024, Oliver Upton wrote:
> On Thu, Feb 15, 2024 at 01:33:48PM -0800, Sean Christopherson wrote:
>
> [...]
>
> > +/* TODO: Expand this madness to also support u8, u16, and u32 operands. */
> > +#define vcpu_arch_put_guest(mem, val, rand) \
> > +do { \
> > + if (!is_forced_emulation_enabled || !(rand & 1)) { \
> > + *mem = val; \
> > + } else if (rand & 2) { \
> > + __asm__ __volatile__(KVM_FEP "movq %1, %0" \
> > + : "+m" (*mem) \
> > + : "r" (val) : "memory"); \
> > + } else { \
> > + uint64_t __old = READ_ONCE(*mem); \
> > + \
> > + __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]" \
> > + : [ptr] "+m" (*mem), [old] "+a" (__old) \
> > + : [new]"r" (val) : "memory", "cc"); \
> > + } \
> > +} while (0)
> > +
>
> Last bit of bikeshedding then I'll go... Can you just use a C function
> and #define it so you can still do ifdeffery to slam in a default
> implementation?
Yes, but the macro shenanigans aren't to create a default, they're to set the
stage for expanding to other sizes without having to do:
vcpu_arch_put_guest{8,16,32,64}()
or if we like bytes instead of bits:
vcpu_arch_put_guest{1,2,4,8}()
I'm not completely against that approach; it's not _that_ much copy+paste
boilerplate, but it's enough that I think that macros would be a clear win,
especially if we want to expand what instructions are used.
<me fiddles around>
Actually, I take that back, I am against that approach :-)
I was expecting to have to do some switch() explosion to get the CMPXCHG stuff
working, but I'm pretty sure the mess that is the kernel's unsafe_try_cmpxchg_user()
and __put_user_size() is is almost entirely due to needing to support 32-bit kernels,
or maybe some need to strictly control the asm constraints.
For selftests, AFAICT the below Just Works on gcc and clang for legal sizes. And
as a bonus, we can sanity check that the pointer and value are of the same size.
Which we definitely should do, otherwise the compiler has a nasty habit of using
the size of the value of the right hand side for the asm blobs, e.g. this
vcpu_arch_put_guest((u8 *)addr, (u32)val, rand);
generates 32-bit accesses. Oof.
#define vcpu_arch_put_guest(mem, val, rand) \
do { \
kvm_static_assert(sizeof(*mem) == sizeof(val)); \
if (!is_forced_emulation_enabled || !(rand & 1)) { \
*mem = val; \
} else if (rand & 2) { \
__asm__ __volatile__(KVM_FEP "mov %1, %0" \
: "+m" (*mem) \
: "r" (val) : "memory"); \
} else { \
uint64_t __old = READ_ONCE(*mem); \
\
__asm__ __volatile__(LOCK_PREFIX "cmpxchg %[new], %[ptr]" \
: [ptr] "+m" (*mem), [old] "+a" (__old) \
: [new]"r" (val) : "memory", "cc"); \
} \
} while (0)
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-16 0:26 ` Sean Christopherson
@ 2024-02-16 15:55 ` Oliver Upton
2024-02-16 17:03 ` Sean Christopherson
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2024-02-16 15:55 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Thu, Feb 15, 2024 at 04:26:02PM -0800, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, Oliver Upton wrote:
> > On Thu, Feb 15, 2024 at 01:33:48PM -0800, Sean Christopherson wrote:
> >
> > [...]
> >
> > > +/* TODO: Expand this madness to also support u8, u16, and u32 operands. */
> > > +#define vcpu_arch_put_guest(mem, val, rand) \
> > > +do { \
> > > + if (!is_forced_emulation_enabled || !(rand & 1)) { \
> > > + *mem = val; \
> > > + } else if (rand & 2) { \
> > > + __asm__ __volatile__(KVM_FEP "movq %1, %0" \
> > > + : "+m" (*mem) \
> > > + : "r" (val) : "memory"); \
> > > + } else { \
> > > + uint64_t __old = READ_ONCE(*mem); \
> > > + \
> > > + __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]" \
> > > + : [ptr] "+m" (*mem), [old] "+a" (__old) \
> > > + : [new]"r" (val) : "memory", "cc"); \
> > > + } \
> > > +} while (0)
> > > +
> >
> > Last bit of bikeshedding then I'll go... Can you just use a C function
> > and #define it so you can still do ifdeffery to slam in a default
> > implementation?
>
> Yes, but the macro shenanigans aren't to create a default, they're to set the
> stage for expanding to other sizes without having to do:
>
> vcpu_arch_put_guest{8,16,32,64}()
>
> or if we like bytes instead of bits:
>
> vcpu_arch_put_guest{1,2,4,8}()
>
> I'm not completely against that approach; it's not _that_ much copy+paste
> boilerplate, but it's enough that I think that macros would be a clear win,
> especially if we want to expand what instructions are used.
Oh, I see what you're after. Yeah, macro shenanigans are the only way
out then. Wasn't clear to me if the interface you wanted w/ the selftest
was a u64 write that you cracked into multiple writes behind the
scenes.
Thanks for entertaining my questions :)
--
Thanks,
Oliver
* Re: [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only)
2024-02-16 15:55 ` Oliver Upton
@ 2024-02-16 17:03 ` Sean Christopherson
0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-02-16 17:03 UTC (permalink / raw)
To: Oliver Upton
Cc: Paolo Bonzini, kvm, linux-kernel, David Matlack, Pasha Tatashin,
Michael Krebs
On Fri, Feb 16, 2024, Oliver Upton wrote:
> On Thu, Feb 15, 2024 at 04:26:02PM -0800, Sean Christopherson wrote:
> > On Thu, Feb 15, 2024, Oliver Upton wrote:
> > > On Thu, Feb 15, 2024 at 01:33:48PM -0800, Sean Christopherson wrote:
> > >
> > > [...]
> > >
> > > > +/* TODO: Expand this madness to also support u8, u16, and u32 operands. */
> > > > +#define vcpu_arch_put_guest(mem, val, rand) \
> > > > +do { \
> > > > + if (!is_forced_emulation_enabled || !(rand & 1)) { \
> > > > + *mem = val; \
> > > > + } else if (rand & 2) { \
> > > > + __asm__ __volatile__(KVM_FEP "movq %1, %0" \
> > > > + : "+m" (*mem) \
> > > > + : "r" (val) : "memory"); \
> > > > + } else { \
> > > > + uint64_t __old = READ_ONCE(*mem); \
> > > > + \
> > > > + __asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]" \
> > > > + : [ptr] "+m" (*mem), [old] "+a" (__old) \
> > > > + : [new]"r" (val) : "memory", "cc"); \
> > > > + } \
> > > > +} while (0)
> > > > +
> > >
> > > Last bit of bikeshedding then I'll go... Can you just use a C function
> > > and #define it so you can still do ifdeffery to slam in a default
> > > implementation?
> >
> > Yes, but the macro shenanigans aren't to create a default, they're to set the
> > stage for expanding to other sizes without having to do:
> >
> > vcpu_arch_put_guest{8,16,32,64}()
> >
> > or if we like bytes instead of bits:
> >
> > vcpu_arch_put_guest{1,2,4,8}()
> >
> > I'm not completely against that approach; it's not _that_ much copy+paste
> > boilerplate, but it's enough that I think that macros would be a clear win,
> > especially if we want to expand what instructions are used.
>
> Oh, I see what you're after. Yeah, macro shenanigans are the only way
> out then. Wasn't clear to me if the interface you wanted w/ the selftest
> was a u64 write that you cracked into multiple writes behind the
> scenes.
I don't want to split u64 into multiple writes, as that would really violate the
principle of least surprise. Even the RMW of the CMPXCHG is pushing things.
What I want is to provide an API that can be used by tests to generate guest writes
for the native/common sizes. E.g. so that xen_shinfo_test can write 8-bit fields
using the APIs (don't ask me how long it took me to find a decent example that
wasn't using a 64-bit value :-) ).
struct vcpu_info {
uint8_t evtchn_upcall_pending;
uint8_t evtchn_upcall_mask;
unsigned long evtchn_pending_sel;
struct arch_vcpu_info arch;
struct pvclock_vcpu_time_info time;
}; /* 64 bytes (x86) */
vcpu_arch_put_guest(vi->evtchn_upcall_pending, 0);
vcpu_arch_put_guest(vi->evtchn_pending_sel, 0);
And of course fleshing that out poked a bunch of holes in my plan, so after a
bit of scope creep...
---
#define vcpu_arch_put_guest(mem, __val) \
do { \
const typeof(mem) val = (__val); \
\
if (!is_forced_emulation_enabled || guest_random_bool(&guest_rng)) { \
(mem) = val; \
} else if (guest_random_bool(&guest_rng)) { \
__asm__ __volatile__(KVM_FEP "mov %1, %0" \
: "+m" (mem) \
: "r" (val) : "memory"); \
} else { \
uint64_t __old = READ_ONCE(mem); \
\
__asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchg %[new], %[ptr]" \
: [ptr] "+m" (mem), [old] "+a" (__old) \
: [new]"r" (val) : "memory", "cc"); \
} \
} while (0)
---
Where guest_rng is a global pRNG instance
struct guest_random_state {
uint32_t seed;
};
extern uint32_t guest_random_seed;
extern struct guest_random_state guest_rng;
that's configured with a completely random seed by default, but can be overriden
by tests for determinism, e.g. in dirty_log_perf_test
void __attribute((constructor)) kvm_selftest_init(void)
{
/* Tell stdout not to buffer its content. */
setbuf(stdout, NULL);
guest_random_seed = random();
kvm_selftest_arch_init();
}
and automatically configured for each VM.
pr_info("Random seed: 0x%x\n", guest_random_seed);
guest_rng = new_guest_random_state(guest_random_seed);
sync_global_to_guest(vm, guest_rng);
kvm_arch_vm_post_create(vm);
Long term, I want to get to the point where the library code supports specifying
a seed for every test, i.e. so that every test that uses the pRNG can be as
deterministic as possible. But that's definitely a future problem :-)
* Re: [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-02-15 18:45 ` Sean Christopherson
@ 2024-02-16 17:10 ` Sean Christopherson
2024-02-16 17:14 ` David Matlack
0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2024-02-16 17:10 UTC (permalink / raw)
To: David Matlack
Cc: Paolo Bonzini, kvm, linux-kernel, Pasha Tatashin, Michael Krebs
On Thu, Feb 15, 2024, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, David Matlack wrote:
> > On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > When emulating an atomic access on behalf of the guest, mark the target
> > > gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> > > fixes a bug where KVM effectively corrupts guest memory during live
> > > migration by writing to guest memory without informing userspace that the
> > > page is dirty.
> > >
> > > Marking the page dirty got unintentionally dropped when KVM's emulated
> > > CMPXCHG was converted to do a user access. Before that, KVM explicitly
> > > mapped the guest page into kernel memory, and marked the page dirty during
> > > the unmap phase.
> > >
> > > Mark the page dirty even if the CMPXCHG fails, as the old data is written
> > > back on failure, i.e. the page is still written. The value written is
> > > guaranteed to be the same because the operation is atomic, but KVM's ABI
> > > is that all writes are dirty logged regardless of the value written. And
> > > more importantly, that's what KVM did before the buggy commit.
> > >
> > > Huge kudos to the folks on the Cc list (and many others), who did all the
> > > actual work of triaging and debugging.
> > >
> > > Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
> >
> > I'm only half serious but... Should we just revert this commit?
>
> No.
David, any objection to this patch? I'd like to get this on its way to Paolo
asap, but also want to make sure we all agree this is the right solution before
doing so.
* Re: [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-02-16 17:10 ` Sean Christopherson
@ 2024-02-16 17:14 ` David Matlack
0 siblings, 0 replies; 17+ messages in thread
From: David Matlack @ 2024-02-16 17:14 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Pasha Tatashin, Michael Krebs
On Fri, Feb 16, 2024 at 9:10 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Feb 15, 2024, Sean Christopherson wrote:
> > On Thu, Feb 15, 2024, David Matlack wrote:
> > > On Wed, Feb 14, 2024 at 5:00 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > When emulating an atomic access on behalf of the guest, mark the target
> > > > gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
> > > > fixes a bug where KVM effectively corrupts guest memory during live
> > > > migration by writing to guest memory without informing userspace that the
> > > > page is dirty.
> > > >
> > > > Marking the page dirty got unintentionally dropped when KVM's emulated
> > > > CMPXCHG was converted to do a user access. Before that, KVM explicitly
> > > > mapped the guest page into kernel memory, and marked the page dirty during
> > > > the unmap phase.
> > > >
> > > > Mark the page dirty even if the CMPXCHG fails, as the old data is written
> > > > back on failure, i.e. the page is still written. The value written is
> > > > guaranteed to be the same because the operation is atomic, but KVM's ABI
> > > > is that all writes are dirty logged regardless of the value written. And
> > > > more importantly, that's what KVM did before the buggy commit.
> > > >
> > > > Huge kudos to the folks on the Cc list (and many others), who did all the
> > > > actual work of triaging and debugging.
> > > >
> > > > Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
> > >
> > > I'm only half serious but... Should we just revert this commit?
> >
> > No.
>
> David, any objection to this patch? I'd like to get this on its way to Paolo
> asap, but also want to make sure we all agree this is the right solution before
> doing so.
Sorry for the late response. No objection to this patch. I'd like a
better story for KVM code that interacts directly with user pointers,
but I have no objection to fixing forward for this case.
* Re: [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics
2024-02-15 1:00 [PATCH 0/2] KVM: x86: Fix dirty logging of emulated atomics Sean Christopherson
2024-02-15 1:00 ` [PATCH 1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty Sean Christopherson
2024-02-15 1:00 ` [PATCH 2/2] KVM: selftests: Test forced instruction emulation in dirty log test (x86 only) Sean Christopherson
@ 2024-02-17 1:02 ` Sean Christopherson
2 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-02-17 1:02 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, David Matlack, Pasha Tatashin, Michael Krebs
On Wed, 14 Feb 2024 17:00:02 -0800, Sean Christopherson wrote:
> Fix a bug in KVM's emulator where the target page of an atomic write isn't
> marked dirty, and enhance the dirty_log_test selftest to serve as
> a regression test by conditionally doing forced emulation of guest writes.
>
> Note, the selftest depends on several patches that are sitting in
> `kvm-x86 pmu`, so I'll likely take the selftest through that branch (eww).
>
> [...]
Applied the fix itself to kvm-x86 fixes, I'll follow up with a heftier version
of the selftest patch for 6.9.
[1/2] KVM: x86: Mark target gfn of emulated atomic instruction as dirty
https://github.com/kvm-x86/linux/commit/910c57dfa4d1
--
https://github.com/kvm-x86/linux/tree/next