public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kvm: x86: fix kvmclock update protocol
@ 2015-04-23 11:46 Paolo Bonzini
  2015-04-23 15:24 ` Radim Krčmář
  2015-04-24 19:17 ` Marcelo Tosatti
  0 siblings, 2 replies; 3+ messages in thread
From: Paolo Bonzini @ 2015-04-23 11:46 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: rkrcmar, luto, mtosatti

From: Radim Krčmář <rkrcmar@redhat.com>

The kvmclock spec says that the host will increment a version field to
an odd number, then update stuff, then increment it to an even number.
The host is buggy and doesn't do this, and the result is observable
when one vcpu reads another vcpu's kvmclock data.

There's no good way for a guest kernel to keep its vdso from reading
a different vcpu's kvmclock data, but we don't need to care about
changing VCPUs as long as we read a consistent data from kvmclock.
(VCPU can change outside of this loop too, so it doesn't matter if we
return a value not fit for this VCPU.)

Based on a patch by Radim Krčmář.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed31c31b2485..c73efcd03e29 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		&guest_hv_clock, sizeof(guest_hv_clock))))
 		return 0;
 
-	/*
-	 * The interface expects us to write an even number signaling that the
-	 * update is finished. Since the guest won't see the intermediate
-	 * state, we just increase by 2 at the end.
+	/* This VCPU is paused, but it's legal for a guest to read another
+	 * VCPU's kvmclock, so we really have to follow the specification where
+	 * it says that version is odd if data is being modified, and even after
+	 * it is consistent.
+	 *
+	 * Version field updates must be kept separate.  This is because
+	 * kvm_write_guest_cached might use a "rep movs" instruction, and
+	 * writes within a string instruction are weakly ordered.  So there
+	 * are three writes overall.
+	 *
+	 * As a small optimization, only write the version field in the first
+	 * and third write.  The vcpu->pv_time cache is still valid, because the
+	 * version field is the first in the struct.
 	 */
-	vcpu->hv_clock.version = guest_hv_clock.version + 2;
+	BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0);
+
+	vcpu->hv_clock.version = guest_hv_clock.version + 1;
+	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
+				&vcpu->hv_clock,
+				sizeof(vcpu->hv_clock.version));
+
+	smp_wmb();
 
 	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
 	pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED);
@@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
 				&vcpu->hv_clock,
 				sizeof(vcpu->hv_clock));
+
+	smp_wmb();
+
+	vcpu->hv_clock.version++;
+	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
+				&vcpu->hv_clock,
+				sizeof(vcpu->hv_clock.version));
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] kvm: x86: fix kvmclock update protocol
  2015-04-23 11:46 [PATCH] kvm: x86: fix kvmclock update protocol Paolo Bonzini
@ 2015-04-23 15:24 ` Radim Krčmář
  2015-04-24 19:17 ` Marcelo Tosatti
  1 sibling, 0 replies; 3+ messages in thread
From: Radim Krčmář @ 2015-04-23 15:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, luto, mtosatti

2015-04-23 13:46+0200, Paolo Bonzini:
> From: Radim Krčmář <rkrcmar@redhat.com>
> 
> The kvmclock spec says that the host will increment a version field to
> an odd number, then update stuff, then increment it to an even number.
> The host is buggy and doesn't do this, and the result is observable
> when one vcpu reads another vcpu's kvmclock data.
> 
> There's no good way for a guest kernel to keep its vdso from reading
> a different vcpu's kvmclock data, but we don't need to care about
> changing VCPUs as long as we read a consistent data from kvmclock.
> (VCPU can change outside of this loop too, so it doesn't matter if we
> return a value not fit for this VCPU.)
> 
> Based on a patch by Radim Krčmář.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---

Nice,

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] kvm: x86: fix kvmclock update protocol
  2015-04-23 11:46 [PATCH] kvm: x86: fix kvmclock update protocol Paolo Bonzini
  2015-04-23 15:24 ` Radim Krčmář
@ 2015-04-24 19:17 ` Marcelo Tosatti
  1 sibling, 0 replies; 3+ messages in thread
From: Marcelo Tosatti @ 2015-04-24 19:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, rkrcmar, luto

On Thu, Apr 23, 2015 at 01:46:55PM +0200, Paolo Bonzini wrote:
> From: Radim Krčmář <rkrcmar@redhat.com>
> 
> The kvmclock spec says that the host will increment a version field to
> an odd number, then update stuff, then increment it to an even number.
> The host is buggy and doesn't do this, and the result is observable
> when one vcpu reads another vcpu's kvmclock data.
> 
> There's no good way for a guest kernel to keep its vdso from reading
> a different vcpu's kvmclock data, but we don't need to care about
> changing VCPUs as long as we read a consistent data from kvmclock.
> (VCPU can change outside of this loop too, so it doesn't matter if we
> return a value not fit for this VCPU.)
> 
> Based on a patch by Radim Krčmář.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ed31c31b2485..c73efcd03e29 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  		&guest_hv_clock, sizeof(guest_hv_clock))))
>  		return 0;
>  
> -	/*
> -	 * The interface expects us to write an even number signaling that the
> -	 * update is finished. Since the guest won't see the intermediate
> -	 * state, we just increase by 2 at the end.
> +	/* This VCPU is paused, but it's legal for a guest to read another
> +	 * VCPU's kvmclock, so we really have to follow the specification where
> +	 * it says that version is odd if data is being modified, and even after
> +	 * it is consistent.
> +	 *
> +	 * Version field updates must be kept separate.  This is because
> +	 * kvm_write_guest_cached might use a "rep movs" instruction, and
> +	 * writes within a string instruction are weakly ordered.  So there
> +	 * are three writes overall.
> +	 *
> +	 * As a small optimization, only write the version field in the first
> +	 * and third write.  The vcpu->pv_time cache is still valid, because the
> +	 * version field is the first in the struct.
>  	 */
> -	vcpu->hv_clock.version = guest_hv_clock.version + 2;
> +	BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0);
> +
> +	vcpu->hv_clock.version = guest_hv_clock.version + 1;
> +	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
> +				&vcpu->hv_clock,
> +				sizeof(vcpu->hv_clock.version));
> +
> +	smp_wmb();
>  
>  	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
>  	pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED);
> @@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
>  				&vcpu->hv_clock,
>  				sizeof(vcpu->hv_clock));
> +
> +	smp_wmb();
> +
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
> +				&vcpu->hv_clock,
> +				sizeof(vcpu->hv_clock.version));
>  	return 0;
>  }
>  
> -- 
> 1.8.3.1

Acked-by: Marcelo Tosatti <mtosatti@redhat.com>


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-04-24 19:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-23 11:46 [PATCH] kvm: x86: fix kvmclock update protocol Paolo Bonzini
2015-04-23 15:24 ` Radim Krčmář
2015-04-24 19:17 ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox