From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [PATCH] kvm: x86: fix kvmclock update protocol Date: Fri, 24 Apr 2015 16:17:30 -0300 Message-ID: <20150424191730.GA5800@amt.cnet> References: <1429789615-36001-1-git-send-email-pbonzini@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rkrcmar@redhat.com, luto@kernel.org To: Paolo Bonzini Return-path: Content-Disposition: inline In-Reply-To: <1429789615-36001-1-git-send-email-pbonzini@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On Thu, Apr 23, 2015 at 01:46:55PM +0200, Paolo Bonzini wrote: > From: Radim Kr=C4=8Dm=C3=A1=C5=99 >=20 > The kvmclock spec says that the host will increment a version field t= o > an odd number, then update stuff, then increment it to an even number= =2E > The host is buggy and doesn't do this, and the result is observable > when one vcpu reads another vcpu's kvmclock data. >=20 > There's no good way for a guest kernel to keep its vdso from reading > a different vcpu's kvmclock data, but we don't need to care about > changing VCPUs as long as we read a consistent data from kvmclock. > (VCPU can change outside of this loop too, so it doesn't matter if we > return a value not fit for this VCPU.) >=20 > Based on a patch by Radim Kr=C4=8Dm=C3=A1=C5=99. >=20 > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++----- > 1 file changed, 28 insertions(+), 5 deletions(-) >=20 > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ed31c31b2485..c73efcd03e29 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_v= cpu *v) > &guest_hv_clock, sizeof(guest_hv_clock)))) > return 0; > =20 > - /* > - * The interface expects us to write an even number signaling that = the > - * update is finished. Since the guest won't see the intermediate > - * state, we just increase by 2 at the end. > + /* This VCPU is paused, but it's legal for a guest to read another > + * VCPU's kvmclock, so we really have to follow the specification w= here > + * it says that version is odd if data is being modified, and even = after > + * it is consistent. > + * > + * Version field updates must be kept separate. This is because > + * kvm_write_guest_cached might use a "rep movs" instruction, and > + * writes within a string instruction are weakly ordered. So there > + * are three writes overall. > + * > + * As a small optimization, only write the version field in the fir= st > + * and third write. The vcpu->pv_time cache is still valid, becaus= e the > + * version field is the first in the struct. > */ > - vcpu->hv_clock.version =3D guest_hv_clock.version + 2; > + BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) !=3D = 0); > + > + vcpu->hv_clock.version =3D guest_hv_clock.version + 1; > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &vcpu->hv_clock, > + sizeof(vcpu->hv_clock.version)); > + > + smp_wmb(); > =20 > /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ > pvclock_flags =3D (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); > @@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vc= pu *v) > kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > &vcpu->hv_clock, > sizeof(vcpu->hv_clock)); > + > + smp_wmb(); > + > + vcpu->hv_clock.version++; > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &vcpu->hv_clock, > + sizeof(vcpu->hv_clock.version)); > return 0; > } > =20 > --=20 > 1.8.3.1 Acked-by: Marcelo Tosatti