From: Joerg Roedel
Subject: Re: [PATCH 3/3] Fix TSC MSR read in nested SVM
Date: Sat, 6 Aug 2011 14:16:17 +0200
Message-ID: <20110806121617.GD30353@8bytes.org>
References: <1312289591-nyh@il.ibm.com> <201108021255.p72CtNjD002153@rice.haifa.ibm.com> <20110803210052.GB31923@amt.cnet> <20110805153224.GB16953@amt.cnet>
Cc: Zachary Amsden, Joerg Roedel, Nadav Har'El, kvm@vger.kernel.org, Bandan Das, avi@redhat.com
To: Marcelo Tosatti
In-Reply-To: <20110805153224.GB16953@amt.cnet>

On Fri, Aug 05, 2011 at 12:32:24PM -0300, Marcelo Tosatti wrote:
> On Wed, Aug 03, 2011 at 06:00:52PM -0300, Marcelo Tosatti wrote:
> > On Wed, Aug 03, 2011 at 01:34:58AM -0700, Zachary Amsden wrote:
> > > Caution: this requires more care.
> > >
> > > Pretty sure this breaks userspace suspend at the cost of supporting a
> > > not-so-reasonable hardware feature.
> > >
> > > On Tue, Aug 2, 2011 at 5:55 AM, Nadav Har'El wrote:
> > > > When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
> > > > read without exit), we need to return L2's notion of the TSC, not L1's.
> > > >
> > > > The current code incorrectly returned L1 TSC, because svm_get_msr() was also
> > > > used in x86.c where this was assumed, but now that these places call the new
> > > > svm_read_l1_tsc(), the MSR read can be fixed.
> > > >
> > > > Signed-off-by: Nadav Har'El
> > > > ---
> > > >  arch/x86/kvm/svm.c |    4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > --- .before/arch/x86/kvm/svm.c  2011-08-02 15:51:02.000000000 +0300
> > > > +++ .after/arch/x86/kvm/svm.c   2011-08-02 15:51:02.000000000 +0300
> > > > @@ -2907,9 +2907,7 @@ static int svm_get_msr(struct kvm_vcpu *
> > > >
> > > >         switch (ecx) {
> > > >         case MSR_IA32_TSC: {
> > > > -               struct vmcb *vmcb = get_host_vmcb(svm);
> > > > -
> > > > -               *data = vmcb->control.tsc_offset +
> > > > +               *data = svm->vmcb->control.tsc_offset +
> > > >                         svm_scale_tsc(vcpu, native_read_tsc());
> > > >
> > > >                 break;
> > > >
> > >
> > >
> > > This is correct. Now you properly return the correct MSR value for
> > > the TSC to the guest in all cases.
> > >
> > > However, there is a BIG PROBLEM (yes, it is that bad). Sorry I did
> > > not think of it before.
> > >
> > > When the guest gets suspended and frozen by userspace, to be restarted
> > > later, what qemu is going to do is come along and read all of the MSRs
> > > as part of the saved state. One of those happens to be the TSC MSR.
> > >
> > > Since you can't guarantee suspend will take place when only L1 is
> > > running, you may have mixed L1/L2 TSC MSRs now being returned to
> > > userspace. Now, when you resume this guest, you will have mixed L1/L2
> > > TSC values written into the MSRs.
> > >
> > > Those will almost certainly fail to be matched by the TSC offset
> > > matching code, and thus, with multiple VCPUs, you will end up with
> > > slightly unsynchronized TSCs, and with that, all the problems
> > > associated with unstable TSC come back to haunt you again. Basically,
> > > all bets for stability are off.
> >
> > TSC synchronization is the least of your problems if you attempt to
> > save/restore the state of a guest while any vcpu is in L2 mode.
> >
> > So to keep consistency between the remaining MSRs, I agree with Nadav's
> > patch. Apparently SVM's original patches to return L1 TSC were aimed at
> > fixing the problem VMX is facing now, which is fixed by the introduction
> > of the read_l1_tsc helpers.
>
> Joerg, can you review and ack Nadav's SVM patch? TIA

Yes, sorry for the delay. I'll give it a review and test today.

Joerg
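The offset arithmetic behind Nadav's patch can be sketched in C under stated assumptions; this is a toy model, not KVM code. struct tsc_model, active_offset(), read_tsc_msr() and read_l1_tsc_sketch() are hypothetical names, TSC scaling (svm_scale_tsc()) is omitted, and the active VMCB's offset while L2 runs is assumed to be L1's own offset plus the offset L1 chose for L2. The sketch contrasts the patched MSR read, which reports the TSC of whichever level is running, with an L1-only read of the kind the read_l1_tsc helpers provide to the generic x86 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one vCPU's TSC state (not KVM's
 * structures). TSC scaling is omitted: host_tsc stands for the host
 * counter after any scaling. */
struct tsc_model {
	bool     in_l2;      /* is the vCPU currently running L2? */
	uint64_t l1_offset;  /* offset that turns the host TSC into L1's TSC */
	uint64_t l2_extra;   /* offset L1 programmed in its own VMCB for L2 */
};

/* Offset in the currently active VMCB: while L2 runs, L1's offset for
 * L2 is assumed to be added on top of L1's own offset. Unsigned
 * wrap-around lets an offset act as a negative adjustment. */
static uint64_t active_offset(const struct tsc_model *m)
{
	return m->in_l2 ? m->l1_offset + m->l2_extra : m->l1_offset;
}

/* Patched MSR_IA32_TSC read: report the TSC of whichever level is
 * running, analogous to using svm->vmcb rather than the host VMCB. */
static uint64_t read_tsc_msr(const struct tsc_model *m, uint64_t host_tsc)
{
	return active_offset(m) + host_tsc;
}

/* Hypothetical stand-in for an L1-only helper such as svm_read_l1_tsc():
 * always returns L1's TSC, whichever level is running. */
static uint64_t read_l1_tsc_sketch(const struct tsc_model *m, uint64_t host_tsc)
{
	return m->l1_offset + host_tsc;
}
```

For example, with l1_offset = 1000, l2_extra = 500 and a host reading of 10000, an MSR read made while L2 runs returns 11500 under this model, while the L1-only read returns 11000. That gap is also why a save taken while a vCPU is in L2 records a TSC value that does not belong to L1.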
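Zachary's restore-time concern can be illustrated with a deliberately simplified C model of TSC offset matching. It is a sketch under loud assumptions, not KVM's algorithm: struct tsc_matcher, restore_tsc() and the fixed window are hypothetical, and the real matching code in x86.c may use different criteria (for instance, accounting for time elapsed between writes), which are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-simplified model of restore-time TSC matching.
 * The struct, the fixed window and the matching rule are illustrative
 * assumptions, not KVM's implementation; elapsed time between the
 * per-vCPU writes is ignored. */
struct tsc_matcher {
	bool     have_last;    /* has any vCPU TSC been restored yet? */
	uint64_t last_write;   /* guest TSC value of the last unmatched write */
	uint64_t last_offset;  /* offset chosen for that write */
	uint64_t window;       /* max distance still treated as "same TSC" */
};

/* Restore one vCPU's saved TSC MSR value while the host counter reads
 * host_tsc, and return the offset that vCPU ends up with. A write close
 * to the previous reference reuses its offset, so matched vCPUs tick in
 * lockstep; any other write becomes a new, unsynchronized reference. */
static uint64_t restore_tsc(struct tsc_matcher *m, uint64_t value,
			    uint64_t host_tsc)
{
	uint64_t offset = value - host_tsc; /* unsigned wrap is intended */

	if (m->have_last) {
		uint64_t diff = value > m->last_write ? value - m->last_write
						      : m->last_write - value;
		if (diff <= m->window)
			return m->last_offset;
	}

	m->have_last = true;
	m->last_write = value;
	m->last_offset = offset;
	return offset;
}
```

In a run with window = 100, restoring 50000 (host at 10000) and then 50010 (host at 10005) gives both vCPUs the same offset, 40000. A third vCPU whose saved value was its L2 TSC, modelled here as 1050000 with the host at 10010, falls outside the window and receives its own offset of 1039990, leaving it out of step with the others, which is the desynchronization Zachary describes.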