From: Zachary Amsden
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Fri, 08 Oct 2010 16:27:51 -1000
Message-ID: <4CAFD327.3030608@redhat.com>
References: <4CA4DBC8.6070606@xutrox.com> <20100930190507.GA1111@amt.cnet>
 <4CA51715.1070507@msgid.tls.msk.ru> <4CA51847.5060208@msgid.tls.msk.ru>
 <4CA6C4BB.5020004@redhat.com> <4CA6E0BF.90605@msgid.tls.msk.ru>
 <4CA75969.1080405@xutrox.com> <4CA7C34C.4040000@redhat.com>
 <4CAE6203.6040902@xutrox.com> <4CAE862F.10904@redhat.com>
 <20101008220600.GA9430@amt.cnet> <4CAFC11B.8010603@xutrox.com>
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
 Glauber Costa, Andre Przywara
To: Arjan Koers <0h61vkll2ly8@xutrox.com>
In-Reply-To: <4CAFC11B.8010603@xutrox.com>

On 10/08/2010 03:10 PM, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>
>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>
>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>
>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>> ...
>>>>
>>>>> Umm... do you guys have this commit? This is supposed to address the
>>>>> issue where the guest keeps resetting the TSC. A guest which does that
>>>>> will break kvmclock. It only happens on SMP, and it's much worse on AMD
>>>>> CPUs...
>>>>>
>>>>> sound like your scenario.
>>>>>
>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>> Author: Zachary Amsden
>>>>> Date: Thu Aug 19 22:07:26 2010 -1000
>>>>>
>>>> This commit fixes the problem:
>>>>
>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>> Author: Zachary Amsden
>>>> Date: Thu Aug 19 22:07:19 2010 -1000
>>>>
>>>>     KVM: x86: Move TSC reset out of vmcb_init
>>>>
>>>>     The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>>     TSC back to zero happens very late in the boot process and destabilizing
>>>>     the TSC. Instead, just set TSC to zero once at VCPU creation time.
>>>>
>>>>     Why the separate patch? So git-bisect is your friend.
>>>>
>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>> patches made it there and what patches didn't.
>>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
>>
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
>
> It failed with a 2.6.32.24 host. The patch applied, but
> pvclock_clocksource_read on the guest is still producing wrong
> results for CPU 1 while it's booting. I'll re-check tomorrow.
>

There's a lot of work I've done, and also a lot of work done by Glauber
Costa, on kvmclock that recently went upstream. It's unlikely that
you'll be bug-free without all of those patches applied; most of the
patches were not just enhancements, but also contained bug fixes and
improved operating conditions. On top of this, the patches are highly
interdependent because they touch closely neighboring code.
I suggest applying the following commits to your branch (newest listed
first; apply in reverse order):

12b1164fa498997bf72070e6a81418197e283716
bfa075b75d8786380a7bca1215d4c7d1485d18dd
82e7988a2088781175a22b09631bce97cd5ed177
bfb3f3326c915b1800dc65d10ca09fbd548353d2
1377ff23ae2bf49c76f8f498ca81050878b9666a
9a088cc32488cfb9f60dca5972155ba13f39eb83
e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
da908f2fb4e783c2a4de751fb90f11a0dd041161
cf839f5da2b0779b9ec8b990f851fb4e7d681da0
cbc59a098486494d9a49537dcb9c969210a8306d
5cd459cdde725bb5c3a7feef6e074e7da70490c9
d578d4d72e3d2154901123f40c9fa7de1f85ae73
bd59fc8ff95126f27b7a0df1b6cc602aa428812d
e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
bf0fb4a42ba7eb362f4013bd2e93209666793e66
69403a558097a9bd333736d58a4cb69ea6e2a0ac
a87834bdb7ff9117da7f164e8cee638f2c51f9b7
91308e2fecddb6fc63feaf4cef3400f5cbea6619
fd03465c0648cd12d7333269b80d902d0a8516dd
aad07c4f92bae2edaa42bcef84c2afdd0d082458
280372e494634d0a2cba3956721be16fc4f989bf
1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
ec01d2eb0a74a6d95823fb6e320298473faf12be
3e05d29fe45508625e2a73db3d1bfb54f30731ff

Since the issue appears resolved, I'm going to continue working upstream.

Zach