From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arjan Koers <0h61vkll2ly8@xutrox.com>
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Mon, 02 Aug 2010 18:16:16 +0200
Message-ID: <4C56EF50.1000706@xutrox.com>
References: <4C4D4B8B.80006@amd.com> <4C4DDB00.50203@xutrox.com>
 <4C4F48D0.8090609@xutrox.com> <4C500872.1020809@redhat.com>
 <4C536F80.5090205@xutrox.com> <4C538CCE.1010104@redhat.com>
 <4C540EC9.1010008@xutrox.com> <4C54512B.6000307@xutrox.com>
 <4C54B7DE.4060901@redhat.com> <20100802144300.GD14448@mothafucka.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Glauber Costa, Zachary Amsden, Avi Kivity, Andre Przywara
To: kvm@vger.kernel.org
Return-path:
Received: from smtp-out0.tiscali.nl ([195.241.79.175]:34932 "EHLO
 smtp-out0.tiscali.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751757Ab0HBQQd (ORCPT ); Mon, 2 Aug 2010 12:16:33 -0400
In-Reply-To: <20100802144300.GD14448@mothafucka.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 2010-08-02 16:43, Glauber Costa wrote:
> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>
>>> The problem occurs when this message is printed:
>>>
>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>
>>> When I disable that printk, the kernel boots with
>>> CONFIG_PRINTK_TIME=y
>>>
>>> --- a/arch/x86/kernel/kvmclock.c
>>> +++ b/arch/x86/kernel/kvmclock.c
>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>      int low, high;
>>>      low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>      high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
>>> -    printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> -           cpu, high, low, txt);
>>> +    /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> +           cpu, high, low, txt);*/
>>>
>>>      return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>  }
>>>
>>> So the problem appears to be that the clock of the second CPU
>>> is used too soon (or that clock setup should finish earlier).
>>
>> That's almost hilarious. The printk from setting up the kvm clock
>> is invoking the kvm clock before it is setup.
>>
>> There's no reason other printks couldn't do the same thing, however.
>> I think it's safest to keep an initialized flag and check for it
>> before attempting to return a meaningful value.
>
> I was on vacations, just got back.
>
> I think it is safe to just patch our own use of it. Before that, all other
> printks will be handled by the main cpu anyway, since it'll be the only one active
> at the moment. The only possible offenders for this are us, and the cpu initialization
> code, which is already fragile in multiple ways anyway.
>
> A flag would only make things more complicated and dirty

Maybe you could add a sanity check in pvclock_clocksource_read
after 'do { ... } while (version != src->version)' that returns
last_value if offset is extremely large?

I've performed some more boot tests (about 20) with the patch that
moves the printk after native_write_msr_safe and it works for me.
Andre Przywara confirmed to me that it also fixes his problem.
A slightly modified version of the patch for 2.6.34.1 also works (800+ successful boot cycles).
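
To illustrate the sanity check suggested above, here is a rough userspace
model. It is only a sketch under assumptions, not kernel code: the struct
model_time_info, model_clock_read(), model_last_value and the
MODEL_MAX_OFFSET_NS threshold (one hour, chosen arbitrarily) are all
hypothetical names, and one TSC tick is treated as one nanosecond to keep
the arithmetic simple. The idea is that a per-cpu time area that has not
been registered yet yields an absurdly large offset, in which case the
previously returned value is handed back instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu time area; NOT the kernel layout. */
struct model_time_info {
    uint32_t version;       /* odd while the host is mid-update */
    uint64_t system_time;   /* nanoseconds at the last host update */
    uint64_t tsc_timestamp; /* TSC value at the last host update */
};

/* Assumed threshold: offsets above one hour are treated as bogus. */
#define MODEL_MAX_OFFSET_NS (3600ULL * 1000000000ULL)

/* Last value handed out, used to keep the clock monotonic. */
static uint64_t model_last_value;

static uint64_t model_clock_read(const volatile struct model_time_info *src,
                                 uint64_t tsc_now)
{
    uint32_t version;
    uint64_t offset, ret;

    assert(src != 0);

    /* Retry until a consistent snapshot of the time area was read. */
    do {
        version = src->version;
        offset = tsc_now - src->tsc_timestamp; /* model: 1 tick == 1 ns */
        ret = src->system_time + offset;
    } while ((version & 1) || version != src->version);

    /* Sanity check: an unregistered (all-zero) area gives a huge offset. */
    if (offset > MODEL_MAX_OFFSET_NS)
        return model_last_value;

    /* Never let the returned time go backwards. */
    if (ret > model_last_value)
        model_last_value = ret;
    return model_last_value;
}
```

In this model a read against a zeroed area with a large TSC value simply
repeats the last good timestamp, so an early printk on the secondary cpu
would get a stale but sane time rather than a wildly wrong one. Whether
such a threshold is acceptable in the real read path is an open question.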