From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zachary Amsden Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests Date: Mon, 02 Aug 2010 14:00:32 -1000 Message-ID: <4C575C20.9010007@redhat.com> References: <4C4D4B8B.80006@amd.com> <4C4DDB00.50203@xutrox.com> <4C4F48D0.8090609@xutrox.com> <4C500872.1020809@redhat.com> <4C536F80.5090205@xutrox.com> <4C538CCE.1010104@redhat.com> <4C540EC9.1010008@xutrox.com> <4C54512B.6000307@xutrox.com> <4C54B7DE.4060901@redhat.com> <20100802144300.GD14448@mothafucka.localdomain> <4C5729F6.2050605@redhat.com> <4C573A20.6030001@xutrox.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------070004010701030206070900" Cc: Glauber Costa , kvm@vger.kernel.org, Avi Kivity , Andre Przywara To: Arjan Koers <0h61vkll2ly8@xutrox.com> Return-path: Received: from mx1.redhat.com ([209.132.183.28]:20261 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751460Ab0HCAAj (ORCPT ); Mon, 2 Aug 2010 20:00:39 -0400 In-Reply-To: <4C573A20.6030001@xutrox.com> Sender: kvm-owner@vger.kernel.org List-ID: This is a multi-part message in MIME format. --------------070004010701030206070900 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 08/02/2010 11:35 AM, Arjan Koers wrote: > On 2010-08-02 22:26, Zachary Amsden wrote: > >> On 08/02/2010 04:43 AM, Glauber Costa wrote: >> >>> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote: >>> >>> >>>> On 07/31/2010 06:36 AM, Arjan Koers wrote: >>>> >>>> >>>>> On 2010-07-31 13:53, Arjan Koers wrote: >>>>> >>>>> >>>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set. >>>>>> >>>>>> >>>>>> >>>>> The problem occurs when this message is printed: >>>>> >>>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock >>>>> >>>>> When I disable that printk, the kernel boots with >>>>> CONFIG_PRINTK_TIME=y >>>>> >>>>> --- a/arch/x86/kernel/kvmclock.c >>>>> +++ b/arch/x86/kernel/kvmclock.c >>>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt) >>>>> int low, high; >>>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1; >>>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32); >>>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", >>>>> - cpu, high, low, txt); >>>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", >>>>> + cpu, high, low, txt);*/ >>>>> >>>>> return native_write_msr_safe(msr_kvm_system_time, low, high); >>>>> } >>>>> >>>>> So the problem appears to be that the clock of the second CPU >>>>> is used too soon (or that clock setup should finish earlier). >>>>> >>>>> >>>> That's almost hilarious. The printk from setting up the kvm clock >>>> is invoking the kvm clock before it is setup. >>>> >>>> There's no reason other printks couldn't do the same thing, however. >>>> I think it's safest to keep an initialized flag and check for it >>>> before attempting to return a meaningful value. >>>> >>>> >>> I was on vacations, just got back. >>> >>> I think it is safe to just patch our own use of it. Before that, all >>> other >>> printks will be handled by the main cpu anyway, since it'll be the >>> only one active >>> at the moment. The only possible offenders for this are us, and the >>> cpu initialization >>> code, which is already fragile in multiple ways anyway. >>> >>> A flag would only make things more complicated and dirty >>> >>> >> Can we just do this? >> > > Sorry, the patch doesn't help. See line 68 in my debug log: > 65: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0631e 375d47 13c5ce 15655813de60 > 66: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0653b 375f64 13c68f 15655813df21 > 67: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c06746 37616f 13c74a 15655813dfdc > 68: ffff880001511c00 1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886 15695466a1e5 2acea0f4244f > This is a separate bug. See attached patch (it won't apply, it's part of a series, but shows the bug). > 69: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f424aa 28d0 e93 1565582659b1 > 70: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4a1e0 a606 3b4b 156558268669 > 71: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4ba63 be89 440b 156558268f29 > 72: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4d8e7 dd0d 4ef1 156558269a0f > 73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c 371d6 13aef 15655827223a > 74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec 38896 1430f 156558272a5a > > I don't think that pvclock_clocksource_read is receiving > completely random uninitialized data. The values in shadow > are wrong, but could be interpreted as valid data > (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff, > shadow.flags = 0 and shadow.version is always even). > Copied from the first CPU possibly? --------------070004010701030206070900 Content-Type: text/plain; name="0004-Fix-SVM-VMCB-reset.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="0004-Fix-SVM-VMCB-reset.patch" RnJvbSAzODIzYzAxODE2MmRjNzA4YjU0M2NiZGM2ODBhNGM3ZDYzNTMzZmVlIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBaYWNoYXJ5IEFtc2RlbiA8emFtc2RlbkByZWRoYXQu Y29tPgpEYXRlOiBTYXQsIDI5IE1heSAyMDEwIDE3OjUyOjQ2IC0xMDAwClN1YmplY3Q6IFtL Vk0gVjIgMDQvMjVdIEZpeCBTVk0gVk1DQiByZXNldApDYzogQXZpIEtpdml0eSA8YXZpQHJl ZGhhdC5jb20+LAogICAgTWFyY2VsbyBUb3NhdHRpIDxtdG9zYXR0aUByZWRoYXQuY29tPiwK ICAgIEdsYXViZXIgQ29zdGEgPGdsb21tZXJAcmVkaGF0LmNvbT4sCiAgICBsaW51eC1rZXJu ZWxAdmdlci5rZXJuZWwub3JnCgpPbiByZXNldCwgVk1DQiBUU0Mgc2hvdWxkIGJlIHNldCB0 byB6ZXJvLiAgSW5zdGVhZCwgY29kZSB3YXMgc2V0dGluZwp0c2Nfb2Zmc2V0IHRvIHplcm8s IHdoaWNoIHBhc3NlcyB0aHJvdWdoIHRoZSB1bmRlcmx5aW5nIFRTQy4KClNpZ25lZC1vZmYt Ynk6IFphY2hhcnkgQW1zZGVuIDx6YW1zZGVuQHJlZGhhdC5jb20+Ci0tLQogYXJjaC94ODYv a3ZtL3N2bS5jIHwgICAgMiArLQogMSBmaWxlcyBjaGFuZ2VkLCAxIGluc2VydGlvbnMoKyks IDEgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvYXJjaC94ODYva3ZtL3N2bS5jIGIvYXJj aC94ODYva3ZtL3N2bS5jCmluZGV4IDc2MGM4NmUuLjQ2ODU2ZDIgMTAwNjQ0Ci0tLSBhL2Fy Y2gveDg2L2t2bS9zdm0uYworKysgYi9hcmNoL3g4Ni9rdm0vc3ZtLmMKQEAgLTc4MSw3ICs3 ODEsNyBAQCBzdGF0aWMgdm9pZCBpbml0X3ZtY2Ioc3RydWN0IHZjcHVfc3ZtICpzdm0pCiAK IAljb250cm9sLT5pb3BtX2Jhc2VfcGEgPSBpb3BtX2Jhc2U7CiAJY29udHJvbC0+bXNycG1f YmFzZV9wYSA9IF9fcGEoc3ZtLT5tc3JwbSk7Ci0JY29udHJvbC0+dHNjX29mZnNldCA9IDA7 CisJZ3Vlc3Rfd3JpdGVfdHNjKCZzdm0tPnZjcHUsIDApOwogCWNvbnRyb2wtPmludF9jdGwg PSBWX0lOVFJfTUFTS0lOR19NQVNLOwogCiAJaW5pdF9zZWcoJnNhdmUtPmVzKTsKLS0gCjEu Ny4xCgo= --------------070004010701030206070900--