From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Tue, 27 Jul 2010 15:34:31 +0300
Message-ID: <4C4ED257.40002@redhat.com>
References: <4C483F67.1010007@amd.com> <4C4BF96B.7010005@redhat.com>
 <4C4D4B8B.80006@amd.com> <4C4EAEFC.20207@redhat.com>
 <4C4EC7D1.6030708@amd.com> <4C4ECBC7.1070405@redhat.com>
 <4C4ECF2E.4070103@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "glommer@redhat.com" , Zachary Amsden , KVM list
To: Andre Przywara
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:2252 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752367Ab0G0Mee (ORCPT ); Tue, 27 Jul 2010 08:34:34 -0400
In-Reply-To: <4C4ECF2E.4070103@amd.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 07/27/2010 03:21 PM, Andre Przywara wrote:
> Avi Kivity wrote:
>> On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>>> What is the guest executing when it hangs?
>>> Both VCPUs are halted, the monitor and System.map tell me it's in
>>> native_safe_halt().
>>> The code sequence confirms this, it is an intentional sti;hlt
>>> condition.
>>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>>
>>
>> Well, strange. The intent of that patch was to make the clock never
>> go backwards. Perhaps the change made it go forwards by a large
>> amount, and the guest is not hung, just waiting for some timer that
>> is far in the future.
>>
>> Can you do something like
>>
>> -       if (ret < last)
>> +       if (ret < last) {
>> +               static u64 max_delta;
>> +               if (last - ret > max_delta) {
>> +                       max_delta = last - ret;
>> +                       printk("advancing kvmclock by: %llx\n", max_delta);
>> +               }
>>                 return last;
>> +       }
>>
>> to see if this is happening?
> No change, it still hangs. I also don't see the printk.
> The output with smp=1 is like this:
> [ 1.186549] ACPI: Power Button [PWRF]
> [ 1.189204] XENFS: not registering filesystem on non-xen platform
> [ 1.195001] Non-volatile memory driver v1.3
> [ 1.196358] Linux agpgart interface v0.103
> [ 1.197687] [drm] Initialized drm 1.1.0 20060810
> [ 1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without
> intel_agp module!
> [ 1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> ÿ[ 1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [ 1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [ 1.467153] brd: module loaded
> [ 1.469245] loop: module loaded
> With smp=2 the output stops just before the strange "y" character (I
> guess it's ASCII 255), which I assume is an artifact of the serial
> console.
> As you can see at the timestamps, it takes some time between the last
> shown line (1.201213) and the first missing one (1.460714).

Weird. Maybe the clock goes crazy. Let's see if it jumps forward a lot:

         } while (unlikely(last != ret));
+
+       {
+               static u64 last_report;
+               if (ret > last_report + 10000) {
+                       last_report = ret;
+                       printk("kvmclock: %llx\n", ret);
+               }
+
+       }
         return ret;
  }

Worth updating the 'return last' to update ret and goto the new code, so
we don't miss that path.

-- 
error compiling committee.c: too many arguments to function
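
For illustration only, here is a rough userspace sketch of the last
suggestion: the "clock went backwards" path sets ret and jumps to the
reporting code instead of returning early, so both paths get logged.
It is not the kernel code. monotonic_clock(), last_value, clamp_count,
report_count and REPORT_INTERVAL are made-up names for this sketch, C11
atomics stand in for the kernel's cmpxchg, and printf stands in for
printk. The counters and last_report are plain variables, so the sketch
is only meant to be exercised from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Report threshold taken from the debugging diff above. */
#define REPORT_INTERVAL 10000ULL

/* Hypothetical stand-in for the shared "last value handed out". */
static _Atomic uint64_t last_value;

/* Hypothetical counters so both paths can be observed from a test. */
static uint64_t clamp_count;   /* raw reading was behind last_value */
static uint64_t report_count;  /* the periodic report fired */

/*
 * Take a raw clock reading and return a value that never goes
 * backwards relative to what any earlier call returned.
 */
static uint64_t monotonic_clock(uint64_t raw)
{
	uint64_t ret = raw;
	uint64_t last = atomic_load(&last_value);

	do {
		if (ret < last) {
			/* Would go backwards: reuse the last value, but
			 * still pass through the reporting code below. */
			ret = last;
			clamp_count++;
			goto report;
		}
		/* On failure, 'last' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, ret));

report:
	{
		static uint64_t last_report;

		if (ret > last_report + REPORT_INTERVAL) {
			last_report = ret;
			report_count++;
			printf("kvmclock: %llx\n", (unsigned long long)ret);
		}
	}

	/* The result is never behind the raw reading it was given. */
	assert(ret >= raw);
	return ret;
}
```

Feeding it readings such as 100, 50, 20000 and then 5 exercises both
paths: the second and fourth calls are clamped to the previous value,
and only the third is far enough ahead of the last report to print.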