From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Sat, 02 Oct 2010 11:50:38 +0400
Message-ID: <4CA6E44E.10101@msgid.tls.msk.ru>
References: <4CA2F8A3.80400@redhat.com> <4CA30424.9030007@msgid.tls.msk.ru>
 <4CA30493.6090503@msgid.tls.msk.ru> <4CA392FE.5090009@xutrox.com>
 <4CA4427C.9090304@msgid.tls.msk.ru> <4CA45F7B.8050806@msgid.tls.msk.ru>
 <4CA4968F.9050402@redhat.com> <4CA4A8C5.3030407@msgid.tls.msk.ru>
 <4CA4AD87.8060502@redhat.com> <4CA4DBC8.6070606@xutrox.com>
 <20100930190507.GA1111@amt.cnet> <4CA51715.1070507@msgid.tls.msk.ru>
 <4CA51847.5060208@msgid.tls.msk.ru> <4CA6C4BB.5020004@redhat.com>
 <4CA6E0BF.90605@msgid.tls.msk.ru> <4CA6E1E7.9010003@msgid.tls.msk.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, Arjan Koers <0h61vkll2ly8@xutrox.com>,
 kvm@vger.kernel.org, Avi Kivity, Glauber Costa, Andre Przywara,
 jeremy@xensource.com
To: Zachary Amsden
Return-path:
Received: from isrv.corpit.ru ([86.62.121.231]:40182 "EHLO isrv.corpit.ru"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751207Ab0JBHul (ORCPT ); Sat, 2 Oct 2010 03:50:41 -0400
In-Reply-To: <4CA6E1E7.9010003@msgid.tls.msk.ru>
Sender: kvm-owner@vger.kernel.org
List-ID:

Ugh. Replying to myself again and again, but I found
all these variants quite interesting for the problem at hand.

02.10.2010 11:40, Michael Tokarev wrote:
> 02.10.2010 11:35, Michael Tokarev wrote:
> []
>> [ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
>
> Another interesting observation. The time is almost always
> like this. Another very common version is 0.199999:
>
> [ 0.189999] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> [ 0.193333] HEST: Table is not found!
> [ 0.193333] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [ 0.196666] vgaarb: loaded
> [ 0.196666] PCI: Using ACPI for IRQ routing
> [ 0.199999] Switching to clocksource kvm-clock
> [ 0.199999] pnp: PnP ACPI init
> [ 0.199999] ACPI: bus type pnp registered
> [ 0.199999] pnp: PnP ACPI: found 8 devices
> [ 0.199999] ACPI: ACPI bus type pnp unregistered
> [ 0.199999] PnPBIOS: Disabled
> ...

And here's yet another variant I just got. It hung much
earlier this time, now with 100% CPU usage:

...
[ 0.000000] Kernel command line: rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
...
[ 0.009012] using C1E aware idle routine
[ 0.009430] Performance Events: AMD PMU driver.
[ 0.010009] ... version: 0
[ 0.010427] ... bit width: 48
[ 0.010853] ... generic registers: 4
[ 0.011270] ... value mask: 0000ffffffffffff
[ 0.011818] ... max period: 00007fffffffffff
[ 0.012366] ... fixed-purpose events: 0
[ 0.012785] ... event mask: 000000000000000f
[ 0.016795] ACPI: Core revision 20100428
[ 0.018729] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.019999] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.019999] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03

and.. nothing (this is with -cpu host). So this is _way_ before
the kvmclock registration.

Another:

...
[ 0.109999] vgaarb: loaded
[ 0.109999] PCI: Using ACPI for IRQ routing
[ 0.113333] Switching to clocksource kvm-clock
[ 0.116666] pnp: PnP ACPI init
[ 0.116666] ACPI: bus type pnp registered

(note the "uncommon" timestamp ;)

With printk.time=0 it still boots ok.

Note there are 2 "versions" of this hang. The first is
trivially triggerable right at the kvmclock registration
without the bandaid printk patch applied: it hangs there
with 100% CPU usage and the guest does not react to any events.
This is what happened in the case above where it hung at the
CPU0 line, too: 100% CPU and no reaction to the keyboard.

The other, much more common variant, seen with that printk patch
applied, shows no CPU usage: the guest reacts to keyboard
events (I can Shift+PgUp/PgDown for example), but it does
not do anything else, and the time printed is constant.

Thanks!

/mjt