public inbox for kvm@vger.kernel.org
* 2.6.35-rc1 regression with pvclock and smp guests
@ 2010-07-22 12:53 Andre Przywara
  2010-07-25  8:44 ` Avi Kivity
  0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-22 12:53 UTC (permalink / raw)
  To: glommer; +Cc: Zachary Amsden, KVM list

Hi,

I found a regression with pvclock and SMP KVM _guests_.
PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1, but 
with smp=2 halt at:

Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
(last line shown)

I bisected this down to:
commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
Author: Glauber Costa <glommer@redhat.com>
Date:   Tue May 11 12:17:40 2010 -0400

     x86, paravirt: Add a global synchronization point for pvclock

One commit before works, smp=1 always works, disabling PVCLOCK works.
Using qemu-kvm-0.12.4 works, too.
Having PVCLOCK enabled and with smp=2 the kernel halts without any 
further message.
This is still the case with the latest tip.
Even pinning both VCPU threads to the same host core shows the bug.
The bug triggers on all hosts I tested, a single socket quadcore 
Athlon, a dual socket dualcore K8-Opteron and a quad socket 12core Opteron.

Please note that this is the guest kernel, the host kernel does not matter.

I have no idea (and don't feel like ;-) debugging this, so I hope 
someone will find and fix the bug.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
@ 2010-07-25  8:44 ` Avi Kivity
  2010-07-26  8:47   ` Andre Przywara
  0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-25  8:44 UTC (permalink / raw)
  To: Andre Przywara; +Cc: glommer, Zachary Amsden, KVM list

  On 07/22/2010 03:53 PM, Andre Przywara wrote:
> Hi,
>
> I found a regression with pvclock and SMP KVM _guests_.
> PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1, 
> but with smp=2 halt at:
>
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> (last line shown)
>
> I bisected this down to:
> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
> Author: Glauber Costa <glommer@redhat.com>
> Date:   Tue May 11 12:17:40 2010 -0400
>
>     x86, paravirt: Add a global synchronization point for pvclock
>
> One commit before works, smp=1 always works, disabling PVCLOCK works.
> Using qemu-kvm-0.12.4 works, too.
> Having PVCLOCK enabled and with smp=2 the kernel halts without any 
> further message.
> This is still the case with the latest tip.
> Even pinning both VCPU threads to the same host core shows the bug.
> The bug triggers on all hosts I tested, a single socket quadcore 
> Athlon, a dual socket dualcore K8-Opteron and a quad socket 12core 
> Opteron.
>
> Please note that this is the guest kernel, the host kernel does not 
> matter.
>
> I have no idea (and don't feel like ;-) debugging this, so I hope 
> someone will find and fix the bug.


Does this go away with CONFIG_DEBUG_RODATA=n?  If so, it's a known bug 
in the atomic_*() clobber lists.

-- 
error compiling committee.c: too many arguments to function



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-25  8:44 ` Avi Kivity
@ 2010-07-26  8:47   ` Andre Przywara
  2010-07-26 18:59     ` Arjan Koers
  2010-07-27 10:03     ` Avi Kivity
  0 siblings, 2 replies; 81+ messages in thread
From: Andre Przywara @ 2010-07-26  8:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Avi Kivity wrote:
>   On 07/22/2010 03:53 PM, Andre Przywara wrote:
>> Hi,
>>
>> I found a regression with pvclock and SMP KVM _guests_.
>> PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1, 
>> but with smp=2 halt at:
>>
>> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> (last line shown)
>>
>> I bisected this down to:
>> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>> Author: Glauber Costa <glommer@redhat.com>
>> Date:   Tue May 11 12:17:40 2010 -0400
>>
>>     x86, paravirt: Add a global synchronization point for pvclock
>>
>> One commit before works, smp=1 always works, disabling PVCLOCK works.
>> Using qemu-kvm-0.12.4 works, too.
>> Having PVCLOCK enabled and with smp=2 the kernel halts without any 
>> further message.
>> This is still the case with the latest tip.
>> Even pinning both VCPU threads to the same host core shows the bug.
>> The bug triggers on all hosts I tested, a single socket quadcore 
>> Athlon, a dual socket dualcore K8-Opteron and a quad socket 12core 
>> Opteron.
>>
>> Please note that this is the guest kernel, the host kernel does not 
>> matter.
>>
>> I have no idea (and don't feel like ;-) debugging this, so I hope 
>> someone will find and fix the bug.
> 
> 
> Does this go away with CONFIG_DEBUG_RODATA=n?  If so, it's a known bug 
> in the atomic_*() clobber lists.
> 
Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
The debug options I had enabled now are:
CONFIG_DEBUG_DEVRES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_BOOT_PARAMS=y

I even disabled all kernel debug options; that does not help either.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-26  8:47   ` Andre Przywara
@ 2010-07-26 18:59     ` Arjan Koers
  2010-07-27 21:00       ` Arjan Koers
  2010-07-27 10:03     ` Avi Kivity
  1 sibling, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-07-26 18:59 UTC (permalink / raw)
  To: kvm

Andre Przywara wrote:
> Avi Kivity wrote:
>>   On 07/22/2010 03:53 PM, Andre Przywara wrote:
>>> Hi,
>>>
>>> I found a regression with pvclock and SMP KVM _guests_.
>>> PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1, 
>>> but with smp=2 halt at:
>>>
>>> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>> (last line shown)
>>>
>>> I bisected this down to:
>>> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>>> Author: Glauber Costa <glommer@redhat.com>
>>> Date:   Tue May 11 12:17:40 2010 -0400
>>>
>>>     x86, paravirt: Add a global synchronization point for pvclock
>>>
>>> One commit before works, smp=1 always works, disabling PVCLOCK works.
>>> Using qemu-kvm-0.12.4 works, too.
>>> Having PVCLOCK enabled and with smp=2 the kernel halts without any 
>>> further message.
>>> This is still the case with the latest tip.
>>> Even pinning both VCPU threads to the same host core shows the bug.
>>> The bug triggers on all hosts I tested, a single socket quadcore 
>>> Athlon, a dual socket dualcore K8-Opteron and a quad socket 12core 
>>> Opteron.
>>>
>>> Please note that this is the guest kernel, the host kernel does not 
>>> matter.
>>>
>>> I have no idea (and don't feel like ;-) debugging this, so I hope 
>>> someone will find and fix the bug.
>>
>>
>> Does this go away with CONFIG_DEBUG_RODATA=n?  If so, it's a known bug 
>> in the atomic_*() clobber lists.
>>
> Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
> The debug options I had enabled now are:
> CONFIG_DEBUG_DEVRES=y
> CONFIG_DEBUG_FS=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_MEMORY_INIT=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_DEBUG_BOOT_PARAMS=y
> 
> I even disabled all kernel debug options; that does not help either.

I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
kernels hang during boot.

The boot log of 2.6.34.1 with the patch reverted is at the bottom of
this message (59aab522154a2f17b25335b63c1cf68a51fb6ae0 for 2.6.34.1).

With the patch still in place, the kernel appears to hang (stuck in a
while loop?) between these two messages:
  [    0.684803]  vdb: vdb1
  [    1.013120] Clocksource tsc unstable (delta = 1037182237254 ns)

Note that each boot shows a message about the tsc being unstable:
  [    1.013120] Clocksource tsc unstable (delta = 1037182237254 ns)
  [    1.013122] Clocksource tsc unstable (delta = 1149054858088 ns)
  [    1.009117] Clocksource tsc unstable (delta = 1265448436431 ns)
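
To put those deltas in perspective, they can be converted from nanoseconds to seconds and minutes. This is just a small sketch using the three values quoted above; ns_to_whole_seconds() and print_deltas() are made-up helper names, not anything from the kernel:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: truncate a nanosecond count to whole seconds. */
static uint64_t ns_to_whole_seconds(uint64_t ns)
{
	return ns / NSEC_PER_SEC;
}

/* Print each delta from the boot messages in seconds and whole minutes. */
static void print_deltas(void)
{
	static const uint64_t deltas_ns[] = {
		1037182237254ULL, 1149054858088ULL, 1265448436431ULL,
	};

	for (size_t i = 0; i < sizeof(deltas_ns) / sizeof(deltas_ns[0]); i++) {
		uint64_t secs = ns_to_whole_seconds(deltas_ns[i]);

		printf("%" PRIu64 " ns ~= %" PRIu64 " s (~%" PRIu64 " min)\n",
		       deltas_ns[i], secs, secs / 60);
	}
}
```

So each reported delta is on the order of 17 to 21 minutes, far more than the roughly one second of uptime shown in the timestamps.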

My host is running kernel 2.6.34.1 with the latest git version of
qemu-kvm (b81fe95).


Boot log of SMP guest with patch reverted:

[    0.000000] Linux version 2.6.34.1-201007261412-guestmp (arjan@dev-lenny) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Mon Jul 26 14:16:18 UTC 2010
[    0.000000] Command line: root=/dev/vda1 ro single
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009a800 (usable)
[    0.000000]  BIOS-e820: 000000000009a800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[    0.000000]  0000000000 - 001fe00000 page 2M
[    0.000000]  001fe00000 - 001fffd000 page 4k
[    0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[    0.000000] RAMDISK: 1fdfc000 - 1ffed000
[    0.000000] ACPI: RSDP 00000000000fdb50 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[    0.000000] ACPI: FACS 000000001ffffe00 00040
[    0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] kvm-clock: cpu 0, msr 0:1331ac1, boot clock
[    0.000000]  [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001c00000-ffff8800023fffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000001 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   empty
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000001 -> 0x0000009a
[    0.000000]     0: 0x00000100 -> 0x0001fffd
[    0.000000] On node 0 totalpages: 130966
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3937 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 1736 pages used for memmap
[    0.000000]   DMA32 zone: 125237 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 24
[    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] early_res array is doubled to 64 at [3000 - 37ff]
[    0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s74984 r8192 d23320 u1048576
[    0.000000] pcpu-alloc: s74984 r8192 d23320 u1048576 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1
[    0.000000] kvm-clock: cpu 0, msr 0:1411ac1, primary cpu clock
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129174
[    0.000000] Kernel command line: root=/dev/vda1 ro single
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Subtract (38 early reservations)
[    0.000000]   #1 [0001000000 - 00013e1e38]   TEXT DATA BSS
[    0.000000]   #2 [001fdfc000 - 001ffed000]         RAMDISK
[    0.000000]   #3 [000009a800 - 0000100000]   BIOS reserved
[    0.000000]   #4 [00013e2000 - 00013e2071]             BRK
[    0.000000]   #5 [0000001000 - 0000003000]      TRAMPOLINE
[    0.000000]   #6 [0000008000 - 0000009000]         PGTABLE
[    0.000000]   #7 [00013e2080 - 00013e3080]         BOOTMEM
[    0.000000]   #8 [00013e1e40 - 00013e1ea0]         BOOTMEM
[    0.000000]   #9 [0001be4000 - 0001be5000]         BOOTMEM
[    0.000000]   #10 [0001be5000 - 0001be6000]         BOOTMEM
[    0.000000]   #11 [0001c00000 - 0002400000]        MEMMAP 0
[    0.000000]   #12 [00013e3080 - 00013e3200]         BOOTMEM
[    0.000000]   #13 [00013e3200 - 00013e6200]         BOOTMEM
[    0.000000]   #14 [00013e7000 - 00013e8000]         BOOTMEM
[    0.000000]   #15 [00013e1ec0 - 00013e1f01]         BOOTMEM
[    0.000000]   #16 [00013e1f40 - 00013e1f83]         BOOTMEM
[    0.000000]   #17 [00013e6200 - 00013e6388]         BOOTMEM
[    0.000000]   #18 [00013e63c0 - 00013e6428]         BOOTMEM
[    0.000000]   #19 [00013e6440 - 00013e64a8]         BOOTMEM
[    0.000000]   #20 [00013e64c0 - 00013e6528]         BOOTMEM
[    0.000000]   #21 [00013e6540 - 00013e65a8]         BOOTMEM
[    0.000000]   #22 [00013e65c0 - 00013e6628]         BOOTMEM
[    0.000000]   #23 [00013e6640 - 00013e66a8]         BOOTMEM
[    0.000000]   #24 [00013e1fc0 - 00013e1fd9]         BOOTMEM
[    0.000000]   #25 [00013e66c0 - 00013e66d9]         BOOTMEM
[    0.000000]   #26 [0001400000 - 000141a000]         BOOTMEM
[    0.000000]   #27 [0001500000 - 000151a000]         BOOTMEM
[    0.000000]   #28 [00013e6700 - 00013e6708]         BOOTMEM
[    0.000000]   #29 [00013e6740 - 00013e6748]         BOOTMEM
[    0.000000]   #30 [00013e6780 - 00013e6788]         BOOTMEM
[    0.000000]   #31 [00013e67c0 - 00013e67d0]         BOOTMEM
[    0.000000]   #32 [00013e6800 - 00013e6940]         BOOTMEM
[    0.000000]   #33 [00013e6940 - 00013e69a0]         BOOTMEM
[    0.000000]   #34 [00013e69c0 - 00013e6a20]         BOOTMEM
[    0.000000]   #35 [00013e8000 - 00013ec000]         BOOTMEM
[    0.000000]   #36 [000141a000 - 000149a000]         BOOTMEM
[    0.000000]   #37 [000149a000 - 00014da000]         BOOTMEM
[    0.000000] Memory: 508672k/524276k available (2096k kernel code, 412k absent, 15192k reserved, 1097k data, 456k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:448
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] hpet clockevent registered
[    0.000000] Detected 2800.590 MHz processor.
[    0.016000] Calibrating delay loop (skipped) preset value.. 5601.18 BogoMIPS (lpj=11202360)
[    0.016000] Mount-cache hash table entries: 256
[    0.016000] using C1E aware idle routine
[    0.016000] Performance Events: AMD PMU driver.
[    0.016000] ... version:                0
[    0.016000] ... bit width:              48
[    0.016000] ... generic registers:      4
[    0.016000] ... value mask:             0000ffffffffffff
[    0.016000] ... max period:             00007fffffffffff
[    0.016007] ... fixed-purpose events:   0
[    0.016351] ... event mask:             000000000000000f
[    0.020938] Freeing SMP alternatives: 24k freed
[    0.021288] ACPI: Core revision 20100121
[    0.023988] Setting APIC routing to flat
[    0.025968] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.026356] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.028000] Booting Node   0, Processors  #1 Ok.
[    0.016000] kvm-clock: cpu 1, msr 0:1511ac1, secondary cpu clock
[    0.038013] Brought up 2 CPUs
[    0.038015] Total of 2 processors activated (11202.36 BogoMIPS).
[    0.038011] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[    0.044216] NET: Registered protocol family 16
[    0.044752] ACPI: bus type pci registered
[    0.044752] PCI: Using configuration type 1 for base access
[    0.044859] PCI: Using configuration type 1 for extended access
[    0.048114] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.048413] mtrr: your CPUs had inconsistent MTRRdefType settings
[    0.048780] mtrr: probably your BIOS does not setup all CPUs.
[    0.049140] mtrr: corrected configuration.
[    0.064243] bio: create slab <bio-0> at 0
[    0.068844] ACPI: EC: Look up EC in DSDT
[    0.074085] ACPI: Interpreter enabled
[    0.075141] ACPI: (supports S0 S5)
[    0.076012] ACPI: Using IOAPIC for interrupt routing
[    0.104232] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[    0.104750] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.105148] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
[    0.105148] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff] (ignored)
[    0.105148] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
[    0.105148] pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
[    0.105148] pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
[    0.105148] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.105391] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[    0.111077] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[    0.112233] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[    0.127174] pci 0000:00:03.0: reg 10: [io  0xc020-0xc03f]
[    0.127356] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
[    0.128712] pci 0000:00:04.0: reg 10: [io  0xc040-0xc05f]
[    0.129310] pci 0000:00:05.0: reg 10: [io  0xc080-0xc0bf]
[    0.129895] pci 0000:00:06.0: reg 10: [io  0xc0c0-0xc0ff]
[    0.130644] pci_bus 0000:00: on NUMA node 0
[    0.130734] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.144909] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.148227] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.150351] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.152201] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.156176] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.158212] vgaarb: loaded
[    0.160149] PCI: Using ACPI for IRQ routing
[    0.161121] PCI: pci_cache_line_size set to 64 bytes
[    0.161448] reserve RAM buffer: 000000000009a800 - 000000000009ffff
[    0.161463] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[    0.161665] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.164078] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.165801] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.172080] Switching to clocksource kvm-clock
[    0.176620] pnp: PnP ACPI init
[    0.177795] ACPI: bus type pnp registered
[    0.185167] pnp: PnP ACPI: found 7 devices
[    0.186418] ACPI: ACPI bus type pnp unregistered
[    0.198749] pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
[    0.198757] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
[    0.199196] NET: Registered protocol family 2
[    0.200986] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.203750] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[    0.206538] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.208303] TCP: Hash tables configured (established 16384 bind 16384)
[    0.209610] TCP reno registered
[    0.210602] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    0.211828] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    0.213683] NET: Registered protocol family 1
[    0.214832] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.215341] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.215720] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.216114] pci 0000:00:02.0: Boot video device
[    0.216140] PCI: CLS 0 bytes, default 64
[    0.216203] Unpacking initramfs...
[    0.250071] Freeing initrd memory: 1988k freed
[    0.259814] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.261582] msgmni has been set to 997
[    0.263053] alg: No test for stdrng (krng)
[    0.281920] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.282524] io scheduler noop registered
[    0.282842] io scheduler deadline registered
[    0.283349] io scheduler cfq registered (default)
[    0.324346] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.328420] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.329799] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.331768] mice: PS/2 mouse device common for all mice
[    0.334952] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[    0.336491] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.338002] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.338175] cpuidle: using governor ladder
[    0.338181] cpuidle: using governor menu
[    0.342454] TCP cubic registered
[    0.343662] NET: Registered protocol family 17
[    0.346570] rtc_cmos 00:01: setting system clock to 2010-07-26 14:20:16 UTC (1280154016)
[    0.349240] Freeing unused kernel memory: 456k freed
[    0.589068] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.589656] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[    0.590307] virtio-pci 0000:00:03.0: setting latency timer to 64
[    0.610657] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.611093] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[    0.611743] virtio-pci 0000:00:04.0: setting latency timer to 64
[    0.611888] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.612267] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[    0.612906] virtio-pci 0000:00:05.0: setting latency timer to 64
[    0.626190] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.626596] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[    0.627256] virtio-pci 0000:00:06.0: setting latency timer to 64
[    0.658242]  vda: vda1 vda2 < vda5 >
[    0.684803]  vdb: vdb1
[    1.013120] Clocksource tsc unstable (delta = 1037182237254 ns)
[    1.074934] kjournald starting.  Commit interval 5 seconds
[    1.076360] EXT3-fs (vda1): mounted filesystem with writeback data mode
[    2.654241] udevd version 125 started
[    2.948450] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    2.949138] ACPI: Power Button [PWRF]
[    2.970190] virtio-pci 0000:00:03.0: irq 24 for MSI/MSI-X
[    2.970204] virtio-pci 0000:00:03.0: irq 25 for MSI/MSI-X
[    2.970217] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
[    4.599767] Adding 409620k swap on /dev/vda5.  Priority:-1 extents:1 across:409620k
[    5.171407] EXT3-fs (vda1): using internal journal
[    5.711498] loop: module loaded
[   11.244320] NET: Registered protocol family 10
[   11.246995] lo: Disabled Privacy Extensions
[   21.748123] eth0: no IPv6 routers present


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-26  8:47   ` Andre Przywara
  2010-07-26 18:59     ` Arjan Koers
@ 2010-07-27 10:03     ` Avi Kivity
  2010-07-27 11:49       ` Andre Przywara
  1 sibling, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 10:03 UTC (permalink / raw)
  To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

  On 07/26/2010 11:47 AM, Andre Przywara wrote:
>> Does this go away with CONFIG_DEBUG_RODATA=n?  If so, it's a known 
>> bug in the atomic_*() clobber lists.
>>
>
> Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
> The debug options I had enabled now are:
> CONFIG_DEBUG_DEVRES=y
> CONFIG_DEBUG_FS=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_MEMORY_INIT=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_DEBUG_BOOT_PARAMS=y
>
> I even disabled all kernel debug options; that does not help either.
>

Does changing last_value in arch/x86/kernel/pvclock.c to be non-static help?

What is the guest executing when it hangs?

-- 
error compiling committee.c: too many arguments to function



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 10:03     ` Avi Kivity
@ 2010-07-27 11:49       ` Andre Przywara
  2010-07-27 12:06         ` Avi Kivity
  0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 11:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Avi Kivity wrote:
>   On 07/26/2010 11:47 AM, Andre Przywara wrote:
>>> Does this go away with CONFIG_DEBUG_RODATA=n?  If so, it's a known 
>>> bug in the atomic_*() clobber lists.
>>>
>> Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
>> The debug options I had enabled now are:
>> CONFIG_DEBUG_DEVRES=y
>> CONFIG_DEBUG_FS=y
>> CONFIG_DEBUG_KERNEL=y
>> CONFIG_DEBUG_BUGVERBOSE=y
>> CONFIG_DEBUG_MEMORY_INIT=y
>> CONFIG_DEBUG_STACKOVERFLOW=y
>> CONFIG_DEBUG_BOOT_PARAMS=y
>>
>> I even disabled all kernel debug options; that does not help either.
>>
> 
> Does changing last_value in arch/x86/kernel/pvclock.c to be non-static help?

No, no change. It still hangs.

> What is the guest executing when it hangs?
Both VCPUs are halted, the monitor and System.map tell me it's in 
native_safe_halt().
The code sequence confirms this, it is an intentional sti;hlt condition.
Using -smp 16 also shows that all 16 VCPUs are stuck.

Regards,
Andre.


-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 11:49       ` Andre Przywara
@ 2010-07-27 12:06         ` Avi Kivity
  2010-07-27 12:21           ` Andre Przywara
  0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 12:06 UTC (permalink / raw)
  To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

  On 07/27/2010 02:49 PM, Andre Przywara wrote:
>
>> What is the guest executing when it hangs?
> Both VCPUs are halted, the monitor and System.map tell me it's in 
> native_safe_halt().
> The code sequence confirms this, it is an intentional sti;hlt condition.
> Using -smp 16 also shows that all 16 VCPUs are stuck.
>

Well, strange.  The intent of that patch was to make the clock never go 
backwards.  Perhaps the change made it go forwards by a large amount, 
and the guest is not hung, just waiting for some timer that is far in 
the future.

Can you do something like

-      if (ret < last)
+      if (ret < last) {
+            static u64 max_delta;
+            if (last - ret > max_delta) {
+                  max_delta = last - ret;
+                  printk("advancing kvmclock by: %llx\n", max_delta);
+            }
              return last;
+      }

to see if this is happening?
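
For anyone who wants to poke at the logic outside a guest, here is a rough user-space model of the check above. It is only a sketch under assumptions: clamp_monotonic() is a hypothetical name, the C11 atomics are stand-ins for whatever primitives arch/x86/kernel/pvclock.c really uses, and max_delta is updated without any locking because it is debug-only.

```c
/* User-space model of a "never go backwards" clock clamp; not kernel code. */
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t last_value;  /* highest value handed out so far */
static uint64_t max_delta;           /* largest forward correction seen (debug only) */

/* Hypothetical helper: return 'ret' unless a later time was already handed out. */
static uint64_t clamp_monotonic(uint64_t ret)
{
	uint64_t last = atomic_load(&last_value);

	for (;;) {
		if (ret < last) {
			/* Same instrumentation as the debug patch above. */
			if (last - ret > max_delta) {
				max_delta = last - ret;
				printf("advancing clock by: %" PRIx64 "\n", max_delta);
			}
			return last;
		}
		/* On failure 'last' is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&last_value, &last, ret))
			return ret;
	}
}
```

Feeding it 100, 90 and 150 in that order returns 100, 100 and 150, and prints a single "advancing" line for the second call. If the printk never fires in the guest, the clamp branch is simply not being taken.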

-- 
error compiling committee.c: too many arguments to function



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 12:06         ` Avi Kivity
@ 2010-07-27 12:21           ` Andre Przywara
  2010-07-27 12:34             ` Avi Kivity
  0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 12:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Avi Kivity wrote:
>   On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>> What is the guest executing when it hangs?
>> Both VCPUs are halted, the monitor and System.map tell me it's in 
>> native_safe_halt().
>> The code sequence confirms this, it is an intentional sti;hlt condition.
>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>
> 
> Well, strange.  The intent of that patch was to make the clock never go 
> backwards.  Perhaps the change made it go forwards by a large amount, 
> and the guest is not hung, just waiting for some timer that is far in 
> the future.
> 
> Can you do something like
> 
> -      if (ret < last)
> +      if (ret < last) {
> +            static u64 max_delta;
> +            if (last - ret > max_delta) {
> +                  max_delta = last - ret;
> +                  printk("advancing kvmclock by: %llx\n", max_delta);
> +            }
>               return last;
> +      }
> 
> to see if this is happening?
No change, it still hangs. I also don't see the printk.
The output with smp=1 is like this:
[    1.186549] ACPI: Power Button [PWRF]
[    1.189204] XENFS: not registering filesystem on non-xen platform
[    1.195001] Non-volatile memory driver v1.3
[    1.196358] Linux agpgart interface v0.103
[    1.197687] [drm] Initialized drm 1.1.0 20060810
[    1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
[    1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
ÿ[    1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.467153] brd: module loaded
[    1.469245] loop: module loaded
With smp=2 the output stops just before the strange "y" character (I 
guess it's ASCII 255), which I assume is an artifact of the serial console.
As you can see from the timestamps, some time passes between the last 
shown line (1.201213) and the first missing one (1.460714).

Thanks,
Andre.

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 12:21           ` Andre Przywara
@ 2010-07-27 12:34             ` Avi Kivity
  2010-07-27 13:48               ` Andre Przywara
  0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 12:34 UTC (permalink / raw)
  To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

  On 07/27/2010 03:21 PM, Andre Przywara wrote:
> Avi Kivity wrote:
>>   On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>>> What is the guest executing when it hangs?
>>> Both VCPUs are halted, the monitor and System.map tell me it's in 
>>> native_safe_halt().
>>> The code sequence confirms this, it is an intentional sti;hlt 
>>> condition.
>>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>>
>>
>> Well, strange.  The intent of that patch was to make the clock never 
>> go backwards.  Perhaps the change made it go forwards by a large 
>> amount, and the guest is not hung, just waiting for some timer that 
>> is far in the future.
>>
>> Can you do something like
>>
>> -      if (ret < last)
>> +      if (ret < last) {
>> +            static u64 max_delta;
>> +            if (last - ret > max_delta) {
>> +                  max_delta = last - ret;
>> +                  printk("advancing kvmclock by: %llx\n", max_delta);
>> +            }
>>               return last;
>> +      }
>>
>> to see if this is happening?
> No change, it still hangs. I also don't see the printk.
> The output with smp=1 is like this:
> [    1.186549] ACPI: Power Button [PWRF]
> [    1.189204] XENFS: not registering filesystem on non-xen platform
> [    1.195001] Non-volatile memory driver v1.3
> [    1.196358] Linux agpgart interface v0.103
> [    1.197687] [drm] Initialized drm 1.1.0 20060810
> [    1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without 
> intel_agp module!
> [    1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> ÿ[    1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [    1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [    1.467153] brd: module loaded
> [    1.469245] loop: module loaded
> With smp=2 the output stops just before the strange "y" character (I 
> guess it's ASCII 255), which I assume is an artifact of the serial 
> console.
> As you can see at the timestamps, it takes some time between the last 
> shown line (1.201213) and the first missing one (1.460714).

Weird.  Maybe the clock goes crazy.

Let's see if it jumps forward a lot:

         } while (unlikely(last != ret));
+
+       {
+            static u64 last_report;
+            if (ret > last_report + 10000) {
+                    last_report = ret;
+                    printk("kvmclock: %llx\n", ret);
+            }
+
+       }

         return ret;
  }

Worth updating the 'return last' to update ret and goto the new code, so 
we don't miss that path.
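
A rough user-space sketch of that restructuring, so that the clamped
path reaches the report code as well. This is only a model and not the
actual pvclock code: C11 atomics stand in for the kernel's atomic64
helpers, printf() stands in for printk(), and the names
clamp_and_report(), report() and the reports counter are made up for
illustration.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* User-space stand-in for the kernel's last_value (an atomic64_t there). */
static _Atomic uint64_t last_value;

/* Hypothetical reporting state; the debug patch calls printk() instead. */
static uint64_t last_report;
static unsigned reports;

static void report(uint64_t now)
{
	/* Same 10000 ns rate limit as in the debug patch. */
	if (now > last_report + 10000) {
		last_report = now;
		reports++;
		printf("kvmclock: %llx\n", (unsigned long long)now);
	}
}

/* raw is the value the per-cpu computation produced for this read. */
static uint64_t clamp_and_report(uint64_t raw)
{
	uint64_t ret = raw;
	uint64_t last = atomic_load(&last_value);

	do {
		if (ret < last) {
			ret = last;	/* was: return last; */
			goto out;
		}
		/* On failure, 'last' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, ret));
out:
	report(ret);
	return ret;
}
```

With this shape, a read that gets clamped to the stored value still
passes through report(), so a large forward jump caused by the clamp
would show up in the output.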

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 12:34             ` Avi Kivity
@ 2010-07-27 13:48               ` Andre Przywara
  2010-07-27 13:58                 ` Avi Kivity
  0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 13:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Avi Kivity wrote:
>   On 07/27/2010 03:21 PM, Andre Przywara wrote:
>> Avi Kivity wrote:
>>>   On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>>>> What is the guest executing when it hangs?
>>>> Both VCPUs are halted, the monitor and System.map tell me it's in 
>>>> native_safe_halt().
>>>> The code sequence confirms this, it is an intentional sti;hlt 
>>>> condition.
>>>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>>>
>>> Well, strange.  The intent of that patch was to make the clock never 
>>> go backwards.  Perhaps the change made it go forwards by a large 
>>> amount, and the guest is not hung, just waiting for some timer that 
>>> is far in the future.
>>>
>>> Can you do something like
>>>
>>> -      if (ret < last)
>>> +      if (ret < last) {
>>> +            static u64 max_delta;
>>> +            if (last - ret > max_delta) {
>>> +                  max_delta = last - ret;
>>> +                  printk("advancing kvmclock by: %llx\n", max_delta);
>>> +            }
>>>               return last;
>>> +      }
>>>
>>> to see if this is happening?
>> No change, it still hangs. I also don't see the printk.
>> The output with smp=1 is like this:
>> [    1.186549] ACPI: Power Button [PWRF]
>> [    1.189204] XENFS: not registering filesystem on non-xen platform
>> [    1.195001] Non-volatile memory driver v1.3
>> [    1.196358] Linux agpgart interface v0.103
>> [    1.197687] [drm] Initialized drm 1.1.0 20060810
>> [    1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without 
>> intel_agp module!
>> [    1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> ÿ[    1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> [    1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> [    1.467153] brd: module loaded
>> [    1.469245] loop: module loaded
>> With smp=2 the output stops just before the strange "y" character (I 
>> guess it's ASCII 255), which I assume is an artifact of the serial 
>> console.
>> As you can see at the timestamps, it takes some time between the last 
>> shown line (1.201213) and the first missing one (1.460714).
> 
> Weird.  Maybe the clock goes crazy.
> 
> Let's see if it jumps forward a lot:
> 
>          } while (unlikely(last != ret));
> +
> +       {
> +            static u64 last_report;
> +            if (ret > last_report + 10000) {
> +                    last_report = ret;
> +                    printk("kvmclock: %llx\n", ret);
> +            }
> +
> +       }
> 
>          return ret;
>   }
> 
> Worth updating the 'return last' to update ret and goto the new code, so 
> we don't miss that path.
Did that. There is _a lot_ of output (about 350 lines per second via the 
115k serial console), both with smp=1 and smp=2.
The majority differ by about 2,000,000 (ticks?), but a handful of 
them are in the range of 20 million. No difference between smp=2 and smp=1.
I also get some "BUG: recent printk recursion!" and I don't see any 
kernel boot progress beyond outputting the BogoMIPS value.
BTW: I found two messages from your earlier debug statement:
[    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
[    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock

Regards,
Andre.

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 13:48               ` Andre Przywara
@ 2010-07-27 13:58                 ` Avi Kivity
  2010-07-27 14:55                   ` Andre Przywara
  0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 13:58 UTC (permalink / raw)
  To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

  On 07/27/2010 04:48 PM, Andre Przywara wrote:
>> Weird.  Maybe the clock goes crazy.
>>
>> Let's see if it jumps forward a lot:
>>
>>          } while (unlikely(last != ret));
>> +
>> +       {
>> +            static u64 last_report;
>> +            if (ret > last_report + 10000) {
>> +                    last_report = ret;
>> +                    printk("kvmclock: %llx\n", ret);
>> +            }
>> +
>> +       }
>>
>>          return ret;
>>   }
>>
>> Worth updating the 'return last' to update ret and goto the new code, 
>> so we don't miss that path.
>
> Did that. There is _a lot_ of output (about 350 lines per second via 
> the 115k serial console), both with smp=1 and smp=2.
> The majority is differing about 2,000,000 (ticks?), but a handful of 
> them are in the range of 20 million. 

nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
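
The arithmetic behind that, as a small sketch. The helper names are
made up, and the bytes-per-line estimate assumes 8N1 framing on the
serial line (10 bits on the wire per byte), which the thread does not
state.

```c
#include <stdint.h>

/* Average gap between printed lines, in nanoseconds. */
static uint64_t ns_per_line(uint64_t lines_per_second)
{
	return UINT64_C(1000000000) / lines_per_second;
}

/*
 * Rough byte budget per line on a serial console.
 * Assumption: 8N1 framing, i.e. 10 bits on the wire per byte.
 */
static uint64_t bytes_per_line(uint64_t baud, uint64_t lines_per_second)
{
	return (baud / 10) / lines_per_second;
}
```

ns_per_line(350) is a little under 3 ms, the low end of the 2-20ms
range. At 115200 baud, bytes_per_line() leaves roughly 30 bytes per
line, about the size of one short timestamped line, so the serial
console may well be what paces the output.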

> No difference between smp=2 and smp=1.
> I also get some "BUG: recent printk recursion!" and I don't see any 
> kernel boot progress beyond outputting the BogoMIPS value.

Right, printk() wants the time too.

> BTW: I found two messages from your earlier debug statement:
> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock

Those are from kvmclock initialization, not from the older patch.

I'm completely confused, everything seems to be in order.

Let's see.  If you s/return last/return ret/ in the original, does this 
help things along?  This makes pvclock drop the computation and should 
be exactly the same as before the patch.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 13:58                 ` Avi Kivity
@ 2010-07-27 14:55                   ` Andre Przywara
  2010-07-27 21:51                     ` Andre Przywara
  0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 14:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Avi Kivity wrote:
>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>> Weird.  Maybe the clock goes crazy.
>>>
>>> Let's see if it jumps forward a lot:
>>>
>>>          } while (unlikely(last != ret));
>>> +
>>> +       {
>>> +            static u64 last_report;
>>> +            if (ret > last_report + 10000) {
>>> +                    last_report = ret;
>>> +                    printk("kvmclock: %llx\n", ret);
>>> +            }
>>> +
>>> +       }
>>>
>>>          return ret;
>>>   }
>>>
>>> Worth updating the 'return last' to update ret and goto the new code, 
>>> so we don't miss that path.
>> Did that. There is _a lot_ of output (about 350 lines per second via 
>> the 115k serial console), both with smp=1 and smp=2.
>> The majority is differing about 2,000,000 (ticks?), but a handful of 
>> them are in the range of 20 million. 
> 
> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
> 
>> No difference between smp=2 and smp=1.
>> I also get some "BUG: recent printk recursion!" and I don't see any 
>> kernel boot progress beyond outputting the BogoMIPS value.
> 
> Right, printk() wants the time too.
> 
>> BTW: I found two messages from your earlier debug statement:
>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
> 
> Those are from kvmclock initialization, not from the older patch.
> 
> I'm completely confused, everything seems to be in order.
> 
> Let's see.  if you s/return last/return ret/ in the original, does this 
> help things along?  this makes pvclock drop the computation and should 
> be exactly the same as before the patch.
Yes, this works, both SMP versions boot. I see a very short break 
after the line in question, but then it proceeds well.
Thanks for your help, now I got a much better insight into the issue. I 
will see if I can find something more.

Regards,
Andre.



-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-26 18:59     ` Arjan Koers
@ 2010-07-27 21:00       ` Arjan Koers
  2010-07-28 10:37         ` Avi Kivity
  0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-07-27 21:00 UTC (permalink / raw)
  To: kvm

On 2010-07-26 20:59, Arjan Koers wrote:

> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
> kernels hang during boot.


It appears that last is way ahead of ret twice.
The kernel boots with this debug patch that makes the clock go
backwards if the difference is big:

 	last = atomic64_read(&last_value);
 	do {
-		if (ret < last)
-			return last;
+		if (ret < last) {
+			if ( last - ret < 25000000 )
+				return last;
+			else
+				printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
+		}
 		last = atomic64_cmpxchg(&last_value, last, ret);
 	} while (unlikely(last != ret));


Here's the boot log:

[    0.000000] Linux version 2.6.35-rc6-201007272047-guestmp+ (arjan@dev-lenny) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Tue Jul 27 20:52:36 UTC 2010
[    0.000000] Command line: root=/dev/vda1 ro single
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[    0.000000]  BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[    0.000000]  0000000000 - 001fe00000 page 2M
[    0.000000]  001fe00000 - 001fffd000 page 4k
[    0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[    0.000000] RAMDISK: 1fdfc000 - 1ffed000
[    0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[    0.000000] ACPI: FACS 000000001ffffe00 00040
[    0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] kvm-clock: Using msrs 12 and 11
[    0.000000] kvm-clock: cpu 0, msr 0:1344c01, boot clock
[    0.000000]  [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001c00000-ffff8800023fffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000001 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   empty
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000001 -> 0x0000009b
[    0.000000]     0: 0x00000100 -> 0x0001fffd
[    0.000000] On node 0 totalpages: 130967
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 3938 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 1736 pages used for memmap
[    0.000000]   DMA32 zone: 125237 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 40
[    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] early_res array is doubled to 64 at [3000 - 37ff]
[    0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s75712 r8192 d22592 u1048576
[    0.000000] pcpu-alloc: s75712 r8192 d22592 u1048576 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1
[    0.000000] kvm-clock: cpu 0, msr 0:1411c01, primary cpu clock
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129175
[    0.000000] Kernel command line: root=/dev/vda1 ro single
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Subtract (38 early reservations)
[    0.000000]   #1 [0001000000 - 00013f4378]   TEXT DATA BSS
[    0.000000]   #2 [001fdfc000 - 001ffed000]         RAMDISK
[    0.000000]   #3 [000009bc00 - 0000100000]   BIOS reserved
[    0.000000]   #4 [00013f5000 - 00013f5071]             BRK
[    0.000000]   #5 [0000001000 - 0000003000]      TRAMPOLINE
[    0.000000]   #6 [0000008000 - 0000009000]         PGTABLE
[    0.000000]   #7 [00013f5080 - 00013f6080]         BOOTMEM
[    0.000000]   #8 [00013f4380 - 00013f43e0]         BOOTMEM
[    0.000000]   #9 [0001bf7000 - 0001bf8000]         BOOTMEM
[    0.000000]   #10 [0001bf8000 - 0001bf9000]         BOOTMEM
[    0.000000]   #11 [0001c00000 - 0002400000]        MEMMAP 0
[    0.000000]   #12 [00013f4400 - 00013f4580]         BOOTMEM
[    0.000000]   #13 [00013f6080 - 00013f9080]         BOOTMEM
[    0.000000]   #14 [00013fa000 - 00013fb000]         BOOTMEM
[    0.000000]   #15 [00013f4580 - 00013f45c1]         BOOTMEM
[    0.000000]   #16 [00013f4600 - 00013f4643]         BOOTMEM
[    0.000000]   #17 [00013f4680 - 00013f4808]         BOOTMEM
[    0.000000]   #18 [00013f4840 - 00013f48a8]         BOOTMEM
[    0.000000]   #19 [00013f48c0 - 00013f4928]         BOOTMEM
[    0.000000]   #20 [00013f4940 - 00013f49a8]         BOOTMEM
[    0.000000]   #21 [00013f49c0 - 00013f4a28]         BOOTMEM
[    0.000000]   #22 [00013f4a40 - 00013f4aa8]         BOOTMEM
[    0.000000]   #23 [00013f4ac0 - 00013f4b28]         BOOTMEM
[    0.000000]   #24 [00013f4b40 - 00013f4b59]         BOOTMEM
[    0.000000]   #25 [00013f4b80 - 00013f4b99]         BOOTMEM
[    0.000000]   #26 [0001400000 - 000141a000]         BOOTMEM
[    0.000000]   #27 [0001500000 - 000151a000]         BOOTMEM
[    0.000000]   #28 [00013f4bc0 - 00013f4bc8]         BOOTMEM
[    0.000000]   #29 [00013f4c00 - 00013f4c08]         BOOTMEM
[    0.000000]   #30 [00013f4c40 - 00013f4c48]         BOOTMEM
[    0.000000]   #31 [00013f4c80 - 00013f4c90]         BOOTMEM
[    0.000000]   #32 [00013f4cc0 - 00013f4e00]         BOOTMEM
[    0.000000]   #33 [00013f4e00 - 00013f4e60]         BOOTMEM
[    0.000000]   #34 [00013f4e80 - 00013f4ee0]         BOOTMEM
[    0.000000]   #35 [00013fb000 - 00013ff000]         BOOTMEM
[    0.000000]   #36 [000141a000 - 000149a000]         BOOTMEM
[    0.000000]   #37 [000149a000 - 00014da000]         BOOTMEM
[    0.000000] Memory: 508600k/524276k available (2135k kernel code, 408k absent, 15268k reserved, 1134k data, 464k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU-based detection of stalled CPUs is disabled.
[    0.000000] 	Verbose stalled-CPUs detection is disabled.
[    0.000000] NR_IRQS:448
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] hpet clockevent registered
[    0.000000] Detected 2799.520 MHz processor.
[    0.016000] Calibrating delay loop (skipped) preset value.. 5599.04 BogoMIPS (lpj=11198080)
[    0.016000] pid_max: default: 32768 minimum: 301
[    0.016000] Mount-cache hash table entries: 256
[    0.016000] using C1E aware idle routine
[    0.016000] Performance Events: AMD PMU driver.
[    0.016000] ... version:                0
[    0.016000] ... bit width:              48
[    0.016000] ... generic registers:      4
[    0.016004] ... value mask:             0000ffffffffffff
[    0.016388] ... max period:             00007fffffffffff
[    0.016767] ... fixed-purpose events:   0
[    0.017109] ... event mask:             000000000000000f
[    0.021772] Freeing SMP alternatives: 12k freed
[    0.022135] ACPI: Core revision 20100428
[    0.024224] Setting APIC routing to flat
[    0.026212] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.026608] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.028000] Booting Node   0, Processors  #1 Ok.
[    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
[    0.037105] pvclock backwards: ret = 108372ffd10b; last = 210aff03671a
[    0.037119] BUG: recent printk recursion!
[    0.037120] <6>Brought up 2 CPUs
[    0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
[    0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[    0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
[    0.044219] BUG: recent printk recursion!
[    0.044220] <6>NET: Registered protocol family 16
[    0.048108] ACPI: bus type pci registered
[    0.048447] PCI: Using configuration type 1 for base access
[    0.048855] PCI: Using configuration type 1 for extended access
[    0.049280] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.049280] mtrr: your CPUs had inconsistent MTRRdefType settings
[    0.049280] mtrr: probably your BIOS does not setup all CPUs.
[    0.052005] mtrr: corrected configuration.
[    0.060192] bio: create slab <bio-0> at 0
[    0.060806] ACPI: EC: Look up EC in DSDT
[    0.065677] ACPI: Interpreter enabled
[    0.066004] ACPI: (supports S0 S5)
[    0.066406] ACPI: Using IOAPIC for interrupt routing
[    0.084131] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[    0.086541] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.088068] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
[    0.088072] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff] (ignored)
[    0.088075] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
[    0.088078] pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
[    0.088713] pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
[    0.089004] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.092010] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[    0.097988] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[    0.098912] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[    0.104911] pci 0000:00:03.0: reg 10: [io  0xc020-0xc03f]
[    0.104980] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
[    0.105330] pci 0000:00:04.0: reg 10: [io  0xc040-0xc05f]
[    0.105636] pci 0000:00:05.0: reg 10: [io  0xc080-0xc0bf]
[    0.105940] pci 0000:00:06.0: reg 10: [io  0xc0c0-0xc0ff]
[    0.106325] pci_bus 0000:00: on NUMA node 0
[    0.106382] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.116539] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.117359] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.118458] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.120675] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.121798] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.124010] vgaarb: loaded
[    0.124570] PCI: Using ACPI for IRQ routing
[    0.124605] PCI: pci_cache_line_size set to 64 bytes
[    0.124781] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
[    0.124789] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[    0.124913] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.128044] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.129060] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.140184] Switching to clocksource kvm-clock
[    0.140791] pnp: PnP ACPI init
[    0.141564] ACPI: bus type pnp registered
[    0.148623] pnp: PnP ACPI: found 7 devices
[    0.149737] ACPI: ACPI bus type pnp unregistered
[    0.161792] pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
[    0.161801] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
[    0.162325] NET: Registered protocol family 2
[    0.163891] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.166098] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[    0.169226] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.170987] TCP: Hash tables configured (established 16384 bind 16384)
[    0.172335] TCP reno registered
[    0.173378] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    0.174607] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    0.176343] NET: Registered protocol family 1
[    0.197118] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.197502] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.197960] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.198360] pci 0000:00:02.0: Boot video device
[    0.198385] PCI: CLS 0 bytes, default 64
[    0.198451] Unpacking initramfs...
[    0.231639] Freeing initrd memory: 1988k freed
[    0.241648] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.243184] msgmni has been set to 997
[    0.244500] alg: No test for stdrng (krng)
[    0.245449] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.246246] io scheduler noop registered
[    0.246664] io scheduler deadline registered
[    0.247248] io scheduler cfq registered (default)
[    0.295494] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.298886] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.299496] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.300600] mice: PS/2 mouse device common for all mice
[    0.302311] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[    0.303099] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.303836] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.304006] cpuidle: using governor ladder
[    0.304067] cpuidle: using governor menu
[    0.306138] TCP cubic registered
[    0.307334] NET: Registered protocol family 17
[    0.310261] rtc_cmos 00:01: setting system clock to 2010-07-27 20:56:06 UTC (1280264166)
[    0.312599] Freeing unused kernel memory: 464k freed
[    0.513685] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.514278] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[    0.514928] virtio-pci 0000:00:03.0: setting latency timer to 64
[    0.515092] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.515493] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[    0.516198] virtio-pci 0000:00:04.0: setting latency timer to 64
[    0.536171] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.536565] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[    0.537225] virtio-pci 0000:00:05.0: setting latency timer to 64
[    0.537386] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.537762] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[    0.538410] virtio-pci 0000:00:06.0: setting latency timer to 64
[    0.634593]  vda: vda1 vda2 < vda5 >
[    0.649159]  vdb: vdb1
[    1.013119] Clocksource tsc unstable (delta = 582181654385 ns)
[    1.044251] EXT3-fs: barriers not enabled
[    1.063011] kjournald starting.  Commit interval 5 seconds
[    1.063115] EXT3-fs (vda1): mounted filesystem with writeback data mode
[    2.620528] udevd version 125 started
[    2.865930] virtio-pci 0000:00:03.0: irq 40 for MSI/MSI-X
[    2.865945] virtio-pci 0000:00:03.0: irq 41 for MSI/MSI-X
[    2.865958] virtio-pci 0000:00:03.0: irq 42 for MSI/MSI-X
[    2.910519] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    2.912585] ACPI: Power Button [PWRF]
[    2.921754] ACPI: acpi_idle registered with cpuidle
[    4.408057] Adding 409620k swap on /dev/vda5.  Priority:-1 extents:1 across:409620k
[    4.959959] EXT3-fs (vda1): using internal journal
[    5.495306] loop: module loaded
[    9.680594] hrtimer: interrupt took 11934233 ns
[   10.246663] NET: Registered protocol family 10
[   10.247565] lo: Disabled Privacy Extensions
[   20.576118] eth0: no IPv6 routers present
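
As a worked check of the first "pvclock backwards" line in the log
above, this small helper computes how far last is ahead of ret. The
function names are made up here, and the values are taken to be
nanoseconds, the same unit as the 25000000 threshold in the debug
patch.

```c
#include <stdint.h>

/* How far 'last' is ahead of 'ret', or 0 if it is not ahead at all. */
static uint64_t ahead_ns(uint64_t ret, uint64_t last)
{
	return last > ret ? last - ret : 0;
}

/* Threshold from the debug patch: smaller gaps are still clamped to last. */
static int exceeds_debug_threshold(uint64_t ret, uint64_t last)
{
	return ahead_ns(ret, last) >= UINT64_C(25000000);
}
```

For ret = 108372ffd10b and last = 210aff03671a the gap is
0x10878c03960f ns, roughly 18174 seconds, i.e. about five hours. That
is far beyond the 25 ms threshold, which is why the patched kernel
prints the message instead of returning last.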

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 14:55                   ` Andre Przywara
@ 2010-07-27 21:51                     ` Andre Przywara
  2010-07-28  3:00                       ` Zachary Amsden
  2010-07-28 12:25                       ` Andre Przywara
  0 siblings, 2 replies; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 21:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Andre Przywara wrote:
> Avi Kivity wrote:
>>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>> Weird.  Maybe the clock goes crazy.
>>>>
>>>> Let's see if it jumps forward a lot:
>>>>
>>>>          } while (unlikely(last != ret));
>>>> +
>>>> +       {
>>>> +            static u64 last_report;
>>>> +            if (ret > last_report + 10000) {
>>>> +                    last_report = ret;
>>>> +                    printk("kvmclock: %llx\n", ret);
>>>> +            }
>>>> +
>>>> +       }
>>>>
>>>>          return ret;
>>>>   }
>>>>
>>>> Worth updating the 'return last' to update ret and goto the new code, 
>>>> so we don't miss that path.
>>> Did that. There is _a lot_ of output (about 350 lines per second via 
>>> the 115k serial console), both with smp=1 and smp=2.
>>> The majority is differing about 2,000,000 (ticks?), but a handful of 
>>> them are in the range of 20 million. 
>> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
>>
>>> No difference between smp=2 and smp=1.
>>> I also get some "BUG: recent printk recursion!" and I don't see any 
>>> kernel boot progress beyond outputting the BogoMIPS value.
>> Right, printk() wants the time too.
>>
>>> BTW: I found two messages from your earlier debug statement:
>>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>> Those are from kvmclock initialization, not from the older patch.
>>
>> I'm completely confused, everything seems to be in order.
>>
>> Let's see.  if you s/return last/return ret/ in the original, does this 
>> help things along?  this makes pvclock drop the computation and should 
>> be exactly the same as before the patch.
> Yes, this works, both smp version boot. I see a short very short break 
> after the line in question, but then it proceeds well.
> Thanks for your help, now I got a much better insight into the issue. I 
> will see if I can find something more.
Did some more investigation; some observations:
- The cmpxchg does not seem to be the problem: I never saw the loop 
iterate more than once.
- Turning off printk timestamps makes the bug go away. But I guess that 
just hides or defers it, and it is no real workaround anyway.
- I instrumented the "if (ret < last) return last;" statement. Once the 
kernel hangs I get printks only from there, although it had been hit before:
----------
[    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.820000] returning last instead (cnt=19001)
[    0.820000] returning last instead (cnt=20001)
The last line repeats forever with the same timestamp; the counter 
(counting the number of "return last;") increments about 3500 times/second.

I will see if I find something more...

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 21:51                     ` Andre Przywara
@ 2010-07-28  3:00                       ` Zachary Amsden
  2010-07-28  7:55                         ` Andre Przywara
  2010-07-28 12:25                       ` Andre Przywara
  1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-28  3:00 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Avi Kivity, glommer@redhat.com, KVM list

On 07/27/2010 11:51 AM, Andre Przywara wrote:
> Andre Przywara wrote:
>> Avi Kivity wrote:
>>>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>> Wierd.  Maybe the clock goes crazy.
>>>>>
>>>>> Let's see if it jumps forward alot:
>>>>>
>>>>>          } while (unlikely(last != ret));
>>>>> +
>>>>> +       {
>>>>> +            static u64 last_report;
>>>>> +            if (ret > last_report + 10000) {
>>>>> +                    last_report = ret;
>>>>> +                    printk("kvmclock: %llx\n", ret);
>>>>> +            }
>>>>> +
>>>>> +       }
>>>>>
>>>>>          return ret;
>>>>>   }
>>>>>
>>>>> Worth updating the 'return last' to update ret and goto the new 
>>>>> code, so we don't miss that path.
>>>> Did that. There is _a lot_ of output (about 350 lines per second 
>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>> The majority is differing about 2,000,000 (ticks?), but a handful 
>>>> of them are in the range of 20 million. 
>>> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
>>>
>>>> No difference between smp=2 and smp=1.
>>>> I also get some "BUG: recent printk recursion!" and I don't see any 
>>>> kernel boot progress beyond outputting the BogoMIPS value.
>>> Right, printk() wants the time too.
>>>
>>>> BTW: I found two message from your earlier debug statement:
>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>> Those are from kvmclock initialization, not from the older patch.
>>>
>>> I'm completely confused, everything seems to be in order.
>>>
>>> Let's see.  if you s/return last/return ret/ in the original, does 
>>> this help things along?  this makes pvclock drop the computation and 
>>> should be exactly the same as before the patch.
>> Yes, this works, both smp version boot. I see a short very short 
>> break after the line in question, but then it proceeds well.
>> Thanks for your help, now I got a much better insight into the issue. 
>> I will see if I can find something more.
> Did some more investigations, some observations:
> - The cmpxchg does not seem to be a problem, I didn't see the loop 
> iterated more than once.
> - Turning off printk-timestamps makes the bug go away. But I guess it 
> is just hiding or deferring it, and it's no real workaround anyway.
> - I instrumented the "if (ret < last) return last;" statement, when 
> the kernel hangs I get only printks from there, although it has hit 
> before:
> ----------
> [    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    0.820000] returning last instead (cnt=19001)
> [    0.820000] returning last instead (cnt=20001)
> The last line repeats forever with the same timestamp, the counter 
> (counting the number of "return last;") increments about 3500 
> times/second.
>
> I will see if I find something more...
>
> Regards,
> Andre.
>
gcc --version?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-28  3:00                       ` Zachary Amsden
@ 2010-07-28  7:55                         ` Andre Przywara
  0 siblings, 0 replies; 81+ messages in thread
From: Andre Przywara @ 2010-07-28  7:55 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Avi Kivity, glommer@redhat.com, KVM list

Zachary Amsden wrote:
> On 07/27/2010 11:51 AM, Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Avi Kivity wrote:
>>>>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>> Wierd.  Maybe the clock goes crazy.
>>>>>>
>>>>>> Let's see if it jumps forward alot:
>>>>>>
>>>>>>          } while (unlikely(last != ret));
>>>>>> +
>>>>>> +       {
>>>>>> +            static u64 last_report;
>>>>>> +            if (ret > last_report + 10000) {
>>>>>> +                    last_report = ret;
>>>>>> +                    printk("kvmclock: %llx\n", ret);
>>>>>> +            }
>>>>>> +
>>>>>> +       }
>>>>>>
>>>>>>          return ret;
>>>>>>   }
>>>>>>
>>>>>> Worth updating the 'return last' to update ret and goto the new 
>>>>>> code, so we don't miss that path.
>>>>> Did that. There is _a lot_ of output (about 350 lines per second 
>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>> The majority is differing about 2,000,000 (ticks?), but a handful 
>>>>> of them are in the range of 20 million. 
>>>> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
>>>>
>>>>> No difference between smp=2 and smp=1.
>>>>> I also get some "BUG: recent printk recursion!" and I don't see any 
>>>>> kernel boot progress beyond outputting the BogoMIPS value.
>>>> Right, printk() wants the time too.
>>>>
>>>>> BTW: I found two message from your earlier debug statement:
>>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>> Those are from kvmclock initialization, not from the older patch.
>>>>
>>>> I'm completely confused, everything seems to be in order.
>>>>
>>>> Let's see.  if you s/return last/return ret/ in the original, does 
>>>> this help things along?  this makes pvclock drop the computation and 
>>>> should be exactly the same as before the patch.
>>> Yes, this works, both smp version boot. I see a short very short 
>>> break after the line in question, but then it proceeds well.
>>> Thanks for your help, now I got a much better insight into the issue. 
>>> I will see if I can find something more.
>> Did some more investigations, some observations:
>> - The cmpxchg does not seem to be a problem, I didn't see the loop 
>> iterated more than once.
>> - Turning off printk-timestamps makes the bug go away. But I guess it 
>> is just hiding or deferring it, and it's no real workaround anyway.
>> - I instrumented the "if (ret < last) return last;" statement, when 
>> the kernel hangs I get only printks from there, although it has hit 
>> before:
>> ----------
>> [    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> [    0.820000] returning last instead (cnt=19001)
>> [    0.820000] returning last instead (cnt=20001)
>> The last line repeats forever with the same timestamp, the counter 
>> (counting the number of "return last;") increments about 3500 
>> times/second.
>>
>> I will see if I find something more...
>>

> gcc --version?
That would be 4.3.3.
I also compiled the guest kernel with 4.4.4; that made no difference.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 21:00       ` Arjan Koers
@ 2010-07-28 10:37         ` Avi Kivity
  2010-07-31  0:34           ` Arjan Koers
  0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-28 10:37 UTC (permalink / raw)
  To: Arjan Koers; +Cc: kvm, Zachary Amsden

  On 07/28/2010 12:00 AM, Arjan Koers wrote:
> On 2010-07-26 20:59, Arjan Koers wrote:
>
>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>> kernels hang during boot.
>
> It appears that last is way ahead of ret twice.
> The kernel boots with this debug patch that makes the clock go
> backwards if the difference is big:
>
>   	last = atomic64_read(&last_value);
>   	do {
> -		if (ret<  last)
> -			return last;
> +		if (ret<  last) {
> +			if ( last - ret<  25000000 )
> +				return last;
> +			else
> +				printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
> +		}
>   		last = atomic64_cmpxchg(&last_value, last, ret);
>   	} while (unlikely(last != ret));
>
>
>
> [    0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
> [    0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [    0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a

Zaaaacchhhh?!

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-27 21:51                     ` Andre Przywara
  2010-07-28  3:00                       ` Zachary Amsden
@ 2010-07-28 12:25                       ` Andre Przywara
  2010-07-30 22:54                         ` Zachary Amsden
  1 sibling, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-28 12:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list

Andre Przywara wrote:
> Andre Przywara wrote:
>> Avi Kivity wrote:
>>>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>> Wierd.  Maybe the clock goes crazy.
>>>>>
>>>>> Let's see if it jumps forward alot:
>>>>>
>>>>>          } while (unlikely(last != ret));
>>>>> +
>>>>> +       {
>>>>> +            static u64 last_report;
>>>>> +            if (ret > last_report + 10000) {
>>>>> +                    last_report = ret;
>>>>> +                    printk("kvmclock: %llx\n", ret);
>>>>> +            }
>>>>> +
>>>>> +       }
>>>>>
>>>>>          return ret;
>>>>>   }
>>>>>
>>>>> Worth updating the 'return last' to update ret and goto the new code, 
>>>>> so we don't miss that path.
>>>> Did that. There is _a lot_ of output (about 350 lines per second via 
>>>> the 115k serial console), both with smp=1 and smp=2.
>>>> The majority is differing about 2,000,000 (ticks?), but a handful of 
>>>> them are in the range of 20 million. 
>>> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
>>>
>>>> No difference between smp=2 and smp=1.
>>>> I also get some "BUG: recent printk recursion!" and I don't see any 
>>>> kernel boot progress beyond outputting the BogoMIPS value.
>>> Right, printk() wants the time too.
>>>
>>>> BTW: I found two message from your earlier debug statement:
>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>> Those are from kvmclock initialization, not from the older patch.
>>>
>>> I'm completely confused, everything seems to be in order.
>>>
>>> Let's see.  if you s/return last/return ret/ in the original, does this 
>>> help things along?  this makes pvclock drop the computation and should 
>>> be exactly the same as before the patch.
>> Yes, this works, both smp version boot. I see a short very short break 
>> after the line in question, but then it proceeds well.
>> Thanks for your help, now I got a much better insight into the issue. I 
>> will see if I can find something more.
> Did some more investigations, some observations:
> - The cmpxchg does not seem to be a problem, I didn't see the loop 
> iterated more than once.
> - Turning off printk-timestamps makes the bug go away. But I guess it is 
> just hiding or deferring it, and it's no real workaround anyway.
> - I instrumented the "if (ret < last) return last;" statement, when the 
> kernel hangs I get only printks from there, although it has hit before:
> ----------
> [    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    0.820000] returning last instead (cnt=19001)
> [    0.820000] returning last instead (cnt=20001)
> The last line repeats forever with the same timestamp, the counter 
> (counting the number of "return last;") increments about 3500 times/second.
> 
> I will see if I find something more...
Added some more instrumentation; it seems the values read from the 
pvclock are bogus *sometimes*:
  returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
This is from the first time the if-statement triggers. So I guess the 
value read is ridiculously far in the future (multiple days), so next 
calls to clocksource_read() will always return this bogus last value.
This means that the clock does not make progress (for several days) and 
thus timing loops will never come to an end. I also instrumented the 
serial driver, the last thing I saw was autoconfig_irq, where obviously 
udelay() is called.
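
To illustrate the effect described above, here is a minimal user-space 
sketch (not the kernel code) of the kind of clamp the bisected commit adds. 
The names monotonic_read() and ns_to_days() are made up for illustration, 
and C11 atomics stand in for the kernel's atomic64_cmpxchg(). It assumes the 
values are nanoseconds, as Avi noted earlier in the thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Highest clock value handed out so far (hypothetical stand-in for last_value). */
static _Atomic uint64_t last_value;

/* Return raw, unless some earlier reading was already later than raw. */
uint64_t monotonic_read(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    for (;;) {
        if (raw < last)
            return last;        /* clamp: never go backwards */
        /* Try to publish raw; on failure 'last' is reloaded and we retry. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }
}

/* Convert nanoseconds to whole days, to judge how far off a reading is. */
uint64_t ns_to_days(uint64_t ns)
{
    return ns / (86400ULL * 1000000000ULL);
}
```

Feeding this one far-future reading such as 2778021535795841 and then 
ordinary small readings returns the far-future value every time, which 
would match a delay loop that never terminates. ns_to_days(1389078312510470) 
comes out at 16, so the diff above is about 16 days if the unit is 
nanoseconds.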

Does that ring a bell with someone?

I will now concentrate on the pvclock readout/HV write part to see which 
of the values used here are wrong.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-28 12:25                       ` Andre Przywara
@ 2010-07-30 22:54                         ` Zachary Amsden
  2010-08-02 10:12                           ` Andre Przywara
  0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-30 22:54 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Avi Kivity, glommer@redhat.com, KVM list

On 07/28/2010 02:25 AM, Andre Przywara wrote:
> Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Avi Kivity wrote:
>>>>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>> Wierd.  Maybe the clock goes crazy.
>>>>>>
>>>>>> Let's see if it jumps forward alot:
>>>>>>
>>>>>>          } while (unlikely(last != ret));
>>>>>> +
>>>>>> +       {
>>>>>> +            static u64 last_report;
>>>>>> +            if (ret > last_report + 10000) {
>>>>>> +                    last_report = ret;
>>>>>> +                    printk("kvmclock: %llx\n", ret);
>>>>>> +            }
>>>>>> +
>>>>>> +       }
>>>>>>
>>>>>>          return ret;
>>>>>>   }
>>>>>>
>>>>>> Worth updating the 'return last' to update ret and goto the new 
>>>>>> code, so we don't miss that path.
>>>>> Did that. There is _a lot_ of output (about 350 lines per second 
>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>> The majority is differing about 2,000,000 (ticks?), but a handful 
>>>>> of them are in the range of 20 million. 
>>>> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
>>>>
>>>>> No difference between smp=2 and smp=1.
>>>>> I also get some "BUG: recent printk recursion!" and I don't see 
>>>>> any kernel boot progress beyond outputting the BogoMIPS value.
>>>> Right, printk() wants the time too.
>>>>
>>>>> BTW: I found two message from your earlier debug statement:
>>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>> Those are from kvmclock initialization, not from the older patch.
>>>>
>>>> I'm completely confused, everything seems to be in order.
>>>>
>>>> Let's see.  if you s/return last/return ret/ in the original, does 
>>>> this help things along?  this makes pvclock drop the computation 
>>>> and should be exactly the same as before the patch.
>>> Yes, this works, both smp version boot. I see a short very short 
>>> break after the line in question, but then it proceeds well.
>>> Thanks for your help, now I got a much better insight into the 
>>> issue. I will see if I can find something more.
>> Did some more investigations, some observations:
>> - The cmpxchg does not seem to be a problem, I didn't see the loop 
>> iterated more than once.
>> - Turning off printk-timestamps makes the bug go away. But I guess it 
>> is just hiding or deferring it, and it's no real workaround anyway.
>> - I instrumented the "if (ret < last) return last;" statement, when 
>> the kernel hangs I get only printks from there, although it has hit 
>> before:
>> ----------
>> [    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> [    0.820000] returning last instead (cnt=19001)
>> [    0.820000] returning last instead (cnt=20001)
>> The last line repeats forever with the same timestamp, the counter 
>> (counting the number of "return last;") increments about 3500 
>> times/second.
>>
>> I will see if I find something more...
> Added some more instrumentation, seems like the values read from the 
> pvclock is bogus *sometimes*:
>  returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
> This is from the first time the if-statement triggers. So I guess the 
> value read is ridiculously far in the future (multiple days), so next 
> calls to clocksource_read() will always return this bogus last value.
> This means that the clock does not make progress (for several days) 
> and thus timing loops will never come to an end. I also instrumented 
> the serial driver, the last thing I saw was autoconfig_irq, where 
> obviously udelay() is called.
>
> Does that ring a bell with someone?
>
> I will now concentrate on the pvclock readout/HV write part to see 
> which of the values used here are wrong.

Have you gotten any further results on this?

I think the most likely explanation is that your host CPUs have their TSCs 
out of sync, and somehow this leaks over to pvclock.  Am I correct that it 
happens even with one guest VCPU?  What if you disable secondary host CPUs?

Zach

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-28 10:37         ` Avi Kivity
@ 2010-07-31  0:34           ` Arjan Koers
  2010-07-31  1:38             ` Zachary Amsden
  2010-07-31  2:39             ` Zachary Amsden
  0 siblings, 2 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31  0:34 UTC (permalink / raw)
  To: kvm; +Cc: Avi Kivity, Zachary Amsden

On 2010-07-28 12:37, Avi Kivity wrote:
>  On 07/28/2010 12:00 AM, Arjan Koers wrote:
>> On 2010-07-26 20:59, Arjan Koers wrote:
>>
>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>> kernels hang during boot.
>>
>> It appears that last is way ahead of ret twice.
>> The kernel boots with this debug patch that makes the clock go
>> backwards if the difference is big:
>>
>>       last = atomic64_read(&last_value);
>>       do {
>> -        if (ret<  last)
>> -            return last;
>> +        if (ret<  last) {
>> +            if ( last - ret<  25000000 )
>> +                return last;
>> +            else
>> +                printk("pvclock backwards: ret = %llx; last =
>> %llx\n", ret, last);
>> +        }
>>           last = atomic64_cmpxchg(&last_value, last, ret);
>>       } while (unlikely(last != ret));
>>
>>
>>
>> [    0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>> [    0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>> [    0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
> 
> Zaaaacchhhh?!
> 


The lists below show some debug data of the first 99 calls to
pvclock_clocksource_read since the kernel booted. The situation
after the 'do ... while (version != src->version)' loop is
displayed.

Meaning of the columns (after the leading call number, all values in hex):
- src pointer
- shadow.tsc_timestamp
- shadow.system_timestamp
- shadow.version
- native_read_tsc()
- delta = native_read_tsc() - shadow.tsc_timestamp
- offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
- ret = shadow.system_timestamp + offset

Fields left out, because they were the same for all rows:
- shadow.tsc_to_nsec_mul: b6dc43b6
- shadow.tsc_shift: ffffffff
- shadow.flags: 0
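
For readers who want to check rows of the logs by hand, the last three 
columns can be reproduced with a small user-space sketch like the one 
below. It is an approximation of the computation described in the column 
list, not the kernel source: pvclock_ns() is a made-up helper name, the 
multiply is split into two 32-bit halves to avoid a 128-bit type, and the 
tsc_shift value ffffffff is assumed to be a signed 8-bit -1.

```c
#include <stdint.h>

/* Scale a TSC delta to nanoseconds: apply the shift, then multiply by a
 * 32.32 fixed-point fraction, i.e. (delta * mul_frac) >> 32. */
uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* (hi * 2^32 + lo) * mul >> 32 == hi * mul + ((lo * mul) >> 32) */
    return (delta >> 32) * mul_frac +
           (((delta & 0xffffffffULL) * mul_frac) >> 32);
}

/* ret = shadow.system_timestamp + offset, with delta taken from the TSC. */
uint64_t pvclock_ns(uint64_t system_timestamp, uint64_t tsc_timestamp,
                    uint64_t tsc, uint32_t mul_frac, int8_t shift)
{
    uint64_t delta = tsc - tsc_timestamp;

    return system_timestamp + scale_delta(delta, mul_frac, shift);
}
```

With the first row of the cold boot log, 
pvclock_ns(0xb42c01d704c6, 0x2107d5a4e, 0x210d8d4b5, 0xb6dc43b6, -1) 
yields 0xb42c01f7b0a2, the ret value shown in that row.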

Debug log of guest after cold boot of virtual machine:
 1: ffff880001411c00     2107d5a4e b42c01d704c6   8294     210d8d4b5        5b7a67        20abdc  b42c01f7b0a2
 2: ffff880001411c00     2107d5a4e b42c01d704c6   8294     210dc2b61        5ed113        21dd1b  b42c01f8e1e1
 3: ffff880001411c00     21cb0d4a8 b42c0632768f   bb70     21cb10a00          3558          130d  b42c0632899c
 4: ffff880001411c00     21cb0d4a8 b42c0632768f   bb70     21cb11f17          4a6f          1a95  b42c06329124
 5: ffff880001411c00     21cceaad2 b42c063d1e45   bbd8     21ccec522          1a50           965  b42c063d27aa
 6: ffff880001411c00     21cde0644 b42c06429a42   bc10     21ce25457         44e13         1899a  b42c064423dc
 7: ffff880001411c00     21cf905c1 b42c064c3e76   bc46     21cfa182b         1126a          6201  b42c064ca077
 8: ffff880001411c00     21d088194 b42c0651c601   bc7a     21d089592          13fe           723  b42c0651cd24
 9: ffff880001411c00     21d1ad073 b42c06584fc3   bcde     21d1b135d          42ea          17e5  b42c065867a8
10: ffff880001411c00     21d2a3837 b42c065dd039   bd10     21d2a4825           fee           5b0  b42c065dd5e9
11: ffff880001411c00     21d38bab3 b42c0662fea6   bd42     21d38caa1           fee           5b0  b42c06630456
12: ffff880001411c00     21d47459b b42c06683029   bd78     21d475517           f7c           587  b42c066835b0
13: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57d70c          4a25          1a7a  b42c066e1ad9
14: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57d8d6          4bef          1b1e  b42c066e1b7d
15: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57da22          4d3b          1b94  b42c066e1bf3
16: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57fc5e          6f77          27ce  b42c066e282d
17: ffff880001411c00     21d67c77c b42c0673cc0a   bde4     21d67d685           f09           55e  b42c0673d168
18: ffff880001411c00     21d7625b2 b42c0678ed96   be16     21d763488           ed6           54c  b42c0678f2e2
19: ffff880001411c00     21df3db36 b42c06a5d222   be54     21dfa78b9         69d83         25cd5  b42c06a82ef7
20: ffff880001411c00     21df3db36 b42c06a5d222   be54     21dfa7a3f         69f09         25d61  b42c06a82f83
21: ffff880001411c00     21df3db36 b42c06a5d222   be54     21dfa7f8b         6a455         25f45  b42c06a83167
22: ffff880001411c00     21e3a50ea b42c06befbb1   be58     21e3c1750         1c666          a249  b42c06bf9dfa
23: ffff880001411c00     21e4bfe47 b42c06c54bc5   be92     21e4c4c61          4e1a          1be4  b42c06c567a9
24: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea4b224         20cb6          bb66  b42c06e4f922
25: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea52748         281da          e53c  b42c06e522f8
26: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea52907         28399          e5db  b42c06e52397
27: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea52a76         28508          e65f  b42c06e5241b
28: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea5c86a         322fc         11ec9  b42c06e55c85
29: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea60e3a         368cc         137b7  b42c06e57573
30: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea64dc8         3a85a         14e6a  b42c06e58c26
31: ffff880001411c00     21ed8a003 b42c06f78496   bf02     21efda28b        250288         d37d2  b42c0704bc68
32: ffff880001411c00     21f0e9488 b42c070ac93f   bf38     21f0eacdb          1853           8af  b42c070ad1ee
33: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230aeeac6          3e60          1646  b42c0d5636ed
34: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230af06d0          5a6a          204a  b42c0d5640f1
35: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b03f25         192bf          8fd6  b42c0d56b07d
36: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b043c8         19762          917f  b42c0d56b226
37: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b0526b         1a605          96b8  b42c0d56b75f
38: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b05632         1a9cc          9812  b42c0d56b8b9
39: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b07eaa         1d244          a686  b42c0d56c72d
40: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b094e9         1e883          ae78  b42c0d56cf1f
41: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b09962         1ecfc          b011  b42c0d56d0b8
42: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b10590         2592a          d6b4  b42c0d56f75b
43: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b1090d         25ca7          d7f3  b42c0d56f89a
44: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b10f99         26333          da49  b42c0d56faf0
45: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b11204         2659e          db27  b42c0d56fbce
46: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b1217c         27516          e0ad  b42c0d570154
47: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b1483f         29bd9          ee85  b42c0d570f2c
48: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b14ba6         29f40          efbc  b42c0d571063
49: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b15569         2a903          f338  b42c0d5713df
50: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b250b3         3a44d         14cf8  b42c0d576d9f
51: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b254a0         3a83a         14e5f  b42c0d576f06
52: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b25bd8         3af72         150f3  b42c0d57719a
53: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b25ec3         3b25d         151fd  b42c0d5772a4
54: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b5fcab         75045         29cad  b42c0d58bd54
55: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6013b         754d5         29e4e  b42c0d58bef5
56: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6b86c         80c06         2dfbc  b42c0d590063
57: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6bc41         80fdb         2e11a  b42c0d5901c1
58: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6c4e5         8187f         2e430  b42c0d5904d7
59: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6c776         81b10         2e51b  b42c0d5905c2
60: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b7f97b         94d15         35266  b42c0d59730d
61: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b864af         9b849         378b0  b42c0d599957
62: ffff880001411c00     23132e49d b42c0d855884   c16e     231599c3a        26b79d         dd3ec  b42c0d932c70
63: ffff880001411c00     23132e49d b42c0d855884   c16e     231599dbc        26b91f         dd476  b42c0d932cfa
64: ffff880001411c00     23132e49d b42c0d855884   c16e     231599f5f        26bac2         dd50c  b42c0d932d90
65: ffff880001411c00     231fdf357 b42c0dcddc47   c176     232046a74         6771d         24f1e  b42c0dd02b65
66: ffff880001411c00     231fdf357 b42c0dcddc47   c176     232046c53         678fc         24fca  b42c0dd02c11
67: ffff880001411c00     231fdf357 b42c0dcddc47   c176     232046da0         67a49         25040  b42c0dd02c87
68: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f62a2d         184df          8ae2  b42c0e2680c9
69: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f63478         18f2a          8e8f  b42c0e268476
70: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f63f61         19a13          9274  b42c0e26885b
71: ffff880001511c00     20afec946 b42bffe0b604    130 1f890681eacdf 1f88e5d1fe399  b433ab005565 1685faae10b69
72: ffff880001411c00     2334400d3 b42c0e424ccd   c180     23344a923          a850          3c1c  b42c0e4288e9
73: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2334632f1         2321e          c8c2  b42c0e43158f
74: ffff880001411c00     2334400d3 b42c0e424ccd   c180     23346a094         29fc1          efea  b42c0e433cb7
75: ffff880001411c00     2334400d3 b42c0e424ccd   c180     23347021d         3014a         112c0  b42c0e435f8d
76: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2335ba33b        17a268         870e5  b42c0e4abdb2
77: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2335ba9f8        17a925         8734d  b42c0e4ac01a
78: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2335bb17d        17b0aa         875fd  b42c0e4ac2ca
79: ffff880001511c00     20afec946 b42bffe0b604    130 1f89068505355 1f88e5d518a0f  b433ab1210ed 1685faaf2c6f1
80: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863ad24          e5d6          5215  b42c0e598931
81: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863b980          f232          567f  b42c0e598d9b
82: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863bbdd          f48f          5757  b42c0e598e73
83: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863e9d2         12284          67c1  b42c0e599edd
84: ffff880001411c00     2334400d3 b42c0e424ccd   c180     233855729        415656        1755cc  b42c0e59a299
85: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f890686410b4         14966          75a4  b42c0e59acc0
86: ffff880001411c00     2334400d3 b42c0e424ccd   c180     233857b87        417ab4        1762c9  b42c0e59af96
87: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f89068646b9e         1a450          961d  b42c0e59cd39
88: ffff880001411c00     2334400d3 b42c0e424ccd   c180     233894271        45419e        18bc1e  b42c0e5b08eb
89: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2338ab48a        46b3b7        19404c  b42c0e5b8d19
90: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2338adf39        46de66        194f8b  b42c0e5b9c58
91: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2338b39b8        4738e5        196fdc  b42c0e5bbca9
92: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686cf137          f756          5855  b42c0e5cd89a
93: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686cfd6f         1038e          5cb3  b42c0e5cdcf8
94: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686d9f4d         1a56c          9682  b42c0e5d16c7
95: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686e5610         25c2f          d7c8  b42c0e5d580d
96: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686e8326         28945          e7e2  b42c0e5d6827
97: ffff880001411c00     233907ea7 b42c0e5d9e8b   c182     23391ad48         12ea1          6c15  b42c0e5e0aa0
98: ffff880001411c00     233907ea7 b42c0e5d9e8b   c182     23391b539         13692          6eeb  b42c0e5e0d76
99: ffff880001411c00     233907ea7 b42c0e5d9e8b   c182     2339270a3         1f1fc          b1da  b42c0e5e5065

The data for the first CPU (ffff880001411c00) looks OK to me.
For the second CPU (ffff880001511c00), the contents of the shadow struct
appear to be wrong on lines 71 and 79: shadow.tsc_timestamp and
native_read_tsc() are very dissimilar, which results in a wrong value
of ret.
On line 80, the struct is OK again.
Notice that shadow.version appears to have been reset back to 0. That
doesn't happen when the guest is rebooted without stopping the virtual machine.

Another cold boot log:
67: ffff880001411c00     16e3478f3 ba1ec80cf347   a5dc     16e36c8e7         24ff4          d36a  ba1ec80dc6b1
68: ffff880001511c00     14de08f79 ba1ebc817cf9    122 209385d6572ce 209370f84e355  ba26cd1ae302 17445899c5ffb
69: ffff880001511c00 209385d659d5c ba1ec828b7d0      2 209385d6678d0          db74          4e60  ba1ec8290630
70: ffff880001411c00     16e860662 ba1ec82a1262   a5e6     16e86cd2f          c6cd          4700  ba1ec82a5962
71: ffff880001411c00     16e860662 ba1ec82a1262   a5e6     16e88c965         2c303          fc81  ba1ec82b0ee3
72: ffff880001411c00     16e860662 ba1ec82a1262   a5e6     16e893dec         3378a         12620  ba1ec82b3882
73: ffff880001411c00     16e860662 ba1ec82a1262   a5e6     16e89b1d1         3ab6f         14f84  ba1ec82b61e6
74: ffff880001511c00 209385d6f582c ba1ec82c313b      6 209385d7034c1          dc95          4ec7  ba1ec82c8002

Debug logs of guest after rebooting the guest without stopping the virtual machine:
64: ffff880001411c00     2c9f06974 b8418e488ace  d1b5a     2ca0781a3        17182f         83f87  b8418e50ca55
65: ffff880001511c00     2aa8b381d b841831255aa  c760e 2040007f4fd81 203fd5d69c564  b8490b4fcf66 1708a8e622510
66: ffff880001511c00 2040007f525b2 b8418e631e86 19467e 2040007f60641          e08f          5033  b8418e636eb9
67: ffff880001411c00     2ca3f733e b8418e64c48e  d1b60     2ca4023b4          b076          3f05  b8418e650393
68: ffff880001411c00     2ca3f733e b8418e64c48e  d1b60     2ca41f35f         28021          e49e  b8418e65a92c
69: ffff880001411c00     2ca3f733e b8418e64c48e  d1b60     2ca426ad3         2f795         10f48  b8418e65d3d6
70: ffff880001411c00     2ca3f733e b8418e64c48e  d1b60     2ca42dfc9         36c8b         1390e  b8418e65fd9c
71: ffff880001511c00 2040007fc856e b8418e65c0d4 194680 2040007fd6ed9          e96b          535d  b8418e661431

67: ffff880001411c00     20f2a5bed ba4ed2bb1d69  72720     20f4554e5        1af8f8         9a21a  ba4ed2c4bf83
68: ffff880001511c00     1eec7bca9 ba4ec72a670c  6821a 209bee44b2eb7 209bcf583720e  ba569f76e851 174a566a14f5d
69: ffff880001511c00 209bee44ba9e6 ba4ed2d5b2ed  b095e 209bee44e5507         2ab21          f3fa  ba4ed2d6a6e7
70: ffff880001411c00     20f7796e4 ba4ed2d6b1ee  72724     20f77c158          2a74           f29  ba4ed2d6c117
71: ffff880001411c00     20f7796e4 ba4ed2d6b1ee  72724     20f785ca2          c5be          469f  ba4ed2d6f88d
72: ffff880001411c00     20f7796e4 ba4ed2d6b1ee  72724     20f787623          df3f          4fbb  ba4ed2d701a9
73: ffff880001411c00     20f7796e4 ba4ed2d6b1ee  72724     20f7891bb          fad7          5995  ba4ed2d70b83
74: ffff880001511c00 209bee44f569b ba4ed2d702ae  b0960 209bee45007a8          b10d          3f3b  ba4ed2d741e9
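
As a cross-check of the ret column in the logs above, here is a minimal
user-space sketch of the calculation. It assumes scale_delta() applies
the shift first and then multiplies by a 32.32 fixed-point fraction, and
that the logged tsc_shift of ffffffff is the signed value -1. The helper
name pvclock_ret() is hypothetical, and unsigned __int128 is a GCC/Clang
extension used here in place of the kernel's inline assembly.

```c
#include <stdint.h>

/* Scale a TSC delta to nanoseconds: apply the shift, then multiply by
 * a 32.32 fixed-point fraction and keep the integer part. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* Hypothetical helper: rebuild the "ret" column from the other columns. */
static uint64_t pvclock_ret(uint64_t system_timestamp, uint64_t tsc_timestamp,
                            uint64_t tsc, uint32_t mul_frac, int shift)
{
    uint64_t delta = tsc - tsc_timestamp;

    return system_timestamp + scale_delta(delta, mul_frac, shift);
}
```

With tsc_to_nsec_mul = b6dc43b6 and tsc_shift = -1, row 1 of the cold
boot log (delta 5b7a67) gives offset 20abdc and ret b42c01f7b0a2, which
matches the logged values. For the bad rows the same arithmetic is
applied to a delta that mixes a stale tsc_timestamp with a much larger
TSC reading, so ret ends up far in the future.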

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31  0:34           ` Arjan Koers
@ 2010-07-31  1:38             ` Zachary Amsden
  2010-07-31 11:50               ` Arjan Koers
  2010-07-31  2:39             ` Zachary Amsden
  1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-31  1:38 UTC (permalink / raw)
  To: Arjan Koers; +Cc: kvm, Avi Kivity

On 07/30/2010 02:34 PM, Arjan Koers wrote:
> On 2010-07-28 12:37, Avi Kivity wrote:
>    
>>   On 07/28/2010 12:00 AM, Arjan Koers wrote:
>>      
>>> On 2010-07-26 20:59, Arjan Koers wrote:
>>>
>>>        
>>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>>> kernels hang during boot.
>>>>          
>>> It appears that last is way ahead of ret twice.
>>> The kernel boots with this debug patch that makes the clock go
>>> backwards if the difference is big:
>>>
>>>        last = atomic64_read(&last_value);
>>>        do {
>>> -        if (ret < last)
>>> -            return last;
>>> +        if (ret < last) {
>>> +            if ( last - ret < 25000000 )
>>> +                return last;
>>> +            else
>>> +                printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
>>> +        }
>>>            last = atomic64_cmpxchg(&last_value, last, ret);
>>>        } while (unlikely(last != ret));
>>>
>>>
>>>
>>> [    0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>>> [    0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>> [    0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
>>>        
>> Zaaaacchhhh?!
>>
>>      
>
> The lists below show some debug data of the first 99 calls to
> pvclock_clocksource_read since the kernel booted. The situation
> after the 'do ... while (version != src->version)' loop is
> displayed.
>
> Meaning of the columns:
> - src pointer
> - shadow.tsc_timestamp
> - shadow.system_timestamp
> - shadow.version
> - native_read_tsc()
> - delta = native_read_tsc() - shadow.tsc_timestamp
> - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
> - ret = shadow.system_timestamp + offset
>
> Fields left out, because they were the same for all rows:
> - shadow.tsc_to_nsec_mul: b6dc43b6
> - shadow.tsc_shift: ffffffff
> - shadow.flags: 0
>
> Debug log of guest after cold boot of virtual machine:
>   1: ffff880001411c00     2107d5a4e b42c01d704c6   8294     210d8d4b5        5b7a67        20abdc  b42c01f7b0a2
>   2: ffff880001411c00     2107d5a4e b42c01d704c6   8294     210dc2b61        5ed113        21dd1b  b42c01f8e1e1
>   3: ffff880001411c00     21cb0d4a8 b42c0632768f   bb70     21cb10a00          3558          130d  b42c0632899c
>   4: ffff880001411c00     21cb0d4a8 b42c0632768f   bb70     21cb11f17          4a6f          1a95  b42c06329124
>   5: ffff880001411c00     21cceaad2 b42c063d1e45   bbd8     21ccec522          1a50           965  b42c063d27aa
>   6: ffff880001411c00     21cde0644 b42c06429a42   bc10     21ce25457         44e13         1899a  b42c064423dc
>   7: ffff880001411c00     21cf905c1 b42c064c3e76   bc46     21cfa182b         1126a          6201  b42c064ca077
>   8: ffff880001411c00     21d088194 b42c0651c601   bc7a     21d089592          13fe           723  b42c0651cd24
>   9: ffff880001411c00     21d1ad073 b42c06584fc3   bcde     21d1b135d          42ea          17e5  b42c065867a8
> 10: ffff880001411c00     21d2a3837 b42c065dd039   bd10     21d2a4825           fee           5b0  b42c065dd5e9
> 11: ffff880001411c00     21d38bab3 b42c0662fea6   bd42     21d38caa1           fee           5b0  b42c06630456
> 12: ffff880001411c00     21d47459b b42c06683029   bd78     21d475517           f7c           587  b42c066835b0
> 13: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57d70c          4a25          1a7a  b42c066e1ad9
> 14: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57d8d6          4bef          1b1e  b42c066e1b7d
> 15: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57da22          4d3b          1b94  b42c066e1bf3
> 16: ffff880001411c00     21d578ce7 b42c066e005f   bdb2     21d57fc5e          6f77          27ce  b42c066e282d
> 17: ffff880001411c00     21d67c77c b42c0673cc0a   bde4     21d67d685           f09           55e  b42c0673d168
> 18: ffff880001411c00     21d7625b2 b42c0678ed96   be16     21d763488           ed6           54c  b42c0678f2e2
> 19: ffff880001411c00     21df3db36 b42c06a5d222   be54     21dfa78b9         69d83         25cd5  b42c06a82ef7
> 20: ffff880001411c00     21df3db36 b42c06a5d222   be54     21dfa7a3f         69f09         25d61  b42c06a82f83
> 21: ffff880001411c00     21df3db36 b42c06a5d222   be54     21dfa7f8b         6a455         25f45  b42c06a83167
> 22: ffff880001411c00     21e3a50ea b42c06befbb1   be58     21e3c1750         1c666          a249  b42c06bf9dfa
> 23: ffff880001411c00     21e4bfe47 b42c06c54bc5   be92     21e4c4c61          4e1a          1be4  b42c06c567a9
> 24: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea4b224         20cb6          bb66  b42c06e4f922
> 25: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea52748         281da          e53c  b42c06e522f8
> 26: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea52907         28399          e5db  b42c06e52397
> 27: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea52a76         28508          e65f  b42c06e5241b
> 28: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea5c86a         322fc         11ec9  b42c06e55c85
> 29: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea60e3a         368cc         137b7  b42c06e57573
> 30: ffff880001411c00     21ea2a56e b42c06e43dbc   beca     21ea64dc8         3a85a         14e6a  b42c06e58c26
> 31: ffff880001411c00     21ed8a003 b42c06f78496   bf02     21efda28b        250288         d37d2  b42c0704bc68
> 32: ffff880001411c00     21f0e9488 b42c070ac93f   bf38     21f0eacdb          1853           8af  b42c070ad1ee
> 33: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230aeeac6          3e60          1646  b42c0d5636ed
> 34: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230af06d0          5a6a          204a  b42c0d5640f1
> 35: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b03f25         192bf          8fd6  b42c0d56b07d
> 36: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b043c8         19762          917f  b42c0d56b226
> 37: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b0526b         1a605          96b8  b42c0d56b75f
> 38: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b05632         1a9cc          9812  b42c0d56b8b9
> 39: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b07eaa         1d244          a686  b42c0d56c72d
> 40: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b094e9         1e883          ae78  b42c0d56cf1f
> 41: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b09962         1ecfc          b011  b42c0d56d0b8
> 42: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b10590         2592a          d6b4  b42c0d56f75b
> 43: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b1090d         25ca7          d7f3  b42c0d56f89a
> 44: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b10f99         26333          da49  b42c0d56faf0
> 45: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b11204         2659e          db27  b42c0d56fbce
> 46: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b1217c         27516          e0ad  b42c0d570154
> 47: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b1483f         29bd9          ee85  b42c0d570f2c
> 48: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b14ba6         29f40          efbc  b42c0d571063
> 49: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b15569         2a903          f338  b42c0d5713df
> 50: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b250b3         3a44d         14cf8  b42c0d576d9f
> 51: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b254a0         3a83a         14e5f  b42c0d576f06
> 52: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b25bd8         3af72         150f3  b42c0d57719a
> 53: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b25ec3         3b25d         151fd  b42c0d5772a4
> 54: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b5fcab         75045         29cad  b42c0d58bd54
> 55: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6013b         754d5         29e4e  b42c0d58bef5
> 56: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6b86c         80c06         2dfbc  b42c0d590063
> 57: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6bc41         80fdb         2e11a  b42c0d5901c1
> 58: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6c4e5         8187f         2e430  b42c0d5904d7
> 59: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b6c776         81b10         2e51b  b42c0d5905c2
> 60: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b7f97b         94d15         35266  b42c0d59730d
> 61: ffff880001411c00     230aeac66 b42c0d5620a7   c100     230b864af         9b849         378b0  b42c0d599957
> 62: ffff880001411c00     23132e49d b42c0d855884   c16e     231599c3a        26b79d         dd3ec  b42c0d932c70
> 63: ffff880001411c00     23132e49d b42c0d855884   c16e     231599dbc        26b91f         dd476  b42c0d932cfa
> 64: ffff880001411c00     23132e49d b42c0d855884   c16e     231599f5f        26bac2         dd50c  b42c0d932d90
> 65: ffff880001411c00     231fdf357 b42c0dcddc47   c176     232046a74         6771d         24f1e  b42c0dd02b65
> 66: ffff880001411c00     231fdf357 b42c0dcddc47   c176     232046c53         678fc         24fca  b42c0dd02c11
> 67: ffff880001411c00     231fdf357 b42c0dcddc47   c176     232046da0         67a49         25040  b42c0dd02c87
> 68: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f62a2d         184df          8ae2  b42c0e2680c9
> 69: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f63478         18f2a          8e8f  b42c0e268476
> 70: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f63f61         19a13          9274  b42c0e26885b
> 71: ffff880001511c00     20afec946 b42bffe0b604    130 1f890681eacdf 1f88e5d1fe399  b433ab005565 1685faae10b69
> 72: ffff880001411c00     2334400d3 b42c0e424ccd   c180     23344a923          a850          3c1c  b42c0e4288e9
> 73: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2334632f1         2321e          c8c2  b42c0e43158f
> 74: ffff880001411c00     2334400d3 b42c0e424ccd   c180     23346a094         29fc1          efea  b42c0e433cb7
> 75: ffff880001411c00     2334400d3 b42c0e424ccd   c180     23347021d         3014a         112c0  b42c0e435f8d
> 76: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2335ba33b        17a268         870e5  b42c0e4abdb2
> 77: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2335ba9f8        17a925         8734d  b42c0e4ac01a
> 78: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2335bb17d        17b0aa         875fd  b42c0e4ac2ca
> 79: ffff880001511c00     20afec946 b42bffe0b604    130 1f89068505355 1f88e5d518a0f  b433ab1210ed 1685faaf2c6f1
> 80: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863ad24          e5d6          5215  b42c0e598931
> 81: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863b980          f232          567f  b42c0e598d9b
> 82: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863bbdd          f48f          5757  b42c0e598e73
> 83: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f8906863e9d2         12284          67c1  b42c0e599edd
> 84: ffff880001411c00     2334400d3 b42c0e424ccd   c180     233855729        415656        1755cc  b42c0e59a299
> 85: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f890686410b4         14966          75a4  b42c0e59acc0
> 86: ffff880001411c00     2334400d3 b42c0e424ccd   c180     233857b87        417ab4        1762c9  b42c0e59af96
> 87: ffff880001511c00 1f8906862c74e b42c0e59371c      2 1f89068646b9e         1a450          961d  b42c0e59cd39
> 88: ffff880001411c00     2334400d3 b42c0e424ccd   c180     233894271        45419e        18bc1e  b42c0e5b08eb
> 89: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2338ab48a        46b3b7        19404c  b42c0e5b8d19
> 90: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2338adf39        46de66        194f8b  b42c0e5b9c58
> 91: ffff880001411c00     2334400d3 b42c0e424ccd   c180     2338b39b8        4738e5        196fdc  b42c0e5bbca9
> 92: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686cf137          f756          5855  b42c0e5cd89a
> 93: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686cfd6f         1038e          5cb3  b42c0e5cdcf8
> 94: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686d9f4d         1a56c          9682  b42c0e5d16c7
> 95: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686e5610         25c2f          d7c8  b42c0e5d580d
> 96: ffff880001511c00 1f890686bf9e1 b42c0e5c8045      4 1f890686e8326         28945          e7e2  b42c0e5d6827
> 97: ffff880001411c00     233907ea7 b42c0e5d9e8b   c182     23391ad48         12ea1          6c15  b42c0e5e0aa0
> 98: ffff880001411c00     233907ea7 b42c0e5d9e8b   c182     23391b539         13692          6eeb  b42c0e5e0d76
> 99: ffff880001411c00     233907ea7 b42c0e5d9e8b   c182     2339270a3         1f1fc          b1da  b42c0e5e5065
>
> The data for the first CPU (ffff880001411c00) looks OK to me.
> For the second CPU (ffff880001511c00), the contents of the shadow struct
> appear to be wrong on lines 71 and 79: shadow.tsc_timestamp and
> native_read_tsc() are very dissimilar, which results in a wrong value
> of ret.
> On line 80, the struct is OK again.
> Notice that shadow.version appears to have been reset back to 0. That
> doesn't happen when the guest is rebooted without stopping the virtual machine.
>    

How are you printing shadow.version?  From a local variable captured 
during the barrier window or directly in a printk afterwards?  It should 
never go backwards like this, and the vcpus come from a zalloc.  This is 
not easily explainable by anything other than a memory ordering or 
compiler issue.

Note that receiving a startup IPI will cause the TSC to (mistakenly) 
pass through the host value, but this should be corrected for.  This 
happens because SVM will call init_vmcb, clearing the tsc_offset field.  
This seems to explain the huge difference in TSC presented to the CPUs.  
It should not affect kvmclock, because kvmclock won't be running at that 
time yet.

Zach

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31  0:34           ` Arjan Koers
  2010-07-31  1:38             ` Zachary Amsden
@ 2010-07-31  2:39             ` Zachary Amsden
  2010-07-31 11:53               ` Arjan Koers
  1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-31  2:39 UTC (permalink / raw)
  To: Arjan Koers; +Cc: kvm, Avi Kivity, Glauber Costa

On 07/30/2010 02:34 PM, Arjan Koers wrote:
> On 2010-07-28 12:37, Avi Kivity wrote:
>    
>>   On 07/28/2010 12:00 AM, Arjan Koers wrote:
>>      
>>> On 2010-07-26 20:59, Arjan Koers wrote:
>>>
>>>        
>>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>>> kernels hang during boot.
>>>>          
>>> It appears that last is way ahead of ret twice.
>>> The kernel boots with this debug patch that makes the clock go
>>> backwards if the difference is big:
>>>
>>>        last = atomic64_read(&last_value);
>>>        do {
>>> -        if (ret < last)
>>> -            return last;
>>> +        if (ret < last) {
>>> +            if ( last - ret < 25000000 )
>>> +                return last;
>>> +            else
>>> +                printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
>>> +        }
>>>            last = atomic64_cmpxchg(&last_value, last, ret);
>>>        } while (unlikely(last != ret));
>>>
>>>
>>>
>>> [    0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>>> [    0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>> [    0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
>>>        
>> Zaaaacchhhh?!
>>
>>      
>
> The lists below show some debug data of the first 99 calls to
> pvclock_clocksource_read since the kernel booted. The situation
> after the 'do ... while (version != src->version)' loop is
> displayed.
>
> Meaning of the columns:
> - src pointer
> - shadow.tsc_timestamp
> - shadow.system_timestamp
> - shadow.version
> - native_read_tsc()
> - delta = native_read_tsc() - shadow.tsc_timestamp
> - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
> - ret = shadow.system_timestamp + offset
>
> Fields left out, because they were the same for all rows:
> - shadow.tsc_to_nsec_mul: b6dc43b6
> - shadow.tsc_shift: ffffffff
> - shadow.flags: 0
>
> Debug log of guest after cold boot of virtual machine:
> <snip>
> 71: ffff880001511c00     20afec946 b42bffe0b604    130 1f890681eacdf 1f88e5d1fe399  b433ab005565 1685faae10b69
>    

Okay, I think I know what's going on and why Glauber's patch causes 
problems for you.  It looks like your kernel is reading the kvmclock on 
the AP before it is initialized.  Looking at the guest side of things, 
it seems entirely plausible this could happen.

You did mention printk timing causes the bug to appear?  Perhaps it is 
not just coincidental.  Printk getting the time might very well call 
back into the timer code before the clock is initialized, and you've got 
tons of stuff in cpu_init and friends that are likely to want to printk 
all kinds of bootup messages.

If this were in fact the case, the cmpxchg that was added by Glauber's 
patch could leap your clock forward to some very uninitialized random 
value and then you could end up stuck in a timeout loop for days, as you 
are seeing.

Can you try very simply disabling printk timing to see if that might be 
the source of the bug?  In the meantime, what kernel version do you have 
in the guests?

Zach

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31  1:38             ` Zachary Amsden
@ 2010-07-31 11:50               ` Arjan Koers
  0 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 11:50 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Avi Kivity

On 2010-07-31 03:38, Zachary Amsden wrote:
> 
> How are you printing shadow.version?  From a local variable captured
> during the barrier window or directly in a printk afterwards?  It should
> never go backwards like this, and the vcpus come from a zalloc.  This is
> not easily explainable by anything other than a memory ordering or
> compiler issue.

I'm reading shadow.version after the do-while loop and storing it in
an array to print it after the kernel finishes booting. I had to defer
my debug printk's, because they were affecting the results.
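
For reference, the deferred logging described above can be sketched
roughly as follows. This is a user-space illustration, not the actual
debug patch: the names dbg_record(), dbg_dump() and struct dbg_sample
are hypothetical, and C11 atomics stand in for whatever the kernel code
would use. It assumes the dump runs only after recording has stopped.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define DBG_MAX 99  /* the logs in this thread cover the first 99 calls */

struct dbg_sample {
    uint64_t tsc_timestamp;
    uint64_t system_timestamp;
    uint32_t version;
    uint64_t ret;
};

static struct dbg_sample dbg_log[DBG_MAX];
static atomic_uint dbg_next;

/* Store one sample without printing anything.
 * Returns 1 if the sample was stored, 0 if the log is already full. */
static int dbg_record(uint64_t tsc_timestamp, uint64_t system_timestamp,
                      uint32_t version, uint64_t ret)
{
    unsigned int idx;

    if (atomic_load(&dbg_next) >= DBG_MAX)
        return 0;
    idx = atomic_fetch_add(&dbg_next, 1);   /* claim a slot; safe across CPUs */
    if (idx >= DBG_MAX)
        return 0;
    dbg_log[idx] = (struct dbg_sample){ tsc_timestamp, system_timestamp,
                                        version, ret };
    return 1;
}

/* Print all stored samples; returns the number of lines written. */
static int dbg_dump(FILE *out)
{
    unsigned int n = atomic_load(&dbg_next);
    unsigned int i;

    if (n > DBG_MAX)
        n = DBG_MAX;
    for (i = 0; i < n; i++)
        fprintf(out, "%2u: %13" PRIx64 " %12" PRIx64 " %6" PRIx32 " %13" PRIx64 "\n",
                i + 1, dbg_log[i].tsc_timestamp, dbg_log[i].system_timestamp,
                dbg_log[i].version, dbg_log[i].ret);
    return (int)n;
}
```

Because recording never calls back into the clock code, it avoids the
printk timestamp path that was changing the results.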


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31  2:39             ` Zachary Amsden
@ 2010-07-31 11:53               ` Arjan Koers
  2010-07-31 16:36                 ` Arjan Koers
  0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 11:53 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa

On 2010-07-31 04:39, Zachary Amsden wrote:
> On 07/30/2010 02:34 PM, Arjan Koers wrote:
>> On 2010-07-28 12:37, Avi Kivity wrote:
>>   
>>>   On 07/28/2010 12:00 AM, Arjan Koers wrote:
>>>     
>>>> On 2010-07-26 20:59, Arjan Koers wrote:
>>>>
>>>>       
>>>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>>>> kernels hang during boot.
>>>>>          
>>>> It appears that last is way ahead of ret twice.
>>>> The kernel boots with this debug patch that makes the clock go
>>>> backwards if the difference is big:
>>>>
>>>>        last = atomic64_read(&last_value);
>>>>        do {
>>>> -        if (ret < last)
>>>> -            return last;
>>>> +        if (ret < last) {
>>>> +            if ( last - ret < 25000000 )
>>>> +                return last;
>>>> +            else
>>>> +                printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
>>>> +        }
>>>>            last = atomic64_cmpxchg(&last_value, last, ret);
>>>>        } while (unlikely(last != ret));
>>>>
>>>>
>>>>
>>>> [    0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>>>> [    0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>>> [    0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
>>>>        
>>> Zaaaacchhhh?!
>>>
>>>      
>>
>> The lists below show some debug data of the first 99 calls to
>> pvclock_clocksource_read since the kernel booted. The situation
>> after the 'do ... while (version != src->version)' loop is
>> displayed.
>>
>> Meaning of the columns:
>> - src pointer
>> - shadow.tsc_timestamp
>> - shadow.system_timestamp
>> - shadow.version
>> - native_read_tsc()
>> - delta = native_read_tsc() - shadow.tsc_timestamp
>> - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
>> - ret = shadow.system_timestamp + offset
>>
>> Fields left out, because they were the same for all rows:
>> - shadow.tsc_to_nsec_mul: b6dc43b6
>> - shadow.tsc_shift: ffffffff
>> - shadow.flags: 0
>>
>> Debug log of guest after cold boot of virtual machine:
<snip>
>> 70: ffff880001411c00     232f4a54e b42c0e25f5e7   c17c     232f63f61         19a13          9274  b42c0e26885b
>> 71: ffff880001511c00     20afec946 b42bffe0b604    130 1f890681eacdf 1f88e5d1fe399  b433ab005565 1685faae10b69
>>    
> 
> Okay, I think I know what's going on and why Glauber's patch causes
> problems for you.  It looks like your kernel is reading the kvmclock on
> the AP before it is initialized.  Looking at the guest side of things,
> it seems entirely plausible this could happen.
> 
> You did mention printk timing causes the bug to appear?  Perhaps it is
> not just coincidental.  Printk getting the time might very well call
> back into the timer code before the clock is initialized, and you've got
> tons of stuff in cpu_init and friends that are likely to want to printk
> all kinds of bootup messages.
> 
> If this were in fact the case, the cmpxchg that was added by Glauber's
> patch could leap your clock forward to some very uninitialized random
> value and then you could end up stuck in a timeout loop for days, as you
> are seeing.

Yes. That large wrong value is stored in last_value and all future correct
values are ignored, because they are smaller than last_value.
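
The effect can be reproduced outside the kernel. What follows is only a
userspace sketch, assuming C11 atomics as a stand-in for atomic64_t; the
name monotonic_clamp() is made up for illustration and is not the
pvclock code itself. It has the same shape as the loop quoted above:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the kernel's last_value; starts at zero. */
static _Atomic uint64_t last_value;

/*
 * Never hand out a value smaller than the largest one returned so far.
 * Hypothetical helper mirroring the "if (ret < last) return last;" loop.
 */
static uint64_t monotonic_clamp(uint64_t ret)
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (ret < last)
			return last;
		/* On failure, 'last' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, ret));

	return ret;
}
```

Fed the two values from the "pvclock backwards" line, first
108373705fe2 and then 210aff61470a, the helper keeps handing back
210aff61470a for every later reading that is smaller, so the sane clock
is never seen again.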


> Can you try very simply disabling printk timing to see if that might be
> the source of the bug?  In the meantime, what kernel version do you have
> in the guests?

The kernel boots successfully when CONFIG_PRINTK_TIME is not set.

I'm testing with 2.6.35-rc6 now. The problem also occurs with 2.6.34.1,
which also has Glauber's patch. Version 2.6.34 is working.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31 11:53               ` Arjan Koers
@ 2010-07-31 16:36                 ` Arjan Koers
  2010-07-31 19:45                   ` Arjan Koers
  2010-07-31 23:55                   ` Zachary Amsden
  0 siblings, 2 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 16:36 UTC (permalink / raw)
  To: kvm; +Cc: Zachary Amsden, Avi Kivity, Glauber Costa

On 2010-07-31 13:53, Arjan Koers wrote:
> 
> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> 

The problem occurs when this message is printed:

[    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock

When I disable that printk, the kernel boots with
CONFIG_PRINTK_TIME=y

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
 	int low, high;
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
-	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
-	       cpu, high, low, txt);
+	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
+	       cpu, high, low, txt);*/

 	return native_write_msr_safe(msr_kvm_system_time, low, high);
 }

So the problem appears to be that the clock of the second CPU
is used too soon (or that clock setup should finish earlier).

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31 16:36                 ` Arjan Koers
@ 2010-07-31 19:45                   ` Arjan Koers
  2010-07-31 23:55                   ` Zachary Amsden
  1 sibling, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 19:45 UTC (permalink / raw)
  To: kvm; +Cc: Zachary Amsden, Avi Kivity, Glauber Costa

On 2010-07-31 18:36, Arjan Koers wrote:
> On 2010-07-31 13:53, Arjan Koers wrote:
>>
>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>
> 
> The problem occurs when this message is printed:
> 
> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> 
> When I disable that printk, the kernel boots with
> CONFIG_PRINTK_TIME=y
> 
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>  	int low, high;
>  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
> -	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> -	       cpu, high, low, txt);
> +	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> +	       cpu, high, low, txt);*/
> 
>  	return native_write_msr_safe(msr_kvm_system_time, low, high);
>  }
> 
> So the problem appears to be that the clock of the second CPU
> is used too soon (or that clock setup should finish earlier).


Moving the printk after native_write_msr_safe seems to solve the
problem:

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..ca43ce3 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -128,13 +128,15 @@ static struct clocksource kvm_clock = {
 static int kvm_register_clock(char *txt)
 {
 	int cpu = smp_processor_id();
-	int low, high;
+	int low, high, ret;
+
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+	ret = native_write_msr_safe(msr_kvm_system_time, low, high);
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);

-	return native_write_msr_safe(msr_kvm_system_time, low, high);
+	return ret;
 }

 #ifdef CONFIG_X86_LOCAL_APIC



The debug log looks correct now:
67: ffff880001411a80     1b7772acb f797e782af86 7c82a2     1b7dd17c9 65ecfe 246717  f797e7a7169d
68: ffff880001411a80     1b8730d76 f797e7dca389 7c82b2     1b8892871 161afb  7e519  f797e7e488a2
69: ffff880001411a80     1b8730d76 f797e7dca389 7c82b2     1b8893281 16250b  7e8b1  f797e7e48c3a
70: ffff880001411a80     1b8730d76 f797e7dca389 7c82b2     1b8893c47 162ed1  7ec2e  f797e7e48fb7
71: ffff880001511a80 2b55ba387fb14 f797e7ef37af e7c292 2b55ba388196a   1e56    ad5  f797e7ef4284
72: ffff880001411a80     1b8a96765 f797e7f00c69 7c82b6     1b8a9fed0   976b   3613  f797e7f0427c
73: ffff880001411a80     1b8a96765 f797e7f00c69 7c82b6     1b8ab712f  209ca   ba5b  f797e7f0c6c4
74: ffff880001411a80     1b8a96765 f797e7f00c69 7c82b6     1b8abd861  270fc   df36  f797e7f0eb9f
75: ffff880001411a80     1b8a96765 f797e7f00c69 7c82b6     1b8ac3348  2cbe3   ffad  f797e7f10c16
76: ffff880001511a80 2b55ba3b9c85c f797e8010094 e7c332 2b55ba3bc258a  25d2e   d823  f797e801d8b7
77: ffff880001511a80 2b55ba3ce41f9 f797e8085071 e7c366 2b55ba3cec05b   7e62   2d23  f797e8087d94
78: ffff880001411a80     1b8d56d8c f797e7ffc53e 7c82b8     1b8eef620 198894  91e88  f797e808e3c6
79: ffff880001411a80     1b8d56d8c f797e7ffc53e 7c82b8     1b8ef182e 19aaa2  92ab2  f797e808eff0
80: ffff880001411a80     1b8d56d8c f797e7ffc53e 7c82b8     1b8f1d2ad 1c6521  a2429  f797e809e967


The only strange thing remaining is that the time for the first printk
isn't what I expected:
	[    0.016000] kvm-clock: cpu 1, msr 0:1511a81, secondary cpu clock
When I added some extra printk's immediately after that one, the
time on those was correct.

Here's a partial boot log:
...
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] hpet clockevent registered
[    0.000000] Detected 2799.950 MHz processor.
[    0.016000] Calibrating delay loop (skipped) preset value.. 5599.90 BogoMIPS (lpj=11199800)
[    0.016000] pid_max: default: 32768 minimum: 301
[    0.016000] Mount-cache hash table entries: 256
[    0.016000] using C1E aware idle routine
[    0.016000] Performance Events: AMD PMU driver.
[    0.016000] ... version:                0
[    0.016000] ... bit width:              48
[    0.016000] ... generic registers:      4
[    0.016000] ... value mask:             0000ffffffffffff
[    0.016004] ... max period:             00007fffffffffff
[    0.016406] ... fixed-purpose events:   0
[    0.016744] ... event mask:             000000000000000f
[    0.021404] Freeing SMP alternatives: 12k freed
[    0.021836] ACPI: Core revision 20100428
[    0.023882] Setting APIC routing to flat
[    0.025659] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.026129] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.028000] Booting Node   0, Processors  #1 Ok.
[    0.016000] kvm-clock: cpu 1, msr 0:1511a81, secondary cpu clock
[    0.036812] Brought up 2 CPUs
[    0.036820] Total of 2 processors activated (11199.80 BogoMIPS).
[    0.036802] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[    0.040357] NET: Registered protocol family 16
[    0.044159] ACPI: bus type pci registered
...


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31 16:36                 ` Arjan Koers
  2010-07-31 19:45                   ` Arjan Koers
@ 2010-07-31 23:55                   ` Zachary Amsden
  2010-08-02 14:43                     ` Glauber Costa
  1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-31 23:55 UTC (permalink / raw)
  To: Arjan Koers; +Cc: kvm, Avi Kivity, Glauber Costa

On 07/31/2010 06:36 AM, Arjan Koers wrote:
> On 2010-07-31 13:53, Arjan Koers wrote:
>    
>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>
>>      
> The problem occurs when this message is printed:
>
> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>
> When I disable that printk, the kernel boots with
> CONFIG_PRINTK_TIME=y
>
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>   	int low, high;
>   	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>   	high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>  32);
> -	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> -	       cpu, high, low, txt);
> +	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> +	       cpu, high, low, txt);*/
>
>   	return native_write_msr_safe(msr_kvm_system_time, low, high);
>   }
>
> So the problem appears to be that the clock of the second CPU
> is used too soon (or that clock setup should finish earlier).
>    

That's almost hilarious.  The printk from setting up the kvm clock is 
invoking the kvm clock before it is set up.

There's no reason other printks couldn't do the same thing, however.  I 
think it's safest to keep an initialized flag and check for it before 
attempting to return a meaningful value.
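
A rough userspace sketch of the flag idea, purely illustrative: the
struct, the array and the sketch_* names are all hypothetical, and the
real hv_clock layout and per-cpu handling differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 2	/* hypothetical, for illustration only */

/* Hypothetical per-CPU state; not the real pvclock_vcpu_time_info. */
struct sketch_clock {
	bool     initialized;	/* set once registration has completed */
	uint64_t system_time;	/* nanoseconds, filled in by the host */
};

static struct sketch_clock sketch_clocks[SKETCH_NR_CPUS];

/* Would run right after the MSR write that registers the clock. */
static void sketch_clock_mark_ready(int cpu, uint64_t now_ns)
{
	sketch_clocks[cpu].system_time = now_ns;
	sketch_clocks[cpu].initialized = true;
}

/* Return 0 rather than garbage until this CPU's clock is registered. */
static uint64_t sketch_clock_read(int cpu)
{
	if (!sketch_clocks[cpu].initialized)
		return 0;
	return sketch_clocks[cpu].system_time;
}
```

An early caller such as a timestamped printk would then see 0 on a CPU
whose clock is not registered yet, instead of whatever happens to be in
memory.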

Zach

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-30 22:54                         ` Zachary Amsden
@ 2010-08-02 10:12                           ` Andre Przywara
  0 siblings, 0 replies; 81+ messages in thread
From: Andre Przywara @ 2010-08-02 10:12 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Avi Kivity, glommer@redhat.com, KVM list

Zachary Amsden wrote:
> On 07/28/2010 02:25 AM, Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Andre Przywara wrote:
>>>> Avi Kivity wrote:
>>>>>   On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>>> Wierd.  Maybe the clock goes crazy.
>>>>>>>
>>>>>>> Let's see if it jumps forward alot:
>>>>>>>
>>>>>>>          } while (unlikely(last != ret));
>>>>>>> +
>>>>>>> +       {
>>>>>>> +            static u64 last_report;
>>>>>>> +            if (ret > last_report + 10000) {
>>>>>>> +                    last_report = ret;
>>>>>>> +                    printk("kvmclock: %llx\n", ret);
>>>>>>> +            }
>>>>>>> +
>>>>>>> +       }
>>>>>>>
>>>>>>>          return ret;
>>>>>>>   }
>>>>>>>
>>>>>>> Worth updating the 'return last' to update ret and goto the new 
>>>>>>> code, so we don't miss that path.
>>>>>> Did that. There is _a lot_ of output (about 350 lines per second 
>>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>>> The majority is differing about 2,000,000 (ticks?), but a handful 
>>>>>> of them are in the range of 20 million. 
>>>>> nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.
>>>>>
>>>>>> No difference between smp=2 and smp=1.
>>>>>> I also get some "BUG: recent printk recursion!" and I don't see 
>>>>>> any kernel boot progress beyond outputting the BogoMIPS value.
>>>>> Right, printk() wants the time too.
>>>>>
>>>>>> BTW: I found two message from your earlier debug statement:
>>>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>>> [    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>>> Those are from kvmclock initialization, not from the older patch.
>>>>>
>>>>> I'm completely confused, everything seems to be in order.
>>>>>
>>>>> Let's see.  if you s/return last/return ret/ in the original, does 
>>>>> this help things along?  this makes pvclock drop the computation 
>>>>> and should be exactly the same as before the patch.
>>>> Yes, this works, both smp versions boot. I see a very short 
>>>> break after the line in question, but then it proceeds well.
>>>> Thanks for your help, now I got a much better insight into the 
>>>> issue. I will see if I can find something more.
>>> Did some more investigations, some observations:
>>> - The cmpxchg does not seem to be a problem, I didn't see the loop 
>>> iterated more than once.
>>> - Turning off printk-timestamps makes the bug go away. But I guess it 
>>> is just hiding or deferring it, and it's no real workaround anyway.
>>> - I instrumented the "if (ret < last) return last;" statement, when 
>>> the kernel hangs I get only printks from there, although it has hit 
>>> before:
>>> ----------
>>> [    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>> [    0.820000] returning last instead (cnt=19001)
>>> [    0.820000] returning last instead (cnt=20001)
>>> The last line repeats forever with the same timestamp, the counter 
>>> (counting the number of "return last;") increments about 3500 
>>> times/second.
>>>
>>> I will see if I find something more...
>> Added some more instrumentation, seems like the values read from the 
>> pvclock is bogus *sometimes*:
>>  returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
>> This is from the first time the if-statement triggers. So I guess the 
>> value read is ridiculously far in the future (multiple days), so next 
>> calls to clocksource_read() will always return this bogus last value.
>> This means that the clock does not make progress (for several days) 
>> and thus timing loops will never come to an end. I also instrumented 
>> the serial driver, the last thing I saw was autoconfig_irq, where 
>> obviously udelay() is called.
>>
>> Does that ring a bell with someone?
>>
>> I will now concentrate on the pvclock readout/HV write part to see 
>> which of the values used here are wrong.
> 
> Have you gotten any further results on this?
Somewhat. I think my latest findings were more or less ghost bugs: since 
printks contain a timestamp, they interfere with the actual code. The 
large gap I described above was only visible with these printks; it 
is more or less double the real value (which is my host's uptime).
Sadly I cannot use debugfs to avoid the printks, since the kernel halts 
and I don't get to userland.
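
For scale, the two numbers from the instrumented line can be turned into
days. This is a small sketch that assumes both are nanosecond counts;
the helper name ns_to_days() is made up for illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define SEC_PER_DAY  86400ULL

/* Whole days contained in a nanosecond count (rounds down). */
static uint64_t ns_to_days(uint64_t ns)
{
	return ns / NSEC_PER_SEC / SEC_PER_DAY;
}
```

With diff=1389078312510470 this gives 16 days, and with the returned
value 2778021535795841 it gives 32 days, which fits the "more or less
double" observation.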

On another try I managed to bisect the failure also in qemu-kvm. The bug 
triggers only with "ebc4f45 turn off kvmclock when resetting cpu" 
applied (_additionally_ to the kernel patch in question).
When I comment out the call to kvm_reset_msrs() in the master branch, 
this also lets the bug vanish.
> 
> I think the most likely explanation is that your host CPU has TSC out of 
> sync, and somehow this leaks over to pvclock.  Am I correct that it 
> happens even with one guest VCPU?  What if you disable secondary host CPUs?
I tried several ways to pin vCPUs to different host CPUs (cores and 
sockets): both vCPUs on one core, both vCPUs on different cores on the 
same socket, and both vCPUs on different sockets/nodes. None of that 
made any difference, the kernel halted in every case.
I also tried booting the host with maxcpus=1, the error was still the 
same: -smp 1 works, -smp 2 halts.
Btw.: the host uses clocksource acpi_pm. Also I noticed that sometimes 
the guest gets very slow after having switched the clocksource to 
kvmclock; it then eventually halts at the mentioned line.


Regards,
Andre.

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-07-31 23:55                   ` Zachary Amsden
@ 2010-08-02 14:43                     ` Glauber Costa
  2010-08-02 16:16                       ` Arjan Koers
  2010-08-02 20:26                       ` Zachary Amsden
  0 siblings, 2 replies; 81+ messages in thread
From: Glauber Costa @ 2010-08-02 14:43 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Arjan Koers, kvm, Avi Kivity

On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
> On 07/31/2010 06:36 AM, Arjan Koers wrote:
> >On 2010-07-31 13:53, Arjan Koers wrote:
> >>The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> >>
> >The problem occurs when this message is printed:
> >
> >[    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> >
> >When I disable that printk, the kernel boots with
> >CONFIG_PRINTK_TIME=y
> >
> >--- a/arch/x86/kernel/kvmclock.c
> >+++ b/arch/x86/kernel/kvmclock.c
> >@@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> >  	int low, high;
> >  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> >  	high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>  32);
> >-	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >-	       cpu, high, low, txt);
> >+	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >+	       cpu, high, low, txt);*/
> >
> >  	return native_write_msr_safe(msr_kvm_system_time, low, high);
> >  }
> >
> >So the problem appears to be that the clock of the second CPU
> >is used too soon (or that clock setup should finish earlier).
> 
> That's almost hilarious.  The printk from setting up the kvm clock
> is invoking the kvm clock before it is setup.
> 
> There's no reason other printks couldn't do the same thing, however.
> I think it's safest to keep an initialized flag and check for it
> before attempting to return a meaningful value.

I was on vacation, just got back.

I think it is safe to just patch our own use of it. Before that, all other
printks will be handled by the main cpu anyway, since it'll be the only one active
at the moment. The only possible offenders for this are us, and the cpu initialization
code, which is already fragile in multiple ways anyway.

A flag would only make things more complicated and dirty.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 14:43                     ` Glauber Costa
@ 2010-08-02 16:16                       ` Arjan Koers
  2010-08-02 18:07                         ` Glauber Costa
  2010-08-02 20:26                       ` Zachary Amsden
  1 sibling, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-08-02 16:16 UTC (permalink / raw)
  To: kvm; +Cc: Glauber Costa, Zachary Amsden, Avi Kivity, Andre Przywara

On 2010-08-02 16:43, Glauber Costa wrote:
> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>
>>> The problem occurs when this message is printed:
>>>
>>> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>
>>> When I disable that printk, the kernel boots with
>>> CONFIG_PRINTK_TIME=y
>>>
>>> --- a/arch/x86/kernel/kvmclock.c
>>> +++ b/arch/x86/kernel/kvmclock.c
>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>  	int low, high;
>>>  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>  32);
>>> -	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> -	       cpu, high, low, txt);
>>> +	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> +	       cpu, high, low, txt);*/
>>>
>>>  	return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>  }
>>>
>>> So the problem appears to be that the clock of the second CPU
>>> is used too soon (or that clock setup should finish earlier).
>>
>> That's almost hilarious.  The printk from setting up the kvm clock
>> is invoking the kvm clock before it is setup.
>>
>> There's no reason other printks couldn't do the same thing, however.
>> I think it's safest to keep an initialized flag and check for it
>> before attempting to return a meaningful value.
> 
> I was on vacations, just got back.
> 
> I think it is safe to just patch our own use of it. Before that, all other
> printks will be handled by the main cpu anyway, since it'll be the only one active
> at the moment. The only possible offenders for this are us, and the cpu initialization
> code, which is already fragile in multiple ways anyway.
> 
> A flag would only make things more complicated and dirty

Maybe you could add a sanity check in pvclock_clocksource_read
after 'do { ... } while (version != src->version)' that
returns last_value if offset is extremely large?


I've performed some more boot tests (about 20) with the patch that
moves the printk after native_write_msr_safe and it works for me.
Andre Przywara confirmed to me that it also fixes his problem.

A slightly modified version of the patch for 2.6.34.1 also works
(800+ successful boot cycles).


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 16:16                       ` Arjan Koers
@ 2010-08-02 18:07                         ` Glauber Costa
  0 siblings, 0 replies; 81+ messages in thread
From: Glauber Costa @ 2010-08-02 18:07 UTC (permalink / raw)
  To: Arjan Koers; +Cc: kvm, Zachary Amsden, Avi Kivity, Andre Przywara

On Mon, Aug 02, 2010 at 06:16:16PM +0200, Arjan Koers wrote:
> On 2010-08-02 16:43, Glauber Costa wrote:
> > On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
> >> On 07/31/2010 06:36 AM, Arjan Koers wrote:
> >>> On 2010-07-31 13:53, Arjan Koers wrote:
> >>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> >>>>
> >>> The problem occurs when this message is printed:
> >>>
> >>> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> >>>
> >>> When I disable that printk, the kernel boots with
> >>> CONFIG_PRINTK_TIME=y
> >>>
> >>> --- a/arch/x86/kernel/kvmclock.c
> >>> +++ b/arch/x86/kernel/kvmclock.c
> >>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> >>>  	int low, high;
> >>>  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> >>>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>  32);
> >>> -	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>> -	       cpu, high, low, txt);
> >>> +	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>> +	       cpu, high, low, txt);*/
> >>>
> >>>  	return native_write_msr_safe(msr_kvm_system_time, low, high);
> >>>  }
> >>>
> >>> So the problem appears to be that the clock of the second CPU
> >>> is used too soon (or that clock setup should finish earlier).
> >>
> >> That's almost hilarious.  The printk from setting up the kvm clock
> >> is invoking the kvm clock before it is setup.
> >>
> >> There's no reason other printks couldn't do the same thing, however.
> >> I think it's safest to keep an initialized flag and check for it
> >> before attempting to return a meaningful value.
> > 
> > I was on vacations, just got back.
> > 
> > I think it is safe to just patch our own use of it. Before that, all other
> > printks will be handled by the main cpu anyway, since it'll be the only one active
> > at the moment. The only possible offenders for this are us, and the cpu initialization
> > code, which is already fragile in multiple ways anyway.
> > 
> > A flag would only make things more complicated and dirty
> 
> Maybe you could add a sanity check in pvclock_clocksource_read
> after 'do { ... } while (version != src->version)' that
> returns last_value if offset is extremely large?
I am not against adding a check, but only if the resulting action is
warn-only. Otherwise we paper over this and forget the real bugs.
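
A warn-only check could look roughly like the userspace sketch below.
The one-hour threshold, the counter and the sketch_* names are
assumptions made up for illustration. In the kernel, the report itself
would also have to avoid re-entering the clock through printk
timestamps, which this sketch does not address.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical threshold: offsets above one hour look suspicious. */
#define SKETCH_MAX_OFFSET_NS (3600ULL * 1000000000ULL)

static unsigned long sketch_warnings;	/* counts reports, for inspection */

/*
 * Warn-only: report an implausible offset, but still return the
 * computed value unchanged so the underlying bug stays visible.
 */
static uint64_t sketch_check_offset(uint64_t system_timestamp, uint64_t offset)
{
	if (offset > SKETCH_MAX_OFFSET_NS) {
		sketch_warnings++;
		fprintf(stderr, "pvclock: suspicious offset %llx\n",
			(unsigned long long)offset);
	}
	return system_timestamp + offset;
}
```

Run on the debug rows posted earlier in the thread, row 71 (offset
b433ab005565) would trigger the warning and row 70 (offset 9274) would
not, and both would still return the same ret as in the log.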


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 14:43                     ` Glauber Costa
  2010-08-02 16:16                       ` Arjan Koers
@ 2010-08-02 20:26                       ` Zachary Amsden
  2010-08-02 21:10                         ` Glauber Costa
  2010-08-02 21:35                         ` Arjan Koers
  1 sibling, 2 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-08-02 20:26 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Arjan Koers, kvm, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 1991 bytes --]

On 08/02/2010 04:43 AM, Glauber Costa wrote:
> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>    
>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>      
>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>        
>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>
>>>>          
>>> The problem occurs when this message is printed:
>>>
>>> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>
>>> When I disable that printk, the kernel boots with
>>> CONFIG_PRINTK_TIME=y
>>>
>>> --- a/arch/x86/kernel/kvmclock.c
>>> +++ b/arch/x86/kernel/kvmclock.c
>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>   	int low, high;
>>>   	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>   	high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>   32);
>>> -	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> -	       cpu, high, low, txt);
>>> +	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> +	       cpu, high, low, txt);*/
>>>
>>>   	return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>   }
>>>
>>> So the problem appears to be that the clock of the second CPU
>>> is used too soon (or that clock setup should finish earlier).
>>>        
>> That's almost hilarious.  The printk from setting up the kvm clock
>> is invoking the kvm clock before it is setup.
>>
>> There's no reason other printks couldn't do the same thing, however.
>> I think it's safest to keep an initialized flag and check for it
>> before attempting to return a meaningful value.
>>      
> I was on vacations, just got back.
>
> I think it is safe to just patch our own use of it. Before that, all other
> printks will be handled by the main cpu anyway, since it'll be the only one active
> at the moment. The only possible offenders for this are us, and the cpu initialization
> code, which is already fragile in multiple ways anyway.
>
> A flag would only make things more complicated and dirty
>    
Can we just do this?

[-- Attachment #2: zero.patch --]
[-- Type: text/plain, Size: 855 bytes --]

Initialize hv_clock to zero

This stops callers from getting random values if data is accessed before
clock is initialized; instead they will get zeroed clock values (because
computation involves a multiplication by a factor in hv_clock).

Signed-off-by: Zachary Amsden <zamsden@redhat.com>

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..e7acd0d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -40,7 +40,7 @@ static int parse_no_kvmclock(char *arg)
 early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock) = {0};
 static struct pvclock_wall_clock wall_clock;
 
 /*
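
As an illustration of the changelog's reasoning, here is a userspace
model of the scaling step. It is a sketch, not the kernel's
scale_delta(): the sketch_* names are hypothetical, and the 64x32
multiply is split by hand so that no 128-bit type is needed.

```c
#include <stdint.h>

/*
 * Fixed-point scaling: shift the TSC delta, then take
 * (delta * mul_frac) >> 32.
 */
static uint64_t sketch_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	uint64_t hi, lo;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* (hi * 2^32 + lo) * mul >> 32 == hi * mul + (lo * mul >> 32) */
	hi = delta >> 32;
	lo = delta & 0xffffffffULL;
	return hi * mul_frac + ((lo * mul_frac) >> 32);
}

/* Zeroed time info: system_time 0 plus a zero-scaled offset gives 0. */
static uint64_t sketch_read_zeroed(uint64_t tsc_delta)
{
	return 0 + sketch_scale_delta(tsc_delta, 0, 0);
}
```

With the multiplier and shift from the debug log (b6dc43b6 and -1), a
delta of 19a13 scales to 9274, matching row 70. With an all-zero
structure the result is 0 for any delta.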

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 20:26                       ` Zachary Amsden
@ 2010-08-02 21:10                         ` Glauber Costa
  2010-08-02 21:35                         ` Arjan Koers
  1 sibling, 0 replies; 81+ messages in thread
From: Glauber Costa @ 2010-08-02 21:10 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Arjan Koers, kvm, Avi Kivity

On Mon, Aug 02, 2010 at 10:26:30AM -1000, Zachary Amsden wrote:
> On 08/02/2010 04:43 AM, Glauber Costa wrote:
> >On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
> >>On 07/31/2010 06:36 AM, Arjan Koers wrote:
> >>>On 2010-07-31 13:53, Arjan Koers wrote:
> >>>>The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> >>>>
> >>>The problem occurs when this message is printed:
> >>>
> >>>[    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> >>>
> >>>When I disable that printk, the kernel boots with
> >>>CONFIG_PRINTK_TIME=y
> >>>
> >>>--- a/arch/x86/kernel/kvmclock.c
> >>>+++ b/arch/x86/kernel/kvmclock.c
> >>>@@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> >>>  	int low, high;
> >>>  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> >>>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>   32);
> >>>-	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>>-	       cpu, high, low, txt);
> >>>+	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>>+	       cpu, high, low, txt);*/
> >>>
> >>>  	return native_write_msr_safe(msr_kvm_system_time, low, high);
> >>>  }
> >>>
> >>>So the problem appears to be that the clock of the second CPU
> >>>is used too soon (or that clock setup should finish earlier).
> >>That's almost hilarious.  The printk from setting up the kvm clock
> >>is invoking the kvm clock before it is setup.
> >>
> >>There's no reason other printks couldn't do the same thing, however.
> >>I think it's safest to keep an initialized flag and check for it
> >>before attempting to return a meaningful value.
> >I was on vacations, just got back.
> >
> >I think it is safe to just patch our own use of it. Before that, all other
> >printks will be handled by the main cpu anyway, since it'll be the only one active
> >at the moment. The only possible offenders for this are us, and the cpu initialization
> >code, which is already fragile in multiple ways anyway.
> >
> >A flag would only make things more complicated and dirty
> Can we just do this?

> Initialize hv_clock to zero
> 
> This stops callers from getting random values if data is accessed before
> clock is initialized; instead they will get zeroed clock values (because
> computation involves a multiplication by a factor in hv_clock).
> 
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index eb9b76c..e7acd0d 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -40,7 +40,7 @@ static int parse_no_kvmclock(char *arg)
>  early_param("no-kvmclock", parse_no_kvmclock);
>  
>  /* The hypervisor will put information about time periodically here */
> -static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock) = {0};
>  static struct pvclock_wall_clock wall_clock;
We can, but I am a little bit afraid that it won't initialize all the per-cpu areas.
If it does, it is fine, though.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 20:26                       ` Zachary Amsden
  2010-08-02 21:10                         ` Glauber Costa
@ 2010-08-02 21:35                         ` Arjan Koers
  2010-08-03  0:00                           ` Zachary Amsden
                                             ` (2 more replies)
  1 sibling, 3 replies; 81+ messages in thread
From: Arjan Koers @ 2010-08-02 21:35 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Glauber Costa, kvm, Avi Kivity, Andre Przywara

[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]

On 2010-08-02 22:26, Zachary Amsden wrote:
> On 08/02/2010 04:43 AM, Glauber Costa wrote:
>> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>>   
>>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>>     
>>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>>       
>>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>>
>>>>>          
>>>> The problem occurs when this message is printed:
>>>>
>>>> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>>
>>>> When I disable that printk, the kernel boots with
>>>> CONFIG_PRINTK_TIME=y
>>>>
>>>> --- a/arch/x86/kernel/kvmclock.c
>>>> +++ b/arch/x86/kernel/kvmclock.c
>>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>>       int low, high;
>>>>       low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>>       high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>   32);
>>>> -    printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>> -           cpu, high, low, txt);
>>>> +    /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>> +           cpu, high, low, txt);*/
>>>>
>>>>       return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>>   }
>>>>
>>>> So the problem appears to be that the clock of the second CPU
>>>> is used too soon (or that clock setup should finish earlier).
>>>>        
>>> That's almost hilarious.  The printk from setting up the kvm clock
>>> is invoking the kvm clock before it is setup.
>>>
>>> There's no reason other printks couldn't do the same thing, however.
>>> I think it's safest to keep an initialized flag and check for it
>>> before attempting to return a meaningful value.
>>>      
>> I was on vacations, just got back.
>>
>> I think it is safe to just patch our own use of it. Before that, all
>> other
>> printks will be handled by the main cpu anyway, since it'll be the
>> only one active
>> at the moment. The only possible offenders for this are us, and the
>> cpu initialization
>> code, which is already fragile in multiple ways anyway.
>>
>> A flag would only make things more complicated and dirty
>>    
> Can we just do this?


Sorry, the patch doesn't help. See line 68 in my debug log:
65: ffff880001411c00    1b68905d7 156558001892 6e10a    1b6c0631e       375d47       13c5ce  15655813de60
66: ffff880001411c00    1b68905d7 156558001892 6e10a    1b6c0653b       375f64       13c68f  15655813df21
67: ffff880001411c00    1b68905d7 156558001892 6e10a    1b6c06746       37616f       13c74a  15655813dfdc
68: ffff880001511c00    1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886 15695466a1e5  2acea0f4244f
69: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f424aa         28d0          e93  1565582659b1
70: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f4a1e0         a606         3b4b  156558268669
71: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f4ba63         be89         440b  156558268f29
72: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f4d8e7         dd0d         4ef1  156558269a0f
73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c        371d6        13aef  15655827223a
74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec        38896        1430f  156558272a5a

I don't think that pvclock_clocksource_read is receiving
completely random uninitialized data. The values in shadow
are wrong, but could be interpreted as valid data
(shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
shadow.flags = 0 and shadow.version is always even).
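
To see why those shadow values can pass for valid data, here is a
minimal userspace sketch of the delta scaling that a pvclock read
performs. It assumes tsc_shift is a signed 8-bit field (so ffffffff is
-1 after sign extension) and that tsc_to_nsec_mul is a 32.32 fixed-point
fraction. scale_delta() is an illustrative name, not the kernel's
function, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * Sketch only: convert a TSC delta to nanoseconds using the
 * multiplier/shift pair the hypervisor publishes.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
	/* A negative shift divides by a power of two, a positive one multiplies. */
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* 64x32-bit multiply, keeping the integer part of the 32.32 result. */
	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}
```

With mul_frac = 0xb6dc43b6 and shift = -1, a delta of 2.8e9 cycles
scales to roughly one second, so the pair looks like a plausible
calibration rather than garbage.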


I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
anyone needs them...

[-- Attachment #2: 2.6.34.1.patch --]
[-- Type: text/x-patch, Size: 1055 bytes --]

Move a printk that's using the clock before it's ready

Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.

Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..71bf2df 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
 static int kvm_register_clock(char *txt)
 {
 	int cpu = smp_processor_id();
-	int low, high;
+	int low, high, ret;
+
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+	ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);
-	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+	return ret;
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC

[-- Attachment #3: 2.6.35.patch --]
[-- Type: text/x-patch, Size: 1055 bytes --]

Move a printk that's using the clock before it's ready

Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.

Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..ca43ce3 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -128,13 +128,15 @@ static struct clocksource kvm_clock = {
 static int kvm_register_clock(char *txt)
 {
 	int cpu = smp_processor_id();
-	int low, high;
+	int low, high, ret;
+
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+	ret = native_write_msr_safe(msr_kvm_system_time, low, high);
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);
 
-	return native_write_msr_safe(msr_kvm_system_time, low, high);
+	return ret;
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 21:35                         ` Arjan Koers
@ 2010-08-03  0:00                           ` Zachary Amsden
  2010-09-28 11:16                           ` Michael Tokarev
  2010-09-29  8:28                           ` Avi Kivity
  2 siblings, 0 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-08-03  0:00 UTC (permalink / raw)
  To: Arjan Koers; +Cc: Glauber Costa, kvm, Avi Kivity, Andre Przywara

[-- Attachment #1: Type: text/plain, Size: 3909 bytes --]

On 08/02/2010 11:35 AM, Arjan Koers wrote:
> On 2010-08-02 22:26, Zachary Amsden wrote:
>    
>> On 08/02/2010 04:43 AM, Glauber Costa wrote:
>>      
>>> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>>>
>>>        
>>>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>>>
>>>>          
>>>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>>>
>>>>>            
>>>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>>>
>>>>>>
>>>>>>              
>>>>> The problem occurs when this message is printed:
>>>>>
>>>>> [    0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>>>
>>>>> When I disable that printk, the kernel boots with
>>>>> CONFIG_PRINTK_TIME=y
>>>>>
>>>>> --- a/arch/x86/kernel/kvmclock.c
>>>>> +++ b/arch/x86/kernel/kvmclock.c
>>>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>>>        int low, high;
>>>>>        low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>>>        high = ((u64)__pa(&per_cpu(hv_clock, cpu))>>    32);
>>>>> -    printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>>> -           cpu, high, low, txt);
>>>>> +    /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>>> +           cpu, high, low, txt);*/
>>>>>
>>>>>        return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>>>    }
>>>>>
>>>>> So the problem appears to be that the clock of the second CPU
>>>>> is used too soon (or that clock setup should finish earlier).
>>>>>
>>>>>            
>>>> That's almost hilarious.  The printk from setting up the kvm clock
>>>> is invoking the kvm clock before it is setup.
>>>>
>>>> There's no reason other printks couldn't do the same thing, however.
>>>> I think it's safest to keep an initialized flag and check for it
>>>> before attempting to return a meaningful value.
>>>>
>>>>          
>>> I was on vacations, just got back.
>>>
>>> I think it is safe to just patch our own use of it. Before that, all
>>> other
>>> printks will be handled by the main cpu anyway, since it'll be the
>>> only one active
>>> at the moment. The only possible offenders for this are us, and the
>>> cpu initialization
>>> code, which is already fragile in multiple ways anyway.
>>>
>>> A flag would only make things more complicated and dirty
>>>
>>>        
>> Can we just do this?
>>      
>
> Sorry, the patch doesn't help. See line 68 in my debug log:
> 65: ffff880001411c00    1b68905d7 156558001892 6e10a    1b6c0631e       375d47       13c5ce  15655813de60
> 66: ffff880001411c00    1b68905d7 156558001892 6e10a    1b6c0653b       375f64       13c68f  15655813df21
> 67: ffff880001411c00    1b68905d7 156558001892 6e10a    1b6c06746       37616f       13c74a  15655813dfdc
> 68: ffff880001511c00    1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886 15695466a1e5  2acea0f4244f
>    

This is a separate bug.  See attached patch (it won't apply, it's part 
of a series, but shows the bug).

> 69: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f424aa         28d0          e93  1565582659b1
> 70: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f4a1e0         a606         3b4b  156558268669
> 71: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f4ba63         be89         440b  156558268f29
> 72: ffff880001411c00    1b6f3fbda 156558264b1e 6e10e    1b6f4d8e7         dd0d         4ef1  156558269a0f
> 73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c        371d6        13aef  15655827223a
> 74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec        38896        1430f  156558272a5a
>
> I don't think that pvclock_clocksource_read is receiving
> completely random uninitialized data. The values in shadow
> are wrong, but could be interpreted as valid data
> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
> shadow.flags = 0 and shadow.version is always even).
>    
Copied from the first CPU possibly?

[-- Attachment #2: 0004-Fix-SVM-VMCB-reset.patch --]
[-- Type: text/plain, Size: 1031 bytes --]

From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
From: Zachary Amsden <zamsden@redhat.com>
Date: Sat, 29 May 2010 17:52:46 -1000
Subject: [KVM V2 04/25] Fix SVM VMCB reset
Cc: Avi Kivity <avi@redhat.com>,
    Marcelo Tosatti <mtosatti@redhat.com>,
    Glauber Costa <glommer@redhat.com>,
    linux-kernel@vger.kernel.org

On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kvm/svm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 760c86e..46856d2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
-	control->tsc_offset = 0;
+	guest_write_tsc(&svm->vcpu, 0);
 	control->int_ctl = V_INTR_MASKING_MASK;
 
 	init_seg(&save->es);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 21:35                         ` Arjan Koers
  2010-08-03  0:00                           ` Zachary Amsden
@ 2010-09-28 11:16                           ` Michael Tokarev
  2010-09-29  8:12                             ` Michael Tokarev
  2010-09-29  8:28                           ` Avi Kivity
  2 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-28 11:16 UTC (permalink / raw)
  To: kvm

Arjan Koers <0h61vkll2ly8 <at> xutrox.com> writes:

[]
>  I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
> anyone needs them...
> 
> 
> Move a printk that's using the clock before it's ready
> 
> Fix a hang during SMP kernel boot on KVM that showed up
> after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
> (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
> (2.6.34.1). The problem only occurs when
> CONFIG_PRINTK_TIME is set.
> 
> Signed-off-by: Arjan Koers <0h61vkll2ly8 <at> xutrox.com>
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index feaeb0d..71bf2df 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
>  static int kvm_register_clock(char *txt)
>  {
>  	int cpu = smp_processor_id();
> -	int low, high;
> +	int low, high, ret;
> +
>  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
> +	ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
>  	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>  	       cpu, high, low, txt);
> -	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
> +
> +	return ret;
>  }
> 
>  #ifdef CONFIG_X86_LOCAL_APIC


Folks, should this be sent to the -stable kernels?  It is not in any
upstream kernel as far as I can see (not in Linus' tree either), but
this is quite an issue and is hitting people....

The discussion stalled quite a while ago too -- this email has
Date: Mon, 02 Aug 2010 23:35:28 +0200.

Thanks!

/mjt


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-28 11:16                           ` Michael Tokarev
@ 2010-09-29  8:12                             ` Michael Tokarev
  0 siblings, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-09-29  8:12 UTC (permalink / raw)
  To: kvm; +Cc: Arjan Koers, Zachary Amsden, Glauber Costa, Avi Kivity,
	Andre Przywara

Ping? ;)

28.09.2010 15:16, Michael Tokarev wrote:
> Arjan Koers <0h61vkll2ly8 <at> xutrox.com> writes:
> 
>> Move a printk that's using the clock before it's ready
>>
>> Fix a hang during SMP kernel boot on KVM that showed up
>> after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>> (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
>> (2.6.34.1). The problem only occurs when
>> CONFIG_PRINTK_TIME is set.
>>
>> Signed-off-by: Arjan Koers <0h61vkll2ly8 <at> xutrox.com>
>>
>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> index feaeb0d..71bf2df 100644
>> --- a/arch/x86/kernel/kvmclock.c
>> +++ b/arch/x86/kernel/kvmclock.c
>> @@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
>>  static int kvm_register_clock(char *txt)
>>  {
>>  	int cpu = smp_processor_id();
>> -	int low, high;
>> +	int low, high, ret;
>> +
>>  	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>  	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
>> +	ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
>>  	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>  	       cpu, high, low, txt);
>> -	return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
>> +
>> +	return ret;
>>  }
>>
>>  #ifdef CONFIG_X86_LOCAL_APIC
> 
> Folks, should this be sent to -stable kernel?  It is not in any
> upstream kernel as far as I can see (not in linus tree too), but
> this is quite an issue and is hitting people....
> 
> The discussion were stalled quite a while ago too -- this email has
> Date: Mon, 02 Aug 2010 23:35:28 +0200.
> 
> Thanks!
> 
> /mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-08-02 21:35                         ` Arjan Koers
  2010-08-03  0:00                           ` Zachary Amsden
  2010-09-28 11:16                           ` Michael Tokarev
@ 2010-09-29  8:28                           ` Avi Kivity
  2010-09-29  9:17                             ` Michael Tokarev
  2 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-09-29  8:28 UTC (permalink / raw)
  To: Arjan Koers; +Cc: Zachary Amsden, Glauber Costa, kvm, Andre Przywara

  On 08/03/2010 12:35 AM, Arjan Koers wrote:
> I don't think that pvclock_clocksource_read is receiving
> completely random uninitialized data. The values in shadow
> are wrong, but could be interpreted as valid data
> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
> shadow.flags = 0 and shadow.version is always even).
>
>
> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
> anyone needs them...

Thanks, applied.  Please post patches in a new thread so I get the 
chance to see them.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-29  8:28                           ` Avi Kivity
@ 2010-09-29  9:17                             ` Michael Tokarev
  2010-09-29  9:19                               ` Michael Tokarev
  0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-29  9:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Arjan Koers, Zachary Amsden, Glauber Costa, kvm, Andre Przywara

29.09.2010 12:28, Avi Kivity wrote:
>  On 08/03/2010 12:35 AM, Arjan Koers wrote:
>> I don't think that pvclock_clocksource_read is receiving
>> completely random uninitialized data. The values in shadow
>> are wrong, but could be interpreted as valid data
>> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
>> shadow.flags = 0 and shadow.version is always even).
>>
>> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
>> anyone needs them...

[Move a printk that's using the clock before it's ready]

> Thanks, applied.  Please post patches in a new thread so I get the
> chance to see them.

Avi, this is definitely -stable material, for 2.6.32 (longterm
stable) and 2.6.35.

/mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-29  9:17                             ` Michael Tokarev
@ 2010-09-29  9:19                               ` Michael Tokarev
  2010-09-29 19:26                                 ` Arjan Koers
  0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-29  9:19 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Arjan Koers, Zachary Amsden, Glauber Costa, kvm, Andre Przywara

29.09.2010 13:17, Michael Tokarev wrote:
> 29.09.2010 12:28, Avi Kivity wrote:
>>  On 08/03/2010 12:35 AM, Arjan Koers wrote:
>>> I don't think that pvclock_clocksource_read is receiving
>>> completely random uninitialized data. The values in shadow
>>> are wrong, but could be interpreted as valid data
>>> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
>>> shadow.flags = 0 and shadow.version is always even).
>>>
>>> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
>>> anyone needs them...
[]
> Avi, this is definitely a -stable material, for 2.6.32 (longterm
> stable) and 2.6.35.

Er. Please excuse me for the misinformation.  It is _not_ for 2.6.32,
of course.

/mjt


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-29  9:19                               ` Michael Tokarev
@ 2010-09-29 19:26                                 ` Arjan Koers
  2010-09-30  7:55                                   ` Michael Tokarev
  0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-09-29 19:26 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, Zachary Amsden, Glauber Costa, Michael Tokarev,
	Andre Przywara

On 2010-09-29 11:19, Michael Tokarev wrote:
> 29.09.2010 13:17, Michael Tokarev wrote:
>> 29.09.2010 12:28, Avi Kivity wrote:
>>>  On 08/03/2010 12:35 AM, Arjan Koers wrote:
>>>> I don't think that pvclock_clocksource_read is receiving
>>>> completely random uninitialized data. The values in shadow
>>>> are wrong, but could be interpreted as valid data
>>>> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
>>>> shadow.flags = 0 and shadow.version is always even).
>>>>
>>>> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
>>>> anyone needs them...
> []
>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>> stable) and 2.6.35.
> 
> Er. Please excuse me for the misinformation.  It is _not_ for 2.6.32
> ofcourse.

I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
(introduced in 2.6.32.16) makes it boot again.

The kvmclock printk patch doesn't help, but I'll try to figure out
what's wrong.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-29 19:26                                 ` Arjan Koers
@ 2010-09-30  7:55                                   ` Michael Tokarev
  2010-09-30  9:59                                     ` Michael Tokarev
  0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30  7:55 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Avi Kivity, Zachary Amsden, Glauber Costa, Andre Przywara

29.09.2010 23:26, Arjan Koers wrote:
> On 2010-09-29 11:19, Michael Tokarev wrote:
>> 29.09.2010 13:17, Michael Tokarev wrote:
[]
>>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>>> stable) and 2.6.35.
>>
>> Er. Please excuse me for the misinformation.  It is _not_ for 2.6.32
>> ofcourse.
> 
> I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
> hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
> (introduced in 2.6.32.16) makes it boot again.
> 
> The kvmclock printk patch doesn't help, but I'll try to figure out
> what's wrong.

It works here just fine - both 32- and 64-bit 2.6.32.23 as is,
and both 32- and 64-bit 2.6.35.6 with the printk.time patch
applied.

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30  7:55                                   ` Michael Tokarev
@ 2010-09-30  9:59                                     ` Michael Tokarev
  2010-09-30 13:54                                       ` Zachary Amsden
  0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30  9:59 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Avi Kivity, Zachary Amsden, Glauber Costa, Andre Przywara

30.09.2010 11:55, Michael Tokarev wrote:
> 29.09.2010 23:26, Arjan Koers wrote:
>> On 2010-09-29 11:19, Michael Tokarev wrote:
>>> 29.09.2010 13:17, Michael Tokarev wrote:
> []
>>>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>>>> stable) and 2.6.35.
>>>
>>> Er. Please excuse me for the misinformation.  It is _not_ for 2.6.32
>>> ofcourse.
>>
>> I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
>> hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
>> (introduced in 2.6.32.16) makes it boot again.
>>
>> The kvmclock printk patch doesn't help, but I'll try to figure out
>> what's wrong.
> 
> It works here just fine - both 32- and 64-bit 2.6.32.23 as is,
> and both 32- and 64-bit 2.6.35.6 with the printk.time patch
> applied.

Ok, I can confirm there's another issue somewhere around this.

After numerous tries I noticed that guests sporadically stop
during bootup - either somewhere in the middle or at the very
end of it.  It is definitely not this problem with printk time,
but it still appears to be related to kvm-clock and smp.

This time, the lockup isn't really a lockup per se - the system
works (for some value of "works") - it reacts to the keyboard, and
I can scroll the text console up and down.  But it does nothing
more, and in particular I have no idea what it is waiting for.  It
does not consume host CPU as the printk.time problem did.

It happens most with the 2.6.35.6 32bit guest kernel. I wasn't able
to reproduce it with 2.6.35.6 64bit.  It does not happen on
2.6.35.3, and it happens sporadically on 2.6.32.23 32bit too.

The thing always happens during some module load or other
_kernel_ work.  For example, right now I have a 2.6.35.6 32bit
kernel sitting after the login prompt (the Login: is in the middle
of the screen), with a few messages after the login prompt
telling me about various "misc" drivers (floppy, parport,
sg, piix_smbus etc) loaded.

Booting with clocksource=tsc does not expose the problem
so far - at least the most problematic 2.6.35.6 32bit always
booted ok with tsc.  But since the issue is intermittent,
one can't be sure it's really pvclock.

/mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30  9:59                                     ` Michael Tokarev
@ 2010-09-30 13:54                                       ` Zachary Amsden
  2010-09-30 15:12                                         ` Michael Tokarev
  0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-09-30 13:54 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Arjan Koers, kvm, Avi Kivity, Glauber Costa, Andre Przywara

On 09/29/2010 11:59 PM, Michael Tokarev wrote:
> 30.09.2010 11:55, Michael Tokarev wrote:
>    
>> 29.09.2010 23:26, Arjan Koers wrote:
>>      
>>> On 2010-09-29 11:19, Michael Tokarev wrote:
>>>        
>>>> 29.09.2010 13:17, Michael Tokarev wrote:
>>>>          
>> []
>>      
>>>>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>>>>> stable) and 2.6.35.
>>>>>            
>>>> Er. Please excuse me for the misinformation.  It is _not_ for 2.6.32
>>>> ofcourse.
>>>>          
>>> I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
>>> hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
>>> (introduced in 2.6.32.16) makes it boot again.
>>>
>>> The kvmclock printk patch doesn't help, but I'll try to figure out
>>> what's wrong.
>>>        
>> It works here just fine - both 32- and 64-bit 2.6.32.23 as is,
>> and both 32- and 64-bit 2.6.35.6 with the printk.time patch
>> applied.
>>      
> Ok, I can confirm there's another issue somewhere around this.
>
> After numerous tries I noticed that guests sporadically stops
> during bootup - either somewhere in the middle or at the very
> end of it.  It is definitely not this problem with printk time,
> but it appears to be related to kvm-clock still, and smp.
>
> This time, the lockup isn't really a lock up per se - the system
> works (fsvo) - it reacts to keyboard, I can scroll up/down the
> text console.  But it does nothing more, and in particular I've
> no idea what it is waiting for.  It does not consume host CPU
> as the printk.time problem had.
>
> Happens most with 2.6.35.6 32bit guest kernel. I weren't able
> to reproduce it with 2.6.35.6 64bit.  Does not happen on
> 2.6.35.3. And happens sporadically on 2.6.32.23 32bit too.
>
> The thing always happens during some module load or other
> _kernel_ work.  F.e. right now I've 2.6.35.6 32bit kernel
> sitting after the login prompt (the Login: is at the middle
> of the screen), with a few messages after the login prompt
> telling me about various "misc" drivers (floppy, parport,
> sg, piix_smbus etc) loaded.
>
> Booting with clocksource=tsc does not expose the problem
> so far - at least the most problematic 2.6.35.6 32bit always
> booted ok with tsc.  But since the issue is intermittent,
> one can't be sure it's really pvclock.
>
> /mjt
>    

The printk movement is just a bandaid patch, correct?  Anything which 
does printk before kvmclock is registered could trigger the same bug.

Can you try with printk timing disabled and see if the bug disappears?

Zach

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 13:54                                       ` Zachary Amsden
@ 2010-09-30 15:12                                         ` Michael Tokarev
  2010-09-30 15:32                                           ` Zachary Amsden
  0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 15:12 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan Koers, kvm, Avi Kivity, Glauber Costa, Andre Przywara

30.09.2010 17:54, Zachary Amsden wrote:
[]
> The printk movement is just a bandaid patch, correct?  Anything which
> does printk before kvmclock is registered could trigger the same bug.

Well, I'd not say it's just a bandaid patch, it's a real bug -- either
we can read kvmclock (so it's initialized), or we don't touch it (at
least before registration).

> Can you try with printk timing disabled and see if the bug disappears?

Yes, it disappears so far; at least I can't trigger it anymore. I tried
numerous boots including the 2.6.35.6 32bit kernel (patched with the
printk registration patch!) which shows the problem in almost every boot.

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 15:12                                         ` Michael Tokarev
@ 2010-09-30 15:32                                           ` Zachary Amsden
  2010-09-30 18:49                                             ` Arjan Koers
  0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-09-30 15:32 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Arjan Koers, kvm, Avi Kivity, Glauber Costa, Andre Przywara

On 09/30/2010 05:12 AM, Michael Tokarev wrote:
> 30.09.2010 17:54, Zachary Amsden wrote:
> []
>    
>> The printk movement is just a bandaid patch, correct?  Anything which
>> does printk before kvmclock is registered could trigger the same bug.
>>      
> Well, I'd not say it's just a bandaid patch, it's real bug -- either
> we can read kvmclock (so it's initialized), or we don't touch it (at
> least before registration).
>    

Yes, that's the bug, but moving the printk doesn't fix that, it just 
hides it.

>    
>> Can you try with printk timing disabled and see if the bug disappears?
>>      
> Yes it disappears so far, at last I can't trigger it anymore, tried
> numerous boots including the 2.6.35.6 32bit kernel (patched with the
> printk registration patch!) which shows the prob in almost every boot.
>    

So, looks like we need to do the real fix.
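
For illustration, the "initialized flag" idea raised earlier in the
thread could look roughly like the userspace sketch below. This is only
an assumption about the shape of such a fix, not the patch that was
eventually written. struct cpu_clock, register_clock() and read_clock()
are hypothetical names, and a plain array stands in for the per-cpu
hv_clock area.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 2	/* hypothetical: matches the smp=2 guest in this thread */

struct cpu_clock {
	bool registered;	/* set only once registration has completed */
	uint64_t system_time;	/* stands in for data filled in by the host */
};

static struct cpu_clock clocks[NR_CPUS];

/* Stand-in for kvm_register_clock(): publish the area, then mark it usable. */
static void register_clock(int cpu, uint64_t now)
{
	clocks[cpu].system_time = now;
	clocks[cpu].registered = true;
}

/* Readers get 0 instead of stale or foreign data before registration. */
static uint64_t read_clock(int cpu)
{
	if (!clocks[cpu].registered)
		return 0;
	return clocks[cpu].system_time;
}
```

With a guard like this, an early printk on a secondary CPU would print
a zero timestamp rather than feed a bogus value into the clock.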

Zach

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 15:32                                           ` Zachary Amsden
@ 2010-09-30 18:49                                             ` Arjan Koers
  2010-09-30 19:05                                               ` Marcelo Tosatti
  0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-09-30 18:49 UTC (permalink / raw)
  To: kvm
  Cc: Michael Tokarev, Zachary Amsden, Avi Kivity, Glauber Costa,
	Andre Przywara

On 2010-09-30 17:32, Zachary Amsden wrote:
> On 09/30/2010 05:12 AM, Michael Tokarev wrote:
>> 30.09.2010 17:54, Zachary Amsden wrote:
>> []
>>   
>>> The printk movement is just a bandaid patch, correct?  Anything which
>>> does printk before kvmclock is registered could trigger the same bug.
>>>      
>> Well, I'd not say it's just a bandaid patch, it's real bug -- either
>> we can read kvmclock (so it's initialized), or we don't touch it (at
>> least before registration).
>>    
> 
> Yes, that's the bug, but moving the printk doesn't fix that, it just
> hides it.

Correct. It's just luck that it works for my 64-bit 2.6.34.* and
2.6.35.* kernels. The working kernels will break when compiled to
print additional debug information.

Here's a partial boot log of 2.6.32.23 with smpboot.c compiled
with DEBUG defined. I modified printk to display the CPU# (printk_cpu).
All lines on CPU 1 up to 0.136487 are using the invalid clock and
will cause the kernel to hang later (if I hadn't patched pvclock
to correct the clock backwards).
...
[0:    0.124221] Booting processor 1 APIC 0x1 ip 0x6000
[0:    0.124579] Setting warm reset code and vector.
[0:    0.124585] 1.
[0:    0.124587] 2.
[0:    0.124588] 3.
[0:    0.124601] Asserting INIT.
[0:    0.124613] Waiting for send to finish...
[0:    0.134490] Deasserting INIT.
[0:    0.134497] Waiting for send to finish...
[0:    0.134501] #startup loops: 2.
[0:    0.134503] Sending STARTUP #1.
[0:    0.134508] After apic_write.
[1:    0.008000] Initializing CPU#1
[1:    0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0:    0.134826] Startup point 1.
[0:    0.135133] Waiting for send to finish...
[0:    0.135340] Sending STARTUP #2.
[0:    0.135346] After apic_write.
[0:    0.135650] Startup point 1.
[0:    0.135651] Waiting for send to finish...
[0:    0.135858] After Startup.
[0:    0.135859] Before Callout 1.
[0:    0.135861] After Callout 1.
[1:    0.008000] CALLIN, before setup_local_APIC().
[1:    0.008000] Stack at about ffff88001f889f44
[1:    0.008000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[1:    0.008000] CPU: L2 Cache: 512K (64 bytes/line)
[1:    0.008000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[0:    0.136461] OK.
[0:    0.136463] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[0:    0.136465] CPU has booted.
[0:    0.136488] Brought up 2 CPUs
[0:    0.136489] Boot done.
[0:    0.136490] Before bogomips.
[0:    0.136491] Total of 2 processors activated (11202.17 BogoMIPS).
[0:    0.136493] Before bogocount - setting activated=1.
[1:    0.136487] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0:    0.144007] CPU0 attaching sched-domain:
[0:    0.144010]  domain 0: span 0-1 level CPU
[0:    0.144012]   groups: 0 1
[0:    0.144016] CPU1 attaching sched-domain:
[0:    0.144018]  domain 0: span 0-1 level CPU
[0:    0.144020]   groups: 1 0
[0:    0.144219] NET: Registered protocol family 16
[0:    0.148091] PCI: Using configuration type 1 for base access
[0:    0.148451] PCI: Using configuration type 1 for extended access
[0:    0.148870] mtrr: your CPUs had inconsistent variable MTRR settings
[0:    0.148870] mtrr: your CPUs had inconsistent MTRRdefType settings
[0:    0.148870] mtrr: probably your BIOS does not setup all CPUs.
[0:    0.149185] mtrr: corrected configuration.
[0:    0.156112] bio: create slab <bio-0> at 0
[0:    0.156635] vgaarb: loaded
[0:    0.156635] PCI: Probing PCI hardware
[0:    0.156635] PCI: Probing PCI hardware (bus 00)
[0:    0.156635] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[0:    0.156773] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[0:    0.160012] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[0:    0.163379] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[0:    0.164660] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[0:    0.170537] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[0:    0.170629] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2001000-0xf2001fff]
[0:    0.171037] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[0:    0.171373] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[0:    0.172273] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[0:    0.173099] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0:    0.176131] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
[0:    0.177112] Switching to clocksource kvm-clock
[1:    0.181401] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
[1:    0.181412] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[1:    0.181825] NET: Registered protocol family 2
...


>>> Can you try with printk timing disabled and see if the bug disappears?
>>>      
>> Yes it disappears so far, at least I can't trigger it anymore, tried
>> numerous boots including the 2.6.35.6 32bit kernel (patched with the
>> printk registration patch!) which shows the problem in almost every boot.
>>    
> 
> So, looks like we need to do the real fix.

Your ideas to zero hv_clock or to use an initialized flag may be usable.
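
For reference, the "initialized flag" idea could look roughly like the
sketch below. It is a userspace illustration only, not kernel code:
struct cpu_clock, its registered field, clock_read() and
clock_read_demo() are made-up names, and C11 atomics stand in for the
kernel's atomic helpers. The reader refuses to touch per-CPU data before
registration and otherwise clamps against a global last value, in the
spirit of the global synchronization point, so timestamps never go
backwards across CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-CPU clock area; not the kernel's struct. */
struct cpu_clock {
    bool     registered;   /* set once the clock area has been registered */
    uint64_t system_time;  /* microseconds here; meaningless until registered */
};

/* Highest timestamp handed out so far on any CPU. */
_Atomic uint64_t last_value;

/* Return a timestamp that never goes backwards across CPUs. */
uint64_t clock_read(const struct cpu_clock *c)
{
    assert(c != NULL);

    uint64_t last = atomic_load(&last_value);

    /* Before registration the per-CPU data is not valid: do not read it. */
    if (!c->registered)
        return last;

    uint64_t now = c->system_time;
    for (;;) {
        if (now <= last)
            return last;            /* another CPU is already ahead of us */
        if (atomic_compare_exchange_weak(&last_value, &last, now))
            return now;             /* we published the new maximum */
        /* CAS failed: 'last' now holds the current value, so retry. */
    }
}

/* Small demonstration with values loosely modelled on the log above. */
void clock_read_demo(void)
{
    struct cpu_clock cpu0 = { .registered = true,  .system_time = 136461 };
    struct cpu_clock cpu1 = { .registered = false, .system_time = 8000 };

    assert(clock_read(&cpu0) == 136461);
    assert(clock_read(&cpu1) == 136461);  /* stale 8000 is ignored */

    cpu1.registered = true;
    cpu1.system_time = 136487;
    assert(clock_read(&cpu1) == 136487);
}
```

With a guard like this, an early printk on the secondary CPU would see
the last globally published time instead of whatever happens to be in
its unregistered per-CPU area. Whether that is the right behaviour for
the real kvmclock code is a separate question.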

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 18:49                                             ` Arjan Koers
@ 2010-09-30 19:05                                               ` Marcelo Tosatti
  2010-09-30 20:16                                                 ` Arjan Koers
  2010-09-30 23:02                                                 ` Michael Tokarev
  0 siblings, 2 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2010-09-30 19:05 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Michael Tokarev, Zachary Amsden, Avi Kivity, Glauber Costa,
	Andre Przywara

On Thu, Sep 30, 2010 at 08:49:44PM +0200, Arjan Koers wrote:
> On 2010-09-30 17:32, Zachary Amsden wrote:
> > On 09/30/2010 05:12 AM, Michael Tokarev wrote:
> >> 30.09.2010 17:54, Zachary Amsden wrote:
> >> []
> >>   
> >>> The printk movement is just a bandaid patch, correct?  Anything which
> >>> does printk before kvmclock is registered could trigger the same bug.
> >>>      
> >> Well, I'd not say it's just a bandaid patch, it's a real bug -- either
> >> we can read kvmclock (so it's initialized), or we don't touch it (at
> >> least before registration).
> >>    
> > 
> > Yes, that's the bug, but moving the printk doesn't fix that, it just
> > hides it.
> 
> Correct. It's just luck that it works for my 64-bit 2.6.34.* and
> 2.6.35.* kernels. The working kernels will break when compiled to
> print additional debug information.
> 
> Here's a partial boot log of 2.6.32.23 with smpboot.c compiled
> with DEBUG defined. I modified printk to display the CPU# (printk_cpu).
> All lines on CPU 1 up to 0.136487 are using the invalid clock and
> will cause the kernel to hang later (if I hadn't patched pvclock
> to correct the clock backwards).
> ...
> [0:    0.124221] Booting processor 1 APIC 0x1 ip 0x6000
> [0:    0.124579] Setting warm reset code and vector.
> [0:    0.124585] 1.
> [0:    0.124587] 2.
> [0:    0.124588] 3.
> [0:    0.124601] Asserting INIT.
> [0:    0.124613] Waiting for send to finish...
> [0:    0.134490] Deasserting INIT.
> [0:    0.134497] Waiting for send to finish...
> [0:    0.134501] #startup loops: 2.
> [0:    0.134503] Sending STARTUP #1.
> [0:    0.134508] After apic_write.
> [1:    0.008000] Initializing CPU#1
> [1:    0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
> [0:    0.134826] Startup point 1.
> [0:    0.135133] Waiting for send to finish...
> [0:    0.135340] Sending STARTUP #2.
> [0:    0.135346] After apic_write.
> [0:    0.135650] Startup point 1.
> [0:    0.135651] Waiting for send to finish...
> [0:    0.135858] After Startup.
> [0:    0.135859] Before Callout 1.
> [0:    0.135861] After Callout 1.
> [1:    0.008000] CALLIN, before setup_local_APIC().
> [1:    0.008000] Stack at about ffff88001f889f44
> [1:    0.008000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [1:    0.008000] CPU: L2 Cache: 512K (64 bytes/line)
> [1:    0.008000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
> [0:    0.136461] OK.
> [0:    0.136463] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
> [0:    0.136465] CPU has booted.
> [0:    0.136488] Brought up 2 CPUs
> [0:    0.136489] Boot done.
> [0:    0.136490] Before bogomips.
> [0:    0.136491] Total of 2 processors activated (11202.17 BogoMIPS).
> [0:    0.136493] Before bogocount - setting activated=1.
> [1:    0.136487] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [0:    0.144007] CPU0 attaching sched-domain:
> [0:    0.144010]  domain 0: span 0-1 level CPU
> [0:    0.144012]   groups: 0 1
> [0:    0.144016] CPU1 attaching sched-domain:
> [0:    0.144018]  domain 0: span 0-1 level CPU
> [0:    0.144020]   groups: 1 0
> [0:    0.144219] NET: Registered protocol family 16
> [0:    0.148091] PCI: Using configuration type 1 for base access
> [0:    0.148451] PCI: Using configuration type 1 for extended access
> [0:    0.148870] mtrr: your CPUs had inconsistent variable MTRR settings
> [0:    0.148870] mtrr: your CPUs had inconsistent MTRRdefType settings
> [0:    0.148870] mtrr: probably your BIOS does not setup all CPUs.
> [0:    0.149185] mtrr: corrected configuration.
> [0:    0.156112] bio: create slab <bio-0> at 0
> [0:    0.156635] vgaarb: loaded
> [0:    0.156635] PCI: Probing PCI hardware
> [0:    0.156635] PCI: Probing PCI hardware (bus 00)
> [0:    0.156635] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
> [0:    0.156773] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
> [0:    0.160012] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
> [0:    0.163379] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
> [0:    0.164660] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
> [0:    0.170537] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
> [0:    0.170629] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2001000-0xf2001fff]
> [0:    0.171037] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
> [0:    0.171373] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
> [0:    0.172273] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
> [0:    0.173099] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [0:    0.176131] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> [0:    0.177112] Switching to clocksource kvm-clock
> [1:    0.181401] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
> [1:    0.181412] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
> [1:    0.181825] NET: Registered protocol family 2
> ...
> 
> 
> >>> Can you try with printk timing disabled and see if the bug disappears?
> >>>      
> >> Yes it disappears so far, at least I can't trigger it anymore, tried
> >> numerous boots including the 2.6.35.6 32bit kernel (patched with the
> >> printk registration patch!) which shows the problem in almost every boot.
> >>    
> > 
> > So, looks like we need to do the real fix.
> 
> Your ideas to zero hv_clock or to use an initialized flag may be usable.

Arjan, Michael, can you try the following:

From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
From: Zachary Amsden <zamsden@redhat.com>
Date: Sat, 29 May 2010 17:52:46 -1000
Subject: [KVM V2 04/25] Fix SVM VMCB reset
Cc: Avi Kivity <avi@redhat.com>,
    Marcelo Tosatti <mtosatti@redhat.com>,
    Glauber Costa <glommer@redhat.com>,
    linux-kernel@vger.kernel.org

On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kvm/svm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 760c86e..46856d2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
-	control->tsc_offset = 0;
+	guest_write_tsc(&svm->vcpu, 0);
 	control->int_ctl = V_INTR_MASKING_MASK;
 
 	init_seg(&save->es);
-- 
1.7.1
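
To spell out the arithmetic behind the commit message: the guest sees
the host TSC plus the VMCB tsc_offset, so an offset of zero exposes the
raw host counter, while making the guest read zero needs an offset of
minus the current host TSC. The sketch below only illustrates that
relationship. guest_tsc(), offset_for() and tsc_reset_demo() are
hypothetical helper names, not the kernel's functions, and the host TSC
values are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Value the guest observes; the addition wraps modulo 2^64. */
uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
    return host_tsc + tsc_offset;
}

/* Offset that makes the guest observe 'wanted' when the host reads 'host_tsc'. */
uint64_t offset_for(uint64_t host_tsc, uint64_t wanted)
{
    return wanted - host_tsc;
}

/* Compare the old reset behaviour (offset = 0) with the intended one. */
void tsc_reset_demo(void)
{
    uint64_t host_at_reset = 5000000;   /* invented host TSC at vcpu reset */

    /* Old code: offset 0 passes the host TSC straight through. */
    assert(guest_tsc(host_at_reset, 0) == host_at_reset);

    /* Intended: the guest TSC starts at 0 and then advances with the host. */
    uint64_t off = offset_for(host_at_reset, 0);
    assert(guest_tsc(host_at_reset, off) == 0);
    assert(guest_tsc(host_at_reset + 100, off) == 100);
}
```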




^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 19:05                                               ` Marcelo Tosatti
@ 2010-09-30 20:16                                                 ` Arjan Koers
  2010-09-30 23:02                                                 ` Michael Tokarev
  1 sibling, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-09-30 20:16 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm, Michael Tokarev, Zachary Amsden, Avi Kivity, Glauber Costa,
	Andre Przywara

On 2010-09-30 21:05, Marcelo Tosatti wrote:
> 
> Arjan, Michael, can you try the following:
> 
> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
> From: Zachary Amsden <zamsden@redhat.com>
> Date: Sat, 29 May 2010 17:52:46 -1000
> Subject: [KVM V2 04/25] Fix SVM VMCB reset
> Cc: Avi Kivity <avi@redhat.com>,
>     Marcelo Tosatti <mtosatti@redhat.com>,
>     Glauber Costa <glommer@redhat.com>,
>     linux-kernel@vger.kernel.org
> 
> On reset, VMCB TSC should be set to zero.  Instead, code was setting
> tsc_offset to zero, which passes through the underlying TSC.
> 
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
>  arch/x86/kvm/svm.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 760c86e..46856d2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>  
>  	control->iopm_base_pa = iopm_base;
>  	control->msrpm_base_pa = __pa(svm->msrpm);
> -	control->tsc_offset = 0;
> +	guest_write_tsc(&svm->vcpu, 0);
>  	control->int_ctl = V_INTR_MASKING_MASK;
>  
>  	init_seg(&save->es);

It doesn't solve my problem. I tried on 2.6.32.23 and 2.6.36-rc6.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 19:05                                               ` Marcelo Tosatti
  2010-09-30 20:16                                                 ` Arjan Koers
@ 2010-09-30 23:02                                                 ` Michael Tokarev
  2010-09-30 23:07                                                   ` Michael Tokarev
  1 sibling, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 23:02 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Arjan Koers, kvm, Zachary Amsden, Avi Kivity, Glauber Costa,
	Andre Przywara

30.09.2010 23:05, Marcelo Tosatti wrote:
[]
> Arjan, Michael, can you try the following:
> 
> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
> From: Zachary Amsden <zamsden@redhat.com>
> Date: Sat, 29 May 2010 17:52:46 -1000
> Subject: [KVM V2 04/25] Fix SVM VMCB reset
> Cc: Avi Kivity <avi@redhat.com>,
>     Marcelo Tosatti <mtosatti@redhat.com>,
>     Glauber Costa <glommer@redhat.com>,
>     linux-kernel@vger.kernel.org
> 
> On reset, VMCB TSC should be set to zero.  Instead, code was setting
> tsc_offset to zero, which passes through the underlying TSC.
> 
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
>  arch/x86/kvm/svm.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 760c86e..46856d2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>  
>  	control->iopm_base_pa = iopm_base;
>  	control->msrpm_base_pa = __pa(svm->msrpm);
> -	control->tsc_offset = 0;
> +	guest_write_tsc(&svm->vcpu, 0);
>  	control->int_ctl = V_INTR_MASKING_MASK;

This fails to compile on 2.6.35.5:

arch/x86/kvm/svm.c: In function ‘init_vmcb’:
arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’

I'll take a look tomorrow where that comes from.. hopefully ;)

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 23:02                                                 ` Michael Tokarev
@ 2010-09-30 23:07                                                   ` Michael Tokarev
  2010-10-01  1:13                                                     ` Zachary Amsden
  2010-10-02  5:35                                                     ` Zachary Amsden
  0 siblings, 2 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 23:07 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Arjan Koers, kvm, Zachary Amsden, Avi Kivity, Glauber Costa,
	Andre Przywara

01.10.2010 03:02, Michael Tokarev wrote:
> 30.09.2010 23:05, Marcelo Tosatti wrote:
> []
>> Arjan, Michael, can you try the following:
>>
>> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
>> From: Zachary Amsden <zamsden@redhat.com>
>> Date: Sat, 29 May 2010 17:52:46 -1000
>> Subject: [KVM V2 04/25] Fix SVM VMCB reset
>> Cc: Avi Kivity <avi@redhat.com>,
>>     Marcelo Tosatti <mtosatti@redhat.com>,
>>     Glauber Costa <glommer@redhat.com>,
>>     linux-kernel@vger.kernel.org
>>
>> On reset, VMCB TSC should be set to zero.  Instead, code was setting
>> tsc_offset to zero, which passes through the underlying TSC.
>>
>> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>> ---
>>  arch/x86/kvm/svm.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 760c86e..46856d2 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>  
>>  	control->iopm_base_pa = iopm_base;
>>  	control->msrpm_base_pa = __pa(svm->msrpm);
>> -	control->tsc_offset = 0;
>> +	guest_write_tsc(&svm->vcpu, 0);
>>  	control->int_ctl = V_INTR_MASKING_MASK;
> 
> This fails to compile on 2.6.35.5:
> 
> arch/x86/kvm/svm.c: In function ‘init_vmcb’:
> arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
> 
> I'll take a look tomorrow where that comes from.. hopefully ;)

Ok, that routine is static, defined in arch/x86/kvm/vmx.c
(not svm.c).  I'm not sure it's ok to use it in svm.c
directly, as it appears to be vmx-specific.

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 23:07                                                   ` Michael Tokarev
@ 2010-10-01  1:13                                                     ` Zachary Amsden
  2010-10-02  5:35                                                     ` Zachary Amsden
  1 sibling, 0 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-01  1:13 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara

On 09/30/2010 01:07 PM, Michael Tokarev wrote:
> 01.10.2010 03:02, Michael Tokarev wrote:
>    
>> 30.09.2010 23:05, Marcelo Tosatti wrote:
>> []
>>      
>>> Arjan, Michael, can you try the following:
>>>
>>>  From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
>>> From: Zachary Amsden <zamsden@redhat.com>
>>> Date: Sat, 29 May 2010 17:52:46 -1000
>>> Subject: [KVM V2 04/25] Fix SVM VMCB reset
>>> Cc: Avi Kivity <avi@redhat.com>,
>>>      Marcelo Tosatti <mtosatti@redhat.com>,
>>>      Glauber Costa <glommer@redhat.com>,
>>>      linux-kernel@vger.kernel.org
>>>
>>> On reset, VMCB TSC should be set to zero.  Instead, code was setting
>>> tsc_offset to zero, which passes through the underlying TSC.
>>>
>>> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>>> ---
>>>   arch/x86/kvm/svm.c |    2 +-
>>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 760c86e..46856d2 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>
>>>   	control->iopm_base_pa = iopm_base;
>>>   	control->msrpm_base_pa = __pa(svm->msrpm);
>>> -	control->tsc_offset = 0;
>>> +	guest_write_tsc(&svm->vcpu, 0);
>>>   	control->int_ctl = V_INTR_MASKING_MASK;
>>>        
>> This fails to compile on 2.6.35.5:
>>
>> arch/x86/kvm/svm.c: In function ‘init_vmcb’:
>> arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
>>
>> I'll take a look tomorrow where that comes from.. hopefully ;)
>>      
> Ok, that routine is static, defined in arch/x86/kvm/vmx.c
> (not svm.c).  I'm not sure it's ok to use it in svm.c
> directly, as it appears to be vmx-specific.
>    

Looks like you are missing some patches in between which move this into 
common code, so it won't apply directly.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-09-30 23:07                                                   ` Michael Tokarev
  2010-10-01  1:13                                                     ` Zachary Amsden
@ 2010-10-02  5:35                                                     ` Zachary Amsden
  2010-10-02  7:35                                                       ` Michael Tokarev
  1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-02  5:35 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]

On 09/30/2010 01:07 PM, Michael Tokarev wrote:
> 01.10.2010 03:02, Michael Tokarev wrote:
>    
>> 30.09.2010 23:05, Marcelo Tosatti wrote:
>> []
>>      
>>> Arjan, Michael, can you try the following:
>>>
>>>  From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
>>> From: Zachary Amsden <zamsden@redhat.com>
>>> Date: Sat, 29 May 2010 17:52:46 -1000
>>> Subject: [KVM V2 04/25] Fix SVM VMCB reset
>>> Cc: Avi Kivity <avi@redhat.com>,
>>>      Marcelo Tosatti <mtosatti@redhat.com>,
>>>      Glauber Costa <glommer@redhat.com>,
>>>      linux-kernel@vger.kernel.org
>>>
>>> On reset, VMCB TSC should be set to zero.  Instead, code was setting
>>> tsc_offset to zero, which passes through the underlying TSC.
>>>
>>> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>>> ---
>>>   arch/x86/kvm/svm.c |    2 +-
>>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 760c86e..46856d2 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>
>>>   	control->iopm_base_pa = iopm_base;
>>>   	control->msrpm_base_pa = __pa(svm->msrpm);
>>> -	control->tsc_offset = 0;
>>> +	guest_write_tsc(&svm->vcpu, 0);
>>>   	control->int_ctl = V_INTR_MASKING_MASK;
>>>        
>> This fails to compile on 2.6.35.5:
>>
>> arch/x86/kvm/svm.c: In function ‘init_vmcb’:
>> arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
>>
>> I'll take a look tomorrow where that comes from.. hopefully ;)
>>      
> Ok, that routine is static, defined in arch/x86/kvm/vmx.c
> (not svm.c).  I'm not sure it's ok to use it in svm.c
> directly, as it appears to be vmx-specific.
>
> Thanks!
>
> /mjt
>    


Can you try this patch to see if it helps?  I believe it is also safe 
for Xen, but cc'ing to double check.

[-- Attachment #2: kvmclock-fix-hack-1.patch --]
[-- Type: text/plain, Size: 807 bytes --]

Try to fix setup_percpu_clockdev by moving it before interrupts
are enabled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8b3bfc4..40a383b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -351,6 +351,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	unlock_vector_lock();
 	ipi_call_unlock();
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+	x86_cpuinit.setup_percpu_clockev();
+
 	x86_platform.nmi_init();
 
 	/* enable local interrupts */
@@ -359,8 +361,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 	/* to prevent fake stack check failure in clock setup */
 	boot_init_stack_canary();
 
-	x86_cpuinit.setup_percpu_clockev();
-
 	wmb();
 	cpu_idle();
 }

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02  5:35                                                     ` Zachary Amsden
@ 2010-10-02  7:35                                                       ` Michael Tokarev
  2010-10-02  7:40                                                         ` Michael Tokarev
                                                                           ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02  7:35 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

02.10.2010 09:35, Zachary Amsden wrote:
[]
> Can you try this patch to see if it helps?  I believe it is also safe
> for Xen, but cc'ing to double check.

It makes no visible difference.

For some reason one of my test guests - 2.6.35.6 32bit kernel -
stopped booting completely, always hanging at boot somewhere
unless I disable printk.time.  Here are the typical boot messages,
up to the hang:

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.35-i686 (mjt@gandalf) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2.6.35.6 SMP Thu Sep 30 12:00:24 MSD 2010
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[    0.000000]  BIOS-e820: 00000000feffd000 - 00000000ff001000 (reserved)
[    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x100000
[    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[    0.000000] found SMP MP-table at [c00fdbe0] fdbe0
[    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[    0.000000] RAMDISK: 1fbb5000 - 1fe96000
[    0.000000] ACPI: RSDP 000fdb90 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 1fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 1ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 1fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[    0.000000] ACPI: FACS 1ffffe00 00040
[    0.000000] ACPI: SSDT 1fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 1fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 1fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] 0MB HIGHMEM available.
[    0.000000] 511MB LOWMEM available.
[    0.000000]   mapped low ram: 0 - 1fffd000
[    0.000000]   low ram: 0 - 1fffd000
[    0.000000] kvm-clock: Using msrs 12 and 11
[    0.000000] kvm-clock: cpu 0, msr 0:13c60c1, boot clock
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000001 -> 0x00001000
[    0.000000]   Normal   0x00001000 -> 0x0001fffd
[    0.000000]   HighMem  empty
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000001 -> 0x0000009f
[    0.000000]     0: 0x00000100 -> 0x0001fffd
[    0.000000] Using APIC driver default
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:deffd000)
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] PERCPU: Embedded 16 pages/cpu @c1c00000 s43072 r0 d22464 u2097152
[    0.000000] pcpu-alloc: s43072 r0 d22464 u2097152 alloc=1*4194304
[    0.000000] pcpu-alloc: [0] 0 1
[    0.000000] kvm-clock: cpu 0, msr 0:1c0a0c1, primary cpu clock
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129947
[    0.000000] Kernel command line: acpi_enforce_resources=lax rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=tty1 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
[    0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Enabling fast FPU save and restore... done.
[    0.000000] Enabling unmasked SIMD FPU exception support... done.
[    0.000000] Initializing CPU#0
[    0.000000] Subtract (42 early reservations)
[    0.000000]   #1 [0000001000 - 0000002000]   EX TRAMPOLINE
[    0.000000]   #2 [0001000000 - 000144a9e4]   TEXT DATA BSS
[    0.000000]   #3 [001fbb5000 - 001fe96000]         RAMDISK
[    0.000000]   #4 [000144b000 - 0001451049]             BRK
[    0.000000]   #5 [000009f400 - 00000fdbe0]   BIOS reserved
[    0.000000]   #6 [00000fdbe0 - 00000fdbf0]    MP-table mpf
[    0.000000]   #7 [00000fdce4 - 0000100000]   BIOS reserved
[    0.000000]   #8 [00000fdbf0 - 00000fdce4]    MP-table mpc
[    0.000000]   #9 [0000002000 - 0000003000]      TRAMPOLINE
[    0.000000]   #10 [0000003000 - 0000007000]     ACPI WAKEUP
[    0.000000]   #11 [0000007000 - 0000008000]         PGTABLE
[    0.000000]   #12 [0001452000 - 0001453000]         BOOTMEM
[    0.000000]   #13 [0001453000 - 0001853000]         BOOTMEM
[    0.000000]   #14 [000144aa00 - 000144aa04]         BOOTMEM
[    0.000000]   #15 [000144aa40 - 000144ab00]         BOOTMEM
[    0.000000]   #16 [000144ab00 - 000144ab30]         BOOTMEM
[    0.000000]   #17 [0001853000 - 0001854800]         BOOTMEM
[    0.000000]   #18 [000144ab40 - 000144ab65]         BOOTMEM
[    0.000000]   #19 [000144ab80 - 000144aba7]         BOOTMEM
[    0.000000]   #20 [000144abc0 - 000144aca0]         BOOTMEM
[    0.000000]   #21 [000144acc0 - 000144ad00]         BOOTMEM
[    0.000000]   #22 [000144ad00 - 000144ad40]         BOOTMEM
[    0.000000]   #23 [000144ad40 - 000144ad80]         BOOTMEM
[    0.000000]   #24 [000144ad80 - 000144adc0]         BOOTMEM
[    0.000000]   #25 [000144adc0 - 000144ae00]         BOOTMEM
[    0.000000]   #26 [000144ae00 - 000144ae40]         BOOTMEM
[    0.000000]   #27 [000144ae40 - 000144ae80]         BOOTMEM
[    0.000000]   #28 [000144ae80 - 000144ae90]         BOOTMEM
[    0.000000]   #29 [000144aec0 - 000144afcf]         BOOTMEM
[    0.000000]   #30 [0001451080 - 000145118f]         BOOTMEM
[    0.000000]   #31 [0001c00000 - 0001c10000]         BOOTMEM
[    0.000000]   #32 [0001e00000 - 0001e10000]         BOOTMEM
[    0.000000]   #33 [00014511c0 - 00014511c4]         BOOTMEM
[    0.000000]   #34 [0001451200 - 0001451204]         BOOTMEM
[    0.000000]   #35 [0001451240 - 0001451248]         BOOTMEM
[    0.000000]   #36 [0001451280 - 0001451288]         BOOTMEM
[    0.000000]   #37 [00014512c0 - 0001451368]         BOOTMEM
[    0.000000]   #38 [0001451380 - 00014513e8]         BOOTMEM
[    0.000000]   #39 [0001854800 - 0001856800]         BOOTMEM
[    0.000000]   #40 [0001856800 - 0001896800]         BOOTMEM
[    0.000000]   #41 [0001896800 - 00018b6800]         BOOTMEM
[    0.000000] Initializing HighMem for node 0 (00000000:00000000)
[    0.000000] Memory: 511856k/524276k available (2554k kernel code, 12028k reserved, 930k data, 380k init, 0k highmem)
[    0.000000] virtual kernel memory layout:
[    0.000000]     fixmap  : 0xfff16000 - 0xfffff000   ( 932 kB)
[    0.000000]     pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
[    0.000000]     vmalloc : 0xe07fd000 - 0xff7fe000   ( 496 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xdfffd000   ( 511 MB)
[    0.000000]       .init : 0xc1368000 - 0xc13c7000   ( 380 kB)
[    0.000000]       .data : 0xc127ebb7 - 0xc1367488   ( 930 kB)
[    0.000000]       .text : 0xc1000000 - 0xc127ebb7   (2554 kB)
[    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU-based detection of stalled CPUs is disabled.
[    0.000000] 	Verbose stalled-CPUs detection is disabled.
[    0.000000] NR_IRQS:512
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty1] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] Detected 3217.252 MHz processor.
[    0.023332] Calibrating delay loop (skipped) preset value.. 6437.60 BogoMIPS (lpj=10724173)
[    0.023332] pid_max: default: 32768 minimum: 301
[    0.023332] Mount-cache hash table entries: 512
[    0.023447] Initializing cgroup subsys ns
[    0.024131] Initializing cgroup subsys cpuacct
[    0.024851] Initializing cgroup subsys devices
[    0.025580] Initializing cgroup subsys freezer
[    0.026669] Initializing cgroup subsys net_cls
[    0.027425] Initializing cgroup subsys blkio
[    0.030079] mce: CPU supports 10 MCE banks
[    0.030847] using C1E aware idle routine
[    0.031517] Performance Events: AMD PMU driver.
[    0.032313] ... version:                0
[    0.033335] ... bit width:              48
[    0.034036] ... generic registers:      4
[    0.034716] ... value mask:             0000ffffffffffff
[    0.035542] ... max period:             00007fffffffffff
[    0.036669] ... fixed-purpose events:   0
[    0.037521] ... event mask:             000000000000000f
[    0.041961] ACPI: Core revision 20100428
[    0.044150] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[    0.045964] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.046671] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
[    0.049999] APIC calibration not consistent with PM-Timer: 102ms instead of 100ms
[    0.049999] APIC delta adjusted to PM-Timer: 6248670 (6435422)
[    0.050298] Booting Node   0, Processors  #1 Ok.
[    0.023332] Initializing CPU#1
[    0.063333] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
[    0.063333] Brought up 2 CPUs
[    0.063333] Total of 2 processors activated (12874.21 BogoMIPS).
[    0.076666] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[    0.116666] devtmpfs: initialized
[    0.116666] NET: Registered protocol family 16
[    0.119999] ACPI: bus type pci registered
[    0.123333] PCI: PCI BIOS revision 2.10 entry at 0xffe77, last bus=0
[    0.123333] PCI: Using configuration type 1 for base access
[    0.123333] PCI: Using configuration type 1 for extended access
[    0.126666] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.126666] mtrr: your CPUs had inconsistent MTRRdefType settings
[    0.126666] mtrr: probably your BIOS does not setup all CPUs.
[    0.126666] mtrr: corrected configuration.
[    0.136666] bio: create slab <bio-0> at 0
[    0.153333] ACPI: Interpreter enabled
[    0.153333] ACPI: (supports S0 S3 S4 S5)
[    0.153333] ACPI: Using IOAPIC for interrupt routing
[    0.203333] ACPI: No dock devices found.
[    0.203333] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[    0.206666] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.209999] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.209999] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[    0.216666] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.219999] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.219999] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.223333] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.223333] HEST: Table is not found!
[    0.226666] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.229999] vgaarb: loaded
[    0.229999] PCI: Using ACPI for IRQ routing
[    0.233333] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.239999] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.239999] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.249999] Switching to clocksource kvm-clock
[    0.259999] pnp: PnP ACPI init
[    0.259999] ACPI: bus type pnp registered
[    0.259999] pnp: PnP ACPI: found 8 devices
[    0.259999] ACPI: ACPI bus type pnp unregistered
[    0.259999] PnPBIOS: Disabled
[    0.259999] NET: Registered protocol family 2
[    0.259999] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[    0.259999] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[    0.259999] TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
[    0.259999] TCP: Hash tables configured (established 16384 bind 16384)
[    0.259999] TCP reno registered
[    0.259999] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    0.259999] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    0.259999] NET: Registered protocol family 1
[    0.259999] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.259999] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.259999] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.259999] Unpacking initramfs...
[    0.259999] Freeing initrd memory: 2948k freed
[    0.259999] HugeTLB registered 4 MB page size, pre-allocated 0 pages
[    0.259999] VFS: Disk quotas dquot_6.5.2
[    0.259999] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.259999] msgmni has been set to 1005
[    0.259999] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.259999] io scheduler noop registered
[    0.259999] io scheduler deadline registered
[    0.259999] io scheduler cfq registered (default)
[    0.259999] ERST: Table is not found!
[    0.259999] isapnp: Scanning for PnP cards...
[    0.259999] isapnp: No Plug & Play device found
[    0.259999] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.259999] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.259999] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.259999] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.259999] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.259999] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.259999] mice: PS/2 mouse device common for all mice
[    0.259999] input: PC Speaker as /devices/platform/pcspkr/input/input0
[    0.259999] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    0.259999] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[    0.259999] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.259999] cpuidle: using governor ladder
[    0.259999] cpuidle: using governor menu
[    0.259999] TCP cubic registered
[    0.259999] NET: Registered protocol family 17
[    0.259999] Using IPI No-Shortcut mode
[    0.259999] rtc_cmos 00:01: setting system clock to 2010-10-02 07:27:50 UTC (1286004470)
[    0.259999] Freeing unused kernel memory: 380k freed
[    0.259999] Processing INITRAMFS
[    0.259999] SCSI subsystem initialized
[    0.259999] scsi0 : ata_piix
[    0.259999] scsi1 : ata_piix
[    0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
[    0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15

Note the time - it is constant after switching to kvmclock.

This is the most typical place where it stops.  Sometimes it
stops at "Freeing unused kernel memory", and sometimes it boots
further and hangs at the "Login:" prompt, right after some other
kernel message.

This is the boot log with the last patch (kvmclock-fix-hack-1.patch)
and the previous "bandaid" patch (the kvmclock registration
printk, use-before-init, which obviously makes no difference)
applied.

I just realized I never posted any boot logs from my systems...
So here it goes :)

Thanks!

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02  7:35                                                       ` Michael Tokarev
@ 2010-10-02  7:40                                                         ` Michael Tokarev
  2010-10-02  7:50                                                           ` Michael Tokarev
  2010-10-02 16:10                                                         ` Arjan Koers
  2010-10-02 21:55                                                         ` Zachary Amsden
  2 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02  7:40 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

02.10.2010 11:35, Michael Tokarev wrote:
[]
> [    0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> [    0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
> 
> Note the time - it is constant after switching to kvmclock.

Another interesting observation.  The time is almost always
like this.  Another very common version is 0.199999:

[    0.189999] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.193333] HEST: Table is not found!
[    0.193333] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.196666] vgaarb: loaded
[    0.196666] PCI: Using ACPI for IRQ routing
[    0.199999] Switching to clocksource kvm-clock
[    0.199999] pnp: PnP ACPI init
[    0.199999] ACPI: bus type pnp registered
[    0.199999] pnp: PnP ACPI: found 8 devices
[    0.199999] ACPI: ACPI bus type pnp unregistered
[    0.199999] PnPBIOS: Disabled
...

This shows up much more often than any other value.

Thanks!

/mjt



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02  7:40                                                         ` Michael Tokarev
@ 2010-10-02  7:50                                                           ` Michael Tokarev
  0 siblings, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02  7:50 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

Ugh.  Replying to myself again and again, but I found all these
variants quite interesting for the problem at hand.

02.10.2010 11:40, Michael Tokarev wrote:
> 02.10.2010 11:35, Michael Tokarev wrote:
> []
>> [    0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [    0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
> 
> Another interesting observation.  The time is almost always
> like this.  Another very common version is 0.199999:
> 
> [    0.189999] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> [    0.193333] HEST: Table is not found!
> [    0.193333] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [    0.196666] vgaarb: loaded
> [    0.196666] PCI: Using ACPI for IRQ routing
> [    0.199999] Switching to clocksource kvm-clock
> [    0.199999] pnp: PnP ACPI init
> [    0.199999] ACPI: bus type pnp registered
> [    0.199999] pnp: PnP ACPI: found 8 devices
> [    0.199999] ACPI: ACPI bus type pnp unregistered
> [    0.199999] PnPBIOS: Disabled
> ...

And here's yet another variant I just got.  It hung much earlier
this time, now with 100% CPU usage:

...
[    0.000000] Kernel command line: rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
...
[    0.009012] using C1E aware idle routine
[    0.009430] Performance Events: AMD PMU driver.
[    0.010009] ... version:                0
[    0.010427] ... bit width:              48
[    0.010853] ... generic registers:      4
[    0.011270] ... value mask:             0000ffffffffffff
[    0.011818] ... max period:             00007fffffffffff
[    0.012366] ... fixed-purpose events:   0
[    0.012785] ... event mask:             000000000000000f
[    0.016795] ACPI: Core revision 20100428
[    0.018729] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[    0.019999] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.019999] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03

and.. nothing (this is with -cpu host).  So this is _way_
before the kvmclock registration.

Another:

...
[    0.109999] vgaarb: loaded
[    0.109999] PCI: Using ACPI for IRQ routing
[    0.113333] Switching to clocksource kvm-clock
[    0.116666] pnp: PnP ACPI init
[    0.116666] ACPI: bus type pnp registered

(note the "uncommon" timestamp ;)

With printk.time=0 it still boots ok.

Note there are 2 "versions" of this hang.  The first is
trivially triggerable right at the kvmclock registration
without the bandaid printk patch applied - it hangs there
with 100% cpu usage and the guest not reacting to any events.
This is what happened in the above case where it hung
at the CPU0 line, too -- 100% CPU and no reaction to keyboard.

The other, much more common variant, with that printk patch
applied, shows no cpu usage; the guest reacts to keyboard
events (I can Shift+PgUp/PgDown for example), but it does
not do anything else, and the time printed is constant.

Thanks!

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02  7:35                                                       ` Michael Tokarev
  2010-10-02  7:40                                                         ` Michael Tokarev
@ 2010-10-02 16:10                                                         ` Arjan Koers
  2010-10-02 20:26                                                           ` Michael Tokarev
  2010-10-02 23:42                                                           ` Zachary Amsden
  2010-10-02 21:55                                                         ` Zachary Amsden
  2 siblings, 2 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-02 16:10 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara, jeremy

On 2010-10-02 09:35, Michael Tokarev wrote:
> 02.10.2010 09:35, Zachary Amsden wrote:
> []
>> Can you try this patch to see if it helps?  I believe it is also safe
>> for Xen, but cc'ing to double check.
> 
> It makes no visible difference.
> 
> For some reason one of my test guests - 2.6.35.6 32bit kernel -
> stopped booting completely, always handing at boot somewhere
> unless I disable printk.time.  Here's the typical boot messages,
> up to the hang:
> 
> [    0.000000] Initializing cgroup subsys cpuset
...
> [    0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> [    0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
> 
> Note the time - it is constant after switching to kvmclock.

While CPU 1 is booting, pvclock_clocksource_read gets wrong data for that
CPU and returns a value that's far into the future. On subsequent calls, it
keeps returning that bogus 'last' value, because it has been made
to never go backwards in time.
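
To illustrate the effect, here is a minimal user-space sketch of the clamp at
the end of pvclock_clocksource_read. It is a model only, not the kernel code:
the function name clamp_monotonic and the use of C11 atomics in place of
atomic64_t are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global high-water mark shared by all CPUs, modelled on last_value. */
static _Atomic uint64_t last_value;

/*
 * Simplified model of the monotonicity clamp: never return a value
 * lower than one that has already been handed out.
 */
static uint64_t clamp_monotonic(uint64_t ret)
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (ret < last)
			return last;	/* stale reading: repeat the old value */
		/* On failure, 'last' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, ret));

	return ret;
}
```

Feeding this the readings 100, then a single bogus 5000000000, then 200 and
300 returns 5000000000 for both of the last two calls: one reading from the
future is enough to freeze the clock until real time catches up with it.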

I'm pretty sure that your kernel will boot with this debug patch (for
2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
The patch sets the clock backwards if it detects that the previous
value was far into the future. It also modifies printk to display some
extra information. The DEBUG define was added to get extra calls to
printk's where things can go wrong.
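
The recovery rule can be sketched in isolation as follows. This is a
single-threaded model under stated assumptions, not the patch itself: struct
clock_state and read_with_recovery are hypothetical names, the values are
taken to be nanoseconds, and the atomic cmpxchg loop is left out for brevity.
Only the 25000000 threshold comes from the patch.

```c
#include <stdint.h>

/* Same limit as in the debug patch; 25 ms if the values are nanoseconds. */
#define FUTURE_THRESHOLD 25000000ULL

struct clock_state {
	uint64_t last;		/* highest value handed out so far */
	int backwards;		/* how often the clock was set backwards */
};

static uint64_t read_with_recovery(struct clock_state *st, uint64_t ret)
{
	if (ret < st->last) {
		if (st->last - ret < FUTURE_THRESHOLD)
			return st->last;	/* small skew: clamp as before */
		st->backwards++;		/* large gap: 'last' was bogus */
	}
	st->last = ret;
	return ret;
}
```

A reading that is 10 ms behind the stored value is still clamped, while one
that is several seconds behind replaces the stored value and bumps the
counter, which is the behaviour the extra printk fields make visible.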



diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..5eab569 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,12 +120,15 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)

 static atomic64_t last_value = ATOMIC64_INIT(0);

+int pvclock_backwards = 0;
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
 	unsigned version;
 	cycle_t ret, offset;
 	u64 last;
+	bool backwards;

 	do {
 		version = pvclock_get_time_values(&shadow, src);
@@ -153,13 +156,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 	 * updating at the same time, and one of them could be slightly behind,
 	 * making the assumption that last_value always go forward fail to hold.
 	 */
+	backwards = false;
 	last = atomic64_read(&last_value);
 	do {
-		if (ret < last)
-			return last;
+		if (ret < last) {
+			if ( last - ret < 25000000 )
+				return last;
+			else
+				/* The clock will go backwards instead of being stuck at last value for a very long time
+				 * The return value of the previous call to pvclock_clocksource_read was most probably
+				 * very far into the future because the clock for that CPU hadn't been setup yet
+				 */
+				backwards = true;
+		}
 		last = atomic64_cmpxchg(&last_value, last, ret);
 	} while (unlikely(last != ret));

+	/* Increment outside of the while loop, because it always loops twice */
+	if (backwards)
+		pvclock_backwards++;
+
 	return ret;
 }

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0bf2ece..d6dcd45 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1,3 +1,5 @@
+#define DEBUG
+
 /*
  *	x86 SMP booting functions
  *
diff --git a/kernel/printk.c b/kernel/printk.c
index 444b770..9608bec 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -687,6 +687,8 @@ static inline void printk_delay(void)
 	}
 }

+extern int pvclock_backwards;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
 	int printed_len = 0;
@@ -778,9 +780,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 				unsigned long long t;
 				unsigned long nanosec_rem;

+				int pvclock_backwards_prev = pvclock_backwards;
 				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
-				tlen = sprintf(tbuf, "[%5lu.%06lu] ",
+				tlen = sprintf(tbuf, "[%d;%d/%d:%5lu.%06lu] ",
+						printk_cpu,
+						pvclock_backwards_prev,
+						pvclock_backwards,
 						(unsigned long) t,
 						nanosec_rem / 1000);




Partial output on my machine, where the clock is set backwards 4 times:
...
[0;0/0:    0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[0;0/0:    0.124164] ++++++++++++++++++++=_---CPU UP  1
[0;0/0:    0.124193] Booting Node   0, Processors  #1 Ok.
[0;0/0:    0.124602] Setting warm reset code and vector.
[0;0/0:    0.124609] 1.
[0;0/0:    0.124610] 2.
[0;0/0:    0.124611] 3.
[0;0/0:    0.124624] Asserting INIT.
[0;0/0:    0.124634] Waiting for send to finish...
[0;0/0:    0.134508] Deasserting INIT.
[0;0/0:    0.134515] Waiting for send to finish...
[0;0/0:    0.134519] #startup loops: 2.
[0;0/0:    0.134521] Sending STARTUP #1.
[0;0/0:    0.134527] After apic_write.
[1;0/0:    0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0;0/1:    0.134838] Startup point 1.
[0;1/1:    0.134841] Waiting for send to finish...
[0;1/1:    0.135049] Sending STARTUP #2.
[0;1/1:    0.135055] After apic_write.
[0;1/1:    0.135359] Startup point 1.
[0;1/1:    0.135361] Waiting for send to finish...
[0;1/1:    0.135568] After Startup.
[0;1/1:    0.135569] Before Callout 1.
[0;1/1:    0.135571] After Callout 1.
[1;1/1:    0.008000] CALLIN, before setup_local_APIC().
[1;2/2:    0.008000] Stack at about ffff88001f875f44
[0;3/3:    0.136176] CPU1: has booted.
[1;3/3:    0.008000] kvm-clock: cpu 1, msr 0:1511c41, secondary cpu clock
[0;4/4:    0.136199] Brought up 2 CPUs
[0;4/4:    0.136201] Boot done.
[0;4/4:    0.136202] Before bogomips.
[0;4/4:    0.136204] Total of 2 processors activated (11198.56 BogoMIPS).
[0;4/4:    0.136205] Before bogocount - setting activated=1.
[1;4/4:    0.140208] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0;4/4:    0.142577] NET: Registered protocol family 16
[0;4/4:    0.144263] PCI: Using configuration type 1 for base access
[0;4/4:    0.144494] PCI: Using configuration type 1 for extended access
[0;4/4:    0.144938] mtrr: your CPUs had inconsistent variable MTRR settings
[0;4/4:    0.144938] mtrr: your CPUs had inconsistent MTRRdefType settings
[0;4/4:    0.144938] mtrr: probably your BIOS does not setup all CPUs.
[0;4/4:    0.148004] mtrr: corrected configuration.
[0;4/4:    0.156040] bio: create slab <bio-0> at 0
[0;4/4:    0.156602] vgaarb: loaded
[0;4/4:    0.156602] PCI: Probing PCI hardware
[0;4/4:    0.156602] PCI: Probing PCI hardware (bus 00)
[0;4/4:    0.156703] pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
[0;4/4:    0.160269] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[0;4/4:    0.161055] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[0;4/4:    0.164064] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[0;4/4:    0.164827] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[0;4/4:    0.169023] pci 0000:00:03.0: reg 10: [io  0xc020-0xc03f]
[0;4/4:    0.170052] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
[0;4/4:    0.170381] pci 0000:00:04.0: reg 10: [io  0xc040-0xc05f]
[0;4/4:    0.170765] pci 0000:00:05.0: reg 10: [io  0xc080-0xc0bf]
[0;4/4:    0.171023] pci 0000:00:06.0: reg 10: [io  0xc0c0-0xc0ff]
[0;4/4:    0.172123] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0;4/4:    0.172971] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
[0;4/4:    0.172971] PCI: pci_cache_line_size set to 64 bytes
[0;4/4:    0.172971] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
[0;4/4:    0.172971] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[0;4/4:    0.176175] Switching to clocksource kvm-clock
[1;4/4:    0.212494] pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
[1;4/4:    0.212500] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
[1;4/4:    0.212828] NET: Registered protocol family 2
[1;4/4:    0.213783] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
...
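
For reading the output above: the prefix is [cpu;prev/cur: seconds.microseconds],
where prev and cur are the values of pvclock_backwards sampled before and after
the cpu_clock call for that line. A line such as "0/1" therefore means the
clock was set backwards while that very timestamp was being read. The
following user-space sketch reproduces the format string from the patch;
format_prefix is a hypothetical helper, not part of the kernel.

```c
#include <stdio.h>

/*
 * Build the debug timestamp prefix from a nanosecond clock value,
 * using the same format string as the modified vprintk.
 */
static int format_prefix(char *buf, size_t len, int cpu,
			 int prev, int cur, unsigned long long ns)
{
	unsigned long sec = (unsigned long)(ns / 1000000000ULL);
	unsigned long usec = (unsigned long)((ns % 1000000000ULL) / 1000);

	return snprintf(buf, len, "[%d;%d/%d:%5lu.%06lu] ",
			cpu, prev, cur, sec, usec);
}
```

Called with cpu 0, counters 0 and 1 and a clock value of 134838000 ns, this
yields the prefix of the "Startup point 1." line shown above.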


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02 16:10                                                         ` Arjan Koers
@ 2010-10-02 20:26                                                           ` Michael Tokarev
  2010-10-02 23:42                                                           ` Zachary Amsden
  1 sibling, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02 20:26 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Zachary Amsden, Marcelo Tosatti, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

[-- Attachment #1: Type: text/plain, Size: 765 bytes --]

02.10.2010 20:10, Arjan Koers wrote:
[]
> I'm pretty sure that your kernel will boot with this debug patch (for
> 2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
> The patch sets the clock backwards if it detects that the previous
> value was far into the future. It also modifies printk to display some
> extra information. The DEBUG define was added to get extra calls to
> printk's where things can go wrong.

Yes, it boots fine with this patch applied.  Attached is the dmesg
output of it.

[]
> Partial output on my machine, where the clock is set backwards 4 times:
> ...
> [0;0/0:    0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02

Um.  I wonder if it's AMD-specific somehow... ;)
(I also use -cpu host)

Thanks!

/mjt

[-- Attachment #2: dmesg-2.6.36-i686-pvclock-debug-patch.txt --]
[-- Type: text/plain, Size: 24434 bytes --]

[0;0/0:    0.000000] Initializing cgroup subsys cpuset
[0;0/0:    0.000000] Initializing cgroup subsys cpu
[0;0/0:    0.000000] Linux version 2.6.35-i686 (mjt@gandalf) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2.6.35.6 SMP Thu Sep 30 12:00:24 MSD 2010
[0;0/0:    0.000000] BIOS-provided physical RAM map:
[0;0/0:    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[0;0/0:    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[0;0/0:    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[0;0/0:    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[0;0/0:    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[0;0/0:    0.000000]  BIOS-e820: 00000000feffd000 - 00000000ff001000 (reserved)
[0;0/0:    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[0;0/0:    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[0;0/0:    0.000000] DMI 2.4 present.
[0;0/0:    0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[0;0/0:    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[0;0/0:    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x100000
[0;0/0:    0.000000] MTRR default type: write-back
[0;0/0:    0.000000] MTRR fixed ranges enabled:
[0;0/0:    0.000000]   00000-9FFFF write-back
[0;0/0:    0.000000]   A0000-BFFFF uncachable
[0;0/0:    0.000000]   C0000-FFFFF write-protect
[0;0/0:    0.000000] MTRR variable ranges enabled:
[0;0/0:    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[0;0/0:    0.000000]   1 disabled
[0;0/0:    0.000000]   2 disabled
[0;0/0:    0.000000]   3 disabled
[0;0/0:    0.000000]   4 disabled
[0;0/0:    0.000000]   5 disabled
[0;0/0:    0.000000]   6 disabled
[0;0/0:    0.000000]   7 disabled
[0;0/0:    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[0;0/0:    0.000000] initial memory mapped : 0 - 01800000
[0;0/0:    0.000000] found SMP MP-table at [c00fdbe0] fdbe0
[0;0/0:    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[0;0/0:    0.000000]  0000000000 - 0000400000 page 4k
[0;0/0:    0.000000]  0000400000 - 001fc00000 page 2M
[0;0/0:    0.000000]  001fc00000 - 001fffd000 page 4k
[0;0/0:    0.000000] kernel direct mapping tables up to 1fffd000 @ 7000-c000
[0;0/0:    0.000000] RAMDISK: 1fbb5000 - 1fe96000
[0;0/0:    0.000000] ACPI: RSDP 000fdb90 00014 (v00 BOCHS )
[0;0/0:    0.000000] ACPI: RSDT 1fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[0;0/0:    0.000000] ACPI: FACP 1ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[0;0/0:    0.000000] ACPI: DSDT 1fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[0;0/0:    0.000000] ACPI: FACS 1ffffe00 00040
[0;0/0:    0.000000] ACPI: SSDT 1fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[0;0/0:    0.000000] ACPI: APIC 1fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[0;0/0:    0.000000] ACPI: HPET 1fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[0;0/0:    0.000000] ACPI: Local APIC address 0xfee00000
[0;0/0:    0.000000] 0MB HIGHMEM available.
[0;0/0:    0.000000] 511MB LOWMEM available.
[0;0/0:    0.000000]   mapped low ram: 0 - 1fffd000
[0;0/0:    0.000000]   low ram: 0 - 1fffd000
[0;0/0:    0.000000] kvm-clock: Using msrs 12 and 11
[0;0/0:    0.000000] kvm-clock: cpu 0, msr 0:13c70c1, boot clock
[0;0/0:    0.000000] Zone PFN ranges:
[0;0/0:    0.000000]   DMA      0x00000001 -> 0x00001000
[0;0/0:    0.000000]   Normal   0x00001000 -> 0x0001fffd
[0;0/0:    0.000000]   HighMem  empty
[0;0/0:    0.000000] Movable zone start PFN for each node
[0;0/0:    0.000000] early_node_map[2] active PFN ranges
[0;0/0:    0.000000]     0: 0x00000001 -> 0x0000009f
[0;0/0:    0.000000]     0: 0x00000100 -> 0x0001fffd
[0;0/0:    0.000000] On node 0 totalpages: 130971
[0;0/0:    0.000000] free_area_init_node: node 0, pgdat c135ffc0, node_mem_map c1454020
[0;0/0:    0.000000]   DMA zone: 32 pages used for memmap
[0;0/0:    0.000000]   DMA zone: 0 pages reserved
[0;0/0:    0.000000]   DMA zone: 3966 pages, LIFO batch:0
[0;0/0:    0.000000]   Normal zone: 992 pages used for memmap
[0;0/0:    0.000000]   Normal zone: 125981 pages, LIFO batch:31
[0;0/0:    0.000000] Using APIC driver default
[0;0/0:    0.000000] ACPI: PM-Timer IO Port: 0xb008
[0;0/0:    0.000000] ACPI: Local APIC address 0xfee00000
[0;0/0:    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0;0/0:    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[0;0/0:    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[0;0/0:    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[0;0/0:    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0;0/0:    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0;0/0:    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0;0/0:    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0;0/0:    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0;0/0:    0.000000] ACPI: IRQ0 used by override.
[0;0/0:    0.000000] ACPI: IRQ2 used by override.
[0;0/0:    0.000000] ACPI: IRQ5 used by override.
[0;0/0:    0.000000] ACPI: IRQ9 used by override.
[0;0/0:    0.000000] ACPI: IRQ10 used by override.
[0;0/0:    0.000000] ACPI: IRQ11 used by override.
[0;0/0:    0.000000] Using ACPI (MADT) for SMP configuration information
[0;0/0:    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[0;0/0:    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[0;0/0:    0.000000] nr_irqs_gsi: 40
[0;0/0:    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[0;0/0:    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[0;0/0:    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[0;0/0:    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:deffd000)
[0;0/0:    0.000000] Booting paravirtualized kernel on KVM
[0;0/0:    0.000000] early_res array is doubled to 64 at [8000 - 87ff]
[0;0/0:    0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
[0;0/0:    0.000000] PERCPU: Embedded 16 pages/cpu @c1c00000 s43072 r0 d22464 u2097152
[0;0/0:    0.000000] pcpu-alloc: s43072 r0 d22464 u2097152 alloc=1*4194304
[0;0/0:    0.000000] pcpu-alloc: [0] 0 1 
[0;0/0:    0.000000] kvm-clock: cpu 0, msr 0:1c0a0c1, primary cpu clock
[0;0/0:    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129947
[0;0/0:    0.000000] Kernel command line: acpi_enforce_resources=lax rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 debug console=ttyS0 console=tty1 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686 
[0;0/0:    0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[0;0/0:    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[0;0/0:    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[0;0/0:    0.000000] Enabling fast FPU save and restore... done.
[0;0/0:    0.000000] Enabling unmasked SIMD FPU exception support... done.
[0;0/0:    0.000000] Initializing CPU#0
[0;0/0:    0.000000] Subtract (42 early reservations)
[0;0/0:    0.000000]   #1 [0000001000 - 0000002000]   EX TRAMPOLINE
[0;0/0:    0.000000]   #2 [0001000000 - 000144b9e4]   TEXT DATA BSS
[0;0/0:    0.000000]   #3 [001fbb5000 - 001fe96000]         RAMDISK
[0;0/0:    0.000000]   #4 [000144c000 - 0001452049]             BRK
[0;0/0:    0.000000]   #5 [000009f400 - 00000fdbe0]   BIOS reserved
[0;0/0:    0.000000]   #6 [00000fdbe0 - 00000fdbf0]    MP-table mpf
[0;0/0:    0.000000]   #7 [00000fdce4 - 0000100000]   BIOS reserved
[0;0/0:    0.000000]   #8 [00000fdbf0 - 00000fdce4]    MP-table mpc
[0;0/0:    0.000000]   #9 [0000002000 - 0000003000]      TRAMPOLINE
[0;0/0:    0.000000]   #10 [0000003000 - 0000007000]     ACPI WAKEUP
[0;0/0:    0.000000]   #11 [0000007000 - 0000008000]         PGTABLE
[0;0/0:    0.000000]   #12 [0001453000 - 0001454000]         BOOTMEM
[0;0/0:    0.000000]   #13 [0001454000 - 0001854000]         BOOTMEM
[0;0/0:    0.000000]   #14 [000144ba00 - 000144ba04]         BOOTMEM
[0;0/0:    0.000000]   #15 [000144ba40 - 000144bb00]         BOOTMEM
[0;0/0:    0.000000]   #16 [000144bb00 - 000144bb30]         BOOTMEM
[0;0/0:    0.000000]   #17 [0001854000 - 0001855800]         BOOTMEM
[0;0/0:    0.000000]   #18 [000144bb40 - 000144bb65]         BOOTMEM
[0;0/0:    0.000000]   #19 [000144bb80 - 000144bba7]         BOOTMEM
[0;0/0:    0.000000]   #20 [000144bbc0 - 000144bca0]         BOOTMEM
[0;0/0:    0.000000]   #21 [000144bcc0 - 000144bd00]         BOOTMEM
[0;0/0:    0.000000]   #22 [000144bd00 - 000144bd40]         BOOTMEM
[0;0/0:    0.000000]   #23 [000144bd40 - 000144bd80]         BOOTMEM
[0;0/0:    0.000000]   #24 [000144bd80 - 000144bdc0]         BOOTMEM
[0;0/0:    0.000000]   #25 [000144bdc0 - 000144be00]         BOOTMEM
[0;0/0:    0.000000]   #26 [000144be00 - 000144be40]         BOOTMEM
[0;0/0:    0.000000]   #27 [000144be40 - 000144be80]         BOOTMEM
[0;0/0:    0.000000]   #28 [000144be80 - 000144be90]         BOOTMEM
[0;0/0:    0.000000]   #29 [000144bec0 - 000144bfd5]         BOOTMEM
[0;0/0:    0.000000]   #30 [0001452080 - 0001452195]         BOOTMEM
[0;0/0:    0.000000]   #31 [0001c00000 - 0001c10000]         BOOTMEM
[0;0/0:    0.000000]   #32 [0001e00000 - 0001e10000]         BOOTMEM
[0;0/0:    0.000000]   #33 [00014521c0 - 00014521c4]         BOOTMEM
[0;0/0:    0.000000]   #34 [0001452200 - 0001452204]         BOOTMEM
[0;0/0:    0.000000]   #35 [0001452240 - 0001452248]         BOOTMEM
[0;0/0:    0.000000]   #36 [0001452280 - 0001452288]         BOOTMEM
[0;0/0:    0.000000]   #37 [00014522c0 - 0001452368]         BOOTMEM
[0;0/0:    0.000000]   #38 [0001452380 - 00014523e8]         BOOTMEM
[0;0/0:    0.000000]   #39 [0001855800 - 0001857800]         BOOTMEM
[0;0/0:    0.000000]   #40 [0001857800 - 0001897800]         BOOTMEM
[0;0/0:    0.000000]   #41 [0001897800 - 00018b7800]         BOOTMEM
[0;0/0:    0.000000] Initializing HighMem for node 0 (00000000:00000000)
[0;0/0:    0.000000] Memory: 511852k/524276k available (2555k kernel code, 12032k reserved, 929k data, 384k init, 0k highmem)
[0;0/0:    0.000000] virtual kernel memory layout:
[0;0/0:    0.000000]     fixmap  : 0xfff16000 - 0xfffff000   ( 932 kB)
[0;0/0:    0.000000]     pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
[0;0/0:    0.000000]     vmalloc : 0xe07fd000 - 0xff7fe000   ( 496 MB)
[0;0/0:    0.000000]     lowmem  : 0xc0000000 - 0xdfffd000   ( 511 MB)
[0;0/0:    0.000000]       .init : 0xc1368000 - 0xc13c8000   ( 384 kB)
[0;0/0:    0.000000]       .data : 0xc127ed37 - 0xc1367488   ( 929 kB)
[0;0/0:    0.000000]       .text : 0xc1000000 - 0xc127ed37   (2555 kB)
[0;0/0:    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[0;0/0:    0.000000] Hierarchical RCU implementation.
[0;0/0:    0.000000] 	RCU-based detection of stalled CPUs is disabled.
[0;0/0:    0.000000] 	Verbose stalled-CPUs detection is disabled.
[0;0/0:    0.000000] NR_IRQS:512
[0;0/0:    0.000000] CPU 0 irqstacks, hard=c1c00000 soft=c1c01000
[0;0/0:    0.000000] Console: colour VGA+ 80x25
[0;0/0:    0.000000] console [tty1] enabled
[0;0/0:    0.000000] console [ttyS0] enabled
[0;0/0:    0.000000] Detected 3217.252 MHz processor.
[0;0/0:    0.006666] Calibrating delay loop (skipped) preset value.. 6437.60 BogoMIPS (lpj=10724173)
[0;0/0:    0.006666] pid_max: default: 32768 minimum: 301
[0;0/0:    0.006666] Mount-cache hash table entries: 512
[0;0/0:    0.006780] Initializing cgroup subsys ns
[0;0/0:    0.007518] Initializing cgroup subsys cpuacct
[0;0/0:    0.008302] Initializing cgroup subsys devices
[0;0/0:    0.009087] Initializing cgroup subsys freezer
[0;0/0:    0.010003] Initializing cgroup subsys net_cls
[0;0/0:    0.010801] Initializing cgroup subsys blkio
[0;0/0:    0.011622] mce: CPU supports 10 MCE banks
[0;0/0:    0.012406] using C1E aware idle routine
[0;0/0:    0.013344] Performance Events: AMD PMU driver.
[0;0/0:    0.014201] ... version:                0
[0;0/0:    0.015007] ... bit width:              48
[0;0/0:    0.015750] ... generic registers:      4
[0;0/0:    0.016668] ... value mask:             0000ffffffffffff
[0;0/0:    0.017555] ... max period:             00007fffffffffff
[0;0/0:    0.018447] ... fixed-purpose events:   0
[0;0/0:    0.019176] ... event mask:             000000000000000f
[0;0/0:    0.023763] ACPI: Core revision 20100428
[0;0/0:    0.026153] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[0;0/0:    0.028267] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[0;0/0:    0.029348] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
[0;0/0:    0.033332] ++++++++++++++++++++=_---CPU UP  1
[0;0/0:    0.033332] CPU 1 irqstacks, hard=c1e00000 soft=c1e01000
[0;0/0:    0.033332] Booting Node   0, Processors  #1 Ok.
[0;0/0:    0.033332] Setting warm reset code and vector.
[0;0/0:    0.033340] 1.
[0;0/0:    0.033807] 2.
[0;0/0:    0.034286] 3.
[0;0/0:    0.034760] Asserting INIT.
[0;0/0:    0.035382] Waiting for send to finish...
[0;0/0:    0.047424] Deasserting INIT.
[0;0/0:    0.048263] Waiting for send to finish...
[0;0/0:    0.049039] #startup loops: 2.
[0;0/0:    0.049687] Sending STARTUP #1.
[0;0/0:    0.050004] After apic_write.
[1;0/0:    0.006666] Initializing CPU#1
[1;0/0:    0.006666] CPU#1 (phys ID: 1) waiting for CALLOUT
[0;0/1:    0.050947] Startup point 1.
[0;1/1:    0.053334] Waiting for send to finish...
[0;1/1:    0.054307] Sending STARTUP #2.
[0;1/1:    0.054976] After apic_write.
[0;1/1:    0.055910] Startup point 1.
[0;1/1:    0.056529] Waiting for send to finish...
[0;1/1:    0.056873] After Startup.
[0;1/1:    0.057477] Before Callout 1.
[0;1/1:    0.058108] After Callout 1.
[1;1/1:    0.006666] CALLIN, before setup_local_APIC().
[1;1/1:    0.006666] Stack at about df45afb0
[0;2/2:    0.063338] CPU1: has booted.
[1;2/2:    0.064004] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
[0;2/2:    0.064020] Brought up 2 CPUs
[0;2/2:    0.064022] Boot done.
[0;2/2:    0.064023] Before bogomips.
[0;2/2:    0.064024] Total of 2 processors activated (12874.21 BogoMIPS).
[0;2/2:    0.064026] Before bogocount - setting activated=1.
[1;2/2:    0.070041] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0;2/2:    0.071991] devtmpfs: initialized
[0;2/2:    0.090107] NET: Registered protocol family 16
[0;2/2:    0.096723] ACPI: bus type pci registered
[0;2/2:    0.097582] PCI: PCI BIOS revision 2.10 entry at 0xffe77, last bus=0
[0;2/2:    0.100003] PCI: Using configuration type 1 for base access
[0;2/2:    0.100980] PCI: Using configuration type 1 for extended access
[0;2/2:    0.103511] mtrr: your CPUs had inconsistent variable MTRR settings
[0;2/2:    0.105319] mtrr: your CPUs had inconsistent MTRRdefType settings
[0;2/2:    0.106680] mtrr: probably your BIOS does not setup all CPUs.
[0;2/2:    0.108291] mtrr: corrected configuration.
[0;2/2:    0.120562] bio: create slab <bio-0> at 0
[0;2/2:    0.123617] ACPI: EC: Look up EC in DSDT
[0;2/2:    0.133128] ACPI: Interpreter enabled
[0;2/2:    0.133340] ACPI: (supports S0 S3 S4 S5)
[0;2/2:    0.140012] ACPI: Using IOAPIC for interrupt routing
[0;2/2:    0.173646] ACPI: No dock devices found.
[0;2/2:    0.174654] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[0;2/2:    0.176690] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[0;2/2:    0.180038] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
[0;2/2:    0.181789] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff] (ignored)
[0;2/2:    0.183336] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
[0;2/2:    0.185124] pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
[0;2/2:    0.188101] pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
[0;2/2:    0.190266] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[0;2/2:    0.192430] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[0;2/2:    0.197592] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[0;2/2:    0.200843] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[0;2/2:    0.207003] pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
[0;2/2:    0.208404] pci 0000:00:03.0: reg 10: [io  0xc020-0xc03f]
[0;2/2:    0.209386] pci 0000:00:03.0: reg 14: [mem 0xf2020000-0xf2020fff]
[0;2/2:    0.210083] pci 0000:00:03.0: reg 30: [mem 0xf2030000-0xf203ffff pref]
[0;2/2:    0.211384] pci_bus 0000:00: on NUMA node 0
[0;2/2:    0.212212] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[0;2/2:    0.250148] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[0;2/2:    0.255557] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[0;2/2:    0.257251] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[0;2/2:    0.260271] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[0;2/2:    0.261575] HEST: Table is not found!
[0;2/2:    0.263389] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0;2/2:    0.264871] vgaarb: loaded
[0;2/2:    0.266696] PCI: Using ACPI for IRQ routing
[0;2/2:    0.267533] PCI: pci_cache_line_size set to 64 bytes
[0;2/2:    0.268531] reserve RAM buffer: 000000000009f400 - 000000000009ffff 
[0;2/2:    0.269456] reserve RAM buffer: 000000001fffd000 - 000000001fffffff 
[0;2/2:    0.270108] Switching to clocksource kvm-clock
[1;2/2:    0.273590] pnp: PnP ACPI init
[1;2/2:    0.276224] ACPI: bus type pnp registered
[1;2/2:    0.289416] pnp: PnP ACPI: found 8 devices
[1;2/2:    0.292666] ACPI: ACPI bus type pnp unregistered
[1;2/2:    0.296401] PnPBIOS: Disabled
[1;2/2:    0.347394] pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
[1;2/2:    0.348535] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffff]
[1;2/2:    0.349729] NET: Registered protocol family 2
[1;2/2:    0.350645] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[1;2/2:    0.353500] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[1;2/2:    0.355026] TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
[1;2/2:    0.356310] TCP: Hash tables configured (established 16384 bind 16384)
[1;2/2:    0.357441] TCP reno registered
[1;2/2:    0.358122] UDP hash table entries: 256 (order: 1, 8192 bytes)
[1;2/2:    0.359155] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[1;2/2:    0.360525] NET: Registered protocol family 1
[1;2/2:    0.361377] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[1;2/2:    0.362416] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[1;2/2:    0.363432] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[1;2/2:    0.364514] pci 0000:00:02.0: Boot video device
[1;2/2:    0.365381] PCI: CLS 0 bytes, default 64
[1;2/2:    0.366199] Unpacking initramfs...
[1;2/2:    0.424327] Freeing initrd memory: 2948k freed
[1;2/2:    0.440434] HugeTLB registered 4 MB page size, pre-allocated 0 pages
[1;2/2:    0.442047] VFS: Disk quotas dquot_6.5.2
[1;2/2:    0.442848] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[1;2/2:    0.446823] msgmni has been set to 1005
[1;2/2:    0.447985] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[1;2/2:    0.449421] io scheduler noop registered
[1;2/2:    0.450190] io scheduler deadline registered
[1;2/2:    0.451094] io scheduler cfq registered (default)
[1;2/2:    0.453188] ERST: Table is not found!
[1;2/2:    0.454095] isapnp: Scanning for PnP cards...
[1;2/2:    0.824535] isapnp: No Plug & Play device found
[1;2/2:    0.826091] hpet_acpi_add: no address or irqs in _CRS
[1;2/2:    0.827286] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[1;2/2:    0.828924] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1;2/2:    0.836029] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1;2/2:    0.838659] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[1;2/2:    0.840619] serio: i8042 KBD port at 0x60,0x64 irq 1
[1;2/2:    0.841541] serio: i8042 AUX port at 0x60,0x64 irq 12
[1;2/2:    0.845012] mice: PS/2 mouse device common for all mice
[1;2/2:    0.847980] input: PC Speaker as /devices/platform/pcspkr/input/input0
[1;2/2:    0.849155] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[1;2/2:    0.852013] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[1;2/2:    0.854609] rtc0: alarms up to one day, 114 bytes nvram
[1;2/2:    0.886105] cpuidle: using governor ladder
[1;2/2:    0.887061] cpuidle: using governor menu
[1;2/2:    0.887913] TCP cubic registered
[1;2/2:    0.888669] NET: Registered protocol family 17
[1;2/2:    0.889591] Using IPI No-Shortcut mode
[0;2/2:    0.931969] rtc_cmos 00:01: setting system clock to 2010-10-02 20:25:08 UTC (1286051108)
[0;2/2:    0.933627] Freeing unused kernel memory: 384k freed
[0;2/2:    0.935692] Processing INITRAMFS
[1;2/2:    1.006076] Clocksource tsc unstable (delta = 4015199349967 ns)
[1;2/2:    1.146625] SCSI subsystem initialized
[1;2/2:    1.157307] libata version 3.00 loaded.
[1;2/2:    1.182560] pata_acpi 0000:00:01.1: setting latency timer to 64
[1;2/2:    1.222009] ata_piix 0000:00:01.1: version 2.13
[1;2/2:    1.223035] ata_piix 0000:00:01.1: setting latency timer to 64
[1;2/2:    1.241698] scsi0 : ata_piix
[1;2/2:    1.243020] scsi1 : ata_piix
[1;2/2:    1.244063] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
[1;2/2:    1.245429] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
[1;2/2:    1.397679] ata2.01: NODEV after polling detection
[1;2/2:    1.402153] ata2.00: ATAPI: QEMU DVD-ROM, 0.12.91, max UDMA/100
[1;2/2:    1.409234] ata2.00: configured for MWDMA2
[1;2/2:    1.423776] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     0.12 PQ: 0 ANSI: 5
[0;2/2:    1.484757] sr0: scsi3-mmc drive: 4x/4x xa/form2 tray
[0;2/2:    1.485748] Uniform CD-ROM driver Revision: 3.20
[1;2/2:    1.486967] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[1;2/2:    1.488081] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[0;2/2:    1.488423] sr 1:0:0:0: Attached scsi CD-ROM sr0
[1;2/2:    1.490498] virtio-pci 0000:00:03.0: setting latency timer to 64
[1;2/2:    1.493282] warning: unable to find netif for 52:54:00:12:34:56, using eth0
[1;2/2:    1.494929] configuring network interface eth0: 192.168.88.60/255.255.255.0
[0;2/2:    1.496382] virtio-pci 0000:00:03.0: irq 40 for MSI/MSI-X
[0;2/2:    1.497404] virtio-pci 0000:00:03.0: irq 41 for MSI/MSI-X
[0;2/2:    1.498351] virtio-pci 0000:00:03.0: irq 42 for MSI/MSI-X
[1;2/2:    1.499098] ifconfig: SIOCSIFADDR: No such device
[1;2/2:    1.499155] warning: the following command failed:
[1;2/2:    1.499167] warning: ifconfig eth0 inet 192.168.88.60 netmask 255.255.255.0 up
[1;2/2:    2.239140] mounting nfs fs on 192.168.88.4:/usr/rb (options: ro,nolock)
[1;2/2:    2.274254] RPC: Registered udp transport module.
[1;2/2:    2.275187] RPC: Registered tcp transport module.
[1;2/2:    2.276072] RPC: Registered tcp NFSv4.1 backchannel transport module.
[1;2/2:    2.288839] Slow work thread pool: Starting up
[1;2/2:    2.292424] Slow work thread pool: Ready
[1;2/2:    2.301246] FS-Cache: Loaded
[1;2/2:    2.313044] FS-Cache: Netfs 'nfs' registered for caching
[0;2/2:    2.321020] executing /remote/bootrc
[1;2/2:    2.353758] aufs 2-standalone.tree-35-20100823
[0;2/2:    2.387083] loop: module loaded
[1;2/2:    2.395463] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[0;2/2:    3.227124] udev: starting version 160
[0;2/2:    3.401994] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[0;2/2:    3.525318] parport_pc 00:05: reported by Plug and Play ACPI
[0;2/2:    3.527128] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[0;2/2:    3.536295] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[0;2/2:    3.545768] ACPI: Power Button [PWRF]
[1;2/2:    3.556560] sr 1:0:0:0: Attached scsi generic sg0 type 5
[0;2/2:    3.570451] FDC 0 is a S82078B
[0;2/2:    3.606791] ACPI: acpi_idle registered with cpuidle
[1;2/2:    4.049418] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02  7:35                                                       ` Michael Tokarev
  2010-10-02  7:40                                                         ` Michael Tokarev
  2010-10-02 16:10                                                         ` Arjan Koers
@ 2010-10-02 21:55                                                         ` Zachary Amsden
  2010-10-03  8:16                                                           ` Michael Tokarev
  2 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-02 21:55 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

On 10/01/2010 09:35 PM, Michael Tokarev wrote:
> 02.10.2010 09:35, Zachary Amsden wrote:
> []
>    
>> Can you try this patch to see if it helps?  I believe it is also safe
>> for Xen, but cc'ing to double check.
>>      
> It makes no visible difference.
>
> For some reason one of my test guests - 2.6.35.6 32bit kernel -
> stopped booting completely, always hanging at boot somewhere
> unless I disable printk.time.  Here's the typical boot messages,
> up to the hang:
>
> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Linux version 2.6.35-i686 (mjt@gandalf) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2.6.35.6 SMP Thu Sep 30 12:00:24 MSD 2010
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
> [    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> [    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> [    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
> [    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000feffd000 - 00000000ff001000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
> [    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
> [    0.000000] DMI 2.4 present.
> [    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x100000
> [    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
> [    0.000000] found SMP MP-table at [c00fdbe0] fdbe0
> [    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
> [    0.000000] RAMDISK: 1fbb5000 - 1fe96000
> [    0.000000] ACPI: RSDP 000fdb90 00014 (v00 BOCHS )
> [    0.000000] ACPI: RSDT 1fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
> [    0.000000] ACPI: FACP 1ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> [    0.000000] ACPI: DSDT 1fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
> [    0.000000] ACPI: FACS 1ffffe00 00040
> [    0.000000] ACPI: SSDT 1fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
> [    0.000000] ACPI: APIC 1fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> [    0.000000] ACPI: HPET 1fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
> [    0.000000] 0MB HIGHMEM available.
> [    0.000000] 511MB LOWMEM available.
> [    0.000000]   mapped low ram: 0 - 1fffd000
> [    0.000000]   low ram: 0 - 1fffd000
> [    0.000000] kvm-clock: Using msrs 12 and 11
> [    0.000000] kvm-clock: cpu 0, msr 0:13c60c1, boot clock
> [    0.000000] Zone PFN ranges:
> [    0.000000]   DMA      0x00000001 ->  0x00001000
> [    0.000000]   Normal   0x00001000 ->  0x0001fffd
> [    0.000000]   HighMem  empty
> [    0.000000] Movable zone start PFN for each node
> [    0.000000] early_node_map[2] active PFN ranges
> [    0.000000]     0: 0x00000001 ->  0x0000009f
> [    0.000000]     0: 0x00000100 ->  0x0001fffd
> [    0.000000] Using APIC driver default
> [    0.000000] ACPI: PM-Timer IO Port: 0xb008
> [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> [    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
> [    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> [    0.000000] Using ACPI (MADT) for SMP configuration information
> [    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
> [    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
> [    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
> [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
> [    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
> [    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:deffd000)
> [    0.000000] Booting paravirtualized kernel on KVM
> [    0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
> [    0.000000] PERCPU: Embedded 16 pages/cpu @c1c00000 s43072 r0 d22464 u2097152
> [    0.000000] pcpu-alloc: s43072 r0 d22464 u2097152 alloc=1*4194304
> [    0.000000] pcpu-alloc: [0] 0 1
> [    0.000000] kvm-clock: cpu 0, msr 0:1c0a0c1, primary cpu clock
> [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129947
> [    0.000000] Kernel command line: acpi_enforce_resources=lax rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=tty1 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
> [    0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
> [    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> [    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> [    0.000000] Enabling fast FPU save and restore... done.
> [    0.000000] Enabling unmasked SIMD FPU exception support... done.
> [    0.000000] Initializing CPU#0
> [    0.000000] Subtract (42 early reservations)
> [    0.000000]   #1 [0000001000 - 0000002000]   EX TRAMPOLINE
> [    0.000000]   #2 [0001000000 - 000144a9e4]   TEXT DATA BSS
> [    0.000000]   #3 [001fbb5000 - 001fe96000]         RAMDISK
> [    0.000000]   #4 [000144b000 - 0001451049]             BRK
> [    0.000000]   #5 [000009f400 - 00000fdbe0]   BIOS reserved
> [    0.000000]   #6 [00000fdbe0 - 00000fdbf0]    MP-table mpf
> [    0.000000]   #7 [00000fdce4 - 0000100000]   BIOS reserved
> [    0.000000]   #8 [00000fdbf0 - 00000fdce4]    MP-table mpc
> [    0.000000]   #9 [0000002000 - 0000003000]      TRAMPOLINE
> [    0.000000]   #10 [0000003000 - 0000007000]     ACPI WAKEUP
> [    0.000000]   #11 [0000007000 - 0000008000]         PGTABLE
> [    0.000000]   #12 [0001452000 - 0001453000]         BOOTMEM
> [    0.000000]   #13 [0001453000 - 0001853000]         BOOTMEM
> [    0.000000]   #14 [000144aa00 - 000144aa04]         BOOTMEM
> [    0.000000]   #15 [000144aa40 - 000144ab00]         BOOTMEM
> [    0.000000]   #16 [000144ab00 - 000144ab30]         BOOTMEM
> [    0.000000]   #17 [0001853000 - 0001854800]         BOOTMEM
> [    0.000000]   #18 [000144ab40 - 000144ab65]         BOOTMEM
> [    0.000000]   #19 [000144ab80 - 000144aba7]         BOOTMEM
> [    0.000000]   #20 [000144abc0 - 000144aca0]         BOOTMEM
> [    0.000000]   #21 [000144acc0 - 000144ad00]         BOOTMEM
> [    0.000000]   #22 [000144ad00 - 000144ad40]         BOOTMEM
> [    0.000000]   #23 [000144ad40 - 000144ad80]         BOOTMEM
> [    0.000000]   #24 [000144ad80 - 000144adc0]         BOOTMEM
> [    0.000000]   #25 [000144adc0 - 000144ae00]         BOOTMEM
> [    0.000000]   #26 [000144ae00 - 000144ae40]         BOOTMEM
> [    0.000000]   #27 [000144ae40 - 000144ae80]         BOOTMEM
> [    0.000000]   #28 [000144ae80 - 000144ae90]         BOOTMEM
> [    0.000000]   #29 [000144aec0 - 000144afcf]         BOOTMEM
> [    0.000000]   #30 [0001451080 - 000145118f]         BOOTMEM
> [    0.000000]   #31 [0001c00000 - 0001c10000]         BOOTMEM
> [    0.000000]   #32 [0001e00000 - 0001e10000]         BOOTMEM
> [    0.000000]   #33 [00014511c0 - 00014511c4]         BOOTMEM
> [    0.000000]   #34 [0001451200 - 0001451204]         BOOTMEM
> [    0.000000]   #35 [0001451240 - 0001451248]         BOOTMEM
> [    0.000000]   #36 [0001451280 - 0001451288]         BOOTMEM
> [    0.000000]   #37 [00014512c0 - 0001451368]         BOOTMEM
> [    0.000000]   #38 [0001451380 - 00014513e8]         BOOTMEM
> [    0.000000]   #39 [0001854800 - 0001856800]         BOOTMEM
> [    0.000000]   #40 [0001856800 - 0001896800]         BOOTMEM
> [    0.000000]   #41 [0001896800 - 00018b6800]         BOOTMEM
> [    0.000000] Initializing HighMem for node 0 (00000000:00000000)
> [    0.000000] Memory: 511856k/524276k available (2554k kernel code, 12028k reserved, 930k data, 380k init, 0k highmem)
> [    0.000000] virtual kernel memory layout:
> [    0.000000]     fixmap  : 0xfff16000 - 0xfffff000   ( 932 kB)
> [    0.000000]     pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
> [    0.000000]     vmalloc : 0xe07fd000 - 0xff7fe000   ( 496 MB)
> [    0.000000]     lowmem  : 0xc0000000 - 0xdfffd000   ( 511 MB)
> [    0.000000]       .init : 0xc1368000 - 0xc13c7000   ( 380 kB)
> [    0.000000]       .data : 0xc127ebb7 - 0xc1367488   ( 930 kB)
> [    0.000000]       .text : 0xc1000000 - 0xc127ebb7   (2554 kB)
> [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
> [    0.000000] Hierarchical RCU implementation.
> [    0.000000] 	RCU-based detection of stalled CPUs is disabled.
> [    0.000000] 	Verbose stalled-CPUs detection is disabled.
> [    0.000000] NR_IRQS:512
> [    0.000000] Console: colour VGA+ 80x25
> [    0.000000] console [tty1] enabled
> [    0.000000] console [ttyS0] enabled
> [    0.000000] Detected 3217.252 MHz processor.
> [    0.023332] Calibrating delay loop (skipped) preset value.. 6437.60 BogoMIPS (lpj=10724173)
> [    0.023332] pid_max: default: 32768 minimum: 301
> [    0.023332] Mount-cache hash table entries: 512
> [    0.023447] Initializing cgroup subsys ns
> [    0.024131] Initializing cgroup subsys cpuacct
> [    0.024851] Initializing cgroup subsys devices
> [    0.025580] Initializing cgroup subsys freezer
> [    0.026669] Initializing cgroup subsys net_cls
> [    0.027425] Initializing cgroup subsys blkio
> [    0.030079] mce: CPU supports 10 MCE banks
> [    0.030847] using C1E aware idle routine
> [    0.031517] Performance Events: AMD PMU driver.
> [    0.032313] ... version:                0
> [    0.033335] ... bit width:              48
> [    0.034036] ... generic registers:      4
> [    0.034716] ... value mask:             0000ffffffffffff
> [    0.035542] ... max period:             00007fffffffffff
> [    0.036669] ... fixed-purpose events:   0
> [    0.037521] ... event mask:             000000000000000f
> [    0.041961] ACPI: Core revision 20100428
> [    0.044150] Enabling APIC mode:  Flat.  Using 1 I/O APICs
> [    0.045964] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.046671] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
> [    0.049999] APIC calibration not consistent with PM-Timer: 102ms instead of 100ms
> [    0.049999] APIC delta adjusted to PM-Timer: 6248670 (6435422)
> [    0.050298] Booting Node   0, Processors  #1 Ok.
> [    0.023332] Initializing CPU#1
>    

Before this point, the timestamps are fine-grained...
> [    0.063333] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
> [    0.063333] Brought up 2 CPUs
> [    0.063333] Total of 2 processors activated (12874.21 BogoMIPS).
> [    0.076666] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [    0.116666] devtmpfs: initialized
> [    0.116666] NET: Registered protocol family 16
> [    0.119999] ACPI: bus type pci registered
>    

Now the timestamps advance only in multiples of 1/300 of a second...
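
To make the 1/300 observation concrete, here is a minimal sketch, not part of any kernel, that checks whether a printk timestamp sits on a 300 Hz tick boundary. The 300 Hz rate is an assumption read off the log, and `on_tick_boundary` and `ASSUMED_HZ` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick rate, taken from the "multiples of 1/300" remark. */
#define ASSUMED_HZ   300u
#define USEC_PER_SEC 1000000u

/*
 * Return true if a printk timestamp, given in microseconds, lies within
 * one microsecond of a whole number of 1/ASSUMED_HZ-second ticks.
 */
bool on_tick_boundary(uint64_t ts_us)
{
    /* Guard against overflow in the multiplication below. */
    assert(ts_us <= UINT64_MAX / ASSUMED_HZ);

    /* Scale so that one tick equals exactly USEC_PER_SEC units. */
    uint64_t scaled = ts_us * ASSUMED_HZ;
    uint64_t rem = scaled % USEC_PER_SEC;
    uint64_t dist = rem < USEC_PER_SEC - rem ? rem : USEC_PER_SEC - rem;

    /* One microsecond of slack corresponds to ASSUMED_HZ scaled units. */
    return dist <= ASSUMED_HZ;
}
```

For the log quoted here, 0.063333 and 0.119999 pass this check, while the earlier 0.050298 does not.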

> [    0.123333] PCI: PCI BIOS revision 2.10 entry at 0xffe77, last bus=0
> [    0.123333] PCI: Using configuration type 1 for base access
> [    0.123333] PCI: Using configuration type 1 for extended access
> [    0.126666] mtrr: your CPUs had inconsistent variable MTRR settings
> [    0.126666] mtrr: your CPUs had inconsistent MTRRdefType settings
> [    0.126666] mtrr: probably your BIOS does not setup all CPUs.
> [    0.126666] mtrr: corrected configuration.
> [    0.136666] bio: create slab <bio-0> at 0
> [    0.153333] ACPI: Interpreter enabled
> [    0.153333] ACPI: (supports S0 S3 S4 S5)
> [    0.153333] ACPI: Using IOAPIC for interrupt routing
> [    0.203333] ACPI: No dock devices found.
> [    0.203333] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
> [    0.206666] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [    0.209999] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
> [    0.209999] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
> [    0.216666] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> [    0.219999] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> [    0.219999] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> [    0.223333] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> [    0.223333] HEST: Table is not found!
> [    0.226666] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [    0.229999] vgaarb: loaded
> [    0.229999] PCI: Using ACPI for IRQ routing
> [    0.233333] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> [    0.239999] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> [    0.239999] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
> [    0.249999] Switching to clocksource kvm-clock
> [    0.259999] pnp: PnP ACPI init
>    

Then, of course, it fails.

What is your host clocksource?  Does your machine have an unstable TSC?
Here, I have an unstable TSC:

[zamsden@mysore linux-2.6]$ cat /sys/devices/system/clocksource/clocksource0/*
hpet acpi_pm
hpet

Can you do this in the guest too?  That will make it very clear what 
clocksources the guest finds during bootup.
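
If it is easier to collect this programmatically, a small C sketch along these lines reads the same two sysfs files (`available_clocksource` and `current_clocksource`). The helper names `read_first_line` and `print_clocksources` are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Directory holding the clocksource files printed by the cat above. */
#define CS_DIR "/sys/devices/system/clocksource/clocksource0/"

/*
 * Read the first line of `path` into `buf` (at most len - 1 characters),
 * stripping the trailing newline. Returns 0 on success, -1 on failure.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the available and current clocksources; returns 0 on success. */
int print_clocksources(void)
{
    char line[256];

    if (read_first_line(CS_DIR "available_clocksource", line, sizeof line) != 0)
        return -1;
    printf("available: %s\n", line);

    if (read_first_line(CS_DIR "current_clocksource", line, sizeof line) != 0)
        return -1;
    printf("current:   %s\n", line);
    return 0;
}
```

Calling `print_clocksources()` from `main` on both host and guest gives the two lists side by side.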

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02 16:10                                                         ` Arjan Koers
  2010-10-02 20:26                                                           ` Michael Tokarev
@ 2010-10-02 23:42                                                           ` Zachary Amsden
  2010-10-03  8:27                                                             ` Michael Tokarev
  2010-10-08  0:12                                                             ` Arjan Koers
  1 sibling, 2 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-02 23:42 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
	Andre Przywara

On 10/02/2010 06:10 AM, Arjan Koers wrote:
> On 2010-10-02 09:35, Michael Tokarev wrote:
>    
>> 02.10.2010 09:35, Zachary Amsden wrote:
>> []
>>      
>>> Can you try this patch to see if it helps?  I believe it is also safe
>>> for Xen, but cc'ing to double check.
>>>        
>> It makes no visible difference.
>>
>> For some reason one of my test guests - 2.6.35.6 32bit kernel -
>> stopped booting completely, always hanging at boot somewhere
>> unless I disable printk.time.  Here's the typical boot messages,
>> up to the hang:
>>
>> [    0.000000] Initializing cgroup subsys cpuset
>>      
> ...
>    
>> [    0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [    0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
>>      
> While CPU 1 is booting, pvclock_clocksource_read gets wrong data for that
> CPU and returns a value that's far into the future. On subsequent calls, it
> keeps returning that bogus 'last' value, because it has been made
> to never go backwards in time.
>
> I'm pretty sure that your kernel will boot with this debug patch (for
> 2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
> The patch sets the clock backwards if it detects that the previous
> value was far into the future. It also modifies printk to display some
> extra information. The DEBUG define was added to get extra calls to
> printk's where things can go wrong.
>
>
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 239427c..5eab569 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -120,12 +120,15 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
>
>   static atomic64_t last_value = ATOMIC64_INIT(0);
>
> +int pvclock_backwards = 0;
> +
>   cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>   {
>   	struct pvclock_shadow_time shadow;
>   	unsigned version;
>   	cycle_t ret, offset;
>   	u64 last;
> +	bool backwards;
>
>   	do {
>   		version = pvclock_get_time_values(&shadow, src);
> @@ -153,13 +156,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>   	 * updating at the same time, and one of them could be slightly behind,
>   	 * making the assumption that last_value always go forward fail to hold.
>   	 */
> +	backwards = false;
>   	last = atomic64_read(&last_value);
>   	do {
> -		if (ret < last)
> -			return last;
> +		if (ret < last) {
> +			if ( last - ret < 25000000 )
> +				return last;
> +			else
> +				/* The clock will go backwards instead of being stuck at last value for a very long time
> +				 * The return value of the previous call to pvclock_clocksource_read was most probably
> +				 * very far into the future because the clock for that CPU hadn't been set up yet
> +				 */
> +				backwards = true;
> +		}
>   		last = atomic64_cmpxchg(&last_value, last, ret);
>   	} while (unlikely(last != ret));
>
> +	/* Increment outside of the while loop, because it always loops twice */
> +	if (backwards)
> +		pvclock_backwards++;
> +
>   	return ret;
>   }
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 0bf2ece..d6dcd45 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1,3 +1,5 @@
> +#define DEBUG
> +
>   /*
>    *	x86 SMP booting functions
>    *
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 444b770..9608bec 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -687,6 +687,8 @@ static inline void printk_delay(void)
>   	}
>   }
>
> +extern int pvclock_backwards;
> +
>   asmlinkage int vprintk(const char *fmt, va_list args)
>   {
>   	int printed_len = 0;
> @@ -778,9 +780,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>   				unsigned long long t;
>   				unsigned long nanosec_rem;
>
> +				int pvclock_backwards_prev = pvclock_backwards;
>   				t = cpu_clock(printk_cpu);
>   				nanosec_rem = do_div(t, 1000000000);
> -				tlen = sprintf(tbuf, "[%5lu.%06lu] ",
> +				tlen = sprintf(tbuf, "[%d;%d/%d:%5lu.%06lu] ",
> +						printk_cpu,
> +						pvclock_backwards_prev,
> +						pvclock_backwards,
>   						(unsigned long) t,
>   						nanosec_rem / 1000);
>
>
>
>
> Partial output on my machine, where the clock is set backwards 4 times:
> ...
> [0;0/0:    0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
> [0;0/0:    0.124164] ++++++++++++++++++++=_---CPU UP  1
> [0;0/0:    0.124193] Booting Node   0, Processors  #1 Ok.
> [0;0/0:    0.124602] Setting warm reset code and vector.
> [0;0/0:    0.124609] 1.
> [0;0/0:    0.124610] 2.
> [0;0/0:    0.124611] 3.
> [0;0/0:    0.124624] Asserting INIT.
> [0;0/0:    0.124634] Waiting for send to finish...
> [0;0/0:    0.134508] Deasserting INIT.
> [0;0/0:    0.134515] Waiting for send to finish...
> [0;0/0:    0.134519] #startup loops: 2.
> [0;0/0:    0.134521] Sending STARTUP #1.
> [0;0/0:    0.134527] After apic_write.
> [1;0/0:    0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
> [0;0/1:    0.134838] Startup point 1.
> [0;1/1:    0.134841] Waiting for send to finish...
> [0;1/1:    0.135049] Sending STARTUP #2.
> [0;1/1:    0.135055] After apic_write.
> [0;1/1:    0.135359] Startup point 1.
> [0;1/1:    0.135361] Waiting for send to finish...
> [0;1/1:    0.135568] After Startup.
> [0;1/1:    0.135569] Before Callout 1.
> [0;1/1:    0.135571] After Callout 1.
> [1;1/1:    0.008000] CALLIN, before setup_local_APIC().
> [1;2/2:    0.008000] Stack at about ffff88001f875f44
> [0;3/3:    0.136176] CPU1: has booted.
> [1;3/3:    0.008000] kvm-clock: cpu 1, msr 0:1511c41, secondary cpu clock
> [0;4/4:    0.136199] Brought up 2 CPUs
> [0;4/4:    0.136201] Boot done.
> [0;4/4:    0.136202] Before bogomips.
> [0;4/4:    0.136204] Total of 2 processors activated (11198.56 BogoMIPS).
> [0;4/4:    0.136205] Before bogocount - setting activated=1.
> [1;4/4:    0.140208] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [0;4/4:    0.142577] NET: Registered protocol family 16
> [0;4/4:    0.144263] PCI: Using configuration type 1 for base access
> [0;4/4:    0.144494] PCI: Using configuration type 1 for extended access
> [0;4/4:    0.144938] mtrr: your CPUs had inconsistent variable MTRR settings
> [0;4/4:    0.144938] mtrr: your CPUs had inconsistent MTRRdefType settings
> [0;4/4:    0.144938] mtrr: probably your BIOS does not setup all CPUs.
> [0;4/4:    0.148004] mtrr: corrected configuration.
> [0;4/4:    0.156040] bio: create slab <bio-0> at 0
> [0;4/4:    0.156602] vgaarb: loaded
> [0;4/4:    0.156602] PCI: Probing PCI hardware
> [0;4/4:    0.156602] PCI: Probing PCI hardware (bus 00)
> [0;4/4:    0.156703] pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
> [0;4/4:    0.160269] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
> [0;4/4:    0.161055] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
> [0;4/4:    0.164064] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
> [0;4/4:    0.164827] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
> [0;4/4:    0.169023] pci 0000:00:03.0: reg 10: [io  0xc020-0xc03f]
> [0;4/4:    0.170052] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
> [0;4/4:    0.170381] pci 0000:00:04.0: reg 10: [io  0xc040-0xc05f]
> [0;4/4:    0.170765] pci 0000:00:05.0: reg 10: [io  0xc080-0xc0bf]
> [0;4/4:    0.171023] pci 0000:00:06.0: reg 10: [io  0xc0c0-0xc0ff]
> [0;4/4:    0.172123] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [0;4/4:    0.172971] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> [0;4/4:    0.172971] PCI: pci_cache_line_size set to 64 bytes
> [0;4/4:    0.172971] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
> [0;4/4:    0.172971] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
> [0;4/4:    0.176175] Switching to clocksource kvm-clock
> [1;4/4:    0.212494] pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
> [1;4/4:    0.212500] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
> [1;4/4:    0.212828] NET: Registered protocol family 2
> [1;4/4:    0.213783] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
> ...
>    
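
As an illustration of the stuck-clock behaviour described in the quoted
message, here is a minimal single-threaded sketch in C.  It is not the
kernel code: last_value is a plain variable rather than an atomic64_t,
the function names are made up for this note, and any readings fed in
are invented.  It only shows why one far-future reading pins a clock that
is never allowed to go backwards, and how a threshold like the 25000000
used in the debug patch lets it step back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's atomic64_t last_value. */
static uint64_t last_value;

/* Forget all previous readings (helper for experiments only). */
void clamp_reset(void)
{
	last_value = 0;
}

/*
 * Never-go-backwards clamp: a reading below the largest value seen so
 * far is replaced by that largest value.
 */
uint64_t clamp_monotonic(uint64_t raw)
{
	if (raw < last_value)
		return last_value;
	last_value = raw;
	return raw;
}

/* Same threshold constant as the debug patch; units follow the clock. */
#define BACKWARDS_THRESHOLD 25000000ULL

/*
 * Variant modelled on the debug patch: small backwards steps are still
 * clamped, but a large gap is treated as a bogus earlier reading and
 * the clock is allowed to step back.
 */
uint64_t clamp_with_recovery(uint64_t raw, bool *went_backwards)
{
	*went_backwards = false;
	if (raw < last_value) {
		if (last_value - raw < BACKWARDS_THRESHOLD)
			return last_value;
		*went_backwards = true;
	}
	last_value = raw;
	return raw;
}
```

Feeding 100, then one bogus far-future value, then 200 and 300 into
clamp_monotonic() returns the bogus value for both later calls, which
matches the constant timestamps in the hang.  At that point
clamp_with_recovery() would accept 200 and flag the backwards step.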

Umm...  do you guys have this commit?  This is supposed to address the 
issue where the guest keeps resetting the TSC.  A guest which does that 
will break kvmclock.  It only happens on SMP, and it's much worse on AMD 
CPUs...

sound like your scenario.

commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Aug 19 22:07:26 2010 -1000

     KVM: x86: Robust TSC compensation

     Make the match of TSC find TSC writes that are close to each other
     instead of perfectly identical; this allows the compensator to also
     work in migration / suspend scenarios.

     Signed-off-by: Zachary Amsden <zamsden@redhat.com>
     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>




* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02 21:55                                                         ` Zachary Amsden
@ 2010-10-03  8:16                                                           ` Michael Tokarev
  2010-10-03  8:22                                                             ` Avi Kivity
  2010-10-03  8:30                                                             ` Michael Tokarev
  0 siblings, 2 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-03  8:16 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

03.10.2010 01:55, Zachary Amsden wrote:
> On 10/01/2010 09:35 PM, Michael Tokarev wrote:
[]
>> [    0.049999] APIC delta adjusted to PM-Timer: 6248670 (6435422)
>> [    0.050298] Booting Node   0, Processors  #1 Ok.
>> [    0.023332] Initializing CPU#1
>>    
> 
> Before this, time is very granular...
>> [    0.063333] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
>> [    0.063333] Brought up 2 CPUs
>> [    0.063333] Total of 2 processors activated (12874.21 BogoMIPS).
>> [    0.076666] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>> [    0.116666] devtmpfs: initialized
>> [    0.116666] NET: Registered protocol family 16
>> [    0.119999] ACPI: bus type pci registered
> 
> Now it is multiples of 1/300 ....

Note it's second CPU.

>> [    0.249999] Switching to clocksource kvm-clock
>> [    0.259999] pnp: PnP ACPI init
>>    
> 
> Then, of course, it fails.
> 
> What is your host clocksource?  Does your machine have unstable TSC? 
> Here, I have unstable tsc:

Host is using tsc, and this is the only available clocksource now.
It was a long time ago that I last looked at this - usually all
3 standard ones, i.e. also hpet and acpi_pm, are available too.  This is
an AthlonII CPU, which has synced tsc.  I upgraded the CPU this year
from the previous-gen Athlon - that one didn't have synced tsc
and the kernel was using something else.  So I really don't know why
or since when only tsc is listed on the host (it's 2.6.35.6 x64).

The guest finds the usual (in this situation) kvmclock and acpi_pm
(I'm running it with -no-hpet; without that it also finds hpet),
and it reports tsc instability somewhere in dmesg:

 [1;3/3:   1.004254] Clocksource tsc unstable (delta = 284538419181 ns)

Note this is a regression too, or maybe a bugfix - some time ago,
on another AthlonII machine (also synced tsc), I used to have SMP
guests that used tsc and reported tsc instability only when the
host was swapping (we had a _long_ conversation with Marcelo
Tosatti about this sometime last year, in public, in private
and on irc, with some bugs fixed after this).  That is to
say, guests at least had an _apparently_ stable tsc before; now
instability is detected right away, with a huge difference too.

I just booted this same guest using kvm-0.12.5 - with that one the
guest does not report unstable tsc, yet does not list it in
available_clocksource.  It also shows time jumps:

...
[0;0/0:    0.000000] Detected 3217.424 MHz processor.
[0;0/0:    0.006666] Calibrating delay loop (skipped) preset value.. 6437.96 BogoMIPS (lpj=10724746)
[0;0/0:    0.006666] pid_max: default: 32768 minimum: 301
[0;0/0:    0.006666] Mount-cache hash table entries: 512
[0;0/0:    0.006765] Initializing cgroup subsys ns
...
[0;0/0:    0.029999] Booting Node   0, Processors  #1 Ok.
[1;0/0:    0.006666] Initializing CPU#1
[1;0/0:    0.006666] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
[0;0/0:    0.058342] Brought up 2 CPUs
...

Thanks!

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-03  8:16                                                           ` Michael Tokarev
@ 2010-10-03  8:22                                                             ` Avi Kivity
  2010-10-03  8:30                                                             ` Michael Tokarev
  1 sibling, 0 replies; 81+ messages in thread
From: Avi Kivity @ 2010-10-03  8:22 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Zachary Amsden, Marcelo Tosatti, Arjan Koers, kvm, Glauber Costa,
	Andre Przywara, jeremy

  On 10/03/2010 10:16 AM, Michael Tokarev wrote:
> I just booted this same guest using kvm-0.12.5 - with that one the
> guest does not report unstable tsc, yet does not list it in
> available_clocksource.  It also shows time jumps:
>
> ...
> [0;0/0:    0.000000] Detected 3217.424 MHz processor.
> [0;0/0:    0.006666] Calibrating delay loop (skipped) preset value.. 6437.96 BogoMIPS (lpj=10724746)
> [0;0/0:    0.006666] pid_max: default: 32768 minimum: 301
> [0;0/0:    0.006666] Mount-cache hash table entries: 512
> [0;0/0:    0.006765] Initializing cgroup subsys ns
> ...
> [0;0/0:    0.029999] Booting Node   0, Processors  #1 Ok.
> [1;0/0:    0.006666] Initializing CPU#1
> [1;0/0:    0.006666] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
> [0;0/0:    0.058342] Brought up 2 CPUs
> ...

Most likely it's still using jiffies while the clocks are being set up.
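
A rough way to check the jiffies explanation against the log is sketched
below in C.  It assumes, without confirmation from this thread, that the
early timestamps come from a tick counter scaled by NSEC_PER_SEC / HZ
with integer division, and that HZ is 300 (inferred from the "multiples
of 1/300" remark earlier in the thread).  The function names are
invented for this note.

```c
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL
#define ASSUMED_HZ   300ULL	/* inferred from the log, not confirmed */

/* Nanoseconds for a tick count, with the integer division assumed above. */
uint64_t ticks_to_ns(uint64_t ticks)
{
	return ticks * (NSEC_PER_SEC / ASSUMED_HZ);
}

/*
 * Format a nanosecond value the way the "[%5lu.%06lu]" printk prefix
 * does: whole seconds, then microseconds truncated (not rounded).
 */
int format_stamp(char *buf, size_t len, uint64_t ns)
{
	unsigned long sec = (unsigned long)(ns / NSEC_PER_SEC);
	unsigned long usec = (unsigned long)((ns % NSEC_PER_SEC) / 1000);

	return snprintf(buf, len, "[%5lu.%06lu]", sec, usec);
}
```

Under those assumptions, 2 and 9 ticks format as 0.006666 and 0.029999,
the same coarse stamps that appear in the quoted log, while stamps such
as 0.006765 and 0.058342 are not whole ticks and would have to come from
a finer-grained clock.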

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02 23:42                                                           ` Zachary Amsden
@ 2010-10-03  8:27                                                             ` Michael Tokarev
  2010-10-08  0:12                                                             ` Arjan Koers
  1 sibling, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-03  8:27 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan Koers, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
	Andre Przywara

03.10.2010 03:42, Zachary Amsden wrote:
[]
> Umm...  do you guys have this commit?  This is supposed to address the
> issue where the guest keeps resetting the TSC.  A guest which does that
> will break kvmclock.  It only happens on SMP, and it's much worse on AMD
> CPUs...
> 
> sound like your scenario.

I'm using a 2.6.35.y kernel.org kernel, which does not have this patch.
I discovered this problem with this kernel first, and later it became
apparent that it is present in the 2.6.32 stable series as well.  Those
are my current main targets: 2.6.35 for testing stuff and 2.6.32 for a
backport later.

> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d

And it does not apply to 2.6.35 either: there's no kvm_write_tsc()
function in arch/x86/kvm/x86.c, and no code similar to that.

I browsed Linus' git history, and I see that this is part of a
larger patch series, which was already mentioned in this
thread several times, but without any mention of the base
it should be applied to (you mentioned another of your
patches in this series, the one that writes zero to tsc
somewhere, and said it won't apply but just shows the bug).

Should I try to apply the whole thing to 2.6.35?

Thanks!

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-03  8:16                                                           ` Michael Tokarev
  2010-10-03  8:22                                                             ` Avi Kivity
@ 2010-10-03  8:30                                                             ` Michael Tokarev
  1 sibling, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-03  8:30 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara, jeremy

03.10.2010 12:16, Michael Tokarev wrote:

> Host is using tsc, and this is the only available clocksource now.
> It was a long time ago that I last looked at this - usually all
> 3 standard ones, i.e. also hpet and acpi_pm, are available too.  This is
> an AthlonII CPU, which has synced tsc.  I upgraded the CPU this year
> from the previous-gen Athlon - that one didn't have synced tsc
> and the kernel was using something else.  So I really don't know why
> or since when only tsc is listed on the host (it's 2.6.35.6 x64).

Oh well, it was ENOCOFFEE.  I was looking at current_clocksource,
not available_clocksource, on the host.  All the usual sources are
available: tsc, hpet and acpi_pm, just as expected.

Thanks!

/mjt



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-02 23:42                                                           ` Zachary Amsden
  2010-10-03  8:27                                                             ` Michael Tokarev
@ 2010-10-08  0:12                                                             ` Arjan Koers
  2010-10-08  2:47                                                               ` Zachary Amsden
  1 sibling, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-10-08  0:12 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara

On 2010-10-03 01:42, Zachary Amsden wrote:
...
> 
> Umm...  do you guys have this commit?  This is supposed to address the
> issue where the guest keeps resetting the TSC.  A guest which does that
> will break kvmclock.  It only happens on SMP, and it's much worse on AMD
> CPUs...
> 
> sound like your scenario.
> 
> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> Author: Zachary Amsden <zamsden@redhat.com>
> Date:   Thu Aug 19 22:07:26 2010 -1000


This commit fixes the problem:

commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Aug 19 22:07:19 2010 -1000

    KVM: x86: Move TSC reset out of vmcb_init

    The VMCB is reset whenever we receive a startup IPI, so Linux is setting
    TSC back to zero happens very late in the boot process and destabilizing
    the TSC.  Instead, just set TSC to zero once at VCPU creation time.

    Why the separate patch?  So git-bisect is your friend.



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-08  0:12                                                             ` Arjan Koers
@ 2010-10-08  2:47                                                               ` Zachary Amsden
  2010-10-08 22:06                                                                 ` Marcelo Tosatti
  0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-08  2:47 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
	Andre Przywara

On 10/07/2010 02:12 PM, Arjan Koers wrote:
> On 2010-10-03 01:42, Zachary Amsden wrote:
> ...
>    
>> Umm...  do you guys have this commit?  This is supposed to address the
>> issue where the guest keeps resetting the TSC.  A guest which does that
>> will break kvmclock.  It only happens on SMP, and it's much worse on AMD
>> CPUs...
>>
>> sound like your scenario.
>>
>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>> Author: Zachary Amsden <zamsden@redhat.com>
>> Date:   Thu Aug 19 22:07:26 2010 -1000
>>      
>
> This commit fixes the problem:
>
> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
> Author: Zachary Amsden <zamsden@redhat.com>
> Date:   Thu Aug 19 22:07:19 2010 -1000
>
>      KVM: x86: Move TSC reset out of vmcb_init
>
>      The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>      TSC back to zero happens very late in the boot process and destabilizing
>      the TSC.  Instead, just set TSC to zero once at VCPU creation time.
>
>      Why the separate patch?  So git-bisect is your friend.
>    

Okay, apparently I need to go poke around 2.6.35 and see what patches 
made it there and what patches didn't.


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-08  2:47                                                               ` Zachary Amsden
@ 2010-10-08 22:06                                                                 ` Marcelo Tosatti
  2010-10-09  1:10                                                                   ` Arjan Koers
  2010-10-09  7:59                                                                   ` Michael Tokarev
  0 siblings, 2 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2010-10-08 22:06 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan Koers, kvm, Michael Tokarev, Avi Kivity, Glauber Costa,
	Andre Przywara

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]

On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
> On 10/07/2010 02:12 PM, Arjan Koers wrote:
> >On 2010-10-03 01:42, Zachary Amsden wrote:
> >...
> >>Umm...  do you guys have this commit?  This is supposed to address the
> >>issue where the guest keeps resetting the TSC.  A guest which does that
> >>will break kvmclock.  It only happens on SMP, and it's much worse on AMD
> >>CPUs...
> >>
> >>sound like your scenario.
> >>
> >>commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> >>Author: Zachary Amsden <zamsden@redhat.com>
> >>Date:   Thu Aug 19 22:07:26 2010 -1000
> >
> >This commit fixes the problem:
> >
> >commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
> >Author: Zachary Amsden <zamsden@redhat.com>
> >Date:   Thu Aug 19 22:07:19 2010 -1000
> >
> >     KVM: x86: Move TSC reset out of vmcb_init
> >
> >     The VMCB is reset whenever we receive a startup IPI, so Linux is setting
> >     TSC back to zero happens very late in the boot process and destabilizing
> >     the TSC.  Instead, just set TSC to zero once at VCPU creation time.
> >
> >     Why the separate patch?  So git-bisect is your friend.
> 
> Okay, apparently I need to go poke around 2.6.35 and see what
> patches made it there and what patches didn't.

Backports attached. Michael, Arjan, please give them a try.


[-- Attachment #2: 001-kvm-x86-fix-svm-reset --]
[-- Type: text/plain, Size: 867 bytes --]

commit 280372e494634d0a2cba3956721be16fc4f989bf
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Aug 19 22:07:18 2010 -1000

    KVM: x86: Fix SVM VMCB reset
    
    On reset, VMCB TSC should be set to zero.  Instead, code was setting
    tsc_offset to zero, which passes through the underlying TSC.
    
    Signed-off-by: Zachary Amsden <zamsden@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -766,7 +766,7 @@ static void init_vmcb(struct vcpu_svm *s
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
-	control->tsc_offset = 0;
+	control->tsc_offset = 0-native_read_tsc();
 	control->int_ctl = V_INTR_MASKING_MASK;
 
 	init_seg(&save->es);
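
The arithmetic behind this one-line change can be sketched in C as
follows.  The sketch assumes the usual SVM model in which the guest sees
host TSC plus tsc_offset, wrapping modulo 2^64; the function names and
any numbers passed in are invented for this note and are not kernel
APIs.

```c
#include <stdint.h>

/* Guest-visible TSC under the model guest = host + offset (mod 2^64). */
uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
	return host_tsc + tsc_offset;
}

/*
 * Offset that makes the guest read zero at the moment the host reads
 * host_now.  This mirrors the "0-native_read_tsc()" expression in the
 * patch; unsigned wraparound makes the subtraction well defined.
 */
uint64_t offset_for_zero(uint64_t host_now)
{
	return 0 - host_now;
}
```

With an offset of 0 the guest simply sees the host value, which is the
pass-through the commit message describes.  With offset_for_zero(5000)
the guest reads 0 when the host reads 5000, and 100 when the host reads
5100.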

[-- Attachment #3: 002-kvm-x86-move-tsc-reset-out-of-vmcb-init --]
[-- Type: text/plain, Size: 1247 bytes --]

commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Aug 19 22:07:19 2010 -1000

    KVM: x86: Move TSC reset out of vmcb_init

    The VMCB is reset whenever we receive a startup IPI, so Linux is setting
    TSC back to zero happens very late in the boot process and destabilizing
    the TSC.  Instead, just set TSC to zero once at VCPU creation time.

    Why the separate patch?  So git-bisect is your friend.

    Signed-off-by: Zachary Amsden <zamsden@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -766,7 +766,6 @@ static void init_vmcb(struct vcpu_svm *s
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __pa(svm->msrpm);
-	control->tsc_offset = 0-native_read_tsc();
 	control->int_ctl = V_INTR_MASKING_MASK;
 
 	init_seg(&save->es);
@@ -902,6 +901,7 @@ static struct kvm_vcpu *svm_create_vcpu(
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
 	svm->asid_generation = 0;
 	init_vmcb(svm);
+	svm->vmcb->control.tsc_offset = 0-native_read_tsc();
 
 	err = fx_init(&svm->vcpu);
 	if (err)
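
To see why the placement of that assignment matters, here is a toy model
in C.  Everything in it is hypothetical (struct toy_vcpu, the toy_*
functions, the host TSC values supplied by the caller); it only mimics
the control flow described in the commit message, where init_vmcb runs
again on a startup IPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy vCPU: only the field this discussion is about. */
struct toy_vcpu {
	uint64_t tsc_offset;
};

/*
 * Toy init_vmcb.  With reset_tsc_here set it behaves like the code
 * before the patch and rewrites the offset on every call.
 */
void toy_init_vmcb(struct toy_vcpu *v, uint64_t host_now, bool reset_tsc_here)
{
	if (reset_tsc_here)
		v->tsc_offset = 0 - host_now;
}

/*
 * Toy vCPU creation.  After the patch the offset is set here, once,
 * instead of inside toy_init_vmcb.
 */
void toy_create_vcpu(struct toy_vcpu *v, uint64_t host_now, bool reset_in_init)
{
	v->tsc_offset = 0;
	toy_init_vmcb(v, host_now, reset_in_init);
	if (!reset_in_init)
		v->tsc_offset = 0 - host_now;
}

/* Guest-visible TSC: host value plus offset, modulo 2^64. */
uint64_t toy_guest_tsc(const struct toy_vcpu *v, uint64_t host_now)
{
	return host_now + v->tsc_offset;
}
```

Create two vCPUs at host time 1000, one per behaviour, then call
toy_init_vmcb() again at host time 2000 to stand in for the startup IPI.
The old-style vCPU reads 0 again at that point, so its TSC has jumped
backwards, while the patched one reads 1000 and keeps counting.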


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-08 22:06                                                                 ` Marcelo Tosatti
@ 2010-10-09  1:10                                                                   ` Arjan Koers
  2010-10-09  2:27                                                                     ` Zachary Amsden
                                                                                       ` (3 more replies)
  2010-10-09  7:59                                                                   ` Michael Tokarev
  1 sibling, 4 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-09  1:10 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara

On 2010-10-09 00:06, Marcelo Tosatti wrote:
> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>> ...
>>>> Umm...  do you guys have this commit?  This is supposed to address the
>>>> issue where the guest keeps resetting the TSC.  A guest which does that
>>>> will break kvmclock.  It only happens on SMP, and it's much worse on AMD
>>>> CPUs...
>>>>
>>>> sound like your scenario.
>>>>
>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>> Author: Zachary Amsden <zamsden@redhat.com>
>>>> Date:   Thu Aug 19 22:07:26 2010 -1000
>>>
>>> This commit fixes the problem:
>>>
>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>> Author: Zachary Amsden <zamsden@redhat.com>
>>> Date:   Thu Aug 19 22:07:19 2010 -1000
>>>
>>>     KVM: x86: Move TSC reset out of vmcb_init
>>>
>>>     The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>     TSC back to zero happens very late in the boot process and destabilizing
>>>     the TSC.  Instead, just set TSC to zero once at VCPU creation time.
>>>
>>>     Why the separate patch?  So git-bisect is your friend.
>>
>> Okay, apparently I need to go poke around 2.6.35 and see what
>> patches made it there and what patches didn't.
> 
> Backports attached. Michael, Arjan, please give them a try.
> 

Thanks for the patches.

Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
(with a 2.6.35.7 guest).

It failed with a 2.6.32.24 host. The patch applied, but
pvclock_clocksource_read on the guest is still producing wrong
results for CPU 1 while it's booting. I'll re-check tomorrow.


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  1:10                                                                   ` Arjan Koers
@ 2010-10-09  2:27                                                                     ` Zachary Amsden
  2010-10-09  6:29                                                                       ` Michael Tokarev
                                                                                         ` (2 more replies)
  2010-10-09  2:29                                                                     ` Zachary Amsden
                                                                                       ` (2 subsequent siblings)
  3 siblings, 3 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-09  2:27 UTC (permalink / raw)
  To: Arjan Koers
  Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
	Andre Przywara

On 10/08/2010 03:10 PM, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>    
>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>      
>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>        
>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>> ...
>>>>          
>>>>> Umm...  do you guys have this commit?  This is supposed to address the
>>>>> issue where the guest keeps resetting the TSC.  A guest which does that
>>>>> will break kvmclock.  It only happens on SMP, and it's much worse on AMD
>>>>> CPUs...
>>>>>
>>>>> sound like your scenario.
>>>>>
>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>> Author: Zachary Amsden <zamsden@redhat.com>
>>>>> Date:   Thu Aug 19 22:07:26 2010 -1000
>>>>>            
>>>> This commit fixes the problem:
>>>>
>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>> Author: Zachary Amsden <zamsden@redhat.com>
>>>> Date:   Thu Aug 19 22:07:19 2010 -1000
>>>>
>>>>      KVM: x86: Move TSC reset out of vmcb_init
>>>>
>>>>      The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>>      TSC back to zero happens very late in the boot process and destabilizing
>>>>      the TSC.  Instead, just set TSC to zero once at VCPU creation time.
>>>>
>>>>      Why the separate patch?  So git-bisect is your friend.
>>>>          
>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>> patches made it there and what patches didn't.
>>>        
>> Backports attached. Michael, Arjan, please give them a try.
>>
>>      
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
>
> It failed with a 2.6.32.24 host. The patch applied, but
> pvclock_clocksource_read on the guest is still producing wrong
> results for CPU 1 while it's booting. I'll re-check tomorrow.
>    

There's a lot of work I've done and also a lot of work done by Glauber 
Costa on kvmclock that recently went upstream.

It's unlikely that you'll be bug free without all of those patches 
applied; most of the patches were not just enhancements, but contained 
bugfixes as well as improved operation conditions.  On top of this, the 
patches are highly interdependent because of close code proximity.  I 
suggest applying the following commits to your branch (newest listed 
first; apply in reverse order):

12b1164fa498997bf72070e6a81418197e283716
bfa075b75d8786380a7bca1215d4c7d1485d18dd
82e7988a2088781175a22b09631bce97cd5ed177
bfb3f3326c915b1800dc65d10ca09fbd548353d2
1377ff23ae2bf49c76f8f498ca81050878b9666a
9a088cc32488cfb9f60dca5972155ba13f39eb83
e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
da908f2fb4e783c2a4de751fb90f11a0dd041161
cf839f5da2b0779b9ec8b990f851fb4e7d681da0
cbc59a098486494d9a49537dcb9c969210a8306d
5cd459cdde725bb5c3a7feef6e074e7da70490c9
d578d4d72e3d2154901123f40c9fa7de1f85ae73
bd59fc8ff95126f27b7a0df1b6cc602aa428812d
e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
bf0fb4a42ba7eb362f4013bd2e93209666793e66
69403a558097a9bd333736d58a4cb69ea6e2a0ac
a87834bdb7ff9117da7f164e8cee638f2c51f9b7
91308e2fecddb6fc63feaf4cef3400f5cbea6619
fd03465c0648cd12d7333269b80d902d0a8516dd
aad07c4f92bae2edaa42bcef84c2afdd0d082458
280372e494634d0a2cba3956721be16fc4f989bf
1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
ec01d2eb0a74a6d95823fb6e320298473faf12be
3e05d29fe45508625e2a73db3d1bfb54f30731ff
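
Since the list above is newest first, the order has to be flipped before applying. A minimal sketch that only prints the commands, assuming `git cherry-pick -x` is how the commits get applied (the helper name is made up for illustration):

```python
# Commit IDs copied from the list above, newest first.
NEWEST_FIRST = """
12b1164fa498997bf72070e6a81418197e283716
bfa075b75d8786380a7bca1215d4c7d1485d18dd
82e7988a2088781175a22b09631bce97cd5ed177
bfb3f3326c915b1800dc65d10ca09fbd548353d2
1377ff23ae2bf49c76f8f498ca81050878b9666a
9a088cc32488cfb9f60dca5972155ba13f39eb83
e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
da908f2fb4e783c2a4de751fb90f11a0dd041161
cf839f5da2b0779b9ec8b990f851fb4e7d681da0
cbc59a098486494d9a49537dcb9c969210a8306d
5cd459cdde725bb5c3a7feef6e074e7da70490c9
d578d4d72e3d2154901123f40c9fa7de1f85ae73
bd59fc8ff95126f27b7a0df1b6cc602aa428812d
e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
bf0fb4a42ba7eb362f4013bd2e93209666793e66
69403a558097a9bd333736d58a4cb69ea6e2a0ac
a87834bdb7ff9117da7f164e8cee638f2c51f9b7
91308e2fecddb6fc63feaf4cef3400f5cbea6619
fd03465c0648cd12d7333269b80d902d0a8516dd
aad07c4f92bae2edaa42bcef84c2afdd0d082458
280372e494634d0a2cba3956721be16fc4f989bf
1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
ec01d2eb0a74a6d95823fb6e320298473faf12be
3e05d29fe45508625e2a73db3d1bfb54f30731ff
""".split()


def cherry_pick_commands(newest_first):
    """Return one `git cherry-pick -x` command per commit, oldest first."""
    return ["git cherry-pick -x " + sha for sha in reversed(newest_first)]


if __name__ == "__main__":
    # Print the commands instead of running them, so each one can be
    # applied by hand and any conflict resolved before moving on.
    for command in cherry_pick_commands(NEWEST_FIRST):
        print(command)
```

The first command printed is for 3e05d29f..., the last for 12b1164f.... On an older branch any of these may stop on a conflict that needs manual resolution.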

Since the issue appears resolved, I'm going to continue working upstream.

Zach


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  2:27                                                                     ` Zachary Amsden
@ 2010-10-09  6:29                                                                       ` Michael Tokarev
  2010-10-09  8:59                                                                         ` Arjan Koers
  2010-10-10  1:20                                                                       ` Arjan Koers
  2010-10-11 17:53                                                                       ` Anthony Liguori
  2 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-10-09  6:29 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan Koers, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
	Andre Przywara

09.10.2010 06:27, Zachary Amsden wrote:
[]
> There's a lot of work I've done and also a lot of work done by Glauber
> Costa on kvmclock that recently went upstream.

I've seen your series that went into 2.6.36-to-be, and I
tried to apply it to a stable kernel series (2.6.32)
near the beginning of this thread.  But it fails right
at the second patch -- ec01d2eb0a74a6d95823fb6e320298473faf12be
"KVM: x86: Convert TSC writes to TSC offset writes",
in arch/x86/kvm/vmx.c, and later patches fail in other
places.  In theory it should be possible for me to get
them applied mechanically, by trying to guess what's
going on and modifying stuff accordingly.

> It's unlikely that you'll be bug free without all of those patches
> applied; most of the patches were not just enhancements, but contained
> bugfixes as well as improved operation conditions.  On top of this, the
> patches are highly interdependent because of close code proximity.  I
> suggest applying the following commits to your branch (newest listed
> first; apply in reverse order):

Yes, these commits, that's a large series of patches,
with lots of work done to produce them.

> Since the issue appears resolved, I'm going to continue working upstream.

The result is that no released linux kernel boots
in smp in kvm, which is a linux virtual machine.
That's irony, isn't it?

I wonder how distributions (which are almost all based
on 2.6.32 nowadays) will deal with the issue.. ;)

Thanks!

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-08 22:06                                                                 ` Marcelo Tosatti
  2010-10-09  1:10                                                                   ` Arjan Koers
@ 2010-10-09  7:59                                                                   ` Michael Tokarev
  2010-10-09  8:31                                                                     ` Michael Tokarev
  1 sibling, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-10-09  7:59 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zachary Amsden, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara

09.10.2010 02:06, Marcelo Tosatti wrote:
[]
>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date:   Thu Aug 19 22:07:26 2010 -1000
[]
>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>> Author: Zachary Amsden<zamsden@redhat.com>
>>> Date:   Thu Aug 19 22:07:19 2010 -1000

Um. Now I'm completely confused.

The two mentioned patches, just like most of the
larger series from Zachary Amsden, are for the _host_
kernel, right?

The two backports:

 arch/x86/kvm/svm.c     |    2 +-
 kvm/arch/x86/kvm/svm.c |    2 +-

that's for _host_, not guest...

For some reason I tried several patches like the
two here on the _guest_, not on the host.  No wonder there
was no difference in the results.

For the host, things are quite different.  While 2.6.32
is still very important there, it's not as important
as it is for the guest.

As far as I can see, most of these can be dealt with
by re-loading kvm modules.  Let me try these and some
of the earlier patches...

Oh well...  Confusion, confusion, confusion.... :)

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  7:59                                                                   ` Michael Tokarev
@ 2010-10-09  8:31                                                                     ` Michael Tokarev
  0 siblings, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-09  8:31 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zachary Amsden, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
	Andre Przywara

09.10.2010 11:59, Michael Tokarev wrote:
[]
> As far as I can see, most of these can be dealt with
> by re-loading kvm modules.  Let me try these and some
> of the earlier patches...

So the two one-line backports, once applied to the
_host_ kvm modules, eliminated all the issues I had
so far with unstable clock and smp guests hanging
here or there.  The timestamps in dmesg are not
jumping into the past anymore, and all my guests,
even the most problematic ones, now boot fine
(I tried several times to trigger the problem, to
no avail).
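
For anyone who wants to check a saved dmesg for timestamps that jump into the past, here is a rough sketch. It assumes the usual "[seconds.microseconds]" printk prefix; the function name and the sample lines are made up for illustration:

```python
import re

# Matches the printk time prefix, e.g. "[    0.096000] ".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def backward_jumps(lines):
    """Yield (previous, current, line) for every timestamp lower than the one before it."""
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # lines without a timestamp are ignored
        current = float(match.group(1))
        if previous is not None and current < previous:
            yield previous, current, line.rstrip()
        previous = current


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real boot log.
    sample = [
        "[    1.500000] first message",
        "[    1.200000] second message, stamped earlier than the first",
        "[    1.300000] third message",
    ]
    for before, after, text in backward_jumps(sample):
        print("jump from %.6f back to %.6f: %s" % (before, after, text))
```

To use it on a real log, pass the lines of the saved file instead of the sample list.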

Just to be sure and to eliminate further possible
confusion: that's host kernel 2.6.35.6-amd64,
with two patches (backports offered by Marcelo)
applied on top and kvm{,-amd}.ko reloaded.

I tried several guests, incl. 2.6.32-i686 with
the earlier debugging patches applied, and
2.6.35-i686 (these two guests were showing the
issue most often).

Looking at the larger patchset again: there
are quite a few other changes too.  Should some
of these be applied as well?  I mean, we eliminated
the most obvious problem, but it looks like there
are more problems in there....

Thank you for your work!

/mjt


* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  6:29                                                                       ` Michael Tokarev
@ 2010-10-09  8:59                                                                         ` Arjan Koers
  2010-10-11 20:47                                                                           ` Zachary Amsden
  0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-10-09  8:59 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Zachary Amsden, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
	Andre Przywara

On 2010-10-09 08:29, Michael Tokarev wrote:
...
> The result is that no released linux kernel boots
> in smp in kvm, which is a linux virtual machine.
> That's irony, isn't it?
> 
> I wonder how distributions (which are almost all based
> on 2.6.32 nowadays) will deal with the issue.. ;)

It looks like Debian solved it on their 2.6.32 guest by
reverting the commit that makes it hang:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588426



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  2:27                                                                     ` Zachary Amsden
  2010-10-09  6:29                                                                       ` Michael Tokarev
@ 2010-10-10  1:20                                                                       ` Arjan Koers
  2010-10-11 17:53                                                                       ` Anthony Liguori
  2 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-10  1:20 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
	Andre Przywara

On 2010-10-09 04:27, Zachary Amsden wrote:
...
> There's a lot of work I've done and also a lot of work done by Glauber
> Costa on kvmclock that recently went upstream.
> 
> It's unlikely that you'll be bug free without all of those patches
> applied; most of the patches were not just enhancements, but contained
> bugfixes as well as improved operation conditions.  On top of this, the
> patches are highly interdependent because of close code proximity.  I
> suggest applying the following commits to your branch (newest listed
> first; apply in reverse order):
> 
> 12b1164fa498997bf72070e6a81418197e283716
...
> 3e05d29fe45508625e2a73db3d1bfb54f30731ff

I've tried applying these commits to 2.6.32.24, but gave up after a
while, because some were just too different to make it work (e.g.
91308e2fecddb6fc63feaf4cef3400f5cbea6619).



* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  1:10                                                                   ` Arjan Koers
  2010-10-09  2:27                                                                     ` Zachary Amsden
  2010-10-09  2:29                                                                     ` Zachary Amsden
@ 2010-10-10  1:26                                                                     ` Arjan Koers
  2010-10-20 20:47                                                                     ` Arjan Koers
  3 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-10  1:26 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara

[-- Attachment #1: Type: text/plain, Size: 2485 bytes --]

On 2010-10-09 03:10, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>> ...
>>>>> Umm...  do you guys have this commit?  This is supposed to address the
>>>>> issue where the guest keeps resetting the TSC.  A guest which does that
>>>>> will break kvmclock.  It only happens on SMP, and it's much worse on AMD
>>>>> CPUs...
>>>>>
>>>>> sound like your scenario.
>>>>>
>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>> Date:   Thu Aug 19 22:07:26 2010 -1000
>>>>
>>>> This commit fixes the problem:
>>>>
>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date:   Thu Aug 19 22:07:19 2010 -1000
>>>>
>>>>     KVM: x86: Move TSC reset out of vmcb_init
>>>>
>>>>     The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>>     TSC back to zero happens very late in the boot process and destabilizing
>>>>     the TSC.  Instead, just set TSC to zero once at VCPU creation time.
>>>>
>>>>     Why the separate patch?  So git-bisect is your friend.
>>>
>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>> patches made it there and what patches didn't.
>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
> 
> Thanks for the patches.
> 
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
> 
> It failed with a 2.6.32.24 host. The patch applied, but
> pvclock_clocksource_read on the guest is still producing wrong
> results for CPU 1 while it's booting. I'll re-check tomorrow.

I've performed some more tests on 2.6.32.24 and it turns out that
the wrong value for CPU 1 is not far enough into the future to make
the guest hang, but that may be different on someone else's system.
See the attached boot log 'dmesg-tsc-unstable.txt'. Note that the printk
time doesn't change for a while after switching to clocksource kvm-clock.

On 2.6.32 and 2.6.33, the TSC is unstable, while on 2.6.34+ it's not
(with Marcelo's patches applied). The attached host patches (backported
from 2.6.34) make them all behave like 2.6.34+, with stable TSC. See
boot log 'dmesg-tsc-stable.txt'.
If I'm not mistaken, the code in pvclock_clocksource_read that
causes the hangs will never be reached when the TSC is stable.
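
To illustrate why one reading that is too far ahead can make the clock appear to stand still, here is a toy model. It assumes the read path clamps every result to the largest value returned so far on any CPU, which is how I understand the "global synchronization point" commit. The class name and all values are made up, and this is not the kernel code:

```python
class MonotonicClock:
    """Toy model: never return less than the largest value returned so far."""

    def __init__(self):
        self.last_value = 0

    def read(self, raw_ns):
        # raw_ns stands in for the per-CPU reading in nanoseconds.
        if raw_ns < self.last_value:
            # Some CPU already reported a later time, so hand that back
            # instead of letting time appear to go backwards.
            return self.last_value
        self.last_value = raw_ns
        return raw_ns


if __name__ == "__main__":
    clock = MonotonicClock()
    print(clock.read(96_000_000))   # CPU 0, sane reading
    print(clock.read(655_000_000))  # CPU 1, reading too far ahead
    print(clock.read(100_000_000))  # CPU 0 again: clamped to CPU 1's value
    print(clock.read(700_000_000))  # advances again once raw readings catch up
```

In this model every reader sees the same value until the raw readings pass the bad one. If the bad value were hours ahead instead of a fraction of a second, that would presumably look like a hang.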


[-- Attachment #2: dmesg-tsc-unstable.txt --]
[-- Type: text/plain, Size: 15294 bytes --]

[    0.000000] Linux version 2.6.32.24-201010092338-guestmp (arjan@dev-lenny) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #1 SMP Sat Oct 9 23:42:46 UTC 2010
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[    0.000000]  BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: feffd00000000000 - ff00100000000000 (reserved)
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[    0.000000]  0000000000 - 001fe00000 page 2M
[    0.000000]  001fe00000 - 001fffd000 page 4k
[    0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[    0.000000] RAMDISK: 17df5000 - 1803d7b1
[    0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[    0.000000] ACPI: FACS 000000001ffffe00 00040
[    0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 001fffd000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0001000000 - 00013d08d8]    TEXT DATA BSS ==> [0001000000 - 00013d08d8]
[    0.000000]   #3 [0017df5000 - 001803d7b1]          RAMDISK ==> [0017df5000 - 001803d7b1]
[    0.000000]   #4 [000009bc00 - 0000100000]    BIOS reserved ==> [000009bc00 - 0000100000]
[    0.000000]   #5 [00013d1000 - 00013d1071]              BRK ==> [00013d1000 - 00013d1071]
[    0.000000]   #6 [0000008000 - 0000009000]          PGTABLE ==> [0000008000 - 0000009000]
[    0.000000] kvm-clock: cpu 0, msr 0:1322601, boot clock
[    0.000000]  [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001800000-ffff880001ffffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x0000009b
[    0.000000]     0: 0x00000100 -> 0x0001fffd
[    0.000000] On node 0 totalpages: 130968
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 104 pages reserved
[    0.000000]   DMA zone: 3835 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 1736 pages used for memmap
[    0.000000]   DMA32 zone: 125237 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 24
[    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s73816 r8192 d24488 u1048576
[    0.000000] pcpu-alloc: s73816 r8192 d24488 u1048576 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 
[    0.000000] kvm-clock: cpu 0, msr 0:1411601, primary cpu clock
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129072
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 507724k/524276k available (2072k kernel code, 404k absent, 15504k reserved, 1063k data, 452k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:448
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] hpet clockevent registered
[    0.000000] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.000000] Detected 2799.842 MHz processor.
[    0.012000] Calibrating delay loop (skipped) preset value.. 5599.68 BogoMIPS (lpj=11199368)
[    0.012000] Mount-cache hash table entries: 256
[    0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.012000] using C1E aware idle routine
[    0.012000] Performance Events: AMD PMU driver.
[    0.012000] ... version:                0
[    0.012000] ... bit width:              48
[    0.012000] ... generic registers:      4
[    0.012000] ... value mask:             0000ffffffffffff
[    0.012000] ... max period:             00007fffffffffff
[    0.012000] ... fixed-purpose events:   0
[    0.012000] ... event mask:             000000000000000f
[    0.012000] Freeing SMP alternatives: 20k freed
[    0.012019] ACPI: Core revision 20090903
[    0.014379] Setting APIC routing to flat
[    0.015667] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.015669] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.016000] Booting processor 1 APIC 0x1 ip 0x6000
[    0.012000] Initializing CPU#1
[    0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.012000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[    0.025724] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.025724] Brought up 2 CPUs
[    0.025724] Total of 2 processors activated (11199.36 BogoMIPS).
[    0.025724] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[    0.028000] NET: Registered protocol family 16
[    0.028000] ACPI: bus type pci registered
[    0.028000] PCI: Using configuration type 1 for base access
[    0.028000] PCI: Using configuration type 1 for extended access
[    0.028000] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.028000] mtrr: your CPUs had inconsistent MTRRdefType settings
[    0.028000] mtrr: probably your BIOS does not setup all CPUs.
[    0.028000] mtrr: corrected configuration.
[    0.040000] bio: create slab <bio-0> at 0
[    0.040000] ACPI: EC: Look up EC in DSDT
[    0.040000] ACPI: Interpreter enabled
[    0.040000] ACPI: (supports S0 S5)
[    0.040000] ACPI: Using IOAPIC for interrupt routing
[    0.064000] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.064000] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[    0.064000] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[    0.064000] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[    0.068000] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[    0.068000] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[    0.072000] pci 0000:00:02.0: reg 30 32bit mmio pref: [0xf2010000-0xf201ffff]
[    0.072000] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[    0.072000] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2020000-0xf2020fff]
[    0.072000] pci 0000:00:03.0: reg 30 32bit mmio pref: [0xf2030000-0xf203ffff]
[    0.072000] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[    0.072000] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[    0.072000] pci 0000:00:05.0: reg 14 32bit mmio: [0xf2040000-0xf2040fff]
[    0.072000] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[    0.072000] pci 0000:00:06.0: reg 14 32bit mmio: [0xf2041000-0xf2041fff]
[    0.072000] pci_bus 0000:00: on NUMA node 0
[    0.072000] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.080000] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.080000] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.080000] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.080000] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.084000] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.084000] vgaarb: loaded
[    0.084000] PCI: Using ACPI for IRQ routing
[    0.084000] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.088000] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.096000] Switching to clocksource kvm-clock
[    0.096000] pnp: PnP ACPI init
[    0.096000] ACPI: bus type pnp registered
[    0.096000] pnp: PnP ACPI: found 7 devices
[    0.096000] ACPI: ACPI bus type pnp unregistered
[    0.096000] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
[    0.096000] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[    0.096000] NET: Registered protocol family 2
[    0.096000] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.096000] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[    0.096000] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.096000] TCP: Hash tables configured (established 16384 bind 16384)
[    0.096000] TCP reno registered
[    0.096000] NET: Registered protocol family 1
[    0.096000] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.096000] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.096000] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.096000] pci 0000:00:02.0: Boot video device
[    0.096000] Unpacking initramfs...
[    0.096000] Freeing initrd memory: 2337k freed
[    0.096000] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.096000] msgmni has been set to 997
[    0.096000] alg: No test for stdrng (krng)
[    0.096000] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.096000] io scheduler noop registered
[    0.096000] io scheduler anticipatory registered
[    0.096000] io scheduler deadline registered
[    0.096000] io scheduler cfq registered (default)
[    0.096000] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.096000] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.096000] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.096000] mice: PS/2 mouse device common for all mice
[    0.096000] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[    0.096000] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.096000] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.096000] cpuidle: using governor ladder
[    0.096000] cpuidle: using governor menu
[    0.096000] TCP cubic registered
[    0.096000] NET: Registered protocol family 17
[    0.096000] rtc_cmos 00:01: setting system clock to 2010-10-10 00:15:08 UTC (1286669708)
[    0.096000] Freeing unused kernel memory: 452k freed
[    0.096000] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.096000] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[    0.096000] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.096000] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[    0.096000] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.096000] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[    0.096000] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.096000] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[    0.096000] virtio-pci 0000:00:05.0: irq 24 for MSI/MSI-X
[    0.096000] virtio-pci 0000:00:05.0: irq 25 for MSI/MSI-X
[    0.096000]  vda: vda1 vda2 < vda5 >
[    0.096000] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
[    0.096000] virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
[    0.096000] virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X
[    0.096000] virtio-pci 0000:00:06.0: irq 29 for MSI/MSI-X
[    0.096000] virtio-pci 0000:00:06.0: irq 30 for MSI/MSI-X
[    0.096000]  vdb: vdb1
[    0.655063] kjournald starting.  Commit interval 5 seconds
[    0.655106] EXT3-fs: mounted filesystem with writeback data mode.
[    1.009125] Clocksource tsc unstable (delta = 303360937 ns)
[    2.278086] udev: starting version 160
[    2.971618] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    2.971640] ACPI: Power Button [PWRF]
[    3.156974] processor LNXCPU:00: registered as cooling_device0
[    3.157091] processor LNXCPU:01: registered as cooling_device1
[    4.305146] Adding 409616k swap on /dev/vda5.  Priority:-1 extents:1 across:409616k 
[    4.465152] EXT3 FS on vda1, internal journal
[    4.642414] loop: module loaded

[-- Attachment #3: dmesg-tsc-stable.txt --]
[-- Type: text/plain, Size: 15231 bytes --]

[    0.000000] Linux version 2.6.32.24-201010092338-guestmp (arjan@dev-lenny) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #1 SMP Sat Oct 9 23:42:46 UTC 2010
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[    0.000000]  BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[    0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: feffd00000000000 - ff00100000000000 (reserved)
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[    0.000000]  0000000000 - 001fe00000 page 2M
[    0.000000]  001fe00000 - 001fffd000 page 4k
[    0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[    0.000000] RAMDISK: 17df5000 - 1803d7b1
[    0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[    0.000000] ACPI: FACS 000000001ffffe00 00040
[    0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 001fffd000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0001000000 - 00013d08d8]    TEXT DATA BSS ==> [0001000000 - 00013d08d8]
[    0.000000]   #3 [0017df5000 - 001803d7b1]          RAMDISK ==> [0017df5000 - 001803d7b1]
[    0.000000]   #4 [000009bc00 - 0000100000]    BIOS reserved ==> [000009bc00 - 0000100000]
[    0.000000]   #5 [00013d1000 - 00013d1071]              BRK ==> [00013d1000 - 00013d1071]
[    0.000000]   #6 [0000008000 - 0000009000]          PGTABLE ==> [0000008000 - 0000009000]
[    0.000000] kvm-clock: cpu 0, msr 0:1322601, boot clock
[    0.000000]  [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001800000-ffff880001ffffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x0000009b
[    0.000000]     0: 0x00000100 -> 0x0001fffd
[    0.000000] On node 0 totalpages: 130968
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 104 pages reserved
[    0.000000]   DMA zone: 3835 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 1736 pages used for memmap
[    0.000000]   DMA32 zone: 125237 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 24
[    0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s73816 r8192 d24488 u1048576
[    0.000000] pcpu-alloc: s73816 r8192 d24488 u1048576 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 
[    0.000000] kvm-clock: cpu 0, msr 0:1411601, primary cpu clock
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129072
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 507724k/524276k available (2072k kernel code, 404k absent, 15504k reserved, 1063k data, 452k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:448
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] hpet clockevent registered
[    0.000000] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.000000] Detected 2800.486 MHz processor.
[    0.012000] Calibrating delay loop (skipped) preset value.. 5600.97 BogoMIPS (lpj=11201944)
[    0.012000] Mount-cache hash table entries: 256
[    0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.012000] using C1E aware idle routine
[    0.012000] Performance Events: AMD PMU driver.
[    0.012000] ... version:                0
[    0.012000] ... bit width:              48
[    0.012000] ... generic registers:      4
[    0.012000] ... value mask:             0000ffffffffffff
[    0.012000] ... max period:             00007fffffffffff
[    0.012000] ... fixed-purpose events:   0
[    0.012000] ... event mask:             000000000000000f
[    0.012100] Freeing SMP alternatives: 20k freed
[    0.012114] ACPI: Core revision 20090903
[    0.014445] Setting APIC routing to flat
[    0.015790] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.015793] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.016000] Booting processor 1 APIC 0x1 ip 0x6000
[    0.012000] Initializing CPU#1
[    0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.012000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[    0.024078] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[    0.024108] Brought up 2 CPUs
[    0.024110] Total of 2 processors activated (11201.94 BogoMIPS).
[    0.024411] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[    0.025259] NET: Registered protocol family 16
[    0.028133] ACPI: bus type pci registered
[    0.028133] PCI: Using configuration type 1 for base access
[    0.028133] PCI: Using configuration type 1 for extended access
[    0.028188] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.028188] mtrr: your CPUs had inconsistent MTRRdefType settings
[    0.028188] mtrr: probably your BIOS does not setup all CPUs.
[    0.028188] mtrr: corrected configuration.
[    0.036221] bio: create slab <bio-0> at 0
[    0.044346] ACPI: EC: Look up EC in DSDT
[    0.049855] ACPI: Interpreter enabled
[    0.049857] ACPI: (supports S0 S5)
[    0.049867] ACPI: Using IOAPIC for interrupt routing
[    0.068474] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.072834] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[    0.073227] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[    0.073239] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[    0.075565] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[    0.075565] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[    0.075565] pci 0000:00:02.0: reg 30 32bit mmio pref: [0xf2010000-0xf201ffff]
[    0.075565] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[    0.075565] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2020000-0xf2020fff]
[    0.075619] pci 0000:00:03.0: reg 30 32bit mmio pref: [0xf2030000-0xf203ffff]
[    0.080155] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[    0.080533] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[    0.080592] pci 0000:00:05.0: reg 14 32bit mmio: [0xf2040000-0xf2040fff]
[    0.081031] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[    0.081089] pci 0000:00:06.0: reg 14 32bit mmio: [0xf2041000-0xf2041fff]
[    0.081536] pci_bus 0000:00: on NUMA node 0
[    0.081604] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.092368] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.092614] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.092822] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.093030] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.093246] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.093246] vgaarb: loaded
[    0.096176] PCI: Using ACPI for IRQ routing
[    0.096631] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.096631] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.112094] Switching to clocksource kvm-clock
[    0.112665] pnp: PnP ACPI init
[    0.112710] ACPI: bus type pnp registered
[    0.117907] pnp: PnP ACPI: found 7 devices
[    0.117914] ACPI: ACPI bus type pnp unregistered
[    0.127084] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
[    0.127093] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[    0.127513] NET: Registered protocol family 2
[    0.127902] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.128916] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[    0.129442] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.129976] TCP: Hash tables configured (established 16384 bind 16384)
[    0.129987] TCP reno registered
[    0.130420] NET: Registered protocol family 1
[    0.130462] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.130498] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.130529] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.130559] pci 0000:00:02.0: Boot video device
[    0.130721] Unpacking initramfs...
[    0.177652] Freeing initrd memory: 2337k freed
[    0.184652] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.185509] msgmni has been set to 997
[    0.186409] alg: No test for stdrng (krng)
[    0.186897] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.186906] io scheduler noop registered
[    0.186908] io scheduler anticipatory registered
[    0.186909] io scheduler deadline registered
[    0.187038] io scheduler cfq registered (default)
[    0.219030] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.220785] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.220803] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.221828] mice: PS/2 mouse device common for all mice
[    0.223661] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[    0.223792] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.224291] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.224441] cpuidle: using governor ladder
[    0.224447] cpuidle: using governor menu
[    0.226020] TCP cubic registered
[    0.226026] NET: Registered protocol family 17
[    0.228262] rtc_cmos 00:01: setting system clock to 2010-10-09 23:52:14 UTC (1286668334)
[    0.228523] Freeing unused kernel memory: 452k freed
[    0.319112] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.319139] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[    0.319275] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.319290] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[    0.319440] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.319443] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[    0.319592] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.319595] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[    0.365438] virtio-pci 0000:00:03.0: irq 24 for MSI/MSI-X
[    0.365455] virtio-pci 0000:00:03.0: irq 25 for MSI/MSI-X
[    0.365468] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
[    0.366915] virtio-pci 0000:00:05.0: irq 27 for MSI/MSI-X
[    0.366930] virtio-pci 0000:00:05.0: irq 28 for MSI/MSI-X
[    0.367356]  vda: vda1 vda2 < vda5 >
[    0.393528] virtio-pci 0000:00:06.0: irq 29 for MSI/MSI-X
[    0.393542] virtio-pci 0000:00:06.0: irq 30 for MSI/MSI-X
[    0.394301]  vdb: vdb1
[    0.940463] kjournald starting.  Commit interval 5 seconds
[    0.940574] EXT3-fs: mounted filesystem with writeback data mode.
[    2.603464] udev: starting version 160
[    3.120632] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    3.120646] ACPI: Power Button [PWRF]
[    3.328008] processor LNXCPU:00: registered as cooling_device0
[    3.328080] processor LNXCPU:01: registered as cooling_device1
[    4.534367] Adding 409616k swap on /dev/vda5.  Priority:-1 extents:1 across:409616k 
[    4.702659] EXT3 FS on vda1, internal journal
[    4.838516] loop: module loaded

[-- Attachment #4: 2.6.32.24.diff --]
[-- Type: text/x-patch, Size: 938 bytes --]

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 61ba669..a5882fb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -797,15 +797,17 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (unlikely(cpu != vcpu->cpu)) {
 		u64 tsc_this, delta;
 
-		/*
-		 * Make sure that the guest sees a monotonically
-		 * increasing TSC.
-		 */
-		rdtscll(tsc_this);
-		delta = vcpu->arch.host_tsc - tsc_this;
-		svm->vmcb->control.tsc_offset += delta;
-		if (is_nested(svm))
-			svm->nested.hsave->control.tsc_offset += delta;
+		if (check_tsc_unstable()) {
+			/*
+			 * Make sure that the guest sees a monotonically
+			 * increasing TSC.
+			 */
+			rdtscll(tsc_this);
+			delta = vcpu->arch.host_tsc - tsc_this;
+			svm->vmcb->control.tsc_offset += delta;
+			if (is_nested(svm))
+				svm->nested.hsave->control.tsc_offset += delta;
+		}
 		vcpu->cpu = cpu;
 		kvm_migrate_timers(vcpu);
 		svm->asid_generation = 0;

[-- Attachment #5: 2.6.33.7.diff --]
[-- Type: text/x-patch, Size: 901 bytes --]

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8d128be..77f119c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -801,14 +801,16 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (unlikely(cpu != vcpu->cpu)) {
 		u64 delta;
 
-		/*
-		 * Make sure that the guest sees a monotonically
-		 * increasing TSC.
-		 */
-		delta = vcpu->arch.host_tsc - native_read_tsc();
-		svm->vmcb->control.tsc_offset += delta;
-		if (is_nested(svm))
-			svm->nested.hsave->control.tsc_offset += delta;
+		if (check_tsc_unstable()) {
+			/*
+			 * Make sure that the guest sees a monotonically
+			 * increasing TSC.
+			 */
+			delta = vcpu->arch.host_tsc - native_read_tsc();
+			svm->vmcb->control.tsc_offset += delta;
+			if (is_nested(svm))
+				svm->nested.hsave->control.tsc_offset += delta;
+		}
 		vcpu->cpu = cpu;
 		kvm_migrate_timers(vcpu);
 		svm->asid_generation = 0;

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  2:27                                                                     ` Zachary Amsden
  2010-10-09  6:29                                                                       ` Michael Tokarev
  2010-10-10  1:20                                                                       ` Arjan Koers
@ 2010-10-11 17:53                                                                       ` Anthony Liguori
  2010-10-11 18:36                                                                         ` Marcelo Tosatti
  2 siblings, 1 reply; 81+ messages in thread
From: Anthony Liguori @ 2010-10-11 17:53 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan Koers, kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara

On 10/08/2010 09:27 PM, Zachary Amsden wrote:
> On 10/08/2010 03:10 PM, Arjan Koers wrote:
>> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>>> ...
>>>>>> Umm...  do you guys have this commit?  This is supposed to 
>>>>>> address the
>>>>>> issue where the guest keeps resetting the TSC.  A guest which 
>>>>>> does that
>>>>>> will break kvmclock.  It only happens on SMP, and it's much worse 
>>>>>> on AMD
>>>>>> CPUs...
>>>>>>
>>>>>> sound like your scenario.
>>>>>>
>>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>>> Date:   Thu Aug 19 22:07:26 2010 -1000
>>>>> This commit fixes the problem:
>>>>>
>>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>> Date:   Thu Aug 19 22:07:19 2010 -1000
>>>>>
>>>>>      KVM: x86: Move TSC reset out of vmcb_init
>>>>>
>>>>>      The VMCB is reset whenever we receive a startup IPI, so Linux 
>>>>> is setting
>>>>>      TSC back to zero happens very late in the boot process and 
>>>>> destabilizing
>>>>>      the TSC.  Instead, just set TSC to zero once at VCPU creation 
>>>>> time.
>>>>>
>>>>>      Why the separate patch?  So git-bisect is your friend.
>>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>>> patches made it there and what patches didn't.
>>> Backports attached. Michael, Arjan, please give them a try.
>>>
>> Thanks for the patches.
>>
>> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
>> (with a 2.6.35.7 guest).
>>
>> It failed with a 2.6.32.24 host. The patch applied, but
>> pvclock_clocksource_read on the guest is still producing wrong
>> results for CPU 1 while it's booting. I'll re-check tomorrow.
>
> There's a lot of work I've done and also a lot of work done by Glauber 
> Costa on kvmclock that recently went upstream.

If pvclock is broken on 2.6.32-stable, then shouldn't we port these 
patches to the stable tree or, at the very least, blacklist pvclock in 
stable?

Regards,

Anthony Liguori

> It's unlikely that you'll be bug free without all of those patches 
> applied; most of the patches were not just enhancements, but contained 
> bugfixes as well as improved operation conditions.  On top of this, 
> the patches are highly interdependent because of close code 
> proximity.  I suggest applying the following commits to your branch 
> (newest listed first; apply in reverse order):
>
> 12b1164fa498997bf72070e6a81418197e283716
> bfa075b75d8786380a7bca1215d4c7d1485d18dd
> 82e7988a2088781175a22b09631bce97cd5ed177
> bfb3f3326c915b1800dc65d10ca09fbd548353d2
> 1377ff23ae2bf49c76f8f498ca81050878b9666a
> 9a088cc32488cfb9f60dca5972155ba13f39eb83
> e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
> da908f2fb4e783c2a4de751fb90f11a0dd041161
> cf839f5da2b0779b9ec8b990f851fb4e7d681da0
> cbc59a098486494d9a49537dcb9c969210a8306d
> 5cd459cdde725bb5c3a7feef6e074e7da70490c9
> d578d4d72e3d2154901123f40c9fa7de1f85ae73
> bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
> bf0fb4a42ba7eb362f4013bd2e93209666793e66
> 69403a558097a9bd333736d58a4cb69ea6e2a0ac
> a87834bdb7ff9117da7f164e8cee638f2c51f9b7
> 91308e2fecddb6fc63feaf4cef3400f5cbea6619
> fd03465c0648cd12d7333269b80d902d0a8516dd
> aad07c4f92bae2edaa42bcef84c2afdd0d082458
> 280372e494634d0a2cba3956721be16fc4f989bf
> 1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
> ec01d2eb0a74a6d95823fb6e320298473faf12be
> 3e05d29fe45508625e2a73db3d1bfb54f30731ff
>
> Since the issue appears resolved, I'm going to continue working upstream.
>
> Zach
> -- 
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-11 17:53                                                                       ` Anthony Liguori
@ 2010-10-11 18:36                                                                         ` Marcelo Tosatti
  0 siblings, 0 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2010-10-11 18:36 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Zachary Amsden, Arjan Koers, kvm, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara

On Mon, Oct 11, 2010 at 12:53:26PM -0500, Anthony Liguori wrote:
> On 10/08/2010 09:27 PM, Zachary Amsden wrote:
> >On 10/08/2010 03:10 PM, Arjan Koers wrote:
> >>On 2010-10-09 00:06, Marcelo Tosatti wrote:
> >>>On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
> >>>>On 10/07/2010 02:12 PM, Arjan Koers wrote:
> >>>>>On 2010-10-03 01:42, Zachary Amsden wrote:
> >>>>>...
> >>>>>>Umm...  do you guys have this commit?  This is supposed
> >>>>>>to address the
> >>>>>>issue where the guest keeps resetting the TSC.  A guest
> >>>>>>which does that
> >>>>>>will break kvmclock.  It only happens on SMP, and it's
> >>>>>>much worse on AMD
> >>>>>>CPUs...
> >>>>>>
> >>>>>>sound like your scenario.
> >>>>>>
> >>>>>>commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> >>>>>>Author: Zachary Amsden<zamsden@redhat.com>
> >>>>>>Date:   Thu Aug 19 22:07:26 2010 -1000
> >>>>>This commit fixes the problem:
> >>>>>
> >>>>>commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
> >>>>>Author: Zachary Amsden<zamsden@redhat.com>
> >>>>>Date:   Thu Aug 19 22:07:19 2010 -1000
> >>>>>
> >>>>>     KVM: x86: Move TSC reset out of vmcb_init
> >>>>>
> >>>>>     The VMCB is reset whenever we receive a startup IPI,
> >>>>>so Linux is setting
> >>>>>     TSC back to zero happens very late in the boot
> >>>>>process and destabilizing
> >>>>>     the TSC.  Instead, just set TSC to zero once at VCPU
> >>>>>creation time.
> >>>>>
> >>>>>     Why the separate patch?  So git-bisect is your friend.
> >>>>Okay, apparently I need to go poke around 2.6.35 and see what
> >>>>patches made it there and what patches didn't.
> >>>Backports attached. Michael, Arjan, please give them a try.
> >>>
> >>Thanks for the patches.
> >>
> >>Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> >>(with a 2.6.35.7 guest).
> >>
> >>It failed with a 2.6.32.24 host. The patch applied, but
> >>pvclock_clocksource_read on the guest is still producing wrong
> >>results for CPU 1 while it's booting. I'll re-check tomorrow.
> >
> >There's a lot of work I've done and also a lot of work done by
> >Glauber Costa on kvmclock that recently went upstream.
> 
> If pvclock is broken on 2.6.32-stable, then shouldn't we port these
> patches to the stable tree or in the very least, black list pvclock
> in stable?

The minimal fixes will be backported as soon as they appear on linux-2.6.git.

> 
> Regards,
> 
> Anthony Liguori
> 
> >It's unlikely that you'll be bug free without all of those patches
> >applied; most of the patches were not just enhancements, but
> >contained bugfixes as well as improved operation conditions.  On
> >top of this, the patches are highly interdependent because of
> >close code proximity.  I suggest applying the following commits to
> >your branch (newest listed first; apply in reverse order):
> >
> >12b1164fa498997bf72070e6a81418197e283716
> >bfa075b75d8786380a7bca1215d4c7d1485d18dd
> >82e7988a2088781175a22b09631bce97cd5ed177
> >bfb3f3326c915b1800dc65d10ca09fbd548353d2
> >1377ff23ae2bf49c76f8f498ca81050878b9666a
> >9a088cc32488cfb9f60dca5972155ba13f39eb83
> >e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
> >da908f2fb4e783c2a4de751fb90f11a0dd041161
> >cf839f5da2b0779b9ec8b990f851fb4e7d681da0
> >cbc59a098486494d9a49537dcb9c969210a8306d
> >5cd459cdde725bb5c3a7feef6e074e7da70490c9
> >d578d4d72e3d2154901123f40c9fa7de1f85ae73
> >bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> >e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
> >bf0fb4a42ba7eb362f4013bd2e93209666793e66
> >69403a558097a9bd333736d58a4cb69ea6e2a0ac
> >a87834bdb7ff9117da7f164e8cee638f2c51f9b7
> >91308e2fecddb6fc63feaf4cef3400f5cbea6619
> >fd03465c0648cd12d7333269b80d902d0a8516dd
> >aad07c4f92bae2edaa42bcef84c2afdd0d082458
> >280372e494634d0a2cba3956721be16fc4f989bf
> >1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
> >ec01d2eb0a74a6d95823fb6e320298473faf12be
> >3e05d29fe45508625e2a73db3d1bfb54f30731ff
> >
> >Since the issue appears resolved, I'm going to continue working upstream.
> >
> >Zach
> >-- 
> >To unsubscribe from this list: send the line "unsubscribe kvm" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  8:59                                                                         ` Arjan Koers
@ 2010-10-11 20:47                                                                           ` Zachary Amsden
  2010-10-13 12:18                                                                             ` Glauber Costa
  0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-11 20:47 UTC (permalink / raw)
  To: Arjan Koers
  Cc: Michael Tokarev, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
	Andre Przywara

On 10/08/2010 10:59 PM, Arjan Koers wrote:
> On 2010-10-09 08:29, Michael Tokarev wrote:
> ...
>    
>> The result is that no released linux kernel boots
>> in smp in kvm, which is a linux virtual machine.
>> That's irony, isn't it?
>>
>> I wonder how distributions (which are almost all based
>> on 2.6.32 nowadays) will deal with the issue.. ;)
>>      
> It looks like Debian solved it on their 2.6.32 guest by
> reverting the commit that makes it hang:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588426
>    

That's not a wise choice; the commit is needed to prevent clocks from 
going backwards.  It then caused some fallout issues with clobbers, which 
I believe hpa fixed, but there were several rounds of it.
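
For readers following along, the "prevent clocks going backwards" idea can be sketched in userspace C. This is only an illustration of a global synchronization point, assuming C11 atomics; the names last_value and monotonic_clamp are hypothetical here and this is not the kernel code from the commit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global: the largest clock value handed out on any CPU. */
static _Atomic uint64_t last_value;

/*
 * Clamp a per-CPU clock reading so the returned value is never smaller
 * than a value already returned on any other CPU.
 */
uint64_t monotonic_clamp(uint64_t now)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (now < last)
            return last;    /* another CPU is ahead: reuse its value */
        /* On failure the compare-exchange reloads 'last' and we retry. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, now));

    return now;             /* we published the new maximum */
}
```

Each reader pays for one shared atomic, which is the price of making readings comparable across CPUs whose per-CPU time sources are not perfectly in sync.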

Glauber, perhaps, has a better idea of what patches are needed for the 
host side kvmclock.  I've mostly been working on the server side.

To solve the wider range of problems, distributions converging on 2.6.32 
will need all of the fixes backported, both server and host.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-11 20:47                                                                           ` Zachary Amsden
@ 2010-10-13 12:18                                                                             ` Glauber Costa
  0 siblings, 0 replies; 81+ messages in thread
From: Glauber Costa @ 2010-10-13 12:18 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan Koers, Michael Tokarev, kvm, Marcelo Tosatti, Avi Kivity,
	Andre Przywara

On Mon, Oct 11, 2010 at 10:47:16AM -1000, Zachary Amsden wrote:
> On 10/08/2010 10:59 PM, Arjan Koers wrote:
> >On 2010-10-09 08:29, Michael Tokarev wrote:
> >...
> >>The result is that no released linux kernel boots
> >>in smp in kvm, which is a linux virtual machine.
> >>That's irony, isn't it?
> >>
> >>I wonder how distributions (which are almost all based
> >>on 2.6.32 nowadays) will deal with the issue.. ;)
> >It looks like Debian solved it on their 2.6.32 guest by
> >reverting the commit that makes it hang:
> >http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588426
> 
> That's not a wise choice, the commit is needed to prevent clocks
> going backwards.  It then caused some fallout issues with clobbers,
> which I believe hpa fixed, but there were several rounds of it.
> 
> Glauber, perhaps, has a better idea of what patches are needed for
> the host side kvmclock.  I've mostly been working on the server
> side.
No, all the recent patches I wrote towards fixing kvmclock problems
touch the guest. The host side ones are nice to have, but not stable/needed
material.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: 2.6.35-rc1 regression with pvclock and smp guests
  2010-10-09  1:10                                                                   ` Arjan Koers
                                                                                       ` (2 preceding siblings ...)
  2010-10-10  1:26                                                                     ` Arjan Koers
@ 2010-10-20 20:47                                                                     ` Arjan Koers
  3 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-20 20:47 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
	Glauber Costa, Andre Przywara

On 2010-10-09 03:10, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
...
>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
>
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).

Here's a smaller version of a previous email that didn't make it to
the list...


The host side fixes stop the hanging problem, but the real problem is
on the guest:
The guest starts with one hv_clock struct, which gets written to by
the host (for CPU0).
The percpu code allocates separate hv_clock structs for each CPU and
copies the data from the old hv_clock struct to the new structs.
The CPU1 hv_clock struct with old CPU0 data is accessed, which causes
the problems.

I've performed some tests with an unmodified 2.6.32.24 host and a
recent kvm.git guest. The unmodified guest hangs. A modified guest
in which the CPU1 hv_clock struct is initialized to 0 doesn't hang.
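
The copy problem described above can be modelled in a few lines of plain C. This is a simplified sketch, not the guest code: struct clock_info, boot_copy, percpu_copy and the helper functions are hypothetical stand-ins, and the two-CPU count is an assumption taken from the test setup.

```c
#include <stdint.h>
#include <string.h>

#define NCPUS 2 /* assumption: a two-VCPU guest, as in the report */

/* Hypothetical, simplified stand-in for the per-CPU time info struct. */
struct clock_info {
    uint32_t version;
    uint64_t tsc_timestamp;
    uint64_t system_time;
};

struct clock_info boot_copy;            /* written by the host for CPU0 */
struct clock_info percpu_copy[NCPUS];   /* created later by percpu setup */

/* Models percpu setup: every CPU receives a copy of the boot-time struct. */
void setup_percpu_copies(void)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        percpu_copy[cpu] = boot_copy;
}

/* Models the tested workaround: secondary copies start out zeroed. */
void clear_secondary_copies(void)
{
    for (int cpu = 1; cpu < NCPUS; cpu++)
        memset(&percpu_copy[cpu], 0, sizeof(percpu_copy[cpu]));
}

/* Nonzero if a secondary CPU's copy still carries data inherited from CPU0. */
int holds_stale_data(int cpu)
{
    return cpu != 0 && percpu_copy[cpu].version != 0;
}
```

In this model, calling setup_percpu_copies() after the host has filled boot_copy leaves CPU1 with CPU0's old values until clear_secondary_copies() runs, which mirrors the difference between the unmodified and the modified guest.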

Here's a boot log that shows what happens:

 +-printk_cpu (kernel/printk.c)
 |                +-&hv_clock CPU0 (arch/x86/kernel/kvmclock.c)
 |                |        +-hv_clock.version CPU0
 |                |        |                +-&hv_clock CPU1
 |                |        |                |        +-hv_clock.version CPU1
 |                |        |                |        |
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] Linux version 2.6.36-rc7-201010141519-guestmp-kvm+ (arjan@dev-lenny) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #1 SMP Thu Oct 14 15:22:48 UTC 2010
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.36-rc7-201010141519-guestmp-kvm+ root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] BIOS-provided physical RAM map:
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  BIOS-e820: feffd00000000000 - ff00100000000000 (reserved)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] NX (Execute Disable) protection: active
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] DMI 2.4 present.
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] No AGP bridge found
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] MTRR default type: write-back
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] MTRR fixed ranges enabled:
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   00000-9FFFF write-back
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   A0000-BFFFF uncachable
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   C0000-FFFFF write-protect
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] MTRR variable ranges enabled:
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   1 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   2 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   3 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   4 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   5 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   6 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]   7 disabled
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] initial memory mapped : 0 - 20000000
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] Using GB pages for direct mapping
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  0000000000 - 001fe00000 page 2M
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000]  001fe00000 - 001fffd000 page 4k
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] RAMDISK: 17df6000 - 1803e000
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01   BXPC   BXDSDT 00000001 INTL 20090123)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: FACS 000000001ffffe00 00040
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] ACPI: Local APIC address 0xfee00000
[0 ffffffff81324fc0        0 ffffffff81324fc0        0     0.000000] kvm-clock: Using msrs 12 and 11
pass the address of the hv_clock struct to the host; the host starts writing to it:
[0 ffffffff81324fc0   11c3c2 ffffffff81324fc0   11c3c2     0.000000] kvm-clock: cpu 0, msr 0:1324fc1, boot clock
pv_clock data is accessed in kvm_get_tsc_khz:
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]  [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001c00000-ffff8800023fffff] on node 0
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000] Zone PFN ranges:
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA      0x00000001 -> 0x00001000
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA32    0x00001000 -> 0x00100000
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   Normal   empty
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000] Movable zone start PFN for each node
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000] early_node_map[2] active PFN ranges
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]     0: 0x00000001 -> 0x0000009b
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]     0: 0x00000100 -> 0x0001fffd
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000] On node 0 totalpages: 130967
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA zone: 56 pages used for memmap
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA zone: 0 pages reserved
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA zone: 3938 pages, LIFO batch:0
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA32 zone: 1736 pages used for memmap
[0 ffffffff81324fc0   11c3ca ffffffff81324fc0   11c3ca     0.000000]   DMA32 zone: 125237 pages, LIFO batch:31
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: PM-Timer IO Port: 0xb008
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: Local APIC address 0xfee00000
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IRQ0 used by override.
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IRQ2 used by override.
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IRQ5 used by override.
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IRQ9 used by override.
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IRQ10 used by override.
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: IRQ11 used by override.
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] Using ACPI (MADT) for SMP configuration information
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] nr_irqs_gsi: 40
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] Booting paravirtualized kernel on KVM
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] early_res array is doubled to 64 at [3000 - 37ff]
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s76736 r8192 d21568 u1048576
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] pcpu-alloc: s76736 r8192 d21568 u1048576 alloc=1*2097152
[0 ffffffff81324fc0   11c4ae ffffffff81324fc0   11c4ae     0.000000] pcpu-alloc: [0] 0 1
the single hv_clock struct has been copied to two new structs (one for each CPU); the contents are correct for CPU0, but not for CPU1
the host may still write to the old hv_clock location; can this cause problems?

if the CPU1 hv_clock struct is zeroed here, pvclock_clocksource_read will not return wrong data and the guest won't hang

pass the address of the CPU0 hv_clock struct to the host; the host starts writing to it:
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] kvm-clock: cpu 0, msr 0:1411fc1, primary cpu clock
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129175
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.36-rc7-201010141519-guestmp-kvm+ root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[0 ffff880001411fc0   11c4b0 ffff880001511fc0   11c4ae     0.000000] Checking aperture...
[0 ffff880001411fc0   1244dc ffff880001511fc0   11c4ae     0.000000] No AGP bridge found
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000] Subtract (39 early reservations)
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #1 [0001000000 - 00013d6d38]   TEXT DATA BSS
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #2 [0017df6000 - 001803e000]         RAMDISK
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #3 [000009bc00 - 0000100000]   BIOS reserved
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #4 [00013d7000 - 00013d7071]             BRK
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #5 [0000001000 - 0000003000]      TRAMPOLINE
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #6 [0000008000 - 0000009000]         PGTABLE
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #7 [00013d7080 - 00013d8080]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #8 [00013d6d40 - 00013d6da0]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #9 [0001bd9000 - 0001bda000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #10 [0001bda000 - 0001bdb000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #11 [0001c00000 - 0002400000]        MEMMAP 0
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #12 [00013d6dc0 - 00013d6f40]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #13 [00013d8080 - 00013db080]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #14 [00013dc000 - 00013dd000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #15 [00013d6f40 - 00013d6f81]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #16 [00013db080 - 00013db0c3]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #17 [00013db100 - 00013db2c0]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #18 [00013db2c0 - 00013db328]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #19 [00013db340 - 00013db3a8]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #20 [00013db3c0 - 00013db428]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #21 [00013db440 - 00013db4a8]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #22 [00013db4c0 - 00013db528]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #23 [00013db540 - 00013db5a8]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #24 [00013db5c0 - 00013db628]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #25 [00013db640 - 00013db6b6]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #26 [00013db6c0 - 00013db736]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #27 [0001400000 - 000141a000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #28 [0001500000 - 000151a000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #29 [00013d6fc0 - 00013d6fc8]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #30 [00013db740 - 00013db748]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #31 [00013db780 - 00013db788]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #32 [00013db7c0 - 00013db7d0]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #33 [00013db800 - 00013db940]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #34 [00013db940 - 00013db9a0]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #35 [00013db9c0 - 00013dba20]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #36 [00013dd000 - 00013e1000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #37 [000141a000 - 000149a000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000]   #38 [000149a000 - 00014da000]         BOOTMEM
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000] Memory: 508372k/524276k available (2128k kernel code, 408k absent, 15496k reserved, 1011k data, 472k init)
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000] Hierarchical RCU implementation.
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000] 	RCU-based detection of stalled CPUs is disabled.
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000] 	Verbose stalled-CPUs detection is disabled.
[0 ffff880001411fc0   1245fc ffff880001511fc0   11c4ae     0.000000] NR_IRQS:320
[0 ffff880001411fc0   126736 ffff880001511fc0   11c4ae     0.000000] Console: colour VGA+ 80x25
[0 ffff880001411fc0   126736 ffff880001511fc0   11c4ae     0.000000] console [tty0] enabled
[0 ffff880001411fc0   126766 ffff880001511fc0   11c4ae     0.000000] hpet clockevent registered
[0 ffff880001411fc0   126766 ffff880001511fc0   11c4ae     0.000000] Detected 2799.750 MHz processor.
[0 ffff880001411fc0   126766 ffff880001511fc0   11c4ae     0.012000] Calibrating delay loop (skipped) preset value.. 5599.50 BogoMIPS (lpj=11199000)
[0 ffff880001411fc0   126766 ffff880001511fc0   11c4ae     0.012000] pid_max: default: 32768 minimum: 301
[0 ffff880001411fc0   126766 ffff880001511fc0   11c4ae     0.012000] Mount-cache hash table entries: 256
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] using C1E aware idle routine
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] Performance Events: AMD PMU driver.
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... version:                0
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... bit width:              48
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... generic registers:      4
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... value mask:             0000ffffffffffff
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... max period:             00007fffffffffff
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... fixed-purpose events:   0
[0 ffff880001411fc0   12676a ffff880001511fc0   11c4ae     0.012000] ... event mask:             000000000000000f
[0 ffff880001411fc0   12676c ffff880001511fc0   11c4ae     0.012333] Freeing SMP alternatives: 12k freed
[0 ffff880001411fc0   12676c ffff880001511fc0   11c4ae     0.012342] ACPI: Core revision 20100702
[0 ffff880001411fc0   126770 ffff880001511fc0   11c4ae     0.014061] Setting APIC routing to flat
[0 ffff880001411fc0   126774 ffff880001511fc0   11c4ae     0.015478] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[0 ffff880001411fc0   126774 ffff880001511fc0   11c4ae     0.015483] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[0 ffff880001411fc0   126830 ffff880001511fc0   11c4ae     0.016000] ++++++++++++++++++++=_---CPU UP  1
[0 ffff880001411fc0   126830 ffff880001511fc0   11c4ae     0.016000] Booting Node   0, Processors  #1 Ok.
[0 ffff880001411fc0   126830 ffff880001511fc0   11c4ae     0.016000] Setting warm reset code and vector.
[0 ffff880001411fc0   126834 ffff880001511fc0   11c4ae     0.016000] 1.
[0 ffff880001411fc0   126834 ffff880001511fc0   11c4ae     0.016000] 2.
[0 ffff880001411fc0   126834 ffff880001511fc0   11c4ae     0.016000] 3.
[0 ffff880001411fc0   126834 ffff880001511fc0   11c4ae     0.016000] Asserting INIT.
[0 ffff880001411fc0   126834 ffff880001511fc0   11c4ae     0.016000] Waiting for send to finish...
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.022250] Deasserting INIT.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.022259] Waiting for send to finish...
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.022265] #startup loops: 2.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.022268] Sending STARTUP #1.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.022275] After apic_write.
this printk gets the time from the CPU1 hv_clock (with old CPU0 data), which results in a value far in the future:
[1 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.012000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] Startup point 1.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] Waiting for send to finish...
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] Sending STARTUP #2.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] After apic_write.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] Startup point 1.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] Waiting for send to finish...
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] After Startup.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] Before Callout 1.
[0 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.024000] After Callout 1.
same as previous comment:
[1 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.012000] CALLIN, before setup_local_APIC().
same as previous comment:
[1 ffff880001411fc0   126836 ffff880001511fc0   11c4ae     0.012000] Stack at about ffff88001f89ff44
pass the address of the CPU1 hv_clock struct to the host; the host starts writing to it and the data in both structs (CPU0 and CPU1) is valid now:
[1 ffff880001411fc0   126836 ffff880001511fc0   11ddcc     0.012000] kvm-clock: cpu 1, msr 0:1511fc1, secondary cpu clock
[0 ffff880001411fc0   126836 ffff880001511fc0   11ddcc     0.025001] CPU1: has booted.
[0 ffff880001411fc0   12683a ffff880001511fc0   11ddcc     0.025001] Brought up 2 CPUs
[0 ffff880001411fc0   12683a ffff880001511fc0   11ddcc     0.025001] Boot done.
[0 ffff880001411fc0   12683a ffff880001511fc0   11ddcc     0.025001] Before bogomips.
[0 ffff880001411fc0   12683a ffff880001511fc0   11ddcc     0.025001] Total of 2 processors activated (11199.00 BogoMIPS).
[0 ffff880001411fc0   12683a ffff880001511fc0   11ddcc     0.025001] Before bogocount - setting activated=1.
[1 ffff880001411fc0   12683a ffff880001511fc0   11ddce     0.025001] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
...



end of thread, other threads:[~2010-10-20 20:47 UTC | newest]

Thread overview: 81+ messages
2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
2010-07-25  8:44 ` Avi Kivity
2010-07-26  8:47   ` Andre Przywara
2010-07-26 18:59     ` Arjan Koers
2010-07-27 21:00       ` Arjan Koers
2010-07-28 10:37         ` Avi Kivity
2010-07-31  0:34           ` Arjan Koers
2010-07-31  1:38             ` Zachary Amsden
2010-07-31 11:50               ` Arjan Koers
2010-07-31  2:39             ` Zachary Amsden
2010-07-31 11:53               ` Arjan Koers
2010-07-31 16:36                 ` Arjan Koers
2010-07-31 19:45                   ` Arjan Koers
2010-07-31 23:55                   ` Zachary Amsden
2010-08-02 14:43                     ` Glauber Costa
2010-08-02 16:16                       ` Arjan Koers
2010-08-02 18:07                         ` Glauber Costa
2010-08-02 20:26                       ` Zachary Amsden
2010-08-02 21:10                         ` Glauber Costa
2010-08-02 21:35                         ` Arjan Koers
2010-08-03  0:00                           ` Zachary Amsden
2010-09-28 11:16                           ` Michael Tokarev
2010-09-29  8:12                             ` Michael Tokarev
2010-09-29  8:28                           ` Avi Kivity
2010-09-29  9:17                             ` Michael Tokarev
2010-09-29  9:19                               ` Michael Tokarev
2010-09-29 19:26                                 ` Arjan Koers
2010-09-30  7:55                                   ` Michael Tokarev
2010-09-30  9:59                                     ` Michael Tokarev
2010-09-30 13:54                                       ` Zachary Amsden
2010-09-30 15:12                                         ` Michael Tokarev
2010-09-30 15:32                                           ` Zachary Amsden
2010-09-30 18:49                                             ` Arjan Koers
2010-09-30 19:05                                               ` Marcelo Tosatti
2010-09-30 20:16                                                 ` Arjan Koers
2010-09-30 23:02                                                 ` Michael Tokarev
2010-09-30 23:07                                                   ` Michael Tokarev
2010-10-01  1:13                                                     ` Zachary Amsden
2010-10-02  5:35                                                     ` Zachary Amsden
2010-10-02  7:35                                                       ` Michael Tokarev
2010-10-02  7:40                                                         ` Michael Tokarev
2010-10-02  7:50                                                           ` Michael Tokarev
2010-10-02 16:10                                                         ` Arjan Koers
2010-10-02 20:26                                                           ` Michael Tokarev
2010-10-02 23:42                                                           ` Zachary Amsden
2010-10-03  8:27                                                             ` Michael Tokarev
2010-10-08  0:12                                                             ` Arjan Koers
2010-10-08  2:47                                                               ` Zachary Amsden
2010-10-08 22:06                                                                 ` Marcelo Tosatti
2010-10-09  1:10                                                                   ` Arjan Koers
2010-10-09  2:27                                                                     ` Zachary Amsden
2010-10-09  6:29                                                                       ` Michael Tokarev
2010-10-09  8:59                                                                         ` Arjan Koers
2010-10-11 20:47                                                                           ` Zachary Amsden
2010-10-13 12:18                                                                             ` Glauber Costa
2010-10-10  1:20                                                                       ` Arjan Koers
2010-10-11 17:53                                                                       ` Anthony Liguori
2010-10-11 18:36                                                                         ` Marcelo Tosatti
2010-10-09  2:29                                                                     ` Zachary Amsden
2010-10-10  1:26                                                                     ` Arjan Koers
2010-10-20 20:47                                                                     ` Arjan Koers
2010-10-09  7:59                                                                   ` Michael Tokarev
2010-10-09  8:31                                                                     ` Michael Tokarev
2010-10-02 21:55                                                         ` Zachary Amsden
2010-10-03  8:16                                                           ` Michael Tokarev
2010-10-03  8:22                                                             ` Avi Kivity
2010-10-03  8:30                                                             ` Michael Tokarev
2010-07-27 10:03     ` Avi Kivity
2010-07-27 11:49       ` Andre Przywara
2010-07-27 12:06         ` Avi Kivity
2010-07-27 12:21           ` Andre Przywara
2010-07-27 12:34             ` Avi Kivity
2010-07-27 13:48               ` Andre Przywara
2010-07-27 13:58                 ` Avi Kivity
2010-07-27 14:55                   ` Andre Przywara
2010-07-27 21:51                     ` Andre Przywara
2010-07-28  3:00                       ` Zachary Amsden
2010-07-28  7:55                         ` Andre Przywara
2010-07-28 12:25                       ` Andre Przywara
2010-07-30 22:54                         ` Zachary Amsden
2010-08-02 10:12                           ` Andre Przywara
