* 2.6.35-rc1 regression with pvclock and smp guests
@ 2010-07-22 12:53 Andre Przywara
2010-07-25 8:44 ` Avi Kivity
0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-22 12:53 UTC (permalink / raw)
To: glommer; +Cc: Zachary Amsden, KVM list
Hi,
I found a regression with pvclock and SMP KVM _guests_.
PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1, but
with smp=2 halt at:
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
(last line shown)
I bisected this down to:
commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
Author: Glauber Costa <glommer@redhat.com>
Date: Tue May 11 12:17:40 2010 -0400
x86, paravirt: Add a global synchronization point for pvclock
One commit before works, smp=1 always works, disabling PVCLOCK works.
Using qemu-kvm-0.12.4 works, too.
Having PVCLOCK enabled and with smp=2 the kernel halts without any
further message.
This is still the case with the latest tip.
Even pinning both VCPU threads to the same host core shows the bug.
The bug triggers on all hosts I tested: a single-socket quad-core
Athlon, a dual-socket dual-core K8-Opteron and a quad-socket 12-core Opteron.
Please note that this is the guest kernel, the host kernel does not matter.
I have no idea (and don't feel like ;-) debugging this, so I hope
someone will find and fix the bug.
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
@ 2010-07-25 8:44 ` Avi Kivity
2010-07-26 8:47 ` Andre Przywara
0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-25 8:44 UTC (permalink / raw)
To: Andre Przywara; +Cc: glommer, Zachary Amsden, KVM list
On 07/22/2010 03:53 PM, Andre Przywara wrote:
> Hi,
>
> I found a regression with pvclock and SMP KVM _guests_.
> PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1,
> but with smp=2 halt at:
>
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> (last line shown)
>
> I bisected this down to:
> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
> Author: Glauber Costa <glommer@redhat.com>
> Date: Tue May 11 12:17:40 2010 -0400
>
> x86, paravirt: Add a global synchronization point for pvclock
>
> One commit before works, smp=1 always works, disabling PVCLOCK works.
> Using qemu-kvm-0.12.4 works, too.
> Having PVCLOCK enabled and with smp=2 the kernel halts without any
> further message.
> This is still the case with the latest tip.
> Even pinning both VCPU threads to the same host core shows the bug.
> The bug triggers on all hosts I tested: a single-socket quad-core
> Athlon, a dual-socket dual-core K8-Opteron and a quad-socket 12-core
> Opteron.
>
> Please note that this is the guest kernel, the host kernel does not
> matter.
>
> I have no idea (and don't feel like ;-) debugging this, so I hope
> someone will find and fix the bug.
Does this go away with CONFIG_DEBUG_RODATA=n? If so, it's a known bug
in the atomic_*() clobber lists.
--
error compiling committee.c: too many arguments to function
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-25 8:44 ` Avi Kivity
@ 2010-07-26 8:47 ` Andre Przywara
2010-07-26 18:59 ` Arjan Koers
2010-07-27 10:03 ` Avi Kivity
0 siblings, 2 replies; 81+ messages in thread
From: Andre Przywara @ 2010-07-26 8:47 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Avi Kivity wrote:
> On 07/22/2010 03:53 PM, Andre Przywara wrote:
>> Hi,
>>
>> I found a regression with pvclock and SMP KVM _guests_.
>> PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1,
>> but with smp=2 halt at:
>>
>> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> (last line shown)
>>
>> I bisected this down to:
>> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>> Author: Glauber Costa <glommer@redhat.com>
>> Date: Tue May 11 12:17:40 2010 -0400
>>
>> x86, paravirt: Add a global synchronization point for pvclock
>>
>> One commit before works, smp=1 always works, disabling PVCLOCK works.
>> Using qemu-kvm-0.12.4 works, too.
>> Having PVCLOCK enabled and with smp=2 the kernel halts without any
>> further message.
>> This is still the case with the latest tip.
>> Even pinning both VCPU threads to the same host core shows the bug.
>> The bug triggers on all hosts I tested: a single-socket quad-core
>> Athlon, a dual-socket dual-core K8-Opteron and a quad-socket 12-core
>> Opteron.
>>
>> Please note that this is the guest kernel, the host kernel does not
>> matter.
>>
>> I have no idea (and don't feel like ;-) debugging this, so I hope
>> someone will find and fix the bug.
>
>
> Does this go away with CONFIG_DEBUG_RODATA=n? If so, it's a known bug
> in the atomic_*() clobber lists.
>
Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
The debug options I had enabled now are:
CONFIG_DEBUG_DEVRES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_BOOT_PARAMS=y
I even disabled all kernel debug options; that does not help either.
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-26 8:47 ` Andre Przywara
@ 2010-07-26 18:59 ` Arjan Koers
2010-07-27 21:00 ` Arjan Koers
2010-07-27 10:03 ` Avi Kivity
1 sibling, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-07-26 18:59 UTC (permalink / raw)
To: kvm
Andre Przywara wrote:
> Avi Kivity wrote:
>> On 07/22/2010 03:53 PM, Andre Przywara wrote:
>>> Hi,
>>>
>>> I found a regression with pvclock and SMP KVM _guests_.
>>> PVCLOCK enabled guest kernels boot with qemu-kvm.git and with smp=1,
>>> but with smp=2 halt at:
>>>
>>> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>> (last line shown)
>>>
>>> I bisected this down to:
>>> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>>> Author: Glauber Costa <glommer@redhat.com>
>>> Date: Tue May 11 12:17:40 2010 -0400
>>>
>>> x86, paravirt: Add a global synchronization point for pvclock
>>>
>>> One commit before works, smp=1 always works, disabling PVCLOCK works.
>>> Using qemu-kvm-0.12.4 works, too.
>>> Having PVCLOCK enabled and with smp=2 the kernel halts without any
>>> further message.
>>> This is still the case with the latest tip.
>>> Even pinning both VCPU threads to the same host core shows the bug.
>>> The bug triggers on all hosts I tested: a single-socket quad-core
>>> Athlon, a dual-socket dual-core K8-Opteron and a quad-socket 12-core
>>> Opteron.
>>>
>>> Please note that this is the guest kernel, the host kernel does not
>>> matter.
>>>
>>> I have no idea (and don't feel like ;-) debugging this, so I hope
>>> someone will find and fix the bug.
>>
>>
>> Does this go away with CONFIG_DEBUG_RODATA=n? If so, it's a known bug
>> in the atomic_*() clobber lists.
>>
> Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
> The debug options I had enabled now are:
> CONFIG_DEBUG_DEVRES=y
> CONFIG_DEBUG_FS=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_MEMORY_INIT=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_DEBUG_BOOT_PARAMS=y
>
> I even disabled all kernel debug options; that does not help either.
I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
kernels hang during boot.
The boot log of 2.6.34.1 with the patch reverted is at the bottom of
this message (59aab522154a2f17b25335b63c1cf68a51fb6ae0 for 2.6.34.1).
With the patch still in place, the kernel appears to hang (stuck in a
while loop?) between these two messages:
[ 0.684803] vdb: vdb1
[ 1.013120] Clocksource tsc unstable (delta = 1037182237254 ns)
Note that each boot shows a message about the tsc being unstable:
[ 1.013120] Clocksource tsc unstable (delta = 1037182237254 ns)
[ 1.013122] Clocksource tsc unstable (delta = 1149054858088 ns)
[ 1.009117] Clocksource tsc unstable (delta = 1265448436431 ns)
My host is running kernel 2.6.34.1 with the latest git version of
qemu-kvm (b81fe95).
Boot log of SMP guest with patch reverted:
[ 0.000000] Linux version 2.6.34.1-201007261412-guestmp (arjan@dev-lenny) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Mon Jul 26 14:16:18 UTC 2010
[ 0.000000] Command line: root=/dev/vda1 ro single
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009a800 (usable)
[ 0.000000] BIOS-e820: 000000000009a800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[ 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.4 present.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: write-back
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 00E0000000 mask FFE0000000 uncachable
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[ 0.000000] 0000000000 - 001fe00000 page 2M
[ 0.000000] 001fe00000 - 001fffd000 page 4k
[ 0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[ 0.000000] RAMDISK: 1fdfc000 - 1ffed000
[ 0.000000] ACPI: RSDP 00000000000fdb50 00014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[ 0.000000] ACPI: FACS 000000001ffffe00 00040
[ 0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] kvm-clock: cpu 0, msr 0:1331ac1, boot clock
[ 0.000000] [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001c00000-ffff8800023fffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000001 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal empty
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000001 -> 0x0000009a
[ 0.000000] 0: 0x00000100 -> 0x0001fffd
[ 0.000000] On node 0 totalpages: 130966
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 3937 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 1736 pages used for memmap
[ 0.000000] DMA32 zone: 125237 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 24
[ 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] setup_percpu: NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] early_res array is doubled to 64 at [3000 - 37ff]
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s74984 r8192 d23320 u1048576
[ 0.000000] pcpu-alloc: s74984 r8192 d23320 u1048576 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1
[ 0.000000] kvm-clock: cpu 0, msr 0:1411ac1, primary cpu clock
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129174
[ 0.000000] Kernel command line: root=/dev/vda1 ro single
[ 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Subtract (38 early reservations)
[ 0.000000] #1 [0001000000 - 00013e1e38] TEXT DATA BSS
[ 0.000000] #2 [001fdfc000 - 001ffed000] RAMDISK
[ 0.000000] #3 [000009a800 - 0000100000] BIOS reserved
[ 0.000000] #4 [00013e2000 - 00013e2071] BRK
[ 0.000000] #5 [0000001000 - 0000003000] TRAMPOLINE
[ 0.000000] #6 [0000008000 - 0000009000] PGTABLE
[ 0.000000] #7 [00013e2080 - 00013e3080] BOOTMEM
[ 0.000000] #8 [00013e1e40 - 00013e1ea0] BOOTMEM
[ 0.000000] #9 [0001be4000 - 0001be5000] BOOTMEM
[ 0.000000] #10 [0001be5000 - 0001be6000] BOOTMEM
[ 0.000000] #11 [0001c00000 - 0002400000] MEMMAP 0
[ 0.000000] #12 [00013e3080 - 00013e3200] BOOTMEM
[ 0.000000] #13 [00013e3200 - 00013e6200] BOOTMEM
[ 0.000000] #14 [00013e7000 - 00013e8000] BOOTMEM
[ 0.000000] #15 [00013e1ec0 - 00013e1f01] BOOTMEM
[ 0.000000] #16 [00013e1f40 - 00013e1f83] BOOTMEM
[ 0.000000] #17 [00013e6200 - 00013e6388] BOOTMEM
[ 0.000000] #18 [00013e63c0 - 00013e6428] BOOTMEM
[ 0.000000] #19 [00013e6440 - 00013e64a8] BOOTMEM
[ 0.000000] #20 [00013e64c0 - 00013e6528] BOOTMEM
[ 0.000000] #21 [00013e6540 - 00013e65a8] BOOTMEM
[ 0.000000] #22 [00013e65c0 - 00013e6628] BOOTMEM
[ 0.000000] #23 [00013e6640 - 00013e66a8] BOOTMEM
[ 0.000000] #24 [00013e1fc0 - 00013e1fd9] BOOTMEM
[ 0.000000] #25 [00013e66c0 - 00013e66d9] BOOTMEM
[ 0.000000] #26 [0001400000 - 000141a000] BOOTMEM
[ 0.000000] #27 [0001500000 - 000151a000] BOOTMEM
[ 0.000000] #28 [00013e6700 - 00013e6708] BOOTMEM
[ 0.000000] #29 [00013e6740 - 00013e6748] BOOTMEM
[ 0.000000] #30 [00013e6780 - 00013e6788] BOOTMEM
[ 0.000000] #31 [00013e67c0 - 00013e67d0] BOOTMEM
[ 0.000000] #32 [00013e6800 - 00013e6940] BOOTMEM
[ 0.000000] #33 [00013e6940 - 00013e69a0] BOOTMEM
[ 0.000000] #34 [00013e69c0 - 00013e6a20] BOOTMEM
[ 0.000000] #35 [00013e8000 - 00013ec000] BOOTMEM
[ 0.000000] #36 [000141a000 - 000149a000] BOOTMEM
[ 0.000000] #37 [000149a000 - 00014da000] BOOTMEM
[ 0.000000] Memory: 508672k/524276k available (2096k kernel code, 412k absent, 15192k reserved, 1097k data, 456k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] NR_IRQS:448
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] hpet clockevent registered
[ 0.000000] Detected 2800.590 MHz processor.
[ 0.016000] Calibrating delay loop (skipped) preset value.. 5601.18 BogoMIPS (lpj=11202360)
[ 0.016000] Mount-cache hash table entries: 256
[ 0.016000] using C1E aware idle routine
[ 0.016000] Performance Events: AMD PMU driver.
[ 0.016000] ... version: 0
[ 0.016000] ... bit width: 48
[ 0.016000] ... generic registers: 4
[ 0.016000] ... value mask: 0000ffffffffffff
[ 0.016000] ... max period: 00007fffffffffff
[ 0.016007] ... fixed-purpose events: 0
[ 0.016351] ... event mask: 000000000000000f
[ 0.020938] Freeing SMP alternatives: 24k freed
[ 0.021288] ACPI: Core revision 20100121
[ 0.023988] Setting APIC routing to flat
[ 0.025968] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.026356] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.028000] Booting Node 0, Processors #1 Ok.
[ 0.016000] kvm-clock: cpu 1, msr 0:1511ac1, secondary cpu clock
[ 0.038013] Brought up 2 CPUs
[ 0.038015] Total of 2 processors activated (11202.36 BogoMIPS).
[ 0.038011] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[ 0.044216] NET: Registered protocol family 16
[ 0.044752] ACPI: bus type pci registered
[ 0.044752] PCI: Using configuration type 1 for base access
[ 0.044859] PCI: Using configuration type 1 for extended access
[ 0.048114] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.048413] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.048780] mtrr: probably your BIOS does not setup all CPUs.
[ 0.049140] mtrr: corrected configuration.
[ 0.064243] bio: create slab <bio-0> at 0
[ 0.068844] ACPI: EC: Look up EC in DSDT
[ 0.074085] ACPI: Interpreter enabled
[ 0.075141] ACPI: (supports S0 S5)
[ 0.076012] ACPI: Using IOAPIC for interrupt routing
[ 0.104232] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[ 0.104750] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.105148] pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
[ 0.105148] pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
[ 0.105148] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
[ 0.105148] pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
[ 0.105148] pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
[ 0.105148] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[ 0.105391] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
[ 0.111077] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[ 0.112233] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[ 0.127174] pci 0000:00:03.0: reg 10: [io 0xc020-0xc03f]
[ 0.127356] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
[ 0.128712] pci 0000:00:04.0: reg 10: [io 0xc040-0xc05f]
[ 0.129310] pci 0000:00:05.0: reg 10: [io 0xc080-0xc0bf]
[ 0.129895] pci 0000:00:06.0: reg 10: [io 0xc0c0-0xc0ff]
[ 0.130644] pci_bus 0000:00: on NUMA node 0
[ 0.130734] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.144909] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.148227] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.150351] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.152201] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.156176] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.158212] vgaarb: loaded
[ 0.160149] PCI: Using ACPI for IRQ routing
[ 0.161121] PCI: pci_cache_line_size set to 64 bytes
[ 0.161448] reserve RAM buffer: 000000000009a800 - 000000000009ffff
[ 0.161463] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[ 0.161665] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.164078] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.165801] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.172080] Switching to clocksource kvm-clock
[ 0.176620] pnp: PnP ACPI init
[ 0.177795] ACPI: bus type pnp registered
[ 0.185167] pnp: PnP ACPI: found 7 devices
[ 0.186418] ACPI: ACPI bus type pnp unregistered
[ 0.198749] pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
[ 0.198757] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
[ 0.199196] NET: Registered protocol family 2
[ 0.200986] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.203750] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.206538] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.208303] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.209610] TCP reno registered
[ 0.210602] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.211828] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.213683] NET: Registered protocol family 1
[ 0.214832] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.215341] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.215720] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.216114] pci 0000:00:02.0: Boot video device
[ 0.216140] PCI: CLS 0 bytes, default 64
[ 0.216203] Unpacking initramfs...
[ 0.250071] Freeing initrd memory: 1988k freed
[ 0.259814] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.261582] msgmni has been set to 997
[ 0.263053] alg: No test for stdrng (krng)
[ 0.281920] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 0.282524] io scheduler noop registered
[ 0.282842] io scheduler deadline registered
[ 0.283349] io scheduler cfq registered (default)
[ 0.324346] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 0.328420] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.329799] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 0.331768] mice: PS/2 mouse device common for all mice
[ 0.334952] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[ 0.336491] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 0.338002] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 0.338175] cpuidle: using governor ladder
[ 0.338181] cpuidle: using governor menu
[ 0.342454] TCP cubic registered
[ 0.343662] NET: Registered protocol family 17
[ 0.346570] rtc_cmos 00:01: setting system clock to 2010-07-26 14:20:16 UTC (1280154016)
[ 0.349240] Freeing unused kernel memory: 456k freed
[ 0.589068] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 0.589656] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[ 0.590307] virtio-pci 0000:00:03.0: setting latency timer to 64
[ 0.610657] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[ 0.611093] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[ 0.611743] virtio-pci 0000:00:04.0: setting latency timer to 64
[ 0.611888] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 0.612267] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[ 0.612906] virtio-pci 0000:00:05.0: setting latency timer to 64
[ 0.626190] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[ 0.626596] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[ 0.627256] virtio-pci 0000:00:06.0: setting latency timer to 64
[ 0.658242] vda: vda1 vda2 < vda5 >
[ 0.684803] vdb: vdb1
[ 1.013120] Clocksource tsc unstable (delta = 1037182237254 ns)
[ 1.074934] kjournald starting. Commit interval 5 seconds
[ 1.076360] EXT3-fs (vda1): mounted filesystem with writeback data mode
[ 2.654241] udevd version 125 started
[ 2.948450] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[ 2.949138] ACPI: Power Button [PWRF]
[ 2.970190] virtio-pci 0000:00:03.0: irq 24 for MSI/MSI-X
[ 2.970204] virtio-pci 0000:00:03.0: irq 25 for MSI/MSI-X
[ 2.970217] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
[ 4.599767] Adding 409620k swap on /dev/vda5. Priority:-1 extents:1 across:409620k
[ 5.171407] EXT3-fs (vda1): using internal journal
[ 5.711498] loop: module loaded
[ 11.244320] NET: Registered protocol family 10
[ 11.246995] lo: Disabled Privacy Extensions
[ 21.748123] eth0: no IPv6 routers present
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-26 8:47 ` Andre Przywara
2010-07-26 18:59 ` Arjan Koers
@ 2010-07-27 10:03 ` Avi Kivity
2010-07-27 11:49 ` Andre Przywara
1 sibling, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 10:03 UTC (permalink / raw)
To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
On 07/26/2010 11:47 AM, Andre Przywara wrote:
>> Does this go away with CONFIG_DEBUG_RODATA=n? If so, it's a known
>> bug in the atomic_*() clobber lists.
>>
>
> Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
> The debug options I had enabled now are:
> CONFIG_DEBUG_DEVRES=y
> CONFIG_DEBUG_FS=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_MEMORY_INIT=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_DEBUG_BOOT_PARAMS=y
>
> I even disabled all kernel debug options; that does not help either.
>
Does changing last_value in arch/x86/kernel/pvclock.c to be non-static help?
What is the guest executing when it hangs?
--
error compiling committee.c: too many arguments to function
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 10:03 ` Avi Kivity
@ 2010-07-27 11:49 ` Andre Przywara
2010-07-27 12:06 ` Avi Kivity
0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 11:49 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Avi Kivity wrote:
> On 07/26/2010 11:47 AM, Andre Przywara wrote:
>>> Does this go away with CONFIG_DEBUG_RODATA=n? If so, it's a known
>>> bug in the atomic_*() clobber lists.
>>>
>> Unfortunately the bug persists even with CONFIG_DEBUG_RODATA disabled.
>> The debug options I had enabled now are:
>> CONFIG_DEBUG_DEVRES=y
>> CONFIG_DEBUG_FS=y
>> CONFIG_DEBUG_KERNEL=y
>> CONFIG_DEBUG_BUGVERBOSE=y
>> CONFIG_DEBUG_MEMORY_INIT=y
>> CONFIG_DEBUG_STACKOVERFLOW=y
>> CONFIG_DEBUG_BOOT_PARAMS=y
>>
>> I even disabled all kernel debug options; that does not help either.
>>
>
> Does changing last_value in arch/x86/kernel/pvclock.c to be non-static help?
No, no change. It still hangs.
> What is the guest executing when it hangs?
Both VCPUs are halted, the monitor and System.map tell me it's in
native_safe_halt().
The code sequence confirms this, it is an intentional sti;hlt condition.
Using -smp 16 also shows that all 16 VCPUs are stuck.
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 11:49 ` Andre Przywara
@ 2010-07-27 12:06 ` Avi Kivity
2010-07-27 12:21 ` Andre Przywara
0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 12:06 UTC (permalink / raw)
To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
On 07/27/2010 02:49 PM, Andre Przywara wrote:
>
>> What is the guest executing when it hangs?
> Both VCPUs are halted, the monitor and System.map tell me it's in
> native_safe_halt().
> The code sequence confirms this, it is an intentional sti;hlt condition.
> Using -smp 16 also shows that all 16 VCPUs are stuck.
>
Well, strange. The intent of that patch was to make the clock never go
backwards. Perhaps the change made it go forwards by a large amount,
and the guest is not hung, just waiting for some timer that is far in
the future.
Can you do something like
-       if (ret < last)
+       if (ret < last) {
+               static u64 max_delta;
+               if (last - ret > max_delta) {
+                       max_delta = last - ret;
+                       printk("advancing kvmclock by: %llx\n", max_delta);
+               }
                return last;
+       }
to see if this is happening?
--
error compiling committee.c: too many arguments to function
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 12:06 ` Avi Kivity
@ 2010-07-27 12:21 ` Andre Przywara
2010-07-27 12:34 ` Avi Kivity
0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 12:21 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Avi Kivity wrote:
> On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>> What is the guest executing when it hangs?
>> Both VCPUs are halted, the monitor and System.map tell me it's in
>> native_safe_halt().
>> The code sequence confirms this, it is an intentional sti;hlt condition.
>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>
>
> Well, strange. The intent of that patch was to make the clock never go
> backwards. Perhaps the change made it go forwards by a large amount,
> and the guest is not hung, just waiting for some timer that is far in
> the future.
>
> Can you do something like
>
> -       if (ret < last)
> +       if (ret < last) {
> +               static u64 max_delta;
> +               if (last - ret > max_delta) {
> +                       max_delta = last - ret;
> +                       printk("advancing kvmclock by: %llx\n", max_delta);
> +               }
>                 return last;
> +       }
>
> to see if this is happening?
No change, it still hangs. I also don't see the printk.
The output with smp=1 is like this:
[ 1.186549] ACPI: Power Button [PWRF]
[ 1.189204] XENFS: not registering filesystem on non-xen platform
[ 1.195001] Non-volatile memory driver v1.3
[ 1.196358] Linux agpgart interface v0.103
[ 1.197687] [drm] Initialized drm 1.1.0 20060810
[ 1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
[ 1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
ÿ[ 1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 1.467153] brd: module loaded
[ 1.469245] loop: module loaded
With smp=2 the output stops just before the strange "y" character (I
guess it's ASCII 255), which I assume is an artifact of the serial console.
As you can see at the timestamps, it takes some time between the last
shown line (1.201213) and the first missing one (1.460714).
Thanks,
Andre.
--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 12:21 ` Andre Przywara
@ 2010-07-27 12:34 ` Avi Kivity
2010-07-27 13:48 ` Andre Przywara
0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 12:34 UTC (permalink / raw)
To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
On 07/27/2010 03:21 PM, Andre Przywara wrote:
> Avi Kivity wrote:
>> On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>>> What is the guest executing when it hangs?
>>> Both VCPUs are halted, the monitor and System.map tell me it's in
>>> native_safe_halt().
>>> The code sequence confirms this, it is an intentional sti;hlt
>>> condition.
>>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>>
>>
>> Well, strange. The intent of that patch was to make the clock never
>> go backwards. Perhaps the change made it go forwards by a large
>> amount, and the guest is not hung, just waiting for some timer that
>> is far in the future.
>>
>> Can you do something like
>>
>> -       if (ret < last)
>> +       if (ret < last) {
>> +               static u64 max_delta;
>> +               if (last - ret > max_delta) {
>> +                       max_delta = last - ret;
>> +                       printk("advancing kvmclock by: %llx\n", max_delta);
>> +               }
>>                 return last;
>> +       }
>>
>> to see if this is happening?
> No change, it still hangs. I also don't see the printk.
> The output with smp=1 is like this:
> [ 1.186549] ACPI: Power Button [PWRF]
> [ 1.189204] XENFS: not registering filesystem on non-xen platform
> [ 1.195001] Non-volatile memory driver v1.3
> [ 1.196358] Linux agpgart interface v0.103
> [ 1.197687] [drm] Initialized drm 1.1.0 20060810
> [ 1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
> [ 1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> ÿ[ 1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [ 1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [ 1.467153] brd: module loaded
> [ 1.469245] loop: module loaded
> With smp=2 the output stops just before the strange "y" character (I
> guess it's ASCII 255), which I assume is an artifact of the serial
> console.
> As you can see at the timestamps, it takes some time between the last
> shown line (1.201213) and the first missing one (1.460714).
Weird. Maybe the clock goes crazy.
Let's see if it jumps forward a lot:
} while (unlikely(last != ret));
+
+ {
+ static u64 last_report;
+ if (ret > last_report + 10000) {
+ last_report = ret;
+ printk("kvmclock: %llx\n", ret);
+ }
+
+ }
return ret;
}
Worth updating the 'return last' to update ret and goto the new code, so
we don't miss that path.
--
error compiling committee.c: too many arguments to function
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 12:34 ` Avi Kivity
@ 2010-07-27 13:48 ` Andre Przywara
2010-07-27 13:58 ` Avi Kivity
0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 13:48 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Avi Kivity wrote:
> On 07/27/2010 03:21 PM, Andre Przywara wrote:
>> Avi Kivity wrote:
>>> On 07/27/2010 02:49 PM, Andre Przywara wrote:
>>>>> What is the guest executing when it hangs?
>>>> Both VCPUs are halted, the monitor and System.map tell me it's in
>>>> native_safe_halt().
>>>> The code sequence confirms this, it is an intentional sti;hlt
>>>> condition.
>>>> Using -smp 16 also shows that all 16 VCPUs are stuck.
>>>>
>>> Well, strange. The intent of that patch was to make the clock never
>>> go backwards. Perhaps the change made it go forwards by a large
>>> amount, and the guest is not hung, just waiting for some timer that
>>> is far in the future.
>>>
>>> Can you do something like
>>>
>>> - if (ret < last)
>>> + if (ret < last) {
>>> + static u64 max_delta;
>>> + if (last - ret > max_delta) {
>>> + max_delta = last - ret;
>>> + printk("advancing kvmclock by: %llx\n", max_delta);
>>> + }
>>> return last;
>>> + }
>>>
>>> to see if this is happening?
>> No change, it still hangs. I also don't see the printk.
>> The output with smp=1 is like this:
>> [ 1.186549] ACPI: Power Button [PWRF]
>> [ 1.189204] XENFS: not registering filesystem on non-xen platform
>> [ 1.195001] Non-volatile memory driver v1.3
>> [ 1.196358] Linux agpgart interface v0.103
>> [ 1.197687] [drm] Initialized drm 1.1.0 20060810
>> [ 1.198926] [drm:i915_init] *ERROR* drm/i915 can't work without
>> intel_agp module!
>> [ 1.201213] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> ÿ[ 1.460714] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> [ 1.463243] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> [ 1.467153] brd: module loaded
>> [ 1.469245] loop: module loaded
>> With smp=2 the output stops just before the strange "y" character (I
>> guess it's ASCII 255), which I assume is an artifact of the serial
>> console.
>> As you can see at the timestamps, it takes some time between the last
>> shown line (1.201213) and the first missing one (1.460714).
>
> Weird. Maybe the clock goes crazy.
>
> Let's see if it jumps forward a lot:
>
> } while (unlikely(last != ret));
> +
> + {
> + static u64 last_report;
> + if (ret > last_report + 10000) {
> + last_report = ret;
> + printk("kvmclock: %llx\n", ret);
> + }
> +
> + }
>
> return ret;
> }
>
> Worth updating the 'return last' to update ret and goto the new code, so
> we don't miss that path.
Did that. There is _a lot_ of output (about 350 lines per second via the
115k serial console), both with smp=1 and smp=2.
Most values differ by about 2,000,000 (ticks?), but a handful of
them are in the range of 20 million. No difference between smp=2 and smp=1.
I also get some "BUG: recent printk recursion!" and I don't see any
kernel boot progress beyond outputting the BogoMIPS value.
BTW: I found two messages from your earlier debug statement:
[ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
[ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
Regards,
Andre.
--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 13:48 ` Andre Przywara
@ 2010-07-27 13:58 ` Avi Kivity
2010-07-27 14:55 ` Andre Przywara
0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-27 13:58 UTC (permalink / raw)
To: Andre Przywara; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
On 07/27/2010 04:48 PM, Andre Przywara wrote:
>> Weird. Maybe the clock goes crazy.
>>
>> Let's see if it jumps forward a lot:
>>
>> } while (unlikely(last != ret));
>> +
>> + {
>> + static u64 last_report;
>> + if (ret > last_report + 10000) {
>> + last_report = ret;
>> + printk("kvmclock: %llx\n", ret);
>> + }
>> +
>> + }
>>
>> return ret;
>> }
>>
>> Worth updating the 'return last' to update ret and goto the new code,
>> so we don't miss that path.
>
> Did that. There is _a lot_ of output (about 350 lines per second via
> the 115k serial console), both with smp=1 and smp=2.
> The majority is differing about 2,000,000 (ticks?), but a handful of
> them are in the range of 20 million.
nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
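As a rough cross-check of that arithmetic, here is a minimal sketch in plain C. The helper names are made up for illustration, and the 10 bits per character figure assumes 8N1 framing on the serial line, which the thread does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Average gap between report lines, in nanoseconds, for a given line rate. */
static uint64_t ns_per_line(uint64_t lines_per_sec)
{
    assert(lines_per_sec > 0);
    return 1000000000ULL / lines_per_sec;
}

/*
 * Characters per line that a serial link can carry at that line rate.
 * Assumes 10 bits on the wire per character (8N1 framing, an assumption).
 */
static uint64_t chars_per_line(uint64_t baud, uint64_t lines_per_sec)
{
    assert(lines_per_sec > 0);
    return (baud / 10) / lines_per_sec;
}
```

With 350 lines/sec, ns_per_line() gives a gap of a bit under 3 ms, in line with the roughly 2,000,000 ns steps reported. chars_per_line(115200, 350) comes out at about 32 characters, in the same ballpark as one timestamped "kvmclock: ..." line, so the serial console is plausibly what limits the report rate.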
> No difference between smp=2 and smp=1.
> I also get some "BUG: recent printk recursion!" and I don't see any
> kernel boot progress beyond outputting the BogoMIPS value.
Right, printk() wants the time too.
> BTW: I found two messages from your earlier debug statement:
> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
Those are from kvmclock initialization, not from the older patch.
I'm completely confused, everything seems to be in order.
Let's see: if you s/return last/return ret/ in the original, does this
help things along? This makes pvclock drop the computation and should
be exactly the same as before the patch.
--
error compiling committee.c: too many arguments to function
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 13:58 ` Avi Kivity
@ 2010-07-27 14:55 ` Andre Przywara
2010-07-27 21:51 ` Andre Przywara
0 siblings, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 14:55 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Avi Kivity wrote:
> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>> Weird. Maybe the clock goes crazy.
>>>
>>> Let's see if it jumps forward a lot:
>>>
>>> } while (unlikely(last != ret));
>>> +
>>> + {
>>> + static u64 last_report;
>>> + if (ret > last_report + 10000) {
>>> + last_report = ret;
>>> + printk("kvmclock: %llx\n", ret);
>>> + }
>>> +
>>> + }
>>>
>>> return ret;
>>> }
>>>
>>> Worth updating the 'return last' to update ret and goto the new code,
>>> so we don't miss that path.
>> Did that. There is _a lot_ of output (about 350 lines per second via
>> the 115k serial console), both with smp=1 and smp=2.
>> The majority is differing about 2,000,000 (ticks?), but a handful of
>> them are in the range of 20 million.
>
> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>
>> No difference between smp=2 and smp=1.
>> I also get some "BUG: recent printk recursion!" and I don't see any
>> kernel boot progress beyond outputting the BogoMIPS value.
>
> Right, printk() wants the time too.
>
>> BTW: I found two messages from your earlier debug statement:
>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>
> Those are from kvmclock initialization, not from the older patch.
>
> I'm completely confused, everything seems to be in order.
>
> Let's see. if you s/return last/return ret/ in the original, does this
> help things along? this makes pvclock drop the computation and should
> be exactly the same as before the patch.
Yes, this works, both smp versions boot. I see a very short break
after the line in question, but then it proceeds well.
Thanks for your help, now I got a much better insight into the issue. I
will see if I can find something more.
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-26 18:59 ` Arjan Koers
@ 2010-07-27 21:00 ` Arjan Koers
2010-07-28 10:37 ` Avi Kivity
0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-07-27 21:00 UTC (permalink / raw)
To: kvm
On 2010-07-26 20:59, Arjan Koers wrote:
> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
> kernels hang during boot.
It appears that last is way ahead of ret twice.
The kernel boots with this debug patch that makes the clock go
backwards if the difference is big:
last = atomic64_read(&last_value);
do {
- if (ret < last)
- return last;
+ if (ret < last) {
+ if ( last - ret < 25000000 )
+ return last;
+ else
+ printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
+ }
last = atomic64_cmpxchg(&last_value, last, ret);
} while (unlikely(last != ret));
Here's the boot log:
[ 0.000000] Linux version 2.6.35-rc6-201007272047-guestmp+ (arjan@dev-lenny) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Tue Jul 27 20:52:36 UTC 2010
[ 0.000000] Command line: root=/dev/vda1 ro single
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[ 0.000000] BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[ 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.4 present.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: write-back
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 00E0000000 mask FFE0000000 uncachable
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[ 0.000000] 0000000000 - 001fe00000 page 2M
[ 0.000000] 001fe00000 - 001fffd000 page 4k
[ 0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[ 0.000000] RAMDISK: 1fdfc000 - 1ffed000
[ 0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[ 0.000000] ACPI: FACS 000000001ffffe00 00040
[ 0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] kvm-clock: Using msrs 12 and 11
[ 0.000000] kvm-clock: cpu 0, msr 0:1344c01, boot clock
[ 0.000000] [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001c00000-ffff8800023fffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000001 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal empty
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000001 -> 0x0000009b
[ 0.000000] 0: 0x00000100 -> 0x0001fffd
[ 0.000000] On node 0 totalpages: 130967
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 3938 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 1736 pages used for memmap
[ 0.000000] DMA32 zone: 125237 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] setup_percpu: NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] early_res array is doubled to 64 at [3000 - 37ff]
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s75712 r8192 d22592 u1048576
[ 0.000000] pcpu-alloc: s75712 r8192 d22592 u1048576 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1
[ 0.000000] kvm-clock: cpu 0, msr 0:1411c01, primary cpu clock
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129175
[ 0.000000] Kernel command line: root=/dev/vda1 ro single
[ 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Subtract (38 early reservations)
[ 0.000000] #1 [0001000000 - 00013f4378] TEXT DATA BSS
[ 0.000000] #2 [001fdfc000 - 001ffed000] RAMDISK
[ 0.000000] #3 [000009bc00 - 0000100000] BIOS reserved
[ 0.000000] #4 [00013f5000 - 00013f5071] BRK
[ 0.000000] #5 [0000001000 - 0000003000] TRAMPOLINE
[ 0.000000] #6 [0000008000 - 0000009000] PGTABLE
[ 0.000000] #7 [00013f5080 - 00013f6080] BOOTMEM
[ 0.000000] #8 [00013f4380 - 00013f43e0] BOOTMEM
[ 0.000000] #9 [0001bf7000 - 0001bf8000] BOOTMEM
[ 0.000000] #10 [0001bf8000 - 0001bf9000] BOOTMEM
[ 0.000000] #11 [0001c00000 - 0002400000] MEMMAP 0
[ 0.000000] #12 [00013f4400 - 00013f4580] BOOTMEM
[ 0.000000] #13 [00013f6080 - 00013f9080] BOOTMEM
[ 0.000000] #14 [00013fa000 - 00013fb000] BOOTMEM
[ 0.000000] #15 [00013f4580 - 00013f45c1] BOOTMEM
[ 0.000000] #16 [00013f4600 - 00013f4643] BOOTMEM
[ 0.000000] #17 [00013f4680 - 00013f4808] BOOTMEM
[ 0.000000] #18 [00013f4840 - 00013f48a8] BOOTMEM
[ 0.000000] #19 [00013f48c0 - 00013f4928] BOOTMEM
[ 0.000000] #20 [00013f4940 - 00013f49a8] BOOTMEM
[ 0.000000] #21 [00013f49c0 - 00013f4a28] BOOTMEM
[ 0.000000] #22 [00013f4a40 - 00013f4aa8] BOOTMEM
[ 0.000000] #23 [00013f4ac0 - 00013f4b28] BOOTMEM
[ 0.000000] #24 [00013f4b40 - 00013f4b59] BOOTMEM
[ 0.000000] #25 [00013f4b80 - 00013f4b99] BOOTMEM
[ 0.000000] #26 [0001400000 - 000141a000] BOOTMEM
[ 0.000000] #27 [0001500000 - 000151a000] BOOTMEM
[ 0.000000] #28 [00013f4bc0 - 00013f4bc8] BOOTMEM
[ 0.000000] #29 [00013f4c00 - 00013f4c08] BOOTMEM
[ 0.000000] #30 [00013f4c40 - 00013f4c48] BOOTMEM
[ 0.000000] #31 [00013f4c80 - 00013f4c90] BOOTMEM
[ 0.000000] #32 [00013f4cc0 - 00013f4e00] BOOTMEM
[ 0.000000] #33 [00013f4e00 - 00013f4e60] BOOTMEM
[ 0.000000] #34 [00013f4e80 - 00013f4ee0] BOOTMEM
[ 0.000000] #35 [00013fb000 - 00013ff000] BOOTMEM
[ 0.000000] #36 [000141a000 - 000149a000] BOOTMEM
[ 0.000000] #37 [000149a000 - 00014da000] BOOTMEM
[ 0.000000] Memory: 508600k/524276k available (2135k kernel code, 408k absent, 15268k reserved, 1134k data, 464k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU-based detection of stalled CPUs is disabled.
[ 0.000000] Verbose stalled-CPUs detection is disabled.
[ 0.000000] NR_IRQS:448
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] hpet clockevent registered
[ 0.000000] Detected 2799.520 MHz processor.
[ 0.016000] Calibrating delay loop (skipped) preset value.. 5599.04 BogoMIPS (lpj=11198080)
[ 0.016000] pid_max: default: 32768 minimum: 301
[ 0.016000] Mount-cache hash table entries: 256
[ 0.016000] using C1E aware idle routine
[ 0.016000] Performance Events: AMD PMU driver.
[ 0.016000] ... version: 0
[ 0.016000] ... bit width: 48
[ 0.016000] ... generic registers: 4
[ 0.016004] ... value mask: 0000ffffffffffff
[ 0.016388] ... max period: 00007fffffffffff
[ 0.016767] ... fixed-purpose events: 0
[ 0.017109] ... event mask: 000000000000000f
[ 0.021772] Freeing SMP alternatives: 12k freed
[ 0.022135] ACPI: Core revision 20100428
[ 0.024224] Setting APIC routing to flat
[ 0.026212] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.026608] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.028000] Booting Node 0, Processors #1 Ok.
[ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
[ 0.037105] pvclock backwards: ret = 108372ffd10b; last = 210aff03671a
[ 0.037119] BUG: recent printk recursion!
[ 0.037120] <6>Brought up 2 CPUs
[ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
[ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[ 0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
[ 0.044219] BUG: recent printk recursion!
[ 0.044220] <6>NET: Registered protocol family 16
[ 0.048108] ACPI: bus type pci registered
[ 0.048447] PCI: Using configuration type 1 for base access
[ 0.048855] PCI: Using configuration type 1 for extended access
[ 0.049280] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.049280] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.049280] mtrr: probably your BIOS does not setup all CPUs.
[ 0.052005] mtrr: corrected configuration.
[ 0.060192] bio: create slab <bio-0> at 0
[ 0.060806] ACPI: EC: Look up EC in DSDT
[ 0.065677] ACPI: Interpreter enabled
[ 0.066004] ACPI: (supports S0 S5)
[ 0.066406] ACPI: Using IOAPIC for interrupt routing
[ 0.084131] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[ 0.086541] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.088068] pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
[ 0.088072] pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
[ 0.088075] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
[ 0.088078] pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
[ 0.088713] pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
[ 0.089004] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[ 0.092010] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
[ 0.097988] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[ 0.098912] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[ 0.104911] pci 0000:00:03.0: reg 10: [io 0xc020-0xc03f]
[ 0.104980] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
[ 0.105330] pci 0000:00:04.0: reg 10: [io 0xc040-0xc05f]
[ 0.105636] pci 0000:00:05.0: reg 10: [io 0xc080-0xc0bf]
[ 0.105940] pci 0000:00:06.0: reg 10: [io 0xc0c0-0xc0ff]
[ 0.106325] pci_bus 0000:00: on NUMA node 0
[ 0.106382] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.116539] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.117359] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.118458] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.120675] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.121798] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.124010] vgaarb: loaded
[ 0.124570] PCI: Using ACPI for IRQ routing
[ 0.124605] PCI: pci_cache_line_size set to 64 bytes
[ 0.124781] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
[ 0.124789] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[ 0.124913] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.128044] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.129060] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.140184] Switching to clocksource kvm-clock
[ 0.140791] pnp: PnP ACPI init
[ 0.141564] ACPI: bus type pnp registered
[ 0.148623] pnp: PnP ACPI: found 7 devices
[ 0.149737] ACPI: ACPI bus type pnp unregistered
[ 0.161792] pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
[ 0.161801] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
[ 0.162325] NET: Registered protocol family 2
[ 0.163891] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.166098] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.169226] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.170987] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.172335] TCP reno registered
[ 0.173378] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.174607] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.176343] NET: Registered protocol family 1
[ 0.197118] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.197502] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.197960] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.198360] pci 0000:00:02.0: Boot video device
[ 0.198385] PCI: CLS 0 bytes, default 64
[ 0.198451] Unpacking initramfs...
[ 0.231639] Freeing initrd memory: 1988k freed
[ 0.241648] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.243184] msgmni has been set to 997
[ 0.244500] alg: No test for stdrng (krng)
[ 0.245449] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 0.246246] io scheduler noop registered
[ 0.246664] io scheduler deadline registered
[ 0.247248] io scheduler cfq registered (default)
[ 0.295494] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 0.298886] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.299496] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 0.300600] mice: PS/2 mouse device common for all mice
[ 0.302311] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[ 0.303099] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 0.303836] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 0.304006] cpuidle: using governor ladder
[ 0.304067] cpuidle: using governor menu
[ 0.306138] TCP cubic registered
[ 0.307334] NET: Registered protocol family 17
[ 0.310261] rtc_cmos 00:01: setting system clock to 2010-07-27 20:56:06 UTC (1280264166)
[ 0.312599] Freeing unused kernel memory: 464k freed
[ 0.513685] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 0.514278] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[ 0.514928] virtio-pci 0000:00:03.0: setting latency timer to 64
[ 0.515092] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[ 0.515493] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[ 0.516198] virtio-pci 0000:00:04.0: setting latency timer to 64
[ 0.536171] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 0.536565] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[ 0.537225] virtio-pci 0000:00:05.0: setting latency timer to 64
[ 0.537386] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[ 0.537762] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[ 0.538410] virtio-pci 0000:00:06.0: setting latency timer to 64
[ 0.634593] vda: vda1 vda2 < vda5 >
[ 0.649159] vdb: vdb1
[ 1.013119] Clocksource tsc unstable (delta = 582181654385 ns)
[ 1.044251] EXT3-fs: barriers not enabled
[ 1.063011] kjournald starting. Commit interval 5 seconds
[ 1.063115] EXT3-fs (vda1): mounted filesystem with writeback data mode
[ 2.620528] udevd version 125 started
[ 2.865930] virtio-pci 0000:00:03.0: irq 40 for MSI/MSI-X
[ 2.865945] virtio-pci 0000:00:03.0: irq 41 for MSI/MSI-X
[ 2.865958] virtio-pci 0000:00:03.0: irq 42 for MSI/MSI-X
[ 2.910519] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[ 2.912585] ACPI: Power Button [PWRF]
[ 2.921754] ACPI: acpi_idle registered with cpuidle
[ 4.408057] Adding 409620k swap on /dev/vda5. Priority:-1 extents:1 across:409620k
[ 4.959959] EXT3-fs (vda1): using internal journal
[ 5.495306] loop: module loaded
[ 9.680594] hrtimer: interrupt took 11934233 ns
[ 10.246663] NET: Registered protocol family 10
[ 10.247565] lo: Disabled Privacy Extensions
[ 20.576118] eth0: no IPv6 routers present
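For scale, the gap between the two values in the first "pvclock backwards" line can be worked out directly. This is only a sketch: the macro and function names are made up here, and it assumes both values are nanosecond counts, as stated elsewhere in the thread for kvmclock.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the first "pvclock backwards" line in the log above. */
#define PVCLOCK_RET  0x108372ffd10bULL
#define PVCLOCK_LAST 0x210aff03671aULL

/* How far 'last' is ahead of 'ret', in whole seconds (values are in ns). */
static uint64_t lead_seconds(uint64_t ret, uint64_t last)
{
    assert(last >= ret);
    return (last - ret) / 1000000000ULL;
}
```

lead_seconds(PVCLOCK_RET, PVCLOCK_LAST) comes out at roughly 18,000 seconds, about five hours, which is far above the 25000000 ns (25 ms) threshold used in the debug patch.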
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 14:55 ` Andre Przywara
@ 2010-07-27 21:51 ` Andre Przywara
2010-07-28 3:00 ` Zachary Amsden
2010-07-28 12:25 ` Andre Przywara
0 siblings, 2 replies; 81+ messages in thread
From: Andre Przywara @ 2010-07-27 21:51 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Andre Przywara wrote:
> Avi Kivity wrote:
>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>> Weird. Maybe the clock goes crazy.
>>>>
>>>> Let's see if it jumps forward a lot:
>>>>
>>>> } while (unlikely(last != ret));
>>>> +
>>>> + {
>>>> + static u64 last_report;
>>>> + if (ret > last_report + 10000) {
>>>> + last_report = ret;
>>>> + printk("kvmclock: %llx\n", ret);
>>>> + }
>>>> +
>>>> + }
>>>>
>>>> return ret;
>>>> }
>>>>
>>>> Worth updating the 'return last' to update ret and goto the new code,
>>>> so we don't miss that path.
>>> Did that. There is _a lot_ of output (about 350 lines per second via
>>> the 115k serial console), both with smp=1 and smp=2.
>>> The majority is differing about 2,000,000 (ticks?), but a handful of
>>> them are in the range of 20 million.
>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>
>>> No difference between smp=2 and smp=1.
>>> I also get some "BUG: recent printk recursion!" and I don't see any
>>> kernel boot progress beyond outputting the BogoMIPS value.
>> Right, printk() wants the time too.
>>
>>> BTW: I found two messages from your earlier debug statement:
>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>> Those are from kvmclock initialization, not from the older patch.
>>
>> I'm completely confused, everything seems to be in order.
>>
>> Let's see. if you s/return last/return ret/ in the original, does this
>> help things along? this makes pvclock drop the computation and should
>> be exactly the same as before the patch.
> Yes, this works, both smp versions boot. I see a very short break
> after the line in question, but then it proceeds well.
> Thanks for your help, now I got a much better insight into the issue. I
> will see if I can find something more.
Did some more investigations, some observations:
- The cmpxchg does not seem to be a problem, I didn't see the loop
iterated more than once.
- Turning off printk-timestamps makes the bug go away. But I guess it is
just hiding or deferring it, and it's no real workaround anyway.
- I instrumented the "if (ret < last) return last;" statement, when the
kernel hangs I get only printks from there, although it has hit before:
----------
[ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.820000] returning last instead (cnt=19001)
[ 0.820000] returning last instead (cnt=20001)
The last line repeats forever with the same timestamp, the counter
(counting the number of "return last;") increments about 3500 times/second.
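A repeating timestamp is what the clamp would produce if the shared last value were far ahead of the raw reading. The following is a userspace model only, not the kernel code: it uses C11 atomics in place of atomic64_cmpxchg(), and stuck_reads() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Highest value handed out so far, shared by all readers. */
static _Atomic uint64_t last_value;

/* Userspace model of the clamp: never return less than last_value. */
static uint64_t clamp_monotonic(uint64_t ret)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (ret < last)
            return last;    /* raw reading is behind: repeat the old value */
        /* On failure the CAS reloads 'last' and the check runs again. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, ret));

    return ret;
}

/*
 * Hypothetical helper: count reads that come back identical while the raw
 * clock advances by step_ns per read, starting from raw_start.
 */
static unsigned stuck_reads(uint64_t raw_start, uint64_t step_ns, unsigned n)
{
    unsigned stuck = 0;
    uint64_t prev = clamp_monotonic(raw_start);

    for (unsigned i = 1; i <= n; i++) {
        uint64_t now = clamp_monotonic(raw_start + i * step_ns);

        if (now == prev)
            stuck++;
        prev = now;
    }
    return stuck;
}
```

If last_value is seeded well ahead of the raw clock, every read returns the same number until the raw clock catches up, so anything waiting on a timeout would wait that long. The model does not explain how last_value would get that far ahead in the first place.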
I will see if I find something more...
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 21:51 ` Andre Przywara
@ 2010-07-28 3:00 ` Zachary Amsden
2010-07-28 7:55 ` Andre Przywara
2010-07-28 12:25 ` Andre Przywara
1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-28 3:00 UTC (permalink / raw)
To: Andre Przywara; +Cc: Avi Kivity, glommer@redhat.com, KVM list
On 07/27/2010 11:51 AM, Andre Przywara wrote:
> Andre Przywara wrote:
>> Avi Kivity wrote:
>>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>> Weird. Maybe the clock goes crazy.
>>>>>
>>>>> Let's see if it jumps forward a lot:
>>>>>
>>>>> } while (unlikely(last != ret));
>>>>> +
>>>>> + {
>>>>> + static u64 last_report;
>>>>> + if (ret > last_report + 10000) {
>>>>> + last_report = ret;
>>>>> + printk("kvmclock: %llx\n", ret);
>>>>> + }
>>>>> +
>>>>> + }
>>>>>
>>>>> return ret;
>>>>> }
>>>>>
>>>>> Worth updating the 'return last' to update ret and goto the new
>>>>> code, so we don't miss that path.
>>>> Did that. There is _a lot_ of output (about 350 lines per second
>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>> The majority is differing about 2,000,000 (ticks?), but a handful
>>>> of them are in the range of 20 million.
>>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>>
>>>> No difference between smp=2 and smp=1.
>>>> I also get some "BUG: recent printk recursion!" and I don't see any
>>>> kernel boot progress beyond outputting the BogoMIPS value.
>>> Right, printk() wants the time too.
>>>
>>>> BTW: I found two messages from your earlier debug statement:
>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>> Those are from kvmclock initialization, not from the older patch.
>>>
>>> I'm completely confused, everything seems to be in order.
>>>
>>> Let's see. if you s/return last/return ret/ in the original, does
>>> this help things along? this makes pvclock drop the computation and
>>> should be exactly the same as before the patch.
>> Yes, this works, both smp versions boot. I see a very short
>> break after the line in question, but then it proceeds well.
>> Thanks for your help, now I got a much better insight into the issue.
>> I will see if I can find something more.
> Did some more investigations, some observations:
> - The cmpxchg does not seem to be a problem, I didn't see the loop
> iterated more than once.
> - Turning off printk-timestamps makes the bug go away. But I guess it
> is just hiding or deferring it, and it's no real workaround anyway.
> - I instrumented the "if (ret < last) return last;" statement, when
> the kernel hangs I get only printks from there, although it has hit
> before:
> ----------
> [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [ 0.820000] returning last instead (cnt=19001)
> [ 0.820000] returning last instead (cnt=20001)
> The last line repeats forever with the same timestamp, the counter
> (counting the number of "return last;") increments about 3500
> times/second.
>
> I will see if I find something more...
>
> Regards,
> Andre.
>
gcc --version?
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-28 3:00 ` Zachary Amsden
@ 2010-07-28 7:55 ` Andre Przywara
0 siblings, 0 replies; 81+ messages in thread
From: Andre Przywara @ 2010-07-28 7:55 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Avi Kivity, glommer@redhat.com, KVM list
Zachary Amsden wrote:
> On 07/27/2010 11:51 AM, Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Avi Kivity wrote:
>>>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>> Wierd. Maybe the clock goes crazy.
>>>>>>
>>>>>> Let's see if it jumps forward alot:
>>>>>>
>>>>>> } while (unlikely(last != ret));
>>>>>> +
>>>>>> + {
>>>>>> + static u64 last_report;
>>>>>> + if (ret > last_report + 10000) {
>>>>>> + last_report = ret;
>>>>>> + printk("kvmclock: %llx\n", ret);
>>>>>> + }
>>>>>> +
>>>>>> + }
>>>>>>
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> Worth updating the 'return last' to update ret and goto the new
>>>>>> code, so we don't miss that path.
>>>>> Did that. There is _a lot_ of output (about 350 lines per second
>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>> The majority is differing about 2,000,000 (ticks?), but a handful
>>>>> of them are in the range of 20 million.
>>>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>>>
>>>>> No difference between smp=2 and smp=1.
>>>>> I also get some "BUG: recent printk recursion!" and I don't see any
>>>>> kernel boot progress beyond outputting the BogoMIPS value.
>>>> Right, printk() wants the time too.
>>>>
>>>>> BTW: I found two message from your earlier debug statement:
>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>> Those are from kvmclock initialization, not from the older patch.
>>>>
>>>> I'm completely confused, everything seems to be in order.
>>>>
>>>> Let's see. if you s/return last/return ret/ in the original, does
>>>> this help things along? this makes pvclock drop the computation and
>>>> should be exactly the same as before the patch.
>>> Yes, this works, both smp version boot. I see a short very short
>>> break after the line in question, but then it proceeds well.
>>> Thanks for your help, now I got a much better insight into the issue.
>>> I will see if I can find something more.
>> Did some more investigations, some observations:
>> - The cmpxchg does not seem to be a problem, I didn't see the loop
>> iterated more than once.
>> - Turning off printk-timestamps makes the bug go away. But I guess it
>> is just hiding or deferring it, and it's no real workaround anyway.
>> - I instrumented the "if (ret < last) return last;" statement, when
>> the kernel hangs I get only printks from there, although it has hit
>> before:
>> ----------
>> [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> [ 0.820000] returning last instead (cnt=19001)
>> [ 0.820000] returning last instead (cnt=20001)
>> The last line repeats forever with the same timestamp, the counter
>> (counting the number of "return last;") increments about 3500
>> times/second.
>>
>> I will see if I find something more...
>>
> gcc --version?
That would be 4.3.3.
I also compiled the guest kernel with 4.4.4; that made no difference.
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 21:00 ` Arjan Koers
@ 2010-07-28 10:37 ` Avi Kivity
2010-07-31 0:34 ` Arjan Koers
0 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-07-28 10:37 UTC (permalink / raw)
To: Arjan Koers; +Cc: kvm, Zachary Amsden
On 07/28/2010 12:00 AM, Arjan Koers wrote:
> On 2010-07-26 20:59, Arjan Koers wrote:
>
>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>> kernels hang during boot.
>
> It appears that last is way ahead of ret twice.
> The kernel boots with this debug patch that makes the clock go
> backwards if the difference is big:
>
> last = atomic64_read(&last_value);
> do {
> - if (ret< last)
> - return last;
> + if (ret< last) {
> + if ( last - ret< 25000000 )
> + return last;
> + else
> + printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
> + }
> last = atomic64_cmpxchg(&last_value, last, ret);
> } while (unlikely(last != ret));
>
>
>
> [ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
> [ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [ 0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
Zaaaacchhhh?!
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-27 21:51 ` Andre Przywara
2010-07-28 3:00 ` Zachary Amsden
@ 2010-07-28 12:25 ` Andre Przywara
2010-07-30 22:54 ` Zachary Amsden
1 sibling, 1 reply; 81+ messages in thread
From: Andre Przywara @ 2010-07-28 12:25 UTC (permalink / raw)
To: Avi Kivity; +Cc: glommer@redhat.com, Zachary Amsden, KVM list
Andre Przywara wrote:
> Andre Przywara wrote:
>> Avi Kivity wrote:
>>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>> Wierd. Maybe the clock goes crazy.
>>>>>
>>>>> Let's see if it jumps forward alot:
>>>>>
>>>>> } while (unlikely(last != ret));
>>>>> +
>>>>> + {
>>>>> + static u64 last_report;
>>>>> + if (ret > last_report + 10000) {
>>>>> + last_report = ret;
>>>>> + printk("kvmclock: %llx\n", ret);
>>>>> + }
>>>>> +
>>>>> + }
>>>>>
>>>>> return ret;
>>>>> }
>>>>>
>>>>> Worth updating the 'return last' to update ret and goto the new code,
>>>>> so we don't miss that path.
>>>> Did that. There is _a lot_ of output (about 350 lines per second via
>>>> the 115k serial console), both with smp=1 and smp=2.
>>>> The majority is differing about 2,000,000 (ticks?), but a handful of
>>>> them are in the range of 20 million.
>>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>>
>>>> No difference between smp=2 and smp=1.
>>>> I also get some "BUG: recent printk recursion!" and I don't see any
>>>> kernel boot progress beyond outputting the BogoMIPS value.
>>> Right, printk() wants the time too.
>>>
>>>> BTW: I found two message from your earlier debug statement:
>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>> Those are from kvmclock initialization, not from the older patch.
>>>
>>> I'm completely confused, everything seems to be in order.
>>>
>>> Let's see. if you s/return last/return ret/ in the original, does this
>>> help things along? this makes pvclock drop the computation and should
>>> be exactly the same as before the patch.
>> Yes, this works, both smp version boot. I see a short very short break
>> after the line in question, but then it proceeds well.
>> Thanks for your help, now I got a much better insight into the issue. I
>> will see if I can find something more.
> Did some more investigations, some observations:
> - The cmpxchg does not seem to be a problem, I didn't see the loop
> iterated more than once.
> - Turning off printk-timestamps makes the bug go away. But I guess it is
> just hiding or deferring it, and it's no real workaround anyway.
> - I instrumented the "if (ret < last) return last;" statement, when the
> kernel hangs I get only printks from there, although it has hit before:
> ----------
> [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [ 0.820000] returning last instead (cnt=19001)
> [ 0.820000] returning last instead (cnt=20001)
> The last line repeats forever with the same timestamp, the counter
> (counting the number of "return last;") increments about 3500 times/second.
>
> I will see if I find something more...
I added some more instrumentation. It seems the values read from the
pvclock are bogus *sometimes*:
returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
This is from the first time the if-statement triggers. So I guess the
value read is ridiculously far in the future (multiple days), so
subsequent calls to clocksource_read() will always return this bogus
last value. This means that the clock does not make progress (for
several days), and thus timing loops will never come to an end. I also
instrumented the serial driver; the last thing I saw was autoconfig_irq,
where udelay() is evidently called.
Does that ring a bell with anyone?
I will now concentrate on the pvclock readout/HV write part to see which
of the values used here are wrong.
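
To illustrate the stall described above, here is a minimal userspace
sketch of the "never go backwards" clamp. It is only a model under
stated assumptions, not the kernel code: monotonic_read() is a
hypothetical name, and C11 atomics stand in for atomic64_read() and
atomic64_cmpxchg(). Once a single bogus, far-future value has been
stored in last_value, every later sane reading is smaller and gets
replaced by that stored value, so the clock appears frozen.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of the global synchronization point (hypothetical userspace copy). */
static _Atomic uint64_t last_value;

/*
 * Return 'ret' if it does not lie before the last value handed out,
 * otherwise return that last value.  'ret' plays the role of the
 * nanosecond value computed from the per-vcpu pvclock data.
 */
static uint64_t monotonic_read(uint64_t ret)
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (ret < last)
			return last;	/* clamp: never go backwards */
		/* On failure 'last' is refreshed and the check is redone. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, ret));

	return ret;
}
```

Feeding it a sane value, then the bogus 2778021535795841 from the log
line above, then any further sane value returns the bogus value from
then on, which matches the endless "returning last instead" output.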
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-28 12:25 ` Andre Przywara
@ 2010-07-30 22:54 ` Zachary Amsden
2010-08-02 10:12 ` Andre Przywara
0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-30 22:54 UTC (permalink / raw)
To: Andre Przywara; +Cc: Avi Kivity, glommer@redhat.com, KVM list
On 07/28/2010 02:25 AM, Andre Przywara wrote:
> Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Avi Kivity wrote:
>>>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>> Wierd. Maybe the clock goes crazy.
>>>>>>
>>>>>> Let's see if it jumps forward alot:
>>>>>>
>>>>>> } while (unlikely(last != ret));
>>>>>> +
>>>>>> + {
>>>>>> + static u64 last_report;
>>>>>> + if (ret > last_report + 10000) {
>>>>>> + last_report = ret;
>>>>>> + printk("kvmclock: %llx\n", ret);
>>>>>> + }
>>>>>> +
>>>>>> + }
>>>>>>
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> Worth updating the 'return last' to update ret and goto the new
>>>>>> code, so we don't miss that path.
>>>>> Did that. There is _a lot_ of output (about 350 lines per second
>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>> The majority is differing about 2,000,000 (ticks?), but a handful
>>>>> of them are in the range of 20 million.
>>>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>>>
>>>>> No difference between smp=2 and smp=1.
>>>>> I also get some "BUG: recent printk recursion!" and I don't see
>>>>> any kernel boot progress beyond outputting the BogoMIPS value.
>>>> Right, printk() wants the time too.
>>>>
>>>>> BTW: I found two message from your earlier debug statement:
>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>> Those are from kvmclock initialization, not from the older patch.
>>>>
>>>> I'm completely confused, everything seems to be in order.
>>>>
>>>> Let's see. if you s/return last/return ret/ in the original, does
>>>> this help things along? this makes pvclock drop the computation
>>>> and should be exactly the same as before the patch.
>>> Yes, this works, both smp version boot. I see a short very short
>>> break after the line in question, but then it proceeds well.
>>> Thanks for your help, now I got a much better insight into the
>>> issue. I will see if I can find something more.
>> Did some more investigations, some observations:
>> - The cmpxchg does not seem to be a problem, I didn't see the loop
>> iterated more than once.
>> - Turning off printk-timestamps makes the bug go away. But I guess it
>> is just hiding or deferring it, and it's no real workaround anyway.
>> - I instrumented the "if (ret < last) return last;" statement, when
>> the kernel hangs I get only printks from there, although it has hit
>> before:
>> ----------
>> [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> [ 0.820000] returning last instead (cnt=19001)
>> [ 0.820000] returning last instead (cnt=20001)
>> The last line repeats forever with the same timestamp, the counter
>> (counting the number of "return last;") increments about 3500
>> times/second.
>>
>> I will see if I find something more...
> Added some more instrumentation, seems like the values read from the
> pvclock is bogus *sometimes*:
> returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
> This is from the first time the if-statement triggers. So I guess the
> value read is ridiculously far in the future (multiple days), so next
> calls to clocksource_read() will always return this bogus last value.
> This means that the clock does not make progress (for several days)
> and thus timing loops will never come to an end. I also instrumented
> the serial driver, the last thing I saw was autoconfig_irq, where
> obviously udelay() is called.
>
> Does that ring a bell with someone?
>
> I will now concentrate on the pvclock readout/HV write part to see
> which of the values used here are wrong.
Have you gotten any further results on this?
I think the most likely explanation is that your host CPU has TSC out of
sync, and somehow this leaks over to pvclock. Am I correct that it
happens even with one guest VCPU? What if you disable secondary host CPUs?
Zach
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-28 10:37 ` Avi Kivity
@ 2010-07-31 0:34 ` Arjan Koers
2010-07-31 1:38 ` Zachary Amsden
2010-07-31 2:39 ` Zachary Amsden
0 siblings, 2 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 0:34 UTC (permalink / raw)
To: kvm; +Cc: Avi Kivity, Zachary Amsden
On 2010-07-28 12:37, Avi Kivity wrote:
> On 07/28/2010 12:00 AM, Arjan Koers wrote:
>> On 2010-07-26 20:59, Arjan Koers wrote:
>>
>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>> kernels hang during boot.
>>
>> It appears that last is way ahead of ret twice.
>> The kernel boots with this debug patch that makes the clock go
>> backwards if the difference is big:
>>
>> last = atomic64_read(&last_value);
>> do {
>> - if (ret< last)
>> - return last;
>> + if (ret< last) {
>> + if ( last - ret< 25000000 )
>> + return last;
>> + else
>> + printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
>> + }
>> last = atomic64_cmpxchg(&last_value, last, ret);
>> } while (unlikely(last != ret));
>>
>>
>>
>> [ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>> [ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>> [ 0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
>
> Zaaaacchhhh?!
>
The lists below show some debug data of the first 99 calls to
pvclock_clocksource_read since the kernel booted. The situation
after the 'do ... while (version != src->version)' loop is
displayed.
Meaning of the columns:
- src pointer
- shadow.tsc_timestamp
- shadow.system_timestamp
- shadow.version
- native_read_tsc()
- delta = native_read_tsc() - shadow.tsc_timestamp
- offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
- ret = shadow.system_timestamp + offset
Fields left out, because they were the same for all rows:
- shadow.tsc_to_nsec_mul: b6dc43b6
- shadow.tsc_shift: ffffffff
- shadow.flags: 0
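
For anyone who wants to recompute the last three columns by hand, the
following is a rough userspace sketch of the arithmetic. It is an
illustration under assumptions, not the kernel source: pvclock_ret() is
a hypothetical helper name, unsigned __int128 is a GCC/Clang extension
standing in for the kernel's 64x32-bit multiply, and tsc_shift
(printed as ffffffff above) is taken as the signed value -1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a TSC delta to nanoseconds:
 * shift the delta by 'shift' (negative means right shift), multiply by
 * the 32-bit fraction 'mul_frac', and drop the low 32 bits of the product.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	assert(shift > -64 && shift < 64);

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* 64 x 32 bit multiply needs a wider intermediate result. */
	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* ret = shadow.system_timestamp + offset, as in the column list above. */
static uint64_t pvclock_ret(uint64_t tsc_now, uint64_t tsc_timestamp,
			    uint64_t system_timestamp,
			    uint32_t mul_frac, int shift)
{
	uint64_t delta = tsc_now - tsc_timestamp;

	return system_timestamp + scale_delta(delta, mul_frac, shift);
}
```

With the values of row 3 below (tsc_timestamp 21cb0d4a8, TSC 21cb10a00,
system_timestamp b42c0632768f, mul b6dc43b6, shift -1) this gives
offset 130d and ret b42c0632899c, the same as the log.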
Debug log of guest after cold boot of virtual machine:
1: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210d8d4b5 5b7a67 20abdc b42c01f7b0a2
2: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210dc2b61 5ed113 21dd1b b42c01f8e1e1
3: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb10a00 3558 130d b42c0632899c
4: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb11f17 4a6f 1a95 b42c06329124
5: ffff880001411c00 21cceaad2 b42c063d1e45 bbd8 21ccec522 1a50 965 b42c063d27aa
6: ffff880001411c00 21cde0644 b42c06429a42 bc10 21ce25457 44e13 1899a b42c064423dc
7: ffff880001411c00 21cf905c1 b42c064c3e76 bc46 21cfa182b 1126a 6201 b42c064ca077
8: ffff880001411c00 21d088194 b42c0651c601 bc7a 21d089592 13fe 723 b42c0651cd24
9: ffff880001411c00 21d1ad073 b42c06584fc3 bcde 21d1b135d 42ea 17e5 b42c065867a8
10: ffff880001411c00 21d2a3837 b42c065dd039 bd10 21d2a4825 fee 5b0 b42c065dd5e9
11: ffff880001411c00 21d38bab3 b42c0662fea6 bd42 21d38caa1 fee 5b0 b42c06630456
12: ffff880001411c00 21d47459b b42c06683029 bd78 21d475517 f7c 587 b42c066835b0
13: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d70c 4a25 1a7a b42c066e1ad9
14: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d8d6 4bef 1b1e b42c066e1b7d
15: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57da22 4d3b 1b94 b42c066e1bf3
16: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57fc5e 6f77 27ce b42c066e282d
17: ffff880001411c00 21d67c77c b42c0673cc0a bde4 21d67d685 f09 55e b42c0673d168
18: ffff880001411c00 21d7625b2 b42c0678ed96 be16 21d763488 ed6 54c b42c0678f2e2
19: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa78b9 69d83 25cd5 b42c06a82ef7
20: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7a3f 69f09 25d61 b42c06a82f83
21: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7f8b 6a455 25f45 b42c06a83167
22: ffff880001411c00 21e3a50ea b42c06befbb1 be58 21e3c1750 1c666 a249 b42c06bf9dfa
23: ffff880001411c00 21e4bfe47 b42c06c54bc5 be92 21e4c4c61 4e1a 1be4 b42c06c567a9
24: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea4b224 20cb6 bb66 b42c06e4f922
25: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52748 281da e53c b42c06e522f8
26: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52907 28399 e5db b42c06e52397
27: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52a76 28508 e65f b42c06e5241b
28: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea5c86a 322fc 11ec9 b42c06e55c85
29: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea60e3a 368cc 137b7 b42c06e57573
30: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea64dc8 3a85a 14e6a b42c06e58c26
31: ffff880001411c00 21ed8a003 b42c06f78496 bf02 21efda28b 250288 d37d2 b42c0704bc68
32: ffff880001411c00 21f0e9488 b42c070ac93f bf38 21f0eacdb 1853 8af b42c070ad1ee
33: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230aeeac6 3e60 1646 b42c0d5636ed
34: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230af06d0 5a6a 204a b42c0d5640f1
35: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b03f25 192bf 8fd6 b42c0d56b07d
36: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b043c8 19762 917f b42c0d56b226
37: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b0526b 1a605 96b8 b42c0d56b75f
38: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b05632 1a9cc 9812 b42c0d56b8b9
39: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b07eaa 1d244 a686 b42c0d56c72d
40: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b094e9 1e883 ae78 b42c0d56cf1f
41: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b09962 1ecfc b011 b42c0d56d0b8
42: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10590 2592a d6b4 b42c0d56f75b
43: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1090d 25ca7 d7f3 b42c0d56f89a
44: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10f99 26333 da49 b42c0d56faf0
45: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b11204 2659e db27 b42c0d56fbce
46: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1217c 27516 e0ad b42c0d570154
47: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1483f 29bd9 ee85 b42c0d570f2c
48: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b14ba6 29f40 efbc b42c0d571063
49: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b15569 2a903 f338 b42c0d5713df
50: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b250b3 3a44d 14cf8 b42c0d576d9f
51: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b254a0 3a83a 14e5f b42c0d576f06
52: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25bd8 3af72 150f3 b42c0d57719a
53: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25ec3 3b25d 151fd b42c0d5772a4
54: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b5fcab 75045 29cad b42c0d58bd54
55: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6013b 754d5 29e4e b42c0d58bef5
56: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6b86c 80c06 2dfbc b42c0d590063
57: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6bc41 80fdb 2e11a b42c0d5901c1
58: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c4e5 8187f 2e430 b42c0d5904d7
59: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c776 81b10 2e51b b42c0d5905c2
60: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b7f97b 94d15 35266 b42c0d59730d
61: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b864af 9b849 378b0 b42c0d599957
62: ffff880001411c00 23132e49d b42c0d855884 c16e 231599c3a 26b79d dd3ec b42c0d932c70
63: ffff880001411c00 23132e49d b42c0d855884 c16e 231599dbc 26b91f dd476 b42c0d932cfa
64: ffff880001411c00 23132e49d b42c0d855884 c16e 231599f5f 26bac2 dd50c b42c0d932d90
65: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046a74 6771d 24f1e b42c0dd02b65
66: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046c53 678fc 24fca b42c0dd02c11
67: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046da0 67a49 25040 b42c0dd02c87
68: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f62a2d 184df 8ae2 b42c0e2680c9
69: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63478 18f2a 8e8f b42c0e268476
70: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63f61 19a13 9274 b42c0e26885b
71: ffff880001511c00 20afec946 b42bffe0b604 130 1f890681eacdf 1f88e5d1fe399 b433ab005565 1685faae10b69
72: ffff880001411c00 2334400d3 b42c0e424ccd c180 23344a923 a850 3c1c b42c0e4288e9
73: ffff880001411c00 2334400d3 b42c0e424ccd c180 2334632f1 2321e c8c2 b42c0e43158f
74: ffff880001411c00 2334400d3 b42c0e424ccd c180 23346a094 29fc1 efea b42c0e433cb7
75: ffff880001411c00 2334400d3 b42c0e424ccd c180 23347021d 3014a 112c0 b42c0e435f8d
76: ffff880001411c00 2334400d3 b42c0e424ccd c180 2335ba33b 17a268 870e5 b42c0e4abdb2
77: ffff880001411c00 2334400d3 b42c0e424ccd c180 2335ba9f8 17a925 8734d b42c0e4ac01a
78: ffff880001411c00 2334400d3 b42c0e424ccd c180 2335bb17d 17b0aa 875fd b42c0e4ac2ca
79: ffff880001511c00 20afec946 b42bffe0b604 130 1f89068505355 1f88e5d518a0f b433ab1210ed 1685faaf2c6f1
80: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863ad24 e5d6 5215 b42c0e598931
81: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863b980 f232 567f b42c0e598d9b
82: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863bbdd f48f 5757 b42c0e598e73
83: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863e9d2 12284 67c1 b42c0e599edd
84: ffff880001411c00 2334400d3 b42c0e424ccd c180 233855729 415656 1755cc b42c0e59a299
85: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f890686410b4 14966 75a4 b42c0e59acc0
86: ffff880001411c00 2334400d3 b42c0e424ccd c180 233857b87 417ab4 1762c9 b42c0e59af96
87: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f89068646b9e 1a450 961d b42c0e59cd39
88: ffff880001411c00 2334400d3 b42c0e424ccd c180 233894271 45419e 18bc1e b42c0e5b08eb
89: ffff880001411c00 2334400d3 b42c0e424ccd c180 2338ab48a 46b3b7 19404c b42c0e5b8d19
90: ffff880001411c00 2334400d3 b42c0e424ccd c180 2338adf39 46de66 194f8b b42c0e5b9c58
91: ffff880001411c00 2334400d3 b42c0e424ccd c180 2338b39b8 4738e5 196fdc b42c0e5bbca9
92: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686cf137 f756 5855 b42c0e5cd89a
93: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686cfd6f 1038e 5cb3 b42c0e5cdcf8
94: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686d9f4d 1a56c 9682 b42c0e5d16c7
95: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686e5610 25c2f d7c8 b42c0e5d580d
96: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686e8326 28945 e7e2 b42c0e5d6827
97: ffff880001411c00 233907ea7 b42c0e5d9e8b c182 23391ad48 12ea1 6c15 b42c0e5e0aa0
98: ffff880001411c00 233907ea7 b42c0e5d9e8b c182 23391b539 13692 6eeb b42c0e5e0d76
99: ffff880001411c00 233907ea7 b42c0e5d9e8b c182 2339270a3 1f1fc b1da b42c0e5e5065
The data for the first CPU (ffff880001411c00) looks OK to me.
For the second CPU (ffff880001511c00), the contents of the shadow struct
appear to be wrong on lines 71 and 79: shadow.tsc_timestamp and
native_read_tsc() are very dissimilar, which results in a wrong value
of ret.
On line 80, the struct is OK again.
Notice that shadow.version appears to have been reset back to 0. That
doesn't happen when the guest is rebooted without stopping the virtual machine.
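
A debug check along the following lines could flag such rows
automatically. This is only a sketch: shadow_looks_stale() and the
bound MAX_PLAUSIBLE_TSC_DELTA are hypothetical names and values picked
for illustration (the largest delta in the good rows above is about
5ed113), not something taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bound: a delta of 2^40 TSC cycles is several minutes on
 * a GHz-class CPU, far more than any good row in the logs above.
 */
#define MAX_PLAUSIBLE_TSC_DELTA (1ULL << 40)

/*
 * Return true if the current TSC and shadow.tsc_timestamp are so far
 * apart that the shadow copy is probably stale or mismatched.
 * The subtraction wraps around if tsc_timestamp is ahead of tsc_now,
 * which also yields a huge delta and therefore 'true'.
 */
static bool shadow_looks_stale(uint64_t tsc_now, uint64_t tsc_timestamp)
{
	uint64_t delta = tsc_now - tsc_timestamp;

	return delta > MAX_PLAUSIBLE_TSC_DELTA;
}
```

Applied to the first log, rows 70 and 80 pass, while row 71 (TSC
1f890681eacdf against tsc_timestamp 20afec946) is flagged.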
Another cold boot log:
67: ffff880001411c00 16e3478f3 ba1ec80cf347 a5dc 16e36c8e7 24ff4 d36a ba1ec80dc6b1
68: ffff880001511c00 14de08f79 ba1ebc817cf9 122 209385d6572ce 209370f84e355 ba26cd1ae302 17445899c5ffb
69: ffff880001511c00 209385d659d5c ba1ec828b7d0 2 209385d6678d0 db74 4e60 ba1ec8290630
70: ffff880001411c00 16e860662 ba1ec82a1262 a5e6 16e86cd2f c6cd 4700 ba1ec82a5962
71: ffff880001411c00 16e860662 ba1ec82a1262 a5e6 16e88c965 2c303 fc81 ba1ec82b0ee3
72: ffff880001411c00 16e860662 ba1ec82a1262 a5e6 16e893dec 3378a 12620 ba1ec82b3882
73: ffff880001411c00 16e860662 ba1ec82a1262 a5e6 16e89b1d1 3ab6f 14f84 ba1ec82b61e6
74: ffff880001511c00 209385d6f582c ba1ec82c313b 6 209385d7034c1 dc95 4ec7 ba1ec82c8002
Debug logs of guest after rebooting the guest without stopping the virtual machine:
64: ffff880001411c00 2c9f06974 b8418e488ace d1b5a 2ca0781a3 17182f 83f87 b8418e50ca55
65: ffff880001511c00 2aa8b381d b841831255aa c760e 2040007f4fd81 203fd5d69c564 b8490b4fcf66 1708a8e622510
66: ffff880001511c00 2040007f525b2 b8418e631e86 19467e 2040007f60641 e08f 5033 b8418e636eb9
67: ffff880001411c00 2ca3f733e b8418e64c48e d1b60 2ca4023b4 b076 3f05 b8418e650393
68: ffff880001411c00 2ca3f733e b8418e64c48e d1b60 2ca41f35f 28021 e49e b8418e65a92c
69: ffff880001411c00 2ca3f733e b8418e64c48e d1b60 2ca426ad3 2f795 10f48 b8418e65d3d6
70: ffff880001411c00 2ca3f733e b8418e64c48e d1b60 2ca42dfc9 36c8b 1390e b8418e65fd9c
71: ffff880001511c00 2040007fc856e b8418e65c0d4 194680 2040007fd6ed9 e96b 535d b8418e661431
67: ffff880001411c00 20f2a5bed ba4ed2bb1d69 72720 20f4554e5 1af8f8 9a21a ba4ed2c4bf83
68: ffff880001511c00 1eec7bca9 ba4ec72a670c 6821a 209bee44b2eb7 209bcf583720e ba569f76e851 174a566a14f5d
69: ffff880001511c00 209bee44ba9e6 ba4ed2d5b2ed b095e 209bee44e5507 2ab21 f3fa ba4ed2d6a6e7
70: ffff880001411c00 20f7796e4 ba4ed2d6b1ee 72724 20f77c158 2a74 f29 ba4ed2d6c117
71: ffff880001411c00 20f7796e4 ba4ed2d6b1ee 72724 20f785ca2 c5be 469f ba4ed2d6f88d
72: ffff880001411c00 20f7796e4 ba4ed2d6b1ee 72724 20f787623 df3f 4fbb ba4ed2d701a9
73: ffff880001411c00 20f7796e4 ba4ed2d6b1ee 72724 20f7891bb fad7 5995 ba4ed2d70b83
74: ffff880001511c00 209bee44f569b ba4ed2d702ae b0960 209bee45007a8 b10d 3f3b ba4ed2d741e9
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 0:34 ` Arjan Koers
@ 2010-07-31 1:38 ` Zachary Amsden
2010-07-31 11:50 ` Arjan Koers
2010-07-31 2:39 ` Zachary Amsden
1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-31 1:38 UTC (permalink / raw)
To: Arjan Koers; +Cc: kvm, Avi Kivity
On 07/30/2010 02:34 PM, Arjan Koers wrote:
> On 2010-07-28 12:37, Avi Kivity wrote:
>
>> On 07/28/2010 12:00 AM, Arjan Koers wrote:
>>
>>> On 2010-07-26 20:59, Arjan Koers wrote:
>>>
>>>
>>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>>> kernels hang during boot.
>>>>
>>> It appears that last is way ahead of ret twice.
>>> The kernel boots with this debug patch that makes the clock go
>>> backwards if the difference is big:
>>>
>>> last = atomic64_read(&last_value);
>>> do {
>>> - if (ret< last)
>>> - return last;
>>> + if (ret< last) {
>>> + if ( last - ret< 25000000 )
>>> + return last;
>>> + else
>>> + printk("pvclock backwards: ret = %llx; last = %llx\n", ret, last);
>>> + }
>>> last = atomic64_cmpxchg(&last_value, last, ret);
>>> } while (unlikely(last != ret));
>>>
>>>
>>>
>>> [ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>>> [ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>> [ 0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
>>>
>> Zaaaacchhhh?!
>>
>>
>
> The lists below show some debug data of the first 99 calls to
> pvclock_clocksource_read since the kernel booted. The situation
> after the 'do ... while (version != src->version)' loop is
> displayed.
>
> Meaning of the columns:
> - src pointer
> - shadow.tsc_timestamp
> - shadow.system_timestamp
> - shadow.version
> - native_read_tsc()
> - delta = native_read_tsc() - shadow.tsc_timestamp
> - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
> - ret = shadow.system_timestamp + offset
>
> Fields left out, because they were the same for all rows:
> - shadow.tsc_to_nsec_mul: b6dc43b6
> - shadow.tsc_shift: ffffffff
> - shadow.flags: 0
>
> Debug log of guest after cold boot of virtual machine:
> 1: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210d8d4b5 5b7a67 20abdc b42c01f7b0a2
> 2: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210dc2b61 5ed113 21dd1b b42c01f8e1e1
> 3: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb10a00 3558 130d b42c0632899c
> 4: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb11f17 4a6f 1a95 b42c06329124
> 5: ffff880001411c00 21cceaad2 b42c063d1e45 bbd8 21ccec522 1a50 965 b42c063d27aa
> 6: ffff880001411c00 21cde0644 b42c06429a42 bc10 21ce25457 44e13 1899a b42c064423dc
> 7: ffff880001411c00 21cf905c1 b42c064c3e76 bc46 21cfa182b 1126a 6201 b42c064ca077
> 8: ffff880001411c00 21d088194 b42c0651c601 bc7a 21d089592 13fe 723 b42c0651cd24
> 9: ffff880001411c00 21d1ad073 b42c06584fc3 bcde 21d1b135d 42ea 17e5 b42c065867a8
> 10: ffff880001411c00 21d2a3837 b42c065dd039 bd10 21d2a4825 fee 5b0 b42c065dd5e9
> 11: ffff880001411c00 21d38bab3 b42c0662fea6 bd42 21d38caa1 fee 5b0 b42c06630456
> 12: ffff880001411c00 21d47459b b42c06683029 bd78 21d475517 f7c 587 b42c066835b0
> 13: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d70c 4a25 1a7a b42c066e1ad9
> 14: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d8d6 4bef 1b1e b42c066e1b7d
> 15: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57da22 4d3b 1b94 b42c066e1bf3
> 16: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57fc5e 6f77 27ce b42c066e282d
> 17: ffff880001411c00 21d67c77c b42c0673cc0a bde4 21d67d685 f09 55e b42c0673d168
> 18: ffff880001411c00 21d7625b2 b42c0678ed96 be16 21d763488 ed6 54c b42c0678f2e2
> 19: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa78b9 69d83 25cd5 b42c06a82ef7
> 20: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7a3f 69f09 25d61 b42c06a82f83
> 21: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7f8b 6a455 25f45 b42c06a83167
> 22: ffff880001411c00 21e3a50ea b42c06befbb1 be58 21e3c1750 1c666 a249 b42c06bf9dfa
> 23: ffff880001411c00 21e4bfe47 b42c06c54bc5 be92 21e4c4c61 4e1a 1be4 b42c06c567a9
> 24: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea4b224 20cb6 bb66 b42c06e4f922
> 25: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52748 281da e53c b42c06e522f8
> 26: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52907 28399 e5db b42c06e52397
> 27: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52a76 28508 e65f b42c06e5241b
> 28: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea5c86a 322fc 11ec9 b42c06e55c85
> 29: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea60e3a 368cc 137b7 b42c06e57573
> 30: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea64dc8 3a85a 14e6a b42c06e58c26
> 31: ffff880001411c00 21ed8a003 b42c06f78496 bf02 21efda28b 250288 d37d2 b42c0704bc68
> 32: ffff880001411c00 21f0e9488 b42c070ac93f bf38 21f0eacdb 1853 8af b42c070ad1ee
> 33: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230aeeac6 3e60 1646 b42c0d5636ed
> 34: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230af06d0 5a6a 204a b42c0d5640f1
> 35: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b03f25 192bf 8fd6 b42c0d56b07d
> 36: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b043c8 19762 917f b42c0d56b226
> 37: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b0526b 1a605 96b8 b42c0d56b75f
> 38: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b05632 1a9cc 9812 b42c0d56b8b9
> 39: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b07eaa 1d244 a686 b42c0d56c72d
> 40: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b094e9 1e883 ae78 b42c0d56cf1f
> 41: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b09962 1ecfc b011 b42c0d56d0b8
> 42: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10590 2592a d6b4 b42c0d56f75b
> 43: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1090d 25ca7 d7f3 b42c0d56f89a
> 44: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10f99 26333 da49 b42c0d56faf0
> 45: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b11204 2659e db27 b42c0d56fbce
> 46: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1217c 27516 e0ad b42c0d570154
> 47: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1483f 29bd9 ee85 b42c0d570f2c
> 48: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b14ba6 29f40 efbc b42c0d571063
> 49: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b15569 2a903 f338 b42c0d5713df
> 50: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b250b3 3a44d 14cf8 b42c0d576d9f
> 51: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b254a0 3a83a 14e5f b42c0d576f06
> 52: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25bd8 3af72 150f3 b42c0d57719a
> 53: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25ec3 3b25d 151fd b42c0d5772a4
> 54: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b5fcab 75045 29cad b42c0d58bd54
> 55: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6013b 754d5 29e4e b42c0d58bef5
> 56: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6b86c 80c06 2dfbc b42c0d590063
> 57: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6bc41 80fdb 2e11a b42c0d5901c1
> 58: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c4e5 8187f 2e430 b42c0d5904d7
> 59: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c776 81b10 2e51b b42c0d5905c2
> 60: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b7f97b 94d15 35266 b42c0d59730d
> 61: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b864af 9b849 378b0 b42c0d599957
> 62: ffff880001411c00 23132e49d b42c0d855884 c16e 231599c3a 26b79d dd3ec b42c0d932c70
> 63: ffff880001411c00 23132e49d b42c0d855884 c16e 231599dbc 26b91f dd476 b42c0d932cfa
> 64: ffff880001411c00 23132e49d b42c0d855884 c16e 231599f5f 26bac2 dd50c b42c0d932d90
> 65: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046a74 6771d 24f1e b42c0dd02b65
> 66: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046c53 678fc 24fca b42c0dd02c11
> 67: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046da0 67a49 25040 b42c0dd02c87
> 68: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f62a2d 184df 8ae2 b42c0e2680c9
> 69: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63478 18f2a 8e8f b42c0e268476
> 70: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63f61 19a13 9274 b42c0e26885b
> 71: ffff880001511c00 20afec946 b42bffe0b604 130 1f890681eacdf 1f88e5d1fe399 b433ab005565 1685faae10b69
> 72: ffff880001411c00 2334400d3 b42c0e424ccd c180 23344a923 a850 3c1c b42c0e4288e9
> 73: ffff880001411c00 2334400d3 b42c0e424ccd c180 2334632f1 2321e c8c2 b42c0e43158f
> 74: ffff880001411c00 2334400d3 b42c0e424ccd c180 23346a094 29fc1 efea b42c0e433cb7
> 75: ffff880001411c00 2334400d3 b42c0e424ccd c180 23347021d 3014a 112c0 b42c0e435f8d
> 76: ffff880001411c00 2334400d3 b42c0e424ccd c180 2335ba33b 17a268 870e5 b42c0e4abdb2
> 77: ffff880001411c00 2334400d3 b42c0e424ccd c180 2335ba9f8 17a925 8734d b42c0e4ac01a
> 78: ffff880001411c00 2334400d3 b42c0e424ccd c180 2335bb17d 17b0aa 875fd b42c0e4ac2ca
> 79: ffff880001511c00 20afec946 b42bffe0b604 130 1f89068505355 1f88e5d518a0f b433ab1210ed 1685faaf2c6f1
> 80: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863ad24 e5d6 5215 b42c0e598931
> 81: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863b980 f232 567f b42c0e598d9b
> 82: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863bbdd f48f 5757 b42c0e598e73
> 83: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f8906863e9d2 12284 67c1 b42c0e599edd
> 84: ffff880001411c00 2334400d3 b42c0e424ccd c180 233855729 415656 1755cc b42c0e59a299
> 85: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f890686410b4 14966 75a4 b42c0e59acc0
> 86: ffff880001411c00 2334400d3 b42c0e424ccd c180 233857b87 417ab4 1762c9 b42c0e59af96
> 87: ffff880001511c00 1f8906862c74e b42c0e59371c 2 1f89068646b9e 1a450 961d b42c0e59cd39
> 88: ffff880001411c00 2334400d3 b42c0e424ccd c180 233894271 45419e 18bc1e b42c0e5b08eb
> 89: ffff880001411c00 2334400d3 b42c0e424ccd c180 2338ab48a 46b3b7 19404c b42c0e5b8d19
> 90: ffff880001411c00 2334400d3 b42c0e424ccd c180 2338adf39 46de66 194f8b b42c0e5b9c58
> 91: ffff880001411c00 2334400d3 b42c0e424ccd c180 2338b39b8 4738e5 196fdc b42c0e5bbca9
> 92: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686cf137 f756 5855 b42c0e5cd89a
> 93: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686cfd6f 1038e 5cb3 b42c0e5cdcf8
> 94: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686d9f4d 1a56c 9682 b42c0e5d16c7
> 95: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686e5610 25c2f d7c8 b42c0e5d580d
> 96: ffff880001511c00 1f890686bf9e1 b42c0e5c8045 4 1f890686e8326 28945 e7e2 b42c0e5d6827
> 97: ffff880001411c00 233907ea7 b42c0e5d9e8b c182 23391ad48 12ea1 6c15 b42c0e5e0aa0
> 98: ffff880001411c00 233907ea7 b42c0e5d9e8b c182 23391b539 13692 6eeb b42c0e5e0d76
> 99: ffff880001411c00 233907ea7 b42c0e5d9e8b c182 2339270a3 1f1fc b1da b42c0e5e5065
>
> The data for the first CPU (ffff880001411c00) looks OK to me.
> For the second CPU (ffff880001511c00), the contents of the shadow struct
> appear to be wrong on line 71 and 79: shadow.tsc_timestamp and
> native_read_tsc() are very dissimilar, which results in a wrong value
> of ret.
> On line 80, the struct is OK again.
> Notice that shadow.version appears to have been reset back to 0. That
> doesn't happen when the guest is rebooted without stopping the virtual machine.
>
How are you printing shadow.version? From a local variable captured
during the barrier window or directly in a printk afterwards? It should
never go backwards like this, and the vcpus come from a zalloc. This is
not easily explainable by anything other than a memory ordering or
compiler issue.
Note that receiving a startup IPI will cause the TSC to (mistakenly)
pass through the host value, but this should be corrected for. This
happens because SVM will call init_vmcb, clearing the tsc_offset field.
This seems to explain the huge difference in TSC presented to the CPUs.
It shouldn't affect kvmclock, because kvmclock won't be running at that
time yet.
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 0:34 ` Arjan Koers
2010-07-31 1:38 ` Zachary Amsden
@ 2010-07-31 2:39 ` Zachary Amsden
2010-07-31 11:53 ` Arjan Koers
1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-31 2:39 UTC (permalink / raw)
To: Arjan Koers; +Cc: kvm, Avi Kivity, Glauber Costa
On 07/30/2010 02:34 PM, Arjan Koers wrote:
> On 2010-07-28 12:37, Avi Kivity wrote:
>
>> On 07/28/2010 12:00 AM, Arjan Koers wrote:
>>
>>> On 2010-07-26 20:59, Arjan Koers wrote:
>>>
>>>
>>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>>> kernels hang during boot.
>>>>
>>> It appears that last is way ahead of ret twice.
>>> The kernel boots with this debug patch that makes the clock go
>>> backwards if the difference is big:
>>>
>>> last = atomic64_read(&last_value);
>>> do {
>>> - if (ret< last)
>>> - return last;
>>> + if (ret< last) {
>>> + if ( last - ret< 25000000 )
>>> + return last;
>>> + else
>>> + printk("pvclock backwards: ret = %llx; last =
>>> %llx\n", ret, last);
>>> + }
>>> last = atomic64_cmpxchg(&last_value, last, ret);
>>> } while (unlikely(last != ret));
>>>
>>>
>>>
>>> [ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>>> [ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>> [ 0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a
>>>
>> Zaaaacchhhh?!
>>
>>
>
> The lists below show some debug data of the first 99 calls to
> pvclock_clocksource_read since the kernel booted. The situation
> after the 'do ... while (version != src->version)' loop is
> displayed.
>
> Meaning of the columns:
> - src pointer
> - shadow.tsc_timestamp
> - shadow.system_timestamp
> - shadow.version
> - native_read_tsc()
> - delta = native_read_tsc() - shadow.tsc_timestamp
> - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
> - ret = shadow.system_timestamp + offset
>
> Fields left out, because they were the same for all rows:
> - shadow.tsc_to_nsec_mul: b6dc43b6
> - shadow.tsc_shift: ffffffff
> - shadow.flags: 0
>
> Debug log of guest after cold boot of virtual machine:
> 1: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210d8d4b5 5b7a67 20abdc b42c01f7b0a2
> 2: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210dc2b61 5ed113 21dd1b b42c01f8e1e1
> 3: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb10a00 3558 130d b42c0632899c
> 4: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb11f17 4a6f 1a95 b42c06329124
> 5: ffff880001411c00 21cceaad2 b42c063d1e45 bbd8 21ccec522 1a50 965 b42c063d27aa
> 6: ffff880001411c00 21cde0644 b42c06429a42 bc10 21ce25457 44e13 1899a b42c064423dc
> 7: ffff880001411c00 21cf905c1 b42c064c3e76 bc46 21cfa182b 1126a 6201 b42c064ca077
> 8: ffff880001411c00 21d088194 b42c0651c601 bc7a 21d089592 13fe 723 b42c0651cd24
> 9: ffff880001411c00 21d1ad073 b42c06584fc3 bcde 21d1b135d 42ea 17e5 b42c065867a8
> 10: ffff880001411c00 21d2a3837 b42c065dd039 bd10 21d2a4825 fee 5b0 b42c065dd5e9
> 11: ffff880001411c00 21d38bab3 b42c0662fea6 bd42 21d38caa1 fee 5b0 b42c06630456
> 12: ffff880001411c00 21d47459b b42c06683029 bd78 21d475517 f7c 587 b42c066835b0
> 13: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d70c 4a25 1a7a b42c066e1ad9
> 14: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d8d6 4bef 1b1e b42c066e1b7d
> 15: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57da22 4d3b 1b94 b42c066e1bf3
> 16: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57fc5e 6f77 27ce b42c066e282d
> 17: ffff880001411c00 21d67c77c b42c0673cc0a bde4 21d67d685 f09 55e b42c0673d168
> 18: ffff880001411c00 21d7625b2 b42c0678ed96 be16 21d763488 ed6 54c b42c0678f2e2
> 19: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa78b9 69d83 25cd5 b42c06a82ef7
> 20: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7a3f 69f09 25d61 b42c06a82f83
> 21: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7f8b 6a455 25f45 b42c06a83167
> 22: ffff880001411c00 21e3a50ea b42c06befbb1 be58 21e3c1750 1c666 a249 b42c06bf9dfa
> 23: ffff880001411c00 21e4bfe47 b42c06c54bc5 be92 21e4c4c61 4e1a 1be4 b42c06c567a9
> 24: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea4b224 20cb6 bb66 b42c06e4f922
> 25: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52748 281da e53c b42c06e522f8
> 26: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52907 28399 e5db b42c06e52397
> 27: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52a76 28508 e65f b42c06e5241b
> 28: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea5c86a 322fc 11ec9 b42c06e55c85
> 29: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea60e3a 368cc 137b7 b42c06e57573
> 30: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea64dc8 3a85a 14e6a b42c06e58c26
> 31: ffff880001411c00 21ed8a003 b42c06f78496 bf02 21efda28b 250288 d37d2 b42c0704bc68
> 32: ffff880001411c00 21f0e9488 b42c070ac93f bf38 21f0eacdb 1853 8af b42c070ad1ee
> 33: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230aeeac6 3e60 1646 b42c0d5636ed
> 34: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230af06d0 5a6a 204a b42c0d5640f1
> 35: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b03f25 192bf 8fd6 b42c0d56b07d
> 36: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b043c8 19762 917f b42c0d56b226
> 37: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b0526b 1a605 96b8 b42c0d56b75f
> 38: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b05632 1a9cc 9812 b42c0d56b8b9
> 39: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b07eaa 1d244 a686 b42c0d56c72d
> 40: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b094e9 1e883 ae78 b42c0d56cf1f
> 41: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b09962 1ecfc b011 b42c0d56d0b8
> 42: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10590 2592a d6b4 b42c0d56f75b
> 43: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1090d 25ca7 d7f3 b42c0d56f89a
> 44: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10f99 26333 da49 b42c0d56faf0
> 45: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b11204 2659e db27 b42c0d56fbce
> 46: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1217c 27516 e0ad b42c0d570154
> 47: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1483f 29bd9 ee85 b42c0d570f2c
> 48: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b14ba6 29f40 efbc b42c0d571063
> 49: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b15569 2a903 f338 b42c0d5713df
> 50: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b250b3 3a44d 14cf8 b42c0d576d9f
> 51: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b254a0 3a83a 14e5f b42c0d576f06
> 52: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25bd8 3af72 150f3 b42c0d57719a
> 53: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25ec3 3b25d 151fd b42c0d5772a4
> 54: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b5fcab 75045 29cad b42c0d58bd54
> 55: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6013b 754d5 29e4e b42c0d58bef5
> 56: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6b86c 80c06 2dfbc b42c0d590063
> 57: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6bc41 80fdb 2e11a b42c0d5901c1
> 58: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c4e5 8187f 2e430 b42c0d5904d7
> 59: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c776 81b10 2e51b b42c0d5905c2
> 60: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b7f97b 94d15 35266 b42c0d59730d
> 61: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b864af 9b849 378b0 b42c0d599957
> 62: ffff880001411c00 23132e49d b42c0d855884 c16e 231599c3a 26b79d dd3ec b42c0d932c70
> 63: ffff880001411c00 23132e49d b42c0d855884 c16e 231599dbc 26b91f dd476 b42c0d932cfa
> 64: ffff880001411c00 23132e49d b42c0d855884 c16e 231599f5f 26bac2 dd50c b42c0d932d90
> 65: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046a74 6771d 24f1e b42c0dd02b65
> 66: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046c53 678fc 24fca b42c0dd02c11
> 67: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046da0 67a49 25040 b42c0dd02c87
> 68: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f62a2d 184df 8ae2 b42c0e2680c9
> 69: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63478 18f2a 8e8f b42c0e268476
> 70: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63f61 19a13 9274 b42c0e26885b
> 71: ffff880001511c00 20afec946 b42bffe0b604 130 1f890681eacdf 1f88e5d1fe399 b433ab005565 1685faae10b69
>
Okay, I think I know what's going on and why Glauber's patch causes
problems for you. It looks like your kernel is reading the kvmclock on
the AP before it is initialized. Looking at the guest side of things,
it seems entirely plausible this could happen.
You did mention printk timing causes the bug to appear? Perhaps it is
not just coincidental. Printk getting the time might very well call
back into the timer code before the clock is initialized, and you've got
tons of stuff in cpu_init and friends that are likely to want to printk
all kinds of bootup messages.
If this were in fact the case, the cmpxchg that was added by Glauber's
patch could leap your clock forward to some very uninitialized random
value and then you could end up stuck in a timeout loop for days, as you
are seeing.
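
To make that failure mode concrete, here is a minimal user-space sketch of
the "never go backwards" clamp. It is only an illustration under stated
assumptions: C11 atomics stand in for the kernel's atomic64 helpers, and the
names last_value_sim and clamp_monotonic are made up here, not taken from
the kernel source.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's global last_value. */
static _Atomic uint64_t last_value_sim;

/*
 * Return a time that never goes backwards across callers: if our
 * reading is older than the newest one published, hand back the
 * published one; otherwise try to publish ours.
 */
static uint64_t clamp_monotonic(uint64_t ret)
{
	uint64_t last = atomic_load(&last_value_sim);

	do {
		if (ret < last)
			return last;
		/* On failure, 'last' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(&last_value_sim, &last, ret));

	return ret;
}
```

Feeding it the ret values of rows 70 and 71 in that order publishes row 71's
bogus value; any later good reading (row 72 in the full log, for example)
is then smaller and gets replaced by the bogus one, so the clock appears
frozen.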
Can you try very simply disabling printk timing to see if that might be
the source of the bug? In the meantime, what kernel version do you have
in the guests?
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 1:38 ` Zachary Amsden
@ 2010-07-31 11:50 ` Arjan Koers
0 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 11:50 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm, Avi Kivity
On 2010-07-31 03:38, Zachary Amsden wrote:
>
> How are you printing shadow.version? From a local variable captured
> during the barrier window or directly in a printk afterwards? It should
> never go backwards like this, and the vcpus come from a zalloc. This is
> not easily explainable by anything other than a memory ordering or
> compiler issue.
I'm reading shadow.version after the do-while loop and storing it in
an array to print it after the kernel finishes booting. I had to defer
my debug printk's, because they were affecting the results.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 2:39 ` Zachary Amsden
@ 2010-07-31 11:53 ` Arjan Koers
2010-07-31 16:36 ` Arjan Koers
0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 11:53 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa
On 2010-07-31 04:39, Zachary Amsden wrote:
> On 07/30/2010 02:34 PM, Arjan Koers wrote:
>> On 2010-07-28 12:37, Avi Kivity wrote:
>>
>>> On 07/28/2010 12:00 AM, Arjan Koers wrote:
>>>
>>>> On 2010-07-26 20:59, Arjan Koers wrote:
>>>>
>>>>
>>>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
>>>>> kernels hang during boot.
>>>>>
>>>> It appears that last is way ahead of ret twice.
>>>> The kernel boots with this debug patch that makes the clock go
>>>> backwards if the difference is big:
>>>>
>>>> last = atomic64_read(&last_value);
>>>> do {
>>>> - if (ret< last)
>>>> - return last;
>>>> + if (ret< last) {
>>>> + if ( last - ret< 25000000 )
>>>> + return last;
>>>> + else
>>>> + printk("pvclock backwards: ret = %llx; last =
>>>> %llx\n", ret, last);
>>>> + }
>>>> last = atomic64_cmpxchg(&last_value, last, ret);
>>>> } while (unlikely(last != ret));
>>>>
>>>>
>>>>
>>>> [ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
>>>> [ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>>> [ 0.040000] pvclock backwards: ret = 108373705fe2; last =
>>>> 210aff61470a
>>>>
>>> Zaaaacchhhh?!
>>>
>>>
>>
>> The lists below show some debug data of the first 99 calls to
>> pvclock_clocksource_read since the kernel booted. The situation
>> after the 'do ... while (version != src->version)' loop is
>> displayed.
>>
>> Meaning of the columns:
>> - src pointer
>> - shadow.tsc_timestamp
>> - shadow.system_timestamp
>> - shadow.version
>> - native_read_tsc()
>> - delta = native_read_tsc() - shadow.tsc_timestamp
>> - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift)
>> - ret = shadow.system_timestamp + offset
>>
>> Fields left out, because they were the same for all rows:
>> - shadow.tsc_to_nsec_mul: b6dc43b6
>> - shadow.tsc_shift: ffffffff
>> - shadow.flags: 0
>>
>> Debug log of guest after cold boot of virtual machine:
<snip>
>> 70: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c
>> 232f63f61 19a13 9274 b42c0e26885b
>> 71: ffff880001511c00 20afec946 b42bffe0b604 130 1f890681eacdf
>> 1f88e5d1fe399 b433ab005565 1685faae10b69
>>
>
> Okay, I think I know what's going on and why Glauber's patch causes
> problems for you. It looks like your kernel is reading the kvmclock on
> the AP before it is initialized. Looking at the guest side of things,
> it seems entirely plausible this could happen.
>
> You did mention printk timing causes the bug to appear? Perhaps it is
> not just coincidental. Printk getting the time might very well call
> back into the timer code before the clock is initialized, and you've got
> tons of stuff in cpu_init and friends that are likely to want to printk
> all kinds of bootup messages.
>
> If this were in fact the case, the cmpxchg that was added by Glauber's
> patch could leap your clock forward to some very uninitialized random
> value and then you could end up stuck in a timeout loop for days, as you
> are seeing.
Yes. That large wrong value is stored in last_value and all future correct
values are ignored, because they are smaller than last_value.
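
For a sense of scale, here is a small sketch that converts the gap between
the bogus value and the last good one into days. It assumes ret is in
nanoseconds (as the "nsec" in tsc_to_nsec_mul suggests); the helper name
stall_days is made up for this example.

```c
#include <stdint.h>

#define NSEC_PER_DAY (86400ULL * 1000000000ULL)

/* Values of 'ret' taken from rows 70 and 71 of the debug log. */
static const uint64_t good_ret  = 0xb42c0e26885bULL;   /* CPU 0, row 70 */
static const uint64_t bogus_ret = 0x1685faae10b69ULL;  /* CPU 1, row 71 */

/* Days until readings starting at 'now' would catch up with 'last'. */
static double stall_days(uint64_t last, uint64_t now)
{
	return (double)(last - now) / (double)NSEC_PER_DAY;
}
```

stall_days(bogus_ret, good_ret) comes out at a little over two days, which
fits the "stuck for days" description.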
> Can you try very simply disabling printk timing to see if that might be
> the source of the bug? In the meantime, what kernel version do you have
> in the guests?
The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
I'm testing with 2.6.35-rc6 now. The problem also occurs with 2.6.34.1,
which also has Glauber's patch. Version 2.6.34 is working.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 11:53 ` Arjan Koers
@ 2010-07-31 16:36 ` Arjan Koers
2010-07-31 19:45 ` Arjan Koers
2010-07-31 23:55 ` Zachary Amsden
0 siblings, 2 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 16:36 UTC (permalink / raw)
To: kvm; +Cc: Zachary Amsden, Avi Kivity, Glauber Costa
On 2010-07-31 13:53, Arjan Koers wrote:
>
> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>
The problem occurs when this message is printed:
[ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
When I disable that printk, the kernel boots with
CONFIG_PRINTK_TIME=y
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
 	int low, high;
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
-	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
-	       cpu, high, low, txt);
+	/*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
+	       cpu, high, low, txt);*/
 
 	return native_write_msr_safe(msr_kvm_system_time, low, high);
 }
So the problem appears to be that the clock of the second CPU
is used too soon (or that clock setup should finish earlier).
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 16:36 ` Arjan Koers
@ 2010-07-31 19:45 ` Arjan Koers
2010-07-31 23:55 ` Zachary Amsden
1 sibling, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-07-31 19:45 UTC (permalink / raw)
To: kvm; +Cc: Zachary Amsden, Avi Kivity, Glauber Costa
On 2010-07-31 18:36, Arjan Koers wrote:
> On 2010-07-31 13:53, Arjan Koers wrote:
>>
>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>
>
> The problem occurs when this message is printed:
>
> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>
> When I disable that printk, the kernel boots with
> CONFIG_PRINTK_TIME=y
>
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> int low, high;
> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> - cpu, high, low, txt);
> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> + cpu, high, low, txt);*/
>
> return native_write_msr_safe(msr_kvm_system_time, low, high);
> }
>
> So the problem appears to be that the clock of the second CPU
> is used too soon (or that clock setup should finish earlier).
Moving the printk after native_write_msr_safe seems to solve the
problem:
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..ca43ce3 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -128,13 +128,15 @@ static struct clocksource kvm_clock = {
 static int kvm_register_clock(char *txt)
 {
 	int cpu = smp_processor_id();
-	int low, high;
+	int low, high, ret;
+
 	low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
 	high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+	ret = native_write_msr_safe(msr_kvm_system_time, low, high);
 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
 	       cpu, high, low, txt);
 
-	return native_write_msr_safe(msr_kvm_system_time, low, high);
+	return ret;
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
The debug log looks correct now:
67: ffff880001411a80 1b7772acb f797e782af86 7c82a2 1b7dd17c9 65ecfe 246717 f797e7a7169d
68: ffff880001411a80 1b8730d76 f797e7dca389 7c82b2 1b8892871 161afb 7e519 f797e7e488a2
69: ffff880001411a80 1b8730d76 f797e7dca389 7c82b2 1b8893281 16250b 7e8b1 f797e7e48c3a
70: ffff880001411a80 1b8730d76 f797e7dca389 7c82b2 1b8893c47 162ed1 7ec2e f797e7e48fb7
71: ffff880001511a80 2b55ba387fb14 f797e7ef37af e7c292 2b55ba388196a 1e56 ad5 f797e7ef4284
72: ffff880001411a80 1b8a96765 f797e7f00c69 7c82b6 1b8a9fed0 976b 3613 f797e7f0427c
73: ffff880001411a80 1b8a96765 f797e7f00c69 7c82b6 1b8ab712f 209ca ba5b f797e7f0c6c4
74: ffff880001411a80 1b8a96765 f797e7f00c69 7c82b6 1b8abd861 270fc df36 f797e7f0eb9f
75: ffff880001411a80 1b8a96765 f797e7f00c69 7c82b6 1b8ac3348 2cbe3 ffad f797e7f10c16
76: ffff880001511a80 2b55ba3b9c85c f797e8010094 e7c332 2b55ba3bc258a 25d2e d823 f797e801d8b7
77: ffff880001511a80 2b55ba3ce41f9 f797e8085071 e7c366 2b55ba3cec05b 7e62 2d23 f797e8087d94
78: ffff880001411a80 1b8d56d8c f797e7ffc53e 7c82b8 1b8eef620 198894 91e88 f797e808e3c6
79: ffff880001411a80 1b8d56d8c f797e7ffc53e 7c82b8 1b8ef182e 19aaa2 92ab2 f797e808eff0
80: ffff880001411a80 1b8d56d8c f797e7ffc53e 7c82b8 1b8f1d2ad 1c6521 a2429 f797e809e967
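
As a cross-check, the columns of a row can be recomputed by hand. The sketch
below is a portable C rendering of the scaling step (shift the delta, then
multiply by a 32-bit fixed-point fraction). It is written from the column
definitions given earlier in the thread, not copied from the kernel's
scale_delta; the *_sim names are made up, and it assumes the logged
tsc_shift of ffffffff means -1 and that tsc_to_nsec_mul is still b6dc43b6.

```c
#include <stdint.h>

/* Scale a TSC delta to nanoseconds: (delta << shift) * mul_frac / 2^32. */
static uint64_t scale_delta_sim(uint64_t delta, uint32_t mul_frac, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* 64x32-bit multiply, keeping bits 32 and up, without overflowing. */
	return (delta >> 32) * mul_frac +
	       (((delta & 0xffffffffULL) * mul_frac) >> 32);
}

/* ret = system_timestamp + scaled (tsc - tsc_timestamp) */
static uint64_t pvclock_ret_sim(uint64_t tsc, uint64_t tsc_timestamp,
				uint64_t system_timestamp,
				uint32_t mul_frac, int shift)
{
	return system_timestamp +
	       scale_delta_sim(tsc - tsc_timestamp, mul_frac, shift);
}
```

Feeding in row 71 (tsc 2b55ba388196a, tsc_timestamp 2b55ba387fb14,
system_timestamp f797e7ef37af) with mul b6dc43b6 and shift -1 gives delta
1e56, offset ad5 and ret f797e7ef4284, matching the log.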
The only strange thing remaining is that the time for the first printk
isn't what I expected:
[ 0.016000] kvm-clock: cpu 1, msr 0:1511a81, secondary cpu clock
When I added some extra printk's immediately after that one, the
time on those was correct.
Here's a partial boot log:
...
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] hpet clockevent registered
[ 0.000000] Detected 2799.950 MHz processor.
[ 0.016000] Calibrating delay loop (skipped) preset value.. 5599.90 BogoMIPS (lpj=11199800)
[ 0.016000] pid_max: default: 32768 minimum: 301
[ 0.016000] Mount-cache hash table entries: 256
[ 0.016000] using C1E aware idle routine
[ 0.016000] Performance Events: AMD PMU driver.
[ 0.016000] ... version: 0
[ 0.016000] ... bit width: 48
[ 0.016000] ... generic registers: 4
[ 0.016000] ... value mask: 0000ffffffffffff
[ 0.016004] ... max period: 00007fffffffffff
[ 0.016406] ... fixed-purpose events: 0
[ 0.016744] ... event mask: 000000000000000f
[ 0.021404] Freeing SMP alternatives: 12k freed
[ 0.021836] ACPI: Core revision 20100428
[ 0.023882] Setting APIC routing to flat
[ 0.025659] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.026129] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.028000] Booting Node 0, Processors #1 Ok.
[ 0.016000] kvm-clock: cpu 1, msr 0:1511a81, secondary cpu clock
[ 0.036812] Brought up 2 CPUs
[ 0.036820] Total of 2 processors activated (11199.80 BogoMIPS).
[ 0.036802] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[ 0.040357] NET: Registered protocol family 16
[ 0.044159] ACPI: bus type pci registered
...
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 16:36 ` Arjan Koers
2010-07-31 19:45 ` Arjan Koers
@ 2010-07-31 23:55 ` Zachary Amsden
2010-08-02 14:43 ` Glauber Costa
1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-07-31 23:55 UTC (permalink / raw)
To: Arjan Koers; +Cc: kvm, Avi Kivity, Glauber Costa
On 07/31/2010 06:36 AM, Arjan Koers wrote:
> On 2010-07-31 13:53, Arjan Koers wrote:
>
>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>
>>
> The problem occurs when this message is printed:
>
> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>
> When I disable that printk, the kernel boots with
> CONFIG_PRINTK_TIME=y
>
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> int low, high;
> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> - cpu, high, low, txt);
> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> + cpu, high, low, txt);*/
>
> return native_write_msr_safe(msr_kvm_system_time, low, high);
> }
>
> So the problem appears to be that the clock of the second CPU
> is used too soon (or that clock setup should finish earlier).
>
That's almost hilarious. The printk from setting up the kvm clock is
invoking the kvm clock before it is set up.
There's no reason other printks couldn't do the same thing, however. I
think it's safest to keep an initialized flag and check for it before
attempting to return a meaningful value.
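
Something along these lines, as a rough user-space sketch only: the struct
and function names are invented for illustration, and this is not a
proposed patch against the real pvclock code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU clock state; not the real pvclock structures. */
struct cpu_clock_sim {
	bool     initialized;   /* set only after the time MSR has been written */
	uint64_t now_ns;        /* stand-in for the computed pvclock value */
};

/*
 * Return 0 until this CPU's clock area has been registered, so an
 * early caller (e.g. a timestamped printk) cannot publish garbage.
 */
static uint64_t clock_read_guarded(const struct cpu_clock_sim *c)
{
	if (!c->initialized)
		return 0;
	return c->now_ns;
}
```

A 0 returned this way is harmless to the global check, since any reading
smaller than last_value is simply replaced by last_value.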
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-30 22:54 ` Zachary Amsden
@ 2010-08-02 10:12 ` Andre Przywara
0 siblings, 0 replies; 81+ messages in thread
From: Andre Przywara @ 2010-08-02 10:12 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Avi Kivity, glommer@redhat.com, KVM list
Zachary Amsden wrote:
> On 07/28/2010 02:25 AM, Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Andre Przywara wrote:
>>>> Avi Kivity wrote:
>>>>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>>> Weird. Maybe the clock goes crazy.
>>>>>>>
>>>>>>> Let's see if it jumps forward a lot:
>>>>>>>
>>>>>>> } while (unlikely(last != ret));
>>>>>>> +
>>>>>>> + {
>>>>>>> + static u64 last_report;
>>>>>>> + if (ret > last_report + 10000) {
>>>>>>> + last_report = ret;
>>>>>>> + printk("kvmclock: %llx\n", ret);
>>>>>>> + }
>>>>>>> +
>>>>>>> + }
>>>>>>>
>>>>>>> return ret;
>>>>>>> }
>>>>>>>
>>>>>>> Worth updating the 'return last' to update ret and goto the new
>>>>>>> code, so we don't miss that path.
>>>>>> Did that. There is _a lot_ of output (about 350 lines per second
>>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>>> The majority is differing about 2,000,000 (ticks?), but a handful
>>>>>> of them are in the range of 20 million.
>>>>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>>>>
>>>>>> No difference between smp=2 and smp=1.
>>>>>> I also get some "BUG: recent printk recursion!" and I don't see
>>>>>> any kernel boot progress beyond outputting the BogoMIPS value.
>>>>> Right, printk() wants the time too.
>>>>>
>>>>>> BTW: I found two message from your earlier debug statement:
>>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>>> Those are from kvmclock initialization, not from the older patch.
>>>>>
>>>>> I'm completely confused, everything seems to be in order.
>>>>>
>>>>> Let's see. if you s/return last/return ret/ in the original, does
>>>>> this help things along? this makes pvclock drop the computation
>>>>> and should be exactly the same as before the patch.
>>>> Yes, this works, both smp versions boot. I see a very short
>>>> break after the line in question, but then it proceeds well.
>>>> Thanks for your help, now I got a much better insight into the
>>>> issue. I will see if I can find something more.
>>> Did some more investigations, some observations:
>>> - The cmpxchg does not seem to be a problem, I didn't see the loop
>>> iterated more than once.
>>> - Turning off printk-timestamps makes the bug go away. But I guess it
>>> is just hiding or deferring it, and it's no real workaround anyway.
>>> - I instrumented the "if (ret < last) return last;" statement, when
>>> the kernel hangs I get only printks from there, although it has hit
>>> before:
>>> ----------
>>> [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>> [ 0.820000] returning last instead (cnt=19001)
>>> [ 0.820000] returning last instead (cnt=20001)
>>> The last line repeats forever with the same timestamp, the counter
>>> (counting the number of "return last;") increments about 3500
>>> times/second.
>>>
>>> I will see if I find something more...
>> Added some more instrumentation, seems like the values read from the
>> pvclock is bogus *sometimes*:
>> returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
>> This is from the first time the if-statement triggers. So I guess the
>> value read is ridiculously far in the future (multiple days), so next
>> calls to clocksource_read() will always return this bogus last value.
>> This means that the clock does not make progress (for several days)
>> and thus timing loops will never come to an end. I also instrumented
>> the serial driver, the last thing I saw was autoconfig_irq, where
>> obviously udelay() is called.
>>
>> Does that ring a bell with someone?
>>
>> I will now concentrate on the pvclock readout/HV write part to see
>> which of the values used here are wrong.
>
> Have you gotten any further results on this?
Somewhat. I think my latest findings were more or less ghost bugs: since
printks contain a timestamp, they interfere with the actual code. The
large gap I described above was only seen with these printks; it
is more or less double the real value (which is my host's uptime).
Sadly I cannot use debugfs to avoid the printks, since the kernel halts
and I don't get to userland.
On another try I managed to bisect the failure also in qemu-kvm. The bug
triggers only with "ebc4f45 turn off kvmclock when resetting cpu"
applied (_additionally_ to the kernel patch in question).
When I comment out the call to kvm_reset_msrs() in the master branch,
this also lets the bug vanish.
>
> I think the most likely explanation is that your host CPU has TSC out of
> sync, and somehow this leaks over to pvclock. Am I correct that it
> happens even with one guest VCPU? What if you disable secondary host CPUs?
I tried several ways to pin vCPUs to different host CPUs (cores and
sockets): both vCPUs on one core, both vCPUs on different cores on the
same socket, and both vCPUs on different sockets/nodes. None of that
made any difference; the kernel halted in every case.
I also tried booting the host with maxcpus=1, the error was still the
same: -smp 1 works, -smp 2 halts.
Btw.: the host uses clocksource acpi_pm. I also noticed that sometimes
the guest gets very slow after having switched the clocksource to
kvmclock; it then eventually halts at the mentioned line.
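For readers following along, the stall described in the quoted analysis can
be reproduced outside the kernel. Below is a minimal userspace sketch,
assuming the read path keeps a global last value and returns it whenever a
new reading is smaller, as in the instrumented "if (ret < last) return last;"
statement. The names demo_last_value and demo_monotonic_read are made up for
this illustration and are not kernel symbols.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the global "last value" the read path keeps. */
static _Atomic uint64_t demo_last_value;

/*
 * Return `raw` unless an earlier call already returned something larger,
 * in which case that larger value is returned again.
 */
static uint64_t demo_monotonic_read(uint64_t raw)
{
    uint64_t last = atomic_load(&demo_last_value);

    do {
        if (raw < last)
            return last;    /* clock would go backwards: clamp */
        /* A failed CAS reloads `last`, and the check above is repeated. */
    } while (!atomic_compare_exchange_weak(&demo_last_value, &last, raw));

    return raw;
}
```

Once a single far-future value such as 2778021535795841 has been stored,
every later, saner reading is replaced by it, so a delay loop that waits for
the clock to advance does not finish until real time catches up.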
Regards,
Andre.
--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-07-31 23:55 ` Zachary Amsden
@ 2010-08-02 14:43 ` Glauber Costa
2010-08-02 16:16 ` Arjan Koers
2010-08-02 20:26 ` Zachary Amsden
0 siblings, 2 replies; 81+ messages in thread
From: Glauber Costa @ 2010-08-02 14:43 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Arjan Koers, kvm, Avi Kivity
On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
> On 07/31/2010 06:36 AM, Arjan Koers wrote:
> >On 2010-07-31 13:53, Arjan Koers wrote:
> >>The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> >>
> >The problem occurs when this message is printed:
> >
> >[ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> >
> >When I disable that printk, the kernel boots with
> >CONFIG_PRINTK_TIME=y
> >
> >--- a/arch/x86/kernel/kvmclock.c
> >+++ b/arch/x86/kernel/kvmclock.c
> >@@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> > int low, high;
> > low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> > high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
> >- printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >- cpu, high, low, txt);
> >+ /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >+ cpu, high, low, txt);*/
> >
> > return native_write_msr_safe(msr_kvm_system_time, low, high);
> > }
> >
> >So the problem appears to be that the clock of the second CPU
> >is used too soon (or that clock setup should finish earlier).
>
> That's almost hilarious. The printk from setting up the kvm clock
> is invoking the kvm clock before it is setup.
>
> There's no reason other printks couldn't do the same thing, however.
> I think it's safest to keep an initialized flag and check for it
> before attempting to return a meaningful value.
I was on vacation and just got back.
I think it is safe to just patch our own use of it. Before that, all other
printks will be handled by the main cpu anyway, since it'll be the only one active
at that moment. The only possible offenders are us and the cpu initialization
code, which is already fragile in multiple ways anyway.
A flag would only make things more complicated and dirty.
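For comparison, the "initialized flag" idea quoted above could look roughly
like the following. This is only a sketch of the proposal under discussion,
not code from any tree: struct demo_clock, its registered field and
demo_clock_read are hypothetical names, and the real per-cpu structure has no
such flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cpu clock state with an explicit "ready" flag. */
struct demo_clock {
    bool     registered;    /* set only after the area has been registered */
    uint64_t system_time;   /* whatever the area currently holds, in ns */
};

/* Report 0 until registration has happened, the stored time afterwards. */
static uint64_t demo_clock_read(const struct demo_clock *c)
{
    if (!c->registered)
        return 0;
    return c->system_time;
}
```

The cost argued about in this thread is visible even here: every read pays
for an extra branch, and the flag has to be kept correct across every path
that sets up or tears down the clock.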
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 14:43 ` Glauber Costa
@ 2010-08-02 16:16 ` Arjan Koers
2010-08-02 18:07 ` Glauber Costa
2010-08-02 20:26 ` Zachary Amsden
1 sibling, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-08-02 16:16 UTC (permalink / raw)
To: kvm; +Cc: Glauber Costa, Zachary Amsden, Avi Kivity, Andre Przywara
On 2010-08-02 16:43, Glauber Costa wrote:
> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>
>>> The problem occurs when this message is printed:
>>>
>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>
>>> When I disable that printk, the kernel boots with
>>> CONFIG_PRINTK_TIME=y
>>>
>>> --- a/arch/x86/kernel/kvmclock.c
>>> +++ b/arch/x86/kernel/kvmclock.c
>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>> int low, high;
>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> - cpu, high, low, txt);
>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> + cpu, high, low, txt);*/
>>>
>>> return native_write_msr_safe(msr_kvm_system_time, low, high);
>>> }
>>>
>>> So the problem appears to be that the clock of the second CPU
>>> is used too soon (or that clock setup should finish earlier).
>>
>> That's almost hilarious. The printk from setting up the kvm clock
>> is invoking the kvm clock before it is setup.
>>
>> There's no reason other printks couldn't do the same thing, however.
>> I think it's safest to keep an initialized flag and check for it
>> before attempting to return a meaningful value.
>
> I was on vacations, just got back.
>
> I think it is safe to just patch our own use of it. Before that, all other
> printks will be handled by the main cpu anyway, since it'll be the only one active
> at the moment. The only possible offenders for this are us, and the cpu initialization
> code, which is already fragile in multiple ways anyway.
>
> A flag would only make things more complicated and dirty
Maybe you could add a sanity check in pvclock_clocksource_read
after 'do { ... } while (version != src->version)' that
returns last_value if offset is extremely large?
I've performed some more boot tests (about 20) with the patch that
moves the printk after native_write_msr_safe and it works for me.
Andre Przywara confirmed to me that it also fixes his problem.
A slightly modified version of the patch for 2.6.34.1 also works
(800+ successful boot cycles).
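To make the sanity-check suggestion concrete, here is a rough sketch. It
assumes "offset" means the nanoseconds elapsed since the hypervisor last
updated the time structure, and it picks a one-hour limit purely for
illustration. DEMO_MAX_OFFSET_NS and demo_checked_time are invented names,
and the threshold is an assumption, not a value anyone proposed.

```c
#include <stdint.h>

/* Arbitrary illustration threshold: one hour, in nanoseconds. */
#define DEMO_MAX_OFFSET_NS (3600ULL * 1000000000ULL)

/*
 * Combine the base time with the offset, unless the offset is implausibly
 * large. In that case fall back to the previously returned value and report
 * the event through *suspicious so that a caller could also just warn.
 */
static uint64_t demo_checked_time(uint64_t system_time, uint64_t offset_ns,
                                  uint64_t last_value, int *suspicious)
{
    *suspicious = offset_ns > DEMO_MAX_OFFSET_NS;
    if (*suspicious)
        return last_value;
    return system_time + offset_ns;
}
```

Reporting the condition separately from the return value leaves the policy
(clamp silently, or only warn) to the caller.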
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 16:16 ` Arjan Koers
@ 2010-08-02 18:07 ` Glauber Costa
0 siblings, 0 replies; 81+ messages in thread
From: Glauber Costa @ 2010-08-02 18:07 UTC (permalink / raw)
To: Arjan Koers; +Cc: kvm, Zachary Amsden, Avi Kivity, Andre Przywara
On Mon, Aug 02, 2010 at 06:16:16PM +0200, Arjan Koers wrote:
> On 2010-08-02 16:43, Glauber Costa wrote:
> > On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
> >> On 07/31/2010 06:36 AM, Arjan Koers wrote:
> >>> On 2010-07-31 13:53, Arjan Koers wrote:
> >>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> >>>>
> >>> The problem occurs when this message is printed:
> >>>
> >>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> >>>
> >>> When I disable that printk, the kernel boots with
> >>> CONFIG_PRINTK_TIME=y
> >>>
> >>> --- a/arch/x86/kernel/kvmclock.c
> >>> +++ b/arch/x86/kernel/kvmclock.c
> >>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> >>> int low, high;
> >>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> >>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
> >>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>> - cpu, high, low, txt);
> >>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>> + cpu, high, low, txt);*/
> >>>
> >>> return native_write_msr_safe(msr_kvm_system_time, low, high);
> >>> }
> >>>
> >>> So the problem appears to be that the clock of the second CPU
> >>> is used too soon (or that clock setup should finish earlier).
> >>
> >> That's almost hilarious. The printk from setting up the kvm clock
> >> is invoking the kvm clock before it is setup.
> >>
> >> There's no reason other printks couldn't do the same thing, however.
> >> I think it's safest to keep an initialized flag and check for it
> >> before attempting to return a meaningful value.
> >
> > I was on vacations, just got back.
> >
> > I think it is safe to just patch our own use of it. Before that, all other
> > printks will be handled by the main cpu anyway, since it'll be the only one active
> > at the moment. The only possible offenders for this are us, and the cpu initialization
> > code, which is already fragile in multiple ways anyway.
> >
> > A flag would only make things more complicated and dirty
>
> Maybe you could add a sanity check in pvclock_clocksource_read
> after 'do { ... } while (version != src->version)' that
> returns last_value if offset is extremely large?
I am not against adding a check, but only if the resulting action is
warn-only. Otherwise we can paper over this, and forget the real bugs.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 14:43 ` Glauber Costa
2010-08-02 16:16 ` Arjan Koers
@ 2010-08-02 20:26 ` Zachary Amsden
2010-08-02 21:10 ` Glauber Costa
2010-08-02 21:35 ` Arjan Koers
1 sibling, 2 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-08-02 20:26 UTC (permalink / raw)
To: Glauber Costa; +Cc: Arjan Koers, kvm, Avi Kivity
[-- Attachment #1: Type: text/plain, Size: 1991 bytes --]
On 08/02/2010 04:43 AM, Glauber Costa wrote:
> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>
>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>
>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>
>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>
>>>>
>>> The problem occurs when this message is printed:
>>>
>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>
>>> When I disable that printk, the kernel boots with
>>> CONFIG_PRINTK_TIME=y
>>>
>>> --- a/arch/x86/kernel/kvmclock.c
>>> +++ b/arch/x86/kernel/kvmclock.c
>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>> int low, high;
>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> - cpu, high, low, txt);
>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>> + cpu, high, low, txt);*/
>>>
>>> return native_write_msr_safe(msr_kvm_system_time, low, high);
>>> }
>>>
>>> So the problem appears to be that the clock of the second CPU
>>> is used too soon (or that clock setup should finish earlier).
>>>
>> That's almost hilarious. The printk from setting up the kvm clock
>> is invoking the kvm clock before it is setup.
>>
>> There's no reason other printks couldn't do the same thing, however.
>> I think it's safest to keep an initialized flag and check for it
>> before attempting to return a meaningful value.
>>
> I was on vacations, just got back.
>
> I think it is safe to just patch our own use of it. Before that, all other
> printks will be handled by the main cpu anyway, since it'll be the only one active
> at the moment. The only possible offenders for this are us, and the cpu initialization
> code, which is already fragile in multiple ways anyway.
>
> A flag would only make things more complicated and dirty
>
Can we just do this?
[-- Attachment #2: zero.patch --]
[-- Type: text/plain, Size: 855 bytes --]
Initialize hv_clock to zero
This stops callers from getting random values if data is accessed before
clock is initialized; instead they will get zeroed clock values (because
computation involves a multiplication by a factor in hv_clock).
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..e7acd0d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -40,7 +40,7 @@ static int parse_no_kvmclock(char *arg)
early_param("no-kvmclock", parse_no_kvmclock);
/* The hypervisor will put information about time periodically here */
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock) = {0};
static struct pvclock_wall_clock wall_clock;
/*
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 20:26 ` Zachary Amsden
@ 2010-08-02 21:10 ` Glauber Costa
2010-08-02 21:35 ` Arjan Koers
1 sibling, 0 replies; 81+ messages in thread
From: Glauber Costa @ 2010-08-02 21:10 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Arjan Koers, kvm, Avi Kivity
On Mon, Aug 02, 2010 at 10:26:30AM -1000, Zachary Amsden wrote:
> On 08/02/2010 04:43 AM, Glauber Costa wrote:
> >On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
> >>On 07/31/2010 06:36 AM, Arjan Koers wrote:
> >>>On 2010-07-31 13:53, Arjan Koers wrote:
> >>>>The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
> >>>>
> >>>The problem occurs when this message is printed:
> >>>
> >>>[ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
> >>>
> >>>When I disable that printk, the kernel boots with
> >>>CONFIG_PRINTK_TIME=y
> >>>
> >>>--- a/arch/x86/kernel/kvmclock.c
> >>>+++ b/arch/x86/kernel/kvmclock.c
> >>>@@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
> >>> int low, high;
> >>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> >>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
> >>>- printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>>- cpu, high, low, txt);
> >>>+ /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> >>>+ cpu, high, low, txt);*/
> >>>
> >>> return native_write_msr_safe(msr_kvm_system_time, low, high);
> >>> }
> >>>
> >>>So the problem appears to be that the clock of the second CPU
> >>>is used too soon (or that clock setup should finish earlier).
> >>That's almost hilarious. The printk from setting up the kvm clock
> >>is invoking the kvm clock before it is setup.
> >>
> >>There's no reason other printks couldn't do the same thing, however.
> >>I think it's safest to keep an initialized flag and check for it
> >>before attempting to return a meaningful value.
> >I was on vacations, just got back.
> >
> >I think it is safe to just patch our own use of it. Before that, all other
> >printks will be handled by the main cpu anyway, since it'll be the only one active
> >at the moment. The only possible offenders for this are us, and the cpu initialization
> >code, which is already fragile in multiple ways anyway.
> >
> >A flag would only make things more complicated and dirty
> Can we just do this?
> Initialize hv_clock to zero
>
> This stops callers from getting random values if data is accessed before
> clock is initialized; instead they will get zeroed clock values (because
> computation involves a multiplication by a factor in hv_clock).
>
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index eb9b76c..e7acd0d 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -40,7 +40,7 @@ static int parse_no_kvmclock(char *arg)
> early_param("no-kvmclock", parse_no_kvmclock);
>
> /* The hypervisor will put information about time periodically here */
> -static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock) = {0};
> static struct pvclock_wall_clock wall_clock;
We can, but I am a little bit afraid that it won't initialize all the per-cpu areas.
If it does, it is fine, though.
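The reasoning in the quoted patch description (a zeroed structure yields a
zeroed clock because the result involves a multiplication by a factor stored
in it) can be checked with a small model. This is a simplified sketch: struct
demo_time_info keeps only three fields, ignores the shift, and demo_read is a
made-up helper rather than the kernel's read function.

```c
#include <stdint.h>

/* Trimmed-down, hypothetical model of the per-cpu time information. */
struct demo_time_info {
    uint64_t tsc_timestamp;     /* TSC value at the last update */
    uint64_t system_time;       /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC-to-ns factor */
};

/* system_time plus the scaled TSC delta; the shift step is left out. */
static uint64_t demo_read(const struct demo_time_info *ti, uint64_t tsc)
{
    uint64_t delta = tsc - ti->tsc_timestamp;
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    /* (delta * mul) >> 32, split so the product fits in 64 bits. */
    uint64_t scaled = hi * ti->tsc_to_system_mul +
                      ((lo * ti->tsc_to_system_mul) >> 32);

    return ti->system_time + scaled;
}
```

With every field zero the function returns 0 for any TSC input. Whether the
initializer really reaches every per-cpu copy is a separate question that
this model does not answer.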
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 20:26 ` Zachary Amsden
2010-08-02 21:10 ` Glauber Costa
@ 2010-08-02 21:35 ` Arjan Koers
2010-08-03 0:00 ` Zachary Amsden
` (2 more replies)
1 sibling, 3 replies; 81+ messages in thread
From: Arjan Koers @ 2010-08-02 21:35 UTC (permalink / raw)
To: Zachary Amsden; +Cc: Glauber Costa, kvm, Avi Kivity, Andre Przywara
[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]
On 2010-08-02 22:26, Zachary Amsden wrote:
> On 08/02/2010 04:43 AM, Glauber Costa wrote:
>> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>>
>>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>>
>>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>>
>>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>>
>>>>>
>>>> The problem occurs when this message is printed:
>>>>
>>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>>
>>>> When I disable that printk, the kernel boots with
>>>> CONFIG_PRINTK_TIME=y
>>>>
>>>> --- a/arch/x86/kernel/kvmclock.c
>>>> +++ b/arch/x86/kernel/kvmclock.c
>>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>> int low, high;
>>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
>>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>> - cpu, high, low, txt);
>>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>> + cpu, high, low, txt);*/
>>>>
>>>> return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>> }
>>>>
>>>> So the problem appears to be that the clock of the second CPU
>>>> is used too soon (or that clock setup should finish earlier).
>>>>
>>> That's almost hilarious. The printk from setting up the kvm clock
>>> is invoking the kvm clock before it is setup.
>>>
>>> There's no reason other printks couldn't do the same thing, however.
>>> I think it's safest to keep an initialized flag and check for it
>>> before attempting to return a meaningful value.
>>>
>> I was on vacations, just got back.
>>
>> I think it is safe to just patch our own use of it. Before that, all
>> other
>> printks will be handled by the main cpu anyway, since it'll be the
>> only one active
>> at the moment. The only possible offenders for this are us, and the
>> cpu initialization
>> code, which is already fragile in multiple ways anyway.
>>
>> A flag would only make things more complicated and dirty
>>
> Can we just do this?
Sorry, the patch doesn't help. See line 68 in my debug log:
65: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0631e 375d47 13c5ce 15655813de60
66: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0653b 375f64 13c68f 15655813df21
67: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c06746 37616f 13c74a 15655813dfdc
68: ffff880001511c00 1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886 15695466a1e5 2acea0f4244f
69: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f424aa 28d0 e93 1565582659b1
70: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4a1e0 a606 3b4b 156558268669
71: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4ba63 be89 440b 156558268f29
72: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4d8e7 dd0d 4ef1 156558269a0f
73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c 371d6 13aef 15655827223a
74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec 38896 1430f 156558272a5a
I don't think that pvclock_clocksource_read is receiving
completely random uninitialized data. The values in shadow
are wrong, but could be interpreted as valid data
(shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
shadow.flags = 0 and shadow.version is always even).
I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
anyone needs them...
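For anyone trying to follow the numbers, here is a sketch of how a TSC delta
is turned into nanoseconds from a multiplier and a shift like the ones quoted
above. It assumes tsc_shift = ffffffff is to be read as -1 and that the
multiplier is a 32.32 fixed-point fraction. demo_scale_delta is an
illustrative name, not the kernel function.

```c
#include <stdint.h>

/*
 * Scale a TSC delta to nanoseconds: apply the shift (right for negative
 * values, left otherwise), then multiply by mul_frac / 2^32.
 * The shift is assumed to be in the range -63..63.
 */
static uint64_t demo_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    uint64_t hi, lo;

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* (delta * mul_frac) >> 32 without needing a 128-bit type. */
    hi = delta >> 32;
    lo = delta & 0xffffffffu;
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

With a sane multiplier and shift, the output is only as good as the delta fed
in. If the columns of the log are read as TSC timestamp, system time, version,
current TSC, delta, scaled delta and result (my reading, not a documented
format), then line 68 has a delta of 3bf3f5762886 where its neighbours have
about 375d47, which points at the TSC inputs rather than at the scaling
factors.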
[-- Attachment #2: 2.6.34.1.patch --]
[-- Type: text/x-patch, Size: 1055 bytes --]
Move a printk that's using the clock before it's ready
Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.
Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..71bf2df 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
static int kvm_register_clock(char *txt)
{
int cpu = smp_processor_id();
- int low, high;
+ int low, high, ret;
+
low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+ ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt);
- return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+ return ret;
}
#ifdef CONFIG_X86_LOCAL_APIC
[-- Attachment #3: 2.6.35.patch --]
[-- Type: text/x-patch, Size: 1055 bytes --]
Move a printk that's using the clock before it's ready
Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.
Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..ca43ce3 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -128,13 +128,15 @@ static struct clocksource kvm_clock = {
static int kvm_register_clock(char *txt)
{
int cpu = smp_processor_id();
- int low, high;
+ int low, high, ret;
+
low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+ ret = native_write_msr_safe(msr_kvm_system_time, low, high);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt);
- return native_write_msr_safe(msr_kvm_system_time, low, high);
+ return ret;
}
#ifdef CONFIG_X86_LOCAL_APIC
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 21:35 ` Arjan Koers
@ 2010-08-03 0:00 ` Zachary Amsden
2010-09-28 11:16 ` Michael Tokarev
2010-09-29 8:28 ` Avi Kivity
2 siblings, 0 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-08-03 0:00 UTC (permalink / raw)
To: Arjan Koers; +Cc: Glauber Costa, kvm, Avi Kivity, Andre Przywara
[-- Attachment #1: Type: text/plain, Size: 3909 bytes --]
On 08/02/2010 11:35 AM, Arjan Koers wrote:
> On 2010-08-02 22:26, Zachary Amsden wrote:
>
>> On 08/02/2010 04:43 AM, Glauber Costa wrote:
>>
>>> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>>>
>>>
>>>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>>>
>>>>
>>>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>>>
>>>>>
>>>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>>>
>>>>>>
>>>>>>
>>>>> The problem occurs when this message is printed:
>>>>>
>>>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>>>
>>>>> When I disable that printk, the kernel boots with
>>>>> CONFIG_PRINTK_TIME=y
>>>>>
>>>>> --- a/arch/x86/kernel/kvmclock.c
>>>>> +++ b/arch/x86/kernel/kvmclock.c
>>>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>>> int low, high;
>>>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
>>>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>>> - cpu, high, low, txt);
>>>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>>> + cpu, high, low, txt);*/
>>>>>
>>>>> return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>>> }
>>>>>
>>>>> So the problem appears to be that the clock of the second CPU
>>>>> is used too soon (or that clock setup should finish earlier).
>>>>>
>>>>>
>>>> That's almost hilarious. The printk from setting up the kvm clock
>>>> is invoking the kvm clock before it is setup.
>>>>
>>>> There's no reason other printks couldn't do the same thing, however.
>>>> I think it's safest to keep an initialized flag and check for it
>>>> before attempting to return a meaningful value.
>>>>
>>>>
>>> I was on vacations, just got back.
>>>
>>> I think it is safe to just patch our own use of it. Before that, all
>>> other
>>> printks will be handled by the main cpu anyway, since it'll be the
>>> only one active
>>> at the moment. The only possible offenders for this are us, and the
>>> cpu initialization
>>> code, which is already fragile in multiple ways anyway.
>>>
>>> A flag would only make things more complicated and dirty
>>>
>>>
>> Can we just do this?
>>
>
> Sorry, the patch doesn't help. See line 68 in my debug log:
> 65: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0631e 375d47 13c5ce 15655813de60
> 66: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0653b 375f64 13c68f 15655813df21
> 67: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c06746 37616f 13c74a 15655813dfdc
> 68: ffff880001511c00 1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886 15695466a1e5 2acea0f4244f
>
This is a separate bug. See attached patch (it won't apply, it's part
of a series, but shows the bug).
> 69: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f424aa 28d0 e93 1565582659b1
> 70: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4a1e0 a606 3b4b 156558268669
> 71: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4ba63 be89 440b 156558268f29
> 72: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4d8e7 dd0d 4ef1 156558269a0f
> 73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c 371d6 13aef 15655827223a
> 74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec 38896 1430f 156558272a5a
>
> I don't think that pvclock_clocksource_read is receiving
> completely random uninitialized data. The values in shadow
> are wrong, but could be interpreted as valid data
> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
> shadow.flags = 0 and shadow.version is always even).
>
Copied from the first CPU possibly?
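The distinction the attached patch draws can be put in a few lines. This is a
sketch under the assumption that the guest-visible TSC is the host TSC plus
the VMCB tsc_offset. The helper names are invented for illustration, and this
is not the body of guest_write_tsc().

```c
#include <stdint.h>

/* Guest-visible TSC under the assumed model: host TSC plus offset. */
static uint64_t demo_guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
    return host_tsc + tsc_offset;   /* unsigned wrap-around is intended */
}

/* Offset that makes the guest read `wanted` at the moment of the write. */
static uint64_t demo_offset_for(uint64_t host_tsc, uint64_t wanted)
{
    return wanted - host_tsc;
}
```

An offset of 0 therefore exposes the host TSC unchanged, while making the
guest TSC start at 0 needs an offset equal to minus the host TSC at reset
time.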
[-- Attachment #2: 0004-Fix-SVM-VMCB-reset.patch --]
[-- Type: text/plain, Size: 1031 bytes --]
From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
From: Zachary Amsden <zamsden@redhat.com>
Date: Sat, 29 May 2010 17:52:46 -1000
Subject: [KVM V2 04/25] Fix SVM VMCB reset
Cc: Avi Kivity <avi@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Glauber Costa <glommer@redhat.com>,
linux-kernel@vger.kernel.org
On reset, VMCB TSC should be set to zero. Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/kvm/svm.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 760c86e..46856d2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
- control->tsc_offset = 0;
+ guest_write_tsc(&svm->vcpu, 0);
control->int_ctl = V_INTR_MASKING_MASK;
init_seg(&save->es);
--
1.7.1
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 21:35 ` Arjan Koers
2010-08-03 0:00 ` Zachary Amsden
@ 2010-09-28 11:16 ` Michael Tokarev
2010-09-29 8:12 ` Michael Tokarev
2010-09-29 8:28 ` Avi Kivity
2 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-28 11:16 UTC (permalink / raw)
To: kvm
Arjan Koers <0h61vkll2ly8@xutrox.com> writes:
[]
> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
> anyone needs them...
>
>
> Move a printk that's using the clock before it's ready
>
> Fix a hang during SMP kernel boot on KVM that showed up
> after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
> (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
> (2.6.34.1). The problem only occurs when
> CONFIG_PRINTK_TIME is set.
>
> Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index feaeb0d..71bf2df 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
> static int kvm_register_clock(char *txt)
> {
> int cpu = smp_processor_id();
> - int low, high;
> + int low, high, ret;
> +
> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
> high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
> + ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
> printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> cpu, high, low, txt);
> - return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
> +
> + return ret;
> }
>
> #ifdef CONFIG_X86_LOCAL_APIC
Folks, should this be sent to the -stable kernel? It is not in any
upstream kernel as far as I can see (not in Linus' tree either), but
this is quite an issue and is hitting people....
The discussion stalled quite a while ago too -- this email has
Date: Mon, 02 Aug 2010 23:35:28 +0200.
Thanks!
/mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-28 11:16 ` Michael Tokarev
@ 2010-09-29 8:12 ` Michael Tokarev
0 siblings, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-09-29 8:12 UTC (permalink / raw)
To: kvm; +Cc: Arjan Koers, Zachary Amsden, Glauber Costa, Avi Kivity,
Andre Przywara
Ping? ;)
28.09.2010 15:16, Michael Tokarev wrote:
> Arjan Koers <0h61vkll2ly8@xutrox.com> writes:
>
>> Move a printk that's using the clock before it's ready
>>
>> Fix a hang during SMP kernel boot on KVM that showed up
>> after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>> (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
>> (2.6.34.1). The problem only occurs when
>> CONFIG_PRINTK_TIME is set.
>>
>> Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>
>>
>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> index feaeb0d..71bf2df 100644
>> --- a/arch/x86/kernel/kvmclock.c
>> +++ b/arch/x86/kernel/kvmclock.c
>> @@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
>> static int kvm_register_clock(char *txt)
>> {
>> int cpu = smp_processor_id();
>> - int low, high;
>> + int low, high, ret;
>> +
>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>> high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
>> + ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
>> printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>> cpu, high, low, txt);
>> - return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
>> +
>> + return ret;
>> }
>>
>> #ifdef CONFIG_X86_LOCAL_APIC
>
> Folks, should this be sent to -stable kernel? It is not in any
> upstream kernel as far as I can see (not in linus tree too), but
> this is quite an issue and is hitting people....
>
> The discussion were stalled quite a while ago too -- this email has
> Date: Mon, 02 Aug 2010 23:35:28 +0200.
>
> Thanks!
>
> /mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-08-02 21:35 ` Arjan Koers
2010-08-03 0:00 ` Zachary Amsden
2010-09-28 11:16 ` Michael Tokarev
@ 2010-09-29 8:28 ` Avi Kivity
2010-09-29 9:17 ` Michael Tokarev
2 siblings, 1 reply; 81+ messages in thread
From: Avi Kivity @ 2010-09-29 8:28 UTC (permalink / raw)
To: Arjan Koers; +Cc: Zachary Amsden, Glauber Costa, kvm, Andre Przywara
On 08/03/2010 12:35 AM, Arjan Koers wrote:
> I don't think that pvclock_clocksource_read is receiving
> completely random uninitialized data. The values in shadow
> are wrong, but could be interpreted as valid data
> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
> shadow.flags = 0 and shadow.version is always even).
>
>
> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
> anyone needs them...
Thanks, applied. Please post patches in a new thread so I get the
chance to see them.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-29 8:28 ` Avi Kivity
@ 2010-09-29 9:17 ` Michael Tokarev
2010-09-29 9:19 ` Michael Tokarev
0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-29 9:17 UTC (permalink / raw)
To: Avi Kivity
Cc: Arjan Koers, Zachary Amsden, Glauber Costa, kvm, Andre Przywara
29.09.2010 12:28, Avi Kivity wrote:
> On 08/03/2010 12:35 AM, Arjan Koers wrote:
>> I don't think that pvclock_clocksource_read is receiving
>> completely random uninitialized data. The values in shadow
>> are wrong, but could be interpreted as valid data
>> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
>> shadow.flags = 0 and shadow.version is always even).
>>
>> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
>> anyone needs them...
[Move a printk that's using the clock before it's ready]
> Thanks, applied. Please post patches in a new thread so I get the
> chance to see them.
Avi, this is definitely -stable material, for 2.6.32 (longterm
stable) and 2.6.35.
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-29 9:17 ` Michael Tokarev
@ 2010-09-29 9:19 ` Michael Tokarev
2010-09-29 19:26 ` Arjan Koers
0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-29 9:19 UTC (permalink / raw)
To: Avi Kivity
Cc: Arjan Koers, Zachary Amsden, Glauber Costa, kvm, Andre Przywara
29.09.2010 13:17, Michael Tokarev wrote:
> 29.09.2010 12:28, Avi Kivity wrote:
>> On 08/03/2010 12:35 AM, Arjan Koers wrote:
>>> I don't think that pvclock_clocksource_read is receiving
>>> completely random uninitialized data. The values in shadow
>>> are wrong, but could be interpreted as valid data
>>> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
>>> shadow.flags = 0 and shadow.version is always even).
>>>
>>> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
>>> anyone needs them...
[]
> Avi, this is definitely a -stable material, for 2.6.32 (longterm
> stable) and 2.6.35.
Er. Please excuse me for the misinformation. It is _not_ for 2.6.32,
of course.
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-29 9:19 ` Michael Tokarev
@ 2010-09-29 19:26 ` Arjan Koers
2010-09-30 7:55 ` Michael Tokarev
0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-09-29 19:26 UTC (permalink / raw)
To: kvm
Cc: Avi Kivity, Zachary Amsden, Glauber Costa, Michael Tokarev,
Andre Przywara
On 2010-09-29 11:19, Michael Tokarev wrote:
> 29.09.2010 13:17, Michael Tokarev wrote:
>> 29.09.2010 12:28, Avi Kivity wrote:
>>> On 08/03/2010 12:35 AM, Arjan Koers wrote:
>>>> I don't think that pvclock_clocksource_read is receiving
>>>> completely random uninitialized data. The values in shadow
>>>> are wrong, but could be interpreted as valid data
>>>> (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
>>>> shadow.flags = 0 and shadow.version is always even).
>>>>
>>>> I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
>>>> anyone needs them...
> []
>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>> stable) and 2.6.35.
>
> Er. Please excuse me for the misinformation. It is _not_ for 2.6.32
> ofcourse.
I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
(introduced in 2.6.32.16) makes it boot again.
The kvmclock printk patch doesn't help, but I'll try to figure out
what's wrong.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-29 19:26 ` Arjan Koers
@ 2010-09-30 7:55 ` Michael Tokarev
2010-09-30 9:59 ` Michael Tokarev
0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 7:55 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Avi Kivity, Zachary Amsden, Glauber Costa, Andre Przywara
29.09.2010 23:26, Arjan Koers wrote:
> On 2010-09-29 11:19, Michael Tokarev wrote:
>> 29.09.2010 13:17, Michael Tokarev wrote:
[]
>>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>>> stable) and 2.6.35.
>>
>> Er. Please excuse me for the misinformation. It is _not_ for 2.6.32
>> ofcourse.
>
> I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
> hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
> (introduced in 2.6.32.16) makes it boot again.
>
> The kvmclock printk patch doesn't help, but I'll try to figure out
> what's wrong.
It works here just fine - both 32- and 64-bit 2.6.32.23 as is,
and both 32- and 64-bit 2.6.35.6 with the printk.time patch
applied.
Thanks!
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 7:55 ` Michael Tokarev
@ 2010-09-30 9:59 ` Michael Tokarev
2010-09-30 13:54 ` Zachary Amsden
0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 9:59 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Avi Kivity, Zachary Amsden, Glauber Costa, Andre Przywara
30.09.2010 11:55, Michael Tokarev wrote:
> 29.09.2010 23:26, Arjan Koers wrote:
>> On 2010-09-29 11:19, Michael Tokarev wrote:
>>> 29.09.2010 13:17, Michael Tokarev wrote:
> []
>>>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>>>> stable) and 2.6.35.
>>>
>>> Er. Please excuse me for the misinformation. It is _not_ for 2.6.32
>>> ofcourse.
>>
>> I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
>> hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
>> (introduced in 2.6.32.16) makes it boot again.
>>
>> The kvmclock printk patch doesn't help, but I'll try to figure out
>> what's wrong.
>
> It works here just fine - both 32- and 64-bit 2.6.32.23 as is,
> and both 32- and 64-bit 2.6.35.6 with the printk.time patch
> applied.
Ok, I can confirm there's another issue somewhere around this.
After numerous tries I noticed that guests sporadically stop
during bootup - either somewhere in the middle or at the very
end of it. It is definitely not this problem with printk time,
but it still appears to be related to kvm-clock and smp.
This time, the lockup isn't really a lockup per se - the system
works (fsvo) - it reacts to the keyboard, I can scroll the text
console up and down. But it does nothing more, and in particular
I've no idea what it is waiting for. It does not consume host CPU
as the printk.time problem did.
It happens most with the 2.6.35.6 32bit guest kernel. I wasn't able
to reproduce it with 2.6.35.6 64bit. It does not happen on
2.6.35.3. And it happens sporadically on 2.6.32.23 32bit too.
The thing always happens during some module load or other
_kernel_ work. For example, right now I've a 2.6.35.6 32bit kernel
sitting after the login prompt (the Login: is in the middle
of the screen), with a few messages after the login prompt
telling me about various "misc" drivers (floppy, parport,
sg, piix_smbus etc) loaded.
Booting with clocksource=tsc does not expose the problem
so far - at least the most problematic 2.6.35.6 32bit always
booted ok with tsc. But since the issue is intermittent,
one can't be sure it's really pvclock.
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 9:59 ` Michael Tokarev
@ 2010-09-30 13:54 ` Zachary Amsden
2010-09-30 15:12 ` Michael Tokarev
0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-09-30 13:54 UTC (permalink / raw)
To: Michael Tokarev
Cc: Arjan Koers, kvm, Avi Kivity, Glauber Costa, Andre Przywara
On 09/29/2010 11:59 PM, Michael Tokarev wrote:
> 30.09.2010 11:55, Michael Tokarev wrote:
>
>> 29.09.2010 23:26, Arjan Koers wrote:
>>
>>> On 2010-09-29 11:19, Michael Tokarev wrote:
>>>
>>>> 29.09.2010 13:17, Michael Tokarev wrote:
>>>>
>> []
>>
>>>>> Avi, this is definitely a -stable material, for 2.6.32 (longterm
>>>>> stable) and 2.6.35.
>>>>>
>>>> Er. Please excuse me for the misinformation. It is _not_ for 2.6.32
>>>> ofcourse.
>>>>
>>> I wish you hadn't mentioned 2.6.32. I just tried 2.6.32.23 and it also
>>> hangs. Reverting commit 1345126c761f0360dc108973bf73281d51945bc1
>>> (introduced in 2.6.32.16) makes it boot again.
>>>
>>> The kvmclock printk patch doesn't help, but I'll try to figure out
>>> what's wrong.
>>>
>> It works here just fine - both 32- and 64-bit 2.6.32.23 as is,
>> and both 32- and 64-bit 2.6.35.6 with the printk.time patch
>> applied.
>>
> Ok, I can confirm there's another issue somewhere around this.
>
> After numerous tries I noticed that guests sporadically stops
> during bootup - either somewhere in the middle or at the very
> end of it. It is definitely not this problem with printk time,
> but it appears to be related to kvm-clock still, and smp.
>
> This time, the lockup isn't really a lock up per se - the system
> works (fsvo) - it reacts to keyboard, I can scroll up/down the
> text console. But it does nothing more, and in particular I've
> no idea what it is waiting for. It does not consume host CPU
> as the printk.time problem had.
>
> Happens most with 2.6.35.6 32bit guest kernel. I weren't able
> to reproduce it with 2.6.35.6 64bit. Does not happen on
> 2.6.35.3. And happens sporadically on 2.6.32.23 32bit too.
>
> The thing always happens during some module load or other
> _kernel_ work. F.e. right now I've 2.6.35.6 32bit kernel
> sitting after the login prompt (the Login: is at the middle
> of the screen), with a few messages after the login prompt
> telling me about various "misc" drivers (floppy, parport,
> sg, piix_smbus etc) loaded.
>
> Booting with clocksource=tsc does not expose the problem
> so far - at least the most problematic 2.6.35.6 32bit always
> booted ok with tsc. But since the issue is intermittent,
> one can't be sure it's really pvclock.
>
> /mjt
>
The printk movement is just a bandaid patch, correct? Anything which
does printk before kvmclock is registered could trigger the same bug.
Can you try with printk timing disabled and see if the bug disappears?
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 13:54 ` Zachary Amsden
@ 2010-09-30 15:12 ` Michael Tokarev
2010-09-30 15:32 ` Zachary Amsden
0 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 15:12 UTC (permalink / raw)
To: Zachary Amsden
Cc: Arjan Koers, kvm, Avi Kivity, Glauber Costa, Andre Przywara
30.09.2010 17:54, Zachary Amsden wrote:
[]
> The printk movement is just a bandaid patch, correct? Anything which
> does printk before kvmclock is registered could trigger the same bug.
Well, I'd not say it's just a bandaid patch, it's a real bug -- either
we can read kvmclock (so it's initialized), or we don't touch it (at
least before registration).
> Can you try with printk timing disabled and see if the bug disappears?
Yes, it disappears so far; at least I can't trigger it anymore. I tried
numerous boots, including the 2.6.35.6 32bit kernel (patched with the
printk registration patch!) which shows the problem in almost every boot.
Thanks!
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 15:12 ` Michael Tokarev
@ 2010-09-30 15:32 ` Zachary Amsden
2010-09-30 18:49 ` Arjan Koers
0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-09-30 15:32 UTC (permalink / raw)
To: Michael Tokarev
Cc: Arjan Koers, kvm, Avi Kivity, Glauber Costa, Andre Przywara
On 09/30/2010 05:12 AM, Michael Tokarev wrote:
> 30.09.2010 17:54, Zachary Amsden wrote:
> []
>
>> The printk movement is just a bandaid patch, correct? Anything which
>> does printk before kvmclock is registered could trigger the same bug.
>>
> Well, I'd not say it's just a bandaid patch, it's real bug -- either
> we can read kvmclock (so it's initialized), or we don't touch it (at
> least before registration).
>
Yes, that's the bug, but moving the printk doesn't fix that, it just
hides it.
>
>> Can you try with printk timing disabled and see if the bug disappears?
>>
> Yes it disappears so far, at last I can't trigger it anymore, tried
> numerous boots including the 2.6.35.6 32bit kernel (patched with the
> printk registration patch!) which shows the prob in almost every boot.
>
So, looks like we need to do the real fix.
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 15:32 ` Zachary Amsden
@ 2010-09-30 18:49 ` Arjan Koers
2010-09-30 19:05 ` Marcelo Tosatti
0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-09-30 18:49 UTC (permalink / raw)
To: kvm
Cc: Michael Tokarev, Zachary Amsden, Avi Kivity, Glauber Costa,
Andre Przywara
On 2010-09-30 17:32, Zachary Amsden wrote:
> On 09/30/2010 05:12 AM, Michael Tokarev wrote:
>> 30.09.2010 17:54, Zachary Amsden wrote:
>> []
>>
>>> The printk movement is just a bandaid patch, correct? Anything which
>>> does printk before kvmclock is registered could trigger the same bug.
>>>
>> Well, I'd not say it's just a bandaid patch, it's real bug -- either
>> we can read kvmclock (so it's initialized), or we don't touch it (at
>> least before registration).
>>
>
> Yes, that's the bug, but moving the printk doesn't fix that, it just
> hides it.
Correct. It's just luck that it works for my 64-bit 2.6.34.* and
2.6.35.* kernels. The working kernels will break when compiled to
print additional debug information.
Here's a partial boot log of 2.6.32.23 with smpboot.c compiled
with DEBUG defined. I modified printk to display the CPU# (printk_cpu).
All lines on CPU 1 up to 0.136487 are using the invalid clock and
would cause the kernel to hang later (if I hadn't patched pvclock
to correct the clock going backwards).
...
[0: 0.124221] Booting processor 1 APIC 0x1 ip 0x6000
[0: 0.124579] Setting warm reset code and vector.
[0: 0.124585] 1.
[0: 0.124587] 2.
[0: 0.124588] 3.
[0: 0.124601] Asserting INIT.
[0: 0.124613] Waiting for send to finish...
[0: 0.134490] Deasserting INIT.
[0: 0.134497] Waiting for send to finish...
[0: 0.134501] #startup loops: 2.
[0: 0.134503] Sending STARTUP #1.
[0: 0.134508] After apic_write.
[1: 0.008000] Initializing CPU#1
[1: 0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0: 0.134826] Startup point 1.
[0: 0.135133] Waiting for send to finish...
[0: 0.135340] Sending STARTUP #2.
[0: 0.135346] After apic_write.
[0: 0.135650] Startup point 1.
[0: 0.135651] Waiting for send to finish...
[0: 0.135858] After Startup.
[0: 0.135859] Before Callout 1.
[0: 0.135861] After Callout 1.
[1: 0.008000] CALLIN, before setup_local_APIC().
[1: 0.008000] Stack at about ffff88001f889f44
[1: 0.008000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[1: 0.008000] CPU: L2 Cache: 512K (64 bytes/line)
[1: 0.008000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[0: 0.136461] OK.
[0: 0.136463] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[0: 0.136465] CPU has booted.
[0: 0.136488] Brought up 2 CPUs
[0: 0.136489] Boot done.
[0: 0.136490] Before bogomips.
[0: 0.136491] Total of 2 processors activated (11202.17 BogoMIPS).
[0: 0.136493] Before bogocount - setting activated=1.
[1: 0.136487] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0: 0.144007] CPU0 attaching sched-domain:
[0: 0.144010] domain 0: span 0-1 level CPU
[0: 0.144012] groups: 0 1
[0: 0.144016] CPU1 attaching sched-domain:
[0: 0.144018] domain 0: span 0-1 level CPU
[0: 0.144020] groups: 1 0
[0: 0.144219] NET: Registered protocol family 16
[0: 0.148091] PCI: Using configuration type 1 for base access
[0: 0.148451] PCI: Using configuration type 1 for extended access
[0: 0.148870] mtrr: your CPUs had inconsistent variable MTRR settings
[0: 0.148870] mtrr: your CPUs had inconsistent MTRRdefType settings
[0: 0.148870] mtrr: probably your BIOS does not setup all CPUs.
[0: 0.149185] mtrr: corrected configuration.
[0: 0.156112] bio: create slab <bio-0> at 0
[0: 0.156635] vgaarb: loaded
[0: 0.156635] PCI: Probing PCI hardware
[0: 0.156635] PCI: Probing PCI hardware (bus 00)
[0: 0.156635] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[0: 0.156773] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[0: 0.160012] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[0: 0.163379] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[0: 0.164660] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[0: 0.170537] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[0: 0.170629] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2001000-0xf2001fff]
[0: 0.171037] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[0: 0.171373] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[0: 0.172273] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[0: 0.173099] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0: 0.176131] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
[0: 0.177112] Switching to clocksource kvm-clock
[1: 0.181401] pci_bus 0000:00: resource 0 io: [0x00-0xffff]
[1: 0.181412] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[1: 0.181825] NET: Registered protocol family 2
...
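
One plausible reading of why a few early invalid readings can hang the guest later: the bisected commit is titled "Add a global synchronization point for pvclock", and a global point of that kind typically remembers the largest value any CPU has returned and never hands out anything smaller. The sketch below models that idea in userspace with C11 atomics. It is an assumption-laden illustration, not the kernel code: last_value and monotonic_read are names chosen for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Largest clock value handed out so far, shared by all CPUs. */
static _Atomic uint64_t last_value;

/*
 * Return a value that never goes backwards across callers.
 * 'raw' stands in for the per-CPU clock reading.
 */
static uint64_t monotonic_read(uint64_t raw)
{
	uint64_t last = atomic_load(&last_value);

	do {
		/* Someone already returned a later time: reuse it. */
		if (raw < last)
			return last;
		/* Otherwise try to publish 'raw'; on failure 'last' is reloaded. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, raw));

	return raw;
}
```

Under this model a single bogus, very large reading from a CPU whose clock is not set up yet gets published as last_value, and every later valid reading on any CPU is clamped to it, so time appears to stand still until the real clock catches up.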
>>> Can you try with printk timing disabled and see if the bug disappears?
>>>
>> Yes it disappears so far, at last I can't trigger it anymore, tried
>> numerous boots including the 2.6.35.6 32bit kernel (patched with the
>> printk registration patch!) which shows the prob in almost every boot.
>>
>
> So, looks like we need to do the real fix.
Your ideas to zero hv_clock or to use an initialized flag may be usable.
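
A minimal sketch of the "initialized flag" variant, assuming a per-CPU structure with a flag that is set only once the clock has been registered for that CPU. The names cpu_clock, registered and guarded_read are hypothetical and only serve the illustration; raw_ns stands in for whatever the real read path would compute.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU clock state. */
struct cpu_clock {
	bool     registered; /* set once the clock area is registered */
	uint64_t raw_ns;     /* stand-in for the computed clock reading */
};

/*
 * Refuse to interpret the clock data before registration.
 * Returning 0 keeps early callers (such as printk timestamps)
 * from publishing a bogus value.
 */
static uint64_t guarded_read(const struct cpu_clock *c)
{
	if (!c->registered)
		return 0;
	return c->raw_ns;
}
```

Zeroing the structure instead would aim for the same effect without a separate flag, since an all-zero multiplier and base time would also read as 0.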
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 18:49 ` Arjan Koers
@ 2010-09-30 19:05 ` Marcelo Tosatti
2010-09-30 20:16 ` Arjan Koers
2010-09-30 23:02 ` Michael Tokarev
0 siblings, 2 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2010-09-30 19:05 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Michael Tokarev, Zachary Amsden, Avi Kivity, Glauber Costa,
Andre Przywara
On Thu, Sep 30, 2010 at 08:49:44PM +0200, Arjan Koers wrote:
> On 2010-09-30 17:32, Zachary Amsden wrote:
> > On 09/30/2010 05:12 AM, Michael Tokarev wrote:
> >> 30.09.2010 17:54, Zachary Amsden wrote:
> >> []
> >>
> >>> The printk movement is just a bandaid patch, correct? Anything which
> >>> does printk before kvmclock is registered could trigger the same bug.
> >>>
> >> Well, I'd not say it's just a bandaid patch, it's real bug -- either
> >> we can read kvmclock (so it's initialized), or we don't touch it (at
> >> least before registration).
> >>
> >
> > Yes, that's the bug, but moving the printk doesn't fix that, it just
> > hides it.
>
> Correct. It's just luck that it works for my 64-bit 2.6.34.* and
> 2.6.35.* kernels. The working kernels will break when compiled to
> print additional debug information.
>
> Here's a partial boot log of 2.6.32.23 with smpboot.c compiled
> with DEBUG define. I modified printk to display the CPU# (printk_cpu).
> All lines on CPU 1 up to 0.136487 are using the invalid clock and
> will cause the kernel to hang later (if I hadn't patched pvclock
> to correct the clock backwards).
> ...
> [0: 0.124221] Booting processor 1 APIC 0x1 ip 0x6000
> [0: 0.124579] Setting warm reset code and vector.
> [0: 0.124585] 1.
> [0: 0.124587] 2.
> [0: 0.124588] 3.
> [0: 0.124601] Asserting INIT.
> [0: 0.124613] Waiting for send to finish...
> [0: 0.134490] Deasserting INIT.
> [0: 0.134497] Waiting for send to finish...
> [0: 0.134501] #startup loops: 2.
> [0: 0.134503] Sending STARTUP #1.
> [0: 0.134508] After apic_write.
> [1: 0.008000] Initializing CPU#1
> [1: 0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
> [0: 0.134826] Startup point 1.
> [0: 0.135133] Waiting for send to finish...
> [0: 0.135340] Sending STARTUP #2.
> [0: 0.135346] After apic_write.
> [0: 0.135650] Startup point 1.
> [0: 0.135651] Waiting for send to finish...
> [0: 0.135858] After Startup.
> [0: 0.135859] Before Callout 1.
> [0: 0.135861] After Callout 1.
> [1: 0.008000] CALLIN, before setup_local_APIC().
> [1: 0.008000] Stack at about ffff88001f889f44
> [1: 0.008000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [1: 0.008000] CPU: L2 Cache: 512K (64 bytes/line)
> [1: 0.008000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
> [0: 0.136461] OK.
> [0: 0.136463] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
> [0: 0.136465] CPU has booted.
> [0: 0.136488] Brought up 2 CPUs
> [0: 0.136489] Boot done.
> [0: 0.136490] Before bogomips.
> [0: 0.136491] Total of 2 processors activated (11202.17 BogoMIPS).
> [0: 0.136493] Before bogocount - setting activated=1.
> [1: 0.136487] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [0: 0.144007] CPU0 attaching sched-domain:
> [0: 0.144010] domain 0: span 0-1 level CPU
> [0: 0.144012] groups: 0 1
> [0: 0.144016] CPU1 attaching sched-domain:
> [0: 0.144018] domain 0: span 0-1 level CPU
> [0: 0.144020] groups: 1 0
> [0: 0.144219] NET: Registered protocol family 16
> [0: 0.148091] PCI: Using configuration type 1 for base access
> [0: 0.148451] PCI: Using configuration type 1 for extended access
> [0: 0.148870] mtrr: your CPUs had inconsistent variable MTRR settings
> [0: 0.148870] mtrr: your CPUs had inconsistent MTRRdefType settings
> [0: 0.148870] mtrr: probably your BIOS does not setup all CPUs.
> [0: 0.149185] mtrr: corrected configuration.
> [0: 0.156112] bio: create slab <bio-0> at 0
> [0: 0.156635] vgaarb: loaded
> [0: 0.156635] PCI: Probing PCI hardware
> [0: 0.156635] PCI: Probing PCI hardware (bus 00)
> [0: 0.156635] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
> [0: 0.156773] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
> [0: 0.160012] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
> [0: 0.163379] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
> [0: 0.164660] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
> [0: 0.170537] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
> [0: 0.170629] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2001000-0xf2001fff]
> [0: 0.171037] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
> [0: 0.171373] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
> [0: 0.172273] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
> [0: 0.173099] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [0: 0.176131] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> [0: 0.177112] Switching to clocksource kvm-clock
> [1: 0.181401] pci_bus 0000:00: resource 0 io: [0x00-0xffff]
> [1: 0.181412] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
> [1: 0.181825] NET: Registered protocol family 2
> ...
>
>
> >>> Can you try with printk timing disabled and see if the bug disappears?
> >>>
> >> Yes it disappears so far, at last I can't trigger it anymore, tried
> >> numerous boots including the 2.6.35.6 32bit kernel (patched with the
> >> printk registration patch!) which shows the prob in almost every boot.
> >>
> >
> > So, looks like we need to do the real fix.
>
> Your ideas to zero hv_clock or to use an initialized flag may be usable.
Arjan, Michael, can you try the following:
From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
From: Zachary Amsden <zamsden@redhat.com>
Date: Sat, 29 May 2010 17:52:46 -1000
Subject: [KVM V2 04/25] Fix SVM VMCB reset
Cc: Avi Kivity <avi@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Glauber Costa <glommer@redhat.com>,
linux-kernel@vger.kernel.org
On reset, VMCB TSC should be set to zero. Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/kvm/svm.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 760c86e..46856d2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
- control->tsc_offset = 0;
+ guest_write_tsc(&svm->vcpu, 0);
control->int_ctl = V_INTR_MASKING_MASK;
init_seg(&save->es);
--
1.7.1
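
The distinction the changelog above draws can be sketched with plain arithmetic, assuming the usual model in which the guest sees the host TSC plus a per-guest offset. The helper names are hypothetical and this is not the KVM implementation; it only shows why an offset of zero is not the same as a guest TSC of zero.

```c
#include <stdint.h>

/* Guest-visible TSC under the assumed model (wrapping 64-bit add). */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
{
	return host_tsc + offset;
}

/* Offset that makes the guest read 'wanted' at this host TSC value. */
static uint64_t offset_for_guest_tsc(uint64_t host_tsc, uint64_t wanted)
{
	return wanted - host_tsc;
}
```

With offset 0 the guest simply reads the host counter. To make the guest start at 0, the offset has to be the negated host TSC at reset time, after which the guest counter advances in step with the host.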
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 19:05 ` Marcelo Tosatti
@ 2010-09-30 20:16 ` Arjan Koers
2010-09-30 23:02 ` Michael Tokarev
1 sibling, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-09-30 20:16 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: kvm, Michael Tokarev, Zachary Amsden, Avi Kivity, Glauber Costa,
Andre Przywara
On 2010-09-30 21:05, Marcelo Tosatti wrote:
>
> Arjan, Michael, can you try the following:
>
> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
> From: Zachary Amsden <zamsden@redhat.com>
> Date: Sat, 29 May 2010 17:52:46 -1000
> Subject: [KVM V2 04/25] Fix SVM VMCB reset
> Cc: Avi Kivity <avi@redhat.com>,
> Marcelo Tosatti <mtosatti@redhat.com>,
> Glauber Costa <glommer@redhat.com>,
> linux-kernel@vger.kernel.org
>
> On reset, VMCB TSC should be set to zero. Instead, code was setting
> tsc_offset to zero, which passes through the underlying TSC.
>
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
> arch/x86/kvm/svm.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 760c86e..46856d2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>
> control->iopm_base_pa = iopm_base;
> control->msrpm_base_pa = __pa(svm->msrpm);
> - control->tsc_offset = 0;
> + guest_write_tsc(&svm->vcpu, 0);
> control->int_ctl = V_INTR_MASKING_MASK;
>
> init_seg(&save->es);
It doesn't solve my problem. I tried on 2.6.32.23 and 2.6.36-rc6.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 19:05 ` Marcelo Tosatti
2010-09-30 20:16 ` Arjan Koers
@ 2010-09-30 23:02 ` Michael Tokarev
2010-09-30 23:07 ` Michael Tokarev
1 sibling, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 23:02 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Arjan Koers, kvm, Zachary Amsden, Avi Kivity, Glauber Costa,
Andre Przywara
30.09.2010 23:05, Marcelo Tosatti wrote:
[]
> Arjan, Michael, can you try the following:
>
> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
> From: Zachary Amsden <zamsden@redhat.com>
> Date: Sat, 29 May 2010 17:52:46 -1000
> Subject: [KVM V2 04/25] Fix SVM VMCB reset
> Cc: Avi Kivity <avi@redhat.com>,
> Marcelo Tosatti <mtosatti@redhat.com>,
> Glauber Costa <glommer@redhat.com>,
> linux-kernel@vger.kernel.org
>
> On reset, VMCB TSC should be set to zero. Instead, code was setting
> tsc_offset to zero, which passes through the underlying TSC.
>
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
> arch/x86/kvm/svm.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 760c86e..46856d2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>
> control->iopm_base_pa = iopm_base;
> control->msrpm_base_pa = __pa(svm->msrpm);
> - control->tsc_offset = 0;
> + guest_write_tsc(&svm->vcpu, 0);
> control->int_ctl = V_INTR_MASKING_MASK;
This fails to compile on 2.6.35.5:
arch/x86/kvm/svm.c: In function ‘init_vmcb’:
arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
I'll take a look tomorrow where that comes from.. hopefully ;)
Thanks!
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 23:02 ` Michael Tokarev
@ 2010-09-30 23:07 ` Michael Tokarev
2010-10-01 1:13 ` Zachary Amsden
2010-10-02 5:35 ` Zachary Amsden
0 siblings, 2 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-09-30 23:07 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Arjan Koers, kvm, Zachary Amsden, Avi Kivity, Glauber Costa,
Andre Przywara
01.10.2010 03:02, Michael Tokarev wrote:
> 30.09.2010 23:05, Marcelo Tosatti wrote:
> []
>> Arjan, Michael, can you try the following:
>>
>> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
>> From: Zachary Amsden <zamsden@redhat.com>
>> Date: Sat, 29 May 2010 17:52:46 -1000
>> Subject: [KVM V2 04/25] Fix SVM VMCB reset
>> Cc: Avi Kivity <avi@redhat.com>,
>> Marcelo Tosatti <mtosatti@redhat.com>,
>> Glauber Costa <glommer@redhat.com>,
>> linux-kernel@vger.kernel.org
>>
>> On reset, VMCB TSC should be set to zero. Instead, code was setting
>> tsc_offset to zero, which passes through the underlying TSC.
>>
>> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>> ---
>> arch/x86/kvm/svm.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 760c86e..46856d2 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>
>> control->iopm_base_pa = iopm_base;
>> control->msrpm_base_pa = __pa(svm->msrpm);
>> - control->tsc_offset = 0;
>> + guest_write_tsc(&svm->vcpu, 0);
>> control->int_ctl = V_INTR_MASKING_MASK;
>
> This fails to compile on 2.6.35.5:
>
> arch/x86/kvm/svm.c: In function ‘init_vmcb’:
> arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
>
> I'll take a look tomorrow where that comes from.. hopefully ;)
Ok, that routine is static, defined in arch/x86/kvm/vmx.c
(not svm.c). I'm not sure it's ok to use it in svm.c
directly, as it appears to be vmx-specific.
Thanks!
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 23:07 ` Michael Tokarev
@ 2010-10-01 1:13 ` Zachary Amsden
2010-10-02 5:35 ` Zachary Amsden
1 sibling, 0 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-01 1:13 UTC (permalink / raw)
To: Michael Tokarev
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara
On 09/30/2010 01:07 PM, Michael Tokarev wrote:
> 01.10.2010 03:02, Michael Tokarev wrote:
>
>> 30.09.2010 23:05, Marcelo Tosatti wrote:
>> []
>>
>>> Arjan, Michael, can you try the following:
>>>
>>> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
>>> From: Zachary Amsden<zamsden@redhat.com>
>>> Date: Sat, 29 May 2010 17:52:46 -1000
>>> Subject: [KVM V2 04/25] Fix SVM VMCB reset
>>> Cc: Avi Kivity<avi@redhat.com>,
>>> Marcelo Tosatti<mtosatti@redhat.com>,
>>> Glauber Costa<glommer@redhat.com>,
>>> linux-kernel@vger.kernel.org
>>>
>>> On reset, VMCB TSC should be set to zero. Instead, code was setting
>>> tsc_offset to zero, which passes through the underlying TSC.
>>>
>>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>>> ---
>>> arch/x86/kvm/svm.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 760c86e..46856d2 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>
>>> control->iopm_base_pa = iopm_base;
>>> control->msrpm_base_pa = __pa(svm->msrpm);
>>> - control->tsc_offset = 0;
>>> + guest_write_tsc(&svm->vcpu, 0);
>>> control->int_ctl = V_INTR_MASKING_MASK;
>>>
>> This fails to compile on 2.6.35.5:
>>
>> arch/x86/kvm/svm.c: In function ‘init_vmcb’:
>> arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
>>
>> I'll take a look tomorrow where that comes from.. hopefully ;)
>>
> Ok, that routine is static, defined in arch/x86/kvm/vmx.c
> (not svm.c). I'm not sure it's ok to use it in svm.c
> directly, as it appears to be vmx-specific.
>
Looks like you are missing some patches in between which move this into
common code, so it won't apply directly.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-09-30 23:07 ` Michael Tokarev
2010-10-01 1:13 ` Zachary Amsden
@ 2010-10-02 5:35 ` Zachary Amsden
2010-10-02 7:35 ` Michael Tokarev
1 sibling, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-02 5:35 UTC (permalink / raw)
To: Michael Tokarev
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]
On 09/30/2010 01:07 PM, Michael Tokarev wrote:
> 01.10.2010 03:02, Michael Tokarev wrote:
>
>> 30.09.2010 23:05, Marcelo Tosatti wrote:
>> []
>>
>>> Arjan, Michael, can you try the following:
>>>
>>> From 3823c018162dc708b543cbdc680a4c7d63533fee Mon Sep 17 00:00:00 2001
>>> From: Zachary Amsden<zamsden@redhat.com>
>>> Date: Sat, 29 May 2010 17:52:46 -1000
>>> Subject: [KVM V2 04/25] Fix SVM VMCB reset
>>> Cc: Avi Kivity<avi@redhat.com>,
>>> Marcelo Tosatti<mtosatti@redhat.com>,
>>> Glauber Costa<glommer@redhat.com>,
>>> linux-kernel@vger.kernel.org
>>>
>>> On reset, VMCB TSC should be set to zero. Instead, code was setting
>>> tsc_offset to zero, which passes through the underlying TSC.
>>>
>>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>>> ---
>>> arch/x86/kvm/svm.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 760c86e..46856d2 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -781,7 +781,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>
>>> control->iopm_base_pa = iopm_base;
>>> control->msrpm_base_pa = __pa(svm->msrpm);
>>> - control->tsc_offset = 0;
>>> + guest_write_tsc(&svm->vcpu, 0);
>>> control->int_ctl = V_INTR_MASKING_MASK;
>>>
>> This fails to compile on 2.6.35.5:
>>
>> arch/x86/kvm/svm.c: In function ‘init_vmcb’:
>> arch/x86/kvm/svm.c:769: error: implicit declaration of function ‘guest_write_tsc’
>>
>> I'll take a look tomorrow where that comes from.. hopefully ;)
>>
> Ok, that routine is static, defined in arch/x86/kvm/vmx.c
> (not svm.c). I'm not sure it's ok to use it in svm.c
> directly, as it appears to be vmx-specific.
>
> Thanks!
>
> /mjt
>
Can you try this patch to see if it helps? I believe it is also safe
for Xen, but cc'ing to double check.
[-- Attachment #2: kvmclock-fix-hack-1.patch --]
[-- Type: text/plain, Size: 807 bytes --]
Try to fix setup_percpu_clockdev by moving it before interrupts
are enabled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8b3bfc4..40a383b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -351,6 +351,8 @@ notrace static void __cpuinit start_secondary(void *unused)
unlock_vector_lock();
ipi_call_unlock();
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+ x86_cpuinit.setup_percpu_clockev();
+
x86_platform.nmi_init();
/* enable local interrupts */
@@ -359,8 +361,6 @@ notrace static void __cpuinit start_secondary(void *unused)
/* to prevent fake stack check failure in clock setup */
boot_init_stack_canary();
- x86_cpuinit.setup_percpu_clockev();
-
wmb();
cpu_idle();
}
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 5:35 ` Zachary Amsden
@ 2010-10-02 7:35 ` Michael Tokarev
2010-10-02 7:40 ` Michael Tokarev
` (2 more replies)
0 siblings, 3 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02 7:35 UTC (permalink / raw)
To: Zachary Amsden
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
02.10.2010 09:35, Zachary Amsden wrote:
[]
> Can you try this patch to see if it helps? I believe it is also safe
> for Xen, but cc'ing to double check.
It makes no visible difference.
For some reason one of my test guests - 2.6.35.6 32bit kernel -
stopped booting completely, always hanging at boot somewhere
unless I disable printk.time. Here's the typical boot messages,
up to the hang:
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.35-i686 (mjt@gandalf) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2.6.35.6 SMP Thu Sep 30 12:00:24 MSD 2010
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[ 0.000000] BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[ 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[ 0.000000] BIOS-e820: 00000000feffd000 - 00000000ff001000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[ 0.000000] DMI 2.4 present.
[ 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x100000
[ 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[ 0.000000] found SMP MP-table at [c00fdbe0] fdbe0
[ 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[ 0.000000] RAMDISK: 1fbb5000 - 1fe96000
[ 0.000000] ACPI: RSDP 000fdb90 00014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 1fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 1ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 1fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[ 0.000000] ACPI: FACS 1ffffe00 00040
[ 0.000000] ACPI: SSDT 1fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 1fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 1fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] 0MB HIGHMEM available.
[ 0.000000] 511MB LOWMEM available.
[ 0.000000] mapped low ram: 0 - 1fffd000
[ 0.000000] low ram: 0 - 1fffd000
[ 0.000000] kvm-clock: Using msrs 12 and 11
[ 0.000000] kvm-clock: cpu 0, msr 0:13c60c1, boot clock
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000001 -> 0x00001000
[ 0.000000] Normal 0x00001000 -> 0x0001fffd
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000001 -> 0x0000009f
[ 0.000000] 0: 0x00000100 -> 0x0001fffd
[ 0.000000] Using APIC driver default
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[ 0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[ 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:deffd000)
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 16 pages/cpu @c1c00000 s43072 r0 d22464 u2097152
[ 0.000000] pcpu-alloc: s43072 r0 d22464 u2097152 alloc=1*4194304
[ 0.000000] pcpu-alloc: [0] 0 1
[ 0.000000] kvm-clock: cpu 0, msr 0:1c0a0c1, primary cpu clock
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129947
[ 0.000000] Kernel command line: acpi_enforce_resources=lax rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=tty1 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
[ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.000000] Enabling fast FPU save and restore... done.
[ 0.000000] Enabling unmasked SIMD FPU exception support... done.
[ 0.000000] Initializing CPU#0
[ 0.000000] Subtract (42 early reservations)
[ 0.000000] #1 [0000001000 - 0000002000] EX TRAMPOLINE
[ 0.000000] #2 [0001000000 - 000144a9e4] TEXT DATA BSS
[ 0.000000] #3 [001fbb5000 - 001fe96000] RAMDISK
[ 0.000000] #4 [000144b000 - 0001451049] BRK
[ 0.000000] #5 [000009f400 - 00000fdbe0] BIOS reserved
[ 0.000000] #6 [00000fdbe0 - 00000fdbf0] MP-table mpf
[ 0.000000] #7 [00000fdce4 - 0000100000] BIOS reserved
[ 0.000000] #8 [00000fdbf0 - 00000fdce4] MP-table mpc
[ 0.000000] #9 [0000002000 - 0000003000] TRAMPOLINE
[ 0.000000] #10 [0000003000 - 0000007000] ACPI WAKEUP
[ 0.000000] #11 [0000007000 - 0000008000] PGTABLE
[ 0.000000] #12 [0001452000 - 0001453000] BOOTMEM
[ 0.000000] #13 [0001453000 - 0001853000] BOOTMEM
[ 0.000000] #14 [000144aa00 - 000144aa04] BOOTMEM
[ 0.000000] #15 [000144aa40 - 000144ab00] BOOTMEM
[ 0.000000] #16 [000144ab00 - 000144ab30] BOOTMEM
[ 0.000000] #17 [0001853000 - 0001854800] BOOTMEM
[ 0.000000] #18 [000144ab40 - 000144ab65] BOOTMEM
[ 0.000000] #19 [000144ab80 - 000144aba7] BOOTMEM
[ 0.000000] #20 [000144abc0 - 000144aca0] BOOTMEM
[ 0.000000] #21 [000144acc0 - 000144ad00] BOOTMEM
[ 0.000000] #22 [000144ad00 - 000144ad40] BOOTMEM
[ 0.000000] #23 [000144ad40 - 000144ad80] BOOTMEM
[ 0.000000] #24 [000144ad80 - 000144adc0] BOOTMEM
[ 0.000000] #25 [000144adc0 - 000144ae00] BOOTMEM
[ 0.000000] #26 [000144ae00 - 000144ae40] BOOTMEM
[ 0.000000] #27 [000144ae40 - 000144ae80] BOOTMEM
[ 0.000000] #28 [000144ae80 - 000144ae90] BOOTMEM
[ 0.000000] #29 [000144aec0 - 000144afcf] BOOTMEM
[ 0.000000] #30 [0001451080 - 000145118f] BOOTMEM
[ 0.000000] #31 [0001c00000 - 0001c10000] BOOTMEM
[ 0.000000] #32 [0001e00000 - 0001e10000] BOOTMEM
[ 0.000000] #33 [00014511c0 - 00014511c4] BOOTMEM
[ 0.000000] #34 [0001451200 - 0001451204] BOOTMEM
[ 0.000000] #35 [0001451240 - 0001451248] BOOTMEM
[ 0.000000] #36 [0001451280 - 0001451288] BOOTMEM
[ 0.000000] #37 [00014512c0 - 0001451368] BOOTMEM
[ 0.000000] #38 [0001451380 - 00014513e8] BOOTMEM
[ 0.000000] #39 [0001854800 - 0001856800] BOOTMEM
[ 0.000000] #40 [0001856800 - 0001896800] BOOTMEM
[ 0.000000] #41 [0001896800 - 00018b6800] BOOTMEM
[ 0.000000] Initializing HighMem for node 0 (00000000:00000000)
[ 0.000000] Memory: 511856k/524276k available (2554k kernel code, 12028k reserved, 930k data, 380k init, 0k highmem)
[ 0.000000] virtual kernel memory layout:
[ 0.000000] fixmap : 0xfff16000 - 0xfffff000 ( 932 kB)
[ 0.000000] pkmap : 0xff800000 - 0xffc00000 (4096 kB)
[ 0.000000] vmalloc : 0xe07fd000 - 0xff7fe000 ( 496 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xdfffd000 ( 511 MB)
[ 0.000000] .init : 0xc1368000 - 0xc13c7000 ( 380 kB)
[ 0.000000] .data : 0xc127ebb7 - 0xc1367488 ( 930 kB)
[ 0.000000] .text : 0xc1000000 - 0xc127ebb7 (2554 kB)
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU-based detection of stalled CPUs is disabled.
[ 0.000000] Verbose stalled-CPUs detection is disabled.
[ 0.000000] NR_IRQS:512
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty1] enabled
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Detected 3217.252 MHz processor.
[ 0.023332] Calibrating delay loop (skipped) preset value.. 6437.60 BogoMIPS (lpj=10724173)
[ 0.023332] pid_max: default: 32768 minimum: 301
[ 0.023332] Mount-cache hash table entries: 512
[ 0.023447] Initializing cgroup subsys ns
[ 0.024131] Initializing cgroup subsys cpuacct
[ 0.024851] Initializing cgroup subsys devices
[ 0.025580] Initializing cgroup subsys freezer
[ 0.026669] Initializing cgroup subsys net_cls
[ 0.027425] Initializing cgroup subsys blkio
[ 0.030079] mce: CPU supports 10 MCE banks
[ 0.030847] using C1E aware idle routine
[ 0.031517] Performance Events: AMD PMU driver.
[ 0.032313] ... version: 0
[ 0.033335] ... bit width: 48
[ 0.034036] ... generic registers: 4
[ 0.034716] ... value mask: 0000ffffffffffff
[ 0.035542] ... max period: 00007fffffffffff
[ 0.036669] ... fixed-purpose events: 0
[ 0.037521] ... event mask: 000000000000000f
[ 0.041961] ACPI: Core revision 20100428
[ 0.044150] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.045964] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.046671] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
[ 0.049999] APIC calibration not consistent with PM-Timer: 102ms instead of 100ms
[ 0.049999] APIC delta adjusted to PM-Timer: 6248670 (6435422)
[ 0.050298] Booting Node 0, Processors #1 Ok.
[ 0.023332] Initializing CPU#1
[ 0.063333] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
[ 0.063333] Brought up 2 CPUs
[ 0.063333] Total of 2 processors activated (12874.21 BogoMIPS).
[ 0.076666] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[ 0.116666] devtmpfs: initialized
[ 0.116666] NET: Registered protocol family 16
[ 0.119999] ACPI: bus type pci registered
[ 0.123333] PCI: PCI BIOS revision 2.10 entry at 0xffe77, last bus=0
[ 0.123333] PCI: Using configuration type 1 for base access
[ 0.123333] PCI: Using configuration type 1 for extended access
[ 0.126666] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.126666] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.126666] mtrr: probably your BIOS does not setup all CPUs.
[ 0.126666] mtrr: corrected configuration.
[ 0.136666] bio: create slab <bio-0> at 0
[ 0.153333] ACPI: Interpreter enabled
[ 0.153333] ACPI: (supports S0 S3 S4 S5)
[ 0.153333] ACPI: Using IOAPIC for interrupt routing
[ 0.203333] ACPI: No dock devices found.
[ 0.203333] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[ 0.206666] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.209999] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[ 0.209999] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
[ 0.216666] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.219999] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.219999] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.223333] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.223333] HEST: Table is not found!
[ 0.226666] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.229999] vgaarb: loaded
[ 0.229999] PCI: Using ACPI for IRQ routing
[ 0.233333] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.239999] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.239999] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.249999] Switching to clocksource kvm-clock
[ 0.259999] pnp: PnP ACPI init
[ 0.259999] ACPI: bus type pnp registered
[ 0.259999] pnp: PnP ACPI: found 8 devices
[ 0.259999] ACPI: ACPI bus type pnp unregistered
[ 0.259999] PnPBIOS: Disabled
[ 0.259999] NET: Registered protocol family 2
[ 0.259999] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[ 0.259999] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.259999] TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.259999] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.259999] TCP reno registered
[ 0.259999] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.259999] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.259999] NET: Registered protocol family 1
[ 0.259999] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.259999] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.259999] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.259999] Unpacking initramfs...
[ 0.259999] Freeing initrd memory: 2948k freed
[ 0.259999] HugeTLB registered 4 MB page size, pre-allocated 0 pages
[ 0.259999] VFS: Disk quotas dquot_6.5.2
[ 0.259999] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[ 0.259999] msgmni has been set to 1005
[ 0.259999] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 0.259999] io scheduler noop registered
[ 0.259999] io scheduler deadline registered
[ 0.259999] io scheduler cfq registered (default)
[ 0.259999] ERST: Table is not found!
[ 0.259999] isapnp: Scanning for PnP cards...
[ 0.259999] isapnp: No Plug & Play device found
[ 0.259999] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.259999] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 0.259999] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 0.259999] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 0.259999] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.259999] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 0.259999] mice: PS/2 mouse device common for all mice
[ 0.259999] input: PC Speaker as /devices/platform/pcspkr/input/input0
[ 0.259999] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[ 0.259999] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[ 0.259999] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 0.259999] cpuidle: using governor ladder
[ 0.259999] cpuidle: using governor menu
[ 0.259999] TCP cubic registered
[ 0.259999] NET: Registered protocol family 17
[ 0.259999] Using IPI No-Shortcut mode
[ 0.259999] rtc_cmos 00:01: setting system clock to 2010-10-02 07:27:50 UTC (1286004470)
[ 0.259999] Freeing unused kernel memory: 380k freed
[ 0.259999] Processing INITRAMFS
[ 0.259999] SCSI subsystem initialized
[ 0.259999] scsi0 : ata_piix
[ 0.259999] scsi1 : ata_piix
[ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
[ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
Note the time - it is constant after switching to kvmclock.
This is the most typical place where it stops, sometimes it
stops at "Freeing unused kernel memory", sometimes it boots
further and hangs at "Login:" prompt, right after some other
kernel message.
This is bootlog with the last patch (kvmclock-fix-hack-1.patch)
and the previous "bandaid" patch (the kvmclock registration
printk, use-before-init, which obviously makes no difference)
applied.
I just realized I never posted any boot logs from my systems...
So here it goes :)
Thanks!
/mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 7:35 ` Michael Tokarev
@ 2010-10-02 7:40 ` Michael Tokarev
2010-10-02 7:50 ` Michael Tokarev
2010-10-02 16:10 ` Arjan Koers
2010-10-02 21:55 ` Zachary Amsden
2 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02 7:40 UTC (permalink / raw)
To: Zachary Amsden
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
02.10.2010 11:35, Michael Tokarev wrote:
[]
> [ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> [ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>
> Note the time - it is constant after switching to kvmclock.
Another interesting observation. The time is almost always
like this. Another very common version is 0.199999:
[ 0.189999] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.193333] HEST: Table is not found!
[ 0.193333] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.196666] vgaarb: loaded
[ 0.196666] PCI: Using ACPI for IRQ routing
[ 0.199999] Switching to clocksource kvm-clock
[ 0.199999] pnp: PnP ACPI init
[ 0.199999] ACPI: bus type pnp registered
[ 0.199999] pnp: PnP ACPI: found 8 devices
[ 0.199999] ACPI: ACPI bus type pnp unregistered
[ 0.199999] PnPBIOS: Disabled
...
This shows much more often than any other value.
Thanks!
/mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 7:40 ` Michael Tokarev
@ 2010-10-02 7:50 ` Michael Tokarev
0 siblings, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02 7:50 UTC (permalink / raw)
To: Zachary Amsden
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
Ugh. Replying to myself again and again, but I found all these
variants quite interesting for the problem at hand.
02.10.2010 11:40, Michael Tokarev wrote:
> 02.10.2010 11:35, Michael Tokarev wrote:
> []
>> [ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
>
> Another interesting observation. The time is almost always
> like this. Another very common version is 0.199999:
>
> [ 0.189999] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> [ 0.193333] HEST: Table is not found!
> [ 0.193333] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [ 0.196666] vgaarb: loaded
> [ 0.196666] PCI: Using ACPI for IRQ routing
> [ 0.199999] Switching to clocksource kvm-clock
> [ 0.199999] pnp: PnP ACPI init
> [ 0.199999] ACPI: bus type pnp registered
> [ 0.199999] pnp: PnP ACPI: found 8 devices
> [ 0.199999] ACPI: ACPI bus type pnp unregistered
> [ 0.199999] PnPBIOS: Disabled
> ...
And here's yet another variant I just got. It hung much earlier
this time, now with 100% CPU usage:
...
[ 0.000000] Kernel command line: rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
...
[ 0.009012] using C1E aware idle routine
[ 0.009430] Performance Events: AMD PMU driver.
[ 0.010009] ... version: 0
[ 0.010427] ... bit width: 48
[ 0.010853] ... generic registers: 4
[ 0.011270] ... value mask: 0000ffffffffffff
[ 0.011818] ... max period: 00007fffffffffff
[ 0.012366] ... fixed-purpose events: 0
[ 0.012785] ... event mask: 000000000000000f
[ 0.016795] ACPI: Core revision 20100428
[ 0.018729] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.019999] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.019999] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
and.. nothing (this is with -cpu host). So this is _way_
before the kvmclock registration.
Another:
...
[ 0.109999] vgaarb: loaded
[ 0.109999] PCI: Using ACPI for IRQ routing
[ 0.113333] Switching to clocksource kvm-clock
[ 0.116666] pnp: PnP ACPI init
[ 0.116666] ACPI: bus type pnp registered
(note the "uncommon" timestamp ;)
With printk.time=0 it still boots ok.
Note there are 2 "versions" of this hang. The one which is
trivially triggerable right at the kvmclock registration
without the bandaid printk patch applied - it hangs there
with 100% cpu usage and guest not reacting to any events.
This is what happened in the above case where it hung
at CPU0 line, too -- 100% CPU and no reaction to keyboard.
Another, much more common variant with that printk patch
applied is like no cpu usage, the guest reacts to keyboard
events (I can Shift+PgUp/PgDown for example), but it does
not do anything else, and the time printed is constant.
Thanks!
/mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 7:35 ` Michael Tokarev
2010-10-02 7:40 ` Michael Tokarev
@ 2010-10-02 16:10 ` Arjan Koers
2010-10-02 20:26 ` Michael Tokarev
2010-10-02 23:42 ` Zachary Amsden
2010-10-02 21:55 ` Zachary Amsden
2 siblings, 2 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-02 16:10 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara, jeremy
On 2010-10-02 09:35, Michael Tokarev wrote:
> 02.10.2010 09:35, Zachary Amsden wrote:
> []
>> Can you try this patch to see if it helps? I believe it is also safe
>> for Xen, but cc'ing to double check.
>
> It makes no visible difference.
>
> For some reason one of my test guests - 2.6.35.6 32bit kernel -
> stopped booting completely, always hanging at boot somewhere
> unless I disable printk.time. Here's the typical boot messages,
> up to the hang:
>
> [ 0.000000] Initializing cgroup subsys cpuset
...
> [ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> [ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>
> Note the time - it is constant after switching to kvmclock.
While CPU 1 is booting, pvclock_clocksource_read gets wrong data for that
CPU and returns a value that's far into the future. On subsequent calls, it
keeps returning that bogus 'last' value, because it has been made
to never go backwards in time.
I'm pretty sure that your kernel will boot with this debug patch (for
2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
The patch sets the clock backwards if it detects that the previous
value was far into the future. It also modifies printk to display some
extra information. The DEBUG define was added to get extra calls to
printk's where things can go wrong.
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..5eab569 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,12 +120,15 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
static atomic64_t last_value = ATOMIC64_INIT(0);
+int pvclock_backwards = 0;
+
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
{
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
u64 last;
+ bool backwards;
do {
version = pvclock_get_time_values(&shadow, src);
@@ -153,13 +156,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
* updating at the same time, and one of them could be slightly behind,
* making the assumption that last_value always go forward fail to hold.
*/
+ backwards = false;
last = atomic64_read(&last_value);
do {
- if (ret < last)
- return last;
+ if (ret < last) {
+ if ( last - ret < 25000000 )
+ return last;
+ else
+ /* The clock will go backwards instead of being stuck at last value for a very long time
+ * The return value of the previous call to pvclock_clocksource_read was most probably
+ * very far into the future because the clock for that CPU hadn't been setup yet
+ */
+ backwards = true;
+ }
last = atomic64_cmpxchg(&last_value, last, ret);
} while (unlikely(last != ret));
+ /* Increment outside of the while loop, because it always loops twice */
+ if (backwards)
+ pvclock_backwards++;
+
return ret;
}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0bf2ece..d6dcd45 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1,3 +1,5 @@
+#define DEBUG
+
/*
* x86 SMP booting functions
*
diff --git a/kernel/printk.c b/kernel/printk.c
index 444b770..9608bec 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -687,6 +687,8 @@ static inline void printk_delay(void)
}
}
+extern int pvclock_backwards;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
int printed_len = 0;
@@ -778,9 +780,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
unsigned long long t;
unsigned long nanosec_rem;
+ int pvclock_backwards_prev = pvclock_backwards;
t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
- tlen = sprintf(tbuf, "[%5lu.%06lu] ",
+ tlen = sprintf(tbuf, "[%d;%d/%d:%5lu.%06lu] ",
+ printk_cpu,
+ pvclock_backwards_prev,
+ pvclock_backwards,
(unsigned long) t,
nanosec_rem / 1000);
Partial output on my machine, where the clock is set backwards 4 times:
...
[0;0/0: 0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[0;0/0: 0.124164] ++++++++++++++++++++=_---CPU UP 1
[0;0/0: 0.124193] Booting Node 0, Processors #1 Ok.
[0;0/0: 0.124602] Setting warm reset code and vector.
[0;0/0: 0.124609] 1.
[0;0/0: 0.124610] 2.
[0;0/0: 0.124611] 3.
[0;0/0: 0.124624] Asserting INIT.
[0;0/0: 0.124634] Waiting for send to finish...
[0;0/0: 0.134508] Deasserting INIT.
[0;0/0: 0.134515] Waiting for send to finish...
[0;0/0: 0.134519] #startup loops: 2.
[0;0/0: 0.134521] Sending STARTUP #1.
[0;0/0: 0.134527] After apic_write.
[1;0/0: 0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0;0/1: 0.134838] Startup point 1.
[0;1/1: 0.134841] Waiting for send to finish...
[0;1/1: 0.135049] Sending STARTUP #2.
[0;1/1: 0.135055] After apic_write.
[0;1/1: 0.135359] Startup point 1.
[0;1/1: 0.135361] Waiting for send to finish...
[0;1/1: 0.135568] After Startup.
[0;1/1: 0.135569] Before Callout 1.
[0;1/1: 0.135571] After Callout 1.
[1;1/1: 0.008000] CALLIN, before setup_local_APIC().
[1;2/2: 0.008000] Stack at about ffff88001f875f44
[0;3/3: 0.136176] CPU1: has booted.
[1;3/3: 0.008000] kvm-clock: cpu 1, msr 0:1511c41, secondary cpu clock
[0;4/4: 0.136199] Brought up 2 CPUs
[0;4/4: 0.136201] Boot done.
[0;4/4: 0.136202] Before bogomips.
[0;4/4: 0.136204] Total of 2 processors activated (11198.56 BogoMIPS).
[0;4/4: 0.136205] Before bogocount - setting activated=1.
[1;4/4: 0.140208] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0;4/4: 0.142577] NET: Registered protocol family 16
[0;4/4: 0.144263] PCI: Using configuration type 1 for base access
[0;4/4: 0.144494] PCI: Using configuration type 1 for extended access
[0;4/4: 0.144938] mtrr: your CPUs had inconsistent variable MTRR settings
[0;4/4: 0.144938] mtrr: your CPUs had inconsistent MTRRdefType settings
[0;4/4: 0.144938] mtrr: probably your BIOS does not setup all CPUs.
[0;4/4: 0.148004] mtrr: corrected configuration.
[0;4/4: 0.156040] bio: create slab <bio-0> at 0
[0;4/4: 0.156602] vgaarb: loaded
[0;4/4: 0.156602] PCI: Probing PCI hardware
[0;4/4: 0.156602] PCI: Probing PCI hardware (bus 00)
[0;4/4: 0.156703] pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
[0;4/4: 0.160269] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[0;4/4: 0.161055] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
[0;4/4: 0.164064] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[0;4/4: 0.164827] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[0;4/4: 0.169023] pci 0000:00:03.0: reg 10: [io 0xc020-0xc03f]
[0;4/4: 0.170052] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
[0;4/4: 0.170381] pci 0000:00:04.0: reg 10: [io 0xc040-0xc05f]
[0;4/4: 0.170765] pci 0000:00:05.0: reg 10: [io 0xc080-0xc0bf]
[0;4/4: 0.171023] pci 0000:00:06.0: reg 10: [io 0xc0c0-0xc0ff]
[0;4/4: 0.172123] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0;4/4: 0.172971] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
[0;4/4: 0.172971] PCI: pci_cache_line_size set to 64 bytes
[0;4/4: 0.172971] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
[0;4/4: 0.172971] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[0;4/4: 0.176175] Switching to clocksource kvm-clock
[1;4/4: 0.212494] pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
[1;4/4: 0.212500] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
[1;4/4: 0.212828] NET: Registered protocol family 2
[1;4/4: 0.213783] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
...
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 16:10 ` Arjan Koers
@ 2010-10-02 20:26 ` Michael Tokarev
2010-10-02 23:42 ` Zachary Amsden
1 sibling, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-02 20:26 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Zachary Amsden, Marcelo Tosatti, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
[-- Attachment #1: Type: text/plain, Size: 765 bytes --]
02.10.2010 20:10, Arjan Koers wrote:
[]
> I'm pretty sure that your kernel will boot with this debug patch (for
> 2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
> The patch sets the clock backwards if it detects that the previous
> value was far into the future. It also modifies printk to display some
> extra information. The DEBUG define was added to get extra calls to
> printk's where things can go wrong.
Yes, it boots fine with this patch applied. Attached is the dmesg
output of it.
[]
> Partial output on my machine, where the clock is set backwards 4 times:
> ...
> [0;0/0: 0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
Um. I wonder if it's AMD-specific somehow... ;)
(I also use -cpu host)
Thanks!
/mjt
[-- Attachment #2: dmesg-2.6.36-i686-pvclock-debug-patch.txt --]
[-- Type: text/plain, Size: 24434 bytes --]
[0;0/0: 0.000000] Initializing cgroup subsys cpuset
[0;0/0: 0.000000] Initializing cgroup subsys cpu
[0;0/0: 0.000000] Linux version 2.6.35-i686 (mjt@gandalf) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2.6.35.6 SMP Thu Sep 30 12:00:24 MSD 2010
[0;0/0: 0.000000] BIOS-provided physical RAM map:
[0;0/0: 0.000000] BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[0;0/0: 0.000000] BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[0;0/0: 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[0;0/0: 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[0;0/0: 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[0;0/0: 0.000000] BIOS-e820: 00000000feffd000 - 00000000ff001000 (reserved)
[0;0/0: 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[0;0/0: 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[0;0/0: 0.000000] DMI 2.4 present.
[0;0/0: 0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[0;0/0: 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[0;0/0: 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x100000
[0;0/0: 0.000000] MTRR default type: write-back
[0;0/0: 0.000000] MTRR fixed ranges enabled:
[0;0/0: 0.000000] 00000-9FFFF write-back
[0;0/0: 0.000000] A0000-BFFFF uncachable
[0;0/0: 0.000000] C0000-FFFFF write-protect
[0;0/0: 0.000000] MTRR variable ranges enabled:
[0;0/0: 0.000000] 0 base 00E0000000 mask FFE0000000 uncachable
[0;0/0: 0.000000] 1 disabled
[0;0/0: 0.000000] 2 disabled
[0;0/0: 0.000000] 3 disabled
[0;0/0: 0.000000] 4 disabled
[0;0/0: 0.000000] 5 disabled
[0;0/0: 0.000000] 6 disabled
[0;0/0: 0.000000] 7 disabled
[0;0/0: 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[0;0/0: 0.000000] initial memory mapped : 0 - 01800000
[0;0/0: 0.000000] found SMP MP-table at [c00fdbe0] fdbe0
[0;0/0: 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[0;0/0: 0.000000] 0000000000 - 0000400000 page 4k
[0;0/0: 0.000000] 0000400000 - 001fc00000 page 2M
[0;0/0: 0.000000] 001fc00000 - 001fffd000 page 4k
[0;0/0: 0.000000] kernel direct mapping tables up to 1fffd000 @ 7000-c000
[0;0/0: 0.000000] RAMDISK: 1fbb5000 - 1fe96000
[0;0/0: 0.000000] ACPI: RSDP 000fdb90 00014 (v00 BOCHS )
[0;0/0: 0.000000] ACPI: RSDT 1fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[0;0/0: 0.000000] ACPI: FACP 1ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[0;0/0: 0.000000] ACPI: DSDT 1fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[0;0/0: 0.000000] ACPI: FACS 1ffffe00 00040
[0;0/0: 0.000000] ACPI: SSDT 1fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[0;0/0: 0.000000] ACPI: APIC 1fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[0;0/0: 0.000000] ACPI: HPET 1fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[0;0/0: 0.000000] ACPI: Local APIC address 0xfee00000
[0;0/0: 0.000000] 0MB HIGHMEM available.
[0;0/0: 0.000000] 511MB LOWMEM available.
[0;0/0: 0.000000] mapped low ram: 0 - 1fffd000
[0;0/0: 0.000000] low ram: 0 - 1fffd000
[0;0/0: 0.000000] kvm-clock: Using msrs 12 and 11
[0;0/0: 0.000000] kvm-clock: cpu 0, msr 0:13c70c1, boot clock
[0;0/0: 0.000000] Zone PFN ranges:
[0;0/0: 0.000000] DMA 0x00000001 -> 0x00001000
[0;0/0: 0.000000] Normal 0x00001000 -> 0x0001fffd
[0;0/0: 0.000000] HighMem empty
[0;0/0: 0.000000] Movable zone start PFN for each node
[0;0/0: 0.000000] early_node_map[2] active PFN ranges
[0;0/0: 0.000000] 0: 0x00000001 -> 0x0000009f
[0;0/0: 0.000000] 0: 0x00000100 -> 0x0001fffd
[0;0/0: 0.000000] On node 0 totalpages: 130971
[0;0/0: 0.000000] free_area_init_node: node 0, pgdat c135ffc0, node_mem_map c1454020
[0;0/0: 0.000000] DMA zone: 32 pages used for memmap
[0;0/0: 0.000000] DMA zone: 0 pages reserved
[0;0/0: 0.000000] DMA zone: 3966 pages, LIFO batch:0
[0;0/0: 0.000000] Normal zone: 992 pages used for memmap
[0;0/0: 0.000000] Normal zone: 125981 pages, LIFO batch:31
[0;0/0: 0.000000] Using APIC driver default
[0;0/0: 0.000000] ACPI: PM-Timer IO Port: 0xb008
[0;0/0: 0.000000] ACPI: Local APIC address 0xfee00000
[0;0/0: 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0;0/0: 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[0;0/0: 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[0;0/0: 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[0;0/0: 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0;0/0: 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0;0/0: 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0;0/0: 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0;0/0: 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0;0/0: 0.000000] ACPI: IRQ0 used by override.
[0;0/0: 0.000000] ACPI: IRQ2 used by override.
[0;0/0: 0.000000] ACPI: IRQ5 used by override.
[0;0/0: 0.000000] ACPI: IRQ9 used by override.
[0;0/0: 0.000000] ACPI: IRQ10 used by override.
[0;0/0: 0.000000] ACPI: IRQ11 used by override.
[0;0/0: 0.000000] Using ACPI (MADT) for SMP configuration information
[0;0/0: 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[0;0/0: 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[0;0/0: 0.000000] nr_irqs_gsi: 40
[0;0/0: 0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[0;0/0: 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[0;0/0: 0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[0;0/0: 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:deffd000)
[0;0/0: 0.000000] Booting paravirtualized kernel on KVM
[0;0/0: 0.000000] early_res array is doubled to 64 at [8000 - 87ff]
[0;0/0: 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
[0;0/0: 0.000000] PERCPU: Embedded 16 pages/cpu @c1c00000 s43072 r0 d22464 u2097152
[0;0/0: 0.000000] pcpu-alloc: s43072 r0 d22464 u2097152 alloc=1*4194304
[0;0/0: 0.000000] pcpu-alloc: [0] 0 1
[0;0/0: 0.000000] kvm-clock: cpu 0, msr 0:1c0a0c1, primary cpu clock
[0;0/0: 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129947
[0;0/0: 0.000000] Kernel command line: acpi_enforce_resources=lax rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 debug console=ttyS0 console=tty1 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
[0;0/0: 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[0;0/0: 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[0;0/0: 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[0;0/0: 0.000000] Enabling fast FPU save and restore... done.
[0;0/0: 0.000000] Enabling unmasked SIMD FPU exception support... done.
[0;0/0: 0.000000] Initializing CPU#0
[0;0/0: 0.000000] Subtract (42 early reservations)
[0;0/0: 0.000000] #1 [0000001000 - 0000002000] EX TRAMPOLINE
[0;0/0: 0.000000] #2 [0001000000 - 000144b9e4] TEXT DATA BSS
[0;0/0: 0.000000] #3 [001fbb5000 - 001fe96000] RAMDISK
[0;0/0: 0.000000] #4 [000144c000 - 0001452049] BRK
[0;0/0: 0.000000] #5 [000009f400 - 00000fdbe0] BIOS reserved
[0;0/0: 0.000000] #6 [00000fdbe0 - 00000fdbf0] MP-table mpf
[0;0/0: 0.000000] #7 [00000fdce4 - 0000100000] BIOS reserved
[0;0/0: 0.000000] #8 [00000fdbf0 - 00000fdce4] MP-table mpc
[0;0/0: 0.000000] #9 [0000002000 - 0000003000] TRAMPOLINE
[0;0/0: 0.000000] #10 [0000003000 - 0000007000] ACPI WAKEUP
[0;0/0: 0.000000] #11 [0000007000 - 0000008000] PGTABLE
[0;0/0: 0.000000] #12 [0001453000 - 0001454000] BOOTMEM
[0;0/0: 0.000000] #13 [0001454000 - 0001854000] BOOTMEM
[0;0/0: 0.000000] #14 [000144ba00 - 000144ba04] BOOTMEM
[0;0/0: 0.000000] #15 [000144ba40 - 000144bb00] BOOTMEM
[0;0/0: 0.000000] #16 [000144bb00 - 000144bb30] BOOTMEM
[0;0/0: 0.000000] #17 [0001854000 - 0001855800] BOOTMEM
[0;0/0: 0.000000] #18 [000144bb40 - 000144bb65] BOOTMEM
[0;0/0: 0.000000] #19 [000144bb80 - 000144bba7] BOOTMEM
[0;0/0: 0.000000] #20 [000144bbc0 - 000144bca0] BOOTMEM
[0;0/0: 0.000000] #21 [000144bcc0 - 000144bd00] BOOTMEM
[0;0/0: 0.000000] #22 [000144bd00 - 000144bd40] BOOTMEM
[0;0/0: 0.000000] #23 [000144bd40 - 000144bd80] BOOTMEM
[0;0/0: 0.000000] #24 [000144bd80 - 000144bdc0] BOOTMEM
[0;0/0: 0.000000] #25 [000144bdc0 - 000144be00] BOOTMEM
[0;0/0: 0.000000] #26 [000144be00 - 000144be40] BOOTMEM
[0;0/0: 0.000000] #27 [000144be40 - 000144be80] BOOTMEM
[0;0/0: 0.000000] #28 [000144be80 - 000144be90] BOOTMEM
[0;0/0: 0.000000] #29 [000144bec0 - 000144bfd5] BOOTMEM
[0;0/0: 0.000000] #30 [0001452080 - 0001452195] BOOTMEM
[0;0/0: 0.000000] #31 [0001c00000 - 0001c10000] BOOTMEM
[0;0/0: 0.000000] #32 [0001e00000 - 0001e10000] BOOTMEM
[0;0/0: 0.000000] #33 [00014521c0 - 00014521c4] BOOTMEM
[0;0/0: 0.000000] #34 [0001452200 - 0001452204] BOOTMEM
[0;0/0: 0.000000] #35 [0001452240 - 0001452248] BOOTMEM
[0;0/0: 0.000000] #36 [0001452280 - 0001452288] BOOTMEM
[0;0/0: 0.000000] #37 [00014522c0 - 0001452368] BOOTMEM
[0;0/0: 0.000000] #38 [0001452380 - 00014523e8] BOOTMEM
[0;0/0: 0.000000] #39 [0001855800 - 0001857800] BOOTMEM
[0;0/0: 0.000000] #40 [0001857800 - 0001897800] BOOTMEM
[0;0/0: 0.000000] #41 [0001897800 - 00018b7800] BOOTMEM
[0;0/0: 0.000000] Initializing HighMem for node 0 (00000000:00000000)
[0;0/0: 0.000000] Memory: 511852k/524276k available (2555k kernel code, 12032k reserved, 929k data, 384k init, 0k highmem)
[0;0/0: 0.000000] virtual kernel memory layout:
[0;0/0: 0.000000] fixmap : 0xfff16000 - 0xfffff000 ( 932 kB)
[0;0/0: 0.000000] pkmap : 0xff800000 - 0xffc00000 (4096 kB)
[0;0/0: 0.000000] vmalloc : 0xe07fd000 - 0xff7fe000 ( 496 MB)
[0;0/0: 0.000000] lowmem : 0xc0000000 - 0xdfffd000 ( 511 MB)
[0;0/0: 0.000000] .init : 0xc1368000 - 0xc13c8000 ( 384 kB)
[0;0/0: 0.000000] .data : 0xc127ed37 - 0xc1367488 ( 929 kB)
[0;0/0: 0.000000] .text : 0xc1000000 - 0xc127ed37 (2555 kB)
[0;0/0: 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[0;0/0: 0.000000] Hierarchical RCU implementation.
[0;0/0: 0.000000] RCU-based detection of stalled CPUs is disabled.
[0;0/0: 0.000000] Verbose stalled-CPUs detection is disabled.
[0;0/0: 0.000000] NR_IRQS:512
[0;0/0: 0.000000] CPU 0 irqstacks, hard=c1c00000 soft=c1c01000
[0;0/0: 0.000000] Console: colour VGA+ 80x25
[0;0/0: 0.000000] console [tty1] enabled
[0;0/0: 0.000000] console [ttyS0] enabled
[0;0/0: 0.000000] Detected 3217.252 MHz processor.
[0;0/0: 0.006666] Calibrating delay loop (skipped) preset value.. 6437.60 BogoMIPS (lpj=10724173)
[0;0/0: 0.006666] pid_max: default: 32768 minimum: 301
[0;0/0: 0.006666] Mount-cache hash table entries: 512
[0;0/0: 0.006780] Initializing cgroup subsys ns
[0;0/0: 0.007518] Initializing cgroup subsys cpuacct
[0;0/0: 0.008302] Initializing cgroup subsys devices
[0;0/0: 0.009087] Initializing cgroup subsys freezer
[0;0/0: 0.010003] Initializing cgroup subsys net_cls
[0;0/0: 0.010801] Initializing cgroup subsys blkio
[0;0/0: 0.011622] mce: CPU supports 10 MCE banks
[0;0/0: 0.012406] using C1E aware idle routine
[0;0/0: 0.013344] Performance Events: AMD PMU driver.
[0;0/0: 0.014201] ... version: 0
[0;0/0: 0.015007] ... bit width: 48
[0;0/0: 0.015750] ... generic registers: 4
[0;0/0: 0.016668] ... value mask: 0000ffffffffffff
[0;0/0: 0.017555] ... max period: 00007fffffffffff
[0;0/0: 0.018447] ... fixed-purpose events: 0
[0;0/0: 0.019176] ... event mask: 000000000000000f
[0;0/0: 0.023763] ACPI: Core revision 20100428
[0;0/0: 0.026153] Enabling APIC mode: Flat. Using 1 I/O APICs
[0;0/0: 0.028267] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[0;0/0: 0.029348] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
[0;0/0: 0.033332] ++++++++++++++++++++=_---CPU UP 1
[0;0/0: 0.033332] CPU 1 irqstacks, hard=c1e00000 soft=c1e01000
[0;0/0: 0.033332] Booting Node 0, Processors #1 Ok.
[0;0/0: 0.033332] Setting warm reset code and vector.
[0;0/0: 0.033340] 1.
[0;0/0: 0.033807] 2.
[0;0/0: 0.034286] 3.
[0;0/0: 0.034760] Asserting INIT.
[0;0/0: 0.035382] Waiting for send to finish...
[0;0/0: 0.047424] Deasserting INIT.
[0;0/0: 0.048263] Waiting for send to finish...
[0;0/0: 0.049039] #startup loops: 2.
[0;0/0: 0.049687] Sending STARTUP #1.
[0;0/0: 0.050004] After apic_write.
[1;0/0: 0.006666] Initializing CPU#1
[1;0/0: 0.006666] CPU#1 (phys ID: 1) waiting for CALLOUT
[0;0/1: 0.050947] Startup point 1.
[0;1/1: 0.053334] Waiting for send to finish...
[0;1/1: 0.054307] Sending STARTUP #2.
[0;1/1: 0.054976] After apic_write.
[0;1/1: 0.055910] Startup point 1.
[0;1/1: 0.056529] Waiting for send to finish...
[0;1/1: 0.056873] After Startup.
[0;1/1: 0.057477] Before Callout 1.
[0;1/1: 0.058108] After Callout 1.
[1;1/1: 0.006666] CALLIN, before setup_local_APIC().
[1;1/1: 0.006666] Stack at about df45afb0
[0;2/2: 0.063338] CPU1: has booted.
[1;2/2: 0.064004] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
[0;2/2: 0.064020] Brought up 2 CPUs
[0;2/2: 0.064022] Boot done.
[0;2/2: 0.064023] Before bogomips.
[0;2/2: 0.064024] Total of 2 processors activated (12874.21 BogoMIPS).
[0;2/2: 0.064026] Before bogocount - setting activated=1.
[1;2/2: 0.070041] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0;2/2: 0.071991] devtmpfs: initialized
[0;2/2: 0.090107] NET: Registered protocol family 16
[0;2/2: 0.096723] ACPI: bus type pci registered
[0;2/2: 0.097582] PCI: PCI BIOS revision 2.10 entry at 0xffe77, last bus=0
[0;2/2: 0.100003] PCI: Using configuration type 1 for base access
[0;2/2: 0.100980] PCI: Using configuration type 1 for extended access
[0;2/2: 0.103511] mtrr: your CPUs had inconsistent variable MTRR settings
[0;2/2: 0.105319] mtrr: your CPUs had inconsistent MTRRdefType settings
[0;2/2: 0.106680] mtrr: probably your BIOS does not setup all CPUs.
[0;2/2: 0.108291] mtrr: corrected configuration.
[0;2/2: 0.120562] bio: create slab <bio-0> at 0
[0;2/2: 0.123617] ACPI: EC: Look up EC in DSDT
[0;2/2: 0.133128] ACPI: Interpreter enabled
[0;2/2: 0.133340] ACPI: (supports S0 S3 S4 S5)
[0;2/2: 0.140012] ACPI: Using IOAPIC for interrupt routing
[0;2/2: 0.173646] ACPI: No dock devices found.
[0;2/2: 0.174654] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[0;2/2: 0.176690] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[0;2/2: 0.180038] pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
[0;2/2: 0.181789] pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
[0;2/2: 0.183336] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
[0;2/2: 0.185124] pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff] (ignored)
[0;2/2: 0.188101] pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
[0;2/2: 0.190266] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
[0;2/2: 0.192430] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
[0;2/2: 0.197592] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[0;2/2: 0.200843] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
[0;2/2: 0.207003] pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
[0;2/2: 0.208404] pci 0000:00:03.0: reg 10: [io 0xc020-0xc03f]
[0;2/2: 0.209386] pci 0000:00:03.0: reg 14: [mem 0xf2020000-0xf2020fff]
[0;2/2: 0.210083] pci 0000:00:03.0: reg 30: [mem 0xf2030000-0xf203ffff pref]
[0;2/2: 0.211384] pci_bus 0000:00: on NUMA node 0
[0;2/2: 0.212212] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[0;2/2: 0.250148] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[0;2/2: 0.255557] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[0;2/2: 0.257251] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[0;2/2: 0.260271] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[0;2/2: 0.261575] HEST: Table is not found!
[0;2/2: 0.263389] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0;2/2: 0.264871] vgaarb: loaded
[0;2/2: 0.266696] PCI: Using ACPI for IRQ routing
[0;2/2: 0.267533] PCI: pci_cache_line_size set to 64 bytes
[0;2/2: 0.268531] reserve RAM buffer: 000000000009f400 - 000000000009ffff
[0;2/2: 0.269456] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
[0;2/2: 0.270108] Switching to clocksource kvm-clock
[1;2/2: 0.273590] pnp: PnP ACPI init
[1;2/2: 0.276224] ACPI: bus type pnp registered
[1;2/2: 0.289416] pnp: PnP ACPI: found 8 devices
[1;2/2: 0.292666] ACPI: ACPI bus type pnp unregistered
[1;2/2: 0.296401] PnPBIOS: Disabled
[1;2/2: 0.347394] pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
[1;2/2: 0.348535] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffff]
[1;2/2: 0.349729] NET: Registered protocol family 2
[1;2/2: 0.350645] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[1;2/2: 0.353500] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[1;2/2: 0.355026] TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
[1;2/2: 0.356310] TCP: Hash tables configured (established 16384 bind 16384)
[1;2/2: 0.357441] TCP reno registered
[1;2/2: 0.358122] UDP hash table entries: 256 (order: 1, 8192 bytes)
[1;2/2: 0.359155] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[1;2/2: 0.360525] NET: Registered protocol family 1
[1;2/2: 0.361377] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[1;2/2: 0.362416] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[1;2/2: 0.363432] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[1;2/2: 0.364514] pci 0000:00:02.0: Boot video device
[1;2/2: 0.365381] PCI: CLS 0 bytes, default 64
[1;2/2: 0.366199] Unpacking initramfs...
[1;2/2: 0.424327] Freeing initrd memory: 2948k freed
[1;2/2: 0.440434] HugeTLB registered 4 MB page size, pre-allocated 0 pages
[1;2/2: 0.442047] VFS: Disk quotas dquot_6.5.2
[1;2/2: 0.442848] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[1;2/2: 0.446823] msgmni has been set to 1005
[1;2/2: 0.447985] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[1;2/2: 0.449421] io scheduler noop registered
[1;2/2: 0.450190] io scheduler deadline registered
[1;2/2: 0.451094] io scheduler cfq registered (default)
[1;2/2: 0.453188] ERST: Table is not found!
[1;2/2: 0.454095] isapnp: Scanning for PnP cards...
[1;2/2: 0.824535] isapnp: No Plug & Play device found
[1;2/2: 0.826091] hpet_acpi_add: no address or irqs in _CRS
[1;2/2: 0.827286] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[1;2/2: 0.828924] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1;2/2: 0.836029] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1;2/2: 0.838659] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[1;2/2: 0.840619] serio: i8042 KBD port at 0x60,0x64 irq 1
[1;2/2: 0.841541] serio: i8042 AUX port at 0x60,0x64 irq 12
[1;2/2: 0.845012] mice: PS/2 mouse device common for all mice
[1;2/2: 0.847980] input: PC Speaker as /devices/platform/pcspkr/input/input0
[1;2/2: 0.849155] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[1;2/2: 0.852013] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[1;2/2: 0.854609] rtc0: alarms up to one day, 114 bytes nvram
[1;2/2: 0.886105] cpuidle: using governor ladder
[1;2/2: 0.887061] cpuidle: using governor menu
[1;2/2: 0.887913] TCP cubic registered
[1;2/2: 0.888669] NET: Registered protocol family 17
[1;2/2: 0.889591] Using IPI No-Shortcut mode
[0;2/2: 0.931969] rtc_cmos 00:01: setting system clock to 2010-10-02 20:25:08 UTC (1286051108)
[0;2/2: 0.933627] Freeing unused kernel memory: 384k freed
[0;2/2: 0.935692] Processing INITRAMFS
[1;2/2: 1.006076] Clocksource tsc unstable (delta = 4015199349967 ns)
[1;2/2: 1.146625] SCSI subsystem initialized
[1;2/2: 1.157307] libata version 3.00 loaded.
[1;2/2: 1.182560] pata_acpi 0000:00:01.1: setting latency timer to 64
[1;2/2: 1.222009] ata_piix 0000:00:01.1: version 2.13
[1;2/2: 1.223035] ata_piix 0000:00:01.1: setting latency timer to 64
[1;2/2: 1.241698] scsi0 : ata_piix
[1;2/2: 1.243020] scsi1 : ata_piix
[1;2/2: 1.244063] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
[1;2/2: 1.245429] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
[1;2/2: 1.397679] ata2.01: NODEV after polling detection
[1;2/2: 1.402153] ata2.00: ATAPI: QEMU DVD-ROM, 0.12.91, max UDMA/100
[1;2/2: 1.409234] ata2.00: configured for MWDMA2
[1;2/2: 1.423776] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 0.12 PQ: 0 ANSI: 5
[0;2/2: 1.484757] sr0: scsi3-mmc drive: 4x/4x xa/form2 tray
[0;2/2: 1.485748] Uniform CD-ROM driver Revision: 3.20
[1;2/2: 1.486967] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[1;2/2: 1.488081] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[0;2/2: 1.488423] sr 1:0:0:0: Attached scsi CD-ROM sr0
[1;2/2: 1.490498] virtio-pci 0000:00:03.0: setting latency timer to 64
[1;2/2: 1.493282] warning: unable to find netif for 52:54:00:12:34:56, using eth0
[1;2/2: 1.494929] configuring network interface eth0: 192.168.88.60/255.255.255.0
[0;2/2: 1.496382] virtio-pci 0000:00:03.0: irq 40 for MSI/MSI-X
[0;2/2: 1.497404] virtio-pci 0000:00:03.0: irq 41 for MSI/MSI-X
[0;2/2: 1.498351] virtio-pci 0000:00:03.0: irq 42 for MSI/MSI-X
[1;2/2: 1.499098] ifconfig: SIOCSIFADDR: No such device
[1;2/2: 1.499155] warning: the following command failed:
[1;2/2: 1.499167] warning: ifconfig eth0 inet 192.168.88.60 netmask 255.255.255.0 up
[1;2/2: 2.239140] mounting nfs fs on 192.168.88.4:/usr/rb (options: ro,nolock)
[1;2/2: 2.274254] RPC: Registered udp transport module.
[1;2/2: 2.275187] RPC: Registered tcp transport module.
[1;2/2: 2.276072] RPC: Registered tcp NFSv4.1 backchannel transport module.
[1;2/2: 2.288839] Slow work thread pool: Starting up
[1;2/2: 2.292424] Slow work thread pool: Ready
[1;2/2: 2.301246] FS-Cache: Loaded
[1;2/2: 2.313044] FS-Cache: Netfs 'nfs' registered for caching
[0;2/2: 2.321020] executing /remote/bootrc
[1;2/2: 2.353758] aufs 2-standalone.tree-35-20100823
[0;2/2: 2.387083] loop: module loaded
[1;2/2: 2.395463] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[0;2/2: 3.227124] udev: starting version 160
[0;2/2: 3.401994] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[0;2/2: 3.525318] parport_pc 00:05: reported by Plug and Play ACPI
[0;2/2: 3.527128] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[0;2/2: 3.536295] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[0;2/2: 3.545768] ACPI: Power Button [PWRF]
[1;2/2: 3.556560] sr 1:0:0:0: Attached scsi generic sg0 type 5
[0;2/2: 3.570451] FDC 0 is a S82078B
[0;2/2: 3.606791] ACPI: acpi_idle registered with cpuidle
[1;2/2: 4.049418] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
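
The bracketed prefix printed by the debug patch can be split mechanically when comparing logs like the one above. The sketch below assumes the layout "[cpu;a/b: seconds]" as it appears in this attachment; the meaning of the two counters is not spelled out in this part of the thread, so they are kept under the neutral, made-up names counter_a and counter_b. The helper also lists the places where the printed time steps backwards from one line to the next.

```python
import re

# Assumed prefix layout, taken from the log lines above: [cpu;a/b: seconds]
PREFIX = re.compile(r"^\[(\d+);(\d+)/(\d+):\s*(\d+\.\d+)\]\s?(.*)$")


def parse_line(line):
    """Return the fields of one debug-patch log line, or None if no match."""
    m = PREFIX.match(line)
    if m is None:
        return None
    cpu, a, b, ts, msg = m.groups()
    return {
        "cpu": int(cpu),
        "counter_a": int(a),   # hypothetical name, meaning not stated here
        "counter_b": int(b),   # hypothetical name, meaning not stated here
        "time": float(ts),
        "msg": msg,
    }


def backward_steps(lines):
    """List (previous_time, time, cpu) wherever the timestamp decreases."""
    steps = []
    prev = None
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if prev is not None and rec["time"] < prev["time"]:
            steps.append((prev["time"], rec["time"], rec["cpu"]))
        prev = rec
    return steps
```

Fed the four lines around "Initializing CPU#1" above, backward_steps reports the jump from 0.050004 on CPU 0 back to 0.006666 on CPU 1.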
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 7:35 ` Michael Tokarev
2010-10-02 7:40 ` Michael Tokarev
2010-10-02 16:10 ` Arjan Koers
@ 2010-10-02 21:55 ` Zachary Amsden
2010-10-03 8:16 ` Michael Tokarev
2 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-02 21:55 UTC (permalink / raw)
To: Michael Tokarev
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
On 10/01/2010 09:35 PM, Michael Tokarev wrote:
> 02.10.2010 09:35, Zachary Amsden wrote:
> []
>
>> Can you try this patch to see if it helps? I believe it is also safe
>> for Xen, but cc'ing to double check.
>>
> It makes no visible difference.
>
> For some reason one of my test guests - 2.6.35.6 32bit kernel -
> stopped booting completely, always hanging at boot somewhere
> unless I disable printk.time. Here's the typical boot messages,
> up to the hang:
>
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 2.6.35-i686 (mjt@gandalf) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #2.6.35.6 SMP Thu Sep 30 12:00:24 MSD 2010
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
> [ 0.000000] BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> [ 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
> [ 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
> [ 0.000000] BIOS-e820: 00000000feffd000 - 00000000ff001000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
> [ 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
> [ 0.000000] DMI 2.4 present.
> [ 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x100000
> [ 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
> [ 0.000000] found SMP MP-table at [c00fdbe0] fdbe0
> [ 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
> [ 0.000000] RAMDISK: 1fbb5000 - 1fe96000
> [ 0.000000] ACPI: RSDP 000fdb90 00014 (v00 BOCHS )
> [ 0.000000] ACPI: RSDT 1fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
> [ 0.000000] ACPI: FACP 1ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
> [ 0.000000] ACPI: DSDT 1fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
> [ 0.000000] ACPI: FACS 1ffffe00 00040
> [ 0.000000] ACPI: SSDT 1fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
> [ 0.000000] ACPI: APIC 1fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
> [ 0.000000] ACPI: HPET 1fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
> [ 0.000000] 0MB HIGHMEM available.
> [ 0.000000] 511MB LOWMEM available.
> [ 0.000000] mapped low ram: 0 - 1fffd000
> [ 0.000000] low ram: 0 - 1fffd000
> [ 0.000000] kvm-clock: Using msrs 12 and 11
> [ 0.000000] kvm-clock: cpu 0, msr 0:13c60c1, boot clock
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000001 -> 0x00001000
> [ 0.000000] Normal 0x00001000 -> 0x0001fffd
> [ 0.000000] HighMem empty
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] early_node_map[2] active PFN ranges
> [ 0.000000] 0: 0x00000001 -> 0x0000009f
> [ 0.000000] 0: 0x00000100 -> 0x0001fffd
> [ 0.000000] Using APIC driver default
> [ 0.000000] ACPI: PM-Timer IO Port: 0xb008
> [ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> [ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
> [ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> [ 0.000000] Using ACPI (MADT) for SMP configuration information
> [ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
> [ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
> [ 0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
> [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
> [ 0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
> [ 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:deffd000)
> [ 0.000000] Booting paravirtualized kernel on KVM
> [ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1
> [ 0.000000] PERCPU: Embedded 16 pages/cpu @c1c00000 s43072 r0 d22464 u2097152
> [ 0.000000] pcpu-alloc: s43072 r0 d22464 u2097152 alloc=1*4194304
> [ 0.000000] pcpu-alloc: [0] 0 1
> [ 0.000000] kvm-clock: cpu 0, msr 0:1c0a0c1, primary cpu clock
> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129947
> [ 0.000000] Kernel command line: acpi_enforce_resources=lax rootfs=nfs root=/usr/rb rootflags=ro,nolock bootrc=/remote/bootrc initrd=lnx/initrd-2.6.35-i686 ip=192.168.88.60:192.168.88.4:192.168.88.4:255.255.255.0 BOOTIF=01-52-54-00-12-34-56 console=tty1 console=ttyS0 BOOT_IMAGE=lnx/vmlinuz-2.6.35-i686
> [ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
> [ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> [ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> [ 0.000000] Enabling fast FPU save and restore... done.
> [ 0.000000] Enabling unmasked SIMD FPU exception support... done.
> [ 0.000000] Initializing CPU#0
> [ 0.000000] Subtract (42 early reservations)
> [ 0.000000] #1 [0000001000 - 0000002000] EX TRAMPOLINE
> [ 0.000000] #2 [0001000000 - 000144a9e4] TEXT DATA BSS
> [ 0.000000] #3 [001fbb5000 - 001fe96000] RAMDISK
> [ 0.000000] #4 [000144b000 - 0001451049] BRK
> [ 0.000000] #5 [000009f400 - 00000fdbe0] BIOS reserved
> [ 0.000000] #6 [00000fdbe0 - 00000fdbf0] MP-table mpf
> [ 0.000000] #7 [00000fdce4 - 0000100000] BIOS reserved
> [ 0.000000] #8 [00000fdbf0 - 00000fdce4] MP-table mpc
> [ 0.000000] #9 [0000002000 - 0000003000] TRAMPOLINE
> [ 0.000000] #10 [0000003000 - 0000007000] ACPI WAKEUP
> [ 0.000000] #11 [0000007000 - 0000008000] PGTABLE
> [ 0.000000] #12 [0001452000 - 0001453000] BOOTMEM
> [ 0.000000] #13 [0001453000 - 0001853000] BOOTMEM
> [ 0.000000] #14 [000144aa00 - 000144aa04] BOOTMEM
> [ 0.000000] #15 [000144aa40 - 000144ab00] BOOTMEM
> [ 0.000000] #16 [000144ab00 - 000144ab30] BOOTMEM
> [ 0.000000] #17 [0001853000 - 0001854800] BOOTMEM
> [ 0.000000] #18 [000144ab40 - 000144ab65] BOOTMEM
> [ 0.000000] #19 [000144ab80 - 000144aba7] BOOTMEM
> [ 0.000000] #20 [000144abc0 - 000144aca0] BOOTMEM
> [ 0.000000] #21 [000144acc0 - 000144ad00] BOOTMEM
> [ 0.000000] #22 [000144ad00 - 000144ad40] BOOTMEM
> [ 0.000000] #23 [000144ad40 - 000144ad80] BOOTMEM
> [ 0.000000] #24 [000144ad80 - 000144adc0] BOOTMEM
> [ 0.000000] #25 [000144adc0 - 000144ae00] BOOTMEM
> [ 0.000000] #26 [000144ae00 - 000144ae40] BOOTMEM
> [ 0.000000] #27 [000144ae40 - 000144ae80] BOOTMEM
> [ 0.000000] #28 [000144ae80 - 000144ae90] BOOTMEM
> [ 0.000000] #29 [000144aec0 - 000144afcf] BOOTMEM
> [ 0.000000] #30 [0001451080 - 000145118f] BOOTMEM
> [ 0.000000] #31 [0001c00000 - 0001c10000] BOOTMEM
> [ 0.000000] #32 [0001e00000 - 0001e10000] BOOTMEM
> [ 0.000000] #33 [00014511c0 - 00014511c4] BOOTMEM
> [ 0.000000] #34 [0001451200 - 0001451204] BOOTMEM
> [ 0.000000] #35 [0001451240 - 0001451248] BOOTMEM
> [ 0.000000] #36 [0001451280 - 0001451288] BOOTMEM
> [ 0.000000] #37 [00014512c0 - 0001451368] BOOTMEM
> [ 0.000000] #38 [0001451380 - 00014513e8] BOOTMEM
> [ 0.000000] #39 [0001854800 - 0001856800] BOOTMEM
> [ 0.000000] #40 [0001856800 - 0001896800] BOOTMEM
> [ 0.000000] #41 [0001896800 - 00018b6800] BOOTMEM
> [ 0.000000] Initializing HighMem for node 0 (00000000:00000000)
> [ 0.000000] Memory: 511856k/524276k available (2554k kernel code, 12028k reserved, 930k data, 380k init, 0k highmem)
> [ 0.000000] virtual kernel memory layout:
> [ 0.000000] fixmap : 0xfff16000 - 0xfffff000 ( 932 kB)
> [ 0.000000] pkmap : 0xff800000 - 0xffc00000 (4096 kB)
> [ 0.000000] vmalloc : 0xe07fd000 - 0xff7fe000 ( 496 MB)
> [ 0.000000] lowmem : 0xc0000000 - 0xdfffd000 ( 511 MB)
> [ 0.000000] .init : 0xc1368000 - 0xc13c7000 ( 380 kB)
> [ 0.000000] .data : 0xc127ebb7 - 0xc1367488 ( 930 kB)
> [ 0.000000] .text : 0xc1000000 - 0xc127ebb7 (2554 kB)
> [ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU-based detection of stalled CPUs is disabled.
> [ 0.000000] Verbose stalled-CPUs detection is disabled.
> [ 0.000000] NR_IRQS:512
> [ 0.000000] Console: colour VGA+ 80x25
> [ 0.000000] console [tty1] enabled
> [ 0.000000] console [ttyS0] enabled
> [ 0.000000] Detected 3217.252 MHz processor.
> [ 0.023332] Calibrating delay loop (skipped) preset value.. 6437.60 BogoMIPS (lpj=10724173)
> [ 0.023332] pid_max: default: 32768 minimum: 301
> [ 0.023332] Mount-cache hash table entries: 512
> [ 0.023447] Initializing cgroup subsys ns
> [ 0.024131] Initializing cgroup subsys cpuacct
> [ 0.024851] Initializing cgroup subsys devices
> [ 0.025580] Initializing cgroup subsys freezer
> [ 0.026669] Initializing cgroup subsys net_cls
> [ 0.027425] Initializing cgroup subsys blkio
> [ 0.030079] mce: CPU supports 10 MCE banks
> [ 0.030847] using C1E aware idle routine
> [ 0.031517] Performance Events: AMD PMU driver.
> [ 0.032313] ... version: 0
> [ 0.033335] ... bit width: 48
> [ 0.034036] ... generic registers: 4
> [ 0.034716] ... value mask: 0000ffffffffffff
> [ 0.035542] ... max period: 00007fffffffffff
> [ 0.036669] ... fixed-purpose events: 0
> [ 0.037521] ... event mask: 000000000000000f
> [ 0.041961] ACPI: Core revision 20100428
> [ 0.044150] Enabling APIC mode: Flat. Using 1 I/O APICs
> [ 0.045964] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.046671] CPU0: AMD Athlon(tm) II X2 260 Processor stepping 03
> [ 0.049999] APIC calibration not consistent with PM-Timer: 102ms instead of 100ms
> [ 0.049999] APIC delta adjusted to PM-Timer: 6248670 (6435422)
> [ 0.050298] Booting Node 0, Processors #1 Ok.
> [ 0.023332] Initializing CPU#1
>
Before this, time is very granular...
> [ 0.063333] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
> [ 0.063333] Brought up 2 CPUs
> [ 0.063333] Total of 2 processors activated (12874.21 BogoMIPS).
> [ 0.076666] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [ 0.116666] devtmpfs: initialized
> [ 0.116666] NET: Registered protocol family 16
> [ 0.119999] ACPI: bus type pci registered
>
Now it is multiples of 1/300 ....
> [ 0.123333] PCI: PCI BIOS revision 2.10 entry at 0xffe77, last bus=0
> [ 0.123333] PCI: Using configuration type 1 for base access
> [ 0.123333] PCI: Using configuration type 1 for extended access
> [ 0.126666] mtrr: your CPUs had inconsistent variable MTRR settings
> [ 0.126666] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 0.126666] mtrr: probably your BIOS does not setup all CPUs.
> [ 0.126666] mtrr: corrected configuration.
> [ 0.136666] bio: create slab<bio-0> at 0
> [ 0.153333] ACPI: Interpreter enabled
> [ 0.153333] ACPI: (supports S0 S3 S4 S5)
> [ 0.153333] ACPI: Using IOAPIC for interrupt routing
> [ 0.203333] ACPI: No dock devices found.
> [ 0.203333] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
> [ 0.206666] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [ 0.209999] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
> [ 0.209999] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
> [ 0.216666] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> [ 0.219999] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> [ 0.219999] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> [ 0.223333] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> [ 0.223333] HEST: Table is not found!
> [ 0.226666] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [ 0.229999] vgaarb: loaded
> [ 0.229999] PCI: Using ACPI for IRQ routing
> [ 0.233333] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> [ 0.239999] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> [ 0.239999] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
> [ 0.249999] Switching to clocksource kvm-clock
> [ 0.259999] pnp: PnP ACPI init
>
Then, of course, it fails.
What is your host clocksource? Does your machine have unstable TSC?
Here, I have unstable tsc:
[zamsden@mysore linux-2.6]$ cat /sys/devices/system/clocksource/clocksource0/*
hpet acpi_pm
hpet
Can you do this in the guest too? That will make it very clear what
clocksources the guest finds during bootup.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 16:10 ` Arjan Koers
2010-10-02 20:26 ` Michael Tokarev
@ 2010-10-02 23:42 ` Zachary Amsden
2010-10-03 8:27 ` Michael Tokarev
2010-10-08 0:12 ` Arjan Koers
1 sibling, 2 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-02 23:42 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
Andre Przywara
On 10/02/2010 06:10 AM, Arjan Koers wrote:
> On 2010-10-02 09:35, Michael Tokarev wrote:
>
>> 02.10.2010 09:35, Zachary Amsden wrote:
>> []
>>
>>> Can you try this patch to see if it helps? I believe it is also safe
>>> for Xen, but cc'ing to double check.
>>>
>> It makes no visible difference.
>>
>> For some reason one of my test guests - 2.6.35.6 32bit kernel -
>> stopped booting completely, always hanging at boot somewhere
>> unless I disable printk.time. Here's the typical boot messages,
>> up to the hang:
>>
>> [ 0.000000] Initializing cgroup subsys cpuset
>>
> ...
>
>> [ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
>>
> While CPU 1 is booting, pvclock_clocksource_read gets wrong data for that
> CPU and returns a value that's far into the future. On subsequent calls, it
> keeps returning that bogus 'last' value, because it has been made
> to never go backwards in time.
>
> I'm pretty sure that your kernel will boot with this debug patch (for
> 2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
> The patch sets the clock backwards if it detects that the previous
> value was far into the future. It also modifies printk to display some
> extra information. The DEBUG define was added to get extra calls to
> printk's where things can go wrong.
>
>
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 239427c..5eab569 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -120,12 +120,15 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
>
> static atomic64_t last_value = ATOMIC64_INIT(0);
>
> +int pvclock_backwards = 0;
> +
> cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> {
> struct pvclock_shadow_time shadow;
> unsigned version;
> cycle_t ret, offset;
> u64 last;
> + bool backwards;
>
> do {
> version = pvclock_get_time_values(&shadow, src);
> @@ -153,13 +156,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> * updating at the same time, and one of them could be slightly behind,
> * making the assumption that last_value always go forward fail to hold.
> */
> + backwards = false;
> last = atomic64_read(&last_value);
> do {
> - if (ret< last)
> - return last;
> + if (ret< last) {
> + if ( last - ret< 25000000 )
> + return last;
> + else
> + /* The clock will go backwards instead of being stuck at last value for a very long time
> + * The return value of the previous call to pvclock_clocksource_read was most probably
> + * very far into the future because the clock for that CPU hadn't been set up yet
> + */
> + backwards = true;
> + }
> last = atomic64_cmpxchg(&last_value, last, ret);
> } while (unlikely(last != ret));
>
> + /* Increment outside of the while loop, because it always loops twice */
> + if (backwards)
> + pvclock_backwards++;
> +
> return ret;
> }
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 0bf2ece..d6dcd45 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1,3 +1,5 @@
> +#define DEBUG
> +
> /*
> * x86 SMP booting functions
> *
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 444b770..9608bec 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -687,6 +687,8 @@ static inline void printk_delay(void)
> }
> }
>
> +extern int pvclock_backwards;
> +
> asmlinkage int vprintk(const char *fmt, va_list args)
> {
> int printed_len = 0;
> @@ -778,9 +780,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> unsigned long long t;
> unsigned long nanosec_rem;
>
> + int pvclock_backwards_prev = pvclock_backwards;
> t = cpu_clock(printk_cpu);
> nanosec_rem = do_div(t, 1000000000);
> - tlen = sprintf(tbuf, "[%5lu.%06lu] ",
> + tlen = sprintf(tbuf, "[%d;%d/%d:%5lu.%06lu] ",
> + printk_cpu,
> + pvclock_backwards_prev,
> + pvclock_backwards,
> (unsigned long) t,
> nanosec_rem / 1000);
>
>
>
>
> Partial output on my machine, where the clock is set backwards 4 times:
> ...
> [0;0/0: 0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
> [0;0/0: 0.124164] ++++++++++++++++++++=_---CPU UP 1
> [0;0/0: 0.124193] Booting Node 0, Processors #1 Ok.
> [0;0/0: 0.124602] Setting warm reset code and vector.
> [0;0/0: 0.124609] 1.
> [0;0/0: 0.124610] 2.
> [0;0/0: 0.124611] 3.
> [0;0/0: 0.124624] Asserting INIT.
> [0;0/0: 0.124634] Waiting for send to finish...
> [0;0/0: 0.134508] Deasserting INIT.
> [0;0/0: 0.134515] Waiting for send to finish...
> [0;0/0: 0.134519] #startup loops: 2.
> [0;0/0: 0.134521] Sending STARTUP #1.
> [0;0/0: 0.134527] After apic_write.
> [1;0/0: 0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
> [0;0/1: 0.134838] Startup point 1.
> [0;1/1: 0.134841] Waiting for send to finish...
> [0;1/1: 0.135049] Sending STARTUP #2.
> [0;1/1: 0.135055] After apic_write.
> [0;1/1: 0.135359] Startup point 1.
> [0;1/1: 0.135361] Waiting for send to finish...
> [0;1/1: 0.135568] After Startup.
> [0;1/1: 0.135569] Before Callout 1.
> [0;1/1: 0.135571] After Callout 1.
> [1;1/1: 0.008000] CALLIN, before setup_local_APIC().
> [1;2/2: 0.008000] Stack at about ffff88001f875f44
> [0;3/3: 0.136176] CPU1: has booted.
> [1;3/3: 0.008000] kvm-clock: cpu 1, msr 0:1511c41, secondary cpu clock
> [0;4/4: 0.136199] Brought up 2 CPUs
> [0;4/4: 0.136201] Boot done.
> [0;4/4: 0.136202] Before bogomips.
> [0;4/4: 0.136204] Total of 2 processors activated (11198.56 BogoMIPS).
> [0;4/4: 0.136205] Before bogocount - setting activated=1.
> [1;4/4: 0.140208] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [0;4/4: 0.142577] NET: Registered protocol family 16
> [0;4/4: 0.144263] PCI: Using configuration type 1 for base access
> [0;4/4: 0.144494] PCI: Using configuration type 1 for extended access
> [0;4/4: 0.144938] mtrr: your CPUs had inconsistent variable MTRR settings
> [0;4/4: 0.144938] mtrr: your CPUs had inconsistent MTRRdefType settings
> [0;4/4: 0.144938] mtrr: probably your BIOS does not setup all CPUs.
> [0;4/4: 0.148004] mtrr: corrected configuration.
> [0;4/4: 0.156040] bio: create slab<bio-0> at 0
> [0;4/4: 0.156602] vgaarb: loaded
> [0;4/4: 0.156602] PCI: Probing PCI hardware
> [0;4/4: 0.156602] PCI: Probing PCI hardware (bus 00)
> [0;4/4: 0.156703] pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
> [0;4/4: 0.160269] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
> [0;4/4: 0.161055] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
> [0;4/4: 0.164064] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
> [0;4/4: 0.164827] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
> [0;4/4: 0.169023] pci 0000:00:03.0: reg 10: [io 0xc020-0xc03f]
> [0;4/4: 0.170052] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
> [0;4/4: 0.170381] pci 0000:00:04.0: reg 10: [io 0xc040-0xc05f]
> [0;4/4: 0.170765] pci 0000:00:05.0: reg 10: [io 0xc080-0xc0bf]
> [0;4/4: 0.171023] pci 0000:00:06.0: reg 10: [io 0xc0c0-0xc0ff]
> [0;4/4: 0.172123] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [0;4/4: 0.172971] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> [0;4/4: 0.172971] PCI: pci_cache_line_size set to 64 bytes
> [0;4/4: 0.172971] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
> [0;4/4: 0.172971] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
> [0;4/4: 0.176175] Switching to clocksource kvm-clock
> [1;4/4: 0.212494] pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
> [1;4/4: 0.212500] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
> [1;4/4: 0.212828] NET: Registered protocol family 2
> [1;4/4: 0.213783] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
> ...
>
Umm... do you guys have this commit? This is supposed to address the
issue where the guest keeps resetting the TSC. A guest which does that
will break kvmclock. It only happens on SMP, and it's much worse on AMD
CPUs...
Sounds like your scenario.
commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
Author: Zachary Amsden <zamsden@redhat.com>
Date: Thu Aug 19 22:07:26 2010 -1000
KVM: x86: Robust TSC compensation
Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
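A minimal user-space sketch of the failure mode Arjan describes in the
quoted text above: one bogus far-future reading pins every later reading,
and the debug patch's 25000000 ns window lets the clock step back. It uses
C11 atomics in place of the kernel's atomic64_* helpers; read_clamped,
read_with_resync and RESYNC_WINDOW_NS are illustrative names, not kernel
symbols, and this is not the kernel code itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last value handed out to any caller; models last_value in pvclock.c. */
static _Atomic uint64_t last_value;

/* Same number as the debug patch's threshold (25000000 ns = 25 ms). */
#define RESYNC_WINDOW_NS 25000000ULL

/* Strict clamp: never return less than a previously returned value. */
static uint64_t read_clamped(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    while (raw >= last) {
        /* On failure, 'last' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }
    return last;    /* raw is behind: hand out the stored value again */
}

/* Debug-patch variant: step backwards when 'last' is implausibly far ahead. */
static uint64_t read_with_resync(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    for (;;) {
        /* Slightly behind: keep the clock monotonic as before. */
        if (raw < last && last - raw < RESYNC_WINDOW_NS)
            return last;
        /* Ahead, or far behind: store raw, even if that goes backwards. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }
}
```

Feeding read_clamped() the values 100000000, then a bogus 900000000000,
then 100001000 returns 900000000000 for the third call, which is the
"constant time" symptom. A following read_with_resync(100002000) returns
100002000 again, because the gap is far larger than the window.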
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 21:55 ` Zachary Amsden
@ 2010-10-03 8:16 ` Michael Tokarev
2010-10-03 8:22 ` Avi Kivity
2010-10-03 8:30 ` Michael Tokarev
0 siblings, 2 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-03 8:16 UTC (permalink / raw)
To: Zachary Amsden
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
03.10.2010 01:55, Zachary Amsden wrote:
> On 10/01/2010 09:35 PM, Michael Tokarev wrote:
[]
>> [ 0.049999] APIC delta adjusted to PM-Timer: 6248670 (6435422)
>> [ 0.050298] Booting Node 0, Processors #1 Ok.
>> [ 0.023332] Initializing CPU#1
>>
>
> Before this, time is very granular...
>> [ 0.063333] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
>> [ 0.063333] Brought up 2 CPUs
>> [ 0.063333] Total of 2 processors activated (12874.21 BogoMIPS).
>> [ 0.076666] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>> [ 0.116666] devtmpfs: initialized
>> [ 0.116666] NET: Registered protocol family 16
>> [ 0.119999] ACPI: bus type pci registered
>
> Now it is multiples of 1/300 ....
Note it's the second CPU.
>> [ 0.249999] Switching to clocksource kvm-clock
>> [ 0.259999] pnp: PnP ACPI init
>>
>
> Then, of course, it fails.
>
> What is your host clocksource? Does your machine have unstable TSC?
> Here, I have unstable tsc:
Host is using tsc, and this is the only available clocksource now.
It was a long time ago when I last looked at this - usually all
3 standard ones, including hpet and acpi_pm, are available too. This is
an AthlonII CPU, which has synced tsc. I upgraded the CPU this year
from the previous gen Athlon -- that one didn't have synced tsc
and the kernel was using something else. So I really don't know why
and when I got only tsc listed on the host (it's 2.6.35.6 x64).
The guest finds the usual (in this situation) kvmclock and acpi_pm
(I'm running it with -no-hpet - without that it also finds hpet) --
it reports instability of tsc somewhere in dmesg:
[1;3/3: 1.004254] Clocksource tsc unstable (delta = 284538419181 ns)
Note this is a regression too, or maybe a bugfix - some time ago,
on another AthlonII machine (also synced tsc), I used to have SMP
guests that used tsc and reported instability of tsc only when
the host was swapping (we had a _long_ conversation with Marcelo
Tosatti about this sometime last year, both in public and in
private and on irc, with some bugs fixed after this). That is to
say, guests at least had an _apparently_ stable tsc before; now
instability is detected right away, with a huge difference too.
I just booted this same guest using kvm-0.12.5 - with that one the
guest does not report unstable tsc, yet does not list it in the
available_clocksources. It also shows time jumps:
...
[0;0/0: 0.000000] Detected 3217.424 MHz processor.
[0;0/0: 0.006666] Calibrating delay loop (skipped) preset value.. 6437.96 BogoMIPS (lpj=10724746)
[0;0/0: 0.006666] pid_max: default: 32768 minimum: 301
[0;0/0: 0.006666] Mount-cache hash table entries: 512
[0;0/0: 0.006765] Initializing cgroup subsys ns
...
[0;0/0: 0.029999] Booting Node 0, Processors #1 Ok.
[1;0/0: 0.006666] Initializing CPU#1
[1;0/0: 0.006666] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
[0;0/0: 0.058342] Brought up 2 CPUs
...
Thanks!
/mjt
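For reading the bracketed prefixes in the logs above: they come from the
vprintk() hunk of Arjan's debug patch, and the fields are the CPU, the
pvclock_backwards counter before and after the clock read, and the time as
seconds.microseconds. A small sketch of that formatting, assuming a
nanosecond reading as input (format_stamp is an illustrative helper, not a
kernel function; the archive has collapsed the padding that %5lu produces):

```c
#include <stdio.h>

/* Format a timestamp the way the debug patch's vprintk() hunk does:
 * [cpu;backwards_before/backwards_after:seconds.microseconds]
 * 'ns' is the clock reading in nanoseconds. Returns snprintf's result. */
static int format_stamp(char *buf, size_t len, int cpu,
                        int backwards_prev, int backwards_now,
                        unsigned long long ns)
{
    unsigned long sec = (unsigned long)(ns / 1000000000ULL);
    unsigned long nanosec_rem = (unsigned long)(ns % 1000000000ULL);

    return snprintf(buf, len, "[%d;%d/%d:%5lu.%06lu] ",
                    cpu, backwards_prev, backwards_now,
                    sec, nanosec_rem / 1000);
}
```

A prefix such as [0;0/1: 0.134838] therefore says that CPU 0 printed the
line and that the counter went from 0 to 1 during this very clock read,
i.e. the clock was stepped backwards while producing that timestamp.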
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-03 8:16 ` Michael Tokarev
@ 2010-10-03 8:22 ` Avi Kivity
2010-10-03 8:30 ` Michael Tokarev
1 sibling, 0 replies; 81+ messages in thread
From: Avi Kivity @ 2010-10-03 8:22 UTC (permalink / raw)
To: Michael Tokarev
Cc: Zachary Amsden, Marcelo Tosatti, Arjan Koers, kvm, Glauber Costa,
Andre Przywara, jeremy
On 10/03/2010 10:16 AM, Michael Tokarev wrote:
> I just booted this same guest using kvm-0.12.5 - using that one
> guest does not report unstable tsc, yet does not list it in the
> available_clocksources. It also shows time jumps:
>
> ...
> [0;0/0: 0.000000] Detected 3217.424 MHz processor.
> [0;0/0: 0.006666] Calibrating delay loop (skipped) preset value.. 6437.96 BogoMIPS (lpj=10724746)
> [0;0/0: 0.006666] pid_max: default: 32768 minimum: 301
> [0;0/0: 0.006666] Mount-cache hash table entries: 512
> [0;0/0: 0.006765] Initializing cgroup subsys ns
> ...
> [0;0/0: 0.029999] Booting Node 0, Processors #1 Ok.
> [1;0/0: 0.006666] Initializing CPU#1
> [1;0/0: 0.006666] kvm-clock: cpu 1, msr 0:1e0a0c1, secondary cpu clock
> [0;0/0: 0.058342] Brought up 2 CPUs
> ...
Most likely it's still using jiffies while the clocks are being set up.
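A quick way to check the jiffies theory against the logs, as a sketch under
two assumptions: the guest runs with HZ=300 (suggested by the "multiples of
1/300" remark earlier in the thread), and a tick-based timestamp is computed
as ticks * (NSEC_PER_SEC / HZ) with integer division before being printed
in microseconds. tick_stamp_usec and is_tick_stamp are illustrative helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Microsecond value printed for a clock that only advances in whole ticks. */
static uint64_t tick_stamp_usec(uint64_t ticks, unsigned hz)
{
    return ticks * (NSEC_PER_SEC / hz) / 1000;
}

/* True if a printed microsecond timestamp is what some whole number of
 * ticks would produce at the given HZ. */
static bool is_tick_stamp(uint64_t usec, unsigned hz)
{
    uint64_t tick_ns = NSEC_PER_SEC / hz;
    /* Round up: the printed value was truncated to microseconds. */
    uint64_t ticks = (usec * 1000 + tick_ns - 1) / tick_ns;

    return tick_stamp_usec(ticks, hz) == usec;
}
```

Under those assumptions 0.006666 and 0.029999 from the kvm-0.12.5 log are
whole ticks (2 and 9), while 0.058342 is not, which fits a clock that
switches from tick resolution to a finer source partway through boot.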
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 23:42 ` Zachary Amsden
@ 2010-10-03 8:27 ` Michael Tokarev
2010-10-08 0:12 ` Arjan Koers
1 sibling, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-03 8:27 UTC (permalink / raw)
To: Zachary Amsden
Cc: Arjan Koers, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
Andre Przywara
03.10.2010 03:42, Zachary Amsden wrote:
[]
> Umm... do you guys have this commit? This is supposed to address the
> issue where the guest keeps resetting the TSC. A guest which does that
> will break kvmclock. It only happens on SMP, and it's much worse on AMD
> CPUs...
>
> sound like your scenario.
I'm using a 2.6.35.y kernel.org kernel, which does not have this patch.
I discovered this problem with this kernel first, and later it became
apparent that it is present in the 2.6.32 stable series as well -- that's
my current main target: 2.6.35 for testing stuff and 2.6.32 for a
backport later.
> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
And it does not apply to 2.6.35 either -- there's no kvm_write_tsc()
function in arch/x86/kvm/x86.c, and no code similar to that.
I browsed Linus' git history, and see that this is part of a
larger patch series, which was already mentioned in this
thread several times, but without any mention of the base
it should be applied to (you mentioned another of your
patches in this series, the one that writes zero to tsc
somewhere, and said it won't apply but just shows the bug).
Should I try to apply the whole thing to 2.6.35?
Thanks!
/mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-03 8:16 ` Michael Tokarev
2010-10-03 8:22 ` Avi Kivity
@ 2010-10-03 8:30 ` Michael Tokarev
1 sibling, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-03 8:30 UTC (permalink / raw)
To: Zachary Amsden
Cc: Marcelo Tosatti, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara, jeremy
03.10.2010 12:16, Michael Tokarev wrote:
> Host is using tsc, and this is the only available clocksource now.
> It was long time ago when I looked at this last - usually all
> standard 3, also hpet and acpi_pm, are available too. This is
> AthlonII CPU, which has synced tsc. I upgraded the CPU this year
> from the previous gen Athlon, -- that one didn't have synced tsc
> and kernel were using something else. So I really don't know why
> and when I've only tsc listed on the host (it's 2.6.35.6 x64).
Oh well, it was ENOCOFFEE. I was looking at current_clocksource,
not available_clocksources on the host. All the usual sources are
available -- tsc, hpet and acpi_pm, just as expected.
Thanks!
/mjt
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-02 23:42 ` Zachary Amsden
2010-10-03 8:27 ` Michael Tokarev
@ 2010-10-08 0:12 ` Arjan Koers
2010-10-08 2:47 ` Zachary Amsden
1 sibling, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-10-08 0:12 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara
On 2010-10-03 01:42, Zachary Amsden wrote:
...
>
> Umm... do you guys have this commit? This is supposed to address the
> issue where the guest keeps resetting the TSC. A guest which does that
> will break kvmclock. It only happens on SMP, and it's much worse on AMD
> CPUs...
>
> sound like your scenario.
>
> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> Author: Zachary Amsden <zamsden@redhat.com>
> Date: Thu Aug 19 22:07:26 2010 -1000
This commit fixes the problem:
commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
Author: Zachary Amsden <zamsden@redhat.com>
Date: Thu Aug 19 22:07:19 2010 -1000
KVM: x86: Move TSC reset out of vmcb_init
The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC. Instead, just set TSC to zero once at VCPU creation time.
Why the separate patch? So git-bisect is your friend.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-08 0:12 ` Arjan Koers
@ 2010-10-08 2:47 ` Zachary Amsden
2010-10-08 22:06 ` Marcelo Tosatti
0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-08 2:47 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
Andre Przywara
On 10/07/2010 02:12 PM, Arjan Koers wrote:
> On 2010-10-03 01:42, Zachary Amsden wrote:
> ...
>
>> Umm... do you guys have this commit? This is supposed to address the
>> issue where the guest keeps resetting the TSC. A guest which does that
>> will break kvmclock. It only happens on SMP, and it's much worse on AMD
>> CPUs...
>>
>> sound like your scenario.
>>
>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>> Author: Zachary Amsden<zamsden@redhat.com>
>> Date: Thu Aug 19 22:07:26 2010 -1000
>>
>
> This commit fixes the problem:
>
> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
> Author: Zachary Amsden<zamsden@redhat.com>
> Date: Thu Aug 19 22:07:19 2010 -1000
>
> KVM: x86: Move TSC reset out of vmcb_init
>
> The VMCB is reset whenever we receive a startup IPI, so Linux is setting
> TSC back to zero happens very late in the boot process and destabilizing
> the TSC. Instead, just set TSC to zero once at VCPU creation time.
>
> Why the separate patch? So git-bisect is your friend.
>
Okay, apparently I need to go poke around 2.6.35 and see what patches
made it there and what patches didn't.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-08 2:47 ` Zachary Amsden
@ 2010-10-08 22:06 ` Marcelo Tosatti
2010-10-09 1:10 ` Arjan Koers
2010-10-09 7:59 ` Michael Tokarev
0 siblings, 2 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2010-10-08 22:06 UTC (permalink / raw)
To: Zachary Amsden
Cc: Arjan Koers, kvm, Michael Tokarev, Avi Kivity, Glauber Costa,
Andre Przywara
[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]
On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
> On 10/07/2010 02:12 PM, Arjan Koers wrote:
> >On 2010-10-03 01:42, Zachary Amsden wrote:
> >...
> >>Umm... do you guys have this commit? This is supposed to address the
> >>issue where the guest keeps resetting the TSC. A guest which does that
> >>will break kvmclock. It only happens on SMP, and it's much worse on AMD
> >>CPUs...
> >>
> >>sound like your scenario.
> >>
> >>commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> >>Author: Zachary Amsden<zamsden@redhat.com>
> >>Date: Thu Aug 19 22:07:26 2010 -1000
> >
> >This commit fixes the problem:
> >
> >commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
> >Author: Zachary Amsden<zamsden@redhat.com>
> >Date: Thu Aug 19 22:07:19 2010 -1000
> >
> > KVM: x86: Move TSC reset out of vmcb_init
> >
> > The VMCB is reset whenever we receive a startup IPI, so Linux is setting
> > TSC back to zero happens very late in the boot process and destabilizing
> > the TSC. Instead, just set TSC to zero once at VCPU creation time.
> >
> > Why the separate patch? So git-bisect is your friend.
>
> Okay, apparently I need to go poke around 2.6.35 and see what
> patches made it there and what patches didn't.
Backports attached. Michael, Arjan, please give them a try.
[-- Attachment #2: 001-kvm-x86-fix-svm-reset --]
[-- Type: text/plain, Size: 867 bytes --]
commit 280372e494634d0a2cba3956721be16fc4f989bf
Author: Zachary Amsden <zamsden@redhat.com>
Date: Thu Aug 19 22:07:18 2010 -1000
KVM: x86: Fix SVM VMCB reset
On reset, VMCB TSC should be set to zero. Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -766,7 +766,7 @@ static void init_vmcb(struct vcpu_svm *s
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
- control->tsc_offset = 0;
+ control->tsc_offset = 0-native_read_tsc();
control->int_ctl = V_INTR_MASKING_MASK;
init_seg(&save->es);
[-- Attachment #3: 002-kvm-x86-move-tsc-reset-out-of-vmcb-init --]
[-- Type: text/plain, Size: 1247 bytes --]
commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
Author: Zachary Amsden <zamsden@redhat.com>
Date: Thu Aug 19 22:07:19 2010 -1000
KVM: x86: Move TSC reset out of vmcb_init
The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC. Instead, just set TSC to zero once at VCPU creation time.
Why the separate patch? So git-bisect is your friend.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -766,7 +766,6 @@ static void init_vmcb(struct vcpu_svm *s
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
- control->tsc_offset = 0-native_read_tsc();
control->int_ctl = V_INTR_MASKING_MASK;
init_seg(&save->es);
@@ -902,6 +901,7 @@ static struct kvm_vcpu *svm_create_vcpu(
svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
svm->asid_generation = 0;
init_vmcb(svm);
+ svm->vmcb->control.tsc_offset = 0-native_read_tsc();
err = fx_init(&svm->vcpu);
if (err)
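What the two backported patches change can be sketched with a toy model,
assuming the usual offset scheme in which the guest-visible TSC is the host
TSC plus tsc_offset (modulo 2^64). struct vcpu_model and the helpers below
are hypothetical and only illustrate the arithmetic; they are not KVM code.

```c
#include <stdint.h>

/* Toy stand-in for the per-VCPU state that holds the TSC offset. */
struct vcpu_model {
    uint64_t tsc_offset;
};

/* Guest-visible TSC: host TSC plus the offset, wrapping modulo 2^64. */
static uint64_t guest_tsc(const struct vcpu_model *v, uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset;
}

/* Offset that makes the guest read zero at host time 'host_now';
 * mirrors "0 - native_read_tsc()" in the patches. */
static uint64_t offset_for_zero(uint64_t host_now)
{
    return 0 - host_now;
}

/* After the second patch: the guest TSC is zeroed once, at creation. */
static void create_vcpu(struct vcpu_model *v, uint64_t host_now)
{
    v->tsc_offset = offset_for_zero(host_now);
}

/* With only the first patch: every VMCB re-init (for example on a startup
 * IPI) rewinds the guest TSC to zero again. */
static void reinit_rewinding_tsc(struct vcpu_model *v, uint64_t host_now)
{
    v->tsc_offset = offset_for_zero(host_now);
}
```

With an offset of 0 the guest simply sees the host value. Creating the VCPU
at host time 1000 gives a guest TSC of 4000 at host time 5000; calling
reinit_rewinding_tsc() at that point drops it back to 0, which is the kind
of backwards jump on a secondary CPU that the second patch avoids.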
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-08 22:06 ` Marcelo Tosatti
@ 2010-10-09 1:10 ` Arjan Koers
2010-10-09 2:27 ` Zachary Amsden
` (3 more replies)
2010-10-09 7:59 ` Michael Tokarev
1 sibling, 4 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-09 1:10 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara
On 2010-10-09 00:06, Marcelo Tosatti wrote:
> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>> ...
>>>> Umm... do you guys have this commit? This is supposed to address the
>>>> issue where the guest keeps resetting the TSC. A guest which does that
>>>> will break kvmclock. It only happens on SMP, and it's much worse on AMD
>>>> CPUs...
>>>>
>>>> sound like your scenario.
>>>>
>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date: Thu Aug 19 22:07:26 2010 -1000
>>>
>>> This commit fixes the problem:
>>>
>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>> Author: Zachary Amsden<zamsden@redhat.com>
>>> Date: Thu Aug 19 22:07:19 2010 -1000
>>>
>>> KVM: x86: Move TSC reset out of vmcb_init
>>>
>>> The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>> TSC back to zero happens very late in the boot process and destabilizing
>>> the TSC. Instead, just set TSC to zero once at VCPU creation time.
>>>
>>> Why the separate patch? So git-bisect is your friend.
>>
>> Okay, apparently I need to go poke around 2.6.35 and see what
>> patches made it there and what patches didn't.
>
> Backports attached. Michael, Arjan, please give them a try.
>
Thanks for the patches.
Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
(with a 2.6.35.7 guest).
It failed with a 2.6.32.24 host. The patch applied, but
pvclock_clocksource_read on the guest is still producing wrong
results for CPU 1 while it's booting. I'll re-check tomorrow.
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 1:10 ` Arjan Koers
@ 2010-10-09 2:27 ` Zachary Amsden
2010-10-09 6:29 ` Michael Tokarev
` (2 more replies)
2010-10-09 2:29 ` Zachary Amsden
` (2 subsequent siblings)
3 siblings, 3 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-09 2:27 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
Andre Przywara
On 10/08/2010 03:10 PM, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>
>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>
>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>
>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>> ...
>>>>
>>>>> Umm... do you guys have this commit? This is supposed to address the
>>>>> issue where the guest keeps resetting the TSC. A guest which does that
>>>>> will break kvmclock. It only happens on SMP, and it's much worse on AMD
>>>>> CPUs...
>>>>>
>>>>> sound like your scenario.
>>>>>
>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>> Date: Thu Aug 19 22:07:26 2010 -1000
>>>>>
>>>> This commit fixes the problem:
>>>>
>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date: Thu Aug 19 22:07:19 2010 -1000
>>>>
>>>> KVM: x86: Move TSC reset out of vmcb_init
>>>>
>>>> The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>> TSC back to zero happens very late in the boot process and destabilizing
>>>> the TSC. Instead, just set TSC to zero once at VCPU creation time.
>>>>
>>>> Why the separate patch? So git-bisect is your friend.
>>>>
>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>> patches made it there and what patches didn't.
>>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
>>
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
>
> It failed with a 2.6.32.24 host. The patch applied, but
> pvclock_clocksource_read on the guest is still producing wrong
> results for CPU 1 while it's booting. I'll re-check tomorrow.
>
There's a lot of work I've done and also a lot of work done by Glauber
Costa on kvmclock that recently went upstream.
It's unlikely that you'll be bug free without all of those patches
applied; most of the patches were not just enhancements, but contained
bugfixes as well as improved operation conditions. On top of this, the
patches are highly interdependent because of close code proximity. I
suggest applying the following commits to your branch (newest listed
first; apply in reverse order):
12b1164fa498997bf72070e6a81418197e283716
bfa075b75d8786380a7bca1215d4c7d1485d18dd
82e7988a2088781175a22b09631bce97cd5ed177
bfb3f3326c915b1800dc65d10ca09fbd548353d2
1377ff23ae2bf49c76f8f498ca81050878b9666a
9a088cc32488cfb9f60dca5972155ba13f39eb83
e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
da908f2fb4e783c2a4de751fb90f11a0dd041161
cf839f5da2b0779b9ec8b990f851fb4e7d681da0
cbc59a098486494d9a49537dcb9c969210a8306d
5cd459cdde725bb5c3a7feef6e074e7da70490c9
d578d4d72e3d2154901123f40c9fa7de1f85ae73
bd59fc8ff95126f27b7a0df1b6cc602aa428812d
e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
bf0fb4a42ba7eb362f4013bd2e93209666793e66
69403a558097a9bd333736d58a4cb69ea6e2a0ac
a87834bdb7ff9117da7f164e8cee638f2c51f9b7
91308e2fecddb6fc63feaf4cef3400f5cbea6619
fd03465c0648cd12d7333269b80d902d0a8516dd
aad07c4f92bae2edaa42bcef84c2afdd0d082458
280372e494634d0a2cba3956721be16fc4f989bf
1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
ec01d2eb0a74a6d95823fb6e320298473faf12be
3e05d29fe45508625e2a73db3d1bfb54f30731ff
Since the issue appears resolved, I'm going to continue working upstream.
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 1:10 ` Arjan Koers
2010-10-09 2:27 ` Zachary Amsden
@ 2010-10-09 2:29 ` Zachary Amsden
2010-10-10 1:26 ` Arjan Koers
2010-10-20 20:47 ` Arjan Koers
3 siblings, 0 replies; 81+ messages in thread
From: Zachary Amsden @ 2010-10-09 2:29 UTC (permalink / raw)
To: Arjan Koers
Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
Andre Przywara
On 10/08/2010 03:10 PM, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>
>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>
>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>
>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>> ...
>>>>
>>>>> Umm... do you guys have this commit? This is supposed to address the
>>>>> issue where the guest keeps resetting the TSC. A guest which does that
>>>>> will break kvmclock. It only happens on SMP, and it's much worse on AMD
>>>>> CPUs...
>>>>>
>>>>> sound like your scenario.
>>>>>
>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>> Date: Thu Aug 19 22:07:26 2010 -1000
>>>>>
>>>> This commit fixes the problem:
>>>>
>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date: Thu Aug 19 22:07:19 2010 -1000
>>>>
>>>> KVM: x86: Move TSC reset out of vmcb_init
>>>>
>>>> The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>> TSC back to zero happens very late in the boot process and destabilizing
>>>> the TSC. Instead, just set TSC to zero once at VCPU creation time.
>>>>
>>>> Why the separate patch? So git-bisect is your friend.
>>>>
>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>> patches made it there and what patches didn't.
>>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
>>
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
>
> It failed with a 2.6.32.24 host. The patch applied, but
> pvclock_clocksource_read on the guest is still producing wrong
> results for CPU 1 while it's booting. I'll re-check tomorrow.
>
There's a lot of work I've done and also a lot of work done by Glauber
Costa on kvmclock that recently went upstream.
It's unlikely that you'll be bug free without all of those patches
applied; most of the patches were not just enhancements, but contained
bugfixes as well as improved operation conditions. On top of this, the
patches are highly interdependent because of close code proximity. I
suggest applying the following commits to your branch (newest listed
first; apply in reverse order):
12b1164fa498997bf72070e6a81418197e283716
bfa075b75d8786380a7bca1215d4c7d1485d18dd
82e7988a2088781175a22b09631bce97cd5ed177
bfb3f3326c915b1800dc65d10ca09fbd548353d2
1377ff23ae2bf49c76f8f498ca81050878b9666a
9a088cc32488cfb9f60dca5972155ba13f39eb83
e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
da908f2fb4e783c2a4de751fb90f11a0dd041161
cf839f5da2b0779b9ec8b990f851fb4e7d681da0
cbc59a098486494d9a49537dcb9c969210a8306d
5cd459cdde725bb5c3a7feef6e074e7da70490c9
d578d4d72e3d2154901123f40c9fa7de1f85ae73
bd59fc8ff95126f27b7a0df1b6cc602aa428812d
e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
bf0fb4a42ba7eb362f4013bd2e93209666793e66
69403a558097a9bd333736d58a4cb69ea6e2a0ac
a87834bdb7ff9117da7f164e8cee638f2c51f9b7
91308e2fecddb6fc63feaf4cef3400f5cbea6619
fd03465c0648cd12d7333269b80d902d0a8516dd
aad07c4f92bae2edaa42bcef84c2afdd0d082458
280372e494634d0a2cba3956721be16fc4f989bf
1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
ec01d2eb0a74a6d95823fb6e320298473faf12be
3e05d29fe45508625e2a73db3d1bfb54f30731ff
Since the issue appears resolved, I'm going to continue working upstream.
Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 2:27 ` Zachary Amsden
@ 2010-10-09 6:29 ` Michael Tokarev
2010-10-09 8:59 ` Arjan Koers
2010-10-10 1:20 ` Arjan Koers
2010-10-11 17:53 ` Anthony Liguori
2 siblings, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-10-09 6:29 UTC (permalink / raw)
To: Zachary Amsden
Cc: Arjan Koers, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
Andre Przywara
09.10.2010 06:27, Zachary Amsden wrote:
[]
> There's a lot of work I've done and also a lot of work done by Glauber
> Costa on kvmclock that recently went upstream.
I've seen your series that went into 2.6.36-to-be.
I tried to apply it to a stable kernel series (2.6.32)
near the beginning of this thread. But it fails right
at the second patch -- ec01d2eb0a74a6d95823fb6e320298473faf12be
"KVM: x86: Convert TSC writes to TSC offset writes",
in arch/x86/kvm/vmx.c, and later patches fail in other
places. In theory it should be possible for me to get
them applied, mechanically, by trying to guess what's
going on and modifying stuff accordingly.
> It's unlikely that you'll be bug free without all of those patches
> applied; most of the patches were not just enhancements, but contained
> bugfixes as well as improved operation conditions. On top of this, the
> patches are highly interdependent because of close code proximity. I
> suggest applying the following commits to your branch (newest listed
> first; apply in reverse order):
Yes, these commits, that's a large series of patches,
with lots of work done to produce them.
> Since the issue appears resolved, I'm going to continue working upstream.
The result is that no released linux kernel boots
in smp in kvm, which is a linux virtual machine.
That's irony, isn't it?
I wonder how distributions (which are almost all based
on 2.6.32 nowadays) will deal with the issue.. ;)
Thanks!
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-08 22:06 ` Marcelo Tosatti
2010-10-09 1:10 ` Arjan Koers
@ 2010-10-09 7:59 ` Michael Tokarev
2010-10-09 8:31 ` Michael Tokarev
1 sibling, 1 reply; 81+ messages in thread
From: Michael Tokarev @ 2010-10-09 7:59 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Zachary Amsden, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara
09.10.2010 02:06, Marcelo Tosatti wrote:
[]
>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date: Thu Aug 19 22:07:26 2010 -1000
[]
>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>> Author: Zachary Amsden<zamsden@redhat.com>
>>> Date: Thu Aug 19 22:07:19 2010 -1000
Um. Now I'm completely confused.
The two mentioned patches, just like most of the
larger series from Zachary Amsden, are for _host_
kernel, right?
The two backports:
arch/x86/kvm/svm.c | 2 +-
kvm/arch/x86/kvm/svm.c | 2 +-
that's for _host_, not guest...
For some reason I tried several patches like the
two here on the _guest_, not on the host. No wonder
there was no difference in the results.
For the host, things are quite different. While 2.6.32
is still very important there, it's not as
important as it is for the guest.
As far as I can see, most of these can be dealt with
by re-loading kvm modules. Let me try these and some
of the earlier patches...
Oh well... Confusion, confusion, confusion.... :)
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 7:59 ` Michael Tokarev
@ 2010-10-09 8:31 ` Michael Tokarev
0 siblings, 0 replies; 81+ messages in thread
From: Michael Tokarev @ 2010-10-09 8:31 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Zachary Amsden, Arjan Koers, kvm, Avi Kivity, Glauber Costa,
Andre Przywara
09.10.2010 11:59, Michael Tokarev wrote:
[]
> As far as I can see, most of these can be dealt with
> by re-loading kvm modules. Let me try these and some
> of the earlier patches...
So the two one-line backports, when applied to the
_host_ kvm modules, eliminated all the issues I had
so far with unstable clock and smp guests hanging
here or there. The timestamps in dmesg are not
jumping into the past anymore, and all my guests,
even the most problematic ones, now boot fine
(I tried several times to trigger the problem, to
no avail).
Just to be sure and to eliminate further possible
confusion: that's host kernel 2.6.35.6-amd64,
with two patches (backports offered by Marcelo)
applied on top and kvm{,-amd}.ko reloaded.
I tried several guests, incl. 2.6.32-i686 with
the earlier debugging patches applied, and
2.6.35-i686 (these two guests were showing the
issue most often).
Looking at the larger patchset again: there
were quite a few other changes too. Should some
of these be applied as well? I mean, we eliminated
the most obvious problem, but it looks like there
are more problems in there....
Thank you for your work!
/mjt
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 6:29 ` Michael Tokarev
@ 2010-10-09 8:59 ` Arjan Koers
2010-10-11 20:47 ` Zachary Amsden
0 siblings, 1 reply; 81+ messages in thread
From: Arjan Koers @ 2010-10-09 8:59 UTC (permalink / raw)
To: Michael Tokarev
Cc: Zachary Amsden, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
Andre Przywara
On 2010-10-09 08:29, Michael Tokarev wrote:
...
> The result is that no released linux kernel boots
> in smp in kvm, which is a linux virtual machine.
> That's irony, isn't it?
>
> I wonder how distributions (which are almost all based
> on 2.6.32 nowadays) will deal with the issue.. ;)
It looks like Debian solved it on their 2.6.32 guest by
reverting the commit that makes it hang:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588426
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 2:27 ` Zachary Amsden
2010-10-09 6:29 ` Michael Tokarev
@ 2010-10-10 1:20 ` Arjan Koers
2010-10-11 17:53 ` Anthony Liguori
2 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-10 1:20 UTC (permalink / raw)
To: Zachary Amsden
Cc: kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity, Glauber Costa,
Andre Przywara
On 2010-10-09 04:27, Zachary Amsden wrote:
...
> There's a lot of work I've done and also a lot of work done by Glauber
> Costa on kvmclock that recently went upstream.
>
> It's unlikely that you'll be bug free without all of those patches
> applied; most of the patches were not just enhancements, but contained
> bugfixes as well as improved operation conditions. On top of this, the
> patches are highly interdependent because of close code proximity. I
> suggest applying the following commits to your branch (newest listed
> first; apply in reverse order):
>
> 12b1164fa498997bf72070e6a81418197e283716
...
> 3e05d29fe45508625e2a73db3d1bfb54f30731ff
I've tried applying these commits to 2.6.32.24, but gave up after a
while, because some were just too different to make it work (e.g.
91308e2fecddb6fc63feaf4cef3400f5cbea6619).
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 1:10 ` Arjan Koers
2010-10-09 2:27 ` Zachary Amsden
2010-10-09 2:29 ` Zachary Amsden
@ 2010-10-10 1:26 ` Arjan Koers
2010-10-20 20:47 ` Arjan Koers
3 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-10 1:26 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara
[-- Attachment #1: Type: text/plain, Size: 2485 bytes --]
On 2010-10-09 03:10, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>> ...
>>>>> Umm... do you guys have this commit? This is supposed to address the
>>>>> issue where the guest keeps resetting the TSC. A guest which does that
>>>>> will break kvmclock. It only happens on SMP, and it's much worse on AMD
>>>>> CPUs...
>>>>>
>>>>> sound like your scenario.
>>>>>
>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>> Date: Thu Aug 19 22:07:26 2010 -1000
>>>>
>>>> This commit fixes the problem:
>>>>
>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>> Date: Thu Aug 19 22:07:19 2010 -1000
>>>>
>>>> KVM: x86: Move TSC reset out of vmcb_init
>>>>
>>>> The VMCB is reset whenever we receive a startup IPI, so Linux is setting
>>>> TSC back to zero happens very late in the boot process and destabilizing
>>>> the TSC. Instead, just set TSC to zero once at VCPU creation time.
>>>>
>>>> Why the separate patch? So git-bisect is your friend.
>>>
>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>> patches made it there and what patches didn't.
>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
>
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
>
> It failed with a 2.6.32.24 host. The patch applied, but
> pvclock_clocksource_read on the guest is still producing wrong
> results for CPU 1 while it's booting. I'll re-check tomorrow.
I've performed some more tests on 2.6.32.24 and it turns out that
the wrong value for CPU 1 is not far enough into the future to make
the guest hang, but that may be different on someone else's system.
See the attached boot log 'dmesg-tsc-unstable.txt'. Note that the printk
time doesn't change for a while after switching to clocksource kvm-clock.
On 2.6.32 and 2.6.33, the TSC is unstable, while on 2.6.34+ it's not
(with Marcelo's patches applied). The attached host patches (backported
from 2.6.34) make them all behave like 2.6.34+, with stable TSC. See
boot log 'dmesg-tsc-stable.txt'.
If I'm not mistaken, the code in pvclock_clocksource_read that
causes the hangs will never be reached when the TSC is stable.
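For context, here is a minimal userspace sketch of what a "global
synchronization point" for a clock read can look like. It is an
illustration only, not the code in pvclock_clocksource_read: the names
monotonic_clamp and last_value and the use of C11 atomics are assumptions
made for this example. It shows why one far-future reading on a single
CPU could make time appear frozen for every other reader until the real
clock catches up.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Highest timestamp handed out so far, shared by all callers.
 * (Hypothetical name; stands in for a global "last value".) */
static _Atomic uint64_t last_value;

/*
 * Return a timestamp that never goes backwards across callers.
 * `raw` stands in for a per-CPU clock reading (assumption).
 */
uint64_t monotonic_clamp(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    while (raw > last) {
        /* Try to publish the newer value. On failure, `last` is
         * reloaded with the current global value and we re-check. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }

    /* Another caller already returned a later time: reuse it. */
    return last;
}
```

In this sketch, if one caller passes in a value far in the future, every
later caller gets that same value back until its own reading exceeds it.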
[-- Attachment #2: dmesg-tsc-unstable.txt --]
[-- Type: text/plain, Size: 15294 bytes --]
[ 0.000000] Linux version 2.6.32.24-201010092338-guestmp (arjan@dev-lenny) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #1 SMP Sat Oct 9 23:42:46 UTC 2010
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[ 0.000000] BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[ 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: feffd00000000000 - ff00100000000000 (reserved)
[ 0.000000] DMI 2.4 present.
[ 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: write-back
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 00E0000000 mask FFE0000000 uncachable
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[ 0.000000] 0000000000 - 001fe00000 page 2M
[ 0.000000] 001fe00000 - 001fffd000 page 4k
[ 0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[ 0.000000] RAMDISK: 17df5000 - 1803d7b1
[ 0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[ 0.000000] ACPI: FACS 000000001ffffe00 00040
[ 0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] (7 early reservations) ==> bootmem [0000000000 - 001fffd000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
[ 0.000000] #2 [0001000000 - 00013d08d8] TEXT DATA BSS ==> [0001000000 - 00013d08d8]
[ 0.000000] #3 [0017df5000 - 001803d7b1] RAMDISK ==> [0017df5000 - 001803d7b1]
[ 0.000000] #4 [000009bc00 - 0000100000] BIOS reserved ==> [000009bc00 - 0000100000]
[ 0.000000] #5 [00013d1000 - 00013d1071] BRK ==> [00013d1000 - 00013d1071]
[ 0.000000] #6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 - 0000009000]
[ 0.000000] kvm-clock: cpu 0, msr 0:1322601, boot clock
[ 0.000000] [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001800000-ffff880001ffffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000000 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00100000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000000 -> 0x0000009b
[ 0.000000] 0: 0x00000100 -> 0x0001fffd
[ 0.000000] On node 0 totalpages: 130968
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 104 pages reserved
[ 0.000000] DMA zone: 3835 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 1736 pages used for memmap
[ 0.000000] DMA32 zone: 125237 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 24
[ 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s73816 r8192 d24488 u1048576
[ 0.000000] pcpu-alloc: s73816 r8192 d24488 u1048576 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1
[ 0.000000] kvm-clock: cpu 0, msr 0:1411601, primary cpu clock
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129072
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[ 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 507724k/524276k available (2072k kernel code, 404k absent, 15504k reserved, 1063k data, 452k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] NR_IRQS:448
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] hpet clockevent registered
[ 0.000000] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.000000] Detected 2799.842 MHz processor.
[ 0.012000] Calibrating delay loop (skipped) preset value.. 5599.68 BogoMIPS (lpj=11199368)
[ 0.012000] Mount-cache hash table entries: 256
[ 0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.012000] using C1E aware idle routine
[ 0.012000] Performance Events: AMD PMU driver.
[ 0.012000] ... version: 0
[ 0.012000] ... bit width: 48
[ 0.012000] ... generic registers: 4
[ 0.012000] ... value mask: 0000ffffffffffff
[ 0.012000] ... max period: 00007fffffffffff
[ 0.012000] ... fixed-purpose events: 0
[ 0.012000] ... event mask: 000000000000000f
[ 0.012000] Freeing SMP alternatives: 20k freed
[ 0.012019] ACPI: Core revision 20090903
[ 0.014379] Setting APIC routing to flat
[ 0.015667] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.015669] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.016000] Booting processor 1 APIC 0x1 ip 0x6000
[ 0.012000] Initializing CPU#1
[ 0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.012000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[ 0.025724] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.025724] Brought up 2 CPUs
[ 0.025724] Total of 2 processors activated (11199.36 BogoMIPS).
[ 0.025724] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[ 0.028000] NET: Registered protocol family 16
[ 0.028000] ACPI: bus type pci registered
[ 0.028000] PCI: Using configuration type 1 for base access
[ 0.028000] PCI: Using configuration type 1 for extended access
[ 0.028000] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.028000] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.028000] mtrr: probably your BIOS does not setup all CPUs.
[ 0.028000] mtrr: corrected configuration.
[ 0.040000] bio: create slab <bio-0> at 0
[ 0.040000] ACPI: EC: Look up EC in DSDT
[ 0.040000] ACPI: Interpreter enabled
[ 0.040000] ACPI: (supports S0 S5)
[ 0.040000] ACPI: Using IOAPIC for interrupt routing
[ 0.064000] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.064000] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[ 0.064000] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[ 0.064000] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[ 0.068000] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[ 0.068000] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[ 0.072000] pci 0000:00:02.0: reg 30 32bit mmio pref: [0xf2010000-0xf201ffff]
[ 0.072000] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[ 0.072000] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2020000-0xf2020fff]
[ 0.072000] pci 0000:00:03.0: reg 30 32bit mmio pref: [0xf2030000-0xf203ffff]
[ 0.072000] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[ 0.072000] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[ 0.072000] pci 0000:00:05.0: reg 14 32bit mmio: [0xf2040000-0xf2040fff]
[ 0.072000] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[ 0.072000] pci 0000:00:06.0: reg 14 32bit mmio: [0xf2041000-0xf2041fff]
[ 0.072000] pci_bus 0000:00: on NUMA node 0
[ 0.072000] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.080000] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.080000] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.080000] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.080000] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.084000] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.084000] vgaarb: loaded
[ 0.084000] PCI: Using ACPI for IRQ routing
[ 0.084000] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.088000] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.096000] Switching to clocksource kvm-clock
[ 0.096000] pnp: PnP ACPI init
[ 0.096000] ACPI: bus type pnp registered
[ 0.096000] pnp: PnP ACPI: found 7 devices
[ 0.096000] ACPI: ACPI bus type pnp unregistered
[ 0.096000] pci_bus 0000:00: resource 0 io: [0x00-0xffff]
[ 0.096000] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[ 0.096000] NET: Registered protocol family 2
[ 0.096000] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.096000] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.096000] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.096000] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.096000] TCP reno registered
[ 0.096000] NET: Registered protocol family 1
[ 0.096000] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.096000] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.096000] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.096000] pci 0000:00:02.0: Boot video device
[ 0.096000] Unpacking initramfs...
[ 0.096000] Freeing initrd memory: 2337k freed
[ 0.096000] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.096000] msgmni has been set to 997
[ 0.096000] alg: No test for stdrng (krng)
[ 0.096000] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 0.096000] io scheduler noop registered
[ 0.096000] io scheduler anticipatory registered
[ 0.096000] io scheduler deadline registered
[ 0.096000] io scheduler cfq registered (default)
[ 0.096000] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 0.096000] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.096000] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 0.096000] mice: PS/2 mouse device common for all mice
[ 0.096000] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[ 0.096000] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 0.096000] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 0.096000] cpuidle: using governor ladder
[ 0.096000] cpuidle: using governor menu
[ 0.096000] TCP cubic registered
[ 0.096000] NET: Registered protocol family 17
[ 0.096000] rtc_cmos 00:01: setting system clock to 2010-10-10 00:15:08 UTC (1286669708)
[ 0.096000] Freeing unused kernel memory: 452k freed
[ 0.096000] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 0.096000] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[ 0.096000] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[ 0.096000] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[ 0.096000] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 0.096000] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[ 0.096000] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[ 0.096000] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[ 0.096000] virtio-pci 0000:00:05.0: irq 24 for MSI/MSI-X
[ 0.096000] virtio-pci 0000:00:05.0: irq 25 for MSI/MSI-X
[ 0.096000] vda: vda1 vda2 < vda5 >
[ 0.096000] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
[ 0.096000] virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
[ 0.096000] virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X
[ 0.096000] virtio-pci 0000:00:06.0: irq 29 for MSI/MSI-X
[ 0.096000] virtio-pci 0000:00:06.0: irq 30 for MSI/MSI-X
[ 0.096000] vdb: vdb1
[ 0.655063] kjournald starting. Commit interval 5 seconds
[ 0.655106] EXT3-fs: mounted filesystem with writeback data mode.
[ 1.009125] Clocksource tsc unstable (delta = 303360937 ns)
[ 2.278086] udev: starting version 160
[ 2.971618] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[ 2.971640] ACPI: Power Button [PWRF]
[ 3.156974] processor LNXCPU:00: registered as cooling_device0
[ 3.157091] processor LNXCPU:01: registered as cooling_device1
[ 4.305146] Adding 409616k swap on /dev/vda5. Priority:-1 extents:1 across:409616k
[ 4.465152] EXT3 FS on vda1, internal journal
[ 4.642414] loop: module loaded
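The log above steps back from 0.016000 to 0.012000 when CPU 1 comes up.
A small helper like the following could be used to spot such backward
steps in printk timestamps. This is a sketch under assumptions: the
function name count_backward_steps is made up for this example, and it
assumes every stamped line starts with "[ seconds.micros]" as in the
attachment.

```c
#include <stdio.h>

/*
 * Count lines whose leading "[ seconds.micros]" stamp is lower than
 * the stamp on the previous stamped line. Lines without a stamp are
 * skipped.
 */
int count_backward_steps(const char *const *lines, int n)
{
    double prev = -1.0;
    double t;
    int steps = 0;

    for (int i = 0; i < n; i++) {
        /* "[ %lf" accepts optional whitespace after the bracket. */
        if (sscanf(lines[i], "[ %lf]", &t) != 1)
            continue;
        if (t < prev)
            steps++;
        prev = t;
    }
    return steps;
}
```

Fed the three lines around "Booting processor 1" from the log above, it
would report one backward step.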
[-- Attachment #3: dmesg-tsc-stable.txt --]
[-- Type: text/plain, Size: 15231 bytes --]
[ 0.000000] Linux version 2.6.32.24-201010092338-guestmp (arjan@dev-lenny) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #1 SMP Sat Oct 9 23:42:46 UTC 2010
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[ 0.000000] BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[ 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: feffd00000000000 - ff00100000000000 (reserved)
[ 0.000000] DMI 2.4 present.
[ 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: write-back
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 00E0000000 mask FFE0000000 uncachable
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[ 0.000000] 0000000000 - 001fe00000 page 2M
[ 0.000000] 001fe00000 - 001fffd000 page 4k
[ 0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[ 0.000000] RAMDISK: 17df5000 - 1803d7b1
[ 0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[ 0.000000] ACPI: FACS 000000001ffffe00 00040
[ 0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] (7 early reservations) ==> bootmem [0000000000 - 001fffd000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
[ 0.000000] #2 [0001000000 - 00013d08d8] TEXT DATA BSS ==> [0001000000 - 00013d08d8]
[ 0.000000] #3 [0017df5000 - 001803d7b1] RAMDISK ==> [0017df5000 - 001803d7b1]
[ 0.000000] #4 [000009bc00 - 0000100000] BIOS reserved ==> [000009bc00 - 0000100000]
[ 0.000000] #5 [00013d1000 - 00013d1071] BRK ==> [00013d1000 - 00013d1071]
[ 0.000000] #6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 - 0000009000]
[ 0.000000] kvm-clock: cpu 0, msr 0:1322601, boot clock
[ 0.000000] [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001800000-ffff880001ffffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000000 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00100000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000000 -> 0x0000009b
[ 0.000000] 0: 0x00000100 -> 0x0001fffd
[ 0.000000] On node 0 totalpages: 130968
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 104 pages reserved
[ 0.000000] DMA zone: 3835 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 1736 pages used for memmap
[ 0.000000] DMA32 zone: 125237 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 24
[ 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] NR_CPUS:6 nr_cpumask_bits:6 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s73816 r8192 d24488 u1048576
[ 0.000000] pcpu-alloc: s73816 r8192 d24488 u1048576 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1
[ 0.000000] kvm-clock: cpu 0, msr 0:1411601, primary cpu clock
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129072
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32.24-201010092338-guestmp root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[ 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 507724k/524276k available (2072k kernel code, 404k absent, 15504k reserved, 1063k data, 452k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] NR_IRQS:448
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] hpet clockevent registered
[ 0.000000] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.000000] Detected 2800.486 MHz processor.
[ 0.012000] Calibrating delay loop (skipped) preset value.. 5600.97 BogoMIPS (lpj=11201944)
[ 0.012000] Mount-cache hash table entries: 256
[ 0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.012000] using C1E aware idle routine
[ 0.012000] Performance Events: AMD PMU driver.
[ 0.012000] ... version: 0
[ 0.012000] ... bit width: 48
[ 0.012000] ... generic registers: 4
[ 0.012000] ... value mask: 0000ffffffffffff
[ 0.012000] ... max period: 00007fffffffffff
[ 0.012000] ... fixed-purpose events: 0
[ 0.012000] ... event mask: 000000000000000f
[ 0.012100] Freeing SMP alternatives: 20k freed
[ 0.012114] ACPI: Core revision 20090903
[ 0.014445] Setting APIC routing to flat
[ 0.015790] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.015793] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.016000] Booting processor 1 APIC 0x1 ip 0x6000
[ 0.012000] Initializing CPU#1
[ 0.012000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.012000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.012000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[ 0.024078] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[ 0.024108] Brought up 2 CPUs
[ 0.024110] Total of 2 processors activated (11201.94 BogoMIPS).
[ 0.024411] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[ 0.025259] NET: Registered protocol family 16
[ 0.028133] ACPI: bus type pci registered
[ 0.028133] PCI: Using configuration type 1 for base access
[ 0.028133] PCI: Using configuration type 1 for extended access
[ 0.028188] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.028188] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 0.028188] mtrr: probably your BIOS does not setup all CPUs.
[ 0.028188] mtrr: corrected configuration.
[ 0.036221] bio: create slab <bio-0> at 0
[ 0.044346] ACPI: EC: Look up EC in DSDT
[ 0.049855] ACPI: Interpreter enabled
[ 0.049857] ACPI: (supports S0 S5)
[ 0.049867] ACPI: Using IOAPIC for interrupt routing
[ 0.068474] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.072834] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[ 0.073227] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[ 0.073239] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[ 0.075565] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[ 0.075565] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[ 0.075565] pci 0000:00:02.0: reg 30 32bit mmio pref: [0xf2010000-0xf201ffff]
[ 0.075565] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[ 0.075565] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2020000-0xf2020fff]
[ 0.075619] pci 0000:00:03.0: reg 30 32bit mmio pref: [0xf2030000-0xf203ffff]
[ 0.080155] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[ 0.080533] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[ 0.080592] pci 0000:00:05.0: reg 14 32bit mmio: [0xf2040000-0xf2040fff]
[ 0.081031] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[ 0.081089] pci 0000:00:06.0: reg 14 32bit mmio: [0xf2041000-0xf2041fff]
[ 0.081536] pci_bus 0000:00: on NUMA node 0
[ 0.081604] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.092368] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.092614] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.092822] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.093030] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.093246] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.093246] vgaarb: loaded
[ 0.096176] PCI: Using ACPI for IRQ routing
[ 0.096631] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.096631] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.112094] Switching to clocksource kvm-clock
[ 0.112665] pnp: PnP ACPI init
[ 0.112710] ACPI: bus type pnp registered
[ 0.117907] pnp: PnP ACPI: found 7 devices
[ 0.117914] ACPI: ACPI bus type pnp unregistered
[ 0.127084] pci_bus 0000:00: resource 0 io: [0x00-0xffff]
[ 0.127093] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[ 0.127513] NET: Registered protocol family 2
[ 0.127902] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.128916] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.129442] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.129976] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.129987] TCP reno registered
[ 0.130420] NET: Registered protocol family 1
[ 0.130462] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.130498] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.130529] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.130559] pci 0000:00:02.0: Boot video device
[ 0.130721] Unpacking initramfs...
[ 0.177652] Freeing initrd memory: 2337k freed
[ 0.184652] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.185509] msgmni has been set to 997
[ 0.186409] alg: No test for stdrng (krng)
[ 0.186897] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 0.186906] io scheduler noop registered
[ 0.186908] io scheduler anticipatory registered
[ 0.186909] io scheduler deadline registered
[ 0.187038] io scheduler cfq registered (default)
[ 0.219030] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 0.220785] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.220803] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 0.221828] mice: PS/2 mouse device common for all mice
[ 0.223661] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[ 0.223792] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 0.224291] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 0.224441] cpuidle: using governor ladder
[ 0.224447] cpuidle: using governor menu
[ 0.226020] TCP cubic registered
[ 0.226026] NET: Registered protocol family 17
[ 0.228262] rtc_cmos 00:01: setting system clock to 2010-10-09 23:52:14 UTC (1286668334)
[ 0.228523] Freeing unused kernel memory: 452k freed
[ 0.319112] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 0.319139] virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[ 0.319275] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[ 0.319290] virtio-pci 0000:00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, high) -> IRQ 10
[ 0.319440] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 0.319443] virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[ 0.319592] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[ 0.319595] virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
[ 0.365438] virtio-pci 0000:00:03.0: irq 24 for MSI/MSI-X
[ 0.365455] virtio-pci 0000:00:03.0: irq 25 for MSI/MSI-X
[ 0.365468] virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
[ 0.366915] virtio-pci 0000:00:05.0: irq 27 for MSI/MSI-X
[ 0.366930] virtio-pci 0000:00:05.0: irq 28 for MSI/MSI-X
[ 0.367356] vda: vda1 vda2 < vda5 >
[ 0.393528] virtio-pci 0000:00:06.0: irq 29 for MSI/MSI-X
[ 0.393542] virtio-pci 0000:00:06.0: irq 30 for MSI/MSI-X
[ 0.394301] vdb: vdb1
[ 0.940463] kjournald starting. Commit interval 5 seconds
[ 0.940574] EXT3-fs: mounted filesystem with writeback data mode.
[ 2.603464] udev: starting version 160
[ 3.120632] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[ 3.120646] ACPI: Power Button [PWRF]
[ 3.328008] processor LNXCPU:00: registered as cooling_device0
[ 3.328080] processor LNXCPU:01: registered as cooling_device1
[ 4.534367] Adding 409616k swap on /dev/vda5. Priority:-1 extents:1 across:409616k
[ 4.702659] EXT3 FS on vda1, internal journal
[ 4.838516] loop: module loaded
[-- Attachment #4: 2.6.32.24.diff --]
[-- Type: text/x-patch, Size: 938 bytes --]
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 61ba669..a5882fb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -797,15 +797,17 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (unlikely(cpu != vcpu->cpu)) {
u64 tsc_this, delta;
- /*
- * Make sure that the guest sees a monotonically
- * increasing TSC.
- */
- rdtscll(tsc_this);
- delta = vcpu->arch.host_tsc - tsc_this;
- svm->vmcb->control.tsc_offset += delta;
- if (is_nested(svm))
- svm->nested.hsave->control.tsc_offset += delta;
+ if (check_tsc_unstable()) {
+ /*
+ * Make sure that the guest sees a monotonically
+ * increasing TSC.
+ */
+ rdtscll(tsc_this);
+ delta = vcpu->arch.host_tsc - tsc_this;
+ svm->vmcb->control.tsc_offset += delta;
+ if (is_nested(svm))
+ svm->nested.hsave->control.tsc_offset += delta;
+ }
vcpu->cpu = cpu;
kvm_migrate_timers(vcpu);
svm->asid_generation = 0;
[-- Attachment #5: 2.6.33.7.diff --]
[-- Type: text/x-patch, Size: 901 bytes --]
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8d128be..77f119c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -801,14 +801,16 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (unlikely(cpu != vcpu->cpu)) {
u64 delta;
- /*
- * Make sure that the guest sees a monotonically
- * increasing TSC.
- */
- delta = vcpu->arch.host_tsc - native_read_tsc();
- svm->vmcb->control.tsc_offset += delta;
- if (is_nested(svm))
- svm->nested.hsave->control.tsc_offset += delta;
+ if (check_tsc_unstable()) {
+ /*
+ * Make sure that the guest sees a monotonically
+ * increasing TSC.
+ */
+ delta = vcpu->arch.host_tsc - native_read_tsc();
+ svm->vmcb->control.tsc_offset += delta;
+ if (is_nested(svm))
+ svm->nested.hsave->control.tsc_offset += delta;
+ }
vcpu->cpu = cpu;
kvm_migrate_timers(vcpu);
svm->asid_generation = 0;
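
The arithmetic behind the adjustment that both diffs wrap in
check_tsc_unstable() can be checked in isolation. What follows is a minimal
user-space sketch, not kernel code: struct vcpu_model, guest_tsc() and
vcpu_load_model() are hypothetical names, and the tsc_unstable argument only
stands in for the result of check_tsc_unstable(). It assumes the guest-visible
TSC is the host TSC plus the offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the guest reads (host TSC + tsc_offset). */
struct vcpu_model {
    uint64_t tsc_offset; /* stands in for vmcb->control.tsc_offset */
    uint64_t host_tsc;   /* host TSC saved when the vCPU last ran */
};

static uint64_t guest_tsc(const struct vcpu_model *v, uint64_t host_tsc_now)
{
    assert(v != 0);
    return host_tsc_now + v->tsc_offset;
}

/*
 * Model of the patched path: when the vCPU lands on a different host CPU,
 * compensate for the difference between the saved and the current host TSC,
 * but only if the host TSC is considered unstable.
 */
static void vcpu_load_model(struct vcpu_model *v, uint64_t tsc_this,
                            int tsc_unstable)
{
    assert(v != 0);
    if (tsc_unstable) {
        /* Unsigned subtraction wraps modulo 2^64, so the sum stays right
         * whether the new CPU's TSC is behind or ahead. */
        uint64_t delta = v->host_tsc - tsc_this;
        v->tsc_offset += delta;
    }
}
```

With the adjustment applied, the guest sees the same TSC value right after the
move as it did before; with a stable host TSC the offset is left alone.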
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 2:27 ` Zachary Amsden
2010-10-09 6:29 ` Michael Tokarev
2010-10-10 1:20 ` Arjan Koers
@ 2010-10-11 17:53 ` Anthony Liguori
2010-10-11 18:36 ` Marcelo Tosatti
2 siblings, 1 reply; 81+ messages in thread
From: Anthony Liguori @ 2010-10-11 17:53 UTC (permalink / raw)
To: Zachary Amsden
Cc: Arjan Koers, kvm, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara
On 10/08/2010 09:27 PM, Zachary Amsden wrote:
> On 10/08/2010 03:10 PM, Arjan Koers wrote:
>> On 2010-10-09 00:06, Marcelo Tosatti wrote:
>>> On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
>>>> On 10/07/2010 02:12 PM, Arjan Koers wrote:
>>>>> On 2010-10-03 01:42, Zachary Amsden wrote:
>>>>> ...
>>>>>> Umm... do you guys have this commit? This is supposed to
>>>>>> address the
>>>>>> issue where the guest keeps resetting the TSC. A guest which
>>>>>> does that
>>>>>> will break kvmclock. It only happens on SMP, and it's much worse
>>>>>> on AMD
>>>>>> CPUs...
>>>>>>
>>>>>> sound like your scenario.
>>>>>>
>>>>>> commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
>>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>>> Date: Thu Aug 19 22:07:26 2010 -1000
>>>>> This commit fixes the problem:
>>>>>
>>>>> commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
>>>>> Author: Zachary Amsden<zamsden@redhat.com>
>>>>> Date: Thu Aug 19 22:07:19 2010 -1000
>>>>>
>>>>> KVM: x86: Move TSC reset out of vmcb_init
>>>>>
>>>>> The VMCB is reset whenever we receive a startup IPI, so Linux
>>>>> is setting
>>>>> TSC back to zero happens very late in the boot process and
>>>>> destabilizing
>>>>> the TSC. Instead, just set TSC to zero once at VCPU creation
>>>>> time.
>>>>>
>>>>> Why the separate patch? So git-bisect is your friend.
>>>> Okay, apparently I need to go poke around 2.6.35 and see what
>>>> patches made it there and what patches didn't.
>>> Backports attached. Michael, Arjan, please give them a try.
>>>
>> Thanks for the patches.
>>
>> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
>> (with a 2.6.35.7 guest).
>>
>> It failed with a 2.6.32.24 host. The patch applied, but
>> pvclock_clocksource_read on the guest is still producing wrong
>> results for CPU 1 while it's booting. I'll re-check tomorrow.
>
> There's a lot of work I've done and also a lot of work done by Glauber
> Costa on kvmclock that recently went upstream.
If pvclock is broken on 2.6.32-stable, then shouldn't we port these
patches to the stable tree or, at the very least, blacklist pvclock in
stable?
Regards,
Anthony Liguori
> It's unlikely that you'll be bug free without all of those patches
> applied; most of the patches were not just enhancements, but contained
> bugfixes as well as improved operation conditions. On top of this,
> the patches are highly interdependent because of close code
> proximity. I suggest applying the following commits to your branch
> (newest listed first; apply in reverse order):
>
> 12b1164fa498997bf72070e6a81418197e283716
> bfa075b75d8786380a7bca1215d4c7d1485d18dd
> 82e7988a2088781175a22b09631bce97cd5ed177
> bfb3f3326c915b1800dc65d10ca09fbd548353d2
> 1377ff23ae2bf49c76f8f498ca81050878b9666a
> 9a088cc32488cfb9f60dca5972155ba13f39eb83
> e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
> da908f2fb4e783c2a4de751fb90f11a0dd041161
> cf839f5da2b0779b9ec8b990f851fb4e7d681da0
> cbc59a098486494d9a49537dcb9c969210a8306d
> 5cd459cdde725bb5c3a7feef6e074e7da70490c9
> d578d4d72e3d2154901123f40c9fa7de1f85ae73
> bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
> bf0fb4a42ba7eb362f4013bd2e93209666793e66
> 69403a558097a9bd333736d58a4cb69ea6e2a0ac
> a87834bdb7ff9117da7f164e8cee638f2c51f9b7
> 91308e2fecddb6fc63feaf4cef3400f5cbea6619
> fd03465c0648cd12d7333269b80d902d0a8516dd
> aad07c4f92bae2edaa42bcef84c2afdd0d082458
> 280372e494634d0a2cba3956721be16fc4f989bf
> 1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
> ec01d2eb0a74a6d95823fb6e320298473faf12be
> 3e05d29fe45508625e2a73db3d1bfb54f30731ff
>
> Since the issue appears resolved, I'm going to continue working upstream.
>
> Zach
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-11 17:53 ` Anthony Liguori
@ 2010-10-11 18:36 ` Marcelo Tosatti
0 siblings, 0 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2010-10-11 18:36 UTC (permalink / raw)
To: Anthony Liguori
Cc: Zachary Amsden, Arjan Koers, kvm, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara
On Mon, Oct 11, 2010 at 12:53:26PM -0500, Anthony Liguori wrote:
> On 10/08/2010 09:27 PM, Zachary Amsden wrote:
> >On 10/08/2010 03:10 PM, Arjan Koers wrote:
> >>On 2010-10-09 00:06, Marcelo Tosatti wrote:
> >>>On Thu, Oct 07, 2010 at 04:47:11PM -1000, Zachary Amsden wrote:
> >>>>On 10/07/2010 02:12 PM, Arjan Koers wrote:
> >>>>>On 2010-10-03 01:42, Zachary Amsden wrote:
> >>>>>...
> >>>>>>Umm... do you guys have this commit? This is supposed
> >>>>>>to address the
> >>>>>>issue where the guest keeps resetting the TSC. A guest
> >>>>>>which does that
> >>>>>>will break kvmclock. It only happens on SMP, and it's
> >>>>>>much worse on AMD
> >>>>>>CPUs...
> >>>>>>
> >>>>>>sound like your scenario.
> >>>>>>
> >>>>>>commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> >>>>>>Author: Zachary Amsden<zamsden@redhat.com>
> >>>>>>Date: Thu Aug 19 22:07:26 2010 -1000
> >>>>>This commit fixes the problem:
> >>>>>
> >>>>>commit aad07c4f92bae2edaa42bcef84c2afdd0d082458
> >>>>>Author: Zachary Amsden<zamsden@redhat.com>
> >>>>>Date: Thu Aug 19 22:07:19 2010 -1000
> >>>>>
> >>>>> KVM: x86: Move TSC reset out of vmcb_init
> >>>>>
> >>>>> The VMCB is reset whenever we receive a startup IPI,
> >>>>>so Linux is setting
> >>>>> TSC back to zero happens very late in the boot
> >>>>>process and destabilizing
> >>>>> the TSC. Instead, just set TSC to zero once at VCPU
> >>>>>creation time.
> >>>>>
> >>>>> Why the separate patch? So git-bisect is your friend.
> >>>>Okay, apparently I need to go poke around 2.6.35 and see what
> >>>>patches made it there and what patches didn't.
> >>>Backports attached. Michael, Arjan, please give them a try.
> >>>
> >>Thanks for the patches.
> >>
> >>Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> >>(with a 2.6.35.7 guest).
> >>
> >>It failed with a 2.6.32.24 host. The patch applied, but
> >>pvclock_clocksource_read on the guest is still producing wrong
> >>results for CPU 1 while it's booting. I'll re-check tomorrow.
> >
> >There's a lot of work I've done and also a lot of work done by
> >Glauber Costa on kvmclock that recently went upstream.
>
> If pvclock is broken on 2.6.32-stable, then shouldn't we port these
> patches to the stable tree or, at the very least, blacklist pvclock
> in stable?
The minimal fixes will be backported as soon as they appear on linux-2.6.git.
>
> Regards,
>
> Anthony Liguori
>
> >It's unlikely that you'll be bug free without all of those patches
> >applied; most of the patches were not just enhancements, but
> >contained bugfixes as well as improved operation conditions. On
> >top of this, the patches are highly interdependent because of
> >close code proximity. I suggest applying the following commits to
> >your branch (newest listed first; apply in reverse order):
> >
> >12b1164fa498997bf72070e6a81418197e283716
> >bfa075b75d8786380a7bca1215d4c7d1485d18dd
> >82e7988a2088781175a22b09631bce97cd5ed177
> >bfb3f3326c915b1800dc65d10ca09fbd548353d2
> >1377ff23ae2bf49c76f8f498ca81050878b9666a
> >9a088cc32488cfb9f60dca5972155ba13f39eb83
> >e06a1a6cbe4e9f4c766595483a9b345d5b48bda7
> >da908f2fb4e783c2a4de751fb90f11a0dd041161
> >cf839f5da2b0779b9ec8b990f851fb4e7d681da0
> >cbc59a098486494d9a49537dcb9c969210a8306d
> >5cd459cdde725bb5c3a7feef6e074e7da70490c9
> >d578d4d72e3d2154901123f40c9fa7de1f85ae73
> >bd59fc8ff95126f27b7a0df1b6cc602aa428812d
> >e5e7675b0b9bf8eb0b806145a2fe173b5bb0e908
> >bf0fb4a42ba7eb362f4013bd2e93209666793e66
> >69403a558097a9bd333736d58a4cb69ea6e2a0ac
> >a87834bdb7ff9117da7f164e8cee638f2c51f9b7
> >91308e2fecddb6fc63feaf4cef3400f5cbea6619
> >fd03465c0648cd12d7333269b80d902d0a8516dd
> >aad07c4f92bae2edaa42bcef84c2afdd0d082458
> >280372e494634d0a2cba3956721be16fc4f989bf
> >1e6145f6fd7899d1f34e4ac00a8558d82a8d704a
> >ec01d2eb0a74a6d95823fb6e320298473faf12be
> >3e05d29fe45508625e2a73db3d1bfb54f30731ff
> >
> >Since the issue appears resolved, I'm going to continue working upstream.
> >
> >Zach
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 8:59 ` Arjan Koers
@ 2010-10-11 20:47 ` Zachary Amsden
2010-10-13 12:18 ` Glauber Costa
0 siblings, 1 reply; 81+ messages in thread
From: Zachary Amsden @ 2010-10-11 20:47 UTC (permalink / raw)
To: Arjan Koers
Cc: Michael Tokarev, kvm, Marcelo Tosatti, Avi Kivity, Glauber Costa,
Andre Przywara
On 10/08/2010 10:59 PM, Arjan Koers wrote:
> On 2010-10-09 08:29, Michael Tokarev wrote:
> ...
>
>> The result is that no released linux kernel boots
>> in smp in kvm, which is a linux virtual machine.
>> That's irony, isn't it?
>>
>> I wonder how distributions (which are almost all based
>> on 2.6.32 nowadays) will deal with the issue.. ;)
>>
> It looks like Debian solved it on their 2.6.32 guest by
> reverting the commit that makes it hang:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588426
>
That's not a wise choice; the commit is needed to prevent clocks from going
backwards. It then caused some fallout issues with clobbers, which I
believe hpa fixed, but there were several rounds of it.
Glauber, perhaps, has a better idea of what patches are needed for the
host side kvmclock. I've mostly been working on the server side.
To solve the wider range of problems, distributions converging on 2.6.32
will need all of the fixes backported, both server and host.
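
The "prevent clocks going backwards" idea can be illustrated with a small
user-space sketch. It assumes the approach is a single shared "last value"
that every reader compares against and advances atomically; the function name
monotonic_clamp() and the use of C11 atomics are choices made for this
illustration, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Return a timestamp that never goes backwards across callers.
 * 'last_value' is shared by all CPUs; 'raw' is this CPU's own reading,
 * which may lag slightly behind what another CPU already returned.
 */
static uint64_t monotonic_clamp(_Atomic uint64_t *last_value, uint64_t raw)
{
    assert(last_value != 0);
    uint64_t last = atomic_load(last_value);

    for (;;) {
        if (raw < last)
            return last; /* someone already reported a later time */
        /* Try to publish 'raw'. On failure 'last' is reloaded with the
         * current shared value and the comparison is repeated. */
        if (atomic_compare_exchange_weak(last_value, &last, raw))
            return raw;
    }
}
```

A reader whose raw value is behind the shared value gets the shared value back
instead, which is why reverting such a commit can let time step backwards
between CPUs.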
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-11 20:47 ` Zachary Amsden
@ 2010-10-13 12:18 ` Glauber Costa
0 siblings, 0 replies; 81+ messages in thread
From: Glauber Costa @ 2010-10-13 12:18 UTC (permalink / raw)
To: Zachary Amsden
Cc: Arjan Koers, Michael Tokarev, kvm, Marcelo Tosatti, Avi Kivity,
Andre Przywara
On Mon, Oct 11, 2010 at 10:47:16AM -1000, Zachary Amsden wrote:
> On 10/08/2010 10:59 PM, Arjan Koers wrote:
> >On 2010-10-09 08:29, Michael Tokarev wrote:
> >...
> >>The result is that no released linux kernel boots
> >>in smp in kvm, which is a linux virtual machine.
> >>That's irony, isn't it?
> >>
> >>I wonder how distributions (which are almost all based
> >>on 2.6.32 nowadays) will deal with the issue.. ;)
> >It looks like Debian solved it on their 2.6.32 guest by
> >reverting the commit that makes it hang:
> >http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588426
>
> That's not a wise choice, the commit is needed to prevent clocks
> going backwards. It then caused some fallout issues with clobbers,
> which I believe hpa fixed, but there were several rounds of it.
>
> Glauber, perhaps, has a better idea of what patches are needed for
> the host side kvmclock. I've mostly been working on the server
> side.
No, all the recent patches I wrote towards fixing kvmclock problems
touch the guest. The host side ones are nice to have, but not stable/needed
material.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: 2.6.35-rc1 regression with pvclock and smp guests
2010-10-09 1:10 ` Arjan Koers
` (2 preceding siblings ...)
2010-10-10 1:26 ` Arjan Koers
@ 2010-10-20 20:47 ` Arjan Koers
3 siblings, 0 replies; 81+ messages in thread
From: Arjan Koers @ 2010-10-20 20:47 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Marcelo Tosatti, Michael Tokarev, Avi Kivity,
Glauber Costa, Andre Przywara
On 2010-10-09 03:10, Arjan Koers wrote:
> On 2010-10-09 00:06, Marcelo Tosatti wrote:
...
>>
>> Backports attached. Michael, Arjan, please give them a try.
>>
>
> Thanks for the patches.
>
> Successfully tested with 2.6.34.7, 2.6.35.7 and 2.6.36-rc7 host
> (with a 2.6.35.7 guest).
Here's a smaller version of a previous email that didn't make it to
the list...
The host side fixes stop the hanging problem, but the real problem is
on the guest:
The guest starts with one hv_clock struct, which gets written to by
the host (for CPU0).
The percpu code allocates separate hv_clock structs for each CPU and
copies the data from the old hv_clock struct to the new structs.
CPU1 then reads its own hv_clock struct, which still holds the old CPU0
data, and that causes the problems.
I've performed some tests with an unmodified 2.6.32.24 host and a
recent kvm.git guest. The unmodified guest hangs. A modified guest
where the CPU1 hv_clock struct is initialized to 0 doesn't hang.
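
The sequence described above can be modelled in a few lines of user-space C.
This is only a sketch under stated assumptions: struct hv_clock_model is a
reduced, hypothetical stand-in for the real structure (two fields only), the
percpu setup is modelled as a plain memcpy of the boot-time struct, and the
zero_secondary flag represents the "initialize CPU1's struct to 0" experiment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_MODEL 2 /* hypothetical; matches the smp=2 guest */

/* Reduced stand-in for the guest's time info structure. */
struct hv_clock_model {
    uint32_t version;     /* non-zero once the host has written the struct */
    uint64_t system_time;
};

static struct hv_clock_model boot_clock;                  /* early, single struct */
static struct hv_clock_model percpu_clock[NR_CPUS_MODEL]; /* after percpu setup */

/* The host writes the boot-time struct on behalf of CPU0. */
static void host_update(struct hv_clock_model *c, uint64_t now)
{
    c->version += 2; /* kept even: update complete */
    c->system_time = now;
}

/* Percpu setup: each CPU's struct starts as a copy of the boot-time one. */
static void setup_percpu(int zero_secondary)
{
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        memcpy(&percpu_clock[cpu], &boot_clock, sizeof(boot_clock));
    if (zero_secondary) /* the modification that avoided the hang */
        for (int cpu = 1; cpu < NR_CPUS_MODEL; cpu++)
            memset(&percpu_clock[cpu], 0, sizeof(percpu_clock[cpu]));
}

/* A struct "looks valid" to a reader as soon as its version is non-zero. */
static int looks_valid(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return percpu_clock[cpu].version != 0;
}
```

Without the zeroing step, CPU1's struct looks valid even though the host has
never written it for CPU1, so a reader on CPU1 would use CPU0's old data.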
Here's a boot log that shows what happens:
+-printk_cpu (kernel/printk.c)
| +-&hv_clock CPU0 (arch/x86/kernel/kvmclock.c)
| | +-hv_clock.version CPU0
| | | +-&hv_clock CPU1
| | | | +-hv_clock.version CPU1
| | | | |
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] Linux version 2.6.36-rc7-201010141519-guestmp-kvm+ (arjan@dev-lenny) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #1 SMP Thu Oct 14 15:22:48 UTC 2010
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.36-rc7-201010141519-guestmp-kvm+ root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-provided physical RAM map:
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: 0000000000100000 - 000000001fffd000 (usable)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: 000000001fffd000 - 0000000020000000 (reserved)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] BIOS-e820: feffd00000000000 - ff00100000000000 (reserved)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] NX (Execute Disable) protection: active
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] DMI 2.4 present.
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] No AGP bridge found
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] MTRR default type: write-back
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] MTRR fixed ranges enabled:
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 00000-9FFFF write-back
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] A0000-BFFFF uncachable
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] C0000-FFFFF write-protect
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] MTRR variable ranges enabled:
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 0 base 00E0000000 mask FFE0000000 uncachable
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 1 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 2 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 3 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 4 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 5 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 6 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 7 disabled
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] initial memory mapped : 0 - 20000000
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] Using GB pages for direct mapping
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] init_memory_mapping: 0000000000000000-000000001fffd000
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 0000000000 - 001fe00000 page 2M
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] 001fe00000 - 001fffd000 page 4k
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] kernel direct mapping tables up to 1fffd000 @ 8000-b000
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] RAMDISK: 17df6000 - 1803e000
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: RSDP 00000000000fdb80 00014 (v00 BOCHS )
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: RSDT 000000001fffde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: FACP 000000001ffffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: DSDT 000000001fffdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL 20090123)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: FACS 000000001ffffe00 00040
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: SSDT 000000001fffdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: APIC 000000001fffde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: HPET 000000001fffde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] ACPI: Local APIC address 0xfee00000
[0 ffffffff81324fc0 0 ffffffff81324fc0 0 0.000000] kvm-clock: Using msrs 12 and 11
pass the address of the hv_clock struct to the host; the host starts writing to it:
[0 ffffffff81324fc0 11c3c2 ffffffff81324fc0 11c3c2 0.000000] kvm-clock: cpu 0, msr 0:1324fc1, boot clock
pv_clock data is accessed in kvm_get_tsc_khz:
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] [ffffea0000000000-ffffea00007fffff] PMD -> [ffff880001c00000-ffff8800023fffff] on node 0
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] Zone PFN ranges:
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA 0x00000001 -> 0x00001000
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA32 0x00001000 -> 0x00100000
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] Normal empty
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] Movable zone start PFN for each node
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] early_node_map[2] active PFN ranges
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] 0: 0x00000001 -> 0x0000009b
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] 0: 0x00000100 -> 0x0001fffd
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] On node 0 totalpages: 130967
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA zone: 56 pages used for memmap
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA zone: 0 pages reserved
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA zone: 3938 pages, LIFO batch:0
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA32 zone: 1736 pages used for memmap
[0 ffffffff81324fc0 11c3ca ffffffff81324fc0 11c3ca 0.000000] DMA32 zone: 125237 pages, LIFO batch:31
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: PM-Timer IO Port: 0xb008
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: Local APIC address 0xfee00000
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IRQ0 used by override.
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IRQ2 used by override.
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IRQ5 used by override.
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IRQ9 used by override.
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IRQ10 used by override.
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: IRQ11 used by override.
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] Using ACPI (MADT) for SMP configuration information
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] nr_irqs_gsi: 40
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] Allocating PCI resources starting at 20000000 (gap: 20000000:dffc0000)
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] Booting paravirtualized kernel on KVM
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] early_res array is doubled to 64 at [3000 - 37ff]
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] PERCPU: Embedded 26 pages/cpu @ffff880001400000 s76736 r8192 d21568 u1048576
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] pcpu-alloc: s76736 r8192 d21568 u1048576 alloc=1*2097152
[0 ffffffff81324fc0 11c4ae ffffffff81324fc0 11c4ae 0.000000] pcpu-alloc: [0] 0 1
the single hv_clock struct has been copied to two new per-CPU structs (one for each CPU); the contents are correct for CPU0, but not for CPU1, whose copy still holds the old CPU0 data
the host may still write to the old hv_clock location; can this cause problems?
if the CPU1 hv_clock struct is zeroed here, pvclock_clocksource_read will not return wrong data and the guest won't hang
pass the address of the CPU0 hv_clock struct to the host; the host starts writing to it:
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] kvm-clock: cpu 0, msr 0:1411fc1, primary cpu clock
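The notes above can be turned into a small model. This is only a sketch under stated assumptions, not the kernel code: `HvClock`, `Guest`, `run` and all the numbers are invented for illustration, and the `last_value` clamp is my reading of what the bisected commit ("Add a global synchronization point for pvclock") adds to pvclock_clocksource_read. It shows why a CPU1 copy that yields a value ahead of CPU0 could freeze the clock for every CPU, while a zeroed copy is harmless.

```python
from dataclasses import dataclass, replace

MASK64 = (1 << 64) - 1


@dataclass
class HvClock:
    """Simplified stand-in for one per-CPU hv_clock struct (illustrative)."""
    tsc_timestamp: int = 0       # TSC value at the host's last update
    system_time: int = 0         # nanoseconds at that update
    tsc_to_system_mul: int = 0   # 32-bit fixed-point TSC-to-ns multiplier
    tsc_shift: int = 0           # pre-shift applied to the TSC delta


def pvclock_read(clock, tsc):
    """system_time plus the scaled TSC delta since the last host update."""
    delta = (tsc - clock.tsc_timestamp) & MASK64
    if clock.tsc_shift < 0:
        delta >>= -clock.tsc_shift
    else:
        delta <<= clock.tsc_shift
    return clock.system_time + ((delta * clock.tsc_to_system_mul) >> 32)


class Guest:
    """Models the assumed global clamp: readings never go below last_value."""

    def __init__(self):
        self.last_value = 0

    def clocksource_read(self, clock, tsc):
        ret = pvclock_read(clock, tsc)
        if ret < self.last_value:
            return self.last_value
        self.last_value = ret
        return ret


# Invented data: one nanosecond per TSC tick (mul = 2**31, shift = 1).
BOOT_CLOCK = HvClock(tsc_timestamp=1000, system_time=5000,
                     tsc_to_system_mul=1 << 31, tsc_shift=1)


def run(cpu1_clock):
    """CPU0 reads, CPU1 reads once with its own (different) TSC, CPU0 reads twice."""
    guest = Guest()
    cpu0_clock = replace(BOOT_CLOCK)
    return [
        guest.clocksource_read(cpu0_clock, 1100),
        guest.clocksource_read(cpu1_clock, 1_000_000),  # invented CPU1 TSC
        guest.clocksource_read(cpu0_clock, 1200),
        guest.clocksource_read(cpu0_clock, 1300),
    ]


stale = run(replace(BOOT_CLOCK))   # CPU1 copy still holds the old CPU0 data
zeroed = run(HvClock())            # CPU1 copy zeroed before use
print(stale)    # [5100, 1004000, 1004000, 1004000]
print(zeroed)   # [5100, 5100, 5200, 5300]
```

In the stale case a single CPU1 read pushes `last_value` ahead, and every later CPU0 read is clamped to it until real time catches up. In the zeroed case the CPU1 read computes 0, is clamped to the current `last_value`, and CPU0 keeps advancing.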
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129175
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.36-rc7-201010141519-guestmp-kvm+ root=UUID=22a4b388-70e0-4d2a-9aa1-bd842504378a ro quiet
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[0 ffff880001411fc0 11c4b0 ffff880001511fc0 11c4ae 0.000000] Checking aperture...
[0 ffff880001411fc0 1244dc ffff880001511fc0 11c4ae 0.000000] No AGP bridge found
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] Subtract (39 early reservations)
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #1 [0001000000 - 00013d6d38] TEXT DATA BSS
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #2 [0017df6000 - 001803e000] RAMDISK
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #3 [000009bc00 - 0000100000] BIOS reserved
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #4 [00013d7000 - 00013d7071] BRK
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #5 [0000001000 - 0000003000] TRAMPOLINE
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #6 [0000008000 - 0000009000] PGTABLE
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #7 [00013d7080 - 00013d8080] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #8 [00013d6d40 - 00013d6da0] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #9 [0001bd9000 - 0001bda000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #10 [0001bda000 - 0001bdb000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #11 [0001c00000 - 0002400000] MEMMAP 0
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #12 [00013d6dc0 - 00013d6f40] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #13 [00013d8080 - 00013db080] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #14 [00013dc000 - 00013dd000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #15 [00013d6f40 - 00013d6f81] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #16 [00013db080 - 00013db0c3] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #17 [00013db100 - 00013db2c0] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #18 [00013db2c0 - 00013db328] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #19 [00013db340 - 00013db3a8] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #20 [00013db3c0 - 00013db428] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #21 [00013db440 - 00013db4a8] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #22 [00013db4c0 - 00013db528] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #23 [00013db540 - 00013db5a8] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #24 [00013db5c0 - 00013db628] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #25 [00013db640 - 00013db6b6] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #26 [00013db6c0 - 00013db736] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #27 [0001400000 - 000141a000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #28 [0001500000 - 000151a000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #29 [00013d6fc0 - 00013d6fc8] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #30 [00013db740 - 00013db748] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #31 [00013db780 - 00013db788] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #32 [00013db7c0 - 00013db7d0] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #33 [00013db800 - 00013db940] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #34 [00013db940 - 00013db9a0] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #35 [00013db9c0 - 00013dba20] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #36 [00013dd000 - 00013e1000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #37 [000141a000 - 000149a000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] #38 [000149a000 - 00014da000] BOOTMEM
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] Memory: 508372k/524276k available (2128k kernel code, 408k absent, 15496k reserved, 1011k data, 472k init)
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] Hierarchical RCU implementation.
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] RCU-based detection of stalled CPUs is disabled.
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] Verbose stalled-CPUs detection is disabled.
[0 ffff880001411fc0 1245fc ffff880001511fc0 11c4ae 0.000000] NR_IRQS:320
[0 ffff880001411fc0 126736 ffff880001511fc0 11c4ae 0.000000] Console: colour VGA+ 80x25
[0 ffff880001411fc0 126736 ffff880001511fc0 11c4ae 0.000000] console [tty0] enabled
[0 ffff880001411fc0 126766 ffff880001511fc0 11c4ae 0.000000] hpet clockevent registered
[0 ffff880001411fc0 126766 ffff880001511fc0 11c4ae 0.000000] Detected 2799.750 MHz processor.
[0 ffff880001411fc0 126766 ffff880001511fc0 11c4ae 0.012000] Calibrating delay loop (skipped) preset value.. 5599.50 BogoMIPS (lpj=11199000)
[0 ffff880001411fc0 126766 ffff880001511fc0 11c4ae 0.012000] pid_max: default: 32768 minimum: 301
[0 ffff880001411fc0 126766 ffff880001511fc0 11c4ae 0.012000] Mount-cache hash table entries: 256
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] using C1E aware idle routine
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] Performance Events: AMD PMU driver.
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... version: 0
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... bit width: 48
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... generic registers: 4
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... value mask: 0000ffffffffffff
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... max period: 00007fffffffffff
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... fixed-purpose events: 0
[0 ffff880001411fc0 12676a ffff880001511fc0 11c4ae 0.012000] ... event mask: 000000000000000f
[0 ffff880001411fc0 12676c ffff880001511fc0 11c4ae 0.012333] Freeing SMP alternatives: 12k freed
[0 ffff880001411fc0 12676c ffff880001511fc0 11c4ae 0.012342] ACPI: Core revision 20100702
[0 ffff880001411fc0 126770 ffff880001511fc0 11c4ae 0.014061] Setting APIC routing to flat
[0 ffff880001411fc0 126774 ffff880001511fc0 11c4ae 0.015478] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[0 ffff880001411fc0 126774 ffff880001511fc0 11c4ae 0.015483] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
[0 ffff880001411fc0 126830 ffff880001511fc0 11c4ae 0.016000] ++++++++++++++++++++=_---CPU UP 1
[0 ffff880001411fc0 126830 ffff880001511fc0 11c4ae 0.016000] Booting Node 0, Processors #1 Ok.
[0 ffff880001411fc0 126830 ffff880001511fc0 11c4ae 0.016000] Setting warm reset code and vector.
[0 ffff880001411fc0 126834 ffff880001511fc0 11c4ae 0.016000] 1.
[0 ffff880001411fc0 126834 ffff880001511fc0 11c4ae 0.016000] 2.
[0 ffff880001411fc0 126834 ffff880001511fc0 11c4ae 0.016000] 3.
[0 ffff880001411fc0 126834 ffff880001511fc0 11c4ae 0.016000] Asserting INIT.
[0 ffff880001411fc0 126834 ffff880001511fc0 11c4ae 0.016000] Waiting for send to finish...
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.022250] Deasserting INIT.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.022259] Waiting for send to finish...
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.022265] #startup loops: 2.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.022268] Sending STARTUP #1.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.022275] After apic_write.
this printk gets the time from the CPU1 hv_clock (which still holds the old CPU0 data), which results in a value far into the future:
[1 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.012000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] Startup point 1.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] Waiting for send to finish...
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] Sending STARTUP #2.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] After apic_write.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] Startup point 1.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] Waiting for send to finish...
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] After Startup.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] Before Callout 1.
[0 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.024000] After Callout 1.
same as previous comment:
[1 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.012000] CALLIN, before setup_local_APIC().
same as previous comment:
[1 ffff880001411fc0 126836 ffff880001511fc0 11c4ae 0.012000] Stack at about ffff88001f89ff44
pass the address of the CPU1 hv_clock struct to the host; the host starts writing to it and the data in both structs (CPU0 and CPU1) is valid now:
[1 ffff880001411fc0 126836 ffff880001511fc0 11ddcc 0.012000] kvm-clock: cpu 1, msr 0:1511fc1, secondary cpu clock
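The three "kvm-clock: cpu N, msr" lines in this log can be cross-checked against the struct addresses printed in the debug prefix. The sketch below assumes that the second and fourth prefix columns are the virtual addresses of the CPU0 and CPU1 hv_clock structs, that the logged MSR value is the physical address of the struct with bit 0 set as an enable flag, and that the guest uses the usual x86-64 layout of that kernel generation (kernel image mapped at 0xffffffff80000000, direct map at 0xffff880000000000). The helper names are hypothetical.

```python
KERNEL_TEXT_BASE = 0xffffffff80000000  # assumed kernel image mapping base
DIRECT_MAP_BASE = 0xffff880000000000   # assumed direct-map base


def virt_to_phys(vaddr):
    """Translate a guest kernel virtual address under the assumed layout."""
    if vaddr >= KERNEL_TEXT_BASE:
        return vaddr - KERNEL_TEXT_BASE
    return vaddr - DIRECT_MAP_BASE


def format_msr(vaddr):
    """Render 'high:low' the way the kvm-clock log line shows it."""
    phys = virt_to_phys(vaddr)
    low = (phys & 0xffffffff) | 1   # bit 0 assumed to be the enable flag
    high = phys >> 32
    return "%x:%x" % (high, low)


boot_clock = 0xffffffff81324fc0   # prefix address before per-CPU setup
cpu0_clock = 0xffff880001411fc0   # CPU0 address after per-CPU setup
cpu1_clock = 0xffff880001511fc0   # CPU1 address after per-CPU setup

print(format_msr(boot_clock))     # 0:1324fc1
print(format_msr(cpu0_clock))     # 0:1411fc1
print(format_msr(cpu1_clock))     # 0:1511fc1
print(cpu1_clock - cpu0_clock)    # 1048576
```

All three results match the logged values, and the distance between the two per-CPU structs equals the u1048576 unit size in the PERCPU line, which is consistent with the structs being per-CPU copies of the original.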
[0 ffff880001411fc0 126836 ffff880001511fc0 11ddcc 0.025001] CPU1: has booted.
[0 ffff880001411fc0 12683a ffff880001511fc0 11ddcc 0.025001] Brought up 2 CPUs
[0 ffff880001411fc0 12683a ffff880001511fc0 11ddcc 0.025001] Boot done.
[0 ffff880001411fc0 12683a ffff880001511fc0 11ddcc 0.025001] Before bogomips.
[0 ffff880001411fc0 12683a ffff880001511fc0 11ddcc 0.025001] Total of 2 processors activated (11199.00 BogoMIPS).
[0 ffff880001411fc0 12683a ffff880001511fc0 11ddcc 0.025001] Before bogocount - setting activated=1.
[1 ffff880001411fc0 12683a ffff880001511fc0 11ddce 0.025001] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
...
end of thread, other threads:[~2010-10-20 20:47 UTC | newest]
Thread overview: 81+ messages
2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
2010-07-25 8:44 ` Avi Kivity
2010-07-26 8:47 ` Andre Przywara
2010-07-26 18:59 ` Arjan Koers
2010-07-27 21:00 ` Arjan Koers
2010-07-28 10:37 ` Avi Kivity
2010-07-31 0:34 ` Arjan Koers
2010-07-31 1:38 ` Zachary Amsden
2010-07-31 11:50 ` Arjan Koers
2010-07-31 2:39 ` Zachary Amsden
2010-07-31 11:53 ` Arjan Koers
2010-07-31 16:36 ` Arjan Koers
2010-07-31 19:45 ` Arjan Koers
2010-07-31 23:55 ` Zachary Amsden
2010-08-02 14:43 ` Glauber Costa
2010-08-02 16:16 ` Arjan Koers
2010-08-02 18:07 ` Glauber Costa
2010-08-02 20:26 ` Zachary Amsden
2010-08-02 21:10 ` Glauber Costa
2010-08-02 21:35 ` Arjan Koers
2010-08-03 0:00 ` Zachary Amsden
2010-09-28 11:16 ` Michael Tokarev
2010-09-29 8:12 ` Michael Tokarev
2010-09-29 8:28 ` Avi Kivity
2010-09-29 9:17 ` Michael Tokarev
2010-09-29 9:19 ` Michael Tokarev
2010-09-29 19:26 ` Arjan Koers
2010-09-30 7:55 ` Michael Tokarev
2010-09-30 9:59 ` Michael Tokarev
2010-09-30 13:54 ` Zachary Amsden
2010-09-30 15:12 ` Michael Tokarev
2010-09-30 15:32 ` Zachary Amsden
2010-09-30 18:49 ` Arjan Koers
2010-09-30 19:05 ` Marcelo Tosatti
2010-09-30 20:16 ` Arjan Koers
2010-09-30 23:02 ` Michael Tokarev
2010-09-30 23:07 ` Michael Tokarev
2010-10-01 1:13 ` Zachary Amsden
2010-10-02 5:35 ` Zachary Amsden
2010-10-02 7:35 ` Michael Tokarev
2010-10-02 7:40 ` Michael Tokarev
2010-10-02 7:50 ` Michael Tokarev
2010-10-02 16:10 ` Arjan Koers
2010-10-02 20:26 ` Michael Tokarev
2010-10-02 23:42 ` Zachary Amsden
2010-10-03 8:27 ` Michael Tokarev
2010-10-08 0:12 ` Arjan Koers
2010-10-08 2:47 ` Zachary Amsden
2010-10-08 22:06 ` Marcelo Tosatti
2010-10-09 1:10 ` Arjan Koers
2010-10-09 2:27 ` Zachary Amsden
2010-10-09 6:29 ` Michael Tokarev
2010-10-09 8:59 ` Arjan Koers
2010-10-11 20:47 ` Zachary Amsden
2010-10-13 12:18 ` Glauber Costa
2010-10-10 1:20 ` Arjan Koers
2010-10-11 17:53 ` Anthony Liguori
2010-10-11 18:36 ` Marcelo Tosatti
2010-10-09 2:29 ` Zachary Amsden
2010-10-10 1:26 ` Arjan Koers
2010-10-20 20:47 ` Arjan Koers
2010-10-09 7:59 ` Michael Tokarev
2010-10-09 8:31 ` Michael Tokarev
2010-10-02 21:55 ` Zachary Amsden
2010-10-03 8:16 ` Michael Tokarev
2010-10-03 8:22 ` Avi Kivity
2010-10-03 8:30 ` Michael Tokarev
2010-07-27 10:03 ` Avi Kivity
2010-07-27 11:49 ` Andre Przywara
2010-07-27 12:06 ` Avi Kivity
2010-07-27 12:21 ` Andre Przywara
2010-07-27 12:34 ` Avi Kivity
2010-07-27 13:48 ` Andre Przywara
2010-07-27 13:58 ` Avi Kivity
2010-07-27 14:55 ` Andre Przywara
2010-07-27 21:51 ` Andre Przywara
2010-07-28 3:00 ` Zachary Amsden
2010-07-28 7:55 ` Andre Przywara
2010-07-28 12:25 ` Andre Przywara
2010-07-30 22:54 ` Zachary Amsden
2010-08-02 10:12 ` Andre Przywara